100 Years of Math Milestones 9781470436520 PDF
100 YEARS OF MATH MILESTONES
The Pi Mu Epsilon Centennial Collection
Stephan Ramon Garcia and Steven J. Miller
2010 Mathematics Subject Classification. Primary 00A08, 00A30, 00A35, 05-01, 11-01,
30-01, 54-01, 60-01.
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.
© 2019 by the authors. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
10 9 8 7 6 5 4 3 2 1 24 23 22 21 20 19
Stephan Ramon Garcia dedicates this book to his wife, Gizem Karaali, and
their children, Reyhan and Altay. Thanks also go to his parents for their constant
support and affection.
Steven Miller dedicates this book with thanks to his many colleagues and students
who assisted in writing this book, to his in-laws Jeffrey and Judy Gelfand for
providing a hospitable environment where many of these entries were written and
edited, and to his friends at Pi Mu Epsilon (especially Harold Reiter, a previous
editor of the Problem Section) for their support of this project.
Contents
Preface
Notation
1913. Paul Erdős
1914. Martin Gardner
1915. General Relativity and the Absolute Differential Calculus
1962. The Gale–Shapley Algorithm and the Stable Marriage Problem
In 2013, the second named author had the honor of succeeding Ashley Ahlin
and Harold Reiter as the editor of the Problem Department of the ΠME Journal.
This event essentially coincided with the 100th anniversary of Pi Mu Epsilon, so
Miller thought it would be fun and appropriate to recognize this milestone in some
way. Many others agreed. For example, Mike Pinter, from Belmont University in
Nashville, Tennessee, proposed the base-16 celebratory equation
PMEMATH
+ SOCIETY
HUNDRED
(which was used in the Spring 2014 issue). Many readers submitted correct solu-
tions, the first being Jessica Lehr of Elizabethtown College. We leave the task of
determining all possible solutions as a fun exercise for you.
Being still somewhat young, energetic, and new to the job, while also gravely
worried about finding enough good problems for issue after issue (not yet aware of
the excellent submissions that would consistently arrive), Miller decided to celebrate
with one hundred problems related to important mathematical milestones of the
past century. Since one hundred is a large number of problems relative to the normal
operation of the Problem Department (there are typically five or six problems per
issue), he asked many colleagues for contributions. This resulted in four centennial
articles, which appeared in The Pi Mu Epsilon Journal in 2013–2014 (13 (2013),
no. 9, 513–534; 13 (2014), no. 10, 577–608; 14 (2014), no. 1, 65–99; and 14 (2014),
no. 2, 100–134).
The four articles were well received and there was strong interest in converting
them into a book. The first named author came on board early in the process
as a collaborator. Every entry was either expanded jointly by us from the four
centennial articles or simply written anew. The second option was an essential
step in converting the collection from a series of disjointed problems into a unified
whole. We have used the original descriptions as springboards to introduce a variety
of mathematical ideas, techniques, and applications. Whenever possible, we have
quoted primary sources. Concepts are often introduced early on and then threaded
through and expanded upon in later entries. The final result is a tour through much
of mathematics, with an emphasis on beauty, big ideas, and interesting problems.
There are several influential collections of problems that have motivated and
guided mathematics. Hilbert’s problems and the Clay Millennium Problems are
notable examples. We have a different emphasis here. Pi Mu Epsilon is an un-
dergraduate mathematics honor society and thus, in addition to being important,
the problems must be accessible to students. Although some of them do require
analysis or algebra, number theory or probability, as a whole we hope they will be
Notation
• $\emptyset$: empty set
• $|A|$: cardinality of a set $A$
• $(\ldots)_b$: number in base $b$
• $\log x$: base-$e$ logarithm of $x$
• $\log_b x$: base-$b$ logarithm of $x$
• $a \mid b$: $a$ divides $b$
• $\lfloor x \rfloor$: greatest integer function
• $\gcd$: greatest common divisor
• $\operatorname{lcm}$: least common multiple
• $a \equiv b \pmod{m}$: congruence modulo $m$
• $\prod_{i=1}^{n} a_i$: product of $a_1, a_2, \ldots, a_n$
• $\mathbb{N}$: the set $\{1, 2, 3, \ldots\}$ of natural numbers
• $\mathbb{Z}$: the set $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$ of integers
• $\mathbb{Q}$: the set of rational numbers
• $\mathbb{R}$: the set of real numbers
• $\mathbb{C}$: the set of complex numbers
• $\operatorname{Re} z$: real part of the complex number $z$
• $\operatorname{Im} z$: imaginary part of the complex number $z$
• $\cong$: equinumerosity (p. 28)
• $f \sim g$: asymptotic equivalence (p. 33)
• $\pi(x)$: the number of primes at most $x$ (p. 33)
• $\operatorname{Li}(x)$: (offset) logarithmic integral of $x$ (p. 107)
1913
Paul Erdős
Introduction
How many contacts do you have in your cell phone? How many friends do you
have on Facebook? Over the course of his life, Paul Erdős (1913–1996) published
over 1,500 mathematical papers with more than 500 different people. These are
staggering numbers, and it is fitting to begin with a problem related to him. He
worked in many fields, especially in combinatorics and number theory, often using
probabilistic methods. The Erdős number (see the problem from 1969 for more
details) measures a mathematician’s collaborative distance from Erdős; the famous
“Six Degrees of Kevin Bacon” game is based upon it.
Erdős is best known for solving difficult problems and making profound con-
jectures, as opposed to developing new theories. Many conjectures he formulated
remain open. Some have small cash prizes associated with them to attract attention
and encourage further investigation. One of his most famous conjectures deals with
finding arithmetic progressions contained in a given set of integers. An arithmetic
progression is a (finite or infinite) sequence of integers, such as 4, 9, 14, 19, 24, whose
terms differ by a fixed amount.
Let $\mathbb{N} = \{1, 2, 3, \ldots\}$ denote the set of natural numbers, let
$$A = \{ n \in \mathbb{N} : n \text{ is divisible by some prime } p \equiv 3 \pmod{4} \},$$
and let $B = \mathbb{N} \setminus A$. The first few primes congruent to 3 modulo 4 are
$$3, 7, 11, 19, 23, 31, 43, 47, 59, 67, 71, 79, 83, 103, 107, 127, 131, 139, \ldots,$$
so
$$A = \{3, 6, 7, 9, 11, 12, 14, 15, 18, 19, 21, 22, 23, 24, \ldots\}$$
and
$$B = \{1, 2, 4, 5, 8, 10, 13, 16, 17, 20, 25, \ldots\}.$$
Examining the numbers that are at most 25, we see A contains a lot of arithmetic
progressions, from short ones of length three (for example, 7, 11, 15) to long ones
of length seven (for example, 3, 6, 9, 12, 15, 18, 21). However, it is harder to find
progressions among the elements of B at most 25. A little work turns up many
of length three (for example, 2, 5, 8 or 4, 10, 16 or 1, 13, 25), but we do not have a
progression as long as seven. What happens if we look at the full sets A and B? Do
you think that there are arithmetic progressions of length ℓ for any finite ℓ? Why
might the two sets behave differently?
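The small cases above are easy to explore by machine. The following is a minimal sketch, assuming (as the listed elements suggest) that A consists of the natural numbers divisible by at least one prime congruent to 3 modulo 4 and that B is its complement. The helper names `primes_up_to`, `build_sets`, and `longest_ap` are hypothetical, chosen only for this illustration.

```python
def primes_up_to(limit):
    """Return all primes <= limit (limit >= 2) with a sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def build_sets(limit):
    """Assumed definition: A = numbers <= limit divisible by some prime p = 3 (mod 4)."""
    primes_3_mod_4 = [p for p in primes_up_to(limit) if p % 4 == 3]
    A = [n for n in range(1, limit + 1)
         if any(n % p == 0 for p in primes_3_mod_4)]
    in_A = set(A)
    B = [n for n in range(1, limit + 1) if n not in in_A]
    return A, B


def longest_ap(values):
    """Return one longest arithmetic progression contained in `values`."""
    s = sorted(set(values))
    present = set(s)
    best = s[:1]
    for i, a in enumerate(s):
        for b in s[i + 1:]:
            d = b - a
            if a - d in present:
                continue  # a is not the first term of a maximal progression
            run = [a]
            while run[-1] + d in present:
                run.append(run[-1] + d)
            if len(run) > len(best):
                best = run
    return best


A, B = build_sets(25)
print("A up to 25:", A)
print("B up to 25:", B)
print("a longest progression in A:", longest_ap(A))
print("a longest progression in B:", longest_ap(B))
```

Raising the argument of `build_sets` is a cheap way to form a guess about the full sets before trying to prove anything; the brute-force search is quadratic in the number of elements, so it is only meant for small limits.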
Erdős conjectured that if S is a set of natural numbers whose sum of reciprocals
diverges, then S contains arithmetic progressions of any given finite length. Currently
$5,000 is offered for a proof of the conjecture.
1913: Comments
More on the Erdős conjecture. It is important to note that the Erdős
conjecture is not an “if and only if” statement. A set of natural numbers may
contain arbitrarily long arithmetic progressions and have a convergent reciprocal
sum. An example is
1, 10, 11, 100, 101, 102, 1000, 1001, 1002, 1003, 1004, 10001, . . . , 10005, . . . .
Erdős’s conjecture only asserts that a divergent reciprocal sum is sufficient to en-
sure the existence of arbitrarily long arithmetic progressions. It is not a necessary
condition.
Notable progress on Erdős’s problem includes the celebrated Green–Tao the-
orem (see the 2004 entry), which states that the primes contain arbitrarily long
arithmetic progressions. That the sum of the reciprocals of the primes diverges is
an old result of Leonhard Euler (1707–1783); see p. 4 for a proof. Even though
the Green–Tao theorem is a special case of Erdős’s more general conjecture, it is a
profound one. It shows that a set of natural numbers as seemingly erratic as the
primes enjoys some occasional semblance of regularity.
While the proof of the Green–Tao theorem is beyond the scope of this book,
we can look at some famous sequences whose reciprocal sums converge, to see if
Erdős’s conjecture is reasonable. Two well-known examples are
$$\sum_{n=1}^{\infty} \frac{1}{2^n} = 1 \quad\text{and}\quad \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6};$$
see the notes for 1919 for a proof of the second identity.
Suppose there is a three-term arithmetic progression in the powers of 2, say
$2^a < 2^b < 2^c$. Since the two gaps between the three terms are the same,
$$2^b - 2^a = 2^c - 2^b,$$
or equivalently
$$2^{b+1} = 2^c + 2^a = 2^a(2^{c-a} + 1).$$
Since $b > a$, the left-hand side is divisible by $2^{a+1}$. On the right-hand side,
$2^{c-a} + 1$ is odd, so $2^a$ is the highest power of 2 that divides it,
a contradiction. Thus, the longest arithmetic progression in this sequence is
of length 2 (which is not impressive). Now for perfect squares.
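The argument for powers of 2 can be sanity-checked by brute force, and the same search can be pointed at the perfect squares. This is only a sketch: the helper `longest_ap` and the search bounds (exponents below 40, squares up to $100^2$) are choices made for this illustration, and a finite search is evidence rather than a proof.

```python
def longest_ap(values):
    """Return one longest arithmetic progression contained in `values`."""
    s = sorted(set(values))
    present = set(s)
    best = s[:1]
    for i, a in enumerate(s):
        for b in s[i + 1:]:
            d = b - a
            if a - d in present:
                continue  # a is not the first term of a maximal progression
            run = [a]
            while run[-1] + d in present:
                run.append(run[-1] + d)
            if len(run) > len(best):
                best = run
    return best


powers_of_two = [2 ** k for k in range(40)]
squares = [n * n for n in range(1, 101)]

print("powers of 2:", longest_ap(powers_of_two))
print("squares:", longest_ap(squares))
```

Any two terms form a progression of length 2, so a result of length 2 for the powers of 2 means that no three-term progression was found, in agreement with the divisibility argument.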
1 If a and b are not relatively prime, there can be at most one prime congruent to b modulo
a, and this happens precisely when b is prime. Thus, this case is uninteresting.
for $N \ge 1$; the first inequality holds because the sum in the
middle, when expanded term-by-term, includes every term on the left-hand side.
This is a contradiction, since the series $\sum_{n=1}^{\infty} \frac{1}{nQ+1}$ diverges.
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the 1998
original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018.
MR3823190
[2] J. A. Clarkson, On the series of prime reciprocals, Proc. Amer. Math. Soc. 17 (1966), 541,
DOI 10.2307/2035210. MR0188132
[3] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. MR2415379
[4] P. Hoffman, The man who loved only numbers: The story of Paul Erdős and the search for
mathematical truth, Hyperion Books, New York, 1998. MR1666054
[5] M. Ram Murty, Primes in certain arithmetic progressions, Journal of the Madras University
(1988), 161–169.
[6] M. R. Murty and N. Thain, Prime numbers in certain arithmetic progressions, Funct. Approx.
Comment. Math. 35 (2006), 249–259, DOI 10.7169/facm/1229442627. MR2271617
1914
Martin Gardner
Introduction
Few twentieth-century mathematical authors have written on such diverse sub-
jects as Martin Gardner (1914–2010), whose books, numbering over seventy, cover
not only numerous fields of mathematics but also literature, philosophy, pseudo-
science, religion, and magic. He is best known as a recreational mathematician, due
to the accessible and entertaining manner in which he wrote. This is an important
role and should not be overlooked or minimized, as it both draws people to study
mathematics and helps with public awareness and appreciation.
In the introduction to his first book of puzzles, Hexaflexagons, Probability Para-
doxes, and the Tower of Hanoi, he wrote:
(a) Most students remember that $\cos(x + y)$ involves $\cos x \cos y$ and $\sin x \sin y$, but
do we add them or subtract? Let us suppose that
$$\cos(x + y) = a \cos x \cos y + b \sin x \sin y$$
and refine our guess; as long as the formula is of this general shape we can
determine $a$ and $b$ from special cases. When investigating special cases, try
the simplest. For example, if we take $x = y = 0$, then we see $a = 1$. Setting
powerful method. Euler’s formula implies that $\cos(x + y) + i \sin(x + y) = e^{i(x+y)} = e^{ix} e^{iy} =
(\cos x + i \sin x)(\cos y + i \sin y) = (\cos x \cos y - \sin x \sin y) + i(\cos x \sin y + \sin x \cos y)$. Compare real
and imaginary parts to obtain the addition formulas for cosine and sine.
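Both routes to the cosine addition formula can be checked numerically. The sketch below assumes the guessed shape $\cos(x + y) = a \cos x \cos y + b \sin x \sin y$; the special case $x = y = 0$ is the one used in the text, while the second special case $x = y = \pi/2$ is our own assumed choice for pinning down $b$. The function names are hypothetical.

```python
import cmath
import math

# Special case x = y = 0: the sine term vanishes, so cos(0) = a * cos(0) * cos(0).
a = math.cos(0.0) / (math.cos(0.0) * math.cos(0.0))

# Special case x = y = pi/2 (an assumed choice): the cosine term is negligible,
# so cos(pi) is essentially b * sin(pi/2) * sin(pi/2).
x0 = y0 = math.pi / 2
b = (math.cos(x0 + y0) - a * math.cos(x0) * math.cos(y0)) / (math.sin(x0) * math.sin(y0))


def guessed_cos_sum(x, y):
    """Evaluate the guessed formula with the coefficients found above."""
    return a * math.cos(x) * math.cos(y) + b * math.sin(x) * math.sin(y)


def euler_cos_sum(x, y):
    """Real part of e^{ix} e^{iy}, which Euler's formula identifies with cos(x + y)."""
    return (cmath.exp(1j * x) * cmath.exp(1j * y)).real


print("a =", a, " b =", b)
for x, y in [(0.3, 1.1), (2.0, -0.7), (5.5, 4.25)]:
    print(math.cos(x + y), guessed_cos_sum(x, y), euler_cos_sum(x, y))
```

Agreement at a few sample points does not prove the identity, but it is a quick way to catch a sign error in a half-remembered formula.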
the simplest configuration, then we can solve it for all configurations! Of course,
it is often hard to show that all the different possibilities lead to the same answer
and that we need only deal with one case. Fortunately, this idea is still useful even
if we cannot prove the equivalence since we can use it as a starting point to guess
the correct solution.
1914: Comments
Solution to the problem. If we have a rough idea of the answer, checking a
special case can help us determine it precisely. Let us use this idea to attack the
problem from Gardner’s column. We therefore assume that the answer is indepen-
dent of the radius of the given sphere since that information is not given to us.
What would be a good choice for the radius of the sphere? An excellent option is
to have the diameter of the sphere equal 6, so the volume of the removed cylinder
is zero! If instead of choosing the diameter to be 6 we considered the general case,
we would have to argue as in Figure 2. This is certainly possible, but it is not fun.
[Figure 2: diagram for the general case, with lengths labeled $R$, $3$, $R - 3$, and $\sqrt{R^2 - 9}$.]
Of course, the difficulty of this problem is proving that the answer is indepen-
dent of the radius of the initial sphere. However, if you are willing to accept this
fact (which is implicit in the formulation of the problem), we just need to find the
answer in one special case. We might as well choose the case that is the simplest.
This is a truly powerful method and it is well worth mastering.
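For readers who prefer to see the independence from the radius rather than accept it, here is a small numerical sketch. It assumes the classical setup suggested by the lengths in Figure 2: a cylindrical hole of length 6 is drilled through the center of a sphere of radius $R \ge 3$, so the hole has radius $\sqrt{R^2 - 9}$ and removes a cylinder together with two spherical caps of height $R - 3$. The function name `remaining_volume` and the sample radii are our own choices.

```python
import math


def remaining_volume(R, half_length=3.0):
    """Volume left after drilling a hole of length 2 * half_length through a sphere of radius R."""
    if R < half_length:
        raise ValueError("the sphere is too small for a hole of this length")
    cap_height = R - half_length
    sphere = 4.0 / 3.0 * math.pi * R ** 3
    # Cylinder of radius sqrt(R^2 - half_length^2) and height 2 * half_length.
    cylinder = math.pi * (R ** 2 - half_length ** 2) * (2.0 * half_length)
    # Standard spherical cap volume: pi * h^2 * (3R - h) / 3.
    cap = math.pi * cap_height ** 2 * (3.0 * R - cap_height) / 3.0
    return sphere - cylinder - 2.0 * cap


for R in [3.0, 5.0, 10.0, 100.0]:
    print(R, remaining_volume(R), 36.0 * math.pi)
```

The case $R = 3$ is the special case from the text: the cylinder and both caps have volume zero, and what remains is the whole sphere of diameter 6.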
Bibliography
[1] M. Gardner, Hexaflexagons, probability paradoxes, and the Tower of Hanoi, New Martin Gardner Mathematical Library, vol. 1, Cambridge University Press, Cambridge; Mathematical Association of America, Washington, DC, 2008. Martin Gardner’s first book of mathematical puzzles and games; second edition of The Scientific American book of mathematical puzzles and diversions. MR2444876
[2] E. Peres, Martin Gardner: the mathematical jester, Mathematical lives, Springer, Berlin, 2011, pp. 217–220, DOI 10.1007/978-3-642-13606-1_31. MR2743951
[3] J. J. O’Connor and E. F. Robertson, Martin Gardner, MacTutor History of Mathematics, http://www-history.mcs.st-and.ac.uk/Biographies/Gardner.html.
1915
General Relativity
and the Absolute Differential Calculus
Introduction
Gregorio Ricci-Curbastro (1853–1925) developed a branch of mathematics
known as the absolute differential calculus in his study of geometrical quantities and
physical laws that are invariant under general coordinate transformations. The con-
cept of a tensor first appeared in Ricci’s work, although a restricted form of tensors
had been previously introduced in vector analysis. In 1901, Ricci and his student,
Tullio Levi-Civita (1873–1941), published a complete account of the methods of
absolute differential calculus and their applications [12]. Their work was a natural
extension of the mathematics of curved surfaces introduced by Gauss and devel-
oped by Riemann and others, and of the vector analysis developed by Gibbs and
Heaviside.
Albert Einstein’s special theory of relativity deals with the study of the dynamics
of matter and light in frames of reference that move uniformly with respect
to each other, the so-called inertial frames. Those quantities that are invariant
under the (Lorentz) transformation from one frame to another are of fundamental
importance. They include the invariant interval between two events $(ct)^2 - x^2$, the
energy-momentum invariant $E^2 - (pc)^2$, and the frequency-wave number invariant
$\omega^2 - (kc)^2$. Here $c$ denotes the speed of light in free space. The special theory is
formulated in a gravity-free universe.
Ten years after introducing his special theory of relativity, Einstein (1879–1955)
published his crowning achievement, the general theory of relativity [6, 7]. This is
a theory of space-time and dynamics in the presence of gravity. The essential
mathematical methods used in the general theory are differential geometry and
the absolute differential calculus (which Einstein referred to as tensor analysis).
Einstein devoted more than five years to mastering the necessary mathematical
techniques. He corresponded with Levi-Civita, asking for his advice on applications
of tensor analysis.
A tensor is a set of functions, fixed in a coordinate system, that transforms
under a change of the coordinate system according to definite rules. Each tensor
component in a given coordinate system is a linear, homogeneous function of the
components in another system. If there are two tensors with components that
are equal when both are written in one coordinate system, then they are equal
in all coordinate systems; these tensors are invariant under a transformation of
the coordinates [14]. Physical laws are true in their mathematical forms for all
observers in their own frames of reference (coordinate systems) and therefore the
laws are necessarily formulated in terms of tensors.
[Figure: diagram with the labels “Observed position,” “Actual position,” and “Observer.”]
the principle of equivalence. Tests of general relativity are an active part of research
in physics and astronomy. The problem below is related to one of these tests; for a
review of early tests of gravitational theory see [10].
The Schwarzschild line element, in the region of a spherical mass $M$ (obtained
as an exact solution of the Einstein field equations) is, in polar coordinates,
$$ds^2 = c^2\left(1 - \frac{2GM}{rc^2}\right) dt^2 - \left(1 - \frac{2GM}{rc^2}\right)^{-1} dr^2 - r^2\left(d\theta^2 + \sin^2\theta \, d\phi^2\right).$$
If $\chi = 2GM/rc^2$ is small, then the coefficient $(1 - \chi)^{-1}$ of $dr^2$ in the Schwarzschild
line element can be replaced by the leading terms $1 + \chi$ of its binomial expansion to give
the “weak field” line element
$$ds_W^2 = (1 - \chi)(c\,dt)^2 - (1 + \chi)\,dr^2 - r^2\left(d\theta^2 + \sin^2\theta \, d\phi^2\right).$$
At the surface of the sun, the value of $\chi$ is $4.2 \cdot 10^{-6}$, so that the weak-field approximation
is valid for all gravitational phenomena in our solar system.
Consider a beam of light traveling radially in the weak field of a mass $M$. Then
$$ds_W^2 = 0 \ \text{(a light-like interval)} \quad\text{and}\quad d\theta^2 + \sin^2\theta \, d\phi^2 = 0,$$
which gives
$$0 = (1 - \chi)(c\,dt)^2 - (1 + \chi)\,dr^2.$$
The “velocity” of the light $v_L = dr/dt$, as determined by observers far from the
gravitational influence of $M$, is therefore
$$v_L = c\sqrt{(1 - \chi)/(1 + \chi)} < c$$
since $\chi > 0$. Observers in free fall near $M$ have $\chi = 0$ and hence measure the
speed of light to be $c$. Expanding the term $\sqrt{(1 - \chi)/(1 + \chi)}$ to first order in
$\chi = 2GM/rc^2$ provides the approximation
$$v_L(r) \approx c\left(1 - 2GM/rc^2 + \cdots\right).$$
In geometrical optics, the refractive index $n$ of a material is $n = c/v_{\text{medium}}$, in
which $v_{\text{medium}}$ is the speed of light in the medium. We introduce the concept of
the refractive index of space-time $n_G(r)$ at a point $r$ in the gravitational field of a
mass $M$:
$$n_G(r) = c/v_L(r) \approx 1 + 2GM/rc^2.$$
The value of $n_G(r)$ increases as $r$ decreases. This effect can be interpreted as an
increase in the “density” of space-time as $M$ is approached.
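The size of these effects for the sun can be estimated in a few lines. The following is a rough sketch, not a precise calculation: the values of $G$, $c$, and the solar mass and radius are standard approximate constants that we supply ourselves (they do not come from the text), and the function names are hypothetical. The far-observer speed is taken to be $c\sqrt{(1 - \chi)/(1 + \chi)}$, as obtained from the weak-field line element for radial light.

```python
import math

# Approximate physical constants in SI units (assumed values, not from the text).
G = 6.674e-11        # gravitational constant
C = 2.998e8          # speed of light in free space
M_SUN = 1.989e30     # mass of the sun
R_SUN = 6.957e8      # radius of the sun


def chi(r, mass=M_SUN):
    """Weak-field parameter chi = 2GM / (r c^2)."""
    return 2.0 * G * mass / (r * C ** 2)


def light_speed_far_observer(r, mass=M_SUN):
    """Radial coordinate speed of light from 0 = (1 - chi)(c dt)^2 - (1 + chi) dr^2."""
    x = chi(r, mass)
    return C * math.sqrt((1.0 - x) / (1.0 + x))


def refractive_index(r, mass=M_SUN):
    """Effective refractive index of space-time, n_G = c / v_L."""
    return C / light_speed_far_observer(r, mass)


x = chi(R_SUN)
print("chi at the solar surface:", x)
print("v_L / c:", light_speed_far_observer(R_SUN) / C, " first order:", 1.0 - x)
print("n_G:", refractive_index(R_SUN), " first order:", 1.0 + x)
```

Because $\chi$ is a few parts in a million, the first-order approximations differ from the square-root expressions only at order $\chi^2$, which is why the weak-field treatment is adequate throughout the solar system.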
As a plane wave of light approaches a spherical mass, those parts of the wave
front nearest the mass are slowed down more than those parts farthest from the
mass. The speed of the wave front is no longer constant along its surface, and
therefore the normal to the surface must be deflected. The deflection of a plane
wave of light by a spherical mass M of radius R, as it travels through space-time,
can be calculated in the weak-field approximation.
1915: Comments
There are several nice points worth isolating from this problem and remarking
on. First, when a new theory is conjectured in the sciences, we test it to see
whether or not it can explain current observations. In the case of the general
theory of relativity, this was spectacularly done by its explanation of the anomalous
precession of the perihelion of Mercury; Isaac Asimov (1920–1992) has a beautiful article on this [1]. If one is
lucky, the theory also predicts new phenomena. A terrific example of such a theory
is Bohr’s model for the hydrogen atom, which not only explained the observed
spectral lines but also predicted others previously unseen. Scientists before Einstein,
using Newtonian physics and particle models for light, posited a deflection of light
passing near a massive object. But Einstein obtained a much different value for
this deflection, which experiments then verified. Speaking of gravitational lensing,
did you know that the number of images produced by n coplanar point lenses is
at most 5n − 5? This was proved in 2008 using complex dynamics and harmonic
function theory [9].
The second great lesson here is that the usefulness of mathematics is not always
apparent. When asked about the utility of a new invention, Benjamin Franklin
(1706–1790) remarked, “What is the use of a new-born child?” The differential
geometry that underlies Einstein’s theories was not developed for relativity, but
it was available and could be used when the proper situation arose. While it can
take decades or more for some mathematics to find applications, such connections
often arise to the surprise of many of the involved parties. The 1940 entry involves
G. H. Hardy’s classic book, A Mathematician’s Apology; the reader is encouraged
to jump to that entry and reflect, while reading the excerpt, on the fact that many
of Hardy’s results have found a home in modern cryptography (and even in biology
[2]). That said, for those who would like a more down-to-earth answer here is one:
Einstein’s general theory of relativity is essential for the Global Positioning System
(GPS) to function properly and accurately [13].
Finally, it is important to remember that the jury is always out and we should
constantly explore additional ways to test a theory. It often takes decades or longer
to fully explore all the predictions and verify the results of these experiments. To
this end, there have been some exciting recent developments in the field of rela-
tivity. The Laser Interferometer Gravitational-Wave Observatory (LIGO) recently
announced [11] that they have verified another prediction of Einstein’s general the-
ory: the existence of gravitational waves. Of course, with monumental discoveries
such as this, one must wait for the results to be confirmed. To give the reader a
sense of how delicate these measurements are, researchers are looking for effects on
the order of one part in $10^{21}$. One article put this in perspective by saying this is
equivalent to squishing our galaxy to the height of a human [8].
Bibliography
[1] I. Asimov, The planet that wasn’t, The Magazine of Fantasy and Science Fiction (1975), May. http://geobeck.tripod.com/frontier/planet.htm.
[2] H. E. Christenson and S. R. Garcia, G. H. Hardy: mathematical biologist, J. Humanist. Math. 5 (2015), no. 2, 96–102, DOI 10.5642/jhummath.201502.08. http://scholarship.claremont.edu/jhm/vol5/iss2/8. MR3378780
[3] P. A. M. Dirac, General theory of relativity, reprint of the 1975 original, Princeton Landmarks in Physics, Princeton University Press, Princeton, NJ, 1996. MR1373868
[4] A. Einstein, On the electrodynamics of moving bodies, Annalen der Physik 17 (1905), 891–921. http://www.fourmilab.ch/etexts/einstein/specrel/www/. For more of Einstein’s papers from this time period, see http://www.loc.gov/rr/scitech/SciRefGuides/einstein.html.
[5] A. Einstein, Über das Relativitätsprinzip und die aus demselben gezogene Folgerungen, Jahrbuch Rad. 4 (1907), 410. http://www.relativitycalculator.com/pdfs/Einstein_1907_Comprehensive_Essay_PartsI_II_III.pdf.
[6] A. Einstein, The foundation of the general theory of relativity, Annalen der Physik (1916). http://web.archive.org/web/20060831163721/http://www.alberteinstein.info/gallery/pdf/CP6Doc30_English_pp146-200.pdf.
[7] A. Einstein, The meaning of relativity, reprint of the 1956 edition, Princeton University Press, Princeton, NJ, 1988. MR1042572
[8] C. Hanna, What happens when LIGO texts you to say it’s detected one of Einstein’s predicted gravitational waves, The Conversation, February 11, 2016. http://theconversation.com/what-happens-when-ligo-texts-you-to-say-its-detected-one-of-einsteins-predicted-gravitational-waves-53259.
[9] D. Khavinson and G. Neumann, From the fundamental theorem of algebra to astrophysics: a “harmonious” path, Notices Amer. Math. Soc. 55 (2008), no. 6, 666–675. MR2431564
[10] D. F. Lawden, An introduction to tensor calculus, relativity and cosmology, 3rd ed., John Wiley & Sons, Ltd., Chichester, 1982. MR665917
[11] LIGO, Gravitational Waves Detected 100 Years After Einstein’s Prediction, LIGO News Release, February 11, 2016. https://www.ligo.caltech.edu/news/ligo20160211.
[12] M. M. G. Ricci and T. Levi-Civita, Méthodes de calcul différentiel absolu et leurs applications (French), Math. Ann. 54 (1900), no. 1-2, 125–201, DOI 10.1007/BF01454201. MR1511109
[13] T. Van Flandern, What the Global Positioning System tells us about relativity, in Open Questions in Relativistic Physics (edited by F. Selleri), Apeiron (1998), 81–90.
[14] C. M. Will, in General Relativity (edited by S. W. Hawking and W. Israel), Chapter 2, Cambridge University Press, 1979.
1916
Ostrowski’s Theorem
Introduction
The absolute value function gives the magnitude of a real or complex number.
However, there are other ways to define the “size” of a number. An absolute value
on a field $F$ is a real-valued function $\|\cdot\|$ on $F$ that satisfies
(a) $\|x\| \ge 0$,
(b) $\|x\| = 0$ if and only if $x = 0$,
(c) $\|xy\| = \|x\|\,\|y\|$, and
(d) $\|x + y\| \le \|x\| + \|y\|$.
Josef Kürschák proposed these axioms in 1912, although Kurt Hensel (1861–1941)
had started related research in 1897.
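The standard absolute value on the field Q of rational numbers is
$$\|x\|_0 = \begin{cases} x & \text{if } x \ge 0, \\ -x & \text{if } x < 0. \end{cases}$$
Another example is the trivial absolute value, defined by
$$\|x\| = \begin{cases} 1 & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$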
relates the standard absolute value to all of the p-adic absolute values.
We say that two absolute values $\|\cdot\|_1$ and $\|\cdot\|_2$ on a field $F$ are equivalent if there
is a $c > 0$ so that $\|x\|_1 = \|x\|_2^c$ for all $x \in F$. In 1916, Alexander Ostrowski (1893–1986)
proved what is now known as Ostrowski’s theorem: each absolute value on the
rational numbers is equivalent to the trivial absolute value, the standard absolute
value, or a p-adic absolute value. In other words, we have a complete description
of all possible ways to generalize the notion of “size” for rational numbers so that
the four axioms above hold.
The standard absolute value on $\mathbb{Q}$ is Archimedean; that is, for each $x \ne 0$ there
is an $N \in \mathbb{N}$ so that $\|nx\|_0 > 1$ for all $n \ge N$. In contrast, the p-adic absolute values
are non-Archimedean. Since the Archimedean property is, in a sense, “natural,”
one might use Ostrowski’s theorem to argue that the standard absolute value is the
most natural possible absolute value one can endow $\mathbb{Q}$ with.
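The contrast between the two kinds of absolute value is easy to experiment with. The sketch below assumes the usual definition of the p-adic absolute value, which is not reproduced in this excerpt: if $x = p^v \cdot a/b$ with $a$ and $b$ not divisible by $p$, then $\|x\|_p = p^{-v}$, and $\|0\|_p = 0$. The function names are hypothetical. The last function multiplies the standard absolute value of $x$ by the p-adic absolute values for the primes dividing its numerator or denominator (every other prime contributes a factor of 1).

```python
from fractions import Fraction


def padic_abs(x, p):
    """Assumed standard definition: ||x||_p = p^(-v) where p^v exactly divides x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, v = abs(x.numerator), x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)


def prime_divisors(n):
    """Distinct prime divisors of a nonzero integer, by trial division."""
    n, p, found = abs(n), 2, []
    while p * p <= n:
        if n % p == 0:
            found.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.append(n)
    return found


def product_of_absolute_values(x):
    """Standard absolute value of x times ||x||_p over all primes p that matter."""
    x = Fraction(x)
    result = abs(x)
    for p in set(prime_divisors(x.numerator) + prime_divisors(x.denominator)):
        result *= padic_abs(x, p)
    return result


x = Fraction(-45, 28)
print("3-adic:", padic_abs(x, 3), " 2-adic:", padic_abs(x, 2))
print("product of all absolute values:", product_of_absolute_values(x))
# Non-Archimedean behavior: integer multiples never get 3-adically larger.
print(max(padic_abs(n * x, 3) for n in range(1, 201)), "versus", padic_abs(x, 3))
```

Exact rational arithmetic via `Fraction` is a deliberate choice here, since floating-point values would blur the equalities that make the p-adic world look so rigid.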
1916: Comments
The p-adic numbers. Each absolute value on Q defines a metric. The standard
metric on Q is
$$d_0(x, y) = \|x - y\|_0.$$
On the other hand, each prime number p gives rise to the p-adic metric on Q:
$$d_p(x, y) = \|x - y\|_p.$$
The real number system is the completion of Q with respect to the standard metric.
In the same way, for each prime p we complete Q with respect to the p-adic metric
and obtain the p-adic number system $\mathbb{Q}_p$. Just as the completion of Q with respect
to the standard metric is a field (namely R), one can show that $\mathbb{Q}_p$ is a field. What
do the elements of $\mathbb{Q}_p$ look like?
First let us examine how Z, the set of integers, sits inside of $\mathbb{Q}_3$; see Figure 1.
Modulo 3, the integers come in precisely three flavors: an integer is congruent to
exactly one of 0, 1, or 2 modulo 3. If $x \equiv y \pmod{3}$, then they are “pretty close”
to each other in $\mathbb{Q}_3$ since $3 \mid (x - y)$ and hence $\|x - y\|_3 \le \frac{1}{3}$. If $x \equiv y \pmod{9}$, then
they are even closer since $9 \mid (x - y)$ and hence $\|x - y\|_3 \le \frac{1}{9}$. Continuing in this
fashion, a famous fractal (a Sierpiński triangle; see the 1963 entry) emerges. This is
suggested by Figure 1(d). To picture how Z sits inside of $\mathbb{Q}_3$, imagine iterating this
process “downward” infinitely many times; to picture $\mathbb{Q}_3$ itself, imagine iterating
“upward” too!
If you are baffled and confused, then we have done our job. The p-adic number
system is strange. For instance, the p-adic metric is an example of an ultrametric.
An ultrametric is a metric that satisfies the strong triangle inequality
$$d(x, z) \le \max\{d(x, y), d(y, z)\}.$$
One consequence of this is that every triangle in $\mathbb{Q}_p$ is isosceles: if $\|x - y\|_p \ne \|z - y\|_p$, then
$$\|x - z\|_p = \max\{\|x - y\|_p, \|z - y\|_p\}.$$
Even more baffling is the fact that every point in a p-adic open disk is a center of
that open disk! Try to prove these results.
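Before attempting the proofs, it can be reassuring to test the strong triangle inequality and the isosceles property on random rational points. This is a sketch under the usual definition of the p-adic absolute value ($\|x\|_p = p^{-v}$ when $p^v$ exactly divides $x$), which we assume here; the names `padic_abs`, `dist`, and `triangles_behave`, the sampling ranges, and the random seed are all arbitrary choices for this illustration.

```python
import random
from fractions import Fraction


def padic_abs(x, p):
    """Assumed standard definition: ||x||_p = p^(-v) where p^v exactly divides x."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    num, den, v = abs(x.numerator), x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)


def dist(x, y, p):
    """The p-adic metric d_p(x, y) = ||x - y||_p."""
    return padic_abs(Fraction(x) - Fraction(y), p)


def triangles_behave(p, trials=500, seed=0):
    """Check the strong triangle inequality and the isosceles property on random triples."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (Fraction(rng.randint(-200, 200), rng.randint(1, 200))
                   for _ in range(3))
        if dist(x, z, p) > max(dist(x, y, p), dist(y, z, p)):
            return False
        sides = sorted([dist(x, y, p), dist(y, z, p), dist(x, z, p)])
        if sides[1] != sides[2]:
            return False  # the two longest sides of the triangle should agree
    return True


print("1 and 10 are 3-adically close:", dist(1, 10, 3))
print("random triangles in the 3-adic metric behave:", triangles_behave(3))
print("random triangles in the 5-adic metric behave:", triangles_behave(5))
```

A passing check proves nothing, of course, but a failing one would immediately expose a misstatement of the isosceles property.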
[Figure 1: location in $\mathbb{Q}_3$ of the integers congruent to (a) 0, 1, 2 (mod 3); (b) 0, 1, . . . , 8 (mod 9); (c) 0, 1, . . . , 26 (mod 27); (d) 0, 1, . . . , 80 (mod 81).]
\[ \Bigl| \sum_{n=0}^{N-1} 2^n - (-1) \Bigr|_2 = \Bigl| \frac{1 - 2^N}{1 - 2} + 1 \Bigr|_2 = \bigl| 1 - (1 - 2^N) \bigr|_2 = \bigl| 2^N \bigr|_2 = 2^{-N}, \]
Bibliography
[1] J. E. Holly, Pictures of ultrametric spaces, the p-adic numbers, and valued fields, Amer. Math.
Monthly 108 (2001), no. 8, 721–728, DOI 10.2307/2695615. https://www.colby.edu/math/faculty/Faculty_files/hollydir/Holly01.pdf. MR1865659
[2] A. Ostrowski, Über einige Lösungen der Funktionalgleichung ψ(x) · ψ(y) = ψ(xy) (German),
Acta Math. 41 (1916), no. 1, 271–284, DOI 10.1007/BF02422947. http://link.springer.com/article/10.1007%2FBF02422947. MR1555153
[3] W. Stein, Introduction to Algebraic Number Theory, May 2005, http://wstein.org/129-05/notes/129.pdf.
1917
Introduction
Marston Morse (1892–1977) was inspired by the work of Jacques Hadamard
(1865–1963), Henri Poincaré (1854–1912), and his advisor George Birkhoff (1884–
1944). In choosing a topic for his thesis, he wished to combine the fields of analysis
and geometry, a theme that continued throughout his life’s work. An entire branch
of mathematics, Morse theory, is named after him.
The shortest distance between two points in a plane is a straight line, and
straight lines have constant slope. Now consider two points on a surface. The
analogue for the straight line is a curve called a geodesic. The analogue for constant
slope is that the tangent vectors to the curve remain parallel as they are transported
along the curve. For example, on a sphere the geodesic between two points is the
arc of the great circle going through them; see Figure 1. Morse often focused on
surfaces with negative curvature, such as the “pair of pants” in Figure 2(a). In
his 1917 thesis he proved the existence of certain types of nonperiodic geodesics on
surfaces of negative curvature; for more information, see Morse’s article [2].
On a less happy note, 1917 was the year when Georg Cantor (1845–1918)
entered the sanatorium in which he ultimately died. We have a lot of things to say
about the work of Cantor, so much so that he has snuck into this entry even though
the entire 1918 entry is devoted to him!
22 1917. MORSE THEORY, BUT REALLY CANTOR
Figure 2. (a) A topological pair of pants. (b) Add a pair of pants to each “leghole” and continue indefinitely.
\[
\begin{aligned}
C_0 &= [0, 1],\\
C_1 &= [0, \tfrac13] \cup [\tfrac23, 1],\\
C_2 &= [0, \tfrac19] \cup [\tfrac29, \tfrac13] \cup [\tfrac23, \tfrac79] \cup [\tfrac89, 1],
\end{aligned}
\]
and so forth. For n ∈ N, the set Cn is obtained from Cn−1 by removing the middle
third of every closed interval contained in Cn−1 ; see Figure 3. The Cantor set is
\[ C = \bigcap_{n=0}^{\infty} C_n, \]
What remains behind, namely C, has Lebesgue measure zero; that is, it has zero
length. The Cantor set is a fractal, a set that demonstrates self-similarity: it con-
tains infinitely many scaled copies of itself. Moreover, C is a compact, uncountable
(see the 1918 and 1999 entries), nowhere dense, and totally disconnected subset of
[0, 1].
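The construction of the sets Cn is easy to carry out exactly. The following Python sketch (function names are illustrative, not from the text) builds the intervals of Cn with exact rational endpoints and reports their total length, which is $(2/3)^n$ and therefore tends to zero, in line with the statement that C has zero length.

```python
from fractions import Fraction


def cantor_step(intervals):
    """Remove the open middle third from each closed interval (a, b)."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result


def cantor_intervals(n):
    """Closed intervals whose union is C_n, starting from C_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals


for n in range(4):
    c_n = cantor_intervals(n)
    total_length = sum(b - a for a, b in c_n)
    print(n, len(c_n), total_length)
# 0 1 1
# 1 2 2/3
# 2 4 4/9
# 3 8 8/27
```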
Two metric spaces (X, dX ) and (Y, dY ) are homeomorphic if there is a continu-
ous bijection f : X → Y whose inverse f −1 : Y → X is continuous; the functions f
1917: Comments
The Brunn–Minkowski theorem and Cantor dust. It seems appropriate
to spend a few pages discussing some little-known, but extremely interesting, prop-
erties of the Cantor set. Much of this can be found in [3]. A famous result that
combines arithmetic properties of sets with topological and measure-theoretic prop-
erties is the Brunn–Minkowski theorem. Let n ∈ N, let A and B be two nonempty,
compact subsets of $\mathbb{R}^n$, and let
A + B = {a + b : a ∈ A, b ∈ B}.
Then
\[ m(A)^{1/n} + m(B)^{1/n} \le m(A + B)^{1/n}, \]
1 Think of m(S) as the “length” of S. That is, until you read the notes to the 1924 entry!
2 In a complete metric space, the intersection of a sequence A1 ⊇ A2 ⊇ · · · of nested compact
Bibliography
[1] H. M. Morse, A One-to-One Representation of Geodesics on a Surface of Negative Curva-
ture, Amer. J. Math. 43 (1921), no. 1, 33–51, DOI 10.2307/2370306. http://www.jstor.org/
stable/2370306. MR1506428
[2] H. M. Morse, Recurrent geodesics on a surface of negative curvature, Trans. Amer. Math. Soc.
22 (1921), no. 1, 84–100, DOI 10.2307/1988844. http://www.ams.org/journals/tran/1921-
022-01/S0002-9947-1921-1501161-8/S0002-9947-1921-1501161-8.pdf. MR1501161
[3] C. C. Pugh, Real mathematical analysis, 2nd ed., Undergraduate Texts in Mathematics,
Springer, Cham, 2015. MR3380933
[4] M. Spivak, A comprehensive introduction to differential geometry. Vol. One, Published by M.
Spivak, Brandeis Univ., Waltham, Mass., 1970. MR0267467
1918
Georg Cantor
Introduction
It is known that there are an infinite number of worlds, simply because
there is an infinite amount of space for them to be in. However, not
every one of them is inhabited. Therefore, there must be a finite
number of inhabited worlds. Any finite number divided by infinity is
as near to nothing as makes no odds, so the average population of all
the planets in the Universe can be said to be zero. From this it follows
that the population of the whole Universe is also zero, and that any
people you may meet from time to time are merely the products of a
deranged imagination. [1]
While we hold Douglas Adams (1952–2001), author of the famed Hitchhiker’s
Guide to the Galaxy “trilogy,” in the highest regard, “this argument isn’t worth a
pair of fetid dingo’s kidneys.” Find at least three things wrong with his argument!
The most influential mathematician to study infinity was Georg Cantor. His
work was so mind-blowing that he even managed to appropriate territory in our
1917 entry. Before getting into Cantor’s theory of cardinality and some of its jaw-
dropping consequences, let us first warm up with a few infinity-related paradoxes.
Imagine that every second, you are given two numbers that you add to your
(initially empty) collection. The first pair is 1, 2, the second pair is 3, 4, and so on.
After receiving each pair of numbers, you must discard exactly one number from
your collection. Let us examine two strategies for handling this situation.
(a) Every time you receive a pair of numbers, you discard the odd one. Thus, the
number 2n arrives in round n and remains in your collection in all successive
rounds. You are eventually left with the infinite set {2, 4, 6, . . .}.
(b) Every time you receive a pair of numbers, you discard the lowest number in
your collection. Thus, the natural number n arrives in round $\lfloor (n+1)/2 \rfloor$ and is
removed in round n. You are eventually left with the empty set ∅.
In both scenarios, you discard exactly one number in each round. How can they lead
to two such different outcomes?
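A finite simulation helps locate the source of the paradox. The sketch below (the function `run` and its strategy labels are hypothetical names) plays both strategies for a fixed number of rounds. After any finite number of rounds the two collections have the same size; the difference only appears in the limit, because under strategy (b) every number n is removed at round n, while under strategy (a) every even number stays forever.

```python
def run(strategy: str, rounds: int) -> set:
    """Play the game for the given number of rounds; return the collection."""
    collection = set()
    for r in range(1, rounds + 1):
        # Receive the r-th pair of numbers.
        collection.update({2 * r - 1, 2 * r})
        if strategy == "a":
            # Strategy (a): discard the odd number just received.
            collection.remove(2 * r - 1)
        else:
            # Strategy (b): discard the lowest number currently held.
            collection.remove(min(collection))
    return collection


after_a = run("a", 10)
after_b = run("b", 10)
print(sorted(after_a))  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(sorted(after_b))  # [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
```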
This next paradox is from the final book of Galileo Galilei (1564–1642), Dis-
courses and Mathematical Demonstrations Relating to Two New Sciences (1638).
Let S = {1, 4, 9, 16, . . .} denote the set of perfect squares. Galileo’s paradox is the
apparent contradiction that although S is “much smaller than N,” the function
$n \mapsto n^2$ exhibits a one-to-one correspondence between S and N. How can this be?
A function f : A → B is injective (one-to-one) if “distinct inputs are sent to
distinct outputs”, that is, if f (a) = f (a′) implies a = a′. We say that f : A → B is
surjective (onto) if every element of B is “hit by f ”, that is, if f (A) = B. Two sets
A and B are equinumerous if there is a one-to-one and onto function f : A → B.
Figure 1. (a) Injective and surjective. (b) Not injective and not surjective. (c) Surjective and not injective. (d) Injective and not surjective.
Such a function is called a bijection; see Figure 1. This relationship between A and
B is denoted A ≅ B; we also say that A and B have the same cardinality.
One of the most important properties of the symbol ≅ is that it is an equivalence
relation. In other words, it “behaves like an equal sign” in the sense that it is
reflexive (A ≅ A), symmetric (A ≅ B implies that B ≅ A), and transitive (A ≅ B
and B ≅ C implies that A ≅ C). Can you prove this?
We say that A is finite if A = ∅ or A ≅ {1, 2, . . . , n} for some n ∈ N. For finite
sets, A ≅ B just means that “A and B have the same number of elements.”
We say that A is infinite if A is not finite, and countable if A is finite or A ≅ N.
In fact, A is infinite if and only if it has a proper subset B such that A ≅ B. For
instance, Galileo noted that S is a proper subset of N and S ≅ N.
An infinite set A is countable if and only if its elements can be enumerated
a1 , a2 , a3 , . . . without repetition. Indeed, if A is so enumerable, then the function
f : N → A defined by f (n) = an is a bijection. Conversely, each bijection f : N → A
gives rise to an enumeration f (1), f (2), f (3), . . . of A.
Even though Z = {. . . , −2, −1, 0, 1, 2, . . .} has ellipses going in two directions,
it is countable since 0, 1, −1, 2, −2, 3, −3, 4, −4, . . . is an enumeration of Z. In fact,
an explicit bijection f : N → Z is
\[
f(n) =
\begin{cases}
\dfrac{n}{2} & \text{if $n$ is even},\\[6pt]
-\dfrac{n-1}{2} & \text{if $n$ is odd}.
\end{cases}
\]
Can you use a similar idea to prove that the union of two countable sets is countable?
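The enumeration 0, 1, −1, 2, −2, . . . can be checked directly. This short Python sketch implements the map f described above, with N = {1, 2, 3, . . .}, together with an inverse; the name `f_inverse` is introduced here for illustration only.

```python
def f(n: int) -> int:
    """The bijection from N = {1, 2, 3, ...} onto Z described above."""
    if n % 2 == 0:
        return n // 2
    return -((n - 1) // 2)


def f_inverse(z: int) -> int:
    """Position of the integer z in the enumeration 0, 1, -1, 2, -2, ..."""
    return 2 * z if z > 0 else 1 - 2 * z


print([f(n) for n in range(1, 10)])  # [0, 1, -1, 2, -2, 3, -3, 4, -4]
print(all(f(f_inverse(z)) == z for z in range(-50, 51)))  # True
```

Interleaving two enumerations in the same alternating fashion is one way to approach the question about unions.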
Let f : N → [0, 1) be any function. Write the decimal expansions of f (1), f (2), f (3), . . . in an
array
\[
\begin{array}{l}
f(1) = 0.d_{11}\, d_{12}\, d_{13}\, d_{14}\, d_{15} \ldots\\
f(2) = 0.d_{21}\, d_{22}\, d_{23}\, d_{24}\, d_{25} \ldots\\
f(3) = 0.d_{31}\, d_{32}\, d_{33}\, d_{34}\, d_{35} \ldots\\
f(4) = 0.d_{41}\, d_{42}\, d_{43}\, d_{44}\, d_{45} \ldots\\
f(5) = 0.d_{51}\, d_{52}\, d_{53}\, d_{54}\, d_{55} \ldots\\
\qquad\vdots
\end{array}
\]
and consider the number c = 0.c1 c2 c3 . . . ∈ [0, 1), in which
\[
c_n =
\begin{cases}
4 & \text{if } d_{nn} \neq 4,\\
7 & \text{if } d_{nn} = 4.
\end{cases}
\tag{1918.1}
\]
For each n = 1, 2, . . ., the nth digit of c differs from the nth digit of f (n). Since
c ≠ f (n) for any n, the function f : N → [0, 1) is not a surjection.1
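The diagonal construction is completely explicit, as a small Python sketch shows. It applies rule (1918.1) to a finite list of digit strings; the sample rows are made up for illustration, and a finite list can of course only illustrate the first few digits of c.

```python
def diagonal(rows):
    """Apply rule (1918.1): digit n of c is 7 if d_nn is 4, and 4 otherwise.

    rows[n] holds the decimal digits of the (n + 1)st listed number and
    must contain at least n + 1 digits.
    """
    return "".join("7" if row[n] == "4" else "4" for n, row in enumerate(rows))


# Hypothetical first five entries of a list f(1), f(2), ... (digits only).
rows = ["1415926535", "4444444444", "3333333333", "7182818284", "0000400000"]
c = diagonal(rows)
print("0." + c)  # 0.47447

# The nth digit of c differs from the nth digit of the nth row.
print(all(c[n] != rows[n][n] for n in range(len(rows))))  # True
```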
A shocking consequence of the uncountability of R is that there are many more
irrational numbers than rational numbers. Indeed, if the set $\mathbb{Q}^c$ of irrational
numbers were countable, then $\mathbb{R} = \mathbb{Q} \cup \mathbb{Q}^c$ would be the union of two countable
sets and hence be countable, which is not the case.
An algebraic number is a complex number that is a root of a polynomial with
integer coefficients. The set A of all algebraic numbers includes the rationals and
1 Why can this argument not be used to prove that Q is uncountable? Even if each
f (1), f (2), . . . is rational, there is no way to guarantee that c has an eventually repeating dec-
imal expansion (a real number is rational if and only if it has an eventually repeating decimal
expansion).
numbers such as
\[ 2^{1/3}, \qquad i = \sqrt{-1}, \qquad\text{and}\qquad \sqrt{\sqrt{3} + \sqrt{5}}; \tag{1918.2} \]
these are roots of
\[ x^3 - 2, \qquad x^2 + 1, \qquad\text{and}\qquad x^8 - 16x^4 + 4, \]
respectively. The degree of the integer polynomial with least degree for which an
algebraic number α is a root is called the degree of α. For instance, the numbers
(1918.2) have degrees 3, 2, and 8, respectively. One can show that the set of all
algebraic numbers is countable2 and hence most real numbers are transcendental
(not algebraic). For more information about transcendental numbers, see the 1935
and 1955 entries.
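A quick floating-point check confirms that each of the three polynomials has a root of the stated kind. This is a numerical sanity check only, not a proof of algebraicity or of the degrees; in particular, $\sqrt{\sqrt{3}+\sqrt{5}}$ is used below as one root of $x^8 - 16x^4 + 4$, and the dictionary labels are illustrative.

```python
import math

# Each entry pairs a candidate root with the integer polynomial it should satisfy.
candidates = {
    "2^(1/3)": (2 ** (1 / 3), lambda x: x ** 3 - 2),
    "i": (1j, lambda x: x ** 2 + 1),
    "sqrt(sqrt(3) + sqrt(5))": (
        math.sqrt(math.sqrt(3) + math.sqrt(5)),
        lambda x: x ** 8 - 16 * x ** 4 + 4,
    ),
}

residuals = {name: abs(poly(alpha)) for name, (alpha, poly) in candidates.items()}
for name, residual in residuals.items():
    # Each residual should be zero up to rounding error.
    print(name, residual)
```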
If all of this has not blown your mind, then maybe our next major revelation
will. If A is a set, then the powerset P(A) of A is the set of all subsets of A. For
example, if A = {a, b, c}, then
\[ P(A) = \bigl\{ \emptyset, \{a\}, \{b\}, \{c\}, \{a, b\}, \{b, c\}, \{a, c\}, A \bigr\}. \]
Cantor’s powerset theorem asserts that if S is any set, then there does not exist
a surjection (let alone a bijection) f : S → P(S). Since $s \mapsto \{s\}$ furnishes an
injection from S into P(S) (so P(S) is “at least as big as S”), Cantor’s theorem
tells us that “P(S) is of a strictly larger cardinality than S.” Starting with S = N
and iterating the preceding result reveals that there are “infinitely many levels of
infinity”! If that did not blow your mind, then please consider a pan galactic gargle
blaster3 or two.
Here is the proof of Cantor’s theorem. Suppose toward a contradiction that
f : S → P(S) is a surjection. For each x ∈ S, we have f (x) ⊆ S and hence either
x ∈ f (x) or x ∉ f (x). Let
E = {x ∈ S : x ∉ f (x)}.
Since f is a surjection, there exists a z ∈ S such that f (z) = E. However,
z ∈ E ⇐⇒ z ∉ f (z) ⇐⇒ z ∉ E;
the first equivalence is from the definition of E, the second since f (z) = E. This
contradiction shows that no such f exists.
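For a small finite set, the proof can be watched in action. The Python sketch below (helper names are hypothetical) enumerates all 512 functions f from S = {0, 1, 2} to P(S) and confirms that the set E built in the proof is never a value of f, so none of these functions is surjective.

```python
from itertools import combinations, product


def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [
        frozenset(chosen)
        for size in range(len(items) + 1)
        for chosen in combinations(items, size)
    ]


def diagonal_set(f, s):
    """E = {x in S : x not in f(x)} for a map f given as a dict from S to P(S)."""
    return frozenset(x for x in s if x not in f[x])


S = {0, 1, 2}
subsets = powerset(S)

e_is_always_missed = True
for images in product(subsets, repeat=len(S)):
    f = dict(zip(sorted(S), images))
    if diagonal_set(f, S) in f.values():
        e_is_always_missed = False

print(len(subsets), e_is_always_missed)  # 8 True
```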
tions taking on only transcendental values. This has been moved to the 1955 entry.
1918: Comments
A common misconception. Cantor’s diagonal method is often described as
existential and nonconstructive. This is incorrect, since it can be used to produce
a real number that is not on the given list. For instance, when Cantor’s method
is applied to a list of all algebraic numbers, in some specified order, it produces
the digits of a transcendental number. For more on these issues, see the excellent
article by Gray [4].
Solution to the problem. There are many potential false starts to the problem.
If your solution involves the word “next” or “first”, then it is probably incorrect!
Since N ≅ Q, we may as well replace N with Q to see if that makes things
easier. For each x ∈ R, let
Ax = {q ∈ Q : q < x}.
The function f : R → P(Q) defined by f (x) = Ax is an injection since the density
of Q in R implies that x < y if and only if Ax ⊊ Ay . It follows that the collection
f (R) = {Ax : x ∈ R}
is uncountable and linearly ordered by ⊆. Now let g : N → Q be a bijection and let
Bx = {n ∈ N : g(n) < x}.
The collection
C = {Bx : x ∈ R}
is an uncountable chain of subsets of N.
Bibliography
[1] D. Adams, The Restaurant at the End of the Universe, Pan Books, 1980.
[2] G. Cantor, Ueber eine Eigenschaft des Inbegriffs aller reellen algebraischen Zahlen (German),
J. Reine Angew. Math. 77 (1874), 258–262, DOI 10.1515/crll.1874.77.258. MR1579605
[3] G. Cantor, Über eine elementare Frage der Mannigfaltigkeitslehre, Jahresbericht der
Deutschen Mathematiker-Vereinigung 1 (1891), 75–78.
[4] R. Gray, Georg Cantor and transcendental numbers, Amer. Math. Monthly 101 (1994), no. 9,
819–832, DOI 10.2307/2975129. http://www.jstor.org/stable/2975129. MR1300488
[5] J. J. O’Connor and E. F. Robertson, Georg Ferdinand Ludwig Philipp Cantor, MacTutor
History of Mathematics, http://www-history.mcs.st-andrews.ac.uk/Biographies/Cantor.html.
1919
Brun’s Theorem
Introduction
One of the most tantalizing conjectures in number theory is the twin prime
conjecture. It asserts that there are infinitely many pairs of primes that differ by
2; a prime in such a pair is a twin prime. Examples of such pairs are 5 and 7,
29 and 31, and $2{,}996{,}863{,}034{,}895 \times 2^{1{,}290{,}000} \pm 1$ (the numbers in the last example
have 388,342 digits when fully written out [11]). More generally, given any even
k ∈ N, are there infinitely many pairs of primes whose elements differ by k? This
is Polignac’s conjecture.
Although both conjectures remain open, there has been remarkable progress
over the past 100 years, culminating in the 2013 proof of Yitang Zhang (1955– )
that there is some even number k ≤ 70,000,000 such that infinitely many pairs
of primes differ by k. This result has been improved and generalized by many
authors, especially James Maynard (1987– ), Terence Tao, and the Polymath8
project [2, 8, 9]. It is now known that there are infinitely many pairs of primes that
differ by at most 246.
One of the earliest results in the field is due to Viggo Brun (1885–1978), who
proved in 1919 that the sum of the reciprocals of the twin primes converges. Com-
pare this to Euler’s result that the sum of the reciprocals of the primes diverges
(see p. 4). Thus, in a qualitative sense, the twin primes are far more sparse than
the primes. The value of Brun’s sum,
\[ B = \Bigl(\frac13 + \frac15\Bigr) + \Bigl(\frac15 + \frac17\Bigr) + \Bigl(\frac1{11} + \frac1{13}\Bigr) + \cdots, \tag{1919.1} \]
which is at least 1.83 and less than 2.347 [4], is Brun’s constant. The search for
a good approximation to it led Thomas Nicely of Lynchburg College to discover a
floating-point arithmetic error in Intel’s Pentium processor [6]. This led to a $475
million loss for Intel, demonstrating the power of pure mathematics!
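Partial sums of Brun's series are simple to compute, although they converge very slowly. The following Python sketch uses a basic sieve of Eratosthenes; it is an illustration with hypothetical function names, not the method Nicely used, and plain floating-point summation gives only a few reliable digits.

```python
from math import isqrt


def prime_flags(n):
    """Sieve of Eratosthenes: flags[i] is 1 exactly when i is prime (0 <= i <= n)."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def brun_partial_sum(limit):
    """Sum of 1/p + 1/(p + 2) over twin prime pairs (p, p + 2) with p + 2 <= limit."""
    flags = prime_flags(limit)
    total = 0.0
    for p in range(3, limit - 1):
        if flags[p] and flags[p + 2]:
            total += 1 / p + 1 / (p + 2)
    return total


for limit in (10 ** 2, 10 ** 4, 10 ** 6):
    print(limit, brun_partial_sum(limit))
```

The partial sums creep upward toward Brun's constant; even with the limit at one million they remain below the lower bound 1.83 quoted above.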
Unfortunately, the convergence of Brun’s series does not resolve the twin prime
conjecture since there are many infinite collections of natural numbers that have
convergent reciprocal sums. The perfect squares are an example, since
\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \tag{1919.2} \]
as Euler showed in 1734 (see the notes for a derivation). Since any finite sum of
rational numbers is rational, if one could show that Brun’s constant were irrational,
then one would have a proof of the twin prime conjecture!
In what follows, log x denotes the base-e logarithm of x. We say that functions
f and g are asymptotically equivalent, denoted f ∼ g, if limx→∞ f (x)/g(x) = 1.
Let π(x) denote the number of primes at most x. For example, π(10.5) = 4 since
2, 3, 5, 7 are the primes at most 10.5. Similarly, we let π2 (x) denote the number of twin primes at most x.
The celebrated prime number theorem states that π(x) ∼ x/ log x; that is,
\[ \lim_{x \to \infty} \frac{\pi(x)}{x/\log x} = 1; \]
see the 1933 and 1948 entries. Consequently, for any C > 1
\[ \pi(x) \le \frac{Cx}{\log x} \]
for sufficiently large x. In contrast, one can show that there is a constant D > 0
such that
\[ \pi_2(x) \le \frac{Dx}{(\log x)^2} \]
for sufficiently large x. The smallest constant known to work here is D = 4.5 [12].
A refinement of the twin prime conjecture is the Hardy–Littlewood conjecture
(twin primes), which suggests that $x/(\log x)^2$ is the appropriate benchmark function
for the twin primes. The conjecture is
\[ \pi_2(x) \sim 2C_2 \int_2^x \frac{dt}{(\log t)^2}, \tag{1919.3} \]
in which
\[ C_2 = \prod_{p \ge 3} \frac{p(p-2)}{(p-1)^2} = 0.660161815\ldots \tag{1919.4} \]
is the twin primes constant [3]; see Figure 1. A simpler expression that is asymptot-
ically equivalent to (1919.3) is 2C2 x/(log x)2 . See the comments for the 2005 entry
for information about the Bateman–Horn conjecture, a wide-ranging generalization
of the Hardy–Littlewood conjecture.
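The constant (1919.4) and the benchmark $2C_2 x/(\log x)^2$ can be explored numerically. The sketch below makes two assumptions that are not fixed by the text: it counts twin prime pairs (p, p + 2) with p + 2 ≤ x, and it truncates the infinite product at the primes up to 100,000. All function names are illustrative.

```python
from math import isqrt, log


def prime_flags(n):
    """Sieve of Eratosthenes: flags[i] is 1 exactly when i is prime (0 <= i <= n)."""
    flags = bytearray([1]) * (n + 1)
    flags[0:2] = b"\x00\x00"
    for i in range(2, isqrt(n) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return flags


def twin_pair_count(x, flags):
    """Number of pairs (p, p + 2) of primes with p + 2 <= x."""
    return sum(1 for p in range(3, x - 1) if flags[p] and flags[p + 2])


def twin_prime_constant(limit, flags):
    """Product of p(p - 2)/(p - 1)^2 over odd primes p <= limit (a truncation of C_2)."""
    c2 = 1.0
    for p in range(3, limit + 1):
        if flags[p]:
            c2 *= p * (p - 2) / (p - 1) ** 2
    return c2


X = 10 ** 5
flags = prime_flags(X)
c2 = twin_prime_constant(X, flags)
print(c2)  # close to 0.6601618...
for x in (10 ** 3, 10 ** 4, 10 ** 5):
    # Observed count of twin prime pairs next to the simple benchmark.
    print(x, twin_pair_count(x, flags), 2 * c2 * x / log(x) ** 2)
```

At this range the simple benchmark undercounts noticeably; the integral in (1919.3) is the more accurate of the two asymptotically equivalent expressions.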
1919: Comments
The Basel problem. In 1644, Pietro Mengoli (1626–1686) posed the famous
Basel problem: evaluate
\[ 1 + \frac14 + \frac19 + \frac1{16} + \frac1{25} + \cdots. \]
This was solved by Euler in 1734, who provided the formula (1919.2); the problem
is named after his hometown. There are now dozens of proofs of Euler’s result. We
present a 2015 proof by Samuel G. Moreno, which is also one of the shortest [5].
It simplifies an earlier argument of Eberhard L. Stark [10]. We require the mean
value theorem for integrals: if f : [a, b] → R is continuous and g : [a, b] → R is
Riemann integrable and nonnegative, then there is a c ∈ (a, b) so that
\[ \int_a^b f(x) g(x)\,dx = f(c) \int_a^b g(x)\,dx. \]
We start by proving the well-known formula
\[ \frac12 + \sum_{k=1}^{n} \cos kx = \frac{\sin\bigl((n + \tfrac12)x\bigr)}{2 \sin \tfrac{x}{2}}. \tag{1919.7} \]
Convert the sum on the left-hand side of (1919.7) into a sum of complex exponen-
tials, use the finite geometric series summation formula:
\[ 1 + r + \cdots + r^n = \frac{1 - r^{n+1}}{1 - r}, \]
and appeal to the exponential representation of the sine function to complete the
proof of (1919.7).
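Now multiply both sides of (1919.7) by $x^2 - 2\pi x$ and integrate by parts over
[0, π] to obtain
\begin{align*}
-\frac{\pi^3}{3} + 2\pi \sum_{k=1}^{n} \frac{1}{k^2}
&= \int_0^\pi \underbrace{(x - 2\pi)\,\frac{x/2}{\sin(x/2)}}_{u}\; \underbrace{\sin\bigl((n + 1/2)x\bigr)\,dx}_{dv} \\
&= \Biggl[\, \underbrace{(x - 2\pi)\,\frac{x/2}{\sin(x/2)}}_{u}\; \underbrace{\frac{-\cos\bigl((n + 1/2)x\bigr)}{n + 1/2}}_{v} \,\Biggr]_0^\pi
 - \int_0^\pi \underbrace{\frac{-\cos\bigl((n + 1/2)x\bigr)}{n + 1/2}}_{v}\; \underbrace{\frac{du}{dx}\,dx}_{du} \\
&= \frac{-2\pi}{n + 1/2} + \frac{\cos\bigl((n + 1/2)\xi_n\bigr)}{n + 1/2} \int_0^\pi \frac{du}{dx}\,dx \qquad (\xi_n \in [0, \pi]) \\
&= \frac{-2\pi + \bigl(u(\pi) - u(0)\bigr) \cos\bigl((n + 1/2)\xi_n\bigr)}{n + 1/2} \\
&= \frac{-2\pi + (2\pi - \pi^2/2) \cos\bigl((n + 1/2)\xi_n\bigr)}{n + 1/2}.
\end{align*}
Let n → ∞ and obtain
\[ -\frac{\pi^3}{3} + 2\pi \sum_{k=1}^{\infty} \frac{1}{k^2} = \lim_{n \to \infty} \frac{-2\pi + (2\pi - \pi^2/2) \cos\bigl((n + 1/2)\xi_n\bigr)}{n + 1/2} = 0, \]
which rearranges to (1919.2).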
Solution to the second problem. Before tackling the first (and more diffi-
cult) problem, let us address the second: the series converges. One way to interpret
this result is: “most natural numbers have 9’s in them.” Big numbers have lots of
digits, and hence a high probability of having a 9 in them somewhere. Since most
numbers are “big,” we expect that the set (1919.6) omits most natural numbers.
Let us try to make this precise.
The sum of the terms with single-digit denominators is
\[ 1 + \frac12 + \frac13 + \cdots + \frac18 < 9 \cdot 1 = 9. \]
The sum of the terms with 2-digit denominators is
\[ \frac1{10} + \frac1{11} + \cdots + \frac1{88} < 9^2 \cdot \frac1{10}, \]
since there are $9^2$ ways of getting an ordered pair of digits from the set {0, 1, 2, 3, 4, 5,
6, 7, 8} and since $\tfrac{1}{10}$ is the largest summand in the group. Similarly, the sum of the
terms with 3-digit denominators is less than $9^3/10^2$, and so forth. Thus, the series
converges1 and
\[ \sum_{n \in A} \frac1n < 9\Bigl(1 + \frac{9}{10} + \Bigl(\frac{9}{10}\Bigr)^{2} + \cdots\Bigr) = \frac{9}{1 - 9/10} = 90. \]
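The block-by-block estimate can be compared with the actual sums. The Python sketch below (function names are illustrative) adds the reciprocals of the d-digit numbers containing no digit 9 and prints each block sum next to the bound $9^d/10^{d-1}$ used in the argument.

```python
def has_no_nine(n: int) -> bool:
    """True when the decimal expansion of n contains no digit 9."""
    return "9" not in str(n)


def block_sum(d: int) -> float:
    """Sum of 1/n over the d-digit natural numbers n with no digit 9."""
    return sum(1 / n for n in range(10 ** (d - 1), 10 ** d) if has_no_nine(n))


def block_bound(d: int) -> float:
    """The bound 9^d / 10^(d - 1) for the d-digit block."""
    return 9 ** d / 10 ** (d - 1)


for d in range(1, 6):
    print(d, block_sum(d), block_bound(d))
```

The bounds are far from sharp, which is why the estimate 90 is much larger than the true value of the sum.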
Solution to the first problem. Let us get back to the first problem. The
sum in (1919.5) can be written as
\[ S = \prod_{p \in P_{\mathrm{twin}}} (1 - 1/p)^{-1} = \prod_{p \in P_{\mathrm{twin}}} \Bigl(1 + \frac1p + \frac1{p^2} + \cdots\Bigr). \]
To see this, multiply the right-hand side of the preceding equation term-by-term
and use the fundamental theorem of arithmetic (this is permissible by Mertens’s
theorem; see the 1933 entry). Consequently,
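\begin{align*}
\log S = \sum_{p \in P_{\mathrm{twin}}} \log \frac{1}{1 - 1/p}
&= \sum_{p \in P_{\mathrm{twin}}} \Bigl(\frac1p + \frac1{2p^2} + \frac1{3p^3} + \cdots\Bigr) \tag{1919.8} \\
&= \sum_{p \in P_{\mathrm{twin}}} \frac1p + \frac12 \sum_{p \in P_{\mathrm{twin}}} \frac1{p^2} + \frac13 \sum_{p \in P_{\mathrm{twin}}} \frac1{p^3} + \cdots \\
&\le \sum_{p \in P_{\mathrm{twin}}} \frac1p + \frac1{2 \cdot 3} \sum_{p \in P_{\mathrm{twin}}} \frac1p + \frac1{3 \cdot 3^2} \sum_{p \in P_{\mathrm{twin}}} \frac1p + \cdots \\
&= \bigl(B - \tfrac15\bigr) \sum_{n=1}^{\infty} \frac{1}{n 3^{n-1}} = 3\bigl(B - \tfrac15\bigr) \log \tfrac32 < \infty,
\end{align*}
so the series that defines S converges. The appearance of $B - \tfrac15$ in place of Brun’s
constant is due to the fact that $\tfrac15$ occurs twice in the sum (1919.1) that defines B.
From (1919.8) we obtain $B - \tfrac15 < \log S$, so
\[ 5.10 \approx e^{B - \frac15} < S \le \Bigl(\frac32\Bigr)^{3(B - \frac15)} \approx 13.62, \]
since 1.83 ≤ B ≤ 2.347 [4].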
Bibliography
[1] V. Brun, La série 1/5 + 1/7 + 1/11 + 1/13 + 1/17 + 1/19 + 1/29 + 1/31 + 1/41 + 1/43
+ 1/59 + 1/61 + ..., où les dénominateurs sont nombres premiers jumeaux est convergente
ou finie, Bulletin des Sciences Mathématiques 43 (1919), 100-104, 124-128. http://gallica.bnf.fr/ark:/12148/bpt6k486270d.
[2] J. Maynard, Small gaps between primes, Ann. of Math. (2) 181 (2015), no. 1, 383–413, DOI
10.4007/annals.2015.181.1.7. MR3272929
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On the
expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–70, DOI
10.1007/BF02403921. MR1555183
[4] D. Klyve, Explicit bounds on twin primes and Brun’s Constant, Thesis (Ph.D.)–Dartmouth
College, ProQuest LLC, Ann Arbor, MI, 2007. MR2712414
[5] Samuel G. Moreno, A one-sentence and truly elementary proof of the Basel problem, http://arxiv.org/abs/1502.07667.
[6] T. Nicely, Pentium FDIV flaw (2011), http://www.trnicely.net/pentbug/pentbug.html.
1 To be more specific, $\sum_{n \in A} \frac1n$ is a series of positive terms for which the partial sums are
bounded above by 90. Thus, the monotone sequence property ensures that $\sum_{n \in A} \frac1n$ converges.
[7] T. R. Nicely, Enumeration to $10^{14}$ of the twin primes and Brun’s constant, Virginia J. Sci. 46
(1995), no. 3, 195–204. See also http://www.trnicely.net/twins/twins2.html. MR1401560
[8] D. H. J. Polymath, New equidistribution estimates of Zhang type, Algebra Number Theory
8 (2014), no. 9, 2067–2199, DOI 10.2140/ant.2014.8.2067. MR3294387
[9] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many
primes, Res. Math. Sci. 1 (2014), Art. 12, 83, DOI 10.1186/s40687-014-0012-7. MR3373710
[10] E. L. Stark, Application of a mean value theorem for integrals to series summation, Amer.
Math. Monthly 85 (1978), no. 6, 481–483, DOI 10.2307/2320072. MR0476932
[11] The Prime Pages, The List of Largest Known Primes Home Page, http://primes.utm.edu/primes/.
[12] J. Wu, Chen’s double sieve, Goldbach’s conjecture and the twin prime problem, Acta Arith.
114 (2004), no. 3, 215–273, DOI 10.4064/aa114-3-2. MR2071082
1920
Waring’s Problem
Introduction
Godfrey Harold Hardy and John Edensor Littlewood wrote a series of influential
papers concerning additive problems in number theory. The first paper in this
series, published in 1920, addressed Waring’s problem [1]. For each k ≥ 2, is there
an s = s(k) such that every natural number is a sum of at most s perfect kth powers?
This problem, posed in 1770 by Edward Waring (1736–1798), is closely related
to several other famous problems in number theory. In 1769, Euler suggested that
for k ≥ 3, it is impossible to write a kth power as a sum of fewer than k nonzero kth
powers. The case k = 3 is Fermat’s last theorem for the exponent 3, now known
to be true; see the 1995 entry. Euler’s conjecture was disproved in 1966 by Leon J.
Lander and Thomas R. Parkin, who showed that
\[ 27^5 + 84^5 + 110^5 + 133^5 = 144^5. \]
What about k = 4? In 1986, Noam Elkies (1966– ) constructed an infinite
sequence of counterexamples, the smallest of which is
\[ 2{,}682{,}440^4 + 15{,}365{,}639^4 + 18{,}796{,}760^4 = 20{,}615{,}673^4. \]
The smallest possible counterexample to Euler’s conjecture with k = 4 was provided
by Roger Frye in 1988:
\[ 95{,}800^4 + 217{,}519^4 + 414{,}560^4 = 422{,}481^4. \]
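Because Python integers have arbitrary precision, these counterexamples to Euler's conjecture can be verified exactly in a few lines. The helper name below is hypothetical.

```python
def is_counterexample(terms, total, k):
    """True when the kth powers of `terms` add up to the kth power of `total`."""
    return sum(t ** k for t in terms) == total ** k


examples = [
    ((27, 84, 110, 133), 144, 5),                      # Lander and Parkin, 1966
    ((2682440, 15365639, 18796760), 20615673, 4),      # Elkies, 1986
    ((95800, 217519, 414560), 422481, 4),              # Frye, 1988
]

for terms, total, k in examples:
    print(terms, total, k, is_counterexample(terms, total, k))  # True each time
```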
The case k = 2 of Waring’s problem was settled in 1770 by Joseph-Louis Lagrange
(1736–1813), who proved that every natural number is a sum of four squares. For
instance,
\[ 2 = 1^2 + 1^2 + 0^2 + 0^2 \qquad\text{and}\qquad 7 = 2^2 + 1^2 + 1^2 + 1^2. \]
A key ingredient in Lagrange’s proof is the four-square identity:
\begin{align*}
(a_1^2 + a_2^2 + a_3^2 + a_4^2)(b_1^2 + b_2^2 + b_3^2 + b_4^2)
&= (a_1 b_1 - a_2 b_2 - a_3 b_3 - a_4 b_4)^2 + (a_1 b_2 + a_2 b_1 + a_3 b_4 - a_4 b_3)^2 \\
&\quad + (a_1 b_3 - a_2 b_4 + a_3 b_1 + a_4 b_2)^2 + (a_1 b_4 + a_2 b_3 - a_3 b_2 + a_4 b_1)^2.
\end{align*}
This identity is not as “magical” as it seems; see the notes for a derivation.
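The four-square identity says that a product of two sums of four squares is again a sum of four squares, with explicit entries. The Python sketch below (function names are illustrative) computes those entries and uses them to turn the representations of 2 and 7 given above into a representation of 14.

```python
def four_square_product(a, b):
    """The four numbers on the right-hand side of the four-square identity."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (
        a1 * b1 - a2 * b2 - a3 * b3 - a4 * b4,
        a1 * b2 + a2 * b1 + a3 * b4 - a4 * b3,
        a1 * b3 - a2 * b4 + a3 * b1 + a4 * b2,
        a1 * b4 + a2 * b3 - a3 * b2 + a4 * b1,
    )


def sum_of_squares(v):
    return sum(t * t for t in v)


two = (1, 1, 0, 0)    # 2 = 1^2 + 1^2 + 0^2 + 0^2
seven = (2, 1, 1, 1)  # 7 = 2^2 + 1^2 + 1^2 + 1^2
fourteen = four_square_product(two, seven)
print(fourteen, sum_of_squares(fourteen))  # (1, 3, 0, 2) 14
```

This is how Lagrange's theorem reduces to the case of primes: once every prime is a sum of four squares, so is every product of primes.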
Do three squares suffice? No, because 7 cannot be written as a sum of three
squares (try it). Adrien-Marie Legendre (1752–1833) proved that a natural number can be represented as the
sum of three perfect squares if and only if it is not of the form $4^j(8k + 7)$. Thus,
every natural number at most 100 except for
7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95
can be written as a sum of three squares.
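The list of exceptions up to 100 can be reproduced by brute force and compared with the criterion $4^j(8k + 7)$. The following Python sketch uses hypothetical function names and a plain exhaustive search, which is adequate for such small numbers.

```python
from math import isqrt


def is_sum_of_three_squares(n: int) -> bool:
    """Exhaustive search for n = a^2 + b^2 + c^2 with a, b, c >= 0."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            c_squared = n - a * a - b * b
            if isqrt(c_squared) ** 2 == c_squared:
                return True
    return False


def has_excluded_form(n: int) -> bool:
    """True when n = 4^j (8k + 7) for some integers j, k >= 0."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


exceptions = [n for n in range(1, 101) if not is_sum_of_three_squares(n)]
print(exceptions)
# [7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95]
print(exceptions == [n for n in range(1, 101) if has_excluded_form(n)])  # True
```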
The finiteness of s(k) for all k ≥ 2 was not shown until the work of David
Hilbert (1862–1943) in 1909. For k = 1, 2, 3, 4, 5, 6, 7, the optimal values of s are
1, 4, 9, 19, 37, 73, 143. For instance, each positive integer is the sum of 19 fourth
powers, and there are some positive integers for which 18 fourth powers do not
suffice. For most values of k, we still do not know the optimal value of s.
Hilbert’s proof is existential; as originally stated it does not provide a bound on
how many kth powers are needed. This was remedied by Hardy and Littlewood in
their masterful paper, in which they further develop the circle method, introduced
by Hardy and Srinivasa Ramanujan (1887–1920) in 1916–1917 to analyze the par-
tition function; see the 1923 entry. This approach involves a delicate analysis of ex-
ponential sums, which we now sketch in the more modern trigonometric-polynomial
formulation.
If we attempt to write integers as a sum of kth powers, we might attempt to
use the generating function
\[ f(x) = \sum_{n=0}^{\infty} e^{2\pi i n^k x}, \]
in which $e^{ix} = \cos x + i \sin x$ (this is Euler’s formula). Unfortunately, the series
above does not converge since its terms are each of unit magnitude and hence do
not tend to zero.1 We can avoid convergence problems altogether by considering
the truncated sum
\[ f_N(x) = \sum_{n=0}^{N} e^{2\pi i n^k x}. \]
There is now a free parameter N involved; we choose N to be the number we are
trying to represent as a sum of kth powers. There is no danger in doing so since if
we are trying to express N as a sum of kth powers, none of the summands can be
larger than N .
The great insight is to consider
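\begin{align*}
f_N(x)^s &= \Bigl( \sum_{n=0}^{N} e^{2\pi i n^k x} \Bigr)^{s} \tag{1920.1} \\
&= \sum_{n_1=0}^{N} e^{2\pi i n_1^k x} \cdots \sum_{n_s=0}^{N} e^{2\pi i n_s^k x} \\
&= \sum_{0 \le n_1, \ldots, n_s \le N} e^{2\pi i (n_1^k + \cdots + n_s^k) x} \\
&= \sum_{m=0}^{sN^k} a(m; s, N)\, e^{2\pi i m x}, \tag{1920.2}
\end{align*}
in which a(m; s, N ) is the number of ways of writing $m = n_1^k + \cdots + n_s^k$ with each
$n_j \le N$. This can be deduced by expanding the product (1920.1) and collecting
terms involving $e^{2\pi i m x}$ for each m.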
1 The terms of a series that converges must tend to zero. However, a series whose terms tend
to zero need not converge.
To solve Waring’s problem, we just need to show that if s is sufficiently large,
then a(N ; s, N ) ≠ 0 for all N . Fortunately, we can isolate a(N ; s, N ) in (1920.2),
which is the number of ways of writing N as a sum of exactly s perfect kth powers:
\[ a(N; s, N) = \int_0^1 f_N(x)^s\, e^{-2\pi i N x}\, dx. \tag{1920.3} \]
This is because
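\begin{align*}
\int_0^1 f_N(x)^s\, e^{-2\pi i N x}\, dx
&= \int_0^1 \sum_{m=0}^{sN^k} a(m; s, N)\, e^{2\pi i m x}\, e^{-2\pi i N x}\, dx \\
&= \sum_{m=0}^{sN^k} a(m; s, N) \int_0^1 e^{2\pi i (m - N) x}\, dx \\
&= \sum_{m=0}^{sN^k} a(m; s, N) \int_0^1 \bigl[ \cos\bigl(2\pi (m - N) x\bigr) + i \sin\bigl(2\pi (m - N) x\bigr) \bigr]\, dx.
\end{align*}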
The integrals of the sine terms are all zero; the cosine integral vanishes unless
m − N = 0, in which case the integral is 1. Thus, the preceding sum collapses to
(1920.3).
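The orthogonality argument behind (1920.3) can be tested on small cases. The sketch below is an illustration under one stated substitution: the integral over [0, 1] is replaced by an average over M equally spaced sample points, which is exact here because $f_N(x)^s e^{-2\pi i N x}$ is a trigonometric polynomial whose frequencies are all smaller than M in absolute value. Representations are counted as ordered s-tuples with entries from 0 to N, and the function names are hypothetical.

```python
import cmath
from itertools import product


def f_N(x: float, N: int, k: int) -> complex:
    """Truncated exponential sum: sum of exp(2 pi i n^k x) for 0 <= n <= N."""
    return sum(cmath.exp(2j * cmath.pi * n ** k * x) for n in range(N + 1))


def a_by_integral(N: int, s: int, k: int) -> int:
    """Evaluate (1920.3) by averaging over M equally spaced points in [0, 1)."""
    M = s * N ** k + 1  # exceeds |m - N| for every frequency m in (1920.2)
    total = sum(
        f_N(j / M, N, k) ** s * cmath.exp(-2j * cmath.pi * N * j / M)
        for j in range(M)
    )
    return round((total / M).real)


def a_by_counting(N: int, s: int, k: int) -> int:
    """Count ordered tuples (n_1, ..., n_s) in {0, ..., N}^s with sum of kth powers N."""
    return sum(
        1
        for ns in product(range(N + 1), repeat=s)
        if sum(n ** k for n in ns) == N
    )


print(a_by_integral(7, 4, 2), a_by_counting(7, 4, 2))  # 4 4
print(a_by_integral(7, 3, 2), a_by_counting(7, 3, 2))  # 0 0
```

The second line reflects the fact that 7 is not a sum of three squares, while the first counts the four orderings of 7 = 2² + 1² + 1² + 1².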
We must still show that the integral (1920.3) is nonzero. This is done by
splitting the domain of integration into two pieces, the set M of major arcs (where
the integrand is large) and the set m of minor arcs (where the integrand is small).
The terms “arc” and “circle method” originate with Hardy and Littlewood, who
formulated the method in terms of power series on the unit disk in the complex
plane. The modern treatment recasts things in terms of truncated exponential
sums and the wrapped interval [0, 1). Since fN (x) is highly oscillatory, we expect
it to exhibit massive amounts of cancellation for most values of x. For which x is
there strong reinforcement? It turns out that if x = a/q is a rational number whose
denominator q is small relative to N , then fN (x) is large. If one can show that the
integrals over M and m are of different orders of magnitude as N → ∞, we win.
See [6] for details of this calculation and [5] for a general introduction to the circle
method.
1920: Comments
The four-square identity. Before giving the solution to the problem, let us
return to the four-square identity. Let
z1 = a1 + ia2 , z2 = a3 + ia4 , w1 = b1 + ib2 , and w2 = b3 + ib4 .
Solution to the problem. Here is a quick sketch of the solution; for more
details see [6]. The idea is to exploit the fact that both sums and differences
are allowed. This allows us to overshoot our target number, then fall back down
through subtractions. For f : N → N, let
(Δf )(x) = f (x + 1) − f (x),
(Δ(2) f )(x) = (Δ(Δf ))(x), and so forth. Induction confirms that
\[ (\Delta^{(r)} f)(x) = \sum_{\ell=0}^{r} (-1)^{r-\ell} \binom{r}{\ell} f(x + \ell) \]
kth powers. Given N , write $N - d_k = k!\,z + w$, in which |w| ≤ k!/2. Since $1 = 1^k$, we
see that w is at worst the sum or difference of at most $\tfrac12 k!$ kth powers. Consequently,
N is a sum and difference of at most $2^{k-1} + \tfrac12 k!$ kth powers. What is the optimal
number of summands?
Bibliography
[1] G. H. Hardy and J. E. Littlewood, Some problems of “Partitio numerorum”, I: A new solu-
tion of Waring’s problem, Nachrichten von der Gesellschaft der Wissenschaften zu Göttingen,
Mathematisch-Physikalische Klasse (1920), 33-54. https://eudml.org/doc/59073.
[2] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio Numerorum’: IV. The singular
series in Waring’s Problem and the value of the number G(k), Math. Z. 12 (1922), no. 1, 161–
188, DOI 10.1007/BF01482074. http://link.springer.com/article/10.1007%2FBF01482074.
MR1544511
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’ (VI): Further
researches in Waring’s Problem, Math. Z. 23 (1925), no. 1, 1–37, DOI 10.1007/BF01506218.
http://link.springer.com/article/10.1007%2FBF01506218. MR1544728
1921
Mordell’s Theorem
Introduction
An elliptic curve is a plane curve E determined by an equation of the form
\[ y^2 = x^3 + ax + b, \tag{1921.1} \]
in which a and b are fixed integers and the discriminant
\[ \Delta = -16(4a^3 + 27b^2) \]
is nonzero. The nonvanishing of the discriminant ensures that E has no cusps, self-
intersections, or isolated points; see Figure 1. Elliptic curves have many fascinating
properties; we can barely scratch the surface of this important topic.
[Figure: two elliptic curves in the (x, y)-plane, each showing points P and Q together with their sum P + Q.]
rank of E(Q) and the group T is the torsion subgroup of E(Q). Barry Mazur
(1937– ) proved that T must be of the form Z/nZ for 1 ≤ n ≤ 10 or n = 12, or Z/2Z × Z/2nZ
for n ∈ {1, 2, 3, 4} [2, 3]. Moreover, each of these groups occurs infinitely often as
the torsion subgroup of an elliptic curve.
It is possible for r = 0 to occur. In fact, it is conjectured that 50% of elliptic
curves have rank 0 and 50% have rank 1 (in a probabilistic sense that we cannot
make precise here). For instance, one can show that the elliptic curve E defined by
$y^2 = x^3 - x$ has only four rational points:
\[ E(\mathbb{Q}) = \bigl\{ (0, 0),\ (1, 0),\ (-1, 0),\ \infty \bigr\}; \]
see [7, Ex. 1.1]. In this case, rank E = 0 and E(Q) is isomorphic to Z/2Z × Z/2Z.
On the other hand, the elliptic curve with the largest known rank is
y^2 + xy + y
= x^3 − 120039822036992245303534619191166796374x
+ 504224992484910670010801799168082726759443756222911415116.
This can be put in “standard form” (1921.1) with a change of variables, although
the coefficients become even larger. The rank of this curve is at least 24; the actual
rank is unknown (it is suspected to be exactly 24).
In addition to being of theoretical interest, elliptic curves play an important
role in cryptography, factorization algorithms, and primality testing; see [10] and
the references in [7]. Their group structure is far richer than the group structure
arising in the multiplicative groups (Z/pqZ)^×, in which p and q are distinct primes,
that are used in RSA (see the entry for 1977).
1921: Comments
The Birch and Swinnerton-Dyer conjecture. The celebrated Birch and
Swinnerton-Dyer conjecture, one of the Clay Millennium Problems (see the com-
ments for the 2000 entry), concerns the rank of elliptic curves. The 2014 Fields
Medal of Manjul Bhargava (1974– ) was awarded in part for work related to this
problem. The conjecture states that the geometric rank of an elliptic curve equals
its analytic rank. What does this mean?
We may consider the equation (1921.1) modulo some prime p; for technical
reasons, p should not divide Δ. The number of points on the elliptic curve modulo
p is
N_p = 1 + \#\{(x, y) \in (\mathbb{Z}/p\mathbb{Z})^2 : y^2 = x^3 + ax + b\},
in which the +1 is added for the “point at infinity.” Helmut Hasse (1898–1979)
proved that
|N_p - (p + 1)| \le 2\sqrt{p} (1921.4)
for every prime p that does not divide Δ. The Hasse–Weil L-function of the elliptic
curve E is the function
L(E, s) = \prod_{p \nmid \Delta} \left( 1 - \frac{(p + 1) - N_p}{p^s} + \frac{p}{p^{2s}} \right)^{-1} \times \prod_{p \mid \Delta} \ell_p(E, s)^{-1}
of the complex variable s, in which ℓ_p(E, s) is a certain polynomial in p^{−s} that does
not vanish at s = 1. Hasse’s bound (1921.4) ensures that the product that defines
L(E, s) converges absolutely and locally uniformly on the half plane Re s > 3/2.
Consequently, L(E, s) is an analytic function of s on this region.
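To make the point count N_p and Hasse's bound concrete, here is a small sketch that counts points by brute force on the curve y^2 = x^3 − x discussed earlier. The function names are illustrative, and the approach is only practical for small primes.

```python
def discriminant(a, b):
    """Discriminant of y^2 = x^3 + a*x + b."""
    return -16 * (4 * a**3 + 27 * b**2)

def count_points(a, b, p):
    """N_p: solutions of y^2 = x^3 + a*x + b over Z/pZ, plus one for the point at infinity."""
    # square_counts[v] records how many y in Z/pZ satisfy y^2 = v.
    square_counts = [0] * p
    for y in range(p):
        square_counts[y * y % p] += 1
    affine = sum(square_counts[(x**3 + a * x + b) % p] for x in range(p))
    return affine + 1

def satisfies_hasse(a, b, p):
    """Check |N_p - (p + 1)| <= 2*sqrt(p) in exact integer arithmetic."""
    t = count_points(a, b, p) - (p + 1)
    return t * t <= 4 * p

a, b = -1, 0  # y^2 = x^3 - x, whose discriminant is 64
for p in [3, 5, 7, 11, 13, 17, 19, 23]:
    assert discriminant(a, b) % p != 0  # p must not divide the discriminant
    print(p, count_points(a, b, p), satisfies_hasse(a, b, p))
```

Squaring both sides of the inequality avoids floating-point square roots, so the check is exact.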
The famed Taniyama–Shimura conjecture ensures that L(E, s) has an analytic
continuation to C satisfying a certain functional equation analogous to that satisfied
by the Riemann zeta function (see the 1928, 1933, 1939, 1942, 1945, 1967, and
1987 entries). The analytic rank of E is the order of the zero of L(E, s) at s = 1.
The simplest version of the Birch and Swinnerton-Dyer conjecture asserts that the
geometric and analytic ranks of an elliptic curve are equal. See [4, 5, 8] for more on
L-functions associated to elliptic curves. See [1, 11] and the references therein for
results towards the Birch and Swinnerton-Dyer conjecture and the distribution of
ranks.
Bibliography
[1] M. Bhargava and A. Shankar, Ternary cubic forms having bounded invariants, and the ex-
istence of a positive proportion of elliptic curves having rank 0, Ann. of Math. (2) 181
(2015), no. 2, 587–621, DOI 10.4007/annals.2015.181.2.4. http://annals.math.princeton.
edu/2015/181-2/p04. MR3275847
[2] B. Mazur, Modular curves and the Eisenstein ideal, Inst. Hautes Études Sci. Publ. Math.
47 (1977), 33–186 (1978). http://link.springer.com/article/10.1007%2FBF02684339.
MR488287
[3] B. Mazur, Rational isogenies of prime degree (with an appendix by D. Goldfeld), Invent.
Math. 44 (1978), no. 2, 129–162, DOI 10.1007/BF01390348. http://link.springer.com/
article/10.1007%2FBF01390348. MR482230
[4] A. W. Knapp, Elliptic curves, Mathematical Notes, vol. 40, Princeton University Press,
Princeton, NJ, 1992. MR1193029
[5] Á. Lozano-Robledo, Elliptic curves, modular forms, and their L-functions, Student Math-
ematical Library, vol. 58, American Mathematical Society, Providence, RI; Institute for
Advanced Study (IAS), Princeton, NJ, 2011. IAS/Park City Mathematical Subseries.
MR2757255
[6] L. J. Mordell, On the rational solutions of the indeterminate equations of the third and fourth
degrees, Proc. Cambridge Philos. Soc. 21 (1922).
[7] K. Rubin and A. Silverberg, Ranks of elliptic curves, Bull. Amer. Math. Soc. (N.S.) 39 (2002),
no. 4, 455–474, DOI 10.1090/S0273-0979-02-00952-7. MR1920278
[8] J. H. Silverman, The arithmetic of elliptic curves, Graduate Texts in Mathematics, vol. 106,
Springer-Verlag, New York, 1986. MR817210
[9] J. H. Silverman and J. Tate, Rational points on elliptic curves, Undergraduate Texts in
Mathematics, Springer-Verlag, New York, 1992. MR1171452
[10] L. C. Washington, Elliptic curves: Number theory and cryptography, 2nd ed., Discrete Math-
ematics and its Applications (Boca Raton), Chapman & Hall/CRC, Boca Raton, FL, 2008.
MR2404461
[11] A. Wiles, The Birch and Swinnerton-Dyer conjecture, The millennium prize problems, Clay
Math. Inst., Cambridge, MA, 2006, pp. 31–41. http://www.claymath.org/sites/default/
files/birchswin.pdf. MR2238272
1922
Lindeberg Condition
Introduction
This year celebrates a milestone in the history of the central limit theorem, one
of the most important results in probability theory. We first introduce some of the
key concepts. A continuous random variable X has density fX if
(a) f_X(x) ≥ 0,
(b) \int_{-\infty}^{\infty} f_X(x) \, dx = 1, and
(c) P(a \le X \le b) = \int_a^b f_X(x) \, dx,
in which P (a ≤ X ≤ b) denotes the probability that X takes on a value in the closed
interval [a, b]. This leads to one of the most important applications of integration:
it allows us to compute probabilities.
The nth moment of a random variable X with density f_X, also called the
expected value of X^n, is
E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x) \, dx.
When the random variable X is clear from context, we often simplify the notation
and write μ, σ, and f in place of μ_X, σ_X, and f_X, respectively.
The mean is the average value of a random variable. The standard deviation,
the square root of the variance, is the natural scale to measure fluctuations from the
mean. If you assign units to the random variable, say meters, then the mean and
the standard deviation are both in meters while the variance is in meters-squared.
If we want to have confidence intervals about a measurement, then the units of the
uncertainty should be the same as the units of the random variable. This is why
the standard deviation is the natural quantity considered in many problems.
There are many densities that arise in theory and applications. The normal
distribution occupies a central role in the subject; we will see why shortly. A random
[Figure: four panels, (a) Z_1, (b) Z_2, (c) Z_4, (d) Z_8.]
normality. Typically the first version students encounter has the random variables
identically distributed, and the even moments
m_{2k} = \int_{-\infty}^{\infty} x^{2k} f_X(x) \, dx
converges for all |t| < δ for some δ > 0. This comes from a desire to have the
moment generating function
M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx
and let E[·] denote expectation relative to the underlying probability space. If
s_n^2 = \sum_{k=1}^{n} \sigma_{X_k}^2 and for all ε > 0 we have
\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{k=1}^{n} E\bigl[ (X_k - \mu_{X_k})^2 \cdot I(|X_k - \mu_{X_k}| \ge \varepsilon s_n) \bigr] = 0,
then Z_n converges to a Gaussian. If we additionally assume max_k σ_{X_k}^2 / s_n^2 → 0,
then this condition is also necessary.
1922: Comments
A useful trick. By taking logarithms we can convert questions about products
of random variables to questions about sums of related random variables. If Y_i =
log_B X_i, then to understand the distribution of the product X_1 · · · X_n it suffices
to determine the distribution of Y_1 + · · · + Y_n and then exponentiate. In many
situations Lindeberg’s conditions hold for the Y_i and as n tends to infinity the sum
is approximately a Gaussian. Since we have not standardized this sum, we expect
the variance of the Gaussian to tend to infinity as we add more and more terms.
Given a positive real number r, we may write it uniquely as r = S_10(r) · 10^{k(r)},
where S_10(r) ∈ [1, 10) is the significand and k(r) is an integer. Computing the
sum Y_1 + · · · + Y_n modulo 1, in which Y_i = log_10 X_i, is equivalent to determining
the significand of the product X_1 · · · X_n; rounding the significand down gives the
first nonzero digit of this product. See [4, 6] for proofs that the sum modulo 1
converges to the uniform distribution, as well as applications to Benford’s law; see the 1938 entry. In
addition to being of theoretical interest, such probabilistic digit laws are frequently
used in a variety of fields. The Internal Revenue Service (IRS) uses them to detect
tax fraud and computer scientists use them to optimize systems architecture.
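The relationship between the significand of a product and the sum of logarithms modulo 1 can be checked directly. The sketch below uses an arbitrary list of positive numbers chosen only for illustration; the function names are not from the text.

```python
import math

def significand(r):
    """S_10(r) in [1, 10) for a positive real number r."""
    k = math.floor(math.log10(r))
    return r / 10**k

def significand_of_product(xs):
    """Significand of x_1 * ... * x_n, recovered from the sum of log10(x_i) modulo 1."""
    total = sum(math.log10(x) for x in xs)
    return 10 ** (total % 1.0)

xs = [3.7, 12.5, 0.042, 880.0, 6.02]  # illustrative data
direct = significand(math.prod(xs))
via_logs = significand_of_product(xs)
print(direct, via_logs, int(via_logs))
```

Both routes agree up to floating-point rounding, and the integer part of the significand is the leading digit of the product.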
[Figure: two panels, (a) n = 10 and (b) n = 20.]
Bibliography
[1] P. Billingsley, Probability and measure, 3rd ed., A Wiley-Interscience Publication, Wiley Se-
ries in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York, 1995.
MR1324786
[2] J. W. Lindeberg, Eine neue Herleitung des Exponentialgesetzes in der
Wahrscheinlichkeitsrechnung (German), Math. Z. 15 (1922), no. 1, 211–225, DOI 10.1007/BF01494395.
http://link.springer.com/article/10.1007%2FBF01494395. MR1544569
[3] W. Feller, An introduction to probability theory and its applications. Vol. II., Second edition,
John Wiley & Sons, Inc., New York-London-Sydney, 1971. MR0270403
[4] S. J. Miller (ed.), Benford’s Law: theory and applications, Princeton University Press, Prince-
ton, NJ, 2015. MR3408774
[5] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton
Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[6] S. J. Miller and M. J. Nigrini, The modulo 1 central limit theorem and Benford’s law for
products, Int. J. Algebra 2 (2008), no. 1-4, 119–130. http://arxiv.org/pdf/math/0607686v2.
MR2417189
1923
The Circle Method
Introduction
In 1923 G. H. Hardy and J. E. Littlewood published a landmark paper [3] that
further developed the celebrated circle method; see the 1920 entry. This method
is well represented in the early years of this book. This is no accident, since it has
enjoyed spectacular success in resolving difficult problems in number theory. Their
paper attacked many questions in additive number theory, including the ternary
Goldbach conjecture, the twin prime conjecture, and the distribution of admissible
tuples of primes. The first two problems are discussed in the 1937 and 1919 entries,
respectively. We discuss the third problem below.
The expressions n, n + 2, and n + 4 are simultaneously prime if and only if
n = 3; this yields the primes 3, 5, and 7. Indeed, n, n+2, and n+4 are congruent to
n, n + 2, and n + 1 modulo 3, respectively. Therefore, exactly one of these numbers
is divisible by 3, which leads to only one prime triple, (3, 5, 7). This congruence
obstruction prevents (n, n + 2, n + 4) from being a triple of primes infinitely often.
On the other hand, there is no congruence obstruction that prevents n and n + 2
from being simultaneously prime. The twin prime conjecture suggests that this
occurs infinitely often.
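The congruence obstruction modulo 3 is easy to observe experimentally. The following sketch uses naive trial division and illustrative function names; it is meant only to contrast the single prime triple with the steady supply of twin primes.

```python
def is_prime(n):
    """Trial division; adequate for the small ranges used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def prime_triples(limit):
    """All n <= limit for which n, n + 2, and n + 4 are simultaneously prime."""
    return [n for n in range(2, limit + 1)
            if is_prime(n) and is_prime(n + 2) and is_prime(n + 4)]

def twin_starts(limit):
    """All n <= limit for which n and n + 2 are both prime."""
    return [n for n in range(2, limit + 1) if is_prime(n) and is_prime(n + 2)]

triples = prime_triples(10_000)
twins = twin_starts(100)
print(triples)  # [3]
print(twins)    # [3, 5, 11, 17, 29, 41, 59, 71]
```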
Hardy and Littlewood conjectured that if there are no congruence obstructions
that prevent a particular k-tuple of primes from occurring, then there are infinitely
many such k-tuples. They also gave asymptotic formulas for the expected number
of k-tuples of primes below a certain threshold. The main term is the product of
a function asymptotic to x/(log x)^k and a constant that depends on the k − 1
neighbor differences and vanishes if there is a congruence obstruction. For example,
they predicted that the number π_2(x) of twin primes at most x obeys
\pi_2(x) \sim 2 \prod_{p \ge 3} \frac{p(p - 2)}{(p - 1)^2} \int_2^x \frac{dt}{(\log t)^2}
as x → ∞ (see the comments for the 2005 entry for information about the
Bateman–Horn conjecture, a broad generalization). The formula does a phenomenal job; the
number of twin primes at most 10^16 is 10,304,195,697,298, which differs from the
Hardy–Littlewood prediction by 3,142,802. Although the error might seem large,
it is small in terms of percentages. The ratio of the error to the true value is about
3 × 10^{−7} (and we believe the ratio gets smaller the higher up we go). To put this
in the proper perspective, MapQuest lists it as 2,990.1 miles from Fenway Park in
Boston to Dodger Stadium in Los Angeles. A similarly precise measurement here
would correspond to an error of about 5 feet (about a third of the length of a typical
car).
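The same comparison can be tried at a much smaller scale. The sketch below counts twin primes up to 10^6 with a sieve and evaluates the Hardy–Littlewood main term numerically. Several choices here are assumptions made for illustration: the product over primes is truncated at 10^6, the integral is computed by Simpson's rule after the substitution t = e^u, and all function names are hypothetical.

```python
import math

def sieve(limit):
    """Sieve of Eratosthenes: flags[n] is 1 exactly when n is prime."""
    flags = bytearray([1]) * (limit + 1)
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            flags[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return flags

def twin_prime_count(x, flags):
    """Number of primes p with p + 2 <= x also prime."""
    return sum(1 for p in range(3, x - 1) if flags[p] and flags[p + 2])

def hardy_littlewood(x, flags, steps=20_000):
    """Main term 2 * prod p(p-2)/(p-1)^2 * integral_2^x dt/(log t)^2, product truncated."""
    constant = 1.0
    for p in range(3, len(flags)):
        if flags[p]:
            constant *= p * (p - 2) / (p - 1) ** 2
    # With t = e^u the integral becomes the integral of e^u / u^2 over [log 2, log x].
    lo, hi = math.log(2), math.log(x)
    h = (hi - lo) / steps  # steps must be even for Simpson's rule
    g = lambda u: math.exp(u) / u**2
    total = g(lo) + g(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return 2 * constant * total * h / 3

X = 10**6
flags = sieve(X)
actual = twin_prime_count(X, flags)
predicted = hardy_littlewood(X, flags)
print(actual, predicted, abs(predicted - actual) / actual)
```

At this small scale the relative error is around one percent, far larger than the figure quoted for 10^16, which is consistent with the expectation that the ratio shrinks as x grows.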
Ben Green and Terence Tao extended a seminal result of Endre Szemerédi
(1940– ) and established that the primes contain arbitrarily long arithmetic
progressions [2]; see the 1975 and 2004 entries. That is, given ℓ, there exist a and b
so that an + b is prime for n = 1, 2, . . . , ℓ. This differs from Dirichlet’s theorem on
primes in arithmetic progressions (see the 1913 entry), in which one fixes relatively
prime a and b and then concludes that the set {an + b : n = 1, 2, . . .} contains infin-
itely many primes (along with many composite numbers). The Green–Tao theorem
is a consequence of the more far-reaching Hardy–Littlewood k-tuple conjecture,
which is currently beyond our reach.
1923: Comments
The power of counting. The centennial problem illustrates another impor-
tant idea: the power of counting arguments. Brun proved his theorem on the
convergence of the sum of the reciprocals of the twin primes by showing there is a
C > 0 such that
\pi_2(x) \le C x \left( \frac{\log \log x}{\log x} \right)^2
for sufficiently large x. Brun’s estimate permits us to obtain an upper bound on the
number of the twin primes in each interval of the form [2^n, 2^{n+1}). The reciprocals
of such primes are at most 1/2^n, from which one can show that the sum of the
reciprocals of the twin primes converges.
Suppose that p is an odd prime, p ∤ a, and p ∤ b. For a given integer c, are there any
points (x, y) on the “ellipse”
ax^2 + by^2 ≡ c (mod p)?
Although the expression appears familiar, we are working modulo p and things
become hard to visualize. The answer involves a beautiful counting argument.
First observe that there are exactly (p + 1)/2 distinct values modulo p assumed by x^2,
namely 0 and (p − 1)/2 nonzero values. This is because
x^2 ≡ y^2 (mod p) ⇐⇒ x ≡ y (mod p) or x ≡ −y (mod p).
By hypothesis, a is invertible modulo p and hence ax^2 also assumes (p + 1)/2 distinct
values modulo p. Similarly, −by^2 + c assumes (p + 1)/2 distinct values as well. If there
did not exist x, y such that ax^2 ≡ −by^2 + c (mod p), then there would be at least
(p + 1)/2 + (p + 1)/2 = p + 1
distinct residue classes modulo p, which is impossible. Thus, ax^2 + by^2 ≡ c (mod p)
has a solution.
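The counting argument can be confirmed by exhaustive search for small primes. This is a brute-force sketch with illustrative function names; it checks both the count of squares and the existence of a solution for every admissible triple (a, b, c).

```python
def solutions(a, b, c, p):
    """All (x, y) in (Z/pZ)^2 with a*x^2 + b*y^2 congruent to c modulo p."""
    return [(x, y) for x in range(p) for y in range(p)
            if (a * x * x + b * y * y - c) % p == 0]

def distinct_squares(p):
    """The set of values assumed by x^2 modulo p."""
    return {x * x % p for x in range(p)}

def always_solvable(p):
    """Check every a, b not divisible by p and every c modulo the odd prime p."""
    units = range(1, p)
    return all(solutions(a, b, c, p)
               for a in units for b in units for c in range(p))

for p in [3, 5, 7, 11, 13]:
    assert len(distinct_squares(p)) == (p + 1) // 2
    assert always_solvable(p)

print(solutions(2, 3, 1, 7))
```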
to study the asymptotic behavior of p(n). To see why (1923.1) is correct, expand
each term in the product as a geometric series: (1 − z^k)^{−1} = \sum_{j=0}^{\infty} z^{jk}. Then
multiply the product of these series term-by-term. When the terms are gathered
together, the coefficient of z^n will be p(n). If multiplying together infinitely many
infinite series makes you feel queasy, look at the derivation of the Euler product
formula in the 1933 entry. We give a rigorous derivation there that has a similar flavor.
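The term-by-term multiplication can be carried out on truncated power series. The sketch below assumes that (1923.1) is the product of the factors (1 − z^k)^{−1} over k ≥ 1, as the expansion described above suggests; the function name is illustrative.

```python
def partition_numbers(n_max):
    """Coefficients p(0), ..., p(n_max) of the product of (1 - z^k)^(-1), truncated at z^n_max."""
    coeffs = [1] + [0] * n_max  # the empty product is the constant series 1
    for k in range(1, n_max + 1):
        # Multiplying by 1 + z^k + z^(2k) + ... in place amounts to the
        # recurrence new[n] = old[n] + new[n - k].
        for n in range(k, n_max + 1):
            coeffs[n] += coeffs[n - k]
    return coeffs

p = partition_numbers(100)
print(p[:11])  # [1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42]
print(p[100])  # 190569292
```

Factors with k larger than n_max cannot contribute to the coefficients kept, which is why a finite loop suffices.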
In 1918, Hardy and Ramanujan proved that
p(n) \sim \frac{1}{4n\sqrt{3}} \exp\left( \pi \sqrt{\frac{2n}{3}} \right);
Bibliography
[1] S. Ahlgren and K. Ono, Addition and counting: the arithmetic of partitions, Notices Amer.
Math. Soc. 48 (2001), no. 9, 978–984. http://www.ams.org/notices/200109/fea-ahlgren.
pdf. MR1854533
[2] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. arXiv:math.NT/
0404188. MR2415379
[3] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On
the expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–
70, DOI 10.1007/BF02403921. http://link.springer.com/article/10.1007%2FBF02403921.
MR1555183
[4] G. H. Hardy and S. Ramanujan, Asymptotic Formulae in Combinatory Analysis,
Proc. London Math. Soc. (2) 17 (1918), 75–115, DOI 10.1112/plms/s2-17.1.75. MR1575586
[5] F. Johansson, Efficient implementation of the Hardy-Ramanujan-Rademacher formula, LMS
J. Comput. Math. 15 (2012), 341–359, DOI 10.1112/S1461157012001088. MR2988821
[6] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[7] M. B. Nathanson, Additive number theory: The classical bases, Graduate Texts in Mathemat-
ics, vol. 164, Springer-Verlag, New York, 1996. MR1395371
[8] K. Ono, Distribution of the partition function modulo m, Ann. of Math. (2) 151 (2000), no. 1,
293–307, DOI 10.2307/121118. MR1745012
1924
The Banach–Tarski Paradox
Introduction
A mathematical paradox often leads to a reevaluation of underlying assump-
tions. A good paradox can greatly influence the development of a subject. The
Banach–Tarski paradox is one of the best: there exists a partition of the unit ball
in R3 into a finite number of subsets that can be rearranged, using rigid motions,
to yield two identical copies of the original ball. In other words, you can dissect an
orange and reassemble it into two full-sized oranges; see Figure 1.
This is impossible in the real world—hence the word “paradox”. Whereas real
oranges are made up of atoms and are cut with a knife, mathematical oranges are
made up of infinitely many points that can be partitioned into extremely compli-
cated sets. We can choose which subset each point should belong to, with no regard
for nearby points (that is, ignoring continuity). One should not think of the pieces
involved as solid pieces that can be handled like everyday objects. It is best to
think of the pieces as, perhaps, “dense gas clouds.” The actual construction is
more subtle, but it does involve making infinitely many arbitrary choices. This is
permitted by the axiom of choice, a topic that deserves a healthy digression; see
the 1940 and 1999 entries.
Stefan Banach (1892–1945) and Alfred Tarski (1901–1983) actually proved
much more. For n ≥ 3, given any two bounded subsets E and F of R^n having
nonempty interior, there are partitions E = \bigcup_{i=1}^{N} E_i and F = \bigcup_{i=1}^{N} F_i into
disjoint sets so that the sets E_i and F_i are congruent (in the geometric sense) for
i = 1, 2, . . . , N. So for n ≥ 3, it is impossible to find a finitely additive, normalized
“volume function,” defined on all subsets of R^n, that is invariant under rigid
motions. This defeats any attempt to assign a “volume” to all subsets of R^n.
We briefly sketch the main ideas behind “doubling the ball” in R^3. Let SO_3 =
SO_3(R) denote the group of all 3 × 3 real orthogonal matrices with determinant 1.
That is, SO_3 is the set of rigid motions of R^3 that fix the origin and preserve
orientation. The crucial observation is that SO_3 contains a subgroup that is “isomorphic
to the free group on two generators.” In less technical language, SO_3 contains two
matrices A and B for which there are no nontrivial relationships between A, A^{−1},
B, and B^{−1}, apart from A^{−1}A = I, BB^{−1} = I, B^{−1}AA^{−1}B = I, and so forth. For
example,
A = \begin{bmatrix} 3/5 & -4/5 & 0 \\ 4/5 & 3/5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3/5 & -4/5 \\ 0 & 4/5 & 3/5 \end{bmatrix}
are such matrices (the proof is nontrivial and involves elements of Diophantine
approximation). These matrices induce rotations by θ = tan^{−1}(4/3), which is not a
rational multiple of π, with respect to the z- and x-axes, respectively. Let Γ ⊂ SO_3
denote the group generated by A and B; it consists of words in A, A^{−1}, B, and B^{−1},
such as AB^2B^{−2}A^{−1}A^2B. This word is not reduced because further cancellation is
possible; it reduces to A^2B.
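The word reduction and the matrices above can be checked with exact rational arithmetic. In this sketch, lowercase letters stand for inverses (a for A^{−1}, b for B^{−1}); that encoding and the function names are conventions adopted here for illustration only. The code verifies a few identities and does not attempt to prove that A and B generate a free group.

```python
from fractions import Fraction as F

A = [[F(3, 5), F(-4, 5), F(0)], [F(4, 5), F(3, 5), F(0)], [F(0), F(0), F(1)]]
B = [[F(1), F(0), F(0)], [F(0), F(3, 5), F(-4, 5)], [F(0), F(4, 5), F(3, 5)]]
I3 = [[F(int(i == j)) for j in range(3)] for i in range(3)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(X):
    return [list(row) for row in zip(*X)]

# For an orthogonal matrix the inverse is the transpose.
LETTERS = {"A": A, "a": transpose(A), "B": B, "b": transpose(B)}

def reduce_word(word):
    """Cancel adjacent inverse pairs (Aa, aA, Bb, bB) until none remain."""
    stack = []
    for ch in word:
        if stack and stack[-1] == ch.swapcase():
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)

def evaluate(word):
    """Multiply out the matrices named by the letters of the word, left to right."""
    M = I3
    for ch in word:
        M = matmul(M, LETTERS[ch])
    return M

word = "ABBbbaAAB"        # A B^2 B^{-2} A^{-1} A^2 B
print(reduce_word(word))  # AAB
```

Because the entries are exact fractions, the unreduced and reduced words can be compared as matrices with no rounding error.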
Let w(A) be the set of all reduced words that begin with A, and so on, and let
Γ_1 = w(A) ∪ {I, A^{−1}, A^{−2}, A^{−3}, . . .},
Γ_2 = w(A^{−1}) \ {A^{−1}, A^{−2}, A^{−3}, . . .},
Γ_3 = w(B), and
Γ_4 = w(B^{−1}).
We have the paradoxical decomposition
Γ = Γ_1 ∪ Γ_2 ∪ Γ_3 ∪ Γ_4 = Γ_1 ∪ AΓ_2 = Γ_3 ∪ BΓ_4,
which facilitates “doubling the ball.”
Let S^2 = {x ∈ R^3 : ‖x‖ = 1} denote the surface of the unit sphere in R^3;
the exponent 2 denotes that S^2 is a “two-dimensional manifold” (a microscopic
observer on the surface of the sphere will think that they are in R^2; see the 2003
entry for more information). Define a relation on S^2 by saying that x ∼ y if there
is a C ∈ Γ so that Cx = y. This is an equivalence relation: ∼ is reflexive since
I ∈ Γ; ∼ is symmetric since Γ is closed under inversion; ∼ is transitive since Γ is
closed under multiplication.
1924: Comments
Why three dimensions? The Banach–Tarski paradox dealt with a solid ball
in R^3. What happens if we work with a disk in R^2? Recall that SO_3 contains
a subgroup isomorphic to the free group on two generators. The group SO_2 of
2 × 2 orthogonal matrices with determinant 1 does not contain such a subgroup.
Indeed, SO_2 is the group of all rotations of R^2 around the origin. Rotations in R^2
commute with each other, so SO_2 is an abelian (commutative) group. It is too nice
for something like the “paradoxical decomposition” required for the Banach–Tarski
paradox to occur.
The set S constructed above is an example of a Vitali set, named after Giuseppe
Vitali (1875–1932). These sets are so strange that there is no reasonable way to
define their “length.” Their construction relies upon the axiom of choice; see the
1940 and 1999 entries for more about this fascinating, and somewhat controversial,
axiom of set theory.
Bibliography
[1] A. Akhmedov, A new metric criterion for non-amenability III: Non-amenability of R. Thomp-
son’s group F , http://arxiv.org/abs/0902.3849.
[2] S. Banach and A. Tarski, Sur la décomposition des ensembles de points en parties respective-
ment congruentes, Fund. Math. 6 (1924), 244-277. http://matwbn.icm.edu.pl/ksiazki/fm/
fm6/fm6127.pdf
[3] R. M. Robinson, On the decomposition of spheres, Fund. Math. 34 (1947), 246–260,
DOI 10.4064/fm-34-1-246-260. http://matwbn.icm.edu.pl/ksiazki/fm/fm34/fm34125.pdf.
MR0026093
[4] E. T. Shavgulidze, About amenability of subgroups of the group of diffeomorphisms of the
interval, http://arxiv.org/abs/0906.0107.
[5] S. Wagon, The Banach-Tarski paradox, with a foreword by Jan Mycielski, Encyclopedia of
Mathematics and its Applications, vol. 24, Cambridge University Press, Cambridge, 1985.
MR803509
1925
The Schrödinger Equation
Introduction
Probably the second most famous equation in physics^1 is Newton’s second law:
F = ma. Here F is the force acting on a body, m is the mass of the body, and a is
the acceleration (the second derivative of position).
The analogue for quantum mechanical systems is the Schrödinger equation,
formulated in 1925 by Erwin Schrödinger (1887–1961). It is
i\hbar \frac{\partial}{\partial t} \Psi = \hat{H} \Psi,
in which i^2 = −1, ħ = h/2π (h is Planck’s constant), Ĥ is the Hamiltonian operator
of the system, and Ψ is the wave function that governs the system. To explain the
mathematics behind the Schrödinger equation with any sense of rigor would occupy
the remaining pages of this book.^2
In the quantum-mechanical setting, the eigenvalues E of the time-independent
Schrödinger equation EΨ = ĤΨ are the energy levels of the corresponding quantum
system. In this eigenvalue problem, Ĥ is an unbounded, selfadjoint operator
on a Hilbert space (pardon the jargon). If one knows the eigenvalues of a given
Schrödinger operator, one often wants to predict how these eigenvalues are affected
by a slight modification of the original operator. Although this is far too compli-
cated to address here, we can discuss the finite-dimensional setting.
Let Mn (C) denote the set of n × n complex matrices. We say that A ∈ Mn (C)
is selfadjoint if A = A∗ , that is, if A equals its conjugate transpose (physicists tend
to use A† instead of A∗ ). A selfadjoint matrix A ∈ Mn (C) has only real eigenvalues,
denoted by
λ1 (A) ≥ λ2 (A) ≥ · · · ≥ λn (A)
and repeated according to multiplicity, along with a corresponding orthonormal
basis of eigenvectors. This is a special case of the spectral theorem [3]. How do the
eigenvalues of a selfadjoint matrix behave under a small perturbation?
Suppose that E ∈ Mn (C) is a positive semidefinite matrix of rank one. That is,
E = ee∗ ∈ Mn (C) for some nonzero column vector e ∈ Cn . Then the eigenvalues of
A interlace with those of A + E and A − E; that is, each eigenvalue of A is at most
the corresponding eigenvalue of A + E and at least the corresponding eigenvalue of
A − E. For example, if
A = \begin{bmatrix} -4 & 0 & 2 & 4 \\ 0 & -4 & 1 & 2 \\ 2 & 1 & 4 & -2 \\ 4 & 2 & -2 & -4 \end{bmatrix}
and
E = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix},
then adding E to A increases each of its eigenvalues. Subtracting E from A
decreases each of its eigenvalues. This is illustrated in Figure 1. This sort of eigenvalue
interlacing result is the tip of the iceberg; for more information see [4].
[Figure 1: the eigenvalues of A + E, A, and A − E plotted on a number line running from −12 to 8.]
1 The most famous is undoubtedly E = mc^2; see the entry for 1915. See Episode 2 of the
1972 Doctor Who serial The Time Monster for the more dubious E = mc^3.
2 It took the genius of John von Neumann to put quantum mechanics on a firm mathematical
foundation; see the entries for 1924, 1931, 1944, 1946 for more about him.
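The interlacing for the 4 × 4 example above can be verified numerically. The sketch below computes eigenvalues with the classical Jacobi rotation method, a standard technique for real symmetric matrices that is swapped in here to avoid external libraries; it is not a method described in the text, and the function name is illustrative.

```python
import math

def eigenvalues(M, tol=1e-12, max_rotations=10_000):
    """Eigenvalues of a real symmetric matrix, in decreasing order, by Jacobi rotations."""
    n = len(M)
    a = [list(map(float, row)) for row in M]
    for _ in range(max_rotations):
        # Locate the largest off-diagonal entry.
        p, q = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda ij: abs(a[ij[0]][ij[1]]))
        if abs(a[p][q]) < tol:
            break
        # Choose the rotation angle that zeroes a[p][q].
        theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):  # update columns p and q: a <- a J
            akp, akq = a[k][p], a[k][q]
            a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
        for k in range(n):  # update rows p and q: a <- J^T a
            apk, aqk = a[p][k], a[q][k]
            a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

A = [[-4, 0, 2, 4], [0, -4, 1, 2], [2, 1, 4, -2], [4, 2, -2, -4]]
e = [1, 2, 1, 0]
E = [[ei * ej for ej in e] for ei in e]  # the rank-one matrix e e^T
plus = [[A[i][j] + E[i][j] for j in range(4)] for i in range(4)]
minus = [[A[i][j] - E[i][j] for j in range(4)] for i in range(4)]

lam_minus, lam, lam_plus = eigenvalues(minus), eigenvalues(A), eigenvalues(plus)
for low, mid, high in zip(lam_minus, lam, lam_plus):
    print(f"{low:9.4f} <= {mid:9.4f} <= {high:9.4f}")
```

Each printed row compares the ith largest eigenvalues of A − E, A, and A + E.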
There are lots of other beautiful results like this that are not typically covered in
an undergraduate linear algebra course. What if we have three selfadjoint matrices
A, B, C ∈ Mn (C) that satisfy A + B = C? How are the eigenvalues of A, B, and C
related? Taking the trace of this equation indicates that
\sum_{i=1}^{n} \lambda_i(A) + \sum_{i=1}^{n} \lambda_i(B) = \sum_{i=1}^{n} \lambda_i(C).
However, there are many other nontrivial relationships between the eigenvalues of
A, B, and C. For instance, if n = 2, then
λ1 (C) ≤ λ1 (A) + λ1 (B),
λ2 (C) ≤ λ1 (A) + λ2 (B), and
λ2 (C) ≤ λ2 (A) + λ1 (B).
For larger values of n, more and more inequalities emerge. The story was only
completed in 1999, with the resolution of the famous Horn conjecture by Alexander
Klyachko [1], and by Allen Knutson (1969– ) and Terence Tao [2].
1925: Comments
Stone’s theorem and the solution to the problem. Stone’s theorem is
a seminal result in the mathematical formulation of quantum mechanics. It says
that a strongly continuous, one-parameter semigroup t → U (t) of unitary operators
on a Hilbert space H is of the form U (t) = exp(itA), in which A is a (potentially
U(t)x = \exp(tS)x = \sum_{n=0}^{\infty} \frac{1}{n!} t^n S^n x = x + \sum_{n=1}^{\infty} \frac{1}{n!} t^n S^n x = x + 0 = x.
3 Interesting historical tidbit: mathematician Marshall H. Stone (1903–1989) was the son of
Harlan F. Stone (1872–1946), Chief Justice of the Supreme Court from 1941 to 1946.
That is, the point x is fixed by each U (t). Furthermore, x generates a one-
dimensional subspace span{x} of R3 that is fixed by each U (t). In other words,
span{x} is an axis of rotation for our planet.
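The fixed axis can be illustrated numerically. The sketch below assumes that S is a real 3 × 3 skew-symmetric matrix, which is our reading of the setting and is not stated in the surviving text; the specific vector (1, 2, 2) and the function names are arbitrary illustrative choices. The exponential is approximated by a partial sum of its power series.

```python
def skew(a, b, c):
    """The skew-symmetric matrix S satisfying S v = (a, b, c) x v (cross product)."""
    return [[0.0, -c, b], [c, 0.0, -a], [-b, a, 0.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def expm(S, t, terms=40):
    """Partial sum of the series exp(tS) = sum over n of t^n S^n / n!."""
    result = [[float(i == j) for j in range(3)] for i in range(3)]
    term = [row[:] for row in result]
    for n in range(1, terms):
        term = [[t * entry / n for entry in row] for row in matmul(term, S)]
        result = [[result[i][j] + term[i][j] for j in range(3)] for i in range(3)]
    return result

axis = [1.0, 2.0, 2.0]
S = skew(*axis)
print(matvec(S, axis))  # S x = 0, so x lies in the kernel of S
for t in [0.5, 1.0, 2.0]:
    print(t, matvec(expm(S, t), axis))  # each U(t) fixes x, up to rounding
```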
The fact that our model takes place in three dimensions is crucial. In an
even-dimensional universe, the planet need not have any axis of rotation. The
computation (1925.1) would only yield the useless deduction det S = det S. Here
is a 2 × 2 semigroup of real orthogonal matrices:
U(t) = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}.
Their eigenvalues are e^{±it} = cos t ± i sin t and they have no common (real) eigenvector.
Although U(t) rotates R^2 counterclockwise around the origin through an angle of t, there
is no nonzero vector that is fixed by each U(t).
Bibliography
[1] A. A. Klyachko, Stable bundles, representation theory and Hermitian operators, Selecta Math.
(N.S.) 4 (1998), no. 3, 419–445, DOI 10.1007/s000290050037. MR1654578
[2] A. Knutson and T. Tao, Honeycombs and sums of Hermitian matrices, Notices Amer. Math.
Soc. 48 (2001), no. 2, 175–186. MR1811121
[3] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge University Press,
2017.
[4] R. A. Horn and C. R. Johnson, Matrix analysis, 2nd ed., Cambridge University Press, Cam-
bridge, 2013. MR2978290
[5] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161-190. http://link.springer.
com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-
10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[6] E. Schrödinger, An undulatory theory of the mechanics of atoms and molecules, Phys. Rev.
28 (1926), no. 6, 1049-1070. http://journals.aps.org/pr/abstract/10.1103/PhysRev.28.
1049.
1926
Ackermann’s Function
Introduction
In 1926 David Hilbert published an article on infinity [2], at that time still a
controversial topic, in which he famously declared “no one will drive us from the
paradise which Cantor created for us” (see the 1918 entry for a brief introduction
to Cantor’s theory of cardinality). In this important paper, Hilbert also described
a function discovered by his student, Wilhelm Ackermann (1896–1962).
Ackermann was trying to unify arithmetic operations on natural numbers. Just
as addition is repeated counting, multiplication is repeated addition, and exponen-
tiation is repeated multiplication, one can continue to iterate each successive oper-
ation to produce an even faster-growing one. Ackermann defined his function ϕ of
three variables recursively in such a way that
ϕ(a, b, 0) = a + b, ϕ(a, b, 1) = a · b, ϕ(a, b, 2) = a^b, ϕ(a, b, 3) = a^{a^{\cdot^{\cdot^{a}}}} (b times),
To get an idea of how fast this function grows, note that A(2, 3) = 9, A(3, 3) = 61,
and A(4, 3) has about 10^{20,000} decimal digits. The enormity of A(5, 3) is scarcely
conceivable.
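The small values quoted above can be reproduced with a short program. The sketch below assumes that A denotes the standard two-variable Ackermann–Péter function, with A(0, n) = n + 1, A(m, 0) = A(m − 1, 1), and A(m, n) = A(m − 1, A(m, n − 1)); this definition does not appear in the surviving text but is consistent with the values A(2, 3) = 9 and A(3, 3) = 61. An explicit stack replaces recursion to avoid Python's recursion limit.

```python
def ackermann(m, n):
    """Two-variable Ackermann–Péter function, evaluated with an explicit stack."""
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n += 1                # A(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)   # A(m, 0) = A(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)   # A(m, n) = A(m - 1, A(m, n - 1))
            stack.append(m)
            n -= 1
    return n

print(ackermann(2, 3), ackermann(3, 3))  # 9 61
```

Do not try ackermann(4, 3): the result is far too large to compute this way.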
Because Ackermann’s function (in whatever incarnation) grows very rapidly,
one can form a kind of “inverse” function, α, that grows so slowly that for all
practical purposes it is constant. This function turns out to play an important
role in the analysis of algorithms. For example, although there is no linear-time
algorithm for managing a sequence of “union” and “find” operations on a collection
of n disjoint sets, Robert Tarjan (1948– ) found a data structure such that these
operations can be performed in time O(n · α(n)) [5].
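A common data structure of this kind is a disjoint-set forest with union by rank and path compression. The sketch below is a minimal illustration with hypothetical class and method names; it is not taken from [5].

```python
class DisjointSets:
    """Union-find with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression: repoint nodes at the root
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.rank[rx] < self.rank[ry]:  # attach the shallower tree to the deeper one
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

ds = DisjointSets(8)
for x, y in [(0, 1), (2, 3), (1, 3), (4, 5)]:
    ds.union(x, y)
print(ds.find(0) == ds.find(2), ds.find(0) == ds.find(4))  # True False
```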
1926: Comments
Euler’s power tower. Before discussing the solution, let us digress a bit on
a few other interesting expressions with a recursive flavor. Under certain
circumstances, the function
f(x) = x^{x^{x^{\cdot^{\cdot^{\cdot}}}}} (1926.1)
can be made sense of. First of all, the preceding denotes the limit of the sequence
a_1, a_2, . . . defined by a_1 = x and a_{n+1} = x^{a_n} for n = 1, 2, . . .. That is, we always
group exponents “from the top down”:
x^{x^x} means x^{(x^x)}, not (x^x)^x = x^{(x^2)}.
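Euler showed in 1783 that the expression that defines f(x) converges if
0.06598 . . . = e^{−e} < x < e^{1/e} = 1.4446 . . . ;
see Figure 1. Since √2 = 1.4142 . . .,
s = \sqrt{2}^{\sqrt{2}^{\sqrt{2}^{\cdot^{\cdot^{\cdot}}}}}
is well-defined and nonnegative. Since s = √2^s, we have
s^2 = (\sqrt{2}^{\,s})^2 = (2^{s/2})^2 = 2^s.
Consequently, 2 log s = s log 2 and hence
\frac{\log s}{s} = \frac{\log 2}{2}.
Each term of the defining sequence lies in [1, 2): the first term is √2, and if a term a is
less than 2, then the next term √2^a is less than √2^2 = 2. Hence 1 ≤ s ≤ 2. Since
log x/x is strictly increasing on [1, e], we conclude that s = 2.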
Assuming convergence, a similar approach can be used to evaluate
r = \sqrt{2 + \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}}.
Since r = √(2 + r), we have r^2 = 2 + r and hence (r − 2)(r + 1) = 0. Since r = −1
is impossible, we must have r = 2.
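Both limits can be approximated by direct iteration. In this sketch the power tower is built from the recursion a_{n+1} = √2^{a_n}, and the nested radical from the analogous recursion r_{n+1} = √(2 + r_n) starting at √2; the starting value for the radical, the iteration counts, and the helper name are assumptions made for illustration.

```python
import math

def iterate(step, start, n):
    """Apply the map `step` to `start` a total of n times."""
    value = start
    for _ in range(n):
        value = step(value)
    return value

root2 = math.sqrt(2)
tower = iterate(lambda a: root2 ** a, root2, 200)                 # a_{n+1} = sqrt(2)^(a_n)
radical = iterate(lambda r: math.sqrt(2 + r), math.sqrt(2), 60)   # r_{n+1} = sqrt(2 + r_n)
print(tower, radical)
```

Both values agree with 2 to within floating-point accuracy. The tower converges more slowly because the derivative of the map at the fixed point is log 2, whereas for the radical it is 1/4.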
See the 1931, 1934, and 1972 entries for more on continued fractions.
1 Recall that we may interchange limits with continuous functions. Here we are using the fact
that f(t) = 1 + t and g(t) = 1/t are continuous functions of the real variable t ≠ 0.
Solution to the problem. It turns out that there is exactly one counterexample
for n < 4,000,000, namely n = 1969. In this case, the values A_{1969}(2i, ·) are
(1698, 0, 0, 0, 0, 0, . . .), and the values A_{1969}(2i + 1, ·) are (0, 1698, 0, 1698, 0, 1698, . . .)
for all i ≥ 4 [3]. It is not known whether there are other counterexamples.
Bibliography
[1] W. Ackermann, Zum Hilbertschen Aufbau der reellen Zahlen (German), Math. Ann.
99 (1928), no. 1, 118–133, DOI 10.1007/BF01459088. http://eretrandre.org/rb/files/
Ackermann1928_126.pdf. MR1512441
[2] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–
190, DOI 10.1007/BF01206605. http://link.springer.com/article/10.1007%2FBF01206605.
MR1512272
[3] J. Froemke and J. W. Grossman, Unsolved Problems: A Mod-n Ackermann Function, or
What’s So Special About 1969?, Amer. Math. Monthly 100 (1993), no. 2, 180–183, DOI
10.2307/2323780. MR1542281
[4] R. M. Robinson, Recursion and double recursion, Bull. Amer. Math. Soc. 54 (1948), 987–
993, DOI 10.1090/S0002-9904-1948-09121-2. http://www.math.ntnu.no/emner/MA2301/2010h/
robinson_doublerec.pdf. MR0026976
[5] R. E. Tarjan, Efficiency of a good but not linear set union algorithm, J. Assoc. Comput.
Mach. 22 (1975), 215–225, DOI 10.1145/321879.321884. http://ecommons.library.cornell.
edu/handle/1813/5942. MR0458996
1927
Introduction
Many a problem solver will be aware of the William Lowell Putnam Math-
ematical Competition, a North American undergraduate contest administered by
the Mathematical Association of America. It was founded in 1927 by Elizabeth
Lowell Putnam¹ (1862–1935) in honor of her late husband, William Lowell Putnam
(1861–1923), who firmly believed in the virtues of academic rivalry between
universities. Among the many unwritten traditions of the Putnam exam is that
every exam should have at least one problem that uses the year number as part of
a problem statement or its solution. So, it is a fitting twist that the Putnam exam
is the subject of this section.
Joseph Gallian (1942– ) wrote a fabulous overview of the Putnam exam’s his-
tory, milestones, statistics, and trivia [2]. Offered every year since 1938 (except in
1943–1945), the Putnam exam’s roots include a math competition also sponsored
by Elizabeth Lowell Putnam and held in 1933 between ten Harvard students and
ten West Point cadets. The cadets both won the team contest and had the top
individual score. Earlier Putnam exams featured problems in areas closer to the
introductory technical undergraduate curriculum such as calculus, differential equa-
tions, or geometry; in more recent years, a recognizable blend of topics including
also linear algebra, some abstract algebra, combinatorics, number theory (or even
an occasional advanced topic on harder questions) characterizes each year’s twelve
problems.
The five most successful contestants each year are named Putnam Fellows, one
of whom is also awarded a fellowship for graduate study at Harvard; eight persons so
far have been a Putnam Fellow the maximum possible four times. Other substantial
monetary team and individual prizes are given, and an Elizabeth Lowell Putnam
prize may be awarded to one female contestant. The original intent to boost team
spirit and provide an avenue for students to fight for their institution’s glory in an
academic subject helps one to understand the peculiar ranking system, in which
every participating institution must designate a three-person team in advance, and
the team ranking is obtained by adding the team members’ individual ranks (rather
than their scores). Since higher scores are obtained by far fewer students, a
university whose three team members each solve seven problems will usually rank higher
than one where two brilliant team members each solve nine problems and the third
solves three.
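The effect of adding ranks instead of scores can be illustrated with a small sketch. Everything numerical here is invented for illustration: the field of scores is hypothetical, and the tie rule (rank equals one plus the number of strictly higher scores) is an assumption, not the official procedure.

```python
def rank_of(score, field):
    """Rank = 1 + number of contestants in `field` who scored strictly higher."""
    return 1 + sum(1 for other in field if other > score)

def team_rank_sum(team_scores, field):
    """Team total under the rank-sum rule; a smaller total is better."""
    return sum(rank_of(score, field) for score in team_scores)

# Hypothetical field (invented numbers): few high scores, many low ones.
field = [90] * 5 + [70] * 30 + [50] * 120 + [30] * 500 + [0] * 1000

team_even = [70, 70, 70]    # three members, seven problems each
team_uneven = [90, 90, 30]  # two members with nine problems, one with three

print(sum(team_even), sum(team_uneven))       # equal score totals
print(team_rank_sum(team_even, field))        # small rank sum
print(team_rank_sum(team_uneven, field))      # much larger rank sum
```

In this made-up field both teams total 210 points, yet the balanced team has by far the smaller rank sum, because the single score of 30 sits behind a large number of contestants.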
1 Her brother was astronomer Percival Lowell (1855–1916), who predicted the location of
Pluto and popularized the erroneous theory that Mars bore canals that indicated the presence of
intelligent life.
76 1927. WILLIAM LOWELL PUTNAM MATHEMATICAL COMPETITION
The Putnam exam has been called “the hardest math test in the world” [1, 3].
This is not without reason: the median score has budged above 1 point out of 120 in
only four years since 1999 and then never above 3, while fully 62.6% of 2006 entrants
scored 0. A student must make substantial progress toward an actual solution to
receive any points for a problem; checking small examples or stating some immediate
conclusions typically does not make the cut. Each of the 12 problems is graded on
a scale from 0 to 10 points, with the only scores allowed being 0, 1, 2, 8, 9, and
10. Thus, the grader must decide whether the problem is essentially solved or
not. A submission that solves one of the two main cases or one that contains the
structure of a full solution but has a serious flaw might get 1 or 2 points. On
the other hand, a submission that contains all the ingredients of a full solution
but neglects to check a minor subcase might get 8 or 9 points; the full mark of
10 points is reserved for essentially perfect solutions. The first round of grading
currently occurs in December at Santa Clara University. Imagine several dozen
mathematicians tackling the collective output of over 4,000 competitors from over
500 colleges on twelve problems one paper at a time over the span of four days.
Undergraduate students solve problems in several competitions around the
world, such as the annual Schweitzer competition in Hungary, the Jarník competition
in central Europe, the famous competitions at Moscow’s and Kiev’s Mech-Mats,
or the International Mathematics Competition for University Students [5], an an-
nual contest held in Europe that has also seen participation from several American
universities. While many Putnam stars were successful in the high school IMOs,
the two contests retain distinct mathematical profiles.
Opinions differ on the extent to which the Putnam exam or other competi-
tions mimic the mathematical research experience or are somehow reflective of the
student’s research aptitude; see several Putnam Fellows’ perspectives in [1]. The
Putnam exam was, of course, never designed for such use. Five Fellows, namely
Richard Feynman (1918–1988), John Milnor (1931– ), David Mumford (1937– ),
Daniel Quillen (1940–2011), and Kenneth Wilson (1936–2013), have been subse-
quently recognized with a Fields Medal or a Nobel Prize, and many dozens more
have become distinguished mathematicians at top universities and research insti-
tutes. Notable Putnam competitors include many Abel Prize winners, MacArthur
Fellows, AMS and MAA presidents, members of the National Academy of Sciences,
as well as many winners of the Morgan Prize for undergraduate research. Many oth-
ers have chosen entirely different careers, and many top-notch mathematicians have
never taken or particularly enjoyed contest mathematics. Ask mathematics gradu-
ate program admissions chairs or hedge fund managers and many will tell you that,
while neither a prerequisite nor a guarantee of success, a candidate’s good showing
on the Putnam exam gets their attention. Putnam problems test a specific kind of
ingenuity over technical mastery and are sometimes seen as occupying a universe
of their own, but, to borrow a phrase from Hamming, we must remember that Putnam problems
“were not on the stone tablets that Moses brought down from Mt. Sinai” [4]. They
are composed by a committee of working mathematicians designated by the MAA,
and so their evolution over time perhaps reflects our collective style and taste.
What makes a good Putnam problem? Bruce Reznick (1953– ) has writ-
ten with charm and detail about writing for the Putnam exam [8]. André Weil
(1906–1998), paraphrasing the English poet Housman who had used an example of
a fox-terrier hunting for a rat to explain why he cannot define poetry, famously
quipped: “when I smell number-theory I think I know it, and when I smell some-
thing else I think I know it too” [9]. He then proceeded to argue that analytic
number theory is not number theory, but this is a subject for another article. Put-
nam takers and experienced problem-solvers will similarly spot a juicy problem. It
will be accessible but not trivial, challenging but not impossibly so. It will relate
to important mathematics, but with an unexpected twist. It will make you smile
and, in Reznick’s words, whistle in your mind like a catchy tune.
We propose the following Putnam problem for your whistling enjoyment. It
appeared as problem A3 in the 2013 exam and requires nothing beyond calculus.
Do not just solve it and bask in your glory. Make yourself a hot beverage, relate the
solution to your other mathematical experiences, continue the story that it tells;
you will have new problems of your own in no time.
1927: Comments
A hint for the problem. There are numerous collections, both in print and
online, with solutions and commentaries on the Putnam problems. For example, see
[6]. Before looking at the solution, you are strongly encouraged to try it yourself.
Here is a hint: suppose that
\[ a_0 + a_1 y + \cdots + a_n y^n \neq 0 \tag{1927.2} \]
for $0 < y < 1$. If this occurs, then the intermediate value theorem implies that
$a_0 + a_1 y + \cdots + a_n y^n$ has the same sign for all $0 < y < 1$.
and hence pA assumes both positive and negative values on R. The intermediate
value theorem ensures that pA has a real zero; that is, A has a real eigenvalue. This
argument fails if $n$ is even. For example, the eigenvalues of
\[ A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
are $\pm i$.
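The 2 × 2 example can be checked directly from its characteristic polynomial $t^2 - (\operatorname{tr} A)\,t + \det A$. The sketch below is illustrative only; the function name `eigenvalues_2x2` is an assumption introduced here.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of t^2 - (a + d) t + (ad - bc), the characteristic polynomial of [[a, b], [c, d]]."""
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

# The matrix with rows (0, -1) and (1, 0): trace 0, determinant 1.
print(eigenvalues_2x2(0, -1, 1, 0))  # the pair i and -i, so no real eigenvalue
```

Here the characteristic polynomial is $t^2 + 1$, which is positive for every real $t$, so the sign-change argument used for odd $n$ has nothing to work with.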
Bibliography
[1] G. L. Alexanderson, How Putnam fellows view the competition, MAA Focus, December 2004,
14-15, http://www.maa.org/sites/default/files/pdf/pubs/dec04.pdf
[2] J. A. Gallian, The first sixty-six years of the Putnam competition, American Mathematical
Monthly 111 (2004), 691–699. See also http://www.d.umn.edu/~jgallian/putnam.pdf.
[3] L. Grossman, Crunching the numbers, Time, December 16, 2002, http://content.time.com/
time/magazine/article/0,9171,400000,00.html.
[4] R. W. Hamming, The unreasonable effectiveness of mathematics, Amer. Math. Monthly 87
(1980), no. 2, 81–90, DOI 10.2307/2321982. MR559142
[5] International Mathematics Competition for University Students, http://www.imc-math.org.
uk.
[6] K. S. Kedlaya, B. Poonen, and R. Vakil, The William Lowell Putnam Mathematical Compe-
tition, 1985–2000, Problems, solutions, and commentary, MAA Problem Books Series, Math-
ematical Association of America, Washington, DC, 2002. MR1933844
[7] K. S. Kedlaya, The Putnam archive, http://kskedlaya.org/putnam-archive/.
[8] B. Reznick, Some thoughts on writing for the Putnam, http://www.math.uiuc.edu/~reznick/
putnam.pdf.
[9] A. Weil, Essais historiques sur la théorie des nombres (French), L’Enseignement
Mathématique, Université de Genève, Geneva, 1975. Extrait de l’Enseignement Math. 20
(1974); Monographie No. 22 de L’Enseignement Mathématique. MR0389725
1928
Introduction
Random matrix theory is, as expected, the study of randomly chosen matrices.
What is not immediately apparent is why it should so beautifully model such diverse
phenomena as energy levels in nuclear physics, zeros of the Riemann zeta function
(which encode information about the primes; see the 1942 entry), and stopping
times of bus routes, to name just a few! While the subject began with the 1928
paper of John Wishart (1898–1956) [24], for many people the most exciting dates
come later, in the 1950s, 1970s and 1990s.
In the 1950s Eugene Wigner (1902–1995) had the fruitful insight that systems
of random matrices could accurately predict properties of heavy nuclei [18–22]. In
a classical mechanics course one learns how to solve, in closed form, problems that
involve just one or two point masses. Once we have three bodies, chaos sets in and
closed-form solutions typically do not exist. Now imagine how much more daunting
the task is with heavy nuclei, in which there are hundreds of protons and neutrons
interacting under far more complicated forces than gravity. In quantum mechanics,
this is represented as $H\Psi_n = E_n\Psi_n$, in which $H$ is the Hamiltonian of the system
and $\Psi_n$ are the energy eigenstates with eigenvalues $E_n$; see the 1925 entry. While
this reduces quantum mechanics to linear algebra, there is a twist. The matrices
are infinite¹ and the entries are unknown. These sorts of problems are beyond the
techniques learned in an undergraduate linear algebra class.
Wigner’s idea, for which he earned a Nobel Prize, is that the complicated
interactions actually help us. Rather than trying to find the eigenvalues of the
operator associated to our physical system, he looked at many random matrices,
diagonalized them, weighted the observed eigenvalue spectra by the probability of
choosing those matrices, and then averaged over a family of matrices. The hope,
which has been borne out time and time again in experiments and theories, is that
a “typical” system is close to average. A good way to view this universality is
to see it as a sort of central limit theorem; see the 1922 entry. We first establish
some notation and then give a simple version of his result below; it has since been
greatly generalized and extended. See [4–7, 16, 17] for some of the recent successes
and surveys, which have greatly weakened the assumptions needed on the random
variables. While we concentrate on real symmetric matrices, variants have been
proved in many other settings.
1 To be more precise, they are unbounded selfadjoint operators on an infinite-dimensional
Hilbert space. There are additional wrinkles too. On infinite-dimensional vector spaces, linear
operators need not have any eigenvalues. Consider the operator T : C[0, 1] → C[0, 1] defined
by (T f )(x) = xf (x). Here C[0, 1] denotes the complex vector space of continuous functions
f : [0, 1] → C. If T f = λf for some λ ∈ C, then (x − λ)f (x) = 0 for all x ∈ [0, 1]. Since f is
continuous, it must be the zero function. Thus, no λ ∈ C is an eigenvalue of T !
80 1928. RANDOM MATRIX THEORY
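A key tool in understanding the eigenvalues is the eigenvalue trace lemma, which
states that an $N \times N$ matrix $A = (a_{ij})$ with eigenvalues $\lambda_1(A), \ldots, \lambda_N(A)$ satisfies
\[ \operatorname{tr} A = \sum_{i=1}^{N} a_{ii} = \lambda_1(A) + \cdots + \lambda_N(A) \]
and, more generally,
\[ \operatorname{tr}(A^k) = \sum_{1 \le i_1, \ldots, i_k \le N} a_{i_1 i_2} a_{i_2 i_3} \cdots a_{i_{k-1} i_k} a_{i_k i_1} = \sum_{i=1}^{N} \lambda_i(A)^k. \]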
The importance of this lemma is that it allows us to pass from the matrix elements
(which we know) to the eigenvalues (which we want to understand). Determining
the moments of the eigenvalues yields information on their distribution. For example,
if $A$ is real symmetric with entries of mean 0 and variance 1, then taking $k = 2$ gives
$\operatorname{tr}(A^2) = \sum_{i,j} a_{ij} a_{ji} = \sum_{i,j} a_{ij}^2$, so the expected value of $\operatorname{tr}(A^2)$ is $N^2$ and hence the
average square of an eigenvalue is $N$. Thus, a typical eigenvalue should be of order
$\sqrt{N}$. This simple calculation suggests the normalization we shall see in Wigner’s
semicircle law.
The last item we need is the empirical spectral measure of $A$, denoted by $\mu_A$.
We divide by $2\sqrt{N}$ to normalize each eigenvalue; the factor 2 in $2\sqrt{N}$ ensures that
we ultimately wind up with a circle instead of an ellipse. We write
\[ \mu_A(x) = \frac{1}{N} \sum_{i=1}^{N} \delta\!\left( x - \frac{\lambda_i(A)}{2\sqrt{N}} \right), \]
in which $\delta$ is the Dirac delta functional. One can view $\delta(x - a)$ as a unit point mass
at $a$. If $f$ is a suitably nice function, then
\[ \int_{-\infty}^{\infty} f(x)\,\delta(x - a)\,dx = f(a). \]
The rigorous way to realize “delta functions” mathematically is to regard
them as linear functionals on suitable spaces of functions. For example, δ(x − a) is
the linear functional that sends a function f to the scalar f (a).
Wigner’s Semicircle Law: Consider the ensemble $E(N, p)$ of $N \times N$ real symmetric
matrices where the independent entries² are independent, identically distributed
random variables drawn from a distribution $p$ with mean 0, variance 1, and finite
higher moments. As $N \to \infty$, for almost all $A \in E(N, p)$, the empirical spectral
measure $\mu_A$ converges to the density of the semicircle, $f_{\mathrm{sc}}(x) = \frac{2}{\pi}\sqrt{1 - x^2}$ if $|x| \le 1$
and 0 otherwise.
2 For instance, the (1, 2) and (3, 4) entries are independent, in the sense that neither
determines the other. On the other hand, the (1, 2) and (2, 1) entries each determine the other since
the matrices involved are symmetric.
Wigner’s work was expanded by Freeman Dyson (1923– ) [2, 3], whom we will
meet again shortly, and many others. Physical reasons often require the matrices
involved to be real symmetric (this ensures that the eigenvalues are real). Modulo
such constraints, researchers mostly considered matrices in which the free entries
were chosen independently from a fixed distribution. See Figures 1 and 2 for ex-
amples.
Fast forward to the 1970s. The Riemann zeta function, defined for $\operatorname{Re} s > 1$ by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \left( 1 - \frac{1}{p^s} \right)^{-1}, \]
can be meromorphically continued to the entire complex plane with a simple pole
at s = 1; see the 1933 entry for a proof of the remarkable product formula above.
Using complex analysis, one can show that the zeros of the completed zeta func-
tion are intimately connected to many properties of the primes. Hugh Montgomery
(1944– ) was working on the pair correlation problem [14], trying to understand the
distribution of differences of pairs of zeta zeros. While visiting the Institute for Ad-
vanced Study at Princeton, he relayed what he had found to Dyson, who remarked
that the same behavior is seen in the eigenvalues of certain ensembles of random
matrices. Additional support was later provided by the numerical investigations of
Andrew Odlyzko (1949– ) on the zeros of ζ(s); see [15] and the 1987 entry.
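The equality of the sum and the product that define $\zeta(s)$ for $\operatorname{Re} s > 1$ is easy to probe numerically. The following sketch, with hypothetical helper names and an arbitrary cutoff of $10^5$, compares truncations of both sides at $s = 2$ with the known value $\zeta(2) = \pi^2/6$.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p with p <= limit."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(range(p * p, limit + 1, p))
    return [n for n, is_prime in enumerate(sieve) if is_prime]

def zeta_sum(s, terms):
    """Partial sum of 1/n^s for n = 1, ..., terms."""
    return sum(n ** -s for n in range(1, terms + 1))

def zeta_product(s, limit):
    """Partial Euler product of (1 - p^(-s))^(-1) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** -s
    return product

print(zeta_sum(2, 100000), zeta_product(2, 100000), math.pi ** 2 / 6)
```

Both truncations agree with $\pi^2/6$ to several decimal places; the product converges somewhat faster here because it only omits the primes beyond the cutoff.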
From that moment, number theory, random matrix theory, and physics had a
lot to say to each other. These subjects continue to drive each other. New questions
emerged in the 1990s with the work of Nick Katz (1943– ) and Peter Sarnak (1953– )
[11], expanding the universe of matrix families relevant to number theory. For more
information on random matrix theory and its connection to number theory, see the
books [12, 13] and the survey articles [1, 21, 22]. See also the entry from 1960 for
an entertaining look at Wigner’s views on the role of mathematics in physics.
1928: Comments
Some connections. It is worth briefly mentioning the remarkable similarities
we see in such diverse systems. In the 1922 entry, we saw another example of
different systems converging to similar behavior. For more on this phenomenon see
the 1960 entry on Wigner’s paper The Unreasonable Effectiveness of Mathematics
in the Natural Sciences [23], in which we deliberately chose to focus on quantities
related to the characters from this year’s entry.
For further reading. The references below contain many great introductions
to random matrix theory. These include a general interest article [10], short survey
articles [1, 8, 11], textbooks [9, 12], and many of the original papers in nuclear
physics and number theory [2, 18–24].
Bibliography
[1] J. B. Conrey, L-functions and random matrices, Mathematics unlimited—2001 and
beyond, Springer, Berlin, 2001, pp. 331–352. http://arxiv.org/pdf/math/0005300.pdf?
origin=publication_detail. MR1852163
[2] F. Dyson, Statistical theory of the energy levels of complex systems: I, II, III, J. Mathemat-
ical Phys. 3 (1962), 140-156, 157-165, 166-175, http://scitation.aip.org/content/aip/
journal/jmp/3/1/10.1063/1.1703773, http://scitation.aip.org/content/aip/journal/
jmp/3/1/10.1063/1.1703774, http://scitation.aip.org/content/aip/journal/jmp/3/1/
10.1063/1.1703773.
[3] F. J. Dyson, The threefold way. Algebraic structure of symmetry groups and en-
sembles in quantum mechanics, J. Mathematical Phys. 3 (1962), 1199–1215, DOI
10.1063/1.1703863. http://scitation.aip.org/content/aip/journal/jmp/3/6/10.1063/1.
1703863. MR0177643
[4] L. Erdős, Universality of Wigner random matrices: a survey of recent results, http://arxiv.
org/abs/1004.0861.
[5] L. Erdős, B. Schlein, and H.-T. Yau, Semicircle law on short scales and delocalization of
eigenvectors for Wigner random matrices, Ann. Probab. 37 (2009), no. 3, 815–852, DOI
10.1214/08-AOP421. MR2537522
[6] L. Erdős, B. Schlein, and H.-T. Yau, Local semicircle law and complete delocalization
for Wigner random matrices, Comm. Math. Phys. 287 (2009), no. 2, 641–655, DOI
10.1007/s00220-008-0636-9. MR2481753
[7] L. Erdős, B. Schlein, and H.-T. Yau, Wegner estimate and level repulsion for Wigner ran-
dom matrices, Int. Math. Res. Not. IMRN 3 (2010), 436–479, DOI 10.1093/imrn/rnp136.
MR2587574
[8] F. W. K. Firk and S. J. Miller, Nuclei, primes and the random matrix connection, Symmetry
1 (2009), no. 1, 64–105, DOI 10.3390/sym1010064. http://arxiv.org/pdf/0909.4914.pdf.
MR2756142
[9] P. J. Forrester, Log-gases and random matrices, London Mathematical Society Monographs
Series, vol. 34, Princeton University Press, Princeton, NJ, 2010. MR2641363
[10] B. Hayes, The spectrum of Riemannium, American Scientist 91 (2003), no. 4, 296-300.
[11] N. M. Katz and P. Sarnak, Zeroes of zeta functions and symmetry, Bull. Amer. Math. Soc.
(N.S.) 36 (1999), no. 1, 1–26, DOI 10.1090/S0273-0979-99-00766-1. MR1640151
[12] M. L. Mehta, Random matrices, 2nd ed., Academic Press, Inc., Boston, MA, 1991.
MR1083764
[13] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[14] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic number theory
(Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), Amer. Math.
Soc., Providence, R.I., 1973, pp. 181–193. http://www-personal.umich.edu/~hlm/paircor1.
pdf. MR0337821
[15] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math.
Comp. 48 (1987), no. 177, 273–308, DOI 10.2307/2007890. http://www.ams.org/journals/
mcom/1987-48-177/S0025-5718-1987-0866115-0/. MR866115
[16] B. Schlein, Spectral Properties of Wigner Matrices, Proceedings of the Conference QMath
11, Hradec Kralove, 2010.
[17] T. Tao and V. Vu, Random matrices: universality of local eigenvalue statistics, Acta Math.
206 (2011), no. 1, 127–204, DOI 10.1007/s11511-011-0061-3. MR2784665
[18] E. Wigner, On the statistical distribution of the widths and spacings of nuclear resonance
levels, Proc. Cambridge Philo. Soc. 47 (1951), 790-798. http://journals.cambridge.org/
abstract_S0305004100027237.
[19] E. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Ann.
of Math. 2 (1955), no. 62, 548-564. http://www.jstor.org/stable/1970079?seq=1#
page_scan_tab_contents.
[20] E. Wigner, Statistical properties of real symmetric matrices, in Canadian Mathematical Con-
gress Proceedings, University of Toronto Press, Toronto, 1957, 174–184.
[21] E. P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions. II, Ann.
of Math. (2) 65 (1957), 203–207, DOI 10.2307/1969956. MR0083848
[22] E. P. Wigner, On the distribution of the roots of certain symmetric matrices, Ann. of Math.
(2) 67 (1958), 325–327, DOI 10.2307/1970008. http://www.jstor.org/stable/1970008?
seq=1#page_scan_tab_contents. MR0095527
[23] E. P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences [Comm.
Pure Appl. Math. 13 (1960), 1–14; Zbl 102, 7], Mathematical analysis of physical systems,
Van Nostrand Reinhold, New York, 1985, pp. 1–14. https://www.dartmouth.edu/~matc/
MathDrama/reading/Wigner.html. MR824292
[24] J. Wishart, The generalized product moment distribution in samples from a normal multi-
variate population, Biometrika 20 A (1928), 32-52. http://www.jstor.org/stable/2331939?
seq=1#page_scan_tab_contents.
1929
Introduction
This statement is false. If it is false, then it is true; likewise, if it is true, then it
is false. This paradox, commonly known as the liar’s paradox, has been attributed
to Eubulides of Miletus (4th century BCE). If you are confused, you are not the only
one. The liar’s paradox was used to disable an android in the Star Trek episode I,
Mudd (1967) and a sentient mainframe in the Doctor Who serial The Green Death
(1973).
In a similar vein, Bertrand Russell (1872–1970) dealt the death blow to “naive
set theory” in 1901 when he observed that the definition of the “set” R = {x : x ∉ x}
ensures that R ∈ R if and only if R ∉ R, which is absurd. Thus, R is not a set;
it is too big to be treated as a set in a logically sound manner. This is Russell’s
paradox.
Needless to say, the invalidity of naive set theory¹ was a great disappointment.
No one was more disappointed than Gottlob Frege (1848–1925), who was just finishing
his would-be masterpiece Grundgesetze der Arithmetik, which purported to
derive the laws of arithmetic from supposedly logical axioms. As Frege admitted:
A scientist can hardly meet with anything more undesirable than to
have the foundation give way just as the work is finished. I was put
in this position by a letter from Mr. Bertrand Russell when the work
was nearly through the press.
Mathematicians were forced to reevaluate the foundations of their discipline. Sets
would have to be treated in a rigorous manner. The rules would have to be explicitly
stated so that contradictions would not occur in axiomatic set theory.
The Zermelo–Fraenkel axioms (ZF) are a list of eight or so axioms,² depending
upon the particular formulation, that codify the properties of sets. For the most
part they assert things that most mathematicians take for granted (for example,
unions of sets exist). Here are a couple of the axioms.
Axiom of Foundation. $\forall x \, \bigl[ x \neq \emptyset \implies \exists y \in x \, (y \cap x = \emptyset) \bigr]$.
This axiom prevents a set from being an element of itself.³
1 More specifically, the general comprehension principle, which asserts that given any prop-
erty, there exists a set that consists of all objects having that property.
2 Technically, some of them are axiom schema; the distinction is not important for us.
3 If A is a set, then so is {A} (this requires the axiom of pairing, which asserts that if A
and B are sets, then so is {A, B}; let A = B to see that {A} is a set). The axiom of foundation
ensures that there is an element of {A} that is disjoint from {A}. The only element of {A} is A,
so A and {A} are disjoint. Thus, A ∉ A.
86 1929. GÖDEL’S INCOMPLETENESS THEOREMS
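Axiom of Infinity. $\exists X \, \bigl[ \emptyset \in X \wedge \forall x \in X \, \bigl( (x \cup \{x\}) \in X \bigr) \bigr]$.
This ensures that an infinite set exists; the set $X$ described by the axiom contains
\[ \emptyset, \quad \emptyset \cup \{\emptyset\}, \quad \emptyset \cup \{\emptyset\} \cup \bigl\{ \emptyset \cup \{\emptyset\} \bigr\}, \quad \ldots, \]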
which can be used to define the natural numbers 0, 1, 2, . . .. Is Zermelo–Fraenkel
set theory the ultimate answer to the foundational crisis in mathematics? In short,
no. We have not even brought up the axiom of choice or the continuum hypothesis
yet; see the entries for 1924, 1940, 1963, 1964, and 1999.
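The construction of 0, 1, 2, . . . from the empty set by repeatedly forming $x \cup \{x\}$ can be modeled with Python's immutable `frozenset`. This is only a finite illustration of the successor step; the function names are assumptions made here, and no program can of course exhibit the infinite set $X$ itself.

```python
def successor(x):
    """Return x ∪ {x} for a hereditarily finite set x, represented as a frozenset."""
    return x | frozenset({x})

def von_neumann(n):
    """The set representing n: 0 is the empty set and n + 1 is successor(n)."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x

three = von_neumann(3)
print(len(three))               # 3: its elements are the sets for 0, 1, and 2
print(von_neumann(2) in three)  # True: 2 is an element of 3
print(three in three)           # False: no set here is an element of itself
```

In this representation the set for $n$ has exactly $n$ elements, and "less than" for natural numbers becomes membership.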
Self-reference, seen in the liar’s paradox and Russell’s paradox, lies at the heart
of the celebrated first incompleteness theorem of Kurt Gödel (1906–1978).⁴ A set
of axioms is consistent if there does not exist a statement S such that both S
and its negation ¬S are provable from the axioms (that is, the axioms are not
self-contradictory). The first incompleteness theorem says that any “sufficiently
complicated” axiomatic system is either incomplete (not all true statements in that
system can be proved in that system) or inconsistent (self-contradictory). The
second incompleteness theorem states that no “sufficiently complicated” axiomatic
system (this includes ZF) can prove its own consistency.
Around the turn of the 20th century, David Hilbert initiated a program that
aimed to show that all of mathematics can be derived from a set of self-evident
axioms. This program was pursued in earnest by Bertrand Russell and Alfred North
Whitehead (1861–1947), who authored the imposing Principia Mathematica. This
was an ambitious task; it took over 300 pages to establish that 1 + 1 = 2.
Gödel’s theorems show that Hilbert’s program is doomed. If ZF is consistent,
then there are true statements that can be expressed in the language of ZF, but
not proved in ZF. Moreover, we cannot even hope to use ZF to prove that ZF
is consistent. Indeed, if ZF could be used to prove that ZF is consistent, then
the second incompleteness theorem would ensure that ZF is inconsistent! There
is nothing special about ZF. Any other sufficiently complicated system of axioms
would be plagued by the same issues.
4 Technically, we are a little early. The completeness theorem for first-order logic was the
subject of Gödel’s 1929 thesis; the incompleteness theorems actually date from 1931.
1929: Comments
Although we do not provide the complete solution here, the following example
illustrates the main idea; see [1] for more details. Let us return to the case n = 266
discussed above. In the line corresponding to base b, replace every occurrence of b
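in the hereditary base-$b$ expansion of $G_b(266) - 1$ with the first infinite ordinal $\omega$;
see Figure 1. This yields
\begin{align*}
H_2(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + \omega, \\
H_3(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + 2, \\
H_4(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1} + 1, \\
H_5(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega+1}, \\
H_6(266) &= \omega^{\omega^{\omega+1}} + \omega^{\omega} \cdot 5 + \omega^{5} \cdot 5 + \cdots + \omega \cdot 5 + 5,
\end{align*}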
and so forth. The structure of $H_6(266)$ looks different than expected because
(1929.1) is not the hereditary base-6 expansion of $G_b(266) - 1$; the “ones” term
cannot be $-1$. Instead,
\[ 6^{6^{6+1}} + 6^{6+1} - 1 = 6^{6^{6+1}} + 5 \cdot 6^6 + 5 \cdot 6^5 + \cdots + 5 \cdot 6 + 5. \]
This leads to a strictly decreasing sequence $H_b(266)$ of countable ordinals. It turns
out that any strictly decreasing sequence of ordinal numbers is finite, so $H_b(266)$,
and hence $G_b(266)$, terminates with 0.
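The integer side of this argument is easy to experiment with. The sketch below follows the usual definition of a Goodstein sequence (write the current term in hereditary base $b$, replace every $b$ by $b + 1$, subtract 1, and move on to base $b + 1$); the function names and the indexing of the terms are assumptions made here and may differ from the conventions used elsewhere in this entry.

```python
def bump(n, b):
    """Write n in hereditary base b, then replace every b by b + 1."""
    result, exponent = 0, 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            # The exponent is itself rewritten in hereditary base b.
            result += digit * (b + 1) ** bump(exponent, b)
        exponent += 1
    return result

def goodstein(n, max_terms):
    """First terms of the Goodstein sequence starting at n (stops early at 0)."""
    terms, b = [n], 2
    while terms[-1] > 0 and len(terms) < max_terms:
        terms.append(bump(terms[-1], b) - 1)
        b += 1
    return terms

print(goodstein(3, 10))                       # [3, 3, 3, 2, 1, 0]
print(goodstein(266, 2)[1] == 3**81 + 3**4 + 2)  # True
```

Starting at 3 the sequence reaches 0 after five steps. Starting at 266 the second term already has 39 digits, and starting at 4 the sequence is far too long to run to completion, which gives some feeling for why the termination proof needs ordinals instead of direct computation.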
Although Goodstein’s theorem is a statement about natural numbers and their
properties, it cannot be proved without “infinitary” means; some form of transfinite
mathematics is required to prove Goodstein’s theorem. The Kirby–Paris theorem
(1982) implies that Goodstein’s theorem is independent of Peano arithmetic (PA).
In other words, Goodstein’s theorem can neither be proved nor disproved in PA.
One can think of Peano arithmetic as the system ZFCfin obtained from ZFC
(see the 1963, 1964, and 1969 entries) if the axiom of infinity (“there exists an
infinite set”) is replaced by its negation (“all sets are finite”). PA is sufficient for
almost all familiar statements of elementary number theory. For instance, Euclid’s
theorem on the infinitude of the primes can be proved without any reference to
infinite sets. “For each prime p there exists a prime q such that p < q” expresses
the infinitude of primes without discussing infinite sets. One does not need to “hold
in one’s hand” the set of all primes in order to prove Euclid’s theorem.
Gödel’s first incompleteness theorem ensures that if Peano arithmetic is con-
sistent (as most people believe), then there are true statements about the integers
that cannot be proved in PA. Goodstein’s theorem is one such statement.
If you want more information about the foundational crisis and its main char-
acters and you want it in the form of a graphic novel, then [2] is for you. If
you want your Gödel with a serving of M. C. Escher (1898–1972) and J. S. Bach
(1685–1750), then you need the acclaimed book [6]. Another great choice is [3],
particularly for debunking the numerous pseudoscientific assertions often ascribed
to Gödel’s theorems.
Bibliography
[1] A. E. Caicedo, Goodstein’s function (English, with English and Spanish summaries), Rev.
Colombiana Mat. 41 (2007), no. 2, 381–391. MR2585906
[2] A. Doxiadis and C. H. Papadimitriou, Logicomix: An epic search for truth, character design
and drawings by Alecos Papadatos, color by Annie Di Donna, Bloomsbury Press, New York,
2009. MR2884886
[3] T. Franzén, Gödel’s theorem: An incomplete guide to its use and abuse, A K Peters, Ltd.,
Wellesley, MA, 2005. MR2146326
[4] K. Gödel, Über die Vollständigkeit des Logikkalküls, Doctoral dissertation, University of Vi-
enna, 1929.
[5] R. L. Goodstein, On the restricted ordinal theorem, J. Symbolic Logic 9 (1944), 33–41, DOI
10.2307/2268019. https://projecteuclid.org/euclid.jsl/1183391360. MR0010515
[6] D. R. Hofstadter, Gödel, Escher, Bach: an eternal golden braid, Basic Books, Inc., Publishers,
New York, 1979. MR530196
[7] L. Kirby and J. Paris, Accessible independence results for Peano arithmetic, Bull. Lon-
don Math. Soc. 14 (1982), no. 4, 285–293, DOI 10.1112/blms/14.4.285. http://blms.
oxfordjournals.org/content/14/4/285.full.pdf. MR663480
[8] Wolfram MathWorld, Goodstein Sequence, http://mathworld.wolfram.com/
GoodsteinSequence.html.
1930
Ramsey Theory
Introduction
There are many questions that could, in principle, be settled by a computation.
However, some of these problems are so far beyond the realm of practical compu-
tation that we may never know the answer; see the 1933 entry and the comments
for the 1992 entry for other examples of this phenomenon. A great source of such
problems is Ramsey theory, named after Frank Plumpton Ramsey (1903–1930), an
area of mathematics that studies how large a collection of objects must be to ensure
the emergence of a desired property.
The seminal problem in Ramsey theory is the determination of the Ramsey
number R(m, n), which is defined as follows. Imagine there is a long-expected
party with N people, and in any pair of two people either both know each other
or neither knows the other. Then R(m, n) is the smallest N which guarantees that
there are either (a) at least m people that all know each other or (b) at least n people
such that none of these n people know each other. It is known that R(3, 3) = 6 and
R(4, 4) = 18.
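The value R(3, 3) = 6 is small enough to confirm by exhaustive search. The following Python sketch (the function names are ours, chosen only for illustration) checks every acquaintance pattern on 5 and on 6 people; it is a brute-force check, not the classical pigeonhole proof.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each pair (i, j) with i < j to 1 (acquainted) or 0 (strangers)."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def every_party_has_triangle(n):
    """True if every pattern on n people contains 3 mutual friends or 3 mutual strangers."""
    pairs = list(combinations(range(n), 2))
    for bits in product((0, 1), repeat=len(pairs)):
        if not has_mono_triangle(n, dict(zip(pairs, bits))):
            return False  # found a party with no such triple
    return True

print(every_party_has_triangle(5))  # False: five people are not enough
print(every_party_has_triangle(6))  # True: six people always suffice
```

The search over six people examines 2^15 = 32,768 patterns; the same approach is already hopeless for R(5, 5).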
Ramsey theory’s mantra is “complete disorder is impossible”: any large, seem-
ingly disordered, structure should contain a smaller, highly ordered substructure.
Unfortunately, there are often so many cases to investigate that these sorts of prob-
lems cannot be solved by brute force. For example, we may associate a graph to
the party problem (see Figure 1), with vertices representing people and an edge
connecting people who know each other. The number of graphs on $N$ labeled vertices is $2^{\binom{N}{2}} = 2^{N(N-1)/2}$, which already exceeds $10^{200}$ for $N = 40$. According to
Paul Erdős:
Suppose aliens invade the earth and threaten to obliterate it in a year’s
time unless human beings can find the Ramsey number for red five
and blue five. We could marshal the world’s best minds and fastest
computers, and within a year we could probably calculate the value.
If the aliens demanded the Ramsey number for red six and blue six,
however, we would have no choice but to launch a preemptive attack.
Erdős’s quote is about R(5, 5) (which is between 43 and 49) and R(6, 6) (which is
between 102 and 165).
A famous theorem in the subject is due to Bartel Leendert van der Waerden
(1903–1996). Given c ≥ 2 colors and a natural number n, there is a natural number
W (c, n) such that if N ≥ W (c, n) and we paint the integers 1, 2, . . . , N with these
colors, there is a length n arithmetic progression in {1, 2, . . . , N }, each element of
which has the same color. It is known that W (2, 3) = 9 (see Figure 2), W (2, 4) = 35,
W (2, 5) = 178, and W (2, 6) = 1132. Most other values of W (c, n) are unknown,
although bounds exist. For instance, the novel approach to Szemerédi’s theorem
(see the 1975 entry) developed by Fields Medalist Timothy Gowers (1963– ) yields
the upper bound
\[ W(c, n) \le 2^{2^{c^{2^{2^{n+9}}}}}. \]
Although W (c, n) grows rapidly, it is hoped that Gowers’s bound is overkill. A cash
prize of $1,000 was offered by Ronald Graham (1935– ) for a proof that
\[ W(2, n) < 2^{n^2}; \]
see [6] for a list of problems in Ramsey theory with cash prizes attached.
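The smallest nontrivial van der Waerden number, W (2, 3) = 9, can be confirmed by exhaustive search. The sketch below (illustrative names, plain brute force) tests every 2-coloring of {1, . . . , N} for a monochromatic 3-term arithmetic progression.

```python
from itertools import product

def has_mono_ap(coloring, length):
    """coloring[i] is the color of the integer i + 1."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, n):
            if start + (length - 1) * step >= n:
                break  # the progression would leave {1, ..., n}
            if len({coloring[start + j * step] for j in range(length)}) == 1:
                return True
    return False

def forces_mono_ap(n, colors, length):
    """True if every coloring of {1, ..., n} has a monochromatic progression."""
    return all(has_mono_ap(c, length) for c in product(range(colors), repeat=n))

print(forces_mono_ap(8, 2, 3))  # False
print(forces_mono_ap(9, 2, 3))  # True, so W(2, 3) = 9
```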
1930: Comments
Some cautionary tales. Before giving the solution to this problem, let us
digress a bit on the dangers of extrapolating from limited data. For instance, the
Ramsey numbers R(n, n) are known only for n = 1, 2, 3, 4; it is hard to surmise
what R(5, 5) should be based on this information. Here are a couple cautionary
tales about careless extrapolation.
Moser’s circle problem¹ asks for the maximum number f (n) of regions into
which a circle can be partitioned by connecting n points on the circle with chords.
A couple quick sketches confirm that
f (1) = 1, f (2) = 2, f (3) = 4, f (4) = 8, and f (5) = 16;
see Figure 3. This limited data suggests that $f(n) = 2^{n-1}$ for all n. However, it
turns out that f (6) = 31; see Figure 4. The correct general answer
\[ f(n) = \frac{1}{24}\left(n^4 - 6n^3 + 23n^2 - 18n + 24\right), \]
can be obtained by induction or combinatorial topology [3, 11].
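To see exactly where the pattern of powers of two breaks, one can tabulate the quartic formula for f (n) next to 2^(n−1). The sketch below also compares it with the equivalent binomial form C(n, 4) + C(n, 2) + 1, a standard identity that is not stated in the text above; the function name is ours.

```python
from math import comb

def regions(n):
    """Maximum number of regions from chords joining n points on a circle."""
    return (n**4 - 6 * n**3 + 23 * n**2 - 18 * n + 24) // 24

for n in range(1, 8):
    # columns: n, f(n), the naive guess 2^(n-1), the binomial form
    print(n, regions(n), 2 ** (n - 1), comb(n, 4) + comb(n, 2) + 1)
```

The two columns f (n) and 2^(n−1) agree for n = 1, . . . , 5 and part ways at n = 6.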
The previous conjecture failed at n = 6. Here is an even more striking example.
Let $p(n) = n^2 + n + 41$ and consider its values
41, 43, 47, 53, 61, 71, 83, 97, 113, 131, 151, 173, 197, 223, 251,
281, 313, 347, 383, 421, 461, 503, 547, 593, 641, 691, 743, 797,
853, 911, 971, 1033, 1097, 1163, 1231, 1301, 1373, 1447, 1523,
1601, . . .
for n = 0, 1, 2, . . .. Do you see a pattern? They are all prime! Or at least
p(0), p(1), . . . , p(39) are; we intentionally left off the composite number
$p(40) = 1681 = 41^2$. This shows that even a few dozen cases do not a theorem make. This
amazing polynomial was discovered by Euler in 1772; see the 1983 entry for an even
more amazing “prime generating polynomial.”
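The claim about Euler's polynomial is easy to test by machine. This short sketch uses naive trial division (adequate for numbers this small; the helper names are ours) to find the first n for which p(n) is composite.

```python
def is_prime(m):
    """Trial division; fine for the small values considered here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def p(n):
    return n * n + n + 41

first_failure = next(n for n in range(100) if not is_prime(p(n)))
print(first_failure, p(first_failure))  # 40 1681
```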
In 1919, George Pólya (1887–1985) suggested that most natural numbers have
an odd number of prime factors [12]. For instance, $108 = 2^2 \cdot 3^3$ has 2 + 3 = 5
prime factors. The Liouville lambda function λ(n) is +1 if n has an even number of
prime factors and −1 if n has an odd number of prime factors. Pólya’s conjecture
states that
\[ L(n) = \sum_{i=1}^{n} \lambda(i) \le 0 \]
for n = 2, 3, 4, . . .. Numerical evidence suggests the truth of the conjecture. In fact,
it holds for all n < 906,150,257, which is the smallest counterexample to Pólya’s
1 The problem appears in a paper of Leo Moser (1921–1970) and W. Bruce Ross [10], so
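The numerical evidence behind Pólya's conjecture is easy to reproduce on a small scale. The following sketch (our own illustrative code, using a smallest-prime-factor sieve) computes the partial sums L(n) of the Liouville function and confirms L(n) ≤ 0 for 2 ≤ n ≤ 100,000, far below the first counterexample quoted above.

```python
def liouville_partial_sums(limit):
    """Return a list sums with sums[n] = L(n) for 0 <= n <= limit (sums[0] = 0)."""
    spf = list(range(limit + 1))  # spf[m] will hold the smallest prime factor of m
    for q in range(2, int(limit ** 0.5) + 1):
        if spf[q] == q:  # q is prime
            for m in range(q * q, limit + 1, q):
                if spf[m] == m:
                    spf[m] = q
    omega = [0] * (limit + 1)  # prime factors counted with multiplicity
    sums = [0] * (limit + 1)
    for n in range(1, limit + 1):
        if n > 1:
            omega[n] = omega[n // spf[n]] + 1
        sums[n] = sums[n - 1] + (1 if omega[n] % 2 == 0 else -1)
    return sums

sums = liouville_partial_sums(100_000)
print(all(value <= 0 for value in sums[2:]))  # True
```

Such a check illustrates the cautionary tale rather than refuting it: agreement on the first hundred thousand cases says nothing about n near 906,150,257.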
Bibliography
[1] J. C. Baez, Patterns that eventually fail, https://johncarlosbaez.wordpress.com/2018/09/
20/patterns-that-eventually-fail/.
[2] A. Carr, Party at Ramsey’s, http://blogs.ams.org/mathgradblog/2013/05/11/
mathematics/.
[3] J. H. Conway and R. K. Guy, The book of numbers, Copernicus, New York, 1996. MR1411676
[4] A. M. Gleason, R. E. Greenwood, and L. M. Kelly, The William Lowell Putnam Mathemati-
cal Competition: Problems and solutions: 1938–1964, Mathematical Association of America,
Washington, D.C., 1980. MR588757
[5] R. L. Graham and J. H. Spencer, Ramsey Theory, Scientific American (July 1990), 112-117.
http://www.math.ucsd.edu/~ronspubs/90_06_ramsey_theory.pdf.
[6] R. L. Graham, Some of my favorite problems in Ramsey theory, INTEGERS 7 (2007), no. 2,
#A15.
[7] W. T. Gowers, A new proof of Szemerédi’s theorem, Geom. Funct. Anal. 11 (2001), no. 3,
465–588, DOI 10.1007/s00039-001-0332-9. MR1844079
[8] C. B. Haselgrove, A disproof of a conjecture of Pólya, Mathematika 5 (1958), 141–145, DOI
10.1112/S0025579300001480. MR0104638
[9] B. M. Landman and A. Robertson, Ramsey theory on the integers, Student Mathematical
Library, vol. 24, American Mathematical Society, Providence, RI, 2004. MR2020361
[10] L. Moser and W. B. Ross, Mathematical Miscellany, Math. Mag. 23 (1949), no. 2, 109–114.
MR1570450
[11] The On-Line Encyclopedia of Integer Sequences, A000127 (Maximal number of regions ob-
tained by joining n points around a circle by straight lines. Also number of regions in 4-space
formed by n − 1 hyperplanes), http://oeis.org/A000127.
[12] G. Pólya, Verschiedene Bemerkungen zur Zahlentheorie, Jahresbericht der Deutschen
Mathematiker-Vereinigung 28, 31–40.
[13] F. P. Ramsey, On a Problem of Formal Logic, Proc. London Math. Soc. (1930), s2-30,
no. 1, 264-286. https://londmathsoc.onlinelibrary.wiley.com/doi/abs/10.1112/plms/
s2-30.1.264.
1931
The Ergodic Theorem
Introduction
A discrete dynamical system consists of a set of states X and a function T :
X → X. Given a system in state x ∈ X, let T (x) be the state of the system one
unit of time later. If the system starts in state x, it next moves to T (x), then to
$T^2(x) = T(T(x))$, and so forth. If A is a set of states and
\[ \chi_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A \end{cases} \]
is the characteristic function (also called the indicator function) of A, then
\[ \sum_{i=1}^{n} \chi_A(T^i(x)) \]
counts the number of visits of x to A up to and including time n. The time average
of visits of x to A is
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \chi_A(T^i(x)), \]
if this limit exists.
Suppose that we have a notion of “size” m(A) for subsets A of X. We insist
that this “measure” is normalized so that m(X) = 1. Then m(A) can be thought
of as the relative size of A in X (see the notes for 1924 for some of the hazards of
naive measure theory). We also assume that T : X → X is measure preserving, in
the sense that $m(T^{-1}(A)) = m(A)$ for every measurable set A. That is, although
T mixes and rearranges points of X, the size of A is unchanged after an application
of T . For instance, X could be a batch of (incompressible) cookie dough and T
could be the act of kneading the dough once (in some prescribed manner) for one
minute. A particular handful A of dough might be warped, stretched, or cut, but
the volume occupied by $A, T(A), T^2(A), \ldots$ is always the same.
A consequence of Ludwig Boltzmann (1844–1906) and Josiah Willard Gibbs’s
investigations in statistical mechanics was the ergodic hypothesis. A version of the
ergodic hypothesis states that the time average of a system should equal the space
average, m(A). To see what this means, we consider a simple example. An irrational
rotation is a function T : [0, 1) → [0, 1) of the form T (x) = x + θ (mod 1), in which
θ is a fixed irrational real number. By x + θ (mod 1), we refer to the fractional part
$x + \theta - \lfloor x + \theta \rfloor$ of x + θ. For example, if x = 0.5 and $\theta = \sqrt{2} = 1.414\ldots$, then
x + θ (mod 1) = 0.914 . . .. The term “rotation” stems from the fact that the wrapped
interval [0, 1) is topologically the same as a circle. From this perspective, addition
of θ modulo 1 corresponds to a rotation of the circle through an angle of 2πθ. It is
possible to show that the ergodic hypothesis holds for this example. For instance,
the average amount of time that $T(0), T^2(0), \ldots$ spends in an interval [a, b] equals
the length b − a of that interval; see Figure 1.
Figure 1. $T(0), T^2(0), \ldots, T^{100}(0)$ for $\theta = \sqrt{2}$, $e$, and $\pi$, respectively.
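The equality of time and space averages for an irrational rotation can be observed numerically. The sketch below is a floating-point experiment, not a proof; the function name and the choice of interval are ours. It measures the fraction of the first n iterates of 0 that land in [a, b].

```python
import math

def visit_frequency(theta, a, b, n, x0=0.0):
    """Fraction of the iterates T(x0), ..., T^n(x0) of x -> x + theta (mod 1) in [a, b]."""
    # T^i(x0) = x0 + i * theta (mod 1); computing it directly avoids accumulated rounding.
    hits = sum(1 for i in range(1, n + 1) if a <= (x0 + i * theta) % 1.0 <= b)
    return hits / n

for theta in (math.sqrt(2), math.e, math.pi):
    # each printed frequency should be close to the interval length 0.3
    print(visit_frequency(theta, 0.2, 0.5, 100_000))
```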
In 1931 John von Neumann [6], followed shortly by the sooner-to-publish George
Birkhoff [2], proved that time averages exist and equal the space averages for
measure-preserving systems satisfying a condition called ergodicity. A set E is
invariant if T (x) ∈ E if and only if x ∈ E. Ergodicity means that the only invariant
sets for T are those that differ from ∅ or X by a set of measure zero. If E is
invariant and x starts in E, then all of its iterates stay in E and no point outside
E visits E. That means that if T were not ergodic, there would exist sets $E$ and
$E^c$, both of positive measure, for which the dynamics of T on $E$ would be totally
unrelated to the dynamics of T on $E^c$. In other words, one could decompose the
dynamical system into two independent, simpler systems.
We can now state the Birkhoff ergodic theorem: for all measurable sets A there
exists a set of measure zero N so that
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \chi_A(T^i(x)) = m(A) \quad \text{for all } x \text{ outside } N. \]
An important part of the theorem is that the limit exists. In fact, once we know
the limit exists, using standard results from analysis it is possible to show the
limit equals the measure of A. An immediate consequence is that for all sets A of
positive measure, every point of X (outside a set of measure zero) visits the set A,
and furthermore, the visits are with the “right frequency.” Thus, we know a lot
about the orbit of almost every point.
This theorem has had a strong influence in analysis and has many consequences.
For example, it can be used to prove Weyl’s uniform distribution property¹ and the
law of large numbers² from probability. The ergodic theorem is in fact a bit more
general: one can replace χA by any Lebesgue-integrable function f , and then m(A)
is replaced by the integral of f . The theorem proved by von Neumann is similar
to Birkhoff’s but the convergence is in the norm of a Hilbert space in which the
functions reside. For an introduction and proof the reader may consult [7]. Further
historical details and current developments can be found in [1].
1 If α is irrational, then the set $\{n\alpha \pmod 1\}_{n=1}^{\infty}$ is equidistributed in [0, 1].
2 Let $X_1, \ldots, X_n$ be independent random variables drawn from a common distribution with
mean μ and let $\overline{X}_n = (X_1 + \cdots + X_n)/n$ denote the sample mean. Then $\overline{X}_n$ converges in
probability to μ. A sequence of random variables $S_n$ converges in probability to a random variable
S (which in our case will be the constant μ) if for every ε > 0 we have $\lim_{n\to\infty} P(|S_n - S| > \varepsilon) = 0$.
\[ \frac{1}{n} \sum_{i=1}^{n} f(T^{p(i)}(x)) \quad \text{converges for almost all } x, \]
for any polynomial p with integer coefficients. When all powers of T are ergodic,
it follows that this limit equals the integral that is expected. It is reasonable to
ask what happens when the function f is merely integrable, even in the case of the
squares: $p(i) = i^2$. It was shown recently by Buczolich and Mauldin [4] that the
theorem for the squares fails when f is only assumed to be integrable. This proof
has been extended recently by P. LaVictoire. It would be interesting to find simpler
proofs of all of these results.
1931: Comments
Continued fractions. We briefly discuss a connection between the ergodic
theorem and continued fractions; see [5] and the references therein, as well as
the 1934 and 1972 entries. Each real number x has a unique continued fraction
expansion
\[ x = a_0(x) + \cfrac{1}{a_1(x) + \cfrac{1}{a_2(x) + \cfrac{1}{a_3(x) + \cdots}}}, \tag{1931.1} \]
in which the positive integers $a_i(x)$ are the continued fraction digits of x. For
typographical reasons we write $x = [a_0; a_1, a_2, \ldots]$ or $x = [a_1, a_2, \ldots]$ if $a_0 = 0$.
How are the $a_i(x)$ computed? First, let $a_0(x) = \lfloor x \rfloor$, the greatest integer at most
x. Next, let $a_1(x) = \lfloor 1/(x - a_0(x)) \rfloor$ and so forth. The continued fraction (1931.1)
is finite if and only if x is rational. It is eventually periodic if and only if x is a
quadratic irrational. For an x chosen uniformly at random in [0, 1), what is the
probability as n → ∞ that the nth digit is k?
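The digit recipe just described (take the integer part, invert the fractional part, repeat) can be sketched in a few lines. The version below uses exact rational arithmetic so that no rounding error creeps in; the function name and the `max_digits` cutoff are our own illustrative choices.

```python
from fractions import Fraction
from math import floor

def continued_fraction_digits(x, max_digits=20):
    """Return [a0, a1, a2, ...] for a rational x, computed exactly."""
    x = Fraction(x)
    digits = []
    for _ in range(max_digits):
        a = floor(x)          # a_i = greatest integer at most the current value
        digits.append(a)
        remainder = x - a
        if remainder == 0:    # rational input: the expansion terminates
            break
        x = 1 / remainder     # invert the fractional part and repeat
    return digits

print(continued_fraction_digits(Fraction(355, 113)))  # [3, 7, 16]
print(continued_fraction_digits(Fraction(415, 93)))   # [4, 2, 6, 7]
```

For an irrational x one would feed in a sufficiently accurate rational approximation and trust only the leading digits.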
The answer is the beautiful Gauss–Kuzmin theorem, due to Gauss and Rodion
Kuzmin (1891–1949). It says that for almost all x the probability converges to
\[ \log_2\left(1 + \frac{1}{k(k+2)}\right); \]
see [5] for a proof, which is an expanded version of the argument in the classic
book by Aleksandr Khinchin (1894–1959). The beauty stems from the clear, simple
formula. The problem with the Gauss–Kuzmin theorem is that we do not know
much about the exceptional set. For example, although we believe that cubic
irrationals follow the Gauss–Kuzmin distribution, we do not know for sure. Some
specific numbers, such as $e^{1/n}$ for n = 1, 2, . . ., fail dramatically.
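As a sanity check that the Gauss–Kuzmin formula really defines a probability distribution on the digits k = 1, 2, 3, . . ., one can tabulate it and sum it. The sketch below (function name ours) does so; the sum telescopes, which is why the partial sums approach 1.

```python
from math import log2

def gauss_kuzmin(k):
    """Limiting probability that a continued fraction digit equals k."""
    return log2(1 + 1 / (k * (k + 2)))

for k in range(1, 6):
    print(k, round(gauss_kuzmin(k), 4))

# Partial sum over k = 1, ..., 100000; equals log2(2(K + 1)/(K + 2)), which tends to 1.
print(sum(gauss_kuzmin(k) for k in range(1, 100_001)))
```

In particular a digit equals 1 a little over 41% of the time, and large digits are rare.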
[Figure: (a) x = π; (b) x = e.]
Bibliography
[1] V. Bergelson, Some historical comments and modern questions around the ergodic theorem,
Dynamics of Complex Systems, Research Institute for Math. Sciences, Kyoto, 2004, 1–11.
https://people.math.osu.edu/bergelson.1/vb_Kyoto8Nov04.pdf.
[2] G. Birkhoff, Proof of the ergodic theorem, Proc. Nat. Acad. Sci, USA 17 (1931), 656–660.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076138/.
[3] J. Bourgain, On the maximal ergodic theorem for certain subsets of the integers, Israel J. Math.
61 (1988), no. 1, 39–72, DOI 10.1007/BF02776301. http://link.springer.com/article/10.
1007%2FBF02776301. MR937581
[4] Z. Buczolich and R. D. Mauldin, Divergent square averages, Ann. of Math. (2) 171 (2010),
no. 3, 1479–1530, DOI 10.4007/annals.2010.171.1479. http://annals.math.princeton.edu/
wp-content/uploads/annals-v171-n3-p02-p.pdf. MR2680392
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[6] J. von Neumann, Proof of the quasi-ergodic hypothesis, Proc. Nat. Acad. Sci., USA 18 (1932),
70–82. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1076162/.
[7] C. E. Silva, Invitation to ergodic theory, Student Mathematical Library, vol. 42, American
Mathematical Society, Providence, RI, 2008. MR2371216
1932
The 3x + 1 Problem
Introduction
The Collatz function T : N → N is defined by
\[ T(x) = \begin{cases} x/2 & \text{if } x \text{ is even}, \\ 3x + 1 & \text{if } x \text{ is odd}. \end{cases} \]
Now pick a seed, a natural number n, and consider the corresponding Collatz
sequence $n, T(n), T^2(n), \ldots$, in which $T^k(n)$ denotes the k-fold iterate $T(T(\cdots(T(n))))$.
This is also called the orbit of n under T . For example, n = 21 yields the Collatz
sequence
21, 64, 32, 16, 8, 4, 2, 1, . . .
and n = 24 provides
24, 12, 6, 3, 10, 5, 16, 8, 4, 2, 1, . . . .
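Collatz sequences are simple to generate by machine. The following sketch (the function name and the `max_steps` safety cutoff are ours) iterates T from a seed until it first reaches 1; the cutoff is there because termination for every seed is exactly what is not known.

```python
def collatz_orbit(n, max_steps=10_000):
    """Return [n, T(n), T^2(n), ...] up to the first 1, or stop after max_steps steps."""
    orbit = [n]
    while n != 1 and len(orbit) <= max_steps:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        orbit.append(n)
    return orbit

print(collatz_orbit(21))  # [21, 64, 32, 16, 8, 4, 2, 1]
print(collatz_orbit(24))  # [24, 12, 6, 3, 10, 5, 16, 8, 4, 2, 1]
```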
One can prove that the distribution of digits converges, in an appropriate sense,
to Benford’s law on digit bias (see the 1938 entry and [4, 7]). That is, if you take
a large starting seed and look at all the iterates until it hits the cycle 4, 2, 1, then
with probability tending to 1 the digit distribution converges to Benford’s law.
One can make this precise: given any tolerance, the number of starting seeds in an
interval [1, X] that are more than this tolerance away from Benford tends to zero
as X → ∞.
How can one disprove the 3x + 1 conjecture? There are two ways in which
the conjecture could be false. There might be a seed whose Collatz sequence is
unbounded. Or there might be a periodic orbit other than 4, 2, 1 (it is known that
there are no other periodic orbits of length 100,000,000 or less [3]).
Some have described the 3x + 1 problem as a Soviet conspiracy to slow down
American mathematics since so many people tried working on it, tempted by its
apparent simplicity. Paul Erdős said that mathematics is not yet ready to address
questions such as the 3x + 1 problem.
1932: Comments
A heuristic approach. When stuck on a difficult conjecture, one can try to
give heuristic arguments for or against its validity. To simplify our model, we omit
the troublesome +1 in the definition of the Collatz function. Since half of the even
numbers are divisible by 2 and not by 4, and a fourth are divisible by 4 and not by 8,
and so on, we consider the functions $H_2(x) = 3x/2$, $H_4(x) = 3x/4$, $H_8(x) = 3x/8$,
and so forth. Our heuristic approximation to the Collatz function is denoted H;
it is obtained by applying $H_{2^k}$ with probability $1/2^k$ for k = 1, 2, . . .. The hope is
that this related problem is easier to analyze and that its behavior will shed light
on the original problem.
It is more appropriate to consider the expected value of log H(x) since there
are products involved. According to our model,¹
\[
\begin{aligned}
E[\log H(x)] &= \sum_{k=1}^{\infty} \frac{1}{2^k} \log H_{2^k}(x) = \sum_{k=1}^{\infty} \frac{\log(3x/2^k)}{2^k} \\
&= \log x + \log 3 - \sum_{k=1}^{\infty} \frac{k \log 2}{2^k} = \log x + \log(3/4) \\
&< \log x.
\end{aligned}
\]
Consequently, iterating H once decreases the size of the expected outcome. Repeated
iterations should continue to decrease. Not only does such an argument
lead to heuristic support for the 3x + 1 conjecture, it also suggests roughly how
many steps one needs to iterate until we reach 1. Since each iteration tends to
replace x with $\tfrac{3}{4}x$, the expected number of iterations should satisfy $(3/4)^m x = 1$;
that is,
\[ m \approx \frac{\log x}{\log(4/3)}. \]
Numerical data strongly supports this rate; see [5, 6] for more on these ideas.
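The drift computation in the heuristic model can be reproduced numerically. In the sketch below (all names are ours), `expected_log_drift` sums the series for E[log H(x)] − log x, `predicted_iterations` evaluates the model's estimate for m, and `odd_steps` counts how often the rule 3x + 1 is actually applied before a seed reaches 1, which is the quantity that one iteration of H is meant to model. No claim is made here about how closely individual seeds follow the prediction.

```python
from math import log

def expected_log_drift(terms=200):
    """Sum over k of log(3 / 2^k) / 2^k, i.e. E[log H(x)] - log x in the model."""
    return sum(log(3 / 2**k) / 2**k for k in range(1, terms + 1))

def predicted_iterations(x):
    """The model's estimate m = log x / log(4/3)."""
    return log(x) / log(4 / 3)

def odd_steps(n):
    """Count applications of 3x + 1 before the orbit of n first reaches 1."""
    count = 0
    while n != 1:  # assumes the orbit reaches 1, as it does for every seed tested so far
        if n % 2 == 0:
            n //= 2
        else:
            n = 3 * n + 1
            count += 1
    return count

print(expected_log_drift(), log(3 / 4))  # the two values agree to rounding error
print(odd_steps(3), odd_steps(21))       # 2 1
```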
The idea of replacing a deterministic problem with a random one is applicable in
many other settings. One can do this with prime numbers to build intuition about
a host of problems. However, one must be careful. Just as the 3x + 1 problem has
some structure that is lost in the conversion to a random model, the actual sequence
of primes has additional structure not present in random analogues. While random
models are useful, they sometimes give the wrong answer in certain regimes.
1 To sum $\sum_{k=1}^{\infty} \frac{k}{2^k}$, differentiate the identity $\sum_{n=0}^{\infty} z^n = (1 - z)^{-1}$, valid for |z| < 1, multiply
the result by z, and obtain $\sum_{n=1}^{\infty} n z^n = z/(1 - z)^2$. Then substitute z = 1/2.
Bibliography
[1] R. K. Guy, Don’t try to solve these problems!, Amer. Math. Monthly 90 (1983), 35–41. http://
www.jstor.org/discover/10.2307/2975688?uid=3739256&uid=2&uid=4&sid=21102550539183.
[2] M. S. Klamkin, Problem 63-13∗ , SIAM Review 5 (1963), 275–276.
[3] L. Halbeisen and N. Hungerbühler, Optimal bounds for the length of rational Collatz cycles,
Acta Arith. 78 (1997), no. 3, 227–239, DOI 10.4064/aa-78-3-227-239. MR1432018
[4] A. V. Kontorovich and S. J. Miller, Benford’s law, values of L-functions and the 3x + 1
problem, Acta Arith. 120 (2005), no. 3, 269–297, DOI 10.4064/aa120-3-4. http://arxiv.org/
pdf/math/0412003v2. MR2188844
[5] J. C. Lagarias, The 3x + 1 problem and its generalizations, Amer. Math. Monthly 92 (1985),
no. 1, 3–23, DOI 10.2307/2322189. MR777565
[6] J. C. Lagarias (ed.), The ultimate challenge: the 3x + 1 problem, American Mathematical
Society, Providence, RI, 2010. MR2663745
[7] J. C. Lagarias and K. Soundararajan, Benford’s law for the 3x + 1 function, J. London Math.
Soc. (2) 74 (2006), no. 2, 289–303, DOI 10.1112/S0024610706023131. http://arxiv.org/pdf/
math/0509175.pdf. MR2269630
[8] H. L. Montgomery and K. Soundararajan, Primes in short intervals, Comm. Math. Phys. 252
(2004), no. 1-3, 589–617, DOI 10.1007/s00220-004-1222-4. MR2104891
[9] The On-Line Encyclopedia of Integer Sequences, A023108 (Positive integers which appar-
ently never result in a palindrome under repeated applications of the function f (x) =
x + (x with digits reversed), http://oeis.org/A023108.
1933
Skewes’s Number
Introduction
For a few decades, Skewes’s number held the record as the largest finite number
to meaningfully appear in a mathematical research paper. Let π(x) denote the
number of primes at most x and let
\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \tag{1933.1} \]
denote the offset logarithmic integral function. One version of the prime number
theorem (see the 1913 and 1919 entries) says that
\[ \lim_{x\to\infty} \frac{\pi(x)}{\operatorname{Li}(x)} = 1. \]
This is illustrated in Figure 1. The logarithmic integral gives a better approximation
to π(x) than x/ log x, which is used in other formulations of the prime number
theorem; see Table 1.
For all practically computable values of x, the function li(x) = Li(x) + 1.045 . . . (the integral in (1933.1) taken from 0 instead of 2, in the principal value sense)
satisfies li(x) > π(x). Based upon overwhelming numerical evidence, it was con-
jectured that this held for all x. In 1914, John Edensor Littlewood (1885–1977)
showed that li(x) − π(x) changes sign infinitely many times. Littlewood asked one
of his students, a South African named Stanley Skewes (1899–1988), to compute
how high one must go to find the first integer $s_0$ for which $\pi(s_0) > \operatorname{li}(s_0)$. Assuming
the truth of the Riemann hypothesis,¹ Skewes proved in 1933 that
\[ s_0 < e^{e^{e^{79}}}. \]
In 1955, he showed that if the Riemann hypothesis is false, then
\[ s_0 < e^{e^{e^{e^{7.705}}}}. \]
Both of these extraordinary numbers are sometimes referred to as Skewes’s number.
While much progress has been made, the best upper bounds on $s_0$ are still on the
order of $e^{728}$ (or about $10^{316}$). It seems hopeless to expect the first sign change to
be found by computer.
Since Skewes’s second bound is larger than the first, we can conclude that
li(x)−π(x) changes sign somewhere before exp(exp(exp(exp(7.705)))). Why? There
are two cases. Either the Riemann hypothesis is true or it is false, and Skewes
covered both cases! Voilà! For another striking example of this sort of “magical”
reasoning, see the 1935 entry.
Are we overlooking a third possibility? Could the Riemann hypothesis (see the
1942 and 1945 entries) be undecidable, say in ZFC (Zermelo–Fraenkel set theory
with the axiom of choice)? If it is false, then it must be provably false in ZFC.
Why? Because it is known to be equivalent, under ZFC, to various elementary
statements about natural numbers. Let
\[ H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n} \]
denote the nth harmonic number. In 2002, Lagarias showed that the statement
\[ \text{“for each } n \ge 1, \quad \sum_{d \mid n} d \le H_n + e^{H_n} \log H_n \text{”} \]
1 The Riemann hypothesis, one of the seven Clay Millennium Problems (see the comments for
the 2000 entry), is one of the most important open problems in mathematics. Its veracity would
have numerous applications throughout number theory and cryptography. It’s going to take a
while to build up to! See below and the entries for 1942, 1945, 1948, 1967, and 1987.
is equivalent to the Riemann hypothesis [3]. Thus, if the Riemann hypothesis (RH)
is false, there is a natural number n for which the preceding inequality is violated
and hence there is a finite computation that disproves the Riemann hypothesis. On
the other hand, if RH is undecidable in ZFC, then it is true (but just not provable
in ZFC; see the 1929 entry on Gödel’s work). Why? If the RH were undecidable in
ZFC, then no natural number n violating Lagarias’s condition exists (the existence
of such an n would lead to a quick proof of the falsehood of the Riemann hypothesis).
Thus, if the RH is undecidable in ZFC, then Lagarias’s condition holds, so the
RH is true (just not provable). See the 1924, 1929, and 1963 entries for more
information on axiom systems, and the 1987 entry for connections between the
Riemann hypothesis and counting primes.
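Lagarias's criterion turns the Riemann hypothesis into a statement that can be tested one integer at a time. The sketch below (our own illustrative code, using floating-point arithmetic) computes the divisor sums with a sieve and looks for a violation of the inequality among n ≤ 5000. Finding none proves nothing, of course; it only shows what a "finite computation that disproves the Riemann hypothesis" would look like.

```python
from math import exp, log

def first_lagarias_violation(limit):
    """Return the least n <= limit with sigma(n) > H_n + exp(H_n) * log(H_n), or None."""
    sigma = [0] * (limit + 1)  # sigma[m] = sum of the divisors of m
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            sigma[m] += d
    harmonic = 0.0
    for n in range(1, limit + 1):
        harmonic += 1.0 / n
        if sigma[n] > harmonic + exp(harmonic) * log(harmonic):
            return n
    return None

print(first_lagarias_violation(5000))  # None
```

Equality holds at n = 1, where both sides equal 1.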
1933: Comments
A proof technique. Skewes’s arguments use a powerful proof technique:
break the problem into an exhaustive set of cases, where in each case you have
additional facts at your disposal. For another example of this approach, see the
1935 entry.
the series involved are absolutely convergent.² This is used implicitly in calculus,
complex variables, and differential equations whenever power series methods are
involved.
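If both series are conditionally convergent (convergent but not absolutely convergent),
then their Cauchy product series can diverge. An example is furnished by
\[ a_n = b_n = \frac{(-1)^n}{\sqrt{n+1}}. \]
The alternating series test confirms that $\sum_{n=0}^{\infty} a_n$ and $\sum_{n=0}^{\infty} b_n$
converge. However,
\[
\begin{aligned}
|c_n| &= \left| \sum_{k=0}^{n} a_k b_{n-k} \right| = \sum_{k=0}^{n} \frac{1}{\sqrt{(k+1)(n-k+1)}} \\
&\ge \sum_{k=0}^{n} \frac{1}{\sqrt{\left(\frac{n}{2}+1\right)^2}} = \sum_{k=0}^{n} \frac{1}{\frac{n}{2}+1} = \sum_{k=0}^{n} \frac{2}{n+2} \\
&= (n+1)\,\frac{2}{n+2} = \frac{2n+2}{n+2}
\end{aligned}
\]
does not tend to zero, so $\sum_{n=0}^{\infty} c_n$ diverges.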
Mertens’s theorem, due to Franz Mertens (1840–1927), ensures that if at least
one of the two series involved is absolutely convergent, then term-by-term
multiplication is permissible. To be more specific, if $\sum_{n=0}^{\infty} a_n = A$ and $\sum_{n=0}^{\infty} b_n = B$
are convergent series of complex numbers, at least one of which is absolutely
convergent, then their Cauchy product series $\sum_{n=0}^{\infty} c_n$ converges to AB. Proving
Mertens’s theorem is a good exercise in analysis. Here is a sketch. Let $A_n$, $B_n$,
and $C_n$ be the nth partial sums of the three series involved and consider the identity
$C_n = A_n B + \sum_{i=0}^{n} (B_i - B) a_{n-i}$. Since $A_n \to A$, the key is to show that
$\sum_{i=0}^{n} (B_i - B) a_{n-i} \to 0$ as n → ∞.
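The divergent Cauchy product described above can be watched directly. This sketch (function names ours; floating-point arithmetic) computes the Cauchy product terms for the conditionally convergent example with terms (−1)^n/√(n + 1) and compares their absolute values with the lower bound (2n + 2)/(n + 2).

```python
from math import sqrt

def a(k):
    """Terms of the conditionally convergent series: (-1)^k / sqrt(k + 1)."""
    return (-1) ** k / sqrt(k + 1)

def cauchy_term(n):
    """c_n = sum over k of a_k * a_{n-k}, the nth term of the Cauchy product."""
    return sum(a(k) * a(n - k) for k in range(n + 1))

for n in (0, 1, 10, 100, 1000):
    # |c_n| stays at or above the bound, so c_n cannot tend to zero
    print(n, abs(cauchy_term(n)), (2 * n + 2) / (n + 2))
```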
The Riemann zeta function and the Euler product formula. In homage
to Riemann, who wrote s = σ + it to denote his complex variable, we follow him
and use the letter s below to refer to a complex number. The Riemann hypothesis
concerns the location of the complex zeros of the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}, \tag{1933.2} \]
which is defined initially for Re s > 1. It might at first appear strange to call
(1933.2) by such a fancy name. Indeed, (1933.2) is the familiar p-series from
calculus. However, the Riemann zeta function is the critical function that links
analysis and number theory. In particular, the deepest properties of the prime
numbers are encoded in the Riemann zeta function.
The connection between the innocuous looking Riemann zeta function and the
prime numbers is furnished by the Euler product formula. If Re s > 1, then
\[ \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} \left(1 - \frac{1}{p^s}\right)^{-1}. \tag{1933.3} \]
2 A series $\sum_{n=0}^{\infty} a_n$ is absolutely convergent if $\sum_{n=0}^{\infty} |a_n|$ converges. Absolute convergence
implies convergence, but the converse is not true. The alternating harmonic series $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$
converges to log 2, but the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges.
Since quite a few of our entries (1928, 1942, 1945, 1967, and 1987) involve the
Riemann zeta function, we can take the liberty to develop the topic slowly and
deliberately.
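If p is a fixed prime number and Re s > 1, then the series
\[ \sum_{n=0}^{\infty} \frac{1}{(p^n)^s} = \sum_{n=0}^{\infty} \frac{1}{p^{ns}} = \sum_{n=0}^{\infty} \left(\frac{1}{p^s}\right)^{n} = \left(1 - \frac{1}{p^s}\right)^{-1} \]
converges absolutely since $|1/p^s| < 1$. By Mertens’s theorem,
\[
\begin{aligned}
\left(1 - \frac{1}{2^s}\right)^{-1} \left(1 - \frac{1}{3^s}\right)^{-1}
&= \left(1 + \frac{1}{2^s} + \frac{1}{4^s} + \frac{1}{8^s} + \cdots\right) \left(1 + \frac{1}{3^s} + \frac{1}{9^s} + \frac{1}{27^s} + \cdots\right) \\
&= 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \frac{1}{6^s} + \frac{1}{8^s} + \frac{1}{9^s} + \frac{1}{12^s} + \cdots,
\end{aligned}
\]
in which the last sum includes terms corresponding exactly to those numbers whose
prime factorizations involve only 2 or 3. Since Re s > 1, the preceding series is
absolutely convergent. Similarly,
\[
\begin{aligned}
&\left(1 - \frac{1}{2^s}\right)^{-1} \left(1 - \frac{1}{3^s}\right)^{-1} \left(1 - \frac{1}{5^s}\right)^{-1} \\
&\qquad = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \frac{1}{5^s} + \frac{1}{6^s} + \frac{1}{8^s} + \frac{1}{9^s} + \frac{1}{10^s} + \frac{1}{12^s} + \frac{1}{15^s} + \cdots,
\end{aligned}
\]
in which the sum involves those numbers whose only prime factors are 2, 3, or 5,
and so forth. Since the tail end of a convergent series tends to zero,
\[ \left| \sum_{n=1}^{\infty} \frac{1}{n^s} - \prod_{\substack{p \text{ prime} \\ p \le N}} \left(1 - \frac{1}{p^s}\right)^{-1} \right| \le \sum_{n=N}^{\infty} \left| \frac{1}{n^s} \right| \to 0 \]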
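The Euler product formula (1933.3) can be checked numerically at s = 2, where the common value is known to be π²/6 (a classical fact not derived in this entry). The sketch below, with illustrative function names of our own, compares a partial sum of the series with a partial product over primes.

```python
from math import pi

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes p <= n."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, n + 1, p):
                sieve[m] = False
    return [p for p in range(n + 1) if sieve[p]]

def zeta_partial_sum(s, terms):
    """Sum of 1/n^s for n = 1, ..., terms."""
    return sum(1 / n ** s for n in range(1, terms + 1))

def euler_partial_product(s, bound):
    """Product of (1 - p^(-s))^(-1) over primes p <= bound."""
    product = 1.0
    for p in primes_up_to(bound):
        product *= 1 / (1 - p ** (-s))
    return product

print(zeta_partial_sum(2, 100_000))      # close to pi^2 / 6
print(euler_partial_product(2, 10_000))  # close to pi^2 / 6
print(pi ** 2 / 6)
```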
Bibliography
[1] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics,
Springer-Verlag, New York-Heidelberg, 1976. MR0434929
[2] H. Davenport, Multiplicative number theory, 3rd ed., revised and with a preface by Hugh
L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000.
MR1790423
[3] J. C. Lagarias, An elementary problem equivalent to the Riemann hypothesis, Amer. Math.
Monthly 109 (2002), no. 6, 534–543, DOI 10.2307/2695443. MR1908008
[4] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[5] S. Skewes, On the Difference pi(x) − li(x) (I), J. London Math. Soc. 8 (1933), no. 4, 277–283,
DOI 10.1112/jlms/s1-8.4.277. MR1573970
[6] S. Skewes, On the difference π(x) − li x. II, Proc. London Math. Soc. (3) 5 (1955), 48–70, DOI
10.1112/plms/s3-5.1.48. MR0067145
1934
Khinchin’s Constant
Introduction
Each irrational real number x has a unique infinite continued fraction expansion
\[ x = a_0(x) + \cfrac{1}{a_1(x) + \cfrac{1}{a_2(x) + \cfrac{1}{a_3(x) + \cdots}}}, \]
in which the ai (x) are the continued fraction digits of x and a1 (x), a2 (x), . . . are
positive integers (see the 1931 and 1972 entries or [6] and the references therein for
more details). For instance,
\[ \pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \cdots}}}}, \]
which we write as
π = [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1,
84, 2, 1, 1, 15, 3, 13, 1, 4, 2, 6, 6, 99, 1, 2, 2, 6, 3, 5, 1, 1, 6, 8, 1,
7, 1, 2, 3, 7, 1, 2, 1, 1, 12, 1, 1, 1, 3, 1, 1, 8, 1, 1, 2, 1, 6, 1, 1, 5,
2, 2, 3, 1, 2, 4, 4, 16, 1, 161, 45, 1, 22, 1, 2, 2, 1, 4, 1, 2, 24,. . . ].
Truncating this expansion after a few steps provides excellent rational approxima-
tions to π:
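\[
\begin{aligned}
3 + \frac{1}{7} &= \frac{22}{7} = 3.142857\ldots, \\
3 + \cfrac{1}{7 + \cfrac{1}{15}} &= \frac{333}{106} = 3.141509\ldots, \quad \text{and} \\
3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1}}} &= \frac{355}{113} = 3.141592\ldots.
\end{aligned}
\]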
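The rational approximations obtained by truncating a continued fraction can be generated mechanically. The following sketch (function name ours) evaluates each truncation from the inside out with exact fractions, using the first few digits of π listed above.

```python
from fractions import Fraction

def convergents(digits):
    """Return the truncations [a0], [a0; a1], [a0; a1, a2], ... as exact fractions."""
    result = []
    for n in range(1, len(digits) + 1):
        value = Fraction(digits[n - 1])
        for a in reversed(digits[: n - 1]):  # work outward from the innermost term
            value = a + 1 / value
        result.append(value)
    return result

for c in convergents([3, 7, 15, 1]):
    print(c, float(c))
```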
mean of the first n digits in the continued fraction expansion of x converges to the
same constant K as n → ∞:
\[ \lim_{n\to\infty} \sqrt[n]{a_1(x) a_2(x) \cdots a_n(x)} = K. \tag{1934.1} \]
That means that, for every ε > 0, the set of real numbers x for which (1934.1) fails
can be covered by countably many open intervals of total length < ε. The constant
K is called Khinchin’s constant; it is given by
\[ K = \prod_{r=1}^{\infty} \left(1 + \frac{1}{r(r+2)}\right)^{\log_2 r} = 2.6854520010653064453\ldots. \]
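The infinite product for K converges slowly, but a direct numerical evaluation still recovers the first few digits. The sketch below (function name ours) sums the logarithms of the first 200,000 factors; it is meant as a plausibility check, not as an efficient way to compute K.

```python
from math import exp, log, log2

def khinchin_partial(terms):
    """Partial product over r = 1, ..., terms, computed via logarithms."""
    log_k = sum(log2(r) * log(1 + 1 / (r * (r + 2))) for r in range(1, terms + 1))
    return exp(log_k)

print(khinchin_partial(200_000))  # slightly below 2.68545...
```

Every factor is at least 1, so the partial products increase toward K from below.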
1934: Comments
Solution to the problem. A quadratic irrational has a continued fraction
expansion that is eventually periodic (try to prove it). Thus, the geometric means
of its continued fraction digits converge to the ℓth root of a product of integers, in
which ℓ denotes the length of the period. Consequently, the limit of the geometric
means is either rational or algebraic irrational, and hence not transcendental.
This solves the proposed problem.
Bibliography
[1] B. C. Berndt, H. H. Chan, S.-S. Huang, S.-Y. Kang, J. Sohn, and S. H. Son, The
Rogers-Ramanujan continued fraction, Continued fractions and geometric function the-
ory (CONFUN) (Trondheim, 1997), J. Comput. Appl. Math. 105 (1999), no. 1-2, 9–
24, DOI 10.1016/S0377-0427(99)00033-3. http://www.sciencedirect.com/science/article/
pii/S0377042799000333. MR1690576
[2] C. L. Frenzen, A New Elementary Proof of Stirling’s Formula, Math. Mag. 68 (1995), no. 1,
55–58. https://www.maa.org/sites/default/files/269138004440.pdf. MR1573069
[3] A. Khintchine, Metrische Kettenbruchprobleme (German), Compositio Math. 1 (1935),
361–382. http://archive.numdam.org/ARCHIVE/CM/CM_1935__1_/CM_1935__1__361_0/
CM_1935__1__361_0.pdf. MR1556899
[4] A. Ya. Khinchin, Continued fractions, The University of Chicago Press, Chicago, Ill.-London,
1964. MR0161833
[5] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton
Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[6] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
1935
Hilbert’s Seventh Problem
Introduction
Our problem collection is inspired, as are so many other collections, by the
problems David Hilbert proposed in his keynote address at the International Con-
gress of Mathematicians in 1900; see [1]. These problems were meant to chart
important directions for research in the 20th century. A solution to any of Hilbert’s
problems brings instant fame and membership in “The Honors Class” [3].
Here is a curious warmup to one of Hilbert’s problems. We claim that there
are irrational numbers α and β so that $\alpha^{\beta}$ is rational. To show this, we consider
\[ \gamma = \sqrt{2}^{\sqrt{2}}. \]
You may use the Gelfond–Schneider theorem. Describe the union on the right-hand
side of (1935.1) and investigate the algebraic structure of $B_{\gamma}$.
(b) Cantor: Use the fundamental theorem of algebra to prove that the set of
algebraic numbers is countable (see the footnote on p. 31 for an outline of the
proof). Since R is uncountable, this shows that almost all real numbers are
transcendental. Although this argument proves that almost all numbers are
transcendental, it does not provide an explicit example of a transcendental
number.
(c) Liouville: Suppose α is an algebraic number of degree d > 1 (see p. 30 for the
definition). Liouville’s theorem asserts that there exists a positive constant
C(α) such that for any rational number a/b,
\[ \left| \alpha - \frac{a}{b} \right| > \frac{C(\alpha)}{b^d}. \tag{1935.2} \]
We say α ∈ R is a Liouville number if for every positive integer n there are
integers a and b with b > 1 such that
\[ 0 < \left| \alpha - \frac{a}{b} \right| < \frac{1}{b^n}. \]
The result above implies that all Liouville numbers are transcendental; how-
ever, not all transcendental numbers are Liouville numbers. Show that the
set of Liouville numbers in the interval [−1, 1] has measure zero.1 See the
notes below for a proof of Liouville’s theorem and the explicit construction of
a transcendental number.
1 That is, for every ε > 0, the set of Liouville numbers in [−1, 1] can be covered by countably
many open intervals of total length < ε.
1935: Comments
Proof of Liouville’s theorem. Suppose that α ∈ R is a root of
\[ f(x) = c_d x^d + c_{d-1} x^{d-1} + \cdots + c_1 x + c_0, \]
in which the coefficients are integers and $c_d \neq 0$. Since f has only finitely many
roots, there is a δ > 0 so that f(x) ≠ 0 whenever 0 < |x − α| ≤ δ. Write
f (x) = (x−α)g(x), in which g is a polynomial of degree d−1. Since g is continuous,
there is an M > 0 such that |g(x)| ≤ M for |x − α| ≤ δ.
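Suppose that a, b ∈ Z, b > 1, and 0 < |α − a/b| ≤ δ. Then g(a/b) ≠ 0 and
hence
\[ \frac{a}{b} - \alpha = \frac{f(a/b)}{g(a/b)} = \frac{c_d \left( \frac{a^d}{b^d} \right) + \cdots + c_1 \left( \frac{a}{b} \right) + c_0}{g(a/b)} = \frac{c_d a^d + c_{d-1} a^{d-1} b + \cdots + c_0 b^d}{b^d \, g(a/b)}. \]
The numerator is an integer which is nonzero since f(a/b) ≠ 0, so
\[ \left| \alpha - \frac{a}{b} \right| \ge \frac{1}{M b^d} \]
whenever 0 < |α − a/b| ≤ δ. On the other hand, if |α − a/b| > δ, then
\[ \left| \alpha - \frac{a}{b} \right| > \frac{\delta}{b^d} \]
since b ≥ 1. Consequently, 0 < C(α) < min{δ, 1/M} ensures that
\[ \left| \alpha - \frac{a}{b} \right| > \frac{C(\alpha)}{b^d}. \]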
This concludes the proof of Liouville’s theorem.
is transcendental; this number was “cooked up” exactly for this purpose. It is
irrational since its decimal expansion is not eventually repeating. Thus, if λ is
algebraic, its degree is at least 2. So suppose toward a contradiction that λ is an
algebraic number of degree d ≥ 2. If n > d, then consider the nth partial sum
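\[ \sum_{j=1}^{n} \frac{1}{10^{j!}} = \frac{a}{b} \]
of the series defining λ. Putting things over a common denominator, we find that
the preceding is a rational number with denominator $b = 10^{n!}$. Thus,
\begin{align*}
\left| \lambda - \frac{a}{b} \right|
&= \frac{1}{10^{(n+1)!}} + \frac{1}{10^{(n+2)!}} + \frac{1}{10^{(n+3)!}} + \cdots \\
&= \frac{1}{10^{(n+1)!}} \left( 1 + \frac{1}{10^{(n+2)!-(n+1)!}} + \frac{1}{10^{(n+3)!-(n+1)!}} + \cdots \right) \\
&\le \frac{1}{10^{(n+1)!}} \left( 1 + \frac{1}{10^{(n+1)!(n+1)}} + \frac{1}{10^{(n+1)!(n+2)}} + \cdots \right) \\
&< \frac{1}{10^{(n+1)!}} \left( 1 + \frac{1}{10} + \frac{1}{10^2} + \cdots \right)
 = \frac{1}{10^{(n+1)!}} \cdot \frac{1}{1 - \frac{1}{10}} \\
&< \frac{2}{10^{(n+1)!}}.
\end{align*}
Liouville’s theorem ensures that for n > d,
\[ 0 < \frac{C(\lambda)}{b^d} = \frac{C(\lambda)}{10^{n!\,d}} < \left| \lambda - \frac{a}{b} \right| < \frac{2}{10^{(n+1)!}} \]
and hence
\[ 0 < \frac{C(\lambda)}{2} < \frac{10^{n!\,d}}{10^{(n+1)!}} = \frac{10^{n!\,d}}{10^{n!\,(n+1)}} = 10^{n!\,(d-n-1)} \to 0 \]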
as n → ∞. This is a contradiction, so λ must be transcendental.
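The estimates in this argument can be checked with exact rational arithmetic. The sketch below assumes that λ is the sum of the series with terms 1/10^{j!} (as the partial sums above indicate) and uses the six-term partial sum as a stand-in for λ itself; the names `partial_sum` and `check` are illustrative.

```python
from fractions import Fraction
from math import factorial

def partial_sum(n):
    """Exact n-th partial sum of the series with terms 1 / 10^(j!)."""
    return sum(Fraction(1, 10 ** factorial(j)) for j in range(1, n + 1))

# Assumption: six terms stand in for lambda; the neglected tail is below 10^(-5040).
REFERENCE = partial_sum(6)

def check(n):
    """Return (denominator is 10^(n!), error < 2/10^((n+1)!), error < 1/b^n)."""
    s = partial_sum(n)
    b = s.denominator
    error = REFERENCE - s
    return (
        b == 10 ** factorial(n),
        0 < error < Fraction(2, 10 ** factorial(n + 1)),
        error < Fraction(1, b ** n),
    )

for n in range(1, 5):
    print(n, check(n))
```

The last entry of each triple is the defining inequality of a Liouville number for that n, with a/b the nth partial sum.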
Bibliography
[1] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.
com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-
10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[2] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[3] B. H. Yandell, The honors class: Hilbert’s problems and their solvers, A K Peters, Ltd., Natick,
MA, 2002. MR1880187
[4] Wikipedia, Gelfond–Schneider theorem, https://en.wikipedia.org/wiki/Gelfond-
Schneider_theorem.
[5] Wikipedia, Liouville number, https://en.wikipedia.org/wiki/Liouville_number.
1936
Alan Turing
Introduction
Besides cracking codes at Bletchley Park during World War II and pioneering
the field of artificial intelligence, Alan Turing (1912–1954) might be best known
for his eponymous model of computation, the Turing machine (see Figure 1). The
machine features an infinite tape partitioned into squares and a moving head that
overlooks a single square at each moment in time. Squares start out blank but can
also contain symbols from a finite alphabet. The head can read symbols from and
write symbols to the tape. It also occupies one of n states-of-mind, which we simply
call states. These states serve as the machine’s memory. Computation occurs as
follows: the head reads a symbol from its current square, writes a new symbol to
the square (it might be the same symbol or a blank), and moves either to the left
or to the right while also (potentially) changing its state. The alphabet, states, and
transition rules constitute a finite description of a Turing machine.
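The description above translates almost directly into code. The following is a minimal sketch, not Turing's own formalism: the rule-table layout, the function name `run_turing_machine`, the explicit halting state, and the bit-flipping example machine are all assumptions made for illustration.

```python
def run_turing_machine(rules, tape_input, start_state, halt_state,
                       blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine and return the final tape contents.

    rules maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is "L" or "R".
    """
    # A dictionary models the unbounded tape; missing squares are blank.
    tape = {i: s for i, s in enumerate(tape_input)}
    head, state, steps = 0, start_state, 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = tape.get(head, blank)
        new_symbol, move, state = rules[(state, symbol)]
        tape[head] = new_symbol
        head += 1 if move == "R" else -1
        steps += 1
    if not tape:
        return ""
    cells = [tape.get(i, blank) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(blank)

# Hypothetical example machine: flip every bit, then halt at the first blank.
FLIP_RULES = {
    ("scan", "0"): ("1", "R", "scan"),
    ("scan", "1"): ("0", "R", "scan"),
    ("scan", "_"): ("_", "R", "halt"),
}

print(run_turing_machine(FLIP_RULES, "0110", "scan", "halt"))
```

The step limit is a practical safeguard only; a genuine Turing machine has no such bound.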
In [4], Turing defined a universal machine, one that can take the description of
another Turing machine as input and then simulate that Turing machine. It is the
first example of the now ubiquitous virtual machine. Turing also used his machine
to define computable numbers, which are real numbers whose decimal values can
be written down successively, with each additional digit appearing after a finite
number of steps. These machines do not halt, but they always make progress. Most
[Figure 1. A Turing machine: a moving tape whose squares hold the symbols S1, S2, . . . , S8.]
1936: Comments
Turing and Enigma. It would be a disservice of the highest rank not to
mention the valuable work Turing and his colleagues performed for the British
government in cracking the German Enigma encryption; see Figures 2 and 3. To put
these contributions in perspective, estimates of their worth range from shortening
the war by two to four years, to turning the tide to an Allied victory. Turing was
one of the driving forces in cracking the supposedly uncrackable codes. For more
on these efforts see the 1943 entry.
Much of Turing’s work during the war was classified and was kept classified for
a variety of reasons afterwards. However, with the passage of time the need for such
security lessened, and much of his work is now publicly available; see [5]. Sadly,
Turing, who was a homosexual, was prosecuted for “gross indecency” and forced to
undergo chemical castration. He committed suicide at the age of 41. Speaking in
represented in ASCII. For example, the string ASCII corresponds (in decimal) to
65 83 67 73 73 and (in binary) to
01000001 01010011 01000011 01001001 01001001.
Each symbol is represented by eight bits, that is, a sequence of eight 0’s and 1’s.
These transmitted segments can be augmented to ensure a more accurate trans-
mission. For instance, the seven bits 0100110 that Alice wants to send might be
augmented to 01001101. The additional 1 is a checksum bit; it means that there
is an odd number of ones in 0100110. If Bob receives 01001100, then he knows
an error has occurred and he can request that Alice resend the block. There are,
of course, many more effective and fascinating error-detecting methods that have
been developed over the years.
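The ASCII encoding and the parity check described above can be reproduced in a few lines. This is a sketch under the convention used in the example, namely that the extra bit makes the total number of ones even; the function names are illustrative.

```python
def to_ascii_bits(text):
    """Encode each character as eight bits of its ASCII code."""
    return " ".join(format(ord(c), "08b") for c in text)

def add_parity_bit(bits):
    """Append a bit so that the total number of ones becomes even."""
    return bits + str(bits.count("1") % 2)

def passes_parity_check(block):
    """A block is accepted when it contains an even number of ones."""
    return block.count("1") % 2 == 0

print(to_ascii_bits("ASCII"))
print(add_parity_bit("0100110"))        # the block Alice sends
print(passes_parity_check("01001101"))  # received intact
print(passes_parity_check("01001100"))  # one bit flipped in transit
```

A single parity bit detects any odd number of flipped bits but cannot detect an even number of them, and it cannot say which bit is wrong.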
If Alice and Bob share a common key beforehand, then there are many methods
they can use to encrypt their data. For instance, the National Institute of Standards
and Technology (NIST) adopted the Data Encryption Standard (DES) in 1976 and
the Advanced Encryption Standard (AES) in 2001. Since this is our first expedition
into cryptography, we discuss a simple technique that dates back to antiquity: the
Caesar cipher.
For the sake of readability and simplicity, we do not consider a blank space as
a character. Alice replaces each letter in the plaintext
HERE IS A MESSAGE ENCRYPTED WITH THE CAESAR
CIPHER USING THE KEY FIVE.
with the letter that occurs k places after it (with “wraparound”). We say that k is
the key that is used to encrypt the message. With k = 5, Alice sends the ciphertext
MJWJ NXFR JXXF LJJS HWDU YJIB NYMY MJHF JXFW
HNUM JWZX NSLY MJPJ DKNA JPMN.
to Bob. Observe that Alice padded the ciphertext with nonsense to ensure that the
blocks are of uniform size. For Alice and Bob to use the Caesar cipher, they must
first share the key k (see the 1977 entry for an encryption method that eliminates
the need to share a secret key before communicating).
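Alice's encryption step can be sketched as follows. The function names and the choice to discard everything except letters are assumptions for illustration; the padding that Alice adds to fill the last block is left out.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar_encrypt(plaintext, key):
    """Shift each letter key places forward, wrapping around after Z."""
    letters = [c for c in plaintext.upper() if c in ALPHABET]
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % 26] for c in letters)

def in_blocks(text, size=4):
    """Group the text into blocks of the given size for readability."""
    return " ".join(text[i:i + size] for i in range(0, len(text), size))

PLAINTEXT = "HERE IS A MESSAGE ENCRYPTED WITH THE CAESAR CIPHER USING THE KEY FIVE."
print(in_blocks(caesar_encrypt(PLAINTEXT, 5)))
```

The output agrees with the ciphertext above except for the three padding letters at the end.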
Eve can use frequency analysis to decipher an intercepted message, even though
she does not know k. For example, the letter E is the most common letter in English;
see Table 1. The uncommon letter J occurs often in the ciphertext, which suggests
that E is replaced by J. Thus, Eve guesses that k = 5 (the distance between E and
J) and obtains the plaintext message.
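Eve's guess can be automated. The sketch below assumes, as in the text, that the most frequent ciphertext letter stands for E; on short messages this heuristic can fail, so in practice Eve would try several candidate keys. The function names are illustrative.

```python
from collections import Counter

CIPHERTEXT = ("MJWJ NXFR JXXF LJJS HWDU YJIB NYMY MJHF JXFW "
              "HNUM JWZX NSLY MJPJ DKNA JPMN")

def guess_key(ciphertext):
    """Assume the most common ciphertext letter is the encryption of E."""
    letters = [c for c in ciphertext if c.isalpha()]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ord(most_common) - ord("E")) % 26

def caesar_decrypt(ciphertext, key):
    """Shift each letter key places backward, wrapping around before A."""
    return "".join(chr((ord(c) - ord("A") - key) % 26 + ord("A"))
                   for c in ciphertext if c.isalpha())

key = guess_key(CIPHERTEXT)
print(key, caesar_decrypt(CIPHERTEXT, key))
```

The last three letters of the decrypted text are the decrypted padding and carry no meaning.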
As its name suggests, this method of encryption was used by Julius Caesar
(100–44 BCE). Although the Caesar cipher is easily broken, in a time when most
of the population was illiterate and mathematically unsophisticated, it provided
adequate security. The following was encrypted with the Caesar cipher.
HSSN HBSP ZKPC PKLK PUAV AOYL LWHY AZVU LVMD OPJO AOLI
LSNH LPUO HIPA AOLH XBPA HUPH UVAO LYAO VZLD OVPU AOLP
YVDU SHUN BHNL HYLJ HSSL KJLS AZPU VBYN HBSZ AOLA OPYK
HSSA OLZL KPMM LYMY VTLH JOVA OLYP USHU NBHN LJBZ AVTZ
HUKS HDZA OLYP CLYN HYVU ULZL WHYH ALZA OLNH BSZM YVTA
OLHX BPAH UPAO LTHY ULHU KAOL ZLPU LZLW HYHA LAOL TMYV
TAOL ILSN HLXY
Use frequency analysis to determine possible keys and then decipher the message.
See below for the answer.
Bibliography
[1] G. Boolos and R. Jeffrey, Computability and Logic (third edition), Cambridge University Press,
1999.
[2] K. Gödel, Undecidable Diophantine propositions, in Collected Works III (from the 1930s),
164–175.
[3] T. Radó, On non-computable functions, Bell System Tech. J. 41 (1962), 877–884, DOI
10.1002/j.1538-7305.1962.tb00480.x. MR0133229
[4] A. M. Turing, On Computable Numbers, with an Application to the Entscheidungsprob-
lem, Proc. London Math. Soc. (2) 42 (1936), no. 3, 230–265, DOI 10.1112/plms/s2-
42.1.230. https://academic.oup.com/plms/article-abstract/s2-42/1/230/1491926?
redirectedFrom=fulltext. MR1577030
[5] A. M. Turing, The Applications of Probability to Cryptography, http://arxiv.org/pdf/1505.
04714v2.pdf.
Answer: The key is 7. The message is: “All Gaul is divided into three parts, one
of which the Belgae inhabit, the Aquitani another, those who in their own language
are called Celts, in our Gauls, the third. All these differ from each other in language,
customs and laws. The river Garonne separates the Gauls from the Aquitani; the Marne
and the Seine separate them from the Belgae.” These are the famous opening lines of
Julius Caesar’s Commentarii de Bello Gallico (Commentary on the Gallic War). The
final XY in the ciphertext is padding.
[6] C. Teuscher (ed.), Alan Turing: life and legacy of a great thinker, papers from the Conference
“Turing Day: Computing Science 90 Years from the Birth of Alan Mathison Turing” held
at the École Polytechnique Fédérale de Lausanne, Lausanne, June 28, 2002, Springer-Verlag,
Berlin, 2004. MR2106942
1937
Vinogradov’s Theorem
Introduction
Although we normally view primes from a multiplicative perspective, there are
many interesting additive questions to investigate. A famous conjecture, due to
Christian Goldbach (1690–1764), is that every even number greater than four is
the sum of two primes; see Figure 1. This is the binary Goldbach conjecture; it
is significantly harder than the ternary Goldbach conjecture: every odd number at
least seven is the sum of three primes.
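Both statements are easy to test on small numbers. The sketch below checks them up to 2000, a bound chosen arbitrarily for illustration; such a check is of course no substitute for a proof. The function names are assumptions.

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit (limit >= 1)."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

def two_primes(n, primes, prime_set):
    """Return primes (p, q) with p + q = n, or None if there are none."""
    for p in primes:
        if p > n // 2:
            break
        if n - p in prime_set:
            return (p, n - p)
    return None

def three_primes(n, primes, prime_set):
    """Return primes (p, q, r) with p + q + r = n, or None if there are none."""
    for p in primes:
        if p > n:
            break
        pair = two_primes(n - p, primes, prime_set)
        if pair is not None:
            return (p,) + pair
    return None

LIMIT = 2000  # assumption: small bound chosen for a quick check
PRIMES = primes_up_to(LIMIT)
PRIME_SET = set(PRIMES)

binary_ok = all(two_primes(n, PRIMES, PRIME_SET) for n in range(6, LIMIT + 1, 2))
ternary_ok = all(three_primes(n, PRIMES, PRIME_SET) for n in range(7, LIMIT + 1, 2))
print(binary_ok, ternary_ok)
```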
A major advance towards the proof of the ternary conjecture was made by Ivan
Matveyevich Vinogradov (1891–1983) in 1937, who proved that there is a constant
C such that every odd number at least C is the sum of three primes. Thus, the
ternary Goldbach conjecture is reduced to a finite computation: show that every
odd number less than C is a sum of three primes. Unfortunately, the value of C
produced by Vinogradov’s proof is too large for practical computation: it was over
$10^{1000}$.
In 2013, the ternary Goldbach conjecture was proved by Harald Andrés Helfgott
(1977– ), who brought C down to $10^{27}$, well within the range checkable by
computers. These approaches all use the circle method (see the 1920 and 1923
1937: Comments
Recent developments. The Hardy–Littlewood k-tuple conjecture (see the
1923 entry) implies that for every even number n there is a positive constant $C_n$,
which can be explicitly written down in terms of functions of the prime factors of
n, such that the number of pairs of primes of the form (p, p + n) with p ≤ x is
asymptotic to $C_n x/(\log x)^2$. In particular, for each even n the conjecture predicts
infinitely many pairs of primes (p, p + n). Although this has not been proved for
any n, the landscape has changed dramatically in recent years. In 2013, Yitang
Zhang proved that there is some n ≤ 70,000,000 for which there are infinitely many
pairs of primes (p, p + n). Subsequent work has lowered seventy million to 246; see
the 1919 entry. Also see the comments for the 2005 entry for information about
the more general Bateman–Horn conjecture.
That is, there are approximately twice as many primes in the interval (0, 2n) as
there are in the interval (0, n). But there is a simpler proof that does not rely on
such heavy machinery.
In 1932, the nineteen-year-old Paul Erdős (see the 1913 entry) gave a beautiful
elementary proof of Bertrand’s postulate [2]. Our presentation is based upon that
given in [1]. Erdős first obtained the estimate
\[ \prod_{p \le x} p \le 4^{x-1} \]
for real x ≥ 2; here and henceforth, the subscript p refers to a prime number. He
next examined the prime divisors of the central binomial coefficient
\[ \binom{2n}{n} = \frac{(2n)!}{(n!)^2}. \tag{1937.2} \]
Erdős then showed that no prime power dividing (1937.2) exceeds 2n, that primes
$p > \sqrt{2n}$ appear at most once in the factorization of (1937.2), and that primes p
satisfying $\frac{2}{3} n < p \le n$ do not divide (1937.2) at all. This last remark is the key
to his argument. To see why it is true, observe that 3p > 2n for n, p ≥ 3 implies
that p and 2p are the only multiples of p among the factors 1, 2, . . . , 2n of (2n)!,
while p divides $(n!)^2$ exactly twice. Consequently,
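\[ \frac{4^n}{2n} \le \binom{2n}{n} \le \prod_{p \le \sqrt{2n}} 2n \cdot \prod_{\sqrt{2n} < p \le \frac{2}{3} n} p \cdot \prod_{n < p \le 2n} p, \]
the lower bound being obtained by noting that (1937.2) is the largest of the
2n + 1 terms in the binomial expansion of $(1 + 1)^{2n} = 2^{2n} = 4^n$; grouping the first
and last terms, whose sum 2 does not exceed (1937.2), leaves 2n terms. Suppose toward a
contradiction that there is no prime in the interval (n, 2n]. Then,
\[ 4^n \le (2n)^{1+\sqrt{2n}} \prod_{\sqrt{2n} < p \le \frac{2}{3} n} p \le (2n)^{1+\sqrt{2n}} \, 4^{\frac{2}{3} n}, \]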
which is false for n ≥ 468 (the original argument gives n ≥ 4,000; we have used
modern computation). This reduces the proof of Bertrand’s postulate to a finite
computation, which is easily accomplished.
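The finite computation can be sketched as follows: the code locates the last n for which the displayed inequality still holds (comparing logarithms in floating point) and then checks Bertrand's postulate directly for every n up to that point. The search limit and the function names are assumptions made for illustration.

```python
import math

def inequality_holds(n):
    """Check 4^n <= (2n)^(1 + sqrt(2n)) * 4^(2n/3) by comparing logarithms."""
    left = n * math.log(4)
    right = (1 + math.sqrt(2 * n)) * math.log(2 * n) + (2 * n / 3) * math.log(4)
    return left <= right

def is_prime(m):
    """Trial division; adequate for the small numbers needed here."""
    if m < 2:
        return False
    return all(m % d for d in range(2, math.isqrt(m) + 1))

def has_prime_between(n):
    """Is there a prime p with n < p <= 2n?"""
    return any(is_prime(p) for p in range(n + 1, 2 * n + 1))

SEARCH_LIMIT = 10_000  # assumption: comfortably past the point where the inequality fails
last_n = max(n for n in range(1, SEARCH_LIMIT) if inequality_holds(n))
verified = all(has_prime_between(n) for n in range(1, last_n + 1))
print(last_n, verified)
```

Beyond `last_n` the contradiction in the argument applies, so the direct check only needs to cover the remaining small cases.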
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the
1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin,
2018. MR3823190
[2] P. Erdős, Beweis eines Satzes von Tschebyschef, Acta Sci. Math. (Szeged) 5 (1930-2), 194–
198.
[3] D. Goldston, Zhang’s theorem on bounded gaps between primes, http://www.aimath.org/
news/primegaps70m/.
[4] H. Helfgott, Major arcs for Goldbach’s theorem, http://arxiv.org/abs/1305.2897.
[5] H. Helfgott, The ternary Goldbach conjecture, http://valuevar.wordpress.com/2013/07/
02/the-ternary-goldbach-conjecture/.
[6] D. H. J. Polymath, New equidistribution estimates of Zhang type, Algebra Number Theory
8 (2014), no. 9, 2067–2199, DOI 10.2140/ant.2014.8.2067. MR3294387
[7] D. H. J. Polymath, Variants of the Selberg sieve, and bounded intervals containing many
primes, Res. Math. Sci. 1 (2014), Art. 12, 83, DOI 10.1186/s40687-014-0012-7. MR3373710
[8] Terence Tao, Online reading seminar for Zhang’s “bounded gaps between primes”, http://
terrytao.wordpress.com/2013/06/04/online-reading-seminar-for-zhangs-bounded-
gaps-between-primes/.
[9] I. M. Vinogradov, The method of trigonometrical sums in the theory of numbers, translated
from the Russian, revised and annotated by K. F. Roth and Anne Davenport; reprint of the
1954 translation, Dover Publications, Inc., Mineola, NY, 2004. MR2104806
[10] Y. Zhang, Bounded gaps between primes, Ann. of Math. (2) 179 (2014), no. 3, 1121–1174,
DOI 10.4007/annals.2014.179.3.7. MR3171761
1938
Benford’s Law
Introduction
Calculate the first N Fibonacci numbers
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, . . .
for as large an N as you can. For d = 1, 2, . . . , 9, what percentage have first digit d?
While it is natural to think that each digit is equally likely to occur, this guess is
totally wrong. For N sufficiently large, about 30% of the first N Fibonacci numbers
start with a 1, while only about 4.5% begin with a 9. In general, the probability
of the first digit being d is $\log_{10} \frac{d+1}{d}$. The Fibonacci numbers are not an isolated
oddity in this respect. Many mathematical and natural data sets exhibit this bias,
which is known as Benford’s law; see Figure 1 and Table 1.
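The experiment suggested above takes only a few lines, since Python integers have arbitrary precision. This is a sketch with N = 1000 chosen for illustration; the function names are assumptions.

```python
import math
from collections import Counter

def fibonacci(count):
    """Yield the first count Fibonacci numbers 1, 1, 2, 3, 5, ..."""
    a, b = 1, 1
    for _ in range(count):
        yield a
        a, b = b, a + b

def first_digit_frequencies(values):
    """Return the fraction of values beginning with each digit 1..9."""
    values = list(values)
    counts = Counter(int(str(v)[0]) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

N = 1000
observed = first_digit_frequencies(fibonacci(N))
benford = {d: math.log10((d + 1) / d) for d in range(1, 10)}
for d in range(1, 10):
    print(f"{d}: observed {observed[d]:.3f}, Benford {benford[d]:.3f}")
```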
One interesting application of Benford’s law is to detect tax fraud (also image
fraud, voter fraud, medical fraud, and so forth). It is so successful because peo-
ple are lousy random number generators: they do not put in enough of the right
patterns. For example, if we toss a fair coin 100 times, most people know there
should be about 50 heads and 50 tails, but they typically do not know what the
longest run of heads or tails should be, or how many alternations between runs of
heads and runs of tails should occur (see [13] for a very readable introduction to
some surprising results on the longest run). The same is true in creating fake data
entries; people are more likely to spread out the leading digit equally from 1 to 9,
or concentrate near 5, in the mistaken belief that this makes the data look more
plausible. There is now a vast literature on Benford’s law and its applications. It
surfaces in accounting, computer science, dynamical systems, economics, finance,
geology, medicine, number theory, physics, psychology, statistics, and astronomy.
See [3, 4, 12] for introductions to the theory and many of its applications and [2]
for a searchable website on Benford publications.
Table 1.

First digit    1      2      3      4     5     6     7     8     9
Benford      30.1%  17.6%  12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%
Fibonacci    30.1%  17.6%  12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%
2^n          30.1%  17.6%  12.5%  9.7%  7.9%  6.7%  5.8%  5.1%  4.6%
California   30.9%  17.6%  13.8%  9.3%  7.5%  6.7%  5.8%  4.8%  3.7%
GDP          31.3%  20.8%  10.4%  8.9%  9.9%  5.2%  3.6%  5.2%  4.7%

Figure 1. Many data sets exhibit the bias described by Benford’s law.

Benford’s law of digit bias says that there is no bias, provided that we look at
the data the right way. Suppose that $\{\log_{10} x_n\}$ is equidistributed modulo 1 for
some data set $\{x_n\}$. This means that for any [a, b] ⊆ [0, 1],
\[ \lim_{N\to\infty} \frac{\left| \{ n \le N : \log_{10} x_n \ (\mathrm{mod}\ 1) \in [a, b] \} \right|}{N} = b - a. \]
We claim that any such data set satisfies Benford’s law. Let x > 0 and write
\[ x = 10^{u+v} = 10^u \, 10^v, \]
in which $u = \lfloor \log_{10} x \rfloor$ and v ∈ [0, 1) is the fractional part of $\log_{10} x$; that is,
\[ v = \log_{10} x - \lfloor \log_{10} x \rfloor. \]
Since $10^u$ is an integer power of 10, it follows that the leading digit of x is
determined entirely by $10^v$. Because $10^v \in [1, 10)$, the probability that $10^v$ has leading
digit d is the probability that $10^v \in [d, d + 1)$; that is, $v \in [\log_{10} d, \log_{10}(d + 1))$.
The equidistribution hypothesis on the data set ensures the probability that v ∈
$[\log_{10} d, \log_{10}(d + 1))$ is
\[ \log_{10}(d + 1) - \log_{10} d = \log_{10} \frac{d+1}{d}. \]
This is the prediction of Benford’s law.
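The argument can be illustrated numerically by reading leading digits off the fractional part v. The sketch below does this for the powers of 2, using v = n log10(2) modulo 1; the function names and the choice of 10,000 terms are assumptions. Floating-point rounding can misclassify a value that sits exactly on a digit boundary, which does not matter for frequencies.

```python
import math

def benford_probability(d):
    """Probability log10((d + 1) / d) that the leading digit is d."""
    return math.log10((d + 1) / d)

def leading_digit_from_log(log10_x):
    """Leading digit of x computed from the fractional part v of log10(x)."""
    v = log10_x % 1.0
    return min(int(10 ** v), 9)  # 10^v lies in [1, 10); min guards against rounding

def power_frequencies(base, count):
    """Leading-digit frequencies of base^1, ..., base^count."""
    counts = [0] * 10
    for n in range(1, count + 1):
        counts[leading_digit_from_log(n * math.log10(base))] += 1
    return {d: counts[d] / count for d in range(1, 10)}

predicted = [round(100 * benford_probability(d), 1) for d in range(1, 10)]
observed = power_frequencies(2, 10_000)
print(predicted)
print({d: round(100 * f, 1) for d, f in observed.items()})
```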
1938: Comments
The Kronecker–Weyl theorem. Although we will not spoil the problem,
we should at least explain why $\{2^n\}$ and $\{3^n\}$ obey Benford’s law. The Kronecker–
Weyl theorem asserts that nξ is equidistributed modulo 1 if ξ is irrational; see
Figure 2 and the 1931 entry. Consequently, if $\xi = \log_{10} \alpha$ is irrational, then $x_n =
n\xi = \log_{10}(\alpha^n)$ is equidistributed modulo 1 and hence the sequence $\{\alpha^n\}$ obeys
Benford’s law. Since $\log_{10} 2$ and $\log_{10} 3$ are irrational, we conclude that $\{2^n\}$ and
$\{3^n\}$ are Benford. A bit more work shows that $\{e^n\}$ and $\{\pi^n\}$ are Benford too.
The version of the Kronecker–Weyl theorem we used above states that nξ is
equidistributed modulo 1; it says nothing about how rapidly the equidistribution
sets in. This can be remedied by a more involved analysis that takes into account
how “irrational” a number is. A real number α has irrationality type κ if κ is the
supremum of all γ such that
\[ \liminf_{q\to\infty} \, q^{\gamma+1} \min_{p} \left| \alpha - \frac{p}{q} \right| = 0. \]
Roth’s theorem (see the 1955 entry) ensures that every algebraic irrational is of type
1. See [7] for more details on irrationality types, [9] for applications to Benford’s
law, and [8, Thm. 3.3, p. 124] for details connecting the irrationality type to the
convergence rate.
Powers of 2’s and 3’s. Here is another interesting question about 2 and 3. Is
\[ S = \{ 3^n/2^m : 1 \le m, n < \infty \} \]
dense in the positive real numbers? To handle this question, we need Kronecker’s
approximation theorem [6, Thm. 440], which asserts that if β > 0 is irrational,
α ∈ R, and δ > 0, then there are n, m ∈ N so that |nβ − α − m| < δ. Let ξ, ε > 0
and note that $\beta = \log_2 3 > 0$ is irrational. By the continuity of $f(x) = 2^x$ at $\log_2 \xi$,
there exists δ > 0 such that
\[ |\log_2 x - \log_2 \xi| < \delta \implies |x - \xi| < \varepsilon. \tag{1938.1} \]
Kronecker’s theorem with $\beta = \log_2 3$ and $\alpha = \log_2 \xi$ now yields n, m ∈ N so that
\[ \left| \log_2 \frac{3^n}{2^m} - \log_2 \xi \right| = |n \log_2 3 - \log_2 \xi - m| < \delta. \]
In light of (1938.1), it follows that $|3^n/2^m - \xi| < \varepsilon$, and thus S is dense in the
positive real numbers. This answer, along with many similar results, can be found
in [5].
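The density argument is constructive enough to search by computer. The sketch below looks for exponents n and m with 3^n/2^m within ε of a target ξ, choosing m as the nearest integer to n log2(3) − log2(ξ) as in the proof; the function name and the search bound `max_n` are assumptions.

```python
import math

def approximate_by_quotient(xi, eps, max_n=2000):
    """Search for n, m >= 1 with |3^n / 2^m - xi| < eps; return None if not found."""
    beta = math.log2(3)
    target = math.log2(xi)
    for n in range(1, max_n + 1):
        # The best m for this n makes n*beta - target - m as small as possible.
        m = round(n * beta - target)
        if m >= 1 and abs(3 ** n / 2 ** m - xi) < eps:
            return n, m
    return None

result = approximate_by_quotient(math.pi, 0.01)
if result is not None:
    n, m = result
    print(n, m, 3 ** n / 2 ** m)
```

The powers are computed with exact integers, so the quotient is rounded only once when it is converted to a floating-point number.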
1 If the probability of observing a leading digit of d is $p_d$ and we have N observations, the
chi-square statistic (with 8 degrees of freedom) is $\chi^2 = \sum_{d=1}^{9} (\mathrm{Obs}_d - N p_d)^2 / (N p_d)$, where N is
the number of observations and $\mathrm{Obs}_d$ is the number with leading digit d.
lower. We plot the results in Figure 3, where for convenience we plot the logarithm
of the chi-square value.
Two items are immediately apparent. First, for most N the chi-square values
for $\pi^n$ are significantly larger than those of $e^n$. Second, there seems to be an almost
periodic behavior in the amplitude of the chi-square values for $\pi^n$, with a period of
approximately 175 (and the amplitude getting smaller in subsequent periods).
The latter is not a coincidence. While many people have made it a matter
of personal pride to memorize and be able to recite digits of π on demand, very
few can do this feat for $\pi^2$, and almost no one for even higher powers. This is a
shame, as that knowledge would be useful here. If we go far down with our powers,
we eventually come to $\pi^{175}$ and notice that it is approximately $1.0028 \cdot 10^{87}$. In
other words, every time we increase the exponent n by 175 we almost return to
our original value padded by 87 zeros. Almost. If we returned to the same leading
digits (just with an extra 87 zeros at the end), we would have periodic, non-Benford
behavior. The slight difference eventually pushes us to Benford behavior, but very
slowly (as can be seen by the slow decay in the maximum amplitudes); this is what
we mean by the irrationality of the number controlling the behavior. The fact that
a large power of π is almost a large power of 10 produces the peculiar behavior
exhibited in Figure 3.
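Both the near-coincidence for π^175 and the chi-square values can be explored numerically. The sketch below computes leading digits from the fractional part of n log10(base) and evaluates the chi-square statistic against the Benford probabilities; the function names and sample sizes are assumptions, and the printed values are for exploration only.

```python
import math

BENFORD = [math.log10((d + 1) / d) for d in range(1, 10)]

def leading_digit_counts(base, count):
    """Counts of leading digits 1..9 among base^1, ..., base^count."""
    counts = [0] * 9
    log_base = math.log10(base)
    for n in range(1, count + 1):
        v = (n * log_base) % 1.0
        digit = min(int(10 ** v), 9)  # guard against rounding at the top edge
        counts[digit - 1] += 1
    return counts

def chi_square(observed_counts):
    """Chi-square statistic of the observed counts against Benford's law."""
    total = sum(observed_counts)
    return sum((obs - total * p) ** 2 / (total * p)
               for obs, p in zip(observed_counts, BENFORD))

print(math.pi ** 175)  # close to a power of 10
for count in (175, 350, 700, 1400):
    print(count,
          chi_square(leading_digit_counts(math.pi, count)),
          chi_square(leading_digit_counts(math.e, count)))
```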
Bibliography
[1] F. Benford, The law of anomalous numbers, Proceedings of the American Philosophical So-
ciety 78 (1938), 551–572. http://www.jstor.org/discover/10.2307/984802?uid=3739552&
uid=2&uid=4&uid=3739256&sid=21103164625091.
[2] A. Berger and T. P. Hill, Benford online bibliography, http://www.benfordonline.net.
[3] A. Berger and T. P. Hill, A basic theory of Benford’s law, Probab. Surv. 8 (2011), 1–126,
DOI 10.1214/11-PS175. MR2846899
[4] A. Berger and T. P. Hill, An introduction to Benford’s law, Princeton University Press,
Princeton, NJ, 2015. MR3242822
[5] B. Brown, M. Dairyko, S. R. Garcia, B. Lutz, and M. Someck, Four quotient set gems, Amer.
Math. Monthly 121 (2014), no. 7, 590–599, DOI 10.4169/amer.math.monthly.121.07.590.
MR3229105
[6] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 6th ed., revised by
D. R. Heath-Brown and J. H. Silverman; with a foreword by Andrew Wiles, Oxford University
Press, Oxford, 2008. MR2445243
[7] M. Hindry and J. H. Silverman, Diophantine geometry: An introduction, Graduate Texts in
Mathematics, vol. 201, Springer-Verlag, New York, 2000. MR1745599
[8] L. Kuipers and H. Niederreiter, Uniform distribution of sequences, Pure and Applied
Mathematics, Wiley-Interscience [John Wiley & Sons], New York-London-Sydney, 1974.
MR0419394
[9] A. V. Kontorovich and S. J. Miller, Benford’s law, values of L-functions and the 3x + 1
problem, Acta Arith. 120 (2005), no. 3, 269–297, DOI 10.4064/aa120-3-4. http://arxiv.
org/abs/math/0412003. MR2188844
[10] S. J. Miller and M. J. Nigrini, The modulo 1 central limit theorem and Benford’s law for
products, Int. J. Algebra 2 (2008), no. 1-4, 119–130. MR2417189
[11] S. J. Miller (editor), The Theory and Applications of Benford’s Law, Princeton University
Press, 2015.
[12] R. A. Raimi, The first digit problem, Amer. Math. Monthly 83 (1976), no. 7, 521–538.
MR0410850
[13] M. F. Schilling, The longest run of heads, College Math. J. 21 (1990), no. 3, 196–207, DOI
10.2307/2686886. MR1070635
1939
The Power of Positive Thinking
Introduction
A student doing a homework problem has an enormous advantage over a re-
searcher: the problem is known to be solvable. This is especially true in undergrad-
uate and beginning graduate classes, in which assignments are meant to reinforce
lessons and help students learn techniques. It is hard to overstate how important
this is. It is a huge psychological boost to know a solution exists (let alone having
a sense of what methods will be useful in finding it).
There are many anecdotes and studies of people who were unaware of the diffi-
culty of a problem and who then proceeded to make great progress. The following
story and its variants have circulated for years and are the subject of this year’s
entry. We will meet the protagonist, George Dantzig (1914–2005), again in the
1947 entry. The quote below is from a 1986 interview [1]. He was asked why his
Ph.D. was on a statistics topic when he had taken so few statistics courses.
It happened because during my first year at Berkeley I arrived late
one day at one of Neyman’s classes. On the blackboard there were two
problems that I assumed had been assigned for homework. I copied
them down. A few days later I apologized to Neyman1 for taking so
long to do the homework—the problems seemed to be a little harder
to do than usual. I asked him if he still wanted it. He told me to
throw it on his desk. I did so reluctantly because his desk was covered
with such a heap of papers that I feared my homework would be lost
there forever. About six weeks later, one Sunday morning about eight
o’clock, Anne and I were awakened by someone banging on our front
door. It was Neyman. He rushed in with papers in hand, all excited:
“I’ve just written an introduction to one of your papers. Read it so I
can send it out right away for publication.” For a minute I had no idea
what he was talking about. To make a long story short, the problems
on the blackboard that I had solved thinking they were homework were
in fact two famous unsolved problems in statistics. That was the first
inkling I had that there was anything special about them.
Later in the interview he discusses how the story found its way into sermons.
The origin of that minister’s sermon can be traced to another Lutheran
minister, the Reverend Schuler of the Crystal Cathedral in Los Angeles.
Several years ago he and I happened to have adjacent seats on an
airplane. He told me his ideas about thinking positively, and I told him
my story about the homework problems and my thesis. A few months
later I received a letter from him asking permission to include my story
1939: Comments
The birthday problem. Another candidate for this year’s topic is the birth-
day problem. In 1939 Richard von Mises (1883–1953) posed the following problem,
which is a staple in most probability courses. How many people must there be in
a room before there is at least a 50% chance that two people share a birthday?
We give a quick discussion of this problem; see [4] for an expanded treatment and
additional questions.
The first step is to interpret what is going on. Normally people assume that all
birthdays are equally likely (and no one is born on February 29th). This assumption
is not always met. Malcolm Gladwell (1963– ) has a beautifully humorous passage
in his book Outliers [3], in which he investigates the distribution of birthdays among
Canadian junior hockey players. What often happens is that the young kids who
just miss the cutoff for a program are now the oldest and hence likely to be among
the biggest players. This is a tremendous advantage and this makes them look like
better players. They then get more attention, get on to special teams, and the
difference grows. In a telling passage, Gladwell substitutes the birthdays for the
players’ names:
It no longer sounds like the championship of Canadian junior hockey.
It now sounds like a strange sporting ritual for teenage boys born
under the astrological signs Capricorn, Aquarius, and Pisces. March
11 starts around one side of the Tigers’ net, leaving the puck for his
teammate January 4, who passes it to January 22, who flips it back
to March 12, who shoots point-blank at the Tigers’ goalie, April 27.
April 27 blocks the shot, but it’s rebounded by Vancouver’s March 6.
He shoots! Medicine Hat defensemen February 9 and February 14 dive
to block the puck while January 10 looks on helplessly. March 6 scores!
Back to the birthday problem. We assume that there are 365 days in each year
and that all days are equally likely. We use the law of complementary probability:
the probability that an event happens is one minus the probability that it does not
happen. The probability that among n people we have n different birthdays is
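\[ q_n = \left( 1 - \frac{0}{365} \right) \left( 1 - \frac{1}{365} \right) \cdots \left( 1 - \frac{n-1}{365} \right) = \prod_{k=0}^{n-1} \left( 1 - \frac{k}{365} \right). \]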
Indeed, the first person can have any birthday, the next person must avoid that
first birthday, then the subsequent person must miss those two days, and so on.
As we saw in the 1920 and 1934 entries, it is often profitable to take the
logarithm of a product. Thus, we consider
\[ \log q_n = \sum_{k=0}^{n-1} \log \left( 1 - \frac{k}{365} \right). \]
If we choose N so that qN ≤ 1/2, then 1 − qN ≥ 1/2; that is, the probability that
a birthday is shared among N people is ≥ 1/2. For small x, we use the Taylor
approximation log(1 − x) ≈ −x and obtain
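\[ \log(1/2) \approx - \sum_{k=0}^{N-1} \frac{k}{365} = - \frac{(N-1)N}{2 \cdot 365} \approx - \frac{(N - 1/2)^2}{2 \cdot 365} \]
and hence
\[ N \approx \sqrt{-2 \cdot 365 \log(1/2)} + \frac{1}{2} = \sqrt{365 \log 4} + \frac{1}{2} = 22.994\ldots. \]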
Most people unfamiliar with the problem significantly underestimate the chance;
the probability is about 70% if there are 30 people, 89% with 40, and 97% with
50. More generally, if there were D days in the year, we need at least $\sqrt{D \log 4} + \frac{1}{2}$
people to have a 50% chance of at least one shared birthday.
How close is this approximation? Very close: the probability that among n
people at least two share the same birthday is ≥ 50% if n ≥ 23. This is sometimes
called the birthday paradox since the answer is strikingly different than the answer
to a seemingly similar problem: how many people are needed before there is a 50%
chance that someone shares my birthday? We need N so large that
\[
\left(1 - \frac{1}{365}\right)^{N} \le \frac{1}{2};
\]
this occurs first for N = 253 (if we had D days in a year, we would find N ≈ D log 2).
The reason the two answers disagree by so much is that in one version any two
people may agree, while in the other someone must agree with a predetermined
person. Note the sharp difference in behavior: the first answer grows like $D^{1/2}$,
whereas the second grows linearly with D. In addition to being a source of
revenue for probability professors betting their students on the odds two members
in the class share a birthday, an interesting application is the birthday attack to
find collisions of hash functions in cryptography (see [5] and the references therein).
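The numbers quoted above are easy to check by direct computation. The following Python sketch is an illustration rather than part of the original argument; the function names are our own, and it assumes, as in the text, that all D = 365 days are equally likely.

```python
import math

def prob_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday."""
    q = 1.0  # probability that all n birthdays are distinct
    for k in range(n):
        q *= 1 - k / days
    return 1 - q

def smallest_group(days=365, target=0.5):
    """Smallest n whose shared-birthday probability reaches the target."""
    n = 1
    while prob_shared_birthday(n, days) < target:
        n += 1
    return n

def approx_group(days=365):
    """The approximation sqrt(D log 4) + 1/2 obtained from log(1 - x) ~ -x."""
    return math.sqrt(days * math.log(4)) + 0.5

def smallest_group_fixed_birthday(days=365, target=0.5):
    """Smallest N with (1 - 1/D)^N <= 1 - target: someone matches a fixed birthday."""
    return math.ceil(math.log(1 - target) / math.log(1 - 1 / days))

if __name__ == "__main__":
    print(smallest_group())                  # exact threshold for any shared birthday
    print(round(approx_group(), 3))          # Taylor-series estimate
    print(smallest_group_fixed_birthday())   # version with a predetermined person
    for n in (30, 40, 50):
        print(n, round(prob_shared_birthday(n), 3))
```

Calling `smallest_group(days=D)` for other values of D gives a quick way to compare the exact threshold with the square-root estimate.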
The zeta function and relatively prime integers. Now that we have de-
veloped a bit of the theory behind the Riemann zeta function (see the 1928 and
1933 entries) here is another probability gem that we cannot resist. What is the
probability that two randomly chosen integers a, b are relatively prime?
To begin, let us note that gcd(a, b) = 1 if and only if a and b have no prime
factors in common. In other words, no prime number p divides both a and b. There
is only a 1/4 chance that both a and b are divisible by 2. Therefore there is a
$1 - \frac{1}{4} = 1 - \frac{1}{2^2}$
Bibliography
[1] D. J. Albers and C. Reid, An interview with George B. Dantzig: the father of linear programming, College Math. J. 17 (1986), no. 4, 293–314, DOI 10.2307/2686279. http://www.jstor.org/stable/2686279. MR856311
[2] G. B. Dantzig, On the non-existence of tests of “Student’s” hypothesis having power functions independent of σ, Ann. Math. Statistics 11 (1940), 186–192, DOI 10.1214/aoms/1177731912. http://projecteuclid.org/download/pdf_1/euclid.aoms/1177731912. MR0002082
[3] M. Gladwell, Outliers: The story of success (reprint edition), Back Bay Books, 2011.
[4] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[5] R. Niebuhr, P.-L. Cayrel, and J. Buchmann, Improving the efficiency of generalized birthday attacks against certain structured cryptosystems, published in WCC 2011—Workshop on coding and cryptography (2011), 163–172. https://www.cdc.informatik.tu-darmstadt.de/reports/reports/GBA-final2.pdf.
1940
A Mathematician’s Apology
Introduction
One of the most important parts of an academic’s job is mentoring the next gen-
eration. Some have written extensively to share the lessons they have learned. One
of the most prolific is Steven G. Krantz (1951– ), whose titles include
A Mathematician’s Survival Guide: Graduate School and Early Career Development;
A Primer of Mathematical Writing: Being a Disquisition on Having Your
Ideas Recorded, Typeset, Published, Read and Appreciated; How to Teach Mathematics;
A TeX Primer for Scientists; and The Survival of a Mathematician: From
Tenure-Track to Emeritus. These books give a nice sample of the issues, challenges, and
rewards that lie ahead (the last is available online [9]; all can be purchased for
reasonable amounts).
Although there are many authors and texts to mention, this entry highlights
Godfrey Harold Hardy’s A Mathematician’s Apology, first published in 1940 [7].
While many books discuss the challenges and rewards of being a mathematician, his
work is a reflection on his life and whether or not it was well spent. Mathematically
it surely was, since he was responsible for numerous advances and new techniques.
Regarding his life, Hardy considered it to be a success in terms of the happiness
and comfort that he found, but the question remained as to the “triviality” of his
life. He resolved it accordingly:
The case for my life. . . is this: that I have added something to knowl-
edge, and helped others to add more; and that these somethings have
a value which differs in degree only, and not in kind, from that of the
creations of the great mathematicians, or of any of the other artists,
great or small, who have left some kind of memorial behind them.
Because of the influence of Hardy’s writing and work, we devote the entire entry
to him. This is not meant to imply that there were no significant results proved in
1940. One natural candidate is Kurt Gödel’s proof [5] of the relative consistency of
the axiom of choice with the Zermelo–Fraenkel axioms of set theory; see the entry
from 1963 for the rest of the story.
A well-known passage from the Apology proclaims:
I have never done anything “useful.” No discovery of mine has made,
or is likely to make, directly or indirectly, for good or ill, the least differ-
ence to the amenity of the world. . . . Judged by all practical standards,
the value of my mathematical life is nil; and outside mathematics it is
trivial anyhow.
Then it might come as a surprise that Hardy is best known to the world for his work
in genetics. His fame stems from a condescending letter to the editor in Science
on the stability of genotype distributions from one generation to the next [6]; see
Figure 1. The result was independently found by the German physician Wilhelm
Weinberg (1862–1937) and is now known as the Hardy–Weinberg law ; see [1] for
more details.
During a lecture by Reginald Crundall Punnett (1875–1967) of Punnett square
fame, the statistician Udny Yule (1871–1951) asked about the behavior of the ratio
of dominant to recessive traits over time. Why does the population not tend towards
the dominant trait over time? Punnett brought the problem to his friend and cricket
companion Hardy; see [3, 4] for more details.
Using only “mathematics of the multiplication-table type”, under natural con-
ditions Hardy proved that there is an equilibrium at which the ratio of different
genotypes remains constant over time. The mathematical content of the letter can
be summarized in one line:
\[
(p + q)^2 = p^2 + 2pq + q^2.
\]
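To see the equilibrium concretely, here is a minimal Python sketch. It assumes the usual idealizations (random mating, no selection, mutation, or migration, and a population large enough to treat frequencies as probabilities), which is our reading of the "natural conditions" mentioned above; the function name and the starting frequencies are hypothetical.

```python
def next_generation(dominant, mixed, recessive):
    """One round of random mating for genotype frequencies (AA, Aa, aa)."""
    p = dominant + mixed / 2    # frequency of the dominant allele A
    q = recessive + mixed / 2   # frequency of the recessive allele a
    # Expanding (p + q)^2 = p^2 + 2pq + q^2 gives the offspring genotype frequencies.
    return p * p, 2 * p * q, q * q

if __name__ == "__main__":
    freqs = (0.5, 0.2, 0.3)     # hypothetical starting population
    for generation in range(4):
        print(generation, tuple(round(x, 4) for x in freqs))
        freqs = next_generation(*freqs)
```

After a single generation the three frequencies stop changing, and the recessive genotype persists rather than dying out.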
The following passage from his note gives a good sense of its tone.
I am reluctant to intrude in a discussion concerning matters of which I
have no expert knowledge, and I should have expected the very simple
point which I wish to make to have been familiar to biologists. . . .
There is not the slightest foundation for the idea that a dominant
character should show a tendency to spread over a whole population,
or that a recessive should tend to die out. [6]
In an obituary of Hardy, Edward Charles Titchmarsh (1899–1963) states that Hardy
“attached little weight to it” [12]. However, its prevalence in introductory biology
1940: Comments
More about Hardy. Hardy lived through World War I and the Apology was
written at the start of the Second World War. Much of his pride in the uselessness
of his work stemmed from the fact that he was not contributing to violence and
war.
But here I must deal with a misconception. It is sometimes suggested
that pure mathematicians glory in the uselessness of their work. If the
theory of numbers could be employed for any practical and obviously
honorable purpose, if it could be turned directly to the furtherance of
human happiness or the relief of human suffering. . . then surely neither
Gauss nor any other mathematician would have been so foolish as to
decry or regret such applications. But science works for evil as well as
for good (and particularly, of course, in time of war). . . . [7]
Interestingly, what seems useless and pure in one era can become useful and applied
a short time later. Hardy’s own work provides an excellent example, where much of
elementary number theory (as well as advanced results on L-functions) now plays
an important role in cryptography; see the 1921 and 1977 entries.
Of course, Hardy is perhaps best known (in the mathematical community at
any rate) for his collaborations with Littlewood and Ramanujan. On this Hardy
says:
I still say to myself when I am depressed and find myself forced to listen
to pompous and tiresome people, “Well, I have done one thing you
could never have done, and that is to have collaborated with Littlewood
and Ramanujan on something like equal terms.” [7]
The 2015 film “The Man Who Knew Infinity,” based upon the outstanding biogra-
phy of Ramanujan by Robert Kanigel (1946– ) [8], depicts some of Hardy’s many
quirks and his working relationship with the great Ramanujan; see Figure 2.
Figure 2. A scene from the 2015 movie “The Man Who Knew
Infinity.” S. Ramanujan (left) speaks with J. E. Littlewood (right)
as G. H. Hardy (middle) observes. Ramanujan, Littlewood, and
Hardy are played by Dev Patel (1990– ), Toby Jones (1966– ), and
Jeremy Irons (1948– ), respectively.
Bibliography
[1] H. E. Christenson and S. R. Garcia, G. H. Hardy: mathematical biologist, J. Humanist. Math. 5 (2015), no. 2, 96–102, DOI 10.5642/jhummath.201502.08. http://scholarship.claremont.edu/cgi/viewcontent.cgi?article=1273&context=jhm. MR3378780
[2] J. F. Crow, Eighty years ago: the beginnings of population genetics, Genetics 19 (1988), no. 3, 473–76.
[3] A. W. F. Edwards, G. H. Hardy (1908) and Hardy–Weinberg equilibrium, Genetics 179 (2008), no. 3, 1143–150. http://genetics.org/content/179/3/1143.
[4] C. R. Fletcher, G. H. Hardy—applied mathematician, Bull. Inst. Math. Appl. 16 (1980), no. 2-3, 61–67. MR576086
[5] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics Studies, no. 3, Princeton University Press, Princeton, N. J., 1940. MR0002514
[6] G. H. Hardy, Mendelian proportions in a mixed population, Science 28 (1908), 49–50. http://www.esp.org/foundations/genetics/classical/hardy.pdf.
[7] G. H. Hardy, A mathematician’s apology, with a foreword by C. P. Snow; reprint of the 1967 edition, Canto, Cambridge University Press, Cambridge, 1992. MR1148590
[8] R. Kanigel, The man who knew infinity: A life of the genius Ramanujan, Charles Scribner’s Sons, New York, 1991. MR1113890
[9] S. G. Krantz, The survival of a mathematician: From tenure-track to emeritus, American Mathematical Society, Providence, RI, 2009. http://www.math.wustl.edu/~sk/books/newsurv.pdf. MR3309302
[10] R. C. Punnett, Early days of genetics, Heredity 4 (1950), no. 1, 1–10.
[11] B. Riemann, On the number of prime numbers less than a given quantity, Monatsberichte der Königlich Preußischen Akademie der Wissenschaften zu Berlin, 1859. http://www.claymath.org/sites/default/files/ezeta.pdf.
[12] E. C. Titchmarsh, Obituary: Godfrey Harold Hardy (1877–1947), Obit. Notices Roy. Soc. London 6 (1949), 447–461 (1 plate). MR0037796
1941
Introduction
On August 1, 1941, Isaac Asimov visited John Campbell, editor of Astounding
Science Fiction. The meeting led to the creation of the Foundation series, one of
the most influential science-fiction series of all time. The series is modeled on the
celebrated The History of the Decline and Fall of the Roman Empire by Edward
Gibbon (1737–1794) and tells how the Galactic Empire will fall and
30,000 years of anarchy will reign before a new empire arises.¹ Hari Seldon develops
the mathematical theory of psychohistory. Inspired by statistical mechanics, the
Foundation series postulates that it is possible to mathematically predict the general
behavior of galactic populations with high precision (despite the fact that it is
impossible to predict the behavior of specific individuals). While it is too late
to stop the fall, Hari and his colleagues analyze the equations and take steps to
minimize its impact, so that a new empire will rise after just a thousand years.
Asimov is but one of many science-fiction writers whose work has inspired
scientists and engineers. NASA seriously considered adopting the Star Trek logo;
while that never happened, the first shuttle was named Enterprise.
Of course, this is not meant to imply that science fiction always gets the math
right. In the 1989 Star Trek: The Next Generation episode The Royale, Captain
Jean-Luc Picard claims that Fermat’s last theorem is still unresolved after 800
years;² it was proved by Andrew Wiles (1953– ) in 1994 (see the 1995 entry). The
2010 Doctor Who episode The Eleventh Hour is notable for conflating anecdotes
about the mathematicians Pierre de Fermat (1607–1665) and Évariste Galois (1811–
1832). On the other hand, the 1981 Doctor Who story Logopolis and the 1982 story
Castrovalva (named after an M. C. Escher lithograph) involve mathematics, in a
vague but fascinating sense, as part of the plot.
¹ Can you think of a Roman Emperor who was captured in battle? Can you find a Fields
Medalist with that middle name?
² Although the 1995 Star Trek: Deep Space 9 episode Facets refers to Wiles’s proof.
146 1941. THE FOUNDATION TRILOGY
1941: Comments
Elliptical reasoning. If we let u = x/a and v = y/b, then the equation of the
ellipse, in uv-space, becomes the equation of the unit circle centered at the origin;
the area element dx dy becomes ab du dv. This change of variables yields
\[
\iint_{(x/a)^2+(y/b)^2 \le 1} 1 \, dx \, dy = \iint_{u^2+v^2 \le 1} 1 \cdot ab \, du \, dv = \pi ab
\]
since the area of the unit circle is π. If a = b = r, then the area is $\pi r^2$, as expected.
A similar calculation shows that the volume of the ellipsoid
\[
(x/a)^2 + (y/b)^2 + (z/c)^2 \le 1
\]
is $\frac{4}{3}\pi abc$, since the unit ball has volume $\frac{4}{3}\pi$. Computing the perimeter of an ellipse is a different story; see [1] for a
discussion and solution.
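Both formulas can be sanity-checked numerically. The sketch below is our own illustration (the function names and the midpoint-rule approach are assumptions, not part of the original discussion): it integrates the vertical chords of the ellipse, and for the ellipsoid it stacks elliptical cross-sections whose areas come from the formula πab.

```python
import math

def ellipse_area(a, b, n=1000):
    """Midpoint-rule estimate of the area of (x/a)^2 + (y/b)^2 <= 1."""
    dx = 2 * a / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * dx
        # The vertical chord at this x has length 2b * sqrt(1 - (x/a)^2).
        total += 2 * b * math.sqrt(1 - (x / a) ** 2) * dx
    return total

def ellipsoid_volume(a, b, c, n=1000):
    """Midpoint-rule estimate of the volume of (x/a)^2 + (y/b)^2 + (z/c)^2 <= 1."""
    dz = 2 * c / n
    total = 0.0
    for i in range(n):
        z = -c + (i + 0.5) * dz
        # The slice at height z is an ellipse with semi-axes scaled by sqrt(1 - (z/c)^2),
        # so its area is pi * a * b * (1 - (z/c)^2).
        total += math.pi * a * b * (1 - (z / c) ** 2) * dz
    return total

if __name__ == "__main__":
    print(ellipse_area(3, 2), math.pi * 3 * 2)
    print(ellipsoid_volume(3, 2, 1), 4 / 3 * math.pi * 3 * 2 * 1)
```

The two numbers on each output line should agree to a few decimal places; the area estimate converges more slowly because the chord length has a square-root singularity at the ends.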
Fourier series. Of course, 1941 witnessed many mathematical innovations
that are worthy of our attention. We focus on a famous theorem of Norbert Wiener
(1894–1964) about absolutely convergent Fourier series. Before tackling Wiener’s
theorem, we need to talk about Fourier series. This is a subject that every mathe-
matics student should learn about. Under certain circumstances one can approxi-
mate a function f : [−π, π] → R by the partial sums of its Fourier series
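\[
\frac{a_0}{\sqrt{2}} + \sum_{n=1}^{\infty} (a_n \cos nt + a_{-n} \sin nt), \tag{1941.1}
\]
in which the Fourier coefficients are given by
\[
a_{-n} = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \sin nt \, dt,
\]
\[
a_0 = \frac{1}{\pi} \int_{-\pi}^{\pi} \frac{f(t)}{\sqrt{2}} \, dt,
\]
and
\[
a_n = \frac{1}{\pi} \int_{-\pi}^{\pi} f(t) \cos nt \, dt
\]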
for n ∈ N. The motivation stems from the study of waves. A 2π-periodic function
f : R → R can be regarded as a function f : [−π, π] → R since the values of f
on [−π, π] determine the values of f everywhere. Under certain circumstances, one
hopes to express f as a superposition of simple sine and cosine waves. The integrals
defining the “amplitudes” $a_n$ act as “filters” that isolate the component of f that
has “frequency” n.
A typical result in the area, used all the time by electrical engineers, is the
following. Let f : R → R be a periodic function with period 2π. Suppose that
f and f′ are both piecewise continuous on [−π, π] and that f(−π) = f(π) and
f′(−π) = f′(π). If f is continuous at t, then the Fourier series (1941.1) converges
to f(t). If f has a jump discontinuity at t, then the Fourier series (1941.1) converges
to the midpoint $\frac{1}{2}(f(t^+) + f(t^-))$ of the gap [4, 2.3.10]. This result is of practical
value, since it ensures that “nice” waves can be studied using sines and cosines.
Leibniz’s series for π/4. Here is a cute example of Fourier series in action.
Consider the square-wave function f : [−π, π] → R defined by
\[
f(t) = \begin{cases}
0 & \text{if } -\pi < t < 0, \\
\pi & \text{if } 0 < t < \pi, \\
\frac{\pi}{2} & \text{if } t = 0 \text{ or } t = \pm\pi.
\end{cases} \tag{1941.2}
\]
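Here is a sketch of how this example plays out, worked from the definitions in (1941.1) rather than quoted from the text. For this f, the constant term $a_0/\sqrt{2}$ is the average value π/2, the cosine coefficients vanish, and the sine coefficients are 2/n for odd n and 0 for even n. Evaluating the series at t = π/2, where f is continuous and equals π, gives π = π/2 + 2(1 − 1/3 + 1/5 − ···), which is Leibniz's series for π/4. The Python below checks these claims numerically; all function names are our own.

```python
import math

def square_wave(t):
    """The function f from (1941.2) on [-pi, pi]."""
    if t == 0 or abs(t) == math.pi:
        return math.pi / 2
    return math.pi if t > 0 else 0.0

def sine_coefficient(n, samples=20000):
    """Midpoint-rule estimate of (1/pi) * integral of f(t) sin(nt) over [-pi, pi]."""
    h = 2 * math.pi / samples
    total = 0.0
    for j in range(samples):
        t = -math.pi + (j + 0.5) * h
        total += square_wave(t) * math.sin(n * t) * h
    return total / math.pi

def partial_sum(t, max_n):
    """pi/2 plus the sum of (2/n) sin(nt) over odd n <= max_n."""
    return math.pi / 2 + sum(2 / n * math.sin(n * t) for n in range(1, max_n + 1, 2))

def leibniz(terms):
    """Partial sum of 1 - 1/3 + 1/5 - ... with the given number of terms."""
    return sum((-1) ** k / (2 * k + 1) for k in range(terms))

if __name__ == "__main__":
    print([round(sine_coefficient(n), 6) for n in range(1, 6)])
    print(partial_sum(math.pi / 2, 2001), math.pi)
    print(leibniz(10000), math.pi / 4)
```

The convergence is slow, as one expects from a series whose terms decay only like 1/n.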
The advantage of this approach is that we can work entirely with exponential
functions, which are much easier to deal with than sines and cosines. For instance,
do you remember the trigonometric identities for cos(x + y) and sin(x + y)? If not,
see the footnote on p. 8. However, you certainly know that $e^{x+y} = e^x e^y$.
Wiener’s 1/f theorem asserts that if f : T → C has an absolutely convergent
Fourier series $f(e^{it}) = \sum_{n \in \mathbb{Z}} a_n e^{int}$ and if f does not vanish on T, then 1/f has
an absolutely convergent Fourier series. The original proof by Norbert Wiener
(1894–1964) from 1932 was delicate and technical (around 100 pages) [7]. It is
often described as a tour-de-force of “hard analysis.” Using the theory of Banach
algebras, Israel Gelfand (1913–2009) gave a “soft” proof of Wiener’s theorem in
1941 [5] that requires only a few pages.
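A small numerical experiment makes the statement tangible, although it proves nothing. The sketch below uses a hypothetical test function f(z) = 1 − z/2, which has no zeros on the unit circle, and estimates the Fourier coefficients of f and of 1/f by Riemann sums. Here 1/f = 1 + z/2 + z²/4 + ···, so the absolute values of its coefficients should sum to 2. The helper names and the sample counts are our own choices.

```python
import cmath
import math

def fourier_coefficient(g, n, samples=256):
    """Riemann-sum estimate of (1/2pi) * integral of g(e^{it}) e^{-int} dt over [0, 2pi]."""
    total = 0j
    for j in range(samples):
        t = 2 * math.pi * j / samples
        total += g(cmath.exp(1j * t)) * cmath.exp(-1j * n * t)
    return total / samples

def f(z):
    """Hypothetical test function with no zeros on |z| = 1."""
    return 1 - z / 2

def reciprocal(z):
    """The function 1/f."""
    return 1 / f(z)

def truncated_norm(g, max_n=40):
    """Sum of |a_n| over -max_n <= n <= max_n, a truncation of the absolute sum."""
    return sum(abs(fourier_coefficient(g, n)) for n in range(-max_n, max_n + 1))

if __name__ == "__main__":
    print(truncated_norm(f))           # coefficients 1 and -1/2
    print(truncated_norm(reciprocal))  # coefficients 2^(-n) for n >= 0
```

Replacing `f` with a function that vanishes somewhere on the circle, such as 1 − z, makes the reciprocal undefined at a sample point, which is a crude reminder of why the nonvanishing hypothesis is needed.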
To discuss Gelfand’s proof, we need to view the problem through the lens of
Banach algebras [2]. The Wiener algebra W is the set of all functions f : T → C
of the form $f(e^{it}) = \sum_{n \in \mathbb{Z}} a_n e^{int}$ for which
\[
\|f\|_W = \sum_{n \in \mathbb{Z}} |a_n| < \infty.
\]
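for all n ∈ Z, so |λ| = 1. Thus, for each character χ, there is a unique $e^{i\alpha} \in \mathbb{T}$ so
that $\chi(z) = e^{i\alpha}$. Consequently,
\[
\chi(z^n) = \chi(z)^n = e^{in\alpha}
\]
for all n ∈ Z. Thus, χ simply evaluates the function $z^n$ at $e^{i\alpha}$. If $f = \sum_{n \in \mathbb{Z}} c_n z^n \in W$,
then the continuity of χ ensures that
\[
\chi(f) = \sum_{n \in \mathbb{Z}} c_n \chi(z^n) = \sum_{n \in \mathbb{Z}} c_n e^{in\alpha} = f(e^{i\alpha}).
\]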
Bibliography
[1] S. Adlaj, An eloquent formula for the perimeter of an ellipse, Notices Amer. Math. Soc. 59 (2012), no. 8, 1094–1099, DOI 10.1090/noti879. http://www.ams.org/notices/201208/rtx120801094p.pdf. MR2985810
[2] W. Arveson, A short course on spectral theory, Graduate Texts in Mathematics, vol. 209, Springer-Verlag, New York, 2002. MR1865513
[3] I. Asimov, Foundation, Gnome Press, 1951.
[4] R. Bhatia, Fourier series, reprint of the 1993 edition [Hindustan Book Agency, New Delhi, MR1657675], Classroom Resource Materials Series, Mathematical Association of America, Washington, DC, 2005. MR2108537
[5] I. M. Gelfand, Normierte ringe, Mat. Sbornik N.S. 9 (1941), no. 51, 3-24. http://www.mathnet.ru/links/c2c9f3ffc009b4ac303540de01718e1e/sm6046.pdf.
[6] D. J. Newman, A simple proof of Wiener’s 1/f theorem, Proc. Amer. Math. Soc. 48 (1975), 264–265, DOI 10.2307/2040730. http://www.ams.org/journals/proc/1975-048-01/S0002-9939-1975-0365002-8/. MR0365002
[7] N. Wiener, Tauberian theorems, Ann. of Math. (2) 33 (1932), 1-100. http://www.jstor.org/stable/1968102?origin=crossref&seq=1#page_scan_tab_contents.
1942
Zeros of ζ(s)
Introduction
The Riemann zeta function is perhaps the most important function in number
theory; see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries. It is initially
defined for Re s > 1 by the series
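\[
\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.
\]
The zeta function can also be written as the Euler product
\[
\zeta(s) = \prod_{p \text{ prime}} \left(1 - p^{-s}\right)^{-1},
\]
also valid for Re s > 1; see the 1933 entry for a proof. Although Euler and others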
studied the zeta function first, it is named after Georg Friedrich Bernhard Riemann
(1826–1866) because of his 1859 masterpiece that relates the distribution of the
zeros of ζ(s) to the fine properties of the prime-counting function π(x) [8].
The Euler product formula confirms that the zeta function has no zeros in the
half plane Re s > 1. However, neither the series nor the product representation
given above converges if Re s ≤ 1. So what do we mean by the zeros of ζ(s)?
To resolve this issue and to understand Riemann’s contribution, we must discuss
analytic continuation.
An analytic function is a differentiable function f : U → C defined on a
nonempty, connected open set U ⊆ C. By “differentiable,” we mean that
\[
f'(z_0) = \lim_{z \to z_0} \frac{f(z) - f(z_0)}{z - z_0}
\]
exists for every z0 ∈ U . This is the complex version of the single-variable calculus
definition. For instance, the zeta function is analytic on Re s > 1 with derivative
\[
\zeta'(s) = -\sum_{n=1}^{\infty} \frac{\log n}{n^s}.
\]
The geometric series formula
\[
\sum_{n=0}^{\infty} z^n = \frac{1}{1 - z} \tag{1942.1}
\]
is valid for |z| < 1.¹ There is an important asymmetry in (1942.1): the series
converges only for |z| < 1, whereas the function $(1 - z)^{-1}$ is defined for all $z \neq 1$.
Thus, $(1 - z)^{-1}$ provides an analytic continuation of $\sum_{n=0}^{\infty} z^n$ from the open disk
|z| < 1 to the much larger region C\{1}.
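The asymmetry between the geometric series and the function $(1 - z)^{-1}$ is easy to observe numerically. In the following sketch (function names are our own), the partial sums agree with $(1 - z)^{-1}$ at a point inside the unit disk and blow up at a point outside it, where the closed form is nevertheless perfectly well defined.

```python
def partial_sum(z, terms):
    """Sum of z^n for 0 <= n < terms."""
    total, power = 0j, 1 + 0j
    for _ in range(terms):
        total += power
        power *= z
    return total

def continuation(z):
    """The function (1 - z)^(-1), defined for every z != 1."""
    return 1 / (1 - z)

if __name__ == "__main__":
    inside, outside = 0.5j, 2
    print(partial_sum(inside, 60), continuation(inside))    # the two values agree
    print(partial_sum(outside, 30), continuation(outside))  # the series diverges
```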
Obtaining an analytic continuation of the zeta function is more difficult. We
first construct an analytic continuation to Re s > 0. Observe that
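\[
\begin{aligned}
\zeta(s) - \frac{1}{s - 1} &= \sum_{n=1}^{\infty} n^{-s} - \int_1^{\infty} x^{-s} \, dx = \sum_{n=1}^{\infty} n^{-s} - \sum_{n=1}^{\infty} \int_n^{n+1} x^{-s} \, dx \\
&= \sum_{n=1}^{\infty} \int_n^{n+1} \left(n^{-s} - x^{-s}\right) dx \\
&= \sum_{n=1}^{\infty} s \int_n^{n+1} \int_n^{x} y^{-1-s} \, dy \, dx.
\end{aligned} \tag{1942.2}
\]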
Since
\[
\left| s \int_n^{n+1} \int_n^{x} y^{-1-s} \, dy \, dx \right| \le |s| \, n^{-1-\operatorname{Re} s},
\]
it follows that the series (1942.2) converges absolutely, and uniformly on each compact
subset of the half-plane Re s > 0. Each summand is an analytic function of s, so (1942.2) provides
an analytic continuation of $\zeta(s) - (s - 1)^{-1}$ to the half-plane Re s > 0; the presence
of the term $(s - 1)^{-1}$ on the left-hand side ensures that ζ(s) has a simple pole at
s = 1 with residue 1. That is, near the point s = 1, the zeta function behaves like
the function $(s - 1)^{-1}$.
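The series (1942.2) can be evaluated directly, since each summand equals $n^{-s}$ minus the integral of $x^{-s}$ over [n, n + 1], which has a closed form. The sketch below is our own illustration (the function names and the truncation point are arbitrary choices): it recovers ζ(2) = π²/6 and produces a value for ζ(1/2), a point where the original series diverges.

```python
def zeta_minus_pole(s, terms=100000):
    """Partial sum of (1942.2): the sum over n of n^(-s) minus the integral of x^(-s) on [n, n+1]."""
    total = 0j
    for n in range(1, terms + 1):
        # Closed form of the integral of x^(-s) over [n, n+1], valid for s != 1.
        integral = ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
        total += n ** (-s) - integral
    return total

def zeta(s, terms=100000):
    """zeta(s) for Re s > 0, s != 1, obtained by adding back the pole term 1/(s - 1)."""
    return zeta_minus_pole(s, terms) + 1 / (s - 1)

if __name__ == "__main__":
    print(zeta(2))     # compare with pi^2 / 6
    print(zeta(0.5))   # the series for zeta diverges here; the continuation does not
```

Convergence is slow near the line Re s = 0 because the summands decay only like $n^{-1-\operatorname{Re} s}$, so this is a demonstration of the continuation, not an efficient way to compute ζ.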
The next, and most complicated, step is to show that the zeta function satisfies
the functional equation
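\[
\zeta(s) = 2^s \pi^{s-1} \sin\left(\frac{\pi s}{2}\right) \Gamma(1 - s) \, \zeta(1 - s), \tag{1942.3}
\]
in which
\[
\Gamma(s) = \frac{e^{-\gamma s}}{s} \prod_{n=1}^{\infty} \left(1 + \frac{s}{n}\right)^{-1} e^{s/n} \tag{1942.4}
\]
is the gamma function and
\[
\gamma = \lim_{N \to \infty} \left( \sum_{n=1}^{N} \frac{1}{n} - \log N \right) \approx 0.5772156\ldots \tag{1942.5}
\]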
is the Euler–Mascheroni constant. For the sake of brevity, we omit this step. Since
the product (1942.4) is an analytic function on C\{0, −1, −2, −3, . . .}, the functional
equation (1942.3) permits us to define ζ(s) for Re s ≤ 0, since the factor ζ(1 − s) on the
right-hand side of (1942.3) is already defined there: if Re s ≤ 0 and s ≠ 0, then
Re(1 − s) ≥ 1 and 1 − s ≠ 1 (the value at s = 0 is obtained as a limit). Thus, we have
obtained an analytic continuation of the zeta function to C\{1}.
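As a quick check on (1942.3) and (1942.5), the sketch below evaluates the right-hand side of the functional equation at a few negative integers and approximates γ by a finite stage of its defining limit. It is an illustration only: it uses Python's built-in `math.gamma` in place of the product (1942.4), a truncated series for ζ(1 − s), and function names of our own choosing.

```python
import math

def euler_mascheroni(big_n=10**6):
    """Finite-N stage of the limit (1942.5): the harmonic sum minus log N."""
    harmonic = sum(1 / n for n in range(1, big_n + 1))
    return harmonic - math.log(big_n)

def zeta_series(s, terms=10**5):
    """Truncated defining series for zeta(s); only meaningful for real s > 1."""
    return sum(n ** (-s) for n in range(1, terms + 1))

def zeta_via_functional_equation(s):
    """Right-hand side of (1942.3) for real s < 0, where zeta(1 - s) is given by its series."""
    return (2 ** s * math.pi ** (s - 1) * math.sin(math.pi * s / 2)
            * math.gamma(1 - s) * zeta_series(1 - s))

if __name__ == "__main__":
    print(euler_mascheroni())
    for s in (-1, -2, -3, -4):
        print(s, zeta_via_functional_equation(s))
```

The values at s = −2 and s = −4 vanish up to rounding error because of the factor sin(πs/2), while the values at s = −1 and s = −3 come out close to −1/12 and 1/120.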
The product representation (1942.4) of the gamma function and (1942.3) ensure
that ζ has zeros at −2, −4, −6, . . . These are the trivial zeros of the zeta function.
Any remaining zeros must be in the critical strip
{s ∈ C : 0 < Re s < 1}.
¹ The radius of convergence of the series $\sum_{n=0}^{\infty} z^n$ is 1. What students of calculus do not
often realize is that the “radius” referred to is the radius of the disk |z| < 1 in the complex plane.
These are the nontrivial zeros of the zeta function. It turns out that the nontrivial
zeros govern the main terms in our error estimates for π(x). Neglecting some
logarithmic factors, if
θ = sup{Re s : 0 < Re s < 1, ζ(s) = 0},
then the maximum deviation² $|\pi(x) - \mathrm{Li}(x)|$ from the prediction of the prime number
theorem is essentially of size at most $x^{\theta}$. Thus, the nontrivial zeros of the zeta
function have an enormous influence in number theory: they control the large-scale
distribution of the prime numbers.
To a few decimal places, these are the first twenty nontrivial zeros that lie in
the upper half-plane:
0.5 + 14.1347i, 0.5 + 21.0220i, 0.5 + 25.0109i, 0.5 + 30.4249i, 0.5 + 32.9351i,
0.5 + 37.5862i, 0.5 + 40.9187i, 0.5 + 43.3271i, 0.5 + 48.0052i, 0.5 + 49.7738i,
0.5 + 52.9703i, 0.5 + 56.4462i, 0.5 + 59.3470i, 0.5 + 60.8318i, 0.5 + 65.1125i,
0.5 + 67.0798i, 0.5 + 69.5464i, 0.5 + 72.0672i, 0.5 + 75.7047i, 0.5 + 77.1448i.
Notice a pattern? Numerical calculations have confirmed that the first $10^{13}$ nontrivial
zeros lie on the critical line Re s = 1/2; see Figure 1. The Riemann hypothesis,
one of the seven Clay Millennium Problems, asserts that the nontrivial zeros all lie
on the critical line. Riemann wrote in [8]:
. . . and it is very probable that all roots are real.³ Certainly one would
wish for a stricter proof here; I have meanwhile temporarily put aside
the search for this after some fleeting futile attempts, as it appears
unnecessary for the next objective of my investigation.
The Riemann hypothesis, which was one of Hilbert’s problems [10] (see the 1935,
1963, 1970, 1980, and 1983 entries), is considered by many mathematicians to be
the most important open problem in mathematics.
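The zeros listed above can be spot-checked with a few lines of code. The sketch below does not use the method of this entry; it swaps in the classical Euler–Maclaurin summation formula with a single correction term, which is a standard way to approximate ζ(s) for Re s > 0. The function name, the cutoff n = 60, and the tolerance implied by the four-decimal heights are our own choices.

```python
def zeta_em(s, n=60):
    """Euler-Maclaurin approximation to zeta(s) for Re s > 0, s != 1, with one correction term."""
    head = sum(k ** (-s) for k in range(1, n))
    tail = n ** (1 - s) / (s - 1) + n ** (-s) / 2 + s * n ** (-s - 1) / 12
    return head + tail

# Heights of the first five nontrivial zeros, as listed above to four decimal places.
FIRST_ZEROS = [14.1347, 21.0220, 25.0109, 30.4249, 32.9351]

if __name__ == "__main__":
    print(abs(zeta_em(2)))                  # compare with pi^2 / 6
    for t in FIRST_ZEROS:
        print(t, abs(zeta_em(0.5 + 1j * t)))  # should be close to zero
    print(abs(zeta_em(0.5 + 15j)))          # a point on the line that is not a zero
```

Because the heights are rounded to four decimals, the computed values are small but not exactly zero.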
In 1914, Godfrey Harold Hardy (see the 1920, 1923, and 1940 entries) proved
there are infinitely many nontrivial zeros on the critical line. However, he was
unable to ascertain whether a positive proportion of them are on the critical line.
The situation changed in 1942, when Atle Selberg (1917–2007) showed that a small,
but positive, proportion of the zeros of ζ(s) are on the critical line; see the 1948
entry. A major advance came in 1974 with the work of Norman Levinson (1912–
1975), who proved more than a third of these zeros are on the line. The best results
today are around 40%; there is still a long way to go. Even if we can prove that
100% of the zeros are on the critical line, that still would be insufficient to prove
the Riemann hypothesis. There could still be infinitely many zeros in the critical
strip that do not lie on the critical line. This is meant in the same sense that “100%
of natural numbers are not perfect squares.” The proportion of natural numbers
at most x that are not perfect squares is approximately $(x - \sqrt{x})/x = 1 - 1/\sqrt{x}$,
which tends to one as x → ∞.
It is still unknown whether or not there is a c < 1 such that all nontrivial zeros
of the zeta function have real part at most c; the Riemann hypothesis is equivalent
to being able to take c = 1/2 (the nontrivial zeros are symmetric about the line
Re s = 1/2). The best results are zero-free regions whose width to the left of the
line Re s = 1 shrinks to zero as the height |t| grows, giving regions where
\[
\zeta(\sigma + it) \neq 0 \quad \text{if } \sigma > 1 - A (\log |t|)^{-r_1} (\log\log |t|)^{-r_2}
\]
for some positive constants $A, r_1, r_2$.
(b) Let ζ2 (s) = h2 (s)ζ(s). The analytic continuation of ζ2 (s) is simply h2 (s) times
the analytic continuation of ζ(s). Furthermore, ζ2 (s) and ζ(s) have the same
zeros for Re s > 0. Observe that
\[
\zeta_2(s) = \left(1 - 2^{-2s}\right)^{-1} \prod_{p \text{ prime},\ p \ge 3} \left(1 - p^{-s}\right)^{-1}.
\]
(c) Similarly set ζ3 (s) = h3 (s)ζ2 (s), and observe that ζ3 (s) and ζ2 (s) (and hence
also ζ(s)) have the same zeros in the region Re s > 0. Note that
\[
\zeta_3(s) = \left(1 - 2^{-2s}\right)^{-1} \left(1 - 3^{-2s}\right)^{-1} \prod_{p \text{ prime},\ p \ge 5} \left(1 - p^{-s}\right)^{-1}.
\]
(d) We continue this process, working initially in the region Re s > 2 so that all
the products involved converge uniformly. We let ζ∞ (s) be the limit of ζp (s)
as p → ∞. This limit exists and equals ζ(2s) for Re s > 2.
(e) Since ζ(2s) has an analytic continuation that does not vanish for Re s > 1/2
(because ζ(s) does not vanish if Re s > 1), each ζp (s) does not vanish for
Re s > 1/2. Since all these functions have the same zeros in this region, none
of them vanish for Re s > 1/2. Thus, ζ(s) does not vanish in this region and
the Riemann hypothesis is true.
1942: Comments
Solution to the problem. The approach sketched above is fundamentally
flawed. The error is that the analytic continuation of the limit is not necessarily
the limit of the analytic continuation. Moreover, there is no hope of salvaging the
argument above. If instead of replacing each prime with its square we used its
cube, we would then deduce that ζ(s) has no zeros for Re s > 1/3. However, this
is impossible since the zeta function has infinitely many zeros on the critical line.
Bibliography
[1] E. Bombieri, Problems of the millennium: the Riemann hypothesis, Clay Mathematics Institute, http://www.claymath.org/sites/default/files/official_problem_description.pdf.
[2] Clay Mathematics Institute, Millennium problems, http://www.claymath.org/millennium-problems.
[3] H. Davenport, Multiplicative number theory, 2nd ed., revised by Hugh L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York-Berlin, 1980. MR606931
[4] H. M. Edwards, Riemann’s zeta function, Pure and Applied Mathematics, Vol. 58, Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, 1974. MR0466039
[5] G. H. Hardy, Sur les zéros de la fonction ζ(s), Comp. Rend. Acad. Sci. 158 (1914), 1012–1014.
[6] H. Iwaniec and E. Kowalski, Analytic number theory, American Mathematical Society Colloquium Publications, vol. 53, American Mathematical Society, Providence, RI, 2004. MR2061214
[7] N. Levinson, More than one third of zeros of Riemann’s zeta-function are on σ = 1/2, Advances in Math. 13 (1974), 383–436, DOI 10.1016/0001-8708(74)90074-7. MR0564081
[8] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monatsber. Königl. Preuss. Akad. Wiss. Berlin, Nov. 1859, 671–680. http://www.maths.tcd.ie/pub/HistMath/People/Riemann/Zeta/EZeta.pdf.
[9] A. Selberg, Contributions to the theory of the Riemann zeta-function, Arch. Math. Naturvid. 48 (1946), no. 5, 89–155. MR0020594
[10] Wikipedia, Hilbert’s problems, http://en.wikipedia.org/wiki/Hilbert’s_problems.
1943
Breaking Enigma
Introduction
One group of mathematicians played a crucial role in the Allied victory in
World War II: the codebreakers. The German Army encrypted its communications
with Enigma machines, typewriter-like devices (see Figures 2 and 3 on pp. 123 and
124, respectively) that produce a fiendishly complicated code. The Polish Cipher
Bureau developed the strategies to break the Enigma code in the early 1930s, but
the largest codebreaking operation was British, headquartered at Bletchley Park,
a Victorian manor northwest of London. The top-secret Bletchley Park project,
codenamed “Ultra,” is legendary. It employed mathematicians, linguists, chess
masters, academics, composers, and puzzle experts. Recruiters once asked the
Daily Telegraph to organize a crossword competition and then secretly offered jobs
to the winners. One of the leaders of Ultra was Alan Turing, the mathematician
and pioneer of theoretical computer science whom we met in the 1936 entry.
Mathematically speaking, the Enigma machine generates a permutation τ ∈
S26 of the 26 letters of the alphabet. Here Sn denotes the symmetric group on n
symbols. The permutation τ changes with each keystroke. Typing one letter sends
an electric current through scrambling mechanisms—a plugboard, then a set of
rotors, then a reflector, then back through the rotors and the plugboard—causing
a different letter to light up. It also turns the rotors so that the next letter will be
scrambled differently.
The scramblers are wired as follows: the plugboard has one plug for each letter
and ten pairs of letters wired together. It defines a permutation π, which is a
product of ten two-cycles. The rotors are rotating wheels with a circle of twenty-
six brass pins on one side and twenty-six electrical contacts on the other. The wiring
from contacts to pins gives a fixed permutation ρ. Depending on the position of the
rotor, this permutation is conjugated by a power of the 26-cycle α = (1 2 3 . . . 26).
The reflector has twenty-six electrical contacts, connected in pairs by thirteen wires.
It gives a fixed permutation σ, a product of thirteen 2-cycles. Altogether, the
permutation τ is
\[
\pi^{-1} (\alpha^{-i_1} \rho_1 \alpha^{i_1})^{-1} (\alpha^{-i_2} \rho_2 \alpha^{i_2})^{-1} (\alpha^{-i_3} \rho_3 \alpha^{i_3})^{-1} \, \sigma \, (\alpha^{-i_3} \rho_3 \alpha^{i_3}) (\alpha^{-i_2} \rho_2 \alpha^{i_2}) (\alpha^{-i_1} \rho_1 \alpha^{i_1}) \pi,
\]
where $i_1, i_2, i_3$, which represent the positions of rotors 1, 2, and 3, vary. Since each
permutation τ is a conjugate of σ, it follows that τ is also a product of thirteen
2-cycles and that $\tau^{-1} = \tau$. Thus, a message can be encrypted and decrypted by
Enigma machines with the same settings. The operator could choose ten pairs
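of letters to connect in the plugboard, three out of five exchangeable rotors in
any order, and twenty-six initial positions for each rotor. The plugboard alone can be
wired in 150,738,274,937,250 ways; multiplying by the 5 · 4 · 3 = 60 ordered rotor choices
and the $26^3 = 17{,}576$ rotor positions gives a total of
158,962,555,217,826,360,000 initial settings for the machine.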
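The algebra above can be tested on a toy model. The following Python sketch builds τ from randomly chosen wirings (the real Enigma wirings are not reproduced here, and the rotor stepping between keystrokes is not modeled), then confirms that τ is its own inverse and has no fixed points. The last function counts settings: the number of ways to wire ten plugboard cables is 26!/(6! · 10! · 2^10), and multiplying by the ordered rotor choices and rotor positions described above gives the number of initial settings. All names are our own.

```python
import math
import random

N = 26

def compose(*perms):
    """Compose permutations right to left: compose(f, g)[x] == f[g[x]]."""
    result = list(range(N))
    for perm in reversed(perms):
        result = [perm[x] for x in result]
    return result

def inverse(perm):
    """Inverse permutation."""
    inv = [0] * N
    for i, x in enumerate(perm):
        inv[x] = i
    return inv

def random_pairing(rng, pairs):
    """A product of `pairs` disjoint 2-cycles on randomly chosen letters."""
    letters = rng.sample(range(N), 2 * pairs)
    perm = list(range(N))
    for a, b in zip(letters[::2], letters[1::2]):
        perm[a], perm[b] = b, a
    return perm

def shifted(rotor, i):
    """alpha^(-i) rho alpha^i, where alpha is the 26-cycle x -> x + 1 (mod 26)."""
    return [(rotor[(x + i) % N] - i) % N for x in range(N)]

def enigma_permutation(plug, rotors, reflector, positions):
    """tau = forward^(-1) sigma forward, with forward = (rotor 3)(rotor 2)(rotor 1)(plugboard)."""
    r1, r2, r3 = (shifted(r, i) for r, i in zip(rotors, positions))
    forward = compose(r3, r2, r1, plug)
    return compose(inverse(forward), reflector, forward)

def plugboard_count(pairs=10):
    """Ways to choose `pairs` disjoint pairs of letters: 26! / ((26 - 2k)! k! 2^k)."""
    return math.factorial(N) // (
        math.factorial(N - 2 * pairs) * math.factorial(pairs) * 2 ** pairs
    )

if __name__ == "__main__":
    rng = random.Random(1943)
    plug = random_pairing(rng, 10)
    reflector = random_pairing(rng, 13)
    rotors = [rng.sample(range(N), N) for _ in range(3)]
    tau = enigma_permutation(plug, rotors, reflector, (3, 17, 8))
    print(all(tau[tau[x]] == x for x in range(N)))   # tau is its own inverse
    print(all(tau[x] != x for x in range(N)))        # no letter is sent to itself
    print(plugboard_count())
    print(plugboard_count() * 5 * 4 * 3 * 26 ** 3)
```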
The vast number of initial settings makes the Enigma code almost unbreakable,
but it does have weaknesses. Since τ is a product of thirteen 2-cycles, no letter is
ever encoded as itself. A codebreaker can look for common words and phrases in the
encrypted text and rule them out if any letters match. German messages also had
various common formats that made them easier to guess. Furthermore, Allied
spies captured parts of Enigma machines, decrypted messages, and information
about initial settings. All this was just enough to break the code. By 1943, British
Intelligence was able to decrypt most Enigma codes without knowing the initial
settings of the machine. This capability was kept utterly secret; the Nazis never
knew. Winston Churchill (1874–1965) later told George VI (1895–1952), “It was
thanks to Ultra that we won the war.”
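The first weakness translates directly into a filtering procedure. Given a guessed plaintext phrase, any alignment with the ciphertext in which some letter would have to encrypt to itself can be discarded. The sketch below is a simplified illustration with a made-up ciphertext and guess; it is not a description of the actual Bletchley Park procedures, and the function name is our own.

```python
def possible_offsets(ciphertext, guess):
    """Offsets at which `guess` could be the plaintext, given that no letter encrypts to itself."""
    offsets = []
    for start in range(len(ciphertext) - len(guess) + 1):
        window = ciphertext[start:start + len(guess)]
        if all(c != p for c, p in zip(window, guess)):
            offsets.append(start)
    return offsets

if __name__ == "__main__":
    ciphertext = "QFZWRWIVTYRESXBFOGKUHQBAISE"   # hypothetical intercepted text
    guess = "WETTER"                             # hypothetical guessed word
    print(possible_offsets(ciphertext, guess))
```

Longer guesses eliminate more alignments, which is one reason predictable message formats were so valuable to the codebreakers.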
1943: Comments
Derangements. There are n! permutations of {1, 2, . . . , n}. A permutation
is a derangement if no element ends up where it started. Thus, (2 4 3 5 1) is not
a derangement since 3 is fixed, but (2 3 5 1 4) is. Let $p_n$ denote the fraction of
permutations of {1, 2, . . . , n} that are derangements. Does $\lim_{n\to\infty} p_n$ exist? If it
exists, is it large (close to 1) or small (close to 0)? Think about this before reading
on.
To determine $p_n$, we compute 1 minus the probability that at least one number
is fixed. Let $A_{i_1,i_2,\ldots,i_k}$ denote the number of permutations that fix the distinct
natural numbers $i_1, i_2, \ldots, i_k \le n$; these permutations may fix other numbers as
well, so long as $i_1, i_2, \ldots, i_k$ are fixed. Then $A_{i_1,i_2,\ldots,i_k} = (n-k)!$. The principle of
inclusion-exclusion ensures that the number of permutations that fix at least one
of {1, 2, . . . , n} is
$$\sum_{i_1=1}^{n} A_{i_1} - \sum_{i_1<i_2} A_{i_1,i_2} + \sum_{i_1<i_2<i_3} A_{i_1,i_2,i_3} - \cdots + (-1)^{n-1} A_{1,2,\ldots,n}.$$
Since the number of ways to select $i_1 < i_2 < \cdots < i_k$ is $\binom{n}{k}$, the preceding equals
$$\begin{aligned}
&\frac{n!}{1!(n-1)!}(n-1)! - \frac{n!}{2!(n-2)!}(n-2)! + \frac{n!}{3!(n-3)!}(n-3)! - \cdots + (-1)^{n-1}\frac{n!}{n!\,0!}\,0! \\
&\qquad = n!\left(\frac{1}{1!} - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1}\frac{1}{n!}\right).
\end{aligned}$$
Divide the above by n! (there are n! total permutations of {1, 2, . . . , n}), subtract
the result from 1, and obtain
$$p_n = \frac{1}{0!} - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^n\frac{1}{n!} = \sum_{k=0}^{n}\frac{(-1)^k}{k!}.$$
100TH ANNIVERSARY PROBLEMS 159
Since
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$
for all real x, we see that
$$\lim_{n\to\infty} p_n = e^{-1} \approx 36.79\%.$$
Looked at another way, for large n a random permutation of the set {1, 2, . . . , n}
has a 63.21% chance of keeping at least one element fixed.
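The formula for the fraction of derangements can be compared against a brute-force count for small n. The sketch below uses exact rational arithmetic; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def derangement_fraction(n: int) -> Fraction:
    """Fraction of derangements via the alternating sum of (-1)^k / k!."""
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(n + 1))

def derangement_fraction_brute_force(n: int) -> Fraction:
    """Count permutations of {0, ..., n-1} with no fixed point directly."""
    count = sum(
        all(p[i] != i for i in range(n)) for p in permutations(range(n))
    )
    return Fraction(count, factorial(n))

for n in range(1, 8):
    # The two computations should agree exactly for every n.
    print(n, derangement_fraction(n), derangement_fraction_brute_force(n))

# The fractions approach 1/e very quickly.
print(float(derangement_fraction(12)), exp(-1))
```

Because the series is alternating with rapidly shrinking terms, the fraction is already within $1/13!$ of $e^{-1}$ when n = 12.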
Bibliography
[1] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduc-
tion, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013.
MR3098499
[2] F. H. Hinsley and A. Stripp (eds.), Codebreakers: The inside story of Bletchley Park, The
Clarendon Press, Oxford University Press, New York, 1993. MR1243675
1944
Introduction
In 1944 John von Neumann and Oskar Morgenstern (1901–1997) published
Theory of Games and Economic Behavior [7], the seminal book in the field of
game theory. Since its publication, game theory has steadily become more widely
used both within and across disciplines. It is now a leading analytical tool in
microeconomic theory, formal political theory, evolutionary biology, and ecology
[5, 6]. It is even used in the study of literature and philosophy [2]. One can argue
that game theory is one of the great success stories of modern applied mathematics.
One of the central problems in the field is the determination of equilibrium
strategies for rational participants. These range from existence questions (does an
equilibrium exist and, if so, of what type?) to normative ones (what is the optimal
equilibrium?). Enormous progress was made in 1950. In his Princeton mathematics
dissertation [3], John Forbes Nash Jr. (1928–2015) proved that in a large class of
noncooperative games, an equilibrium exists in which no player has an incentive to
change his behavior. These points are now called Nash equilibria.
Nash’s biography [4] was turned into a movie by the same name, A Beautiful
Mind, which won the Oscar for Best Picture in 2002. The movie dramatized the
scene in which Nash thought of the idea for his thesis. Mathematicians, however,
might get more of a kick out of a different scene, described in the book but left
out of the movie, in which Nash visited von Neumann’s office to share his idea.
The book reports that the meeting was short and ended with von Neumann saying
“That’s trivial, you know. That’s just a fixed-point theorem.” The idea won Nash
a Nobel Prize in 1994.¹
Sadly, Nash and his wife, Alicia, died in a car crash in 2015 after returning from
Norway, where Nash had just received the famed Abel Prize. The driver of their
taxi lost control and the couple was ejected from the vehicle (they were not wearing
seat belts). The driver, who was wearing a seat belt, sustained non-life-threatening
injuries.
¹ This is not to say that von Neumann did not know what he was talking about. He was one of
the leading mathematicians of the 20th century, and his work inspired that of Fields
Medalists Alain Connes (1947– ) in 1982 and Vaughan F. R. Jones (1952– ) in 1990, both of
whom worked on von Neumann algebras (see the 1985 entry). He also made seminal contributions
to the foundations of mathematics (see the 1924 entry), ergodic theory (1931 entry), operator
theory, numerical analysis (1926 entry), quantum mechanics, hydrodynamics, fluid dynamics, and
statistics. He was also one of the founders of computer science and he was a key member of the
Manhattan Project.
164 1944. THEORY OF GAMES AND ECONOMIC BEHAVIOR
1944: Comments
The Borsuk–Ulam theorem. As von Neumann noted, the heavy lifting in
the theory of Nash equilibria is done by a fixed-point theorem. To be more specific,
if T : X → X is a function, then xf ∈ X is a fixed point of T if T (xf ) = xf . The
most famous fixed-point theorem is undoubtedly Brouwer’s fixed-point theorem;
see the 2009 entry. We describe a few simple fixed-point theorems below.
Our first fixed-point theorem involves only calculus. We claim that if T :
[0, 1] → [0, 1] is continuous, then it has at least one fixed point; see Figure 1.
Consider the continuous function Δ : [0, 1] → [−1, 1] defined by Δ(x) = T(x) − x. If either
Δ(0) or Δ(1) vanishes, then we are done. Since T(0) ≥ 0 and T(1) ≤ 1, we have Δ(0) ≥ 0
and Δ(1) ≤ 0; consequently, we may suppose that Δ(0) > 0 and Δ(1) < 0. Then the
intermediate value theorem ensures that there is an $x_f \in (0, 1)$ such that $\Delta(x_f) = 0$.
In other words, $T(x_f) = x_f$.
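The intermediate value argument is constructive: bisection on Δ(x) = T(x) − x locates a fixed point numerically. The following is a minimal sketch, assuming T is continuous and maps [0, 1] into itself. The choice T = cos is only an illustration (it is not the function from the text), and the function name is hypothetical.

```python
from math import cos

def fixed_point_by_bisection(T, tol: float = 1e-12) -> float:
    """Find x in [0, 1] with T(x) = x, assuming T is continuous on [0, 1]
    and maps [0, 1] into [0, 1]."""
    delta = lambda x: T(x) - x
    lo, hi = 0.0, 1.0
    if delta(lo) == 0:
        return lo
    if delta(hi) == 0:
        return hi
    # Invariant: delta(lo) > 0 > delta(hi), so a zero lies strictly between.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        d = delta(mid)
        if d == 0:
            return mid
        if d > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = fixed_point_by_bisection(cos)  # cos maps [0, 1] into [cos 1, 1]
print(x, cos(x))
```

Bisection finds one fixed point; when T has several, which one it reaches depends on where the sign changes fall.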
Figure 1. The function T : [0, 1] → [0, 1] given by
$T(x) = \sin^2\!\left(\frac{1}{x^4 + 0.0712}\right)\cos\!\left(\sin^2(x + 2)\right)$ has nine fixed points. Each
intersection of its graph with the diagonal line y = x is a fixed point of T.
converges to the unique fixed point guaranteed by the theorem. The method of
Picard iteration² for solving differential equations is based upon this idea; see [1].
on [0, ∞). We regard the differential equation (1944.2) as a law or model that
governs some system that evolves with respect to the variable t, which represents
time. The initial condition y(0) = 0 gives us some initial data from which we wish
to predict the future, that is, find a function y : [0, ∞) → R that satisfies (1944.2).
One solution to (1944.2) is $y(x) = x^2/4$. However, it is not the only one since
$$y_\tau(x) = \begin{cases} 0 & \text{if } x \le \tau, \\ \tfrac{1}{4}(x - \tau)^2 & \text{if } x > \tau \end{cases}$$
also solves (1944.2) for any τ ≥ 0 (a short argument with the definition of the
derivative confirms that yτ (x) is differentiable at x = τ ). Thus, our model is
useless for predicting the future since it says that y might remain zero forever, or
it might suddenly start increasing quadratically at any moment!
Bibliography
[1] W. Boyce and R. DiPrima, Elementary differential equations and boundary value problems
(seventh edition), John Wiley & Sons, 2000.
[2] M. S.-Y. Chwe, Jane Austen, game theorist, Princeton University Press, Princeton, NJ, 2014.
MR3222736
[3] J. F. Nash Jr, Non-cooperative games, Thesis (Ph.D.)–Princeton University, ProQuest
LLC, Ann Arbor, MI, 1950. http://www.princeton.edu/mudd/news/faq/topics/Non-Cooperative_Games_Nash.pdf.
MR2938064
[4] S. Nasar, A beautiful mind, Simon & Schuster, New York, 1998. MR1631630
² The mathematician Charles Émile Picard (1856–1941) was a distant ancestor of the famed
Jean-Luc Picard (2305 – ?). Although this is supported by neither canonical nor noncanonical
sources, we would like to “make it so.”
1945
Introduction
Suppose that F is a field. If α is the root of a polynomial with coefficients in F,
then α is algebraic over F. For instance, $i = \sqrt{-1}$ is algebraic over R since it is a
root of $x^2 + 1$. We say that the degree of the extension field C = R(i) is two, since
C = {a + bi : a, b ∈ R} is a two-dimensional vector space over R. If an extension
field F(α) is finite-dimensional over F, then it is a finite extension of F. A finite
extension of the rational numbers is a number field. The quadratic number fields
$\mathbb{Q}(\sqrt{d})$ for various d are an important class of examples; see the 1966 entry.
If α is not the root of a polynomial with coefficients in F, then α is transcen-
dental over F. For example, π and e are known to be transcendental over Q; see
the 1918, 1934, 1935, 1955, and 1973 entries for more information. If α is transcen-
dental over F, then no nontrivial F-linear combination of the powers of α equals
zero. That is, a nonzero polynomial in α cannot be reduced to elements of F. Con-
sequently, the extension field F(α) obtained from F by including α resembles the
field of rational functions, with coefficients in F, in the “variable” α. That is,
$$F(\alpha) = \left\{ \frac{p(\alpha)}{q(\alpha)} : p, q \text{ are polynomials} \right\}.$$
A function field in one variable over F is a field K, containing F and at least one
element x that is transcendental over F, such that K is a finite algebraic extension
of F(x). For instance, we can consider the subfield K of F(x, y) generated by two
transcendental elements x and y that satisfy the defining equation $y^2 = x^5 + 1$.
Here F(x, y) denotes the set of all quotients of polynomials in x and y. Such a field
is said to have transcendence degree one over F. A function field in one variable
over a finite field of constants is a global function field .
It is often easier to prove results for function fields than number fields. This
is one reason for their popularity: they provide a nice testing ground to explore
what may be true for number fields. For instance, Fermat’s last theorem is famously
difficult to prove (see the 1995 entry) but its function-field analogue is a consequence
of the Mason–Stothers theorem (see the 1981 entry), which itself has an elementary
proof that boils down to a careful study of polynomials and their derivatives. The
Riemann hypothesis (see the 1942 entry) is no exception to this rule. Weil proved
it in the function field setting in the 1940s; we are still waiting for a proof of the
Riemann hypothesis in the classical case.
Since the Riemann zeta function plays such a useful role in understanding the
prime numbers, we wish to define an analogue for function fields: the zeta function
of a global function field K over Fq , the finite field with q elements. We must first
170 1945. THE RIEMANN HYPOTHESIS IN FUNCTION FIELDS
introduce some notation; for more on these objects see [8]. A prime in K is a
discrete valuation ring R with maximal ideal P such that F ⊂ R and the quotient
field of R is equal to K. The group of divisors of K, denoted $D_K$, is the free abelian
group generated by the primes. For $A \in D_K$, we define the norm of A, denoted
N(A), to be $q^{\deg A}$. The zeta function of K is
$$\zeta_K(s) = \sum_{A \ge 0} \frac{1}{N(A)^s} = \prod_{P \text{ prime in } K} \left(1 - \frac{1}{N(P)^s}\right)^{-1}, \qquad \operatorname{Re} s > 1.$$
Like the Riemann zeta function, ζK (s) can be analytically continued to a larger
domain (see the 1942 entry). The Riemann hypothesis over global function fields
is a theorem, first proved by André Weil in the 1940s.
The result above was first conjectured for hyperelliptic function fields by Emil
Artin (1898–1962) in his thesis. The simplest case (elliptic curves; see the 1921
entry) was proved by Helmut Hasse. The first proof of the general result was
published by Weil in 1948. Weil presented two proofs of this theorem. The first
used the geometry of algebraic surfaces and the theory of correspondences. The
second used the theory of abelian varieties; see [13, 14]. The whole project required
revisions in the foundations of algebraic geometry since he needed these theories
to be valid over arbitrary fields, not just algebraically closed fields in characteristic
zero. In the early seventies, Fields Medalist Enrico Bombieri (1940– ) obtained
a more elementary proof, building upon important work of Sergei Aleksandrovich
Stepanov (1941– ).
$$\zeta_K(s) = \frac{L_K(q^{-s})}{(1 - q^{-s})(1 - q^{1-s})}.$$
You will need to use the Riemann–Roch theorem. For more details about the genus
of a function field and the Riemann–Roch theorem see [9].
1945: Comments
Special values of the Riemann zeta function. Since we have discussed the
Riemann zeta function in this entry, now is a good time to explore some more of its
intriguing properties. While reading the results below ask yourself: do analogous
statements hold in the function field setting?
The Riemann zeta function can be evaluated at the even positive integers in
closed form. The first few values are
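$$\begin{aligned}
\zeta(2) &= \frac{\pi^2}{6}, & \zeta(10) &= \frac{\pi^{10}}{93555}, \\
\zeta(4) &= \frac{\pi^4}{90}, & \zeta(12) &= \frac{691\pi^{12}}{638512875}, \\
\zeta(6) &= \frac{\pi^6}{945}, & \zeta(14) &= \frac{2\pi^{14}}{18243225}, \\
\zeta(8) &= \frac{\pi^8}{9450}, & \zeta(16) &= \frac{3617\pi^{16}}{325641566250};
\end{aligned}$$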
see the 1919 entry for an evaluation of ζ(2). In fact, Euler showed that ζ(2k) is a
rational multiple of $\pi^{2k}$ for k = 1, 2, . . .. On the other hand, the exact value of
$$\zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3} = 1.2020569031595942854\ldots$$
is unknown. Fortune and glory await the mathematician who provides a closed-
form evaluation of ζ(3). The most significant result in this direction is due to
Roger Apéry (1916–1994), who proved that ζ(3) is irrational in 1979 [1] (see the
1979 entry).
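The rational multiples in the table above can be reproduced exactly. The sketch below relies on the classical formula $\zeta(2k) = |B_{2k}|\,(2\pi)^{2k} / (2\,(2k)!)$, in which $B_{2k}$ is a Bernoulli number. That formula is a standard fact that is not derived in this entry, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli_numbers(m: int) -> list:
    """Return [B_0, ..., B_m] using sum_{j<=k} C(k+1, j) B_j = 0 for k >= 1."""
    B = [Fraction(1)]
    for k in range(1, m + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B

def zeta_even_coefficient(k: int) -> Fraction:
    """Rational number r with zeta(2k) = r * pi^(2k)."""
    B = bernoulli_numbers(2 * k)
    return abs(B[2 * k]) * 2 ** (2 * k - 1) / factorial(2 * k)

for k in range(1, 9):
    # Prints the coefficient of pi^(2k) as an exact fraction.
    print(2 * k, zeta_even_coefficient(k))
```

No comparable closed form is known for the odd values, which is why ζ(3) is treated separately in the text.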
A famous open problem in this area is the Dirichlet divisor problem. It asks
for the infimum over all α such that
$$\sum_{j=1}^{n} \tau(j) = n \log n + (2\gamma - 1)n + O(n^{\alpha}),$$
On the other hand, Edmund Landau (1916) showed that the infimum must be ≥ 1/4.
It is customary to write $\tau(n) = \sum_{d \mid n} 1$, in which the subscript d|n indicates that
the sum runs over all of the positive divisors of n. This suggests the generalization
$$\sigma_k(n) = \sum_{d \mid n} d^k,$$
which sums the kth powers of the divisors of n. If k = 0, then σ0 = τ . The case
k = 1 is also special; we write σ = σ1 and refer to this as the sigma function (or the
sum of divisors function). Like the τ function, the functions σk are multiplicative.
For Re s > 2, we have
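$$\begin{aligned}
\zeta(s)\zeta(s-1) &= \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{m=1}^{\infty} \frac{1}{m^{s-1}} \\
&= \sum_{n=1}^{\infty} \frac{1}{n^s} \sum_{m=1}^{\infty} \frac{m}{m^s} \\
&= \left(1 + \frac{1}{2^s} + \frac{1}{3^s} + \frac{1}{4^s} + \cdots\right)\left(1 + \frac{2}{2^s} + \frac{3}{3^s} + \frac{4}{4^s} + \cdots\right) \\
&= 1 + \frac{1}{2^s} + \frac{2}{2^s} + \frac{1}{3^s} + \frac{3}{3^s} + \frac{1}{4^s} + \frac{2}{4^s} + \frac{4}{4^s} + \cdots \\
&= \sum_{n=1}^{\infty} \frac{\sigma(n)}{n^s},
\end{aligned}$$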
that Albert Ingham (1900–1967) used in 1930 to provide a quick proof of the prime
number theorem [7]. Ramanujan’s formula reduces to the curious formula
$$\sum_{n=1}^{\infty} \frac{(\tau(n))^2}{n^s} = \frac{\zeta^4(s)}{\zeta(2s)}$$
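Both Dirichlet series identities in this comment, $\zeta(s)\zeta(s-1) = \sum \sigma(n)/n^s$ and $\sum \tau(n)^2/n^s = \zeta^4(s)/\zeta(2s)$, can be sanity-checked numerically with truncated sums. The sketch below takes s = 4. The truncation points and the function names are arbitrary choices made for illustration, so agreement is only expected up to the size of the neglected tails.

```python
def divisor_power_sums(limit: int, k: int) -> list:
    """sums[n] = sigma_k(n) for 1 <= n <= limit (index 0 is unused)."""
    sums = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for multiple in range(d, limit + 1, d):
            sums[multiple] += d ** k
    return sums

def zeta_partial(s: int, terms: int = 200_000) -> float:
    """Truncated sum for zeta(s); adequate for s >= 3."""
    return sum(n ** -s for n in range(1, terms + 1))

def dirichlet_partial(coeffs: list, s: int) -> float:
    """Truncated Dirichlet series sum_{n} coeffs[n] / n^s."""
    return sum(coeffs[n] / n ** s for n in range(1, len(coeffs)))

LIMIT, S = 5000, 4
sigma = divisor_power_sums(LIMIT, 1)   # sum of divisors
tau = divisor_power_sums(LIMIT, 0)     # number of divisors
tau_squared = [t * t for t in tau]

print(dirichlet_partial(sigma, S), zeta_partial(S) * zeta_partial(S - 1))
print(dirichlet_partial(tau_squared, S), zeta_partial(S) ** 4 / zeta_partial(2 * S))
```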
Bibliography
[1] R. Apéry, Irrationalité de ζ(2) et ζ(3) (French), Luminy Conference on Arithmetic, Astérisque
61 (1979), 11–13. MR3363457
[2] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics,
Springer-Verlag, New York-Heidelberg, 1976. MR0434929
[3] E. Artin, Quadratische Körper im Gebiete der höheren Kongruenzen I and II, Math. Z.
19 (1924), 153-296. https://eudml.org/doc/167773 and https://eudml.org/doc/167774.
[4] E. Bombieri, Problems of the millennium: the Riemann hypothesis, Clay Mathematics Institute,
http://www.claymath.org/sites/default/files/official_problem_description.pdf.
[5] E. Bombieri, Counting points on curves over finite fields (d’après S. A. Stepanov),
Séminaire Bourbaki, 25ème année (1972/1973), Exp. No. 430, Lecture Notes in Math.,
Vol. 383, Springer, Berlin, 1974, pp. 234–241.
http://link.springer.com/chapter/10.1007%2FBFb0057311. MR0429903
[6] G. Lejeune Dirichlet, Sur l’usage des séries infinies dans la théorie des nombres (French), J.
Reine Angew. Math. 18 (1838), 259–274, DOI 10.1515/crll.1838.18.259. MR1578191
[7] A. E. Ingham, Note on Riemann’s zeta-Function and Dirichlet’s L-Functions, J. London
Math. Soc. 5 (1930), no. 2, 107–112, DOI 10.1112/jlms/s1-5.2.107. MR1574211
[8] K. Ireland and M. Rosen, A classical introduction to modern number theory, 2nd ed., Grad-
uate Texts in Mathematics, vol. 84, Springer-Verlag, New York, 1990. MR1070716
[9] C. Moreno, Algebraic curves over finite fields, Cambridge Tracts in Mathematics, vol. 97,
Cambridge University Press, Cambridge, 1991. MR1101140
[10] S. Ramanujan, On certain trigonometrical sums and their applications in the theory of num-
bers [Trans. Cambridge Philos. Soc. 22 (1918), no. 13, 259–276], Collected papers of Srini-
vasa Ramanujan, AMS Chelsea Publ., Providence, RI, 2000, pp. 179–199. MR2280864
[11] P. Sarnak, Problems of the millennium: the Riemann hypothesis (2004), Clay Mathematics
Institute, http://www.claymath.org/sites/default/files/sarnak_rh_0.pdf.
[12] A. Weil, On the Riemann hypothesis in function fields, Proc. Nat. Acad. Sci. U. S. A. 27
(1941), 345–347. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1078336/. MR0004242
[13] A. Weil, Sur les courbes algébriques et les variétés qui s’en déduisent (French), Actualités
Sci. Ind., no. 1041 = Publ. Inst. Math. Univ. Strasbourg 7 (1945), Hermann et Cie., Paris,
1948. MR0027151
[14] A. Weil, Variétés abéliennes et courbes algébriques (French), Actualités Sci. Ind., no. 1064 =
Publ. Inst. Math. Univ. Strasbourg 8 (1946), Hermann & Cie., Paris, 1948. MR0029522
1946
Introduction
While today it is hard to gaze around a room without seeing a computer,
be it in a smartphone or a thermostat, the situation was different during World
War II. Computers were in their infancy. They were rare, expensive, and big.
Early computers could fill an entire room and they had enormous power demands.
A major leap came when people realized that they could be used for more than
computing exact answers to specific problems. They can be used to approximate
the answers to difficult problems through extensive simulations. This led to what
is now called the Monte Carlo method (the name refers to the famous casino in
Monaco).
The first thoughts and attempts I made to practice [the Monte Carlo
method] were suggested by a question which occurred to me in 1946
as I was convalescing from an illness and playing solitaires. The ques-
tion was what are the chances that a Canfield solitaire laid out with
52 cards will come out successfully? After spending a lot of time try-
ing to estimate them by pure combinatorial calculations, I wondered
whether a more practical method than “abstract thinking” might not
be to lay it out say one hundred times and count the number of success-
ful plays. This was already possible to envisage with the beginning of
the new era of fast computers, and I immediately thought of problems
of neutron diffusion and other questions of mathematical physics, and
more generally how to change processes described by certain differen-
tial equations into an equivalent form interpretable as a succession of
random operations. – Stanislaw Ulam (1909–1984) [2]
Monte Carlo techniques are now used to approximate the solution to numerous
problems. Rather than finding exact answers, one can simulate millions of cases and
use that information to obtain an excellent approximation to the correct answer. An
early application was to nuclear reactions, in which scientists would approximate
both the trajectories of neutrons and the numbers released in each collision. A
more down-to-earth example involves integration. In calculus, students learn how
to compute areas by integration. Instructors work hard to find functions that have
nice antiderivatives; a general function does not have a closed-form expression for
its integral (see the 1968 and 1976 entries). For instance, the definite integrals
$$\int_0^1 \sqrt{1 + x^3}\,dx, \qquad \int_2^{47} \frac{dx}{\ln x}, \qquad \text{and} \qquad \int_0^1 e^{-x^2}\,dx$$
176 1946. MONTE CARLO METHOD
in $[0, 1]^2$, count how many of them satisfy $y_i \le e^{-x_i^2}$, then divide this number by
Figure 1. Monte Carlo method for computing the area A bounded by the
ellipse $x^2 + 4y^2 = 1$. The true area is π/2 = 1.570796 . . .; see the comments
for the 1941 entry for the derivation.
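The hit-or-miss idea can be sketched in a few lines: sample points uniformly in a box, count those that land in the region, and scale by the area of the box. The seed, the number of trials, and the function names below are arbitrary illustrative choices.

```python
import random
from math import exp, pi

def area_under_gaussian(trials: int, rng: random.Random) -> float:
    """Estimate the integral of exp(-x^2) over [0, 1] by sampling the unit square."""
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if y <= exp(-x * x):
            hits += 1
    return hits / trials  # the unit square has area 1

def ellipse_area(trials: int, rng: random.Random) -> float:
    """Estimate the area bounded by x^2 + 4y^2 = 1 using the box [-1, 1] x [-1/2, 1/2]."""
    hits = 0
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-0.5, 0.5)
        if x * x + 4 * y * y <= 1:
            hits += 1
    return 2.0 * hits / trials  # the bounding box has area 2

rng = random.Random(2019)
print(area_under_gaussian(200_000, rng))   # compare with 0.746824...
print(ellipse_area(200_000, rng), pi / 2)
```

Each run gives slightly different estimates unless the seed is fixed; the spread shrinks like one over the square root of the number of trials.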
applications. A popular, early method is the von Neumann middle square digits
method, described with some nice references in the “random numbers” section of [3].
Given an n-digit natural number, square it to get a 2n-digit number. Our random
number is the middle n digits. We then square that, take the middle n digits of
the new product, and obtain our next “random” number. Continuing this process
generates our pseudo-random sequence of numbers. For example, if we start with
4321, our next number is 6710 since $4321^2 = 18671041$. Since $6710^2 = 45024100$,
our next number is 241.
This process cannot generate numbers uniformly at random, even if we restrict
ourselves to numbers from 0 to $10^n - 1$. The reason is simple: this process generates
a periodic sequence! After at most $10^n$ terms we have a repeat, at which point
the pattern cycles since all future terms are completely determined by the preceding
value.
For each n, what is the shortest period? The longest? How many of the $10^n$
initial seeds have the shortest (or longest) period? Can you give an example? If you
cannot solve this problem exactly, can you approximate the answer using Monte
Carlo techniques?
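The middle-square rule and its eventual periodicity are easy to experiment with. The sketch below assumes n is even, so that "the middle n digits" of the zero-padded 2n-digit square is unambiguous; the function names are illustrative.

```python
def middle_square(value: int, n: int) -> int:
    """One step of the middle-square method on an n-digit number (n even)."""
    squared = str(value * value).zfill(2 * n)  # pad the square to 2n digits
    start = n // 2
    return int(squared[start:start + n])

def orbit(seed: int, n: int) -> tuple:
    """Return (tail_length, period) of the sequence started at `seed`."""
    first_seen = {}
    sequence = []
    value = seed
    while value not in first_seen:
        first_seen[value] = len(sequence)
        sequence.append(value)
        value = middle_square(value, n)
    tail = first_seen[value]
    return tail, len(sequence) - tail

print(middle_square(4321, 4))                      # 6710, as in the text
print(middle_square(middle_square(4321, 4), 4))    # 241, as in the text
print(orbit(4321, 4))                              # steps before the cycle, cycle length
```

Looping `orbit` over all $10^n$ seeds, or over a random sample of them for larger n, is one way to explore the questions above.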
1946: Comments
How many trials? One of the most natural questions to ask when doing a
Monte Carlo simulation is, “How many trials N are needed for a given accuracy?” In
many settings the convergence is fast, with the error on the order of $1/\sqrt{N}$. Let us
explore approximating the area A of a region R contained in the unit square $[0, 1]^2$.
Let X1 , X2 , . . . , XN be independent, identically distributed random variables, in
which each is 1 with probability A and 0 with probability 1 − A (each Xn is a
Bernoulli random variable; see the comments for the 1922 entry). The fraction
$$Y_N = \frac{X_1 + \cdots + X_N}{N}$$
has expected value A and variance
$$\operatorname{Var} Y_N = \frac{1}{N^2} \sum_{n=1}^{N} \operatorname{Var} X_n = \frac{1}{N^2} \sum_{n=1}^{N} A(1 - A) = \frac{A(1 - A)}{N}.$$
As N → ∞, the central limit theorem (see the 1922 entry) ensures that the
random variable $Y_N$ is approximately normally distributed with mean A and
standard deviation
$$\sqrt{\frac{A(1 - A)}{N}}.$$
The greatest uncertainty occurs when A = 1/2, so the standard deviation is at
most $\frac{1}{2\sqrt{N}}$. The probability that the observed estimate is off by more than $2/\sqrt{N}$
is therefore bounded by the probability of being more than four standard deviations from
the mean, which is approximately 0.0000633425. If instead we asked about being
within $3/\sqrt{N}$, the probability of failing decreases to at most $1.973176 \cdot 10^{-9}$.
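The two tail probabilities quoted above are two-sided normal tails at four and six standard deviations, and they can be reproduced with the complementary error function. A minimal sketch (the function name is illustrative):

```python
from math import erfc, sqrt

def two_sided_normal_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal random variable Z."""
    return erfc(k / sqrt(2))

# An error of 2/sqrt(N) is at least 4 standard deviations, because the
# standard deviation of the estimate is at most 1/(2 sqrt(N)).
print(two_sided_normal_tail(4))
# An error of 3/sqrt(N) is at least 6 standard deviations.
print(two_sided_normal_tail(6))
```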
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the 1998
original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin, 2018.
MR3823190
[2] R. Eckhardt, Stan Ulam, John von Neumann, and the Monte Carlo method, with contribu-
tions by Tony Warnock, Gary D. Doolen and John Hendricks; Stanislaw Ulam 1909–1984,
Los Alamos Sci. 15, Special Issue (1987), 131–137. http://library.lanl.gov/cgi-bin/
getfile?00326867.pdf. MR935772
[3] N. Metropolis, The beginning of the Monte Carlo method, Stanislaw Ulam 1909–1984,
Los Alamos Sci. 15, Special Issue (1987), 125–130. http://library.lanl.gov/cgi-bin/
getfile?00326866.pdf. MR935771
[4] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc. 44 (1949),
335–341. http://www.jstor.org/stable/2280232. MR0031341
[5] Wikipedia, Monte Carlo method, http://en.wikipedia.org/wiki/Monte_Carlo_method.
1947
Introduction
There are many important problems for which an algorithm to find a solution
exists but has a prohibitively long run time that limits its practical value. One
example is integer factorization: given an integer N , write it as a product of primes.
We give one solution below without any attempt to improve its efficiency.
• Step 1: Initialize Factors(N ) to be the empty list (a prime may appear more than once); as
the name suggests, we will store the factors of N here. Let M = N and n = 2 and continue to Step 2.
• Step 2: If n divides M , then append n to Factors(N ), replace M with M/n,
and continue to Step 3. If n does not divide M , then let n = n + 1; if n = M ,
then append n to Factors(N ) and go to Step 4, else repeat this step.
• Step 3: If M > 1, then set n = 2 and repeat Step 2, else go to Step 4.
• Step 4: Print Factors(N ) and stop.
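The four steps above translate almost line by line into Python. The sketch below deliberately keeps the inefficiency of the description (the trial divisor restarts at 2 after every factor found) and assumes N ≥ 2; the function name is illustrative.

```python
def factor(N: int) -> list:
    """Factor N >= 2 into primes, following Steps 1-4 literally."""
    factors = []       # Step 1: Factors(N) starts empty
    M, n = N, 2
    while M > 1:       # Step 3: keep going while something is left to factor
        if M % n == 0:
            # Step 2, first case: n divides M
            factors.append(n)
            M //= n
            n = 2      # Step 3: restart the trial divisor
        else:
            # Step 2, second case: try the next candidate
            n += 1
            if n == M:
                factors.append(n)   # what remains is prime
                M = 1               # go to Step 4
    return factors     # Step 4: report the factors

print(factor(360))
print(factor(97))
```

The run time grows roughly like the largest prime factor of N, which is exactly the prohibitive cost the introduction alludes to.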
When the entries are restricted to 0 or 1, we can interpret the components as bi-
nary indicator variables. Do we have a plane leaving from Albany to Charlotte
at 2:45pm? Do we show The Lego Movie on our biggest screen at 10:30am? If
we are trying to solve the traveling salesman problem (what is the route of least
distance through a given set of cities?), is the fifth leg of our trip from Boston to
Rochester? These examples should convey the importance of solving binary integer
programming problems.
Prove that if we could modify the simplex method to handle problems with
quadratic constraints, then we could solve all integer programming problems! For
those familiar with the P versus NP problem (see the comments for the 2000 entry),
this would prove P equals NP.
1947: Comments
Overview of the simplex method. While we cannot describe the simplex
method in its full glory and prove why it works in a short introduction, we can at
least sketch what it is and give a sense of why it should work. Suppose that we
wish to solve the canonical linear programming problem: minimize $c^{T}x$ subject to
Ax = b for an m×n matrix A, in which the entries of x and b are nonnegative (if an
entry of b were negative, we could multiply the corresponding row of A by −1 and
reverse its sign). We also make the assumption that the problem is nondegenerate,
in the following sense.
Suppose that the m rows of A are linearly independent. If the rows are not
linearly independent, then either we cannot solve Ax = b or at least one of the
rows is unnecessary. We also assume b is not a linear combination of fewer than
m columns of A. If b is a combination of fewer than m columns, this will create a
technical difficulty in the simplex method. Fortunately this is a weak condition: if
we change some of the entries of b by small amounts (less than $10^{-10}$, for example),
this should suffice to break the degeneracy.
The simplex method has two phases.
• Phase I: Find a basic feasible solution (or prove that none exists).
• Phase II: Given a basic feasible solution, find a basic optimal solution (or prove
that none exists). If no optimal solution exists, Phase II produces a sequence of
feasible solutions with cost $c^{T}x$ that tends to minus infinity.
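To make "basic feasible solution" concrete, the sketch below enumerates every choice of m columns of A, solves the resulting square system exactly, and keeps the nonnegative solutions. This brute-force search is not the simplex method. It only illustrates the objects that the two phases work with, and it finds the optimum only when the problem is bounded. The function names and the small example problem are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(B, b):
    """Solve B z = b exactly by Gauss-Jordan elimination; None if B is singular."""
    m = len(B)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(B, b)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][m] for r in range(m)]

def best_basic_feasible_solution(A, b, c):
    """Return (cost, x) minimizing c.x over all basic feasible solutions of Ax = b, x >= 0."""
    m, n = len(A), len(A[0])
    best = None
    for cols in combinations(range(n), m):
        B = [[A[i][j] for j in cols] for i in range(m)]
        z = solve_square(B, b)
        if z is None or any(v < 0 for v in z):
            continue  # singular basis or infeasible basic solution
        x = [Fraction(0)] * n
        for j, v in zip(cols, z):
            x[j] = v
        cost = sum(Fraction(cj) * xj for cj, xj in zip(c, x))
        if best is None or cost < best[0]:
            best = (cost, x)
    return best

# Hypothetical example: minimize -x1 - 2*x2 subject to
# x1 + x2 + s1 = 4 and x1 + 3*x2 + s2 = 6 with all variables nonnegative.
A = [[1, 1, 1, 0], [1, 3, 0, 1]]
b = [4, 6]
c = [-1, -2, 0, 0]
print(best_basic_feasible_solution(A, b, c))
```

The number of candidate bases grows like a binomial coefficient in n and m, which is why a guided search such as the simplex method is needed in practice.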
The idea of the proof seems absurd at first: we start by assuming we can do
Phase II, use that to do Phase I, and then use Phase I to do Phase II. The reason
this argument is not circular is that the input of Phase II is a basic feasible solution.
If we have a problem for which we have one such solution, then we can run through
Phase II. Instead of the original problem, we instead consider a related one for which
we can find a basic feasible solution by inspection. It is to this related problem that
we apply Phase II to determine whether or not there is a basic feasible solution to
the original problem; if there is, we then use that solution as an input in applying
Phase II to the original problem.
We proceed by appending the m × m identity matrix to A to form the new
matrix A′ = [A I] and consider the following new canonical linear programming
184 1947. THE SIMPLEX METHOD
(a) The region of feasible solutions. (b) Searching for the cheapest diet.
can shift the cost line down and to the left and lower the cost. Doing so lands us at
one of the three vertices (unless the slope of the cost line equals the slope of one of
the boundary lines, but even in that case we would still have the value at a vertex
equal to the minimal cost). These arguments generalize to a large class of linear
programming problems and show that the optimal solution occurs at a boundary;
all that remains is to find a method to quickly reach such a point.
Bibliography
[1] G. Dantzig, The diet problem, Interfaces 20 (1990), no. 4, 43–47.
[2] J. Franklin, Methods of mathematical economics: Linear and nonlinear programming, fixed-
point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin, 1980.
MR602694
[3] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[4] Wikipedia, Simplex algorithm, https://en.wikipedia.org/wiki/Simplex_algorithm.
1948
Introduction
The prime number theorem states that the number of primes at most x, denoted
π(x), is asymptotic to x/ log x:
$$\lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1;$$
see the 1919 and 1933 entries. First conjectured in the 1790s, it was not proved
until almost 100 years later, when Jacques Hadamard and Charles Jean de la Vallée-
Poussin (1866–1962) independently established it in 1896. They both used complex
analysis to understand the distribution of zeros of the Riemann zeta function (see
the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries).
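The slow convergence of the ratio π(x)/(x/log x) toward 1 is easy to observe numerically. The sketch below counts primes with a sieve of Eratosthenes; the function names and the sample values of x are illustrative choices.

```python
from math import log

def prime_sieve(limit: int) -> bytearray:
    """is_prime[k] == 1 exactly when k is prime, for 0 <= k <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out the multiples of p starting at p*p.
            is_prime[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return is_prime

def prime_count(x: int) -> int:
    """The prime counting function pi(x)."""
    return sum(prime_sieve(x))

for x in (10**3, 10**4, 10**5, 10**6):
    count = prime_count(x)
    # The ratio drifts toward 1, but only logarithmically slowly.
    print(x, count, count / (x / log(x)))
```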
Since the prime number theorem is a statement about integers and not about
complex analysis, these proofs were unsatisfactory to some. It felt unnatural to use
complex numbers to study primes.¹ However, it was commonly believed that complex
analysis or other similarly “deep” methods were needed to prove it. According
to G. H. Hardy (see the 1940 entry):
No elementary proof of the prime number theorem is known, and one
may ask whether it is reasonable to expect one. Now we know that the
theorem is roughly equivalent to a theorem about an analytic function,
the theorem that the Riemann zeta function has no roots on a certain
line. A proof of such a theorem, not fundamentally dependent on the
theory of functions, seems to me extraordinarily unlikely.
It took almost fifty years for an elementary proof (that is, one that does not
rely on complex analysis) to be found. This was done by Paul Erdős [3] (see the
1913 entry) and Atle Selberg [9] in 1948. The story of who contributed what and
when, and who should receive what credit, has been the subject of many heated
discussions. Dorian Goldfeld (1947– ), who knew the players involved, has written
a good description of what happened [5]. See also [6] for a motivated account of
the proof.
The term “elementary” should not be confused with “easy.” The elementary
proofs of the prime number theorem are longer, more technical, and provide less
accurate estimates about π(x) than the complex analysis proofs do. In fact, the
classical approach is still preferred in most textbooks. We devote the remainder of
this entry to discussing the traditional, complex analysis approach.
¹ A famous, humorous dictum is that the shortest path between two statements involving real
numbers is through the complex plane; that is certainly the case here.
188 1948. ELEMENTARY PROOF OF THE PRIME NUMBER THEOREM
In Riemann’s seminal 1859 paper [8], he showed that knowledge of the zeros of
the zeta function yields information about π(x); see the 1942 entry. The fact that
the zeta function enjoys the Euler product representation (1933.3) suggests the use
of logarithmic derivatives. Recall that the logarithmic derivative of f is
$$(\log f)' = \frac{f'}{f}.$$
The logarithmic derivative of a product is a sum of logarithmic derivatives:
$$\frac{(fg)'}{fg} = \frac{f'g + fg'}{fg} = \frac{f'}{f} + \frac{g'}{g};$$
the same holds true for products with three or more factors. With appropriate
limit arguments, one can use this technique to study the Euler product (1933.3)
representation of the zeta function. The following theorem from complex analysis
permits us to pass from knowledge of the logarithmic derivative of a function to
knowledge of the number and location of the roots of that function.
Theorem: Let Ω be a nonempty, connected open set in C and let γ be a simple
closed curve in Ω with its interior in Ω. Let f : Ω → C be analytic with no zeros
on γ and let g : Ω → C be analytic. Then f has finitely many zeros in the interior
of γ and
$$\frac{1}{2\pi i} \int_{\gamma} \frac{f'(s)}{f(s)}\, g(s)\, ds = \sum_{f(\rho) = 0} g(\rho), \tag{1948.1}$$
Some care is needed in writing down the sum so that it converges (this is typically
done by summing the zeros in complex-conjugate pairs). As remarked in the 1942
entry, the analytic continuation of ζ(s) has only a single pole,³ which is simple and
at s = 1 with residue 1; this is responsible for the $x = x^1/1$ term in (1948.2). The
remaining terms come from the zeros of ζ(s). One can show that these zeros have
real part at most 1 without too much trouble; this follows from the convergence of
the Euler product (1933.3).
The prime number theorem asserts that
\[ \sum_{p \le x} 1 \sim \frac{x}{\log x} \]
as x → ∞. The left-hand side of this expression is precisely the left-hand side of (1948.2). The
crucial step in the standard proof of the prime number theorem is to show that if
ζ(ρ) = 0, then Re ρ < 1. See [7] for a more fleshed out sketch of this proof or see
[3, 6] for full details.
2 The line Re s = 2 is not a simple closed curve. However, it is when viewed as a curve that
passes through ∞. Suitable limit arguments and the Riemann sphere model of the complex plane
(see the 1956 entry) are required to push this through.
3 A pole of f is an isolated singularity $s_0$ around which f behaves like a constant times
$(s - s_0)^{-k}$ for some natural number k. The pole is simple if k = 1. The residue of f at a simple
pole $s_0$ is $\lim_{s \to s_0} (s - s_0) f(s)$.
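The asymptotic statement of the prime number theorem can be explored numerically. The sketch below is only an illustration and plays no role in the proof; the helper name `prime_count` is ours, and it counts primes with a plain sieve of Eratosthenes before comparing π(x) with x/log x.

```python
import math

def prime_count(limit):
    """Return pi(limit), the number of primes p <= limit, using a sieve of Eratosthenes."""
    if limit < 2:
        return 0
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            # Cross off the multiples of n, starting at n*n.
            is_prime[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return sum(is_prime)

for x in (10**3, 10**4, 10**5, 10**6):
    count = prime_count(x)
    ratio = count / (x / math.log(x))
    print(x, count, round(ratio, 4))
```

The printed ratios drift toward 1 quite slowly, which is consistent with the remark that a zero-free region on the line Re s = 1 alone yields the theorem only with a poor error term.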
1948: Comments
Mertens’s theorem. In 1874, a little more than 20 years before the prime
number theorem was proved, Franz Mertens proved that
\[ \lim_{x \to \infty} \Bigl( \sum_{p \le x} \frac{1}{p} - \log\log x - M \Bigr) = 0, \tag{1948.3} \]
For a proof of Mertens’s theorem, see almost any textbook on analytic number
theory (see [7] or [10]).
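A quick numerical check makes (1948.3) concrete. The sketch below is an illustration under our own naming (`primes_up_to` and `mertens_gap` are hypothetical helpers); it computes the sum of 1/p over primes p ≤ x minus log log x. The reference value 0.2615 used in the comment is the commonly quoted decimal approximation of M (the Meissel–Mertens constant) and is not taken from the text above.

```python
import math

def primes_up_to(limit):
    """Return the list of primes p <= limit (limit >= 2) via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit) + 1):
        if sieve[n]:
            sieve[n * n :: n] = bytearray(len(range(n * n, limit + 1, n)))
    return [n for n in range(2, limit + 1) if sieve[n]]

def mertens_gap(x):
    """Return sum_{p <= x} 1/p - log log x, which (1948.3) says tends to M."""
    return sum(1.0 / p for p in primes_up_to(x)) - math.log(math.log(x))

for x in (10**2, 10**4, 10**6):
    # The values should settle near M, approximately 0.2615.
    print(x, round(mertens_gap(x), 5))
```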
The asymptotic growth rate log log x remains the same even if we sum over the
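reciprocals of all prime powers. To be more specific,
\[ \sum_{p^a \le x} \frac{1}{p^a} = \log\log x + O(1), \]
in which $p^a$ denotes a prime power and O(1) a quantity that remains bounded as
x → ∞. Indeed, Mertens’s theorem ensures that
\begin{align*}
\sum_{p^a \le x} \frac{1}{p^a}
  &= \sum_{p \le x} \frac{1}{p} + \sum_{\substack{p^a \le x \\ a \ge 2}} \frac{1}{p^a}
   \le \log\log x + O(1) + \sum_{n \ge 2} \sum_{k \ge 2} \frac{1}{n^k} \\
  &= \log\log x + O(1) + \sum_{n \ge 2} \frac{1}{n^2} \Bigl( 1 + \frac{1}{n} + \frac{1}{n^2} + \cdots \Bigr) \\
  &= \log\log x + O(1) + \sum_{n \ge 2} \frac{1}{n^2} \cdot \frac{1}{1 - 1/n} \\
  &= \log\log x + O(1).
\end{align*}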
A zero-free region for the zeta function. Let us sketch the proof that
ζ(s) ≠ 0 whenever s = σ + it with σ, t ∈ R and σ ≥ 1. This is very weak progress
towards the Riemann hypothesis, although it is enough to obtain the prime number
theorem (though with a poor error term). First, use the series definition (1933.2)
to prove that ζ(σ) ≠ 0 if σ > 1. Then use the Euler product formula (1933.3) to
show that ζ(σ + it) ≠ 0 if σ > 1.
We are left with the case σ = 1. This was originally independently proved by
Hadamard and de la Vallée-Poussin in 1896; fill in the details of Mertens’s elegant
proof from a few years later by proving the following statements.
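(a) 3 + 4 cos θ + cos 2θ ≥ 0. (Hint: Consider (cos θ + 1)².)
(b) For s = σ + it, $\log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-k\sigma}}{k}\, e^{-itk \log p}$.
(c) $\operatorname{Re} \log \zeta(s) = \sum_{p} \sum_{k=1}^{\infty} \frac{p^{-k\sigma}}{k} \cos(t \log p^k)$.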
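For readers who want to check statement (a) against the hint, one possible route uses only the double-angle formula cos 2θ = 2 cos² θ − 1. This is a sketch of a single step, not the full argument:

```latex
\begin{align*}
3 + 4\cos\theta + \cos 2\theta
  &= 3 + 4\cos\theta + (2\cos^2\theta - 1) \\
  &= 2\cos^2\theta + 4\cos\theta + 2 \\
  &= 2(\cos\theta + 1)^2 \;\ge\; 0 .
\end{align*}
```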
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 6th ed., see corrected reprint of the
1998 original [MR1723092]; including illustrations by Karl H. Hofmann, Springer, Berlin,
2018. MR3823190
[2] D. Burt, S. Donow, S. J. Miller, M. Schiffman, and B. Wieland, Irrationality measure and
lower bounds for π(X), Pi Mu Epsilon J. 14 (2017), no. 7, 421–429. http://arxiv.org/pdf/
0709.2184.pdf. MR3726946
[3] P. Erdős, On a new method in elementary number theory which leads to an elementary
proof of the prime number theorem, Proc. Nat. Acad. Sci. U. S. A. 35 (1949), 374–384, DOI
10.1073/pnas.35.7.374. MR0029411
[4] H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353, DOI
10.2307/2307043. MR0068566
[5] D. Goldfeld, The elementary proof of the prime number theorem: an historical perspective,
http://www.math.columbia.edu/~goldfeld/ErdosSelbergDispute.pdf.
[6] N. Levinson, A motivated account of an elementary proof of the prime number theorem,
Amer. Math. Monthly 76 (1969), 225–245, DOI 10.2307/2316361. http://www.maa.org/
sites/default/files/images/upload_library/22/Ford/NormanLevinson.pdf. MR0241372
[7] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[8] G. F. B. Riemann, Über die Anzahl der Primzahlen unter einer gegebenen Grösse, Monats-
ber. Königl. Preuss. Akad. Wiss. Berlin (1859), 671-680. http://www.maths.tcd.ie/pub/
HistMath/People/Riemann/Zeta/EZeta.pdf.
[9] A. Selberg, An elementary proof of the prime-number theorem, Ann. of Math. (2)
50 (1949), 305–313, DOI 10.2307/1969455. http://www.jstor.org/stable/1969455?seq=1#
page_scan_tab_contents. MR0029410
[10] T. Tao, Mertens’ theorems, https://terrytao.wordpress.com/2013/12/11/mertens-theorems/.
1949
Beurling’s Theorem
Introduction
The study of linear transformations between finite-dimensional spaces is the
purview of linear algebra. Analysis rarely enters into the discussion because it is
possible to show any two norms on the vector space $\mathbb{R}^n$ are essentially the same.
To be more specific, they give rise to the same open sets, closed sets, convergent
sequences, continuous functions, and so forth. The study of linear transformations
between infinite-dimensional normed vector spaces is called operator theory. There
are lots of things that can go wrong when one steps up to the infinite-dimensional
setting; we examine a few of them below. The interplay between linear algebra
and analysis is one of the great appeals of the subject.
A norm on a vector space V is a function $\|\cdot\| : V \to [0, \infty)$ that satisfies
(a) $\|v\| = 0$ if and only if $v = 0$,
(b) $\|cv\| = |c|\,\|v\|$ for all v ∈ V and all scalars c, and
(c) $\|u + v\| \le \|u\| + \|v\|$ for all u, v ∈ V.
The Euclidean norm $\|x\| = (x_1^2 + x_2^2 + \cdots + x_n^2)^{1/2}$ on $\mathbb{R}^n$ is an example, as are
\[ \|x\|_1 = |x_1| + |x_2| + \cdots + |x_n| \quad\text{and}\quad \|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\}, \tag{1949.1} \]
\[ \|f\|_\infty = \sup_{x \in [a,b]} |f(x)|. \]
Figure 1. The closed unit balls (a) $\{x \in \mathbb{R}^2 : \|x\|_1 \le 1\}$ and (b) $\{x \in \mathbb{R}^2 : \|x\|_\infty \le 1\}$
for the norms (1949.1) on $\mathbb{R}^2$ have
corners. The corners are extreme points: they do not lie in any
open line segment that joins two points of the closed ball. Here
$e_1 = (1, 0)$ and $e_2 = (0, 1)$.
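The claim that any two norms on $\mathbb{R}^n$ are essentially the same can be made tangible for the three norms above. The following sketch, with function names of our own choosing, evaluates them on a sample vector and checks the standard chain of inequalities $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le n\,\|x\|_\infty$. That chain is a well-known fact rather than something stated in the text.

```python
import math

def norm_1(x):
    """Sum of absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def norm_2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    """Largest absolute value of a coordinate."""
    return max(abs(t) for t in x)

x = [3.0, -4.0, 1.0]   # an arbitrary sample vector
n = len(x)

print(norm_inf(x), norm_2(x), norm_1(x))
# Each norm is bounded by a constant multiple of the others,
# so all three produce the same convergent sequences and open sets.
print(norm_inf(x) <= norm_2(x) <= norm_1(x) <= n * norm_inf(x))
```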
Hardy space $H^2$ (named after G. H. Hardy; see the 1940 entry), which consists of
complex power series $f(z) = \sum_{n=0}^{\infty} a_n z^n$ for which
\[ \|f\| = \Bigl( \sum_{n=0}^{\infty} |a_n|^2 \Bigr)^{1/2} \]
is finite. Each function $f \in H^2$ is analytic (see p. 151) on the open unit disk
D = {z ∈ C : |z| < 1}. The spaces $\ell^2(\mathbb{N})$ and $H^2$ are fundamentally the same; they
are relabelled versions of each other. The shift operator on $\ell^2(\mathbb{N})$ is essentially the
same as the operator that maps f(z) to zf(z): multiplication by z shifts the Taylor
coefficients of f.
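The identification of the shift with multiplication by z can be seen on finite coefficient lists. The sketch below assumes we truncate to polynomials (a simplification, since elements of $H^2$ are generally infinite series), and all function names are ours. It shows that shifting the coefficients agrees with multiplying by z and leaves the norm unchanged.

```python
import math

def shift(coeffs):
    """Forward shift: (a0, a1, a2, ...) -> (0, a0, a1, a2, ...)."""
    return [0] + list(coeffs)

def multiply_by_z(coeffs):
    """Coefficients of z*f(z), computed as a polynomial product with z = 0 + 1*z."""
    z = [0, 1]
    out = [0] * (len(coeffs) + 1)
    for i, a in enumerate(coeffs):
        for j, b in enumerate(z):
            out[i + j] += a * b
    return out

def h2_norm(coeffs):
    """Square root of the sum of squared moduli of the coefficients."""
    return math.sqrt(sum(abs(a) ** 2 for a in coeffs))

f = [1, 2, 3]                     # f(z) = 1 + 2z + 3z^2
print(shift(f), multiply_by_z(f))  # the two lists coincide
print(h2_norm(f), h2_norm(shift(f)))  # the shift preserves the norm
```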
Beurling’s theorem asserts that the nontrivial invariant subspaces for the forward
shift operator are all of the form $uH^2 = \{uf : f \in H^2\}$, in which u is an
inner function. What is an inner function? An inner function is a bounded analytic
function on D whose boundary values on the unit circle have absolute value 1
“almost everywhere.” For example, a Möbius transformation (see the 1956 entry)
of the form $u(z) = (a - z)/(1 - \overline{a} z)$ with a ∈ D is an inner function, as are finite
products of such functions. An important factorization theorem asserts that each inner
function factors as
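\[ e^{i\gamma} z^N \prod_{n=1}^{\infty} \frac{|z_n|}{z_n}\, \frac{z_n - z}{1 - \overline{z_n}\, z}\, \exp\Bigl( -\int_{-\pi}^{\pi} \frac{e^{it} + z}{e^{it} - z}\, d\mu(e^{it}) \Bigr), \]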
1949: Comments
Left and right inverses. Here are two proofs that left and right inverses
coincide for n × n matrices [4]. Both involve finite dimensionality in a crucial way
and avoid the unnecessary use of determinants. You may wish to consider at which
point the proofs break down for the infinite matrices in (1949.2).
it is a polynomial of degree at most n. Let x ∈ [0, 1] and consider a coin flip with
probability of heads x and tails 1 − x, respectively. The probability of k heads in n
trials is $P(k, n) = \binom{n}{k} x^k (1 - x)^{n-k}$. What do the Bernstein polynomials (1949.3)
represent? Think of f ∈ C[0, 1] as the “payoff function” for a coin tossing game: if
there are k heads in n trials, then you win f (k/n) dollars. The expected winnings
are (Bn f )(x). For large n, we expect k ≈ nx heads, so
\[ (B_n f)(x) = \text{Expected winnings after } n \text{ tosses} \approx f(k/n) \approx f(x). \]
The uniform continuity of f on [0, 1] and some probability theory ensure that this
informal reasoning can be pushed through.
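The coin-tossing description translates directly into a computation. In the sketch below we assume the Bernstein polynomial is the expected payoff described above, that is, the sum over k of f(k/n) times the probability P(k, n). The function names and the test function |t − 1/2| are our own choices for illustration.

```python
import math

def bernstein(f, n, x):
    """Expected winnings after n tosses: sum over k of f(k/n) * P(k, n)."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

def max_error(f, n, grid=101):
    """Largest gap between B_n f and f on an evenly spaced grid in [0, 1]."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)

def payoff(t):
    # A continuous payoff with a corner at t = 1/2.
    return abs(t - 0.5)

for n in (25, 100, 400):
    print(n, round(max_error(payoff, n), 4))
```

The errors shrink as n grows, though slowly near the corner of the payoff function, which is where uniform continuity has to do the most work.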
The Müntz–Szász theorem. Recall that the span of a set of vectors is the
collection of all finite linear combinations of elements of that set. The Weierstrass
approximation theorem says that $\operatorname{span}\{1, x, x^2, \ldots\}$ is dense in C[a, b]. This suggests
the following question.
Let $0 = \lambda_0 < \lambda_1 < \lambda_2 < \cdots$. What are necessary and sufficient
conditions for $\operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}$ to be dense in C[a, b]?
This question has a precise and elegant answer, due independently to Herman
Müntz (1884–1956) [9] and Otto Szász (1884–1952) [11]. Let
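\[ S = \operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}^{-} \]
denote the closure of $\operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}$ in C[a, b], in which a > 0.
(a) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty$, then S = C[a, b].
(b) If $\sum_{n=1}^{\infty} \frac{1}{\lambda_n} < \infty$ and if $\lambda \notin \{\lambda_n\}_{n=0}^{\infty}$, then $x^{\lambda} \notin S$ (so S is not dense in C[a, b]).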
The proof of the Müntz–Szász theorem is beyond the scope of this course. Its key
ingredients are the Hahn–Banach theorem and Riesz representation theorem from
functional analysis and the Blaschke characterization of the zero sets of bounded
analytic functions on the unit disk. A proof can be found in [8].
Here are two curious corollaries of the Müntz–Szász theorem. Let a > 0.
(a) If $C[a, b] = \operatorname{span}\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}^{-}$, then there is an infinite subsequence of the
$\lambda_n$ that can be removed from the collection $\{1, x^{\lambda_1}, x^{\lambda_2}, \ldots\}$ so that the span
of the new collection is also dense in C[a, b].
(b) $\operatorname{span}\{1, x^2, x^3, x^5, x^7, x^{11}, x^{13}, x^{17}, x^{19}, \ldots\}$ is dense in C[a, b]. This follows
from Euler’s proof that the sum of the reciprocals of the primes diverges;
see the notes for the 1913 entry.
Solution to the problem. Now for the answer to our question, which (in the
$L^2[0, 1]$ Hilbert space setting) was first asked by Israel Gelfand [7]. The invariant
subspaces of the Volterra integration operator are all of the form {f ∈ C[0, 1] :
f (x) = 0 for x ∈ [0, a)} for some a ∈ [0, 1]. Thus, the invariant subspaces form
an uncountable, linearly ordered chain of subspaces of C[0, 1]. This result was
established in 1949 by Shmuel Agmon (1922– ) [1], with later proofs given by
several other authors (the most influential proof being that of Donald Sarason
(1933–2017) [10]).
Bibliography
[1] S. Agmon, Sur un problème de translations (French), C. R. Acad. Sci. Paris 229 (1949),
540–542. MR0031110
[2] S. N. Bernstein, Démonstration du théorème de Weierstrass, fondée sur le calcul des
probabilités, Commun. Soc. Math. Kharkow (2) 13, 1–2.
[3] A. Beurling, On two problems concerning linear transformations in Hilbert space, Acta Math.
81 (1948), 17, DOI 10.1007/BF02395019. MR0027954
[4] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical
Textbooks, Cambridge University Press, 2017.
[5] S. R. Garcia, J. Mashreghi, and W. T. Ross, Introduction to model spaces and their oper-
ators, Cambridge Studies in Advanced Mathematics, vol. 148, Cambridge University Press,
Cambridge, 2016. MR3526203
[6] J. B. Garnett, Bounded analytic functions, 1st ed., Graduate Texts in Mathematics, vol. 236,
Springer, New York, 2007. MR2261424
[7] I. M. Gelfand, A problem (Russian), Uspehi Matem. Nauk, 5 (1938), 233.
[8] P. D. Lax, Functional analysis, Pure and Applied Mathematics (New York), Wiley-
Interscience [John Wiley & Sons], New York, 2002. MR1892228
1950
Arrow’s Impossibility Theorem
Introduction
Kenneth J. Arrow (1921–2017) was awarded the Nobel Prize in Economics
in 1972. Among the contributions cited in the prize committee’s statement was
the “possibility theorem” from his doctoral dissertation on voting theory that was
published as the book Social Choice and Individual Values [1–3]. Arrow set out
to determine the best election procedure and narrowed the set of all procedures
by requiring them to satisfy a number of desirable properties. These properties
were called axioms because they represented what Arrow believed were, in some
sense, the most natural properties that an election procedure should satisfy. Arrow
showed that no election procedure satisfies the axioms, which we describe below,
when two or more voters decide among three or more candidates. That is, the
axioms are inconsistent (see the notes for the 1924 entry for another example of
inconsistent axioms). His result is now referred to as Arrow’s impossibility theorem.
Assume that each of m ≥ 2 voters can rank order n ≥ 3 candidates, listing
them from most preferred to least preferred. An election procedure aggregates the
voters’ rankings and produces a societal ranking of the candidates. A version of
Arrow’s theorem from 1963 (the second edition of [1]) says that there is no election
procedure that satisfies the following three axioms.
• Pareto condition: If every voter prefers A over B, then the group ranks A above
B.
• Nondictatorship: There is not a single voter who is able to determine the group’s
rankings (that is, there is no dictator).
• Independence of Irrelevant Alternatives (or IIA): The societal ranking between
candidates A and B should only depend on the voters’ preferences for A and B.
The third axiom perhaps requires a bit more explanation. It asserts that for
a society to rank A and B, it is irrelevant to factor in how the voters rank other
candidates. For example, suppose that several voters change the relative rankings
of B and C. This should not affect how the society ranks A and B. It may, of
course, affect how the society ranks B and C.
To appreciate Arrow’s theorem, we need to go back to the beginnings of voting
theory. Nicolas de Condorcet (1743–1794), whose full name was Marie Jean Antoine
Nicolas de Caritat, Marquis de Condorcet, was a French mathematician, political
scientist, and philosopher. Although he published several papers on differential
and integral calculus, in mathematics he is most famous for observing a fundamen-
tal paradox in voting theory, which we describe below. He died in prison under
mysterious circumstances (some suggest poison) during the French revolution.
A candidate is the Condorcet winner if the candidate defeats every other can-
didate in a pairwise election (by being preferred by more than half of the voters
to every other candidate in a head-to-head competition). However, not every col-
lection of voters’ preferences has a Condorcet winner. In his 1785 paper Essai sur
l’application de l’analyse à la probabilité des décisions rendues à la pluralité des
voix, Condorcet made a remarkable observation. Suppose that three voters have
the following preferences for candidates A, B, and C:
                 voter 1   voter 2   voter 3
First choice        A         B         C
Second choice       B         C         A
Third choice        C         A         B
Suppose that C is removed from consideration. Then we have the voter preferences
                 voter 1   voter 2   voter 3
First choice        A         B         A
Second choice       B         A         B
Consequently, A would defeat B in a pairwise election (denoted A ≻ B) because A
would receive two first-choice votes (from voters 1 and 3) and B would receive only
one such vote (from voter 2). Similarly, B would defeat C in a pairwise election
and C would defeat A. Notationally, this is represented by
A ≻ B ≻ C ≻ A
and is referred to as a Condorcet cycle. A Condorcet cycle can involve more candidates.
For example, we might have
A ≻ B ≻ D ≻ E ≻ C ≻ A.
If a Condorcet cycle contains all of the candidates in an election, then that election
does not have a Condorcet winner.
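The pairwise comparisons in Condorcet's example are easy to automate. The following sketch uses hypothetical helper names of our own and represents each ballot as a list ordered from most to least preferred. It reproduces the three head-to-head results and confirms that the profile has no Condorcet winner.

```python
def pairwise_winner(ballots, a, b):
    """Return whichever of a, b is ranked higher by more voters, or None on a tie."""
    a_votes = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
    b_votes = len(ballots) - a_votes
    if a_votes > b_votes:
        return a
    if b_votes > a_votes:
        return b
    return None

def condorcet_winner(ballots):
    """Return the candidate who beats every rival head-to-head, or None if there is none."""
    candidates = ballots[0]
    for c in candidates:
        if all(pairwise_winner(ballots, c, d) == c for d in candidates if d != c):
            return c
    return None

# Voters 1, 2, and 3 from the table above.
ballots = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

for a, b in (("A", "B"), ("B", "C"), ("C", "A")):
    print(a, "vs", b, "->", pairwise_winner(ballots, a, b))
print("Condorcet winner:", condorcet_winner(ballots))
```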
An election procedure satisfies the Condorcet winner criterion (CWC) if the
following holds:
If a Condorcet winner exists, then the election procedure always has
the Condorcet winner ranked first.
A weaker and easily accessible version of Arrow’s impossibility theorem requires just
two axioms, IIA and the Condorcet winner criterion (CWC), but also supposes that
the election procedure returns a top-ranked candidate; see [5, p. 343] for details.
1950: Comments
Penrose–Banzhaf power index. In the discussion above we gave each voter
the same weight. However, this is frequently not the case in practice. Consider the
following two situations for a private firm. In the first, Adams owns 90% of the
stock, Buchanan owns 8%, and Cleveland has the remaining 2%. In the second,
Adams has 45%, Buchanan 35%, and Cleveland 20%. How much are their shares
worth in each case, assuming that if over 50% of the stock supports a plan, then
that plan will be done?
Adams effectively controls the company in the first setting, since she can do
whatever she wants and the other two cannot outvote her. The second case is more
interesting, since any two of the three suffice to control the company. Thus, in this
setting, it is reasonable to say each share is worth a third of the company.
More generally, we define the Penrose–Banzhaf power index (named after Lionel
Penrose (1898–1972) [7] and John F. Banzhaf III (1940– ) [4]) as follows. A winning
coalition is a set of voters that is sufficient to pass a measure. That is, if every
member of a winning coalition votes “yes,” then the measure is guaranteed to pass.
A swing vote is an additional vote necessary for a particular coalition; without that
“yes” vote, the coalition is not sufficient to pass the measure. The power index of
an individual is the fraction of all possible swing votes that they cast, that is, their
percentage of all swing votes in all winning coalitions.
In our first example, there are four winning coalitions (any coalition with
Adams), and Adams casts every swing vote, so Adams’s power index is 1 while
Buchanan and Cleveland both get 0. In our second example, there are four winning
coalitions (any coalition with at least two voters), and each voter has the same
number of swing votes, so they all have power index 1/3. This index is useful in
evaluating many real-world voting schemes, such as the United States Electoral
College or the European Union.
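The definition of the power index can be turned into a brute-force enumeration for small electorates. The sketch below is one possible reading of it, with a function name and interface of our own: a coalition wins when its combined stock strictly exceeds 50%, and a member casts a swing vote when the coalition would no longer win without them.

```python
from fractions import Fraction
from itertools import combinations

def banzhaf(weights, quota):
    """Return (winning coalitions, power index per voter).

    weights maps each voter to a weight; a coalition wins if its total
    weight is strictly greater than quota.
    """
    names = list(weights)
    swings = {name: 0 for name in names}
    winning = []
    for size in range(1, len(names) + 1):
        for coalition in combinations(names, size):
            total = sum(weights[v] for v in coalition)
            if total <= quota:
                continue
            winning.append(coalition)
            for v in coalition:
                # v is a swing voter if the coalition loses without v.
                if total - weights[v] <= quota:
                    swings[v] += 1
    all_swings = sum(swings.values())
    index = {v: Fraction(swings[v], all_swings) for v in names}
    return winning, index

first = {"Adams": 90, "Buchanan": 8, "Cleveland": 2}
second = {"Adams": 45, "Buchanan": 35, "Cleveland": 20}

for shares in (first, second):
    coalitions, index = banzhaf(shares, quota=50)
    print(coalitions)
    print({voter: str(power) for voter, power in index.items()})
```

Enumerating all coalitions takes time exponential in the number of voters, so this approach is only practical for small examples such as these.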
Voting in Venice. The most convoluted voting system that the authors are
aware of was the method used by the Republic of Venice to elect its Doge; see
Figure 1. It was established in 1268 and remained in use until the ignominious
fall of the Most Serene Republic in 1797. The details and particulars are so mind-
bogglingly complicated that we have no choice but to quote John Julius Norwich
(1929–2018), an authority on the subject [6, p. 166]. Surely we would get the finer
points incorrect otherwise!
On the day appointed for the election, the youngest member of the
Signoria was to pray in St Mark’s; then on leaving the Basilica, he was
to stop the first boy he met and take him to the Doges’ Palace, where
the Great Council, minus those of its members who were under thirty,
was to be in full session. This boy, known as the ballotino, would
have the duty of picking the slips of paper from the urn during the
drawing of lots. By the first of such lots, the Council chose thirty of
their own number. The second was used to reduce the thirty to nine,
and the nine would then vote for forty, each of whom was to receive at
least seven nominations. The forty would then be reduced, again by
lot, to twelve, whose task was to vote for twenty-five, of whom each
this time required nine votes. The twenty-five were in turn reduced to
another nine; the nine voted for forty-five, with a minimum of seven
votes each, and from these the ballotino picked out the names of eleven.
The eleven now voted for forty-one—nine or more votes each—and it
was these forty-one who were to elect the Doge. They first attended
Mass, and individually swore an oath that they would act honestly
and uprightly, for the good of the Republic. They were then locked in
secret conclave in the Palace, cut off from all contact or communication
with the outside world and guarded by a special force of sailors, day
and night, until their work was done.
So much for the preliminaries; now the election itself could begin.
Each elector wrote the name of his candidate on a paper and dropped
it in the urn; the slips were then removed and read, and a list drawn
up of all the names proposed, regardless of the number of nominations
for each. A single slip for each name was now placed in another urn,
and one drawn. If the candidate concerned was present, he retired
together with any other elector who bore the same surname, and the
remainder proceeded to discuss his suitability. He was then called back
to answer questions or to defend himself against any accusations. A
ballot followed. If he obtained the required twenty-five votes, he was
declared Doge; otherwise a second name was drawn, and so on.
With a system so tortuously involved as this, it may seem remark-
able that anyone was ever elected at all.
Bibliography
[1] K. J. Arrow, Social Choice and Individual Values, Cowles Commission Monograph No. 12,
John Wiley & Sons, Inc., New York, N. Y.; Chapman & Hall, Ltd., London, 1951. MR0039976
[2] K. J. Arrow, Social Choice and Individual Values (second edition), Cowles Foundation Mono-
graphs Series 12, Yale University Press, 1963.
[3] K. J. Arrow, Social Choice and Individual Values (third edition), Cowles Foundation Mono-
graphs Series 12, Yale University Press, 2012. http://www.jstor.org/stable/j.ctt1nqb90.
[4] J. F. Banzhaf, Weighted voting doesn’t work: a mathematical analysis, Rutgers Law Re-
view 19 (1965), no. 2, 317–343. http://heinonline.org/HOL/Page?handle=hein.journals/
rutlr19&div=19&g_sent=1&collection=journals.
[5] COMAP, For All Practical Purposes (ninth edition), W. H. Freeman and Company, 2013.
[6] J. J. Norwich, A History of Venice, Vintage Books, 1989.
[7] L. Penrose, The elementary statistics of majority voting, Journal of the Royal Sta-
tistical Society 109 (1946), no. 1, 53–57. http://www.jstor.org/stable/2981392?seq=1#
page_scan_tab_contents.
1951
Tennenbaum’s Proof of the Irrationality of √2
Introduction
There are now hundreds of proofs of the irrationality of √2. Perhaps the most
familiar is the following:
Suppose toward a contradiction that √2 = a/b, in which a, b are relatively
prime integers and b ≠ 0. Squaring the preceding equation, we
obtain 2b² = a². This shows that a² is even, so a is even too. Write
a = 2c with c a positive integer, so that 2b² = (2c)² = 4c². Thus,
b² = 2c². This shows that b², and hence b itself, is even. Thus, 2
divides both a and b, which is a contradiction. We conclude that √2
is irrational.
It is worth noting that we used the fact that 2 is a prime number. Indeed, if p is
a prime number that divides a perfect square a², then p divides a itself. This does
not hold in general since, for example, 4 divides 36 = 6², but 4 does not divide 6.
Sometime in the 1950s Stanley Tennenbaum came up with the following geometric
gem (see Figure 1). Suppose toward a contradiction that √2 = a/b, in
which the positive integer b is as small as possible. Consider a square with sides
of length a, and draw squares of side length b in the upper-left and lower-right
corners. Since a² = 2b², the area of the two squares of side length b equals that of
the large square of side length a. The figure suggests that these two squares miss
two small squares with side length a − b and double count a square with side length
2b − a. Consequently, the double counted region must have the same area as the
two missing squares; that is, (2b − a)² = 2(a − b)². Thus, √2 = (2b − a)/(a − b)
and a little more work shows 0 < a − b < b. This contradicts the minimality of b,
so √2 is irrational. For a discussion of the history of Tennenbaum’s proof, see [4].
Figure 1. Illustration of Tennenbaum’s proof of the irrationality of √2.
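The area identity read off from the picture can also be confirmed by direct expansion. The following is a sketch of that check, using only the assumption a² = 2b² from the argument above:

```latex
\begin{align*}
(2b - a)^2 - 2(a - b)^2
  &= (4b^2 - 4ab + a^2) - (2a^2 - 4ab + 2b^2) \\
  &= 2b^2 - a^2 \\
  &= 0 .
\end{align*}
```

As for the bounds, 1 < √2 < 2 gives b < a < 2b, and subtracting b throughout yields 0 < a − b < b.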
Is Tennenbaum’s proof valid? One must always be wary about “proofs by
picture.” There are many appealing visual “proofs” that are wrong; see Figure 2.
Fortunately, the geometric intuition used in Tennenbaum’s proof can be formalized.
It is instructive to see what the fundamental ingredients are and how the proof can
proceed without the use of diagrams. Suppose toward a contradiction that √2 is
rational and let b be the smallest natural number so that √2 = a/b for some integer
a. More explicitly, the well-ordering principle ensures that
\[ b = \min\{ n \in \mathbb{N} : \sqrt{2}\, n \in \mathbb{Z} \} \]
exists. We claim that a > b; if not, then we obtain a contradiction:
\[ a \le b \implies \underbrace{2b^2 = a^2}_{\text{since } \sqrt{2} = a/b} \le b^2 \implies 2 \le 1. \]
1951: Comments
The square root of 2 has been called the Rome of mathematics, for all roads
lead to it [2, p. 207]. Here are some lesser-known proofs for your enjoyment.
Since the binomial coefficients $\binom{n}{k}$ are integers and since $(\sqrt{2})^n$ is either an integer
or an integer times √2, the desired formula (1951.4) follows. From (1951.4) and
the assumption that √2 is rational, we have
\[ e_n = a_n + b_n \sqrt{2} = a_n + b_n \frac{p}{q} = \frac{a_n q + b_n p}{q} = \frac{c_n}{q}, \]
in which $c_n$ is an integer. Since $e_n \ne 0$, it follows that $c_n \ge 1$ and hence $e_n \ge 1/q$.
In light of (1951.3), we find that
\[ \frac{1}{q} \le e_n < \frac{1}{2^n} \]
for every n ∈ N. However, the resulting inequality $2^n < q$ fails for sufficiently large
n. This contradiction shows that √2 is irrational.
Proof by overkill. Our final gem is not self-contained, but it is worth the
setup (if only for its humor value) [1]. The following result was first conjectured
by Pierre de Fermat in 1637 and remained one of the most famous open problems
in mathematics until its resolution in 1994 by Andrew Wiles. Its proof requires
some of the most sophisticated and powerful tools of modern number theory; see
the 1995 entry.
Theorem (Fermat’s Last Theorem). There do not exist natural numbers a, b, c,
n that satisfy $a^n + b^n = c^n$ if n ≥ 3.¹
We are now in a position to prove that $\sqrt[n]{2}$ is irrational for n = 3, 4, 5, . . ..
Suppose that n ≥ 3 and $\sqrt[n]{2} = a/b$, in which a, b ∈ N. Raise both sides to the nth
power and obtain $a^n = 2b^n = b^n + b^n$, in violation of Fermat’s last theorem.² It
has been remarked sarcastically that Fermat’s last theorem is not strong enough to
handle the case n = 2.
Bibliography
[1] R. Ehrenborg, An observation, Amer. Math. Monthly 110 (2003), no. 5, 423.
[2] S. R. Garcia, S. J. Miller, 100 Years of Math Milestones: The Pi Mu Epsilon Centennial
Collection, American Mathematical Society, 2019.
[3] S. J. Miller and D. Montague, Picturing irrationality, Math. Mag. 85 (2012), no. 2, 110–114,
DOI 10.4169/math.mag.85.2.110. http://arxiv.org/pdf/0909.4913.pdf. MR2910300
[4] P. Tennenbaum (personal communication), http://web.williams.edu/Mathematics/
sjmiller/public_html/math/papers/Neode_Tennenbaum.PDF.
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
1 If n = 2, then there are many solutions, such as 3² + 4² = 5² and 5² + 12² = 13².
2 The author of [1] attributes this observation to William Henry Schultz, an undergraduate
at UNC Charlotte.
1952
NSA Founded
Introduction
The National Security Agency (NSA) is a high-technology organization on the
frontiers of communications and data processing. Created by President Harry S.
Truman (1884–1972) in 1952, the NSA coordinates, directs, and performs highly
specialized activities to protect U.S. information systems. Moreover, it also pro-
duces vital foreign intelligence information for U.S. policy makers and the U.S.
military. Virtually every mathematical discipline finds some application within the
NSA. In addition to cryptology, mathematicians at the NSA work on problems in
signal analysis, speech processing, coding theory, data compression, communication
networks, computer security, and other areas.
The NSA is the country’s largest employer of mathematicians and provides op-
portunities for both summer internships and full-time employment. The Director’s
Summer Program (DSP) and the CASASP (Cryptanalysis and Signals Analysis
Summer Program) are summer internships open to undergraduate students ma-
joring in mathematics. The Graduate Mathematics Program (GMP) is a summer
internship available to graduate students. Additionally, the NSA hires full-time
mathematicians and statisticians year round at every degree level. For more
information about these opportunities and how to apply, visit
https://www.nsa.gov/careers/.
K5FSVEJ238VE49QT2DQP4P28J8ST6PGJ69GJPAA0MK99U9DM243OH8PN514N80G3
8Q58FTFHCUH61IVPOR2TGBITUTIV56KU56TVQ0V71CV8TMFSPM2HGOP228FGTSIM
3TSP8KHA7GG34QTCN83O7OP8JOJ450VS5J60EPMDUV70QPE5SHHS88A6OPIONQ08
V1KDULE9JG9KN4RLCKUL51MJ3AMMGBQDSMDUJESP0Q36G48E7SH2LA41TT29G4C5
NOF4TSIHC3I1OJ8JJDUPNO6UF2TUDURELMBQHHKDR7Q3CMOG44JEHADQUPID567O
P2N02I92H6DIFPCJKJTR30HQ718SH2PS3OB8RM6K4BCT08VC8NUPJ4B0365MHEQ8
1GJ4FG5LQ4AE3PCF1QL030NAE9PG0D4OSSRKDAVGJS1GBMQVVK7G919NLM7ETIBC
PA7OF3D1JAQ71UME1IHOEVR9MRLJEEB4C17LIC6TK9MGQHO1MHOHGJ25MPF6OU9E
LO4RVHVGGKVS5ANK2QOAPQJAH9DU4R5Q63878JQ5DUAKV4NQJCVOVC9KJC7T8GCM
J5TM9TVG76T194MT05SLKU20HFCVBTT6LQ18N80J0BIFCA5FBUPEFGBDRRCR0P2H
ALQ1PGI4EDRHIKKCNVQDM3FFA94EECOLEM3R2N4BMDOV2TUPEFTMKH3KHGLOFSLS
N26BE6SV762N3F8RA15TN8RO9A1IJ819HU0IL8NORDPCHD25MT81QT8JA0G3HHO1
APTKS8MJDH86TFQNFJU8SNJ83MLAFKR4PCFGF2MKK5OBCO3SR0RQ9QLEPUES0F4L
1LIPD6P8SK7RC9FDR6RTC0UCNFIF4R65KF2AIFMFSGM57TSJL9DQJLLQ0BJL27MK
RS9KIAL9EVIHQTKPEDOTO4LQ4FLDIL6T25916DVRABS88T7T8USI0ULUBGQQC7MU
DPNE45JUEOL6P9ERO78RO3UJ8JSF6DJ49AJO65IL6DDEHHHSTQ25K7K2SN2I7STQ
5DC8VCBC4PDHM46TG56T5BB02VSR75I2DIG73NGT87UIAMF47VS86EL035HV52NE
HM1M72J092NG1ALMVVQRFS73K6O9ATGDAPQ1JHGQGU5J8JITG5C9JGJ49GFD580N
3121EBGRCFSJE8USS9KE7RA4O7Q6QG9UUM1Q4HK83K4G326OHSE50NOS4H6TM5I5
4LA8S74FLO4AFCDIALAF8338K5TG3KRKP6J5GJ2A491HU66JHVCAML0J4BI34761
2QE2L610RNM699MFG74CCUVCJKMB3K7Q09U9FUUSPSDKF0TK74P2R8SNDRCDNPUC
1L0K78N05SN2H5MFJ7N8BE6I6A9EKTM96LNPO7L85UNC3SO9MT6HMDKLMLFJCJ9I
70LU34O4U1UPAG79VD2MUP8DQS07874A096FCVB2QV2QOF0R2JGBM5UF6NN0FGN0
2P0R4TM93TPBIPB0VE8627ENR5NB0J23S3QK5CNO5EVC5U03L92G6SD858CPQD8H
3D1QED6VL6L4UEBMPKUUVNSGI5OVVSG2JKB0RI92LM98H9UQM338B9ADGRADLAIP
AROKCE74NL6PT9VSPABODIBVSMM4JCFSLD2OM9OLRQKQF0OROV89ILE5JJ80R8LD
6MUL69QE4B2GB2L30I29O5A9A90PUD0VM94RTNHIESP69QBED62R59G3ITNNBU1F
1I49ILEOCS7CRON2LUFKCFR969O4C8JGULTJB4RSTGRPSI48R8K42CSNRILAJLLE
740TF8GLS3MLL65SB0F8BCHMK1GKCD87I1ONETA5UF6Q9ELET75SR1H1AG3EJSL6
UU4PUD6SB1G16PALODQG2FI18NDMD0I1LS04VARF8K1FBV3CTQ1VGRNBFA9IPUB2
PI12VTQDSREHLUGMU1B2Q95QJA5Q92DKFO7DATD3DU6CI91SQ90H6DQLOHB8O3UL
07SJ4H65IDULCEU523I587U5CP25U8HJ9LQDER4BCHMHDJJA3DEE7S7C352TM9EO
5DRCFSD07GF2KNA530RI962646RHNG3ST43QSR2108CRC4VDF4B0DGBSR49GN09I
395RC1C8BCFSLPOD6P6M8JSBDPT54EU6BATF7TMTCTE5M7IV4RNO46JS7CVA5UM7
D9RMA83GH5EGK7AJ98JT5A1GA4NOK5PE8G4KF4H27AJKJSJQ5AHI1EJR8EIHOG3G
1AR4DATRR0NGBKQ056A3JOBIBNVJ4NCCUD2VCDAKAPS1QHQ58LMOFFDVO1SI03A4
38NG2Q1EPPJEQ5SN2HK3TB1MDL7ME9G5ALM4HC745A6T8ATKT0BGR8JCDOFSR4HS
E6G2TSAK83S5HQM7C7EFO54EDPF9KNAP8BO7OPALULAACDUJGOODIPCK7HGVO38P
U95HSNVK9FFDSD2C0QR49UPADO3GS25N7A125AJG6IHM7K92L21HQ9VUNP5UVSB8
HSBOHOBMU5C1BHV8FKR3L47GVCL8BODIR831TQL4DUALA5K1PFS1ESAQLVOFTLV3
GHELA5U1CLI961GPUPDFU90JC3G129A2UO2C30U45PNCTVBI3QPR3EJVIUCLOFT4
CIV2TMJKTMF8K9IN47GNCU7FSSJ1SDA5OE0O56LEFRM3T64GCQAHSFPNGU9EM1S7
8NC12KA1ATS5ATUDST6C6N6RUHVR95QHA1UPKCIKUP3O2HVC93P1H4JB3E9U5E61
1A9E7UH66TL76HI5Q0PIRCNKSTU5ALQ9H37JLH9KK5EDAA087S10NTMTDP6D9MF4
H6D6TS90IBFTOCJLFGLKA1CRQ1GJBK15R7VNHSVVFBGT68O7MI7T0BOF44RLUKML
C14DGJGU9OO4PQ3OF8Q3DFS51FMK83JCH4RP1L65OLH2KHFCFOJ8A06T6TK9QTGB
FAOPIRNM5MPE48M21ENR0M0KHNM2HEUHPRE16VAVA3GV47J6HHNKDIRKRGN2DICH
JLSG6G3H132PUDBP92HA5D9DA9PCJD0QHGBNE9OJ69VSFCNAPU1R4AMB808I4H69
GAFSDA59HO4N3MIHGUF3HKPCTALMILDBMKJ755JB0N63S70NACFTKB229RDFIR07
G3GJ1FGRNEJP6JR64OU5AMHDM9A9698PQ9ACDGUD9V2TSKOGNDC3RT85I70R02J2
JM6120KJOJCHCLVG59L0IP7PKCU543IL0BSJCPQDIPNAPCULC5RE1QJ792HLVRER
4TOJIDC0JUVVMB44U98J9NKJOK31NG78PRU2LHNMFSK6V492GJ9G5TN83A1SJC6L
7TU9GQCTJ9LTADM56PGQVEHUPEFQT9QR9EVCC6TCGTDR0BUVB2V0L6NI7VETN102
THBOPUDOKD6P34G8KRUPGFAPQJH0MKBOL2FN4MSLSP942QHNQ9H9BO4JMJPL08V1
MFP3D2RK9IFU9MD0N5E4IL4NV0KBDSFCLEBN2N2TA96LB9NQ8DL65D8PU4T70CP7
CQSCCL4PI5GPIPUP09UJM2K2IO8RCDIJJ7O7AUQ9RS5IRU671CGO0PIJC03KJ85E
8C7O4TN5GF6T8B0V9KJCU5G44HI4RI5TIP9E2DORVILGGF7QMVUSC6JAVQD2DAV4
DM96TCB0GAEH15PH9QVUT2V8DHUNSFNL8V7P1J98JG3KS9QTORMHH9LBU830NO3O
5CN0BLPHTU37IL52KADKBC31A50N37MB6R77URIUHE1I5GNBGNO32JKB0J6FLA6S
5Q165EPBGV8FOR0QGDDODT75N2KR29A36E2TB03CH46SQOH0FGH27SLNR2D7IKMK
N5SG8VGBER340K197B6C9BHI83N3C2LM9RGCRK9QB5G7CFGOBDRGSLIRE548143C
1QV81QOR6MLAAB9PN6STR8L062FGLGTDJ03Q5T7165SP6DUTODJM2T54P6PQD9BU
NUQBVP5M00J4VDDCS4LQLIAKV07KV4PCFK0JJ7GJQ90RUO55DG7A5U1E77A2SN6N
VE6URPHND4VD67EHNGLKB0RG5G2P47M947474LQ87P7SI4M52RIBGFP8PM08IN3Q
1952: Comments
TUNNY business. Believe it or not, this is essentially how the solution of
the TUNNY problem progressed in 1941–1942. On August 30, 1941, two TUNNY
messages with the indicator HQIBPEXEZMUG were intercepted. Colonel John Tiltman
recognized that these messages were isologous, meaning that they were repeats of
the same underlying plaintexts. Since the two messages had numerous typographi-
cal errors and extraneous spaces, the messages soon got out of sync with each other
which allowed Tiltman to read both messages and recover a stretch of about 4,000
key characters. In January 1942, he made a cryptanalytic breakthrough (you will
need to find it for yourself if you wish to decrypt the message) that allowed him to
ascertain the entire inner workings of the TUNNY machine.
Determining how TUNNY encipherment worked was just a single step towards
producing an ability to read TUNNY messages from an intercepted cipher. An
excellent account of Bletchley Park’s success against the TUNNY machine can be
found in [1]. A technical report on TUNNY, written by the codebreakers themselves
and containing a wealth of details about the machine and its exploitation, was
declassified in 2000 and is available at [3].
QHVVH OGMZS TRDXV VDXRC LRVHK IFSFV ADWMQ BLWUW PTNSD RFGWV
CWJLV TRVYO UHICR HNIFO
For the solution, see the comments in the 1954 entry (we have other things to talk
about in the 1953 entry).
Bibliography
[1] J. Copeland (editor), Colossus: The Secrets of Bletchley Park’s Codebreaking Computers,
Oxford University Press, 2006.
[2] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduc-
tion, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013.
MR3098499
[3] I. J. Good, D. Michie, and G. Timms, General report on TUNNY with emphasis on statistical
methods, www.alanturing.net.
[4] Wikipedia, Lorenz cipher, https://en.wikipedia.org/wiki/Lorenz_cipher.
1953
The Metropolis Algorithm
Introduction
Suppose, for the sake of simplicity, that Middle Earth is divided into three
demographic regions: Gondor, Rohan, and Mordor (sorry to disappoint fans of
Eriador, Forodwaith, Rhovanion, Rhûn, Harad, and so forth). Each year, 5% of
the residents of Gondor move to Rohan and 5% move to Mordor (property values
in Mordor are low and the climate is warm and dry). Of the residents of Rohan,
15% of them move to Gondor and 10% move to Mordor each year. Finally, 10% of
the residents of Mordor move to Gondor and 5% move to Rohan each year. What
percentage of the population will reside in each of the three regions after a long
period of time? What role does the original population distribution play?
Because of the complicated interactions between the three regions, it is difficult
to track the flow of residents throughout the system for more than one or two years
without some sort of visual or algebraic aid. We can make a diagram of our system
to present the data in an intuitive format; see Figure 1. Each node represents one
of the three regions of Middle Earth, and the arrows indicate the flow of residents
between regions. The sum of the outgoing arrows from each node in the diagram
must be 1 since the entire population of a given region needs to be accounted for.
For example, we know that 15% and 10% of Rohan residents move to Gondor and
Mordor, respectively, during a given year. What happens to the remaining 75% of
them? They stay put and remain in Rohan.
If the initial population is split in a 40 : 60 : 0 ratio among Gondor,
Rohan, and Mordor, we define the initial probability vector p0 = [0.4 0.6 0]T .
More generally, we call any vector whose entries are nonnegative and sum to 1 a
probability vector . We wish to find the probability vectors p1 , p2 , . . . that describe
the population distributions in years 1, 2, . . . and so on. Moreover, we would like
to evaluate limn→∞ pn , if it exists, since this probability vector will reveal the
eventual state of our system (can you explain why the limit of probability vectors
is a probability vector?).
We compile our data in the transition matrix
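\[
A = \begin{bmatrix} 0.90 & 0.15 & 0.10 \\ 0.05 & 0.75 & 0.05 \\ 0.05 & 0.10 & 0.85 \end{bmatrix}
  = \begin{bmatrix} \frac{9}{10} & \frac{3}{20} & \frac{1}{10} \\[2pt] \frac{1}{20} & \frac{3}{4} & \frac{1}{20} \\[2pt] \frac{1}{20} & \frac{1}{10} & \frac{17}{20} \end{bmatrix},
\]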
each column of which is a probability vector that describes the outgoing arrows from
the corresponding node in Figure 1. This matrix governs the population flow in
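Middle Earth from one year to the next. The population after one year is described
by $p_1 = A p_0$.
[Figure 1: diagram of the yearly population flow. Self-loops: Gondor 0.90, Rohan 0.75,
Mordor 0.85. Arrows: Rohan to Gondor 0.15, Mordor to Gondor 0.10, Gondor to Rohan 0.05,
Mordor to Rohan 0.05, Gondor to Mordor 0.05, Rohan to Mordor 0.10.]
Similarly, the population after two years is described by
\[
p_2 = A p_1 = A(A p_0) = A^2 p_0 .
\]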
That is, we can apply $A$ to the year-one result, $p_1$, to obtain the answer, or we
can use $A^2$ to jump straight from $p_0$ to the year-two result $A^2 p_0$. In general,
$p_n = A^n p_0$. Observe that the populations of the three regions appear to stabilize
rapidly; see Figure 2(a).
What happens if we change the initial settings? Suppose that p0 = [0.3 0 0.7]T ;
that is, 30% of the initial population resides in Gondor and 70% in Mordor. Figure
2(b) suggests that the populations stabilize at the same levels as before. Moreover,
the convergence again appears to be very rapid. Can we explain this behavior?
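The iteration described above can be tried numerically. The following is only a minimal sketch: the helper names `step` and `evolve` are hypothetical, and the first starting vector [0.4, 0.6, 0] is an assumed illustrative choice whose entries sum to 1; the second is the vector [0.3, 0, 0.7] from the text.

```python
# Column-stochastic transition matrix from the text
# (column j lists where the residents of region j go in one year).
A = [
    [0.90, 0.15, 0.10],  # flow into Gondor
    [0.05, 0.75, 0.05],  # flow into Rohan
    [0.05, 0.10, 0.85],  # flow into Mordor
]


def step(matrix, p):
    """Return matrix * p, the distribution one year later."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in matrix]


def evolve(matrix, p0, years):
    """Return p_n = A^n p_0 by applying the matrix `years` times."""
    p = list(p0)
    for _ in range(years):
        p = step(matrix, p)
    return p


# Assumed starting vectors (order: Gondor, Rohan, Mordor).
p0_a = [0.4, 0.6, 0.0]
p0_b = [0.3, 0.0, 0.7]

for p0 in (p0_a, p0_b):
    for years in (1, 2, 10, 50):
        print(years, [round(x, 4) for x in evolve(A, p0, years)])
```

Both runs should settle near the same vector, which is consistent with the rapid stabilization suggested by Figure 2.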
Since $p_n = A^n p_0$ describes the relative population levels of the three regions
after $n$ years have elapsed, we wish to evaluate $\lim_{n\to\infty} A^n p_0$, assuming this limit
exists. How can we compute it? This is where linear algebra comes in: we attempt
to diagonalize $A$. If we can write $A = SDS^{-1}$, in which $D$ is diagonal, then
\[
A^n = (SDS^{-1})^n = S D^n S^{-1} .
\]
Since $D$ is diagonal, $\lim_{n\to\infty} D^n$ should be easy to compute and we obtain the limit
\[
L = \lim_{n\to\infty} A^n = S \Bigl( \lim_{n\to\infty} D^n \Bigr) S^{-1} .
\]
A little linear algebra of the eigenvalue-eigenvector sort tells us that
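\[
\underbrace{\begin{bmatrix} \frac{9}{10} & \frac{3}{20} & \frac{1}{10} \\[2pt] \frac{1}{20} & \frac{3}{4} & \frac{1}{20} \\[2pt] \frac{1}{20} & \frac{1}{10} & \frac{17}{20} \end{bmatrix}}_{A}
=
\underbrace{\begin{bmatrix} 13 & -1 & 1 \\ 4 & 0 & -2 \\ 7 & 1 & 1 \end{bmatrix}}_{S}
\underbrace{\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{4}{5} & 0 \\ 0 & 0 & \frac{7}{10} \end{bmatrix}}_{D}
\underbrace{\begin{bmatrix} \frac{1}{24} & \frac{1}{24} & \frac{1}{24} \\[2pt] -\frac{3}{8} & \frac{1}{8} & \frac{5}{8} \\[2pt] \frac{1}{12} & -\frac{5}{12} & \frac{1}{12} \end{bmatrix}}_{S^{-1}} .
\]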
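The factorization can be checked with exact rational arithmetic. The sketch below is not part of the original text; the helper `matmul` is a hypothetical name, and the matrices are entered as read from the display above. Because the eigenvalues 4/5 and 7/10 have absolute value less than 1, the powers of $D$ tend to the diagonal matrix with entries 1, 0, 0, which is what the last step uses.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


A = [[F(9, 10), F(3, 20), F(1, 10)],
     [F(1, 20), F(3, 4), F(1, 20)],
     [F(1, 20), F(1, 10), F(17, 20)]]
S = [[F(13), F(-1), F(1)],
     [F(4), F(0), F(-2)],
     [F(7), F(1), F(1)]]
D = [[F(1), F(0), F(0)],
     [F(0), F(4, 5), F(0)],
     [F(0), F(0), F(7, 10)]]
S_inv = [[F(1, 24), F(1, 24), F(1, 24)],
         [F(-3, 8), F(1, 8), F(5, 8)],
         [F(1, 12), F(-5, 12), F(1, 12)]]

# Check the factorization A = S D S^{-1} exactly.
print(matmul(matmul(S, D), S_inv) == A)

# D^n tends to diag(1, 0, 0), so L = S diag(1, 0, 0) S^{-1}.
D_limit = [[F(1), F(0), F(0)], [F(0)] * 3, [F(0)] * 3]
L = matmul(matmul(S, D_limit), S_inv)
for row in L:
    print([str(entry) for entry in row])
```

Every column of `L` is the same probability vector, so `L` sends any starting probability vector to that vector; this would account for the limit not depending on the initial distribution.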
remotely feasible. This year honors the Metropolis algorithm. It and various gener-
alizations have led to the explosive growth of Markov chain Monte Carlo (MCMC)
algorithms, which have revolutionized subjects such as statistical physics, Bayesian
inference, theoretical computer science, and financial mathematics by giving us the
ability to simulate almost anything in real time. For example we can simulate a
baseball game as a Markov chain. These results and applications are a natural
extension of the Monte Carlo methods from the 1946 entry.
A Markov chain is a random sequence of states, each of whose probabilities
depends on the previous state. Nicholas Metropolis (1915–1999) and his
colleagues realized in 1953 that Markov chains could be run on then-new electronic
computers to converge to, and hence sample from, a probability distribution of
interest [3]. The following problem concerns a Markov chain with an infinite number
of states.
1953: Comments
The interesting posts [6, 9] give a nice background of the history of Markov
chains, some surprising examples, and code to explore. Andrey Markov (1856–1922)
introduced the chains now named for him in 1913 while performing an analysis of
the sequence of consonants and vowels in the work of the Russian writer Alexander
Pushkin (1799–1837). In particular, he found that he could create state diagrams
in which the transition probability to the next letter depended only on the previous
two letters.
In the intervening years these ideas have been successfully extended and ap-
plied to numerous other problems. There are many readable accounts of the history
of these algorithms [1, 8]. Motivations for these extensions and improvements range
from studying the behavior of neutrons in fissile material to estimating the prob-
ability that certain solitaire games are winnable. The applications are almost as
varied. One reason for this is that we can use these ideas to estimate integrals and
areas; it is desirable to be able to determine areas since these frequently correspond
to probabilities. For more on this see the 1946 entry.
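As a rough illustration of how a Markov chain can estimate an integral, here is a minimal random-walk Metropolis sketch. It is not the 1953 authors' program: the target (a standard normal density), the uniform proposal, the step size, and the function names are all assumptions chosen for simplicity. The chain is used to estimate the integral of $x^2$ against the target density, whose exact value is 1.

```python
import math
import random


def metropolis(log_density, x0, steps, step_size, rng):
    """Random-walk Metropolis: propose x + U(-step_size, step_size) and
    accept with probability min(1, density(proposal) / density(x))."""
    x = x0
    samples = []
    for _ in range(steps):
        proposal = x + rng.uniform(-step_size, step_size)
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal  # accept the move
        samples.append(x)  # on rejection the current state is repeated
    return samples


def log_std_normal(x):
    """Log of the standard normal density, up to an additive constant."""
    return -0.5 * x * x


rng = random.Random(1953)  # fixed seed so the run is reproducible
samples = metropolis(log_std_normal, 0.0, 50_000, 2.0, rng)
estimate = sum(s * s for s in samples) / len(samples)
print(estimate)  # should be close to 1
```

Only ratios of the target density are needed, so the normalizing constant never has to be computed; that is the feature that makes this kind of sampler convenient.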
Bibliography
[1] D. B. Hitchcock, A history of the Metropolis-Hastings algorithm, Amer. Statist. 57 (2003),
no. 4, 254–257, DOI 10.1198/0003130032413. http://www.jstor.org/stable/pdf/30037292.
pdf. MR2037852
[2] N. Metropolis, The beginning of the Monte Carlo method, Stanislaw Ulam 1909–1984,
Los Alamos Sci. 15, Special Issue (1987), 125–130. http://library.lanl.gov/cgi-bin/
getfile?15-12.pdf. MR935771
[3] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, Equations of state
calculations by fast computing machines, J. Chem. Phys. 21 (1953), 1087–1091.
[4] N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Statist. Assoc. 44 (1949),
335–341. http://www.jstor.org/stable/pdf/2280232.pdf. MR0031341
[5] J. R. Norris, Markov chains, Cambridge Series in Statistical and Probabilistic Mathematics,
vol. 2, reprint of 1997 original, Cambridge University Press, Cambridge, 1998. http://www.
statslab.cam.ac.uk/~james/Markov/. MR1600720
[6] O. Pavlyk, Centennial of Markov Chains, Mathematica Algorithm R&D, http://blog.
wolfram.com/2013/02/04/centennial-of-markov-chains/.
[7] http://probability.ca/jeff/java/rwm.html
[8] C. Robert and G. Casella, A short history of Markov chain Monte Carlo: subjective recollec-
tions from incomplete data, Statist. Sci. 26 (2011), no. 1, 102–115, DOI 10.1214/10-STS351.
http://arxiv.org/pdf/0808.2902.pdf. MR2849912
[9] A. Smith, Surprising examples of Markov chains, http://mathoverflow.net/q/252671.
1954
Kolmogorov–Arnold–Moser Theorem
Introduction
One can regard perturbation theory as a collection of various methods for ob-
taining approximate solutions to difficult problems based upon exact solutions to
closely related, but simpler, problems. Applied mathematicians and scientists use
the tools of perturbation theory to infer information about problems that model
dynamical systems under the influence of gravitational or quantum forces. For ex-
ample, the planet Neptune was discovered in 1846 as a result of calculations made
by the French mathematician Urbain Le Verrier (1811–1877) and mathematician-
astronomer John Couch Adams (1819–1892), based on the perturbations of the
planet Uranus due to the gravitational influence of the then-unknown Neptune.
It was a momentous day in the history of science when mathematicians told as-
tronomers where to point their telescopes to see the first new planet discovered
since 1781, sixty-five years earlier.
Around the turn of the 20th century, Henri Poincaré, expanding on the work
on the “problem of small denominators” by astronomer Charles-Eugène Delaunay
(1816–1872), first postulated that small perturbations can have large effects on a
dynamical system. In popular culture, this is known as “chaos” or the “butterfly ef-
fect.” The “problem of small denominators” refers to issues arising from potentially
small quantities that appear in the denominators of the formal Fourier series con-
structed to solve the problem. These can cause convergence issues in the proposed
perturbative series, a problem solved with the advent of KAM theory [8].
The Kolmogorov–Arnold–Moser theorem concerns the behavior of systems under
small perturbations; see [3, 4, 8]. The first results are due to Andrey
Kolmogorov (1903–1987) in 1954; they were extended in 1962 by Jürgen
Moser (1928–1999) and further developed by Vladimir Arnold (1937–2010) a year
later. Essentially, the Kolmogorov–Arnold–Moser theorem provides criteria under
which a system of partial differential equations has little “chaotic” behavior under
small perturbations.
One of the most important examples arises from physics. In the Hamiltonian
formulation we have position variables q = (q1 , q2 , . . . , qn ), momentum variables
p = (p1 , p2 , . . . , pn ), the Hamiltonian function H(p, q) (which often corresponds to
the total energy of the system), and the time evolution given by
\[
\frac{dp}{dt} = -\partial_q H
\]
and
\[
\frac{dq}{dt} = \partial_p H .
\]
One then studies how the solutions evolve over time. In this setting, KAM theory
states that for sufficiently small perturbations the new behavior should be close to
that of the unperturbed system.
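To make the evolution equations concrete, here is a small sketch for the simplest unperturbed example, the harmonic oscillator with $H(p, q) = (p^2 + q^2)/2$ and one degree of freedom. The choice of Hamiltonian, the step size, and the symplectic Euler scheme are assumptions made for illustration; none of them comes from the text.

```python
def hamiltonian(p, q):
    """Total energy H(p, q) = (p^2 + q^2) / 2 of the harmonic oscillator."""
    return 0.5 * (p * p + q * q)


def symplectic_euler(q, p, h, steps):
    """Integrate dp/dt = -dH/dq = -q and dq/dt = dH/dp = p with step h.
    Returns the final state and the energy recorded after every step."""
    energies = []
    for _ in range(steps):
        p = p - h * q  # momentum update uses the old position
        q = q + h * p  # position update uses the new momentum
        energies.append(hamiltonian(p, q))
    return q, p, energies


q_end, p_end, energies = symplectic_euler(q=1.0, p=0.0, h=0.01, steps=10_000)
drift = max(abs(e - 0.5) for e in energies)
print(q_end, p_end, drift)
```

The energy starts at 0.5 and, for this scheme, only oscillates slightly around that value instead of drifting, so the computed orbit stays close to the exact circular orbit in the (q, p) plane.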
See [1] for a nice perspective written fifty years after Kolmogorov’s work. We
can see some of the issues by looking at an example from that paper: complex
linearization. Consider a map F (z) = λz + f (z) for some nonzero λ ∈ C. We wish
to find a function Φ such that
(Φ ◦ F )(z) = λΦ(z).
If f has a series expansion, then we can formally find a series expansion for Φ. If
$0 < |\lambda| \neq 1$, the series for Φ converges, but issues arise if |λ| = 1. In that case
we may write $\lambda = e^{2\pi i \alpha}$ for some α ∈ R, and the behavior depends on how well
approximable α is by rational numbers. We elaborate on this in the associated
problem and encourage the interested reader to consult [1].
(a) Prove that for any irrational α there exist infinitely many relatively prime
pairs of integers p, q such that
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2} .
\]
This is known as Dirichlet’s approximation theorem. It implies that every
irrational number can be approximated fairly well.
(b) Consider all irrational numbers in [0, 1] of type (1, 2 + ε) for a fixed ε > 0.
What is the measure of such numbers? More generally, what is the measure
of all irrational numbers in [0, 1] that are of type (K, 2 + ε) for a fixed ε > 0
(so K is allowed to vary)?
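The inequality in part (a) is easy to explore numerically. The sketch below searches by brute force for the irrational number √2; the function name, the choice of √2, and the search bound are assumptions for illustration, and floating-point arithmetic limits how large the denominators can usefully be.

```python
import math


def good_approximations(alpha, max_q):
    """Return coprime pairs (p, q) with q <= max_q and |alpha - p/q| < 1/q^2."""
    found = []
    for q in range(1, max_q + 1):
        p = round(alpha * q)  # the best numerator for this denominator
        if math.gcd(p, q) == 1 and abs(alpha - p / q) < 1 / q**2:
            found.append((p, q))
    return found


pairs = good_approximations(math.sqrt(2), 1000)
print(pairs)
```

The list keeps growing as the bound is raised, in line with the claim that infinitely many such pairs exist.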
1954: Comments
In the 1938 entry we saw how the irrationality of α affected the rate of con-
vergence of the sequence {αn } to Benford’s law. The results of this year provide
another example of irrationality in action and serve as an excellent bridge to the
1955 entry on Roth’s theorem.
Hurwitz’s approximation theorem. Are there any irrationals that are hard
to approximate by rationals, or that are particularly easy? The golden ratio
\[
\phi = \frac{1 + \sqrt{5}}{2}
\]
is among the hardest to approximate. Hurwitz’s theorem states that for any irrational
number α there are infinitely many relatively prime pairs (p, q) such that
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{\sqrt{5}\, q^2}
\]
and that the constant $1/\sqrt{5}$ is best possible in the sense that if α = φ, then for
each $C < 1/\sqrt{5}$ there are only finitely many relatively prime pairs (p, q) such that
\[
\left| \phi - \frac{p}{q} \right| < \frac{C}{q^2} .
\]
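One way to see the constant $1/\sqrt{5}$ appear is to approximate φ by ratios of consecutive Fibonacci numbers and watch the scaled error $q^2 |\phi - p/q|$. This is only a numerical sketch; the function name and the number of terms are assumptions, and it illustrates the statement rather than proving it.

```python
import math

phi = (1 + math.sqrt(5)) / 2


def fibonacci_ratios(count):
    """Yield (p, q) = (F_{n+1}, F_n) for n = 1, ..., count."""
    a, b = 1, 1
    for _ in range(count):
        yield b, a
        a, b = b, a + b


scaled_errors = [q * q * abs(phi - p / q) for p, q in fibonacci_ratios(20)]
print(scaled_errors[-1], 1 / math.sqrt(5))
```

The scaled errors alternate above and below $1/\sqrt{5} \approx 0.4472$ and approach it, which is consistent with the theorem's claim that no smaller constant works for φ.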
Some more frequency analysis and inspired guesswork provides the keyword
CODE. The decoded message is:
HERE IS A LONG PIECE OF TEXT THAT WE HAVE ENCRYPTED USING THE
FAMOUS VIGENERE CIPHER. HOPEFULLY IT WILL NOT PROVE TOO DIFFICULT
TO DECIPHER. IF THIS TEXT IS LONG ENOUGH, THERE SHOULD BE ENOUGH
INFORMATION FOR YOU TO BE ABLE TO FIND THE LENGTH OF THE KEY.
WITH THAT YOU CAN DO FREQUENCY ANALYSIS AND HOPEFULLY FIND THE
KEY WORD. AT THAT POINT THE DECRYPTION IS SIMPLE AND STRAIGHTFORWARD.
GOOD LUCK.
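Once the keyword is known, the last step is mechanical. The sketch below assumes the standard Vigenère convention (A = 0, ..., Z = 25, the key advancing only on letters); that convention and the function names are assumptions, since the ciphertext itself is not reproduced here. It enciphers the opening of the decoded message with the keyword CODE and then deciphers it again.

```python
from itertools import cycle


def vigenere(text, key, sign):
    """Shift each uppercase letter by the matching key letter.
    sign = +1 enciphers and sign = -1 deciphers; other characters pass through."""
    shifts = cycle(ord(k) - ord("A") for k in key)
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            offset = (ord(ch) - ord("A") + sign * next(shifts)) % 26
            out.append(chr(offset + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


plaintext = "HERE IS A LONG PIECE OF TEXT"
ciphertext = vigenere(plaintext, "CODE", +1)
print(ciphertext)
print(vigenere(ciphertext, "CODE", -1))
```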
Bibliography
[1] H. W. Broer, KAM theory: the legacy of A. N. Kolmogorov’s 1954 paper. Comment on: “The
general theory of dynamic systems and classical mechanics” (French) [in Proceedings of the
International Congress of Mathematicians, Amsterdam, 1954, Vol. 1, 315–333, Erven P.
Noordhoff N.V., Groningen, 1957; MR0097598], Bull. Amer. Math. Soc. (N.S.) 41 (2004),
no. 4, 507–521, DOI 10.1090/S0273-0979-04-01009-2. http://www.ams.org/journals/bull/
2004-41-04/S0273-0979-04-01009-2/S0273-0979-04-01009-2.pdf. MR2083638
[2] L. Chierchia and J. N. Mather, Kolmogorov–Arnold–Moser theory, Scholarpedia 5 (2010),
no. 9, 2123. http://www.scholarpedia.org/article/Kolmogorov-Arnold-Moser_theory.
[3] CORNELLCAST, Small denominators: adventures through the looking glass, http://www.
cornell.edu/video/john-milnor-small-denominators.
[4] H. S. Dumas, The KAM story: A friendly introduction to the content, history, and signif-
icance of classical Kolmogorov-Arnold-Moser theory, World Scientific Publishing Co. Pte.
Ltd., Hackensack, NJ, 2014. MR3222196
[5] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 5th ed., The
Clarendon Press, Oxford University Press, New York, 1979. MR568909
[6] A. Ya. Khinchin, Continued fractions, The University of Chicago Press, Chicago, Ill.-London,
1964. MR0161833
[7] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[8] C. E. Wayne, An introduction to KAM theory, Dynamical systems and probabilistic methods
in partial differential equations (Berkeley, CA, 1994), Lectures in Appl. Math., vol. 31, Amer.
Math. Soc., Providence, RI, 1996, pp. 3–29. http://math.bu.edu/people/cew/preprints/
introkam.pdf. MR1363023
[9] Wikipedia, Kolmogorov–Arnold–Moser theorem, http://en.wikipedia.org/wiki/
Kolmogorov-Arnold-Moser_theorem.
[10] Wikipedia, Discovery of Neptune, http://en.wikipedia.org/wiki/Discovery_of_Neptune.
[11] Wikipedia, Perturbation theory, http://en.wikipedia.org/wiki/Perturbation_theory.
1955
Roth’s Theorem
Introduction
A real number is either rational or irrational; determining which is the case is
often a difficult matter (see the 1935 entry). Can one irrational number be “more
irrational” than another?
The irrationality measure μ(α) of a real number α is the least upper bound of
the set of real r > 0 such that
\[
0 < \left| \alpha - \frac{p}{q} \right| < \frac{1}{q^r} \tag{1955.1}
\]
has infinitely many solutions with relatively prime integers p, q and q > 0. Dirich-
let’s approximation theorem asserts that μ(α) ≥ 2 for all irrational α; see Table
1. To be more specific, Dirichlet proved that (1955.1) has infinitely many solutions
when r = 2. For such p, q, the error in the approximation α ≈ p/q is much smaller
than one has a right to expect since consecutive rational numbers with denominator
q are at a distance of 1/q from each other. An error bounded above by $1/q^2$
seems like too much to ask for. Such excellent approximations can be produced
with truncated continued fraction expansions (see the 1931 entry).
On the other hand, each rational α has μ(α) = 1. To see this, write α = a/b
in lowest terms and let δ > 0. Suppose that (1955.1) has infinitely many solutions
with r = 1 + δ. Then
\[
0 < \left| \frac{aq - bp}{bq} \right| < \frac{1}{q^{1+\delta}} .
\]
Multiply by bq and observe that aq − bp is a nonzero integer, so
\[
1 \le |aq - bp| < \frac{b}{q^{\delta}} .
\]
Since the right-hand side of the preceding tends to zero as q → ∞, the preceding
inequalities can be satisfied by only finitely many pairs p, q of relatively prime inte-
gers. Thus, there is a profound gap between the irrationality measures of rationals
(exactly 1) and irrationals (≥ 2).
One can show that almost every real number has irrationality measure 2; this
includes the constant e.1 For other numbers, such as π, we have only upper bounds:
μ(π) ≤ 7.6063 [4]. On the other hand, there are numbers whose irrationality
measure is infinite. This includes Liouville’s constant $\sum_{n=1}^{\infty} 10^{-n!}$, the first explicit
transcendental number discovered; see the 1935 entry.
1 The convenient form of the continued fraction expansion for e (see the 1931 entry) makes
it possible to show that μ(e) = 2. The erratic nature of the continued fraction expansion for π,
on the other hand, does not permit a similar evaluation of μ(π).
Table 1.
p/q             p/q (numeric)    |p/q − π|        1/q^2           1/q^2 (numeric)
3               3.00000000000    0.14             1               1.0
22/7            3.14285714286    0.0013           1/49            0.020
333/106         3.14150943396    0.000083         1/11236         0.000089
355/113         3.14159292035    2.67 × 10^−7     1/12769         0.000078
103993/33102    3.14159265301    5.78 × 10^−10    1/1095742404    9.13 × 10^−10
104348/33215    3.14159265392    3.32 × 10^−10    1/1103236225    9.06 × 10^−10
208341/66317    3.14159265347    1.22 × 10^−10    1/4397944489    2.27 × 10^−10
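The fractions in Table 1 are the first continued fraction convergents of π, and they can be regenerated with a few lines of code. This sketch works with the double-precision value `math.pi` converted to an exact fraction, so only the first several convergents are trustworthy; the function name is an assumption.

```python
import math
from fractions import Fraction


def convergents(x, count):
    """Yield the first `count` continued fraction convergents (p, q) of x."""
    x = Fraction(x)
    p_prev, q_prev = 1, 0
    p, q = math.floor(x), 1
    yield p, q
    for _ in range(count - 1):
        x = 1 / (x - math.floor(x))  # pass to the next complete quotient
        a = math.floor(x)            # next partial quotient
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        yield p, q


for p, q in convergents(math.pi, 7):
    print(f"{p}/{q}", f"{abs(p / q - math.pi):.3g}", f"{1 / q**2:.3g}")
```

Each printed error is smaller than the corresponding $1/q^2$, matching the last three columns of the table.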
Roth’s theorem, for which Klaus Friedrich Roth (1925–2015) was awarded a
Fields Medal in 1958, states that μ(α) = 2 for every irrational algebraic real number
α. That is, for every ε > 0, the inequality
\[
\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^{2+\varepsilon}}
\]
has only finitely many solutions with relatively prime integers p, q and q > 0. Thus,
an irrational algebraic real number cannot have many “extremely good” rational
approximations.
The origins of this work go back to Joseph Liouville (1809–1882), who proved in
1844 that μ(α) ≤ d for an algebraic number of degree d ≥ 2; see the 1935 entry for a
proof of this result. In 1909, Axel Thue (1863–1922) improved this to d/2 + 1 + ε for
every ε > 0. This bound was reduced to $2\sqrt{d}$ by Carl Ludwig Siegel (1896–1981) in
1921 and to $\sqrt{2d}$ by mathematician-physicist Freeman Dyson in 1947 (see the 1928
entry). Siegel had conjectured that μ(α) = 2 for all algebraic irrational numbers;
this was finally proved by Roth in 1955.
Due to the recent explosion of work in additive combinatorics [5], the phrase
“Roth’s theorem” now often refers to Roth’s theorem on arithmetic progressions
(1953), which asserts that if A ⊆ Z has positive upper density, meaning that
\[
\limsup_{N \to \infty} \frac{|A \cap [-N, N]|}{2N} > 0,
\]
then A contains infinitely many arithmetic progressions of length three; see the 1913
entry. This is the first nontrivial case of Szemerédi’s theorem; see the 1975 entry.
To avoid confusion, Roth’s theorem on Diophantine approximation is sometimes
referred to as the Thue–Siegel–Roth theorem.
1955: Comments
Hint for the problem. Consider the binary expansion
\[
x = \sum_{n=1}^{\infty} \frac{b_n(x)}{2^n}, \qquad b_n(x) \in \{0, 1\},
\]
The Flint Hills series. Here is another difficult question related to Diophan-
tine approximation. Does the Flint Hills series
\[
\sum_{n=1}^{\infty} \frac{1}{n^3 \sin^2 n} \tag{1955.2}
\]
converge? In case you were wondering, the nomenclature refers to Flint Hills,
Kansas [6, Chapter 25]. One suspects that the $n^3$ in the denominator should force
the series to converge. However, sin n gets close to zero every now and then. For
example, the exceptionally good rational approximation π ≈ 355/113 means that
sin 355 = −0.000030144 . . . is dangerously close to sin 113π = 0. This results in a
big jump: the 354th partial sum of (1955.2) is approximately 4.8 and the 355th
partial sum is approximately 29.4. Figures 1 and 2 illustrate this sort of behavior.
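The jump at n = 355 can be reproduced directly. The following sketch simply adds up the terms of (1955.2) in double precision; the function name is an assumption, and floating-point evaluation of sin n is only reliable for moderately sized n such as those used here.

```python
import math


def flint_hills_partial_sums(count):
    """Return the first `count` partial sums of sum 1 / (n^3 sin^2 n)."""
    total = 0.0
    sums = []
    for n in range(1, count + 1):
        total += 1.0 / (n**3 * math.sin(n) ** 2)
        sums.append(total)
    return sums


sums = flint_hills_partial_sums(400)
# sums[k - 1] is the k-th partial sum.
print(round(sums[353], 1), round(sums[354], 1))
```

The single term with n = 355 contributes far more than all the earlier terms combined, because 355 is so close to 113π.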
Figure 2. First 400 partial sums of the Flint Hills series (1955.2).
The huge jump between the 354th and 355th partial sums is evi-
dent.
As of 2018, it is unknown whether the Flint Hills series converges. Its relevance
to this entry stems from the fact that its convergence would imply that μ(π) ≤ 2.5
[1]. That would be a huge improvement over the best known result, μ(π) ≤ 7.6063.
\[
B_{a,n} = \{ a + kn : k \in \mathbf{Z} \} .
\]
Thus, each $B_{a,n}$ is open and each open set in Z is a union of some collection of
infinite arithmetic progressions. Since
\[
B_{a,n}^{c} = \bigcup_{j=1}^{n-1} B_{a+j \ (\mathrm{mod}\ n),\, n}
\]
is a finite union of open sets, it follows that each $B_{a,n}^{c}$ is open. Consequently, each
$B_{a,n}$ is closed (in addition to being open).
Suppose toward a contradiction that there are only finitely many primes. Since
the finite union of closed sets is closed (see footnote 2), we conclude that
\[
A = \bigcup_{p \text{ prime}} B_{0,p}
\]
is closed. Since every integer except −1 and 1 is a multiple of some prime, it follows
that {−1, 1} = Z\A is a nonempty open set that contains no infinite arithmetic
progression. This contradiction shows that there are infinitely many prime numbers.
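The key set-theoretic fact, that the only integers missed by every progression $B_{0,p}$ are −1 and 1, can be checked on a finite window. This is merely a sanity check and not a substitute for the argument; the window size and helper names are arbitrary assumptions.

```python
def primes_up_to(n):
    """Sieve of Eratosthenes: return the primes p <= n."""
    is_prime = [False, False] + [True] * (n - 1)
    for p in range(2, int(n**0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]


bound = 60
window = range(-bound, bound + 1)
primes = primes_up_to(bound)

# A restricted to the window: integers divisible by at least one prime.
covered = {k for k in window if any(k % p == 0 for p in primes)}
uncovered = sorted(set(window) - covered)
print(uncovered)
```

Note that 0 is covered, since it is a multiple of every prime, so the complement within the window consists of −1 and 1 only.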
Much of the preceding material on Furstenberg’s proof was originally contained
in the 1948 entry, whose problem was written by James M. Andrews. The original
problem asked: Is the Furstenberg topology Hausdorff? Is it regular? Is it normal?
Here is a quick explanation of the terminology for those who have not taken a
course in point-set topology. A topological space X is
(a) Hausdorff if whenever x, y ∈ X are distinct points, there are disjoint open sets
U, V ⊂ X with x ∈ U and y ∈ V ,
(b) regular if whenever A ⊂ X is closed and x ∈ X\A, there are disjoint open sets
U, V ⊂ X with x ∈ U and A ⊂ V , and
(c) normal if whenever A, B ⊂ X are disjoint closed sets, there are disjoint open
sets U, V ⊂ X with A ⊂ U and B ⊂ V .
Bibliography
[1] M. A. Alekseyev, On convergence of the Flint Hills series, https://arxiv.org/pdf/1104.
5100.pdf
[2] H. Furstenberg, On the infinitude of primes, Amer. Math. Monthly 62 (1955), 353, DOI
10.2307/2307043. MR0068566
[3] I. D. Mercer, On Furstenberg’s proof of the infinitude of primes, Amer. Math. Monthly 116
(2009), no. 4, 355–356, DOI 10.4169/193009709X470218. MR2503321
[4] V. Kh. Salikhov, On the irrationality measure of π (Russian), Uspekhi Mat. Nauk 63 (2008),
no. 3(381), 163–164, DOI 10.1070/RM2008v063n03ABEH004543; English transl., Russian
Math. Surveys 63 (2008), no. 3, 570–572. MR2483171
[5] T. Tao and V. H. Vu, Additive combinatorics, paperback edition [of MR2289012], Cambridge
Studies in Advanced Mathematics, vol. 105, Cambridge University Press, Cambridge, 2010.
MR2573797
[6] C. A. Pickover, The mathematics of Oz: Mental gymnastics from beyond the edge, Cambridge
University Press, Cambridge, 2002. MR1936664
2 This follows from de Morgan’s law $\bigl( \bigcup_{i=1}^{n} S_i \bigr)^{c} = \bigcap_{i=1}^{n} S_i^{c}$ and axiom (c).
1956
Introduction
In calculus one encounters a vast array of “transcendental” functions such as
$e^x$, sin x, and log x. In multivariable calculus (with functions) and differential
geometry (with smooth maps), the abundance of “transcendental” functions and
maps becomes even more pronounced. In 1956, it was shown by Jean-Pierre Serre
(1926– ), who had been awarded the Fields Medal in 1954, that in the setting of
complex variables, under a compactness hypothesis many “transcendental-looking”
geometric and function-theoretic constructions are algebraic from an appropriate
point of view and, moreover, that such an “algebraization” of the analytic construc-
tion is unique.
This result explained many earlier known special cases and was of fundamental
importance in the development of algebraic and complex analytic geometry. Not
only did it justify the role of transcendental methods in the solution of algebraic
problems admitting a sufficiently geometric flavor, it also inspired the profound
work of Alexander Grothendieck (1928–2014) and many others during the revolu-
tion that swept through algebraic geometry in the 1960s.
Serre’s method of proof was sufficiently robust that it was later generalized to
apply to geometric constructions over the p-adic numbers instead of C, and this
generalization is a ubiquitous tool in contemporary algebraic number theory. His
1956 paper is titled “Géométrie algébrique et géométrie analytique”, or GAGA for
short, and the phrase “GAGA principle” expresses the idea that in the presence
of compactness, certain analytic constructions in geometry over C not only admit
an algebraic description (which is already quite striking) but in fact an essentially
unique one.
such zeros or poles. By studying the zero or pole order of f (1/z) at z = 0, get
to a case where Liouville’s theorem can be applied.
1956: Comments
Fermat’s last theorem. Since J. P. Serre has played an important role in
many entries in this collection, it is worth mentioning another here which we explore
in greater detail in the 1995 entry: Fermat’s last theorem.
Fermat’s last theorem states that if n ≥ 3, then there are no solutions to
$a^n + b^n = c^n$ in natural numbers a, b, c. Although Pierre de Fermat claimed to
have a remarkably simple proof almost four hundred years ago, the only known
proof uses an enormous amount of machinery from 20th-century mathematics. The
following summary is painfully short, and the interested reader is encouraged to
peruse the references from the 1995 entry.
In the 1960s Yves Hellegouarch (1936– ) considered what could be done if a
solution (a, b, c) existed for some n. He associated the elliptic curve
\[
y^2 = x (x - a^n)(x + b^n)
\]
to this solution and saw that it would have some special properties. Later,
in the 1980s, Gerhard Frey (1944– ) explored these curves again and proposed
that such curves would not be “modular.” However, it was believed that all el-
liptic curves are modular (one interpretation is that there is a weight-2 cuspidal
newform associated to the curve). Serre noticed a mistake in Frey’s proof of the
nonmodularity of his curves; the statement needed to repair it (the epsilon conjecture) was proved by Ken
Ribet (1948– ) in 1986. The proof of Fermat’s last theorem follows from showing all
semistable elliptic curves over Q are modular, something accomplished by Andrew
Wiles with assistance from Richard Taylor (1962– ) in the 1990s.
A word about fonts. A convention going back to the influential books of
Nicholas Bourbaki (a pseudonym under which a group of mathematicians, mostly
French, published a series of influential textbooks known for their high level of
abstraction) in the 1930s is that canonical mathematical structures should be denoted
with a boldface font, including various number systems such as Z, Q, R,
and C. Since such boldface is hard to replicate in handwriting, Kunihiko Kodaira
(1915–1997) proposed the variants ℤ, ℚ, ℝ, and ℂ when writing by hand. Paradoxically,
with the advent of modern mathematical typesetting, these latter fonts
became more widespread in typesetting than the boldface fonts they were invented
to replicate. Both Conrad and Serre (whose work is featured here) feel strongly
that only boldface should be used in the typography for these number systems, so
we have followed that convention here.
Bibliography
[1] J.-P. Serre, Géométrie algébrique et géométrie analytique (French), Ann. Inst. Fourier, Greno-
ble 6 (1955), 1–42. MR0082175
[2] Wikipedia, Algebraic and analytic geometry, http://en.wikipedia.org/wiki/
Algebraic_geometry_and_analytic_geometry.
1957
The Ross Program
Introduction
The Ross Mathematics Program is an intensive residential summer program for
talented high school students. Arnold Ross (1906–2002) founded the program at the
University of Notre Dame in 1957. He later moved it to the Ohio State University
in 1964. Although Dr. Ross stepped down in 2000, the Ross Program continues
to run, involving about seventy-five first-year students every summer. The central
goal of the program is to train students to think like mathematicians and to write
convincing, logical proofs of their mathematical observations. Ross chose number
theory as the vehicle for this learning process. Starting from the axioms for the ring
of integers, Ross participants analyze topics such as modular arithmetic, Euclid’s
algorithm, quadratic reciprocity, and the existence of primitive roots. They also
consider analogues of those ideas in other contexts such as the Gaussian integers and
the ring of polynomials over Z/pZ. Further information about the Ross Program is
posted at http://www.math.osu.edu/ross. The problems below are taken from
some of the Ross problem sets.
\[
\gcd(2^m - 1, 2^n - 1) = 2^{\gcd(m,n)} - 1 .
\]
We give this property a name: a sequence $\{A_n\}_{n \ge 1}$ of positive integers has the gcd
property if $\gcd(A_m, A_n) = A_{\gcd(m,n)}$ for every pair of indices m, n.
Problem 1. Show that the following sequences have the gcd property:
(a) the constant sequence Cn = r, in which r ∈ N is fixed,
(b) the linear sequence Ln = rn, in which r ∈ N is fixed,
(c) for fixed c, k ∈ N, the sequence
\[
E(k, c)_n = \begin{cases} c & \text{if $n$ is a multiple of $k$,} \\ 1 & \text{otherwise,} \end{cases}
\]
(d) for fixed a, b ∈ N with a > b, the sequence $R_n = a^n - b^n$,
(e) the Fibonacci numbers, defined by $F_1 = F_2 = 1$ and $F_{n+2} = F_{n+1} + F_n$.
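Before attempting proofs, one can test the gcd property on an initial range of indices. The sketch below is an experiment only: passing it proves nothing, and the function names, parameter choices, and index limit are assumptions. For sequence (d) the check uses relatively prime a and b; the example a = 4, b = 2 at the end shows that the identity can fail when a and b share a factor.

```python
from math import gcd


def has_gcd_property(seq, limit):
    """Check gcd(A_m, A_n) == A_gcd(m, n) for all 1 <= m, n <= limit."""
    return all(
        gcd(seq(m), seq(n)) == seq(gcd(m, n))
        for m in range(1, limit + 1)
        for n in range(1, limit + 1)
    )


def fibonacci(n):
    """Return F_n with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


candidates = {
    "2^n - 1": lambda n: 2**n - 1,
    "constant r = 6": lambda n: 6,
    "linear r = 6": lambda n: 6 * n,
    "E(4, 10)": lambda n: 10 if n % 4 == 0 else 1,
    "5^n - 3^n": lambda n: 5**n - 3**n,
    "Fibonacci": fibonacci,
}
results = {name: has_gcd_property(seq, 25) for name, seq in candidates.items()}
print(results)

# With a = 4 and b = 2 (not relatively prime) the property already fails
# for m = 2, n = 3: gcd(12, 56) = 4, whereas R_1 = 2.
print(has_gcd_property(lambda n: 4**n - 2**n, 6))
```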
For example,
\[
B_2 = b_1 b_2, \qquad B_4 = b_1 b_2 b_4, \qquad \text{and} \qquad B_6 = b_1 b_2 b_3 b_6 .
\]
If $\gcd(b_m, b_n) = 1$ whenever $m \neq n$, show that $\{B_n\}$ has the gcd property.
(a) Which {bn } produce sequences {Bn } with the gcd property?
(b) Does every {Bn } with the gcd property arise from some (unique) integer se-
quence {bn }?
1957: Comments
Cyclotomic polynomials. The second problem is related to the factorization
\[
x^n - 1 = \prod_{d \mid n} \Phi_d(x),
\]
in which Φd (x) denotes the dth cyclotomic polynomial. To be more specific, Φd (x)
is the monic (leading coefficient 1) polynomial whose roots are the primitive dth
roots of unity. The primitive dth roots of unity are exp(2πij/d), in which $i^2 = -1$,
j ∈ {1, 2, . . . , d}, and gcd(j, d) = 1; see Figure 1.
[Figure 1: the complex plane with the labels −1, 0, 1, i, −i and the points ±1/2 ± (√3/2)i; caption not recovered.]
Although they are defined in terms of their roots, which are certain complex
roots of unity, cyclotomic polynomials have only integer coefficients. The first few
cyclotomic polynomials are
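\begin{align*}
\Phi_1(x) &= x - 1, \\
\Phi_2(x) &= x + 1, \\
\Phi_3(x) &= x^2 + x + 1, \\
\Phi_4(x) &= x^2 + 1, \\
\Phi_5(x) &= x^4 + x^3 + x^2 + x + 1, \\
\Phi_6(x) &= x^2 - x + 1, \\
\Phi_7(x) &= x^6 + x^5 + x^4 + x^3 + x^2 + x + 1, \\
\Phi_8(x) &= x^4 + 1, \\
\Phi_9(x) &= x^6 + x^3 + 1, \\
\Phi_{10}(x) &= x^4 - x^3 + x^2 - x + 1.
\end{align*}
\begin{align*}
\Phi_{105}(x) = {} & x^{48} + x^{47} + x^{46} - x^{43} - x^{42} - 2x^{41} - x^{40} - x^{39} + x^{36} + x^{35} + x^{34} \\
& + x^{33} + x^{32} + x^{31} - x^{28} - x^{26} - x^{24} - x^{22} - x^{20} + x^{17} + x^{16} + x^{15} \\
& + x^{14} + x^{13} + x^{12} - x^9 - x^8 - 2x^7 - x^6 - x^5 + x^2 + x + 1.
\end{align*}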
When one realizes that 105 = 3 · 5 · 7 is the smallest number that is the product of
three distinct odd primes, it becomes slightly more reasonable to expect that the
first counterexample might take so long to materialize.
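The factorization of x^n − 1 gives a simple recursive way to compute cyclotomic polynomials: divide x^n − 1 by Φ_d(x) for every proper divisor d of n. The Python sketch below does this with exact integer arithmetic; the coefficient-list representation (constant term first) and the function names are assumptions made for illustration.

```python
from functools import lru_cache

def poly_divide(num, den):
    """Exact division of integer polynomials, coefficients listed from the
    constant term up. Assumes den is monic and divides num."""
    num = num[:]
    quotient = [0] * (len(num) - len(den) + 1)
    for shift in range(len(quotient) - 1, -1, -1):
        coeff = num[shift + len(den) - 1]      # current leading coefficient
        quotient[shift] = coeff
        for i, d in enumerate(den):
            num[shift + i] -= coeff * d
    return quotient

@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n, from x^n - 1 = product of Phi_d over d | n."""
    poly = [-1] + [0] * (n - 1) + [1]          # x^n - 1
    for d in range(1, n):
        if n % d == 0:
            poly = poly_divide(poly, list(cyclotomic(d)))
    return tuple(poly)

print(cyclotomic(6))          # (1, -1, 1), that is, x^2 - x + 1
print(min(cyclotomic(105)))   # -2
```

Scanning `cyclotomic(n)` for n < 105 in the same way shows only the coefficients −1, 0, and 1.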
Invisible forests. Imagine that there is a slender tree planted at each lattice
point (x, y) ∈ Z^2 and pretend that you are at the origin (0, 0). How many lattice
points can you “see” from the origin? Which ones are blocked by trees? Are there
arbitrarily large portions of the forest that are not visible from the origin? See [2]
for interesting generalizations of this problem.
If gcd(x, y) = g ≠ 1, then x = gx′ and y = gy′ for some (x′, y′) ∈ Z^2, so that
the tree planted at (x′, y′) “blocks” our view of (x, y) = g(x′, y′). In general, a
lattice point (x, y) is visible from the origin if and only if gcd(x, y) = 1; see Figures
2 and 3. Based upon this, one can show that the proportion of lattice points visible
from the origin is 6/π^2 ≈ 60.8%; see the notes to the 1939 entry.
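The visibility criterion is easy to test empirically. The following Python sketch counts the lattice points with coprime coordinates in the square 1 ≤ x, y ≤ N and compares the fraction with 6/π^2; the function name and the choice N = 300 are illustrative assumptions.

```python
from math import gcd, pi

def visible_fraction(N):
    """Fraction of lattice points (x, y) with 1 <= x, y <= N that are
    visible from the origin, i.e. that satisfy gcd(x, y) = 1."""
    visible = sum(
        1
        for x in range(1, N + 1)
        for y in range(1, N + 1)
        if gcd(x, y) == 1
    )
    return visible / N**2

print(visible_fraction(300), 6 / pi**2)
```

By symmetry, restricting to one quadrant does not change the limiting proportion.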
The following result is from [1, Thm. 5.29]: the set of lattice points visible from
the origin contains arbitrarily large square gaps. That is, given any positive integer
n, there exists a lattice point (a, b) such that none of the lattice points (a + j, b + k)
with 0 < j, k ≤ n is visible from the origin.
Figure 2. Visible lattice points in the region [−10, 10] × [−10, 10].
The proof is an elegant use of prime numbers and the Chinese remainder theorem.
Given n > 0, form the n × n matrix
\[
\begin{bmatrix}
2 & 3 & \cdots & p_n \\
p_{n+1} & p_{n+2} & \cdots & p_{2n} \\
\vdots & \vdots & & \vdots \\
p_{(n-1)n+1} & p_{n(n-1)+2} & \cdots & p_{n^2}
\end{bmatrix}
\]
whose first row consists of the first n primes, whose second row consists of the next
n primes, and so on. Let r_j be the product of the primes in the jth row and let
c_j denote the product of the primes in the jth column. Since none of the primes
p_1, p_2, . . . , p_{n^2} can lie in two rows or two columns simultaneously, it follows that
\[ \gcd(r_j, r_k) = \gcd(c_j, c_k) = 1 \]
whenever j ≠ k. The Chinese remainder theorem asserts that the system of congruences
Figure 3. Visible lattice points in the region [−50, 50] × [−50, 50].
\[ r_1 r_2 \cdots r_n = c_1 c_2 \cdots c_n = 2 \cdot 3 \cdot 5 \cdots p_{n^2}. \]
Consider the square with corners at (a, b) and (a + n, b + n). Any lattice point
inside of this square can be written in the form (a + j, b + k), in which 0 < j, k < n
(the points with either j = n or k = n lie on the boundary of the square). By the
definition of a and b, r_j | (a + j) and c_k | (b + k). Thus, the
prime number at the intersection of row j and column k divides both a + j and b + k.
Consequently, gcd(a + j, b + k) ≠ 1 and hence (a + j, b + k) is not visible from the
origin if 0 < j, k ≤ n. This proves that there exists a square of n^2 lattice points
that are not visible from the origin.
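The construction can be carried out explicitly for small n. The Python sketch below assumes that a and b are chosen with a ≡ −j (mod r_j) and b ≡ −k (mod c_k) for 1 ≤ j, k ≤ n, which is what the divisibility statements r_j | (a + j) and c_k | (b + k) require. The function names and the simple sieving solver `crt` are illustrative, not taken from [1].

```python
from math import gcd, prod

def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def crt(residues, moduli):
    """Smallest x >= 0 with x = residues[i] (mod moduli[i]); the moduli are
    assumed pairwise coprime. Simple sieving, adequate for small inputs."""
    x, step = 0, 1
    for r, m in zip(residues, moduli):
        while x % m != r % m:
            x += step
        step *= m
    return x

def hidden_square(n):
    """Return (a, b) with gcd(a + j, b + k) > 1 for all 1 <= j, k <= n."""
    p = first_primes(n * n)
    rows = [prod(p[i * n:(i + 1) * n]) for i in range(n)]   # r_1, ..., r_n
    cols = [prod(p[i::n]) for i in range(n)]                # c_1, ..., c_n
    a = crt([-j for j in range(1, n + 1)], rows)            # a = -j (mod r_j)
    b = crt([-k for k in range(1, n + 1)], cols)            # b = -k (mod c_k)
    return a, b

print(hidden_square(2))   # (173, 19)
```

For n = 2 the four hidden points are (174, 20), (174, 21), (175, 20), and (175, 21), whose coordinates share the factors 2, 3, 5, and 7, respectively.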
Bibliography
[1] T. M. Apostol, Introduction to analytic number theory, Undergraduate Texts in Mathematics,
Springer-Verlag, New York-Heidelberg, 1976. MR0434929
[2] E. H. Goins, P. E. Harris, B. Kubik, and A. Mbirika, Lattice point visibility on
generalized lines of sight, Amer. Math. Monthly 125 (2018), no. 7, 593–601, DOI
10.1080/00029890.2018.1465760. MR3836421
[3] D. Goss (editor), Arnold Ross Memorial Issue, Journal of Number Theory 110 (2005), no. 1.
In particular, see Arnold Ephraim Ross (1906–2002), pp. 1–2.
http://www.sciencedirect.com/science/journal/0022314X/110/1.
[4] M. Dziemiańczuk and Wieslaw Bajguz, On GCD-morphic sequences, 2008,
http://arxiv.org/abs/0802.1303.
1958
Smale’s Paradox
Introduction
There are many remarkable results in topology that are counterintuitive. One
of the most famous is the subject of our 1924 entry, the Banach–Tarski paradox.
It asserts that the three-dimensional unit ball can be partitioned into finitely many
disjoint subsets that can be rearranged using rigid motions to form two identical
unit balls. This appears to violate our notion of volume.
Smale’s paradox is another strange result about supposedly familiar objects.
Imagine a sphere composed of a material that can pass through itself. With-
out puncturing or creasing the material, is it possible to turn the sphere inside
out? Stephen Smale (1930– ) shocked the mathematical world in 1958 when he
proved that sphere eversion is possible [2]. However, his proof was difficult to
distill into an explicit regular homotopy. It was through the work of many oth-
ers, including Arnold Shapiro (1921–1962) and Bernard Morin (1931– ), that the
first concrete geometric representation of a sphere eversion emerged. In particu-
lar, William Thurston (1946–2012) discovered a clever explicit construction, now
known as Thurston’s corrugations. In this method, the sphere is first corrugated;
the top and the bottom of the sphere are then pulled through each other, and the
geometry of the corrugations permits the “turning” without creasing. An excellent
introduction to the topic, including an animation of the eversion, is available
online at [3].
Smale’s paradox belongs in the “a video is worth a thousand words” category,
so we make no attempt to provide the details here. However, we can introduce
some other interesting topological ideas that are not so exotic.
The Möbius strip is perhaps most students’ first brush with topology. It is a
peculiar surface that is obtained by gluing two opposite ends of a flexible, rect-
angular strip with a half twist; see Figure 1. The Möbius strip is an example of
a two-dimensional manifold with boundary. This means that a tiny observer who
lives on the surface of the Möbius strip, but not on the boundary curve, could be
forgiven for thinking that she lives in R2 . The observer would not be able to deduce,
based upon purely local observations, that the universe was curved in some way.
Nor would she be able to deduce that the Möbius strip is nonorientable: there are
no “front” and “back” sides to the Möbius strip. It has only one side. A torus can
be described in a similar manner; see Figure 2. Unlike the Möbius strip, a torus is
orientable: it has an inside and an outside (as Homer Simpson could tell you).
What is a practical application of the Möbius strip? Large conveyor belts are of-
ten fashioned into Möbius strips to ensure that the entire surface wears evenly. The
Although he used the term as early as 1836, Listing’s 1847 book Vorstudien zur
Topologie firmly cemented the word in German mathematics. English speakers
continued to use the term analysis situs until the late 1920s when “topology” came
into popular use [1].
1958: Comments
Möbius trip. As Figure 1 suggests, we can describe a Möbius strip as a
parametrized surface. The parametrization
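\begin{align*}
x(u, v) &= \left(1 + \frac{v}{2} \cos\frac{u}{2}\right) \cos u, \\
y(u, v) &= \left(1 + \frac{v}{2} \cos\frac{u}{2}\right) \sin u, \\
z(u, v) &= \frac{v}{2} \sin\frac{u}{2},
\end{align*}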
maps the rectangle [0, 2π] × [−1, 1] in uv-space onto a Möbius strip in xyz-space
with width 1 and whose central circle has radius 1.
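The half twist is visible directly in the formulas: going once around the central circle sends the cross-section coordinate v to −v. The short Python sketch below checks this numerically; the function name `mobius` and the sample value of v are illustrative.

```python
from math import cos, sin, pi, isclose

def mobius(u, v):
    """Point on the Möbius strip for 0 <= u <= 2*pi and -1 <= v <= 1."""
    r = 1 + (v / 2) * cos(u / 2)
    return (r * cos(u), r * sin(u), (v / 2) * sin(u / 2))

# Going once around (u from 0 to 2*pi) flips the sign of v: the half twist.
v = 0.8
start, end = mobius(0, -v), mobius(2 * pi, v)
print(all(isclose(s, e, abs_tol=1e-12) for s, e in zip(start, end)))
```

The two edges u = 0 and u = 2π of the rectangle are therefore glued with opposite orientations.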
The boundary of a Möbius strip is, topologically speaking, a circle. In the
formation of the Möbius strip from our square of flexible material (Figure 1), the
upper and lower edges are joined into one continuous curve. Indeed, trace the edge
of a Möbius strip and you will find that it is a single curve; see Figure 3.
Imagine that we live in a high-dimensional space, or that we had a flexible
material that could pass through itself. What would happen if we took a disk and
glued its edge to the boundary of a Möbius strip? We cannot accomplish this in
R^3 without self-intersections, but we could accomplish this in R^5, which gives us
enough wiggle room. See the notes for the 2003 entry for more information.
The Klein bottle. Another popular topological item is the Klein bottle; see
Figure 4. This peculiar bottle was first described in 1882 by Felix Klein (1849–
1925). Like the Möbius strip, it is nonorientable. You should definitely not use
it to store liquids since it has no inside or outside! You can make a Klein bottle
by gluing together two Möbius strips along their boundaries. Unfortunately, the
resulting object cannot be realized in R^3 without self-intersections, although R^4
maps the square [0, 2π] × [0, 2π] in uv-space onto a Klein bottle in xyzw-space
without self-intersection.
100TH ANNIVERSARY PROBLEMS 245
Bibliography
[1] J. J. O’Connor and E. F. Robertson, Johann Benedict Listing, MacTutor History of Mathe-
matics, http://www-history.mcs.st-and.ac.uk/Biographies/Listing.html.
[2] S. Smale, A classification of immersions of the two-sphere, Trans. Amer. Math. Soc. 90
(1958), 281–290, DOI 10.2307/1993205.
http://www.maths.ed.ac.uk/~aar/papers/smale5.pdf. MR0104227
[3] YouTube, Outside In, http://www.youtube.com/watch?v=wO61D9x6lNY.
1959
QR Decomposition
Introduction
The QR decomposition is a phenomenally useful matrix factorization that was
independently discovered by John G. F. Francis (1934– ) in 1959 [3, 4] and Vera
Kublanovskaya (1920–2012) in 1961 [9]; see [7] for a detailed history. Suppose that
A ∈ M_{m×n}(R) and m ≥ n; that is, suppose that A has at most as many columns
as rows. Then we may factor A = QR, in which Q ∈ M_{m×n}(R) has orthonormal
columns and R ∈ M_n(R) is upper triangular and has nonnegative diagonal entries.
The QR algorithm is an iterative algorithm, based upon repeated QR decompo-
sitions, that is used to quickly and accurately compute eigenvalues. The standard
approach, taught in most introductory linear algebra courses, is to compute the
characteristic polynomial
\[ p_A(z) = \det(zI - A) \]
of A ∈ M_n and then find its roots. Due to its reliance on determinants, this method
is terribly inefficient for large matrices. Moreover, there are no simple formulas to
exactly compute the roots of a polynomial of degree five or more. According to
mathematician-writer-journalist Barry Cipra (1952– ) [1]:
Eigenvalues are arguably the most important numbers associated with
matrices—and they can be the trickiest to compute. It’s relatively
easy to transform a square matrix into a matrix that’s “almost” up-
per triangular, meaning one with a single extra set of nonzero entries
just below the main diagonal.¹ But chipping away those final nonzeros,
without launching an avalanche of error, is nontrivial. The QR
algorithm is just the ticket. Based on the QR decomposition, which
writes A as the product of an orthogonal matrix Q and an upper triangular
matrix R, this approach iteratively changes A_i = QR into
A_{i+1} = RQ, with a few bells and whistles for accelerating convergence
to upper triangular form. By the mid-1960s, the QR algorithm had
turned once-formidable eigenvalue problems into routine calculations.
The QR algorithm has rightly been hailed as one of the ten most important algo-
rithms of the 20th century [1, 2]; see also the 1965 entry.
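The iteration described in the quotation can be sketched in a few lines of Python. This is a bare-bones, unshifted version without the "bells and whistles"; for brevity the factorization uses modified Gram–Schmidt rather than Householder reflections, and all function names and the sample matrix are illustrative assumptions.

```python
def qr_decompose(A):
    """QR factorization of a square matrix with independent columns via
    modified Gram-Schmidt (chosen for brevity, not for robustness)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]   # columns of A
    Q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(Q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        R[j][j] = sum(vk * vk for vk in v) ** 0.5
        Q_cols.append([vk / R[j][j] for vk in v])
    Q = [[Q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def qr_algorithm(A, iterations=200):
    """Unshifted QR iteration: factor A_i = QR, then set A_{i+1} = RQ."""
    for _ in range(iterations):
        Q, R = qr_decompose(A)
        A = matmul(R, Q)
    return A

A = [[2.0, 1.0], [1.0, 3.0]]
T = qr_algorithm(A)
print([T[0][0], T[1][1]])   # diagonal approximates the eigenvalues of A
```

For this symmetric example the eigenvalues are (5 ± √5)/2, and the off-diagonal entries of the iterates shrink toward zero. Each step is a similarity transformation, since RQ = Qᵀ(QR)Q, so the eigenvalues are preserved throughout.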
How does one compute a QR decomposition? We outline here the method of
Householder reflections, named after Alston Scott Householder (1904–1993). See
[5] for more details and corresponding results about complex matrices; see [6] for
1 Such a matrix is called an upper Hessenberg matrix. It is possible to bring a square matrix
into upper Hessenberg form through the use of Householder transformations; see [5].
[Figure: the vectors w, x, and U_w x; caption not recovered.]
2 One can always find a Householder matrix that takes a given vector to another given vector
with the same norm. To improve numerical stability, it is useful to consider a slight generalization.
Suppose that x, y ∈ R^n and ‖x‖ = ‖y‖ ≠ 0. Let σ = 1 if x · y ≤ 0 and σ = −1 if x · y > 0. Then
σU_{y−σx} ∈ M_n(R) is a real orthogonal matrix that maps x to y; see [5].
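The claim in this footnote can be checked numerically. The Python sketch below assumes the standard definition U_w = I − 2wwᵀ/(wᵀw) of a Householder matrix, which is not restated in the text above; the function names and the sample vectors are illustrative.

```python
def householder_apply(w, x):
    """Apply U_w = I - 2 w w^T / (w^T w) to x (standard definition, assumed)."""
    scale = 2 * sum(wi * xi for wi, xi in zip(w, x)) / sum(wi * wi for wi in w)
    return [xi - scale * wi for wi, xi in zip(w, x)]

def map_x_to_y(x, y):
    """Return sigma * U_{y - sigma x} x, which should equal y when x and y
    have the same nonzero norm."""
    sigma = 1 if sum(a * b for a, b in zip(x, y)) <= 0 else -1
    w = [yi - sigma * xi for xi, yi in zip(x, y)]
    return [sigma * ui for ui in householder_apply(w, x)]

# Both vectors have norm 5.
print(map_x_to_y([3.0, 4.0], [5.0, 0.0]))
```

The sign σ is chosen so that w = y − σx is never a difference of nearly equal vectors, which is the numerical-stability point the footnote alludes to.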
1959: Comments
Gram–Schmidt in the real world. It is customary in elementary linear
algebra courses to teach students how to orthogonalize a list of vectors with the
Gram–Schmidt process. While there is some merit to this (for instance, the Gram–
Schmidt process can be used to provide an easy proof that every finite-dimensional
inner product space has an orthonormal basis), students should be warned that the
Gram–Schmidt process is numerically unstable and hence unreliable in practice.
The QR decomposition, because of its reliance on orthogonal matrices, is stable
and yields much better results.
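A small experiment illustrates the warning. The Python sketch below applies the classical Gram–Schmidt process to three nearly parallel vectors (a standard test case, chosen here as an assumption for illustration) and measures how far the output is from being orthogonal.

```python
def classical_gram_schmidt(vectors):
    """Orthonormalize `vectors` with the classical Gram-Schmidt process."""
    basis = []
    for a in vectors:
        v = a[:]
        for q in basis:
            coeff = sum(qi * ai for qi, ai in zip(q, a))   # uses the original a
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        basis.append([vi / norm for vi in v])
    return basis

eps = 1e-8   # small enough that 1 + eps**2 rounds to 1 in double precision
columns = [
    [1.0, eps, 0.0, 0.0],
    [1.0, 0.0, eps, 0.0],
    [1.0, 0.0, 0.0, eps],
]
q1, q2, q3 = classical_gram_schmidt(columns)
print(sum(x * y for x, y in zip(q2, q3)))   # would be 0 in exact arithmetic
```

In floating-point arithmetic the inner product of the last two output vectors comes out near 1/2 rather than 0, so the computed "orthonormal" list is badly non-orthogonal.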
If A = [a_1 a_2 . . . a_n] ∈ M_{m×n}(R) has linearly independent columns (this implies
that m ≥ n), then the columns of the matrix Q = [q_1 q_2 . . . q_n] ∈ M_{m×n}(R)
from the QR decomposition are orthonormal and they have the property that
\[ \operatorname{span}\{a_1, a_2, \ldots, a_r\} = \operatorname{span}\{q_1, q_2, \ldots, q_r\} \]
for r = 1, 2, . . . , n.
Hadamard matrices. Jacques Hadamard first proved his eponymous inequal-
ity in 1893 [8]; it is related to a fascinating open problem in combinatorics. If each
entry of A ∈ M_n(R) is −1 or 1, then Hadamard’s inequality (1959.3) tells us that
\[ |\det A| \le n^{n/2}. \]
A matrix for which equality holds is a Hadamard matrix of order n. It is possible to
show that the order of a Hadamard matrix must be 1, 2, or a multiple of 4. Some
Hadamard matrices of small order are
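\[
[1], \quad
\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 \\
1 & -1 & 1 & -1
\end{bmatrix},
\]
and
\[
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}.
\]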
As these examples suggest, a Hadamard matrix must be a multiple of an orthogonal
matrix; that is, it must have orthogonal rows and orthogonal columns. The famed
Hadamard conjecture asserts that a Hadamard matrix of order 4k exists for every
positive integer k; the smallest permissible order for which no Hadamard matrix is
presently known is 668.
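Checking whether a given ±1 matrix is a Hadamard matrix only requires testing that its rows are pairwise orthogonal. The Python sketch below does this and also builds examples of order 2^k with Sylvester's doubling construction, a classical technique that is not discussed in the text above and is included only for illustration.

```python
def is_hadamard(H):
    """Check that H is square with entries +1 or -1 and pairwise orthogonal
    rows, which is equivalent to H H^T = n I."""
    n = len(H)
    entries_ok = all(len(row) == n and all(x in (1, -1) for x in row) for row in H)
    rows_ok = all(
        sum(a * b for a, b in zip(H[i], H[j])) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )
    return entries_ok and rows_ok

def sylvester(k):
    """Hadamard matrix of order 2**k via the doubling [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H4 = [[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, -1, 1], [1, -1, 1, -1]]
print(is_hadamard(H4), is_hadamard(sylvester(3)))
```

Since H Hᵀ = nI forces (det H)^2 = n^n, any matrix that passes this check attains equality in Hadamard's inequality.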
A determinantal inequality. We conclude with a beautiful determinantal
inequality for positive semidefinite matrices. Recall that A ∈ M_n(R) is positive
semidefinite if A is symmetric and its eigenvalues are nonnegative (the eigenvalues
of a real symmetric matrix are always real). This is equivalent to
\[ A = B^{\mathsf{T}} B \tag{1959.4} \]
for some B = [b_1 b_2 . . . b_n] ∈ M_{m×n}(R). This decomposition highlights one of
the main applications of positive semidefinite matrices. Since
\[ a_{ij} = b_i^{\mathsf{T}} b_j = b_i \cdot b_j \]
for 1 ≤ i, j ≤ n in (1959.4), the entries of A measure the correlations between
the vectors b_1, b_2, . . . , b_n ∈ R^m. In this context, positive semidefinite matrices
frequently arise in statistics. As a consequence of Hadamard’s inequality, if we take
B to be square (which is always possible),
\begin{align*}
|\det A| = |\det(B^{\mathsf{T}} B)| = |\det(B^{\mathsf{T}}) \det B| &= |\det B|^2 \\
&\le \|b_1\|^2 \|b_2\|^2 \cdots \|b_n\|^2 \\
&= a_{11} a_{22} \cdots a_{nn}
\end{align*}
since a_ii = b_i · b_i = ‖b_i‖^2 ≥ 0 for i = 1, 2, . . . , n. Thus, the absolute value of the
determinant of a positive semidefinite matrix is bounded above by the product of
its diagonal entries. If A is merely symmetric, but not positive semidefinite, then
the preceding inequality fails. Consider
\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \]
for which | det A| = 1 and a11 = a22 = 0. For more information about positive
semidefinite matrices and their properties, see [5].
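A small worked example may help. The Python sketch below forms A = BᵀB for a sample 3 × 3 integer matrix B (an arbitrary choice made for illustration) and compares det A with the product of the diagonal entries of A; the helper names are hypothetical.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )

def gram(B):
    """Return A = B^T B for a matrix B given as a list of rows."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

B = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
A = gram(B)
diagonal_product = A[0][0] * A[1][1] * A[2][2]
print(det(A), diagonal_product)   # 169 250

# The symmetric but indefinite counterexample from the text.
print(det([[0, 1], [1, 0]]))      # -1, while the diagonal product is 0
```

Here det B = 13, so det A = 169, comfortably below the diagonal product 5 · 5 · 10 = 250.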
Bibliography
[1] B. A. Cipra, The best of the 20th century: editors name top 10 algorithms, SIAM News
33 (2000).
[2] J. Dongarra and F. Sullivan, The top 10 algorithms, Comput. Sci. Eng. 2 (2000), 22–23.
[3] J. G. F. Francis, The QR transformation: a unitary analogue to the LR transformation. I,
Comput. J. 4 (1961/1962), 265–271, DOI 10.1093/comjnl/4.3.265. MR0130111
[4] J. G. F. Francis, The QR transformation. II, Comput. J. 4 (1961/1962), 332–345, DOI
10.1093/comjnl/4.4.332. MR0137289
[5] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical
Textbooks, Cambridge University Press, 2017.
[6] G. H. Golub and C. F. Van Loan, Matrix computations, 4th ed., Johns Hopkins Studies in the
Mathematical Sciences, Johns Hopkins University Press, Baltimore, MD, 2013. MR3024913
[7] G. Golub and F. Uhlig, The QR algorithm: 50 years later its genesis by John Francis and
Vera Kublanovskaya and subsequent developments, IMA J. Numer. Anal. 29 (2009), no. 3,
467–485, DOI 10.1093/imanum/drp012. MR2520155
[8] J. Hadamard, Résolution d’une question relative aux déterminants, Bulletin des Sciences
Mathématiques 17 (1893), 240–246.
[9] V.N. Kublanovskaya, On some algorithms for the solution of the complete eigenvalue problem,
USSR Computational Mathematics and Mathematical Physics 1 (1963), no. 3, 637–657
1960
The Unreasonable Effectiveness of Mathematics
Introduction
This year honors a groundbreaking, influential article by Eugene Wigner [12],
the Nobel laureate in physics whose work in random matrix theory eventually led
to astonishing connections between the seemingly diverse fields of number theory
and nuclear physics; see the 1928 entry. In his article, Wigner discusses the use of
mathematics in physics:¹
A possible explanation of the physicist’s use of mathematics to formu-
late his laws of nature is that he is a somewhat irresponsible person.
As a result, when he finds a connection between two quantities which
resembles a connection well-known from mathematics, he will jump at
the conclusion that the connection is that discussed in mathematics
simply because he does not know of any other similar connection. It is
not the intention of the present discussion to refute the charge that the
physicist is a somewhat irresponsible person. Perhaps he is. However,
it is important to point out that the mathematical formulation of the
physicist’s often crude experience leads in an uncanny number of cases
to an amazingly accurate description of a large class of phenomena.
This shows that the mathematical language has more to commend it
than being the only language which we can speak; it shows that it is,
in a very real sense, the correct language.
Mathematics is so ubiquitous in physics that the American Journal of Physics
asked, “Does any piece of mathematics exist for which there is no application
whatsoever in physics?” To this, physicist Dwight E. Neuenschwander (1952– )
responded:
While constructing such a “useless” piece of mathematics would be the
delight of a mathematical purist, it seems we physicists have always
managed to foil this lofty goal. It seems that even the most esoteric
mathematical inventions of the human mind are eventually used to
model physical systems. Why that should be true is of course a deep
and fascinating question. [9]
The catchphrase “unreasonable effectiveness” has spawned innumerable imitators
and it is difficult to catalogue them all. Some of the most influential were discussed
by economist K. Vela Velupillai (1947– ) [11]:
Eugene Wigner’s Richard Courant Lecture in the Mathematical Sci-
ences, delivered at New York University on 11 May 1959, was titled,
1 The repeated use of “his” and “he” to refer to a generic physicist is regrettable.
252 1960. THE UNREASONABLE EFFECTIVENESS OF MATHEMATICS
with more insidious ones. The real world of nature has the uncanny
habit of surprising us; it has always proven to be a lot stranger than
we give it credit for. Mathematics is a product of the imagination that
sometimes works on simplified models of reality. Platonism is a viral
form of philosophical reductionism that breaks apart holistic concepts
into imaginary dualisms. . . . Mathematics is a human invention for de-
scribing patterns and regularities. It follows that mathematics is then
a useful tool in describing regularities we see in the universe. The re-
ality of the regularities and invariances, which we exploit, may be a
little rubbery, but as long as they are sufficiently rigid on the scales of
interest to humans, then it bestows a sense of order.
Certainly many mathematicians would disagree with Abbott’s account!
1960: Comments
Catalan numbers. There is a wealth of interesting facts known about the
Catalan numbers. First of all, they are named after the French-Belgian mathemati-
cian Eugène Charles Catalan (1814–1894), who does not appear to be Catalonian.
Nevertheless, the term “Catalonian” has been used by a few authors to refer to
subjects related to the Catalan numbers [4, p. 254] (at least the authors think it a
good idea and are not above flagrant self-reference). The Catalan numbers appear
in many different places in mathematics; over fifty such occurrences are discussed
in [10].
It turns out that C_n is the number of ways to write n left parentheses and n
right parentheses so that, as we move from left to right, we never see more right
parentheses than left parentheses. We see that C_1 = 1 since the only possible
arrangement is (). Similarly, C_2 = 2 since there are only two permissible
configurations: ()() and (()). For n = 3, we have exactly five options:
((())), (()()), (())(), ()(()), and ()()().
Thus, C_3 = 5. See the comments for the 2008 entry for the asymptotic rate of
growth of the Catalan numbers.
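The parenthesis interpretation is easy to enumerate by machine. The Python sketch below lists the permissible arrangements and compares the counts with the closed form C_n = (1/(n + 1)) · binom(2n, n), a standard formula that is not stated in the passage above; the function name `balanced` is illustrative.

```python
from math import comb

def balanced(n):
    """All strings of n '(' and n ')' in which no prefix contains more
    right parentheses than left parentheses."""
    def extend(prefix, opened, closed):
        if opened == n and closed == n:
            yield prefix
        if opened < n:
            yield from extend(prefix + "(", opened + 1, closed)
        if closed < opened:
            yield from extend(prefix + ")", opened, closed + 1)
    return list(extend("", 0, 0))

print(balanced(3))
print([len(balanced(n)) for n in range(1, 8)])            # [1, 2, 5, 14, 42, 132, 429]
print([comb(2 * n, n) // (n + 1) for n in range(1, 8)])   # [1, 2, 5, 14, 42, 132, 429]
```

Reading "(" as a step to the right and ")" as a step up turns each string into a staircase walk, which connects this count to the lattice-path interpretation.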
Another interesting interpretation of C_n is that it is the number of “staircase
walks” from (0, 0) to (n, n) that never rise above the main diagonal; that is, k ≤ j
whenever (j, k) is on our path. Such a path is called a Dyck path, in honor of
Walther von Dyck (1856–1934); see Figure 2.
Bibliography
[1] D. Abbott, The reasonable ineffectiveness of mathematics, Proceedings of the IEEE, Vol.
101, no. 10, October 2013.
[2] S. A. Burr (ed.), The unreasonable effectiveness of number theory, papers from the American
Mathematical Society Short Course held in Orono, Maine, August 6–7, 1991, Proceedings of
Symposia in Applied Mathematics, vol. 46, American Mathematical Society, Providence, RI,
1992. MR1195838
[3] S. M. Focardi and F. J. Fabozzi, The reasonable effectiveness of mathematics in economics,
American Economist 1 (2010), no. 55, 19–30.
[4] S. R. Garcia and S. J. Miller, 100 Years of Math Milestones: The Pi Mu Epsilon Centennial
Collection, American Mathematical Society, 2019.
[5] R. W. Hamming, The unreasonable effectiveness of mathematics, Amer. Math. Monthly 87
(1980), no. 2, 81–90, DOI 10.2307/2321982. MR559142
[6] A. Harvey, The Reasonable Effectiveness of Mathematics in the Physical Sciences, Relativity
and Gravitation, 43 (2011), 3057–3064.
[7] J. Kilpatrick, The reasonable ineffectiveness of research in mathematics education, For the
Learning of Mathematics 2 (1981), no. 2, 22–29.
[8] A. M. Lesk, The unreasonable effectiveness of mathematics in molecular biology, Math. In-
telligencer 22 (2000), no. 2, 28–37, DOI 10.1007/BF03025372. MR1764266
[9] D. E. Neuenschwander, Does any piece of mathematics exist for which there is no application
whatsoever in physics?, Amer. J. Phys. 63 (1996), 63.
[10] R. P. Stanley, Enumerative combinatorics. Vol. 2, with a foreword by Gian-Carlo Rota and
appendix 1 by Sergey Fomin, Cambridge Studies in Advanced Mathematics, vol. 62, Cam-
bridge University Press, Cambridge, 1999. MR1676282
[11] K. V. Velupillai, The unreasonable ineffectiveness of mathematics in economics, Cambridge
Journal of Economics 29 (2005), 849–872.
[12] E. P. Wigner, The unreasonable effectiveness of mathematics in the natural sciences [Comm.
Pure Appl. Math. 13 (1960), 1–14; Zbl 102, 7], Mathematical analysis of physical systems,
Van Nostrand Reinhold, New York, 1985, pp. 1–14.
https://www.dartmouth.edu/~matc/MathDrama/reading/Wigner.html. MR824292
[13] E. O. Wilson, Great Scientist ≠ Good at Math: E. O. Wilson shares a secret: Discoveries
emerge from ideas, not number-crunching, Wall Street Journal (online).
http://www.wsj.com/articles/SB10001424127887323611604578398943650327184.
1961
Lorenz’s Nonperiodic Flow
Introduction
There is a certain “continuity principle” that underlies much familiar mathe-
matics: if one jiggles parameters a little bit, then the final answer should only change
by a small amount. For example, the roots of a quadratic polynomial ax^2 + bx + c,
in which a ≠ 0, are given by
\[ \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
As long as we avoid a = 0, the (possibly complex) roots vary continuously with the
parameters (a, b, c). Similarly, the area and perimeter of a polygon vary continu-
ously with the placement of its vertices.
Beginning with the work of Henri Poincaré on the orbits of planets, this gen-
eral principle began to be questioned. A milestone in our understanding of chaotic
behavior is the work of Edward Lorenz (1917–2008). His seminal paper Determin-
istic Nonperiodic Flow, published in 1963 (but based on work started in 1961),
introduced the notion of “sensitive dependence on initial conditions”: minute
changes to initial conditions can drastically affect long-term behavior.
In an attempt to study the weather, Lorenz considered the deterministic system
\[ \frac{dx}{dt} = \sigma(y - x), \qquad \frac{dy}{dt} = x(\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z, \]
in which x is proportional to the rate of convection, y to the horizontal temperature
variation, z to the vertical temperature variation, and σ, ρ, and β are parameters.
He wanted to rerun some calculations from an intermediate point. When he fed
the output from a previous run into the computer, the system behaved in a totally
different manner than it had before.
How could this occur in a deterministic system? Lorenz’s printer only displayed
three digits of the output, while the computer code worked internally with six. The
resulting loss of precision changed the initial conditions slightly and permitted the
two computations to make radically different long-term predictions. Many people
are familiar with the butterfly effect, a phrase which insinuates that the flap of a
butterfly’s wings may eventually cause (or prevent) the onset of a tornado hundreds
of miles away. Long-term weather forecasting may be impossible since we can never
know all the parameter values with perfect accuracy.
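Lorenz's accident is easy to re-create. The Python sketch below integrates the system with a classical fourth-order Runge–Kutta scheme, restarts a second copy from an intermediate state rounded to three decimal places, and records how far apart the two runs drift. The parameter values σ = 10, ρ = 28, β = 8/3, the starting point, the step size, and the rounding rule are all assumptions made for illustration; the text above does not specify them.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the Lorenz system."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = lorenz(state)
    k2 = lorenz(shifted(k1, dt / 2))
    k3 = lorenz(shifted(k2, dt / 2))
    k4 = lorenz(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

def max_separation(dt=0.01, warmup=1000, steps=3000):
    """Largest distance between a run and its copy restarted from rounded data."""
    full = (1.0, 1.0, 1.0)
    for _ in range(warmup):
        full = rk4_step(full, dt)
    rounded = tuple(round(c, 3) for c in full)   # keep three decimals only
    largest = 0.0
    for _ in range(steps):
        full, rounded = rk4_step(full, dt), rk4_step(rounded, dt)
        gap = sum((p - q) ** 2 for p, q in zip(full, rounded)) ** 0.5
        largest = max(largest, gap)
    return largest

print(max_separation())
```

Although the two runs start less than 0.001 apart, their separation grows to the size of the attractor itself.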
The following problem shows that a tiny difference in the initial trajectory of a
billiard ball can have qualitative effects on the long-time orbits of the initial points.
Furthermore, if one imagines this rectangle is slightly compressed to have concave
258 1961. LORENZ’S NONPERIODIC FLOW
sides, then a tiny difference in the initial slope has exponentially large effects on
the long-time orbits of the initial points, a property of a dynamical system known
as sensitive dependence on initial conditions.
1961: Comments
Newton’s method. Here is a particularly nice example of a chaotic system
that sets the stage for our 1964 and 1978 entries.
Newton’s method is a powerful algorithm that constructs a sequence of real
numbers that rapidly converges to a zero of a given polynomial. For example,
\[ f(x) = x^2 - 3 \]
has the zeros x = ±√3. Arithmetic tells us that 1 < √3 < 2 and we suspect that
√3 lies closer to 2 than 1. Let x_0 = 2 be our initial guess for the numerical value
of √3; it is not a particularly good guess, but this matters not because Newton’s
method is incredibly effective. Construct the tangent line to the graph of f(x)
at the point (x_0, f(x_0)); that is,
\[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 2 - \frac{f(2)}{f'(2)} = \frac{7}{4} = 1.75. \]
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]
In particular, x_0 = 0 is a poor initial choice since x_n is the zero sequence and hence
does not converge to a zero of f . Things become much more interesting if we use
polynomials of higher degree and permit the use of complex numbers.
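The speed of convergence is best appreciated by running the iteration. The Python sketch below applies Newton's method to f(x) = x^2 − 3 with x_0 = 2; the function name `newton` and the number of steps are illustrative choices.

```python
def newton(f, df, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

for x in newton(lambda x: x * x - 3, lambda x: 2 * x, 2.0, 5):
    print(x)
```

The second value printed is 1.75, and within a handful of steps the iterates agree with √3 to the limits of double precision; the number of correct digits roughly doubles at each step.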
Consider the complex polynomial
\[ f(z) = z^6 - 1, \]
whose roots are the vertices of a regular hexagon inscribed in the unit circle
|z| = 1; see the figure on p. 236. For almost every complex number z, the sequence
obtained from Newton’s method with initial seed z converges to one of the six
roots. But which one? Assign a color to each of the six roots of f. Now paint each
initial seed z according to which root the seed iterates to under Newton’s method.
The resulting image (see Figure 3) is a Newton fractal. Other polynomials yield
similarly enchanting images; see Figure 4. For a wealth of information about chaos
and fractals, see [3].
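The coloring rule can be sketched as a function that reports which root a seed reaches. The Python code below does this for f(z) = z^6 − 1; the iteration cap, the tolerance, and the coarse grid of seeds are arbitrary illustrative choices, and a real picture would use a much finer grid.

```python
import cmath

# The sixth roots of unity, indexed by k = 0, 1, ..., 5.
ROOTS = [cmath.exp(2j * cmath.pi * k / 6) for k in range(6)]

def basin(z, steps=100, tol=1e-9):
    """Index k of the root exp(2*pi*i*k/6) that Newton's method reaches from
    the seed z, or None if it has not settled within `steps` iterations."""
    for _ in range(steps):
        if z == 0:
            return None                      # the derivative vanishes at 0
        z = z - (z**6 - 1) / (6 * z**5)
        for k, root in enumerate(ROOTS):
            if abs(z - root) < tol:
                return k
    return None

# One row of output per horizontal line of seeds.
for im in (1.0, 0.5, 0.0, -0.5, -1.0):
    print([basin(complex(re, im)) for re in (-1.0, -0.5, 0.5, 1.0)])
```

Mapping each index to a color and each seed to a pixel produces an image of the kind described above.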
Bibliography
[1] E. N. Lorenz, Deterministic Nonperiodic Flow, Journal of Atmospheric Sciences 20 (1963),
130–141. http://eaps4.mit.edu/research/Lorenz/Deterministic_63.pdf
[2] E. N. Lorenz, How much better can weather prediction become?, Technology Re-
view Jul/Aug 1969, 39-49. http://eaps4.mit.edu/research/Lorenz/How Much Better Can
Weather Prediction 1969.pdf
[3] H.-O. Peitgen, H. Jürgens, and D. Saupe, Chaos and fractals: New frontiers of science, 2nd
ed., with a foreword by Mitchell J. Feigenbaum, Springer-Verlag, New York, 2004. MR2031217
1962
The Gale–Shapley Algorithm & the Stable Marriage Problem
Introduction
In a seminal 1962 paper [4], David Gale (1921–2008) and Lloyd Shapley (1923–
2016) initiated the formal study of stable matchings. In 2012 the Nobel Prize in
Economics was given to Shapley and Alvin Roth (1951– ) “for the theory of stable
allocations and the practice of market design” (Gale had the misfortune of passing
away in 2008, thus rendering him ineligible for a Nobel Prize).
One of the most important applications of these ideas is to the National Resident
Matching Program (NRMP), which matches hospitals and medical students for
residencies. In 1998, the NRMP changed the matching algorithm in response to
concerns of fairness. Finding stable matchings that meet various fairness criteria
remains challenging and depends upon a careful study of intricate relationships in
posets imposed on multiple stable matchings.
Suppose that we have two groups of the same size: proposers and acceptors.
Each proposer must be matched with an acceptor; see Figure 1. We collect the
preferences for proposers and acceptors in two preference matrices, one for each of
264 1962. THE GALE–SHAPLEY ALGORITHM & THE STABLE MARRIAGE PROBLEM
the groups. A matching is stable if no two parties prefer each other to their assigned
partners. In a stable matching, no two parties have a reason to switch partners.
The Gale–Shapley algorithm is an efficient proposal algorithm that, given two
preference matrices, finds stable matchings. These are often called stable marriages
because one of the original applications provided by Gale and Shapley was the
matching of n men to n women (although this makes one question who the consumer
of such an algorithm would be). The worst case complexity of the algorithm is
O(n^2), which means that the number of steps needed is at worst proportional to
the square of the size of each group. Moreover, the Gale–Shapley algorithm always
returns at least one stable matching, and at most two of them no matter what set
of preferences are given. Unlike our description of the powerful simplex method
(see the 1947 entry), the Gale–Shapley algorithm is simple enough for us to explain
in detail.
Suppose we have a group of n men and a group of n women who want to be
matched.¹ Let the men, in turn, propose to the women, each of whom either rejects
the proposals she receives or breaks off a previous engagement if a better proposal
comes along. Here is the Gale–Shapley algorithm.
(a) In the first round, each man proposes to the woman he prefers the most.
Each woman considers all the proposals she receives. She provisionally accepts
the proposal coming from the man she ranks highest among those who have
proposed to her and rejects all the other proposals.
(b) Each unengaged man now proposes to the woman he prefers among the women
he has not previously asked to marry him (once a woman rejects a man he
never asks her again), regardless of whether or not she has provisionally ac-
cepted a proposal. Each woman considers all the proposals she receives and
provisionally accepts the proposal coming from the man she ranks highest
among those who have proposed to her. She rejects all the other proposals.
(c) We keep repeating step (b), with the unengaged men asking and the women
provisionally accepting, until all men are provisionally engaged. At this point
all the provisional engagements become permanent and we have obtained a
matching between the men and women.
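Steps (a) through (c) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: both groups have the same size, every preference list is complete, and the names `gale_shapley`, `men`, and `women` are ours rather than anything from Gale and Shapley. For brevity, proposals are processed one at a time instead of in simultaneous rounds; the final matching does not depend on the order in which the unengaged proposers are handled.

```python
from collections import deque

def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return a dict mapping each acceptor to the proposer they end up with.

    proposer_prefs[p] lists acceptors from most to least preferred;
    acceptor_prefs[a] lists proposers from most to least preferred.
    """
    # rank[a][p] = position of proposer p in a's list (lower is better).
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next acceptor to ask
    engaged_to = {}                               # acceptor -> provisional proposer
    free = deque(proposer_prefs)                  # proposers with no engagement

    while free:
        p = free.popleft()
        a = proposer_prefs[p][next_choice[p]]     # best acceptor p has not yet asked
        next_choice[p] += 1
        current = engaged_to.get(a)
        if current is None:
            engaged_to[a] = p                     # a had no offer: provisionally accept
        elif rank[a][p] < rank[a][current]:
            engaged_to[a] = p                     # a trades up and releases current
            free.append(current)
        else:
            free.append(p)                        # a rejects p, who asks someone else later
    return engaged_to

# Hypothetical preferences for three men (A, B, C) and three women (X, Y, Z).
men = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
women = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}

men_propose = gale_shapley(men, women)    # keys are women, values are men
women_propose = gale_shapley(women, men)  # keys are men, values are women
```

In this small instance the two runs pair A with X and B with Y when the men propose, but A with Y and B with X when the women propose, so the two proposing orders really can give different stable matchings.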
The proof that this algorithm always results in a stable matching is construc-
tive. Once a woman provisionally accepts a proposal, she can only stay the same or
trade-up; she is never unmatched. If a man has been unsuccessful, then he proposes
to someone new. Since there are the same number of men and women, there must
be at least one available woman who has not received any offers and thus must
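accept his. Each man remains paired with a woman he prefers unless that woman
receives a better offer, and every woman is given the option of choosing among the
men that prefer her. The matching is stable: if a man prefers some woman to his final
partner, then he proposed to her earlier and she rejected him in favor of a man she
ranks higher; since she only trades up, she prefers her final partner to him.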
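The stability condition (no two parties prefer each other to their assigned partners) can also be checked mechanically. The sketch below is an illustration, not part of the original argument; the function name `blocking_pairs` and the sample preferences are hypothetical. It lists every pair that would rather be together than with their assigned partners, so a matching is stable exactly when the list is empty.

```python
def blocking_pairs(matching, proposer_prefs, acceptor_prefs):
    """List all (proposer, acceptor) pairs who prefer each other to their partners.

    matching maps each proposer to an acceptor; it is stable iff the list is empty.
    """
    partner_of = {a: p for p, a in matching.items()}  # acceptor -> proposer
    pairs = []
    for p, prefs in proposer_prefs.items():
        # Acceptors that p strictly prefers to p's assigned partner.
        better = prefs[:prefs.index(matching[p])]
        for a in better:
            a_prefs = acceptor_prefs[a]
            # a also prefers p to a's assigned partner: (p, a) blocks the matching.
            if a_prefs.index(p) < a_prefs.index(partner_of[a]):
                pairs.append((p, a))
    return pairs

# Hypothetical preferences for three men (A, B, C) and three women (X, Y, Z).
men = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
women = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}

stable = {"A": "X", "B": "Y", "C": "Z"}
unstable = {"A": "Z", "B": "Y", "C": "X"}
stable_blockers = blocking_pairs(stable, men, women)
unstable_blockers = blocking_pairs(unstable, men, women)
```

In the second matching, A prefers both X and Y to Z, and each of them prefers A to her assigned partner, so both pairs have a reason to defect.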
The Gale–Shapley algorithm finds at most two stable matchings because the
matching resulting from having one group do the proposing may differ from the
matching obtained when the other group does the proposing. If two distinct stable
matchings can be returned by the algorithm, each is optimal for the group doing the
proposing. For example, if the men propose, each man will fare at least as well as he
would in the matching obtained by having the women propose.
1 We tried unsuccessfully to rephrase the algorithm in a gender-neutral manner. It became
increasingly difficult to understand because the words “its” and “their” could refer to either party.
Current attention is focused on what happens when there are many more than
two stable matchings possible. In these cases the additional matchings must be
found with other algorithms. Christine Cheng and her colleagues recently proved
that a nice relationship holds for local and global aspects of the set of matchings,
given that all the stable matchings for a particular problem instance have been
found. This work involves the following two concepts.
Global Median Matching (GMM): Impose a partial ordering on the set of stable
matchings according to the rule that one matching is better than another if every
proposer (or symmetrically every acceptor) receives at least as good a partner in
the former matching as in the latter matching. The resulting poset terminates at
one end in the proposer-optimal matching and at the other end in the acceptor-
optimal matching. A GMM is a matching that lies a median number of
steps from these extreme matchings.
Local Median Matching (LMM): Consider for each proposer (and similarly for
each acceptor) the ordered set of all the rankings of the partners it is assigned in all
the stable matchings. An LMM is a matching that assigns all the people a partner
that lies at the median of their ordered sets.
The surprising result is that not only do GMMs and LMMs exist, but there is
always at least one GMM and LMM that are identical. Therefore, if one accepts
these local and global measures of fairness as valid, both measures can be satisfied
by a single stable matching.
1962: Comments
Kidney transplants. One of the strengths of the Gale–Shapley algorithm is
its flexibility. If we can formulate a real-world problem in terms of assignments,
then the algorithm may be applicable. For example, the Gale–Shapley algorithm
can be used in diverse situations such as college admissions, scheduling tasks on
processors, matching kidneys with patients, internet search engine auctions, speed
dating, and pairing students to schools. For many problems in the world there are
other algorithms that could run faster or yield better solutions, but it is good to
know that a stable matching exists. Moreover, the algorithm often runs fast enough
to resolve the problem.
The great insight in many of these situations is that we can find solutions us-
ing market-like situations without money changing hands. For example, consider
kidney transplants. Initially most transplants came from deceased donors. How-
ever, it is possible for a living person to donate one of their kidneys. This greatly
increases the available supply, but many people are understandably hesitant to do-
nate one of their kidneys. Moreover, not any kidney can go to any patient; there
are compatibility issues that must be addressed.
Imagine two families in which someone needs a kidney, say Hari and Hober
in one, and Petra and Peter in another. Hober and Peter both need kidneys,
but unfortunately Hari’s kidney is incompatible with Hober. Similarly, Petra’s
is incompatible with Peter. However, Hari’s would work in Peter and Petra’s in
Hober. This opens up the opportunity for a trade that helps both families; see
Figure 3. Now Peter and Hober can declare: “Kidneys! I’ve got new kidneys! I
don’t like the colour.”
Before the Gale–Shapley algorithm was applied in the early 2000s, only about
twenty transplants per year were from living donors. Now thousands of such trans-
plants have been performed successfully. For more information about this life-saving
application of mathematics, see [6, 7, 9].
Bibliography
[1] C. T. Cheng, Understanding the generalized median stable matchings, Algorithmica 58 (2010),
no. 1, 34–51, DOI 10.1007/s00453-009-9307-2. http://link.springer.com/article/10.1007
%2Fs00453-009-9307-2. MR2658099
[2] C. T. Cheng and A. Lin, Stable roommates matchings, mirror posets, median graphs, and
the local/global median phenomenon in stable matchings, SIAM J. Discrete Math. 25 (2011),
no. 1, 72–94, DOI 10.1137/090750299. http://epubs.siam.org/doi/abs/10.1137/090750299.
MR2765702
[3] C. Cheng, E. McDermid, and I. Suzuki, A unified approach to finding good stable matchings
in the hospitals/residents setting, Theoret. Comput. Sci. 400 (2008), no. 1-3, 84–99, DOI
10.1016/j.tcs.2008.02.014. MR2424344
[4] D. Gale and L. S. Shapley, College Admissions and the Stability of Marriage, Amer. Math.
Monthly 69 (1962), no. 1, 9–15, DOI 10.2307/2312726. http://www.econ.ucsb.edu/~tedb/
Courses/Ec100C/galeshapley.pdf. MR1531503
[5] D. Gusfield and R. W. Irving, The stable marriage problem: structure and algorithms, Foun-
dations of Computing Series, MIT Press, Cambridge, MA, 1989. MR1021242
[6] A. Hern, Trading kidneys, repugnant markets and stable marriages win the Nobel Prize
in Economics, NewStatesman, October 15, 2012. http://www.newstatesman.com/blogs/
economics/2012/10/trading-kidneys-repugnant-markets-and-stable-marriages-win-
nobel-prize-econo.
[7] K. Luong, Matching theory: kidney allocation, Health Policy and Economics, UWOMJ 82,
no.1, Spring 2013. http://www.uwomj.com/wp-content/uploads/2013/10/v82no1_6.pdf.
[8] C. O’Neil, Weapons of math destruction: How big data increases inequality and threatens
democracy, Crown, New York, 2016. MR3561130
[9] Reuters, Alvin Roth Transformed Kidney Donation System, Reuters, October 15,
2012. http://forward.com/news/breaking-news/164327/alvin-roth-transformed-kidney-
donation-system/.
1963
Continuum Hypothesis
Introduction
In our 1918 entry, we introduced Cantor’s theory of cardinality and its shocking
implication that there are multiple levels of infinity. Recall that A ≅ B means that
there is a one-to-one and onto function f : A → B. For finite sets, A ≅ B if and
only if A and B have the same number of elements. Cantor’s brilliant insight was
to extend this definition to infinite sets. His classic diagonal argument (see p. 29)
reveals that no one-to-one correspondence between N and R exists; that is, N and
R represent two different levels of infinity.
Since N is a subset of R, it is natural to consider what happens “in between”
N and R. The continuum hypothesis (CH) asserts that if N ⊆ A ⊆ R, then either
A ≅ N or A ≅ R; that is, there are no “intermediate infinities” between those of
the natural numbers and the real numbers.
Cantor believed the continuum hypothesis to be true and he spent years at-
tempting to prove it, without success. David Hilbert, one of the greatest math-
ematicians of all time, placed it first on his list of twenty-three open questions
presented to the International Congress of Mathematicians, held in Paris in 1900
(for more about Hilbert’s problems, see the 1935, 1970, and 1980 entries). Hilbert
opened his historic address with these words:
Who among us would not be happy to lift the veil behind which is
hidden the future; to gaze at the coming developments of our science
and at the secrets of its development in the centuries to come? What
will be the ends toward which the spirit of future generations of math-
ematicians will tend? What methods, what new facts will the new
century reveal in the vast and rich field of mathematical thought?
So is the continuum hypothesis true or false? To this day, nobody has been
able to prove it. Neither has anyone been able to disprove it. Nevertheless, the
problem has been resolved! How can this be?
In 1940, Kurt Gödel proved that CH cannot be disproved from the traditionally
accepted axioms of set theory [5]. Specifically, he showed that CH cannot be
disproved using the Zermelo–Fraenkel (ZF) axioms (see the 1929 entry) or the
Zermelo–Fraenkel axioms augmented with the axiom of choice (AC). This extended
axiom system is denoted ZFC; see the comments for the 1964 and 1969 entries.
In 1963, Paul Cohen (1934–2007) introduced the powerful forcing technique
and demonstrated that CH cannot be proved in ZFC [1, 2, 9]. Thus, the continuum
hypothesis is neither provable nor disprovable from the standard axioms of set
theory. Cohen won the prestigious Fields Medal in 1966 for this achievement.
See [9] for a remembrance of Paul Cohen; the second named author is one of his
mathematical grandchildren.
Of course, the results of Gödel and Cohen assume that ZFC is consistent. The
issue of whether or not ZFC is consistent is another story altogether; see the 1929
entry. To some extent, whether CH is “true” or “false” is a matter of opinion
since it can neither be proved nor disproved in ZFC. One can add either CH or its
negation to ZFC and obtain two different versions of mathematics, one in which CH
is “true” and one in which CH is “false.” Each is as valid as the other, although, as
Gödel showed, neither system can prove its own consistency. This situation seems
bizarre, although it becomes easier to understand if we study a similar occurrence
in classical geometry; see the comments for this entry.
(b) If you have two sets of positive fractal dimension $d_1 \neq d_2$, can you always
construct a set whose fractal dimension is strictly between $d_1$ and $d_2$?
1963: Comments
Self-similarity. The Cantor set is self-similar: it is composed of two scaled
copies of itself, each of which has been shrunk by a factor of three. The power-law
relation between the number of pieces p, the reduction factor r, and the fractal
dimension $d$ is $p = r^d$; that is,
$$d = \frac{\log p}{\log r}.$$
At each stage of the Cantor set construction, the number of pieces is doubled and
each is shrunk by a factor of 3. Thus, the fractal dimension of the Cantor set is
$$\frac{\log 2}{\log 3} \approx 0.63093.$$
What about a fractal whose dimension is between 1 and 2? Take a solid equilateral
triangle, subdivide it into four congruent equilateral triangles, and remove the central
one. Iterate this process on each remaining triangle to obtain the Sierpiński triangle
(Figure 2), named after Wacław Sierpiński (1882–1969). In particular, observe that the
Sierpiński triangle resembles our diagram of the 3-adic integers; see the 1916 entry.
This fractal is composed of three scaled copies of itself, each of which has been shrunk
by a factor of two. Thus, its fractal dimension is
$$\frac{\log 3}{\log 2} \approx 1.58496,$$
which is strictly between 1 and 2. For more information about fractals, see the
1961 and 1978 entries and [8].
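The two dimensions quoted above follow directly from the power-law relation between the number of pieces and the reduction factor. Here is a short numerical check; the function name `similarity_dimension` is our own label for the formula, not standard terminology from the text.

```python
import math

def similarity_dimension(pieces, reduction_factor):
    """Solve pieces = reduction_factor ** d for d."""
    return math.log(pieces) / math.log(reduction_factor)

cantor = similarity_dimension(2, 3)       # two copies, each shrunk by a factor of 3
sierpinski = similarity_dimension(3, 2)   # three copies, each shrunk by a factor of 2
print(round(cantor, 5), round(sierpinski, 5))
```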
The parallel postulate. How does classical geometry relate to the indepen-
dence of the continuum hypothesis? The story begins around 2,300 years ago,
when Euclid of Alexandria (in modern Egypt) wrote the Elements, a monumental
treatise on geometry and related topics. The Elements was an attempt to build
geometry in a logical and rigorous manner from a few basic axioms. Although Eu-
clid’s book contains some mistakes and a few hidden assumptions, it is nonetheless
a magnificent intellectual achievement.
After defining everything from circles to isosceles triangles to rhomboids, Euclid
presents his five postulates (axioms):
(a) A straight line segment can be drawn joining any two points.
(b) Any straight line segment can be extended indefinitely in a straight line.
(c) Given any straight line segment, a circle can be drawn having the segment as radius
and one endpoint as center.
(d) All right angles are congruent.
(e) If two lines are drawn which intersect a third in such a way that the sum of the inner
angles on one side is less than two right angles, then the two lines inevitably must
intersect each other on that side if extended far enough.
The fifth postulate sticks out: it seems too complicated to accept as an axiom.
Perhaps with sufficient work we can deduce it from the remaining axioms? Euclid
himself must have been unsatisfied with his fifth postulate since he held off from
using it until his twenty-ninth theorem (Proposition I.29). For over 2,000 years
mathematicians tried unsuccessfully to prove that the fifth postulate followed from
the other postulates and definitions.
They all failed for a subtle reason: it is impossible to prove or disprove the
fifth postulate, given only the truth of the other four! This is because we can
produce two distinct versions of geometry, one in which the fifth postulate is true
and another in which it is false. If you assume that Euclid’s fifth postulate is
true, then your geometry is just plain-old plane geometry. If you assume that
Euclid’s fifth postulate is false, then you are studying hyperbolic geometry, a type
of curved geometry. The existence of curved geometries is not surprising to us in the
21st century, since we are used to hearing of relativity and “curved space-time.”
However, this was once an extremely radical thought. Indeed, the philosopher
Immanuel Kant (1724–1804) went so far as to say that “Euclidean geometry is the
inevitable necessity of thought.”
The fifth postulate is often called the parallel postulate because it is equivalent
to Playfair’s axiom:
In a plane, given a line and a point not on it, at most one line parallel
to the given line can be drawn through the point.
The Poincaré disk model of the hyperbolic plane is a geometry in which Euclid’s
first four postulates hold, but the parallel postulate fails; see Figure 4. The “points”
in this geometry are elements of an open disk. The “lines” are diameters of the disk
and arcs of circles that intersect the boundary circle orthogonally. See [6] for the
whole story behind Euclidean and non-Euclidean geometry.
Bibliography
[1] P. Cohen, The independence of the continuum hypothesis, Proc. Nat. Acad. Sci. U.S.A.
50 (1963), 1143–1148, DOI 10.1073/pnas.50.6.1143. http://www.ncbi.nlm.nih.gov/pmc/
articles/PMC221287/. MR0157890
[2] P. J. Cohen, The independence of the continuum hypothesis. II, Proc. Nat. Acad. Sci. U.S.A. 51
(1964), 105–110, DOI 10.1073/pnas.51.1.105. http://www.ncbi.nlm.nih.gov/pmc/articles/
PMC300611/. MR0159745
[3] T. Y. Chow, A beginner’s guide to forcing, Communicating Mathematics: A Conference in
Honor of Joseph A. Gallian’s 65th Birthday, Contemporary Mathematics 479, 25–40. http://
arxiv.org/abs/0712.1320.
[4] L. Gillman, Two classical surprises concerning the axiom of choice and the continuum hypothe-
sis, Amer. Math. Monthly 109 (2002), no. 6, 544–553, DOI 10.2307/2695444. http://www.maa.
org/sites/default/files/pdf/upload_library/22/Ford/Gillman544-553.pdf. MR1908009
[5] K. Gödel, The Consistency of the Continuum Hypothesis, Annals of Mathematics Studies, no.
3, Princeton University Press, Princeton, N. J., 1940. MR0002514
[6] R. Hartshorne, Geometry: Euclid and beyond, Undergraduate Texts in Mathematics, Springer-
Verlag, New York, 2000. MR1761093
[7] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190. http://link.springer.
com/article/10.1007%2FBF01206605. See also http://www.ams.org/journals/bull/1902-08-
10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[8] H.-O. Peitgen, H. Jürgens, and D. Saupe, Chaos and fractals: New frontiers of science, 2nd
ed., with a foreword by Mitchell J. Feigenbaum, Springer-Verlag, New York, 2004. MR2031217
[9] P. Sarnak (ed.), Remembering Paul Cohen, Notices of the AMS 57 (2010), no. 7, 824–838.
1964
Principles of Mathematical Analysis
Introduction
One of the most important contributions someone can make to mathematics
is to encourage others to join and thrive in the field. Although there are many
ways to do this, one way is through quality writing. A good textbook can circle
the globe, edition after edition, reaching many generations. For example, Euclid’s
Elements remained in use for almost 2,000 years; see the comments for the 1963 entry.
One of the most prestigious prizes honoring such work is the Leroy P. Steele
Prize for Mathematical Exposition. It was first given in 1993 to Walter Rudin
(1921–2010) for his enormously influential books Principles of Mathematical Anal-
ysis [4] and Real and Complex Analysis [5]. These books have been used around
the world for decades and have influenced countless mathematicians. They have
survived into many editions. In fact, the reason this is the entry for 1964 and not
1953 is that 1964 marks the publication of the second edition of Principles.
Many mathematicians profess their love for these books because of the challeng-
ing problems at the end of each chapter. On a personal note, the second named
author remembers using the third edition of Principles as a sophomore at Yale.
At the time he was on the fence between mathematics and physics. The joy of
wrestling with Rudin’s problems finally pushed him into the math camp.
Principles is such an omnipresent classic that one can hardly imagine the time,
shortly after it was published, when it was just another new real analysis textbook.
The original 1953 review from the Bulletin of the American Mathematical Soci-
ety compared three contemporary real analysis books: Real Functions by Casper
Goffman (1913–2006), H. P. Thielman’s Theory of Functions of Real Variables,
and Rudin’s Principles of Mathematical Analysis; see Figure 1. The reviewer,
M. E. Munroe, concluded:
Rudin’s books do have their detractors. His style, which was typical for the era,
is terse. As generations of students have lamented, illustrations are notoriously ab-
sent from Principles. The first named author suspects that for each student turned
on to mathematics by Principles’ style, another few are turned away. Perhaps the
widespread use of Principles is one of the main reasons why real analysis is so fre-
quently viewed as the “sink or swim” course by mathematics majors. As Herbert
Wilf (1931–2012), who took undergraduate analysis at MIT under Rudin, said:
This course is famous for being our rite of passage. Our hazing cere-
mony. If you want to join the club, then here is the hurdle that you
have to jump over. [6]
One Goodreads reviewer had the following humorous take:
I have mixed feelings about this book. How to describe it. . . ok, let’s
talk kung-fu movies. So there’s a standard trope in martial arts movies
where the young apprentice shows up at the stoop of the Old Master
and says, “teach me to fight”. And the Old Master decides that instead
of doing the obvious thing and having our hapless padawan practice
something reasonable like, you know, punching techniques, the Old
Master tells the aspirant to do a series of incomprehensible and difficult
tasks. Carrying the Old Master up and down the mountains. Knitting
sweaters while hanging upside-down over hot coals. Doing the Old
Master’s laundry. And so on. Usually, it’s never clear if the training is
difficult because Sensei is trying to impart some kind of deeper wisdom
or if he’s really just a resentful old jerk who takes pleasure in making
young students suffer.
Principles of Mathematical Analysis is the Old Master. It is com-
pletely uncompromising—no diagrams, the proofs are often opaque,
the definitions unmotivated—and the book carries more than a whiff
of that sadistic strain in math education that sees formal rigor and a
lack of justification as a kind of intellectual machismo. [3]
1964: Comments
The axiom of choice. We were so busy in the 1963 entry discussing the
continuum hypothesis that we never had a chance to say anything substantial about
the axiom of choice! That is a much more exciting topic than debating the merits
of Rudin’s Principles of Mathematical Analysis.
In the proof of the existence of Vitali sets and in the derivation of the Banach–
Tarski paradox (see the 1924 entry), we had an equivalence relation on a set. We
produced a new set by selecting one element from each equivalence class. This step
implicitly appeals to the axiom of choice (AC):
Axiom of Choice. If $\{X_\alpha\}_{\alpha \in I}$ is a nonempty collection of
nonempty sets, then there is an $f : I \to \bigcup_{\alpha \in I} X_\alpha$ such that
$f(\alpha) \in X_\alpha$ for all $\alpha \in I$.
Without further knowledge about the sets $X_n$, the axiom of choice is required
to assert that the sequence $x_1, x_2, \ldots$ exists. What do we mean by “further
knowledge”? For example, AC is not required for the following statement:
Here we have used the fact that N is well-ordered: every nonempty subset of N contains
a smallest element. This does not require the axiom of choice because we have
provided a definite rule for producing each $x_n$.
Suppose that a caterpillar with infinitely many pairs of legs is getting dressed.
It has infinitely many pairs of shoes and infinitely many pairs of socks. For each pair
of legs, the caterpillar can put on the left shoe first, then the right. The caterpillar
is unable to wear its socks without the axiom of choice, since infinitely many choices
need to be made without the aid of a procedure for making the selection. Since the
socks are indistinguishable, an arbitrary choice must be made for each pair.
For more about the axiom of choice, see the comments for the 1999 entry.
$$f(x + y) = f(x) + f(y)$$
Bibliography
[1] G. Hamel, Eine Basis aller Zahlen und die unstetigen Lösungen der Funktionalgleichung:
f (x + y) = f (x) + f (y) (German), Math. Ann. 60 (1905), no. 3, 459–462, DOI
10.1007/BF01457624. MR1511317
[2] M. E. Munroe, Book Review: Real functions // Book Review: Principles of mathematical
analysis // Book Review: Theory of functions of real variables, Bull. Amer. Math. Soc. 59
(1953), no. 6, 572–577, DOI 10.1090/S0002-9904-1953-09765-8. MR1565532
[3] M. Needham, Review of Principles of Mathematical Analysis, https://www.goodreads.com/
review/show/1271096254?book_show_action=true&from_review_page=1.
[4] W. Rudin, Principles of mathematical analysis, McGraw-Hill Book Company, Inc., New York-
Toronto-London, 1953. MR0055409
[5] W. Rudin, Real and complex analysis, McGraw-Hill Book Co., New York-Toronto, Ont.-
London, 1966. MR0210528
[6] H. S. Wilf, Epsilon sandwiches, https://www.math.upenn.edu/~wilf/website/MAASpeech.
1965
Fast Fourier Transform
Introduction
There are many important milestones in our efforts to find better and faster
ways to solve problems. One of the most important is the fast Fourier trans-
form (FFT), developed in 1965 by James William Cooley (1926–2016) and John
Tukey (1915–2000). It is (unintentionally) based upon tools first developed by Carl
Friedrich Gauss in 1805 to calculate the coefficients in a trigonometric expansion
related to the trajectories of two asteroids. The FFT has had a tremendous impact
upon the engineering community, particularly in the field of digital signal process-
ing.
A discrete periodic function with period n can be thought of as a function
whose domain is the cyclic group Z/nZ. Such functions arise naturally not only
in abstract algebra and number theory, but also in many real-world applications.
For example, a real- or complex-valued function on Z/nZ can be regarded as the
discretization of a continuous, periodic function; see Figure 1. The discrete Fourier
transform (DFT) of f : Z/nZ → C is the function $\widehat{f}$ : Z/nZ → C defined by
$$\widehat{f}(j) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} f(k) e^{-2\pi i jk/n},$$
in which $i^2 = -1$. The DFT is used to analyze the strength of the “signal” f at
various frequencies. The normalization $1/\sqrt{n}$ is not universal: 1 and 1/n are also
used. Since periodic functions arise anytime there are waves or vibrations, the DFT
is used to analyze everything from radio waves to earthquakes.
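To see how the DFT picks out frequencies, here is a small sketch that evaluates the definition directly with the $1/\sqrt{n}$ normalization. The test signal (a cosine with 3 cycles per period, sampled at 16 points) and the names `dft`, `signal`, and `strengths` are our own choices for illustration.

```python
import cmath
import math

def dft(f):
    """Discrete Fourier transform of the list f, with the 1/sqrt(n) normalization."""
    n = len(f)
    return [
        sum(f[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)) / math.sqrt(n)
        for j in range(n)
    ]

# A pure cosine with 3 cycles per period, sampled at n = 16 points.
n = 16
signal = [math.cos(2 * math.pi * 3 * k / n) for k in range(n)]
spectrum = dft(signal)
strengths = [round(abs(c), 6) for c in spectrum]
```

Only the entries at j = 3 and j = 13 (that is, −3 modulo 16) are nonzero, each with magnitude $\sqrt{16}/2 = 2$: the transform has located the single frequency present in the signal.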
The FFT reduces the number of computations required to compute the discrete
Fourier transform from $O(n^2)$ to $O(n \log n)$. Since $n \log n$ tends to infinity much
more slowly than $n^2$, the FFT provides a huge savings when $n$ is large. This
illustrates that a problem which appears to require a certain amount of time or
effort may be susceptible to a more clever, faster approach; see the 2002 entry for
another striking example. Our problem for this year involves such a question: how
fast can one multiply two matrices? We must be specific about how we measure
the speed of an algorithm. Addition is somewhat faster than multiplication on
a computer, so one often counts the number of multiplications required by an
algorithm as a measure of its approximate runtime. In particular, one wants to
know how the algorithm performs as the size of the input increases.
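The speedup comes from splitting a transform of length n into two transforms of length n/2, one over the even-indexed samples and one over the odd-indexed samples. The following is a minimal sketch of the textbook radix-2 version of this idea, assuming that n is a power of 2 and using the unnormalized convention (normalization constant 1); dividing the output by $\sqrt{n}$ recovers the convention used above. The function names and sample data are ours, and this is not claimed to be Cooley and Tukey's original formulation.

```python
import cmath
import math

def fft(f):
    """Unnormalized DFT of f via radix-2 decimation in time; len(f) must be a power of 2."""
    n = len(f)
    if n == 1:
        return list(f)
    even = fft(f[0::2])   # transform of the even-indexed samples
    odd = fft(f[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out

def naive_dft(f):
    """Unnormalized DFT straight from the definition, using n**2 multiplications."""
    n = len(f)
    return [sum(f[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

samples = [1, 2, 0, -1, 3, 0, 1, 4]
fast = fft(samples)
slow = naive_dft(samples)
max_difference = max(abs(a - b) for a, b in zip(fast, slow))
```

Each level of the recursion does a number of multiplications proportional to n, and there are $\log_2 n$ levels, which is where the $n \log n$ count comes from.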
[Figure 1, panel (b): A discretization (with 100 sample points) over one period of the
periodic function above.]
1965: Comments
Matrix multiplication. The method suggested by the centennial problem
can be iterated to provide an algorithm for multiplying two $n \times n$ matrices with
only $O(n^{2.8074})$ multiplications. The exponent $\log_2 7 \approx 2.8074$, which improves
upon the $\log_2 8 = 3$ provided by the naive algorithm, reflects the fact that only seven
multiplications (instead of eight) are required with each iteration. This method of
matrix multiplication is known as the Strassen algorithm, due to Volker Strassen
(1936– ) [6]. Since its introduction in 1969, there have been many incremental
improvements; see Figure 2. At the time of writing, the world record is an
$O(n^{2.3728639})$ algorithm due to François Le Gall in 2014 [3].
For the most part these algorithms are only of theoretical interest since their
numerical stability is inferior to that of the naive method. Moreover, the constants
hidden by the Big-O notation can be prohibitively large. On the other hand, the
Strassen algorithm can be used effectively over finite fields, in which numerical
accuracy is irrelevant because the computations are performed exactly [7].
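For concreteness, here is a sketch of the seven-multiplication step for 2 × 2 matrices, using the product identities usually attributed to Strassen. Whether the centennial problem uses exactly these seven products is an assumption on our part, and the function names are hypothetical. Applying the same identities to matrices whose entries are themselves blocks gives the recursive algorithm.

```python
def strassen_2x2(A, B):
    """Multiply 2 x 2 matrices (nested lists) using seven multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

def naive_2x2(A, B):
    """Multiply 2 x 2 matrices using the usual eight multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = strassen_2x2(A, B)
```

The price of saving one multiplication is eighteen additions and subtractions instead of four, which is a good trade when the entries are large blocks rather than single numbers.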
Table 1. Multiplicities of the eigenvalues +1, −1, −i, i of $F_n$.
n        +1     −1     −i     i
4k       k+1    k      k      k−1
4k+1     k+1    k      k      k
4k+2     k+1    k+1    k      k
4k+3     k+1    k+1    k+1    k
This is the Fourier matrix of order $n$. It is unitary, meaning that $F_n^{-1} = F_n^*$,
in which $F_n^*$ is the conjugate transpose of $F_n$. Some computations with finite
geometric series confirm that $F_n^2$ is a permutation matrix whose square is $I$, and
hence $F_n^4 = I$. Thus, the eigenvalues of $F_n$ are among $1, -1, i, -i$; see Table 1.
Since the trace of a matrix is the sum of its
eigenvalues, repeated according to multiplicity, the multiplicities of the eigenvalues
of $F_n$ can be deduced from the evaluation of the quadratic Gauss sum
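$$\sum_{k=0}^{n-1} \zeta^{k^2} =
\begin{cases}
(1 + i)\sqrt{n} & \text{if } n \equiv 0 \pmod 4,\\
\sqrt{n} & \text{if } n \equiv 1 \pmod 4,\\
0 & \text{if } n \equiv 2 \pmod 4,\\
i\sqrt{n} & \text{if } n \equiv 3 \pmod 4.
\end{cases}$$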
The preceding formula is not at all obvious! Although the magnitude of the sum
can be found relatively easily, its argument is much harder to pin down. As Gauss
confided to Wilhelm Olbers (1758–1840) in 1805 [4]:
. . . the determination of the sign, is exactly what has tortured me all
the time. This shortcoming spoiled everything else that I found; and
hardly a week passed during the last four years where I have not made
this or that vain attempt to untie that knot—especially vigorously
during recent times. But all this brooding and searching was in vain,
sadly I had to put the pen down again. Finally, a few days ago, it has
been achieved—but not by my cumbersome search, rather through
God’s good grace, I am tempted to say. As the lightning strikes the
riddle was solved; I myself would be unable to point to a guiding thread
between what I knew before, what I had used in my last attempts, and
what made it work. Curiously enough the solution now appears to me
to be easier than many other things that have not detained me as many
days as this one years, and surely noone whom I will once explain the
material will get an idea of the tight spot into which this problem had
locked me for so long.
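Although the proof of the Gauss sum evaluation is hard, the formula itself is easy to test numerically. The sketch below assumes that ζ denotes the primitive root of unity $e^{2\pi i/n}$ (its definition is not restated in this passage) and compares the sum with the four-case closed form for every n below 200. The function names are ours.

```python
import cmath
import math

def gauss_sum(n):
    """Sum of zeta**(k*k) for k = 0..n-1, assuming zeta = exp(2*pi*i/n)."""
    # Reduce k*k mod n first so the angle stays small and rounding error stays low.
    return sum(cmath.exp(2j * math.pi * ((k * k) % n) / n) for k in range(n))

def predicted(n):
    """Closed form from the four-case formula, indexed by n mod 4."""
    root = math.sqrt(n)
    return [(1 + 1j) * root, root, 0, 1j * root][n % 4]

worst_error = max(abs(gauss_sum(n) - predicted(n)) for n in range(1, 200))
```

A check like this confirms the sign that tortured Gauss in every case tried, but of course it proves nothing.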
Horner’s method. The FFT and Strassen algorithm provide much more
rapid methods for performing important computations than the naive approaches
suggested by the definitions. Although we refrain from discussing the technical
details of these algorithms, we can at least discuss a simpler algorithm for the rapid
evaluation of polynomials. This hints at the sort of creative thinking and clever
repackaging that is often required to “beat” the approach suggested by definitions.
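As usually stated, Horner's method rewrites a polynomial in nested form, for example $2x^3 - 6x^2 + 2x - 1 = ((2x - 6)x + 2)x - 1$, so that a polynomial of degree n is evaluated with only n multiplications and n additions. The following sketch compares it with term-by-term evaluation; the function names and the sample polynomial are our own.

```python
def horner(coefficients, x):
    """Evaluate a polynomial at x; coefficients run from the highest degree down."""
    result = 0
    for c in coefficients:
        result = result * x + c   # one multiplication and one addition per coefficient
    return result

def naive(coefficients, x):
    """Evaluate the same polynomial term by term, recomputing each power of x."""
    degree = len(coefficients) - 1
    return sum(c * x ** (degree - i) for i, c in enumerate(coefficients))

# p(x) = 2x^3 - 6x^2 + 2x - 1
p = [2, -6, 2, -1]
value = horner(p, 3)
```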
Bibliography
[1] C. Burrus, Fast Fourier Transforms. http://cnx.org/content/col10550/1.22/
[2] M. T. Heideman, D. H. Johnson, and C. S. Burrus, Gauss and the history of the fast
Fourier transform, Arch. Hist. Exact Sci. 34 (1985), no. 3, 265–277, DOI 10.1007/BF00348431.
MR815154
[3] F. Le Gall, Powers of tensors and fast matrix multiplication, ISSAC 2014—Proceedings of the
39th International Symposium on Symbolic and Algebraic Computation, ACM, New York,
2014, pp. 296–303, DOI 10.1145/2608628.2608664. MR3239939
[4] S. J. Patterson, Gauss sums, The shaping of arithmetic after C. F. Gauss’s Disquisitiones
arithmeticae, Springer, Berlin, 2007, pp. 505–528, DOI 10.1007/978-3-540-34720-0_19. MR2284818
[5] J. M. Pollard, The fast Fourier transform in a finite field, Math. Comp. 25 (1971), 365–374,
DOI 10.2307/2004932. http://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-
0301966-0/S0025-5718-1971-0301966-0.pdf. MR0301966
[6] V. Strassen, Gaussian elimination is not optimal, Numer. Math. 13 (1969), 354–356, DOI
10.1007/BF02165411. MR0248973
[7] Wikipedia, Matrix multiplication algorithm,
https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm.
1966
Class Number One Problem
Introduction
A binary, integral quadratic form is a function $Q : \mathbb{Z}^2 \to \mathbb{Z}$ of the form
$$Q(x, y) = ax^2 + bxy + cy^2, \tag{1966.1}$$
in which a, b, c are integers. We require that a and c be nonzero to avoid trivialities
and we often drop the adjectives “binary” and “integral” in what follows. Despite
their simple appearance, quadratic forms have an incredibly rich structure. Carl
Friedrich Gauss developed much of the theory of quadratic forms in his landmark
book Disquisitiones Arithmeticae.
Two quadratic forms $Q$ and $Q'$ are equivalent if
$$Q'(x, y) = Q(\alpha x + \beta y, \gamma x + \delta y),$$
in which $\alpha, \beta, \gamma, \delta \in \mathbb{Z}$ and $\alpha\delta - \beta\gamma = \pm 1$; that is,
$(\alpha x + \beta y, \gamma x + \delta y)$ is related to $(x, y)$ by a $2 \times 2$ integer matrix
whose determinant is 1 or −1. The discriminant of (1966.1) is
$$D = b^2 - 4ac.$$
One can show that equivalent forms share the same discriminant. We denote the
number of equivalence classes of quadratic forms with discriminant D by h(D);
this is the class number of D (see Table 1). For the sake of simplicity, we assume
throughout the following that $D < 0$ and $a > 0$, in which case $Q$ is positive definite:
$Q(x, y) > 0$ for all $(x, y) \in \mathbb{Z}^2$ other than $(0, 0)$.
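The claim that equivalent forms share the same discriminant can be checked by expanding the substitution. The sketch below does this for one example; a form is stored as its coefficient triple (a, b, c), and the function names are ours. Expanding in general shows that the new discriminant is $(\alpha\delta - \beta\gamma)^2 D$, which equals $D$ when the determinant is ±1.

```python
def discriminant(a, b, c):
    """Discriminant b^2 - 4ac of the form a x^2 + b x y + c y^2."""
    return b * b - 4 * a * c

def substitute(form, alpha, beta, gamma, delta):
    """Coefficients of Q(alpha x + beta y, gamma x + delta y) for Q = (a, b, c)."""
    a, b, c = form
    return (
        a * alpha**2 + b * alpha * gamma + c * gamma**2,
        2 * a * alpha * beta + b * (alpha * delta + beta * gamma) + 2 * c * gamma * delta,
        a * beta**2 + b * beta * delta + c * delta**2,
    )

Q = (1, 1, 2)                          # x^2 + xy + 2y^2
Q_prime = substitute(Q, 2, 1, 1, 1)    # alpha*delta - beta*gamma = 1
```

Here the equivalent form is $8x^2 + 11xy + 4y^2$, which looks quite different from $x^2 + xy + 2y^2$ but has the same discriminant, −7.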
Gauss proved that h(D) is always finite and discovered that the set of equiva-
lence classes of quadratic forms of discriminant D forms an abelian group of order
h(D) under a complicated operation now known as “Gauss composition.” This
was illuminated by Fields Medalist Manjul Bhargava, who discovered many higher-
order composition laws. In particular, the composition of quadratic forms can now
be conveniently represented with so-called Bhargava cubes [2].
Gauss’s legendary class number one problem asserts that D > 0 satisfies h(−D)
= 1 if and only if
D ∈ {3, 4, 7, 8, 11, 12, 16, 19, 27, 28, 43, 67, 163}.
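The thirteen values can be recovered by brute force for small D. The sketch below is not from the text and rests on stated assumptions: it uses the classical fact that each class of positive definite forms contains exactly one reduced form (−a < b ≤ a ≤ c, with b ≥ 0 when a = c), it counts only primitive forms (gcd(a, b, c) = 1), and it uses proper equivalence (determinant +1). Allowing determinant −1 can merge classes, but it does not change which discriminants have class number one. The name `class_number` is illustrative.

```python
from math import gcd


def class_number(D):
    """Count reduced primitive positive definite forms of discriminant D < 0."""
    if D >= 0 or D % 4 not in (0, 1):
        raise ValueError("D must be negative and congruent to 0 or 1 mod 4")
    count = 0
    a = 1
    while 3 * a * a <= -D:              # reduced forms satisfy 3a^2 <= |D|
        for b in range(-a + 1, a + 1):  # -a < b <= a
            numerator = b * b - D       # equals 4ac
            if numerator % (4 * a):
                continue
            c = numerator // (4 * a)
            if c < a or (a == c and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                count += 1
        a += 1
    return count


if __name__ == "__main__":
    ones = [D for D in range(1, 200)
            if (-D) % 4 in (0, 1) and class_number(-D) == 1]
    print(ones)
```

Searching a finite range only illustrates the statement; that no further values occur beyond 163 is the deep part of the theorem.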
It is more convenient these days to treat things in terms of imaginary quadratic
fields $\mathbb{Q}[\sqrt{-D}]$ instead of quadratic forms. Consequently, it suffices to consider
only square-free D since removing a perfect-square divisor of D results in the same
field. In this context, $\mathbb{Q}[\sqrt{-D}]$ is said to have class number one if its “ideal class
group” is trivial. This occurs if and only if the ring of integers in $\mathbb{Q}[\sqrt{-D}]$ is a
unique factorization domain. An equivalent form of Gauss’s conjecture claims that
$\mathbb{Q}[\sqrt{-D}]$ with D > 0 square-free has class number one if and only if
\[ D \in \{1, 2, 3, 7, 11, 19, 43, 67, 163\}. \]
7, 11, 23, 29, 37, 43, 53, 67, 71, 79, 107, 109, 113, 127, 137, 149, 151, 163, 179.
100TH ANNIVERSARY PROBLEMS 289
1966: Comments
Primes of the form $x^2 + dy^2$. The study of primes of the form $x^2 + dy^2$ has a
long and storied history [3]. Fermat showed that a prime p is of the form $x^2 + y^2$ if
and only if p = 2 or p ≡ 1 (mod 4). Here is a short explanation. Since $2 = 1^2 + 1^2$,
it suffices to consider odd primes p. If p ≡ 1 (mod 4), then the method discussed
in the comments for the 1923 entry implies that p is the sum of two squares. On the
other hand, any square is congruent to 0 or 1 (mod 4). Thus, a sum of two squares
cannot be congruent to 3 (mod 4).
Similar criteria are available for many other small values of d:
• $p = x^2 + 2y^2$ iff p = 2 or p ≡ 1, 3 (mod 8).
• $p = x^2 + 3y^2$ iff p = 3 or p ≡ 1 (mod 3).
• $p = x^2 + 5y^2$ iff p = 5 or p ≡ 1, 9 (mod 20).
• $p = x^2 + 6y^2$ iff p ≡ 1, 7 (mod 24).
• $p = x^2 + 7y^2$ iff p = 7 or p is odd and p ≡ 1, 2, 4 (mod 7).
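These congruence criteria are easy to test by exhaustive search. The sketch below is an illustration, not part of the text: it compares each congruence condition with a direct search for x and y, restricted to primes p that do not divide 2d (the small primes dividing 2d are the special cases in the list and are left out of the comparison). The names `CRITERIA`, `is_represented`, and `mismatches` are hypothetical.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % q for q in range(2, isqrt(n) + 1))


def is_represented(p, d):
    """True if p = x^2 + d*y^2 for some integers x, y >= 0."""
    y = 0
    while d * y * y <= p:
        rest = p - d * y * y
        if isqrt(rest) ** 2 == rest:
            return True
        y += 1
    return False


# d -> (modulus, residues), for primes p not dividing 2d
CRITERIA = {
    1: (4, {1}),
    2: (8, {1, 3}),
    3: (3, {1}),
    5: (20, {1, 9}),
    6: (24, {1, 7}),
    7: (7, {1, 2, 4}),
}


def mismatches(limit):
    """Pairs (d, p) with p < limit where search and congruence disagree."""
    bad = []
    for d, (modulus, residues) in CRITERIA.items():
        for p in range(3, limit):
            if not is_prime(p) or (2 * d) % p == 0:
                continue
            if is_represented(p, d) != (p % modulus in residues):
                bad.append((d, p))
    return bad


if __name__ == "__main__":
    print(mismatches(2000))
    print([p for p in range(2, 180) if is_prime(p) and is_represented(p, 7)])
```

An empty list from `mismatches` is only numerical evidence for the range searched, not a proof.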
Why does (1966.3) hold? First, observe that r(n) equals the number of points in
$\mathbb{Z}^2$ that lie on the circle $x^2 + y^2 = n$ with center (0, 0) and radius $\sqrt{n}$. Consequently,
\[ r(0) + r(1) + \cdots + r(n) \tag{1966.4} \]
is the number of points in $\mathbb{Z}^2$ that lie in the disk
\[ D_n = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le n\} \]
of radius $\sqrt{n}$ centered at the origin. Thus, (1966.3) says that for large n, the
expression (1966.4) is approximately equal to the area πn of the disk $D_n$.
For each point of $D_n \cap \mathbb{Z}^2$, we associate the square of area 1 of which it forms
the lower left-hand corner.¹ The area of the region $R_n$ that is formed by the union
of these squares is (1966.4). In certain places $R_n$ extends past the boundary of
$D_n$ while in other places $D_n$ extends beyond $R_n$. Since the squares have diagonal
$\sqrt{2}$, it follows that the region $R_n$ is contained in the disk of radius $\sqrt{n} + \sqrt{2}$ centered
at the origin. On the other hand, the region $R_n$ contains the disk of radius $\sqrt{n} - \sqrt{2}$
centered at the origin; see Figure 2.
Consequently,
\[ \pi(\sqrt{n} - \sqrt{2})^2 \le r(0) + r(1) + \cdots + r(n) \le \pi(\sqrt{n} + \sqrt{2})^2, \]
from which it follows that
\[ \pi + \frac{\pi - 2\pi\sqrt{2n}}{n+1} \le A_n \le \pi + \frac{\pi + 2\pi\sqrt{2n}}{n+1}. \]
Take the limit as n → ∞ to obtain (1966.3). In fact, the preceding inequalities tell
us that $A_n = \pi + O(1/\sqrt{n})$, so the convergence is relatively slow. For example,
$A_{1,000,000} = 3.141545858\ldots$, which is only accurate to four decimal places.
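The numerical value quoted here can be reproduced by counting lattice points directly. This sketch assumes, as the displayed inequalities suggest, that $A_n$ denotes the average (r(0) + r(1) + ⋯ + r(n))/(n + 1); the function names are illustrative.

```python
from math import isqrt, pi, sqrt


def lattice_points_in_disk(n):
    """Number of (x, y) in Z^2 with x^2 + y^2 <= n, i.e. r(0) + ... + r(n)."""
    radius = isqrt(n)
    # For each x, the admissible y satisfy |y| <= floor(sqrt(n - x^2)).
    return sum(2 * isqrt(n - x * x) + 1 for x in range(-radius, radius + 1))


def average_r(n):
    """The average A_n = (r(0) + ... + r(n)) / (n + 1)."""
    return lattice_points_in_disk(n) / (n + 1)


def bounds(n):
    """Lower and upper bounds for A_n obtained from the two disks."""
    lower = pi + (pi - 2 * pi * sqrt(2 * n)) / (n + 1)
    upper = pi + (pi + 2 * pi * sqrt(2 * n)) / (n + 1)
    return lower, upper


if __name__ == "__main__":
    for n in (10, 1000, 10**6):
        lower, upper = bounds(n)
        print(n, lower, average_r(n), upper)
```

Counting column by column takes about $2\sqrt{n}$ steps, so n = 1,000,000 is instantaneous.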
1 Any other corner, or even the center of the square, would work too. The important thing is
Figure 2. The region $R_n$ is contained inside of the disk of radius $\sqrt{n} + \sqrt{2}$ that is
centered at the origin. It contains the disk of radius $\sqrt{n} - \sqrt{2}$ that is centered
at the origin.
\[ p = x^2 + 7y^2 \iff \left(\frac{-7}{p}\right) = 1. \]
\[ b = 2b' \quad \text{and} \quad c = \frac{b^2 + 28}{4p} \]
are integers. Since h(−28) = 1 and because the discriminant of the quadratic form
\[ Q'(x, y) = px^2 + bxy + cy^2 \]
is −28, it follows that $Q'$ is equivalent to $Q$. Since $Q'(1, 0) = p$, it follows that $Q$
also represents p; that is, p is of the form $x^2 + 7y^2$.
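The argument can be traced on concrete primes. The sketch below is an illustration under stated assumptions: `find_form` searches by brute force for an integer b0 with b0² ≡ −7 (mod p), sets b = 2·b0, and builds the form (p, b, c) of discriminant −28; `reduce_form` then applies the standard reduction steps for positive definite forms (a translation x ↦ x + ky followed by a swap of the outer coefficients). Both function names are hypothetical, and the reduction routine is a textbook procedure rather than anything spelled out in the text.

```python
def find_form(p):
    """Return (p, b, c) with b^2 - 4pc = -28, or None if -7 is not a square mod p."""
    for b0 in range(p):
        if (b0 * b0 + 7) % p == 0:
            b = 2 * b0
            return p, b, (b * b + 28) // (4 * p)
    return None


def reduce_form(a, b, c):
    """Reduce a positive definite form to -a < b <= a <= c (b >= 0 if a == c)."""
    D = b * b - 4 * a * c
    while True:
        k = (a - b) // (2 * a)          # shift b into the interval (-a, a]
        b += 2 * a * k
        c = (b * b - D) // (4 * a)      # c is determined by a, b and D
        if a > c:
            a, b, c = c, -b, a          # swap the outer coefficients
        else:
            if a == c and b < 0:
                b = -b
            return a, b, c


if __name__ == "__main__":
    for p in (7, 11, 23, 29, 37, 43):
        form = find_form(p)
        print(p, form, reduce_form(*form))
```

For each prime shown, the reduced form comes out as (1, 0, 7), that is, $x^2 + 7y^2$, which is what class number one predicts.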
Bibliography
[1] A. Baker, Linear forms in the logarithms of algebraic numbers. IV, Mathematika 15 (1968),
204–216, DOI 10.1112/S0025579300002588. MR0258756
[2] M. Bhargava, Higher composition laws. I. A new view on Gauss composition, and quadratic
generalizations, Ann. of Math. (2) 159 (2004), no. 1, 217–250, DOI 10.4007/annals.2004.159.217.
MR2051392
[3] D. A. Cox, Primes of the form $x^2 + ny^2$: Fermat, class field theory, and complex multiplication,
2nd ed., Pure and Applied Mathematics (Hoboken), John Wiley & Sons, Inc., Hoboken, NJ,
2013. MR3236783
[4] K. Heegner, Diophantische Analysis und Modulfunktionen (German), Math. Z. 56 (1952),
227–253, DOI 10.1007/BF01174749. MR0053135
[5] D. Shanks, On Gauss’s class number problems, Math. Comp. 23 (1969), 151–163,
DOI 10.2307/2005064. http://www.ams.org/journals/mcom/1969-23-105/S0025-5718-1969-0262204-1/S0025-5718-1969-0262204-1.pdf. MR0262204
[6] H. Stark, On complex quadratic fields with class number equal to one, Trans. Amer. Math.
Soc. 122 (1966), 112–119, DOI 10.2307/1994504. http://www.ams.org/journals/tran/1966-122-01/S0002-9947-1966-0195845-4/S0002-9947-1966-0195845-4.pdf. MR0195845
[7] H. M. Stark, On the “gap” in a theorem of Heegner, J. Number Theory 1 (1969), 16–27, DOI
10.1016/0022-314X(69)90023-7. MR0241384
[8] H. M. Stark, A complete determination of the complex quadratic fields of class-number one,
Michigan Math. J. 14 (1967), 1–27.
[9] H. M. Stark, The Gauss class-number problems, Analytic number theory, Clay Math. Proc.,
vol. 7, Amer. Math. Soc., Providence, RI, 2007, pp. 247–256.
http://www.claymath.org/publications/Gauss_Dirichlet/stark.pdf. MR2362205
1967
Introduction
A handwritten letter from Robert Langlands (1936– ) to André Weil begins
modestly:
While trying to formulate clearly the question I was asking you before
Chern’s talk I was led to two more general questions. Your opinion of
these questions would be appreciated. I have not had a chance to think
over these questions seriously and I would not ask them except as the
continuation of a casual conversation. I hope you will treat them with
the tolerance they require at this stage. After I have asked them I will
comment briefly on their genesis. [5]
This 1967 letter was a tour de force, a manifesto that would shape the next half-
century (and more) of number theory. The main characters in Langlands’s drama
are automorphic forms: functions on a topological space that are invariant under
a discrete group of symmetries (the actual definition is much longer and more
technical). There are two crucial supporting characters.
First we have Galois representations; these are homomorphisms
$\operatorname{Gal}(\overline{\mathbb{Q}}/\mathbb{Q}) \to \operatorname{GL}_n(\mathbb{C})$, in which $\overline{\mathbb{Q}}$ is the algebraic closure of $\mathbb{Q}$ and
$\operatorname{GL}_n(\mathbb{C})$ is the group of n × n invertible complex matrices. This Galois group is
one of the richest objects in algebraic number theory and describing its
representations is a complicated problem.
We also have L-functions, of which the simplest example is the Riemann zeta
function:
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p\ \text{prime}} \left(1 - \frac{1}{p^s}\right)^{-1}. \tag{1967.1} \]
The equality of the sum and the product is the famed Euler product formula; see the
1933 entry. Every L-function can be written as a product over primes in this way.
They extend meromorphically to s ∈ C, with a certain symmetry with respect to
s → 1 − s. Miraculously, L-functions encode all kinds of data, from the distribution
of prime numbers to the number of points on algebraic varieties.
The Langlands program, to put it roughly, aims to show that behind every
Galois representation or L-function there is an automorphic form. For example,
suppose that we have an L-function
\[ L(s) = \prod_{p\ \text{prime}} \left(1 - \frac{\alpha_p}{p^s}\right)^{-1} \left(1 - \frac{\beta_p}{p^s}\right)^{-1}, \]
mysterious Maass form. In the case of an elliptic curve (see the 1921 entry), αp
and βp are functions of the number of points on the curve modulo p. Langlands
conjectured that not only do these L-functions have automorphic forms behind
them, but so too do the “symmetric power L-functions”
\[ L_r(s) = \prod_{p\ \text{prime}} \prod_{i=0}^{r} \left(1 - \frac{\alpha_p^{i} \beta_p^{r-i}}{p^s}\right)^{-1}. \]
Just the convergence of the symmetric powers implies two famous conjectures:
the Ramanujan conjecture (all αp , βp are on the unit circle) and the Sato–Tate
conjecture (they are equidistributed on the circle).
The Langlands program encompasses a vast range of conjectures and theo-
rems, more than one person could ever prove. For example, class field theory is
the simplest case of the Langlands program. Andrew Wiles’s proof of Fermat’s last
theorem? That is part of the next simplest case. There have been huge break-
throughs on the Langlands program since 1967, such as the proof of the so-called
fundamental lemma by Ngô Bao Châu (1972– ). He received the Fields Medal in
2010 for this result. However, we will almost certainly be working on the Langlands
program for years to come.
\[ \prod_{r=0}^{m} \prod_{i=0}^{2r} (1 - \alpha^{i} \beta^{2r-i} x)^{-1}, \]
1967: Comments
It is impossible to do the Langlands program justice in a few short paragraphs
and hence we make no attempt to do so. Instead we focus on a couple of tangential
results that are of a more elementary nature.
an old result due to Leonhard Euler (see also the 1939 and 1973 entries). Put these
two results together and obtain
\[ \prod_{p\ \text{prime}} \left(1 - \frac{1}{p^2}\right)^{-1} = \frac{\pi^2}{6}. \]
Since $\pi^2/6$ is irrational, the preceding product must include infinitely many terms;
that is, there are infinitely many primes. This provides another proof of Euclid’s
theorem. Armed with the finiteness of the irrationality measure of $\pi^2/6$, one can
modify this proof and obtain lower bounds on the prime counting function [2].
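The product formula can be illustrated numerically. The sketch below truncates the product over primes at a chosen bound and compares it with $\pi^2/6$; the sieve and the names `primes_up_to` and `euler_product` are illustrative choices, not taken from the text.

```python
from math import pi


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def euler_product(limit, s=2):
    """Product of (1 - p^(-s))^(-1) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** (-s)
    return product


if __name__ == "__main__":
    for limit in (10, 1000, 10**5):
        print(limit, euler_product(limit), pi**2 / 6)
```

Every factor exceeds 1, so each finite truncation is a rational number strictly below $\pi^2/6$; this is the numerical shadow of the irrationality argument.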
We should be more careful, however. The irrationality of π does not automatically
imply that $\pi^2$ is irrational. For example, $\sqrt{2}$ is irrational (see the 1951 entry),
but its square is an integer. Fortunately, π is transcendental (proved by Lindemann
in 1882) and hence $\pi^2$ is irrational. Indeed, if $\pi^2$ were rational, then π would be
algebraic since it is a root of $x^2 - \pi^2$, which is assumed to have rational coefficients.
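[Diagram: a pentagon with vertices ((A ⊗ B) ⊗ C) ⊗ D, (A ⊗ (B ⊗ C)) ⊗ D, A ⊗ ((B ⊗ C) ⊗ D),
A ⊗ (B ⊗ (C ⊗ D)), and (A ⊗ B) ⊗ (C ⊗ D); the arrows are labeled $\alpha_{A,B,C} \otimes 1_D$,
$\alpha_{A,B \otimes C,D}$, $1_A \otimes \alpha_{B,C,D}$, $\alpha_{A \otimes B,C,D}$, and $\alpha_{A,B,C \otimes D}$.]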
Bibliography
[1] D. Bump, Automorphic forms and representations, Cambridge Studies in Advanced
Mathematics, vol. 55, Cambridge University Press, Cambridge, 1997. MR1431508
[2] D. Burt, S. Donow, S. J. Miller, M. Schiffman, and B. Wieland, Irrationality measure and
lower bounds for π(X), Pi Mu Epsilon J. 14 (2017), no. 7, 421–429.
https://arxiv.org/abs/0709.2184. MR3726946
[3] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge University Press,
2017.
[4] S. Gelbart, An elementary introduction to the Langlands program, Bull. Amer. Math. Soc.
(N.S.) 10 (1984), no. 2, 177–219, DOI 10.1090/S0273-0979-1984-15237-6.
http://www.ams.org/journals/bull/1984-10-02/S0273-0979-1984-15237-6/S0273-0979-1984-15237-6.pdf.
MR733692
[5] R. Langlands, Letter to André Weil, Institute for Advanced Study.
http://publications.ias.edu/rpl/paper/43.
[6] R. P. Langlands, Problems in the theory of automorphic forms, Lectures in modern analysis
and applications, III, Lecture Notes in Math., Vol. 170, Springer, Berlin, 1970, pp. 18–61.
MR0302614
1968
Atiyah–Singer Index Theorem
Introduction
In 1968, Michael Atiyah (1929–2019) and Isadore Singer (1924– ) established
what is now known as the Atiyah–Singer index theorem, a remarkable result that
connects topology and analysis [2, 3]. In 2004, the Norwegian Academy of Science
and Letters awarded the Abel Prize to Atiyah and Singer for this work (the inaugu-
ral award went to Jean-Pierre Serre, whom we met in our 1956 entry). The award
citation proclaims:
The Atiyah-Singer index theorem is one of the great landmarks of
twentieth-century mathematics, influencing profoundly many of the
most important later developments in topology, differential geometry
and quantum field theory. Its authors, both jointly and individu-
ally, have been instrumental in repairing a rift between the worlds of
pure mathematics and theoretical particle physics, initiating a cross-
fertilization which has been one of the most exciting developments of
the last decades.
We describe the world by measuring quantities and forces that
vary over time and space. The rules of nature are often expressed by
formulas involving their rates of change, that is, differential equations.
Such formulas may have an “index”, the number of solutions of the
formulas minus the number of restrictions which they impose on the
values of the quantities being computed. The index theorem calculates
this number in terms of the geometry of the surrounding space. [12]
It is also worth noting that Atiyah was knighted in 1983, in part for the index
theorem. Singer, who is American, is not eligible for knighthood.
Although the precise statement of the Atiyah–Singer index theorem is beyond
the scope of this book, we can describe an elementary result that is of a similar
spirit. The rank-nullity theorem from linear algebra states that if A is an m × n
complex matrix, then
n = rank A + nullity A.
Similarly,
m = rank A∗ + nullity A∗ ,
1968: Comments
A temporal anomaly. Although the index theorem appears here in the 1968
entry, Atiyah was awarded a Fields Medal in 1966 because he
[d]id joint work with Hirzebruch in K-theory; proved jointly with
Singer the index theorem of elliptic operators on complex manifolds;
worked in collaboration with Bott to prove a fixed point theorem re-
lated to the “Lefschetz formula.”1
1 The quote refers to Friedrich Hirzebruch (1927–2012), Raoul Bott (1923–2005), and Solomon
Lefschetz (1884–1972).
How did this occur? The original announcement of the index theorem dates to
1963 [1] and the results had undergone many years of peer review and study by the
community before the final papers [2, 3] appeared in print in 1968.
n=0
is finite; see the 1949 entry. Each function $f \in H^2$ is analytic (see p. 151) on the
open unit disk $\mathbb{D}$ and has a boundary function
\[ f(\zeta) = \sum_{n=0}^{\infty} a_n \zeta^n = \lim_{r \to 1^-} \sum_{n=0}^{\infty} a_n (r\zeta)^n \tag{1968.2} \]
that exists for almost all ζ on the unit circle $\mathbb{T}$ [10]. For example,
\[ f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n+1} \]
belongs to $H^2$ (its norm is the square root of $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$; see the 1919
entry), but (1968.2) diverges for ζ = 1 since it is the harmonic series. However,
such points are the exception, rather than the rule: the radial limit (1968.2) exists
generically.
Suppose now that we have a suitable function² $g : \mathbb{T} \to \mathbb{C}$ that can be
decomposed as a complex Fourier series
\[ g(\zeta) = \sum_{n \in \mathbb{Z}} b_n \zeta^n. \]
2 The technical hypothesis here is that g belongs to L2 (T), the space of complex-valued
−1 0 2
Risch algorithm. The year 1968 is also notable for the introduction of the
Risch algorithm, developed by Robert Henry Risch (1939– ) [13,14]. This algorithm
determines whether a given function has an elementary antiderivative. If it has such
an antiderivative, the Risch algorithm produces it. Calculus students worldwide
depend on variants of the algorithm whenever they appeal to Wolfram Alpha to
do their homework. Information about computer implementations of the Risch
algorithm can be found in [5, 9].
What is an elementary function? We say that f (x) is elementary if it can be
obtained from the field of complex rational functions in x by adjoining a finite num-
ber of nested exponentials, logarithms, and algebraic functions. The trigonometric
and hyperbolic functions are elementary, as are their inverses. For example,
\[ \cos x = \frac{e^{ix} + e^{-ix}}{2} \]
by Euler’s formula, so cos x is elementary. What about the inverse cosine? Write
the preceding equation as
\[ e^{2ix} - 2e^{ix} \cos x + 1 = 0 \]
and use the quadratic formula to reveal
\[ e^{ix} = \frac{2\cos x \pm \sqrt{4\cos^2 x - 4}}{2} = \cos x \pm \sqrt{\cos^2 x - 1}. \]
In what follows, we gloss over some technical issues, such as the precise definition
of the complex logarithm. By convention, we select the plus sign in the preceding
equation. Substitute $x = \cos^{-1} z$ and obtain
\[ \cos^{-1} z = -i \log\bigl(z + \sqrt{z^2 - 1}\bigr). \]
This demonstrates that $\cos^{-1} z$ is an elementary function.
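The logarithmic formula for the inverse cosine can be sanity-checked with Python's `cmath` module. Because the text sets aside branch questions, this sketch restricts itself to real z in [−1, 1], where the principal branches of the square root and logarithm reproduce the usual arccosine; behavior elsewhere in the complex plane depends on branch choices and is not tested. The function name `acos_via_log` is illustrative.

```python
import cmath
import math


def acos_via_log(z):
    """Evaluate -i * log(z + sqrt(z^2 - 1)) using principal branches."""
    return -1j * cmath.log(z + cmath.sqrt(z * z - 1))


if __name__ == "__main__":
    for z in (-0.9, -0.5, 0.0, 0.3, 0.99):
        print(z, acos_via_log(z), math.acos(z))
```

For real z in (−1, 1) the quantity $z^2 - 1$ is negative, its principal square root is $i\sqrt{1 - z^2}$, and the argument of the logarithm lies on the upper unit semicircle, so the result is the angle whose cosine is z.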
Some well-known functions that do not have elementary antiderivatives are
1/ log x, which arises in the prime number theorem (see the 1919, 1933, and 1948
entries), $\cos(x^2)$ and $\sin(x^2)$, which arise in the Fresnel integrals from optics, and
$e^{-x^2}$, which arises in the central limit theorem (see the 1922 entry). A particularly
compelling example was found by Manuel Bronstein (1963–2005), who observed that
\[ f(x) = \frac{x}{\sqrt{x^4 + 10x^2 - 96x - 71}} \tag{1968.3} \]
has the elementary antiderivative
\[ F(x) = -\frac{1}{8} \ln\Bigl( (x^6 + 15x^4 - 80x^3 + 27x^2 - 528x + 781) \sqrt{x^4 + 10x^2 - 96x - 71} - (x^8 + 20x^6 - 128x^5 + 54x^4 - 1408x^3 + 3124x^2 + 10001) \Bigr) + C. \]
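Since this antiderivative is far from obvious, a numerical check is reassuring. The sketch below compares a central finite difference of F with f at a few points where the radicand is positive. It makes two assumptions that are not in the text: the constant C is dropped, and the logarithm is applied to the absolute value of its argument (the argument is negative at the sample points, which changes F only by a constant on the real line). The helper names are illustrative.

```python
from math import log, sqrt


def g(x):
    """The radicand x^4 + 10x^2 - 96x - 71."""
    return x**4 + 10 * x**2 - 96 * x - 71


def p6(x):
    return x**6 + 15 * x**4 - 80 * x**3 + 27 * x**2 - 528 * x + 781


def p8(x):
    return (x**8 + 20 * x**6 - 128 * x**5 + 54 * x**4
            - 1408 * x**3 + 3124 * x**2 + 10001)


def f(x):
    """The integrand (1968.3)."""
    return x / sqrt(g(x))


def F(x):
    """The claimed antiderivative, with C = 0 and ln|.| in place of ln."""
    return -log(abs(p6(x) * sqrt(g(x)) - p8(x))) / 8


def central_difference(func, x, h=1e-3):
    """Symmetric finite-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


if __name__ == "__main__":
    for x in (-3.0, -2.0, 6.0, 8.0):
        print(x, central_difference(F, x), f(x))
```

A useful side observation: $p_6(x)^2 g(x) - p_8(x)^2$ evaluates to the same constant at every integer tried, so the argument of the logarithm never vanishes where the radicand is positive.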
Bibliography
[1] M. F. Atiyah and I. M. Singer, The index of elliptic operators on compact manifolds, Bull.
Amer. Math. Soc. 69 (1963), 422–433, DOI 10.1090/S0002-9904-1963-10957-X. MR0157392
[2] M. F. Atiyah and I. M. Singer, The index of elliptic operators. I, Ann. of Math. (2) 87 (1968),
484–530, DOI 10.2307/1970715. http://www.jstor.org/stable/1970715. MR0236950
[3] M. F. Atiyah and I. M. Singer, The index of elliptic operators. III, Ann. of Math.
(2) 87 (1968), 546–604, DOI 10.2307/1970717. http://www.jstor.org/stable/1970717.
MR0236952
[4] M. Bronstein, Integration of elementary functions, J. Symbolic Comput. 9 (1990), no. 2,
117–173, DOI 10.1016/S0747-7171(08)80027-2. MR1056841
[5] M. Bronstein, Symbolic Integration Tutorial,
http://www-sop.inria.fr/cafe/Manuel.Bronstein/publications/issac98.pdf.
[6] K. R. Davidson, C*-algebras by example, Fields Institute Monographs, vol. 6, American
Mathematical Society, Providence, RI, 1996. MR1402012
[7] W. Feit, Some consequences of the classification of finite simple groups, The Santa Cruz
Conference on Finite Groups (Univ. California, Santa Cruz, Calif., 1979), Proc. Sympos.
Pure Math., vol. 37, Amer. Math. Soc., Providence, R.I., 1980, pp. 175–181. MR604576
[8] M. Fried, Exposition on an arithmetic-group theoretic connection via Riemann’s existence
theorem, The Santa Cruz Conference on Finite Groups (Univ. California, Santa Cruz, Calif.,
1979), Proc. Sympos. Pure Math., vol. 37, Amer. Math. Soc., Providence, R.I., 1980, pp. 571–
602. MR604636
[9] K. O. Geddes, S. R. Czapor, and G. Labahn, Algorithms for computer algebra, Kluwer Aca-
demic Publishers, Boston, MA, 1992. MR1256483
[10] J. Mashreghi, Representation theorems in Hardy spaces, London Mathematical Society Stu-
dent Texts, vol. 74, Cambridge University Press, Cambridge, 2009. MR2500010
[11] R. B. Melrose, The Atiyah-Patodi-Singer index theorem, Research Notes in Mathematics,
vol. 4, A K Peters, Ltd., Wellesley, MA, 1993.
http://www.maths.ed.ac.uk/~aar/papers/melrose.pdf. MR1348401
[12] Norwegian Academy of Science and Letters, 2004 Abel Prize Citation,
http://www.abelprize.no/c53865/binfil/download.php?tid=53806
[13] R. H. Risch, The problem of integration in finite terms, Trans. Amer. Math. Soc. 139 (1969),
167–189, DOI 10.2307/1995313. MR0237477
[14] R. H. Risch, The solution of the problem of integration in finite terms, Bull. Amer. Math.
Soc. 76 (1970), 605–608, DOI 10.1090/S0002-9904-1970-12454-5. MR0269635
[15] J. Rognes, On the Atiyah–Singer index theorem,
http://www.abelprize.no/c53865/binfil/download.php?tid=53804.
[16] R. P. Stanley, Enumerative combinatorics. Vol. 2, with a foreword by Gian-Carlo Rota and
appendix 1 by Sergey Fomin, Cambridge Studies in Advanced Mathematics, vol. 62, Cam-
bridge University Press, Cambridge, 1999. MR1676282
[17] R. P. Stanley, Enumerative combinatorics. Vol. 2, with a foreword by Gian-Carlo Rota and
appendix 1 by Sergey Fomin, Cambridge Studies in Advanced Mathematics, vol. 62, Cam-
bridge University Press, Cambridge, 1999. MR1676282
1969
Erdős Numbers
Introduction
The most prolific mathematical researcher of the 20th century was Paul Erdős.
He wrote over 1,500 articles with around 500 different coauthors. Mathematicians
started to think of him as the center of the research collaboration world. In 1969
Casper Goffman wrote a whimsical article in which he described a measure of
distance from Erdős in terms of mathematical collaborations [6]:
Currently, over 11,000 people have Erdős number 2 and nearly every practicing
mathematician has Erdős number 6 or less [9]. Most nonmathematicians have
Erdős number ∞ (simply because most people have never coauthored a research
article of any type), although there are many exceptions since researchers in physics,
economics, computer science, and other fields can often be linked to Erdős in a finite
number of steps.
From a mathematical point of view, Erdős numbers are distances in the grand
“collaboration graph.” The vertices of this graph are researchers and an edge is
present between every pair of researchers who have published together. A small
portion of this graph is depicted in Figure 1. The collaboration graph is just one
example of a large social network; other examples include Facebook and Twitter.
Research into the structure and dynamics of social networks has reached a feverish
pace in the past several years [10]. Much of that work deals with how graphs can
evolve randomly, a topic pioneered by Erdős and his collaborators decades ago [4].
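Since Erdős numbers are graph distances, they can be computed by breadth-first search from the vertex for Erdős. The sketch below uses a tiny made-up collaboration graph: every name other than "Erdős" is a placeholder, and the edge list is hypothetical rather than real bibliographic data.

```python
from collections import deque


def collaboration_distances(edges, source):
    """Breadth-first search distances from source in an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    distances = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for coauthor in graph.get(person, ()):
            if coauthor not in distances:
                distances[coauthor] = distances[person] + 1
                queue.append(coauthor)
    return distances


if __name__ == "__main__":
    # Hypothetical coauthorship pairs, for illustration only.
    toy_edges = [
        ("Erdős", "A"), ("A", "B"), ("B", "C"),
        ("Erdős", "D"), ("D", "C"), ("E", "F"),
    ]
    numbers = collaboration_distances(toy_edges, "Erdős")
    for person in "ABCDEF":
        print(person, numbers.get(person, float("inf")))
```

Researchers missing from the returned dictionary, such as E and F here, lie in a different connected component and so have Erdős number ∞.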
Erdős himself wrote a short paper in 1972 in which it is shown that the more
restrictive collaboration graph, in which only two-author papers are considered,
cannot be drawn in the plane without its edges crossing [2]. To be more specific,
he attributed the observation to Andrzej Schinzel (1937– ):
[Figure 1. A small portion of the collaboration graph; the vertex labels that survive
are Christopher N. B. Hammond and Harold S. Shapiro.]
and Schinzel; the black ones are Davenport, Erdős, Lewis; the simple
task of finding the 9 relevant papers can be left to the reader.
1969: Comments
Erdős–Bacon numbers. A much more selective group consists of those who have a
finite Erdős–Bacon number. Your Erdős–Bacon number is the sum of your Erdős
number and your Bacon number. The Bacon number is similar to the Erdős number:
just replace “Paul Erdős” with “Kevin Bacon” and “research papers” with
“movie roles.” If you have never appeared in a movie, then your Bacon number is
infinite. Thus, it is hard to have a finite Erdős–Bacon number.
The first named author’s former senior thesis student, Vincent Selhorst-Jones,
has one of the lowest Erdős–Bacon numbers (5) on record; see Figure 2. He appeared
in American Sniper (2014) with Joel Lambert, who appeared in Patriots Day (2016)
with Kevin Bacon. Thus, Vincent has Bacon number 2 (since he never appeared in a
movie with Kevin Bacon, he does not have a 1). As an undergraduate mathematics
major at Pomona College, Vincent coauthored a paper [7] with the first named
author, who has Erdős number 2; see Figure 1. Thus, Vincent has Erdős–Bacon
number 5.
Book to figure out what happened. However, Peter Duren (1935– ), who was a
professor at the University of Michigan in 1962, tells us:
The Secretary of the Math Club acted as guardian of the book, and
both locals and visitors were invited to look through it. Unfortunately,
the book was lost during the Christmas break of 1962–63, on the streets
of Chicago. The man then serving as Secretary of the Math Club
had carried the book (or books) with him when he drove to Chicago
and had left it in his car overnight. Someone broke into the car and
set it on fire, and the Math Club book was lost (among other items,
including the car). . . . What Paul Erdős called the Ann Arbor Problem
Book must have been the Math Club book. But his reference can’t be
checked, since the original entries for December 1962 no longer exist.
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 3rd ed., including illustrations by Karl
H. Hofmann, Springer-Verlag, Berlin, 2004. MR2014872
[2] P. Erdős, Mathematical Notes: On the Fundamental Problem of Mathematics, Amer. Math.
Monthly 79 (1972), no. 2, 149–150, DOI 10.2307/2316535. MR1536622
[3] P. Erdős, An interpolation problem associated with the continuum hypothesis, Michigan Math.
J. 11 (1964), 9–10. MR0168482
[4] P. Erdős and A. Rényi, On the evolution of random graphs (English, with Russian summary),
Magyar Tud. Akad. Mat. Kutató Int. Közl. 5 (1960), 17–61.
http://www.renyi.hu/~p_erdos/1961-15.pdf. MR0125031
[5] P. Erdős, A. Rényi, and V. T. Sós, On a problem of graph theory, Studia Sci. Math. Hungar.
1 (1966), 215–235. MR0223262
[6] C. Goffman, Mathematical Notes: And What Is Your Erdős Number?, Amer. Math. Monthly
76 (1969), no. 7, 791, DOI 10.2307/2317868. MR1535523
[7] S. R. Garcia, V. Selhorst-Jones, D. E. Poore, and N. Simon, Quotient sets and
Diophantine equations, Amer. Math. Monthly 118 (2011), no. 8, 704–711, DOI
10.4169/amer.math.monthly.118.08.704. MR2843990
[8] S. R. Garcia and A. L. Shoemaker, Wetzel’s problem, Paul Erdős, and the continuum hy-
pothesis: a mathematical mystery, Notices Amer. Math. Soc. 62 (2015), no. 3, 243-247 (in
Part II of the Erdős retrospective).
[9] J. W. Grossman, The Erdős Number Project, www.oakland.edu/enp.
[10] M. Newman, A.-L. Barabási, and D. J. Watts (eds.), The structure and dynamics of net-
works, Princeton Studies in Complexity, Princeton University Press, Princeton, NJ, 2006.
MR2352222
1970
Introduction
A Diophantine equation is an equation of the form
\[ p(x_1, x_2, \ldots, x_n) = 0, \tag{1970.1} \]
in which p is a polynomial with integer coefficients and only integer solutions are
sought. Such equations have intrigued mathematicians from the dawn of the subject
to the present day. Here are just a few well-known examples.
An early example arises in the Pythagorean theorem, which asserts that
\[ a^2 + b^2 = c^2 \]
for a right triangle with sides a and b and hypotenuse c. Since this relationship
can be rewritten as $a^2 + b^2 - c^2 = 0$, it is of the form (1970.1). As evidence in
favor of the old dictum that one should not trust a scarecrow whose certification
comes from an unscrupulous degree mill, the theorem is apparently contradicted by
the nonsense the scarecrow utters upon receiving his Th.D. (Doctor of Thinkology)
diploma in The Wizard of Oz :
The sum of the square roots of any two sides of an isosceles triangle is
equal to the square root of the remaining side. [8]
Fictional scarecrows are not alone in botching the Pythagorean theorem: Major
League Baseball messed it up as well (see the comments for the 1971 entry). See
the comments for this year for a proof of the theorem.
Another famous Diophantine equation is the Fermat equation
\[ x^n + y^n = z^n, \]
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, (1970.2)
42 = 34 + 8,
221 = 144 + 55 + 21 + 1, and
1,701 = 1,597 + 89 + 13 + 2.
42 = 34 + 5 + 3
= 21 + 13 + 8
= 21 + 13 + 5 + 3,
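The decompositions displayed above that use no two consecutive Fibonacci numbers (42 = 34 + 8, 221 = 144 + 55 + 21 + 1, and 1,701 = 1,597 + 89 + 13 + 2) can be produced by a greedy rule: repeatedly subtract the largest Fibonacci number that fits. The sketch below assumes the list 1, 2, 3, 5, 8, ... (each value once, without the leading 0 and repeated 1); the function name `zeckendorf` is illustrative.

```python
def zeckendorf(n):
    """Greedy decomposition of n >= 1 into nonconsecutive Fibonacci numbers."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        if f <= n:
            parts.append(f)
            n -= f
    return parts


if __name__ == "__main__":
    for n in (42, 221, 1701):
        print(n, zeckendorf(n))
```

The greedy choice never selects two consecutive Fibonacci numbers: after subtracting the largest F that fits, the remainder is smaller than the Fibonacci number just below F.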
\[ x_1 + x_2 + \cdots + x_P = C, \qquad x_1, x_2, \ldots, x_P \ge 0. \]
1970: Comments
Statistical properties of Zeckendorf decompositions. The combinatorial
interpretation suggested by the problem can be used not only to prove Zeckendorf’s
theorem, but also to obtain statistical results about Zeckendorf decompositions. For
example, if we look at all integers in $[F_n, F_{n+1})$, then the number of summands in
the Zeckendorf decomposition becomes normally distributed as n → ∞ [4]. As a
consequence, one can obtain the following curious results of Lekkerkerker [3]. The
average number of summands used to represent integers in $[F_n, F_{n+1})$ is
\[ \frac{5 - \sqrt{5}}{10}\, n - \frac{2}{5} = \frac{1}{1 + \phi^2}\, n - \frac{2}{5} \]
and the variance in the number of summands is
\[ \frac{1}{5\sqrt{5}}\, n - \frac{2}{25} = \frac{\phi}{5(\phi + 2)}\, n - \frac{2}{25}, \]
in which
\[ \phi = \frac{1}{2}(1 + \sqrt{5}) = 1.618\ldots \]
denotes the golden ratio. The appearance of the golden ratio is not surprising in
light of Binet’s formula; see the comments for the 2002 entry. For another angle on
statistical properties of the Fibonacci numbers, see the 1938 entry.
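The linear growth of the average can be observed empirically. The sketch below computes the mean number of summands over $[F_n, F_{n+1})$ using the indexing $F_0 = 0$, $F_1 = 1$, and compares the increase from one n to the next with the leading coefficient $(5 - \sqrt{5})/10 \approx 0.2764$. Only the slope is examined: the additive constant depends on indexing and counting conventions, so it is deliberately not tested. Function names are illustrative, and greedy decomposition is assumed to produce the Zeckendorf representation.

```python
from math import sqrt

LEADING_COEFFICIENT = (5 - sqrt(5)) / 10   # equals 1 / (1 + phi^2)


def fibonacci(n):
    """F_n with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def summand_count(m):
    """Number of summands in the greedy (Zeckendorf) decomposition of m >= 1."""
    fibs = [1, 2]
    while fibs[-1] <= m:
        fibs.append(fibs[-1] + fibs[-2])
    count = 0
    for f in reversed(fibs):
        if f <= m:
            m -= f
            count += 1
    return count


def mean_summands(n):
    """Average number of summands over the integers in [F_n, F_{n+1}), n >= 2."""
    lo, hi = fibonacci(n), fibonacci(n + 1)
    return sum(summand_count(m) for m in range(lo, hi)) / (hi - lo)


if __name__ == "__main__":
    for n in range(10, 22):
        step = mean_summands(n + 1) - mean_summands(n)
        print(n, round(mean_summands(n), 4), round(step, 4))
    print("leading coefficient:", LEADING_COEFFICIENT)
```

The successive differences settle quickly near 0.2764, in line with the coefficient of n in the average.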
A proof of the Pythagorean theorem. If you are reading this book, chances
are that you have taken a good number of sophisticated mathematics courses. How-
ever, a surprising number of mathematics majors cannot prove the Pythagorean
theorem off the top of their heads! We shall remedy that here; see Figure 2 for
an elegant “proof by picture.” There are now hundreds of proofs known. Even
Figure 2. (a) The area of the large square equals that of the two small squares ($a^2 + b^2$)
plus that of the four triangles. (b) The area of the large square equals that of the
central square ($c^2$) plus that of the four triangles.
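The two panel captions combine into a short computation. As a sketch, assume (as the picture suggests) that the large square has side a + b in both panels and that each of the four right triangles has legs a and b, hence area ab/2:

```latex
\begin{align*}
(a+b)^2 &= a^2 + b^2 + 4 \cdot \tfrac{1}{2}ab && \text{(panel (a))},\\
(a+b)^2 &= c^2 + 4 \cdot \tfrac{1}{2}ab && \text{(panel (b))}.
\end{align*}
% Equating the right-hand sides and cancelling the common term 2ab:
\[ a^2 + b^2 = c^2. \]
```

Both panels dissect the same square, so the four triangles account for the same area 2ab in each, and what remains must agree.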
Bibliography
[1] J. A. Garfield, Pons Asinorum, The New England Journal of Education 3 (1876), no. 14, 161.
[2] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–190,
DOI 10.1007/BF01206605. http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf. MR1512272
[3] C. G. Lekkerkerker, Voorstelling van natuurlyke getallen door een som van getallen van
Fibonacci, Simon Stevin 29 (1952), 190–195.
[4] M. Koloğlu, G. S. Kopp, S. J. Miller, and Y. Wang, On the number of summands in Zeckendorf
decompositions, Fibonacci Quart. 49 (2011), no. 2, 116–130. MR2801798
[5] Ju. V. Matijasevič, The Diophantineness of enumerable sets (Russian), Dokl. Akad. Nauk
SSSR 191 (1970), 279–282. MR0258744
[6] Y. Matijasevich, My collaboration with Julia Robinson, Math. Intelligencer 14 (1992), no. 4,
38–45, DOI 10.1007/BF03024472. MR1188142
[7] E. Maor, The Pythagorean theorem: A 4,000-year history, Princeton University Press,
Princeton, NJ, 2007. MR2316578
[8] Scarecrow (from the Wizard of Oz), https://www.youtube.com/watch?v=uCOxU2rKLas.
1971
Introduction
The Society for American Baseball Research (SABR), founded in Cooperstown,
New York, by Bob Davids (1926–2002) in 1971, has many objectives, one of which
is to encourage and aid the application of mathematics and statistics to the analysis
of baseball. The term sabermetrics, derived from the acronym SABR, refers to the
statistical study of baseball (usually with the aim of improving a team’s perfor-
mance). Sabermetricians have created an alphabet soup of acronyms to describe
new metrics for measuring player performance (VORP, WAR, OPS, and so forth).
Other sports have since followed baseball’s lead. For example, exotic acronyms
such as TS%, PER, PPP, USG%, and APM are now bandied about on basketball
websites. The current dominance of the NBA’s Golden State Warriors is often
partly attributed to their wholehearted embrace of data analytics.
It is important to know what to measure. For example, walks were originally
viewed as errors by the pitcher and not a positive event by the batter. This led
to an enormous undervaluation of walks, now remedied by the consideration of
on-base percentage. Since the annual revenues in Major League Baseball (MLB)
and other professional sports are measured in the billions, there is a lot at stake.
A team that has a better understanding of which statistics truly matter can as-
semble a better team for less money. This can translate into World Series rings
and increased revenue. Most teams now have sabermetricians helping with player
selection and strategy. Moneyball [6], by Michael Lewis (1960– ), is an excellent
popular account of how the Oakland A’s applied these principles and, with a rela-
tively small budget, fielded competitive teams that routinely reached the playoffs.
See also [4] for applications of mathematics in sports.
1971: Comments
Predicting unlikely events. The importance of the chosen problem extends
far beyond baseball: how do we estimate the probability of an unlikely event? One
approach is through simulation; see the 1946 entry on the Monte Carlo method.
However, we need the ability to run many trials and gather a lot of data. For Monte
Carlo-type methods to be useful in baseball, we would need to be able to simulate
games accurately. Such programs do exist and they often use Markov chains; see
the 1953 entry. Consult [2, 9] for an introduction to Markov chains and [1, 3, 8] for
some applications to baseball.
Another approach is to count how many situations have existed in which the
desired event could have occurred and how many of these situations led to the
outcome. Such an approach does an excellent job for events that occur frequently,
such as hits or stolen bases, or even coming back to win after being down by four
runs after six innings. It is much harder to apply this method if there are few
occurrences.
In a playoff matchup, two teams compete in a best-of-seven series; this means
that the first team to win four games advances. Prior to 2004 (when the Boston
Red Sox achieved the feat), no team in Major League Baseball had ever come back
to win a series after trailing 3-0. However, such opportunities for an epic comeback
only arose 24 times (as of January 1, 2018, it has happened only 34 times). If each
team has an equal chance of winning a game, then we should expect the team down
3-0 to complete the comeback one out of every sixteen times. Of course, it is too
simplistic to think that each team has an equal chance: perhaps the team that is
up 3-0 is just much better than the other team.
In Figure 1, we plot the probability of having no teams, at most one team, and
at most two teams come back to win a best-of-seven series after being down 3-0 if
there are n teams in that situation. There is an enormous difference if we drop the
hypothesis that each team in a series is equally likely to win any given game. If we
assume that the losing team has only a 40% chance of winning each game, then the
number of teams expected to complete an epic comeback drops dramatically.
To compute the probabilities in Figure 1, we first find the chance that one team
comes back after being down 3-0 in games. Assuming they win each individual
game with probability $p$, the chance they win the next four is just $p^4$. Thus, the
probability they do not come back is $1 - p^4$. If there are $n$ teams that find themselves
in a 3-0 hole, the probability that none come back is just $(1 - p^4)^n$, while exactly
one team comes back with probability
\[ \binom{n}{1} (p^4)(1 - p^4)^{n-1} \]
and exactly two happens with probability
\[ \binom{n}{2} (p^4)^2 (1 - p^4)^{n-2}. \]
To get the probabilities of at most 0, 1, or 2 teams winning a series we just sum
the corresponding probabilities.
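The binomial sums just described are easy to tabulate. The following Python sketch is only an illustration: the function name prob_at_most is hypothetical, and the choice n = 34 simply reuses the count of 3-0 deficits quoted above.

```python
from math import comb

def prob_at_most(k, n, p):
    """Probability that at most k of n teams trailing 3-0 complete the comeback."""
    s = p ** 4  # chance that one trailing team wins four straight games
    return sum(comb(n, j) * s**j * (1 - s) ** (n - j) for j in range(k + 1))

# Compare evenly matched teams (p = 0.5) with a weaker trailing team (p = 0.4).
for p in (0.5, 0.4):
    print(p, [round(prob_at_most(k, 34, p), 4) for k in range(3)])
```

Lowering p from 0.5 to 0.4 shrinks the single-series comeback chance from 1/16 to about 1/39, which is why the curves in the two scenarios differ so much.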
Which team wins? What is the probability that one team beats another?
The goal is to obtain a formula that allows you to assess the contributions of your
players to winning. Such knowledge can then be used to determine where you need
to build. Is it more valuable to improve your offense or your pitching? How much
should you pay for a hitter that is a little bit better than your current player?
More generally, the answer is a result of general techniques that can be applied to
a variety of problems.
One of the most commonly used formulas is the Pythagorean won-loss formula,
due to Bill James (1949– ), which dates back to the 1970s. To give a sense of
its value, it is one of the few statistics often used in scoreboards or expanded
scoreboards online (frequently denoted X-WL for expected won-loss). If RS denotes
the average number of runs scored by a team per game, and RA the average number
of runs they allow, James postulated that a good approximation to their winning
percentage (number of wins divided by number of games) would be
\[ \frac{RS^2}{RS^2 + RA^2}. \]
The exponent 2 was chosen to simplify the computations and led to the name since
the sum of the squares in the denominator looks similar to the sum of squares in
the Pythagorean theorem. Nowadays the 2 is replaced by a parameter γ, whose
best fit value in baseball is close to, but a little less than, 2. In 2006, the second
named author provided a theoretical justification for why this formula should be
an excellent predictor. He used elementary probability theory to model the runs
scored and allowed as being drawn from independent Weibull distributions; see
[5, 7]. One of the great values of the Pythagorean expectation is that it allows a
team to estimate the benefit it would receive from adding a hitter who generates
10 more runs versus signing a pitcher who allows 10 fewer.
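A short Python sketch of the Pythagorean expectation makes the hitter-versus-pitcher comparison concrete. The function name and the run totals (700 scored, 700 allowed) are hypothetical values chosen only for illustration; because the formula depends only on the ratio RS/RA, season totals can stand in for per-game averages.

```python
def pythagorean_win_pct(rs, ra, gamma=2.0):
    """Expected winning percentage from runs scored (rs) and runs allowed (ra)."""
    return rs**gamma / (rs**gamma + ra**gamma)

baseline = pythagorean_win_pct(700, 700)      # an average team
with_hitter = pythagorean_win_pct(710, 700)   # score 10 more runs
with_pitcher = pythagorean_win_pct(700, 690)  # allow 10 fewer runs

print(round(baseline, 4), round(with_hitter, 4), round(with_pitcher, 4))
```

Under these assumed numbers the pitcher helps very slightly more than the hitter, and the gap changes with the team's starting run totals and with the exponent gamma.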
[Figure: home plate, with edges labeled 17, 8.5, 8.5, 12, and 12.]
Consequently, home base does not exist, from which it follows that baseball does
not exist.
Bibliography
[1] J. Beamer, Introducing Markov chains, The Hardball Times, November 26, 2007. https://
www.fangraphs.com/tht/introducing-markov-chains/.
[2] E. Behrends, Introduction to Markov chains: With special emphasis on rapid mixing, Advanced
Lectures in Mathematics, Friedr. Vieweg & Sohn, Braunschweig, 2000. MR1730905
[3] B. Bukiet, E. R. Harold, and J. L. Palacios, Markov Chain Approach to Baseball, Operations
Research 45 (1997), 14–23. https://pubsonline.informs.org/doi/abs/10.1287/opre.45.1.
14.
[4] J. A. Gallian (ed.), Mathematics and sports, The Dolciani Mathematical Expositions, vol. 43,
Mathematical Association of America, Washington, DC, 2010. MR2766424
[5] S. J. Miller, T. Corcoran, J. Gossels, V. Luo, and J. Porfilio, Pythagoras at the
bat, Social networks and the economics of sports, Springer, Cham, 2014, pp. 89–113,
DOI 10.1007/978-3-319-08440-4_6. https://web.williams.edu/Mathematics/sjmiller/public_html/math/papers/MillerEtAl_Pythagoras.pdf. MR3307909
[6] M. Lewis, Moneyball: The Art of Winning an Unfair Game, W. W. Norton & Company, 2004.
[7] S. J. Miller, A derivation of James’ Pythagorean projection, By The Numbers – The Newsletter
of the SABR Statistical Analysis Committee 16 (February 2006), no. 1, 17–22 and Chance Mag-
azine 20 (Winter 2007), no. 1, 40–48; expanded version available at https://web.williams.
edu/Mathematics/sjmiller/public_html/math/papers/PythagWonLoss_Paper.pdf.
[8] M. D. Pankin, Baseball as a Markov Chain, http://www.pankin.com/markov/intro.htm.
[9] D. Stansbury, A Brief Introduction to Markov Chains, The Clever Machine: Topics in Compu-
tational Neuroscience & Machine Learning, September 24, 2012. https://theclevermachine.
wordpress.com/2012/09/24/a-brief-introduction-to-markov-chains/.
1972
Zaremba’s Conjecture
Introduction
What is the best way to numerically integrate a function of several variables?
One method is to compute the average value of the function over a large number of
sample points. The 1946 entry described the Monte Carlo method, in which sample
points are selected at random. However, it is often desirable to use a deterministic
approach, that is, one that does not depend upon random choices.
Suppose for the sake of simplicity that we wish to numerically integrate a
real-valued smooth function of two variables over the unit square $[0, 1]^2$ in $\mathbb{R}^2$. In 1971,
Stanislaw Zaremba (1903–1990) suggested using sample points
\[ \left\{ \left( \frac{n}{q}, \frac{np}{q} \right) \pmod{1} : 1 \le n \le q \right\}, \]
in which gcd(p, q) = 1. In other words, he considered the orbit of (1/q, p/q) under
repeated addition modulo 1; this may remind you of Figure 1 in the 1961 entry.
Zaremba noticed that the quality of the sampling depends upon how small the
partial quotients $a_0, a_1, \ldots, a_k$ are in the (finite) continued fraction expansion
\[ \frac{p}{q} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}; \]
see Figure 1 (for more information about continued fractions, see [3,5] and the 1931,
1934, and 1955 entries). There is no loss of generality in assuming that $1 \le p < q$,
in which case $a_0 = 0$ and we write
\[ \frac{p}{q} = [a_1, a_2, \ldots, a_k]. \]
For a given $q$, can we select a $p$ so that $a_1, a_2, \ldots, a_k$ are as small as possible? In
1972, Zaremba conjectured that this “height” of the partial quotients can be bounded by an
absolute constant, for any choice of sample size $q$ [6]. In particular, he conjectured that
one can always select $p$ so that $\max\{a_1, a_2, \ldots, a_k\} \le 5$; see Table 1. Zaremba’s
conjecture is our problem for this year.
Figure 1. (a) 1191/2383 = [2, 1191] yields a poor sampling of the unit square. (b) 1678/2383 = [1, 2, 2, 1, 1, 1, 2, 2, 2, 2, 2] yields a good sampling of the unit square.
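The partial quotients of p/q come straight from the Euclidean algorithm, so the conjecture is easy to test for small q. The Python sketch below is illustrative only: the function names partial_quotients and best_numerator are hypothetical, and the brute-force search over all p is practical only for small denominators.

```python
from math import gcd

def partial_quotients(p, q):
    """Partial quotients [a1, ..., ak] of p/q for 1 <= p < q (so a0 = 0)."""
    quotients = []
    a, b = q, p
    while b:
        quotients.append(a // b)  # next quotient from the Euclidean algorithm
        a, b = b, a % b
    return quotients

def best_numerator(q):
    """A numerator p coprime to q that minimizes the largest partial quotient."""
    candidates = [p for p in range(1, q) if gcd(p, q) == 1]
    return min(candidates, key=lambda p: max(partial_quotients(p, q)))

print(partial_quotients(1191, 2383))  # the poor sampling from Figure 1(a)
print(partial_quotients(1678, 2383))  # the good sampling from Figure 1(b)
p = best_numerator(101)
print(p, partial_quotients(p, 101))
```

When several numerators tie for the smallest maximum, best_numerator returns the first one it finds, which need not be the numerator listed in Table 1.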
1972: Comments
A continued fraction expansion for e. We follow [4, Sect. 3.8] and derive
the beautiful continued fraction expansion¹
\[ e = 2 + \cfrac{2}{2 + \cfrac{3}{3 + \cfrac{4}{4 + \cfrac{5}{5 + \cdots}}}} \tag{1972.1} \]
for Euler’s constant $e = 2.71828\ldots$. First, substitute $x = -1$ in the power series
expansion
\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots \]
and obtain
\[ \frac{1}{e} = 1 - 1 + \frac{1}{2!} - \frac{1}{3!} + \cdots, \]
1 Since the numerators in (1972.1) are not all 1, this is not a “simple” continued fraction of
Table 1.
p/q     [a_1, ..., a_k]     p/q     [a_1, ..., a_k]     p/q     [a_1, ..., a_k]     p/q     [a_1, ..., a_k]
1/2     2                   5/27    5, 2, 2             9/52    5, 1, 3, 2          16/77   4, 1, 4, 3
1/3     3                   5/28    5, 1, 1, 2          10/53   5, 3, 3             17/78   4, 1, 1, 2, 3
1/4     4                   5/29    5, 1, 4             17/54   3, 5, 1, 2          14/79   5, 1, 1, 1, 4
1/5     5                   7/30    4, 3, 2             12/55   4, 1, 1, 2, 2       17/80   4, 1, 2, 2, 2
5/6     1, 5                7/31    4, 2, 3             13/56   4, 3, 4             14/81   5, 1, 3, 1, 2
2/7     3, 2                7/32    4, 1, 1, 3          10/57   5, 1, 2, 3          17/82   4, 1, 4, 1, 2
3/8     2, 1, 2             7/33    4, 1, 2, 2          11/58   5, 3, 1, 2          16/83   5, 5, 3
2/9     4, 2                9/34    3, 1, 3, 2          11/59   5, 2, 1, 3          19/84   4, 2, 2, 1, 2
3/10    3, 3                6/35    5, 1, 5             11/60   5, 2, 5             16/85   5, 3, 5
2/11    5, 2                11/36   3, 3, 1, 2          11/61   5, 1, 1, 5          15/86   5, 1, 2, 1, 3
5/12    2, 2, 2             7/37    5, 3, 2             11/62   5, 1, 1, 1, 3       16/87   5, 2, 3, 2
3/13    4, 3                7/38    5, 2, 3             11/63   5, 1, 2, 1, 2       17/88   5, 5, 1, 2
3/14    4, 1, 2             7/39    5, 1, 1, 3          11/64   5, 1, 4, 2          16/89   5, 1, 1, 3, 2
4/15    3, 1, 3             7/40    5, 1, 2, 2          12/65   5, 2, 2, 2          17/90   5, 3, 2, 2
3/16    5, 3                9/41    4, 1, 1, 4          25/66   2, 1, 1, 1, 3, 2    16/91   5, 1, 2, 5
3/17    5, 1, 2             11/42   3, 1, 4, 2          12/67   5, 1, 1, 2, 2       17/92   5, 2, 2, 3
5/18    3, 1, 1, 2          8/43    5, 2, 1, 2          13/68   5, 4, 3             16/93   5, 1, 4, 3
4/19    4, 1, 3             13/44   3, 2, 1, 1, 2       13/69   5, 3, 4             33/94   2, 1, 5, 1, 1, 2
9/20    2, 4, 2             8/45    5, 1, 1, 1, 2       13/70   5, 2, 1, 1, 2       17/95   5, 1, 1, 2, 3
4/21    5, 4                11/46   4, 5, 2             15/71   4, 1, 2, 1, 3       17/96   5, 1, 1, 1, 5
5/22    4, 2, 2             9/47    5, 4, 2             17/72   4, 4, 4             17/97   5, 1, 2, 2, 2
4/23    5, 1, 3             11/48   4, 2, 1, 3          13/73   5, 1, 1, 1, 1, 2    17/98   5, 1, 3, 4
5/24    4, 1, 4             9/49    5, 2, 4             13/74   5, 1, 2, 4          17/99   5, 1, 4, 1, 2
7/25    3, 1, 1, 3          9/50    5, 1, 1, 4          13/75   5, 1, 3, 3          19/100  5, 3, 1, 4
5/26    5, 5                11/51   4, 1, 1, 1, 3       13/76   5, 1, 5, 2          18/101  5, 1, 1, 1, 1, 3
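The expansion (1972.1) for e can be checked numerically by truncating it at a finite depth and evaluating from the innermost term outward. This Python sketch is an illustration rather than part of the derivation; the function name and the truncation rule (replace the tail below level `depth` by the integer `depth`) are assumptions made here.

```python
import math

def e_from_continued_fraction(depth):
    """Evaluate 2 + 2/(2 + 3/(3 + ... )) truncated at the given depth (depth >= 2)."""
    tail = float(depth)  # innermost denominator after truncation
    for k in range(depth - 1, 1, -1):
        tail = k + (k + 1) / tail
    return 2 + 2 / tail

for depth in (2, 3, 5, 10, 25):
    approx = e_from_continued_fraction(depth)
    print(depth, approx, abs(approx - math.e))
```

The first few truncations give 3, 8/3, and so on, and the error shrinks rapidly as the depth grows.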
Status of the problem. In 2011, Alex Kontorovich (1980– ) and Jean Bourgain
almost proved Zaremba’s conjecture [1]. To be more specific, they showed
that
\[ \lim_{n\to\infty} \frac{|D_{50} \cap \{1, 2, \ldots, n\}|}{n} = 1. \]
In other words, almost all natural numbers appear as the denominator of a finite
continued fraction whose partial quotients are bounded by 50. In 2015, the same
result was established with $D_5$ in place of $D_{50}$ [2]. Thus, Zaremba’s original conjecture
with $A = 5$ is now known to be “almost” true in the sense that those natural
numbers that do not belong to $D_5$ have density zero.
Bibliography
[1] J. Bourgain and A. Kontorovich, On Zaremba’s conjecture, Ann. of Math. (2) 180 (2014), no. 1,
137–196, DOI 10.4007/annals.2014.180.1.3. https://arxiv.org/pdf/1107.3776. MR3194813
[2] S. Huang, An improvement to Zaremba’s conjecture, Geom. Funct. Anal. 25 (2015), no. 3,
860–914, DOI 10.1007/s00039-015-0327-6. MR3361774
[3] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[4] D. Perkins, ϕ, π, e & i, MAA Press, 2017.
[5] A. J. van der Poorten, Notes on continued fractions and recurrence sequences, Number theory
and cryptography (Sydney, 1989), London Math. Soc. Lecture Note Ser., vol. 154, Cambridge
Univ. Press, Cambridge, 1990, pp. 86–97. MR1055401
[6] S. K. Zaremba, La méthode des “bons treillis” pour le calcul des intégrales multiples (French,
with English summary), Applications of number theory to numerical analysis (Proc. Sym-
pos., Univ. Montreal, Montreal, Que., 1971), Academic Press, New York, 1972, pp. 39–119.
MR0343530
1973
Transcendence of e Centennial
Introduction
Let α be a complex number. If there exists a polynomial p(x) of positive degree
with integer coefficients such that p(α) = 0, then α is an algebraic number. If no
such polynomial exists, then α is transcendental. Thus, all rational numbers are
algebraic, as are $\sqrt{2}$, $i = \sqrt{-1}$, and
"
√ √
5 + 3 + 1 + 2.
However, not every algebraic number can be written in terms of integers, rational
operations, and root extractions. Students of Galois theory know that the Abel–
Ruffini theorem says that there is no formula analogous to the quadratic formula
that can provide the roots of every polynomial of degree five. For example, $x^5 - x - 1$
has roots that are algebraic but not expressible in terms of radicals.
It is often difficult to prove that a specific number is transcendental, although
we can quickly show that most real (or complex) numbers are transcendental. Georg
Cantor proved that the set of algebraic numbers is countable (see the footnote con-
cerning this on p. 31 in the 1918 entry). Since the set of real numbers is uncountable
(see the 1918 and 1999 entries for proofs), it follows that real transcendental num-
bers exist and, moreover, that most real numbers are transcendental.
In 1844, Joseph Liouville proved a theorem that can be used to construct
specific transcendental numbers. For example, he proved that Liouville’s constant
\[ \lambda = \sum_{n=1}^{\infty} \frac{1}{10^{n!}} = 0.11000100000000000000000100000\ldots \]
is transcendental; see the comments for the 1935 entry for complete details. This did
not, however, shed any light on the status of the famous constants e and π. Charles
Hermite (1822–1901) established the transcendence of Euler’s constant (Figure 1)
e = 2.7182818284590452353602874713526624977572470936999 . . .
which is positive. Then use integration by parts and induction to show that there
are integers $a_n$ and $b_n$ such that
\[ I_n = a_n + b_n e, \qquad n = 0, 1, 2, \ldots. \]
Suppose toward a contradiction that $e = p/q$ for some natural numbers $p$ and $q$.
Then
\[ I_n = a_n + b_n \frac{p}{q} = \frac{a_n q + b_n p}{q} \ge \frac{1}{q} \]
since the numerator $a_n q + b_n p$ is a positive integer. On the other hand,
\[ \frac{1}{q} \le I_n = \int_0^1 x^n e^x \, dx \le e \int_0^1 x^n \, dx = \frac{e}{n+1} \to 0. \]
This contradiction implies that e is irrational.
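The integers $a_n$ and $b_n$ can be generated explicitly. The Python sketch below assumes the recurrence $I_n = e - n I_{n-1}$ with $I_0 = e - 1$, which is what one integration by parts gives for the integral displayed above; the function name is hypothetical, and floating-point cancellation limits the check to small n.

```python
import math

def integral_coefficients(n):
    """Integers (a_n, b_n) with I_n = a_n + b_n * e, using I_n = e - n * I_(n-1)."""
    a, b = -1, 1  # I_0 = e - 1
    for k in range(1, n + 1):
        a, b = -k * a, 1 - k * b
    return a, b

for n in range(8):
    a, b = integral_coefficients(n)
    value = a + b * math.e  # numerical value of I_n
    print(n, a, b, round(value, 6), round(math.e / (n + 1), 6))
```

Each printed value of $I_n$ stays positive yet falls below the bound $e/(n+1)$, which is the squeeze that the proof exploits.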
See the 1935 and 1955 entries, along with the comments for the 1918, 1934,
1938, and 1967 entries, for more information about algebraic and transcendental
numbers.
1 When a version of this entry was published in the Pi Mu Epsilon journal, the following prob-
lem was used: “Find a 1-to-1, increasing function f : [0, 1] → R such that f (x) is transcendental
for all x.” This problem has been moved to the 1955 entry.
1973: Comments
Transcendence of e. The following proof, which can be found in
[2, Thm. 12.45], involves a small amount of complex analysis, or at least some
familiarity with complex integration (see [4, Thm. 5.4.2] for another proof). If $f(x)$
is a polynomial with $\deg f = m$, then define
\[ I(z) = \int_0^z f(\zeta) e^{z-\zeta} \, d\zeta. \tag{1973.1} \]
Repeated integration by parts yields
\[ I(z) = e^z \sum_{j=0}^{m} f^{(j)}(0) - \sum_{j=0}^{m} f^{(j)}(z). \tag{1973.2} \]
Let $F(x)$ denote the polynomial obtained from $f$ by replacing each coefficient of $f$
with its absolute value. Since the inequality
\[ |e^{z-\zeta}| \le e^{|z-\zeta|} \le e^{|z|} \]
holds for $\zeta = tz$ with $t \in [0, 1]$, it follows from (1973.1) that
\[ |I(z)| \le |z| e^{|z|} F(|z|). \]
Suppose toward a contradiction that $e$ is algebraic. Then there are integers
$q_0, q_1, \ldots, q_n$ with $q_0 \ne 0$ and $\gcd(q_0, q_1, \ldots, q_n) = 1$ so that
\[ q_0 + q_1 e + q_2 e^2 + \cdots + q_n e^n = 0. \tag{1973.3} \]
Let
\[ f(x) = x^{p-1} (x-1)^p \cdots (x-n)^p, \tag{1973.4} \]
in which $p$ is a large prime number. Let $I(z)$ denote (1973.1) and let
\[ J = q_0 I(0) + q_1 I(1) + \cdots + q_n I(n). \tag{1973.5} \]
Then (1973.2) and (1973.3) ensure that
\[ J = -\sum_{j=0}^{m} \sum_{k=0}^{n} q_k f^{(j)}(k), \]
in which
\[ m = \deg f = (n+1)p - 1. \]
The definition (1973.4) tells us that $f^{(j)}(k) = 0$ if $j < p$ and $k > 0$ or if $j < p - 1$
and $k = 0$. Consequently $p!$ divides $f^{(j)}(k)$ for all $j, k$ except for $j = p - 1$ and
$k = 0$, in which case we have
\[ f^{(p-1)}(0) = (p-1)! \, (-1)^{np} (n!)^p. \]
It follows that $f^{(p-1)}(0)$ is a nonzero integer that is divisible by $(p-1)!$ but not $p!$
whenever $p > n$. Let $p > \max\{n, |q_0|\}$ so that $|J| \ge (p-1)!$. Since
\[ F(k) \le (2n)^m, \]
it follows from (1973.4) and (1973.5) that
\[ |J| \le |q_1| e F(1) + \cdots + |q_n| n e^n F(n) \le c^p, \]
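The divisibility claims about the derivatives of the polynomial (1973.4) can be verified by direct computation for small parameters. The following Python sketch uses plain integer coefficient lists; the helper names and the sample values n = 2, p = 5 are assumptions chosen for illustration only.

```python
from math import factorial

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def derivative(a):
    """Coefficient list of the derivative (the zero polynomial is [0])."""
    return [i * c for i, c in enumerate(a)][1:] or [0]

def evaluate(a, x):
    """Horner evaluation of the polynomial at the integer x."""
    result = 0
    for c in reversed(a):
        result = result * x + c
    return result

def hermite_polynomial(n, p):
    """Coefficients of f(x) = x^(p-1) (x-1)^p ... (x-n)^p from (1973.4)."""
    f = [0] * (p - 1) + [1]
    for k in range(1, n + 1):
        for _ in range(p):
            f = poly_mul(f, [-k, 1])
    return f

n, p = 2, 5  # sample values with p prime and p > n
f = hermite_polynomial(n, p)
m = len(f) - 1
derivs = [f]
for _ in range(m):
    derivs.append(derivative(derivs[-1]))

special = evaluate(derivs[p - 1], 0)
others_divisible = all(
    evaluate(derivs[j], k) % factorial(p) == 0
    for j in range(m + 1) for k in range(n + 1) if (j, k) != (p - 1, 0)
)
print(m, special, others_divisible)
```

For these sample values the exceptional derivative is divisible by (p − 1)! but not by p!, while every other derivative value is a multiple of p!.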
Bibliography
[1] E. B. Burger and R. Tubbs, Making transcendence transparent: An intuitive approach to
classical transcendental number theory, Springer-Verlag, New York, 2004. MR2077395
[2] B. Fine, A. Gaglione, A. Moldenhauer, G. Rosenberger, and D. Spellman, Algebra and number
theory: A selection of highlights, De Gruyter Textbook, De Gruyter, Berlin, 2017. MR3727130
[3] S. Lang, Algebra, 3rd ed., Springer-Verlag, 2002.
[4] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[5] Yu. V. Nesterenko, Modular functions and transcendence questions (Russian, with Russian
summary), Mat. Sb. 187 (1996), no. 9, 65–96, DOI 10.1070/SM1996v187n09ABEH000158;
English transl., Sb. Math. 187 (1996), no. 9, 1319–1348. MR1422383
[6] D. Richeson, The transcendence of e, Division by Zero blog, September 28, 2010. https://
divisbyzero.com/2010/09/28/the-transcendence-of-e/.
[7] R. Schwartz, Transcendence of e, online notes adopted from Section 5.2 of Herstein’s Topics
in Algebra, http://www.math.brown.edu/~res/M154/e.pdf.
1974
Rubik’s Cube
Introduction
In 1974, Ernő Rubik (1944– ) invented the Magic Cube (as it was initially called
in his native Hungary), a mechanical puzzle now known around the world as the
Rubik’s Cube [3]. It is easy to scramble the cube with just a few turns; figuring out
how to restore the six faces takes much more work (one solution is presented in the
comments below). Although the Rubik’s Cube has
43,252,003,274,489,856,000 = $2^{27} \times 3^{14} \times 5^3 \times 7^2 \times 11$
possible states, it can always be solved in 20 moves or fewer, a fact only established in
2010 [1]. At the first World Championships in 1982, Minh Thai (1966– ) of
the United States won with a best time of 22.95 seconds. The current world record
belongs to Feliks Zemdegs (1995– ) of Australia, who clocked in at 4.22 seconds [4].
The best average time over five solves is Zemdegs’s astounding 5.80 seconds.
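The number of states quoted above can be reproduced with a few lines of Python. The count used here is the standard one, which the text itself does not derive: 8 corners are permuted and twisted with the last twist forced, 12 edges are permuted and flipped with the last flip forced, and the two permutations must share the same parity. Treat that breakdown as an assumption of this sketch.

```python
from math import factorial

corner_arrangements = factorial(8) * 3**7   # positions times free corner twists
edge_arrangements = factorial(12) * 2**11   # positions times free edge flips
states = corner_arrangements * edge_arrangements // 2  # parity constraint

print(f"{states:,}")
print(states == 2**27 * 3**14 * 5**3 * 7**2 * 11)
```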
The mathematics of the Rubik’s Cube is inherently noncommutative in nature:
the order of operations matters. For example, fix an orientation of the cube, rotate
the front face by 90° clockwise, then rotate the right face by 90° clockwise:
[Figure: the cube before and after applying F and then R.]
Call these operations F and R, respectively. Now take a similarly oriented cube
and perform these steps in the reverse order:
[Figure: the cube before and after applying R and then F.]
subject to the natural relations imposed by the cube itself. For example, $U^4 =
D^4 = L^4 = R^4 = F^4 = B^4 = I$, in which $I$ denotes the identity element (that is, do
nothing) of the Rubik’s Cube group. This algebraically encapsulates the fact that
turning any face of the cube four times returns the cube to its original state. Other
relations are more subtle, such as $(R^2 U^2)^6 = I$ and $(R U^2 D^{-1} B D^{-1})^{1260} = I$.
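Relations like these can be checked by simulating the cube. The Python sketch below is one possible model, not the book's: it places the cube in coordinates with x pointing right, y pointing up, and z pointing toward the viewer, tracks each of the 54 stickers as a (position, outward normal) pair, and treats a clockwise quarter turn as a rotation of one layer. All names and the move notation ("R2" for a half turn, "R'" for an inverse turn) are assumptions of this sketch.

```python
from itertools import product

# face: (axis index, layer coordinate, clockwise quarter turn seen from outside that face)
TURNS = {
    "R": (0, 1, lambda v: (v[0], v[2], -v[1])),
    "L": (0, -1, lambda v: (v[0], -v[2], v[1])),
    "U": (1, 1, lambda v: (-v[2], v[1], v[0])),
    "D": (1, -1, lambda v: (v[2], v[1], -v[0])),
    "F": (2, 1, lambda v: (v[1], -v[0], v[2])),
    "B": (2, -1, lambda v: (-v[1], v[0], v[2])),
}

def solved():
    """One (position, outward normal) pair per sticker, 54 in total."""
    stickers = []
    for pos in product((-1, 0, 1), repeat=3):
        for axis in range(3):
            if pos[axis] != 0:
                normal = tuple(pos[axis] if i == axis else 0 for i in range(3))
                stickers.append((pos, normal))
    return tuple(stickers)

def turn(state, face):
    """Rotate every sticker currently lying in the layer of the given face."""
    axis, layer, rotate = TURNS[face]
    return tuple((rotate(pos), rotate(normal)) if pos[axis] == layer else (pos, normal)
                 for pos, normal in state)

def apply(state, word):
    """Apply moves left to right, e.g. "R U2 D'"."""
    for move in word.split():
        repeats = {"": 1, "2": 2, "'": 3}[move[1:]]
        for _ in range(repeats):
            state = turn(state, move[0])
    return state

def order(word):
    """Number of repetitions of the word needed to return to the solved state."""
    start = solved()
    state, count = apply(start, word), 1
    while state != start:
        state, count = apply(state, word), count + 1
    return count

print(apply(solved(), "F R") != apply(solved(), "R F"))  # F and R do not commute
print(order("R2 U2"), order("R U2 D' B D'"))
```

Because every sticker is tracked individually, two states compare equal only when each piece is back in its original place with its original orientation.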
1974: Comments
How to solve a cube. Before giving the solution to the centennial problem,
we might as well provide a solution to the Rubik’s Cube itself! Although the
following method is far from the fastest, it is relatively simple and relies only on
a small number of algorithms. The first named author of this book and economist
(and Pomona alum) Xan Vongsathorn have coached dozens of students through
their first solves with the method below. The second named author uses a similar
approach (online tutorials of his are available at [2]). Speed cubers know dozens of
additional algorithms and have different approaches to the cube entirely.
discussion, we do not attach these letters to particular colors or insist upon fixing
a certain orientation of the cube. In Figure 1 we call the orange face F because we
are holding the cube so that the orange face is in front of us. The green face is R
because it is on our right.
The letters U, D, L, R, F, B also describe turning a face 90° clockwise from the
perspective of someone looking at the face head on. For example, U means “turn
the U face clockwise (seen from above) by 90°” and D means “turn the D face
clockwise (from the perspective of someone looking at the bottom of the cube).”
We use $F^{-1}$ to refer to a counterclockwise quarter turn of the F face, and similarly
for the other faces. An algorithm is a specific sequence of turns, such as $F R F^{-1} R^{-1}$.
This algorithm asks us to execute $F$, then $R$, then $F^{-1}$, then $R^{-1}$.
The first step is to make a white cross on the U face:
[Figure: a good and a bad example of the white cross.]
Somewhere on the cube are four corners with a white sticker on them. You want
to get them on the white (U ) face without messing up the white cross. The two
other colors on the white corners will need to match their surroundings:
[Figure: good and bad examples of white corner placement.]
If a white corner is in the top layer but not in the correct position, use one of the
preceding algorithms to move it into the bottom layer. Then proceed as above.
Now flip the cube over so that the white side is on the D face. If possible, turn
the U layer until you are in a position to apply one of these algorithms:
\[ (U^{-1} F^{-1} U F)(U R U^{-1} R^{-1}) \]
\[ (U R U^{-1} R^{-1})(U^{-1} F^{-1} U F) \]
[Figures: the cube, viewed with the F face in front, before and after each algorithm.]
If the desired edge is not in the top layer, then it is in the middle layer. Use one
of the algorithms above to swap the edge that is stuck in the middle layer with an
edge from the upper layer. Now proceed as above. With these two algorithms, you
can solve all four of the middle layer edge pieces.
We now want to make a yellow cross on the U face. If you already have a
yellow cross, you can move on to the next phase. If not, apply the algorithm below
one, two, or three times to make the cross.
\[ F U R U^{-1} R^{-1} F^{-1} \]
[Figure: three panels showing the U face, surrounded by the L, R, F, and B faces, before and after applying the algorithm.]
[Figure: good and bad examples, each showing the U face surrounded by the L, R, F, and B faces.]
$(R U R^{-1} U)(R U U R^{-1}) U (R U R^{-1} U)(R U U R^{-1})$
You might need to swap two opposite edges. In that case, apply one of the preceding
algorithms and reevaluate the situation.
Now we need to move the corners to the right locations. We will worry about
their orientations later.
[Figure: good and bad examples of corner locations.]
one or more times to permute the corners until they are in the correct locations.
This is the last step! It is the most complicated. Every piece should be in the
right location, but the yellow corners may not all have yellow stickers facing up.
The algorithm
Bibliography
[1] Cube20, God’s Number is 20, http://www.cube20.org/.
[2] S. J. Miller, Talks on solving the 2 × 2 × 2 and 3 × 3 × 3 cubes, https://youtu.be/PKZ7pxFyYu0
and https://youtu.be/FO1kOU-3Blw.
[3] Rubik’s, Home of the Rubik’s Cube, http://www.rubiks.com/.
[4] World Cube Association, https://www.worldcubeassociation.org/.
1975
Szemerédi’s Theorem
Introduction
An arithmetic progression is a finite sequence of integers, such as 4, 9, 14, 19, 24,
whose consecutive terms differ by a fixed amount; see the 1913 entry. We say that
a subset of the natural numbers is AP-rich if it contains arbitrarily long arithmetic
progressions. For example, the set of even numbers is AP-rich. Is the set
A = {1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, . . . } (1975.1)
of square-free natural numbers AP-rich? What about the set
B = {1, 4, 9, 16, 25, 36, . . .}
of perfect squares? Or the set
C = {2, 3, 5, 7, 11, 13, 17, 19, 23, . . .}
of prime numbers?
Although each of these three sets is infinite, the ways in which they sit inside
of the natural numbers are different. The square-free numbers appear omnipresent,
whereas the perfect squares seem sparsely distributed. The prime numbers are
somewhere in between. To capture this intuitive idea, we introduce the notion of
natural density (or simply density):
\[ d(S) = \lim_{n\to\infty} \frac{|S \cap \{1, 2, \ldots, n\}|}{n}. \tag{1975.2} \]
For example, one can show that
\[ d(A) = \frac{6}{\pi^2} = 0.607927\ldots; \]
see the notes for the 1939 entry. Consequently, one might say that “A contains
about 60.8% of the natural numbers.” The perfect squares are much sparser, since
\[ d(B) = \lim_{n\to\infty} \frac{\sqrt{n} + 1}{n} \le \lim_{n\to\infty} \frac{2\sqrt{n}}{n} = 0. \]
Similarly, the prime number theorem (see the 1919 and 1948 entries) ensures that
\[ d(C) = \lim_{n\to\infty} \frac{\pi(n)}{n} = \lim_{n\to\infty} \frac{n/\log n}{n} = \lim_{n\to\infty} \frac{1}{\log n} = 0. \]
Natural density confirms, in a quantitative manner, that there are a lot more square-
free natural numbers than perfect squares or primes. One can make such statements
more precise by studying the asymptotic behavior of the quotient that appears in
(1975.2). For the sets $A$, $B$, $C$ above, the quotient is asymptotic to $6/\pi^2$, $1/\sqrt{n}$,
and $1/\log n$, respectively.
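The quotient in (1975.2) is easy to compute for a finite cutoff. The Python sketch below sieves the square-free numbers and the primes up to n and compares the three quotients with the asymptotic expressions just mentioned; the function name and the cutoff n = 100,000 are arbitrary choices made for illustration.

```python
from math import isqrt, log, pi

def density_quotients(n):
    """Return |S ∩ {1..n}| / n for square-free numbers, perfect squares, and primes."""
    square_free = [True] * (n + 1)
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]
    for d in range(2, isqrt(n) + 1):
        for multiple in range(d * d, n + 1, d * d):
            square_free[multiple] = False  # divisible by the square d^2
        if is_prime[d]:
            for multiple in range(d * d, n + 1, d):
                is_prime[multiple] = False
    a = sum(square_free[1:]) / n
    b = isqrt(n) / n
    c = sum(is_prime) / n
    return a, b, c

n = 100_000
a, b, c = density_quotients(n)
print(a, 6 / pi**2)
print(b, 1 / n**0.5)
print(c, 1 / log(n))
```

The square-free quotient is already very close to 6/π² at this cutoff, whereas the prime quotient approaches 1/log n much more slowly.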
Suppose that S is a syndetic set in (N, +), that is, a set with the property that
finitely many of its shifts
k + S = {k + s : s ∈ S}
cover N. Equivalently, S is syndetic if it has bounded gaps in the sense that there
exists a g ∈ N so that
\[ \{a, a + 1, a + 2, \ldots, a + g\} \cap S \ne \emptyset \]
for all a ∈ N.
1975: Comments
Divisibility chains. The set (1975.1) of square-free numbers, while free of
geometric progressions, has a lot of multiplicative structure. Each prime number is
square free, and the product of any finite number of distinct primes is square free.
Thus, A contains
p1 , p1 p2 , p1 p2 p3 , . . . ,
for any sequence p1 , p2 , . . . of distinct primes. It turns out that any set of positive
upper logarithmic density (a notion slightly stronger than that of upper density)
contains an infinite divisibility chain, that is, a sequence x1 , x2 , . . . for which each
term divides the next [6]. On the other hand, there is a set of positive upper density
for which no element divides any other element [5].
Ramanujan’s constant. If someone told you that a particular computation
had produced the number
262,537,412,640,768,743.999999999999,
it would be reasonable to assume that the correct result is actually the integer
262,537,412,640,768,744
and that the string of twelve 9’s beyond the decimal point is the byproduct of round-
off error or some other inaccuracy introduced through numerical computation.
In 1975, Martin Gardner (see the 1914 entry) played a famous April Fool’s
joke on the mathematical community when he claimed in his Scientific American
column that Srinivasa Ramanujan had conjectured that $\exp(\pi\sqrt{163})$ was an integer
[9] (Gardner fessed up about the joke in [10]). Although $\exp(\pi\sqrt{163})$ is not exactly
an integer, it is remarkably close since
\[ e^{\pi\sqrt{163}} = 262{,}537{,}412{,}640{,}768{,}743.9999999999992500725971981856888\ldots. \]
This amazing near-miss had already been noted in 1859 by Charles Hermite, whom
we met in our 1973 entry. One should keep in mind that few readers in 1975 would
have been able to detect this ruse. Personal computers did not yet exist and desktop
calculators did not have the ability to deal with such large numbers or work with
such great precision. On the other hand, the first named author just computed
1,000 digits of $e^{\pi\sqrt{163}}$ on a late 2013 iMac in just 0.000038 seconds. One million
digits only took 3.367 seconds. How far we have come!
The origin of this spectacular “almost integer” lies with the theory of the j-
invariant; see the 1992 entry. If τ is a quadratic irrational number with positive
imaginary part, then j(τ ) is an algebraic integer (an algebraic number that is the
root of a monic polynomial with integer coefficients) whose degree is the class
number of the quadratic field Q(τ ). Consequently, if Q(τ ) has class number one,
then $j(\tau)$ is an algebraic integer of degree 1, that is, an integer in the usual sense
of the word. For $\tau = \sqrt{-d}$ with $d$ square free (to ensure that $\mathbb{Q}(\sqrt{-d}) \ne \mathbb{Q}$), this
occurs if and only if $d$ is a Heegner number. These are 1, 2, 3, 7, 11, 19, 43, 67, 163;
see the 1966 entry. One can show that
\[ e^{\pi\sqrt{d}} \approx -j\!\left( \frac{1 + \sqrt{-d}}{2} \right) + 744, \]
in which the first term on the right-hand side is an integer if $d = 163$. Other, less
spectacular, near-integer identities hold for the largest remaining Heegner numbers:
\[ e^{\pi\sqrt{67}} = 147{,}197{,}952{,}743.99999\ldots, \]
\[ e^{\pi\sqrt{43}} = 884{,}736{,}743.9997\ldots. \]
For an explanation of the mathematics behind “Ramanujan’s constant”, see [11].
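These near-integers can be reproduced with Python's standard decimal module. The sketch below is only an illustration: the working precision of 60 digits is an arbitrary choice, the helper names are hypothetical, and π is computed with the series given among the recipes in the decimal module documentation, since the module has no built-in constant for it.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits

def decimal_pi():
    """Compute pi to the current precision (series from the decimal module recipes)."""
    getcontext().prec += 2  # extra digits for intermediate steps
    three = Decimal(3)
    lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24
    while s != lasts:
        lasts = s
        n, na = n + na, na + 8
        d, da = d + da, da + 32
        t = (t * n) / d
        s += t
    getcontext().prec -= 2
    return +s  # unary plus rounds to the restored precision

def almost_integer(d):
    """Approximate exp(pi * sqrt(d)) as a Decimal."""
    return (decimal_pi() * Decimal(d).sqrt()).exp()

for d in (43, 67, 163):
    print(d, almost_integer(d))
```

The run of 9s after the decimal point lengthens as d increases through 43, 67, and 163, matching the values displayed above.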
Bibliography
[1] M. Beiglböck, V. Bergelson, N. Hindman, and D. Strauss, Multiplicative structures
in additively large sets, J. Combin. Theory Ser. A 113 (2006), no. 7, 1219–
1242, DOI 10.1016/j.jcta.2005.11.003. http://www.sciencedirect.com/science/article/
pii/S0097316505002141. MR2259058
[2] M. Beiglböck, V. Bergelson, N. Hindman, and D. Strauss, Some new results in multiplicative
and additive Ramsey theory, Trans. Amer. Math. Soc. 360 (2008), no. 2, 819–847, DOI
10.1090/S0002-9947-07-04370-X. http://www.ams.org/journals/tran/2008-360-02/S0002-
9947-07-04370-X/S0002-9947-07-04370-X.pdf. MR2346473
[3] V. Bergelson, Multiplicatively large sets and ergodic Ramsey theory, Probability in mathemat-
ics, Israel J. Math. 148 (2005), 23–40, DOI 10.1007/BF02775431. http://link.springer.
com/article/10.1007%2FBF02775431. MR2191223
[4] V. Bergelson and A. Leibman, Polynomial extensions of van der Waerden’s and Szemerédi’s
theorems, J. Amer. Math. Soc. 9 (1996), no. 3, 725–753, DOI 10.1090/S0894-0347-96-00194-4.
MR1325795
[5] A. S. Besicovitch, On the density of certain sequences of integers, Math. Ann. 110 (1935),
no. 1, 336–341, DOI 10.1007/BF01448032. MR1512943
[6] H. Davenport and P. Erdős, On sequences of positive integers, Acta Arith. 2 (1936), 147–151.
[7] P. Erdős and P. Turán, On Some Sequences of Integers, J. London Math. Soc. 11 (1936),
261–264. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.101.8225.
[8] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on
arithmetic progressions, J. Analyse Math. 31 (1977), 204–256, DOI 10.1007/BF02813304.
MR0498471
[9] M. Gardner, Mathematical Games: Six Sensational Discoveries that Somehow or Another
Have Escaped Public Attention, Sci. Amer. 232 (1975), no. 4, 127–131.
[10] M. Gardner, Mathematical Games: On Tessellating the Plane with Convex Polygons, Sci.
Amer. 232 (1975), no. 7, 12–117.
[11] B. J. Green, The Ramanujan Constant: An Essay on Elliptic Curves, Complex Multiplication
and Modular Forms, http://people.maths.ox.ac.uk/greenbj/papers/ramanujanconstant.
pdf.
[12] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. MR2415379
[13] K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 104–109, DOI
10.1112/jlms/s1-28.1.104. MR0051853
[14] T. Tao and T. Ziegler, The primes contain arbitrarily long polynomial progressions, Acta
Math. 201 (2008), no. 2, 213–305, DOI 10.1007/s11511-008-0032-5. MR2461509
[15] J. Pintz, Patterns of primes in arithmetic progressions, Number theory—Diophantine prob-
lems, uniform distribution and applications, Springer, Cham, 2017, pp. 369–379. MR3676411
[16] B. L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw. Arch. Wisk. 15 (1927),
212–216.
[17] Wikipedia, https://en.wikipedia.org/wiki/Heegner_number.
[18] E. Szemerédi, On sets of integers containing no four elements in arithmetic progression, Acta
Math. Acad. Sci. Hungar. 20 (1969), 89–104, DOI 10.1007/BF01894569. MR0245555
[19] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Col-
lection of articles in memory of Juriı̆ Vladimirovič Linnik, Acta Arith. 27 (1975), 199–245,
DOI 10.4064/aa-27-1-199-245. MR0369312
1976
Four Color Theorem
Introduction
The four color theorem states that every planar map can be colored with four
colors in such a way that no two adjacent countries share the same color; see Figure
1. However, we should be precise about what this means. First of all, each country
must be connected. For example, the United States does not count because Alaska
and Hawaii are not connected to the lower forty-eight states. Second, we do not
consider countries that touch “at corners” to be adjacent. Thus, Arizona and
Colorado do not share a border as far as we are concerned; neither do Utah and
New Mexico. Finally, we prohibit countries with infinitely long boundaries since
otherwise one can construct bizarre maps that require more than four colors [8].
The year 1976 marked the end of the long search for a (correct) proof of the four
color theorem, which was initially conjectured in 1852 by Francis Guthrie (1831–
1899). The conjecture was prompted by his attempt to color a map of English
counties. Today most people know the theorem in the form “no more than four
colors are needed to color a map.” Despite this common understanding of the
theorem, cartographers claim that it does not matter since there is no reason to
limit the number of colors used. Moreover, only three colors are needed for most
maps that arise in practice. Despite its pragmatic insignificance, the four color
theorem has great historical importance.
To make the problem more precise, one converts statements about maps into
statements about graphs. Assign each country a vertex. Place an edge between
two vertices if and only if the two corresponding countries share a common border.
This permits us to phrase the four color theorem in terms of graph theory: the
vertices of any graph that can be drawn in the plane without edge crossings can be
colored with at most four colors so that no two adjacent vertices share the same
color.
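The graph formulation is easy to experiment with. The following is a minimal sketch, not the Appel–Haken method: a plain backtracking search that tries to color a graph with a given number of colors. The function name `color_graph` and the example graph (a hub joined to a 5-cycle, which is planar) are our own illustrative choices.

```python
def color_graph(adjacency, num_colors=4):
    """Return {vertex: color} with adjacent vertices colored differently, or None."""
    vertices = list(adjacency)
    assignment = {}

    def extend(i):
        # Try to color vertices[i:], given the colors already assigned.
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(num_colors):
            if all(assignment.get(w) != c for w in adjacency[v]):
                assignment[v] = c
                if extend(i + 1):
                    return True
                del assignment[v]
        return False

    return dict(assignment) if extend(0) else None


# Hypothetical example: a hub adjacent to every vertex of the 5-cycle a-b-c-d-e.
wheel = {
    "hub": ["a", "b", "c", "d", "e"],
    "a": ["hub", "b", "e"],
    "b": ["hub", "a", "c"],
    "c": ["hub", "b", "d"],
    "d": ["hub", "c", "e"],
    "e": ["hub", "d", "a"],
}

print(color_graph(wheel))                # a proper coloring with four colors
print(color_graph(wheel, num_colors=3))  # None: three colors are not enough here
```

The odd cycle already needs three colors and the hub touches every vertex of it, so this small planar graph shows that four colors are sometimes necessary.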
The four color theorem has the dubious honor of having been “proved” twice
before 1976. Proofs by Alfred Kempe (1849–1922) in 1879 and by Peter Guthrie
Tait (1831–1901) in 1880 each stood unchallenged for 11 years before fatal flaws
were found. It is much easier to prove that five colors suffice [7]; see [9, Chapter
19] for details.
It was not until 1976 that mathematicians again claimed to have a proof of
the elusive theorem. Kenneth Appel (1932–2013) and Wolfgang Haken (1928– ) at
the University of Illinois proved the four color theorem with computer assistance,
through which they reduced the problem to 1,936 special cases, each of which was
checked by computer. This was greeted with controversy by the mathematical
community (see also the 1998 entry on the Kepler conjecture). Is a proof valid
if it is so long and computationally intensive that no human can understand it
in totality? Although the theorem has since been verified by the Coq interactive
theorem prover [6], there are some who still find the prospect of computer-aided
proofs unsettling. Perhaps a more elegant, humanly understandable proof of the
four color theorem exists. Try to find it!
1976: Comments
Heawood conjecture. The four color theorem tells us that we can color any
planar map using at most four colors. What about map colorings on the torus,
the Klein bottle (see the 1958 entry), or other surfaces? Percy J. Heawood (1861–
1955), who spent most of his career attempting to prove the four color theorem and
found the fatal flaw in Kempe’s 1879 proof, conjectured in 1890 that the minimum
number of colors needed to color every map on a surface S is
\[ \left\lfloor \frac{7 + \sqrt{49 - 24\chi(S)}}{2} \right\rfloor, \tag{1976.1} \]
in which χ(S) is the Euler characteristic of S. To compute χ(S), triangulate S and set
\[ \chi(S) = v - e + f, \]
in which v denotes the number of vertices, e the number of edges, and f the number
of faces in the triangulation. It turns out that any triangulation of S produces the
same value; that is, the Euler characteristic is a topological invariant of S.¹ For
example, the five Platonic solids are all homeomorphic (see p. 22) to a sphere and
all have χ = 2; see Figure 2 and Table 1. Substituting this into (1976.1) suggests
that any map on a sphere can be colored with at most four colors.

Table 1. Vertices (v), edges (e), faces (f), and Euler characteristic (χ) of the Platonic solids.
S               v    e    f    χ
tetrahedron     4    6    4    2
cube            8   12    6    2
octahedron      6   12    8    2
dodecahedron   20   30   12    2
icosahedron    12   30   20    2
What is the status of the Heawood conjecture? Technically, it was disproved in
1934 when Philip Franklin (1898–1965) proved that any map on the Klein bottle (for
which χ = 0) can be colored with only six colors, as opposed to the seven predicted
by the conjecture [5]. This bound is tight since the Franklin graph (Figure 3) can
be embedded on the surface of the Klein bottle and the resulting map cannot be
colored with fewer than six colors. Morally speaking, however, the conjecture is
true 100% of the time since Gerhard Ringel (1919–2008) and John W. T. Youngs
(1910–1970) proved that it holds for all surfaces other than the Klein bottle [10].
For example, any map on the torus (which has χ = 0) can be colored with only
seven colors, and this is minimal; see Figure 4.
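The numbers in this discussion are easy to check by machine. The sketch below recomputes χ = v − e + f for the five Platonic solids and evaluates Heawood's bound, which we take in its usual form ⌊(7 + √(49 − 24χ))/2⌋. The function names are our own, and the integer square root is used only to avoid floating-point rounding.

```python
from math import isqrt

# (v, e, f) for the five Platonic solids.
platonic = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
    "dodecahedron": (20, 30, 12),
    "icosahedron": (12, 30, 20),
}


def euler_characteristic(v, e, f):
    return v - e + f


def heawood_number(chi):
    """floor((7 + sqrt(49 - 24*chi)) / 2), computed exactly with integers."""
    return (7 + isqrt(49 - 24 * chi)) // 2


for name, (v, e, f) in platonic.items():
    print(name, euler_characteristic(v, e, f))

print(heawood_number(2))  # sphere
print(heawood_number(0))  # torus and Klein bottle
```

For χ = 0 the bound gives seven, which is attained on the torus but, by Franklin's result, not on the Klein bottle.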
¹ It is important to note that nonhomeomorphic surfaces may have the same Euler
characteristic. For example, the torus and the Klein bottle both have Euler characteristic zero.
They are not homeomorphic since, for example, they have different fundamental groups (Z^2 for
the torus and ⟨a, b : ab = b^{−1}a⟩ for the Klein bottle). We refrain from further discussion since
that would take us too far afield.
[Figure 2. The five Platonic solids, labeled with (v, e, f): (a) tetrahedron (4, 6, 4); (b) cube
(8, 12, 6); (c) octahedron (6, 12, 8); (d) dodecahedron (20, 30, 12); (e) icosahedron (12, 30, 20).]
Bibliography
[1] K. Appel and W. Haken, Every planar map is four colorable. I. Discharging, Illinois J.
Math. 21 (1977), no. 3, 429–490. http://www.projecteuclid.org/euclid.ijm/1256049011.
MR0543792
[2] K. Appel, W. Haken, and J. Koch, Every planar map is four colorable. II. Reducibility, Illinois
J. Math. 21 (1977), no. 3, 491–567. http://projecteuclid.org/euclid.ijm/1256049012.
MR0543793
[3] K. Appel and W. Haken, The solution of the four-color-map problem, Sci. Amer. 237 (1977),
no. 4, 108–121, 152, DOI 10.1038/scientificamerican1077-108. MR0543796
[4] K. Appel and W. Haken, Every planar map is four colorable, with the collaboration of
J. Koch, Contemporary Mathematics, vol. 98, American Mathematical Society, Providence,
RI, 1989. MR1025335
[5] P. Franklin, A six color problem, J. Math. Phys. 13 (1934), 363–379.
[6] G. Gonthier, Formal proof—the four-color theorem, Notices Amer. Math. Soc. 55 (2008),
no. 11, 1382–1393. http://www.ams.org/notices/200811/tx081101382p.pdf. MR2463991
[7] P. J. Heawood, Map-colour theorems, Quarterly Journal of Mathematics, Oxford 24 (1890),
332–338.
[8] H. Hudson, Four colors do not suffice, Amer. Math. Monthly 110 (2003), no. 5, 417–423.
[9] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[10] G. Ringel and J. W. T. Youngs, Solution of the Heawood map-coloring problem, Proc. Nat.
Acad. Sci. U.S.A. 60 (1968), 438–445, DOI 10.1073/pnas.60.2.438. MR0228378
[11] R. Thomas, An update on the four-color theorem, Notices Amer. Math. Soc. 45 (1998), no. 7,
848–859. http://www.ams.org/notices/199807/thomas.pdf. MR1633714
[12] Wikipedia, Four color theorem, http://en.wikipedia.org/wiki/Four_color_theorem.
1977
RSA Encryption
Introduction
Alice and Bob wish to communicate without letting an eavesdropper, Eve,
understand their conversation. Any information that they wish to exchange can be
encoded with numbers (see the comments for the 1936 entry). Instead of sending
one large number that represents an entire message, information is typically broken
up into smaller blocks of fixed size. Thus, Alice and Bob want to securely send
and receive nonnegative integers less than or equal to a fixed threshold while Eve
is eavesdropping. Moreover, they need to do this without first exchanging a secret
key for their code: otherwise Eve will know the key!
The RSA cryptosystem, invented by Ronald Rivest (1947– ), Adi Shamir
(1952– ), and Leonard Adleman (1945– ) in 1977 and, independently, by Clifford
Cocks (1950– ) of the UK intelligence agency GCHQ (Government Communications
Headquarters) in 1973, addresses this issue (Cocks’s work remained classified until
1997). Eve can listen to the entire RSA-encrypted communication and she will be
unable to decipher it! Without algorithms such as RSA, modern e-commerce would
be impossible: we can buy things online without meeting the seller in person to
agree on a secret key for the transaction. To perform this amazing feat, Alice and
Bob require some number theory.
To describe the RSA cryptosystem, we need Euler’s generalization of Fermat’s
little theorem. Fermat’s little theorem tells us that
a^{p−1} ≡ 1 (mod p)
if p is prime and gcd(a, p) = 1; see the 2002 entry. Let φ(n) denote the number
of j ∈ {1, 2, . . . , n} that are relatively prime to n. For example, φ(15) = 8 since
there are eight numbers, namely 1, 2, 4, 7, 8, 11, 13, 14, in the specified range that
are relatively prime to 15. The function φ is called the Euler totient function. It is
multiplicative, in the sense that φ(mn) = φ(m)φ(n) if m and n are relatively prime.
For example, φ(15) = φ(3)φ(5) = 2 · 4 = 8. Moreover, φ(p) = p − 1 whenever p is
prime, since 1, 2, . . . , p − 1 are relatively prime to p. Euler’s theorem states that if
gcd(a, n) = 1, then
a^{φ(n)} ≡ 1 (mod n).
RSA algorithm.
• Alice secretly selects distinct large primes p and q. Their product n = pq is her
enciphering modulus.
• Alice picks a public key (also called an encryption key) e. This is a positive
integer such that gcd(e, φ(n)) = 1. She knows n = pq, so she can compute
φ(n) = φ(p)φ(q) = (p − 1)(q − 1) and check if gcd(e, φ(n)) = 1 rapidly via the
Euclidean algorithm.
• Alice’s private key (also called a decryption key) d is the inverse of e (mod φ(n)).
Thus, de = jφ(n) + 1 for some integer j.
• Alice makes n and e known to the public. She does not disclose p, q, or d.
• To send the message M ∈ {1, 2, . . . , n} to Alice, Bob computes¹ E ≡ M^e (mod n).
He sends E to Alice.
• Alice recovers M from E as follows:²
E^d ≡ (M^e)^d ≡ M^{de} ≡ M^{jφ(n)+1} ≡ (M^{φ(n)})^j M ≡ M (mod n).
Since n and e are publicly available, anyone can send messages to Alice. Only
she can decrypt these messages because only she knows the private key d. Here
is an example. Alice selects secret primes p = 7,919 and q = 9,733. Then n =
pq = 77,075,627 and φ(n) = (p − 1)(q − 1) = 77,057,976. Alice chooses e = 47
and checks that gcd(47, φ(n)) = 1. The multiplicative inverse of 47 (mod φ(n)) is
d = 68,860,319. Bob wants to send the message M = 12,345 to Alice. He computes
E ≡ M^e = (12,345)^{47} ≡ 18,269,972 (mod n)
and sends this to Alice, who receives it and computes
E^d = (18,269,972)^{68,860,319} ≡ 12,345 (mod n).
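The worked example can be replayed in a few lines. This is only a toy sketch with the small primes from the text; real RSA uses far larger primes and padding, which we ignore here. It relies on Python's built-in three-argument `pow`, which computes modular powers and (from Python 3.8 on) modular inverses when the exponent is −1.

```python
from math import gcd

p, q = 7919, 9733            # Alice's secret primes (from the example above)
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # phi(n), known only to Alice

e = 47                       # public key
assert gcd(e, phi) == 1
d = pow(e, -1, phi)          # private key: inverse of e modulo phi(n)

M = 12345                    # Bob's message
E = pow(M, e, n)             # encryption (the text reports 18,269,972)
recovered = pow(E, d, n)     # decryption

print(n, phi, d)
print(E, recovered)
```

Decryption returns the original message because de = jφ(n) + 1 for some integer j, exactly as in the displayed chain of congruences.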
Suppose that Eve wants to find M , knowing only E and Alice’s public informa-
tion, n and e. She needs Alice’s private key d, so Eve must solve de ≡ 1 (mod φ(n)).
To do this, Eve needs to know φ(n) = (p − 1)(q − 1). Since
(p − 1)(q − 1) = pq − p − q + 1 = n − (p + q) + 1,
knowing φ(n) is equivalent to knowing p + q. However, knowing p + q is equivalent
to knowing p and q since the roots of
(x − p)(x − q) = x^2 − (p + q)x + pq = x^2 − (p + q)x + n,
namely p and q, can be found by the quadratic formula. Thus, finding φ(n) =
(p − 1)(q − 1) is as hard as factoring n = pq.
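The argument above is constructive: anyone who learns φ(n) can factor n immediately. Here is a minimal sketch of that computation, using the numbers from the RSA example; the function name `factor_from_phi` is our own.

```python
from math import isqrt


def factor_from_phi(n, phi):
    """Recover the primes p <= q from n = pq and phi = (p - 1)(q - 1)."""
    s = n - phi + 1                 # s = p + q
    disc = s * s - 4 * n            # discriminant of x^2 - s x + n, equal to (q - p)^2
    r = isqrt(disc)
    if r * r != disc:
        raise ValueError("phi is not consistent with n = pq")
    return (s - r) // 2, (s + r) // 2   # roots given by the quadratic formula


print(factor_from_phi(77075627, 77057976))
```

The only nontrivial operation is an integer square root, so this step is fast even for moduli of cryptographic size.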
The security of RSA is based upon the assumption that it is hard to factor large
numbers (even though it is easy to multiply them). If a method for fast factorization
were to be found, then RSA would cease to be secure. Peter Shor (1959– )
found such an algorithm for fast factorization, but it requires a quantum computer.
Although quantum computers have so far only been able to factor relatively small
numbers, the potential exists for them to one day factor RSA moduli. Other cryptographic
systems, such as lattice-based methods, are believed to be more secure
against quantum-computer attacks.
¹ Although exponentiating M modulo n appears to be a daunting task, it can be done rapidly
by repeated squaring and modular reduction; see the 2002 entry.
² One can prove that E^d ≡ M (mod n) even if gcd(M, n) ≠ 1.
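The footnote on exponentiation mentions repeated squaring with modular reduction. A minimal sketch of that idea follows; the name `power_mod` is our own, and in practice Python's built-in `pow(base, exponent, modulus)` does the same job.

```python
def power_mod(base, exponent, modulus):
    """Compute base**exponent mod modulus by repeated squaring."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                     # current binary digit of the exponent is 1
            result = result * base % modulus
        base = base * base % modulus         # square and reduce
        exponent >>= 1                       # move to the next binary digit
    return result


print(power_mod(12345, 47, 77075627))
```

The loop runs once per binary digit of the exponent, so even the exponent d = 68,860,319 from the example requires only 27 squarings.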
1977: Comments
Poor choices and Pollard’s p − 1 algorithm. The security of RSA rests
on the assumption that factoring n = pq is computationally infeasible. However,
there are some choices of p and q that render n susceptible to certain factorization
algorithms. Suppose that p − 1 has only small prime factors. For instance, the
prime p = 614,657 is “large” but p − 1 = 614,656 = 2^8 · 7^4 has only “small” prime
factors. In this situation, Pollard’s p − 1 algorithm might be able to factor n in a
reasonable amount of time. In what follows, we do not require that n is a product
of two distinct primes.
The starting point of Pollard’s algorithm is the observation that if p − 1 does
not have any large prime factors, then (p − 1)|k! for some small k. For example, if
p = 181, then
p − 1 = 180 = 2^2 · 3^2 · 5
contains only small prime factors and p − 1 divides 6! = 720 = 180 · 4. On the
other hand, if p = 179, then p − 1 = 178 = 2 · 89 has a relatively large prime factor.
Because of this, p − 1 does not divide k! for k = 1, 2, . . . , 88, although it divides 89!.
Suppose that p is a prime factor of n and (p − 1)|k!. Then k! = (p − 1)r for
some r ∈ N and Fermat’s little theorem yields
2^{k!} = 2^{(p−1)r} = (2^{p−1})^r ≡ 1^r ≡ 1 (mod p),
so p | (2^{k!} − 1). Although other bases may be used, the base 2 is preferred in practice
since exponentiation with base 2 is particularly amenable to computation.
Let m_k ≡ 2^{k!} − 1 (mod n) with 1 ≤ m_k ≤ n. Since m_k and 2^{k!} − 1 differ by a
multiple of n, we have
gcd(m_k, n) = gcd(2^{k!} − 1, n) ≥ p.
If n does not divide 2^{k!} − 1, then gcd(m_k, n) is a proper divisor of n. In the
preceding, we insisted that m_k is the least positive residue of 2^{k!} − 1 modulo n, so
that m_k = n (rather than 0) when n divides 2^{k!} − 1. In that case gcd(m_k, n) = n and
we do not obtain a proper factor of n.
To implement Pollard’s algorithm, fix a threshold K and compute gcd(m_k, n)
for k = 2, 3, . . . , K and hope that a proper divisor of n is found. Observe that
m_k ≡ 2^{k!} − 1 ≡ (2^{(k−1)!})^k − 1 ≡ (m_{k−1} + 1)^k − 1 (mod n),
so the m_k can be computed iteratively without computing k!. This shortcut is
important, since the rapid growth of k! prevents the direct evaluation of m_k.
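Here is an example. If n = 26,016,619, then
\[
\begin{aligned}
2^{2!} &\equiv 4 \pmod{n}, & m_2 &= 3, & \gcd(m_2, n) &= 1,\\
2^{3!} &\equiv 4^3 \equiv 64 \pmod{n}, & m_3 &= 63, & \gcd(m_3, n) &= 1,\\
2^{4!} &\equiv 64^4 \equiv 16{,}777{,}216 \pmod{n}, & m_4 &= 16{,}777{,}215, & \gcd(m_4, n) &= 1,\\
2^{5!} &\equiv 16{,}777{,}216^5 \equiv 6{,}730{,}144 \pmod{n}, & m_5 &= 6{,}730{,}143, & \gcd(m_5, n) &= 1,\\
2^{6!} &\equiv 6{,}730{,}144^6 \equiv 14{,}067{,}788 \pmod{n}, & m_6 &= 14{,}067{,}787, & \gcd(m_6, n) &= 1,\\
2^{7!} &\equiv 14{,}067{,}788^7 \equiv 20{,}137{,}005 \pmod{n}, & m_7 &= 20{,}137{,}004, & \gcd(m_7, n) &= 5{,}419,
\end{aligned}
\]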
so 5,419 | n. In fact, n = pq, in which p = 5,419 and q = 4,801 are prime. Neither
p − 1 = 5,418 = 2 · 3^2 · 7 · 43 nor q − 1 = 4,800 = 2^6 · 3 · 5^2
divides 7! = 5,040. That is, the Pollard p − 1 method was successful before our
initial analysis predicted that it should be. This is because 2^{k!} − 1 might be divisible
by p by chance, as opposed to being divisible by p because k! is a multiple of p − 1.
This is the case here, since 2^{7!} − 1 happens to be divisible by p. Indeed, the
multiplicative order of 2 modulo 5,419 is 42, which divides 7! even though p − 1 does not.
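The iteration is short enough to run directly. The following is a minimal sketch of the procedure with base 2 and a user-chosen threshold K; the function name and the return convention (the pair (k, factor), or None on failure) are our own choices. We reduce modulo n to the range 0, . . . , n − 1 and treat a gcd equal to n as a failure.

```python
from math import gcd


def pollard_p_minus_1(n, K=100):
    """Search for a proper factor of n using gcd(2^(k!) - 1, n) for k = 2, ..., K."""
    power = 2 % n                       # 2^(1!) mod n
    for k in range(2, K + 1):
        power = pow(power, k, n)        # now 2^(k!) mod n, without ever forming k!
        g = gcd((power - 1) % n, n)     # gcd(0, n) = n signals failure
        if 1 < g < n:
            return k, g
        if g == n:
            return None                 # n divides 2^(k!) - 1: no proper factor from base 2
    return None


print(pollard_p_minus_1(26016619))
```

For the modulus in the example, the search stops at k = 7 with the factor 5,419, in agreement with the table of values above.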
If Alice is careful in selecting her primes p and q, she can prevent Eve from
factoring her RSA modulus n = pq using Pollard’s p − 1 algorithm. Let p_0, q_0 be
large primes. Then let p and q be even larger primes of the form
p = i p_0 + 1 and q = j q_0 + 1.
Dirichlet’s theorem on primes in arithmetic progressions ensures that there are
infinitely many such primes; see the comments for the 1913 entry. By construction,
p − 1 = i p_0 and q − 1 = j q_0 have the large prime factors p_0 and q_0, respectively.
This prevents Eve from applying the Pollard p − 1 algorithm effectively.
Answer to the problem. The factorization of RSA-100 is
37975227936943673922808872755445627854565536638199
× 40094690950920881030683735292761468389214899724061
This was found in 1991 by Mark Manasse (1958– ) and Arjen K. Lenstra (1956– )
[3].
Bibliography
[1] R. Rivest, A. Shamir, and L. Adleman, US Patent 4,405,829 (1977).
http://www.google.com/patents/US4405829.
[2] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and
public-key cryptosystems, Comm. ACM 21 (1978), no. 2, 120–126, DOI 10.1145/359340.359342.
https://people.csail.mit.edu/rivest/Rsapaper.pdf. MR700103
[3] RSA Laboratories, RSA Honor Roll, http://www.ontko.com/pub/rayo/primes/hr_rsa.txt.
[4] Wikipedia, Pollard’s p − 1 algorithm,
https://en.wikipedia.org/wiki/Pollard’s_p_-_1_algorithm.
[5] Wikipedia, RSA Factoring Challenge,
http://en.wikipedia.org/wiki/RSA_Factoring_Challenge.
[6] Wikipedia, Shor’s Algorithm, http://en.wikipedia.org/wiki/Shor’s_algorithm.
1978
Mandelbrot Set
Introduction
The Mandelbrot set is an example of a fractal, a mathematical object that
possesses a great deal of self-similarity. It is constructed as follows. For each
complex number c, form the sequence z_{n;c}, in which
z_{0;c} = c and z_{n+1;c} = z_{n;c}^2 + c.
The simplest pictures of the Mandelbrot set are obtained by coloring a point c black
if the sequence defined above is bounded and white otherwise; see Figure 1. For
finer detail, we can color points c whose sequences z_{n;c} appear unbounded based
upon how many iterations are needed to exceed a fixed, large threshold; see Figure
2. One can zoom in on the Mandelbrot set and obtain a variety of beautiful and
bewildering images; see Figure 3 and the links at [9].
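The coloring rule just described translates directly into code. The sketch below is only an approximation of the true set: a point is declared "bounded" if the sequence stays within radius 2 for `max_iter` steps, and both of those parameters, like the function names and the plotting window, are our own choices.

```python
def escape_time(c, max_iter=100, radius=2.0):
    """Return the first n with |z_n| > radius, or None if no escape is seen."""
    z = c                                # z_0 = c
    for n in range(max_iter):
        if abs(z) > radius:
            return n
        z = z * z + c                    # z_{n+1} = z_n^2 + c
    return None


def render(width=60, height=24, max_iter=50):
    """Crude text picture: '#' where the sequence appears bounded, '.' elsewhere."""
    rows = []
    for j in range(height):
        y = 1.2 - 2.4 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.1 + 2.8 * i / (width - 1)
            row += "#" if escape_time(complex(x, y), max_iter) is None else "."
        rows.append(row)
    return "\n".join(rows)


print(render())
```

Replacing the two-symbol rule by a palette indexed by the value of `escape_time` gives the finer, colored pictures mentioned above.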
One of the most important things to address with any iterative problem is
the existence and classification of fixed points. If w is a fixed point of the map
p(z) = z^2 + c, in which c is a constant, then p(w) = w; that is,
w^2 − w + c = 0.
The derivative p′(w) determines the nature of the fixed point w. If |p′(w)| < 1, then w is
an attracting fixed point and values that start out close to w will iterate toward w. If
|p′(w)| > 1, then w is a repelling fixed point and values that start out close to w will
iterate away from w. If |p′(w)| = 1, then the situation is more complicated and the argument
of the complex number p′(w) comes into play.
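For the quadratic map p(z) = z^2 + c the classification can be carried out explicitly, since the fixed points solve w^2 − w + c = 0 and p′(w) = 2w. The following is a small sketch under those formulas; the function names and the label "neutral" for the case |p′(w)| = 1 are our own.

```python
import cmath


def fixed_points(c):
    """Roots of w^2 - w + c = 0, that is, the fixed points of z -> z^2 + c."""
    root = cmath.sqrt(1 - 4 * c)
    return (1 + root) / 2, (1 - root) / 2


def classify(w):
    """Classify a fixed point of z -> z^2 + c by the size of |p'(w)| = |2w|."""
    multiplier = abs(2 * w)
    if multiplier < 1:
        return "attracting"
    if multiplier > 1:
        return "repelling"
    return "neutral"


for c in (0, 0.25, -1):
    print(c, [(w, classify(w)) for w in fixed_points(c)])
```

For c = 0 the fixed points are 1 (repelling) and 0 (attracting); for c = 1/4 the two fixed points merge at w = 1/2, where |p′(w)| = 1.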
What about polynomials of higher degree? If p is a polynomial of degree n,
then p(w) = w means that w is a zero of the polynomial h(z) = p(z) − z, which has
degree at most n. The fundamental theorem of algebra asserts that a polynomial
of degree n has exactly n zeros, counted according to multiplicity, in the complex
plane. Thus, p has at most n fixed points. What if the polynomial p is replaced
with a slightly more exotic function?
1978: Comments
Space Invaders. A strong contender for this year’s topic was the video game
Space Invaders¹ [10]. Created by Tomohiro Nishikado (1944– ) and released in
1978, this mega-blockbuster game revolutionized the industry. Interestingly, one of
the defining features of the game was due to hardware limitations. In the game,
alien ships are attacking the Earth. As more and more of them are destroyed, the
remaining ships move faster and faster until the last few ships move at incredible
speeds. This feature was due to a computational bottleneck: the fewer ships
that needed to be drawn, the faster the computer could display them.
Nishikado decided that he liked this and incorporated it into the game.
¹ A common misconception is that the line “And the space he invades he gets by on you”
from the 1981 Rush song Tom Sawyer is “And the space invaders get by on you.” Certainly, the
second is the more amusing interpretation.
is continuous and bounded. Since g(x) is periodic with period 1, it follows that
g(2^n x) is periodic with period 1/2^n. If x is a dyadic rational number (that is, its
denominator is a power of 2), then 2^k x is an integer whenever k ≥ n and hence
However,
for some m_n ∈ Z. Since 2^{k−n} ≤ 1/2 for k < n, this means that g is linear on the
interval [2^k u_n, 2^k v_n]. Thus, each of the difference quotients on the right side of
(1978.2) is ±1 (depending on whether m_n is even or odd). In other words,
\[ \frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} \pm 1 \tag{1978.3} \]
for some sequence of signs ±. Since the terms of a convergent series must tend to
zero, it follows that (1978.3) does not tend to a finite limit as n → ∞. In light of
(1978.1), we conclude that f′(x) does not exist.
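The behavior of these difference quotients can be observed with exact rational arithmetic. The sketch below assumes that f is the Takagi function f(x) = Σ_{k≥0} g(2^k x)/2^k, with g(x) the distance from x to the nearest integer, and that u_n < v_n are the consecutive dyadic rationals with denominator 2^n that bracket x. Both the normalization and the function names are our assumptions, not taken from the text.

```python
from fractions import Fraction
from math import floor


def g(x):
    """Distance from the rational x to the nearest integer (periodic with period 1)."""
    frac = x - floor(x)                  # fractional part in [0, 1)
    return min(frac, 1 - frac)


def takagi_at_dyadic(x, n):
    """f(x) for x = m / 2^n: the terms with k >= n vanish, so the sum is finite."""
    return sum(g(2**k * x) / 2**k for k in range(n))


def difference_quotient(x, n):
    """(f(v_n) - f(u_n)) / (v_n - u_n) for the dyadic interval of length 2^-n around x."""
    m = floor(x * 2**n)
    u, v = Fraction(m, 2**n), Fraction(m + 1, 2**n)
    return (takagi_at_dyadic(v, n) - takagi_at_dyadic(u, n)) / (v - u)


x = Fraction(1, 3)
print([int(difference_quotient(x, n)) for n in range(1, 13)])
```

Each quotient is a sum of n terms equal to ±1, so it is an integer with the same parity as n; consecutive quotients therefore differ by an odd integer and cannot converge.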
Answer to the problem. Let p(z) and q(z) be polynomials with deg p = n,
deg q = m, and m < n. What is the maximum number of zeros of
\[ h(z) = p(z) - \overline{q(z)}\,? \]
Terence Sheil-Small conjectured in 1992 that the sharp upper bound was n^2. This
is indeed the case if m = n or m = n − 1, as his student A. S. Wilmshurst proved
[11]. What if m < n − 1? Wilmshurst conjectured that if m = 1, that is,
\[ h(z) = p(z) - \overline{z}, \]
then the number of zeros of h is at most 3n − 2. This was proved in 2002 by
Dmitry Khavinson (1956– ) and Grzegorz Świa̧tek using techniques from complex
dynamics [4]; see [3] for an elegant exposition of this result and an application to
gravitational lensing (also see the 1915 entry). The sharpness of the upper bound
3n − 2 was proved in 2008 by Lukas Geyer [2].
Bibliography
[1] R. Brooks and J. P. Matelski, The dynamics of 2-generator subgroups of PSL(2, C), Riemann
surfaces and related topics: Proceedings of the 1978 Stony Brook Conference (State Univ.
New York, Stony Brook, N.Y., 1978), Ann. of Math. Stud., vol. 97, Princeton Univ. Press,
Princeton, N.J., 1981, pp. 65–71. MR624805
[2] L. Geyer, Sharp bounds for the valence of certain harmonic polynomials, Proc. Amer. Math.
Soc. 136 (2008), no. 2, 549–555, DOI 10.1090/S0002-9939-07-08946-0. MR2358495
[3] D. Khavinson and G. Neumann, From the fundamental theorem of algebra to astrophysics: a
“harmonious” path, Notices Amer. Math. Soc. 55 (2008), no. 6, 666–675. MR2431564
[4] D. Khavinson and G. Świa̧tek, On the number of zeros of certain harmonic polynomials,
Proc. Amer. Math. Soc. 131 (2003), no. 2, 409–414, DOI 10.1090/S0002-9939-02-06476-6.
MR1933331
[5] B. Mandelbrot, Fractal aspects of the iteration of z ↦ λz(1 − z) for complex λ, z, Annals of
the New York Academy of Sciences 357, 249–259.
[6] B. B. Mandelbrot, The fractal geometry of nature, Schriftenreihe für den Referenten [Series
for the Referee], W. H. Freeman and Co., San Francisco, Calif., 1982. MR665254
[7] Team Fresh, Last Lights On—Mandelbrot fractal zoom to 6.066 e228 (2^760).
http://vimeo.com/12185093.
[8] T. Takagi, A simple example of the continuous function without derivative, Proc. Phys. Math.
Japan, 1 (1901), 176–177.
[9] Wikipedia, Mandelbrot set, http://en.wikipedia.org/wiki/Mandelbrot_set.
[10] Wikipedia, Space invaders, http://en.wikipedia.org/wiki/Space_Invaders.
[11] A. S. Wilmshurst, The valence of harmonic polynomials, Proc. Amer. Math. Soc. 126 (1998),
no. 7, 2077–2081, DOI 10.1090/S0002-9939-98-04315-9. MR1443416
1979
TeX
Introduction
This entry honors two fundamental contributions of computer science to mathematics.
First, there is the creation of the TeX typesetting system by Donald Knuth
(1938– ), which was released in 1978. Second, there are off-by-one errors (in which
a loop intended to be executed n times is inadvertently executed n − 1 or n + 1
times), which is why this entry is listed under 1979. Purists will be happy to know,
however, that Knuth was honored with the National Medal of Science in 1979.
Donald Knuth is perhaps best known for his monumental, encyclopedic, and
stunningly readable series The Art of Computer Programming. Begun in 1962
while he was a graduate student at Caltech, the project continues to this day, with
volume 4A published in 2011 and several remaining volumes in preparation. While
preparing a second edition of Volume 2, Knuth was dismayed with the quality of
the typesetting done by the publisher. He realized that digital typesetting can be
boiled down to 0’s and 1’s: is a pixel black or white? Knuth saw this as a problem
amenable to computer science and set out to design his own system.
Knuth estimated he could have his digital-typesetting system ready in six
months. Instead, it was almost ten years before TeX was released. It was called
version 3. The next version was 3.1, which was followed by version 3.14, and so
forth. The current version is 3.14159265. This unusual numbering system suggests
that later versions are only incrementally different from previous ones and that TeX
has essentially stabilized.
TeX is used extensively in the publishing world. Almost every contemporary
paper in mathematics and computer science is typeset using a system based on TeX,
including this book! Most mathematicians use LaTeX, a document-preparation
system written in TeX that includes many predefined commands that would be
tedious to deal with in “raw” TeX. For example, the LaTeX source
\begin{equation}\label{eq:ZetaAgain1979}
\zeta(s) \ = \ \sum_{n=1}^\infty \frac{1}{n^s}
\end{equation}
produces (1979.1) below. The formula for the Riemann zeta function is enclosed in
an equation environment with the label eq:ZetaAgain1979 attached in case we
need to refer to it at some point.
Although TeX has many features that distinguish it from other digital typesetting
and publishing systems, we focus on only two points of interest here.
First, TeX is a programming language. The user writes a program that describes
both the content and layout of the document. The program is then interpreted
by TeX and produces the desired output. This design choice means that
TeX is extraordinarily flexible and customizable. The price is that TeX and related
systems can be hard for beginners to pick up. Fortunately there are many templates
available online. By looking at existing code and compiled documents you
can learn over 90% of what you need fairly quickly, and then search the web or ask
experts for the rest. For example, the second named author maintains TeX templates
(for papers and presentations) at
http://web.williams.edu/Mathematics/sjmiller/public_html/math/handouts/latex.htm.
The website also has a link
to a YouTube video that goes through writing simple articles with these templates.
The second point is that TeX uses sophisticated algorithms to lay out text.
Consider the problem of breaking a paragraph into justified lines. Each line must
begin at the left margin and end at the right margin, and there can be neither too
much nor too little space between words. Line breaks are allowed only between
words and, if necessary, inside a word at a known hyphenation point.
How would you solve this problem? The solution used in most digital typeset-
ting systems and word processors is a greedy strategy. We consider the words of
the paragraph one at a time, adding them to the current line. When the current
line is full, it is added to the page, and we begin adding words to the next line.
This approach is fast since it considers each word only once, but it can lead to
unappealing results because it never changes its mind about lines that have already
been added to the page. For example, the greedy algorithm may put vastly different
amounts of space between words on different lines, which looks terrible.
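The greedy strategy is simple enough to state in a few lines. The sketch below is our own illustration, not TeX's algorithm: it measures width in characters, ignores hyphenation, and reports the leftover space ("slack") on each line so that the unevenness is visible.

```python
def greedy_break(words, width):
    """Break words into lines of at most `width` characters, greedily."""
    lines, current, length = [], [], 0
    for word in words:
        needed = len(word) if not current else length + 1 + len(word)
        if current and needed > width:
            lines.append(current)            # the current line is full: commit it
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return lines


def slack(lines, width):
    """Unused characters on each line; justification must spread these between words."""
    return [width - (sum(map(len, line)) + len(line) - 1) for line in lines]


words = "aaa bb cc ddddd".split()
lines = greedy_break(words, 6)
print(lines)
print(slack(lines, 6))
```

With width 6 the greedy lines are "aaa bb", "cc", "ddddd", leaving 0, 4, and 1 spare characters. The alternative breaks "aaa", "bb cc", "ddddd" leave 3, 1, and 1, which is more even, but the greedy method cannot find them because it never revisits its first line.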
1979: Comments
Apéry’s constant. The year 1979 also saw Roger Apéry’s proof that ζ(3) is
irrational. Here ζ denotes the Riemann zeta function, defined by
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \tag{1979.1} \]
for Re s > 1; see the 1928, 1933, 1939, 1942, 1945, 1967, and 1987 entries for
more information. As a consequence of Apéry’s result, some people refer to ζ(3) as
Apéry’s constant.
It has long been known that ζ(k) is a rational multiple of π^k if k ≥ 2 is even¹
(see the 1919 and 1945 entries); the values of ζ(k) for odd k ≥ 3 remain largely
mysterious. To fifty decimal places,
ζ(3) = 1.2020569031595942853997381615114499907649862923405 . . . .
Is this a rational multiple of π^3? If so, the numbers involved must be enormous
since otherwise an explicit formula for ζ(3) would have been found long ago. Lots of
mathematicians have studied ζ(3). For example, Srinivasa Ramanujan discovered
the curious representation
\[ \zeta(3) = \frac{7}{180}\pi^3 - 2\sum_{k=1}^{\infty} \frac{1}{k^3\,(e^{2\pi k} - 1)}. \]
Although we do not have a closed-form expression for ζ(3), at least we know that it
is irrational. Moreover, ζ(k) is irrational for infinitely many odd k [2] and at least
one of ζ(5), ζ(7), ζ(9), ζ(11) is irrational [9].
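Ramanujan's representation, ζ(3) = (7/180)π^3 − 2 Σ_{k≥1} 1/(k^3 (e^{2πk} − 1)), converges extremely quickly, which makes it easy to test numerically. The sketch below compares it with a direct partial sum of the defining series in ordinary double precision; the function names and the truncation points are our own choices.

```python
import math


def zeta3_partial(N):
    """Partial sum of the defining series: 1/1^3 + 1/2^3 + ... + 1/N^3."""
    return sum(1 / n**3 for n in range(1, N + 1))


def zeta3_ramanujan(terms=10):
    """Ramanujan's representation, truncated after `terms` terms."""
    correction = sum(
        1 / (k**3 * (math.exp(2 * math.pi * k) - 1)) for k in range(1, terms + 1)
    )
    return 7 * math.pi**3 / 180 - 2 * correction


print(zeta3_partial(1000))
print(zeta3_ramanujan())
```

A handful of terms of Ramanujan's series already agrees with the decimal expansion quoted above to the limits of double precision, whereas the defining series is still off in the seventh decimal place after a thousand terms.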
and
\[ \int_0^1\!\!\int_0^1 \frac{s^a t^b \log(st)}{1 - st}\,ds\,dt
   = \frac{1}{a - b}\sum_{n=1}^{\infty}\left(\frac{1}{(n + a)^2} - \frac{1}{(n + b)^2}\right). \tag{1979.6} \]
The formulas (1979.2) and (1979.4) follow by expanding
\[ \frac{1}{1 - st} = \sum_{n=0}^{\infty} s^n t^n \]
and integrating. The others follow by differentiating (1979.2) and (1979.4) with
respect to a and b.
If p(s, t) is a polynomial of degree n with integral coefficients, then (1979.3)
and (1979.6) imply
\[ \int_0^1\!\!\int_0^1 \frac{p(s, t)\log(st)}{1 - st}\,ds\,dt = \frac{a_n + b_n\zeta(3)}{d_n^3}, \tag{1979.7} \]
in which a_n, b_n, d_n ∈ Z and d_n = lcm{1, 2, . . . , n}. We claim that d_n ≤ e^{1.01n} for
sufficiently large n. Indeed,
\[ d_n = \prod_{p \le n} p^{k(p)}, \]
in which
\[ k(p) = \left\lfloor \frac{\log n}{\log p} \right\rfloor \le \frac{\log n}{\log p} \]
is the largest exponent k for which p^k divides some number at most n. The prime number
theorem ensures that
\[ \log d_n = \sum_{p \le n} k(p)\log p \le \pi(n)\log n \sim n, \]
and hence
\[
\begin{aligned}
\int_0^1 \frac{du}{(1 - (1 - u)s)(1 - (1 - t)u)}
 &= \frac{1}{1 - (1 - s)t}\left[-s\,\frac{\log(1 - s)}{s} + (1 - t)\,\frac{\log t}{t - 1}\right]\\
 &= -\frac{\log(t(1 - s))}{1 - (1 - s)t}.
\end{aligned}
\]
Use (1979.8) with x = 1 − (1 − s)t and observe that the two integrals are equal.
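The preceding argument and (1979.7) ensure that
\[
\begin{aligned}
(-1)^n \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s)P_n(t)}{1 - (1 - (1 - s)t)u}\,ds\,dt\,du
 &= (-1)^n \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s)P_n(t)}{(1 - (1 - u)s)(1 - (1 - t)u)}\,ds\,dt\,du\\
 &= \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s)P_n(t)}{(1 - (1 - u)s)(1 - tu)}\,ds\,dt\,du
\end{aligned}
\]
is of the form
\[ \frac{a_n + b_n\zeta(3)}{d_n^3}. \]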
Integrating by parts n times with respect to each of the variables s and t yields
\[
\int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{P_n(s)P_n(t)}{(1 - (1 - u)s)(1 - tu)}\,ds\,dt\,du
 = \int_0^1\!\!\int_0^1\!\!\int_0^1 \frac{(s - s^2)^n (t - t^2)^n (u - u^2)^n}{\bigl((1 - (1 - u)s)(1 - tu)\bigr)^{n+1}}\,ds\,dt\,du.
\]
The nonnegative function
\[ f(s, t, u) = \frac{s(1 - s)\,t(1 - t)\,u(1 - u)}{(1 - (1 - u)s)(1 - tu)} \tag{1979.10} \]
vanishes on the boundary of [0, 1] × [0, 1] × [0, 1] and attains its maximum at
\[ (s, t, u) = \bigl(2 - \sqrt{2},\ 2 - \sqrt{2},\ \tfrac{1}{2}\bigr), \]
where
\[ f\bigl(2 - \sqrt{2},\ 2 - \sqrt{2},\ \tfrac{1}{2}\bigr) = 17 - 12\sqrt{2}. \]
Thus,
\[ \frac{a_n + b_n\zeta(3)}{d_n^3} = O\bigl((17 - 12\sqrt{2})^n\bigr). \]
Since d_n = O(e^{1.01n}) and
\[ e^{3.03}\,(17 - 12\sqrt{2}) \approx 0.60927, \]
it follows that the integer a_n + b_n ζ(3), which is nonzero because of the positivity
of the integrand (1979.10), satisfies
\[ 1 \le |a_n + b_n\zeta(3)| = O(0.61^n). \]
This contradiction proves that ζ(3) is irrational.
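The constants in this argument can be sanity-checked numerically. The sketch below is an illustration only, not part of the proof. It searches a coarse grid for the maximum of the function f in (1979.10) and prints where it occurs, evaluates e^{3.03}(17 − 12√2), and computes (log d_n)/n for d_n = lcm{1, . . . , n}. The grid size and the sample value n = 1000 are our own choices.

```python
import math
from functools import reduce


def f(s, t, u):
    """The function in (1979.10)."""
    return s * (1 - s) * t * (1 - t) * u * (1 - u) / ((1 - (1 - u) * s) * (1 - t * u))


def grid_max(steps=50):
    """Largest value of f on an interior grid, together with the grid point."""
    pts = [i / steps for i in range(1, steps)]
    return max((f(s, t, u), (s, t, u)) for s in pts for t in pts for u in pts)


def lcm_up_to(n):
    """d_n = lcm{1, 2, ..., n} as an exact integer."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), range(1, n + 1), 1)


bound = 17 - 12 * math.sqrt(2)
best, where = grid_max()
print(best, where, bound)                     # grid maximum versus 17 - 12*sqrt(2)
print(math.exp(3.03) * bound)                 # the constant compared with 0.61
print(math.log(lcm_up_to(1000)) / 1000)       # (log d_n)/n for n = 1000
```

The grid maximum stays just below 17 − 12√2, and the growth rate e^{3.03} coming from d_n^3 is not enough to offset the decay (17 − 12√2)^n, which is the heart of the contradiction.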
Bibliography
[1] R. Apéry, Irrationalité de ζ(2) et ζ(3) (French), Luminy Conference on Arithmetic, Astérisque
61 (1979), 11–13. MR3363457
[2] K. Ball and T. Rivoal, Irrationalité d’une infinité de valeurs de la fonction zêta aux entiers
impairs (French), Invent. Math. 146 (2001), no. 1, 193–207, DOI 10.1007/s002220100168.
MR1859021
[3] F. Beukers, A note on the irrationality of ζ(2) and ζ(3), Pi: A Source Book, Springer-
Verlag, 2000, 434-438. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.458.
2222&rep=rep1&type=pdf.
[4] D. E. Knuth and M. F. Plass, Breaking paragraphs into lines, Software: Practice and
Experience 11 (1981), no. 11.
[5] D. E. Knuth, The art of computer programming. Vol. 1, Fundamental algorithms, 3rd ed. [of
MR0286317], Addison-Wesley, Reading, MA, 1997. MR3077152
[6] D. E. Knuth, The art of computer programming, http://www-cs-faculty.stanford.edu/
~uno/taocp.html.
[7] S. D. Miller, An Easier Way to Show ζ(3) ∉ Q.
http://sites.math.rutgers.edu/~sdmiller/simplerzeta3.pdf.
[8] The TeX User Group, History of TeX, http://www.tug.org/whatis.html.
[9] V. V. Zudilin, One of the numbers ζ(5), ζ(7), ζ(9), ζ(11) is irrational (Russian), Uspekhi Mat.
Nauk 56 (2001), no. 4(340), 149–150, DOI 10.1070/RM2001v056n04ABEH000427; English
transl., Russian Math. Surveys 56 (2001), no. 4, 774–776. MR1861452
[10] W. Zudilin, Apéry’s theorem. Thirty years after, Int. J. Math. Comput. Sci. 4 (2009), no. 1,
9–19. https://arxiv.org/abs/math/0202159. MR2598496
1980
Hilbert’s Third Problem
Introduction
The Wallace–Bolyai–Gerwien theorem states that two rectilinear figures are
equidecomposable if and only if they have the same area. For example, if a square
and an equilateral triangle have the same area, then they can be dissected into
a finite number of polygonal pieces so that one figure can be rearranged into the
other; see Figure 1. The hypothesis that the original figures and the resulting pieces
are rectilinear is necessary (see the 1924 entry on the Banach–Tarski paradox).
The history of the theorem is convoluted. According to the detailed history
set forth in [3], the problem was posed in 1807 by the Scottish mathematician
William Wallace¹ (1768–1843). John Lowry arrived at the first proof of what is
now known as the Wallace–Bolyai–Gerwien theorem in 1814, although sadly his
contribution now appears largely unheralded and we are unable to find out much
information about him. The Hungarian mathematician Wolfgang Farkas Bolyai
(1775–1856) independently proved the result in 1832, followed shortly thereafter by
Paul Gerwien. Little is known about Gerwien, save that he was a “lieutenant in the
Prussian 22nd Infantry Regiment and instructor in the Royal Prussian Cadet Corps
in the early 1830s” and that he published two papers and an analytic geometry
textbook in the 1830s [7]. As for Farkas Bolyai, he is well known for the stern
warning to his son János Bolyai (1802–1860) about Euclid’s parallel postulate (see
the comments from the 1963 entry): “do not try the parallels in that way. . . . I have
measured that bottomless night, and all the light and all the joy of my life went
out there” [11].
Does the Wallace–Bolyai–Gerwien theorem have an analogue for solids in three
dimensions? At the International Congress of Mathematicians in Paris in 1900,
David Hilbert presented a host of problems to inspire and guide mathematicians
in the new century [8] (see the 1935 and 1970 entries). His third problem concerns
polyhedra, the analogues of polygons in three dimensions. Hilbert asked if two
polyhedra with equal volumes can always be dissected into finitely many polyhedra
so that one of the original polyhedra can be rearranged to form the other.
The problem was quickly dispatched by Max Dehn (1878–1952) in 1901. He
introduced a polyhedral invariant, the Dehn invariant, such that two polyhedra
are equivalent under dissection if and only if they have the same volume and same
Dehn invariant. Dehn proved that the cube and the regular tetrahedron have different
Dehn invariants and hence they are not equidecomposable [6]. More turns out to
be true. In 1980 (hence the topic for this year’s entry), Hans Debrunner showed
370 1980. HILBERT’S THIRD PROBLEM
that if a polyhedron tiles three-dimensional space, then its Dehn invariant is zero
[5]. Since regular tetrahedra have nonzero Dehn invariants, they cannot tile $\mathbb{R}^3$. See [2, 9]
and the references therein for a readable account of the method.
Dehn was not the first to solve Hilbert’s third problem. That honor goes to
Ludwik Antoni Birkenmajer (1855–1929), who solved it in 1882 for a math contest
held by the Academy of Arts and Sciences of Kraków [3]. The competitors were
challenged with the following:
Given any two tetrahedra with equal volumes, subdivide one of them
by means of planes, if it is possible, into the smallest possible number
of pieces that can be rearranged so as to form the other tetrahedron. If
this cannot be done at all or can be done only with certain restrictions,
then prove the impossibility or specify precisely those restrictions.
This is Hilbert’s third problem! Although it was judged at the time to be correct,
Birkenmajer’s solution was never published. It disappeared from history until it
was recently rediscovered and reevaluated; a modern appraisal deems it valid [3].
2 The regular tetrahedra do not have to be congruent; similarity is enough. This is because
One can show that 20 tetrahedra can touch at a point. This can be done in
such a way that the 20 opposite faces of these tetrahedra (not touching the point)
lie on the 20 faces of a regular icosahedron, whose centroid is the point at which the
tetrahedra touch. We can get an upper bound on how many tetrahedra can touch
by determining the solid angle subtended by a regular tetrahedron and dividing it
into a full solid angle 4π ≈ 12.56 steradians. In this way, it is found that there is
room for at most 22 tetrahedra to touch at a point. Is the answer 20, 21, or 22?
No one knows. The answer is suspected to be 20. Can one even rule out 22?
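The upper bound of 22 can be checked numerically. The sketch below assumes the standard formula for the solid angle at a vertex of a regular tetrahedron, namely the sum of the three dihedral angles arccos(1/3) minus π; that formula is not stated in the text above, and the variable names are illustrative only.

```python
import math

# Solid angle (in steradians) at a vertex of a regular tetrahedron.
# Assumption: each dihedral angle is arccos(1/3), and the solid angle of a
# trihedral corner equals the sum of its dihedral angles minus pi.
vertex_solid_angle = 3 * math.acos(1 / 3) - math.pi

full_sphere = 4 * math.pi  # total solid angle around a point

# The quotient is not an integer; its floor is the upper bound.
ratio = full_sphere / vertex_solid_angle
upper_bound = math.floor(ratio)

print(f"solid angle at a vertex: {vertex_solid_angle:.4f} sr")
print(f"4*pi / solid angle:      {ratio:.4f}")
print(f"upper bound on tetrahedra touching at a point: {upper_bound}")
```

The bound only counts available solid angle; it says nothing about whether 21 or 22 tetrahedra can actually be arranged without overlap.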
The problem can be turned into a two-dimensional problem by intersecting a
neighborhood of the point in question with a small sphere. How many equilateral
spherical triangles, with all angles arccos(1/3) (about 71 degrees) can be packed on
the surface of a unit sphere without overlap?
1980: Comments
Origin of the problem. The first written instance of the centennial problem
appears to be the paper of Lagarias and Chuanming Zong [10, p. 1545], in which
the number 20 is suggested as the correct answer. However, the problem seems
to have circulated in the community for many years. Paul Sally (1933–2013) told
Lagarias that he encountered the problem at Lincoln Labs in 1958.
A tiling problem. Here is a problem that has an elegant solution using in-
variants (an invariant is a quantity that is unchanged throughout a process). Is it
possible to tile a chessboard with two opposite corners removed (Figure 2(b)) with
1 × 2 dominoes (Figure 2(a))? If we assign white the number −1 and black the
number 1, then the sum of the values in any figure tiled by dominoes is zero. This
is an invariant: this value is the same regardless of how many dominoes are used
or where they are placed. Since the sum of the values in the modified chessboard
is −2, no such tiling is possible. See the 2003 entry for a more difficult variation on
this problem.
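The invariant can be verified by brute force. In this sketch the coloring convention (a square is black when its row and column indices have an even sum) and the function names are assumptions chosen so that the removed corners are black, which reproduces the value −2 quoted above.

```python
def square_value(row, col):
    """+1 for a black square, -1 for a white one.
    Assumed convention: a square is black when row + col is even."""
    return 1 if (row + col) % 2 == 0 else -1

def board_sum(squares):
    """Sum of the values of the given squares."""
    return sum(square_value(r, c) for r, c in squares)

full_board = [(r, c) for r in range(8) for c in range(8)]
removed = {(0, 0), (7, 7)}  # two opposite corners, both black here
mutilated = [sq for sq in full_board if sq not in removed]

# Every horizontal or vertical domino placement covers one square of each color.
domino_sums = {square_value(r, c) + square_value(r, c + 1)
               for r in range(8) for c in range(7)}
domino_sums |= {square_value(r, c) + square_value(r + 1, c)
                for r in range(7) for c in range(8)}

print("full board:", board_sum(full_board))
print("mutilated board:", board_sum(mutilated))
print("possible domino sums:", domino_sums)
```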
[Figure 4: (a) The perimeter of the infected area remains unchanged. (b) The perimeter of the infected area decreases by four.]
infected. Can we reduce this to approximately n initial zombies and still end with
a complete takeover? Yes. One can show that infecting the main diagonal suffices
and this requires only n zombies to start with.
Can the undead rule the world if there are only n − 1 zombies at the beginning?
No! One way to see this is to introduce the following monovariant. At time t,
let P (t) denote the perimeter of the infected area. One can show that P (t) is
nonincreasing; two cases are shown in Figure 4. Since the perimeter of the n × n
board is 4n and since the maximum possible perimeter of a configuration with n − 1
infected squares is 4(n − 1), it follows that at least one person will be safe if the
zombie apocalypse commences with only n − 1 zombies. We leave it as an exercise
to the reader to determine exactly how many people will be safe.
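The perimeter argument can be explored with a small simulation. The infection rule is not restated in the passage above, so the sketch assumes the usual one: a healthy square becomes infected once at least two of its four edge-neighbors are infected. The function names and the 6 × 6 board are illustrative choices.

```python
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def perimeter(infected):
    """Count unit edges between an infected square and anything that is not
    infected (a healthy square or the outside of the board)."""
    return sum((r + dr, c + dc) not in infected
               for r, c in infected for dr, dc in NEIGHBORS)

def spread(n, initial):
    """Run the infection to completion on an n x n board.

    Assumed rule: a healthy square becomes infected as soon as at least two
    of its four edge-neighbors are infected. Returns the final infected set
    and the perimeter recorded after every new infection."""
    infected = set(initial)
    history = [perimeter(infected)]
    changed = True
    while changed:
        changed = False
        for r in range(n):
            for c in range(n):
                if (r, c) in infected:
                    continue
                sick = sum((r + dr, c + dc) in infected for dr, dc in NEIGHBORS)
                if sick >= 2:
                    infected.add((r, c))
                    history.append(perimeter(infected))
                    changed = True
    return infected, history

n = 6
diagonal = [(i, i) for i in range(n)]
full, hist_full = spread(n, diagonal)            # n zombies on the diagonal
partial, hist_partial = spread(n, diagonal[:-1])  # only n - 1 zombies

print("infected with n zombies:    ", len(full), "of", n * n)
print("infected with n - 1 zombies:", len(partial), "of", n * n)
```

Each new infection replaces at least two boundary edges by at most two new ones, which is why the recorded perimeters never increase.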
to replace the two occurrences of $F_k$ with $F_{k-2} + F_{k-1}$, which has a smaller index
sum. These two processes can occur only finitely many times. When the procedure
terminates, there can be no repeats or adjacencies in the decomposition. Thus, we
have a Zeckendorf decomposition. See [4] for generalizations.
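For comparison, a Zeckendorf decomposition can also be computed directly. The sketch below uses the greedy algorithm, which is a different (standard) technique from the replacement procedure described above, and it assumes the indexing F_1 = 1, F_2 = 2, F_3 = 3, F_4 = 5, and so on; the function name is hypothetical.

```python
def zeckendorf(n):
    """Greedy decomposition of a positive integer into distinct,
    nonadjacent Fibonacci numbers (assumed indexing: 1, 2, 3, 5, 8, ...)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    fibs = [1, 2]
    while fibs[-1] + fibs[-2] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    parts = []
    for f in reversed(fibs):
        # Taking the largest Fibonacci number that fits leaves a remainder
        # smaller than the previous Fibonacci number, so no two summands
        # are ever adjacent.
        if f <= n:
            parts.append(f)
            n -= f
    return parts

print(zeckendorf(100))
```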
Bibliography
[1] Aristotle, On the Heavens, http://classics.mit.edu/Aristotle/heavens.html.
[2] M. Aigner and G. M. Ziegler, Proofs from The Book, 3rd ed., including illustrations by Karl
H. Hofmann, Springer-Verlag, Berlin, 2004. MR2014872
[3] D. Ciesielska and K. Ciesielski, Equidecomposability of polyhedra: a solution of Hilbert’s
third problem in Kraków before ICM 1900, Math. Intelligencer 40 (2018), no. 2, 55–63, DOI
10.1007/s00283-017-9748-4. https://doi.org/10.1007/s00283-017-9748-4. MR3814621
[4] K. Cordwell, M. Hlavacek, C. Huynh, S. J. Miller, C. Peterson, and Y. N. T. Vu, On summand
minimality of generalized Zeckendorf decompositions, https://arxiv.org/abs/1608.08764.
[5] H. E. Debrunner, Über Zerlegungsgleichheit von Pflasterpolyedern mit Würfeln (German),
Arch. Math. (Basel) 35 (1980), no. 6, 583–587 (1981), DOI 10.1007/BF01235384. MR604258
[6] M. Dehn, Ueber den Rauminhalt, Mathematische Annalen 55 (1901), no. 3, 465–478.
http://gdz.sub.uni-goettingen.de/dms/load/img/?PID=GDZPPN002258633.
[7] G. N. Frederickson, Dissections: plane & fancy, Cambridge University Press, Cambridge,
1997. MR1735254
[8] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190.
http://link.springer.com/article/10.1007%2FBF01206605. See also
http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[9] L. A. Krasilnikova, Hilbert’s Third Problem (A Story of Threes), MIT Admissions Blog,
February 25, 2015, http://mitadmissions.org/blogs/entry/hilberts-third-problem-a-story-of-threes
and http://sciencecow.mit.edu/me/hilberts_third_problem.pdf.
[10] J. C. Lagarias and C. Zong, Mysteries in packing regular tetrahedra, Notices Amer. Math. Soc.
59 (2012), no. 11, 1540–1549, DOI 10.1090/noti918.
http://www.ams.org/notices/201211/rtx121101540p.pdf. MR3027108
[11] J. J. O’Connor and E. F. Robertson, Farkas Wolfgang Bolyai, MacTutor History of Mathematics,
http://www-groups.dcs.st-and.ac.uk/history/Biographies/Bolyai_Farkas.html.
[12] Wikipedia, Dehn invariant, https://en.wikipedia.org/wiki/Dehn_invariant.
[13] Wikipedia, Hilbert’s third problem, https://en.wikipedia.org/wiki/Hilbert's_third_problem.
[14] Wolfram MathWorld, Dissection, http://mathworld.wolfram.com/Dissection.html.
1981
Introduction
In 1981, Walter Wilson Stothers (1946–2009) proved a remarkable theorem
about polynomials [10], later discovered independently by Richard C. Mason [3].
Although broader generalizations exist, we state it here for polynomials over the
complex numbers for the sake of simplicity. The Mason–Stothers theorem states
that if a, b, c are relatively prime polynomials, not all of which are constant, and
a + b = c, then
\[ \max\{\deg a, \deg b, \deg c\} \le \deg \operatorname{rad}(abc) - 1, \tag{1981.1} \]
in which rad f denotes the radical of f, that is, the product of the distinct irreducible
factors of f. For example, $\operatorname{rad}\bigl(x^3 (x + 1)^2\bigr) = x(x + 1)$. Since the field of complex
numbers is algebraically closed,
\[ \deg \operatorname{rad}(abc) = \text{number of distinct roots of } abc. \]
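Inequality (1981.1) is easy to test on small examples. The following sketch is only an illustration under stated assumptions: polynomials are lists of rational coefficients (constant term first), the number of distinct roots of f is computed as deg f − deg gcd(f, f′), and all helper names are hypothetical.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (highest-degree terms)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def deg(p):
    """Degree of p; the zero polynomial is given degree -1."""
    return len(trim(p)) - 1

def derivative(p):
    return trim([i * c for i, c in enumerate(p)][1:])

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def poly_mod(a, b):
    """Remainder of a divided by b (b nonzero), with exact rational arithmetic."""
    a = [Fraction(c) for c in trim(a)]
    b = [Fraction(c) for c in trim(b)]
    while a and len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for i, c in enumerate(b):
            a[shift + i] -= factor * c
        a = trim(a)
    return a

def poly_gcd(a, b):
    """Euclidean algorithm; the result is a gcd up to a constant factor."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return a

def distinct_roots(p):
    """Number of distinct complex roots: deg p - deg gcd(p, p')."""
    return deg(p) - deg(poly_gcd(p, derivative(p)))

def mason_stothers_gap(a, b, c):
    """deg rad(abc) - 1 - max(deg a, deg b, deg c); nonnegative if (1981.1) holds."""
    abc = poly_mul(poly_mul(a, b), c)
    return distinct_roots(abc) - 1 - max(deg(a), deg(b), deg(c))

# Example: x^2 + (2x + 1) = (x + 1)^2, coefficients listed constant term first.
a, b, c = [0, 0, 1], [1, 2], [1, 2, 1]
print(mason_stothers_gap(a, b, c))
```

In this example abc has the three distinct roots 0, −1/2, and −1, so the bound is attained with equality.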
What is the importance of the Mason–Stothers theorem? The centennial prob-
lem for this year is to prove the polynomial version of Fermat’s last theorem! If
that is not motivation enough, perhaps an integer analogue of the Mason–Stothers
theorem will interest you. Why should such an analogue exist? As students of
abstract algebra know, there are a great many similarities between integers and
polynomials. For example, they both form rings and they both enjoy unique fac-
torization into irreducibles. The integers have the primes as their basic building
blocks and the polynomials have the linear polynomials as theirs.
Here is a first attempt at an integer analogue of (1981.1). Suppose that a, b, c
are relatively prime integers and a + b = c. A naive generalization of (1981.1) is
\[ \max\{|a|, |b|, |c|\} \le \operatorname{rad}(abc), \tag{1981.2} \]
in which rad(abc) denotes the product of the distinct prime factors of abc. For
example, $\operatorname{rad}(200) = \operatorname{rad}(2^3 \cdot 5^2) = 2 \cdot 5 = 10$. Unfortunately, (1981.2) is false, even
if we replace rad(abc) with K rad(abc) for some large K > 0 [1]. Let p ≥ 2K be a
large prime and let
\[ a = 1, \quad b = 2^{p(p-1)} - 1, \quad \text{and} \quad c = 2^{p(p-1)}. \]
Then Euler’s generalization of Fermat’s little theorem ensures that $p^2$ divides b and
hence the strengthened (1981.2) implies that
\[ c \le K \operatorname{rad}(abc) \le \frac{2bK}{p} < \frac{2Kc}{p} \le c, \]
which is impossible.
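The counterexample family above can be checked for small primes. This is a minimal sketch: the trial-division helper `rad` and the choice of primes 3, 5, 7 are illustrative assumptions, not part of the cited argument.

```python
def rad(n):
    """Product of the distinct prime factors of n (simple trial division)."""
    n = abs(n)
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            result *= d
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        result *= n
    return result

ratios = {}
for p in (3, 5, 7):
    a = 1
    b = 2 ** (p * (p - 1)) - 1
    c = 2 ** (p * (p - 1))
    # Euler's theorem: 2^(p(p-1)) is congruent to 1 modulo p^2 for odd p.
    divisible = b % (p * p) == 0
    ratios[p] = c / rad(a * b * c)
    print(f"p = {p}: p^2 divides b: {divisible}, c / rad(abc) = {ratios[p]:.3f}")
```

A ratio above 1 means that max{|a|, |b|, |c|} exceeds rad(abc), so (1981.2) fails for that triple.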
376 1981. THE MASON–STOTHERS THEOREM
1981: Comments
Snyder’s proof of the Mason–Stothers theorem. In 2000, Noah Snyder
provided a simple proof of the Mason–Stothers theorem [9, 11]. Let a, b, c be rela-
tively prime polynomials, not all of which are constant, and suppose that a+b+c = 0
(it is more convenient to work with this symmetric version instead of a + b = c).
Then a, b, c are pairwise relatively prime since any polynomial that divides two of
a, b, c divides the third. Since
a + b + c = 0,
the three Wronskians
\[
\begin{aligned}
W(a, b) &= ab' - a'b, \\
W(b, c) &= bc' - b'c = b(-a' - b') - b'(-a - b) = ab' - a'b, \\
W(c, a) &= ca' - c'a = (-a - b)a' - (-a' - b')a = ab' - a'b
\end{aligned}
\tag{1981.3}
\]
are equal. Let
\[ W = W(a, b) = W(b, c) = W(c, a) \]
denote their common value. We claim that W is not the zero polynomial. Without
loss of generality, suppose that $a' \neq 0$; that is, a is not constant. If W = 0, then
$ab' = a'b$ and hence a divides $a'$ since gcd(a, b) = 1. However, this contradicts the
fact that $\deg a > \deg a'$. Thus, W is not identically zero.
The various formulas for W ensure that $\gcd(a, a')$, $\gcd(b, b')$, and $\gcd(c, c')$
each divide W. Since these three polynomials are pairwise relatively prime, W is
divisible by their product and hence
\[ \deg \gcd(a, a') + \deg \gcd(b, b') + \deg \gcd(c, c') \le \deg W \le \deg a + \deg b - 1. \tag{1981.4} \]
However,
\[
\begin{aligned}
\deg \gcd(a, a') &\ge \deg a - (\text{number of distinct roots of } a), \\
\deg \gcd(b, b') &\ge \deg b - (\text{number of distinct roots of } b), \\
\deg \gcd(c, c') &\ge \deg c - (\text{number of distinct roots of } c).
\end{aligned}
\tag{1981.5}
\]
Putting (1981.4) and (1981.5) together, simplifying, and using the relative primality
of a, b, c, yields
\[ \deg c \le (\text{number of distinct roots of } abc) - 1. \]
By symmetry, the same argument provides identical bounds for deg a and deg b. This
yields (1981.1), as desired.
In particular,
\[ |\pi^{-1}(z)| \le \deg \pi, \]
with equality unless f (w) − zg(w) has a double root.
If a, b, c are relatively prime polynomials, not all of which are constant, and
a + b = c, then let π = a/c. Every term on the right-hand side of (1981.6) is
nonnegative and hence
\[ 2 \deg \pi \ge 2 + \sum_{z \in \{0, 1, \infty\}} \bigl( \deg \pi - |\pi^{-1}(z)| \bigr). \]
Bibliography
[1] A. Granville and T. J. Tucker, It’s as easy as abc, Notices Amer. Math. Soc. 49 (2002),
no. 10, 1224–1231. http://www.ams.org/notices/200210/fea-granville.pdf. MR1930670
[2] G. A. Jones and D. Singerman, Complex functions: An algebraic and geometric viewpoint,
Cambridge University Press, Cambridge, 1987. MR890746
[3] R. C. Mason, Diophantine equations over function fields, London Mathematical Society Lec-
ture Note Series, vol. 96, Cambridge University Press, Cambridge, 1984. MR754559
[4] D. W. Masser, Open problems, Proceedings of the Symposium on Analytic Number Theory.
London: Imperial College, 1985.
[5] S. Mochizuki, http://www.kurims.kyoto-u.ac.jp/~motizuki/top-english.html.
[6] J. Oesterlé, Nouvelles approches du “théorème” de Fermat (French), Astérisque 161-162
(1988), Exp. No. 694, 4, 165–186 (1989). Séminaire Bourbaki, Vol. 1987/88. MR992208
[7] P. Ribenboim, 13 Lectures on Fermat’s Last Theorem, Springer, 1979.
[8] J. H. Silverman, The S-unit equation over function fields, Math. Proc. Cambridge Philos.
Soc. 95 (1984), no. 1, 3–4, DOI 10.1017/S0305004100061235. MR727073
[9] N. Snyder, An alternate proof of Mason’s theorem, Elem. Math. 55 (2000), no. 3, 93–94, DOI
10.1007/s000170050074. http://cr.yp.to/bib/2000/snyder.pdf. MR1781918
[10] W. W. Stothers, Polynomial identities and Hauptmoduln, Quart. J. Math. Oxford Ser. (2)
32 (1981), no. 127, 349–370, DOI 10.1093/qmath/32.3.349.
http://qjmath.oxfordjournals.org/content/32/3/349.extract. MR625647
[11] Wikipedia, Mason–Stothers theorem, https://en.wikipedia.org/wiki/Mason-Stothers_theorem.
1982
Introduction
The debate about whether the natural numbers and the primes are built into
the universe or whether they are human constructs has raged for centuries. In
A Mathematician’s Apology, G. H. Hardy (see the 1920, 1923, and 1940 entries)
asserts:
317 is a prime, not because we think so, or because our minds are
shaped in one way rather than another, but because it is, because
mathematical reality is built that way.
We make no attempt to wade into these deep waters here: you are welcome to
consider Figures 1 and 2 and draw your own conclusions.
Despite his legendary aversion to applicable mathematics, Hardy helped to so-
lidify some of the theoretical underpinnings of probability theory. Famed probabilist
Persi Diaconis (1945– ) wrote [1]:
Despite a true antipathy to the subject, Hardy contributed deeply
to modern probability. His work with Ramanujan begat probabilistic
number theory. His work on Tauberian theorems and divergent series
has probabilistic proofs and interpretations. Finally, Hardy spaces are
a central ingredient in stochastic calculus. . . .
I want to argue that Hardy had no knowledge of probability the-
ory and indeed had a genuine antipathy to the subject. To begin with,
Hardy loved clear rigorous argument. At the time he worked, the
mathematical underpinnings of probability were a vague mess. . . it was
only in 1933 that Kolmogorov gave a measure theoretic interpretation
of probability; a random variable was defined as a measurable func-
tion. Then one could see that early workers in probability; Bernoulli,
Laplace, Gauss, Chebychev, Markov were doing mathematics after all.
The naive approach to probability is full of pitfalls and paradoxes. It took
many years for the theory to be established on firm foundations. We have seen
several examples of paradoxes throughout this book. Each one provides a valuable
opportunity for further work: it means there is something incomplete or incompat-
ible with our view of mathematics. We began with paradoxes related to the notion
of infinity in the 1918 entry. Then we encountered the Banach–Tarski paradox in
the 1924 entry, which challenged our understanding of what area and volume are.
We continued with the liar’s paradox and Russell’s paradox in the 1929 entry and
discussed related issues in set theory and logic. This is just the start! We have
many other paradoxes in the entries before this, as well as a few more ahead (see
the Monty Hall problem in the 1990 entry).
382 1982. TWO ENVELOPES PROBLEM
For this year, we look at the two envelopes problem. A player can choose be-
tween two closed, identically constructed envelopes. The envelopes are labeled A
and B, respectively. Both envelopes contain money, although one of them contains
twice as much as the other. The one that contains more money cannot be deter-
mined without opening the envelopes. You initially choose envelope A but do not
open it. You are permitted to switch envelopes indefinitely until a final decision is
made. Which envelope should you open?
Where is the paradox? Let us examine the expected value of switching the
envelopes. For example, envelopes A and B each have a probability of 1/2 to
contain the greater amount. Suppose that we choose A and let x > 0 denote the
amount of money in the envelope. Should we switch? Since B has an equal chance
of having either value, half the time it should contain half as much as A, namely
x/2, and half the time it should contain twice, namely 2x. Thus, the expected
amount of money in B is
\[ \frac{1}{2}(2x) + \frac{1}{2}\left(\frac{x}{2}\right) = \frac{5}{4}x. \tag{1982.1} \]
Since this is larger than x, the amount in A, we should switch. Of course, the
same argument applies to B as well. Therefore, we should continue to switch back
and forth indefinitely since the expected return $(5/4)^n x$ after n switches tends to
infinity! In principle, this suggests that one can place $1 and $2 into two different
envelopes, juggle them for several minutes, then open one of the envelopes with the
expectation of receiving at least a billion dollars.
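To see how absurd the faulty reasoning is, one can count how many switches it would take for the purported expected return (5/4)^n x to reach a billion dollars. The sketch assumes a starting value of x = 1 dollar; the function name is hypothetical.

```python
from fractions import Fraction

def switches_needed(start, target):
    """Smallest n with start * (5/4)**n >= target, using exact rational
    arithmetic so that no floating-point rounding affects the count."""
    value, n = Fraction(start), 0
    while value < target:
        value *= Fraction(5, 4)
        n += 1
    return n

print(switches_needed(1, 10**9))
```

Fewer than a hundred switches suffice, which is why a few minutes of juggling would appear to be enough.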
It is not entirely clear who first came up with the two envelopes problem. A variant
of it appeared in a 1953 recreational mathematics book by Maurice Kraitchik
(1882–1957), who considered a wager between two rich men who wished to de-
termine whose necktie was more expensive. In the same year, Littlewood stated
another variant and credited it to Schrödinger (see the 1925 entry). What is beyond
a doubt is that the problem was popularized in 1982 by Scientific American writer
and puzzle enthusiast Martin Gardner [3]; see the 1914 entry.
The following problem was first proposed by Olle Häggström in 2013 [2]. It is
similar to Newcomb’s paradox, a variant of the two envelopes problem.
1982: Comments
Resolution of the two envelopes problem. Unlike many of the other para-
doxes that we have encountered, the issue for the two envelopes problem is easily
highlighted and explained. Let x denote the amount in the lesser of the two en-
velopes. Then the total amount of money in the two envelopes is 3x = x + 2x
and this cannot change. If A contains x dollars, then you gain x by switching. If
A contains 2x dollars, then you lose x by switching. Therefore, the expected gain
from switching is
\[ \frac{1}{2}x + \frac{1}{2}(-x) = 0, \]
as expected. The issue with (1982.1) is that the terms 2x and x/2 are conditioned
upon whether envelope B contains more or less money than A. Thus, a more
complicated argument involving conditional probability is required to pursue that
line of reasoning.
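The resolution can be confirmed both exactly and by simulation. In the sketch below, x is the smaller amount (as in the resolution above), envelope A is equally likely to hold x or 2x, and the function names, seed, and trial count are illustrative assumptions.

```python
import random
from fractions import Fraction

def exact_expected_gain(x):
    """Average the gain from switching A -> B over the two equally likely cases."""
    cases = [(x, 2 * x),   # A holds the smaller amount, B the larger
             (2 * x, x)]   # A holds the larger amount, B the smaller
    return sum(Fraction(b - a) for a, b in cases) / len(cases)

def simulated_gain(x, trials, seed=0):
    """Monte Carlo estimate of the average gain from always switching."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        envelopes = [x, 2 * x]
        rng.shuffle(envelopes)
        in_a, in_b = envelopes
        total += in_b - in_a
    return total / trials

print("exact expected gain:", exact_expected_gain(10))
print("simulated average gain:", simulated_gain(10, 100_000))
```

The simulated average fluctuates around zero, in agreement with the exact computation and in contrast to the factor 5/4 suggested by (1982.1).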
Bibliography
[1] P. Diaconis, G. H. Hardy and probability???, Bull. London Math. Soc. 34 (2002), no. 4,
385–402, DOI 10.1112/S002460930200111X. https://doi.org/10.1112/S002460930200111X.
MR1897417
[2] O. Häggström, Paradoxes in Probability Theory (Book Review), Notices of the AMS 3 (2013),
329–331.
[3] M. Gardner, Aha! Gotcha: Paradoxes to Puzzle and Delight, W. H. Freeman & Co, 1982.
[4] Wikipedia, Two envelopes problem, http://en.wikipedia.org/wiki/Two_envelopes_problem.
1983
Julia Robinson
Introduction
Julia Robinson was the first woman to become president of the American Math-
ematical Society (1983–1984). She shared her passion for mathematics with her
sister and biographer, Constance Reid, who said about Robinson:
She herself, in the normal course of events, would never have considered
recounting the story of her own life. As far as she was concerned, what
she had done mathematically was all that was significant. [8]
“Significant” is a fitting word indeed when speaking about the magnitude of Robin-
son’s mathematical accomplishments, especially regarding her contributions to the
eventual resolution of Hilbert’s tenth problem (see the 1970 entry).
In the early years of the 20th century, David Hilbert proposed twenty-three
problems that would shape mathematics for decades to come [3]. One of the un-
derlying themes of the list was the question of decidability. Given a mathematical
problem that falls into a certain class, is there a general algorithm that can solve
every problem in the class?
For his tenth problem, Hilbert considered the general solvability of Diophantine
equations. These are equations of the form
P (x1 , x2 , . . . , xn ) = 0,
1983: Comments
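Diophantine sets and the prime numbers. We say that $S \subseteq \mathbb{N}^j$ is a
Diophantine set if there is a Diophantine equation
\[ P(x_1, x_2, \ldots, x_j, y_1, y_2, \ldots, y_k) = 0 \]
such that $(n_1, n_2, \ldots, n_j) \in S$ if and only if there exist $m_1, m_2, \ldots, m_k \in \mathbb{N}$ for which
\[ P(n_1, n_2, \ldots, n_j, m_1, m_2, \ldots, m_k) = 0. \]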
For example, the set {1, 4, 9, 16, . . .} is Diophantine since we may let
\[ P(x, y) = y^2 - x. \]
Indeed, x is a perfect square if and only if $x = y^2$ for some y ∈ N, that is, if and
only if P (x, y) = 0. Similarly, the arithmetic progression {a + b, 2a + b, 3a + b, . . .}
is Diophantine, as witnessed by the polynomial P (x, y) = ay + b − x.
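The two examples above can be tested by searching for a witness y. The sketch below is a bounded search only: the function name, the search bound, and the sample values a = 3, b = 2 are assumptions made for illustration, and in general no bound on the witnesses is known in advance.

```python
def has_witness(P, x, search_bound):
    """Return True if P(x, y) == 0 for some natural number 1 <= y <= search_bound."""
    return any(P(x, y) == 0 for y in range(1, search_bound + 1))

def squares(x, y):
    """P(x, y) = y^2 - x, which vanishes exactly when x = y^2."""
    return y * y - x

def progression(x, y, a=3, b=2):
    """P(x, y) = a*y + b - x, which vanishes exactly when x = a*y + b."""
    return a * y + b - x

print([x for x in range(1, 30) if has_witness(squares, x, 30)])
print([x for x in range(1, 20) if has_witness(progression, x, 20)])
```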
Is the set of prime numbers Diophantine? Surely the primes are random and
unpredictable enough that they could never be encapsulated by a single polynomial,
right? In 1976, James P. Jones, Daihachiro Sato, Hideo Wada, and Douglas Wiens
found such a polynomial. They write:
Martin Davis, Yuri Matijasevich, Hilary Putnam and Julia Robinson
have proven that every recursively enumerable set is Diophantine, and
hence that the set of prime numbers is Diophantine. . . it follows that
the set of prime numbers is representable by a polynomial formula. In
this article such a prime representing polynomial will be exhibited in
explicit form. [5]
[Figure: (a) Cube; (b) Octahedron; (c) Dodecahedron.]
The set of prime numbers, they show, is precisely the set of positive values assumed
by the polynomial
\[
\begin{aligned}
P(a, b, c, \ldots, x, y, z)
= (k + 2) \bigl\{ 1 &- [wz + h + j - q]^2 - [(gk + 2g + k + 1)(h + j) + h - z]^2 \\
&- [16(k + 1)^3 (k + 2)(n + 1)^2 + 1 - f^2]^2 - [2n + p + q + z - e]^2 \\
&- [e^3 (e + 2)(a + 1)^2 + 1 - o^2]^2 - [(a^2 - 1)y^2 + 1 - x^2]^2 \\
&- [16r^2 y^4 (a^2 - 1) + 1 - u^2]^2 - [n + \ell + v - y]^2 \\
&- [(a^2 - 1)\ell^2 + 1 - m^2]^2 - [ai + k + 1 - \ell - i]^2 \\
&- [((a + u^2(u^2 - a))^2 - 1)(n + 4dy)^2 + 1 - (x + cu)^2]^2 \\
&- [p + \ell(a - n - 1) + b(2an + 2a - n^2 - 2n - 2) - m]^2 \\
&- [q + y(a - p - 1) + s(2ap + 2a - p^2 - 2p - 2) - x]^2 \\
&- [z + p\ell(a - p) + t(2ap - p^2 - 1) - pm]^2 \bigr\},
\end{aligned}
\]
which is of degree 25 and has 26 variables. An interesting corollary is that if p
is a prime number, then there is a computation that confirms the primality of p
that involves only 87 additions and multiplications. Indeed, one only need exhibit
natural numbers a, b, c, d, . . . , x, y, z so that P (a, b, c, . . . , x, y, z) = p. Of course,
finding such numbers is no easy task.
Mills’s constant. The existence of a single polynomial that encodes the prime
numbers is shocking. What about a simple formula that produces only prime
numbers? In 1947, William H. Mills proved the existence of a constant A (called
Mills’s constant) so that
\[ \lfloor A^{3^n} \rfloor \]
is prime for n = 0, 1, 2, . . .. Since there are actually uncountably many values of A
with this property, the term “Mills’s constant” is mildly inappropriate, especially
since Mills himself did not specify a precise numerical value for A. Assuming the
truth of the Riemann hypothesis, the smallest possible Mills’s constant (taking
n = 1, 2, . . ., since for this value ⌊A⌋ = 1 is not prime) begins
1.3063778838630806904686144926026057129167845851567136 . . . .
It was calculated to 6,850 decimal places by Chris K. Caldwell and Yuanyou Cheng
in 2005 [2]. It is unknown whether this constant is rational or irrational.
The proof of Mills’s result indicates how one might go about constructing such
a constant A. Unfortunately, one needs to know a lot about the distribution of the
prime numbers to compute A and hence Mills’s result is not a practical method for
producing primes.
Here is the proof. Let $p_n$ denote the nth prime number. Using knowledge about
the rate of growth of the Riemann zeta function on the critical line $\tfrac{1}{2} + it$, Albert
Ingham showed in 1937 that there is a constant K so that¹
\[ p_{n+1} - p_n < K p_n^{5/8} \]
for all n ∈ N [4]. We use this to show that if $N \ge K^8$, then there is a prime p so
that
\[ N^3 < p < (N + 1)^3. \tag{1983.1} \]
To see this, let $p_n$ be the largest prime less than $N^3$. Then
\[
\begin{aligned}
N^3 &< p_{n+1} \\
&< p_n + K p_n^{5/8} \\
&< N^3 + K N^{15/8} \\
&\le N^3 + N^2 \\
&< (N + 1)^3 - 1,
\end{aligned}
\]
as desired.
¹ Roger C. Baker (1947– ), Glyn Harman (1956– ), and János Pintz (1950– ) showed that the
exponent can be lowered from 5/8 = 0.625 to 0.525 [1]. Assuming the Riemann hypothesis, this
can be reduced even further and the constant K made explicit.
If $P_0 \ge K^8$ is prime, then (1983.1) permits us to find a sequence $P_0, P_1, P_2, \ldots$
of primes so that
\[ P_n^3 < P_{n+1} < (P_n + 1)^3 - 1. \tag{1983.2} \]
Define
\[ u_n = P_n^{3^{-n}} \quad \text{and} \quad v_n = (P_n + 1)^{3^{-n}}. \]
Then perform a few computations based upon (1983.2) to verify that
\[ u_n < u_{n+1} < v_{n+1} < v_n \]
for all n. Indeed, raising $P_n^3 < P_{n+1}$ to the power $3^{-(n+1)}$ gives $u_n < u_{n+1}$, and raising
$P_{n+1} + 1 < (P_n + 1)^3$ to the same power gives $v_{n+1} < v_n$. In particular, the sequence
$u_n$ is increasing and bounded above by $v_0$ and is therefore convergent. Define
\[ A = \lim_{n \to \infty} u_n \]
and observe that $u_n < A < v_n$, and hence
\[ P_n < A^{3^n} < P_n + 1, \]
for all n. Thus,
\[ \lfloor A^{3^n} \rfloor = P_n \]
is prime for n = 0, 1, 2, . . ..
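The construction can be imitated numerically. The sketch below makes two assumptions that differ from the proof: it starts from the prime 2 rather than from a prime at least K^8 (so the existence of a prime in each required interval is checked directly instead of being guaranteed by Ingham's bound), and it indexes the primes from 1, so that the k-th prime raised to the power 3^(-k) approximates the constant. The function names are hypothetical.

```python
def is_prime(m):
    """Trial division; adequate for the small values used here."""
    if m < 2:
        return False
    if m % 2 == 0:
        return m == 2
    d = 3
    while d * d <= m:
        if m % d == 0:
            return False
        d += 2
    return True

def next_prime(m):
    """Smallest prime strictly greater than m."""
    m += 1
    while not is_prime(m):
        m += 1
    return m

def mills_primes(first, count):
    """Build primes with P^3 < Q < (P + 1)^3 - 1 at each step, as in (1983.2)."""
    primes = [first]
    while len(primes) < count:
        p = primes[-1]
        q = next_prime(p ** 3)
        if q >= (p + 1) ** 3 - 1:
            raise ValueError("no prime found in the required interval")
        primes.append(q)
    return primes

primes = mills_primes(2, 4)
# With indexing from 1, the k-th prime to the power 3**(-k) increases toward A.
estimates = [p ** (3.0 ** -k) for k, p in enumerate(primes, start=1)]

print(primes)
print(estimates)
```

The estimates increase toward the decimal expansion quoted earlier, although the primes involved grow far too quickly for this to be a practical way of generating primes.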
Bibliography
[1] R. C. Baker, G. Harman, and J. Pintz, The difference between consecutive primes. II, Proc.
London Math. Soc. (3) 83 (2001), no. 3, 532–562, DOI 10.1112/plms/83.3.532. MR1851081
[2] C. K. Caldwell and Y. Cheng, Determining Mills’ constant and a note on Honaker’s problem,
J. Integer Seq. 8 (2005), no. 4, Article 05.4.1, 9. MR2165330
[3] D. Hilbert, Über das Unendliche, Math. Ann. 95 (1926), 161–190.
http://link.springer.com/article/10.1007%2FBF01206605. See also
http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf.
[4] A. E. Ingham, On the difference between consecutive primes, Quart. J. Math. Oxford 8 (1937),
255–266.
[5] J. P. Jones, D. Sato, H. Wada, and D. Wiens, Diophantine representation of the set of prime
numbers, Amer. Math. Monthly 83 (1976), no. 6, 449–464, DOI 10.2307/2318339.
https://www.jstor.org/stable/2318339. MR0414514
[6] Y. Matiyasevich, My Collaboration with Julia Robinson,
http://logic.pdmi.ras.ru/~yumat/personaljournal/collaborationjulia/index.html.
[7] W. H. Mills, A prime-representing function, Bull. Amer. Math. Soc. 53 (1947), 604, DOI
10.1090/S0002-9904-1947-08849-2. MR0020593
[8] C. Reid, The autobiography of Julia Robinson, College Math. J. 17 (1986), no. 1, 3–21,
DOI 10.2307/2686866.
https://www.maa.org/sites/default/files/pdf/upload_library/22/Polya/07468342.di020720.02p00912.pdf.
MR827630
[9] C. Reid, Being Julia Robinson’s sister, Notices Amer. Math. Soc. 43 (1996), no. 12, 1486–
1492. http://www.ams.org/notices/199612/reid.pdf. MR1416722
[10] Wikipedia, Hilbert’s tenth problem, http://en.wikipedia.org/wiki/Hilbert's_tenth_problem.
[11] Wikipedia, Julia Robinson. http://en.wikipedia.org/wiki/Julia_Robinson.
[12] C. Wood, Julia Robinson and Hilbert’s Tenth Problem (film review), Notices of the American
Mathematical Society (2008), 573–575. http://www.ams.org/notices/200805/tx080500573p.pdf.
1984
1984
Introduction
The year is the title of this entry. The other entries of this work honor mathe-
maticians or mathematical events; in a sense, this year honors math itself. Here 1984
refers to the classic dystopian novel, 1984, by George Orwell (1903–1950). Written
thirty-five years prior to 1984, it describes a world at perpetual war in which the
three major governments manipulate and control their populations. Some of the
methods of control are centuries old, such as informants, constant surveillance, and
fear. Others are either new or are given a clearer expression than before, such
as Newspeak (the language of Oceania, designed to limit freedom of thought by
restricting what can be discussed).
One of the most famous passages of the novel involves the “equation”
2 + 2 = 5,
which is false (unless one works modulo 1, as one does in certain Diophantine
approximation problems; see the 1922, 1931, 1938, and 1972 entries). The protag-
onist, Winston Smith, is thinking about Big Brother, the rule of the party, and
“alternative facts”:
In the end the Party would announce that two and two made five,
and you would have to believe it. It was inevitable that they should
make that claim sooner or later: the logic of their position demanded
it. Not merely the validity of experience, but the very existence of
external reality, was tacitly denied by their philosophy. The heresy
of heresies was common sense. And what was terrifying was not that
they would kill you for thinking otherwise, but that they might be
right. For, after all, how do we know that two and two make four? Or
that the force of gravity works? Or that the past is unchangeable? If
both the past and the external world exist only in the mind, and if the
mind itself is controllable. . . what then?
The Star Trek: The Next Generation episode Chain of Command, Part II fea-
tures a striking homage to 1984 that mirrors the powerful scene in which Winston
Smith is tortured by O’Brien (Orwell does not provide the character with a first
1984: Comments
So Long, and Thanks for All the Fish. The fourth book, So Long, and
Thanks for All the Fish, in Douglas Adams’s heralded “Hitchhiker’s guide trilogy”
was released in 1984. In the first book, The Hitchhiker’s Guide to the Galaxy, the
supercomputer Deep Thought (after a seven and a half million year long calcula-
tion) produced the “Answer to the Ultimate Question of Life, The Universe, and
Everything”: 42. This unhelpful response prompted the construction of a much
more sophisticated computer that would find out what the Ultimate Question ac-
tually was.
1 This is an inversion of the “O’Brien must suffer” theme (as fans refer to it) in Star Trek:
Deep Space Nine scripts. The character Miles O’Brien (2383– ), a noncommissioned everyman
whom audience members could relate to, was often subjected to various physical and emotional
tortures.
At the end of the second book, The Restaurant at the End of the Universe, after
ten million additional years, the new computer (we will not spoil its identity) reveals
that the Ultimate Question is, “What do you get if you multiply six by nine?”
Unfortunately, an error was unintentionally introduced into the computation that
rendered the result meaningless. Or did it?
Most readers will agree that 6 × 9 = 54. However, it is true that
6 × 9 = 42
if the computations are carried out in base 13 since
\[ 54 = 4 \cdot 13 + 2 \cdot 1 = (42)_{13}. \]
Adams has claimed that this was unintentional. On the other hand, the title “42”
of the 2007 Doctor Who episode is an intentional simultaneous homage to Douglas
Adams and the television show “24,” along with a reference to the approximate
running time of the episode (which proceeds in “real time”).
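The base 13 computation is easy to reproduce. The helper below is a generic positional conversion written for illustration; its name and digit alphabet are assumptions.

```python
def to_base(n, base):
    """Write a nonnegative integer n in the given base (2 through 36)."""
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)
        out.append(digits[remainder])
    return "".join(reversed(out))

print(to_base(6 * 9, 13))   # six times nine, written in base 13
print(int("42", 13))        # the base-13 numeral 42, converted back to decimal
```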
which ensures that the radius of convergence of the power series that defines $f$ is
greater than or equal to 1. If $f$ is one-to-one on $\mathbb{D}$, what can be said about the
growth of the Taylor coefficients $a_n$? Since $f(0) = a_0$, it makes sense to normalize
$f$ so that $f(0) = 0$; otherwise, $a_0$ could be any complex number. Then $f'(0) = a_1$
and there are two possibilities. If $a_1 = 0$, then $f'(0) = 0$ and basic complex analysis
tells us that $f$ is not one-to-one on any neighborhood of the origin. Thus, we may
assume that $a_1 \neq 0$. In this case, we may divide $f$ by $a_1$ and, without loss of
generality, we may as well assume that $a_1 = 1$:
\[ f(z) = z + a_2 z^2 + a_3 z^3 + \cdots . \]
Such a one-to-one analytic function is called a schlicht function.
Suppose that $f$ is schlicht. The Bieberbach conjecture, first posed in 1916,
states that $|a_n| \le n$ for $n \ge 2$ and, moreover, that if equality is attained for some
$n$, then
\[ f(z) = \frac{z}{(1 - \alpha z)^2} = \sum_{n=1}^{\infty} n \alpha^{n-1} z^n \tag{1984.1} \]
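As a sanity check on the series in (1984.1), the coefficients of $z/(1 - \alpha z)^2$ can be computed numerically by squaring the geometric series for $1/(1 - \alpha z)$ and shifting by one power of $z$. The Python sketch below does this. The function name and the sample value of $\alpha$ are our own choices, and we assume $|\alpha| = 1$, in which case the $n$th coefficient has modulus exactly $n$.

```python
import cmath

def rotated_koebe_coefficients(alpha: complex, terms: int) -> list:
    """Taylor coefficients a_0, ..., a_{terms-1} of z / (1 - alpha z)^2."""
    # 1 / (1 - alpha z) = sum_k alpha^k z^k
    geom = [alpha ** k for k in range(terms)]
    # Cauchy product of the geometric series with itself
    square = [sum(geom[i] * geom[k - i] for i in range(k + 1)) for k in range(terms)]
    # Multiplying by z shifts every coefficient up by one index
    return [0] + square[: terms - 1]

alpha = cmath.exp(0.7j)  # hypothetical sample point on the unit circle
coeffs = rotated_koebe_coefficients(alpha, 8)
for n in range(1, 8):
    print(n, round(abs(coeffs[n]), 12))  # the modulus of a_n should be n
```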
Bibliography
[1] L. Bieberbach, Über die Koeffizienten derjenigen Potenzreihen, welche eine schlichte Abbil-
dung des Einheitskreises vermitteln, Sitzungsber. Preuss. Akad. Wiss. Phys-Math. Kl. (1916),
940–955.
[2] L. de Branges, A proof of the Bieberbach conjecture, Acta Math. 154 (1985), no. 1-2, 137–
152, DOI 10.1007/BF02392821. http://link.springer.com/article/10.1007%2FBF02392821.
MR772434
[3] L. de Branges, Underlying concepts in the proof of the Bieberbach conjecture, Proceedings
of the International Congress of Mathematicians, Vol. 1, 2 (Berkeley, Calif., 1986), Amer.
Math. Soc., Providence, RI, 1987, pp. 25–42. http://www.mathunion.org/ICM/ICM1986.1/
Main/icm1986.1.0025.0042.ocr.pdf. MR934213
[4] J. J. O’Connor and E. F. Robertson, Ludwig Georg Elias Moses Bieberbach, Mac-
Tutor History of Mathematics, http://www-history.mcs.st-andrews.ac.uk/Biographies/
Bieberbach.html.
[5] G. Orwell, Nineteen Eighty-Four: A novel, Secker & Warburg, 1949.
[6] Texas Instruments Incorporated, The Great International Math on Keys Book, Texas Instru-
ments, 1976.
1985
The Jones Polynomial
Introduction
A knot is an embedding of a circle in three-dimensional space. We consider here
only tame knots; these are knots that can be physically realized with a string or
rope that has a nonzero thickness. Given two knots, how do we determine if they
are equivalent? That is, can we manipulate one of them into the other without
cutting? For example, is the trefoil knot (Figure 1(b)) equivalent to the unknot
(Figure 1(a))?
Knot theorists try to solve these sorts of problems by associating a mathemat-
ical object (an invariant) to each knot in such a way that equivalent knots are
assigned the same invariant. Thus, two knots with different invariants are truly
different knots: neither can be manipulated into the other (on the other hand, two
knots with the same invariant might turn out to be inequivalent). One desires knot
invariants that are simple to compute and compare. So far, nobody has come up
with a simple invariant that can distinguish between all nonequivalent knots.
Although knots have been used since ancient times, their mathematical study
began with Gauss’s development of linking numbers in 1833. Although physicists
were interested in knots for a period in the 1800s, the modern study of knots
only took off in the early 20th century. Max Dehn (see the 1980 entry), James
Waddell Alexander (1888–1971), and Kurt Reidemeister (1893–1971) were early
contributors. In particular, Alexander discovered the first knot polynomial [2]. The
so-called Alexander polynomial of a knot is a Laurent polynomial (negative powers
of the variable are permitted) with integer coefficients that is a knot invariant: two
equivalent knots share the same Alexander polynomial.
The year 1985 marked the publication of the explosive paper “A polynomial
invariant for knots via von Neumann algebras” by Vaughan F. R. Jones [7]. This
paper introduced the Jones polynomial of a knot, a Laurent polynomial invariant
that is distinct from the Alexander polynomial and that could settle problems
that were impervious to previous methods; see Figure 2. It also exposed links
between knot theory and physics that revitalized interest in the subject. Soon
afterwards, a variety of other invariants, such as the HOMFLY polynomial [4], were
discovered. Jones’s work had ushered in a new age in knot theory. See [1, Ch. 6]
for an introduction to the Alexander, Jones, and HOMFLY polynomials and how
to compute them.
[Figure 2: two knot diagrams, labeled with the polynomials $t^{-2} - t^{-1} + 1 - t + t^2$ and $t^3 + t^5 - t^8$.]
The existence of the Jones polynomial was not the most surprising part of the
paper [7]. The most stunning portion of the title is “von Neumann algebras,” a
highly technical and abstract branch of operator theory (think linear algebra in
infinite-dimensional spaces with a hefty dose of analysis) with no initially apparent
connections to low-dimensional topology at all. One could hardly have predicted
deep links between two more seemingly disparate parts of mathematics! It is like
claiming that techniques from the theory of large cardinals (transfinite numbers
that are so large they require additional axioms beyond ZFC) could be used to
solve open problems in biostatistics. Most mathematicians prior to Jones would
have dismissed a connection between von Neumann algebras and knot theory as
wild fantasy. For his amazing discovery, Jones was awarded the Fields Medal in
1990.
1985: Comments
Knot theory in other dimensions. It is only in three dimensions that knot
theory is interesting. In two dimensions, there is essentially only the unknot (the
unit circle) since a knot cannot cross itself. In four dimensions and higher, there
is too much freedom to manipulate knots. One can show that any knot in four
dimensions can be untangled to obtain the unknot; see Figure 3 for an intuitive
explanation of this phenomenon.
Where do von Neumann algebras come in? Knot theorists rapidly built
on the Jones polynomial and developed more direct constructions. They have
since discovered many new invariants without the use of von Neumann algebras.
Consequently, information about the technical details of Jones’s work is hard to
come by in the knot theory literature.
We attempt to sketch, in broad strokes, some of the details behind Jones’s dis-
covery. We thank James Tener for his assistance in this endeavor. First of all, a
von Neumann algebra is a collection of bounded linear operators on a (typically
\[ M_{-1} \subseteq M_0 \subseteq M_1 \subseteq M_2 \subseteq \cdots \]
Bibliography
[1] C. C. Adams, The knot book: An elementary introduction to the mathematical theory of
knots, revised reprint of the 1994 original, American Mathematical Society, Providence, RI,
2004. MR2079925
[2] J. W. Alexander, Topological invariants of knots and links, Trans. Amer. Math. Soc. 30
(1928), no. 2, 275–306, DOI 10.2307/1989123. MR1501429
[3] O. T. Dasbach and S. Hougardy, Does the Jones polynomial detect unknottedness?, Ex-
periment. Math. 6 (1997), no. 1, 51–56. http://www.or.uni-bonn.de/~hougardy/paper/
does_the.pdf. MR1464581
[4] P. Freyd, D. Yetter, J. Hoste, W. B. R. Lickorish, K. Millett, and A. Ocneanu, A new polyno-
mial invariant of knots and links, Bull. Amer. Math. Soc. (N.S.) 12 (1985), no. 2, 239–246,
DOI 10.1090/S0273-0979-1985-15361-3. MR776477
[5] A. Jackson and L. Traynor, Interview with Joan Birman, Notices Amer. Math. Soc. 54 (2007),
no. 1, 20–29. http://www.ams.org/notices/200701/fea-birman.pdf. MR2275922
[6] V. F. R. Jones, Index for subfactors, Invent. Math. 72 (1983), no. 1, 1–
25, DOI 10.1007/BF01389127. http://link.springer.com/article/10.1007%2FBF01389127.
MR696688
[7] V. F. R. Jones, A polynomial invariant for knots via von Neumann algebras, Bull. Amer.
Math. Soc. (N.S.) 12 (1985), no. 1, 103–111, DOI 10.1090/S0273-0979-1985-15304-2. http://
www.ams.org/journals/bull/1985-12-01/S0273-0979-1985-15304-2/. MR766964
[8] The Knot Atlas, The Rolfsen Knot Table, http://katlas.org/wiki/The Rolfsen Knot Table
[9] MathOverflow, Why should I care about the Jones polynomial?, https://mathoverflow.net/
questions/304486/why-should-i-care-about-the-jones-polynomial.
[10] E. Witten, Jones polynomial, https://www.ias.edu/ideas/2011/witten-knots-quantum-
theory.
1986
Sudokus and Look and Say
Introduction
Long ago movie theaters had double features, at which you could see two films
for the price of one. The Astor Theatre in Melbourne opened in 1936. It is one of
the few places in the world where one can still catch a double feature. In honor of
its 50th anniversary, we present a mathematical double feature: two “recreational”
math topics for the price of one!
Almost everyone has heard about Sudokus. Their rise to popularity began in
1986 with the puzzle company Nikoli in Japan. Since then, they have become so
ubiquitous that they now share space with crossword puzzles in newspapers and
airline magazines. One is presented with a partially filled 9 × 9 grid, which is
subdivided into nine blocks of size 3 × 3. The goal is to fill in the empty boxes
with digits in such a way that each row and column is a permutation of 1, 2, . . . , 9.
Moreover, each block must contain each of 1, 2, . . . , 9 exactly once. Figure 1 is a
good example; see Figure 2 in the comments for the solution.
Sudokus involve a lot of terrific mathematics. The first natural question to
ask is how many distinct Sudoku puzzles there are. For example, if we switch all
1’s and 9’s, we obtain a puzzle that looks different, but that is fundamentally the
same. There are other transformations that can be performed: rotate the puzzle
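[Figure 1: a Sudoku puzzle with 30 clues; blank cells are shown as dots.]
2 . . 7 6 . . . .
6 . 1 . . 3 . . .
4 9 . . 2 . . . .
3 2 . . . . 5 9 .
1 . 5 . . . 3 . 7
. 6 9 . . . . 1 4
. . . . 1 . . 5 2
. . . 9 . . 6 . 1
. . . . 4 2 . . 3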
Our example (Figure 1) has 30 clues and is therefore much simpler than the worst
case scenario: a Sudoku with only 17 clues. For more information about Sudoku,
see [4, 8, 10].
For our second feature, consider the famous see and say sequence (or look and
say sequence) introduced by John Horton Conway (1937– ) in 1986. The first few
terms are
\[ 1,\ 11,\ 21,\ 1211,\ 111221,\ 312211,\ \ldots \tag{1986.1} \]
The pattern is not immediately obvious because we are used to looking for patterns
that arise from mathematical processes. However, (1986.1) is generated linguis-
tically. It is created by the process suggested by its name. The first number is
“one 1”, so the second number is 11. The second number is “two 1’s”, so the third
number is 21, and so on. Can you show that no digit other than 1, 2, or 3 appears
in the sequence?
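The linguistic rule that generates (1986.1) takes only a few lines of code. The Python sketch below reads off runs of equal digits with `itertools.groupby`; the function names are our own. It also collects the digits that occur in the first 25 terms, which gives numerical evidence for the claim about the digits 1, 2, and 3 but is of course not a proof.

```python
from itertools import groupby

def look_and_say(term: str) -> str:
    """Describe runs of equal digits: '1211' is 'one 1, one 2, two 1s' -> '111221'."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

def first_terms(count: int, seed: str = "1") -> list:
    """The first `count` terms of the look and say sequence started at `seed`."""
    terms = [seed]
    while len(terms) < count:
        terms.append(look_and_say(terms[-1]))
    return terms

terms = first_terms(25)
print(terms[:6])                      # ['1', '11', '21', '1211', '111221', '312211']
print(sorted(set("".join(terms))))    # digits that occur in the first 25 terms
```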
Conway and his colleagues proved a number of remarkable facts about the
sequence (1986.1). The following is from the abstract of a talk on the subject given
by Alex Kontorovich at Columbia on March 23, 2004:
He [Conway] found that the sequence decomposed into certain recur-
ring strings. Categorizing these 92 strings and labeling them by the
atoms of the periodic table (from Hydrogen to Uranium), Conway was
able to prove that the asymptotic length of the sequence grows expo-
nentially, where the growth factor (now known as Conway’s constant)
is found by computing the largest eigenvalue of a 92 × 92 transition
matrix. Even more remarkable is the Cosmological Theorem, which
states that regardless of the starting string, every Look and Say se-
quence will eventually decay into a compound of these 92 atoms, in a
bounded number of steps. Conway writes that, although two indepen-
dent proofs of the Cosmological Theorem were verified, they were lost
in writing! It wasn’t until a decade later that Doron Zeilberger’s pa-
per (coauthored with his computer, Shalosh B. Ekhad) gave a tangible
proof of the theorem. We will discuss this weird and wonderful chem-
istry, and some philosophical consequences. The only prerequisite is
basic linear algebra.
Many variants of Conway’s sequence have been analyzed. Some use different
starting numbers, others use binary, and still others count the total number of digits
instead of the numbers of digits in blocks. See [1–3, 9] for more information about
the “look and say” sequence and its variations.
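The exponential growth described in the abstract can be observed directly. The following Python sketch (function names are ours) tracks the ratio of the lengths of consecutive terms, starting from the seed 1. After a few dozen steps the ratios hover near Conway's constant, which is approximately 1.3036. This is only a numerical illustration; it does not use the $92 \times 92$ transition matrix.

```python
from itertools import groupby

def look_and_say(term: str) -> str:
    """Replace each run of equal digits by its length followed by the digit."""
    return "".join(f"{len(list(run))}{digit}" for digit, run in groupby(term))

def length_ratios(steps: int, seed: str = "1") -> list:
    """Ratios len(next term) / len(current term) along the sequence."""
    ratios, term = [], seed
    for _ in range(steps):
        nxt = look_and_say(term)
        ratios.append(len(nxt) / len(term))
        term = nxt
    return ratios

ratios = length_ratios(45)
print([round(r, 4) for r in ratios[-3:]])  # close to Conway's constant
```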
1986: Comments
An algorithm for Sudokus. The Sudoku-solving approach that we suggest
below is not the fastest, but it connects to our 1947 entry on linear programming.
See [6] and the references therein for more information about algorithms and linear
programming.
Let $X = [x_{i,j}]_{i,j=1}^{9}$ be the $9 \times 9$ matrix that represents the unique solution to
the Sudoku puzzle. Some of the $x_{i,j}$ (hopefully at least 17 of them!) are given and
we must find the rest. We have the conditions that in each row, each column, and
each of the nine $3 \times 3$ blocks, each digit $1, 2, \ldots, 9$ appears exactly once. Linear
programming can solve problems such as these, although the additional restriction
that our entries are integers makes things more difficult.
A lot of packages exist for solving binary integer programming problems, which require
that the variables only assume the values 0 or 1. We can modify our approach by
choosing variables $x_{i,j,d}$ for $1 \le i, j, d \le 9$ such that
\[ x_{i,j,d} = \begin{cases} 1 & \text{if } x_{i,j} = d, \\ 0 & \text{otherwise.} \end{cases} \]
How many constraints on the $x_{i,j,d}$ do we have? If $I$ is the set of locations for which
we are given initial values, then we have $|I|$ conditions. However, this is dwarfed
by what remains. Each row, column, and $3 \times 3$ block has exactly one of each of
the nine digits. The fact that we require exactly one 5 in the first row yields the
constraint
\[ x_{1,1,5} + x_{1,2,5} + \cdots + x_{1,9,5} = 1. \]
This yields 81 constraints for the rows, 81 for the columns, and 81 for the blocks
(some may be redundant or unnecessary due to the placement of the initial values).
So we have about 240 constraints, give or take a few. However, we do not want the
nonzero values to correspond to the same cell and hence we add 81 more constraints:
\[ \sum_{d=1}^{9} x_{i,j,d} = 1, \qquad 1 \le i, j \le 9. \]
Each of these ensures that exactly one of the $x_{i,j,d}$ is nonzero for the cell in question.
The number of constraints can be reduced by letting $d$ range over the set
$S = \{1, 10, 10^2, \ldots, 10^8\}$ instead of the digits $1, 2, \ldots, 9$. We still need 81 constraints
of the preceding type to make sure we choose exactly one element of $S$ for each grid
location. The constraint on the $j$th row is now
\[ \sum_{d \in S} \sum_{i=1}^{9} d \cdot x_{i,j,d} = 111{,}111{,}111. \]
Our choice of $S$ ensures that the only way a row can sum to 111,111,111 is to have
exactly one element in the $j$th row that equals 1, exactly one that equals 10, and so
forth. This reduces the number of constraints by a huge amount, leaving us around
100 constraints to contend with.
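To make the binary formulation concrete, the Python sketch below encodes a completed grid as 0/1 indicator variables and evaluates the left-hand side of every "exactly one" constraint: 81 each for rows, columns, blocks, and cells. It is a checker rather than a solver, and the function names are our own. The grid is the solution from Figure 2. The last line checks the powers-of-ten row sums, under the assumption (ours) that the digit $v$ is replaced by $10^{v-1}$.

```python
SOLUTION = [
    [2, 3, 8, 7, 6, 9, 1, 4, 5],
    [6, 5, 1, 4, 8, 3, 7, 2, 9],
    [4, 9, 7, 1, 2, 5, 8, 3, 6],
    [3, 2, 4, 6, 7, 1, 5, 9, 8],
    [1, 8, 5, 2, 9, 4, 3, 6, 7],
    [7, 6, 9, 3, 5, 8, 2, 1, 4],
    [9, 7, 3, 8, 1, 6, 4, 5, 2],
    [5, 4, 2, 9, 3, 7, 6, 8, 1],
    [8, 1, 6, 5, 4, 2, 9, 7, 3],
]

def indicator(grid):
    """x[i][j][d] = 1 if cell (i, j) holds the digit d + 1, else 0 (0-based)."""
    return [[[1 if grid[i][j] == d + 1 else 0 for d in range(9)]
             for j in range(9)] for i in range(9)]

def constraint_sums(x):
    """Left-hand sides of the 4 * 81 = 324 'exactly one' constraints."""
    sums = []
    for d in range(9):
        for i in range(9):   # digit d + 1 appears once in row i
            sums.append(sum(x[i][j][d] for j in range(9)))
        for j in range(9):   # digit d + 1 appears once in column j
            sums.append(sum(x[i][j][d] for i in range(9)))
        for b in range(9):   # digit d + 1 appears once in block b
            r0, c0 = 3 * (b // 3), 3 * (b % 3)
            sums.append(sum(x[r0 + r][c0 + c][d]
                            for r in range(3) for c in range(3)))
    for i in range(9):
        for j in range(9):   # each cell holds exactly one digit
            sums.append(sum(x[i][j]))
    return sums

sums = constraint_sums(indicator(SOLUTION))
print(len(sums), all(s == 1 for s in sums))  # 324 True

# Assumed encoding: digit v is replaced by 10**(v - 1), so each row sums to 111,111,111.
print(all(sum(10 ** (v - 1) for v in row) == 111_111_111 for row in SOLUTION))
```

A binary integer programming package would search for an assignment of the $x_{i,j,d}$ that satisfies these same equations together with the $|I|$ conditions coming from the clues.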
[Figure 2: the solution to the Sudoku puzzle in Figure 1.]
2 3 8 7 6 9 1 4 5
6 5 1 4 8 3 7 2 9
4 9 7 1 2 5 8 3 6
3 2 4 6 7 1 5 9 8
1 8 5 2 9 4 3 6 7
7 6 9 3 5 8 2 1 4
9 7 3 8 1 6 4 5 2
5 4 2 9 3 7 6 8 1
8 1 6 5 4 2 9 7 3
Bibliography
[1] J. H. Conway, The weird and wonderful chemistry of audioactive decay, Eureka
46 (1986), 5-18. http://graphics8.nytimes.com/packages/pdf/crossword/GENIUS AT PLAY
Eureka Article.pdf.
[2] S. B. Ekhad and D. Zeilberger, Proof of Conway’s lost cosmological theorem, Elec-
tron. Res. Announc. Amer. Math. Soc. 3 (1997), 78–82, DOI 10.1090/S1079-6762-97-
00026-7. http://www.ams.org/journals/era/1997-03-11/S1079-6762-97-00026-7/S1079-
6762-97-00026-7.pdf. MR1461977
[3] Ó. Martı́n, Look-and-say biochemistry: exponential RNA and multistranded DNA, Amer.
Math. Monthly 113 (2006), no. 4, 289–307, DOI 10.2307/27641915. http://web.archive.
org/web/20061224154744/http://www.uam.es/personal_pdi/ciencias/omartin/Biochem.
PDF. MR2211756
[4] Math Explorer’s Club, The Math Behind Sudoku: References, http://www.math.cornell.
edu/~mec/Summer2009/Mahmood/References.html.
[5] G. McGuire, B. Tugemann, and G. Civario, There is no 16-clue Sudoku: solving the
Sudoku minimum number of clues problem via hitting set enumeration, Exp. Math. 23
(2014), no. 2, 190–217, DOI 10.1080/10586458.2013.870056. http://arxiv.org/abs/1201.
0749. MR3223774
[6] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[7] C. Rivera, Puzzle 657: Look and say sequence, http://www.primepuzzles.net/puzzles/
puzz_657.htm.
[8] E. Russell and F. Jarvis, There are 5472730538 essentially different Sudoku grids. . . and the
Sudoku symmetry group, Mathematical Spectrum 39 (2006), 54–58.
[9] Wikipedia, Look and Say, http://en.wikipedia.org/wiki/Look-and-say_sequence.
[10] Wikipedia, Sudoku, http://en.wikipedia.org/wiki/Sudoku.
1987
Primes, the Zeta Function, Randomness, and Physics
Introduction
In the 1942 entry, we saw that the Riemann zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} \]
can be analytically continued from the half-plane $\operatorname{Re} s > 1$ to $\mathbb{C} \setminus \{1\}$, with a simple
pole at $s = 1$ and with zeros at the negative even integers $-2, -4, \ldots$. The nontrivial
zeros of the zeta function lie in the critical strip $0 < \operatorname{Re} s < 1$; the Riemann
hypothesis asserts that these all lie on the vertical line $\operatorname{Re} s = \frac{1}{2}$ [3].
The Euler product formula (1933.3) suggests a profound relationship between
the zeta function and the prime numbers. We suggested in the 1939 entry that the
location of the nontrivial zeros determines the large-scale behavior of the primes.
This profound link between the continuous (complex analysis) and discrete (prime
numbers) has long fascinated mathematicians. The primes dance to the tune played
by the zeros of an analytic function!
The classical methods of analytic number theory have not yet produced a proof
of the Riemann hypothesis. There is a general opinion among experts that a new
approach is needed. One idea that has spurred a huge amount of research in the
last several decades is the Hilbert–Pólya conjecture, which says that the Riemann
hypothesis is true because there is an unbounded selfadjoint operator $H$ on some
Hilbert space so that the eigenvalues of
\[ \tfrac{1}{2} I + iH \]
are the nontrivial zeros of the zeta function (the eigenvalues of a selfadjoint
operator are real). Moreover, some expect that $H$ is the Schrödinger operator (see the
1925 entry) corresponding to some quantum system.
Although the conjecture first appeared in print in 1973 [8], it was originally pro-
posed by George Pólya sometime during 1912–1914. Hilbert’s role in the conjecture
is less clear:
David Hilbert did not work in the central areas of analytic number the-
ory, but his name has become known for the Hilbert–Pólya conjecture
for reasons that are anecdotal. [14]
In the early 1980s, Andrew Odlyzko investigated the provenance of the conjecture.
His correspondence with Pólya and Olga Taussky-Todd (1906–1995), who worked
with Hilbert in Göttingen, makes an interesting read [10].
1987: Comments
The explicit formula. In our previous entries on the Riemann zeta function
and the Riemann hypothesis, we alluded to the connection between the location of
the nontrivial zeros of the zeta function and the distribution of the prime numbers.
We address that issue here.
Let
\[ \psi(x) = \sum_{p^k \le x} \log p = \sum_{p \le x} \log p + \sum_{k=2}^{\infty} \sum_{p^k \le x} \log p \tag{1987.1} \]
and note that the main term in $\psi(x)$ comes from the first summand. Believe it or
not, the $\log p$ appears throughout the preceding for convenience. The reason has to
do with the Euler product representation of the zeta function and the $\log p$ terms
that arise upon taking logarithmic derivatives. Indeed, we have
\[ \frac{\zeta'(s)}{\zeta(s)} = -\sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^s}, \]
in which
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ for some prime } p, \\ 0 & \text{otherwise,} \end{cases} \]
is the von Mangoldt function, named after Hans Carl Friedrich von Mangoldt (1854–1925).
This computation is the first step in many delicate contour integrations; see
[7, Rem. 2.3.21 & Ch. 3] for details.
Let $\rho$ denote a typical zero of $\zeta(s)$ in the critical strip. Then $0 < \operatorname{Re} \rho < 1$; the
Riemann hypothesis is the statement that all such $\rho$ have $\operatorname{Re} \rho = \frac{1}{2}$. If $x$ is not a
prime power, then a hefty dose of complex analysis yields an explicit formula that
relates the sum (1987.1) over the prime numbers to a sum over the zeros of $\zeta(s)$:
\[ \psi(x) = x - \sum_{\rho} \frac{x^{\rho}}{\rho} - \frac{\zeta'(0)}{\zeta(0)} - \frac{1}{2} \log\left(1 - \frac{1}{x^2}\right). \]
There is a small technicality here. If $\zeta(\rho) = 0$, then $\zeta(\overline{\rho}) = 0$. In order to have the
preceding sum converge, we group the terms corresponding to $\rho$ and $\overline{\rho}$ together.
The Riemann hypothesis implies that
\[ |\psi(x) - x| \le \frac{1}{8\pi} \sqrt{x} \log^2 x, \qquad x \ge 74. \tag{1987.2} \]
The square root comes from the assumption that $|x^{\rho}| = x^{1/2}$; the extra logarithmic
factor appears for technical reasons. Through partial summation, one can use
(1987.2) to conclude that there is a constant $C$ such that
\[ |\pi(x) - \operatorname{Li}(x)| \le C x^{1/2} \log x, \]
in which $\pi(x)$ denotes the number of primes at most $x$ and
\[ \operatorname{Li}(x) = \int_2^x \frac{dt}{\log t} \sim \frac{x}{\log x} \]
denotes the offset logarithmic integral.
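For small $x$ the quantities in this discussion can be computed by brute force. The Python sketch below (function names are ours) evaluates the von Mangoldt function by trial division, sums it to obtain $\psi(x)$ as in (1987.1), and prints the error $|\psi(x) - x|$ alongside $\sqrt{x}$ for comparison. It is meant only as an illustration; serious computations use sieves and far larger ranges.

```python
import math

def von_mangoldt(n: int) -> float:
    """Lambda(n) = log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = n  # if no divisor up to sqrt(n) is found, n itself is prime
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            p = d  # smallest prime factor of n
            break
    while n % p == 0:
        n //= p
    return math.log(p) if n == 1 else 0.0

def chebyshev_psi(x: int) -> float:
    """psi(x) = sum of Lambda(n) over 1 <= n <= x."""
    return sum(von_mangoldt(n) for n in range(1, x + 1))

for x in (10, 100, 1_000, 10_000):
    psi = chebyshev_psi(x)
    print(x, round(psi, 3), round(abs(psi - x), 3), round(math.sqrt(x), 3))
```

Since $\psi(x)$ is the logarithm of the least common multiple of $1, 2, \ldots, x$, the value for $x = 10$ is $\log 2520$.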
These arguments can be reversed: if $\pi(x)$ is sufficiently close to $\operatorname{Li}(x)$, then
the Riemann hypothesis is true. One shows that the existence of a zero with real
part greater than 1/2 leads to a violation of the proposed bound on $|\pi(x) - \operatorname{Li}(x)|$
(if $\zeta(\rho) = 0$, then $\zeta(1 - \rho) = 0$; hence we may assume that an exception to the Riemann
hypothesis has real part greater than 1/2).
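How big is the nth prime? We have seen that the zeros of the Riemann zeta
function govern the large-scale distribution of the prime numbers. For example, the
prime number theorem is a consequence of the fact that $\zeta(1 + it) \neq 0$ for $t \in \mathbb{R}$.
This famous theorem asserts that
\[ \lim_{x \to \infty} \frac{\pi(x)}{x / \log x} = 1. \]
Since $\pi(p_n) = n$, we substitute $q = p_n$, do a bit of calculus, and obtain
\begin{align*}
\lim_{n \to \infty} \frac{n \log n}{p_n}
&= \lim_{n \to \infty} \frac{\pi(p_n) \log p_n}{p_n} \cdot \frac{\log n}{\log p_n} \\
&= \lim_{n \to \infty} \frac{\log n}{\log p_n} \\
&= \lim_{q \to \infty} \frac{\log \pi(q)}{\log q} \\
&= \lim_{q \to \infty} \frac{\log\left( \frac{\pi(q) \log q}{q} \right) + \log q - \log\log q}{\log q} \\
&= \lim_{q \to \infty} \left( \frac{\log 1}{\log q} + 1 - \frac{\log\log q}{\log q} \right) \\
&= 1.
\end{align*}
Thus, $p_n$ is asymptotic to $n \log n$.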
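The convergence of $p_n / (n \log n)$ to 1 is slow, as a quick experiment shows. The Python sketch below sieves the primes up to a fixed limit (110,000, chosen by us so that the 10,000th prime is included) and prints the ratio for a few values of $n$. The function names are our own.

```python
import math

def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # cross out p^2, p^2 + p, ... in a single slice assignment
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

PRIMES = primes_up_to(110_000)

def ratio(n: int) -> float:
    """p_n / (n log n), where p_n is the n-th prime (1-indexed)."""
    return PRIMES[n - 1] / (n * math.log(n))

for n in (10, 100, 1_000, 10_000):
    print(n, PRIMES[n - 1], round(ratio(n), 4))
```

Even at $n = 10{,}000$ the ratio is still about 1.14, which is one reason to look for more precise estimates of $p_n$.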
A more precise estimate is due to Michele Cipolla (1880–1947), who proved
that
\[ n(\log n + \log\log n - 1) < p_n < n(\log n + \log\log n) \]
for sufficiently large $n$ [2]. In fact, he showed that
\[ p_n = n \left[ \log n + \log\log n - 1 + \sum_{k=1}^{m} \frac{(-1)^{k+1} T_k(\log\log n)}{k \log^k n} \right] + O\!\left( \frac{n (\log\log n)^{m+1}}{\log^{m+1} n} \right), \]
The Cramér model. Based on the prime number theorem, Harald Cramér
(1893–1985) proposed a simple probabilistic model of the prime numbers that often
leads to decent predictions [9, 12] (see the comments for the 1975 entry for another
application of heuristic reasoning to the primes). The prime number theorem tells
Bibliography
[1] C. Axler, New estimates for the n-th prime number, https://arxiv.org/pdf/1706.03651.
pdf
[2] M. Cipolla, La determinazione assintotica dell’ nimo numero primo, Rend. Accad. Sci. Fis-
Mat. Napoli 3 (1902), no. 8, 132–166.
[3] J. B. Conrey, The Riemann hypothesis, Notices Amer. Math. Soc. 50 (2003), no. 3, 341–353.
http://www.ams.org/notices/200303/fea-conrey-web.pdf. MR1954010
[4] D. Hawkins, The random sieve, Math. Mag. 31 (1957/1958), 1–3, DOI 10.2307/3029322.
MR0099321
[5] J. Lorch and G. Ökten, Primes and probability: the Hawkins random sieve, Math. Mag.
80 (2007), no. 2, 112–119, DOI 10.1080/0025570x.2007.11953464. http://www.cs.bsu.edu/
homepages/jdlorch/mathmag116-123-lorch.pdf. MR2301878
[6] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton
Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[7] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[8] H. L. Montgomery, The pair correlation of zeros of the zeta function, Analytic number theory
(Proc. Sympos. Pure Math., Vol. XXIV, St. Louis Univ., St. Louis, Mo., 1972), Amer. Math.
Soc., Providence, R.I., 1973, pp. 181–193. MR0337821
[9] H. L. Montgomery and K. Soundararajan, Beyond pair correlation, Paul Erdős and his math-
ematics, I (Budapest, 1999), Bolyai Soc. Math. Stud., vol. 11, János Bolyai Math. Soc.,
Budapest, 2002, pp. 507–514. MR1954710
[10] A. Odlyzko, Correspondence about the origins of the Hilbert-Polya Conjecture, http://www.
dtc.umn.edu/~odlyzko/polya/index.html.
[11] A. M. Odlyzko, On the distribution of spacings between zeros of the zeta function, Math.
Comp. 48 (1987), no. 177, 273–308, DOI 10.2307/2007890. http://www.ams.org/journals/
mcom/1987-48-177/S0025-5718-1987-0866115-0/. MR866115
[12] T. Tao, 254A, Supplement 4: Probabilistic models and heuristics for the primes (optional),
https://terrytao.wordpress.com/tag/cramers-random-model/.
[13] Wikipedia, Lyapunov CLT, https://en.wikipedia.org/wiki/Central_limit_theorem#
Lyapunov_CLT.
[14] Wikipedia, Hilbert–Pólya conjecture, https://en.wikipedia.org/wiki/Hilbert-Polya
conjecture.
1988
Mathematica
Introduction
On June 23, 1988, Mathematica 1.0 was launched. What is Mathematica?
\[ f(x) = \frac{1}{x^2 (x - 1)^3 (x + 1)}. \]
This is the sort of grueling symbolic computation that is well suited for the com-
puter. We enter
into Mathematica (perhaps using its more appealing, modern interface) and imme-
diately receive the answer
There are, of course, more elegant ways to receive the output. However, this is
what users in the late 1980s would have seen on their screens. A nifty feature of
recent versions of Mathematica is the ability to get output in LaTeX (see the 1979
entry). A simple cut-and-paste from the Mathematica window provides the LaTeX
Of course, Mathematica is not only the backbone of Wolfram Alpha and the
hidden savior of calculus students everywhere. It has long been used for serious
mathematical research. Both authors have used Mathematica computations in their
own research, particularly in number theory, linear algebra, complex analysis, and
statistics. Its flexibility is remarkable:
It is often said that the release of Mathematica marked the beginning of
modern technical computing. Ever since the 1960s individual packages
had existed for specific numerical, algebraic, graphical and other tasks.
But the visionary concept of Mathematica was to create once and for
all a single system that could handle all the various aspects of technical
computing in a coherent and unified way. The key intellectual advance
that made this possible was the invention of a new kind of symbolic
computer language that could for the first time manipulate the very
wide range of objects involved in technical computing using only a
fairly small number of basic primitives. [12]
It does not take much imagination to see how computing software could be
useful in applied mathematics or statistics research. How can Mathematica be
used in pure mathematics research? Suppose that we wanted to explore the prime
numbers. A first step might be to examine the first 100 of them. The command
Table[Prime[n], {n, 1, 100}]
produces the output
{2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137,
139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211,
223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283,
293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379,
383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461,
463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541}
from which we can observe several patterns. Aside from the anomalous primes 2
and 5, each prime number appears to end in 1, 3, 7, or 9. This is explained easily
enough: numbers that end in 0, 2, 4, 5, 6, 8 are divisible by 2 or by 5. Do the primes
have a favorite, among the final digits 1, 3, 7, and 9? The command
Table[Length[
Select[Prime[Range[1000000]],
Mod[#, 10] == {1, 3, 7, 9}[[i]] &]], {i, 1, 4}]
produces
{249934, 250110, 250014, 249940}
This tells us that among the first 1,000,000 primes, there are 249,934 that end in
1, 250,110 that end in 3, and so forth. The split looks remarkably even.
Trying a few different bases would reveal similarly equitable splits. From this, it
is a short step to conjecturing Dirichlet’s theorem on primes in arithmetic progres-
sions (see the 1913 entry). A more detailed analysis could also reveal Chebyshev’s
bias (there are usually slightly more primes of the form 4k + 3 than 4k + 1 up to
a given threshold) [2, 7], or perhaps even the recent observation of Robert Lemke
Oliver and Kannan Soundararajan (1973– ) that the primes have some definite
thoughts about who they sit next to [3, 5]:
Lemke Oliver and Soundararajan saw that in the first billion primes,
a 1 is followed by a 1 about 18% of the time, by a 3 or a 7 each 30% of
the time, and by a 9 22% of the time. They found similar results when
they started with primes that ended in 3, 7 or 9: variation, but with
repeated last digits the least common. The bias persists but slowly
decreases as numbers get larger. [4]
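The bias described in this quotation is already visible in a much smaller sample. The Python sketch below tallies the last digits of consecutive primes up to two million (a limit chosen by us for speed, far short of the first billion primes) and prints how often a prime ending in 1 is followed by a prime ending in 1, 3, 7, or 9. The function names are our own.

```python
import math
from collections import Counter

def primes_up_to(limit: int) -> list:
    """Sieve of Eratosthenes: all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(limit + 1) if sieve[n]]

def last_digit_transitions(primes: list) -> Counter:
    """Count pairs (last digit of p, last digit of the next prime)."""
    return Counter((p % 10, q % 10) for p, q in zip(primes, primes[1:]))

# Skip 2, 3, and 5 so that every last digit is 1, 3, 7, or 9.
primes = [p for p in primes_up_to(2_000_000) if p > 5]
counts = last_digit_transitions(primes)
after_one = {b: counts[(1, b)] for b in (1, 3, 7, 9)}
total = sum(after_one.values())
for b, c in after_one.items():
    print(f"1 -> {b}: {c / total:.3f}")
```

If consecutive last digits were independent and equally likely, each line would read roughly 0.250.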
Simply put, computational power can reveal hidden patterns in classical objects
that could not be guessed at otherwise. This can lead to new conjectures about the
observed behavior and eventually new theorems. See the comments for an example
of this method of discovery.
Computer algebra systems can be used to produce startling identities that can
later be verified. This is similar to proofs by induction: you know the answer already
and need only to justify it. As a great example, the Mathematica commands
Sum[Binomial[n, k]^2, {k, 0, n}]
Sum[k Binomial[n, k]^2, {k, 0, n}]
Sum[k^2 Binomial[n, k]^2, {k, 0, n}]
Sum[k^3 Binomial[n, k]^2, {k, 0, n}]
yield the outputs
Binomial[2 n, n]
1/2 n Binomial[2 n, n]
n^2 Binomial[-2 + 2 n, -1 + n]
1/2 n^2 (1 + n) Binomial[-2 + 2 n, -1 + n]
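These are the identities
\begin{align*}
\sum_{k=0}^{n} \binom{n}{k}^2 &= \binom{2n}{n}, \\
\sum_{k=0}^{n} k \binom{n}{k}^2 &= \frac{1}{2} n \binom{2n}{n}, \\
\sum_{k=0}^{n} k^2 \binom{n}{k}^2 &= n^2 \binom{2n-2}{n-1}, \\
\sum_{k=0}^{n} k^3 \binom{n}{k}^2 &= \frac{1}{2} n^2 (n+1) \binom{2n-2}{n-1}.
\end{align*}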
and
\[ \sum_{k=0}^{n} k^5 \binom{n}{k}^2 ? \]
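Identities of this kind are easy to test numerically before one attempts a proof. The Python sketch below checks the four closed forms above with exact integer arithmetic for $1 \le n < 30$, and then prints a few values of the fourth-power sum as raw data for guessing a formula. The function name `moment` is our own, and a finite check is evidence, not a proof.

```python
from math import comb

def moment(n: int, power: int) -> int:
    """sum_{k=0}^{n} k^power * C(n, k)^2, computed exactly."""
    return sum(k ** power * comb(n, k) ** 2 for k in range(n + 1))

for n in range(1, 30):
    assert moment(n, 0) == comb(2 * n, n)
    assert 2 * moment(n, 1) == n * comb(2 * n, n)
    assert moment(n, 2) == n ** 2 * comb(2 * n - 2, n - 1)
    assert 2 * moment(n, 3) == n ** 2 * (n + 1) * comb(2 * n - 2, n - 1)
print("all four identities hold for 1 <= n < 30")

# Data for experimenting with the fourth-power sum.
print([moment(n, 4) for n in range(1, 6)])
```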
1988: Comments
Wilf–Zeilberger algorithm. We have seen that computers can spit out novel
identities. It would be better if they could provide humanly understandable proofs
of those identities too. In 1990, Herbert Wilf and Doron Zeilberger (1950– ) came
up with an algorithm to do just that [6, 8, 9].
Twin primes and their biases. Here is a true story about how a few Mathematica
computations led to a new discovery. Recall that a primitive root modulo
$n$ is a generator of the multiplicative group $(\mathbb{Z}/n\mathbb{Z})^{\times}$. For example, 2 is a primitive
root modulo 5 since $2^1, 2^2, 2^3, 2^4 \equiv 2, 4, 3, 1 \pmod{5}$, respectively. If $p$ is prime,
then a theorem of Gauss ensures that $(\mathbb{Z}/p\mathbb{Z})^{\times}$ has exactly $\varphi(p - 1)$ primitive roots,
in which $\varphi$ denotes the Euler totient function (see the 1977 entry).
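These definitions translate directly into a brute-force computation. The Python sketch below (function names are ours) lists the primitive roots of a prime $p$ by computing the multiplicative order of each residue, and compares the count with $\varphi(p - 1)$ for a few small primes. It is suitable only for small $p$.

```python
from math import gcd

def multiplicative_order(a: int, n: int) -> int:
    """Smallest k >= 1 with a^k = 1 (mod n); assumes gcd(a, n) = 1."""
    k, power = 1, a % n
    while power != 1:
        power = power * a % n
        k += 1
    return k

def totient(n: int) -> int:
    """Euler's totient function, by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def primitive_roots(p: int) -> list:
    """Primitive roots modulo a prime p: residues of order p - 1."""
    return [a for a in range(1, p) if multiplicative_order(a, p) == p - 1]

print(primitive_roots(5))  # [2, 3]
for p in (7, 11, 13, 29, 31):
    print(p, len(primitive_roots(p)), totient(p - 1))
```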
One day last year, in Professor Stephan Garcia’s Number Theory and
Cryptography class, the lesson took a surprising turn. To make a point
about the use of seemingly random patterns in cryptography, Garcia
had just flashed onto the screen a chart of the first 100 [actually 20]
prime numbers and all of their primitive roots. . . . [10]
Needless to say, the chart (Table 1) was produced by Mathematica. The command
PrimitiveRootList[p] provides a list of the primitive roots of p.
Looking at the chart, Elvis Kahoro ’20 noticed something interesting
about pairs of primes known as “twins”—primes that differ by exactly
two, such as 29 and 31 [apart from 3 and 5]. The smaller of the pair
always seemed to have as many or more primitive roots than the larger
of the two. He wondered if that was always true.
“So I just asked what I thought was a random question,” Kahoro
recalls. It was the kind of curious question he was known for asking all
through his school years, sometimes with unfortunate results. “Some
teachers would get mad at me for asking so many questions that led
us off the topic,” he remembers.
For more information about twin primes, see the 1919 and 1923 entries.
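Since a prime $p$ has exactly $\varphi(p - 1)$ primitive roots, the question raised in class can be explored without listing any primitive roots at all: for a twin prime pair $(p, p + 2)$ one simply compares $\varphi(p - 1)$ with $\varphi(p + 1)$. The Python sketch below does this for twin primes below 5,000 (a limit chosen by us) and collects the pairs in which the larger prime has strictly more primitive roots. The function names are our own.

```python
from math import gcd, isqrt

def is_prime(n: int) -> bool:
    """Trial division; adequate for small n."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def totient(n: int) -> int:
    """Euler's totient function, by direct counting."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def twin_pairs(limit: int) -> list:
    """Twin prime pairs (p, p + 2) with p + 2 <= limit."""
    return [(p, p + 2) for p in range(2, limit - 1)
            if is_prime(p) and is_prime(p + 2)]

pairs = twin_pairs(5_000)
# A prime q has totient(q - 1) primitive roots.
exceptions = [(p, q) for p, q in pairs if totient(p - 1) < totient(q - 1)]
print(len(pairs), "twin prime pairs examined")
print("pairs where the larger prime has more primitive roots:", exceptions)
```

The pair $(3, 5)$ appears in the output, in agreement with the bracketed remark in the quotation, and so does $(2381, 2383)$, so the pattern Kahoro noticed in the chart does not persist for every twin prime pair.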
Bibliography
[1] S. R. Garcia, F. Luca, and E. Kahoro, Primitive root bias for twin primes, Experimental
Mathematics, in press. https://www.tandfonline.com/doi/full/10.1080/10586458.2017.
1360809.
[2] A. Granville and G. Martin, Prime number races, Amer. Math. Monthly 113 (2006), no. 1,
1–33, DOI 10.2307/27641834. MR2202918
[3] E. Klarreich, Mathematicians Discover Prime Conspiracy, https://www.quantamagazine.
org/mathematicians-discover-prime-conspiracy-20160313.
[4] E. Lamb, Peculiar pattern found in ‘random’ prime numbers: last digits of nearby primes have
‘anti-sameness’ bias, Nature (online), https://www.nature.com/news/peculiar-pattern-
found-in-random-prime-numbers-1.19550.
[5] R. J. Lemke Oliver and K. Soundararajan, Unexpected biases in the distribution of
consecutive primes, Proc. Natl. Acad. Sci. USA 113 (2016), no. 31, E4446–E4454,
DOI 10.1073/pnas.1605366113. http://www.pnas.org/content/pnas/113/31/E4446.full.
pdf. MR3624386
[6] P. Paule and M. Schorn, A Mathematica Version of Zeilberger’s Algorithm for Proving Bi-
nomial Coefficient Identities, J. Symbolic Computation 11 (1994), 1–25.
[7] M. Rubinstein and P. Sarnak, Chebyshev’s bias, Experiment. Math. 3 (1994), no. 3, 173–197.
MR1329368
[8] H. S. Wilf, Computer programs from the book “A = B”, and related programs, https://www.
math.upenn.edu/~wilf/progs.html.
[9] H. S. Wilf and D. Zeilberger, Towards computerized proofs of identities, Bull. Amer. Math.
Soc. (N.S.) 23 (1990), no. 1, 77–83. https://projecteuclid.org/euclid.bams/1183555718.
[10] Staff writer, How to Advance Mathematics By Asking the Right Questions, Pomona College
Magazine, Spring 2018, 20-21. http://magazine.pomona.edu/2018/spring/how-to-advance-
mathematics-by-asking-the-right-questions/.
1989
PROMYS
Introduction
In 1989, David Fried and Glenn H. Stevens (1953– ), graduates of Arnold Ross’s
Secondary Science Training Program (see the 1957 entry), cofounded PROMYS
(Programs in Mathematics for Young Scientists). Since then, over 1,000 students
have gone through the program. Currently about 80 high school students each
year come to Boston University for six weeks of challenging mathematics. They
are mentored by top graduate students and faculty drawn from all over the world.
Programs like PROMYS play a key role in exciting students to pursue mathemat-
ics and teaching older students how to mentor, design classes, and develop research
programs. In addition to standard classes and challenging problems, students par-
ticipate in research and attend advanced lectures on topics ranging from “The
Schoenflies Conjecture and Morse Theory” to “Statistical Inference and Modeling
the Unseen: How Bayesian statistics powers Google’s voice search.”
The second named author spoke at PROMYS several times. In 2009, he gave a
talk on heuristics and ballpark estimates. Informal argumentation is an important
skill for aspiring mathematicians to develop. The centennial problem for this year
concerns an application of heuristic reasoning to an old problem in number theory.
The Fermat numbers are defined by
$$F_n = 2^{2^n} + 1.$$
Notice a pattern? The first three are prime, and a little work shows that $F_3$ and
$F_4$ are prime too. What about $F_5 = 4{,}294{,}967{,}297$? Is it a Fermat prime as
well? The Fermat numbers grow so rapidly that things soon get beyond the realm
of computation. For example, $F_{10}$ has 309 digits! Pierre de Fermat conjectured
that each $F_n$ is prime, although he was unable to prove this. What does heuristic
reasoning suggest?
1989: Comments
Why the weird exponent? Some authors consider $2 = 2^0 + 1$ a Fermat
prime because of their preference for the formula $2^n + 1$ [2]. However, this is not
widely adhered to. Why is it that we search for primes of the form $2^{2^n} + 1$ instead
of $2^n + 1$? We start with the identity
Although this was an impressive computational feat at the time, a modern computer
factors F5 faster than the blink of an eye. The prime factorizations of the Fermat
numbers seem to involve some large primes (this is partially explained by the Euler–
Lucas theorem described at the end of the following section). For example, a few
seconds on a desktop computer reveals the prime factorizations
$$F_6 = 274177 \times 67280421310721,$$
$$F_7 = 59649589127497217 \times 5704689200685129054721,$$
$$F_8 = 1238926361552897 \times 93461639715357977769163558199606896584051237541638188580280321.$$
Prime factorizations are known for only a few more Fermat numbers. No Fermat
primes besides the original five (1989.1) have been found.
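The factorizations quoted above are easy to confirm by multiplication. The following Python sketch does only that: it checks that each product equals the corresponding Fermat number, and it does not test the factors for primality. The helper names `fermat` and `check` are ours, and the entry for $F_5$ uses Euler's classical factorization $641 \times 6700417$, which is standard but is not displayed in the text above.

```python
def fermat(n: int) -> int:
    """Return the n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


# Factor pairs for F_6, F_7, F_8 are copied from the text; the pair for F_5
# is Euler's well-known factorization (an addition, not shown in the text).
CLAIMED = {
    5: (641, 6700417),
    6: (274177, 67280421310721),
    7: (59649589127497217, 5704689200685129054721),
    8: (1238926361552897,
        93461639715357977769163558199606896584051237541638188580280321),
}


def check(n: int) -> bool:
    """True if the claimed factor pair multiplies out to F_n."""
    p, q = CLAIMED[n]
    return p * q == fermat(n)


for n in sorted(CLAIMED):
    print(f"F_{n}: product matches = {check(n)}")
```

Python integers have arbitrary precision, so no special big-number library is needed for numbers of this size.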
Since Fermat numbers are odd, the preceding tells us that $\gcd(F_m, F_n) = 1$. Thus,
the Fermat numbers $F_0, F_1, F_2, \ldots$ are pairwise relatively prime and hence their
prime factorizations yield infinitely many distinct primes. In fact, this proves that
there are at least $n$ primes at most $2^{2^n} + 1$.
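A quick numerical sanity check of pairwise coprimality can be sketched in Python. We assume that the omitted step rests on the standard identity $F_0 F_1 \cdots F_{n-1} = F_n - 2$; the function names below are ours, and a finite check like this illustrates the statement without proving it.

```python
from math import gcd, prod


def fermat(n: int) -> int:
    """Return the n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def product_identity_holds(n: int) -> bool:
    """Check the standard identity F_0 * F_1 * ... * F_(n-1) == F_n - 2."""
    return prod(fermat(k) for k in range(n)) == fermat(n) - 2


def pairwise_coprime(limit: int) -> bool:
    """Check gcd(F_m, F_n) == 1 for all 0 <= m < n < limit."""
    return all(
        gcd(fermat(m), fermat(n)) == 1
        for m in range(limit)
        for n in range(m + 1, limit)
    )


print(all(product_identity_holds(n) for n in range(1, 10)))
print(pairwise_coprime(10))
```

The identity explains the coprimality: a common divisor of $F_m$ and $F_n$ with $m < n$ divides $F_n - 2$ and $F_n$, hence divides 2, and Fermat numbers are odd.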
The ordered list of prime factors of the Fermat numbers begins with
3, 5, 17, 257, 641, 65537, 114689, 274177, 319489, 974849, 2424833,
6700417, 13631489, 26017793, 45592577, 63766529, 167772161,
825753601, 1214251009, 6487031809, 70525124609, 190274191361,
646730219521, 2710954639361, 2748779069441, 4485296422913,
6597069766657,
according to [3]. How can we be sure of this? What if a large Fermat number is
divisible by a small prime? A result of Euler, later improved by Édouard Lucas
(1842–1891), asserts that for $n \geq 2$ every prime factor of $F_n$ is of the form
$k \cdot 2^{n+2} + 1$. Thus, the smallest prime factor of $F_n$ is at least $2^{n+2} + 1$, a bound
that grows rapidly with $n$. For example, we can be sure that no Fermat number has a
prime factor strictly between 257 and 641: we have the prime factorizations of all
$F_n$ for $n = 0, 1, 2, \ldots, 11$, and any prime factor of $F_n$ with $n \geq 12$ is at
least $2^{14} + 1 = 16385$.
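The Euler–Lucas restriction turns the search for a small factor of $F_n$ into a short loop over candidates $p = k \cdot 2^{n+2} + 1$. The Python sketch below is one way to do this; the function name and the cutoff `max_k` are our own choices. A candidate $p$ divides $F_n$ exactly when $2^{2^n} \equiv -1 \pmod{p}$, and modular exponentiation checks this without ever forming $F_n$.

```python
def smallest_prime_factor(n: int, max_k: int = 10**6):
    """Smallest prime factor of F_n (n >= 2), or None if none is found.

    Only candidates p = k * 2^(n+2) + 1 with k <= max_k are tried. Every
    divisor of F_n greater than 1 is a product of primes of this shape, so
    it is itself congruent to 1 modulo 2^(n+2); hence the first candidate
    that divides F_n is the smallest prime factor.
    """
    if n < 2:
        raise ValueError("the Euler-Lucas form requires n >= 2")
    step = 2 ** (n + 2)
    for k in range(1, max_k + 1):
        p = k * step + 1
        # p divides F_n exactly when 2^(2^n) is congruent to -1 modulo p.
        if pow(2, 2 ** n, p) == p - 1:
            return p
    return None


for n in (5, 6, 9, 11, 12):
    print(n, smallest_prime_factor(n))
```

The values of `n` in the loop were chosen so that the results can be compared against the list of prime factors from [3] quoted above.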
Heuristic argument. The prime number theorem asserts that the number
of primes at most x is roughly x/ log x. Thus, the density of primes at most x is
about 1/ log x. We therefore model the primes as a random process, in which the
probability that a natural number n is prime is 1/ log n (see the comments for the
1987 entry). Consider the random variable
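$$X_n = \begin{cases} 1 & \text{if $n$ is prime}, \\ 0 & \text{otherwise}. \end{cases}$$
The expected number of Fermat primes among $F_0, F_1, \ldots, F_N$ is
$$E[X_{F_0} + \cdots + X_{F_N}] = E[X_{F_0}] + \cdots + E[X_{F_N}]$$
by the linearity of expectation. Since
$$E[X_{F_n}] = \frac{1}{\log F_n} = \frac{1}{\log(2^{2^n} + 1)} \leq \frac{1}{2^n \log 2},$$
the expected number of Fermat primes is at most
$$\frac{1}{\log 2} \sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{2}{\log 2} \approx 2.89 < 3. \tag{1989.3}$$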
Thus, we expect that there are only finitely many Fermat primes. A more sophis-
ticated argument comes to the same conclusion [2].
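The bound (1989.3) replaces $\log F_n$ by the slightly smaller quantity $2^n \log 2$. As a rough illustration, the following Python sketch sums the model's actual terms $1/\log F_n$ and compares the result with $2/\log 2$. The truncation at 20 terms is our choice; the neglected tail is smaller than $2^{-19}/\log 2$.

```python
import math


def model_expectation(terms: int = 20) -> float:
    """Sum of 1 / log(F_n) for n = 0, ..., terms - 1 (natural logarithm)."""
    total = 0.0
    for n in range(terms):
        fermat_number = 2 ** (2 ** n) + 1
        # math.log accepts arbitrarily large Python integers.
        total += 1.0 / math.log(fermat_number)
    return total


def geometric_bound() -> float:
    """The upper bound 2 / log 2 from (1989.3)."""
    return 2.0 / math.log(2.0)


print(model_expectation(), geometric_bound())
```

The direct sum comes out below the geometric bound, as it must, since every term is bounded by the corresponding term $1/(2^n \log 2)$.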
Our estimate is reasonably close to the presently observed number (five). What
causes the discrepancy? First of all, this is a heuristic argument that proves nothing:
our model could be completely wrong. However, well-composed heuristic arguments
often do point us in the right direction (see the 1987 entry). A more likely culprit is
the bias introduced by small primes. The largest contributions to the sum (1989.3)
come from the smallest Fermat numbers. In this range, the large-scale predictions
afforded by the prime number theorem are swamped by small-scale fluctuations.
For example, the prime number theorem predicts that there are $2/\log 2 \approx 2.89$
primes at most 2, which is absurd.
Bibliography
[1] M. Aigner and G. M. Ziegler, Proofs from The Book, 4th ed., Springer-Verlag, Berlin, 2010.
MR2569612
[2] K. D. Boklan and J. H. Conway, Expect at most one billionth of a new Fermat prime!, Math.
Intelligencer 39 (2017), no. 1, 3–5, DOI 10.1007/s00283-016-9644-3. https://arxiv.org/pdf/
1605.01371.pdf. MR3620166
[3] The On-Line Encyclopedia of Integer Sequences, A023394 (Prime factors of Fermat numbers),
http://oeis.org/A023394.
[4] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[5] PROMYS, PROMYS: Program in Mathematics for Young Scientists, http://www.promys.
org/.
[6] Wikipedia, Constructible polygon, https://en.wikipedia.org/wiki/Constructible_polygon.
[7] Wikipedia, Heptadecagon, https://en.wikipedia.org/wiki/Heptadecagon.
1990
The Monty Hall Problem
Introduction
Although it rose to national prominence after Marilyn vos Savant (1946– )
presented it in a 1990 Parade magazine column [3], the famed Monty Hall problem
first appeared in 1975 when it was posed by Steve Selvin (1941– ) in The American
Statistician [4]. His presentation also explains the origin of the problem’s name:
It is “Let’s Make a Deal”—a famous TV show starring Monte Hall.1
Monte Hall: One of the three boxes labeled A, B, and C contains
the keys to that new 1975 Lincoln Continental. The other two are
empty. If you choose the box containing the keys, you win the car.
Contestant: Gasp!
Monte Hall: Select one of these boxes.
Contestant: I’ll take box B.
Monte Hall: Now box A and box C are on the table and here is box
B (contestant grips box B tightly). It is possible the car keys are in
that box! I’ll give you $100 for the box.
Contestant: No, thank you.
Monte Hall: How about $200?
Contestant: No!
Audience: No!!
Monte Hall: Remember that the probability of your box containing
the keys to the car is 1/3 and the probability of your box being empty
is 2/3. I’ll give you $500.
Audience: No!!
Contestant: No, I think I’ll keep this box.
Monte Hall: I’ll do you a favor and open one of the remaining boxes
on the table (he opens box A). It’s empty! (Audience: applause). Now
either box C or your box B contains the car keys. Since there are two
boxes left, the probability of your box containing the keys is now 1/2.
I’ll give you $1000 cash for your box.
WAIT!!!!
Is Monte right? The contestant knows that at least one of the
boxes on the table is empty. He now knows it was box A. Does this
knowledge change his probability of having the box containing the keys
from 1/3 to 1/2? One of the boxes on the table has to be empty. Has
Monte done the contestant a favor by showing him which of the two
boxes was empty? Is the probability of winning the car 1/2 or 1/3?
1 Monty Hall was the stage name of Monte Halparin (1921–2017). Although Selvin’s puzzle is
universally referred to as the “Monty Hall problem,” it is interesting to note that Selvin (perhaps
unintentionally) spelled the stage name “Monty” as “Monte,” which is the host’s actual first name.
[Figure 1: three doors, numbered 1, 2, and 3.]
In most contemporary formulations of the problem, the contestant chooses one
of three doors. One door conceals a valuable prize. Behind the other two doors are
goats, which are presumed to be worthless; see Figure 1. The host opens one of the
other doors and reveals a goat. He gives the contestant the chance to switch to the
remaining door. Should the contestant switch?
How can switching doors possibly help? Each door initially has a 1/3 chance
of holding the prize. After the host opens one of the doors, we know that one of
the two remaining doors conceals the prize. Thus, the chance that either holds the
prize is 1/2. Is this correct? See the comments below for the answer!
Our problem for this year, which also appeared in 1990, is due to philosopher
Arnold Zuboff [7]. It is called the sleeping beauty problem and it is still the source
of spirited arguments. What do you think the answer is?
1990: Comments
Resolution of the Monty Hall problem. One good way to build intuition
for the answer is to write a computer program and simulate millions of games.
Computational results can quickly provide evidence for or against a particular
answer. Without loss of generality we may assume the contestant always chooses the
first door. Here is an example of such a program in Mathematica (see the 1988
entry).2
success = 0; (* initialize number of successes to 0 *)
For[n = 1, n <= 1000000, n++, (* do one million trials *)
  {
    (* randomly choose what door prize is behind *)
    prizelocation = RandomInteger[{1, 3}];
    (* if prize is behind Door 1 host randomly opens a door *)
    (* if prize is behind Door 2 host must open Door 3 *)
    (* if prize is behind Door 3 host must open Door 2 *)
    If[prizelocation == 1, hostopen = RandomInteger[{2, 3}]];
    If[prizelocation == 2, hostopen = 3];
    If[prizelocation == 3, hostopen = 2];
    (* Switch to whatever remaining door the host did not open *)
    If[hostopen == 2, choosedoor = 3];
    If[hostopen == 3, choosedoor = 2];
    (* If prize is behind our new door, increase success counter by 1 *)
    If[choosedoor == prizelocation, success = success + 1];
  }];
(* we now print out the success rate *)
Print["By switching success rate is ", 100.0 * success/1000000, "%."];
What is the final result? The output after 1,000,000 trials is
By switching success rate is 66.6787%.
Although this is not a formal proof, it strongly suggests that switching doors leads
to victory two thirds of the time.
Most people arrive at the incorrect answer; they think that switching doors
should have no impact. Where did we go wrong? What were our hidden assump-
tions? Note that Monty Hall always reveals a goat; in other words, if you choose
door 1 and the prize is behind door 2, he will never open door 2 and show the prize.
Thus, the host does not have complete freedom if there is a goat behind your door;
this happens two thirds of the time. In this case, one of the other two doors hides
a goat while the other hides the prize. Since he cannot reveal the prize, he must
choose the door with the goat; if you switch, you win. If the prize is behind your
door, then the host can open either door. In this case, which happens one third of
the time, you lose if you switch.
All of this agrees with Selvin’s original case-by-case analysis of the problem [4].
He also presented a conditional probability based justification of the 2/3 answer
since many readers were not convinced by his first argument [5]. However, we
think that his original argument is convincing enough; see Table 1.
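The case-by-case reasoning can also be carried out exactly rather than by simulation. The Python sketch below enumerates every prize location and every door the host may open, weighting each branch by its probability. The function name is ours, and we assume, as in the simulation above, that the contestant picks door 1 and that the host chooses uniformly at random when two goat doors are available.

```python
from fractions import Fraction


def win_probabilities():
    """Return (P(win if staying), P(win if switching)) for a pick of door 1."""
    stay = Fraction(0)
    switch = Fraction(0)
    for prize in (1, 2, 3):
        p_prize = Fraction(1, 3)
        # The host may open any door that is neither door 1 nor the prize door.
        options = [door for door in (2, 3) if door != prize]
        for opened in options:
            p_branch = p_prize / len(options)
            # The door the contestant would switch to.
            remaining = ({1, 2, 3} - {1, opened}).pop()
            if prize == 1:
                stay += p_branch
            if prize == remaining:
                switch += p_branch
    return stay, switch


print(win_probabilities())
```

Using `Fraction` keeps the arithmetic exact, so the two probabilities can be compared with 1/3 and 2/3 without rounding.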
A discussion of extreme cases can also point you in the right direction. Instead
of three doors, suppose that there are a million doors. You make your pick and the
host opens 999,998 doors in quick succession, revealing goat after goat after goat.
There are now just two doors left. Do you really want to keep your first choice or
would you want to switch? Does it seem plausible that both of the remaining doors
are equally likely to hide the prize, or does it seem clear that the host was careful
to avoid showing you the prize?
2 RandomInteger[{a,b}] selects an integer uniformly at random from {a, a + 1, ..., b}.
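The many-doors thought experiment is easy to simulate. The following Python sketch generalizes the game to any number of doors, under the assumption that the host opens every door except the contestant's pick and one other, and never reveals the prize. The function name, the seed, and the trial counts are our own choices.

```python
import random


def switch_win_rate(doors: int, trials: int, seed: int = 0) -> float:
    """Estimate the probability of winning by switching with `doors` doors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        prize = rng.randrange(doors)
        pick = rng.randrange(doors)
        if pick == prize:
            # The host leaves one goat door closed, chosen uniformly among
            # the doors other than the contestant's pick.
            other = rng.randrange(doors - 1)
            if other >= pick:
                other += 1
        else:
            # The host cannot open the prize door, so it stays closed.
            other = prize
        if other == prize:
            wins += 1
    return wins / trials


print(switch_win_rate(3, 200_000))
print(switch_win_rate(1_000_000, 200_000))
```

Switching loses only when the first pick was correct, so with $d$ doors one expects a success rate near $(d-1)/d$.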
See [2] for more information about the Monty Hall problem and its history.
Getting Erdős’s goat. If you guessed that switching doors would not have an
effect on the Monty Hall problem, you would be in good company. Andrew Vazsonyi
(1916–2003) relates the following anecdote about the legendary Paul Erdős (see the
1913 entry) [6].
. . . I told the problem to the late Paul Erdős, one of the most famous
mathematicians of the century, when he visited my home in 1995.
Erdős was considered by number theorists as one of the greatest experts
in probability theory. In a conversation about the use of probability
theory in decision making, I mentioned the goats and Cadillac problem
and the answer to Erdős, fully expecting us to move onto the next
subject. But, to my surprise, Erdős said, “No, that is impossible, it
should make no difference.”
Needless to say, whether it is a Cadillac or a Lincoln Continental at stake is irrele-
vant to the problem. The two argued over the problem for a while before Vazsonyi
became frustrated with the situation:
He wanted a straightforward explanation with no decision trees. I gave
up at this point, because I have no common sense explanation. . . .
So I told Erdős, “You don’t know about decision trees so you can’t
understand the solution. Put on your earphones, listen to your music,
and stop bothering me.” (When Erdős appeared in my house, the first
thing he did was unpack his radio and start listening to classical music.
The radio blasted from 5:00 am to midnight. He didn’t seem to be able
to live without it.)
An hour later Erdős came back really irritated. “What’s the mat-
ter with you? Why aren’t you telling me the reason why I should
switch?” I said that I was sorry, but I didn’t have a common sense
explanation and only the decision tree analysis convinces me.
Eventually Erdős was convinced by a numerical simulation, much like the one we
performed above.
Erdős objected that he still did not understand the reason why, but
was reluctantly convinced that I was right. A few days after he left,
he telephoned to say that Ron Graham of AT&T explained to him the
reasoning behind the answer and that now he understood. He pro-
ceeded to tell me the reasoning but I couldn’t fathom his explanation.
See the comments for the 1992 entry for more information about Ron Graham.
The ugly side of mathematics. Although Marilyn vos Savant, who claims
to have the world’s highest IQ, presented the correct answer in her column, she
received a large number of spiteful and derogatory responses [3]. Here are some of
the totally unnecessary personal attacks that were leveled against her:
Maybe women look at math problems differently than men.3
May I suggest that you obtain and refer to a standard textbook on
probability before you try to answer a question of this type again?
You blew it, and you blew it big! Since you seem to have difficulty
grasping the basic principle at work here, I’ll explain. After the host
reveals a goat, you now have a one-in-two chance of being correct.
Whether you change your selection or not, the odds are the same.
There is enough mathematical illiteracy in this country, and we don’t
need the world’s highest IQ propagating more. Shame!
Bibliography
[1] A. Elga, Self-locating belief and the sleeping beauty problem, Analysis 60 (2000), no. 2, 143–147.
http://www.princeton.edu/~adame/papers/sleeping/sleeping.pdf.
[2] J. Rosenhouse, The Monty Hall problem: The remarkable story of math’s most contentious
brainteaser, Oxford University Press, Oxford, 2009. MR2543995
[3] M. vos Savant, Game Show Problem. http://marilynvossavant.com/game-show-problem/.
[4] S. Selvin, A problem in probability, American Statistician 29 (1975), no. 1, 67.
[5] S. Selvin, On the Monty Hall problem, American Statistician 29 (1975), no. 3, 134.
[6] A. Vazsonyi, Which door has the Cadillac?, Decision Line (1999), Dec./Jan., 17–19.
[7] A. Zuboff, One self: The logic of experience, Inquiry: An Interdisciplinary Journal of Philoso-
phy 33 (1990), no. 1, 39–68.
3 This reader later doubled down and wrote back again: “I still think you’re wrong. There is
1991
arXiv
Introduction
This year’s problem honors the founding of the arXiv (http://arxiv.org/) in
1991 by Paul Ginsparg (1955– ). Authors frequently post preliminary versions of
articles on the arXiv, often long before they appear in print. Scientific ideas can be
shared and disseminated in almost real time. Hundreds of new papers are added
to the collection every day.
Although the arXiv originally started as a repository for physics papers, it now
hosts over 1.5 million articles in physics, mathematics, computer science, statistics,
and other fields; see Figures 1, 2, and 3. It provides researchers all over the world
immediate access to newly written papers. The arXiv is now generously supported
by Cornell University, the Simons Foundation, and various member institutions.
Most papers that are posted to the arXiv are eventually submitted for pub-
lication in peer reviewed research journals. However, there are a few exceptional
cases. For example, Grigori Perelman (1966– ) posted three papers on the arXiv
in 2002–2003 that resolved the longstanding Poincaré conjecture [5–7]. Although
he never submitted these papers to journals, experts in the field were still able to
read them and verify his results. Perelman was offered the Fields Medal in 2006,
although he declined (see the 2003 entry).
Although the arXiv was once flooded with fallacious proofs of famous conjec-
tures, they appear less frequently than they did in its early days. In 2004, the
arXiv instituted an endorsement system that requires established authors to vouch
for newcomers before they are allowed to post articles. However, papers that claim
proofs of the Riemann hypothesis, the twin prime conjecture, and so forth occa-
sionally appear. If you are brave, look a few of them up. Find the mistakes in the
proofs (there is almost certainly at least one in each paper) or satisfy yourself that
there are none.
Figure 2. left: New arXiv submissions per year for high energy
physics, condensed matter physics, astrophysics, other physics,
mathematics, computer science, electrical engineering / systems
science, statistics, quantitative biology, and quantitative finance /
economics. right: Fractional submission rates for each subject
area. Source: https://arxiv.org/help/stats/2017_by_area/index.
1991: Comments
Reinventing the wheel. While it is easy to publish material online, there is
also the need to verify that what you read is correct and to make sure that what
you are doing is truly original work. A cautionary tale concerns Mary M. Tai, a
Of course, this response invites the question of why it proved necessary to publish
the result at all. Tai continued to refer to “Tai’s model,” even after numerous
readers pointed out that it is the trapezoidal rule:
According to Merriam Webster’s Dictionary, a model can be defined
as “a type of design or product;” “a description used to visualize some-
thing that cannot be directly observed;” or “a system of postulates,
data, and inferences presented as a mathematical description of an en-
tity. . . .” Even if Tai’s model were based on the trapezoid rule concept,
according to the definition of a model, I have worked out a “design”
(mathematical expression) for the “structure units” (individual areas)
on my own. In other words, I have presented the original concept into a
functioning mathematical description that can be easily observed and
applied.” [10]
Needless to say, every calculus book in existence presents the trapezoidal rule in a
manner that can easily be applied!
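For reference, the trapezoidal rule for a curve sampled at times $t_0 < t_1 < \cdots < t_m$ with values $y_0, \ldots, y_m$ approximates the area by $\sum_{i=1}^{m} (t_i - t_{i-1})(y_{i-1} + y_i)/2$. A minimal Python sketch follows; the function name is ours and the glucose readings are invented purely for illustration.

```python
def trapezoid_area(times, values):
    """Approximate the area under a sampled curve by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        mean_height = (values[i - 1] + values[i]) / 2
        area += width * mean_height
    return area


# Hypothetical glucose readings (mg/dL) at 30-minute intervals.
minutes = [0, 30, 60, 90, 120]
glucose = [95, 160, 140, 115, 100]
print(trapezoid_area(minutes, glucose))
```

Each term is the area of one trapezoid, which is why unevenly spaced sample times cause no difficulty.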
Bibliography
[1] arXiv preprint server, http://arxiv.org/.
[2] J. H. Monaco and R. L. Anderson, Tai’s formula is the trapezoidal rule, Diabetes Care
17 (1994), no. 10, 1224.
[3] J. Lacan, Of structure as an inmixing of an otherness prerequisite to any subject whatever,
The Languages of Criticism and the Sciences of Man (edited by R. Macksey and E. Donato),
Johns Hopkins Press, 1970, 186–200.
[4] J. Lacan, The subversion of the subject and the dialectic of desire in the Freudian
unconscious, Écrits: A Selection (translated by A. Sheridan), Norton, 1977, 292–325.
[5] G. Perelman, The entropy formula for the Ricci flow and its geometric applications, https://
arxiv.org/abs/math/0211159.
[6] G. Perelman, Ricci flow with surgery on three-manifolds, https://arxiv.org/abs/math/
0303109.
[7] G. Perelman, Finite extinction time for the solutions to the Ricci flow on certain three-
manifolds, https://arxiv.org/abs/math/0307245.
[8] A. Sokal and J. Bricmont, Fashionable Nonsense: Postmodern Intellectuals’ Abuse of Science,
Picador, 1999.
[9] M. M. Tai, A mathematical model for the determination of total area under glucose toler-
ance and other metabolic curves, Diabetes Care 17 (1994), no. 2, 152–154, http://care.
diabetesjournals.org/content/17/2/152.
[10] M. M. Tai, Reply from Mary Tai, Diabetes Care 17 (1994), no. 10, 1226–1228.
1992
Monstrous Moonshine
Introduction
A subgroup $N$ of a finite group $G$ is normal if $N = gNg^{-1}$ for each $g \in G$. If $N$
is normal in $G$, then the quotient group $G/N$ is well-defined. If $N$ is nontrivial, then
$G/N$ has smaller order than $G$ itself and is, in principle, more tractable to study.
A simple group is a group that contains no proper, nontrivial normal subgroups.
For example, cyclic groups of prime order are simple, as are the alternating groups
$A_n$ for $n \geq 5$. The finite simple groups are the building blocks of all finite groups
in the sense that they cannot be “broken down” further.
The classification of finite simple groups, a monumental human achievement
spanning thousands of journal pages and hundreds of articles, was completed in
2004 (see the comments for the 2004 entry and the problem for 1968). In short,
the finite simple groups fall into eighteen infinite families along with twenty-six
so-called “sporadic groups,” the largest of which is the monster group, $M$, which
has order
$$2^{46} \cdot 3^{20} \cdot 5^{9} \cdot 7^{6} \cdot 11^{2} \cdot 13^{3} \cdot 17 \cdot 19 \cdot 23 \cdot 29 \cdot 31 \cdot 41 \cdot 47 \cdot 59 \cdot 71 \approx 8.08 \times 10^{53}.$$
To put things in perspective, there are
$$26! \approx 4.03 \times 10^{26}$$
permutations of the English alphabet. This is less than the square root of the order
of the monster group! There are approximately $10^{50}$ atoms in the Earth [3], still
comfortably less than the number of elements in $M$. Of course, there are useful
numbers that dwarf even the order of the monster group; see the comments below
on Graham’s number.
The monster group is notoriously difficult to compute with. How does one
describe its elements? The smallest faithful representation of $M$ over the complex
field has dimension 196,883. This means that the smallest dimension $n$ for which
there is an injective homomorphism$^1$ $\varphi : M \to GL_n(\mathbb{C})$ is $n = 196{,}883$. In contrast,
the alternating group $A_{44}$ on 44 letters is larger than the monster group, since
$$|A_{44}| = 44!/2 \approx 1.33 \times 10^{54}.$$
However, we can represent any permutation on 44 letters faithfully using $44 \times 44$
permutation matrices. Even more extreme, the dihedral group of order $2n$ (no
matter how large $n$ is) can be faithfully represented using $2 \times 2$ matrices: write down
the linear transformations that represent the action of the group on the regular
$n$-gon in $\mathbb{R}^2$. This tells us that the monster group is really quite complicated. We
simply (pun intended) cannot accurately encode it using relatively small matrices.
1 Thus, $M$ is isomorphic to its image $\varphi(M)$ in $GL_n(\mathbb{C})$, the group of $n \times n$ invertible complex
matrices. This reduces the study of the abstract group $M$ to the study of a collection of matrices,
a general process that can be helpful for computations with finite groups.
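The size comparisons in this introduction can be checked with exact integer arithmetic. The Python sketch below rebuilds the order of the monster group from the prime factorization displayed above and compares it with $26!$ and $44!/2$. The dictionary and function names are ours.

```python
from math import factorial

# Prime factorization of the order of the monster group, as displayed above.
MONSTER_FACTORIZATION = {
    2: 46, 3: 20, 5: 9, 7: 6, 11: 2, 13: 3,
    17: 1, 19: 1, 23: 1, 29: 1, 31: 1, 41: 1, 47: 1, 59: 1, 71: 1,
}


def monster_order() -> int:
    """Multiply out the prime factorization exactly."""
    order = 1
    for prime, exponent in MONSTER_FACTORIZATION.items():
        order *= prime ** exponent
    return order


order = monster_order()
print(f"|M|   = {order:.3e}")
print(f"26!   = {factorial(26):.3e}")
print(f"44!/2 = {factorial(44) // 2:.3e}")
# 26! is below the square root of |M| exactly when (26!)^2 < |M|.
print(factorial(26) ** 2 < order < factorial(44) // 2)
```

Comparing $(26!)^2$ with $|M|$ avoids taking a floating-point square root of a 54-digit number.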
The term “monstrous moonshine” refers to a connection between the monster
group and the j-invariant of Felix Klein (1849–1925), a remarkable function on the
upper half-plane in C that is related to non-Euclidean geometry, elliptic curves,
and analytic number theory; see Figure 1. To explain this connection requires a bit
of setup.
A function $f$ meromorphic on the upper half-plane is (elliptic) modular if it
satisfies
$$f(\tau) = f\!\left(\frac{a\tau + b}{c\tau + d}\right)$$
whenever $\operatorname{Im} \tau > 0$ and $a, b, c, d$ are integers with $ad - bc = 1$, and it enjoys a
Laurent series expansion of the form
$$f(q) = \sum_{n=-m}^{\infty} a(n) q^n,$$
in which $q = e^{2\pi i \tau}$. One can show that every rational function of $j$ is a modular
function and, conversely, that every modular function is a rational function of $j$.
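The Laurent coefficients of $j$ itself can be generated with integer power-series arithmetic. The Python sketch below relies on a standard formula that is not stated in this entry, so treat it as an outside assumption: $j = E_4^3/\Delta$, where $E_4 = 1 + 240\sum_{n \geq 1} \sigma_3(n) q^n$ and $\Delta = q\prod_{n \geq 1} (1 - q^n)^{24}$. All function names are ours.

```python
def sigma3(n: int) -> int:
    """Sum of the cubes of the positive divisors of n."""
    return sum(d ** 3 for d in range(1, n + 1) if n % d == 0)


def multiply(a, b, size):
    """Product of two power series, truncated to `size` coefficients."""
    out = [0] * size
    for i, ai in enumerate(a):
        for k, bk in enumerate(b[: size - i]):
            out[i + k] += ai * bk
    return out


def j_coefficients(size: int):
    """Coefficients of q^-1, q^0, ..., q^(size-2) in the expansion of j."""
    e4 = [1] + [240 * sigma3(n) for n in range(1, size)]
    numerator = multiply(multiply(e4, e4, size), e4, size)  # E_4^3
    # Delta / q = prod (1 - q^n)^24, built one factor (1 - q^n) at a time.
    denominator = [1] + [0] * (size - 1)
    for n in range(1, size):
        for _ in range(24):
            for i in range(size - 1, n - 1, -1):
                denominator[i] -= denominator[i - n]
    # Power-series division; the constant term of the denominator is 1,
    # so every quotient coefficient is an integer.
    quotient = []
    for k in range(size):
        correction = sum(denominator[i] * quotient[k - i] for i in range(1, k + 1))
        quotient.append(numerator[k] - correction)
    return quotient


print(j_coefficients(7))
```

The list returned is shifted by one power of $q$: its first entry is the coefficient of $q^{-1}$ and its second entry is the constant term.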
In 1978, John McKay (1939– ) observed that the first few coefficients in the
expansion
$$j(q) = q^{-1} + 744 + 196{,}884q + 21{,}493{,}760q^2 + 864{,}299{,}970q^3 + 20{,}245{,}856{,}256q^4 + 333{,}202{,}640{,}600q^5 + \cdots$$
1992: Comments
Numbers with fixed prime factors. What can be said about the sequence
$1 = n_1 < n_2 < \cdots$ of natural numbers whose prime factors are among the list
(1992.1)? The sequence begins promisingly enough:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 38, 39,
40, 41, 42, 44, 45, 46, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 59,
60, 62, 63, 64, 65, 66, 68, 69, 70, 71, 72, 75, 76, 77, 78, 80, 81, 82,
84, 85, 87, 88, 90, 91, 92, 93, 94, 95, 96, 98, 99, 100
but skips a few numbers. Does it contain most of the natural numbers?
Axel Thue (1863–1922) proved that if one starts with any finite set of primes,
then $\lim_{i \to \infty} (n_{i+1} - n_i) = \infty$; that is, the gaps between terms in the sequence tend
to infinity [7]. In particular, the sequence above contains relatively few natural
numbers in the big scheme of things. A more quantitative version is due to Robert
Tijdeman (1943– ), who proved that there is a constant $C$, which depends only
upon the initial (finite) list of primes, such that
$$n_{i+1} - n_i > \frac{n_i}{(\log n_i)^C}$$
for $n_i \geq 3$ [8].
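The sequence is simple to generate by stripping the allowed primes from each candidate. In the Python sketch below we assume that the list (1992.1), which is not reproduced here, consists of the fifteen primes dividing the order of the monster group; this matches the terms printed above. The function names are ours.

```python
# Assumed content of the list (1992.1): the primes dividing |M|.
MONSTER_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 47, 59, 71)


def smooth_numbers(limit: int, primes=MONSTER_PRIMES):
    """All n <= limit whose prime factors all lie in `primes`."""
    result = []
    for n in range(1, limit + 1):
        m = n
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:
            result.append(n)
    return result


def gaps(sequence):
    """Differences between consecutive terms."""
    return [b - a for a, b in zip(sequence, sequence[1:])]


terms = smooth_numbers(100)
skipped = [n for n in range(1, 101) if n not in terms]
print(skipped)
print(max(gaps(smooth_numbers(10**5))))
```

Raising the limit and looking at the largest gap gives an informal feel for Thue's theorem, although no finite computation can confirm that the gaps tend to infinity.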
We start with Donald Knuth’s up-arrow notation for positive integers [11]; this
is related to Ackermann’s function from the 1926 entry. We can view multiplication
as iterated addition: $ab$ is $b$ copies of $a$ under addition. Along these lines, $a \uparrow b$ is
defined to be $a^b$, that is, $b$ copies of $a$ under multiplication. We can define
$$a \uparrow\uparrow b = \underbrace{a \uparrow (a \uparrow (\cdots \uparrow a))}_{b - 1 \text{ up arrows}},$$
so that
$$3 \uparrow\uparrow 2 = 3^3 = 27 \quad \text{and} \quad 3 \uparrow\uparrow 3 = 3^{3^3} = 7{,}625{,}597{,}484{,}987, \tag{1992.2}$$
and so forth. But why stop there? We can define
$$a \uparrow\uparrow\uparrow b = \underbrace{a \uparrow\uparrow (a \uparrow\uparrow (\cdots \uparrow\uparrow a))}_{b - 1 \text{ double up arrows}},$$
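The recursive pattern in these definitions translates directly into code. The following Python sketch is one way to write it; the function name and argument order are ours. It is only usable for very small inputs, since the values explode almost immediately: even $3 \uparrow\uparrow\uparrow 3$ is a power tower far too tall to compute.

```python
def up_arrow(a: int, arrows: int, b: int) -> int:
    """Evaluate a, followed by `arrows` up arrows, followed by b."""
    if arrows < 1 or b < 1:
        raise ValueError("arrows and b must be positive")
    if arrows == 1:
        return a ** b
    # With more than one arrow, the expression is b copies of a joined by
    # b - 1 operators that each use one arrow fewer, evaluated right to left.
    result = a
    for _ in range(b - 1):
        result = up_arrow(a, arrows - 1, result)
    return result


print(up_arrow(3, 2, 2))   # 3 ↑↑ 2
print(up_arrow(3, 2, 3))   # 3 ↑↑ 3
print(up_arrow(2, 3, 3))   # 2 ↑↑↑ 3
```

The first two calls reproduce the values in (1992.2).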
Bibliography
[1] R. E. Borcherds, Monstrous moonshine and monstrous Lie superalgebras, Invent. Math. 109
(1992), no. 2, 405–444, DOI 10.1007/BF01232032. MR1172696
[2] J. H. Conway and S. P. Norton, Monstrous moonshine, Bull. London Math. Soc. 11 (1979),
no. 3, 308–339, DOI 10.1112/blms/11.3.308. http://blms.oxfordjournals.org/content/11/
3/308.full.pdf+html. MR554399
[3] Dr. FermiGuy, Physics Questions People Ask Fermilab, http://www.fnal.gov/pub/science/
inquiring/questions/atoms.html.
[4] T. Gannon, Monstrous moonshine: the first twenty-five years, Bull. London Math. Soc.
38 (2006), no. 1, 1–33, DOI 10.1112/S0024609305018217. http://arxiv.org/pdf/math/
0402345v2.pdf. MR2201600
[5] M. Gardner, Mathematical games, Scientific American 237 (1977), November, 18–28.
[6] R. Graham and B. Haran, How big is Graham’s number?, https://www.youtube.com/watch?
v=GuigptwlVHo.
[7] A. Thue, Selected mathematical papers, with an introduction by Carl Ludwig Siegel and a
biography by Viggo Brun; edited by Trygve Nagell, Atle Selberg, Sigmund Selberg, and Knut
Thalberg, Universitetsforlaget, Oslo, 1977. MR0460050
[8] R. Tijdeman, On integers with many small prime factors, Compositio Math. 26 (1973),
319–330. MR0325549
[9] Wikipedia, Graham’s number, https://en.wikipedia.org/wiki/Graham’s_number.
[10] Wikipedia, Monstrous moonshine, http://en.wikipedia.org/wiki/Monstrous_moonshine.
[11] Wikipedia, Knuth’s up-arrow notation, https://en.wikipedia.org/wiki/Knuth’s_up-
arrow_notation.
1993
The 15-Theorem
Introduction
Lagrange’s four-square theorem, proved by Joseph-Louis Lagrange in 1770,
says that every positive integer is the sum of four perfect squares (in which zero is
considered a square). For example, 1993 is expressible as a sum of four squares in
many different ways, such as
Lagrange’s theorem was refined in 1834 by Carl Gustav Jacob Jacobi (1804–1851),
who proved what is now known as Jacobi’s four-square theorem: the number r4 (n)
of representations
$$n = a^2 + b^2 + c^2 + d^2,$$
integers, is
$$r_3(n) = \begin{cases} 12h(-4n) & \text{if } n \equiv 1, 2, 5, 6 \pmod{8}, \\ 24h(-n) & \text{if } n \equiv 3 \pmod{8}, \\ 0 & \text{if } n \equiv 7 \pmod{8}, \end{cases}$$
in which $h(x)$ denotes the class number of $x$ [10] (see the 1966 entry for more about
class numbers).
Instead of focusing only on sums of squares, we can learn even more by studying
quadratic forms in several variables. We take our inspiration from the identity
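$$a x_1^2 + b x_1 x_2 + c x_2^2 = [x_1 \ x_2] \begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
$$Q(x) = x^T A x \tag{1993.2}$$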
is a quadratic form with positive definite, integral matrix A, then Q represents all
positive integers if and only if it represents the numbers 1, 2, . . . , 15 [12]. In fact,
one can replace this list with
The restriction that A has integer entries is nontrivial. For example, the quadratic
form
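$$x_1^2 + x_1 x_2 + x_2^2 = [x_1 \ x_2] \begin{pmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \tag{1993.3}$$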
has integer coefficients, but its corresponding matrix has noninteger off-diagonal
entries. The original proof of the 15-theorem was not published, although Fields
Medalist Manjul Bhargava gave a simpler proof in 2000 [1].
In 1916, Ramanujan provided a list of fifty-five “diagonal” quaternary quadratic forms
(1993.2) that he claimed exhausts the positive-definite, universal forms in four variables [9].
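For diagonal forms $a_1 x_1^2 + \cdots + a_m x_m^2$ with positive integer coefficients, the matrix $A$ is integral and positive definite, so the 15-theorem reduces universality to a finite search. The Python sketch below performs that search by brute force; the function names are ours and the sample forms are chosen only for illustration.

```python
from itertools import product
from math import isqrt


def represented(coeffs, target: int) -> bool:
    """True if sum(c * x^2) == target for some integers x (each c > 0)."""
    ranges = [range(isqrt(target // c) + 1) for c in coeffs]
    return any(
        sum(c * x * x for c, x in zip(coeffs, xs)) == target
        for xs in product(*ranges)
    )


def missing_up_to_15(coeffs):
    """The integers in 1..15 that the diagonal form fails to represent."""
    return [n for n in range(1, 16) if not represented(coeffs, n)]


print(missing_up_to_15((1, 1, 1, 1)))  # sum of four squares
print(missing_up_to_15((1, 2, 5, 5)))  # x^2 + 2y^2 + 5z^2 + 5w^2
print(missing_up_to_15((1, 1, 1)))     # sum of three squares
```

An empty list means, by the 15-theorem, that the form is universal. A nonempty list exhibits values that the form misses, so the form is certainly not universal.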
1993: Comments
The 290-theorem. In 2008, Bhargava and Jonathan P. Hanke proved the
290-theorem, which asserts that a quadratic form (1993.2) with integer coefficients
and positive definite matrix A is universal if and only if it assumes the values
Triangular numbers. Do not feel bad if you have trouble with part (a) of
the centennial problem. It is a famous result of Gauss from 1796 and it is closely
related to the difficult problem of representing an integer as a sum of three squares.
Indeed, if
n = Tp + Tq + Tr ,
There are many things that can be said about triangular (and more generally,
polygonal) numbers [11]. The reader is invited to deduce the correct definition of
an n-polygonal number; look at Figure 1 for inspiration. If you wish to check your
answer, the generating function for the n-polygonal numbers is
$$G_n(x) = \frac{x\,\bigl((n - 3)x + 1\bigr)}{(1 - x)^3}.$$
Fermat’s polygonal number theorem, stated by Fermat in 1638, asserts that for
$k = 3, 4, \ldots$, each natural number $n$ is the sum of $k$ $k$-polygonal numbers (as usual,
zero summands are permitted). The case $k = 3$ is Gauss’s theorem and the $k = 4$
case is Lagrange’s four-square theorem. Fermat’s theorem, for which he gave no
proof, was finally proved by Cauchy in 1813 [14].
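Both the generating function and Fermat's theorem can be explored numerically. Since $1/(1-x)^3 = \sum_{j \geq 0} \binom{j+2}{2} x^j$, the coefficient of $x^k$ in $G_s(x)$ is $\binom{k+1}{2} + (s-3)\binom{k}{2}$. The Python sketch below uses this to list polygonal numbers and then checks the theorem over a small range. We write $s$ for the number of sides to avoid clashing with the two uses of $n$ and $k$ above, and the function names are ours. Note that the first function gives away the answer to the exercise just posed.

```python
from math import comb


def polygonal(s: int, k: int) -> int:
    """k-th s-gonal number, read off from the generating function."""
    return comb(k + 1, 2) + (s - 3) * comb(k, 2)


def is_sum_of_polygonals(n: int, s: int) -> bool:
    """True if n is a sum of at most s s-gonal numbers."""
    values = []
    k = 1
    while polygonal(s, k) <= n:
        values.append(polygonal(s, k))
        k += 1
    reachable = {0}
    for _ in range(s):
        # Allow one more summand; keeping old totals permits zero summands.
        reachable |= {r + v for r in reachable for v in values if r + v <= n}
    return n in reachable


print([polygonal(3, k) for k in range(1, 6)])
print([polygonal(5, k) for k in range(1, 5)])
print(all(is_sum_of_polygonals(n, s) for s in (3, 4, 5, 6) for n in range(1, 201)))
```

A finite check of this kind is consistent with the theorem but of course proves nothing beyond the range tested.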
3, 7, 21, 31, 33, 43, 67, 79, 87, 133, 217, 219, 223, 253, 307, 391, . . .
that are not represented by this form are not easily characterized [13]. Two other
numbers, 679 and 2,719, were later added to Ramanujan’s list. In 1997, Ken Ono
and Kannan Soundararajan (1973– ) conjectured that the odd natural numbers
that are not of the form x^2 + y^2 + 10z^2 are
3, 7, 21, 31, 33, 43, 67, 79, 87, 133, 217, 219, 223, 253, 307, 391, 679, 2719.
More importantly, they show that if the generalized Riemann hypothesis is true,
then their conjecture holds [8, Thm. 3].
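The list of exceptional odd numbers is easy to reproduce by brute force over a finite range. The sketch below enumerates all values of x^2 + y^2 + 10z^2 up to a bound and reports the odd numbers that are missed; the function names are illustrative, and a finite search of course says nothing about the conjecture beyond the bound.

```python
from math import isqrt


def represented(limit):
    """Set of integers in [0, limit] of the form x^2 + y^2 + 10 z^2."""
    hits = set()
    z = 0
    while 10 * z * z <= limit:
        rest = limit - 10 * z * z
        for x in range(isqrt(rest) + 1):
            # By symmetry in x and y it is enough to take y >= x.
            for y in range(x, isqrt(rest - x * x) + 1):
                hits.add(x * x + y * y + 10 * z * z)
        z += 1
    return hits


def odd_exceptions(limit):
    """Odd natural numbers up to `limit` not represented by the form."""
    hits = represented(limit)
    return [n for n in range(1, limit + 1, 2) if n not in hits]


if __name__ == "__main__":
    print(odd_exceptions(3000))
```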
Bibliography
[1] M. Bhargava, On the Conway-Schneeberger fifteen theorem, Quadratic forms and their ap-
plications (Dublin, 1999), Contemp. Math., vol. 272, Amer. Math. Soc., Providence, RI,
2000, pp. 27–37, DOI 10.1090/conm/272/04395. http://www.maths.ed.ac.uk/~aar/books/
dublin.pdf. MR1803359
[2] J. H. Conway, Universal quadratic forms and the fifteen theorem, Quadratic forms and their
applications (Dublin, 1999), Contemp. Math., vol. 272, Amer. Math. Soc., Providence, RI,
2000, pp. 23–26, DOI 10.1090/conm/272/04394. http://www.maths.ed.ac.uk/~aar/books/
dublin.pdf. MR1803358
[3] K. Hartnett, A classical math problem gets pulled into self-driving cars, Quanta Maga-
zine, May 23, 2018. https://www.quantamagazine.org/a-classical-math-problem-gets-
pulled-into-the-modern-world-20180523/
[4] M.-H. Kim, Recent developments on universal forms, Algebraic and arithmetic theory of
quadratic forms, Contemp. Math., vol. 344, Amer. Math. Soc., Providence, RI, 2004, pp. 215–
228, DOI 10.1090/conm/344/06218. MR2058677
[5] S. D. Kominers, On universal binary Hermitian forms, Integers 9 (2009), A02,
6, DOI 10.1515/INTEG.2009.002. http://www.emis.de/journals/INTEGERS/papers/j2/j2.
pdf. MR2475630
[6] I. Niven, H. S. Zuckerman, and H. L. Montgomery, An Introduction to the Theory of Numbers,
Wiley, 2008.
[7] The On-Line Encyclopedia of Integer Sequences, A004215 (numbers that are the sum of 4
but no fewer nonzero squares), https://oeis.org/A004215.
[8] K. Ono and K. Soundararajan, Ramanujan’s ternary quadratic form, Invent. Math. 130
(1997), no. 3, 415–454, DOI 10.1007/s002220050191. http://link.springer.com/article/
10.1007%2Fs002220050191. MR1483991
[9] S. Ramanujan, On the expression of a number in the form ax^2 + by^2 + cz^2 + du^2, Proc. Camb.
Phil. Soc. 19 (1916), 11–21.
[10] Wolfram MathWorld, Sum of squares function, http://mathworld.wolfram.com/
SumofSquaresFunction.html.
AIM
Introduction
In 1994, John Fry (1944– ), cofounder of the Fry’s Electronics chain, funded the
creation of AIM, the American Institute of Mathematics¹ [1]. AIM was located in
Palo Alto, California, for many years before moving to its present location in San
Jose. The institute’s stated mission is:
To advance mathematical knowledge through collaboration, to broaden
participation in the mathematical endeavor, and to increase the aware-
ness of the contributions of the mathematical sciences to society.
Since 2002, AIM has been one of eight institutions that are part of the National
Science Foundation’s Mathematical Sciences Institute Program [5]. The others are:
• Institute for Advanced Study (IAS) in Princeton, NJ,
• Institute for Computational and Experimental Research in Mathematics (ICERM)
in Providence, RI,
• Institute for Mathematics and its Applications (IMA) in Minneapolis, MN,
• Institute for Pure and Applied Mathematics (IPAM) in Los Angeles, CA,
• Mathematical Biosciences Institute (MBI) in Columbus, OH,
• Mathematical Sciences Research Institute (MSRI) in Berkeley, CA,
• Statistical and Applied Mathematical Sciences Institute (SAMSI) in Research
Triangle Park, NC.
These institutes bring together mathematicians and foster long-term collaborations.
One of AIM’s most effective and popular methods for nurturing collaborative work
is the SQuaREs program:
The purpose of AIM’s research program called SQuaREs (Structured
Quartet Research Ensembles) is to allow a dedicated group of four to
six mathematicians to spend a week at AIM in San Jose, California,
with the possibility of returning in following years. A SQuaRE could
arise as a followup to an AIM workshop, or it could be a freestanding
activity. AIM will provide both the research facilities and the financial
support for each SQuaRE group.
There are so many good questions arising from work at AIM that it is hard to
select just one. We have chosen an easily stated problem with a long and storied his-
tory. Moreover, it connects not only to Hilbert’s tenth problem (see the 2005 entry)
1 Full disclosure: the first named author has served on the human resources board of AIM
since 2008. Both authors have led workshops at AIM over the years.
and Sage (see the 2005 entry), but it also forms a segue into Fermat’s last theorem,
the topic of our next entry. See https://aimath.org/news/congruentnumbers/
for more information.
1994: Comments
A trillion triangles. An n ≥ 1 for which (1994.1) has a rational solution
(a, b, c) is a congruent number. The centennial problem above is the famed
congruent number problem. The first few congruent numbers are
5, 6, 7, 13, 14, 15, 20, 21, 22, 23, 24, 28, 29, 30, 31, 34, 37, 38,
39, 41, 45, 46, 47, 52, 53, 54, 55, 56, 60, 61, 62, 63, 65, 69, 70, 71,
77, 78, 79, 80, 84, 85, 86, 87, 88, 92, 93, 94, 95, 96, 101, 102, 103,
109, 110, 111, 112, 116, 117, 118, 119, 120, 124, 125, 126 [6].
For example, 5 is the area of the right triangle with sides
\[
(a, b, c) = \left( \frac{20}{3}, \frac{3}{2}, \frac{41}{6} \right).
\]
Although early Islamic mathematicians identified the congruent numbers
5, 6, 14, 15, 21, 30, 34, 65, 70, 110, 154, 190, 210, 221, 231, 246, 290, 390, 429, 546,
they missed many of the examples above [2]. It is not easy to determine whether
a given number is congruent or not. The first congruent number omitted in the
second list, 7, is congruent because it is the area of the right triangle with sides
\[
(a, b, c) = \left( \frac{24}{5}, \frac{35}{12}, \frac{337}{60} \right).
\]
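Both triangles can be verified with exact rational arithmetic. The sketch below assumes that (1994.1) encodes the two conditions a^2 + b^2 = c^2 and ab/2 = n, which is the usual definition of a congruent number; the helper name `is_witness` is hypothetical.

```python
from fractions import Fraction as F


def is_witness(a, b, c, n):
    """True if (a, b, c) are the sides of a right triangle with area n."""
    return a * a + b * b == c * c and a * b / 2 == n


if __name__ == "__main__":
    print(is_witness(F(20, 3), F(3, 2), F(41, 6), 5))      # area 5
    print(is_witness(F(24, 5), F(35, 12), F(337, 60), 7))  # area 7
```

Using `Fraction` avoids floating-point rounding, so the equality tests are exact.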
Where does AIM come in? In 2009, a team of mathematicians supported by
AIM succeeded in determining all of the congruent numbers up to one trillion [2].
Long story short: there are 3,148,379,694 of them in that range. An AIM press
release declared [1]:
Mathematicians from North America, Europe, Australia, and South
America have resolved the first one trillion cases of an ancient mathe-
matics problem. The advance was made possible by a clever technique
for multiplying large numbers. The numbers involved are so enormous
that if their digits were written out by hand they would stretch to the
moon and back. The biggest challenge was that these numbers could
not even fit into the main memory of the available computers, so the
researchers had to make extensive use of the computers’ hard drives.
Two teams, each using different software and hardware, arrived at the same
results (one group used Sage, the focus of our 2005 entry). A critical role was played
by the fast Fourier transform (see the 1965 entry), which can be used to multiply
two n-bit numbers in O(n log n log log n) time.
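The idea behind FFT-based multiplication is that the digits of a product are the convolution of the digits of the factors, and a convolution becomes a pointwise product after a Fourier transform. The following is a simplified sketch under stated assumptions: it uses a recursive floating-point FFT on decimal digits, which is not the exact integer algorithm that attains the bound quoted above and is only reliable for moderately sized inputs. All names are illustrative.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT of a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def multiply(x, y):
    """Product of nonnegative integers x and y via convolution of digits."""
    a = [int(d) for d in str(x)[::-1]]  # little-endian decimal digits
    b = [int(d) for d in str(y)[::-1]]
    size = 1
    while size < len(a) + len(b):
        size *= 2
    fa = fft([complex(d) for d in a] + [0j] * (size - len(a)))
    fb = fft([complex(d) for d in b] + [0j] * (size - len(b)))
    conv = fft([u * v for u, v in zip(fa, fb)], invert=True)
    # The unnormalized inverse transform must be divided by the length.
    coeffs = [round(c.real / size) for c in conv]
    # Evaluating the product polynomial at 10 performs all the carries.
    return sum(c * 10 ** i for i, c in enumerate(coeffs))


if __name__ == "__main__":
    print(multiply(12345678901234567890, 98765432109876543210))
```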
2 The solutions (0, 0), (n, 0), and (0, n) to (1994.3) correspond to a = c, which is not attainable
Bibliography
[1] AIM, A trillion triangles, https://aimath.org/news/congruentnumbers/.
[2] R. Bradshaw, W. B. Hart, D. Harvey, G. Tornaria, and M. Watkins, Congruent number theta
coefficients to 10^12, http://homepages.warwick.ac.uk/~masfaw/congruent.pdf.
[3] K. Conrad, The congruent number problem, Harvard College Mathematical Review 2 (2008),
no. 2, 58–73.
[4] S. J. Miller, Extending the Pythagorean formula, talk online at http://youtu.be/
idIHcgapMG4 (slides at https://web.williams.edu/Mathematics/sjmiller/public_html/
math/talks/GeneralizingPythagoras.pdf).
[5] National Science Foundation, Mathematical sciences institutes, https://mathinstitutes.org/
institutes/.
[6] The On-Line Encyclopedia of Integer Sequences, A003273 (Congruent numbers: positive in-
tegers n for which there exists a right triangle having area n and rational sides), https://
oeis.org/A003273.
[7] J. B. Tunnell, A classical Diophantine problem and modular forms of weight 3/2, Invent.
Math. 72 (1983), no. 2, 323–334, DOI 10.1007/BF01389327. MR700775
[8] Wikipedia, Congruent number, https://en.wikipedia.org/wiki/Congruent_number.
[9] Wikipedia, Tunnell’s theorem, https://en.wikipedia.org/wiki/Tunnell’s_theorem.
1995
Introduction
In 1637, Pierre de Fermat wrote the following statement in the margin of his
copy of Diophantus’s Arithmetica (Figure 1):
Cubum autem in duos cubos, aut quadratoquadratum in duos quadra-
toquadratos & generaliter nullam in infinitum ultra quadratum potes-
tatem in duos eiusdem nominis fas est dividere cuius rei demonstra-
tionem mirabilem sane detexi. Hanc marginis exiguitas non caperet.
In English, this reads
It is impossible to separate a cube into two cubes, or a fourth power into
two fourth powers, or in general, any power higher than the second,
into two like powers. I have discovered a truly marvelous proof of this,
which this margin is too narrow to contain.
Although it appears unlikely that Fermat found a simple and correct proof,¹ the
conjecture became known as Fermat’s last theorem. In modern terminology it states
that if n ≥ 3, then there are no solutions in natural numbers x, y, z to
\[
x^n + y^n = z^n. \tag{1995.1}
\]
Although various special cases of Fermat’s last theorem were handled over the
years, a complete proof remained elusive (in contrast, Fermat’s last theorem for
polynomials is significantly easier; see the 1981 entry). Many mathematicians,
great and small, chipped away at the problem. The great
David Hilbert excused himself by saying, “Before beginning I should have to put
in three years of intensive study, and I haven’t that much time to squander on a
probable failure” (however, he must have squandered a little time on it, since he
found a new proof in the case n = 4).
The year 1995 saw the publication of papers by Andrew Wiles (1953– ) [12] and
by Richard Taylor (1962– ) and Wiles [11] that finally put Fermat’s last theorem
to rest. The big announcement came in 1993 during a series of lectures delivered
by Wiles at the Isaac Newton Institute in Cambridge. However, a serious issue was
soon found that threatened to undermine his proof. He teamed up with Taylor, his
former student, and they eventually succeeded in filling the gap. Their work built
upon the foundations laid by several generations of mathematicians that connected
the problem to the theory of elliptic curves (see the 1921 entry and the comments
for the 1956 entry). While Fermat’s result has held mathematicians’ interest for
1 Fermat proved the special case n = 4. If he were in possession of a complete proof,
this would not have been necessary. He probably never thought that people would obsess over a
comment he made to himself in the margin of a book.
centuries, the method of proof was at least as important as the final result since it
yielded many important results in active areas of research.
Where does one start on such a difficult and imposing problem? First observe that
if (x, y, z) ∈ N^3 is a solution to (1995.1), then
\[
(x^{n/d})^d + (y^{n/d})^d = (z^{n/d})^d
\]
whenever d divides n. Thus, we obtain solutions in natural numbers to the Fermat
equation with exponent d. Every n ≥ 3 is divisible by 4 or by an odd prime, so
it suffices to show that there are no solutions if n = 4 or if n is an odd prime
(the divisors 1 and 2 are of no help, since (1995.1) does have solutions if n = 1
and n = 2). This is a significant reduction!
The case n = 3 was handled by Euler in 1770, although many independent
proofs followed over the years. The case n = 5 was dispatched by Legendre and
Dirichlet around 1825. Gabriel Lamé (1795–1870) settled the case n = 7 in 1839,
followed shortly thereafter by a proof by Victor-Amédée Lebesgue² (1791–1875).
2 Not to be confused with Henri Lebesgue (1875–1941) of measure and integration fame.
one to rapidly determine whether a given prime is regular or not. The first several
regular primes are [7]
3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 41, 43, 47, 53, 61, 71, 73, 79, 83,
89, 97, 107, 109, 113, 127, 137, 139, 151, 163, 167, 173, 179, 181,
191, 193, 197, 199, 211, 223, 227, 229, 239, 241, 251, 269, 277,
281, 313, 317, 331, 337, 349, 359, 367, 373, 383, 397, 419, 431.
Kummer’s theorem tells us that Fermat’s last theorem is true for these exponents.
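The list of regular primes can be regenerated with a few lines of exact arithmetic. The sketch below assumes the standard criterion due to Kummer: an odd prime p is regular if and only if p divides none of the numerators of the Bernoulli numbers B_2, B_4, . . . , B_{p−3}. The Bernoulli numbers are computed from the usual recurrence, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(m):
    """[B_0, ..., B_m] from the recurrence sum_{k<=n} C(n+1, k) B_k = 0."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B


def is_prime(n):
    """Trial division; adequate for the small range used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def regular_primes(limit):
    """Odd primes p <= limit dividing no numerator of B_2, B_4, ..., B_{p-3}."""
    B = bernoulli_numbers(limit)
    return [
        p
        for p in range(3, limit + 1)
        if is_prime(p) and all(B[k].numerator % p for k in range(2, p - 2, 2))
    ]


if __name__ == "__main__":
    print(regular_primes(100))
```

The primes 37, 59, and 67 are absent from the output: they are the irregular primes below 100.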
Unfortunately, it is not known whether infinitely many regular primes exist,
although Carl Ludwig Siegel (1896–1981) conjectured that infinitely many exist
and, moreover, that they have density e^{−1/2} ≈ 0.60653 as a subset of the primes
[8]. On the other hand, there are infinitely many irregular primes, that is, primes for
which Kummer’s approach to Fermat’s last theorem is not applicable. This seems
to have first been proved in 1915 by Johan Ludwig Jensen (1859–1925), although
many authors cite the 1954 paper of Leonard Carlitz (1907–1999) [2].
A result as monumental as Fermat’s last theorem deserves two problems. The
first problem below was originally from the 1995 entry, while the second was from
the 1949 entry (in the process of converting these entries to a book, we had the
opportunity to move and combine some material).
1995: Comments
Sophie Germain primes. A prime p is a Sophie Germain prime if 2p + 1
is prime. These are named after Sophie Germain (1776–1831), a remarkable
mathematician, physicist, and philosopher. She proved that if p is such a prime,
then the only natural number solutions to x^p + y^p = z^p have p | xyz [9]. See [5,
Ch. 14] for a circle-method argument (see the 1923 entry) that suggests the number
of Sophie Germain primes at most x is asymptotic to 2C_2 x/ log^2 x, in which C_2 =
0.660161815 . . . is the twin primes constant (1919.4).
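The definition is simple enough to explore directly. Here is a small sketch that lists the Sophie Germain primes up to a bound using trial division; the function names are illustrative, and the method is only suitable for small ranges.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def sophie_germain_primes(limit):
    """Primes p <= limit for which 2p + 1 is also prime."""
    return [p for p in range(2, limit + 1) if is_prime(p) and is_prime(2 * p + 1)]


if __name__ == "__main__":
    print(sophie_germain_primes(100))
    print(len(sophie_germain_primes(10_000)), "Sophie Germain primes up to 10,000")
```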
Bibliography
[1] C. D. Bennett, A. M. W. Glass, and G. J. Székely, Fermat’s last theorem for rational expo-
nents, Amer. Math. Monthly 111 (2004), no. 4, 322–329, DOI 10.2307/4145241. MR2057186
[2] L. Carlitz, Note on irregular primes, Proc. Amer. Math. Soc. 5 (1954), 329–331, DOI
10.2307/2032249. MR0061124
[3] K. Devlin, F. Gouvêa, and A. Granville, Fermat’s last theorem, a theorem at last, FOCUS,
August 1993, 3–5. http://www.dms.umontreal.ca/~andrew/PDF/FLTatlast.pdf.
[4] F. Q. Gouvêa, “A marvelous proof ”, Amer. Math. Monthly 101 (1994), no. 3, 203–222, DOI
10.2307/2975598. MR1264001
[5] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword
by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[6] F. Morgan, Fermat’s last theorem for fractional and irrational exponents, College Math. J.
41 (2010), no. 3, 182–185, DOI 10.4169/074683410X488647. MR2656314
[7] Online Encyclopedia of Integer Sequences, A007703 (Regular primes), http://oeis.org/
A007703.
[8] C. L. Siegel, Zu zwei Bemerkungen Kummers (German), Nachr. Akad. Wiss. Göttingen
Math.-Phys. Kl. II 1964 (1964), 51–57. MR0163899
[9] A. van der Poorten, Notes on Fermat’s last theorem, Canadian Mathematical Society Series
of Monographs and Advanced Texts, A Wiley-Interscience Publication, John Wiley & Sons,
Inc., New York, 1996. MR1373197
[10] S. Singh, Fermat’s enigma: The epic quest to solve the world’s greatest mathematical problem,
with a foreword by John Lynch, Walker and Company, New York, 1997. MR1491363
[11] R. Taylor and A. Wiles, Ring-theoretic properties of certain Hecke algebras, Ann. of Math.
(2) 141 (1995), no. 3, 553–572, DOI 10.2307/2118560. MR1333036
[12] A. Wiles, Modular elliptic curves and Fermat’s last theorem, Ann. of Math. (2) 141 (1995),
no. 3, 443–551, DOI 10.2307/2118559. MR1333035
3 See the comments for the 1927 and 1944 entries for two more applications of this theorem.
1996
Introduction
Cramér’s probabilistic model of the primes (see the comments to the 1989 entry)
predicts that a large natural number n has roughly a 1/ log n chance of being prime.
This heuristic suggests that the expected number of primes in a set A ⊆ N is
\[
\sum_{a \in A} \frac{1}{\log a}.
\]
diverges, there is cause for optimism. However, like Treebeard we must not be
hasty. A similar computation suggests that there are infinitely many primes of
the form 2^n, which is absurd. Some sort of adjustment must be made in order to
fine-tune such predictions.
We must first address the fact that not all n are treated equally by our sequence.
If n = ab and 1 < a, b < n, then (1989.2) provides the factorization
\[
2^n - 1 = 2^{ab} - 1 = (2^a)^b - 1 = (2^a - 1)\bigl( (2^a)^{b-1} + (2^a)^{b-2} + \cdots + 2^a + 1 \bigr).
\]
Thus, we may restrict our attention to M_p = 2^p − 1, in which p is prime. A prime
of this form is called a Mersenne prime. If we update our heuristic argument to
reflect this restriction, we obtain a sum over the primes
\[
\sum_p \frac{1}{\log M_p} \approx \frac{1}{\log 2} \sum_p \frac{1}{p},
\]
which diverges (a famous result of Euler; see the comments for the 1913 entry).
Perhaps there are infinitely many Mersenne primes? The values of p ≤ 1,000 that
produce Mersenne primes are
2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607
and there are currently (as of mid-2018) fifty known Mersenne primes. While the
search continues, it remains an open problem whether the number of Mersenne
primes is infinite. Even widely accepted conjectures, such as the Bateman–Horn
conjecture (see the comments for the 2005 entry), are not refined enough to handle
the distribution of primes arising from nonpolynomial functions, such as 2^n − 1.
For many years,
2^127 − 1 = 170,141,183,460,469,231,731,687,303,715,884,105,727
was the largest known prime. It was shown to be prime by Édouard Lucas (1842–
1891) in 1876 and it will forever remain the largest prime ever found without the use
of a computer. The status of M_67, however, remained in doubt until 1903. Mersenne
claimed that it was prime, but Lucas proved that this is not the case. However, he
was unable to produce any of its factors. The following curious anecdote concerns
Frank Nelson Cole (1861–1926), after whom the prestigious Cole Prizes in algebra and
number theory are named:
At a mathematical meeting in New York in 1903, F. N. Cole walked
on to the platform and, without saying a single word, wrote two large
numbers on the blackboard. He multiplied them out in longhand, and
equated the result to 2^67 − 1. (Subsequently, in private, Cole said
that those few minutes at the blackboard had cost him three years of
Sundays.) So Mersenne was wrong about his ninth case: p = 67 does
not yield a prime number. . . . [5]
Perhaps Cole would be disappointed to know that his factorization
M_67 = 147,573,952,589,676,412,927 = 193,707,721 × 761,838,257,287
can be found on a late-2013 desktop computer in less than 0.002 seconds!
In 1996, George Woltman (1957– ) started the Great Internet Mersenne Prime
Search (GIMPS) project.¹ This distributed computing project operates on thousands
of participating computers around the world. Since its inception, every new
Mersenne prime that has been discovered was discovered by GIMPS. As of mid-2018,
the largest known prime is M_{77,232,917}, which has 23,249,425 digits and was
discovered in late 2017 by GIMPS (see below). Anyone with a computer can
join in the effort to find new Mersenne primes and there is a monetary reward for
doing so: the Electronic Frontier Foundation (EFF) offers prizes of
• $150,000 to the first individual or group who discovers a prime number with at
least 100,000,000 decimal digits;
• $250,000 to the first individual or group who discovers a prime number with at
least 1,000,000,000 decimal digits [4].
Why is so much focus placed on Mersenne primes? Special algorithms and
the binary nature of computer architecture make Mersenne numbers particularly
tractable. The factoring code works in three phases to determine whether M_p =
2^p − 1 is prime. First, one eliminates all possible small factors. This relies on the
fact that any factor of a Mersenne number must be of the form f = 2kp + 1 with
f ≡ 1, 7 (mod 8). This eliminates about 95% of the potential factors. Once any
potential small factors have been ruled out, GIMPS turns to the Lucas–Lehmer
primality test (see the comments below).
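The two stages described above can be sketched in a few lines. This is an illustrative toy, not the GIMPS implementation: `small_factor` searches only candidates of the form f = 2kp + 1 with f ≡ 1, 7 (mod 8), and `lucas_lehmer` applies the standard statement of the test, namely that for an odd prime p the number M_p is prime exactly when s_{p−2} ≡ 0 (mod M_p), in which s_0 = 4 and s_i = s_{i−1}^2 − 2. The function names and the default search bound `k_max` are assumptions made here.

```python
def small_factor(p, k_max=1000):
    """Smallest factor f = 2kp + 1 (k <= k_max) of M_p = 2^p - 1, or None."""
    m = (1 << p) - 1
    for k in range(1, k_max + 1):
        f = 2 * k * p + 1
        if f * f > m:  # no proper factor can exceed sqrt(M_p)
            break
        # Only candidates congruent to 1 or 7 modulo 8 can divide M_p.
        if f % 8 in (1, 7) and pow(2, p, f) == 1:
            return f
    return None


def lucas_lehmer(p):
    """For an odd prime p, True exactly when M_p = 2^p - 1 is prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def is_prime(n):
    """Trial division, used only for the small exponents p."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def mersenne_exponents(limit):
    """Primes p <= limit for which M_p is prime (p = 2 handled separately)."""
    found = [2] if limit >= 2 else []
    for p in range(3, limit + 1, 2):
        if is_prime(p) and small_factor(p) is None and lucas_lehmer(p):
            found.append(p)
    return found


if __name__ == "__main__":
    print(small_factor(11), small_factor(23), small_factor(29))
    print(mersenne_exponents(1000))
```

Python's built-in arbitrary-precision integers keep the sketch short; production code relies on FFT-based squaring for the enormous numbers involved.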
1 Not to be confused with the popular GNU Image Manipulation Program (GIMP), an open-
We have far too much to say about Mersenne primes to fit in one entry; see the
comments for the 1997 entry for more information.
1996: Comments
Lucas–Lehmer primality test. Trial division is impractical for determining
whether a large number n is prime. It suffices to check for prime factors at most
√n since in any factorization n = ab, not both factors can be larger than √n. If
n ≈ 10^500, then we would need to divide n by every prime at most 10^250. The prime
number theorem tells us that there are approximately 1.74 · 10^247 such primes. How
bad is this? To put this in perspective, there are about 10^82 atoms in the observable
universe [10]. If each atom were a universe itself, each atom of which was actually
a supercomputer capable of 10^20 divisions per second and running since the big
bang (13.82 billion years ago), we would have only completed
\[
10^{82} \times 10^{82} \times 10^{20} \times (13.82 \times 10^{9}) \times 365 \times 24 \times 60 \times 60 \approx 4.36 \times 10^{201}
\]
trial divisions. So how can we possibly know, with absolute certainty, that a given
Mersenne number is truly prime?
The Lucas–Lehmer primality test, developed by Lucas in the 1870s and subsequently
refined by Derrick Henry Lehmer (1905–1991), is an efficient way to test Mersenne
numbers for primality. If p is prime and
\[
s_i = \begin{cases} 4 & \text{if } i = 0, \\ s_{i-1}^2 - 2 & \text{if } i \geq 1, \end{cases}
\]
or n = p2 . Try to prove this; see [6] for a proof, as well as a characterization for
three distinct prime factors.
The Sokal affair. The year 1996 marks the publication of the landmark paper
Transgressing the Boundaries: Towards a Transformative Hermeneutics of Quan-
tum Gravity by physicist Alan Sokal [8]. With such a lofty title, one might expect
the article to have deep philosophical reflections about the potential unification of
quantum mechanics and gravity, long considered a “holy grail” in physics. It con-
tains nothing of the sort but is instead composed of rambling and largely nonsensical
passages such as:
More recently, Lacan’s topologie du sujet has been applied fruitfully to
cinema criticism and to the psychoanalysis of AIDS. In mathematical
terms, Lacan² is here pointing out that the first homology group of the
sphere is trivial, while those of the other surfaces are profound; and this
homology is linked with the connectedness or disconnectedness of the
surface after one or more cuts. Furthermore, as Lacan suspected, there
is an intimate connection between the external structure of the physical
world and its inner psychological representation qua knot theory: this
hypothesis has recently been confirmed by Witten’s derivation of knot
invariants (in particular the Jones polynomial) from three-dimensional
Chern–Simons quantum field theory.
This load of fetid dingo’s kidneys was published by Social Text, a leading journal
in postmodern cultural studies.³ What was Sokal’s motivation for this prank? He
provides the following explanation on his website [9]:
For some years I’ve been troubled by an apparent decline in the stan-
dards of intellectual rigor in certain precincts of the American academic
humanities. . . . So, to test the prevailing intellectual standards, I de-
cided to try a modest (though admittedly uncontrolled) experiment:
Would a leading North American journal of cultural studies. . . publish
an article liberally salted with nonsense if (a) it sounded good and (b)
it flattered the editors’ ideological preconceptions?
The answer, unfortunately, is yes. . . .
Throughout the article, I employ scientific and mathematical con-
cepts in ways that few scientists or mathematicians could possibly take
seriously. . . . I assert that Lacan’s psychoanalytic speculations have
been confirmed by recent work in quantum field theory. Even nonsci-
entist readers might well wonder what in heavens’ [sic] name quantum
field theory has to do with psychoanalysis; certainly my article gives
no reasoned argument to support such a link. . . .
In sum, I intentionally wrote the article so that any competent
physicist or mathematician (or undergraduate physics or math major)
would realize that it is a spoof. Evidently the editors of Social Text felt
comfortable publishing an article on quantum physics without bother-
ing to consult anyone knowledgeable in the subject.
2 See the comments for the 1991 entry for more about Jacques Lacan.
3 Continuing the holy grail theme, one might say that the editors of the journal “chose poorly.”
Bibliography
[1] S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer
Networks and ISDN Systems 30 (1998), 107–117. http://infolab.stanford.edu/~backrub/
google.html.
[2] C. K. Caldwell, Mersenne Primes: History, Theorems and Lists, http://primes.utm.edu/
mersenne/index.html.
[3] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduc-
tion, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013.
MR3098499
[4] Electronic Frontier Foundation, EFF Cooperative Computing Awards, https://www.eff.org/
awards/coop.
[5] N. Gridgeman, The search for perfect numbers, New Scientist 334 (1963), 86–88.
[6] A. Lemos and A. Cambraia Junior, On the number of prime factors of Mersenne numbers,
http://arxiv.org/abs/1606.08690.
[7] GIMPS Homepage. http://www.mersenne.org/.
[8] A. D. Sokal, Transgressing the Boundaries: Toward a Transformative Hermeneutics of Quan-
tum Gravity, Social Text (1996), no. 46/47, 217–252.
[9] A. D. Sokal, A Physicist Experiments With Cultural Studies, http://www.physics.nyu.edu/
faculty/sokal/lingua_franca_v4/lingua_franca_v4.html.
[10] Universe Today, How many atoms are there in the universe?, https://www.universetoday.
com/36302/atoms-in-the-universe/.
[11] Wikipedia, Lucas-Lehmer primality test, https://en.wikipedia.org/wiki/Lucas-Lehmer_primality_test.
[12] Wikipedia, Mersenne prime, https://en.wikipedia.org/wiki/Mersenne_prime.
[13] Wikipedia, Sokal affair, https://en.wikipedia.org/wiki/Sokal_affair.
1997
Introduction
In addition to applications in the physical sciences, mathematics plays a key
role in many other fields, including economics and finance. While there is no true
Nobel Prize in Economics, since 1968 the Royal Swedish Academy of Sciences has
awarded the Bank of Sweden Prize in Economic Sciences in Memory of Alfred
Nobel. This is widely regarded as the “Nobel Prize in Economics” by the general
public. The award announcement from 1997 [2] begins:
Robert C. Merton [1944– ] and Myron S. Scholes [1941– ] have, in
collaboration with the late Fischer Black [1938–1995], developed a pi-
oneering formula for the valuation of stock options. Their methodology
has paved the way for economic valuations in many areas. It has also
generated new types of financial instruments and facilitated more effi-
cient risk management in society.
Sadly, Black passed away before the announcement and did not receive the award
since it is not given posthumously.
They begin with a stochastic model
\[
\frac{dS}{S} = \mu\,dt + \sigma\,dW,
\]
in which S is the stock price at time t and W is a Wiener process, that is, Brownian
motion (Figure 1). Intuitively, this says that the infinitesimal rate of return on S
has expected value μ dt and variance σ^2 dt. From here, one makes a few reasonable
assumptions, performs a number of manipulations, and deduces that
\[
\frac{\partial V}{\partial t} + \frac{1}{2} \sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS \frac{\partial V}{\partial S} - rV = 0,
\]
in which V is the option price function and r is the risk-free interest rate. This is
the famed Black–Scholes equation,
which can be solved numerically when given suitable boundary conditions [1].
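For the simplest boundary conditions, those of a European call option, the Black–Scholes equation has a well-known closed-form solution. The sketch below evaluates that standard formula; it is not taken from the text above, and the symbols K (strike price) and T (time to expiry, in years) as well as the function names are introduced here for illustration. The normal distribution function is computed from the error function in Python's standard library.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def call_price(S, K, r, sigma, T):
    """Closed-form Black-Scholes price of a European call option.

    S: current stock price, K: strike price (assumed symbol),
    r: risk-free rate, sigma: volatility, T: years to expiry (assumed symbol).
    """
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)


if __name__ == "__main__":
    # An at-the-money option: S = K = 100, 5% rate, 20% volatility, one year.
    print(round(call_price(100.0, 100.0, 0.05, 0.2, 1.0), 4))
```

The appearance of the normal distribution function here is one reason that evaluating it accurately matters in practice.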
When one considers the trillions of dollars traded annually in the global econ-
omy, the impact and importance of such mathematics is clear. See the 1962 entry
and the notes below for another connection between mathematics and Nobel Prize
winning economics applications.
No note on applications of mathematics in finance would be complete without
a mention of the dangers of using formulas in regimes in which they are not known
to hold. Famed investor Warren Buffett (1930– ) said in 2008:
I believe the Black–Scholes formula, even though it is the standard for
establishing the dollar liability for options, produces strange results
when the long-term variety are being valued. . . . The Black–Scholes
formula has approached the status of holy writ in finance . . . . If the
formula is applied to extended time periods, however, it can produce
absurd results. In fairness, Black and Scholes almost certainly under-
stood this point well. But their devoted followers may be ignoring
whatever caveats the two men attached when they first unveiled the
formula.
For a description of the faulty mathematics and incorrect assumptions that helped
instigate the “great recession,” see [3].
1997: Comments
Solution to the problem. For simplicity, we assume that μ = 0 and σ = 1.
Hence we must compute
\[
F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt;
\]
1 This is related to the error function by \frac{1}{2}\left[ 1 + \operatorname{erf}\left( \frac{x-\mu}{\sigma\sqrt{2}} \right) \right].
see Figure 2. A natural approach is to use the series expansion for the exponential
function:
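\[
F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n t^{2n}}{2^n n!}\,dt
= \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n! \sqrt{2\pi}} \int_{-\infty}^{x} t^{2n}\,dt.
\]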
then
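\[
F(x) = \begin{cases} \frac{1}{2} + G(x) & \text{if } x \geq 0, \\[2pt] \frac{1}{2} - G(|x|) & \text{if } x \leq 0. \end{cases}
\]
Equivalently,
\[
F(x) = \frac{1}{2} + \operatorname{sgn}(x)\, G(|x|),
\]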
in which
\[
\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0, \\ 0 & \text{if } x = 0, \\ -1 & \text{if } x < 0, \end{cases}
\]
is the sign function. Since each integrand that appears is nonnegative and has finite
integral, the Fubini–Tonelli theorem implies that
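\begin{align*}
G(x) &= \int_{0}^{x} \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{\infty} \frac{(-1)^n t^{2n}}{2^n n!}\,dt \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n}{2^n n! \sqrt{2\pi}} \int_{0}^{x} t^{2n}\,dt \\
&= \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{2^n (2n+1)\, n! \sqrt{2\pi}}.
\end{align*}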
For any fixed x, the series converges rapidly due to the factorial in the denominator.
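The series for G translates directly into code. The following sketch sums a fixed number of terms and compares the result with the error-function expression from the standard library; the default of 60 terms and the function names are choices made here, and the alternating series loses accuracy in floating point once |x| is large, so the comparison is restricted to moderate x.

```python
from math import erf, factorial, pi, sqrt


def G(x, terms=60):
    """Partial sum of the series for the integral of the density from 0 to x."""
    total = sum(
        (-1) ** n * x ** (2 * n + 1) / (2 ** n * (2 * n + 1) * factorial(n))
        for n in range(terms)
    )
    return total / sqrt(2 * pi)


def F(x, terms=60):
    """Standard normal distribution function via F(x) = 1/2 + sgn(x) G(|x|)."""
    sign = (x > 0) - (x < 0)
    return 0.5 + sign * G(abs(x), terms)


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        reference = 0.5 * (1.0 + erf(x / sqrt(2.0)))
        print(f"x = {x:5.1f}  series = {F(x):.12f}  erf = {reference:.12f}")
```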
Thus,
Perfect numbers. Let us be honest. We like number theory more than math-
ematical economics. There was too much to be said about Mersenne primes in our
1996 entry, so we have appropriated some space here to continue the discussion.
The Pythagoreans regarded the number 6 as special because it equals the sum
of its proper divisors: 1 + 2 + 3 = 6. The next largest numbers with this property
are 28, 496, and 8,128 since
28 = 1 + 2 + 4 + 7 + 14,
496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248,
8,128 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 127
+ 254 + 508 + 1,016 + 2,032 + 4,064.
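These sums are easy to confirm by machine. The sketch below computes the sum of the proper divisors of n by pairing each divisor d ≤ √n with n/d and then searches a small range for perfect numbers; the function names are illustrative.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n that are smaller than n."""
    if n < 2:
        return 0
    total = 1  # 1 divides every n >= 2
    d = 2
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:  # avoid counting a square root twice
                total += n // d
        d += 1
    return total


def perfect_numbers(limit):
    """Perfect numbers n with 2 <= n < limit."""
    return [n for n in range(2, limit) if proper_divisor_sum(n) == n]


if __name__ == "__main__":
    print(perfect_numbers(10_000))
```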
One of the cornerstones of Pythagorean philosophy was the assignment of mystical
qualities to numbers. They called numbers like 6, 28, 496, and 8,128 perfect
numbers. Later thinkers like Augustine of Hippo (354–430) and Alcuin of York
(ca. 735–804) celebrated the special nature of perfect numbers. In the City of God
(Part XI, Chapter 30), Augustine writes:
These works are recorded to have been completed in six days (the same
day being six times repeated), because six is a perfect number,—not
because God required a protracted time, as if He could not at once
create all things, which then should mark the course of time by the
movements proper to them, but because the perfection of the works
was signified by the number six. For the number six is the first which
is made up of its own parts, i.e., of its sixth, third, and half, which are
respectively one, two, and three, and which make a total of six.. . . And,
therefore, we must not despise the science of numbers, which, in many
passages of holy Scripture, is found to be of eminent service to the
careful interpreter.
The fact that it takes twenty-eight days for the moon to travel around the Earth
was also seen by many early thinkers to confirm the importance of perfect numbers.
2 The QR-decomposition (see the 1959 entry) is particularly effective here. Write I −C = QR,
in which Q is an orthogonal matrix and R is upper triangular. The given system QRx = d is
equivalent to Rx = Q^T d, which has an upper-triangular coefficient matrix and hence can be solved
via back substitution. This approach is more stable than Gaussian elimination, which is typically
promoted in a first course on linear algebra.
474 1997. THE NOBEL PRIZE OF MERTON AND SCHOLES
Bibliography
[1] J. Fogler, Options Pricing: Black–Scholes Model, https://www.investopedia.com/
university/options-pricing/black-scholes-model.asp.
[2] The Royal Swedish Academy of Sciences, Press Release (October 14, 1997), http://www.
nobelprize.org/nobel_prizes/economic-sciences/laureates/1997/press.html.
[3] F. Salmon, Recipe for disaster: the formula that killed Wall Street, Wired, February 23, 2009.
https://www.wired.com/2009/02/wp-quant/.
[4] Wikipedia, Black–Scholes model, https://en.wikipedia.org/wiki/Black-Scholes_model.
[5] Wikipedia, Perfect number, https://en.wikipedia.org/wiki/Perfect_number.
[6] Wolfram Mathworld, Odd perfect number, http://mathworld.wolfram.com/
OddPerfectNumber.html.
1998
Introduction
What is the densest way to pack spheres into n-dimensional space? In one
dimension, each sphere is a line segment of length two and hence the densest packing
consists of infinitely many line segments placed end to end. Thus, the packing
density in one dimension is 1. In two dimensions the problem is somewhat harder.
Here the “spheres” are disks of radius one. Joseph-Louis Lagrange (1736–1813)
proved in 1773 that the hexagonal lattice packing (see Figure 1) is the densest
possible lattice-based sphere packing in the plane. Its density is
\[ \frac{\pi\sqrt{3}}{6} \approx 0.9069, \]
so about 90.7% of the plane is covered. Although Axel Thue had provided a flawed
proof back in 1890, a complete proof that the hexagonal lattice packing is the
densest of all possible packings, including irregular, non-lattice-based packings,
came only in 1940, when it was established by László Fejes Tóth (1915–2005) [14].
476 1998. THE KEPLER CONJECTURE
later work of Christian Marchal [12]. In converting the proof ideas to formal form, Hales took
advantage of this to get a local inequality that was cleaner and easier to prove by computer [11].
100TH ANNIVERSARY PROBLEMS 477
Figure 2. The cubic close and hexagonal close packings both have
density $\pi/(3\sqrt{2}) \approx 0.74048$. The difference between the two
packings is in the relative orientation of every other layer. The
spheres in the hexagonal packing lie directly above the spheres
two layers below. The spheres in the cubic close packing do not:
consider the relative orientation of the green and blue triangles
suggested by the top and bottom layers.
(a) A cubic close packing of cannonballs at Fort Monroe in Hampton, Virginia, in 1861 (image public domain).
(b) Snowballs packed in hexagonal close (front) and cubic close packings (rear) (image public domain).
Figure 4. (a) In three dimensions, the sphere occupies approximately 52.36% of the box that contains it.
(b) How does the distance from the corner of the cube to the nearest point of the sphere change as the dimension increases?
Points marked in the figure: (0, 0), $(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}})$, and (1, 1).
1998: Comments
Cubes and spheres. What fraction of the n-dimensional cube (with sides of
length 2) is taken up by the n-dimensional unit sphere? In two dimensions the area
of the circle is π, giving a ratio of π/4 ≈ 0.785398, while in three dimensions the
volume of the sphere is 4π/3, giving a ratio of π/6 ≈ 0.523599; see Figure 4(a).
One can show that in n dimensions the sphere has volume
\[ V_n = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}, \]
in which
\[ \Gamma(s) = \int_0^{\infty} e^{-x} x^{s-1}\,dx, \qquad \operatorname{Re} s > 0, \]
is the gamma function. For positive integers n, we have
\[ \Gamma(x) = \begin{cases} (n-1)! & \text{if } x = n, \\[4pt] \sqrt{\pi}\,\dfrac{(2n-1)!!}{2^{n}} & \text{if } x = n + \tfrac{1}{2}, \end{cases} \]
in which n!! denotes the product of every other term of the corresponding factorial.
For example, 6!! = 6 · 4 · 2 and 7!! = 7 · 5 · 3 · 1.
Using Stirling’s formula (see the comments for the 1934 entry)
\[ n! \approx n^{n} e^{-n} \sqrt{2\pi n}, \]
it follows that the ratio
\[ r(n) = \frac{\pi^{n/2} / \Gamma(\frac{n}{2} + 1)}{2^{n}} \]
of the volumes of the n-dimensional sphere and cube tends to zero rapidly; see
Table 1. Thus, in higher dimensions the sphere occupies very little of the cube.
How can this be? Our low-dimensional intuition misleads us in higher dimensions.
For example, the point
\[ \frac{1}{\sqrt{n}}(1, 1, \ldots, 1) \in \mathbb{R}^{n} \]
lies on the n-dimensional sphere. Its distance to the corner (1, 1, . . . , 1) of the
n-dimensional cube is
\[ \sqrt{\sum_{i=1}^{n} \left(1 - \frac{1}{\sqrt{n}}\right)^{2}} = \sqrt{n}\left(1 - \frac{1}{\sqrt{n}}\right) = \sqrt{n} - 1, \]
which tends to infinity! This unexpected behavior is not evident in Figure 4(b).
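Both effects described above are easy to tabulate. The following Python sketch is an illustration rather than the book's own computation; the function names are assumptions made here. It evaluates the ratio r(n) with the standard-library gamma function, together with the corner distance $\sqrt{n} - 1$.

```python
import math


def sphere_to_cube_ratio(n: int) -> float:
    """Ratio r(n) of the volume of the unit n-sphere to the cube with sides of length 2."""
    sphere_volume = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return sphere_volume / 2 ** n


def corner_gap(n: int) -> float:
    """Distance from the corner (1, ..., 1) to the point (1, ..., 1)/sqrt(n) on the sphere."""
    return math.sqrt(n) - 1


# The ratio shrinks rapidly while the corner distance grows without bound.
for n in (1, 2, 3, 5, 10, 20):
    print(n, sphere_to_cube_ratio(n), corner_gap(n))
```

For n = 2 and n = 3 the first column reproduces the ratios π/4 and π/6 quoted in the text.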
Remark on the problem. One can show that R(1) = 2, and we think that
$R(2) = 1 + \frac{2}{\sqrt{3}}$. Then things rapidly get tricky. There are some n ≤ 20 for which
the exact answer is unknown. Some records for 1 ≤ n ≤ 32 are in [3, 8].
Bibliography
[1] J. Aron, Proof confirmed of 400-year-old fruit-stacking problem, New Scientist (August 12,
2014), https://www.newscientist.com/article/dn26041-proof-confirmed-of-400-year-
old-fruit-stacking-problem.
[2] W. Barlow, Probable nature of the internal symmetry of crystals, Nature 29 (1883), 186–188.
[3] Th. Gensane, Dense packings of equal spheres in a cube, Electron. J. Combin. 11
(2004), no. 1, Research Paper 33, 17. http://www.combinatorics.org/ojs/index.php/eljc/
article/view/v11i1r33/pdf. MR2056085
[4] T. C. Hales, A proof of the Kepler conjecture, Ann. of Math. (2) 162 (2005), no. 3, 1065–
1185, DOI 10.4007/annals.2005.162.1065. http://annals.math.princeton.edu/2005/162-3/
p01. MR2179728
[5] T. C. Hales, Historical overview of the Kepler conjecture, Discrete Comput. Geom. 36 (2006),
no. 1, 5–20, DOI 10.1007/s00454-005-1210-2. http://link.springer.com/article/10.1007
%2Fs00454-005-1210-2. MR2229657
[6] T. C. Hales and S. P. Ferguson, A formulation of the Kepler conjecture, Discrete Comput.
Geom. 36 (2006), no. 1, 21–69, DOI 10.1007/s00454-005-1211-1. http://link.springer.com/
article/10.1007%2Fs00454-005-1211-1. MR2229658
[7] T. C. Hales, J. Harrison, S. McLaughlin, T. Nipkow, S. Obua, and R. Zumkeller, A revi-
sion of the proof of the Kepler conjecture, Discrete Comput. Geom. 44 (2010), no. 1, 1–34,
DOI 10.1007/s00454-009-9148-4. http://link.springer.com/article/10.1007%2Fs00454-
009-9148-4. MR2639816
[8] A. Joós, On the packing of fourteen congruent spheres in a cube, Geom. Dedicata 140
(2009), 49–80, DOI 10.1007/s10711-008-9308-3. http://link.springer.com/article/10.
1007%2Fs10711-008-9308-3. MR2504734
[9] C. Hardie, translation of J. Kepler’s Strena, seu de nive sexangula, Oxford University Press,
2014.
[10] J. Kepler, Strena, seu de nive sexangula, Francofurti ad Moenum apud Godfefridum Tam-
pach, 1611.
[11] J. C. Lagarias, Dense sphere packings: a blueprint for formal proofs [book review
of MR3012355], Bull. Amer. Math. Soc. (N.S.) 53 (2016), no. 1, 159–166, DOI
10.1090/bull/1502. MR3443950
[12] C. Marchal, Study of the Kepler’s conjecture: the problem of the closest packing, Math. Z.
267 (2011), no. 3-4, 737–765, DOI 10.1007/s00209-009-0644-2. MR2776056
[13] J. Lagarias (ed.), The Kepler Conjecture: The Hales-Ferguson Proof, Springer-Verlag, 2011.
[14] L. F. Tóth, Über die dichteste Kugellagerung, Math. Z. 48 (1940), 676–684.
[15] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
1999
Introduction
A seminal result in analysis, the Baire category theorem, was published by the
French mathematician René-Louis Baire (1874–1932) in his 1899 doctoral thesis
Sur les fonctions de variables réelles. In particular, it is the main ingredient in
the proof of three fundamental theorems in functional analysis: the open mapping
theorem, the closed graph theorem, and the uniform boundedness principle [3].
Because of its numerous applications and continued use in modern mathematics,
its centennial merits special attention.
A few definitions are necessary in order to state this important theorem. A
subset A of a topological space (see the comments for the 1955 entry) is nowhere
dense if its closure $A^{-}$ has empty interior, that is, if $(A^{-})^{\circ} = \emptyset$. Figure 1 shows
the closure and interior of a set in $\mathbb{R}^{2}$. A subset A of a topological space is of the
first category if it can be written as the countable union of nowhere dense sets;
otherwise A is of the second category. The classical version of the Baire category
theorem says that a complete metric space is of the second category in itself [1–3].
Before proceeding, we should admit that Baire’s terminology is unenlightening
and dated. To add to the confusion, it has nothing to do with category theory,
an important branch of mathematics that originated in the middle of the 20th
century. A more modern statement of Baire’s theorem has two parts:
(a) In a complete metric space, the countable intersection of open dense sets is dense.
(b) A complete metric space is not the countable union of nowhere dense sets.
The theorem also applies to topological spaces that are homeomorphic (see the
comments for the 1917 entry) to complete metric spaces.
482 1999. BAIRE CATEGORY THEOREM
What is the big deal about the Baire category theorem? As a warmup, here is a
one-line proof that $\mathbb{R}$ is uncountable (see the 1918 entry). If $\mathbb{R} = \{a_1, a_2, \ldots\}$, then
$\mathbb{R} = \bigcup_{n=1}^{\infty} \{a_n\}$ is the countable union of nowhere dense sets, which contradicts
(b) since R is complete. A similar argument shows that the Cantor set (see the
comments for the 1917 entry) is uncountable. Since the Cantor set is compact, it
is complete; since it has no isolated points, each singleton is nowhere dense in it.
Hence it cannot be the countable union of singletons.
Here is another cute application. Let F be a fat Cantor set, that is, a Cantor-like
set with positive Lebesgue measure; see Figure 2. Then R is not the countable
union of translated copies of F . We cannot appeal to a measure-theoretic argument
here: since F has positive measure, a countable union of translates of F may well
have infinite Lebesgue measure. Baire’s theorem comes to the rescue. Like the
standard Cantor set, F is nowhere dense. Thus, (b) tells us that R is not the union
of countably many translates of F .
Why is the Baire category theorem so powerful? What is going on underneath
the hood? The proof of Baire’s theorem hinges in a crucial manner upon the axiom
of choice (see the comments below and in the 1964 entry).
Our problem for this year is a typical application of the Baire category theorem
to functional analysis. It may not be obvious how to apply the theorem to the
following problem. Here is a hint: look at finite-dimensional subspaces!
1999: Comments
Axiom of choice. The proof of the Baire category theorem, which can be
found in most real analysis textbooks, involves the subtle use of the axiom of
choice (AC). See the comments for the 1964 entry for a statement of the axiom
and a few general comments. We are interested here in discussing a few equivalent
formulations of AC. To continue our discussion, we require a few definitions.
1 The order produced by the well-ordering principle need not correspond to any sort of natural
order structure that A possesses. The axiom of choice implies that R can be well-ordered, but the
order has no relation to the standard order on R.
(e) Hausdorff maximality principle. Every partially ordered set has a maxi-
mal totally ordered subset.
(f) The Cartesian product of nonempty sets is nonempty.
(g) Every vector space has a basis.
(h) Every poset has a maximal antichain.2
(i) Every connected graph has a spanning tree.3
The following common theorems require the axiom of choice or some weaker
variant of it such as the axiom of countable choice (in which countably many arbi-
trary choices can always be made):
• A countable union of countable sets is countable.
• Every infinite set has a countable infinite subset.
• Every field has an algebraic closure.4
• Nielsen–Schreier theorem. Every subgroup of a free group is free.
• Baire category theorem. In a complete metric space, the countable intersec-
tion of open, dense sets is dense.
The bizarre results that follow from the axiom of choice, coupled with its intu-
itive and useful consequences, spur one to ask if AC is true or false. This is, in a
precise sense, a question that cannot be answered: Gödel and Cohen proved that
AC is independent of Zermelo–Fraenkel set theory.
A set of axioms is consistent if there does not exist a statement S such that both
S and its negation ¬S are provable from the axioms; that is, the axioms are not self-
contradictory. Gödel’s second incompleteness theorem (see the 1929 entry) asserts
that no consistent, “sufficiently complicated” axiomatic system, including Zermelo–Fraenkel
set theory (ZF), can prove its own consistency.
Outside of logic and set theory, few working mathematicians concern themselves
with the consistency of ZF. Almost everyone believes that ZF is consistent, but
Gödel’s theorem tells us that we cannot hope to prove its consistency without
recourse to a more powerful axiom system; then we face the problem of proving
that that system is consistent!
Think of systems of axioms as “operating systems” for software. Most of mod-
ern mathematics “runs under” ZFC, the Zermelo–Fraenkel axioms augmented with
the axiom of choice. ZFC is sufficient to “run” the software that most average
“users” (mathematicians, statisticians, physicists, computer scientists, and so forth)
need. It has not “crashed” (been proven inconsistent) yet, but no one knows if ZFC
is “crash-proof” (consistent). There are other, more exotic operating systems out
2 An antichain is a subset of a poset with the property that any two distinct elements in the
subset are not comparable.
3 A spanning tree in a graph G is a connected subgraph that contains every vertex of G and
The standard example is the complex field C. That C is algebraically closed is the fundamental
theorem of algebra.
there, such as ZFC augmented by certain large cardinal axioms, but mostly these
are for “power users” such as set theorists and logicians. The average user is content
running on ZFC and rarely thinks about operating systems, if at all.
Bibliography
[1] G. B. Folland, Real analysis: Modern techniques and their applications, 2nd ed., Pure and
Applied Mathematics (New York), A Wiley-Interscience Publication, John Wiley & Sons,
Inc., New York, 1999. MR1681462
[2] S. H. Jones, Applications of the Baire category theorem, Real Anal. Exchange 23 (1997/98),
no. 2, 363–394. https://projecteuclid.org/euclid.rae/1337001353. MR1640007
[3] T. Tao, The Baire category theorem and its Banach space consequences, http://terrytao.
wordpress.com/2009/02/01/245b-notes-9-the-baire-category-theorem-and-its-banach-
space-consequences.
2000
Introduction
This is another entry for which there were at least two good options. The
Clay Millennium Problems were one natural candidate; we briefly discuss them
in the comments below. This year’s winner is one of the most popular statistical
programming languages and environments: R.
R was created in 1993 by Ross Ihaka (1954– ) and Robert Gentleman (1959– )
at the University of Auckland, New Zealand. Its name is both a reference to
the first names of its inventors and to the underlying S programming language
that was developed at Bell Labs in the 1970s [13]. R, which is open source and
freely available, is widely used in industry and academia to perform statistical
computations. There are numerous developers and thousands of useful packages
available online.
Version 1.0.0 of R was released on February 29, 2000. This was the first version
considered stable enough for general use [5]:
The release of a current major version indicates that we believe that R
has reached a level of stability and maturity that makes it suitable for
production use. Also, the release of 1.0.0 marks that the base language
and the API for extension writers will remain stable for the foreseeable
future. In addition we have taken the opportunity to tie up as many
loose ends as we could.
In the comments to the 1953 entry, we saw how Andrey Markov developed
Markov chains to analyze the writing of Alexander Pushkin. What about the cre-
ation of literature? A little probability theory ensures that an immortal monkey
who pounds away randomly at a typewriter for all eternity will almost surely pro-
duce the complete works of William Shakespeare1 , along with the true version of
his lost play Love’s Labour’s Won, along with many false versions2 . What about
more sensible applications of mathematics to literature? For example, we might
wish to determine if a certain passage was written by the purported author. Has
an author’s style changed over time? All of these questions involve culling large
sets of linguistic data, then parsing and analyzing it.
Maciej Eder, Jan Rybicki, and Mike Kestemont created an R package to per-
form such analyses [4]. The motivating examples they consider range from a
pseudonymously published work written by J. K. Rowling (1965– ) to the alleged
original version of To Kill a Mockingbird by Harper Lee (1926–2016). Their paper
1 “Ford!” he said, “there’s an infinite number of monkeys outside who want to talk to us
488 2000. R
is full of code and detailed textual analyses (see Figure 1) and gives a small glimpse
of what one can do with R:
This software paper describes ‘Stylometry with R’ (stylo), a flexible
R package for the high-level analysis of writing style in stylometry.
Stylometry (computational stylistics) is concerned with the quantita-
tive study of writing style, e.g. authorship verification, an application
which has considerable potential in forensic contexts, as well as his-
torical research. In this paper we introduce the possibilities of stylo
for computational text analysis, via a number of dummy case studies
from English and French literature. We demonstrate how the package
is particularly useful in the exploratory statistical analysis of texts,
e.g. with respect to authorial writing style. Because stylo provides an
attractive graphical user interface for high-level exploratory analyses,
it is especially suited for an audience of novices, without programming
skills (e.g. from the Digital Humanities). More experienced users can
benefit from our implementation of a series of standard pipelines for
text processing, as well as a number of similarity metrics.
2000: Comments
Monkey business. On the theme of monkey-generated literature, we cannot
pass up the opportunity to recount the bizarre story of Pierre Brassau. In 1964,
tabloid journalist Åke Axelsson had a four-year-old chimpanzee produce a series
of paintings that were later exhibited in the Gallerie Christinae in Göteborg under
the pretense that they were the work of “Pierre Brassau,” an unheralded French
painter. One critic applauded the work: “Brassau paints with powerful strokes, but
also with clear determination. His brush strokes twist with furious fastidiousness.
Pierre is an artist who performs with the delicacy of a ballet dancer” [12]. Needless
to say, many in the Swedish art world were not amused by the hoax.
Maximum amusement. On the theme of hoaxes (make sure to check out the
comments for the 1996 entry), MIT students Jeremy Stribling, Maxwell Krohn, and
Daniel Aguayo wrote SCIgen, “a program that generates random Computer Science
research papers, including graphs, figures, and citations” [9]. It produced the now-
infamous paper Rooter: A Methodology for the Typical Unification of Access Points
and Redundancy [8], which opens with the immortal lines:
Many scholars would agree that, had it not been for active networks,
the simulation of Lamport clocks might never have occurred. The
notion that end-users synchronize with the investigation of Markov
models is rarely outdated. A theoretical grand challenge in theory
is the important unification of virtual machines and real-time theory.
To what extent can web browsers be constructed to achieve this pur-
pose? Certainly, the usual methods for the emulation of Smalltalk
that paved the way for the investigation of rasterization do not apply
in this area. In the opinions of many, despite the fact that conventional
wisdom states that this grand challenge is continuously answered by
the study of access points, we believe that a different solution is nec-
essary. It should be noted that Rooter runs in Ω(log log n). Certainly,
the shortcoming of this type of solution, however, is that compilers
and superpages are mostly incompatible. Despite the fact that similar
methodologies visualize XML, we surmount this issue without synthe-
sizing distributed archetypes.
This meaningless load of fetid dingo’s kidneys was accepted by the Ninth World
Multiconference on Systemics, Cybernetics and Informatics (WMSCI 2005). What
was the point of this exercise? The mischievous trio hoped to cause “maximum
amusement” and “test whether such meaningless manuscripts could pass the screen-
ing procedure for conferences that, they feel, exist simply to make money” [2].
Curiously, the statistical generation and analysis of research papers has come
full circle. In 2014, a study by computer scientist Cyril Labbé revealed that at
least 120 nonsense papers generated by SCIgen had been published in conference
proceedings between 2008 and 2013 [6]!
• P versus NP problem. Let P denote the class of decision problems that can be
solved in polynomial time (with respect to the length of the input) and let NP be
the class of problems for which a proposed solution can be verified in polynomial
time. Thus, P ⊆ NP. The million-dollar question is whether equality holds [11].
That is, does knowing how to quickly verify a solution to a problem automatically
mean that a fast algorithm for solving that problem exists? For example, one
multiplication verifies the correctness of an integer factorization. Does this imply
that a deterministic, polynomial-time integer factorization algorithm exists?
• Navier–Stokes equation. This complicated system of partial differential equa-
tions with prescribed boundary conditions, named after Claude-Louis Navier
(1785–1836) and George Gabriel Stokes (1819–1903), governs three-dimensional
fluid flow. For example, the turbulent behavior of water and air seems to adhere
to these equations; see Figure 2. Under reasonable mathematical hypotheses, do
solutions exist? Are they unique? Or can solutions “break down” in finite time?
How well does Navier–Stokes model physical reality?
• Hodge conjecture. This conjecture concerns how much of the topology of the
solution set of a system of algebraic equations can be defined in terms of further
algebraic equations. Since this is a tough one to describe with any degree of
faithfulness, we refer the reader to [3, 10] for further information.
• Poincaré conjecture. Is every simply connected, closed, three-dimensional
manifold homeomorphic to the three-dimensional sphere? This conjecture has a
long and storied history and, by some accounts, it has resulted in two or three
Fields Medals! See the 2003 entry for more details.
• Birch and Swinnerton-Dyer conjecture. What is the relationship between
the number of points on an elliptic curve over finite fields of prime order and the
rank of the group of rational points on the curve? See the comments for the 1921
entry for a detailed discussion of this conjecture.
Of the seven millennium problems, only the Poincaré conjecture has been resolved;
see the 2003 entry.
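The factorization example in the P versus NP item above contrasts checking an answer with finding one. The Python sketch below illustrates that contrast under stated assumptions: the function names are invented here, and trial division stands in for "some" factoring method (it is not a polynomial-time algorithm, since the number of steps grows like the square root of n, which is exponential in the number of digits).

```python
from math import prod


def verify_factorization(n: int, factors: list[int]) -> bool:
    """Check a proposed factorization: every factor exceeds 1 and the product equals n."""
    return all(f > 1 for f in factors) and prod(factors) == n


def factor_by_trial_division(n: int) -> list[int]:
    """Find the prime factorization of n >= 2 by trying divisors up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # strip out every copy of the divisor d
            factors.append(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains is a prime factor
        factors.append(n)
    return factors


print(verify_factorization(8051, [83, 97]))  # True
print(factor_by_trial_division(8051))        # [83, 97]
```

Verification costs a handful of multiplications, while the search loop may run through many candidate divisors before it succeeds.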
Bibliography
[1] D. Adams, The Hitchhiker’s Guide to the Galaxy, Pan Books, 1979.
[2] P. Ball, Computer conference welcomes gobbledegook paper, Nature.com, https://www.
nature.com/articles/nature03653.
[3] Clay Mathematics Institute, The Millennium Prize Problems, http://www.claymath.org/
millennium-problems/millennium-prize-problems.
[4] M. Eder, J. Rybicki, and M. Kestemont, Stylometry with R: A Package for Computational
Text Analysis, The R Journal 8 (2016), no. 1, 107–121. https://journal.r-project.org/
archive/2016/RJ-2016-007/RJ-2016-007.pdf.
[5] R Developer Page, Statistical analysis environment “R” version 1.0.0 is released, http://
developer.r-project.org/R-release-1.0.0.txt.
[6] R. Van Noorden, Publishers withdraw more than 120 gibberish papers, Nature.com, https://
www.nature.com/news/publishers-withdraw-more-than-120-gibberish-papers-1.14763.
[7] The R Project for Statistical Computing, http://www.r-project.org/.
[8] J. Stribling, D. Aguayo, and M. Krohn, Rooter: a methodology for the typical unification of
access points and redundancy, https://pdos.csail.mit.edu/archive/scigen/rooter.pdf.
[9] J. Stribling, M. Krohn, and D. Aguayo, SCIgen—An Automatic CS Paper Generator,
https://pdos.csail.mit.edu/archive/scigen/.
[10] Wikipedia, Hodge conjecture, https://en.wikipedia.org/wiki/Hodge_conjecture.
[11] Wikipedia, P versus NP problem, https://en.wikipedia.org/wiki/P_versus_NP_problem.
[12] Wikipedia, Pierre Brassau, https://en.wikipedia.org/wiki/Pierre_Brassau.
[13] Wikipedia, R (programming language), http://en.wikipedia.org/wiki/R_(programming_language).
[14] Wikipedia, Yang–Mills existence and mass gap, https://en.wikipedia.org/wiki/Yang-
Mills_existence_and_mass_gap
2001
Introduction
Project Euler, created by Colin Hughes in 2001, is an outstanding website that
has provided countless hours of enjoyment to mathematicians, computer scientists,
and other computationally minded people. It describes itself as follows [1]:
Project Euler is a series of challenging mathematical/computer pro-
gramming problems that will require more than just mathematical in-
sights to solve. Although mathematics will help you arrive at elegant
and efficient methods, the use of a computer and programming skills
will be required to solve most problems. The motivation for starting
Project Euler, and its continuation, is to provide a platform for the
inquiring mind to delve into unfamiliar areas and learn new concepts
in a fun and recreational context.
For many of the problems, one can quickly come up with a program that will
eventually find the solution. However, this does not mean that the program will
run in a reasonable amount of time. As an extreme example of this phenomenon,
consider chess. Since there are only finitely many possible board configurations, an
analysis of chess can be reduced to a finite computation. Does the first player have
a winning strategy? Can the second player always force the game to end in a draw?
Unfortunately, the number of board configurations and possible moves is far too
large for humans or their computers to analyze by brute force. The same is true
in many of the Project Euler problems: although one can describe a brute-force
approach, the naive approach simply takes too long to run.
Project Euler problems illustrate several key points:
• Theory has a place in computational problems: a clever reformulation of the
problem may prove more tractable than the original approach.
• Implementation is nontrivial: different programming languages and environments
may be better suited to different tasks.
• Although brute force sometimes works, an elegant approach is often more illu-
minating.
We illustrate these principles with the following problem. Consider a large triangle
and several possible triangulations; see Figure 1(a). Assign colors (red, green, or
blue) to each vertex as follows:
(a) the bottom left vertex of the original triangle is red, the bottom right is green,
the top is blue;
494 2001. COLIN HUGHES FOUNDS PROJECT EULER
Figure 1. (a) The initial triangle has vertices of three different colors.
(b) A refinement with one subtriangle with vertices of three different colors.
(c) A further refinement with three subtriangles with vertices of three different colors.
(b) any vertex on an outer edge of the original triangle must receive one of the
two colors assigned to the endpoints of that edge;
(c) internal vertices may be colored red, green, or blue with no restrictions.
Does there exist a small triangle with red, green, and blue vertices?
Given a fixed subdivision, we can check all possible labelings by brute force.
However, this will not settle the general question since there are infinitely many
possible subdivisions that must be considered. An elegant approach to the prob-
lem is to prove that the number of triangles with distinctly labeled vertices is odd;
therefore at least one such triangle exists. This result, now known as Sperner’s
lemma, was discovered by Emanuel Sperner (1905–1980) in 1928; see [2, 4, 5]. Sur-
prisingly, it can be used to prove Brouwer’s fixed-point theorem, a seminal result
in topology; see the 2009 entry.
Here is a sketch of the proof. First, label the colors 1, 2, 3. Let $T_{abc}$, in which
$a \le b \le c$, denote the number of small triangles with vertices labeled a, b, and c
in some order. We want to show that $T_{123}$ is positive. Let $S_{12}$ denote the number
of 1−2 segments on the bottom of the original triangle. Then twice the number of
1−2 segments in the subdivision is
\[ T_{123} + 2T_{112} + 2T_{122} + S_{12}. \]
This is because a 1−2−3 triangle has one 1−2 side, a 1−1−2 or 1−2−2 triangle has two,
each interior 1−2 segment is a side of exactly two small triangles, and each 1−2 segment
on the boundary, which must lie on the bottom edge, is a side of only one. Since the
displayed sum is even, the parity of $T_{123}$ is the same as that of $S_{12}$. We leave it
as an exercise to show that $S_{12}$, the number of 1−2 segments on the bottom edge of the
original triangle, is odd, which proves the claim.
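The parity claim can be tested experimentally on one family of subdivisions. The Python sketch below is an illustration only: it assumes the standard triangular grid with n segments per side, indexes vertices by barycentric triples (a, b, c) with a + b + c = n, and uses function names invented here. It colors the grid at random subject to rules (a) through (c) and counts the small triangles that carry all three colors.

```python
import random


def random_sperner_coloring(n, rng):
    """Randomly color grid vertices (a, b, c), a + b + c = n, obeying Sperner's rules.

    Color 1 is allowed only when a > 0, color 2 only when b > 0, and color 3 only
    when c > 0, so corners are forced and edge vertices use the colors of that edge.
    """
    coloring = {}
    for a in range(n + 1):
        for b in range(n + 1 - a):
            c = n - a - b
            allowed = [color for color, coord in ((1, a), (2, b), (3, c)) if coord > 0]
            coloring[(a, b, c)] = rng.choice(allowed)
    return coloring


def count_three_colored_triangles(n, coloring):
    """Count small triangles whose vertices carry all three colors."""
    count = 0
    # Upward-pointing small triangles: base point (a, b, c) with a + b + c = n - 1.
    for a in range(n):
        for b in range(n - a):
            c = n - 1 - a - b
            corners = [(a + 1, b, c), (a, b + 1, c), (a, b, c + 1)]
            if {coloring[v] for v in corners} == {1, 2, 3}:
                count += 1
    # Downward-pointing small triangles: base point with a + b + c = n - 2.
    for a in range(n - 1):
        for b in range(n - 1 - a):
            c = n - 2 - a - b
            corners = [(a + 1, b + 1, c), (a + 1, b, c + 1), (a, b + 1, c + 1)]
            if {coloring[v] for v in corners} == {1, 2, 3}:
                count += 1
    return count


rng = random.Random(2001)
for n in (1, 2, 5, 10):
    coloring = random_sperner_coloring(n, rng)
    print(n, count_three_colored_triangles(n, coloring))  # each count is odd
```

Such an experiment covers only finitely many colorings of one kind of subdivision, which is exactly why the parity argument is needed for the general statement.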
and computer science (to efficiently code the problem). Form a group and see how
many of these problems you can solve.
2001: Comments
Fibonacci fun. The twenty-fifth problem on the Project Euler website con-
cerns the Fibonacci numbers, defined by
\[ F_0 = 0, \quad F_1 = 1, \quad \text{and} \quad F_{n+1} = F_n + F_{n-1}. \tag{2001.1} \]
It asks:
What is the index of the first term in the Fibonacci sequence to contain
1,000 digits?
One can solve this by brute force. Here is a short Mathematica program to solve
the problem by searching among the first 100,000 Fibonacci numbers:
For[n = 1, n <= 100000, n++,
If[Log[10, N[Fibonacci[n]]] >= 999, Print[n]; Break[]]
]
The computer provides the answer, n = 4,782, in a fraction of a second. However,
this is not terribly satisfactory since it does not suggest a general method. We
relied upon the “black box” command Fibonacci[] to do the work for us. What
would happen if instead of 1,000 digits we insisted upon a billion? Do we really
understand what is going on?
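For comparison, the brute-force search can be written without any built-in Fibonacci command. The following Python sketch is not from the original text, and the function name is an illustrative choice. It iterates the recurrence (2001.1) directly with exact integer arithmetic.

```python
def first_index_with_digits(digits: int) -> int:
    """Index n of the first Fibonacci number F_n with at least `digits` decimal digits."""
    threshold = 10 ** (digits - 1)  # smallest integer with that many digits
    index, previous, current = 1, 0, 1  # F_0 = 0 and F_1 = 1
    while current < threshold:
        previous, current = current, previous + current
        index += 1
    return index


print(first_index_with_digits(1000))  # 4782
```

The loop performs one big-integer addition per index, so it is transparent but scales poorly: asking for a billion digits would require billions of additions on enormous integers.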
One of the goals of the Project Euler problems is to show the interplay between
theory and coding. Is there a more elegant approach to the problem above? Binet’s
formula is a beautiful closed-form expression for the nth Fibonacci number:
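\[ F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right]. \tag{2001.2} \]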
Although it is named after Jacques Philippe Marie Binet (1786–1856), who found it
in 1843, the formula was already known to Abraham de Moivre (1667–1754). This
is a classic example of Stigler’s law of eponymy; see the comments for the 2010
entry. The comments below contain two derivations of Binet’s formula.
How does Binet’s formula help? First observe that
\[ \frac{1 + \sqrt{5}}{2} \approx 1.61803398 \]
is the golden ratio from classical geometry. Its algebraic conjugate,1
\[ \frac{1 - \sqrt{5}}{2} \approx -0.61803398, \]
has absolute value less than one. Therefore, (2001.2) ensures that
\[ F_n \approx \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} \]
1 The golden ratio is a root of $z^2 - z - 1$, which is irreducible over $\mathbb{Z}$ (and hence, by Gauss’s
lemma, over Q). The other root of this polynomial is an algebraic conjugate of the golden ratio.
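with an error that tends to zero exponentially fast. If we want the first index n
such that $F_n$ has k + 1 digits, then we solve
\[ \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} \approx 10^{k} \quad \text{and deduce} \quad n \approx \frac{k \log 10 + \log \sqrt{5}}{\log\left( \frac{1 + \sqrt{5}}{2} \right)}. \]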
Let k = 999 and conclude that the first Fibonacci number with 1,000 digits has index
approximately 4,781.86. Rounding up to 4,782 yields the answer. In addition to
estimating the critical index, we can also check our claim with Binet’s formula:
\[ F_{4,781} \approx 6.613373228392440 \times 10^{998}, \qquad F_{4,782} \approx 1.070066266382759 \times 10^{999}. \]
We are correct! For more on the Fibonacci numbers, see the 1938, 1957, 1970, and
1980 entries.
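The estimate derived above takes only a few lines to evaluate. Here is a minimal Python sketch, with the function name chosen for illustration; it computes $(k \log 10 + \log \sqrt{5}) / \log \phi$ for k + 1 = 1,000 digits and rounds up.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def estimated_first_index(digits: int) -> float:
    """Approximate index of the first Fibonacci number with `digits` digits."""
    k = digits - 1
    return (k * math.log(10) + math.log(math.sqrt(5))) / math.log(PHI)


estimate = estimated_first_index(1000)
print(estimate)             # roughly 4781.86
print(math.ceil(estimate))  # 4782
```

Unlike the brute-force search, the cost of this calculation does not grow with the number of digits requested. The approximation discards the $\psi^n$ term, so it is best trusted for large n and confirmed near the boundary, as the text does.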
Binet’s formula via linear algebra. Let
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \]
and use induction to confirm that2
\[ A^{n} = \begin{bmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{bmatrix} \tag{2001.3} \]
for n = 1, 2, 3, . . .. The characteristic polynomial of A is
\[
p_A(z) = z^2 - z - 1,
\]
which has roots
\[
\phi = \frac{1+\sqrt{5}}{2} \quad\text{and}\quad \psi = \frac{1-\sqrt{5}}{2}. \tag{2001.4}
\]
Eigenvectors that correspond to the eigenvalues (2001.4) are $s_1 = [1 \;\; -\psi]^{T}$ and $s_2 = [1 \;\; -\phi]^{T}$. This yields the diagonalization $A = SDS^{-1}$, in which $S = [s_1 \; s_2]$ and $D = \operatorname{diag}(\phi, \psi)$. Thus,
\[
A^n = (SDS^{-1})^n = \underbrace{(SDS^{-1})(SDS^{-1}) \cdots (SDS^{-1})}_{n \text{ times}} = SD^nS^{-1}
\]
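and hence
\[
\underbrace{\begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}}_{A^n}
= \underbrace{\begin{pmatrix} 1 & 1 \\ -\psi & -\phi \end{pmatrix}}_{S}
\underbrace{\begin{pmatrix} \phi^n & 0 \\ 0 & \psi^n \end{pmatrix}}_{D^n}
\underbrace{\left(-\frac{1}{\sqrt{5}}\right)\begin{pmatrix} -\phi & -1 \\ \psi & 1 \end{pmatrix}}_{S^{-1}}. \tag{2001.5}
\]
Then
\[
\begin{pmatrix} F_{n+1} & F_n \\ F_n & F_{n-1} \end{pmatrix}
= \frac{1}{\sqrt{5}} \begin{pmatrix} \star & \phi^n - \psi^n \\ \star & \star \end{pmatrix},
\]
in which $\star$ denotes entries whose exact values are irrelevant. Compare the (1, 2) entries on both sides and obtain $F_n = \frac{1}{\sqrt{5}}(\phi^n - \psi^n)$, which is Binet’s formula.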
2 Here are some nice consequences of (2001.3). Take determinants in (2001.3) and obtain Simson’s formula (also known as Cassini’s identity): $F_{n+1}F_{n-1} - F_n^2 = (-1)^n$. Compare the lower-left entries of $A^{m+n} = A^m A^n$ and obtain $F_{m+n} = F_{m-1}F_n + F_m F_{n+1}$. Induction and the preceding formula can be used to prove that $F_d \mid F_n$ whenever $d \mid n$. For example, $F_5 = 5$ divides $F_{10} = 55$.
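The matrix identity (2001.3) also suggests a way to compute Fibonacci numbers with exact integer arithmetic. The following Python sketch raises $A$ to the $n$th power by repeated squaring and reads $F_n$ from the (1, 2) entry; the helper names `mat_mul`, `mat_pow`, and `fib` are illustrative choices of ours, not notation from the text.

```python
def mat_mul(X, Y):
    """Product of two 2x2 integer matrices stored as nested lists."""
    return [
        [X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
        [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]],
    ]

def mat_pow(A, n):
    """A**n for n >= 0 by repeated squaring (about log2(n) matrix products)."""
    result = [[1, 0], [0, 1]]          # identity matrix
    while n > 0:
        if n & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        n >>= 1
    return result

def fib(n):
    """F_n, read off as the (1, 2) entry of A**n as in (2001.3)."""
    return mat_pow([[1, 1], [1, 0]], n)[0][1]

print(mat_pow([[1, 1], [1, 0]], 10))   # [[F_11, F_10], [F_10, F_9]]
print(len(str(fib(4782))))             # number of decimal digits of F_4782
```

Because every entry stays an integer, this approach avoids the rounding issues that arise when Binet’s formula is evaluated in floating point for large $n$.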
Bibliography
[1] Project Euler.net, https://projecteuler.net/.
[2] J. Franklin, Methods of mathematical economics: Linear and nonlinear programming, fixed-
point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin, 1980.
MR602694
[3] T. Koshy, Fibonacci and Lucas numbers with applications, Pure and Applied Mathematics
(New York), Wiley-Interscience, New York, 2001. MR1855020
[4] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[5] E. Sperner, Neuer beweis für die invarianz der dimensionszahl und des gebietes (German),
Abh. Math. Sem. Univ. Hamburg 6 (1928), no. 1, 265–272, DOI 10.1007/BF02940617.
MR3069504
2002
PRIMES in P
Introduction
Given a large integer n, how quickly can one determine whether it is prime or
composite? The naive method is to divide $n$ by each prime 2, 3, 5, 7, . . .. If one reaches $\sqrt{n}$ without finding a factor, then $n$ is prime. However, if $n$ has a few
hundred digits, this approach can take longer than the age of the universe!
A more efficient approach is based upon Fermat’s little theorem, which says
that if $p$ is prime and $p$ does not divide $a$, then1
\[
a^{p-1} \equiv 1 \pmod{p}. \tag{2002.1}
\]
First, select an integer $a$ with $1 < a < n$. The Euclidean algorithm rapidly computes $\gcd(a, n)$ without the need to factor either number.2 If $\gcd(a, n) \neq 1$, then $n$ is composite. If $\gcd(a, n) = 1$ and $a^{n-1} \not\equiv 1 \pmod{n}$, then Fermat’s little theorem ensures that $n$ is composite (although it does not provide a specific factor of $n$). If $a^{n-1} \equiv 1 \pmod{n}$, then the test is inconclusive. In this case, repeat the test with another base $a$.
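The three possible outcomes can be summarized in a short Python sketch. The function name `fermat_test` and the string return values are our own conventions; the built-in three-argument `pow` performs modular exponentiation without ever forming $a^{n-1}$.

```python
from math import gcd

def fermat_test(n, a):
    """Fermat-based test for an integer n > 2 and a base a with 1 < a < n.

    Returns "composite" when n is certainly composite and "inconclusive" otherwise.
    """
    if gcd(a, n) != 1:
        return "composite"            # a shares a nontrivial factor with n
    if pow(a, n - 1, n) != 1:
        return "composite"            # Fermat's little theorem is violated
    return "inconclusive"             # n is prime or a pseudoprime for the base a

for base in (2, 3, 11):
    print(base, fermat_test(341, base))
```

For $n = 341 = 11 \cdot 31$ the base 2 is inconclusive, while the bases 3 and 11 both reveal that $n$ is composite (the latter through the gcd step).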
This can be implemented rapidly on a computer since $a^{n-1}$ need not be computed directly. An example illustrates the approach. Suppose that we wish to determine whether $n = 1763$ is prime or composite. We first write $n - 1 = 1762$ in binary. Subtract from 1762 the largest power of 2 that is at most 1762 and repeat with the remainder:
1762 = 1024 + 738,
738 = 512 + 226,
226 = 128 + 98,
98 = 64 + 34,
34 = 32 + 2.
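The decomposition above, and the repeated-squaring idea that it supports, can be sketched in Python as follows. The function names `powers_of_two` and `power_mod` are hypothetical; Python’s built-in `pow(a, e, n)` already does the same job, so this is only meant to expose the mechanism.

```python
def powers_of_two(e):
    """Greedy decomposition of e >= 0 into distinct powers of 2, largest first."""
    parts = []
    while e > 0:
        top = 1 << (e.bit_length() - 1)   # largest power of 2 that is at most e
        parts.append(top)
        e -= top
    return parts

def power_mod(a, e, n):
    """Compute a**e mod n by repeated squaring, never forming a**e itself."""
    result = 1
    square = a % n                        # holds a**(2**i) mod n at step i
    while e > 0:
        if e & 1:                         # this power of 2 occurs in e
            result = (result * square) % n
        square = (square * square) % n
        e >>= 1
    return result

print(powers_of_two(1762))
print(power_mod(2, 1762, 1763) == pow(2, 1762, 1763))
```

Only about $\log_2 e$ squarings and at most as many multiplications are needed, and every intermediate value stays below $n^2$.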
Thus,
1 To prove Fermat’s little theorem, first show that $a, 2a, 3a, \ldots, (p-1)a$ are distinct and nonzero modulo $p$. Then $a, 2a, 3a, \ldots, (p-1)a$ are congruent modulo $p$ to $1, 2, 3, \ldots, (p-1)$, in some order. Thus, $a \cdot 2a \cdot 3a \cdots (p-1)a \equiv 1 \cdot 2 \cdot 3 \cdots (p-1) \pmod{p}$, and hence $a^{p-1}(p-1)! \equiv (p-1)! \pmod{p}$. Since $p$ does not divide $(p-1)!$, we obtain $a^{p-1} \equiv 1 \pmod{p}$.
2 A theorem of Gabriel Lamé says that the number of steps in the Euclidean algorithm is at
Fortunately, $3^{340} \equiv 56 \pmod{341}$ and hence 3 is a witness to the fact that 341 is
composite; that is, 341 is not a pseudoprime for the base 3. By testing an integer
n with several different bases, we can weed out more pseudoprimes. Unfortunately,
there are composite numbers n that are pseudoprime for all bases 2 ≤ k ≤ n − 1
with gcd(k, n) = 1.3 These Carmichael numbers always fool our Fermat-based
primality test; see the 2010 entry.
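A brute-force check makes the phenomenon concrete. In the Python sketch below, the function name `fools_fermat_for_all_bases` is our own; it reports whether $n$ satisfies the Fermat congruence for every base coprime to $n$. Note that every prime also satisfies this condition, so the check is only interesting for composite inputs.

```python
from math import gcd

def fools_fermat_for_all_bases(n):
    """True if k**(n-1) = 1 (mod n) for every 2 <= k <= n - 1 with gcd(k, n) = 1."""
    return all(pow(k, n - 1, n) == 1 for k in range(2, n) if gcd(k, n) == 1)

print(fools_fermat_for_all_bases(561))   # 561 = 3 * 11 * 17
print(fools_fermat_for_all_bases(341))   # 341 = 11 * 31
```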
Is there a polynomial-time algorithm that distinguishes primes and composites?
By polynomial time we mean that there are constants $A, B > 0$ such that the number of elementary steps performed by the algorithm on the input $n$ is at most $A(\log n)^B$. The focus on $\log n$ is because the length of the decimal (or binary)
representation of n is proportional to log n.
There are algorithms that depend upon randomly selected parameters that can
do the job. One example is the Miller–Rabin test, named after Gary Lee Miller and
Michael Oser Rabin (1931– ). Let $n > 2$ and write $n - 1 = 2^r m$, in which $m \geq 1$ is odd and $r \geq 0$. If
\[
b^{m} \equiv 1 \pmod{n} \quad\text{or}\quad b^{2^j m} \equiv -1 \pmod{n} \ \text{ for some } j \in \{0, 1, 2, \ldots, r-1\},
\]
then n passes Miller’s test for the base b. If n fails the test for some base b, then it
is composite. It can be shown that if n is an odd composite number, then n passes
Miller’s test for at most $(n-1)/4$ bases $b$ with $1 \leq b \leq n-1$.4 This yields the Miller–Rabin probabilistic primality test: if $n$ passes Miller’s test for $k$ different bases, then the probability that $n$ is composite is at most $1/4^k$. For example, if $n$ passes the test for $k = 50$ bases, then this probability is $1/4^{50} \approx 7.89 \times 10^{-31}$. Although
we are not 100% certain that n is a prime, our level of confidence is sufficient for
most industrial applications. Sometimes speed is more important than absolute
certainty.
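Miller’s test and the resulting probabilistic procedure can be sketched in Python. The names `passes_miller` and `is_probable_prime`, the default of 50 rounds, and the optional `seed` parameter are illustrative assumptions of ours; the sketch assumes that $n$ is odd whenever `passes_miller` is called.

```python
import random

def passes_miller(n, b):
    """Miller's test for an odd integer n > 2 and a base b with 1 <= b <= n - 1."""
    r, m = 0, n - 1
    while m % 2 == 0:                 # write n - 1 = 2**r * m with m odd
        m //= 2
        r += 1
    x = pow(b, m, n)
    if x == 1 or x == n - 1:          # b**m = 1, or the case j = 0
        return True
    for _ in range(r - 1):            # the cases j = 1, ..., r - 1
        x = (x * x) % n
        if x == n - 1:
            return True
    return False

def is_probable_prime(n, k=50, seed=None):
    """Miller-Rabin with k randomly chosen bases; False means certainly composite."""
    if n < 2:
        return False
    if n < 4:
        return True                   # 2 and 3 are prime
    if n % 2 == 0:
        return False
    rng = random.Random(seed)
    return all(passes_miller(n, rng.randrange(2, n - 1)) for _ in range(k))

print(passes_miller(341, 2), passes_miller(561, 2))
print(is_probable_prime(2 ** 61 - 1))
```

A prime input passes for every base, so a `True` answer for a prime is never wrong; the only possible error is a composite number that happens to pass all $k$ rounds.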
It is conceivable, although highly unlikely, that n is composite but that we
continually pick from among the (n − 1)/4 “bad” bases. Thus, we cannot guarantee
that the Miller–Rabin test will work in polynomial time. On the other hand, the
Adleman–Huang test is a random procedure that is guaranteed to find a proof of
primality for a prime input in polynomial time [1].
What we really want is a deterministic, polynomial-time algorithm that dis-
tinguishes primes and composites. Over the years there were some close calls, but
it was not until an electrifying announcement from India in 2002 that we had an
answer. Manindra Agrawal (1966– ) and his two undergraduate honors students
Neeraj Kayal (1979– ) and Nitin Saxena (1981– ) gave a fairly simple deterministic,
polynomial-time algorithm that distinguishes primes from composites. It involves
a generalization of Fermat’s little theorem to the ring of polynomials over a finite
field of prime order modulo an irreducible polynomial.
We follow the description of the AKS primality test (named for Agrawal, Kayal,
and Saxena) in [3], which also contains a number of worked examples. We first
require some preliminaries. Recall that
\[
(\mathbb{Z}/n\mathbb{Z})^{\times} = \{ k \in \{1, 2, \ldots, n-1\} : \gcd(k, n) = 1 \}
\]
3 If $\gcd(k, n) \neq 1$, then we already know that $n$ is composite.
4 If the generalized Riemann hypothesis is true, then for every composite integer $n$, there is a $b < 2(\log_2 n)^2$ for which $n$ fails Miller’s test for the base $b$.
Agrawal, Kayal, and Saxena were successful in de-randomizing the prime recog-
nition problem. Here is another problem for which there is a random polynomial-
time algorithm, yet for which we do not know if there is a deterministic polynomial-
time algorithm.
2002: Comments
Infinitude of base-2 pseudoprimes. To demonstrate that there are infinitely many pseudoprimes for the base 2, it suffices to show that for each odd, base-2 pseudoprime, there is a larger odd one. We start our construction with $n = 341$. Let $n$ be an odd pseudoprime for the base 2 and let
\[
M_n = 2^n - 1
\]
denote the $n$th Mersenne number, which is known to be composite (see the 1996 entry). Because $2^{n-1} \equiv 1 \pmod{n}$, we have
\[
M_n - 1 = 2^n - 2 = 2(2^{n-1} - 1) = 2dn
\]
for some $d$. Thus,
\begin{align*}
2^{(M_n - 1)/2} - 1 &= 2^{dn} - 1 \\
&= (2^n - 1)(2^{n(d-1)} + 2^{n(d-2)} + \cdots + 2^n + 1) \\
&= M_n (2^{n(d-1)} + 2^{n(d-2)} + \cdots + 2^n + 1) \\
&\equiv 0 \pmod{M_n}.
\end{align*}
Since $M_n > n$ is composite and
\[
2^{M_n - 1} \equiv (2^{(M_n - 1)/2})^2 \equiv 1^2 \equiv 1 \pmod{M_n},
\]
we conclude that $M_n$ is a pseudoprime for the base 2.
Although there are infinitely many pseudoprimes for the base 2, our method
does not provide an efficient method for producing them. Indeed, $M_{341} = 2^{341} - 1$ is
far larger than 561, the smallest pseudoprime for the base 2 after 341. The number
561 is also the first Carmichael number; see the 2010 entry.
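Both claims are easy to test numerically. The Python sketch below lists the small base-2 pseudoprimes by brute force and checks the Fermat congruence for $M_{341}$; the function names `is_prime` and `base2_pseudoprimes` are ours, and trial division is used only because the numbers involved in the search are tiny.

```python
def is_prime(n):
    """Primality by trial division; adequate for small n only."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def base2_pseudoprimes(limit):
    """Odd composite n <= limit with 2**(n-1) = 1 (mod n)."""
    return [n for n in range(3, limit + 1, 2)
            if pow(2, n - 1, n) == 1 and not is_prime(n)]

print(base2_pseudoprimes(1000))

M = 2 ** 341 - 1                      # the Mersenne number M_341
print(pow(2, M - 1, M) == 1)          # Fermat congruence for the base 2
```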
Carl Pomerance alerted us to a simpler proof of the infinitude of base-2 pseudoprimes. We claim that if $p \geq 5$ is prime, then $n = (4^p - 1)/3$ is a base-2 pseudoprime. First observe that $4^p \equiv 1 \pmod{3}$, so $n$ is indeed an integer. Moreover, $(2^p + 1)/3$ is an integer and hence $n = (2^p - 1)((2^p + 1)/3)$ is composite. Fermat’s theorem ensures that
\[
n \equiv (2^p - 1)(2^p + 1)3^{-1} \equiv (2 - 1)(2 + 1)3^{-1} \equiv 1 \pmod{p}.
\]
Since $n - 1$ is even, we have $(n-1)/2 \equiv 0 \pmod{p}$ and hence $2^{n-1} - 1 = 4^{(n-1)/2} - 1$ is divisible by $4^p - 1 = 3n$. Thus, $n$ is a base-2 pseudoprime.
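The construction is explicit enough to try out. The following Python sketch (the name `pomerance_number` is our own label) builds $n$ for the first few primes $p \geq 5$, confirms the factorization that shows $n$ is composite, and checks the Fermat congruence for the base 2.

```python
def pomerance_number(p):
    """n = (4**p - 1) / 3 for a prime p >= 5."""
    return (4 ** p - 1) // 3

for p in (5, 7, 11, 13):
    n = pomerance_number(p)
    factors_match = n == (2 ** p - 1) * ((2 ** p + 1) // 3)   # explicit factorization
    fermat_holds = pow(2, n - 1, n) == 1                      # 2**(n-1) = 1 (mod n)
    print(p, n, factors_match, fermat_holds)
```

For $p = 5$ the construction returns $n = 341$, the starting point of the first proof.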
Bibliography
[1] L. M. Adleman and M.-D. A. Huang, Primality testing and abelian varieties over finite fields,
Lecture Notes in Mathematics, vol. 1512, Springer-Verlag, Berlin, 1992. MR1176511
[2] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2,
781–793, DOI 10.4007/annals.2004.160.781. http://www.cse.iitk.ac.in/users/manindra/
algebra/primality_v6.pdf. MR2123939
[3] M. Cozzens and S. J. Miller, The mathematics of encryption: An elementary introduc-
tion, Mathematical World, vol. 29, American Mathematical Society, Providence, RI, 2013.
MR3098499
[4] R. Crandall and C. Pomerance, Prime numbers: A computational perspective, 2nd ed.,
Springer, New York, 2005. MR2156291
[5] A. Granville, It is easy to determine whether a given integer is prime, Bull. Amer. Math.
Soc. (N.S.) 42 (2005), no. 1, 3–38, DOI 10.1090/S0273-0979-04-01037-7. http://www.dms.
umontreal.ca/~andrew/PDF/Bulletin04.pdf. MR2115065
[6] C. Pomerance, Primality testing: variations on a theme of Lucas, Congr. Numer. 201 (2010),
301–312. https://math.dartmouth.edu/~carlp/PDF/lucastalk.pdf. MR2598366
2003
Poincaré Conjecture
Introduction
In 2003, Grigori Perelman, building upon seminal work of Richard S. Hamilton
(1943– ), proved the Poincaré conjecture, one of the million-dollar Clay Millennium
Problems (see the comments for the 2000 entry). The conjecture asserts that every
smooth, compact, simply connected, closed 3-manifold is homeomorphic to the 3-sphere
\[
\{ (x, y, z, w) \in \mathbb{R}^4 : x^2 + y^2 + z^2 + w^2 = 1 \}.
\]
Two manifolds are homeomorphic if there is a continuous bijection between them
that has a continuous inverse. For example, a circle and the trefoil knot (see the 1985
entry) are homeomorphic 1-manifolds, even though they cannot be continuously
deformed into each other when embedded in $\mathbb{R}^3$. On the other hand, the 2-sphere
and the Euclidean plane are not homeomorphic: one is compact and the other is
not.1
A particularly down-to-earth explanation of the main difficulty behind the
Poincaré conjecture was recalled by Gerry Myerson in 2012 [3]:
I once heard an expert “explain” the difficulty of the n = 3 case to a
general audience by saying something like this: when n ≤ 2, there isn’t
enough room for anything to go wrong, while for n ≥ 4, there’s enough
room to fix anything that goes wrong; for n = 3, there’s enough room
for something to go wrong, and. . . it’s not clear whether there’s enough
room to fix things when they go wrong.
The cases n = 1 and n = 2 are classical and date back to the foundations of
algebraic topology. Stephen Smale proved the conjecture for n ≥ 5 in 1961 and
Michael Freedman (1951– ) proved it for n = 4 in 1982. Since both of them
received Fields Medals for their work, one can claim that the Poincaré conjecture
resulted in either two or three medals, depending upon how one accounts for the
enigmatic Perelman (see the comments below).
Although the resolution of the Poincaré conjecture is hopelessly beyond the
level of this book and the expertise of its authors, we can discuss its analogue for 2-
manifolds. By a surface, we mean a smooth, connected, two-dimensional manifold.
Think of this as a nice topological space that locally resembles $\mathbb{R}^2$ and does not
consist of multiple disjoint pieces. For example, a microscopic observer on a torus or
Klein bottle will believe their local environment is flat and two dimensional, much
as we perceive the ground around us as flat. A surface is closed if it is compact
1 However, the 2-sphere with a point removed is homeomorphic to the plane via stereographic
projection.
and has no boundary. For example, the sphere is closed, but the Möbius strip is
not since its boundary is a circle (see the 1958 entry). A closed surface can be
diagrammed using a fundamental polygon, an even-sided polygon with some of its
edges identified; see Figure 1 and the comments for the 1958 entry.
A surface is simply connected if every loop on the surface can be contracted to
a point without leaving the surface. For example, the sphere is simply connected
and the torus is not; see Figure 2. The analogue of the Poincaré conjecture for
2-manifolds asserts that every simply connected, closed surface is homeomorphic
to the sphere. This is a consequence of the classification of surfaces from alge-
braic topology, which says that every closed surface is homeomorphic to one of the
following:
(a) the sphere,
(b) the connected sum of tori, or
(c) the connected sum of real projective planes;
see the comments below for information about the connected sum of manifolds.
Every surface in the first two classes is orientable; every surface in the third class
is nonorientable. Of these, the only simply connected surface is the sphere; this
implies the Poincaré conjecture for 2-manifolds.
Figure 2. (a) Every path on the sphere is contractible to a point. Thus, the sphere is simply connected. (b) Neither of these two paths on the torus is contractible to a point.
The problem for this year was originally posed by Frank Morgan of Williams
College and it concerned 4-manifolds. However, he felt that the statement was too
imprecise to be included here. Instead, we present a simple combinatorial problem
with a visual twist that builds upon the comments to the 1980 entry. See below for
the solution.
2003: Comments
Perelman’s Fields Medal. Contrary to popular belief, Perelman did not
receive the prestigious Fields Medal for his resolution of the Poincaré conjecture.
He declined the award and did not even attend the award ceremony:
In May 2006, a committee of nine mathematicians voted to award
Perelman a Fields Medal for his work on the Poincaré conjecture. How-
ever, Perelman declined to accept the prize. Sir John Ball, president of
the International Mathematical Union, approached Perelman in Saint
Petersburg in June 2006 to persuade him to accept the prize. After 10
hours of attempted persuasion over two days, Ball gave up. Two weeks
later, Perelman summed up the conversation as follows: “He proposed
to me three alternatives: accept and come; accept and don’t come,
and we will send you the medal later; third, I don’t accept the prize.
From the very beginning, I told him I have chosen the third one. . . [the
prize] was completely irrelevant for me. Everybody understood that if
the proof is correct, then no other recognition is needed.” [9]
In 2010, Perelman also declined the million-dollar prize offered by the Clay foun-
dation (see the comments for the 2000 entry).
Bibliography
[1] R. Honsberger, Mathematical Gems I, Mathematical Association of America, 1974.
[2] J. Milnor, Poincare Conjecture, http://www.claymath.org/millennium-problems/
poincare-conjecture.
[3] G. Myerson, Poincare conjecture for n = 2 (answer), https://math.stackexchange.com/
questions/103182/poincare-conjecture-for-n-2.
[4] S. Nasar and D. Gruber, Manifold Destiny: A legendary problem and the battle over who
solved it, The New Yorker, https://www.newyorker.com/magazine/2006/08/28/manifold-
destiny.
[5] G. Perelman, The entropy formula for the Ricci flow and its geometric applications, https://
arxiv.org/abs/math/0211159.
[6] G. Perelman, Ricci flow with surgery on three-manifolds, https://arxiv.org/abs/math/
0303109.
[7] G. Perelman, Finite extinction time for the solutions to the Ricci flow on certain three-
manifolds, https://arxiv.org/abs/math/0307245.
[8] T. Tao, Perelman’s proof of the Poincaré conjecture: a nonlinear PDE perspective, https://
arxiv.org/abs/math/0610903.
[9] Wikipedia, Grigori Perelman, https://en.wikipedia.org/wiki/Grigori_Perelman.
[10] Wikipedia, Surface (topology), https://en.wikipedia.org/wiki/Surface_(topology).
2004
Primes in Arithmetic Progression
Introduction
2004 is another year that witnessed the announcement of two major results,
each of which is worthy of a whole entry in its own right. One was the culmination
of decades of work by dozens of mathematicians: the classification of finite simple
groups. The other is the celebrated Green–Tao theorem, which guarantees the
existence of arbitrarily long arithmetic progressions in the primes [8, 17]. Alas, we
can choose only one to focus on. However, we do have a few words to say about
finite simple groups; see the comments below.
What does the Green–Tao theorem say? It asserts that for any length $\ell$, there is an initial prime $p$ and a common difference $k$ so that the length-$\ell$ arithmetic progression $p, p + k, p + 2k, \ldots, p + (\ell - 1)k$ consists entirely of primes. Ben Green and
Terence Tao proved this amazing result using a “relative” version of Szemerédi’s
theorem (see the 1975 entry). Szemerédi’s theorem tells us that a subset of the
natural numbers with positive upper density contains arbitrarily long arithmetic
progressions. Unfortunately, the prime numbers have density zero and hence Sze-
merédi’s theorem does not immediately apply. Green and Tao proved a version of
Szemerédi’s theorem that applies to sets of natural numbers that are pseudoran-
dom in a certain technical sense. The final step of their proof is the construction
of a pseudorandom subset of the natural numbers that contains the primes as a
relatively dense subset. A recent overview of the theorem and its proof is [3].
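For small lengths, such progressions can be found by direct search. The Python sketch below is a naive illustration only (it has nothing to do with the proof): the function names `prime_flags` and `find_prime_ap` and the search order, smallest initial prime first and then smallest common difference, are our own choices.

```python
import math

def prime_flags(limit):
    """Sieve of Eratosthenes: flags[m] is True exactly when m is prime."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for d in range(2, math.isqrt(limit) + 1):
        if flags[d]:
            for m in range(d * d, limit + 1, d):
                flags[m] = False
    return flags

def find_prime_ap(length, limit):
    """First arithmetic progression of `length` primes with all terms <= limit, or None."""
    flags = prime_flags(limit)
    for p in range(2, limit + 1):
        if not flags[p]:
            continue
        k = 1
        while p + (length - 1) * k <= limit:
            if all(flags[p + j * k] for j in range(length)):
                return [p + j * k for j in range(length)]
            k += 1
    return None

print(find_prime_ap(5, 100))
print(find_prime_ap(6, 1000))
```

The running time grows quickly with the length, which is one reason why record-setting progressions require far more sophisticated searches.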
Can the Green–Tao theorem be used to find arithmetic progressions in the
primes? Yes and no. The proof provides numerical bounds that guarantee the
existence of such an arithmetic progression in a certain range. However, the num-
bers produced are so astronomically large that they are well beyond the limit of
modern computation. As of mid-2018, the longest known arithmetic progression in
the primes has length twenty-six. The first such example,
For example, 2 is not a Gaussian prime since $2 = (1 + i)(1 - i)$. One can show that a Gaussian integer is prime if and only if it is of the form $\pm p$ or $\pm pi$, in which $p \equiv 3 \pmod{4}$ is prime in the usual sense, or if it is of the form $a + bi$, in which $a^2 + b^2$ is prime in the usual sense; see Figure 1. In 2005, Terence Tao showed that given any distinct $v_0, v_1, \ldots, v_{k-1} \in \mathbb{Z}[i]$, there are infinitely many sets $\{a + rv_0, a + rv_1, \ldots, a + rv_{k-1}\}$, in which $a \in \mathbb{Z}[i]$ and $r \in \mathbb{Z} \setminus \{0\}$, all of whose elements are Gaussian primes.
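The characterization of Gaussian primes translates directly into a test. In the Python sketch below, a Gaussian integer $a + bi$ is passed as the pair `(a, b)`; the function names `is_prime` and `is_gaussian_prime` are ours, and trial division is used for simplicity.

```python
def is_prime(n):
    """Primality of an ordinary integer by trial division."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_gaussian_prime(a, b):
    """Apply the criterion above to the Gaussian integer a + bi."""
    if a == 0 or b == 0:
        p = abs(a + b)                      # the number is +-p or +-pi
        return is_prime(p) and p % 4 == 3
    return is_prime(a * a + b * b)          # otherwise the norm must be prime

print([(a, b) for a in range(4) for b in range(4) if is_gaussian_prime(a, b)])
```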
The Green–Tao theorem, along with many other famous theorems and difficult
conjectures, follows from the Bateman–Horn conjecture. See the comments for the
2005 entry for more information about this tantalizing conjecture.
2004: Comments
Four squares in arithmetic progression. The Green–Tao theorem ad-
dresses primes in arithmetic progressions. What about perfect squares? The
comments to the 1913 entry show how to construct three squares in arithmetic
progression. Mathematical folklore credits Fermat with the proof that there does
not exist an arithmetic progression of four distinct perfect squares [6], although the observation is attributed to Leonhard Euler in 1780 [4]. A proof using Fermat’s method of descent can be found in [16]. The more modern approach to the problem involves elliptic curves. The crux of the matter is that the rational quadruples $(a, b, c, d)$ so that $a^2, b^2, c^2, d^2$ form an arithmetic progression can be parametrized by the
rational points on the elliptic curve
One can show that the curve has only eight rational points, all of which give rise to trivial solutions to the original problem. Consequently, no four distinct rational perfect squares form an arithmetic progression. The details can be found in [4].
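A small search illustrates the contrast between three and four squares. The Python sketch below (with the hypothetical function name `squares_in_ap`) looks for progressions of distinct integer squares whose first two terms are $a^2 < b^2$ with $b$ at most a given bound. A finite search proves nothing, of course; it merely agrees with the theorem.

```python
import math

def squares_in_ap(length, bound):
    """Progressions of `length` distinct squares starting a*a < b*b with b <= bound."""
    found = []
    for a in range(1, bound + 1):
        for b in range(a + 1, bound + 1):
            diff = b * b - a * a
            terms = [a * a, b * b]
            while len(terms) < length:
                candidate = terms[-1] + diff
                root = math.isqrt(candidate)
                if root * root != candidate:   # next term is not a perfect square
                    break
                terms.append(candidate)
            if len(terms) == length:
                found.append(terms)
    return found

print(squares_in_ap(3, 10))
print(squares_in_ap(4, 150))
```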
$a_1 a_2 a_3 a_4 + 1 = 1{,}807 = 13 \cdot 139$;
Does this sequence contain every prime? Without a major breakthrough in our understanding of prime numbers, this question will likely remain unanswered for many years to come.
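The first few terms can be generated with a few lines of Python. The sketch assumes the definition recorded in the OEIS entry cited as [10]: $a_1 = 2$ and $a_{n+1}$ is the smallest prime factor of $a_1 a_2 \cdots a_n + 1$. The function names are ours, and trial division limits the sketch to the first handful of terms.

```python
def smallest_prime_factor(n):
    """Smallest prime factor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n                      # n itself is prime

def euclid_mullin(count):
    """First `count` terms: each term is the least prime factor of (product so far) + 1."""
    terms, product = [], 1
    for _ in range(count):
        term = smallest_prime_factor(product + 1)
        terms.append(term)
        product *= term
    return terms

print(euclid_mullin(8))
```

The fifth term is 13, in agreement with the factorization $1{,}807 = 13 \cdot 139$. Beyond the eighth term the numbers to be factored grow too quickly for trial division.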
Classification of finite simple groups. The year 2004 witnessed the com-
pletion of the classification of finite simple groups, a decades-long quest. A group is
simple if it contains no normal subgroups other than itself and the trivial subgroup
(see the 1992 entry for more background). Consequently, a simple group cannot
be decomposed further using the quotient group construction. The finite simple
groups are the “atoms” from which more complicated finite groups, “molecules” if
you will, can be constructed. In contrast to atoms, which come in only a hundred
or so varieties, there are infinitely many finite simple groups.
514 2004. PRIMES IN ARITHMETIC PROGRESSION
1 There is another group, named after Jacques Tits (1930– ), that is occasionally regarded as
the “27th sporadic group.” However, it is usually considered an unusual group of Lie type.
over 1,000 pages. A massive effort to compile a complete and largely self-contained
proof of the classification theorem is well underway:
In 1981 the monumental project to classify all of the finite simple
groups appeared to be nearing its conclusion. Danny Gorenstein had
dubbed the project the “Thirty Years’ War” dating its inception from
an address by Richard Brauer at the International Congress of Math-
ematicians in 1954. He and Richard Lyons agreed that it would be
desirable to write a series of volumes that would contain the complete
proof of this Classification Theorem, modulo a short and clearly spec-
ified list of background results. As the existing proof was scattered
over hundreds of journal articles, some of which cited other articles
that were never published, there was a consensus that this was indeed
a worthwhile project. [12]
The project is expected to be completed in 2023. Perhaps one day soon the entire
proof will be verified by computer.
Solution to the problem. Although we could use the prime number theorem
to solve the problem, a weaker result due to Chebyshev suffices. He proved that
there are constants A ≈ 0.9212 and B ≈ 1.1055 so that
\[
\frac{Ax}{\log x} \leq \pi(x) \leq \frac{Bx}{\log x}
\]
for sufficiently large x, in which π(x) denotes the prime-counting function. Suppose
that x is even and large enough for Chebyshev’s estimate to hold. Then the number
of distinct pairs of primes (p, q) with 2 < p < q ≤ x is
\[
\binom{\pi(x) - 1}{2} = \frac{(\pi(x) - 1)(\pi(x) - 2)}{2} > \frac{\pi(x)^2}{3} > \frac{A^2 x^2}{3 \log^2 x}.
\]
Since the number of possible even differences between primes at most x is bounded
above by x/2, the average number of occurrences of each difference is
\[
\frac{\binom{\pi(x) - 1}{2}}{x/2} \geq \frac{A^2 x^2 / (3 \log^2 x)}{x/2} = \frac{2A^2 x}{3 \log^2 x}, \tag{2004.1}
\]
which tends to infinity. At least one of these differences occurs at least the average
number of times. Given N , let x be an even number that is large enough to ensure
that Chebyshev’s estimates are valid and that the right-hand side of (2004.1) is
larger than $N$. Then there is a common difference $2m$ for which at least $N$ pairs of primes $(p, q)$ with $q - p = 2m$ exist.
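The pigeonhole argument can be watched in action with a short Python sketch. The function names `odd_primes_up_to` and `difference_counts` are our own; the code tallies the differences $q - p$ over all pairs of odd primes $p < q \leq x$ and compares the most frequent difference with the average from the argument above.

```python
import math
from collections import Counter
from itertools import combinations

def odd_primes_up_to(x):
    """All primes p with 2 < p <= x, via the sieve of Eratosthenes."""
    flags = [True] * (x + 1)
    flags[0:2] = [False, False]
    for d in range(2, math.isqrt(x) + 1):
        if flags[d]:
            for m in range(d * d, x + 1, d):
                flags[m] = False
    return [p for p in range(3, x + 1) if flags[p]]

def difference_counts(x):
    """Counter mapping each even difference q - p to its number of prime pairs."""
    return Counter(q - p for p, q in combinations(odd_primes_up_to(x), 2))

x = 1000
counts = difference_counts(x)
difference, occurrences = counts.most_common(1)[0]
average = sum(counts.values()) / (x / 2)
print(difference, occurrences, average)
```

The most frequent difference always occurs at least as often as the average, and the average grows without bound as $x$ increases.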
Bibliography
[1] M. Aschbacher and S. D. Smith, The classification of quasithin groups. I, Structure of strongly
quasithin K-groups, Mathematical Surveys and Monographs, vol. 111, American Mathemat-
ical Society, Providence, RI, 2004. MR2097623
[2] M. Aschbacher and S. D. Smith, The classification of quasithin groups. II, Main theorems:
the classification of simple QTKE-groups, Mathematical Surveys and Monographs, vol. 112,
American Mathematical Society, Providence, RI, 2004. MR2097624
[3] D. Conlon, J. Fox, and Y. Zhao, The Green-Tao theorem: an exposition, EMS Surv. Math.
Sci. 1 (2014), no. 2, 249–282, DOI 10.4171/EMSS/6. https://arxiv.org/abs/1403.2957.
MR3285854
[4] K. Conrad, Arithmetic progressions of four squares, http://www.math.uconn.edu/~kconrad/
blurbs/ugradnumthy/4squarearithprog.pdf.
[5] R. Crandall and C. Pomerance, Prime numbers: A computational perspective, Springer-Verlag,
New York, 2001. MR1821158
[6] L. E. Dickson, History of the theory of numbers. Vol. II, Diophantine analysis, reprinted by
AMS, 1992.
[7] J. Fox and Y. Zhao, A short proof of the multidimensional Szemerédi theorem in the primes,
Amer. J. Math. 137 (2015), no. 4, 1139–1145, DOI 10.1353/ajm.2015.0028. MR3372317
[8] B. Green and T. Tao, The primes contain arbitrarily long arithmetic progressions, Ann. of
Math. (2) 167 (2008), no. 2, 481–547, DOI 10.4007/annals.2008.167.481. http://arxiv.org/
abs/math.NT/0404188. MR2415379
[9] B. Green and T. Tao, Linear equations in primes, Ann. of Math. (2) 171 (2010), no. 3,
1753–1850, DOI 10.4007/annals.2010.171.1753. MR2680398
[10] On-Line Encyclopedia of Integer Sequences, A000945 (Euclid-Mullin sequence: a(1) = 2, a(n + 1) is smallest prime factor of $1 + \prod_{k=1}^{n} a(k)$), https://oeis.org/A000945.
[11] On-Line Encyclopedia of Integer Sequences, A204189 (Benoît Perichon’s 26 primes in arithmetic progression), https://oeis.org/A204189.
[12] R. Solomon, The classification of finite simple groups: a progress report, Notices Amer.
Math. Soc. 65 (2018), no. 6, 646–651. https://www.ams.org/journals/notices/201806/
rnoti-p646.pdf. MR3792856
[13] T. Tao, The Gaussian primes contain arbitrarily shaped constellations, J. Anal. Math.
99 (2006), 109–176, DOI 10.1007/BF02789444. https://arxiv.org/abs/math/0501314.
MR2279549
[14] T. Tao and T. Ziegler, The primes contain arbitrarily long polynomial progressions, Acta
Math. 201 (2008), no. 2, 213–305, DOI 10.1007/s11511-008-0032-5. MR2461509
[15] T. Tao and T. Ziegler, A multi-dimensional Szemerédi theorem for the primes via a corre-
spondence principle, Israel J. Math. 207 (2015), no. 1, 203–228, DOI 10.1007/s11856-015-
1157-9. MR3358045
[16] A. van der Poorten, Fermat’s Four Squares Theorem, https://arxiv.org/abs/0712.3850v1.
[17] Wikipedia, Green–Tao theorem, https://en.wikipedia.org/wiki/Green-Tao_theorem.
[18] Wikipedia, Primes in arithmetic progression, https://en.wikipedia.org/wiki/
Primes_in_arithmetic_progression.
2005
William Stein Developed Sage
Introduction
Much mathematical software, such as Mathematica (see the 1988 entry) and Maple, is closed source. This means that the actual nuts and bolts of the
algorithms and implementations are hidden from the user. For example, the Math-
ematica command Fibonacci[n] almost instantly returns the nth Fibonacci num-
ber. But what is going on under the hood? Is the program using the definition of
the Fibonacci numbers? Probably not; that would be painfully slow. Is it using
something along the lines of Binet’s formula (see the comments for the 2001 entry)?
Possibly. Perhaps Mathematica uses something altogether different and much more
clever. We simply do not know because the source code is not publicly available.
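To make the contrast concrete, here is a Python sketch of two open implementations. Neither is claimed to be what Mathematica does. The first, `fib_naive`, follows the recursive definition literally; the second, `fib_fast`, uses the doubling identities $F_{2k} = F_k(2F_{k+1} - F_k)$ and $F_{2k+1} = F_k^2 + F_{k+1}^2$, which are standard facts. Both function names are our own.

```python
def fib_naive(n):
    """Literal use of the definition; the number of calls grows exponentially in n."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_pair(n):
    """Return (F_n, F_{n+1}) using the doubling identities; about log2(n) steps."""
    if n == 0:
        return (0, 1)
    a, b = fib_pair(n >> 1)           # a = F_k, b = F_{k+1} with k = n // 2
    c = a * (2 * b - a)               # F_{2k}
    d = a * a + b * b                 # F_{2k+1}
    return (d, c + d) if n & 1 else (c, d)

def fib_fast(n):
    return fib_pair(n)[0]

print(fib_naive(20), fib_fast(20))
print(len(str(fib_fast(100000))))     # digit count of F_100000
```

With the source in hand, a reader can check each identity and each line; this is precisely the kind of verification that closed-source software does not permit.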
Without publicly available source code, it is difficult for a researcher to verify
that a program does exactly what it claims. Are the results accurate? Are the
algorithms correctly implemented? With closed-source programs, one must simply
trust that the programmers knew what they were doing and got things right.
In early 2005, William A. Stein (1974– ) released Sage (Software for Algebra
and Geometry Experimentation) in response to these issues; see Figure 1. Although
it is now called SageMath, the goal remains the same [4]:
The goal of the Sage project is to create a viable open source alter-
native to Magma, Maple, Mathematica, and MATLAB, which are all
closed source. This means that people have choice—they at least have
the option to use open source software for their math research and
teaching in all the academic areas represented by those software. Pro-
viding such a choice entails both implementing all relevant algorithms
in Sage (with competitive efficiency and correctness), and creating cor-
responding textbooks and documentation.
520 2005. WILLIAM STEIN DEVELOPED SAGE
SageMath features a web-based interface that lets the user harness the power of
dozens of open-source packages and perform computations across the spectrum
of pure and applied mathematics; see Figure 2. Computations can be performed
locally or remotely on a SageMath server.
Although SageMath is used by many mathematicians around the world, Stein
faced enormous difficulty obtaining funding. Unlike commercially available soft-
ware, SageMath does not bring in revenue and, in fact, it did not have a single
full-time developer until 2016 [11]. Most of the software development was car-
ried out by volunteers, mostly students and working mathematicians or computer
scientists. In a 2018 interview, Stein said [4]:
My perspective with Sage has always been to try to make a tool that
people could use to compute mathematical objects more easily, with
minimal friction. They should not have to pay a lot of money, they
should have full access to readable source code, and have many good
code examples that definitely work.
Although Stein has stepped back a bit from development work on SageMath (he
is now the CEO of SageMath, Inc. and focuses mostly on its cloud-computing
platform, CoCalc), progress continues unabated [4]:
Sage development proceeds at a steady pace, with many Sage Days
workshops in both the US and Europe; for example IMA [Institute for
Mathematics and its Applications] in Minnesota is sponsoring many
workshops this year and OpenDreamKit in Europe too! Most work on
Sage is motivated by the needs of research mathematicians for their
own work. Releases keep happening, and around 100 people contribute
to each release.
2005: Comments
The Bateman–Horn conjecture. On the theme of numerical computation
and hot on the heels of last year’s entry (the Green–Tao theorem), we embark upon
one of the final running threads in this book: the Bateman–Horn conjecture. Like
the Riemann hypothesis (see the 1942 and 1987 entries) and the abc-conjecture (see
the 1981 entry), the Bateman–Horn conjecture has many far-reaching consequences
and remains unproven. The material below, and much more, can be found in the
recent expository article [1].
The conjecture stems from a 1962 summer undergraduate research project at
the University of Illinois at Urbana-Champaign. Paul T. Bateman (1919–2012), an
analytic number theorist who joined the university in 1950, sponsored the project
and employed a promising young student, Roger A. Horn (1942– ). In 1963, they
used the ILLIAC (Illinois Automatic Computer), the first computer built and owned
by a US-based academic institution, to run some computations concerning the
distribution of prime numbers.
Needless to say, they did not use Sage, Mathematica, or any other software that
the modern user might recognize. The programs were entered on paper tape and
fed into the machine by a dedicated operator via a noisy mechanism. An attached
printer could produce output at the modest rate of ten characters per second.
Among other computations, Bateman and Horn found the 776 primes $p \leq 113{,}000$ for which $p^2 + p + 1$ is also prime. This computation, which took 400 minutes on the state-of-the-art ILLIAC, was performed on the first named author’s late-2013 iMac in a tenth of a second. How times have changed! These sorts of computations, along with previous conjectures of Bunyakovsky (1854), Dickson (1904), Landau (1912), Hardy and Littlewood (1923), and Schinzel (1958), pointed toward a grand conjecture about the asymptotic distribution of primes generated
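The computation of Bateman and Horn is easy to repeat today. The Python sketch below is one possible approach and not a reconstruction of their program: it sieves the primes up to the bound and tests $p^2 + p + 1$ by trial division against that list, which suffices because $\sqrt{p^2 + p + 1} < p + 1$. The function names are ours.

```python
import math

def primes_up_to(limit):
    """All primes <= limit, via the sieve of Eratosthenes."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]
    for d in range(2, math.isqrt(limit) + 1):
        if flags[d]:
            for m in range(d * d, limit + 1, d):
                flags[m] = False
    return [p for p in range(2, limit + 1) if flags[p]]

def is_prime_by_list(m, primes):
    """Trial division of m >= 2 by a list of primes that reaches at least sqrt(m)."""
    for q in primes:
        if q * q > m:
            return True
        if m % q == 0:
            return False
    return True

def primes_with_prime_value(bound):
    """Primes p <= bound for which p*p + p + 1 is also prime."""
    primes = primes_up_to(bound)      # covers sqrt(p*p + p + 1) for every p <= bound
    return [p for p in primes if is_prime_by_list(p * p + p + 1, primes)]

print(primes_with_prime_value(100))
print(len(primes_with_prime_value(113000)))   # compare with the count reported above
```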
Suppose that $f = f_1 f_2 \cdots f_k$ does not vanish identically modulo any prime. Then
\[
Q(f_1, f_2, \ldots, f_k; x) \sim \frac{C(f_1, f_2, \ldots, f_k)}{\prod_{i=1}^{k} \deg f_i} \int_2^x \frac{dt}{(\log t)^k}, \tag{2005.2}
\]
in which
\[
C(f_1, f_2, \ldots, f_k) = \prod_{p} \left(1 - \frac{1}{p}\right)^{-k} \left(1 - \frac{\omega_f(p)}{p}\right). \tag{2005.3}
\]
x2 − 1 = (x − 1)(x + 1)
We expect that higher-degree polynomials assume prime values less frequently over
a given range. This tendency manifests itself in the denominator \prod_{i=1}^{k} \deg f_i of
(2005.2). The integral in (2005.2) is reminiscent of the logarithmic integral that
we encountered in our study of the prime number theorem (see the 1933 entry).
The power of the logarithm reflects the fact that additional polynomials drive down
the frequency of arguments for which the polynomials simultaneously attain prime
values. Finally, the Bateman–Horn constant (2005.3) that appears in (2005.2) is a
correction factor that takes into account information about how the f1 , f2 , . . . , fk
behave modulo each prime. The fact that the infinite product (2005.3) converges
is not at all obvious. The proof is quite delicate and involves elements of both
algebraic and analytic number theory; see [1, Sect. 5] for the details.
1 See [1, Sect. 3] for a detailed heuristic derivation of the Bateman–Horn conjecture, based
upon the Cramér random model of the primes (see the comments for the 1987 entry).
Before moving on, we should say something about Roger A. Horn, a collaborator
of the first named author on a recent linear algebra textbook [5]. The following
passage is from [1, Sect. 5]:
Horn is known best for his long and storied career in matrix analy-
sis. Among his chief publications are the classic texts Matrix Analysis
[7] and Topics in Matrix Analysis [8], both coauthored with Charles
Johnson. Of his many papers, only two are on number theory; both of
these date from the early 1960s and concern the Bateman–Horn con-
jecture [2, 3]. Consequently, many of his close colleagues are unaware
of his connection to a famous conjecture in number theory.
Indeed, the first named author only became aware of Horn’s involvement in the
conjecture because of his recent work on primitive roots for twin primes [6].
Bibliography
[1] S. L. Aletheia-Zomlefer, L. Fukshansky, and S. R. Garcia, The Bateman–Horn Conjecture:
Heuristics, History, and Applications, to appear in Expositiones Mathematicae, https://
arxiv.org/abs/1807.08899.
[2] P. T. Bateman and R. A. Horn, A heuristic asymptotic formula concerning the distribution
of prime numbers, Math. Comp. 16 (1962), 363–367, DOI 10.2307/2004056. MR0148632
[3] P. T. Bateman and R. A. Horn, Primes represented by irreducible polynomials in one variable,
Proc. Sympos. Pure Math., Vol. VIII, Amer. Math. Soc., Providence, R.I., 1965, pp. 119–132.
MR0176966
[4] A. Diaz-Lopez, William Stein interview, Notices Amer. Math. Soc. 65 (2018), no. 5, 540–543.
MR3753815
[5] S. R. Garcia and R. A. Horn, A Second Course in Linear Algebra, Cambridge Mathematical
Textbooks, Cambridge University Press, 2017.
[6] S. R. Garcia, E. Kahoro, and F. Luca, Primitive root biases for twin primes, Experimental
Mathematics (in press), https://arxiv.org/abs/1705.02485.
[7] R. A. Horn and C. R. Johnson, Matrix analysis, 2nd ed., Cambridge University Press, Cam-
bridge, 2013. MR2978290
[8] R. A. Horn and C. R. Johnson, Topics in matrix analysis, corrected reprint of the 1991
original, Cambridge University Press, Cambridge, 1994. MR1288752
[9] Sage, http://www.sagemath.org/.
[10] W. Stein, Mathematical software and me: a very personal recollection, http://sagemath.
blogspot.com/2009/12/mathematical-software-and-me-very.html.
[11] Wikipedia, SageMath, https://en.wikipedia.org/wiki/SageMath.
2006
Introduction
Let G be a graph. The chromatic number χ(G) of G is the smallest number
of colors needed to paint the vertices of G so that no pair of adjacent vertices have
the same color. The clique number ω(G) of G is the size of the largest induced
complete subgraph in G, that is, the size of the largest subset of vertices of G, all
of which are connected to each other. Since a complete graph Kn on n vertices
satisfies χ(Kn ) = n, it follows that χ(G) ≥ ω(G) for any graph; see Figure 1. In
principle, finding either quantity is computationally intractable since both problems
are NP-hard. Nevertheless, many algorithms exist that can find χ(G) or ω(G) for
graphs of reasonable size or from certain special families.
A graph G is perfect if every induced subgraph H has χ(H) = ω(H). For
example, every bipartite graph is perfect, as is every forest (disjoint union of trees).
In 1961, Claude Berge (1926–2002) proposed a deep conjecture: a graph is perfect
if and only if neither it nor its complement has an induced subgraph that is a cycle
of odd length five or greater [2]. The conjecture implies that perfect graphs are
closed under complementation. This weaker result (the perfect graph theorem) was
proved by László Lovász (1948– ) in 1972 via an elegant polyhedral argument [8, 9].
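For very small graphs, both quantities can be found by exhaustive search. The sketch below is a brute-force illustration only (exponential time, with hypothetical helper names); it confirms that the 5-cycle has χ = 3 and ω = 2, so it is not perfect, whereas a 4-cycle and the complete graph K_4 have χ = ω.

```python
from itertools import combinations, product


def clique_number(n, edges):
    """Size of the largest set of pairwise adjacent vertices among 0..n-1."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(frozenset(pair) in adjacent for pair in combinations(subset, 2)):
                return size
    return 0


def chromatic_number(n, edges):
    """Smallest k such that the vertices 0..n-1 admit a proper k-coloring."""
    for k in range(1, n + 1):
        for coloring in product(range(k), repeat=n):
            if all(coloring[u] != coloring[v] for u, v in edges):
                return k
    return 0


def cycle(n):
    """Edge list of the cycle on n vertices."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n):
    """Edge list of the complete graph on n vertices."""
    return list(combinations(range(n), 2))
```

The odd cycle of length five is the smallest graph with χ(G) > ω(G), which is why odd cycles of length five or greater play the central role in Berge's conjecture.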
526 2006. THE STRONG PERFECT GRAPH THEOREM
The full proof of Berge’s conjecture, now called the strong perfect graph theorem,
was announced in 2002 by Maria Chudnovsky (1977– ), Neil Robertson
(1938– ), Paul Seymour (1950– ), and Robin Thomas, just one month before
Berge passed away; it was published in 2006 [3]. The foursome was awarded the 2009 Delbert Ray Fulkerson
Prize for outstanding work in discrete mathematics [6]:
Claude Berge introduced the class of perfect graphs in 1960, together
with a possible characterization in terms of forbidden subgraphs. The
resolution of Berge’s strong perfect graph conjecture quickly became
one of the most sought-after goals in graph theory. . . . The long, dif-
ficult, and creative proof by Chudnovsky and her colleagues is one of
the great achievements in discrete mathematics.
2006: Comments
Bonus problem. The problem proposed for 2006 was solved in 2017 by Maria
Chudnovsky, Alex Scott, Paul Seymour, and Sophie Spirkl [4]. So here is a
“bonus problem,” posed by Steven J. Miller. Let G be the graph with vertices
Figure 3. The Moser spindle and the Golomb graph are unit-
distance graphs in the plane with chromatic number 4. Their exis-
tence implies that the answer to the Hadwiger–Nelson problem is
at least four.
more careful analysis confirms that ω(G) = 5,000. Since the chromatic and clique
numbers of G are the same, what can you deduce about its induced subgraphs?
in which C_2 ≈ 0.660161815 is the twin primes constant (see (1919.4) in the 1919
entry). The Bateman–Horn conjecture (2005.2) predicts that the number of n ≤ x
such that n and n + 2 are prime is
\[ Q(f_1, f_2; x) \sim 2C_2 \int_2^x \frac{dt}{(\log t)^2} \sim \frac{2C_2\, x}{(\log x)^2}. \]
1 We admit that the connection with 2006 is tentative: the survey article [11] on the Goldston–
Pintz–Yıldırım theorem by Kannan Soundararajan appeared on the Bulletin of the American
Mathematical Society website on September 25, 2006.
Figure 4. Graph of π_2(x) (orange) versus 2C_2 \int_2^x (\log t)^{-2}\,dt
(blue) and 2C_2 x/(\log x)^2 (green) for x ≤ 10,000. Although all
three functions are asymptotically equivalent, the more complicated
integral expression provides a better approximation to π_2
than does the more elementary expression.
Moreover, Q(f_1, f_2; x) ∼ π_2(x), the counting function for the twin primes; see
Figure 4. This was first predicted by Hardy and Littlewood in 1923 [7].
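The comparison in Figure 4 can be reproduced numerically. The following Python sketch is an illustration under simple assumptions: it counts twin primes with a sieve, approximates the integral by the midpoint rule (our choice of quadrature, with an arbitrary step count), and uses the value of C_2 quoted above.

```python
import math

C2 = 0.660161815  # twin primes constant, as quoted in the text


def twin_prime_count(x):
    """pi_2(x): the number of n <= x for which n and n + 2 are both prime."""
    sieve = bytearray([1]) * (x + 3)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(x + 2) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 3, i)))
    return sum(1 for n in range(2, x + 1) if sieve[n] and sieve[n + 2])


def integral_estimate(x, steps=100_000):
    """2*C2 times the integral of dt/(log t)^2 from 2 to x (midpoint rule)."""
    h = (x - 2) / steps
    total = sum(1.0 / math.log(2 + (j + 0.5) * h) ** 2 for j in range(steps))
    return 2 * C2 * h * total


def elementary_estimate(x):
    """The simpler asymptotic expression 2*C2*x/(log x)^2."""
    return 2 * C2 * x / math.log(x) ** 2
```

Evaluating all three functions at x = 10,000 shows the integral expression landing much closer to the true count than the elementary one, in line with the figure.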
Bibliography
[1] S. L. Aletheia-Zomlefer, L. Fukshansky, and S. R. Garcia, The Bateman–Horn Conjecture:
Heuristics, History, and Applications, to appear in Expositiones Mathematicae, https://
arxiv.org/abs/1807.08899.
[2] C. Berge, Färbung von Graphen, deren sämtliche bzw. deren ungerade Kreise starr sind,
Wiss. Z. Martin-Luther-Univ. Halle-Wittenberg Math.-Natur. Reihe 10 (1961), 114.
[3] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas, The strong perfect graph theo-
rem, Ann. of Math. (2) 164 (2006), no. 1, 51–229, DOI 10.4007/annals.2006.164.51. http://
annals.math.princeton.edu/wp-content/uploads/annals-v164-n1-p02.pdf. MR2233847
[4] M. Chudnovsky, A. Scott, P. Seymour, and S. Spirkl, Induced subgraphs of graphs with large
chromatic number, VIII: Long odd holes, https://arxiv.org/abs/1701.07217.
[5] A. D. N. J. de Grey, The chromatic number of the plane is at least 5, Geombinatorics 28
(2018), no. 1, 18–31. https://arxiv.org/abs/1804.02385. MR3820926
[6] Fulkerson Prize Committee, 2009 Fulkerson Prizes, Notices of the American Mathematical
Society 57 (2010), no. 11, 1475–1476.
[7] G. H. Hardy and J. E. Littlewood, Some problems of ‘Partitio numerorum’; III: On the
expression of a number as a sum of primes, Acta Math. 44 (1923), no. 1, 1–70, DOI
10.1007/BF02403921. MR1555183
[8] L. Lovász, Normal hypergraphs and the perfect graph conjecture, Discrete Math. 2 (1972),
no. 3, 253–267, DOI 10.1016/0012-365X(72)90006-4. MR0302480
[9] L. Lovász, A characterization of perfect graphs, J. Combinatorial Theory Ser. B 13 (1972),
95–98. MR0309780
[10] A. Soifer, The mathematical coloring book: Mathematics of coloring and the colorful life of
its creators, with forewords by Branko Grünbaum, Peter D. Johnson, Jr. and Cecil Rousseau,
Springer, New York, 2009. MR2458293
[11] K. Soundararajan, Small gaps between prime numbers: the work of Goldston-Pintz-Yıldırım,
Bull. Amer. Math. Soc. (N.S.) 44 (2007), no. 1, 1–18, DOI 10.1090/S0273-0979-06-01142-6.
MR2265008
[12] Wikipedia, Hadwiger–Nelson problem, https://en.wikipedia.org/wiki/Hadwiger-
Nelson_problem.
2007
Flatland
Introduction
The year 2007 marked the latest attempt to capture on film the novella Flatland: A
Romance of Many Dimensions (more commonly known as Flatland), written in
1884 by Edwin Abbott Abbott (1838–1926). The movie is a creation of three
filmmakers from the University of Texas, Austin, who discovered that they had all
read and enjoyed the book when they were young students. Seth Caplan (1977– )
was the producer, Jeffrey Travis the director, and Dano Johnson the chief animator.
Their film is a hit with middle and high school geometry classes.
Although some of the social satire of the original book has been modified,
the central part of the story remains. The precocious hexagonal grandson in the
classic has been replaced by an equally precocious granddaughter, Hex. The major
innovation in the film is a mysterious artifact left in the two-dimensional world of
Flatland by a visitor from the third dimension, namely a cube that rotates about
a point so that the Flatlanders can see all of its various cross sections. One can
imagine what might happen. If a corner of the cube pierces the two-dimensional
world, local residents see a point appear and expand into an equilateral triangle
(Figure 1(a)). The triangle grows until its corners become blunted and form sides of
their own. The resulting hexagon slowly morphs until it is a large, regular hexagon
(Figure 1(b)). Once the cube’s center is firmly anchored in Flatland, it can spin
and rotate, revealing other shapes (Figures 1(c) and 1(d)).
The challenge for the onlookers is to imagine what kind of object could produce
all of these slices. It is only when Arthur Square and Hex are taken up into the third
dimension by the three-dimensional visitor Spherius that they begin to appreciate
geometric phenomena in a dimension higher than their own; see Figure 2.
Abbott definitely wanted to challenge his readers to imagine the analogous sit-
uation in which we are confronted by phenomena originating in a fourth spatial
dimension. The film concludes with views of a four-dimensional cube that is pro-
jected into our space as it rotates in various ways in the fourth dimension. What
would we perceive if a visitor from a four-dimensional universe visited our own?
(a) Cube cut by the plane x − y + z = −1. (b) Cube cut by the plane x − y + z = 0.
(c) Cube cut by the plane x − y + 2z = 0. (d) Cube cut by the plane z = 0.
Figure 1. The cube with vertices (±1, ±1, ±1) cut by various planes.
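The slices in Figure 1 can be checked by intersecting each plane with the twelve edges of the cube. The Python sketch below is a small illustration with a hypothetical function name; it uses exact rational arithmetic and returns the corner points of the cross section for a plane with integer coefficients.

```python
from fractions import Fraction
from itertools import product


def cube_slice_vertices(normal, c):
    """Corner points where the plane normal . (x, y, z) = c meets the
    edges of the cube with vertices (+-1, +-1, +-1)."""

    def side(v):
        # Signed value telling which side of the plane the vertex v lies on.
        return sum(a * x for a, x in zip(normal, v)) - c

    points = set()
    for v in product((-1, 1), repeat=3):
        for axis in range(3):
            if v[axis] == 1:
                continue  # visit each edge once, starting from its -1 endpoint
            w = v[:axis] + (1,) + v[axis + 1 :]
            fv, fw = side(v), side(w)
            if fv == 0:
                points.add(tuple(map(Fraction, v)))
            if fw == 0:
                points.add(tuple(map(Fraction, w)))
            if fv * fw < 0:
                # The plane crosses the interior of the edge from v to w.
                t = Fraction(fv, fv - fw)
                points.add(tuple(a + t * (b - a) for a, b in zip(v, w)))
    return points
```

For the four planes in Figure 1 the function returns 3, 6, 4, and 4 points, respectively: a triangle, a hexagon whose corners are edge midpoints, and two quadrilaterals.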
(a) What are the three-dimensional slices through the origin of a hypercube?
(b) Which of the central slices of the three-dimensional cube has the greatest area?
(c) Which central slice of the hypercube has the greatest volume?
(d) Describe the structure of the central slice of the five-dimensional cube by a
four-dimensional hyperplane perpendicular to its long diagonal.
2007: Comments
Nightfall. A classic science-fiction story that explores a theme similar to that
of Flatland is Isaac Asimov’s Nightfall [6]. Although it was written in 1941, well
before the Nebula Award for best science fiction short story was established in 1966,
the Science Fiction Writers of America voted it the best science fiction short story
from the era before the award. John Campbell, the influential editor of Astounding
Science Fiction, gave Asimov the following quote to use as inspiration for the story:
If the stars should appear one night in a thousand years, how would
men believe and adore, and preserve for many generations the remem-
brance of the city of God!
As in Flatland, a society must grapple with what is, to them, an inconceivable
concept. In this case, the planet Lagash is in a crowded system with six suns. The
planet is bathed in eternal day and its inhabitants never experience night or see
the distant stars. The story concerns what happens when they confront the truth
of the heavens.
Prime-generating polynomials. We are drawing near the end and it is time
to begin wrapping up some long-developing threads, among which is the Bateman–
Horn conjecture (see the comments for the 2005 and 2006 entries). We encountered
Euler’s polynomial n^2 + n + 41 in the comments for the 1930 entry. This polynomial
is prime for n = 0, 1, 2, . . . , 39, although it is composite for n = 40. Why is Euler’s
polynomial so good at producing primes? Can we find a quadratic polynomial that
beats Euler’s polynomial? Below, we follow the detailed exposition in [4].
First of all, no nonconstant polynomial f (x) with integer coefficients can pro-
duce only primes.1 To see this, let p = f (0), which is prime by assumption. For
1 Although no single-variable polynomial assumes only prime values, there are explicit multi-
The Bateman–Horn conjecture predicts that there are around 3.32 Li(x) values
n ≤ x for which n^2 + n + 41 is prime.
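To get a feel for these numbers, one can approximate the Bateman–Horn constant of x^2 + x + k by truncating the infinite product at a finite cutoff. The sketch below is an illustration only: the cutoff is an arbitrary choice, the product converges slowly, and the root count for odd p uses Euler's criterion on the discriminant 1 − 4k (our shortcut, valid for this family of quadratics).

```python
def primes_up_to(limit):
    """Sieve of Eratosthenes: return all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def is_prime(n):
    """Trial division; adequate for the small values used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def roots_mod_p(k, p):
    """Number of solutions of x^2 + x + k = 0 (mod p) for a prime p."""
    if p == 2:
        return sum((x * x + x + k) % 2 == 0 for x in range(2))
    d = (1 - 4 * k) % p  # (2x + 1)^2 = 1 - 4k (mod p) for odd p
    if d == 0:
        return 1
    return 2 if pow(d, (p - 1) // 2, p) == 1 else 0


def bateman_horn_constant(k, cutoff):
    """Partial product of (1 - 1/p)^(-1) (1 - omega(p)/p) over primes p <= cutoff."""
    c = 1.0
    for p in primes_up_to(cutoff):
        c *= (1 - roots_mod_p(k, p) / p) / (1 - 1 / p)
    return c
```

Half of `bateman_horn_constant(41, cutoff)` can be compared with the coefficient 3.32 above; the truncated value only approximates the limit, so agreement to a couple of digits is the most one should expect.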
Can we find a k so that C(x^2 + x + k) > C(x^2 + x + 41)? That is, can we beat
Euler’s polynomial? Yes! For odd p, (2007.1) tells us that
\[ f(x) \equiv 0 \pmod{p} \iff (2x + 1)^2 \equiv 1 - 4k \pmod{p} \]
and hence we want an odd k such that the Legendre symbol \left(\frac{1-4k}{p}\right) = −1 for the first several dozen
primes. Let r1 , r2 , . . . , r100 be quadratic nonresidues modulo the first 100 odd
primes, respectively. The Chinese remainder theorem provides an odd k, namely,
3682528442873462645493394982418837604455310384084190749577
5453041420103519734083583186615204669729662489042369819157
7358565650719425670030967384568941667322171286195075149379
680113340447535104953498545635385597443028681,
so that 1 − 4k ≡ r_p (mod p) for each such p. Then
\[ C(x^2 + x + k) \approx 10.9945 \]
and hence Bateman–Horn predicts that there are around 5.5 Li(x) values n ≤ x for
which n^2 + n + k is prime. This beats Euler’s polynomial, at least asymptotically.
Bibliography
[1] E. A. Abbott, Flatland: A romance of many dimensions, reprint of the sixth (1953) edition,
with a new introduction by Thomas Banchoff, Princeton Science Library, Princeton University
Press, Princeton, NJ, 2005. MR2176823
[2] E. A. Abbott, Flatland: A romance of many dimensions, the movie edition, with a new
introduction by Thomas Banchoff and contributions by Seth Caplan, Dano Johnson and
Jeffrey Travis, Princeton University Press, Princeton, NJ, 2008. MR2381792
[3] E. A. Abbott, Flatland, an edition with notes and commentary by William F. Lindgren and
Thomas F. Banchoff, MAA Spectrum, Mathematical Association of America, Washington,
DC; Cambridge University Press, Cambridge, 2010. MR2573243
[4] S. L. Aletheia-Zomlefer, L. Fukshansky, and S. R. Garcia, The Bateman–Horn Conjecture:
Heuristics, History, and Applications, to appear in Expositiones Mathematicae, https://
arxiv.org/abs/1807.08899.
[5] M. Gardner, Mathematical games: The remarkable lore of the prime number, Scientific Amer-
ican, 210 (1964), 120–128.
[6] I. Asimov, Nightfall, Astounding Science Fiction, September 1941. https://www.uni.edu/
morgans/astro/course/nightfall.pdf.
[7] Banchoff and Strauss Productions, The Hypercube: Projections and Slicing, 1978. http://
www.math.brown.edu/~banchoff/video/hypercube.mp4.
[8] T. Banchoff, Additional notes on Flatland, 2014. http://www.math.brown.edu/~banchoff/
HexCentralSlices/HexCentralSlices4308.html.
[9] Flatland Homepage, Flatland the Movie. http://www.flatlandthemovie.com/.
[10] N. D. Elkies, Editor’s endnotes [erratum to MR2001148], Amer. Math. Monthly 111 (2004),
no. 5, 456–460, DOI 10.1080/00029890.2004.11920101. MR2976697
[11] G. Rabinowitsch, Eindeutigkeit der Zerlegung in Primzahlfaktoren in quadratis-
chen Zahlkörpern (German), J. Reine Angew. Math. 142 (1913), 153–164, DOI
10.1515/crll.1913.142.153. MR1580865
2008
Introduction
The central limit theorem is one of the masterpieces of probability theory. It
allows us to look at sums or averages of random variables sampled from an unknown
distribution and make conclusions about the distribution of these expressions. This
has powerful applications in statistics. It allows us to compare the average of a data
set to a known distribution, the Gaussian, as long as we know the population’s
standard deviation. It also permits us to set hypotheses on the value of certain
key parameters. Unfortunately, in practice we often do not know the population’s
standard deviation. Using our sample’s standard error introduces extra uncertainty
into the model that must be taken into account.
In 1908, William Sealy Gosset (1876–1937), who was working for Guinness,
ran into this problem when trying to analyze data on the best barley and hops to
use in beer production. Gosset came up with a clever solution that revolutionized
statistics: he added the error from the approximated standard deviation into the
tails of the Gaussian model. This produced a new probability distribution that
gave accurate estimates for the probability of the observations yielding a mean at
least as extreme as the observed mean given the assumptions about the population
mean. Gosset published the model under the pseudonym “A. Student” due to com-
pany policies at Guinness designed to limit other brewers from benefiting from the
statistical research of its employees. The probability density function for Student’s
t-distribution is
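\[ \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \]
in which
\[ \Gamma(x) = \int_0^\infty e^{-t} t^{x-1}\,dt \tag{2008.1} \]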
is the gamma function (see the 1942 entry and the comments for the 1998 entry)
and ν is the number of degrees of freedom in the model, generally equal to the
number of observations in the data minus 1. In applications, a t-value, equal to the
difference of the sample mean and hypothesized mean times the square root of the
number of observations divided by the sample standard deviation, is calculated and
compared to the probabilities in this distribution.
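Both the density and the t-value are short computations. The following Python sketch is a minimal illustration using only the standard library (the function names are ours); it evaluates the density via the gamma function and computes the t-value of a sample against a hypothesized mean.

```python
import math
import statistics


def student_t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_value(sample, mu0):
    """(sample mean - mu0) * sqrt(n) / (sample standard deviation)."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) * math.sqrt(n) / statistics.stdev(sample)


def total_mass(nu, half_width=50.0, steps=10_000):
    """Midpoint-rule integral of the density over [-half_width, half_width]."""
    h = 2 * half_width / steps
    return h * sum(student_t_pdf(-half_width + (j + 0.5) * h, nu) for j in range(steps))
```

For ν = 1 the density at 0 reduces to 1/π, and the numerical integral is close to 1 for moderate ν, as it must be for a probability density.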
One of the greatest challenges in doing statistics is making sure all the assump-
tions are satisfied (see the references from the 1997 entry for examples of some
catastrophic consequences). We have already remarked that the t-test allows us to
538 2008. 100TH ANNIVERSARY OF THE t-TEST
consider situations in which the variances are unknown. Another great utility is
that when ν is modest, it is close to a Gaussian; we see this in Figure 1.
A popular application of Student’s t-test is to the correlation between two
quantitative variables. Generally, the assumption that errors from the model are
normally distributed is reasonable. However, since the regression might not be
over many points, estimating the standard deviation of the errors introduces extra
variance in the model, exactly what the Student’s t-model is designed to handle.
The model can also be used to compare the means of two populations and compare
the mean of a population to a specified value. After 100 years, the Student’s t-test
is still one of the most widespread and celebrated tools in statistics.
Since the problem below is far afield from the standard uses of the t-distribution,
we briefly remark on its inclusion here. First, it introduces several important ideas
in probability, especially the power of normalization constants (if we have a prob-
ability distribution, it must integrate to 1; this remarkably simple observation is
used numerous times in mathematics to attack difficult integrals). Second, it is
a wonderful example of how mathematics developed for one thing can find uses
in others. It also illustrates the value of being well read: problems that appear
intractable sometimes look that way before a new perspective is found. Finally, it
provides a great opportunity to discuss the gamma function.
2008: Comments
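Stirling’s formula. How rapidly does n! grow? Stirling’s formula states that
\[ n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \]
(see the comments for the 1934 and 1998 entries) or, more accurately,
\[ n! = \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} - \cdots\right). \]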
There are many proofs of Stirling’s formula. For example, it is a consequence of the
central limit theorem applied to sums of Poisson random variables [5, Sect. 18.7].
The definition (2008.1) ensures that Γ(1) = 1. Integration by parts confirms
that Γ(n + 1) = nΓ(n) for all natural numbers n, from which it follows that
Γ(n + 1) = n!. Thus, the gamma function can be used to interpolate the values of
the factorial function. For example, the value Γ(1/2) = \sqrt{\pi} arises in the definition of
the normal distribution (which we encounter below).
One way to prove Stirling’s formula involves the method of stationary phase,
also called Laplace’s method, which approximates integrals of the form
\[ \int_a^b e^{-s f(x)} g(x)\,dx \]
for certain pairs (f, g); see [6, App. A] and [5, Ch. 18]. The relevance of the
stationary phase approach stems from the fact that
\[ n! = \Gamma(n+1) = \int_0^\infty e^{-x} x^n\,dx. \]
We sketch the argument below. It illustrates the value of embedding the quantity
we want (values of the factorial function) in a larger family where we have powerful
tools, such as calculus and analysis, at our disposal.
Suppose that f''(x) > 0 and g(x) ≥ 0 on [a, b] and that there is an x_0 ∈ (a, b)
such that f'(x_0) = 0. We hope to convert the integral
\[ I(s) = \int_a^b e^{-s f(x)} g(x)\,dx \]
into a Gaussian plus a small error as s → ∞. Our assumptions imply that
\[ f(x) = f(x_0) + \frac{f''(x_0)}{2} (x - x_0)^2 + O((x - x_0)^3) \]
for x near x_0. Thus,
\[ e^{-s f(x)} \approx e^{-s f(x_0)} \cdot e^{-\frac{s f''(x_0)}{2} (x - x_0)^2}, \]
which is proportional to the density of a Gaussian
with mean x_0 and whose variance σ_s^2 = 1/(s f''(x_0)) tends to zero as s → ∞. This
approach even suggests why there is a \sqrt{2\pi} in Stirling’s formula: it comes from
integrating a Gaussian.
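As s → ∞, most of the contribution to I(s) comes from x near x_0. For any
fixed ε > 0, we have
\[ \begin{aligned} I(s) &\approx e^{-s f(x_0)} g(x_0) \int_{x_0-\epsilon}^{x_0+\epsilon} e^{-\frac{s f''(x_0)}{2} (x - x_0)^2}\,dx \\ &= e^{-s f(x_0)} g(x_0) \sqrt{2\pi\sigma_s^2} \int_{x_0-\epsilon}^{x_0+\epsilon} \frac{1}{\sqrt{2\pi\sigma_s^2}}\, e^{-(x - x_0)^2 / 2\sigma_s^2}\,dx. \end{aligned} \]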
Since the Gaussian integrates to one over R (it is a probability distribution) and
sharply peaks at x0 as s → ∞, we obtain
\[ I(s) \approx e^{-s f(x_0)} g(x_0) \sqrt{\frac{2\pi}{s f''(x_0)}}. \]
If we were more careful in our analysis, we could keep track of the lower-order terms
and bound how far we are from the true value.
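How does this lead to Stirling’s formula?
\[ \begin{aligned} n! = \Gamma(n+1) &= \int_0^\infty e^{-t} t^n\,dt \\ &= \int_0^\infty e^{-t + n \log t}\,dt \\ &= \int_0^\infty e^{-n(x - \log(nx))}\, n\,dx \qquad (\text{let } t = nx) \\ &= n e^{n \log n} \int_0^\infty e^{-n(x - \log x)}\,dx. \end{aligned} \]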
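We apply the method of stationary phase as developed above, with s = n, x_0 = 1,
f(x) = x − log x, and g(x) = 1. Since f'(1) = 0, f(1) = f''(1) = 1, and g(1) = 1, we obtain
\[ \begin{aligned} n! &\approx n e^{n \log n} e^{-n f(1)} g(1) \sqrt{\frac{2\pi}{s f''(1)}} \\ &= n^{n+1} e^{-n} \sqrt{\frac{2\pi}{n}} \\ &= \sqrt{2\pi n} \left(\frac{n}{e}\right)^n, \end{aligned} \]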
which is Stirling’s approximation. See the comments for the 1934 entry for another
derivation.
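The quality of Stirling's approximation, and of the correction terms quoted earlier, is easy to check numerically. The following Python sketch is a small illustration (the function name and the `terms` parameter are ours); it compares the approximation with the exact factorial.

```python
import math


def stirling(n, terms=0):
    """Stirling's approximation to n!, optionally with up to three
    correction terms 1/(12n), 1/(288n^2), -139/(51840n^3)."""
    base = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    corrections = [1 / (12 * n), 1 / (288 * n**2), -139 / (51840 * n**3)]
    return base * (1 + sum(corrections[:terms]))


def relative_error(n, terms=0):
    """Relative error of the approximation against the exact factorial."""
    return stirling(n, terms) / math.factorial(n) - 1
```

Already at n = 10 the leading term is within about 1/(12n) of the truth, and including the three correction terms shrinks the relative error by several orders of magnitude.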
Catalan numbers and their growth. While we are on the subject of facto-
rials and asymptotics, we might as well wrap up with the Catalan numbers:
\[ C_n = \frac{1}{n+1} \binom{2n}{n} = \frac{(2n)!}{(n+1)!\,n!}. \]
See the 1960 entry for a variety of combinatorial interpretations of these numbers.
Although one can employ Stirling’s formula to get the leading-order asymptotics
for Cn , a more precise formula is [2, Ex. 9.8]
\[ C_n \sim \frac{4^n}{\sqrt{\pi}} \left(\frac{1}{n^{3/2}} - \frac{9}{8n^{5/2}} + \frac{145}{128n^{7/2}} + \cdots\right). \tag{2008.2} \]
For example, the first three terms of (2008.2) yield the approximation
C_{47} ≈ 33,869,142,691,002,085,695,452,443
to
C_{47} = 33,868,773,757,191,046,886,429,490.
It is off by a bit, but the order of magnitude and first several significant digits are
correct; see Figure 2.
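The comparison for C_47 can be repeated in a few lines. The sketch below is an illustration with our own function names; it computes the Catalan numbers exactly from the binomial formula and evaluates the first three terms of (2008.2) in floating point.

```python
import math


def catalan(n):
    """Exact Catalan number C_n = binom(2n, n) / (n + 1)."""
    return math.comb(2 * n, n) // (n + 1)


def catalan_asymptotic(n):
    """First three terms of the asymptotic expansion (2008.2)."""
    series = 1 / n**1.5 - 9 / (8 * n**2.5) + 145 / (128 * n**3.5)
    return 4**n / math.sqrt(math.pi) * series
```

The ratio `catalan_asymptotic(47) / catalan(47)` exceeds 1 by roughly one part in 10^5, which matches the agreement in the leading digits noted above.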
Bibliography
[1] J. F. Box, Guinness, Gosset, Fisher, and small samples, Statist. Sci. 2 (1987), no. 1, 45–52.
http://projecteuclid.org/euclid.ss/1177013437. MR896258
[2] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete mathematics: A foundation for com-
puter science, 2nd ed., Addison-Wesley Publishing Company, Reading, MA, 1994. MR1397498
[3] H. Hotelling, British Statistics and Statisticians Today, Journal of the American Statistical
Association 25 (1930) 186–190.
[4] S. J. Miller, A probabilistic proof of Wallis’s formula for π, Amer. Math. Monthly 115 (2008),
no. 8, 740–745, DOI 10.1080/00029890.2008.11920586. Expanded version: http://arxiv.org/
pdf/0709.2181. MR2456095
[5] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton
Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[6] E. M. Stein and R. Shakarchi, Complex analysis, Princeton Lectures in Analysis, vol. 2, Prince-
ton University Press, Princeton, NJ, 2003. MR1976398
[7] A. Student, The probable error of a mean, Biometrika 6 (1908), no. 1, 1–25. http://www.
aliquote.org/cours/2012_biomed/biblio/Student1908.pdf.
2009
Introduction
Whether one admires the elegance of a far-reaching theorem or its applications,
Luitzen Egbertus Jan Brouwer (1881–1966), commonly known as L. E. J. Brouwer,
proved a major theorem in 1909 that appeals to all tastes. Let f : B^n → B^n be a
continuous function on the n-dimensional unit ball
\[ B^n = \{x \in \mathbb{R}^n : \|x\| \le 1\}. \]
Brouwer’s fixed-point theorem asserts that f has at least one fixed point. In other
words, there exists an x ∈ B^n such that f(x) = x; see Figure 1.
Brouwer’s theorem has uses far beyond analysis and topology. Nash cemented
its foundational role in game theory (see the 1944 entry) with his seminal thesis [6].
Armed with Brouwer’s theorem, he proved the existence of equilibria for noncoop-
erative games. Nash equilibria, as they were later called, are equilibrium points
in an n-person noncooperative game, in which each of the n players with pure or
mixed strategies makes the best decision possible taking into account the best de-
cision that can be made by the other n − 1 players. In 1994, Nash received the
Nobel Prize in Economics for this contribution. This application, among others,
highlights the importance of fixed-point theorems in the past, present, and future
(see the comments for the 1944 entry).
544 2009. 100TH ANNIVERSARY OF BROUWER’S FIXED-POINT THEOREM
Proofs of Brouwer’s fixed-point theorem can be found in [5, 7]; see the
problem below for an unexpected combinatorial approach in the two-dimensional
case (the solution is given in the comments section).
2009: Comments
Brouwer and eigenvalues. Suppose that A is an n × n matrix with positive
entries. Intuition suggests that A has a positive eigenvalue and a corresponding
eigenvector with nonnegative entries. This is true, and it is crucial to the study of
Markov chains; see the 1953 entry.
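The claim can be explored numerically before it is proved. The following Python sketch is not the Brouwer-based argument; it swaps in power iteration, a different technique that repeatedly applies A and rescales so that the entries sum to 1. The function name and the iteration count are assumptions made for illustration.

```python
def perron_pair(A, iterations=200):
    """Approximate a positive eigenvalue and eigenvector of a matrix with
    positive entries by iterating x -> Ax / (sum of the entries of Ax)."""
    n = len(A)
    x = [1.0 / n] * n  # start at the center of the simplex
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        total = sum(y)
        x = [v / total for v in y]
    # For an eigenvector whose entries sum to 1, the entries of Ax sum to
    # the eigenvalue.
    y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(y), x
```

For A = [[2, 1], [1, 3]] the iteration settles on the eigenvalue (5 + √5)/2 with an eigenvector whose entries are positive and sum to 1, which is a fixed point of the rescaled map.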
Let x = (x1 , x2 , . . . , xn ) ∈ Rn and define
see Figure 2(a). Since A has only positive entries, it maps S into (0, ∞)^n. If
\[ \pi(x) = \frac{x}{x_1 + x_2 + \cdots + x_n}, \]
Prove it. Hint: Show that the closed unit ball is compact. Then prove that \sup_{\|x\|=1} \|Ax\| < \infty.
(a) The Ulam spiral is obtained by marking primes as the ordered sequence of natural numbers spirals away from the origin.
(b) When the Ulam spiral is plotted on a 500 × 500 grid, “lines” begin to appear.
for i = 1, 2, 3. However, f (p) and p both sum to 1 and hence they must be equal.
In other words, p is a fixed point of f .
Li(x) is the offset logarithmic integral function (1933.1), and (32/p) denotes a
Legendre symbol. Since
\[ \left(\frac{32}{p}\right) = \begin{cases} 1 & \text{if } p = 7, 17, 23, 31, 41, 47, \\ -1 & \text{if } p = 3, 5, 11, 13, 19, 29, 37, 43, 53, 59, 61, 67, \end{cases} \]
there is a substantial imbalance among the first few odd primes that makes C(f )
unusually large. This explains the particularly prime-rich ray that corresponds to
our polynomial. Analogous computations can be performed for other rays, most of
which have significantly smaller Bateman–Horn constants.
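The table of Legendre symbols above can be verified with Euler's criterion, which states that (a/p) ≡ a^{(p−1)/2} (mod p) for an odd prime p not dividing a. The short Python sketch below is an illustration with our own function name.

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion."""
    r = pow(a % p, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


ODD_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67]
residues = [p for p in ODD_PRIMES if legendre(32, p) == 1]
nonresidues = [p for p in ODD_PRIMES if legendre(32, p) == -1]
```

Among the odd primes up to 67, the value −1 occurs twice as often as the value 1, which is the imbalance described above.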
Bibliography
[1] S. L. Aletheia-Zomlefer, L. Fukshansky, and S. R. Garcia, The Bateman–Horn Conjecture:
Heuristics, History, and Applications, to appear in Expositiones Mathematicae, https://
arxiv.org/abs/1807.08899.
[2] L. E. J. Brouwer, Beweis der invarianz des n-dimensionalen gebiets (German), Math. Ann.
71 (1911), no. 3, 305–313, DOI 10.1007/BF01456846. MR1511658
[3] L. E. J. Brouwer, Über Abbildung von Mannigfaltigkeiten (German), Math. Ann. 71 (1911),
no. 1, 97–115, DOI 10.1007/BF01456931. MR1511644
[4] M. Gardner, Mathematical games: The remarkable lore of the prime number, Scientific Amer-
ican, 210 (1964), 120–128.
[5] J. Franklin, Methods of mathematical economics: Linear and nonlinear programming, fixed-
point theorems, Undergraduate Texts in Mathematics, Springer-Verlag, New York-Berlin,
1980. MR602694
[6] J. F. Nash Jr, Non-cooperative games, ProQuest LLC, Ann Arbor, MI, 1950. http://www.
princeton.edu/mudd/news/faq/topics/Non-Cooperative_Games_Nash.pdf. MR2938064
[7] S. J. Miller, Mathematics of optimization: how to do things faster, Pure and Applied Under-
graduate Texts, vol. 30, American Mathematical Society, Providence, RI, 2017. MR3729274
[8] J. Shapiro, Sperner’s lemma and Brouwer’s fixed-point theorem, http://joelshapiro.org/
Pubvit/Downloads/SpernerBrouwer/Sperner_Brouwer.pdf.
[9] T. Tao, Brouwer’s fixed point and invariance of domain theorems, and Hilbert’s fifth problem,
https://terrytao.wordpress.com/tag/invariance-of-domain/.
[10] Wikipedia, Brouwer’s fixed-point theorem, http://en.wikipedia.org/wiki/Brouwer_fixed_point_theorem.
[11] Wikipedia, Invariance of domain, https://en.wikipedia.org/wiki/Invariance_of_domain.
2010
Carmichael Numbers
Introduction
E-commerce would be impossible without the ability to securely transmit in-
formation. Since many modern cryptosystems, such as RSA (see the 1977 entry),
involve prime numbers, primality tests have been the focus of intense research in
recent years. A simple test is based on Fermat’s little theorem: if gcd(a, n) = 1 and
a^{n−1} ≢ 1 (mod n), then n is composite. As we saw in the 2002 entry, this test is
not foolproof. For example, 2^{340} ≡ 1 (mod 341) despite the fact that 341 = 11 · 31
is composite. That is, 341 is a pseudoprime for the base 2.
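The Fermat test is a one-line computation with modular exponentiation. The Python sketch below is an illustration (the function name is ours); it lists the bases a coprime to n that expose n as composite.

```python
from math import gcd


def fermat_witnesses(n):
    """Bases 2 <= a < n with gcd(a, n) = 1 and a^(n-1) not congruent to 1 mod n."""
    return [a for a in range(2, n) if gcd(a, n) == 1 and pow(a, n - 1, n) != 1]
```

The base 2 is not among the witnesses for 341, but the base 3 is, so a second base already reveals that 341 is composite. For 561 the list is empty, which is the phenomenon discussed next.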
In 1910, Robert Daniel Carmichael (1879–1967) observed a devious property
of the composite number 561 = 3 · 11 · 17 [3]. Fermat’s theorem ensures that
a^2 ≡ 1 (mod 3), a^{10} ≡ 1 (mod 11), and a^{16} ≡ 1 (mod 17),
whenever gcd(a, 561) = 1. Therefore,
a^{560} ≡ (a^2)^{280} ≡ 1 (mod 3),
a^{560} ≡ (a^{10})^{56} ≡ 1 (mod 11),
a^{560} ≡ (a^{16})^{35} ≡ 1 (mod 17),
and hence a^{560} ≡ 1 (mod 561), whenever gcd(a, 561) = 1. Thus, 561 is a
pseudoprime for any base relatively prime to 561. In honor of this discovery, a composite
number n is called a Carmichael number if a^{n−1} ≡ 1 (mod n) whenever gcd(a, n) = 1
(see the comments for more information about priority and nomenclature). The
existence of Carmichael numbers prevents us from making a primality test based
only on a direct application of Fermat’s little theorem.
Carmichael himself found several such numbers and he conjectured that infin-
itely many exist. The first few are
561, 1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341.
Extensive computations carried out over the years by Richard G. E. Pinch reveal
that there are 20,138,200 Carmichael numbers at most $10^{21}$ [10]. Moreover, these
computations suggest that infinitely many Carmichael numbers exist; see Figure 1.
In 1994, William Alford (1937–2003), Andrew Granville (1962– ), and Carl
Pomerance (1944– ) proved Carmichael’s conjecture [1]. To be more specific, they
showed that for large x there are at least $x^{2/7}$ Carmichael numbers at most x. An
important ingredient in the proof is Korselt’s criterion:
n is a Carmichael number ⇐⇒ n is composite, square free, and (p − 1) | (n − 1) for all primes p that divide n.
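Korselt’s criterion turns the search for Carmichael numbers into a question about prime factorizations. The sketch below assumes simple trial division, which is adequate only for small n; the names `prime_factorization` and `is_carmichael` are ours.

```python
def prime_factorization(n: int) -> dict:
    """Trial-division factorization of n >= 2, returned as {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_carmichael(n: int) -> bool:
    """Korselt: n is composite, square free, and (p - 1) | (n - 1) for each prime p | n."""
    factors = prime_factorization(n)
    composite = sum(factors.values()) >= 2
    square_free = all(exponent == 1 for exponent in factors.values())
    divisibility = all((n - 1) % (p - 1) == 0 for p in factors)
    return composite and square_free and divisibility

first_ten = [n for n in range(2, 30000) if is_carmichael(n)][:10]
print(first_ten)
```

The output can be compared with the list of the first few Carmichael numbers given earlier in this entry.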
$a^{(n-1)/2} \equiv \pm 1 \pmod{n}$
2010: Comments
Taxicab number. The word “taxicab” in the problem refers to a famous
anecdote about Ramanujan related by G. H. Hardy:
I remember once going to see him [Ramanujan] when he was ill at
Putney. I had ridden in taxi cab number 1729 and remarked that the
number seemed to me rather a dull one, and that I hoped it was not an
unfavorable omen. “No,” he replied, “it is a very interesting number;
it is the smallest number expressible as the sum of two cubes in two
different ways.”
Ramanujan had observed that
1,105 = 5 · 13 · 17
is the smallest number expressible as the sum of two squares in four different ways,
with order being irrelevant:
$1105 = 4^2 + 33^2 = 9^2 + 32^2 = 12^2 + 31^2 = 23^2 + 24^2$.
Why does this occur? A theorem of Fermat (see the comments for the 1923 and
1966 entries) asserts that an odd prime is the sum of two squares if and only if it
is congruent to 1 (mod 4). The primes 5, 13, and 17 are the smallest primes of this
form. Since
$5 = 1^2 + 2^2$, $13 = 2^2 + 3^2$, and $17 = 1^2 + 4^2$,
the identity
$(a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2$
can be used to provide the four representations of 1,105 above. Rather than taking
this multivariable polynomial identity on faith, or suggesting an unenlightening,
brute-force proof, we should provide a conceptual explanation. The identity is a
restatement of the fact that the complex absolute value is multiplicative: |zw| =
|z||w| for complex z, w. Let z = a + bi and w = c + di, in which $i^2 = -1$. Then,
$(a^2 + b^2)(c^2 + d^2) = |z|^2 |w|^2 = |zw|^2$
$= |(ac - bd) + i(ad + bc)|^2$
$= (ac - bd)^2 + (ad + bc)^2$.
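To see the identity at work, one can multiply the Gaussian integers 1 + 2i, 2 ± 3i, and 1 ± 4i and compare with an exhaustive search. This is a sketch under our own naming (`two_square_representations` and `multiply` are hypothetical helpers); pairs (a, b) stand for a + bi, and the sign choices correspond to replacing a factor by its complex conjugate.

```python
from itertools import product
from math import isqrt

def two_square_representations(n: int) -> list:
    """All pairs (x, y) with 0 <= x <= y and x^2 + y^2 = n, found by direct search."""
    reps = []
    for x in range(isqrt(n // 2) + 1):
        y = isqrt(n - x * x)
        if y * y == n - x * x:
            reps.append((x, y))
    return reps

def multiply(u: tuple, v: tuple) -> tuple:
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i on integer pairs."""
    (a, b), (c, d) = u, v
    return (a * c - b * d, a * d + b * c)

# Multiply 1 + 2i by 2 ± 3i and then by 1 ± 4i; record |real| and |imaginary| parts.
from_identity = set()
for s, t in product((1, -1), repeat=2):
    x, y = multiply(multiply((1, 2), (2, 3 * s)), (1, 4 * t))
    from_identity.add(tuple(sorted((abs(x), abs(y)))))

print(two_square_representations(1105))
print(sorted(from_identity) == two_square_representations(1105))
```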
Carmichael numbers and Stigler’s law. Stigler’s law of eponymy (1980)
asserts that no scientific discovery is named after its discoverer [14]. True to form,
Stigler’s law was not proposed by statistician Stephen Stigler (1941– ); he attrib-
uted it to sociologist Robert K. Merton (1910–2003). In fact, mathematician Carl
Benjamin Boyer (1906–1976) said more or less the same thing in 1972. The history
behind Carmichael numbers is just as murky [12]. Alwin Korselt (1864–1947) an-
ticipated their discovery when he proved his criterion in 1899 (eleven years before
Carmichael’s paper [3]), although naturally he did not use the term “Carmichael
number” [7]. Since he did not provide any examples, perhaps it cannot be said that
Korselt “discovered” Carmichael numbers. That honor goes to Václav Šimerka, who
found the first seven Carmichael numbers in 1885 [11].
Carmichael numbers in arithmetic progressions. Once we know there
are infinitely many Carmichael numbers, it is natural to ask about their distribution.
Since Carmichael numbers behave, as far as Fermat’s little theorem is concerned,
like primes, it is natural to investigate other properties of primes that might be
shared with Carmichael numbers.
Dirichlet’s theorem on primes in arithmetic progressions states that if gcd(a, m)
= 1, then there are infinitely many primes congruent to a (mod m); see the 1913 and
2004 entries along with [5, 8]. Is the same true for Carmichael numbers? William
D. Banks and Pomerance proved this is the case under a certain assumption about
how soon a prime appears in an arithmetic progression. This has since been proved
unconditionally [15].
For each modulus m and each a relatively prime to m, let (a, m) be the
smallest prime congruent to a (mod m). Let
(m) = max (a, m), in which the maximum is taken over all a with gcd(a, m) = 1.
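For small moduli, the two quantities just defined (the smallest prime congruent to a modulo m, and the maximum of these over all a relatively prime to m) can be computed directly. The following is a minimal sketch; the function names are our own, primality is tested by naive trial division, and the search terminates only because Dirichlet’s theorem guarantees a prime in each admissible progression.

```python
from math import gcd

def is_prime(n: int) -> bool:
    """Naive trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def least_prime_in_progression(a: int, m: int) -> int:
    """Smallest prime congruent to a (mod m); requires gcd(a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a and m must be relatively prime")
    n = a % m
    while not is_prime(n):
        n += m
    return n

def largest_least_prime(m: int) -> int:
    """Maximum of the least primes over all residues a relatively prime to m."""
    return max(least_prime_in_progression(a, m)
               for a in range(1, m + 1) if gcd(a, m) == 1)

# Modulo 10 the admissible residues are 1, 3, 7, 9, with least primes 11, 3, 7, 19.
print([least_prime_in_progression(a, 10) for a in (1, 3, 7, 9)])
print(largest_least_prime(10))
```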
have real part 1/2. The 1942 entry concerns the Riemann hypothesis.
Bibliography
[1] W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. (2) 139 (1994), no. 3, 703–722, DOI 10.2307/2118576. http://www.math.dartmouth.edu/~carlp/PDF/paper95.pdf. MR1283874
[2] W. D. Banks and C. Pomerance, On Carmichael numbers in arithmetic progressions, J. Aust. Math. Soc. 88 (2010), no. 3, 313–321, DOI 10.1017/S1446788710000169. http://faculty.missouri.edu/~bankswd/papers/2010_Carmichael_APs_published_version.pdf. MR2661452
[3] R. D. Carmichael, Note on a new number theory function, Bull. Amer. Math. Soc. 16 (1910), no. 5, 232–238, DOI 10.1090/S0002-9904-1910-01892-9. MR1558896
[4] S. Chowla, On the least prime in an arithmetical progression, J. Indian Math. Soc. (N.S.) 1 (1934), 1–3.
[5] H. Davenport, Multiplicative number theory, 3rd ed., revised and with a preface by Hugh L. Montgomery, Graduate Texts in Mathematics, vol. 74, Springer-Verlag, New York, 2000. MR1790423
[6] D. R. Heath-Brown, Almost-primes in arithmetic progressions and short intervals, Math. Proc. Cambridge Philos. Soc. 83 (1978), no. 3, 357–375, DOI 10.1017/S0305004100054657. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=2079092&fileId=S0305004100054657. MR0491558
[7] A. R. Korselt, Problème chinois, L’Intermédiaire des Mathématiciens 6 (1899), 142–143.
[8] S. J. Miller and R. Takloo-Bighash, An invitation to modern number theory, with a foreword by Peter Sarnak, Princeton University Press, Princeton, NJ, 2006. MR2208019
[9] S. J. Miller, The probability lifesaver: All the tools you need to understand chance, Princeton Lifesaver Study Guide, Princeton University Press, Princeton, NJ, 2017. MR3585480
[10] R. G. E. Pinch, The Carmichael numbers up to $10^{21}$, Proceedings of Conference on Algorithmic Number Theory 2007, Turku Centre for Computer Science 46 (2007), 129–131.
[11] V. Šimerka, Zbytky z arithmetické posloupnosti (On the remainders of an arithmetic progression), Časopis pro pěstování matematiky a fysiky 14 (1885), no. 5, 221–225.
[12] Wikipedia, Carmichael number, https://en.wikipedia.org/wiki/Carmichael_number.
[13] Wikipedia, Linnik’s theorem, https://en.wikipedia.org/wiki/Linnik’s_theorem.
[14] Wikipedia, Stigler’s law of eponymy, https://en.wikipedia.org/wiki/Stigler’s_law_of_eponymy.
[15] T. Wright, Infinitely many Carmichael numbers in arithmetic progressions, Bull. Lond. Math. Soc. 45 (2013), no. 5, 943–952, DOI 10.1112/blms/bdt013. MR3104986
2011
100th Anniversary of Egorov’s Theorem
Introduction
Although John Edensor Littlewood is best known for his work with G. H.
Hardy, he was a fine mathematician in his own right. Analysts frequently appeal
to Littlewood’s principles from measure theory [4]:
There are three principles, roughly expressible in the following terms:
Every (measurable) set is nearly a finite sum of intervals; every function
(of class $L^p$) is nearly continuous; every convergent sequence of
functions is nearly uniformly convergent.
Although these are not precise assertions, they suggest several important phenom-
ena. In what follows, let m denote Lebesgue measure on R.
Littlewood’s first principle means that a Lebesgue measurable subset A ⊂ R
of finite measure can be arbitrarily well-approximated by a finite union G of open
intervals in the sense that m(A\G ∪ G\A) can be made arbitrarily small.
In modern presentations, Littlewood’s third principle (Egorov’s theorem) usu-
ally comes before the second principle (Lusin’s theorem). Recall that a property
holds almost everywhere if the set of points at which it fails has measure zero. Here
is a formal statement of Egorov’s theorem [11]. Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of
measurable functions that converges almost everywhere to f. For each $\epsilon > 0$, there
is a measurable set E ⊆ [a, b] so that $m(E) < \epsilon$ and $f_n$ converges to f uniformly
on [a, b]\E. Thus, convergence almost everywhere implies uniform convergence on
subsets of large measure.
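A standard illustration, which is our own choice and not taken from the text, is $f_n(x) = x^n$ on [0, 1]. It converges to 0 at every x < 1, but not uniformly on [0, 1); removing the small set E = (1 − ε, 1] restores uniform convergence. The sketch below uses exact formulas for the suprema rather than numerical grids, and the function names are hypothetical.

```python
def sup_off_exceptional_set(n: int, eps: float) -> float:
    """sup of |x**n - 0| over [0, 1 - eps]; x**n is increasing, so it equals (1 - eps)**n."""
    return (1.0 - eps) ** n

def stubborn_point(n: int) -> float:
    """A point x_n in [0, 1) with x_n**n = 1/2, so the sup over [0, 1) never drops below 1/2."""
    return 0.5 ** (1.0 / n)

eps = 0.01
for n in (10, 100, 1000):
    x_n = stubborn_point(n)
    # Columns: n, sup off E (tends to 0), the point x_n, and f_n(x_n) (stays near 1/2).
    print(n, sup_off_exceptional_set(n, eps), x_n, x_n ** n)
```

The second column shrinks toward zero as n grows, while the last column stays at 1/2 up to rounding; this is the contrast that Egorov’s theorem describes.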
We cannot replace [a, b] in Egorov’s theorem with an unbounded interval because
of the “traveling wave” phenomenon. For example, the sequence of characteristic
functions $f_n = \chi_{[n,n+1]}$ converges to 0 everywhere on [0, ∞). However, $f_n$
does not converge to zero uniformly on any unbounded subset of [0, ∞), in particular
on the complement of any set of finite measure, and hence
the analogue of Egorov’s theorem does not hold in this case.
One can even accomplish this with compactly supported, infinitely differen-
tiable functions. Since such functions are a staple in harmonic analysis and ad-
vanced partial differential equations, it is worth describing their construction. First
let
$$h(x) = \begin{cases} e^{-1/x} & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases}$$
and verify, with the definition of the derivative, L’Hôpital’s rule, and induction,
that h is infinitely differentiable on R; see Figure 1(a). Then
$$r(x) = \frac{h(x)}{h(x) + h(1 - x)}$$
556 2011. 100TH ANNIVERSARY OF EGOROV’S THEOREM
(a) The “hill” function h(x) begins perfectly flat, then smoothly inclines on x > 0. It is infinitely differentiable.
(b) The “ramp” function r(x) begins perfectly flat, smoothly inclines on 0 < x < 1, and becomes perfectly flat again. It is infinitely differentiable.
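The hill and ramp functions are easy to evaluate numerically. The following is a minimal sketch in floating-point arithmetic; it mirrors the definitions of h and r but says nothing about smoothness, which requires the analytic argument.

```python
import math

def h(x: float) -> float:
    """Hill function: exp(-1/x) for x > 0 and 0 otherwise."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def r(x: float) -> float:
    """Ramp function h(x) / (h(x) + h(1 - x)); the denominator is positive for every real x."""
    return h(x) / (h(x) + h(1.0 - x))

# Flat at 0 for x <= 0, rising on 0 < x < 1, flat at 1 for x >= 1.
for x in (-1.0, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(x, r(x))
```

The denominator never vanishes because at least one of x and 1 − x is positive; by symmetry r(x) + r(1 − x) = 1, so r(1/2) = 1/2.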
2011: Comments
Mathematical genealogy. It would be unfair to reduce Egorov’s mathemat-
ical contributions to his eponymous theorem. For example, he has 6,396 academic
descendants [5]! Mathematicians like to keep track of their intellectual lineages:
one’s doctoral advisor is a “parent” in the genealogy, fellow students of the same
advisor are “siblings,” and so forth.
The Mathematics Genealogy Project is an immense database that contains
detailed information about almost every mathematician [3, 5]. Before the advent of
the formal dissertation process in Europe, many of the “genetic” relations between
mathematicians were informal. For example, an older mathematician might mentor
a younger one, or one mathematician’s writing might have influenced another.
The database is endlessly fascinating. For instance, the record for most descen-
dants belongs to Sharaf al-Din al-Tusi (ca. 1135–1213). His influence eventually
leads to the Byzantine Gregory Chioniadis (ca. 1240–1320), who studied in Persia.
A few more steps lead through the Byzantine Empire to Renaissance Italy, where we
begin to see a few familiar names. A more detailed analysis revealed the following:
In July 2016, Cosmin Ionita and Pat Quillen of MathWorks used MATLAB
to analyze the Math Genealogy Project graph. At the time, the
genealogy graph contained 200,037 vertices. There were 7639 (3.8%)
isolated vertices and 1962 components of size two (advisor-advisee
pairs where we have no information about the advisor). The largest
component of the genealogy graph contained 180,094 vertices, account-
ing for 90% of all vertices in the graph. The main component has 7323
root vertices (individuals with no advisor) and 137,155 leaves (math-
ematicians with no students), accounting for 76.2% of the vertices in
this component. The next largest component sizes were 81, 50, 47, 34,
34, 33, 31, 31, and 30. [5]
A small sample of the database is illustrated in Figure 2. Posters depicting the
genealogy of individual mathematicians, and even entire departments, are sold for
a small fee by the Mathematics Genealogy Project.
Obviously. . . . One often encounters phrases like “it is easy to see that,”
“clearly. . . ,” and “obviously. . . ” in mathematical writing. Along similar lines,
Egorov concluded his paper [1] with:
On voit sans peine que ce théorème est susceptible d’un grand nombre
d’applications. [One sees with no effort that this theorem is prone to
a great number of applications.]
1 Daniel Tausk had communicated the same concerns to Nelson independently via e-mail.
responded: “You are quite right, and my original response was wrong. Thank you
for spotting my error. . . . I withdraw my claim.” Peano arithmetic appears, at least
for the time being, to be free of contradictions.
Bibliography
[1] D. T. Egoroff, Sur les suites des fonctions mesurables, Comptes rendus hebdomadaires des
séances de l’Académie des sciences 152 (1911), 244–246.
[2] G. H. Hardy and W. W. Rogosinski, Fourier Series, Cambridge Tracts in Mathematics and
Mathematical Physics, no. 38, Cambridge University Press, 1944. MR0010206
[3] A. Jackson, A labor of love: the Mathematics Genealogy Project, Notices of the American
Mathematical Society 54 (2007), no. 8, 1002–1003.
[4] J. E. Littlewood, Lectures on the Theory of Functions, Oxford University Press, 1944.
MR0012121
[5] Mathematics Genealogy Project, https://genealogy.math.ndsu.nodak.edu/index.php.
[6] The n-Category Café: a group blog on math, physics, and philosophy, September 27,
2011: The Inconsistency of Arithmetic, https://golem.ph.utexas.edu/category/2011/09/
the_inconsistency_of_arithmeti.html.
[7] C. Severini, Sopra gli sviluppi in serie di funzioni ortogonali, Atti della Accademia Gioenia
di scienze naturali in Catania, Series V III (1910), 1–7.
[8] F. Su, Mathematical microaggressions, MAA Focus 35 (2015), no. 5, 36–37.
[9] L. Tonelli, Su una proposizione fondamentale dell’analisi, Bollettino della Unione Matematica
Italiana 2 (1924), no. 3, 103–104.
[10] J. D. Weston, A counter-example concerning Egoroff’s theorem, J. London Math. Soc. 34
(1959), 139–140, DOI 10.1112/jlms/s1-34.2.139. MR0103961
[11] Wikipedia, Egorov’s theorem, https://en.wikipedia.org/wiki/Egorov’s_theorem.
2012
National Museum of Mathematics
Introduction
We end our book, which has celebrated 100 years of mathematical milestones,
with the opening of a museum that shares a similar goal. On December 15, 2012,
The National Museum of Mathematics (MoMath) opened in New York City [4]:
The National Museum of Mathematics began in response to the closing
of a small museum of mathematics on Long Island, the Goudreau Mu-
seum. A group of interested parties (the “Working Group”) met in Au-
gust 2008 to explore the creation of a new museum of mathematics—
one that would go well beyond the Goudreau in both its scope and
methodology. The group quickly discovered that there was no mu-
seum of mathematics in the United States, and yet there was incredible
demand for hands-on math programming.
Do we need such a museum? Yes! Mathematics, unlike most other sciences, has
a terrible public-relations problem. Most college-educated Americans ended their
mathematical studies at calculus, or perhaps even earlier in “college algebra.” The
two people most closely associated with calculus, Newton and Leibniz, had their
heyday in the 1600s. This is largely where the public understanding of mathematics
stops: in the seventeenth century. So many people fear mathematics that “math
anxiety” has been studied by educators and psychologists since the 1950s [6]. No
other science1 suffers this enormous disconnect between its practitioners and the
public. Physicists can talk about nuclear energy and black holes without receiving
puzzled looks. Biologists can speak of DNA, genes, and proteins without fear of
losing the audience. These topics from 20th-century science are firmly embedded
in the public consciousness. Mathematicians have a lot of ground to make up.
Although there have been numerous exhibits on the interplay between math
and art, or small wings in science museums, MoMath is something new. It is entirely
devoted to mathematics [4]:
Mathematics illuminates the patterns that abound in our world. The
National Museum of Mathematics strives to enhance public under-
standing and perception of mathematics. Its dynamic exhibits and
programs stimulate inquiry, spark curiosity, and reveal the wonders
of mathematics. The Museum’s activities lead a broad and diverse
audience to understand the evolving, creative, human, and aesthetic
nature of mathematics.
1 There is some debate about whether mathematics is truly a “science,” or whether it is more
akin to philosophy or even religion. However, it has long been held among the liberal arts, being
well represented in the quadrivium of arithmetic, geometry, music, and astronomy.
562 2012. NATIONAL MUSEUM OF MATHEMATICS
On August 2, 2018, MoMath announced the first Distinguished Chair for the
Public Dissemination of Mathematics, a visiting professorship “dedicated to raising
public awareness of math.” The first recipient is Fields Medalist Manjul Bhargava,
whom we encountered several times already. Among the many activities associated
with this prestigious post is an eight-week-long minicourse, suitable for ages thirteen
and up, on mathematics and magic.
As for public awareness, mathematicians might never catch up to their col-
leagues in other sciences. We do not even have a Nobel Prize (usually good for
thirty seconds on the evening news) for the subject. However, we do our best. Like
the popular television show NUMB3RS (2005–2010) before it, MoMath is slowly
making mathematics more accessible to the general public.
2012: Comments
All good things. . . . It has been relatively easy to find subjects for the early
entries. Those have typically been concerned with significant events from times
long past. Sufficiently many years have passed so that we can determine which
results have stood the test of time. For more recent entries, the task has been more
difficult. The opinions and personal tastes of the authors have more often than not
been the deciding factors.
Can we look forward and predict what will be included in the sequel, dedicated
to the second hundred years of Pi Mu Epsilon? Some choices are obvious. For
instance, 2014 will probably focus on Maryam Mirzakhani (1977–2017), the first
female Fields Medalist (sadly, she passed away only a few years after receiving
the medal). Will she be the first of many? It is too early to tell, although we
will have to wait until at least 2022 to see a second female medalist. Which of
the 2018 medalists, Caucher Birkar (1978– ), Alessio Figalli (1984– ), Peter Scholze
(1987– ), and Akshay Venkatesh (1981– ), will we devote entries to in one hundred
years? What new theories will develop and blossom in the coming century?
Certainly there are open problems that will deserve entries of their own if they
are dispatched. The theory of numbers, one of our favorite topics, offers plenty of
opportunities. Will we have an entry about the proof of the Riemann hypothesis?
We can do no better than to finish with an excerpt from David Hilbert’s address
to the International Congress of Mathematicians in 1900:
History teaches the continuity of the development of science. We know
that every age has its own problems, which the following age either
solves or casts aside as profitless and replaces by new ones. If we
would obtain an idea of the probable development of mathematical
knowledge in the immediate future, we must let the unsettled ques-
tions pass before our minds and look over the problems which the
science of today sets and whose solution we expect from the future.
To such a review of problems the present day, lying at the meeting
of the centuries, seems to me well adapted. For the close of a great
epoch not only invites us to look back into the past but also directs
our thoughts to the unknown future. . . . It is difficult and often impos-
sible to judge the value of a problem correctly in advance; for the final
award depends upon the gain which science obtains from the prob-
lem. . . while the creative power of pure reason is at work, the outer
world again comes into play, forces upon us new questions from actual
experience, opens up new branches of mathematics, and while we seek
to conquer these new fields of knowledge for the realm of pure thought,
we often find the answers to old unsolved problems and thus at the
same time advance most successfully the old theories. And it seems
to me that the numerous and surprising analogies and that apparently
prearranged harmony which the mathematician so often perceives in
the questions, methods and ideas of the various branches of his science,
have their origin in this ever-recurring interplay between thought and
experience. [3]
Bibliography
[1] C. G. Gray, Solids of constant breadth, Math. Gaz. 56 (1972), no. 398, 289–292, DOI
10.2307/3617832. MR0487786
[2] L. Hall and S. Wagon, Roads and wheels, Math. Mag. 65 (1992), no. 5, 283–301, DOI
10.2307/2691240. MR1191272
[3] D. Hilbert, Über das Unendliche (German), Math. Ann. 95 (1926), no. 1, 161–190,
DOI 10.1007/BF01206605. http://www.ams.org/journals/bull/1902-08-10/S0002-9904-1902-00923-3/S0002-9904-1902-00923-3.pdf. MR1512272
[4] MoMath: National Museum of Mathematics, http://momath.org/.
[5] R. L. Tennison, Smooth curves of constant width, The Mathematical Gazette 60 (1976),
no. 414, 270–272.
[6] Wikipedia, Mathematical anxiety, https://en.wikipedia.org/wiki/Mathematical_anxiety.
Index of People
Abbott, Derek, 252 Bellaso, Giovan Battista, 212 Caesar, Julius, 125, 212
Abbott, Edwin Abbott, 531 Berge, Claude, 525 Caldwell, Chris K., 388
Ackermann, Wilhelm, 71 Bergelson, Vitaly, 341, 342 Campbell, John, 145, 533
Adams, Douglas, 27, 392 Bernoulli, Jacob, 381 Cantor, Georg, 21, 27, 71,
Adams, John Couch, 221 Bernstein, Sergei, 196 118, 269, 329
Adleman, Leonard, 351, 501 Bertrand, Joseph, 129 Caplan, Seth, 531
Agmon, Shmuel, 197 Beukers, Frits, 365 Carlitz, Leonard, 460
Agrawal, Manindra, 501 Beurling, Arne, 194 Carmichael, Robert Daniel,
Aguayo, Daniel, 489 Bhargava, Manjul, 47, 287, 549
Ahlin, Ashley, xi 446, 562 Carr, Avery T., xii, 222, 242,
Aigner, Martin, 3, 189 Bieberbach, Ludwig, 393 383, 386
Akhmedov, Azer, 63 Bigelow, Stephen, 63 Catalan, Eugène Charles,
Akiyama, Yo, xii Binet, Jacques Philippe 254
al-Din al-Tusi, Sharaf, 557 Marie, 495 Cauchy, Augustin-Louis, 448
Alcuin of York, 473 Birkar, Caucher, 562 Cellarosi, Francesco, 556
Alexander, James Waddell, Birkenmajer, Ludwik An- Chang, Alan, 334
396 toni, 370 Chang, Paul, 559
Alford, William, 549 Birkhoff, George, 21, 96 Chao-Haft, Max, xii
Anderson, Randy L., 435 Birman, Joan, 399 Châu, Ngô Bao, 294
Andrade, Julio, 170 Black, Fischer, 469 Chebyshev, Pafnuty, 129,
Andrews, James M., xii, 91, Blake, Katherine, xii 381, 515
231 Bohr, Neils, 14 Chen, Hang, 306
Apéry, Roger, 171, 364 Boltzmann, Ludwig, 95 Cheng, Christine, 265
Appel, Kenneth, 346 Bolyai, János, 369 Cheng, Yuanyou, 388
Arnold, Vladimir, 221 Bolyai, Wolfgang Farkas, 369 Chioniadis, Gregory, 557
Arrow, Kenneth J., 199 Bombieri, Enrico, 170 Chowla, Sarvadaman, 552
Artin, Emil, 170 Borcherds, Richard, 441 Chudnovsky, Maria, 526
Aschbacher, Michael, 514 Bott, Raoul, 300 Church, Alonzo, 122
Asimov, Isaac, 14, 145, 533 Bourbaki, Nicholas, 234 Churchill, Winston, 158
Atiyah, Michael, 299, 490 Bourgain, Jean, 97, 326 Cipolla, Michele, 410
Augustine of Hippo, 473 Boyer, Carl Benjamin, 552 Cipra, Barry, 247
Axelsson, Åke, 489 Brassau, Pierre, 489 Civario, Gilles, 402
Brauer, Richard, 515 Clausen, Thomas, 459
Babbage, Charles, 212 Broad, Steven, 23 Clay, Landon T., 490
Bach, Johann Sebastian, 88 Bronstein, Manuel, 303 Cocks, Clifford, 351
Bacon, Kevin, 1, 306 Brooks, Robert W., 357 Cohen, Paul, 269, 307, 484
Baez, John, 559 Brouwer, Luitzen Egbertus Cole, Frank Nelson, 464
Baire, René-Louis, 481 Jan, 543 Collatz, Lothar, 101
Baker, Alan, 288 Brown, Gordon, 123 Condorcet, Nicolas de, 199
Baker, Roger C., 388 Brun, Viggo, 33, 58 Connes, Alain, 163
Banach, Stefan, 62 Buffett, Warren, 470 Conrad, Brian, 233
Banks, William D., 552 Bunyakovsky, Viktor Conway, John Horton, 402,
Banzhaf III, John F., 201 Yakovlevich, 521 441, 446
Barlow, William, 476 Burkhardt-Guim, Paula, xii Cooley, James William, 281
Bateman, Paul T., 521 Burt, David, 18 Cooper, Curtis, 306
Corsi, Craig, 2, 258 Figalli, Alessio, 562 Grossman, Jerrold, 72, 306
Cramér, Harald, 410, 552 Fippinger, Miles C., xii Grothendieck, Alexander,
Firk, Frank W. K., 13 233, 376
Dantzig, George, 137, 182 Focardi, Sergio M., 252 Gueganic, Alexandre, xii
Davids, Bob, 317 Fraenkel, Aviezri, 306 Guthrie, Francis, 345
Davis, Martin, 312, 385, 386 Francis, John G. F., 247 Gyárfás, András, 526
de Branges, Louis, 394 Franklin, Benjamin, 14
de Grey, Aubrey, 527 Franklin, Philip, 347 Hadamard, Jacques, 21, 187,
de Moivre, Abraham, 495 Freedman, Michael, 505 190, 249
Debrunner, Hans, 369 Freeman, Jesse, 118 Hadwiger, Hugo, 527
Dehn, Max, 369, 396 Frege, Gottlob, 85 Häggström, Olle, 383
Delaunay, Charles-Eugéne, Frenkel, Igor, 441 Haken, Wolfgang, 346
221 Frey, Gerhard, 234 Hales, Thomas C., 476
Diaconis, Persi, 381 Fried, David, 421 Hall, Monty, 427
Dickson, Leonard Eugene, Fry, John, 451 Halparin, Monte, 427
521 Fry, Roger, 39 Hamel, Georg, 278
Diop, Amina, xii Furstenberg, Hillel, 3, 230, Hamilton, Richard S., 505
Diophantus of Alexandria, 340 Hamming, Richard W., 252
457, 458 Hammond, Christopher N.
Dirichlet, Peter Gustav Leje-
Gale, David, 263 B., 306
une, 3, 172, 458
Galilei, Galileo, 27 Hanke, Jonathan P., 447
Duren, Peter, 308
Gallian, Joseph, 75 Hardy, Godfrey Harold, 14,
Dyson, Freeman, 81, 228,
Galois, Évariste, 145, 514 39, 57, 59, 141, 153, 187,
408
Gardner, Martin, 7, 342, 383, 195, 381, 521, 529, 551,
442, 546 555
Eder, Maciej, 487
Garfield, James A., 315 Harman, Glyn, 388
Egorov, Dmitri, 556
Gauss, Carl Friedrich, 11, 97, Harriot, Thomas, 476
Einstein, Albert, 11
143, 281, 287, 381, 395, Haselgrove, C. Brian, 93
Ekhad, Shalosh B., 403
423, 445, 448 Hasse, Helmut, 48, 170
Elga, Adam, 428
Gelfand, Israel, 148, 197, 252 Hawkins, David, 408
Elkies, Noam, 39
Gelfond, Alexandr, 117 Hay, Mark, xii
Eppstein, David, 527
Gentleman, Robert, 487 Heath-Brown, Roger, 552
Erdős, Paul, 1, 90, 102, 129,
Germain, Sophie, 460 Heaviside, Oliver, 11
187, 189, 305–307, 340
Gerwien, Paul, 369 Heawood, Percy J., 346
Escher, Maurits Cornelis, 88,
Geyer, Lukas, 362 Heegner, Kurt, 288, 454
145
Eskew, Monroe, 559 Gibbon, Edward, 145 Heeringa, Brent, 122
Eubulides of Miletus, 85 Gibbs, Josiah Willard, 11, 95 Helfgott, Harald Andrés, 127
Euclid of Alexandria, 4, 272, Ginsparg, Paul, 433 Hellegouarch, Yves, 234
275, 369, 474 Gladwell, Malcolm, 138 Hensel, Kurt, 17
Euler, Leonhard, 2, 39, 91, Glassman, Zachary, xii Hermite, Charles, 329, 342
117, 118, 171, 295, 378, Gödel, Kurt, 86, 122, 141, Hilbert, David, 40, 71, 86,
422, 423, 458, 463, 474, 269, 484 117, 269, 311, 369, 385,
513, 533 Goffman, Casper, 275, 305 457, 490, 564
Evans, Jonny, 399 Goldbach, Christian, 127 Hirzebruch, Friedrich, 300
Goldfeld, Dorian, 187 Hoover, Colleen, 23
Fabozzi, Frank J., 252 Goldston, Daniel, 528 Horn, Roger A., 521, 523
Faltings, Gerd, 378 Golomb, Solomon, 527 Horner, William George, 285
Fedi, Zolt, 306 Gomory, Ralph E., 509 Householder, Alston Scott,
Ferguson, Samuel P., 476 Gorenstein, Daniel, 514 247
Fermat, Clément-Samuel, Gosset, William Sealy, 537 Huang, Ming-Deh A., 501
458 Gowers, Timothy, 90, 490 Hughes, Colin, 493
Fermat, Pierre de, 145, 208, Graham, Ronald, 90, 431,
234, 311, 421, 448, 457, 442 Ihaka, Ross, 487
512 Granville, Andrew, 549 Ingham, Albert, 172, 388
Feynman, Richard, 76 Green, Ben, 4, 58, 511 Irons, Jeremy, 144
Jacobi, Carl Gustav Jacob, Lagarias, Jeffrey, xii, 103, Manasse, Mark, 355
445 108, 370, 477 Marchal, Christian, 476
James, Bill, 319 Lagrange, Joseph-Louis, 39, Markov, Andrey, 220, 381,
Jensen, Alexandra, 346 445, 475 487
Jensen, Johan Ludwig, 460 Lambert, Joel, 306 Mason, Richard C., 375
Johnson, Charles R., 523 Lamé, Gabriel, 458, 499 Masser, David, 376
Johnson, Dano, 531 Landau, Edmund, 172, 521 Matelski, J. Peter, 357
Jones, James P., 386 Lander, Leon J., 39 Mathey, Steven, 491
Jones, Michael, 200 Langlands, Robert, 293 Matiyasevich, Yuri, 312, 385,
Jones, Peter, 277 Laplace, Pierre-Simon, 381 386
Jones, Toby, 144 Le Gall, François, 283 Maynard, James, 33, 528
Jones, Vaughan F. R., 163, Le Verrier, Urbain, 221 Mazur, Barry, 47
396 Lebesgue, Henri, 458 McGarvey, Joey, 158
Lebesgue, Victor-Amédée, McGuire, Gary, 402
Kahoro, Elvis, 416 458 McKay, John, 440
Kanigel, Robert, 143 Leclerc, Georges-Louis, 176 Mercer, Idris D., 230
Kant, Immanuel, 273 Lee, David, xii Mersenne, Marin, 463
Karp, Richard, 402 Lee, Harper, 487 Mertens, Franz, 110, 189
Kasiski, Friedrich, 212 Lefschetz, Solomon, 300 Merton, Robert C., 469
Katz, Nick, 82 Legendre, Adrien-Marie, Merton, Robert K., 552
Kayal, Neeraj, 501 445, 458 Metropolis, Nicholas, 219
Kehle, Paul, 265 Lehmer, Derrick Henry, 465 Meurman, Arne, 441
Kempe, Alfred, 346 Lehr, Jessica, xi Milićević, Djordje, 77
Kepler, Johannes, 476 Leibniz, Gottfried Wilhelm, Miller, Gary Lee, 501
Kestemont, Mike, 487 147, 561 Miller, Stephen D., 365
Khavinson, Dmitry, 362 Lemke Oliver, Robert, 415 Mills, William H., 388
Khinchin, Aleksandr, 97, 113 Lenstra, Arjen K., 355 Milnor, John, 76
Khovanov, Mikhail, 400 Leontief, Wassily, 472 Mirzakhani, Maryam, 562
Kjos-Hanssen, Bjørn, xii Lepowsky, James, 441 Mishkin, Pamela, 346, 465
Klamkin, Murray S., 103 Levi-Civita, Tullio, 11 Mizgerd, Clayton, xii
Kleene, Stephen, 122 Levinson, Norman, 153 Möbius, August Ferdinand,
Klein, Felix, 243, 440 Lewis, Michael, 317 242
Klyachko, Alexander, 68 Lie, Sophus, 514 Mochizuki, Shinichi, 376
Knuth, Donald, 363, 443 Lindeberg, Jarl, 54 Molchanov, Stanislav, 253
Knutson, Allen, 68 Lindemann, Ferdinand von, Monaco, Jane J., 435
Kobayashi, Forest, xii 329 Montague, David, 207
Kodaira, Kunihiko, 234 Linnik, Yuri Vladimirovich, Montgomery, Hugh, 81, 408
Koebe, Paul, 394 553 Mordell, Louis, 46
Kolmogorov, Andrey, 221, Liouville, Joseph, 118, 228, Moreno, Samuel G., 35
381 329 Morgan, Frank, 306, 460, 507
Kominers, Scott Duke, xii, Listing, Johann Benedict, Morgenstern, Oskar, 163
447 242 Morin, Bernard, 241
Kontorovich, Alex, 306, 323, Littlewood, John Edensor, Morse, Marston, 21
326, 402 39, 57, 59, 108, 143, 383, Moser, Jürgen, 221
Korselt, Alwin, 552 521, 529, 555 Moser, Leo, 91, 527
Kraitchek, Maurice, 383 Logsdon, Ben, xii Moser, William, 527
Krantz, Steven G., 141 Lorenz, Edward, 257 Mumford, David, 76
Krohn, Maxwell, 489 Lovász, László, 525 Muñoz-López, José, xii
Kublanovskaya, Vera, 247 Lowell, Percival, 75 Munroe, M. E., 275
Kummer, Ernst, 459 Lowry, John, 369 Müntz, Herman, 197
Kurschak, Josef, 17 Luca, Florian, 306, 417 Murray, Francis, 398
Kuzmin, Rodion, 117 Lucas, Édouard, 423, 464 Murty, M. Ram, 306
Lyons, Richard, 515 Myerson, Gerry, 505
Labbé, Cyril, 490
Lacan, Jacques, 436, 466 Mackall, Blake, 441 Na, Giebien, xii
Nash Jr., John Forbes, 163, Quillen, Daniel, 76 Scott, Alex, 526
543 Segal, Irving, 559
Navier, Claude-Louis, 491 Rabin, Michael Oser, 501 Selberg, Atle, 153, 187
Nelson, Edward, 527, 559 Rainich, Georg Yuri, 534 Seldon, Hari, 145
Neuenschwander, Dwight E., Raleigh, Walter, 476 Selhorst-Jones, Vincent, 306,
251 Ramanujan, Srinivasa, 40, 307
Newton, Isaac, 67, 435, 561 59, 143, 172, 342, 365, Selvin, Steve, 427
Neyman, Jerzy, 137 381, 446, 448, 551 Serre, Jean-Pierre, 233, 299
Nicely, Thomas, 33 Ramsey, Frank Plumpton, 89 Severini, Carlo, 556
Nimitz, Chester W., 160 Reid, Constance, 385 Seymour, Paul, 526
Nishikado, Tomohiro, 359 Reidemeister, Kurt, 396 Shakespeare, William, 487
Nobel, Alfred, 469 Reiter, Harold, xi, 253 Shamir, Adi, 351
Norton, Simon P., 441 Rényi, Alfréd, 306 Shao, Lily, xii
Norwich, John Julius, 201 Reznick, Bruce, 76 Shapiro, Arnold, 241
Ribet, Ken, 234 Shapiro, Daniel, 235
O’Brien, Miles, 392 Ricci-Curbastro, Gregorio, Shapiro, Harold S., 306
O’Neill, Cathy, 267 11 Shapley, Lloyd, 263
Odlyzko, Andrew, 81, 407, Riemann, Bernhard, 11, 151, Shavgulidze, E. T., 63
408 188 Sheil-Small, Terence, 362
Oesterlé, Joseph, 376 Ringel, Gerhard, 347 Sheldon, Kathy, xii
Olbers, Wilhelm, 284 Risch, Robert Henry, 302 Sherman, David, 306
Ono, Ken, 60, 448 Rivest, Ronald, 351 Shiing-Shen, Chern, 293
Orwell, George, 391 Robertson, Neil, 526 Shor, Peter, 352
Ostrowski, Alexander, 17 Robinson, Julia, 312, 385, Siegel, Carl Ludwig, 228, 460
386 Siegel, Zachary, xii
Palka, Bruce, 534 Robinson, Raphael, 61, 71 Sierpiński, Waclaw, 271
Pan, Chengdong, 553 Rochefort, Joseph J., 160 Silva, Cesar E., 97
Parkin, Thomas R., 39 Rosenthal, Jeffrey, 219 Silverman, Joseph H., 376,
Patel, Dev, 144 Ross, Arnold, 235, 421 377
Penrose, Lionel, 201 Ross, W. Bruce, 91 Šimerka, Václav, 552
Perelman, Grigori, 433, 505 Ross, William T., 306 Simpson, Homer, 241
Perichon, Benoît, 511 Roth, Alvin, 263 Smith, Stephen D., 514
Perpetua, Byron, 9 Roth, Klaus Friedrich, 228, Skewes, Stanley, 108
Picard, Charles Émile, 166 340 Smale, Stephen, 241, 505
Picard, Jean-Luc, 145, 166, Rowling, J. K., 487 Smith, Stephen D., 514
392 Royden, Halsey, 307 Smith, Winston, 391
Pinch, Richard G. E., 549 Rubik, Ernő, 333 Snow, Joanne, 23
Pinter, Mike, xi Rudin, Walter, 275 Snyder, Noah, 376
Pintz, János, 388, 528 Russell, Bertrand, 85, 86 Sokal, Alan, 436, 466
Poincaré, Henri, 21, 221, 257 Rybicki, Jan, 487 Sós, Vera, 306
Pólya, George, 91, 407 Soundararajan, Kannan,
Pomerance, Carl, xii, 503, Sally, Paul, 371 415, 448, 528
504, 549, 551 Sarason, Donald, 197 Spencer, Joel, 91, 93
Post, Emil Leon, 122 Sarnak, Peter, 82 Sperner, Emanuel, 494
Pratt, Kyle, 288 Sato, Daihachiro, 386 Spirkl, Sophie, 526
Punnett, Reginald Crundall, Savant, Marilyn vos, 427 Stark, Eberhard L., 35
142 Saxena, Nitin, 501 Stark, Harold, 288
Pushkin, Alexander, 220, Schilly, Harald, xii Stein, William A., xii, 519
487 Schinzel, Andrzej, 305, 521 Stepanov, Sergei Aleksan-
Putinar, Mihai, 306 Schneeberger, William, 446 drovich, 170
Putnam, Elizabeth Lowell, Schneider, Theodor, 117 Stevens, Glenn H., 421
75 Scholes, Myron S., 469 Stigler, Stephen, 552
Putnam, Hilary, 312, 385, Scholze, Peter, 562 Stoiciu, Mihai, 482
386 Schrödinger, Erwin, 67, 383 Stokes, George Gabriel, 491
Putnam, William Lowell, 75 Schultz, William Henry, 208 Stone, Daniel F., 164
572 INDEX
Bateman–Horn conjecture, Cantor set, 22, 270, 271, 482 Collatz sequence, 101
34, 57, 129, 464, 512, 521, Cantor surjection theorem, companion matrix, 296
528, 533, 546, 550, 563 24 complete graph, 442
Bateman–Horn constant, Cantor’s powerset theorem, completeness theorem, 86
522, 534 31 conditionally convergent,
Battle of Midway, 160 cardinality, 27, 28, 269 110
bell curve, 52 Carmichael number, 501, Condorcet cycle, 200
Benford’s law, 54, 102, 131, 504, 549 Condorcet winner, 200
223 Catalan number, 253, 254, Condorcet winner criterion,
Bernoulli numbers, 459 540 200
Bernoulli random variable, category theory, 481 congruence obstruction, 57
55, 179 Cauchy functional equation, congruent number, 452, 453
Bernstein polynomial, 196 278 congruent number problem,
Bertrand’s postulate, 129 Cauchy product, 109 452
Beurling’s theorem, 195 Cauchy random variable, 53 conjecture; 3x + 1, 101;
Bieberbach conjecture, 393 Cauchy–Riemann equations, abc-, 376, 378, 521,
bijection, 28 8 563; Bateman–Horn, 34,
billiards, 258 central limit theorem, 51, 52, 57, 129, 464, 512, 521,
Binet’s formula, 313, 495, 54, 55, 79, 176, 179, 303, 528, 533, 546, 550, 563;
519 411, 537, 539 Bieberbach, 393; Birch
binomial random variable, 55 chain of subsets, 31 and Swinnerton-Dyer,
Birch and Swinnerton-Dyer chaos, 221 47, 454, 455; Conway–
conjecture, 47, 454, 455, character, 148 Norton, 441; epsilon, 234;
491 characteristic function, 54, Erdős–Turán, 340, 343;
Birkhoff ergodic theorem, 96 95 Erdős, 2, 343; Euler’s on
birthday attack, 139 characteristic polynomial, sums of powers, 39; Fer-
birthday paradox, 139 77, 247, 296, 496 mat’s, 422; Gauss’s class
birthday problem, 138 Chebyshev’s bias, 415 number, 288; Goldbach,
Black–Scholes model, 470 checksum, 124 57; Goldbach binary,
blancmange function, 360 chess, 493 127; Goldbach ternary,
Blaschke condition, 195, 197 Chinese remainder theorem, 127; Hardy–Littlewood
Bletchley Park, 157, 210 238, 535 k-tuple, 57, 58, 128;
Boneyard Book, 308 chromatic number, 525 Hardy–Littlewood (twin
Borsuk–Ulam theorem, 165 ciphertext, 124 primes), 34; Heawood,
Boston Red Sox, 317 circle method, 40, 57, 127, 346; Hilbert–Pólya, 407;
braid group, 399 441 Kepler, 346, 476; Lan-
Brouncker’s formula, 115 class field theory, 294 dau’s, 528; Mordell’s, 378;
Brouwer’s fixed-point theo- class number, 287, 446, 459 Poincaré, 433, 505, 507;
rem, 164, 494, 543 class number one problem, Polignac’s, 33; Pólya’s, 91;
Brownian motion, 470 287 Ramanujan, 294; Sato–
Brun’s constant, 33, 37 classification of finite simple Tate, 294; Taniyama–
Brunn–Minkowski theorem, groups, 300, 439, 513 Shimura, 48; Thwaites,
23 classification of surfaces, 506 101; twin prime, 33, 57,
Buffon’s needle problem, 176 Clausen–von Staudt theo- 408, 433, 522; Ulam’s,
Burali–Forti paradox, 88 rem, 459–461 101; Zaremba’s, 323, 326
busy beaver, 122 Clay Millennium Problems, connected sum, 508
busy beaver function, 122 xi, 47, 108, 153, 487, 505 consistent, 86, 270, 484
butterfly effect, 221, 257 clique number, 525 constant; Apéry’s, 365;
closed graph theorem, 481 Bateman–Horn, 522,
C*-algebra, 302 closed set, 230 Conway, 402; Euler’s,
Caesar cipher, 124, 212 CoCalc, 521 Conway, 402; Euler’s,
Calkin–Wilf sequence, 29 Cole Prize, 464 see also e; Euler–
canonical linear program- collaboration graph, 305 Mascheroni, 134, 152, 172;
ming problem, 183 Collatz function, 101 Gelfond–Schneider, 117;
Cantor dust, 24 Collatz graph, 101 Khinchin’s, 113–115, 134;
Liouville’s, 119, 227, 329; Diophantine equation, 311, 378, 461; functional (zeta
Meissel–Mertens, 189; 385, 386 function), 152; Orwell’s,
Mills’s, 388; Planck’s, 67; Diophantine set, 386 391; Schrödinger, 67
Ramanujan’s, 342, 534; Dirac delta functional, 80 equidecomposable, 369
twin primes, 34 Dirichlet divisor problem, equidistributed modulo 1,
constraint matrix, 181 172 96, 131, 133
constructible, 424 Dirichlet’s approximation equinumerous, 27, 270, 545
constructible polygon, 423 theorem, 223, 227 equivalence relation, 62, 64
consumption matrix, 472 Dirichlet’s box principle, 223 Eratosthenes of Cyrene, 408
continued fraction, 73, 97, Dirichlet’s theorem on Erdős–Turán conjecture,
113, 323, 324, 326 primes in arithmetic pro- 340, 343
continuum hypothesis, 86, gressions, 3, 58, 291, 354, Erdős conjecture, 2, 343
269, 272, 277, 307 415, 522, 528, 552, 553 Erdős number, 1
contraction mapping princi- discrete dynamical system, Erdős–Bacon number, 306
ple, 165 95 ergodic hypothesis, 95
Conway’s constant, 402 discrete Fourier transform, ergodicity, 96
Conway–Norton conjecture, 281 error function, 470
441 discriminant, 287; of an ellip- Euclid’s theorem, 3, 87, 111,
cookie problem, 313 tic curve, 45 189, 230, 295, 423, 513
Coq, 346 division algorithm, 461 Euclid–Mullin sequence, 513
cosmological theorem, 402 divisor function, 171 Euclidean algorithm, 207,
countable, 28 Doctor Who, 67, 85, 145, 499
Cramér model, 410, 463 267, 393, 487 Euclidean geometry, 272
critical line, 153, 388 dyadic filtration, 25 Euclidean norm, 193
critical strip, 152, 407, 409 Dyck path, 254 Euler characteristic, 347, 509
cryptography, 47, 124, 224, Dyck’s theorem, 509 Euler product formula, 110,
351 dynamical system, 221 140, 151, 188–190, 293,
cubic close packing, 476 294, 407, 409
cycle, 386 Earth, 68, 439 Euler totient function, 416
cyclic group, 514 Egorov’s theorem, 555 Euler’s constant, see also e
cyclotomic field, 459 eigenvalue trace lemma, 80 Euler’s formula, 8, 35, 40,
election procedure, 199 118
cyclotomic polynomial, 236
Electronic Frontier Founda- Euler’s power tower, 72
tion, 464 Euler–Lucas theorem, 422
Data Encryption Standard, elementary function, 302 Euler–Mascheroni constant,
124 ellipse, 146 134, 152
de Morgan’s law, 231 elliptic curve, 513; analytic existential proof, 117
degree, 31 rank, 47; and congruent expected value, 51
Dehn invariant, 369 numbers, 454; definition, extreme point, 194
Delbert Ray Fulkerson Prize, 45; discriminant, 45; Frey,
526 234; group operation, 46; Facebook, 1, 305
density, 339 Hasse–Weil L-function of, factor, 398
derangement, 158 48; largest known rank, fast Fourier transform, 281,
DES, 124 47; modular, 234; rank, 453
diagonal argument, 29 47; rational point, 46, 454; feasible, 182
diagonal matrix, 447 torsion subgroup, 47 Fenway Park, 57
diet problem, 182, 184 empirical spectral measure, Fermat equation, 311, 378,
differential equation, 8, 166, 80 461
221, 257, 299 energy-momentum invariant, Fermat number, 421–423
digit expansion; base B, 113; 11, 12 Fermat prime, 421, 422, 424
binary, 113; decimal, 113 Enigma machine, 122, 157 Fermat’s conjecture, 422
dihedral group, 439 epsilon conjecture, 234 Fermat’s last theorem, 39,
dimension theorem, 196 equation; Black–Scholes, 145, 169, 208, 234, 294,
Diophantine approximation, 470; Diophantine, 311, 311, 375, 376, 378, 452,
62, 207, 222, 229, 391 385, 386; Fermat, 311, 457, 460, 476
Fermat’s little theorem, 351, 539; generating, 448, 497; Gelfond–Schneider constant,
354, 375, 499, 501 inner, 195; iterated expo- 117
Fermat’s polygonal number nential, 72; Koebe, 393; Gelfond–Schneider theorem,
theorem, 448 L-, 3, 48; logarithmic in- 117, 119
fetid dingo’s kidneys, 27, tegral, 107, 153, 409, 411, general comprehension prin-
466, 489 547; matrix exponential, ciple, 85
FFT, 453 69; meromorphic, 233; general theory of relativity,
Fibonacci number, 131, 235, modular, 440; multiplica- 11
312, 313, 373, 495, 497, tive, 171; periodic, 281; generating function, 59, 448,
519 prime-counting, 107, 108, 497
Fields Medal, 76, 90, 145, 129, 151, 516; ramp, 555;
geodesic, 12, 21
170, 228, 269, 287, 288, rational, 233, 377; Rie-
geometric mean, 114
294, 300, 378, 397, 433, mann zeta, 3, 48, 79, 81,
geometric progression, 341
441, 446, 491, 505, 507, 110, 139, 151, 169–171,
187, 189, 190, 363, 364, geometric series, 19, 36, 59,
562
388, 407, 409, 410, 459; 78, 151
finite, 28
finite extension, 169 sawtooth, 360; square- Géométrie algébrique et
integrable, 97; square- géométrie analytique, 233
first category, 481
first incompleteness theorem, wave, 147; sum of divi- GIMPS, 464
86, 88 sors, 172; Takagi, 360; global function field, 169
fixed point, 164, 357, 543 transcendental, 233; von Global Median Matching,
Flint Hills series, 229 Mangoldt, 409 265
forcing, 269 function field, 169; global, Global Positioning System,
formula; Binet’s, 495, 519; 169; Riemann hypothesis, 14
Simson’s, 496; Stirling’s, 170; zeta function, 169 Goldbach conjecture, 57, 127
539; Wallis’s, 539 functional equation, 152; golden ratio, 223, 313, 495
forward orbit, 103 Cauchy, 278 Golden State Warriors, 317
four color theorem, 345, 346, fundamental group, 347
Golomb graph, 527
476 fundamental lemma, 294
Goodstein sequence, 86
four fours puzzle, 392 fundamental polygon, 506
Goodstein’s theorem, 87
four-square identity, 39, 41 fundamental theorem of alge-
GP-rich, 341
Fourier coefficients, 146 bra, 118, 358, 484
fundamental theorem of GPS, 14
Fourier matrix, 284
arithmetic, 29, 37 Graham’s number, 439, 442
Fourier series, 146, 221, 301
Gram–Schmidt process, 249
fractal, 18, 22, 260, 357
GAGA principle, 233 graph, 253, 386; collabora-
fractal dimension, 270, 271
Gale–Shapley algorithm, 264 tion, 305; complete, 442;
Franklin graph, 347, 349
Galileo’s paradox, 27 Franklin, 347, 349; friend-
frequency analysis, 125
Galois representation, 293 ship, 306; spanning tree in
frequency-wave number in-
Galois theory, 329 a, 484
variant, 11
game theory, 163, 543 gravitational lensing, 14, 362
Fresnel integral, 303
Frey curve, 234 gamma function, 152, 479, gravitational waves, 14
friendship graph, 306 537, 539 Great Internet Mersenne
Fubini–Tonelli theorem, 472 Gauss map, 98 Prime Search, 464
Fulkerson Prize, 526 Gauss measure, 98 greatest common divisor,
function; Ackermann’s, 443; Gauss sum, 284 207, 235
analytic, 393; blanc- Gauss’s class number conjec- greedy algorithm, 312
mange, 360; bump, 556; ture, 288 Green–Tao theorem, 2, 58,
busy beaver, 122; charac- Gauss’s lemma, 495 340, 511, 512, 521, 522
teristic, 95; Collatz, 101; Gauss–Kuzmin theorem, 97 group; alternating, 439; clas-
continuous and nowhere- Gauss–Wantzel theorem, 424 sification of finite simple
differentiable, 360; divi- Gaussian, 52, 54, 55 groups, 439; cyclic, 514;
sor, 171; elementary, 302; Gaussian integer, 511 dihedral, 439; fundamen-
Euler totient, 351, 416; Gaussian prime, 511 tal, 347; monster, 439;
gamma, 152, 479, 537, GCHQ, 351 of Lie type, 514; pariah,
514; quasithin, 514; Ru- Householder matrix, 248 Kakutani’s problem, 101
bik’s Cube, 333; simple, Householder reflections, 247 KAM theory, 221
439, 513; sporadic, 514 hyperbolic geometry, 272, Kasiski method, 212
Grundgesetze der Arith- 273 Kepler conjecture, 346, 476
metik, 85 hypercube, 531 Khinchin’s constant, 114,
115, 134
Hadamard conjecture, 250 impossibility theorem, 199 Kirby–Paris theorem, 87
Hadamard matrix, 249 incompleteness theorem; Klein bottle, 243, 346, 505,
Hadamard’s inequality, 249 first, 86; second, 86 508
Hadwiger–Nelson problem, indicator function, 95 knot; Stevedore, 398; trefoil,
527 inertial frames, 11 395, 505; unknot, 395
Hahn–Banach theorem, 197 infinite, 28 knot polynomial, 396
halting problem, 122 infinitesimal generator, 69 Kolmogorov–Arnold–Moser
Hamiltonian, 67, 69, 79, 221 infinity, 269 theorem, 221
Hamiltonian cycle, 386 injective, 27 Korselt’s criterion, 549
Hardy space, 195, 301 inner function, 195 Kronecker product, 296
Hardy–Littlewood k-tuple Institute for Advanced Kronecker’s approximation
conjecture, 57, 58, 128 Study, 408 theorem, 134
Hardy–Littlewood conjec- Intel, 33 Kronecker–Weyl theorem,
ture (twin primes), 34 interlace, 67 133
Hardy–Weinberg law, 142 intermediate value theorem, Kummer’s congruence, 460,
harmonic number, 108, 553 77, 78, 164, 461 461
harmonic series, 35, 40, 110, Internal Revenue Service, 54
301 International Congress of L’Hôpital’s rule, 555
Hasse–Minkowski local- Mathematicians, 117, 269, L-function, 3, 48, 293;
global principle, 20 311, 369, 515, 564 Hasse–Weil, 48; symmet-
Hasse–Weil L-function, 48 International Mathematics ric power, 294
Hausdorff maximality princi- Competition for Univer- Lagrange’s four-square theo-
ple, 484 sity Students, 76 rem, 445, 448
Hausdorff topology, 231 invariance of domain, 545 Landau’s conjecture, 528
Hawkins prime, 408 invariant, 371, 395 Langlands program, 293, 563
Heawood conjecture, 346 invariant set, 96 Laplace’s method, 539
Heegner number, 343 invariant subspace, 194 large cardinal, 397
heptadecagon, 423 invisible forest, 237 largest known prime, 464
hereditary base-b representa- irrational, 295 Laser Interferometer
tion, 86 irrational rotation, 95 Gravitational-Wave Ob-
heuristic reasoning, 103, 421, irrationality measure, 227, servatory, 14
422, 424, 463, 553 295 LaTeX, 363, 413
hexagonal close packing, 476 irrationality of √2, 205 lattice point, 237
hexagonal lattice packing, irrationality type, 133, 222 Laurelin the Golden, 254
475 irreducible representation, Laurent polynomial, 396
Hilbert space, 68, 79, 96, 398, 441 law of complementary prob-
407 IRS, 54 ability, 138
Hilbert’s problems, xi, 117, isologous, 211 law of large numbers, 96
153, 269, 288, 311, 369, iterated exponential func- Lebesgue measure, 24, 555
385, 451 tion, 72 Legendre symbol, 534
Hilbert–Pólya conjecture, iterated towers, 109 lemma; eigenvalue trace, 80;
407 Gauss’s, 495; Sperner’s,
Hodge conjecture, 491 Jacobi symbol, 503 494, 544–546; Zorn’s, 483
homeomorphism, 23, 347, Jacobi’s four-square theo- Leroy P. Steele Prize, 275
544, 545 rem, 445 liar’s paradox, 85, 86, 381
HOMFLY polynomial, 397 Jarník competition, 76 LIGO, 14
Honors Class, 117 Jones polynomial, 395–397, linear programming, 181, 403
Horn conjecture, 68 399 Liouville lambda function, 91
Horner’s method, 284, 285 Jones tower, 398 Liouville number, 118
Liouville’s constant, 119, 62, 69, 248, 249; permu- Müntz–Szász theorem, 197
227, 329 tation, 439; positive semi-
Liouville’s theorem, 118, 119, definite, 67, 250; real or-
329 thogonal, 69; real sym- naive measure theory, 95
Littlewood’s principles, 555 metric, 80, 82; selfadjoint, naive set theory, 85
Local Median Matching, 265 67 Nash equilibrium, 163, 164,
local-global principle, 20 mean, 51 543
logarithmic derivative, 188 mean value theorem for inte-
National Institute of Stan-
logarithmic integral, 107, grals, 35
dards and Technology, 124
153, 409, 411, 522, 547 measure zero, 22, 96, 97, 118
National Medal of Science,
look and say sequence, 402 Meissel–Mertens constant,
363
189
Lorentz transformation, 11 National Museum of Mathe-
Mercury, 14
Lucas–Lehmer primality matics, 561
Mersenne number, 463, 503
test, 464, 465 National Resident Match
Mersenne prime, 463, 473
Lusin’s theorem, 555 Program, 263
Mertens’s theorem, 37, 109,
Lyapunov central limit theo- National Science Founda-
110
rem, 411 tion, 451
Mertens’s theorem (prime re-
Lychrel number, 104 National Security Agency,
ciprocals), 189
method of stationary phase, 209
539 natural density, 339
Maass form, 294 natural number, 1
method of undetermined co-
MacArthur Fellow, 76 Navier–Stokes Equation, 491
efficients, 8
Magic Cube, 333 negative curvature, 21, 23
metric space, 22, 481
major arc, 41 Neptune, 221
Metropolis algorithm, 219
Major League Baseball, 311, middle square digits method, Newcomb’s paradox, 383
317, 319 179 Newton fractal, 260, 261
Mandelbrot set, 357 Millennium Prize Problems, Newton’s method, 259, 277
manifold, 241 454 Newton’s second law, 67
Maple, 519 Millennium Problems, 47 Nielsen–Schreier theorem,
MapQuest, 57 Miller–Rabin test, 501 484
Markov chain, 219, 220, 318, Mills’s constant, 388 NIST, 124
544 minor arc, 41 Nobel Prize, 76, 79, 163, 199,
Markov chain Monte Carlo Möbius strip, 241, 243, 506 251, 263, 435, 469, 472,
algorithms, 219 modular, 440 543, 562
Markov’s theorem, 399 MoMath, 561 non-Euclidean geometry, 273
Mars, 68, 75 moment, 51
nonorientable, 241, 243
Mason–Stothers theorem, Moneyball, 317
nontrivial zeros of the zeta
169, 375–377 monoid, 508
function, 153
Matching; Global Median, monotone sequence property,
norm; Euclidean, 193; on a
265; Local Median, 265 37
vector space, 193
Mathematica, 140, 413, 416, monovariant, 371
normal distribution, 51
429, 495, 519 monster group, 439, 442, 515
normal random variable, 470
mathematical induction, 8 monstrous moonshine, 440
Monte Carlo method, 175, normal subgroup, 439
MathOverflow, 399
318, 323 normal topology, 231
matrix; characteristic poly-
nomial, 296; companion, Monty Hall problem, 427, Norwegian Academy of Sci-
296; constraint, 181; con- 428 ence and Letters, 299
sumption, 472; diago- moonshine module, 441 nowhere dense, 22, 481
nal, 447; exponential, 69; Moore–Kline theorem, 25 NP-complete problem, 385,
Fourier, 283; Hadamard, Mordell’s conjecture, 378 402
249; Householder, 248; in- Morse theory, 21 NRMP, 263
tegral, 446; left and right Moser spindle, 527 NSA, 209
inverses, 193, 195; multi- Moser’s circle problem, 91 NSA Cryptomathematics In-
plication, 282; orthogonal, multiplicative function, 171 stitute, 210
number; algebraic, 30, 114, orthogonal matrix, 62, 248, Poincaré disk model, 273
117, 295, 329, 332; alge- 249 point-set topology, 230
braic integer, 343; alge- Ostrowski Prize, 394 Poisson random variable, 539
braic irrational, 133; Ba- Ostrowski’s theorem, 17 pole, 188
con, 306; Bernoulli, 459; Polignac’s conjecture, 33
Carmichael, 501, 504, 549; P versus NP problem, 183, Polish Cipher Bureau, 157
Catalan, 253, 254, 540; 490, 491 Pollard’s p−1 algorithm, 353
class, 446; congruent, 452, p-adic absolute value, 17 Pólya’s conjecture, 91
453; Erdős, 305; Erdős– p-adic number, 17, 18 polygonal number, 448
Bacon, 306; Fermat, 189, PA, 87 polyhedra, 369
421–423; Fibonacci, 131, packing density, 475 Polymath8 project, 33
235, 312, 313, 373, 495, PageRank algorithm, 465 polynomial; Alexander, 396;
497, 519; Gaussian inte- pair correlation problem, 81 Bernstein, 196; character-
ger, 511; Graham’s, 439, pair of pants, 21 istic, 77, 247, 296, 496;
442; harmonic, 553; Heeg- palindrome, 104 cyclotomic, 236; Euler’s,
ner, 343; irrational, 95, pan galactic gargle blaster, 91; Fermat’s last theorem,
114, 117, 205, 222, 227, 31 376; fixed points, 358;
258, 295; irrational of type paradox; Banach–Tarski, 61, generating fractal, 261;
(K, ν), 222; Liouville, 118; 241, 369, 381; birth- harmonic, 361; HOMFLY,
Lychrel, 104; Mersenne, day, 139; Burali–Forti, 88; 397; indecomposable, 300;
189, 463, 503; natural, Galileo’s, 27; liar’s, 85, Jones, 395–397, 399; knot,
1; ordinal, 87, 88; p- 86, 381; Newcomb’s, 383; 396; Laurent, 396; prime-
adic, 17, 18; perfect, 473; nonexistence of length, 63; generating, 91, 388; roots,
polygonal, 448; prime, 1, Russell’s, 85, 86, 381; 358
57, 289, 382, 409, 414; Smale’s, 241 polynomial-time algorithm,
quadratic irrational, 114; paradoxical decomposition, 501
Ramsey, 90; rational, 17, 62 polytope, 185
117, 222, 258; RSA chal- parallel postulate, 273, 369 poset, 483
lenge, 353; Skewes, 107, Pareto condition, 199 positive semidefinite, 250
108, 443, 444; square-free,
pariah group, 514 positive semidefinite matrix,
339, 342, 445, 549; taxi-
partial order, 483 250
cab, 551; transcendental,
partition function, 40, 59 possibility theorem, 199
31, 98, 114, 118, 169, 227–
Peano arithmetic, 87 power index, 201
229, 329; triangular, 447;
Peano curves, 25 power tower, 72, 109
van der Waerden, 90
Penrose–Banzhaf power in- powerset, 25, 31
number field, 18, 169
dex, 201 preference matrix, 263
number transcendental, 332 pentagon diagram, 295 primality test; Lucas–
numbers; algebraic, 169 Pentium processor, 33 Lehmer, 464
perfect, 525 primality testing, 47
perfect graph theorem, 525 prime; Fermat, 421, 424;
off-by-one error, 363
perfect number, 473 Gaussian, 511; Hawkins,
one-to-one, 27
periodic function, 281 408; largest known, 464;
onto, 27 permutation, 157, 158 Mersenne, 463, 473; regu-
open mapping theorem, 481 permutation matrix, 439 lar, 459; Sophie Germain,
open sector, 472 perturbation theory, 221 460
open set, 230 Picard iteration, 166 prime number, 1, 4, 57, 382,
Operation Fortitude, 160 pigeonhole principle, 223, 407, 409, 414; Cramér
operator; selfadjoint, 69; 338, 346 model, 410; Fermat, 422;
skew-symmetric, 69; uni- of the form x² + dy², 289;
tary, 69 Planck’s constant, 67 twin, 416
operator theory, 193 Platonic solid, 347, 348 prime number theorem, 34,
orbit, 101 Playfair’s axiom, 273 58, 107, 129, 153, 172,
order; partial, 483; total, 31, Pluto, 75 181, 187, 189, 303, 339,
483; well, 483 Poincaré conjecture, 433, 366, 410, 424, 465, 515,
ordinal number, 87, 88 491, 505, 507 522, 528, 554
of domain, 545; Kirby– Toeplitz operator, 301 van der Waerden’s theorem,
Paris, 87; Kolmogorov– topological space, 230, 293, 341
Arnold–Moser, 221; Kro- 481 variance, 51
necker’s approximation, topology, 230, 242; base for Venice, 201
134; Kronecker–Weyl, a, 230; definition, 230; Venus, 68
133; Liouville’s, 118, Hausdorff, 231; noncom- Vigenère cipher, 212, 224
119, 329; Lusin’s, 555; mutative, 302; normal, Vitali set, 65, 277
Markov’s, 399; Mason– 231; regular, 231 Volterra integration opera-
Stothers, 169, 375–377; torsion subgroup, 47 tor, 195, 197
mean value theorem for torus, 243, 346, 505, 508 von Mangoldt function, 409
integrals, 35; Mertens’s, total order, 31, 483 von Neumann algebra, 163,
37, 109, 110; Mertens’s totient function, 416 396, 397
(prime reciprocals), 189; transcendence degree, 169
Moore–Kline, 25; Müntz– transcendental, 169, 295, 329 Wallace–Bolyai–Gerwien
Szász, 197; Nielsen– transcendental number, 31, theorem, 369
Schreier, 484; open map- 98, 227, 229, 329, 332 Wallis’s formula, 539
ping, 481; Ostrowski’s, 17; transition matrix, 215 Waring’s problem, 39
perfect graph, 525; prime transitive, 28 wave function, 67
number, 34, 58, 107, 129, traveling salesman problem, weak-field approximation, 13
153, 172, 181, 187, 189, 183, 385, 386 Weierstrass M -test, 360
303, 339, 366, 410, 424, tree, 253 Weierstrass approximation
465, 515, 522, 528, 554; trefoil knot, 395, 505 theorem, 193, 196
Pythagorean, 311, 314, triangular number, 447 well-ordered, 278
319, 320; rank-nullity, trivial zeros of the zeta func- well-ordering principle, 206,
299; Riemann–Roch, 170; tion, 152 483
Riesz representation, 197; Tunnell’s theorem, 454 Wetzel’s problem, 307
Roth’s, 133, 227; Roth’s TUNNY, 210 Weyl’s uniform distribution
(arithmetic progressions), Turing machine, 121 property, 96
228, 340; second incom- twin prime conjecture, 33, Whitney–Graustein theo-
pleteness, 484; Severini– 57, 408, 433, 522, 528 rem, 242
Egorov, 556; Stone’s, 68; twin primes constant, 34, 460 Wiener algebra, 148
strong perfect graph, 526; Twitter, 305 Wiener process, 470
Szemerédi’s, 90, 228, 340, two envelopes problem, 382 Wiener’s 1/f theorem, 148
341, 511; Thue’s on num-
Wigner’s semicircle law, 80,
bers with fixed prime fac- Ulam spiral, 522 253
tors, 442; Thue–Siegel–
Ulam’s conjecture, 101 Wilf–Zeilberger algorithm,
Roth, 228; Toeplitz in-
Ultra, 157 416
dex, 301; Tunnell’s, 454;
ultrametric, 18 William Lowell Putnam
van der Waerden’s, 341;
undecidable, 109, 307 Mathematical Competi-
Wallace–Bolyai–Gerwien,
uniform boundedness princi- tion, 75
369; Weierstrass ap-
ple, 481 winding number, 302
proximation, 193, 196;
uniformly strict contraction, winning coalition, 201
Whitney–Graustein, 242;
165 Wolfram Alpha, 302, 413,
Wiener’s 1/f , 148; Zeck-
unique factorization domain, 414
endorf’s, 312, 313, 373
288 Wolfram Mathematica, see
Thompson group, 63 universal, 446 also Mathematica
Thue’s theorem on numbers universal machine, 121
with fixed prime factors, universal quadratic form, 446 Zaremba’s conjecture, 323,
442 up-arrow, 443 326
Thue–Siegel–Roth theorem, upper density, 340 Zeckendorf decomposition,
228 upper multiplicative density, 312, 373
Thurston’s corrugations, 241 341 Zeckendorf’s theorem, 312,
Thwaites conjecture, 101 Uranus, 221 313, 373
time average, 95 Zermelo–Fraenkel axioms,
Toeplitz index theorem, 301 van der Waerden number, 90 85, 141, 269
Zermelo–Fraenkel set theory, ZF, see also Zermelo– zombie infestation, 371
85, 86, 108, 278, 484 Fraenkel set theory Zorn’s lemma, 483
zeta function, see also Rie- ZFC, see also Zermelo–
mann zeta function Fraenkel set theory
This book is an outgrowth of a collection of 100 problems chosen to celebrate the
100th anniversary of the undergraduate math honor society Pi Mu Epsilon. Each
chapter describes a problem or event, the progress made, and connections to
entries from other years or other parts of mathematics. In places, some knowledge of
analysis or algebra, number theory or probability will be helpful. Put together, these
problems will be appealing and accessible to energetic and enthusiastic math majors
and aficionados of all stripes.
MBK/121