A Decade of the
Berkeley Math Circle
The American Experience,
Volume II
Zvezdelina Stankova
Tom Rike
Editors
Advisory Board for the MSRI/Mathematical Circles Library
Titu Andreescu Walter Mientka
David Auckly Bjorn Poonen
Hélène Barcelo Alexander Shen
Alissa S. Crans Tatiana Shubin (Chair)
Zuming Feng Zvezdelina Stankova
Tony Gardiner Ravi Vakil
Kiran Kedlaya Ivan Yashchenko
Nikolaj N. Konstantinov Paul Zeitz
Silvio Levy Joshua Zucker
Andy Liu
This volume is published with the generous support of the Simons Foundation and Tom
Leighton and Bonnie Berger Leighton.
Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy select pages for
use in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit: http://www.ams.org/rightslink.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.
© 2015 by Zvezdelina Stankova and Thomas Rike
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
Visit the MSRI home page at http://www.msri.org/
To Zvezda’s husband, Dmitri, and to Tom’s wife, Peggy, for making
this book possible, for their infinite patience and love over the course
of two years of hard work,
and . . .
Foreword
Introduction
1. Top-Tier Math Circles
2. Why, What, and for Whom?
3. Notation and Technicalities
4. The Art of Being a Mathematician and Problem Solving
5. Acknowledgments
This book is based on material from a dozen of the 800 sessions of the
Berkeley Math Circle (BMC), held over the past 16 years. BMC has been
described as a top-tier math circle, calling for the following two definitions.
1.1. Math circles are weekly math programs that attract elementary, mid-
dle, and high school students to mathematics by exposing them to intriguing
and intellectually stimulating topics, rarely encountered in classrooms. Math
circles vary in their organization, styles of sessions, and goals. But they all
have one thing in common: to inspire in students an understanding of and
a lifelong love for mathematics.
¹ Based on contributions from Marc Whitlow and Mike Breen (BMC Parents),
Zvezdelina Stankova (BMC Director), and Tatiana Shubin (SJMC Director).
1.2. Top-tier math circles prepare our best young minds for their future
roles as mathematics leaders. Sessions are taught by accomplished mathe-
maticians and explore advanced mathematical areas. They provide an ed-
ucational opportunity for top pre-college mathematics students, not offered
in any other setting in the U.S. education system. In addition to learning
advanced mathematics topics, students are taught the technical writing skills
needed to convey the solutions of complex problems.
As an example of a top-tier math circle, the Berkeley Math Circle
is fashioned after the leading models in Eastern Europe, where math circles
originated over a century ago. BMC itself started in the fall of 1998 with
about 50 students, primarily in grades 7-12, and there was only one session
per week that lasted 2 hours. Sixteen years later, the circle has expanded to
about 300 students in grades 1-12, split into two major groups:
• BMC-Upper with 3 levels: BMC-Beginners for 5th-6th grades (1.5
hours per week); BMC-Intermediate for 7th-8th grades (2 hours per
week); and BMC-Advanced for 9th-12th grades (2 hours per week).
BMC-Upper is directed by Zvezdelina Stankova.
• BMC-Elementary with 2 levels: BMC-Elementary I for 1st-2nd grades
(3 sections, 1 hour per week); and BMC-Elementary II for 3rd-4th
grades (3 sections, 1 hour per week). BMC-Elementary is directed
by Laura Givental.
This book series is based on sessions from BMC-Upper and from the orig-
inal BMC, when there was only one group for all. To save space, “BMC”
throughout this book will refer, for the most part, to materials, instructors,
and students from “BMC-Upper.”
Like top-tier universities, BMC
• challenges students with beautiful, difficult mathematical theories,
• introduces them to powerful problem-solving techniques,
• constantly provokes deep thought, and
• inspires the creation of original ideas.
Topics covered at BMC include combinatorics, graph theory, linear alge-
bra, geometric transformations, recursive sequences, series, set theory, group
theory, number theory, elliptic curves, algebraic geometry, applications to
computer science, natural sciences, economics, and many more. Each topic is
taught by an expert in the field who has the ability to challenge the students
and support them as they attempt to meet these challenges. All problems
require students to come up with mathematical proofs. Proofs put forward
by the students are not always the most eloquent. Only an accomplished
mathematician can understand where a student might be heading in his/her
proof and offer assistance through this challenge.²
² For examples of noteworthy past and present instructors who have brought their
world expertise to BMC, see the Epilogue.
1.3. The next generation of math leaders. The students of BMC come
from a variety of socio-economic and ethnic backgrounds. The proportion of
female to male students is approximately 2:3. This is an amazingly high ratio
considering the trend of other high-level math programs, which are “male-
dominated” or “male-only.” Excellent role models for the female students are
provided by the female directors of the top-tier math circles in Berkeley [11],
San Jose [71], Los Angeles [47], and (formerly of) Marin Math Circle [52];
but perhaps even more important to the students are the outstanding lectures
given by dozens of female professors and graduate students.
Currently, BMC does not actively recruit participants. Students and
their parents find out about the circles by word of mouth, from the Circle’s
web site, http://mathcircle.berkeley.edu/, through local universities,
and in publications. Due to an increased number of applicants, there is
a semi-formal selection process based on several open essay-type questions
along the lines of:
• Describe your mathematical background and experiences so far.
• Why do you want to join BMC? What do you expect from BMC?
• What is your favorite math problem that you can solve? State and
solve the problem. Why is it your favorite?
• What is your favorite math problem that you cannot solve? State the
problem and explain why you cannot solve it but why you would like
to solve it.
Needless to say, BMC students are usually years ahead of their peers:
they often complete most of high school mathematics by age 13 (8th grade),
some take many college math major courses by the time they graduate from
high school, and a few of the top circlers venture into graduate courses
and serious mathematical research even before entering a university. The
accomplishments of students who have benefited from BMC can be measured
in many ways. For example, a number of these students have gone on to win
International Math Olympiad medals and Putnam awards, and the majority
have been admitted to top-tier universities. BMC and the other top-tier
math circles not only produce highly accomplished students – they produce
and train the next generation of leaders in mathematics.³
Running BMC for 16 years has taught us a lot about math education in
the U.S. and has helped us to understand better our own childhood education
and origins of our passion for mathematics. To share this experience with
you, the reader, is the purpose of this book:
• to present you with beautiful theories, problem-solving techniques, and
mathematical insights;
• to provide you with an abundance of exercises and problems to work
on and with ready materials for math circle sessions.
2.2. Prerequisites. To read the book comfortably, you do not need to have
Calculus under your belt, except
• in the very last section of Session 12 on plane geometry, which discusses
a series solution to a geometric question, or
• if you want to prove the cited theorems in Session 9 on inequalities.
However, familiarity with basic geometry and algebra concepts and theo-
rems will definitely be helpful; e.g., lines, circles, triangles, rectangles, trape-
zoids, and quadrilaterals in general; similarity criteria for triangles and the
Pythagorean Theorem; equal alternate interior angles for parallel lines and
bisecting diagonals in a parallelogram; integers, divisibility and remainders;
operations on fractions and real numbers, intervals and sets of numbers;
and manipulations of algebraic expressions written with letters. In some
sessions, functions will play a major role; hence having studied some basic
(pre-calculus) examples will not hurt; e.g., linear and quadratic functions,
polynomials, exponential and trigonometric functions, as well as their graphs.
The above concepts will be re-introduced via examples in the book. But
if you feel that you need more solid background, we direct you to several
wonderful books that should be part of any budding mathematician’s library:
• Geometry, Book 1 by Kiselev [32],
• Functions and Graphs [27], The Method of Coordinates [28], Sequences,
Combinations, Limits [31], Algebra [30] and Trigonometry [29] by
Gelfand, et al.,
• for the older reader, 103 Trigonometry Problems from the Training of
the USA IMO Team by Andreescu and Feng [5].
2.3. The logical structure of the book series (volumes I and II) is outlined
in Figure 2 on page xviii. A solid arrow indicates that a session requires
its “predecessor” to be studied beforehand, while a dashed arrow indicates
that the “predecessor” will be helpful but is not absolutely necessary. For
example, in order to understand Rubik’s Cube II, one should first study
Rubik’s Cube I; on the other hand, Rubik’s Cube I-II will make Group
Theory I more concrete, but they are certainly not mandatory.
Sessions that are bubbled in an ellipse can be attempted without any
prerequisites, while sessions encompassed in a rectangle have at least one
necessary predecessor. For example, Monovariants II calls for a prior study
of Monovariants I, while Knot Theory can be attacked with little reference
to other sessions. Sessions not enclosed in anything are from volume I.
Finally, there is a group of sessions that pertain to general proof meth-
ods, PSTs, and theory that appear in most other places. These sessions
are from volume I and are roughly grouped in the two nebulous “clouds”:
Proofs I-II, PSTs, and Induction in one “cloud,” and Number Theory I and
Combinatorics I in another “cloud.” Figure 2 captures some, but certainly
not all, relations among the sessions and topics. The reader is welcome to
search for and draw more arrows, as he/she goes through the book.
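[Figure 2. The logical structure of the book series (volumes I and II). Topic
areas named in the diagram include Abstract Algebra, Complex Numbers,
Topology, Number Theory, and Geometry.]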
2.4. The middle or high school teacher who wishes to start a math
circle in his/her school or teach a specially designed problem-solving class
will find this book series invaluable. To start with, five sessions from volume I
are a must for any math circle, as they provide techniques and a foundation
for solid mathematical understanding; these are Combinatorics I, Number
Theory I, Proofs I-II, and Induction.
Five of the topics in volume II are introductory and independent of each
other; e.g., Geometric Re-Constructions I, Knot Theory, Group Theory I,
Multiplicative I, and Inequalities I. Towards the end, some of these contain
harder material suitable for intermediate level and the second-to-third year
of a math circle. Four other sessions obviously need to be introduced af-
ter studying their earlier counterparts; e.g., Geometric Re-Constructions II,
Rubik’s Cube II, Monovariants II, and Complex Numbers II. The remaining
three sessions are designed truly for the advanced reader: Multiplicative II,
Monovariants III, and the last section of Geometric Re-Constructions III.
Open questions or problems beyond the scope of the book are interspersed
throughout the book and should be left to the die-hards.
Running a math circle, especially for a teacher, is a hard task. But it is
possible. In the 1960s, Tom Rike (an editor for this book and a veteran high
school math teacher) was working on his master’s degree. While browsing in
the library one day, he ran across The USSR Olympiad Problem Book [74].
It contained problems written for talented 7th–10th graders; yet, he could
not solve any of these “elementary” problems. In his own words:
“My abstract algebra had been too abstract, and I did not have the
concrete examples that I needed. I never took a class in number theory
because it sounded too elementary. I had developed the real number
system starting from the Peano axioms, but I didn’t really understand
the fundamentals of the natural numbers, prime numbers. This was
an epiphany for me. I felt as though I had been challenged by some
force outside me and did not know how to respond.”
For the next 30 years Tom studied olympiad problem solving, first on his
own, then through workshops and math circles in the SF Bay Area. He ran
his own math circle at Oakland High School and gave talks at just about all
other circles around. Even though at times he was only “a few pages” ahead
of the students, he kept on learning and teaching problem solving because
working on math circles had come to be a large part of his life:
“Although I have not attained my goal of becoming a true olympiad
problem solver, the journey I have made in pursuit of this goal has
been one of the most rewarding endeavors in my life.”
Hence, a word to the middle and high school teachers: keep on reading
the book, despite moments of difficulty or confusion. For the motivated,
persevering, and caring teacher, there will come a time when he/she will look
back at the material here, smile, and effortlessly deliver it to the students at
his/her own math circle. Truly gratifying.
2.5. Proofs in particular. That proofs are important goes without ques-
tion in the mind of Galileo’s father:
“It appears to me that those who rely simply on the weight of authority
to prove any assertion, without searching out the arguments to support
it, act absurdly. I wish to question freely and to answer freely without
any sort of adulation. That well becomes any who are sincere in the
search for truth.”
the sessions without being familiar with proofs, reviewing first the sessions
on Proofs and Induction in volume I will make it faster and easier to read
and understand this book.
2.6. The parent of a middle or high school student is also among our
intended audience; in fact, parents are probably the most important readers
because without their support and enthusiasm, without them bringing and
encouraging their children, there would hardly be any top-tier math circles
in the U.S. Hence, if you are among those parents or if you are a parent
new to the math circle movement, this book series will provide a very strong
beginning for your child. And for you as well.
As a parent, you can do three things with this book: give it to your child
(but make sure that he/she has the necessary background – see the recom-
mended basic books); learn from it and teach your child; or give it to his/her
math teacher and encourage the founding of a school-based math circle.
Whatever path you choose to follow, it will eventually benefit your child
and possibly a larger group of classmates. In any case, enjoy the book!
[Figure: legend of the icons used in the book: Problem; Open question or one
that requires extra knowledge; Problem-solving technique; Warning;
Contradiction; Basis step; Inductive step; Strong basis step; Strong
inductive step.]
Symbols and Notation on page 321). If you need to review or learn this
material in depth, we refer you to the first chapter of Jacobs’ Geometry [43]
on deductive reasoning. A complete list of Abbreviations can be found on
page 325.
3.3. Labeling and future volumes. Subfigures within the same figure
are implicitly labeled in alphabetical order. For example, Figure 4 on page 9
contains subfigures Figure 4a, 4b, 4c, and 4d, reading from left to right. Fi-
nally, about half of the sessions are parts of series of sessions, to be continued
in Volume III of the book.
knowledge: Concepts, Theorems, and PSTs; and throughout this book you
will encounter about 100 PSTs. You will also need to learn how to fit together
various mathematical parts in order to move forward in the solutions.
my advisor Joe Harris. A number of places in the book will present common
problem-solving pitfalls, and alternative ways to solve the same problem.
And so, it will be you, the reader, who has to commit to mastering the
new math theories and techniques by
• “muddying your hands” in the problems,
• going back and reviewing necessary PSTs and theory, and
• persistently moving forward in the book.
Nothing good comes “for free”: you will have to work hard, always with a
pencil and paper in hand. Keep in mind that the math world is huge: you’ll
never know everything, but you’ll learn where to find things, how to connect
and use them. The rewards will be substantial.
5. Acknowledgments
5.1. Institutional support and sponsors. The Berkeley Math Circle was
made possible through the years with the unwavering support of:
• University of California at Berkeley Math Department, which hosts the
Circle and its web site and has provided student assistants and secretarial
support every year since 1999. Through faculty grants, Ivan Matić has been
able to act as an associate director. The department chairs Cal Moore,
Hugh Woodin, Ted Slaman, Alan Weinstein, and Arthur Ogus have always
been encouraging and supportive, and several dozen UCB professors have
delivered Circle sessions.
• Mathematical Sciences Research Institute, which from its inception has
overseen the project, provided funds through various sponsors, and hosted
Circle meetings and events. Special thanks to Deputy Directors Hugo Rossi,
Joe Buhler, Michael Singer, Bob Megginson, and Hélène Barcelo, Directors
David Eisenbud and Robert Bryant, and Associate Director David Auckly
for their leadership, understanding, and help.
A number of sponsors have financially supported BMC over the years:
Packard Foundation, Toyota Foundation, Clay Mathematics Institute, Mosse
Foundation for Art and Education, Merriam-Webster Foundation for the
Scripps National Spelling Bee; National Science Foundation and other grants
from Professors Ravi Vakil (Stanford), Bjorn Poonen, Alexander Givental,
and Martin Olsson (UC Berkeley), and generous private donors.
5.2. Parents and students. BMC parents have encouraged and driven
their kids to the Circle for years, brought snacks during the breaks, organized
Circle parties, attended meetings, and donated time, effort, and personal
funds to the Circle. We are especially grateful to Marc Whitlow, Mike Breen,
Jennifer O’Dorney, Yuki Ishikawa, Ian Brown, and Tony DeRose for their
enthusiasm, leadership, and professional services provided so selflessly to the
Circle.
5.3. Professional support with the web site has been rendered on numer-
ous occasions by Paulo de Souza, Dmitri Mironov, Steve Sizemore, and Igor
Savine. Marsha Snow, Barbara Peavy, and Tom Brown have offered valu-
able secretarial support over the years. BMC owes its logo design to Archer
Design, Inc.
As one can see, many dozens of people have been involved in running
the Berkeley Math Circle: it is a joint operation born of the love and care
for our young generation of mathematicians. The most important people in
this operation are undoubtedly the BMC instructors (over 100), who have
delivered the 800 sessions during the last 16 years. We would like to thank
all of them! Twelve instructors joined BMC in the beginning and most have
stayed with us throughout the years: Ted Alper, Tom Davis, Dmitry Fuchs,
Alexander Givental, Quan Lam, Bjorn Poonen, Tom Rike, Vera Serganova,
Tatiana Shubin, Zvezdelina Stankova, Paul Zeitz, and Joshua Zucker.
5.4. Book support. Edward Dunne, our AMS editor, and his staff have
been very helpful in resolving technical and other issues. Gabriel Carroll
is responsible for drawing some of the cartoons in the book series, inspired
by the earlier BMC sessions. All USAMO problems are used with permis-
sion from the American Mathematics Competitions (AMC), Lincoln, Ne-
braska [2]. A few pictures and references have been taken from Wikipedia
at wikipedia.org/.
With gratitude,
Zvezdelina Stankova
Berkeley Math Circle Director
Session 1
Zvezdelina Stankova
The signature problems of this session are two of my favorite plane ge-
ometry problems. After decades of subsequent advanced math studies, they
still remain crystal clear in my memory . . . to remind me of the wonder I
experienced when I first saw them as a 5th-grader back in Bulgaria [12].
As our first step toward solving them, we will experiment and decide if
our answers constitute a mathematical proof or not. It is absolutely nec-
essary to bring aboard for this journey some graph paper, scissors, clear
tape, a flexible but not stretchable cord, a pin, and of course, a pencil and a
straightedge. A compass is highly recommended, while calculators and other
electronic equipment are completely prohibited. We will depend only on basic
tools and on our unlimited imagination.
1.1. Cutting, taping, and guessing. Our first problem has an almost
century-long history. One of its solutions presented here resembles a truly fa-
mous, almost mystical 2000-year-old puzzle leading way back to Archimedes!¹
Problem 1. (Three Squares) Three identical squares with bases AM,
MH, and HB are put next to each other to form a rectangle ABCD
(cf. Fig. 1a). What is the sum of the angles ∠AMD + ∠AHD + ∠ABD?
[Figure 1. Experimenting on the three squares. The rectangle ABCD with base
points A, M, H, B and the angles α, β, γ marked.]
The problem is asking us to find something – an angle. In such situations,
people would give you the answer they believe is correct and, more often
than not, would think that they are done, without having actually proven
anything! But the reader who has gotten this far in the book series knows
that the solution should consist of at least two parts:
(1) investigating and conjecturing, and
(2) formally proving the conjecture.
Alas, sometimes even just coming up with the correct answer is already
a challenge. For example, when I encountered this problem as a 5th-grader,
it wasn’t at all obvious to me what the sum of the three angles had to be . . . .
So, how was I to start on a problem when it was unclear what I was supposed
to be proving?
PST 1. If physical experimentation is not too difficult, then do it in order
to discover some possible answers to a problem. Since conjecturing does not
require any proof, just about anything is allowed as “experimentation,” as
long as you follow the rules of the problem (and don’t hurt anyone!).
Figure 1b is more than suggestive:
Exercise 1. Draw the 3-squares problem on graph paper, cut out the
three angles, and tape them to each other to form a single angle sum: the
three vertices will become one and some adjacent arms will coincide too.
How large do you think this angle sum will be? Estimate it.
If Figure 1c were drawn to show this resulting angle sum, it would have
given away the answer too easily. Now, of course, due to errors in the
physical experimentation, no two final angle sums will be absolutely the
same. Nevertheless, they will all look suspiciously close to a very well-known
angle . . . .
¹ See the Historical Appendix in Part II for an explanation of this startling reference.
Exercise 2. Pull out a protractor, measure, and add the three angles.
And hence a second physical experiment is performed, with its own error
of measurement, despite how much one might rely on his/her own protractor.
In fact, if you do it yourself, you will likely discover that only one of the three
angles measures easily and nicely (which one?), and the other two angles
yield seemingly random non-integer degrees . . . . As a result, this experiment
might prove to be even less precise than the first one with the scissors!
One thing, though, should be clear by now – if the problem has a nice
answer suitable for a 5th-grade solution (albeit, from a Bulgarian geometry
math circle book!), then that answer must be:
Conjecture 1. The three angles add up to a right angle.
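Away from the session table (where electronics stay off-limits), the conjecture can also be sanity-checked numerically. The sketch below is only an illustration, not one of the three proofs: it assumes unit squares placed at the coordinates A = (0, 0), M = (1, 0), H = (2, 0), B = (3, 0), D = (0, 1), and the helper name angle_at is our own. Like the scissors and the protractor, it carries a (much smaller) rounding error and proves nothing.

```python
import math

# Assumed coordinates: three unit squares side by side along AB.
A, M, H, B, D = (0, 0), (1, 0), (2, 0), (3, 0), (0, 1)

def angle_at(vertex, p, q):
    """Return the angle p-vertex-q in degrees (hypothetical helper)."""
    v1 = (p[0] - vertex[0], p[1] - vertex[1])
    v2 = (q[0] - vertex[0], q[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return math.degrees(math.atan2(abs(cross), dot))

# The three angles of Problem 1: AMD, AHD, ABD.
angles = [angle_at(P, A, D) for P in (M, H, B)]
total = sum(angles)
print(angles, total)
```

Only the first of the three printed angles is a whole number of degrees, which matches the protractor experience described in Exercise 2.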
As a middle school student, I knew three ways to prove this conjecture:²
Idea 1: A bold and truly brilliant solution that re-creates the “missing”
half of the picture by an original extra construction. A bright 5th-grader
will understand this solution, as it uses only very elementary technical tools
such as congruent triangles and a couple of special plane figures that everyone
knows. But it is unlikely (although not impossible) that the bright 5th-grader,
or even the most seasoned problem-solver, will be able to come up with such
an amazing solution out of nowhere. ♦
Idea 2: A 7th-grade solution using similar triangles and the Pythagorean
Theorem, which only partially illuminates the reason behind the 90°-sum. ♦
Idea 3: A standard and boring but fast 8th-grade solution via trigonometry,
which does not explain why the result really is what it is. ♦
The first challenge has been served. You should try on your own to solve the
problem in at least one way. The picture on the left contains color-coded hints
for all three different ways. We will re-create the 5th-grade solution in this
session and come back to the other solutions in Part II.
² . . . that is, until I saw the 54 proofs in [82]! Check out the Historical Appendix in
Part II.
If you draw several possible paths for the farmer, measure, and add, you
will get an idea as to where the optimal place will be along the river. You
may even want to organize all data in a neat table. There is, however, a
simple physical experiment that can help you arrive at a conjecture faster:
Exercise 3. Take a flexible (but not stretchable) string or cord; pin one
end at the farmer’s position; with your right-hand fingers loosely hold the
other end at the cow’s position; and with a pencil (or your left-hand fingers)
stretch the cord until it touches the river. Then start sliding the pencil along
the river, accordingly loosening or tightening at the cow’s position to keep
the cord in two straight segments. Which place along the river needed the
least amount of cord?
Sort of an answer: As you move the pencil (or your left-hand fingers), you will
discover a place X at the river, to the left and to the right of which you will
need to loosen the cord in order for the pencil to stay along the river.
If the farmer and the cow walked straight to the river to points A and B,
respectively, then how long is AX? Since different pictures will be drawn with
different scales, a more appropriate question might be to approximate the
ratio AX : BX. ♦
[Figure: the farmer F and the cow C on one side of the river, with points A, X,
and B on the river; the distances 2 and 6 are marked.]
As with the 3-squares problem, upon performing this experiment, the
BMC-Beginners split in their predictions; some claim that AX ≈ 0.9 km
and for some it looks like AX ≈ 1.05 km, while the majority suspect that
the exact answer must be a nice round number:
Conjecture 2. The farmer should go to a place X on the river so that
AX = 1 km; i.e., AX : BX = 1 : 3.
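The cord experiment can be imitated by a brute-force search over positions of X. The layout used below is an assumption read off the figure labels and Conjecture 2, not something stated in the text: the river is the x-axis, A = (0, 0), B = (4, 0), the farmer stands at F = (0, 2), and the cow at C = (4, 6), all in km. The function name path_length is ours. This is a sketch of an experiment, not a proof.

```python
import math

# Assumed layout (inferred, not given in the text): river = x-axis, units in km.
F = (0.0, 2.0)   # farmer, directly above A = (0, 0)
C = (4.0, 6.0)   # cow, directly above B = (4, 0)

def path_length(x):
    """Length of the two-segment path F -> X -> C, where X = (x, 0)."""
    return math.hypot(x - F[0], F[1]) + math.hypot(C[0] - x, C[1])

# Slide X along the river from A to B in steps of 1 meter.
steps = 4000
candidates = [4.0 * i / steps for i in range(steps + 1)]
best_x = min(candidates, key=path_length)
print(best_x, path_length(best_x))
```

Under these assumed distances the search settles at AX = 1 km, in line with the ratio AX : BX = 1 : 3 of Conjecture 2; changing F and C lets you test other configurations.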
to a trivial but equivalent version by an extra construction that (again!)
re-creates the “missing” half of the picture. A bright 5th-grader will be able
to follow the logic of the solution, if she is familiar with basic geometry
tools such as the Triangle Inequality and similar triangles, and experienced in
manipulating fractions and solving simple linear equations. ♦
Idea 5: Take a “leap-of-faith” and apply a fundamental law of physics and
its consequence that we observe every day. ♦
answer is what it is other than “This is how the calculations work out.” ♦
The second challenge has been served. Incidentally, the picture of the
sun looking into the mirror is an indirect and direct hint for two of the ideas.
1.3. The grand design. For the rest of this session we will build the
necessary elementary geometry background and discuss the creative and non-
trivial 5th-grade solutions to our two overarching problems. At the end, we
will briefly look into the logical foundation of our plane geometry studies.
In Part II, we will continue building sophisticated geometry and some
technical trigonometry background in order to complete the remaining sug-
gested solutions, generalize their methods to other more advanced problems,
and finally, go out of our “comfort zone” and see beyond the 3-squares prob-
lem and possibly into the origins of trigonometry millennia ago.
If you feel you are already fortified with enough plane geometry back-
ground and the two overarching problems are not challenging enough, you
can skip to the historical section at the end of the session. However, be aware
that the solutions we will discuss here are purely geometric (a.k.a. synthetic)
and, arguably, these are the most beautiful solutions; they can be potentially
created by bright middle schoolers with little technical background and open
minds. And hence, they are worth experiencing.
2. A Triangle Workout
Triangles make up any polygonal shape: if you haven’t done this before,
just cut any polygon that happens to be lying around along several of its
non-intersecting diagonals, until you are left with only triangles. Triangles
also appear on their own everywhere in geometry and in everyday life. We
will definitely need them to solve all of our problems in this session!
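For readers who would rather cut a polygon on a screen than with scissors, here is a small sketch of one such cutting. It assumes a convex polygon with vertices listed counterclockwise and cuts along all diagonals from the first vertex; the sample hexagon and the function names are our own choices. An n-gon cut this way falls apart into n - 2 triangles whose areas add up to the area of the polygon.

```python
def shoelace_area2(pts):
    """Twice the signed area of the polygon with vertices pts (shoelace formula)."""
    n = len(pts)
    return sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
               for i in range(n))

def fan_triangles(pts):
    """Cut a convex polygon along the diagonals from pts[0] (assumes convexity)."""
    return [(pts[0], pts[i], pts[i + 1]) for i in range(1, len(pts) - 1)]

# A sample convex hexagon, vertices in counterclockwise order.
hexagon = [(0, 0), (4, 0), (6, 3), (4, 6), (0, 6), (-2, 3)]
triangles = fan_triangles(hexagon)

print(len(triangles))                              # number of triangles
print(sum(shoelace_area2(t) for t in triangles))   # twice the total triangle area
print(shoelace_area2(hexagon))                     # twice the hexagon's area
```

For a non-convex polygon the diagonals from a single vertex may leave the polygon, so a different choice of non-intersecting diagonals is needed there.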
Question 1. When are two triangles the “same,” i.e., all of their correspond-
ing angles and sides are equal³ in size? This is formally known as congruent
triangles and is denoted by the symbol $\cong$.
Question 2. When do two triangles look “alike,” i.e., their corresponding
angles are equal in size and their sides are proportional? This is formally
known as similar triangles and is denoted by the symbol $\sim$.
Suppose we want to show the congruence $\triangle ABC \cong \triangle A_1B_1C_1$ (cf. Fig. 2a).
Do we need to verify all six conditions for sides and angles:
• $a = a_1$, $b = b_1$, $c = c_1$; and $\alpha = \alpha_1$, $\beta = \beta_1$, $\gamma = \gamma_1$?
[Figure 2. Congruent or Similar: $\triangle A_1B_1C_1 \cong \triangle ABC \sim \triangle A_2B_2C_2$]
The same question goes for similar triangles $\triangle ABC \sim \triangle A_2B_2C_2$ (cf. Fig. 2b). Will
it be overkill to verify all five conditions for their sides and angles:
• $a/a_2 = b/b_2 = c/c_2$, and $\alpha = \alpha_2$, $\beta = \beta_2$, $\gamma = \gamma_2$?
minimum number of conditions: typically only three for congruence and two
for similarity. These will be sufficient to imply the remaining conditions on
sides and angles and will guarantee the congruence/similarity of the triangles.
verify that the elements listed in the table under the criterion for one triangle
are equal to the corresponding elements for the other triangle, and then all
other corresponding elements of the two triangles will follow suit.
3 We shall be sloppy and say “equal” for sides and angles to mean that they have the
same measure; this is formally referred to as congruent sides and congruent angles.
Examples. The ASA criterion (cf. Fig. 3a) asks us to check that, say,
$AB \overset{?}{=} A_1B_1$, $\angle ABC \overset{?}{=} \angle A_1B_1C_1$, and $\angle BAC \overset{?}{=} \angle B_1A_1C_1$.
Similarly, according to RR, two ratios of sides in △ABC must be equal to
two ratios of sides in △A2B2C2, e.g., $BC/CA \overset{?}{=} B_2C_2/C_2A_2$ and
$CA/AB \overset{?}{=} C_2A_2/A_2B_2$ (cf. Fig. 3b). This can be written in an equivalent but more
memorable way as follows:
$$\frac{AB}{A_2B_2} \overset{?}{=} \frac{BC}{B_2C_2} \overset{?}{=} \frac{CA}{C_2A_2}.$$
[Figure 3. ASA and RR criteria]
Exercise 4. For each criterion, draw a relevant picture of two triangles that
are congruent (or similar), as in Figure 3, label their vertices, mark the sides
or angles (or ratios) that are supposed to be equal, and write down (in letter
notation) the conditions that are satisfied by the criterion.
We are now ready to state the famous theorem that generated the dis-
cussion here about parallelograms. Its proof is left for the reader, especially
since an almost explicit hint about it is definitely somewhere on this page.
Theorem 2. (Parallelogram’s Center) The diagonals of a parallelogram
bisect each other (cf. Fig. 4c).
5 A line that intersects two lines in different points is called a transversal.
[Figure 5. Centroid, Midsegment, and Medians]
same point M, fix one segment XY and show that the
remaining segments divide XY in the same ratio counted
from X to Y. Furthermore, if you can find the exact ratio
in which a second segment divides XY, you may be able
to apply an analogous argument to prove that any other
segment intersects XY in that same ratio.
Now the centroid’s existence and location are only one △-similarity away!
Let us start with our farmer and his cow. The way the problem is posed
makes it hard to solve. Recall how we had to experiment in order to guess
the optimal place on the river where the farmer should go, and still we were
far from actually proving anything!
PST 4. Identify what makes the problem hard, eliminate it by reducing the
situation to a simpler one, and see if the new problem is easily solved. Then
connect your solution to the original problem.
3. WALKING ALONG AN OPTIMAL PATH 11
3.1. Simplify and solve. What makes the farmer-and-cow problem hard?
One expression in the statement of the problem, about which we did not
think twice, is what got us into trouble . . . . Which one is it?
How about . . . “on the same side of the river!” What if our two protag-
onists were on different sides of the river? Would you be able to solve the
problem now in no time? Certainly! The shortest path would be the straight
path from the farmer to the cow, going through the river!
Here it is reasonable to pause: what if the river
is wide? Does it make a difference to the farmer’s
path? Sure it does, so . . . eliminate the width of the
river! I can hear some readers objecting: “But you
cannot! It is part of the problem.” Actually, it is not:
the original problem was placed entirely on one side
of the river and did not depend on the width of the
river, or for that matter, on whether the river had
any width at all. Hence, as any brave mathematician
would do, we will draw the river as a line with no
width: this will simplify our new problem and make
it a better match for the old problem.
3.2. Relate back to the original problem. It doesn’t take much effort to
see that reflecting the original farmer across the river to create a “phantom”
farmer will turn one problem into the other. Since any path that the original
farmer can take is mimicked by the phantom farmer,
then the shortest path of the original farmer must
correspond to the shortest path of the phantom
farmer: the straight one, as noted earlier.
So, what is our answer? We must
3.4. Prove that your algorithm works. Our earlier informal argument
led to creating the algorithm, but we still need to formally justify that it
will yield the shortest possible path for the farmer. Indeed, suppose farmer
F walks to any other point Y on the river. Why is this path F → Y → C
longer than the path F → X → C suggested by our algorithm?
PST 6. One way to create new problems or reduce to simpler ones is to
reflect across a line. Since any triangle (moreover, any figure!) retains its
size and shape under reflection, we arrive at a twin to the original situation.
The beginner should confirm the above statements about reflection:
Exercise 8. Show that the measure of any segment and any angle is pre-
served under reflection. What can you say about triangles under reflection?
3.5. Reflect upon the result of your algorithm. Are we done with
the Farmer-and-Cow problem? In some sense yes: we described a geometric
algorithm, which leads step by step to the optimal path for the farmer, and
we proved that this algorithm works.
On second thought, though, did you notice that our solution did not
depend at all on the given numerical data: 2 km, 4 km, and 6 km?! What
was that all about? A further mystery is why we studied in detail similar
triangles when we didn’t use them at all?! Well, the Triangle Inequality
(which we did use) can and will be proven in Part II as a consequence of the
Pythagorean Theorem, which in turn will be proven via similar triangles.
But more to the point, do you remember our experiment with the flexible
cord? We made a specific conjecture about the location of point X along the
river: AX : XB = 1 : 3, with A and B the points on the river directly opposite the farmer
and the cow, respectively. Similar triangles and a bit of algebra will be the “cure” here.
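The conjectured ratio can also be checked numerically with the reflection idea. The sketch below rests on a coordinate setup that the text does not spell out, so treat it as an assumption: the river is the x-axis, the farmer stands 2 km from the river above A = (0, 0), and the cow stands 6 km from the river above B = (4, 0). The function name is ours.

```python
import math

def optimal_river_point(farmer, cow):
    """Return the point X on the river (the x-axis) minimizing |FX| + |XC|.

    Reflect the farmer across the river to get a "phantom" farmer, then
    intersect the straight segment phantom -> cow with the river.
    """
    fx, fy = farmer
    cx, cy = cow
    px, py = fx, -fy                    # phantom farmer
    t = (0 - py) / (cy - py)            # parameter where the segment meets y = 0
    return (px + t * (cx - px), 0.0)

# Assumed data: farmer 2 km from the river, cow 6 km, feet A and B 4 km apart.
farmer, cow = (0.0, 2.0), (4.0, 6.0)
A, B = (0.0, 0.0), (4.0, 0.0)

X = optimal_river_point(farmer, cow)
AX = X[0] - A[0]
XB = B[0] - X[0]
path_length = math.dist(farmer, X) + math.dist(X, cow)

print(AX, XB)          # the two pieces of AB cut off by X
print(path_length)     # length of the path farmer -> X -> cow
```

Under these assumptions the two printed distances are in the ratio 1 : 3, in agreement with the flexible-cord experiment.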
Let us now turn our attention to the three-squares problem (cf. Fig. 7a).
Recall our conjecture that
$$\alpha + \beta + \gamma \overset{?}{=} 90^\circ.$$
4.1. Fitting the conjecture into the picture. Our first experiment –
cutting and pasting – led to a right angle made out of non-overlapping α, β,
and γ, as Figure 1b suggested. But where in our original picture will this
right angle fit well?
One possibility is the right ∠ABC: since it already contains γ = ∠ABD,
we “just” have to show that the remaining ∠DBC can be split into α and β.
4.2. Grid hopping. Consider the integer grid made out of unit squares,
just like the three squares in our problem. The points where the grid lines
intersect will be called grid points. To split ∠DBC as desired, we need to
draw at least one extra arm inside this angle.
The brilliant idea of this solution is to:
The original problem already has 9 grid triangles! Did you find them?
[Figure 7. Tiling of the integer grid]
4.3. Special triangles to the rescue! It remains to show $\angle DBH_1 \overset{?}{=} \alpha$.
Since α = 45° from the right isosceles △AMD, ideally we would find another
right isosceles grid triangle one of whose angles is ∠DBH1 . . . . Not that we
have much of a choice:
Exercise 9. Show that △DBH1 is right isosceles and hence ∠DBH1 = α.
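A quick numerical sanity check of the conjecture and of Exercise 9 is possible once the grid points get coordinates. The coordinates below are our assumption (the text gives only the picture): unit squares with A = (0, 0), M = (1, 0), H = (2, 0), B = (3, 0), D = (0, 1), and the extra grid point H1 = (2, 2); we also assume α = ∠AMD, β = ∠AHD, and γ = ∠ABD. This is a check, not a proof.

```python
import math

# Assumed coordinates (not given in the text): three unit squares in a row,
# plus one extra grid point H1 directly above H in the extended grid.
A, M, H, B = (0, 0), (1, 0), (2, 0), (3, 0)
D, H1 = (0, 1), (2, 2)

def angle_deg(vertex, p, q):
    """Angle p-vertex-q in degrees."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    d = abs(a1 - a2)
    return math.degrees(min(d, 2 * math.pi - d))

alpha = angle_deg(M, A, D)   # angle AMD
beta = angle_deg(H, A, D)    # angle AHD
gamma = angle_deg(B, A, D)   # angle ABD
total = alpha + beta + gamma

# Exercise 9: triangle DBH1 should be right isosceles, right angle at H1.
u = (D[0] - H1[0], D[1] - H1[1])
v = (B[0] - H1[0], B[1] - H1[1])
dot = u[0] * v[0] + u[1] * v[1]                       # 0 means perpendicular legs
legs_equal = (u[0] ** 2 + u[1] ** 2) == (v[0] ** 2 + v[1] ** 2)

print(alpha, beta, gamma, total)
print(dot, legs_equal, angle_deg(B, D, H1))
```

The integer dot product and squared lengths make the right-isosceles check exact, while the angle sum is only accurate up to floating-point rounding.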
5.1.2. We listed but did not prove ten criteria for triangles:
(6-10) Congruence: SAS, ASA, SSS, SsA, HL.
(11-15) Similarity: RA, AA, RR, R A, H/L.
8 And it is quite possible that we have missed something to list here, which is fine
because the diligent reader can find it and add it to the list.
5. TO PROVE OR TO TAKE FOR GRANTED? 17
5.3.1. Can all criteria for triangles be proven? According to Hilbert’s ax-
iomatic system, part of the SAS criterion for congruence is an axiom:
Hilbert’s Congruence Axiom: If two sides and the angle between them in
one triangle are congruent to the corresponding elements in another triangle,
then the remaining corresponding angles are also congruent.
In other words, the axiom does not conclude that the triangles are con-
gruent! It can be shown then, using Hilbert’s axioms, that the remaining
sides are congruent so that the triangles are congruent. Thus, the SAS cri-
terion is partly an axiom and partly a theorem. All other congruence and
similarity criteria can be deduced from SAS, and hence they are theorems.
Since Euclid, many a famous mathematician tried to prove the Fifth Pos-
tulate from Euclid’s other axioms,9 and some even published “proofs” . . . only
for flaws to be eventually found in the arguments. Nevertheless, with each
such attempt mankind moved closer to a non-Euclidean geometry. Finally,
around 1830, the Russian Nikolai Lobachevsky (1792-1856), the Hungarian
János Bolyai (1802-1860), and the German Carl Friedrich Gauss (1777-1855)
independently arrived at hyperbolic geometry, where all axioms of Euclidean
geometry hold, except for the Fifth Postulate.
9 Euclid’s axiomatic system, proposed in his Elements [24], consists of 23 definitions,
5 undefined concepts, and 5 axioms, a.k.a. postulates.
5.4. A fair game in congruences. As you can see, there is a lot that can
be discussed and learned from studying the logical foundation of geometry
and exploring the implications among various theorems and axioms. It is
not the point of this section to make the reader go through the somewhat
grueling process of proving all non-axioms within the 20 statements listed
earlier (not to be confused with Hilbert’s 20 axioms). Several questions of
why, what, and how one thing implies another are, though, in order.
[Figure 10. Crossbar Theorem, “SsA,” and HL]
Exercise 11. Consider the SsA criterion.
(a) Why does it require the angle to be opposite the longer side?
(b) Isn’t the HL congruence criterion a special case of it?
Partial Solution: (a) The question essentially asks if we can drop the
condition that the equal angles are opposite the longer sides of the triangle.
Does SsA work when the equal angles are opposite the smaller sides? In
other words, can we strengthen the SsA criterion? Recall the following PST
which was discussed and used in volume I:
PST 8. One way to disprove a statement is to provide a counterexample,
i.e., a situation where the hypothesis is satisfied but the conclusion fails.
In the case of SsA, draw △ABC with an obtuse ∠ACB = γ. On ray BC
locate a point C1, different from C and such that AC1 = AC (cf. Fig. 10b).
Why does C1 exist? What can you say about △ABC and △ABC1? ♦
Hint: (b) Properties of right triangles are obviously involved (cf. Fig. 10c).
Assuming the Pythagorean Theorem, can you deduce from it the famous
fact about right triangles that is necessary to answer the question? ♦
After answering affirmatively the last question, in effect we are left with
four criteria for congruence. Our last question introduces a slightly esoteric,
fifth criterion, which you should try to prove from the other criteria:
Exercise 12. (SASum) Show that two triangles are congruent if one side
in one triangle, an angle adjacent to that side, and the sum of the other two
sides are correspondingly equal to the same elements in the second triangle.
Hint: An extra construction is called for. Can you align the two sides whose
sum is known, without moving the third side or the given angle? Which
criterion implies that the base angles of an isosceles triangle are equal? How
is this relevant here? ♦
Exercise 1. The angles make what looks like a right angle (cf. Fig. 11a).
Exercise 10. Continuing the solution that started on page 18 and referring
to the picture there, by alternate interior angles we have α = α′ and β = β′.
From the straight angle about point C we have α′ + γ + β′ = 180° so that
α + γ + β = 180°, which is the desired sum of the angles in △ABC.
Exercise 11. (a) Since 180° − γ = δ is acute, there is an isosceles △ACC1
with AC = AC1 and base angles ∠ACC1 = ∠AC1C = δ. Note that C1 will
be on line BC with C between C1 and B (why?), as shown in Figure 13a.
Now, △ABC and △ABC1 share side AB, and have other equal sides:
AC = AC1, across which lie equal angles: ∠ABC = ∠ABC1 = β. However,
the two triangles are definitely not congruent, since one of them is contained
in the other, namely, △ABC is strictly inside △ABC1! This counterexample
to a “strengthened SsA” criterion originated from having the (equal) angle β
lie across the smaller side AC of △ABC.
In conclusion, we cannot strengthen SsA: we must have the equal angles
opposite the longer sides of both triangles when applying SsA.
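For readers who like to see numbers, here is one concrete instance of the counterexample. The coordinates are hypothetical, picked by us so that ∠ACB is obtuse; any other such choice would do just as well.

```python
import math

# Hypothetical coordinates chosen for illustration: angle ACB is obtuse.
A, B, C = (3.0, 1.0), (0.0, 0.0), (2.0, 0.0)
# C1 is the mirror image of C in the foot (3, 0) of the perpendicular from A
# to line BC, so that AC1 = AC and C lies between B and C1.
C1 = (4.0, 0.0)

def side(p, q):
    """Length of segment pq."""
    return math.dist(p, q)

def angle_at(vertex, p, q):
    """Angle p-vertex-q in degrees, via the dot product."""
    ux, uy = p[0] - vertex[0], p[1] - vertex[1]
    vx, vy = q[0] - vertex[0], q[1] - vertex[1]
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

gamma = angle_at(C, A, B)                                   # obtuse angle ACB
shared_AB = side(A, B)                                      # common side
same_second_side = math.isclose(side(A, C), side(A, C1))    # AC = AC1
same_angle = math.isclose(angle_at(B, A, C), angle_at(B, A, C1))
third_sides = (side(B, C), side(B, C1))                     # BC versus BC1

print(gamma, shared_AB, same_second_side, same_angle, third_sides)
```

Triangles ABC and ABC1 agree in two sides and in the angle opposite the shorter of them, yet their third sides differ, so they cannot be congruent.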
(b) The Pythagorean Theorem says that $c^2 = a^2 + b^2$ for any right triangle
with lengths c, a, and b of the hypotenuse and the legs. Algebraically, this
implies that $c^2 > a^2$ and $c^2 > b^2$, i.e., c > a and c > b. Thus, we arrive at the
well-known fact that the hypotenuse is the longest side of a right triangle.
The three triangle elements in the HL criterion are a leg, the hypotenuse,
and the right angle, which is opposite the longest side of the triangle. But this
is precisely the SsA criterion for right triangles! Thus, HL is indeed a special
case of SsA.
[Figure 13. Constructing extra isosceles triangles]
Tom Davis
Sneak Preview. In Part I of this session, we encoded the moves on the Rubik’s
Cube via permutations. Understanding the mathematics of these face-twisting
permutations is indeed equivalent to a complete understanding of Rubik’s Cube.
Fortunately, permutations form a most famous and well-studied example of what
is known in mathematics as a group.
We begin this Part II with a super-fast introduction to group theory, discussing
very basic groups together with examples based on Rubik’s Cube. The session
culminates in calculating the total number of positions that can be reached from
a solved cube. Although more complex, this feat resembles the 15-puzzle in
Session 5, where one can plunge into a detailed study of group theory. Naturally,
these two sessions reinforce the same abstract concepts from somewhat different
angles, and each of them is self-contained and can be tackled independently.
Part III will reward the patient reader: our newly-developed group-theoretic
tools will be used to find methods for efficiently solving jumbled cubes.
1. What Is a Group?
For those who desire the absolute minimum in conditions, see this footnote.1
Most familiar mathematical systems involve commutative operations, but
this is not necessarily the case in group theory. In other words, there may
exist elements g and h of G such that g ∗ h ≠ h ∗ g, making the group
non-commutative. Notice also that the definition above does not require
that a group be finite. In this session we will consider mostly finite groups,
although, as in the case of R, those finite groups may be quite large.
Since there is only one operation ∗, we often omit it and write gh in place
of g ∗ h. In the case of multiplication of permutations we already do this:
(1 2) combined with (1 3 4) is written (1 2)(1 3 4). Similarly, we can define
$g^2 = gg = g * g$, $g^3 = ggg = g * g * g$ and so on, $g^0 = e$, and $g^{-n} = (g^{-1})^n$.
Because of associativity, these are all well-defined and they obey the usual
laws of exponents, such as: $g^{m+n} = g^m g^n$, and this is true for any integers
m and n, be they positive, negative, or zero.
1.2. Famous infinite groups. You are probably already familiar with a
few finite groups, but most of the best-known examples are infinite:
• The integers as the group elements under addition.2
• The rational numbers under addition.
• The rational numbers with 0 omitted under multiplication.
• The real numbers or complex numbers under addition.
• The real or complex numbers omitting 0 under multiplication.
1 In fact, there is a slightly simpler and equivalent definition of a group: only a right
identity and a right inverse are required (or a left identity and a left inverse). In other
words, if there is an e such that g ∗ e = g for all g ∈ G and for every g ∈ G there exists a
$g^{-1}$ such that $g * g^{-1} = e$, then you can show that e ∗ g = g and that $g^{-1} * g = e$. This can
be done by evaluating the expression $g^{-1} * g * g^{-1} * (g^{-1})^{-1}$ in two different ways using
the associative property, yielding that the left and right identities are the same and that
the left and right inverses of any element are also the same.
2 The term “under addition” simply means that the group operation is addition.
the formal definition. Identify which element in these groups is the identity
(signified by e in the formal definition), and then check that all the other
group properties listed in Definition 1 hold.
In fact, all of the above sets are infinite and commutative groups. A
group that is commutative is sometimes called an abelian group.
On the other hand, the non-negative integers {0, 1, 2, 3, . . .} under addi-
tion do not form a group – there is an identity (0), but there are no inverses
for any positive numbers. We can’t include zero in the groups of rational,
real, or complex numbers under multiplication since it has no inverse.
The so-called trivial group consists of the single element e, and satisfies
e ∗ e = e. Since every group must contain the identity element, this is the
smallest possible group.
1.3. Groups from number theory. Only if you know about modular
arithmetic (cf. the Number Theory I session, vol. I), show that:
Exercise 2. The n elements 0, 1, . . . , n − 1 form a (finite abelian) group
under addition modulo n.
Exercise 3. If p is prime, then multiplication modulo p forms a group con-
taining p − 1 elements: 1, 2, . . . , p − 1.
If p is not a prime then the operation of multiplication modulo p does
not form a group. For example, if p = 6 there is no inverse for 2: 2 ∗ 1 = 2,
2∗2 = 4, 2∗3 = 0, 2∗4 = 2, and 2∗5 = 4. It is also not a group since 2∗3 = 0
and 0 is not in the set {1, 2, 3, 4, 5}, so in this case the operation is not even
closed! (Remember that in this example the “∗” represents multiplication
modulo 6.) Worse, when two numbers, neither of which is zero, multiply to
yield zero, then the system is said to have zero divisors; this immediately
prevents it from being a group (why?). In fact,
Exercise 4. When a modular system under multiplication has no zero divi-
sors it forms a group. This occurs precisely when the modulus n is a prime
number. If n is not prime, there will be zero divisors, and hence no group
under multiplication.
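The claim of Exercise 4 can be tested by brute force for small moduli. The sketch below is only an experiment, not a proof; the function names are ours. It checks closure and inverses on the set {1, ..., n − 1}; the identity 1 is in the set and associativity is inherited from ordinary multiplication.

```python
def is_group_under_multiplication(n):
    """Check whether {1, ..., n-1} is a group under multiplication modulo n."""
    elements = range(1, n)
    for a in elements:
        # Closure: no product may leave the set, i.e., become 0 modulo n.
        if any((a * b) % n == 0 for b in elements):
            return False
        # Inverse: some b in the set must satisfy a * b = 1 (mod n).
        if not any((a * b) % n == 1 for b in elements):
            return False
    return True

def is_prime(n):
    """Trial-division primality test, good enough for small n."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

group_moduli = [n for n in range(2, 31) if is_group_under_multiplication(n)]
print(group_moduli)
```

Comparing `group_moduli` with the primes below 31 is a reasonable way to convince yourself of the statement before trying to prove it.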
In the group based on addition modulo n, if you begin with the element
1, you can get to any element in the group by successive additions of that
element. In the group modulo 5, we obtain: 1 = 1, 2 = 1 + 1, 3 = 1 + 1 + 1,
4 = 1 + 1 + 1 + 1 and 0 = 1 + 1 + 1 + 1 + 1. The same idea holds for any n.
In this case we say that the group is generated by a single element (e.g., 1),
and such groups are called cyclic groups, since successive additions simply
cycle through all the group elements. The element that generates the group
in this way is called a generator.
26 2. RUBIK’S CUBE. PART II
A cyclic group may have more than one generator. For example, in
the same group corresponding to addition modulo 5, the element 3 is also
a generator: 1 = 3 + 3, 2 = 3 + 3 + 3 + 3, 3 = 3, 4 = 3 + 3 + 3 and
0 = 3 + 3 + 3 + 3 + 3.
Exercise 5. Does the group above corresponding to addition modulo 5 have
any other generators? What are all the generators of the group correspond-
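If you would like to experiment before (or after) working this exercise by hand, a brute-force search is easy to write. The helper below is our own sketch: it simply adds an element to itself repeatedly and records which elements are reached.

```python
from math import gcd

def generators(n):
    """Elements of {0, ..., n-1} whose repeated sums reach every element mod n."""
    result = []
    for g in range(n):
        # g, g+g, g+g+g, ... taken modulo n
        reached = {(g * k) % n for k in range(1, n + 1)}
        if len(reached) == n:
            result.append(g)
    return result

print(generators(5))
print(generators(6))
```

Running it for several values of n and comparing the output with `gcd(g, n)` may suggest a general pattern worth proving.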
1.4. Groups from geometry. For any particular geometric object, the
symmetry operations on that object form a group. A symmetry operation is
a movement after which the object looks the same (as if nothing happened
to it and it didn’t move!). For example,
[Figure 1. Four Symmetry Operations on an Ellipse: e, a, b, c]
e: Leave it unchanged.
a: Rotate it 180° about its center.
b: Reflect it across its short axis.
c: Reflect it across its long axis.

∗ | e a b c
--+--------
e | e a b c
a | a e c b
b | b c e a
c | c b a e
The group operation consists of making one movement followed by making
a second movement. Clearly e is the identity, and each of the operations is
its own inverse. We can write down the group operation ∗ on any pair of
elements of the ellipse symmetries in the 4 × 4 table above.
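The 4 × 4 table can be regenerated mechanically. In the sketch below each symmetry is modeled as a pair of sign flips acting on the coordinates (x, y); placing the long axis along x and the short axis along y is our assumption for illustration.

```python
# Each symmetry is modeled as a pair of sign flips (sx, sy) acting on (x, y).
# Assumption for illustration: the long axis lies along x, the short axis along y.
ops = {
    "e": (1, 1),     # leave unchanged
    "a": (-1, -1),   # rotate 180 degrees about the center
    "b": (-1, 1),    # reflect across the short axis
    "c": (1, -1),    # reflect across the long axis
}
names = {v: k for k, v in ops.items()}

def compose(g, h):
    """Apply g, then h (the order does not matter here: the group is abelian)."""
    (gx, gy), (hx, hy) = ops[g], ops[h]
    return names[(gx * hx, gy * hy)]

table = {g: [compose(g, h) for h in "eabc"] for g in "eabc"}
for g in "eabc":
    print(g, " ".join(table[g]))
```

Each printed row lists g ∗ e, g ∗ a, g ∗ b, g ∗ c, so the output can be compared with the table entry by entry.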
Exercise 7. How many symmetry operations are there on an equilateral
triangle? On a square?
Exercise 8. Try to make a multiplication table for the 6-element group of
symmetries of the equilateral triangle and for the 8-element group of sym-
2.1. Moves and twists vs. the group operation on R. The most
important class of examples for us (since we’re supposed to be fixated on
Rubik’s Cube as we read this) comes from certain sets of permutations which
also form groups. Since a permutation is just a rearrangement of objects, the
group operation is simply the concatenation of two such rearrangements.4 In
other words, if g is one rearrangement and h is another, then the rearrange-
ment that results from taking the set of objects and applying g to it, and
then applying h to the rearranged objects, is what is meant by g ∗ h.
To avoid a possible misunderstanding, when we speak about the Rubik’s
Cube group, the group members are move sequences and the group operation
is the act of doing one sequence followed by another sequence. At first
it’s easy to get confused if you think of rotating the front face as a group
operation. The term “move sequence” above is not exactly right either – move
sequences that have the same final result are considered to be the same. For
an easy example, F and $F^5$ are the same group element.
Definition 2. The Rubik’s Cube group R is the set of all possible permuta-
tions of the facelets achievable by means of a finite number of twists of the
cube faces. To combine two of these permutations, we simply apply one set
of twists after the other.
This, of course, is a huge group.
3 Compare with the Group Theory I session, where rows and columns are reversed.
4 Warning: This session uses the notation of multiplication from left to right, i.e., gh
means apply first g and then h. This is in contrast with the right-to-left notation in the
Group Theory session, where gh in “action groups” means apply first h and then g.
1 2 3 4 5 6
4 1 6 2 5 3
5 4 2 6 1 3
From now on we’ll omit the “∗” operator and simply place the permuta-
tions to be multiplied next to each other.
Exercise 10. Verify the following product of permutations of {1, 2, . . . , 9}:
(1 2 3)(4 5)(6 7 8 9)(2 5 6)(4 1)(3 7) = (1 5)(2 7 8 9)(3 4 6).
Practice multiplying together other pairs of permutations.
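For practice with many pairs, it helps to let a computer check your answers. The following minimal sketch multiplies permutations left to right (apply the first factor, then the second), as in this session; the helper names are ours.

```python
def cycles_to_map(cycles, n):
    """Turn a list of disjoint cycles into a dict i -> image of i on {1..n}."""
    mapping = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, x in enumerate(cycle):
            mapping[x] = cycle[(pos + 1) % len(cycle)]
    return mapping

def multiply(g, h):
    """Left-to-right product: apply g first, then h."""
    return {i: h[g[i]] for i in g}

def map_to_cycles(mapping):
    """Write a permutation as disjoint cycles, omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(mapping):
        if start in seen or mapping[start] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = mapping[x]
        cycles.append(tuple(cycle))
    return cycles

# The two factors from Exercise 10.
g = cycles_to_map([(1, 2, 3), (4, 5), (6, 7, 8, 9)], 9)
h = cycles_to_map([(2, 5, 6), (4, 1), (3, 7)], 9)
product = map_to_cycles(multiply(g, h))
print(product)
```

Swapping the arguments of `multiply` shows what the right-to-left convention would give instead.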
Exercise 11. If a permutation is expressed in cycle notation where each of
the permuted objects appears in a single cycle, show that the inverse of that
permutation can be obtained by reversing the order of the elements in each
cycle. For example, $\big((1\,4)(3\,5\,2)\big)^{-1} = (4\,1)(3\,2\,5)$, where (4 1) = (1 4).
As we noticed when we looked at permutations of the facelets of Rubik’s
Cube, the order makes a difference: (1 2)(1 3) ≠ (1 3)(1 2) since (1 2)(1 3) =
(1 2 3) and (1 3)(1 2) = (1 3 2). And indeed, here the object 1 is shared by
both cycles, preventing them from commuting with each other (why?).
∗  | 1  fA fB fC r1 r2
---+------------------
1  | 1  fA fB fC r1 r2
fA | fA 1  r2 r1 fC fB
fB | fB r1 1  r2 fA fC
fC | fC r2 r1 1  fB fA
r1 | r1 fB fC fA r2 1
r2 | r2 fC fA fB 1  r1
Exercise 12. Find an isomorphism between the group corresponding to addition
modulo 6 and multiplication modulo 7. Is there more than one isomorphism,
i.e., more than one way to make the multiplication tables identical?
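A candidate isomorphism can be checked by machine. The sketch below (our own, and something of a spoiler for Exercise 12) uses the observation that a map respecting the operations is determined by where it sends the generator 1 of the additive group: if 1 goes to r, then k must go to r^k modulo 7.

```python
def is_isomorphism(phi):
    """phi maps {0..5} under addition mod 6 to {1..6} under multiplication mod 7."""
    bijective = sorted(phi.values()) == [1, 2, 3, 4, 5, 6]
    respects_operation = all(
        phi[(a + b) % 6] == (phi[a] * phi[b]) % 7
        for a in range(6) for b in range(6)
    )
    return bijective and respects_operation

# Candidate maps k -> r**k mod 7, one for each possible image r of the generator 1.
candidates = {r: {k: pow(r, k, 7) for k in range(6)} for r in range(1, 7)}
isomorphisms = [r for r, phi in candidates.items() if is_isomorphism(phi)]
print(isomorphisms)
```

The list printed at the end contains every value of r for which the map k → r^k is an isomorphism, which also answers the question of how many isomorphisms there are.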
3. PROPERTIES OF GROUPS AND THEIR SUBGROUPS 31
Exercise 13. Show that the group consisting of the 4 symmetries of an
ellipse with different length axes (described in Exercise 6) is not isomorphic
to the group corresponding to addition modulo 4.
2.5. Part of the whole may be all you need. A permutation group does
not have to include all possible permutations of the objects. If we consider
the Rubik group R as a permutation group on the cubies, there is obviously
no permutation that moves an edge cubie to a corner cubie and vice-versa.
The group consisting of the complete set of permutations of three objects
shown in Table 1 contains various proper subsets that also form groups using
the same operation, but limited to that subset:
{1}, {1, (1 2)}, {1, (1 3)}, {1, (2 3)}, and {1, (1 2 3), (1 3 2)}.
Definition 4. The subsets of groups that are themselves groups under the
same operation are called subgroups.
For example, the above subsets are recognizable as the trivial subgroup and
the four subgroups generated each by a reflection or a rotation of the equi-
lateral triangle. The group in which we are most interested here, the Rubik’s
Cube group R, is itself a subgroup of the group of all permutations of 48
items. We will examine the properties of subgroups in the following section.
Exercise 14. If H and K are both subgroups of the same group G, then
H ∩ K is also a subgroup of G. In other words, the intersection of any two
subgroups of G satisfies all the group properties from Definition 1.
Using as an example the symmetric group on three objects displayed in
Table 1, the order of (1 2) is 2, the order of (1 2 3) is 3, and both 2 and
3 divide 6, the order of the group. The proper subgroups of the symmetric
group listed in Section 2.5 have orders 1, 2, and 3 – again, all are divisors of
6, as they must be. Any pair of subgroups in that list only have the identity
element in common, so clearly the intersection of any two of them is also a
group, although in these cases it is the trivial group.
Exercise 15. Consider the symmetric group G on 4 objects: the group of
order 4! = 24 that consists of all the permutations of 4 objects. Let H be
the subset of G made of all permutations that leave the element 1 fixed (but
with no further restrictions), and let K be the subset of permutations that
leave 2 fixed. List the elements of H, K, and their intersection H ∩ K, and
verify that all three subsets are indeed subgroups of G.
Answers: For the three subsets, we have:
H = {(1), (2 3), (2 4), (3 4), (2 3 4), (2 4 3)},
K = {(1), (1 3), (1 4), (3 4), (1 3 4), (1 4 3)},
H ∩ K = {(1), (3 4)},
illustrating that the intersection of two subgroups is also a subgroup (and in
this case, it is the set of all permutations that leave both 1 and 2 fixed). ♦
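The answers can be double-checked by listing all 24 permutations. In this sketch a permutation of {1, 2, 3, 4} is stored as a tuple of images, and we rely on the standard fact that a non-empty finite subset closed under the operation is automatically a subgroup; the function names are ours.

```python
from itertools import permutations

# A permutation of {1, 2, 3, 4} is stored as a tuple p, with p[i-1] the image of i.
G = set(permutations(range(1, 5)))

def multiply(g, h):
    """Left-to-right product: apply g first, then h."""
    return tuple(h[g[i] - 1] for i in range(4))

def is_subgroup(S):
    """For a non-empty finite subset, closure under the operation suffices."""
    return bool(S) and all(multiply(g, h) in S for g in S for h in S)

H = {p for p in G if p[0] == 1}   # permutations fixing 1
K = {p for p in G if p[1] == 2}   # permutations fixing 2
HK = H & K                        # permutations fixing both 1 and 2

print(len(G), len(H), len(K), len(HK))
print(is_subgroup(H), is_subgroup(K), is_subgroup(HK))
```

The sizes printed in the first line can be compared with the lists of H, K, and H ∩ K given in the answers.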
It is easy to see why Theorem 1(c) is true for the full symmetric groups:
Problem 3. If G is the symmetric group on several objects, then the order
of any permutation in G has to divide the order of G.
Sketch: As we saw in Part I, we can write down any particular permuta-
tion as a set of (disjoint) cycles, and the order of that permutation is simply
the least common multiple (lcm) of the cycle lengths (why?). Since there are
n elements that are moved by the permutations, the longest cycle can have
length at most n, so all the cycle lengths are thus n or less. But the order
of the group is n!, and clearly the lcm of a set of numbers no greater than n will
divide n! (why?). ♦
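The sketch can be tested on a small symmetric group. The code below (our own illustration, with permutations of {0, ..., n − 1} stored as tuples of images) computes each order as the lcm of the cycle lengths and checks divisibility by n! for n = 5.

```python
from itertools import permutations
from math import factorial, gcd

def cycle_lengths(p):
    """Cycle lengths of a permutation p of {0..n-1}, given as a tuple of images."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        lengths.append(length)
    return lengths

def order(p):
    """Order of p = least common multiple of its cycle lengths."""
    result = 1
    for length in cycle_lengths(p):
        result = result * length // gcd(result, length)
    return result

n = 5
orders = sorted({order(p) for p in permutations(range(n))})
print(orders)
print(all(factorial(n) % k == 0 for k in orders))
```

Note that an order larger than n can occur (a 2-cycle together with a disjoint 3-cycle), which is why the argument needs the lcm and not just the longest cycle.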
3.2. A few proper subgroups of the Rubik group. Since the center
cubies always remain in the same position relative to the others, we will
always consider the cube to be oriented in a specific way (say, with the white
face up and the green face on the left). We consider to be moves only those
operations that twist a face relative to the others, so rotating the entire cube
as a unit is not a move we will consider. With a real cube, it is sometimes
interesting to think about “slice moves” where, say, the top and bottom face
are left in position and the center slice between them is turned (cf. the “slice
the harder problem. One way to simplify Rubik’s Cube is to consider only
a subset of moves as being allowable and to learn to solve cubes that were
jumbled with only those moves. If you do this, you are effectively reducing the
number of allowable permutations, but you will still be studying a subgroup
of the full Rubik group.
The Rubik program contains a “macro gizmo” to make this easier. Fig-
ure 3b shows the gizmo with two macros defined: one that does the F op-
eration twice and one that does the L operation twice. To perform the FF
macro, simply click on the button marked “FF”. The help file for the Ru-
bik program describes how to define macros and include them in the macro
gizmo. If you’d like to investigate the positions achievable by a limited set of
moves, define each of the moves as a macro and put all of them in the macro
gizmo. Then make moves from an initialized cube using only macro gizmo
entries. In fact, if you place the macro gizmo on top of the control panel of
Rubik, you will not press any other buttons by accident. If you restrict your
moves to any of these subgroups, the cube will be easier to solve.
3.2.2. Examples you can do in practice. The list below is a tiny subset of
the total number of subgroups of the whole group, but these are “practical”
examples: you can experiment with a real cube making only the moves in
the indicated subgroups. Explore and describe, as much as you can, features
of these subgroups, e.g., try to calculate the order of the subgroup, to decide
whether it is abelian or not, cyclic or not, whether it looks like another group
you know, etc. Do not look at the commentaries after the exercises until you
have thought about the subgroups for a while. (In Part III, we will examine
in detail more general but less practical subgroups of R.)
Exercise 16. (Single face subgroup) In this subgroup of R, you are only
allowed to move a single face.
Hint: This group is not very interesting, since there are only 4 achievable
positions including “solved,” but it still is a proper subgroup of R. ♦
Exercise 17. (Two opposite faces subgroup) In this subgroup of R, you
are allowed to move only two opposite faces.
Hint: This is also a fairly trivial group since twists of two opposite faces
are independent. Still, it has 16 elements and is an example of what is
known as a direct product group. Beware: if you are allowed to turn two
adjacent faces, the subgroup is enormous: it contains 73,483,200 members,
the calculation of which is beyond the scope of this session. ♦
Exercise 18. (F-L half-turn subgroup) In this subgroup of R, you are
allowed to move either the front face or the left face by half-turns.
Solution: In Figure 4 we see all 12 cube positions in the subgroup gen-
erated by FF and LL. Since applying FF or LL twice in a row brings us to
the previous position, the 11 positions different from the solved position are:
FF, FFLL, FFLLFF, FFLLFFLL, . . . , $(\mathrm{FFLL})^5\,\mathrm{FF}$, arranged in that order in the
figure. The final position in the lower-right corner of the figure will return
to the solved position with one more application of LL.
and it is not hard to prove that the pattern continues. This shows that any
n-cycle can be expressed as a product of 2-cycles. If n is even, there are
an odd number of 2-cycles and vice-versa. Since every permutation can be
expressed as a set of disjoint cycles, this means that every permutation can
be expressed as a product of 2-cycles. For example:
(1 4 2)(3 5 6 7)(9 8) = (1 4)(1 2)(3 5)(3 6)(3 7)(9 8).
Obviously, there are an infinite number of ways to express any particular
permutation as a product of 2-cycles:
(1 2 3) = (1 2)(1 3) = (1 2)(1 3)(1 2)(1 2) = (1 2)(1 3)$(1\,2)^4$ = · · · .
But it turns out that there is one big restriction in such representations:
Theorem 2. For any given permutation, the number of 2-cycles necessary
to represent it is either always even or always odd.
For this reason, we can say that
Definition 5. A permutation is either even or odd, depending on whether
its representation requires an even or an odd number of 2-cycles.
Theorem 2 is not too hard to prove, as long as one is willing to allow
some polynomial algebra to sneak into our discussion.⁵
Proof: Consider a permutation of the set {1, 2, . . . , n} that moves 1 to x_1,
2 to x_2, 3 to x_3, and so on. All the x_i’s are different, and they represent
exactly the numbers from 1 to n in some order. Now construct the product:
\[ \prod_{1 \le j < i \le n} (x_i - x_j) = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1)(x_3 - x_2) \cdots (x_n - x_{n-1}), \tag{1} \]
where we simply multiply all differences between the x_i’s, always putting the
larger index first. If you have never seen the Π-product notation before, the
Greek symbol Π (pi) in front indicates a collection of things to be multiplied.
In the example above, it means to multiply together all possible terms of the
form (x_i − x_j) where 1 ≤ j < i ≤ n. It is similar to the Σ-notation for
summation, if you have seen that before. If you find it easier to understand,
the product notation above has the following alternate representation where
both i and j step up one at a time:
\[ \prod_{1 \le j < i \le n} (x_i - x_j) = \prod_{j=1}^{n-1} \left( \prod_{i=j+1}^{n} (x_i - x_j) \right). \]
Since all the x_i’s are different, every term (x_i − x_j) in the product
is non-zero because i ≠ j, and so the total product itself is also non-zero. Since in
each term the value of x_i may be greater than or less than x_j, the individual
terms, and hence the product, may be positive or negative.
Definition 6. If the product (1) is negative, we will call the permutation
odd, and if the product is positive, we will call it even.
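The sign of product (1) is easy to compute directly. In the sketch below (names are hypothetical), the list `images` holds x_1, x_2, ..., x_n, and only the sign of each factor is tracked, since the size of the product is irrelevant to the definition.

```python
from itertools import combinations

def parity_by_product(images):
    """Return 'even' or 'odd' from the sign of the product of all (x_i - x_j), j < i."""
    sign = 1
    for j, i in combinations(range(len(images)), 2):  # every pair with j < i
        if images[i] - images[j] < 0:
            sign = -sign
    return "even" if sign > 0 else "odd"

print(parity_by_product([1, 2, 3]))  # identity: even
print(parity_by_product([2, 1, 3]))  # the 2-cycle (1 2): odd
print(parity_by_product([2, 3, 1]))  # the 3-cycle (1 2 3): even
```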
⁵ Theorem 2 is approached differently in the Group Theory I session.
The reader should verify that this argument can be modified to work for
any 2-cycle (a b) in place of (1 2).
Returning to the proof of Theorem 2, we utilize the following well-known:
PST 12. If a property has been proven for a small step (e.g., multiplying
by a 2-cycle changes the parity of the permutation), break the full process
into a sequence of analogous steps and apply the property at each step.
Now, recall that any permutation ρ can be written as a product of 2-
cycles. Thus, we can build up, step by step, from the identity to ρ, multi-
plying by a 2-cycle and changing the parity at each step. This means that
the two definitions will yield the same parity for ρ. Since our alternative
Definition 6 using the product Π is independent of the particular way we
write ρ as a product of 2-cycles, then it doesn’t matter which and how many
particular 2-cycles we have multiplied to get ρ: the number of such 2-cycles
will always be odd for ρ, or will always be even for ρ. ♦
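The step-by-step argument can be watched in action. The following sketch (function names and the random seed are illustrative choices) starts from the identity, multiplies by random 2-cycles, and records the sign of product (1) after every step; each swap should flip the sign.

```python
import random
from itertools import combinations

def sign(images):
    """Sign of the product of all differences (x_i - x_j) with j < i."""
    s = 1
    for j, i in combinations(range(len(images)), 2):
        if images[i] < images[j]:
            s = -s
    return s

def signs_after_random_swaps(n, steps, seed=0):
    """Apply `steps` random 2-cycles to the identity and list the sign after each."""
    rng = random.Random(seed)
    images = list(range(1, n + 1))
    signs = [sign(images)]
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)  # two distinct positions to swap
        images[a], images[b] = images[b], images[a]
        signs.append(sign(images))
    return signs

signs = signs_after_random_swaps(n=7, steps=10)
print(signs)  # alternates +1, -1, +1, ...
```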
Embedded in our proof was an old math “trick”:
PST 13. If a definition depends on making choices and can thereby, hy-
pothetically, yield different answers, find another way to define the same
concept that is independent of choices.
In our discussion above, the original Definition 5 of parity of a permuta-
tion depended on the specific decomposition of the permutation as a product
of 2-cycles, while the alternative Definition 6 using the product Π did not
depend on any choices. We showed that the two definitions are equivalent.
5.1. Parity and the cubies. We know that every possible permutation
of the cube can be achieved by some combination of single clockwise turns
of the faces, and it is also easy to see that:
Exercise 20. Every face turn has even parity with respect to the movements
of the cubies.
Proof: The cycle structure for a single clockwise quarter-turn, say, of the
front face is this:
(FL FU FR FD)(FUL FRU FDR FLD),
which clearly has even parity since each of the 4-cycles can be written as a
product of three 2-cycles for six total 2-cycles, making the parity even.
This means that there is no combination of moves of the cube that will
exchange a single pair of cubies because that would correspond to an odd
permutation of the cubies.
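The bookkeeping in this proof amounts to adding up (length − 1) over the cycles, which Theorem 2 allows us to use as a parity test. A minimal sketch, with a hypothetical helper name and the cycle labels taken from the proof above:

```python
def transposition_count(cycles):
    """An n-cycle can be written with n - 1 two-cycles; add this up over all cycles."""
    return sum(len(cycle) - 1 for cycle in cycles)

front_turn = [("FL", "FU", "FR", "FD"), ("FUL", "FRU", "FDR", "FLD")]
single_swap = [("FL", "FU")]

for name, cycles in [("front quarter-turn", front_turn), ("single swap", single_swap)]:
    count = transposition_count(cycles)
    print(name, count, "even" if count % 2 == 0 else "odd")
# front quarter-turn 6 even
# single swap 1 odd
```

Taken separately, the edge 4-cycle and the corner 4-cycle are each odd; only their combination is even.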
As we shall see later, a cycle of three cubies of the same kind is possible,
or an exchange of two pairs, both edges, both corners, or one of each. If the
goal of solving Rubik’s Cube were simply to get the corner cubies and edge
cubies into their correct positions but not to worry about whether they were
oriented correctly, then if you were to break the cube apart and reassemble
it at random, on average half of your re-assemblies would result in a solvable
cube. The expected solution of Rubik’s Cube does require that you get the
orientations of the edge and corner cubies correct, and it turns out that there
are additional restrictions on these orientations, which we study below.
5.2. Parity and the edge cubies. Let’s consider first the edge cubies.
We will see that they, too, satisfy a parity condition:
Problem 6. An even number of the edge cubies must be flipped.
Proof: Imagine a cube in outer space held such that the center cubies stay
fixed as the other cubies turn around them. If you imagine a set of three-
dimensional coordinate axes whose origin is at the center of the cube and
such that each axis goes through the center of a pair of center cubies, then
for each axis, there are four edge cubies whose outer edges are parallel to that
axis: these four edges are determined by a slice of the cube perpendicular to
the chosen axis. Further, each axis has a positive and a negative direction.
Let us mark the outer edge of each cubie with an arrow that is aligned with
the positive direction of the axis parallel to it in the solved configuration.
At any stage, you can look at the arrows on each edge cubie’s outer
edge to see if they are aligned with their current axis. The figure on the
right illustrates a 90° rotation Fccw: the outer arrow configuration on the
left will be converted to the arrow configuration on the right. In this
case, exactly two of the arrow directions are flipped.
[Figure: the four edge arrows of the front face, marked +1 or −1, before and after the turn Fccw.]
Now, next to any edge arrow write +1 if it is aligned with the positive
direction of an axis, and −1 otherwise, and multiply these four numbers for
the turning face. In the above example, the products before and after the
face turn are both +1. In general, they will always be equal. Indeed, look
at the arrows on two opposite edges: they keep the same orientation relative
to each other before and after the turn, because if they were pointing
in the same direction before, they would be pointing in the same direction now;
if they were pointing in opposite directions before, they would be pointing
in the opposite directions now. This means that the product of numbers on
a pair of opposite edges does not change after the turn. Hence the whole
product of the four edge numbers on the turning face remains the same,
implying that an even number of arrows must have flipped their direction.
Thus every turn of a face will flip an even number of arrows, so at any
stage, an even number of the edge cubies will be flipped since in the original
configuration zero of them were flipped.
Consequently, it is impossible with any number of twists to flip exactly
one edge cubie in place.
5.3. Rotations and the corner cubies. The corner cubies satisfy a
slightly different condition. For each corner cubie, mark its three facelets by
1, 2, and 3 so that when you look at that cubie from the outside (along a
line through that corner toward the center of the cube), you will see “123”
marked in a clockwise direction. Obviously, for each cubie, there are 3 such
possible labellings, rotated from each other by 120° or 240°. As we will see,
[Figure 5. Labellings 1, 2, 3 of the facelets of the corner cubies; panel (a) shows the solved cube.]
After any move sequence on the Rubik’s Cube, we trace how the labellings
of the corner cubies have changed with respect to the initial fixed labellings
in the solved cube in Figure 5a. There are three possibilities for a corner
cubie: its labelling “123” went to a place with original labelling “123”, in
which case we say that the cubie was rotated by 0°; if “123” went to “231”, the
cubie was rotated by 120°; and if “123” went to “312”, the cubie was rotated
by 240° (always clockwise).
Problem 7. The total rotation of all eight corner cubies is zero, meaning
the sum of the rotation degrees for all the corner cubies is a multiple of 360°.
Proof: To see this, we can again look at what a single face turn does.
If every face turn preserves this condition, then so will any combination of
them. Obviously, the four corner cubies of the opposite face (that is not
turned) are literally untouched by the face-twist, and hence we need to show
only that the total rotation for the four cubies on the twisting face is zero.
⁷ To change the orientation to counterclockwise would imply that one facelet’s number
remains fixed, while a reflection switches the other two facelets’ numbers – a physically
impossible situation with a corner cubie.
5.4. The final countdown. We are now in a position to count the total
number of configurations that can be reached from a solved cube. First,
Exercise 21. How many configurations can be constructed with no con-
straints, i.e., if you pop the cube apart with a screwdriver, in how many
ways can you put it together?
Solution: There are 8 possible locations for each corner cubie, and if all
arrangements were possible, there would be 8! rearrangements. Similarly,
there are 12! rearrangements of the edge cubies. Each corner cubie could be
in any of 3 rotations, so there are 3^8 ways of aligning the corner cubies, and
similarly there are 2^12 flipping configurations of the edge cubies. The grand
total of configurations is thus: 8! · 12! · 3^8 · 2^12.
But we know better than to think all of these rearrangements are possible:
in this section we discovered constraints on the cubies’ moves!
Proof: Of the 8! · 12! · 3^8 · 2^12 configurations, only 1/3 will have the rotations
of the corner cubies right (by Problem 7), only 1/2 of those will have the
edge-flipping parity right (by Problem 6), and only 1/2 of those will have the
correct even parity of the total cubie rearrangement (by Exercise 20). Thus
the total number of reachable configurations from a solved cube is at most:
(8! · 12! · 2^12 · 3^8)/(3 · 2 · 2) = 8! · 12! · 2^10 · 3^7 = 43,252,003,274,489,856,000.
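The arithmetic is quick to confirm with exact integer arithmetic. The variable names below are illustrative only.

```python
from math import factorial

# Every way of reassembling the cubies, ignoring all constraints.
unconstrained = factorial(8) * factorial(12) * 3**8 * 2**12

# Corner rotation (1/3), edge flipping (1/2), and permutation parity (1/2).
reachable_bound = unconstrained // (3 * 2 * 2)

print(f"{unconstrained:,}")
print(f"{reachable_bound:,}")  # 43,252,003,274,489,856,000
print(reachable_bound == factorial(8) * factorial(12) * 2**10 * 3**7)  # True
```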
Are all of these rearrangements actually achievable? Obviously, we won’t
attempt to show separately that each and every one of these rearrangements
is possible. We should group them in a clever practical way in order to
minimize the work we have to do.
Before we comment on the solution to this problem, let’s see what other
moves are necessary to solve the Rubik’s Cube. Suppose now we have man-
aged to place all cubies in their right positions in the cube, except for possibly
their orientations.
Problem 9. We can perform an operation on the Rubik’s Cube that:
(a) simultaneously changes the orientation of any two edge cubies; in fact,
doing this for two adjacent edge cubies will suffice. Adjacent edge cubies
have only one corner between them, e.g., UF and UR.
(b) rotates one corner cubie one way by 1/3 and another corner cubie the
other way by 1/3; in fact, doing this for two adjacent corner cubies will
suffice. Adjacent corner cubies have only one edge cubie between them,
e.g., ULF and URF.
the Rubik’s simulator by defining macros and substantially speeding up the
process. In fact, the reader who has been exploring the Rubik’s simulator
will know where to find these specific macros already defined!
Problem 10. Using the five operations above, we can solve the Rubik’s Cube.
Proof: If the corner cubies need an odd permutation to get to their proper
places, then the edge cubies will also need an odd permutation (why?). In
such a case, the algorithm in Problem 8(c) will swap two corners and two
edges, which will make the corner and edge permutations both even. Now,
using the results from Problem 8(a)-(b) we can put the corner cubies and
edge cubies in their proper places.
From now on we will not permute the cubies – we will only change
their orientation in place to make them fit the solved Rubik’s Cube. Start
from any two edge cubies whose orientations are not correct (i.e., their two
facelets do not match the colors of the adjacent center cubies), and
use the algorithm in Problem 9(a) to flip simultaneously the orientations of
these edge cubies to the correct ones. Keep repeating the process for any
remaining pairs of incorrectly oriented edge cubies. Suppose that in the
end, there is only one incorrectly oriented edge cubie left (and so we cannot
apply the algorithm to it as that would disturb another, correctly oriented
edge cubie). If that situation were possible, then it would also be possible
from a solved cube to do a sequence of moves that results in changing the
orientation of only one edge cubie, a contradiction with Problem 6! Hence,
at the end of the process, all edge cubies will be correctly oriented.
As for the corner cubies, start from any two corner cubies with incorrect
orientations. Apply the algorithm in Problem 9(b) to rotate them 1/3 one
way or the other, making sure that you are rotating at least one of them into
its correct orientation. Keep repeating the process for any remaining pairs
of incorrectly oriented corner cubies. Again, if in the end there is only one
incorrectly oriented corner cubie left, this would mean that from a solved
cube there is a sequence of moves resulting in a total rotation of 120° or
240° (given by the incorrectly oriented corner cubie), a contradiction with
Problem 7. Hence, in the end, all corner cubies must have been oriented
correctly . . .
. . . and the Rubik’s Cube is solved!
6. Conclusions
In reality, of course, no one applies the above method, unless they have
almost unlimited time and patience on their hands. Think about how many
moves it would take to just flip two edge cubies’ orientations (Problem 9(a)
requires 15 moves), then multiply this by the number of pairs of incorrectly
oriented edge cubies (up to 6 pairs), and you will still be a long way from
solving the cube!
∗ e ρ ρ2 ρ3 φ φρ φρ2 φρ3
e e ρ ρ2 ρ3 φ φρ φρ2 φρ3
2 3 3
ρ ρ ρ ρ e φρ φ φρ φρ2
ρ2 ρ2 ρ3 e ρ φρ2 φρ3 φ φρ
3 3 2 2 3
ρ ρ e ρ ρ φρ φρ φρ φ
2 3 2
φ φ φρ φρ φρ e ρ ρ ρ3
φρ φρ φρ2 φρ3 φ ρ3 e ρ ρ2
φρ2 φρ2 φρ3 φ φρ ρ2 ρ3 e ρ
3 3 2 2 3
φρ φρ φ φρ φρ ρ ρ ρ e
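A table of this shape can be regenerated by machine. The sketch below assumes, purely for illustration, that ρ is a quarter-turn of the four corners of a square and φ is a flip of that square; the source gives only the table, so this model and all names in the code are assumptions.

```python
E = (0, 1, 2, 3)
RHO = (1, 2, 3, 0)  # assumed model: quarter-turn of the corners of a square
PHI = (1, 0, 3, 2)  # assumed model: a flip of the square

def times(a, b):
    """The product a * b, modelled as 'do a, then b'."""
    return tuple(b[a[i]] for i in range(4))

def power(a, k):
    """a multiplied by itself k times (the identity when k = 0)."""
    result = E
    for _ in range(k):
        result = times(result, a)
    return result

labels = ["e", "ρ", "ρ^2", "ρ^3", "φ", "φρ", "φρ^2", "φρ^3"]
elements = [power(RHO, k) for k in range(4)]
elements += [times(PHI, power(RHO, k)) for k in range(4)]
name = dict(zip(elements, labels))

# Row a, column b holds the name of a * b.
table = {a: [name[times(x, y)] for y in elements] for a, x in zip(labels, elements)}

print("∗     ", " ".join(f"{b:6}" for b in labels))
for a in labels:
    print(f"{a:6}", " ".join(f"{entry:6}" for entry in table[a]))
```

The whole table follows from ρ^4 = e, φ^2 = e, and ρφ = φρ^3, so any model satisfying these three rules produces the same entries.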
Exercise 17. If we rotate the front and back faces, the “two opposite faces”
subgroup will have two cyclic “single face” subgroups F = {1, F, F^2, F^3}
and B = {1, B, B^2, B^3}, the elements of which will commute. As a result,
the total group will consist of 16 elements of the form F^k B^m where k, m =
0, 1, 2, 3. The group is commonly written as F × B, the direct product of
the two “single face” subgroups.
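The 16 elements can be listed explicitly. This sketch tracks cubie positions only; the front-face cycles are the ones used in Exercise 20, while the back-face labels and all function names are hypothetical.

```python
from itertools import product

F_CYCLES = [("FL", "FU", "FR", "FD"), ("FUL", "FRU", "FDR", "FLD")]
B_CYCLES = [("BL", "BU", "BR", "BD"), ("BUL", "BRU", "BDR", "BLD")]  # assumed labels

def turn(state, cycles):
    """Apply one quarter-turn, given as cycles, to a dict cubie -> position."""
    step = {}
    for cyc in cycles:
        for here, there in zip(cyc, cyc[1:] + cyc[:1]):
            step[here] = there
    return {cubie: step.get(pos, pos) for cubie, pos in state.items()}

def position(k, m):
    """The position reached from the solved cube by F^k B^m."""
    cubies = [c for cyc in F_CYCLES + B_CYCLES for c in cyc]
    state = {c: c for c in cubies}
    for _ in range(k):
        state = turn(state, F_CYCLES)
    for _ in range(m):
        state = turn(state, B_CYCLES)
    return tuple(sorted(state.items()))

distinct = {position(k, m) for k, m in product(range(4), repeat=2)}
print(len(distinct))  # 16
```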
Problem 8. (a) The macro M1 = UffurdlffLDR will perform the 3-cycle
of edge cubies on the front face: FR→FL→FU (cf. Fig. 6a). To see why
this is sufficient, pick any three edge cubies. It is straightforward to find a
sequence S1 of moves that lands all three cubies on the same face. Then we
can apply our macro M1 to cycle the three edge cubies, and finally we can
apply the inverse of S1 to return them to their original positions, but now
shifted in a cycle. The resulting total sequence of moves S1 M1 S1^{-1} is called
the conjugation of M1 by S1 : a common operation in abstract algebra. ♦
(b) The macro M2 = fUBuFUbu will perform the 3-cycle of corner cubies
on the top face: ULF→ULB→URF (cf. Fig. 6b). The same idea of conjugating
M2 by a sequence S2 that moves any three corner cubies onto the same face
will work here to cycle these cubies: S2 M2 S2^{-1}. ♦
(c) The macro M3 = rURurUFRbRBRfRR will perform the simultaneous
swap of edge cubies UL↔UF and of corner cubies ULF↔URF (cf. Fig. 6c).
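The effect of conjugation in parts (a) and (b) can be illustrated with permutations of positions. In this sketch M1 is the 3-cycle from part (a), while S1 is a made-up stand-in for a setup sequence that brings the edges at BD and UB onto the front face; it is not an actual move sequence from the text, and the helper names are hypothetical.

```python
def then(*perms):
    """Compose permutations (dicts of moved points), applied left to right."""
    def image(x):
        for p in perms:
            x = p.get(x, x)
        return x
    points = {k for p in perms for k in p}
    return {k: image(k) for k in points if image(k) != k}

def inverse(p):
    """The permutation that undoes p."""
    return {v: k for k, v in p.items()}

M1 = {"FR": "FL", "FL": "FU", "FU": "FR"}              # 3-cycle FR -> FL -> FU
S1 = {"BD": "FR", "FR": "BD", "UB": "FL", "FL": "UB"}  # hypothetical setup moves

conjugate = then(S1, M1, inverse(S1))
print(conjugate)  # a 3-cycle on BD, UB, FU; FR and FL end up where they started
```

The conjugate has the same cycle shape as M1 but acts on the cubies that S1 moved into place.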
Exercise 24. The longest paths connect diametrically opposite cubies; e.g.,
for edge cubies: FR,UF,UL,BL, and for corner cubies: URF,ULF,DLF,DLB. ♦
Problem 9. (a) The macro M4 = FRBLUlUbrfluLu will flip in place the ad-
jacent edge cubies UF and UL, thus changing their orientations (cf. Fig. 6d).
If A and B are now two arbitrary edge cubies, take a path of adjacent
edge cubies from A to B, e.g., A, A1, A2, B, and apply the above macro to flip
the orientations on {A, A1}, then on {A1, A2}, and finally on {A2, B}. Along
the way, the orientations of the middle cubies A1 and A2 were flipped twice
and hence did not change, while the orientations of A and B did change.
(b) The macro M5 = LdlfdFUfDFLDlu will rotate in place the corner
cubies UFL 1/3 counterclockwise and UFR 1/3 clockwise (cf. Fig. 6e). Analogously
to the above, take any path of adjacent corner cubies and apply macro
M5 to every pair along the path: this will turn in place all middle cubies 1/3
counterclockwise and then 1/3 clockwise, i.e., will fix them, while the first
and last corner cubies on the path will be rotated as desired.
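Both path arguments are simple bookkeeping: flips add up modulo 2 and corner twists add up modulo 3. The sketch below records only these totals (it does not simulate the macros M4 and M5 themselves), and the cubie names along the paths are placeholders.

```python
def flip_along_path(path):
    """Flip both cubies of every consecutive pair; return total flips modulo 2."""
    flips = {cubie: 0 for cubie in path}
    for a, b in zip(path, path[1:]):
        flips[a] ^= 1
        flips[b] ^= 1
    return flips

def twist_along_path(path):
    """For every consecutive pair, twist one cubie clockwise (+1) and the other
    counterclockwise (-1); return total twists in thirds of a turn, modulo 3."""
    twist = {cubie: 0 for cubie in path}
    for a, b in zip(path, path[1:]):
        twist[a] = (twist[a] + 1) % 3
        twist[b] = (twist[b] - 1) % 3
    return twist

print(flip_along_path(["A", "A1", "A2", "B"]))   # {'A': 1, 'A1': 0, 'A2': 0, 'B': 1}
print(twist_along_path(["C", "C1", "C2", "D"]))  # {'C': 1, 'C1': 0, 'C2': 0, 'D': 2}
```

Only the two ends of each path end up changed, whatever the length of the path.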
Session 3
Knotty Mathematics
Maia Averett
1.1. History and cheating. A long time ago in the region known today as
Turkey, the historic kingdom of Phrygia had no king. Its people sought the
advice of an oracle, who decreed that the next man to enter their city driving
an ox-cart should be their king. Soon thereafter, a poor peasant Gordius and
his wife wandered into the city with an ox-cart and the Phrygians declared
Gordius their king. In his gratitude to the gods, Gordius dedicated his cart
to Zeus and tied it to a pole in the acropolis with a complex and intricate
knot that became known as the Gordian Knot.
Over time, the lore surrounding the knot grew and
grew into the legend of Gordius, which said that the
person who could unravel the knot would rule all of
Asia. The Gordian Knot resisted all attempts to untie
it until 333 BCE, when Alexander the Great visited the
city. After searching unsuccessfully for the ends of the
rope, he boldly cut through the knot with a stroke of
his sword.¹ Alexander the Great went on to conquer
all of Asia, fulfilling the prophecy.
¹ Depicted on the right in The Story of the Greeks, by Helena A. Guerber [34].
But wait, did he cheat? Should he be allowed to cut the knot? Perhaps
the puzzle was truly impossible and the knot could not be untied without
cutting the rope to expose the ends. After all, we can always untie a rope
that’s knotted as long as the ends are still free. It might be quite difficult,
but with enough wriggling and pulling, it’s always possible! Perhaps the
Gordian knot had its ends spliced together and Alexander the Great had to
cut it in order to untie it! Or maybe it had its ends spliced together, but it
was still possible to untie it without cutting it. How can we know?
The mathematical branch of knot theory can help us answer this question.
It is a wonderful part of mathematics, full of pictures and silly words like
flype, writhe, and quandle, which represent actual knot theory concepts but
couldn’t be fitted in this short chapter. Do not fret: there will still be plenty
of pictures to justify the choice of such picturesque words.
1.3. The art and science of drawing knots. The first thing we need to
do in order to make sense of knots is to figure out a good way of representing
them. Playing with ropes is fun, but it’s not very useful for attacking a
problem systematically. Instead, we think of knots as represented by knot
diagrams, which are just drawings of the knot on paper so that we can easily
see what it looks like. Here is an example:
Of course, there are many, many (infinitely many, even!) different diagrams
that represent the same knot. For instance,
Exercise 1. (Warm-up) Convince yourself that the two diagrams in Fig-
ure 2 represent the same knot, called the right-handed trefoil.²
In drawing a knot diagram, the most important thing is that you should
be able to reconstruct your knot from the information you draw; so your
diagram has to be good enough to do this. In particular, it should be clear
which string goes over at each crossing, and there shouldn’t be three strings
meeting at a crossing.
1.4. Getting knottier. Now that we’re at it, why should we limit ourselves
to having just one loop of string? We may as well allow ourselves to play with
objects that are made up of more than one circle of string; these are called
links, and the different pieces of string are called components. Of course, a
knot is a special kind of link: one that has only one component.
Figure 3 shows a few examples with their components drawn in different
colors. Again, instead of thinking of links as living in three-dimensional
space, we draw link diagrams flat on paper, leaving little gaps to indicate
the crossings.
² Why “right-handed”? Read the beginning of the Hints section.
1.5. Some famous knots and links. The simplest knot is the unknot, as
shown in Figure 4a. You can see where it gets its name! The next simplest
knot, the trefoil, follows in Figure 4b. This is the knot you’d get if you made
a regular overhand knot (like you were tying your shoelaces) and then put
the ends together. The remainder of Figure 4 portrays one other famous
knot (Figure 4c) and three famous links (bottom row).
Exercise 3. Why is the trefoil the next simplest knot, i.e., why aren’t there
knots (other than the unknot) with only 1 or 2 crossings? Draw pictures!
A Brunnian link is a link that falls completely apart if any one of its
components is cut. The Borromean rings in Figure 4f are an example of a
Brunnian link with three components.
Exercise 4. Find a Brunnian link with four components; and then with
five components. What is the pattern? Can you describe how to draw a
Brunnian link with n components?
2. Reidemeister and knot-eating machines
2.1. Reidemeister dance party. In order to study knots via their dia-
grams, we need a way to record on paper the wiggles that we might do to a
knot if it were actually made of rope. Reidemeister³ moves are operations on
knot or link diagrams that don’t change the knot or link represented by the
diagram. Reidemeister’s Theorem tells us that we only need three moves to
represent all possible knot wiggles. A fun way to think about Reidemeister’s
theorem is in terms of a knot (or link) dance party:
• the Reidemeister moves are the dance moves; and
• if a knot diagram K1 dances for a while and ends up looking like a knot
diagram K2 , then K1 and K2 represent the same knot!
Theorem 1. (Reidemeister’s Theorem) Two links are equivalent if and
only if they can be represented by diagrams that are themselves related by a
finite sequence of diagrams, each of which differs from the one before by one
of the following three moves, R1 , R2 , and R3 :
[Figure: the three Reidemeister moves R1, R2, and R3.]
The pictures on the left are zoomed in on one part of the link, showing
the strands before and after making the moves. The first move straightens
out a twist; the second separates overlapping strands; and the third moves
a strand above a crossing.
Each move has a few variations. For example, in the first move, the loop
might be on the left instead of the right; or, in the third move, the strand
might be entirely under instead of over the crossing.
It is intuitive (and true!) that the Reidemeister moves do not change
a knot. It is harder to see that these three types of moves are actually all
that you need in order to understand knot equivalence; but try to
convince yourself of that, too! For a proof of Reidemeister’s Theorem, we
direct you to Knot Theory by Livingston [49].
³ Kurt Werner Friedrich Reidemeister (1893-1971) produced over 70 mathematical
papers and books in differential geometry, combinatorial topology, combinatorial group
theory, logic, and philosophy, as well as in his dissertation field of algebraic number theory.
While at the University of Vienna, he learned from Wilhelm Wirtinger how to compute
the fundamental group of a knot from its projection. Soon after, he published important
papers in knot theory (Elementare Begründung der Knotentheorie [66] and Knoten und
Gruppen [65]) and his fundamental book Knotentheorie [67]. The Nazis considered him
“politically unsound” and forced him to leave his chair at the University of Königsberg in
1933. After World War II, Reidemeister was re-instated at the University of Marburg, at
Kurt Hensel’s chair.
Even to this day, Reidemeister moves are ubiquitous in knot theory research.
Now you will get a chance to apply to our main protagonists in Figure 4
the Reidemeister moves . . . as well as a seemingly “illegal” change-of-crossing
move, which will nevertheless prove quite revealing in sorting out knots.
Exercise 5. Use Reidemeister moves to go from one to the other trefoil
diagram in Exercise 1.
Many recreational and serious math problems may look hard because the
problem solver faces the “end” of a procedure that must be undone:
Exercise 7. Start with a picture of an unknot and apply five Reidemeister
moves on it to make it look complicated. Give it to a friend and have him/her
try to untangle it using Reidemeister moves. If you want to challenge your
friend further, repeat the procedure on a more complicated knot or link.
While it is evident that the two unknots in the Hopf link are symmet-
ric (and likewise the three Borromean rings), is it immediate that the two
Whitehead components play an “equal role” in that link? The next exercise
settles this question.
Exercise 8. Draw a sequence of Reidemeister moves that sends the White-
head link to itself but interchanges its two components, thereby showing that
they are symmetrically positioned. (Draw the components in different colors
to make your solution clear.) Practice this rigorous component-swapping on
the Hopf link and the Borromean rings.
You can think of invariants as imperfect sorting machines. You can ask
the XYZ machine to sort knots by the XYZ invariant and it’ll sort them into
boxes accordingly. It might make mistakes, though, and put two inequivalent
knots in the same box, because it can happen that two inequivalent knots or
links have the same value for an invariant!
2.3. Baby invariants. There are lots and lots of knot and link invariants.
Let’s start with the simplest ones.
2.3.1. Sorting by the number of components. One extremely simple example
of a link invariant is the number of components that it takes to make the
link. This is a rather boring invariant because it is so coarse (for example,
it can’t distinguish any knot from any other!), but it does provide at least a
first little bit of information.
Mr. Naughty Robot in Figure 5 is sorting the links by their number of
components. In which urn will he put the link he is currently analysing?
2.3.2. Counting the crossing number. The crossing number is the minimum
number of crossings occurring in any diagram of the knot.
Exercise 9. Find out the crossing numbers of all knots in Figure 4.
Partial Solution: The only crossing number that is truly easy to com-
pute is that of the unknot, which is zero. You need to do some work for all
other knots! For example, in Exercise 3, you showed that any knot with 1 or
2 crossings is really the unknot; consequently, you concluded that the trefoil
is the next simplest knot, always drawn with at least 3 crossings.
Further, since we can draw a figure 8 knot in a diagram with 4 crossings,
we know that its crossing number is ≤ 4, but we don’t know whether it
equals 3! How do you know that you can’t draw it with 3 crossings? ♦
2.3.3. Distinguishing by the unknotting number. The unknotting number is
the minimum number of crossings that must be changed (as in Exercise 6)
before the knot becomes equivalent to the unknot (or the link to an unlink).
This is, in some sense, a measure of how knotted a knot is.
Hint: In Exercise 6 you showed that the unknotting number of the trefoil
must be ≤ 1. Again, to prove it is actually equal to 1, you need to show that
the trefoil is not already the unknot, which we won’t get to until the next
section. ♦
The sections that follow will detail several more involved and powerful
examples of invariants, which will enable us to rigorously distinguish among
all of our famous knots and links, thereby completely solving the above
problems and a lot more.
PST 17. To prove that a knot (or link) is tricolorable, all you have to do is
exhibit a nontrivial tricoloring of one of its diagrams.
This is the “muddying-your-hands” approach: just as proving that two
knots are equivalent requires us to show a sequence of (legal) transformations
that takes one knot to the other, so does tricolorability demand that we come
up with a particular tricoloring of a particular diagram of the knot. Still, re-
call the warning about how different the question of showing non-equivalence
between two knots is. Trying out specific moves that don’t transform one
knot to the other is not enough: that’s what a whole army of invariants is
created for! Analogously, just failing to tricolor a knot is not a proof of its
non-tricolorability. More subtle work needs to be done.
PST 18. To show that a knot is not tricolorable, you have to make a logical
argument as to why it can’t be. Begin by coloring a single crossing with all
3 colors (if you want to get a nontrivial tricoloring, you might as well start
with all 3 colors!). Work your way around the knot, following the rules for
tricoloring, until you come to a contradiction. Sometimes there might be
more than one choice and you have to show that in all cases you still come
to a contradiction.
Exercise 12. Find out if the figure 8 knot is tricolorable or not.
Solution: We’ll start with the crossing on the far left. If all three strands
that meet there are, say, red (cf. Fig. 7a), the fourth strand of the knot is
forced to be red too (why?), and hence the knot is trivially colored. Other-
wise, we have red, blue, and yellow at that crossing. At the next crossing
(moving clockwise), we see that two strands are already colored with dif-
ferent colors, so we have to color the remaining strand red. But now we’ve
colored all the strands, and the two crossings at the bottom of the picture
don’t obey the tricoloring rules. The figure 8 knot is not tricolorable!
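The case analysis can be cross-checked by trying every coloring. The crossing list below is one hand-written encoding of a four-crossing diagram of the figure 8 knot: the arc names A to D and the order of the crossings are assumptions, and each triple lists the arc passing over followed by the two arcs that end underneath.

```python
from itertools import product

# Assumed encoding: (over arc, under arc, under arc) for each of the 4 crossings.
FIGURE_8 = [("A", "B", "C"), ("C", "D", "A"), ("B", "C", "D"), ("D", "A", "B")]

def tricolorings(crossings):
    """All colorings of the arcs in which every crossing shows 1 or 3 colors."""
    arcs = sorted({arc for crossing in crossings for arc in crossing})
    found = []
    for colors in product(["red", "blue", "yellow"], repeat=len(arcs)):
        coloring = dict(zip(arcs, colors))
        if all(len({coloring[a] for a in crossing}) != 2 for crossing in crossings):
            found.append(coloring)
    return found

colorings = tricolorings(FIGURE_8)
nontrivial = [c for c in colorings if len(set(c.values())) > 1]
print(len(colorings), len(nontrivial))  # 3 0
```

Only the three one-color colorings survive, in agreement with the argument above.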
Thus, the idea to use 3 colors and define tricolorings the way we did earlier
is far from whimsical, as the next fundamental problem also confirms.
Problem 1. Suppose a link diagram D is tricolorable. Show that if you per-
form any of the three Reidemeister moves on D, then the resulting diagram
is also tricolorable. Conclude that tricolorability is a link invariant!
PST 19. One possible way to distinguish two links (or knots) is to verify
that one is tricolorable and the other is not.
Since the unknot is not tricolorable – it has only trivial tricolorings –
and the trefoil is tricolorable, then they must be distinct knots! Now we are
sure that the trefoil is really a knot (and not the unknot). Incidentally, this
completes the proof that the trefoil’s crossing number is 3 (cf. Exer. 9) and
that the trefoil’s unknotting number is 1 (cf. Exer. 10).
Exercise 13. Mrs. Trefoilia Robot (to the right) is sorting knots and links
into two urns, YES and NO, according to their tricolorability. In which urn
should she put the Borromean rings? How about the unlink⁴ with two
components? Explain why she has correctly sorted out the Hopf link, the
figure 8 knot, and the Whitehead link. How many trefoils can you recognize
in the picture?
⁴ An unlink is a link equivalent to several disjoint unknots.
Exercise 14. Which of the knots below are tricolorable? The knots are
known as the 3_1 knot (the trefoil), the 5_1 knot, the 7_1 knot, and the 9_1 knot.
(Why do you think they have those names?) What do you notice about your
answers? Make a conjecture and explain your reasoning.
Here are the 11_1, 13_1, and 15_1 knots. Check your conjecture!
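A conjecture about this family can be tested by computer. The sketch below assumes each knot is drawn as the usual ring of n crossings, with arcs numbered so that at crossing j the arc j passes over while arcs j − 1 and j + 1 end underneath (numbers taken modulo n). That encoding is an assumption about the pictures, and the function name is hypothetical.

```python
def count_tricolorings(n):
    """Count tricolorings of the assumed n-crossing diagram, using colors 0, 1, 2."""
    total = 0
    for first in range(3):
        for second in range(3):
            colors = [first, second]
            # Three colors are all equal or all different exactly when they
            # sum to a multiple of 3, so each further arc's color is forced.
            while len(colors) < n + 2:
                colors.append((-colors[-1] - colors[-2]) % 3)
            # The diagram closes up: arcs n and n + 1 are arcs 0 and 1 again.
            if colors[n] == colors[0] and colors[n + 1] == colors[1]:
                total += 1
    return total

for n in (3, 5, 7, 9, 11, 13, 15):
    print(n, count_tricolorings(n))
```

A count above 3 means the diagram has a nontrivial tricoloring.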
3.3. Counting tricolorings. We can ask not only whether a link is
tricolorable, but also how many tricolorings it has. Let’s
write τ (L) for the number of tricolorings of a link L. Hold on! The link L
has infinitely many diagrams D! Which diagram do we use? Ideally, any
two diagrams of L would have the same number of tricolorings so that it
wouldn’t matter which diagram D we choose to calculate τ (L). . . . Luckily,
this is exactly what happens:
Problem 2. Prove that performing Reidemeister moves on a diagram D
preserves the number of D’s tricolorings. Conclude that τ (L) does not change
under the Reidemeister moves, i.e., that τ (L) is a link invariant.
Since every link always has at least the three trivial tricolorings, it’s easy
to see that τ (L) ≥ 3 for all links L. This relates to our previous definition
– that a link is tricolorable if and only if τ (L) > 3 – but it is a stronger
invariant, as demonstrated next.
Exercise 16. Compute τ for the trefoil, the figure 8 knot, and the so-called
square knot shown in Figure 9. Conclude that these are all different knots!
The last exercise shows that τ is a more refined invariant than the sim-
ple Yes/No of tricolorability: it can distinguish between the trefoil and the
square knot, even though both are tricolorable.
Problem 3. Compute τ for various knots from the knot table on page 68.
Do you notice a pattern? Can you explain why you see that pattern? (This
will be treated in more detail with linear algebra in Section 3.5.)
3.4. Tricolorings and connected sums. Just as we can build any natural
number from its prime divisors, we can try to create more complex knots
from simpler knots. For this, we will do a bit of “surgery” on the simpler
knots in order to join them together, sort of like Siamese twins.
The connected sum K1 #K2 of two knots K1 and K2 is formed by erasing
a little piece of a strand from each knot and then connecting the loose strands
together. The example in Figure 9 takes the right-handed and the left-
handed trefoils and forms their connected sum, known as the square knot.
Figure 9. The square knot is the connected sum of two trefoils (panels: K1 , K2 , and K1 #K2 )
For instance, it is easy to see that a knot K doesn’t change if you connect
it with the unknot U ; but that K acquires an extra ring around one of its
strands if you connect K with the Hopf link H (why?).
Exercise 17. If K1 and K2 are tricolorable, is K1 #K2 tricolorable?
Taking connected sums is a good operation on knots as it relates features
of the resulting knot to those of its building blocks. One such feature is τ .
Problem 4. Find a formula that relates τ (K1 ), τ (K2 ), and τ (K1 #K2 ).
PST 20. It is always a smart idea to check your formulas against some
examples that you can work out directly or using other methods.
Problem 5. Consider your formula for τ (K1 #K2 ).
(a) Verify it when one of the knots Ki is U or H.
(b) What does it say about K1 and K2 if K1 #K2 is tricolorable?
(c) Use it to find τ of a linear chain of n rings (cf. Fig. 10).
(d) Is it useful in finding τ of a necklace of n rings? How about the Brunnian
link with n rings from Exercise 4? Calculate τ if you can.
3.5. Tricolorings and linear algebra over F3 . We will now use tools
from linear algebra to systematize our study of tricolorings. To this end,
we will need to assume knowledge of a few things. You can skip ahead to
Section 4 on the Jones polynomial if you don’t know about matrices, systems
of linear equations, or adding and multiplying modulo 3.
The set of numbers {0, 1, 2} is a perfectly good place for doing arith-
metic:5 it is called the field F3 . This just means that you can add, subtract,
multiply, and divide in F3 subject to all the usual rules, e.g., distributive
law, associative law, etc. However, each time you get a number a ∈ F3 , you
divide a by 3 and replace a by its remainder 0, 1, or 2. (In a fancy language,
you reduce a mod 3.) For example, 5 = 2 and 7 = 1, 5 + 7 = 12 = 0,
5 − 7 = −2 = 1, 5 · 7 = 35 = 2, and 7 ÷ 5 = 2. In practice, arithmetic mod 3
boils down to 4 simple tables:
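 + | 0 1 2        − | 0 1 2        · | 0 1 2        ÷ | 1 2
 0 | 0 1 2        0 | 0 2 1        0 | 0 0 0        0 | 0 0
 1 | 1 2 0        1 | 1 0 2        1 | 0 1 2        1 | 1 2
 2 | 2 0 1        2 | 2 1 0        2 | 0 2 1        2 | 2 1
(In each table, the entry is the row label combined with the column label;
for example, 1 − 2 = 2 and 1 ÷ 2 = 2.)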
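The sample computations above can be replayed on a computer. The following is a minimal sketch in Python; the helper names `inv` and `div` are ours (not part of the text), and division is implemented as multiplication by an inverse found by search.

```python
# Arithmetic in F_3: do the usual integer operation, then reduce modulo 3.
P = 3

def inv(a):
    """Multiplicative inverse of a nonzero element of F_3, found by search."""
    a %= P
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in F_3")
    return next(b for b in range(1, P) if (a * b) % P == 1)

def div(a, b):
    """a / b in F_3, computed as a times the inverse of b."""
    return (a * inv(b)) % P

print((5 + 7) % P)   # 0
print((5 - 7) % P)   # 1
print((5 * 7) % P)   # 2
print(div(7, 5))     # 2
```

Note that Python's `%` already returns a remainder in {0, 1, 2} for negative numbers, which is why 5 − 7 comes out as 1.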
Moving on, you might have learned about matrices and linear algebra
working over Q or R (i.e., using rational or real numbers); but in fact you
can do linear algebra over any field, including F3 . You can do row operations,
find inverse matrices, and solve systems of equations in just the same way.
All of the theorems generalize word for word over F3 . To get warmed up, do
the following couple of computations with linear algebra over F3 .
Exercise 18. Write down the coefficient matrix for the system of equations
    2x + y = 0
    x + y + z = 1.
Then write down the augmented matrix and do row operations to find all
solutions to this system over F3 . How many are there? Why?
Exercise 19. Consider the matrix
$$A = \begin{pmatrix}
1 & 2 & 0 & 0 & 1 & 1 \\
0 & 1 & 2 & 2 & 0 & 0 \\
0 & 0 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$
How many solutions does the system of equations Ax = 0 have over F3 ?
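Once you have done the row operations by hand, you can check your counts by brute force, since a system in n unknowns over F3 has only 3^n candidate solutions. The sketch below assumes nothing beyond the two exercises; the function name `solutions_mod3` is ours.

```python
from itertools import product

def solutions_mod3(A, b):
    """All vectors x over F_3 with A x = b (mod 3), found by exhaustive search."""
    n = len(A[0])
    return [
        x for x in product(range(3), repeat=n)
        if all(sum(a * v for a, v in zip(row, x)) % 3 == rhs % 3
               for row, rhs in zip(A, b))
    ]

# Exercise 18: 2x + y = 0 and x + y + z = 1.
ex18 = solutions_mod3([[2, 1, 0], [1, 1, 1]], [0, 1])

# Exercise 19: the homogeneous system A x = 0.
A = [
    [1, 2, 0, 0, 1, 1],
    [0, 1, 2, 2, 0, 0],
    [0, 0, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
ex19 = solutions_mod3(A, [0] * 6)

print(len(ex18), len(ex19))
```

Exhaustive search is only sensible for small n; row reduction is what explains why the counts are powers of 3.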
Hint: To every crossing associate the strands that go under it; and to every
strand associate the crossings (if any) under which it goes. ♦
Instead of using 3 colors to label the strands, let’s use the numbers 0, 1,
and 2. Then a tricoloring of D is an assignment of one of the numbers 0, 1,
or 2 to each strand sk such that at each crossing either all 3 numbers are
present or only 1 number is present. Let’s denote the “color” of sk by xk , so
that xk ∈ {0, 1, 2}. Thus a “coloring” of D will be a list x1 , x2 , x3 , . . . , xn of
“colors” for the strands. But not just any list . . . .
Exercise 22. How many strands meet at a single crossing? Examples?
Solution: There could be 1, 2, or 3 distinct strands meeting at a sin-
gle crossing. Examples are provided by the unknot twisted once or twice
(cf. Fig. 8), or thrice; but you can easily go with the Hopf link and the
trefoil for the 2- and 3-strand crossings.
The variety of possibilities at a single crossing is incon-
venient. Instead, imagine an ant sitting on the diagram
D in the vicinity of our crossing C. The ant will observe
three distinct pieces of strands at C and will not know if
they are “glued” within the same strands somewhere far
away (as in the picture to the right). For the remainder
of this section over F3 , we will take the ant’s viewpoint:
the local coloring of a crossing will consist of the three
numbers assigned to the pieces of strands that make up
the crossing.
Thus, the conditions on a tricoloring say that at each crossing there must
be only 1 number repeated three times, or there must be all 3 numbers writ-
ten in the order determined by our original strand sequence {s1 , s2 , . . . , sn }.
For each of your combinations, compute the sum of the three elements.
What do you notice?
(b) Suppose that strands si , sj , and sk (possibly listed with repetitions)
meet at a crossing. Based on your observation above, write down an
equation that their colors xi , xj , and xk must satisfy.
Answer (b): xi + xj + xk ≡ 0 (mod 3). ♦
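With one equation xi + xj + xk ≡ 0 (mod 3) per crossing, counting tricolorings becomes counting solutions of a linear system over F3 . Here is a brute-force sketch. The crossing lists are our own reading of standard diagrams of the trefoil and the figure 8 knot, so treat them as assumptions and compare them with your pictures. Strands are numbered from 0 in the code rather than from 1.

```python
from itertools import product

def count_tricolorings(num_strands, crossings):
    """Count assignments of 0, 1, 2 to the strands such that
    x_i + x_j + x_k = 0 (mod 3) at every crossing (i, j, k)."""
    return sum(
        all((x[i] + x[j] + x[k]) % 3 == 0 for i, j, k in crossings)
        for x in product(range(3), repeat=num_strands)
    )

# Unknot drawn as a plain circle: one strand, no crossings.
unknot = count_tricolorings(1, [])

# Trefoil (assumed diagram): 3 strands, and all three meet at each crossing.
trefoil = count_tricolorings(3, [(0, 1, 2)] * 3)

# Figure 8 knot (assumed diagram): 4 strands, 4 crossings.
figure8 = count_tricolorings(4, [(0, 1, 3), (0, 1, 2), (0, 2, 3), (1, 2, 3)])

print(unknot, trefoil, figure8)
```

The count always includes the three trivial (one-color) assignments, so a result of 3 means "not tricolorable".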
Have fun by playing with this awesome linear algebra tool in the following:
Exercise 24. Apply this linear algebra procedure to all knots whose τ you
already know and compare your answers, e.g.,
(a) our six famous knots and links in Figure 4;
(b) the unknot twisted by n consecutive R1-moves in Figure 8;
(c) the knots with odd and even names on page 59;
(d) the square knot; the linear chain and the necklace with n rings each as
in Problem 5;
(e) the knots from the knot table in Problem 3 on page 68.
64 3. KNOTTY MATHEMATICS
orientations of the unknot drawn with 0, 1, or 2 twists (cf. Fig. 8), and decide
which are equivalent. Give all orientations for the trefoil, the Whitehead link,
and the Borromean rings, and think about which of them are equivalent.
6
Sir Vaughan Frederick Randal Jones was born in 1952 in New Zealand. In 1979 he
completed his doctoral studies at the University of Geneva, under the Swiss topologist
André Haefliger. The next year, Jones moved to the United States, and after teaching for
several years at the University of California at Los Angeles and the University of Penn-
sylvania, he received a permanent position at the University of California at Berkeley. In
1984, while working in the theory of von Neumann algebras (an area in analysis motivated
by group representations, operator theory, ergodic theory, and quantum mechanics), Jones
discovered the link invariant known now as the Jones polynomial, which unexpectedly had
vast applications in knot theory and re-energized the study of low-dimensional topology.
In 2002, Jones received the Distinguished Companionship of the New Zealand Order of
Merit, which was renamed Knight Companion in 2009.
4. THE JONES POLYNOMIAL 65
4.3. What is the Jones polynomial? There are several choices for how
to define the Jones polynomial. For our purposes, the easiest way is through
the so-called skein relation, which relates the Jones polynomials of certain
triplets of (oriented) links. The diagrams of these links L+ , L− , and L0 are
identical except at one specific crossing (cf. Fig. 11), where L+ has an
overcrossing, L− has an undercrossing, and L0 has no crossing.
Figure 11. Links in the skein relation: L+ , L− , and L0
Definition 1. The Jones polynomial VL is defined for all oriented links L
by the following three properties.⁷
• VU (t) = 1, where U is the unknot.
• VL is an invariant of links.
• VL satisfies the skein relation: for any triplet of oriented links L+ , L− ,
and L0 as described above (cf. Fig. 11),
$$\frac{1}{t}\,V_{L_+} - t\,V_{L_-} = \left(\sqrt{t} - \frac{1}{\sqrt{t}}\right) V_{L_0}.$$
The skein relation looks complicated, but it helps us relate the Jones
polynomials of knots that differ at one crossing.
PST 21. If you can find three links whose diagrams are identical except at
one specific crossing, where they differ as in Figure 11, then you can use the
skein relation to relate their Jones polynomials. If you know two of the Jones
polynomials, the skein relation will allow you to solve for the third!
We will do an example shortly; but first let us mention an open problem:
Question 1. (Open) If knot K has Jones polynomial 1, is K equivalent to
the unknot? Equivalently, is there a nontrivial knot with Jones polynomial 1?
This is such a simple question; yet we still don’t know the answer! Perhaps
you can enlighten us someday.
4.4. Building up the trefoil via the skein relation. We will go through
a series of examples to build up to computing the Jones polynomial of the
trefoil. We already know that VU (t) = 1. To see how the skein relation works
in practice, let us move to the next simplest case:
Exercise 26. Find the Jones polynomial of the unlink with two components.
7
The reader will notice that if we adopt Definition 1, we must prove that the Jones
polynomial actually exists and is unique for every link! We don’t have space for this here;
however, the advanced reader is encouraged to look up the proof in [44].
Solution: Let U stand for the (oriented) unknot, and U2 for the (oriented)
unlink with two components. By changing the uncrossing of U2 to an
over-crossing and an under-crossing, we can relate U2 to two copies of the unknot:
L+ = U L− = U L0 = U2
Figure 12. Skein relation for the unlink U2
If VU and VU2 are the Jones polynomials for the unknot and our unlink,
respectively, the skein relation yields:
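$$\frac{1}{t}\,V_U - t\,V_U = \left(\sqrt{t} - \frac{1}{\sqrt{t}}\right) V_{U_2}
\quad\overset{V_U = 1}{\Longrightarrow}\quad
\frac{1}{t} - t = \left(\sqrt{t} - \frac{1}{\sqrt{t}}\right) V_{U_2}.$$
Using the formula a² − b² = (a − b)(a + b), we can now solve for VU2 :
$$V_{U_2}(t) = \frac{\frac{1}{t} - t}{\sqrt{t} - \frac{1}{\sqrt{t}}}
= \frac{\left(\frac{1}{\sqrt{t}} - \sqrt{t}\right)\left(\frac{1}{\sqrt{t}} + \sqrt{t}\right)}{\sqrt{t} - \frac{1}{\sqrt{t}}}
= -\sqrt{t} - \frac{1}{\sqrt{t}}.$$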
We just found the Jones polynomial of the unlink with two components!
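The "solve for the third" step of PST 21 can also be carried out mechanically. In the sketch below, a Laurent polynomial in s = √t is stored as a dictionary {exponent of s: coefficient}, so t corresponds to exponent 2 and 1/t to exponent −2. This representation and the helper names are our own choices, not the text's.

```python
def mul(p, q):
    """Product of two Laurent polynomials in s = sqrt(t)."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def sub(p, q):
    """Difference p - q, with zero coefficients dropped."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return {e: c for e, c in out.items() if c != 0}

def divide_exact(num, den):
    """Long division num / den; raises ValueError unless it is exact."""
    quotient, rest = {}, dict(num)
    lead = max(den)
    lowest = min(num) - min(den)   # smallest exponent the quotient may have
    while rest:
        e = max(rest) - lead
        c, r = divmod(rest[max(rest)], den[lead])
        if e < lowest or r != 0:
            raise ValueError("division is not exact")
        quotient[e] = c
        rest = sub(rest, mul({e: c}, den))
    return quotient

S = {1: 1, -1: -1}                 # sqrt(t) - 1/sqrt(t)
V_U = {0: 1}                       # Jones polynomial of the unknot

# Skein relation with L+ = U, L- = U, L0 = U2: (1/t) V_U - t V_U = S * V_U2.
lhs = sub(mul({-2: 1}, V_U), mul({2: 1}, V_U))
V_U2 = divide_exact(lhs, S)
print(V_U2)                        # {1: -1, -1: -1}, i.e. -sqrt(t) - 1/sqrt(t)
```

The same three helpers can be reused for any skein triplet in which two of the three polynomials are already known.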
Now, if T denotes the right-handed trefoil (cf. Fig. 2a), how do we find VT ?
PST 22. To calculate VL , reason backwards: relate the link L to other
(presumably simpler) links via the skein relation. Keep applying the skein
relation to those new links, until you end up with links whose Jones
polynomials are already known to you.
Exercise 27. Draw pictures that relate the right-handed trefoil T to other
well-known links, one of which is the positive Hopf link H (cf. p. 64).
Now we are stuck with the Hopf link! Get over this obstacle:
Exercise 28. Relate the (positive) Hopf link H to well-known links. Write
down what the skein relation says in your diagram and solve for VH .
L+ = H L− = U2 L0 = U
Figure 13. Skein relation for the Hopf link
We are now ready to put together everything and attack the trefoil again.
Exercise 29. Using your findings so far, calculate the Jones polynomial VT
of the right-handed trefoil T .
The careful reader might have noticed that we skipped one simple link:
the negatively-oriented Hopf link $H^-$. Its Jones polynomial turns out to be
$V_{H^-} = -t^{-1/2} - t^{-5/2}$ (check it!), which differs from VH ! Have we made
a mistake? It is important to understand that the Jones polynomial is an
invariant of oriented links. Orientation does not affect the Jones polyno-
mial of a knot (why?); but for a general link, you may get different Jones
polynomials, depending on the link’s orientations. With this in mind,
Exercise 30. Find the Jones polynomials of the figure 8 knot, the White-
head link, the Borromean rings, and the square knot.
which links need to be dealt with and keep a record of the Jones polynomials
already found. For those skilled in induction and algebraic operations on
polynomials, here are a couple of challenges in true math-Olympiad style.
Problem 7. Find the Jones polynomials of the unlink with n components,
the linear chain with n components, and the knots $n_1$ from page 59.
4.5. Mirror, mirror. Imagine taking a knot and switching all the crossings.
Doing this to a knot creates the knot’s mirror image.⁸ Below you see the
figure 8 knot and its mirror image. Are these two knots equivalent? For
starters, we should look at their Jones polynomials.
Exercise 31. The Jones polynomial of the figure 8 knot, as you should have
shown earlier, is $V_{4_1} = t^2 - t + 1 - t^{-1} + t^{-2}$. Compute the Jones
polynomial of its mirror image to obtain the same result! Are the two knots
equivalent?
Pictured: the figure 8 knot and its mirror image.
Hmm . . . the Jones polynomial can’t tell the difference between these two
knots. They might be the same, but they might not! In fact, there are
special words to reflect both possibilities. A knot is amphichiral if it is
equivalent to its mirror image; it is chiral otherwise.
Problem 8. Make a figure 8 knot and its mirror image out of rope. Play
with the ropes to try to see if the figure 8 knot is chiral or amphichiral. If
you think it is amphichiral, then prove it using Reidemeister moves!
What if not? Wait a minute! Shouldn’t we try this on a simpler example?
Exercise 32. Is the trefoil chiral or amphichiral?
8
Think about where the name comes from! Nope, despite appearances, it’s not a
reflection across a vertical line! Where is the “mirror”?
68 3. KNOTTY MATHEMATICS
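$V_{6_1} = t^2 - t + 2 - \frac{2}{t} + \frac{1}{t^2} - \frac{1}{t^3} + \frac{1}{t^4}$
$V_{\overline{6_1}} = \frac{1}{t^2} - \frac{1}{t} + 2 - 2t + t^2 - t^3 + t^4$
$V_{8_{16}} = -t^2 + 3t - 4 + \frac{6}{t} - \frac{6}{t^2} + \frac{6}{t^3} - \frac{5}{t^4} + \frac{3}{t^5} - \frac{1}{t^6}$
$V_{\overline{8_{16}}} = -\frac{1}{t^2} + \frac{3}{t} - 4 + 6t - 6t^2 + 6t^3 - 5t^4 + 3t^5 - t^6$
Figure 14. Chiral or amphichiral? (Panels: $6_1$ and $\overline{6_1}$; $8_{16}$ and $\overline{8_{16}}$; $9_{10}$ and $\overline{9_{10}}$.)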
Here are a few of the many intriguing and fundamental properties of the
Jones polynomial, some demonstrated by Jones himself in 1985.
Theorem 2. For any knot K, its mirror image $\overline{K}$, and links L, L1 , and L2 :
(a) $V_{\overline{K}}(t) = V_K(t^{-1})$ and $V_K(e^{2\pi i/3}) = 1$, where
$e^{2\pi i/3} = \cos(2\pi/3) + i \sin(2\pi/3)$.
(b) $\frac{d}{dt} V_K(1) = 0$, where d/dt is the derivative with respect to t.
(c) $V_L(1) = (-2)^{p-1}$, where p is the number of components of L. Moreover,
if p is odd, then $V_L(t)$ is a polynomial with integer powers; if p is even,
then $V_L(t)$ is $t^{1/2}$ times such a polynomial.
(d) $V_{L_1 \# L_2}(t) = V_{L_1}(t) \cdot V_{L_2}(t)$.
We challenge the advanced reader to prove or find proofs of these facts.
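Before attempting proofs, it is reassuring to test parts (a)-(c) numerically on polynomials we already have. The sketch below uses the trefoil polynomial t + t³ − t⁴ and the figure 8 polynomial from Exercise 31, stored as {exponent of t: coefficient}; the helper names are ours, and a floating-point tolerance stands in for exact equality at the complex point.

```python
import cmath

def evaluate(poly, t):
    """Value at t of a Laurent polynomial {exponent of t: coefficient}."""
    return sum(c * t ** e for e, c in poly.items())

def derivative(poly):
    """Term-by-term derivative with respect to t."""
    return {e - 1: c * e for e, c in poly.items() if e != 0}

def mirror(poly):
    """Substitute t -> 1/t, as in Theorem 2(a)."""
    return {-e: c for e, c in poly.items()}

V_T = {1: 1, 3: 1, 4: -1}                      # right-handed trefoil
V_41 = {2: 1, 1: -1, 0: 1, -1: -1, -2: 1}      # figure 8 knot
omega = cmath.exp(2j * cmath.pi / 3)           # e^(2*pi*i/3)

for name, V in [("trefoil", V_T), ("figure 8", V_41)]:
    print(name,
          evaluate(V, 1) == 1,                   # (c) with p = 1
          evaluate(derivative(V), 1) == 0,       # (b)
          abs(evaluate(V, omega) - 1) < 1e-12)   # (a), second half

print(mirror(V_41) == V_41)   # the figure 8 polynomial is symmetric in t and 1/t
print(mirror(V_T) == V_T)     # the trefoil polynomial is not
```

Passing these checks proves nothing in general, but a failure would reveal an arithmetic slip in a polynomial computed by hand.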
4.6. Mysticism, art, and mathematics. Bumping into knot and link
celebrities is a daily occurrence for everyone, whether we realize it or not.
For example, the trefoil is often the centerpiece of beautiful jewelry:
on a homemade link: it’s worth it to see the gliding in action. Is it possible
to create a “super-Russian” wedding ring of 4 pieces with similar properties?
The Celtic knot (or The Emblem of Divine Inscrutability), rumored to
contain all the wisdom of King Solomon, appears in an array of artistic
versions. It is actually not a knot but a link of 2 unknots intertwined twice
and is known in mathematics as the 4-crossing link.
Definitely not! If you would like to learn more about knots, you should
have a look at Justin Roberts’ “Knots Notes,” available on his website [68].
Another great resource, as we mentioned earlier, is “The Knot Book” by Colin
Adams [1]. Of course, you should also take a peek at the more recent article
“The Jones Polynomial” by the master himself, Sir Vaughan Jones on his
own website [44].
We mentioned in passing some funny-sounding, yet rigorous knot termi-
nology. If you are curious about a quandle, it is a knot invariant, discussed in
the accessibly written “Knot Quandle” by then-undergraduate Elenoir Bir-
rell [10]. For using flypes – a different type of knot transformations – to
prove “The Tait Flyping Conjecture,” we direct you to two papers of Menasco
and Thistlethwaite [54, 55]. Finally, a writhe – a property of a positively-
oriented link – fails to be a knot invariant, as demonstrated by Hoste et al.
in “The First 1,701,936 Knots” [42].
Regarding the open Question 1 on page 65, check out “Links with trivial
Jones polynomial” [81] and “Infinite families of links with trivial Jones poly-
nomial” [23] by Thistlethwaite et al. Even the basic notion of Reidemeister
moves enters into modern research nowadays; for some upper bounds on “The
number of Reidemeister moves needed for unknotting” we direct you to Hass
and Lagarias’ paper [38].
Many of the images in this session were created using Robert Scharein’s
KnotPlot software at http://knotplot.com, which you should absolutely
download and play with! It allows you to load knots from a library up to
10 crossings, see them in 3-D, compute polynomial invariants, sketch new
knots, and much, much more!
The author would also like to thank Henning Hohnhold for the idea to
include the Alexander the Great story.
Exercise 1+ . In Figure 2, the upper loop of the knot on the right has two
twists. Just untwist it to get the trefoil.
T T
thumb pointing in its direction, then your other fingers point in the direction
of the under-strand; thus, each crossing is of type L+ (cf. skein relation in
Fig. 11). This remains true for the right-handed trefoil T regardless of its
orientation, and it is false for the left-handed trefoil.
Exercise 3. A knot diagram with 1 or 2 crossings inevitably results in the
unknot, as demonstrated by the untwisting in Figure 8. Trying something
“different” with 2 crossings as in Figure 17b doesn’t help: just pull the strand
that is draped over the other to eliminate the crossings and get the unknot. ♦
Exercise 4. Take n − 1 unlinks and arrange them in a line so that each one
overlaps slightly with the one before and the one after. Add in the final link
by weaving through these, going over and under, over and under, and then
fuse the ends of this final link together. A case of a Brunnian link with 4 com-
ponents is displayed in Figure 3b; one actually has to stare at it for a while to
realize that our construction recipe is not followed to the letter. For another
construction using “rubberbands” check [57]. To see a Brunnian link with
5 components go to YouTube at www.youtube.com/watch?v=vshcgnSUtyI
and watch it fall apart in slow motion when one link is cut.
In fact, for each n the infinitely many Brunnian links with n compo-
nents were classified in 1954 by John Milnor via what is now called Milnor
invariants [56]. ♦
Exercise 5. Use two R1 moves or one R2 move.
Exercise 6. Whichever crossing you choose to change, the Hopf link will
become an unlink, the trefoil and the figure 8 knot will turn into the unknot,
and one of the Borromean rings will peel off, forcing the remaining two rings
into a Hopf link. In this respect, the Whitehead link is more interesting:
changing any of its 4 “outside” crossings results in the Hopf link; but changing
its central crossing breaks it into an unlink! You should check that the
transformations described here (except for crossing changes, of course!) can
be expressed as sequences of Reidemeister moves. ♦
Exercise 9. Show that any diagram with three crossings represents either
the trefoil or the unknot. ♦
Exercise 10. The solution to Exercise 6 actually tells us that the unknotting
number is 2 for the Borromean rings, 0 for the unknot (of course!), and 1 for
all other knots and links in Figure 4. ♦
Exercise 11. The links in Figure 4 will be ordered if you insert the 2-strand
Hopf link between the 1-strand unknot and the 3-strand trefoil. Note that
the number of strands in these links equals the number of crossings, except
for the unknot. Do you know why? Check out Exercise 21. ♦
Problem 1. Tri-color the diagram D. An R1 move can be performed only
on a monochromatic crossing, after which the crossing is eliminated but its
color is preserved (cf. Fig. 18a). For R2, the over-crossing strand is all one
color, while the three under-crossing strands can be colored in two ways
(cf. Fig. 18b-c). In either case, after pulling apart the strands by move R2,
the diagram remains tricolorable: indeed, all strands “exiting” the picture
preserve their colors, thereby allowing for the rest of the (unseen) diagram
to remain tricolored as before. Note that we are only allowed to change the
color of strands that lie entirely inside our picture.
(Figure 18: (a) move R1; (b), (c) move R2.)
The same idea governs tricolorability when applying move R3. The first
picture in Figure 19 has five “exiting” strands (in black) and one “non-exiting”
strand (in green). There are five ways to tri-color this diagram segment: two
cases with monochromatic (blue) central crossing and three cases with tricol-
ored central crossing. Check that after move R3, all “exiting” strands have
preserved their colors, while the “non-exiting” central strand may preserve
its color (as in column 2) or may change its color (as in column 3).
(Figure 19: the five tricolored cases, each shown before and after move R3.)
Exercise 13. No: Hopf link, Figure 8, Whitehead link, Borromean rings.
Yes: trefoil, $7_4$ knot. The picture has 5 trefoils, including the hairdo! ♦
Exercises 14-15. These knots are tricolorable if and only if the number of
crossings is divisible by 3. Think about why! ♦
Problem 2. Use the solution to Problem 1. ♦
Exercise 16. τ (Trefoil) = 9; τ (Figure 8) = 3; τ (Square) = 27. ♦
Problem 3. $\tau(6_1) = 9 = \tau(\overline{6_1})$; $\tau(8_{16}) = 3 = \tau(\overline{8_{16}})$; $\tau(9_{10}) = 9 = \tau(\overline{9_{10}})$. ♦
Exercise 17. Yes, K1 #K2 is tricolorable. Let α1 and α2 be the strands in
K1 and K2 , respectively, on which the “surgery” will be performed. If α1 and
α2 have different colors, permute the colors on K2 to make α2 ’s color
match α1 ’s color. Then perform the surgery and extend that common color
onto the pieces connecting K1 and K2 within K1 #K2 .
6. HINTS AND SOLUTIONS TO SELECTED EXERCISES 73
L+ = T L− = U L0 = H
Figure 20. Trefoil in the skein relation
Exercise 30. For a link L, the table below lists triplets (L+ , L− , L0 ) entering
in a skein relation with L. Here X ⊔ Y is the disjoint union of X and Y .
Link L               | L+ | L−    | L0   | Jones polynomial VL
two unknots U2       | U  | U     | U2   | $-t^{1/2} - t^{-1/2}$
positive Hopf link H | H  | U2    | U    | $-t^{1/2} - t^{5/2}$
right-hand trefoil T | T  | U     | H    | $t + t^3 - t^4$
Figure 8, $4_1$      | U  | $4_1$ | H    | $t^2 - t + 1 - t^{-1} + t^{-2}$
Whitehead link W     | H  | W     | U    | $t^{-3/2}(-1 + t - 2t^2 + t^3 - 2t^4 + t^5)$
Borromean rings B    | B  | H ⊔ U | W    | $-t^3 + 3t^2 - 2t + 4 - 2t^{-1} + 3t^{-2} - t^{-3}$
Square knot S        | S  | T     | T #H | $(t + t^3 - t^4)(t^{-1} + t^{-3} - t^{-4})$
For a knot K, changing the orientation on one strand in a local crossing
picture forces us to change the orientation of the other strand (by tracing
around the knot) and thus preserves the crossing types L+ , L− , and L0
and does not affect VK . For a link L, though, VL may depend on the
orientation of L: you can easily see this for the negative Hopf link $H^-$. Why doesn’t
it matter for the unlink U2 ? We leave it to the reader to decipher which ori-
entations we have used for the Whitehead link W and the Borromean rings
B above and if VW and VB are affected by our choices.
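In the spirit of PST 20, the first four rows of the table can be checked against the skein relation by machine. As a sketch, we store Laurent polynomials in s = √t as {exponent of s: coefficient}, so that t is exponent 2; this encoding and the helper names are our own conventions.

```python
def mul(p, q):
    """Product of two Laurent polynomials in s = sqrt(t)."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def sub(p, q):
    """Difference p - q, with zero coefficients dropped."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return {e: c for e, c in out.items() if c != 0}

def skein_holds(v_plus, v_minus, v_zero):
    """Check (1/t) V_{L+} - t V_{L-} == (sqrt(t) - 1/sqrt(t)) V_{L0}."""
    lhs = sub(mul({-2: 1}, v_plus), mul({2: 1}, v_minus))
    rhs = mul({1: 1, -1: -1}, v_zero)
    return lhs == rhs

V_U = {0: 1}                                   # 1
V_U2 = {1: -1, -1: -1}                         # -t^(1/2) - t^(-1/2)
V_H = {1: -1, 5: -1}                           # -t^(1/2) - t^(5/2)
V_T = {2: 1, 6: 1, 8: -1}                      # t + t^3 - t^4
V_41 = {4: 1, 2: -1, 0: 1, -2: -1, -4: 1}      # t^2 - t + 1 - 1/t + 1/t^2

rows = {
    "U2": (V_U, V_U, V_U2),
    "H": (V_H, V_U2, V_U),
    "T": (V_T, V_U, V_H),
    "4_1": (V_U, V_41, V_H),
}
for name, triplet in rows.items():
    print(name, skein_holds(*triplet))
```

The remaining rows depend on orientation choices for W and B, so we do not attempt them here.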
9
The positive and negative Hopf links H and H − are inequivalent (the fact that they
have different Jones polynomials proves this). The next exercise will help you calculate
their Jones polynomials. Ditto for the Borromean rings.
While the first five examples in the table can be handled one by one
in the listed order, for B and S we need to know the Jones polynomials of
H ⊔ U and T #H, which must be computed separately.
To get $V_{H \sqcup U}$, note that $U_2 = U \sqcup U$. Using the “skein” triplet (L+ = U ,
L− = U , L0 = U2 ), we found earlier that $V_{U_2} = -(1/\sqrt{t} + \sqrt{t})\,V_U$. Our
calculation generalizes to any disjoint union L ⊔ U . Indeed, the corresponding
skein relation is represented by (L+ = L, L− = L, L0 = L ⊔ U ) (why?), and
$$\frac{1}{t}\,V_L - t\,V_L = \left(\sqrt{t} - \frac{1}{\sqrt{t}}\right) V_{L \sqcup U}.$$
Algebra manipulations similar to those in the text for VU2 yield
$$V_{L \sqcup U} = \frac{\left(\frac{1}{t} - t\right) V_L}{\sqrt{t} - \frac{1}{\sqrt{t}}}
= \frac{\left(\frac{1}{\sqrt{t}} - \sqrt{t}\right)\left(\frac{1}{\sqrt{t}} + \sqrt{t}\right)}{\sqrt{t} - \frac{1}{\sqrt{t}}}\,V_L
= -\left(\frac{1}{\sqrt{t}} + \sqrt{t}\right) V_L.$$
This allows for a painless calculation of $V_{H \sqcup U}$ (and VB ) and also establishes
Lemma 2. $V_{L \sqcup U} = -\left(\frac{1}{\sqrt{t}} + \sqrt{t}\right) V_L$ for any link L.
Finally, VS for the square knot S can be yanked out through the skein
relation with (S, T, T #H). A faster approach would be to apply Theorem 2
(cf. p. 68), using that S is the connected sum of T and its mirror image $\overline{T}$:
$$V_S = V_{T \# \overline{T}}(t) \overset{\text{Thm. 2(d)}}{=} V_T(t) \cdot V_{\overline{T}}(t) \overset{\text{Thm. 2(a)}}{=} V_T(t) \cdot V_T(t^{-1}).$$ ♦
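The product on the right can be multiplied out by machine. A short sketch, with Laurent polynomials stored as {exponent of t: coefficient} and helper names of our own choosing:

```python
def mul(p, q):
    """Product of two Laurent polynomials {exponent of t: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def mirror(p):
    """Substitute t -> 1/t (Theorem 2(a))."""
    return {-e: c for e, c in p.items()}

V_T = {1: 1, 3: 1, 4: -1}            # t + t^3 - t^4
V_S = mul(V_T, mirror(V_T))          # V_T(t) * V_T(1/t)

for e in sorted(V_S, reverse=True):
    print(f"{V_S[e]:+d} * t^{e}")
```

The expansion is −t³ + t² − t + 3 − t⁻¹ + t⁻² − t⁻³, which is unchanged under t → 1/t, as one expects for a product of this shape.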
Finding $V_{n_1}$ for the knots $n_1$ from page 59 requires more intricate
inductive reasoning with skein triplets. Understanding this solution
completely requires familiarity with recursive sequences.
Let $T_n$ denote the link with 2 components that twist n times around each
other. Consider first the case for n odd. Skeining on any crossing, check that
$L_+ = n_1$, $L_- = (n-2)_1$, and $L_0 = T_{n-1}$. In turn, skeining on $T_{n-1}$ yields
$L_+ = T_{n-1}$, $L_- = T_{n-3}$, and $L_0 = (n-2)_1$. For simplicity, write $a_n = V_{n_1}$
and $b_n = V_{T_n}$. Therefore,
$$\frac{1}{t}\,a_n - t\,a_{n-2} = (t^{1/2} - t^{-1/2})\,b_{n-1};$$
$$\frac{1}{t}\,b_{n-1} - t\,b_{n-3} = (t^{1/2} - t^{-1/2})\,a_{n-2}.$$
Solve for $b_{n-1}$ from the first equation and then shift down the indices in the
result to obtain an expression for $b_{n-3}$ too. Substitute these findings into
the second equation to eliminate all $b_k$’s and derive a “symmetric” recursive
relation involving the $a_k$’s alone:
$$a_n - (t^3 + t)\,a_{n-2} + t^4 a_{n-4} = 0 \;\Rightarrow\; a_n - t^3 a_{n-2} = t\,(a_{n-2} - t^3 a_{n-4}).$$
The last representation rolls down to the lowest possible index n = 5:
$$a_n - t^3 a_{n-2} = t^{(n-3)/2}(a_3 - t^3 a_1) \;\Rightarrow\; a_n = t^3 a_{n-2} + t^{(n-3)/2}(t - t^4),$$
where $a_3 = V_T = t + t^3 - t^4$ and $a_1 = V_U = 1$. Rolling down the last equation
to the lowest possible index n = 3 results in a direct formula for the $a_n$’s:
$$(1)\qquad V_{n_1} = a_n = t^{\frac{n-1}{2}}\left[\,t^{n-1} + (1 + t^2 + t^4 + \cdots + t^{n-3})(1 - t^3)\right].$$
Using a geometric series, we can rewrite (1) in a closed form as
$$(2)\qquad V_{n_1} = a_n = \frac{t^{n+2} - t^{n+1} - t^3 + 1}{1 - t^2}\; t^{\frac{n-1}{2}}.$$
The compact formula (2) is cumbersome to work with, as it requires long
division. Since n is odd, it is evident from the direct formula (1) that $V_{n_1}$
is an ordinary polynomial with positive integer powers of t and coefficients
±1. For example, one can check that $V_{5_1} = -t^7 + t^6 - t^5 + t^4 + t^2$. ♦
We leave the case of $n_1$ with n even to the reader. Note that the bottom
crossing in all such knots is special: changing it unravels the whole $n_1$ into
the unknot U . The final answer is: $V_{n_1} = (t^3 + t - t^{5-n} + t^{2-n})/(t + 1)$. ♦
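For odd n, the one-step recursion a_n = t³ a_{n−2} + t^{(n−3)/2}(t − t⁴) and the direct formula (1) can be compared by machine. The sketch below stores polynomials as {exponent of t: coefficient}; the function names are ours, and the comparison is only run for odd n up to 25.

```python
def add(p, q):
    """Sum of two polynomials {exponent of t: coefficient}."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def shift(p, k):
    """Multiply a polynomial by t^k."""
    return {e + k: c for e, c in p.items()}

def a_recursive(n):
    """a_n for odd n, from a_1 = 1 and a_m = t^3 a_{m-2} + t^((m-3)/2) (t - t^4)."""
    a = {0: 1}
    for m in range(3, n + 1, 2):
        k = (m - 3) // 2
        a = add(shift(a, 3), {k + 1: 1, k + 4: -1})
    return a

def a_formula(n):
    """Formula (1): t^((n-1)/2) [ t^(n-1) + (1 + t^2 + ... + t^(n-3)) (1 - t^3) ]."""
    h = (n - 1) // 2
    out = {h + n - 1: 1}
    for j in range(0, n - 2, 2):          # j = 0, 2, ..., n - 3
        out = add(out, {h + j: 1, h + j + 3: -1})
    return out

print(a_formula(5))
print(all(a_recursive(n) == a_formula(n) for n in range(1, 26, 2)))
```

For n = 5 the dictionary encodes −t⁷ + t⁶ − t⁵ + t⁴ + t², in agreement with the example above.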
Problem 8. $4_1$ is amphichiral: it takes 8 Reidemeister moves to show it. ♦
Exercise 33. Let $P_1 \xrightarrow{RI} P_2$ be a Reidemeister move, where P1 and P2 are
the parts of the diagrams D1 and D2 affected by the move (as on p. 53). It
suffices to show that $\overline{P_1} \xrightarrow{RI} \overline{P_2}$ for the mirror images $\overline{P_1}$ and $\overline{P_2}$ of P1 and P2 . ♦
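Exercise 34. For a knot K and its mirror image $\overline{K}$, $V_{\overline{K}}(t) = V_K(t^{-1})$
(cf. Theorem 2(a)). Indeed, if links L+ , L− , and L0 satisfy the skein relation,
then $\overline{L_+}$, $\overline{L_-}$, and $\overline{L_0}$ also satisfy the skein relation, but with $\overline{L_+}$ and $\overline{L_-}$
playing opposite roles. Thus
$$(3)\qquad \frac{1}{t}\,V_{L_+}(t) - t\,V_{L_-}(t) = \left(\sqrt{t} - \frac{1}{\sqrt{t}}\right) V_{L_0}(t)$$
$$(4)\qquad \Rightarrow\; \frac{1}{t}\,V_{\overline{L_-}}(t) - t\,V_{\overline{L_+}}(t) = \left(\sqrt{t} - \frac{1}{\sqrt{t}}\right) V_{\overline{L_0}}(t)$$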
L− = C L+ = H − L0 = U
Figure 21. Celtic knot C in the skein relation
As predicted by Theorem 2(c), VC , VU2 , VH , and VW contain √t, while
VT , VW , VB , and VS have only integer powers of t (why?).
= =
L+ = R L− = L 3 L0 = U2
Figure 22. Russian “wedding” knot R in the skein relation
L− = L 3 L+ = H U L0 = H
Figure 23. Oriented 3-ring linear chain in the skein relation
Session 4
Sneak Preview. To enter Multiplicative Land, we’ll have to get tickets from
an infinite raffle. While walking through villages of relatively prime numbers
and fields of perfect squares, while examining prime decompositions of castles and
crossing geometric series rivers, we will be constantly searching for ways to win
this raffle game. To this end, we will make friends with the two-faced duke, the
function ε, and the princes of divisors, τ and σ; we will meet their sum-function
relatives Sε , Sτ , and Sσ , and realize just how contagious multiplicativity is! In-
voking the strength of induction, we will eventually emerge victorious with a
winning raffle ticket, only to discover that even deeper challenges await us in this
Multiplicative Land of Dirichlet, Möbius, and Euler, in Part II.
A beginner with some basic knowledge from Number Theory I will be well-
equipped to follow our journey. The advanced reader can study the summarizing
Figure 1, hop quickly to the olympiad-hurdle Problem 8, and upon clearing it,
plunge directly into Part II, the intermediate-level continuation.
Suppose we buy several tickets from an infinite raffle, that is, a lottery
with infinitely many tickets. Each ticket has some natural number written
on it. We have a favorite number in mind, say, 2009, and we would like to get
a ticket with that number on it. But will there necessarily be a ticket with
2009 on it? Of course, it depends on which particular numbers are written
and how they are distributed among the raffle tickets.
Problem 1. (∞-Raffle) There are infinitely many tickets, each with one
natural number on it. For any n ∈ N the number of tickets on which divisors
of n are written is exactly n. For example, the divisors of 6, {1, 2, 3, 6}, are
written in some variation on 6 tickets, and no other ticket has these numbers
written on it. Prove that any n ∈ N is written on at least one ticket.
79
80 4. MULTIPLICATIVE FUNCTIONS
1.1. Initial exploration. Let’s mess a bit with some initial data to get a
feeling for ∞-Raffle. Try to solve the first cases for n = 1, 2, 3, 4 on your own
before reading the ensuing discussion below.
• The easiest number to be tackled is obviously n = 1: it has to be
written on exactly 1 ticket since {1} constitutes all divisors of 1.
• The next number is n = 2: its divisors {1, 2} must be written on a
total of 2 tickets; we just found out that 1 is written on exactly 1
ticket, so that 2 has no choice but to appear on the remaining 1 ticket
and on no more tickets.
• We apply the same analysis for n = 3: its divisors {1, 3} must be
written on a total of 3 tickets; as 1 is already known to occupy exactly
1 ticket, 3 must appear on exactly 2 tickets.
• For n = 4 the situation is marginally more exciting: the divisors
{1, 2, 4} must be written on a total of 4 tickets; knowing that each
of 1 and 2 is written on a unique ticket, 4 must appear on the remain-
ing 2 tickets.
The reader has probably gathered by now that,
PST 23. In order to solve ∞-Raffle, i.e., to prove that every number appears
on at least 1 ticket, we must do something more: we need to introduce a
“stronger” object, a function R(n) that counts the exact number of tickets on
which n appears.
The function R (for “Raffle”) suggested by PST 23 is the main player in
the solution to the ∞-Raffle Problem. We already know its first few values:
R(1) = 1, R(2) = 1, R(3) = 2, and R(4) = 2.
To find out on how many tickets 5 is written, we just calculate R(5): the
divisors of 5 are {1, 5}, written on a total of 5 tickets, so that
R(1) + R(5) = 5 ⇒ 1 + R(5) = 5 ⇒ R(5) = 4.
Similarly, the divisors {1, 2, 3, 6} of 6 pro-
duce an equation for the total number 6
of tickets on which they appear:
R(1) + R(2) + R(3) + R(6) = 6 ⇒ 1 + 1 + 2 + R(6) = 6 ⇒ R(6) = 2.
Exercise 1. Continue with the above calculations up to n = 10 to find out
that R(7) = 6, R(8) = 4, R(9) = 6, and R(10) = 4. The impatient reader
should keep on calculating R(n) until at least n = 20 to see if a pattern for
the function R pops up.
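The step-by-step calculation above can be sketched in a few lines of Python. This is only an illustration under our own naming: the helpers `divisors` and `raffle_counts` are not from the text. The code solves the raffle rule (the values R(d) over all divisors d of n add up to n) for R(n), using the values already found for smaller numbers.

```python
def divisors(n):
    """Return all natural divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def raffle_counts(limit):
    """Return a dict n -> R(n) for 1 <= n <= limit.

    Raffle rule: the values R(d) over all divisors d of n add up to n,
    so R(n) = n - (sum of R(d) over the proper divisors d of n).
    """
    R = {}
    for n in range(1, limit + 1):
        R[n] = n - sum(R[d] for d in divisors(n) if d < n)
    return R


if __name__ == "__main__":
    R = raffle_counts(20)
    for n in range(1, 21):
        print(n, R[n])
```

Running it up to n = 20 prints the table requested in Exercise 1, which makes it easier to hunt for a pattern.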
1
From now on, “numbers” and “divisors” will refer to natural numbers and divisors,
until we lift this restriction in Part II.
1. INFINITE RAFFLE: THE INITIAL SETUP 81
1.2. Brute force bows to general abstract theory. Using the above
method, it should be clear that one can determine R(n) for any n, as long as
all R(k) for smaller k are already calculated. Although this gives one way of
proving that our favorite number 2009 will appear on some ticket (just grind
out all numbers R(1), R(2), . . . , R(2009)), these close-to-insane calculations
are definitely not the way intended by the authors of Problem 1: for one,
calculating R(2009) alone will not prove that every number is written at
least once on the tickets.
In light of our new function R(n), the ∞-Raffle Problem can be para-
phrased to say:
Problem 1′. (∞-Raffle) Show that R(n) ≥ 1 for every n ∈ N.
This is far from a simple task. Interestingly, trying to prove just the
inequality (≥ 1) is much harder than trying to find the exact values R(n) and
compare them to 1. It is also true that, in order to conquer our problem, we
will require much more sophisticated methods than brute-force calculations.
So, here is the plan: for the remainder of the session, we will
PST 24. Step back and look at the ∞-Raffle Problem from different angles,
discover and formalize properties of R(n) along with a bunch of its sibling
functions, develop a new theory of multiplicative functions to explain all of
the arising phenomena, and ultimately produce an exact formula for R(n).
At every stage of creating this new theory, we will reconsider the ∞-
Raffle Problem, check how it relates to our new discoveries, and describe the
progress we have made on it up to that moment.
[Figure 1: diagram of the land M of multiplicative functions, with the labels: multiplicative functions M, Dirichlet product, Möbius function μ, sum-functions S_f, ∞-Raffle, Euler function φ, Möbius inversion.]
exact formula for the ∞-Raffle function R(n) at the end of this session will
come as a direct consequence of the abstract theory of multiplicative func-
tions. The real beauty of the abstraction approach of PST 24 is that, in the
context of these two multiplicative sessions, it will
• lead us to a new and deeper understanding of numbers, functions, and
relations between them, and
• empower us to conquer numerous other difficult problems that we could
not have solved before.
Figure 1 illustrates the richness of the land M of multiplicative func-
tions. The - area summarizes the current session, while the six-concept
area marked by will be developed in the intermediate-level Part II. Both
sessions contain (different) solutions to the ∞-Raffle puzzle. Part II will ven-
ture into more advanced areas such as M’s group structure and the Dirichlet
series, and touch upon the famous Riemann zeta-function ζ(s). An historical
overview at the end will link six great mathematicians who have contributed
to the topic of multiplicative functions and its various extensions.
2.1. Basic definitions. The first thing to notice about the ∞-Raffle func-
tion R(n) is that it essentially differs from commonly used functions such
as $g(x) = x^2$: the variable x in g(x) is a real number (x ∈ R); in contrast,
the variable n in R(n) is just a natural number (n ∈ N). Thus, R has the
restricted domain of N. Such functions have a special name:
Definition 1. A function f : N → C is called arithmetic.
Here C is the set of complex numbers. If you don’t feel comfortable with C,
for now you can safely replace it with the set of integers Z. For instance,
R(n) is arithmetic because R : N → Z. The important thing to remember
about arithmetic functions is that their inputs can only be natural numbers.
Let A denote the set of all arithmetic functions. This is a rather large
set involving all sorts of functions. In these sessions we will concentrate on a
special subset M of A comprised of all multiplicative functions. Why M is
so special can be explained by the fact that it is usually easier to calculate
explicit formulas for multiplicative functions and not so easy for arbitrary
arithmetic functions.2
Definition 2. An arithmetic function f : N → C is multiplicative if for any
relatively prime m, n ∈ N:
(1) f (mn) = f (m)f (n).
2
For the advanced reader, M is special on a deeper level partly because it is closed
under the Dirichlet product in A, as we will discover in the Part II continuation.
2. WHAT ARE MULTIPLICATIVE FUNCTIONS? 83
Recall that m and n are relatively prime if they share no common divisor
other than 1. For example, 9 and 20 are relatively prime, but 9 and 6 are
not. Thus, any multiplicative function must satisfy f (180) = f (9)f (20) and
f (1) = f (1)f (1) but not necessarily f (54) = f (9)f (6) (why?). While it is
obvious that the name “multiplicative” is inspired by equation (1), it is not
immediately clear why relative primeness should be involved at all.
Exercise 4. Describe all strongly multiplicative functions. Among them, are
there any functions (other than ε(n)) that attain only two distinct values?
Even though the definition of strong multiplicativity does not involve (at
least on the surface) relative primeness, the solution to Exercise 4 heavily
depends on the notion of the prime decomposition for any n ∈ N:
(2) $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$,
where $p_1, p_2, \ldots, p_r$ are the distinct prime divisors of n and $a_1, a_2, \ldots, a_r$ are
the corresponding positive exponents.
Partial Solution to Exercise 4: Let f be strongly multiplicative.
Definition 3 then allows us to split f (n) along any divisors of n. For example,
for a prime power $p^a$ we can split as follows:
$$f(p^a) = f(\underbrace{p \cdot p \cdots p}_{a}) = \underbrace{f(p) \cdot f(p) \cdots f(p)}_{a} = (f(p))^a.$$
More generally, we can split f(n) along the prime decomposition of n:
$$\Rightarrow\quad f(n) = (f(p_1))^{a_1} (f(p_2))^{a_2} \cdots (f(p_r))^{a_r}.$$
Thus, to completely know f we need to know only the values f (p) for any
prime p. These values can be arbitrarily assigned, as long as f (1) = 0 or 1
(why?). Of course, if we set f (1) = 0, we end up with the constant function O.
We summarize: any strongly multiplicative function f is either the 0-
function O, or it is constructed in the following way. For any prime number
$p_i$ we arbitrarily choose a (complex) number $b_i$, set $f(p_i) = b_i$, and expand
along the prime decomposition of n; that is, we define
$$f(n) = f(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) := b_1^{a_1} b_2^{a_2} \cdots b_r^{a_r}.$$ ♦
For instance, if we set $f(p_i) = p_i^2$ for all primes $p_i$, we get back the square
function $f(n) = p_1^{2a_1} p_2^{2a_2} \cdots p_r^{2a_r} = n^2$. If we set all $f(p_i) = 1$, we get
back the constant function ι(n) = 1. And finally, if we set all $f(p_i) = 0$ but
insist on f (1) = 1, we get back the hybrid function ε(n). Needless to add,
all of these functions are strongly multiplicative, as observed earlier.
PST 25. “Prove” that τ and σ are multiplicative via a specific example. The
chosen example must be representative enough to illustrate the involved ideas
and PSTs so as to allow us later to generalize our solution to all cases.
Some initial trials lead us to choose the simple (but general enough) case
of relatively prime m = 5 and n = 6. The divisors of 5, 6, and 5 · 6 = 30 are
{1, 5}, {1, 2, 3, 6}, and {1, 2, 3, 6, 5, 10, 15, 30}, respectively. The key question
is: how can we obtain the divisors of 30 by using only the divisors of 5 and 6?
After staring at the data for a while, the reader is likely to notice that the
divisors of 30 are all the pairwise products of the divisors of 5 and 6:
(3) {1, 2, 3, 6, 5, 10, 15, 30} = {1·1, 1·2, 1·3, 1·6, 5·1, 5·2, 5·3, 5·6}.
For starters, this means that the desired multiplicative relation among the
number of divisors of 5, 6, and 30 is satisfied: τ (30) = τ (5)τ (6) (8 = 2·4).
Moreover, we can calculate the divisor-sum σ(30) in two ways, using the
usual distributivity property:
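$$\begin{aligned}
\sigma(5)\sigma(6) &= (1 + 5)(1 + 2 + 3 + 6) \\
&\overset{\text{distr.}}{=} 1\cdot1 + 1\cdot2 + 1\cdot3 + 1\cdot6 + 5\cdot1 + 5\cdot2 + 5\cdot3 + 5\cdot6 \\
&= 1 + 2 + 3 + 6 + 5 + 10 + 15 + 30 = \sigma(30).
\end{aligned}$$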
There is no obstruction to generalizing the above “proof-by-example” to all
cases, as long as the idea of pairwise products in (3) holds for any relatively
prime m and n. This is a well-known fact from number theory:
Lemma 1. For any numbers m and n, the divisors of mn are all pairwise
products of divisors of m and n. If m and n are relatively prime, then all
such products are distinct. In particular, the number of divisors of mn is the
product of the numbers of divisors of m and of n: τ (mn) = τ (m)τ (n).
We leave the reader to come up with a rigorous proof of Lemma 1 (cf. Hints
section). Note that Lemma 1 shows de facto that τ is multiplicative. ♦
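As a numerical sanity check (not a proof) of Lemma 1, one can compare the divisors of mn with the pairwise products of divisors of m and n. The sketch below uses helper names of our own choosing (`divisors`, `pairwise_products`, `check_lemma1`); it tests both claims of the lemma for a single pair.

```python
from math import gcd


def divisors(n):
    """Return all natural divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pairwise_products(m, n):
    """All products c*d with c | m and d | n, listed with repetition."""
    return [c * d for c in divisors(m) for d in divisors(n)]


def check_lemma1(m, n):
    """Return True if Lemma 1 holds for the pair (m, n)."""
    products = pairwise_products(m, n)
    same_set = set(products) == set(divisors(m * n))
    if gcd(m, n) == 1:
        # For relatively prime m and n the products must also be distinct.
        return same_set and len(products) == len(set(products))
    return same_set


if __name__ == "__main__":
    print(sorted(pairwise_products(5, 6)))  # the divisors of 30, as in (3)
    print(all(check_lemma1(m, n) for m in range(1, 40) for n in range(1, 40)))
```

For m = 9 and n = 6, which are not relatively prime, the products still cover all divisors of 54 but repeat some of them, which is why τ(54) differs from τ(9)τ(6).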
multiplicative function f (n) to finding such a formula only in the case when
n is a prime power $p^a$. Namely, you can split f(n) along the prime powers
$p_i^{a_i}$ from the prime decomposition of n:
(6) $f(n) = f(p_1^{a_1}) f(p_2^{a_2}) \cdots f(p_r^{a_r})$,
and now look for a direct formula just in the prime-power case $f(p^a)$.
(a) $\tau(n) = \prod_{i=1}^{r} (a_i + 1)$; (b) $\sigma(n) = \prod_{i=1}^{r} \dfrac{p_i^{a_i+1} - 1}{p_i - 1}$.
The notation in this problem calls for a short detour. By now, we have
carefully avoided the symbols $\sum$ and $\prod$, but it is high time that we stop
beating about the bush and re-introduce them, as they will substantially
shorten our presentation and clarify calculations. The notation $\sum_{i=1}^{r}$ simply
means “take the sum of all terms indexed by i = 1, 2, . . . , r.” For instance,
we could have written the initial definition of σ in two equivalent ways:
$$\sigma(n) = \sum_{i=1}^{r} d_i = \sum_{d \mid n} d,$$
where the notation d|n stands for “d divides n.” The first summation is read
as “add all $d_1, d_2, \ldots, d_r$”; while the second summation: “add all d’s for
which d|n,” or in other words, “add all divisors d of n.” Likewise, equations
(4)–(5) on σ’s multiplicativity can be succinctly rewritten as:
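$$\sigma(m)\sigma(n) = \Bigl(\sum_{i=1}^{s} c_i\Bigr)\Bigl(\sum_{j=1}^{r} d_j\Bigr) \overset{\text{distr.}}{=} \sum_{i,j} c_i d_j \overset{\text{Lem.1}}{=} \sigma(mn).$$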
Notice the double-index “i, j” in the last summation: when bounds for i
and j are not explicitly written, it is assumed that i and j run over all
possibilities. Using the divisor notation, we can rewrite the above calculation
in yet another way that may at first look confusing; but ultimately, it is most
advantageous for multiplicative functions:
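$$\sigma(m)\sigma(n) = \Bigl(\sum_{c \mid m} c\Bigr)\Bigl(\sum_{d \mid n} d\Bigr) \overset{\text{distr.}}{=} \sum_{c \mid m,\, d \mid n} cd \overset{\text{Lem.1}}{=} \sum_{e \mid mn} e = \sigma(mn).$$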
The notation $\prod_{i=1}^{r}$ is analogous: just take the product of all terms
indexed by i. Thus, the function π can be written as $\pi(n) = \prod_{i=1}^{r} d_i = \prod_{d \mid n} d$.
Conversely, the desired formula for τ (n) in Problem 3 can be expanded as
τ (n) = (a1 + 1)(a2 + 1) · · · (ar + 1).
With this said, we can go back to proving our formulas for τ and σ.
If you don’t believe this, calculate by brute-force the number and the sum
of all divisors of 2009 and compare answers.
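Taking up the suggestion to test the formulas on 2009, here is a small sketch that computes τ and σ both from the prime decomposition (the product formulas of Problem 3) and by brute force. The function names are ours, and trial division is just one convenient way to factor a small number.

```python
def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent} (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def tau_formula(n):
    """tau(n) as the product of (a_i + 1) over the prime decomposition."""
    result = 1
    for a in factorize(n).values():
        result *= a + 1
    return result


def sigma_formula(n):
    """sigma(n) as the product of (p_i^(a_i+1) - 1) / (p_i - 1)."""
    result = 1
    for p, a in factorize(n).items():
        result *= (p ** (a + 1) - 1) // (p - 1)
    return result


def tau_brute(n):
    """Count the divisors of n directly."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def sigma_brute(n):
    """Add up the divisors of n directly."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


if __name__ == "__main__":
    print(factorize(2009))
    print(tau_formula(2009), tau_brute(2009))
    print(sigma_formula(2009), sigma_brute(2009))
```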
2.7. Is the ∞-Raffle Problem doable after all? It’s time to pause and
think what this all means for the function R(n). If we eventually do manage
to prove that R(n) is multiplicative, PST 26 will empower us to find a direct
formula for it. For this to work, we will need to
Problem 5. Find a formula for $R(p^a)$ for any prime power $p^a$.
We are now ready to derive a direct formula for the general case R(n):
(8) $R(n) \overset{\text{mult?}}{=} R(p_1^{a_1}) R(p_2^{a_2}) \cdots R(p_r^{a_r})$
(9) $\phantom{R(n)} = (p_1^{a_1} - p_1^{a_1-1})(p_2^{a_2} - p_2^{a_2-1}) \cdots (p_r^{a_r} - p_r^{a_r-1}).$
Since each factor $(p^a - p^{a-1}) \ge 1$ (why?), we conclude that the whole product
R(n) ≥ 1. Hence, every number n appears on at least 1 ticket!
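Formula (9) rests on the still unproven multiplicativity of R (note the “mult?” over the equality in (8)), so it is reassuring to compare it numerically with the values obtained directly from the raffle rule. The sketch below is such a check, with function names of our own; agreement on a finite range is evidence, not a proof.

```python
def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent} (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def raffle_formula(n):
    """Conjectured value of R(n): product of (p^a - p^(a-1)) as in (9)."""
    result = 1
    for p, a in factorize(n).items():
        result *= p ** a - p ** (a - 1)
    return result


def raffle_recursive(limit):
    """R(1..limit) from the raffle rule: sum of R(d) over d | n equals n."""
    R = {}
    for n in range(1, limit + 1):
        R[n] = n - sum(R[d] for d in range(1, n) if n % d == 0)
    return R


if __name__ == "__main__":
    R = raffle_recursive(200)
    print(all(R[n] == raffle_formula(n) for n in range(1, 201)))
    print(raffle_formula(2009))  # value predicted by (9) for 2009
```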
Solution (a): The key observation is that 403 is the product of two primes
13 and 31. Correspondingly, if n had three or more distinct prime divisors,
i.e., $n = p_1^{a_1} p_2^{a_2} p_3^{a_3} k$ for some k ∈ N, then
$$\tau(n) = (a_1 + 1)(a_2 + 1)(a_3 + 1)\,\tau(k) = 13 \cdot 31.$$
But each factor $a_i + 1 \ge 2$ and, therefore, it yields a prime divisor of τ(n);
yet, 13 · 31 has only two prime divisors, a contradiction. We conclude that
n has at most two distinct prime divisors; i.e., $n = p_1^{a_1} p_2^{a_2}$ or $n = p^a$.
The formula for τ then yields $\tau(n) = (a_1 + 1)(a_2 + 1) = 13 \cdot 31$ or
$\tau(n) = a + 1 = 13 \cdot 31$, from which $a_1 = 12$, $a_2 = 30$, and $a = 13 \cdot 31 - 1$. The
answer is $n = p_1^{12} p_2^{30}$ or $n = p^{13 \cdot 31 - 1}$ for primes $p_1$, $p_2$, and p, with $p_1 \ne p_2$.
At the heart of this solution stands a powerful idea:
PST 28. Via properties of prime decompositions (e.g., 13 · 31), bound the
number of distinct prime divisors of n (e.g., n has at most two prime divi-
sors), and investigate each case within your newly-found bound.
We leave the reader to figure out part (b) with somewhat similar tech-
niques, and we move to the different part (c).
Solution (c): If $\pi(n) = 2^3 3^6$ (= 5832), the definition of π(n) implies that
n has exactly two prime divisors: $p_1 = 2$ and $p_2 = 3$ (why?), from which
$n = 2^a 3^b$. From Problem 4 for π, we have $\pi(2^a 3^b) = (2^a 3^b)^{\frac{1}{2}\tau(n)}$, so that
$$(2^a 3^b)^{(a+1)(b+1)/2} = 2^{a(a+1)(b+1)/2}\, 3^{(a+1)b(b+1)/2} = 2^3 3^6.$$
Equating the exponents of the involved prime powers of 2 and 3, we arrive
at a system of two equations:
a(a + 1)(b + 1) = 6 and (a + 1)b(b + 1) = 12.
There are many ways to continue from here. A slick way is to divide the
two equations, resulting in a/b = 1/2, i.e., b = 2a. Substituting in the first
equation yields a(a + 1)(2a + 1) = 6, which (by trial and error) has only
one natural root a = 1 (why?), and hence b = 2. The final answer is then
$n = 2 \cdot 3^2 = 18$. Checking: $\pi(18) = 1 \cdot 2 \cdot 3 \cdot 6 \cdot 9 \cdot 18 = 2^3 3^6$.
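The answer to part (c) can also be confirmed by a direct search. The sketch below (helper names are ours) relies on one observation: n is itself a divisor of n, so n divides π(n), and only the divisors of the target product need to be tested.

```python
from math import prod


def divisors(n):
    """Return all natural divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def pi_product(n):
    """pi(n): the product of all divisors of n."""
    return prod(divisors(n))


def solve_pi_equals(target):
    """All n with pi(n) == target.

    Since n divides pi(n), every solution is a divisor of target.
    """
    return [n for n in divisors(target) if pi_product(n) == target]


if __name__ == "__main__":
    print(solve_pi_equals(2 ** 3 * 3 ** 6))
```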
For the next exercise, recall the notation gcd(m, n), which stands for
the greatest common divisor of m and n. Recall also that for each prime p,
the gcd picks up the smaller of the two prime powers $p^a$ in m and $p^b$ in n.
PST 28 applies with full force here too.
Exercise 7. Find all m and n such that gcd(m, n) = 18, τ (m) = 21, and
τ (n) = 10.
Proof: The product τ (n) = (a1 + 1)(a2 + 1) · · · (ar + 1) is odd iff all factors
$(a_i + 1)$ are odd themselves, i.e., all $a_i$’s are even. In turn, this means that
$a_i = 2b_i$ for some numbers $b_i$, and the prime decomposition of n is
$$n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} = p_1^{2b_1} p_2^{2b_2} \cdots p_r^{2b_r} = (p_1^{b_1} p_2^{b_2} \cdots p_r^{b_r})^2 = k^2,$$
$$\sigma(p^a) = 1 + p + p^2 + \cdots + p^a.$$
$$n = 2^a p_2^{2b_2} p_3^{2b_3} \cdots p_r^{2b_r} = 2^a (p_2^{b_2} p_3^{b_3} \cdots p_r^{b_r})^2 = 2^a k^2.$$
3. Sum-Functions
$$= 1 + p^a + (p^a)^2 + \cdots + (p^a)^b = \frac{(p^a)^{b+1} - 1}{p^a - 1}.$$
The last equality featured a geometric series with initial term 1, ratio $p^a$,
and b + 1 terms. Multiplying together all $\sigma_a(p_i^{a_i})$ yields:
$$\sigma_a(n) = \prod_{i=1}^{r} \frac{p_i^{a(a_i+1)} - 1}{p_i^{a} - 1}.$$
Note that this formula works for all real a except for a = 0 (why?).
$$S_f(n) = \prod_{i=1}^{r} \Bigl(f(1) + f(p_i) + f(p_i^2) + \cdots + f(p_i^{a_i})\Bigr).$$
Try this formula on several more sum-functions:
Problem 7. Find formulas for $S_\tau$ and $S_\sigma$. How about $S_{\ln}$ and $S_\pi$?
Partial Solution: Corollary 1 can be applied to the multiplicative
sum-functions $S_\tau$ and $S_\sigma$, but it cannot help us in the remaining two
non-multiplicative cases. To find $S_{\ln}$, two basic properties of the logarithmic
function come to the rescue: ln(a) + ln(b) = ln(ab) and $\ln(a^c) = c \ln a$. Hence,
$$S_{\ln}(n) = \sum_{d \mid n} \ln d = \ln\Bigl(\prod_{d \mid n} d\Bigr) = \ln(\pi(n)) = \ln\bigl(n^{\frac{1}{2}\tau(n)}\bigr) = \tfrac{1}{2}\,\tau(n) \ln n.$$
As for $S_\pi$, try your luck, patience, and ingenuity! ♦
If you would like to create even more diverse multiplicative functions,
consider the following easy-to-prove statement:
Lemma 2. If f1 , f2 , . . . , fk are multiplicative functions, then the usual func-
tion product f1 f2 · · · fk is also a multiplicative function.
Thus, for instance, $\tau^{2009}$, $S_{\tau^{2009}}$, and even $S_{S_{\tau^{2009}}}$ are all multiplicative.
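[Figure: the set A of arithmetic functions and its subset M of multiplicative functions, with the sample functions O, ι, ε, id, R, μ, τ, σ, φ, $n^a$, Λ, ln, π and the sum-function labels $S_\mu = \varepsilon$, $S_\varepsilon = \iota$, $S_\iota = \tau$, $S_R = \mathrm{id}$, $S_{\mathrm{id}} = \sigma$, $S_\Lambda = \ln$, $S_{\ln} = \frac{1}{2}\tau \ln$, $S_\tau$, $S_\sigma$, $S_\pi$.]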
One could stop right here: after all, we have solved our ∞-Raffle Problem.
But if you are curious to see the story of sum-functions placed within a much
larger context and to arrive at an even niftier solution to ∞-Raffle, plow on
into Part II.
For those who have found the discussion so far too elementary, sharpen
your olympiad problem-solving skills with the following delightful
7
Strictly increasing means f (x) < f (y) for any x < y in the domain of f .
Problem 4. The (most likely) way by which Gauss added up 1+2+· · ·+100
in his primary school math class was to pair up terms in the front with terms
in the back, each pair giving the same total sum of 101, i.e., 1+100 = 2+99 =
3 + 98 = · · · = 101.
The same idea can be applied to the product of all divisors of n. If
$\{1 = d_1, d_2, d_3, \ldots, d_r = n\}$ are the divisors of n arranged in ascending order,
note that $n = d_1 d_r = d_2 d_{r-1} = d_3 d_{r-2}$, and so on. The reason this works
out so nicely is that if d is a divisor of n, then $\frac{n}{d}$ is also a divisor of n, so
that $d \cdot \frac{n}{d} = n$. Formally, $\{\frac{n}{d_1}, \frac{n}{d_2}, \ldots, \frac{n}{d_r}\}$ are also the divisors of n, but
arranged in descending order. We see that π(n) can be calculated in two
different ways, and we multiply the two corresponding expressions below:
$$\left.\begin{aligned} \pi(n) &= d_1 d_2 \cdots d_r \\ \pi(n) &= \tfrac{n}{d_1} \tfrac{n}{d_2} \cdots \tfrac{n}{d_r} \end{aligned}\right\} \;\Rightarrow\; \pi^2(n) = \Bigl(d_1 \cdot \frac{n}{d_1}\Bigr)\Bigl(d_2 \cdot \frac{n}{d_2}\Bigr) \cdots \Bigl(d_r \cdot \frac{n}{d_r}\Bigr) = n^r.$$
As r = τ(n) is the number of divisors of n, we arrive at $\pi(n) = n^{\tau(n)/2}$.
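The pairing argument can be spot-checked numerically. To stay within integer arithmetic, the sketch below tests the squared form $\pi(n)^2 = n^{\tau(n)}$ of the identity; the helper names are ours.

```python
from math import prod


def divisors(n):
    """Return all natural divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def check_pi_identity(n):
    """Return True if pi(n)^2 == n^tau(n), i.e. pi(n) = n^(tau(n)/2)."""
    divs = divisors(n)
    return prod(divs) ** 2 == n ** len(divs)


def paired_products(n):
    """Products d_k * d_(r+1-k) of divisors paired front-to-back."""
    divs = divisors(n)
    return [d * e for d, e in zip(divs, reversed(divs))]


if __name__ == "__main__":
    print(paired_products(18))  # every pair multiplies to 18
    print(all(check_pi_identity(n) for n in range(1, 301)))
```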
Exercise 6(b). The prime decomposition of 381 is 3 · 127. Since σ is
multiplicative, we can apply the prime-power splitting to it:
$$\sigma(n) = \sigma(p_1^{a_1}) \sigma(p_2^{a_2}) \cdots \sigma(p_r^{a_r}) = 3 \cdot 127.$$
Each factor $\sigma(p^a) = 1 + p + \cdots + p^a$ yields a non-trivial divisor of σ(n) = 3 · 127.
Hence, there can be at most two prime divisors $p_1$ and $p_2$ of n. (Why?
Compare with PST 28.)
Case 1. If $n = p^a q^b$ for distinct primes p and q, then $\sigma(p^a) = 3$ and
$\sigma(q^b) = 127$. The first equation has only one solution: 1 + 2 = 3, i.e., $p^a = 2$.
You can “brute-force” the solutions to the second equation, but there is a
finer way to proceed. From $q^{b+1} - 1 = 127(q - 1)$ (how did we get this?)
we can reduce modulo q to −1 ≡ −127 (mod q), i.e., $q \mid 126 = 2 \cdot 3^2 \cdot 7$. But
$q \ne 2$ (we already established p = 2), so that q = 3 or q = 7. Check that
$3^{b+1} - 1 = 127 \cdot 2$ and $7^{b+1} - 1 = 127 \cdot 6$ do not yield acceptable solutions
for b. Therefore, this case does not work in our problem.
Case 2. If $n = p^a$ is a prime power, then $\sigma(p^a) = 381$, which means
$p^{a+1} - 1 = 381(p - 1)$. Again, reducing modulo p results in $p \mid 380 = 2^2 \cdot 5 \cdot 19$.
Check that p = 2 and p = 5 do not yield any solutions, but p = 19 works:
$19^3 - 1 = 18 \cdot 381$. The final (and only) answer is $n = 19^2 = 361$. ♦
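For readers who prefer to double-check by brute force: since n is one of its own divisors, σ(n) ≥ n, so any solution of σ(n) = 381 satisfies n ≤ 381. The sketch below (function names are ours) simply scans that range.

```python
def sigma(n):
    """Sum of all natural divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


def solve_sigma_equals(target):
    """All n with sigma(n) == target; sigma(n) >= n bounds the search."""
    return [n for n in range(1, target + 1) if sigma(n) == target]


if __name__ == "__main__":
    print(solve_sigma_equals(381))
```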
Exercise 6(d). As in part (c), set $n = 3^a 5^b$ and obtain a system of two
equations. Dividing them, deduce that $b = \frac{4}{3} a$. Substituting into one
equation, arrive at a(a + 1)(4a + 3) = 180. The LHS increases as a increases, so
the solution a = 3 is the only one (why?). The final answer is $n = 3^3 5^4$. ♦
Exercise 7. Among other things, gcd(m, n) = 18 implies that both 2 and 3
divide m and n (why?). On the other hand, τ (m) = 21 = 3 · 7 is a product
of two primes, just like in Exercise 6(a) where τ (n) = 13 · 31. By a similar
analysis, conclude that $m = 2^2 3^6$ or $m = 2^6 3^2$. Ditto, since τ(n) = 10 = 2 · 5
is also a product of two primes, $n = 2^1 3^4$ or $n = 2^4 3^1$ (why?).
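$$\begin{aligned}
S_\sigma(p^a) &= \sum_{d \mid p^a} \sigma(d) = \sum_{i=0}^{a} \sigma(p^i) = \sum_{i=0}^{a} \frac{p^{i+1} - 1}{p - 1} = \frac{1}{p-1}\Bigl(\sum_{i=1}^{a+1} p^i - (a+1)\Bigr) \\
&= \frac{p\,\frac{p^{a+1} - 1}{p - 1} - (a+1)}{p - 1} = \frac{p^{a+2} - p - (p-1)(a+1)}{(p-1)^2} = \frac{p^{a+2} - p(a+2) + (a+1)}{(p-1)^2}.
\end{aligned}$$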
Along the way, we used the formulas for the sum of the arithmetic series
$\sum_{i=0}^{a} (i + 1)$ and for the sum of the geometric series $\sum_{i=1}^{a+1} p^i$. Applying (6),
we piece together all prime-power parts into general formulas for $S_\tau$ and $S_\sigma$:
$$S_\tau = \prod_{i=1}^{r} \frac{(a_i + 2)(a_i + 1)}{2} = \prod_{i=1}^{r} \binom{a_i + 2}{2} \quad\text{and}\quad S_\sigma = \prod_{i=1}^{r} \frac{p_i^{a_i+2} - p_i(a_i + 2) + (a_i + 1)}{(p_i - 1)^2}.$$
The final version of the formula for $S_\tau$ employs the notation for the binomial
coefficient $\binom{a+2}{2} = \frac{(a+2)(a+1)}{2}$.
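The two closed formulas can be compared with the definition of a sum-function, $S_f(n) = \sum_{d \mid n} f(d)$, on a range of small n. The sketch below is a numerical check under our own naming (`sum_function`, `S_tau_formula`, `S_sigma_formula`); it is not part of the original solution.

```python
def factorize(n):
    """Prime decomposition of n as a dict {prime: exponent} (trial division)."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def divisors(n):
    """Return all natural divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def tau(n):
    return len(divisors(n))


def sigma(n):
    return sum(divisors(n))


def sum_function(f, n):
    """S_f(n): the sum of f(d) over all divisors d of n."""
    return sum(f(d) for d in divisors(n))


def S_tau_formula(n):
    """Product of binomial coefficients C(a_i + 2, 2)."""
    result = 1
    for a in factorize(n).values():
        result *= (a + 2) * (a + 1) // 2
    return result


def S_sigma_formula(n):
    """Product of (p^(a+2) - p(a+2) + (a+1)) / (p-1)^2 over prime powers."""
    result = 1
    for p, a in factorize(n).items():
        result *= (p ** (a + 2) - p * (a + 2) + (a + 1)) // (p - 1) ** 2
    return result


if __name__ == "__main__":
    ok_tau = all(S_tau_formula(n) == sum_function(tau, n) for n in range(1, 201))
    ok_sigma = all(S_sigma_formula(n) == sum_function(sigma, n) for n in range(1, 201))
    print(ok_tau, ok_sigma)
```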
As we indicated in the text, working with $S_\pi$ is much harder, since π is
not multiplicative. We can’t use (6) for $S_\pi(n)$; even finding a closed formula
for a prime-power piece is already problematic:
$$S_\pi(p^a) = \sum_{i=0}^{a} \pi(p^i) \overset{\text{Prob.4}}{=} \sum_{i=0}^{a} (p^i)^{\tau(p^i)/2} = \sum_{i=0}^{a} p^{i(i+1)/2} = \sum_{i=0}^{a} p^{\binom{i+1}{2}}.$$ ♦
Lemma 2. To save chalk, we will prove the lemma only for two
tiplicative functions f1 and f2 ; but this will actually suffice to prove the
statement for any number of such functions (why?).
Let m and n be relatively prime. To show multiplicativity of f1 · f2 , we
calculate as follows:
4. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 99
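$$\begin{aligned}
(f_1 \cdot f_2)(mn) &\overset{\text{def}}{=} f_1(mn) \cdot f_2(mn) \overset{\text{mult}}{=} f_1(m) f_1(n) f_2(m) f_2(n) \\
&= \bigl(f_1(m) f_2(m)\bigr)\bigl(f_1(n) f_2(n)\bigr) = (f_1 \cdot f_2)(m)\,(f_1 \cdot f_2)(n).
\end{aligned}$$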
Therefore, f1 ·f2 is also multiplicative. ♦
For instance, in the text, we claimed that $S_{S_{\tau^{2009}}}$ is multiplicative. This
is true because τ is multiplicative, and so is its power $\tau^{2009}$ by the
newly-proven Lemma 2, and so is its sum-function $S_{\tau^{2009}}$, and in turn, so is its
sum-function $S_{S_{\tau^{2009}}}$.
Theorem 2. Let $n_1$ and $n_2$ be relatively prime numbers such that $n = n_1 n_2$.
We will prove by induction on n that $f(n_1 n_2) = f(n_1) f(n_2)$.
The statement is trivial for n = 1: as $n_1 = n_2 = 1$, we need only to verify
f(1) = f(1)f(1). By definition, $S_f(1) = f(1)$; since $S_f(1) = 1$ or 0 ($S_f$ is
multiplicative!), we conclude that f(1) = 1 or 0, so that f(1) = f(1)f(1).
Assume now that the statement is true for all $d = d_1 d_2 < n$, i.e., that
$f(d_1 d_2) = f(d_1) f(d_2)$. Then for our $n_1 n_2 = n$ we calculate twice:
• $S_f(n_1 n_2) \overset{\text{Lem.1}}{=} \sum_{d_i \mid n_i} f(d_1 d_2) \overset{(*)}{=} f(n_1 n_2) + \sum_{d_i \mid n_i,\ d_1 d_2 < n} f(d_1 d_2) \overset{\text{IH}}{=} f(n_1 n_2) + \sum_{d_i \mid n_i,\ d_1 d_2 < n} f(d_1) f(d_2)$;
• $S_f(n_1) S_f(n_2) \overset{\text{def}}{=} \sum_{d_1 \mid n_1} f(d_1) \sum_{d_2 \mid n_2} f(d_2) = \sum_{d_i \mid n_i} f(d_1) f(d_2) \overset{(**)}{=} f(n_1) f(n_2) + \sum_{d_i \mid n_i,\ d_1 d_2 < n} f(d_1) f(d_2)$.
The same key idea occurs in steps (∗) and (∗∗), where we have separated the
product $n_1 n_2$ from all other (smaller) products $d_1 d_2$. Since $S_f$ is given to
be multiplicative, we have $S_f(n_1 n_2) = S_f(n_1) S_f(n_2)$: these are the LHS’s of
the above two equations. Equating their RHS’s and canceling all summands
$f(d_1) f(d_2)$, we are left with $f(n_1 n_2) = f(n_1) f(n_2)$. This completes the
induction step and shows that f is indeed multiplicative.
Problem 8. This is a tough problem. Did you manage to do it on your
own? In any case, we are given f (1) = 1, f (2) = 2, and f (3) ≥ 3 (why?).
PST 29. To prove f (3) = 3, create an equality in order to have a variable
to work with: set f (3) = 3 + m for some m ≥ 0 and show that m = 0.
Now, we have to somehow combine the multiplicativity of f and the fact
that it is strictly increasing to show m = 0. To this end,
PST 30. Aim at some composite number $n = d_1 d_2$ (with $d_1$ and $d_2$ relatively
prime) and, using $f(n) = f(d_1) f(d_2)$, arrive at f(n) in two different ways,
thereby creating two opposite inequalities $f(n) \le N_1$ and $f(n) \ge N_2$.
[Figure 3: two chains of steps leading to f(15), read from bottom to top.]
Left chain:
  f(6) = f(2)f(3) = 6 + 2m          (∗)
  f(5) ≤ f(6) − 1 = 5 + 2m          (6 ≥ 5)
  f(10) = f(2)f(5) ≤ 10 + 4m        (∗)
  f(9) ≤ f(10) − 1 ≤ 9 + 4m         (10 ≥ 9)
  f(18) = f(2)f(9) ≤ 18 + 8m        (∗)
  f(15) ≤ f(18) − 3 ≤ 15 + 8m       (18 ≥ 15)
Right chain:
  f(5) ≥ f(3) + 2 = 5 + m
  f(15) = f(3)f(5) ≥ (3+m)(5+m)     (∗)
Figure 3 depicts two possible ways of reaching f (15). Each upward (∗)
step is an application of f ’s multiplicativity, while each downward (≥) step
uses that f is strictly increasing. To the left of the diagram a chain of
calculations starts with f (6), goes through f (5), f (10), f (9), and f (18), and
lands us with our first inequality f (15) ≤ N1 = 15 + 8m. To the right of the
diagram, starting with f (5) produces our second inequality f (15) ≥ N2 =
(3+m)(5+m). A “skinny” sandwich N2 ≤ f (15) ≤ N1 has been created:
$15 + 8m + m^2 \le f(15) \le 15 + 8m \;\Rightarrow\; m^2 \le 0 \;\Rightarrow\; m = 0$,
and the desired f (3) = 3 follows immediately. Having cleared this hurdle,
one may now look for faster ways to show f (3) = 3. For example,
f (3) · f (5) = f (15) < f (18) = f (2) · f (9) < f (2) · f (10) = f (2) · f (2) · f (5) = 4f (5).
Cancelling f (5) implies f (3) < 4, i.e., f (3) = 3.
Do we have to come up with such specific arguments for every n to
establish that f (n) = n? Fortunately, there is a shortcut. To demonstrate
the idea, we run the first cases.
• How to show that f (4) = 4? As 4 = 2·2, multiplicativity won’t help. The next
value, f (5), also does not yield to multiplicativity. But f (6) = f (2)f (3) =
2 · 3 = 6. Since f (3) = 3 < f (4) < f (5) < f (6) = 6, only two natural numbers
fit between 3 and 6, namely, f (4) = 4 and f (5) = 5.
• Instead of going for f (7), f (8), or f (9) (all unreachable via multiplicativity),
we try f (10) = f (2)f (5) = 2 · 5 = 10. Again, we have just the right “tight”
inequalities: f (6) = 6 < f (7) < f (8) < f (9) < f (10) = 10. Only 3 natural
numbers can fit in, namely, f (7) = 7, f (8) = 8, and f (9) = 9.
• Continuing with this reasoning, the next number to find is f (14) = f (2)f (7)
= 2 · 7 = 14, thereby locating everything between f (10) and f (14).
Not so fast! The gap here is quite subtle. For the inductive argument
to work, we need to have 2k + 1 ≤ 4k − 2 (why? check the long chain of
inequalities above), i.e., k ≥ 1.5, i.e., k ≥ 2. But then the basis case of our
inductive hypothesis is for k = 2: “Assume that f (l) = l for l = 1, 2, 3.”
(Why?) We are given that f (1) = 1 and f (2) = 2, but we needed to go out
of our way to prove that f (3) = 3. The basis case in this problem is much
trickier to do than the general inductive step! Now we are truly done.
Timothy Chu, then a 12th grader from the SF Bay Area, asks even bolder
questions: what if f (n) is not given to be strictly increasing, or f (2) ≠ 2?
Problem 9. (Chu/O’Dorney) If f : N → N is multiplicative and increas-
ing, and f (k) = k for some k > 1, then f (n) = n for all n ∈ N.
Solution: Replacing (<) with (≤) in the above solution, we arrive at:
(14) $1 \le \dfrac{f(n+1)}{f(n)} \le \dfrac{f(n)}{f(n-1)}$ for all n ≥ 2;
i.e., the ratio-function f (n + 1)/f (n) is decreasing and bounded below by 1.
If f (n) is not strictly increasing, then f (n) = f (n+1) for some n implies that
all ratios f (m+1)/f (m) from there on are equal to 1; i.e., f (n) = f (n+1) =
f (n + 2) = · · · . In particular, $f(n) = f(n(n+1)) \overset{\text{mult}}{=} f(n) f(n+1)$; i.e.,
f (n + 1) = 1, and f is the constant 1 (why?), a contradiction. Thus, f is
strictly increasing. Because f : N → N and f (k) = k for some k > 1, it
follows that f (2) = 2.
We are back to our previous problem! Thus, f (n) = n for all n ∈ N.
Session 5
Sneak Preview. Having played with Rubik’s Cube and taken it apart to see
what is inside, it is now time to look under the hood and penetrate more deeply
into what its true structure is. The building blocks are groups. Stubborn poly-
nomials, symmetric elephants, and socks that beg to be put on, taken off, and
permuted between your feet are all part of the story, directed by Galois. You will
escape never-ending cycles in a complex world, only to stroll along in Permuter-
land and, ultimately, seek bi-polar paths in 15-Puzzleland.
1. Puzzling It Out
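[Figure 1: 15-puzzle positions (a)–(d). Each last row has three tiles and the empty cell; the position of the empty cell within that row is not shown here.]
   (a)               (b)               (c)               (d)
  1  2  3  4        4  3  2  1       10  9  8  7        8 14 11  3
  5  6  7  8        5  6  7  8       11  2  1  6       12  2 15  9
  9 10 11 12       12 11 10  9       12  3  4  5        6  4 13  1
 13 14 15          13 14 15          13 14 15           7 10  5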
Problem 1. (McCoy, [53]) Starting from the initial position in Figure 1a,
which 15-puzzle positions in Figures 1b–d can be achieved and why?
Understandably, a novice may ask: “What does this puzzle have to do
with serious mathematics?” “Ah, . . . wrong question!” an advanced math cir-
cler will say. “Just about any interesting (or uninteresting) puzzle is somehow
related to mathematics.” The puzzle is frequently a disguise for an actual
problem from group theory. In fact, by the end of this session you will have
seen such a variety of examples of groups, that (whether you wanted to or
not) you will start seeing groups everywhere around you!
104 5. GROUP THEORY
For instance, just like the Rubik’s Cube, the 15-puzzle is solvable via a
special type of permutations that form a subgroup, fortified with the idea of
a closed path . . . . What does this mean? As vague as this hint may be, it is
the only one you will get for now on Problem 1. Did you try it? Any luck
in transforming Figure 1a into others? Some positions will be achievable
while others will stubbornly remain out of your reach! Is it possible to rigor-
ously prove that the stubborn positions are indeed unreachable, regardless
of how long you play with the puzzle and regardless of whatever complicated
sequences of moves you invent?
If you are stuck, hang around with us for a systematic introduction to
the objects, theorems, tools, and basic applications of group theory. At
the end, we will get back to the 15-puzzle and, hopefully, by then you will
not find it nearly as difficult as it may now look. On the other hand, if
you already know the fundamentals of group theory, skim over the examples
spread throughout this session, and jump to the challenging problems in the
last section. The 15-puzzle will be waiting for you there.
2. A Polynomial Prelude1
2.1. The promise of the quartics. When we think of algebra the first
thing that comes to mind is the study of polynomial equations and their
solutions. And duly so – for a long, long time algebra essentially had been
that very study.
Of course, any linear or quadratic equation can easily be solved, and
there is evidence suggesting that Babylonians as early as 1800 BCE already
knew general procedures for dealing with both types of equations.
Cubic equations proved to be much trickier – the first description of a
general way to solve them appeared in Ars Magna, published in 1545 by
Gerolamo Cardano.2 Soon after, Cardano’s pupil Lodovico Ferrari invented
a nice reduction procedure to conquer quartic (4th degree) equations by con-
structing an associated cubic equation, solving it, and then using its roots
to find a solution to the original quartic equation. This method seemed to
promise that a similar approach could be used to solve higher degree equa-
tions – just keep constructing auxiliary lower degree equations and solving
them.
Unfortunately, this did not work – so much so that all attempts to find
a general method for solving even quintic (5th degree) equations failed.
1
If any words in this section are unfamiliar to you, don’t worry: just read on for the
fun of it. After all, the history of mathematics is full of duels, drama, and enlightenment.
2
Recall the discussion of $x^3 = 15x + 4$ in Complex Numbers I, volume I. The method
was actually found independently by Scipione del Ferro and Niccolò Tartaglia, but revealed
by Cardano in Ars Magna, apparently, against Tartaglia’s wishes.
3. Action Groups
Now just stop for a second and see whether what you have just read
makes any sense to you. You certainly should be perplexed by certain words!
In particular, what exactly is meant by the product of actions? Actions are
not numbers, so how do we multiply them? When we deal with an action
group, we can combine a pair of actions by performing one of them and then
following with the other one; and – just for convenience! – we say that we
have multiplied these two actions.
We are really interested only in the final result of these actions, and not
in the particular way by which that result has been achieved. So if the soldier
turns 180° around and then turns right, the result is the same as if he simply
turned left to begin with. (Can you see it?) Thus we say that the product
of actions b and r equals l, and we write rb = l. Observe the order in which
we list the actions b and r: from right to left!
Let’s go back to the multiplication table for the turning soldiers. If the
2nd row is labeled by r and the 3rd column is labeled b, then we place l in
cell (2, 3). Can you fill in the entire table?

    ·  s  r  b  l
    s
    r        l
    b
    l

Notice that s = “doing nothing” is a very special action. Every group must
have such an element. Why?
Anyone would agree that the counteraction of turning left, l, is turning right,
r; if we perform these two actions one after the other, we get rl = s. By our
rules for a group, we must therefore include in T the “do nothing” turn s.
Can you think of other reasons why s should be in T ? Check out the first
row and column corresponding to s: they are also very distinguishable!
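If you would like to check your table against a computation, here is a minimal Python sketch. It is not part of the original session: it assumes that each turn is encoded as a number of clockwise quarter-turns (s = 0, r = 1, b = 2, l = 3), and the names `TURNS`, `product`, and `table` are made up for this illustration.

```python
# Turns encoded as clockwise quarter-turns (an encoding assumed for this sketch):
# s = stay (0), r = right (1), b = about-face (2), l = left (3).
TURNS = {"s": 0, "r": 1, "b": 2, "l": 3}
NAMES = {value: name for name, value in TURNS.items()}


def product(x, y):
    """Return xy: perform y first, then x (read from right to left)."""
    return NAMES[(TURNS[x] + TURNS[y]) % 4]


def table():
    """Multiplication table: the entry in row x and column y is xy."""
    return {x: {y: product(x, y) for y in TURNS} for x in TURNS}


if __name__ == "__main__":
    t = table()
    print("· " + " ".join(TURNS))
    for x in TURNS:
        print(x + " " + " ".join(t[x][y] for y in TURNS))
```

Adding quarter-turns modulo 4 is only one convenient way to model the soldier; filling in the table by hand and then comparing is still the better exercise.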
3.2. A group for every sock. While one sock is not enough for your two
feet, it is enough (precisely because of this) to make for an interesting group.
Exercise 2. (Sosinski, [77]) The “One Sock” group S consists of the actions:
• n = do nothing;
• c = take the sock off and put it on the other foot;
• i = take it off, turn it inside out, then put it on the same foot again;
• t = take it off, turn it inside out, then put it on the other foot.
Show that S is indeed a group.
Here is one question over which you may (and should)
want to ponder: Is the One Sock group any different from
the Turning Soldier group? Each consists of four actions.
Still, can we view every turn of a soldier as a sock move?
We will explore such questions soon; but for now start thinking about this.
PST 32. A classical way to distinguish between the Turning Soldier and
the One Sock groups is to find the counteraction of each sock’s move and of
each soldier’s turn and compare the two situations.
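The comparison suggested in PST 32 can also be tried out on a computer. The sketch below is only an illustration under stated assumptions: a sock's state is modeled as a pair (foot, side) of bits, each sock move flips one or both bits, and the soldier's turns are counted in clockwise quarter-turns. All names in the code are hypothetical.

```python
# A sock's state is (foot, side); each action flips foot and/or side.
# Encoding assumed here: n=(0,0), c=(1,0), i=(0,1), t=(1,1), combined mod 2.
SOCK = {"n": (0, 0), "c": (1, 0), "i": (0, 1), "t": (1, 1)}
# Soldier turns as clockwise quarter-turns mod 4 (also an assumed encoding).
SOLDIER = {"s": 0, "r": 1, "b": 2, "l": 3}


def sock_product(x, y):
    """Perform sock move y, then sock move x."""
    (a, b), (c, d) = SOCK[x], SOCK[y]
    result = ((a + c) % 2, (b + d) % 2)
    return next(name for name, value in SOCK.items() if value == result)


def soldier_product(x, y):
    """Perform turn y, then turn x."""
    total = (SOLDIER[x] + SOLDIER[y]) % 4
    return next(name for name, value in SOLDIER.items() if value == total)


def self_counteracting(elements, mul, identity):
    """Elements x with xx = identity, i.e., each is its own counteraction."""
    return [x for x in elements if mul(x, x) == identity]


if __name__ == "__main__":
    print(self_counteracting(SOCK, sock_product, "n"))
    print(self_counteracting(SOLDIER, soldier_product, "s"))
```

If the two printed lists have different lengths, no renaming of the elements can turn one table into the other.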
3.3. A group for every figure. The next example is much more interesting
(and important). While numbers measure size, groups measure symmetry.
Symmetry is the property of an object to remain unchanged while undergoing
changes. More precisely, a symmetry is a motion that maps a figure onto
itself. For instance, any non-trivial motion you perform on the
elephant-in-profile E1 – a translation, rotation, reflection, or glide
reflection⁴ – will produce another figure (congruent to E1 ). By contrast, the
full-face-elephant E2 will go to itself under a reflection s about a vertical
line. We conclude that elephant E1 has only the trivial symmetry i (the “fix
everything” motion), while elephant E2 has a second symmetry – the reflection s.
In general, for every geometric figure F , the collection of symmetries of F
forms a group (why?) called the symmetry group of F and denoted by S(F ).
The structure and size of this group tells us how much symmetry the figure
possesses. Thus, S(E1 ) = {i} is a single-element group, while S(E2 ) = {i, s}
is a group of 2 motions. Let’s move now to larger symmetry groups.
Exercise 3. Let Δ denote an equilateral triangle. Describe (geometrically)
the elements of the symmetry group S(Δ). How many are there?
The curious reader, of course, will ask if the remaining numbers missed
by the orders of Dn can be obtained as orders of symmetry groups of plane
figures. We challenge the reader to affirmatively answer this question:
Problem 2. For any odd n ≥ 1, find a plane figure F with exactly n
symmetries, i.e., such that the number of elements in S(F ) is n.
edges; and 1 rotation about each of 3 axes through the centers of opposite
rectangular side faces.
(3) G3 has the identity; 2 rotations about each of the 4 axes through a
vertex and the center of the opposite face; and 1 rotation about each of the
3 axes through the midpoints of opposite edges.
Thus, |G1 | = |G2 | = |G3 | = 12. But clearly, the symmetries of these
solids are distinctly different. One such striking difference is the fact that
one single rotation when repeated, generates all rotations of the pyramid
(which rotation is that?); but there is no such single rotation of the prism
or the tetrahedron.6 There are other differences as well. To name just
one more, for the pyramid there is only one (non-trivial) rotation which
counteracts itself (which one?), i.e., combined once with itself it equals the
identity. For the prism, there are more such rotations (how many?); and for
the tetrahedron, the number is still different (what is it?).
These essential differences imply that the Gi ’s are all distinct groups. ♦
PST 33. To establish that groups are not the same, find a suitable property
that is satisfied by a different number of objects from each group. Along with
group order, you may want to count, for example, the number of elements
that counteract themselves, or those that generate the groups (if any).
3.5. A group within a group. Problem 3 was based on the fact that the
rotational symmetries of the solids in Figure 3 form smaller groups Gi inside
the full symmetry groups. A similar phenomenon can be observed in the
simpler case of the group D4 : we may notice that some actions in this group
form a group by themselves. We call such a subset a subgroup.
3.6. Twin groups. Comparing the tables for R4 and T (pp. 106, 109),
we can see that they differ only by the letters used to denote the elements.
After a suitable renaming (e.g., s → r0 , r → r1 , b → r2 , l → r3 ) one
table will become exactly the same as the other. Therefore, these groups are
indistinguishable from an algebraic point of view. We call them isomorphic
groups and denote this by R4 ≅ T.
Exercise 5. Are the groups R4 and S isomorphic? Why or why not?
So far, we have found only two non-isomorphic groups of order 4: T and S.
4. General Groups
4.2. One too many. It is natural to ask if the objects in Definition 2(ii)-(iii)
of a group are unique:
All the same, here is a property that will make a group abelian.9
Problem 6. Show that if a ∗ a = e for all a ∈ G then G is abelian.
Hint: For any two a, b ∈ G, start with $(a ∗ b)^2 = e$, expand this, and solve
for b ∗ a (which will appear in the middle of your expression). ♦
The hypothesis of the problem may be interpreted to say that every
element is its own inverse, or that every action is its own counteraction!
This was the case in the One Sock group S; now we automatically know
that S is abelian, without having to check the commutativity condition for
all pairs of sock moves! Problem 6 is a classic in the group theory folklore:
it relates the local self-inverse property of individual elements of G (a∗a = e)
to the global property of G being abelian (a ∗ b = b ∗ a).
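For readers who want to compare notes after trying the hint, one possible chain of equalities is sketched below. It is a reconstruction in the spirit of the hint, not the book's own write-up, and it uses only associativity and the hypothesis a ∗ a = e.

```latex
\begin{align*}
(a * b) * (a * b) &= e
  && \text{hypothesis applied to the element } a * b \\
a * \bigl(a * b * a * b\bigr) * b &= a * e * b = a * b
  && \text{multiply by } a \text{ on the left and } b \text{ on the right} \\
(a * a) * (b * a) * (b * b) &= a * b
  && \text{regroup using associativity} \\
e * (b * a) * e &= a * b
  && \text{hypothesis applied to } a \text{ and to } b \\
b * a &= a * b
\end{align*}
```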
The comparison between S(Z) and Dn justifies the name infinite dihedral
group D∞ for S(Z). Still, why wouldn’t S(R) or S(C) work as well as S(Z)?
Figure 4. C-symmetries, Z-symmetries, and cyclic C5
5.3. Complex world. We can think of the circle C as the set of all complex
numbers¹² with magnitude 1: C = {z ∈ ℂ | |z| = 1}, or equivalently, C is
the unit circle in the ℂ-plane. For starters,
Exercise 12. Show that ℂ∗ = ℂ − {0} is a group under ordinary multiplication
of complex numbers, and that (C, ·) is a subgroup of (ℂ∗ , ·).
The fact that (C, ·) is an infinite group does not prevent it from having
finite subgroups. Indeed, let n ≥ 1 be an integer, and denote by $C_n$ the
set of all roots of the polynomial equation of degree n, $z^n - 1 = 0$, i.e.,
$C_n = \{z \in \mathbb{C} \mid z^n = 1\}$. For example, $C_2 = \{1, -1\}$ and $C_4 = \{1, i, -1, -i\}$.
It is no surprise that all these roots land on the unit circle C: the equation
$z^n = 1$ gives $|z|^n = 1$, and since |z| is a non-negative real number, this
implies that |z| = 1 and hence z ∈ C. This is illustrated by Figure 4c,
depicting the relative positions of the 5 elements of $C_5$ along C. Moreover,
Exercise 13. Show that $C_n$ is a subgroup of (C, ·).
We can actually list all elements of $C_n$ (via de Moivre’s formula for $\sqrt[n]{z}$):
$C_n = \{1, \zeta_n, \zeta_n^2, \zeta_n^3, \ldots, \zeta_n^{n-1}\}$,
where $\zeta_n = \cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}$ is a primitive $n$th root of unity, that is, a root
whose powers yield all other roots of the equation $z^n = 1$. We can observe
this phenomenon in the above examples:
• as $(-1)^2 = 1$, the primitive root in $C_2$ is $\zeta_2 = -1$;
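To see the list of roots numerically, one can experiment with the short Python sketch below. It is only an illustration: the function names are invented here, and because it works with floating-point numbers, equalities are tested up to a small tolerance rather than exactly.

```python
import cmath


def roots_of_unity(n):
    """The n complex solutions of z**n = 1, listed as powers of zeta_n."""
    zeta = cmath.exp(2j * cmath.pi / n)  # cos(2*pi/n) + i*sin(2*pi/n)
    return [zeta ** k for k in range(n)]


def close(z, w, tol=1e-9):
    """Floating-point stand-in for 'z equals w'."""
    return abs(z - w) < tol


def is_closed_under_product(roots):
    """Check that the product of any two listed roots is again a listed root."""
    return all(any(close(a * b, c) for c in roots) for a in roots for b in roots)


if __name__ == "__main__":
    for z in roots_of_unity(5):
        print(f"{z.real:+.4f} {z.imag:+.4f}i   |z| = {abs(z):.4f}")
    print(is_closed_under_product(roots_of_unity(5)))
```

A numerical check like this is no substitute for the proofs requested in Exercises 12 and 13, but it helps to picture where the roots sit on the circle.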
12
To get comfortable with this example, read first Complex Numbers I-II. In particular,
the magnitude |z| is the distance from point z = a + bi to the origin; $|z^{-1}| = 1/|z|$ for z ≠ 0, and
$|z_1 z_2| = |z_1| \cdot |z_2|$ for any $z_1, z_2 \in \mathbb{C}$. Exercises 12–13 are solved (under disguise) in these
sessions. Primitive roots of unity and de Moivre’s formula appear there too.
114 5. GROUP THEORY
We agreed above that $a^{-n} = (a^{-1})^n$ for every positive integer n (this
is simply the meaning of our notation). In order to explore if and how a
generates the whole group, we need to be able to manipulate all powers of a:
Exercise 17. Is it true that $(a^{-1})^n = (a^n)^{-1}$ for any integer n? Why?
5.6. Will the court, please, come to order! It is true that every element
a ∈ G generates a cyclic subgroup ⟨a⟩ of G. Moreover, if G is finite, there
must be a positive n such that $a^n = e$; otherwise, a would generate an infinite
cyclic subgroup ⟨a⟩ of G! In general,
Definition 4. If G is a group and a ∈ G, the smallest positive integer n for
which $a^n = e$ is called the order of a and denoted by o(a). If such n does
not exist, we say that a has infinite order and write o(a) = ∞.
Here is a bunch of examples. Check them all out on your own!
• In C4 the order o(i) = 4 while o(−1) = 2. However, in C5 , all elements
(except for the identity 1, of course) have orders 5 so that each generates
the whole group C5 (cf. Fig. 4c).
• Moving to additive notation, o(3) = 2 in Z6 because 3 + 3 = 6 ≡ 0; but
o(3) = 4 in Z4 because 3 + 3 + 3 + 3 = 12 ≡ 0 and no smaller sum
would yield the identity 0; and yet, o(3) = ∞ in (Z, +) (why?).
• Finally, for the reflection s and the generating rotation r in the dihedral
group Dn we have o(s) = 2 while o(r) = n.
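The additive examples above are easy to check by machine. The following Python sketch is a hedged illustration (the function name `additive_order` is made up here): it finds the order of an element of (Zn , +) straight from Definition 4, by adding the element to itself until the identity 0 appears.

```python
from math import gcd


def additive_order(a, n):
    """Order of a in (Z_n, +): the least k >= 1 with k*a ≡ 0 (mod n)."""
    k, total = 1, a % n
    while total != 0:
        total = (total + a) % n
        k += 1
    return k


if __name__ == "__main__":
    print(additive_order(3, 6), additive_order(3, 4))
    # Compare with the closed form n / gcd(a, n) for every element of Z_12.
    print([(a, additive_order(a, 12), 12 // gcd(a, 12)) for a in range(12)])
```

The loop always stops because Zn is finite; in (Z, +) the same loop would run forever for any a ≠ 0, which is exactly what o(a) = ∞ means.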
Problem 8. Let a and b be elements of a group G, and let o(a) = k.
(a) What is $o(a^{-1})$? How about $o(a^m)$ for any m ∈ Z?
(b) Prove that $H = \{e, a, a^2, a^3, \ldots, a^{k-1}\}$ is a subgroup of G, previously
denoted by ⟨a⟩. Deduce that the order of ⟨a⟩ is o(a).
(c) If o(ab) = n, prove that o(ba) = n too.
(a) If G is cyclic, show that it is abelian.
(b) If G is cyclic of order n, show that it has an element of order n.
(c) Show that Dn is non-abelian and hence non-cyclic, but it contains a
cyclic subgroup of order n.
Hint: (c) Consider the set of rotational symmetries of a regular n-gon. ♦
5.7. A never-ending cycle? Can an infinite group have elements of finite
order? Not only is the answer Yes, but you have already worked many times
in the “extreme” scenario:
Problem 9. Give at least two different examples of (infinite) groups that
contain elements of order n for every n ≥ 1.
Hint: Two possible answers are among the groups on pp. 112–113. ♦
We constructed the cyclic groups Cn as examples of finite subgroups of
the circle C. Are these all finite subgroups of C∗ ? The ingredients for the
solution to our final problem below are spread all over this section.
Problem 10. (Intermediate) Find all finite subgroups of (C∗ , ·).
Permutation groups are the substance of Rubik’s Cube I-II. Indeed, their
complexity is what makes the Rubik’s Cube such a tantalizing and challeng-
ing puzzle. Even though permutations provide “just” examples of groups,
they are so fundamental for the development of group theory that it is worth-
while reviewing them again here and doing all associated exercises. If you
feel strongly prepared for the topic, tackle on your own the 15-puzzle in
Problem 1 and rejoin us later for the official “showdown” via permutations.
6.1. The word permutation has at least five mathematical synonyms.
Definition 5. Let A be a set of n elements. A permutation α of A is a
rearrangement of the elements of A. In other words, α is a 1-to-1 function
from A onto A, a.k.a. a 1-to-1 correspondence or a bijection of A.
6. PERMUTATION (OR SYMMETRIC) GROUPS 117
13
The example of Permuterland was introduced in 1973 by Roy Dubish in his Groups
(Topics For Mathematics Clubs, [22]). It is interesting to realize that 40 years ago,
group theory was considered a suitable topic for budding pre-college mathematicians.
The publisher of the book is the National Council of Teachers of Mathematics.
Now, in a hectic mood one day, the king yells “Promenade 1” and, two minutes
later, yells “Promenade 2.” To his royal amazement, the king realizes that
the judges are now seated exactly as they would be if instead he had just
yelled “Promenade 3.” The next day he decides to try this procedure again –
but with a slight variation: now he yells “Promenade 2” first and then yells
“Promenade 1” – and he is amazed to find out that the result is not the same
as Promenade 3; indeed, the result is what he has been calling Promenade 4.
Figure 5. Permuterland
Exercise 19. Of course, you recognize that the “Promenades” are simply
elements of some Sk . What is k and how does the example of Permuterland
show that this Sk is non-abelian?
Solution: The judges comprise the set A = {A, B, C}. If pj denotes
Promenade j, then the King’s favorite promenades p1 , p2 , p3 , and p4 are
permutations of A, i.e., they are elements of the symmetric group S3 . The
King’s observation p1 p2 = p3 ≠ p4 = p2 p1 shows that S3 is non-abelian.
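The king's discovery is easy to reproduce on a computer. In the Python sketch below, a permutation of three seats is stored as a tuple whose entry in position x is the image of x. The two sample promenades are hypothetical choices made for this illustration (the book's actual p1 and p2 are defined in Figure 5, which is not reproduced here).

```python
def compose(p, q):
    """Return the permutation 'q first, then p' (the right-to-left product pq)."""
    return tuple(p[q[x]] for x in range(len(q)))


# Two hypothetical promenades on seats 0, 1, 2 (not the book's actual p1, p2):
swap_first_two = (1, 0, 2)  # seats 0 and 1 trade judges
swap_last_two = (0, 2, 1)   # seats 1 and 2 trade judges

if __name__ == "__main__":
    print(compose(swap_first_two, swap_last_two))
    print(compose(swap_last_two, swap_first_two))
```

The two printed tuples differ, so the order in which the promenades are called matters: a single non-commuting pair already rules out an abelian group.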
Notice that the permutation π in (a) has the effect of moving the elements
around in a cycle. Thus, we call it a cycle of length 3 and we write it as
π = (1 3 2). This is just another, more convenient, notation for the same
permutation. We think of (1 3 2) as representing the following mapping:
1 → 3 → 2 → 1, and we drop the spaces if only one-digit numbers appear.
Clearly, (132) = (321) = (213).
A cycle of length r is called an r-cycle. A 2-cycle is also called a
transposition since it transposes two elements. Which promenades in
Permuterland are transpositions and which are 3-cycles?
Exercise 23. Calculate $(1356)^2$, $(1356)^3$, and $(1356)^4$. What is o((1356))?
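Once you have worked out the powers by hand, you may want to confirm them with a few lines of Python. This is a sketch under assumptions, not the book's method: permutations are stored as dictionaries sending each number to its image, the cycle acts on {1, . . . , 6}, and all function names are invented for the illustration.

```python
def cycle_to_map(cycle, n):
    """Mapping {1..n} -> {1..n} for a single cycle such as (1, 3, 5, 6)."""
    mapping = {x: x for x in range(1, n + 1)}
    for position, x in enumerate(cycle):
        mapping[x] = cycle[(position + 1) % len(cycle)]
    return mapping


def compose(p, q):
    """Right-to-left product pq: apply q first, then p."""
    return {x: p[q[x]] for x in q}


def power(p, k):
    """The k-th power of p for k >= 0 (the 0-th power is the identity)."""
    result = {x: x for x in p}
    for _ in range(k):
        result = compose(p, result)
    return result


def order(p):
    """Least k >= 1 such that the k-th power of p is the identity."""
    identity = {x: x for x in p}
    k, current = 1, p
    while current != identity:
        current = compose(p, current)
        k += 1
    return k


if __name__ == "__main__":
    sigma = cycle_to_map((1, 3, 5, 6), 6)
    for k in (2, 3, 4):
        print(k, power(sigma, k))
    print("order:", order(sigma))
```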
Not all permutations are cycles (obviously!) . . . but it is true that they
can all be written as products of one or more cycles. For starters,
Exercise 25. Write the permutation
$\varphi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 2 & 5 & 6 & 7 & 8 & 1 & 4 \end{pmatrix}$
as a product of cycles.
Now generalize this result to any permutation, on your way to a formula for
the order of permutations.
Problem 12. Prove the following statements.
(a) Every permutation can be expressed as a product of disjoint cycles, i.e.,
cycles which have no common elements.
(b) Every permutation can be expressed as a product of transpositions.
(c) The order of the product of disjoint cycles is the least common multiple
(lcm) of the lengths of these cycles.
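Parts (a) and (c) suggest a simple procedure: follow each element until it returns to its starting point, collect the cycles, and take the lcm of their lengths. The Python sketch below illustrates this on the permutation φ of Exercise 25, read off its two-row notation. The function names are hypothetical, and the code takes statement (c) for granted rather than proving it.

```python
from functools import reduce
from math import gcd


def disjoint_cycles(perm):
    """Split a permutation (dict x -> image of x) into disjoint cycles,
    omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles


def order(perm):
    """Order of perm, computed as the lcm of its disjoint cycle lengths."""
    lengths = [len(c) for c in disjoint_cycles(perm)]
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)


# The permutation phi of Exercise 25: top row 1..8, bottom row 3 2 5 6 7 8 1 4.
phi = dict(zip(range(1, 9), (3, 2, 5, 6, 7, 8, 1, 4)))

if __name__ == "__main__":
    print(disjoint_cycles(phi))
    print(order(phi))
```

Starting each cycle at its smallest element is just a convention; as noted for (132) = (321) = (213), the same cycle can be written starting from any of its elements.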
“Proof” by Example: You should have found out in Exercise 25 that
φ = (1357)(468) is the product of two disjoint cycles. Disjoint cycles are
great because they commute: it doesn’t matter which way you write them,
you will get the same result; e.g., (1357)(468) = (468)(1357). Thus, powers
6.4. Permutations are born unequal! Can you guess why we didn’t
use the transpositions to calculate o(φ)? The key reason is that transposi-
tions cannot always be made disjoint; hence, they may not commute and,
in general, are not convenient in calculating the order of the permutation.
Nevertheless, representing a permutation as a product of transpositions plays
a crucial role in solving the 15-puzzle, so don’t discard transpositions yet!
Definition 6. A permutation is said to be even if it can be expressed as the
product of an even number of transpositions. A permutation is odd if it can
be expressed as the product of an odd number of transpositions.
For example, (123) and (12)(2543) are even since (123) = (13)(12) and
(12)(2543) = (12)(23)(24)(25); while (1234) = (14)(13)(12) is odd, and so is
φ above. The identity permutation is even: (1) = (12)(12).
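The examples above all use the same trick: an r-cycle can be written as a product of r − 1 transpositions. The Python sketch below turns this into a parity test by adding up (length − 1) over the disjoint cycles. It is an illustration with invented function names, and it silently relies on the fact, still to be proven, that the parity does not depend on how the permutation is written.

```python
def parity(perm):
    """Return 'even' or 'odd' for a permutation given as a dict x -> image.

    An r-cycle is a product of r - 1 transpositions, so we add up
    (length - 1) over the disjoint cycles and look at the total mod 2.
    """
    seen, transpositions = set(), 0
    for start in perm:
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        if length > 0:
            transpositions += length - 1
    return "even" if transpositions % 2 == 0 else "odd"


def from_cycles(cycles, n):
    """Right-to-left product of the given cycles on {1..n}, as a dict."""
    perm = {x: x for x in range(1, n + 1)}
    for cycle in reversed(cycles):  # the rightmost cycle acts first
        step = {x: x for x in perm}
        for i, x in enumerate(cycle):
            step[x] = cycle[(i + 1) % len(cycle)]
        perm = {x: step[perm[x]] for x in perm}
    return perm


if __name__ == "__main__":
    print(parity(from_cycles([(1, 2, 3)], 3)))
    print(parity(from_cycles([(1, 2), (2, 5, 4, 3)], 5)))
    print(parity(from_cycles([(1, 2, 3, 4)], 4)))
```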
An “annoying” question should pop up in your mind: Isn’t it possible
to write the same permutation in two different ways, once as a product of
an even number of transpositions, and once as a product of an odd number
of transpositions? If yes, this would completely obviate the meaning of the
above definition! We urgently need to resolve this question.
Problem 13. Prove the following facts about even and odd permutations.
(a) The identity permutation is not odd.
(b) Every permutation in Sn is either even or odd but not both.
(c) An r-cycle is even if and only if r is odd.
Part (a) may initially strike you as strange – why care so much about
the specific case of the identity not being odd? If (b) is proven, wouldn’t it
subsume (a)? Still, stating (a) separately is no mistake.
PST 35. To prove a property for all permutations, first prove it for the
special case of the identity e and then reduce the general case to the case for e.
This was the easy part. The hard part will come up when you try to
show that e cannot be odd, as desired in (a).
Problem 15. Prove that the group of rotations of the cube is the entire S4 .
Veiled hint: Can you think of the 4 things in the cube C which are being
permuted by every rotation of the cube and whose group of permutations
“coincides” with the group G4 of rotations of the cube? ♦
Problem 16. (Advanced) What is the group of all symmetries of the cube?
Hints through answer: The full S(C) turns out to be twice as big as
S4 , but certainly not as big as S8 . Why? Which permutations of the cube’s
vertices cannot be obtained by symmetries of C? In fact, S(C) is the direct
product of S4 with the cyclic group C2 = {1, −1}, i.e., S(C) = S4 × C2 .
Now, the identity element 1 of C2 is easy to interpret geometrically
(how?), but what is the geometric meaning of the other element −1 of C2 ?
It cannot be a rotation of the cube (as all of those are already included in
S4 ); but it is definitely a symmetry of order 2 (why?), so what is it? ♦
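The sizes quoted in the hints can be double-checked by brute force. The Python sketch below is an illustration under assumptions that are not in the original text: the cube is placed with its vertices at the points (±1, ±1, ±1), every motion is recorded by how it permutes these 8 vertices, and two quarter-turns plus one mirror reflection are chosen as generators. The names in the code are hypothetical.

```python
from itertools import product

VERTICES = list(product((-1, 1), repeat=3))  # the 8 corners of the cube


def as_permutation(move):
    """Record a rigid motion by where it sends each of the 8 vertices."""
    return tuple(VERTICES.index(move(v)) for v in VERTICES)


def compose(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def generated_group(generators):
    """All products of the generators (closure under composition)."""
    identity = tuple(range(len(VERTICES)))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            new = compose(h, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group


# Assumed generators: quarter-turns about the z- and x-axes, and a mirror.
quarter_turn_z = as_permutation(lambda v: (-v[1], v[0], v[2]))
quarter_turn_x = as_permutation(lambda v: (v[0], -v[2], v[1]))
mirror_z = as_permutation(lambda v: (v[0], v[1], -v[2]))

rotations = generated_group([quarter_turn_z, quarter_turn_x])
all_symmetries = generated_group([quarter_turn_z, quarter_turn_x, mirror_z])

if __name__ == "__main__":
    print(len(rotations), len(all_symmetries))
```

Counting elements confirms only the orders of the two groups; identifying them with S4 and S4 × C2 still requires the geometric ideas hinted at above.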
7. THE 15-PUZZLE PUZZLED OUT 123
[Figure: diagram of S16 , S15 , and A16 with the permutations α1 , . . . , α8 ,
and the 15-puzzle boards for e, α1 , α2 , and α8 .]
Problem 17. Write down α8 . Is it odd or even? What if you choose another
15 path for 16: what will the parity of the final permutation be? Why?
chance of being reached! We will determine whether this is so in the next
subsection. How about the permutation in Figure 1d?
7.4. Playing the puzzle. Now it remains to show that any even permu-
tation can be obtained via the 15-puzzle. The best way to do this is . . . to
play the puzzle. But not randomly! Here is a vastly simplifying idea:
PST 41. Instead of starting with the identity e and finding your way to any
even arrangement α of the 15-puzzle, reverse the process – start with α and
try to reach e.16 You do not need 16 anymore; so use the empty cell instead.
Problem 18. Here is the beginning of one possible algorithm to convert any
even permutation α to e. Think about each step and how to perform it.
(1) Move 1 to the top left position. Without displacing 1, move 2 on its
right; now, without shifting 1 or 2, move 3 to 2’s right.
(2) Move 4 to 3’s right (this may temporarily displace 1, 2, and 3). By now
you have arranged the first row into 1, 2, 3, 4.
(3) Using the same algorithm (without touching the first row), you can
arrange second row into 5, 6, 7, 8.
(4) With some more care (without touching the first two rows), you can
rearrange the third row into 9, 10, 11, 12.
(5) Push the empty cell to the rightmost position on the fourth row.
Call the resulting permutation β, i.e., α → β. What can β be? So far, we
know that β has the numbers 1 through 12 in their correct positions. Since
we started with an even α, Theorem 1 ensures that β must be even too. But
there are only 3 ways to rearrange the remaining numbers {13, 14, 15} in β
and still be even: the 3-cycles β1 = (13,14,15) and β2 = (13,15,14), or the
identity e itself. If you manage to convert β1 → e, then applying the same
algorithm to β2 will convert it to β1 (why? $\beta_2^2 = \beta_1$), so you will again reach
e after another application of your algorithm: β2 → β1 → e. What is left
is probably the hardest conversion you can make in the 15-puzzle: it looks
simple, but it captures the true spirit of the puzzle. Prove that
Theorem 2. In the 15-puzzle it is possible to convert the 3-cycle
β1 = (13,14,15) to the identity arrangement e. Conclude that all even
permutations are reachable.

         β1                      e
     1  2  3  4             1  2  3  4
     5  6  7  8      ?      5  6  7  8
     9 10 11 12             9 10 11 12
    14 15 13               13 14 15
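Taken together with the parity argument of the previous subsections, Theorem 2 gives a practical test: an arrangement with the empty cell in the bottom-right corner can be reached exactly when it is an even permutation. The Python sketch below illustrates such a test. It is not the book's algorithm: it decides parity by counting inversions (pairs of tiles that appear in the wrong relative order), a standard equivalent of counting transpositions, and the function name is made up.

```python
def is_even(arrangement):
    """Parity test for a 15-puzzle arrangement with the empty cell in the
    bottom-right corner. `arrangement` lists the tiles 1..15 row by row."""
    inversions = sum(
        1
        for i in range(len(arrangement))
        for j in range(i + 1, len(arrangement))
        if arrangement[i] > arrangement[j]
    )
    return inversions % 2 == 0


if __name__ == "__main__":
    solved = list(range(1, 16))
    beta1_board = list(range(1, 13)) + [14, 15, 13]     # last row 14 15 13
    swapped_board = list(range(1, 14)) + [15, 14]       # last row 13 15 14
    for board in (solved, beta1_board, swapped_board):
        print(board[12:], "even" if is_even(board) else "odd")
```

The test only tells you whether a solution exists; finding the actual moves is the job of the algorithm in Problem 18.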
16
You can think of this as if you are going from your house to an unknown place and
then back. Which way will be easier to cover? Probably from the unknown place back
home, because you are likely to recognize more and more familiar scenes and road markers
as you approach your house. Traveling to a familiar place will give you an advantage to
take alternative routes or find out with greater ease where you are.
    ·  s  r  b  l          ·  n  c  i  t
    s  s  r  b  l          n  n  c  i  t
    r  r  b  l  s          c  c  n  t  i
    b  b  l  s  r          i  i  t  n  c
    l  l  s  r  b          t  t  i  c  n
Figure 6. Tables for the Turning Soldier and the One Sock groups
Figure 7. In S(Δ) = D3 : s1 r1 = s3 and s1 s2 = r2
a more general argument is usually much faster and more elegant. In our
situation with S(Δ): think about why the composition of two symmetries of
Δ is again a symmetry of Δ, and why a symmetry always has a counteraction,
i.e., a “reverse” symmetry that undoes it. For example, the counteraction of
r1 is r2 , and of s1 is s1 itself. ♦
    ·  i

    i  i

    ·  i  s

    i  i  s
    s  s  i

    ·   i   r1  r2  s1  s2  s3

    i   i   r1  r2  s1  s2  s3
    r1  r1  r2  i   s2  s3  s1
    r2  r2  i   r1  s3  s1  s2
    s1  s1  s3  s2  i   r2  r1
    s2  s2  s1  s3  r1  i   r2
    s3  s3  s2  s1  r2  r1  i
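For those who still want to see the brute-force route carried out, the Python sketch below transcribes the six-element table for S(Δ) = D3 and checks the group requirements directly: an identity, a counteraction for every element, and associativity. The dictionary layout and function names are choices made for this illustration; the entry in row x and column y is read as the product xy.

```python
ELEMENTS = ["i", "r1", "r2", "s1", "s2", "s3"]
ROWS = {
    "i":  ["i", "r1", "r2", "s1", "s2", "s3"],
    "r1": ["r1", "r2", "i", "s2", "s3", "s1"],
    "r2": ["r2", "i", "r1", "s3", "s1", "s2"],
    "s1": ["s1", "s3", "s2", "i", "r2", "r1"],
    "s2": ["s2", "s1", "s3", "r1", "i", "r2"],
    "s3": ["s3", "s2", "s1", "r2", "r1", "i"],
}


def mul(x, y):
    """Product xy, read from row x and column y of the table."""
    return ROWS[x][ELEMENTS.index(y)]


def is_group_table():
    """Check identity, counteractions, and associativity by brute force."""
    identity_ok = all(mul("i", x) == x == mul(x, "i") for x in ELEMENTS)
    inverses_ok = all(
        any(mul(x, y) == "i" == mul(y, x) for y in ELEMENTS) for x in ELEMENTS
    )
    associative = all(
        mul(mul(x, y), z) == mul(x, mul(y, z))
        for x in ELEMENTS for y in ELEMENTS for z in ELEMENTS
    )
    return identity_ok and inverses_ok and associative


if __name__ == "__main__":
    print(is_group_table())
    print(mul("s1", "r1"), mul("s1", "s2"))
```

The associativity check alone runs through 216 triples, which is a good reminder of why the general argument about composing symmetries is the faster road.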
If you are still unsure which “symmetries” of our figures we are allowed
to consider in this session, check out the footnote on page 107: the allowable
symmetries are called Euclidean motions. These are motions (bijections) of
the plane that preserve distances, also known as rigid motions or isometries:
imagine your figure made of cardboard and you want to transform the figure
onto itself without bending, twisting, pinching, or doing other horrible stuff
to the cardboard. Thus, a symmetry of a plane figure is not just any bijection
of the figure onto itself: it is a rigid motion. For example, switching the
vertices A and B of a square ABCD while leaving the other two vertices C
and D fixed is not part of a symmetry of the square (why?). Be aware that
in some sources “rigid” motions exclude orientation-changing motions like
reflections (a reflection changes a clockwise orientation ABCD of the square
to a counterclockwise orientation of the vertices, i.e., ADCB). However, we
will consider reflections as part of our symmetry groups in this session.
Finally, a reflection across a line combined with a translation along this
line is what is called a glide reflection. For any plane figure, its symmetry
group will be generated by and will consist of the four types of plane trans-
formations mentioned in the text: rotations, reflections, translations, and
glide reflections. This is a fact that needs a proof, and we leave it to the
more experienced reader to provide such a proof.
Exercise 4. For n ≥ 3, Dn has 2n elements: n rotations and n reflections.
The pattern breaks for n = 1 and n = 2. Of course, we may never think
Problem 2. In Figure 9b, two equilateral triangles share the same center
O and can be obtained from each other by a rotation and a rescaling; the
rotation is about O at some angle α ≠ kπ/3 (k ∈ Z), e.g., α = 45°, while
the rescaling has some ratio r ≠ 1, e.g., r = 1.5. It is easy to see that the
union of these two triangles has only 3 (rotational) symmetries, written as
|S(F )| = 3. Generalize this example to |S(F )| = n for any n ≥ 1. ♦
Problem 3. The hardest question to answer here is why G1 , G2 , and G3 are
actually groups. You can show this by brute force for each of the groups (e.g.,
compute their multiplication tables). The true explanation, however, is that
the composition of any two rotations in space is again a rotation in space, the
proof of which can be done with linear algebra methods (e.g., multiplying
the so-called orthogonal matrices) and is beyond the scope of this session.
Now, having accepted that we are indeed dealing with groups G1 , G2 , and
G3 , you can find plenty of reasons for these groups to be different.
The text suggests that G1 has a generating rotation; if rj is the rotation
about the vertical axis of the pyramid by (30j)° clockwise, then applying r1
repeatedly j times will yield rotation rj for all j; so r1 certainly generates
all of G1 . However, r5 , r7 , and r11 also generate G1 : either check this
by brute-force examination of all their repeated applications or, if you are
more advanced, use slick reasoning from number theory to conclude that the
generating rotations are precisely those rj for which j is relatively prime
to 12, i.e., j = 1, 5, 7, or 11. On the other hand, if a solid has more than 1
rotational axis, there is no hope for it to have a generating rotation: indeed,
every rotation can generate at most some other rotations about its own axis,
but certainly not about another rotational axis! Thus, G2 and G3 lack single
generators!
The text asks us also to pay attention to non-trivial rotations that
counteract themselves: such a rotation can only be by 180° about the
corresponding axis (why?). Every rotational axis of the pyramid and of the
prism carries exactly one such rotation, giving 1 and 7 of them, respectively.
For the tetrahedron, only the 3 axes through the midpoints of opposite edges
carry a 180° rotation (the rotations about the 4 axes through the vertices are
by 120° and 240°), so the number there is 3. ♦
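The counts in this solution can be tested on small computer models of the three groups. The Python sketch below rests on identifications that are assumptions of this illustration rather than statements from the text: G1 is modeled as the shifts x → x + j on Z12 (matching the rotations by (30j)°), G2 as the 12 maps x → ±x + b on the 6 vertices of a hexagonal base, and G3 as the even permutations of the tetrahedron's 4 vertices. All names are hypothetical.

```python
from itertools import permutations
from math import gcd


def compose(p, q):
    """Apply q first, then p (permutations stored as tuples)."""
    return tuple(p[q[i]] for i in range(len(q)))


def element_order(g):
    """Least k >= 1 such that g composed with itself k times is the identity."""
    identity = tuple(range(len(g)))
    k, current = 1, g
    while current != identity:
        current = compose(g, current)
        k += 1
    return k


def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return inversions % 2 == 0


# G1: the 12 rotations r_j of the pyramid, modeled as x -> x + j on Z_12.
G1 = [tuple((x + j) % 12 for x in range(12)) for j in range(12)]
# G2 (assumed model): the maps x -> ±x + b on the 6 vertices of the prism's base.
G2 = [tuple((a * x + b) % 6 for x in range(6)) for a in (1, -1) for b in range(6)]
# G3 (assumed model): the even permutations of the tetrahedron's 4 vertices.
G3 = [p for p in permutations(range(4)) if is_even(p)]


def generators(group):
    """Elements whose powers exhaust the whole group."""
    return [g for g in group if element_order(g) == len(group)]


def self_counteracting(group):
    """Non-identity elements that undo themselves."""
    return [g for g in group if element_order(g) == 2]


if __name__ == "__main__":
    print([G1.index(g) for g in generators(G1)])
    print([j for j in range(1, 12) if gcd(j, 12) == 1])
    print([len(generators(G)) for G in (G1, G2, G3)])
    print([len(self_counteracting(G)) for G in (G1, G2, G3)])
```

In the tetrahedron model, the self-counteracting elements are exactly the three products of two disjoint transpositions, which correspond to the half-turns about the axes through midpoints of opposite edges.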
8. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 129
can manipulate the above equalities to also obtain that r2 sk = sk r2 = sj and
sj sk = sk sj = r2 . In other words, {r0 , r2 , sj , sk } already forms a subgroup
of D4 of order 4. There are two such subgroups of D4 ; the pairs {sj , sk }
corresponding to these subgroups are the two reflections {s1 , s2 } across the
midsegments of the square, or the two reflections {s3 , s4 } across the diagonals
of the square. Any other pair of reflections in your subgroup will multiply
to the rotations r1 or r3 (why?), resulting in the whole group D4 (why?).
This exhausts all possibilities for subgroups of D4 . In Group Theory II
you will learn of more powerful techniques for tracking and classifying sub-
groups K of a given group G. In particular, |K| divides |G|, which explains
why the group D4 of 8 elements ended up having subgroups only of orders
1, 2, 4, and 8, all of which are divisors of 8. ♦
Exercise 5. As was shown in the text, R4 and T are isomorphic; but T and S
are not isomorphic (we came up with a different number of self-counteracting
actions in each of them). It follows that R4 and S cannot be isomorphic either
(why?).
Exercise 6. Every symmetry of the rectangle is its own counteraction
(cf. Fig. 9a). Thus, S(rectangle) cannot be isomorphic to T . However, it is
isomorphic to S via any relabeling of the rectangle’s four symmetries to the
four sock actions that sends the identity symmetry r0 to the “do-nothing”
action n. Explain why any such relabeling will work, and count how many
relabelings, called isomorphisms, there are between S(rectangle) and S. ♦
Problem 5. It is slightly “illegal” to ask this question yet, as we haven’t
defined groups in general! This does not prevent the reader from glancing
ahead at Definition 2 and drawing some conclusions. For example, it is
true (and not too hard to verify) that any group G = {e, a, b, c} of order 4 is
isomorphic to T or S. Indeed, check that the product of any two non-identity
elements of G must equal the identity element e or the third non-identity
element, e.g., ac = e or ac = b. (Why ac ≠ a and ac ≠ c?)
(a) If ac = e, then a and c are inverses to each other, leaving b to be its own
inverse: $b^2 = e$ (why?). Then the row of a prohibits ab from being equal
to a or e (cf. Fig. 11a), and it can’t be b anyways (why not?), so that
the only choice left is ab = c. This in turn leaves only one possibility
for $a^2$ in the row of a: $a^2 = b$. Using the fact that any row and any
column of G’s table contains all elements of G (without repetitions or
omissions, why?), you can easily fill in the rest of the table and establish
that it is identical to that of T .
    (a)                       (b)
    ·  e  a  b  c             ·  e  a  b  c
    e  e  a  b  c             e  e  a  b  c
    a  a        e             a  a     c  b
    b  b     e                b  b  c     a
    c  c  e                   c  c  b  a
Exercise 10. Verifying that (R, +) and (Z, +) are groups should be no
problem. To make sure everyone is on the same page, note that the number
0 is the identity element of both groups (why?), and inverses are obtained
by the usual negation of a number: $a^{-1} = -a$ (why?) for any a ∈ R or Z.
The case of the group (Zn , +) requires some facts from Number Theory I.
For instance, to establish that the operation + is well-defined in Zn (what
is this and why are we concerned about it here?), we need the fundamental
lemma that adding congruences modulo n is a valid operation in Zn : if
a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n). Again, 0 will
serve as the identity element and −a = n − a will be the inverse of a (mod n).
Regarding the group (R∗ , ·) it is important to understand that removing
the number 0 from R is necessary, as 0 has no multiplicative inverse (i.e., no
reciprocal ). The identity element in (R∗ , ·) is the number 1 this time! ♦
and vice versa: if $(a^{-1})^n = e$ then $a^n = ((a^{-1})^n)^{-1} = e^{-1} = e$. Thus, the
same powers of a and $a^{-1}$ equal e, i.e., the orders of these elements are the
same (why?). In particular, in the present case, $o(a^{-1}) = k$.
For $o(a^m)$ we need the following key lemma:
Lemma 1. If $a^n = e$ for some n ≠ 0, then a has finite order k that divides n.
Therefore, $o(a^m) = \dfrac{k}{\gcd(k, m)}$.
Exercise 18. (a) If G is cyclic, then all its elements are powers of the same
a ∈ G. Hence, the product of any two elements b, c ∈ G can be calculated
by these powers: $bc = a^i a^j = a^{i+j} = a^{j+i} = a^j a^i = cb$. Thus, G is abelian
because the addition of the integers i and j is commutative.
In (b), let G = ⟨a⟩. If o(a) = k, by Problem 8(b), $H = \{e, a, a^2, \ldots, a^{k-1}\}$
is already a (cyclic) subgroup of G, with k elements. But any power $a^j$ of
a equals some element of H! Indeed, if we divide j by k, i.e., j = kq + r
with quotient q and remainder r (0 ≤ r < k), then
$a^j = a^{kq+r} = (a^k)^q a^r = e^q a^r = a^r \in H$.
Thus, the whole group G equals H, and their orders must
be the same: n = |G| = |H| = k, i.e., o(a) = n = |G|.
(c) In our previous notation for Dn , let $r_k$ be the rotation of a regular
n-gon $A_0 A_1 A_2 \ldots A_{n-1}$ which takes vertex $A_0$ to $A_k$, for k = 0, 1, 2, . . . , n − 1.
Then $(r_1)^k = r_k$ for all k, i.e., the rotation $r_1$ generates the rotational
subgroup $R_n = \{r_0, r_1, \ldots, r_{n-1}\}$ of Dn . In particular, $R_n = \langle r_1 \rangle$ is a cyclic
subgroup of order n. Now, if $s_1$ is the reflection of the n-gon across the
perpendicular bisector of $A_0 A_1$, then check that $r_1 s_1 \ne s_1 r_1$. Indeed, while $r_1$ and
$s_1$ both send vertex $A_0$ to $A_1$, the two compositions $r_1 s_1$ and $s_1 r_1$ act overall
differently on $A_0$: $r_1 s_1(A_0) = r_1(A_1) = A_2 \ne A_0 = s_1(A_1) = s_1 r_1(A_0)$.
Therefore, Dn is non-abelian. By part (a), Dn can’t be cyclic either.
Problem 9. Following the hint, let’s examine the examples in Exercise 10.
The elements of (R, +) all have infinite orders (why?), except for the identity
element 0, whose order is 1. The subgroup (Z, +) follows suit (why?) and
doesn’t produce anything interesting in terms of orders of elements. The
group (Zn , +) is finite, hence it can’t have elements of orders larger than n
(why?); in fact, every j = 1, 2, . . . , n ∈ Zn has order k = n/ gcd(j, n) ≤ n.
(This needs a proof!) The only elements x of (R∗ , ·) with finite orders are
those for which $x^n = 1$ for some positive n; but the only real numbers
satisfying such equations have absolute value 1 (why?), i.e., x = 1 or x = −1,
with orders 1 and 2, respectively. No luck here either!
Moving to $D_\infty$ and using Exercise 11 check that, as long as there is a
translation in the product $t^m s^k$ (i.e., $m \ne 0$), the element will have infinite
order. Thus, the only two elements in $D_\infty$ of finite orders are e and s. $D_n$
is a finite group of order 2n; so it won’t have elements of order greater than
2n; in fact, check that the largest order of an element in $D_n$ is n, attained
by the rotations $r_j$ with $\gcd(j, n) = 1$. The symmetries of the real line $\mathbb{R}$
behave in much the same way as $D_\infty$ (cf. solution to Problem 7); so again
no luck here.
Finally, let’s examine the symmetries S(C) of the unit circle C. Any
rotation $r_{2\pi/n}$ has order n (why?), and hence S(C) is an infinite group containing
elements of any finite order n, for $n \in \mathbb{N}$. Note that S(C) also has elements of
infinite order; for example, any rotation $r_{a\pi}$ where a is an irrational number
will never compose several times with itself to give the identity rotation
(why?). For instance, $r_{\sqrt{2}\pi}$ and $r_{\pi^2}$ fall into this category.
The final example is the unit circle C itself, or the larger group in which
it is contained: the non-zero complex numbers under multiplication, $(\mathbb{C}^*, \cdot)$.
Since any cyclic group $C_n = \langle \zeta_n \rangle$ is contained in C, and since $o(\zeta_n) = n$, our
infinite groups C and $\mathbb{C}^*$ have elements of any order $n \in \mathbb{N}$.
Problem 10. Let G be a finite subgroup in $(\mathbb{C}^*, \cdot)$. Then the same discussion we had in the hints about U ++ in Exercise 8 applies to G too! Indeed,
any $a \in G$ has infinitely many powers $\{a^j\}$, all of which are inside the finite
group G. By PHP, it follows that two of those powers must coincide, i.e.,
$a^n = a^m$ for some n > m, from which $a^{n-m} = 1$ for n − m > 0, and hence a
has some finite order k (why?). In other words, $a^k = 1$ in $\mathbb{C}$, which means:
(a) a has a finite order in G, and
(b) the modulus of a is 1 and hence a lies on the unit circle C.
Therefore, G ⊂ C. Starting from 1 ∈ G ∩ C, let’s walk along C counterclock-
wise. Since G is finite, after 1 ∈ G, there will be a first element g of G which
we will hit along our walk. Let the angle of g with the real axis be α, i.e.,
g = cos α + i sin α. We claim that g generates all of G.
Indeed, let $h \in G$, $h \ne 1$, and $h \ne g$. Then the angle of h is larger than
α, i.e., $h = \cos\beta + i\sin\beta$ with $0 < \alpha < \beta < 2\pi$. Keep subtracting α from
β until you hit a negative angle for the first time: say, $\beta - (l+1)\alpha < 0$ but
$\gamma = \beta - l\alpha \ge 0$, so that $\gamma - \alpha < 0$. Let $b = \cos\gamma + i\sin\gamma$; in
group terminology, $b = h \cdot g^{-l} \in G$. Thus, $b \in G$ has angle γ such that
$0 \le \gamma < \alpha$. By the minimality of α this is impossible, unless γ = 0, i.e.,
$\beta = l\alpha$, b = 1, and hence $h = g^l$.
So all elements of G are in the cyclic subgroup generated by g. This
certainly means that G equals its own subgroup, i.e., $G = \langle g \rangle$. As we showed
in (a) above with g in place of a, g must have some finite order q. We conclude
that $G = \langle g \rangle$ is the cyclic group $C_q = \langle \zeta_q \rangle$, which we encountered earlier.
Thus, all finite subgroups of $(\mathbb{C}^*, \cdot)$ are precisely the cyclic subgroups $C_n$ for
any $n \in \mathbb{N}$.
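Exercise 22. (a) $\pi\rho = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}$; $\rho\pi = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}$;
(b) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 7 & 6 & 4 & 1 & 3 & 5 & 2 \end{pmatrix}$; $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 5 & 1 & 2 & 3 & 6 & 4 \end{pmatrix}$; $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 7 & 8 & 2 & 4 & 6 & 5 & 3 \end{pmatrix}$;
(c) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 4 & 2 & 12 & 11 & 1 & 5 & 9 & 7 & 6 & 8 & 3 & 10 \end{pmatrix}$.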
Problem 13. (a) The arising 4 cases depend on whether, how much, and
how exactly $t_k = (1a)$ overlaps with the previous transposition $t_{k-1}$:
• Complete overlap: $t_{k-1} = t_k = (1a)$ so that $t_{k-1} t_k = (1a)(1a) = e$ and
we can erase $t_{k-1} t_k$ from the product, thereby reducing the number of
transpositions by 2 and contradicting the minimality of this odd-length
representation of e.
• No overlap: $t_{k-1} = (bc)$ for some b and c different from 1 and a. Then
$t_k$ and $t_{k-1}$ commute: $t_{k-1} t_k = (bc)(1a) = (1a)(bc)$.
• Partial overlap 1: $t_{k-1} = (1b)$ for some b different from a and 1. Then
$t_{k-1} t_k = (1b)(1a) = (1ab) = (1a)(ab)$.
• Partial overlap 2: $t_{k-1} = (ba)$ for some b different from a and 1. Then
$t_{k-1} t_k = (ba)(1a) = (1ba) = (1b)(ba)$.
While the first case is impossible, in the last three cases we managed to move
1 to the left in the total product. ♦
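The four identities above are easy to confirm by machine for sample values of a, b, c. In this sketch the helpers `transposition` and `compose` and the sample labels are choices made for this illustration only; products are read right to left (the rightmost permutation acts first), which is an assumption about the convention in use.

```python
def transposition(x, y, n=5):
    """The transposition (x y) on {1, ..., n}, stored as a dict."""
    p = {i: i for i in range(1, n + 1)}
    p[x], p[y] = y, x
    return p

def compose(p, q):
    """Product pq, read right to left: apply q first, then p."""
    return {i: p[q[i]] for i in q}

a, b, c = 2, 3, 4                      # sample labels; 1, a, b, c are distinct
t = transposition
identity = {i: i for i in range(1, 6)}

complete = compose(t(1, a), t(1, a)) == identity
no_overlap = compose(t(b, c), t(1, a)) == compose(t(1, a), t(b, c))
partial_1 = compose(t(1, b), t(1, a)) == compose(t(1, a), t(a, b))
partial_2 = compose(t(b, a), t(1, a)) == compose(t(1, b), t(b, a))
print(complete, no_overlap, partial_1, partial_2)
```

In each of the last three rewritten products the symbol 1 occurs only in the left factor, which is exactly the "moving 1 to the left" used in the argument.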
Problem 15. The cube has 6 pairs of opposite edges, through the midpoints
of which passes a rotational axis with only 1 non-trivial rotation by 180°.
The cube has 3 pairs of opposite faces, through the centers of which passes a
rotational axis with 3 non-trivial rotations (by 90°, 180° and 270°). Finally,
there are 4 pairs of opposite vertices (forming the 4 diagonals), through which
passes a rotational axis with 2 non-trivial rotations (by 120° and 240°). With
the identity this makes a total of $1 + 6 \cdot 1 + 3 \cdot 3 + 4 \cdot 2 = 24$ rotational symmetries
of the cube, i.e., $|G_4| = 24$. Can we identify $G_4$ with a well-known group?
The cube has 8 vertices. As the hint in the text suggests, we are looking
for 4 objects that are permuted; such are the 4 pairs of diagonally opposite vertices. Indeed, the 4 diagonals of the cube are permuted amongst
themselves by any symmetry of the cube (why?). If l is the line through
the midpoints of any two diagonals $d_i$ and $d_j$, then l is perpendicular to
the other two diagonals $d_k$ and $d_m$ (can you see it?). Hence, the rotation
about l by 180° will switch $d_i$ and $d_j$ but fix $d_k$ and $d_m$, thereby inducing
the transposition $(d_i\, d_j) \in S_4$.
As any permutation is a product of transpositions, these $(d_i\, d_j)$s generate
the whole $S_4$ of permutations of $\{d_1, d_2, d_3, d_4\}$. Thus, $S_4 \subseteq G_4$. But $|S_4| = 4! = 24 = |G_4|$, so that $G_4 = S_4$.
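The count of 24 rotations and their action on the 4 diagonals can also be verified by brute force. The sketch below assumes a concrete model that is not in the text: the cube has vertices (±1, ±1, ±1), and each rotation is a signed permutation of the coordinates with determinant +1. All names are chosen for this illustration.

```python
from itertools import permutations, product

def parity(perm):
    """Sign (+1 or -1) of a permutation of (0, 1, 2), via its inversion count."""
    inversions = sum(perm[i] > perm[j] for i in range(3) for j in range(i + 1, 3))
    return -1 if inversions % 2 else 1

# Signed coordinate permutations with determinant +1 are the rotations.
rotations = [
    (perm, signs)
    for perm in permutations(range(3))
    for signs in product((1, -1), repeat=3)
    if parity(perm) * signs[0] * signs[1] * signs[2] == 1
]

# Each diagonal is represented by its endpoint whose first coordinate is +1.
diagonals = [(1, 1, 1), (1, 1, -1), (1, -1, 1), (1, -1, -1)]

def act(rotation, v):
    """Image of the diagonal through v, again given by its '+1 first' endpoint."""
    perm, signs = rotation
    w = tuple(signs[i] * v[perm[i]] for i in range(3))
    return w if w[0] == 1 else tuple(-x for x in w)

induced = {tuple(diagonals.index(act(r, d)) for d in diagonals) for r in rotations}
print(len(rotations), len(induced))
```

Seeing 24 rotations induce 24 distinct permutations of the diagonals is consistent with $G_4 = S_4$.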
Exercise 27. (d) (1, 8, 9, 6, 2, 14, 10, 4, 3, 11, 13, 7, 15, 5, 12).
Exercise 28. If $\alpha, \beta \in S_{15}$, then they both fix the number 16, so that their
composition αβ, as well as their inverses $\alpha^{-1}$ and $\beta^{-1}$, will also fix 16. As e
fixes 16, $S_{15}$ can, indeed, be thought of as the subgroup of $S_{16}$ consisting of
all permutations that fix 16.
Theorem 2. The 3-cycle (13, 14, 15) in the 15-puzzle can be converted into
e in many ways by permuting only the bottom two rows. Here is one way
(cf. Fig. 12), proposed by Alison Mirin, an alumna of Mills College, who
took the first course in Problem Solving in Mathematics that was based on
volume I of the present book.
Figure 12. Permuting the 3-cycle (13, 14, 15) to the identity e
Session 6
Monovariants. Part II
Jumping Fleas and Conway’s Checkers
1. Numerical Monovariants
given. Whenever the sum of the numbers in any row or column is negative,
we may switch the signs of all the numbers in that row or column, from
negative to positive or vice versa. Prove that if we repeat this operation
enough times, eventually all the row and column sums will be non-negative.
Before looking at the official solution below, try a simple example with a
2 × 2 array, as in Figure 1. Think about how many flips of the row or column
signs you need to perform before being unable to continue. What happens if
you change the numbers in your table? Do you always get 4 positive numbers
in the end, or could some be negative?
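$\begin{pmatrix} -2 & -3 \\ -1 & -4 \end{pmatrix} \to \begin{pmatrix} +2 & -3 \\ +1 & -4 \end{pmatrix} \to \begin{pmatrix} -2 & +3 \\ +1 & -4 \end{pmatrix} \to \begin{pmatrix} +2 & +3 \\ -1 & -4 \end{pmatrix} \to \begin{pmatrix} +2 & -3 \\ -1 & +4 \end{pmatrix} \to \begin{pmatrix} -2 & +3 \\ -1 & +4 \end{pmatrix} \to \begin{pmatrix} +2 & +3 \\ +1 & +4 \end{pmatrix}$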
6 (not 15) steps to stop: Can we devise a 2 × 2 example that will take longer
to terminate? This discussion should remind the reader of the Appendix
to the Monovariants I session, where an analogous question was answered
for the mansion problem. To make it even more challenging, the advanced
reader can ask and attempt to answer the same questions about maximizing
the length of the process whenever appropriate in the forthcoming problems.
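To experiment with other tables, the flipping process can be automated. The sketch below uses one particular, arbitrary policy (flip the first row with negative sum, otherwise the first such column); the function name `flip_until_stable` and the policy are assumptions of this illustration, and it is restricted to integer entries. It also records the total of all entries after each flip; since flipping a line with negative sum s changes the total by −2s > 0, these totals must strictly increase.

```python
def flip_until_stable(grid):
    """Flip negative-sum rows/columns until none remain.

    Returns the final grid and the list of totals seen along the way."""
    grid = [row[:] for row in grid]
    rows, cols = len(grid), len(grid[0])
    totals = [sum(map(sum, grid))]
    while True:
        bad_row = next((i for i in range(rows) if sum(grid[i]) < 0), None)
        bad_col = next((j for j in range(cols)
                        if sum(grid[i][j] for i in range(rows)) < 0), None)
        if bad_row is not None:
            grid[bad_row] = [-x for x in grid[bad_row]]
        elif bad_col is not None:
            for i in range(rows):
                grid[i][bad_col] = -grid[i][bad_col]
        else:
            return grid, totals
        totals.append(sum(map(sum, grid)))

final, totals = flip_until_stable([[-2, -3], [-1, -4]])
print(final, totals)
```

With this policy the same starting table stops after only two flips, which shows that the length of the process really does depend on the choices made.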
Only hints to the next few problems will be offered here, but you can
check out the solutions at the end if you get stuck. Recall that the greatest
common divisor of two integers, gcd(a, b), is the largest integer that divides
both a and b, while the least common multiple, lcm(a, b), is the smallest
(positive) integer that is divisible by both a and b.
Problem 2. (St. Petersburg ’96) Several positive integers are written on
a blackboard. One can erase any two distinct numbers and write their gcd
and lcm instead. Prove that eventually the numbers will stop changing.
(b) You will inevitably see a pattern among the numbers when the process
terminates. What is this pattern? Does it always persist?
(c) We choose the order of pairs on which to perform the operation. Pre-
sumably, the length of the process and the final resulting set of numbers
depend on our choice, or do they?
Hints: When does the process terminate? What pairs of numbers are kept
the same under the operation? If the largest (or smallest) possible number
is written on the board, will that number change afterwards? Can you put
it aside and apply the problem to the remaining numbers? ♦
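A quick way to explore these questions is to run the process on a few starting lists. The sketch below is one possible experiment, not the intended solution: the name `run_gcd_lcm`, the starting list, and the order in which pairs are processed are all arbitrary choices. A pair is skipped when one number divides the other, since then its gcd and lcm are the two numbers themselves.

```python
from math import gcd, prod

def run_gcd_lcm(numbers):
    """Replace pairs by (gcd, lcm) until no pair changes; return the sorted result."""
    nums = list(numbers)
    changed = True
    while changed:
        changed = False
        for i in range(len(nums)):
            for j in range(i + 1, len(nums)):
                a, b = nums[i], nums[j]
                g = gcd(a, b)
                if g not in (a, b):            # neither number divides the other
                    nums[i], nums[j] = g, a * b // g
                    changed = True
    return sorted(nums)

start = [4, 6, 9, 10]
final = run_gcd_lcm(start)
print(final, prod(start), prod(final))
```

Try changing the order of the loops or the starting list and compare the outcomes; the printed products are a reminder that gcd(a, b) · lcm(a, b) = ab.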
Solution: The only way to get the value a from now on is, at each step, to
have the consecutive pair {a, 0} or {0, a} appearing somewhere on the circle.
But the only ways to produce {a, 0} are to have consecutively {a, 0, 0} or
{0, a, a}, and to produce {0, a} – to have {0, 0, a} or {a, a, 0}. We see that
from now on, at each step, we must have at least one of the sequences
{a, 0, 0}, {0, a, a}, {0, 0, a}, or {a, a, 0}. This begs for an inductive argument.
Suppose we have shown that for some n ≥ 2 it is always necessary (from
now on at each step) to have some sequence $A_n = \{a_1, a_2, \ldots, a_n\}$ where all
$a_i$’s are 0 or a, and at least one of them is a. (The sequence itself is not
fixed, so it can be one of $2^n - 1$ types – the exact number of sequences is
irrelevant.) But to produce sequence $A_n$, we must have from now on, at each
step, a sequence $A_{n+1}$ of (n + 1) 0s and a’s, with at least one a. Indeed,
start from some $a_i = a$ in $A_n$. To have such an a, as we saw above, we need
to have either {a, 0} or {0, a} in the corresponding place in $A_{n+1}$. One can
easily see that each of these cases uniquely determines the rest of sequence
$A_{n+1}$, populating it only with 0s and a’s. This completes the induction step.
But what happens when the length n of the required sequence $A_n$ exceeds
M? It simply means that we have wrapped the sequence around the whole
circle, and from now on the only values on the circle will be a’s and 0s.
Figure 2. Cycles for triangles and hexagons
The remainder of the problem does not work in general. For instance,
take the 3 numbers {0, a, a} on the gray triangle in Figure 2a; the next
step will be the white triangle with exactly the same label {0, a, a}! Nei-
ther will we get to the zero-configuration from the hexagon {0, a, a, 0, a, a}
(cf. Fig. 2b), whose label also goes to a rotated version of itself under the
operation. The reader may want to think of other counterexamples.
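The two counterexamples can be replayed in a few lines. This sketch assumes that the operation replaces every number on the circle, simultaneously, by the absolute difference between it and its next neighbor; the helper names and the sample value a = 5 are choices made for this illustration.

```python
def step(circle):
    """Replace each entry by |entry - next entry|, going once around the circle."""
    n = len(circle)
    return [abs(circle[i] - circle[(i + 1) % n]) for i in range(n)]

def is_rotation(xs, ys):
    """True if ys is a cyclic rotation of xs."""
    return len(xs) == len(ys) and any(ys == xs[k:] + xs[:k] for k in range(len(xs)))

a = 5                                   # any positive sample value
triangle = [0, a, a]
hexagon = [0, a, a, 0, a, a]
print(step(triangle), step(hexagon))
```

Both labels come back as rotated copies of themselves, so neither can ever reach the all-zero configuration.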
Below we move to the $2^m$ case, which always works.
Lemma 2. If you start with $M = 2^m$ numbers, all 0s and a’s, then after
$2^m$ iterations, you will be left with only 0s.
Partial proof: As an illustration, take the square with labels {0, a, a, a}
in Figure 3. After 4 iterations (follow the numbers outside the squares), this
turns into {0, 0, 0, 0}. To see why, forget about the operation |b − c|: it is
too hard to track what happens under it. As everything is divisible by a,
we can factor out a and, de facto, assume that a = 1. Now, let’s add up
any two adjacent numbers (written inside the squares) and work modulo 2,
i.e., think of 0 for even and 1 for odd. The results will be really the same
as before: at each step, $1 + 1 = 2 \equiv 0 = |1 - 1|$, $1 + 0 = 1 = |1 - 0|$, and
$0 + 0 = 0 = |0 - 0|$. The net effect of these additions is shown in Figure 3,
where the final label is $\{10, 12, 14, 12\} \equiv \{0, 0, 0, 0\} \pmod 2$.
Figure 3. Zeroing-in on the circle for $2^2$ numbers
In order to show that this works for any initial labels $\{a_1, a_2, a_3, a_4\}$,
track the additions for the next 4 iterations. Thus, for instance, in place
of $a_1$ in the last square we will have $6a_1 + 4a_2 + 4a_4 + 2a_3 \equiv 0 \pmod 2$.
Analogous formulas will imply that all numbers around the last square are 0.
For instance, $a_1 = 0$, $a_2 = 1$, $a_3 = 1$, $a_4 = 1$ give $6 \cdot 0 + 4 \cdot 1 + 4 \cdot 1 + 2 \cdot 1 = 10$.
To extend this argument to any $2^m$ numbers, it is necessary to recognize the coefficients 6, 4, 4, and 2 in the final formula as certain binomial
coefficients, generalize them, and show that they are all even. ♦
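Lemma 2 can be tested exhaustively for small powers of 2 before proving it. The sketch below again assumes the operation "replace each number by the absolute difference with its next neighbor"; the names and the sample value a = 3 are made up for this illustration. The last line shows where 6, 4, 4, 2 come from: they are the binomial coefficients $\binom{4}{k}$, with $\binom{4}{0}$ and $\binom{4}{4}$ landing on the same vertex after wrapping around a circle of 4.

```python
from itertools import product
from math import comb

def step(circle):
    """Replace each entry by |entry - next entry|, going once around the circle."""
    n = len(circle)
    return [abs(circle[i] - circle[(i + 1) % n]) for i in range(n)]

def zero_after(circle, iterations):
    """True if the circle consists only of 0s after the given number of steps."""
    for _ in range(iterations):
        circle = step(circle)
    return all(x == 0 for x in circle)

a = 3
results = {
    M: all(zero_after([a * bit for bit in bits], M)
           for bits in product((0, 1), repeat=M))
    for M in (2, 4, 8)
}

# Coefficients after 4 additive steps on a circle of 4, grouped by vertex.
wrapped = [sum(comb(4, k) for k in range(5) if k % 4 == r) for r in range(4)]
print(results, wrapped)
```

Every 0/a labelling of 2, 4, or 8 vertices dies out within M steps in this check, and all four wrapped coefficients are even, as the partial proof requires.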
1.3. Looking for more than just a monovariant. Here’s a clever but
really difficult problem. It will require using an invariant, a feature that does
not change (which shouldn’t be hard to find), together with a monovariant.
Problem 5. (IMO ’86, [21]) An integer is written at each vertex of a reg-
ular pentagon so that the sum of all five numbers is positive. If three consec-
utive vertices are assigned the numbers x, y, z with y < 0, then the following
operation is allowed: the numbers x, y, z are replaced by x + y, −y, z + y,
respectively. Such an operation is repeated as long as at least one of the five
numbers is negative. Determine whether the procedure necessarily comes to
an end in a finite number of steps.
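[Figure 4. (a) A sample run, with labels read clockwise from the top vertex: (5, −2, 3, 0, −1) → (3, 2, 1, 0, −1) → (2, 2, 1, −1, 1) → (2, 2, 0, 1, 0); the moves are made at −2 < 0, −1 < 0, and −1 < 0 in turn. (b) The operation: labels x, y, z, q, t become x + y, −y, z + y, q, t.]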
Hint: Try the following numerical experiment. Put some numbers at the
vertices, e.g., as in Figure 4a. Then write down all of the possible sums
of 1 or more consecutive numbers around the pentagon. (There are 21 such
sums. Why?) Perform the operation in the problem several times, and at
each step, again write down all 21 sums. How do the sums change when the
operation is performed? In particular, what about their absolute values? ♦
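The experiment suggested in the hint is tedious by hand but short in code. In the sketch below the function names, the sample labels, and the rule "always operate on the smallest entry" are assumptions of this illustration; the problem allows any negative entry to be chosen.

```python
def operate(p, i):
    """Apply the move at vertex i, which must hold a negative number."""
    y = p[i]
    assert y < 0
    q = p[:]
    q[i - 1] += y               # index -1 wraps around to the last vertex
    q[(i + 1) % 5] += y
    q[i] = -y
    return q

def arc_sums(p):
    """Sums of 1 to 4 consecutive entries around the pentagon, plus the full sum."""
    sums = [sum(p[(s + k) % 5] for k in range(length))
            for s in range(5) for length in range(1, 5)]
    return sums + [sum(p)]

p = [5, -2, 3, 0, -1]           # sample labels, read around the pentagon
history = [p]
while min(p) < 0:
    p = operate(p, p.index(min(p)))
    history.append(p)

weights = [sum(abs(s) for s in arc_sums(q)) for q in history]
print(history, weights)
```

On this run the sum of the five labels never changes, while the total of the absolute values of the 21 sums drops at every move; whether that always happens is the heart of the problem.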
If you attempt to solve this last problem, you will realize that the ap-
propriate monovariant can sometimes be tricky to construct. In practice,
though, when a problem requires a monovariant, it’s usually not too hard
to come up with it. The simple formulas mentioned at the beginning of
this section are fairly general-purpose. The further problems in this session
should help provide you with inspiration for the rough times when the usual
recipes come up short.
For example, the monovariant might be some function of the nth term
of the sequence, which would change in some predictable way as n increases.
Or perhaps we need to look not just at one term at a time, but at two or
more successive terms. This is confusing to explain without an example, so
here’s an example to make the discussion concrete.
Problem 6. (USAMO ’93, [41]) Let a and b be two odd positive integers.
Define a sequence by putting $f_1 = a$, $f_2 = b$, and letting $f_n$ for n ≥ 3 be the
greatest odd divisor of $f_{n-1} + f_{n-2}$. Prove that $f_n$ becomes constant for n
sufficiently large, and determine the eventual value as a function of a and b.
Exercise 1. In the setup of Problem 6 prove that $\gcd(f_{n-1}, f_n)$ is an invariant. Conclude that the constant value at which $f_n$ stabilizes is gcd(a, b).
Hint: Start by proving that $f_{n+1}$ is divisible by $\gcd(f_{n-1}, f_n)$, and that
$f_{n-1}$ is divisible by $\gcd(f_n, f_{n+1})$. ♦
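The claim in Exercise 1 can be sanity-checked numerically. This is a sketch with made-up function names; it stops as soon as two consecutive terms agree, because from then on every term equals that common odd value.

```python
from math import gcd

def greatest_odd_divisor(m: int) -> int:
    """Divide out all factors of 2 from the positive integer m."""
    while m % 2 == 0:
        m //= 2
    return m

def eventual_value(a: int, b: int) -> int:
    """Iterate f_n = greatest odd divisor of f_{n-1} + f_{n-2} until it stabilizes."""
    prev, cur = a, b
    while prev != cur:
        prev, cur = cur, greatest_odd_divisor(prev + cur)
    return cur

print(eventual_value(9, 15), gcd(9, 15))
```

For a = 9 and b = 15 the sequence runs 9, 15, 3, 9, 3, 3, ... and settles at 3 = gcd(9, 15).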
Problem 7. (USAMO ’97, [41]) Let $p_1, p_2, p_3, \ldots$ be the prime numbers
listed in increasing order, and let $x_0$ be a real number between 0 and 1. For
each positive integer k, define $x_k = 0$ if $x_{k-1} = 0$, and $x_k = \{p_k / x_{k-1}\}$ otherwise, where $\{x\} = x - \lfloor x \rfloor$ denotes¹ the fractional part of x. Find all $x_0$ satisfying $0 < x_0 < 1$ for which the sequence $x_0, x_1, x_2, \ldots$ eventually becomes 0.
PST 47. For a sequence of rational numbers, investigate the two sequences
that it naturally generates: its numerators and its denominators. Depending
on the problem, it may or may not be advantageous to reduce the fractions
so as to redefine the two sequences.
Problem 8. (USAMO ’07, [25]) Let n be a positive integer. Define a
sequence by setting $a_1 = n$ and, for each k > 1, letting $a_k$ be the unique
integer in the range $0 \le a_k \le k - 1$ for which $a_1 + a_2 + \cdots + a_k$ is divisible
by k. For instance, when n = 9 the sequence obtained is 9, 1, 2, 0, 3, 3, 3, . . . .
Prove that for any n the sequence $a_1, a_2, a_3, \ldots$ eventually becomes constant.
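To get a feel for Problem 8, it helps to generate the sequence for several values of n. The function name `sequence` below is chosen for this sketch; it reproduces the example with n = 9 from the problem statement.

```python
def sequence(n: int, length: int) -> list[int]:
    """First `length` terms of the sequence from Problem 8 that starts with n."""
    terms, total = [n], n
    for k in range(2, length + 1):
        a_k = -total % k        # the unique value in 0..k-1 making the sum divisible by k
        terms.append(a_k)
        total += a_k
    return terms

print(sequence(9, 7))
print(sequence(100, 40))
```

Printing longer runs for larger n makes the eventual constant visible, and suggests looking at the quotients (a_1 + ... + a_k)/k as well.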
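1
$\lfloor x \rfloor$ is the largest integer ≤ x, e.g., $\lfloor 10/3 \rfloor = \lfloor 3.\bar{3} \rfloor = 3$ and $\lfloor \sqrt{2} \rfloor = \lfloor 1.4\ldots \rfloor = 1$.
2
Irrational means a real number that is not of the form $\frac{a}{b}$ for any integers a and b.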
2. Constructive Activities
PST 48. To construct an object that meets certain conditions, create an
object that doesn’t meet the conditions, and then fix it until it does. Use a
2.1. Connecting the dots. The vagueness above is probably making your
eyes glaze, so let’s save the day with a specific example [59], where we will
progressively eradicate all “errors” in the solution.
Problem 9. (Kvant ’94) Given are n grey and n black points in the plane,
no three collinear (i.e., no three on the same line). Show that we can draw n
nonintersecting segments connecting the black points to the grey points.
gle Inequality on ADE and BCE. ♦
It immediately follows that the sum of the lengths of all n segments
decreases every time we perform the uncrossing operation.
2.1.2. The clean write-up. Now all the parts are in place to be put together.
Solution to Problem 9: First pair up the grey and the black points ar-
bitrarily, and connect each grey point to the corresponding black point. This
may create some intersection points. Now iterate the following operation:
• Whenever a segment AC crosses a segment BD, with A, B grey and
C, D black, replace segments AC and BD by AD and BC.
For AC and BD to intersect, ABCD must be a convex quadrilateral
(why?). So by Exercise 2, the sum of the lengths of all n line segments must
decrease each time we perform this uncrossing operation. But there are only
a finite number of ways to pair up the grey points with the black points.3 So
if we perform the operation repeatedly, the process must eventually end.
By assumption, we perform the uncrossing above whenever two segments
cross. So when the process stops, there must be no more crossings, which
means that we have paired up the grey points with the black points using n
nonintersecting line segments – just as the problem requires.
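The uncrossing process is concrete enough to run. The sketch below is one possible implementation under stated assumptions: the helper names are made up, the sample points are placed on the parabola y = x² (so that no three are collinear), and the first crossing found is always the one repaired. The total length is recorded after each step so the monovariant can be watched.

```python
from math import dist

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crossing(a, c, b, d):
    """True if segments ac and bd cross (points in general position)."""
    return orient(a, c, b) != orient(a, c, d) and orient(b, d, a) != orient(b, d, c)

def uncross(grey, black):
    """Return match (grey[i] is joined to black[match[i]]) and all total lengths seen."""
    n = len(grey)
    match = list(range(n))                       # arbitrary initial pairing

    def total():
        return sum(dist(grey[i], black[match[i]]) for i in range(n))

    lengths = [total()]
    while True:
        pair = next(((i, j) for i in range(n) for j in range(i + 1, n)
                     if crossing(grey[i], black[match[i]], grey[j], black[match[j]])),
                    None)
        if pair is None:
            return match, lengths
        i, j = pair
        match[i], match[j] = match[j], match[i]  # replace AC, BD by AD, BC
        lengths.append(total())

grey = [(x, x * x) for x in range(4)]
black = [(x, x * x) for x in range(4, 8)]
match, lengths = uncross(grey, black)
print(match, [round(t, 3) for t in lengths])
```

In this example all four initial segments cross one another, and the process ends with the nested pairing in which the i-th grey point from the left is joined to the i-th black point from the right.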
3
Recall the matchmaking Exercise 8 from Combinatorics I (vol. I). There, 10 men and
10 women could marry off in 10! heterosexual couples. Similarly, the number of possible
pairings between the black and grey points is n!.
Notice that this is an example of a problem where no operation is sup-
plied. Instead, solving the problem requires coming up with both the mono-
variant (total segment length) and the operation (uncrossing of segments)
that makes it monovary.
2.1.3. Extremes again! Writing out this solution doesn’t necessarily require
us to describe fixing the configuration as a process. For example, we could
also write the solution as follows:
Alternative solution to Problem 9: Among all n! ways of pairing
the grey points with the black points, consider the pairing that makes the
sum of the lengths of the segments as small as possible. We claim that,
with this pairing, the segments never cross each other. Indeed, if there is a
crossing, then we can re-pair the points involved as in Figure 5a to make the
total length of the n segments shorter (by Exercise 2). But we assumed that
the original pairing made this total length as small as possible, so this is a
contradiction.
The idea of picking a pairing that minimizes the total segment length
is a famous technique called the Extreme Principle:
PST 49. Define a feature (e.g., sum of lengths) and select an object having
an extreme value of that feature (e.g., minimal sum). Then argue that, due
to the extreme value, some operation is not possible (e.g., uncrossing), and
hence conclude that the object in question possesses some other property
(e.g., no intersection points).
Speaking of which, our alternative solution relied on just one minimal-
length pairing; but it did not preclude the existence of other such pairings,
nor did it outlaw good non-minimal pairings:
Exercise 3. Could there be a configuration of grey and black points with
more than one minimal-length pairing of the segments? How about having
2.1.4. Pros and cons. The solution to Problem 9 can be written either way:
by a “self-correcting” process or via the Extreme Principle. Arguably the
second way is in some sense more appealing, because it explicitly identifies
the pairing (or one of the pairings) that works. But both solutions require
the same key idea – the same operation and the same monovariant.
2.3. Shedding the disguise. If you are familiar with the language of
graph theory, you will probably notice that both of these last two problems
are really theorems of graph theory, recast in anthropomorphic form. Briefly,
(1) A graph consists of vertices (dots), some of which are connected by
edges (the segments between the dots).
(2) Neighbors are two vertices connected by an edge.
(3) A Hamiltonian cycle is a path that tours the graph along the edges,
visiting each vertex exactly once and coming back where it started.
4
Paul Zeitz called this the “affirmative action coloring problem” in the case of n = 2.
3. NOT GETTING THERE 153
PST 50. Suppose you have a system on which certain operations are performed repeatedly. To show that the system can never reach some state from
PST 51. If the operation is given by a linear function, it is reasonable to
expect that the monovariant will also be given by some sort of linear function.
3.1.2. Place-search. We still haven’t solved our problem. For all we know,
our monovariant could decrease forever if the fleas could go on jumping
forever, unless they all eventually land in the same location. (When does
this happen?)
However, the problem asks for something different: a place on the line
over which the fleas cannot jump, i.e., for an impossible final configuration.
It seems we again need to do some detective work.
Now consider any configuration V of the n fleas, with some value ν.
According to PST 50, we should find a configuration W of the fleas that is
unreachable from V; more precisely, whose value is larger than our value ν, so
that the decreasing monovariant will prevent us from reaching it. If ω is the
rightmost position in W, then the value of W is ω − λ (sum of fleas ≤ ω).
How can we make sure this is larger than ν? By the Sandwich technique
from the Induction session (vol. I), we squeeze in an obvious intermediate
quantity:
$$\omega - \lambda\,(\text{sum of fleas} \le \omega) \;\ge\; \omega - \lambda(n-1)\omega \;=\; \omega\,(1 - \lambda(n-1)) \;\overset{?}{>}\; \nu. \tag{3}$$
This may look intimidating, but all we did was replace each of the other
fleas with ω, in order to decrease the overall value. To resolve the “?” before
reaching ν at the end, let μ = 1 − (n − 1)λ. We finally see here why the
condition λ < 1/(n−1) was required in the problem: to make our μ positive!
Thus, we need ωμ > ν, i.e., any ω > ν/μ will do. The inequalities in (3) will
be satisfied regardless of where the remaining fleas are, as long as they are
to the left of ω.
Our search over, a formal proof needs to recap the above points.
Exercise 5. From any configuration with value ν, show that it is not possible
for any one of the fleas to get to a position M > ν/μ, where μ = 1 − (n − 1)λ.
Hint: Use inequalities (3), our monovariant, and a contradiction. ♦
We have just proven that for any initial configuration, there is some point
over which none of the fleas can ever jump. This is a stronger statement
than the problem required, so we are certainly done. In fact, the statement
of the original problem is slightly misleading in that it asks us to search for a
whole configuration of fleas, along with an unreachable position. In reality,
any flea configuration works, and any M > ν/μ is beyond the fleas’ reach.
W W S W W S W W S W W S
S S W S S W S W W W W W
W S S S S S S S S S S S
The trick is similar to the idea behind Problem 10 about favors among
friends. At each step, a student assimilates to the state of a majority of his
or her neighbors, and hence the total amount of mismatch between students
sitting next to each other never increases. That is, the number of pairs of
adjacent students, with one asleep and one awake, always decreases or stays
the same. This solves the problem, because we can bound the number of such
pairs at the beginning and at the end of class.
Well, almost – the count doesn’t quite work out correctly, because while
most students can belong to up to 4 such pairs, those on the edges of the
grid have only 2 or 3 neighbors. So, at the end with 10 awake students, we
could have anything from 26 mismatched pairs (with 4 corner and 6 edge
W ’s) to 40 mismatched pairs (with all 10 W ’s inside the grid). This is not
enough information to conclude for sure that there were 10 awake students
in the beginning. Hypothetically, we could have started with 36 mismatched
pairs created by 9 awake non-adjacent students inside the grid, but ended
with all W ’s having “migrated” to the border, e.g., 10 edge W ’s for a total
of 30 mismatched pairs and 10 awake non-adjacent students. As predicted,
the monovariant decreased (from 36 to 30), but we didn’t solve the problem!
PST 54. When everything works perfectly inside a region and is slightly off
along the border, you might remedy this inelegance by embedding the given
region in a bigger region and imposing some trivial conditions outside the
region in order to extend the problem there.
You may have seen this technique in another setting before. In Pascal’s
Triangle, the defining rule “add two adjacent numbers to get the number
directly below them” does not work for the 1s at the ends of the rows (why?).
However, if you place 0s all around Pascal’s Triangle in the same triangular
grid covering the whole plane, the rule will work for all numbers in the plane,
except for the 1 at the top of the triangle.
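The Pascal's Triangle remark translates directly into code: once each row is padded with 0s, a single uniform rule produces every entry of the next row, including the 1s at the ends. The function name `next_row` is chosen for this sketch.

```python
def next_row(row):
    """Pad the row with a 0 on each side, then add every pair of adjacent numbers."""
    padded = [0] + row + [0]
    return [padded[i] + padded[i + 1] for i in range(len(padded) - 1)]

rows = [[1]]                 # the 1 at the top is the only entry not produced by the rule
for _ in range(4):
    rows.append(next_row(rows[-1]))
print(rows)
```

No special case for the border is needed, which is exactly the point of embedding the triangle in a larger grid of 0s.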
In our present predicament, we embed the grid in a bigger grid. The
“trivial” condition that will be imposed outside our original grid will be
perpetual sleepiness.
4. Conway’s Checkers
This session will close with one last, fairly complex example. It might
get a little tiresome to go through so many disconnected examples of mono-
variants, but that’s no excuse for stopping here, because if you haven’t seen
this problem, your life is not complete. The problem is credited to the great
recreational mathematician John Horton Conway and is often called Con-
way’s Checkers or Conway’s Soldiers [9].
4.1. The setup is similar in spirit to the Escape of the Clones from the
Invariants session (vol. I), played on a grid infinite to the right and up.
Every cell had at most one clone. A clone sprouted one clone in the cell
to the right and another in the cell directly above, and then disappeared.
Given a “prison” fence enclosing some (or all) clones, the task was to free the
clones from the prison. Let’s see how Conway’s problem differs.
Problem 14. (Conway’s Checkers) Imagine that you have an infinite
square grid, with a particular horizontal line of the grid designated. You
play the following game:
(a) First, you may initially place checkers in the squares below the line –
as many as you want, but no more than one checker per square.
(b) Then, you may take a checker and jump it over a checker that is adjacent
to it – in any of the four directions – into the square immediately
beyond, if that space is vacant. In the process, you remove the checker
that has been jumped over (cf. Fig. 7).
(c) You may continue jumping checkers, as long as there are two checkers
adjacent to each other somewhere.
The goal is to get some checker to be as far above the designated line as
possible. What is the highest row that can be reached?
4.2. Initial victory. Check that you can get a checker to the first row above
the designated line, by simply starting with two checkers stacked just below
the line and then jumping upward. With four checkers and a series of three
jumps, you can get a checker to the second row above the line (cf. Fig. 8).
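The rules of the game are simple enough to encode, which makes it easy to check proposed solutions. In the sketch below the coordinates, the function name `jump`, and the particular four-checker starting position are assumptions of this illustration; the position in Figure 8 may differ. Rows with y ≤ 0 lie below the designated line, and y = 1 is the first row above it.

```python
def jump(board, src, over, dst):
    """Perform one jump on a set of occupied squares (x, y); return the new set."""
    dx, dy = over[0] - src[0], over[1] - src[1]
    assert abs(dx) + abs(dy) == 1, "the jumped checker must be adjacent"
    assert dst == (over[0] + dx, over[1] + dy), "land immediately beyond it"
    assert src in board and over in board and dst not in board
    return (board - {src, over}) | {dst}

board = {(0, 0), (0, -1), (1, 0), (2, 0)}      # four checkers below the line
board = jump(board, (0, -1), (0, 0), (0, 1))   # up into row 1
board = jump(board, (2, 0), (1, 0), (0, 0))    # leftward, refilling square (0, 0)
board = jump(board, (0, 0), (0, 1), (0, 2))    # up into row 2
print(board)
```

The same function can be used to test your own attempts at rows 3 and 4: any illegal move trips one of the assertions.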
Exercise 6. Find a way to get to the third row above the line. Then try to
get to the fourth row.
Hint: Figure 8c contains the initial configuration for reaching row 1, only
moved up a row. In general, suppose you have reached row k from some initial
configuration $F_k$. Shift $F_k$ one row up, and try to reach it from some new
configuration $F_{k+1}$. If you are successful, then your previous transformation
of $F_k$ will result in a checker in row k + 1. There could be other ways. ♦
Here is the general principle that may have helped you so far:
PST 55. Re-use inductively your solution for a previous case inside your
solution for the next case.
The situation looks hopeful! Pushing on:
Exercise 7. Try to find a way to get a checker up to the fifth row. Become
frustrated.
As the last exercise foreshadowed, and fairly remarkably, it’s impossible
to get a checker more than four rows up, no matter how many checkers you
place in the first stage of the game. Can we prove this?
4.3. Is there an invariant? First, let’s try using an invariant, along the
lines of the Escape of the Clones. Can we assign a number to each square, so
that the sum of the numbers of squares with checkers in them stays constant
at each step?
Consider three successive squares (cf. Fig. 9). Suppose we want to write
the numbers a, b, c in them so that the sum of the occupied squares stays
constant. One legal move is to jump a checker from the a square to the c
square, removing the checker on b in the process. For the sum to be invariant
under this move, we must have a+b = c. Similarly for a jump in the opposite
direction, we must have c+b = a. But these two equations add to give b = 0.
This argument shows that the number written in every square must be zero!
That doesn’t give us a very useful invariant.
4.4. Modifying an invariant. Maybe we ask too much from the invariant?
Instead of having an invariant in all four directions (moving to the right, left,
up, and down), try to make the sum invariant only under jumps in certain
directions, e.g., only right and up.
First, let’s focus on a single row. Let’s choose numbers to write in the
squares so that the sum stays invariant when we jump to the right. Just as
powers of $\frac{1}{2}$ were the natural choice to use in the Escape of the Clones, so are
powers $x^n$ of some unknown x in order here. Three successive squares with
$x^n$, $x^{n+1}$, $x^{n+2}$, written from left to right, will accommodate our desired
invariant only if $x^n + x^{n+1} = x^{n+2}$. Dividing by $x^n$ leads to a famous
quadratic equation: $x^2 - x - 1 = 0$, and the quadratic formula gives $x_1 = \frac{1+\sqrt{5}}{2} \approx 1.618 > 1$ and $x_2 = \frac{1-\sqrt{5}}{2} \approx -0.618$. The larger root $x_1$ is denoted
by $\varphi$ and is known as the golden ratio.
To make it easier to express things concretely, coordinatize the squares
in a row with integers (increasing as we move to the right). If we write $\varphi^n$
in square n, we have ensured that the sum of the numbers of the squares
containing checkers stays invariant under jumps to the right: $\varphi^n + \varphi^{n+1} = \varphi^{n+2}$. On the other hand, a jump to the left consists of replacing $\varphi^{n+2} + \varphi^{n+1}$
by $\varphi^n$, thus always decreasing the sum (why?).
Now, what about jumps up and down, aiming for invariance only under upward jumps?
For any column, an analogous discussion leads us to assign powers $\varphi^n$, with
n increasing as we move up the column. However, this column intersects our
previously discussed row, and there is already some power $\varphi^m$ assigned to
the square in the intersection. We need to shift all powers in our column
up or down in order to match this $\varphi^m$. There is a simple algebraic way of
reconciling the numbers in all rows and columns: we assign vertical as well as
horizontal coordinates to the squares, and then write $\varphi^{m+n}$ in square (m, n).
This multiplies all numbers in our column by the same $\varphi^m$, and all numbers
in our row by the same $\varphi^n$, without changing the properties that we would
like: the sum of the numbers in the checkers’ squares will
• stay the same under rightward and upward jumps, and
• decrease under leftward and downward jumps.
4.5. Symmetry gets rid of infinities. Even though we now have turned
the sum into a monovariant, this doesn’t quite work to solve the problem.
We want to show that the fifth row is unreachable by arguing the sum of the
original checkers is not large enough. But with the numbering scheme just
described, there exist squares with arbitrarily large numbers, even below the
designated line: since φ > 1, we have a real problem with the positive powers
$\varphi^m$. In particular, we need the sum in a single row to be finite, but what we
have now is this:
4. CONWAY’S CHECKERS 161
Lemma 3. For any row, the assignment $\{\varphi^m\}$ for any integer $m$ yields an
infinite sum for half of the row and a finite sum for the other half:
(a) $1 + \varphi + \varphi^2 + \varphi^3 + \cdots + \varphi^m + \cdots = \infty$.
(b) $1 + \varphi^{-1} + \varphi^{-2} + \varphi^{-3} + \cdots + \varphi^{-m} + \cdots = \frac{1}{1 - \varphi^{-1}} = \varphi^2$.
Proof: Part (a) is self-evident, as all numbers there are ≥ 1. The sum in
part (b) is a geometric series $a + ar + ar^2 + \cdots + ar^m + \cdots$, where every
next term is the previous multiplied by the ratio $r$. Provided $r$ is small,
namely, $-1 < r < 1$, the sum adds up to $\frac{a}{1-r}$ (cf. the Invariants session,
vol. I). In our case, $a = 1$ and $r = \varphi^{-1}$, and $\frac{1}{1-\varphi^{-1}} = \frac{\varphi}{\varphi - 1} = \varphi^2$ because
$(\varphi - 1)\varphi = \varphi^2 - \varphi = 1$ and hence $\frac{1}{\varphi - 1} = \varphi$.
The ∞ in part (a) is worrisome, showing that our argument will never
come together. We need to get rid of the large powers $\varphi^m$, while at the same
time ensuring that what happens when jumping left (the sum decreases) will
also happen when jumping right! The way to do this is to:
PST 57. Choose a central object and symmetrize the rest with respect to it.
Specifically, for our checkers choose a central column, have the numbers be
highest in that column, and decrease as you go away from it along any row.
To put it more directly, choose the “central” column to be the one with
$m$-coordinate 0, and as before assign to it all powers $\{\varphi^n\}$. Row $n$
intersects this central column in $\varphi^n$, so decrease the powers of φ as you
move away from $\varphi^n$ along row $n$, either to the right or the left:
$\ldots, \varphi^{n-2}, \varphi^{n-1}, \varphi^n, \varphi^{n-1}, \varphi^{n-2}, \ldots$
This boils down to replacing $\varphi^m \to \varphi^{-|m|}$ in our previous formula and
arriving at the pretty V-shape pattern on the right.
[Figure: the V-shaped pattern of the numbers $\varphi^{-|m|+n}$. Each row peaks in
the central column and decreases symmetrically; for example, one row reads
$\varphi^7, \varphi^8, \varphi^9, \varphi^{10}, \varphi^9, \varphi^8, \varphi^7$ and, nine rows below it, another reads
$\varphi^{-2}, \varphi^{-1}, 1, \varphi, 1, \varphi^{-1}, \varphi^{-2}$.]
Exercise 8. Suppose the number $\varphi^{-|m|+n}$ is assigned to square $(m, n)$, for
all $m$ and $n$. Check that whenever a jump is made, the sum of the numbers in
the checkers’ squares either stays the same or decreases.
Note that, if you add up all the numbers in the grid, you will inevitably
get ∞; indeed, any one column alone will yield an infinite sum (why?). But
as it will turn out, for our solution we do not need to add up all numbers
and we will not do that.
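The claim in Exercise 8 can be spot-checked numerically before attempting a proof. The sketch below assumes a jump moves a checker from a square over an adjacent checker to the square two steps away (removing the jumped checker), and it uses floating-point arithmetic with a small tolerance, so it is only a sanity check on a finite window of the grid. The names `weight` and `jump_change` are illustrative.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the golden ratio


def weight(m, n):
    """Number written in square (m, n): phi^(-|m| + n)."""
    return PHI ** (-abs(m) + n)


def jump_change(m, n, dm, dn):
    """Change in the checker sum when the checker at (m, n) jumps over a
    neighbor at (m + dm, n + dn) and lands at (m + 2*dm, n + 2*dn)."""
    gained = weight(m + 2 * dm, n + 2 * dn)
    lost = weight(m, n) + weight(m + dm, n + dn)
    return gained - lost


DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # right, left, up, down

# Largest change over every jump that starts in a 17-by-17 window.
worst = max(
    jump_change(m, n, dm, dn)
    for m in range(-8, 9)
    for n in range(-8, 9)
    for dm, dn in DIRECTIONS
)
print(worst <= 1e-9)  # no jump in the window increases the sum
```

Upward jumps and horizontal jumps toward the central column give a change of 0 (up to rounding); every other jump gives a negative change.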
4.6. What’s stopping us from reaching the 5th row? Now we have all
the pieces in place to explain this “mystery”.
Solution to Problem 14: In Exercise 6, we saw that it was possible
to get checkers as high as the 4th row above the designated line. We claim
that it is not possible to get a checker to the 5th row, which establishes the
answer: the 4th row is the highest possible.
Suppose, for a contradiction, that it is possible to get a checker to the
5th row. Coordinatize the grid so that the row just below the designated line
is row 0 and the row just above it is row 1, and so that the alleged 5th-row
checker lands in column 0, i.e., in square (0, 5). As before, for each $m$, $n$, let
the number $\varphi^{-|m|+n}$ be written in the square in column $m$ and row $n$.
Now consider the sum of all the numbers written in squares containing a
checker. Initially, checkers exist only in the squares below the line, so their
sum is at most $\varphi^5$ according to Exercise 9. Furthermore, Exercise 8 showed
that our sum will either stay the same or decrease with each jump, and so it
will always be ≤ $\varphi^5$. But we assumed that we can eventually get a checker
to the square (0, 5), which has the number $\varphi^5$ written in it, making the sum
≥ $\varphi^5$. This means that the sum must have started from $\varphi^5$ and ended also
at $\varphi^5$, i.e., it stayed constant throughout the whole game! So the sum is,
after all, an invariant?
Wait a minute! To have an initial sum of $\varphi^5$ we must have started with
all checkers that are below the designated line. To have a final sum of
$\varphi^5$ concentrated in one square, (0, 5), means that we ended the game with
exactly one checker: having more checkers in the end will bump up the sum
beyond $\varphi^5$. So, we converted an initial configuration with infinitely many
checkers into a single checker? But that would take an infinite number of
jumps, while the game is finite: we said we reached the 5th row, which ended
the game! This is our contradiction: after all, ∞ is not finite.
[Figure: a diamond-shaped array of the powers $1, \rho, \rho^2, \ldots, \rho^5$. The central
square holds 1, and every other square holds $\rho^k$, where $k$ is the number of
horizontal and vertical steps from the center.]
Problem 2. Look at the sum of all the numbers: it increases in the example,
and in fact it never decreases. Indeed, using the notation from the hint, any
two numbers $a$ and $b$ among the given will be replaced by $d$ and $l = dkm$,
so the sum will go from $a + b = d(k + m)$ to $d + l = d(1 + km)$. To see why
$1 + km \ge k + m$, move everything to the LHS and factor: $1 - k + km - m = (k-1)(m-1) \ge 0$.
Because $k, m \ge 1$, equality occurs iff $k = 1$ or $m = 1$, which boils down to
$a = d$ or $b = d$, i.e., $a \mid b$ or $b \mid a$. Thus, the sum will remain the same iff one
of the numbers divides the other (and in this case, the two numbers will not
change either), and the sum will strictly increase otherwise.
Note that the operation does not change the overall lcm of all numbers.
This is so because $\mathrm{lcm}(a, b) = l = \mathrm{lcm}(d, l)$. Thus, the largest number we can
possibly write down is $L = \mathrm{lcm}(a_1, a_2, \ldots, a_n)$, the lcm of all given numbers
$a_1, a_2, \ldots, a_n$. This puts a (very rough) upper bound on the total sum: at
most $nL$, and as the sum increases by steps of 1 or more, it eventually must
stop changing. By our argument above, this means that for any two of the
given numbers, one divides the other, and hence the numbers themselves
stop changing at that point.
Problem 3. In part (a), we have $ab = d^2km = dl$, i.e., the total product
$\Pi = a_1 a_2 \cdots a_n$ is an invariant. This could have been used instead of the lcm
above to show that the process will terminate (how?). It also explains what
really happens in parts (b)-(c): the process reshuffles and recombines the
factors of the given numbers, without dropping or creating new factors.
From our solution to Problem 2 we know that the process terminates
when, for any two numbers, one divides the other. Arrange the final numbers
$c_i$ in increasing order. Then we must have a chain of divisibilities:
$c_1 \mid c_2 \mid c_3 \mid \cdots \mid c_{n-1} \mid c_n$. There are further restrictions. If $p$ is a prime that
divides some $a_i$, let $p^{\alpha_1} \le p^{\alpha_2} \le \cdots \le p^{\alpha_n}$ be all prime powers that divide the
original numbers. It is easy to see that these prime powers cannot combine
or split into two different numbers during the process (why?). Hence, they
will end up dividing the resulting final numbers on the board, i.e., $p^{\alpha_i} \mid c_i$ for
all $i = 1, 2, \ldots, n$. Thus, the original prime powers are only reshuffled to
make the final numbers, thereby completely determining the final outcome
of the process, regardless of the order in which we conduct the process.
An interesting twist is that the length of the process does depend on our
choices. In addition to the 2-step example on page 143, we can complete the
process in a longer way:
• {2, 3, 5, 15} → {1, 3, 10, 15} → {1, 3, 5, 30} → {1, 1, 15, 30}.
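The observations in Problems 2 and 3 can be tried out on small examples. The sketch below assumes the operation replaces two of the numbers by their gcd and lcm, as in the notation $d$ and $l$ above; the function names and the `pick_last` switch (which merely changes which eligible pair is processed) are illustrative choices.

```python
from itertools import combinations
from math import gcd, prod


def step(nums, i, j):
    """Replace nums[i] and nums[j] by their gcd and lcm."""
    a, b = nums[i], nums[j]
    d = gcd(a, b)
    out = list(nums)
    out[i], out[j] = d, a * b // d
    return out


def run(nums, pick_last=False):
    """Apply steps until, in every pair, one number divides the other.
    Returns the sorted final numbers and the number of steps used."""
    nums, steps = list(nums), 0
    while True:
        # Pairs in which neither number divides the other.
        bad = [(i, j) for i, j in combinations(range(len(nums)), 2)
               if nums[i] % nums[j] and nums[j] % nums[i]]
        if not bad:
            return sorted(nums), steps
        i, j = bad[-1] if pick_last else bad[0]
        new = step(nums, i, j)
        assert sum(new) > sum(nums)     # Problem 2: the sum strictly grows
        assert prod(new) == prod(nums)  # Problem 3(a): the product is invariant
        nums, steps = new, steps + 1


print(run([2, 3, 5, 15]))
print(run([2, 3, 5, 15], pick_last=True))
```

Both orders of processing end with the same final numbers, in line with the prime-power argument above.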
Problem 4. For the case of $M$ numbers on the circle, to track the contributions
of a single label $a_1$ to all labels, set $a_1 = 1$ and all other $a_i = 0$,
and perform the addition process. You will quickly see a famous pattern,
encoded in the Pascal’s Triangle:
5. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 165
“Miraculously,” the set of sums starting with x does not change as a whole!
Similarly for the sums starting with t or q:
• q + x ↔ q + x + y, t + q + x ↔ t + q + x + y (switch).
• t, q, t + q, q + x + y + z (go to themselves);
Finally, starting with y or z:
• y + z ↔ z; y + z + t ↔ z + t; y + z + t + q ↔ z + t + q (switch);
• y → −y and z + q + t + x → z + q + t + x + 2y.
So the only change to the set of sums occurs when y → −y and S −y → S +y:
these are the two sums without matching partners in our example! Taking
absolute values, we have no change for |y| = | − y|, but |S − y| = S + (−y) >
|S +y| (why?). (In our example, |S −y| transitions from 7 to 3, then from 6 to
4, and again from 6 to 4, dropping down every time by −2y.) Consequently,
the sum of all absolute values of the 21 possible sums goes down by an integer
value. This is our monovariant! Since this sum is always positive (why?), it
must stop decreasing after a while, i.e., the process must terminate.
Try repeating the same argument for 3, 4, and 6 numbers.
Exercise 1. When $n$ is large enough that $f_n$ has reached its eventual constant
value $c$ (odd), then $f_n = f_{n+1} = c$ implies $(f_n + f_{n+1})/2 = c$, so indeed
$f_{n+2} = c$. At the same time, $\gcd(f_n, f_{n+1}) = \gcd(c, c) = c$ for all $n$ from now
on. But what happens before the sequence stabilizes?
Following the hint, for any $n$, let $d = \gcd(f_{n-1}, f_n)$. Thus, $d \mid f_{n-1}$ and
$d \mid f_n$, and hence $d$ also divides the sum $f_{n-1} + f_n$. Since both $f_{n-1}$ and $f_n$
are odd, $d$ must be odd too; this means that we can divide the sum
by 2 without affecting divisibility by $d$, i.e., $d \mid (f_{n-1} + f_n)/2$. But $f_{n+1}$ is the largest odd
divisor of this average, so $d \mid f_{n+1}$. Ordinarily, the gcd of a subset of numbers
$\{f_{n-1}, f_n\}$ is at least as large as, and often greater than, the gcd of the whole set $\{f_{n-1}, f_n, f_{n+1}\}$. However,
in our case, adding $f_{n+1}$ to $\{f_{n-1}, f_n\}$ does not decrease the gcd (as $f_{n+1}$ is
already divisible by that gcd), so $\gcd(f_{n-1}, f_n) = \gcd(f_{n-1}, f_n, f_{n+1}) = d$.
We are half done. Now, let $e = \gcd(f_n, f_{n+1})$, and hence $e \mid f_n$ and
$e \mid f_{n+1}$. By definition, $f_{n+1} \mid (f_{n-1} + f_n)/2$; therefore, $e \mid (f_{n-1} + f_n)$. But
$e$ already divides the summand $f_n$; hence $e$ must divide the other summand
$f_{n-1}$, i.e., $\gcd(f_n, f_{n+1}) \mid f_{n-1}$. Analogously as above, $\gcd(f_n, f_{n+1}) =
\gcd(f_{n-1}, f_n, f_{n+1}) = e$.
Combining the two conclusions, $\gcd(f_{n-1}, f_n) = \gcd(f_n, f_{n+1})$ for any $n$,
i.e., the gcd of two consecutive numbers is an invariant.
To finish the exercise, on the one hand $\gcd(f_1, f_2) = \gcd(f_n, f_{n+1})$ for
any $n$, and on the other hand $c = \gcd(f_n, f_{n+1})$ for $n$ large enough (when
$\{f_n\}$ stabilizes). So the constant value of the sequence is $\gcd(a, b)$.
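The invariant can be watched in action on a concrete pair. The recurrence below is inferred from the solution text and is an assumption: $f_1 = a$ and $f_2 = b$ are odd, and $f_{n+1}$ is the largest odd divisor of $(f_{n-1} + f_n)/2$. The helper names and the `max_terms` guard are illustrative.

```python
from math import gcd


def odd_part(n):
    """Largest odd divisor of a positive integer n."""
    while n % 2 == 0:
        n //= 2
    return n


def sequence(a, b, max_terms=10_000):
    """Terms f_1 = a, f_2 = b, f_{n+1} = odd part of (f_{n-1} + f_n) / 2,
    generated until two consecutive terms agree (or max_terms is reached)."""
    f = [a, b]
    while f[-1] != f[-2] and len(f) < max_terms:
        f.append(odd_part((f[-2] + f[-1]) // 2))
    return f


f = sequence(15, 21)
print(f)                                       # the terms until stabilization
print({gcd(x, y) for x, y in zip(f, f[1:])})   # gcds of consecutive terms
```

For the starting pair 15, 21 every pair of consecutive terms has gcd 3, and the sequence settles at 3 = gcd(15, 21).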
Problem 7. Recall (from Exercise 4 in Proofs I, volume I, about arithmetic
operations on irrational and rational numbers) that the difference,
sum, product, or ratio of an irrational and a (non-zero) rational number is
irrational (proven by contradiction). Now, if $x_0$ is irrational ($x_0 \in \mathbb{I}$), then
by induction, all $x_k \in \mathbb{I}$. Indeed, if some $x_{k-1} \in \mathbb{I}$, then $p_k/x_{k-1} \in \mathbb{I}$ (as $p_k$ is
a prime and hence rational). As the floor function $\lfloor x \rfloor$ outputs only integers
(hence rational), the fractional part $\{x\} = x - \lfloor x \rfloor$ of a number preserves
the rationality/irrationality of $x$. Putting these together, $x_k = \{p_k/x_{k-1}\}$ is
also irrational. Hence the sequence will never reach 0.
On the contrary, if $x_0$ is rational ($x_0 \in \mathbb{Q}$), the sequence will reach 0.
Indeed, similarly as above, all $x_k \in \mathbb{Q}$ ($x_k \ge 0$), so we can write them
as $x_k = \frac{a_k}{b_k}$ for some relatively prime positive integers $a_k$ and $b_k$, unless $x_k$
becomes 0. We will show that the denominators $\{b_k\}$ decrease in the process.
Since $0 < x_0 < 1$, we have $a_0 < b_0$, and then
$x_1 = \left\{\frac{p_1}{x_0}\right\} = \left\{\frac{p_1}{a_0/b_0}\right\} = \left\{\frac{p_1 b_0}{a_0}\right\} = \frac{a_1}{b_1} < 1$.
The fractional part $\{x\}$ changes only the numerator, but not the denominator
of the rational $x$ (why?). Therefore, $b_1 = a_0$, or $b_1 < a_0$ if there is some
reduction of the fraction $\frac{a_1}{b_1}$. In either case, $b_1 \le a_0 < b_0$, so the denominator
of $x_1$ is smaller than that of $x_0$. To push this argument through induction
(which we leave to the reader), you will also need to use $a_1 < b_1$, which
follows from $\{x\} < 1$ for all $x$.
But the sequence of (positive) denominators $\{b_k\}$ cannot decrease forever.
Hence, the process terminates, implying that some $x_k = 0$. ♦
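The shrinking denominators are easy to observe with exact rational arithmetic. The sketch below assumes that $p_k$ is the $k$-th prime (2, 3, 5, ...); the argument above only uses that each $p_k$ is a positive integer. The names `primes` and `trajectory` are illustrative.

```python
from fractions import Fraction
from math import floor


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for short runs)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def trajectory(x0):
    """x_0, x_1, ... with x_k = {p_k / x_{k-1}}, stopping once some x_k = 0.
    ASSUMPTION: p_k is the k-th prime."""
    xs = [Fraction(x0)]
    gen = primes()
    while xs[-1] != 0:
        q = next(gen) / xs[-1]      # exact rational quotient
        xs.append(q - floor(q))     # fractional part
    return xs


xs = trajectory(Fraction(3, 7))
print(xs)
print([x.denominator for x in xs])
```

Starting from 3/7, the terms are 3/7, 2/3, 1/2, 0, with denominators 7, 3, 2, 1.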
This relationship is actually true for any sequence $\{a_n\}$ with average sequence
$\{b_n\}$. Now we are on the right track: since $k/(k+1) < 1$ and
$a_{k+1} < k + 1$, then $b_{k+1} < b_k + 1$. As both sides are integers, we can be
more precise: $b_{k+1} \le b_k$, which is what we were after! The sequence $\{b_k\}$ is
non-increasing and consists of positive integers, so it must stabilize. But as soon
as $b_{k+1} = b_k$, our inequalities above turn into equalities, showing $a_{k+1} = b_k$.
Thus, the original sequence $\{a_n\}$ stabilizes at the same value.
Exercise 3. The triangles on the right
are equilateral. The fourth point of
the first configuration is the center of
the triangle and in the second config-
uration it is a point below the center.
In each case, the two possible non-intersecting pairings are marked in
solid or dashed segments. In the first configuration both pairings are minimal
168 6. MONOVARIANTS. PART II
(they have the same length). In the second configuration the solid pairing is
longer than the other, hence it is non-minimal, but still a correct pairing.
Problem 10. Starting from an arbitrary assignment of party favors to
people, design the following operation. Whenever some person P violates
the condition, notice that there is some favor that at most 1/n of P ’s friends
have, so reassign P to this favor. For example, in Figure 11a, n = 5 and P ’s
friends are split into 5 groups according to their favors F1 , . . . , F5 . Originally,
P has favor F2 , so he is connected to each friend in the group with F2 .
However, this group is larger than 1/5 of P ’s friends. This means that
another group, namely, the group with F5 , is smaller than 1/5 of P ’s friends,
so we change P ’s favor to F5 and connect him to everyone in that group.
Check that each time the operation is performed, the number of pairs
of friends who have the same favor decreases. This monovariant cannot
decrease forever, and eventually the desired situation will be achieved.
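The reassignment procedure is a small local-search algorithm, and its monovariant can be tracked directly. The sketch below makes two assumptions not spelled out in this solution: friendship is mutual, and a person violates the condition when more than 1/n of his or her friends hold the same favor. Favors are numbered 0 to n − 1, and all function names are illustrative.

```python
def same_favor_friends(friends, favor, p):
    """How many of p's friends hold the same favor as p."""
    return sum(1 for q in friends[p] if favor[q] == favor[p])


def violates(friends, favor, p, n):
    """ASSUMED condition: more than 1/n of p's friends share p's favor."""
    return same_favor_friends(friends, favor, p) * n > len(friends[p])


def same_favor_pairs(friends, favor):
    """The monovariant: number of friend pairs holding the same favor."""
    return sum(same_favor_friends(friends, favor, p) for p in friends) // 2


def rebalance(friends, favor, n):
    """Reassign violators one at a time (modifying favor in place);
    return the value of the monovariant after each step."""
    history = [same_favor_pairs(friends, favor)]
    while True:
        violator = next((p for p in friends if violates(friends, favor, p, n)), None)
        if violator is None:
            return history
        counts = [sum(1 for q in friends[violator] if favor[q] == f) for f in range(n)]
        favor[violator] = counts.index(min(counts))  # a least common favor among the friends
        history.append(same_favor_pairs(friends, favor))


friends = {p: [q for q in range(6) if q != p] for p in range(6)}  # six mutual friends
favor = {p: 0 for p in range(6)}                                   # everyone starts with favor 0
print(rebalance(friends, favor, 3), favor)
```

Each reassignment moves the violator from a group holding more than 1/n of the friends to a group holding at most 1/n of them, so the recorded values strictly decrease.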
[Figure 11: (a) person $P$ and $P$’s friends, grouped by their favors $F_1, \ldots, F_5$;
(b) points $A_1, A_2, \ldots, A_n$ and $B_1, B_2, \ldots, B_n$.]
Case 1. If $C$ is still the rightmost position after the jump (cf. Fig. 12a), then
the only change in the value happens with the flea at $A$:
$-\lambda A \to -\lambda\bigl(B + \lambda(B - A)\bigr)$.
This is a net change of $-\lambda\bigl(B + \lambda(B - A) - A\bigr) = -\lambda(1 + \lambda)(B - A) < 0$, so
the value of the configuration has decreased.
[Figure 12: positions of the fleas $A$, $B$, $C$ before and after the jump, (a) in Case 1 and (b) in Case 2.]
Case 2. If the flea has jumped farther right than $C$ (cf. Fig. 12b), then the
new value is $\bigl(B + \lambda(B - A)\bigr) - \lambda\bigl(C + B + (\text{other terms})\bigr)$, and the net change is
$\bigl(B + \lambda(B - A) - \lambda C\bigr) - (C - \lambda A) = (1 + \lambda)(B - C) \le 0$.
(We know that $B \le C$ because $C$ was originally the rightmost flea.) So the
value may stay the same, but it still cannot increase.
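The two cases can be stress-tested with exact arithmetic. The sketch below assumes, as the computations in Cases 1 and 2 suggest, that the value of a configuration is the rightmost position minus λ times the sum of the other positions, and that a flea at $A$ jumps over a flea at $B > A$ to land at $B + \lambda(B - A)$. The starting positions, the values of λ, and the function names are hypothetical.

```python
import random
from fractions import Fraction


def value(pos, lam):
    """Rightmost position minus lam times the sum of the other positions."""
    top = max(pos)
    return top - lam * (sum(pos) - top)


def jump(pos, i, j, lam):
    """The flea at pos[i] jumps over the flea at pos[j] (to its right)."""
    a, b = pos[i], pos[j]
    new = list(pos)
    new[i] = b + lam * (b - a)
    return new


def never_increases(lam, start, steps, seed=0):
    """Perform random legal jumps; report whether the value ever went up."""
    rng = random.Random(seed)
    pos = list(start)
    for _ in range(steps):
        pairs = [(i, j) for i in range(len(pos)) for j in range(len(pos))
                 if pos[i] < pos[j]]
        if not pairs:       # all fleas share one point: no jump is possible
            break
        i, j = rng.choice(pairs)
        nxt = jump(pos, i, j, lam)
        if value(nxt, lam) > value(pos, lam):
            return False
        pos = nxt
    return True


start = [Fraction(0), Fraction(1), Fraction(2), Fraction(5)]
print(never_increases(Fraction(1, 3), start, 200))
```

Using `Fraction` avoids rounding errors, so a reported increase would be a genuine counterexample rather than floating-point noise.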
Exercise 5. Suppose otherwise. Then the rightmost flea is at some position
ω > ν/μ and all the other fleas are at positions ≤ ω, which means the value
of the configuration is:
(3)   $\omega - \lambda(\text{sum of fleas} \le \omega) > \nu$.
So getting any flea to a position > ω requires the value of the configuration
to increase, and we have shown in Exercise 4 that this can never happen.
To reach row 4, Figure 15 builds upon our previous 8-checker configura-
tion. However, there is a solution starting with only 20 checkers.
Exercise 9. Row 0 = $\{\ldots, \varphi^{-3}, \varphi^{-2}, \varphi^{-1}, 1, \varphi^{-1}, \varphi^{-2}, \varphi^{-3}, \ldots\}$. The right
half of row 0, starting with 1, is a geometric series that adds up to $\varphi^2$
(cf. Lemma 3(b)). The left half of the row is the same series minus the term
1, i.e., $\varphi^2 - 1$. Adding the two halves we get $2\varphi^2 - 1 = 2\varphi^2 - (\varphi^2 - \varphi) =
\varphi^2 + \varphi = \varphi^3$. But row $n < 0$ (below row 0) is just row 0 with everything
multiplied by $\varphi^n$, hence the sum for row $n$ is $\varphi^3 \varphi^n$. Now we add up all rows for
$n = 0, -1, -2, -3, \ldots$ and factor the repeating $\varphi^3$, and discover yet another
geometric series that we have already encountered in Lemma 3(b):
$\varphi^3\bigl(1 + \varphi^{-1} + \varphi^{-2} + \varphi^{-3} + \cdots\bigr) = \varphi^3 \cdot \varphi^2 = \varphi^5$.
Exercise 10. Compared to before, everything on and under row 5 has been
divided by $\varphi^5$. Hence the sum underneath the designated line did the same
thing: $\varphi^5/\varphi^5 = 1$. The sum up to row 5 (inclusive) is $\varphi^5$, so the total sum is
twice that minus row 5’s sum: $2\varphi^5 - \varphi^3 = \varphi^5 + (\varphi^5 - \varphi^3) = \varphi^5 + \varphi^4 = \varphi^6$. ♦
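The value $\varphi^5$ from Exercise 9 can be confirmed numerically by truncating the infinite sum. This is only a floating-point sanity check: the cut-offs `width` and `depth` are arbitrary choices, and the function name is illustrative.

```python
from math import sqrt

PHI = (1 + sqrt(5)) / 2  # the golden ratio


def truncated_sum(width, depth):
    """Sum of phi^(-|m| + n) over columns -width..width and rows -depth..0."""
    return sum(
        PHI ** (-abs(m) + n)
        for m in range(-width, width + 1)
        for n in range(-depth, 1)
    )


print(truncated_sum(80, 80), PHI ** 5)
```

The neglected terms are smaller than $\varphi^{-80}$ each and decay geometrically, so the two printed numbers agree to many decimal places.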
Session 7
Zvezdelina Stankova
If you feel you are already fortified with enough plane geometry back-
ground and the two main problems from Part I are not challenging enough,
you can tackle their two cousins below and meet us later in the session to
compare notes on their difficulty and variety of approaches. Calculus solu-
tions are allowed; but the ultimate challenge in these problems, of course, is
to discover beautiful purely geometric solutions that can be potentially cre-
ated by bright middle schoolers with little technical background and open
minds. Do such solutions exist? Part III will partially answer this question.
172 7. RE-CONSTRUCTIONS. PART II
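[Figure 1. $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + \alpha_5 + \cdots = ?$ The angles $\alpha_1, \alpha_2, \ldots$ are marked at the points $A_1, A_2, \ldots$ to the right of $A$.]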
Ideas: The discussion about the original Three-Squares problem in Part I
concluded with finding the sum of the first three angles: α1 + α2 + α3 = 90◦ .
Is there a similar geometric construction for the ℵ0 -Squares puzzle, i.e., can
you usefully tile (part of) the integer grid into grid-triangles? Or could you
apply some more advanced techniques instead? In the latter case, try first
to solve the Three-Squares problem with trigonometry as a preparatory step
for this infinite version. Or, perhaps, you know how to employ the so-called
Taylor expansion of a suitable function for the infinitely many squares?
Whatever you decide, experimenting by summing some of the angles and
estimating the total can be illuminating. Starting with α4 , you may need
to add up more than a dozen angles before you realize that this problem is
very different in nature than its Three-Squares predecessor. ♦
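The suggested experiment is quick to run on a computer. The sketch below assumes that $\alpha_k = \arctan(1/k)$, i.e., that the $k$-th angle is subtended by a unit side across $k$ unit squares in a row; this matches the Three-Squares total of 90° but is not stated explicitly here. The function name and the chosen cut-offs are illustrative.

```python
from math import atan, degrees


def partial_sum_degrees(count):
    """alpha_1 + ... + alpha_count in degrees, ASSUMING alpha_k = arctan(1/k)."""
    return sum(degrees(atan(1 / k)) for k in range(1, count + 1))


for count in (3, 10, 100, 1000, 10_000):
    print(count, round(partial_sum_degrees(count), 2))
```

Comparing the totals for growing cut-offs is a good way to form a conjecture before looking for a proof.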
¹ $\aleph_0$ is a shortcut for “infinitely many.”
2. A PYTHAGOREAN PATH FOR THE INTERMEDIATE 173
2.1. Similarity rules again. Let’s review the problem in Part I that we
solved via an auxiliary geometric construction and congruent triangles:
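[Figure 2. Three-Squares problem and Similarity of triangles: (a) the angles $\alpha$, $\beta$, $\gamma$ at $A$, $M$, $H$, $B$ below the side $DC$; (b) the angles $\beta$, $\gamma$, $\delta$ used in the similarity argument.]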
The auxiliary geometric construction was the hard and the brilliant part
of our first solution. It is unlikely that one would come up with the exact
same construction. A natural task would be to find a solution that does not
depend on auxiliary segments. It turns out that there is such a solution; but
to compensate for the lack of auxiliary segments, we will need to replace the
simpler congruences by similarities of triangles.
This is a good place to pause and re-think what happened just now.
PST 58. When reasoning backward you will often reach an important fact
that must be true (in order for the original problem to work out): try to
prove this fact without using any unjustified assumptions from the “back-
ward” discussion.
Applying PST 58 may well be the turning point in the analysis
of the problem, where your solution starts “moving forward”.
2.1.2. Moving forward . . . and back again. How do we show that $\triangle MDH \sim
\triangle MBD$ without assuming $\angle MDH = \gamma$? We still have $\angle HMD = \delta$ shared
by the two triangles, but we do not know anything about other pairs of
angles in these non-congruent triangles. Our only chance is to use ratios of
sides through, say, the RAR criterion for similarity.
With this in mind, is it true that the sides adjacent to $\delta$ in $\triangle MDH$ and
$\triangle MBD$ form equal ratios, i.e.,
(1)   $\frac{MH}{MD} \overset{?}{=} \frac{MD}{MB} \;\Leftrightarrow\; MH \cdot MB \overset{?}{=} MD^2 \;\Leftrightarrow\; 1 \cdot 2 \overset{?}{=} MD^2 \;\Leftrightarrow\; MD \overset{?}{=} \sqrt{2}$?
We again reasoned backward! But we finally seem to have reached something
that can be proved independent of the discussion so far.
2.1.3. Ending with a Pythagorean certainty. I can almost hear the reader
objecting to the last question in (1): “It is a well-known fact! $MD$ is the
diagonal of a unit square. The Pythagorean Theorem for isosceles right
$\triangle AMD$ implies $MD^2 = DA^2 + AM^2 = 1^2 + 1^2 = 2$, so $MD = \sqrt{2}$. Done!” ♦
Not so fast! First, do we know how to prove the Pythagorean Theorem?
And even if we do, our reasoning back and forth is not quite written in the
form of a traditional proof.
Exercise 1. Assuming the Pythagorean Theorem, track down and extract
the forward argument from the above discussion, and write a short formal
PST 59. When searching for a second solution, eliminate the methods from
the first solution, in order to restrict your attention to what other techniques
and ideas are available and suitable in your situation.
For example, in the 5th-grade solution from Part I extra constructions
were encouraged, albeit restricted only to the integer grid. On the other
hand, in the second solution we disallowed any extra drawings at all! As
restrictive as this may have seemed at the time, it worked to our advantage:
it reduced the number of possible triangles and, even more drastically, the
number of possible pairs of similar triangles, making it easier to find the
“right” pair: $\triangle MDH \sim \triangle MBD$.
Exercise 2. If you have extra time on your hands, count for fun the number
of families of similar triangles that appear in the original Figure 2a.
Here, a family is a collection of triangles any two of which are similar
to each other, and two triangles from different families are not similar. Be
aware that congruent triangles are also counted as similar!
(a) In a right triangle, the hypotenuse is the largest side.
(b) In a right isosceles triangle with legs 1, the hypotenuse is $\sqrt{2}$.
(c) In a 30°-60°-90° triangle, the three sides are in ratios $1 : \sqrt{3} : 2$.
Hint: Parts (a)-(b) have been done before (where?), with the “premature”
assumption of PT. In part (c) draw a segment through the vertex of the right
angle to split the original triangle into two smaller triangles, one of which
equilateral (cf. Fig. 3b). Describe the other small triangle. How does this
imply that the hypotenuse is twice the side of the equilateral triangle? ♦
We have some unfinished business from the Farmer-and-Cow discussion
in Part I. We concluded there that the farmer’s shortest route is through
point X on the river such that AX = 1 km and BX = 3 km. The other
(given) distances are F A = 2 km and CB = 3 km (cf. Fig. 3c).
Exercise 4. Calculate the length of the shortest route of the farmer.
Another PT consequence (used earlier) has a more demanding proof:
Hint: Drop the altitude to that third side, split into cases depending on
where the foot of this altitude lands, and use a baby consequence of PT. ♦
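ing off a river.² Everything is in a half-plane with respect to the river, the
normal is the dashed perpendicular to the river, and the doubly-marked
angles are equal: $\rho = \rho'$. Subtracting each from 90° yields $\alpha = \alpha'$, which is
a rephrase of law (2): the angles made by the river and the incoming ray
and by the river and the reflected ray are equal.
[Figure: a ray reflecting off the river $YZ$ at the point $X$; the dashed normal at $X$ makes the equal angles $\rho$, $\rho'$ with the two rays, which in turn make the angles $\alpha$, $\alpha'$ with the river.]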
Since we are trying to connect these “laws of nature” to our Farmer-
and-Cow problem, it will be silly to expect that numerical data (such as
the specific distances from the farmer and the cow to the river) are relevant
in this discussion. With this understanding, let’s generalize the original
problem by keeping only its features that are essential:
The farmer must get to the river, dip his bucket there, and take the water
to his cow. To which point at the river should the farmer walk so that
his total path is as short as possible?
[Figure: the river $YZ$, the point $X$ on it, and the angles $\alpha$, $\alpha'$ that the farmer’s path makes with the river.]
Hint: The solution from Part I applies equally well to this generalized
version: reflect the farmer across the river to obtain three similar triangles
(review page 13). Then the optimal path of the farmer must have made two
equal angles with the river, namely, $\alpha = \alpha'$ as marked above (why?). ♦
2
Caution: “Reflection” may mean different things, depending on the context. In the
Laws of Reflection, the sunlight is reflecting off the river. Mathematically speaking, this
is different from reflecting across the river, which the farmer did in order to get to the
phantom farmer on the other side of the river. The two usages are related, of course: the
sunlight’s reflection off the river is the same as its reflection across the normal (why?).
3. PHYSICS AND MATH COMBINE FORCES 177
Comparing the two pictures on the previous page leads to the inevitable
conclusion that the farmer must follow the same path as the sunlight, except
on a horizontal instead of the vertical or slanted plane along which the
sunlight travels. To make this into a formal argument, a small hurdle about
uniqueness must be overcome:
Exercise 6. Show that there is exactly one point $X$ on the river $YZ$ such
that if we connect it to the farmer $F$ and the cow $C$ we will make two equal
angles with the river.
Exercise 8. Among all paths from one point to another that bounce off a
mirror, show that the sunlight will take the shortest distance possible.
Proof: If the sunlight starts at point F , reflects in one mirror and passes
through point C, then the two angles that the sunlight’s path makes with
the mirror are equal by the Laws of Reflection.
Now put everything on a horizontal plane and let a farmer start at $F$
and walk to the mirror and then to the cow. From the general Problem 4
and the uniqueness in Exercise 6, we know that the shortest path for the farmer
goes through the unique point $X$ on the mirror where the path makes equal
angles with the mirror.
In other words, the path of the farmer and the path of the sunlight are
identical. Since this is the shortest path for the farmer, it will be the shortest
path for the sunlight too.
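The equal-angle property of the shortest path can be checked numerically. In the sketch below the river (or mirror) is the $x$-axis, the farmer stands at $(0, a)$ and the cow at $(d, b)$; the values of $a$, $b$, $d$ are hypothetical, and the crossing point $x = ad/(a+b)$ comes from the reflection argument (the segment from the reflected farmer $(0, -a)$ to the cow crosses the river there).

```python
from math import atan2, hypot, isclose


def path_length(x, a, b, d):
    """Length of the path farmer (0, a) -> river point (x, 0) -> cow (d, b)."""
    return hypot(x, a) + hypot(d - x, b)


def equal_angle_point(a, b, d):
    """Where the segment from the reflected farmer (0, -a) to the cow (d, b)
    crosses the river."""
    return a * d / (a + b)


a, b, d = 2.0, 3.0, 5.0  # hypothetical distances
x = equal_angle_point(a, b, d)

# Brute-force search over 10,001 evenly spaced river points between the two feet.
best = min(path_length(i * d / 10_000, a, b, d) for i in range(10_001))

print(x, path_length(x, a, b, d), best)
print(isclose(atan2(a, x), atan2(b, d - x)))  # the two angles with the river agree
```

The path through the equal-angle point has the same length as the straight segment from the reflected farmer to the cow, and no sampled point does better.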
So, what happened here? Simply put, our solution to the Farmer-and-
Cow problem implied a “law of nature”: the sunlight travels the shortest
route possible even if it has to reflect along the way! And conversely, if we
assume this “law of nature” about the sunlight, then the shortest route for
the farmer will make two equal angles with the river. It depends on what you
assume as an “axiom” and what you decide to prove from it as a “theorem”.
angle. Children play the following game: starting
from one of the trees, they run to one side of the
fence, then to the other tree, then to the other
side of the fence, and finally return to the first
tree. Help them do this as fast as possible.
You may assume that the fence extends as far as necessary so that the
children cannot go out of the courtyard. You should also think about:
Exercise 10. Why were obtuse angles eliminated in Problem 9? What may
go wrong then and how should the solution be modified to work here too?
(b) bounces off from a sequence of mirrors $M_1, M_2, \ldots, M_n$, and
(c) ends at point $B$,
show that the sunlight has taken the shortest possible path.
[Figure: the sunlight’s path from $A$ to $B$ reflecting in the mirrors $M_1$, $M_2$, $M_3$, $M_4$, together with an alternative dashed path.]
The picture above shows two paths: the sunlight’s path from $A$ to $B$ that
reflects through mirrors $M_1$, $M_2$, $M_3$, and $M_4$, and an alternative (dashed)
path that bounces off from the same sequence of mirrors. Note that the two
paths happen to pass through the same point on mirror $M_3$. Still, Exercise 11
claims that the sunlight’s path will be the shorter of the two.
Hint: Resolve the sunlight path into an equally long straight line path
while showing that the alternate path is a broken line path. ♦
Proof: The key idea is to split a diagonal, say, $BD$ into two parts $BM$ and
$MD$, so that $\angle BAM = \angle CAD$ ($= \alpha$ as in Fig. 4b). Since inscribed angles
$\angle ABM$ and $\angle ACD$ intercept the same arc $\widehat{AD}$, they are equal.³ From here,
$\triangle BAM \sim \triangle CAD$ by the AA similarity criterion. We can picture this by
rotating $\triangle CAD$ about vertex $A$ until side $AC$ aligns with ray $AB$, and then
rescaling $\triangle CAD$ to the size of $\triangle BAM$. The angle of rotation is $\angle CAB$.
A second rotation about $A$ but through $\angle MAB$ (as in Fig. 4c), followed
by a rescaling, will move $\triangle DAM$ onto $\triangle CAB$: why are these triangles also
similar? Check out the pairs of equal angles denoted by $\gamma$ and $\delta$.
Now we use ratios of sides from the above two similarities to express the
parts $BM$ and $MD$ of diagonal $BD$ in terms of quadrilateral $ABCD$:
$\frac{BM}{BA} = \frac{CD}{CA}, \quad \frac{DM}{DA} = \frac{CB}{CA} \;\Rightarrow\; BD = BM + MD = \frac{BA \cdot CD}{CA} + \frac{DA \cdot CB}{CA}\,.$
Clearing the common denominator $CA$ yields the desired equality of PtT.
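The resulting equality $BD \cdot CA = BA \cdot CD + DA \cdot CB$ is easy to test numerically on a cyclic quadrilateral. The sketch below places $A$, $B$, $C$, $D$ in this order on the unit circle; the chosen angles and the function names are hypothetical, and the check is subject to floating-point rounding.

```python
from math import cos, dist, sin


def on_unit_circle(t):
    """Point of the unit circle at angle t (in radians)."""
    return (cos(t), sin(t))


def ptolemy_gap(ta, tb, tc, td):
    """AC * BD - (AB * CD + AD * BC) for points A, B, C, D placed on the
    unit circle at angles ta < tb < tc < td < ta + 2*pi."""
    A, B, C, D = (on_unit_circle(t) for t in (ta, tb, tc, td))
    return dist(A, C) * dist(B, D) - (dist(A, B) * dist(C, D) + dist(A, D) * dist(B, C))


print(abs(ptolemy_gap(0.1, 1.0, 2.5, 4.0)) < 1e-12)
```

If the points are not taken in cyclic order, the quadrilateral crosses itself and the gap is no longer zero, which is a useful reminder that the order of the vertices matters in PtT.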
Did you notice that the same problem-solving idea occurred in the proofs
to both the Pythagorean and Ptolemy’s Theorems? The hypotenuse or a
diagonal was split into two parts, whether by the foot of an altitude or by an
extra point we created. In both situations, similar triangles played a crucial
role in the geometric construction and the ensuing algebraic calculations.
3
To review some facts about angles in a circle, see Circle Geometry, vol. I.
Exercise 12. Verify that sin x and cos x, as well as tan x and cot x, swap
their values for angles α and γ = ∠C; that tan x and cot x are reciprocals
of each other and can be expressed as ratios of sin x and cos x; and that a
trigonometric version of the PT is satisfied for any right triangle:
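(a) $\sin\alpha = \cos\gamma$, $\cos\alpha = \sin\gamma$, $\tan\alpha = \cot\gamma$, and $\cot\alpha = \tan\gamma$;
(b) $\tan\alpha = \frac{1}{\cot\alpha}$ and $\tan\gamma = \frac{1}{\cot\gamma}$; $\tan\alpha = \frac{\sin\alpha}{\cos\alpha}$ and $\cot\alpha = \frac{\cos\alpha}{\sin\alpha}$;
(c) $\sin^2\alpha + \cos^2\alpha = 1$ and $\sin^2\gamma + \cos^2\gamma = 1$.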
Exercise 16. When $x$ moves from 0° to 90°, show that $\sin x$ and $\tan x$
strictly increase, while $\cos x$ and $\cot x$ strictly decrease. (This means, for
example, that $\sin x < \sin y$ and $\cot x > \cot y$ for acute angles $x < y$.)
Hint: Use the unit circle for the values of sin x and cos x, or lines l and m
from Exercises 14-15 for the values of tan x and cot x. ♦
Yet a third way to think about a trigonometric function is via its graph.
When drawing graphs of trigonometric functions, on the x-axis we ordinarily
use linear units called radians (instead of degrees, which are angular units).
For example, 0° corresponds to 0 radians, 90° to $\frac{\pi}{2}$ radians, 180° to $\pi$ radians,
etc. More generally, $z$° corresponds to $\frac{\pi z}{180}$ radians: this is the length of the
arc on the unit circle $k$ that is encompassed by a central $z$°-angle. Thus,
the length of the smaller (dotted) arc $\widehat{CD}$ on the unit circle $k$ on page 180
measures angle $\angle BAC = \alpha$ in radians. Keep this in mind when drawing the
graphs below and use radian measure along the x-axis.
In fact, any positive value of tan α can be obtained this way (why?).
Thus, the range of $\tan x$ is $[0, \infty)$ for $0 \le x < \pi/2$. Furthermore, the graph
of $\tan x$ has a vertical asymptote $\nu$ at $x = \frac{\pi}{2}$ (not to be confused with the
previously discussed tangent line $l$). Visually, we observe that the graph of
$\tan x$ gets closer and closer to the line $\nu$ as $x$ approaches $\frac{\pi}{2}$. ♦
The sine and cosine functions can be defined for any angles, not just for
acute angles like α and γ above, while the tangent and cotangent functions
can be extended with care to almost all angles, avoiding division by cos x or
sin x when they are zero. We will not do this here, but the reader interested in
having a more complete understanding of trigonometry should consult a basic
text on trigonometry and then justify in the Hints section the corresponding
(dashed) extensions of the graphs from Exercise 17.
4.3. Deep in Trigland. To prepare for the promised formula for tan(α+β),
we need to first address its predecessors: analogous versions for sine and
cosine. If you are familiar with these formulas, skip to Section 4.4 for the
trigonometric solution to the Three-Squares problem. Otherwise, hold on
tight, for we will pass through some rough trigonometric terrain.
Theorem 3. For any angles α and β:
(a) sin(α + β) = sin α cos β + cos α sin β;
(b) sin(α − β) = sin α cos β − cos α sin β;
(c) cos(α + β) = cos α cos β − sin α sin β.
For our purposes it suffices to consider only the case when α + β is acute.
We leave it to the reader to extend the proofs to any other cases.
In part (b), we can modify the ideas encountered just now to accom-
modate the required difference (instead of sum) of two angles. We can also
restrict the solution to the case when α > β so that α − β > 0 and we can
use our basic definitions of the sine and cosine functions.
Exercise 18. Devise and prove analogous formulas for cos(α−β), tan(α−β),
cot(α + β), and cot(α − β).
4.4.2. More trig-routes? As we went through the above solution, the reader
should have questioned our choice of the tangent function: couldn’t we have
done as well with other trigonometric functions? The answer is Yes, but you
need to complete the earlier exercises about the basic properties of sin x,
cos x, and cot x, before you can:
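Exercise 21. Produce three more solutions to the Three-Squares problem.
Solution with cosine: In Figure 2a, $DH = \sqrt{5}$ and $DB = \sqrt{10}$ from
right △DAH and right △DAB. Thus, cos(β + γ) can be calculated as
$$\cos\beta\cos\gamma - \sin\beta\sin\gamma = \frac{2}{\sqrt{5}}\cdot\frac{3}{\sqrt{10}} - \frac{1}{\sqrt{5}}\cdot\frac{1}{\sqrt{10}} = \frac{5}{\sqrt{5}\,\sqrt{10}} = \frac{1}{\sqrt{2}} = \cos 45^\circ.$$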
But cos x strictly decreases for acute angles, and, as above, β + γ < 90◦ ; so
cos(β + γ) = cos 45◦ means β + γ = 45◦ . Thus, again α + β + γ = 90◦ .
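The cosine computation above can be sanity-checked numerically. The sketch below is only an illustration: it assumes the values cos β = 2/√5, sin β = 1/√5, cos γ = 3/√10, sin γ = 1/√10 read off from the solution, and α = 45° as implied by the conclusion α + β + γ = 90°. The variable names are hypothetical.

```python
import math

# Values read off from the right triangles in the solution (assumed here).
cos_b, sin_b = 2 / math.sqrt(5), 1 / math.sqrt(5)
cos_g, sin_g = 3 / math.sqrt(10), 1 / math.sqrt(10)

# Addition formula for cosine, Theorem 3(c).
cos_sum = cos_b * cos_g - sin_b * sin_g

# Recover the two acute angles and add the assumed alpha = 45 degrees.
beta = math.atan2(sin_b, cos_b)
gamma = math.atan2(sin_g, cos_g)
total_deg = 45 + math.degrees(beta + gamma)

print(cos_sum, math.cos(math.radians(45)))
print(total_deg)
```

Both printed cosines should agree up to rounding, and the total should come out as 90 degrees up to rounding.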
5. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 185
Figure 5. Triangle Inequality and Questioning uniqueness
the alternative (dashed) route start at $A_1$ instead.
More precisely, replace the initial line segments
$AS_1$ and $AQ_1$ of the two routes by segments $A_1S_1$
and $A_1Q_1$ of, correspondingly, equal lengths. Note
that the sunlight will now continue straight from
$A_1$ through $S_1$ to mirror $M_2$, while the dashed
route will, in general, follow a broken line from
$A_1$ through $Q_1$ to mirror $M_2$.
To summarize, moving the starting point of the routes from A to A1 did
not change the total length of each route. But now we can forget mirror
M1 and reduce the problem to one fewer mirror. Continuing this way, we
can gradually straighten out the sunlight’s route. In the end, after the last
mirror has been eliminated, both routes will start at some point An and
both will end at B, but the sunlight’s route will be a straight segment while
the alternative route will still be a broken line, unless it originally coincided
(everywhere!) with the sunlight’s route. ♦
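Exercise 13. Check that $\sin 0^\circ = \cos 90^\circ = 0$, $\sin 30^\circ = \cos 60^\circ = \frac{1}{2}$,
$\sin 60^\circ = \cos 30^\circ = \frac{\sqrt{3}}{2}$, $\sin 90^\circ = \cos 0^\circ = 1$. Furthermore, $\cot 0^\circ$ is not defined,
$\cot 30^\circ = \sqrt{3}$, $\cot 60^\circ = \frac{1}{\sqrt{3}}$, and $\cot 90^\circ = 0$. ♦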
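A quick numerical check of these special values may help. The sketch below is an illustration only; the helper `cot` is a hypothetical name that returns `None` where the cotangent is undefined.

```python
import math

def cot(deg):
    """Cotangent of an angle given in degrees; None when sin is zero."""
    s = math.sin(math.radians(deg))
    if abs(s) < 1e-12:
        return None  # cotangent is undefined where the sine vanishes
    return math.cos(math.radians(deg)) / s

for deg in (0, 30, 60, 90):
    rad = math.radians(deg)
    # Print the angle, its sine, its cosine, and its cotangent.
    print(deg, round(math.sin(rad), 6), round(math.cos(rad), 6), cot(deg))
```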
Exercise 15. Using the figure on page 180, let m
be the line through G(0, 1) tangent to circle k and
intersecting ray $\overrightarrow{AC}$ at H. Note that the measure of
∠CAG is (90° − α).
By Exercises 12 and 14, cot α = tan(90° − α) =
tan ∠CAG = GH, i.e., the cotangent function is measured
along line m.
Exercise 16. As the second side of ∠BAC rises from 0◦ to 90◦ , point C
also rises along the unit circle k and hence its y-coordinate sin α increases.
At the same time, B moves closer to the center A of the unit circle; i.e., its
x-coordinate cos α decreases. As we saw in Exercise 15, cot α = GH, which
will decrease since H will move towards point G. ♦
Exercise 17. The graphs of sin x and cos x on [0, π], of tan x on (−π/2, π/2) ∪
(π/2, 3π/2), and of cot x on (0, π) ∪ (π, 2π) are sketched below. To justify the
dashed parts of the graphs, the functions need to be defined on the corresponding
intervals; for these definitions, use the unit circle for sin x and
cos x, and use the tangent and cotangent lines l and m for tan x and cot x. ♦
[Figure: graphs of sin x, cos x, tan x, and cot x, with radian measure along the x-axis]
Zvezdelina Stankova
Although Sections 3–7 are a must for everyone, Sections 8–9 are rather
non-trivial: the applications of complex numbers to geometry will require
sophistication and determination from the reader to follow through the ar-
guments and absorb all the ideas.
A prime example of this is Problem 7, which
will come up in Section 10. Informally, if
$A_0A_1\ldots A_{n-1}$ is a regular polygon, which line
l in the plane is the “closest” to its vertices?
More precisely, if we take the distances from each
vertex $A_i$ to l (denoted by dashed segments in
Fig. 1), square them, and add them, which line
l will yield the minimal such sum?
Figure 1. “Closest” line?
After skimming through Sections 3–7, the
novice reader may decide to wait for Part IV (in
a future volume), devoted entirely to solving Olympiad-style problems via
complex numbers. Part III would then be an option for the intermediate
reader. The advanced reader, on the other hand, is encouraged to “stick with
it” and try Problem 7 on his/her own while we diligently move towards its
solution at the end of the present Part II.
190 8. ROOTS OF UNITY IN GEOMETRY
3. Complex Division
We saw in Part I how to add, subtract, and multiply two complex numbers
z and w. In R, we can also divide two numbers x and y, as long as
y ≠ 0. Can we do this in C too? In other words, can we rewrite the ratio
z/w as some complex number q such that qw = z?
Figure 2. Multiplying by the conjugate
(a) $\frac{1}{5-4i}$; (b) $\frac{2+3i}{5-4i}$; (c) $\frac{7-v}{7+v}$ if $v = 2+3i$; (d) $\frac{a+bi}{c+di}$ for a, b, c, d ∈ R.
Solution for part (a): The desired fraction is nothing but the recip-
rocal of w = 5 − 4i. Applying (2) with z = 1 we obtain a formula for w−1 :
(3) $\frac{1}{w} = \frac{\overline{w}}{|w|^2} = \frac{5+4i}{25+16} = \frac{5}{41} + \frac{4}{41}\,i.$
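The conjugate trick behind formula (3) can be sketched in exact arithmetic. The function name `divide` and the representation of a + bi as an integer pair `(a, b)` are assumptions made for this illustration, not notation from the text.

```python
from fractions import Fraction

def divide(z, w):
    """Divide z = (a, b) by w = (c, d), read as a+bi and c+di, via z*conj(w)/|w|^2."""
    a, b = z
    c, d = w
    norm = c * c + d * d                    # |w|^2, positive whenever w != 0
    if norm == 0:
        raise ZeroDivisionError("w must be non-zero")
    real = Fraction(a * c + b * d, norm)    # real part of z * conj(w), over |w|^2
    imag = Fraction(b * c - a * d, norm)    # imaginary part of z * conj(w), over |w|^2
    return real, imag

print(divide((1, 0), (5, -4)))   # reciprocal of 5 - 4i, as in part (a)
print(divide((2, 3), (5, -4)))   # part (b)
```

The first call reproduces the pair (5/41, 4/41) from formula (3). Exact fractions are used so that the result can be compared with a hand computation without rounding noise.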
Whoever diligently completes part (d) will frown at the resulting com-
plicated formula for the ratio z/w: this happened because the question was
phrased in Cartesian coordinates. Can we interpret division in C via polar
coordinates in a more easily remembered and natural way? Prove the fol-
lowing corollary in two ways: algebraically and geometrically, and convince
yourselves that it makes sense for any non-zero denominator w.
Corollary 1. If z = (|z|, θ) and w = (|w|, μ), then
$$\frac{z}{w} = \Big(\frac{|z|}{|w|},\ \theta - \mu\Big), \quad\text{and}\quad \frac{1}{w} = \Big(\frac{1}{|w|},\ -\mu\Big).$$
In other words, division in C divides the moduli and subtracts the arguments
of the numerator and denominator.
3.2. Division in C is respected. Just like in Part I, it is time to under-
stand how the operations of modulus and conjugation interact with complex
division. That they respect division should come as no surprise. To see this,
do the following exercise in two ways, using Cartesian or polar forms.
Exercise 3. For z, w ∈ C with w ≠ 0, prove that $\left|\frac{z}{w}\right| = \frac{|z|}{|w|}$ and $\overline{\left(\frac{z}{w}\right)} = \frac{\overline{z}}{\overline{w}}$.
4.1. Modulus and addition. We have seen that conjugation respects all
four standard operations on C: addition, subtraction, multiplication, and
division.1 We have also observed that the modulus preserves multiplication
and division. How about modulus and addition? Let’s experiment.
Exercise 4. For z = 3 + 4i and w given below, compare the modulus of
their sum with the sum of their moduli. Which is larger, |z + w| or |z| + |w|?
How about |z − w| and |z| − |w|? Are they ever equal?
(a) w = 5 − 12i; (b) w = 6 + 8i; (c) $w = 1 + \frac{4}{3}i$.
We have finally come to operations in C that are not respected: the
modulus does not respect addition or subtraction on C. Instead, we have:
All right, but how does one prove this geometric version of the Triangle
Inequality? Check out Geometry II.
5. Integer Powers in C
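Solution to Problem 1: Factor $|z| = \sqrt{3+1} = 2$ from $z = \sqrt{3} + i$:
$$z = 2\Big(\tfrac{\sqrt{3}}{2} + \tfrac{1}{2}i\Big) = 2\big(\cos\tfrac{\pi}{6} + i\sin\tfrac{\pi}{6}\big).$$
de Moivre’s Theorem then yields:
$$z^{2004} = 2^{2004}\big(\cos\tfrac{2004}{6}\pi + i\sin\tfrac{2004}{6}\pi\big) = 2^{2004}(\cos 334\pi + i\sin 334\pi) = 2^{2004}.$$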
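Floating-point arithmetic cannot hold a number as large as the 2004th power directly with any useful precision, but the mechanism of the solution can be checked on a smaller power. The sketch below assumes z = √3 + i as in the solution and uses the observation that 2004 is a multiple of 12, so the 2004th power is a power of the 12th power.

```python
import cmath
import math

z = complex(math.sqrt(3), 1)
modulus, theta = cmath.polar(z)   # polar form (|z|, theta)

# The argument is pi/6, so twelve steps make one full turn: z**12 is real and positive.
z12 = z ** 12

print(modulus, theta, math.pi / 6)
print(z12)          # numerically close to 2**12 = 4096
print(2004 % 12)    # 0, so z**2004 = (z**12)**167 = (2**12)**167 = 2**2004
```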
Problem 2. Find the smallest positive integers m and n satisfying the
equality $(\sqrt{3} + i)^m = (1 - i)^n$.
PST 64. If you want z = w, write in polar form z = (|z|, θ) and w = (|w|, μ)
and equate the moduli and arguments: |z| = |w| and θ ≡ μ (mod 2π).
Recall from Number Theory I (vol. I) that “mod 2π” simply means that
θ and μ differ by a multiple of 2π. So, we apply PST 64 to $z^m$ and $w^n$. For
starters, the moduli must be equal, i.e., $2^m = (\sqrt{2})^n$, and hence n = 2m.
Excellent: n must be even, which simplifies (6) to
(7) $(1 - i)^n = w^{2m} = 2^m\big(\cos(-\tfrac{\pi}{2}m) + i\sin(-\tfrac{\pi}{2}m)\big)$.
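The two conditions of PST 64 (equal moduli, arguments equal mod 2π) lend themselves to a small search. The sketch below assumes z = √3 + i with argument π/6 and w = 1 − i with argument −π/4, and measures all angles in integer multiples of π/12 so that the comparison is exact. The function name `smallest_pair` is hypothetical.

```python
import math

def smallest_pair(limit=50):
    """Search for the smallest m (with n = 2m forced by the moduli) matching arguments."""
    for m in range(1, limit + 1):
        n = 2 * m                       # from 2**m == sqrt(2)**n
        # Arguments in units of pi/12: m*(pi/6) -> 2m, and n*(-pi/4) -> -3n.
        # They agree mod 2*pi exactly when 2m - (-3n) is a multiple of 24.
        if (2 * m + 3 * n) % 24 == 0:
            return m, n
    return None

m, n = smallest_pair()
print(m, n)
# Floating-point cross-check of the two powers.
print(complex(math.sqrt(3), 1) ** m, complex(1, -1) ** n)
```

Since n = 2m is forced, minimizing m also minimizes n, which is why a single loop over m suffices in this sketch.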
5.3. Landing on the axes. In the next problem, we seek out all z ∈ C
such that the fourth white dot on their power curve Pz lands on the real or
the imaginary axes.
Problem 3. If z = a + bi ∈ C, find relations between a and b such that
(a) $z^4$ is real; (b) $z^4$ is purely imaginary.
Hint: The problem is again stated in the “wrong” coordinates. Instead,
write z = (r, θ) in polar form and use de Moivre’s formula: $z^4 = (r^4, 4\theta)$.
Note that the modulus of z is irrelevant in our question, since landing
on a specific axis is determined entirely by the angle θ. For example, in
part (b), in order for $z^4$ to be purely imaginary, 4θ must “line up” with
the positive or the negative imaginary axis, which yields two possibilities:
4θ ≡ ±π/2 (mod 2π). These two possibilities are contained in the single
congruence relation 4θ ≡ π/2 (mod π) (why?). ♦
Exercise 5. In Figure 4, which powers of $z = \sqrt{3} + i$ will extend the spiral
$P_z$ from the initial dot z toward the origin O?
Solution: The intended extension of $P_z$ is depicted by a dotted curve in
Figure 4. The black dots on this curve are the negative integer powers of z:
$z^{-1}, z^{-2}, z^{-3}$, etc. Indeed, for any integer n > 0,
$$z^{-n} = \Big(\frac{1}{|z|^n},\ -n\theta\Big) = \Big(\frac{1}{2^n},\ -n\theta\Big), \quad\text{where } \theta = \frac{\pi}{6}.$$
Thus, the modulus $1/2^n$ becomes smaller and approaches 0 as n increases,
thereby pulling the complex numbers $z^{-n}$ toward the origin. Further, the
angle −nθ rotates z clockwise, making a spiral revolving toward O.
We can informally summarize this section as follows. For all non-zero
z ∈ C the integer powers $z^n$ comprise the “skeleton” for the power curve $P_z$,
to give the impression that Pz “starts” at the origin and spirals away forever,
or that it is the unit circle for |z| = 1.
6. Roots in C
Figure 5. 5th Roots of $z = \sqrt{3} + i$ and Regular pentagon in C
For the moduli we must have $r^5 = |z|$, which accounts for the term $r = \sqrt[5]{|z|}$
in the Root Formula. Further, equating arguments of $w^5$ and z, we
obtain 5μ ≡ θ (mod 2π), i.e., 5μ = θ + 2kπ for some integer k. Therefore
$\mu = \frac{\theta}{5} + \frac{2k}{5}\pi$. Although, in principle, we can plug in any integer k, only five
distinct sums μ will be formed up to a multiple of 2π (why?):
$$\mu = \tfrac{\theta}{5},\quad \tfrac{\theta}{5} + \tfrac{2}{5}\pi,\quad \tfrac{\theta}{5} + \tfrac{4}{5}\pi,\quad \tfrac{\theta}{5} + \tfrac{6}{5}\pi,\quad \tfrac{\theta}{5} + \tfrac{8}{5}\pi \quad (\text{for } k = 0, 1, 2, 3, 4).$$
For instance, k = 2007 will land us on the third possibility:
$$\tfrac{\theta}{5} + \tfrac{2\cdot 2007}{5}\pi = \tfrac{\theta}{5} + \big(802 + \tfrac{4}{5}\big)\pi \equiv \tfrac{\theta}{5} + \tfrac{4}{5}\pi \pmod{2\pi}.$$
This explains why there are exactly five roots $\sqrt[5]{z}$, given by k = 0, 1, 2, 3, 4.
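The counting argument above can be tried out numerically. The sketch below is an illustration under the assumption z = √3 + i; the helper name `nth_roots` is hypothetical, and the angles (θ + 2kπ)/n are the ones from the discussion.

```python
import cmath
import math

def nth_roots(z, n):
    """All n-th roots of a non-zero complex z, one for each k = 0, 1, ..., n-1."""
    r, theta = cmath.polar(z)
    return [cmath.rect(r ** (1 / n), (theta + 2 * math.pi * k) / n) for k in range(n)]

z = complex(math.sqrt(3), 1)
roots = nth_roots(z, 5)
for w in roots:
    print(w, w ** 5)   # each fifth power should reproduce z up to rounding

# Any other k repeats one of the five roots: k = 2007 behaves like k = 2.
print(2007 % 5)
```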
Now you should repeat this whole reasoning for a general n ∈ N, in the
place of 5. To really understand the Root Formula, do the following:
Exercise 6. Let $z = (4, \frac{2}{3}\pi)$.
(a) Use polar coordinates to show that z has exactly two square roots $\sqrt{z}$.
First reason geometrically, and then use the Root Formula. Draw a
picture.
(b) Repeat the exercise for the cube roots $\sqrt[3]{z}$, showing z has exactly three
cube roots.
circle centered at O (drawn dashed in Fig. 5a). Hence only one of the $w_k$’s
should land on the spiral $P_z$. Figure 5a seems to indicate that for $z = \sqrt{3} + i$,
it is
$$w_0 = \sqrt[5]{|z|}\,\big(\cos\tfrac{\theta}{5} + i\sin\tfrac{\theta}{5}\big)$$
that lands on $P_z$. But is this true for any complex z? Besides, the angle
θ in the polar form for z was arbitrarily chosen up to a multiple of 2π. If
we change θ to θ + 2π in the expression for $w_0$, we will end up with the
formula for
$$w_1 = \sqrt[5]{|z|}\,\big(\cos\tfrac{\theta + 2\pi}{5} + i\sin\tfrac{\theta + 2\pi}{5}\big).$$
So, should w1 also lie on Pz ? What is going on? We can clearly see that
only one of the roots wi can land on Pz . . . .
The answer is hiding where we are not looking for it: we haven’t really
defined the power curve Pz , other than saying it is a “smooth curve passing
through all integer powers of z”. But maybe there are several such smooth
curves, one of which passes through w0 , and another through w1 ?! We shall
resolve this question in Part III and extend the discussion to any powers $z^v$,
whether v is real or complex.
For the time being, check your understanding by solving the following:
Exercise 7. Consider the equations $w^6 = z$ for z = 1, −64, i and 64i.
(a) Find all complex solutions w of these four equations, and draw pictures.
(b) In each case, can you visually select “the one” solution w which lies on
the corresponding power curve Pz ? How are you sure that your choice
is correct?
other roots of unity: $\omega^k = \omega_k$ for all k, and hence the name primitive root.
We conclude that the original polynomial $z^n - 1$ factors as:
(8) $(z - \omega_0)(z - \omega_1)\cdots(z - \omega_{n-1}) = (z - 1)(z - \omega)(z - \omega^2)\cdots(z - \omega^{n-1})$.
Exercise 8. Verify that $\overline{\omega} = \omega^{-1} = \omega_{n-1}$. Conclude that the other roots of
unity also pair up under conjugation: $\overline{\omega_k} = \omega_{n-k}$ for k = 0, 1, . . . , n − 1 (with $\omega_n = \omega_0 = 1$).
7.2. Choosing the best coordinate system. The roots of unity are not
only algebraic objects – roots of a polynomial – they are also geometric ob-
jects; the vertices of a regular n-gon inscribed in the unit circle (cf. Fig. 5b).
Can we use this to our advantage in geometry problems? Let us start with
a relatively straightforward situation.
Exercise 9. Let A0 A1 A2 A3 A4 be a regular pentagon. Find a C–coordinate
system in which the five vertices $A_k$ are easily encoded as complex numbers.
8.2. Getting extra mileage. The reader should be at least a bit curious
about the need for factoring out and cancelling (z − 1) on both sides of (11):
this was caused exclusively by our determination to plug in z = 1. What if
we plug any other complex number z into (11): as long as $z \ne \omega^k$ (for any
integer k), we will get a non-trivial equality. The question is: which of these
equalities will correspond to an elegant geometric formula?
Problem 5. Let $A_0A_1\ldots A_{n-1}$ be a regular polygon inscribed in a circle of
radius r and center O, and let P be a point on ray $\overrightarrow{OA_0}$ beyond $A_0$. Prove that
the product of distances from P to the vertices of the polygon is $|OP|^n - r^n$.
Proof: A nonagon version of this problem is presented in Figure 7b. Again,
we fix the origin O at the center of the polygon, and let A0 lie on the positive
real axis. Because of the given radius r, we slightly adjust by making $A_0 = r$,
and hence $A_k = r\omega^k$ for k = 0, 1, . . . , n − 1. Since P also lies on the positive
real axis, it is advantageous to write P in a similar way: p = rq for some
real q > 0. The desired product is calculated by:
$$\prod_{k=0}^{n-1}|PA_k| = \prod_{k=0}^{n-1}|rq - r\omega^k| = r^n\prod_{k=0}^{n-1}|q - \omega^k| = r^n\,|(q-1)(q-\omega)(q-\omega^2)\cdots(q-\omega^{n-1})| = r^n|q^n - 1|.$$
The last equality was obtained from (8) for z = q. We can put r back inside
the modulus: $r^n|q^n - 1| = |(rq)^n - r^n| = |p^n - r^n|$. Now, if P were an arbitrary
point, we would stop here since there would be nothing to simplify. But P
lies on the ray $\overrightarrow{OA_0}$, outside of the circle. Hence, p ∈ R and p > r. Thus,
$p^n - r^n$ is also a positive real number, which therefore equals its modulus:
$$\prod_{k=0}^{n-1}|PA_k| = |p^n - r^n| = p^n - r^n = |OP|^n - r^n.$$
Along the way we established that for an arbitrary point P, the corresponding
product (illustrated in Fig. 7c) is given by $\prod_{k=0}^{n-1}|PA_k| = |p^n - r^n|$.
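The product formula is easy to test numerically. The sketch below assumes the coordinate choice of the proof (center at the origin, vertices at r times the nth roots of unity); the function name `product_of_distances` and the sample values are hypothetical.

```python
import cmath
import math

def product_of_distances(p, n, r=1.0):
    """Product of the distances from the point p to the vertices r * w**k of a regular n-gon."""
    prod = 1.0
    for k in range(n):
        vertex = r * cmath.exp(2j * math.pi * k / n)   # k-th vertex A_k
        prod *= abs(p - vertex)
    return prod

# Nonagon of radius 2 with P on the positive real axis beyond A_0.
print(product_of_distances(3, 9, r=2.0), 3 ** 9 - 2 ** 9)

# Arbitrary point in the plane: compare with |p**n - r**n|.
p = 1 + 2j
print(product_of_distances(p, 9, r=2.0), abs(p ** 9 - 2.0 ** 9))
```

In each printed pair the two numbers should agree up to rounding.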
9.1. Sums versus products. Now, why should the product of the above
segment lengths be any more interesting than, say, their sum? If you try
to calculate |P A0 | + |P A1 | + · · · + |P An−1 |, you will find out that, due to
convoluted square roots, this sum is harder to control and simplify than the
product. For some people the more obvious and more important question
would be to investigate the sum of squares, $|PA_0|^2 + |PA_1|^2 + \cdots + |PA_{n-1}|^2$;
for instance, such people
• may have studied a bit of statistics and are therefore always tempted
to minimize sums of squares of distances;
• are geometry fans of Pythagorean-like problems and would like to gen-
eralize the Pythagorean Theorem;
• have understood Part I of complex numbers well enough to realize that
the modulus |z| is much harder to manipulate since it involves a square
root, while the square $|z|^2 = z\overline{z} = a^2 + b^2$ is susceptible to more than
one method of slick calculation.
$\sum_{k=0}^{n-1}\omega^k = 0$. Geometrically,
the center of a regular n-gon inscribed in the unit circle is the origin.
4
Algebraic manipulations of complex numbers are required for this solution.
9. VENTURING EVERYWHERE IN THE PLANE 203
We present four different proofs of this fact, each proof using a different
PST. Even though four proofs are an “over-overkill” for the task at hand,
one never knows which idea will end up being useful in a later situation.
Proof 1 (Equating): By (8), $z^n - 1 = (z - \omega_0)(z - \omega_1)\cdots(z - \omega_{n-1})$.
This is not just one equation, but several. Indeed, if we multiply out the
RHS and regroup around the powers of z we will obtain
$$z^n - 1 = z^n - (\omega_0 + \omega_1 + \cdots + \omega_{n-1})z^{n-1} + \cdots + (-1)^n(\omega_0\omega_1\cdots\omega_{n-1}).$$
We can equate the coefficients5 on both sides for any power of z. But there
is no $z^{n-1}$ term on the LHS! Therefore, equating its coefficients on both
sides yields $0 = \omega_0 + \omega_1 + \cdots + \omega_{n-1}$.
Proof 2 (Series): Why not use Lemma 1? Substituting the nth primitive
root of unity ω for z, we arrive at
$$1 + \omega + \omega^2 + \cdots + \omega^{n-1} = \frac{\omega^n - 1}{\omega - 1} = \frac{1 - 1}{\omega - 1} = 0.$$
Along the way, we realized that Lemma 1 is equivalent to the well-known
and useful formula for a geometric series, which made its appearance in the
Stomp session in volume I:
(15) $a + az + az^2 + \cdots + az^{n-1} = a\,\frac{z^n - 1}{z - 1}$ for any a, z ∈ C, z ≠ 1.
Proof 3 (Invariants): If S = ω0 + ω1 + · · · + ωn−1 , multiplying each
vertex ωk by the primitive root of unity ω simply rotates ωk to the next vertex
ωk+1 (where ωn = ω0 = 1). Overall, the set of vertices remains the same:
{ωω0 , ωω1 , · · · , ωωn−1 } = {ω1 , ω2 , · · · , ωn−1 , ω0 } = {ω0 , ω1 , · · · , ωn−1 }, and
thus the sums in these two sets are equal:
S = ω0 + ω1 + · · · + ωn−1 = ωω0 + ωω1 + · · · + ωωn−1 = ωS
⇒ S − ωS = 0 ⇒ S(1 − ω) = 0 ⇒ S = 0.
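The vanishing of the sum, and the rotation argument of Proof 3, can be observed numerically. The sketch below is an illustration; the helper name `roots_of_unity` is hypothetical, and n ≥ 2 is assumed (for n = 1 the only root is 1 and the sum is 1).

```python
import cmath
import math

def roots_of_unity(n):
    """The n-th roots of unity w_0, w_1, ..., w_{n-1} as complex numbers."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

for n in range(2, 8):
    ws = roots_of_unity(n)
    s = sum(ws)
    rotated = sum(ws[1] * w for w in ws)   # multiply every root by the primitive root w_1
    # Both sums should be zero up to rounding: rotating the set of vertices leaves it unchanged.
    print(n, abs(s), abs(rotated))
```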
5
Equating these coefficients would yield n relations between the roots and the coefficients
of the given polynomial, which are a special case of Viète’s formulas. For instance,
equating the free terms yields $-1 = (-1)^n(\omega_0\omega_1\cdots\omega_{n-1})$, i.e., $\omega_0\omega_1\cdots\omega_{n-1} = (-1)^{n-1}$.
$$\sum_{k=0}^{n-1}|PA_k|^2 = 2nr^2 - r^2 q\cdot 0 - r^2\overline{q}\cdot 0 = 2nr^2,$$
which certainly does not depend on the specific P and is thus constant.
9.3. Letting P wander off in the plane. Naturally, we should question
the necessity of placing P on the circumcircle of the n-gon, and we should
attempt to generalize Problem 6 to any point P in the plane.
Exercise 12. Given a regular n-gon $A_0A_1\ldots A_{n-1}$, calculate the sum
$$|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2$$
and determine for which P it is minimal.
Sketch: The proof of Problem 6 goes through here with only one small
change. We can write again P = p = rq where r is the circumradius, but
|q| is no longer required to be 1. We adjust the calculation accordingly:
$q\overline{q} = |q|^2$ and $\omega\overline{\omega} = 1$, so that
$$\sum_{k=0}^{n-1}|PA_k|^2 = \sum_{k=0}^{n-1} r^2\big(q\overline{q} + (\omega\overline{\omega})^k - q\overline{\omega}^k - \overline{q}\omega^k\big) = r^2|q|^2 n + r^2 n - r^2 q\cdot 0 - r^2\overline{q}\cdot 0 = n(|rq|^2 + r^2) = n(|p|^2 + r^2).$$
We conclude that for any circle K centered at O of radius |p|, the given sum
depends only on |p| and is therefore constant along K. Figure 8b displays
four examples of such circles K, along each of which the sum of squares
remains constant. As the circle K shrinks, the sum also decreases, and its
minimal value of $nr^2$ is obtained when P coincides with O:
$$|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2 \ge nr^2, \text{ with equality iff } P = O.$$
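The closed form n(|p|² + r²) for the sum of squared distances can be checked on sample data. The sketch below assumes the same coordinates as the sketch of the proof (center O at the origin, vertices at r times the nth roots of unity, n ≥ 2); the function name and the sample values are hypothetical.

```python
import cmath
import math

def sum_of_squares(p, n, r=1.0):
    """Sum of |P A_k|^2 over the vertices A_k = r * w**k of a regular n-gon centered at 0."""
    return sum(abs(p - r * cmath.exp(2j * math.pi * k / n)) ** 2 for k in range(n))

p, n, r = 1 + 2j, 7, 3.0
print(sum_of_squares(p, n, r), n * (abs(p) ** 2 + r ** 2))   # should agree up to rounding
print(sum_of_squares(0, n, r), n * r ** 2)                   # the minimum, attained at P = O
```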
PST 66. (Partial Viète’s Formulas) Given an equality of polynomials
f(z) = g(z), equating the coefficients on both sides for each power $z^k$ yields
PST 67. (Geometric Series) When calculating a sum S, try to identify it
with some well-known type of sum. In particular, if each term is the previous
term times the same number z ≠ 1, we can use formula (15) for the so-called
geometric series with ratio z and initial term a.
Figure 9d depicts just that: the imaginary axis is rotated and shifted to
coincide with line l. Meanwhile, the pentagon is also rotated and shifted
without changing its relative position to l. The vertices of the pentagon may
no longer be the 5th roots of unity.
Lemma 4. $\sum_{k=0}^{n-1}\omega_k^2 = 0$ and, more generally, $\sum_{k=0}^{n-1}\omega_k^m = 0$ for any integer m not divisible by n.
Hint: The necessity for the geometric series approach should be evident
here: write out the sums and decide what your ratio and initial term are. ♦
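Plugging the result of Lemma 4 into (16) yields the desired sum: $S = \frac{1}{2}n$.
(Why did the sum of conjugates $\overline{\omega_k}^2$ in (16) disappear too?)
The only difference from the previous case is the factors $u^2$ and $\overline{u}^2$, which are
“stuck” in front of the squares $\omega_k^2$ and $\overline{\omega_k}^2$. However, since $u^2$ and $\overline{u}^2$ are
constants, after summing up everything over k, they will factor in front of
the sums $\sum\omega_k^2 = 0$ and $\sum\overline{\omega_k}^2 = 0$, yielding:
$$S = \sum_{k=0}^{n-1} d_k^2 = \tfrac{1}{2}n + \tfrac{1}{4}\Big(u^2\sum\omega_k^2 + \overline{u}^2\sum\overline{\omega_k}^2\Big) = \tfrac{1}{2}n + \tfrac{1}{4}\big(u^2\cdot 0 + \overline{u}^2\cdot 0\big) = \tfrac{1}{2}n.$$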
We conclude that as long as the line l passes through the center of the
polygon, the angle of rotation θ about the origin will not matter and the
sum will remain the same.
10.6. The “closest lines”: conclusions and a look ahead. The actual
final answer should be adjusted to reflect the fact that the circumradius of
our polygon may be some (real) r ≠ 1: if $A_k = ru\omega_k + t$, then
$$S = \big(\tfrac{1}{2}r^2 + t^2\big)n.$$
Note that the real number t measures the distance from the line l to the center
of the polygon. Thus, the sum is minimal when t = 0, i.e., the “closest” lines
l pass through the center O and yield a minimal sum $S = \frac{1}{2}r^2 n$.
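The final formula can be probed numerically. The sketch below makes the following assumptions: the line l is the imaginary axis (so the distance from a point to l is the absolute value of its real part), the vertices are r·u·ω_k + t with |u| = 1 and t real, and n ≥ 3 so that Lemma 4 applies. The function name and sample values are hypothetical.

```python
import cmath
import math

def sum_squared_distances(n, r, angle, t):
    """Sum of squared distances from the vertices r*u*w_k + t to the imaginary axis."""
    u = cmath.exp(1j * angle)                    # unit complex number: rotation by `angle`
    total = 0.0
    for k in range(n):
        w_k = cmath.exp(2j * math.pi * k / n)    # k-th root of unity
        a_k = r * u * w_k + t                    # rotated, scaled, and shifted vertex
        total += a_k.real ** 2                   # distance to the imaginary axis is |Re a_k|
    return total

n, r, t = 5, 2.0, 1.5
for angle in (0.0, 0.7, 2.1):
    # The rotation angle should not matter; compare with (r**2 / 2 + t**2) * n.
    print(sum_squared_distances(n, r, angle, t), (0.5 * r ** 2 + t ** 2) * n)
```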
It is curious that the answers for this Problem 7 and for the previous
Problem 6 are in some sense identical: the minimal sums are obtained when
the line l and the point P are incident with O: l passes through O or
P = O. Is there a deeper reason, beyond our calculations, for such a
“coincidence” of answers? To nudge you in one possible direction, here is
a related problem in 3-dimensions. It is suggested by one of the giants of
contemporary mathematics, the Russian Vladimir Arnol’d, in his Trivium
Mathematique [7], a collection of 100 problems that he expects every well
educated mathematician should be able to solve.
Problem 8. Given a cube, let l be a line through its center. Consider the
sum of the squares of distances from each of its vertices to l. For which
such line l is this distance minimal? How about replacing the cube by other
Platonic solids? 6
For further discussion of C and more similar examples check out the
books by Hahn, Needham, Schwerdtfeger, and Yaglom [35],[60],[72],[86],
and look for the next two sessions on complex numbers. Part III will round
up the theoretical discussion of C by applying the Fundamental Theorem
of Algebra to real polynomials, while Part IV will apply C-techniques to
solving, as promised, Olympiad-type geometry problems from around the
world.
it remains only to show that this candidate behaves as a true ratio should.
We multiply it by w in hopes of getting z:
$$\Big(\frac{|z|}{|w|},\ \theta - \mu\Big)\cdot(|w|, \mu) = \Big(\frac{|z|}{|w|}\cdot|w|,\ (\theta - \mu) + \mu\Big) = (|z|, \theta) = z.$$
Along the way, we used (1) from Section 2 in order to multiply in polar form.
We conclude that $\big(\frac{|z|}{|w|},\ \theta - \mu\big)$ indeed equals the ratio z/w.
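Exercise 3. The identities follow directly from Corollary 1 in polar form; e.g.,
$$\overline{\Big(\frac{z}{w}\Big)} = \overline{\Big(\frac{|z|}{|w|},\ \theta - \mu\Big)} = \Big(\frac{|z|}{|w|},\ -\theta + \mu\Big) = \Big(\frac{|z|}{|w|},\ -\theta - (-\mu)\Big) = \frac{(|z|, -\theta)}{(|w|, -\mu)} = \frac{\overline{z}}{\overline{w}}.$$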
Problem 3(b). $\theta = \frac{\pi}{8} + \frac{k\pi}{4}$, and hence $\frac{b}{a} = \tan\big(\frac{\pi}{8} + \frac{k\pi}{4}\big)$ for 0 ≤ k ≤ 7. ♦
Exercise 6(a). If $w = (4, \frac{2\pi}{3})$ then $\sqrt{w} = \Big(\sqrt{4},\ \frac{\frac{2\pi}{3} + 2k\pi}{2}\Big) = (2, \frac{\pi}{3} + k\pi)$ for
k = 0, 1. Check that these two solutions work and draw relevant pictures. ♦
Exercise 11. For the first part, assuming that the polygon is regular, we
can simply plug the formulas $A_k = v\omega^k + z$ into the given fractions and verify
that we’ll get ω:
$$\frac{A_{k+2} - A_{k+1}}{A_{k+1} - A_k} = \frac{(v\omega^{k+2} + z) - (v\omega^{k+1} + z)}{(v\omega^{k+1} + z) - (v\omega^k + z)} = \frac{\omega^{k+1}(\omega - 1)}{\omega^k(\omega - 1)} = \omega.$$
Figure 10. Building a Regular Polygon from Equations (17)
Sneak Preview. When your teacher calculates the average of your exam scores,
she usually adds up all scores and divides by the number of exams. But what
if instead she multiplies the n scores and takes the nth root of that, or adds up
the squares of the scores, divides by n, and takes the square root of that? If all
exam scores are equal, these three ways will yield the same average; but if even
two exam scores differ, the results will all be different. Which method yields the
highest average? And what if your teacher weights the exams unequally?
This session will answer these and more questions from the realm of inequalities.
Some problems will invoke geometry, combinatorics, or calculus, and can be
skimmed on a first reading. Part II will tackle other fundamental inequalities.
What does a statement like this mean? It seems it can mean only one
thing: that “the squares of real numbers are non-negative.” Actually, it is
saying three things. The main part of the statement is that
(1) if t is a real number, then $t^2 \ge 0$.
But the last phrase “with equality if and only if t = 0” adds two more things:
(2) if t = 0, then $t^2 = 0$; and
(3) if $t^2 = 0$, then t = 0. (Thus if t ≠ 0, then $t^2 > 0$.)
This simple example shows that there is an interplay between the language
of equalities and the language of inequalities, and that often statements of
inequalities may be saying “more” than what can be seen on the surface. As
we go through this session, we will introduce further terminology related to
inequalities and pay close attention to the specific language used so as to
interpret and use it correctly.
212 9. INEQUALITIES I
2.1. Gardening With Baby AM-GM. The most basic arithmetic mean–
geometric mean inequality involves only two variables:
Lemma 1. (Baby AM-GM) If x and y are non-negative real numbers,
then
$\frac{x+y}{2} \ge \sqrt{xy}$, with equality if and only if x = y.
x and y make the diameter AB of a circle and the length of the (dashed)
perpendicular CD to that diameter turns out to be the geometric mean $\sqrt{xy}$.
The baby AM-GM inequality itself can be visualized using the shaded right
△OCD, where the hypotenuse is “AM” (also equal to the radius), while the
(dashed) leg is “GM”. The proofs of these facts can be found in the plane
geometry interlude at the end of the session.
Hint: Using some algebra instead, deduce the baby AM-GM from the
inequality in the opening quotation on page 211 by letting $t = \sqrt{x} - \sqrt{y}$. For
x ≠ y, explain why $\frac{x+y}{2} > \sqrt{xy}$, an example of a strict inequality. ♦
Problem 1. A rectangular garden is to be constructed using 20 meters of
fence for three of the sides, and using an existing long wall for the fourth
side. What is the maximum possible area that can be enclosed?
PST 69. Knowing if and when an inequality becomes an equality is usually
the key to finding extreme values, e.g., baby AM-GM roughly implies that
x = y (> 0) makes the sum x + y minimal and the product xy maximal.
2. ARITHMETIC MEAN – GEOMETRIC MEAN INEQUALITY 213
Solution: Let x be the length of the side along the wall (cf. Fig. 2), and
let y be the length of each side adjacent to this side (in meters). We must
find when the area A = xy is maximal for positive x and y such that the
fence length x + 2y is at most 20. In formal language, we must maximize xy
subject to the constraints x, y > 0 and x + 2y ≤ 20.
Figure 2. Two gardens along a wall
Since we see the sum x + 2y, we apply AM-GM to x and 2y (both > 0):
(1) $\sqrt{x\cdot 2y} \overset{\text{AM-GM}}{\le} \frac{x + 2y}{2} \le \frac{20}{2} = 10.$
Squaring both sides of $\sqrt{2xy} \le 10$ and dividing by 2 yields xy ≤ 50.
To finish the problem, we must show that xy = 50 is possible (if not, the
maximum area would be < 50). To obtain xy = 50 in the end, we must have
an equality at each step of (1). This happens when x = 2y (by the equality
criterion in AM-GM) and x + 2y = 20. Solving this system yields x = 10
and y = 5, so these are the only allowable values of x and y that might make
the equality xy = 50 hold, and they do.
We conclude that the maximum possible area is 50 square meters, and
this is attained if and only if the rectangle is 10 meters by 5 meters, with
the long side against the wall.
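A brute-force search offers an independent check of this answer. The sketch below is an illustration only: it measures in whole centimeters to keep the arithmetic exact, uses the full 20 meters of fence (x + 2y = 20), and the function name `best_garden` is hypothetical.

```python
def best_garden(fence_cm=2000):
    """Try every whole-centimeter side y and return (x, y) in cm with x + 2y = fence_cm."""
    # For each candidate y, the side along the wall is x = fence_cm - 2*y.
    best_y = max(range(1, fence_cm // 2), key=lambda y: (fence_cm - 2 * y) * y)
    return fence_cm - 2 * best_y, best_y

x_cm, y_cm = best_garden()
print(x_cm / 100, y_cm / 100, x_cm * y_cm / 10_000)   # sides in meters, area in square meters
```

The search only samples whole centimeters, so it supports the AM-GM argument rather than replacing it.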
2.4. More variables, more challenge for baby AM-GM! The garden
problem can also be solved using calculus, which gives another approach to
many inequality problems. Instead, we did it in detail with the baby AM-
GM because the same reasoning can be used in more complicated problems.
Try the two exercises below: despite their “multivariable” appearance, clever
applications of baby AM-GM for two variables at a time are all you need!
Exercise 3. Prove that for any a, b, c > 0,
(a + b)(b + c)(c + a) ≥ 8abc,
and determine when equality holds.
Exercise 4. Prove that $n! < \left(\frac{n+1}{2}\right)^n$ for all integers n > 1.
Hints: In both exercises, baby AM-GM applies nicely to pairs of numbers.
In the former exercise there isn’t much of a choice for the pairs, whereas
in the latter exercise you have to be careful to pair up the “right” numbers
according to their sum. ♦
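Before attempting a proof of Exercise 4, one can test the inequality for many values of n. The sketch below clears the denominator, comparing n!·2ⁿ with (n+1)ⁿ in exact integer arithmetic; the function name `holds` is hypothetical. A finite check is of course not a proof.

```python
import math

def holds(n):
    """True when n! < ((n+1)/2)**n, tested as n! * 2**n < (n+1)**n with integers."""
    return math.factorial(n) * 2 ** n < (n + 1) ** n

print([n for n in range(1, 30) if not holds(n)])   # values of n where the strict inequality fails
```

For n = 1 both sides equal 1, which is why the exercise asks for n > 1.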
2.5. Need more strength. Some problems with more variables cannot
be conquered by a repetitive application of baby AM-GM. And hence, we
formulate the general version of AM-GM for any number of variables:
Theorem 1. (AM-GM) If x1 , x2 , . . . , xn ≥ 0, then
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1x_2\cdots x_n}$$
with equality if and only if x1 = x2 = · · · = xn .
Theorem 1 and other fundamental inequalities of n variables will be
proven later in Monovariants III. In this session, we assume them and show
how to use them in problems, along with other PSTs. To start off,
Exercise 5. Prove $2\sqrt{x} \ge 3 - \frac{1}{x}$ for x > 0.
Exercise 6. Prove that if a ≥ b ≥ 0 and n ≥ 1 is an integer, then
$$a^n - b^n \ge n(a - b)(ab)^{(n-1)/2}.$$
Hints: Write $2\sqrt{x}$ as $\sqrt{x} + \sqrt{x}$, or factor a − b out of the LHS. Then apply
AM-GM for 3 or for n variables. If equality is attainable, find out when. ♦
Exercise 7. Let E be the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ for some a, b, c > 0.
Find, in terms of a, b, and c, the volume of the largest rectangular box that
can fit inside E, with faces parallel to the coordinate planes (cf. Fig. 3a).
3.1. Are there any other means? The arithmetic and geometric means
are certainly not the only ways to assign an “average” to several numbers.
Definition 1. Fix $x_1, x_2, \ldots, x_n \ge 0$. For r ≠ 0, the rth power mean $P_r$ of
$x_1, x_2, \ldots, x_n$ is the rth root of the average of the rth powers of the $x_i$’s:
$$P_r = \Big(\frac{x_1^r + x_2^r + \cdots + x_n^r}{n}\Big)^{1/r}.$$
To avoid inverting 0s, assume r > 0 if some $x_i$ is 0.
Even though the formula yields nonsense if r = 0, there is a natural way
to define $P_0$ too: simply let it be the geometric mean,¹ i.e., $P_0 = \sqrt[n]{x_1x_2\cdots x_n}$.
At the other extreme, what happens when r is very large? If one of the
xi ’s, say xm , is larger than all the others, then xrm will be much larger than
the rth powers of the others, so much larger that Pr ≈ xm . Hence, we define:
i P∞ = max{x1 , . . . , xn }, and similarly, P−∞ = min{x1 , . . . , xn }.
Below are three famous examples of power means:
$$P_1 = \frac{x_1 + \cdots + x_n}{n}, \quad P_2 = \sqrt{\frac{x_1^2 + \cdots + x_n^2}{n}}, \quad \text{and} \quad P_{-1} = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}.$$
Here $P_1$ is just the arithmetic mean, $P_2$ is sometimes called the root mean
square, and $P_{-1}$ (defined only for $x_1, \ldots, x_n > 0$) is the harmonic mean (HM).
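To experiment with these definitions, here is a minimal sketch of a power-mean calculator. The function name `power_mean` and the sample numbers 1, 2, 4 are our own choices for illustration; the special cases r = 0 and r = ±∞ simply follow the conventions stated above, and positive inputs are assumed.

```python
import math

def power_mean(xs, r):
    """r-th power mean of positive numbers xs, with the conventions
    P_0 = geometric mean, P_inf = max, P_-inf = min."""
    n = len(xs)
    if r == 0:
        return math.prod(xs) ** (1.0 / n)
    if r == math.inf:
        return max(xs)
    if r == -math.inf:
        return min(xs)
    return (sum(x ** r for x in xs) / n) ** (1.0 / r)

xs = [1.0, 2.0, 4.0]
exponents = (-math.inf, -1, 0, 1, 2, math.inf)
means = [power_mean(xs, r) for r in exponents]
```

For this sample the list `means` contains, in order, the minimum, the harmonic mean, the geometric mean, the arithmetic mean, the root mean square, and the maximum.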
3.2. What is the relation between all power means? Briefly, the larger
the power, the larger the mean:
Theorem 2. (Power Mean Inequality) Let x1 , x2 , . . . , xn ≥ 0. Suppose
that r > s (and s ≥ 0 if some xi is 0). Then Pr ≥ Ps , with equality if and
only if x1 = x2 = · · · = xn .
The power mean inequality (PM) holds even if r = ∞ or s = −∞, pro-
vided that we use the definitions of P∞ and P−∞ above, and the convention
that ∞ > r > −∞ for all numbers r. Here are three important special cases
of the PM inequality, including our previous AM-GM:
1
The definitions of P0 , P∞ , and P−∞ are explained in Section 3.3.
3. POWER MEAN INEQUALITY 217
point (a, b, c) with a, b, c > 0 and meeting the positive
parts of the three coordinate axes, find the one such that
the tetrahedron bounded by it and the coordinate planes
has minimal volume.
Hint: For r, s, t > 0 what is the equation of the plane
through (r, 0, 0), (0, s, 0), and (0, 0, t)?
Try also the two-dimensional version of the problem. ♦
3.3. Limits justify our choices. The discussion below explains the def-
initions for the power means P0 , P∞ , and P−∞ . If you do not know limits
well, you can skip this on a first reading, without hurting your understand-
ing of inequalities. The die-hards can still find the necessary background
material in a real analysis [69] or an advanced calculus textbook.
Let’s start with $P_0$. The reason for the convention $P_0 = \sqrt[n]{x_1 x_2 \cdots x_n}$
is that when r is very small but nonzero the value of $P_r$ is very close to
the geometric mean, and it can be made as close as desired by taking r
sufficiently close to 0. In the language of limits,
Lemma 2. $\lim_{r \to 0} P_r = \sqrt[n]{x_1 x_2 \cdots x_n}$.
Another way of saying this is that the only choice for $P_0$ that makes $P_r$
depend continuously on r is the geometric mean: $\lim_{r \to 0} P_r = P_0$.
Hint: l’Hôpital’s Rule and properties of ln x will be needed in the proof. ♦
Let us now explain why we defined $P_\infty$ as we did. Let $x_m$ be the largest
of the $x_i$’s. Then $0 \le x_i \le x_m$ for all i. Hence
$$\frac{x_m^r}{n} \le \frac{x_1^r + \cdots + x_m^r + \cdots + x_n^r}{n} \le \frac{n x_m^r}{n} = x_m^r, \quad \text{so} \quad \frac{x_m}{\sqrt[r]{n}} \le P_r \le x_m.$$
But $\lim_{r \to \infty} \sqrt[r]{n} = \lim_{r \to \infty} n^{1/r} = n^0 = 1$, so by the Sandwich (Squeeze) Theorem
$$\lim_{r \to \infty} P_r = x_m = \max\{x_1, \ldots, x_n\}.$$
This motivates the definition $P_\infty = \max\{x_1, \ldots, x_n\}$.
See if you can modify this proof to explain the choice for $P_{-\infty}$:
Lemma 3. $\lim_{r \to -\infty} P_r = \min\{x_1, x_2, \ldots, x_n\}$.
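The three limits can be observed numerically. The sketch below is only an illustration under our own assumptions: the sample numbers 1, 2, 4 and the exponents 10^(-6) and ±200 are arbitrary choices, and floating-point arithmetic gives approximations rather than limits.

```python
def power_mean(xs, r):
    """r-th power mean for a finite nonzero exponent r and positive xs."""
    return (sum(x ** r for x in xs) / len(xs)) ** (1.0 / r)

xs = [1.0, 2.0, 4.0]  # geometric mean 2, maximum 4, minimum 1

near_zero = power_mean(xs, 1e-6)     # should be close to the geometric mean
large_pos = power_mean(xs, 200.0)    # should approach the maximum from below
large_neg = power_mean(xs, -200.0)   # should approach the minimum from above
```

With r = 200 the mean is already within a few hundredths of the maximum, in line with the squeeze bounds $x_m / \sqrt[r]{n} \le P_r \le x_m$.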
218 9. INEQUALITIES I
4.2. The “convex hall” of fame. For convenience, here is a brief list of
some frequently encountered convex functions:
• $x^{2k}$ on all of R;
• $x^r$ on [0, ∞), if r ≥ 1;
• $-x^r$ on [0, ∞), if r ∈ [0, 1];
• $x^r$ on (0, ∞), if r ≤ 0;
• −ln x on (0, ∞);
• −sin x on [0, π];
• −cos x on [−π/2, π/2];
• tan x on [0, π/2);
• $e^x$ on all of R;
• $\frac{r}{s+x}$ on (−s, ∞), if r > 0.
In these, k represents a positive integer, r, s represent real constants, and x
is the variable. In fact, all of these are strictly convex on the interval given,
except for $x^r$ and $-x^r$ when r is 0 or 1.
Exercise 10. Draw the graphs of the functions above and explain why they
are convex on the given intervals by verifying visually the geometric definition
of convexity.
To make more convex functions out of already known convex functions,
we can perform certain arithmetic operations:
Lemma 4. Show that a sum of convex functions is convex, and that adding
a constant or linear function to a function does not affect convexity.
Hint: Verify the algebraic definition of convexity. ♦
Use each of the criteria above to produce three different solutions to:
Exercise 11. Find out (with proof) on which intervals $x^2$, $x^3$, and sin x
are convex. For an extra challenge, prove that each of the functions in
Exercise 10 is convex (or strictly convex) on the indicated intervals.
Theorem 6. (Maximum Principle for Convex Functions) A convex
function f (x) on an interval [a, b] is maximized at x = a or x = b (or both).
Proof: First suppose that f (b) ≥ f (a). Given c in [a, b], let λ ∈ [0, 1]
be such that c = (1 − λ)a + λb. Then the algebraic definition of convexity
implies that
f (c) ≤ (1 − λ)f (a) + λf (b) ≤ (1 − λ)f (b) + λf (b) = f (b),
so f attains a maximum at b. The case f (a) ≥ f (b) is analogous.
Thus, as long as f (x) is convex on [a, b], its maximum is attained at an
endpoint of the interval.
5. APPLICATIONS OF CONVEXITY TO INEQUALITIES 221
and $f_3(a) = \frac{c}{a+b+1}$ are convex on their respective domains (−1 − c, ∞) and
Lemma 8. When the weights are all rational numbers, the ordinary AM-GM
implies the weighted AM-GM, and similarly, the weighted PM and the
Exercise 15. Given a, b, c, p, q, r > 0 with p + q + r = 1, prove
$$a + b + c \ge a^p b^q c^r + a^r b^p c^q + a^q b^r c^p.$$
Hint: Apply the weighted AM-GM three times. ♦
Exercise 16. Prove that if a, b, c are sides of a triangle, then
$$(a + b - c)^a (b + c - a)^b (c + a - b)^c \le a^a b^b c^c.$$
Hint: Why is a triangle mentioned? Divide both sides by the RHS, take
some root of both sides, and apply the weighted AM-GM. ♦
Exercise 17. In Figure 1b, two segments form the diameter AB of circle k:
AD = x and DB = y. A perpendicular is erected at point D to AB until it
hits the circle k in point C. Prove that $CD = \sqrt{xy}$.
Proof: Since ∠ACB is an inscribed angle in circle k overlooking diameter
AB, we have ∠ACB = 90◦ (cf. Circle Geometry session, vol. I), and all
three triangles ADC, CDB, and ACB are right. They are also similar
because they have one more equal angle; e.g., ∠BAC is shared among two
of them, etc.
In particular, the ratios of the two smaller triangles’ sides are the same.
Hence, AD/CD = CD/BD, from which $xy = CD^2$ and $CD = \sqrt{xy}$.
This geometric construction explains the name geometric mean of x and y.
Using it, it is possible to:
Exercise 18. Interpret and prove geometrically the baby AM-GM.
Proof: The midpoint O of AB is the center of k, and the radius of k, being
half of the diameter, is the arithmetic mean (x + y)/2 = OA = OB = OC.
So both the AM and GM of x and y appear in right triangle ODC as the
hypotenuse OC and the leg CD, respectively (cf. Fig. 1b).
The geometric inequality OC ≥ CD says that $(x + y)/2 \ge \sqrt{xy}$, which
is the baby AM-GM inequality. Equality is attained if and only if ODC
degenerates into a segment OC, which happens exactly when D = O, i.e.,
when x = y (cf. Fig. 5a).
Figure 5. Equality in baby AM-GM and Trapezoids in convexity
6.2. A diagram of the major inequalities for means introduced in this
session appears to the right. The arrows show implications between different
inequalities, e.g., the bottom-most arrow indicates that the
Hardy-Littlewood-Pólya Inequality implies the Jensen’s Inequality. The dashed
arrows refer to implications being shown here for the case of rational
weights only.
[Diagram nodes: Baby AM-GM, Weighted Baby AM-GM, AM-GM, Weighted AM-GM, PM, Weighted PM, JI, Weighted JI, HLP]
Using the so-called smoothing technique, we will prove some of these
inequalities in Monovariants III. We will see other fundamental inequalities
and further sophisticated applications of inequalities to olympiad-style
problems in the upcoming Inequalities II.
?
The latter is recognized as $0 \le (x - y)^2$, which is always true, with equality
iff x = y. If you prefer to avoid squaring both sides, multiply the proposed
inequality instead by 2, pull to the RHS and rewrite as a square:
$$2\sqrt{xy} \overset{?}{\le} x + y \iff 0 \overset{?}{\le} (\sqrt{x})^2 - 2\sqrt{x}\sqrt{y} + (\sqrt{y})^2 \iff 0 \overset{?}{\le} (\sqrt{x} - \sqrt{y})^2.$$
Plugging $t = \sqrt{x} - \sqrt{y}$ into the quotation in the beginning of the session, we
obtain the true inequality $t^2 \ge 0$, with equality iff $\sqrt{x} = \sqrt{y}$, i.e., x = y.
Exercise 1. If L replaces 20 m, then the system of equations is x + 2y = L
and x = 2y, leading to y = L/4, x = L/2, and largest area $xy = L^2/8$. ♦
Exercise 2. For the first question, the area xy = 50 is fixed and we want
to minimize the fence length x + 2y. By baby AM-GM:
$$\frac{x + 2y}{2} \overset{\text{AM-GM}}{\ge} \sqrt{x \cdot 2y} = \sqrt{2 \cdot 50} = \sqrt{100} = 10,$$
where equality is obtained iff x = 2y. Plugging into the fixed area, we obtain
$2y^2 = 50$, i.e., y = 5 and x = 10, yielding a minimal fence of 20 m again!
This is not a coincidence, since this exercise and the original problem are
two sides of the same optimization situation (why?).
To answer the other question, we have the fence length fixed at 20 =
2x + 2y, i.e., x + y = 10. To maximize the area, we again apply AM-GM,
but this time to variables x and y:
$$\sqrt{x \cdot y} \overset{\text{AM-GM}}{\le} \frac{x + y}{2} = \frac{10}{2} = 5,$$
with equality iff x = y = 5. Thus, the square has maximal area of 25 m²
among all rectangles of 20 m perimeter. Again, any perimeter length will
yield a square as the optimal figure in this type of a problem.
Exercise 3. Following the hint, we apply AM-GM once to each sum on the
LHS and then multiply the three resulting inequalities:
$$a + b \ge 2\sqrt{ab}, \quad b + c \ge 2\sqrt{bc}, \quad c + a \ge 2\sqrt{ca}, \quad \text{and}$$
$$(a + b)(b + c)(c + a) \ge 2\sqrt{ab} \cdot 2\sqrt{bc} \cdot 2\sqrt{ca} = 8\sqrt{a^2 b^2 c^2} = 8abc,$$
where equality is obtained iff equalities are obtained in each of the original
three applications of AM-GM, i.e., a = b = c.
Exercise 4. We pair up the numbers from {1, 2, . . . , n} so that each pair
adds up to n + 1: (1, n), (2, n − 1), . . . , (n − 1, 2), (n, 1). Note that each
number appears twice, and if n is odd then the middle number (n + 1)/2 is
paired up with itself. We now apply AM-GM to each such pair:
$$\frac{1+n}{2} \ge \sqrt{1 \cdot n}, \quad \frac{2+(n-1)}{2} \ge \sqrt{2(n-1)}, \quad \ldots, \quad \frac{(n-1)+2}{2} \ge \sqrt{(n-1)2}, \quad \frac{n+1}{2} \ge \sqrt{n \cdot 1}.$$
Multiplying now all these n inequalities yields $\left(\frac{n+1}{2}\right)^n \ge \sqrt{n!\,n!} = n!$.
Equality can never be obtained for n > 1 since the very first application of AM-GM
yields a strict inequality: $\frac{1+n}{2} > \sqrt{n}$.
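The strict inequality of Exercise 4 can also be checked by machine for small n. The sketch below is an illustration only, under our own choices: it clears denominators so that exact integer arithmetic can be used, comparing $2^n \cdot n!$ with $(n+1)^n$, and the range of n tested is arbitrary.

```python
import math

def strict_bound_holds(n):
    """True when n! < ((n+1)/2)^n, tested exactly as 2^n * n! < (n+1)^n."""
    return 2 ** n * math.factorial(n) < (n + 1) ** n

results = [strict_bound_holds(n) for n in range(2, 200)]
boundary_case = strict_bound_holds(1)  # n = 1 gives equality, so this is False
```

The case n = 1 shows why the exercise requires n > 1: there the two sides are equal.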
Exercise 5. Pulling all variable expressions to the LHS, and dividing by 3,
we turn the inequality into its equivalent version $(2\sqrt{x} + \frac{1}{x})/3 \overset{?}{\ge} 1$. The 3 in
the denominator suggests using AM-GM for 3 variables, but we have only 2
summands in the numerator! Hence the text suggests to split $2\sqrt{x}$:
$$\frac{2\sqrt{x} + \frac{1}{x}}{3} = \frac{\sqrt{x} + \sqrt{x} + \frac{1}{x}}{3} \overset{\text{AM-GM}}{\ge} \sqrt[3]{\sqrt{x}\,\sqrt{x}\,\frac{1}{x}} = \sqrt[3]{x \cdot \frac{1}{x}} = \sqrt[3]{1} = 1,$$
with equality iff $\sqrt{x} = \frac{1}{x}$, i.e., $x\sqrt{x} = 1$, $x^3 = 1$, and x = 1.
Exercise 6. We see the product of many ab’s on the RHS; if we divide
everything by n, we also see a denominator of n on the LHS; still, we do not
see a sum on the LHS! But there is a common factor of (a − b) on both sides:
$$a^n - b^n = (a-b)(a^{n-1} + a^{n-2}b + \cdots + a^{n-1-k}b^k + \cdots + ab^{n-2} + b^{n-1}) \overset{?}{\ge} n(a-b)(ab)^{\frac{n-1}{2}}.$$
If a = b, then both sides are 0 and we are done. If a > b, we divide by
n(a − b) without changing the direction of the inequality:
$$\frac{a^{n-1} + a^{n-2}b + \cdots + a^{n-1-k}b^k + \cdots + ab^{n-2} + b^{n-1}}{n} \overset{?}{\ge} (ab)^{\frac{n-1}{2}}.$$
7. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 227
ways to choose the other (k−1) elements from the (n−1) leftover numbers aj .
Thus, each $a_i$ is raised to the power
$$\frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{(n-1)!}{(k-1)!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{n!} = \frac{k}{n}.$$
Hence, the RHS of (3) equals $\binom{n}{k}(a_1 a_2 \cdots a_n)^{k/n} = \binom{n}{k} g^k$. For the whole sum,
we run this argument for k = 0, 1, . . . , n and recover on the LHS the original
product that was multiplied out:
$$(1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge \sum_{k=0}^{n} \binom{n}{k} g^k = (1 + g)^n.$$
The last is a famous combinatorial identity called the Binomial Theorem.
Equality is achieved only if equalities are obtained everywhere in the appli-
cations of AM-GM in (3). In particular, for k = 1 we must have the singleton
products equal among themselves, i.e., a1 = a2 = · · · = an (= g), and these
do produce an overall equality.
Lemma 4. If f(x) and g(x) are convex functions on [a, b], then by the
algebraic definition of convexity, for any λ ∈ [0, 1]:
$$f((1-\lambda)a + \lambda b) \le (1-\lambda)f(a) + \lambda f(b) \quad \text{and} \quad g((1-\lambda)a + \lambda b) \le (1-\lambda)g(a) + \lambda g(b).$$
Adding these two inequalities yields
$$f((1-\lambda)a + \lambda b) + g((1-\lambda)a + \lambda b) \le (1-\lambda)(f(a) + g(a)) + \lambda(f(b) + g(b)),$$
which is the algebraic definition of convexity on [a, b] for the function f + g.
A constant or linear function g(x) is automatically convex, so adding it to a
convex function preserves convexity.
Exercise 11. Since all three functions $x^2$, $x^3$, and sin x are continuous on R,
we can apply to them the Continuity-and-Midpoint (CM) criterion. To start,
for a, b ≥ 0 we apply special cases of the PM inequality to $x^2$ and $x^3$:
$$\frac{a^2 + b^2}{2} \overset{(P_2 \ge P_1)}{\ge} \left(\frac{a+b}{2}\right)^2 \quad \text{and} \quad \frac{a^3 + b^3}{2} \overset{(P_3 \ge P_1)}{\ge} \left(\frac{a+b}{2}\right)^3,$$
and hence, by CM criterion, $x^2$ and $x^3$ are convex on [0, ∞). As for showing
that $x^2$ is convex on all of R, what would happen to the inequality if you
replace a and b by ±a and ±b? ♦
• $(-\sin x)' = -\cos x$, $(-\cos x)' = \sin x$, $(\tan x)' = \frac{1}{\cos^2 x}$, $(e^x)' = e^x$.
so $\ln \sqrt[n]{x_1 \cdots x_n} \le \ln\left(\frac{x_1 + \cdots + x_n}{n}\right)$, or $P_0 = \sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \cdots + x_n}{n} = P_1$,
with equality iff $x_1 = x_2 = \cdots = x_n$.
For $P_r \ge P_s$ with r ≥ s > 0, we apply JI to $x_1^s, x_2^s, \ldots, x_n^s > 0$ and the
function $g(x) = x^{r/s}$, which is convex on [0, ∞) because $\frac{r}{s} > 1$:
$$\frac{(x_1^s)^{r/s} + \cdots + (x_n^s)^{r/s}}{n} \overset{\text{JI}}{\ge} \left(\frac{x_1^s + \cdots + x_n^s}{n}\right)^{r/s}.$$
Taking rth roots on both sides gives
$$P_r = \left(\frac{x_1^r + \cdots + x_n^r}{n}\right)^{1/r} \ge \left(\frac{x_1^s + \cdots + x_n^s}{n}\right)^{1/s} = P_s.$$
Exercise 12. To discover the needed convex function, reason backward:
$$x^x \overset{?}{\ge} \left(\frac{x+1}{2}\right)^{x+1} \overset{\ln}{\iff} \ln x^x \overset{?}{\ge} \ln\left(\frac{x+1}{2}\right)^{x+1} \iff x \ln x \overset{?}{\ge} (x+1)\ln\frac{x+1}{2}.$$
The function f(x) = x ln x participates on both sides of the last inequality,
so we check if f(x) is strictly convex for x > 0. This is true by the First
Derivative Test since $f'(x) = \ln x + 1$ is strictly increasing for x > 0. By the
definition of convexity of f(x) = x ln x on [x, 1] (or [1, x]) and t = 1/2,
$$\frac{f(x) + f(1)}{2} \ge f\left(\frac{x+1}{2}\right), \quad \text{so} \quad \frac{x \ln x + 1 \ln 1}{2} \ge \frac{x+1}{2}\ln\frac{x+1}{2},$$
which simplifies to the desired inequality, with equality iff x = 1.
Exercise 13. Let $x_i$ be as in the hint, so $x_i \in (0, \pi)$ and $x_1 + \cdots + x_n = \pi$.
Then the length of the ith side of the n-gon is $2R \sin x_i$, where R is the
radius of the circle. Thus, we need to maximize the sum of all $\sin x_i$. But
f(x) = −sin x is convex on [0, π], hence by JI:
$$(4) \quad -\frac{\sin x_1 + \cdots + \sin x_n}{n} \ge -\sin\left(\frac{x_1 + \cdots + x_n}{n}\right) = -\sin\frac{\pi}{n}.$$
Therefore, $\sin x_1 + \cdots + \sin x_n \le n \sin\frac{\pi}{n}$, with equality iff $x_1 = \cdots = x_n$, i.e.,
The three exponents add up to 1 and are positive. Hence they can serve as
weights $\lambda_i$, making the LHS the weighted GM of $\frac{a+b-c}{a}$, $\frac{b+c-a}{b}$, and $\frac{c+a-b}{c}$,
which is less than or equal to the weighted AM, i.e.,
$$\text{LHS} \le \frac{a}{a+b+c} \cdot \frac{a+b-c}{a} + \frac{b}{a+b+c} \cdot \frac{b+c-a}{b} + \frac{c}{a+b+c} \cdot \frac{c+a-b}{c}$$
$$= \frac{a+b-c}{a+b+c} + \frac{b+c-a}{a+b+c} + \frac{c+a-b}{a+b+c} = \frac{a+b+c}{a+b+c} = 1.$$
Session 10
Zvezdelina Stankova
4. Dirichlet Product
234 10. DIRICHLET, MÖBIUS, AND EULER
In other words, the product is taken over all pairs of divisors $(d_1, d_2)$ that
multiply to n: $d_1 d_2 = n$. Solving for the divisor $d_2 = n/d_1$ yields the second
equivalent summation in (12). For instance,
$$(f * g)(6) = \sum_{d \mid 6} f(d)\, g\!\left(\frac{6}{d}\right) = f(1)g(6) + f(2)g(3) + f(3)g(2) + f(6)g(1).$$
Along the way, we have discovered that sum-functions are, not surprisingly,
a particular instance of the D-product:¹
Property 1. D-multiplying by ι produces the sum-function:
(13) $f * \iota = S_f$ for any f ∈ A.
Can we really solve for R from here? We’ll answer this affirmatively in a bit.
Property 2. D-product is commutative and associative: $f * g = g * f$ and
$(f * g) * h = f * (g * h)$ for all f, g, h ∈ A.
where the sum is taken over all triples $(d_1, d_2, d_3)$ of divisors of n that
multiply to n (prove this!). As the RHS of (14) is symmetric with respect to f,
g, and h, it also equals $(f * (g * h))(n)$. Thus, we can write:
$f * g * h = (f * g) * h = f * (g * h)$. ♦
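For readers who like to experiment, the D-product and Properties 1 and 2 can be tried out on a computer. The sketch below is an illustration only: the names `divisors`, `d_product`, `iota`, `ident`, and `square` are our own, ι is taken to be the constant function 1, and checking finitely many n is of course not a proof.

```python
def divisors(n):
    """All positive divisors of n, found by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def d_product(f, g):
    """D-product of two arithmetic functions: sum of f(d) * g(n/d) over d | n."""
    def h(n):
        return sum(f(d) * g(n // d) for d in divisors(n))
    return h

iota = lambda n: 1       # constant function 1 (assumed meaning of iota)
ident = lambda n: n      # the identity function id(n) = n
square = lambda n: n * n

tau = d_product(iota, iota)      # sum-function of iota: number of divisors
sigma = d_product(ident, iota)   # sum-function of id: sum of divisors

commutative = all(
    d_product(ident, tau)(n) == d_product(tau, ident)(n) for n in range(1, 60)
)
associative = all(
    d_product(d_product(ident, tau), square)(n)
    == d_product(ident, d_product(tau, square))(n)
    for n in range(1, 60)
)
```

Here `tau(6)` evaluates the sum 1 + 1 + 1 + 1 over the four divisors of 6, and `sigma(6)` adds up 1 + 2 + 3 + 6.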
4.3. Multiplicative identity. Suppose your little sister asks you: “What
is the number 1?” How would you describe 1 to identify it uniquely among
all other numbers? Answering “The number 1 signifies one object.” is a
circular definition. Saying “1 is such that 1 + 1 = 2.” is no good either:
you are defining 1 via another number 2; besides, I prohibit you from using
in your description any operation other than multiplication . . . . Well, here
is what you will “learn” about 1 in any abstract algebra course: 1 is the
unique number such that multiplying any number by 1 gives that number,
i.e., n · 1 = 1 · n = n for all n ∈ N. Again, this works equally well in the sets
of rational, real, or complex numbers too.
Moving to the set A of arithmetic functions with product ∗, the question
is: what function plays the role of “1” and deserves to be called the
multiplicative identity of A? Our calculations in Exercise 10 point to the answer:
Note that any multiplicative identity (if it exists) is unique: this is well known
in abstract algebra. Indeed, in our context, if ε′ is another multiplicative
identity in A, then ε = ε ∗ ε′ = ε′ (why?), implying uniqueness of ε.
At this point it is worth comparing the D-product with the usual product
of functions. The ordinary product is commutative and associative: f · g =
g · f and (f · g) · h = f · (g · h), but the multiplicative identity with respect
to it is the function ι: as observed earlier, f · ι = ι · f = f for any f ∈ A.
With respect to the D-product, ι is not the multiplicative identity, but has
the nice property of transforming each function f into its sum-function $S_f$:
$\iota * f = f * \iota = S_f$. The two types of function products ∗ and · have started
to diverge, and they will continue to do so as we study our next notion.
we need to enlarge N to include at least the positive rational numbers. From now on,
“numbers” will refer to natural, rational, real, or complex numbers, as needed.
⁴ For the group theory fans: Lemma 3 implies that A is not a group under ∗. However,
we’ll see that its subset A* of all arithmetic functions f with f(1) ≠ 0 is a group.
5. MÖBIUS INVERSION FORMULA 237
To find an explicit formula for g, we will use the ad-hoc 5 approach of PST 27,
guess the formula, and then prove it rigorously. In the following, we have
done some initial calculations for g(n). Try to come up with these calcula-
tions on your own and then compare your answers with those in the table.
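n          | $S_g(n) = 0$ for n > 1                                                                 | g(n)
p          | g(1) + g(p) = 0, where g(1) = 1                                                        | −1
p²         | g(1) + g(p) + g(p²) = 0, where g(1) + g(p) = 0                                         | 0
p³         | g(p³) = 0; why?                                                                        | 0
p₁p₂       | g(1) + g(p₁) + g(p₂) + g(p₁p₂) = 0, where g(1) + g(p₁) = 0 and g(p₂) = −1              | 1
p₁p₂p₃     | $g(1) + \sum_{i=1}^{3} g(p_i) + \sum_{i<j} g(p_i p_j) + g(p_1 p_2 p_3) = 0$, where the first three parts are 1, −3, 3 | −1
p₁²p₂      | g(1) + g(p₁) + g(p₂) + g(p₁p₂) + g(p₁²) + g(p₁²p₂) = 0, where the first four terms sum to 0 and g(p₁²) = 0 | 0
p₁³p₂      | g(p₁³p₂) = 0; why?                                                                     | 0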
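The hand calculations in the table can be mimicked by a short recursion. The sketch below is only an illustration under our own assumptions: it takes g(1) = 1 and the requirement that the divisor sums of g vanish for n > 1, and it picks the sample primes 2, 3, 5 to stand for p, p₁, p₂, p₃.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def g(n):
    """Solve g(1) = 1 and sum of g(d) over d | n equal to 0 for n > 1."""
    if n == 1:
        return 1
    # Move every proper divisor to the other side of the equation.
    return -sum(g(d) for d in range(1, n) if n % d == 0)

# Rows of the table with p = 2 (and p1, p2, p3 = 2, 3, 5):
# p, p^2, p^3, p1*p2, p1*p2*p3, p1^2*p2, p1^3*p2
table = {n: g(n) for n in (2, 4, 8, 6, 30, 12, 24)}
```

Trying other primes in place of 2, 3, 5 is a good way to look for the patterns.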
Two patterns emerge in our table (Don’t see them yet? Do more cases!):
• If $n = p_1 p_2 \cdots p_r$ is a product of distinct primes, then $g(n) = (-1)^r$.
Such n is called square-free since it has no perfect square divisor.
• If n is divisible by a higher prime power $p^a$ with a ≥ 2, the result is
always g(n) = 0. Correspondingly, such n is not square-free.
Our exploration naturally leads to the introduction of a (famous) function:
Definition 7. The Möbius function μ : N → Z is defined by
$$\mu(n) = \begin{cases} 1 & \text{if } n = 1; \\ 0 & \text{if } n \text{ is not square-free}; \\ (-1)^r & \text{if } n = p_1 p_2 \cdots p_r, \ p_i\text{’s are distinct primes.} \end{cases}$$
The first equality throws out divisors that are not square-free, as μ(d) = 0
for them. The second equality realizes any square-free divisor d as a product
of some distinct prime factors pij of n; these d’s are grouped according to the
number k of their prime factors, with k running from 1 (choose one pij ) to r
(choose all pij ’s); the divisor 1 is written separately, and it can be thought
of as the product of zero pij ’s.
As noted earlier, the number of k-tuples $\{p_{i_1}, p_{i_2}, \ldots, p_{i_k}\}$ from among
the r given primes $\{p_1, p_2, \ldots, p_r\}$ is calculated by $\binom{r}{k}$. From the definition
of μ, the corresponding contributions are all $\mu(d) = (-1)^k$. Putting all this
together and using the Binomial Theorem completes the argument:
$$\sum_{d \mid n} \mu(d) = \sum_{k=0}^{r} \binom{r}{k} (-1)^k = (1 - 1)^r = 0.$$
This proof doesn’t work for n = 1 (why?), but this is a trivial case to be
checked by hand: $\sum_{d \mid 1} \mu(d) = \mu(1) = 1$.
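The Möbius function and the divisor-sum identity just proven can be tested directly. The following is a minimal sketch, assuming a trial-division implementation of Definition 7; the function names are our own, and the finite range checked is arbitrary.

```python
def mobius(n):
    """Moebius function via trial division, following Definition 7."""
    if n == 1:
        return 1
    r = 0          # number of distinct primes found so far
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:   # p^2 divided the number: not square-free
                return 0
            r += 1
        p += 1
    if n > 1:                # one leftover prime factor
        r += 1
    return (-1) ** r

def sum_over_divisors(n):
    """Sum of mobius(d) over all divisors d of n."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

values = [mobius(n) for n in range(1, 11)]
sums = [sum_over_divisors(n) for n in range(1, 200)]
```

The list `sums` starts with 1 (the case n = 1) and is 0 afterwards, matching the argument above.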
(a) $\sum_{d \mid n} \mu(d) f(d) = \prod_{i=1}^{r} \bigl(f(1) - f(p_i)\bigr)$.
(b) $\sum_{d \mid n} \mu(d) f\!\left(\frac{n}{d}\right) = \prod_{i=1}^{r} \bigl(f(p_i^{a_i}) - f(p_i^{a_i - 1})\bigr)$.
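Exercise 15. Find closed expressions for the following sums:
(a) $\sum_{d \mid n} \mu(d)\,\tau(d)$; (b) $\sum_{d \mid n} \mu(d)\,\sigma(d)$; (c) $\sum_{d \mid n} \frac{\mu(d)}{d}$;
(d) $\sum_{d \mid n} \mu(d)\,\tau\!\left(\frac{n}{d}\right)$; (e) $\sum_{d \mid n} \mu(d)\,\sigma\!\left(\frac{n}{d}\right)$; (f) $\sum_{d \mid n} \mu(d)\,\frac{n}{d}$.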
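Solution to (c). The function f(n) = 1/n is multiplicative, so
$$\sum_{d \mid n} \frac{\mu(d)}{d} = \sum_{d \mid n} \mu(d) f(d) = S_{\mu \cdot f}(n) \overset{\text{Exer. 14}}{=} \prod_{i=1}^{r} \bigl(f(1) - f(p_i)\bigr) = \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$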
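Look closer at the result: does it remind you of something? In formula (9)
for ∞-Raffle R, if we over-factor all $p_i^{a_i}$ we obtain:
$$(19) \quad R(n) = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right) \cdots \left(1 - \frac{1}{p_r}\right) = n \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right).$$
Dividing by n yields the alternative answer $\frac{R(n)}{n} = \sum_{d \mid n} \frac{\mu(d)}{d}$.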
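The identity between the divisor sum and the product over primes can be verified with exact rational arithmetic. The sketch below is an illustration only; the helpers `factorize`, `mobius`, `divisor_sum`, and `prime_product` are our own names, and only n below 300 is checked.

```python
from fractions import Fraction
from math import prod

def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    """Moebius function computed from the factorization."""
    exponents = factorize(n).values()
    if any(a >= 2 for a in exponents):
        return 0
    return (-1) ** len(exponents)

def divisor_sum(n):
    """Sum of mobius(d)/d over the divisors d of n, as an exact fraction."""
    return sum(Fraction(mobius(d), d) for d in range(1, n + 1) if n % d == 0)

def prime_product(n):
    """Product of (1 - 1/p) over the distinct primes p dividing n."""
    return prod(1 - Fraction(1, p) for p in factorize(n))

agree = all(divisor_sum(n) == prime_product(n) for n in range(1, 300))
```

For example, for n = 12 both sides equal 1 − 1/2 − 1/3 + 1/6 = (1/2)(2/3) = 1/3.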
Could parts (a) and (b) be related? You are on your own regarding the
function π in parts (c) and (d). Check out some interesting observations in
the Hints section. ♦
6. THE EULER FUNCTION φ(n) 243
Figure 5. φ(30) = φ(6)φ(5) = 2 · 4
6.2. Follow the beaten path and triumph again over ∞-Raffle. Once
multiplicativity of the Euler function has been shown, there is nothing else
to do but apply PST 26 and reduce φ(n) to prime powers:
Problem 11. Find a closed formula for $\varphi(p^k)$ for any prime power $p^k$.
PST 75. Using the simple idea of complements, to find the size of one subset
of a set, subtract the size of its complement from the size of the whole set.
Since the “whole set” is $\{1, 2, \ldots, p^k\}$, we thus obtain $\varphi(p^k) = p^k - p^{k-1}$.
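The prime-power formula can be compared against the definition of φ by direct counting. This is a small sketch under our own assumptions: `phi` counts the numbers t with 1 ≤ t ≤ n and gcd(t, n) = 1, and the primes and exponents tested are arbitrary small samples.

```python
from math import gcd

def phi(n):
    """Euler function by definition: count t in 1..n relatively prime to n."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

checks = [
    (p, k, phi(p ** k) == p ** k - p ** (k - 1))
    for p in (2, 3, 5, 7)
    for k in (1, 2, 3, 4)
]
all_match = all(ok for _, _, ok in checks)
```

The same brute-force `phi` also reproduces the value φ(30) = φ(6)φ(5) = 2 · 4 from Figure 5.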
6.3. Eager for second helpings. Our direct proof of φ’s multiplicativity
was somewhat hefty. Here we explore an alternative, slicker way. To this
end, we again turn to the sum-function of φ:
Problem 12. From the definition of the Euler function φ, directly calculate
its sum-function Sφ without using φ’s multiplicativity.
Arguably, this question is hard because it asks us to forget what we have
learned so far about the Euler function (and the human mind really objects
to such tasks!), and to find its sum-function with bare hands from scratch.
If you attempt to brute-force the calculation of Sφ , you will end up in an
incomprehensible mess. You certainly could try a more organized approach
by some type of induction (as you know the answer is id); try it! But suppose
you truly did not know anything about the Euler function, other than its
definition. What would you do?
The marvelous approach below is based on the simple but most funda-
mental of combinatorial ideas:
PST 76. To establish a combinatorial identity (such as Sφ = id), identify a
suitable set of objects and count its elements in two different ways.12
The question, of course, is what is this “suitable” set of objects to be
counted? Enjoy one possible solution, and see if you can think of another.
We have dedicated this whole section to problems with φ. Yet, the ways
in which the Euler function can be engaged are so varied that we can only
gloss over a few of the possible related problem-solving themes.
Solution (d): Our first move is simple, but may be unexpected. Since
φ(n) is an integer, 3 must divide n on the RHS. This prompts the idea:
PST 77. Factor out all 3s that divide n to form the partial prime decom-
position n = 3a m with gcd(3, m) = 1 and substitute it for n. The relative
primeness of 3 and m makes this expression especially suitable when working
with multiplicative functions.
In our situation, a ≥ 1. Substituting, $\varphi(3^a m) = \frac{2}{3} 3^a m$, i.e., $\varphi(3^a)\varphi(m) = 3^{a-1} \cdot 2 \cdot \varphi(m) = 2 \cdot 3^{a-1} m$. Thus, φ(m) = m. But φ(m) < m for m > 1 (why?),
so that m = 1 and the final answer is $n = 3^a$ for a ≥ 1.
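A brute-force search agrees with this answer for small n. The sketch below is an illustration only, assuming the equation being solved is φ(n) = (2/3)n, written without fractions as 3φ(n) = 2n; the search bound 1000 is an arbitrary choice.

```python
from math import gcd

def phi(n):
    """Euler function by definition: count t in 1..n relatively prime to n."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

# Solve 3 * phi(n) = 2 * n by exhaustive search over a small range.
solutions = [n for n in range(1, 1001) if 3 * phi(n) == 2 * n]
```

Only powers of 3 should appear in `solutions`.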
Even though the above solution is probably the best there is, if we didn’t
start the “right” way by factoring out all powers of 3, we could have ended
up with a seemingly harder problem. Our instinct might have told us to
apply the formula for φ(n) from the get-go:
$$n \prod_{i} \left(1 - \frac{1}{p_i}\right) = \frac{2}{3}\, n.$$
PST 78. Recall that all primes are odd, except for 2 itself, and match the
number of 2s dividing each side. In the process, obtain an upper bound for
the number of involved primes, and thus reduce the problem to only finitely
many possibilities.
You can try the same idea with the prime 3 instead of 2 in PST 78. We
defer the remainder of this second solution to part (d) of Exercise 18 to the
Hints section. ♦
As a bonus, try to do part (c) also in two ways: start by factoring out
all 2s from n and then apply φ, or first apply φ and then deal with the
consequences. ♦
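(a) $\sum_{d \mid n} \mu(d)\varphi(d)$; (b) $\sum_{d \mid n} \mu(d)\varphi\!\left(\frac{n}{d}\right)$; (c) $\sum_{d \mid n} \mu^2(d)\varphi^2(d)$; (d) $\sum_{d \mid n} \frac{\mu(d)}{\varphi(d)}$.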
7. THE TAMING OF THE SHREWD φ 249
By contrast, part (b) works only for certain multiplicative functions! Can
you think of other non-trivial examples that satisfy (b) in place of φ?
7.3. Tinkering with φ. The divisor function τ can be written as the sum
$\tau(n) = \sum_{d \mid n} 1$. Likewise, the Euler function φ can be written as
$$(25) \quad \varphi(n) = \sum_{(t,n)=1,\ 1 \le t \le n} 1,$$
where the sum runs over all numbers relatively prime to and ≤ n. Changing
“1” to “d” in the formula for τ yields $\sigma(n) = \sum_{d \mid n} d$. Due to multiplicativity,
we have nice formulas for the three sums τ, σ, and φ. What happens if we
analogously replace “1” by “t” in the formula for φ: can we calculate the sum
of all numbers relatively prime to and ≤ n?
Problem 13. Find a closed expression for the sum $\eta(n) := \sum_{(t,n)=1,\ 1 \le t \le n} t$.
PST 79. Compare with a previously solved problem and adapt the old proof.
Right! Which proof? Gauss’s proof of 1 + 2 + · · · + n = (n + 1)n/2 was
already adapted once to finding a formula for the product π(n) of divisors of n
(cf. Exer. 4). Can it be adapted again for a swift calculation of η(n)? ♦
torial skills, you could represent η(n) as an alternating sum via the so-called
Inclusion-Exclusion Principle:
$$\eta(n) = \sum t - \sum_{p_i \mid t} t + \sum_{i<j,\ p_i p_j \mid t} t - \sum_{i<j<k,\ p_i p_j p_k \mid t} t + \cdots + (-1)^r \sum_{p_1 p_2 \cdots p_r \mid t} t,$$
Hint: Does this problem resemble something else we’ve done before? Let’s
consider some initial cases for f (x). The constant polynomial f (x) = 1
results in λ1 = id ∈ S, thereby oversimplifying the problem. The next poly-
nomial to try is the linear f (x) = x, resulting in λx = φ: we have discussed
this case in great detail! Can you modify the table-proof of multiplicativity
of the Euler function to show multiplicativity of the general λf ? ♦
Note that Problem 14 provides infinitely many non-trivial examples λf
satisfying both Exercises 20 and 21(b), with λf in place of φ. Is this a
coincidence? (See the extra Exercise 24 in the Hints section.)
7.4. Number theory in earnest. Our final problems on the Euler function
are “ruled” by a new notion:
Definition 9. A Fermat prime is a prime of the form $2^n + 1$ for some n ∈ N.
where both last factors are > 1, making $2^n + 1$ composite. This factorization
of $2^n + 1$ would not work if 5 were replaced by an even number (why?),
but a similar factorization works for any odd number. We conclude that the
exponent n can only be a power of 2, i.e., $n = 2^k$ (why?), and hence
The first Fermat primes (and the only ones known so far!) are $F_0 = 3$,
$F_1 = 5$, $F_2 = 17$, $F_3 = 257$, and $F_4 = 65{,}537$. The next case is the
long-believed “prime” $F_5 = 2^{32} + 1$, whose factor of 641 was discovered by Euler
in 1732. If you want to make history, find out whether there are infinitely
many Fermat primes. Could it be that, to the contrary, all Fermat numbers
$F_n = 2^{2^n} + 1$ are composite for n > 4?
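These facts about the first Fermat numbers are easy to confirm by machine. The sketch below is an illustration only; `fermat` and `is_prime` are our own helper names, and plain trial division is used because the numbers involved are small.

```python
def fermat(n):
    """The n-th Fermat number 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Primality by trial division; adequate for numbers of this size."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

first_five = [fermat(n) for n in range(5)]
first_five_prime = all(is_prime(f) for f in first_five)

# Euler's factor of F_5.
quotient, remainder = divmod(fermat(5), 641)
```

A zero remainder confirms that 641 divides $F_5$, so $F_5$ is composite.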
Now let’s turn to more accessible, even though far from easy, problems.
Problem 15. Solve the equation $\varphi(\sigma(2^n)) = 2^n$ for n ≥ 0.
$$2^n = \varphi(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) = p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_r^{a_r - 1}(p_1 - 1)(p_2 - 1) \cdots (p_r - 1).$$
?
Exercise 12. (a) The initial purpose of discovering the function μ was to
find the D-inverse of ι, i.e., $\mu * \iota = \varepsilon$ and $S_\mu = \varepsilon$.
(b) The D-product $\mu * \mu$ is multiplicative since μ ∈ M, which reduces
the calculation to prime powers: $(\mu * \mu)(p^a) = \sum_{a_1 + a_2 = a} \mu(p^{a_1})\mu(p^{a_2})$. If
a ≥ 3, then we always have $a_1 \ge 2$ or $a_2 \ge 2$, yielding a 0 μ-value and hence
a 0 overall answer. The remaining cases (with a ≤ 2) are:
• $(\mu * \mu)(p) = \mu(1)\mu(p) + \mu(p)\mu(1) = -2$;
• $(\mu * \mu)(p^2) = \mu(1)\mu(p^2) + \mu(p)\mu(p) + \mu(p^2)\mu(1) = 1$.
Multiplying all answers for the prime powers $p^a$ produces the final formula:
$$(\mu^2)(n) = \begin{cases} (-2)^k & \text{if } n \text{ cube-free}; \\ 0 & \text{otherwise,} \end{cases}$$
where k counts the primes $p_i$ such that $p_i \mid n$ but $p_i^2 \nmid n$. Thus, $S_{\mu^2} = \mu$.
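The closed formula for the D-square of μ can be compared with a direct computation. The sketch below is only an illustration; the helper names are our own, μ² is read here as the D-product of μ with itself, and only n below 300 is tested.

```python
def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def mobius(n):
    exponents = factorize(n).values()
    if any(a >= 2 for a in exponents):
        return 0
    return (-1) ** len(exponents)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mu_star_mu(n):
    """D-product of mu with itself, straight from the definition."""
    return sum(mobius(d) * mobius(n // d) for d in divisors(n))

def closed_form(n):
    """(-2)^k for cube-free n, where k counts primes of exponent exactly 1."""
    exponents = list(factorize(n).values())
    if any(a >= 3 for a in exponents):
        return 0
    k = sum(1 for a in exponents if a == 1)
    return (-2) ** k

formula_ok = all(mu_star_mu(n) == closed_form(n) for n in range(1, 300))
sum_function_ok = all(
    sum(mu_star_mu(d) for d in divisors(n)) == mobius(n) for n in range(1, 300)
)
```

The second check confirms, on the same range, that the sum-function of the D-square of μ is μ itself.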
Exercise 13. Each relation f → Sf was verified somewhere in the text. ♦
Exercise 14. The explicit hint in the text leaves almost nothing to do:
$$S_{\mu f}(p^a) = \sum_{j=0}^{a} \mu(p^j) f(p^j) = \mu(1)f(1) + \mu(p)f(p) + 0 = f(1) - f(p);$$
$$(\mu * f)(p^a) = \sum_{j=0}^{a} \mu(p^j) f(p^{a-j}) = \mu(1)f(p^a) + \mu(p)f(p^{a-1}) + 0 = f(p^a) - f(p^{a-1}).$$
(c) The sum is Sμπ = (μπ) ι, which is probably its most compact
form. It is worth noting that, since μ “kills” any non-square-free d’s, the sum
depends only on the distinct primes p1 , p2 , . . . , pr dividing n; i.e., increasing
the exponents of prime powers does not change the value of the overall sum
(why?). Thus, Sμπ (n) = Sμπ (p1 p2 · · · pr ), and the latter can be expanded as
“a sum of sums” into a symmetric polynomial in the pi ’s. ♦
(d) The sum is the D-product μ π, which (in contrast to the previous
sum) does depend on the specific exponents of the prime powers dividing n.
Although a “closed form” seems to be out of question, here is an interesting
observation shifting the emphasis of the problem in a different direction:
Problem 17. (Evan O’Dorney) Show that $\sum_{d \mid n} \mu(d)\,\pi\!\left(\frac{n}{d}\right) > 0$ ∀n ∈ N.
Solution: Let $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} > 1$. For any subset S of $\{p_1, \ldots, p_r\}$
define $\pi(S) = \prod_{p_i \in S} p_i$ and $\pi(S + 1) = \prod_{p_i \in S} (p_i + 1)$ as the products of all
elements of S, or of the shifts up by 1 of all $p_i \in S$. As μ(d) = 0 for
non-square-free d’s, the only surviving terms μ(d)π(n/d) correspond to d = π(S)
for any such subset S. If |S| is the number of elements of S, the LHS of the
desired inequality can be written as
Ç åπ(S)π(S+1)
n 2
|S|
(−1) ,
S
π(S)
one term for each such subset S. The “hero term” (i.e., the biggest term) is
when S is empty¹³ (why?); it equals $n^{\tau(n)/2}$. The biggest “enemy term” (i.e.,
the most negative term) must occur for some singleton set (why?). WLOG
$S = \{p_1\}$, so that this term (with the negative sign dropped) is $(n/p_1)^{\tau(n/p_1)/2}$.
Taking into account that n ≥ 2, all $a_i \ge 1$, and $2^{r-2} \ge r - 1$ for r ≥ 1
(why?), we bound the hero term from below as follows:
$$n^{\tau(n)/2} = n^{(a_1+1)(a_2+1)\cdots(a_r+1)/2} = n^{(a_2+1)\cdots(a_r+1)/2} \cdot n^{a_1(a_2+1)\cdots(a_r+1)/2}$$
$$> 2^{2^{r-2}} \left(\frac{n}{p_1}\right)^{a_1(a_2+1)\cdots(a_r+1)/2} \ge 2^{r-1} \left(\frac{n}{p_1}\right)^{\tau(n/p_1)/2}.$$
In other words, the hero term is at least 2r−1 times as large as any enemy
term. But each enemy term corresponds to a subset S with an odd number
of elements, while each positive term to a subset S with an even number of
elements. A well-known combinatorial problem says the following:
Exercise 22. The number of subsets of a set with r elements is 2r . For
r ≥ 1, half of these subsets have an odd number of elements, and the other
half have an even number of elements.
Thus, there are exactly $2^{r-1}$ enemy terms, whose sum is already dominated
by the single hero term. We conclude that the whole sum $(\mu * \pi)(n) > 0$
for all n > 1. ♦
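The positivity claimed in Problem 17 can be observed numerically for small n. The sketch below is an illustration only: the helper names are our own, π(m) is computed naively as the product of the divisors of m, and the range n ≤ 200 is an arbitrary choice that proves nothing beyond itself.

```python
from math import prod

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    """Moebius function via trial division."""
    if n == 1:
        return 1
    r = 0
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:   # repeated prime factor: not square-free
                return 0
            r += 1
        p += 1
    if n > 1:
        r += 1
    return (-1) ** r

def divisor_product(n):
    """pi(n): the product of all divisors of n."""
    return prod(divisors(n))

def mobius_pi_sum(n):
    """Sum of mobius(d) * pi(n/d) over the divisors d of n."""
    return sum(mobius(d) * divisor_product(n // d) for d in divisors(n))

values = {n: mobius_pi_sum(n) for n in range(1, 201)}
```

For instance, n = 6 gives 36 − 3 − 2 + 1, where the hero term 36 comes from d = 1.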
¹³ Note that for S = ∅, the empty product $\prod_{p_i \in \emptyset}$ is defined to be 1.
8. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 255
However, for $n = p^2$:
$$(f^{-1} * f)(p^2) = f^{-1}(1)f(p^2) + f^{-1}(p)f(p) + f^{-1}(p^2)f(1) = \varepsilon(p^2) = 0$$
$$\Rightarrow f^2(p) - f^2(p) + f^{-1}(p^2) = 0 \Rightarrow f^{-1}(p^2) = 0.$$
Exercise 17. The cube-diagram on page 87 splits 10! into prime powers:
$$\phi(\phi(10!)) = \phi(\phi(2^8 3^4 5^2 7^1)) = \phi(\phi(2^8)\phi(3^4)\phi(5^2)\phi(7))$$
$$= \phi((2^8-2^7)(3^4-3^3)(5^2-5)(7-1)) = \phi(2^{11} 3^4 5^1)$$
$$= \phi(2^{11})\phi(3^4)\phi(5) = (2^{11}-2^{10})(3^4-3^3)\cdot 4 = 2^{13} 3^3.$$
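As a sanity check of the arithmetic, here is a small sketch that computes $\phi$ from the prime factorization by trial division. The function name `phi` is our own choice.

```python
from math import factorial


def phi(n):
    """Euler's totient via n * prod(1 - 1/p) over the primes p dividing n."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p   # multiply result by (1 - 1/p)
        p += 1
    if n > 1:                       # leftover prime factor
        result -= result // n
    return result


if __name__ == "__main__":
    inner = phi(factorial(10))
    print(inner, phi(inner))
```

The inner value should be $2^{11} 3^4 5$ and the outer one $2^{13} 3^3$, matching the chain above.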
PST 81. To reduce the number of cases to be checked, study whether a
specific prime participates in n, starting with the largest prime!
(d) For the second approach started in the text, consider again (23) and
PST 78. The RHS is divisible by 2, possibly by 4 (if p1 = 2), but never by 8
(why?). Correspondingly, the same must be true for the LHS. Each factor
pi − 1 is even, except for possibly p1 − 1 = 1 (if p1 = 2). We conclude that
there are at most two odd primes p2 and p3 , and possibly one even prime
p1 = 2. Further, since 3|LHS, then p2 = 3. So, the only possibilities are
3(2’
− 1)(3 − 1)(p÷ ! "3 , ⇒ p÷
3 − 1) = 2· 2·3· p
! "3
3 − 1 = 2· p
Exercise 20. Since $n$ and $n^k$ have identical prime divisors $\{p_i\}$, we obtain:
$$\phi(n^k) = n^k \prod_i \left(1 - \frac{1}{p_i}\right) = n^{k-1}\left(n \prod_i \left(1 - \frac{1}{p_i}\right)\right) = n^{k-1}\phi(n).$$
Exercise 21. (a) Since f ∈ M, we can split everything along the corre-
sponding prime powers and use (24). To establish the equality, it will be,
therefore, sufficient to compare only the resulting prime-power pieces:
$$(31)\qquad f(p_i^{b_i})\, f(p_i^{a_i}) \overset{?}{=} f\left(p_i^{\min\{a_i,b_i\}}\right) f\left(p_i^{\max\{a_i,b_i\}}\right).$$
This is true, as seen before: either $\min\{a_i, b_i\} = a_i$ and $\max\{a_i, b_i\} = b_i$, or
the other way around. Multiplying (31) for all $i$ gives the desired relation
$$f(m)f(n) = f\big((m, n)\big)\, f\big([m, n]\big).$$
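(b) We apply the formula for $\phi$, keeping in mind that the prime divisors
of $mn$ and of $[m, n]$ comprise the same set $\{p_1, p_2, \dots, p_r\}$ (why?):
$$(m, n)\,\phi([m, n]) \overset{\phi}{=} (m, n)[m, n] \prod_i \left(1 - \frac{1}{p_i}\right) \overset{(24)}{=} mn \prod_i \left(1 - \frac{1}{p_i}\right) \overset{\phi}{=} \phi(mn).$$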
Trivial examples that satisfy (b) in place of φ are O, ε, and id; but none of
ι, μ, τ , or σ works. In fact,
Exercise 24. For any $f \in M$ show that Exercises 20 and 21(b), with $f$ in
place of $\phi$, are equivalent:
$$f(n^k) = n^{k-1} f(n)\ \ \forall n, k \in \mathbb{N} \iff f(mn) = (m, n)\, f([m, n])\ \ \forall m, n \in \mathbb{N}.$$
Hint: Split along prime powers and rewrite for any prime $p$ as follows:
$$\frac{f(p^k)}{p^k} = \frac{f(p)}{p}\ \ \forall k \in \mathbb{N} \overset{?}{\iff} \frac{f(p^{a+b})}{p^{a+b}} = \frac{f(p^b)}{p^b}\ \ \forall a, b \in \mathbb{N},\ a \le b.$$
You can further simplify by substituting $g(n) = f(n)/n$ for all $n \in \mathbb{N}$. ♦
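Both properties are easy to probe numerically before attempting the proof. The sketch below is only an experiment over small ranges, with hypothetical helper names (`satisfies_power_rule`, `satisfies_gcd_lcm_rule`); passing it proves nothing, but a failure is a genuine counterexample.

```python
from math import gcd


def phi(n):
    """Euler's totient via trial-division factorization."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result


def tau(n):
    """Number of positive divisors of n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)


def lcm(a, b):
    return a * b // gcd(a, b)


def satisfies_power_rule(f, limit=30, max_k=4):
    """Check f(n^k) == n^(k-1) * f(n) for 1 <= n < limit and 1 <= k <= max_k."""
    return all(f(n ** k) == n ** (k - 1) * f(n)
               for n in range(1, limit) for k in range(1, max_k + 1))


def satisfies_gcd_lcm_rule(f, limit=40):
    """Check f(mn) == (m, n) * f([m, n]) for 1 <= m, n < limit."""
    return all(f(m * n) == gcd(m, n) * f(lcm(m, n))
               for m in range(1, limit) for n in range(1, limit))


if __name__ == "__main__":
    for name, f in [("phi", phi), ("id", lambda n: n), ("tau", tau)]:
        print(name, satisfies_power_rule(f), satisfies_gcd_lcm_rule(f))
```

For instance, $\tau$ already fails at $m = n = 2$, since $\tau(4) = 3$ while $(2,2)\,\tau([2,2]) = 4$.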
Problem 13. To use Gauss’s approach, we split all numbers from 1 to n
into pairs {t, n − t}. Note that t is relatively prime to n iff n − t is relatively
prime to n; thus, either both numbers t and n−t participate in the sum η(n),
or neither of them does. More good news: each pair adds up to n. Finally,
to avoid overcounting because of the pairing, we divide by 2 and skip writing
that each sum runs over all t’s such that (t, n) = 1 and 1 ≤ t ≤ n:
$$\eta(n) = \frac{1}{2}\sum \big(t + (n - t)\big) = \frac{1}{2}\sum n = \frac{1}{2}\, n \sum 1 \overset{\text{def. } \phi}{=} \frac{1}{2}\, n\phi(n).$$
For the alternative combinatorial approach suggested in the text, let’s cal-
culate a few preliminary sums. If not indicated, all t’s run from 1 to n;
{p1 , p2 , . . . , pr } are, as usual, the prime factors of n, and d is a divisor of n.
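The basic “Gauss” sum-formula is $\sum_{t=1}^{n} t = \frac{n(n+1)}{2}$. If we restrict the sum to
multiples $t$ of $d$, we can write $t = dq$ for $q = 1, 2, \dots, n/d$ and add up:
$$\sum_{d \mid t} t = \sum_{q=1}^{n/d} dq = d \sum_{q=1}^{n/d} q \overset{\text{Gauss}}{=} d \cdot \frac{\frac{n}{d}\left(\frac{n}{d} + 1\right)}{2} = \frac{n(n + d)}{2d} = \frac{n^2}{2} \cdot \frac{1}{d} + \frac{n}{2}.$$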
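If, say, $d = p_i p_j p_k$, recall that $\binom{r}{3}$ counts the triples $\{p_i, p_j, p_k\}$, so that
$$\sum_{i<j<k}\ \sum_{p_i p_j p_k \mid t} t = \sum_{i<j<k}\ \sum_{d \mid t} t = \frac{n^2}{2} \sum_{i<j<k} \frac{1}{p_i p_j p_k} + \binom{r}{3} \frac{n}{2}.$$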
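$$= \frac{n^2}{2} \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right) + \frac{n}{2}(1 - 1)^r = \frac{n}{2}\left(n \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right)\right) + 0 = \frac{n\phi(n)}{2}.\ ♦$$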
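The closed form for $\eta(n)$ is easy to test directly. The sketch below uses brute-force definitions (our own helper names `eta` and `phi`) and compares $2\eta(n)$ with $n\phi(n)$. Note that the pairing argument needs $n \ge 2$: for $n = 1$ the single term $t = 1$ is paired with $n - t = 0$, and indeed $\eta(1) = 1 \ne \frac{1}{2}$.

```python
from math import gcd


def eta(n):
    """Sum of all t in [1, n] that are relatively prime to n."""
    return sum(t for t in range(1, n + 1) if gcd(t, n) == 1)


def phi(n):
    """Number of t in [1, n] that are relatively prime to n."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)


if __name__ == "__main__":
    for n in range(2, 13):
        print(n, eta(n), n * phi(n) // 2)
```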
As we saw for φ, all entries in the same column here are congruent modulo a
(that is, have the same remainder when divided by a). Indeed, since ja + i ≡
i (mod a), applying standard algebraic properties of remainders (cf. Number
Theory I), we have in the ith column:
$$(32)\qquad f(ja + i) = \sum_m \alpha_m (ja + i)^m \equiv \sum_m \alpha_m i^m \pmod{a} = f(i).$$
[Figure: the values 1, 7, 15, 25, 37, ..., 987 of f(j) written row by row in two tables, one with 6 columns and one with 5 columns; arrows (↑) mark selected columns.]
By definition, $\lambda_f(p^a)$ counts all $f(j)$ with $\gcd(f(j), p^a) = 1$, which is equivalent
to $\gcd(f(j), p) = 1$. Further, each column in Figure 7 consists (again!) of
entries congruent to each other modulo $p$: $f(jp + i) \equiv f(i) \pmod{p}$ (cf. (32)
with $a = p$). Thus, every row contains the same number of entries relatively
prime to $p^a$ as does the first row. As there are $p^{a-1}$ rows and the first row has
$\lambda_f(p)$ entries relatively prime to $p$, we conclude that $\lambda_f(p^a) = p^{a-1}\lambda_f(p)$.
$$\Rightarrow\ \frac{\lambda_f(n)}{n} = \prod_{i=1}^{r} \frac{\lambda_f(p_i)}{p_i}.$$
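To see the product formula in action, one can count directly. The sketch below assumes that $\lambda_f(n)$ counts the indices $j \in \{1, \dots, n\}$ with $\gcd(f(j), n) = 1$, and it uses the sample polynomial $f(j) = j^2 + 3j - 3$, chosen only because it reproduces the values 1, 7, 15, 25, 37, ... seen in the tables; both are assumptions made for illustration.

```python
from math import gcd


def f(j):
    """Sample integer polynomial (an assumption made for this illustration)."""
    return j * j + 3 * j - 3


def lam(n, poly=f):
    """lambda_f(n): number of j in [1, n] with gcd(poly(j), n) == 1."""
    return sum(1 for j in range(1, n + 1) if gcd(poly(j), n) == 1)


if __name__ == "__main__":
    print(lam(3), lam(27))            # lambda_f(p^a) versus p^(a-1) * lambda_f(p)
    print(lam(3), lam(5), lam(15))    # multiplicativity across distinct primes
```

Here $\lambda_f(3) = 2$ and $\lambda_f(5) = 3$, so the formula predicts $\lambda_f(27) = 18$, $\lambda_f(25) = 15$, and $\lambda_f(15) = 6$.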
Sneak Preview. In this final part, we will see a different kind of application
of monovariants. Previously we used monovariants instrumentally, to prove that
certain situations will or will not be reached. Now we will look at problems
where the monovariant itself is, explicitly, the whole point of the problem. This
specifically happens with inequality problems. We will smooth or unsmooth them,
and relying on convex functions, we will prove the famous relations between means
from Inequalities I. Applications to Olympiad problems will bring us full circle
to Monovariants I, to gain a deeper understanding of its signature problem on
mansion walks. As a bonus, the reader familiar with limits and continuity will
construct GPS devices to extend our techniques to endless smoothing processes.
While a review of Inequalities I is optional but highly recommended, a working
knowledge of induction is essential in this session.
Problem 1. (Balkan ’98, [3]) If $n \ge 2$ and $0 < a_1 < \cdots < a_{2n+1}$, prove
$$\sqrt[n]{a_1} - \sqrt[n]{a_2} + \sqrt[n]{a_3} - \cdots + \sqrt[n]{a_{2n+1}} < \sqrt[n]{a_1 - a_2 + a_3 - \cdots + a_{2n+1}}.$$
2.1. Striving for the golden median. Let’s illustrate the technique by
proving one of the most famous classic inequalities, AM-GM, known to the
prolix by its full name, the Arithmetic Mean–Geometric Mean inequality.1
Problem 2. (AM-GM) Let x1 , x2 , . . . , xn be positive numbers. Prove that
$$(1)\qquad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n},$$
with equality if and only if all the xi ’s are equal.
Our approach is going to be to gradually change the values of the xi ’s in
a way that keeps their sum constant and makes their product increase. At
the end of the process, all xi ’s will be equal, and therefore (1) will hold with
equality. Since the product of the xi ’s has only increased throughout this
process, we can conclude that (1) was true initially.
Proof: Let x = (x1 + · · · + xn )/n be the average of the xi ’s. If all xi ’s are
equal to x, then (1) reads x ≥ x, which holds with equality as required.
Now suppose not all xi ’s are equal to x. They cannot all be less than or
equal to x (since then their average would be < x), so one of them must be
greater than x; call it x + a. Similarly, one of them must be less than x; call
it x − b (with a, b > 0).
It follows from the third bullet point that this replacement process, per-
formed repeatedly, must eventually come to an end (namely, after no more
than n repetitions), with all xi ’s equal to x, and at this point (1) holds with
equality. But throughout the replacement process, the LHS of (1) has stayed
the same and the RHS has strictly increased. Therefore, it follows that for
the original xi ’s, (1) must have been true with strict inequality.
This solution is an example of a very general, intuitive technique for
proving inequalities that was outlined in PST 82 and is made specific here:
PST 83. To prove an inequality with several variables gradually replace the
(values of the) variables with others to reach the equality case (or another
convenient extreme case) in such a way that one side of the inequality stays
the same, and the other side always increases or always decreases (e.g., the
sum stays the same while the product increases).
This technique is informally called smoothing, or unsmoothing when it
involves pulling variables apart instead of bringing them together. The
example above was one of smoothing. At least one of the later exercises in this
section will use unsmoothing.
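The smoothing step from the AM-GM proof can be run mechanically. The following sketch (with our own function name `smooth_to_average`) repeatedly replaces a number $x + a$ above the average and a number $x - b$ below it by $x$ and $x + a - b$, using exact fractions so that the sum can be checked exactly; it records every intermediate list.

```python
from fractions import Fraction
from math import prod


def smooth_to_average(values):
    """Run the AM-GM smoothing step until all numbers equal the average.

    Returns the list of all intermediate states, starting with the input.
    """
    xs = [Fraction(v) for v in values]
    mean = sum(xs) / len(xs)
    history = [list(xs)]
    while any(x != mean for x in xs):
        hi = next(i for i, x in enumerate(xs) if x > mean)   # some x + a
        lo = next(i for i, x in enumerate(xs) if x < mean)   # some x - b
        # (x + a, x - b) -> (x, x + a - b): same sum, larger product
        xs[hi], xs[lo] = mean, xs[hi] + xs[lo] - mean
        history.append(list(xs))
    return history


if __name__ == "__main__":
    for state in smooth_to_average([1, 2, 3, 10]):
        print([str(x) for x in state], "sum =", sum(state), "product =", prod(state))
```

Starting from 1, 2, 3, 10 the products run through 60, 168, 240, 256 while the sum stays 16, and the process stops after three steps.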
Try smoothing the Arithmetic Mean–Harmonic Mean inequality:
Exercise 1. (AM-HM) If x1 , x2 , . . . , xn are positive numbers, prove that
$$(2)\qquad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}},$$
with equality if and only if all xi ’s are equal. Can you also derive this
inequality directly as a consequence of AM-GM?
2.2. Endless smoothing. There are many ways in which to smooth an
inequality, i.e., to bring the variables closer together.
Exercise 2. Suppose that in the preceding proof of AM-GM we instead try
to smooth in a perhaps more obvious way: if not all the numbers are equal
to their average x, then take two numbers xi < x and xj > x, and replace
them both by their average (xi + xj )/2 (cf. Fig. 1b). What if we average any
pair of numbers xi and xj , not necessarily coming from the opposite sides
of x? What goes wrong with the proof in either situation?
This shows that one has to be a little careful in choosing the smoothing
operation. If it’s not clear right away why the above doesn’t work, doing the
next two exercises – one concrete and the other more abstract – might help
give you a hint.
Exercise 3. In the set up of Exercise 2,
(a) Given any $2^n$ numbers, show that there is a way to equalize all numbers
after a finite number of replacement operations.
(b) Given any 3 numbers, if they are not all equal after the first replacement
operation, prove that they will never all be equal.
266 11. MONOVARIANTS. PART III
Now, if you want to still make the smoothing in Exercise 2 work, you
resort to the idea that the process may go on forever and compensate for
this by use of limits (cf. Appendix on Limits); or you may apply a cool trick:
add a bunch of extra terms equal to the average x so that the total number
of terms becomes a power of 2, then apply Exercise 3(a) to smooth them all
to x, and finally drop the extra terms. Check that this works!
hypothesis (IH ).
This exercise indicates how smoothing is really
logically equivalent to a kind of induction: a finite
induction, which eventually ends (as suggested by
the picture on the left). Smoothing is nonetheless
useful as a conceptual tool.
The discussion of smoothing applications also fully rounds out our dis-
cussion of monovariants. We started out Monovariants I with examples of
problems where the steps are given, and you need to come up with a mono-
variant. In using smoothing to prove inequalities, the monovariant is more
or less given, and you need to come up with the appropriate steps.
3. Rearranging Terms
Now let n > 2 and suppose the statement has been proven for n − 1. If
y1 = z1 , then we can simply apply the induction hypothesis to the remaining
n − 1 variables x2 , . . . , xn and y2 , . . . , yn . So suppose y1 = zk for some k > 1.
Since x1 ≥ xk and zk ≥ z1 , we can apply the n = 2 case to these four
variables, switching z1 and zk (x1 z1 + xk zk ≤ x1 zk + xk z1 ) and thereby
increasing the LHS:
LHS = x1 z1 + x2 z2 + · · · + xk−1 zk−1 + xk zk + xk+1 zk+1 + · · · + xn zn
PST 85. The advantage of using only transpositions, as in the proof of RI,
is that we need to keep track of what happens to both sides of the inequality,
and having the simplest steps makes this possible and easy.
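[Figure 2: graphs labeled 1/x, −ln x, and f, with the points x, (x+y)/2, y marked on the horizontal axis in the first panels and the points A, B, C, D in the last one.]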
But we didn’t need convex functions before when we proved the AM-
GM, AM-HM, or the Rearrangement inequality! Were convex functions
there? Yes, in each and every solution so far that incorporated smooth-
ing/unsmoothing or rearranging of terms convexity quietly stood in the back-
ground and was the reason that our arguments worked!
If you don’t believe it, let’s take another . . .
2 Technically speaking, $f(x)$ is continuous at $x = a$ if $\lim_{x \to a} f(x) = f(a)$.
3 See the Appendix on Limits for a proof of the Midpoint Rule. There are also other,
more technical ways to investigate whether functions are convex or not, for example, via
derivatives, which we won’t use in this session.
4. CONVEXITY AND SMOOTHING 269
4.1.1. Convex look at AM-HM. The key smoothing step in Exercise 1 was to
replace two of the numbers, $x_1 = x - b$ and $x_2 = x + a$, by $x + a - b$ and the
average $x$ of our $n$ positive $x_i$’s. This kept the sum $x_1 + x_2$ constant,
but it also “miraculously” decreased the sum of the reciprocals $\frac{1}{x_1} + \frac{1}{x_2}$, which
is what we needed for our monovariant argument to go through. The convex
function used was none other than $f(x) = \frac{1}{x}$ for $x > 0$.
This precise smoothing situation is so common that we phrase it as:
Lemma 1. (Smoothing) If f (x) is a convex function on interval I, and
A < B < C < D are numbers in I such that the middle two are equidistant
from the end ones, i.e., B − A = D − C, then
f (B) + f (C) ≤ f (A) + f (D).
Moreover, if f (x) is strictly convex, then the inequality above is strict.
Concisely put, bringing the inputs closer together (as {A, D} → {B, C}
in Fig. 2c) decreases the sum of the outputs of a convex function. For the
novice, it will be a worthwhile experience to attack this Smoothing Lemma
about four points by the definition of convexity, which relates only three
points at a time.
PST 86. If you are given a statement P1 about k objects, but you are trying
to prove some other statement P2 about n objects (where n > k), select all
or several suitable k-element subsets out of the n objects, apply P1 to each
subset, and then bring together your results in P2 by adding, multiplying, or
performing some other such symmetric operation.
Hint for Smoothing Lemma 1: It turns out that of the four possible
triplets of points from {A, B, C, D} only two will suffice (hinted by Fig. 2c):
you just have to choose them symmetrically, apply the definition of convexity
to each triplet, and then add up your inequalities. The geometry-oriented
reader may want to find a fast trapezoidal explanation. ♦
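Before proving the Smoothing Lemma, it may help to watch it at work numerically. In the sketch below (function name `smoothing_gap` is ours), the inner points are obtained by moving both ends inward by the same amount $s$, so that $B - A = D - C = s$ automatically; the returned gap $f(A) + f(D) - f(B) - f(C)$ should be nonnegative for a convex $f$.

```python
import math


def smoothing_gap(f, A, D, s):
    """Return f(A) + f(D) - f(B) - f(C), where B = A + s and C = D - s.

    Requires 0 < s < (D - A) / 2 so that A < B < C < D.
    Lemma 1 predicts a nonnegative result whenever f is convex on [A, D].
    """
    if not 0 < s < (D - A) / 2:
        raise ValueError("need 0 < s < (D - A) / 2")
    B, C = A + s, D - s
    return f(A) + f(D) - f(B) - f(C)


if __name__ == "__main__":
    print(smoothing_gap(lambda x: x * x, 1, 4, 1))        # convex
    print(smoothing_gap(lambda x: 1 / x, 1, 4, 1))        # convex on x > 0
    print(smoothing_gap(lambda x: -math.log(x), 1, 4, 1)) # convex on x > 0
    print(smoothing_gap(math.sqrt, 1, 4, 1))              # concave: sign flips
```

For a concave function such as $\sqrt{x}$ the gap comes out negative, which is the same lemma applied to $-f$.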
Since the LHS (AM) remains constant, and from the above the RHS (HM)
increases (why?), our monovariant argument for AM-HM works out.
4.1.2. Convex look at AM-GM. To locate the convex function behind the
proof of AM-GM requires some manipulation of the inequality. The key
smoothing step was the same as in the AM-HM proof: to replace x1 = x + a
and x2 = x − b by x + a − b and the average x of the given n non-negative
numbers. This kept the sum x1 + x2 constant, but how do we explain that
it also increased the product x1 x2 ?
Proof of weighted JI: Take two variables that are not equal, say,
$x_1 \ne x_2$, and replace each by their weighted average $\tilde{x} = \frac{\lambda_1}{\lambda_1 + \lambda_2} x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} x_2$.
Here we divided the original weights λ1 and λ2 by (λ1 + λ2 ) in order to make
the new weights of x1 and x2 sum to 1. From the definition of a convex
function on [x1 , x2 ] (or [x2 , x1 ]):
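$$\frac{\lambda_1}{\lambda_1 + \lambda_2} f(x_1) + \frac{\lambda_2}{\lambda_1 + \lambda_2} f(x_2) \ge f\left(\frac{\lambda_1}{\lambda_1 + \lambda_2} x_1 + \frac{\lambda_2}{\lambda_1 + \lambda_2} x_2\right) = f(\tilde{x})$$
$$\overset{*(\lambda_1 + \lambda_2)}{\Longrightarrow}\ \lambda_1 f(x_1) + \lambda_2 f(x_2) \ge (\lambda_1 + \lambda_2) f(\tilde{x}) = \lambda_1 f(\tilde{x}) + \lambda_2 f(\tilde{x}),$$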
which shows that the RHS of the weighted JI decreased. Meanwhile, the
weighted average of all numbers did not change:
(λ1 x1 + λ2 x2 ) + λ3 x3 + · · · + λn xn = (λ1 x̃ + λ2 x̃) + λ3 x3 + · · · + λn xn ,
so the LHS stayed constant. This provides the intended smoothing argument,
alas, possibly never ending!
However, something more happened: not only have we replaced each
x1 and x2 by x̃, but we can actually combine these two variables into one
variable x̃, with weight λ̃ = λ1 + λ2 , and prove instead the inequality:
$$f(\tilde{\lambda}\tilde{x} + \lambda_3 x_3 + \cdots + \lambda_n x_n) \overset{?}{\le} \tilde{\lambda} f(\tilde{x}) + \lambda_3 f(x_3) + \cdots + \lambda_n f(x_n),$$
where λ̃ + λ3 + · · · + λn = 1. So, there are actually one invariant and two
monovariants anchoring this proof:
• the constant LHS and the decreasing RHS;
• the total number of variables (and not the number of variables equal
to the average), which decreases by 1 at every step.
At the end, we are left with only one variable or with all variables equal to
each other, both of which cases are trivially true. Backtracking, equality is
obtained iff all original variables with non-zero weights are equal (why?).
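The merging argument can be traced on a concrete example. The sketch below (our own function name `merge_until_one`) always merges the first two variables into their weighted average with the combined weight, and records the RHS $\sum \lambda_i f(x_i)$ after every step. It assumes positive weights summing to 1 and uses exact fractions; with $f(x) = x^2$ the recorded values should never increase and should end at $f$ of the weighted average, i.e., at the LHS.

```python
from fractions import Fraction


def merge_until_one(weights, xs, f):
    """Merge variables two at a time, as in the proof of weighted JI.

    Returns (rhs_values, final_x): the RHS after each step and the single
    remaining variable, which equals the weighted average of the inputs.
    """
    ws = [Fraction(w) for w in weights]
    xs = [Fraction(x) for x in xs]
    rhs_values = [sum(w * f(x) for w, x in zip(ws, xs))]
    while len(xs) > 1:
        merged_weight = ws[0] + ws[1]
        merged_x = (ws[0] * xs[0] + ws[1] * xs[1]) / merged_weight
        ws = [merged_weight] + ws[2:]
        xs = [merged_x] + xs[2:]
        rhs_values.append(sum(w * f(x) for w, x in zip(ws, xs)))
    return rhs_values, xs[0]


if __name__ == "__main__":
    weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    rhs, x_final = merge_until_one(weights, [1, 2, 6], lambda x: x * x)
    print([str(v) for v in rhs], str(x_final))
```

With weights $\frac12, \frac13, \frac16$ and values 1, 2, 6 the RHS drops from $\frac{47}{6}$ to $\frac{229}{30}$ to $\frac{169}{36} = f\!\left(\frac{13}{6}\right)$.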
In Inequalities I we showed that the weighted JI implies the weighted
versions of AM-GM, AM-HM, and other inequalities among means. With
our new understanding of convex functions and smoothing techniques, the
reader may want to redo these proofs here. For “extra credit,”
Problem 6. Invent other problems that can be solved by (weighted) JI.
$$(7)\qquad \begin{aligned}
x_1 &\ge y_1;\\
x_1 + x_2 &\ge y_1 + y_2;\\
&\ \ \vdots\\
x_1 + \cdots + x_{n-1} &\ge y_1 + \cdots + y_{n-1};\\
x_1 + \cdots + x_{n-1} + x_n &= y_1 + \cdots + y_{n-1} + y_n.
\end{aligned}$$
4.3.1. Happy endings that could happen. In two cases the HLP situation will
be majorly simplified:
Happy ending 1. All xi are equal. Then the last equality of (7) yields:
nx1 ≤ ny1 , i.e., x1 ≤ y1 . Combining with the first inequality x1 ≥ y1 , we
conclude x1 = y1 . Cancelling both x1 and y1 from all inequalities, we remain
in the same situation but for only n − 1 numbers. Continuing inductively,
we arrive at xi = yi for all i, and then HLP follows trivially.
Happy ending 2. An inequality in (7) is an equality: x1 + · · · + xk =
y1 + · · · + yk for some k ≤ n − 1. We can then restrict the problem to
the first k inequalities and variables. Moreover, canceling x1 + · · · + xk
and y1 + · · · + yk from both sides of the remaining inequalities, we again
arrive at the HLP problem but only for the sequences {xk+1 , . . . , xn } and
{yk+1 , . . . , yn }. Applying induction on the number of variables, we conclude
that HLP works for the first k variables, and also for the last n − k variables:
f (x1 ) + f (x2 ) + · · · + f (xk ) ≥ f (y1 ) + f (y2 ) + · · · + f (yk ),
f (xk+1 ) + f (xk+2 ) + · · · + f (xn ) ≥ f (yk+1 ) + f (yk+2 ) + · · · + f (yn ).
Summing, we obtain the HLP inequality for all n variables.
[Figure: number-line diagrams of the smoothing operations on the points $x_n, \dots, x_k, x_{k-1}, \dots, x_2, x_1$, with shifts labeled $-a$ and $+(k-1)a$.]
4.3.3. Weaving inductively the proof of HLP. We now have all pieces to put
together: we know where we would like to end, and we know how to get
there. Naturally, we use induction on n. The HLP inequality is trivially true
for n = 1, since then x1 = y1 . Suppose HLP is true for n − 1 variables.
For n variables, we keep applying our Smoothing 1 or 2 operations until
we make all variables equal to each other, or until one of the majorization
(7) inequalities (other than the nth one) becomes an equality. These two
situations were addressed before, and each ends happily.
A “bifurcation” phenomenon persisted throughout our solution of HLP:
there were two happy endings and two smoothing procedures to get there.
PST 87. While it is often possible to construct a smoothing operation lead-
ing eventually to all variables being equal, some inequalities call for alter-
native smoothing operations that lead to other favorable outcomes. In such
problems, you have to simultaneously take into account two or more scenarios
throughout the induction (or smoothing) process.
Now that we have proven HLP,
Exercise 8. Can you recognize the Smoothing and the Multi-smoothing
Lemmas as special cases of HLP?
For practice, explain why HLP implies the following:
Corollary 1. (HLP for Products) If {x1 , . . . , xn } majorizes {y1 , . . . , yn }
on interval I and all xi ’s and yj ’s are positive, then x1 x2 · · · xn ≤ y1 y2 · · · yn .
Equality is attained if and only if xi = yi for all i.
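Majorization and both forms of HLP are easy to test on examples. The sketch below assumes that the conditions (7) are meant for sequences listed in non-increasing order, so it sorts its inputs first; the function name `majorizes` is ours, and integer inputs are used to keep the comparisons exact.

```python
from math import prod


def majorizes(xs, ys):
    """True if xs majorizes ys in the sense of (7), after sorting both
    sequences in non-increasing order (an assumption about the setup)."""
    if len(xs) != len(ys) or not xs:
        return False
    xs = sorted(xs, reverse=True)
    ys = sorted(ys, reverse=True)
    partial_x = partial_y = 0
    for x, y in zip(xs[:-1], ys[:-1]):
        partial_x += x
        partial_y += y
        if partial_x < partial_y:          # one of the first n-1 inequalities fails
            return False
    return partial_x + xs[-1] == partial_y + ys[-1]   # total sums must agree


if __name__ == "__main__":
    xs, ys = [6, 3, 1], [4, 3, 3]
    print(majorizes(xs, ys))
    print(sum(x * x for x in xs), sum(y * y for y in ys))   # HLP with f(x) = x^2
    print(prod(xs), prod(ys))                               # HLP for products
```

Here $\{6, 3, 1\}$ majorizes $\{4, 3, 3\}$, and accordingly $46 \ge 34$ for the sums of squares while $18 \le 36$ for the products.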
5. RANDOM FUN WITH SMOOTHING 275
Here are two beautiful Olympiad problems that will challenge us to com-
bine old monovariant ideas in creative new ways. We will only discuss how
to link the problems to what we have already learned, and leave it to the
reader to “smooth out” (pun intended) all arguments into complete solutions.
4 Besides, the inequality doesn’t make sense with an even number of variables; e.g.,
$\sqrt{1} - \sqrt{2} \overset{?}{<} \sqrt{1 - 2} = \sqrt{-1} = i$. . . . Nope, complex numbers cannot be compared like that!
not decrease it. This is where the mansion problem comes in; it suggests
that we should try an unsmoothing operation.
There are minor differences between the two problems, all resolvable:
• Before, the ai ’s were integers ≥ 0. Now they are any real numbers.
• Before, we could only make changes of (−1, +1) to pairs ai < aj . Now
the unsmoothing change (−a, +a) is allowed for any a > 0.
• Before, the monovariant eventually came to a full stop, simply because
it had only finitely many possible values. But now, we have continuous
variables $a_k$ and, therefore, infinitely many values for $\sum a_i^2$. How do we
make the monovariant stop changing?
We need another, discrete monovariant to put the brakes on the process.
Recall the goal of the problem: to show that some $a_i \ge 2$. Assuming to the
contrary that all $a_i < 2$, create an unsmoothing operation that increases at
each step the number of $a_i$’s $\ge 2$. ♦
In both the mansion and USAMO ’99 problems the sum of squares acted
as a “concentration” monovariant. From our discussion of convex functions,
we know that we don’t have to use squares. In the latter problem the mono-
variant is, at least hypothetically, in danger of continuing to increase for-
ever, so we helped it with an auxiliary monovariant. These ideas are general
enough to be written out as PSTs.
PST 88. If you have a collection of numbers xi whose sum stays constant,
and need a monovariant that increases when the numbers become more
“spread out,” try using the sum of their squares. More generally, you can
try using the sum of f (xi ) where f is any strictly convex function.
PST 89. If the monovariant is continuous (or can take on infinitely many
values), create another, discrete monovariant (e.g., the number of variables
with some specific property) that will cause the smoothing process to end.
For fun and to understand better the technique of smoothing:
Exercise 9. Redo the problems from Monovariants I about gender bal-
ance and hybrid mansions, leaping frogs along collinear lilies or in a circular
swamp, and simultaneous switches. (The images below should help bring
on a flashback.) Identify explicitly the convex functions and the smoothing
operations used in the solutions. Create more exercises of the same type.
Exercise 10. Let $\{a_k, b_k, c_k\}$ be the three numbers after performing $k$ steps
of the pairwise averaging in (8). Then
$$\lim_{k \to \infty} a_k = \lim_{k \to \infty} b_k = \lim_{k \to \infty} c_k = x.$$
There are many ways to prove this, but perhaps the most easily gener-
alized way rests on a standard “sandwich” idea:
PST 90. To see why several sequences converge to the same limit $x$, set $d_k$
to be the maximal distance between $x$ and all the numbers after the $k$th step.
Show that $\lim_{k \to \infty} d_k = 0$ to force all sequences to converge to $x$.
Solution to Exercise 10: The process averages only pairs of numbers
that come from opposite sides of x (why?), i.e., if ak < x < bk , then ak
and bk each go to (ak + bk )/2. Suppose ak is further away from x than bk ,
i.e., $d_k = |x - a_k|$. Using the notation in Figure 4b, the simple geometric
argument $CX < CB = \frac{1}{2} AB < AX = d_k$ shows that $a_k$ shortened its
distance to $x$ by a factor of at least 2. Applying the averaging once more
will bring in the third number at least twice as close to $x$ as it was before.
To summarize, $d_k$ decreases at each step and gets at least halved every
other step, i.e., $d_{k+2} \le d_k/2$. This results in $\lim_{k \to \infty} d_k = 0$, and by PST 90
all three sequences $a_k$, $b_k$, and $c_k$ converge to $x$.
With four or more numbers, however, there are choices for the order of
pairs to average, and if we are not careful, our numbers may not converge
to the same place! Using a distance monovariant again,
6. APPENDIX ON LIMITS AND ENDLESS SMOOTHING 279
Exercise 11. Devise an algorithm for pairwise averaging of $x_1, x_2, \dots, x_n$
that forces them to approach their arithmetic average $x = \left(\sum_i x_i\right)/n$.
We will refer to such an algorithm as a good pairwise smoothing, or for
short, a GPS directing all numbers $x_1, x_2, \dots, x_n$ towards their average $x$.
6.2. Limits tame inequalities. To see how this is useful in working with
inequalities, suppose you want to prove something as general as:
(9) LHS = F (x1 , x2 , . . . , xn ) ≤ G(x1 , x2 , . . . , xn ) = RHS
for two functions $F$ and $G$, continuous for all $x_i$ in some interval $I$. Suppose
further that under pairwise averaging of the inputs $(x_i, x_j) \to \left(\frac{x_i + x_j}{2}, \frac{x_i + x_j}{2}\right)$,
$F$ increases and $G$ decreases. Using a GPS, we have $\lim_{k \to \infty} x_i = x$ for all $i$,
where the steps of the GPS are indexed by $k$.⁵ By continuity of $F$ and $G$ and
properties of limits, we are left to prove only the middle inequality below:
$$\text{LHS} \le \lim_{k \to \infty} F(x_1, \dots, x_n) = F(x, \dots, x) \overset{?}{\le} G(x, \dots, x) = \lim_{k \to \infty} G(x_1, \dots, x_n) \le \text{RHS}.$$
continuous functions, i.e., it lifts the smaller side up and lowers the larger
side down, then all you need to show is that the inequality is true when all
variables are equal.
This simplifies the proof of some inequalities we have encountered so far.
If you are bothered by the continuity condition, rest assured that:
Lemma 4. Any composition of the four arithmetic operations ±, ×, ÷, the
algebraic operations of raising to a power, and any of the standard continuous
functions such as exponential, logarithmic, or trigonometric functions, is
continuous on any interval where this composition is well-defined.
6.3. Infinite pairwise smoothing is useful, after all! With this sea of
continuous functions, we can attack a number of inequalities. Keep in mind
that any convex function on interval I is necessarily continuous on I (why?).
Partial Proof: Since $P_r = \sqrt[r]{\frac{x_1^r + \cdots + x_n^r}{n}}$ is continuous for $x_i > 0$ and
$P_r = P_1$ for equal inputs, when $r > 1$ the proof of $P_1 \le P_r$ boils down to
showing that $P_r$ strictly decreases under pairwise smoothing for $x \ne y$:
$$x^r + y^r \overset{?}{>} 2\left(\frac{x + y}{2}\right)^r \iff \sqrt[r]{\frac{x^r + y^r}{2}} \overset{?}{>} \frac{x + y}{2}.$$
5 Strictly speaking, each $x_i$ should also be indexed by $k$ since $x_i$ changes with the steps.
The latter is the original inequality P1 < Pr but only for two variables, which
is a substantial reduction brought about by the pairwise smoothing.
Setting f (x) = xr and I = (0, ∞), we can rewrite the last inequality as:
$$(10)\qquad f\left(\frac{x + y}{2}\right) \overset{?}{<} \frac{f(x) + f(y)}{2} \quad \text{for all } x \ne y \text{ in } I.$$
“Surprise!” This is the Midpoint Rule for the strictly convex f (x) on I! ♦
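A quick floating-point experiment illustrates both halves of this argument: the power mean $P_r$ grows with $r$, and for $r > 1$ it drops when two unequal inputs are replaced by their average. The helper name `power_mean` is ours, and the numbers are an arbitrary example.

```python
import math


def power_mean(xs, r):
    """P_r = ((x_1^r + ... + x_n^r) / n)^(1/r) for positive xs and r != 0."""
    return (sum(x ** r for x in xs) / len(xs)) ** (1 / r)


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    smoothed = [2.5, 2, 3, 2.5]      # 1 and 4 replaced by their average
    for r in (1, 2, 3):
        print(r, power_mean(xs, r), power_mean(smoothed, r))
```

The arithmetic mean $P_1$ is unchanged by the smoothing step, while $P_2$ and $P_3$ both decrease toward it.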
6.4. Why is the Midpoint Rule (MR) true? Taking (10) as given, we
need to show the definition of a strictly convex function, i.e.,
$$(11)\qquad f(\lambda x + (1 - \lambda)y) \overset{?}{<} \lambda f(x) + (1 - \lambda) f(y) \quad \text{for any } \lambda \in (0, 1).$$
Starting with x and y, we have to get to x̃ = λx +(1−λ)y, their weighted
average, by using only ordinary averages of pairs so that we can apply MR
at each step. Another (weighted) GPS construction is in order here:
Lemma 5. There is a sequence $\{a_n\}$ inside interval $(x, y)$ that converges to $\tilde{x}$
so that each $a_n$ is midway between some previous $a_i$’s, including possibly $x$
and $y$. If $a_n = \lambda_n x + (1 - \lambda_n)y$ for some $\lambda_n \in (0, 1)$ then $\lim_{n \to \infty} \lambda_n = \lambda$.
Exercise 1. Suppose we want to run smoothing with the same idea as in the
proof of AM-GM: fix the sum of the xi ’s and increase the other side. Thus,
we replace a pair of numbers x + a and x − b not equal to the average x by x
and x + a − b. We know that their sum remains constant and their product
(x + a)(x − b) increases to x(x + a − b). But we also need to figure out what
happens to the sum of their reciprocals in the denominator of the RHS:
$$(13)\qquad \frac{1}{x + a} + \frac{1}{x - b} = \frac{(x + a) + (x - b)}{(x + a)(x - b)} > \frac{x + (x + a - b)}{x(x + a - b)} = \frac{1}{x} + \frac{1}{x + a - b}.$$
Adding the other unchanged $\frac{1}{x_i}$’s and reciprocating (13) flips the sign of the
inequality and forces the RHS to increase under our operation. Just as in the
proof of AM-GM, the number of variables xi equal to the average x increases
at each step, i.e., after at most n repetitions all variables will be equal to x
and equality will be then obtained. Hence, the original AM-HM inequality
must have been true, with equality iff all variables are equal.
It is possible to also deduce AM-HM from two applications of AM-GM.
Rewrite equivalently the desired AM-HM by pulling everything to the LHS:
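$$\frac{x_1 + x_2 + \cdots + x_n}{n} \cdot \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} \overset{?}{\ge} 1.$$
Now, AM-GM applied separately to $x_1, x_2, \dots, x_n$ and to $\frac{1}{x_1}, \frac{1}{x_2}, \dots, \frac{1}{x_n}$ yields
two geometric means that cancel each other:
$$\frac{x_1 + \cdots + x_n}{n} \cdot \frac{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}{n} \overset{\text{AM-GM}}{\ge} \sqrt[n]{\frac{1}{x_1} \cdots \frac{1}{x_n}} \cdot \sqrt[n]{x_1 \cdots x_n} = 1.$$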
Exercise 2. The sum remains constant and the product increases under the
operation of replacing each of two different numbers $x_i$ and $x_j$ by $\frac{x_i + x_j}{2}$:
$$x_i + x_j = \frac{x_i + x_j}{2} + \frac{x_i + x_j}{2} \quad \text{and} \quad x_i x_j \overset{\text{AM-GM}}{<} \left(\frac{x_i + x_j}{2}\right)^2.$$
The problem is that the process may never end! In fact, Exercise 3(b)
provides a simple and convincing example of an endless smoothing.
Exercise 3. (a) This can be done by induction on $n$. The base case $n = 1$
of two numbers $x_1$ and $x_2$ being replaced by their average is trivially true.
Assuming that we can equalize $2^n$ numbers, take any $2^{n+1}$ numbers and
arbitrarily split them into two groups of $2^n$ numbers each. By IH, equalize
the numbers in each group to some common values $a$ and $b$, respectively.
Performing then the operation on $2^n$ pairs of numbers $\{a, b\}$ will make
everything equal to $(a + b)/2$, completing the induction step.
(b) After one step there will be two numbers equal and the third different
from them: {a, a, b}. After another step, we will have exactly the same
configuration of numbers: two equal and one unequal to them, and so on
and so forth. The process will therefore never stop.
Read the Limit Appendix for a discussion that pushes through the re-
placement operation in Exercise 2 to a successful proof, despite the fact that
the smoothing process itself never stops.
Exercise 4. With the pairwise averaging operation in the AM-GM solution,
we claim that any n > 1 numbers will be equal to each other after n−1 steps.
More precisely, if n − k of the numbers are already equal to the average x,
with k ≥ 1, then it will take at most k − 1 steps for the process to end in
equality, fixing the LHS at each step and increasing the RHS. We leave it to
the reader to finish the formal proof by induction on k, taking into account
that every time the operation reduces the numbers not equal to x. ♦
Exercise 5. Just reverse the monovariant step from the proof of RI: for
n = 2, if z1 = y2 and z2 = y1 , do nothing; if z1 = y1 and z2 = y2 , then switch
the zi ’s to decrease the RHS. Now apply an analogous inductive argument
as in the proof of RI. ♦
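Exercise 6. The Midpoint Rule boils down to 2-variable AM-HM/AM-GM:
• $\dfrac{1}{\frac{x + y}{2}} \overset{?}{<} \dfrac{\frac{1}{x} + \frac{1}{y}}{2} \iff \dfrac{2}{\frac{1}{x} + \frac{1}{y}} \overset{?}{<} \dfrac{x + y}{2}$;
• $-\ln\left(\dfrac{x + y}{2}\right) \overset{?}{<} \dfrac{-\ln x - \ln y}{2} \iff \ln\left(\dfrac{x + y}{2}\right) \overset{?}{>} \ln\sqrt{xy} \iff \dfrac{x + y}{2} \overset{?}{>} \sqrt{xy}$. ♦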
shaded right triangles (with hypotenuses along A D and one leg horizontal):
lC D = A B , from which l(CC − DD ) = AA − BB (why?). Rearranging
this, AA +lDD = BB +lCC . Using that the graph of f (x) between A and
D lies underneath the segment A D , we have BB > BB and CC > CC,
which yields again f (A) + lf (D) ≥ f (B) + lf (C). ♦
Exercise 8. The Smoothing Lemma is the case with two inequalities in HLP:
x1 ≥ y1 and x1 +x2 = y1 +y2 , where x1 , x2 , y1 , y2 are D, A, C, B, respectively.
The Multi-smoothing Lemma is the following case of l inequalities in HLP:
D ≥ C, D + D ≥ C + C, . . . , lD ≥ lC, lD + A = lC + B.
Corollary 1. Apply $-\ln(x)$ to both sides of the intended inequality and
split the products to sums: $\sum_{i=1}^{n} (-\ln(x_i)) \ge \sum_{i=1}^{n} (-\ln(y_i))$, which is HLP
for the convex $f(x) = -\ln x$ on $(0, \infty)$.
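Problem 1. To show that $\sqrt[n]{x}$ is concave, use the Midpoint Rule:
$$\sqrt[n]{\frac{x + y}{2}} \overset{?}{>} \frac{\sqrt[n]{x} + \sqrt[n]{y}}{2} \iff \frac{x + y}{2} \overset{?}{>} \left(\frac{x^{1/n} + y^{1/n}}{2}\right)^n \iff P_1 \overset{?}{>} P_{1/n},$$
which is the power mean inequality for two variables with $r = \frac{1}{n} < 1$. Alternatively,
using the Second Derivative Test (cf. Inequalities I), we calculate
$(\sqrt[n]{x})'' = \frac{1}{n}\left(\frac{1}{n} - 1\right)x^{\frac{1}{n} - 2} < 0$ for $x > 0$, so $\sqrt[n]{x}$ is concave there. Thus, all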
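$$\sqrt[m]{a_{2n+1}} + \sqrt[m]{a_{2n-1}} < \sqrt[m]{a_{2n}} + \sqrt[m]{a_{2n+1} - a_{2n} + a_{2n-1}}$$
$$(15)\qquad \Rightarrow\ \sqrt[m]{a_{2n+1}} - \sqrt[m]{a_{2n}} + \sqrt[m]{a_{2n-1}} < \sqrt[m]{a_{2n+1} - a_{2n} + a_{2n-1}}.$$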
In the key step, replace $a_{2n} \to a_{2n-1}$ and $a_{2n+1} \to a = a_{2n+1} - a_{2n} + a_{2n-1}$.
This increases the LHS and fixes the RHS. Cancelling out the resulting two
$a_{2n-1}$ terms, we end up with two fewer radicals and $n$ becomes $n - 1$. By
induction on $n$ (for function $\sqrt[m]{x}$), we only need the base case for $n = 2$:
$$\sqrt[m]{a_1} - \sqrt[m]{a_2} + \sqrt[m]{a_3} \overset{?}{<} \sqrt[m]{a_1 - a_2 + a_3}.$$
But this is the general inequality (15) that we already proved earlier with
$n = 1$. Finally, note that our original problem is just the special case of the
above for the function $\sqrt[n]{x}$ (when $m = n$) and $2n + 1$ variables.
Problem 8. Suppose that all $a_i < 2$. Iterate the following operation as long
as possible: take two numbers $a_i$ and $a_j$ both less than 2, and replace them
by 2 and $a_i + a_j - 2$. Now $\sum a_i$ stays the same, but what happens to $\sum a_i^2$?
The operation is unsmoothing: $a_i + a_j - 2 < a_i, a_j < 2$, so the middle
numbers $a_i$ and $a_j$ are pulled apart to $a_i + a_j - 2$ and 2. As $f(x) = x^2$
is convex, by Lemma 1 the sum of the $a_j^2$’s acts as a “concentration”
monovariant and goes up:
$$f(a_i) + f(a_j) < f(a_i + a_j - 2) + f(2).$$
Both given inequalities are preserved by the operation; however, one more
of the $a_i$’s became 2. As we cannot repeat this operation forever, eventually
exactly one $a_j$ is $< 2$ and the rest all 2’s. The two inequalities then read:
• $a_j + 2(n - 1) \ge n \Rightarrow a_j \ge -(n - 2) \Rightarrow a_j \not< -(n - 2)$;
• $a_j^2 + 4(n - 1) \ge n^2 \Rightarrow a_j^2 \ge (n - 2)^2 \Rightarrow |a_j| \ge |n - 2|$.
From here $a_j \ge n - 2 \ge 4 - 2 = 2$, a contradiction with $a_j < 2$.
Hence, indeed, one of the original $a_i$’s must be $\ge 2$.
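To watch the monovariant at work, here is a minimal Python sketch of the unsmoothing operation. The starting values are hypothetical and chosen only to display the behavior of the two sums (they do not satisfy both hypotheses of the problem, which is exactly what the contradiction argument predicts is impossible when all numbers are below 2); the function name `unsmooth` is ours.

```python
def unsmooth(numbers):
    """Repeat the operation from the solution and record the two sums.

    While at least two entries are below 2, replace one such pair
    (a_i, a_j) by (2, a_i + a_j - 2).  Returns the final list and the
    history of (sum, sum of squares) before and after every step.
    """
    nums = list(numbers)
    history = [(sum(nums), sum(v * v for v in nums))]
    while True:
        small = [k for k, v in enumerate(nums) if v < 2]
        if len(small) < 2:
            break  # at most one entry is still below 2
        i, j = small[0], small[1]
        nums[i], nums[j] = 2, nums[i] + nums[j] - 2
        history.append((sum(nums), sum(v * v for v in nums)))
    return nums, history


# Hypothetical sample: four numbers, all less than 2.
final, history = unsmooth([1, 1, 1, 1])
print(final)
print(history)
```

In each recorded pair the first entry (the sum) stays fixed while the second (the sum of squares) strictly grows, and the loop stops once a single entry below 2 is left.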
Exercise 11. At every step simply average the number that is furthest away
from $x$ and any other number on the other side of $x$. After $n$ (really, $\lceil \frac{n}{2} \rceil$)
such steps the maximum distance from $x$ will be at least halved. ♦
7. HINTS AND SOLUTIONS TO SELECTED PROBLEMS 285
Zvezdelina Stankova
Sneak Preview. If you thought that there are no more solutions to the Farmer-
and-Cow problem worth discussing . . . you are in for a surprise! To produce a
solution radically different from anything we have done so far, we will re-discover
a special case of the famous Minkowski’s inequality; and yet another solution will
invoke some more advanced derivative techniques for optimizing functions. We
will try to utilize all generated ideas to conquer the Optimal Bridge challenge from
Part II. However, in the purely geometric solution, a new “magic” transformation
in the plane will have to be created, to replace the reflection across the river
used in Part I. Everyone from beginners to advanced will find a solution here
corresponding to their level.
In contrast, the section devoted to generalizing the Three-Squares problem to
infinitely many squares is intended only for the most advanced readers. Our
solution will depend entirely on Calculus techniques: we will invoke a particular
Taylor series and some PSTs for determining when sums are finite or infinite.
Our geometric journey will end with a historical detour through a 2000-year
old puzzle attributed to Archimedes. This will bring us full circle back to where
we started: our brilliant 5th-grade solution to the Three-Squares Problem in
Part I. In an attempt to circumvent all Calculus machinery in this article, a final
geometric challenge will be posed as an open problem.
1.1. The initial set-up. Both routes start the same way.
Exercise 1. If Y is a point on the river outside segment AB, find another
path from the farmer to the cow that is shorter than the path F → Y → C.
Figure 1. Restricting the domain and Conjecturing the optimum
Hint: If A is between Y and B (cf. Fig. 1a), show that F Y +Y C > F A+AC.
The Pythagorean Theorem may be helpful. ♦
Because of Exercise 1, we can now safely assume that the shortest path
of the farmer must pass through a point X on segment AB (cf. Fig. 1b).
Hence we can introduce the non-negative unknowns x = AX and y = BX
such that x + y = AB. Moving on to the next step:
PST 93. Describe the quantity in question via some function of the un-
knowns (and the knowns).
A specific case of this was done in Exercise 4 in Part II. To generalize, the
farmer’s route is made of hypotenuses $FX$ and $XC$ in $\triangle FAX$ and $\triangle CBX$:
$$f(x, y) = \sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \quad \text{where } x, y \ge 0 \text{ with } x + y = AB.$$
1.2.2. Nothing new under the sun. Inequality (1) is a symmetrically phrased,
elegant inequality, so it should not be a surprise that it is well-known. Indeed,
it is a special case of a famous and much more general inequality.
Theorem 1. (Minkowski’s Inequality) If all ai , bj ≥ 0 and r ≥ 1, then
$$\sqrt[r]{a_1^r + \cdots + a_n^r} + \sqrt[r]{b_1^r + \cdots + b_n^r} \ge \sqrt[r]{(a_1 + b_1)^r + \cdots + (a_n + b_n)^r}.$$
If 0 < r < 1, then the inequality is reversed.
We shall not attempt to prove Minkowski’s Inequality here (it will be dis-
cussed in its full generality in Inequalities II, vol. III). We can, though, prove
our special inequality (1) practically with “bare hands”.
1.2.4. Don’t forget the equality! Our proof so far verified that all paths of
the farmer are at least $\sqrt{(a + b)^2 + (x + y)^2}$. As per Exercise 2, the latter is
the length of $F'C$ where $F'$ is the reflected, “phantom” farmer. But we must
ask: is the corresponding path $F \to X \to C$ (where $X$ is the intersection of
$F'C$ with the river) the unique shortest path for the real farmer $F$?
PST 95. To complete the proof of an inequality A ≥ B, investigate when
equality is obtained. In other words, find a condition (algebraic, geometric,
or other) on the involved letters that makes the two sides equal.
The task of solving $A \overset{?}{=} B$ has been trivialized by the last step of our
proof: inequality (1) is equivalent to (ay − xb)2 ≥ 0. Equality is obviously
obtained exactly when ay = xb. Furthermore, substituting y = AB − x and
solving a(AB − x) = xb for x yields the only possible value x = a ·AB/(a + b),
as long as we can divide by a + b. Therefore, if at least one of a or b is non-
zero, there is a unique place on the riverbank such that the corresponding
path F → X → C is a shortest path for the farmer; namely, this is the point
X between A and B with AX = a · AB/(a + b).
If a = b = 0, then (1) is always trivially satisfied (check it!). In reality,
this corresponds to the situation when both the farmer and the cow are at
the riverbank: the farmer can dip his bucket into the river anywhere along
his way to the cow; i.e., AX = x can be any number between 0 and AB.
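The inequality and its equality case are easy to probe numerically. The sketch below assumes inequality (1) has the form $\sqrt{a^2+x^2} + \sqrt{b^2+y^2} \ge \sqrt{(a+b)^2+(x+y)^2}$ quoted in this section; the random sample, the seed, and the helper name `gap` are our own choices, and a numerical check of this kind illustrates the claim rather than proves it.

```python
import math
import random


def gap(a, b, x, y):
    """LHS minus RHS of sqrt(a^2+x^2) + sqrt(b^2+y^2) >= sqrt((a+b)^2+(x+y)^2)."""
    return math.hypot(a, x) + math.hypot(b, y) - math.hypot(a + b, x + y)


# Random non-negative samples (arbitrary seed, only for reproducibility).
random.seed(0)
samples = [[random.uniform(0, 10) for _ in range(4)] for _ in range(1000)]
smallest_gap = min(gap(*s) for s in samples)

# Equality case ay = xb: with a = 2, b = 6, AB = 4 we get x = a*AB/(a+b).
a, b, ab = 2, 6, 4
x = a * ab / (a + b)
equality_gap = gap(a, b, x, ab - x)

print(smallest_gap, equality_gap)
```

Up to rounding, the gap never becomes negative on the sample, and it vanishes at the point where $ay = xb$.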
1.2.5. Why be restricted to the plane? Rewriting the condition for equality
in (1) in the form x/a = y/b, makes it reasonable to expect that the general
Minkowski’s Inequality will become equality if and only if all ratios ai /bi are
equal.2 The reader curious about more general versions of the Farmer-and-
Cow problem can formulate and solve the problem in space (a flying farmer
and a flying cow?) and even venture into four or more dimensions.
2
Technically, to avoid division by 0, we must stipulate that when some bi = 0 then
ai = 0, or rewrite the conditions for equality as ai bj = aj bi for all i, j.
1. FARMER-AND-COW VIA INEQUALITIES AND CALCULUS 291
$$F'(x) = \frac{x}{\sqrt{a^2 + x^2}} - \frac{(c - x)}{\sqrt{b^2 + (c - x)^2}} = 0 \iff x\sqrt{b^2 + (c - x)^2} = (c - x)\sqrt{a^2 + x^2}.$$
The last manipulation was simply clearing the denominators. Now note that
both sides of the equality are non-negative since x ∈ [0, c]. Squaring and
multiplying through leads to:
$$x^2\left(b^2 + (c-x)^2\right) = (c-x)^2(a^2 + x^2) \iff x^2 b^2 + x^2 (c-x)^2 = (c-x)^2 a^2 + (c-x)^2 x^2.$$
To summarize, $F'(x) > 0$ if $x > x_0$ and $F'(x) < 0$ if $x < x_0$ (cf. Fig. 2a
on p. 292). This means that the original function $F(x)$ decreases before $x_0$
and increases after $x_0$, i.e., $F(x_0)$ is the global minimum of $F(x)$ on $[0, c]$.
1.3.4. No more “guessing”. We can now find the minimum of our function:
Exercise 4. Calculate F (x0 ) and simplify it as much as possible.
Answer: After some non-taxing algebraic manipulations, one arrives at
$F(x_0) = F\left(\frac{ac}{a+b}\right) = \sqrt{(a + b)^2 + c^2}$. ♦
If we recall that c = AB = x + y, the expression for F (x0 ) should not be
surprising: F (x0 ) is precisely the “mysterious” RHS of the inequality A ≥ B
in our previous approach. We conclude that
$$F(x) \ge F(x_0) = \sqrt{(a + b)^2 + c^2} \quad \text{with equality iff } x = x_0 = \frac{ac}{a+b}.$$
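For readers who like to double-check Calculus by machine, here is a small Python sketch of the derivative test, assuming $F(x) = \sqrt{a^2+x^2} + \sqrt{b^2+(c-x)^2}$ with $a, b > 0$ and using the sample values $a = 2$, $b = 6$, $c = 4$ of the original problem. The function names `F` and `dF` and the grid of test points are ours.

```python
import math


def F(x, a, b, c):
    """Length of the path through the point at distance x from A."""
    return math.hypot(a, x) + math.hypot(b, c - x)


def dF(x, a, b, c):
    """Derivative of F with respect to x (assumes a > 0 and b > 0)."""
    return x / math.hypot(a, x) - (c - x) / math.hypot(b, c - x)


a, b, c = 2, 6, 4
x0 = a * c / (a + b)          # the critical point found in the text

print(dF(0.5, a, b, c), dF(x0, a, b, c), dF(2.0, a, b, c))
print(F(x0, a, b, c), math.hypot(a + b, c))

# Compare F(x0) with F on a grid of points of [0, c].
grid_min = min(F(c * k / 400, a, b, c) for k in range(401))
```

The derivative changes sign from negative to positive at $x_0$, and the value $F(x_0)$ agrees with $\sqrt{(a+b)^2+c^2}$ up to rounding.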
292 12. RE-CONSTRUCTIONS. PART III
1.3.5. The big versus the really big picture. Our investigation of the derivative
$F'(x)$ can be used to show that the local behavior of $F(x)$ on the interval
$[0, c]$ extends to a global behavior on $(-\infty, \infty)$. More precisely,
Exercise 5. Using the sign of $F'(x)$ again, show that $F(x)$ decreases for all
Figure 2. Graphs of $F(x) = \sqrt{2^2 + x^2} + \sqrt{6^2 + (4 - x)^2}$
[panels: (a) sign of $F'(x)$ on $[0, c]$; (b) $F(x)$ on $[-2, 6]$; (c) $F(x)$ on $[-100, 100]$ with the lines $4 - 2x$ and $2x - 4$]
The expected shape of the graph of F (x) is confirmed by Figure 2b in
the original case of the problem with a = 2, b = 6, and c = 4, on the
interval [−2, 6]. The graph basically looks like a smile,3 with the bottom of
the smile at x0 = ac/(a + b) = 1. However, as we enlarge the interval to, say,
[−100, 100] (cf. Fig. 2c), the graph of the function starts resembling a wedge:
it “straightens out” into two lines as x moves further away from x0 = 1. If
you are familiar with the necessary Calculus techniques,
Exercise 6. Show that $F(x) = \sqrt{2^2 + x^2} + \sqrt{6^2 + (4 - x)^2}$ has two slant
asymptotes: y = 2x − 4 when x → ∞ and y = 4 − 2x when x → −∞.
Alternatively said, F (x) ≈ |x| + |4 − x| when |x| is large.
Thus, the length of the farmer’s path changes approximately linearly
when he approaches the river at places X very far from the cow.
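The "wedge" behavior can be seen numerically before it is proven. The following Python sketch, for the same sample function with $a = 2$, $b = 6$, $c = 4$, measures how far $F(x)$ is from the two lines $2x - 4$ and $4 - 2x$ at a few large values of $|x|$; the helper names and the chosen sample points are ours.

```python
import math


def F(x):
    """F(x) = sqrt(2^2 + x^2) + sqrt(6^2 + (4 - x)^2)."""
    return math.hypot(2, x) + math.hypot(6, 4 - x)


def gap_right(x):
    """Distance from F to the line y = 2x - 4 (relevant as x -> +infinity)."""
    return F(x) - (2 * x - 4)


def gap_left(x):
    """Distance from F to the line y = 4 - 2x (relevant as x -> -infinity)."""
    return F(x) - (4 - 2 * x)


for x in (1e2, 1e4, 1e6):
    print(x, gap_right(x), gap_left(-x))
```

Both gaps are positive and shrink as $|x|$ grows, which is the numerical fingerprint of a slant asymptote approached from above.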
2.1.3. We’ve done this before! We have justified that the optimal bridge must
be inside our rectangle, and hence our unknowns x and y are non-negative
and add up to x + y = c. Further, the total length of the route from V1 to V2
is V1 X1 + X1 X2 + X2 V2 , which can be expressed as the following function:
$$f(x, y) = \sqrt{a^2 + x^2} + d + \sqrt{b^2 + y^2} \quad \text{for } a, b, x, y \ge 0.$$
By now the reader has, no doubt, seen the connection with the special case
of Minkowski’s inequality, proven in the Farmer-and-Cow situation:
$$\sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \ge \sqrt{(a + b)^2 + (x + y)^2} = \sqrt{(a + b)^2 + c^2}.$$
Therefore, the length of the shortest route is $\sqrt{(a + b)^2 + c^2} + d$, attained
(again!) iff ay = bx.
Unfortunately, both the width d of the river and the fact that the villages
are on opposite sides of the river make our previous idea of reflecting across
the river useless here. Below we uncover another transformation that will
elegantly explain the situation and lead to a purely geometric solution.
2.2.1. Rearranging parts for a better understanding. As we observed, every
route consists of three parts: walking from V1 to the bridge, walking across
the bridge, and then walking to V2 . While the first and the third parts
depend on where the bridge is built, the middle part is kind of a “constant”:
• it always goes in the same direction; e.g., we can assume (as in our
figures) that walking across the bridge is in the north direction; and
• it has a fixed length of d.
PST 96. If some quantity (whether algebraic or geometric) consists of sev-
eral parts, try swapping some of these parts: this may give you an advanta-
geous angle by viewing the quantity in a different, easier way.
In the bridge situation, why not first walk the “constant” middle part of
the route and then follow it by the other two parts of route? To this end, we
ignore temporarily the river: this will enable us to arbitrarily build “bridges”
on land and walk on water in any direction without a bridge. Thus,
(a) First walk north from $V_1$ to point $Y$ for a distance of $d$.
(b) Then walk straight from $Y$ to village $V_2$.
In effect, this swapped the first segment $V_1X_1$ of the
route with the second, bridge-part $X_1X_2$. To recover
always started with segment V1 Y ; i.e., the useful transformation turned out
to be a translation V1 → Y from village V1 to the north by distance d.
2. OPTIMAL BRIDGE LOCATED! 295
The idea of the translation in the plane can also explain the aftermath
of our previous inequality solution, where by algebraic calculations we dis-
covered that the shortest route occurs if x/a = y/b.
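The translation idea turns into a short computation once coordinates are chosen. The sketch below is only one possible model, under our own assumptions: the near bank is the line $y = 0$, the far bank is $y = d$, $V_1 = (0, -a)$, $V_2 = (c, d + b)$, and the river width $d = 1$ is a hypothetical sample value. It moves $V_1$ north by $d$ to $Y$, intersects segment $YV_2$ with the far bank, and compares the result with the condition $x/a = y/b$.

```python
import math


def bridge_by_translation(a, b, c, d):
    """Return x = AX1 for the bridge found via the translation V1 -> Y.

    Assumed coordinates: near bank y = 0, far bank y = d,
    V1 = (0, -a), V2 = (c, d + b), with a + b > 0.
    """
    y_point = (0.0, d - a)                 # V1 moved north by d
    v2 = (float(c), float(d + b))
    # Parameter t at which the segment Y -> V2 reaches the far bank y = d.
    t = (d - y_point[1]) / (v2[1] - y_point[1])
    return y_point[0] + t * (v2[0] - y_point[0])


def route_length(x, a, b, c, d):
    """Walk V1 -> X1, cross the bridge of length d, walk X2 -> V2."""
    return math.hypot(a, x) + d + math.hypot(b, c - x)


a, b, c, d = 2, 6, 4, 1        # d = 1 is a hypothetical river width
x = bridge_by_translation(a, b, c, d)
y = c - x
print(x, x / a, y / b)
print(route_length(x, a, b, c, d), math.hypot(a + b, c) + d)
```

In this model the intersection point satisfies $x/a = y/b$, and the route through it has length $\sqrt{(a+b)^2+c^2} + d$ up to rounding, in agreement with the inequality solution.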
Exercise 9. Justify geometrically that $\triangle V_1AX_1 \sim \triangle V_2BX_2$ for the optimal
bridge $X_1X_2$.
Exercise 10. Investigate the special cases when one or both of the villages
are on their corresponding riverbanks. How many optimal bridges are there
2.3. Why only two villages? If you want to test everything you’ve learned
so far in Parts I-III about solving optimization problems, bump up the num-
ber of villages to three, change the river to a railroad track, and try to come up
with a variety of approaches (purely geometric, inequalities, and Calculus –
anything counts!) to the following challenge problem:
The next challenge will require both trigonometry and advanced Calculus
techniques. Read on only if you are fluent in both.
Problem 4. (ℵ0 –Squares) Glue to each other infinitely many identical
squares with bases AA1 , A1 A2 , A2 A3 , A3 A4 , A4 A5 , and so on, to form an
infinite row (cf. Fig. 3). If D is the top left corner of the first square, right
above A, what is the sum ∠AA1 D + ∠AA2 D + ∠AA3 D + ∠AA4 D + · · · ?
D
α1 α2 α3 α4 α5
A A1 A2 A3 A4 A5
Figure 3. α1 + α2 + α3 + α4 + α5 + · · · = ?
3.1. Finite or infinite? Problem 4 asks us to find the sum of all angles
αi . From the Three-Squares problem, we know that α1 + α2 + α3 = 90◦ . So,
let’s concentrate on finding the sum of the rest of the αi ’s.
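To this end, define the partial sum $s_n$ to be $s_n = \alpha_4 + \alpha_5 + \cdots + \alpha_n$
for any $n \ge 4$. From right $\triangle DAA_n$, $\tan \alpha_n = \frac{1}{n}$. Luckily, the formula from
Part II for tangent of a sum will link recursively all values of $\tan s_n$:
$$\tan s_n = \tan(s_{n-1} + \alpha_n) = \frac{\tan s_{n-1} + \tan \alpha_n}{1 - \tan s_{n-1} \tan \alpha_n} = \frac{\tan s_{n-1} + \frac{1}{n}}{1 - \tan s_{n-1} \cdot \frac{1}{n}}.$$
Thus, $\tan \alpha_4 = \tan s_4 = \frac{1}{4}$ and $\tan s_5 = \left(\frac{1}{4} + \frac{1}{5}\right)/\left(1 - \frac{1}{4} \cdot \frac{1}{5}\right) = \frac{9}{19}$.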
PST 97. To find out if a sum is finite or infinite, investigate the first partial
sums and make a conjecture in order to know what type of proof to expect,
because the techniques in the finite vs. infinite case will be different.
With this in mind, we keep on investigating the sequence {tan sn }. To
go over another 90°, i.e., to turn the tangent positive again, check that you
will need to wait much longer: $\tan s_{81} \approx -0.01$ and $\tan s_{82} \approx 0.002$. So far,
$$(\alpha_1 + \alpha_2 + \alpha_3) + (\alpha_4 + \cdots + \alpha_{17}) + (\alpha_{18} + \cdots + \alpha_{82}) > 3 \cdot 90° = 270°.$$
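The experiment with the partial sums is pleasant to automate. The Python sketch below iterates the tangent-of-a-sum recursion with exact fractions, starting from $\tan s_4 = \frac{1}{4}$ and using $\tan \alpha_n = \frac{1}{n}$, and records the indices at which $\tan s_n$ changes sign (each sign change means that $s_n$ has passed another multiple of 90°). The function names are ours, and the direct sum of arctangents is included only as an independent floating-point cross-check.

```python
import math
from fractions import Fraction


def tan_partial_sums(n_max):
    """Return {n: tan(s_n)} for 4 <= n <= n_max, s_n = alpha_4 + ... + alpha_n."""
    t = Fraction(1, 4)                      # tan(s_4) = tan(alpha_4) = 1/4
    tans = {4: t}
    for n in range(5, n_max + 1):
        a = Fraction(1, n)                  # tan(alpha_n) = 1/n
        t = (t + a) / (1 - t * a)           # tangent of a sum
        tans[n] = t
    return tans


def sign_changes(tans):
    """Indices n at which tan(s_n) has a different sign than tan(s_{n-1})."""
    ns = sorted(tans)
    return [n for prev, n in zip(ns, ns[1:]) if (tans[prev] > 0) != (tans[n] > 0)]


def total_angle_degrees(n):
    """alpha_1 + ... + alpha_n in degrees, computed directly with arctan."""
    return sum(math.degrees(math.atan(1 / k)) for k in range(1, n + 1))


tans = tan_partial_sums(82)
print(tans[5])
print(sign_changes(tans))
print(float(tans[81]), float(tans[82]))
print(total_angle_degrees(81), total_angle_degrees(82))
```

Exact fractions avoid any doubt about the sign of a tangent that is very close to 0, which is precisely the delicate situation near $n = 82$.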
3. INFINITELY MANY ANGLES AND INFINITE SERIES 297
Given the evidence, there is no reason to expect that the sum of the infinitely
many angles αn will be finite! We are compelled to make the following
PST 98. Let the terms $a_n$ of a sequence be given by some function $f(x)$.
To show that the $a_n$’s add up to ∞ (the sum has no upper bound), find a
lower bound for $f(x)$, i.e., another function $g(x)$ such that $f(x) \ge g(x)$, and show
instead that the corresponding terms $b_n$ given by $g(x)$ add up to ∞.
In our case, $a_n = f(\frac{1}{n})$ with $f(x) = \arctan x$, and the $b_n$’s should be
given as $b_n = g(\frac{1}{n})$. If you haven’t worked before with Taylor series, the
choice we will make here for the lower bound for arctan x will seem to come
out of nowhere. As we shall see later, it is not a guess at all.
Exercise 12. The function $\arctan x$ for $0 \le x \le 1$ is bounded from below
by a cubic polynomial $g(x)$; namely, $\arctan x \ge x - \frac{x^3}{3}$ for all $x \in [0, 1]$.
We shall first go through a less technical proof that avoids Taylor series
and relies on analysis with derivatives to minimize a function.
tween the polynomials $x - \frac{x^3}{3}$ and $x$. We proved
above that $\arctan x \ge x - \frac{x^3}{3}$ for $x \ge 0$. For
practice, using the derivative techniques above,
Exercise 13. Show that $x \ge \arctan x$ for $x \ge 0$. What
happens among the functions $x$, $\arctan x$,
and $x - \frac{x^3}{3}$ when $x \le 0$?
3.4. The price to pay for demystifying the cubic polynomial $x - \frac{x^3}{3}$ is
using Taylor expansions. It is a standard exercise in Calculus to derive the
Taylor expansion of arctan x centered at x = 0 and find the interval where
it converges to arctan x. We will discuss this calculation only in the Hints
section, and leave it to the advanced reader to investigate the topic in a
Calculus textbook. The result needed for our purposes is:
Exercise 14. For any $x \in [-1, 1]$, $\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots$.
The RHS looks like a polynomial of “infinite” degree, but we need only the
degree-3 polynomial $x - \frac{x^3}{3}$ made of its first terms! Why does dropping the
higher powers of $x$ yield the desired inequality $\arctan x \ge x - \frac{x^3}{3}$ for $x \ge 0$?
PST 99. Given an equality between a function and an infinite series (such
as in Exer. 14), group the unwanted terms in the RHS and show that each
group is positive (or each group is negative, as needed). Then drop all such
grouped terms to produce an inequality in the desired direction.
Equipped with PST 99, we can justify again the lower bound for arctan x.
Proof 2 of Exercise 12: Restricting the Taylor expansion of arctan x
to 0 < x ≤ 1, note that the absolute values of the terms decrease as n grows:
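$$\frac{x^{2n+1}}{2n+1} \overset{?}{\ge} \frac{x^{2n+3}}{2n+3} \;\Leftrightarrow\; \frac{2n+3}{2n+1} \overset{?}{\ge} x^2 \;\Leftrightarrow\; 1 + \frac{2}{2n+1} \overset{?}{\ge} x^2,$$
and the last is certainly true because $1 \ge x^2$. Leaving alone the first two
terms $x - \frac{x^3}{3}$, we can therefore group the remaining (unwanted) terms into
pairs with non-negative differences when $x \in [0, 1]$:
$$\left(\frac{x^5}{5} - \frac{x^7}{7}\right) + \left(\frac{x^9}{9} - \frac{x^{11}}{11}\right) + \cdots + \left(\frac{x^{2n+1}}{2n+1} - \frac{x^{2n+3}}{2n+3}\right) + \cdots \ge 0.$$
As a result, $\arctan x \ge x - \frac{x^3}{3}$ for $x \in [0, 1]$.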
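The two-sided estimate $x - \frac{x^3}{3} \le \arctan x \le x$ on $[0, 1]$ and the sign of the grouped pairs can be spot-checked in Python. This is a sketch on a finite grid with a small rounding tolerance of our choosing, so it illustrates the argument and does not replace it; the helper names `cubic` and `pair` are ours.

```python
import math


def cubic(x):
    """The lower bound x - x^3/3."""
    return x - x**3 / 3


def pair(x, n):
    """Grouped pair x^(2n+1)/(2n+1) - x^(2n+3)/(2n+3) of the Taylor series."""
    return x**(2 * n + 1) / (2 * n + 1) - x**(2 * n + 3) / (2 * n + 3)


grid = [k / 1000 for k in range(1001)]          # sample points of [0, 1]

lower_ok = all(math.atan(x) - cubic(x) >= -1e-15 for x in grid)
upper_ok = all(x - math.atan(x) >= -1e-15 for x in grid)
# n = 2, 4, 6, ... gives the pairs (x^5/5 - x^7/7), (x^9/9 - x^11/11), ...
pairs_ok = all(pair(x, n) >= 0 for x in grid for n in range(2, 40, 2))

# Adding the first pairs back to the cubic should approach arctan x.
x = 0.5
rebuilt = cubic(x) + sum(pair(x, n) for n in range(2, 60, 2))
print(lower_ok, upper_ok, pairs_ok, rebuilt - math.atan(x))
```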
3.5. Classic infinite and finite sums. Recall that we wanted to show
that all $\arctan \frac{1}{n}$ add up to ∞. From Exercise 12, we know that their sum
will be at least the corresponding sum of values of $x - \frac{x^3}{3}$; namely,
(2) $\displaystyle \sum_{n=1}^{\infty} \arctan \frac{1}{n} \ge \sum_{n=1}^{\infty} \left(\frac{1}{n} - \frac{1}{3n^3}\right).$
3. INFINITELY MANY ANGLES AND INFINITE SERIES 299
3.5.1. Doubling the index adds another half. To start off, for any $n \ge 1$ let
$a_n = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}$. The $a_n$’s are called the partial sums of the harmonic
series. Since $\frac{1}{n} > 0$, the sequence of partial sums $\{a_n\}$ is increasing.
PST 100. To show that an increasing sequence $\{a_n\}$ goes to ∞, it is enough
to show that a subsequence $\{a_{n_k}\}$ of it goes to ∞.
The choice of a convenient subsequence $\{a_{n_k}\}$ depends on the specific
example. For our harmonic series, something inventive needs to be done.
Solution to Exercise 15: The slick approach is to consider the subse-
quence $\{a_{2^k}\}$ made of every $(2^k)$th term. Now, every next term $a_{2^{k+1}}$ is a sum
of twice as many fractions as the previous term $a_{2^k}$. How will this increase
the value of $a_{2^k}$? Check the beginning: $a_{2^0} = a_1 = 1$, $a_{2^1} = a_2 = 1 + \frac{1}{2} = 1\frac{1}{2}$,
$$a_{2^2} = a_4 = 1 + \tfrac{1}{2} + \left(\tfrac{1}{3} + \tfrac{1}{4}\right) > 1 + \tfrac{1}{2} + \left(\tfrac{1}{4} + \tfrac{1}{4}\right) = 2,$$
$$a_{2^3} = a_8 = a_4 + \left(\tfrac{1}{5} + \tfrac{1}{6} + \tfrac{1}{7} + \tfrac{1}{8}\right) > 2 + \left(\tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8}\right) = 2 + 4 \cdot \tfrac{1}{8} = 2\tfrac{1}{2}.$$
A pattern emerges: when we double the index from $2^k$ to $2^{k+1}$, the terms
$a_{2^k}$ increase by at least a half, which is the brilliant idea in this approach:
$$a_{2^{k+1}} = a_{2^k} + \frac{1}{2^k + 1} + \frac{1}{2^k + 2} + \cdots + \frac{1}{2^{k+1}} > a_{2^k} + 2^k \cdot \frac{1}{2^{k+1}} = a_{2^k} + \frac{1}{2}.$$
Using induction, one can formally show that $a_{2^k} \ge 1 + k\frac{1}{2}$ for all $k \ge 1$. But
the new, smaller sequence $\{1 + k\frac{1}{2}\}$ obviously goes to ∞, pushing the larger
sequence $\{a_{2^k}\}$ to go to ∞. Retracing our steps, by PST 100 we conclude
that the original (increasing) sequence $\{a_n\}$ is also forced to go to ∞.
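The doubling pattern can be confirmed with exact arithmetic. The short Python sketch below computes the harmonic partial sums $a_{2^k}$ as fractions for $k = 0, 1, \ldots, 10$ and compares each with $1 + \frac{k}{2}$; the range of $k$ is an arbitrary choice of ours, and the check illustrates the induction rather than replaces it.

```python
from fractions import Fraction


def harmonic(n):
    """Exact partial sum a_n = 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


for k in range(11):
    a = harmonic(2**k)
    bound = 1 + Fraction(k, 2)
    # Print a_{2^k}, the lower bound 1 + k/2, and whether the bound holds.
    print(k, float(a), float(bound), a >= bound)
```

Since each doubling of the index adds at least $\frac{1}{2}$, reaching a partial sum of size $M$ by this estimate takes about $2^{2M}$ terms, which hints at how slowly the harmonic series grows.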
3.5.2. Telescoping for convergence. Turning now to the partial sums of the
series $\sum \frac{1}{n^3}$, $b_n = 1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{n^3}$, we must change tactics because
$\sum \frac{1}{n^3}$ and the harmonic series $\sum \frac{1}{n}$ behave in opposite ways!
PST 101. To prove that a sequence $\{b_n\}$ is bounded from above, find another
sequence $\{c_n\}$ greater than it and bounded from above. Symbolically, if
$b_n \le c_n$ and $c_n \le B$ for all $n$, then $b_n \le B$ for all $n$.
3.6. Concluding arguments. Recall inequality (2) from page 299, which
provided a lower bound for our desired sum of arc-tangents:
$\sum_{n=1}^{\infty} \arctan \frac{1}{n} \ge \sum_{n=1}^{\infty} \left(\frac{1}{n} - \frac{1}{3n^3}\right)$. If we stop the sum on the RHS at some $n$ and regroup the
terms, the partial sums $a_n$ and $b_n$ discussed above will spring up:
$$\left(\frac{1}{1} - \frac{1}{3 \cdot 1^3}\right) + \left(\frac{1}{2} - \frac{1}{3 \cdot 2^3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{3 \cdot n^3}\right)$$
$$= \left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n}\right) - \frac{1}{3}\left(\frac{1}{1^3} + \frac{1}{2^3} + \cdots + \frac{1}{n^3}\right) = a_n - \frac{1}{3} b_n > a_n - \frac{2}{3},$$
where in the last inequality we used $b_n < 2$. Since $\{a_n\}$ goes to ∞, then
$\{a_n - \frac{2}{3}\}$ also goes to ∞, making the whole RHS of (2) also go to ∞. This
in turn pushes the larger sum $\sum \arctan \frac{1}{n}$ on the LHS of (2) to go to ∞.
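The whole chain of bounds can be followed numerically. In the Python sketch below, $a_n$ and $b_n$ are the partial sums of $\sum \frac{1}{k}$ and $\sum \frac{1}{k^3}$, and the sum of the angles is computed directly in radians; the sample values of $n$ and the function name are ours, and floating-point sums are only approximate.

```python
import math


def partial_sums(n):
    """Return (a_n, b_n, sum of arctan(1/k) for k = 1..n), angles in radians."""
    a_n = sum(1 / k for k in range(1, n + 1))
    b_n = sum(1 / k**3 for k in range(1, n + 1))
    angles = sum(math.atan(1 / k) for k in range(1, n + 1))
    return a_n, b_n, angles


for n in (10, 1000, 100000):
    a_n, b_n, angles = partial_sums(n)
    # Columns: n, sum of angles, a_n - b_n/3, a_n - 2/3, b_n
    print(n, angles, a_n - b_n / 3, a_n - 2 / 3, b_n)
```

The second column stays above the third, the third above the fourth, and the last column stays below 2, while the first three quantities keep growing with $n$.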
For someone who has followed the recent great discoveries of ancient
mathematical works, this discussion may have triggered a memory of other
tilings: Figure 4b represents one possible solution to the famous Stomachion,
a 14-piece puzzle attributed to Archimedes.4 The task is to take the pieces
out and then reassemble them back into the square shape.
At first glance, the pieces are so distinct that it seems just a few
configurations are possible; but our intuition is very far from the truth! It
was only in 2003 that William Cutler, via a computer program, proved that
there are 17,152 possibilities. Discarding those that can be obtained from
each other by rotations and reflections, he showed that the number of truly
different ways to arrange the puzzle is exactly 536 [18]. And there is more
amazing combinatorics related to the problem! For example, as pointed out by
Fan Chung and Ron Graham [14], there are 3 pairs of pieces such that no
matter how we rearrange the 14 original pieces, these 6 pieces will line up
within each pair next to each other exactly as shown by the shaded figures
in Figure 4c (and as one can check too in Figure 4b). In other words, after
gluing the pieces within these pairs, we are left to play with only 11 pieces.
4
The Stomachion is a 950 AD copy of a work of Archimedes by a Byzantine scribe.
It is also the last paper in the Palimpsest, a collection of several manuscripts that were
scraped, washed, and reused in the 13th century for a Christian liturgical book. Having
a fascinating history on its own of being discovered, re-discovered, and lost in the 19th
and 20th centuries, the Palimpsest finally became available again to the public after it
was purchased by an anonymous bidder in 1998 for over $2,000,000. This led to a decade
of scholarly research that heavily relied on technological advances, making the original
papers in the Palimpsest readable and overturning century-held beliefs.
In case $x < 0$, the first fraction is negative while the second fraction is
positive (why?), making the overall difference negative: $F'(x) < 0$ for $x < 0$.
This implies that $F(x)$ decreases when $x < 0$.
Argue similarly to show that $F'(x) > 0$ for $x > c$. ♦
Exercise 6. More generally, for any function $g(x) = \sqrt{A^2 + (x - B)^2}$ we
will show that $g(x) \approx |x - B|$ when $|x|$ is large. Indeed, rationalizing the
“numerator” of the difference $g(x) - |x - B|$, we obtain:
$$\frac{g(x) - |x - B|}{1} \cdot \frac{g(x) + |x - B|}{g(x) + |x - B|} = \frac{g^2(x) - (x - B)^2}{g(x) + |x - B|} = \frac{A^2}{g(x) + |x - B|}.$$
Exercise 7. The middle parts of the routes are equal: $X_1X_2 = AA' = d$. From
right $\triangle V_1AX_1$, we have $V_1A < V_1X_1$, and from right triangles $X_2BV_2$
and $A'BV_2$, we have $A'V_2 = \sqrt{A'B^2 + BV_2^2} < \sqrt{X_2B^2 + BV_2^2} = X_2V_2$.
Exercise 10. When exactly one of the villages is on the riverbank, our
solution goes through and yields a unique optimal bridge built at that village.
If both villages are on the riverbanks, then the two bridges built directly at
the villages and any bridge between these two bridges will be optimal.
Exercise 12, Proof 1. The derivative of $\arctan x - x + \frac{x^3}{3}$ simplifies to
$$h'(x) = \frac{1}{1 + x^2} - 1 + x^2 = \frac{1 + (x^2 - 1)(x^2 + 1)}{1 + x^2} = \frac{1 + x^4 - 1}{1 + x^2} = \frac{x^4}{1 + x^2}.$$
5
An odd function F (x) is such that F (−x) = −F (x).
Figure 5. Optimal Station when $\angle V_1V_2V_3$ is $< 120°$ or $\ge 120°$
Problem 3. The optimal station will be located at a point $T$ along the
railroad, inside $\triangle V_1V_2V_3$, and such that the three angles between arms $TV_1$,
$TV_2$, and $TV_3$ are as equal to each other as possible. In case $\angle V_1V_2V_3 < 120°$,
$T$ will be the unique such point with $\angle V_1TV_3 = 120°$, making the three angles
all equal to 120° (cf. Fig. 5a). If $\angle V_1V_2V_3 \ge 120°$, then $T$ will coincide with
village $V_2$ (cf. Fig. 5b). Can you find a geometric way to justify the answer? ♦
Epilogue
Later that year she would devise her own way of conquering the last
row of the Rubik’s Cube (having learned to solve the first two at her math
circle); in a couple of years she would represent Bulgaria at the International
Mathematical Olympiads (IMO); then go on to a math major at Bryn Mawr
and a doctorate at Harvard; train the USA math team for the IMOs . . . and
come full circle by founding the Berkeley Math Circle in 1998.
That girl is me – not angry at my middle school math teacher for putting
me on the spot in front of the whole class, rather, grateful to her for giving
me a second chance, for seeing the seed of talent in me, for accepting me and
nurturing my mathematical curiosity at her math circle, and for propelling
me forward with the belief that “what comes from within will take you far.”
3.2. Frequently asked questions. Here are some more differences between
Eastern European and U.S. math circles. Keep in mind that not all U.S.
circles follow the BMC model, and neither are my hometown math circles
(HMC) identical twins of the other Eastern European math circles.
3.2.1. Age of circlers. While in HMC all students were about the same age,
U.S. math circles may incorporate students of a variety of ages, e.g., BMC
ordinarily engages students in two or three different grades, but sometimes
ranging from 4th to 12th grade, all sitting and learning in the same room.
3.2.2. Logistics. HMC met twice a week for 1.5 (or more) hours. The HMC
were numerous and organized in such a way that students ordinarily could
go there and get home without parents’ assistance. U.S. math circles, due to
transportation issues and conflict with other established school and out-of-
school activities (e.g., volleyball team, music lessons, chorus, etc.), may meet
only once a week for 2-hour sessions. The large area covered by the one BMC
(from Sacramento to San Jose, from Palo Alto to Orinda and Danville) calls
for parents to drive their kids across the long distances and forces the evening
BMC time (6–8 pm) during the week, or alternative weekend sessions whose
timing presents other obstacles to families and organizers.
3.2.3. Home base. While HMC were either based at a school or at a local
math/science center, their U.S. counterparts are usually university-based.
A sufficient number of teachers in Eastern Europe were qualified to lead
math circles on their own, with some occasional support of materials and
instructors from a nearby university. Alas, this is not the case in the U.S.
3.2.4. Topics in HMC were organized in modules, providing continuity and
gradual increase of difficulty and depth of the material. This was possi-
ble mostly because the students had very similar math background, level
of knowledge, and mathematical maturity and because circlers attended all
sessions: transportation issues did not exist and other activities were de-
prioritized by the math circles. In the U.S., the circlers may vary from
beginners to seasoned members of the national USA math team, and hence
single powerful sessions incorporating the various levels and backgrounds are
more practical than long sequences of linked sessions. Besides, the sparsity of
U.S. math circles and competing activities (which multiply as the student
gets older) mean that regular weekly attendance is not always possible; hence
missing one session should not preclude understanding the following one.
For the BMC-advanced group, the sessions are usually singletons, with occa-
sional series of 2 sessions. For the BMC-intermediate group, the sessions are
often in a series of 2, while for the BMC-beginners group a single instructor
undertakes a module of 3–4 thematically arranged sessions. (The BMC-
elementary groups have the same instructor throughout the whole year, and
topics tend to last for a month or two of sessions.) The younger the students,
the more continuity in topics and instructors is provided at BMC.
3.2.5. Session leaders in HMC were only one or two teachers who organized
the specific math circle. Occasionally we had guest speakers from the lo-
cal university, and once in a while we were visited by professors from Sofia
University or the National Youth Science/Math Center who trained the Bul-
garian national team. In contrast with HMC, each BMC instructor leads
an average of 2 sessions per year, accounting for approximately 50 instruc-
tors at the BMC-Upper every year. They are mathematicians from nearby
universities and colleges, some specially trained high school teachers, some
professionals working in related fields, and even some alumni and current
advanced circlers.
3. EASTERN EUROPEAN VS. USA MATH CIRCLES 309
3.2.6. Popularity. Everyone in Eastern Europe knew about the math circles;
children and parents alike were well aware of the opportunity to enroll and
of the possibilities which successful participation might open in the students’
future. What portion of the U.S. population has an inkling that math cir-
cles exist? Negligible. What status do math circles have in U.S. society
and its educational system? Unclear. Can they compare in popularity to
membership of a high school football or debate team? No, they can’t.
3.2.7. Government support. The overall organization and funding in the so-
cialist model math circle was entirely secured by the state; a math circle was
an extracurricular activity roughly equivalent to one course each semester
and was thus correspondingly compensated by the Ministry of Education. In
contrast, SF Bay Area math circles, for instance, are partially funded (if
at all) by private sources; the remaining “funds” are donated by volunteers’
time, effort, professionalism, and enthusiasm.
Undoubtedly, the reader has more questions, and the comparison list can
go on and on. But this Epilogue is not intended as an exhaustive study of
the math circle phenomenon. For more details on U.S. Math Circles, see
Sam Vandervelde’s “Circle in a Box” [84].
3.3. Get to the point. One way to resolve most of the problems associated
with math circles in the U.S. is . . . (OK, start dreaming!) . . . to have a math
circle at every college and university.
(1) The professor organizing and running the math circle will receive a
one- or two-course release from the math department, depending on
the frequency, length, and intensity of the circle sessions. This will
compensate for the huge effort involved in directing a math circle and
will hopefully encourage more mathematicians to get involved in edu-
cating the talented youth of the U.S.
(2) The math circle can be formally organized as a math course and, thus,
be open also to undergraduates.
(3) Undergraduate and graduate students, as well as interested postdocs
and tenured faculty, can be vertically integrated in this model.
Despite the shortfalls of U.S. math circles’ set-up, don’t get me wrong: I
founded and ran one such circle for a decade and plan on doing so for at least
another decade. If I had to describe the Berkeley Math Circle in one phrase,
it would simply be a “high-power version of my hometown math circle”. But
let’s start from the beginning.
4.1. To marvel and to be appalled. By my last year of graduate stud-
ies at Harvard, I had taught enough math courses to question the quality
and depth of pre-college math education in the U.S. The few strong (very
strong!) undergraduates never took calculus or linear algebra (apparently
having taken them at some university while in high school) but jumped di-
rectly to upper-division courses like real analysis, abstract algebra, or number
theory, to name a few. The cream of the crop, former USAMO winners and
IMO medalists, even ventured into graduate courses like algebraic geometry
or topology, or Lie algebras (why not?). Each and every such top student had
beaten his/her own path out of the jungle of U.S. secondary math education
by hiring tutors, by escaping to a nearby university, or, if extremely talented
in problem solving, by qualifying for the 30-student one month Mathematical
Olympiad Summer Program (MOSP), in preparation for the IMOs.
As I marveled at the super-advanced math knowledge and skills those
relatively rare students had acquired through very special personal circum-
stances, I was appalled at the general math level of the remaining huge bulk
of undergraduates. We are talking here about problems in dealing with frac-
tions and simple algebraic manipulations, with which, I am sure, a 6th grader
in Bulgaria would have felt perfectly comfortable!
4.3. The chicken or the egg. I didn’t have time to think about the
situation in the bad schools, as I graduated from Harvard and moved in 1997
to Berkeley to take up a postdoctoral position at the Mathematical Sciences
Research Institute (MSRI).
I wasn’t a month into my new job when I got an e-mail from Hugo Rossi
(then the Deputy Director of MSRI) asking MSRI members for suggestions
on possible outreach activities to the community. About 10 minutes later,
Hugo and I were in agreement that a regional Math Olympiad for pre-college
students would be the right thing to do: an Olympiad different from the
numerous fast-type calculational contests, an Olympiad consisting of a few
hard essay-proof problems for several hours, in the true fashion of Eastern
Europe. I met Paul Zeitz (University of San Francisco) a week later, and
definite plans to start the Bay Area Mathematical Olympiad (BAMO) were
set in motion.
To publicize the plan, in the late fall
of 1997 MSRI asked me to give a talk to
an audience of 400 people at a bi-annual
public event. Sandwiched between two
spectacular lectures on the mathematics
behind “Brain Waves” and “Toy Story”,
was my modest presentation “The High
School Olympiads - Excitement, Talent,
and Determination” (cf. MSRI streaming
video [79]). Years afterward, people still
remember it by a single picture: that of a
chicken and an egg.
The idea was that BAMO would get its participants mainly through
newly founded school-based math circles around the SF Bay Area and would
serve as an annual focal event for their activities. The Olympiad and the
math circles would complete and strengthen each other and would be founded
at the same time: neither would exist without the other. The mathematical
community would support the math circles with materials and occasional
session leaders; but the circles would be run by teachers at their schools.
In the audience were Tom Davis (Silicon Graphics), Tom Rike (Oakland
High School), Quan Lam (UC Berkeley President’s Office), Brian Conrey
(Director of the American Institute of Mathematics in Palo Alto (AIM)), and
Donald Knuth (Stanford), who all expressed a desire to help with the new circle
and Olympiad movement. MSRI and AIM then launched a series of events
with local teachers and the media to publicize BAMO and to encourage the
start-up of many math circles. Alexander Givental and Bjorn Poonen (UC
Berkeley), John McCuan (MSRI), Dmitry Fuchs (UC Davis), Tatiana Shubin
(SJSU), Joshua Zucker (then at Henry Gunn High School), and others were
attracted through these events and pledged their support.
There must have been more “crazy” people in the SF Bay Area at that
time. A twin to BMC was born: the San Jose Math Circle [71] came into
existence the same week as BMC, mid-September 1998, under the tender
care and never-ending enthusiasm of Tatiana Shubin and Tom Davis, and is
still operational. For a few years Tom Rike, Joshua Zucker, and John Howe
led their own school-based circles in Oakland, Henry Gunn, and Presentation
High Schools, respectively. Sam Vandervelde had a circle for two years at
Stanford [78] (now led by parents). With MSRI’s guidance and support,
Paul Zeitz and Brandy Wiegers launched a different type of math circle in
San Francisco [70] and Oakland [62]. Sharon Madison opened the Sudbury
Math Circle (Canada) as a chapter of BMC, and Olga Radko also fashioned
the LA Math Circle [47] after BMC. The SF Bay Area network has expanded
now to a number of math circles across the U.S.: very few school-based and
not nearly as many as needed, but certainly way more than a decade ago.
4.5. Mapping out the future. Zooming back in on the Berkeley Math
Circle, the services it offers begin with the weekly sessions and the monthly
contests, but certainly do not end there. BMC has become a center for
communications between students, parents, instructors, teachers, educators, and
university administrators, where the circlers’ present and future mathemat-
ical education is mapped out. This kind of mentoring is possible only in the
presence of both “sides”: high quality instructors and students.
The more than 50 BMC instructors per year range from teachers and
students to university faculty and real world tycoons. Among them are
mathematicians: Alexander Givental, Alexandre Chorin, Bernd Sturmfels,
Bjorn Poonen, Dmitry Fuchs, Elwyn Berlekamp, Federico Ardila, Joe Buhler,
Kiran Kedlaya, Olga Holtz, Ravi Vakil, Robin Hartshorne, Serge Lang,
Vera Serganova, and many more. Some famous alumni have also contributed
sessions to the circle: Gabriel Carroll, Maxim Maydanskiy, Inna Zakharevich,
Neil Herriot, Andrew Dudzik, Austin Shapiro, Oaz Nir, and Evan O’Dorney,
all of whom have chosen career paths in or related to mathematics.
The accomplishments of the BMCers are stellar. For example, half of
the BAMO grand prizes and brilliancy awards have been captured by the
BMCers, including the only brilliancy award won by a girl, Hoan Ngo (Oak-
land High School), and the only BAMO-8 grand prize won by a girl, Laura
Pierson (then a 6th grader at Oakland’s Hillcrest School), as well as a dozen
gold and silver medals at the IMOs and a dozen USAMO wins. In 2007,
Evan O’Dorney, as an 8th grader, scored perfectly at BAMO and won the
National Spelling Bee, meeting and enchanting the then-President Bush; the
next year he scored highest at the USAMO and received the Clay Olympiad
Scholar Award [15] for one of his solutions; he went on to earn the second
highest score in the world at the IMO ’10 in Kazakhstan and received a
congratulatory call from President Obama, meeting him a year later when
in Washington to be awarded the first place prize at the Intel Science Talent
Search in 2011. Several multiple-time Putnam Fellows1 are also among our
students. But most importantly, original mathematical research has been
conducted by several circlers, including Gabriel Carroll, Tiankai Liu, Mak-
sim Maydanskiy, Evan O’Dorney, and others.
5.1. Early birds. Creative people start at a very young age to think
“outside-of-the-box” and to make significant contributions to the world. Some
notable examples are Bill Gates, who at age 20 dropped out of Harvard
to run Microsoft full-time; Steve Jobs, who co-founded Apple at age 21; and,
more recently, Mark Zuckerberg, who created Facebook, a social graph platform,
at age 19. The best young minds in the U.S. deserve our support. The Top-Tier
Math Circles are venues for such support: they nurture individuals who are
capable of significant accomplishments by giving them advanced training in
problem-solving tools that are found in no other U.S. educational institution.
As another example, a month before Evan O’Dorney [50] qualified for
his first IMO in Spain ’08, the 9th grader was exempted from his final in a
linear algebra class at UC Berkeley. The reason: he solved an open problem
posed in an article by Professor William Kahan [45]; more precisely, Evan
found out how small one can make the Cayley transform of a real orthogonal
matrix by reversing the signs on selected columns.
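To make the statement above concrete, here is a minimal sketch in Python. It is not taken from Professor Kahan's article or from Evan's solution. It assumes one common convention for the Cayley transform of an orthogonal matrix Q, namely (I − Q)(I + Q)^(-1), restricts itself to 2×2 matrices, and measures "smallness" by the largest entry in absolute value; the function names cayley_2x2, flip_columns, and smallest_cayley are hypothetical. The sketch tries all four sign patterns on the columns of a rotation matrix and keeps the pattern whose transform is smallest.

```python
import math
from itertools import product


def cayley_2x2(q):
    """Return (I - Q)(I + Q)^(-1) for a 2x2 matrix Q, or None if I + Q is singular.

    The formula is an assumed convention; other sources use variants of it.
    """
    (a, b), (c, d) = q
    plus = [[1 + a, b], [c, 1 + d]]      # I + Q
    minus = [[1 - a, -b], [-c, 1 - d]]   # I - Q
    det = plus[0][0] * plus[1][1] - plus[0][1] * plus[1][0]
    if abs(det) < 1e-12:
        return None
    # Explicit inverse of the 2x2 matrix I + Q.
    inv = [[plus[1][1] / det, -plus[0][1] / det],
           [-plus[1][0] / det, plus[0][0] / det]]
    return [[sum(minus[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def flip_columns(q, signs):
    """Multiply column j of Q by signs[j], where each sign is +1 or -1."""
    return [[q[i][j] * signs[j] for j in range(2)] for i in range(2)]


def smallest_cayley(q):
    """Try every column sign pattern; return (size, signs, transform) of the best one."""
    best = None
    for signs in product((1, -1), repeat=2):
        transform = cayley_2x2(flip_columns(q, signs))
        if transform is None:
            continue  # transform undefined for this sign pattern
        size = max(abs(x) for row in transform for x in row)
        if best is None or size < best[0]:
            best = (size, signs, transform)
    return best


theta = 2.0  # rotation angle in radians, chosen only for illustration
q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]
size, signs, transform = smallest_cayley(q)
print(signs, size)
```

For a rotation by an angle θ, the transform computed this way has off-diagonal entries ±tan(θ/2). Reversing both columns replaces them by ±cot(θ/2), while reversing only one column turns Q into a reflection, for which I + Q is singular and the transform is undefined. In this toy 2×2 case, then, the best sign choice always yields entries of absolute value at most 1.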
1 William Lowell Putnam Mathematical Competition [64] is the premier Mathematical
Olympiad for college students in the world. A Putnam Fellow is among the top 5 scorers.
2 Excerpts from [85].
The moment Laura Pierson from Oakland walked into BMC as a 5th
grader, it was obvious that she was special beyond any regular measures. As
a 6th grader, she made history: she won the BAMO-8 Grand Prize in 2012
with a perfect score and conquered USAJMO ’12, thereby becoming the
youngest to have been invited to MOSP. She went on to win silver medals
on the U.S. (high school!) teams at the European and China Girls Math
Olympiads in 2013 and 2012, respectively. She astounded her professors at
UCB when, as a seventh grader, she received the top scores in the
multi-hundred-student Calculus II and upper-division Linear Algebra courses. She was
accepted to College Preparatory School in Oakland, skipping 8th grade.
“BMC has opened up a whole new world for me. It sparked my passion
for math and introduced me to whole new areas of math I had no idea
existed. I’ve also gotten to meet so many amazing people who share my
passions and who I can connect with and learn from. In many ways it’s
been a really life-changing experience.”
Laura Pierson, BMCer, 9th grader
Nico Brown from Mill Valley is the kind of kid about whom you have no
doubt: he “breathes” mathematics just as he breathes air. Being precocious
does not come even close to describing the mature interest in pure mathematics
which Nico spontaneously exudes. He has 13 accepted sequences
in the On-Line Encyclopedia of Integer Sequences, a database peer-reviewed
by mathematicians. A multiple winner of the Monthly Contest and the winner
of the Individual Countdown Round of the Berkeley Mini-Math Tournament ’13,
Nico expresses his passion most prominently through his work
at mathnik.com on “original mathematics and proof writing, particularly in
number theory.”
“Most weeks start on Monday mornings, but mine start on Tuesday
nights with the Berkeley Math Circle. It’s the highlight of my week for a
couple of reasons. Reason #1: The math, of course, but math I wouldn’t
see otherwise, such as the chromatic number of the plane or matrices,
brought in by people who love math like me. Reason #2: I’ve met two
of my best friends at BMC. For kids who love math, it’s rare to meet
others who feel the same; so combining math with friendship is why I
keep coming back. BMC also stands for ‘Best Math Community’.”
Nico Brown, BMCer, 6th grader
Vincent Pisani from Castro Valley has been in BMC for three years
and, as one of the youngest participants, has bravely taken any and all tests
offered at the circle, including AMC8, AMC10, and BAMO. After his
Johns Hopkins 2012 High Honors award, it may come as an anticlimax
to learn that he also received credit for the California High School Algebra
requirement based on the results of tests taken as a 4th grader. A programmer
and iPad app developer, Vincent is also an accomplished trumpet player.
“I really enjoy going to the Berkeley Math Circle. Each week has a
new topic, so I get to learn about a huge variety of mathematical topics,
unlike school. I have also met several great friends who also enjoy math,
including a professor from USF. I get together with them often to share
and work on math. BMC feeds my appetite for learning about math, and
I think it is worth driving all the way to Berkeley each Tuesday.”
Vincent Pisani, BMCer, 6th grader
Arav Karighattam from Davis joined the circle two years ago and won
over everyone with his smile and irrepressible enthusiasm for math. He re-
ceived the BAMO Young Student Achievement Award in ’12 and ’13, was
one of the top students in the Junior High category of the mathleague.org
California State Championships in ’12 and ’13, qualified for AIME in ’13 (as
a 4th grader) and in ’14, and continues to amaze his UC Davis professors in
upper-division courses such as Combinatorics, Euclidean Geometry, Number
Theory, and Real Analysis. He has also won music and poetry competi-
tions, including the Composers Today California State Contest in ’13 and
the ‘Voices of Lincoln’ Young Poet Contest in ’11, ’12, and ’13.
“There are many things I love about the Berkeley Math Circle. First,
I like the range of advanced topics taught at each session. Second, I
enjoy all the open problems presented at the circle during certain lectures
(e.g., which permutations are Wilf-equivalent?). That is why I don’t like
to miss a single session of BMC, rain or shine. It is an extraordinary
experience.”
Arav Karighattam, BMCer, 5th grader
Espen Slettnes is a third grader at BMC, who rapidly moved from the
BMC-Elementary to the BMC-Intermediate group in only two years and
received, not surprisingly, the 2012 High Honors Award from Johns Hopkins
University’s Center for Talented Youth and Math Kangaroo’s 2013 5th place
in California and 10th place nationwide. He is also a Young Scholar at
Davidson Institute for Talent Development and was selected to participate
at the Epsilon Camp for exceptionally gifted young children in 2013 and 2014.
“I am 8 years old, and I love math. BMC is an important part of my
math education, because it is one of the only places I get to work on real
math that I don’t get to do in school. The lectures introduce me to many
different math topics and help me dive deeper into topics I already know.
I also love participating in the BMC monthly contests, which exercise
my mind and help me improve my skills in writing mathematical proofs.
I am very glad to be part of BMC.”
Espen Slettnes, BMCer, 3rd grader
5.3. The gathering storm. There are a number of studies of the deteri-
orating situation in U.S. math and science education and its impact on the
scientific and technological presence of the U.S. in the world. To describe
just how critical the situation is, we refer below to three such reports.
“The United States is losing its edge in innovation and is watching the
erosion of its capacity to create new scientific and technological break-
throughs. Increased global competition, lackluster performance in mathe-
matics and science education, and a lack of national focus on renewing its
science and technology infrastructure have created a new economic and
technological vulnerability as serious as any military or terrorist threat.”
A Commitment to America’s Future, 2005 [13]
The National Academy of Sciences has also called to our attention the
need for the U.S. to raise its capabilities in mathematics, science, and en-
gineering, in a report “Rising Above the Gathering Storm: Energizing and
Employing America for a Brighter Economic Future” [58]. According to it:
• The U.S. has long depended on foreign-born and -trained mathematicians,
engineers and scientists to help maintain its intellectual lead.
• The global competition for these talented individuals has greatly intensified
in recent years and will continue to do so, as the rest of the world increases
its technical capabilities and living standards.
• To remain competitive, the U.S. needs to devote considerably more effort and
resources to foster excellence in mathematics, science and engineering.
5.4. Raising the ceiling. What can be done in the U.S.? Hung-Hsi Wu,
Professor of Mathematics at UC Berkeley, has been involved in the educa-
tion of U.S. mathematics teachers for the last decade. He was on the Task
Group on Teachers in the National Mathematics Advisory Panel appointed
by President Bush and is currently serving on the National Research Council
Panel on the Study of Teacher Preparation Programs.
According to Professor Wu, a main purpose of both panels is to address
the crisis in teacher quality among math teachers so as to ensure the production
of a large enough pool of mathematically literate students to fill our
technological needs. However, to ensure that we also produce first-rate scientists
and mathematicians, a different kind of approach would be necessary:
“This is where the Math Circles come in. It is programs like the
Math Circles that can provide the needed guidance and stimulation for
the cream of the crop of this pool. While the work done by the above-
mentioned panels is designed to raise the floor to make our nation com-
petitive in the global market, what the Math Circles do is to raise the
ceiling in order to maintain our worldwide leadership position in science
and technology.”
With hope,
Zvezdelina Stankova
Berkeley Math Circle Director
Berkeley, March 17, 2014
Symbols and Notation
Geometry Notation
: divide or take the ratio of segments
α, β, γ, δ alpha, beta, gamma, or delta: letters from the Greek alphabet
aA mass point (a, A)
I(A, r) inversion with center A and radius r
I(A) inversion with center A and unspecified radius
[ABC] area of triangle ABC
AB segment AB or its length depending on context
|AB| distance from A to B; used if AB is ambiguous
$\widehat{AB}$ arc AB
$\overrightarrow{AB}$ ray AB
∠ABC angle ABC
△ABC triangle ABC
I Triangle Inequality
⊥ is perpendicular to
∥ is parallel to
≅ geometric congruence
∼ geometric similarity
∠A = ∠B congruence of angles written also as ∠A ≅ ∠B
Combinatorics Notation
n! n factorial, 1 · 2 · 3 · · · (n − 1) · n
P (n, k) number of permutations of n objects taken k at a time
$\binom{n}{k}$ binomial coefficient n choose k, n!/(k!(n − k)!)
U unknot
T (right-hand) trefoil
$4_1$ figure 8
H Hopf link
W Whitehead link
B Borromean rings
S Square knot
R1, R2, R3 Reidemeister moves on links
τ(L) the number of tricolorings of a link L
$K_1 \# K_2$ connected sum of two knots
L mirror image of link L
$V_L$ Jones polynomial of a link L
Maia Averett is on the faculty at Mills College. She completed her PhD
in 2008 at UC San Diego where she was a UC Regents Dissertation Year
Fellow. Her area of mathematical specialty is the wobbly world of topology.
She started off as a homotopy theorist, but lately her research has been in
the fascinating new area of topological data analysis, a field that applies the
abstract machinery of algebraic topology to point cloud data to gain insight
about topics ranging from breast cancer to basketball.
Because she found her passion for mathematics relatively late in college,
she has made mathematical outreach to young people a central objective
in her career. She has created and conducted math circle sessions since 2008
for both the Berkeley and the Marin Math Circles. Maia also has a special
interest in fostering women in mathematics. She has been engaged in the Ex-
panding Your Horizons program at UC San Diego and at Mills. She founded
student chapters of the Association for Women in Mathematics (AWM) at
UC San Diego while in graduate school and later at Mills, where the chapter
goes by the name of The Möbius Band in honor of her love of topology.
Events organized by The Möbius Band regularly attract upwards of 30 peo-
ple – quite a feat at a school like Mills, which has only 950 undergraduates.
Maia has also taken an active role in the AWM on a national level, serving on
and chairing the student chapters committee and creating chapter meet-ups
at national math meetings.
When she’s not teaching, researching, programming, outreaching, or
otherwise mathematically engaged, Maia enjoys cooking Thai food, circuit-
bending children’s toys, and hiking in the Oakland hills with her dog.
Do some of these scenarios sound bizarre, having never before been associated with
mathematics? Mathematicians love having fun while doing serious mathematics and
that love is what this book intends to share with the reader. Whether at a beginner,
an intermediate, or an advanced level, anyone can find a place here to be provoked
to think deeply and to be inspired to create.
In the interest of fostering a greater awareness and appreciation of mathematics
and its connections to other disciplines and everyday life, MSRI and the AMS are
publishing books in the Mathematical Circles Library series as a service to young
people, their parents and teachers, and the mathematics profession.