
Mathematical Circles Library

A Decade of the
Berkeley Math Circle
The American Experience,
Volume II
Zvezdelina Stankova
Tom Rike


Advisory Board for the MSRI/Mathematical Circles Library
Titu Andreescu
David Auckly
Hélène Barcelo
Alissa S. Crans
Zuming Feng
Tony Gardiner
Kiran Kedlaya
Nikolaj N. Konstantinov
Silvio Levy
Andy Liu
Walter Mientka
Bjorn Poonen
Alexander Shen
Tatiana Shubin (Chair)
Zvezdelina Stankova
Ravi Vakil
Ivan Yashchenko
Paul Zeitz
Joshua Zucker

2010 Mathematics Subject Classification. Primary 00–01, 00A07; Secondary 00A08.

This volume is published with the generous support of the Simons Foundation and Tom
Leighton and Bonnie Berger Leighton.

For additional information and updates on this book, visit

Library of Congress Cataloging-in-Publication Data

A decade of the Berkeley Math Circle : the American experience / Zvezdelina Stankova, Tom
Rike, editors.
p. cm. — (MSRI mathematical circles library ; v. 1–)
Includes bibliographical references and index.
ISBN 978-0-8218-4683-4 (alk. paper)
1. Mathematics—Study and teaching (Middle school)—California—San Francisco Bay Area.
2. Mathematics—Study and teaching (Secondary)—California—San Francisco Bay Area.
3. Berkeley Math Circle. I. Stankova, Zvezdelina, 1969– II. Rike, Tom, 1943– III. Mathe-
matical Sciences Research Institute (Berkeley, Calif.)
QA13.5.C22S363 2008
510.7127946—dc22 2008030521

Copying and reprinting. Individual readers of this publication, and nonprofit libraries
acting for them, are permitted to make fair use of the material, such as to copy select pages for
use in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit:
Send requests for translation rights and licensed reprints to
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.

© 2015 by Zvezdelina Stankova and Thomas Rike
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at
Visit the MSRI home page at
10 9 8 7 6 5 4 3 2 1 20 19 18 17 16 15
To Zvezda’s husband, Dmitri, and to Tom’s wife, Peggy, for making
this book possible, for their infinite patience and love over the course
of two years of hard work,

and . . .

to the instructors of the Berkeley Math Circle for donating their

time and effort over the last decade, for leading inspiring sessions
full of mathematical challenges and wonders, and for sharing their
passion for mathematics with the circlers.

Zvezda Stankova and Tom Rike


Contents

Foreword

Introduction
1. Top-Tier Math Circles
2. Why, What, and for Whom?
3. Notation and Technicalities
4. The Art of Being a Mathematician and Problem Solving
5. Acknowledgments

Session 1. Geometric Re-Constructions. Part I
1. Experimenting and Conjecturing
2. A Triangle Workout
3. Walking Along an Optimal Path
4. Walking Along an Integer Grid
5. To Prove or to Take for Granted?
6. Hints and Solutions to Selected Problems

Session 2. Rubik’s Cube. Part II
1. What Is a Group?
2. Permutation Groups and Group Isomorphisms
3. Properties of Groups and Their Subgroups
4. Even and Odd Worlds
5. How Many Cube Positions Can Be Reached?
6. Conclusions
7. Hints and Solutions to Selected Problems

Session 3. Knotty Mathematics
1. A Knot, or Not a Knot. That Is the Question.
2. Reidemeister and Knot-Eating Machines
3. Three Crayons Defeat an Army of Knots
4. The Jones Polynomial
5. Is This the End?
6. Hints and Solutions to Selected Exercises

Session 4. Multiplicative Functions. Part I
1. Infinite Raffle: the Initial Setup
2. What are Multiplicative Functions?
3. Sum-Functions
4. Hints and Solutions to Selected Problems

Session 5. Introduction to Group Theory
1. Puzzling It Out
2. A Polynomial Prelude
3. Action Groups
4. General Groups
5. Some More Examples of Groups
6. Permutation (or Symmetric) Groups
7. The 15-Puzzle Puzzled Out
8. Hints and Solutions to Selected Problems

Session 6. Monovariants. Part II
1. Numerical Monovariants
2. Constructive Activities
3. Not Getting There
4. Conway’s Checkers
5. Hints and Solutions to Selected Problems

Session 7. Geometric Re-Constructions. Part II
1. Optimal and Infinite Challenges
2. A Pythagorean Path for the Intermediate
3. Physics and Math Combine Forces
4. Ptolemy’s Lead into Trigonometry
5. Hints and Solutions to Selected Problems

Session 8. Complex Numbers. Part II
1. Warning, “Teaser,” and Strategy
2. Conventions from the Past
3. Complex Division
4. The Triangle Inequality: No “Respect” for Addition?
5. Integer Powers in C
6. Roots in C
7. Roots of Unity and Regular Polygons
8. Geometric Promise Fulfilled
9. Venturing Everywhere in the Plane
10. Which are the “Closest” Lines
11. Hints and Solutions to Selected Problems

Session 9. Introduction to Inequalities. Part I
1. The Language of Inequalities
2. Arithmetic Mean – Geometric Mean Inequality
3. Power Mean Inequality
4. The Land of the Convex
5. Applications of Convexity to Inequalities
6. Geometry Leftovers and a Mean Summary
7. Hints and Solutions to Selected Problems

Session 10. Multiplicative Functions. Part II
4. Dirichlet Product
5. Möbius Inversion Formula
6. The Euler Function φ(n)
7. The Taming of the ShrewD φ
8. Hints and Solutions to Selected Problems

Session 11. Monovariants. Part III
1. The Balkan Roots Challenge
2. Smoothing and Unsmoothing
3. Rearranging Terms
4. Convexity and Smoothing
5. Random Fun with Smoothing
6. Appendix on Limits and Endless Smoothing
7. Hints and Solutions to Selected Problems

Session 12. Geometric Re-Constructions. Part III
1. Farmer-and-Cow via Inequalities and Calculus
2. Optimal Bridge Located!
3. Infinitely Many Angles and Infinite Series
4. Historical Detour: from Today back to Archimedes?
5. Hints and Solutions to Selected Problems

Epilogue
1. What Comes from Within
2. The Culture of Circles
3. Eastern European vs. USA Math Circles
4. History and Power
5. Does the U.S. Need Top-Tier Math Circles?

Symbols and Notation
Abbreviations
Biographical Data
Bibliography
Credits
Index

Foreword

When I came to the Mathematical Sciences Research Institute, MSRI, as Director
in 1997, the Institute already had an extraordinary and distinguished
history of research programs. Bill Thurston, my predecessor, felt strongly
that MSRI had both an opportunity and an obligation to build on its research
excellence a structure that would promote mathematics and its applications
in other ways, among them public engagement and education, and he had
started several programs for this purpose. I was very much in agreement
with this point of view, as was the then Deputy Director, Hugo Rossi.
In 1998 a new opportunity appeared in the form of a postdoc: Zvezdelina
Stankova, freshly at Berkeley after a PhD at Harvard. Zvezda (as we all
learned to call her) came to our offices, telling us that the U.S. mathematical
community was missing a major opportunity to encourage youngsters to love
and be inspired by mathematics: Math Circles, a program long popular in
the Eastern Bloc countries (and in particular in Bulgaria, Zvezda’s former
home), was nearly unheard of here. Zvezda proposed that we get a math
circle going in Berkeley, and perhaps try to spread the tradition.
This turned out to be a most rewarding project, and a large group was
soon engaged. Aside from Zvezda, Paul Zeitz (Professor of Mathematics at
the University of San Francisco and coach of the winning American team at
a recent International Mathematics Olympiad), Tatiana Shubin (Professor
of Mathematics at San Jose State University, who brought her passion and
experience of math circles from the former Soviet Union), and Tom Davis (an
applied mathematician at Silicon Graphics) joined forces to offer after-school
math programs that were advanced, challenging, fun, and beautiful. We soon
learned of others who were also passionate about developing math circles,
including Mark Saul in New York and Bob and Ellen Kaplan in Boston.
It is one thing to start an individual math circle, another to start a na-
tional movement, but the latter was always something we hoped for. When
Mark Saul was a Program Director at the Templeton Foundation, MSRI
received a grant from the Foundation to help start the Math Circles Li-
brary, in partnership with the American Mathematical Society. This vol-
ume is the latest in that series, all of which can be found, for example, at or


We also began a National Math Circles Association,
an organization that now has more than 100 adherent Circles and makes
small grants to those who would like to start one. The work has received
further support from Tom Leighton and the Akamai Foundation, the Simons
Foundation, the Moscow Center for Continuous Mathematics Education,
the American Mathematical Society, and of course from MSRI’s donors
and Trustees. To all of these we are most grateful!
And what do the Math Circles actually do? I can do no better than
to quote the description by Robert Bryant, from his foreword to the first
volume in the Math Circles Library, which appeared in 2008:
And the students came, first in Berkeley and San Jose, then in
San Francisco, Stanford, Oakland, and Davis, with open minds
and willing hearts to learn about mathematics of which they pre-
viously could not have dreamt. They worked on problem solv-
ing with faculty who had a depth of understanding and a love of
mathematics beyond anything they had ever encountered. There
were no attendance lists, no tests, no being forced to do anything.
The beauty of the mathematics attracted them to want more and
more. We have come to learn that many students, girls and boys,
of different socio-economic and racial backgrounds thrive in math
circles and come to love the experience.
Both professors and students find in math circles a situation
that is rare in classroom mathematics instruction. Math circles
are voluntary, extra-curricular, after-school programs. The stu-
dents who are there are much less likely to be motivated by the
need to satisfy an academic requirement, prepare for a career, or
enhance a resumé. They are, for the most part, there because
they love mathematics. The teachers encounter students who are
willing and hungry to learn, while the students encounter teach-
ers with expertise and enthusiasm far beyond the usual classroom
experience. Teachers and students look forward with anticipation
to the next meeting.
The Math Circles Library in which this volume appears is just one facet
of the support that MSRI provides to the National Association of Math
Circles (NAMC); MSRI also supports the position of Director of the NAMC,
arranges funding for mini-grants to begin new Circles, and helps organize
and fund workshops that help train new Circle leaders. For all this see
the NAMC website, where one can also find lists of
Circles in different neighborhoods and additional resources. MSRI provides
all this support because the Math Circles have proven such an effective way
of sparking an enthusiasm for mathematics in young minds!
David Eisenbud
Mathematical Sciences Research Institute

Introduction

“The Berkeley Math Circle was really critical in my development. It
was the best method available not only to get a flow of mathematical ideas
and problems to think about each week but also to meet other interested
students and professional mathematicians from all over the Bay Area.
You get stimulation from exchanging ideas with other people that you
don’t get from reading books at home.
I can also testify to the usefulness of studying mathematics even for
students who don’t plan on doing it as a career. For someone who wants
to go into, say, law, policy analysis, philosophy, economics, or computer
science, the kind of logical, abstract thinking that mathematics develops
is really the best preparation. I realize that the Circle is most interested
in attracting students whose lifelong passion is for mathematics, but it
also helps others along the way.”
Gabriel Carroll, BMC alumnus
Perfect IMO ’01 score
Four-time Putnam Fellow
Assistant Professor of Economics, Stanford

1. Top-Tier Math Circles

This section is based on contributions from Marc Whitlow and Mike Breen
(BMC Parents), Zvezdelina Stankova (BMC Director), and Tatiana Shubin
(SJMC Director).

This book is based on material from a dozen of the 800 sessions of the
Berkeley Math Circle (BMC), held over the past 16 years. BMC has been
described as a top-tier math circle, calling for the following two definitions.

1.1. Math circles are weekly math programs that attract elementary, mid-
dle, and high school students to mathematics by exposing them to intriguing
and intellectually stimulating topics, rarely encountered in classrooms. Math
circles vary in their organization, styles of sessions, and goals. But they all
have one thing in common: to inspire in students an understanding of and
a lifelong love for mathematics.


1.2. Top-tier math circles prepare our best young minds for their future
roles as mathematics leaders. Sessions are taught by accomplished mathe-
maticians and explore advanced mathematical areas. They provide an ed-
ucational opportunity for top pre-college mathematics students, not offered
in any other setting in the U.S. education system. In addition to learning
advanced mathematics topics, students are taught the technical writing skills
needed to convey the solutions of complex problems.
As an example of a top-tier math circle, the Berkeley Math Circle
is fashioned after the leading models in Eastern Europe, where math circles
originated over a century ago. BMC itself started in the fall of 1998 with
about 50 students, primarily in grades 7-12, and there was only one session
per week that lasted 2 hours. Sixteen years later, the circle has expanded to
about 300 students in grades 1-12, split into two major groups:
• BMC-Upper with 3 levels: BMC-Beginners for 5th-6th grades (1.5
hours per week); BMC-Intermediate for 7th-8th grades (2 hours per
week); and BMC-Advanced for 9th-12th grades (2 hours per week).
BMC-Upper is directed by Zvezdelina Stankova.
• BMC-Elementary with 2 levels: BMC-Elementary I for 1st-2nd grades
(3 sections, 1 hour per week); and BMC-Elementary II for 3rd-4th
grades (3 sections, 1 hour per week). BMC-Elementary is directed
by Laura Givental.
This book series is based on sessions from BMC-Upper and from the orig-
inal BMC, when there was only one group for all. To save space, “BMC”
throughout this book will refer, for the most part, to materials, instructors,
and students from “BMC-Upper.”
Like top-tier universities, BMC
• challenges students with beautiful, difficult mathematical theories,
• introduces them to powerful problem-solving techniques,
• constantly provokes deep thought, and
• inspires the creation of original ideas.
Topics covered at BMC include combinatorics, graph theory, linear alge-
bra, geometric transformations, recursive sequences, series, set theory, group
theory, number theory, elliptic curves, algebraic geometry, applications to
computer science, natural sciences, economics, and many more. Each topic is
taught by an expert in the field who has the ability to challenge the students
and support them as they attempt to meet these challenges. All problems
require students to come up with mathematical proofs. Proofs put forward
by the students are not always the most eloquent. Only an accomplished
mathematician can understand where a student might be heading in his/her
proof and offer assistance through this challenge. (For examples of
noteworthy past and present instructors who have brought their world
expertise to BMC, see the Epilogue.)

The sessions are fast-paced and intellectually demanding. It is hard to
convey just how advanced this subject matter is without actually attending
a session; but comparable levels can be found in advanced undergraduate
and beginning graduate courses.
The Monthly Contests (MC) at BMC can also convey the depth of
the material. These are take-home exams of four or five hard, thought-
provoking problems, requiring independent research, split into two levels:
MC-Beginners (up to grade 8) and MC-Advanced (up to grade 12). In the
beginning years of BMC, the monthly contests were designed and graded
by UCB faculty. However, for the last 14 years the MC have been designed and
coordinated by current and former circlers.
The MC develop not only advanced understanding, but also technical
writing skills: the students must describe on paper, convincingly and without
gaps, how they solved a problem. This is a fundamental skill and key to
making intellectual property contributions; it is a unique feature of the top-
tier math circles, not found in middle or high schools, where students are
taught to meet state standards on questions that take less than a minute to
answer. In contrast, monthly contest problems may take the best students
hours or days of concentrated thought. Only a few participants are capable
of solving all the problems; yet, through the attempt everyone learns about
the real world of mathematical research.

1.3. The next generation of math leaders. The students of BMC come
from a variety of socio-economic and ethnic backgrounds. The proportion of
female to male students is approximately 2:3. This is an amazingly high ratio
considering the trend of other high-level math programs, which are “male-
dominated” or “male-only.” Excellent role models for the female students are
provided by the female directors of the top-tier math circles in Berkeley [11],
San Jose [71], Los Angeles [47], and (formerly of) Marin Math Circle [52];
but perhaps even more important to the students are the outstanding lectures
given by dozens of female professors and graduate students.
Currently, BMC does not actively recruit participants. Students and
their parents find out about the circles by word of mouth, from the Circle’s
web site, through local universities,
and in publications. Due to an increased number of applicants, there is
a semi-formal selection process based on several open essay-type questions
along the lines of:
• Describe your mathematical background and experiences so far.
• Why do you want to join BMC? What do you expect from BMC?
• What is your favorite math problem that you can solve? State and
solve the problem. Why is it your favorite?
• What is your favorite math problem that you cannot solve? State the
problem and explain why you cannot solve it but why you would like
to solve it.

Needless to say, BMC students are usually years ahead of their peers:
they often complete most of high school mathematics by age 13 (8th grade),
some take many college math major courses by the time they graduate from
high school, and a few of the top circlers venture into graduate courses
and serious mathematical research even before entering a university. The
accomplishments of students who have benefited from BMC can be measured
in many ways. For example, a number of these students have gone on to win
International Math Olympiad medals and Putnam awards, and the majority
have been admitted to top-tier universities. BMC and the other top-tier
math circles not only produce highly accomplished students – they produce
and train the next generation of leaders in mathematics. (To learn about the
need for top-tier math circles, we direct the reader to the Epilogue.)

2. Why, What, and for Whom?

Running BMC for 16 years has taught us a lot about math education in
the U.S. and has helped us to understand better our own childhood education
and origins of our passion for mathematics. To share this experience with
you, the reader, is the purpose of this book:
• to present you with beautiful theories, problem-solving techniques, and
mathematical insights;
• to provide you with an abundance of exercises and problems to work
on and with ready materials for math circle sessions.

2.1. The middle or high school student who is interested in expanding
his/her math horizons and going well beyond anything that the regular math
classroom can offer, who is brave enough to tackle non-trivial math ideas and
work on hard problems for hours, who loves challenges and is motivated to
overcome them: this is the ideal reader of the book.
Don’t confuse the above description with “top” or “brilliant” students:
you will never know if you are talented in math unless you give it a try. And
you may be pleasantly surprised by what you find out: that mathematics
is a whole lot more than “adding fractions,” “algebraic manipulations,” or
“endless quadratic equations” in homework assignments. You will discover
that Calculus is not the “pinnacle” of mathematical knowledge (as thought by
many): it is only one of many beginnings, part of the subject of real analysis.
Indeed, other wonderful topics are awaiting you (cf. Fig. 1, p. xviii):
• multiplicative functions in number theory;
• knot theory in topology;
• Rubik’s Cube and groups in abstract algebra;
• interaction between geometry, trigonometry, physics, and Calculus;
• complex numbers arising from algebra and applied to geometry;
• game theory and inequalities attacked by monovariants; and
• plenty of proof methods and problem-solving techniques.

2.2. Prerequisites. To read the book comfortably, you do not need to have
Calculus under your belt, except
• in the very last section of Session 12 on plane geometry, which discusses
a series solution to a geometric question, or
• if you want to prove the cited theorems in Session 9 on inequalities.
However, familiarity with basic geometry and algebra concepts and theo-
rems will definitely be helpful; e.g., lines, circles, triangles, rectangles, trape-
zoids, and quadrilaterals in general; similarity criteria for triangles and the
Pythagorean Theorem; equal alternate interior angles for parallel lines and
bisecting diagonals in a parallelogram; integers, divisibility and remainders;
operations on fractions and real numbers, intervals and sets of numbers;
and manipulations of algebraic expressions written with letters. In some
sessions, functions will play a major role; hence having studied some basic
(pre-calculus) examples will not hurt; e.g., linear and quadratic functions,
polynomials, exponential and trigonometric functions, as well as their graphs.
The above concepts will be re-introduced via examples in the book. But
if you feel that you need more solid background, we direct you to several
wonderful books that should be part of any budding mathematician’s library:
• Geometry, Book 1 by Kiselev [32],
• Functions and Graphs [27], The Method of Coordinates [28], Sequences,
Combinations, Limits [31], Algebra [30] and Trigonometry [29] by
Gelfand, et al.,
• for the older reader, 103 Trigonometry Problems from the Training of
the USA IMO Team by Andreescu and Feng [5].

2.3. The logical structure of the book series (volumes I and II) is outlined
in Figure 2 on page xviii. A solid arrow indicates that a session requires
its “predecessor” to be studied beforehand, while a dashed arrow indicates
that the “predecessor” will be helpful but is not absolutely necessary. For
example, in order to understand Rubik’s Cube II, one should first study
Rubik’s Cube I; on the other hand, Rubik’s Cube I-II will make Group
Theory I more concrete, but they are certainly not mandatory.
Sessions that are bubbled in an ellipse can be attempted without any
prerequisites, while sessions encompassed in a rectangle have at least one
necessary predecessor. For example, Monovariants II calls for a prior study
of Monovariants I, while Knot Theory can be attacked with little reference
to other sessions. Sessions not enclosed in anything are from volume I.
Finally, there is a group of sessions that pertain to general proof meth-
ods, PSTs, and theory that appear in most other places. These sessions
are from volume I and are roughly grouped in the two nebulous “clouds”:
Proofs I-II, PSTs, and Induction in one “cloud,” and Number Theory I and
Combinatorics I in another “cloud.” Figure 2 captures some, but certainly
not all, relations among the sessions and topics. The reader is welcome to
search for and draw more arrows, as he/she goes through the book.

[Figure 1. Main Areas in Volumes I-II. Diagram not reproduced; surviving label fragments: Complex, Topology, Proofs, PSTs, Inequalities, Monovariants, Theory, Geometry.]

[Figure 2. Logical Structure of the 24 Sessions in Volumes I-II. Diagram not reproduced; the arrows are lost. Sessions shown: Geometric Re-Constructions I, II, III; Inequalities I; Monovariants I, II, III; Complex Numbers I, II; Multiplicative I, II; Knot Theory; Rubik's Cube I, II; Group Theory I; Proofs I, II; Induction; Inversion I; Number Theory I; Combinatorics I; Circle Geometry; Mass Point Geometry; Stomp (Invariants). An additional label reads "A bit of Calculus".]


2.4. The middle or high school teacher who wishes to start a math
circle in his/her school or teach a specially designed problem-solving class
will find this book series invaluable. To start with, five sessions from volume I
are a must for any math circle, as they provide techniques and a foundation
for solid mathematical understanding; these are Combinatorics I, Number
Theory I, Proofs I-II, and Induction.
Five of the topics in volume II are introductory and independent of each
other: Geometric Re-Constructions I, Knot Theory, Group Theory I,
Multiplicative I, and Inequalities I. Towards the end, some of these contain
harder material suitable for intermediate level and the second-to-third year
of a math circle. Four other sessions obviously need to be introduced after
studying their earlier counterparts: Geometric Re-Constructions II,
Rubik’s Cube II, Monovariants II, and Complex Numbers II. The remaining
three sessions are designed truly for the advanced reader: Multiplicative II,
Monovariants III, and the last section of Geometric Re-Constructions III.
Open questions or problems beyond the scope of the book are interspersed
throughout the book and should be left to the die-hards.
Running a math circle, especially for a teacher, is a hard task. But it is
possible. In the 1960’s, Tom Rike (an editor for this book and a veteran high
school math teacher) was working on his master’s degree. While browsing in
the library one day, he ran across The USSR Olympiad Problem Book [74].
It contained problems written for talented 7th-10th graders; yet, he could
not solve any of these “elementary” problems. In his own words:
“My abstract algebra had been too abstract, and I did not have the
concrete examples that I needed. I never took a class in number theory
because it sounded too elementary. I had developed the real number
system starting from the Peano axioms, but I didn’t really understand
the fundamentals of the natural numbers, prime numbers. This was
an epiphany for me. I felt as though I had been challenged by some
force outside me and did not know how to respond.”
For the next 30 years Tom studied olympiad problem solving, first on his
own, then through workshops and math circles in the SF Bay Area. He ran
his own math circle at Oakland High School and gave talks at just about all
other circles around. Even though at times he was only “a few pages” ahead
of the students, he kept on learning and teaching problem solving because
working on math circles had come to be a large part of his life:
“Although I have not attained my goal of becoming a true olympiad
problem solver, the journey I have made in pursuit of this goal has
been one of the most rewarding endeavors in my life.”
Hence, a word to the middle and high school teachers: keep on reading
the book, despite moments of difficulty or confusion. For the motivated,
persevering, and caring teacher, there will come a time when he/she will look
back at the material here, smile, and effortlessly deliver it to the students at
his/her own math circle. Truly gratifying.

2.5. Proofs in particular. That proofs are important goes without ques-
tion in the mind of Galileo’s father:
“It appears to me that those who rely simply on the weight of authority
to prove any assertion, without searching out the arguments to support
it, act absurdly. I wish to question freely and to answer freely without
any sort of adulation. That well becomes any who are sincere in the
search for truth.”

In volume I we learned a variety of proof methods: by contradiction,
Pigeonhole Principle, and induction; by counterexample, example, or general
argument; using invariants or monovariants, and others. All sessions in
volume II call for rigorous proofs. Although it is possible to get the gist of
the sessions without being familiar with proofs, reviewing first the sessions
on Proofs and Induction in volume I will make it faster and easier to read
and understand this book.

2.6. The parent of a middle or high school student is also among our
intended audience; in fact, parents are probably the most important readers
because without their support and enthusiasm, without them bringing and
encouraging their children, there would hardly be any top-tier math circles
in the U.S. Hence, if you are among those parents or if you are a parent
new to the math circle movement, this book series will provide a very strong
beginning for your child. And for you as well.
As a parent, you can do three things with this book: give it to your child
(but make sure that he/she has the necessary background – see the recom-
mended basic books); learn from it and teach your child; or give it to his/her
math teacher and encourage the founding of a school-based math circle.
Whatever path you choose to follow, it will eventually benefit your child
and possibly a larger group of classmates. In any case, enjoy the book!

3. Notation and Technicalities

“Philosophy is written in this grand book, the universe, which stands
continually open to our gaze. But the book cannot be understood unless
one first learns to comprehend the language and read the characters in
which it is written. It is written in the language of mathematics, and its
characters are triangles, circles, and other geometric figures without which
it is humanly impossible to understand a single word of it; without these
one is wandering in a dark labyrinth.”

3.1. Marginalia. In addition to geometric “characters,” we will also use a
number of other symbols from algebra and logic. Let us examine first the
non-standard margin icons which appear throughout the book.

[Margin icons not reproduced; their surviving labels are:]
• Warm-up or brute force
• Exercise
• Open question or one that requires extra knowledge
• Problem-solving technique
• Basic Pigeonhole Principle
• Generalized Pigeonhole Principle
• Basis step
• Inductive step
• Strong basis step
• Strong inductive step

The first four margin pictures refer to increasing difficulty of exercises
and problems. Assigning such symbols is somewhat arbitrary since the same
exercise could be easy for one person and could be a really hard problem for
another; something may be beyond the knowledge of the reader early in the
book, while later it may turn out to be a piece of cake. Thus, treat these
symbols as a general guide to the difficulty of the material and make your
own judgment after having attempted each problem.
The problem-solving techniques, indicated by an eye, are ubiquitous
throughout the book and will be discussed in the next section. The warn-
ing road sign, the high-voltage symbol, and the pigeons were introduced in
Proofs I in volume I. The last four margin pictures refer to the steps of basic
and strong mathematical induction, the basis for Session 6 in volume I.
3.2. Logic. Mathematical statements that are proven are referred to by
standard names such as theorem, lemma, proposition, property, or corollary.
Conjectures are statements that are believed to be true, but no proof for
them has been supplied yet. As opposed to volume I, in this book we will
not avoid the formal definition environment; likewise, theorems and such
will often be phrased formally.
All sessions have a section on Hints and Solutions to Selected Problems.
There and throughout the text, you will see two symbols indicating the end
of a solution. The standard square □ indicates the end of a complete solution
or a proof with minor gaps, which are usually mentioned and the reader is
expected to easily fill them in. The diamond ♦ is at the end of an incomplete
solution, partial proof, sketch of a proof, hint, or any discussion requiring
more work by the reader to reach a complete proof.
The text uses standard mathematical words and expressions, such as
“implies,” “therefore,” “if then,” “only if,” “if and only if,” letter notations
for various sets of numbers, e.g., Z for the integers, and many others. Even
though some are explained and illustrated via examples, the reader is
expected to be familiar with basic logic notions and notation (cf. the list of
Symbols and Notation on page 321). If you need to review or learn this
material in depth, we refer you to the first chapter of Jacobs’ Geometry [43]
on deductive reasoning. A complete list of Abbreviations can be found on
page 325.

3.3. Labeling and future volumes. Subfigures within the same figure
are implicitly labeled in alphabetical order. For example, Figure 4 on page 9
contains subfigures Figure 4a, 4b, 4c, and 4d, reading from left to right. Fi-
nally, about half of the sessions are parts of series of sessions, to be continued
in Volume III of the book.

4. The Art of Being a Mathematician and Problem Solving

“Perhaps I can best describe my experience of doing mathematics
in terms of a journey through a dark unexplored mansion. You enter
the first room of the mansion and it’s completely dark. You stumble
around bumping into the furniture, but gradually you learn where
each piece of furniture is. Finally, after six months or so, you find
the light switch, you turn it on, and suddenly it’s all illuminated.
You can see exactly where you were.”
Sir Andrew John Wiles

There are no manuals on how to become a mathematician. This book
will give you tips and will point to possible paths; but the “art of being a
mathematician” can be mastered only through personal experience. With
every problem solved and every new definition or theorem learned, you will
move closer to this goal. The two most important skills that you will acquire
along the way are
• to think creatively while still “obeying the rules” and
• to make connections between problems, ideas, and even theories.

4.1. Problem-solving techniques. Although all sessions in this book are
based on basic knowledge from middle and high school and are, therefore,
accessible to a wide range of ages and mathematical backgrounds, to do the
exercises, you need to develop problem-solving techniques (PSTs). Session 1
on inversion in volume I introduced PSTs as part of a trilogy of mathematical
knowledge: Concepts, Theorems, and PSTs; and throughout this book you
will encounter about 100 PSTs. You will also need to learn how to fit together
various mathematical parts in order to move forward in the solutions.

4.2. Muddying your hands. Do not expect each session to be a collection
of clearly spelled out recipes leading to instantaneous solutions . . . . Nope!
The book will encourage you to apply the newly acquired knowledge to
problems and will guide you along the way but will rarely give you ready
answers. “The best way to learn is to learn from your own mistakes,” said
my advisor Joe Harris. A number of places in the book will present common
problem-solving pitfalls and alternative ways to solve the same problem.
And so, it will be you, the reader, who has to commit to mastering the
new math theories and techniques by
• “muddying your hands” in the problems,
• going back and reviewing necessary PSTs and theory, and
• persistently moving forward in the book.
Nothing good comes “for free”: you will have to work hard, always with a
pencil and paper in hand. Keep in mind that the math world is huge: you’ll
never know everything, but you’ll learn where to find things, how to connect
and use them. The rewards will be substantial.

5. Acknowledgments

5.1. Institutional support and sponsors. The Berkeley Math Circle was
made possible through the years with the unwavering support of:
• University of California at Berkeley Math Department, which hosts the
Circle and its web site and has provided student assistants and secretarial
support every year since 1999. Through faculty grants, Ivan Matić has been
able to act as an associate director. The department chairs Cal Moore,
Hugh Woodin, Ted Slaman, Alan Weinstein, and Arthur Ogus have always
been encouraging and supportive, and several dozen UCB professors have
delivered Circle sessions.
• Mathematical Sciences Research Institute, which from its inception has
overseen the project, provided funds through various sponsors, and hosted
Circle meetings and events. Special thanks to Deputy Directors Hugo Rossi,
Joe Buhler, Michael Singer, Bob Megginson, and Hélène Barcelo, Directors
David Eisenbud and Robert Bryant, and Associate Director David Auckly
for their leadership, understanding, and help.
A number of sponsors have financially supported BMC over the years:
Packard Foundation, Toyota Foundation, Clay Mathematics Institute, Mosse
Foundation for Art and Education, Merriam-Webster Foundation for the
Scripps National Spelling Bee; National Science Foundation and other grants
from Professors Ravi Vakil (Stanford), Bjorn Poonen, Alexander Givental,
and Martin Olsson (UC Berkeley), and generous private donors.

5.2. Parents and students. BMC parents have encouraged and driven
their kids to the Circle for years, brought snacks during the breaks, organized
Circle parties, attended meetings, and donated time, effort, and personal
funds to the Circle. We are especially grateful to Marc Whitlow, Mike Breen,
Jennifer O’Dorney, Yuki Ishikawa, Ian Brown, and Tony DeRose for their
enthusiasm, leadership, and professional services provided so selflessly to the
Circle.
A sequence of UC Berkeley student assistants have contributed to the
smooth operation of the Circle by communicating with circlers, parents,
instructors, and administrators and by re-designing and maintaining the web
site. Joyce Yeung, Maksim Maydanskiy, Wycee de Vera, William Chen,
David Wertheimer, Michael Pejic, Stephanie Tung, and Hojae Lee have
been exceptionally professional and caring. Many thanks go to our monthly
contest coordinators: Professors Alexander Givental and Bjorn Poonen, cir-
clers Gabriel Carroll, Andrew Dudzik, Inna Zakharevich, Neil Herriot, Mak-
sim Maydanskiy, Evan O’Dorney, Evan Chen, and former associate director
Ivan Matić.

5.3. Professional support with the web site has been rendered on numer-
ous occasions by Paulo de Souza, Dmitri Mironov, Steve Sizemore, and Igor
Savine. Marsha Snow, Barbara Peavy, and Tom Brown have offered valu-
able secretarial support over the years. BMC owes its logo design to Archer
Design, Inc.
As one can see, many dozens of people have been involved in running
the Berkeley Math Circle: it is a joint operation born of the love and care
for our young generation of mathematicians. The most important people in
this operation are undoubtedly the BMC instructors (over 100), who have
delivered the 800 sessions during the last 16 years. We would like to thank
all of them! Twelve instructors joined BMC in the beginning and most have
stayed with us throughout the years: Ted Alper, Tom Davis, Dmitry Fuchs,
Alexander Givental, Quan Lam, Bjorn Poonen, Tom Rike, Vera Serganova,
Tatiana Shubin, Zvezdelina Stankova, Paul Zeitz, and Joshua Zucker.

5.4. Book support. Edward Dunne, our AMS editor, and his staff have
been very helpful in resolving technical and other issues. Gabriel Carroll
is responsible for drawing some of the cartoons in the book series, inspired
by the earlier BMC sessions. All USAMO problems are used with permis-
sion from the American Mathematics Competitions (AMC), Lincoln, Ne-
braska [2]. A few pictures and references have been taken from Wikipedia

With gratitude,
Zvezdelina Stankova
Berkeley Math Circle Director
Session 1

Geometric Re-Constructions. Part I

Along Optimal Paths and Integer Grids

Zvezdelina Stankova

Sneak Preview. Volume I introduced us to three geometry topics: circle geometry,
mass point geometry, and inversion in the plane. To different degrees they
all assumed some familiarity with the theory and techniques of classical geometry.
In this session, we will start filling in the missing geometric background, moti-
vated by two tantalizing problems: adding up angles in a triplet of squares and
finding the shortest path on a hot summer day. Both problems will be easy to
understand but certainly not easy to conquer. In our search for solutions, we
will intelligently conjecture by physical experimentation; boldly re-create by re-
flections or grid extensions; and convincingly prove by the criteria for congruence
and similarity. Along the way, we will justify two well-known geometry theorems:
about centers of parallelograms and triangles, and we will briefly dip into the
history and logical foundations of geometry: Euclidean and hyperbolic.
All employed techniques will be accessible to ages 10+. Yet, the originality of
the approaches will be gratifying for anyone seeing these problems for the first
time. In Part II we will continue exploring other, more advanced but perhaps not
as innocently exciting, solutions to our problems and their extensions.

1. Experimenting and Conjecturing

The signature problems of this session are two of my favorite plane ge-
ometry problems. After decades of subsequent advanced math studies, they
still remain crystal clear in my memory . . . to remind me of the wonder I
experienced when I first saw them as a 5th-grader back in Bulgaria [12].
As our first step toward solving them, we will experiment and decide if
our answers constitute a mathematical proof or not. It is absolutely nec-
essary to bring aboard for this journey some graph paper, scissors, clear
tape, a flexible but not stretchable cord, a pin, and of course, a pencil and a
straightedge. Highly recommended are a compass and completely prohibited
are calculators and other electronic equipment. We will depend only on basic
tools and on our unlimited imagination.

1.1. Cutting, taping, and guessing. Our first problem has an almost
century-long history. One of its solutions presented here resembles a truly
famous, almost mystical 2000-year-old puzzle leading way back to Archimedes!¹
Problem 1. (Three Squares) Three identical squares with bases AM ,
M H, and HB are put next to each other to form a rectangle ABCD
(cf. Fig. 1a). What is the sum of the angles ∠AM D + ∠AHD + ∠ABD?

Figure 1. Experimenting on the three squares
The problem is asking us to find something – an angle. In such situations,
people would give you the answer they believe is correct and, more often
than not, would think that they are done, without having actually proven
anything! But the reader who has gotten this far in the book series knows
that the solution should consist of at least two parts:
 (1) investigating and conjecturing, and
(2) formally proving the conjecture.
Alas, sometimes even just coming up with the correct answer is already
a challenge. For example, when I encountered this problem as a 5th-grader,
it wasn’t at all obvious to me what the sum of the three angles had to be . . . .
So, how was I to start on a problem when it was unclear what I was supposed
to be proving?
 PST 1. If physical experimentation is not too difficult, then do it in order
to discover some possible answers to a problem. Since conjecturing does not
require any proof, just about anything is allowed as “experimentation,” as
long as you follow the rules of the problem (and don’t hurt anyone!).
Figure 1b is more than suggestive:
Exercise 1. Draw the 3-squares problem on a graph paper, cut out the
three angles, and tape them to each other to form a single angle sum: the
three vertices will become one and some adjacent arms will coincide too.
How large do you think this angle sum will be? Estimate it.
If Figure 1c were drawn to show this resulting angle sum, it would have
given away the answer too easily. Now, of course, due to errors in the
physical experimentation, no two final angle sums will be absolutely the
same. Nevertheless, they will all look suspiciously close to a very well-known
angle . . . .
¹ See the Historical Appendix in Part II for an explanation of this startling reference.

Every time I ask the BMC-Beginners to complete this experiment, it
always produces the same emotional outcome: a number of students shout
out that the sum-angle is about 89.9°, others are adamant that it is slightly
obtuse; and yet, upon voting, the majority are convinced that “It has to be
90°!” And when I ask how they know, some students

Exercise 2. Pull out a protractor, measure, and add the three angles.

And hence a second physical experiment is performed, with its own error
of measurement, despite how much one might rely on his/her own protractor.
In fact, if you do it yourself, you will likely discover that only one of the three
angles measures easily and nicely (which one?), and the other two angles
yield seemingly random non-integer degrees . . . . As a result, this experiment
might prove to be even less precise than the first one with the scissors!
One thing, though, should be clear by now – if the problem has a nice
answer suitable for a 5th-grade solution (albeit, from a Bulgarian geometry
math circle book!), then that answer must be:
Conjecture 1. The three angles add up to a right angle.
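For readers who want a numerical sanity check of Conjecture 1 (in the spirit of Idea 3 below, and certainly not a proof), here is a minimal sketch. The coordinates are an assumption made only for illustration: the squares have side 1, with A = (0, 0), M = (1, 0), H = (2, 0), B = (3, 0), and D = (0, 1).

```python
import math

# Assumed coordinates: unit squares, A=(0,0), M=(1,0), H=(2,0), B=(3,0), D=(0,1).
def angle_toward_d(vertex_x: int) -> float:
    """Angle in degrees at (vertex_x, 0) between the ray to A and the ray to D."""
    # In the right triangle with legs AD = 1 and vertex_x, this is the angle
    # opposite the leg AD.
    return math.degrees(math.atan2(1, vertex_x))

# The three angles AMD, AHD, ABD sit at M, H, B respectively.
angles = [angle_toward_d(x) for x in (1, 2, 3)]
angle_sum = sum(angles)

print([round(a, 2) for a in angles])
print(round(angle_sum, 6))
```

Floating-point arithmetic has its own "error of measurement," so the printed sum supports the conjecture but does not replace a geometric argument.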
As a middle school student, I knew three ways to prove this conjecture:²
 Idea 1: A bold and truly brilliant solution that re-creates the “missing”
half of the picture by an original extra construction. A bright 5th-grader
will understand this solution, as it uses only very elementary technical tools
such as congruent triangles and a couple of special plane figures that everyone
knows. But it is unlikely (although not impossible) that the bright 5th-grader,
or even the most seasoned problem-solver, will be able to come up with such
an amazing solution out of nowhere. ♦
Idea 2: A 7th-grade solution using similar triangles and the Pythagorean
Theorem, which only partially illuminates the reason behind the 90°-sum. ♦
Idea 3: A standard and boring but fast 8th-grade solution via trigonometry,
which does not explain why the result really is what it is. ♦
The first challenge has been served. You should try on your own to solve
the problem in at least one way. The picture on the left contains color-coded
hints for all three different ways. We will re-create the 5th-grade solution in
this session and come back to the other solutions in Part II.

² . . . that is, until I saw the 54 proofs in [82]! Check out the Historical Appendix in
Part II.

1.2. Pinning, stretching, and sliding. Here is another popular math
problem from folklore, a favorite in math circles in Eastern Europe and
around the world.
Problem 2. (Farmer & Cow) During a hot summer day, a farmer and a
cow find themselves on the same side of a river. The farmer is 2 km from the
river and the cow is 6 km from the river. If each of them walks straight to the
river, they will be 4 km from each other. Unfortunately, the cow has a broken
leg. The farmer must get to the river, dip his bucket there, and take the water
to the cow. To which point on the riverbank should the farmer walk so that his
total path is as short as possible?

If you draw several possible paths for the farmer, measure, and add, you
will get an idea as to where the optimal place will be along the river. You
may even want to organize all data in a neat table. There is, however, a
simple physical experiment that can help you arrive at a conjecture faster:
Exercise 3. Take a flexible (but not stretchable) string or cord; pin one
end at the farmer’s position; with your right-hand fingers loosely hold the
other end at the cow’s position; and with a pencil (or your left-hand fingers)
stretch the cord until it touches the river. Then start sliding the pencil along
the river, accordingly loosening or tightening at the cow’s position to keep
the cord in two straight segments. Which place along the river needed the
least amount of cord?
Sort of an answer: As you move the pencil (or your left-hand fingers), you will
discover a place X at the river, to the left and to the right of which you will
need to loosen the cord in order for the pencil to stay along the river.
If the farmer and the cow walked straight to the river to points A and B,
respectively, then how long is AX? Since different pictures will be drawn with
different scales, a more appropriate question might be to approximate the
ratio AX : BX. ♦
As with the 3-squares problem, upon performing this experiment, the
BMC-Beginners split in their predictions; some claim that AX ≈ 0.9 km
and for some it looks like AX ≈ 1.05 km, while the majority suspect that
the exact answer must be a nice round number:
Conjecture 2. The farmer should go to a place X on the river so that
AX = 1 km; i.e., AX : BX = 1 : 3.
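The "neat table" of measured paths can also be produced by a short computation. The sketch below is only a numerical experiment supporting Conjecture 2, not a proof. It assumes coordinates chosen for illustration (in km): the river is the x-axis, A = (0, 0), B = (4, 0), the farmer is at F = (0, 2), and the cow is at C = (4, 6).

```python
import math

# Assumed coordinates (in km): the river is the x-axis, A=(0,0), B=(4,0),
# the farmer is at F=(0,2), and the cow is at C=(4,6).
def path_length(ax: float) -> float:
    """Length of the walk F -> X -> C, where X is the river point with AX = ax."""
    return math.hypot(ax, 2) + math.hypot(4 - ax, 6)

# A small table of candidate points along the river.
for ax in (0.0, 0.5, 1.0, 1.5, 2.0, 4.0):
    print(f"AX = {ax:3.1f} km   total path = {path_length(ax):.4f} km")

# Brute-force search over a fine grid (steps of 10 m) between A and B.
candidates = [i / 100 for i in range(401)]
best_ax = min(candidates, key=path_length)
print("best grid point: AX =", best_ax)
```

A grid search can only say which of the sampled points is best; explaining why that point wins is the job of Ideas 4 through 6.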

There are at least three ways to attack the problem:

Idea 4: A clever idea is to reduce the problem to a trivial but equivalent
version by an extra construction that (again!) re-creates the “missing” half of
the picture. A bright 5th-grader will be able to follow the logic of the solution,
if she is familiar with basic geometry tools such as the Triangle Inequality and
similar triangles, and experienced in manipulating fractions and solving simple
linear equations. ♦
Idea 5: Take a “leap-of-faith” and apply a fundamental law of physics and
its consequence that we observe every day. ♦
Idea 6: The standard, technically-loaded calculus solution with derivatives
or a proof with inequalities, which give us no better explanation of why the
answer is what it is other than “This is how the calculations work out.” ♦
The second challenge has been served. Incidentally, the picture of the
sun looking into the mirror is an indirect and direct hint for two of the ideas.

1.3. The grand design. For the rest of this session we will build the
necessary elementary geometry background and discuss the creative and
non-trivial 5th-grade solutions to our two overarching problems. At the end, we
will briefly look into the logical foundation of our plane geometry studies.
In Part II, we will continue building sophisticated geometry and some
technical trigonometry background in order to complete the remaining sug-
gested solutions, generalize their methods to other more advanced problems,
and finally, go out of our “comfort zone” and see beyond the 3-squares prob-
lem and possibly into the origins of trigonometry millennia ago.
If you feel you are already fortified with enough plane geometry back-
ground and the two overarching problems are not challenging enough, you
can skip to the historical section at the end of the session. However, be aware
that the solutions we will discuss here are purely geometric (a.k.a. synthetic)
and, arguably, these are the most beautiful solutions; they can be potentially
created by bright middle schoolers with little technical background and open
minds. And hence, they are worth experiencing.

2. A Triangle Workout

Triangles make up any polygonal shape: if you haven’t done this before,
just cut any polygon that happens to be lying around along several of its
non-intersecting diagonals, until you are left with only triangles. Triangles
also appear on their own everywhere in geometry and in everyday life. We
will definitely need them to solve all of our problems in this session!
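The cutting experiment can be imitated on integer grid points. The sketch below handles only the simplest case, a convex polygon cut along all diagonals from one vertex; the function names and the sample hexagon are hypothetical, chosen for illustration.

```python
def fan_triangulation(polygon):
    """Cut a convex polygon (list of vertices in order) into triangles
    along the diagonals drawn from its first vertex."""
    first = polygon[0]
    return [(first, polygon[i], polygon[i + 1]) for i in range(1, len(polygon) - 1)]

def doubled_area(points):
    """Twice the area of a polygon with integer vertices (shoelace formula)."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total)

# A hypothetical convex hexagon on the integer grid.
hexagon = [(0, 0), (4, 0), (6, 2), (5, 5), (2, 6), (-1, 3)]
triangles = fan_triangulation(hexagon)

print(len(triangles), "triangles")
print(sum(doubled_area(t) for t in triangles), doubled_area(hexagon))
```

The pieces are pairwise non-overlapping triangles whose areas add up to the area of the polygon, which is a quick way to confirm that nothing was lost or counted twice.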

Therefore, it is important to answer some fundamental questions about
them. Even if you remember your geometry lessons from middle and high
school, browse through this section to double-check if it identically “reflects”
your knowledge about triangles.

2.1. To be or not to be alike? The two main questions here are:

Question 1. When are two triangles the “same,” i.e., all of their corresponding
angles and sides are equal³ in size? Such triangles are formally known as
congruent triangles, and congruence is denoted by the symbol ≅.
Question 2. When do two triangles look “alike,” i.e., their corresponding
angles are equal in size and their sides are proportional? Such triangles are
formally known as similar triangles, and similarity is denoted by the symbol ∼.

Suppose we want to show the congruence △ABC ≅ △A1B1C1 (cf. Fig. 2a).
Do we need to verify all six conditions for sides and angles:
• a = a1, b = b1, c = c1; and α = α1, β = β1, γ = γ1?

Figure 2. Congruent or Similar: △A1B1C1 ≅ △ABC ∼ △A2B2C2

The same question goes for the similarity △ABC ∼ △A2B2C2 (cf. Fig. 2b). Will
it be overkill to verify all five conditions for their sides and angles:
• a/a2 = b/b2 = c/c2, and α = α2, β = β2, γ = γ2?

PST 2. To show that two triangles are congruent or similar, use a criterion
for congruence/similarity of triangles, which will require you to verify the
minimum number of conditions: typically only three for congruence and two
for similarity. These will be sufficient to imply the remaining conditions on
sides and angles and will guarantee the congruence/similarity of the triangles.

Here is a table with 5 standard criteria for congruence of triangles and
their counterparts for similarity. The way each criterion works is as follows:
verify that the elements listed in the table under the criterion for one triangle
are equal to the corresponding elements for the other triangle, and then all
other corresponding elements of the two triangles will follow suit.
³ We shall be sloppy and say “equal” for sides and angles to mean that they have the
same measure; this is formally referred to as congruent sides and congruent angles.

Congruence Criterion | Similarity Criterion
(SAS) Two sides and the included angle: a = a1, b = b1, γ = γ1. | (RA) Ratio of two sides and the included angle: a/b = a2/b2, γ = γ2.
(ASA) Two angles and the included side: α = α1, β = β1, c = c1. | (AA) Two angles: α = α2, β = β2.
(SSS) Three sides: a = a1, b = b1, c = c1. | (RR) Two ratios of two sides: a/b = a2/b2, b/c = b2/c2.
(SsA) Two sides and the angle opposite the longer side: a = a1, α = α1, b = b1. | (R A) Ratio of two sides and the angle opposite the longer side: a/b = a2/b2, α = α2.
(HL) The hypotenuse and a leg in a right triangle. | (H/L) Ratio of the hypotenuse and a leg in a right triangle.

Table 1. Congruence and similarity criteria for triangles⁴

Examples. The ASA criterion (cf. Fig. 3a) asks us to check that, say,
AB ≟ A1B1, ∠ABC ≟ ∠A1B1C1, and ∠BAC ≟ ∠B1A1C1.
Similarly, according to RR, two ratios of sides in △ABC must be equal to
two ratios of sides in △A2B2C2, e.g., BC/CA ≟ B2C2/C2A2 and CA/AB ≟
C2A2/A2B2 (cf. Fig. 3b). This can be written in an equivalent but more
memorable way as follows:
AB/A2B2 ≟ BC/B2C2 ≟ CA/C2A2.

Figure 3. ASA and RR criteria

Exercise 4. For each criterion, draw a relevant picture of two triangles that
are congruent (or similar), as in Figure 3, label their vertices, mark the sides
or angles (or ratios) that are supposed to be equal, and write down (in letter
notation) the conditions that are satisfied by the criterion.
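For triangles given only by their three side lengths, two of the criteria in Table 1 can be turned into simple checks. This is a minimal sketch with hypothetical function names; it assumes that corresponding sides are matched by size (shortest with shortest, and so on), and it uses cross-multiplication so that integer side lengths are compared exactly.

```python
def congruent_by_sss(sides1, sides2):
    """SSS: the three side lengths agree once both lists are sorted by size."""
    return sorted(sides1) == sorted(sides2)

def similar_by_rr(sides1, sides2):
    """RR: a/b = a2/b2 and b/c = b2/c2, checked by cross-multiplying."""
    a, b, c = sorted(sides1)
    a2, b2, c2 = sorted(sides2)
    return a * b2 == a2 * b and b * c2 == b2 * c

print(congruent_by_sss((3, 4, 5), (5, 3, 4)))
print(similar_by_rr((3, 4, 5), (6, 8, 10)))
print(similar_by_rr((3, 4, 5), (3, 4, 6)))
```

Note that only two ratios are compared in `similar_by_rr`; the third proportion c/a = c2/a2 follows from the other two, which is exactly the economy that PST 2 promises.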

2.2. Reaping the benefits of congruence and similarity is what one
ordinarily does after establishing that triangles are congruent or similar: one
concludes that the remaining sides, angles, or ratios of sides are equal and
uses these facts in whatever way necessary. We formulated and applied this
PST in Inversion I (vol. I). We now again demonstrate it, by walking through
two famous basic theorems that were used but not proven in Inversion I.
⁴ The similarity criteria RA and RR are also known by the names SAS∼ and SSS∼.

2.2.1. Center of a parallelogram. Let’s first agree on what a parallelogram is.

Definition 1. A quadrilateral with two pairs of parallel (opposite) sides is
called a parallelogram.

Often a parallelogram is defined in a different way:

Definition 1′. A quadrilateral with two pairs of equal opposite sides is
called a parallelogram.

Are the two definitions of a parallelogram equivalent? To answer this
(indeed, to be able to prove anything about parallelograms!), we need the
following fact from plane geometry:
Theorem 1. (Alternate Interior Angles) When a transversal⁵ intersects
two parallel lines, the eight resulting angles are grouped into two quadruples of
equal angles; in Figure 4a one such quadruple is α = α′ = β = β′, where the
angles α and β are called alternate interior angles. Conversely, if alternate
interior angles α and β formed by two lines and a transversal are equal (as
in Fig. 4a), then the two lines are parallel.

Figure 4. Parallel lines, Sides and Diagonals in a parallelogram

Returning to the equivalence of the two definitions, let us show one
direction. If AB||CD and BC||DA then the marked alternate interior angles
on Figure 4b are equal: α = α′ and β = β′. Combining this with the common
side AC enables us to apply ASA and conclude that △ACB ≅ △CAD. As
a consequence of this congruence, we have AB = CD and BC = DA: the
opposite sides of a parallelogram are indeed equal! Thus, Definition 1 implies
Definition 1′. ♦
For the other direction, another congruence criterion is needed:

Exercise 5. If a quadrilateral has pairs of equal opposite sides, then any
two opposite sides are parallel.

We are now ready to state the famous theorem that generated the dis-
cussion here about parallelograms. Its proof is left for the reader, especially
since an almost explicit hint about it is definitely somewhere on this page.

Theorem 2. (Parallelogram’s Center) The diagonals of a parallelogram
bisect each other (cf. Fig. 4c).
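Before proving Theorem 2, one can test it on a grid example. The sketch below is a check on one hypothetical parallelogram, not a proof: the fourth vertex C is constructed so that side DC is a translated copy of side AB, and exact fractions are used so that no rounding can blur the comparison.

```python
from fractions import Fraction

def midpoint(p, q):
    """Midpoint of segment pq, kept exact with fractions."""
    return (Fraction(p[0] + q[0], 2), Fraction(p[1] + q[1], 2))

def fourth_vertex(a, b, d):
    """Vertex C completing A, B, D to a parallelogram ABCD (DC is a translate of AB)."""
    return (b[0] + d[0] - a[0], b[1] + d[1] - a[1])

A, B, D = (0, 0), (5, 1), (2, 4)   # hypothetical sample vertices
C = fourth_vertex(A, B, D)

mid_AC = midpoint(A, C)   # midpoint of diagonal AC
mid_BD = midpoint(B, D)   # midpoint of diagonal BD
print(C, mid_AC, mid_BD)
```

If the two midpoints coincide, that common point lies on both diagonals and halves each of them, which is what Theorem 2 claims for the point E in Figure 4c.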

⁵ A line that intersects two lines in different points is called a transversal.

2.2.2. Center of a triangle. A parallelogram has a unique undisputed center:
the intersection of its diagonals. However, in general, a triangle has many
different “centers,” depending on what you define its center to be.⁶ Below
we locate one such center, using our similarity criteria multiple times.
Definition 2. In a triangle, a median is a segment connecting a vertex
with the midpoint of the opposite side. The point where the three medians
intersect is called the centroid of the triangle.
Does such a centroid always exist and can its position be described using
only one median, not three? Do not peek at the figures below before you
give your best try in the upcoming exercises!
Exercise 6. Draw two medians in a triangle, and experiment physically to
estimate the ratio in which their intersection point divides each median.
Yes, your guess is right: the ratio is the same for both medians and it is
the simplest one you would come up with by eyeballing the drawing.

Figure 5. Centroid, Midsegment, and Medians

Theorem 3. (Centroid) The three medians in a triangle intersect in one
point, which divides each median in a ratio of 2 : 1 counted from the vertex
of the triangle (cf. Fig. 5a).
Before we attack the Centroid Theorem, let us do something more basic
that will help us prove it. In a triangle, the segment connecting the midpoints
of two of its sides is called a midsegment of the triangle.
Exercise 7. How long is the midsegment in relation to the third side of the
triangle? Do the two seem to be in special relation to one another?
Upon completing Exercise 7, you will likely formulate:

Theorem 4. (Midsegment) A midsegment is parallel to the third side of the
triangle and half its length.

Proof: If A1 and B1 are the midpoints of sides BC and AC, respectively,
then CB1 : CA = 1 : 2 = CA1 : CB. Since ∠C is common to both triangles,
by RA we have △B1A1C ∼ △ABC (cf. Fig. 5b).
Since the triangles are similar, we have B1A1 = ½ AB and α = α′ (as
marked on the figure). Since α′ = β (as vertical angles), we conclude that
alternate interior angles α and β are equal, and hence AB||B1A1. □
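The Midsegment Theorem can be checked on a concrete triangle with exact arithmetic. The sketch below uses a hypothetical sample triangle and coordinate tools (a cross product for parallelism, squared lengths for the ratio) that are not part of the synthetic proof above; it illustrates the statement rather than replacing the argument.

```python
from fractions import Fraction

def midpoint(p, q):
    """Midpoint of segment pq, kept exact with fractions."""
    return (Fraction(p[0] + q[0], 2), Fraction(p[1] + q[1], 2))

A, B, C = (0, 0), (6, 0), (2, 9)          # hypothetical sample triangle
A1, B1 = midpoint(B, C), midpoint(A, C)   # midpoints of BC and AC

ab = (B[0] - A[0], B[1] - A[1])           # direction from A to B
b1a1 = (A1[0] - B1[0], A1[1] - B1[1])     # direction from B1 to A1

# Parallel: the cross product of the two directions vanishes.
cross = ab[0] * b1a1[1] - ab[1] * b1a1[0]
# Half the length: compare squared lengths to avoid square roots.
len2_ab = ab[0] ** 2 + ab[1] ** 2
len2_b1a1 = b1a1[0] ** 2 + b1a1[1] ** 2

print(cross, len2_ab, len2_b1a1)
```

Since lengths are compared through their squares, "half the length" shows up as the squared length of AB being four times that of B1A1.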
⁶ Orthocenter, circumcenter, incenter, or centroid, to name a few.

The Centroid Theorem is asking us to show that three segments – the
medians – are concurrent, i.e., that they intersect in one point – the centroid
of the triangle. To establish that several figures are concurrent, in Circle
Geometry (vol. I) we utilized a general technique. A specific version of it
will help us now get hold of the centroid M:
PST 3. To show that several segments intersect in the same point M, fix one
segment XY and show that the remaining segments divide XY in the same ratio
counted from X to Y. Furthermore, if you can find the exact ratio in which a
second segment divides XY, you may be able to apply an analogous argument to
prove that any other segment intersects XY in that same ratio.
Now the centroid’s existence and location are only one △-similarity away!

Proof of Centroid Theorem: If medians AA1 and BB1 intersect in
point M, then △ABM ∼ △A1B1M, as strongly suggested in Figure 5c.
Indeed, since B1A1||AB, the alternate interior angles formed by any transversal
of these two lines are equal. In particular, γ = γ′ and δ = δ′ as marked on
the figure, so that AA justifies the similarity of triangles.
But we know that the ratio of the corresponding sides is AB : B1A1 = 2 : 1,
from which the other ratios are also 2 : 1. Therefore, AM : MA1 = 2 : 1 =
BM : MB1, as desired for the two medians AA1 and BB1.
“And history repeats itself!” Applying the same proof above to the
medians AA1 and CC1, we conclude that they divide each other in ratio 2 : 1
counted from A and C. Because M divides AA1 in that same ratio 2 : 1, from
PST 3 we conclude that the intersection point of AA1 and CC1 coincides
with the intersection point M of AA1 and BB1. The final result is that all
three medians intersect in point M, a.k.a. the centroid of △ABC. □
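PST 3 can be watched in action on a concrete triangle. In the sketch below (hypothetical helper names and sample coordinates), the median AA1 is fixed, and we compute how far along AA1, counted from A, each of the other two medians crosses it. The computation uses coordinates and exact fractions, so it is a check of the theorem on one example and not a substitute for the similarity argument.

```python
from fractions import Fraction

def midpoint(p, q):
    """Midpoint of segment pq, kept exact with fractions."""
    return (Fraction(p[0] + q[0], 2), Fraction(p[1] + q[1], 2))

def cross(u, v):
    """Cross product of two plane vectors; zero exactly when they are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def cut_parameter(p, q, r, s):
    """Fraction of the way from p to q at which line rs crosses line pq
    (assumes the two lines are not parallel)."""
    d1 = (q[0] - p[0], q[1] - p[1])
    d2 = (s[0] - r[0], s[1] - r[1])
    offset = (r[0] - p[0], r[1] - p[1])
    return Fraction(cross(offset, d2)) / Fraction(cross(d1, d2))

A, B, C = (0, 0), (6, 0), (2, 9)   # hypothetical sample triangle
A1, B1, C1 = midpoint(B, C), midpoint(A, C), midpoint(A, B)

s_b = cut_parameter(A, A1, B, B1)  # where median BB1 meets AA1
s_c = cut_parameter(A, A1, C, C1)  # where median CC1 meets AA1
print(s_b, s_c, s_b / (1 - s_b))
```

Both medians cut AA1 at the same fraction of its length, so they pass through the same point M, and the last printed value is the ratio AM : MA1.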

Now that we have seen the power of the congruence/similarity criteria,
let’s turn to our main problems. Applying the criteria will be the easy
part . . . . How do we create positions where we can usefully apply them?

3. Walking Along an Optimal Path

Let us start with our farmer and his cow. The way the problem is posed
makes it hard to solve. Recall how we had to experiment in order to guess
the optimal place on the river where the farmer should go, and still we were
far from actually proving anything!

PST 4. Identify what makes the problem hard, eliminate it by reducing the
situation to a simpler one, and see if the new problem is easily solved. Then
connect your solution to the original problem.

3.1. Simplify and solve. What makes the farmer-and-cow problem hard?
One expression in the statement of the problem, about which we did not
think twice, is what got us into trouble . . . . Which one is it?
How about . . . “on the same side of the river!” What if our two protag-
onists were on different sides of the river? Would you be able to solve the
problem now in no time? Certainly! The shortest path would be the straight
path from the farmer to the cow, going through the river!
Here it is reasonable to pause: what if the river
is wide? Does it make a difference to the farmer’s
path? Sure it does, so . . . eliminate the width of the
river! I can hear some readers objecting: “But you
cannot! It is part of the problem.” Actually, it is not:
the original problem was placed entirely on one side
of the river and did not depend on the width of the
river, or for that matter, on whether the river had
any width at all. Hence, as any brave mathematician
would do, we will draw the river as a line with no
width: this will simplify our new problem and make
it a better match for the old problem.

3.2. Relate back to the original problem. It doesn’t take much effort to
see that reflecting the original farmer across the river to create a “phantom”
farmer will turn one problem into the other. Since any path that the original
farmer can take is mimicked by the phantom farmer,
then the shortest path of the original farmer must
correspond to the shortest path of the phantom
farmer: the straight one, as noted earlier.
So, what is our answer? We must

3.3. Create an algorithm to show where the original farmer should go.
Step 1. Make the river into a line l (with no width).
Step 2. Reflect farmer F across line l to a point F′.
Step 3. Let X be the intersection point of segment F′C with line l.
Step 4. Tell the farmer to go to point X at the river,
dip his bucket there, and then go to the cow.
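Steps 1 to 3 translate directly into coordinates. The sketch below is one possible set-up, not part of the original solution: it assumes the river l is the x-axis, that the farmer and the cow have integer coordinates with positive y, and the function name `watering_point` is made up for illustration. The sample data are the distances 2 km, 4 km, and 6 km from the problem.

```python
from fractions import Fraction

def watering_point(farmer, cow):
    """Return the x-coordinate of X on the river (assumed to be the x-axis).

    farmer and cow are (x, y) pairs of integers with y > 0,
    i.e., both stand on the same side of the river.
    """
    fx, fy = farmer
    cx, cy = cow
    # Step 2: reflect F across the line l (the x-axis) to get F'.
    px, py = fx, -fy
    # Step 3: intersect segment F'C with l, i.e., find where y = 0.
    t = Fraction(-py, cy - py)      # fraction of the way from F' to C
    return px + t * (cx - px)

# Farmer 2 km from the river, cow 6 km from it, feet A and B 4 km apart.
x = watering_point((0, 2), (4, 6))
print(x)  # 1
```

With A placed at the origin, the result says that X lies 1 km from A along the river.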

3.4. Prove that your algorithm works. Our earlier informal argument
led to creating the algorithm, but we still need to formally justify that it
will yield the shortest possible path for the farmer. Indeed, suppose farmer
F walks to any other point Y on the river. Why is this path F → Y → C
longer than the path F → X → C suggested by our algorithm?

PST 5. To show that a broken path P1 is longer
than another broken path P2, try laying out path P1
along two sides of a triangle ABC and path P2
along the third side (as on the figure to the right),
and use the Triangle Inequality AC + CB > AB to
conclude that P1 is longer than P2.

The triangle in question in our problem is created
using the phantom farmer F′. Because of the reflection,
FX = F′X and FY = F′Y (cf. Exercise 8), so that the
original path P2 : F → X → C is as long as P2′ : F′ →
X → C, and the new path P1 : F → Y → C is as long
as P1′ : F′ → Y → C. All this boils down to applying
the Triangle Inequality (△I) to △F′YC:
length(P1) = FY + YC = F′Y + YC
≥ F′C = F′X + XC = FX + XC = length(P2),
with equality if and only if △F′YC degenerates into a segment F′C, i.e.,
Y = X and the farmer walks to the point X prescribed by our algorithm. In
other words, the path P2 : F → X → C is indeed the shortest possible. □
This discussion should have convinced even the most skeptical reader of
the vast possibilities when working with something as simple as reflections:

PST 6. One way to create new problems or reduce to simpler ones is to
reflect across a line. Since any triangle (moreover, any figure!) retains its
size and shape, we arrive at a twin to the original situation.
The beginner should confirm the above statements about reflection:

Exercise 8. Show that the measure of any segment and any angle is pre-
served under reflection. What can you say about triangles under reflection?

3.5. Reflect upon the result of your algorithm. Are we done with
the Farmer-and-Cow problem? In some sense yes: we described a geometric
algorithm, which leads step by step to the optimal path for the farmer, and
we proved that this algorithm works.
On second thought, though, did you notice that our solution did not
depend at all on the given numerical data: 2 km, 4 km, and 6 km?! What
was that all about? A further mystery is why we studied similar
triangles in detail when we didn’t use them at all. Well, the Triangle Inequality
(which we did use) can and will be proven in Part II as a consequence of the
Pythagorean Theorem, which in turn will be proven via similar triangles.
But more to the point, do you remember our experiment with the flexible
cord? We made a specific conjecture about the location of point X along the
river: AX : XB = 1 : 3, with A and B the points on the river closest to
the farmer and the cow, respectively. Similar triangles and a bit of algebra
will be the “cure” here.

Proof of Conjecture 2: The figure on the
right contains four triangles (count them!). How-
ever, only three of them are of interest to us; natu-
rally, these are the ones similar to each other:
• △FAX ≅ △F′AX due to the reflection and
SAS (how?); and
• △F′AX ∼ △CBX due to vertical and right
angles and AA (how?).
The similarity △FAX ∼ △CBX prompts us to
compare ratios of sides and to finally use the given
numerical data: FA = 2, AB = 4, and CB = 6. If AX = x, then BX =
AB − AX = 4 − x so that AX/AF = BX/BC and we can calculate:
x/2 = (4 − x)/6 ⇒ 6x = 2(4 − x) ⇒ 3x = 4 − x ⇒ 4x = 4 ⇒ x = 1.
As predicted, AX = 1 km and AX : BX = 1 : 3. ♦
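The flexible-cord experiment can also be imitated on a computer. The sketch below is a brute-force check under stated assumptions: the river is a straight line, Y ranges over points between A and B in steps of 1 m, and `path_length` is a hypothetical helper that uses the Pythagorean Theorem for the two legs of the path F → Y → C.

```python
import math

def path_length(x, fa=2.0, ab=4.0, cb=6.0):
    """Length in km of the path F -> Y -> C when Y is x km from A along the river."""
    return math.hypot(fa, x) + math.hypot(cb, ab - x)

# Try every point Y between A and B in steps of 0.001 km.
candidates = [i / 1000 for i in range(0, 4001)]
best = min(candidates, key=path_length)

print(best)                 # location of the best sampled point, in km from A
print(path_length(best))    # length of the corresponding path
print(math.hypot(4, 8))     # straight-line distance F'C for comparison
```

The sampled minimum lands at x = 1, and the shortest path length agrees with the straight segment F′C, as the reflection argument predicts.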

3.6. The problem-solving structure that persisted throughout our
solution could be applied whenever the question asks us to locate certain geo-
metric objects, be they optimal paths, special points, or other:

Figure 6. Problem-solving structure

(1) Experiment physically (or abstractly) to come up with a conjecture
about the possible location(s) of the object.
(2) Construct an algorithm (typically, of geometric steps) that leads
to the object/location in question.
(3) Prove that your algorithm works, i.e., that it indeed yields the desired
object/location.
(4) Reflect upon (pun intended!) the result of your algorithm and try
to produce an alternative (perhaps, algebraic) description of the ob-
ject/location. Look for insights from the complete picture.
In construction problems, the first step is often replaced by a “discussion,”
during which one assumes that the object has been found and reasons what
must be true about its location. The last step too could take the shape of an
“analysis”: here one investigates the number of solutions depending on the
original configuration. For example, any locations of the farmer and the cow
will yield a unique optimal place X at the river . . . unless we allow them to
be at the river, in which case there are infinitely many solutions X (why?).

4. Walking Along an Integer Grid

Let us now turn our attention to the three-squares problem (cf. Fig. 7a).
Recall our conjecture that
α + β + γ = 90°.
4.1. Fitting the conjecture into the picture. Our first experiment –
cutting and pasting – led to a right angle made out of non-overlapping α, β,
and γ, as Figure 1b suggested. But where in our original picture will this
right angle fit well?
One possibility is the right ∠ABC: since it already contains γ = ∠ABD,
 we “just” have to show that the remaining ∠DBC can be split into α and β.

4.2. Grid hopping. Consider the integer grid made out of unit squares,
just like the three squares in our problem. The points where the grid lines
intersect will be called grid points. To split ∠DBC as desired, we need to
draw at least one extra arm inside this angle.
The brilliant idea of this solution is to:

PST 7. Restrict to the grid: connect only grid points, and consider only
triangles whose vertices are grid points (a.k.a. grid △’s).

The original problem already has 9 grid triangles! Did you find them?
Figure 7. Tiling of the integer grid

Now, we want to re-locate angle β inside ∠DBC, and β participates in
the grid △AHD. So cut out this triangle, move it, flip it, and rotate it as
needed, until H coincides with B, side HA goes vertically up from B, and
∠AHD = β happily fits inside our right ∠ABB1 (cf. Fig. 7b). In other
words, we have constructed a new grid △BB1H1 ≅ △AHD.
To ease the solution and make the construction more “balanced,” draw
yet a third copy of △AHD: the grid △DH1A1 as in Figure 7b. This com-
pletes our picture to a rectangle ABB1A1, twice the size of the original
3-squares drawing. As the reader may have noticed, we have labeled by X1
the reflection⁷ of any grid point X across line CD.

⁷ Although the three original squares are reflected across line CD, it is worth noting
that our augmented picture is not entirely symmetric with respect to line CD (why not?).

4.3. Special triangles to the rescue! It remains to show ∠DBH1 = α.
Since α = 45° from the right isosceles △AMD, ideally we would find another
right isosceles grid triangle one of whose angles is ∠DBH1 . . . . Not that we
have much of a choice:
Exercise 9. Show that △DBH1 is right isosceles and hence ∠DBH1 = α.

Proof: That BH1 = DH1 is immediate, as they are hypotenuses of our
two congruent grid triangles, △BB1H1 ≅ △H1A1D. From these same right
triangles, ∠DH1A1 = β and ∠BH1B1 = 90° − β. Therefore, in the 180°
∠A1H1B1 (depicted white in Fig. 7b), two angles add up to 90° and hence
the remaining third angle must be 90°. Namely, ∠DH1B is right.
The desired conclusion now follows: △DBH1 is right isosceles, and thus
∠DBH1 = 45°, so 45° + β + γ = 90°. Therefore α + β + γ = 90°. □
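For readers who like to double-check with a computer, here is a small sketch. It assumes, as the text suggests, that α, β, and γ are the acute angles that the diagonals of 1 × 1, 1 × 2, and 1 × 3 grid rectangles make with the long side. The second check borrows complex-number multiplication, a tool not used in this session: multiplying 1 + i, 2 + i, and 3 + i adds their angles, and the helper `gmul` (a hypothetical name) carries this out with integers only.

```python
import math

# Floating-point check: the three angles in degrees.
alpha = math.degrees(math.atan2(1, 1))
beta = math.degrees(math.atan2(1, 2))
gamma = math.degrees(math.atan2(1, 3))
total = alpha + beta + gamma
print(round(alpha, 1), round(beta, 1), round(gamma, 1))

def gmul(u, v):
    """Multiply a + bi and c + di, each given as a pair of integers."""
    a, b = u
    c, d = v
    return (a * c - b * d, a * d + b * c)

# Exact check: (1 + i)(2 + i)(3 + i) should point straight up the imaginary axis.
product = gmul(gmul((1, 1), (2, 1)), (3, 1))
print(product)  # (0, 10): zero real part, positive imaginary part, i.e., a 90-degree angle
```

The rounded values agree with the protractor readings, and the exact product 10i confirms that the three angles add up to a right angle.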

4.4. Triangular tiling. What really happened in our solution? We devised
a tiling of a 3 × 2 grid rectangle via five grid-triangles:
(1) One tile in the shape of an obtuse triangle contained γ.
(2-4) Three tiles were congruent right triangles with legs 1 and 2; they
brought β into our argument.
(5) And finally, a big central tile was a right isosceles triangle, which pro-
vided the 45°-angle equal to α and completed the tiling.
At first glance, nothing spectacular . . . . Still, when I saw this five-tile
construction as a 5th-grader, it seemed to me surprising that anyone could
come up with this drawing in the first place, and even miraculous that the
tiling could be so conveniently used to solve the 3-squares problem! By the
way, this is not the only grid tiling that can be used for our problem: as
mentioned earlier, there are several dozen geometric proofs in [82], involving
a variety of grid constructions, one of which will occur in the 7th-grade
solution in Part II. And as you might have guessed by now, the 5th-grade
solution we just re-created is my favorite: it requires the least amount of
technical background but (perhaps, because of that) the most imaginative
thinking of all. Its main steps listed below match closely the general problem-
solving scheme that we outlined on page 13:
Figure 8. Problem-solving steps in the 3-Squares solution
(panel labels: cut & paste; extend grid; tiling of grid w/ triangles; reflect upon)


5. To Prove or to Take for Granted?

5.1. Full disclosure. Before we move to the more advanced solutions to
our two overarching problems in Part II, let us briefly discuss the logical
foundation of what has transpired so far.

5.1.1. We proved (or assigned to the reader to prove) several statements:

(1) Equivalence of parallelogram definitions.
(2) Center of Parallelogram Theorem.
(3) Midsegment Theorem.
(4) Centroid Theorem.
(5) Reflection across a line preserves sizes of segments and angles.

5.1.2. We listed but did not prove ten criteria for triangles:
(6-10) Congruence: SAS, ASA, SSS, SsA, HL.
(11-15) Similarity: RA, AA, RR, R A, H/L.

5.1.3. We used the following facts within our discussion:

(16) The Triangle Inequality (to be proven in Part II).
(17) Parallel lines imply equal alternate interior angles and vice versa.
(18) The sum of three angles in a triangle is 180°.
(19) The base angles in a right isosceles triangle are equal to 45°.
(20) Vertical angles are equal.⁸

5.2. Euclid and Hilbert must agree. It is important to understand that
some statements are theorems, i.e., they can be proven based on other true
statements in our plane geometry theory or other areas of mathematics. On
the other hand, certain statements cannot be proven as they are assumed to
be true without proof; such statements are called axioms.
Euclidean geometry – what you know from school as (plane) geometry
– is based on a carefully chosen set of axioms. It took two millennia for
mankind to agree on which statements should be “axioms” and which could
be proven from them and hence called “theorems”. How the geometric ax-
iomatic system evolved over time from Euclid to Hilbert to the present, and
how new geometries (such as hyperbolic and elliptic) sprouted from the deep
analysis of the logical foundations of geometry, is a fascinating story to study,
worth another session on its own, if not a whole semester college course [33].
We will explore here enough history to identify which of our statements
have been accepted as axioms; which could, potentially, replace tradition-
ally assumed axioms; and which should be proven as theorems; . . . and what
happens when a basic pillar of our geometric intuition is around no more.

⁸ And it is quite possible that we have missed something to list here, which is fine
because the diligent reader can find it and add it to the list.

5.3. Axioms, reveal yourselves! In 1899, the German mathematician

David Hilbert (1862-1943) proposed in his Grundlagen der Geometrie (Foun-
dations of Geometry, [40]) an axiomatic system of plane and solid geometry.
It consists of three primitive terms: point, line, and plane, and six relations
of betweenness, containment, and congruence; and 20 (originally 21) axioms.

5.3.1. Can all criteria for triangles be proven? According to Hilbert’s ax-
iomatic system, part of the SAS criterion for congruence is an axiom:
Hilbert’s Congruence Axiom: If two sides and the angle between them in
one triangle are congruent to the corresponding elements in another triangle,
then the remaining corresponding angles are also congruent.
In other words, the axiom does not conclude that the triangles are con-
gruent! It can be shown then, using Hilbert’s axioms, that the remaining
sides are congruent so that the triangles are congruent. Thus, the SAS cri-
terion is partly an axiom and partly a theorem. All other congruence and
similarity criteria can be deduced from SAS, and hence they are theorems.

5.3.2. Why do parallel lines create congruent angles? Statements (17)-(18)

about alternate interior angles and angle-sum of a triangle are implied by
the most controversial geometry axiom in history, proposed by the Greek
mathematician Euclid of Alexandria (325-265 BCE):

Euclid’s Fifth Postulate: If two lines l and m have a transversal t so that
the sum of the interior angles on one side of t is less than two right angles
(e.g., α + β < 180° as in Fig. 9a), then l and m intersect on that side of t.

Figure 9. Euclid’s, Hyperbolic, and Parallel Axioms

Since Euclid, many a famous mathematician tried to prove the Fifth Pos-
tulate from Euclid’s other axioms,⁹ and some even published “proofs” . . . only
for flaws to be eventually found in the arguments. Nevertheless, with each
such attempt mankind moved closer to a non-Euclidean geometry. Finally,
around 1830, the Russian Nikolai Lobachevsky (1792-1856), the Hungarian
János Bolyai (1802-1860), and the German Carl Friedrich Gauss (1777-
1855) independently arrived at hyperbolic geometry, where all axioms of Eu-
clidean geometry hold, except for the Fifth Postulate.

⁹ Euclid’s axiomatic system, proposed in his Elements [24], consists of 23 definitions,
5 undefined concepts, and 5 axioms, a.k.a. postulates.

To complete the story, in 1868 the Italian Eugenio Beltrami (1835-1899)

provided models of this geometry, thereby proving its consistency and va-
lidity. For example, in the so-called Beltrami-Klein model (cf. Fig. 9b), the
“points” are all points inside a fixed circle k; the “lines” are the chords in k
(excluding their endpoints on k); and two “lines” intersect when the corre-
sponding chords intersect in an ordinary point inside k.
Of course, a lot more needs to be defined and technical details “ironed
out” to show that all axioms of hyperbolic geometry are satisfied in this
model. But contrary to what the Fifth Postulate implies in Euclidean geom-
etry, a striking feature persists in any of these (equivalent) hyperbolic models:
there are infinitely many parallels to a given line through a given point (three
of those are drawn as dashed lines in Figure 9b). This situation completely
defies our intuitive understanding of how (Euclidean) geometry works!
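To see the many parallels concretely, here is a small experiment in the Beltrami-Klein model. Everything specific in it is an assumption chosen for illustration: k is the unit circle, the “line” l is the horizontal diameter y = 0, the point P is (0, 0.5), and we test chords through P with slopes from −1 to 1. The function name `meets_l_inside_disk` is likewise made up.

```python
def meets_l_inside_disk(slope, p=(0.0, 0.5)):
    """Does the chord through p with the given slope meet the chord y = 0
    at an ordinary point strictly inside the unit circle k?"""
    if slope == 0:
        return False                  # a horizontal chord never reaches y = 0
    x_cross = p[0] - p[1] / slope     # where the full Euclidean line hits y = 0
    return abs(x_cross) < 1           # points on k itself are excluded

# Sample slopes -1.0, -0.9, ..., 1.0 and keep the "lines" that miss l.
slopes = [i / 10 for i in range(-10, 11)]
parallels = [m for m in slopes if not meets_l_inside_disk(m)]

print(len(parallels), "of", len(slopes), "sampled chords through P are parallel to l")
```

Every chord through P whose slope is at most 0.5 in absolute value misses l inside k, so a finer sampling would produce as many “parallels” as we wish.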
Still, the models of hyperbolic geometry live within our usual Euclidean
space and the theory behind them provides useful insights (such as inversion
in the plane) to study and elegantly prove some phenomena that we observe
in math and in life.
5.3.3. Can the Fifth Postulate be replaced? Even though Euclid’s Elements
is historically, perhaps, the most influential math book, over the millennia it
became evident that there were gaps in Euclid’s original axiomatic system
and that it had to be revised. This is what Hilbert completed at the end of
the 19th century. He chose an equivalent form of the Fifth Postulate:
Parallel Axiom: There is at most one parallel to a given line l through a
given point P (cf. Fig. 9c).
 The existence of such a parallel can be proven by a specific construction (say,
with two right angles), and hence it is not a necessary part of the axiom.
Going back to our little logic discussion, curiously enough, statement (18)
about the 180°-sum in a triangle can also replace the Fifth Postulate, but at
the price of an additional continuity axiom attributed to the Greek mathe-
matician, scientist, and engineer, Archimedes of Syracuse (287-212 BCE):

Archimedes’ Axiom: Given two segments AB and
CD, we can put together enough copies of AB to con-
struct a segment larger than CD.
You may recognize that this property is also valid for real (positive) num-
bers instead of segments. It will take us too far afield to show that state-
ment (18) and Archimedes’ Axiom together imply the Fifth Postulate.
But here is a little logic exercise about (18) that you may have encountered
in school, yet likely not in such depth.
Exercise 10. Prove that the angles in a triangle add
up to 180°. You may assume the Fifth Postulate and
Theorem 1 about alternate interior angles.


Beginning of a Proof: The figure in Exercise 10 is probably what you
created if you tackled the problem before. The (dashed) parallel m should
be situated exactly as shown: outside △ABC. But what if it were inside?
As preposterous as this suggestion may seem, it must be logically eliminated
in a rigorous proof. To our rescue comes the so-called Crossbar Theorem
(cf. Fig. 10a, [37]), according to which a ray through vertex C of △ABC
and inside ∠ACB must necessarily intersect segment AB. But then this ray
cannot be part of a “parallel” line to AB, a contradiction!
Thus, indeed, line m is outside △ABC and the picture above correctly
depicts the situation so that you can use it to complete the proof of the
180°-angle sum in △ABC. ♦

5.4. A fair game in congruences. As you can see, there is a lot that can
be discussed and learned from studying the logical foundation of geometry
and exploring the implications among various theorems and axioms. It is
not the point of this section to make the reader go through the somewhat
grueling process of proving all non-axioms within the 20 statements listed
earlier (not to be confused with Hilbert’s 20 axioms). Several questions of
why, what, and how one thing implies another are, though, in order.
Figure 10. Crossbar Theorem, “SsA,” and HL
Exercise 11. Consider the SsA criterion.
(a) Why does it require the angle to be opposite the longer side?
(b) Isn’t the HL congruence criterion a special case of it?
Partial Solution: (a) The question essentially asks if we can drop the
condition that the equal angles are opposite the longer sides of the triangle.
Does SsA work when the equal angles are opposite the smaller sides? In
other words, can we strengthen the SsA criterion? Recall the following PST
which was discussed and used in volume I:
PST 8. One way to disprove a statement is to provide a counterexample,
i.e., a situation where the hypothesis is satisfied but the conclusion fails.
In the case of SsA, draw △ABC with an obtuse ∠ACB = γ. On ray BC
locate a point C1, different from C and such that AC1 = AC (cf. Fig. 10b).
Why does C1 exist? What can you say about △ABC and △ABC1? ♦

Turning to the second question above,

PST 9. To show that a statement S1 is a special case of a statement S,
verify that in S1 all conditions of S and something extra are satisfied.

Hint: (b) Properties of right triangles are obviously involved (cf. Fig. 10c).
Assuming the Pythagorean Theorem, can you deduce from it the famous
fact about right triangles that is necessary to answer the question? ♦
After answering affirmatively the last question, in effect we are left with
four criteria for congruence. Our last question introduces a slightly esoteric,
fifth criterion, which you should try to prove from the other criteria:
Exercise 12. (SASum) Show that two triangles are congruent if one side
in one triangle, an angle adjacent to that side, and the sum of the other two
sides are correspondingly equal to the same elements in the second triangle.

Hint: An extra construction is called for. Can you align the two sides whose
sum is known, without moving the third side or the given angle? Which
criterion implies that the base angles of an isosceles triangle are equal? How
is this relevant here? ♦

5.5. Historical and modern perspectives. In preparation for Part II of
this session, the reader has several options:
(B+) (Beginners-and-up) Work through Kiselev’s Geometry, vol. I [32].
This will be a great way to commence your geometry studies!
(I+) (Intermediate-and-up) Think about other solutions to our two
overarching problems, along the lines of Ideas 2, 3, 5, and 6.
(A) (Advanced) Look deeper into the history of Euclidean geometry.
Study Hilbert’s axiomatic system or any modern equivalent of it.
(A+) (Super-advanced) Study hyperbolic geometry: as an axiomatic
system, its models and applications.

6. Hints and Solutions to Selected Problems

Exercise 1. The angles make what looks like a right angle (cf. Fig. 11a). □

Figure 11. Right angle, SsA, and H/L

Exercise 2. The protractor shows the angles as α = 45°, β ≈ 26.5°, and
γ ≈ 18.5°, which do add up to about 90°. □
Exercise 3. The place X on the river that requires the minimum amount of
cord is about three times closer to A than to B. □

Exercise 4. Figures 11b-c represent SsA and H/L, respectively. ♦

Exercise 5. If AB = CD and BC = DA, we can also throw in the common
side BD and conclude by SSS that △ABD ≅ △CDB (cf. Fig. 12a). Reaping
the benefits of the congruence, we have ∠ABD = ∠CDB – alternate interior
angles! Therefore, AB || DC. A similar argument shows that AD || BC. □
Theorem 2. Figure 4d gives it away! We already know that opposite sides
are equal, e.g., AB = CD, and that the alternate interior angles are also
equal for the pairs of parallel sides. If E is the intersection of the diagonals,
by ASA we have △ABE ≅ △CDE. From here, AE = CE and BE = DE,
i.e., the diagonals bisect each other in a parallelogram. □

Figure 12. Def. 2 ⇒ Def. 1 and Same sizes under reflection

Exercise 8. (a) Let segment AB go to segment A1B1 under reflection across
line l (cf. Fig. 12b). If AA1 and BB1 intersect l in X and Y, respectively,
the reflection means that AA1 ⊥ l, AX = A1X, BB1 ⊥ l, and BY = B1Y.
To show that AB = A1B1, we draw parallels to l through A and A1,
which intersect line BB1 in C and C1, respectively. Using alternate interior
angles, show that in quadrilateral XYCA all angles are right, and hence
XYCA is a rectangle. Similarly, XYC1A1 is a rectangle, which is congruent
to XYCA because of equal sides. Consequently, AC = A1C1, BC = BY −
CY = B1Y − C1Y = B1C1, and ∠ACB = ∠A1C1B1 = 90°. By SAS,
△ABC ≅ △A1B1C1 and hence AB = A1B1. Did we miss a special case? ♦
(b) Since each side of a triangle under reflection across l will go to an
equal segment (cf. Fig. 12c), by SSS we conclude that triangles are sent to
congruent triangles under reflection. □
(c) Consider ∠CAB where B and C are some points on the two arms
of the angle, thereby forming △ABC. By part (b), reflection across line l
must send △ABC to a congruent △A1B1C1 (cf. Fig. 12d), which implies
that ∠CAB goes to an equal ∠C1A1B1 too. □
Conjecture 2. From the reflection across l, we have FA = F′A and
∠FAX = ∠F′AX = 90°. Since AX is common to both triangles, the re-
quired congruence △FAX ≅ △F′AX follows from SAS. Further, ∠F′AX =
∠CBX = 90° and ∠AXF′ = ∠BXC (as vertical angles), so that AA implies
the required similarity △F′AX ∼ △CBX. □

Exercise 10. Continuing the solution that started on page 18 and referring
to the picture there, by alternate interior angles we have α = α′ and β = β′.
From the straight angle about point C we have α′ + γ + β′ = 180° so that
α + γ + β = 180°, which is the desired sum of the angles in △ABC. □
Exercise 11. (a) Since 180° − γ = δ is acute, there is an isosceles △ACC1
with AC = AC1 and base angles ∠ACC1 = ∠AC1C = δ. Note that C1 will
be on line BC with C between C1 and B (why?), as shown in Figure 13a.
Now, △ABC and △ABC1 share side AB, and have other equal sides:
AC = AC1, across which lie equal angles: ∠ABC = ∠ABC1 = β. However,
the two triangles are definitely not congruent, since one of them is contained
in the other, namely, △ABC is strictly inside △ABC1! This counterexample
to a “strengthened SsA” criterion originated from having the (equal) angle β
lie across the smaller side AC of △ABC.
In conclusion, we cannot strengthen SsA: we must have the equal angles
opposite the longer sides of both triangles when applying SsA. □
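A concrete instance of this counterexample can be placed on the integer grid. The coordinates below are an assumption chosen to make the numbers pleasant (they are not taken from Figure 13a): B = (0, 0), A = (4, 3), C = (3, 0), and C1 = (5, 0), so that C lies between B and C1 on line BC.

```python
import math

def dist(p, q):
    """Euclidean distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle_deg(vertex, p, q):
    """Angle p-vertex-q in degrees, between 0 and 180."""
    ux, uy = p[0] - vertex[0], p[1] - vertex[1]
    vx, vy = q[0] - vertex[0], q[1] - vertex[1]
    return math.degrees(math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy))

A, B, C, C1 = (4, 3), (0, 0), (3, 0), (5, 0)

print("AB is shared:", dist(A, B))
print("AC and AC1:", dist(A, C), dist(A, C1))                   # equal sides
print("angle at B:", angle_deg(B, A, C), angle_deg(B, A, C1))   # equal angles
print("angle ACB:", angle_deg(C, A, B))                         # obtuse
print("BC and BC1:", dist(B, C), dist(B, C1))                   # different third sides
```

Both triangles have the side AB, a second side equal to AC, and the same angle at B opposite that second (smaller) side, yet their third sides differ, so they cannot be congruent.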
(b) The Pythagorean Theorem says that c² = a² + b² for any right triangle
with lengths c, a, and b of the hypotenuse and the legs. Algebraically, this
implies that c² > a² and c² > b², i.e., c > a and c > b. Thus, we arrive at the
well-known fact that the hypotenuse is the longest side of a right triangle.
The three triangle elements in the HL criterion are a leg, the hypotenuse,
and the right angle, which is opposite the longest side of the triangle. But this
is precisely the SsA criterion for right triangles! Thus, HL is indeed a special
case of SsA. □
Figure 13. Constructing extra isosceles triangles

Exercise 12. In △ABC and △A1B1C1 (cf. Fig. 13b), let AB = A1B1,
∠BAC = ∠B1A1C1 = α, and AC + BC = A1C1 + B1C1. The extra
construction hinted at in the text is to extend side AC of △ABC beyond
C to point D so that CD = CB, and analogously for △A1B1C1 to obtain
point D1. This was done so as to arrive at two obviously congruent triangles:
△ABD ≅ △A1B1D1 by SAS, where the second pair of equal sides are the
sum-sides AD = A1D1.
The congruence yields four more equal angles: ∠ADB = δ = ∠A1D1B1
along with the other δ-angles from the two isosceles triangles, △BCD ≅
△B1C1D1. Subtracting, we obtain yet another pair of equal angles:
∠ABC = ∠ABD − ∠CBD = μ − δ = ∠A1B1D1 − ∠C1B1D1 = ∠A1B1C1.
Since AB = A1B1 and α is the same in both original triangles, ASA kicks in
to complete the proof of the desired congruence: △ABC ≅ △A1B1C1. □
Session 2

Rubik’s Cube. Part II

Tom Davis

Sneak Preview. In Part I of this session, we encoded the moves on the Rubik’s
Cube via permutations. Understanding the mathematics of these face-twisting
permutations is indeed equivalent to a complete understanding of Rubik’s Cube.
Fortunately, permutations form a most famous and well-studied example of what
is known in mathematics as a group.
We begin this Part II with a super-fast introduction to group theory, discussing
very basic groups together with examples based on Rubik’s Cube. The session
culminates in calculating the total number of positions that can be reached from
a solved cube. Although more complex, this feat resembles the 15-puzzle in
Session 5, where one can plunge into a detailed study of group theory. Naturally,
these two sessions reinforce the same abstract concepts from somewhat different
angles, and each of them is self-contained and can be tackled independently.
Part III will reward the patient reader: our newly-developed group-theoretic
tools will be used to find methods for efficiently solving jumbled cubes.

1. What Is a Group?

1.1. Formal definition. Whether small or “impossibly” large, a group
is an abstract mathematical object that can be defined in terms of a few
simple axioms and about which theorems can be proven. The
set of permutations of Rubik’s Cube that we studied in Part I provides one
example of a group. Unfortunately, this Rubik’s Cube group is large and
fairly complex. Indeed, as we shall prove in Section 5:
Problem 1. The Rubik’s Cube group R has
8! · 12! · 2¹⁰ · 3⁷ = 43,252,003,274,489,856,000 members,
one corresponding to each position reachable from a solved cube!
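The arithmetic in Problem 1 is easy to confirm with exact integers. The snippet below only multiplies out the stated formula; it says nothing about why the formula is correct, which is the business of Section 5.

```python
from math import factorial

# 8! * 12! * 2^10 * 3^7, evaluated with Python's exact integers.
size = factorial(8) * factorial(12) * 2**10 * 3**7

print(f"{size:,}")  # 43,252,003,274,489,856,000
```

That is roughly 4.3 × 10¹⁹ positions, far too many to explore one by one.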
To begin our study of group theory, as is always the case in mathematics,
it is a good idea to begin looking at basic groups with only a few members
instead of trying to tackle the Rubik’s Cube group as our first example. But
first, let’s settle on

Definition 1. A group G consists of a set of objects and a binary operation
∗ on those objects satisfying the following four conditions:
(1) The operation ∗ is closed, i.e., if g and h are any two elements of the
group G then the object g ∗ h is also in G.
(2) The operation ∗ is associative, i.e., if f, g, and h are any three elements
of G, then (f ∗ g) ∗ h = f ∗ (g ∗ h).
(3) There is an identity element e in G, i.e., there exists an e ∈ G such that
for every element g ∈ G, e ∗ g = g ∗ e = g. (Often, in groups where the
operation is like multiplication, the symbol “1” is used in place of e.)
(4) Every element in G has an inverse relative to the operation ∗, i.e., for
every g ∈ G, there exists an element g⁻¹ ∈ G such that g ∗ g⁻¹ =
g⁻¹ ∗ g = e.

For those who desire the absolute minimum in conditions, see this footnote.¹
Most familiar mathematical systems involve commutative operations, but
this is not necessarily the case in group theory. In other words, there may
exist elements g and h of G such that g ∗ h ≠ h ∗ g, making the group
non-commutative. Notice also that the definition above does not require
that a group be finite. In this session we will consider mostly finite groups,
although, as in the case of R, those finite groups may be quite large.
Since there is only one operation ∗, we often omit it and write gh in place
of g ∗ h. In the case of multiplication of permutations we already do this:
(1 2) combined with (1 3 4) is written (1 2)(1 3 4). Similarly, we can define
g² = gg = g ∗ g, g³ = ggg = g ∗ g ∗ g and so on, g⁰ = e, and g⁻ⁿ = (g⁻¹)ⁿ.
Because of associativity, these are all well-defined and they obey the usual
laws of exponents, such as: gᵐ⁺ⁿ = gᵐgⁿ, and this is true for any integers
m and n, be they positive, negative, or zero.

1.2. Famous infinite groups. You are probably already familiar with a
few finite groups, but most of the best-known examples are infinite:
• The integers as the group elements under addition.²
• The rational numbers under addition.
• The rational numbers with 0 omitted under multiplication.
• The real numbers or complex numbers under addition.
• The real or complex numbers omitting 0 under multiplication.

¹ In fact, there is a slightly simpler and equivalent definition of a group: only a right
identity and a right inverse are required (or a left identity and a left inverse). In other
words, if there is an e such that g ∗ e = g for all g ∈ G and for every g ∈ G there exists a
g⁻¹ such that g ∗ g⁻¹ = e then you can show that e ∗ g = g and that g⁻¹ ∗ g = e. This can
be done by evaluating the expression g⁻¹ ∗ g ∗ g⁻¹ ∗ (g⁻¹)⁻¹ in two different ways using
the associative property, yielding that the left and right identities are the same and that
the left and right inverses of any element are also the same.
² The term “under addition” simply means that the group operation is addition.

Check your understanding of the definitions so far:

Exercise 1. Verify that the examples above do form groups according to
the formal definition. Identify which element in these groups is the identity
(signified by e in the formal definition), and then check that all the other
group properties listed in Definition 1 hold.
In fact, all of the above sets are infinite and commutative groups. A
group that is commutative is sometimes called an abelian group.
On the other hand, the non-negative integers {0, 1, 2, 3, . . .} under addi-
tion do not form a group – there is an identity (0), but there are no inverses
for any positive numbers. We can’t include zero in the groups of rational,
real, or complex numbers under multiplication since it has no inverse.
The so-called trivial group consists of the single element e, and satisfies
e ∗ e = e. Since every group must contain the identity element, this is the
smallest possible group.

1.3. Groups from number theory. Only if you know about modular
arithmetic (cf. the Number Theory I session, vol. I), show that:
Exercise 2. The n elements 0, 1, . . . , n − 1 form a (finite abelian) group
under addition modulo n.

Exercise 3. If p is prime, then multiplication modulo p forms a group con-
taining p − 1 elements: 1, 2, . . . , p − 1.
If p is not a prime then the operation of multiplication modulo p does
not form a group. For example, if p = 6 there is no inverse for 2: 2 ∗ 1 = 2,
2∗2 = 4, 2∗3 = 0, 2∗4 = 2, and 2∗5 = 4. It is also not a group since 2∗3 = 0
and 0 is not in the set {1, 2, 3, 4, 5}, so in this case the operation is not even
closed! (Remember that in this example the “∗” represents multiplication
modulo 6.) Worse, when two numbers, neither of which is zero, multiply to
yield zero, then the system is said to have zero divisors; this immediately
prevents it from being a group (why?). In fact,
Exercise 4. When a modular system under multiplication has no zero divi-

sors it forms a group. This occurs precisely when the modulus n is a prime
number. If n is not prime, there will be zero divisors, and hence no group
under multiplication.
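
For small moduli, the claims in Exercises 2 through 4 can be checked by brute force. The following is a minimal Python sketch, not part of the original session; the helper names `is_group`, `add_mod`, `mul_mod`, and `zero_divisors` are made up for this illustration, and associativity is taken for granted because it is inherited from ordinary integer arithmetic.

```python
def is_group(elements, op):
    """Brute-force check of closure, identity and inverses.

    Associativity is NOT tested; for addition and multiplication
    modulo n it follows from ordinary integer arithmetic.
    """
    elements = list(elements)
    # Closure: every product must land back in the set.
    if any(op(a, b) not in elements for a in elements for b in elements):
        return False
    # Identity: some e with e*a == a == a*e for every a.
    identities = [e for e in elements
                  if all(op(e, a) == a == op(a, e) for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a needs some b with a*b == e == b*a.
    return all(any(op(a, b) == e == op(b, a) for b in elements)
               for a in elements)


def add_mod(n):
    return lambda a, b: (a + b) % n


def mul_mod(n):
    return lambda a, b: (a * b) % n


def zero_divisors(n):
    """Pairs of non-zero residues (a <= b) whose product is 0 modulo n."""
    return [(a, b) for a in range(1, n) for b in range(a, n)
            if (a * b) % n == 0]


print(is_group(range(6), add_mod(6)))     # addition modulo 6
print(is_group(range(1, 7), mul_mod(7)))  # multiplication modulo the prime 7
print(is_group(range(1, 6), mul_mod(6)))  # multiplication modulo 6
print(zero_divisors(6))
```

Trying a few more values of n in place of 6 and 7 is a quick way to build intuition before attempting a proof.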
In the group based on addition modulo n, if you begin with the element
1, you can get to any element in the group by successive additions of that
element. In the group modulo 5, we obtain: 1 = 1, 2 = 1 + 1, 3 = 1 + 1 + 1,
4 = 1 + 1 + 1 + 1 and 0 = 1 + 1 + 1 + 1 + 1. The same idea holds for any n.
In this case we say that the group is generated by a single element (e.g., 1),
and such groups are called cyclic groups, since successive additions simply
cycle through all the group elements. The element that generates the group
in this way is called a generator.

A cyclic group may have more than one generator. For example, in
the same group corresponding to addition modulo 5, the element 3 is also
a generator: 1 = 3 + 3, 2 = 3 + 3 + 3 + 3, 3 = 3, 4 = 3 + 3 + 3 and
0 = 3 + 3 + 3 + 3 + 3.

Exercise 5. Does the group above corresponding to addition modulo 5 have
any other generators? What are all the generators of the group correspond-
ing to addition modulo 6? How about multiplication modulo 7?
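
Once you have tried Exercise 5 by hand, a short program can confirm your answers. This is only a sketch under our own naming (`cycle_through`, `additive_generators`, and `multiplicative_generators` are hypothetical helpers, not part of the Rubik program); it repeatedly combines an element with itself and checks whether every group element is reached.

```python
def cycle_through(g, combine):
    """Set of elements g, g*g, g*g*g, ... collected until a value repeats."""
    seen, x = set(), g
    while x not in seen:
        seen.add(x)
        x = combine(x, g)
    return seen


def additive_generators(n):
    """Generators of {0, ..., n-1} under addition modulo n."""
    return [g for g in range(n)
            if len(cycle_through(g, lambda a, b: (a + b) % n)) == n]


def multiplicative_generators(p):
    """Generators of {1, ..., p-1} under multiplication modulo a prime p."""
    return [g for g in range(1, p)
            if len(cycle_through(g, lambda a, b: (a * b) % p)) == p - 1]


print(additive_generators(5))
print(additive_generators(6))
print(multiplicative_generators(7))
```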

1.4. Groups from geometry. For any particular geometric object, the
symmetry operations on that object form a group. A symmetry operation is
a movement after which the object looks the same (as if nothing happened
to it and it didn’t move!). For example,

Exercise 6. How many symmetry operations are there on an ellipse that is
not a circle? Describe the group of these symmetries.

Solution: There are 4 symmetry operations on an ellipse whose width
and height are different:

Figure 1. Four Symmetry Operations on an Ellipse (panels labeled e, a, b, c)

e: Leave it unchanged.
a: Rotate it 180° about its center.
b: Reflect it across its short axis.
c: Reflect it across its long axis.

∗   e   a   b   c
e   e   a   b   c
a   a   e   c   b
b   b   c   e   a
c   c   b   a   e
The group operation consists of making one movement followed by making
a second movement. Clearly e is the identity, and each of the operations is
its own inverse. We can write down the group operation ∗ on any pair of
elements of the ellipse symmetries in the 4 × 4 table above. 
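
The 4 × 4 table can also be produced mechanically. In the sketch below (our own illustration, assuming the long axis of the ellipse lies along the x-axis and the short axis along the y-axis), each symmetry is recorded as the pair of signs by which it multiplies the coordinates (x, y); combining two symmetries multiplies the signs. The names `SYMMETRIES`, `combine`, and `cayley_table` are hypothetical.

```python
# Each symmetry sends a point (x, y) to (s*x, t*y) for some signs s, t.
SYMMETRIES = {
    "e": (1, 1),    # leave unchanged
    "a": (-1, -1),  # rotate 180 degrees about the center
    "b": (-1, 1),   # reflect across the short axis (assumed to be the y-axis)
    "c": (1, -1),   # reflect across the long axis (assumed to be the x-axis)
}
NAMES = {signs: name for name, signs in SYMMETRIES.items()}


def combine(first, second):
    """Name of the symmetry obtained by doing `first`, then `second`."""
    (s1, t1), (s2, t2) = SYMMETRIES[first], SYMMETRIES[second]
    return NAMES[(s1 * s2, t1 * t2)]


def cayley_table(order="eabc"):
    """Rows of the group table, keyed by the row label."""
    return {row: [combine(col, row) for col in order] for row in order}


for row, entries in cayley_table().items():
    print(row, *entries)
```

Because the signs simply multiply, the order of the two movements does not matter here, which is one way to see that this group is abelian.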

Exercise 7. How many symmetry operations are there on an equilateral
triangle? On a square?

Answers: The group of symmetries of an equilateral triangle consists of
six elements. You can leave it unchanged, rotate it by 120° or 240°, and you
can reflect it across any of the lines through the center and a vertex.
In the same way, the group of symmetries of a square consists of eight
elements: the four rotations (including a rotation of 0° which is the identity)
and four reflections through lines passing through the center containing either
the diagonals or the perpendiculars to the edges.

In general, a regular n-gon has a group of 2n symmetries and these are
called the dihedral groups.

Exercise 8. Try to make a multiplication table for the 6-element group of
symmetries of the equilateral triangle and for the 8-element group of
symmetries of the square.

Unlike the group of symmetries of the ellipse, the groups in Exercise 8
are not abelian, so a ∗ b is not necessarily the same as b ∗ a. To keep track
of which is which, make the column correspond to the first element and the
row correspond to the second.3 In other words, a ∗ b is in column a, row b,
and b ∗ a is in column b, row a.
Exercise 9. Is the group of symmetries for the circle finite? Abelian?
Answer: A circle has an infinite number of symmetries. It can be rotated
about its center by any angle θ such that 0 ≤ θ < 360° or it can be reflected
across any line passing through its center. The group is not abelian: rotations
and reflections do not commute in general. 

2. Permutation Groups and Group Isomorphisms

2.1. Moves and twists vs. the group operation on R. The most
important class of examples for us (since we’re supposed to be fixated on
Rubik’s Cube as we read this) come from certain sets of permutations which
also form groups. Since a permutation is just a rearrangement of objects, the
group operation is simply the concatenation of two such rearrangements.4 In
other words, if g is one rearrangement and h is another, then the rearrange-
ment that results from taking the set of objects and applying g to it, and
then applying h to the rearranged objects, is what is meant by g ∗ h.
To avoid a possible misunderstanding, when we speak about the Rubik’s
Cube group, the group members are move sequences and the group operation
is the act of doing one sequence followed by another sequence. At first
it’s easy to get confused if you think of rotating the front face as a group
operation. The term “move sequence” above is not exactly right either – move
sequences that have the same final result are considered to be the same. For
an easy example, F and $F^5$ are the same group element.
Definition 2. The Rubik’s Cube group R is the set of all possible permuta-
tions of the facelets achievable by means of a finite number of twists of the
cube faces. To combine two of these permutations, we simply apply one set
of twists after the other.
This, of course, is a huge group.
Footnote 3. Compare with the Group Theory I session, where rows and columns are reversed.
Footnote 4. Warning: This session uses the notation of multiplication from left to right, i.e., gh
means apply first g and then h. This is in contrast with the right-to-left notation in the
Group Theory session, where gh in “action groups” means apply first h and then g.

2.2. Permutation after permutation. In any permutation group the

identity is the permutation that leaves all the objects in place. The inverse
of a permutation is the permutation that exactly undoes it. To multiply two
permutations together, just pick each element from the set of objects being
permuted and trace it through.
For example, if the set of objects that are to be permuted consists of the
six objects {1, 2, 3, 4, 5, 6} and we wish to multiply together (1 2 4)(3 6) and
(5 1 2)(4 3), we can begin by seeing what happens to the object in box 1
under the influence of the two operations (cf. Fig. 2 for a visual display of
this product). The first operation moves it to box 2 and the second moves
the object in box 2 to box 5. Thus, the combination moves the object in box
1 to box 5. Therefore, we can begin to write out the product as follows:
(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5 . . .
We have written “. . .” at the end since we don’t know where the object in
box 5 goes yet. Let’s trace 5 through the two permutations. The first does
not move 5 and the second moves 5 to 1, so (1 5) is a complete cycle in the
product. Here’s what we have, so far:
(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5) . . .

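Boxes 1 through 6, initial contents:     1 2 3 4 5 6
Contents after applying (1 2 4)(3 6):    4 1 6 2 5 3
Contents after then applying (5 1 2)(4 3): 5 4 2 6 1 3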

Figure 2. Multiplying (1 2 4)(3 6) ∗ (5 1 2)(4 3)

We still need to determine the fates of the other objects. So far, we

haven’t looked at 2, so let’s begin with that. The first permutation takes it
to 4 and the second takes 4 to 3 so we’ve got this:
(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5)(2 3 . . .
Doing the same thing again and again, we find that the pair of permutations
takes 3 to 6, that it takes 6 to 4, and finally, that it takes 4 back to 2. This
accounts for all of the objects in the set, so the final product of the two
permutations is given by:
(1 2 4)(3 6) ∗ (5 1 2)(4 3) = (1 5)(2 3 6 4).
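
The tracing procedure above is easy to mechanize. The following Python sketch is our own illustration (the function names `from_cycles`, `multiply`, and `to_cycles` are hypothetical); it follows this session's left-to-right convention, so the first permutation is applied first, and it assumes the cycles given for a single permutation are disjoint.

```python
def from_cycles(cycles, n):
    """Mapping {box: destination} on 1..n built from disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, x in enumerate(cycle):
            perm[x] = cycle[(pos + 1) % len(cycle)]
    return perm


def multiply(g, h):
    """Left-to-right product: apply g first, then h."""
    return {x: h[g[x]] for x in g}


def to_cycles(perm):
    """Disjoint cycles of perm, omitting fixed points.

    Each cycle starts at its smallest entry, mirroring the hand
    computation that traces 1 first, then the next untouched object.
    """
    seen, cycles = set(), []
    for start in sorted(perm):
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:
            cycles.append(tuple(cycle))
    return cycles


g = from_cycles([(1, 2, 4), (3, 6)], 6)
h = from_cycles([(5, 1, 2), (4, 3)], 6)
print(to_cycles(multiply(g, h)))
```

Swapping the arguments of `multiply` shows how the answer changes when the permutations are applied in the other order.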

From now on we’ll omit the “∗” operator and simply place the permuta-
tions to be multiplied next to each other.
Exercise 10. Verify the following product of permutations of {1, 2, . . . , 9}:
(1 2 3)(4 5)(6 7 8 9)(2 5 6)(4 1)(3 7) = (1 5)(2 7 8 9)(3 4 6).
Practice multiplying together other pairs of permutations.
Exercise 11. If a permutation is expressed in cycle notation where each of
the permuted objects appears in a single cycle, show that the inverse of that
permutation can be obtained by reversing the order of the elements in each
cycle. For example, $\bigl((1\ 4)(3\ 5\ 2)\bigr)^{-1} = (4\ 1)(3\ 2\ 5)$, where (4 1) = (1 4).
As we noticed when we looked at permutations of the facelets of Rubik’s
Cube, the order makes a difference: (1 2)(1 3) ≠ (1 3)(1 2) since (1 2)(1 3) =
(1 2 3) and (1 3)(1 2) = (1 3 2). And indeed, here the object 1 is shared by
both cycles, preventing them from commuting with each other (why?).
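
Both observations, the inverse obtained by reversing cycles and the failure of two overlapping 2-cycles to commute, can be tested numerically. This is a small sketch with hypothetical helper names (`from_cycles`, `multiply`, `reverse_cycles`), again using the left-to-right convention; it checks examples and is no substitute for the proof asked for in Exercise 11.

```python
def from_cycles(cycles, n):
    """Mapping {object: destination} on 1..n built from disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, x in enumerate(cycle):
            perm[x] = cycle[(pos + 1) % len(cycle)]
    return perm


def multiply(g, h):
    """Left-to-right product: apply g first, then h."""
    return {x: h[g[x]] for x in g}


def reverse_cycles(cycles):
    """Candidate inverse from Exercise 11: reverse every cycle."""
    return [tuple(reversed(c)) for c in cycles]


cycles = [(1, 4), (3, 5, 2)]
g = from_cycles(cycles, 5)
g_inv = from_cycles(reverse_cycles(cycles), 5)
identity = from_cycles([], 5)
print(multiply(g, g_inv) == identity)  # do the reversed cycles undo g?

a = from_cycles([(1, 2)], 3)
b = from_cycles([(1, 3)], 3)
print(multiply(a, b) == multiply(b, a))  # do (1 2) and (1 3) commute?
```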

2.3. The multiplication table revisited. Let’s look in detail at a par-

ticular group – the group of all permutations of the three objects {1, 2, 3}.
We know that there are n! ways to rearrange n items since we can choose
the final position of the first in n ways, leaving n − 1 ways to choose the
final position of the second, n − 2 for the third, and so on. The product,
n · (n − 1) · (n − 2) · · · 3 · 2 · 1 = n! is thus the total number of permutations.
For three items this means there are 3! = 3 · 2 · 1 = 6 permutations:
(1), (1 2), (1 3), (2 3), (1 2 3), and (1 3 2).
Table 1 is the group “multiplication table” for these six elements. Since, as
we noted above, the group multiplication is not necessarily commutative,
the table is to be interpreted such that the first permutation in a product
is chosen from the row on the top and the second from the column on the
left. At the intersection of the row and column determined by these choices
is the product of the permutations. For example, to find the permutation
product of (1 2) by (1 3) choose the item in the second column and third
row: (1 2 3).
*        (1)      (1 2)    (1 3)    (2 3)    (1 2 3)  (1 3 2)
(1)      (1)      (1 2)    (1 3)    (2 3)    (1 2 3)  (1 3 2)
(1 2)    (1 2)    (1)      (1 3 2)  (1 2 3)  (2 3)    (1 3)
(1 3)    (1 3)    (1 2 3)  (1)      (1 3 2)  (1 2)    (2 3)
(2 3)    (2 3)    (1 3 2)  (1 2 3)  (1)      (1 3)    (1 2)
(1 2 3)  (1 2 3)  (1 3)    (2 3)    (1 2)    (1 3 2)  (1)
(1 3 2)  (1 3 2)  (2 3)    (1 2)    (1 3)    (1)      (1 2 3)

Table 1. Multiplication of permutations of 3 objects
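
Table 1 can be regenerated with a few lines of code, which is a useful check on hand computations. The sketch below is our own (the names `multiply`, `cycle_name`, `ELEMENTS`, and `multiplication_table` are hypothetical). A permutation is stored as a tuple whose i-th entry is the destination of object i, and, as in the text, the first factor is taken from the top row and the second from the left column.

```python
def multiply(g, h):
    """Left-to-right product: object i goes to g[i-1], then to h[g[i-1]-1]."""
    return tuple(h[g[i] - 1] for i in range(len(g)))


def cycle_name(p):
    """Cycle notation such as '(1 2 3)'; the identity is written '(1)'."""
    seen, parts = set(), []
    for start in range(1, len(p) + 1):
        if start in seen or p[start - 1] == start:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(str(x))
            x = p[x - 1]
        parts.append("(" + " ".join(cycle) + ")")
    return "".join(parts) or "(1)"


# The six permutations in the order used in Table 1:
# (1), (1 2), (1 3), (2 3), (1 2 3), (1 3 2).
ELEMENTS = [(1, 2, 3), (2, 1, 3), (3, 2, 1), (1, 3, 2), (2, 3, 1), (3, 1, 2)]


def multiplication_table():
    """Rows keyed by the second factor; columns follow ELEMENTS (first factor)."""
    return {cycle_name(row): [cycle_name(multiply(col, row)) for col in ELEMENTS]
            for row in ELEMENTS}


for name, entries in multiplication_table().items():
    print(f"{name:8}", *(f"{entry:8}" for entry in entries))
```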


2.4. To be or not to be isomorphic? If we make a similar table of the

symmetries of an equilateral triangle ABC (cf. Exer. 8) with A, B, and C
listed counterclockwise, whose elements are 1, rotate 120° (r1), rotate 240°
(r2), and flip across an axis through A, B or C (fA, fB, fC), then we will
obtain the group multiplication Table 2:

*   1   fA  fB  fC  r1  r2
1   1   fA  fB  fC  r1  r2
fA  fA  1   r2  r1  fC  fB
fB  fB  r1  1   r2  fA  fC
fC  fC  r2  r1  1   fB  fA
r1  r1  fB  fC  fA  r2  1
r2  r2  fC  fA  fB  1   r1

Table 2. Multiplication of symmetries of an equilateral triangle

If you look carefully at Tables 1 and 2, you can see that they are, in
a sense, the same – the only difference is the names used for the group
elements. If you substitute 1 for (1), fA for (1 2), fB for (1 3), fC for (2 3),
r1 for (1 2 3), and r2 for (1 3 2), the two tables are identical, so in a sense,
the two groups are the same. When two groups differ only in the names used
for their elements, we call them isomorphic.
In fact, it is easy to see why this is the case. The symmetries of ABC
just move the letters labeling the vertices around to new locations and the
six symmetries of the triangle can arrange them in any possible way, so in a
sense, the triangle symmetries rearrange A, B, and C and the permutation
group rearranges the objects 1, 2, and 3.
Thus when you read in a group theory textbook that:
Problem 2. There are exactly two groups of order 6.
. . . what this means is that every group having 6 elements, with an appropri-
ate relabeling of the members of the group, will be like (be isomorphic to)
one of those two groups. Now, the groups in Tables 1 and 2 are the same;
the other group of order 6 corresponds to addition modulo 6 (cf. Exer. 2).
Definition 3. The group that contains all the permutations of three objects
is called the symmetric group on three objects. In general, the group consist-
ing of all the permutations on n objects is the symmetric group on n objects.
Since there are n! permutations of n objects, n! is the size of the sym-
metric group on n objects.

Exercise 12. Find an isomorphism between the group corresponding to ad-
dition modulo 6 and multiplication modulo 7. Is there more than one isomor-
phism, i.e., more than one way to make the multiplication tables identical?

Exercise 13. Show that the group consisting of the 4 symmetries of an
ellipse with different length axes (described in Exercise 6) is not isomorphic
to the group corresponding to addition modulo 4.
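
For groups this small, a computer can simply try every relabeling. The sketch below is our own brute-force illustration (the function name `isomorphisms` is hypothetical): it runs through all one-to-one relabelings of the first group by elements of the second and keeps those that turn one multiplication table into the other. It is meant for checking your answers to Exercises 12 and 13 after you have worked on them, not for replacing the reasoning.

```python
from itertools import permutations


def isomorphisms(elems_a, op_a, elems_b, op_b):
    """All bijections phi with phi(x op_a y) == phi(x) op_b phi(y)."""
    elems_a, elems_b = list(elems_a), list(elems_b)
    found = []
    for images in permutations(elems_b):
        phi = dict(zip(elems_a, images))
        if all(phi[op_a(x, y)] == op_b(phi[x], phi[y])
               for x in elems_a for y in elems_a):
            found.append(phi)
    return found


def add6(a, b):
    return (a + b) % 6


def mul7(a, b):
    return (a * b) % 7


def add4(a, b):
    return (a + b) % 4


def sign_product(s, t):
    """Ellipse symmetries written as pairs of signs (our own encoding)."""
    return (s[0] * t[0], s[1] * t[1])


ELLIPSE = [(1, 1), (-1, -1), (-1, 1), (1, -1)]

print(len(isomorphisms(range(6), add6, range(1, 7), mul7)))
print(len(isomorphisms(ELLIPSE, sign_product, range(4), add4)))
```

The search examines n! relabelings, so it is only practical for very small groups.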
2.5. Part of the whole may be all you need. A permutation group does
not have to include all possible permutations of the objects. If we consider
the Rubik group R as a permutation group on the cubies, there is obviously
no permutation that moves an edge cubie to a corner cubie and vice-versa.
The group consisting of the complete set of permutations of three objects
shown in Table 1 contains various proper subsets that also form groups using
the same operation, but limited to that subset:
{1}, {1, (1 2)}, {1, (1 3)}, {1, (2 3)}, and {1, (1 2 3), (1 3 2)}.
Definition 4. The subsets of groups that are themselves groups under the
same operation are called subgroups.
For example, the above subsets are recognizable as the trivial subgroup and
the four subgroups generated each by a reflection or a rotation of the equi-
lateral triangle. The group in which we are most interested here, the Rubik’s
Cube group R, is itself a subgroup of the group of all permutations of 48
items. We will examine the properties of subgroups in the following section.
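
To see all the subgroups of the symmetric group on three objects at once, one can test every non-empty subset for closure; a non-empty finite subset of a group that is closed under the operation is automatically a subgroup. The sketch below is our own illustration with hypothetical names (`multiply`, `is_subgroup`, `subgroups`), using 0-based tuples for permutations.

```python
from itertools import combinations, permutations


def multiply(g, h):
    """Left-to-right product of 0-based tuples: apply g first, then h."""
    return tuple(h[x] for x in g)


def is_subgroup(subset):
    """A non-empty finite subset closed under the operation is a subgroup."""
    return bool(subset) and all(multiply(g, h) in subset
                                for g in subset for h in subset)


def subgroups(group):
    """All subgroups of a small group, found by testing every subset."""
    group = list(group)
    return [set(c)
            for size in range(1, len(group) + 1)
            for c in combinations(group, size)
            if is_subgroup(set(c))]


s3 = list(permutations(range(3)))
print(sorted(len(h) for h in subgroups(s3)))
```

The list includes the whole group itself in addition to the proper subgroups named above.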

3. Properties of Groups and Their Subgroups

3.1. Basics in group theory. This session is not meant to be a complete

course in group theory, so we’ll list below a few of the important definitions
and some properties satisfied by all groups. Proofs can be found in any
introduction to group theory or abstract algebra textbook (cf. Gallian [26]).
From time to time, we will give a name of a group or group property with
no further explanation because we do not need it to help us solve Rubik’s
Cube. If you are interested, you can look up that name or property in a
group theory textbook to learn more.
Theorem 1. Let G be a group.
(a) The identity is unique and every element of G has a unique inverse.
(b) The order of an element g ∈ G is the smallest positive integer n such
that $g^n = e$. If no such n exists, the order is said to be infinite. In a
finite group every element has a finite order.
(c) The order of a group is the number of elements in it. If g ∈ G then
the order of g divides the order of G.
(d) If H is a subgroup of G then the order of H divides the order of G.
Although parts (a)-(b) are doable (by contradiction or Pigeonhole Principle)
parts (c)-(d) are hard and are beyond the scope of this session. Still, the
next page has exercises on properties of subgroups (in special but important
cases) that can be verified with no advanced group theory background.

Exercise 14. If H and K are both subgroups of the same group G, then
H ∩ K is also a subgroup of G. In other words, the intersection of any two
subgroups of G satisfies all the group properties from Definition 1.
Using as an example the symmetric group on three objects displayed in
Table 1, the order of (1 2) is 2, the order of (1 2 3) is 3, and both 2 and
3 divide 6, the order of the group. The proper subgroups of the symmetric
group listed in Section 2.5 have orders 1, 2, and 3 – again, all are divisors of
6, as they must be. Any pair of subgroups in that list only have the identity
element in common, so clearly the intersection of any two of them is also a
group, although in these cases it is the trivial group.
Exercise 15. Consider the symmetric group G on 4 objects: the group of
order 4! = 24 that consists of all the permutations of 4 objects. Let H be
the subset of G made of all permutations that leave the element 1 fixed (but
with no further restrictions), and let K be the subset of permutations that
leave 2 fixed. List the elements of H, K, and their intersection H ∩ K, and
verify that all three subsets are indeed subgroups of G.
Answers: For the three subsets, we have:
H = {(1), (2 3), (2 4), (3 4), (2 3 4), (2 4 3)},
K = {(1), (1 3), (1 4), (3 4), (1 3 4), (1 4 3)},
H ∩ K = {(1), (3 4)},
illustrating that the intersection of two subgroups is also a subgroup (and in
this case, it is the set of all permutations that leave both 1 and 2 fixed). ♦
It is easy to see why Theorem 1(c) is true for the full symmetric groups:
Problem 3. If G is the symmetric group on several objects, then the order
of any permutation in G has to divide the order of G.
Sketch: As we saw in Part I, we can write down any particular permuta-
tion as a set of (disjoint) cycles, and the order of that permutation is simply
the least common multiple (lcm) of the cycle lengths (why?). Since there are
n elements that are moved by the permutations, the longest cycle can have
length at most n, so all the cycle lengths are thus n or less. But the order
of the group is n!, and clearly the lcm of a set of numbers that are at most n will
divide n! (why?). ♦
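
The sketch of Problem 3 can be tested exhaustively for small n. In the following illustrative code (our own; `cycle_lengths`, `order_by_lcm`, and `order_by_powers` are hypothetical names), the order of each permutation is computed twice, once as the lcm of its cycle lengths and once by repeated multiplication, and then checked against n!.

```python
from functools import reduce
from itertools import permutations
from math import factorial, gcd


def cycle_lengths(p):
    """Cycle lengths of the 0-based tuple p; fixed points count as 1-cycles."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            length += 1
            x = p[x]
        lengths.append(length)
    return lengths


def order_by_lcm(p):
    """Least common multiple of the cycle lengths."""
    return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths(p), 1)


def order_by_powers(p):
    """Smallest n >= 1 for which p multiplied by itself n times is the identity."""
    identity = tuple(range(len(p)))
    power, n = p, 1
    while power != identity:
        power = tuple(p[x] for x in power)  # current power followed by p
        n += 1
    return n


n = 5
for p in permutations(range(n)):
    assert order_by_lcm(p) == order_by_powers(p)
    assert factorial(n) % order_by_lcm(p) == 0
print("checked all", factorial(n), "permutations of", n, "objects")
```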

3.2. A few proper subgroups of the Rubik group. Since the center
cubies always remain in the same position relative to the others, we will
always consider the cube to be oriented in a specific way (say, with the white
face up and the green face on the left). We consider to be moves only those
operations that twist a face relative to the others, so rotating the entire cube
as a unit is not a move we will consider. With a real cube, it is sometimes
interesting to think about “slice moves” where, say, the top and bottom face
are left in position and the center slice between them is turned (cf. the “slice
subgroup” in Problem 4, p. 34), but this is equivalent to a combination of a
clockwise rotation of one face together with a counterclockwise rotation of
the face opposite, so a slice move does not really introduce anything new.
In its total glory, a jumbled Rubik’s Cube is difficult to unjumble, espe-
cially when you are a beginner.
PST 10. A common method to study complex situations is to look first
at simpler cases and learn as much as you can about them before tackling

the harder problem. One way to simplify Rubik’s Cube is to consider only
a subset of moves as being allowable and to learn to solve cubes that were
jumbled with only those moves. If you do this, you are effectively reducing the
number of allowable permutations, but you will still be studying a subgroup
of the full Rubik group.

3.2.1. Rubik program to the rescue! Let’s consider a few subgroups of R,

which you may wish to investigate yourself using the Rubik program:
Figure 3a shows what the Rubik window looks like after pressing the “Fcw”
(meaning “F clockwise”) button twice, beginning with a solved cube. The
cube can be returned to a solved state by pressing the “Reset Cube” button.

Figure 3. Rubik window and Macro gizmo (FF,LL)

The Rubik program contains a “macro gizmo” to make this easier. Fig-
ure 3b shows the gizmo with two macros defined: one that does the F op-
eration twice and one that does the L operation twice. To perform the FF
macro, simply click on the button marked “FF”. The help file for the Ru-
bik program describes how to define macros and include them in the macro
gizmo. If you’d like to investigate the positions achievable by a limited set of
moves, define each of the moves as a macro and put all of them in the macro

gizmo. Then make moves from an initialized cube using only macro gizmo
entries. In fact, if you place the macro gizmo on top of the control panel of
Rubik, you will not press any other buttons by accident. If you restrict your
moves to any of these subgroups, the cube will be easier to solve.

3.2.2. Examples you can do in practice. The list below is a tiny subset of
the total number of subgroups of the whole group, but these are “practical”
examples: you can experiment with a real cube making only the moves in
the indicated subgroups. Explore and describe, as much as you can, features
of these subgroups, e.g., try to calculate the order of the subgroup, to decide
whether it is abelian or not, cyclic or not, whether it looks like another group
you know, etc. Do not look at the commentaries after the exercises until you
have thought about the subgroups for a while. (In Part III, we will examine
in detail more general but less practical subgroups of R.)
Exercise 16. (Single face subgroup) In this subgroup of R, you are only
allowed to move a single face.
Hint: This group is not very interesting, since there are only 4 achievable
positions including “solved,” but it still is a proper subgroup of R. ♦
Exercise 17. (Two opposite faces subgroup) In this subgroup of R, you
are allowed to move only two opposite faces.
Hint: This is also a fairly trivial group since twists of two opposite faces
are independent. Still, it has 16 elements and is an example of what is
known as a direct product group. Beware: if you are allowed to turn two
adjacent faces, the subgroup is enormous: it contains 73,483,200 members,
the calculation of which is beyond the scope of this session. ♦
Exercise 18. (F-L half-turn subgroup) In this subgroup of R, you are
allowed to move either the front face or the left face by half-turns.
Solution: In Figure 4 we see all 12 cube positions in the subgroup gen-
erated by FF and LL. Since applying FF or LL twice in a row brings us to
the previous position, the 11 positions different from the solved position are:
FF, FFLL, FFLLFF, FFLLFFLL, . . . , $(\text{FFLL})^5\,\text{FF}$, arranged in that order in the
figure. The final position in the lower-right corner of the figure will return
to the solved position with one more application of LL. 

Problem 4. (The slice subgroup) In this subgroup of

R, you can only move the center slices (cf. the figure to the
right). The subgroup can be further restricted by requiring
that one, two, or three of those slices must make half-
turns only.
Answers: The full slice group contains 768 members. If one of the slices
must be a half-turn, there are 192 members. If two are half-turns, there are
32 group members, and if all three moves must be half-turns, there are only
8 members. Can you justify all these numbers? ♦

Figure 4. F-L half-turn subgroup of R

4. Even and Odd Worlds

Not every rearrangement of the Rubik’s Cube is possible. In the upcoming
Group Theory I session, we will learn that the 15-puzzle “prohibits”
exactly half of the possible arrangements of its squares: these are the
so-called odd permutations, which are unattainable (unless you cheat, break
the puzzle apart, and put it back together switching two tiles). For the Ru-
bik’s Cube, a larger fraction of rearrangements are impossible, some due to
a similar parity argument. To prepare ourselves for it, we study even and
odd permutations from scratch in this section.

4.1. Parity of permutations. We will now show that all permutations

can be divided into two sets – those with even and odd parity. Just as in the
case of addition of whole numbers, multiplying two permutations with even
parity or two with odd parity will result in a permutation of even parity. If
one of the two has odd parity and the other even parity, the result will be odd.

To start with, notice the following:

(1 2) = (1 2)
(1 2)(1 3) = (1 2 3)
(1 2)(1 3)(1 4) = (1 2 3 4)
(1 2)(1 3)(1 4)(1 5) = (1 2 3 4 5)
(1 2)(1 3)(1 4)(1 5)(1 6) = (1 2 3 4 5 6),

and it is not hard to prove that the pattern continues. This shows that any
n-cycle can be expressed as a product of 2-cycles. If n is even, there are

an odd number of 2-cycles and vice-versa. Since every permutation can be
expressed as a set of disjoint cycles, this means that every permutation can
be expressed as a product of 2-cycles. For example:
(1 4 2)(3 5 6 7)(9 8) = (1 4)(1 2)(3 5)(3 6)(3 7)(9 8).
Obviously, there are an infinite number of ways to express any particular
permutation as a product of 2-cycles:
(1 2 3) = (1 2)(1 3) = (1 2)(1 3)(1 2)(1 2) = (1 2)(1 3)$(1\ 2)^4$ = · · · .
But it turns out that there is one big restriction in such representations:
Theorem 2. For any given permutation, the number of 2-cycles necessary
to represent it is either always even or always odd.
For this reason, we can say that
Definition 5. A permutation is either even or odd, depending on whether
its representation requires an even or an odd number of 2-cycles.
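
The pattern (1 2)(1 3) · · · (1 k) = (1 2 3 . . . k) gives a mechanical way to break any set of disjoint cycles into 2-cycles. The sketch below is our own illustration (the names `from_cycles`, `multiply`, and `as_two_cycles` are hypothetical); it uses the left-to-right convention of this session and rebuilds the example permutation from its 2-cycles as a check.

```python
def from_cycles(cycles, n):
    """Mapping {object: destination} on 1..n built from disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, x in enumerate(cycle):
            perm[x] = cycle[(pos + 1) % len(cycle)]
    return perm


def multiply(g, h):
    """Left-to-right product: apply g first, then h."""
    return {x: h[g[x]] for x in g}


def as_two_cycles(cycles):
    """Write each cycle (a1 a2 ... ak) as (a1 a2)(a1 a3)...(a1 ak)."""
    return [(c[0], x) for c in cycles for x in c[1:]]


cycles = [(1, 4, 2), (3, 5, 6, 7), (9, 8)]
swaps = as_two_cycles(cycles)
print(swaps)

# Multiply the 2-cycles back together, left to right.
product = from_cycles([], 9)
for swap in swaps:
    product = multiply(product, from_cycles([swap], 9))
print(product == from_cycles(cycles, 9))
print("even" if len(swaps) % 2 == 0 else "odd")
```

This produces one particular representation; as noted above, many others exist, which is exactly why Theorem 2 is needed.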
Theorem 2 is not too hard to prove, as long as one is willing to allow
some polynomial algebra to sneak into our discussion.5
Proof: Consider a permutation of the set {1, 2, . . . , n} that moves 1 to $x_1$,
2 to $x_2$, 3 to $x_3$, and so on. All the $x_i$’s are different, and they represent
exactly the numbers from 1 to n in some order. Now construct the product:

$$\prod_{1 \le j < i \le n} (x_i - x_j) = (x_2 - x_1)(x_3 - x_1) \cdots (x_n - x_1)(x_3 - x_2) \cdots (x_n - x_{n-1}), \tag{1}$$
where we simply multiply all differences between the $x_i$’s, always putting the
larger index first. If you have never seen the Π-product notation before, the
Greek symbol Π (pi) in front indicates a collection of things to be multiplied.
In the example above, it means to multiply together all possible terms of the
form $(x_i - x_j)$ where 1 ≤ j < i ≤ n. It is similar to the Σ-notation for
summation, if you have seen that before. If you find it easier to understand,
the product notation above has the following alternate representation where
both i and j step up one at a time:
$$\prod_{1 \le j < i \le n} (x_i - x_j) = \prod_{j=1}^{n-1} \Bigl( \prod_{i=j+1}^{n} (x_i - x_j) \Bigr).$$

Since all the $x_i$’s are different and every term $(x_i - x_j)$ in the product
is non-zero because i ≠ j, the total product itself is also non-zero. Since in
each term the value of $x_i$ may be greater than or less than $x_j$, the individual
terms, and hence the product, may be positive or negative.
Definition 6. If the product (1) is negative, we will call the permutation
odd, and if the product is positive, we will call it even.
Footnote 5. Theorem 2 is approached differently in the Group Theory I session.

4.2. Logical consistency. Earlier we defined the parity of a permutation

in a different way. Is it OK to make another definition of the same concept?

PST 11. Often within proofs and applications, it is advantageous to come
up with an alternative definition of a concept with which it might be easier
to work within the theory and/or in practice. To make everything logically
consistent, you need to show that the two definitions are equivalent.
Indeed, we will see that our new Definition 6 using the sign of the product
Π corresponds exactly to previous Definition 5 of odd and even permutations
based on the number of 2-cycles used to represent them. First, let’s check
that the new definition seems to make sense, at least in a few simple cases.
Exercise 19. Verify that the two definitions above yield the same parity for
the permutation (1 2). How about the permutation (1 3 2)?
Solution: The permutation (1 2) swaps 1 and 2, i.e., $x_1 = 2$ and $x_2 = 1$,
and the product has only a single term: $(x_2 - x_1) = (1 - 2) = -1 < 0$. Thus,
a permutation made of a single 2-cycle (and 1 is odd) corresponds to a negative
product, and hence (1 2) is odd, regardless of which definition we use.
Now consider (1 3 2). This should be an even permutation since (1 3 2) =
(1 3)(1 2), and thus the corresponding product should be positive. We have
$x_1 = 3$, $x_2 = 1$, and $x_3 = 2$, and the calculation below shows that indeed the
product is positive:
$$(x_2 - x_1)(x_3 - x_1)(x_3 - x_2) = (1 - 3)(2 - 3)(2 - 1) = +2 > 0.$$
Once again, (1 3 2) is even according to both definitions. 
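
If you would like more evidence before reading the proof, the two definitions can be compared on every permutation of a few objects. In this sketch (our own; `product_sign` and `two_cycle_count` are hypothetical names), a permutation is the tuple of destinations of 1, 2, . . . , n, the first function computes the sign of the product of all differences with the larger index first, and the second counts the 2-cycles obtained by writing each k-cycle as k − 1 two-cycles.

```python
from itertools import permutations


def product_sign(x):
    """Sign of the product of (x_i - x_j) over all pairs with j < i."""
    sign = 1
    for i in range(len(x)):
        for j in range(i):
            if x[i] - x[j] < 0:
                sign = -sign
    return sign


def two_cycle_count(x):
    """2-cycles used when each k-cycle of x is written as k - 1 two-cycles."""
    seen, count = set(), 0
    for start in range(1, len(x) + 1):
        length, y = 0, start
        while y not in seen:
            seen.add(y)
            y = x[y - 1]
            length += 1
        count += max(length - 1, 0)
    return count


print(product_sign((2, 1)), two_cycle_count((2, 1)))        # the 2-cycle (1 2)
print(product_sign((3, 1, 2)), two_cycle_count((3, 1, 2)))  # the 3-cycle (1 3 2)

for n in range(1, 6):
    for x in permutations(range(1, n + 1)):
        assert product_sign(x) == (-1) ** two_cycle_count(x)
print("the two definitions agree for all permutations of up to 5 objects")
```

A finite check like this proves nothing for general n, of course; that is the job of the argument that follows.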
You can check a couple more examples if you like, but you’ll discover
that it always seems to work. Why is that? Let’s continue with the proof.
The identity permutation should be even (it can be represented by zero
2-cycles, and 0 is even). Indeed, $x_i = i$ for all i, so if i > j, $x_i - x_j = i - j > 0$,
and all the terms in the product are positive, making the product positive.
Now, if we multiply any permutation by a 2-cycle, this should change
it from even to odd or vice-versa. Correspondingly, we’d like to see that
multiplying by a 2-cycle will flip the sign of the product. The following
technique will work for any 2-cycle, but let’s just look at multiplication of
some permutation ρ by the 2-cycle (1 2): ρ (1 2). This 2-cycle exchanges 1
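and 2, so in the product, every $x_1$ becomes an $x_2$ and vice-versa. Let’s write
the original product in the following form:

$$\begin{array}{rlllll}
\displaystyle\prod_{1 \le j < i \le n} (x_i - x_j) = & (x_2 - x_1) & (x_3 - x_1) & (x_4 - x_1) & \cdots & (x_n - x_1) \\
 & & (x_3 - x_2) & (x_4 - x_2) & \cdots & (x_n - x_2) \\
 & & & (x_4 - x_3) & \cdots & (x_n - x_3) \\
 & & & & \ddots & \vdots \\
 & & & & & (x_n - x_{n-1}).
\end{array}$$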

If we exchange $x_1$ and $x_2$, the sign of $(x_2 - x_1)$ will flip, but consider
the rest of the line. Each term in the remainder of the line will become
exactly the same as the term directly below it, and the term directly below
will become that term, so there will be no additional changes of sign in the
rest of the terms of the product. (For example, $(x_3 - x_1)$ and $(x_3 - x_2)$
switch places, but this leaves the product unchanged.) Hence, only one term
changes sign, so the product will flip from positive to negative or vice-versa,
and the permutation will flip its parity.

The reader should verify that this argument can be modified to work for
any 2-cycle (a b) in place of (1 2).
Returning to the proof of Theorem 2, we utilize the following well-known technique:

PST 12. If a property has been proven for a small step (e.g., multiplying
by a 2-cycle changes the parity of the permutation), break the full process
into a sequence of analogous steps and apply the property at each step.
Now, recall that any permutation ρ can be written as a product of 2-
cycles. Thus, we can build up, step by step, from the identity to ρ, multi-
plying by a 2-cycle and changing the parity at each step. This means that
the two definitions will yield the same parity for ρ. Since our alternative
Definition 6 using the product Π is independent of the particular way we
write ρ as a product of 2-cycles, it doesn’t matter which and how many
particular 2-cycles we have multiplied to get ρ: the number of such 2-cycles
will always be odd for ρ, or will always be even for ρ. ♦
Embedded in our proof was an old math “trick”:
PST 13. If a definition depends on making choices and can thereby,
hypothetically, yield different answers, find another way to define the same
concept that is independent of choices.
In our discussion above, the original Definition 5 of parity of a permuta-
tion depended on the specific decomposition of the permutation as a product
of 2-cycles, while the alternative Definition 6 using the product Π did not
depend on any choices. We showed that the two definitions are equivalent.

4.3. Looking globally at the odd/even worlds, we discover a balance:

Problem 5. Half of the permutations on n objects are even and half – odd.

In fact, an important subgroup of the symmetric group on n objects is the
subset of all the even permutations. This subgroup is called the alternating
group on n objects.6 Obviously, the subset of the odd permutations does
not form a subgroup since it is missing the identity; besides, it is not closed
under composition of two odd permutations (why?).
The alternating groups on 5 or more objects are the first examples of
so-called simple groups that you will encounter in any formal class on group
theory. We will not examine simple groups in this session.
See Group Theory I for a proof of Problem 5 and a discussion of the alternating group.

5. How Many Cube Positions Can Be Reached?

Ideal Toy Company stated on the package of the original Rubik’s
Cube that there were more than three billion possible states the cube
could attain. It’s analogous to McDonald’s proudly announcing
that they’ve sold more than 120 hamburgers.
J. A. Paulos, Innumeracy

Problem 1 claims that the total number of reachable positions from a
solved cube is the following huge number:
$8!\cdot 12!\cdot 2^{10}\cdot 3^{7} = 2^{27}\cdot 3^{14}\cdot 5^{3}\cdot 7^{2}\cdot 11 = 43{,}252{,}003{,}274{,}489{,}856{,}000$.
How was this calculated? That’s what we’ll
investigate in this section, but we’ll need to learn to use some mathematical
tools to do so. We can also investigate later, with these same tools, the
orders of some of the subgroups of the full cube group R.

5.1. Parity and the cubies. We know that every possible permutation
of the cube can be achieved by some combination of single clockwise turns
of the faces, and it is also easy to see that:

Exercise 20. Every face turn has even parity with respect to the movements
of the cubies.

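Proof: The cycle structure for a single clockwise quarter-turn, say, of the
front face consists of one 4-cycle of the four edge cubies of that face and
one 4-cycle of its four corner cubies, for instance
(FU FR FD FL)(LUF RUF RDF LDF),
which clearly has even parity since each of the 4-cycles can be written as a
product of three 2-cycles for six total 2-cycles, making the parity even. □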
This means that there is no combination of moves of the cube that will
exchange a single pair of cubies because that would correspond to an odd
permutation of the cubies.
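
The parity count in this proof is easy to mechanize. Here is a small Python sketch (the function name `parity_from_cycle_lengths` is hypothetical, introduced only for illustration) that uses the fact that a k-cycle is a product of k − 1 two-cycles.

```python
def parity_from_cycle_lengths(cycle_lengths):
    """Return 'even' or 'odd' for a permutation with the given disjoint cycle lengths.

    A k-cycle can be written as a product of k - 1 two-cycles.
    """
    transpositions = sum(k - 1 for k in cycle_lengths)
    return "even" if transpositions % 2 == 0 else "odd"

# A quarter-turn of a face: one 4-cycle of edge cubies and one 4-cycle of corner cubies.
print(parity_from_cycle_lengths([4, 4]))  # even
# Exchanging a single pair of cubies would be one 2-cycle.
print(parity_from_cycle_lengths([2]))     # odd
# Two simultaneous swaps (say, two edges and two corners) pass the parity test.
print(parity_from_cycle_lengths([2, 2]))  # even
# A 3-cycle of cubies also passes.
print(parity_from_cycle_lengths([3]))     # even
```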
As we shall see later, a cycle of three cubies of the same kind is possible,
or an exchange of two pairs, both edges, both corners, or one of each. If the
goal of solving Rubik’s Cube were simply to get the corner cubies and edge
cubies into their correct positions but not to worry about whether they were
oriented correctly, then if you were to break the cube apart and reassemble
it at random, on average half of your re-assemblies would result in a solvable
cube. The expected solution of Rubik’s Cube does require that you get the
orientations of the edge and corner cubies correct, and it turns out that there
are additional restrictions on these orientations, which we study below.

5.2. Parity and the edge cubies. Let’s consider first the edge cubies.
We will see that they, too, satisfy a parity condition:
Problem 6. An even number of the edge cubies must be flipped.

Proof: Imagine a cube in outer space held such that the center cubies stay
fixed as the other cubies turn around them. If you imagine a set of three-
dimensional coordinate axes whose origin is at the center of the cube and
such that each axis goes through the center of a pair of center cubies, then
for each axis, there are four edge cubies whose outer edges are parallel to that
axis: these four edges are determined by a slice of the cube perpendicular to
the chosen axis. Further, each axis has a positive and a negative direction.
Let us mark the outer edge of each cubie with an arrow that is aligned with
the positive direction of the axis parallel to it in the solved configuration.
At any stage, you can look at the arrows on each edge cubie’s outer
edge to see if they are aligned with their current axis. The figure on the
right illustrates a 90◦ rotation Fccw: the outer arrow configuration on the
left will be converted to the arrow configuration on the right. In this
case, exactly two of the arrow directions are flipped.
[Figure: arrow configurations on the four edges of the front face, marked +1 or −1, before and after Fccw.]
Now, next to any edge arrow write +1 if it is aligned with the positive
direction of an axis, and −1 otherwise, and multiply these four numbers for
the turning face. In the above example, the products before and after the
face turn are both +1. In general, they will always be equal. Indeed, look
at the arrows on two opposite edges: they keep the same orientation relative
to each other before and after the turn, because if they were pointing
in the same direction before, they are pointing in the same direction now;
if they were pointing in opposite directions before, they are pointing
in opposite directions now. This means that the product of numbers on
a pair of opposite edges does not change after the turn. Hence the whole
product of the four edge numbers on the turning face remains the same,
implying that an even number of arrows must have flipped their direction.
Thus every turn of a face will flip an even number of arrows, so at any
stage, an even number of the edge cubies will be flipped since in the original
configuration zero of them were flipped. 
Consequently, it is impossible with any number of twists to flip exactly
one edge cubie in place.

5.3. Rotations and the corner cubies. The corner cubies satisfy a
slightly different condition. For each corner cubie, mark its three facelets by
1, 2, and 3 so that when you look at that cubie from the outside (along a
line through that corner toward the center of the cube), you will see “123”
marked in a clockwise direction. Obviously, for each cubie, there are 3 such
possible labellings, rotated from each other by 120◦ or 240◦. As we will see,
our argument is independent of the particular labelling. We have picked and
fixed a random (clockwise) labelling for all corner cubies, as shown in the
first Rubik’s Cube in Figure 5a and in the unfolded version underneath it.
For clarity, the numbers on the front (red) face are in white. Note that if any
face of the cube is rotated, the corner cubies will still be oriented clockwise.7

Figure 5. Rotations of corner cubies under Fcw

After any move sequence on the Rubik’s Cube, we trace how the labellings
of the corner cubies have changed with respect to the initial fixed labellings
in the solved cube in Figure 5a. There are three possibilities for a corner
cubie: if its labelling “123” went to a place with original labelling “123”,
we say that the cubie was rotated by 0◦; if “123” went to “231”, the
cubie was rotated by 120◦; and if “123” went to “312”, the cubie was rotated
by 240◦ (always clockwise).
Problem 7. The total rotation of all eight corner cubies is zero, meaning
the sum of the rotation degrees for all the corner cubies is a multiple of 360◦ .
Proof: To see this, we can again look at what a single face turn does.
If every face turn preserves this condition, then so will any combination of
them. Obviously, the four corner cubies of the opposite face (that is not
turned) are literally untouched by the face-twist, and hence we need to show
only that the total rotation for the four cubies on the twisting face is zero.
To change the orientation to counterclockwise would imply that one facelet’s number
remains fixed, while a reflection switches the other two facelets’ numbers – a physically
impossible situation with a corner cubie.

In the example of the clockwise quarter-turn of the front face in Figure 5,

the corner cubie LUF (belonging to the left, up, and front faces) moves to
the corner cubie RUF, sending label 1 → 3, 2 → 1, and 3 → 2, i.e., “123”
→ “312,” indicating a 240◦ rotation. The corner cubie RUF also does a 240◦
rotation (“123” → “312”); RDF does a 0◦ rotation (“123” → “123”), and LDF
does another 240◦ rotation (“123” → “312”). In total, 3 × 240◦ = 720◦ ,
i.e., a zero total rotation. Of course, if we turn another face (or relabel the
solved cube differently), the individual rotations for the corner cubies will
be different. To see that the total rotation will always be zero, label
the corner cubies of the rotated face by $A_1$, $A_2$, $A_3$, and $A_4$, going
clockwise, and by $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$ the permutations of
{1, 2, 3} that send, respectively, the labels of $A_1$ to those of $A_2$, of
$A_2$ to those of $A_3$, etc.
[Figure: the four corner cubies $A_1$, $A_2$, $A_3$, $A_4$ of the rotated face, with $\alpha_i$ marking the move from $A_i$ to the next cubie.]
Then label 1 of cubie $A_1$ will be moved first to label $\alpha_1(1)$ of cubie
$A_2$, which will move to label $\alpha_2(\alpha_1(1))$ of cubie $A_3$, then to
label $\alpha_3(\alpha_2(\alpha_1(1)))$ of cubie $A_4$, and finally to label
$\alpha_4(\alpha_3(\alpha_2(\alpha_1(1))))$ back in cubie $A_1$.
However, throughout this process, label 1 made 4 quarter turns around
the same face and, therefore, came back to itself. A similar argument for
the other labels 2 and 3 shows that the composition of the four permutations
is the identity: $\alpha_1\alpha_2\alpha_3\alpha_4 = e$ (written in the left-to-right
notation of composition). In Figure 5, for instance, $\alpha_1 = \alpha_2 = \alpha_4$ is the
relabelling “123” → “312”, i.e., the 3-cycle (132), and $\alpha_3 = e$,
so that $\alpha_1\alpha_2\alpha_3\alpha_4 = (132)^3 e = e$. In general, any $\alpha_i$ could be the identity
(a 0◦ rotation), the 3-cycle (123) (a 120◦ rotation), or the 3-cycle (132) (a
240◦ rotation). Multiplying two $\alpha_i$’s corresponds to adding their rotational
angles (why?). Since the product of the $\alpha_i$’s is e, we conclude that the total
rotational angle of the 4 cubies, and hence of all 8 corner cubies, is 0◦. □
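
As a sanity check on the claim that multiplying rotations adds their angles, here is a short Python sketch. The encoding is our own assumption for illustration: each rotation of the labels {1, 2, 3} is a dictionary, products are taken left to right as in the text, and the names `ROT_120`, `ROT_240`, `compose`, and `angle` are hypothetical.

```python
# Each alpha_i is a rotation of the labels {1, 2, 3}, encoded as a dict label -> label.
IDENTITY = {1: 1, 2: 2, 3: 3}   # "123" -> "123", a 0-degree rotation
ROT_120 = {1: 2, 2: 3, 3: 1}    # "123" -> "231", the 3-cycle (123)
ROT_240 = {1: 3, 2: 1, 3: 2}    # "123" -> "312", the 3-cycle (132)
ANGLE = {(1, 2, 3): 0, (2, 3, 1): 120, (3, 1, 2): 240}

def compose(p, q):
    """Left-to-right product: apply p first, then q."""
    return {x: q[p[x]] for x in p}

def angle(p):
    """Rotation angle in degrees of one of the three label rotations."""
    return ANGLE[(p[1], p[2], p[3])]

# The front-face example from the text: three 240-degree rotations and one 0-degree rotation.
alphas = [ROT_240, ROT_240, IDENTITY, ROT_240]

total = IDENTITY
for a in alphas:
    total = compose(total, a)

print(total == IDENTITY)                    # True
print(sum(angle(a) for a in alphas) % 360)  # 0
# Multiplying two rotations adds their angles modulo 360.
print(angle(compose(ROT_120, ROT_240)))     # 0
print(angle(compose(ROT_240, ROT_240)))     # 120
```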
This means that if the cube were assembled randomly, only one third
of the assemblies could be manipulated to put the corner cubies in a correct
orientation: one third of the time you’d be off by a total of 120◦, and another
third of the time you’d be off by 240◦.

5.4. The final countdown. We are now in a position to count the total
number of configurations that can be reached from a solved cube. First,

Exercise 21. How many configurations can be constructed with no con-
straints, i.e., if you pop the cube apart with a screwdriver, in how many
ways can you put it together?
Solution: There are 8 possible locations for each corner cubie, and if all
arrangements were possible, there would be 8! rearrangements. Similarly,
there are 12! rearrangements of the edge cubies. Each corner cubie could be
in any of 3 rotations, so there are $3^8$ ways of aligning the corner cubies, and
similarly there are $2^{12}$ flipping configurations of the edge cubies. The grand
total of configurations is thus: $8!\cdot 12!\cdot 3^{8}\cdot 2^{12}$. □

But we know better than to think all of these rearrangements are possible:
in this section we discovered constraints on the cubies’ moves!

Exercise 22. Show that the achievable configurations of the Rubik’s Cube
are at most the number given in Problem 1.

Proof: Of the $8!\cdot 12!\cdot 3^{8}\cdot 2^{12}$ configurations, only 1/3 will have the rotations
of the corner cubies right (by Problem 7), only 1/2 of those will have the
edge-flipping parity right (by Problem 6), and only 1/2 of those will have the
correct even parity of the total cubie rearrangement (by Exercise 20). Thus
the total number of reachable configurations from a solved cube is at most:
$(8!\cdot 12!\cdot 2^{12}\cdot 3^{8})/(3\cdot 2\cdot 2) = 8!\cdot 12!\cdot 2^{10}\cdot 3^{7} = 43{,}252{,}003{,}274{,}489{,}856{,}000$. □
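
The arithmetic behind this bound can be reproduced directly. The snippet below is a minimal Python check (the variable names are our own): it computes the unconstrained count from Exercise 21, divides by the three constraint factors, and compares the result with both closed forms quoted in the text.

```python
from math import factorial

# Unconstrained reassemblies: positions and orientations chosen freely.
unconstrained = factorial(8) * factorial(12) * 3**8 * 2**12

# Corner rotation (1/3), edge flip (1/2), and cubie permutation parity (1/2).
reachable_bound = unconstrained // (3 * 2 * 2)

print(unconstrained)    # 519024039293878272000
print(reachable_bound)  # 43252003274489856000
print(reachable_bound == factorial(8) * factorial(12) * 2**10 * 3**7)  # True
print(reachable_bound == 2**27 * 3**14 * 5**3 * 7**2 * 11)             # True
```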
Are all of these rearrangements actually achievable? Obviously, we won’t
attempt to show separately that each and every one of these rearrangements
is possible. We should group them in a clever practical way in order to
minimize the work we have to do.

5.5. Getting the cubies in their correct positions. If we were able to

switch any two cubies at will, solving the cube would be a piece of cake . . . .
Unfortunately, we can’t solve the Rubik’s Cube by just using 2-cycles: after
all, a 2-cycle is an odd permutation while Exercise 20 showed that all moves
of the Rubik’s Cube correspond to even permutations of the cubies. The
closest to a 2-cycle that is an even permutation is a 3-cycle.
Exercise 23. Show that any even permutation on {1, 2, . . . , n} can be
written as a product of 3-cycles.
Proof: Since any even permutation is a product of an even number of
2-cycles, it suffices to show that a product of any two 2-cycles can be written
in the desired form. These two 2-cycles could:
(a) be the same, i.e., $(ab)(ab) = 1 = (abc)^0$.
(b) share an element, i.e., $(ab)(bc) = (acb)$.
(c) be disjoint, i.e., $(ab)(cd) = (ab)(bc)(bc)(cd) = (acb)(bdc)$. □
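
The three identities in this proof can be verified mechanically. The following Python sketch is an illustration under our own conventions: permutations act on {0, 1, 2, 3} and are stored as tuples, the helpers `cycle` and `compose` are hypothetical names, and products are read left to right as elsewhere in this session.

```python
def cycle(*elems, n=4):
    """Permutation of {0, ..., n-1} given by one cycle, as a tuple p with i -> p[i]."""
    p = list(range(n))
    for k, x in enumerate(elems):
        p[x] = elems[(k + 1) % len(elems)]
    return tuple(p)

def compose(*perms):
    """Left-to-right product: the first permutation listed is applied first."""
    result = tuple(range(len(perms[0])))
    for p in perms:
        result = tuple(p[x] for x in result)
    return result

a, b, c, d = 0, 1, 2, 3
identity = tuple(range(4))

# (a) equal 2-cycles cancel
case_a = compose(cycle(a, b), cycle(a, b)) == identity
# (b) 2-cycles sharing an element: (ab)(bc) = (acb)
case_b = compose(cycle(a, b), cycle(b, c)) == cycle(a, c, b)
# (c) disjoint 2-cycles: (ab)(cd) = (acb)(bdc)
case_c = compose(cycle(a, b), cycle(c, d)) == compose(cycle(a, c, b), cycle(b, d, c))

print(case_a, case_b, case_c)  # True True True
```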
Thus, if we show how to cycle any three edge cubies, we will be able to
perform any even permutation on the edge cubies. However, what would
happen if, by applying 3-cycles to the edge cubies, we end up with all of
them correctly positioned except for two edge cubies switched? Yes, such a
configuration is possible: it simply means that some two (or more) corner
cubies are also out of place, so that the total cubie permutation is still even.
OK, we perform 3-cycles on the corner cubies to get them in place; but, for
exactly the same reason, we might still not be able to place correctly the
last two corner cubies. And so, we need a “bridge” between edge and corner
cubies, i.e., a move that simultaneously swaps a pair of cubies of each type.
We can now start thinking of concrete moves on the Rubik’s Cube that
will put the edge cubies and the corner cubies in their correct places.

Problem 8. We can perform an operation on the Rubik’s Cube that (without
worrying about orientation):
(a) cycles any three edge cubies; in fact, cycling three edge cubies on a
single face suffices to construct any 3-cycle among edge cubies.
(b) cycles any three corner cubies; in fact, cycling three corner cubies on a
single face suffices to construct any 3-cycle among corner cubies.
(c) swaps two edge cubies and two corner cubies; only one such operation
is required here.

Before we comment on the solution to this problem, let’s see what other
moves are necessary to solve the Rubik’s Cube. Suppose now we have man-
aged to place all cubies in their right positions in the cube, except for possibly
their orientations.
Problem 9. We can perform an operation on the Rubik’s Cube that:
(a) simultaneously changes the orientation of any two edge cubies; in fact,
doing this for two adjacent edge cubies will suffice. Adjacent edge cubies
have only one corner between them, e.g., UF and UR.
(b) rotates one corner cubie one way by 1/3 and another corner cubie the
other way by 1/3; in fact, doing this for two adjacent corner cubies will
suffice. Adjacent corner cubies have only one edge cubie between them,
e.g., ULF and URF.

Employing adjacent cubies above is prompted by a well-known technique:

PST 14. If we want to perform some operation on two objects, we might
succeed in doing so by performing the operation only on pairs of adjacent
objects, one pair after the other.

In the case of Rubik’s Cube, the change of orientations on the edge
cubies or on the corner cubies (as in Problem 9) is achievable by successively
changing orientation on several pairs of adjacent edge cubies, or several pairs
of adjacent corner cubies. To see this, note that:
Exercise 24. One can get from any edge cubie to any other edge cubie by
following a path through adjacent edge cubies. Ditto for corner cubies. For
all such paths, one needs at most 4 cubies.
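
Exercise 24 can also be confirmed by a breadth-first search. The sketch below rests on a modelling assumption that is ours, not the text's: cubies are placed at the points of {−1, 0, 1}³, corners being the points with no zero coordinate and edges those with exactly one. Two corners are taken as adjacent when they differ in one coordinate, and two edges when they touch a common corner.

```python
from collections import deque
from itertools import product

# Hypothetical coordinate model of the cubies (see the lead-in above).
POINTS = list(product((-1, 0, 1), repeat=3))
CORNERS = [p for p in POINTS if p.count(0) == 0]
EDGES = [p for p in POINTS if p.count(0) == 1]

def differ_in_one(p, q):
    """True if p and q differ in exactly one coordinate."""
    return sum(x != y for x, y in zip(p, q)) == 1

def corners_adjacent(p, q):
    """Adjacent corners have exactly one edge cubie between them."""
    return differ_in_one(p, q)

def edges_adjacent(p, q):
    """Adjacent edges have exactly one corner cubie between them."""
    return p != q and any(differ_in_one(p, c) and differ_in_one(q, c) for c in CORNERS)

def longest_shortest_path(cubies, adjacent):
    """Largest number of cubies on a shortest path between any two cubies (BFS)."""
    longest = 0
    for start in cubies:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in cubies:
                if v not in dist and adjacent(u, v):
                    dist[v] = dist[u] + 1
                    queue.append(v)
        assert len(dist) == len(cubies)  # every cubie is reachable
        longest = max(longest, max(dist.values()) + 1)  # cubies on path = steps + 1
    return longest

print(longest_shortest_path(EDGES, edges_adjacent))      # 4
print(longest_shortest_path(CORNERS, corners_adjacent))  # 4
```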
Unless you already know how to do it, it will be very hard to produce
the five algorithms in Problems 8-9 – they are at the heart of the solution to
the Rubik’s Cube! If you cannot succeed in finding such algorithms on your
own, it is OK to peek at the answers in the Hints section, and then either
repeat them on your physical Rubik’s Cube, or even better, try them on

the Rubik’s simulator by defining macros and substantially speeding up the
process. In fact, the reader who has been exploring the Rubik’s simulator
will know where to find these specific macros already defined!

5.6. Solving the cube, at least theoretically.

Problem 10. Using the five operations above, we can solve the Rubik’s Cube.

Proof: If the corner cubies need an odd permutation to get to their proper
places, then the edge cubies will also need an odd permutation (why?). In
such a case, the algorithm in Problem 8(c) will swap two corners and two
edges, which will make the corner and edge permutations both even. Now,
using the results from Problem 8(a)-(b) we can put the corner cubies and
edge cubies in their proper places.
From now on we will not permute the cubies – we will only change
their orientation in place to make them fit the solved Rubik’s Cube. Start
from any two edge cubies whose orientations are not correct (i.e., their two
facelets do not match the colors of the center cubies adjacent to them), and
use the algorithm in Problem 9(a) to flip simultaneously the orientations of
these edge cubies to the correct ones. Keep repeating the process for any
remaining pairs of incorrectly oriented edge cubies. Suppose that in the
end, there is only one incorrectly oriented edge cubie left (and so we cannot
apply the algorithm to it as that would disturb another, correctly oriented
edge cubie). If that situation were possible, then it would also be possible
from a solved cube to do a sequence of moves that results in changing the
orientation of only one edge cubie, a contradiction with Problem 6! Hence,
at the end of the process, all edge cubies will be correctly oriented.
As for the corner cubies, start from any two corner cubies with incorrect
orientations. Apply the algorithm in Problem 9(b) to rotate them 1/3 one
way or the other, making sure that you are rotating at least one of them into
its correct orientation. Keep repeating the process for any remaining pairs
of incorrectly oriented corner cubies. Again, if in the end there is only one
incorrectly oriented corner cubie left, this would mean that from a solved
cube there is a sequence of moves resulting in a total rotation of 120◦ or
240◦ (given by the incorrectly oriented corner cubie), a contradiction with
Problem 7. Hence, in the end, all corner cubies must have been oriented
correctly . . .
. . . and the Rubik’s Cube is solved! 

6. Conclusions

In reality, of course, no one applies the above method, unless they have
an almost infinite time and patience on their hands. Think about how many
moves it would take to just flip two edge cubies’ orientations (Problem 9(a)
requires 15 moves), then multiply this by the number of pairs of incorrectly
oriented edge cubies (up to 6 pairs), and you will still be a long way from
solving the cube!

As the reader has undoubtedly heard, there are competitions in
speed-solving the Rubik’s Cube,8 with a world record of about 5.5 seconds! So,
there must be substantially more efficient (and practical) ways to solve the
Rubik’s Cube, some of which will be discussed in Part III of this session.
For now, we will content ourselves with the fact that we have, at least
theoretically, solved the Rubik’s Cube, and along the way incidentally gotten
two very cool facts: the number of possible jumbled states of the Rubik’s Cube is
precisely $(8!\cdot 12!\cdot 2^{12}\cdot 3^{8})/(3\cdot 2\cdot 2)$, as Problem 1 promised, and

Corollary 1. A Rubik’s Cube reassembled at random after breaking it apart
would only have one chance in twelve of being solvable.

7. Hints and Solutions to Selected Problems

Exercise 1. The complete verification of the properties of addition and
multiplication for complex numbers $\mathbb{C}$ and its subsets $\mathbb{R}$, $\mathbb{Q}$, and $\mathbb{Z}$ is beyond
the scope of this session. See Cohen and Ehrlich [17] or Youse [87] for details.
As an example of how to proceed, consider the set $\mathbb{Q} = \{\frac{a}{b} : a \in \mathbb{Z},\ b \in \mathbb{N}\}$.
We add fractions by $\frac{a}{b} + \frac{c}{d} = \frac{ad+bc}{bd}$. Now by closure of addition and
multiplication in $\mathbb{Z}$ and $\mathbb{N}$,
$ad + bc \in \mathbb{Z}$ and $bd \in \mathbb{N}$, so $\mathbb{Q}$ is closed under addition. Since $\frac{a}{b} = \frac{a}{b} + 0i$,
the associativity follows from the associativity of the complex numbers. The
identity element is $\frac{0}{1}$ and every fraction $\frac{a}{b}$ has $-\frac{a}{b}$ for an inverse. ♦
Exercise 3. To show existence of inverses, note that if $ab \equiv ac \pmod p$ for
a prime p and integers a, b, c, none divisible by p, then we can cancel a
and conclude that $b \equiv c \pmod p$ (why?). This means that for a fixed $a \neq 0$
in our proposed group, the products $a\cdot 1, a\cdot 2, \dots, a\cdot(p-1)$ are all distinct
(why?), and hence one of them must be 1, yielding the inverse of a. ♦
Exercise 5. The addition modulo 5 group is also generated by 2 and by 4;
the addition modulo 6 group is only generated by 1 and by 5. Not surprisingly,
the generators under addition modulo n are precisely the members of the
group that are relatively prime to n (why?). The multiplication modulo 7 group
is only generated by 3 and by 5. In fact, all modulo p groups under multiplication
are cyclic, but finding generators for them is a more involved process. ♦
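
For small moduli the generators can simply be found by exhaustive search. The Python sketch below is an illustration only; the function names `additive_generators` and `multiplicative_generators` are our own, and the brute-force approach is not meant as an efficient method for large primes.

```python
from math import gcd

def additive_generators(n):
    """Elements g of {0, ..., n-1} whose multiples cover every residue mod n."""
    everything = set(range(n))
    return [g for g in range(n) if {(k * g) % n for k in range(n)} == everything]

def multiplicative_generators(p):
    """Elements g of {1, ..., p-1} whose powers cover every nonzero residue mod p."""
    units = set(range(1, p))
    return [g for g in range(1, p) if {pow(g, k, p) for k in range(p - 1)} == units]

print(additive_generators(5))        # [1, 2, 3, 4]
print(additive_generators(6))        # [1, 5]
print(multiplicative_generators(7))  # [3, 5]

# Additive generators are exactly the residues relatively prime to n.
coprime_rule_holds = all(
    additive_generators(n) == [g for g in range(n) if gcd(g, n) == 1]
    for n in range(2, 30)
)
print(coprime_rule_holds)            # True
```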
Exercise 8. For the group table of the symmetries of an equilateral triangle
see Table 2. One way to present the symmetries of a square is to use a
counterclockwise rotation, ρ, and a reflection, φ, over the diagonal from upper left
to lower right. The eight symmetries are then $\{e, \rho, \rho^2, \rho^3, \varphi, \varphi\rho, \varphi\rho^2, \varphi\rho^3\}$.
To help fill in the table note that $\varphi^2 = \rho^4 = e$. By inspecting the end results
one sees that $\varphi\rho\varphi = \rho^3$. Multiplying by φ on the left we have $\rho\varphi = \varphi\rho^3$. By
induction we then see $\rho^n\varphi = \varphi\rho^{3n}$. The group table of the symmetries of
a square can thus be fairly easily completed – without having to inspect a
physical model – as follows:

∗ e ρ ρ2 ρ3 φ φρ φρ2 φρ3
e e ρ ρ2 ρ3 φ φρ φρ2 φρ3
2 3 3
ρ ρ ρ ρ e φρ φ φρ φρ2
ρ2 ρ2 ρ3 e ρ φρ2 φρ3 φ φρ
3 3 2 2 3
ρ ρ e ρ ρ φρ φρ φρ φ
2 3 2
φ φ φρ φρ φρ e ρ ρ ρ3
φρ φρ φρ2 φρ3 φ ρ3 e ρ ρ2
φρ2 φρ2 φρ3 φ φρ ρ2 ρ3 e ρ
3 3 2 2 3
φρ φρ φ φρ φρ ρ ρ ρ e

Table 3. Multiplication of symmetries of a square

Notice the 4 × 4 blocks in the upper left and lower right corners. ♦
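
The table can be regenerated from the two relations alone. In the Python sketch below, the encoding of $\varphi^s\rho^k$ as the pair `(s, k)` and the helper names `multiply` and `name` are our own choices for illustration; the multiplication rule is just $\varphi^2 = \rho^4 = e$ together with $\rho^n\varphi = \varphi\rho^{3n}$.

```python
# Encode phi^s rho^k as the pair (s, k), with s in {0, 1} and k in {0, 1, 2, 3}.
ELEMENTS = [(0, k) for k in range(4)] + [(1, k) for k in range(4)]

def multiply(x, y):
    """Product x * y, using phi^2 = rho^4 = e and rho^n phi = phi rho^(3n)."""
    (s, i), (t, j) = x, y
    if t == 0:
        # phi^s rho^i * rho^j = phi^s rho^(i+j)
        return (s, (i + j) % 4)
    # rho^i phi = phi rho^(3i), so phi^s rho^i * phi rho^j = phi^(s+1) rho^(3i+j)
    return ((s + 1) % 2, (3 * i + j) % 4)

def name(x):
    """Readable name such as 'e', 'ρ^2', or 'φρ^3'."""
    s, k = x
    rho = ["", "ρ", "ρ^2", "ρ^3"][k]
    return (("φ" if s else "") + rho) or "e"

# Print the multiplication table, one row per left factor.
print("*".ljust(5), " ".join(name(y).ljust(5) for y in ELEMENTS))
for x in ELEMENTS:
    print(name(x).ljust(5), " ".join(name(multiply(x, y)).ljust(5) for y in ELEMENTS))
```

Each row and each column of the printed table contains every symmetry exactly once, as must happen in any group table.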
Exercise 10. To begin the product of (1 2 3)(4 5)(6 7 8 9) (2 5 6)(4 1)(3 7),
1 goes to 2 and 2 goes to 5, so we write (1 5. Now 5 goes to 4 and 4 goes
to 1, so the cycle is complete and we close it: (1 5). The next number that
hasn’t been used is 2, so we have 2 goes to 3 and 3 goes to 7, so we write
(2 7. Now 7 goes to 8, so we write (2 7 8. Then 8 goes to 9, so we write
(2 7 8 9. Next 9 goes to 6 and 6 goes to 2, and the cycle closes giving so far:
(1 5)(2 7 8 9). The first number that hasn’t been used is 3, and the cycle
(3 4 6) can be found in the same manner. ♦
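
The step-by-step tracing in this hint is exactly what a short program does. The following Python sketch (with hypothetical helpers `from_cycles` and `to_cycles`) multiplies the two permutations left to right and prints the result in disjoint cycle notation.

```python
def from_cycles(cycles, n):
    """Permutation of {1, ..., n} as a dict, built from disjoint cycles."""
    perm = {x: x for x in range(1, n + 1)}
    for cyc in cycles:
        for k, x in enumerate(cyc):
            perm[x] = cyc[(k + 1) % len(cyc)]
    return perm

def to_cycles(perm):
    """Disjoint cycle notation, omitting fixed points."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        if len(cyc) > 1:
            cycles.append(tuple(cyc))
    return cycles

first = from_cycles([(1, 2, 3), (4, 5), (6, 7, 8, 9)], 9)
second = from_cycles([(2, 5, 6), (4, 1), (3, 7)], 9)

# Left to right: apply `first`, then `second`.
result = {x: second[first[x]] for x in first}
print(to_cycles(result))  # [(1, 5), (2, 7, 8, 9), (3, 4, 6)]
```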
Exercise 11. For example, consider (a b c)(d e) and (c b a)(e d). Then
(a b c)(d e)·(c b a)(e d) = ((a b c)(c b a)) ((d e)(e d)) = (a)(b)(c)(d)(e) = 1. ♦
Exercise 12. Recall from Exercise 5 that the multiplication modulo 7 group
is generated by 3 and by 5. To reveal the addition structure in the group
look at the powers of the generators: $\{3^0, 3^1, 3^2, 3^3, 3^4, 3^5\} = \{1, 3, 2, 6, 4, 5\}$
and $\{5^0, 5^1, 5^2, 5^3, 5^4, 5^5\} = \{1, 5, 4, 6, 2, 3\}$. So, there are two isomorphisms
from the addition modulo 6 group to the multiplication modulo 7 group:
$k \mapsto 3^k$ and $m \mapsto 5^m$. ♦
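
Both maps can be tested exhaustively, since the groups have only six elements. The Python sketch below uses a hypothetical helper `is_isomorphism` that checks the two defining properties: the map $k \mapsto g^k$ is a bijection, and it turns addition modulo 6 into multiplication modulo 7.

```python
def is_isomorphism(g, n=6, p=7):
    """Check that k -> g^k (mod p) is a bijection carrying addition mod n to multiplication mod p."""
    image = [pow(g, k, p) for k in range(n)]
    bijective = sorted(image) == list(range(1, p))
    respects_operation = all(
        pow(g, (k + m) % n, p) == (pow(g, k, p) * pow(g, m, p)) % p
        for k in range(n)
        for m in range(n)
    )
    return bijective and respects_operation

print([pow(3, k, 7) for k in range(6)])      # [1, 3, 2, 6, 4, 5]
print([pow(5, k, 7) for k in range(6)])      # [1, 5, 4, 6, 2, 3]
print(is_isomorphism(3), is_isomorphism(5))  # True True
print(is_isomorphism(2))                     # False
```

The base 2 fails because its powers only reach 1, 2, and 4, so the map is not a bijection.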
Exercise 13. Since every element in the ellipse group is its own inverse
there cannot be an isomorphism to the addition modulo 4 group where only
0 and 2 are their own inverses. ♦
Exercise 14. Let h, k ∈ H ∩ K. Since H ∩ K ⊆ H ⇒ h, k ∈ H and H ∩ K ⊆
K ⇒ h, k ∈ K, the product hk is in H and in K, so it is in H ∩K. This proves
closure. Since H and K are subgroups of G, the associativity is automatically
true, and because of uniqueness of the identity in G (cf. Theorem 1a) it also
follows that e ∈ H and e ∈ K (why?), so that e ∈ H ∩ K. Since any element
h ∈ H ∩ K is in H and in K, it follows that $h^{-1}$ is in both H and K, implying
$h^{-1}$ ∈ H ∩ K. This proves that H ∩ K is a subgroup of G. □
Exercise 16. If we rotate the front face, the “single face” subgroup is the
cyclic group $\{1, F, F^2, F^3\}$, generated by F (and by $F^{-1} = F^3$). □

Exercise 17. If we rotate the front and back faces, the “two opposite faces”
subgroup will have two cyclic “single face” subgroups $\langle F\rangle = \{1, F, F^2, F^3\}$
and $\langle B\rangle = \{1, B, B^2, B^3\}$, the elements of which will commute. As a result,
the total group will consist of 16 elements of the form $F^k B^m$ where $k, m =
0, 1, 2, 3$. The group is commonly written as $\langle F\rangle \times \langle B\rangle$, the direct product of
the two “single face” subgroups. □
Problem 8. (a) The macro M1 = UffurdlffLDR will perform the 3-cycle
of edge cubies on the front face: FR→FL→FU (cf. Fig. 6a). To see why
this is sufficient, pick any three edge cubies. It is straightforward to find a
sequence S1 of moves that lands all three cubies on the same face. Then we
can apply our macro M1 to cycle the three edge cubies, and finally we can
apply the inverse of S1 to return them to their original positions, but now
shifted in a cycle. The resulting total sequence of moves $S_1 M_1 S_1^{-1}$ is called
the conjugation of $M_1$ by $S_1$: a common operation in abstract algebra. ♦
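
The effect of conjugation is easy to see on a toy model. The Python sketch below does not simulate a real cube: it assumes 12 abstract positions (standing in for the edge cubies), a permutation `M` that cycles positions 0, 1, 2 in the role of the macro, and an arbitrary setup permutation `S`. All names are hypothetical.

```python
def compose(*perms):
    """Left-to-right product of permutations given as tuples (i -> p[i])."""
    result = tuple(range(len(perms[0])))
    for p in perms:
        result = tuple(p[x] for x in result)
    return result

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# M cycles positions 0 -> 1 -> 2 -> 0 and fixes everything else.
M = (1, 2, 0, 3, 4, 5, 6, 7, 8, 9, 10, 11)
# S is a setup move that carries positions 5, 9, 7 to positions 0, 1, 2.
S = (5, 9, 7, 3, 4, 0, 6, 2, 8, 1, 10, 11)

# Setup, macro, then undo the setup.
conjugate = compose(S, M, inverse(S))
moved = {i: conjugate[i] for i in range(12) if conjugate[i] != i}
print(moved)  # {5: 9, 7: 5, 9: 7}
```

Only the three chosen positions move, and they move in a 3-cycle; everything that the setup disturbed is restored by its inverse.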

Figure 6. Five algorithms for positioning and orienting cubies

(b) The macro M2 = fUBuFUbu will perform the 3-cycle of corner cubies
on the top face: ULF→ULB→URF (cf. Fig. 6b). The same idea of conjugating
M2 by a sequence S2 that moves any three corner cubies onto the same face
will work here to cycle these cubies: $S_2 M_2 S_2^{-1}$. ♦
(c) The macro M3 = rURurUFRbRBRfRR will perform the simultaneous
swap of edge cubies UL↔UF and of corner cubies ULF↔URF (cf. Fig. 6c). □
Exercise 24. The longest paths connect diametrically opposite cubies; e.g.,
for edge cubies: FR,UF,UL,BL, and for corner cubies: URF,ULF,DLF,DLB. ♦
Problem 9. (a) The macro M4 = FRBLUlUbrfluLu will flip in place the ad-
jacent edge cubies UF and UL, thus changing their orientations (cf. Fig. 6d).
If A and B are now two arbitrary edge cubies, take a path of adjacent
edge cubies from A to B, e.g., A, A1 , A2 , B, and apply the above macro to flip
the orientations on {A, A1 }, then on {A1 , A2 }, and finally on {A2 , B}. Along
the way, the orientations of the middle cubies A1 and A2 were flipped twice
and hence did not change, while the orientations of A and B did change. 
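
The bookkeeping in this argument amounts to toggling flags. Here is a minimal Python sketch with a hypothetical function `flip_along_path`: each application of the "flip both" operation toggles the orientation flag of two neighbouring cubies, and after walking the whole path only the endpoints remain flipped.

```python
def flip_along_path(path, flipped=None):
    """Apply a 'flip both' operation to each consecutive pair along the path.

    `flipped` maps a cubie name to True if its orientation is currently changed.
    Returns the set of cubies whose orientation ends up changed.
    """
    flipped = dict(flipped or {})
    for x, y in zip(path, path[1:]):
        flipped[x] = not flipped.get(x, False)
        flipped[y] = not flipped.get(y, False)
    return {cubie for cubie, state in flipped.items() if state}

# The longest edge path from Exercise 24: only the endpoints end up flipped.
print(sorted(flip_along_path(["FR", "UF", "UL", "BL"])))  # ['BL', 'FR']
```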
(b) The macro M5 = LdlfdFUfDFLDlu will rotate in place the corner
cubies UFL 1/3 counterclockwise and UFR 1/3 clockwise (cf. Fig. 6e). Anal-
ogously as above, take any path of adjacent corner cubies and apply macro
M5 to every pair along the path: this will turn in place all middle cubies 1/3
counterclockwise and then 1/3 clockwise, i.e., will fix them, while the first
and last corner cubies on the path will be rotated as desired. 
Session 3

Knotty Mathematics

Maia Averett

Sneak Preview. Have you encountered knot-eating machines? After pulling

and twisting in vain, have you resorted to flyping or writhing your shoelaces? Do
you apply a quandle or your sword in sorting out tangles? Have you ever defeated
theorems with toddler crayons? Have you seen the Reidemeister dance of links? If
you are unsure, plunge into this article to learn the strict mathematical meanings
of the funny words above. No special background is needed until Subsection 3.5,
where basic knowledge of systems of linear equations and arithmetic modulo 3
will come into play. The Jones polynomials in Section 4 will crown the discussion
of knot invariants, requiring mastery of high school algebra and some experience
with induction. In short, get naughty, knottier, or unknotted: your choice! 

1. A Knot, or Not a Knot. That Is the Question.

1.1. History and cheating. A long time ago in the region known today as
Turkey, the historic kingdom of Phrygia had no king. Its people sought the
advice of an oracle, who decreed that the next man to enter their city driving
an ox-cart should be their king. Soon thereafter, a poor peasant Gordius and
his wife wandered into the city with an ox-cart and the Phrygians declared
Gordius their king. In his gratitude to the gods, Gordius dedicated his cart
to Zeus and tied it to a pole in the acropolis with a complex and intricate
knot that became known as the Gordian Knot.
Over time, the lore surrounding the knot grew and
grew into the legend of Gordius, which said that the
person who could unravel the knot would rule all of
Asia. The Gordian Knot resisted all attempts to untie
it until 333 BCE, when Alexander the Great visited the
city. After searching unsuccessfully for the ends of the
rope, he boldly cut through the knot with a stroke of
his sword1 . Alexander the Great went on to conquer
all of Asia, fulfilling the prophecy.
Depicted on the right in The Story of the Greeks, by Helena A. Guerber [34].


But wait, did he cheat? Should he be allowed to cut the knot? Perhaps
the puzzle was truly impossible and the knot could not be untied without
cutting the rope to expose the ends. After all, we can always untie a rope
that’s knotted as long as the ends are still free. It might be quite difficult,
but with enough wriggling and pulling, it’s always possible! Perhaps the
Gordian knot had its ends spliced together and Alexander the Great had to
cut it in order to untie it! Or maybe it had its ends spliced together, but it
was still possible to untie it without cutting it. How can we know?
The mathematical branch of knot theory can help us answer this question.
It is a wonderful part of mathematics, full of pictures and silly words like
flype, writhe, and quandle, which represent actual knot theory concepts but
couldn’t be fitted in this short chapter. Do not fret: there will still be plenty
of pictures to justify the choice of such picturesque words.

1.2. Knotty definitions. Loosely speaking, a knot is a piece of string that
has been tangled up and then had the ends fused together, just like the
Gordian Knot. You can easily make lots of interesting knots by taking an
extension cord, tangling it up, and then plugging one end into the other end.
In order to figure out if Alexander the Great cheated or not, we need to
find a way of looking at a knot and seeing if it is truly knotted or if there is
some way of wiggling and loosening it so that it becomes untangled and just
looks like a circle of rope. Indeed, once we start thinking in this direction,
we soon realize that there are many questions that we can ask about knots:
1. When should we consider two knots to be the same?
2. How can we tell different knots apart?
3. What kinds of different knots are there? Can we make a list?
4. If a friend hands us a knot, how can we find out which one it is?
5. How can we decide if our friend’s knot is really not knotted?
Mathematicians answer the first question by saying that two knots $K_1$ and
$K_2$ are equivalent if $K_1$ can be wriggled around until it looks exactly like $K_2$.
We only allow wriggling, twisting, swinging, sliding, smooshing, and the like.
Absolutely no cutting, tearing, slicing, or anything that Alexander the Great
would do! To make this notion (and the notion of a knot) mathematically
precise is a bit technical, but we can still attempt an array of challenging
problems without formal definitions of knots and knot equivalence.
 To prove that two knots are equivalent, one only needs to exhibit a series
of wriggles that transforms one into the other. But to prove that two knots
are not equivalent is an entirely different beast! It is not enough to simply
fail to come up with the correct sequence of wriggles after trying for a while.
If you lock a thousand monkeys in a room with the knots and have them try
for a hundred years and they don’t succeed, that’s still not a proof! Maybe
they just didn’t try the correct wriggle . . . . Instead of delegating the matter
to monkeys, in Section 2.2 we’ll develop invariants to help us rigorously
prove that certain knots are not equivalent.

1.3. The art and science of drawing knots. The first thing we need to
do in order to make sense of knots is to figure out a good way of representing
them. Playing with ropes is fun, but it’s not very useful for attacking a
problem systematically. Instead, we think of knots as represented by knot
diagrams, which are just drawings of the knot on paper so that we can easily
see what it looks like. Here is an example:

Figure 1. A knot and its diagram

Of course, there are many, many (infinitely many, even!) different diagrams
that represent the same knot. For instance,
Exercise 1. (Warm-up) Convince yourself that the two diagrams in
Figure 2 represent the same knot, called the right-handed trefoil.²

Figure 2. Two diagrams of the right-handed trefoil

In drawing a knot diagram, the most important thing is that you should
be able to reconstruct your knot from the information you draw; so your
diagram has to be good enough to do this. In particular, it should be clear
which string goes over at each crossing, and there shouldn’t be three strings
meeting at a crossing.

1.4. Getting knottier. Now that we’re at it, why should we limit ourselves
to having just one loop of string? We may as well allow ourselves to play with
objects that are made up of more than one circle of string; these are called
links, and the different pieces of string are called components. Of course, a
knot is a special kind of link: one that has only one component.
Figure 3 shows a few examples with their components drawn in different
colors. Again, instead of thinking of links as living in three-dimensional
space, we draw link diagrams flat on paper, leaving little gaps to indicate
the crossings.
² Why “right-handed”? Read the beginning of the Hints section.

Figure 3. Examples of links

1.5. Some famous knots and links. The simplest knot is the unknot, as
shown in Figure 4a. You can see where it gets its name! The next simplest
knot, the trefoil, follows in Figure 4b. This is the knot you’d get if you made
a regular overhand knot (like you were tying your shoelaces) and then put
the ends together. The remainder of Figure 4 portrays one other famous
knot (Figure 4c) and three famous links (bottom row).

(a) The unknot   (b) The trefoil   (c) The figure 8 knot

(d) The Hopf link   (e) The Whitehead link   (f) The Borromean rings

Figure 4. Knot-and-Link Hall of Fame

Exercise 2. Draw diagrams for the knots and links in Figure 4.

Exercise 3. Why is the trefoil the next simplest knot, i.e., why aren’t there
knots (other than the unknot) with only 1 or 2 crossings? Draw pictures!

A Brunnian link is a link that falls completely apart if any one of its
components is cut. The Borromean rings in Figure 4f are an example of a
Brunnian link with three components.

Exercise 4. Find a Brunnian link with four components; and then with
five components. What is the pattern? Can you describe how to draw a
Brunnian link with n components?

2. Reidemeister and Knot-Eating Machines

2.1. Reidemeister dance party. In order to study knots via their dia-
grams, we need a way to record on paper the wiggles that we might do to a
knot if it were actually made of rope. Reidemeister³ moves are operations on
knot or link diagrams that don’t change the knot or link represented by the
diagram. Reidemeister’s Theorem tells us that we only need three moves to
represent all possible knot wiggles. A fun way to think about Reidemeister’s
theorem is in terms of a knot (or link) dance party:
• the Reidemeister moves are the dance moves; and
• if a knot diagram K1 dances for a while and ends up looking like a knot
diagram K2 , then K1 and K2 represent the same knot!
Theorem 1. (Reidemeister’s Theorem) Two links are equivalent if and
only if they can be represented by diagrams that are themselves related by a
finite sequence of diagrams, each of which differs from the one before by one
of the following three moves, R1 , R2 , and R3 :
[Figure: the Reidemeister moves R1, R2, and R3]
The pictures are zoomed in on one part of the link, showing the strands
before and after making the moves. The first move straightens out a twist;
the second separates overlapping strands; and the third moves a strand
above a crossing.
Each move has a few variations. For example, in the first move, the loop
might be on the left instead of the right; or, in the third move, the strand
might be entirely under instead of over the crossing.
It is intuitive (and true!) that the Reidemeister moves do not change
a knot. It is harder to see that these three types of moves are actually all
that you need in order to understand knot equivalence; but try to
convince yourself of that, too! For a proof of Reidemeister’s Theorem, we
direct you to Knot Theory by Livingston [49].
³ Kurt Werner Friedrich Reidemeister (1893-1971) produced over 70 mathematical
papers and books in differential geometry, combinatorial topology, combinatorial group
theory, logic, and philosophy, as well as in his dissertation field of algebraic number theory.
While at the University of Vienna, he learned from Wilhelm Wirtinger how to compute
the fundamental group of a knot from its projection. Soon after, he published important
papers in knot theory (Elementare Begründung der Knotentheorie [66] and Knoten und
Gruppen [65]) and his fundamental book Knotentheorie [67]. The Nazis considered him
“politically unsound” and forced him to leave his chair at the University of Königsberg in
1933. After World War II, Reidemeister was re-instated at the University of Marburg, at
Kurt Hensel’s chair.
Even to this day, Reidemeister moves are ubiquitous in knot theory research.

Now you will get a chance to apply to our main protagonists in Figure 4
the Reidemeister moves . . . as well as a seemingly “illegal” change-of-crossing
move, which will nevertheless prove quite revealing in sorting out knots.
Exercise 5. Use Reidemeister moves to go from one to the other trefoil
diagram in Exercise 1.

Exercise 6. Draw a diagram of the trefoil in pencil. Change one of the
crossings: reverse the order in which strands go “on top of each other” in this
crossing. Draw a sequence of Reidemeister moves that shows that the
resulting knot is the unknot. Repeat the exercise with
the figure 8 knot and the links in Figure 4, trying out different crossings.

Many recreational and serious math problems may look hard because the
problem solver faces the “end” of a procedure that must be undone:
Exercise 7. Start with a picture of an unknot and apply five Reidemeister
moves on it to make it look complicated. Give it to a friend and have him/her
try to untangle it using Reidemeister moves. If you want to challenge your
friend further, repeat the procedure on a more complicated knot or link.
While it is evident that the two unknots in the Hopf link are symmet-
ric (and likewise the three Borromean rings), is it immediate that the two
Whitehead components play an “equal role” in that link? The next exercise
settles this question.
Exercise 8. Draw a sequence of Reidemeister moves that sends the Whitehead
link to itself but interchanges its two components, thereby showing that
they are symmetrically positioned. (Draw the components in different colors
to make your solution clear.) Practice this rigorous component-swapping on
the Hopf link and the Borromean rings.

2.2. Knot-sorting machines. To distinguish various knots and links,
mathematicians employ the notion of invariant. In Volume I, invariants were
described as quantities which do not change regardless of how the process
is played out. Using the same idea in a slightly different fashion, in this
session we shall think of invariants as abstract mathematical machines that
eat knots and return numbers, polynomials, or other “easier” mathematical
objects. These machines are generally defined on and computed from knot
diagrams. In order to be invariants, they have to give the same value on all
diagrams of a given knot.
Reidemeister’s Theorem is extremely useful in this context, providing a
basic technique for checking whether our machines are invariants:
PST 15. To prove that something is a knot (or link) invariant, simply verify
that its value does not change under the three Reidemeister moves.

You can think of invariants as imperfect sorting machines. You can ask
the XYZ machine to sort knots by the XYZ invariant and it’ll sort them into
boxes accordingly. It might make mistakes, though, and put two inequivalent
knots in the same box, because it can happen that two inequivalent knots or
links have the same value for an invariant!

Figure 5. Mr. Naughty Robot sorts links by number of components

Despite such imperfections, being an invariant always means that if two
knots or links are equivalent, then the invariant must return the same value
for both of them. To put it in other words, if two knots or links have different
values for the invariant, then they are not equivalent! Thus,
PST 16. To prove that two knots or links are not equivalent, find an
invariant which takes different values on some diagrams of the two objects.

2.3. Baby invariants. There are lots and lots of knot and link invariants.
Let’s start with the simplest ones.
2.3.1. Sorting by the number of components. One extremely simple example
of a link invariant is the number of components that it takes to make the
link. This is a rather boring invariant because it is so coarse (for example,
it can’t distinguish any knot from any other!), but it does provide at least a
first little bit of information.
Mr. Naughty Robot in Figure 5 is sorting the links by their number of
components. In which urn will he put the link he is currently analysing?
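The sorting-machine picture can be sketched in a few lines of Python. This is only an illustration, not part of the chapter's method: the function name `sort_into_urns` and the dictionary of names are hypothetical, and the component counts are the ones stated for the links of Figure 4.

```python
from collections import defaultdict

def sort_into_urns(links, invariant):
    """Group link names by the value the invariant returns for each of them."""
    urns = defaultdict(list)
    for name in links:
        urns[invariant(name)].append(name)
    return dict(urns)

# Number of components of each famous knot or link (a knot has one component).
components = {
    "unknot": 1,
    "trefoil": 1,
    "figure 8 knot": 1,
    "Hopf link": 2,
    "Whitehead link": 2,
    "Borromean rings": 3,
}

urns = sort_into_urns(components, components.get)
for value, names in sorted(urns.items()):
    print(value, names)
```

Links that land in different urns are certainly inequivalent. Links sharing an urn (all three knots, for instance) may or may not be equivalent: this machine simply cannot tell.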
2.3.2. Counting the crossing number. The crossing number is the minimum
number of crossings occurring in any diagram of the knot.
Exercise 9. Find out the crossing numbers of all knots in Figure 4.
Partial Solution: The only crossing number that is truly easy to compute
is that of the unknot, which is zero. You need to do some work for all
other knots! For example, in Exercise 3, you showed that any knot with 1 or
2 crossings is really the unknot; consequently, you concluded that the trefoil
is the next simplest knot, always drawn with at least 3 crossings.
Further, since we can draw a figure 8 knot in a diagram with 4 crossings,
we know that its crossing number is ≤ 4, but we don’t know whether it
equals 3! How do you know that you can’t draw it with 3 crossings? ♦
2.3.3. Distinguishing by the unknotting number. The unknotting number is
the minimum number of crossings that must be changed (as in Exercise 6)
before the knot becomes equivalent to the unknot (or the link to an unlink).
This is, in some sense, a measure of how knotted a knot is.

Exercise 10. Conjecture the unknotting numbers of the trefoil, the figure 8
knot, and our famous links in Figure 4.

Hint: In Exercise 6 you showed that the unknotting number of the trefoil
must be ≤ 1. Again, to prove it is actually equal to 1, you need to show that
the trefoil is not already the unknot, which we won’t get to until the next
section. ♦
The sections that follow will detail several more involved and powerful
examples of invariants, which will enable us to rigorously distinguish among
all of our famous knots and links, thereby completely solving the above
problems and a lot more.

3. Three Crayons Defeat an Army of Knots

3.1. Mathematicians love to color. Among the easy-to-define-but-still-powerful
link invariants lies the notion of tricoloring. To start, let D be a
link diagram, and choose three colors, e.g., red, blue, and yellow.
We have used the word “strand,” assuming that its meaning is obvious.
Now let’s rigorously define it: a strand in D is a continuous piece that goes
from one undercrossing to the next, or comes back to itself as in the unknot
drawn without any crossings. Interestingly, all possible numbers of strands
from 1 to 6 appear in Figure 4, e.g., the Whitehead link has 5 strands.
Exercise 11. (Warm-up) Arrange our six famous knots and links accord-
ing to the number of strands each has, as drawn in Figure 4.
A tricoloring of a diagram D is a choice of color for each strand such
that at each crossing either all 3 colors are present or only 1 color is present.
Figure 6 exhibits two tricolorings. Of course, with this definition, we could
always color all the strands the same color, so every diagram has a trivial
tricoloring. We say that
• D is tricolorable if it has a nontrivial tricoloring, i.e., one that uses at
least 2 colors.
• A link is tricolorable if it has a tricolorable diagram.
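The definition translates directly into a checker. The sketch below is a minimal illustration under stated assumptions: a diagram is encoded (hypothetically) as a list of crossings, each crossing being the triple of strand labels that meet there, and the labels "a", "b", "c" for the standard 3-crossing trefoil diagram are our own choice.

```python
def is_tricoloring(crossings, colors):
    """True if every crossing shows exactly 1 color or exactly 3 colors."""
    for strands in crossings:
        seen = {colors[s] for s in strands}
        if len(seen) == 2:  # two colors at a crossing break the rule
            return False
    return True

def is_nontrivial(colors):
    """A nontrivial tricoloring uses at least 2 colors overall."""
    return len(set(colors.values())) >= 2

# Assumed encoding of the standard trefoil diagram: three strands,
# and all three of them meet at each of the three crossings.
trefoil = [("a", "b", "c"), ("a", "b", "c"), ("a", "b", "c")]

good = {"a": "red", "b": "blue", "c": "yellow"}
bad = {"a": "red", "b": "red", "c": "blue"}

print(is_tricoloring(trefoil, good) and is_nontrivial(good))
print(is_tricoloring(trefoil, bad))
```

The first coloring passes both tests, so this diagram is tricolorable; the second fails because two colors meet at a crossing.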

Figure 6. The trefoil and the $7_4$ knot are tricolorable.

PST 17. To prove that a knot (or link) is tricolorable, all you have to do is
exhibit a nontrivial tricoloring of one of its diagrams.
This is the “muddying-your-hands” approach: just as proving that two
knots are equivalent requires us to show a sequence of (legal) transformations
that takes one knot to the other, so does tricolorability demand that we come
up with a particular tricoloring of a particular diagram of the knot. Still, re-
call the warning about how different the question of showing non-equivalence
between two knots is. Trying out specific moves that don’t transform one
knot to the other is not enough: that’s what a whole army of invariants is
created for! Analogously, just failing to tricolor a knot is not a proof of its
non-tricolorability. More subtle work needs to be done.
PST 18. To show that a knot is not tricolorable, you have to make a logical
argument as to why it can’t be. Begin by coloring a single crossing with all
3 colors (if you want to get a nontrivial tricoloring, you might as well start
with all 3 colors!). Work your way around the knot, following the rules for
tricoloring, until you come to a contradiction. Sometimes there might be
more than one choice and you have to show that in all cases you still come
to a contradiction.
Exercise 12. Find out if the figure 8 knot is tricolorable or not.
Solution: We’ll start with the crossing on the far left. If all three strands
that meet there are, say, red (cf. Fig. 7a), the fourth strand of the knot is
forced to be red too (why?), and hence the knot is trivially colored. Other-
wise, we have red, blue, and yellow at that crossing. At the next crossing
(moving clockwise), we see that two strands are already colored with dif-
ferent colors, so we have to color the remaining strand red. But now we’ve
colored all the strands, and the two crossings at the bottom of the picture
don’t obey the tricoloring rules. The figure 8 knot is not tricolorable! 
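The case analysis above can be cross-checked by brute force, since a diagram with 4 strands has only 3^4 = 81 candidate colorings. The sketch below rests on an assumed labeling of the standard 4-crossing figure 8 diagram: strands are numbered 0 to 3, and each crossing involves three of the four strands, with each crossing missing a different strand. If your diagram is labeled differently, replace `FIGURE_EIGHT` accordingly.

```python
from itertools import product

# Assumed encoding: each tuple lists the three strands meeting at one crossing.
FIGURE_EIGHT = [(0, 1, 2), (0, 2, 3), (1, 2, 3), (0, 1, 3)]

def tricolorings(crossings, n_strands):
    """List every assignment of colors 0, 1, 2 that obeys the crossing rule."""
    found = []
    for colors in product(range(3), repeat=n_strands):
        # A crossing is fine unless exactly two distinct colors appear there.
        if all(len({colors[s] for s in c}) != 2 for c in crossings):
            found.append(colors)
    return found

colorings = tricolorings(FIGURE_EIGHT, 4)
nontrivial = [c for c in colorings if len(set(c)) > 1]
print(colorings)
print(nontrivial)
```

Only the three single-color assignments survive, in agreement with the conclusion that this diagram has no nontrivial tricoloring.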

Figure 7. Trying to tricolor the figure 8 knot


3.2. Tricolorability as an invariant. The reader may ask what is so
special about tricolorings. Why not allow 2 colors at a single crossing?
Recall that in order to create a feature that truly describes a knot or
a link (and not just a particular diagram of it), this feature must be an
invariant; in other words, it must remain unchanged under the Reidemeister
moves. A 2-coloring rule would be at odds already with the first Reidemeister
move R1. To see this, color the two strands of the double-twist unknot in
red and blue (cf. Fig. 8), and then apply R1 to undo one or both twists; this
results in a single strand and yields only the trivial colorings of the unknot.
You are forced to conclude that the unknot is both two-colorable and not
two-colorable, depending on which diagram of it you look at: this is no good!
On the other hand, the unknot is never(!) tricolorable, regardless of how we
draw it on paper.

Figure 8. Two-colorability is not an invariant

Thus, the idea to use 3 colors and define tricolorings the way we did earlier
is far from whimsical, as the next fundamental problem also confirms.
Problem 1. Suppose a link diagram D is tricolorable. Show that if you per-
form any of the three Reidemeister moves on D, then the resulting diagram
is also tricolorable. Conclude that tricolorability is a link invariant!
PST 19. One possible way to distinguish two links (or knots) is to verify
that one is tricolorable and the other is not.
Since the unknot is not tricolorable – it has only trivial tricolorings –
and the trefoil is tricolorable, they must be distinct knots! Now we are
sure that the trefoil is really a knot (and not the unknot). Incidentally, this
completes the proof that the trefoil’s crossing number is 3 (cf. Exer. 9) and
that the trefoil’s unknotting number is 1 (cf. Exer. 10). 
Exercise 13. Mrs. Trefoilia Robot (to the right) is sorting knots and links
into two urns, labeled YES and NO in the picture, according to their
tricolorability. In which urn should she put the Borromean rings? How about
the unlink⁴ with two components? Explain why she has correctly sorted
the Hopf link, the figure 8 knot, and the Whitehead link. How many
trefoils can you recognize in the picture?
⁴ An unlink is a link equivalent to several disjoint unknots.

Exercise 14. Which of the knots below are tricolorable? The knots are
known as the $3_1$ knot (the trefoil), the $5_1$ knot, the $7_1$ knot, and the $9_1$ knot.
(Why do you think they have those names?) What do you notice about your
answers? Make a conjecture and explain your reasoning.

Here are the $11_1$, $13_1$, and $15_1$ knots. Check your conjecture!

Exercise 15. In Exercise 14, the knots all had odd names. Here are their
even analogues, the $4_1$, $6_1$, $8_1$, and $10_1$ knots. Which are tricolorable?
Explain your reasoning. Draw the $12_1$ knot and check your conjecture.

3.3. Counting tricolorings. We can also think about not only whether or
not a link is tricolorable, but how many possible tricolorings it has. Let’s
write τ (L) for the number of tricolorings of a link L. Hold on! The link L
has infinitely many diagrams D! Which diagram do we use? Ideally, any
two diagrams of L would have the same number of tricolorings so that it
wouldn’t matter which diagram D we choose to calculate τ (L). . . . Luckily,
this is exactly what happens:
Problem 2. Prove that performing Reidemeister moves on a diagram D
preserves the number of D’s tricolorings. Conclude that τ (L) does not change
under the Reidemeister moves, i.e., that τ (L) is a link invariant.
Since every link always has at least the three trivial tricolorings, it’s easy
to see that τ (L) ≥ 3 for all links L. This relates to our previous definition
– that a link is tricolorable if and only if τ (L) > 3 – but it is a stronger
invariant, as demonstrated next.

Exercise 16. Compute τ for the trefoil, the figure 8 knot, and the so-called
square knot shown in Figure 9. Conclude that these are all different knots!
The last exercise shows that τ is a more refined invariant than the sim-
ple Yes/No of tricolorability: it can distinguish between the trefoil and the
square knot, even though both are tricolorable.
Problem 3. Compute τ for various knots from the knot table on page 68.
Do you notice a pattern? Can you explain why you see that pattern? (This
will be treated in more detail with linear algebra in Section 3.5.)

3.4. Tricolorings and connected sums. Just as we can build any natural
number from its prime divisors, we can try to create more complex knots
from simpler knots. For this, we will do a bit of “surgery” on the simpler
knots in order to join them together, sort of like Siamese twins.
The connected sum K1 #K2 of two knots K1 and K2 is formed by erasing
a little piece of a strand from each knot and then connecting the loose strands
together. The example in Figure 9 takes the right-handed and the left-
handed trefoils and forms their connected sum, known as the square knot.

K1 K2 K1 #K2
Figure 9. The square knot is the connected sum of two trefoils
For instance, it is easy to see that a knot K doesn’t change if you connect
it with the unknot U ; but that K acquires an extra ring around one of its
strands if you connect K with the Hopf link H (why?).
Exercise 17. If K1 and K2 are tricolorable, is K1 #K2 tricolorable?
Taking connected sums is a good operation on knots as it relates features
of the resulting knot to those of its building blocks. One such feature is τ .
Problem 4. Find a formula that relates τ (K1 ), τ (K2 ), and τ (K1 #K2 ).
PST 20. It is always a smart idea to check your formulas against some
examples that you can work out directly or using other methods.
Problem 5. Consider your formula for τ (K1 #K2 ).
(a) Verify it when one of the knots Ki is U or H.
(b) What does it say about K1 and K2 if K1 #K2 is tricolorable?
(c) Use it to find τ of a linear chain of n rings (cf. Fig. 10).
(d) Is it useful in finding τ of a necklace of n rings? How about the Brunnian
link with n rings from Exercise 4? Calculate τ if you can.

Figure 10. Chain and Necklace of rings

3.5. Tricolorings and linear algebra over F3 . We will now use tools
from linear algebra to systematize our study of tricolorings. To this end,
we will need to assume knowledge of a few things. You can skip ahead to
Section 4 on the Jones polynomial if you don’t know about matrices, systems
of linear equations, or adding and multiplying modulo 3.
The set of numbers {0, 1, 2} is a perfectly good place for doing arithmetic:⁵
it is called the field F3. This just means that you can add, subtract,
multiply, and divide in F3 subject to all the usual rules, e.g., distributive
law, associative law, etc. However, each time you get a number a ∈ F3 , you
divide a by 3 and replace a by its remainder 0, 1, or 2. (In a fancy language,
you reduce a mod 3.) For example, 5 = 2 and 7 = 1, 5 + 7 = 12 = 0,
5 − 7 = −2 = 1, 5 · 7 = 35 = 2, and 7 ÷ 5 = 2. In practice, arithmetic mod 3
boils down to 4 simple tables:
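In each table the row gives the first number and the column the second:

 + | 0 1 2      − | 0 1 2      · | 0 1 2      ÷ | 1 2
 --+------      --+------      --+------      --+----
 0 | 0 1 2      0 | 0 2 1      0 | 0 0 0      0 | 0 0
 1 | 1 2 0      1 | 1 0 2      1 | 0 1 2      1 | 1 2
 2 | 2 0 1      2 | 2 1 0      2 | 0 2 1      2 | 2 1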
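For readers who like to experiment, the same arithmetic can be sketched in Python. The function names below are our own, and division is computed through the multiplicative inverse given by Fermat's little theorem, which is one convenient choice rather than the only one.

```python
P = 3  # we work in the field F3 = {0, 1, 2}

def add(a, b):
    return (a + b) % P

def sub(a, b):
    return (a - b) % P  # Python's % always returns a value in 0..P-1

def mul(a, b):
    return (a * b) % P

def div(a, b):
    if b % P == 0:
        raise ZeroDivisionError("cannot divide by 0 in F3")
    inverse = pow(b, P - 2, P)  # Fermat: b^(P-2) is the inverse of b mod P
    return (a * inverse) % P

def table(op, columns):
    """Rows are the first argument 0, 1, 2; columns are the second argument."""
    return [[op(a, b) for b in columns] for a in range(P)]

print(add(5, 7), sub(5, 7), mul(5, 7), div(7, 5))
print(table(sub, range(P)))
print(table(div, [1, 2]))
```

The first line of output reproduces the sample computations 5 + 7, 5 − 7, 5 · 7, and 7 ÷ 5 from the text.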
Moving on, you might have learned about matrices and linear algebra
working over Q or R (i.e., using rational or real numbers); but in fact you
can do linear algebra over any field, including F3 . You can do row operations,
find inverse matrices, and solve systems of equations in just the same way.
All of the theorems generalize word for word over F3 . To get warmed up, do
the following couple of computations with linear algebra over F3 .
Exercise 18. Write down the coefficient matrix for the system of equations
$$2x + y = 0, \qquad x + y + z = 1.$$
Then write down the augmented matrix and do row operations to find all
solutions to this system over F3. How many are there? Why?

Exercise 19. Consider the matrix
$$A = \begin{pmatrix}
1 & 2 & 0 & 0 & 1 & 1 \\
0 & 1 & 2 & 2 & 0 & 0 \\
0 & 0 & 1 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.$$
How many solutions does the system of equations Ax = 0 have over F3?

⁵ Review Number Theory I in Volume I or Group Theory I in Volume II.

Exercise 20. Let A be a square matrix over F3. Describe the relationship between
the number of solutions to Ax = 0 and the number of zero rows in the echelon
form of A.

Solution: If A is a square matrix over F3 with k rows of zeros in its echelon form,
then the homogeneous system of equations Ax = 0 has $3^k$ solutions. Indeed,
for a square matrix, each row of zeros in the echelon form of A corresponds to a
free variable. After assigning arbitrary values in F3 to all free variables, we can uniquely
solve for the remaining (leading) variables. For each of the k free variables,
we have three choices (0, 1, or 2 in F3 ); so there are $3^k$ possible k-tuples of
the free variables, and hence $3^k$ overall solutions to our system.
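As a small worked instance of this count, here is a hypothetical 3 × 3 matrix (chosen only for illustration, not taken from any knot) reduced over F3. All arithmetic is mod 3.

```latex
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 2 \\ 0 & 1 & 1 \end{pmatrix}
\xrightarrow{R_2 \to R_2 - 2R_1}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\xrightarrow{R_2 \leftrightarrow R_3}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
% one zero row, so k = 1 and x_3 is the free variable
x_2 = -x_3 = 2x_3, \qquad x_1 = -2x_2 - x_3 = -5x_3 = x_3
% solutions (x_1, x_2, x_3): (0,0,0), (1,2,1), (2,1,2), that is, 3^1 = 3 of them
```

Substituting each of the three solutions back into the original rows of A confirms that every row sums to a multiple of 3.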

3.6. And on to the knots! Let D be a diagram of a link with m crossings
and label the strands in the diagram s1, s2, s3, . . . , sn. Most of the time
m = n, but sometimes not! Why not? Essentially, the only counterexample
is the unknot drawn with 0 crossings: yet, it will still have 1 strand! For the
combinatorially inclined, here is one little exercise on counting, which can
be skipped without harm.

Exercise 21. Prove that n = m + u where u is the number of D’s unknots
drawn with no self-crossings and unlinked to the rest of the diagram.

Hint: To every crossing associate the strands that go under it; and to every
strand associate the crossings (if any) under which it goes. ♦
Instead of using 3 colors to label the strands, let’s use the numbers 0, 1,
and 2. Then a tricoloring of D is an assignment of one of the numbers 0, 1,
or 2 to each strand sk such that at each crossing either all 3 numbers are
present or only 1 number is present. Let’s denote the “color” of sk by xk , so
that xk ∈ {0, 1, 2}. Thus a “coloring” of D will be a list x1 , x2 , x3 , . . . , xn of
“colors” for the strands. But not just any list . . . .
Exercise 22. How many strands meet at a single crossing? Examples?
Solution: There could be 1, 2, or 3 distinct strands meeting at a sin-
gle crossing. Examples are provided by the unknot twisted once or twice
(cf. Fig. 8), or thrice; but you can easily go with the Hopf link and the
trefoil for the 2- and 3-strand crossings. 
The variety of possibilities at a single crossing is incon-
venient. Instead, imagine an ant sitting on the diagram
D in the vicinity of our crossing C. The ant will observe
three distinct pieces of strands at C and will not know if
they are “glued” within the same strands somewhere far
away (as in the picture to the right). For the remainder
of this section over F3 , we will take the ant’s viewpoint:
the local coloring of a crossing will consist of the three
numbers assigned to the pieces of strands that make up
the crossing.
Thus, the conditions on a tricoloring say that at each crossing there must
be only 1 number repeated three times, or there must be all 3 numbers writ-
ten in the order determined by our original strand sequence {s1 , s2 , . . . , sn }.

Exercise 23. In our new language,
(a) List all possible combinations of local colorings of a single crossing.
For each of your combinations, compute the sum of the three elements.
What do you notice?
(b) Suppose that strands si , sj , and sk (possibly listed with repetitions)
meet at a crossing. Based on your observation above, write down an
equation that their colors xi , xj , and xk must satisfy.
Answer (b): xi + xj + xk ≡ 0 (mod 3). ♦
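The equivalence behind this answer can be checked exhaustively, since there are only 27 triples of elements of {0, 1, 2}. The short sketch below (with helper names of our own choosing) compares the crossing rule with the sum condition on every triple.

```python
from itertools import product

def crossing_ok(triple):
    """The tricoloring rule: exactly 1 or exactly 3 distinct numbers."""
    return len(set(triple)) in (1, 3)

def sum_is_zero(triple):
    """The algebraic condition: the three numbers add up to 0 mod 3."""
    return sum(triple) % 3 == 0

triples = list(product(range(3), repeat=3))
agree = all(crossing_ok(t) == sum_is_zero(t) for t in triples)
allowed = [t for t in triples if crossing_ok(t)]

print(agree)
print(len(allowed), allowed)
```

The two conditions agree on all 27 triples, so the single linear equation captures the coloring rule exactly.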

As we consistently work mod 3, let’s write xi + xj + xk = 0 with the
understanding that this equation lives in F3. Denote by T (D) the set of
tricolorings of a knot diagram D. Then by definition, τ (D) is the size of
T (D). From Exercise 23 we can describe T (D) as:
$$T(D) = \left\{ (x_1, x_2, \ldots, x_n) \in \mathbb{F}_3^n \;\middle|\; x_i + x_j + x_k = 0 \text{ if strands } s_i,\ s_j, \text{ and } s_k \text{ meet at a crossing} \right\}.$$
Since there is one equation of the form xi + xj + xk = 0 for each crossing, the
conditions on the list (x1 , x2 , . . . , xn ) are a set of m equations in n unknowns.
Ah ha! Finding τ (D), that is, calculating the number of allowable sequences
(x1 , x2 , . . . , xn ), is a linear algebra problem over F3 ! Let’s apply this idea.
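Here is one way this linear algebra problem might be mechanized; it is a sketch under stated assumptions, not a definitive implementation. Each crossing is encoded as a triple of strand indices (our own convention), the coefficient matrix gets one row per crossing, and the number of solutions is 3 raised to the number of free variables, n minus the rank over F3. The figure 8 encoding below is an assumed labeling of the standard 4-crossing diagram.

```python
def coefficient_matrix(crossings, n_strands):
    """One row per crossing: the coefficients of x_i + x_j + x_k = 0 in F3."""
    matrix = []
    for crossing in crossings:
        row = [0] * n_strands
        for s in crossing:
            row[s] = (row[s] + 1) % 3  # a strand may occur twice at a crossing
        matrix.append(row)
    return matrix

def rank_mod3(rows):
    """Rank over F3, found by Gauss-Jordan elimination on a copy of the rows."""
    rows = [[x % 3 for x in r] for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue  # no pivot in this column: it belongs to a free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = rows[rank][col]  # in F3 both 1 and 2 are their own inverses
        rows[rank] = [(inv * x) % 3 for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % 3 for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def tau(crossings, n_strands):
    """Number of tricolorings: 3 ** (number of free variables)."""
    matrix = coefficient_matrix(crossings, n_strands)
    return 3 ** (n_strands - rank_mod3(matrix))

# Assumed labeling of the figure 8 diagram: strands 0..3, four crossings.
FIGURE_EIGHT = [(0, 1, 2), (0, 2, 3), (1, 2, 3), (0, 1, 3)]
print(tau(FIGURE_EIGHT, 4))
```

For this encoding the count consists of the trivial colorings only, which matches the hand argument of Exercise 12. Any other diagram can be fed in the same way once its crossings are listed.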
Problem 6. Let $7_7$ be the knot depicted below. Label its strands with
the numbers 1 through 7. Find τ ($7_7$) by completing the following steps.
(a) Write down the equations that must be true in order to have a
tricoloring of this knot.
(b) Write down the coefficient matrix A of the resulting system of
7 equations and 7 unknowns.
(c) Do row operations on A to obtain its reduced echelon form B.
[Figure: the $7_7$ knot]
Let x = (x1, x2, . . . , x7). Then a tricoloring is a solution x to the matrix
equation Ax = 0, or, equivalently, to Bx = 0. Recalling that you are working
over F3, and keeping in mind monochromatic (a.k.a. trivial) colorings:
(d) Decide if this knot is tricolorable.
(e) Even better, count the number τ ($7_7$) of tricolorings of this knot!

Have fun by playing with this awesome linear algebra tool in the following:
Exercise 24. Apply this linear algebra procedure to all knots whose τ you
already know and compare your answers, e.g.,
(a) our six famous knots and links in Figure 4;
(b) the unknot twisted by n consecutive R1-moves in Figure 8;
(c) the knots with odd and even names on page 59;
(d) the square knot; the linear chain and the necklace with n rings each as
in Problem 5;
(e) the knots from the knot table in Problem 3 on page 68.

4. The Jones Polynomial

4.1. Revolutionizing knot theory. The Jones polynomial is another example
of a link invariant; but instead of being a number (like the crossing
number) or a simple Yes/No (like tricolorability), it is a polynomial. Actually,
it’s not quite a polynomial, since it can have half-integer and negative
exponents as well, but it is commonly referred to as a polynomial. It was
discovered in 1984 by UC Berkeley Professor Vaughan Jones, and it
revolutionized the world of knot theory! Suddenly longstanding conjectures were
easy to prove and a whole host of generalizations were invented. In 1990,
Jones won the Fields Medal for his work.⁶
Recall that being an invariant means that:
• If two knots or links are equivalent, then their Jones polynomials are
equal. In other words, if two knots or links have different Jones poly-
nomials, then they are not the same object!
• The Jones polynomial is not perfect, though. It can happen that two
different knots or links have the same Jones polynomial.
4.2. Orienting links. A key idea in defining the Jones polynomial is the
notion of an orientation on a link: this is just a choice of direction for each
of the link’s components. Here are the two possible oriented Hopf links:
They turn out to be inequivalent as oriented links, since
it is impossible to transform one into the other (via legal
link moves) while still keeping the assigned orientations.
But to explain why this is impossible would take us too
far afield and is not necessary for our purposes. The top
orientation, called the positively-oriented Hopf link, will
be denoted by H, while the bottom orientation by $H^-$.
To get a feeling for oriented links, play around with the
following warm-up questions:
Exercise 25. How many orientations does a knot have? Display all possible
orientations of the unknot drawn with 0, 1, or 2 twists (cf. Fig. 8), and decide
which are equivalent. Give all orientations for the trefoil, the Whitehead link,
and the Borromean rings, and think about which of them are equivalent.

⁶ Sir Vaughan Frederick Randal Jones was born in 1952 in New Zealand. In 1979 he
completed his doctoral studies at the University of Geneva, under the Swiss topologist
André Haefliger. The next year, Jones moved to the United States, and after teaching for
several years at the University of California at Los Angeles and the University of Penn-
sylvania, he received a permanent position at the University of California at Berkeley. In
1984, while working in the theory of von Neumann algebras (an area in analysis motivated
by group representations, operator theory, ergodic theory, and quantum mechanics), Jones
discovered the link invariant known now as the Jones polynomial, which unexpectedly had
vast applications in knot theory and re-energized the study of low-dimensional topology.
In 2002, Jones received the Distinguished Companionship of the New Zealand Order of
Merit, which was renamed Knight Companion in 2009.

4.3. What is the Jones polynomial? There are several choices for how
to define the Jones polynomial. For our purposes, the easiest way is through
the so-called skein relation, which relates the Jones polynomials of certain
triplets of (oriented) links. The diagrams of these links L+ , L− , and L0 are
identical except at one specific crossing (cf. Fig. 11), where L+ has an
overcrossing, L− has an undercrossing, and L0 has no crossing.

Figure 11. Links in the skein relation: L+ , L− , and L0
Definition 1. The Jones polynomial VL is defined for all oriented links L
by the following three properties.⁷
• VU (t) = 1, where U is the unknot.
• VL is an invariant of links.
• VL satisfies the skein relation: for any triplet of oriented links L+ , L− ,
and L0 as described above (cf. Fig. 11),
$$\frac{1}{t}\, V_{L_+} - t\, V_{L_-} = \left( \sqrt{t} - \frac{1}{\sqrt{t}} \right) V_{L_0}.$$
The skein relation looks complicated, but it helps us relate the Jones
polynomials of knots that differ at one crossing.
PST 21. If you can find three links whose diagrams are identical except at

one specific crossing, where they differ as in Figure 11, then you can use the
skein relation to relate their Jones polynomials. If you know two of the Jones
polynomials, the skein relation will allow you to solve for the third!
We will do an example shortly; but first let us mention an open problem:
Question 1. (Open) If knot K has Jones polynomial 1, is K equivalent to
the unknot? Equivalently, is there a nontrivial knot with Jones polynomial 1?

This is such a simple question; yet we still don’t know the answer! Perhaps
you can enlighten us someday.

4.4. Building up the trefoil via the skein relation. We will go through
a series of examples to build up to computing the Jones polynomial of the
trefoil. We already know that VU (t) = 1. To see how the skein relation works
in practice, let us move to the next simplest case:
 Exercise 26. Find the Jones polynomial of the unlink with two components.
The reader will notice that if we adopt Definition 1, we must prove that the Jones
polynomial actually exists and is unique for every link! We don’t have space for this here;
however, the advanced reader is encouraged to look up the proof in [44].

Solution: Let U stand for the (oriented) unknot, and U2 for the (oriented)
unlink with two components. By changing the uncrossing of U2 to an over-crossing
and an under-crossing, we can relate U2 to two copies of the unknot:

L+ = U L− = U L0 = U2
Figure 12. Skein relation for the unlink U2

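If $V_U$ and $V_{U_2}$ are the Jones polynomials for the unknot and our unlink,
respectively, the skein relation yields:
\[ \frac{1}{t} V_U - t\, V_U = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_{U_2} \;\overset{V_U = 1}{\Longrightarrow}\; \frac{1}{t} - t = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_{U_2}. \]
Using the formula $a^2 - b^2 = (a - b)(a + b)$, we can now solve for $V_{U_2}$:
\[ V_{U_2}(t) = \frac{\frac{1}{t} - t}{\sqrt{t} - \frac{1}{\sqrt{t}}} = \frac{\bigl(\frac{1}{\sqrt{t}} - \sqrt{t}\bigr)\bigl(\frac{1}{\sqrt{t}} + \sqrt{t}\bigr)}{\sqrt{t} - \frac{1}{\sqrt{t}}} = -\Bigl(\sqrt{t} + \frac{1}{\sqrt{t}}\Bigr). \]
We just found the Jones polynomial of the unlink with two components!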
Now, if T denotes the right-handed trefoil (cf. Fig. 2a), how do we find $V_T$?

PST 22. To calculate $V_L$, reason backwards: relate the link L to other (presumably
simpler) links via the skein relation. Keep applying the skein relation
to those new links, until you end up with links whose Jones polynomials are
already known to you.

Exercise 27. Draw pictures that relate the right-handed trefoil T to other
well-known links, one of which is the positive Hopf link H (cf. p. 64).
Now we are stuck with the Hopf link! Get over this obstacle:

Exercise 28. Relate the (positive) Hopf link H to well-known links. Write
down what the skein relation says in your diagram and solve for $V_H$.

Partial Solution: The diagram relates H = L+ to the unlink U2 = L−
and the unknot U = L0 ; notice that the only change occurs in the top
crossing. After “skeining,” the final answer comes to $V_H(t) = -\sqrt{t} - t^2\sqrt{t}$. ♦

L+ = H L− = U2 L0 = U
Figure 13. Skein relation for the Hopf link

We are now ready to put together everything and attack the trefoil again.

Exercise 29. Using your findings so far, calculate the Jones polynomial VT
of the right-handed trefoil T .
The careful reader might have noticed that we skipped one simple link:
the negatively-oriented Hopf link $H^-$. Its Jones polynomial turns out to be
$V_{H^-} = -t^{-1/2} - t^{-5/2}$ (check it!), which differs from $V_H$! Have we made
a mistake? It is important to understand that the Jones polynomial is an
invariant of oriented links. Orientation does not affect the Jones polynomial
of a knot (why?); but for a general link, you may get different Jones
polynomials, depending on the link’s orientations. With this in mind,
Exercise 30. Find the Jones polynomials of the figure 8 knot, the Whitehead
link, the Borromean rings, and the square knot.

The more adventurous reader can also try out the complicated links
in Figure 3. The idea is to be systematic in your calculations: track down
which links need to be dealt with and keep a record of the Jones polynomials
already found. For those skilled in induction and algebraic operations on
polynomials, here are a couple of challenges in true math-Olympiad style.
Problem 7. Find the Jones polynomials of the unlink with n components,
the linear chain with n components, and the knots n1 from page 59.

4.5. Mirror, mirror. Imagine taking a knot and switching all the crossings.
Doing this to a knot creates the knot’s mirror image.⁸ Below you see the
figure 8 knot and its mirror image. Are these two knots equivalent? For
starters, we should look at their Jones polynomials.
Exercise 31. The Jones polynomial of the figure 8 knot, as you should have
shown earlier, is $V_{4_1} = t^2 - t + 1 - t^{-1} + t^{-2}$. Compute the Jones
polynomial of its mirror image to obtain the same result! Are the two knots
equivalent?

The figure 8    Its mirror image

Hmm . . . the Jones polynomial can’t tell the difference between these two
knots. They might be the same, but they might not! In fact, there are
special words to reflect both possibilities. A knot is amphichiral if it is
equivalent to its mirror image; it is chiral otherwise.
Problem 8. Make a figure 8 knot and its mirror image out of rope. Play
with the ropes to try to see if the figure 8 knot is chiral or amphichiral. If
you think it is amphichiral, then prove it using Reidemeister moves!
What if not? Wait a minute! Shouldn’t we try this on a simpler example?
 Exercise 32. Is the trefoil chiral or amphichiral?
Think about where the name comes from! Nope, despite appearances, it’s not a
reflection across a vertical line! Where is the “mirror”? 

Partial Solution: As you must have found earlier, the right-handed
trefoil has $V_T = t + t^3 - t^4$. However, its mirror image, the left-handed trefoil
$\overline{T}$, has $V_{\overline{T}} = t^{-1} + t^{-3} - t^{-4}$. Since the Jones polynomial is an invariant of
knots, we conclude that the right-handed and left-handed trefoils are different
knots, and hence the trefoil is chiral. ♦
Implicitly, we have assumed that the mirror image of a link does not
depend on the particular diagram we use . . . . But we haven’t proven this!

Exercise 33. Let D2 be obtained from a link diagram D1 via a Reidemeister
move. Prove that the mirror images of D1 and D2 represent the same link.

Exercise 34. Below are some pictures of knots, their mirror images, and
their Jones polynomials. What do you notice? Make a conjecture! Then
give a criterion on the Jones polynomial that implies that a knot is chiral.

\[ V_{6_1} = t^2 - t + 2 - \frac{2}{t} + \frac{1}{t^2} - \frac{1}{t^3} + \frac{1}{t^4} \]
\[ V_{\overline{6_1}} = \frac{1}{t^2} - \frac{1}{t} + 2 - 2t + t^2 - t^3 + t^4 \]

\[ V_{8_{16}} = -t^2 + 3t - 4 + \frac{6}{t} - \frac{6}{t^2} + \frac{6}{t^3} - \frac{5}{t^4} + \frac{3}{t^5} - \frac{1}{t^6} \]
\[ V_{\overline{8_{16}}} = -\frac{1}{t^2} + \frac{3}{t} - 4 + 6t - 6t^2 + 6t^3 - 5t^4 + 3t^5 - t^6 \]

\[ V_{9_{10}} = -t^{11} + t^{10} - 3t^9 + 5t^8 - 5t^7 + 6t^6 - 5t^5 + 4t^4 - 2t^3 + t^2 \]
\[ V_{\overline{9_{10}}} = -\frac{1}{t^{11}} + \frac{1}{t^{10}} - \frac{3}{t^9} + \frac{5}{t^8} - \frac{5}{t^7} + \frac{6}{t^6} - \frac{5}{t^5} + \frac{4}{t^4} - \frac{2}{t^3} + \frac{1}{t^2} \]

Figure 14. Chiral or amphichiral? (The knots $6_1$, $8_{16}$, $9_{10}$ and their mirror images)

Here are a few of the many intriguing and fundamental properties of the
Jones polynomial, some demonstrated by Jones himself in 1985.
Theorem 2. For any knot K, its mirror image $\overline{K}$, and links L, L1 , and L2 :
(a) $V_{\overline{K}}(t) = V_K(t^{-1})$ and $V_K(e^{2\pi i/3}) = 1$, where $e^{2\pi i/3} = \cos(\frac{2\pi}{3}) + i \sin(\frac{2\pi}{3})$.
(b) $\frac{d}{dt} V_K(1) = 0$, where d/dt is the derivative with respect to t.
(c) $V_L(1) = (-2)^{p-1}$, where p is the number of components of L. Moreover,
if p is odd, then $V_L(t)$ is a polynomial with integer powers; if p is even,
then $V_L(t)$ is $t^{1/2}$ times such a polynomial.
(d) $V_{L_1 \# L_2}(t) = V_{L_1}(t) \cdot V_{L_2}(t)$.
We challenge the advanced reader to prove or find proofs of these facts.
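
Before attempting proofs, it can be reassuring to test Theorem 2 on examples. The following is a minimal Python sketch, not part of the original session, that checks parts (a), (b), and (c) numerically on the trefoil and figure 8 polynomials quoted in this section. The dictionary encoding and the helper names (`evaluate`, `derivative_at`, `mirror`) are illustrative assumptions.

```python
import cmath

# Jones polynomials quoted in the text, stored as {exponent: coefficient}.
V_TREFOIL = {1: 1, 3: 1, 4: -1}                   # t + t^3 - t^4
V_FIGURE8 = {2: 1, 1: -1, 0: 1, -1: -1, -2: 1}    # t^2 - t + 1 - t^-1 + t^-2

def evaluate(poly, t):
    """Value of the Laurent polynomial at t."""
    return sum(c * t**e for e, c in poly.items())

def derivative_at(poly, t):
    """Value of d/dt of the Laurent polynomial at t."""
    return sum(c * e * t**(e - 1) for e, c in poly.items())

def mirror(poly):
    """Substitute t -> 1/t, as in Theorem 2(a) for the mirror image."""
    return {-e: c for e, c in poly.items()}

omega = cmath.exp(2j * cmath.pi / 3)   # e^(2*pi*i/3)

for name, V in [("trefoil", V_TREFOIL), ("figure 8", V_FIGURE8)]:
    print(name)
    print("  V(omega) is 1:", abs(evaluate(V, omega) - 1) < 1e-12)
    print("  V'(1) is 0:", derivative_at(V, 1) == 0)
    print("  V(1) is (-2)^0 = 1:", evaluate(V, 1) == 1)
    print("  V(t) equals V(1/t):", mirror(V) == V)
```

For the trefoil the last check fails, which is exactly the chirality argument from Exercise 32; for the figure 8 it passes, so the Jones polynomial alone cannot decide the question.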

4.6. Mysticism, art, and mathematics. Bumping into knot and link
celebrities is a daily occurrence for everyone, whether we realize it or not.
For example, the trefoil is often the centerpiece of beautiful jewelry:

Figure 15. Trefoil ring and “Russian” wedding ring

The very popular “Russian” wedding ring (a.k.a. Cartier trinity ring) is
simply a link of 3 unknots. As opposed to the Borromean rings, removing
one of the unknots does not cause the rest to fall apart but leaves a Hopf
link. A feature making it so convenient to wear can be appreciated when
the 3 pieces are aligned to put them on a finger: they glide on top of each
other, causing the whole ring to smoothly move across the finger! Try this

on a homemade link: it’s worth it to see the gliding in action. Is it possible
to create a “super-Russian” wedding ring of 4 pieces with similar properties?
The Celtic knot (or The Emblem of Divine Inscrutability), rumored to
contain all the wisdom of King Solomon, appears in an array of artistic
versions. It is actually not a knot but a link of 2 unknots intertwined twice
and is known in mathematics as the 4-crossing link.

Figure 16. King Solomon’s knot and the IMO logo

There is no way we can omit our last example: the logo of the International
Mathematical Olympiad itself is our old friend, the Whitehead link;
but instead of being non-trivially tricolored (which it cannot be!), the link is
5-colored in honor of another even more famous logo. Can you guess which?
Exercise 35. Calculate the Jones polynomials of the Celtic knot and of the
“Russian” wedding ring, and compare them, correspondingly, to those of
 (a) the Hopf link; the unlink with 2 rings; and the Whitehead link;
(b) the Borromean rings; the linear chain; and the unlink with 3 rings.

You can find on the internet a multitude of intricate knots in a variety

of situations. If you have loads of time on your hands, our final exercise will
offer you exhaustive practice in calculating and analysing Jones polynomials.
Exercise 36. Verify the properties of Jones polynomials in Theorem 2 in
all cases of knots and links in this session, including the celebrity ones.

5. Is This the End?

Definitely not! If you would like to learn more about knots, you should
have a look at Justin Roberts’ “Knots Notes,” available on his website [68].
Another great resource, as we mentioned earlier, is “The Knot Book” by Colin
Adams [1]. Of course, you should also take a peek at the more recent article
“The Jones Polynomial” by the master himself, Sir Vaughan Jones on his
own website [44].
We mentioned in passing some funny-sounding, yet rigorous knot termi-
nology. If you are curious about a quandle, it is a knot invariant, discussed in
the accessibly written “Knot Quandle” by then-undergraduate Elenoir Bir-
rell [10]. For using flypes – a different type of knot transformations – to
prove “The Tait Flyping Conjecture,” we direct you to two papers of Menasco
and Thistlethwaite [54, 55]. Finally, a writhe – a property of a positively-
oriented link – fails to be a knot invariant, as demonstrated by Hoste et al.
in “The First 1, 701, 936 Knots” [42].
Regarding the open Question 1 on page 65, check out “Links with trivial
Jones polynomial” [81] and “Infinite families of links with trivial Jones poly-
nomial” [23] by Thistlethwaite et al. Even the basic notion of Reidemeister
moves enters into modern research nowadays; for some upper bounds on “The
number of Reidemeister moves needed for unknotting” we direct you to Hass
and Lagarias’ paper [38].
Many of the images in this session were created using Robert Scharein’s
KnotPlot software at, which you should absolutely
download and play with! It allows you to load knots from a library up to
10 crossings, see them in 3-D, compute polynomial invariants, sketch new
knots, and much, much more!
The author would also like to thank Henning Hohnhold for the idea to
include the Alexander the Great story.

6. Hints and Solutions to Selected Exercises

Exercise 1+ . In Figure 2, the upper loop of the knot on the right has two
twists. Just untwist it to get the trefoil. 


Figure 17. Right-handedness and Unknotting 2-crossings knots

Why is the trefoil T in Figure 2a called right-handed ? Orient T by tracing

it in one of the two possible directions (as in Fig. 17a). Then every crossing
of T is right-handed : if you grasp the over-strand in your right hand with the

thumb pointing in its direction, then your other fingers point in the direction
of the under-strand; thus, each crossing is of type L+ (cf. skein relation in
Fig. 11). This remains true for the right-handed trefoil T regardless of its
orientation, and it is false for the left-handed trefoil.
Exercise 3. A knot diagram with 1 or 2 crossings inevitably results in the
unknot, as demonstrated by the untwisting in Figure 8. Trying something
“different” with 2 crossings as in Figure 17b doesn’t help: just pull the strand
that is draped over the other to eliminate the crossings and get the unknot. ♦
Exercise 4. Take n − 1 unlinks and arrange them in a line so that each one
overlaps slightly with the one before and the one after. Add in the final link
by weaving through these, going over and under, over and under, and then
fuse the ends of this final link together. A case of a Brunnian link with 4 com-
ponents is displayed in Figure 3b; one actually has to stare at it for a while to
realize that our construction recipe is not followed to the letter. For another
construction using “rubberbands” check [57]. To see a Brunnian link with
5 components go to YouTube at
and watch it fall apart in slow motion when one link is cut.
In fact, for each n the infinitely many Brunnian links with n compo-
nents were classified in 1954 by John Milnor via what is now called Milnor
invariants [56]. ♦
Exercise 5. Use two R1 moves or one R2 move. 
Exercise 6. Whichever crossing you choose to change, the Hopf link will
become an unlink, the trefoil and the figure 8 knot will turn into the unknot,
and one of the Borromean rings will peel off, forcing the remaining two rings
into a Hopf link. In this respect, the Whitehead link is more interesting:
changing any of its 4 “outside” crossings results in the Hopf link; but changing
its central crossing breaks it into an unlink! You should check that the
transformations described here (except for crossing changes, of course!) can
be expressed as sequences of Reidemeister moves. ♦
Exercise 9. Show that any diagram with three crossings represents either
the trefoil or the unknot. ♦
Exercise 10. The solution to Exercise 6 actually tells us that the unknotting
number is 2 for the Borromean rings, 0 for the unknot (of course!), and 1 for
all other knots and links in Figure 4. ♦
Exercise 11. The links in Figure 4 will be ordered if you insert the 2-strand
Hopf link between the 1-strand unknot and the 3-strand trefoil. Note that
the number of strands in these links equals the number of crossings, except
for the unknot. Do you know why? Check out Exercise 21. ♦
Problem 1. Tri-color the diagram D. An R1 move can be performed only
on a monochromatic crossing, after which the crossing is eliminated but its
color is preserved (cf. Fig. 18a). For R2, the over-crossing strand is all one

color, while the three under-crossing strands can be colored in two ways
(cf. Fig. 18b-c). In either case, after pulling apart the strands by move R2,
the diagram remains tricolorable: indeed, all strands “exiting” the picture
preserve their colors, thereby allowing for the rest of the (unseen) diagram
to remain tricolored as before. Note that we are only allowed to change the
color of strands that lie entirely inside our picture.

R1 R2 R2

Figure 18. Moves R1 and R2 and tricolorability

The same idea governs tricolorability when applying move R3. The first
picture in Figure 19 has five “exiting” strands (in black) and one “non-exiting”
strand (in green). There are five ways to tri-color this diagram segment: two
cases with monochromatic (blue) central crossing and three cases with tricol-
ored central crossing. Check that after move R3, all “exiting” strands have
preserved their colors, while the “non-exiting” central strand may preserve
its color (as in column 2) or may change its color (as in column 3). 






Figure 19. Move R3 and tricolorability

Exercise 13. No: Hopf link, Figure 8, Whitehead link, Borromean rings.
Yes: trefoil, 74 knot. The picture has 5 trefoils, including the hairdo! ♦
Exercises 14-15. These knots are tricolorable if and only if the number of
crossings is divisible by 3. Think about why! ♦
Problem 2. Use the solution to Problem 1. ♦
Exercise 16. τ (Trefoil) = 9; τ (Figure 8) = 3; τ (Square) = 27. ♦
Problem 3. $\tau(6_1) = 9 = \tau(\overline{6_1})$; $\tau(8_{16}) = 3 = \tau(\overline{8_{16}})$; $\tau(9_{10}) = 9 = \tau(\overline{9_{10}})$. ♦
Exercise 17. Yes, K1 #K2 is tricolorable. Let α1 and α2 be the strands in
K1 and K2 , respectively, on which the “surgery” will be performed. If α1 and
α2 have different colors, permute the colors on K2 to make α2 ’s color
match α1 ’s color. Then perform the surgery and extend that common color
onto the pieces connecting K1 and K2 within K1 #K2 .

Problem 4. $\tau(K_1 \# K_2) = \frac{1}{3}\tau(K_1)\tau(K_2)$. The factor of 1/3 is explained
within the solution of Exercise 17. In order for the colorings of K1 and K2
to match for a coloring of K1 #K2 , the strands α1 and α2 must be colored
the same, i.e., every coloring of K1 can be matched with exactly a third of
the colorings of K2 . ♦
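
As a quick sanity check of this formula, here is a small Python sketch (an illustration, not from the original text) that applies it to values stated elsewhere in these solutions: τ(Trefoil) = 9 and τ(Square) = 27 from Exercise 16, and τ(U) = 3 = τ(H) from Problem 5. The function name `tau_connected_sum` is hypothetical, and the square knot is taken to be the sum of a trefoil and its mirror image, with the mirror image assumed to have the same number of colorings as the trefoil.

```python
def tau_connected_sum(tau_1, tau_2):
    """Number of colorings of K1 # K2 according to Problem 4."""
    product = tau_1 * tau_2
    # Each tau is a power of 3 that is at least 3, so the division is exact.
    assert product % 3 == 0
    return product // 3

TAU_UNKNOT = 3
TAU_HOPF = 3
TAU_TREFOIL = 9

# Square knot = trefoil # mirror trefoil (mirror assumed to have tau = 9 too).
tau_square = tau_connected_sum(TAU_TREFOIL, TAU_TREFOIL)
print("tau(square knot) =", tau_square)

# Linear chain: start from the unknot and keep summing Hopf links.
tau_chain = TAU_UNKNOT
for _ in range(4):
    tau_chain = tau_connected_sum(tau_chain, TAU_HOPF)
print("tau(chain of 5 rings) =", tau_chain)
```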
Problem 5. (a) Connecting the unknot U to a knot K1 doesn’t change K1 , i.e.,
K1 #U = K1 . However, connecting the Hopf link H to K1 loops an extra
ring around the strand α1 of K1 (α1 was described in the previous solution);
thus, in any coloring of K1 #H, the color of α1 forces the same color on the
extra ring (why?), implying τ (K1 #H) = τ (K1 ). Since τ (U ) = 3 = τ (H),
we can now verify the formula from Problem 4:
• $\tau(K_1 \# U) = \tau(K_1) = \frac{1}{3}\tau(K_1)\tau(U)$;
• $\tau(K_1 \# H) = \tau(K_1) = \frac{1}{3}\tau(K_1)\tau(H)$.
(b) If τ (K1 #K2 ) > 3, then τ (K1 )τ (K2 ) > 9, i.e., τ (K1 ) > 3 or τ (K2 ) > 3;
so one of K1 or K2 is tricolorable. This is a converse to Exercise 17. To
summarize, K1 #K2 is tricolorable iff one of K1 or K2 is tricolorable. 
(c) According to part (a), a linear chain of n rings, Ln , can be constructed
by consecutively summing n − 1 Hopf links H to the unknot U = L1 ; more-
over, each repetition of this operation preserves the number of tricolorings.
Hence, τ (Ln ) = τ (L1 ) = 3, and Ln is not tricolorable (which can be directly
verified by trying and failing to tricolor a linear chain). 
(d) If Nn is the necklace of n rings, then Nn = Nn #U ; but this is not
useful in calculating τ (Nn ). Moreover, Nn cannot be viewed as a non-trivial
sum of other links. To see this, make two cuts in Nn (in the same ring or
in two different rings) and reconnect the 4 ends in an attempt to decompose
into a connect sum and reconstruct links L1 and L2 with L1 #L2 = Nn . The
result will depend on the choices you make. Check out all possibilities and
conclude that we cannot effectively use the formula for τ from Problem 4.
We have to calculate τ (Nn ) by brute force!
Tricoloring one of Nn ’s crossings forces a unique tricoloring on the ad-
jacent crossings, which in turn forces a unique tricoloring on their adjacent
crossings, and so on and so forth. In order to successfully complete the tri-
coloring of the whole necklace, the tricoloring of the final crossing will have
to be matched as coming from both directions along Nn . You can easily
verify that this happens only when n is divisible by 3. In such a case, the
initial tricoloring of a crossing uniquely determines the whole tricoloring of
Nn . As there are exactly 3! = 6 possible tricolorings (plus 3 monochromatic
colorings) of a crossing, this makes a total of 9 colorings of Nn . The final
answer is τ (Nn ) = 9 if 3 divides n, and τ (Nn ) = 3 otherwise. ♦
Attempting to tricolor the Borromean rings will quickly lead to a con-
tradiction. However, deciding if a general Brunnian link with n components
is tricolorable or counting its tricolorings, will require deeper investigation
(as suggested by the comments to Exercise 4). ♦
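Exercise 18.
\[ \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -1 & -1 \\ 0 & 1 & 2 & 2 \end{pmatrix}. \]
Hence z is a non-leading (free) variable, x = z − 1, and y = 2 − 2z = 2 + z
(as −2 = 1 in $\mathbb{F}_3$). There are 3 choices for z ∈ $\mathbb{F}_3$, each of which completely
determines x and y. Thus, overall there are 3 solutions.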
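
The count can be confirmed by brute force. The sketch below is an illustration only; it assumes that the first matrix above is the augmented matrix of the system 2x + y = 0, x + y + z = 1 over $\mathbb{F}_3$ (the last column holding the right-hand sides), which is how the solution reads it.

```python
from itertools import product

# Try all 27 triples (x, y, z) with entries in F_3 = {0, 1, 2}.
solutions = [
    (x, y, z)
    for x, y, z in product(range(3), repeat=3)
    if (2 * x + y) % 3 == 0 and (x + y + z) % 3 == 1
]

print(len(solutions), "solutions:", solutions)

# Each solution should match the parametrization x = z - 1, y = 2 + z (mod 3).
for x, y, z in solutions:
    assert x == (z - 1) % 3 and y == (2 + z) % 3
```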
Exercise 19. The 3 leading 1s in the non-zero rows determine the 3 leading
variables x1 , x2 , and x3 . Each of the non-leading variables, x4 , x5 , and x6 ,
can be 0, 1, or 2. Thus, overall there are $3^3 = 27$ solutions.
Exercise 23. (a) 0, 0, 0; 1, 1, 1; 2, 2, 2; 0, 1, 2; 2, 0, 1; 1, 2, 0; 0, 2, 1; 1, 0, 2; 2, 1, 0:
3 mono-colorings and 6 tricolorings. The sum is always divisible by 3. 
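
The list in part (a) can be double-checked by enumeration. This short Python sketch (an illustration, with the colors encoded as 0, 1, 2 as in the solution) compares the two descriptions of an allowed crossing: "all three strands the same or all three different" versus "the sum of the three colors is divisible by 3".

```python
from itertools import product

triples = list(product(range(3), repeat=3))   # all 27 colorings of 3 strands

# Rule 1: the three colors are all equal or all different.
by_rule = [c for c in triples if len(set(c)) in (1, 3)]

# Rule 2: the sum of the three colors is 0 modulo 3.
by_sum = [c for c in triples if sum(c) % 3 == 0]

mono = [c for c in by_rule if len(set(c)) == 1]
print(len(mono), "mono-colorings and", len(by_rule) - len(mono), "tricolorings")
print("both rules agree:", by_rule == by_sum)
```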
Problem 6. (a)-(c) Below is the linear system. Of course, you might have
written down the equations in another order or labeled the strands differently.
This will make your matrix A look slightly different: your rows or columns
might be shuffled.
\[
\begin{aligned}
x_1 + x_2 + x_7 &= 0\\
x_1 + x_2 + x_3 &= 0\\
x_3 + x_4 + x_7 &= 0\\
x_4 + x_5 + x_6 &= 0\\
x_1 + x_4 + x_5 &= 0\\
x_3 + x_5 + x_6 &= 0\\
x_2 + x_6 + x_7 &= 0
\end{aligned}
\;\Rightarrow\;
A = \begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 1\\
1 & 1 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 0 & 0 & 1\\
0 & 0 & 0 & 1 & 1 & 1 & 0\\
1 & 0 & 0 & 1 & 1 & 0 & 0\\
0 & 0 & 1 & 0 & 1 & 1 & 0\\
0 & 1 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}
\sim
B = \begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 0\\
0 & 1 & 0 & 2 & 2 & 0 & 1\\
0 & 0 & 1 & 0 & 1 & 1 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 2\\
0 & 0 & 0 & 0 & 1 & 1 & 1\\
0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\]
Performing row operations on A, we find its reduced echelon form B; your
matrix B must too have 5 leading 1s in the top 5 rows, and 2 zero rows at
the bottom.
(d) In general, solutions to the system of equations are the null space of
A. We can always make a monochromatic coloring (e.g., set xi = 2 for all i);
this coloring would certainly be a solution to the system of equations; but
it won’t be a nontrivial tricoloring. The set of monochromatic colorings in
T (D) is the span of the vector with all 1s and is, therefore, 1-dimensional.
Any other solution vector will constitute a nontrivial tricoloring, raising the
dimension of the null space of A to 2 or larger. Equivalently, the echelon
form of A must have more than 1 row of zeros.
Since our B has 2 rows of zeros, our knot 77 is tricolorable! 
(e) Since B has 2 free variables, with 3 choices for each, there are 9
tricolorings of 77 , i.e., τ (77 ) = 9. More generally,
Lemma 1. $\tau(L) = 3^{\dim \mathrm{Null}(A)}$ for a link L with matrix A.
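
Lemma 1 turns counting colorings into a rank computation over $\mathbb{F}_3$. The following Python sketch is one possible way to carry it out for the matrix A of Problem 6; it is an illustration under the assumption that A is the 7 × 7 matrix of the linear system listed in that solution, and the function name `rank_mod_p` is hypothetical. It row-reduces A modulo 3, then confirms the answer by testing all $3^7$ vectors directly.

```python
from itertools import product

# Matrix A of the system in Problem 6 (rows = crossings, columns = strands).
A = [
    [1, 1, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
]

def rank_mod_p(matrix, p=3):
    """Rank of an integer matrix over F_p via Gauss-Jordan elimination."""
    rows = [[v % p for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below position `rank` with a non-zero entry here.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inverse = pow(rows[rank][col], -1, p)      # needs Python 3.8+
        rows[rank] = [(v * inverse) % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                factor = rows[i][col]
                rows[i] = [(a - factor * b) % p
                           for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

nullity = len(A[0]) - rank_mod_p(A)
tau = 3 ** nullity

# Independent brute-force count of all solutions of A x = 0 over F_3.
brute = sum(
    all(sum(a * x for a, x in zip(row, vec)) % 3 == 0 for row in A)
    for vec in product(range(3), repeat=7)
)
print("rank:", rank_mod_p(A), " nullity:", nullity, " tau:", tau, " brute force:", brute)
```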

Exercise 25. A priori, a knot has 2 possible orientations: by following it all

around in one direction, or in the other. Revolving the standard diagrams
of the unknot (with as many twists as you wish), the trefoil, and the figure
eight about a vertical line transforms one orientation to the other, and hence
each of these knots has exactly 1 orientation.

A link L with k components has a priori $2^k$ possible orientations (why?),
some of which may be equivalent. As mentioned earlier, the Hopf link has
only 2 (not 4) non-equivalent⁹ orientations. Revolving the Whitehead link
W about a vertical or horizontal line shows that it has only 1 orientation
(instead of 4). For the Borromean rings, rotations by 120° and a revolution
about a vertical line reduce the $2^3 = 8$ initial orientations to only 2: with
all rings oriented the same way, or with one ring oriented opposite to the
others. Are these orientations really non-equivalent? ♦

Exercise 27. By changing one crossing of the right-hand trefoil T = L+ ,

we obtain the unknot U = L− and the Hopf link H = L0 (cf. Fig. 20).


L+ = T L− = U L0 = H
Figure 20. Trefoil in the skein relation

Exercise 29. From Exercises 27–28 we have:
\[ \frac{1}{t} V_T - t\, V_U = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_H = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr)\sqrt{t}\,(-1 - t^2) = (t - 1)(-1 - t^2). \]
Since $V_U = 1$, this simplifies to $V_T = t(t^2 + 1 - t^3) = t + t^3 - t^4$.
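
The chain of skein computations (unlink, then Hopf link, then trefoil) is mechanical enough to automate. Below is a minimal Python sketch, not taken from the original text, that stores a Jones polynomial as a dictionary in the variable s = √t, so that the key k stands for $t^{k/2}$. The helper names `poly_add`, `poly_mul`, and `solve_for_l_plus` are hypothetical. The sketch only handles the case where $V_{L_-}$ and $V_{L_0}$ are known and $V_{L_+}$ is wanted, which is what Exercises 28 and 29 need.

```python
# Laurent polynomials in s = sqrt(t), stored as {k: coefficient of s**k}.

def poly_add(p, q):
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}

def poly_mul(p, q):
    out = {}
    for k1, c1 in p.items():
        for k2, c2 in q.items():
            out[k1 + k2] = out.get(k1 + k2, 0) + c1 * c2
    return {k: c for k, c in out.items() if c != 0}

def solve_for_l_plus(v_minus, v_zero):
    """Solve (1/t) V+ - t V- = (sqrt(t) - 1/sqrt(t)) V0 for V+.

    Multiplying by t gives V+ = t^2 V- + (t^(3/2) - t^(1/2)) V0,
    which reads s^4 V- + (s^3 - s) V0 in the variable s.
    """
    return poly_add(poly_mul({4: 1}, v_minus),
                    poly_mul({3: 1, 1: -1}, v_zero))

V_U = {0: 1}               # unknot: 1
V_U2 = {1: -1, -1: -1}     # two-component unlink: -t^(1/2) - t^(-1/2)

V_H = solve_for_l_plus(V_U2, V_U)   # Hopf link: L+ = H, L- = U2, L0 = U
V_T = solve_for_l_plus(V_U, V_H)    # trefoil:   L+ = T, L- = U,  L0 = H

print("V_H in powers of sqrt(t):", V_H)
print("V_T in powers of sqrt(t):", V_T)
```

Working in s rather than t keeps all exponents integers, so no fractions are needed; a key of 8, for example, stands for $t^4$.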

Exercise 30. For a link L, the table below lists triplets (L+ , L− , L0 ) entering
in a skein relation with L. Here $X \sqcup Y$ is the disjoint union of X and Y .

Link L                  L+    L−             L0      Jones polynomial $V_L$
two unknots U2          U     U              U2      $-t^{1/2} - t^{-1/2}$
positive Hopf link H    H     U2             U       $-t^{1/2} - t^{5/2}$
right-hand trefoil T    T     U              H       $t + t^3 - t^4$
Figure 8, $4_1$         U     $4_1$          H       $t^2 - t + 1 - t^{-1} + t^{-2}$
Whitehead link W        H     W              U       $t^{-3/2}(-1 + t - 2t^2 + t^3 - 2t^4 + t^5)$
Borromean rings B       B     $H \sqcup U$   W       $-t^3 + 3t^2 - 2t + 4 - 2t^{-1} + 3t^{-2} - t^{-3}$
Square knot S           S     T              T #H    $(t + t^3 - t^4)(t^{-1} + t^{-3} - t^{-4})$
For a knot K, changing the orientation on one strand in a local crossing
picture forces us to change the orientation of the other strand (by tracing
around the knot) and, thus, preserves the crossing type L+ , L− , and L0 ,
and does not affect $V_K$. For a link L, though, $V_L$ may depend on the orientation
of L: you can easily see this for the negative Hopf link $H^-$. Why doesn’t
it matter for the unlink U2 ? We leave it to the reader to decipher which
orientations we have used for the Whitehead link W and the Borromean rings
B above and if $V_W$ and $V_B$ are affected by our choices.
The positive and negative Hopf links H and H − are inequivalent (the fact that they
have different Jones polynomials proves this). The next exercise will help you calculate
their Jones polynomials. Ditto for the Borromean rings.

While the first five examples in the table can be handled one by one
in the listed order, for B and S we need to know the Jones polynomials of
$H \sqcup U$ and T #H, which must be computed separately.
To get $V_{H \sqcup U}$, note that $U_2 = U \sqcup U$. Using the “skein” triplet (L+ = U ,
L− = U , L0 = U2 ), we found earlier that $V_{U_2} = -(1/\sqrt{t} + \sqrt{t})V_U$. Our
calculation generalizes to any disjoint union $L \sqcup U$. Indeed, the corresponding
skein relation is represented by (L+ = L, L− = L, L0 = $L \sqcup U$) (why?), and
\[ \frac{1}{t} V_L - t\, V_L = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_{L \sqcup U}. \]
Algebra manipulations similar to those in the text for $V_{U_2}$ yield
\[ V_{L \sqcup U} = \frac{\bigl(\frac{1}{t} - t\bigr) V_L}{\sqrt{t} - \frac{1}{\sqrt{t}}} = \frac{\bigl(\frac{1}{\sqrt{t}} - \sqrt{t}\bigr)\bigl(\frac{1}{\sqrt{t}} + \sqrt{t}\bigr)}{\sqrt{t} - \frac{1}{\sqrt{t}}}\, V_L = -\Bigl(\frac{1}{\sqrt{t}} + \sqrt{t}\Bigr) V_L. \]
This allows for a painless calculation of $V_{H \sqcup U}$ (and $V_B$) and also establishes
Lemma 2. $V_{L \sqcup U} = -\bigl(\frac{1}{\sqrt{t}} + \sqrt{t}\bigr) V_L$ for any link L.
Finally, $V_S$ for the square knot S can be yanked out through the skein
relation with (S, T, T #H). A faster approach would be to apply Theorem 2
(cf. p. 68), using that S is the connected sum of T and its mirror image $\overline{T}$:
\[ V_S = V_{T \# \overline{T}}(t) \overset{\text{Thm. 2(d)}}{=} V_T(t) \cdot V_{\overline{T}}(t) \overset{\text{Thm. 2(a)}}{=} V_T(t) \cdot V_T(t^{-1}). \; ♦ \]
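
The product in the last line can be expanded by hand or by machine. The Python sketch below is an illustration only: it multiplies $V_T(t)$ by $V_T(t^{-1})$, with polynomials stored as dictionaries {exponent: coefficient}. The helper name `poly_mul` is hypothetical.

```python
def poly_mul(p, q):
    """Multiply two Laurent polynomials given as {exponent: coefficient}."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

V_T = {1: 1, 3: 1, 4: -1}                      # t + t^3 - t^4
V_T_mirror = {-e: c for e, c in V_T.items()}   # Theorem 2(a): replace t by 1/t

V_S = poly_mul(V_T, V_T_mirror)                # Theorem 2(d)

# Print the terms from the highest power of t down to the lowest.
for exponent in sorted(V_S, reverse=True):
    print(f"{V_S[exponent]:+d} * t^{exponent}")
```

The result is symmetric under t → 1/t, consistent with the square knot being the sum of a knot and its own mirror image.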

Problem 7. We can view the unlink Un with n components as the disjoint
union $U_n = U_{n-1} \sqcup U$. According to Lemma 2 from the previous solution,
disjointly adding an unknot U to a link L multiplies $V_L$ by $\bigl(-\frac{1}{\sqrt{t}} - \sqrt{t}\bigr)$.
Starting with the unknot U1 = U and applying this procedure n − 1 times
results in Un . Since $V_{U_1} = V_U = 1$, we arrive at
\[ V_{U_n} = \Bigl(-\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr)^{n-1} V_{U_1} = (-1)^{n-1}\Bigl(\sqrt{t} + \frac{1}{\sqrt{t}}\Bigr)^{n-1}. \]
Draw the diagram of the linear chain Ln of n rings as in Figure 23,
ignoring orientation. Now orient counter-clockwise all rings of Ln : this makes
all crossings positively-oriented. Skeining on an end crossing, $L_+ = L_n$,
$L_- = L_{n-1} \sqcup U$, and $L_0 = L_{n-1}$. Substituting $V_{L_{n-1} \sqcup U} = V_{L_{n-1}}\bigl(-\frac{1}{\sqrt{t}} - \sqrt{t}\bigr)$
yields $V_{L_n} = V_{L_{n-1}}\bigl(-\sqrt{t} - t^2\sqrt{t}\bigr)$.
But we already knew this! Recall that joining the Hopf link H to any
link L loops an extra ring around a component of L, so that we can view
Ln inductively as $L_n = L_{n-1} \# H$, and the Jones polynomials consequently
multiply: $V_{L_n} = V_{L_{n-1}} \cdot V_H = V_{L_{n-1}}\bigl(-\sqrt{t} - t^2\sqrt{t}\bigr)$, where L1 = U . Thus,
\[ V_{L_n} = \bigl(-\sqrt{t} - t^2\sqrt{t}\bigr)^{n-1} V_{L_1} = (-1)^{n-1}\bigl(t^{1/2} + t^{5/2}\bigr)^{n-1}. \]

Finding Vn1 for the knots n1 from page 59 requires a more intricate in-
ductive reasoning with skein triplets. To completely understand this solution
will require familiarity with recursive sequences.

Let Tn denote the link with 2 components that twist n times around each
other. Consider first the case for n odd. Skeining on any crossing, check that
$L_+ = n_1$, $L_- = (n-2)_1$, and $L_0 = T_{n-1}$. In turn, skeining on $T_{n-1}$ yields
$L_+ = T_{n-1}$, $L_- = T_{n-3}$, and $L_0 = (n-2)_1$. For simplicity, write $a_n = V_{n_1}$
and $b_n = V_{T_n}$. Therefore,
\[ \tfrac{1}{t}\, a_n - t\, a_{n-2} = (t^{1/2} - t^{-1/2})\, b_{n-1}; \]
\[ \tfrac{1}{t}\, b_{n-1} - t\, b_{n-3} = (t^{1/2} - t^{-1/2})\, a_{n-2}. \]
Solve for $b_{n-1}$ from the first equation and then shift down the indices in the
result to obtain an expression for $b_{n-3}$ too. Substitute these findings into
the second equation to eliminate all $b_k$’s and derive a “symmetric” recursive
relation involving the $a_k$’s alone:
\[ a_n - (t^3 + t)\, a_{n-2} + t^4 a_{n-4} = 0 \;\Rightarrow\; a_n - t^3 a_{n-2} = t\,(a_{n-2} - t^3 a_{n-4}). \]
The last representation rolls down to the lowest possible index n = 5:
\[ a_n - t^3 a_{n-2} = t^{(n-3)/2}(a_3 - t^3 a_1) \;\Rightarrow\; a_n = t^3 a_{n-2} + t^{(n-3)/2}(t - t^4), \]
where $a_3 = V_T = t + t^3 - t^4$ and $a_1 = V_U = 1$. Rolling down the last equation
to the lowest possible index n = 3 results in a direct formula for the $a_n$’s:
\[ (1)\quad V_{n_1} = a_n = t^{\frac{n-1}{2}}\bigl[\,t^{n-1} + (1 + t^2 + t^4 + \cdots + t^{n-3})(1 - t^3)\bigr]. \]
Using a geometric series, we can rewrite (1) in a closed form as
\[ (2)\quad V_{n_1} = a_n = \frac{t^{n+2} - t^{n+1} - t^3 + 1}{1 - t^2}\; t^{\frac{n-1}{2}}. \]
The compact formula (2) is cumbersome to work with, as it requires long
division. Since n is odd, it is evident from the direct formula (1) that $V_{n_1}$
is an ordinary polynomial with positive integer powers of t and coefficients
±1. For example, one can check that $V_{5_1} = -t^7 + t^6 - t^5 + t^4 + t^2$. ♦
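
For odd n, the recursion $a_n = t^3 a_{n-2} + t^{(n-3)/2}(t - t^4)$ and the direct formula (1) can be compared mechanically. The Python sketch below is an illustration only, with hypothetical helper names; polynomials in t are stored as dictionaries {exponent: coefficient}.

```python
def poly_add(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + c
    return {e: c for e, c in out.items() if c != 0}

def poly_mul(p, q):
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[e1 + e2] = out.get(e1 + e2, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}

def a_recursive(n):
    """a_n for odd n, from a_1 = 1, a_3 = t + t^3 - t^4 and the recursion."""
    if n == 1:
        return {0: 1}
    a = {1: 1, 3: 1, 4: -1}                      # a_3 = V_T
    for m in range(5, n + 1, 2):
        # a_m = t^3 * a_{m-2} + t^((m-3)/2) * (t - t^4)
        a = poly_add(poly_mul({3: 1}, a),
                     poly_mul({(m - 3) // 2: 1}, {1: 1, 4: -1}))
    return a

def a_direct(n):
    """a_n for odd n, from the direct formula (1)."""
    geometric = {e: 1 for e in range(0, n - 2, 2)}   # 1 + t^2 + ... + t^(n-3)
    bracket = poly_add({n - 1: 1}, poly_mul(geometric, {0: 1, 3: -1}))
    return poly_mul({(n - 1) // 2: 1}, bracket)

for n in range(1, 16, 2):
    print(n, a_recursive(n) == a_direct(n))
print("V_{5_1}:", a_direct(5))
```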
We leave the case of $n_1$ with n even to the reader. Note that the bottom
crossing in all such knots is special: changing it unravels the whole $n_1$ into
the unknot U . The final answer is: $V_{n_1} = (t^3 + t - t^{5-n} + t^{2-n})/(t + 1)$. ♦
Problem 8. 41 is amphichiral: it takes 8 Reidemeister moves to show it. ♦
Exercise 33. Let P1 → P2 be a Reidemeister move, where P1 and P2 are
the parts of the diagrams D1 and D2 affected by the move (as on p. 53). It
suffices to show that $\overline{P_1} \to \overline{P_2}$ is again a Reidemeister move, where $\overline{P_1}$ and
$\overline{P_2}$ are the mirror images of P1 and P2 . ♦
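Exercise 34. For a knot K and its mirror image $\overline{K}$, $V_{\overline{K}}(t) = V_K(t^{-1})$
(cf. Theorem 2(a)). Indeed, if links $L_+$, $L_-$, and $L_0$ satisfy the skein relation,
then $\overline{L_+}$, $\overline{L_-}$, and $\overline{L_0}$ also satisfy the skein relation, but with $\overline{L_+}$ and $\overline{L_-}$
playing opposite roles. Thus
\[ (3)\quad \frac{1}{t} V_{L_+}(t) - t\, V_{L_-}(t) = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_{L_0}(t) \]
\[ (4)\quad \Rightarrow\; \frac{1}{t} V_{\overline{L_-}}(t) - t\, V_{\overline{L_+}}(t) = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_{\overline{L_0}}(t) \]

Switching $t \to t^{-1}$ and multiplying by −1 yields the mirror image of (3):
\[ (5)\quad \frac{1}{t} V_{\overline{L_+}}(t^{-1}) - t\, V_{\overline{L_-}}(t^{-1}) = \Bigl(\sqrt{t} - \frac{1}{\sqrt{t}}\Bigr) V_{\overline{L_0}}(t^{-1}). \]
Thus, if you know that the desired statement is true for the Jones polynomials
of two of $L_+$, $L_-$, and $L_0$, you can deduce the statement for the third
mirror image too. For instance, if you know that $V_{\overline{L_+}}(t^{-1}) = V_{L_+}(t)$ and
$V_{\overline{L_-}}(t^{-1}) = V_{L_-}(t)$, then the LHS’s of (3) and (5) are identical, forcing their
RHS’s to be identical too, i.e., $V_{\overline{L_0}}(t^{-1}) = V_{L_0}(t)$.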
Exercise 35. Orient the Celtic knot C as in Figure 21a. From here, the
Jones polynomial is $V_C = -t^{-9/2} - t^{-5/2} + t^{-3/2} - t^{-1/2}$. ♦


L− = C L+ = H − L0 = U
Figure 21. Celtic knot C in the skein relation

As predicted by Theorem 2(c), $V_C$, $V_{U_2}$, $V_H$, and $V_W$ contain $\sqrt{t}$, while
VT , VW , VB , and VS have only integer powers of t (why?).


L+ = R L− = L 3 L0 = U2
Figure 22. Russian “wedding” knot R in the skein relation

Orienting the Russian “wedding” ring R as in Figure 22a, and skeining
on its bottom crossing yields $\frac{1}{t} V_R - t\, V_{L_3} = (\sqrt{t} - \frac{1}{\sqrt{t}}) V_{U_2}$. Skeining on
L3 (cf. Fig. 23) yields $\frac{1}{t} V_{H \sqcup U} - t\, V_{L_3} = (\sqrt{t} - \frac{1}{\sqrt{t}}) V_H$. Eliminating the
$t V_{L_3}$ from the equations and applying our formula $V_{H \sqcup U} = -(\sqrt{t} + 1/\sqrt{t}) V_H$
results in $V_R = t^4 + t^2 + 2$. There is no surprise that all
powers are integer here (why?). But is there an explanation for why the
exponents are all non-negative, i.e., that $V_R$ is an ordinary polynomial? ♦

L− = L 3 L+ = H  U L0 = H
Figure 23. Oriented 3-ring linear chain in the skein relation
Session 4

Multiplicative Functions. Part I

The Infinite-Raffle Challenge
Zvezdelina Stankova

Sneak Preview. To enter Multiplicative Land, we’ll have to get tickets from
an infinite raffle. While walking through villages of relatively prime numbers
and fields of perfect squares, while examining prime decompositions of castles and
crossing geometric series rivers, we will be constantly searching for ways to win
this raffle game. To this end, we will make friends with the two-faced duke, the
function ε, and the princes of divisors, τ and σ; we will meet their sum-function
relatives Sε , Sτ , and Sσ , and realize just how contagious multiplicativity is! In-
voking the strength of induction, we will eventually emerge victorious with a
winning raffle ticket, only to discover that even deeper challenges await us in this
Multiplicative Land of Dirichlet, Möbius, and Euler, in Part II.
A beginner with some basic knowledge from Number Theory I will be well-
equipped to follow our journey. The advanced reader can study the summarizing
Figure 1, hop quickly to the olympiad-hurdle Problem 8, and upon clearing it,
plunge directly into Part II, the intermediate-level continuation.

1. Infinite Raffle: the Initial Setup

Suppose we buy several tickets from an infinite raffle, that is, a lottery
with infinitely many tickets. Each ticket has some natural number written
on it. We have a favorite number in mind, say, 2009, and we would like to get
a ticket with that number on it. But will there necessarily be a ticket with
2009 on it? Of course, it depends on which particular numbers are written
and how they are distributed among the raffle tickets.

Here is one interesting way of doing just that.

Problem 1. (∞-Raffle) There are infinitely many tickets, each with one
natural number on it. For any n ∈ N the number of tickets on which divisors
of n are written is exactly n. For example, the divisors of 6, {1, 2, 3, 6}, are
written in some variation on 6 tickets, and no other ticket has these numbers
written on it. Prove that any n ∈ N is written on at least one ticket.

According to Problem 1, our number¹ 2009 will indeed appear on some
ticket. But why is that so and how can we prove it?

1.1. Initial exploration. Let’s mess a bit with some initial data to get a
feeling for ∞-Raffle. Try to solve the first cases for n = 1, 2, 3, 4 on your own
before reading the ensuing discussion below.
• The easiest number to be tackled is obviously n = 1: it has to be
written on exactly 1 ticket since {1} constitutes all divisors of 1.
• The next number is n = 2: its divisors {1, 2} must be written on a
total of 2 tickets; we just found out that 1 is written on exactly 1
ticket, so that 2 has no choice but to appear on the remaining 1 ticket
and on no more tickets.
• We apply the same analysis for n = 3: its divisors {1, 3} must be
written on a total of 3 tickets; as 1 is already known to occupy exactly
1 ticket, 3 must appear on exactly 2 tickets.
• For n = 4 the situation is marginally more exciting: the divisors
{1, 2, 4} must be written on a total of 4 tickets; knowing that each
of 1 and 2 is written on a unique ticket, 4 must appear on the remain-
ing 2 tickets.
The reader has probably gathered by now that,
PST 23. In order to solve ∞-Raffle, i.e., to prove that every number appears

on at least 1 ticket, we must do something more: we need to introduce a
“stronger” object, a function R(n) that counts the exact number of tickets on
which n appears.
The function R (for “Raffle”) suggested by PST 23 is the main player in
the solution to the ∞-Raffle Problem. We already know its first few values:
R(1) = 1, R(2) = 1, R(3) = 2, and R(4) = 2.
To find out on how many tickets 5 is written, we just calculate R(5): the
divisors of 5 are {1, 5}, written on a total of 5 tickets, so that
R(1) + R(5) = 5 ⇒ 1 + R(5) = 5 ⇒ R(5) = 4.
Similarly, the divisors {1, 2, 3, 6} of 6 pro-
duce an equation for the total number 6
of tickets on which they appear:
R(1) + R(2) + R(3) + R(6) = 6 ⇒ 1 + 1 + 2 + R(6) = 6 ⇒ R(6) = 2.
Exercise 1. Continue with the above calculations up to n = 10 to find out
that R(7) = 6, R(8) = 4, R(9) = 6, and R(10) = 4. The impatient reader
should keep on calculating R(n) until at least n = 20 to see if a pattern for
the function R pops up.
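
The hand calculations above can be automated. The following is a minimal sketch in Python, under the assumption that R(n) is obtained exactly as in the text: n tickets in total carry divisors of n, and the tickets already taken by the proper divisors are subtracted. The helper name `raffle_counts` is ours, not the book's.

```python
def raffle_counts(limit):
    """Return a list R where R[n] is the number of tickets carrying n (R[0] is unused)."""
    R = [0] * (limit + 1)
    for n in range(1, limit + 1):
        # Tickets carrying the proper divisors of n were counted in earlier steps.
        used = sum(R[d] for d in range(1, n) if n % d == 0)
        # The raffle condition says the divisors of n occupy exactly n tickets.
        R[n] = n - used
    return R

R = raffle_counts(20)
print(R[1:11])  # [1, 1, 2, 2, 4, 2, 6, 4, 6, 4]
```

Printing `R[1:21]` instead gives the values up to n = 20 requested in Exercise 1.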
¹ From now on, “numbers” and “divisors” will refer to natural numbers and divisors,
until we lift this restriction in Part II.

1.2. Brute force bows to general abstract theory. Using the above
method, it should be clear that one can determine R(n) for any n, as long as
all R(k) for smaller k are already calculated. Although this gives one way of
proving that our favorite number 2009 will appear on some ticket (just grind
out all numbers R(1), R(2), . . . , R(2009)), these close-to-insane calculations
are definitely not the way intended by the authors of Problem 1: for one,
calculating R(2009) alone will not prove that every number is written at
least once on the tickets.
In light of our new function R(n), the ∞-Raffle Problem can be para-
phrased to say:
Problem 1′. (∞-Raffle) Show that R(n) ≥ 1 for every n ∈ N.
This is far from a simple task. Interestingly, trying to prove just the
inequality (≥ 1) is much harder than trying to find the exact values R(n) and
compare them to 1. It is also true that, in order to conquer our problem, we
will require much more sophisticated methods than brute-force calculations.
So, here is the plan: for the remainder of the session, we will
PST 24. Step back and look at the ∞-Raffle Problem from different angles,
discover and formalize properties of R(n) along with a bunch of its sibling
 functions, develop a new theory of multiplicative functions to explain all of
the arising phenomena, and ultimately produce an exact formula for R(n).
At every stage of creating this new theory, we will reconsider the ∞-
Raffle Problem, check how it relates to our new discoveries, and describe the
progress we have made on it up to that moment.

Figure 1. ∞-Raffle within the larger picture (diagram of the land M of multiplicative functions; labels: M’s group structure, Dirichlet series $\sum_n f(n)/n^s$, Riemann zeta-function ζ, Dirichlet product, Möbius function μ, Euler function φ, sum-functions $S_f$, ∞-Raffle)

In mathematics, the overarching PST 24 is referred to as “abstracting

properties” and is used to create new theories that encompass broad collec-
tions of objects with common properties. A prime example of this approach
is the very topic of abstract algebra, a mandatory upper-division course for
every math major. The word “abstract” should not, however, deceive you:
even abstract theories can have abundant applications in practice. For in-
stance, the Rubik’s Cube sessions in this book series present a particular
application of abstract algebra to a concrete problem. Ditto, producing an

exact formula for the ∞-Raffle function R(n) at the end of this session will
come as a direct consequence of the abstract theory of multiplicative func-
tions. The real beauty of the abstraction approach of PST 24 is that, in the
context of these two multiplicative sessions, it will
• lead us to a new and deeper understanding of numbers, functions, and
relations between them, and
• empower us to conquer numerous other difficult problems that we could
not have solved before.
Figure 1 illustrates the richness of the land M of multiplicative functions.
One marked area of the figure summarizes the current session, while a second
marked area, covering six concepts, will be developed in the intermediate-level Part II. Both
sessions contain (different) solutions to the ∞-Raffle puzzle. Part II will ven-
ture into more advanced areas such as M’s group structure and the Dirichlet
series, and touch upon the famous Riemann zeta-function ζ(s). An historical
overview at the end will link six great mathematicians who have contributed
to the topic of multiplicative functions and its various extensions.

2. What are Multiplicative Functions?

2.1. Basic definitions. The first thing to notice about the ∞-Raffle function
R(n) is that it essentially differs from commonly used functions such
as $g(x) = x^2$: the variable x in g(x) is a real number (x ∈ R); in contrast,
the variable n in R(n) is just a natural number (n ∈ N). Thus, R has the
restricted domain of N. Such functions have a special name:
Definition 1. A function f : N → C is called arithmetic.
Here C is the set of complex numbers. If you don’t feel comfortable with C,
for now you can safely replace it with the set of integers Z. For instance,
R(n) is arithmetic because R : N → Z. The important thing to remember
about arithmetic functions is that their inputs can only be natural numbers.
Let A denote the set of all arithmetic functions. This is a rather large
set involving all sorts of functions. In these sessions we will concentrate on a
special subset M of A comprised of all multiplicative functions. Why M is
so special can be explained by the fact that it is usually easier to calculate
explicit formulas for multiplicative functions and not so easy for arbitrary
arithmetic functions.²
Definition 2. An arithmetic function f : N → C is multiplicative if for any
relatively prime m, n ∈ N:
(1) f (mn) = f (m)f (n).

² For the advanced reader, M is special on a deeper level partly because it is closed
under the Dirichlet product in A, as we will discover in the Part II continuation.

Recall that m and n are relatively prime if they share no common divisor
other than 1. For example, 9 and 20 are relatively prime, but 9 and 6 are
not. Thus, any multiplicative function must satisfy f (180) = f (9)f (20) and
f (1) = f (1)f (1) but not necessarily f (54) = f (9)f (6) (why?). While it is
obvious that the name “multiplicative” is inspired by equation (1), it is not
immediately clear why relative primeness should be involved at all.

2.2. Trivial examples. Looking at Definition 2, our first impulse is to

construct very simple examples of multiplicative functions f that always
satisfy (1), regardless of whether m and n are relatively prime or not. One
such obvious example is f (n) = n for all n (check it!). It is called the identity
function since it returns the same output as the input n; thus,
id(n) = n for all n ∈ N.
Another such trivial example is $f(n) = n^2$, as
$f(mn) = (mn)^2 = m^2 n^2 = f(m)f(n)$ for all m, n ∈ N.
For that matter, any power function $f(n) = n^a$ (for a fixed a ∈ R) is also
multiplicative (why?). Such functions will be called strongly³ multiplicative
for the obvious reason that they satisfy (1) for all pairs of numbers (m, n).
Definition 3. An arithmetic function f : N → C is strongly multiplicative if
f (mn) = f (m)f (n) for any m, n ∈ N.

Exercise 2. How about constant functions: are any of them multiplicative?

Solution: If f (n) = c is a constant multiplicative function, then
f (1) = f (1 · 1) = f (1)f (1), so that
$c = c^2 \Rightarrow c(c - 1) = 0 \Rightarrow c = 0$ or $1$.
It is easy to check that the constant functions f (n) = 1 and f (n) = 0 satisfy
Definition 3, making them strongly multiplicative.
The constant function 1 is so important in our upcoming analysis that
we give it the special name ι; thus,
ι(n) = 1 for all n ∈ N.
As a bonus, we learn from the above solution that for any multiplicative
function f we must have f (1) = 1 or f (1) = 0 (as these are the only numbers
making the equation f (1) = f (1) · f (1) work). Further, if f (1) = 0 then
f (n) = f (n · 1) = f (n) · f (1) = 0 for all n ∈ N,
and the whole function is 0, denoted by f = O. From this we see that all
interesting cases of multiplicative functions have f (1) = 1.
³ In the literature, also referred to as totally or completely multiplicative.

2.3. Semi-trivial examples. If we want to merge the two constant functions
ι and O into a single “hybrid” multiplicative function, we could define
$$\varepsilon(n) = \begin{cases} 1 & \text{if } n = 1;\\ 0 & \text{if } n \ge 2. \end{cases}$$
Check a couple of easy cases in order to
Exercise 3. Verify that ε(n) is a strongly multiplicative function.
But why would anyone want to consider such a two-value multiplicative
function? This will become transparent later in Part II where we define
Dirichlet product on the set of arithmetic functions A. While we are still on
the topic of strong multiplicativity, try the following preparatory problem:

Exercise 4. Describe all strongly multiplicative functions. Among them, are
there any functions (other than ε(n)) that attain only two distinct values?

Even though the definition of strong multiplicativity does not involve (at
least on the surface) relative primeness, the solution to Exercise 4 heavily
depends on the notion of the prime decomposition for any n ∈ N:
(2) $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}$,
where $p_1, p_2, \ldots, p_r$ are the distinct prime divisors of n and $a_1, a_2, \ldots, a_r$ are
the corresponding positive exponents.
Partial Solution to Exercise 4: Let f be strongly multiplicative.
Definition 3 then allows us to split f (n) along any divisors of n. For example,
for a prime power $p^a$ we can split as follows:
$$f(p^a) = f(\underbrace{p \cdot p \cdots p}_{a}) = \underbrace{f(p) \cdot f(p) \cdots f(p)}_{a} = (f(p))^a.$$
More generally, we can split f (n) along the prime decomposition of n:
$$n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} \;\Rightarrow\; f(n) = (f(p_1))^{a_1} (f(p_2))^{a_2} \cdots (f(p_r))^{a_r}.$$
Thus, to completely know f we need to know only the values f (p) for any
prime p. These values can be arbitrarily assigned, as long as f (1) = 0 or 1
(why?). Of course, if we set f (1) = 0, we end up with the constant function O.
We summarize: any strongly multiplicative function f is either the 0-function
O, or it is constructed in the following way. For any prime number
$p_i$ we arbitrarily choose a (complex) number $b_i$, set $f(p_i) = b_i$, and expand
along the prime decomposition of n; that is, we define
$$f(n) = f(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) := b_1^{a_1} b_2^{a_2} \cdots b_r^{a_r}.$$ ♦

For instance, if we set $f(p_i) = p_i^2$ for all primes $p_i$, we get back the square
function $f(n) = p_1^{2a_1} p_2^{2a_2} \cdots p_r^{2a_r} = n^2$. If we set all $f(p_i) = 1$, we get
back the constant function ι(n) = 1. And finally, if we set all $f(p_i) = 0$ but
insist on f (1) = 1, we get back the hybrid function ε(n). Needless to add,
all of these functions are strongly multiplicative, as observed earlier.
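
The construction just summarized can be tried out directly. Below is a minimal sketch in Python; the helper names `prime_factorization` and `strongly_multiplicative` are hypothetical, and the factorization uses plain trial division, which is an implementation choice of ours. The O function is left out, since it is the only case with f (1) = 0.

```python
def prime_factorization(n):
    """Return {prime: exponent} for a natural number n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def strongly_multiplicative(value_at_prime):
    """Build f with f(p) = value_at_prime(p), expanded along prime decompositions."""
    def f(n):
        result = 1  # empty product, so f(1) = 1
        for p, a in prime_factorization(n).items():
            result *= value_at_prime(p) ** a
        return result
    return f

square = strongly_multiplicative(lambda p: p * p)   # f(p) = p^2 gives n^2
iota = strongly_multiplicative(lambda p: 1)         # f(p) = 1 gives the constant 1
epsilon = strongly_multiplicative(lambda p: 0)      # f(p) = 0 with f(1) = 1 gives epsilon

print(square(12), iota(12), epsilon(1), epsilon(12))  # 144 1 1 0
```

Any other choice of prime values, for instance `lambda p: -1`, produces another strongly multiplicative function in the same way.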

2.4. “Truly” multiplicative examples. As promised in the introduction,

we pause here to reassess the situation with the ∞-Raffle function R(n).
Question 1. Is R(n) strongly multiplicative?
Answer: Recall the initial data we found, R(1) = R(2) = 1, R(3) =
R(4) = R(6) = 2, and R(5) = 4. For strong multiplicativity, we need to
have R(4) = R(2) · R(2), but this is false as 2 ≠ 1 · 1. We have finally come
across a function which is not strongly multiplicative!
Question 2. Is R(n) at least multiplicative?
From the above values of R(n) it looks like R(n) still has a shot at being
multiplicative. For instance, since 2 and 3 are relatively prime, we must have
R(6) = R(2) · R(3), which happens to be true: 2 = 1 · 2. In fact, a main
goal of Parts I and II will be to prove that R(n) is multiplicative. Alas, we
are not equipped to do that yet, so be patient until we develop our theory
up to the necessary level.
As we leave R(n) in peace for now, can we think of other examples of
“truly” multiplicative functions, that is, functions that satisfy equation (1)
for relatively prime pairs (m, n) yet not for all pairs (m, n)? Such examples
are hard to come up with unless you have already seen some before.
Problem 2. Let n ∈ N. Define functions τ, σ, π : N → N as follows:
(a) τ (n) = the number of all divisors of n;
(b) σ(n) = the sum of all divisors of n;
(c) π(n) = the product of all divisors of n.
Prove that τ and σ are multiplicative functions, while π is not.
To make sure we are on the same page, here are the values of the three
functions for n = 6: τ (6) = 4, σ(6) = 1 + 2 + 3 + 6 = 12, and π(6) =
1 · 2 · 3 · 6 = 36. I can think of at least two different ways of doing each part
of Problem 2. Hence, you should first try really hard on your own before
peeking at the solution below.
In view of what we want to prove (or disprove), what is the first non-
trivial case to confirm (or deny) that a function is multiplicative? The small-
est non-trivial relatively prime numbers are 2 and 3. Therefore, it makes
sense to check if f (2)f (3) = f (6) for each of our functions:
• τ (2)τ (3) = 2 · 2 = 4 = τ (6): checks out! 
• σ(2)σ(3) = (1 + 2)(1 + 3) = 12 = σ(6): checks out!
• π(2)π(3) = 2 · 3 = 6 ≠ 36 = π(6): aha! π fails multiplicativity at the very
first possible non-trivial instance!  Part (c) is done. 
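
The three checks above are easy to reproduce by machine. Here is a minimal brute-force sketch in Python; the names `divisors`, `tau`, `sigma`, and `pi_div` are ours, and listing divisors by trial division is only sensible for small n.

```python
from math import prod

def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def tau(n):
    return len(divisors(n))    # number of divisors

def sigma(n):
    return sum(divisors(n))    # sum of divisors

def pi_div(n):
    return prod(divisors(n))   # product of divisors

for name, f in [("tau", tau), ("sigma", sigma), ("pi", pi_div)]:
    # Compare f(2) * f(3) with f(6) for the relatively prime pair (2, 3).
    print(name, f(2) * f(3), f(6))
# tau 4 4
# sigma 12 12
# pi 6 36
```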
Before we devise a formal solution to parts (a)–(b), let’s do something
slightly “illegal”: let’s

 PST 25. “Prove” that τ and σ are multiplicative via a specific example. The
chosen example must be representative enough to illustrate the involved ideas
and PSTs so as to allow us later to generalize our solution to all cases.
Some initial trials lead us to choose the simple (but general enough) case
of relatively prime m = 5 and n = 6. The divisors of 5, 6, and 5 · 6 = 30 are
{1, 5}, {1, 2, 3, 6}, and {1, 2, 3, 6, 5, 10, 15, 30}, respectively. The key question
is: how can we obtain the divisors of 30 by using only the divisors of 5 and 6?
After staring at the data for a while, the reader is likely to notice that the
divisors of 30 are all the pairwise products of the divisors of 5 and 6:
(3) {1, 2, 3, 6, 5, 10, 15, 30} = {1·1, 1·2, 1·3, 1·6, 5·1, 5·2, 5·3, 5·6}.
For starters, this means that the desired multiplicative relation among the
number of divisors of 5, 6, and 30 is satisfied: τ (30) = τ (5)τ (6) (8 = 2·4).
Moreover, we can calculate the divisor-sum σ(30) in two ways, using the
usual distributivity property:
σ(5)σ(6) = (1 + 5)(1 + 2 + 3 + 6)
= 1·1 + 1·2 + 1·3 + 1·6 + 5·1 + 5·2 + 5·3 + 5·6
= 1 + 2 + 3 + 6 + 5 + 10 + 15 + 30 = σ(30).
There is no obstruction to generalizing the above “proof-by-example” to all
cases, as long as the idea of pairwise products in (3) holds for any relatively
prime m and n. This is a well-known fact from number theory:
Lemma 1. For any numbers m and n, the divisors of mn are all pairwise

products of divisors of m and n. If m and n are relatively prime, then all
such products are distinct. In particular, the number of divisors of mn is the
product of the numbers of divisors of m and of n: τ (mn) = τ (m)τ (n).
We leave the reader to come up with a rigorous proof of Lemma 1 (cf. Hints
section). Note that Lemma 1 shows de facto that τ is multiplicative. ♦
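
As a sanity check of Lemma 1 (not a proof), the pairwise products can be listed by machine. This is a small sketch in Python with hypothetical helper names; it contrasts the relatively prime pair (5, 6) with the pair (2, 4), where the products still cover all divisors of 8 but some of them repeat.

```python
def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def pairwise_products(m, n):
    """All products c * d with c a divisor of m and d a divisor of n, repeats kept."""
    return [c * d for c in divisors(m) for d in divisors(n)]

# Relatively prime case: the products are distinct and are exactly the divisors of 30.
print(sorted(pairwise_products(5, 6)) == divisors(30))   # True

# Not relatively prime: same set of divisors of 8, but with repetitions.
products = pairwise_products(2, 4)
print(sorted(products))                                  # [1, 2, 2, 4, 4, 8]
print(sorted(set(products)) == divisors(8))              # True
```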

It remains only to prove that σ is multiplicative, which we do by formalizing
the earlier calculations for σ(30). For the remainder of Part I and
in Part II, unless otherwise stated, let $\{c_1, c_2, \ldots, c_s\}$ and $\{d_1, d_2, \ldots, d_r\}$
denote the divisors of m and of n, respectively. Thus, τ (m) = s, τ (n) = r,
$\sigma(m) = c_1 + c_2 + \cdots + c_s$, and $\sigma(n) = d_1 + d_2 + \cdots + d_r$.

Solution to Problem 2(b): For relatively prime m and n, Lemma 1
has established that the divisors of mn are the pairwise products $\{c_i d_j\}$
(where 1 ≤ i ≤ s and 1 ≤ j ≤ r) and that these products are all distinct.
Then by definition, σ(mn) is the sum of all products $c_i d_j$:
(4) $\sigma(mn) = c_1 d_1 + c_1 d_2 + \cdots + c_i d_j + \cdots + c_s d_r$
(5) $= (c_1 + c_2 + \cdots + c_s)(d_1 + d_2 + \cdots + d_r) = \sigma(m)\sigma(n)$.
Therefore, σ is multiplicative.

It is worth verifying that neither τ nor σ is strongly multiplicative, as
they miserably fail the very first non-trivial case: τ (4) = 3 ≠ 2 · 2 = τ (2)τ (2)
and σ(4) = 7 ≠ 3 · 3 = σ(2)σ(2). If you fast-forward to Figure 2 (p. 95), you
will see that τ and σ are correspondingly placed in the middle “ring” of the
diagram, meaning that they belong to M but not to S, the set of strongly
multiplicative functions.

2.5. A taste of prime power. At this point, multiplicativity may still

seem like a nice abstraction of no practical value. How deceiving! Once you
know that a function is multiplicative, you can do wonders with it:
PST 26. You can reduce the question of finding a direct formula for a
multiplicative function f (n) to finding such a formula only in the case when
n is a prime power $p^a$. Namely, you can split f (n) along the prime powers
$p_i^{a_i}$ from the prime decomposition of n:
(6) $f(n) = f(p_1^{a_1}) f(p_2^{a_2}) \cdots f(p_r^{a_r})$,
and now look for a direct formula just in the prime-power case $f(p^a)$.

Equation (6) follows from the multiplicativity of f and the fact that $p_i^{a_i}$
and $p_j^{a_j}$ are relatively prime for distinct primes $p_i$ and $p_j$. We cannot
split f (n) any further since the divisors of a prime power $p^a$, excluding 1,
are not relatively prime. Thus, (6) is the finest splitting which applies to
all multiplicative functions.

Margin figure: for f ∈ M, f (10!) splits into $f(2^8)$, $f(3^4)$, $f(5^2)$, and $f(7^1)$.

Using the prime-power reduction in PST 26,

Exercise 5. Split f (10!) into as fine a product as possible for any f ∈ M.
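
To see the prime-power splitting (6) at work on a number other than 10!, here is a minimal sketch in Python. The helper name `prime_power_parts` is ours, the sample value n = 360 is an arbitrary choice, and σ is computed by brute force so that the splitting is tested rather than assumed.

```python
def prime_power_parts(n):
    """Return the prime powers p^a from the prime decomposition of n."""
    parts = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            power = 1
            while n % p == 0:
                power *= p
                n //= p
            parts.append(power)
        p += 1
    if n > 1:
        parts.append(n)
    return parts

def sigma(n):
    """Sum of all divisors of n, by brute force."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

n = 360
parts = prime_power_parts(n)
product = 1
for q in parts:
    product *= sigma(q)   # multiply the prime-power pieces, as in equation (6)

print(parts, product, sigma(n))  # [8, 9, 5] 1170 1170
```

Calling `prime_power_parts` on 10! (for example via `math.factorial(10)`) lists the pieces asked for in Exercise 5.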
Problem 3. Derive the following representations of τ and σ:
$$\text{(a)}\ \tau(n) = \prod_{i=1}^{r} (a_i + 1); \qquad \text{(b)}\ \sigma(n) = \prod_{i=1}^{r} \frac{p_i^{a_i+1} - 1}{p_i - 1}.$$

The notation in this problem calls for a short detour. By now, we have
carefully avoided the symbols $\sum$ and $\prod$, but it is high time that we stop
beating about the bush and re-introduce them, as they will substantially
shorten our presentation and clarify calculations. The notation $\sum_{i=1}^{r}$ simply
means “take the sum of all terms indexed by i = 1, 2, . . . , r.” For instance,
we could have written the initial definition of σ in two equivalent ways:
$$\sigma(n) = \sum_{i=1}^{r} d_i = \sum_{d|n} d,$$

where the notation d|n stands for “d divides n.” The first summation is read
as “add all $d_1, d_2, \ldots, d_r$”; while the second summation: “add all d’s for
which d|n,” or in other words, “add all divisors d of n.” Likewise, equations
(4)–(5) on σ’s multiplicativity can be succinctly rewritten as:
$$\sigma(m)\sigma(n) = \Big(\sum_{i=1}^{s} c_i\Big)\Big(\sum_{j=1}^{r} d_j\Big) \overset{\text{distr.}}{=} \sum_{i,j} c_i d_j \overset{\text{Lem.1}}{=} \sigma(mn).$$

Notice the double-index “i, j” in the last summation: when bounds for i
and j are not explicitly written, it is assumed that i and j run over all
possibilities. Using the divisor notation, we can rewrite the above calculation
in yet another way that may at first look confusing; but ultimately, it is most
advantageous for multiplicative functions:
$$\sigma(m)\sigma(n) = \Big(\sum_{c|m} c\Big)\Big(\sum_{d|n} d\Big) \overset{\text{distr.}}{=} \sum_{c|m,\,d|n} cd \overset{\text{Lem.1}}{=} \sum_{e|mn} e = \sigma(mn).$$
The notation $\prod_{i=1}^{r}$ is analogous: just take the product of all terms
indexed by i. Thus, the function π can be written as $\pi(n) = \prod_{i=1}^{r} d_i = \prod_{d|n} d$.
Conversely, the desired formula for τ (n) in Problem 3 can be expanded as
$\tau(n) = (a_1 + 1)(a_2 + 1) \cdots (a_r + 1)$.

With this said, we can go back to proving our formulas for τ and σ.

Solution to Problem 3: The prime-power reduction of PST 26 teaches
us that, to find general formulas for τ and σ, we need only to find formulas
for $\tau(p^a)$ and $\sigma(p^a)$. Since the divisors of $p^a$ are $\{1, p, p^2, \ldots, p^a\}$, a total of
a + 1 divisors, by definition,
(7) $\tau(p^a) = a + 1$ and $\sigma(p^a) = 1 + p + p^2 + \cdots + p^a = \frac{p^{a+1} - 1}{p - 1}.$
The last formula is known as the sum of a finite geometric series with first
term 1, ratio p, and total number of terms a + 1.⁴ At any rate, by PST 26
we can patch the prime-power pieces from (7) into general formulas:
• $\tau(n) = \tau(p_1^{a_1})\tau(p_2^{a_2}) \cdots \tau(p_r^{a_r}) = (a_1 + 1)(a_2 + 1) \cdots (a_r + 1)$;
• $\sigma(n) = \sigma(p_1^{a_1})\sigma(p_2^{a_2}) \cdots \sigma(p_r^{a_r}) = \frac{p_1^{a_1+1} - 1}{p_1 - 1} \cdot \frac{p_2^{a_2+1} - 1}{p_2 - 1} \cdots \frac{p_r^{a_r+1} - 1}{p_r - 1}.$

An excellent illustration of these findings is provided by our number
n = 2009. Its prime decomposition $2009 = 7^2 \cdot 41$ instantaneously yields
• τ (2009) = (2 + 1)(1 + 1) = 6, and
• $\sigma(2009) = \frac{7^3 - 1}{7 - 1} \cdot \frac{41^2 - 1}{41 - 1} = 57 \cdot 42 = 2394$.
⁴ We encountered the geometric series formula in the Stomp session in Volume I. To
prove it, multiply both sides of (7) by p − 1 to clear the denominator, expand the resulting
product on the LHS, and cancel just about everything in sight, arriving at the RHS:
$$(1 + p + p^2 + \cdots + p^a)(p - 1) = (p + p^2 + p^3 + \cdots + p^a + p^{a+1}) - (1 + p + p^2 + p^3 + \cdots + p^a) = p^{a+1} - 1.$$

If you don’t believe this, calculate by brute-force the number and the sum
of all divisors of 2009 and compare answers.
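
The suggested brute-force comparison takes only a few lines. The sketch below, in Python with a helper name of our own choosing, lists the divisors of 2009 directly and sets them against the two closed-form values computed from $2009 = 7^2 \cdot 41$.

```python
def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

divs = divisors(2009)
print(divs)                  # [1, 7, 41, 49, 287, 2009]
print(len(divs), sum(divs))  # 6 2394

# Closed forms from the prime decomposition 2009 = 7^2 * 41.
tau_formula = (2 + 1) * (1 + 1)
sigma_formula = ((7**3 - 1) // (7 - 1)) * ((41**2 - 1) // (41 - 1))
print(tau_formula, sigma_formula)  # 6 2394
```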

2.6. Non-multiplicative example. Earlier, we found that the product

π(n) of all divisors of n is not a multiplicative function, and hence, the
prime-power reduction formula (6) is powerless here. This certainly does
not prevent us from finding a nice compact formula for π(n):

Problem 4. Prove that π is given by $\pi(n) = n^{\frac{1}{2}\tau(n)}$.
Note how deftly this formula links the two functions τ and π.

Hint: The proof departs from the theme of multiplicative functions, so we
leave it up to the discretion of the reader. The only hint we slip in here is to
study the legendary way Gauss proved (as a child) the summation formula
for the arithmetic series
$$1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2},$$
and to replace appropriately addition with multiplication. ♦
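
Before attempting the proof, the formula of Problem 4 can be tested numerically. The sketch below, in Python, checks the squared form $\pi(n)^2 = n^{\tau(n)}$ so that only integer arithmetic is needed; this rewriting and the bound 200 are our own choices, and the check is evidence rather than a proof.

```python
from math import prod

def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def product_formula_holds(limit):
    """Check pi(n)^2 == n^tau(n) for every n from 1 to limit."""
    return all(prod(divisors(n)) ** 2 == n ** len(divisors(n))
               for n in range(1, limit + 1))

print(product_formula_holds(200))  # True
```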

2.7. Is the ∞-Raffle Problem doable after all? It’s time to pause and
think what this all means for the function R(n). If we eventually do manage
to prove that R(n) is multiplicative, PST 26 will empower us to find a direct
formula for it. For this to work, we will need to
Problem 5. Find a formula for $R(p^a)$ for any prime power $p^a$.

Solution: Let’s check R(n) for the first few powers of p.

• We already know that R(1) = 1.
• The divisors of p are {1, p}. Thus, R(1) + R(p) = p ⇒ R(p) = p − 1.
• The divisors of $p^2$ are $\{1, p, p^2\}$. Thus, $R(1) + R(p) + R(p^2) = p^2$,
⇒ $R(p^2) = p^2 - (p - 1) - 1 = p^2 - p$.
• For $p^3$ we similarly calculate $R(1) + R(p) + R(p^2) + R(p^3) = p^3$,
⇒ $R(p^3) = p^3 - (p^2 - p) - (p - 1) - 1 = p^3 - p^2$.

A pattern surfaces: $R(p^a) = p^a - p^{a-1}$ for a ≥ 1. Having been through the
induction session in Volume I, we could immediately attack our conjecture
by induction on a, but this will be overkill! Here is a shortcut:
For any a ≥ 1, the divisors of $p^a$ are $\{1, p, \ldots, p^{a-1}, p^a\}$. By ∞-Raffle:
$$\underbrace{R(1) + R(p) + \cdots + R(p^{a-1})}_{p^{a-1}} + R(p^a) = p^a.$$

The first a terms correspond to the divisors of $p^{a-1}$, so by ∞-Raffle again,
they add up to $p^{a-1}$ (this is emphasized by the underbrace). Solving for the
last term $R(p^a)$ poses no difficulty: $R(p^a) = p^a - p^{a-1}$.

We have experienced one of the most fundamental, popular, and effective

techniques in solving problems with numbers and sequences:
 PST 27. Check the first few cases of a problem to search for a pattern.
After you find a pattern, prove it directly, by induction, or another method.

We are now ready to derive a direct formula for the general case R(n):
(8) $R(n) = R(p_1^{a_1}) R(p_2^{a_2}) \cdots R(p_r^{a_r})$
(9) $= (p_1^{a_1} - p_1^{a_1-1})(p_2^{a_2} - p_2^{a_2-1}) \cdots (p_r^{a_r} - p_r^{a_r-1})$.
Since each factor $(p^a - p^{a-1}) \ge 1$ (why?), we conclude that the whole product
R(n) ≥ 1. Hence, every number n appears on at least 1 ticket!
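
Formula (9) rests on the multiplicativity of R, which has not been proved at this point, so a numerical comparison is reassuring. The sketch below, in Python, computes R(n) straight from the raffle condition with a sieve and compares it with the product (9) for all n up to 2009. The function names and the sieve arrangement are our own; agreement on a finite range is evidence, not a proof.

```python
def raffle_counts(limit):
    """R[n] from the raffle condition: the R(d) over all divisors d of n sum to n."""
    R = list(range(limit + 1))  # start from R[n] = n
    for d in range(1, limit + 1):
        # R[d] is final here: every proper divisor of d has already been subtracted.
        for multiple in range(2 * d, limit + 1, d):
            R[multiple] -= R[d]
    return R

def raffle_formula(n):
    """The product of (p^a - p^(a-1)) over the prime powers p^a of n, as in (9)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            power = 1
            while n % p == 0:
                power *= p
                n //= p
            result *= power - power // p
        p += 1
    if n > 1:
        result *= n - 1  # a leftover prime appears with exponent 1
    return result

R = raffle_counts(2009)
print(all(R[n] == raffle_formula(n) for n in range(1, 2010)))  # True
print(R[2009])  # 1680
```

For our favorite number, the product is (49 − 7)(41 − 1) = 42 · 40 = 1680 tickets.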

Are we done? Far from done! We still need to

Problem 1″. (∞-Raffle) Prove that the function R(n) is multiplicative.
We challenge the reader to tackle the multiplicativity of R by brute-
force. Meanwhile, we will take a deeper, more elegant approach in Section 3
to explain the multiplicativity of R(n). . . without a single specific calculation!

2.8. Warming up to τ , σ, and π. To check your understanding of the

concepts and theory so far, do the exercises below. For the most part, they
require clever manipulation of the definitions and formulas for τ , σ, and π.
After a reasonable amount of effort, you may glimpse at the solutions we
have listed and try to reconstruct them in your own way.

Exercise 6. Find all n ∈ N such that

(a) τ (n) = 403; (b) σ(n) = 381; (c) π(n) = 5832; (d) $\pi(n) = 3^{30} 5^{40}$.

Solution (a): The key observation is that 403 is the product of two primes
13 and 31. Correspondingly, if n had three or more distinct prime divisors,
i.e., $n = p_1^{a_1} p_2^{a_2} p_3^{a_3} k$ for some k ∈ N relatively prime to $p_1 p_2 p_3$, then
$\tau(n) = (a_1 + 1)(a_2 + 1)(a_3 + 1)\tau(k) = 13 \cdot 31$.
But each factor $a_i + 1 \ge 2$ and, therefore, it yields a prime divisor of τ (n);
yet, 13 · 31 has only two prime divisors, a contradiction. We conclude that
n has at most two distinct prime divisors; i.e., $n = p_1^{a_1} p_2^{a_2}$ or $n = p^a$.
The formula for τ then yields $\tau(n) = (a_1 + 1)(a_2 + 1) = 13 \cdot 31$ or
τ (n) = a + 1 = 13 · 31, from which $a_1 = 12$, $a_2 = 30$, and a = 13 · 31 − 1. The
answer is $n = p_1^{12} p_2^{30}$ or $p^{13 \cdot 31 - 1}$ for primes $p_1$, $p_2$, and p, with $p_1 \ne p_2$.
At the heart of this solution stands a powerful idea:

 PST 28. Via properties of prime decompositions (e.g., 13 · 31), bound the
number of distinct prime divisors of n (e.g., n has at most two prime divi-
sors), and investigate each case within your newly-found bound.

We leave the reader to figure out part (b) with somewhat similar tech-
niques, and we move to the different part (c).
Solution (c): If $\pi(n) = 2^3 3^6$ (= 5832), the definition of π(n) implies that
n has exactly two prime divisors: $p_1 = 2$ and $p_2 = 3$ (why?), from which
$n = 2^a 3^b$. From Problem 4 for π, we have $\pi(2^a 3^b) = (2^a 3^b)^{\frac{1}{2}\tau(n)}$, so that
$$(2^a 3^b)^{(a+1)(b+1)/2} = 2^{a(a+1)(b+1)/2}\, 3^{(a+1)b(b+1)/2} = 2^3 3^6.$$
Equating the exponents of the involved prime powers of 2 and 3, we arrive
at a system of two equations:
a(a + 1)(b + 1) = 6 and (a + 1)b(b + 1) = 12.
There are many ways to continue from here. A slick way is to divide the
two equations, resulting in a/b = 1/2, i.e., b = 2a. Substituting in the first
equation yields a(a + 1)(2a + 1) = 6, which (by trial and error) has only
one natural root a = 1 (why?), and hence b = 2. The final answer is then
$n = 2 \cdot 3^2 = 18$. Checking: $\pi(18) = 1 \cdot 2 \cdot 3 \cdot 6 \cdot 9 \cdot 18 = 2^3 3^6$.
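
The answer to part (c) can be double-checked by an exhaustive search. The sketch below, in Python, relies on one observation of ours: n is itself a divisor of n, so n must divide π(n), and only the divisors of 5832 need to be tested. The helper names are hypothetical.

```python
from math import prod

def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

target = 5832
# Since n divides pi(n), every solution is among the divisors of the target.
solutions = [n for n in divisors(target) if prod(divisors(n)) == target]
print(solutions)  # [18]
```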
For the next exercise, recall the notation gcd(m, n), which stands for
the greatest common divisor of m and n. Recall also that for each prime p,
the gcd picks up the smaller of the two prime powers $p^a$ in m and $p^b$ in n.
PST 28 applies with full force here too.

Exercise 7. Find all m and n such that gcd(m, n) = 18, τ (m) = 21, and
τ (n) = 10.

2.9. “Fields” of perfect squares. Our last problem in this section is
both theoretically important and interesting on its own. It’s centered on the
concept of a perfect square, that is, the square of an integer; for instance, 36
is a perfect square, but 24 is not.

Problem 6. Show that τ (n) is odd iff n is a perfect square, and that σ(n)
is odd iff n is a perfect square or twice a perfect square.

For some reason, every time I give a session on multiplicative functions,

someone from the audience expresses a doubt regarding these equivalences.
To dispel all such doubts, let’s list a few supporting examples:
• $36 = 2^2 3^2$ is a perfect square and τ (36) = 3 · 3 is odd, while $24 = 2^3 \cdot 3$
is not a perfect square and τ (24) = 4 · 2 is even.
• σ(36) = 7·13 and σ(2·36) = 15·13 are odd, while σ(24) = 15·4 is even.

Proof: The product $\tau(n) = (a_1 + 1)(a_2 + 1) \cdots (a_r + 1)$ is odd iff all factors
$(a_i + 1)$ are odd themselves, i.e., all $a_i$’s are even. In turn, this means that
$a_i = 2b_i$ for some numbers $b_i$, and the prime decomposition of n is
$$n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} = p_1^{2b_1} p_2^{2b_2} \cdots p_r^{2b_r} = (p_1^{b_1} p_2^{b_2} \cdots p_r^{b_r})^2 = k^2,$$
which is a perfect square.

Analogously, σ(n) is odd iff all of its factors $\sigma(p_i^{a_i}) = \frac{p_i^{a_i+1} - 1}{p_i - 1}$ are odd
themselves. It is better to forget here about this compact geometric series
formula and work with the original definition of $\sigma(p^a)$ for prime powers $p^a$:

$$\sigma(p^a) = 1 + p + p^2 + \cdots + p^a.$$

If p = 2, this sum is always odd. But if p > 2, i.e., if p is an odd prime,
the sum consists of a + 1 odd summands; thus, the total sum is odd iff it
has an odd number of odd summands, i.e., if a + 1 is odd, i.e., if a is even!
Summarizing, the exponent of 2 can be either even or odd, but all other
exponents of primes $p_i$ must be even:

$$n = 2^a p_2^{2b_2} p_3^{2b_3} \cdots p_r^{2b_r} = 2^a (p_2^{b_2} p_3^{b_3} \cdots p_r^{b_r})^2 = 2^a k^2.$$

If a is even, then n is a perfect square (why?). Otherwise, a = 2b + 1, and
$n = 2(2^b k)^2$ is twice a perfect square.
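
For readers who still harbor doubts about the two equivalences in Problem 6, a brute-force confirmation on a finite range may help. The following is a sketch in Python with helper names of our own; the bound 1000 is arbitrary, and the check complements the proof rather than replacing it.

```python
from math import isqrt

def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def is_square(n):
    r = isqrt(n)
    return r * r == n

def parity_claims_hold(limit):
    """Check both equivalences of Problem 6 for every n from 1 to limit."""
    for n in range(1, limit + 1):
        divs = divisors(n)
        tau_odd = len(divs) % 2 == 1
        sigma_odd = sum(divs) % 2 == 1
        square_or_twice = is_square(n) or (n % 2 == 0 and is_square(n // 2))
        if tau_odd != is_square(n) or sigma_odd != square_or_twice:
            return False
    return True

print(parity_claims_hold(1000))  # True
```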

3. Sum-Functions

3.1. Creating new functions. To any arithmetic function f we now asso-

ciate another arithmetic function Sf in the following manner:

Definition 4. For any f : N → C define the sum-function $S_f$ of f by

(10) $S_f(n) = f(d_1) + f(d_2) + \cdots + f(d_r) = \sum_{d|n} f(d)$,

i.e., the sum-function $S_f$ adds up all values of f along the divisors of n.

Suppose we want to evaluate the sum-function of the constant function ι:

(11) Sι (n) = ι(d1 ) + ι(d2 ) + · · · + ι(dr ) = 1 + 1 +· · · + 1 = r = τ (n).

Thus, Sι = τ , our old friend τ counting the number of divisors.

Exercise 8. What are the sum-functions of ε(n), id(n), and R(n)?

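Partial solution: It is fairly straightforward to arrive at $S_\varepsilon = \iota$ and
$S_{\mathrm{id}} = \sigma$. As for the sum-function of the ∞-Raffle function R, we have:
$$S_R(n) \overset{\text{def}}{=} \sum_{d|n} R(d) \overset{\infty\text{-Raffle}}{=} n \overset{\text{def}}{=} \mathrm{id}(n) \;\Rightarrow\; S_R = \mathrm{id}.$$ ♦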

We realize that ∞-Raffle can (yet again!) be paraphrased:

Problem 1. (∞-Raffle) Let R be an arithmetic function whose sum-

function is the identity, i.e., let SR = id. Prove that R(n) ≥ 1 for all n.
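
The sum-function construction of Definition 4 translates almost verbatim into code. Below is a minimal sketch in Python; `sum_function` and the other names are ours, and R is defined recursively from the raffle condition, so the last check holds by construction and merely confirms the bookkeeping.

```python
from functools import lru_cache

def divisors(n):
    """All divisors of n in increasing order, by brute force."""
    return [d for d in range(1, n + 1) if n % d == 0]

def sum_function(f):
    """Return S_f, which adds up the values of f along the divisors of n."""
    return lambda n: sum(f(d) for d in divisors(n))

iota = lambda n: 1
epsilon = lambda n: 1 if n == 1 else 0
identity = lambda n: n
tau = lambda n: len(divisors(n))
sigma = lambda n: sum(divisors(n))

@lru_cache(maxsize=None)
def raffle(n):
    """R(n): n tickets in total, minus those taken by the proper divisors of n."""
    return n - sum(raffle(d) for d in divisors(n) if d < n)

numbers = range(1, 61)
print(all(sum_function(iota)(n) == tau(n) for n in numbers))       # True
print(all(sum_function(epsilon)(n) == iota(n) for n in numbers))   # True
print(all(sum_function(identity)(n) == sigma(n) for n in numbers)) # True
print(all(sum_function(raffle)(n) == n for n in numbers))          # True
```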

3.2. Multiplicativity is contagious! Viewing ∞-Raffle in the above way

inevitably brings up the question: What is the relationship between a func-
tion f and its sum-function Sf ? Do they share common properties? Knowing
f or Sf , can we easily calculate the other? The examples above, Sι = τ ,
S_ε = ι, S_id = σ, and S_R = id, suggest that:

Theorem 1. If f is multiplicative then its sum-function S_f is multiplicative.

Proof: Let f be multiplicative, and let m and n be relatively prime. Then
any divisors ci of m and dj of n are also relatively prime and, from Lemma 1,
the pairwise products {ci dj } are distinct and comprise all divisors of mn. We
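can now verify the definition of multiplicativity for S_f (here c_1, ..., c_s are the divisors of m and d_1, ..., d_r are the divisors of n):
\[ S_f(m)\cdot S_f(n) = \Big(\sum_{i=1}^{s} f(c_i)\Big)\Big(\sum_{j=1}^{r} f(d_j)\Big) = \sum_{i,j} f(c_i)f(d_j) \overset{f\text{-mult}}{=} \sum_{i,j} f(c_i d_j) \overset{\text{Lem.\,1}}{=} \sum_{e \mid mn} f(e) \overset{\text{def}}{=} S_f(mn). \]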

Hence, Sf (m)Sf (n) = Sf (mn) and the sum-function Sf is multiplicative. 

Theorem 1 confirms that all examples of sum-functions we found earlier

were multiplicative by no coincidence; starting with the multiplicative ι, ε,
and id, we had to arrive at some multiplicative sum-functions, which turned
out to be τ , ι, and σ. Furthermore, using sum-functions, we can now create
many more examples of multiplicative functions.

Exercise 9. For any a ∈ R, let σ_a(n) = Σ_{d|n} d^a. Find a formula for σ_a.
Note that σa generalizes τ and σ: σ0 = τ and σ1 = σ. The solution
below illustrates a typical exercise on multiplicativity of sum-functions.

Solution: As we observed before, all power functions n^a are (strongly)
multiplicative. Exercise 9 defines σ_a as the sum-function of n^a, i.e., σ_a = S_{n^a}.
By Theorem 1, S_{n^a} is multiplicative. Hence, we split σ_a(n) = Π_{i=1}^{r} σ_a(p_i^{a_i})
and then calculate a prime-power piece σ_a(p^b):
\[ \sigma_a(p^b) = \sum_{d \mid p^b} d^a = \sum_{j=0}^{b} (p^j)^a = 1 + p^a + (p^2)^a + \cdots + (p^b)^a = 1 + p^a + (p^a)^2 + \cdots + (p^a)^b = \frac{(p^a)^{b+1} - 1}{p^a - 1}. \]
The last equality featured a geometric series with initial term 1, ratio p^a,
and b + 1 terms. Multiplying together all σ_a(p_i^{a_i}) yields:
\[ \sigma_a(n) = \prod_{i=1}^{r} \frac{p_i^{a(a_i+1)} - 1}{p_i^{a} - 1}. \]
Note that this formula works for all real a except for a = 0 (why?).
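A quick numerical comparison of the closed formula with the defining sum can be sketched as follows. This Python snippet is an added illustration; it restricts a to positive integers so that all arithmetic stays exact, and the helper names are chosen here.

```python
def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sigma_a_direct(n, a):
    """sigma_a(n) from the definition: the sum of d^a over all divisors d of n."""
    return sum(d ** a for d in range(1, n + 1) if n % d == 0)

def sigma_a_formula(n, a):
    """sigma_a(n) from the product formula; requires a != 0."""
    result = 1
    for p, e in factorize(n).items():
        # Each factor is a finite geometric series, so the division is exact.
        result *= (p ** (a * (e + 1)) - 1) // (p ** a - 1)
    return result

for a in (1, 2, 3):
    for n in range(1, 201):
        assert sigma_a_direct(n, a) == sigma_a_formula(n, a)
print(sigma_a_formula(12, 1), sigma_a_formula(12, 2))
```

Calling sigma_a_formula with a = 0 divides by p^0 − 1 = 0, which is the computational face of the "why?" above.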

As in the above solution, calculations with multiplicative sum-functions

often employ the following consequence of Theorem 1:
Corollary 1. The sum-function of a multiplicative function f is given by
\[ S_f(n) = \prod_{i=1}^{r} \Big( f(1) + f(p_i) + f(p_i^2) + \cdots + f(p_i^{a_i}) \Big). \]
Try this formula on several more sum-functions:
Problem 7. Find formulas for S_τ and S_σ. How about S_ln and S_π?
Partial Solution: Corollary 1 can be applied to the multiplicative
sum-functions Sτ and Sσ , but it cannot help us in the remaining two non-
multiplicative cases. To find Sln , two basic properties of the logarithmic
function come to the rescue: ln(a) + ln(b) = ln(ab) and ln(a^c) = c ln a. Hence,
\[ S_{\ln}(n) = \sum_{d \mid n} \ln d = \ln\Big(\prod_{d \mid n} d\Big) = \ln(\pi(n)) = \ln\Big(n^{\frac{1}{2}\tau(n)}\Big) = \tfrac{1}{2}\,\tau(n)\ln n. \]
As for Sπ , try your luck, patience, and ingenuity! ♦
If you would like to create even more diverse multiplicative functions,
consider the following easy-to-prove statement:

Lemma 2. If f1 , f2 , . . . , fk are multiplicative functions, then the usual func-
tion product f1 f2 · · · fk is also a multiplicative function.
Thus, for instance, τ^{2009}, S_{τ^{2009}}, and even S_{S_{τ^{2009}}} are all multiplicative.

3.3. The AMS-inclusion and movement in Multiplicative land.

The converse of Theorem 1 is also true:
Theorem 2. If the sum-function Sf is multiplicative then the original func-
tion f is multiplicative too.
The proof is, alas, harder and somewhat technical, involving the method
of strong induction. The beginner is advised to accept this converse state-
ment on a first reading. But the intermediate reader skilled with induction
should attempt to find a proof and then consult with the Hints section.⁵
As alluded to earlier, Figure 2 depicts the strict inclusions⁶ between the
three sets of functions we’ve encountered: A ⊋ M ⊋ S. As you track
down functions and their sum-functions in this diagram, think about what
exactly Theorems 1–2 imply about them; namely, that a function f and
its sum-function Sf are either simultaneously inside M, or simultaneously
outside M. For instance, O, ι, ε, id, na , τ , σ, R, and their sum-functions
are all inside M; on the other hand, ln and π are in the outer ring A\M of
non-multiplicative functions, and so are their sum-functions Sln and Sπ .
⁵ Using closure of the Dirichlet product on the set M of multiplicative functions, we shall
provide in Part II another, direct and slick proof of Theorem 2.

⁶ Half-jokingly named “the AMS-inclusion”: check out the publisher of this book!

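[Figure 2. The AMS-inclusion: A ⊋ M ⊋ S. The diagram places the functions O, ι, ε, id, n^a, R, μ, τ, σ, φ, Λ, ln, π and the sum-functions S_ε = ι, S_R = id, S_μ = ε, S_ι = τ, S_id = σ, S_τ, S_σ, S_Λ = ln, S_ln = ½ τ ln, S_π in the nested regions S, M, and A.]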

Interestingly enough, S is not closed under the taking of sum-functions;

i.e., you can start with a strongly multiplicative function (like ι and id) but
end up with a non-strongly multiplicative sum-function (like Sι = τ and
Sid = σ in M\S). There is a good reason for this “movement” within M in
and out of S, which will be explained via the Dirichlet product in Part II.
Finally, the functions μ (Möbius), φ (Euler ), and Λ (von Mangoldt) must
remain a mystery until we learn a whole lot about them in Parts II–III.

3.4. First victory over ∞-Raffle. Theorem 2 is important to us also for

a personal reason. The sum-function of R is the multiplicative identity func-
tion: SR = id; so by Theorem 2 we conclude that R is also multiplicative.
This, in effect, solves the second reformulation of ∞-Raffle (Problem 1 ) and
yields formula (9) for R. 

One could stop right here: after all, we have solved our ∞-Raffle Problem.
But if you are curious to see the story of sum-functions placed within a much
larger context and to arrive at an even niftier solution to ∞-Raffle, plow on
into Part II.
For those who have found the discussion so far too elementary, sharpen
your olympiad problem-solving skills with the following delightful

Problem 8. Let f : N → N be multiplicative and strictly increasing.⁷ If

f (2) = 2, then prove that f (n) = n for all n ∈ N.

In other words, if we are looking for multiplicative and strictly increasing

functions: N → N, and if we fix the first two values as f (1) = 1 and f (2) = 2,
the function id will be the only one that fits! The proof doesn’t require any
of the fancy techniques we develop later in Part II: indeed, a bit of induction
and several linked inequalities is “all” it takes to nail down this problem. Yet,
who will succeed? See you at the end of the Hints section for the “showdown.”

⁷ Strictly increasing means f(x) < f(y) for any x < y in the domain of f.

4. Hints and Solutions to Selected Problems

Exercise 1. Continuing by brute-force,
• R(7) = 7 − R(1) = 7 − 1 = 6;
• R(8) = 8 − R(1) − R(2) − R(4) = 8 − 1 − 1 − 2 = 4;
• R(9) = 9 − R(1) − R(3) = 9 − 1 − 2 = 6;
• R(10) = 10 − R(1) − R(2) − R(5) = 10 − 1 − 1 − 4 = 4.
The next values are: R(11) = 10, R(12) = 4, R(13) = 12, R(14) = 6,
R(15) = 8, R(16) = 8, R(17) = 16, R(18) = 6, R(19) = 18, R(20) = 8. ♦
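The brute-force recursion used above, R(n) = n minus the sum of R(d) over the proper divisors d of n, is easy to automate. The Python sketch below is an added illustration (the use of lru_cache for memoization is a choice made here); it reproduces the listed values and spot-checks multiplicativity on coprime pairs.

```python
from functools import lru_cache
from math import gcd

@lru_cache(maxsize=None)
def R(n: int) -> int:
    """Solve S_R(n) = n for R(n): subtract R(d) over all proper divisors d of n."""
    return n - sum(R(d) for d in range(1, n) if n % d == 0)

print([R(n) for n in range(1, 21)])

# Spot-check multiplicativity: R(mn) = R(m) R(n) whenever gcd(m, n) = 1.
for m in range(1, 31):
    for n in range(1, 31):
        if gcd(m, n) == 1:
            assert R(m * n) == R(m) * R(n)
```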
Exercise 3. Let m, n ∈ N. All cases can be grouped into two categories.
If one of n or m is 1, say, n = 1, then mn = m and ε(m)ε(n) = ε(m) · 1 =
ε(m) = ε(mn). If both m and n are > 1, then mn > 1 and ε(m)ε(n) =
0 · 0 = 0 = ε(mn). In all cases, ε(m)ε(n) = ε(mn), i.e., ε is strongly multiplicative. ♦
Exercise 4. For a two-value strongly multiplicative f , we must have f (1) = 1
(otherwise f = O, which is no good!). The second value of f must come from
a prime p: f(p) = b for some b ≠ 1 (why?). But then f(p^2) = (f(p))^2 = b^2
so that b^2 is another value of f. In order to have only two values of f, b^2 = b
or b^2 = 1. As b ≠ 1, this yields only two possibilities: b = 0 or b = −1.
Hence, there are two types of such functions f1 and f2 , obtained as
follows. Set f1 (1) = 1 = f2 (1) and choose a non-empty set of primes P. For
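any prime p, define
\[ f_1(p) := \begin{cases} 0 & \text{if } p \in P \\ 1 & \text{otherwise;} \end{cases} \qquad \text{and} \qquad f_2(p) := \begin{cases} -1 & \text{if } p \in P \\ 1 & \text{otherwise.} \end{cases} \]
Extend f_1 and f_2 in a strongly multiplicative fashion: f_i(p_1^{a_1} p_2^{a_2} ··· p_r^{a_r}) :=
f_i^{a_1}(p_1) f_i^{a_2}(p_2) ··· f_i^{a_r}(p_r) for i = 1, 2. Then f_1 attains only the values 1 and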
0, while f2 only the values 1 and −1. Thus, the function ε is one of infinitely
many examples of two-value strongly multiplicative functions. ♦
Lemma 1. What are the divisors of mn? Obviously, if c|m and d|n, then the
product follows suit: cd|mn; so all pairwise products ci dj are divisors of mn.
Conversely, if e is a divisor of mn, using the prime decompositions of m
and n we can write e as a product of a divisor of m and a divisor of n (why?):
e = ci dj for some ci and dj . The key point here is that all products ci dj
are distinct. It may not be immediately obvious that this follows from the
relative primeness of m and n, but watch! If ci dj = ck dl for some divisors
ci , ck of m and dj , dl of n, then ci divides the RHS, i.e., ci |ck dl . However,
ci and dl are relatively prime (as divisors of the relatively prime m and n);
thus, ci |ck . Turning the tables around, we can equally show that ck |ci , so
that ci = ck and, consequently, dj = dl . Thus, two products ci dj and ck dl
are equal only if they are comprised of identical divisors of m and n.
Therefore, the products ci dj are distinct and they comprise all divisors
of mn. There are sr such products, i.e., τ (mn) = sr = τ (m)τ (n). 

Problem 4. The (most likely) way by which Gauss added up 1+2+· · ·+100
in his primary school math class was to pair up terms in the front with terms
in the back, each pair giving the same total sum of 101, i.e., 1+100 = 2+99 =
3 + 98 = · · · = 101.
The same idea can be applied to the product of all divisors of n. If
{1 = d1 , d2 , d3 , . . . , dr = n} are the divisors of n arranged in ascending order,
note that n = d1 dr = d2 dr−1 = d3 dr−2 , and so on. The reason this works
out so nicely is that if d is a divisor of n, then n/d is also a divisor of n, so
that d · (n/d) = n. Formally, {n/d_1, n/d_2, ..., n/d_r} are also the divisors of n, but
arranged in descending order. We see that π(n) can be calculated in two
different ways, and we multiply the two corresponding expressions below:
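\[ \left. \begin{aligned} \pi(n) &= d_1 d_2 \cdots d_r \\ \pi(n) &= \frac{n}{d_1}\cdot\frac{n}{d_2}\cdots\frac{n}{d_r} \end{aligned} \right\} \;\Rightarrow\; \pi^2(n) = \Big(d_1\cdot\frac{n}{d_1}\Big)\Big(d_2\cdot\frac{n}{d_2}\Big)\cdots\Big(d_r\cdot\frac{n}{d_r}\Big) = n^r. \]
As r = τ(n) is the number of divisors of n, we arrive at π(n) = n^{τ(n)/2}.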
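To avoid square roots, the identity can be tested in the squared form π(n)^2 = n^{τ(n)}. The Python sketch below is an added illustration; the names divisors and divisor_product are chosen here.

```python
from math import prod

def divisors(n):
    """All positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]

def divisor_product(n):
    """pi(n): the product of all positive divisors of n."""
    return prod(divisors(n))

for n in range(1, 301):
    divs = divisors(n)
    # The pairing d <-> n/d gives pi(n)^2 = n^tau(n).
    assert divisor_product(n) ** 2 == n ** len(divs)

print(divisor_product(12), 12 ** 3)
```

For n = 12 there are six divisors, so both printed numbers equal 12^{6/2} = 1728.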
Exercise 6(b). The prime decomposition of 381 is 3 · 127. Since σ is
multiplicative, we can apply the prime-power splitting to it:
σ(n) = σ(p_1^{a_1}) σ(p_2^{a_2}) ··· σ(p_r^{a_r}) = 3 · 127.
Each factor σ(p^a) = 1 + p + ··· + p^a yields a non-trivial divisor of σ(n) = 3 · 127.
Hence, there can be at most two prime divisors p1 and p2 of n. (Why?
Compare with PST 28.)
Case 1. If n = p^a q^b for distinct primes p and q, then σ(p^a) = 3 and
σ(q^b) = 127. The first equation has only one solution: 1 + 2 = 3, i.e., p^a = 2.
You can “brute-force” the solutions to the second equation, but there is a
finer way to proceed. From q^{b+1} − 1 = 127(q − 1) (how did we get this?)
we can reduce modulo q to −1 ≡ −127 (mod q), i.e., q | 126 = 2 · 3^2 · 7. But
q ≠ 2 (we already established p = 2), so that q = 3 or q = 7. Check that
3^{b+1} − 1 = 127 · 2 and 7^{b+1} − 1 = 127 · 6 do not yield acceptable solutions
for b. Therefore, this case does not work in our problem.
Case 2. If n = p^a is a prime power, then σ(p^a) = 381, which means
p^{a+1} − 1 = 381(p − 1). Again, reducing modulo p results in p | 380 = 2^2 · 5 · 19.
Check that p = 2 and p = 5 do not yield any solutions, but p = 19 works:
19^3 − 1 = 18 · 381. The final (and only) answer is n = 19^2 = 361. ♦
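The answer can be cross-checked by exhaustive search. Since σ(n) ≥ n + 1 for every n > 1, any solution of σ(n) = 381 must satisfy n ≤ 380, so a finite scan suffices. The Python sketch below is an added illustration; TARGET and solutions are names chosen here.

```python
def sigma(n: int) -> int:
    """Sum of all positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)

TARGET = 381

# sigma(n) >= n + 1 for n > 1, so no solution can exceed TARGET.
solutions = [n for n in range(1, TARGET + 1) if sigma(n) == TARGET]
print(solutions)
```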
Exercise 6(d). As in part (c), set n = 3^a 5^b and obtain a system of two
equations. Dividing them, deduce that b = (4/3)a. Substituting into one equation,
arrive at a(a + 1)(4a + 3) = 180. The LHS increases as a increases, so
the solution a = 3 is the only one (why?). The final answer is n = 3^3 5^4. ♦
Exercise 7. Among other things, gcd(m, n) = 18 implies that both 2 and 3
divide m and n (why?). On the other hand, τ (m) = 21 = 3 · 7 is a product
of two primes, just like in Exercise 6(a) where τ (n) = 13 · 31. By a similar
analysis, conclude that m = 2^2 3^6 or m = 2^6 3^2. Ditto, since τ(n) = 10 = 2 · 5
is also a product of two primes, n = 2^1 3^4 or n = 2^4 3^1 (why?).

Finally, observe that not both m and n are divisible by 4 or by 27
(why? gcd(m, n) = 18 = 2 · 3^2). This leaves only one possibility for the
pair (m, n): m = 2^6 3^2 and n = 2^1 3^4. ♦
Exercise 8. By definition of sum-functions:
• S_ε(n) = Σ_{d|n} ε(d) = ε(1) + 0 + 0 + ··· + 0 = 1 = ι(n);
• S_id(n) = Σ_{d|n} id(d) = Σ_{d|n} d = σ(n).

Corollary 1. There is really nothing to prove here. Since f is multiplica-

tive, we know that Sf is multiplicative; so we split Sf into prime-power
components and write the definition of each component as a sum-function:

\[ S_f(n) = \prod_{i=1}^{r} S_f(p_i^{a_i}) = \prod_{i=1}^{r} \Big( f(1) + f(p_i) + f(p_i^2) + \cdots + f(p_i^{a_i}) \Big). \]

Problem 7. Since τ and σ are multiplicative, their sum-functions are also

multiplicative; so it suffices to find formulas only at prime powers:
\[ S_\tau(p^a) = \sum_{d \mid p^a} \tau(d) = \sum_{i=0}^{a} \tau(p^i) = \sum_{i=0}^{a} (i+1) = \frac{(a+1)(a+2)}{2}; \quad \text{and} \]
\[ S_\sigma(p^a) = \sum_{d \mid p^a} \sigma(d) = \sum_{i=0}^{a} \sigma(p^i) = \sum_{i=0}^{a} \frac{p^{i+1}-1}{p-1} = \frac{1}{p-1}\Big( \sum_{i=1}^{a+1} p^i - (a+1) \Big) \]
\[ = \frac{p\,\frac{p^{a+1}-1}{p-1} - (a+1)}{p-1} = \frac{p^{a+2} - p - (p-1)(a+1)}{(p-1)^2} = \frac{p^{a+2} - p(a+2) + (a+1)}{(p-1)^2}. \]
Along the way, we used the formulas for the sum of the arithmetic series
Σ_{i=0}^{a} (i + 1) and for the sum of the geometric series Σ_{i=1}^{a+1} p^i. Applying (6),
we piece together all prime-power parts into general formulas for S_τ and S_σ:
\[ S_\tau(n) = \prod_{i=1}^{r} \frac{(a_i+2)(a_i+1)}{2} = \prod_{i=1}^{r} \binom{a_i+2}{2} \quad \text{and} \quad S_\sigma(n) = \prod_{i=1}^{r} \frac{p_i^{a_i+2} - p_i(a_i+2) + (a_i+1)}{(p_i-1)^2}. \]
The final version of the formula for S_τ employs the notation for the binomial
coefficient \binom{a+2}{2} = (a+2)(a+1)/2.
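Both closed formulas can be compared with the defining double sums on a small range. The Python sketch below is an added illustration; all helper names are chosen here, and math.comb supplies the binomial coefficient.

```python
from math import comb

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def factorize(n):
    """Prime factorization of n as a dict {prime: exponent}, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def S_tau_direct(n):
    """Sum of tau(d) over all divisors d of n."""
    return sum(len(divisors(d)) for d in divisors(n))

def S_sigma_direct(n):
    """Sum of sigma(d) over all divisors d of n."""
    return sum(sum(divisors(d)) for d in divisors(n))

def S_tau_formula(n):
    result = 1
    for _, a in factorize(n).items():
        result *= comb(a + 2, 2)
    return result

def S_sigma_formula(n):
    result = 1
    for p, a in factorize(n).items():
        # The numerator is divisible by (p - 1)^2, so integer division is exact.
        result *= (p ** (a + 2) - p * (a + 2) + (a + 1)) // (p - 1) ** 2
    return result

for n in range(1, 201):
    assert S_tau_direct(n) == S_tau_formula(n)
    assert S_sigma_direct(n) == S_sigma_formula(n)
print(S_tau_formula(12), S_sigma_formula(12))
```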
As we indicated in the text, working with Sπ is much harder, since π is
not multiplicative. We can’t use (6) for Sπ (n); even finding a closed formula
for a prime-power piece is already problematic:
\[ S_\pi(p^a) = \sum_{i=0}^{a} \pi(p^i) = \sum_{i=0}^{a} (p^i)^{\tau(p^i)/2} = \sum_{i=0}^{a} p^{i(i+1)/2} = \sum_{i=0}^{a} p^{\binom{i+1}{2}}. \] ♦

Lemma 2. To save chalk , we will prove the lemma only for two mul-
tiplicative functions f1 and f2 ; but this will actually suffice to prove the
statement for any number of such functions (why?).
Let m and n be relatively prime. To show multiplicativity of f1 · f2 , we
calculate as follows:
\[ (f_1 \cdot f_2)(mn) \overset{\text{def}}{=} f_1(mn)\cdot f_2(mn) \overset{\text{mult}}{=} f_1(m) f_1(n) f_2(m) f_2(n) = \big( f_1(m) f_2(m) \big)\big( f_1(n) f_2(n) \big) = (f_1 \cdot f_2)(m)\,(f_1 \cdot f_2)(n). \]
Therefore, f1 ·f2 is also multiplicative. ♦
For instance, in the text, we claimed that S_{S_{τ^{2009}}} is multiplicative. This
is true because τ is multiplicative, and so is its power τ^{2009} by the newly-proven
Lemma 2, and so is its sum-function S_{τ^{2009}}, and in turn, so is its
sum-function S_{S_{τ^{2009}}}.
Theorem 2. Let n1 and n2 be relatively prime numbers such that n = n1 n2 .
We will prove by induction on n that f (n1 n2 ) = f (n1 )f (n2 ).
The statement is trivial for n = 1: as n1 = n2 = 1, we need only to verify
f (1) = f (1)f (1). By definition, Sf (1) = f (1); since Sf (1) = 1 or 0 (Sf is
multiplicative!), we conclude that f (1) = 1 or 0, so that f (1) = f (1)f (1).
Assume now that the statement is true for all d = d1 d2 < n, i.e., that
f (d1 d2 ) = f (d1 )f (d2 ). Then for our n1 n2 = n we calculate twice:
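• \[ S_f(n_1 n_2) \overset{\text{Lem.\,1}}{=} \sum_{d_i \mid n_i} f(d_1 d_2) \overset{(*)}{=} f(n_1 n_2) + \sum_{d_i \mid n_i,\; d_1 d_2 < n} f(d_1 d_2) = f(n_1 n_2) + \sum_{d_i \mid n_i,\; d_1 d_2 < n} f(d_1) f(d_2); \]
• \[ S_f(n_1)\, S_f(n_2) = \sum_{d_1 \mid n_1} f(d_1) \sum_{d_2 \mid n_2} f(d_2) = \sum_{d_i \mid n_i} f(d_1) f(d_2) \overset{(**)}{=} f(n_1) f(n_2) + \sum_{d_i \mid n_i,\; d_1 d_2 < n} f(d_1) f(d_2). \]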

 The same key idea occurs in steps (∗) and (∗∗), where we have separated the
product n1 n2 from all other (smaller) products d1 d2 . Since Sf is given to
be multiplicative, we have Sf (n1 n2 ) = Sf (n1 )Sf (n2 ): these are the LHS’s of
the above two equations. Equating their RHS’s and canceling all summands
f (d1 )f (d2 ), we are left with f (n1 n2 ) = f (n1 )f (n2 ). This completes the
induction step and shows that f is indeed multiplicative. 
Problem 8. This is a tough problem. Did you manage to do it on your
own? In any case, we are given f (1) = 1, f (2) = 2, and f (3) ≥ 3 (why?).
 PST 29. To prove f (3) = 3, create an equality in order to have a variable
to work with: set f (3) = 3 + m for some m ≥ 0 and show that m = 0.
Now, we have to somehow combine the multiplicativity of f and the fact
that it is strictly increasing to show m = 0. To this end,
PST 30. Aim at some composite number n = d_1 d_2 (with d_1 and d_2 relatively
prime) and, using f (n) = f (d1 )f (d2 ), arrive at f (n) in two different ways,
thereby creating two opposite inequalities f (n) ≤ N1 and f (n) ≥ N2 .

If each N1 and N2 involves the variable m, then setting N2 ≤ N1 will give an

inequality for m and, hopefully, allow us to solve for m. But what suitable
composite number should n be? If we want to involve m, then it is reasonable
to try d1 = 3. As d2 must be relatively prime with 3, let’s try d2 = 2:
f (3) + 3 ≤ f (6) = f (2)f (3) ⇒ 6 + m ≤ 2(3 + m) = 6 + 2m ⇒ m ≥ 0 . . .
but we already know this! We’ll try instead d2 = 5, i.e., n = 15.

Left chain (read from the bottom of the figure upward):
   f(6) = f(2)f(3) = 6 + 2m                 (∗)
   f(5) ≤ f(6) − 1 = 5 + 2m                 (≥, since 6 > 5)
   f(10) = f(2)f(5) ≤ 10 + 4m               (∗)
   f(9) ≤ f(10) − 1 ≤ 9 + 4m                (≥, since 10 > 9)
   f(18) = f(2)f(9) ≤ 18 + 8m               (∗)
   f(15) ≤ f(18) − 3 ≤ 15 + 8m              (≥, since 18 > 15)

Right chain:
   f(5) ≥ f(3) + 2 = 5 + m                  (≥, since 5 > 3)
   f(15) = f(3)f(5) ≥ (3 + m)(5 + m)        (∗)

Figure 3. Reaching f(15) twice, w/ f(2) = 2, f(3) = 3 + m

Figure 3 depicts two possible ways of reaching f (15). Each upward (∗)
step is an application of f ’s multiplicativity, while each downward (≥) step
uses that f is strictly increasing. To the left of the diagram a chain of
calculations starts with f (6), goes through f (5), f (10), f (9), and f (18), and
lands us with our first inequality f (15) ≤ N1 = 15 + 8m. To the right of the
diagram, starting with f (5) produces our second inequality f (15) ≥ N2 =
(3+m)(5+m). A “skinny” sandwich N2 ≤ f (15) ≤ N1 has been created:
15 + 8m + m2 ≤ f (15) ≤ 15 + 8m ⇒ m2 ≤ 0 ⇒ m = 0,
and the desired f (3) = 3 follows immediately. Having cleared this hurdle,
one may now look for faster ways to show f (3) = 3. For example,
f (3) · f (5) = f (15) < f (18) = f (2) · f (9) < f (2) · f (10) = f (2) · f (2) · f (5) = 4f (5).
Cancelling f (5) implies f (3) < 4, i.e., f (3) = 3.
Do we have to come up with such specific arguments for every n to
establish that f (n) = n? Fortunately, there is a shortcut. To demonstrate
the idea, we run the first cases.
• How to show that f (4) = 4? As 4 = 2·2, multiplicativity won’t help. The next
value, f (5), also does not yield to multiplicativity. But f (6) = f (2)f (3) =
2 · 3 = 6. Since f (3) = 3 < f (4) < f (5) < f (6) = 6, only two natural numbers
fit between 3 and 6, namely, f (4) = 4 and f (5) = 5.
• Instead of going for f (7), f (8), or f (9) (all unreachable via multiplicativity),
we try f (10) = f (2)f (5) = 2 · 5 = 10. Again, we have just the right “tight”
inequalities: f (6) = 6 < f (7) < f (8) < f (9) < f (10) = 10. Only 3 natural
numbers can fit in, namely, f (7) = 7, f (8) = 8, and f (9) = 9.
• Continuing with this reasoning, the next number to find is f(14) = f(2)f(7)
= 2 · 7 = 14, thereby locating everything between f(10) and f(14).

This initial case analysis naturally leads to a cool idea:

 PST 31. To show that a strictly increasing function f : N → N is the identity
function, you need only to establish that f (ak ) = ak for some increasing
sequence {a1 , a2 , . . . , ak } of natural numbers. More generally, to prove some
property P for all n ∈ N, one possible route is to prove P only for n = ak ,
and then to show that P is satisfied for any n with ak < n < ak+1 .
In our problem, {ak } is {2, 6, 10, 14, . . .} = {2·1, 2·3, 2·5, 2·7, . . .}, twice
the sequence of odd numbers; thus, ak = 2(2k − 1) for k ≥ 1. Since 2 and
2k − 1 are relatively prime, f (ak ) = f (2)f (2k − 1) = 2(2k − 1) = ak , if . . . we
have already shown that f (2k − 1) = 2k − 1. This prompts the idea of strong
induction on k.
Assume by induction on k that f (l) = l for l = 1, 2, . . . , 2k − 1. Since
f (2k−1) = 2k−1 by IH, then f (4k−2) = f (2)f (2k−1) = 2(2k−1) = 4k−2.
But f is strictly increasing, so that
2k−1 = f (2k−1) < f (2k) < f (2k+1) < · · · < f (4k−3) < f (4k−2) = 4k−2.
This gives just enough space to fit all the intermediate natural numbers;
namely, we conclude that f (l) = l for all l = 2k, 2k + 1, . . . , 4k − 3. In
particular, f (2k) = 2k and f (2k + 1) = 2k + 1, which gives the inductive
statement for k + 1 and completes the induction step. ?

Not so fast! The gap here is quite subtle. For the inductive argument
to work, we need to have 2k + 1 ≤ 4k − 2 (why? check the long chain of
inequalities above), i.e., k ≥ 1.5, i.e., k ≥ 2. But then the basis case of our
inductive hypothesis is for k = 2: “Assume that f (l) = l for l = 1, 2, 3.”
(Why?) We are given that f (1) = 1 and f (2) = 2, but we needed to go out
of our way to prove that f (3) = 3. The basis case in this problem is much
trickier to do than the general inductive step! Now we are truly done. 

Problem 8. (Second Solution)⁸ If, by some chance, you weren’t amazed

by the ideas in our first solution, get ready to be stunned now. To a beginner,
what follows below will seem to have come completely out of the blue; yet,
a seasoned problem solver will appreciate its spicy olympiad flavor.
The question is: can we solve Problem 8 in a one-step strong induction,
thereby avoiding our simple but still multi-step Figure 3 on page 100? Evan
O’Dorney answers Yes. The basis cases for n = 1 and n = 2 are known:
f (1) = 1 and f (2) = 2. Let n ≥ 3 and assume that f (k) = k for all 1 ≤ k ≤ n.
By division on the multiplicativity condition, we can say
\[ \frac{f(a)}{f(b)} < \frac{f(c)}{f(d)} \tag{12} \]
for any a, b, c, d ∈ N such that ad < bc and gcd(a, d) = gcd(b, c) = 1. Indeed,
starting with f(ad) < f(bc) (as f is strictly increasing), we can split as
f(a)f(d) < f(b)f(c) (due to multiplicativity of f), and then correspondingly
divide by the positive f(b) and f(d).

⁸ Proposed by Evan O’Dorney, then an 11th grader, BMC alumnus, four-time IMO
medalist, Putnam winner, and contributor to Volume I of the BMC book.
Now we “just” need to find suitable examples of a, b, c, d to help us show
that f (n + 1) = n + 1. Evan provides two such examples, applies (12) twice,
and ends up with a chain of two inequalities:
\[ \frac{f(n+1)}{f(n)} \overset{(12)}{<} \frac{f(n^4 - n^2 + 1)}{f(n^4 - n^3 - n^2 + n + 1)} \overset{(12)}{<} \frac{f(n)}{f(n-1)}. \tag{13} \]
For instance, to verify the conditions of (12) for the first inequality, multiply
out ad and bc to check that ad < bc:
(n + 1)(n^4 − n^3 − n^2 + n + 1) = n^5 − 2n^3 + 2n + 1 < n^5 − n^3 + n = n(n^4 − n^2 + 1),
where the inequality in the middle can be rewritten as the more obvious
n^3 > n + 1 for n ≥ 3. As for the necessary relative primeness condition,
rewriting d = n^4 − (n + 1)(n^2 − 1) and c = n^2(n^2 − 1) + 1 makes it transparent
that gcd(d, a = n + 1) = 1 = gcd(c, b = n) (how?). The reader is urged to
verify (12)’s conditions for the second inequality in (13).
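Before verifying the conditions of (12) by hand, one can let a computer test them for a range of n. The Python sketch below is an added illustration; the function names and the tested range 3 ≤ n < 200 are choices made here, and the two quadruples (a, b, c, d) are read off from the two inequalities in (13).

```python
from math import gcd

def conditions_hold(a, b, c, d):
    """Hypotheses of (12): ad < bc and gcd(a, d) = gcd(b, c) = 1."""
    return a * d < b * c and gcd(a, d) == 1 and gcd(b, c) == 1

def first_quadruple(n):
    """(a, b, c, d) for the first inequality in (13)."""
    return (n + 1, n, n**4 - n**2 + 1, n**4 - n**3 - n**2 + n + 1)

def second_quadruple(n):
    """(a, b, c, d) for the second inequality in (13)."""
    return (n**4 - n**2 + 1, n**4 - n**3 - n**2 + n + 1, n, n - 1)

failures = [n for n in range(3, 200)
            if not (conditions_hold(*first_quadruple(n))
                    and conditions_hold(*second_quadruple(n)))]
print(failures)
```

An empty list supports, but does not replace, the algebraic verification asked of the reader.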
Ignoring the complicated middle term in (13), we derive that
\[ 1 < \frac{f(n+1)}{f(n)} < \frac{f(n)}{f(n-1)} \overset{\text{IH}}{=} \frac{n}{n-1}, \]
because f(n + 1) > f(n), and f(n) = n and f(n − 1) = n − 1 by strong IH.
Clearing the denominators yields n < f(n + 1) < n^2/(n − 1). However, n^2/(n − 1) < n + 2
(check it!), so that the natural number f(n + 1) is (strictly) sandwiched
between n and n + 2. This leaves no choice but to conclude f (n + 1) = n + 1,
thereby completing the induction step and this stunning solution. 

Timothy Chu, then a 12th grader from the SF Bay Area, asks even bolder
questions: what if f (n) is not given to be strictly increasing, or f (2) = 2?
Problem 9. (Chu/O’Dorney) If f : N → N is multiplicative and increas-
ing, and f (k) = k for some k > 1, then f (n) = n for all n ∈ N.
Solution: Replacing (<) with (≤) in the above solution, we arrive at:
\[ 1 \le \frac{f(n+1)}{f(n)} \le \frac{f(n)}{f(n-1)} \quad \text{for all } n \ge 2; \tag{14} \]
i.e., the ratio-function f (n + 1)/f (n) is decreasing and bounded below by 1.
If f (n) is not strictly increasing, then f (n) = f (n+1) for some n implies that
all ratios f (m +1)/f (m) from there on are equal to 1; i.e., f (n) = f (n+1) =
f (n + 2) = · · · . In particular, f (n) = f (n(n + 1)) = f (n)f (n + 1); i.e.,
f (n + 1) = 1, and f is the constant 1 (why?), a contradiction. Thus, f is
strictly increasing. Because f : N → N and f (k) = k for some k > 1, it
follows that f (2) = 2.
We are back to our previous problem! Thus, f (n) = n for all n ∈ N. 
Session 5

Introduction to Group Theory

based on Tatiana Shubin’s session

Sneak Preview. Having played with Rubik’s Cube and taken it apart to see
what is inside, it is now time to look under the hood and penetrate more deeply
into what its true structure is. The building blocks are groups. Stubborn poly-
nomials, symmetric elephants, and socks that beg to be put on, taken off, and
permuted between your feet are all part of the story, directed by Galois. You will
escape never-ending cycles in a complex world, only to stroll along in Permuter-
land and, ultimately, seek bi-polar paths in 15-Puzzleland.

1. Puzzling It Out

The well-known 15 -puzzle consists of a shallow box filled with 16 squares

in a 4 × 4 array (cf. Fig. 1a–d from left to right). The bottom right corner
square is removed, and the other squares are labeled 1 through 15 as in
Figure 1a. Using the empty spot, we can slide the squares around without
lifting them up.

      (a)              (b)              (c)              (d)
  1  2  3  4       4  3  2  1      10  9  8  7       8 14 11  3
  5  6  7  8       5  6  7  8      11  2  1  6      12  2 15  9
  9 10 11 12      12 11 10  9      12  3  4  5       6  4 13  1
 13 14 15         13 14 15         13 14 15          7 10  5

Figure 1. Achievable or not?

Problem 1. (McCoy, [53]) Starting from the initial position in Figure 1a,
which 15 -puzzle positions in Figures 1b–d can be achieved and why?
Understandably, a novice may ask: “What does this puzzle have to do
with serious mathematics?” “Ah, . . . wrong question!” an advanced math cir-
cler will say. “Just about any interesting (or uninteresting) puzzle is somehow
related to mathematics.” The puzzle is frequently a disguise for an actual
problem from group theory. In fact, by the end of this session you will have
seen such a variety of examples of groups, that (whether you wanted to or
not) you will start seeing groups everywhere around you!


For instance, just like the Rubik’s Cube, the 15-puzzle is solvable via a
special type of permutations that form a subgroup, fortified with the idea of
a closed path . . . . What does this mean? As vague as this hint may be, it is
the only one you will get for now on Problem 1. Did you try it? Any luck
in transforming Figure 1a into others? Some positions will be achievable
while others will stubbornly remain out of your reach! Is it possible to rigor-
ously prove that the stubborn positions are indeed unreachable, regardless
of how long you play with the puzzle and regardless of whatever complicated
sequences of moves you invent?
If you are stuck, hang around with us for a systematic introduction to
the objects, theorems, tools, and basic applications of group theory. At
the end, we will get back to the 15-puzzle and, hopefully, by then you will
not find it nearly as difficult as it may now look. On the other hand, if
you already know the fundamentals of group theory, skim over the examples
spread throughout this session, and jump to the challenging problems in the
last section. The 15-puzzle will be waiting for you there.

2. A Polynomial Prelude¹

2.1. The promise of the quartics. When we think of algebra the first
thing that comes to mind is the study of polynomial equations and their
solutions. And duly so – for a long, long time algebra essentially had been
that very study.
Of course, any linear or quadratic equation can easily be solved, and
there is evidence suggesting that Babylonians as early as 1800 BCE already
knew general procedures for dealing with both types of equations.
Cubic equations proved to be much trickier – the first description of a
general way to solve them appeared in Ars Magna, published in 1545 by
Gerolamo Cardano.² Soon after, Cardano’s pupil Lodovico Ferrari invented
a nice reduction procedure to conquer quartic (4th degree) equations by con-
structing an associated cubic equation, solving it, and then using its roots
to find a solution to the original quartic equation. This method seemed to
promise that a similar approach could be used to solve higher degree equations:
just keep constructing auxiliary lower degree equations and solving them.
Unfortunately, this did not work, so much so that all attempts to find
a general method for solving even quintic (5th degree) equations failed.

¹ If any words in this section are unfamiliar to you, don’t worry: just read on for the
fun of it. After all, the history of mathematics is full of duels, drama, and enlightenment.
² Recall the discussion of x^3 = 15x + 4 in Complex Numbers I, volume I. The method
was actually found independently by Scipione del Ferro and Niccolò Tartaglia, but revealed
by Cardano in Ars Magna, apparently, against Tartaglia’s wishes.

2.2. Shifting focus to a new big picture. Mathematicians were really

perplexed. But, of course, they kept working. Instead of direct attacks,
they turned their attention to the relationships between the roots of a given
equation. This eventually led to the discovery of the marvelous world of sym-
metry and, ultimately, to the idea of groups and other algebraic structures.
A whole new field of mathematics called abstract algebra was created.
So what about higher degree equations? It was by means of abstract
algebra that the question was finally settled in the first half of the 19th cen-
tury – it’s been proved that, in general, a polynomial equation of degree five
and above cannot be solved in radicals; i.e., there is no way to get a solution
formula which uses only the algebraic operations of addition, subtraction,
multiplication, division, and root extraction.³
Meanwhile, the notion of a group has become one of the most important
notions in mathematics. At the same time, it is also very widely used in
applications. For example, apart from the study of algebraic equations, finite
groups are indispensable in fields as distinct as crystallography and coding
theory, just to name two.

3. Action Groups

One way to think of groups is as follows.

Definition 1. A nonempty collection of actions that can be performed one
after another is called a group if every action has a counteraction also included
in this collection, and the result of performing any two of these actions
in a row is also included in the collection.

3.1. A group for every soldier. Let us start

with a very simple but illustrative example.

Exercise 1. (Sosinski, [76]) The “Turning

Soldier” group consists of four actions:
• s = stand still;
• r = turn right;
• l = turn left;
• b = turn around 180◦ .
Why is this collection T = {s, r, l, b} a group?
In order to see what happens when various actions are performed one
after another, it is convenient to construct a table, called the multiplication
table of the group. We label (in some order) each row and each column of
the table by the elements of the group and we place in the matrix cell (i, j)
the element that is equal to the product of the elements labeling the ith row
and jth column of the table.

³ Who proved it? The 22-year-old Norwegian Niels Henrik Abel, in 1824.

Now just stop for a second and see whether what you have just read
makes any sense to you. You certainly should be perplexed by certain words!
In particular, what exactly is meant by the product of actions? Actions are
not numbers, so how do we multiply them? When we deal with an action
group, we can combine a pair of actions by performing one of them and then
following with the other one; and – just for convenience! – we say that we
have multiplied these two actions.
We are really interested only in the final result of these actions, and not
in the particular way by which that result has been achieved. So if the soldier
turns 180◦ around and then turns right, the result is the same as if he simply
turned left to begin with. (Can you see it?) Thus we say that the product
of actions b and r equals l, and we write rb = l. Observe the order in which
we list the actions b and r: from right to left!
Let’s go back to the multiplication table for the turning soldiers. If the
2nd row is labeled by r and the 3rd column is labeled b, then we place l in
cell (2, 3). Can you fill in the entire table?

     · | s   r   b   l
    ---+---------------
     s |
     r |         l
     b |
     l |

Notice that s = “doing nothing” is a very special
action. Every group must have such an element. Why?
Anyone would agree that the counteraction of turning left, l, is turning right,
r; if we perform these two actions one after the other, we get rl = s. By our
rules for a group, we must therefore include in T the “do nothing” turn s.
Can you think of other reasons why s should be in T ? Check out the first
row and column corresponding to s: they are also very distinguishable!
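
For readers who would like to check their table by machine, here is a minimal sketch. It assumes an encoding that is not part of the text: each turn is recorded as a number of clockwise quarter-turns (s = 0, r = 1, b = 2, l = 3), so that performing one turn after another amounts to adding these numbers modulo 4. The names QUARTER_TURNS, product, and multiplication_table are ours.

```python
# Assumed encoding (not from the text): each turn is a number of
# clockwise quarter-turns, so composing turns is addition modulo 4.
QUARTER_TURNS = {"s": 0, "r": 1, "b": 2, "l": 3}
NAME_OF = {value: name for name, value in QUARTER_TURNS.items()}

ORDER = ["s", "r", "b", "l"]  # row and column labels, in the order used above


def product(x, y):
    """Return the turn xy: perform y first, then x (right-to-left)."""
    return NAME_OF[(QUARTER_TURNS[x] + QUARTER_TURNS[y]) % 4]


def multiplication_table():
    """Cell (i, j) is the product of the i-th row label and j-th column label."""
    return [[product(row, col) for col in ORDER] for row in ORDER]


if __name__ == "__main__":
    print("· | " + " ".join(ORDER))
    for label, row in zip(ORDER, multiplication_table()):
        print(label + " | " + " ".join(row))
```

Running the script prints the full table; compare it with the one you filled in by hand, and look at the row and column labeled s.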

3.2. A group for every sock. While one sock is not enough for your two
feet, it is enough (precisely because of this) to make for an interesting group.
Exercise 2. (Sosinski, [77]) The “One Sock” group S consists of the actions:
• n = do nothing;
• c = take the sock off and put it on the other foot;
• i = take it off, turn it inside out, then put it on the same foot again;
• t = take it off, turn it inside out, then put it on the other foot.
Show that S is indeed a group.
Here is one question over which you may (and should)
want to ponder: Is the One Sock group any different from
 the Turning Soldier group? Each consists of four actions.
Still, can we view every turn of a soldier as a sock move?
We will explore such questions soon; but for now start thinking about this.
 PST 32. A classical way to distinguish between the Turning Soldier and
the One Sock groups is to find the counteraction of each sock’s move and of
each soldier’s turn and compare the two situations.
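
The comparison suggested in PST 32 can also be carried out mechanically. The sketch below rests on two encodings of our own choosing: a soldier's turn is a number of quarter-turns modulo 4, and a sock move is a pair of bits (switch foot?, turn inside out?) combined bit by bit. It lists the actions that are their own counteractions in each group.

```python
# Assumed encodings (not from the text):
#   soldier's turn = number of quarter-turns modulo 4;
#   sock move      = (switch foot?, turn inside out?) as a pair of bits.
SOLDIER = {"s": 0, "r": 1, "b": 2, "l": 3}
SOCK = {"n": (0, 0), "c": (1, 0), "i": (0, 1), "t": (1, 1)}


def soldier_product(x, y):
    """Two turns in a row: add the quarter-turns modulo 4."""
    return (x + y) % 4


def sock_product(x, y):
    """Two sock moves in a row: each bit flips independently."""
    return (x[0] ^ y[0], x[1] ^ y[1])


def self_inverse(elements, product, identity):
    """Names of the actions that undo themselves when performed twice."""
    return sorted(name for name, g in elements.items() if product(g, g) == identity)


soldier_self_inverse = self_inverse(SOLDIER, soldier_product, 0)
sock_self_inverse = self_inverse(SOCK, sock_product, (0, 0))

if __name__ == "__main__":
    print("Turning Soldier:", soldier_self_inverse)
    print("One Sock:       ", sock_self_inverse)
```

The two lists have different lengths, which is one way to see that no renaming of actions can turn one group into the other.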

3.3. A group for every figure. The next example is much more interesting
(and important). While numbers measure size, groups measure symmetry.
Symmetry is the property of an object to remain unchanged while undergoing
changes. More precisely, a symmetry is a motion that maps a figure onto
itself. For instance, any motion you perform on the elephant-in-profile E1 –
a translation, rotation, reflection, or glide reflection⁴ – will produce
another figure (congruent to E1). By contrast, the full-face-elephant E2 will
go to itself under a reflection s about a vertical line.
We conclude that elephant E1 has only the trivial symmetry i (the “fix
everything” motion), while elephant E2 has a second symmetry – the reflection s.
In general, for every geometric figure F, the collection of symmetries of F
forms a group (why?) called the symmetry group of F and denoted by S(F).
The structure and size of this group tell us how much symmetry the figure
possesses. Thus, S(E1) = {i} is a single-element group, while S(E2) = {i, s}
is a group of 2 motions. Let’s move now to larger symmetry groups.

Exercise 3. Let Δ denote an equilateral triangle. Describe (geometrically)
the elements of the symmetry group S(Δ). How many are there?

Answer: S(Δ) consists of 6 actions: 3 rotations with respect to the center
(including the 0°-rotation), and 3 flips (reflections) across Δ’s altitudes. ♦
By the way, if you want that large a symmetry group for an elephant,
you will need at least 3 elephants (cf. Fig. 2). Why?

Figure 2. D3 = S(Δ): symmetries of the equilateral triangle

The symmetry group S(Δ) is usually denoted by D3 and is called the
3rd dihedral group. In general, Dn, the nth dihedral group, is the group of
symmetries of a regular n-gon. You may know what’s coming up now:
Exercise 4. Find the number of elements in D4 and compare with D3 .
Establish a pattern and check it on Dn for any n ≥ 1. As a bonus, what is
 the maximal number of symmetries you can produce using only two full-face
elephants, and does the resulting symmetry group match any Dn ?
Partial Answer: By playing with the square, you will quickly discover
its 8 symmetries and conjecture that Dn has 2n elements for all n. This will
almost always be true. Two E2 -elephants can fill in the “gap”. ♦
⁴ Such transformations of the plane are known as Euclidean motions (cf. Hints section).

The curious reader, of course, will ask if the remaining numbers missed
by the orders of Dn can be obtained as orders of symmetry groups of plane
figures. We challenge the reader to affirmatively answer this question:
Problem 2. For any odd n ≥ 1, find a plane figure F with exactly n
symmetries, i.e., such that the number of elements in S(F ) is n.

3.4. Size is not everything. Nevertheless, the number of elements of a
group G is its most important characteristic. It is called the order of the
group G and is denoted by |G|. For instance, you must have discovered
above that |Dn| = 2n for n ≥ 3. If G is a finite set, |G| is a positive integer;
otherwise, we say that G is of infinite order, or more simply, is infinite.
Clearly, if two groups have different orders then they are not the same
group. How about if the orders match? Are the groups the same? We
already encountered this situation in our first two examples: the Turning
Soldier group T = {s, r, l, b} and the One Sock group S = {n, c, i, t} both
have order 4. Following PST 32, you must have noticed that the counteraction
of each sock move is itself, while this is not always true for the soldier’s
turns: r ≠ l, yet r and l counteract each other in T. Thus, T ≠ S.
In the next example, we will take this question to a new level: literally,
to a new dimension. We will count symmetries in space.
Problem 3. (Armstrong, [6]) Consider three solids:
(1) a (right) pyramid whose base is a regular polygon with 12 sides;
(2) a regular hexagonal plate (a hexagonal prism);
(3) a regular tetrahedron.
For simplicity, consider only rotational symmetries⁵ of these solids. For each
solid, these symmetries form a group (why?) G1, G2, or G3, respectively.
Show that these groups all have order 12; yet, they are all different.

Figure 3. Rotational symmetries of solids

Partial Solution: Locate first the rotational axes and decide how many
rotations about these axes will send the solids to themselves (cf. Fig. 3).
(1) G1 has 12 rotations about the vertical axis (including the identity).
(2) G2 has 6 rotations about the vertical axis (including the identity);
1 rotation about each of 3 axes through the midpoints of opposite vertical
edges; and 1 rotation about each of 3 axes through the centers of opposite
rectangular side faces.
(3) G3 has the identity; 2 rotations about each of the 4 axes through a
vertex and the center of the opposite face; and 1 rotation about each of the
3 axes through the midpoints of opposite edges.

⁵ As opposed to rotations in the plane (which happen about single points), rotations
in 3D-space are performed about lines, called axes of rotation.

Thus, |G1| = |G2| = |G3| = 12. But clearly, the symmetries of these
solids are distinctly different. One such striking difference is the fact that
one single rotation, when repeated, generates all rotations of the pyramid
(which rotation is that?); but there is no such single rotation of the prism
or the tetrahedron.⁶ There are other differences as well. To name just
one more, for the pyramid there is only one (non-trivial) rotation which
counteracts itself (which one?), i.e., combined once with itself it equals the
identity. For the prism, there are more such rotations (how many?); and for
the tetrahedron, the number is still different (what is it?).
These essential differences imply that the Gi ’s are all distinct groups. ♦
 PST 33. To establish that groups are not the same, find a suitable property
that is satisfied by a different number of objects from each group. Along with
group order, you may want to count, for example, the number of elements
that counteract themselves, or those that generate the groups (if any).

3.5. A group within a group. Problem 3 was based on the fact that the
rotational symmetries of the solids in Figure 3 form smaller groups Gi inside
the full symmetry groups. A similar phenomenon can be observed in the
simpler case of the group D4: we may notice that some actions in this group
form a group by themselves. We call such a subset a subgroup.

Problem 4. Find all subgroups of D4, the symmetry group of the square.


Partial Solution: One of these subgroups contains 4 elements; it consists of
all rotational symmetries of a square. Let us call it R4 = {r0, r1, r2, r3},
where rj is a (90j)°-rotation. Note that $r_1^2 = r_2$, $r_1^3 = r_3$, and
$r_1^4 = r_0$; i.e., r1 generates R4. With this, the multiplication table for
R4 can be filled in no time. Adding to R4 a reflectional symmetry of the
square inevitably yields all of D4 (why?). ♦

     ·  | r0  r1  r2  r3
    ----+----------------
     r0 |
     r1 |         r3
     r2 |
     r3 |

 PST 34. To construct a subgroup of a group, start by including the identity.
For each new element g you add, include all repetitions of g, all products
of g with the current members of your subgroup, and these products’ coun-
teractions. When you are done, the multiplication table will tell you if you
have indeed created a subgroup.
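
The recipe in PST 34 can be sketched in code. We assume (this is our encoding, not the text's) that the corners of the square are numbered 0 to 3 counterclockwise and that a symmetry is recorded as a tuple p, where corner k moves to position p[k]. The names R1, FLIP, and generated_subgroup are hypothetical helpers.

```python
# Symmetries of the square as permutations of its corners 0..3,
# numbered counterclockwise (an assumed encoding, not from the text).
IDENTITY = (0, 1, 2, 3)
R1 = (1, 2, 3, 0)      # 90-degree rotation: corner k moves to position R1[k]
FLIP = (1, 0, 3, 2)    # one reflection of the square


def compose(f, g):
    """Return the product fg: apply g first, then f."""
    return tuple(f[g[k]] for k in range(4))


def generated_subgroup(starters):
    """Start from the identity and keep adding products until nothing new appears."""
    subgroup = {IDENTITY}
    pending = list(starters)
    while pending:
        g = pending.pop()
        if g in subgroup:
            continue
        subgroup.add(g)
        # Multiply the new element with everything found so far, on both sides.
        for h in list(subgroup):
            pending.append(compose(g, h))
            pending.append(compose(h, g))
    return subgroup


if __name__ == "__main__":
    print(len(generated_subgroup([R1])))        # rotations only
    print(len(generated_subgroup([FLIP])))      # a single reflection
    print(len(generated_subgroup([R1, FLIP])))  # a rotation and a reflection
```

In a finite group, every counteraction is a repetition of the action itself, so closing under products is enough here; for infinite groups the counteractions would have to be added explicitly, as PST 34 says.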
⁶ Put geometrically, the pyramid has only 1 axis of rotation, while the prism and
the tetrahedron have 7 each. This allows the pyramid’s group G1 to have a generating
rotation but makes the same impossible for G2 and G3. Why?

3.6. Twin groups. Comparing the tables for R4 and T (pp. 106, 109),
we can see that they differ only by the letters used to denote the elements.
After a suitable renaming (e.g., s → r0, r → r1, b → r2, l → r3) one
table will become exactly the same as the other. Therefore, these groups are
indistinguishable from an algebraic point of view. We call them isomorphic
groups and denote this by R4 ≅ T.
Exercise 5. Are the groups R4 and S isomorphic? Why or why not?
So far, we have found only two non-isomorphic groups of order 4: T and S.

Exercise 6. Verify that the symmetry group of a (non-square) rectangle
also has order 4 and decide if it is isomorphic to T or S.

Problem 5. (Advanced) Is there another group of order 4 non-isomorphic
to T and to S? How about a fourth group of order 12 that is non-isomorphic
to G1, G2, and G3?
Note that all differences between the 3D-solids (or, for that matter, be-
tween the planar figures we encountered), must be related to how their sym-
metries combine. In each case, the group of symmetries has a certain alge-
braic structure. Group theory studies this structure.

4. General Groups

An abundance of groups naturally arise as “action groups”: we devoted a
good amount of time to studying them. However, some questions about these
groups can be better and more easily answered if we momentarily forget
about their origins and extract from them only their group essence. And
hence, the general (a.k.a. “abstract”) definition of a group:
Definition 2. A group is a nonempty set G together with a binary
operation⁷ ∗ on G with the following properties:
(i) a ∗ (b ∗ c) = (a ∗ b) ∗ c for all a, b, c ∈ G (i.e., ∗ is associative).
(ii) There is an identity element e ∈ G, i.e., a ∗ e = e ∗ a = a for all a ∈ G.
(iii) For each a ∈ G, there is an inverse element $a^{-1} \in G$, i.e.,
$a \ast a^{-1} = a^{-1} \ast a = e$.

4.1. Gated communities. Implicit in the above group definition is that
the set G is closed under the operation, namely that a ∗ b ∈ G for all a, b ∈ G.
It’s worth spending a few moments thinking about the notion of being closed.
Exercise 7. Recall the set T of soldier’s turns, where the (binary) opera-
tion is that of performing actions one after another (a.k.a. composing these
actions). Is T closed under this operation? What if instead of the entire set
T we consider its various subsets? Which of them are closed?
⁷ A binary operation ∗ on G takes two inputs a, b ∈ G and yields one output a ∗ b ∈ G.

Solution: No matter what sequence of turns the soldier performs,
in total, he will still have made one of the four allowed turns. Hence T is
closed under the operation. (It had better be, since you showed earlier that T
is a group!) The only other two closed subsets of T are {s} (s is the identity
element) and {s, b} (because $b^2 = s$). Trying to include r forces inclusion of
$b = r^2$, $l = r^3$, and $s = r^4$, i.e., of the whole set T; and similarly for l.
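
The claim about closed subsets is small enough to confirm by brute force. The sketch below again assumes our own encoding of the turns as quarter-turns modulo 4 and simply tests every nonempty subset of T for closure; the function names are ours.

```python
from itertools import combinations

# Assumed encoding (not from the text): quarter-turns modulo 4.
TURNS = {"s": 0, "r": 1, "b": 2, "l": 3}
NAME_OF = {value: name for name, value in TURNS.items()}


def product(x, y):
    """Perform turn y, then turn x."""
    return NAME_OF[(TURNS[x] + TURNS[y]) % 4]


def is_closed(subset):
    """True if the product of any two turns of the subset stays in the subset."""
    return all(product(x, y) in subset for x in subset for y in subset)


def closed_subsets():
    """All nonempty closed subsets of T, from smallest to largest."""
    names = sorted(TURNS)
    found = []
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if is_closed(set(combo)):
                found.append(set(combo))
    return found


if __name__ == "__main__":
    for subset in closed_subsets():
        print(sorted(subset))
```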
Exercise 8. Let U = {0, 1}. Is U closed under multiplication? How about
 under addition? Can you add one real number to U so that the new set
would be still closed under multiplication? More than one number?
Hint: The answer to the last question depends on whether we are allowed
to add to U infinitely or finitely many numbers. The former case is simple:
just add all real numbers and you cannot go wrong because of the closure of
R itself! Adding finitely many numbers, however, requires deeper reasoning.
If you add a to U, then you must also add all powers $a^n$; therefore, there
must be only finitely many such distinct powers, i.e., $a^n = a^m$ for some
distinct n, m ≥ 1 (why?). For which real a can this happen? ♦

4.2. One too many. It is natural to ask if the objects in Definition 2(ii)-(iii)
of a group are unique:

Exercise 9. Can a group have two different identity elements? Can an
element of a group have two different inverse elements?

Hint: Both questions will be answered negatively in Multiplicative
Functions II. Yet, these are such fundamental facts that they deserve to be
proven again. What is the product e1 ∗ e2 for two identity elements e1 and e2?
What is the triple product a1 ∗ a ∗ a2 for two inverse elements a1 and a2 of a?
Answer each question in two ways and compare your answers. ♦

4.3. A billion or abelian? We denote a group G with operation ∗ by (G, ∗).
If in addition to properties (i), (ii), and (iii), (G, ∗) satisfies a ∗ b = b ∗ a for
all a, b ∈ G, then G is said to be commutative, or abelian.
The result of performing two actions one after the other usually depends on
which is done first and which second. For example, reflecting elephant E1
across a vertical line and then rotating it by 90° clockwise is different from
first rotating and then reflecting E1. Indeed, starting with the dark elephant
in Figure 4, we eventually arrive at two differently positioned white
elephants,⁸ showing that the rotation r1 and the reflection v do not commute
with each other. (Try this on your own, and with other figures too!) This is
to say that D4 is not abelian.

Figure 4. $r_1 v \neq v r_1$ in D4

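
The same experiment can be tried on the corners of a square instead of an elephant. This is a minimal sketch under assumptions of our own: the corners are numbered 0 to 3 counterclockwise, R1 stands for a quarter-turn and V for the reflection across the vertical line, and a tuple p records that corner k moves to position p[k]. The direction of the quarter-turn does not affect the conclusion.

```python
# Corners of the square numbered 0..3 counterclockwise (assumed labeling).
R1 = (1, 2, 3, 0)   # a quarter-turn: corner k moves to position R1[k]
V = (1, 0, 3, 2)    # reflection across the vertical line


def compose(f, g):
    """Return the product fg: apply g first, then f."""
    return tuple(f[g[k]] for k in range(4))


reflect_then_rotate = compose(R1, V)   # the product r1 v
rotate_then_reflect = compose(V, R1)   # the product v r1

if __name__ == "__main__":
    print("r1 v =", reflect_then_rotate)
    print("v r1 =", rotate_then_reflect)
    print("commute?", reflect_then_rotate == rotate_then_reflect)
```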
⁸ Ignore the translations of E1 in Figure 4: they are done so that we can see the
differently positioned elephants, instead of stampeding all of them into each other.

All the same, here is a property that will make a group abelian.⁹
Problem 6. Show that if a ∗ a = e for all a ∈ G then G is abelian.
Hint: For any two a, b ∈ G, start with $(a \ast b)^2 = e$, expand this, and solve
for b ∗ a (which will appear in the middle of your expression). ♦
The hypothesis of the problem may be interpreted to say that every
element is its own inverse, or that every action is its own counteraction!
This was the case in the One Sock group S; now we automatically know
that S is abelian, without having to check the commutativity condition for
all pairs of sock moves! Problem 6 is a classic in the group theory folklore:
it relates the local self-inverse property of individual elements of G (a∗a = e)
to the global property of G being abelian (a ∗ b = b ∗ a).

5. Some More Examples of Groups

5.1. Total “recall.” Here are some initial examples¹⁰ of groups, with which
you have worked ever since you started adding and multiplying numbers.
Whether you have realized that the sets below could be treated as groups is
an altogether different situation, to be “remedied” right now.
Exercise 10. By using Definition 2, show that the following are groups:
(a) (R, +); R is the set of all real numbers, and + is ordinary addition.
(b) (Z, +); Z is the set of all integers, and + is ordinary addition.
(c) (Zn, +); Zn = {0, 1, 2, . . . , n − 1}, and + is addition modulo n.¹¹
(d) (R∗ , ·); R∗ = R − {0} is the set of all non-zero real numbers, and the
operation is ordinary multiplication.
5.2. An ocean of symmetries. The next example generalizes our new
friends, the dihedral groups Dn . Now, each Dn is the group of symmetries
of a regular n-gon, and as such it is finite. In order to obtain an infinite
group D∞ , we need a figure with infinitely many symmetries. There are
several natural choices here. One is the limiting figure of regular n-gons
when n becomes large: this is the circle C, with its infinitely many rotations
and reflections. Another choice is the real line R with its infinitely many
reflections and . . . translations. The example below picks yet a third object.
Exercise 11. Think of Z as the set of all dots marking integers on the real
number line (cf. Fig. 4b). Let t be the translation to the right through one
unit, and let s be reflection in the origin. We set
$D_\infty = \{e, t, t^{-1}, t^2, t^{-2}, \ldots, s, ts, t^{-1}s, t^2 s, t^{-2}s, \ldots\}$,
where the operation is composition of transformations. Show that D∞ is the
group of symmetries of Z; and that D∞ has some properties similar to those
of Dn: $s^2 = e$ and $s t^k = t^{-k} s$; but unlike Dn, $t^k \neq e$ in D∞ for any $k \neq 0$.
⁹ Q: How many commutative groups are there? A: A billion (“Abelian”).
¹⁰ These and other similar examples also appear in Multiplicative Functions III.
¹¹ Review operations modulo n from Number Theory I; e.g., 5 + 9 ≡ 3 (mod 11).

The comparison between S(Z) and Dn justifies the name infinite dihedral
group D∞ for S(Z). Still, why wouldn’t S(R) or S(C) work as well as S(Z)?

Figure 4. C-symmetries, Z-symmetries, and cyclic C5

Problem 7. (Advanced) Describe the symmetry groups S(R) of the real
line and S(C) of the circle. Are they isomorphic to some other well-known
groups? How do they compare to D∞?
To answer these questions fully will require semidirect products of groups
and, hence, take us beyond the intended level of this session. The advanced
explorer may want to check his/her work against the Hints section.

5.3. Complex world. We can think of the circle C as the set of all complex
numbers¹² with magnitude 1: C = {z ∈ ℂ | |z| = 1}, or equivalently, C is
the unit circle in the ℂ-plane. For starters,
Exercise 12. Show that ℂ∗ = ℂ − {0} is a group under ordinary
multiplication of complex numbers, and that (C, ·) is a subgroup of (ℂ∗, ·).
The fact that (C, ·) is an infinite group does not prevent it from having
finite subgroups. Indeed, let n ≥ 1 be an integer, and denote by Cn the
set of all roots of the polynomial equation of degree n, $z^n - 1 = 0$, i.e.,
$C_n = \{z \in \mathbb{C} \mid z^n = 1\}$. For example, C2 = {1, −1} and C4 = {1, i, −1, −i}.
It is no surprise that all these roots land on the unit circle C: the equation
$z^n = 1$ implies that |z| = 1 and hence z ∈ C. This is illustrated by Figure 4c,
depicting the relative positions of the 5 elements of C5 along C. Moreover,
Exercise 13. Show that Cn is a subgroup of (C, ·).

¹² To get comfortable with this example, read first Complex Numbers I-II. In particular,
the magnitude |z| is the distance from point z = a + bi to the origin; $|z^{-1}| = 1/|z|$ and
|z1 z2| = |z1| · |z2| for any z, z1, z2 ∈ ℂ. Exercises 12–13 are solved (under disguise) in these
sessions. Primitive roots of unity and de Moivre’s formula appear there too.

We can actually list all elements of Cn (via de Moivre’s formula for $\sqrt[n]{z}$):
$$C_n = \{1, \zeta_n, \zeta_n^2, \zeta_n^3, \ldots, \zeta_n^{n-1}\},$$
where $\zeta_n = \cos\frac{2\pi}{n} + i \sin\frac{2\pi}{n}$ is a primitive nth root of unity, that is, a root
whose powers yield all other roots of the equation $z^n = 1$. We can observe
this phenomenon in the above examples:
• as $(-1)^2 = 1$, the primitive root in C2 is $\zeta_2 = -1$;
• in C4 we have $i^1 = i$, $i^2 = -1$, $i^3 = -i$ and $i^4 = 1$, making i a primitive
root; but so is (−i) (why?);
• it turns out that there are four primitive roots in C5: any non-identity
element generates all of C5 (check it!).
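
A short numerical sketch may help with the “check it!” above. It lists the nth roots of unity with Python's standard cmath module and, separately, decides which powers of the basic root are primitive by working only with exponents modulo n (so no rounding is involved). The function names are ours, and the exponent bookkeeping relies on the fact that the kth power of the basic root, raised to the jth power, is its (kj mod n)th power.

```python
import cmath


def roots_of_unity(n):
    """The n complex solutions of z**n = 1, as powers of zeta_n = exp(2*pi*i/n)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [zeta ** k for k in range(n)]


def primitive_exponents(n):
    """Exponents k such that the powers of zeta_n**k run through all n roots."""
    result = []
    for k in range(n):
        # (zeta**k)**j = zeta**(k*j mod n), so it suffices to track exponents.
        exponents_reached = {(k * j) % n for j in range(n)}
        if len(exponents_reached) == n:
            result.append(k)
    return result


if __name__ == "__main__":
    for n in (2, 4, 5):
        print(n, primitive_exponents(n))
    # Every root lies on the unit circle, up to rounding error.
    print(all(abs(abs(z) - 1) < 1e-12 for z in roots_of_unity(5)))
```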
Such situations are so important that there is a special name for them.
Definition 3. If a group G has a generator a then G is called a cyclic group.
Thus, Cn is a cyclic group; but so are several other groups we have
encountered. While some readers are searching for these cyclic examples, we
will pause developing the theory to finish an earlier story.
5.4. Dramatic conclusion to the search for polynomial solutions.
We managed to completely solve the equation $z^n - 1 = 0$ in complex numbers
and to describe its group of solutions as the finite cyclic group Cn of order n.
Of course, $z^n - 1 = 0$ is a very special and simple equation.
In order to fully understand when and why a general polynomial equation can
or cannot be solved in radicals, you need to learn a very beautiful part of
abstract algebra called Galois theory (cf. Stewart’s [80]). The theory is
named after a French mathematician, Évariste Galois, who died (after a duel)
at the age of 20 but who had managed to make fundamental mathematical
discoveries and to create a whole new branch of mathematics. Certain of his
impending death, the night before the duel Galois outlined his mathematical
ideas in a famous letter to his closest friend.

[Portrait: Galois (1811–1832)]

Nowadays, Galois Theory is a major part of mathematics programs all
over the world: it is incorporated into the upper-division abstract algebra
sequence or it constitutes a separate advanced college course. By the way,
Galois was the first to use the word “group” in our present sense.
5.5. Back to the cyclic world. For a group (G, ∗), we often refer to the
group operation ∗ as “multiplication”, omit the symbol ∗, and write ab for
a ∗ b. If a ∈ G, we denote the product of n copies of a by $a^n$, and the product
of n copies of $a^{-1}$ by $a^{-n}$ (of course, n ∈ N). We also set $a^0 = e$.
With this convention, a group G is cyclic if everything in G is a power of a
single element a. In other words, $G = \{a^n \mid n \in \mathbb{Z}\}$, also denoted as $G = \langle a \rangle$.
Thus, the cyclic group Cn with generator ζn can be written as $C_n = \langle \zeta_n \rangle$.

Exercise 14. Is the group G1 of rotational symmetries of the pyramid in
Problem 3 isomorphic to a cyclic group? Why not G2 or G3?
A group may have an additive operation, e.g., (R, +). In such a case,
inverses $a^{-1}$ are written as −a; powers $a^n$ become sums na, e.g., 3a = a + a + a
and (−2)a = (−a) + (−a); and $a^0 = e$ is simply 0a = 0. Thus, a group (G, +)
is cyclic if G = {na | n ∈ Z}, also written as $G = \langle a \rangle$. While (R, +) is not
cyclic (why?), other familiar groups are.

Exercise 15. Show that (Zn, +) is a cyclic group with n elements. Then
prove that any two cyclic groups of order n are isomorphic. Conclude that
(Zn, +) and (Cn, ·) are isomorphic, written as (Zn, +) ≅ (Cn, ·).
Hint: Relabeling a generator of (Zn, +) as a generator of (Cn, ·) will make
their multiplication tables identical. ♦
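
The relabeling in the hint can be tested numerically. In this sketch the relabeling sends the residue k to the kth power of the basic nth root of unity; the helper names phi and respects_operations are ours, and the tolerance is only there to absorb floating-point rounding. A numerical check of this kind illustrates the statement but does not prove it.

```python
import cmath


def phi(k, n):
    """Relabel k in Z_n as the k-th power of exp(2*pi*i/n) in C_n."""
    return cmath.exp(2j * cmath.pi * k / n)


def respects_operations(n, tol=1e-9):
    """Check that adding in Z_n corresponds to multiplying in C_n."""
    return all(
        abs(phi((a + b) % n, n) - phi(a, n) * phi(b, n)) < tol
        for a in range(n)
        for b in range(n)
    )


if __name__ == "__main__":
    print(respects_operations(5), respects_operations(12))
```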
As for an infinite cyclic group, we have seen one: the group of integers
(Z, +) with its two generators 1 and −1 (explain!), i.e., $(\mathbb{Z}, +) = \langle 1 \rangle = \langle -1 \rangle$.
And this is essentially all that can be seen: any other infinite cyclic group is
isomorphic to (Z, +) (Why? Compare with Exercise 15). For instance,
Exercise 16. Let TZ be the set of translations of Z. Show that TZ is an
infinite cyclic subgroup of S(Z). Conclude that $T_{\mathbb{Z}} \cong (\mathbb{Z}, +)$.

We agreed above that $a^{-n} = (a^{-1})^n$ for every positive integer n (this
is simply the meaning of our notation). In order to explore if and how a
generates the whole group, we need to be able to manipulate all powers of a:
Exercise 17. Is it true that $(a^{-1})^n = (a^n)^{-1}$ for any integer n? Why?

5.6. Will the court, please, come to order! It is true that every element
a ∈ G generates a cyclic subgroup $\langle a \rangle$ of G. Moreover, if G is finite, there
must be a positive n such that $a^n = e$; otherwise, a will generate an infinite
cyclic subgroup $\langle a \rangle$ of G! In general,
Definition 4. If G is a group and a ∈ G, the smallest positive integer n for
which $a^n = e$ is called the order of a and denoted by o(a). If such n does
not exist, we say that a has infinite order and write o(a) = ∞.
Here is a bunch of examples. Check them all out on your own!
• In C4 the order o(i) = 4 while o(−1) = 2. However, in C5 , all elements
(except for the identity 1, of course) have orders 5 so that each generates
the whole group C5 (cf. Fig. 4c).
• Moving to additive notation, o(3) = 2 in Z6 because 3 + 3 = 0; but
o(3) = 4 in Z4 because 3 + 3 + 3 + 3 = 12 ≡ 0 (mod 4) and no smaller sum
would yield the identity 0; yet o(3) = ∞ in (Z, +) (why?).
• Finally, for the reflection s and the generating rotation r in the dihedral
group Dn we have o(s) = 2 while o(r) = n.
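
The additive examples above are easy to reproduce. The following sketch computes the order of an element of (Zn, +) straight from Definition 4, by adding the element to itself until the identity 0 appears; the function name additive_order is ours.

```python
def additive_order(a, n):
    """Smallest positive k with k*a congruent to 0 modulo n, i.e. o(a) in (Z_n, +)."""
    k, total = 1, a % n
    while total != 0:
        total = (total + a) % n
        k += 1
    return k


if __name__ == "__main__":
    print(additive_order(3, 6))   # the order of 3 in Z_6
    print(additive_order(3, 4))   # the order of 3 in Z_4
    # Every non-identity element of Z_5 generates the whole group.
    print([additive_order(a, 5) for a in range(1, 5)])
```

Since (Z5, +) and C5 are both cyclic of order 5, the last line mirrors the statement about C5 in the first bullet.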
Problem 8. Let a and b be elements of a group G, and let o(a) = k.
(a) What is $o(a^{-1})$? How about $o(a^m)$ for any m ∈ Z?
(b) Prove that $H = \{e, a, a^2, a^3, \ldots, a^{k-1}\}$ is a subgroup of G, previously
denoted by $\langle a \rangle$. Deduce that the order of $\langle a \rangle$ is o(a).
(c) If o(ab) = n, prove that o(ba) = n too.

Partial solutions: (a) If $o(a^m) = s$ then $a^{ms} = e$, i.e., o(a) = k divides
ms (why?). The smallest s for which this happens is s = k/gcd(k, m)
(why?). In particular, for m = −1 we have s = k and $o(a^{-1}) = o(a)$. ♦
(b) The product of any two powers $a^i$ and $a^j$ in H is also a power in H:
if i + j ≥ k, simply subtract k to land $a^{i+j} = a^{i+j-k}$ in H. Since $a^j$ and
$a^{k-j}$ are inverses of each other, H satisfies the definition of a group. ♦
(c) For concreteness, suppose o(ab) = 3. Then e = ababab. Multiplying
on the left by $a^{-1}$ and on the right by a yields $a^{-1}ea = bababa$, i.e., $e = (ba)^3$.
How does this imply that 3 is also the order of ba? ♦
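
The formula in part (a) can be spot-checked by brute force before it is proved. The sketch below makes one assumption of ours: an element a of order k is modeled by the residue 1 in (Zk, +), so that the mth power of a corresponds to m modulo k. It then compares the order found by direct search with k/gcd(k, m). The helper names are hypothetical.

```python
from math import gcd


def order_of_multiple(m, k):
    """Order of the element m*1 in (Z_k, +), found by direct search."""
    return next(s for s in range(1, k + 1) if (s * m) % k == 0)


def formula_holds(k):
    """Compare the search with k / gcd(k, m) for a range of exponents m."""
    return all(
        order_of_multiple(m, k) == k // gcd(k, m)
        for m in range(-k, 2 * k + 1)
    )


if __name__ == "__main__":
    print(all(formula_holds(k) for k in range(1, 30)))
```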
Exercise 18. Sometimes “circular reasoning” is useful.

(a) If G is cyclic, show that it is abelian.
(b) If G is cyclic of order n, show that it has an element of order n.
(c) Show that Dn is non-abelian and hence non-cyclic, but it contains a
cyclic subgroup of order n.
Hint: (c) Consider the set of rotational symmetries of a regular n-gon. ♦
5.7. A never-ending cycle? Can an infinite group have elements of finite
order? Not only is the answer yes, but you have worked many times in the
“extreme” scenario:

Problem 9. Give at least two different examples of (infinite) groups that
contain elements of order n for every n ≥ 1.
Hint: Two possible answers are among the groups on pp. 112–113. ♦
We constructed the cyclic groups Cn as examples of finite subgroups of
the circle C. Are these all finite subgroups of ℂ∗? The ingredients for the
solution to our final problem below are spread all over this section.
Problem 10. (Intermediate) Find all finite subgroups of (ℂ∗, ·).

6. Permutation (or Symmetric) Groups

Permutation groups are the substance of Rubik’s Cube I-II. Indeed, their
complexity is what makes the Rubik’s Cube such a tantalizing and challeng-
ing puzzle. Even though permutations provide “just” examples of groups,
they are so fundamental for the development of group theory that it is worth-
while reviewing them again here and doing all associated exercises. If you
feel strongly prepared for the topic, tackle on your own the 15-puzzle in
Problem 1 and rejoin us later for the official “showdown” via permutations.
6.1. The word permutation has at least five mathematical synonyms.
Definition 5. Let A be a set of n elements. A permutation α of A is a
rearrangement of the elements of A. In other words, α is a 1-to-1 function
from A onto A, a.k.a. a 1-to-1 correspondence or a bijection of A.

For example, let n = 5, and let us denote the elements of A by numbers,
e.g., A = {1, 2, 3, 4, 5}. It is convenient to represent a permutation α by a
table with two rows as follows:
$$\alpha = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix},$$
where α(1) = 4, α(2) = 3, α(3) = 5, α(4) = 1, α(5) = 2. Thus, α sends 1 to
position 4, 2 to position 3, and so on.
From the viewpoint of group theory, the first thing to notice is that the
product of two permutations is a permutation as well (why?). For instance,
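$$\text{if } \beta = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 1 & 5 & 4 \end{pmatrix} \text{ then } \alpha\beta =
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 3 & 5 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 1 & 5 & 4 \end{pmatrix} =
\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 3 & 5 & 4 & 2 & 1 \end{pmatrix}.$$
It is no surprise that in the above calculation we applied first β and then
α but wrote αβ in the standard right-to-left notation. To be concrete,
(αβ)(1) = α(β(1)) = α(2) = 3, (αβ)(3) = α(β(3)) = α(1) = 4, and so on.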
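
The product above can be recomputed in a few lines. In this sketch a permutation is stored as a Python dictionary sending each element to its image (a representation chosen here for illustration), and compose follows the right-to-left convention of the text.

```python
# The two permutations from the text, stored as element -> image.
ALPHA = {1: 4, 2: 3, 3: 5, 4: 1, 5: 2}
BETA = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}


def compose(f, g):
    """Return the product fg: apply g first, then f."""
    return {x: f[g[x]] for x in g}


def bottom_row(p):
    """The second row of the two-row notation of p."""
    return [p[x] for x in sorted(p)]


if __name__ == "__main__":
    print(bottom_row(compose(ALPHA, BETA)))  # bottom row of the product alpha beta
    print(bottom_row(compose(BETA, ALPHA)))  # bottom row of the product beta alpha
```

Comparing the two printed rows shows at once whether α and β commute.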
As A is a subset of N, the permutations of A can be also viewed as
symmetries of A: they map A onto itself, just as the reflection across a
vertical line maps the full-face elephant E2 onto itself. As shown in Rubik’s
 Cube II, the set of all permutations of A = {1, 2, . . . , n} forms a group, called
the symmetric group on n elements and denoted by Sn .
Let us review some basic facts that make Sn a group.
• To obtain the inverse of a permutation, simply return all elements of
A to their original positions. For example, check that
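$$\alpha^{-1} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 4 & 5 & 2 & 1 & 3 \end{pmatrix}
\quad\text{and}\quad
\alpha\alpha^{-1} = \alpha^{-1}\alpha = e = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 2 & 3 & 4 & 5 \end{pmatrix}.$$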
• The identity permutation e above is, of course, the do-nothing permu-
tation e(i) = i for all i.
• Finally, α(βγ) = (αβ)γ for any α, β, γ ∈ Sn because the composition of
any functions is an associative operation.

6.2. Law & Order in Permuterland. In a high court of the kingdom of
Permuterland,¹³ there are three judges for every trial. In the grand tradition
of algebra, let’s call them A, B, and C. They file in at the beginning of any
trial and sit at the table in the order ABC. But when the eccentric king of
Permuterland, who attends all trials, yells “Promenade 1,” B and C change
places; and when he yells “Promenade 2,” A and B change places; and when
he yells “Promenade 3,” A goes to where C was sitting, B goes to where A
was sitting, and C goes to where B was sitting.

¹³ The example of Permuterland was introduced in 1973 by Roy Dubish in his Groups
(Topics For Mathematics Clubs, [22]). It is interesting to realize that 40 years ago,
group theory was considered a suitable topic for budding pre-college mathematicians.
The publisher of the book is the National Council of Teachers of Mathematics.
Now, in a hectic mood one day, the king yells “Promenade 1” and, two minutes
later, yells “Promenade 2.” To his royal amazement, the king realizes that the
judges are now seated exactly as they would be if instead he had just yelled
“Promenade 3.” The next day he decides to try this procedure again – but with
a slight variation: now he yells “Promenade 2” first and then yells
“Promenade 1” – and he is amazed to find out that the result is not the same
as Promenade 3; indeed, the result is what he has been calling Promenade 4.

Figure 5. Permuterland

Exercise 19. Of course, you recognize that the “Promenades” are simply
elements of some Sk . What is k and how does the example of Permuterland
show that this Sk is non-abelian?
Solution: The judges comprise the set A = {A, B, C}. If pj denotes
Promenade j, then the King’s favorite promenades p1, p2, p3, and p4 are
permutations of A, i.e., they are elements of the symmetric group S3. The
King’s observation $p_1 p_2 = p_3 \neq p_4 = p_2 p_1$ shows that S3 is non-abelian.

Exercise 20. Write all promenades in Figure 5 in the standard 2-row
notation. Which promenades are missing? What is the order of S3? S4? Sn?

Partial Solution: Promenade 5 that switches A and C and the
do-nothing Promenade 6 are missing in Figure 5. The order of S3 is thus 6.
One does not need to know anything about groups to calculate the order of
Sn: in Combinatorics I, we imagined each permutation as a row of n empty
slots, to be filled with the numbers 1 through n in some order; this helped
us arrive at |Sn| = n!. In particular, |S3| = 3! = 6 and |S4| = 4! = 24. ♦
Exercise 21. Find the order of every element of S3.

Solution: We can view the six promenades in Permuterland according to
the number of judges they move. Three promenades switch two judges: p1 =
(B ↔ C), p2 = (A ↔ B), and p5 = (C ↔ A). Two promenades rotate the
three judges around, in one or the other direction: p3 = (A ← C ← B ← A)
and p4 = (A → C → B → A); and promenade p6 does not move anyone.
From here, o(p6) = 1, o(p1) = o(p2) = o(p5) = 2, and o(p3) = o(p4) = 3.
Note that p1, p2, and p5 are self-inverses, but $p_3 = p_4^{-1}$. Also, none of
the individual elements’ orders equals the order 6 of S3; still all of them are
divisors of 6. Is this a coincidence? The answer will come up later.
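
The list of orders can be double-checked by enumerating S3 with Python's standard itertools module. This is only a sketch: the three judges are replaced by the positions 0, 1, 2 (our relabeling), and the order of a permutation is found by multiplying it by itself until the identity appears.

```python
from itertools import permutations


def compose(f, g):
    """Return the product fg: apply g first, then f."""
    return tuple(f[g[k]] for k in range(len(g)))


def order(p):
    """Smallest positive n for which the n-th power of p is the identity."""
    identity = tuple(range(len(p)))
    power, n = p, 1
    while power != identity:
        power = compose(p, power)
        n += 1
    return n


S3 = list(permutations(range(3)))   # all 3! = 6 arrangements of 0, 1, 2
orders = sorted(order(p) for p in S3)

if __name__ == "__main__":
    print(orders)
    print(all(len(S3) % k == 0 for k in orders))  # does each order divide |S3|?
```

Replacing range(3) by range(4) runs the same experiment on S4.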
6.3. Cycling again. The standard 2-row notation
$$\pi = \begin{pmatrix} 1 & 2 & \dots & n \\ \pi(1) & \pi(2) & \dots & \pi(n) \end{pmatrix}$$
for any π ∈ Sn is not the only possible way to denote permutations. Let’s
use it in the practice exercises below and think if there is a “better” option.

Exercise 22. Perform the indicated operations:

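(a) πρ and ρπ where $\pi = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$ and $\rho = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}$;
(b) [three expressions in 2-row notation: a product of two permutations, an inverse, and a cube; entries not recoverable];
(c) $\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ 10 & 2 & 9 & 8 & 12 & 3 & 4 & 1 & 11 & 5 & 7 & 6 \end{pmatrix}^{-2}$.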

Notice that the permutation π in (a) has the effect of moving the elements
around in a cycle. Thus, we call it a cycle of length 3 and we write it as
π = (1 3 2). This is just another, more convenient, notation for the same
permutation. We think of (1 3 2) as representing the following mapping:
1 → 3 → 2 → 1, and we drop the spaces if only one-digit numbers appear.
Clearly, (132) = (321) = (213).
A cycle of length r is called an r-cycle. A 2-cycle is also called a transposition
since it transposes two elements. Which promenades in Permuterland
are transpositions and which are 3-cycles?
Exercise 23. Calculate $(1356)^2$, $(1356)^3$, and $(1356)^4$. What is o((1356))?

It takes 12 “one-hour” rotations for a clock to come back
to its original position. Analogously for the r-cycles:
Problem 11. Prove that an r-cycle is of order r.
[Figure: a clock face, labeled Z12]
Exercise 24. Calculate (1342)(123) and $(1534269)^{-1}$.
Hint: Remember to apply permutations from right to left!
E.g., (1342)(123) sends 2 → 3, then 3 → 4; so overall 2 → 4. ♦
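The right-to-left rule in the hint is easy to get backwards, so a small Python sketch may help. It is only an illustration under our own conventions: a permutation is a dict sending each moved element to its image, and the helper names cycle, compose, and inverse are hypothetical, not notation from the text.

```python
def cycle(*elems):
    """The cycle (e1 e2 ... er) as a dict sending each entry to the next one."""
    return {a: b for a, b in zip(elems, elems[1:] + elems[:1])}

def compose(*perms):
    """Product of permutations, applied from right to left."""
    support = set().union(*perms)
    result = {}
    for x in support:
        y = x
        for p in reversed(perms):   # the rightmost factor acts first
            y = p.get(y, y)         # elements not mentioned stay fixed
        result[x] = y
    return result

def inverse(p):
    """Reverse every arrow x -> p(x)."""
    return {image: x for x, image in p.items()}

product = compose(cycle(1, 3, 4, 2), cycle(1, 2, 3))
print(product[2])   # the hint's computation 2 -> 3 -> 4; prints 4
```

Calling inverse(cycle(1, 5, 3, 4, 2, 6, 9)) gives a way to check your answer to the second half of Exercise 24 after you have worked it out by hand.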

Not all permutations are cycles (obviously!) . . . but it is true that they
can all be written as products of one or more cycles. For starters,
Exercise 25. Write permutation $\varphi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 3 & 2 & 5 & 6 & 7 & 8 & 1 & 4 \end{pmatrix}$
as a product of cycles.

Now generalize this result to any permutation, on your way to a formula for
the order of permutations.
Problem 12. Prove the following statements.
(a) Every permutation can be expressed as a product of disjoint cycles, i.e.,
cycles which have no common elements.
(b) Every permutation can be expressed as a product of transpositions.
(c) The order of the product of disjoint cycles is the least common multiple
(lcm) of the lengths of these cycles.
“Proof” by Example: You should have found out in Exercise 25 that
φ = (1357)(468) is the product of two disjoint cycles. Disjoint cycles are
great because they commute: it doesn’t matter which way you write them,
you will get the same result; e.g., (1357)(468) = (468)(1357). Thus, powers
of the permutation can be computed by taking individual powers of each
cycle; e.g., $\varphi^4 = (468)^4 (1357)^4 = (468)$ (why?). In order to eliminate 3-cycle
(468) you need a power $\varphi^k$ where k is a multiple of 3; similarly, k must be a
multiple of 4 in order to eliminate 4-cycle (1357) (why?). This leads to the
inevitable conclusion that raising φ only to multiples of 12 will make it the
identity. Therefore, o(φ) = 12 = lcm(3, 4). ♦
Further, (1357) = (17)(15)(13) and (468) = (48)(46), so that one way to
represent φ as a product of transpositions is φ = (17)(15)(13)(48)(46). ♦
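The three parts of Problem 12 translate almost word for word into code. The sketch below is ours, not the session's: it assumes a permutation is stored as a dict x → φ(x), uses math.lcm (available from Python 3.9), and the function names are hypothetical. It is run on the permutation φ = (1357)(468) from the example.

```python
from math import lcm   # Python 3.9+

def disjoint_cycles(perm):
    """Split a permutation (dict x -> perm[x]) into disjoint cycles, dropping fixed points."""
    seen, cycles = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        current, x = [], start
        while x not in seen:
            seen.add(x)
            current.append(x)
            x = perm[x]
        if len(current) > 1:
            cycles.append(tuple(current))
    return cycles

def order(perm):
    """Order = lcm of the lengths of the disjoint cycles (Problem 12c)."""
    return lcm(*(len(c) for c in disjoint_cycles(perm)))

def transpositions(perm):
    """Write each cycle (a1 a2 ... ar) as (a1 ar)...(a1 a3)(a1 a2), as done for phi in the text."""
    return [(c[0], x) for c in disjoint_cycles(perm) for x in reversed(c[1:])]

phi = {1: 3, 2: 2, 3: 5, 4: 6, 5: 7, 6: 8, 7: 1, 8: 4}
print(disjoint_cycles(phi))   # [(1, 3, 5, 7), (4, 6, 8)]
print(order(phi))             # 12
print(transpositions(phi))    # [(1, 7), (1, 5), (1, 3), (4, 8), (4, 6)]
```

The code only illustrates the statements; it does not prove them, so Problem 12 still needs your argument.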

6.4. Permutations are born unequal! Can you guess why we didn’t
use the transpositions to calculate o(φ)? The key reason is that transposi-
tions cannot always be made disjoint; hence, they may not commute and,
in general, are not convenient in calculating the order of the permutation.
Nevertheless, representing a permutation as a product of transpositions plays
a crucial role in solving the 15-puzzle, so don’t discard transpositions yet!
Definition 6. A permutation is said to be even if it can be expressed as the
product of an even number of transpositions. A permutation is odd if it can
be expressed as the product of an odd number of transpositions.
For example, (123) and (12)(2543) are even since (123) = (13)(12) and
(12)(2543) = (12)(23)(24)(25); while (1234) = (14)(13)(12) is odd, and so is
φ above. The identity permutation is even: (1) = (12)(12).
An “annoying” question should pop up in your mind: Isn’t it possible
to write the same permutation in two different ways, once as a product of
an even number of transpositions, and once as a product of an odd number
of transpositions? If yes, this would completely obviate the meaning of the
above definition! We urgently need to resolve this question.
Problem 13. Prove the following facts about even and odd permutations.
(a) The identity permutation is not odd.
(b) Every permutation in Sn is either even or odd but not both.
(c) An r-cycle is even if and only if r is odd.
Part (a) may initially strike you as strange – why care so much about
the specific case of the identity not being odd? If (b) is proven, wouldn’t it
subsume (a)? Still, stating (a) separately is no mistake.
 PST 35. To prove a property for all permutations, first prove it for the
special case of the identity e and then reduce the general case to the case for e.

Here is how this idea applies specifically to our problem.

“Proof” by Example: (b) If some α ∈ Sn were both even and odd,
that would force e itself to be odd! Indeed, suppose that (hypothetically!)
α = t1 t2 = q1 q2 q3 for some transpositions ti and qj . Since every transposition
is its own inverse, $t_i^2 = e$ for all i and $(t_1 t_2)^{-1} = t_2 t_1$ (Why? Check it!). We
can now eliminate the LHS by pre-multiplying everything by t2 t1 :

(t2 t1 )(t1 t2 ) = (t2 t1 )(q1 q2 q3 ) ⇒ e = t2 t1 q1 q2 q3 .

This represents e as a product of 5 transpositions, i.e., e would be odd,
contradicting part (a). Of course, you should repeat this proof with arbitrary
even and odd numbers of $t_i$’s and $q_j$’s, respectively. ♦

This was the easy part. The hard part will come up when you try to
show that e cannot be odd, as desired in (a).

Skeleton of a proof: (a) By contradiction, start with $e = t_1 t_2 \cdots t_{2n+1}$
being a shortest representation of e as a product of an odd number of
transpositions. WLOG, suppose that 1 does appear in this representation, and
locate the first 1 from right to left, in some $t_k = (1a)$.
By considering 4 different possibilities for the previous transposition $t_{k-1}$,
show that you can rewrite $t_{k-1} t_k$ in an equivalent form $t'_{k-1} t'_k$ where 1
appears only in the left transposition $t'_{k-1}$. Conclude that you can
consecutively move 1 to the left¹⁴ until it appears only in the leftmost transposition
$t_1 = (1a)$. This is impossible, as $e = (1a) t_2 t_3 \cdots t_{2n+1}$ is supposed to fix
everything, yet it actually moves 1 to a!
Thus, 1 cannot appear in the representation of e, which is yet another
contradiction (why?). ♦

Notice that in the above proof, we used a minimality principle:

 PST 36. Choosing to work with a shortest representation of e wrt a certain
property gives you grounds for contradicting later this same shortest length.
In particular, in Problem 13(a), if $t_{k-1} = (1a)$, then $t_{k-1} t_k = (1a)(1a) = e$
reduces the length of the representation by 2. While keeping the number of
transpositions odd, this blatantly contradicts the assumption of minimality.
Now that the hard work is done, and we are certain that each permuta-
tion is either even or odd (but not both!), we have a simple algorithm:
PST 37. To find the parity of any π ∈ Sn, write π in some way as a product
of transpositions. The parity of the number of these transpositions will be
equal to the parity of π.
Proof: (c) In part (a) earlier, when moving the 1 to the left, you probably
used an equality like (1b)(1a) = (1a)(ab), where both sides equal the 3-cycle
(1ab). A similar representation holds true for any r-cycle:
$$(a_1\, a_2 \dots a_r) = \underbrace{(a_1\, a_2)(a_2\, a_3) \cdots (a_{r-2}\, a_{r-1})(a_{r-1}\, a_r)}_{r-1 \text{ transpositions}}.$$
From here, the parity of an r-cycle matches the parity of r − 1.
In fact, there are a number of other ways to represent a cycle as a product
of transpositions. Can you think of several more?
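Since Problem 13 says that parity does not depend on how we count, it is reassuring to see two unrelated counts agree. The Python sketch below is our illustration, not the session's method. It stores a permutation of {0, ..., n−1} as a tuple, computes parity once from cycle lengths (an r-cycle costs r − 1 transpositions) and once from the number of inversions. The inversion count is a standard alternative characterization of parity that the text does not use, so treat it as an outside assumption.

```python
from itertools import permutations

def parity_by_cycles(p):
    """Parity (0 = even, 1 = odd): each r-cycle contributes r - 1 transpositions."""
    seen, transpositions = set(), 0
    for start in range(len(p)):
        x, length = start, 0
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        if length:
            transpositions += length - 1
    return transpositions % 2

def parity_by_inversions(p):
    """Parity of the number of pairs i < j with p[i] > p[j]."""
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2

# The two counts agree on every permutation of {0, ..., 4}.
agree = all(parity_by_cycles(p) == parity_by_inversions(p)
            for p in permutations(range(5)))
print(agree)   # True
```

An agreement check on S5 is evidence, not a proof; the argument sketched above is still what settles parts (a) and (b).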
¹⁴ The technical details of this move are included in the Hints section. A very different
proof of parts (a) and (b) was already featured in Rubik II.

Let us denote the set of all even permutations of Sn by An .

Exercise 26. Show that the product of two permutations is even iff they are
of the same parity, and that a permutation and its inverse have the
same parity. Conclude that An is a subgroup of Sn, called the
alternating group on n elements.
[Figure: An drawn inside Sn; e and (132) lie in An, while (12) and (1432) lie outside it]
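Before proving Exercise 26, you may want to test it on a small case. The following Python sketch is ours and makes two assumptions: permutations of {0, 1, 2, 3} are stored as tuples, and evenness is decided by counting inversions, a standard equivalent of Definition 6 that the text does not introduce. It checks the parity rule for products in S4 and the closure of A4.

```python
from itertools import permutations

def compose(p, q):
    """p∘q on {0, ..., n-1}: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def is_even(p):
    """Even iff the number of inversions is even."""
    n = len(p)
    return sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n)) % 2 == 0

n = 4
S_n = list(permutations(range(n)))
A_n = {p for p in S_n if is_even(p)}

closed = all(compose(p, q) in A_n for p in A_n for q in A_n)
same_parity_rule = all(is_even(compose(p, q)) == (is_even(p) == is_even(q))
                       for p in S_n for q in S_n)
print(len(S_n), len(A_n), closed, same_parity_rule)   # 24 12 True True
```

Exactly half of S4 turns out to be even; try to explain why this happens for every n ≥ 2.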

6.5. Permutations in space. The relationship between An and Sn can

explain a geometric phenomenon which we encountered earlier. Recall the
group of all rotations G3 of the regular tetrahedron T in Problem 3. Since T
has 4 vertices and every rotation permutes them (why?), it is obvious that
G3 must be a subgroup of S4 . But G3 is not the entire S4 :
Problem 14. Explain why G3 is the set of all even permutations A4 . And
while you are at it, what is the group of all symmetries of the tetrahedron?
Almost a solution: There are two types of axes of rotation for the
tetrahedron T (cf. Fig. 3c), each of which defines a different type of even
permutation of the vertices: a product of two disjoint transpositions (ab)(cd),
or a product of two non-disjoint transpositions (ab)(bc) (a.k.a., a 3-cycle). As
for the full group S(T), you can use brute force to list all 24 symmetries
of T (cf. Fig. 7a). Or you can be “sneaky” and use the fact that there is no
group G properly sitting between A4 and S4 (no G with A4 ⊊ G ⊊ S4).
(The proof will appear via Lagrange’s Theorem in Group Theory II.) ♦
Figure 7. More symmetries
The cube, on the other hand, is a different matter altogether:

Problem 15. Prove that the group of rotations of the cube is the entire S4 .
Veiled hint: Can you think of the 4 things in the cube C which are being
permuted by every rotation of the cube and whose group of permutations
“coincides” with the group G4 of rotations of the cube? ♦

Problem 16. (Advanced) What is the group of all symmetries of the cube?
Hints through answer: The full S(C) turns out to be twice as big as
S4 , but certainly not as big as S8 . Why? Which permutations of the cube’s
vertices cannot be obtained by symmetries of C? In fact, S(C) is the direct
product of S4 with the cyclic group C2 = {1, −1}, i.e., S(C) = S4 × C2 .
Now, the identity element 1 of C2 is easy to interpret geometrically
(how?), but what is the geometric meaning of the other element −1 of C2 ?
It cannot be a rotation of the cube (as all of those are already included in
S4 ); but it is definitely a symmetry of order 2 (why?), so what is it? ♦

7. The 15-Puzzle Puzzled Out

7.1. Double tasking. Whoever seriously attempts Problem 1 realizes that

 PST 38. Solving the 15-puzzle consists of two distinct parts: construc-
tion and elimination. The construction part amounts to an algorithm that
demonstrates which positions of the puzzle are attainable. The elimination
part is essentially a proof (by methods beyond the trial-and-error of the
physical game) that the remaining positions cannot be attained.
With our group theory knowledge, we are fully equipped to do the elim-
ination part. We start with some preliminary work.
 PST 39. The first step in applying a theory to a problem is to interpret the
problem in the setup of the theory.
In particular, you need to find out the group represented by the 15-puzzle,
making sure that the puzzle movements match the group operation.

7.2. Is the 15-puzzle really a 15-puzzle? It should be clear by now that
the game has something to do with permutations. At first glance, these
permutations are elements of S15 . Indeed, let us agree to read the numbers
line by line from left to right, starting at the top left corner and ending with
the empty cell in the bottom right corner.
Exercise 27. Interpret the arrangements in Figure 1 as permutations in S15 .
Some answers: Figure 1a depicts the identity (1), Figure 1b the product
of 4 transpositions (1,4)(2,3)(9,12)(10,11), and Figure 1c the 12-cycle
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 \\ 10 & 9 & 8 & 7 & 11 & 2 & 1 & 6 & 12 & 3 & 4 & 5 & 13 & 14 & 15 \end{pmatrix} = (1, 10, 3, 8, 6, 2, 9, 12, 5, 11, 4, 7).$$
Write the permutation in Figure 1d also in cyclic notation. ♦

Yet, the moves of the empty cell cannot be interpreted as permutations

in S15 ! The empty cell interferes, rendering intermediate arrangements that
are something different and cannot be encoded just by looking at S15 . In
order to apply our group theory knowledge, we want to be able to think of
all arrangements of the 15-puzzle (regardless of where the empty cell is) as
some permutations.
 PST 40. The key idea is to close our eyes and imagine that the number
16 is written in the empty cell. Then each move of the puzzle is simply a
transposition of 16 with an adjacent number, and each arrangement of the
15-puzzle is a permutation in S16 .
This allows us to think of S15 as a part of S16 : for every α ∈ S15 insert
16 at the end of α, i.e., set α(16) = 16; thus, S15 becomes the subset of S16
consisting of all permutations that fix 16 (cf. Fig. 5a). Moreover,
Exercise 28. Prove that, viewed as above, S15 is a subgroup of S16.

[Figure 5, panels: (a) S15 and A16 inside S16, with the arrangements met along the path P; (b) the path P drawn on e; (c)–(e) the arrangements α1, α2, α8 below]

      α1              α2              α8
  1  2  3  4      1  2  3  4      1  2  3  4
  5  6  7  8      5  6  7  8      5 10  6  8
  9 10 11 16      9 10 16 11      9 14  7 11
 13 14 15 12     13 14 15 12     13 15 12 16

Figure 5. Even promenades in Puzzleland

7.3. Strolling along in Puzzleland. As you have certainly noticed, all

arrangements in Problem 1 have “16” in their last position (bottom right
corner), and as such they are elements of our S15 inside S16 . To get from
the identity arrangement e ∈ S15 to another such arrangement α ∈ S15 , the
16-cell must leave its position, trace a path along the puzzle, and in the end
return to its initial position – the bottom right corner. Figure 5b displays
one such path P for 16: P = U LU LDDRR, where U = up, L = left, etc.
Figures 5c-e show the results α1 = (16,12) and α2 = (16,11)(16,12) after one
and two moves along P, as well as the final result α8 after 16 completes P.

Problem 17. Write down α8. Is it odd or even? What if you choose another
path¹⁵ for 16: what will the parity of the final permutation be? Why?

Solution: As P has 8 steps, α8 is even:

α8 = (16, 12)(16, 15)(16, 14)(16, 10)(16, 6)(16, 7)(16, 11)(16, 12).
An arbitrary closed path P′ will still return 16 to its original position. Hence,
P′ has as many up as down steps, and as many left as right steps. In short,
P′ has some even length 2k, forcing the final permutation $\alpha_{2k}$ to be a product
of 2k transpositions and, therefore, necessarily even.
The diagram in Figure 5a illustrates what is happening along the 8-step
path P. Notice that S15 and the alternating group A16 are both subgroups
of S16 ; their intersection S15 ∩ A16 (in white) is simply A15 , the set of even
permutations in S15 . The path P starts at the identity e in this intersection.
Taking one step along P multiplies e by one transposition and hence lands
on some odd α1 ∉ A16. The next step produces an even α2 ∈ A16; the third
step yields again an odd α3 ∉ A16, and so on. The path zigzags, going in
and out of A16 , until it finally lands on the even α8 ∈ S15 ∩ A16 = A15 .
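The zigzag described above can be replayed step by step. Here is a minimal Python sketch, ours rather than the session's, that slides the cell “16” along P = ULULDDRR starting from e. It assumes the board is stored row by row as a list of 16 numbers, does no bounds checking (the given path stays on the board), and decides parity by counting inversions, a standard equivalent of counting transpositions that the text does not use.

```python
def slide(board, path):
    """Move the blank (written as 16) along path; return the final board and the tiles swapped."""
    board = list(board)
    step = {"U": -4, "D": 4, "L": -1, "R": 1}
    swapped = []
    for move in path:
        blank = board.index(16)
        target = blank + step[move]
        # assumption: the path never leaves the 4x4 grid, so no bounds check is done
        swapped.append(board[target])
        board[blank], board[target] = board[target], board[blank]
    return board, swapped

def is_even(arrangement):
    """Parity by counting inversions."""
    n = len(arrangement)
    inversions = sum(arrangement[i] > arrangement[j]
                     for i in range(n) for j in range(i + 1, n))
    return inversions % 2 == 0

identity = list(range(1, 17))
alpha8, swapped = slide(identity, "ULULDDRR")
for row in range(4):
    print(alpha8[4 * row: 4 * row + 4])
print(swapped)            # [12, 11, 7, 6, 10, 14, 15, 12]
print(is_even(alpha8))    # True
```

The list of swapped tiles, read in order, is the list of transpositions (16, ·) in the formula for α8, read from right to left.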
What is the conclusion? If you want to start from the identity permu-
tation e and end with the empty cell still in the bottom right corner, you
must stroll along an even-length (closed) path, which will terminate in the
intersection S15 ∩ A16 and hence your final result will always be an even
permutation! In other words,
Theorem 1. Odd permutations defy the 15-puzzle: only even permutations
can be obtained in the 15-puzzle.
¹⁵ A closed path is one that starts and ends at the same place.

In particular, you will never be able to reach the 12-cycle in Figure 1c

(why?). On the other hand, the even permutation in Figure 1b still has a

chance of being reached! We will determine whether this is so in the next
subsection. How about the permutation in Figure 1d?

7.4. Playing the puzzle. Now it remains to show that any even permu-
tation can be obtained via the 15-puzzle. The best way to do this is . . . to
play the puzzle. But not randomly! Here is a vastly simplifying idea:

PST 41. Instead of starting with the identity e and finding your way to any
even arrangement α of the 15-puzzle, reverse the process – start with α and
try to reach e.¹⁶ You do not need 16 anymore; so use the empty cell instead.
Problem 18. Here is the beginning of one possible algorithm to convert any
even permutation α to e. Think about each step and how to perform it.
(1) Move 1 to the top left position. Without displacing 1, move 2 on its
right; now, without shifting 1 or 2, move 3 to 2’s right.
(2) Move 4 to 3’s right (this may temporarily displace 1, 2, and 3). By now
you have arranged the first row into 1, 2, 3, 4.
(3) Using the same algorithm (without touching the first row), you can
arrange the second row into 5, 6, 7, 8.
(4) With some more care (without touching the first two rows), you can
rearrange the third row into 9, 10, 11, 12.
(5) Push the empty cell to the rightmost position on the fourth row.
Call the resulting permutation β, i.e., α → β. What can β be? So far, we
know that β has the numbers 1 through 12 in their correct positions. Since
we started with an even α, Theorem 1 ensures that β must be even too. But
there are only 3 ways to rearrange the remaining numbers {13, 14, 15} in β
and still be even: the 3-cycles β1 = (13,14,15) and β2 = (13,15,14), or the
identity e itself. If you manage to convert β1 → e, then applying the same
algorithm to β2 will convert it to β1 (why? $\beta_2^2 = \beta_1$), so you will again reach
e after another application of your algorithm: β2 → β1 → e. What is left
is probably the hardest conversion you can make in the 15-puzzle: it looks
simple, but it captures the true spirit of the puzzle. Prove that
Theorem 2. In the 15-puzzle it is possible to convert the 3-cycle
β1 = (13,14,15) to the identity arrangement e. Conclude that all even
permutations are reachable.

        β1                    e
  1  2  3  4            1  2  3  4
  5  6  7  8      ?     5  6  7  8
  9 10 11 12     -->    9 10 11 12
 14 15 13              13 14 15
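If you get stuck on Theorem 2, a computer can at least confirm that a solution exists and suggest what to look for. The Python sketch below is our own supplement, not the intended proof. It makes the simplifying assumption that only the bottom two rows are allowed to move, writes 0 for the empty cell, and runs a breadth-first search over the resulting 2×4 board; all function names are hypothetical.

```python
from collections import deque

def neighbors(state):
    """States reachable in one move on a 2x4 board; 0 marks the empty cell."""
    blank = state.index(0)
    row, col = divmod(blank, 4)
    for d_row, d_col, name in ((-1, 0, "U"), (1, 0, "D"), (0, -1, "L"), (0, 1, "R")):
        r, c = row + d_row, col + d_col
        if 0 <= r < 2 and 0 <= c < 4:
            target = 4 * r + c
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            yield tuple(nxt), name

def shortest_path(start, goal):
    """Breadth-first search; returns the moves of the empty cell as a string, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            moves = []
            while previous[state] is not None:
                state, name = previous[state]
                moves.append(name)
            return "".join(reversed(moves))
        for nxt, name in neighbors(state):
            if nxt not in previous:
                previous[nxt] = (state, name)
                queue.append(nxt)
    return None

beta1 = (9, 10, 11, 12, 14, 15, 13, 0)   # rows 3 and 4 of the arrangement beta1
goal = (9, 10, 11, 12, 13, 14, 15, 0)    # rows 3 and 4 of the identity e
print(shortest_path(beta1, goal))
```

The printed string lists the moves of the empty cell (U, D, L, R). A found path shows that the conversion is possible, but turning it into an explanation of why it works is still the content of the theorem.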
¹⁶ You can think of this as if you are going from your house to an unknown place and
then back. Which way will be easier to cover? Probably from the unknown place back
home, because you are likely to recognize more and more familiar scenes and road markers
as you approach your house. Traveling to a familiar place will give you an advantage to
take alternative routes or find out with greater ease where you are.

8. Hints and Solutions to Selected Problems

Exercise 1. The multiplication table for the Turning Soldier group T is

shown in Figure 6a. The product of any two actions is also one of our four
actions: s, r, b, and l. Further, every action has a counteraction in this
table; to see this fast, observe that for every row labeled by action x there is
some column labeled by action y such that the row and the column intersect
in the “do nothing” action s: xy = s. Thus, rl = lr = bb = ss = s. In
particular, the counteraction of l is r, the counteraction of b is b itself, etc.:
every element has a counteraction in T ! By definition of a group, we have
established that the Turning Soldier group T is a group indeed! 

(a) Turning Soldier       (b) One Sock
 ·  s  r  b  l             ·  n  c  i  t
 s  s  r  b  l             n  n  c  i  t
 r  r  b  l  s             c  c  n  t  i
 b  b  l  s  r             i  i  t  n  c
 l  l  s  r  b             t  t  i  c  n

Figure 6. Tables for the Turning Soldier and the One Sock groups

Note that $b^2 = r^4 = l^4 = s$ (why?), which gives more reasons to include the

“do nothing” action s in T . In fact, s has the very special property that if we
multiply it by something else, we will get that something else: sx = xs = x
for any action x ∈ T . This property is clearly demonstrated by the row and
by the column labeled by s, as they mimic precisely the labeling row and
labeling column in the table. ♦
Exercise 2. The table for the One Sock group S is given in Figure 6b. The
interesting observation here is that every action is its own counteraction:
$x^2 = n$ for any x ∈ S. We observe this along the diagonal of the table,
which is filled only with the “do nothing” action n. However, we didn’t
observe a similar phenomenon in the Turning Soldier group T : r and l
were counteractions of each other, but certainly not of themselves! This
already makes the situation very suspicious: the two groups T and S must
be different somehow, despite the fact that each has 4 elements. ♦
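The checks in Exercises 1 and 2 can also be done by brute force. The Python sketch below is ours. It enters the two tables row by row, assuming for the One Sock group the off-diagonal products ci = t, it = c, tc = i that closure and x² = n force. It then verifies the group properties and lists the elements that are their own counteraction. The helper names are hypothetical.

```python
def make_table(labels, rows):
    """Turn the rows of a multiplication table into a dict: (x, y) -> x*y."""
    return {(x, y): rows[i][j]
            for i, x in enumerate(labels) for j, y in enumerate(labels)}

def is_group(labels, mul, identity):
    """Check closure, identity, inverses, and associativity by brute force."""
    if not all(mul[x, y] in labels for x in labels for y in labels):
        return False
    if not all(mul[identity, x] == x == mul[x, identity] for x in labels):
        return False
    if not all(any(mul[x, y] == identity == mul[y, x] for y in labels) for x in labels):
        return False
    return all(mul[mul[x, y], z] == mul[x, mul[y, z]]
               for x in labels for y in labels for z in labels)

def self_inverse(labels, mul, identity):
    """Elements that are their own counteraction: x*x = identity."""
    return [x for x in labels if mul[x, x] == identity]

T = make_table("srbl", ["srbl", "rbls", "blsr", "lsrb"])   # Turning Soldier
S = make_table("ncit", ["ncit", "cnti", "itnc", "ticn"])   # One Sock

print(is_group("srbl", T, "s"), self_inverse("srbl", T, "s"))   # True ['s', 'b']
print(is_group("ncit", S, "n"), self_inverse("ncit", S, "n"))   # True ['n', 'c', 'i', 't']
```

The different counts of self-counteracting elements, 2 versus 4, are exactly the suspicious difference between T and S noted above.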
Exercise 3. Let i, r1, and r2 denote the rotations of Δ by 0°, 120°
counterclockwise, and 120° clockwise, and let s1, s2, and s3 denote the three
reflections of Δ across its altitudes (as in Fig. 2). The partial multiplication table
for S(Δ) is displayed in Figure 8c. For example, s1 r1 = s3 and s1 s2 = r2 ;
but multiplying the other way around yields r1 s1 = s2 and s2 s1 = r1 . To
see how to get these results fast, label Δ as ABC and track down where
the vertices go under the rotations and the reflections (cf. Fig. 7).
For practice, the beginner should complete the whole table for S(Δ). The
more advanced reader will realize that it is not necessary to go through the
grueling calculations of finding the exact multiplication table for every group:
a more general argument is usually much faster and more elegant. In our
situation with S(Δ): think about why the composition of two symmetries of
Δ is again a symmetry of Δ, and why a symmetry always has a counteraction,
i.e., a “reverse” symmetry that undoes it. For example, the counteraction of
r1 is r2, and of s1 is s1 itself. ♦

Figure 7. In S(Δ) = D3: s1 r1 = s3 and s1 s2 = r2

(a) S(E1)
 ·  i
 i  i

(b) S(E2)
 ·  i  s
 i  i  s
 s  s  i

(c) S(Δ)
 ·   i   r1  r2  s1  s2  s3
 i   i   r1  r2  s1  s2  s3
 r1  r1  r2  i   s2  s3  s1
 r2  r2  i   r1  s3  s1  s2
 s1  s1  s3  s2  i   r2  r1
 s2  s2  s1  s3  r1  i   r2
 s3  s3  s2  s1  r2  r1  i

Figure 8. Tables for symmetry groups S(E1), S(E2), and S(Δ)

If you are still unsure which “symmetries” of our figures we are allowed
to consider in this session, check out the footnote on page 107: the allowable
symmetries are called Euclidean motions. These are motions (bijections) of
the plane that preserve distances, also known as rigid motions or isometries:
imagine your figure made of cardboard and you want to transform the figure
onto itself without bending, twisting, pinching, or doing other horrible stuff
to the cardboard. Thus, a symmetry of a plane figure is not just any bijection
of the figure onto itself: it is a rigid motion. For example, switching the
vertices A and B of a square ABCD while leaving the other two vertices C
and D fixed is not part of a symmetry of the square (why?). Be aware that
in some sources “rigid” motions exclude orientation-changing motions like
reflections (a reflection changes a clockwise orientation ABCD of the square
to a counterclockwise orientation of the vertices, i.e., ADCB). However, we
will consider reflections as part of our symmetry groups in this session.
Finally, a reflection across a line combined with a translation along this
line is what is called a glide reflection. For any plane figure, its symmetry
group will be generated by and will consist of the four types of plane trans-
formations mentioned in the text: rotations, reflections, translations, and
glide reflections. This is a fact that needs a proof, and we leave it to the
more experienced reader to provide such a proof.
Exercise 4. For n ≥ 3, Dn has 2n elements: n rotations and n reflections.
The pattern breaks for n = 1 and n = 2. Of course, we may never think

of a point or a segment as a “regular” 1-gon or 2-gon; but if we do, we will

find out that D1 = {i} = S(E1 ) and D2 = {i, s} = S(E2 ) (cf. Fig. 8a–b).
Thus, the sequence of Dn ’s sizes is {1, 2, 6, 8, 10, . . .}, and it misses the even
number 4. One way to achieve a symmetry group of size 4 is to put two E2
elephants on top of each other or, equivalently, to consider the symmetry
group of a (non-square) rectangle (cf. Fig. 9a). 


Figure 9. S(rectangle) and |S(F )| = 3

Problem 2. In Figure 9b, two equilateral triangles share the same center
O and can be obtained from each other by a rotation and a rescaling; the
rotation is about O at some angle α ≠ kπ/3 (k ∈ Z), e.g., α = 45°, while
the rescaling has some ratio r ≠ 1, e.g., r = 1.5. It is easy to see that the
union of these two triangles has only 3 (rotational) symmetries, written as
|S(F )| = 3. Generalize this example to |S(F )| = n for any n ≥ 1. ♦
Problem 3. The hardest question to answer here is why G1 , G2 , and G3 are
actually groups. You can show this by brute force for each of the groups (e.g.,
compute their multiplication tables). The true explanation, however, is that
the composition of any two rotations in space is again a rotation in space, the
proof of which can be done with linear algebra methods (e.g., multiplying
the so-called orthogonal matrices) and is beyond the scope of this session.
Now, having accepted that we are indeed dealing with groups G1 , G2 , and
G3 , you can find plenty of reasons for these groups to be different.
The text suggests that G1 has a generating rotation; if rj is the rotation
about the vertical axis of the pyramid by (30j)° clockwise, then applying r1
repetitively j times will yield rotation rj for all j; so r1 certainly generates
all of G1 . However, r5 , r7 , and r11 also generate G1 : either check this
by brute-force examination of all their repetitive applications or, if you are
more advanced, use slick reasoning from number theory to conclude that the
generating rotations are precisely those rj’s for which j is relatively prime
with 12, i.e., j = 1, 5, 7, or 11. On the other hand, if a solid has more than 1
rotational axis, there is no hope for it to have a generating rotation: indeed,
every rotation can generate at most some other rotations about its own axis,
but certainly not about another rotational axis! Thus, G2 and G3 lack single
generating rotations.
The text asks us also to pay attention to non-trivial rotations that
counteract themselves: such a rotation can only be by 180° about the
corresponding axis (why?). Each of the rotational axes of our solids has such a special
rotation. Therefore, the number of “self-counteracting” rotations for each
solid is the number of axes for that solid: 1, 7, and 6, respectively. ♦
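The claim about the generating rotations of G1 made earlier in this solution is quick to confirm by brute force. In the Python sketch below (ours, with a hypothetical helper name), the rotation rj by (30j)° is represented simply by the number j modulo 12, so that repeating rj k times gives jk modulo 12.

```python
from math import gcd

def generated(j, n=12):
    """Rotations (in units of 360/n degrees) obtained by repeating rotation r_j."""
    return {(j * k) % n for k in range(n)}

generators = [j for j in range(1, 12) if len(generated(j)) == 12]
coprime = [j for j in range(1, 12) if gcd(j, 12) == 1]
print(generators, coprime)   # [1, 5, 7, 11] [1, 5, 7, 11]
```

Changing n lets you experiment with pyramids over other regular polygons and guess the general number-theoretic statement.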

Problem 4. If we add a reflectional symmetry s to R4 (i.e., s is a reflection

across one of the two diagonals or across one of the two midsegments of the
square), then the products r0 s, r1 s, r2 s, r3 s must also be in our subgroup
of D4 . These products are obviously 4 different symmetries: all first apply
s to the square, but then each continues with a different rotation rj of the
square. In addition, each reflection of the square switches the labeling of the
vertices of the square from clockwise to counterclockwise orientation (check
it!); yet any rotation of the square preserves the orientation of this labeling
(check it!). Hence, each product rj s first changes the orientation of the
labeling (via s) and then preserves this new orientation (via rj ); so overall,
rj s changes the orientation of the vertices’ labeling and, thus, must be one
of the reflections of the square (cf. Fig. 10a).
(a) Block structure of the D4 table: products of two rotations or of two
reflections are rotations; products of a rotation and a reflection are reflections.

(b) Partial table for D4 (entries not shown are marked _):
 ·   r0  r1  r2  r3  s1  s2  s3  s4
 r0  r0  r1  r2  r3  s1  s2  s3  s4
 r1  r1  r2  r3  r0  _   _   _   _
 r2  r2  r3  r0  r1  s2  s1  s4  s3
 r3  r3  r0  r1  r2  _   _   _   _
 s1  s1  _   s2  _   r0  r2  _   _
 s2  s2  _   s1  _   r2  r0  _   _
 s3  s3  _   s4  _   _   _   r0  r2
 s4  s4  _   s3  _   _   _   r2  r0

Figure 10. Rotations vs. reflections in D4

Therefore, r0 s, r1 s, r2 s, and r3 s are the four distinct reflections of the
square, and our subgroup R4 ∪ {r0 s, r1 s, r2 s, r3 s} of D4 has expanded to
include all 8 elements of D4 . To paraphrase, there is no subgroup of D4
strictly between the rotational subgroup R4 and D4 itself. 
It is clear that {r0 } (called the trivial or the identity subgroup) is the
only subgroup of D4 of size 1; and that the subgroups of size 2 consist of
the identity r0 plus a self-counteracting symmetry, i.e., these are {r0 , r2 }
and {r0 , sj } for any reflection sj . The previous argument shows that once a
subgroup K contains r1 and some reflection sj , then K contains everything,
i.e., K = D4 . A similar argument can be applied to r3 and any reflection
sj , since r3 generates the rotational subgroup R4 , just as r1 does. Thus, the
rotations in any other subgroup of D4 are at most r2 and r0 (of course).
Now, if you complete the full multiplication table for D4 , you will notice
that r2 is a very special element: it commutes with everything in D4 , i.e.,
r2 x = xr2 for all x ∈ D4 (cf. Fig. 10b). In particular, if a subgroup K
contains r2 and some reflection sj , then r2 sj = sj r2 = sk for some other
reflection sk. Since $r_2^2 = s_k^2 = s_j^2 = r_0$, the identity, with some more work, one

can manipulate the above equalities to also obtain that r2 sk = sk r2 = sj and
sj sk = sk sj = r2 . In other words, {r0 , r2 , sj , sk } already forms a subgroup
of D4 of order 4. There are two such subgroups of D4 ; the pairs {sj , sk }

corresponding to these subgroups are the two reflections {s1 , s2 } across the
midsegments of the square, or the two reflections {s3 , s4 } across the diagonals
of the square. Any other pair of reflections in your subgroup will multiply
to the rotations r1 or r3 (why?), resulting in the whole group D4 (why?).
This exhausts all possibilities for subgroups of D4 . In Group Theory II
you will learn of more powerful techniques for tracking and classifying sub-
groups K of a given group G. In particular, |K| divides |G|, which explains
why the group D4 of 8 elements ended up having subgroups only of orders
1, 2, 4, and 8, all of which are divisors of 8. ♦
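The list of subgroups found above can be cross-checked by exhaustive search, since D4 has only 2⁸ = 256 subsets. The Python sketch below is our illustration. It assumes the vertices of the square are labeled 0, 1, 2, 3 in cyclic order, so that rotations send i to i + k and reflections send i to k − i (mod 4), and it uses the fact that a finite subset containing the identity and closed under composition is a subgroup.

```python
from itertools import combinations

def compose(p, q):
    """p∘q on the vertices 0..3: apply q first, then p."""
    return tuple(p[q[i]] for i in range(4))

identity = (0, 1, 2, 3)
# Rotations send vertex i to i + k; reflections send i to k - i (all mod 4).
rotations = [tuple((i + k) % 4 for i in range(4)) for k in range(4)]
reflections = [tuple((k - i) % 4 for i in range(4)) for k in range(4)]
D4 = rotations + reflections

def is_subgroup(subset):
    """A finite subset containing the identity is a subgroup iff it is closed."""
    return identity in subset and all(compose(p, q) in subset
                                      for p in subset for q in subset)

subgroups = [set(c) for size in range(1, 9)
             for c in combinations(D4, size) if is_subgroup(set(c))]
print(sorted(len(h) for h in subgroups))   # [1, 2, 2, 2, 2, 2, 4, 4, 4, 8]
```

The ten subgroups have orders 1, 2, 4, and 8 only, in agreement with the count in the text: one trivial subgroup, five of order 2, three of order 4, and D4 itself.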
Exercise 5. As was shown in the text, R4 and T are isomorphic; but T and S
are not isomorphic (we came up with different numbers of self-counteracting
elements in each of them). It follows that R4 and S cannot be isomorphic either.
Exercise 6. Every symmetry of the rectangle is its own counteraction
(cf. Fig. 9a). Thus, S(rectangle) cannot be isomorphic to T . However, it is
isomorphic to S via any relabeling of the four rectangle’s symmetries to the
four sock actions that sends the identity symmetry r0 to the “do-nothing”
action n. Explain why any such relabeling will work, and count how many
relabelings, called isomorphisms, there are between S(rectangle) and S. ♦
Problem 5. It is slightly “illegal” to ask this question yet, as we haven’t
defined groups in general! This does not prevent the reader from glancing
ahead at Definition 2 and drawing some conclusions. For example, it is
true (and not too hard to verify) that any group G = {e, a, b, c} of order 4 is
isomorphic to T or S. Indeed, check that the product of any two non-identity
elements of G must equal the identity element e or the third non-identity
element, e.g., ac = e or ac = b. (Why are ac ≠ a and ac ≠ c?)
(a) If ac = e, then a and c are inverses to each other, leaving b to be its own
inverse: $b^2 = e$ (why?). Then the row of a prohibits ab from being equal
to a or e (cf. Fig. 11a), and it can’t be b anyway (Why not?), so that
the only choice left is ab = c. This in turn leaves only one possibility
for $a^2$ in the row of a: $a^2 = b$. Using the fact that any row and any
column of G’s table contains all elements of G (without repetitions or
omissions, why?), you can easily fill in the rest of the table and establish
that it is identical to that of T .
      (a)                  (b)
 ·  e  a  b  c        ·  e  a  b  c
 e  e  a  b  c        e  e  a  b  c
 a  a  _  _  e        a  a  _  c  b
 b  b  _  e  _        b  b  c  _  a
 c  c  e  _  _        c  c  b  a  _

Figure 11. If |G| = 4 then G ≅ T or G ≅ S
(b) Similarly, if any other product xy = e for some distinct x, y ≠ e, we end up
with G ≅ T. Thus, WLOG, assume that xy = z for any three distinct
non-identity elements x, y, z in G. This almost completes the table for
G (cf. Fig. 11b), leaving only to plug in identity elements along the
diagonal: $a^2 = b^2 = c^2 = e$. Without doubt, G ≅ S. ♦
Thus, there is no “third” group of order 4. The question of a “fourth”
group of order 12 is much more involved and requires techniques beyond the
current scope of the session. There are, in fact, five non-isomorphic groups
of order 12; and, for those familiar with the notation, they are: Z12 , Z6 × Z2 ,
D6, A4, Z3 ⋊ Z4. The reader will learn about some of these groups as we
move through Parts I and II of Group Theory. ♦
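The conclusion that there is no “third” group of order 4 can also be confirmed by sheer enumeration. The Python sketch below is ours and purely a sanity check. It fixes e as the identity on the set {e, a, b, c}, tries all 4⁹ ways to fill in the nine products of a, b, c, keeps the tables that are associative and have inverses, and then classifies them by the number of elements x with x·x = e (2 for T, 4 for S).

```python
from itertools import product

elements = "eabc"

def is_group_table(mul):
    """Every element has a two-sided inverse, and the operation is associative."""
    for x in elements:
        if not any(mul[x, y] == "e" and mul[y, x] == "e" for y in elements):
            return False
    return all(mul[mul[x, y], z] == mul[x, mul[y, z]]
               for x in elements for y in elements for z in elements)

tables = []
for block in product(elements, repeat=9):        # the nine products of a, b, c
    mul = {("e", x): x for x in elements}
    mul.update({(x, "e"): x for x in elements})
    mul.update(zip(product("abc", repeat=2), block))
    if is_group_table(mul):
        tables.append(mul)

# Classify by the number of elements with x*x = e: 2 for T, 4 for S.
kinds = sorted(sum(mul[x, x] == "e" for x in elements) for mul in tables)
print(len(tables), kinds)   # 4 [2, 2, 2, 4]
```

Three of the four tables are relabelings of T (one for each choice of the self-inverse element among a, b, c) and the fourth is S, matching cases (a) and (b) above.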
Exercise 8. U is closed under multiplication but not under addition: 1 + 1 =
2 ∉ U. If we want $U^+ = \{0, 1, a\}$ to be closed under multiplication for a real
number a, then $a^2 \in U^+$. But $a^2 \ne 0$ and $a^2 \ne a$ (as a ≠ 0, 1), so we are left
with $a^2 = 1$ and forced to conclude a = −1. Indeed, the set $U^+ = \{0, 1, -1\}$
is closed under multiplication!
If we allow the addition of finitely many numbers to U, then for the resulting
set U^{++} to be closed under multiplication we must have at least all powers of
a in U^{++} for any a ∈ U^{++}. But there are infinitely many such powers of a:
a^1, a^2, a^3, . . .! By the Pigeonhole Principle, two such powers must coincide,
i.e., a^n = a^m for some n > m > 0. From here, for a ≠ 0, a^{n−m} = 1 (why?), i.e., a is a
root of an equation x^k = 1 for some k ≥ 1. The only real numbers satisfying
such equations are ±1. Hence, our previous set U^+ = {0, +1, −1} is the only
option for a finite real extension of U that is closed under multiplication.
If you allow complex numbers to be added to U, the possibilities become
numerous, as each equation x^k = 1 has k distinct complex solutions. This
will be discussed in more detail in relation to the cyclic subgroups C_n of the
complex numbers C^* = C − {0} (cf. Exer. 13). □
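The closure claims in this exercise are easy to test by brute force. The sketch below is only an illustration: the helper name is made up, and it assumes U = {0, 1}, as read off from U^+ = {0, 1, a}.

```python
from itertools import product

def is_closed_under_multiplication(numbers):
    """Return True if the product of any two elements of `numbers` is again in `numbers`."""
    return all(x * y in numbers for x, y in product(numbers, repeat=2))

print(is_closed_under_multiplication({0, 1}))       # U (assumed to be {0, 1}): True
print(is_closed_under_multiplication({0, 1, -1}))   # the extension U^+: True
print(is_closed_under_multiplication({0, 1, 2}))    # False, since 2 * 2 = 4
```

Trying other real candidates for a (say 2 or 1/2) shows how quickly the powers of a escape any finite set.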
Exercise 9. Following the hint in the text, the product of two identity
elements e1 and e2 can be viewed differently, depending on whether we choose
to apply the definition of an identity element to e1 or to e2 : e1 = e1 ∗e2 = e2 .
From here e1 = e2 , i.e., any two (and hence all) identity elements are equal.
Similarly, if a1 and a2 are inverses of a, the triple product a1 ∗ a ∗ a2 can
be calculated two ways, using the associativity property of the operation:
(a1 ∗ a) ∗ a2 = e ∗ a2 = a2 and a1 ∗ (a ∗ a2 ) = a1 ∗ e = a1 . From here,
a2 = a1 ∗ a ∗ a2 = a1 , i.e., any two inverses of a are equal. 
Problem 6. Following the hint, for any a, b ∈ G we expand e = (a ∗ b)2 :
e = (a ∗ b) ∗ (a ∗ b) = a ∗ (b ∗ a) ∗ b.
Multiply both sides by a on the left and by b on the right: a ∗ e ∗ b =
a ∗ (a ∗ (b ∗ a) ∗ b) ∗ b. As a ∗ a = e and b ∗ b = e, the RHS simplifies:
a ∗ b = (a ∗ a) ∗ (b ∗ a) ∗ (b ∗ b) = e ∗ (b ∗ a) ∗ e = b ∗ a.
Thus, a ∗ b = b ∗ a and the group is abelian. 

Exercise 10. Verifying that (R, +) and (Z, +) are groups should be no
problem. To make sure everyone is on the same page, note that the number
0 is the identity element of both groups (why?), and inverses are obtained
by the usual negation of a number: a−1 = −a (why?) for any a ∈ R or Z.
The case of the group (Zn , +) requires some facts from Number Theory I.
For instance, to establish that the operation + is well-defined in Zn (what
is this and why are we concerned about it here?), we need the fundamental
lemma that adding congruences modulo n is a valid operation in Zn : if
a ≡ b (mod n) and c ≡ d (mod n) then a + c ≡ b + d (mod n). Again, 0 will
serve as the identity element and −a = n − a will be the inverse of a (mod n).
Regarding the group (R∗ , ·) it is important to understand that removing
the number 0 from R is necessary, as 0 has no multiplicative inverse (i.e., no
reciprocal ). The identity element in (R∗ , ·) is the number 1 this time! ♦

Exercise 11. It is conventional in mathematics to define t^0 as the identity
element in any group; in particular, t^0 = e in D_∞. Note that by the listing
of the elements of D_∞ it is clear that D_∞ is generated by two elements:
starting with t and s, we keep adding to D_∞ all powers of t and of s, and
all the resulting products of such powers. The list for D_∞ contains only the
products in the form t^k s^m; but what about something like s^2 t^3 s^{−5} t^{−2}? As
D_∞ is supposed to be a group, this product must be in D_∞! Is it?
To start with, s^2 = e as s is a reflection. Thus, s^{−1} = s and we can
simplify s^2 t^3 s^{−5} t^{−2} to t^3 s t^{−2}. To get this into the desired form t^k s^m we
need a rule which somehow moves all s’s through all t’s from left to right:
this is precisely what the problem is asking us to prove, i.e., s t^k = t^{−k} s.
Indeed, if x is any integer, here is how the two sides of the proposed equality
act on x:
s t^k (x) = s(t^k (x)) = s(x + k) = −(x + k) = −x − k,
t^{−k} s(x) = t^{−k}(−x) = (−x) − k = −x − k.
We conclude that s t^k (x) = t^{−k} s(x) for all x ∈ Z, i.e., the transformations
themselves are equal: s t^k = t^{−k} s. In particular, t^3 s t^{−2} = t^3 (s t^{−2}) =
t^3 (t^2 s) = (t^3 t^2)s = t^5 s. In practice, this rule boils down to pushing all s’s to
the right and representing any product in D_∞ in the form t^k s^m for k ∈ Z
and m = 0, 1. In conclusion, D_∞ is closed under the group operation of
composition. Associativity is automatic because any composition of actions
(or functions) is associative: (f ◦ (g ◦ h))(x) = ((f ◦ g) ◦ h)(x) = f (g(h(x)))
for all x. And inverses are easy to find too: (t^k s^m)^{−1} = s^m t^{−k} ∈ D_∞ (why?).
Thus, D_∞ is a group.
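The commutation rule can also be spot-checked numerically. The following sketch models t^k and s as Python functions on integers; the helper names (translation, reflection, compose, agree) are illustrative, and a check on a finite sample is of course no substitute for the proof above.

```python
def translation(k):
    """The translation t^k of the integer line: x -> x + k."""
    return lambda x: x + k

def reflection(x):
    """The reflection s of the integer line about 0: x -> -x."""
    return -x

def compose(*functions):
    """Compose right to left, so compose(f, g)(x) = f(g(x))."""
    def composed(x):
        for f in reversed(functions):
            x = f(x)
        return x
    return composed

def agree(f, g, sample=range(-20, 21)):
    """Check that f and g take the same values on a finite sample of integers."""
    return all(f(x) == g(x) for x in sample)

# s t^k = t^(-k) s for a range of exponents k
print(all(agree(compose(reflection, translation(k)),
                compose(translation(-k), reflection)) for k in range(-5, 6)))

# t^3 s t^(-2) reduces to the normal form t^5 s
print(agree(compose(translation(3), reflection, translation(-2)),
            compose(translation(5), reflection)))
```

Both lines print True.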
More interestingly, D_∞ is the group of all symmetries of the integer
line Z. Indeed, a (rigid) symmetry φ of Z must preserve adjacency among
integers; in particular, φ(0) and φ(1) must be adjacent integers; thus, if
φ(0) = k, then φ(1) = k + 1 or φ(1) = k − 1. In the first case, we are forced
to declare φ(2) = k + 2, φ(−1) = k − 1 and so on, i.e., φ = t^k is translation
by k units. In the second case, φ can be realized as the composition of
a reflection and then a translation: φ = t^k s (why?). In either case, the
symmetry φ belongs to D_∞. As D_∞ consists only of rigid symmetries of Z,
we conclude that the two groups are identical: D_∞ = S(Z). □
Problem 7. Let s be the reflection across a vertical line through the origin
in all symmetry groups, r_α a rotation of the circle C by α degrees clockwise
about the origin, and t_q a translation by q units of the real line R. Check the
following basic relations for all angles α and β and all real numbers q and u:
s^2 = e, s r_α = r_{−α} s, s t_q = t_{−q} s, r_α r_β = r_{α+β}, and t_q t_u = t_{q+u}.
To understand the answers below, the reader needs to be familiar with the
semidirect product ⋊ of groups, according to which the above relations give:
• D_n ≅ Z_n ⋊ Z_2 ≅ S(C_n); D_∞ ≅ Z ⋊ Z_2 ≅ S(Z);
• S(R) ≅ R ⋊ Z_2; S(C) ≅ R/Z ⋊ Z_2.
Here Cn is the set of vertices of a regular n-gon under complex multiplication,
which is a cyclic group of order n (cf. Exer. 13). The quotient R/Z (another
standard construct of abstract algebra) can be thought of as the interval
[0, 1] with its endpoints identified, which can be easily visualized as the unit
circle C. ♦
Exercise 12. That (C∗ , ·) is a group follows in much the same way as you
showed that (R∗ , ·) is a group: C∗ is closed under complex multiplication (if
you multiply two non-zero complex numbers, you will get a non-zero complex
number), the operation is associative, the number 1 is in C∗ and acts as the
identity element there, and any z ∈ C∗ has an inverse in C∗ :
z^{−1} = 1/z = (1 · z̄)/(z · z̄) = (1/(z z̄)) z̄ = (1/|z|^2) z̄ ∈ C^*.
Recall here that z̄ is the complex conjugate of z, and that z z̄ = |z|^2 is always
a positive real number for z ≠ 0.
To show that the unit circle C is a subgroup of C^*, note that 1 ∈ C and
that C is closed under multiplication and taking reciprocals: if z, w ∈ C, then
|z| = |w| = 1 so that |zw| = |z| · |w| = 1 and |1/z| = 1/|z| = 1, i.e., zw and z^{−1}
are both in C. □
Exercise 13. If z_1 and z_2 are two roots of the equation z^n = 1, then their
product is too, as well as their reciprocals: (z_1 z_2)^n = z_1^n z_2^n = 1 · 1 = 1, and
(1/z_1)^n = 1/(z_1^n) = 1/1 = 1. Hence, C_n is closed under multiplication and
taking inverses. Noting that 1 is always a root of the equation completes the
proof that C_n is a subgroup of C, and hence of C^* too. □
Exercise 14. We already discussed earlier that G_1 is generated by its
rotation r_1, and thus G_1 = ⟨r_1⟩ is cyclic of order 12. Sending any power
r_1^k ↦ ζ_n^k ∈ C_n defines an isomorphism from G_1 to C_n (here n = 12). As G_2 and G_3 have
no generators, they are not cyclic, and hence not isomorphic to C_n. ♦

Exercise 15. (Z_n, +) is generated by 1, as any k ∈ Z_n can be written
as k = k · 1 = 1 + 1 + · · · + 1 (mod n), and hence (Z_n, +) is cyclic with n
elements. For further challenge, find all generators m of (Z_n, +): they are
precisely the numbers m relatively prime to n, i.e., gcd(m, n) = 1 (why?).
For an isomorphism φ : (Z_n, +) ≅ (C_n, ·) send a generator to a generator,
and follow with the powers: k · 1 ↦ ζ_n^k for all k = 1, 2, . . . , n. ♦
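The claim about generators can be tested experimentally before proving it. The sketch below (function names are illustrative) finds the generators of (Z_n, +) by brute force, listing all multiples of m modulo n, and compares the result with the residues coprime to n.

```python
from math import gcd

def generators(n):
    """Elements m of Z_n whose multiples 0, m, 2m, ... (mod n) cover all of Z_n."""
    everything = set(range(n))
    return [m for m in range(n) if {(k * m) % n for k in range(n)} == everything]

def coprime_residues(n):
    """Elements m of Z_n with gcd(m, n) = 1."""
    return [m for m in range(n) if gcd(m, n) == 1]

print(generators(12))                                   # [1, 5, 7, 11]
print(all(generators(n) == coprime_residues(n) for n in range(1, 50)))  # True
```

The agreement for n < 50 is evidence, not a proof; the "(why?)" in the text still needs an argument.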

Exercise 16. From Problem 7 on D_∞, we know that the set of translations
of Z is described by T_Z = {e, t, t^{−1}, t^2, t^{−2}, . . . , t^k, t^{−k}, . . . } = ⟨t⟩, where t is
the translation by 1 to the right. Thus, T_Z is cyclic of infinite order, and
as such, it is isomorphic to any cyclic group of infinite order, e.g., Z = ⟨1⟩;
indeed, show that the map φ : T_Z → Z defined by φ(t^k) = k for all k ∈ Z is
an isomorphism. ♦

Exercise 17. By definition of the inverse of b = a^n in a group G, we need
only to verify that b^{−1} b = e = b b^{−1} holds for b^{−1} = (a^{−1})^n, i.e., that
(a^{−1})^n a^n = e = a^n (a^{−1})^n. Do this
using associativity of the operation and work from inside out. For example,
for n = 3 we can calculate a^3 (a^{−1})^3 as follows, replacing the innermost a a^{−1} by e at each step:
(aaa)(a^{−1} a^{−1} a^{−1}) = a(a(a a^{−1})a^{−1})a^{−1} = a(a a^{−1})a^{−1} = a a^{−1} = e. ♦
Problem 8. (a) Note that whenever a^n = e then (a^{−1})^n = (a^n)^{−1} = e^{−1} = e,
and vice versa: if (a^{−1})^n = e then a^n = ((a^{−1})^n)^{−1} = e^{−1} = e. Thus, the
same powers of a and a^{−1} equal e, i.e., the orders of these elements are the
same (why?). In particular, in the present case, o(a^{−1}) = k. □
For o(a^m) we need the following key lemma:
Lemma 1. If a^n = e for some n ≠ 0, then a has finite order k that divides n.

Proof: If n < 0, then −n > 0 and by the previous exercise we have
a^{−n} = (a^n)^{−1} = e^{−1} = e. Thus, WLOG we may assume n > 0 so that some
positive power of a equals e: a^n = e. But then there is a smallest positive
power of a which equals e, i.e., let k > 0 be the smallest integer such that
a^k = e. By definition, k = o(a). We want to show that k|n. To this end,
divide n by k: n = kq + r for some quotient and remainder q, r ∈ Z such
that 0 ≤ r < k. We can now calculate a^n in two different ways:
a^n = a^{kq+r} ⇒ e = (a^k)^q · a^r = e^q a^r = a^r ⇒ a^r = e.
If r ≠ 0, this will produce a positive integer r smaller than k with a^r = e, a
contradiction. Thus, r = 0 and n = kq, i.e., k|n. □
Back to Problem 8(a). Moving on to o(a^m) for any m ∈ Z, we already
know that this order is finite because (a^m)^k = (a^k)^m = e^m = e. So set
o(a^m) = s, i.e., e = (a^m)^s = a^{ms}. By Lemma 1 applied to a^{ms} = e,
we conclude that o(a) = k divides ms. Here m and k are given to us,
and we are trying to find the smallest s > 0 for which this happens. If
gcd(k, m) = d and k = dk_1, m = dm_1 for some relatively prime k_1, m_1 ∈ Z,
then k|ms iff (dk_1)|(dm_1 s) iff k_1|(m_1 s) iff k_1|s (why?). So, the smallest s is
k_1: s = k_1 = k/d = k/gcd(k, m). Conversely, if s is given by this formula,
(a^m)^s = a^{ms} = a^{m · k/gcd(k,m)} = a^{dm_1 · k/d} = a^{m_1 k} = (a^k)^{m_1} = e^{m_1} = e.
Therefore, o(a^m) = k/gcd(k, m). □
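The formula o(a^m) = k/gcd(k, m) can be checked in a concrete group. The sketch below uses the non-zero residues modulo a prime p under multiplication; this choice of group, the prime 31, and the function names are assumptions made only for the illustration. Orders are computed by brute force, so the formula is tested independently rather than assumed.

```python
from math import gcd

def order(a, p):
    """Multiplicative order of a modulo the prime p, by brute force (a not divisible by p)."""
    x, k = a % p, 1
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

def formula_holds(p, max_exponent=60):
    """Check o(a^m) == o(a) / gcd(o(a), m) for all a in 1..p-1 and m in 0..max_exponent."""
    for a in range(1, p):
        k = order(a, p)
        for m in range(max_exponent + 1):
            if order(pow(a, m, p), p) != k // gcd(k, m):
                return False
    return True

print(formula_holds(31))   # True
```

Note that m = 0 is covered as well: gcd(k, 0) = k, and indeed a^0 = e has order 1.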

Exercise 18. (a) If G is cyclic, then all its elements are powers of the same
a ∈ G. Hence, the product of any two elements b, c ∈ G can be calculated
by these powers: bc = a^i a^j = a^{i+j} = a^{j+i} = a^j a^i = cb. Thus, G is abelian
because the addition among the integers i and j is commutative. □
In (b), let G = ⟨a⟩. If o(a) = k, by Problem 8(b), H = {e, a, a^2, . . . , a^{k−1}}
is already a (cyclic) subgroup of G, with k elements. But any power a^j of
a equals some element of H! Indeed, if we divide j by k, i.e., j = kq + r
with quotient q and remainder r (0 ≤ r < k), then a^j = a^{kq+r} = (a^k)^q a^r =
e^q a^r = a^r ∈ H. Thus, the whole group G equals H, and their orders must
be the same: n = |G| = |H| = k, i.e., o(a) = n = |G|. □
(c) In our previous notation for D_n, let r_k be the rotation of a regular
n-gon A_0 A_1 A_2 . . . A_{n−1} which takes vertex A_0 to A_k, for k = 0, 1, 2, . . . , n − 1.
Then (r_1)^k = r_k for all k, i.e., the rotation r_1 generates the rotational
subgroup R_n = {r_0, r_1, . . . , r_{n−1}} of D_n. In particular, R_n = ⟨r_1⟩ is a cyclic
subgroup of order n. Now, if s_1 is the reflection of the n-gon across the
perpendicular bisector of A_0 A_1, then check that r_1 s_1 ≠ s_1 r_1. Indeed, while r_1 and
s_1 both send vertex A_0 to A_1, the two compositions r_1 s_1 and s_1 r_1 act overall
differently on A_0: r_1 s_1(A_0) = r_1(A_1) = A_2 ≠ s_1 r_1(A_0) = s_1(A_1) = A_0.
Therefore, D_n is non-abelian. By part (a), D_n can’t be cyclic either. □

Problem 9. Following the hint, let’s examine the examples in Exercise 10.
The elements of (R, +) all have infinite orders (why?), except for the identity
element 0, whose order is 1. The subgroup (Z, +) follows suit (why?) and
doesn’t produce anything interesting in terms of orders of elements. The
group (Z_n, +) is finite, hence it can’t have elements of orders larger than n
(why?); in fact, every j = 1, 2, . . . , n ∈ Z_n has order k = n/gcd(j, n) ≤ n.
(This needs a proof!) The only elements x of (R^*, ·) with finite orders are
those for which x^n = 1 for some positive n; but the only real numbers
satisfying such equations have absolute value 1 (why?), i.e., x = 1 or x = −1,
with orders 1 and 2, respectively. No luck here either!
Moving to D_∞ and using Exercise 11, check that every translation t^m with
m ≠ 0 has infinite order, while every element t^m s is a reflection of order 2:
(t^m s)^2 = t^m (s t^m) s = t^m t^{−m} s^2 = e. Thus, the only finite orders that occur
in D_∞ are 1 and 2. D_n
is a finite group of order 2n; so it won’t have elements of order greater than
2n; in fact, check that the largest order of an element in D_n is n, attained
by the rotations r_j with gcd(j, n) = 1. The symmetries of the real line R
behave in much the same way as D_∞ (cf. solution to Problem 7); so again
no luck here.

Finally, let’s examine the symmetries S(C) of the unit circle C. Any
rotation r_{2π/n} has order n (why?), and hence S(C) is an infinite group containing
elements of every finite order n ∈ N. Note that S(C) also has elements of
infinite order; for example, any rotation r_{aπ} where a is an irrational number
will never compose several times with itself to give the identity rotation
(why?). For instance, r_{√2·π} and r_{π^2} fall into this category. □
The final example is the unit circle C itself, or the larger group in which
it is contained: the non-zero complex numbers under multiplication, (C^*, ·).
Since any cyclic group C_n = ⟨ζ_n⟩ is contained in C, and since o(ζ_n) = n, our
infinite groups C and C^* have elements of any order n ∈ N. □

Problem 10. Let G be a finite subgroup in (C^*, ·). Then the same
discussion we had in the hints about U^{++} in Exercise 8 applies to G too! Indeed,
any a ∈ G has infinitely many powers {a^j}, all of which are inside the finite
group G. By PHP, it follows that two of those powers must coincide, i.e.,
a^n = a^m for some n > m, from which a^{n−m} = 1 for n − m > 0, and hence a
has some finite order k (why?). In other words, a^k = 1 in C, which means:
(a) a has a finite order in G, and
(b) the modulus of a is 1 and hence a lies on the unit circle C.
Therefore, G ⊂ C. Starting from 1 ∈ G ∩ C, let’s walk along C
counterclockwise. Since G is finite, after 1 ∈ G, there will be a first element g of G which
we will hit along our walk. Let the angle of g with the real axis be α, i.e.,
g = cos α + i sin α. We claim that g generates all of G.
Indeed, let h ∈ G, h ≠ 1, and h ≠ g. Then the angle of h is larger than
α, i.e., h = cos β + i sin β with 0 < α < β < 2π. Keep subtracting α from
β until you hit a negative angle for the first time: say, β − (l + 1)α < 0 but
γ = β − lα ≥ 0, so that γ − α < 0. Let b = cos γ + i sin γ; in group
terminology, b = h · g^{−l} ∈ G. Thus, b ∈ G has angle γ such that
0 ≤ γ < α. By the minimality of α this is impossible, unless γ = 0, i.e.,
β = lα, b = 1, and hence h = g^l.
So all elements of G are in the cyclic subgroup generated by g. This
certainly means that G equals its own subgroup, i.e., G = ⟨g⟩. As we showed
in (a) above with g in place of a, g must have some finite order q. We conclude
that G = ⟨g⟩ is the cyclic group C_q = ⟨ζ_q⟩, which we encountered earlier.
Thus, the finite subgroups of (C^*, ·) are precisely the cyclic subgroups C_n for
any n ∈ N. □
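Exercise 22. (In each answer the top row lists the elements and the bottom row their images.)

(a) πρ = ( 1 2 3 )     ρπ = ( 1 2 3 )
         ( 2 1 3 ) ;        ( 1 3 2 ) ;

(b) ( 1 2 3 4 5 6 7 )     ( 1 2 3 4 5 6 )     ( 1 2 3 4 5 6 7 8 )
    ( 7 6 4 1 3 5 2 ) ;   ( 5 1 2 3 6 4 ) ;   ( 1 7 8 2 4 6 5 3 ) ;

(c) ( 1 2  3  4 5 6 7 8 9 10 11 12 )
    ( 4 2 12 11 1 5 9 7 6  8  3 10 ) .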

Exercise 23. (1356)^2 = (15)(36), (1356)^3 = (1653), (1356)^4 = (1) = e.
Hence, o((1356)) = 4. □

Problem 11. If the r-cycle is α = (a_1 a_2 · · · a_r), then α^k ≠ e for any
k = 1, 2, . . . , r − 1. Indeed, α^k sends a_1 ↦ a_{k+1} ≠ a_1. However, α^r = e
as every element will move r slots to the right, i.e., it will come back to its
original position. Thus, o(α) = r. □
Exercise 25. φ = (1357)(2)(468) = (1357)(468). 
Problem 12. (a) The statement is obvious for n = 1: the only permutation
around is (1). Given a permutation α ∈ S_n, take some a_1 ∈ {1, 2, . . . , n} and
track down where it goes under α; let α(a_1) = a_2, α(a_2) = a_3, α(a_3) = a_4
and so on, i.e., define a_{k+1} = α(a_k). Keep on going, until you come back to
a_1, i.e., α(a_j) = a_1 for some j ≤ n. This will always happen. Indeed,
the sequence {a_1, a_2, a_3, . . . , a_n, a_{n+1}} consists of n + 1 numbers while we
have only n numbers to work with! By PHP, two elements of the sequence
must be the same! Let a_i = a_j with i < j be the first two elements that are equal. If
i > 1, then the permutation α has hit a_i ≠ a_1 twice: indeed, a_{i−1} ≠ a_{j−1}
(why?) but α(a_{i−1}) = a_i = a_j = α(a_{j−1}). This contradicts the bijectivity
of α! Thus, the only possibility is for the first repetition in the sequence
to involve a_1: a_1 = a_j where 2 ≤ j ≤ n + 1. Thus, α contains the (j − 1)-cycle
(a_1, a_2, . . . , a_{j−1}). The remaining k = n − (j − 1) numbers in {1, 2, . . . , n}
must be permuted amongst themselves by α (again because of bijectivity of
α); we can think of this permutation β as an element of S_k for some k < n.
Thus, α = (a_1, a_2, . . . , a_{j−1})β. By strong induction on n, we can write β as
a product of disjoint cycles, which implies that α can be written as such a
product too. □
(b) By (a), we can split any permutation α as a product of several disjoint
cycles α_i: α = α_1 α_2 · · · α_r. It remains to represent any cycle as a product
of (not necessarily disjoint) transpositions. The text suggests one way to do
this via a specific example. Here is a general formula for a cycle α_i:
α_i = (a_1 a_2 a_3 . . . a_j) = (a_1 a_j)(a_1 a_{j−1})(a_1 a_{j−2}) · · · (a_1 a_4)(a_1 a_3)(a_1 a_2).
Indeed, track down where each of the elements of α_i goes in the RHS, being
careful to apply the transpositions in the correct order, from right to left! For
example, in the RHS we send a_1 ↦ a_2, a_2 ↦ a_1 ↦ a_3, . . . , a_{j−1} ↦ a_1 ↦ a_j,
and a_j ↦ a_1, which is precisely what we want to happen. □
(c) Let α = α_1 α_2 · · · α_r be a product of disjoint cycles. As any two
disjoint cycles commute: α_i α_j = α_j α_i, we can easily compute any power of
α by rearranging and combining together the same cycles: α^k = α_1^k α_2^k · · · α_r^k.
In order for this whole product to be e, we must “kill” each power α_j^k. Let
the length of α_j be l_j. By Problem 11, we know that o(α_j) = l_j, so that the
first power that kills α_j is its length: α_j^{l_j} = e. Moreover, within the proof
of Problem 8, we showed that a^k = e iff o(a) divides the exponent k. Thus,
α_j^k = e iff l_j | k. As this applies to all cycles α_j, we conclude that α^k = e
iff k is divisible by all lengths l_1, l_2, . . . , l_r. The smallest k that makes this
happen is their least common multiple: k = lcm(l_1, l_2, . . . , l_r). □
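Parts (a) and (c) translate directly into a short program. The sketch below stores a permutation as a Python dictionary (an assumed representation), extracts its disjoint cycles by following each element until it returns, and compares the lcm of the cycle lengths with the order found by repeated composition. All function names are illustrative.

```python
from math import gcd
from functools import reduce

def cycles(perm):
    """Disjoint cycles of a permutation given as a dict {x: perm(x)}; fixed points are omitted."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:          # follow start -> perm(start) -> ... until we return
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        if len(cycle) > 1:
            result.append(tuple(cycle))
    return result

def order_from_cycles(perm):
    """Order computed as the lcm of the cycle lengths, as in part (c)."""
    return reduce(lambda a, b: a * b // gcd(a, b), (len(c) for c in cycles(perm)), 1)

def order_brute_force(perm):
    """Smallest k >= 1 with perm^k = identity, by repeated composition."""
    identity = {x: x for x in perm}
    power, k = dict(perm), 1
    while power != identity:
        power = {x: perm[power[x]] for x in perm}
        k += 1
    return k

# phi = (1357)(468) from Exercise 25
phi = {1: 3, 3: 5, 5: 7, 7: 1, 2: 2, 4: 6, 6: 8, 8: 4}
print(cycles(phi))              # [(1, 3, 5, 7), (4, 6, 8)]
print(order_from_cycles(phi))   # 12
print(order_brute_force(phi))   # 12
```

The 4-cycle (1356) from Exercise 23 can be checked the same way and gives order 4.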

Problem 13. (a) The arising 4 cases depend on whether, how much, and
how exactly tk = (1a) overlaps with the previous transposition tk−1 :
• Complete overlap: tk−1 = tk = (1a) so that tk−1 tk = (1a)(1a) = e and
we can erase tk−1 tk from the product, thereby reducing the number of
transpositions by 2 and contradicting the minimality of this odd-length
representation of e.
• No overlap: tk−1 = (bc) for some b and c different from 1 and a. Then
tk and tk−1 commute: tk−1 tk = (bc)(1a) = (1a)(bc).
• Partial overlap 1: tk−1 = (1b) for some b different from a and 1. Then
tk−1 tk = (1b)(1a) = (1ab) = (1a)(ab).
• Partial overlap 2: tk−1 = (ba) for some b different from a and 1. Then
tk−1 tk = (ba)(1a) = (1ba) = (1b)(ba).
While the first case is impossible, in the last three cases we managed to move
1 to the left in the total product. ♦

Exercise 26. Write any two even permutations α and β as products of
transpositions: α = t_1 t_2 · · · t_{2m} and β = q_1 q_2 · · · q_{2k}. Then αβ can be written
as the product of 2m + 2k transpositions, i.e., αβ is an even permutation.
Furthermore, the inverse is
α^{−1} = (t_1 t_2 · · · t_{2m})^{−1} = t_{2m}^{−1} t_{2m−1}^{−1} · · · t_2^{−1} t_1^{−1} = t_{2m} t_{2m−1} · · · t_2 t_1,
i.e., α^{−1} is also an even permutation. We have shown
that An , the set of even permutations, is closed under taking products and
inverses. It definitely contains the identity e, and its operation is associative
(as it is so in Sn ). Hence An is a subgroup of Sn . 
Using similar arguments, the reader can practice in showing that the
product of two odd permutations is even, while the product of an odd and
an even permutation is odd, and that the inverse of an odd permutation is
odd. ♦
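For a small case such as n = 4, these closure properties can be confirmed by enumeration. The sketch below decides parity by counting inversions; that this agrees with the parity of the number of transpositions is a standard fact assumed here, not something proved in this solution. Permutations are stored as tuples on {0, 1, 2, 3}, and the helper names are illustrative.

```python
from itertools import permutations

def is_even(p):
    """Parity via the number of inversions; p[i] is the image of i."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

def compose(p, q):
    """The product pq acting right to left: (pq)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """The inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S4 = list(permutations(range(4)))
A4 = [p for p in S4 if is_even(p)]
odd = [p for p in S4 if not is_even(p)]

print(len(A4))                                              # 12
print(all(compose(p, q) in A4 for p in A4 for q in A4))     # True: even times even is even
print(all(inverse(p) in A4 for p in A4))                    # True: inverses of even are even
print(all(compose(p, q) in A4 for p in odd for q in odd))   # True: odd times odd is even
```

The count 12 matches the 12 rotational symmetries of the tetrahedron used in Problem 14.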

Problem 14. Any symmetry of the tetrahedron is a permutation on its 4
vertices. The rotational symmetries correspond to the 12 elements of A_4:
e, the three (disjoint) products (ab)(cd), and the eight 3-cycles (abc). A
new (reflectional) symmetry adds to A_4 a new, odd permutation (ab). Thus,
13 ≤ |S(T)| ≤ 24 = |S_4|. But as the text alludes, the order of a subgroup
divides the order of the group, and the only number between 13 and 24
dividing 24 is 24 itself, i.e., S(T) = S_4. ♦

Problem 15. The cube has 6 pairs of opposite edges, through the midpoints
of which passes a rotational axis with only 1 non-trivial rotation by 180◦ .
The cube has 3 pairs of opposite faces, through the centers of which passes a
rotational axis with 3 non-trivial rotations (by 90◦ , 180◦ and 270◦ ). Finally,
there are 4 pairs of opposite vertices (forming the 4 diagonals), through which
passes a rotational axis with 2 non-trivial rotations (by 120◦ and 240◦ ). With
the identity this makes a total of 1+6·1+3·3+4·2 = 24 rotational symmetries
of the cube, i.e., |G4 | = 24. Can we identify G4 with a well-known group?

The cube has 8 vertices. As the hint in the text suggests, we are looking
for 4 objects that are permuted; such are the 4 pairs of diagonally oppo-
site vertices. Indeed, the 4 diagonals of the cube are permuted amongst
themselves by any symmetry of the cube (why?). If l is the line through
the midpoints of any two diagonals di and dj , then l is perpendicular to
the other two diagonals dk and dm (can you see it?). Hence, the rotation
about l by 180◦ will switch di and dj but fix dk and dm , thereby inducing
the transposition (di dj ) ∈ S4 .
As any permutation is a product of transpositions, these (di dj )s generate
the whole S4 of permutations of {d1 , d2 , d3 , d4 }. Thus, S4 ⊆ G4 . But |S4 | =
4! = 24 = |G4 |, so that G4 = S4 . 

Problem 16. Conversely, if we know where the diagonals go under a
symmetry of the cube, we know where the whole cube goes! The subtlety here
is to realize that each diagonal d_i corresponds to a pair of vertices (a_i, b_i),
which can be fixed or which can switch with each other under a cube’s
symmetry; in either case, the diagonal d_i goes to itself, so we don’t see a difference
between such symmetries from the viewpoint of our S_4 in Problem 15. So,
we might still be short a few symmetries of the cube!
Indeed, a central symmetry σ through the center O of the cube (i.e., a
dilation through O by a ratio of −1) will switch any two opposite vertices,
thereby fixing all 4 diagonals. Hence, σ is a (non-rotational) symmetry of the
cube such that o(σ) = 2. The reader should check that σ commutes with all
rotations in G4 and that any reflection of the cube can be uniquely written
as a product of σ and a rotational symmetry. For instance, the reflection
depicted in Figure 7b equals σρ = ρσ, where ρ is a 180°-rotation about the
line connecting the centers of the front and the back faces of the cube. In
summary, S(C) ≅ S_4 × C_2. ♦

Exercise 27. (d) (1, 8, 9, 6, 2, 14, 10, 4, 3, 11, 13, 7, 15, 5, 12). 

Exercise 28. If α, β ∈ S15 , then they both fix the number 16, so that their
composition αβ, as well as their inverses α−1 and β −1 , will also fix 16. As e
fixes 16, S15 can, indeed, be thought of as the subgroup of S16 consisting of
all permutations that fix 16. 

Question on p. 125. The permutation in Figure 1d is the 12-cycle from
Exercise 27(d) above, and hence it is an odd permutation. By Theorem 1, it is

Theorem 2. The 3-cycle (13, 14, 15) in the 15-puzzle can be converted into
e in many ways by permuting only the bottom two rows. Here is one way
(cf. Fig. 12), proposed by Alison Mirin, an alumna of Mills College, who
took the first course in Problem Solving in Mathematics that was based on
volume I of the present book.

(1) Rotate the bottom two rows clockwise by one place.

(2) Rotate the bottom middle 2 × 2 square clockwise by one place.
(3) Rotate the bottom left 2 × 2 square counter-clockwise by one place.
(4) Move the square with 9 in it (to the left), and then move the square
with 10 in it (up).
(5) Rotate the bottom two rows counter-clockwise by one place.
The last permutation is the desired identity permutation e. 

 1  2  3  4      1  2  3  4      1  2  3  4      1  2  3  4      1  2  3  4
 5  6  7  8      5  6  7  8      5  6  7  8      5  6  7  8      5  6  7  8
 9 10 11 12     14  9 10 11     14 13  9 11     13  _  9 11     13  9 10 11
14 15 13  _     15 13  _ 12     15  _ 10 12     14 15 10 12     14 15  _ 12

Figure 12. Permuting the 3-cycle (13, 14, 15) to the identity e (the starting position followed by the positions after steps (1) to (4); the empty square is marked _)
Session 6

Monovariants. Part II
Jumping Fleas and Conway’s Checkers

based on Gabriel Carroll’s session

Sneak Preview. In Part I of this session, we learned about monovariants:
things that can change but only in one direction. Plunging deeper into the topic,
here we shall track “right-wing” fleas and other “extremists,” zero-in on everything
in sight, enforce affirmative action by re-distributing favors and peacefully settle
scores among feuding knights, learn to efficiently edit our essay assignments, enjoy
organized sleep-shifts . . . at school, and discover unseen barriers for migrating
checkers. Through recreational problems and constructive activities, we shall
focus on two major uses of monovariants: showing that an iterative process must
end, and showing that some state is unreachable from some other state. Part III
shall present another, more technical application of monovariants to inequalities.
In addition to Monovariants I, the reader should review from volume I some
basics from Combinatorics, Number Theory, Proofs, Induction, and, of course,
the Invariants session, especially the “Escape of the Clones” problem.

1. Numerical Monovariants

A large portion of the examples of monovariants in Part I might be
called “numerical” – they were full of numbers, e.g., in the mansion problem
we talked about the number of people in each room, or the numbers of men
and women, separately, in each room, etc. When you have a large collection
of numbers and want to form a single monovariant from them, there are
some standard recipes to try:

PST 42. Look at the sum of all the numbers, or their product; look at the
largest, or the smallest; or more generally, apply some function to each of
them (such as squaring), and then add or multiply them together.

With some luck, you have created a monovariant: a feature in the
problem that either always increases or always decreases! Let’s see how this works
in practice.

1.1. Sum-monovariants are among the most common monovariants.

Problem 1. (Russia ’61) A rectangular m × n array of real numbers is
given. Whenever the sum of the numbers in any row or column is negative,
we may switch the signs of all the numbers in that row or column, from
negative to positive or vice versa. Prove that if we repeat this operation
enough times, eventually all the row and column sums will be non-negative.

Before looking at the official solution below, try a simple example with a
2 × 2 array, as in Figure 1. Think about how many flips of the row or column
signs you need to perform before being unable to continue. What happens if
you change the numbers in your table? Do you always get 4 positive numbers
in the end, or could some be negative?

−2 −3     +2 −3     −2 +3     +2 +3     +2 −3     −2 +3     +2 +3
       →         →         →         →         →         →
−1 −4     +1 −4     +1 −4     −1 −4     −1 +4     −1 +4     +1 +4

Figure 1. Switching signs in rows and columns

Solution: If the sum of the numbers in a row or column is x < 0, then
after we switch their signs, their sum will be −x > x. Since all the other
numbers in the table stay the same, we see that the sum of all the numbers
in the table has increased. (In the example in Figure 1, check that the sum
indeed increases: −10 → −4 → −2 → 0 → 2 → 4 → 10.) So this is our
monovariant – just the sum of all mn numbers.
Since each position in the array can take on only two different values
(the original number, with a + or − sign), the whole array can take on at
most 2^{mn} possible states. This is a finite number of choices, so the sum of all
the numbers cannot keep increasing forever. Thus, eventually it must stop
increasing, which means we cannot perform the sign-switching operation any
more, and hence all the row and column sums will be non-negative. □
PST 43. The key idea was to count the number of all possible states of the
array and deduce that some feature of the table we come up with, such as
the sum of all entries, will have only finitely many values. If our feature is a
monovariant, then it must reach an extreme value and stop changing.
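The argument is easy to watch in action. The sketch below is one possible implementation of the process, not the only one: it always flips the first row with negative sum, and looks at columns only when no such row exists (this selection rule is an arbitrary choice, and the function name is made up). It records the total sum after every flip, so the monovariant can be inspected directly.

```python
def flip_until_nonnegative(grid):
    """Flip rows/columns with negative sum until none is left.

    Returns the final grid and the list of total sums before and after each flip.
    """
    grid = [row[:] for row in grid]          # work on a copy
    m, n = len(grid), len(grid[0])
    totals = [sum(map(sum, grid))]
    while True:
        for i in range(m):
            if sum(grid[i]) < 0:             # a row with negative sum: flip it
                grid[i] = [-x for x in grid[i]]
                break
        else:
            for j in range(n):
                if sum(grid[i][j] for i in range(m)) < 0:   # a column with negative sum
                    for i in range(m):
                        grid[i][j] = -grid[i][j]
                    break
            else:
                return grid, totals          # nothing left to flip
        totals.append(sum(map(sum, grid)))

final, totals = flip_until_nonnegative([[-2, -3], [-1, -4]])
print(final)    # [[2, 3], [1, 4]]
print(totals)   # [-10, 0, 10]
```

With this selection rule the array of Figure 1 is finished after only 2 flips, while the order of flips shown in Figure 1 takes 6; the number of steps clearly depends on the choices made along the way.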
If you are wondering how we counted 2^{mn} possible states: we multiplied
the number of possibilities (2) for each cell of the array, e.g., for a 2 × 2 array
there are at most 2^4 = 16 possible states. This is an independent-choice
“menu”-type problem that you encountered in Combinatorics I (vol. I).
Potentially, a function could run through all of its possible values before
reaching its maximum. Could the process in Problem 1 take all the 2^{mn} − 1
sign-switching steps before terminating? For an m × n array, what would be
the largest number of steps in this process? The sum in Figure 1 takes only
6 (not 15) steps to stop: Can we devise a 2 × 2 example that will take longer
to terminate? This discussion should remind the reader of the Appendix
to the Monovariants I session, where an analogous question was answered
for the mansion problem. To make it even more challenging, the advanced
reader can ask and attempt to answer the same questions about maximizing
the length of the process whenever appropriate in the forthcoming problems.
Only hints to the next few problems will be offered here, but you can
check out the solutions at the end if you get stuck. Recall that the greatest
common divisor of two integers, gcd(a, b), is the largest integer that divides
both a and b, while the least common multiple, lcm(a, b), is the smallest
(positive) integer that is divisible by both a and b.

Problem 2. (St. Petersburg ’96) Several positive integers are written on
a blackboard. One can erase any two distinct numbers and write their gcd
and lcm instead. Prove that eventually the numbers will stop changing.

As usual, let’s try out some examples to see what is happening:

• {2, 3, 5, 15} → {1, 6, 5, 15} → {1, 1, 30, 15}, or differently:
• {2, 3, 5, 15} → {1, 3, 5, 30} → {1, 1, 15, 30}.
Hints: If d = gcd(a, b), then a = dk and b = dm for some relatively prime
positive integers k and m, and l = lcm(a, b) = dkm (why?). Think of how the
sum changes from a + b to d + l: some algebra will be necessary to establish
whether you have an increasing or decreasing monovariant. Finally, what is
the largest number you can possibly write down? ♦
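To gather experimental evidence for the hints, one can let a computer run the process. The sketch below (the function name and the rule for choosing pairs are arbitrary choices) repeatedly picks the first pair that actually changes under the operation and records the sum of all numbers after each step.

```python
from math import gcd
from itertools import combinations

def gcd_lcm_process(numbers):
    """Replace pairs (a, b) by (gcd, lcm) as long as some pair changes.

    Returns the final list and the list of sums before and after each step.
    """
    nums = list(numbers)
    sums = [sum(nums)]
    while True:
        for i, j in combinations(range(len(nums)), 2):
            a, b = nums[i], nums[j]
            d = gcd(a, b)
            l = a * b // d                  # lcm(a, b)
            if {a, b} != {d, l}:            # the operation changes this pair
                nums[i], nums[j] = d, l
                sums.append(sum(nums))
                break
        else:
            return nums, sums               # no pair changes any more

final, sums = gcd_lcm_process([2, 3, 5, 15])
print(final)   # [1, 1, 30, 15]
print(sums)    # [25, 27, 47]
```

The run reproduces the first chain of examples above. Watching how the recorded sums behave on other starting sets may suggest which way the monovariant goes.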

1.2. “Extremal” monovariants. By now, for your monovariants you should
be automatically trying first the sum of your numbers. However,
PST 44. Sometimes it is easier to track the largest or the smallest number
present, and to set it aside if it doesn’t change under the operation.
Here are a few more related puzzles for you to think about. Their
solutions may also require concepts like prime powers p^k.
Problem 3. In the setup of Problem 2:
(a) Is the product of all numbers a monovariant? Is it helpful at all?

(b) You will inevitably see a pattern among the numbers when the process
terminates. What is this pattern? Does it always persist?
(c) We choose the order of pairs on which to perform the operation. Pre-
sumably, the length of the process and the final resulting set of numbers
depend on our choice, or do they?

Hints: When does the process terminate? What pairs of numbers are kept
the same under the operation? If the largest (or smallest) possible number
is written on the board, will that number change afterwards? Can you put
it aside and apply the problem to the remaining numbers? ♦

Problem 4. (Cofman, [16]) Place four non-negative integers a, b, c, and d
around a circle. For every two consecutive numbers, take their absolute
difference and write that difference between them; then erase the four original
numbers. Thus, after one step, the four new numbers will be |a − b|, |b − c|,
|c − d|, |d − a|. Iterate this process. Is it true that the process always
eventually leads to a circle with four 0s? Generalize your result by replacing 4
with any power of 2 greater than one.
Solution: The first part of our solution works for any M numbers (not
just 4 or 2^m). For starters, notice that at every step the largest number L
around the circle can never increase: it will be replaced with some difference
|L − c| = L − c ≤ L, while no other (smaller) value can go beyond L. Thus,
the largest value is a decreasing monovariant, and it will eventually stabilize
at some value a ≥ 0. Of course, if a = 0, we are done (why?).

Lemma 1. For any M numbers, if the largest number around the circle
stabilizes at some a > 0, eventually each of the numbers will be either a or 0.

Solution: The only way to get the value a from now on is, at each step, to
have the consecutive pair {a, 0} or {0, a} appearing somewhere on the circle.
But the only ways to produce {a, 0} are to have consecutively {a, 0, 0} or
{0, a, a}, and to produce {0, a} – to have {0, 0, a} or {a, a, 0}. We see that
from now on, at each step, we must have at least one of the sequences
{a, 0, 0}, {0, a, a}, {0, 0, a}, or {a, a, 0}. This begs for an inductive argument.
Suppose we have shown that for some n ≥ 2 it is always necessary (from
now on at each step) to have some sequence An = {a1, a2, . . . , an} where all
ai’s are 0 or a, and at least one of them is a. (The sequence itself is not
fixed, so it can be one of 2^n − 1 types – the exact number of sequences is
irrelevant.) But to produce sequence An , we must have from now on, at each
step, a sequence An+1 of (n + 1) 0s and a’s, with at least one a. Indeed,
start from some ai = a in An . To have such an a, as we saw above, we need
to have either {a, 0} or {0, a} in the corresponding place in An+1 . One can
easily see that each of these cases uniquely determines the rest of sequence
An+1 , populating it only with 0s and a’s. This completes the induction step.
But what happens when the length n of the required sequence An exceeds
M ? It simply means that we have wrapped the sequence around the whole
circle, and from now on the only values on the circle will be a’s and 0s. 
Figure 2. Cycles for triangles and hexagons

The remainder of the problem does not work in general. For instance,
take the 3 numbers {0, a, a} on the gray triangle in Figure 2a; the next
step will be the white triangle with exactly the same label {0, a, a}! Neither
will we get to the zero-configuration from the hexagon {0, a, a, 0, a, a}
(cf. Fig. 2b), whose label also goes to a rotated version of itself under the
operation. The reader may want to think of other counterexamples.
Below we move to the 2^m case, which always works.
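The behavior described so far is easy to watch on a computer. Here is a minimal sketch (the names step and orbit are ours, and we take a = 1 in the triangle and hexagon) that iterates the operation of Problem 4, prints the largest number at each step, and checks the two cycles of Figure 2.

```python
def step(circle):
    """Replace every number by the absolute difference with its clockwise neighbor."""
    n = len(circle)
    return [abs(circle[i] - circle[(i + 1) % n]) for i in range(n)]

def orbit(circle, steps):
    """Return the starting circle followed by the results of `steps` iterations."""
    states = [list(circle)]
    for _ in range(steps):
        states.append(step(states[-1]))
    return states

for state in orbit([3, 8, 1, 12], 4):
    print(state, "largest:", max(state))   # ends at [0, 0, 0, 0]

print(step([0, 1, 1]))             # [1, 0, 1], a rotation of the triangle
print(step([0, 1, 1, 0, 1, 1]))    # [1, 0, 1, 1, 0, 1], a rotation of the hexagon
```

The starting quadruple 3, 8, 1, 12 is an arbitrary choice; the largest entries it produces are 12, 11, 4, 2, 0, in agreement with the monovariant from the first part of the solution.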
Lemma 2. If you start with M = 2^m numbers, all 0s and a’s, then after
2^m iterations, you will be left with only 0s.
Partial proof: As an illustration, take the square with labels {0, a, a, a}
in Figure 3. After 4 iterations (follow the numbers outside the squares), this
turns into {0, 0, 0, 0}. To see why, forget about the operation |b − c|: it is
too hard to track what happens under it. As everything is divisible by a,
we can factor out a and, de facto, assume that a = 1. Now, let’s add up
any two adjacent numbers (written inside the squares) and work modulo 2,
i.e., think of 0 for even and 1 for odd. The results will be really the same
as before: at each step, 1 + 1 = 2 = 0 = |1 − 1|, 1 + 0 = 1 = |1 − 0|, and
0 + 0 = 0 = |0 − 0|. The net effect of these additions is shown in Figure 3,
where the final label is {10, 12, 14, 12} = {0, 0, 0, 0} (mod 2).
Figure 3. Zeroing-in on the circle for 2^2 numbers
In order to show that this works for any initial labels {a1 , a2 , a3 , a4 },
track the additions for the next 4 iterations. Thus, for instance, in place
of a1 in the last square we will have 6a1 + 4a2 + 4a4 + 2a3 = 0 (mod 2).
Analogous formulas will imply that all numbers around the last square are 0.
For instance, a1 = 0, a2 = 1, a3 = 1, a4 = 1 give 6 · 0 + 4 · 1 + 4 · 1 + 2 · 1 = 10.
To extend this argument to any 2^m numbers, it is necessary to recognize
the coefficients 6, 4, 4, and 2 in the final formula as certain binomial
coefficients, generalize them, and show that they are all even. ♦
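Before attempting the general proof, you can at least check the claim numerically. The sketch below assumes the bookkeeping suggested above: after 2^m rounds of adding neighbors, the number k places further along the circle is counted C(2^m, k) times, and the counts for k = 0 and k = 2^m land on the same position. The function name coefficients is our own. This is a sanity check, not a proof.

```python
from math import comb

def coefficients(m):
    """Coefficients around a circle of 2^m numbers after 2^m rounds of adding neighbors."""
    size = 2 ** m
    coeffs = [0] * size
    for k in range(size + 1):
        coeffs[k % size] += comb(size, k)   # k = 0 and k = size wrap onto the same place
    return coeffs

print(coefficients(2))   # [2, 4, 6, 4]
print(all(c % 2 == 0 for m in range(1, 8) for c in coefficients(m)))   # True
```

For m = 2 we recover the coefficients 6, 4, 4, and 2 from the square; for larger m the check suggests which binomial coefficients you will need to prove even.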

1.3. Looking for more than just a monovariant. Here’s a clever but
really difficult problem. It will require using an invariant, a feature that does
not change (which shouldn’t be hard to find), together with a monovariant.
Problem 5. (IMO ’86, [21]) An integer is written at each vertex of a reg-
ular pentagon so that the sum of all five numbers is positive. If three consec-
utive vertices are assigned the numbers x, y, z with y < 0, then the following
operation is allowed: the numbers x, y, z are replaced by x + y, −y, z + y,
respectively. Such an operation is repeated as long as at least one of the five
numbers is negative. Determine whether the procedure necessarily comes to
an end in a finite number of steps.
Figure 4. Changing consecutive sums

Hint: Try the following numerical experiment. Put some numbers at the
vertices, e.g., as in Figure 4a. Then write down all of the possible sums
of 1 or more consecutive numbers around the pentagon. (There are 21 such
sums. Why?) Perform the operation in the problem several times, and at
each step, again write down all 21 sums. How do the sums change when the
operation is performed? In particular, what about their absolute values? ♦
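The numerical experiment from the hint can be automated. In the sketch below the starting numbers, the names arc_sums and play, and the rule of always operating on the most negative vertex are our own assumptions, chosen only for illustration. It records one candidate quantity suggested by the hint, the total of the absolute values of all 21 consecutive sums, after each operation.

```python
def arc_sums(v):
    """All sums of consecutive numbers around the pentagon (21 in total)."""
    n = len(v)
    sums = [sum(v[(i + k) % n] for k in range(length))
            for length in range(1, n) for i in range(n)]
    sums.append(sum(v))   # the whole pentagon is counted once
    return sums

def play(v):
    """Apply the operation until no number is negative; record the candidate quantity."""
    v = list(v)
    history = [sum(abs(s) for s in arc_sums(v))]
    while min(v) < 0:
        i = v.index(min(v))          # assumed rule: operate on the most negative vertex
        y = v[i]
        v[i - 1] += y                # x becomes x + y
        v[(i + 1) % len(v)] += y     # z becomes z + y
        v[i] = -y                    # y becomes -y
        history.append(sum(abs(s) for s in arc_sums(v)))
    return v, history

final, history = play([5, -1, 0, 3, -2])
print(final)
print(history)
```

Look at how the recorded totals change from one operation to the next, and then try to explain the change by pairing up the 21 sums before and after a single operation.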
If you attempt to solve this last problem, you will realize that the ap-
propriate monovariant can sometimes be tricky to construct. In practice,
though, when a problem requires a monovariant, it’s usually not too hard
to come up with it. The simple formulas mentioned at the beginning of
this section are fairly general-purpose. The further problems in this session
should help provide you with inspiration for the rough times when the usual
recipes come up short.

1.4. Monovariants and Sequences. We’ve been looking so far at descriptions
of processes repeated over time. Of course, time, strictly speaking,
is not a mathematical concept; it’s a conceptual convenience. When
we talk about the state of some system changing over time, that is re-
ally shorthand for a sequence of states, with a certain relationship between
each state and the next. For example, Problem 4 could be rewritten as
describing a sequence s0 , s1 , s2 , . . ., where each si is a quadruple of num-
bers (ai , bi , ci , di ). The relationship between si and si+1 is given by si+1 =
(|ai − bi |, |bi − ci |, |ci − di |, |di − ai |). (Or, alternatively, we could think of
the problem as describing four sequences of integers, ai , bi , ci , di , living in
interrelated harmony.)
Using this equivalence between changes over time and sequences, we can
translate an earlier PST:

PST 45. To show that some sequence of numbers (or other things) is eventually
constant, try using a monovariant.

For example, the monovariant might be some function of the nth term
of the sequence, which would change in some predictable way as n increases.
Or perhaps we need to look not just at one term at a time, but at two or
more successive terms. This is confusing to explain without an example, so
here’s an example to make the discussion concrete.

Problem 6. (USAMO ’93, [41]) Let a and b be two odd positive integers.
Define a sequence by putting f1 = a, f2 = b, and letting fn for n ≥ 3 be the
greatest odd divisor of fn−1 + fn−2 . Prove that fn becomes constant for n
sufficiently large, and determine the eventual value as a function of a and b.

As an example, if a = 11 and b = 23, then f3 = 17, f4 = 5, f5 = 11,
f6 = 1, f7 = 3, f8 = 1, f9 = 1, and the sequence stabilizes at 1. But how do
we tie monovariants into the problem?
 PST 46. The key is to turn a divisor relationship into an inequality.
For example, if c divides d, then c ≤ d. Thus, for a sequence (of positive
terms) in which fn is a divisor of fn−1 for each n, in particular we know that
fn ≤ fn−1 , i.e., that the sequence is decreasing. The situation, however, isn’t
quite that simple in our problem – the example shows we don’t necessarily
have fn ≤ fn−1 for all n – but the truth is not too much more complicated.
Proof of stabilization: Notice that for each n, fn−2 + fn−1 is even,
which means that fn , as an odd divisor of fn−2 + fn−1 , is no larger than
(fn−2 + fn−1 )/2, the average of the two preceding terms. Therefore,
(1) fn ≤ (fn−2 + fn−1 )/2 ≤ max{fn−2 , fn−1 }.
This inequality is not “well-balanced” for our purposes: on the LHS we have
one term fn , and on the RHS we have some “max” function. To make it
symmetric, this “max” function should appear on the LHS too. But clearly
(2) fn−1 ≤ max{fn−2 , fn−1 },
so that the LHS’s of the last two inequalities (1)-(2) together imply
max{fn−1 , fn } ≤ max{fn−2 , fn−1 },
the desired balanced inequality we need. We read it as follows: if, for each n,
we consider the larger of the two numbers fn−1 , fn , this max never increases
when n increases. Aha! A monovariant!
Hence, max{fn−1 , fn } can only decrease. Since it is a positive integer, it
can’t keep decreasing forever, so it must eventually become constant at some
value c. Let n be large enough that the monovariant has stopped decreasing,
i.e., max{fn−1 , fn } = c from n on. We can assume fn = c (otherwise, use
max{fn , fn+1 } = c and just replace n by n + 1). Then we claim fn+1 = c
also. For contradiction, suppose fn+1 < c; then max{fn+1 , fn+2 } = c implies
fn+2 = c. By definition of fn+2 , this means that c is a divisor of fn + fn+1 .
But fn + fn+1 lies strictly between c and 2c (why?), so it cannot be divisible
by c. This contradiction proves the claim: fn+1 = c after all. We have
completed an inductive argument that shows all subsequent terms of the
sequence are also equal to c, i.e., the sequence is eventually constant. 
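To see the monovariant from this proof at work, here is a short sketch (the function names are our own) that generates the sequence of Problem 6 for a = 11 and b = 23 and lists max{fn−1, fn} for each pair of consecutive terms.

```python
def greatest_odd_divisor(m):
    """Divide out all factors of 2 from the positive integer m."""
    while m % 2 == 0:
        m //= 2
    return m

def sequence(a, b, length):
    """First `length` terms of the sequence with f1 = a and f2 = b."""
    f = [a, b]
    while len(f) < length:
        f.append(greatest_odd_divisor(f[-1] + f[-2]))
    return f

f = sequence(11, 23, 12)
pair_max = [max(x, y) for x, y in zip(f, f[1:])]
print(f)          # [11, 23, 17, 5, 11, 1, 3, 1, 1, 1, 1, 1]
print(pair_max)   # [23, 23, 17, 11, 11, 3, 3, 1, 1, 1, 1]
```

The terms themselves go up and down, but the list of maxima never increases, exactly as the balanced inequality predicts.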
Once this is out of the way, the second part of the problem is not difficult.
In our example that started with f1 = 11, f2 = 23, the stabilizing constant
was c = 1. At the same time, the original two terms were relatively prime,
as was every pair of consecutive terms. If we multiply both f1 and f2
by 5, then every subsequent term would be multiplied by 5, modifying the
constant c to 5. This gives us the idea of how to proceed:

Exercise 1. In the setup of Problem 6 prove that gcd(fn−1 , fn ) is an invari-
ant. Conclude that the constant value at which fn stabilizes is gcd(a, b).
Hint: Start by proving that fn+1 is divisible by gcd(fn−1 , fn ), and that
fn−1 is divisible by gcd(fn , fn+1 ). ♦
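The conclusion of Exercise 1 can be tested before you prove it. The sketch below (eventual_value is a name we made up) runs the sequence until two consecutive terms agree, which is when it becomes constant, and compares the result with gcd(a, b) for all odd a, b below 100. Passing such a test is encouraging evidence, not a proof.

```python
from math import gcd

def eventual_value(a, b):
    """Run the sequence of Problem 6 until two consecutive terms are equal."""
    x, y = a, b
    while x != y:
        s = x + y
        while s % 2 == 0:    # keep only the greatest odd divisor
            s //= 2
        x, y = y, s
    return x

print(eventual_value(11, 23))    # 1
print(eventual_value(55, 115))   # 5
print(all(eventual_value(a, b) == gcd(a, b)
          for a in range(1, 100, 2) for b in range(1, 100, 2)))   # True
```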
Problem 7. (USAMO ’97, [41]) Let p1 , p2 , p3 , . . . be the prime numbers
listed in increasing order, and let x0 be a real number between 0 and 1. For
each positive integer k, define xk = 0 if xk−1 = 0, and xk = {pk/xk−1} otherwise,
where {x} = x − ⌊x⌋ denotes¹ the fractional part of x. Find all x0 satisfying
0 < x0 < 1 for which the sequence x0, x1, x2, . . . eventually becomes 0.

As an example, if x0 = 3/5, then x1 = {p1/x0} = {2/(3/5)} = {10/3} =
10/3 − ⌊10/3⌋ = 10/3 − 3 = 1/3, x2 = {p2/x1} = {3/(1/3)} = {9} = 0, and
the sequence stabilizes. If x0 = 1/√2, the calculation below is a lot more
involved and requires some algebra skills. (If you have a hard time following
it, you can skip it for now, as it won’t affect the gist of the solution.) Thus,
x1 = {p1/x0} = {2√2} = 2√2 − ⌊2√2⌋ = 2√2 − 2 = 2(√2 − 1),
x2 = {p2/x1} = {3/(2(√2 − 1))} = {3(√2 + 1)/(2(√2 − 1)(√2 + 1))} = {3(√2 + 1)/(2(2 − 1))}
   = 3(√2 + 1)/2 − ⌊3(√2 + 1)/2⌋ = 3(√2 + 1)/2 − 3 = 3/√2 − 3/2, and so on.
Hint: What looks evident in the second example is that we won’t be able to
get rid of √2, which will prevent the sequence from stabilizing. The reason
is that if x0 is irrational,² then all xk are going to be irrational (why?).
Now suppose x0 is rational, as in our first example. The sequence it-
self may not be monovariant! However, do you notice something about the
denominators of x0 , x1 , x2 , . . .? Employ the PST below and see if you can
locate your monovariant. ♦

PST 47. For a sequence of rational numbers, investigate the two sequences
that it naturally generates: its numerators and its denominators. Depending
on the problem, it may or may not be advantageous to reduce the fractions
so as to redefine the two sequences.
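For rational x0 the whole sequence of Problem 7 can be computed exactly, which makes it easy to watch the denominators. The sketch below uses Python’s Fraction type, which always keeps fractions in lowest terms; the helper names primes and orbit are our own.

```python
from fractions import Fraction
from math import floor

def primes():
    """Yield 2, 3, 5, ... by trial division (fine for a short experiment)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

def orbit(x0):
    """x0, x1, x2, ... up to and including the first 0 (x0 must be rational)."""
    xs = [Fraction(x0)]
    for p in primes():
        if xs[-1] == 0:
            return xs
        q = p / xs[-1]
        xs.append(q - floor(q))   # the fractional part of p / x

for x0 in (Fraction(3, 5), Fraction(7, 10)):
    xs = orbit(x0)
    print([str(x) for x in xs], [x.denominator for x in xs])
```

The first starting value reproduces the example 3/5, 1/3, 0. Compare the lists of reduced denominators for both starting values, and then try to explain the pattern you see.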
Problem 8. (USAMO ’07, [25]) Let n be a positive integer. Define a
sequence by setting a1 = n and, for each k > 1, letting ak be the unique
integer in the range 0 ≤ ak ≤ k − 1 for which a1 + a2 + · · · + ak is divisible
by k. For instance, when n = 9 the sequence obtained is 9, 1, 2, 0, 3, 3, 3, . . ..
Prove that for any n the sequence a1 , a2 , a3 , . . . eventually becomes constant.

¹ ⌊x⌋ is the largest integer ≤ x, e.g., ⌊10/3⌋ = ⌊3.333. . .⌋ = 3 and ⌊√2⌋ = ⌊1.4. . .⌋ = 1.
² Irrational means a real number that is not of the form a/b for any integers a and b.

Why does such a sequence exist and why is it unique? Once a1, a2, . . . ,
ak−1 are determined, there is exactly one choice for ak to adjust the previous
sum a1 +a2 +· · ·+ak−1 to be divisible by k. For instance, if we already know
the first terms 9, 1, 2, 0, to get a5 ∈ [0, 4], we compensate for the remainder
2 (mod 5) of 9 + 1 + 2 + 0 = 12 by adding the only possibility 3 = a5 .
Hint: Since a1 + a2 + · · · + ak is divisible by k, this suggests looking at the
quotient bk = (a1 + a2 + · · · + ak )/k. Check the sequence b1 , b2 , b3 , . . . : you
may want to first show that it stabilizes. ♦
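Both the sequence ak and the quotients bk from the hint are easy to tabulate. A minimal sketch, with function names of our own choosing:

```python
def sequence(n, length):
    """First `length` terms a1, a2, ... of the sequence in Problem 8."""
    a, total = [n], n
    for k in range(2, length + 1):
        a_k = (-total) % k      # the unique value in 0..k-1 making the sum divisible by k
        a.append(a_k)
        total += a_k
    return a

def quotients(a):
    """b_k = (a1 + ... + ak) / k for each k."""
    total, b = 0, []
    for k, a_k in enumerate(a, start=1):
        total += a_k
        b.append(total // k)
    return b

a = sequence(9, 8)
print(a)              # [9, 1, 2, 0, 3, 3, 3, 3]
print(quotients(a))   # [9, 5, 4, 3, 3, 3, 3, 3]
```

Try other values of n and watch what the quotients do before the ak settle down.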

2. Constructive Activities

Imagine that you’re editing a piece of writing – maybe you’re a student
turning in a paper for a class, or maybe you’re a professional writer working
on your next book. If you’re like most writers, you don’t simply sit down
and instantly crank out a perfect piece of work. You start with a first draft,
and you know that there are lots of flaws and mistakes and weak points in
the writing. So you go through and fix them. Sometimes fixing a problem
requires making major changes to the paper, and in the process you create
new problems and mistakes. But you do it, because you know that on the
whole it’s an improvement, and the new problems can be fixed in turn. You
keep revising and revising, and eventually the result will be satisfactory.
Now, what does all that have to do with mathematics? It’s an example of
monovariants in action: the paper keeps getting better each time a problem
is fixed. In real life this iterative process is tedious and time-consuming, but
mathematics is great, because all you have to do is describe the process and
verify that it is possible. To strip away the allegory and get to the point:

PST 48. To construct an object that meets certain conditions, create an
object that doesn’t meet the conditions, and then fix it until it does. Use a
monovariant to show that it will eventually be completely fixed.

2.1. Connecting the dots. The vagueness above is probably making your
eyes glaze, so let’s save the day with a specific example [59], where we will
progressively eradicate all “errors” in the solution.
Problem 9. (Kvant ’94) Given are n grey and n black points in the plane,
no three collinear (i.e., no three on the same line). Show that we can draw n
nonintersecting segments connecting the black points to the grey points.

Figure 5. Switching connections and Creating more intersections


How to go about this? If you try to give an explicit description of how
to pair up the points, it’s not clear where to start. So, using PST 48,
start instead by simply connecting the points randomly. Now whenever two
segments cross, we can uncross them by changing the way the points are
connected, as shown in Figure 5a.
We can just repeat this process, until all the intersection points have gone
away. Or can we? We need to show that the process will eventually end.
We can’t simply say that the number of intersection points will decrease,
because we’d be lying. As shown in Figure 5b, switching the segments could
create new intersection points and even increase their total number!

2.1.1. A geometric monovariant. We need to find some other monovariant
to guarantee that the process won’t get caught in a loop. Well, you might
notice that each time two segments are uncrossed, two long segments are
replaced by two shorter segments. Let’s make this precise. The four endpoints
of the segments form a convex quadrilateral, and uncrossing the segments just
means replacing the two diagonals by two opposite sides.
Exercise 2. Check that, if ABCD is a convex quadrilateral, then
AC + BD > AD + BC.
Hint: If AC ∩ BD = {E}, use the Triangle Inequality on triangles ADE and BCE. ♦
It immediately follows that the sum of the lengths of all n segments
decreases every time we perform the uncrossing operation.
2.1.2. The clean write-up. Now all the parts are in place to be put together.
Solution to Problem 9: First pair up the grey and the black points ar-
bitrarily, and connect each grey point to the corresponding black point. This
may create some intersection points. Now iterate the following operation:
• Whenever a segment AC crosses a segment BD, with A, B grey and
C, D black, replace segments AC and BD by AD and BC.
For AC and BD to intersect, ABCD must be a convex quadrilateral
(why?). So by Exercise 2, the sum of the lengths of all n line segments must
decrease each time we perform this uncrossing operation. But there are only
a finite number of ways to pair up the grey points with the black points.³ So
if we perform the operation repeatedly, the process must eventually end.
By assumption, we perform the uncrossing above whenever two segments
cross. So when the process stops, there must be no more crossings, which
means that we have paired up the grey points with the black points using n
nonintersecting line segments – just as the problem requires. 
³ Recall the matchmaking Exercise 8 from Combinatorics I (vol. I). There, 10 men and
10 women could be married off into heterosexual couples in 10! ways. Similarly, the
number of possible pairings between the black and grey points is n!.

Notice that this is an example of a problem where no operation is sup-
plied. Instead, solving the problem requires coming up with both the mono-
variant (total segment length) and the operation (uncrossing of segments)
that makes it monovary.
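The self-correcting process is short enough to run on a computer. The following is only a sketch under our own assumptions: the points are random integer points (re-drawn until no three are collinear), the starting pairing joins the i-th grey point to the i-th black point, and the first crossing found is the one that gets uncrossed. All function names are ours.

```python
import itertools
import math
import random

def orient(p, q, r):
    """Sign of the turn p -> q -> r (+1 left, -1 right, 0 collinear)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crosses(a, c, b, d):
    """Do segments ac and bd cross? Assumes no three of the points are collinear."""
    return orient(a, c, b) != orient(a, c, d) and orient(b, d, a) != orient(b, d, c)

def random_points(count, rng):
    """Random integer points with no three collinear (rejection sampling)."""
    while True:
        pts = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(count)]
        if all(orient(p, q, r) != 0 for p, q, r in itertools.combinations(pts, 3)):
            return pts

def uncross(grey, black):
    """Uncross segments until none cross; also return the total length after each step."""
    n = len(grey)
    match = list(range(n))                 # grey[i] is joined to black[match[i]]

    def total_length():
        return sum(math.dist(grey[i], black[match[i]]) for i in range(n))

    lengths = [total_length()]
    while True:
        crossing = next(((i, j) for i, j in itertools.combinations(range(n), 2)
                         if crosses(grey[i], black[match[i]], grey[j], black[match[j]])),
                        None)
        if crossing is None:
            return match, lengths
        i, j = crossing
        match[i], match[j] = match[j], match[i]   # replace AC, BD by AD, BC
        lengths.append(total_length())

rng = random.Random(1)
points = random_points(16, rng)
match, lengths = uncross(points[:8], points[8:])
print(match)
print([round(x, 1) for x in lengths])
```

The printed lengths go down at every step even though the number of crossings may temporarily go up, which is the whole point of choosing total length as the monovariant.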

2.1.3. Extremes again! Writing out this solution doesn’t necessarily require
us to describe fixing the configuration as a process. For example, we could
also write the solution as follows:
Alternative solution to Problem 9: Among all n! ways of pairing
the grey points with the black points, consider the pairing that makes the
sum of the lengths of the segments as small as possible. We claim that,
with this pairing, the segments never cross each other. Indeed, if there is a
crossing, then we can re-pair the points involved as in Figure 5a to make the
total length of the n segments shorter (by Exercise 2). But we assumed that
the original pairing made this total length as small as possible, so this is a
contradiction.
The idea of picking a pairing that minimizes the total segment length
is a famous technique called the Extreme Principle:

PST 49. Define a feature (e.g., sum of lengths) and select an object having
an extreme value of that feature (e.g., minimal sum). Then argue that, due
to the extreme value, some operation is not possible (e.g., uncrossing), and
hence conclude that the object in question possesses some other property
(e.g., no intersection points).
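For a handful of points, the Extreme Principle can be carried out literally by trying all n! pairings. In the sketch below the test points are our own hypothetical choice: they lie on the parabola y = x^2, which guarantees that no three are collinear. The function names are also ours.

```python
import itertools
import math

def orient(p, q, r):
    """Sign of the turn p -> q -> r (+1 left, -1 right, 0 collinear)."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crosses(a, c, b, d):
    """Do segments ac and bd cross? Assumes no three of the points are collinear."""
    return orient(a, c, b) != orient(a, c, d) and orient(b, d, a) != orient(b, d, c)

def shortest_pairing(grey, black):
    """Among all n! pairings, return one with the smallest total length."""
    n = len(grey)

    def total_length(m):
        return sum(math.dist(grey[i], black[m[i]]) for i in range(n))

    return min(itertools.permutations(range(n)), key=total_length)

grey = [(t, t * t) for t in (0, 2, 5, 7, 8)]
black = [(t, t * t) for t in (1, 3, 4, 6, 9)]
pairing = shortest_pairing(grey, black)
crossings = [(i, j) for i, j in itertools.combinations(range(len(grey)), 2)
             if crosses(grey[i], black[pairing[i]], grey[j], black[pairing[j]])]
print(pairing, crossings)   # the list of crossings is empty
```

With 5 points of each color this examines only 120 pairings; the count n! grows so quickly that the approach is hopeless for even moderately large n.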
Speaking of which, our alternative solution relied on just one minimal-
length pairing; but it did not preclude the existence of other such pairings,
nor did it outlaw good non-minimal pairings:

Exercise 3. Could there be a configuration of grey and black points with
more than one minimal-length pairing of the segments? How about having
a correct (non-intersecting) pairing of the points that is not the shortest?

 Another idea that anchored both solutions was the finiteness of all pos-
sible correct (and, in fact, incorrect) pairings. We saw this idea earlier in
PST 43, where we counted the total number of states in an array. Here,
just knowing that there are finitely many possible pairings made our a priori
continuous length monovariant into a discrete monovariant (having only
finitely many values). Because of this, we could conclude that the monovariant
eventually stabilized, though perhaps not at its minimal value.

2.1.4. Pros and cons. The solution to Problem 9 can be written either way:
by a “self-correcting” process or via the Extreme Principle. Arguably the
second way is in some sense more appealing, because it explicitly identifies
the pairing (or one of the pairings) that works. But both solutions require
the same key idea – the same operation and the same monovariant.

Thinking about the problem in terms of a process and a monovariant
can be more helpful for you as the solver, trying to come up with a solution.
It can also be more helpful for someone trying to actually implement the
solution. While the Extreme Principle is a great theoretical tool. . . going
back to the writing example, if your algorithm for writing a 10-page essay is
to consider all possible 10-page sequences of letters, spaces, and punctuation
marks, and choose the one that best fits the demands of the assignment,
you’re going to have to ask your teacher for one heck of an extension!

2.2. Friendship/Enmity Relationships. Let’s practice our newly-learned
“correcting” or “editing” technique of starting from a random arrangement
and gradually “improving” it. Below we assume that friendship is symmetric,
i.e., if P is Q’s friend, then Q is P ’s friend.
Problem 10. You are the host of a party, with some number of guests.
Some of the guests are friends with each other. You have n kinds of party
favors, each in unlimited supply.⁴ Prove that you can give each guest one of
the favors so that the following condition is satisfied for each person P : at
most 1/n of P ’s friends have the same kind of favor as P .
Hint: Consider an arbitrary assignment of party favors to people. If some-
thing is wrong for a person P , what operation could you perform to improve
the situation? What will your monovariant be? Think of the purpose of the
problem: to limit the number of friends with same favors. ♦
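If you want to test your ideas for Problem 10, here is one possible implementation of the “editing” approach. Be warned that it commits to specific answers to the questions in the hint, so try the problem first. Our assumptions: the operation hands an unhappy guest the favor that is rarest among that guest’s friends, and the quantity we track is the number of pairs of friends holding the same favor. The function name and the sample friendship network are invented for illustration.

```python
import random

def assign_favors(friends, n, seed=0):
    """friends: dict guest -> set of friends (symmetric). Returns (favor, history)."""
    rng = random.Random(seed)
    favor = {p: rng.randrange(n) for p in friends}   # arbitrary initial assignment

    def same(p):
        """Number of P's friends holding the same favor as P."""
        return sum(favor[q] == favor[p] for q in friends[p])

    def clashes():
        """Number of pairs of friends with equal favors (each pair is seen twice)."""
        return sum(same(p) for p in friends) // 2

    history = [clashes()]
    while True:
        unhappy = [p for p in friends if n * same(p) > len(friends[p])]
        if not unhappy:
            return favor, history
        p = unhappy[0]
        counts = [sum(favor[q] == c for q in friends[p]) for c in range(n)]
        favor[p] = counts.index(min(counts))   # switch to the rarest favor among friends
        history.append(clashes())

# Hypothetical party: 12 guests in a ring, each friends with the 3 nearest on either side.
friends = {p: {(p + d) % 12 for d in (-3, -2, -1, 1, 2, 3)} for p in range(12)}
favor, history = assign_favors(friends, n=3)
print(favor)
print(history)
```

Check the printed history against your own monovariant, and ask yourself why the rarest favor among P’s friends is always rare enough.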
Now, let’s talk about enemies, as seen in this Moscow ’64 contest problem.
Problem 11. (Dirac’s Theorem, [48]) King Arthur summoned 2n knights
to his court. Each knight has at most n − 1 enemies among the other knights
present. Prove that the knights can sit at the Round Table so that no two
enemies sit next to each other. (The relation of enmity is symmetric.)
Hint: The operation needed here is a little tricky: take some arc of the
table and reverse the order of the knights sitting in that arc. If some two
enemies are sitting next to each other, there is always a way of performing
this operation to decrease the number of pairs of adjacent enemies. ♦
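The arc-reversal operation from the hint can be sketched in code as well. The particular way of choosing the arc below is our own reading of the hint, so treat it as one possible implementation and try the problem before studying it: if enemies A and B sit side by side, we look further around the table for neighbors A′, B′ such that A′ is a friend of A and B′ is a friend of B, and reverse the arc from B to A′.

```python
def seat_knights(enemies):
    """enemies: dict knight -> set of enemies (symmetric; at most n - 1 each, 2n knights)."""
    table = list(enemies)
    size = len(table)

    def bad_pairs():
        """Number of adjacent pairs of enemies around the table."""
        return sum(table[(i + 1) % size] in enemies[table[i]] for i in range(size))

    history = [bad_pairs()]
    while history[-1] > 0:
        i = next(i for i in range(size) if table[(i + 1) % size] in enemies[table[i]])
        table = table[i:] + table[:i]      # rotate so the enemies A, B sit in seats 0, 1
        a, b = table[0], table[1]
        # Seats j, j + 1 holding A', B' with A' a friend of A and B' a friend of B.
        j = next(j for j in range(2, size - 1)
                 if table[j] not in enemies[a] and table[j + 1] not in enemies[b])
        table[1:j + 1] = table[1:j + 1][::-1]   # now A sits next to A', and B next to B'
        history.append(bad_pairs())
    return table, history

# Hypothetical worst-case start: 10 knights, each an enemy of the 2 nearest on either side.
enemies = {k: {(k + d) % 10 for d in (-2, -1, 1, 2)} for k in range(10)}
table, history = seat_knights(enemies)
print(table)
print(history)
```

The search for j always succeeds under the hypothesis of the problem; proving that is the counting argument at the heart of the solution.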

2.3. Shedding the disguise. If you are familiar with the language of
graph theory, you will probably notice that both of these last two problems
are really theorems of graph theory, recast in anthropomorphic form. Briefly,
(1) A graph consists of vertices (dots), some of which are connected by
edges (the segments between the dots).
(2) Neighbors are two vertices connected by an edge.
(3) A Hamiltonian cycle is a path that tours the graph along the edges,
visiting each vertex exactly once and coming back where it started.
⁴ Paul Zeitz called this the “affirmative action coloring problem” in the case of n = 2.

“Friends” often end up being translated as “neighbors” on a graph, “enemies”
are not connected by an edge, and colors could be properties assigned
to vertices or edges. In this language, Problem 10 can be conventionally
expressed using the coloring metaphor : for each n, the vertices of any finite
graph can be colored in n colors so that, for each vertex v, at most 1/n
of its neighbors (a.k.a. friends) are the same color as v. Problem 11 says
that in a graph with 2n vertices, where each vertex has at least n neighbors
(a.k.a. friends), there exists a Hamiltonian cycle, i.e., a closed path “around
the table” that visits each knight once and goes from friend to friend.

3. Not Getting There

So far we have used monovariants to study how a repeated process must
eventually reach a certain type of state. But there is another, perhaps more
self-evident use of monovariants, namely to show how a process cannot reach
a certain final state from a certain initial state. The idea is simple: if a
certain monovariant can only increase, there is no way to get from one state
to another state where the monovariant’s value is smaller, and likewise if
the monovariant can only decrease. If you recall our china-shop example
from Part I (vol. I), you know that there is no way to reassemble a plate
from a bunch of pieces by repeatedly dropping them on the floor, because
this operation can only increase the total number of pieces, whereas having
a single plate at the end requires decreasing the number of pieces.
This idea can be presented as a PST, because that’s what it is:

PST 50. Suppose you have a system on which certain operations are performed
repeatedly. To show that the system can never reach some state from
some other state, try using a monovariant.

This use of monovariants is a natural generalization of invariants. Indeed,
you can think of an invariant as a special type of monovariant – one that
can never increase and can never decrease. But there are some problems of
this sort where coming up with an appropriate invariant would be difficult
or awkward, and a monovariant does the job easily.

3.1. Flea-ing in a straight line. Here’s one example:

Problem 12. (IMO ’00, adapted, [21]) Let n ≥ 2 be a positive integer,
and let λ be a positive number less than 1/(n − 1). Suppose there are n
fleas on a horizontal line.⁵ Whenever two fleas are at points A and B on
the line, with A to the left of B, the flea at A may jump to the point C on
the line to the right of B with BC/AB = λ. Show that there exists some
initial position of the n fleas and some point M on the line such that it is
not possible for all of the fleas to get to the right of M .
⁵ The problem does not preclude two or more fleas crowding at the same point.

3.1.1. Mono-search. The natural way to begin such problems is to coordinatize
the line and identify the fleas with their respective coordinates. Thus,
the rule BC/AB = λ (in the picture, |AB| = 1 and |BC| = λ) translates
into coordinates as C − B = λ(B − A). So, a move consists of taking two fleas
A and B, with A < B, and replacing A by C = B + λ(B − A) = (1 + λ)B − λA.
This is called a linear function of the two coordinates A and B.

PST 51. If the operation is given by a linear function, it is reasonable to
expect that the monovariant will also be given by some sort of linear function.
In the case of the fleas, that would be α1 P1 + α2 P2 + · · · + αn Pn where the
αi’s are constants and the Pi’s are the positions of the fleas arranged from
left to right on the line.
Notice that if the problem is correct, then actually we can’t even get the
rightmost flea past M ; otherwise we could jump the other fleas over it. So
this suggests the rightmost flea, and its coefficient αn , are special. Let’s hope,
for simplicity, that all the other coefficients are equal. Finally, again because
of linearity, we can divide everything by αn to adjust the coefficient of Pn
to 1. So, the conjectured monovariant function is Pn +α(P1 +P2 +· · ·+Pn−1 ).
 PST 52. Adjust the coefficients of the linear function according to the
specifics of the problem. If some variables are special (such as the largest
coordinate), while the other variables play a symmetric role in the problem,
then let the former have different coefficients, and the latter the same coeffi-
cients. Typically in such problems you will be able to rescale all coefficients
so as to make one of them equal to 1.
To figure out α, let’s just think of the simplest case of two fleas P1 < P2 .
After P1 jumps over P2 , P1 is replaced by P2 + λ(P2 − P1 ), which is also
the rightmost point. Correspondingly, our function changes as P2 + αP1 →
(P2 + λ(P2 − P1 )) + αP2 , and the net effect is RHS − LHS = (λ + α)(P2 − P1 ).
We want this to be a decreasing monovariant, so the last quantity must be
non-positive. As P2 − P1 > 0, we need only α ≤ −λ.

PST 53. When making a choice and in the absence of further restrictions,
try the simplest possible option. For example, turn a non-strict inequality
α ≤ −λ into an equality α = −λ.
Without hesitation then let’s choose α = −λ, which does not depend on
the specific configuration of the fleas (excellent!). Our proposed monovariant
is thus Pn − λ (P1 + P2 + · · · + Pn−1 ), called the value of the configuration.
If there is a tie for the rightmost position, then choose one of the rightmost
fleas for Pn and treat the other(s) as “regular” non-extreme fleas.

A formal proof is in order, but it will feel like a technical calculation
compared to the hard (and creative!) work done above.
Exercise 4. If Pn is the rightmost flea, show that Pn −λ (P1 +P2 +· · ·+Pn−1 )
 can never increase when a flea jumps.
Hint: Consider two cases, depending on whether the rightmost flea does
or does not change after the jump. Calculate the net effect of the jump on
the value of the configuration, factor, and show that it is ≤ 0. ♦
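A quick experiment can back up Exercise 4 before you write the proof. The sketch below (the starting positions, λ = 1/4, and the function names are our own choices) performs random legal jumps with exact fractions and records the value of the configuration after each one.

```python
import random
from fractions import Fraction

def value(fleas, lam):
    """P_n - lam * (P_1 + ... + P_{n-1}), where P_n is a rightmost flea."""
    rightmost = max(fleas)
    return rightmost - lam * (sum(fleas) - rightmost)

def random_jumps(fleas, lam, jumps, seed=0):
    """Perform random legal jumps; return the final positions and the list of values."""
    rng = random.Random(seed)
    fleas = list(fleas)
    values = [value(fleas, lam)]
    for _ in range(jumps):
        i, j = rng.sample(range(len(fleas)), 2)
        if fleas[i] == fleas[j]:
            continue                   # neither flea is strictly to the left
        if fleas[i] > fleas[j]:
            i, j = j, i                # make i the left flea A and j the right flea B
        fleas[i] = fleas[j] + lam * (fleas[j] - fleas[i])   # C = B + lam * (B - A)
        values.append(value(fleas, lam))
    return fleas, values

lam = Fraction(1, 4)                   # n = 4 fleas, so lam < 1/(n - 1) = 1/3
fleas, values = random_jumps([0, 1, 3, 7], lam, jumps=200)
print(all(x >= y for x, y in zip(values, values[1:])))   # True
```

Exact fractions are used instead of floating-point numbers so that the comparison of consecutive values is not blurred by rounding.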

3.1.2. Place-search. We still haven’t solved our problem. For all we know,
our monovariant could decrease forever if the fleas could go on jumping
forever, unless they all eventually land in the same location. (When does
this happen?)
However, the problem asks for something different: a place on the line
over which the fleas cannot jump, i.e., for an impossible final configuration.
It seems we again need to do some detective work.
Now consider any configuration V of the n fleas, with some value ν.
According to PST 50, we should find a configuration W of the fleas that is
unreachable from V; more precisely, whose value is larger than our value ν, so
that the decreasing monovariant will prevent us from reaching it. If ω is the
rightmost position in W, then the value of W is ω − λ (sum of fleas ≤ ω).
How can we make sure this is larger than ν? By the Sandwich technique
from the Induction session (vol. I), we squeeze in an obvious intermediate
expression:
(3) ω − λ (sum of fleas ≤ ω) ≥ ω − λ(n − 1)ω = ω (1 − λ(n − 1)) >(?) ν.
This may look intimidating, but all we did was replace each of the other
fleas with ω, in order to decrease the overall value. To resolve the “?” before
reaching ν at the end, let μ = 1 − (n − 1)λ. We finally see here why the
condition λ < 1/(n−1) was required in the problem: to make our μ positive!
Thus, we need ωμ > ν, i.e., any ω > ν/μ will do. The inequalities in (3) will
be satisfied regardless of where the remaining fleas are, as long as they are
to the left of ω.
Our search over, a formal proof needs to recap the above points.

Exercise 5. From any configuration with value ν, show that it is not possible
for any one of the fleas to get to a position M > ν/μ, where μ = 1 − (n − 1)λ.
Hint: Use inequalities (3), our monovariant, and a contradiction. ♦
We have just proven that for any initial configuration, there is some point
over which none of the fleas can ever jump. This is a stronger statement
than the problem required, so we are certainly done. In fact, the statement
of the original problem is slightly misleading in that it asks us to search for a
whole configuration of fleas, along with an unreachable position. In reality,
any flea configuration works, and any M > ν/μ is beyond the fleas’ reach.
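If you would like to watch the monovariant at work, here is a minimal Python sketch. It assumes the jump rule from the problem statement, which is not restated in this section: a flea at A jumps over a flea at B > A and lands at B + λ(B − A). The function names `value`, `jump`, and `simulate` are ours, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
import random


def value(positions, lam):
    """Monovariant: rightmost position minus lam times the sum of the others."""
    top = max(positions)
    return top - lam * (sum(positions) - top)


def jump(positions, i, j, lam):
    """Flea i (at A) jumps over flea j (at B > A) and lands at B + lam*(B - A).

    This landing rule is an assumption taken from the usual statement of the problem.
    """
    a, b = positions[i], positions[j]
    assert a < b
    new = list(positions)
    new[i] = b + lam * (b - a)
    return new


def simulate(positions, lam, steps, seed=0):
    """Perform random legal jumps and return the list of all configurations seen."""
    rng = random.Random(seed)
    n = len(positions)
    history = [list(positions)]
    for _ in range(steps):
        pairs = [(i, j) for i in range(n) for j in range(n)
                 if positions[i] < positions[j]]
        if not pairs:  # all fleas share one location
            break
        i, j = rng.choice(pairs)
        positions = jump(positions, i, j, lam)
        history.append(positions)
    return history
```

For instance, with 4 fleas at 0, 1, 2, 5 and λ = 1/4, we get ν = 17/4 and μ = 1/4, so no flea should ever pass ν/μ = 17, however the jumps are chosen.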

3.2. Sleeping in an organized fashion. Here is another example of the
same technique, in a less “numerical” setting.
Problem 13. 100 students are sitting at a 10 × 10 array of desks in a
boring class. Each student is either asleep or awake. When at least two of
a student’s immediate neighbors (vertically or horizontally) are asleep, the
student may fall asleep; when at least two neighbors are awake, the student
may wake up. At the end of class, there are ten students awake, and no two
of them are sitting adjacent to each other. Prove that there were at least
ten students awake at the beginning of the class.
If you play around a bit, you’ll quickly notice that the number of students
awake at any given time is not a monovariant! For example, Figure 6 displays
the bottom left corner of the grid; W stands for “awake” and S for “asleep”.
The cell that is about to change its state is shaded, and the border of its
neighborhood of adjacent students is thicker. Check that the number of
awake students fluctuates up and down! Indeed, this problem requires being
a little cleverer.


Figure 6. Number of awakes is not a monovariant!

The trick is similar to the idea behind Problem 10 about favors among
friends. At each step, a student assimilates to the state of a majority of his
or her neighbors, and hence the total amount of mismatch between students
sitting next to each other never increases. That is, the number of pairs of
adjacent students, with one asleep and one awake, always decreases or stays
the same. This solves the problem, because we can bound the number of such
pairs at the beginning and at the end of class.
Well, almost – the count doesn’t quite work out correctly, because while
most students can belong to up to 4 such pairs, those on the edges of the
grid have only 2 or 3 neighbors. So, at the end with 10 awake students, we
could have anything from 26 mismatched pairs (with 4 corner and 6 edge
W ’s) to 40 mismatched pairs (with all 10 W ’s inside the grid). This is not
enough information to conclude for sure that there were 10 awake students
in the beginning. Hypothetically, we could have started with 36 mismatched
pairs created by 9 awake non-adjacent students inside the grid, but ended
with all W ’s having “migrated” to the border, e.g., 10 edge W ’s for a total
of 30 mismatched pairs and 10 awake non-adjacent students. As predicted,
the monovariant decreased (from 36 to 30), but we didn’t solve the problem!

PST 54. When everything works perfectly inside a region and is slightly off
along the border, you might remedy this inelegance by embedding the given
region in a bigger region and imposing some trivial conditions outside the
region in order to extend the problem there.

You may have seen this technique in another setting before. In Pascal’s
Triangle, the defining rule “add two adjacent numbers to get the number
directly below them” does not work for the 1s at the ends of the rows (why?).
However, if you place 0s all around Pascal’s Triangle in the same triangular
grid covering the whole plane, the rule will work for all numbers in the plane,
except for the 1 at the top of the triangle.
In our present predicament, we embed the grid in a bigger grid. The
“trivial” condition that will be imposed outside our original grid will be
perpetual sleepiness.

Solution to Problem 13: Imagine the 10 × 10 grid as part of an
infinitely large grid of students, with all students outside the small grid
being always asleep. (If the idea of that many sleeping students scares you,
it also works to imagine it as the central square in a 12 × 12 grid.) We need
to be careful to check if anything changed along the border of the 10 × 10 grid:
• The students outside the 10 × 10 grid will continue to be perpetually
asleep, as they neighbor at least 3 other such sleepers at all times.
• If a student along the border of the 10 × 10 grid could wake up before
due to 2 (or 3) adjacent W ’s, then adding 1 (or 2) adjacent sleepers to that
student will still allow the student to wake up (cf. the figure).
• If a W on the border of the 10 × 10 grid had 2 adjacent W ’s (and at
most 1 adjacent S) before, then adding S’s outside the grid will allow this W
to switch to S, something that was not allowed in the original grid. Hence,
our “extended” problem allows more possibilities for change of states, in
particular, for falling asleep. Nevertheless, we will show that the
conclusion of the problem works in all new scenarios too.
Now consider the number of pairs of (horizontally or vertically) adjacent
students such that one is asleep and the other is awake. Each time a student
falls asleep or wakes up following the rules described, at most 2 such pairs
are created and at least 2 disappear, so the total number of such pairs can
never increase: this is our monovariant.
At the end of class, there are 10 students awake, and each has 4 sleeping
neighbors, making 40 asleep-awake pairs. So at the beginning of the class,
there must have also been at least 40 such pairs. Since each awake student
can belong to at most 4 pairs (one for each neighbor), there must have been
at least 10 students awake at the beginning of the class. 
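The embedding trick is easy to test by machine. The sketch below is only an illustration under our own assumptions: it uses the 12 × 12 version of the extended grid, flips a randomly chosen student whenever the rules allow it, and records the number of asleep-awake pairs after every step. The names `mismatched_pairs`, `awake_neighbors`, and `simulate` are hypothetical helpers, not part of the original solution.

```python
import random

SIZE = 12  # the 10 x 10 class plus a one-cell ring of permanent sleepers


def mismatched_pairs(awake):
    """Count adjacent pairs in which exactly one student is awake."""
    count = 0
    for r in range(SIZE):
        for c in range(SIZE):
            if r + 1 < SIZE and awake[r][c] != awake[r + 1][c]:
                count += 1
            if c + 1 < SIZE and awake[r][c] != awake[r][c + 1]:
                count += 1
    return count


def awake_neighbors(awake, r, c):
    """Number of awake students among the 4 neighbors of an inner cell."""
    return sum(awake[r + dr][c + dc]
               for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))


def simulate(steps, seed=0):
    """Random play on the extended grid; returns the pair count after each step."""
    rng = random.Random(seed)
    awake = [[False] * SIZE for _ in range(SIZE)]
    for r in range(1, SIZE - 1):
        for c in range(1, SIZE - 1):
            awake[r][c] = rng.random() < 0.5
    counts = [mismatched_pairs(awake)]
    for _ in range(steps):
        r, c = rng.randrange(1, SIZE - 1), rng.randrange(1, SIZE - 1)
        k = awake_neighbors(awake, r, c)
        if awake[r][c] and 4 - k >= 2:      # at least two sleeping neighbors
            awake[r][c] = False
        elif not awake[r][c] and k >= 2:    # at least two awake neighbors
            awake[r][c] = True
        counts.append(mismatched_pairs(awake))
    return counts
```

In every run the recorded counts should form a non-increasing sequence, and a final position with 10 pairwise non-adjacent awake students should always give exactly 40 pairs.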

4. Conway’s Checkers

This session will close with one last, fairly complex example. It might
get a little tiresome to go through so many disconnected examples of mono-
variants, but that’s no excuse for stopping here, because if you haven’t seen
this problem, your life is not complete. The problem is credited to the great
recreational mathematician John Horton Conway and is often called Con-
way’s Checkers or Conway’s Soldiers [9].

4.1. The setup is similar in spirit to the Escape of the Clones from the
Invariants session (vol. I), played on a grid infinite to the right and up.
Every cell had at most one clone. A clone sprouted one clone in the cell
to the right and another in the cell directly above, and then disappeared.
Given a “prison” fence enclosing some (or all) clones, the task was to free
the clones from the prison. Let’s see how Conway’s problem differs.
Problem 14. (Conway’s Checkers) Imagine that you have an infinite
square grid, with a particular horizontal line of the grid designated. You
play the following game:
(a) First, you may initially place checkers in the squares below the line –
as many as you want, but no more than one checker per square.
(b) Then, you may take a checker and jump it over a checker that is adjacent
to it – in any of the four directions – into the square immediately
beyond, if that space is vacant. In the process, you remove the checker
that has been jumped over (cf. Fig. 7).
(c) You may continue jumping checkers, as long as there are two checkers
adjacent to each other somewhere.
The goal is to get some checker to be as far above the designated line as
possible. What is the highest row that can be reached?

Figure 7. Checker-jumping legal and illegal moves

Figure 7a–b display legal moves in all four directions and their results, and
Figure 7c warns against illegal moves: no jumping over several checkers or
into an occupied square, and no diagonal jumping!

4.2. Initial victory. Check that you can get a checker to the first row above
the designated line, by simply starting with two checkers stacked just below
the line and then jumping upward. With four checkers and a series of three
jumps, you can get a checker to the second row above the line (cf. Fig. 8).

Exercise 6. Find a way to get to the third row above the line. Then try to
get to the fourth row.
Hint: Figure 8c contains the initial configuration for reaching row 1, only
moved up a row. In general, suppose you have reached row k from some initial
configuration Fk . Shift Fk one row up, and try to reach it from some new
configuration Fk+1 . If you are successful, then your previous transformation
of Fk will result in a checker in row k + 1. There could be other ways. ♦

Figure 8. Getting to the second row

Here is the general principle that may have helped you so far:
 PST 55. Re-use inductively your solution for a previous case inside your
solution for the next case.
The situation looks hopeful! Pushing on:
Exercise 7. Try to find a way to get a checker up to the fifth row. Become
As the last exercise foreshadowed, and fairly remarkably, it’s impossible
to get a checker more than four rows up, no matter how many checkers you
place in the first stage of the game. Can we prove this?

4.3. Is there an invariant? First, let’s try using an invariant, along the
lines of the Escape of the Clones. Can we assign a number to each square, so
that the sum of the numbers of squares with checkers in them stays constant
at each step?
a b c a b c

Figure 9. Invariant in both directions

Consider three successive squares (cf. Fig. 9). Suppose we want to write
the numbers a, b, c in them so that the sum of the occupied squares stays
constant. One legal move is to jump a checker from the a square to the c
square, removing the checker on b in the process. For the sum to be invariant
under this move, we must have a+b = c. Similarly for a jump in the opposite
direction, we must have c+b = a. But these two equations add to give b = 0.
This argument shows that the number written in every square must be zero!
That doesn’t give us a very useful invariant.

4.4. Modifying an invariant. Maybe we ask too much from the invariant?

PST 56. If imposing too many conditions leads to a trivial solution, try
relaxing or dropping some of these conditions. In the case of Conway’s checkers,
instead of having an invariant in all four directions (moving to the right, left,
up, and down), try to make the sum invariant only under jumps in certain
directions, e.g., only right and up.
First, let’s focus on a single row. Let’s choose numbers to write in the
squares so that the sum stays invariant when we jump to the right. Just as
powers of 1/2 were the natural choice to use in the Escape of the Clones, so are
powers $x^n$ of some unknown $x$ in order here. Three successive squares with
$x^n$, $x^{n+1}$, $x^{n+2}$, written from left to right, will accommodate our desired
invariant only if $x^n + x^{n+1} = x^{n+2}$. Dividing by $x^n$ leads to a famous
equation: $x^2 - x - 1 = 0$, and the quadratic formula gives
$x_1 = \frac{1+\sqrt{5}}{2} \approx 1.618 > 1$ and $x_2 = \frac{1-\sqrt{5}}{2} \approx -0.618$.
The larger root $x_1$ is denoted by $\varphi$ and is known as the golden ratio.
To make it easier to express things concretely, coordinatize the squares
in a row with integers (increasing as we move to the right). If we write $\varphi^n$
in square $n$, we have ensured that the sum of the numbers of the squares
containing checkers stays invariant under jumps to the right:
$\varphi^n + \varphi^{n+1} = \varphi^{n+2}$. On the other hand, a jump to the left
consists of replacing $\varphi^{n+2} + \varphi^{n+1}$ by $\varphi^n$, thus always decreasing the sum (why?).
Now, what about jumps up and down, trying to be invariant only up?
For any column, an analogous discussion leads us to assign powers $\varphi^n$, with
$n$ increasing as we move up the column. However, this column intersects our
previously discussed row, and there is already some power $\varphi^m$ assigned to
the square in the intersection. We need to shift all powers in our column
up or down in order to match this $\varphi^m$. There is a simple algebraic way of
reconciling the numbers in all rows and columns: we assign vertical as well as
horizontal coordinates to the squares, and then write $\varphi^{m+n}$ in square $(m, n)$.
This multiplies all numbers in our column by the same $\varphi^m$, and all numbers
in our row by the same $\varphi^n$, without changing the properties that we would
like: the sum of the numbers in the checkers’ squares will
• stay the same under rightward and upward jumps, and
• decrease under leftward and downward jumps.

4.5. Symmetry gets rid of infinities. Even though we now have turned
the sum into a monovariant, this doesn’t quite work to solve the problem.
We want to show that the fifth row is unreachable by arguing the sum of the
original checkers is not large enough. But with the numbering scheme just
described, there exist squares with arbitrarily large numbers, even below the
designated line: since $\varphi > 1$, we have a real problem with the positive powers
$\varphi^m$. In particular, we need the sum in a single row to be finite, but what we
have now is this:

Lemma 3. For any row, the assignment $\{\varphi^m\}$ for any integer $m$ yields an
infinite sum for half of the row and a finite sum for the other half:
(a) $1 + \varphi + \varphi^2 + \varphi^3 + \cdots + \varphi^m + \cdots = \infty$.
(b) $1 + \varphi^{-1} + \varphi^{-2} + \varphi^{-3} + \cdots + \varphi^{-m} + \cdots = \dfrac{1}{1 - \varphi^{-1}} = \varphi^2$.
Proof: Part (a) is self-evident, as all numbers there are ≥ 1. The sum in
part (b) is a geometric series $a + ar + ar^2 + \cdots + ar^m + \cdots$, where every
next term is the previous multiplied by the ratio $r$. Provided $r$ is small,
namely, $-1 < r < 1$, the sum adds up to $\frac{a}{1-r}$ (cf. the Invariants session,
vol. I). In our case, $a = 1$ and $r = \varphi^{-1}$, and
$\frac{1}{1-\varphi^{-1}} = \frac{\varphi}{\varphi - 1} = \varphi^2$ because
$(\varphi - 1)\varphi = \varphi^2 - \varphi = 1$ and hence $\frac{1}{\varphi - 1} = \varphi$. □
The ∞ in part (a) is worrisome, showing that our argument will never
come together. We need to get rid of the large powers $\varphi^m$, while at the same
time ensure that what happens when jumping left (the sum decreases) will
also happen when jumping right! The way to do this is to:
PST 57. Choose a central object and symmetrize the rest with respect to it.
Specifically, for our checkers choose a central column, have the numbers be
highest in that column, and decrease as you go away from it along any row.
To put it more directly, choose the “central” column to be the one with
$m$-coordinate 0, and as before assign to it all powers $\{\varphi^n\}$. Row $n$
intersects this central column in $\varphi^n$, so decrease the powers of $\varphi$ as you
move away from $\varphi^n$ along row $n$, either to the right or the left:
$\ldots, \varphi^{n-2}, \varphi^{n-1}, \varphi^n, \varphi^{n-1}, \varphi^{n-2}, \ldots$
This boils down to replacing $\varphi^m \to \varphi^{-|m|}$ in our previous formula and
arriving at the pretty V-shape pattern in the figure.
[Figure: the numbers $\varphi^{-|m|+n}$ in columns $m = -3, \ldots, 3$ and rows $n = 0, \ldots, 10$; the bottom row reads $\varphi^{-3}, \varphi^{-2}, \varphi^{-1}, 1, \varphi^{-1}, \varphi^{-2}, \varphi^{-3}$ and the top row reads $\varphi^{7}, \varphi^{8}, \varphi^{9}, \varphi^{10}, \varphi^{9}, \varphi^{8}, \varphi^{7}$.]
Exercise 8. Suppose the number $\varphi^{-|m|+n}$ is assigned to square $(m, n)$, for all
$m$ and $n$. Check that whenever a jump is made, the sum of the numbers in
squares occupied by checkers will either stay the same or decrease.
Hint: More precisely, the sum of the occupied squares will
• stay the same if you jump up or towards the central column;
• decrease if you jump down, over, or away from the central column. ♦
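Before proving Exercise 8, you may want to test it numerically. The following Python sketch is only a sanity check in floating-point arithmetic, not a proof; the helper names `weight`, `jump_change`, and `max_change` are our own. It computes how the sum changes when a checker at (m, n) jumps over its neighbor in a given direction.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def weight(m, n):
    """Number written in square (m, n): PHI ** (-|m| + n)."""
    return PHI ** (-abs(m) + n)


def jump_change(m, n, dm, dn):
    """Change in the sum when the checker at (m, n) jumps over (m+dm, n+dn)
    and lands on (m+2dm, n+2dn); the jumped-over checker is removed."""
    removed = weight(m, n) + weight(m + dm, n + dn)
    added = weight(m + 2 * dm, n + 2 * dn)
    return added - removed


def max_change(radius=6):
    """Largest change over all four directions for starting squares near the origin."""
    worst = -math.inf
    for m in range(-radius, radius + 1):
        for n in range(-radius, radius + 1):
            for dm, dn in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                worst = max(worst, jump_change(m, n, dm, dn))
    return worst
```

Up to rounding error, `max_change()` should come out as 0: upward jumps and jumps towards the central column leave the sum unchanged, and every other jump lowers it.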
Now that we have modified our monovariant, let’s see if we have resolved
our previous difficulty of having infinite sums.

Exercise 9. Using the formula for the sum of a geometric series, check that
the sum of all the numbers on or below row 0 is exactly $\varphi^5$.

Note that, if you add up all the numbers in the grid, you will inevitably
get ∞; indeed, any one column alone will yield an infinite sum (why?). But
as it will turn out, for our solution we do not need to add up all numbers
and we will not do that.
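Here is a quick numerical companion to Exercise 9 (again a sketch, not a proof). It adds up the numbers $\varphi^{-|m|+n}$ over a large finite window of squares on or below row 0 and compares the result with $\varphi^5$; the function name `partial_sum` and the window size are our own choices.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio


def partial_sum(depth):
    """Sum of PHI ** (-|m| + n) over columns |m| <= depth and rows -depth <= n <= 0."""
    return sum(PHI ** (-abs(m) + n)
               for m in range(-depth, depth + 1)
               for n in range(-depth, 1))


def closed_form():
    """Row sum (2*PHI**2 - 1) times column sum PHI**2, via the geometric series of Lemma 3(b)."""
    return (2 * PHI ** 2 - 1) * PHI ** 2
```

The partial sums increase with the window and approach $\varphi^5 \approx 11.09$, which agrees with the closed form $(2\varphi^2 - 1)\varphi^2$.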

4.6. What’s stopping us from reaching the 5th row? Now we have all
the pieces in place to explain this “mystery”.
Solution to Problem 14: In Exercise 6, we saw that it was possible
to get checkers as high as the 4th row above the designated line. We claim
that it is not possible to get a checker to the 5th row, which establishes the
answer: the 4th row is the highest possible.
Suppose, for a contradiction, that it is possible to get a checker to the
5th row. Coordinatize the grid so that the row just below the designated line
is row 0 and the row just above it is row 1, and so that the alleged 5th-row
checker lands in column 0, i.e., in square (0, 5). As before, for each $m$, $n$, let
the number $\varphi^{-|m|+n}$ be written in the square in column $m$ and row $n$.
Now consider the sum of all the numbers written in squares containing a
checker. Initially, checkers exist only in the squares below the line, so their
sum is at most $\varphi^5$ according to Exercise 9. Furthermore, Exercise 8 showed
that our sum will either stay the same or decrease with each jump, and so it
will always be $\le \varphi^5$. But we assumed that we can eventually get a checker
to the square (0, 5), which has the number $\varphi^5$ written in it, making the sum
$\ge \varphi^5$. This means that the sum must have started from $\varphi^5$ and ended also
at $\varphi^5$, i.e., it stayed constant throughout the whole game! So the sum is,
after all, an invariant?
Wait a minute! To have an initial sum of $\varphi^5$ we must have started with
all checkers that are below the designated line. To have a final sum of
$\varphi^5$ concentrated in one square, (0, 5), means that we ended the game with
exactly one checker: having more checkers in the end will bump up the sum
beyond $\varphi^5$. So, we converted an initial configuration with infinitely many
checkers into a single checker? But that would take an infinite number of
jumps, while the game is finite: we said we reached the 5th row, which ended
the game! This is our contradiction: after all, ∞ is not finite. □

4.7. Monologue on the monovariant. Our numbering of the squares
went through several stages before reaching its final shape:
(1) a “universal” invariant that ended up in 0s all over and made us give
up on the invariant idea and ask for less.
(2) a “universal” monovariant that mimicked the Escape of the Clones
solution, i.e., had $\varphi^{m+n}$ in square $(m, n)$, for all $m$ and $n$. But since our grid
is infinite in all directions (not just two, as in the Clones problem), this led
to a disaster: we had arbitrarily large numbers $\varphi^n$ in a monovariant that was
decreasing, thereby cutting off our chance of reaching a contradiction.
(3) a “modified” monovariant that removed these arbitrarily large powers
from each row by placing $\varphi^{-|m|+n}$ in square $(m, n)$, for all $m$ and $n$. Now
every individual row had a finite sum, and moreover, all rows in half of the
grid added up to a finite number ($\varphi^5$). This made it possible to squeeze out
the desired contradiction.
But in (3) we avoided some parts of the grid, for otherwise we would be
stuck with ∞ as soon as we tried to add up a single column! There was also
the somewhat mysterious sum of $\varphi^5$ that appeared twice in our solution and
the lonely line of symmetry: the central column. Without messing up our
monovariant solution, here is one pretty (but not necessary) final re-shaping:
(4) a “doubly-symmetric” monovariant that puts 1 in the position (0, 5);
$\varphi^{-1}$ in the four adjacent positions (above, below, right and left), forming a
square; $\varphi^{-2}$ in the 8 positions directly outside this square, forming a larger
square; $\varphi^{-3}$ in the 12 positions directly outside those, and so on. The whole
plane will be covered this way with powers $\varphi^n$ for $n \le 0$ (cf. Fig. 10).

[Figure: concentric diamond-shaped layers around the square containing 1, labeled $\rho, \rho^2, \rho^3, \rho^4, \rho^5$ from the innermost layer outward.]

Figure 10. Alternative numbering, $\rho = \varphi^{-1}$

Exercise 10. With the alternative numbering in (4), show that the sum of
occupied squares is a monovariant as follows: moving away from the row or
the column of 1 keeps it constant, while moving towards or over the row or
the column of 1 decreases it. Finally, show that the sum below the designated
row is 1, while the total sum of the grid is still finite.
Obviously, our previous solution will carry over with no blips because
the drastic changes happened only to rows above (0, 5), while the rows up to
the fifth row were only rescaled by $\varphi^{-5}$ . . . and that is the only region where,
as it turned out, any checkers moved.

5. Hints and Solutions to Selected Problems

Problem 2. Look at the sum of all the numbers: it increases in the example,
and in fact it always increases. Indeed, using the notation from the hint, any
two numbers a and b among the given will be replaced by d and l = dkm,
so the sum will go from a + b = d(k + m) to d + l = d(1 + km). To see why
1 + km ≥ k + m, move everything to the LHS and factor: 1 − k + km − m = (k − 1)(m − 1) ≥ 0.
Because k, m ≥ 1, equality occurs iff k = 1 or m = 1, which boils down to
a = d or b = d, and a|b or b|a. Thus, the sum will remain the same iff one
of the numbers divides the other (and in this case, the two numbers will not
change either), and the sum will strictly increase otherwise.
Note that the operation does not change the overall lcm of all numbers.
This is so because lcm(a, b) = l = lcm(d, l). Thus, the largest number we can
possibly write down is L = lcm(a1 , a2 , . . . , an ), the lcm of all given numbers
a1 , a2 , . . . , an . This puts a (very rough) upper bound on the total sum: at
most nL, and as the sum increases by steps of 1 or more, it eventually must
stop changing. By our argument above, this means that for any two of the
given numbers, one divides the other, and hence the numbers themselves
stop changing at that point. 
Problem 3. In part (a), we have $ab = d^2 km = dl$, i.e., the total product
$\Pi = a_1 a_2 \cdots a_n$ is an invariant. This could have been used instead of the lcm
above to show that the process will terminate (how?). It also explains what
really happens in parts (b)-(c): the process reshuffles and recombines the
factors of the given numbers, without dropping or creating new factors. 
From our solution to Problem 2 we know that the process terminates
when, for any two numbers, one divides the other. Arrange the final numbers
$c_i$ in increasing order. Then we must have a chain of divisibilities:
$c_1 \mid c_2 \mid c_3 \mid \cdots \mid c_{n-1} \mid c_n$. There are further restrictions. If $p$ is a
prime that divides some $a_i$, let $p^{\alpha_1} \le p^{\alpha_2} \le \cdots \le p^{\alpha_n}$ be all prime
powers that divide the original numbers. It is easy to see that these prime
powers cannot combine or split into two different numbers during the process
(why?). Hence, they will end up dividing the resulting final numbers on the
board, i.e., $p^{\alpha_i} \mid c_i$ for all $i = 1, 2, \ldots, n$. Thus, the original prime powers
are only reshuffled to make the final numbers, thereby completely determining
the final outcome of the process, regardless of the order in which we conduct
the process.
An interesting twist is that the length of the process does depend on our
choices. In addition to the 2-step example on page 143, we can complete the
process in a longer way:
• {2, 3, 5, 15} → {1, 3, 10, 15} → {1, 3, 5, 30} → {1, 1, 15, 30}. 
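The claim that the outcome does not depend on our choices, while the number of steps does, can be explored with a short experiment. In this sketch (with our own hypothetical helper `run_process`), a random pair in which neither number divides the other is replaced by its gcd and lcm until no such pair remains.

```python
import math
import random


def run_process(numbers, seed=0):
    """Repeatedly replace a random non-divisible pair (a, b) by (gcd, lcm).

    Returns the sorted final numbers and the number of steps taken.
    """
    rng = random.Random(seed)
    nums = list(numbers)
    steps = 0
    while True:
        # Pairs in which neither number divides the other.
        bad = [(i, j)
               for i in range(len(nums))
               for j in range(i + 1, len(nums))
               if nums[i] % nums[j] != 0 and nums[j] % nums[i] != 0]
        if not bad:
            return sorted(nums), steps
        i, j = rng.choice(bad)
        a, b = nums[i], nums[j]
        g = math.gcd(a, b)
        nums[i], nums[j] = g, a * b // g  # gcd and lcm
        steps += 1
```

Starting from {2, 3, 5, 15} with different seeds, the step counts may vary, but the final numbers should always be 1, 1, 15, 30.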

Problem 4. For the case of M numbers on the circle, to track the contributions
of a single label $a_1$ to all labels, set $a_1 = 1$ and all other $a_i = 0$,
and perform the addition process. You will quickly see a famous pattern,
encoded in the Pascal’s Triangle:

                1
              1   1
            1   2   1
          1   3   3   1
        1   4   6   4   1
      1   5  10  10   5   1
    1   6  15  20  15   6   1
  1   7  21  35  35  21   7   1
1   8  28  56  70  56  28   8   1

The 1 at the top stands for $a_1 = 1$. The second row shows the contribution
of this 1 after one iteration: the labels will be {1, 1} and 0s elsewhere.
The third row shows the labels after two iterations: {1, 2, 1} and 0s elsewhere;
2 is in the spot of the original $a_1$. The fifth row shows
{1, 4, 6, 4, 1} = $\{a_4, a_5, a_1, a_2, a_3\}$, unless we start with only 4 numbers:
then the two 1s come together in the spot of $a_3$, yielding labels
{6, 4, 2, 4} = $\{a_1, a_2, a_3, a_4\}$, i.e., 0s (mod 2) everywhere. Analogously,
the total final contribution of any number in any other slot after 4 iterations
will also be {0, 0, 0, 0}. We conclude that for $2^2$ numbers $2^2$ iterations will
zero-in everything.
For $2^3 = 8$ numbers around the circle, the 9th row displays only evens,
except for the end 1s, which will come together to add up on the spot of $a_5$
(opposite to $a_1$), yielding 2 ≡ 0 (mod 2) and showing that 8 iterations will
zero-in everything. To complete the most general case of $2^m$ numbers, show
that all binomial coefficients $\binom{2^m}{j}$ are even for $j \ne 0, 2^m$. ♦
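Before attempting the final step, it does not hurt to check the claim about binomial coefficients for small cases. The sketch below uses Python's built-in `math.comb`; the function name `middle_coefficients_even` is our own.

```python
from math import comb


def middle_coefficients_even(size):
    """True if C(size, j) is even for every j strictly between 0 and size."""
    return all(comb(size, j) % 2 == 0 for j in range(1, size))
```

The function should return `True` whenever `size` is a power of 2, while a size such as 6 fails because C(6, 2) = 15 is odd.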

Problem 5. Let x, y, z, t and q be the five numbers at some point, with
y < 0. Let S = x + y + z + t + q be the total sum. It is clear that S is
invariant under the operation: (x+y)−y+(z+y)+t+q = x+y+z+t+q = S.
Following the hint, we look at the set of all consecutive sums of 1, 2, 3, and
4 numbers. To make sure that we do not overcount, we will begin with a
vertex and list all four sums that start with it, going clockwise; and then we
will list what happens to that sum under the operation.
First, let’s do this on the concrete example in Figure 4. Starting from
the top vertex, the 20 sums are (we skip the total sum as it is an invariant):
• {5, 3, 6, 6, −2, 1, 1, 0, 3, 3, 2, 7, 0, −1, 4, 2, −1, 4, 2, 5}
• {3, 5, 6, 6, 2, 3, 3, 2, 1, 1, 0, 3, 0, −1, 2, 4, −1, 2, 4, 5}
• {2, 4, 5, 4, 2, 3, 2, 3, 1, 0, 1, 3, −1, 0, 2, 4, 1, 3, 5, 6}
• {2, 4, 4, 5, 2, 2, 3, 3, 0, 1, 1, 3, 1, 1, 3, 5, 0, 2, 4, 4}
Try to match the 20 numbers from one list to the next. If you complete the
matching, you will see that at every step two numbers refuse to match
anything; for example, in the first transition, we are left with −2 → 2 and
7 → 3 (the 5th and 12th entries of the first two lists).
Now, let’s do this in general, using variables. Starting the sums with x:
• x → x + y, x + y → (x + y) − y = x, i.e., x ↔ x + y (switch);
• x + y + z, x + y + z + t (go to themselves).
“Miraculously,” the set of sums starting with x does not change as a whole!
Similarly for the sums starting with t or q:
• q + x ↔ q + x + y, t + q + x ↔ t + q + x + y (switch);
• t, q, t + q, q + x + y + z (go to themselves).
Finally, starting with y or z:
• y + z ↔ z; y + z + t ↔ z + t; y + z + t + q ↔ z + t + q (switch);
• y → −y and z + q + t + x → z + q + t + x + 2y.
So the only change to the set of sums occurs when y → −y and S −y → S +y:
these are the two sums without matching partners in our example! Taking
absolute values, we have no change for |y| = | − y|, but |S − y| = S + (−y) >
|S +y| (why?). (In our example, |S −y| transitions from 7 to 3, then from 6 to
4, and again from 6 to 4, dropping down every time by −2y.) Consequently,
the sum of all absolute values of the 21 possible sums goes down by an integer
value. This is our monovariant! Since this sum is always positive (why?), it
must stop decreasing after a while, i.e., the process must terminate. 
Try repeating the same argument for 3, 4, and 6 numbers.
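To experiment with other starting labels or other numbers of vertices, here is a small Python sketch of the operation and of the monovariant. The names `consecutive_sums`, `monovariant`, `flip`, and `run` are ours, and the rule "always flip the first negative label" is just one arbitrary way of choosing moves.

```python
def consecutive_sums(nums):
    """All sums of 1, ..., n-1 consecutive labels (clockwise), plus the total sum."""
    n = len(nums)
    sums = [sum(nums[(start + i) % n] for i in range(length))
            for start in range(n)
            for length in range(1, n)]
    sums.append(sum(nums))
    return sums


def monovariant(nums):
    """Sum of the absolute values of all consecutive sums."""
    return sum(abs(s) for s in consecutive_sums(nums))


def flip(nums, k):
    """Apply the operation at a negative label y = nums[k]:
    both neighbors gain y, and y itself becomes -y."""
    n = len(nums)
    y = nums[k]
    assert y < 0
    new = list(nums)
    new[k - 1] += y          # index -1 wraps around to the last label
    new[(k + 1) % n] += y
    new[k] = -y
    return new


def run(nums):
    """Flip the first negative label until none is left; record the monovariant."""
    history = [monovariant(nums)]
    while any(v < 0 for v in nums):
        k = next(i for i, v in enumerate(nums) if v < 0)
        nums = flip(nums, k)
        history.append(monovariant(nums))
    return nums, history
```

On the example labels (5, −2, 3, 0, −1), recovered from the first list of sums above, the monovariant takes the values 63, 59, 57, 55 and the process stops at (2, 2, 0, 1, 0).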
Exercise 1. When $n$ is large enough that $f_n$ has reached its eventual constant
value $c$ (odd), then $f_n = f_{n+1} = c$ implies $(f_n + f_{n+1})/2 = c$, so indeed
$f_{n+2} = c$. At the same time, $\gcd(f_n, f_{n+1}) = \gcd(c, c) = c$ for all $n$ from now
on. But what happens before the sequence stabilizes?
Following the hint, for any $n$, let $d = \gcd(f_{n-1}, f_n)$. Thus, $d \mid f_{n-1}$ and
$d \mid f_n$, and hence $d$ also divides the sum $f_{n-1} + f_n$. Since both $f_{n-1}$ and $f_n$
are odd, then $d$ must be odd too; this means that we can divide the sum
by 2 without affecting $d$, i.e., $d \mid (f_{n-1} + f_n)/2$. But $f_{n+1}$ is the largest odd
divisor of this average, so $d \mid f_{n+1}$. Ordinarily, the gcd of a subset of numbers
$\{f_{n-1}, f_n\}$ can be greater than the gcd of the whole set $\{f_{n-1}, f_n, f_{n+1}\}$. However,
in our case, adding $f_{n+1}$ to $\{f_{n-1}, f_n\}$ does not decrease the gcd (as $f_{n+1}$ is
already divisible by that gcd), so $\gcd(f_{n-1}, f_n) = \gcd(f_{n-1}, f_n, f_{n+1}) = d$.
We are half done. Now, let $e = \gcd(f_n, f_{n+1})$, and hence $e \mid f_n$ and
$e \mid f_{n+1}$. By definition, $f_{n+1} \mid (f_{n-1} + f_n)/2$, therefore, $e \mid (f_{n-1} + f_n)$. But
$e$ already divides the summand $f_n$; hence $e$ must divide the other summand
$f_{n-1}$, i.e., $\gcd(f_n, f_{n+1}) \mid f_{n-1}$. Analogously as above,
$\gcd(f_n, f_{n+1}) = \gcd(f_{n-1}, f_n, f_{n+1}) = e$.
Combining the two conclusions, $\gcd(f_{n-1}, f_n) = \gcd(f_n, f_{n+1})$ for any $n$,
i.e., the gcd of two consecutive numbers is an invariant. □
To finish the exercise, on the one hand $\gcd(f_1, f_2) = \gcd(f_n, f_{n+1})$ for
any $n$, and on the other hand $c = \gcd(f_n, f_{n+1})$ for $n$ large enough (when
$\{f_n\}$ stabilizes). So the constant value of the sequence is $\gcd(a, b)$. □
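The conclusion is easy to test on a computer. The sketch below assumes, as in the solution above, that $f_1 = a$ and $f_2 = b$ are odd positive integers and that each next term is the largest odd divisor of the average of the previous two; the function names are ours.

```python
from math import gcd


def largest_odd_divisor(x):
    """Strip all factors of 2 from a positive integer."""
    while x % 2 == 0:
        x //= 2
    return x


def eventual_constant(a, b, max_steps=10_000):
    """Iterate the sequence from odd positive a, b until two consecutive terms agree."""
    prev, cur = a, b
    for _ in range(max_steps):
        if prev == cur:
            return cur
        # The sum of two odd numbers is even, so the average is an integer.
        prev, cur = cur, largest_odd_divisor((prev + cur) // 2)
    raise RuntimeError("sequence did not stabilize within max_steps")
```

For example, starting from 15 and 35 the terms run 15, 35, 25, 15, 5, 5, . . . and settle at 5 = gcd(15, 35).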
Problem 7. Recall (from Exercise 4 in Proofs I, volume I, about arithmetic
operations on irrational and rational numbers) that the difference,
sum, product, or ratio of an irrational and a (non-zero) rational number is
irrational (proven by contradiction). Now, if $x_0$ is irrational ($x_0 \in \mathbb{I}$), then
by induction, all $x_k \in \mathbb{I}$. Indeed, if some $x_{k-1} \in \mathbb{I}$, then $p_k/x_{k-1} \in \mathbb{I}$ (as $p_k$ is
a prime and hence rational). As the floor function $\lfloor x \rfloor$ outputs only integers
(hence rational), the fractional part $\{x\} = x - \lfloor x \rfloor$ of a number preserves
the rationality/irrationality of $x$. Putting these together, $x_k = \{p_k/x_{k-1}\}$ is
also irrational. Hence the sequence will never reach 0.
On the contrary, if $x_0$ is rational ($x_0 \in \mathbb{Q}$), the sequence will reach 0.
Indeed, similarly as above, all $x_k \in \mathbb{Q}$ ($x_k \ge 0$), so we can write them
as $x_k = a_k/b_k$ for some relatively prime positive integers $a_k$ and $b_k$, unless $x_k$
becomes 0. We will show that the denominators $\{b_k\}$ decrease in the process.
Since $0 < x_0 < 1$, $a_0 < b_0$, then
$x_1 = \left\{\frac{p_1}{x_0}\right\} = \left\{\frac{p_1}{a_0/b_0}\right\} = \left\{\frac{p_1 b_0}{a_0}\right\} = \frac{a_1}{b_1} < 1$.
The fractional part $\{x\}$ changes only the numerator, but not the denominator
of the rational $x$ (why?). Therefore, $b_1 = a_0$, or $b_1 < a_0$ if there is some
reduction of the fraction $a_1/b_1$. In either case, $b_1 \le a_0 < b_0$, so the denominator
of $x_1$ is smaller than that of $x_0$. To push this argument through induction
(which we leave to the reader), you will also need to use $a_1 < b_1$, which
follows from $\{x\} < 1$ for all $x$.
But the sequence of (positive) denominators $\{b_k\}$ cannot decrease forever.
Hence, the process terminates, implying that some $x_k = 0$. ♦
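You can watch the denominators shrink with exact rational arithmetic. The sketch below assumes that $p_k$ denotes the k-th prime (the problem statement is not repeated in this section), and the names `primes` and `orbit` are our own.

```python
from fractions import Fraction


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for small experiments)."""
    candidate = 2
    while True:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            yield candidate
        candidate += 1


def orbit(x0):
    """Return x_0, x_1, ... with x_k = {p_k / x_{k-1}}, stopping when 0 is reached.

    Assumption: p_k is the k-th prime and 0 < x0 < 1 is rational.
    """
    x = Fraction(x0)
    sequence = [x]
    for p in primes():
        if x == 0:
            break
        y = p / x
        x = y - (y.numerator // y.denominator)  # fractional part of y
        sequence.append(x)
    return sequence
```

Starting from 5/7, the sequence is 5/7, 4/5, 3/4, 2/3, 1/2, 0, with denominators 7, 5, 4, 3, 2, 1.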

Problem 8. The averages $b_k$ for the given example are 9, 5, 4, 3, 3, 3, . . ..
While the original sequence may not be decreasing, the sequence of averages
seems to be decreasing. Let’s prove it. A standard way to proceed is to set
up the inequality $b_{k+1} \le b_k$ and work backward until you get something like
$a_{k+1} \le b_k$ (do it!), but the latter is not easy to prove either. One thing that
becomes evident in these calculations is that there must be some relationship
between the three involved quantities $b_{k+1}$, $b_k$, and $a_{k+1}$. Let’s find it:

$$b_{k+1} = \frac{a_1 + a_2 + \cdots + a_k + a_{k+1}}{k+1} = \frac{k\,b_k + a_{k+1}}{k+1} = \frac{k}{k+1}\,b_k + \frac{a_{k+1}}{k+1}.$$

This relationship is actually true for any sequence $\{a_n\}$ with average sequence
$\{b_n\}$. Now we are on the right track: since $k/(k+1) < 1$ and
$a_{k+1} < k+1$, then $b_{k+1} < b_k + 1$. As both sides are integers, we can be
more precise: $b_{k+1} \le b_k$, which is what we were after! The sequence $\{b_k\}$ is
(weakly) decreasing and consists of positive integers, so it must stabilize. But as soon
as $b_{k+1} = b_k$, our inequalities above turn into equalities, showing $a_{k+1} = b_k$.
Thus, the original sequence $\{a_n\}$ stabilizes at the same value. □
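Here is a short sketch for generating the averages. It relies on an assumption about the rule of the problem, reconstructed from the bound $a_{k+1} < k+1$ and the integrality of the averages used above: $a_{k+1}$ is taken to be the unique integer with $0 \le a_{k+1} \le k$ that makes $a_1 + \cdots + a_{k+1}$ divisible by $k+1$. The function name `averages` is ours.

```python
def averages(a1, count):
    """Return b_1, ..., b_count for the sequence starting at a1.

    Assumed rule: a_{k+1} is the unique integer in 0..k that makes the
    running total divisible by k + 1.
    """
    total = a1
    result = [a1]
    for k in range(1, count):
        next_term = (-total) % (k + 1)  # lies in 0..k by construction
        total += next_term
        result.append(total // (k + 1))
    return result
```

With $a_1 = 9$ this reproduces the averages 9, 5, 4, 3, 3, 3 quoted at the start of the solution.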
Exercise 3. The triangles on the right
are equilateral. The fourth point of
the first configuration is the center of
the triangle and in the second config-
uration it is a point below the center.
In each case, the two possible non-intersecting pairings are marked in
solid or dashed segments. In the first configuration both pairings are minimal

(they have the same length). In the second configuration the solid pairing is
longer than the other, hence it is non-minimal, but still a correct pairing. 
Problem 10. Starting from an arbitrary assignment of party favors to
people, design the following operation. Whenever some person P violates
the condition, notice that there is some favor that at most 1/n of P ’s friends
have, so reassign P to this favor. For example, in Figure 11a, n = 5 and P ’s
friends are split into 5 groups according to their favors F1 , . . . , F5 . Originally,
P has favor F2 , so he is connected to each friend in the group with F2 .
However, this group is larger than 1/5 of P ’s friends. This means that
another group, namely, the group with F5 , is smaller than 1/5 of P ’s friends,
so we change P ’s favor to F5 and connect him to everyone in that group.
Check that each time the operation is performed, the number of pairs
of friends who have the same favor decreases. This monovariant cannot
decrease forever, and eventually the desired situation will be achieved. 
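If you would like to see the monovariant in action, here is a small simulation. It is a sketch under assumptions of ours: the friendship graph is random, the favors are numbered 0, . . . , n − 1, and a person "violates the condition" when more than 1/n of his friends share his favor.

```python
import random

def same_favor_pairs(friends, favor):
    """Number of pairs of friends who hold the same favor (the monovariant)."""
    return sum(1 for p in friends for q in friends[p]
               if p < q and favor[p] == favor[q])

def fix_favors(friends, favor, n):
    """Apply the reassignment step until nobody violates the condition.

    Returns the list of monovariant values after each step.
    """
    history = [same_favor_pairs(friends, favor)]
    while True:
        for p in friends:
            counts = [sum(1 for q in friends[p] if favor[q] == f) for f in range(n)]
            if counts[favor[p]] * n > len(friends[p]):   # more than 1/n of P's friends
                favor[p] = counts.index(min(counts))     # a favor held by at most 1/n
                history.append(same_favor_pairs(friends, favor))
                break
        else:
            return history

random.seed(0)
people = range(12)
friends = {p: set() for p in people}
for p in people:
    for q in people:
        if p < q and random.random() < 0.4:   # random friendships (illustration only)
            friends[p].add(q)
            friends[q].add(p)

n = 3
favor = {p: random.randrange(n) for p in people}
history = fix_favors(friends, favor, n)
print(history)
```

The printed list should be strictly decreasing, and the loop stops exactly when no violator is left.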

Figure 11. Favors and Knights

Problem 11. Seat the knights in any random order around the table (and
make sure they don’t kill each other while you are rearranging them!)
Suppose two enemies are sitting next to each other. Call them A and
B, going clockwise (CW) around the table (cf. Fig. 11b). Starting from A
and going counterclockwise (CCW), let n of A’s friends be A1 , A2 , . . . , An .
For any Am , let Bm be his CW neighbor. (Some Bm ’s will coincide with
Am−1 ’s or B1 = A, but that won’t affect our solution.) Among the n knights
B1 , B2 , . . . , Bn , one must be a friend of B, say, Bk (≠ A). We have the CW
arrangement: A, B, . . . , Ak , Bk . . ., with friends {A, Ak }, friends {B, Bk },
and adjacent enemies {A, B}.
The arc of the table that we will switch goes CCW from A to Bk (cf. the
dashed arcs in Figures 11b–c). The switch creates two new adjacent friendly
pairs {A, Ak } and {B, Bk }, while splitting the original adjacent enemy pair
{A, B} and not affecting other adjacent pairs (except for, possibly, flipping
their order of appearance around the table). As a result, we have reduced by
1 or 2 the number of adjacent enemy pairs. This is our monovariant, which
will keep decreasing until it hits 0 and only friends sit next to each other. 
Exercise 4. Suppose that the flea at A jumps over the flea at B to B +
λ(B − A). Prior to the jump, suppose the rightmost of the n fleas is at
position C (which may be equal to B). There are two possibilities.

Case 1. If C is still the rightmost position after the jump (cf. Fig. 12a), then
the only change in the value happens with the flea at A:
−λA → −λ(B + λ(B − A)).
This is a net change of −λ(B + λ(B − A) − A) = −λ(1 + λ)(B − A) < 0, so
the value of the configuration has decreased.


Figure 12. Preservation/Change of “leadership” among the fleas

Case 2. If the flea has jumped farther right than C (cf. Fig. 12b), then the
new value is (B + λ(B − A)) − λ(C + B + (other terms)), and the net change:
(B + λ(B − A) − λC) − (C − λA) = (1 + λ)(B − C) ≤ 0.
(We know that B ≤ C because C was originally the rightmost flea.) So the
value may stay the same, but it still cannot increase. 
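Both cases can be tested at once by letting random fleas jump. The sketch below assumes that the value of a configuration is the rightmost position minus λ times the sum of all other positions; this definition is inferred from the two net-change computations above and is not restated in this solution. The choice λ = 1/3 and the starting positions are arbitrary.

```python
from fractions import Fraction
import random

def value(positions, lam):
    """Rightmost position minus lam times the sum of the others (assumed definition)."""
    top = max(positions)
    return top - lam * (sum(positions) - top)

def jump(positions, i, j, lam):
    """Flea i (at A) jumps over flea j (at B, with A < B) to B + lam * (B - A)."""
    a, b = positions[i], positions[j]
    if a >= b:
        raise ValueError("the jumping flea must start to the left")
    new = list(positions)
    new[i] = b + lam * (b - a)
    return new

random.seed(1)
lam = Fraction(1, 3)
positions = [Fraction(random.randint(0, 20)) for _ in range(5)]
values = [value(positions, lam)]
for _ in range(100):
    i, j = random.sample(range(len(positions)), 2)
    if positions[i] < positions[j]:
        positions = jump(positions, i, j, lam)
        values.append(value(positions, lam))

print(all(later <= earlier for earlier, later in zip(values, values[1:])))
```

Exact fractions are used so that a value that "stays the same" in Case 2 is not disturbed by rounding.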
Exercise 5. Suppose otherwise. Then the rightmost flea is at some position
ω > ν/μ and all the other fleas are at positions ≤ ω, which means the value
of the configuration is:
ω − λ(sum of fleas ≤ ω) > ν.
So getting any flea to a position > ω requires the value of the configuration
to increase, and we have shown in Exercise 4 that this can never happen. 

Figure 13. Reaching the third row in an inductive way

Exercise 6. Figure 13 uses the inductive idea: it starts with 10 checkers,

reduces them to the 4-checker configuration from Figure 8 shifted up a row,
and ends in one checker in row 3. A faster way to reach row 3 is to start
with 8 checkers as in Figure 14; some steps require you to find the correct
order to follow the arrows. 

To reach row 4, Figure 15 builds upon our previous 8-checker configura-
tion. However, there is a solution starting with only 20 checkers. 

Figure 14. Reaching the third row in an efficient way

Figure 15. Reaching the fourth row in an inductive way

Exercise 8. Jumping up or toward the central column (whether left or

right), replaces $\varphi^{k} + \varphi^{k+1} \to \varphi^{k+2}$; but the two sides are equal since $1 + \varphi = \varphi^{2}$.
Jumping down or away from the central column replaces $\varphi^{k+2} + \varphi^{k+1} \to \varphi^{k}$;
this is a decrease as $\varphi^{k+1} > \varphi^{k}$. Finally, jumping over the central column
replaces $\varphi^{k} + \varphi^{k+1} \to \varphi^{k}$, which is also an obvious decrease.

Exercise 9. Row 0 = $\{\ldots, \varphi^{-3}, \varphi^{-2}, \varphi^{-1}, 1, \varphi^{-1}, \varphi^{-2}, \varphi^{-3}, \ldots\}$. The right
half of row 0, starting with 1, is a geometric series that adds up to $\varphi^{2}$
(cf. Lemma 3(b)). The left half of the row is the same series minus the term
1, i.e., $\varphi^{2} - 1$. Adding the two halves we get $2\varphi^{2} - 1 = 2\varphi^{2} - (\varphi^{2} - \varphi) = \varphi^{2} + \varphi = \varphi^{3}$.
But row n < 0 (below row 0) is just row 0 with everything
multiplied by $\varphi^{n}$, hence the sum for row n is $\varphi^{3}\varphi^{n}$. Now we add up all rows for
n = 0, −1, −2, −3, . . . and factor the repeating $\varphi^{3}$, and discover yet another
geometric series, the one we have already encountered in Lemma 3(b):
$$\varphi^{3}\bigl(1 + \varphi^{-1} + \varphi^{-2} + \varphi^{-3} + \cdots\bigr) = \varphi^{3} \cdot \varphi^{2} = \varphi^{5}.$$

Exercise 10. Compared to before, everything on and under row 5 has been
divided by $\varphi^{5}$. Hence the sum underneath the designated line did the same
thing: $\varphi^{5}/\varphi^{5} = 1$. The sum up to row 5 (inclusive) is $\varphi^{5}$, so the total sum is
twice that minus row 5’s sum: $2\varphi^{5} - \varphi^{3} = \varphi^{5} + (\varphi^{5} - \varphi^{3}) = \varphi^{5} + \varphi^{4} = \varphi^{6}$. ♦
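The power identities used in Exercises 8-10 are easy to confirm numerically. The sketch below assumes that φ is the positive root of 1 + x = x² and truncates each infinite series after 200 terms, which is our choice and more than enough for floating-point accuracy.

```python
import math

phi = (1 + math.sqrt(5)) / 2   # positive root of 1 + x = x**2 (assumed meaning of phi)

def row_sum(n, terms=200):
    """Approximate sum of row n: phi**n times (..., phi**-1, 1, phi**-1, ...)."""
    half = sum(phi ** (-j) for j in range(terms))   # 1 + phi^-1 + phi^-2 + ...
    return phi ** n * (2 * half - 1)                # both halves share the term 1

# Rows 0, -1, -2, ... added together.
below_zero = sum(row_sum(n) for n in range(0, -200, -1))

print(math.isclose(1 + phi, phi ** 2))
print(math.isclose(row_sum(0), phi ** 3))
print(math.isclose(below_zero, phi ** 5))
print(math.isclose(2 * phi ** 5 - phi ** 3, phi ** 6))
```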
Session 7

Geometric Re-Constructions. Part II

Bits of Geometry, Physics & Trigonometry

Zvezdelina Stankova

Sneak Preview. In this session, we will explore intermediate-level solutions

to our main challenges from Part I: the Farmer-and-Cow and the Three-Squares
problems. Some basic facts about inscribed angles from Circle Geometry (vol. I)
and a couple of similarity criteria for triangles from Part I will come in handy.
We will also need to review and learn more sophisticated geometry facts and
techniques. For example, we will re-discover and re-prove the famous theorems
of Pythagoras and Ptolemy, and through the Farmer-and-Cow problem we will
link physics and everyday phenomena to mathematical theory. We will also expe-
rience a speed-of-light introduction to trigonometry, use our previous geometric
knowledge to prove a famous trigonometric formula, and in turn apply this for-
mula to solving the Three-Squares problem in yet a different but super-fast way.
Two challenges on optimal bridges and the infinitely many angles will be
posed right away for the most advanced. However, their solutions will be post-
poned until Part III, where a creative plane geometry idea will be inspired by
an inequalities approach and further Calculus techniques will provide alternative
ways of tackling a whole range of such problems.

1. Optimal and Infinite Challenges

If you feel you are already fortified with enough plane geometry back-
ground and the two main problems from Part I are not challenging enough,
you can tackle their two cousins below and meet us later in the session to
compare notes on their difficulty and variety of approaches. Calculus solu-
tions are allowed; but the ultimate challenge in these problems, of course, is
to discover beautiful purely geometric solutions that can be potentially cre-
ated by bright middle schoolers with little technical background and open
minds. Do such solutions exist? Part III will partially answer this question.

Problem 1. (Optimal Bridge) Two villages

are situated on opposite banks, not necessarily
across from each other. The river is of constant
width. The farmer’s market is always held in
the same village. The other village wants to
build a bridge across the river (and perpendicular
to the banks of the river) so that the total
trip to the farmer’s market is as short as possi-
ble. Where should the bridge be built and why?
Hint: This problem reminds us of the Farmer-and-Cow situation, where we
reflected the farmer across the river to a phantom farmer and asked the latter
to walk straight through the river and toward the cow. Alas, we justified
there that the width of the river could be safely assumed to be zero without
changing the problem. Yet, in our Optimal-Bridge problem the width of the
river plays an essential role and cannot be ignored. Is reflection again the
“magic” transformation that will reduce the problem to a trivial one, or is
there another, more appropriate “action” in the plane? ♦

The second challenge for the die-hards is an infinite extension of our

previous Three-Squares puzzle:
Problem 2. (ℵ0-Squares)¹ Glue to each other infinitely many identical
squares with bases AA1 , A1 A2 , A2 A3 , A3 A4 , A4 A5 , and so on, to form an
infinite row (cf. Fig. 1). If D is the top left corner of the first square, right
above A, what is the sum ∠AA1 D + ∠AA2 D + ∠AA3 D + ∠AA4 D + · · · ?

Figure 1. α1 + α2 + α3 + α4 + α5 + · · · = ?
Ideas: The discussion about the original Three-Squares problem in Part I
concluded with finding the sum of the first three angles: α1 + α2 + α3 = 90◦ .
Is there a similar geometric construction for the ℵ0 -Squares puzzle, i.e., can
you usefully tile (part of) the integer grid into grid-triangles? Or could you
apply some more advanced techniques instead? In the latter case, try first
to solve the Three-Squares problem with trigonometry as a preparatory step
for this infinite version. Or, perhaps, you know how to employ the so-called
Taylor expansion of a suitable function for the infinitely many squares?
Whatever you decide, experimenting by summing some of the angles and
estimating the total can be illuminating. Starting with α4 , you may need
to add up more than a dozen angles before you realize that this problem is
very different in nature than its Three-Squares predecessor. ♦
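For the suggested experiment, a few lines of code save a lot of calculator work. The sketch below assumes unit squares, so that the k-th angle ∠AA_kD has tangent 1/k; the function name is ours.

```python
import math

def angle_sum_degrees(n):
    """alpha_1 + ... + alpha_n in degrees, with alpha_k = arctan(1/k) for unit squares."""
    return sum(math.degrees(math.atan(1 / k)) for k in range(1, n + 1))

for n in (3, 10, 100, 1000, 10000):
    print(n, round(angle_sum_degrees(n), 3))
```

The first line of output recovers the 90◦ of the Three-Squares problem; compare the later lines with each other before making a conjecture.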
¹ ℵ0 is shorthand for “infinitely many.”

2. A Pythagorean Path for the Intermediate

2.1. Similarity rules again. Let’s review the problem in Part I that we
solved via an auxiliary geometric construction and congruent triangles:

Problem 3. (Three Squares) Three identical squares with bases AM ,

M H, and HB are put next to each other to form a rectangle ABCD
(cf. Fig. 2a). Prove that ∠AM D + ∠AHD + ∠ABD = 90◦ .

Figure 2. Three-Squares problem and Similarity of triangles

The auxiliary geometric construction was the hard and the brilliant part
of our first solution. It is unlikely that one would come up with the exact
same construction. A natural task would be to find a solution that does not

depend on auxiliary segments. It turns out that there is such a solution; but
to compensate for the lack of auxiliary segments, we will need to replace the
simpler congruences by similarities of triangles.

2.1.1. Angle discussion in reverse. Since α = 45◦ , we just need β + γ = 45◦ .

Do we see a 45◦ angle that is already split into β and γ? (Remember:
no additional drawings!) The only plausible location is the clustering of
angles at vertex D. Do we recognize some angles there? Indeed, ∠CDM is
45◦ and it happens to contain β (cf. Fig. 2b) because ∠CDH = ∠AHD as
alternate interior angles for AB || CD and the transversal HD.
The solution to the Three-Squares problem will be complete if we can
show that ∠M DH = γ. But there isn’t a pair of (drawn) parallel lines to
imply it! What other tools can be used to compare ∠M DH and ∠HBD = γ?
Do these angles, perhaps, participate in some congruent or similar triangles?
Two natural candidates are △MDH and △MBD: since ∠HMD = δ is
shared by them, if our two angles were indeed equal, then by AA the triangles
would be similar!

This is a good place to pause and re-think what happened just now.

PST 58. When reasoning backward you will often reach an important fact

that must be true (in order for the original problem to work out): try to
prove this fact without using any unjustified assumptions from the “back-
ward” discussion.

Applying PST 58 may well be the turning point in the analysis
of the problem, where your solution starts “moving forward”.

2.1.2. Moving forward . . . and back again. How do we show that △MDH ∼
△MBD without assuming ∠MDH = γ? We still have ∠HMD = δ shared
by the two triangles, but we do not know anything about other pairs of
angles in these non-congruent triangles. Our only chance is to use ratios of
sides through, say, the RAR criterion for similarity.
With this in mind, is it true that the sides adjacent to δ in △MDH and
△MBD form equal ratios, i.e.,

$$\frac{MH}{MD} \overset{?}{=} \frac{MD}{MB} \;\Leftrightarrow\; MH \cdot MB \overset{?}{=} MD^{2} \;\Leftrightarrow\; 1 \cdot 2 \overset{?}{=} MD^{2} \;\Leftrightarrow\; MD \overset{?}{=} \sqrt{2}\,? \tag{1}$$
We again reasoned backward! But we finally seem to have reached something
that can be proved independent of the discussion so far.
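Before proving anything, a quick numerical look at the figure is reassuring. The sketch below places the three unit squares on a coordinate grid with A at the origin (our choice of coordinates, not part of the problem) and simply measures the quantities discussed so far. It is a sanity check, not a proof.

```python
import math

# Assumed coordinates: unit squares along the x-axis, D directly above A.
A, M, H, B, D = (0, 0), (1, 0), (2, 0), (3, 0), (0, 1)

def dist(P, Q):
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

def angle(P, V, Q):
    """Angle PVQ at vertex V, in degrees."""
    a1 = math.atan2(P[1] - V[1], P[0] - V[0])
    a2 = math.atan2(Q[1] - V[1], Q[0] - V[0])
    d = abs(a1 - a2)
    return math.degrees(min(d, 2 * math.pi - d))

ratio_small = dist(M, H) / dist(M, D)   # MH / MD
ratio_large = dist(M, D) / dist(M, B)   # MD / MB
total = angle(A, M, D) + angle(A, H, D) + angle(A, B, D)

print(ratio_small, ratio_large)
print(angle(M, D, H), angle(H, B, D))
print(total)
```

The two ratios agree, the angles ∠MDH and ∠HBD agree, and the three angles add up to 90◦ within rounding error.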

2.1.3. Ending with a Pythagorean certainty. I can almost hear the reader
objecting to the last question in (1): “It is a well-known fact! MD is the
diagonal of a unit square. The Pythagorean Theorem for isosceles right
△AMD implies $MD^{2} = DA^{2} + AM^{2} = 1^{2} + 1^{2} = 2$, so $MD = \sqrt{2}$. Done!” ♦
Not so fast! First, do we know how to prove the Pythagorean Theorem?
And even if we do, our reasoning back and forth is not quite written in the
form of a traditional proof.

Exercise 1. Assuming the Pythagorean Theorem, track down and extract
the forward argument from the above discussion, and write a short formal
solution to Problem 3 with similar triangles.

2.1.4. Restricting ourselves may be advantageous. Do you recall what got us

looking for similar triangles in the first place? Was it perhaps the lack of
other geometric options?

PST 59. When searching for a second solution, eliminate the methods from
the first solution, in order to restrict your attention to what other techniques
and ideas are available and suitable in your situation.
For example, in the 5th-grade solution from Part I extra constructions
were encouraged, albeit restricted only to the integer grid. On the other
hand, in the second solution we disallowed any extra drawings at all! As
restrictive as this may have seemed at the time, it worked to our advantage:
it reduced the number of possible triangles and, even more drastically, the
number of possible pairs of similar triangles, making it easier to find the
“right” pair: △MDH ∼ △MBD.
Exercise 2. If you have extra time on your hands, count for fun the number
of families of similar triangles that appear in the original Figure 2a.
Here, a family is a collection of triangles any two of which are similar
to each other, and two triangles from different families are not similar. Be
aware that congruent triangles are also counted as similar!

2.2. The Pythagorean Theorem is arguably the most widely-known

theorem in geometry. (Actually, it is a hybrid between geometry and al-
gebra.) It was mentioned and used several times in the discussion of our
main problems, so a proof of it is due. Incidentally, among the hundreds of
explanations of the theorem, an elementary and straightforward one is based
on a similarity criterion introduced in Part I.
Theorem 1. (Pythagorean Theorem (PT)) In a right triangle, the
squares of the legs add up to the square of the hypotenuse; in other words,
$AC^{2} + CB^{2} = AB^{2}$ as in Figure 3a.
Hint: Drop the altitude CH to the hypotenuse AB. Note that the foot
of CH will be on segment AB (why?) and, hence, it will split AB into two
parts, AH and HB. Using two pairs of similar triangles (how many similar
triangles do you see?), express each of AH and HB in terms of the sides of
△ABC and then sum your results. ♦
Figure 3. Pythagorean Theorem and Special cases
Exercise 3. (Baby Pythagorean Consequences) Using PT, show that

(a) In a right triangle, the hypotenuse is the largest side.
(b) In a right isosceles triangle with legs 1, the hypotenuse is $\sqrt{2}$.
(c) In a 30◦-60◦-90◦ triangle, the three sides are in ratios $1 : \sqrt{3} : 2$.

Hint: Parts (a)-(b) have been done before (where?), with the “premature”
assumption of PT. In part (c) draw a segment through the vertex of the right
angle to split the original triangle into two smaller triangles, one of which
is equilateral (cf. Fig. 3b). Describe the other small triangle. How does this
imply that the hypotenuse is twice the side of the equilateral triangle? ♦
We have some unfinished business from the Farmer-and-Cow discussion
in Part I. We concluded there that the farmer’s shortest route is through
point X on the river such that AX = 1 km and BX = 3 km. The other
(given) distances are FA = 2 km and CB = 6 km (cf. Fig. 3c).
 Exercise 4. Calculate the length of the shortest route of the farmer.
Another PT consequence (used earlier) has a more demanding proof:

Exercise 5. (Triangle Inequality) In a triangle the sum of any two sides
is longer than the third side.

Hint: Drop the altitude to that third side, split into cases depending on
where the foot of this altitude lands, and use a baby consequence of PT. ♦

3. Physics and Math Combine Forces

3.1. Through the looking-glass. In Part I we mentioned a law of physics

that we observe every day and which could be used to find an alternative
solution to the Farmer-and-Cow problem. If you recall the picture-hint there
– the sun looking at itself in the mirror – it should not be surprising that we
were referring to the following well-known laws:
Laws of Reflection. If the reflecting surface is very smooth, the reflection
of light obeys the rules:
(1) The incident ray, the reflected ray, and the normal to the reflection
surface at the point of the incidence lie in the same plane.
(2) The angles which the two rays make with the normal are equal.
(3) The two rays are on the opposite sides of the normal.
To illustrate, on the right is the sunlight reflecting off a river.² Everything
is in a half-plane with respect to the river, the normal is the dashed
perpendicular to the river, and the doubly-marked angles are equal: ρ = ρ′.
Subtracting each from 90◦ yields α = α′, which is a rephrase of law (2):
the angles made by the river and the incoming ray
and by the river and the reflected ray are equal.
Since we are trying to connect these “laws of nature” to our Farmer-
and-Cow problem, it will be silly to expect that numerical data (such as
the specific distances from the farmer and the cow to the river) are relevant
in this discussion. With this understanding, let’s generalize the original
problem by keeping only its features that are essential:

Problem 4. (Generalized Farmer & Cow) A farmer and a cow are on the
same side of a river. The farmer must get to the river, dip his bucket
there, and take the water to his cow. To which point at the river should
the farmer walk so that his total path is as short as possible?
Hint: The solution from Part I applies equally well to this generalized
version: reflect the farmer across the river to obtain three similar
triangles (review page 13). Then the optimal path of the farmer must have
made two equal angles with the river, namely, α = α′ as marked above (why?). ♦
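The reflection idea can be compared with a brute-force search. In the sketch below the river is the x-axis, the farmer stands f km above the river point 0, and the cow stands c km above the river point d. The coordinates, the function names, and the sample numbers f = 2, c = 6, d = 4 are illustrative assumptions of ours.

```python
import math

def path_length(x, f, c, d):
    """Farmer at (0, f) walks to the river point (x, 0), then to the cow at (d, c)."""
    return math.hypot(x, f) + math.hypot(d - x, c)

def reflection_point(f, c, d):
    """Where the segment from the phantom farmer (0, -f) to the cow crosses the river."""
    return d * f / (f + c)

def best_point_numeric(f, c, d, steps=100000):
    """Brute-force minimizer over a fine grid of river points between 0 and d."""
    candidates = (i * d / steps for i in range(steps + 1))
    return min(candidates, key=lambda x: path_length(x, f, c, d))

f, c, d = 2, 6, 4                      # illustrative distances in km
x_star = reflection_point(f, c, d)
print(x_star, best_point_numeric(f, c, d))
print(path_length(x_star, f, c, d), math.hypot(d, f + c))
print(f / x_star, c / (d - x_star))    # equal slopes, hence equal angles with the river
```

The grid search lands on the point produced by the reflection, the optimal length equals the straight distance from the phantom farmer to the cow, and the two slopes at that point coincide.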

² Caution: “Reflection” may mean different things, depending on the context. In the
Laws of Reflection, the sunlight is reflecting off the river. Mathematically speaking, this
is different from reflecting across the river, which the farmer did in order to get to the
phantom farmer on the other side of the river. The two usages are related, of course: the
sunlight’s reflection off the river is the same as its reflection across the normal (why?).

Comparing the two pictures on the previous page leads to the inevitable
conclusion that the farmer must follow the same path as the sunlight, except
on a horizontal instead of the vertical or slanted plane along which the
sunlight travels. To make this into a formal argument, a small hurdle about
uniqueness must be overcome:

Exercise 6. Show that there is exactly one point X on the river YZ such
that if we connect it to the farmer F and the cow C we will make two equal
angles with the river: ∠FXY = ∠CXZ.

Proof: If X is such a point, let F′ be the phantom farmer and FF′ intersect
the river at A. By SAS, △F′XA ≅ △FXA, so ∠F′XA = ∠FXA. Hence
∠F′XA = ∠CXZ, forcing X to be on F′C, i.e., X is the intersection of F′C
and the river. But this is precisely how we constructed the original optimal
point X on the river! Thus, there is exactly one such point.
To truly understand the uniqueness of point X, try the following:
Exercise 7. If line FC and the river are not parallel, their intersection
X′ will produce two angles at the river, ∠FX′Y and ∠CX′Z. Are these
angles equal? Are X and X′ different? Is there a situation when X and X′
coincide? Why does this not contradict the uniqueness of X in Exercise 6?

3.2. What have we proven? If we assume the Laws of Reflection, then

our Farmer-and-Cow solution implies a well-known fact about the sunlight:

Exercise 8. Among all paths from one point to another that bounce off a
mirror, show that the sunlight will take the shortest distance possible.
Proof: If the sunlight starts at point F , reflects in one mirror and passes
through point C, then the two angles that the sunlight’s path makes with
the mirror are equal by the Laws of Reflection.
Now put everything on a horizontal plane and let a farmer start at F
and walk to the mirror and then to the cow. From the general Problem 4
and the uniqueness in Exercise 6, we know that the shortest path for the farmer
goes through the unique point X on the mirror where the path makes equal
angles with the mirror.
In other words, the path of the farmer and the path of the sunlight are
identical. Since this is the shortest path for the farmer, it will be the shortest
path for the sunlight too. 
So, what happened here? Simply put, our solution to the Farmer-and-
Cow problem implied a “law of nature”: the sunlight travels the shortest
route possible even if it has to reflect along the way! And conversely, if we
assume this “law of nature” about the sunlight, then the shortest route for

the farmer will make two equal angles with the river. It depends on what you
assume as an “axiom” and what you decide to prove from it as a “theorem”.

3.3. More mirrors. It is natural to explore what happens if there is more

than one smooth surface off which the sunlight has to reflect. As a prepara-
tory version, try the special case with two “mirrors” where the path is closed,
i.e., it starts and ends at the same place:

Exercise 9. (Optimal Game) Two trees grow

in a yard fenced in the shape of an acute or right

angle. Children play the following game: starting
from one of the trees, they run to one side of the
fence, then to the other tree, then to the other
side of the fence, and finally return to the first
tree. Help them do this as fast as possible.

You may assume that the fence extends as far as necessary so that the
children cannot go out of the courtyard. You should also think about:

Exercise 10. Why were obtuse angles eliminated in Exercise 9? What may
go wrong then and how should the solution be modified to work here too?

And now, for the final generalization:

Exercise 11. If the sunlight
(a) starts at point A,
(b) bounces off from a sequence of mirrors M1 , M2 , . . . , Mn , and
(c) ends at point B,
show that the sunlight has taken the shortest possible route among all
routes with the three properties (a), (b), and (c).

The picture above shows two paths: the sunlight’s path from A to B that
reflects through mirrors M1 , M2 , M3 , and M4 , and an alternative (dashed)
path that bounces off from the same sequence of mirrors. Note that the two
paths happen to pass through the same point on mirror M3 . Still, Exercise 11
claims that the sunlight’s path will be the shorter of the two.
Hint: Resolve the sunlight path into an equally long straight line path
while showing that the alternate path is a broken line path. ♦

4. Ptolemy’s Lead into Trigonometry

4.1. Ptolemy’s Theorem can be used as a springboard to a standard

trigonometric formula needed in the promised 8th-grade solution to the Three-
Squares problem. If you recall, Ptolemy’s Theorem succumbed to the method
of inversion in volume I. Among its numerous proofs, the one discussed here
stands out as a powerful application of inscribed angles and similar triangles,
glued together by an auxiliary geometric construction.

Theorem 2. (Ptolemy’s Theorem (PtT)) For an inscribed quadrilateral,

the sum of the products of opposite sides equals the product of the diagonals;
in other words, AB · DC + AD · BC = AC · BD as in Figure 4a.
Figure 4. Proof of Ptolemy’s Theorem

Proof: The key idea is to split a diagonal, say, BD into two parts BM and
MD, so that ∠BAM = ∠CAD (= α as in Fig. 4b). Since inscribed angles
∠ABM and ∠ACD intercept the same arc AD, they are equal.³ From here,
△BAM ∼ △CAD by the AA similarity criterion. We can picture this by
rotating △CAD about vertex A until side AC aligns with ray AB, and then
rescaling △CAD to the size of △BAM. The angle of rotation is ∠CAB.
A second rotation about A but through ∠MAB (as in Fig. 4c), followed
by a rescaling, will move △DAM onto △CAB: why are these triangles also
similar? Check out the pairs of equal angles denoted by γ and δ.
Now we use ratios of sides from the above two similarities to express the
parts BM and MD of diagonal BD in terms of quadrilateral ABCD:
$$\frac{BM}{CD} = \frac{AB}{CA}, \quad \frac{MD}{BC} = \frac{AD}{CA} \;\Rightarrow\; BD = BM + MD = \frac{AB \cdot CD}{CA} + \frac{AD \cdot BC}{CA}.$$
Clearing the common denominator CA yields the desired equality of PtT. 
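Ptolemy's equality is also pleasant to test on a computer. The sketch below places four points on the unit circle at angles of our choosing, names them A, B, C, D in the order in which they appear around the circle (an assumption that makes AC and BD the diagonals), and measures how far the two sides of the equality are from each other.

```python
import math

def on_circle(t):
    """Point of the unit circle at angle t (in radians)."""
    return (math.cos(t), math.sin(t))

def dist(P, Q):
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

def ptolemy_gap(angles):
    """AB*DC + AD*BC - AC*BD for A, B, C, D placed at increasing angles."""
    A, B, C, D = (on_circle(t) for t in sorted(angles))
    return (dist(A, B) * dist(D, C) + dist(A, D) * dist(B, C)
            - dist(A, C) * dist(B, D))

for angles in [(0.1, 1.0, 2.5, 4.0), (0.0, 0.3, 3.0, 6.0), (1.0, 2.0, 3.0, 5.5)]:
    print(angles, ptolemy_gap(angles))
```

Each printed gap should be zero up to rounding error.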
Did you notice that the same problem-solving idea occurred in the proofs
to both the Pythagorean and Ptolemy’s Theorems? The hypotenuse or a
diagonal was split into two parts, whether by the foot of an altitude or by an
extra point we created. In both situations, similar triangles played a crucial
role in the geometric construction and the ensuing algebraic calculations.

PST 60. In proving an algebraic equality involving segments, try to split one
of the segments XY into two parts, XZ and ZY , by an auxiliary geometric
construction using, perhaps, similar triangles. Then express each of XZ and
ZY in terms of the given segments, sum the results for XZ and ZY to get
XY , and finally simplify to obtain the desired equality.

4.2. Lightspeed entry into trigonometry. The technical tool needed in

the 8th-grade solution to the Three-Squares problem is a famous trigonometric
formula that expresses the tangent of a sum of two angles, tan(α + β), in
terms of its building blocks, tan α and tan β.

³ To review some facts about angles in a circle, see Circle Geometry, vol. I.

Let us first review the four basic trigonometric functions:

Definition 1. In △ABC with ∠B = 90◦ and ∠A = α,
we define the following ratios of sides as new functions of
angle α, called sine, cosine, tangent, and cotangent:
$$\sin\alpha = \frac{CB}{CA}, \quad \cos\alpha = \frac{AB}{CA}, \quad \tan\alpha = \frac{CB}{AB}, \quad \text{and} \quad \cot\alpha = \frac{AB}{CB}.$$
If you are seeing these functions for the first time, you should:

Exercise 12. Verify that sin x and cos x, as well as tan x and cot x, swap
their values for angles α and γ = ∠C; that tan x and cot x are reciprocals

of each other and can be expressed as ratios of sin x and cos x; and that a
trigonometric version of the PT is satisfied for any right triangle:
(a) sin α = cos γ, cos α = sin γ, tan α = cot γ, and cot α = tan γ;
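(b) $\tan\alpha = \frac{1}{\cot\alpha}$ and $\tan\gamma = \frac{1}{\cot\gamma}$; $\tan\alpha = \frac{\sin\alpha}{\cos\alpha}$ and $\cot\alpha = \frac{\cos\alpha}{\sin\alpha}$;
(c) $\sin^{2}\alpha + \cos^{2}\alpha = 1$ and $\sin^{2}\gamma + \cos^{2}\gamma = 1$.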

Using the Baby Pythagorean Consequences from Exercise 3,

Exercise 13. Calculate the values of the four trigonometric functions at the
following famous angles: 0◦ , 30◦ , 45◦ , 60◦ , and 90◦ . Did you plug 90◦ into
tan x or 0◦ into cot x? Why or why not?
Partial solution: Since $\tan\alpha = \frac{CB}{AB}$, α = 0◦ means that CB = 0 and
tan 0◦ = 0. When α = 30◦, △ACB is 30◦-60◦-90◦ and $\tan 30^\circ = \frac{CB}{AB} = \frac{1}{\sqrt{3}}$
(cf. Exer. 3c). But tan 60◦ is the reciprocal of tan 30◦, i.e., $\tan 60^\circ = \sqrt{3}$.
For α = 45◦ we obtain a right isosceles triangle, so $\tan 45^\circ = \frac{CB}{AB} = 1$.
Finally, tan 90◦ is undefined: we cannot have a right △ABC with two
right angles; equivalently, we cannot divide by 0 as in $\frac{CB}{AB} = \frac{AC}{0}$. ♦
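To check your table of values, you can let a computer divide the sides of the special triangles. The sketch below takes the side lengths from the Baby Pythagorean Consequences (legs 1, 1 with hypotenuse √2, and sides in ratios 1 : √3 : 2) and compares the resulting ratios with the library functions; the dictionary and function names are ours.

```python
import math

# (CB, AB, CA) = (opposite leg, adjacent leg, hypotenuse) for angle A = alpha,
# with the right angle at B.
special_triangles = {
    30: (1, math.sqrt(3), 2),
    45: (1, 1, math.sqrt(2)),
    60: (math.sqrt(3), 1, 2),
}

def ratios(CB, AB, CA):
    """The four ratios of sides: sine, cosine, tangent, cotangent."""
    return {"sin": CB / CA, "cos": AB / CA, "tan": CB / AB, "cot": AB / CB}

for degrees, sides in special_triangles.items():
    r = ratios(*sides)
    a = math.radians(degrees)
    agrees = (math.isclose(r["sin"], math.sin(a))
              and math.isclose(r["cos"], math.cos(a))
              and math.isclose(r["tan"], math.tan(a))
              and math.isclose(r["cot"], 1 / math.tan(a)))
    print(degrees, r, agrees)
```

The angles 0◦ and 90◦ are left out on purpose: there one of the legs has length 0, and dividing by it is exactly the issue raised in the exercise.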
If we make the hypotenuse AC = 1, we can use the unit circle k centered at A
to visualize the basic trigonometric functions. To this end, let A be the
center of the coordinate system in the plane, and place B along the positive
x-axis. Then C will lie on the circle while B will be inside it (why?).
Since sin α = CB and cos α = AB (why?), the sine and cosine functions will
simply measure the vertical and horizontal displacement of point C, i.e.,
its y- and x-coordinate, respectively.
Similar triangles can also help us geometrically interpret the tangent
function as a single segment (not a ratio of segments).

Exercise 14. Let the x-axis intersect the unit circle k in point D. Draw a
perpendicular l to the x-axis through point D and extend ray AC until it
meets l at point E. Show that tan α = DE.

Proof: By the AA similarity criterion, △ABC ∼ △ADE since they are
both right and share angle α. Calculating corresponding ratios and taking
into account that the radius of the circle is 1, we obtain:
$$\tan\alpha = \frac{CB}{AB} = \frac{ED}{AD} = \frac{ED}{1} = ED.$$
In other words, the tangent function measures the vertical displacement
of point E on line l. Line l “happens” to be the tangent line to the circle k
at point D. This is no coincidence! So, if the name of tan x was a mystery
before, it should not be anymore. For practice,

Exercise 15. Find a horizontal line m along which the cotangent function
can be similarly interpreted as the length of a single segment.

Exercise 16. When α moves from 0◦ to 90◦, show that sin x and tan x
strictly increase, while cos x and cot x strictly decrease. (This means, for
example, that sin x < sin y and cot x > cot y for acute angles x < y.)
Hint: Use the unit circle for the values of sin x and cos x, or lines l and m
from Exercises 14-15 for the values of tan x and cot x. ♦
Yet a third way to think about a trigonometric function is via its graph.
When drawing graphs of trigonometric functions, on the x-axis we ordinarily
use linear units called radians (instead of degrees, which are angular units).
For example, 0◦ corresponds to 0 radians, 90◦ to π/2 radians, 180◦ to π radians,
etc. More generally, z◦ corresponds to zπ/180 radians: this is the length of the
arc on the unit circle k that is encompassed by a central z◦-angle. Thus,
the length of the smaller (dotted) arc CD on the unit circle k on page 180
measures angle ∠BAC = α in radians. Keep this in mind when drawing the
graphs below and use radian measure along the x-axis.

Exercise 17. Put together all findings so far to sketch the graphs of sin x
and cos x on interval [0, π/2], tan x on [0, π/2), and cot x on (0, π/2].

Partial solution: From Exercise 14, we know that the tangent function is
measured on line l tangent to the unit circle at point D: tan α = DE.
When α = 0, side AC of ∠BAC coincides with the other (horizontal) side AB,
causing E = D and tan 0 = 0. As α increases (still staying acute), side AC
starts moving counterclockwise from the horizontal position on the x-axis
towards the vertical position on the y-axis, lifting in the process point E
higher and higher on line l and making tan α increase.

In fact, any positive value of tan α can be obtained this way (why?).
Thus, the range of tan x is [0, ∞) for 0 ≤ x < π/2. Furthermore, the graph
of tan x has a vertical asymptote ν at x = π/2 (not to be confused with the
previously discussed tangent line l). Visually, we observe that the graph of
tan x gets closer and closer to the line ν as x approaches π/2. ♦

The sine and cosine functions can be defined for any angles, not just for
acute angles like α and γ above, while the tangent and cotangent functions
can be extended with care to almost all angles, avoiding division by cos x or
sin x when they are zero. We will not do this here, but the reader interested in
having a more complete understanding of trigonometry should consult a basic
text on trigonometry and then justify in the Hints section the corresponding
(dashed) extensions of the graphs from Exercise 17.

4.3. Deep in Trigland. To prepare for the promised formula for tan(α+β),
we need to first address its predecessors: analogous versions for sine and
cosine. If you are familiar with these formulas, skip to Section 4.4 for the
trigonometric solution to the Three-Squares problem. Otherwise, hold on
tight, for we will pass through some rough trigonometric terrain.
Theorem 3. For any angles α and β:
(a) sin(α + β) = sin α cos β + cos α sin β;
(b) sin(α − β) = sin α cos β − cos α sin β;
(c) cos(α + β) = cos α cos β − sin α sin β.
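
Before proving these identities, it can be reassuring to test them numerically. The sketch below is only a floating-point spot check on two sample pairs of angles, not a proof; the helper name theorem3_residuals and the sample angles are our own choices.

```python
import math


def theorem3_residuals(alpha: float, beta: float) -> tuple:
    """Return LHS - RHS for parts (a), (b), (c); angles are in radians."""
    sa, ca = math.sin(alpha), math.cos(alpha)
    sb, cb = math.sin(beta), math.cos(beta)
    part_a = math.sin(alpha + beta) - (sa * cb + ca * sb)
    part_b = math.sin(alpha - beta) - (sa * cb - ca * sb)
    part_c = math.cos(alpha + beta) - (ca * cb - sa * sb)
    return part_a, part_b, part_c


# Sample acute angles (in degrees) with an acute sum, as in the proofs.
sample_angles = [(20, 35), (50, 10)]
for a_deg, b_deg in sample_angles:
    print(a_deg, b_deg, theorem3_residuals(math.radians(a_deg), math.radians(b_deg)))
```

Each residual should be zero up to rounding error.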

For our purposes it suffices to consider only the case when α + β is acute.
We leave it to the reader to extend the proofs to any other cases.

PST 61. To prove a trigonometric formula that involves
angles α, β, and their sum α + β, try to incorporate two
smaller right triangles with angles α and β, respectively,
into a larger right triangle one of whose angles α + β
is made from gluing angles α and β together, as in the
picture on the right.
Proof: (a) Following PST 61, glue two right triangles △OAC and △OBC
along their hypotenuse OC to form quadrilateral OACB with right angles
at A and B. This makes ∠AOB = α + β (< 90◦), which is assumed to be
acute for the duration of our proof. Extend AC and OB to form another,
larger right △OAD.
To simplify calculations, let OC = 1. Then from △OAC and △OBC,
the RHS of (a) can be written as:
(2) sin α cos β + cos α sin β = AC · OB + OA · BC.
On the other hand, quadrilateral OACB is cyclic because
the two opposite angles at A and B sum to 180◦. (In fact,
the diameter of circle k circumscribed about OACB is
OC.) Applying Ptolemy’s Theorem, we can rewrite the
RHS of (2) as OC · AB = 1 · AB = AB.
To finish the proof, we need to show sin(α + β) = AB.
In right △OAD, sin(α + β) = sin ∠AOD = DA/DO.
From circle k, ∠BOC = ∠BAC = β, since they are
inscribed angles in k intercepting arc BC. Because of
the common angle ∠ADO, the AA similarity criterion
implies △OCD ∼ △ABD, and hence
DA/DO = AB/OC = AB/1 = AB.
This establishes that RHS = LHS in (a). □

In part (b), we can modify the ideas encountered just now to accom-
modate the required difference (instead of sum) of two angles. We can also
restrict the solution to the case when α > β so that α − β > 0 and we can
use our basic definitions of the sine and cosine functions.

Hint: (b) Assuming α > β, geometrically “subtract” β
from α: start with right △OAC and right △OBC such
that ∠AOC = α and ∠BOC = β (as in the figure on
the right); glue them along hypotenuse OC so that angle
β is inside angle α, and hence ∠AOB = α − β. Extend
rays OA and CB until they intersect at D.
There are more clues in the figure. ♦
Proof: (c) Using our introductory Exercise 12(a), we can switch back
and forth between sines and cosines by applying cos x = sin(90◦ − x) and
sin x = cos(90◦ −x). Again assuming that 0 < α, 0 < β, and α + β < 90◦ ,
we can reduce part (c) to the previous part (b):
cos(α + β) = sin(90◦ − (α + β)) = sin((90◦ − α) − β)   [Ex. 12(a)]
= sin(90◦ − α) cos β − cos(90◦ − α) sin β   [Thm 3(b)]
= cos α cos β − sin α sin β.   [Ex. 12(a)] □
After all this hard work, the final formula for the tangent of a sum will
feel anticlimactic. We just have to be careful not to divide by 0 so as to have
well-defined tangents. Hence the conditions below:
Corollary 1. If cos α ≠ 0, cos β ≠ 0, and cos(α + β) ≠ 0, then
tan(α + β) = (tan α + tan β)/(1 − tan α tan β).
Hint: This is more of an exercise on fractions than anything else. Use
the fact that tan x is the ratio of sin x and cos x, expand sin(α + β) and
cos(α + β), and practice your algebraic skills! ♦

For a complete trigonometric picture,

Exercise 18. Devise and prove analogous formulas for cos(α−β), tan(α−β),
cot(α + β), and cot(α − β).

4.4. Trigonometric gratification. Believe it or not, we are ready for

the shortest, yet long-overdue trigonometric solution to the Three-Squares
problem. Recall from Figure 2a on page 173 that we must show α + β + γ =
90◦ . Since α = 45◦ , this boils down to β + γ = 45◦ .
4.4.1. Tangents rule! In view of what we studied in the previous subsection,
it is reasonable to try to calculate some trigonometric function of β +γ. Why
not the tangent function?
Exercise 19. Calculate tan(β + γ).
Solution: Since △AHD and △ABD are right triangles, we have tan β =
AD/AH = 1/2 and tan γ = AD/AB = 1/3. Substituting into the formula for the tangent
of a sum, we obtain
tan(β + γ) = (tan β + tan γ)/(1 − tan β tan γ) = (1/2 + 1/3)/(1 − 1/2 · 1/3)
= ((3 + 2)/(2 · 3))/(1 − 1/6) = (5/6)/(5/6) = 1. □
However, we already know that tan 45◦ = 1. Thus, tan(β + γ) = tan 45◦ .
Does this imply that the two angles are equal? Indeed, as we demonstrated
earlier, tan x strictly increases for acute angles. Well, 45◦ is acute. How
about the other angle β + γ? In order to show that β + γ < 90◦ , verify that:
Exercise 20. Both β and γ are < 45◦.

Proof: In right △AHD, β + ∠ADH = 90◦ . But since ∠ADH contains
∠ADM = 45◦ , it follows that ∠ADH > 45◦ and β = 90◦ − ∠ADH < 45◦ .
You can use a similar argument for γ and △ABD. Alternatively, α is an
exterior angle for △MHD and for △MBD, and hence α = 45◦ is larger
than the remote interior angles β and γ in these two triangles. □
To wrap things up, since tan x is strictly increasing for acute angles, we
can’t have tan x = tan y for two different acute angles. But β + γ and 45◦
are both acute and tan(β + γ) = 1 = tan 45◦ . So the two angles must be
equal! Overall, α + (β + γ) = 45◦ + 45◦ = 90◦ and we are truly done. 
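
The arithmetic in this solution is easy to replay exactly. The following Python sketch uses rational arithmetic for the tangent-of-a-sum formula and then, as a floating-point cross-check, adds the two angles directly. The helper name tan_of_sum is ours; the values 1/2 and 1/3 are tan β and tan γ from Exercise 19.

```python
from fractions import Fraction
import math


def tan_of_sum(tan_a: Fraction, tan_b: Fraction) -> Fraction:
    """Tangent-of-a-sum formula from Corollary 1; needs tan_a * tan_b != 1."""
    return (tan_a + tan_b) / (1 - tan_a * tan_b)


tan_beta, tan_gamma = Fraction(1, 2), Fraction(1, 3)
print("tan(beta + gamma) =", tan_of_sum(tan_beta, tan_gamma))

# Floating-point cross-check: recover beta and gamma and add them in degrees.
beta_plus_gamma = math.degrees(math.atan(1 / 2) + math.atan(1 / 3))
print("beta + gamma in degrees (approximately):", beta_plus_gamma)
```

The exact computation avoids any rounding doubt about whether the tangent equals 1; the second computation is only approximate and is included as a sanity check.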

4.4.2. More trig-routes? As we went through the above solution, the reader
should have questioned our choice of the tangent function: couldn’t we have
done as well with other trigonometric functions? The answer is Yes, but you
need to complete the earlier exercises about the basic properties of sin x,
cos x, and cot x, before you can:
Exercise 21. Produce three more solutions to the Three-Squares problem.
Solution with cosine: In Figure 2a, DH = √5 and DB = √10 from
right △DAH and right △DAB. Thus, cos(β + γ) can be calculated as
cos β cos γ − sin β sin γ = (2/√5)(3/√10) − (1/√5)(1/√10) = 5/(√5 √10) = 1/√2
= cos 45◦ .
But cos x strictly decreases for acute angles, and, as above, β + γ < 90◦ ; so
cos(β + γ) = cos 45◦ means β + γ = 45◦ . Thus, again α + β + γ = 90◦ . 

5. Hints and Solutions to Selected Problems

Exercise 1. Assuming the Pythagorean Theorem, from right isosceles
△MAD we have MD = √(1² + 1²) = √2, and a sequence of equalities follows:
MD = √2 ⇒ MD² = 1 · 2 = MH · MB ⇒ MH/MD = MD/MB.
Since ∠HMD = δ is shared by △MDH and △MBD, by RAR criterion
the two triangles are similar. This in turn implies ∠MDH = ∠MBD = γ.
Summarizing, β + γ = ∠HDC + ∠MDH = ∠MDC = 45◦ and, finally,
α + β + γ = 45◦ + 45◦ = 90◦ . □
Exercise 2. With a risk of having missed something, we’ll say that there are
8 types of non-similar triangles that appear in Figure 2a: 3 of them are right
triangles with a second angle α, β, or γ, and 5 are obtuse triangles, whose
pairs of acute angles are {α, γ}, {γ, β}, {β, α}, {γ, β −γ}, or {β −γ, α+γ}. ♦
Theorem 1 (PT). If the foot H of altitude CH were not on hypotenuse
AB, say, B is between A and H (cf. Fig. 5a), then ∠ABC would be an
exterior angle for △BHC and, as such, ∠ABC > ∠AHC = 90◦ . But we
can’t possibly have an obtuse ∠ABC in the right △ABC! This contradiction
explains why H must be on hypotenuse AB and, hence, it must split AB
into two segments AH and HB. (Why can’t H = A or H = B?)
The three resulting triangles (back in Fig. 3a) are similar by the AA
criterion: △AHC ∼ △ACB ∼ △CHB: they are right and the two smaller
triangles share angles α or β with the big triangle. From these similarities,
AH/AC = AC/AB and BH/BC = BC/BA ⇒ AH · AB = AC² and BH · AB = BC².
Adding up, (AH + BH)AB = AC² + BC², i.e., AB² = AC² + BC². □

Exercise 3. (c) Following the hint, ∠ACO = 60◦ means that ∠BCO = 30◦ ,
and hence △OBC is isosceles with OC = OB. Since △AOC is equilateral,
OC = OA. Thus, O is the circumcenter of △ABC and AB = 2OC = 2AC.
From PT, BC = √(AB² − AC²) = √(4AC² − AC²) = √3 · AC. We conclude
that the desired ratios are satisfied; namely, AC : CB : BA = 1 : √3 : 2. □
Exercise 4. The shortest route goes along hypotenuses F X and XC:
FX + XC = √(2² + 1²) + √(3² + 6²) = √5 + √45 = 4√5. □

Exercise 5. Following the hint, to show that AC + BC > AB in △ABC,
drop the altitude CH to side AB. There are three cases to consider.


Figure 5. Triangle Inequality and Questioning uniqueness

(a) If H is outside segment AB (cf. Fig. 5a), WLOG let B be between
A and H. Using right △AHC and part (a) of Baby Pythagorean we have
AC > AH > AB and hence AC + BC > AB.
(b) If H = B (cf. Fig. 5b), then ∠B is right in △ABC. Part (a) of Baby
Pythagorean implies AC > AB and hence AC + BC > AC > AB.
(c) If H is between A and B (cf. Fig. 5c), part (a) of Baby Pythagorean
for right △AHC and right △BHC implies AC > AH and BC > BH.
Adding the inequalities, we have AC + BC > AH + BH = AB. □
Exercise 7. The angles ∠FX′Y and ∠CX′Z are supplementary (cf. Fig. 5d)
and, in general, not equal to each other, causing X′ ≠ X. It is only when
they are both right angles, i.e., FC ⊥ river (cf. Fig. 5e), that X = X′. □
Exercise 9. The game consists of two independent sub-
games: to get from tree T1 to fence side BA and then
to tree T2 , and to get from tree T2 to fence side BC and
then to tree T1 . Each part is a copy of the Farmer-and-
Cow problem. Thus, we can reflect T1 across line BA to
point R1 and reflect T2 across line BC to point R2 , and
let R1T2 intersect line BA in X1 and R2T1 intersect line
BC in X2 . How does ∠BAC ≤ 90◦ imply that X1 is on
ray BA and X2 is on ray BC?
The shortest path will be T1 → X1 → T2 → X2 → T1 . ♦
Exercise 10. The above solution will work for an obtuse
angle too as long as X1 is on ray BA and X2 is on ray
BC. But what if, say, X1 is not on ray BA: this happens
if R1T2 intersects line BA outside ray BA?! We cannot
possibly take the sub-route T1 → X1 → T2 because we
will exit the garden! As it turns out, the shortest path
from T1 to T2 via wall BA will go through the corner B
of the garden: T1 → B → T2 .
To see this, let D be any point on wall BA. Then
sub-routes T1 → D → T2 and T1 → B → T2 are the same
length as, respectively, sub-routes R1 → D → T2 and
R1 → B → T2 . In other words, we can consider our
sub-routes to start from R1 and end at T2 . But then
sub-route R1 → B → T2 is inside △R1DT2 while sub-
route R1 → D → T2 goes along the sides of the triangle.
To compare these two sub-routes, extend R1B until it intersects side DT2
in point E. By △-Inequality for △BET2 and for △R1DE:
R1B + BT2 < R1B + (BE + ET2) = R1E + ET2
< (R1D + DE) + ET2 = R1D + DT2 .
Thus, the sub-route going through the corner B is shorter than any other
sub-route in this situation. ♦

Exercise 11. Reflect the initial point A across
mirror M1 to point A1 and let the sunlight and
the alternative (dashed) route start at A1 instead.
More precisely, replace the initial line segments
AS1 and AQ1 of the two routes by segments A1S1
and A1Q1 of, correspondingly, equal lengths. Note
that the sunlight will now continue straight from
A1 through S1 to mirror M2 , while the dashed
route will, in general, follow a broken line from
A1 through Q1 to mirror M2 .
To summarize, moving the starting point of the routes from A to A1 did
not change the total length of each route. But now we can forget mirror
M1 and reduce the problem to one fewer mirror. Continuing this way, we
can gradually straighten out the sunlight’s route. In the end, after the last
mirror has been eliminated, both routes will start at some point An and
both will end at B, but the sunlight’s route will be a straight segment while
the alternative route will still be a broken line, unless it originally coincided
(everywhere!) with the sunlight’s route. ♦

Exercise 12. Since sin γ = AB/AC, cos γ = BC/AC, tan γ = AB/BC, and cot γ = BC/AB,
the identities in parts (a)-(b) can be directly verified. For part (c), the
Pythagorean Theorem for △ABC says that AB² + BC² = AC². Dividing
everything by AC² yields:
AB²/AC² + BC²/AC² = 1 ⇒ cos²α + sin²α = sin²γ + cos²γ = 1. ♦

Exercise 13. Check that sin 0◦ = cos 90◦ = 0, sin 30◦ = cos 60◦ = 1/2, sin 60◦ =
cos 30◦ = √3/2, sin 90◦ = cos 0◦ = 1. Furthermore, cot 0◦ is not defined,
cot 30◦ = √3, cot 60◦ = 1/√3, and cot 90◦ = 0. ♦
Exercise 15. Using the figure on page 180, let m
be the line through G(0, 1) tangent to circle k and
intersecting ray AC at H. Note that the measure of
∠CAG is (90◦ − α).
By Exercises 12 and 14, cot α = tan(90◦ − α) =
tan ∠CAG = GH, i.e., the cotangent function is mea-
sured along line m. □

Exercise 16. As the second side of ∠BAC rises from 0◦ to 90◦ , point C
also rises along the unit circle k and hence its y-coordinate sin α increases.
At the same time, B moves closer to the center A of the unit circle; i.e., its
x-coordinate cos α, decreases. As we saw in Exercise 15, cot α = GH, which
will decrease since H will move towards point G. ♦

Exercise 17. The graphs of sin x and cos x on [0, π], of tan x on (−π/2, π/2) ∪
(π/2, 3π/2), and of cot x on (0, π) ∪ (π, 2π) are sketched below. To justify the
dashed parts of the graphs, the functions need to be defined on the cor-
responding intervals; for these definitions, use the unit circle for sin x and
cos x, and use the tangent and cotangent lines l and m for tan x and cot x. ♦

Figure 6. Graphs of sin x, cos x, tan x and cot x

Theorem 3. (b) Again we have a cyclic quadrilateral OCBA (why?).
WLOG, let the diameter OC of k be 1. Then from right triangles △OAC
and △OBC we can write the RHS of (b) as:
sin α cos β − cos α sin β = AC · OB − OA · BC = OC · AB = AB.
From right △OBD, sin(α − β) = DB/DO. From the cyclic OCBA, ∠ABD = α
(why?), so △ABD ∼ △COD by the AA criterion. Thus, DB/DO = AB/CO = AB.
Everything matches: sin(α − β) = AB and formula (b) follows. □
Corollary 1. Using tan x = sin x/ cos x and Theorem 3, we calculate:
tan(α + β) = sin(α + β)/cos(α + β)
= (sin α cos β + cos α sin β)/(cos α cos β − sin α sin β)
= (sin α/cos α + sin β/cos β)/(1 − (sin α sin β)/(cos α cos β)),
where the last step was division of both numerator and denominator by
cos α cos β. This introduced the tangent function everywhere and we can
now arrive at the desired expression (tan α + tan β)/(1 − tan α tan β). □
Exercise 18. The four formulas are:
• cos(α − β) = cos α cos β + sin α sin β;
• tan(α − β) = (tan α − tan β)/(1 + tan α tan β);
• cot(α + β) = (cot α cot β − 1)/(cot α + cot β);
• cot(α − β) = (cot α cot β + 1)/(cot β − cot α).
For the cosine formula, if you assume that 0 < β < α < 90◦ , you can
mimic the proof of Theorem 3(c). Once you prove this formula, the remaining
formulas are just algebraic exercises with fractions: turn the tangents and
cotangents into fractions of sines and cosines and algebraically manipulate
the expressions on the RHS and LHS of the desired formulas to verify that
they are equal. ♦
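
While working through the algebra, a numerical spot check can catch sign slips in the four formulas of Exercise 18. The Python sketch below evaluates LHS − RHS for each formula at one sample pair of angles; the helper names cot and exercise18_residuals and the sample angles are our own, and a small residual is of course not a proof.

```python
import math


def cot(x: float) -> float:
    """Cotangent of x (radians); x must not be a multiple of pi."""
    return math.cos(x) / math.sin(x)


def exercise18_residuals(alpha: float, beta: float) -> tuple:
    """LHS - RHS for the four formulas of Exercise 18 (angles in radians)."""
    ta, tb = math.tan(alpha), math.tan(beta)
    ca, cb = cot(alpha), cot(beta)
    return (
        math.cos(alpha - beta)
        - (math.cos(alpha) * math.cos(beta) + math.sin(alpha) * math.sin(beta)),
        math.tan(alpha - beta) - (ta - tb) / (1 + ta * tb),
        cot(alpha + beta) - (ca * cb - 1) / (ca + cb),
        cot(alpha - beta) - (ca * cb + 1) / (cb - ca),
    )


# Sample angles with 0 < beta < alpha and alpha + beta < 90 degrees.
print(exercise18_residuals(math.radians(50), math.radians(20)))
```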
Session 8

Complex Numbers. Part II

Zvezdelina Stankova

Sneak Preview. The discussion of basic operations on complex numbers from

Part I will continue here with ratios, integer powers, and roots in C. Along the
way, we will stumble upon a stunning resemblance between powers in C and
mollusk shells and will become skilled with de Moivre’s Formula by applying its
“offspring,” the roots of unity, to geometric problems with regular polygons. In
particular, we shall discover the connection between C and the Triangle Inequal-
ity by showing that modulus lacks “respect” for addition, solve the introductory
nonagon Problem 1 from Part I, and expand toward a fundamental question in
statistics by minimizing sums of squared distances . . . . All with complex numbers.

1. Warning, “Teaser,” and Strategy

Although Sections 3–7 are a must for everyone, Sections 8–9 are rather
non-trivial: the applications of complex numbers to geometry will require
sophistication and determination from the reader to follow through the ar-
guments and absorb all the ideas.
A prime example of this is Problem 7, which
will come up in Section 10. Informally, if
A0 A1 . . . An−1 is a regular polygon, which line
l in the plane is the “closest” to its vertices?
More precisely, if we take the distances from each
vertex Ai to l (denoted by dashed segments in
Fig. 1), square them, and add them, which line
l will yield the minimal such sum?

Figure 1. “Closest” line?

After skimming through Sections 3–7, the
novice reader may decide to wait for Part IV (in
a future volume), devoted entirely to solving Olympiad-style problems via
complex numbers. Part III would then be an option for the intermediate
reader. The advanced reader, on the other hand, is encouraged to “stick with
it” and try Problem 7 on his/her own while we diligently move towards its
solution at the end of the present Part II.

2. Conventions from the Past

Recall from Part I that a complex number z is written in Cartesian form
as z = (x, y) = x + iy, and in polar form as z = (|z|, θ) = |z|(cos θ + i sin θ).
Here x = Re(z) and y = Im(z) are the real and the imaginary parts of z,
while |z| and θ are the modulus and argument of z. Note that both forms
can be written as ordered pairs (a, b); yet these pairs mean different things
depending on which form they represent. If not otherwise specified, in this
session ordered pairs of real numbers will stand for polar notation of complex
numbers.
Further, we will frequently use the polar form of multiplication in C, so it is
worth reviewing it: for any z, w ∈ C and any angles θ, μ ∈ R we have
(1) (|z|, θ) · (|w|, μ) = (|z||w|, θ + μ).
In other words, as shown in Part I, complex multiplication multiplies the
moduli and adds the arguments.
Finally, the polar form of conjugation is: if z = (|z|, θ), then z̄ = (|z|, −θ),
i.e., conjugation preserves the modulus and negates
the argument.
As in Part I, wherever possible throughout this session we will strive to
provide both algebraic and geometric arguments.
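
To see the polar rules for multiplication and conjugation in action, here is a minimal Python sketch using the standard cmath module. The sample moduli and arguments (2, π/6) and (3, π/4) are arbitrary choices of ours.

```python
import cmath
import math

# Two sample complex numbers given in polar form (modulus, argument).
z = cmath.rect(2.0, math.pi / 6)
w = cmath.rect(3.0, math.pi / 4)

# Multiplication: the moduli multiply and the arguments add.
product_modulus, product_argument = cmath.polar(z * w)
print(product_modulus, product_argument)

# Conjugation: the modulus is preserved and the argument is negated.
conj_modulus, conj_argument = cmath.polar(z.conjugate())
print(conj_modulus, conj_argument)
```

Note that cmath.polar reports arguments in the interval (−π, π], so for other sample values the printed argument may differ from the sum θ + μ by a multiple of 2π.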

3. Complex Division

We saw in Part I how to add, subtract, and multiply two complex num-
bers z and w. In R, we can also divide two numbers x and y, as long as
y ≠ 0. Can we do this in C too? In other words, can we rewrite the ratio
z/w as some complex number q such that qw = z?

3.1. Conjugation to the rescue! Let’s study first a special product:

Exercise 1. Prove that w w̄ = |w|² for any w ∈ C, i.e., multiplying a complex
number by its conjugate produces a real number: the square of its modulus.

Figure 2. Multiplying by the conjugate

One solution: There is no mystery about the origins of the equation
w w̄ = |w|². Indeed, if we view it in polar coordinates, the product w w̄
simply “kills” the angle θ and lands us on the real axis (cf. Fig. 2a):
w w̄ = (|w|, θ)(|w|, −θ) = (|w||w|, θ − θ) = (|w|², 0) = |w|² ∈ R. □

Exercise 1 leads us to one of the oldest “tricks” with complex numbers:

PST 62. If you want to get rid of i in the denominator of a fraction z/w,
multiply both top and bottom by the conjugate w̄ of the denominator:
(2) z/w = (z w̄)/(w w̄) = (z w̄)/|w|² = (1/|w|²) z w̄.
Note that the last expression in (2) is a well-defined complex number: the
product z w̄ is rescaled by the real number 1/|w|². Thus, we do know how
to divide two complex numbers: simply apply PST 62. Let’s try it.
Exercise 2. Find real numbers x and y such that x + yi equals the fraction
(a) 1/(5 − 4i); (b) (2 + 3i)/(5 − 4i); (c) […] if v = 2 + 3i; (d) (a + bi)/(c + di)
for a, b, c, d ∈ R.

Solution for part (a): The desired fraction is nothing but the recip-
rocal of w = 5 − 4i. Applying (2) with z = 1 we obtain a formula for w⁻¹:
(3) 1/w = w̄/|w|² = (5 + 4i)/(25 + 16) = 5/41 + (4/41)i. □
Whoever diligently completes part (d) will frown at the resulting com-
plicated formula for the ratio z/w: this happened because the question was
phrased in Cartesian coordinates. Can we interpret division in C via polar
coordinates in a more easily remembered and natural way? Prove the fol-
lowing corollary in two ways: algebraically and geometrically, and convince
yourselves that it makes sense for any non-zero denominator w.
Corollary 1. If z = (|z|, θ) and w = (|w|, μ), then
z/w = (|z|/|w|, θ − μ), and 1/w = (1/|w|, −μ).
In other words, division in C divides the moduli and subtracts the arguments
of the numerator and denominator.
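
The conjugate trick of PST 62 can be carried out mechanically. The Python sketch below performs division in Cartesian form with exact rational arithmetic, representing a complex number as a pair (real part, imaginary part); this representation and the function name divide are our own choices for illustration.

```python
from fractions import Fraction


def divide(z, w):
    """Return z / w, where z and w are pairs (real, imaginary) of rationals."""
    a, b = z
    c, d = w
    norm = Fraction(c * c + d * d)  # |w|^2, i.e. w times its conjugate
    if norm == 0:
        raise ZeroDivisionError("division by the complex number 0")
    # z * conj(w) = (a + bi)(c - di) = (ac + bd) + (bc - ad)i
    return (Fraction(a * c + b * d) / norm, Fraction(b * c - a * d) / norm)


# Part (a) of Exercise 2: the reciprocal of w = 5 - 4i.
print(divide((1, 0), (5, -4)))
```

The same function can be used to check your answers to the other parts of Exercise 2 once you have worked them out by hand.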
3.2. Division in C is respected. Just like in Part I, it is time to under-
stand how the operations of modulus and conjugation interact with complex
division. That they respect division should come as no surprise. To see this,
do the following exercise in two ways, using Cartesian or polar forms.
Exercise 3. For z, w ∈ C with w ≠ 0, prove that |z/w| = |z|/|w| and that
the conjugate of z/w equals z̄/w̄.

It is often advantageous to phrase mathematical statements without ref-
erence to a particular notation: this way, they can be better understood,
readily remembered, and more easily used in various situations. We have
done just that on a number of occasions. For example, word reformulation
of the equation w w̄ = |w|² can be applied to regular polygons placed on the
unit circle as the pentagon in Figure 2b: the product of pairs of conjugate
vertices always lands on the vertex corresponding to 1. For the pentagon,
this means A1 · A4 = A0 = 1 = A2 · A3 .
As another example of word reformulation, let’s rephrase Exercise 3.

Corollary 2. In C, the modulus of a ratio equals the ratio of the moduli,

and the conjugate of a ratio equals the ratio of the conjugates.
Note how the word order changes in the sentences. Good mathematicians
use well-phrased statements. Contrary to what the general public is likely
to imagine mathematics to be–they might see countless blackboards covered
with complicated formulas and “big” numbers–the simple truth is that the
more involved and advanced the math topic is, the more letters, words, and
language permeate it to express the complexity of concepts and abstractness
of ideas, and to allow for applications of these ideas to numerous other areas
of mathematics, sciences, and everyday life.
Therefore, learning to verbalize our math statements well in words is a
skill worth acquiring as early as possible. As we go along, we shall paraphrase
“in English” some important conclusions; but it is an ongoing task for the
reader to perform this constantly, as we did in Corollary 2.

4. The Triangle Inequality: No “Respect” for Addition?

4.1. Modulus and addition. We have seen that conjugation respects all
four standard operations on C: addition, subtraction, multiplication, and
division.1 We have also observed that the modulus preserves multiplication
and division. How about modulus and addition? Let’s experiment.
Exercise 4. For z = 3 + 4i and w given below, compare the modulus of
their sum with the sum of their moduli. Which is larger, |z + w| or |z| + |w|?
How about |z − w| and |z| − |w|? Are they ever equal?
(a) w = 5 − 12i; (b) w = 6 + 8i; (c) w = 1 + (4/3)i.
We have finally come to operations in C that are not respected: the
modulus does not respect addition or subtraction on C. Instead, we have:

Theorem 1. (Triangle Inequality) For any complex numbers z and w it

is true that |z + w| ≤ |z| + |w|. Equality is attained iff z and w are non-
negative multiples of each other: z = kw or w = kz for some real k ≥ 0.

Proof: Geometrically, the situation becomes familiar if we label points
P = z, Q = w, and T = z + w, as in Figure 3a. We realize that the sides
of △OPT are given by |OP| = |z|, |PT| = |OQ| = |w|, and |OT| = |z + w|.
Since the shortest route from O to T is the straight segment between them,
we see that |OT| ≤ |OP| + |PT|; i.e., |z + w| ≤ |z| + |w|.
Equality will be attained iff △OPT degenerates to a segment OT with
point P between O and T (cf. Fig. 3b); i.e., z and w are positive multiples
of each other or one of them is 0 (why?). □
1 In Abstract Algebra this identifies conjugation as an automorphism of C as a field.

Figure 3. △-Inequality, △-Equality, and Subtraction version

All right, but how does one prove this geometric version of the Triangle
Inequality? Check out Geometry II.

4.2. Modulus and subtraction. It is worth pointing out that there is a

subtraction version of the Triangle Inequality:
 Corollary 3. |z − w| ≥ |z| − |w| for any z, w ∈ C.
Partial Proof: Corollary 3 is equivalent to |z−w|+|w| ≥ |z| (cf. Fig. 3c),
which is true by the ordinary Triangle Inequality applied to (z − w) and w:
(4) |z − w| + |w| ≥ |(z − w) + w| = |z|.
For which z and w is equality obtained in Corollary 3? ♦
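
Both versions of the Triangle Inequality are easy to test on the numbers from Exercise 4. The Python sketch below computes the "gap" in each inequality; the function names triangle_gap and subtraction_gap are ours, and the values are floating-point approximations.

```python
def triangle_gap(z: complex, w: complex) -> float:
    """|z| + |w| - |z + w|; non-negative by the Triangle Inequality."""
    return abs(z) + abs(w) - abs(z + w)


def subtraction_gap(z: complex, w: complex) -> float:
    """|z - w| - (|z| - |w|); non-negative by Corollary 3."""
    return abs(z - w) - (abs(z) - abs(w))


z = 3 + 4j
for w in (5 - 12j, 6 + 8j, 1 + 4j / 3):
    print(w, triangle_gap(z, w), subtraction_gap(z, w))
```

A gap of (approximately) zero signals a case of equality, which is worth comparing with the equality condition in Theorem 1.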

Note that by reworking our subtraction inequality to look exactly like
the previously proved Triangle Inequality, we applied a reduction PST that
appeared in various forms in Volume I and in Group Theory I.

5. Integer Powers in C

We introduce and discuss integer powers in C via several problems.

Problem 1. Calculate (√3 + i)^2004 and (1 − i)^2004 .

Multiplying out the 2004 terms (√3 + i) seems ludicrous. Even if the
reader is familiar with the Binomial Theorem,2 it is still unclear how to
simplify the final answer. We need another method that quickly yields powers
of complex numbers. The difficulty here arises from the “wrong” viewpoint in
the formulation of Problem 1: it is again phrased in Cartesian coordinates,
while polar ones are a lot more insightful.

5.1. de Moivre’s formula saves the day!

Theorem 2. (de Moivre) In polar coordinates: if z = (r, θ) then z^n =
(r^n, nθ) for any n ∈ Z. In Cartesian coordinates: if z = |z|(cos θ + i sin θ)
then z^n = |z|^n (cos nθ + i sin nθ).
There isn’t much to prove here when n ≥ 0: de Moivre’s Theorem is
an n-repeated application of complex multiplication of the same number z.
When n < 0 a small calculation is necessary. Using Corollary 1 for n = −5:
z^n = z^(−5) = (z^5)^(−1) = (r^5, 5θ)^(−1) = (r^(−5), −5θ) = (r^n, nθ).

2 Check Combinatorics I in volume I.

In either case, to apply de Moivre’s Theorem, we need to know the angle θ.
PST 63. Let z = a + bi in Cartesian coordinates. To find the argument θ of z,
• factor out |z|, as in z = |z|(a/|z| + i b/|z|), and
• check which angle θ fits the bill: cos θ = a/|z| and sin θ = b/|z|.

Solution to Problem 1: Factor |z| = √(3 + 1) = 2 from z = √3 + i:
z = 2(√3/2 + (1/2)i) = 2(cos π/6 + i sin π/6).
de Moivre’s Theorem then yields:
z^2004 = 2^2004 (cos (2004/6)π + i sin (2004/6)π)
= 2^2004 (cos 334π + i sin 334π) = 2^2004 .
The reader should repeat this calculation for (1 − i)^2004 . ♦
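
Computations like this one can be double-checked without any rounding. The Python sketch below does two things: it reduces the argument of z^n modulo 2π using exact fractions of π, and it raises a Gaussian integer a + bi to a power with exact integer arithmetic. Both helper names, power_argument and gaussian_power, are our own; the second helper applies only to numbers with integer real and imaginary parts, such as 1 − i.

```python
from fractions import Fraction


def power_argument(arg_over_pi: Fraction, n: int) -> Fraction:
    """Argument of z**n as a multiple of pi, reduced to the interval [0, 2)."""
    return (n * arg_over_pi) % 2


def gaussian_power(a: int, b: int, n: int) -> tuple:
    """Exact (a + bi)**n for integers a, b and n >= 0, by repeated squaring."""
    result = (1, 0)
    base = (a, b)
    while n > 0:
        if n & 1:
            result = (result[0] * base[0] - result[1] * base[1],
                      result[0] * base[1] + result[1] * base[0])
        base = (base[0] * base[0] - base[1] * base[1], 2 * base[0] * base[1])
        n >>= 1
    return result


# z = sqrt(3) + i has argument pi/6, so z**2004 has argument 334*pi, reduced here:
print(power_argument(Fraction(1, 6), 2004))

# A small exact power of 1 - i, to compare with a hand computation:
print(gaussian_power(1, -1, 2))
```

Calling gaussian_power(1, -1, 2004) is a way to confirm your own answer for the second half of Problem 1.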

5.2. Hopscotch on mollusks. In Figure 4,
locate the dot corresponding to the initial first
power z^1 = √3 + i. The white dots are the
next several consecutive powers z^2 , z^3 , z^4 , etc.
Let us connect these white dots with a smooth
curve (in solid black), which we call the power
curve for z and denote by Pz . The resem-
blance of Pz to a mollusk shell is inescapable!

Figure 4. Pz for z = √3 + i

If we also draw the “mollusk-type” power curve Pw for w = 1 − i, which
white dots on Pw and Pz would be the first to coincide? In other words:

Problem 2. Find the smallest positive integers m and n satisfying the
equality (√3 + i)^m = (1 − i)^n .

Solution: Recycling the idea in the solution to Problem 1, we obtain:

(5) (√3 + i)^m = z^m = 2^m (cos (π/6)m + i sin (π/6)m).
Similarly, |w| = √2, w = √2(1/√2 − i/√2) = √2(cos(−π/4) + i sin(−π/4)),
(6) (1 − i)^n = w^n = (√2)^n (cos(−(π/4)n) + i sin(−(π/4)n)).
Before launching into complex calculations, consider the following “obvious”

PST 64. If you want z = w, write in polar form z = (|z|, θ) and w = (|w|, μ)
and equate the moduli and arguments: |z| = |w| and θ ≡ μ (mod 2π).
Recall from Number Theory I (vol. I) that “mod 2π” simply means that
θ and μ differ by a multiple of 2π. So, we apply PST 64 to z^m and w^n . For
starters, the moduli must be equal, i.e., 2^m = (√2)^n , and hence n = 2m.
Excellent: n must be even, which simplifies (6) to
(7) (1 − i)^n = w^(2m) = 2^m (cos(−(π/2)m) + i sin(−(π/2)m)).

Now we “equate” the arguments in (5) and (7): (π/6)m = −(π/2)m + 2kπ, from
which we conclude m = 3k, for some k ∈ Z. We remember at this point
that we were looking for the smallest positive m and n, i.e., m = 3 and
n = 2m = 6. The reader is encouraged (as always!) to check the answers by
plugging them into (5) and (6); you should get z^3 = w^6 = 8i (cf. Fig. 4). □
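
The search for m and n can also be automated along the lines of PST 64. The Python sketch below is one possible way to do it: it imposes n = 2m from the moduli, then compares arguments as exact fractions of π modulo 2. The function name smallest_matching_powers and the search bound limit are our own assumptions; a floating-point cross-check follows.

```python
from fractions import Fraction


def smallest_matching_powers(limit: int = 50):
    """Smallest positive (m, n) with (sqrt(3) + i)**m == (1 - i)**n, if m <= limit."""
    for m in range(1, limit + 1):
        n = 2 * m  # equal moduli: 2**m == (sqrt 2)**n forces n = 2m
        # Arguments in units of pi: m/6 for z**m and -n/4 for w**n.
        difference = Fraction(m, 6) - Fraction(-n, 4)
        if difference % 2 == 0:  # arguments agree modulo 2*pi
            return m, n
    return None


m, n = smallest_matching_powers()
print(m, n)

# Floating-point cross-check of the two powers.
z_power = complex(3 ** 0.5, 1) ** m
w_power = complex(1, -1) ** n
print(z_power, w_power)
```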

5.3. Landing on the axes. In the next problem, we seek out all z ∈ C
such that the fourth white dot on their power curve Pz lands on the real or
the imaginary axes.
Problem 3. If z = a + bi ∈ C, find out relations between a and b such that
(a) z^4 is real; (b) z^4 is purely imaginary.
Hint: The problem is again stated in the “wrong” coordinates. Instead,
write z = (r, θ) in polar form and use de Moivre’s formula: z^4 = (r^4 , 4θ).
Note that the modulus of z is irrelevant in our question, since landing
on a specific axis is determined entirely by the angle θ. For example, in
part (b), in order for z^4 to be purely imaginary, 4θ must “line up” with
the positive or the negative imaginary axis, which yields two possibilities:
4θ ≡ ±π/2 (mod 2π). These two possibilities are contained in the single
congruence relation 4θ ≡ π/2 (mod π) (why?). ♦

5.4. Extending a mollusk spiral. Once we know how to find positive

integer powers of complex numbers, we can fill in the white dots z^n (for
n ∈ N) on the power curve for any z ∈ C. Note that Pz will start at z and
either spiral away from the origin if |z| > 1, or toward the origin if |z| < 1,
or move along the unit circle if |z| = 1. (Draw a few cases for various z to
get a feeling for the three situations.) Yet, inspecting carefully a mollusk, it
seems that the spiral does both things: it starts at the “origin”, and it spirals
away forever (if the mollusk lives and grows forever.)

Exercise 5. In Figure 4 which powers of z = √3 + i will extend the spiral
Pz from the initial dot z toward the origin O?
Solution: The intended extension of Pz is depicted by a dotted curve in
Figure 4. The black dots on this curve are the negative integer powers of z:
z^(−1) , z^(−2) , z^(−3) , etc. Indeed, for any integer n > 0,
z^(−n) = (1/|z|^n , −nθ) = (1/2^n , −nθ), where θ = π/6.
Thus, the modulus 1/2^n becomes smaller and approaches 0 as n increases,
thereby pulling the complex numbers z^(−n) toward the origin. Further, the
angle −nθ rotates z clockwise, making a spiral revolving toward O. □
We can summarize informally this section as follows. For all non-zero
z ∈ C the integer powers z^n comprise the “skeleton” for the power curve Pz ,
to give the impression that Pz “starts” at the origin and spirals away forever,
or that it is the unit circle for |z| = 1.

6. Roots in C

Yet, there are plenty of empty spots on Pz between two consecutive

powers of z in Figure 4. With what can we fill these spots? Since the angles
associated to such complex numbers are non-integer parts of kθ, considering
the roots of z is reasonable. But can we take roots in C?

6.1. de Moivre again.

Corollary 4. (Root Formula) A non-zero complex z = |z|(cos θ + i sin θ)
has exactly n complex nth roots w0 , w1 , . . . , wn−1 , given by the formula
wk = ⁿ√|z| (cos((θ + 2πk)/n) + i sin((θ + 2πk)/n)) for k = 0, 1, . . . , n − 1.

“Proof” via example: At first glance, this is a rather forbidding formula,
so let’s take it apart. For concreteness, let n = 5. We want to find all
5th roots w of z: $w^5 = z$ (cf. Fig. 5a). In other words, if w = (r, μ) and
z = (|z|, θ), de Moivre’s formula yields
\[ w^5 = (r^5, 5\mu) = (|z|, \theta) = z. \]

Figure 5. 5th Roots of z = √3 + i and Regular pentagon in C

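For the moduli we must have $r^5 = |z|$, which accounts for the term
$r = \sqrt[5]{|z|}$ in the Root Formula. Further, equating arguments of $w^5$ and z, we
obtain 5μ ≡ θ (mod 2π), i.e., 5μ = θ + 2kπ for some integer k. Therefore
$\mu = \frac{\theta}{5} + \frac{2k}{5}\pi$. Although, in principle, we can plug in any integer k, only five
distinct sums μ will be formed up to a multiple of 2π (why?):
\[ \mu = \frac{\theta}{5},\quad \frac{\theta}{5} + \frac{2}{5}\pi,\quad \frac{\theta}{5} + \frac{4}{5}\pi,\quad \frac{\theta}{5} + \frac{6}{5}\pi,\quad \frac{\theta}{5} + \frac{8}{5}\pi \quad (\text{for } k = 0, 1, 2, 3, 4). \]
For instance, k = 2007 will land us on the third possibility:
\[ \frac{\theta}{5} + \frac{2 \cdot 2007}{5}\pi = \frac{\theta}{5} + \left(802 + \frac{4}{5}\right)\pi \equiv \frac{\theta}{5} + \frac{4}{5}\pi \pmod{2\pi}. \]
This explains why there are exactly five roots $\sqrt[5]{z}$, given by k = 0, 1, 2, 3, 4. □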
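The Root Formula can also be tried out numerically. The following is a minimal Python sketch, not part of the original session; the function name `nth_roots` is a hypothetical helper, and the standard `cmath` module supplies the polar form (|z|, θ).

```python
import cmath
import math


def nth_roots(z: complex, n: int) -> list[complex]:
    """Return the n complex nth roots of a non-zero z via the Root Formula."""
    if z == 0 or n < 1:
        raise ValueError("need a non-zero z and a positive integer n")
    modulus, theta = cmath.polar(z)      # z = (|z|, theta) in polar form
    r = modulus ** (1.0 / n)             # common modulus of all n roots
    return [cmath.rect(r, (theta + 2 * math.pi * k) / n) for k in range(n)]


z = complex(math.sqrt(3), 1)             # the running example z = sqrt(3) + i
roots = nth_roots(z, 5)
for k, w in enumerate(roots):
    # Each w_k raised to the 5th power should return z (up to rounding).
    print(f"w_{k} = {w:.6f},  w_{k}^5 = {w ** 5:.6f}")
```

Plugging k = 2007 into the same formula gives (up to rounding) the root w_2 again, in line with the computation above.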

Now you should repeat this whole reasoning for a general n ∈ N, in the
place of 5. To really understand the Root Formula, do the following:
Exercise 6. Let z = (4, 2π/3).
(a) Use polar coordinates to show that z has exactly two square roots $\sqrt{z}$.
First reason geometrically, and then use the Root Formula. Draw a picture.
(b) Repeat the exercise for the cube roots $\sqrt[3]{z}$, showing z has exactly three
cube roots.

6.2. The provocation: choosing your favorite root of z. Taking integer
powers of z is a “one-way” street in the sense that there is only one
answer for $z^n$. Thus, for instance, filling in the white dots on the power
curve Pz for z = √3 + i was straightforward: plot the unique point $z^n$.

However, if we want to fill in Pz with a dot corresponding to a root $\sqrt[5]{z}$,
we have to make a choice among the five possible such roots $w_0, w_1, \ldots, w_4$.
Note that all $w_k$’s have the same modulus $\sqrt[5]{|z|}$, i.e., they lie on the same
circle centered at O (drawn dashed in Fig. 5a). Hence only one of the $w_k$’s
should land on the spiral Pz. Figure 5a seems to indicate that for z = √3 + i,
it is
\[ w_0 = \sqrt[5]{|z|}\left(\cos\frac{\theta}{5} + i\sin\frac{\theta}{5}\right) \]
that lands on Pz. But is this true for any complex z? Besides, the angle
θ in the polar form for z was arbitrarily chosen up to a multiple of 2π. If
we change θ to θ + 2π in the expression for $w_0$, we will end up with the
formula for
\[ w_1 = \sqrt[5]{|z|}\left(\cos\frac{\theta + 2\pi}{5} + i\sin\frac{\theta + 2\pi}{5}\right). \]
So, should $w_1$ also lie on Pz? What is going on? We can clearly see that
only one of the roots $w_i$ can land on Pz . . . .
The answer is hiding where we are not looking for it: we haven’t really
defined the power curve Pz , other than saying it is a “smooth curve passing
through all integer powers of z”. But maybe there are several such smooth
curves, one of which passes through w0 , and another through w1 ?! We shall
resolve this question in Part III and extend the discussion to any powers $z^v$,
whether v is real or complex.
For the time being, check your understanding by solving the following:
Exercise 7. Consider the equations $w^6 = z$ for z = 1, −64, i and 64i.
(a) Find all complex solutions w of these four equations, and draw pictures.
(b) In each case, can you visually select “the one” solution w which lies on
the corresponding power curve Pz? How can you be sure that your choice
is correct?

7. Roots of Unity and Regular Polygons

7.1. A definition is in order. The frequently encountered equation $z^n = 1$
(or equivalently, $z^n - 1 = 0$) is so important that its n distinct C-roots have
been named the nth roots of unity. By the Root Formula, they are given by:
\[ \omega_k = \cos\frac{2\pi k}{n} + i\sin\frac{2\pi k}{n} \quad \text{for } k = 0, 1, \ldots, n-1. \]
In other words, all $(\omega_k)^n = 1$. We denote the root $\omega_1 = \cos\frac{2\pi}{n} + i\sin\frac{2\pi}{n}$ by
ω and refer to it as a primitive nth root of unity.³ The powers of ω yield the
other roots of unity: $\omega^k = \omega_k$ for all k, and hence the name primitive root.
We conclude that the original polynomial $z^n - 1$ factors as:
\[ z^n - 1 = (z - \omega_0)(z - \omega_1) \cdots (z - \omega_{n-1}) = (z - 1)(z - \omega)(z - \omega^2) \cdots (z - \omega^{n-1}). \tag{8} \]
Exercise 8. Verify that $\bar{\omega} = \omega^{-1} = \omega_{n-1}$. Conclude that the other roots of
unity also pair up under conjugation: $\overline{\omega_k} = \omega_{n-k}$ for k = 0, 1, . . . , n − 1
(where $\omega_n = \omega_0 = 1$).

7.2. Choosing the best coordinate system. The roots of unity are not
only algebraic objects – roots of a polynomial – they are also geometric ob-
jects; the vertices of a regular n-gon inscribed in the unit circle (cf. Fig. 5b).
Can we use this to our advantage in geometry problems? Let us start with
a relatively straightforward situation.
Exercise 9. Let A0 A1 A2 A3 A4 be a regular pentagon. Find a C–coordinate
system in which the five vertices Ak are easily encoded as complex numbers.

Solution: “Obviously,” we should place all vertices of the pentagon on

the unit circle. This forces us to choose the origin O as the center of the
pentagon (cf. Fig. 5b). For ease of calculations, we can further place the
number “1” to coincide with vertex A0 , and we can even choose the positive
imaginary axis in such a way that the vertices A0 , A1 , . . . , A4 are arranged
counterclockwise along the unit circle.
Since the five central angles ∠A0 OA1 , ∠A1 OA2 , . . . , ∠A4 OA0 are all
equal to 2π/5, one can get from any vertex to the next via multiplication by
the primitive 5th root ω: $A_0 = 1$, $A_1 = \omega$, $A_2 = \omega^2$, $A_3 = \omega^3$, and $A_4 = \omega^4$.
In other words, the five vertices correspond to the five 5th roots of unity. □
The reader should now be able to generalize this situation to any regular
n-gon and draw the following problem-solving conclusion:
 PST 65. In a specific geometry problem, pick the most convenient point
and unit of length for the origin and for the radius of the unit circle; in plain
language, you can place the points 0 and 1 wherever in the plane you (wisely)
wish, thereby fixing the real axis. If need be, you can further also pick one
of two possibilities for the positive direction of the imaginary axis.
Note the subtle difference between the Greek letter ω (omega), which we reserve for
the roots of unity, and the Latin letter w, which stands for an arbitrary complex number.

7.3. What if a C–system is fixed? For example, if the C–coordinates are

already chosen to fit well other objects in the plane, how do we determine if
a given n-gon is regular?
Exercise 10. Prove that A0 A1 ...An−1 is a regular n-gon iff for some v, z ∈ C,
v ≠ 0, and all k = 0, 1, . . . , n − 1, we have either
(a) $A_k = v\omega^k + z$ (polygon oriented counterclockwise); or
(b) $A_k = v\omega^{-k} + z$ (polygon oriented clockwise).

Figure 6. Regular polygon, oriented clockwise

Sketch: Figure 6 demonstrates how to transform the fifth roots of unity
$\omega_k$ to the vertices $A_k$ of an arbitrary regular pentagon, oriented clockwise:
• Reflect the (rightmost) polygon in Figure 6d across the real axis to
reverse the order of its vertices, as in the polygon in Figure 6c.
• Rotate the latter about the origin to arrive at the dashed polygon in
Figure 6b.
• Rescale the latter to land on polygon B0 B1 . . . B4 , also in Figure 6b.
• Translate the latter to obtain polygon A0 A1 . . . A4 in Figure 6a.
We can schematically arrange the transformations as follows:
\[ \text{poly 5} \xleftarrow{\;+z\;} \text{poly 4} \xleftarrow{\;*r\;} \text{poly 3} \xleftarrow{\;*u\;} \text{poly 2} \xleftarrow{\;\bar{w}\;} \text{poly 1}, \]
where u is unit (|u| = 1), r is real, and z is complex. Note that multiplying by
the complex v = ru represents directly the move poly 4 ←− poly 2. Putting
everything together, the total transformation sends $\omega^k \mapsto v\bar{\omega}^k + z = v\omega^{-k} + z = A_k$. ♦
Often, you will see the equations in Exercise 10 paraphrased as follows.
Exercise 11. Show that A0 A1 . . . An−1 is a regular n-gon oriented
counterclockwise iff
\[ \frac{A_{k+2} - A_{k+1}}{A_{k+1} - A_k} = \omega \quad \text{for } k = 0, 1, \ldots, n-3. \]

In particular, show that A0 A1 A2 is equilateral iff ν0 A0 + ν1 A1 + ν2 A2 = 0

where ν0 , ν1 , ν2 are the third roots of unity (in some order).
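The two descriptions of a regular polygon in Exercises 10 and 11 can be compared numerically. The following Python sketch is only an illustration under stated assumptions: `regular_polygon` and `is_regular_ccw` are hypothetical helper names, the vertices are assumed to be distinct (so no division by zero occurs), and a small tolerance absorbs rounding errors.

```python
import cmath
import math


def regular_polygon(n: int, v: complex, z: complex, clockwise: bool = False) -> list[complex]:
    """Vertices A_k = v * omega^(k) + z (or omega^(-k) if clockwise), with v non-zero."""
    omega = cmath.exp(2j * math.pi / n)
    sign = -1 if clockwise else 1
    return [v * omega ** (sign * k) + z for k in range(n)]


def is_regular_ccw(vertices: list[complex], tol: float = 1e-9) -> bool:
    """Ratio test: (A_{k+2} - A_{k+1}) / (A_{k+1} - A_k) = omega for k = 0..n-3."""
    n = len(vertices)
    omega = cmath.exp(2j * math.pi / n)
    return all(
        abs((vertices[k + 2] - vertices[k + 1]) / (vertices[k + 1] - vertices[k]) - omega) < tol
        for k in range(n - 2)
    )


pentagon = regular_polygon(5, v=2 - 1j, z=3 + 4j)
print(is_regular_ccw(pentagon))                                   # counterclockwise pentagon
print(is_regular_ccw(regular_polygon(5, 2 - 1j, 3 + 4j, True)))   # clockwise pentagon
```

The clockwise pentagon fails the test because its consecutive side vectors differ by the factor ω⁻¹ rather than ω.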

8. Geometric Promise Fulfilled

8.1. Products of distances. We are finally ready to take on the introductory
nonagon Problem 1 from Part I and solve it with complex numbers.
Here follows a paraphrase that utilizes some of the new terminology and
techniques we learned in this session so far.
Problem 4. The vertices of a regular 9-gon inscribed in the unit circle are
$A_k = \omega^k$ for k = 0, 1, . . . , 8, where $A_0 = 1$ and $A_1 = \omega = (1, 2\pi/9)$, a
primitive 9th root of unity. Prove that the product of segment lengths from
$A_0$ to the other eight vertices is 9, i.e., that $\prod_{k=1}^{8} |A_0 A_k| = 9$ (cf. Fig. 7a).

Figure 7. Problem 4 and Generalizations

Solution: Since length is represented by modulus in C, i.e., $|A_0 A_k| = |1 - \omega^k|$
for all k, the desired equality becomes
\[ |1 - \omega||1 - \omega^2||1 - \omega^3| \cdots |1 - \omega^8| = 9 \tag{9} \]
\[ \Leftrightarrow\ |(1 - \omega)(1 - \omega^2)(1 - \omega^3) \cdots (1 - \omega^8)| = 9. \tag{10} \]
Equation (10) was obtained using the fact that modulus respects multiplication.
But this looks suspiciously like the factorization of $z^9 - 1$ in (8):
\[ z^9 - 1 = (z - 1)(z - \omega)(z - \omega^2) \cdots (z - \omega^8). \tag{11} \]
If we plug z = 1 into (11), we will get 0 = 0, which isn’t helpful. The reason
for the zeros is the factor of (z − 1) on both sides, so we divide by it:
\[ \frac{z^9 - 1}{z - 1} = (z - \omega)(z - \omega^2) \cdots (z - \omega^8). \tag{12} \]
Plugging z = 1 in the RHS is OK now, but we must get rid of (z − 1) in the
denominator on the LHS. Let’s recall the following useful factorization:

Lemma 1. $z^n - 1 = (z - 1)(z^{n-1} + z^{n-2} + \cdots + z + 1)$ for z ∈ C and n ∈ N.

“Proof” of Lemma 1 via Example: To make things crystal clear for
the novice, let n = 5. Brute force works great here – use the distributive
property to expand the RHS and cancel just about everything in sight:
\[ (z - 1)(z^4 + z^3 + z^2 + z + 1) = (z^5 + z^4 + z^3 + z^2 + z) - (z^4 + z^3 + z^2 + z + 1) = z^5 - 1. \ ♦ \]

Applying Lemma 1 for n = 9, the LHS of (12) becomes
\[ \frac{z^9 - 1}{z - 1} = \frac{(z - 1)(z^8 + z^7 + \cdots + z + 1)}{z - 1} = z^8 + z^7 + \cdots + z + 1 \]
\[ \Rightarrow\ z^8 + z^7 + \cdots + z + 1 = (z - \omega)(z - \omega^2) \cdots (z - \omega^8). \tag{13} \]
We are finally free to plug z = 1 into (13): $9 = (1 - \omega)(1 - \omega^2) \cdots (1 - \omega^8)$,
and noting that |9| = 9 we get the desired equality of distances in (10). □
The generalization of this solution to any regular n-gon is straightforward
and it can be summarized algebraically by
Corollary 5. If ω is a primitive nth root of unity, then
\[ z^{n-1} + z^{n-2} + \cdots + z + 1 = (z - \omega)(z - \omega^2) \cdots (z - \omega^{n-1}) \]
as polynomials in z. In particular, $(1 - \omega)(1 - \omega^2) \cdots (1 - \omega^{n-1}) = n$.
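A quick numerical check of the last identity may be reassuring. This is a minimal Python sketch (the helper name `product_one_minus_powers` is hypothetical), taking ω = cos(2π/n) + i sin(2π/n) as in the text; floating-point rounding means the product equals n only approximately.

```python
import cmath
import math


def product_one_minus_powers(n: int) -> complex:
    """Compute (1 - omega)(1 - omega^2)...(1 - omega^(n-1)) for omega = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * math.pi / n)
    product = 1 + 0j
    for k in range(1, n):
        product *= 1 - omega ** k
    return product


for n in (3, 5, 9, 12):
    # The real part should be close to n and the imaginary part close to 0.
    print(n, product_one_minus_powers(n))
```

For n = 9 the modulus of this product is the product of the eight segment lengths in Problem 4.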

8.2. Getting extra mileage. The reader should be at least a bit curious
about the need for factoring out and cancelling (z − 1) on both sides of (11):
this was caused exclusively by our determination to plug in z = 1. What if
we plug any other complex number z into (11): as long as $z \neq \omega^k$ (for any
integer k), we will get a non-trivial equality. The question is: which of these
equalities will correspond to an elegant geometric formula?
Problem 5. Let A0 A1 . . . An−1 be a regular polygon inscribed in a circle of
radius r and center O, and let P be a point on ray OA0 beyond A0 . Prove that
the product of distances from P to the vertices of the polygon is $|OP|^n - r^n$.
Proof: A nonagon version of this problem is presented in Figure 7b. Again,
we fix the origin O at the center of the polygon, and let A0 lie on the positive
real axis. Because of the given radius r, we slightly adjust by making $A_0 = r$,
and hence $A_k = r\omega^k$ for k = 0, 1, . . . , n − 1. Since P also lies on the positive
real axis, it is advantageous to write P in a similar way: p = rq for some
real q > 0. The desired product is calculated by:
\[ \prod_{k=0}^{n-1} |PA_k| = \prod_{k=0}^{n-1} |rq - r\omega^k| = r^n \prod_{k=0}^{n-1} |q - \omega^k| = r^n |(q - 1)(q - \omega)(q - \omega^2) \cdots (q - \omega^{n-1})| = r^n |q^n - 1|. \]
The last equality was obtained from (8) for z = q. We can put r back inside
the modulus: $r^n |q^n - 1| = |(rq)^n - r^n| = |p^n - r^n|$. Now, if P were an arbitrary
point, we would stop here since there would be nothing to simplify. But P
lies on the ray OA0 , outside of the circle. Hence, p ∈ R and p > r. Thus,
$p^n - r^n$ is also a positive real number, which therefore equals its modulus:
\[ \prod_{k=0}^{n-1} |PA_k| = |p^n - r^n| = p^n - r^n = |OP|^n - r^n. \ □ \]
Along the way we established that for an arbitrary point P , the corresponding
product (illustrated in Fig. 7c) is given by $\prod_{k=0}^{n-1} |PA_k| = |p^n - r^n|$.
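Both versions of the product formula can be tested numerically. The Python sketch below is an illustration only: the function name `distance_product` and the sample values of n, r, and P are arbitrary choices, not taken from the text. It multiplies the n distances directly and compares the result with |pⁿ − rⁿ|.

```python
import cmath
import math


def distance_product(p: complex, n: int, r: float) -> float:
    """Product of the distances from p to the vertices r*omega^k of a regular n-gon."""
    omega = cmath.exp(2j * math.pi / n)
    product = 1.0
    for k in range(n):
        product *= abs(p - r * omega ** k)
    return product


n, r = 9, 1.5
p_on_ray = 2.5                      # a point on ray OA_0 beyond A_0 (real, p > r)
p_anywhere = complex(-0.7, 1.9)     # an arbitrary point of the plane

# On the ray, the product should match p^n - r^n.
print(distance_product(p_on_ray, n, r), p_on_ray ** n - r ** n)
# Anywhere else, it should match the modulus |p^n - r^n|.
print(distance_product(p_anywhere, n, r), abs(p_anywhere ** n - r ** n))
```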

9. Venturing Everywhere in the Plane

9.1. Sums versus products. Now, why should the product of the above
segment lengths be any more interesting than, say, their sum? If you try
to calculate |P A0 | + |P A1 | + · · · + |P An−1 |, you will find out that, due to
convoluted square roots, this sum is harder to control and simplify than the
product. For some people the more obvious and more important question
would be to investigate the sum of squares, $|PA_0|^2 + |PA_1|^2 + \cdots + |PA_{n-1}|^2$;
for instance, such people
• may have studied a bit of statistics and are therefore always tempted
to minimize sums of squares of distances;
• are geometry fans of Pythagorean-like problems and would like to gen-
eralize the Pythagorean Theorem;
• have understood Part I of complex numbers well enough to realize that
the modulus |z| is much harder to manipulate since it involves a square
root, while the square $|z|^2 = z\bar{z} = a^2 + b^2$ is susceptible to more than
one method of slick calculation.

9.2. Restricting point P to the circumcircle allows us to calculate the

sum of the squares for this special placement of P (cf. Fig. 8a).
Problem 6. If A0 A1 . . . An−1 is a regular n-gon and P lies on its circumscribed
circle, prove that $|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2$ is constant.
Proof:⁴ The C–coordinatization from Problem 5 also works well here.
We set $A_k = r\omega^k$ for k = 0, 1, . . . , n (so that $A_n = A_0$), where r is the circumradius. Then
P = p = rq with |q| = 1 (why?), and $|PA_k| = |rq - r\omega^k| = r|q - \omega^k|$. Then
\[ \sum_{k=0}^{n-1} |PA_k|^2 = \sum_{k=0}^{n-1} r^2 |q - \omega^k|^2 = \sum_{k=0}^{n-1} r^2 (q - \omega^k)\overline{(q - \omega^k)} = \sum_{k=0}^{n-1} r^2 (q - \omega^k)(\bar{q} - \bar{\omega}^k) \]
\[ = r^2 \sum_{k=0}^{n-1} \left(q\bar{q} + (\omega\bar{\omega})^k - q\bar{\omega}^k - \bar{q}\omega^k\right) = r^2 \sum_{k=0}^{n-1} 2 - r^2 q \sum_{k=0}^{n-1} \bar{\omega}^k - r^2 \bar{q} \sum_{k=0}^{n-1} \omega^k \]
\[ = 2nr^2 - r^2 q \sum_{k=0}^{n-1} \bar{\omega}^k - r^2 \bar{q} \sum_{k=0}^{n-1} \omega^k. \tag{14} \]
Along the way, we used that $q\bar{q} = 1 = \omega\bar{\omega}$, since both q and ω lie on the unit
circle. A little bit of “cheating” is in order now. Recall that we are supposed
to show that the above sum $\sum |PA_k|^2$ is independent of the position of P on
the circumscribed circle; i.e., n and r are OK, but the variable q should be
eliminated from the last expression. To this end, it would be very convenient
if $r^2 q$ and $r^2 \bar{q}$ are each multiplied by 0 in (14).
This leads us naturally to the next lemma.
Lemma 2. The sum of all nth roots of unity is 0: $\sum_{k=0}^{n-1} \omega^k = 0$. Geometrically,
the center of a regular n-gon inscribed in the unit circle is the origin.
Algebraic manipulations of complex numbers are required for this solution.

Figure 8. Sums of squares

We present four different proofs of this fact, each proof using a different
PST. Even though four proofs are an “over-overkill” for the task at hand,
one never knows which idea will end up being useful in a later situation.
Proof 1 (Equating): By (8), $z^n - 1 = (z - \omega_0)(z - \omega_1) \cdots (z - \omega_{n-1})$.
This is not just one equation, but several. Indeed, if we multiply out the
RHS and regroup around the powers of z we will obtain
\[ z^n - 1 = z^n - (\omega_0 + \omega_1 + \cdots + \omega_{n-1}) z^{n-1} + \cdots + (-1)^n (\omega_0 \omega_1 \cdots \omega_{n-1}). \]
We can equate the coefficients⁵ on both sides for any power of z. But there
is no power $z^{n-1}$ on the LHS! Therefore, equating its coefficients on both
sides yields $0 = \omega_0 + \omega_1 + \cdots + \omega_{n-1}$. □
Proof 2 (Series): Why not use Lemma 1? Substituting the nth primitive
root of unity ω for z, we arrive at
\[ 1 + \omega + \omega^2 + \cdots + \omega^{n-1} = \frac{\omega^n - 1}{\omega - 1} = \frac{1 - 1}{\omega - 1} = 0. \ □ \]
Along the way, we realized that Lemma 1 is equivalent to the well-known
and useful formula for a geometric series, which made its appearance in the
Stomp session in volume I:
\[ a + az + az^2 + \cdots + az^{n-1} = a\,\frac{z^n - 1}{z - 1} \quad \text{for any } a, z \in \mathbb{C},\ z \neq 1. \tag{15} \]
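Proof 3 (Invariants): If $S = \omega_0 + \omega_1 + \cdots + \omega_{n-1}$, multiplying each
vertex $\omega_k$ by the primitive root of unity ω simply rotates $\omega_k$ to the next vertex
$\omega_{k+1}$ (where $\omega_n = \omega_0 = 1$). Overall, the set of vertices remains the same:
$\{\omega\omega_0, \omega\omega_1, \ldots, \omega\omega_{n-1}\} = \{\omega_1, \omega_2, \ldots, \omega_{n-1}, \omega_0\} = \{\omega_0, \omega_1, \ldots, \omega_{n-1}\}$, and
thus the sums over these two sets are equal:
\[ S = \omega_0 + \omega_1 + \cdots + \omega_{n-1} = \omega\omega_0 + \omega\omega_1 + \cdots + \omega\omega_{n-1} = \omega S \]
\[ \Rightarrow\ S - \omega S = 0 \ \Rightarrow\ S(1 - \omega) = 0 \ \Rightarrow\ S = 0 \quad (\text{since } \omega \neq 1). \ □ \]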
⁵ Equating these coefficients would yield n relations between the roots and the coefficients
of the given polynomial, which are a special case of Viète’s formulas. For instance,
equating the free terms yields $-1 = (-1)^n (\omega_0 \omega_1 \cdots \omega_{n-1})$, i.e., $\omega_0 \omega_1 \cdots \omega_{n-1} = (-1)^{n-1}$.

Proof 4 (Centroid): There’s got to be a geometric proof! Recall that
adding complex numbers is essentially the same as adding vectors emanating
from the origin. For those of you who know a bit about vectors: in the case
of our regular polygon, adding all $\overrightarrow{OA_k}$ results in $\overrightarrow{OO} = \vec{0}$ because the center
of mass (the centroid) coincides with the center O of the polygon. ♦
Completion of Problem 6: We can now conclude that the sum of the
nth roots of unity is 0, i.e., $\sum_{k=0}^{n-1} \omega^k = 0$. Conjugating, $\sum_{k=0}^{n-1} \bar{\omega}^k = 0$ as well.
The sum in (14) is then equal to
\[ \sum_{k=0}^{n-1} |PA_k|^2 = 2nr^2 - r^2 q \cdot 0 - r^2 \bar{q} \cdot 0 = 2nr^2, \]
which certainly does not depend on the specific P and is thus constant. □
9.3. Letting P wander off in the plane. Naturally, we should question
the necessity of placing P on the circumcircle of the n-gon, and we should
attempt to generalize Problem 6 to any point P in the plane.
Exercise 12. Given a regular n-gon A0 A1 . . . An−1 , calculate the sum
$|PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2$
and determine for which P it is minimal.
Sketch: The proof of Problem 6 goes through here with only one small
change. We can write again P = p = rq where r is the circumradius, but
|q| is no longer required to be 1. We adjust the calculation accordingly:
$q\bar{q} = |q|^2$ and $\omega\bar{\omega} = 1$, so that
\[ \sum_{k=0}^{n-1} |PA_k|^2 = \sum_{k=0}^{n-1} r^2 \left(q\bar{q} + (\omega\bar{\omega})^k - q\bar{\omega}^k - \bar{q}\omega^k\right) \]
\[ = r^2 |q|^2 n + r^2 n - r^2 q \cdot 0 - r^2 \bar{q} \cdot 0 = n(|rq|^2 + r^2) = n(|p|^2 + r^2). \]
We conclude that for any circle K centered at O of radius |p|, the given sum
depends only on |p| and is therefore constant along K. Figure 8b displays
four examples of such circles K, along each of which the sum of squares
remains constant. As the circle K shrinks, the sum also decreases, and its
minimal value of $nr^2$ is obtained when P coincides with O:
\[ |PA_1|^2 + |PA_2|^2 + \cdots + |PA_n|^2 \geq nr^2, \quad \text{with equality iff } P = O. \ □ \]
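The sum-of-squares formula n(|p|² + r²) lends itself to a numerical check. The following Python sketch is illustrative only; the function name `sum_squared_distances` and the sample values are hypothetical choices. It sums the squared distances directly and compares with the closed form at a point on the circumcircle, at an arbitrary point, and at the center.

```python
import cmath
import math


def sum_squared_distances(p: complex, n: int, r: float) -> float:
    """Sum of |P A_k|^2 over the vertices A_k = r*omega^k of a regular n-gon."""
    omega = cmath.exp(2j * math.pi / n)
    return sum(abs(p - r * omega ** k) ** 2 for k in range(n))


n, r = 7, 2.0
on_circle = cmath.rect(r, 1.234)     # a point on the circumcircle: |p| = r
anywhere = complex(3.0, -0.5)        # an arbitrary point of the plane

print(sum_squared_distances(on_circle, n, r), 2 * n * r ** 2)
print(sum_squared_distances(anywhere, n, r), n * (abs(anywhere) ** 2 + r ** 2))
print(sum_squared_distances(0j, n, r), n * r ** 2)   # the minimum, attained at P = O
```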

9.4. Summary of PSTs. It is important to record the various ideas which
we used in the proof of Lemma 2, since these ideas are ubiquitous.

PST 66. (Partial Viète’s Formulas) Given an equality of polynomials
f (z) = g(z), equating the coefficients on both sides for each power $z^k$ yields
a relation. In particular, for a polynomial f (z) of degree n with leading
coefficient 1 whose roots are $z_1, z_2, \ldots, z_n$ (counted with multiplicities), i.e.,
\[ f(z) = z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0 = (z - z_1)(z - z_2) \cdots (z - z_n), \]
the sum of the roots is minus the coefficient of $z^{n-1}$: $z_1 + z_2 + \cdots + z_n = -a_{n-1}$,
and the product is ± the constant term: $z_1 z_2 \cdots z_n = (-1)^n a_0$.

PST 67. (Geometric Series) When calculating a sum S, try to identify it
with some well-known type of sum. In particular, if each term is the previous
term times the same number z ≠ 1, we can use formula (15) for the so-called
geometric series with ratio z and initial term a.

PST 68. (Invariant under Multiplication) Given a set of several numbers
$S = \{z_1, z_2, \ldots, z_n\}$, suppose that for some complex number c ≠ 1
multiplication by c rearranges the elements of S, i.e., as sets
\[ S = \{z_1, z_2, \ldots, z_n\} = \{cz_1, cz_2, \ldots, cz_n\}. \]
Then the sum $z_1 + z_2 + \cdots + z_n$ is 0. The product $z_1 z_2 \cdots z_n$ is also 0 if in
addition c is not an nth root of unity (why?).

10. Which are the “Closest” Lines

In Section 9 we explored a problem involving distances from a point P

to the vertices of a regular polygon. Now let’s replace the point P by a line l
and ask the same question. We arrive at the problem posed in the beginning
of the session, which we reformulate below:
Problem 7. If A0 A1 . . . An−1 is a regular n-gon and l is a line, let $d_k$ be
the distance from vertex $A_k$ to l for k = 0, 1, . . . , n − 1. Consider the sum of
squares of all such distances: $S = d_0^2 + d_1^2 + \cdots + d_{n-1}^2$. For which line(s) l
is S minimal and what is this minimal value?
Figure 1 depicted a regular pentagon and a line l and showed the five
distances $d_k$ via the perpendiculars from the vertices $A_k$ to l. We could
proceed as before and set up our C-system so that the $A_k$’s line up with the
nth roots of unity. The line l will have some equation in this C-system, and
we will have to find a formula for the distances from each Ak to l. Even
though this is possible and not hard (the reader should try it), let us go
along a different route which will introduce another problem-solving idea.

10.1. Fixing, unfixing, and adjusting. Contrary to how the problem

is phrased, let us first fix the line l as the most “convenient” line in the
plane, and then adjust the rest of the C-system to fit l. Given a number
z = x + yi ∈ C, two lines seem the most “convenient” for finding the distance
dz from z to l:
• If l is the real axis, then dz = y = Im(z).
• If l is the imaginary axis, then dz = x = Re(z).
Verify the following algebraic formulas which express x and y in terms of z.

Lemma 3. For any z ∈ C, $\mathrm{Re}(z) = x = \dfrac{z + \bar{z}}{2}$ and $\mathrm{Im}(z) = y = \dfrac{z - \bar{z}}{2i}$.

Lemma 3 suggests that it will be slightly easier if we fix l to be the

imaginary axis, and then move the polygon to fit this situation. Our final

Figure 9d depicts just that: the imaginary axis is rotated and shifted to
coincide with line l. Meanwhile, the pentagon is also rotated and shifted
without changing its relative position to l. The vertices of the pentagon may
no longer be the 5th roots of unity.
Figure 9. Distances from a line to a pentagon

Instead of jumping directly from Figure 1 to Figure 9d and dealing with

all of the complicated calculations at the same time, we shall solve the prob-
lem in stages and cope with each computational obstacle as it arises.

10.2. If l happens to be the imaginary axis, then the polygon (as
depicted in Fig. 9a) does have its vertices as roots of unity: $A_k = \omega_k$ so that
$\omega_k \bar{\omega}_k = 1$ for all k. The distances are easily computed by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}(\omega_k + \bar{\omega}_k)^2 = \tfrac{1}{4}\left(\omega_k^2 + 2\omega_k\bar{\omega}_k + \bar{\omega}_k^2\right) = \tfrac{1}{2} + \tfrac{1}{4}\left(\omega_k^2 + \bar{\omega}_k^2\right) \]
\[ \Rightarrow\ S = \sum_{k=0}^{n-1} d_k^2 = \tfrac{1}{2}n + \tfrac{1}{4}\sum_{k=0}^{n-1}\left(\omega_k^2 + \bar{\omega}_k^2\right), \tag{16} \]
where we have added all expressions $d_k^2$ for k = 0, 1, . . . , n − 1. Recall that
the sum of all nth roots of unity is 0. We need the sum of their squares:

Lemma 4. $\sum_{k=0}^{n-1} \omega_k^2 = 0$ for n ≥ 3 and, more generally, $\sum_{k=0}^{n-1} \omega_k^m = 0$ for any integer m
not divisible by n.

Hint: The necessity for the geometric series approach should be evident
here: write out the sums and decide what your ratio and initial term are. ♦
Plugging the result of Lemma 4 into (16) yields the desired sum: $S = \tfrac{1}{2}n$.
(Why did the sum of conjugates $\bar{\omega}_k^2$ in (16) disappear too?) □
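The power sums of the roots of unity are easy to explore numerically. In the Python sketch below (the helper name `power_sum` is hypothetical), the sum of the mth powers vanishes, up to rounding, whenever n does not divide m; when n divides m every term equals 1 and the sum is n, so that case has to be excluded.

```python
import cmath
import math


def power_sum(n: int, m: int) -> complex:
    """Sum of the m-th powers of all n-th roots of unity omega_k = omega^k."""
    omega = cmath.exp(2j * math.pi / n)
    return sum(omega ** (k * m) for k in range(n))


for n, m in [(5, 1), (5, 2), (5, -3), (5, 10), (2, 2)]:
    # Expect roughly 0 unless n divides m, in which case the sum is n.
    print(n, m, power_sum(n, m))
```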

10.3. If l passes through the center O of the polygon, this yields
another special case (as depicted in Fig. 9b). The vertices of the polygon
may no longer be the original roots of unity, but they are still on the unit
circle. If the polygon has been rotated by angle θ from its original position,
its vertices $A_k$ have been correspondingly multiplied by some unit number
u = (1, θ), i.e., $A_k = u\omega_k$ for all k, and $u\bar{u} = 1$. The distances $d_k$ are
computed similarly by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}(u\omega_k + \bar{u}\bar{\omega}_k)^2 = \tfrac{1}{4}\left(u^2\omega_k^2 + 2u\bar{u}\,\omega_k\bar{\omega}_k + \bar{u}^2\bar{\omega}_k^2\right) = \tfrac{1}{2} + \tfrac{1}{4}\left(u^2\omega_k^2 + \bar{u}^2\bar{\omega}_k^2\right). \]
The only difference from the previous case are the $u^2$ and $\bar{u}^2$, which are
“stuck” in front of the squares $\omega_k^2$ and $\bar{\omega}_k^2$. However, since $u^2$ and $\bar{u}^2$ are
constants, after summing up everything over k, they will factor in front of
the sums $\sum \omega_k^2 = 0$ and $\sum \bar{\omega}_k^2 = 0$, yielding:
\[ S = \sum_{k=0}^{n-1} d_k^2 = \tfrac{1}{2}n + \tfrac{1}{4}\left(u^2 \sum_{k=0}^{n-1}\omega_k^2 + \bar{u}^2 \sum_{k=0}^{n-1}\bar{\omega}_k^2\right) = \tfrac{1}{2}n + \tfrac{1}{4}\left(u^2 \cdot 0 + \bar{u}^2 \cdot 0\right) = \tfrac{1}{2}n. \]
We conclude that as long as the line l passes through the center of the
polygon, the angle of rotation θ about the origin will not matter and the
sum will remain the same. □

10.4. If l is parallel to the original imaginary axis, the situation is
depicted in Figure 9c and is our last special case. It corresponds to translating
the polygon horizontally by some (real) number t: the vertices $A_k$ may
no longer be on the unit circle. Instead, $A_k = \omega_k + t$, with $\bar{t} = t$. Again we
calculate the distances by Lemma 3:
\[ d_k^2 = \tfrac{1}{4}\left((\omega_k + t) + \overline{(\omega_k + t)}\right)^2 = \tfrac{1}{4}\left((\omega_k + \bar{\omega}_k) + 2t\right)^2 = \tfrac{1}{4}(\omega_k + \bar{\omega}_k)^2 + t(\omega_k + \bar{\omega}_k) + t^2. \]
The first summand $\tfrac{1}{4}(\omega_k + \bar{\omega}_k)^2$ appeared in the case of Subsection 10.2 and
contributed $\tfrac{1}{2}n$ to the sum S. The second term $t(\omega_k + \bar{\omega}_k)$ will yield 0 in S
(why?), and the last term $t^2$ will contribute $t^2 n$. Overall, $S = (\tfrac{1}{2} + t^2)n$. □
The term $t^2 \geq 0$ matches our intuition: as the line l recedes from the
polygonal center, the sum of the squared distances grows.

10.5. If l is an arbitrary line, in order for l to line up with the imaginary
axis (as in Fig. 9d), our polygon needs to be rotated first (as in Fig. 9b) and
then to be translated horizontally (as in Fig. 9c). The vertices will be given
by $A_k = u\omega_k + t$ where |u| = 1 and t ∈ R. One final time we calculate by
Lemma 3:
\[ d_k^2 = \tfrac{1}{4}\left((u\omega_k + t) + \overline{(u\omega_k + t)}\right)^2 = \tfrac{1}{4}\left((u\omega_k + \bar{u}\bar{\omega}_k) + 2t\right)^2 = \tfrac{1}{4}(u\omega_k + \bar{u}\bar{\omega}_k)^2 + t(u\omega_k + \bar{u}\bar{\omega}_k) + t^2. \]
Sum over k and combine the previous cases (fill in the details!) to get
\[ S = \tfrac{1}{2}n + t(u \cdot 0 + \bar{u} \cdot 0) + t^2 n = (\tfrac{1}{2} + t^2)n. \ □ \]

10.6. The “closest lines”: conclusions and a look ahead. The actual
final answer should be adjusted to reflect the fact that the circumradius of
our polygon may be some (real) r ≠ 1: if $A_k = ru\omega_k + t$, then
\[ S = (\tfrac{1}{2} r^2 + t^2)n. \]
Note that the real number |t| measures the distance from the line l to the center
of the polygon. Thus, the sum is minimal when t = 0, i.e., the “closest” lines
l pass through the center O and yield a minimal sum $S = \tfrac{1}{2} r^2 n$. □
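The final formula can be checked without choosing a special coordinate system. The Python sketch below is an illustration only: the function name, the sample polygon, and the sample line are arbitrary assumptions. It measures each vertex-to-line distance directly and compares the sum of squares with n(r²/2 + t²), where t is the distance from the center to the line; the formula is used for n ≥ 3.

```python
import cmath
import math


def sum_squared_line_distances(vertices, point_on_line: complex, direction: complex) -> float:
    """Sum of squared distances from the given points to a line (point plus direction)."""
    u = direction / abs(direction)            # unit vector along the line
    # Multiplying by conj(u) rotates the line to be horizontal; the imaginary part
    # is then the signed perpendicular distance from the vertex to the line.
    return sum(((a - point_on_line) * u.conjugate()).imag ** 2 for a in vertices)


n, r = 5, 2.0
center = complex(0.5, -1.0)
vertices = [center + r * cmath.exp(1j * (0.3 + 2 * math.pi * k / n)) for k in range(n)]

point_on_line, direction = complex(1, 1), complex(3, 4)
u = direction / abs(direction)
t = abs(((center - point_on_line) * u.conjugate()).imag)   # distance from center to line

print(sum_squared_line_distances(vertices, point_on_line, direction))
print(n * (r ** 2 / 2 + t ** 2))
```

Replacing `point_on_line` by `center` makes t = 0 and gives the minimal value nr²/2, whatever the direction of the line.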

It is curious that the answers for this Problem 7 and for the previous
Problem 6 are in some sense identical: the minimal sums are obtained when
the line l and the point P are incident with O: l passes through O or
P = O. Is there a deeper reason, beyond our calculations, for such a
“coincidence” of answers? To nudge you in one possible direction, here is
a related problem in 3-dimensions. It is suggested by one of the giants of
contemporary mathematics, the Russian Vladimir Arnol’d, in his Trivium
Mathematique [7], a collection of 100 problems that he expects every well
educated mathematician should be able to solve.
Problem 8. Given a cube, let l be a line through its center. Consider the
sum of the squares of distances from each of its vertices to l. For which
such line l is this sum minimal? How about replacing the cube by other
Platonic solids?⁶
For further discussion of C and more similar examples check out the
books by Hahn, Needham, Schwerdtfeger, and Yaglom [35],[60],[72],[86],
and look for the next two sessions on complex numbers. Part III will round
up the theoretical discussion of C by applying the Fundamental Theorem
of Algebra to real polynomials, while Part IV will apply C-techniques to
solving, as promised, Olympiad-type geometry problems from around the

11. Hints and Solutions to Selected Problems

Exercise 1. In Cartesian form w = a + ib, so that
\[ w\bar{w} = (a + bi)(a - bi) = a^2 - (bi)^2 = a^2 + b^2 = |w|^2. \]
Along the way, we remember the basic definition of i: $i^2 = -1$. □

Exercise 2(c). Since 7 + v = 7 − v, be careful when conjugating the de-

nominator: 7 + v = 7 + 2 + 3i = 9 − 3i. The final answer is 15
− 15
i. ♦

Exercise 2(d). By formula (2), we arrive at the unsightly expression:
\[ \frac{a + bi}{c + di} = \frac{(a + bi)(c - di)}{(c + di)(c - di)} = \frac{(ac + bd) + i(bc - ad)}{c^2 + d^2} = \frac{ac + bd}{c^2 + d^2} + \frac{bc - ad}{c^2 + d^2}\, i. \ ♦ \]
Corollary 1. Instead of the tedious task of dividing (and possibly using the
above uninspiring formula), let’s “cheat”: we already know that division in
C is well-defined for w ≠ 0 and yields a unique answer. (Do we know this?
Explain why.) In other words, we know that z/w is some complex q such
that qw = z. But Corollary 1 gives in polar form one candidate for q. So
The Platonic solids, also referred to as regular polyhedra, are convex polyhedra whose
faces are congruent convex regular polygons. There are exactly five such solids: the
tetrahedron, cube, octahedron, dodecahedron, and icosahedron. Check out the website

it remains only to show that this candidate behaves as a true ratio should.
We multiply it by w in hopes of getting z:
\[ \left(\frac{|z|}{|w|},\, \theta - \mu\right) \cdot (|w|, \mu) = \left(\frac{|z|}{|w|} \cdot |w|,\, (\theta - \mu) + \mu\right) = (|z|, \theta) = z. \]
Along the way, we used (1) from Section 2 in order to multiply in polar form.
We conclude that $\left(\frac{|z|}{|w|}, \theta - \mu\right)$ indeed equals the ratio z/w. □

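Exercise 3. The identities follow directly from Corollary 1 in polar form; e.g.,
\[ \overline{\left(\frac{z}{w}\right)} = \overline{\left(\frac{|z|}{|w|},\, \theta - \mu\right)} = \left(\frac{|z|}{|w|},\, -\theta + \mu\right) = \left(\frac{|z|}{|w|},\, -\theta - (-\mu)\right) = \frac{(|z|, -\theta)}{(|w|, -\mu)} = \frac{\bar{z}}{\bar{w}}. \]
Justify all steps. Why did we do the “strange” double-negation “−(−μ)”? ♦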

Exercise 4. In part (a), |z + w| “loses” by a lot to |z| + |w|:
\[ |z + w| = |8 - 8i| = 8|1 - i| = 8\sqrt{2} = \sqrt{128} < \sqrt{324} = 18 = 5 + 13 = |z| + |w|. \]
At the same time, |z − w| “wins” by a lot to |z| − |w|:
\[ |z - w| = |-2 + 16i| = 2\sqrt{65} > -8 = 5 - 13 = |z| - |w|. \]
This trend continues elsewhere, except that for w = 6 + 8i we have a tie:
\[ |z + w| = |9 + 12i| = 15 = 5 + 10 = |z| + |w|, \]
and for $w = 1 + \tfrac{4}{3}i$ there are two ties: |z + w| = |z| + |w| and |z − w| = |z| − |w|.
Note that all ties arise when the numbers are rescales of each other: w = 2z
in (b)’s tie, and z = 3w in (c)’s ties. ♦

Corollary 3. By the ordinary Triangle Inequality applied to (z − w) and

w, equality is obtained in (4) iff (z − w) = cw for some real c ≥ 0, or w = 0.
Translating, z = (1 + c)w = dw for some real d ≥ 1, or w = 0. No wonder
in Exercise 4 we got the only tie of the form |z − w| = |z| − |w| in part(c)
when z = 3w. 
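Problem 1.
\[ (1 - i)^{2004} = \left[\sqrt{2}\left(\tfrac{1}{\sqrt{2}} - i\tfrac{1}{\sqrt{2}}\right)\right]^{2004} = \left[\sqrt{2}\left(\cos(-\tfrac{\pi}{4}) + i\sin(-\tfrac{\pi}{4})\right)\right]^{2004} \]
\[ = (\sqrt{2})^{2004}\left(\cos(-\tfrac{2004\pi}{4}) + i\sin(-\tfrac{2004\pi}{4})\right) = 2^{1002}\left(\cos(-501\pi) + i\sin(-501\pi)\right). \]
Since cos(−501π) = −1 and sin(−501π) = 0, the final result is $-2^{1002}$. □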
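The value of (1 − i)^2004 can be confirmed exactly, without floating-point error, by multiplying Gaussian integers as pairs of Python integers. This sketch is not from the original text; the function name `gaussian_pow` is a hypothetical helper that uses repeated squaring.

```python
def gaussian_pow(a: int, b: int, exponent: int) -> tuple[int, int]:
    """Exact (a + bi)^exponent for integers a, b and exponent >= 0, as (real, imag)."""
    result = (1, 0)                     # the number 1
    base = (a, b)
    while exponent > 0:
        if exponent & 1:
            # (x + yi)(c + di) = (xc - yd) + (xd + yc)i
            result = (result[0] * base[0] - result[1] * base[1],
                      result[0] * base[1] + result[1] * base[0])
        # Square the base: (c + di)^2 = (c^2 - d^2) + 2cd i
        base = (base[0] * base[0] - base[1] * base[1], 2 * base[0] * base[1])
        exponent >>= 1
    return result


real, imag = gaussian_pow(1, -1, 2004)
print(real == -(2 ** 1002), imag == 0)
```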

Problem 3(b). $\theta = \frac{\pi}{8} + \frac{k\pi}{4}$, and hence $\frac{b}{a} = \tan\left(\frac{\pi}{8} + \frac{k\pi}{4}\right)$ for 0 ≤ k ≤ 7. ♦
Exercise 6(a). If $w = (4, \frac{2\pi}{3})$ then $\sqrt{w} = \left(\sqrt{4},\, \frac{\frac{2\pi}{3} + 2k\pi}{2}\right) = (2, \frac{\pi}{3} + k\pi)$ for
k = 0, 1. Check that these two solutions work and draw relevant pictures. ♦

Exercise 11. For the first part, assuming that the polygon is regular, we
can simply plug the formulas $A_k = v\omega^k + z$ into the given fractions and verify
that we’ll get ω:
\[ \frac{A_{k+2} - A_{k+1}}{A_{k+1} - A_k} = \frac{(v\omega^{k+2} + z) - (v\omega^{k+1} + z)}{(v\omega^{k+1} + z) - (v\omega^k + z)} = \frac{\omega^{k+1}(\omega - 1)}{\omega^k(\omega - 1)} = \omega. \]
We are not done yet! Conversely, assuming that
\[ \frac{A_{k+2} - A_{k+1}}{A_{k+1} - A_k} = \omega \quad \text{for } k = 0, 1, \ldots, n-3, \]
we must show that the polygon is regular. For the reader not versed in indices
and sequences, this may seem like a formidable task. To make things more
accessible, let’s assume n = 5 for the time being. We have three equations:
\[ \frac{A_2 - A_1}{A_1 - A_0} = \omega, \qquad \frac{A_3 - A_2}{A_2 - A_1} = \omega, \qquad \frac{A_4 - A_3}{A_3 - A_2} = \omega. \tag{17} \]


Figure 10. Building a Regular Polygon from Equations (17)

We must show that A0 A1 A2 A3 A4 is regular. Instead of doing this algebraically,
let’s try a geometric argument. Rewrite the first equation as
\[ A_2 - A_1 = (-\omega)(A_0 - A_1). \tag{18} \]
Since ω corresponds to angle $\frac{2\pi}{n}$ on the unit circle, then −ω corresponds
to $-(\pi - \frac{2\pi}{n}) = -\frac{n-2}{n}\pi = -\frac{3}{5}\pi$ (cf. Fig. 10a). Equation (18) then means
that segment A0 A1 goes to segment A2 A1 via rotation about A1 at angle $\frac{3}{5}\pi$
clockwise, i.e., sides A0 A1 and A2 A1 have the same length, and $\angle A_0 A_1 A_2 = \frac{3}{5}\pi$
when traversed clockwise. Excellent: all of the interior angles of a regular
pentagon are equal to $\frac{3}{5}\pi$ (why?). Reasoning in a similar fashion with the
other two equations in (17), we can build the pentagon A0 A1 A2 A3 A4 in three
stages, as shown in Figure 10. The reader should explain why in Figure 10c
we have arrived at a regular polygon. □
Session 9

Introduction to Inequalities. Part I

Arithmetic, Geometric, and Power Means

based on Bjorn Poonen’s session

Sneak Preview. When your teacher calculates the average of your exam scores,
she usually adds up all scores and divides by the number of exams. But what
if instead she multiplies the n scores and takes the nth root of that, or adds up
the squares of the scores, divides by n, and takes the square root of that? If all
exam scores are equal, these three ways will yield the same average; but if even
two exam scores differ, the results will all be different. Which method yields the
highest average? And what if your teacher weights the exams unequally?
This session will answer these and more questions from the realm of inequalities.
Some problems will invoke geometry, combinatorics, or calculus, and can be
skimmed on a first reading. Part II will tackle other fundamental inequalities.

1. The Language of Inequalities

“For a real number $t$, one has $t^2 \ge 0$,

with equality if and only if $t = 0$.”

What does a statement like this mean? It seems it can mean only one
thing: that “the squares of real numbers are non-negative.” Actually, it is
saying three things. The main part of the statement is that
(1) if $t$ is a real number, then $t^2 \ge 0$.
But the last phrase “with equality if and only if $t = 0$” adds two more things:
(2) if $t = 0$, then $t^2 = 0$; and
(3) if $t^2 = 0$, then $t = 0$. (Thus if $t \ne 0$, then $t^2 > 0$.)
This simple example shows that there is an interplay between the language
of equalities and the language of inequalities, and that often statements of
inequalities may be saying “more” than what can be seen on the surface. As
we go through this session, we will introduce further terminology related to
inequalities and pay close attention to the specific language used so as to
interpret and use it correctly.

2. Arithmetic Mean – Geometric Mean Inequality

2.1. Gardening With Baby AM-GM. The most basic arithmetic mean–
geometric mean inequality involves only two variables:
Lemma 1. (Baby AM-GM) If $x$ and $y$ are non-negative real numbers, then
\[ \frac{x+y}{2} \ge \sqrt{xy}, \quad \text{with equality if and only if } x = y. \]

Again, the last phrase “with equality. . . ” means two things:

(1) if $x = y \ge 0$, then $\frac{x+y}{2} = \sqrt{xy}$ (obvious; check it!); and conversely,
(2) if $\frac{x+y}{2} = \sqrt{xy}$ for some $x, y \ge 0$, then $x = y$.
Figure 1a depicts the relative positions of the two means when 0 < x < y.
The name “AM” obviously comes from applying arithmetic operations to x
and y to obtain the arithmetic mean $\frac{x+y}{2}$.
The name “GM” might come from the geometric problem of constructing
a square with the same area as a given rectangle (cf. Fig. 1b), or it might
come from its geometric solution in Figure 1c, where two segments of lengths
x and y make the diameter AB of a circle and the length of the (dashed)
perpendicular CD to that diameter turns out to be the geometric mean $\sqrt{xy}$.
The baby AM-GM inequality itself can be visualized using the shaded right
triangle OCD, where the hypotenuse is “AM” (also equal to the radius), while the
(dashed) leg is “GM”. The proofs of these facts can be found in the plane
geometry interlude at the end of the session.
Figure 1. Arithmetic and geometric means when 0 < x < y

Hint: Using some algebra instead, deduce the baby AM-GM from the
inequality in the opening quotation of Section 1 by letting $t = \sqrt{x} - \sqrt{y}$. For
$x \ne y$, explain why $\frac{x+y}{2} > \sqrt{xy}$, an example of a strict inequality. ♦

2.2. Which is the perfect garden? Inequalities such as the AM-GM

inequality often give a quick way to solve optimization problems.

Problem 1. A rectangular garden is to be constructed using 20 meters of
fence for three of the sides, and using an existing long wall for the fourth
side. What is the maximum possible area that can be enclosed?
PST 69. Knowing if and when an inequality becomes an equality is usually
the key to finding extreme values, e.g., baby AM-GM roughly implies that
x = y (> 0) makes the sum x + y minimal and the product xy maximal.

Solution: Let x be the length of the side along the wall (cf. Fig. 2), and
let y be the length of each side adjacent to this side (in meters). We must
find when the area A = xy is maximal for positive x and y such that the
fence length x + 2y is at most 20. In formal language, we must maximize xy
subject to the constraints x, y > 0 and x + 2y ≤ 20.

Figure 2. Two gardens along a wall

Since we see the sum $x + 2y$, we apply AM-GM to $x$ and $2y$ (both > 0):
\[ \sqrt{x \cdot 2y} \overset{\text{AM-GM}}{\le} \frac{x + 2y}{2} \le \frac{20}{2} = 10. \tag{1} \]

Squaring both sides of $\sqrt{2xy} \le 10$ and dividing by 2 yields $xy \le 50$.
To finish the problem, we must show that xy = 50 is possible (if not, the

maximum area would be < 50). To obtain xy = 50 in the end, we must have
an equality at each step of (1). This happens when x = 2y (by the equality
criterion in AM-GM) and x + 2y = 20. Solving this system yields x = 10
and y = 5, so these are the only allowable values of x and y that might make
the equality xy = 50 hold, and they do.
We conclude that the maximum possible area is 50 square meters, and
this is attained if and only if the rectangle is 10 meters by 5 meters, with
the long side against the wall. 
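A quick numerical experiment can confirm this answer. The sketch below is a brute-force check under one assumption of ours: that all 20 meters of fence are used, so x = 20 − 2y (using less fence cannot enlarge the garden). It scans a grid of widths y and reports the best area found.

```python
def best_garden(fence=20.0, steps=2000):
    """Grid-search widths y in (0, fence/2); the side along the wall is x = fence - 2y."""
    best = (0.0, 0.0, 0.0)  # (area, x, y)
    for i in range(1, steps):
        y = (fence / 2) * i / steps
        x = fence - 2 * y
        best = max(best, (x * y, x, y))
    return best

area, x, y = best_garden()
print(area, x, y)
```

The grid contains y = 5 exactly, so the search lands on the 10-by-5 rectangle found above; changing `fence` lets you experiment with Exercise 1 below.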

2.3. PSTs everywhere! In our garden problem, we used several PSTs:

some pertained to solving optimization problems in general, and others to
applying specific techniques when dealing with inequalities and AM-GM.
PST 70. In real-life problems asking to find an extreme value of something
(such as the area of a garden), follow the general scheme:
(a) Assign variables to the unknown quantities (e.g., length x and width y)
and note any natural restrictions on them (x, y > 0), as well as con-
straints given by the problem (e.g., x + 2y ≤ 20).
(b) Use these variables to construct the function you need to optimize (e.g.,
the area function f (x, y) = xy).
(c) Translate the problem into a formal mathematical statement of optimiz-
ing the value of the function subject to the constraints (e.g., maximize
f (x, y) = xy subject to x, y > 0 and x + 2y ≤ 20).
(d) Solve the mathematical problem in (c) using whatever methods are
necessary (e.g., AM-GM, calculus techniques, monovariants, etc.).
(e) Translate your answer back into the original problem, ensuring that it
works (e.g., x = 10 and y = 5 satisfy x, y > 0 and x + 2y ≤ 20).

PST 71. When applying AM-GM to solve an optimization problem, these
ideas might come in handy:
(a) A sum with positive terms can be bounded below by applying AM-GM,
provided you use its summands as the variables in AM-GM (e.g.,
$x + y \ge 2\sqrt{xy}$, but also $x + 2y \ge 2\sqrt{x(2y)}$).
(b) A product with positive terms can be bounded above by applying AM-GM,
provided you use its factors as the variables in AM-GM (e.g.,
$xy \le \left(\frac{x+y}{2}\right)^2$, but also $x(2y) \le \left(\frac{x+2y}{2}\right)^2$).
(c) After setting up a chain of inequalities, the optimal value is usually
obtained when the beginning and the end are equal, forcing all inequal-
ities in between to become equalities. In particular, the variables used
in AM-GM will be equal (e.g., x = y or x = 2y).
(d) In the end, check that the values resulting from equality in AM-GM do
yield equalities everywhere else in your chain of inequalities; otherwise,
you will need to modify your solution.
To make sure you understand the above PSTs, verify that:
Exercise 1. In the setting of Problem 1, if the fence length is changed to
something other than 20 m, the previous solution will essentially go through,
still yielding an optimal garden with length twice as long as its width.
However, some modifications are in order to address the next exercise.
Exercise 2. You have bought enough flowers to plant a rectangular area of
50 m² along the wall, but the fence is very expensive. Find the dimensions of
the garden that will cost you least in terms of the fence. What if, instead of
flowers, you have already purchased a fence of length 20 m: which rectangular
garden will have the largest area, assuming the fence goes all the way around
the garden (no wall)?

2.4. More variables, more challenge for baby AM-GM! The garden
problem can also be solved using calculus, which gives another approach to
many inequality problems. Instead, we did it in detail with the baby AM-
GM because the same reasoning can be used in more complicated problems.
Try the two exercises below: despite their “multivariable” appearance, clever
applications of baby AM-GM for two variables at a time are all you need!
Exercise 3. Prove that for any a, b, c > 0,
\[ (a + b)(b + c)(c + a) \ge 8abc, \]
and determine when equality holds.
Exercise 4. Prove that $n! < \left(\frac{n+1}{2}\right)^n$ for all integers n > 1.
Hints: In both exercises, baby AM-GM applies nicely to pairs of numbers.
In the former exercise there isn’t much of a choice for the pairs, whereas
in the latter exercise you have to be careful to pair up the “right” numbers
according to their sum. ♦
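Before hunting for a proof of Exercise 4, it can be reassuring to test the claim. The sketch below is a sanity check, not a proof: it clears denominators (our own rewriting) so that the inequality becomes $n! \cdot 2^n < (n+1)^n$, which Python can verify exactly with integers.

```python
import math

def factorial_bound_holds(n):
    """Exact integer test of n! < ((n + 1) / 2)**n, rewritten as n! * 2**n < (n + 1)**n."""
    return math.factorial(n) * 2**n < (n + 1) ** n

print(all(factorial_bound_holds(n) for n in range(2, 200)))
print(factorial_bound_holds(1))  # n = 1 gives equality, so the strict inequality fails
```

The case n = 1 shows why the exercise asks only for n > 1.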

2.5. Need more strength. Some problems with more variables cannot
be conquered by a repetitive application of baby AM-GM. And hence, we
formulate the general version of AM-GM for any number of variables:
Theorem 1. (AM-GM) If $x_1, x_2, \dots, x_n \ge 0$, then
\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}, \]
with equality if and only if $x_1 = x_2 = \cdots = x_n$.
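To get a feel for Theorem 1, one can compare the two means on random data. The sketch below is an experiment only: the sample sizes, value ranges, and helper names are our own choices, and the geometric mean is computed through logarithms, which assumes strictly positive inputs.

```python
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the average logarithm; valid for strictly positive inputs
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

random.seed(0)
samples = [[random.uniform(0.1, 10.0) for _ in range(random.randint(2, 8))]
           for _ in range(1000)]

# AM >= GM on every random sample (a tiny tolerance absorbs rounding)
print(all(arithmetic_mean(xs) >= geometric_mean(xs) - 1e-12 for xs in samples))
# Equal inputs give equal means
print(arithmetic_mean([3.0] * 5), geometric_mean([3.0] * 5))
```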
Theorem 1 and other fundamental inequalities of n variables will be
proven later in Monovariants III. In this session, we assume them and show
how to use them in problems, along with other PSTs. To start off,

PST 72. Often the key to using AM-GM is to identify a sum in the problem
and to make it the numerator of an arithmetic mean.

You will need to manipulate algebraically both sides of the following

inequalities before you can identify which sum to plug into AM-GM:

Exercise 5. Prove $2\sqrt{x} \ge 3 - \frac{1}{x}$ for x > 0.
Exercise 6. Prove that if a ≥ b ≥ 0 and n ≥ 1 is an integer, then
\[ a^n - b^n \ge n(a - b)(ab)^{(n-1)/2}. \]
Hints: Write $2\sqrt{x}$ as $\sqrt{x} + \sqrt{x}$, or factor $a - b$ out of the LHS. Then apply
AM-GM for 3 or for n variables. If equality is attainable, find out when. ♦
Exercise 7. Let E be the ellipsoid $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ for some a, b, c > 0.
Find, in terms of a, b, and c, the volume of the largest rectangular box that
can fit inside E, with faces parallel to the coordinate planes (cf. Fig. 3a).


Figure 3. Box in ellipsoid and Rectangle in ellipse

Hint: x = y = z is not necessarily useful, but some rescales of these

variables are. Try also the two-dimensional version of the problem asking
for a rectangle of largest area inside an ellipse (cf. Fig. 3b). ♦

PST 73. If an inequality becomes an equality for certain values $a_i$ of the
variables $x_i$, it is natural to apply the AM-GM inequality precisely to the
corresponding rescaled quantities $x_i/a_i$ that can equal each other.

Problem 2. Let $g = \sqrt[n]{a_1 a_2 \cdots a_n}$ be the geometric mean of the numbers
$a_1, a_2, \dots, a_n > 0$. Prove that
\[ (1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge (1 + g)^n. \]

Hint: Following PST 73, equality holds if a1 = a2 = · · · = an , so one should

apply AM-GM only to terms that are equal when a1 = a2 = · · · = an . This
suggests expanding both sides of the inequality to be proved and grouping
terms according to total degree. Some combinatorics will be needed here. ♦

3. Power Mean Inequality

3.1. Are there any other means? The arithmetic and geometric means
are certainly not the only ways to assign an “average” to several numbers.
Definition 1. Fix $x_1, x_2, \dots, x_n \ge 0$. For $r \ne 0$, the $r$th power mean $P_r$ of
$x_1, x_2, \dots, x_n$ is the $r$th root of the average of the $r$th powers of the $x_i$’s:
\[ P_r = \left( \frac{x_1^r + x_2^r + \cdots + x_n^r}{n} \right)^{1/r}. \]
To avoid inverting 0s, assume r > 0 if some $x_i$ is 0.
Even though the formula yields nonsense if r = 0, there is a natural way
to define $P_0$ too: simply let it be the geometric mean,¹ i.e., $P_0 = \sqrt[n]{x_1 x_2 \cdots x_n}$.
At the other extreme, what happens when r is very large? If one of the
$x_i$’s, say $x_m$, is larger than all the others, then $x_m^r$ will be much larger than
the $r$th powers of the others, so much larger that $P_r \approx x_m$. Hence, we define:
\[ P_\infty = \max\{x_1, \dots, x_n\}, \quad \text{and similarly,} \quad P_{-\infty} = \min\{x_1, \dots, x_n\}. \]
Below are three famous examples of power means:
\[ P_1 = \frac{x_1 + \cdots + x_n}{n}, \quad P_2 = \sqrt{\frac{x_1^2 + \cdots + x_n^2}{n}}, \quad \text{and} \quad P_{-1} = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}. \]
Here $P_1$ is just the arithmetic mean, $P_2$ is sometimes called the root mean
square, and $P_{-1}$ (defined only for $x_1, \dots, x_n > 0$) is the harmonic mean (HM).

3.2. What is the relation between all power means? Briefly, the larger
the power, the larger the mean:
Theorem 2. (Power Mean Inequality) Let x1 , x2 , . . . , xn ≥ 0. Suppose
that r > s (and s ≥ 0 if some xi is 0). Then Pr ≥ Ps , with equality if and
only if x1 = x2 = · · · = xn .
The power mean inequality (PM) holds even if r = ∞ or s = −∞, pro-
vided that we use the definitions of P∞ and P−∞ above, and the convention
that ∞ > r > −∞ for all numbers r. Here are three important special cases
of the PM inequality, including our previous AM-GM:
¹ The definitions of $P_0$, $P_\infty$, and $P_{-\infty}$ are explained in Section 3.3.

Corollary 1. (AM-GM-HM Inequalities) P1 ≥ P0 ≥ P−1 .
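The whole family of power means fits in a few lines of code. The sketch below is an illustration under our own choices (the function name `power_mean`, positive sample data, and a handful of exponents); it uses the conventions above for r = 0 and r = ±∞ and checks that the means come out in increasing order, as the Power Mean Inequality predicts.

```python
import math

def power_mean(xs, r):
    """Power mean P_r of positive numbers xs, with the conventions for r = 0 and r = +/-inf."""
    n = len(xs)
    if r == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)  # geometric mean
    if r == math.inf:
        return max(xs)
    if r == -math.inf:
        return min(xs)
    return (sum(x**r for x in xs) / n) ** (1.0 / r)

xs = [1.0, 2.0, 4.0]
exponents = [-math.inf, -1, 0, 1, 2, math.inf]
means = [power_mean(xs, r) for r in exponents]
print(means)                    # min, HM, GM, AM, root mean square, max
print(means == sorted(means))   # larger exponent, larger mean
```

For this sample the harmonic mean is 12/7, the geometric mean is 2, and the arithmetic mean is 7/3, in agreement with Corollary 1.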

If you are seeing these inequalities for the first time, you should write
them out so as to recognize them more easily in practice.

Exercise 8. Prove that the sum of the legs of a right triangle never exceeds
$\sqrt{2}$ times the hypotenuse of the triangle.

Hint: Although PM or AM-GM (P1 ≥ P0 ) are powerful tools in these

exercises, can you get by with just the fact that squares are non-negative? ♦
Exercise 9. Among all planes passing through a fixed
point (a, b, c) with a, b, c > 0 and meeting the positive
parts of the three coordinate axes, find the one such that
the tetrahedron bounded by it and the coordinate planes
has minimal volume.
Hint: For r, s, t > 0 what is the equation of the plane
through (r, 0, 0), (0, s, 0), and (0, 0, t)?
Try also the two-dimensional version of the problem. ♦

3.3. Limits justify our choices. The discussion below explains the def-
initions for the power means P0 , P∞ , and P−∞ . If you do not know limits
well, you can skip this on a first reading, without hurting your understand-
ing of inequalities. The die-hards can still find the necessary background
material in a real analysis [69] or an advanced calculus textbook.

Let’s start with $P_0$. The reason for the convention $P_0 = \sqrt[n]{x_1 x_2 \cdots x_n}$
is that when r is very small but nonzero the value of $P_r$ is very close to
the geometric mean, and it can be made as close as desired by taking r
sufficiently close to 0. In the language of limits,
Lemma 2. $\lim_{r \to 0} P_r = \sqrt[n]{x_1 x_2 \cdots x_n}$.
Another way of saying this is that the only choice for $P_0$ that makes $P_r$
depend continuously on r is the geometric mean: $\lim_{r \to 0} P_r = P_0$.
Hint: l’Hôpital’s Rule and properties of ln x will be needed in the proof. ♦
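Let us now explain why we defined $P_\infty$ as we did. Let $x_m$ be the largest
of the $x_i$’s. Then $0 \le x_i \le x_m$ for all i. Hence, for r > 0,
\[ \frac{x_m^r}{n} \le \frac{x_1^r + \cdots + x_m^r + \cdots + x_n^r}{n} \le \frac{n x_m^r}{n} = x_m^r, \quad \text{so} \quad \frac{x_m}{\sqrt[r]{n}} \le P_r \le x_m. \]

But $\lim_{r \to \infty} \sqrt[r]{n} = \lim_{r \to \infty} n^{1/r} = n^0 = 1$, so by the Sandwich (Squeeze) Theorem
\[ \lim_{r \to \infty} P_r = x_m = \max\{x_1, \dots, x_n\}. \]
This motivates the definition $P_\infty = \max\{x_1, \dots, x_n\}$.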
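These limits can also be watched numerically. The sketch below is an experiment rather than a proof, with sample data and exponents chosen by us so that the powers stay within floating-point range; it prints how far $P_r$ is from the geometric mean for small r, and from the largest and smallest entries for large positive and negative r.

```python
import math

def power_mean(xs, r):
    """P_r for a finite nonzero exponent r and positive xs."""
    return (sum(x**r for x in xs) / len(xs)) ** (1.0 / r)

xs = [1.0, 2.0, 4.0]
geometric = math.prod(xs) ** (1.0 / len(xs))

for r in (0.1, 0.01, 0.001):
    print(r, power_mean(xs, r) - geometric)   # gap to the geometric mean shrinks
for r in (10, 100, 500):
    print(r, max(xs) - power_mean(xs, r))     # gap to the maximum shrinks
for r in (-10, -100, -500):
    print(r, power_mean(xs, r) - min(xs))     # gap to the minimum shrinks
```

In each loop the printed gap stays non-negative and decreases, which is the behavior the conventions for $P_0$, $P_\infty$, and $P_{-\infty}$ are designed to match.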
See if you can modify this proof to explain the choice for $P_{-\infty}$:
Lemma 3. $\lim_{r \to -\infty} P_r = \min\{x_1, x_2, \dots, x_n\}$.

4. The Land of the Convex

The power mean Pr provided a large generalization of AM and GM. Still,

we don’t know yet why the infinitely many inequalities among the Pr ’s are
true! The notion of convexity will allow us to further generalize the power
means and explain all inequalities encountered so far in one fell swoop.

4.1. What is a convex function? Briefly, a function f is convex if for

every two points A and B on the graph of f , the line segment AB lies above
the part of the graph between A and B (cf. Fig. 4a). More formally,
Definition 2. (Geometric convexity) A function f(x) is convex if for
any real numbers a and b with a < b, each point D = (c, d) on the line
segment joining A = (a, f(a)) and B = (b, f(b)) lies above or at the point
C = (c, f(c)) on the graph of f with the same x-coordinate as D (cf. Fig. 4a).

Figure 4. Graphs of $x^2$, $x^3$, and $\sin x$

The x-value c is a fraction λ of the way from a to b for some λ ∈ [0, 1],
i.e., c − a = λ(b − a), and hence c = (1 − λ)a + λb. This yields the height
of point C on the graph: $f(c) = f\big((1 - \lambda)a + \lambda b\big)$. At the same time, the
height of point D on the line segment AB is (1 − λ)f(a) + λf(b); indeed, this
is the same (linear) combination of the heights f(a) and f(b) of A and B,
coming from the right trapezoid “abBA” (for a proof, see the plane geometry
interlude). The condition for convexity, that the height of C is at most the
height of D, can be expressed algebraically as follows:

Definition 2′. (Algebraic convexity) A function f(x) is convex if
\[ f\big((1 - \lambda)a + \lambda b\big) \le (1 - \lambda) f(a) + \lambda f(b) \tag{2} \]
whenever a < b and λ ∈ [0, 1].
Those who know what a convex set in geometry is can interpret the
condition as saying that the set S = {(x, y) : y ≥ f(x)} of points above the
graph of f is a convex set, i.e., the segment connecting any two points in S
is entirely in S. Loosely speaking, this will hold if the graph of f curves in
the shape of a smile instead of a frown. For example, the function $f(x) = x^2$
is convex (cf. Fig. 4a), and so is $f(x) = x^n$ for any positive even integer n.

One can also speak of a function f(x) being convex on an interval I.
This means that the condition (2) above holds at least when a, b ∈ I (and
a < b and λ ∈ [0, 1]). In Figure 4b-c, for instance, one can observe that
$f(x) = x^3$ is convex on [0, ∞), and that f(x) = sin x is convex on [−π, 0].
Finally, one says that a function f(x) on an interval I is strictly convex if
\[ f\big((1 - \lambda)a + \lambda b\big) < (1 - \lambda) f(a) + \lambda f(b) \]
whenever a, b ∈ I and a < b and λ ∈ (0, 1). In other words, the line segment
connecting two points on the graph of f should lie entirely above the graph
of f, except where it touches at its endpoints. Thus, for example, while a
linear function ax + b and a quadratic function $ax^2 + bx + c$ (with a > 0) are
both convex everywhere, only the quadratic one is strictly convex (why?).
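The algebraic definition lends itself to a simple experiment: pick random a, b, and λ and look for a violation of inequality (2). The sketch below does exactly that; the function name `looks_convex`, the number of trials, and the tolerance are our own choices. Finding no violation is only evidence of convexity, while finding one is a genuine counterexample.

```python
import math
import random

def looks_convex(f, lo, hi, trials=20000, tol=1e-9, seed=0):
    """Randomly test f((1-t)a + tb) <= (1-t)f(a) + t f(b) for a, b in [lo, hi]."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = sorted((rng.uniform(lo, hi), rng.uniform(lo, hi)))
        t = rng.random()
        if f((1 - t) * a + t * b) > (1 - t) * f(a) + t * f(b) + tol:
            return False  # a chord dips below the graph: not convex on [lo, hi]
    return True  # no counterexample found (evidence, not a proof)

print(looks_convex(lambda x: x**2, -5, 5))    # x^2 on [-5, 5]
print(looks_convex(lambda x: x**3, 0, 5))     # x^3 on [0, 5]
print(looks_convex(lambda x: x**3, -5, 5))    # x^3 is not convex on [-5, 5]
print(looks_convex(math.sin, -math.pi, 0))    # sin x on [-pi, 0]
```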

4.2. The “convex hall” of fame. For convenience, here is a brief list of
some frequently encountered convex functions:
• $x^{2k}$ on all of R;
• $x^r$ on [0, ∞), if r ≥ 1;
• $-x^r$ on [0, ∞), if r ∈ [0, 1];
• $x^r$ on (0, ∞), if r ≤ 0;
• − ln x on (0, ∞);
• − sin x on [0, π];
• − cos x on [−π/2, π/2];
• tan x on [0, π/2);
• $e^x$ on all of R;
• $\frac{r}{s+x}$ on (−s, ∞), if r > 0.
In these, k represents a positive integer, r, s represent real constants, and x
is the variable. In fact, all of these are strictly convex on the interval given,
except for $x^r$ and $-x^r$ when r is 0 or 1.
Exercise 10. Draw the graphs of the functions above and explain why they
are convex on the given intervals by verifying visually the geometric definition
of convexity.
To make more convex functions out of already known convex functions,
we can perform certain arithmetic operations:
Lemma 4. Show that a sum of convex functions is convex, and that adding
a constant or linear function to a function does not affect convexity.
Hint: Verify the algebraic definition of convexity. ♦

4.3. Convexity fast-track for calculus aficionados. If you know about

continuity and derivatives and want to rigorously prove that a function is
convex, then . . . instead of just guessing it from the graph or trying to verify
the algebraic definition of convexity (which can be quite hard and time-
consuming), it is often easier to use one of the criteria below.
The first criterion says that for a continuous function (roughly, a function
whose graph you can draw without lifting your pencil), it is enough to verify
the definition of convexity only for the midpoint c of every interval [a, b]:

Theorem 3. (Continuity and Midpoints) Let f(x) be a continuous function
on an interval I. Then f(x) is convex if and only if for all a, b ∈ I:
\[ \frac{f(a) + f(b)}{2} \ge f\left( \frac{a+b}{2} \right). \]
Also, f(x) is strictly convex if and only if the inequality is strict for all $a \ne b$.

Convexity can also be expressed in terms of the derivative $f'(x)$, which
measures the rate of change of f(x):
Theorem 4. (First Derivative Test) Let f(x) be a differentiable function
on an interval I. Then f(x) is convex if and only if $f'(x)$ is increasing on
the interior of I.
Often, it is hard to determine directly if a function increases or decreases:
this is more easily verified by determining where the derivative is positive
or negative. More precisely, if g(x) is a function whose derivative satisfies
$g'(x) > 0$, then g(x) is increasing. Applying this with g(x) defined to be
$f'(x)$ results in the second derivative $f''(x)$ and leads to another useful test:
Theorem 5. (Second Derivative Test) Let f(x) be a twice differentiable
function on an interval I. Then f(x) is convex if and only if $f''(x) \ge 0$ for
all x ∈ I. Also, f(x) is strictly convex if and only if $f''(x) \ge 0$ for all x ∈ I
and there is no subinterval J ⊂ I of positive length on which $f''$ is zero.
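When a formula for $f''$ is awkward to derive by hand, the Second Derivative Test can at least be explored numerically. The sketch below is an approximation, not a rigorous test: it estimates $f''$ by a central finite difference (the step h, the sample count, and the tolerance are our assumptions) and checks the sign at evenly spaced interior points.

```python
import math

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def passes_second_derivative_test(f, lo, hi, samples=1000, tol=1e-4):
    """Check that the estimate of f'' is >= -tol at evenly spaced interior points of [lo, hi].

    Assumes the spacing (hi - lo) / (samples + 1) exceeds h, so every
    evaluation point stays inside [lo, hi].
    """
    step = (hi - lo) / (samples + 1)
    return all(second_derivative(f, lo + (i + 1) * step) >= -tol
               for i in range(samples))

print(passes_second_derivative_test(lambda x: x**2, -5, 5))
print(passes_second_derivative_test(lambda x: x**3, 0, 5))
print(passes_second_derivative_test(lambda x: x**3, -5, 5))
print(passes_second_derivative_test(math.sin, -math.pi, 0))
```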

Use each of the criteria above to produce three different solutions to:
Exercise 11. Find out (with proof) on which intervals $x^2$, $x^3$, and $\sin x$
are convex. For an extra challenge, prove that each of the functions in
Exercise 10 is convex (or strictly convex) on the indicated intervals.

5. Applications of Convexity to Inequalities

5.1. Convexity and endpoints. Convexity is frequently used to prove

inequalities that would have been too hard to tackle before. A seemingly
obvious but powerful principle is in action here:

Theorem 6. (Maximum Principle for Convex Functions) A convex
function f (x) on an interval [a, b] is maximized at x = a or x = b (or both).

Proof: First suppose that f (b) ≥ f (a). Given c in [a, b], let λ ∈ [0, 1]
be such that c = (1 − λ)a + λb. Then the algebraic definition of convexity
implies that
f (c) ≤ (1 − λ)f (a) + λf (b) ≤ (1 − λ)f (b) + λf (b) = f (b),
so f attains a maximum at b. The case f (a) ≥ f (b) is analogous. 
Thus, as long as f (x) is convex on [a, b], its maximum is attained at an
endpoint of the interval.

Problem 3. (USAMO ’80) Prove that for a, b, c ∈ [0, 1],
\[ \frac{a}{b+c+1} + \frac{b}{c+a+1} + \frac{c}{a+b+1} + (1 - a)(1 - b)(1 - c) \le 1. \]
How can we apply the Maximum Principle and plug into the LHS the
end values of the interval [0, 1], when our treatment of convex functions
concentrated only on functions of a single variable, while the inequality above
has three variables?! There is a standard technique that can help.
PST 74. For a function $f(x_1, x_2, \dots, x_n)$ in several variables, fix all but one
variable, e.g., pretend that $x_2, \dots, x_n$ are constants. Viewing the function
as having only one variable $x_1$ will allow you to apply your knowledge of
single-variable functions, for example, to their convexity.
Solution: Let F(a, b, c) denote the LHS. If we fix b and c in [0, 1], the
resulting function of a is convex on [0, 1], because it is a sum of functions
of the type $f(a) = \frac{r}{s+a}$ and linear functions. In detail, $f_1(a) = \frac{a}{b+c+1}$ and
$f_4(a) = (1 - a)(1 - b)(1 - c)$ are the linear functions, while $f_2(a) = \frac{b}{c+a+1}$
and $f_3(a) = \frac{c}{a+b+1}$ are convex on their respective domains (−1 − c, ∞) and
(−1 − b, ∞), and the latter intervals include [0, 1] since b, c ≥ 0.

Therefore, the whole sum F (a, b, c) is maximized when a = 0 or a = 1;
i.e., we will not decrease F (a, b, c) by replacing a by 0 or 1. Similarly we
will not decrease F (a, b, c) by replacing each of b and c by 0 or 1. Hence the
maximum value of F(a, b, c) will occur at one of the $2^3 = 8$ cases when a, b,
and c are 0s or 1s. But F (a, b, c) = 1 at these eight points (why? check it!),
so F (a, b, c) ≤ 1 whenever 0 ≤ a, b, c ≤ 1. 
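The "check it!" at the end of the solution takes only a few lines by machine. The sketch below evaluates F at the eight corners of the cube and, as an extra sanity check of our own (not part of the proof), samples random interior points to confirm that F never exceeds 1 there.

```python
import itertools
import random

def F(a, b, c):
    """Left-hand side of the USAMO '80 inequality."""
    return (a / (b + c + 1) + b / (c + a + 1) + c / (a + b + 1)
            + (1 - a) * (1 - b) * (1 - c))

# The eight corner cases: each of a, b, c is 0 or 1.
corner_values = [F(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3)]
print(corner_values)

# Random points of the cube [0, 1]^3 (an experiment, not a proof).
random.seed(0)
interior_max = max(F(random.random(), random.random(), random.random())
                   for _ in range(100000))
print(interior_max <= 1 + 1e-12)
```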

5.2. Jensen’s inequality is one of the most widely-applied inequalities

with convex functions.
Theorem 7. (Jensen’s Inequality (JI)) Let f be a convex function on
an interval I. If $x_1, x_2, \dots, x_n \in I$, then
\[ \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} \ge f\left( \frac{x_1 + x_2 + \cdots + x_n}{n} \right). \]
If moreover f is strictly convex, then equality holds iff $x_1 = x_2 = \cdots = x_n$.
JI resembles the Continuity-and-Midpoints convexity criterion but for n
variables, and it can be proven by induction on n. Furthermore,
Lemma 5. JI implies AM-GM ($P_1 \ge P_0$) and PM ($P_r \ge P_s$) for r ≥ s > 0
and positive numbers $x_1, x_2, \dots, x_n$.

Hint: Apply JI with $f(x) = -\ln x$ or $g(x) = x^{r/s}$. ♦
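The first half of the hint can be seen in action numerically. In the sketch below (sample data and the helper name `jensen_gap` are our own), the "Jensen gap" of $f(x) = -\ln x$ turns out to equal $\ln(\text{AM}/\text{GM})$, so the gap is non-negative exactly when AM ≥ GM.

```python
import math
import random

def jensen_gap(f, xs):
    """Average of f(x_i) minus f(average of x_i); non-negative when f is convex."""
    n = len(xs)
    return sum(f(x) for x in xs) / n - f(sum(xs) / n)

random.seed(1)
xs = [random.uniform(0.5, 9.0) for _ in range(6)]

gap = jensen_gap(lambda x: -math.log(x), xs)
am = sum(xs) / len(xs)
gm = math.exp(sum(math.log(x) for x in xs) / len(xs))

print(gap >= 0)                               # Jensen for the convex function -ln x
print(abs(gap - math.log(am / gm)) < 1e-9)    # the gap is ln(AM / GM)
```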

 Exercise 12. Prove that x x ≥

for x > 0.
Hint: Apply JI or just the definition of convexity to x ln x on (0, ∞). ♦

Exercise 13. Show that among all convex n-gons inscribed
in a fixed circle the regular n-gons have the largest perimeter.

Hint: A bit of trigonometry and a careful choice of the
variables and the function are necessary here. Apply Jensen’s
Inequality with f(x) = − sin x and with $x_i$ as suggested by
the accompanying diagram. ♦

5.3. Hardy-Littlewood-Pólya (HLP) inequality. Next we have an in-

equality that is so general that it includes almost all of the other inequalities
we have discussed so far as special cases:
Theorem 8. (HLP Majorization Inequality) Let f be a convex function
on an interval I, and let $a_1, \dots, a_n, b_1, \dots, b_n \in I$. Suppose that the sequence
$a_1, \dots, a_n$ majorizes $b_1, \dots, b_n$; that is, $a_1 \ge \cdots \ge a_n$, $b_1 \ge \cdots \ge b_n$, and
\[ \begin{aligned} a_1 &\ge b_1, \\ a_1 + a_2 &\ge b_1 + b_2, \\ &\;\;\vdots \\ a_1 + a_2 + \cdots + a_{n-1} &\ge b_1 + b_2 + \cdots + b_{n-1}, \\ a_1 + a_2 + \cdots + a_{n-1} + a_n &= b_1 + b_2 + \cdots + b_{n-1} + b_n. \end{aligned} \]
(Note the equality in the final equation.) Then
\[ f(a_1) + \cdots + f(a_n) \ge f(b_1) + \cdots + f(b_n). \]
If in addition f is strictly convex on I, then equality holds iff $a_i = b_i$ for all i.
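The majorization condition is a finite list of comparisons, so it is easy to check by machine. The sketch below is illustrative: the helper `majorizes` is our own, and it sorts both sequences in decreasing order first (the theorem assumes they are already sorted). It then confirms the HLP conclusion for one example with the convex function $e^x$.

```python
import math

def majorizes(a, b, tol=1e-12):
    """True if a majorizes b once both are sorted in decreasing order."""
    a, b = sorted(a, reverse=True), sorted(b, reverse=True)
    if len(a) != len(b) or abs(sum(a) - sum(b)) > tol:
        return False  # the total sums must agree
    partial_a = partial_b = 0.0
    for x, y in zip(a[:-1], b[:-1]):
        partial_a += x
        partial_b += y
        if partial_a < partial_b - tol:
            return False  # some partial sum of a falls below that of b
    return True

a, b = [5.0, 3.0, 1.0], [4.0, 3.0, 2.0]
f = math.exp  # convex on all of R
print(majorizes(a, b), majorizes(b, a))
print(sum(map(f, a)) >= sum(map(f, b)))
```

Here 5 ≥ 4, 5 + 3 ≥ 4 + 3, and 5 + 3 + 1 = 4 + 3 + 2, so a majorizes b but not conversely.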

Exercise 14. Suppose that $0 \le \theta_1, \dots, \theta_n \le \pi/2$ and $\theta_1 + \cdots + \theta_n = 2\pi$.
Prove that
\[ 4 \le \sin(\theta_1) + \cdots + \sin(\theta_n) \le n \sin(2\pi/n). \]

Hint: If f(x) = − sin x, then apply JI or HLP, as needed. ♦

Lemma 6. Show that Jensen’s Inequality is a special case of HLP.
Hint: Let one of the sequences in HLP be constant. ♦

5.4. Inequalities with weights. Many of the inequalities we have looked

at so far have versions in which the terms in a mean can be weighted un-
equally. The algebraic definition of convexity itself unequally weights the
two x-values a and b with non-negative weights λ1 = 1 − λ and λ2 = λ so
that λ1 + λ2 = 1 and λ1 f (a) + λ2 f (b) ≥ f (λ1 a + λ2 b). Let’s see how this
works for other inequalities with more variables and weights.
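Theorem 9. (Weighted AM-GM) If $x_1, \dots, x_n > 0$, $\lambda_1, \dots, \lambda_n \ge 0$, and
$\lambda_1 + \cdots + \lambda_n = 1$, then
\[ \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n \ge x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_n^{\lambda_n}, \]
with equality iff all the $x_i$ with $\lambda_i \ne 0$ are equal.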

Definition 3. Fix $x_1, \dots, x_n > 0$ and weights $\lambda_1, \dots, \lambda_n \ge 0$ such that
$\lambda_1 + \cdots + \lambda_n = 1$. For any $r \ne 0$, define the $r$th weighted power mean by
\[ P_r := \left( \lambda_1 x_1^r + \lambda_2 x_2^r + \cdots + \lambda_n x_n^r \right)^{1/r}. \]
Also let $P_0$ be the weighted geometric mean $P_0 := x_1^{\lambda_1} x_2^{\lambda_2} \cdots x_n^{\lambda_n}$.
Theorem 10. (Weighted PM) The weighted power means $P_r$ increase as
r increases. Moreover, if the $x_i$’s with $\lambda_i \ne 0$ are not all equal, then $P_r$ is a
strictly increasing function of r.

Theorem 11. (Weighted JI) Let f be a convex function on an interval I.

If x1 , . . . , xn ∈ I and λ1 , . . . , λn ≥ 0 with λ1 + · · · + λn = 1, then
λ1 f (x1 ) + λ2 f (x2 ) + · · · + λn f (xn ) ≥ f (λ1 x1 + λ2 x2 + · · · + λn xn ) .
If f is strictly convex, then equality holds iff all the $x_i$ with $\lambda_i \ne 0$ are equal.

It is not surprising that there is a relation between the various inequalities

and their weighted versions:
Lemma 7. The weighted AM-GM, weighted PM, and weighted JI contain
as special cases their ordinary versions AM-GM, PM, and JI, respectively.
And conversely,

Lemma 8. When the weights are all rational numbers, the ordinary AM-GM
implies the weighted AM-GM, and similarly, the weighted PM and the
weighted JI follow from their unweighted versions.

Hint: Apply AM-GM to a list in which some of the numbers are repeated.
The same proof works for the weighted PM and weighted JI. ♦
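The repetition trick in the hint can be made concrete. In the sketch below (sample numbers, weights, and helper names are our own), rational weights $k_i/N$ are turned into a list in which $x_i$ appears $k_i$ times; the ordinary means of that list then agree with the weighted means of the original numbers.

```python
from fractions import Fraction
from functools import reduce
import math

def repeated_list(xs, weights):
    """Repeat x_i exactly k_i times, where weights[i] = k_i / N for a common denominator N."""
    common = reduce(lambda u, v: u * v // math.gcd(u, v),
                    (w.denominator for w in weights), 1)
    return [x for x, w in zip(xs, weights) for _ in range(int(w * common))]

def weighted_means(xs, weights):
    """Weighted arithmetic and geometric means, computed directly from the weights."""
    am = sum(float(w) * x for x, w in zip(xs, weights))
    gm = math.exp(sum(float(w) * math.log(x) for x, w in zip(xs, weights)))
    return am, gm

xs = [2.0, 8.0, 5.0]
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # sums to 1
rep = repeated_list(xs, weights)                             # 2, 2, 2, 8, 8, 5
plain_am = sum(rep) / len(rep)
plain_gm = math.exp(sum(math.log(x) for x in rep) / len(rep))

print(rep)
print(weighted_means(xs, weights), (plain_am, plain_gm))
```

Since the ordinary AM-GM applies to the repeated list, the weighted inequality for these rational weights follows at once.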

Exercise 15. Given a, b, c, p, q, r > 0 with p + q + r = 1, prove
\[ a + b + c \ge a^p b^q c^r + a^r b^p c^q + a^q b^r c^p. \]
Hint: Apply the weighted AM-GM three times. ♦
Exercise 16. Prove that if a, b, c are sides of a triangle, then
\[ (a + b - c)^a (b + c - a)^b (c + a - b)^c \le a^a b^b c^c. \]
Hint: Why is a triangle mentioned? Divide both sides by the RHS, take
some root of both sides, and apply the weighted AM-GM. ♦

6. Geometry Leftovers and a Mean Summary

6.1. Plane geometry interlude. Several problems in this session called

for some knowledge of plane or analytic geometry, or trigonometry. In par-
ticular, three plane geometry facts appeared prominently in the theory part
of the session and deserve to be proven.
The first one pertains to the name geometric mean of two variables.

Exercise 17. In Figure 1c, two segments form the diameter AB of circle k:
AD = x and DB = y. A perpendicular is erected at point D to AB until it
hits the circle k in point C. Prove that $CD = \sqrt{xy}$.
Proof: Since ∠ACB is an inscribed angle in circle k overlooking diameter
AB, we have ∠ACB = 90° (cf. Circle Geometry session, vol. I), and all
three triangles ADC, CDB, and ACB are right. They are also similar
because they have one more equal angle; e.g., ∠BAC is shared among two
of them, etc.
In particular, the ratios of the two smaller triangles’ sides are the same.
Hence, AD/CD = CD/BD, from which $xy = CD^2$ and $CD = \sqrt{xy}$.
This geometric construction explains the name geometric mean of x and y.
Using it, it is possible to:
Exercise 18. Interpret and prove geometrically the baby AM-GM.
Proof: The midpoint O of AB is the center of k, and the radius of k, being
half of the diameter, is the arithmetic mean (x + y)/2 = OA = OB = OC.
So both the AM and GM of x and y appear in right triangle ODC as the
hypotenuse OC and the leg CD, respectively (cf. Fig. 1c).

The geometric inequality OC ≥ CD says that $(x + y)/2 \ge \sqrt{xy}$, which
is the baby AM-GM inequality. Equality is attained if and only if triangle ODC
degenerates into a segment OC, which happens exactly when D = O, i.e.,
when x = y (cf. Fig. 5a).
Figure 5. Equality in baby AM-GM and Trapezoids in convexity

A third geometric fact sneaked into the discussion of convexity in Figure 4a,
which is redrawn as Figure 5b.

Exercise 19. Given trapezoid $A_1B_1BA$, let point $D_1$ be on side $A_1B_1$ and
point D on side AB so that $A_1A$, $D_1D$, and $B_1B$ are parallel. If $A_1D_1 = \lambda A_1B_1$
for some λ ∈ (0, 1), then show that $DD_1 = (1 - \lambda)A_1A + \lambda B_1B$.
Proof: WLOG, assume $AA_1 \le BB_1$. Draw a line through $A_1$ parallel to
AB that intersects $DD_1$ in E and $BB_1$ in F. Then $A_1A = ED = FB$, and
$D_1E/B_1F = A_1D_1/A_1B_1 = \lambda$. (Why? Think of similar triangles.) We can
now calculate the length of the segment in question:
\[ \begin{aligned} DD_1 &= D_1E + ED = \lambda B_1F + \lambda ED + (1 - \lambda) ED \\ &= \lambda (B_1F + FB) + (1 - \lambda) A_1A = \lambda B_1B + (1 - \lambda) A_1A. \end{aligned} \]

6.2. A diagram of the major inequalities for means introduced in this session
accompanies this summary. The arrows show implications between different
inequalities, e.g., the bottom-most arrow indicates that the Hardy-Littlewood-Pólya
Inequality implies Jensen’s Inequality. The dashed arrows refer to implications
being shown here for the case of rational weights only.
[Diagram labels: Baby AM-GM, Weighted Baby AM-GM, Weighted AM-GM, PM, Weighted PM, Weighted JI, HLP]
Using the so-called smoothing technique, we will prove some of these
inequalities in Monovariants III. We will see other fundamental inequalities
and further sophisticated applications of inequalities to olympiad-style
problems in the upcoming Inequalities II.

6.3. Acknowledgments and sources for more inequalities. Some of

the problems here were drawn from notes from the U.S. training session for
the International Mathematics Olympiad. Others are from [73]. Many of
the inequalities themselves can be found in the book [36], which contains a
very thorough treatment of the topic.

7. Hints and Solutions to Selected Problems

Lemma 1. Reasoning backward, we square, simplify, and move to the RHS:
\[ \sqrt{xy} \overset{?}{\le} \frac{x+y}{2} \;\Leftrightarrow\; xy \overset{?}{\le} \left( \frac{x+y}{2} \right)^2 \;\Leftrightarrow\; 4xy \overset{?}{\le} x^2 + 2xy + y^2 \;\Leftrightarrow\; 0 \overset{?}{\le} x^2 - 2xy + y^2. \]
The latter is recognized as $0 \le (x - y)^2$, which is always true, with equality
iff x = y. If you prefer to avoid squaring both sides, multiply the proposed
inequality instead by 2, pull to the RHS and rewrite as a square:
\[ 2\sqrt{xy} \overset{?}{\le} x + y \;\Leftrightarrow\; 0 \overset{?}{\le} (\sqrt{x})^2 - 2\sqrt{x}\sqrt{y} + (\sqrt{y})^2 \;\Leftrightarrow\; 0 \overset{?}{\le} (\sqrt{x} - \sqrt{y})^2. \]
Plugging $t = \sqrt{x} - \sqrt{y}$ into the quotation in the beginning of the session, we
obtain the true inequality $t^2 \ge 0$, with equality iff $\sqrt{x} = \sqrt{y}$, i.e., x = y.
Exercise 1. If L replaces 20m, then the system of equations is x + 2y = L
and x = 2y, leading to y = L/4, x = L/2, and largest area $xy = L^2/8$. ♦
Exercise 2. For the first question, the area xy = 50 is fixed and we want to minimize the fence length x + 2y. By baby AM-GM:
\[ \frac{x + 2y}{2} \overset{\text{AM-GM}}{\ge} \sqrt{x \cdot 2y} = \sqrt{2 \cdot 50} = \sqrt{100} = 10, \]
where equality is obtained iff x = 2y. Plugging into the fixed area, we obtain $2y^2 = 50$, i.e., y = 5 and x = 10, yielding a minimal fence of 20 m again! This is not a coincidence, since this exercise and the original problem are two sides of the same optimization situation (why?).
To answer the other question, we have the fence length fixed at 20 = 2x + 2y, i.e., x + y = 10. To maximize the area, we again apply AM-GM, but this time to variables x and y:
\[ \sqrt{x \cdot y} \le \frac{x + y}{2} = \frac{10}{2} = 5, \]
with equality iff x = y = 5. Thus, the square has maximal area of 25 m$^2$ among all rectangles of 20 m perimeter. Again, any perimeter length will yield a square as the optimal figure in this type of problem.
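The two optimizations in Exercises 1 and 2 can be sanity-checked numerically. The following is only a rough sketch and not part of the original solutions: the function names and the grid-search approach are illustrative assumptions, and a grid search merely approximates the optimum that AM-GM determines exactly.

```python
def best_three_sided(L, steps=2000):
    """Grid-search the pen with x + 2y = L (three fenced sides) for the largest area x*y."""
    candidates = []
    for i in range(1, steps):
        y = (L / 2) * i / steps      # y ranges over the open interval (0, L/2)
        x = L - 2 * y
        candidates.append((x * y, x, y))
    return max(candidates)           # tuple (area, x, y) with the largest area


def best_rectangle(P, steps=2000):
    """Grid-search rectangles of perimeter P (so x + y = P/2) for the largest area."""
    half = P / 2
    candidates = []
    for i in range(1, steps):
        x = half * i / steps
        candidates.append((x * (half - x), x, half - x))
    return max(candidates)


if __name__ == "__main__":
    print(best_three_sided(20))      # compare with x = L/2, y = L/4, area L^2/8
    print(best_rectangle(20))        # compare with the square x = y = 5
```

With L = 20 the search should land on the values x = 10, y = 5 found above, and with perimeter 20 on the square of side 5.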
Exercise 3. Following the hint, we apply AM-GM once to each sum on the LHS and then multiply the three resulting inequalities:
\[ a + b \ge 2\sqrt{ab}, \quad b + c \ge 2\sqrt{bc}, \quad c + a \ge 2\sqrt{ca}, \quad\text{and} \]
\[ (a + b)(b + c)(c + a) \ge 2\sqrt{ab} \cdot 2\sqrt{bc} \cdot 2\sqrt{ca} = 8\sqrt{a^2b^2c^2} = 8abc, \]
where equality is obtained iff equalities are obtained in each of the original three applications of AM-GM, i.e., a = b = c.
Exercise 4. We pair up the numbers from {1, 2, . . . , n} so that each pair adds up to n + 1: (1, n), (2, n − 1), . . . , (n − 1, 2), (n, 1). Note that each number appears twice, and if n is odd then the middle number (n + 1)/2 is paired up with itself. We now apply AM-GM to each such pair:
\[ \frac{1+n}{2} \ge \sqrt{1 \cdot n}, \quad \frac{2+(n-1)}{2} \ge \sqrt{2(n-1)}, \quad \ldots, \quad \frac{(n-1)+2}{2} \ge \sqrt{(n-1)2}, \quad \frac{n+1}{2} \ge \sqrt{n \cdot 1}. \]
Multiplying now all these n inequalities yields $\left(\frac{n+1}{2}\right)^n \ge \sqrt{n!}\,\sqrt{n!} = n!$. Equality can never be obtained for n > 1 since the very first application of AM-GM yields a strict inequality: $\frac{1+n}{2} > \sqrt{n}$.
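The inequality of Exercise 4 is easy to test for small n with exact arithmetic. The snippet below is a hedged illustration only; the helper name `gap` is an assumption introduced here, and a finite check does not replace the pairing argument.

```python
from fractions import Fraction
from math import factorial


def gap(n):
    """Return ((n+1)/2)^n - n! as an exact rational number."""
    return Fraction(n + 1, 2) ** n - factorial(n)


if __name__ == "__main__":
    for n in range(1, 8):
        # The gap is 0 only for n = 1 and strictly positive afterwards.
        print(n, gap(n))
```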
Exercise 5. Pulling all variable expressions to the LHS, and dividing by 3, we turn the inequality into its equivalent version $\left(2\sqrt{x} + \frac{1}{x}\right)/3 \overset{?}{\ge} 1$. The 3 in the denominator suggests using AM-GM for 3 variables, but we have only 2 summands in the numerator! Hence the text suggests splitting $2\sqrt{x}$:
\[ \frac{2\sqrt{x} + \frac{1}{x}}{3} = \frac{\sqrt{x} + \sqrt{x} + \frac{1}{x}}{3} \overset{\text{AM-GM}}{\ge} \sqrt[3]{\sqrt{x}\,\sqrt{x}\,\frac{1}{x}} = \sqrt[3]{x \cdot \frac{1}{x}} = \sqrt[3]{1} = 1, \]
with equality iff $\sqrt{x} = \frac{1}{x}$, i.e., $x\sqrt{x} = 1$, $x^3 = 1$, and x = 1.
Exercise 6. We see the product of many ab’s on the RHS; if we divide everything by n, we also see a denominator of n on the LHS; still, we do not see a sum on the LHS! But there is a common factor of (a − b) on both sides:
\[ a^n - b^n = (a-b)(a^{n-1} + a^{n-2}b + \cdots + a^{n-1-k}b^k + \cdots + ab^{n-2} + b^{n-1}) \overset{?}{\ge} n(a-b)(ab)^{\frac{n-1}{2}}. \]
If a = b, then both sides are 0 and we are done. If a > b, we divide by n(a − b) without changing the direction of the inequality:
\[ \frac{a^{n-1} + a^{n-2}b + \cdots + a^{n-1-k}b^k + \cdots + ab^{n-2} + b^{n-1}}{n} \overset{?}{\ge} (ab)^{\frac{n-1}{2}}. \]
We can now apply AM-GM to the n summands $a^{n-1-k}b^k$ on the LHS:
\[ \text{LHS} \ge \sqrt[n]{(a^{n-1})(a^{n-2}b^1) \cdots (a^{n-1-k}b^k) \cdots (a^1b^{n-2})(b^{n-1})}. \]
How many a’s are under the radical? Adding up the exponents of a, we get
\[ (n-1) + (n-2) + \cdots + 1 = \frac{(n-1+1)(n-1)}{2} = \frac{n(n-1)}{2}. \]
Therefore taking the nth root in the RHS leads to
\[ \text{LHS} \ge \sqrt[n]{a^{\frac{n(n-1)}{2}}\, b^{\frac{n(n-1)}{2}}} = a^{\frac{n-1}{2}}\, b^{\frac{n-1}{2}} = (ab)^{\frac{n-1}{2}}. \]
Incidentally, when a > b, the terms $a^{n-1}$ and $b^{n-1}$ in the application of AM-GM are not equal, so the inequality is strict.
Exercise 7. Let (x, y, z) be the corner of the box such that x, y, z > 0. Then $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ and the box’s volume is (2x)(2y)(2z) = 8xyz (why?). If we blindly apply AM-GM to the product xyz, we end up with $\sqrt[3]{xyz} \le \frac{x+y+z}{3}$, and we do not have information about the last average. We need to involve $x^2$, $y^2$, and $z^2$, and so instead we try
\[ \sqrt[3]{x^2y^2z^2} \overset{\text{AM-GM}}{\le} \frac{x^2 + y^2 + z^2}{3}. \]
But again we are out of luck: we need the constants a, b, and c to appear on the RHS! Following the hint, we realize that in the defining equation for the ellipsoid, equality is obtained if $\frac{x}{a} = \frac{y}{b} = \frac{z}{c} = \frac{1}{\sqrt{3}}$, and since the given sum involves the terms $\frac{x^2}{a^2}$, $\frac{y^2}{b^2}$, and $\frac{z^2}{c^2}$, we apply AM-GM to these and obtain
\[ \sqrt[3]{\left(\frac{x^2}{a^2}\right)\left(\frac{y^2}{b^2}\right)\left(\frac{z^2}{c^2}\right)} \overset{\text{AM-GM}}{\le} \frac{\left(\frac{x^2}{a^2}\right) + \left(\frac{y^2}{b^2}\right) + \left(\frac{z^2}{c^2}\right)}{3} = \frac{1}{3}. \]
By cubing, taking the square root, and clearing a denominator, we solve for the product xyz, and we find that $xyz \le \frac{abc}{3\sqrt{3}}$. If $x = \frac{a}{\sqrt{3}}$, $y = \frac{b}{\sqrt{3}}$, and $z = \frac{c}{\sqrt{3}}$, then equality holds everywhere, so the maximum volume is $\frac{8abc}{3\sqrt{3}}$.
In a similar vein, the largest area of a rectangle inscribed in an ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ with sides parallel to the axes is 2ab; the corner of this optimal rectangle in the first quadrant is $\left(x = \frac{a}{\sqrt{2}},\ y = \frac{b}{\sqrt{2}}\right)$. ♦
Problem 2. Following the hint, we multiply out and expand the LHS. It consists of $2^n$ products of the form $a_{i_1}a_{i_2} \cdots a_{i_k}$, corresponding to the $2^n$ subsets $\{i_1, i_2, \ldots, i_k\}$ of the indices {1, 2, . . . , n}, each such subset indicating which $a_i$’s have been chosen and which have been replaced by 1s when multiplying out. For every k = 0, 1, . . . , n we apply AM-GM to the sum of all such $\binom{n}{k}$ products that have exactly k terms:
\[ \sum_{\{i_1, i_2, \ldots, i_k\}} a_{i_1}a_{i_2} \cdots a_{i_k} \overset{\text{AM-GM}}{\ge} \binom{n}{k} \sqrt[\binom{n}{k}]{\prod_{\{i_1, i_2, \ldots, i_k\}} a_{i_1}a_{i_2} \cdots a_{i_k}}. \tag{3} \]
If the notation is too intimidating, then use n = 3 to see the pattern. Each $a_i$ appears in exactly $\binom{n-1}{k-1}$ of these products: after removing $a_i$, this is the number of ways to choose the other (k − 1) elements from the (n − 1) leftover numbers $a_j$. Thus, each $a_i$ is raised to the power
\[ \frac{\binom{n-1}{k-1}}{\binom{n}{k}} = \frac{(n-1)!}{(k-1)!\,(n-k)!} \cdot \frac{k!\,(n-k)!}{n!} = \frac{k}{n}. \]
Hence, the RHS of (3) equals $\binom{n}{k}(a_1a_2 \cdots a_n)^{\frac{k}{n}} = \binom{n}{k} g^k$. For the whole sum, we run this argument for k = 0, 1, . . . , n and recover on the LHS the original product that was multiplied out:
\[ (1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge \sum_{k=0}^{n} \binom{n}{k} g^k = (1 + g)^n. \]
The last is a famous combinatorial identity called the Binomial Theorem. Equality is achieved only if equalities are obtained everywhere in the applications of AM-GM in (3). In particular, for k = 1 we must have the singleton products equal among themselves, i.e., $a_1 = a_2 = \cdots = a_n\ (= g)$, and these do produce an overall equality.

The Binomial Theorem can be avoided by using the so-called symmetric functions and some abstract manipulation of the LHS, or by smoothing algorithms (to be done in Inequalities II and Monovariants III, respectively).
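For readers who like to experiment, the inequality of Problem 2 can be probed on random inputs. This is a minimal sketch under stated assumptions: the name `both_sides` is hypothetical, g is taken to be the geometric mean of the a_i as in the solution, and floating-point arithmetic is used, so comparisons need a small tolerance.

```python
import math
import random


def both_sides(a):
    """Return (LHS, RHS) of (1+a_1)...(1+a_n) >= (1+g)^n for positive numbers a."""
    n = len(a)
    g = math.prod(a) ** (1.0 / n)          # geometric mean of the a_i
    lhs = math.prod(1 + x for x in a)
    rhs = (1 + g) ** n
    return lhs, rhs


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.uniform(0.01, 10) for _ in range(5)]
    print(both_sides(sample))              # LHS should not be smaller than RHS
    print(both_sides([2.0, 2.0, 2.0]))     # equal a_i: the two sides should agree
```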

Exercise 8. If △ABC has a right angle at B, then by the Pythagorean Theorem, $AC^2 = AB^2 + BC^2$. On the other hand, by PM we have
\[ \frac{AB + BC}{2} = P_1 \le P_2 = \sqrt{\frac{AB^2 + BC^2}{2}} = \sqrt{\frac{AC^2}{2}} = \frac{AC}{\sqrt{2}}, \quad\text{so}\quad AB + BC \le \sqrt{2}\,AC. \]
Equality is obtained iff AB = BC, i.e., △ABC is right isosceles.
Exercise 9. The equation of the plane through (r, 0, 0), (0, s, 0), and (0, 0, t) is $\frac{x}{r} + \frac{y}{s} + \frac{z}{t} = 1$. The plane passes through point (a, b, c), so $\frac{a}{r} + \frac{b}{s} + \frac{c}{t} = 1$. The volume of the tetrahedron is rst/6. GM-HM for $\frac{r}{a}$, $\frac{s}{b}$, and $\frac{t}{c}$ implies
\[ \sqrt[3]{\frac{r}{a} \cdot \frac{s}{b} \cdot \frac{t}{c}} = P_0 \ge P_{-1} = \frac{3}{\frac{a}{r} + \frac{b}{s} + \frac{c}{t}} = \frac{3}{1} = 3, \]
so rst ≥ 27abc, with equality if and only if $\frac{a}{r} = \frac{b}{s} = \frac{c}{t} = \frac{1}{3}$, i.e., r = 3a, s = 3b, and t = 3c. Since the bound on rst is from below, the extremal volume is a minimum: 9abc/2.
The two-dimensional version of the problem asks for the triangle of smallest area bounded by the positive parts of the x- and y-axes and a line passing through a fixed point (a, b) with a, b > 0. Analogously, the smallest area is 2ab, attained when the line has x-intercept 2a and y-intercept 2b. ♦
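Lemma 2. Let $y(r) = \frac{x_1^r + \cdots + x_n^r}{n}$. Then $\lim_{r \to 0} y(r) = \frac{n}{n} = 1$, and
\[ y'(r) = \frac{x_1^r \ln x_1 + \cdots + x_n^r \ln x_n}{n}, \quad\text{so}\quad \lim_{r \to 0} y'(r) = \frac{\ln x_1 + \cdots + \ln x_n}{n} = \ln (x_1 \cdots x_n)^{\frac{1}{n}}. \]
If the limits exist,
\[ P_0 = \lim_{r \to 0} P_r = \lim_{r \to 0} \big(y(r)\big)^{\frac{1}{r}} = \lim_{r \to 0} e^{\frac{1}{r} \ln y(r)} = e^{\lim_{r \to 0} \frac{\ln y(r)}{r}}. \]
The new problem $\lim_{r \to 0} \frac{\ln y(r)}{r}$ is solved by l’Hôpital’s Rule (l’H), since both top and bottom go to 0 when r → 0: by continuity, $\lim_{r \to 0} \ln y(r) = \ln 1 = 0$. So
\[ \lim_{r \to 0} \frac{\ln y(r)}{r} \overset{\text{l'H}}{=} \lim_{r \to 0} \frac{(\ln y(r))'}{r'} = \lim_{r \to 0} \frac{y'(r)/y(r)}{1} = \frac{\lim_{r \to 0} y'(r)}{\lim_{r \to 0} y(r)} = \ln (x_1 x_2 \cdots x_n)^{1/n}. \]
Backtracking, we get $P_0 = e^{\ln (x_1 x_2 \cdots x_n)^{1/n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$.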
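The limit in Lemma 2 can be observed numerically by evaluating P_r for small r. The sketch below is illustrative only; `power_mean` is a name introduced here, and treating r = 0 as the geometric mean is exactly the convention the lemma justifies.

```python
import math


def power_mean(xs, r):
    """Power mean P_r of positive numbers xs; r = 0 is taken to be the geometric mean."""
    n = len(xs)
    if r == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** r for x in xs) / n) ** (1.0 / r)


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 8.0]
    for r in (1.0, 0.1, 0.01, 0.001, -0.001):
        print(r, power_mean(xs, r))        # values approach the geometric mean
    print("P_0 =", power_mean(xs, 0))
```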
Lemma 3. If $x_k = \min\{x_1, \ldots, x_n\}$ and r < 0, then $x_i^r = \frac{1}{x_i^{|r|}} \le \frac{1}{x_k^{|r|}} = x_k^r$. After replacing $x_m$ by $x_k$, the rest of the solution goes exactly as in the text for $P_\infty$, with the observation that $\lim_{r \to -\infty} \sqrt[r]{n} = \lim_{r \to -\infty} n^{\frac{1}{r}} = n^0 = 1$.

Exercise 10. In addition to x2 and x3 in Figure 4, below are graphs of

examples of the listed types of functions (cf. Fig. 6). Some convex parts are
drawn in solid lines. ♦

[Figure 6. Graphs of convex examples. Panel labels recoverable from the figure include tan x, $e^x$, $x^{-1}$, −cos x, −sin x, and −log x.]

Lemma 4. If f(x) and g(x) are convex functions on [a, b], then by the algebraic definition of convexity, for any λ ∈ [0, 1]:
f((1 − λ)a + λb) ≤ (1 − λ)f(a) + λf(b) and g((1 − λ)a + λb) ≤ (1 − λ)g(a) + λg(b).
Adding these two inequalities yields
f((1 − λ)a + λb) + g((1 − λ)a + λb) ≤ (1 − λ)(f(a) + g(a)) + λ(f(b) + g(b)),
which is the algebraic definition of convexity on [a, b] for the function f + g. A constant or linear function g(x) is automatically convex, so adding it to a convex function preserves convexity.
Exercise 11. Since all three functions $x^2$, $x^3$, and sin x are continuous on R, we can apply to them the Continuity-and-Midpoint (CM) criterion. To start, for a, b ≥ 0 we apply special cases of the PM inequality to $x^2$ and $x^3$:
\[ \frac{a^2 + b^2}{2} \overset{(P_2 \ge P_1)}{\ge} \left(\frac{a+b}{2}\right)^2 \quad\text{and}\quad \frac{a^3 + b^3}{2} \overset{(P_3 \ge P_1)}{\ge} \left(\frac{a+b}{2}\right)^3, \]
and hence, by the CM criterion, $x^2$ and $x^3$ are convex on [0, ∞). As for showing that $x^2$ is convex on all of R, what would happen to the inequality if you replace a and b by ±a and ±b? ♦

Using some trigonometry, we get
\[ \frac{\sin a + \sin b}{2} = \sin\left(\frac{a+b}{2}\right) \cos\left(\frac{a-b}{2}\right). \]
How does the RHS compare with $\sin\left(\frac{a+b}{2}\right)$? Although $\left|\cos\left(\frac{a-b}{2}\right)\right| \le 1$, we have to be careful with signs, or we will get the inequality in the wrong direction! If −π ≤ a, b ≤ 0, then $\sin\left(\frac{a+b}{2}\right) \le 0$ and $1 \ge \cos\left(\frac{a-b}{2}\right) \ge 0$ (why?), so convexity of sin x on [−π, 0] follows from the inequality
\[ \frac{\sin a + \sin b}{2} = \sin\left(\frac{a+b}{2}\right) \cos\left(\frac{a-b}{2}\right) \ge \sin\left(\frac{a+b}{2}\right). \]
To apply the First Derivative Test, we find out where derivatives increase:
• $(x^2)' = 2x$ increases for all x, so $x^2$ is convex on R;
• $(x^3)' = 3x^2$ increases for all x ≥ 0, so $x^3$ is convex on [0, ∞);
• $(\sin x)' = \cos x$ increases for −π ≤ x ≤ 0, so sin x is convex there.
For the reader interested in applying the First Derivative Test to the functions in Exercise 10, here are the corresponding derivatives:
• $(x^r)' = rx^{r-1}$, $(-x^r)' = -rx^{r-1}$, $\left(\frac{r}{s+x}\right)' = -\frac{r}{(s+x)^2}$, $(-\ln x)' = -\frac{1}{x}$;
• $(-\sin x)' = -\cos x$, $(-\cos x)' = \sin x$, $(\tan x)' = \frac{1}{\cos^2 x}$, $(e^x)' = e^x$.

Finally we check convexity with the Second Derivative Test:
• $(x^2)'' = (2x)' = 2 > 0$, so $x^2$ is convex on R;
• $(x^3)'' = (3x^2)' = 6x > 0$ for all x > 0, so $x^3$ is convex on [0, ∞);
• $(\sin x)'' = (\cos x)' = -\sin x \ge 0$ on [−π, 0], so sin x is convex there.
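Lemma 5. Apply JI to the convex function f(x) = − ln x on (0, ∞):
\[ -\frac{\ln x_1 + \cdots + \ln x_n}{n} \ge -\ln\left(\frac{x_1 + \cdots + x_n}{n}\right) \quad\text{implies}\quad \frac{1}{n} \ln(x_1 \cdots x_n) \le \ln\left(\frac{x_1 + \cdots + x_n}{n}\right), \]
so $\ln \sqrt[n]{x_1 \cdots x_n} \le \ln\left(\frac{x_1 + \cdots + x_n}{n}\right)$, or $P_0 = \sqrt[n]{x_1 \cdots x_n} \le \frac{x_1 + \cdots + x_n}{n} = P_1$, with equality iff $x_1 = x_2 = \cdots = x_n$.
For $P_r \ge P_s$ with r ≥ s > 0, we apply JI to $x_1^s, x_2^s, \ldots, x_n^s > 0$ and the function $g(x) = x^{r/s}$, which is convex on [0, ∞) because $\frac{r}{s} \ge 1$:
\[ \frac{(x_1^s)^{r/s} + \cdots + (x_n^s)^{r/s}}{n} \overset{\text{JI}}{\ge} \left(\frac{x_1^s + \cdots + x_n^s}{n}\right)^{r/s}. \]
Taking rth roots on both sides gives
\[ P_r = \left(\frac{x_1^r + \cdots + x_n^r}{n}\right)^{1/r} \ge \left(\frac{x_1^s + \cdots + x_n^s}{n}\right)^{1/s} = P_s. \]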
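The monotonicity $P_r \ge P_s$ for r ≥ s can also be watched in action. The following is a hedged sketch, not the book's method: `power_mean` and `is_nondecreasing` are names assumed here, and the check is done in floating point with a small tolerance.

```python
import math


def power_mean(xs, r):
    """Power mean P_r of positive numbers xs, with P_0 defined as the geometric mean."""
    n = len(xs)
    if r == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** r for x in xs) / n) ** (1.0 / r)


def is_nondecreasing(values, tol=1e-12):
    """True if each value is at least the previous one, up to a small tolerance."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    xs = [1.0, 2.0, 4.0, 8.0]
    exponents = [-3, -1, 0, 1, 2, 3]       # includes HM (r=-1), GM (r=0), AM (r=1)
    means = [power_mean(xs, r) for r in exponents]
    print(list(zip(exponents, means)))
    print(is_nondecreasing(means))
```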
Exercise 12. To discover the needed convex function, reason backward:
\[ x^x \overset{?}{\ge} \left(\frac{x+1}{2}\right)^{x+1} \;\overset{\ln}{\Longleftrightarrow}\; \ln x^x \overset{?}{\ge} \ln\left(\frac{x+1}{2}\right)^{x+1} \;\Longleftrightarrow\; x \ln x \overset{?}{\ge} (x+1) \ln\frac{x+1}{2}. \]
The function f(x) = x ln x participates on both sides of the last inequality, so we check if f(x) is strictly convex for x > 0. This is true by the First Derivative Test since f′(x) = ln x + 1 is strictly increasing for x > 0. By the definition of convexity of f(x) = x ln x on [x, 1] (or [1, x]) and t = 1/2,
\[ \frac{f(x) + f(1)}{2} \ge f\left(\frac{x+1}{2}\right), \quad\text{so}\quad \frac{x \ln x + 1 \ln 1}{2} \ge \frac{x+1}{2} \ln\frac{x+1}{2}, \]
which simplifies to the desired inequality, with equality iff x = 1.
Exercise 13. Let $x_i$ be as in the hint, so $x_i \in (0, \pi)$ and $x_1 + \cdots + x_n = \pi$. Then the length of the ith side of the n-gon is $2R \sin x_i$, where R is the radius of the circle. Thus, we need to maximize the sum of all $\sin x_i$. But f(x) = − sin x is convex on [0, π], hence by JI:
\[ -\frac{\sin x_1 + \cdots + \sin x_n}{n} \ge -\sin\left(\frac{x_1 + \cdots + x_n}{n}\right) = -\sin\frac{\pi}{n}. \tag{4} \]
Therefore, $\sin x_1 + \cdots + \sin x_n \le n \sin\frac{\pi}{n}$, with equality iff $x_1 = \cdots = x_n$, i.e., the polygon is regular.
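The conclusion of Exercise 13 can be illustrated by comparing random inscribed n-gons with the regular one. This is a rough sketch under assumptions made here: angles are generated by cutting the interval [0, π] at random points, and the helper names are hypothetical.

```python
import math
import random


def random_angles(n, rng):
    """Return n positive angles x_1, ..., x_n that add up to pi."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    points = [0.0] + cuts + [1.0]
    return [math.pi * (b - a) for a, b in zip(points, points[1:])]


def perimeter(angles, R=1.0):
    """Perimeter of the inscribed polygon whose i-th side has length 2R sin(x_i)."""
    return sum(2 * R * math.sin(x) for x in angles)


if __name__ == "__main__":
    rng = random.Random(1)
    n = 7
    regular = perimeter([math.pi / n] * n)
    best_random = max(perimeter(random_angles(n, rng)) for _ in range(1000))
    print(regular, best_random)            # the regular 7-gon should not be beaten
```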

Exercise 14. Modifying slightly the solution in Exercise 13, we immediately obtain the second inequality: $\sin \theta_1 + \cdots + \sin \theta_n \le n \sin(2\pi/n)$, with equality iff all $\theta_i$’s are equal.
For the first inequality, note that there are at least four angles $\theta_1$, $\theta_2$, $\theta_3$, and $\theta_4$, or else their sum would be at most 3π/2 and not 2π. We arrange the angles $\theta_i$ in decreasing order and apply HLP to the convex function f(x) = − sin x on [0, π/2] and to the sequences $a_1 = \cdots = a_4 = \frac{\pi}{2}$, $a_5 = \cdots = a_n = 0$, and $b_i = \theta_i$ for all i. (Check that $\{a_n\}$ majorizes $\{b_n\}$.) Thus,
\[ -4 \sin\frac{\pi}{2} - 0 - \cdots - 0 \ge -\sin \theta_1 - \cdots - \sin \theta_n \;\Rightarrow\; 4 \le \sin \theta_1 + \cdots + \sin \theta_n, \]
with equality iff four $\theta_i$’s are right angles and the rest are 0s.
Lemma 6. Arrange the numbers in JI in decreasing order: $x_1 \ge \cdots \ge x_n$, and let c be their average $(x_1 + \cdots + x_n)/n$. Note that the average of $x_1, x_2, \ldots, x_k$ decreases when we include the next $x_{k+1}$. Indeed,
\[ \frac{x_1 + \cdots + x_k}{k} \overset{?}{\ge} \frac{x_1 + \cdots + x_k + x_{k+1}}{k+1} \]
\[ \Leftrightarrow\; (k+1)(x_1 + \cdots + x_k) \overset{?}{\ge} k(x_1 + \cdots + x_k + x_{k+1}) \;\Leftrightarrow\; x_1 + \cdots + x_k \overset{?}{\ge} kx_{k+1}, \]
and the latter is certainly true since each number on the LHS is $\ge x_{k+1}$. In particular, the smallest average is the total average c, so $x_1 + \cdots + x_k \ge kc$. This means that the sequence $x_1, \ldots, x_n$ majorizes the constant sequence c, . . . , c. Applying HLP to the convex function f(x), we arrive at JI:
\[ f(x_1) + f(x_2) + \cdots + f(x_n) \ge f(c) + f(c) + \cdots + f(c) = nf(c). \]
Hence $\frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n} \ge f\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right)$.

Lemma 7. Apply the weighted inequalities with $\lambda_1 = \cdots = \lambda_n = \frac{1}{n}$, in which case $\lambda_1 + \cdots + \lambda_n = 1$ as needed. ♦

Lemma 8. If the weights $\lambda_i$ are rational numbers, multiplying by the lcm of their denominators, we can assume that all of them have the same denominator q: $\lambda_i = \frac{p_i}{q}$ with q and the $p_i$’s positive integers. Since $\lambda_1 + \lambda_2 + \cdots + \lambda_n = 1$, we have $p_1 + p_2 + \cdots + p_n = q$. Now we construct a list in which each $x_i$ is repeated $p_i$ times, for a total of q variables, i.e.,
\[ \{\underbrace{x_1, \ldots, x_1}_{p_1}, \underbrace{x_2, \ldots, x_2}_{p_2}, \ldots, \underbrace{x_n, \ldots, x_n}_{p_n}\}, \]
and apply the ordinary AM-GM:
\[ \lambda_1 x_1 + \cdots + \lambda_n x_n = \frac{p_1 x_1 + \cdots + p_n x_n}{q} \ge \sqrt[q]{x_1^{p_1} \cdots x_n^{p_n}} = x_1^{\frac{p_1}{q}} \cdots x_n^{\frac{p_n}{q}}. \]
The last is $x_1^{\lambda_1} \cdots x_n^{\lambda_n}$, which is the RHS of the weighted AM-GM inequality for non-negative rational weights.
Similarly, JI implies weighted JI, using the same list of repeated variables:
\[ \lambda_1 f(x_1) + \cdots + \lambda_n f(x_n) = \frac{p_1 f(x_1) + \cdots + p_n f(x_n)}{q} \overset{\text{JI}}{\ge} f\left(\frac{p_1 x_1 + \cdots + p_n x_n}{q}\right). \]
The last is the desired RHS $f(\lambda_1 x_1 + \cdots + \lambda_n x_n)$ of the weighted JI.

PM too implies weighted PM, with the same list of variables, r ≥ s > 0:
\[ P_r = (\lambda_1 x_1^r + \cdots + \lambda_n x_n^r)^{1/r} = \left(\frac{p_1 x_1^r + \cdots + p_n x_n^r}{q}\right)^{1/r} \overset{\text{PM}}{\ge} \left(\frac{p_1 x_1^s + \cdots + p_n x_n^s}{q}\right)^{1/s} = P_s. \]
We leave it to the reader to revise this to include the weighted $P_0$ too. ♦

Exercise 15. The powers p, q, and r play here the role of the weights λi ,
but we need to change their order to match all three products on the RHS.
Hence, we apply weighted AM-GM three times to the variables {a, b, c} with
the weight arrangements (p, q, r), (r, p, q), and (q, r, p):
\[ pa + qb + rc \ge a^p b^q c^r, \quad ra + pb + qc \ge a^r b^p c^q, \quad qa + rb + pc \ge a^q b^r c^p. \]
Adding these up and using that p+q+r = 1 yields the desired inequality. 
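Exercise 16. The triangle inequality ensures that a + b − c > 0, etc. We divide by $a^a b^b c^c$ and take the (a + b + c)th root on both sides:
\[ \left(\frac{a+b-c}{a}\right)^a \left(\frac{b+c-a}{b}\right)^b \left(\frac{c+a-b}{c}\right)^c \overset{?}{\le} 1 \]
\[ \Leftrightarrow\; \left(\frac{a+b-c}{a}\right)^{\frac{a}{a+b+c}} \left(\frac{b+c-a}{b}\right)^{\frac{b}{a+b+c}} \left(\frac{c+a-b}{c}\right)^{\frac{c}{a+b+c}} \overset{?}{\le} 1. \]
The three exponents add up to 1 and are positive. Hence they can serve as weights $\lambda_i$, making the LHS the weighted GM of $\frac{a+b-c}{a}$, $\frac{b+c-a}{b}$, and $\frac{c+a-b}{c}$, which is less than or equal to the weighted AM, i.e.,
\[ \text{LHS} \le \frac{a}{a+b+c} \cdot \frac{a+b-c}{a} + \frac{b}{a+b+c} \cdot \frac{b+c-a}{b} + \frac{c}{a+b+c} \cdot \frac{c+a-b}{c} \]
\[ = \frac{a+b-c}{a+b+c} + \frac{b+c-a}{a+b+c} + \frac{c+a-b}{a+b+c} = \frac{a+b+c}{a+b+c} = 1. \]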
Session 10

Multiplicative Functions. Part II

Dirichlet Product and Möbius Inversion

Zvezdelina Stankova

Sneak Preview. This session is a direct continuation of Multiplicative functions Part I; even the numbering of sections and statements here follows suit. Sum-functions will be generalized via the Dirichlet product; arithmetic functions will be inverted via the Möbius function μ; and upon discovery of the Euler
function φ, the ∞-Raffle Problem will (yet again!) be conquered in a most ele-
gant way. Occasionally, basic operations on remainders (cf. congruence modulo n
in Number Theory I) and knowledge of binomial coefficients and counting tech-
niques (cf. Combinatorics I) will aid our studies. The advanced reader familiar
with the theory should ensure that he/she can solve all problems on μ and φ
in Subsection 5.7 and Section 7 before moving on to the group structure of M,
Dirichlet series, and the Riemann zeta-function in Part III.

4. Dirichlet Product

4.1. Redefining multiplication. The sum-function $S_f(n) := \sum_{d \mid n} f(d)$, defined in Session 4, is a special case of a much broader notion. Just as
you have learned (some time ago) how to multiply numbers, we will learn
here how to multiply functions. Certainly, you can do it in the usual way:
(f · g)(n) = f (n)g(n), i.e., simply multiply the corresponding values of f
and g. For example, id · ε = ε and f · ι = f for any f ∈ A (check it!).
But, as will soon become clear, this function product does not capture the
basic properties of multiplicative functions in which we are interested, and
it certainly does not generalize sum-functions. A different product on the
set A of arithmetic functions is called for.
Definition 5. Let f and g be two arithmetic functions. We define their Dirichlet product (a.k.a. Dirichlet convolution) by
\[ (f \ast g)(n) = \sum_{d_1 d_2 = n} f(d_1)\, g(d_2) = \sum_{d \mid n} f(d)\, g\!\left(\frac{n}{d}\right). \tag{12} \]
In other words, the sum is taken over all pairs of divisors $(d_1, d_2)$ that multiply to n: $d_1 d_2 = n$. Solving for the divisor $d_2 = n/d_1$ yields the second equivalent summation in (12). For instance,
\[ (f \ast g)(6) = \sum_{d \mid 6} f(d)\, g\!\left(\frac{6}{d}\right) = f(1)g(6) + f(2)g(3) + f(3)g(2) + f(6)g(1). \]
The ordinary function product is now transformed into something entirely different, as demonstrated by the next exercise.
Exercise 10. Calculate f ∗ ε and f ∗ ι for any f ∈ A.
Solution: As ε is 0 except for ε(1) = 1, the Dirichlet products with ε are easy to calculate:
\[ (f \ast \varepsilon)(n) = \sum_{d_1 d_2 = n} f(d_1)\, \varepsilon(d_2) = 0 + \cdots + 0 + f(n) \cdot \varepsilon(1) = f(n). \]
Therefore, f ∗ ε = f; in particular, id ∗ ε = id. Next,
\[ (f \ast \iota)(n) = \sum_{d \mid n} f(d)\, \iota\!\left(\frac{n}{d}\right) = \sum_{d \mid n} f(d) \cdot 1 = \sum_{d \mid n} f(d) = S_f(n). \]
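Definition 5 translates almost verbatim into code. The following is a minimal sketch, assuming arithmetic functions are modeled as Python callables on positive integers; the names `divisors`, `dirichlet`, `eps`, `iota`, and `ident` are choices made here for illustration (ε, ι, and id in the text).

```python
def divisors(n):
    """All positive divisors of n, found by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f, g):
    """Dirichlet product: (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def eps(n):
    """The two-valued function: 1 at n = 1 and 0 elsewhere."""
    return 1 if n == 1 else 0


def iota(n):
    """The constant function 1."""
    return 1


def ident(n):
    """The identity function id(n) = n."""
    return n


if __name__ == "__main__":
    sum_of_id = dirichlet(ident, iota)     # sum-function of id: sum of divisors
    same_as_id = dirichlet(ident, eps)     # D-multiplying by eps changes nothing
    print([sum_of_id(n) for n in range(1, 11)])
    print([same_as_id(n) for n in range(1, 11)])
```

Trial division is slow for large n, but it keeps the sketch close to the definition.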

Along the way, we have discovered that sum-functions are, not surprisingly, a particular instance of the D-product:¹
Property 1. D-multiplying by ι produces the sum-function:
(13) f ∗ ι = $S_f$ for any f ∈ A.

Recall the reformulation of ∞-Raffle in Problem 1. It described R as an arithmetic function whose sum-function is id: $S_R$ = id. By Property 1, we can now rewrite the sum-function $S_R$ as the D-product R ∗ ι = id and have yet another reformulation:

Problem 1 (∞-Raffle). An arithmetic function R satisfies R ∗ ι = id. Solve for R and prove that R(n) ≥ 1 for all n ∈ N.

Can we really solve for R from here? We’ll answer this affirmatively in a bit.

4.2. D-product and number-product are alike. Just as multiplication of numbers produces a number (5·7 = 35 ∈ N), so does D-multiplication start with two arithmetic functions f and g and produce an arithmetic function f ∗ g. Formally, we say that ∗ is a binary operation² on A, i.e., ∗ : A × A → A sending the pair (f, g) ↦ f ∗ g.

The first properties of number-multiplication that you have probably

used are commutativity and associativity: mn = nm and (mn)k = m(nk)
for all m, n, k ∈ N. The same properties hold true if we extend multiplication
to rational, real, or even complex numbers. Likewise,
¹ We will abbreviate “Dirichlet” to “D-” in various expressions from now on.
² We met with binary operations in Complex Numbers I.

Property 2. D-product is commutative and associative: f ∗ g = g ∗ f and (f ∗ g) ∗ h = f ∗ (g ∗ h) for all f, g, h ∈ A.

Partial Solution: Commutativity of ∗ is automatic from the symmetry of the first sum in Definition 5: switching the places of f and g results in the same D-product. That ∗ is also associative follows from a convenient way of rewriting Definition 5 for the triple product (f ∗ g) ∗ h:
\[ \big((f \ast g) \ast h\big)(n) = \sum_{d_1 d_2 d_3 = n} f(d_1)\, g(d_2)\, h(d_3), \tag{14} \]
where the sum is taken over all triples $(d_1, d_2, d_3)$ of divisors of n that multiply to n (prove this!). As the RHS of (14) is symmetric with respect to f, g, and h, it also equals $\big(f \ast (g \ast h)\big)(n)$. Thus, we can write:

f ∗ g ∗ h = (f ∗ g) ∗ h = f ∗ (g ∗ h). ♦
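Property 2 and the triple-sum formula (14) can be spot-checked on a few sample functions. This is only an illustrative sketch: the helper names and the sample functions f, g, h are assumptions made here, and agreement on finitely many n is evidence, not a proof.

```python
def divisors(n):
    """All positive divisors of n, found by trial division."""
    return [d for d in range(1, n + 1) if n % d == 0]


def dirichlet(f, g):
    """Dirichlet product: (f * g)(n) = sum over d | n of f(d) g(n/d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))


def triple(f, g, h):
    """Sum of f(d1) g(d2) h(d3) over all triples with d1 * d2 * d3 = n, as in (14)."""
    return lambda n: sum(
        f(d1) * g(d2) * h(n // (d1 * d2))
        for d1 in divisors(n)
        for d2 in divisors(n // d1)
    )


if __name__ == "__main__":
    f = lambda n: n * n + 1                # arbitrary sample functions
    g = lambda n: 1 if n % 2 else -3
    h = lambda n: n
    left = dirichlet(dirichlet(f, g), h)
    right = dirichlet(f, dirichlet(g, h))
    print(all(left(n) == right(n) == triple(f, g, h)(n) for n in range(1, 61)))
```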

4.3. Multiplicative identity. Suppose your little sister asks you: “What
is the number 1?” How would you describe 1 to identify it uniquely among
all other numbers? Answering “The number 1 signifies one object.” is a
circular definition. Saying “1 is such that 1 + 1 = 2.” is no good either:
you are defining 1 via another number 2; besides, I prohibit you from using
in your description any operation other than multiplication . . . . Well, here
is what you will “learn” about 1 in any abstract algebra course: 1 is the
unique number such that multiplying any number by 1 gives that number,
i.e., n · 1 = 1 · n = n for all n ∈ N. Again, this works equally well in the sets
of rational, real, or complex numbers too.
Moving to the set A of arithmetic functions with product ∗, the question is: what function plays the role of “1” and deserves to be called the multiplicative identity of A? Our calculations in Exercise 10 point to the answer:

Property 3. With respect to the D-product, the multiplicative identity in A is the two-valued function ε, i.e., f ∗ ε = ε ∗ f = f for all f ∈ A.

Note that any multiplicative identity (if it exists) is unique: this is well known in abstract algebra. Indeed, in our context, if ε′ is another multiplicative identity in A, then ε′ = ε′ ∗ ε = ε (why?), implying uniqueness of ε.

At this point it is worth comparing the D-product with the usual product of functions. The ordinary product is commutative and associative: f · g = g · f and (f · g) · h = f · (g · h), but the multiplicative identity with respect to it is the function ι: as observed earlier, f · ι = ι · f = f for any f ∈ A. With respect to the D-product, ι is not the multiplicative identity, but has the nice property of transforming each function f into its sum-function $S_f$: ι ∗ f = f ∗ ι = $S_f$. The two types of function products ∗ and · have started to diverge, and they will continue to do so as we study our next notion.

4.4. Multiplicative inverses. Our inquisitive little sister is bothering us again: “What is the number $\frac{1}{3}$?” You could reply: “$\frac{1}{3}$ is 1 divided by 3.” But we haven’t defined division yet! “$\frac{1}{3}$ is the reciprocal of 3.” Likewise, what is a reciprocal? Correction: “$\frac{1}{3}$ is that number which, added 3 times to itself, gives 1.” Yet, addition must not be used either in this description. . . . A last attempt finally does it: “$\frac{1}{3}$ is the number which, multiplied by 3, gives 1.” That’s right! We can define now the reciprocal $n^{-1}$ (a.k.a. multiplicative inverse) of any number n as the solution x to the equation n · x = x · n = 1.³
We make an analogous definition in the set of arithmetic functions:
Definition 6. The Dirichlet inverse $f^{-1}$ of f ∈ A is an arithmetic function whose D-product with f is the multiplicative identity ε: $f \ast f^{-1} = f^{-1} \ast f = \varepsilon$.
Uniqueness of multiplicative inverses is another well-known fact from abstract algebra. In our context, if $\hat{f}^{-1}$ is another D-inverse of f, the trick is to calculate a specific triple D-product (in the middle below):
\[ \hat{f}^{-1} = \hat{f}^{-1} \ast \varepsilon = \hat{f}^{-1} \ast (f \ast f^{-1}) = (\hat{f}^{-1} \ast f) \ast f^{-1} = \varepsilon \ast f^{-1} = f^{-1}, \]
from which $\hat{f}^{-1} = f^{-1}$ and the D-inverse of f is unique.
As for existence, unfortunately, not all arithmetic functions have D-inverses in A. This shouldn’t be surprising since not all numbers have reciprocals either: how about 0? The lemma below addresses this issue, but its proof is deferred until Part III, as it is not crucial for understanding the material before then.
Lemma 3. An arithmetic function f has a D-inverse iff f(1) ≠ 0.⁴

Because ε is the multiplicative identity in A, it is its own D-inverse: $\varepsilon^{-1} = \varepsilon$ (why?). Calculating D-inverses in general is far from trivial even for strongly multiplicative functions, as we will see in Problem 10.

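4.5. ∞-Raffle challenge. Ultimately, we want to solve for the function R from R ∗ ι = id. If we were dealing with (rational) numbers instead, this wouldn’t have been a problem. To solve xa = b for x, we would multiply both sides by the reciprocal of a and arrive at $x = ba^{-1}$. As long as we can calculate the D-inverse $\iota^{-1}$, we can apply the same logic to our equation for R:
\[ R \ast \iota = \mathrm{id} \;\overset{\ast\,\iota^{-1}}{\Leftrightarrow}\; (R \ast \iota) \ast \iota^{-1} = \mathrm{id} \ast \iota^{-1} \;\overset{\text{assoc.}}{\Leftrightarrow}\; R \ast (\iota \ast \iota^{-1}) = \mathrm{id} \ast \iota^{-1} \;\Leftrightarrow\; R \ast \varepsilon = \mathrm{id} \ast \iota^{-1} \;\Leftrightarrow\; R = \mathrm{id} \ast \iota^{-1}. \]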
Our work in the next section is cut out for us: we need to find a formula for
ι−1 , which, incidentally, will help us understand much better the relationship
between any function and its sum-function.
³ As reciprocals do not exist within the set of natural numbers (except for $1^{-1} = 1$), we need to enlarge N to include at least the positive rational numbers. From now on, “numbers” will refer to natural, rational, real, or complex numbers, as needed.
⁴ For the group theory fans: Lemma 3 implies that A is not a group under ∗. However, we’ll see that its subset A∗ of all arithmetic functions f with f(1) ≠ 0 is a group.

5. Möbius Inversion Formula

5.1. Ad-hocking $\iota^{-1}$. For simplicity, denote the D-inverse of ι by g ∈ A, i.e., g ∗ ι = ε. But g ∗ ι is the sum-function $S_g$; hence, $S_g$ = ε. Thus,

(15) g(1) = ε(1) = 1 and $S_g(n)$ = ε(n) = 0 for n ≥ 2.

To find an explicit formula for g, we will use the ad-hoc⁵ approach of PST 27, guess the formula, and then prove it rigorously. In the following, we have done some initial calculations for g(n). Try to come up with these calculations on your own and then compare your answers with those in the table.
\[
\begin{array}{l|l|r}
n & S_g(n) = 0 \text{ for } n > 1 & g(n) \\ \hline
p & \underbrace{g(1)}_{1} + \underbrace{g(p)}_{?} = 0 & -1 \\
p^2 & \underbrace{g(1) + g(p)}_{0} + \underbrace{g(p^2)}_{?} = 0 & 0 \\
p^3 & g(p^3) = 0; \text{ why?} & 0 \\
p_1 p_2 & \underbrace{g(1) + g(p_1)}_{0} + \underbrace{g(p_2)}_{-1} + \underbrace{g(p_1 p_2)}_{?} = 0 & 1 \\
p_1 p_2 p_3 & \underbrace{g(1)}_{1} + \underbrace{\textstyle\sum_{i=1}^{3} g(p_i)}_{-3} + \underbrace{\textstyle\sum_{i<j} g(p_i p_j)}_{3} + \underbrace{g(p_1 p_2 p_3)}_{?} = 0 & -1 \\
p_1^2 p_2 & \underbrace{g(1) + g(p_1) + g(p_2) + g(p_1 p_2)}_{0} + \underbrace{g(p_1^2)}_{0} + \underbrace{g(p_1^2 p_2)}_{?} = 0 & 0 \\
p_1^3 p_2 & g(p_1^3 p_2) = 0; \text{ why?} & 0
\end{array}
\]
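The hand calculations in the table follow one mechanical rule: since the values g(d) over all divisors d of n must add up to ε(n), the value g(n) is ε(n) minus the sum of g over the proper divisors of n. The sketch below automates this rule; it is an illustration written for this purpose (the recursion and the caching are choices made here, not something taken from the text).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def g(n):
    """Solve S_g = eps for g: g(n) = eps(n) - sum of g(d) over proper divisors d of n."""
    eps_n = 1 if n == 1 else 0
    return eps_n - sum(g(d) for d in range(1, n) if n % d == 0)


if __name__ == "__main__":
    # Sample n of the shapes p, p^2, p^3, p1*p2, p1*p2*p3, p1^2*p2, p1^3*p2.
    for n in (2, 4, 8, 6, 30, 12, 24):
        print(n, g(n))
```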

Two patterns emerge in our table (Don’t see them yet? Do more cases!):
• If $n = p_1 p_2 \cdots p_r$ is a product of distinct primes, then $g(n) = (-1)^r$. Such n is called square-free since it has no perfect square divisor greater than 1.
• If n is divisible by a higher prime power $p^a$ with a ≥ 2, the result is always g(n) = 0. Correspondingly, such n is not square-free.
Our exploration naturally leads to the introduction of a (famous) function:
Definition 7. The Möbius function μ : N → Z is defined by
\[ \mu(n) = \begin{cases} 1 & \text{if } n = 1; \\ 0 & \text{if } n \text{ is not square-free}; \\ (-1)^r & \text{if } n = p_1 p_2 \cdots p_r, \ p_i\text{’s are distinct primes.} \end{cases} \]
To get a feeling for the Möbius function,

Exercise 11. Show that μ is multiplicative.
⁵ Created for the specific purpose of solving this particular problem.
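Definition 7 can be turned into a short program, which also lets one test the claim of Exercise 11 on small inputs. This is a hedged sketch: the function name `mobius` and the trial-division factorization are choices made here, and a finite check of multiplicativity is an experiment, not a solution to the exercise.

```python
from math import gcd


def mobius(n):
    """Moebius function via trial division: 0 if a square divides n, else (-1)^(number of primes)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:                 # p^2 divided the original n
                return 0
            result = -result               # one more distinct prime factor
        p += 1
    if n > 1:                              # a single prime factor is left over
        result = -result
    return result


if __name__ == "__main__":
    print([mobius(n) for n in range(1, 21)])
    # Multiplicativity on coprime pairs: mu(mn) = mu(m) mu(n) when gcd(m, n) = 1.
    print(all(mobius(m * n) == mobius(m) * mobius(n)
              for m in range(1, 41) for n in range(1, 41) if gcd(m, n) == 1))
```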

5.2. Combinatorics to the rescue. We can now rigorously show that $\iota^{-1} = \mu$, i.e., μ ∗ ι = ε. The proof will involve two combinatorial ideas:
• the binomial coefficients $\binom{r}{k}$, which count in how many ways a team of k people can be chosen from r people; and
• the Binomial Theorem, which calculates powers of the form:
\[ (x + y)^r = \binom{r}{0} x^r + \binom{r}{1} x^{r-1} y + \binom{r}{2} x^{r-2} y^2 + \cdots + \binom{r}{r-1} x y^{r-1} + \binom{r}{r} y^r. \]
For the beginner unfamiliar with these concepts,⁶ we present first a “proof-by-example” that μ ∗ ι = ε. Start by choosing something nice and representative like $n = 360 = 2^3 \cdot 3^2 \cdot 5$. We need to show that (μ ∗ ι)(360) = ε(360), i.e., $\sum_{d \mid 360} \mu(d) = 0$. But remember that μ(d) = 0 if d is divisible by a prime power $p^2$, so that the only “survivors” in our summation correspond to divisors d of 2·3·5 (why?):
(16) μ(1)+μ(2)+μ(3)+μ(5)+μ(2 · 3)+μ(2 · 5)+μ(3 · 5)+μ(2 · 3 · 5) = 0.
Thinking of 2, 3, and 5 as primes $p_1$, $p_2$, and $p_3$, we can rewrite the above as a familiar expression from our earlier exploration table:
\[ 1 + \sum_{i=1}^{3} \underbrace{\mu(p_i)}_{-1} + \sum_{1 \le i < j \le 3} \underbrace{\mu(p_i p_j)}_{1} + \underbrace{\mu(p_1 p_2 p_3)}_{-1} \overset{?}{=} 0. \]
This, of course, is indeed 0: 1 + 3·(−1) + 3·1 + (−1) = 0, so we are done in this “proof-by-example.” ♦
Let’s push our understanding a bit further. From among the primes $\{p_1, p_2, p_3\}$ there is 1 way to choose no primes (silly!): choose nothing and end up with the divisor 1; there are 3 ways to choose a singleton prime $p_i$; there are also 3 ways to choose a pair of primes $\{p_i, p_j\}$; and there is 1 way to choose a triplet of primes $\{p_i, p_j, p_l\}$: choose everything and end up with divisor $p_1 p_2 p_3 = 30$. BTW, these 4 cases can be counted (no coincidence!) by the binomial coefficients $\binom{3}{0} = 1$, $\binom{3}{1} = 3$, $\binom{3}{2} = 3$, and $\binom{3}{3} = 1$, respectively. Translating all this back into our sum in (16) and taking into account that $\mu(p_1 p_2 \cdots p_r) = (-1)^r$, results in
\[ \binom{3}{0}(-1)^0 + \binom{3}{1}(-1)^1 + \binom{3}{2}(-1)^2 + \binom{3}{3}(-1)^3 \overset{?}{=} 0. \]
The Binomial Theorem kicks in here, compressing the LHS into the power $(1 - 1)^3$. The main idea is that the sum runs over all possible divisors of $p_1 p_2 p_3$; the divisors with an odd number of primes are counted with “−1”, and the divisors with an even number of primes are counted with “1” in the total sum. Everything evens out and cancels on account of the Binomial Theorem: the final answer is $(1 - 1)^3 = 0^3 = 0$.
⁶ For more details, check Combinatorics I in Volume I. The binomial coefficients are the well-known Pascal Triangle entries, and the Binomial Theorem is nothing more than a generalization of frequently occurring equalities such as $(x + y)^2 = x^2 + 2xy + y^2$ and $(x + y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3$.

We now finish this discussion with a general proof.

Property 4. The D-inverse of ι is the Möbius function, i.e., $\iota^{-1} = \mu$.
Proof: We are asked to show that μ ∗ ι = ε, i.e.
\[ \sum_{d \mid n} \mu(d) \overset{?}{=} \begin{cases} 1 & \text{if } n = 1; \\ 0 & \text{if } n > 1. \end{cases} \tag{17} \]
Indeed, for $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} > 1$ we have
\[ \sum_{d \mid n} \mu(d) = \sum_{d \mid n,\ d \text{ sq. free}} \mu(d) = \mu(1) + \sum_{k=1}^{r} \ \sum_{1 \le i_1 < \cdots < i_k \le r} \mu(p_{i_1} \cdots p_{i_k}). \]
The first equality throws out divisors that are not square-free, as μ(d) = 0 for them. The second equality realizes any square-free divisor d as a product of some distinct prime factors $p_{i_j}$ of n; these d’s are grouped according to the number k of their prime factors, with k running from 1 (choose one $p_{i_j}$) to r (choose all $p_{i_j}$’s); the divisor 1 is written separately, and it can be thought of as the product of zero $p_{i_j}$’s.
As noted earlier, the number of k-tuples $\{p_{i_1}, p_{i_2}, \ldots, p_{i_k}\}$ from among the r given primes $\{p_1, p_2, \ldots, p_r\}$ is calculated by $\binom{r}{k}$. From the definition of μ, the corresponding contributions are all $\mu(d) = (-1)^k$. Putting all this together and using the Binomial Theorem completes the argument:
\[ \sum_{d \mid n} \mu(d) = \sum_{k=0}^{r} \binom{r}{k} (-1)^k = (1 - 1)^r = 0. \]
This proof doesn’t work for n = 1 (why?), but this is a trivial case to be checked by hand: $\sum_{d \mid 1} \mu(d) = \mu(1) = 1$.
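Both ingredients of the proof, identity (17) and the alternating binomial sum, are easy to confirm by machine for small cases. The snippet below is a minimal sketch under the same modeling assumptions as a direct implementation of Definition 7; the helper names are hypothetical.

```python
from math import comb


def mobius(n):
    """Moebius function via trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:                 # a square divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def mobius_divisor_sum(n):
    """Sum of mu(d) over all divisors d of n, i.e., (mu * iota)(n)."""
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)


def alternating_binomial_sum(r):
    """Sum of C(r, k) (-1)^k for k = 0..r, which the Binomial Theorem gives as (1 - 1)^r."""
    return sum(comb(r, k) * (-1) ** k for k in range(r + 1))


if __name__ == "__main__":
    print([mobius_divisor_sum(n) for n in range(1, 13)])     # 1 followed by zeros
    print([alternating_binomial_sum(r) for r in range(1, 8)])
```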

5.3. Möbius inversion. As we learned in Lemma 3, not every arithmetic
function has a D-inverse. The “right” notion to replace D-inverses in A turns
out to be sum-functions, a.k.a. Möbius inverses. This is the idea of
Theorem 3. (Möbius inversion) Any f ∈ A can be expressed in terms of
its sum-function $S_f$ as $f = \mu * S_f$; or, equivalently,
\[
f(n) = \sum_{d \mid n} \mu(d)\, S_f\!\left(\tfrac{n}{d}\right).
\tag{18}
\]

Proof: To solve for f, we just have to vanquish ι in $\iota * f = S_f$. As we
solved earlier for R, we multiply both sides by $\iota^{-1} = \mu$: $\mu * (\iota * f) = \mu * S_f$.
The LHS simplifies to $(\mu * \iota) * f = \varepsilon * f = f$, and hence $f = \mu * S_f$.

It is natural to ask the opposite

Question 3. Given the Möbius relation $f(n) = \sum_{d \mid n} \mu(d)\, g\!\left(\tfrac{n}{d}\right)$ for two
arithmetic functions f and g, is it true that g is the sum-function $S_f$ of f?

The answer should be obvious again from the D-product structure on A:
\[
f = \mu * g \ \Rightarrow\ \iota * f = \iota * (\mu * g) \ \Rightarrow\ S_f = (\iota * \mu) * g = \varepsilon * g = g,
\]
and indeed, g is the sum-function $S_f$. We have established:

Corollary 2. (Möbius inversion) For any f, g ∈ A we have the equivalence:
$g = S_f \Leftrightarrow f = \mu * g$, written in sum-notation as
\[
g(n) = \sum_{d \mid n} f(d) \ \Leftrightarrow\ f(n) = \sum_{d \mid n} \mu(d)\, g\!\left(\tfrac{n}{d}\right).
\]

In particular, every g ∈ A is the sum-function of a unique f ∈ A; namely,
define f by the second formula as $f = \mu * g$, so that $g = S_f$.
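The equivalence in Corollary 2 can be illustrated with a round trip: start from some f, form its sum-function, and recover f by the Möbius-type sum. This is a small sketch under stated assumptions: the function names are hypothetical, and the test function `f` is an arbitrary (non-multiplicative) choice made here, since the inversion needs no multiplicativity.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

def sum_function(f):
    """S_f(n) = sum of f(d) over the divisors d of n."""
    return lambda n: sum(f(d) for d in divisors(n))

def mobius_product(g):
    """(mu * g)(n) = sum of mu(d) g(n/d) over the divisors d of n."""
    return lambda n: sum(mobius(d) * g(n // d) for d in divisors(n))

f = lambda n: n * n + 3            # arbitrary arithmetic function (an assumption)
g = sum_function(f)                # g = S_f
recovered = mobius_product(g)      # should coincide with f

print([f(n) for n in range(1, 9)])
print([recovered(n) for n in range(1, 9)])
```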

5.4. Back to Multiplicativity. Recall that the sum-function of R is id:
$S_R = \mathrm{id}$. Since id is multiplicative, by Theorem 2 in Part I, we concluded
that the original function R is multiplicative too . . . . But we didn’t prove
that theorem in the text! We shoved its proof into the Hints section and left
it to the reader skilled with strong induction!
As promised, let’s try another way that completely avoids induction, yet
provides a crystal clear explanation of what’s going on. Since $S_R = \mathrm{id}$, by
Möbius inversion, $R = \mu * \mathrm{id}$. Now, both μ and id are multiplicative. Could
their D-product be multiplicative on account of some more general fact?
That’s right! The most general statement in the present context turns out
to be true:
Theorem 4. If f, g ∈ M then $f * g \in M$. In other words, the D-product of
two multiplicative functions is also multiplicative.

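Proof: Let f, g ∈ M and gcd(m, n) = 1. Then $(f * g)(m) \cdot (f * g)(n)$ equals:
\begin{align*}
\sum_{d_1 \mid m} f(d_1)\, g\!\left(\tfrac{m}{d_1}\right) \sum_{d_2 \mid n} f(d_2)\, g\!\left(\tfrac{n}{d_2}\right)
 &= \sum_{d_1 \mid m,\ d_2 \mid n} f(d_1) f(d_2)\, g\!\left(\tfrac{m}{d_1}\right) g\!\left(\tfrac{n}{d_2}\right) \\
 &\overset{\text{mult}}{=} \sum_{d_1 \mid m,\ d_2 \mid n} f(d_1 d_2)\, g\!\left(\tfrac{mn}{d_1 d_2}\right)
 \overset{\text{Lem.1}}{=} \sum_{d \mid mn} f(d)\, g\!\left(\tfrac{mn}{d}\right)
 \overset{\text{def}}{=} (f * g)(mn).
\end{align*}
Thus, $(f * g)(m) \cdot (f * g)(n) = (f * g)(mn)$ and $f * g$ is also multiplicative.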

The proof involves really nothing but manipulation of the Σ-notation
and the divisors $d = d_1 d_2$ of mn. Because of Theorem 4, the set M of
multiplicative functions is said to be closed under the D-product.7 This
implies, in particular, that multiplying any g ∈ M by μ ∈ M produces
$\mu * g \in M$; in other words, if the sum-function $S_f$ is multiplicative, then
the original function $f = \mu * S_f$ is multiplicative too . . . . We have provided
another proof of Theorem 2 in Part I.
7 That A is closed under $*$ follows immediately from the definition of the D-product.
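Theorem 4 can also be checked by brute force on small inputs. The sketch below is illustrative only; `d_product` and `is_multiplicative` are names introduced here, and a finite check is of course evidence rather than a proof.

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

def d_product(f, g):
    """D-product: (f * g)(n) = sum of f(d) g(n/d) over the divisors d of n."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def is_multiplicative(f, limit):
    """Check f(mn) = f(m) f(n) for all coprime m, n with mn <= limit."""
    return all(f(m * n) == f(m) * f(n)
               for m in range(1, limit + 1)
               for n in range(1, limit // m + 1)
               if gcd(m, n) == 1)

identity = lambda n: n
R = d_product(mobius, identity)     # R = mu * id, as in the text

print([R(n) for n in range(1, 13)])
print(is_multiplicative(R, 60))
```

Replacing `identity` with any other multiplicative function gives further instances of the same closure property.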

5.5. ∞-Raffle is unraveled. As id ∈ M, the D-product $\mu * \mathrm{id} \in M$, so
that R is indeed multiplicative! Hence, formula (9) from Part I is validated
and we have yet again conquered the ∞-Raffle Problem.
In the midst of our victory, a potentially annoying question nags in the
back of our heads. So far, the function R was essentially identified as the
Möbius inverse of id, i.e., $R = \mu * \mathrm{id}$. Formula (9) for R is nice . . . but really,
what objects does it count?
Question 4. Is there yet a way to describe R directly as some famous
function on its own?
Yes. Moreover, this same “famous” function was promised to us in Number
Theory I. . . and it is the main protagonist of the upcoming Section 6!

5.6. Bijections within A and M. We finish our formal discussion of
Möbius inversion by drawing a diagram. It emphasizes that the operation of
sending $f \mapsto S_f$ and the inverse operation $S_f \mapsto f$ are just D-multiplications
by ι and μ, respectively. Each of the two operations provides a bijection 8
• of the set of arithmetic functions A onto itself; i.e., $A \xrightarrow{\ \iota\ } A$ and
  $A \xrightarrow{\ \mu\ } A$. This is the meaning of Möbius inversion from Corollary 2.
  Thus, $\ln \mapsto S_{\ln} = \tfrac{1}{2} \tau \ln$ and $\tfrac{1}{2} \tau \ln \mapsto \ln$ (cf. Problem 7).
• of the subset of multiplicative functions M onto itself; i.e., $M \xrightarrow{\ \iota\ } M$ and
  $M \xrightarrow{\ \mu\ } M$. This is implied by the newly-shown Theorem 4. Thus,
  $\iota \mapsto S_\iota = \tau$ and $\tau \mapsto \iota$, as well as $R \mapsto S_R = \mathrm{id}$ and $\mathrm{id} \mapsto R$.
[Diagram: the set M inside A, with arrows labeled ι and μ pairing ln with ½ τ ln in A, and pairing ι with τ and R with id in M.]
Exercise 12. Where does the function μ go under the operation
(a) ι; in other words, what is the sum-function $S_\mu$?
(b) μ; in other words, which function f has $S_f = \mu$?
Exercise 13. Verify the following two chains of relations among well-known
functions, at each step sending $f \mapsto S_f$:
$\{\mu * \mu \to \mu \to \varepsilon \to \iota \to \tau \to S_\tau\}$ and $\{R \to \mathrm{id} \to \sigma \to S_\sigma\}$.
Place each relation within the above diagram.
The set of strongly multiplicative functions S is not featured here. Why?
Since μ ∉ S, there is no reason to expect that the operation μ will preserve
strong multiplicativity; e.g., $\varepsilon * \mu = \mu \notin S$ even though ε ∈ S. Moreover,
the set S is not closed under the D-product: just recall the familiar
$\iota * \iota = S_\iota = \tau$, where ι ∈ S but τ ∉ S. This explains why sum-functions of strongly
multiplicative functions do not need to be strongly multiplicative. (Recall
the movement in and out of S in the AMS-inclusion Figure 2 on page 95.)
8 A bijection f : S → T is a 1–1 correspondence between two sets S and T.

5.7. Summing up with μ. Our first exercises focus on recognizing types
of sums built from the Möbius function μ and a multiplicative function f.
Exercise 14. If f ∈ M, prove the formulas:
\[
\text{(a)}\quad \sum_{d \mid n} \mu(d) f(d) = \prod_{i=1}^{r} \bigl( f(1) - f(p_i) \bigr).
\]
\[
\text{(b)}\quad \sum_{d \mid n} \mu(d) f\!\left(\tfrac{n}{d}\right) = \prod_{i=1}^{r} \bigl( f(p_i^{a_i}) - f(p_i^{a_i - 1}) \bigr).
\]

Hint: These two sums can be described as follows:
(a) the sum-function $S_{\mu \cdot f}$ where μ·f is the usual function product;
(b) the D-product $\mu * f$ (or a Möbius-type sum).
Needless to say, as long as f is multiplicative, both sums are multiplicative
too (why?); hence PST 6 reduces the exercise to examining prime powers. ♦
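Both formulas of Exercise 14 can be sanity-checked on a concrete multiplicative function before proving them. The sketch below uses f = σ (the sum of divisors) as an example chosen here; all helper names are illustrative, not taken from the text.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

def prime_factorization(n):
    """Dictionary {p: a} with n equal to the product of p**a."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def sigma(n):
    return sum(divisors(n))

def lhs_a(f, n):   # sum of mu(d) f(d) over d | n
    return sum(mobius(d) * f(d) for d in divisors(n))

def rhs_a(f, n):   # product of f(1) - f(p_i)
    product = 1
    for p in prime_factorization(n):
        product *= f(1) - f(p)
    return product

def lhs_b(f, n):   # sum of mu(d) f(n/d) over d | n
    return sum(mobius(d) * f(n // d) for d in divisors(n))

def rhs_b(f, n):   # product of f(p_i^a_i) - f(p_i^(a_i - 1))
    product = 1
    for p, a in prime_factorization(n).items():
        product *= f(p ** a) - f(p ** (a - 1))
    return product

n = 360
print(lhs_a(sigma, n), rhs_a(sigma, n))
print(lhs_b(sigma, n), rhs_b(sigma, n))
```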

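Exercise 15. Find closed expressions for the following sums:
(a) $\sum_{d \mid n} \mu(d)\, \tau(d)$; (b) $\sum_{d \mid n} \mu(d)\, \sigma(d)$; (c) $\sum_{d \mid n} \frac{\mu(d)}{d}$;
(d) $\sum_{d \mid n} \mu(d)\, \tau\!\left(\tfrac{n}{d}\right)$; (e) $\sum_{d \mid n} \mu(d)\, \sigma\!\left(\tfrac{n}{d}\right)$; (f) $\sum_{d \mid n} \mu(d)\, \tfrac{n}{d}$.
Solution to (c). The function f(n) = 1/n is multiplicative, so
\[
\sum_{d \mid n} \frac{\mu(d)}{d} = \sum_{d \mid n} \mu(d) f(d) = S_{\mu \cdot f}(n)
\overset{\text{Exer.14}}{=} \prod_{i=1}^{r} \bigl( f(1) - f(p_i) \bigr) = \prod_{i=1}^{r} \left( 1 - \frac{1}{p_i} \right).
\]

Look closer at the result: does it remind you of something? In formula (9)
for ∞-Raffle R, if we over-factor all $p_i^{a_i}$ we obtain:
\[
R(n) = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} \left( 1 - \frac{1}{p_1} \right) \left( 1 - \frac{1}{p_2} \right) \cdots \left( 1 - \frac{1}{p_r} \right)
= n \prod_{i=1}^{r} \left( 1 - \frac{1}{p_i} \right).
\tag{19}
\]
Dividing by n yields the alternative answer $\frac{R(n)}{n} = \sum_{d \mid n} \frac{\mu(d)}{d}$.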

Problem 9. Find closed expressions for the following sums:
(a) $\sum_{d \mid n} \mu(d) \ln \tfrac{n}{d}$; (b) $\sum_{d \mid n} \mu(d) \ln d$; (c) $\sum_{d \mid n} \mu(d)\, \pi(d)$; (d) $\sum_{d \mid n} \mu(d)\, \pi\!\left(\tfrac{n}{d}\right)$.

Hint: Neither ln n nor π(n) is multiplicative, so a new idea is necessary
here! If Λ(n) denotes the sum in (a), by Möbius inversion, $S_\Lambda = \ln$. Check
some cases to conjecture a formula for the von Mangoldt function Λ:
\[
\Lambda(n) = \begin{cases} \ln p & \text{if } n = p^a \text{, a prime power;} \\ 0 & \text{otherwise.} \end{cases}
\]

Could parts (a) and (b) be related? You are on your own regarding the
function π in parts (c) and (d). Check out some interesting observations in
the Hints section. ♦

The Möbius function μ turns out to also be relevant in finding certain
D-inverses. So far we know that $\varepsilon^{-1} = \varepsilon$ (why?) and also we calculated
the inverse $\iota^{-1} = \mu$. Note that both ι and ε are strongly multiplicative. On
the other hand, the 0-function O cannot be D-inverted (cf. Lemma 3). The
following problem asks you to discover a nice formula for the D-inverses of
the strongly multiplicative functions that can, in principle, be inverted.
Problem 10. Let f ∈ S with f(1) ≠ 0, i.e., f ≠ O. Find the D-inverse $f^{-1}$
and show that it is multiplicative, but not necessarily strongly multiplicative.
Check that your formula for $f^{-1}$ works for $\varepsilon^{-1}$, $\iota^{-1}$, and $\mathrm{id}^{-1}$.
If $S^*$ denotes all of S without O, Problem 10 can be paraphrased to say
that the D-inverses of all elements of $S^*$ are in M. Formulas for inverses
of multiplicative or, more generally, arithmetic functions are not nearly as
compact. We will revisit D-inverses in Part III and incorporate them in the
discussion of groups.

6. The Euler Function φ(n)

6.1. Multiplicativity is ubiquitous! In this section we finally identify

the ∞-Raffle function R with a well-known function from number theory.
Definition 8. The Euler function φ(n) counts how many numbers between
1 and n are relatively prime to n.9
Check that φ(1) = φ(2) = 1, φ(3) = φ(4) = 2, φ(5) = 4, and φ(6) = 2.
For example, the last equality is true because 1 and 5 are the only numbers
between 1 and 6 that are relatively prime to 6.
At a first glance, there is no connection between the ∞-Raffle and Euler
functions: the former is built from an equation using divisors of n
($\sum_{d \mid n} R(d) = n$), while the latter rests on numbers relatively prime to n . . . .
If anything, R(n) and φ(n) should be “orthogonal” concepts! Yet, is it a
coincidence that their first 6 values are the same?
Exercise 16. Continue with the above calculations for φ(n) up to n = 10,
to find out that φ(7) = 6, φ(8) = 4, φ(9) = 6, and φ(10) = 4. The doubting
readers should keep on calculating φ(n) until n = 20 and compare with R.
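For readers who would rather let a machine do the comparison, here is a minimal sketch. It computes φ straight from Definition 8 and computes R from its defining relation, the sum of R(d) over d | n being n; the names `phi` and `raffle` are chosen here for illustration.

```python
from functools import lru_cache
from math import gcd

def phi(n):
    """Euler function by definition: count 1 <= t <= n with gcd(t, n) = 1."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

@lru_cache(maxsize=None)
def raffle(n):
    """R(n) from the relation sum_{d | n} R(d) = n, solved for the term d = n."""
    return n - sum(raffle(d) for d in range(1, n) if n % d == 0)

for n in range(1, 21):
    print(n, phi(n), raffle(n))
```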
If we accept that R(n) and φ(n) could be identical, our first task in the
present context would naturally be to establish that
Lemma 4. φ(n) is multiplicative.
The reader is expected to know a bit about remainders in order to fully
understand the proof below. If this is not so, do not despair: an alternative
proof is coming up soon afterward.
9 In Number Theory I we briefly saw this arithmetic function φ(n) in relation to Euler’s
Theorem: $a^{\varphi(n)} \equiv 1 \pmod{n}$ for any relatively prime a and n.

Direct Proof: Let a and b be relatively prime. Consider the a × b

table of all numbers between 1 and ab. Observe that the entries in any fixed
column have the same remainder when divided by a; thus, the ith column
contains all numbers with remainder i.10

1 2 ··· i ··· a−1 a

a+1 a+2 ··· a+i ··· 2a − 1 2a
2a + 1 2a + 2 ··· 2a + i ··· 3a − 1 3a
··· ··· ··· ··· ··· ··· ···
ja + 1 ja + 2 ··· ja + i ··· (j + 1)a − 1 (j + 1)a
··· ··· ··· ··· ··· ··· ···
(b − 2)a + 1 (b − 2)a + 2 ··· (b − 2)a + i ··· (b − 1)a − 1 (b − 1)a
(b − 1)a + 1 (b − 1)a + 2 ··· (b − 1)a + i ··· ba − 1 ba

If it happens that ja + i is relatively prime to a, then i will be relatively

prime to a, and consequently, all numbers in the ith column will be relatively
prime to a (why?). In other words, “relative primeness with a” applies to
a whole column or to nothing in a column. By definition of φ(a), the first
row contains exactly φ(a) numbers relatively prime to a; hence, their φ(a)
columns comprise all entries in the table that are relatively prime to a.
As for b, check that the b entries in each column have distinct remainders
when divided by b. For instance, if ja + 2 and ka + 2 in the 2nd column
have the same remainder, then their difference (j − k)a is divisible by b; as
gcd(a, b) = 1, this forces b to divide j −k; but 0 ≤ j, k < b, so that |j −k| < b
and the only possibility for b to divide j − k would be if j = k (why?); we
conclude that all the b numbers in the 2nd column have different remainders
modulo b.11 In a similar way, each column contains precisely φ(b) entries
that are relatively prime to b (why?).
Finally, as gcd(a, b) = 1, being relatively prime with ab is the same as
being relatively prime with a and with b. By definition of φ, there are a total
of φ(ab) entries in the table relatively prime to ab. Our observations have
located φ(a) columns with entries relatively prime to a, and each such column
contains precisely φ(b) entries relatively prime to b. Equating, there are
φ(ab) = φ(a)φ(b) entries relatively prime to ab, and φ is multiplicative. 
An illustration of this proof is given by the example of a = 6 and b = 5.
In Figure 5a, all entries relatively prime to 6 (in bold) occupy the 1st and 5th
columns (indicated by ↑): these are φ(6) = 2 columns altogether. Each such
column comprises entries with different remainders modulo 5 and contains
precisely φ(5) = 4 entries relatively prime to 5 (circled). In Figure 5b, the roles of 6 and 5
are switched; yet, we arrive at the same final answer: φ(30) = φ(5)·φ(6) =
4 · 2 = 8 entries relatively prime to 30 in all.
10 In Number Theory I, we would have said that all entries in the ith column are congruent
to i modulo a and written $ja + i \equiv i \pmod{a}$.
11 In other words, each column is a system of remainders modulo b.

(a) a = 6, b = 5
 1  2  3  4  5  6
 7  8  9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
 ↑           ↑

(b) a = 5, b = 6
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
 ↑  ↑  ↑  ↑
Figure 5. φ(30) = φ(6)φ(5) = 2 · 4
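The counting in the table proof can be replayed mechanically for a = 6 and b = 5. The sketch below is only an illustration of that example; the variable names are chosen here and are not from the text.

```python
from math import gcd

a, b = 6, 5

# Row j (0-based) of the a x b table holds ja + 1, ja + 2, ..., ja + a.
table = [[j * a + i for i in range(1, a + 1)] for j in range(b)]

# Relative primeness with a is decided by the first row, column by column.
good_columns = [i for i in range(a) if gcd(table[0][i], a) == 1]

# Inside each good column, count the entries relatively prime to b.
per_column = [sum(1 for j in range(b) if gcd(table[j][i], b) == 1)
              for i in good_columns]

total = sum(per_column)
print(len(good_columns), per_column, total)
```

Swapping the values of `a` and `b` reproduces the second arrangement of the figure with the same total.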

6.2. Follow the beaten path and triumph again over ∞-Raffle. Once
multiplicativity of the Euler function has been shown, there is nothing else
to do but apply PST 26 and reduce φ(n) to prime powers:
Problem 11. Find a closed formula for $\varphi(p^k)$ for any prime power $p^k$.

Solution: The numbers from 1 to $p^k$ split into two disjoint subsets:
• not relatively prime to $p^k$, i.e., divisible by p: $p, 2p, 3p, 4p, 5p, \ldots, p^{k-1} \cdot p$; these can
  be written as $p \cdot c$ where $c = 1, 2, \ldots, p^{k-1}$; so there are
  $p^{k-1}$ such numbers.
• relatively prime to $p^k$, i.e., not divisible by p: $1, 2, \ldots, p - 1, \ldots, p^{k-1} + 1, \ldots, p^k - 1$;
  by definition, there are $\varphi(p^k)$ such numbers.

PST 75. Using the simple idea of complements, to find the size of one subset
of a set, subtract the size of its complement from the size of the whole set.
Since the “whole set” is $\{1, 2, \ldots, p^k\}$, we thus obtain $\varphi(p^k) = p^k - p^{k-1}$.

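Putting together all prime-power pieces, we arrive at a (quite familiar)
formula for the Euler function:
\[
\varphi(n) = \prod_{i=1}^{r} \varphi(p_i^{a_i}) = \prod_{i=1}^{r} \bigl( p_i^{a_i} - p_i^{a_i - 1} \bigr)
= \prod_{i=1}^{r} p_i^{a_i} \Bigl( 1 - \frac{1}{p_i} \Bigr) = n \prod_{i=1}^{r} \Bigl( 1 - \frac{1}{p_i} \Bigr),
\]
which allows us to finally identify it with our function R.
Corollary 3. The ∞-Raffle function R is the Euler function φ:
\[
R(n) = \varphi(n) = n \Bigl( 1 - \frac{1}{p_1} \Bigr) \Bigl( 1 - \frac{1}{p_2} \Bigr) \cdots \Bigl( 1 - \frac{1}{p_r} \Bigr).
\tag{20}
\]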
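Formula (20) translates into a short routine that avoids fractions by applying each factor (1 − 1/p) as an exact integer step. This is a sketch under the assumption that trial division is fast enough for the inputs at hand; the function names are illustrative.

```python
from math import gcd

def phi_by_definition(n):
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

def phi_by_formula(n):
    """n * product of (1 - 1/p) over the distinct primes p dividing n."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p    # exact: p divides result at this point
        p += 1
    if m > 1:                        # one prime factor larger than sqrt is left
        result -= result // m
    return result

for n in (1, 12, 30, 97, 360):
    print(n, phi_by_definition(n), phi_by_formula(n))
```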

By definition, φ(n) ≥ 1 for all n ∈ N: at least 1 is relatively prime to n.

Therefore, R(n) ≥ 1 and every n ∈ N will be written on at least one ticket
in the ∞-Raffle. This is yet another explanation of the ∞-Raffle puzzle. 
No discussion of the Euler function can be complete without computing
its sum-function:
Corollary 4. The sum-function of the Euler function is id:
\[
\sum_{d \mid n} \varphi(d) = n, \qquad S_\varphi = \mathrm{id}, \qquad \varphi * \iota = \mathrm{id}, \qquad \text{and} \qquad \varphi = \mu * \mathrm{id}.
\]

Sketch: As φ was just shown to be multiplicative, you can embark on
the beaten path and nail down the sum-function in no time. Indeed, $S_\varphi$ is
multiplicative (why?), so that $S_\varphi(n) = S_\varphi(p_1^{a_1}) \cdots S_\varphi(p_r^{a_r})$. This reduces the
problem to a prime power $n = p^a$, which, together with $\varphi(p^j) = p^j - p^{j-1}$,
easily implies $S_\varphi(p^a) = p^a$, and correspondingly, $S_\varphi(n) = n$.
Of course, you could have used that φ = R and that R was defined by
$\sum_{d \mid n} R(d) = n$ to conclude the same for φ: $\sum_{d \mid n} \varphi(d) = n$ and $S_\varphi = \mathrm{id}$.

6.3. Eager for second helpings. Our direct proof of φ’s multiplicativity
was somewhat hefty. Here we explore an alternative, slicker way. To this
end, we again turn to the sum-function of φ:
Problem 12. From the definition of the Euler function φ, directly calculate
its sum-function Sφ without using φ’s multiplicativity.
Arguably, this question is hard because it asks us to forget what we have
learned so far about the Euler function (and the human mind really objects
to such tasks!), and to find its sum-function with bare hands from scratch.
If you attempt to brute-force the calculation of Sφ , you will end up in an

incomprehensible mess. You certainly could try a more organized approach
by some type of induction (as you know the answer is id); try it! But suppose
you truly did not know anything about the Euler function, other than its
definition. What would you do?
The marvelous approach below is based on the simple but most funda-
mental of combinatorial ideas:

PST 76. To establish a combinatorial identity (such as Sφ = id), identify a
suitable set of objects and count its elements in two different ways.12
The question, of course, is what is this “suitable” set of objects to be
counted? Enjoy one possible solution, and see if you can think of another.

Solution to Problem 12: For starters, consider the set of fractions
\[
\left\{ \frac{1}{n}, \frac{2}{n}, \ldots, \frac{n-1}{n}, \frac{n}{n} \right\}
= \left\{ \frac{a_1}{b_1}, \frac{a_2}{b_2}, \ldots, \frac{a_{n-1}}{b_{n-1}}, \frac{a_n}{b_n} \right\},
\]
where each fraction c/n on the LHS is written also in reduced form $a_i/b_i$ in
the RHS (i.e., $\gcd(a_i, b_i) = 1$). For instance, if n = 10:
\[
\left\{ \tfrac{1}{10}, \tfrac{2}{10}, \tfrac{3}{10}, \tfrac{4}{10}, \tfrac{5}{10}, \tfrac{6}{10}, \tfrac{7}{10}, \tfrac{8}{10}, \tfrac{9}{10}, \tfrac{10}{10} \right\}
= \left\{ \tfrac{1}{10}, \tfrac{1}{5}, \tfrac{3}{10}, \tfrac{2}{5}, \tfrac{1}{2}, \tfrac{3}{5}, \tfrac{7}{10}, \tfrac{4}{5}, \tfrac{9}{10}, \tfrac{1}{1} \right\}.
\tag{22}
\]
The mysterious “suitable” set of objects sought by PST 76 turns out to be
the set $D_n$ of all denominators $\{b_i\}$ on the RHS. This $D_n$ is precisely the set
of all divisors d of n, with possible repetitions (why?). For instance, from
(22) we have $D_{10} = \{1, 2, 5, 5, 5, 5, 10, 10, 10, 10\}$.
12 Did you notice that derivatives of the word “identity” occurred three times in this
PST, referring to three essentially different concepts? Can you identify these concepts?

Well, how many times does a divisor d of n appear in $D_n$? The answer
for $D_{10}$ is more than revealing: 1 and 2 appear once, and 5 and 10 appear 4
times; this data “happens” to match φ(1) = φ(2) = 1 and φ(5) = φ(10) = 4.
This is too good to just be a coincidence! In general:
\[
\frac{a_i}{b_i} = \frac{a_i}{d} = \frac{a_i \cdot \frac{n}{d}}{n},
\]
where $d = b_i$ so that $\gcd(a_i, d) = 1$ and $1 \le a_i \le d$. In other words, the
divisor d of n appears as a denominator in $D_n$ exactly once for all $a_i$’s that
are relatively prime with d and not larger than d. But this is the definition
of the Euler function! Therefore, each d appears exactly φ(d) times in the
set $D_n$. On the other hand, we originally started with n fractions in $D_n$, so
that there is a total of n appearances of divisors d in $D_n$.
Thus, counting the elements of $D_n$ in two different ways leads to
\[
\sum_{d \mid n} \varphi(d) = |D_n| = n.
\]
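The double count is pleasant to watch in action. The sketch below builds the multiset of reduced denominators with Python's `fractions.Fraction` (which reduces automatically) and tallies it with `collections.Counter`; the function names are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from math import gcd

def phi(n):
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

def denominator_counts(n):
    """The multiset D_n: denominators of c/n, c = 1..n, in lowest terms."""
    return Counter(Fraction(c, n).denominator for c in range(1, n + 1))

counts = denominator_counts(10)
for d in sorted(counts):
    print(d, counts[d], phi(d))      # divisor, multiplicity in D_10, phi(d)
print(sum(counts.values()))          # total number of fractions
```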

This is my favorite moment in the multiplicative functions topic!  Our

new proof is so elementary. . . it literally uses nothing but the definition of φ!
Yet, it is so elegant and revealing. Probably because of this very combination
it is so hard to come by! In short, this solution is a delightful illustration of
the beauty and power of mathematical problem solving.
Its usefulness is still another matter to marvel at. Now that we have
shown $S_\varphi = \mathrm{id}$, we realize that, by Möbius inversion, $\varphi = \mu * \mathrm{id}$. The latter
is a D-product of two well-known multiplicative functions, and hence the
product itself is multiplicative. This nails down the multiplicativity of φ in
a way that completely circumvents our original table approach. 

7. The Taming of the ShrewD φ

We have dedicated this whole section to problems with φ. Yet, the ways
in which the Euler function can be engaged are so varied that we can only
gloss over a few of the possible related problem-solving themes.

7.1. Warming up to φ. For our first calculation “bonanza,” recall the

notation n! = 1 · 2 · 3 · · · n, pronounced as “n-factorial.”
Exercise 17. Calculate φ(φ(10!)).
Exercise 18. Solve the following equations in natural numbers:
(a) $\varphi(2^x 5^y) = 80$; (b) φ(n) = 12; (c) $\varphi(n) = \frac{n}{2}$; (d) $\varphi(n) = \frac{2n}{3}$.

Parts (a)–(b) should be straightforward since they have the unknowns

only as inputs of φ. The more interesting parts (c)–(d) require some extra
thought as they feature n on both sides of the equations.

Solution (d): Our first move is simple, but may be unexpected. Since
φ(n) is an integer, 3 must divide n on the RHS. This prompts the idea:
PST 77. Factor out all 3s that divide n to form the partial prime decomposition
$n = 3^a m$ with gcd(3, m) = 1 and substitute it for n. The relative
primeness of 3 and m makes this expression especially suitable when working
with multiplicative functions.
In our situation, a ≥ 1. Substituting, $\varphi(3^a m) = \tfrac{2}{3}\, 3^a m$, i.e., $\varphi(3^a)\varphi(m) =
3^{a-1} \cdot 2 \cdot \varphi(m) = 2 \cdot 3^{a-1} m$. Thus, φ(m) = m. But φ(m) < m for m > 1 (why?),
so that m = 1 and the final answer is $n = 3^a$ for a ≥ 1.
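A brute-force search agrees with this answer. The sketch assumes, as the substitution in Solution (d) indicates, that the equation in question is φ(n) = 2n/3, rewritten as 3φ(n) = 2n to stay within integers; the search bound 1000 and the helper name `phi` are arbitrary choices made here.

```python
def phi(n):
    """Euler function via n * product of (1 - 1/p), using trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

solutions = [n for n in range(1, 1001) if 3 * phi(n) == 2 * n]
print(solutions)
```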

Even though the above solution is probably the best there is, if we didn’t
start the “right” way by factoring out all powers of 3, we could have ended
up with a seemingly harder problem. Our instinct might have told us to
apply the formula for φ(n) from the get-go:
\[
n \prod_{i=1}^{r} \Bigl( 1 - \frac{1}{p_i} \Bigr) = \frac{2}{3}\, n.
\]

Canceling n and clearing all denominators would have resulted in
\[
3 (p_1 - 1)(p_2 - 1) \cdots (p_r - 1) = 2 p_1 p_2 \cdots p_r.
\tag{23}
\]
The function φ has been completely eliminated, which isn’t necessarily ad-
vantageous for us. Still, solving the problem in this alternative form will
teach us a mix of useful ideas:

PST 78. Recall that all primes are odd, except for 2 itself, and match the
number of 2s dividing each side. In the process, obtain an upper bound for
the number of involved primes, and thus reduce the problem to only finitely
many possibilities.
You can try the same idea with the prime 3 instead of 2 in PST 78. We
defer the remainder of this second solution to part (d) of Exercise 18 to the
Hints section. ♦
As a bonus, try to do part (c) also in two ways: start by factoring out

all 2s from n and then apply φ, or first apply φ and then deal with the
consequences. ♦

We are bound to explore standard summations with φ at some point,

so why not right now? The general formulas for multiplicative sums from
Exercise 14 will again come in handy here. Make sure that you first do prove
the multiplicativity of the involved functions: this may mean reasoning with
something else in addition to D-products.
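Exercise 19. Find closed expressions for the following sums:
(a) $\sum_{d \mid n} \mu(d)\, \varphi(d)$; (b) $\sum_{d \mid n} \mu(d)\, \varphi\!\left(\tfrac{n}{d}\right)$; (c) $\sum_{d \mid n} \mu^2(d)\, \varphi^2(d)$; (d) $\sum_{d \mid n}$ (summand illegible in the source).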

7.2. Is φ “more than just” multiplicative? The next exercise suggests
that φ has some resemblance to the identity function, even though φ ∉ S.
Exercise 20. Show that $\varphi(n^k) = n^{k-1} \varphi(n)$ for all n, k ∈ N.
Our study of multiplicative functions has depended heavily so far on
the notion of gcd(m, n), also denoted for short as (m, n). Its counterpart
lcm(m, n), called the least common multiple of m and n and denoted by
[m, n], makes its debut now. You may have heard that multiplying the two
quantities yields the original product mn:
\[
(m, n) \cdot [m, n] = mn.
\tag{24}
\]
For instance, (4, 6)·[4, 6] = 2·12 = 4·6. The proof of (24) rests on the prime
decompositions of (m, n) and [m, n], also necessary for the exercise below.
Exercise 21. Show that for any m, n ∈ N:
(a) $f(m)\, f(n) = f\bigl((m, n)\bigr)\, f\bigl([m, n]\bigr)$ provided f ∈ M;
(b) $\varphi(mn) = (m, n)\, \varphi([m, n])$.
Part (a) extends the definition of multiplicativity to any pair of numbers
m and n; in fact, when m and n are relatively prime, (m, n) = 1 and [m, n] =
mn, so that (a) becomes the usual definition f (m)f (n) = f (mn) for f ∈ M.

By contrast, part (b) works only for certain multiplicative functions! Can
you think of other non-trivial examples that satisfy (b) in place of φ?
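Before attempting the proofs, both identities for φ can be checked on a range of small inputs. The following is a sketch only; the ranges are arbitrary, and `phi` and `lcm` are helper names defined here.

```python
from math import gcd

def phi(n):
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

def lcm(m, n):
    return m * n // gcd(m, n)

# Exercise 20: phi(n^k) = n^(k-1) * phi(n)
exercise_20 = all(phi(n ** k) == n ** (k - 1) * phi(n)
                  for n in range(1, 13) for k in range(1, 4))

# Exercise 21(b): phi(mn) = (m, n) * phi([m, n])
exercise_21b = all(phi(m * n) == gcd(m, n) * phi(lcm(m, n))
                   for m in range(1, 31) for n in range(1, 31))

print(exercise_20, exercise_21b)
print(phi(4 * 6), gcd(4, 6) * phi(lcm(4, 6)))
```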

7.3. Tinkering with φ. The divisor function τ can be written as the sum
$\tau(n) = \sum_{d \mid n} 1$. Likewise, the Euler function φ can be written as
\[
\varphi(n) = \sum_{1 \le t \le n,\ (t, n) = 1} 1,
\tag{25}
\]
where the sum runs over all numbers relatively prime to and ≤ n. Changing
“1” to “d” in the formula for τ yields $\sigma(n) = \sum_{d \mid n} d$. Due to multiplicativity,
we have nice formulas for the three sums τ, σ, and φ. What happens if we
analogously replace “1” by “t” in the formula for φ: can we calculate the sum
of all numbers relatively prime to and ≤ n?

Problem 13. Find a closed expression for the sum $\eta(n) := \sum_{1 \le t \le n,\ (t, n) = 1} t$.

Hint: For starters, η fails to be multiplicative as $\eta(2)\eta(3) = 1 \cdot 3 \ne 6 = \eta(6)$.

This forces us to think “out-of-the-box.”

 PST 79. Compare with a previously solved problem and adapt the old proof.
Right! Which proof? Gauss’s proof of 1 + 2 + · · · + n = (n + 1)n/2 was
already adapted once to finding a formula for the product π(n) of divisors of n
(cf. Exer. 4). Can it be adapted again for a swift calculation of η(n)? ♦

For a lengthier approach, which nevertheless will practice your combina-

torial skills, you could represent η(n) as an alternating sum via the so-called
Inclusion-Exclusion Principle:
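\[
\eta(n) = \sum t \;-\; \sum_{i} \sum_{p_i \mid t} t \;+\; \sum_{i<j} \sum_{p_i p_j \mid t} t \;-\; \sum_{i<j<k} \sum_{p_i p_j p_k \mid t} t \;+\; \cdots \;+\; (-1)^r \sum_{p_1 p_2 \cdots p_r \mid t} t,
\]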

where we have omitted writing “1 ≤ t ≤ n” for all sums. As we now explain

this construction, check how each of the individual sums above fits in. The
idea is, as a first approximation, to add up all t’s from 1 to n and then to
subtract those t’s that are divisible by some prime factor pi of n. But this
subtracts twice each t divisible by a double product pi pj , so we need to add
back those t’s, causing in turn the overcounting of each t divisible by a triple
product pi pj pk . We subtract the latter t’s and continue in the same fashion
to correct our over- or under-counting of t’s. If you try to calculate the whole
expression above for η(n), it shouldn’t be a surprise that each and every sum
will be a variation of the original “Gauss” sum-formula. ♦
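The alternating sum can be evaluated mechanically without revealing the closed form. In the sketch below, each product of k distinct primes of n is a square-free divisor d with sign μ(d) = (−1)^k, and the inner sum of the multiples d, 2d, ..., (n/d)·d is a Gauss sum; summing over all divisors is harmless because μ vanishes on the non-square-free ones. The function names are illustrative.

```python
from math import gcd

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

def eta_direct(n):
    """Sum of all 1 <= t <= n relatively prime to n."""
    return sum(t for t in range(1, n + 1) if gcd(t, n) == 1)

def eta_inclusion_exclusion(n):
    total = 0
    for d in divisors(n):
        m = n // d                                   # number of multiples of d up to n
        total += mobius(d) * d * m * (m + 1) // 2    # Gauss sum d(1 + 2 + ... + m)
    return total

for n in (2, 3, 6, 12, 30):
    print(n, eta_direct(n), eta_inclusion_exclusion(n))
```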

Earlier, we managed to generalize simultaneously τ and σ in the joint

formula $\sigma_a = S_{\mathrm{id}^a}$ (cf. Exer. 9). Can you generalize the Euler function φ in
some non-trivial and useful way?
Problem 14. Let f(x) ∈ Z[x], i.e., f(x) is a polynomial with integer coefficients,
and let $\lambda_f(n)$ be the number of values f(j) for j = 1, 2, . . . , n such that
(f(j), n) = 1. Show that $\lambda_f$ is multiplicative and that $\lambda_f(p^a) = p^{a-1} \lambda_f(p)$
for primes p and a ≥ 1. Derive $\frac{\lambda_f(n)}{n} = \prod_{i=1}^{r} \frac{\lambda_f(p_i)}{p_i}$ and describe $S_{\lambda_f}$.

Hint: Does this problem resemble something else we’ve done before? Let’s
consider some initial cases for f(x). The constant polynomial f(x) = 1
results in $\lambda_1 = \mathrm{id} \in S$, thereby oversimplifying the problem. The next polynomial
to try is the linear f(x) = x, resulting in $\lambda_x = \varphi$: we have discussed
this case in great detail! Can you modify the table-proof of multiplicativity
of the Euler function to show multiplicativity of the general $\lambda_f$? ♦
Note that Problem 14 provides infinitely many non-trivial examples $\lambda_f$
satisfying both Exercises 20 and 21(b), with $\lambda_f$ in place of φ. Is this a
coincidence? (See the extra Exercise 24 in the Hints section.)
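To get a feel for $\lambda_f$, one can experiment with a specific polynomial. The sketch below uses f(x) = x² + 1, which is an example chosen here and not one from the text, and checks the two claims of Problem 14 on small inputs; `lam` is an illustrative name.

```python
from math import gcd

def lam(f, n):
    """lambda_f(n): number of j in 1..n with gcd(f(j), n) = 1."""
    return sum(1 for j in range(1, n + 1) if gcd(f(j), n) == 1)

f = lambda x: x * x + 1          # sample polynomial (an assumption for illustration)

# Multiplicativity on coprime pairs with mn <= 200
multiplicative = all(lam(f, m * n) == lam(f, m) * lam(f, n)
                     for m in range(1, 201)
                     for n in range(1, 200 // m + 1)
                     if gcd(m, n) == 1)

# Prime powers: lambda_f(p^a) = p^(a-1) * lambda_f(p)
prime_powers = all(lam(f, p ** a) == p ** (a - 1) * lam(f, p)
                   for p in (2, 3, 5, 7) for a in (1, 2, 3))

print(multiplicative, prime_powers)
print([lam(f, n) for n in range(1, 11)])
```

With `f = lambda x: x` the same routine returns the Euler function, matching the case $\lambda_x = \varphi$ mentioned in the hint.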

7.4. Number theory in earnest. Our final problems on the Euler function
are “ruled” by a new notion:
Definition 9. A Fermat prime is a prime of the form $2^n + 1$ for some n ∈ N.

If n has an odd divisor > 1, then $2^n + 1$ has no chance of being a prime.
For instance, if n = 5m for m ≥ 1, we can substitute $A = 2^m$:
\[
2^n + 1 = \bigl( 2^m \bigr)^5 + 1 = A^5 + 1 = (A + 1)(A^4 - A^3 + A^2 - A + 1),
\]
where both last factors are > 1, making $2^n + 1$ composite. This factorization
of $2^n + 1$ would not work if 5 were replaced by an even number (why?),
but a similar factorization works for any odd number. We conclude that the
exponent n can only be a power of 2, i.e., $n = 2^k$ (why?), and hence

Lemma 5. All Fermat primes are of the form $F_k = 2^{2^k} + 1$ for some k ≥ 0.

The first Fermat primes (and the only ones known so far!) are $F_0 = 3$,
$F_1 = 5$, $F_2 = 17$, $F_3 = 257$, and $F_4 = 65{,}537$. The next case is the long-believed
“prime” $F_5 = 2^{32} + 1$, whose factor of 641 was discovered by Euler
in 1732. If you want to make history, find out whether there are infinitely
many Fermat primes. Could it be that, to the contrary, all Fermat numbers
$F_n = 2^{2^n} + 1$ are composite for n > 4?
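These facts about small Fermat numbers are quick to confirm. The sketch below uses naive trial division, which is adequate up to F_4 and finds the factor 641 of F_5; the function names are chosen here for illustration.

```python
def is_prime(n):
    """Naive trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def fermat(k):
    """The Fermat number F_k = 2^(2^k) + 1."""
    return 2 ** (2 ** k) + 1

print([fermat(k) for k in range(5)])
print([is_prime(fermat(k)) for k in range(5)])
print(fermat(5) % 641)            # remainder of F_5 on division by 641

# An exponent with the odd divisor 5: 2^(5m) + 1 is divisible by 2^m + 1.
print([(2 ** (5 * m) + 1) % (2 ** m + 1) for m in range(1, 6)])
```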
Now let’s turn to more accessible, even though far from easy, problems.
Problem 15. Solve the equation $\varphi(\sigma(2^n)) = 2^n$ for n ≥ 0.

Sketch: Applying the formula for σ we paraphrase to $\varphi(2^{n+1} - 1) = 2^n$. If
$2^{n+1} - 1 = 1$, then n = 0 and φ(1) = 1. Otherwise, $2^{n+1} - 1 = \prod_{i=1}^{r} p_i^{a_i} > 1$
where all $p_i$’s are odd, and we can apply the formula for φ:
\[
2^n = \varphi(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}) = p_1^{a_1 - 1} p_2^{a_2 - 1} \cdots p_r^{a_r - 1} (p_1 - 1)(p_2 - 1) \cdots (p_r - 1).
\]
Therefore, all $a_i = 1$ so that $2^n = (p_1 - 1)(p_2 - 1) \cdots (p_r - 1)$. This forces
each $p_i$ to be a Fermat prime: $p_i = 2^{2^{k_i}} + 1$ for some $k_i \ge 0$ (why?). The
problem is reduced to finding all sets of Fermat primes $\{p_i\}$ such that
\[
p_1 p_2 \cdots p_r + 1 = 2^{n+1} = 2 (p_1 - 1)(p_2 - 1) \cdots (p_r - 1);
\tag{26}
\]
or, equivalently,
\[
\bigl( 2^{2^{k_1}} + 1 \bigr) \bigl( 2^{2^{k_2}} + 1 \bigr) \cdots \bigl( 2^{2^{k_r}} + 1 \bigr) + 1 = 2 \cdot 2^{2^{k_1}} \cdot 2^{2^{k_2}} \cdots 2^{2^{k_r}}.
\tag{27}
\]
At this point, the problem changes direction and demands some skills
with modulo calculations (cf. Number Theory I).
PST 80. Order the primes $p_1 < p_2 < \cdots < p_r$. To find out the smallest
Fermat prime $p_1 = 2^{2^{k_1}} + 1$ appearing, check both sides of the equation using
a suitable modulus. After substituting this smallest $p_1$, find out the next $p_2$
again by using a suitable modulus. Continue until you find all $p_1, p_2, \ldots, p_r$
and reach a contradiction for $p_{r+1}$.
For a beginner, it may be tricky to figure out these suitable moduli:
they are the powers $2^{2^{k_i}}$. Indeed, the smallest among them, $2^{2^{k_1}}$, is suitable
for calculating $p_1$ because $p_1 \equiv 1 \pmod{2^{2^{k_1}}}$ and because all other appearing
Fermat primes are also $\equiv 1 \pmod{2^{2^{k_1}}}$ (why?). Equation (27) “miraculously”
simplifies to $1 \cdot 1 \cdots 1 + 1 \equiv 2 \cdot 0 \cdot 0 \cdots 0 \pmod{2^{2^{k_1}}}$, i.e., $2 \equiv 0 \pmod{2^{2^{k_1}}}$.
The latter simply means that $2^{2^{k_1}} \mid 2$, i.e., $k_1 = 0$ and $p_1 = 3$ is the smallest
Fermat prime appearing in (27). Substituting $p_1 = 3$, we obtain:
\[
3 \bigl( 2^{2^{k_2}} + 1 \bigr) \cdots \bigl( 2^{2^{k_r}} + 1 \bigr) + 1 = 2^2 \cdot 2^{2^{k_2}} \cdots 2^{2^{k_r}}.
\tag{28}
\]
As suggested earlier, in order to find $k_2$ and $p_2$, we reduce equation (28)
modulo $2^{2^{k_2}}$. We leave the reader to follow the directive of PST 80 and
finish the problem in this fashion.
Along the way, the product of consecutive Fermat numbers will keep
displaying the following pretty pattern:
Lemma 6. $F_{k+1} = F_0 F_1 F_2 \cdots F_k + 2$ for all k ≥ 0.
For instance, $F_4 = 65{,}537 = 3 \cdot 5 \cdot 17 \cdot 257 + 2 = F_0 F_1 F_2 F_3 + 2$. The lemma
would provide serious shortcuts in your solution to $\varphi(2^{n+1} - 1) = 2^n$. The
final answers are $n = 2^s - 1$ for s = 0, 1, . . . , 5. ♦
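A computer search over small exponents is consistent with this answer, and Lemma 6 can be checked directly for the first few Fermat numbers. The sketch below is not a substitute for the modular argument; the bound of 40 on n is an arbitrary choice that keeps trial division fast, and the helper names are introduced here.

```python
from math import prod

def phi(n):
    """Euler function via n * product of (1 - 1/p), using trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

def fermat(k):
    return 2 ** (2 ** k) + 1

# All n < 40 with phi(2^(n+1) - 1) = 2^n
solutions = [n for n in range(40) if phi(2 ** (n + 1) - 1) == 2 ** n]
print(solutions)

# Lemma 6: F_(k+1) = F_0 F_1 ... F_k + 2
lemma_6 = all(fermat(k + 1) == prod(fermat(i) for i in range(k + 1)) + 2
              for k in range(6))
print(lemma_6)
```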
Problem 16. Solve the equation $\varphi(\varphi(n)) = 2^{13} 3^3$.
Hint: This problem definitely resembles Exercise 17, where a correct calculation
would have yielded $\varphi(\varphi(10!)) = 2^{13} 3^3$. Aha, we know one solution to
our current problem: $n = 10! = 2^8 \cdot 3^4 \cdot 5^2 \cdot 7$. Alas, there are plenty more! A few
that come to mind are n = 212 35 , 29 35 52 , 29 33 11·13, 23 ·7·11·13·17·19·37. . .
As you try to organize the array of all possibilities for n, you will be
forced to generalize the Fermat primes to primes of the form $2^k 3^m + 1$ for
some k, m ≥ 0. To classify the latter must be even harder than to classify
the “regular” Fermat primes $2^{2^k} + 1$. Luckily, in Problem 16 there is a natural
bound on the cases that need to be checked (what upper bound? cf. PST 78),
which allows for computer attacks on the solution. Is there a way to extract
“human sense” from such a computer solution? Is there a unified approach
to resolve all equations of the form $\varphi(\varphi(n)) = 2^a 3^b$ where a, b ≥ 0? ♦
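As a first step toward such a computer attack, the known solution n = 10! is easy to confirm, and the same helper can be used to test any other candidate. This is a minimal sketch; `phi` and `is_solution` are names introduced here.

```python
from math import factorial

def phi(n):
    """Euler function via n * product of (1 - 1/p), using trial division."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:
                m //= p
            result -= result // p
        p += 1
    if m > 1:
        result -= result // m
    return result

TARGET = 2 ** 13 * 3 ** 3

def is_solution(n):
    """True when phi(phi(n)) equals 2^13 * 3^3."""
    return phi(phi(n)) == TARGET

n = factorial(10)
print(n, phi(n), phi(phi(n)), TARGET)
print(is_solution(n))
```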
8. Hints and Solutions to Selected Problems

Property 2. Rewrite Definition 5 for the triple product $(f * g) * h$ as follows:
\begin{align*}
\bigl( (f * g) * h \bigr)(n) &= \sum_{d d_3 = n} (f * g)(d)\, h(d_3) = \sum_{d d_3 = n} \Bigl( \sum_{d_1 d_2 = d} f(d_1)\, g(d_2) \Bigr) h(d_3) \\
&= \sum_{d_1 d_2 d_3 = n} f(d_1)\, g(d_2)\, h(d_3),
\end{align*}
which fills the gap in showing associativity of $*$ in the text.

Exercise 11. Let m and n be relatively prime. If one of them is not square-
free, then mn is not square-free either; thus, both sides of μ(m)μ(n) = μ(mn)
are 0 by μ’s definition and equality holds. If both m and n are square-free,
their prime decompositions $m = q_1 q_2 \cdots q_s$ and $n = p_1 p_2 \cdots p_r$ multiply
to a square-free prime decomposition: $mn = q_1 q_2 \cdots q_s p_1 p_2 \cdots p_r$. Thus,
$\mu(m)\mu(n) = (-1)^s (-1)^r = (-1)^{s+r} = \mu(mn)$, and μ is multiplicative.

Exercise 12. (a) The initial purpose of discovering the function μ was to
find the D-inverse of ι, i.e., $\mu * \iota = \varepsilon$ and $S_\mu = \varepsilon$.
(b) The D-product $\mu * \mu$ is multiplicative since μ ∈ M, which reduces
the calculation to prime powers: $(\mu * \mu)(p^a) = \sum_{a_1 + a_2 = a} \mu(p^{a_1}) \mu(p^{a_2})$. If
a ≥ 3, then we always have $a_1 \ge 2$ or $a_2 \ge 2$, yielding a 0 μ-value and hence
a 0 overall answer. The remaining cases (with a ≤ 2) are:
• $(\mu * \mu)(p) = \mu(1)\mu(p) + \mu(p)\mu(1) = -2$;
• $(\mu * \mu)(p^2) = \mu(1)\mu(p^2) + \mu(p)\mu(p) + \mu(p^2)\mu(1) = 1$.
Multiplying all answers for the prime powers $p^a$ produces the final formula:
\[
(\mu * \mu)(n) = \begin{cases} (-2)^k & \text{if } n \text{ cube-free;} \\ 0 & \text{otherwise,} \end{cases}
\]
where k counts the primes $p_i$ such that $p_i \mid n$ but $p_i^2 \nmid n$. Thus, $S_{\mu * \mu} = \mu$.
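The closed formula for μ ∗ μ, and the fact that its sum-function is μ, can be confirmed numerically. The sketch below is illustrative; `mu_star_mu` and `closed_form` are names introduced here.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def mobius(n):
    sign, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            sign = -sign
        p += 1
    return -sign if n > 1 else sign

def exponents(n):
    """Exponents a_i in the prime decomposition of n."""
    result, p = [], 2
    while p * p <= n:
        if n % p == 0:
            a = 0
            while n % p == 0:
                n //= p
                a += 1
            result.append(a)
        p += 1
    if n > 1:
        result.append(1)
    return result

def mu_star_mu(n):
    """D-product (mu * mu)(n), straight from the definition."""
    return sum(mobius(d) * mobius(n // d) for d in divisors(n))

def closed_form(n):
    """(-2)^k for cube-free n, with k the number of primes of exponent 1; else 0."""
    exps = exponents(n)
    if any(a >= 3 for a in exps):
        return 0
    return (-2) ** sum(1 for a in exps if a == 1)

for n in (1, 2, 4, 6, 8, 12, 30, 36):
    print(n, mu_star_mu(n), closed_form(n))
```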
Exercise 13. Each relation f → Sf was verified somewhere in the text. ♦
Exercise 14. The explicit hint in the text leaves almost nothing to do:
\[
S_{\mu f}(p^a) = \sum_{j=0}^{a} \mu(p^j) f(p^j) = \mu(1) f(1) + \mu(p) f(p) + 0 = f(1) - f(p);
\]
\[
(\mu * f)(p^a) = \sum_{j=0}^{a} \mu(p^j) f(p^{a-j}) = \mu(1) f(p^a) + \mu(p) f(p^{a-1}) + 0 = f(p^a) - f(p^{a-1}).
\]
Multiplying these prime-power pieces gives the desired formulas.

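Exercise 15. (a)–(b) The sums are $S_{\mu\tau}$ and $S_{\mu\sigma}$, which are multiplicative.
If r counts the distinct prime divisors $p_i$ of n, by Exercise 14:

$$S_{\mu\tau}(n) = \prod_{i=1}^{r} \bigl(\tau(1) - \tau(p_i)\bigr) = \prod_{i=1}^{r} (1 - 2) = (-1)^r, \text{ and}$$

$$S_{\mu\sigma}(n) = \prod_{i=1}^{r} \bigl(\sigma(1) - \sigma(p_i)\bigr) = \prod_{i=1}^{r} \bigl(1 - (1 + p_i)\bigr) = \prod_{i=1}^{r} (-p_i) = (-1)^r \prod_{i=1}^{r} p_i.$$

(d)–(f) The sums are $\mu * \tau = \iota$, $\mu * \sigma = \mathrm{id}$, and $\mu * \mathrm{id} = \varphi$, which are the
Möbius inversions of $\tau = S_\iota$ (cf. (11)), $S_{\mathrm{id}} = \sigma$, and $S_\varphi = \mathrm{id}$, respectively.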
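Problem 9. (a) Using $\ln ab = \ln a + \ln b$ and $\ln p^a = a \ln p$, we calculate
$$S_\Lambda(n) = \sum_{d \mid n} \Lambda(d) = \sum_i \sum_{\alpha \ge 1,\ p_i^{\alpha} \mid n} \Lambda(p_i^{\alpha}) + 0 = \sum_i \sum_{\alpha \ge 1,\ p_i^{\alpha} \mid n} \ln p_i = \ln 1 + \sum_{p_i \mid n} a_i \ln p_i$$
$$= 0 + \sum_{p_i \mid n} \ln p_i^{a_i} = \ln\bigl(p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r}\bigr) = \ln n.$$

Hence, $S_\Lambda = \ln$, and by Möbius inversion, $\mu * \ln = \Lambda$.

(b) Using $\ln(a/b) = \ln a - \ln b$, we convert part (a) into part (b):

$$\sum_{d \mid n} \mu(d) \ln\frac{n}{d} = \sum_{d \mid n} \mu(d)(\ln n - \ln d) = \sum_{d \mid n} \mu(d) \ln n - \sum_{d \mid n} \mu(d) \ln d \overset{(*)}{=} 0 - S_{\mu \ln}(n).$$
Justify $(*)$! Hence, $S_{\mu \ln} = -\mu * \ln = -\Lambda$, i.e., $\sum_{d \mid n} \mu(d) \ln d = -\Lambda(n)$. ♦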

(c) The sum is $S_{\mu\pi} = (\mu\pi) * \iota$, which is probably its most compact
form. It is worth noting that, since μ “kills” any non-square-free d’s, the sum
depends only on the distinct primes p1 , p2 , . . . , pr dividing n; i.e., increasing
the exponents of prime powers does not change the value of the overall sum
(why?). Thus, Sμπ (n) = Sμπ (p1 p2 · · · pr ), and the latter can be expanded as
“a sum of sums” into a symmetric polynomial in the pi ’s. ♦

(d) The sum is the D-product $\mu * \pi$, which (in contrast to the previous
sum) does depend on the specific exponents of the prime powers dividing n.
Although a “closed form” seems to be out of question, here is an interesting
observation shifting the emphasis of the problem in a different direction:
Problem 17. (Evan O’Dorney) Show that $\sum_{d \mid n} \mu(d)\, \pi\bigl(\tfrac{n}{d}\bigr) > 0$ $\forall n \in \mathbb{N}$.
Solution: Let $n = p_1^{a_1} p_2^{a_2} \cdots p_r^{a_r} > 1$. For any subset S of $\{p_1, \ldots, p_r\}$
define $\pi(S) = \prod_{p_i \in S} p_i$ and $\pi(S + 1) = \prod_{p_i \in S} (p_i + 1)$ as the products of all
elements of S, or of the shifts up by 1 of all $p_i \in S$. As μ(d) = 0 for non-square-free
d’s, the only surviving terms μ(d)π(n/d) correspond to d = π(S)
for any such subset S. If |S| is the number of elements of S, the LHS of the
desired inequality can be written as
$$\sum_{S} (-1)^{|S|} \Bigl(\frac{n}{\pi(S)}\Bigr)^{\tau(n/\pi(S))/2},$$
one term for each such subset S. The “hero term” (i.e., the biggest term) is
when S is empty¹³ (why?); it equals $n^{\tau(n)/2}$. The biggest “enemy term” (i.e.,
the most negative term) must occur for some singleton set (why?). WLOG
$S = \{p_1\}$, so that this term (with the negative sign dropped) is $(n/p_1)^{\tau(n/p_1)/2}$.
Taking into account that $n \ge 2$, all $a_i \ge 1$, and $2^{r-2} \ge r - 1$ for $r \ge 1$
(why?), we bound the hero term from below as follows:
$$n^{\tau(n)/2} = n^{(a_1+1)(a_2+1)\cdots(a_r+1)/2} = n^{(a_2+1)\cdots(a_r+1)/2} \cdot n^{a_1(a_2+1)\cdots(a_r+1)/2}$$
$$> \bigl(2^{2^{r-2}}\bigr) \Bigl(\frac{n}{p_1}\Bigr)^{a_1(a_2+1)\cdots(a_r+1)/2} \ge 2^{r-1} \Bigl(\frac{n}{p_1}\Bigr)^{\tau(n/p_1)/2}.$$
In other words, the hero term is at least $2^{r-1}$ times as large as any enemy
term. But each enemy term corresponds to a subset S with an odd number
of elements, while each positive term to a subset S with an even number of
elements. A well-known combinatorial problem says the following:
Exercise 22. The number of subsets of a set with r elements is $2^r$. For
$r \ge 1$, half of these subsets have an odd number of elements, and the other
half have an even number of elements.
Thus, there are exactly $2^{r-1}$ enemy terms, whose sum is already dominated
by the single hero term. We conclude that the whole sum $(\mu * \pi)(n) > 0$
for all n > 1. ♦
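The inequality can be spot-checked numerically. The sketch below is an illustration only, under the assumption (suggested by the hero term $n^{\tau(n)/2}$) that π(n) denotes the product of all divisors of n. The function names `mobius`, `product_of_divisors`, and `mu_star_pi` are hypothetical and not taken from the text.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:          # a squared prime factor kills the value
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result

def product_of_divisors(n):
    """pi(n): the product of all positive divisors of n (assumed meaning of pi)."""
    result = 1
    for d in range(1, n + 1):
        if n % d == 0:
            result *= d
    return result

def mu_star_pi(n):
    """The D-product (mu * pi)(n) = sum over d | n of mu(d) * pi(n/d)."""
    return sum(mobius(d) * product_of_divisors(n // d)
               for d in range(1, n + 1) if n % d == 0)

print(mu_star_pi(6))                                    # 36 - 3 - 2 + 1
print(all(mu_star_pi(n) > 0 for n in range(1, 200)))
```

For n = 6 the four surviving terms are π(6) − π(3) − π(2) + π(1) = 36 − 3 − 2 + 1, with the hero term 36 visibly dominating the two enemy terms.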

¹³ Note that for S = ∅, the empty product $\prod_{p_i \in \emptyset}$ is defined to be 1.

Problem 10. The statement is a giveaway: it asks us to find $f^{-1}$ and show
that $f^{-1} \in M$. Instead, we go the other way around: we calculate only
the prime-power cases $f^{-1}(p^a)$, multiply them together for a multiplicative
function, and prove that the resulting formula indeed gives the D-inverse $f^{-1}$.
(Compare with PST 96 on order-switching in Geometry III on page 294.)
Initial checks for $f^{-1}$ are based on the definition $f^{-1} * f = \varepsilon$. For n = 1,
$f^{-1}(1) f(1) = 1$; as $f \in M$ with $f(1) \ne 0$, we have $f(1) = 1$ so that
$f^{-1}(1) = 1$ too. For n = p, a prime,

$$(f^{-1} * f)(p) = f^{-1}(1) f(p) + f^{-1}(p) f(1) = \varepsilon(p) = 0 \;\Rightarrow\; f^{-1}(p) = -f(p).$$

However, for $n = p^2$:
$$(f^{-1} * f)(p^2) = f^{-1}(1) f(p^2) + f^{-1}(p) f(p) + f^{-1}(p^2) f(1) = \varepsilon(p^2) = 0$$
$$\Rightarrow\; f^2(p) - f^2(p) + f^{-1}(p^2) = 0 \;\Rightarrow\; f^{-1}(p^2) = 0.$$

Along the way, we used that f is strongly multiplicative in $f(p^2) = f^2(p)$.

Continuing, you will discover that $f^{-1}(p^a) = 0$ when $a \ge 2$. The conjectured
multiplicativity of $f^{-1}$ means then that $f^{-1}(n) = 0$ if n is not square-free,
which, of course, reminds us of μ. If you think about it for a moment, you
will come up with the compact formula $f^{-1} = \mu f$.
Now we have to prove that this conjectured formula works. That μf
is multiplicative follows directly from Lemma 2: both μ and f are in M,
and hence so is their ordinary product μf. We are left to show that μf is
indeed the D-inverse of f, i.e., why is $(\mu f) * f = \varepsilon$? As a D-product of two
multiplicative functions μf and f, the entire LHS is also multiplicative. So,
to calculate it, we split as usual into prime powers $p^a$ with $a \ge 1$:
$$\bigl((\mu f) * f\bigr)(p^a) = \sum_{j=0}^{a} (\mu f)(p^j)\, f(p^{a-j}) = \sum_{j=0}^{a} \mu(p^j) f(p^j) f(p^{a-j})$$
$$= \mu(1) f(1) f(p^a) + \mu(p) f(p) f(p^{a-1}) + 0 \overset{f \in S}{=} f^a(p) - f(p) f^{a-1}(p) = 0.$$
Multiplying all such pieces yields $\bigl((\mu f) * f\bigr)(n) = 0$, except for n = 1: then
$\bigl((\mu f) * f\bigr)(1) = \mu(1) f(1) f(1) = 1$. Thus, $(\mu f) * f = \varepsilon$, and by uniqueness of
D-inverses (cf. p. 236) $f^{-1} = \mu f$ for any $f \in S^*$.
Since $\varepsilon * \varepsilon = \varepsilon$, then $\varepsilon^{-1} = \varepsilon$; and we already know that $\iota^{-1} = \mu$.
Our formula checks for both cases: με = ε (why?) and μι = μ (ι is the
multiplicative identity wrt “·”), so that $\varepsilon^{-1} = \mu\varepsilon$ and $\iota^{-1} = \mu\iota$. As for
$\mathrm{id}^{-1} = \mu \cdot \mathrm{id}$ ($\notin S$!), this is the first time we see a formula for $\mathrm{id}^{-1}$. ♦
It is interesting to generalize our formula $f^{-1} = \mu f$ for $f \in S^*$. In
analogy with $S^*$, define $M^* := M - \{O\}$.

Exercise 23. Show that $\bigl((\mu f) * g\bigr)(n) = g(n) \prod_{p \mid n} \Bigl(1 - \frac{f}{g}(p)\Bigr)$ for $f \in M^*$, $g \in S^*$.

Plugging $f = g \in S^*$ quickly yields $(\mu f) * g = \varepsilon$, i.e., $\mu f = f^{-1}$.


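Exercise 17. The cube-diagram on page 87 splits 10! into prime powers:
$$\varphi(\varphi(10!)) = \varphi(\varphi(2^8 3^4 5^2 7^1)) = \varphi\bigl(\varphi(2^8)\varphi(3^4)\varphi(5^2)\varphi(7)\bigr)$$
$$= \varphi\bigl((2^8 - 2^7)(3^4 - 3^3)(5^2 - 5)(7 - 1)\bigr) = \varphi(2^{11} 3^4 5^1)$$
$$= \varphi(2^{11})\varphi(3^4)\varphi(5) = (2^{11} - 2^{10})(3^4 - 3^3) \cdot 4 = 2^{13} 3^3.$$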
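As a quick cross-check of the arithmetic, here is a small sketch (an illustration, not the book's method) that evaluates φ(φ(10!)) directly. The helper `phi` is a hypothetical name for a plain trial-division implementation of the product formula for Euler's function.

```python
from math import factorial

def phi(n):
    """Euler's totient via n * prod(1 - 1/p) over the primes p dividing n."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p    # multiply result by (1 - 1/p)
        p += 1
    if n > 1:                        # one remaining prime factor
        result -= result // n
    return result

inner = phi(factorial(10))
outer = phi(inner)
print(inner == 2**11 * 3**4 * 5)     # the intermediate value from the solution
print(outer == 2**13 * 3**3)         # the final answer
```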

Exercise 18. (a) As x, y ≥ 1, we can calculate φ as follows:

$$\varphi(2^x 5^y) = \varphi(2^x)\varphi(5^y) = (2^x - 2^{x-1})(5^y - 5^{y-1}) = 2^{x-1} 5^{y-1} \cdot 4 = 2^{x+1} 5^{y-1}.$$
Since $80 = 2^4 5^1$, we obtain x = 3 and y = 2.
(b) If φ(n) = 12, then $\prod_i p_i^{a_i - 1}(p_i - 1) = 12$. Hence, each $(p_i - 1) \mid 12$.
The primes with this property are $p_i = 2, 3, 5, 7$, and 13. In addition, $p_i^{a_i - 1}$
also must divide $12 = 2^2 3^1$; thus, either $a_i = 1$, or $p_i = 2$ with $a_i \le 3$, or
$p_i = 3$ with $a_i \le 2$. Overall,
$$n = 2^{a_1} 3^{a_2} 5^{a_3} 7^{a_4} 13^{a_5} \text{ with } a_1 \le 3,\ a_2 \le 2, \text{ and } a_3, a_4, a_5 \le 1.$$
These are 4·3·2·2·2 = 96 cases: too many for brute-force!

PST 81. To reduce the number of cases to be checked, study whether a
specific prime participates in n, starting with the largest prime!

Case 1. If 13 participates (a5 = 1), then n = 13k where gcd(13, k) = 1.

Thus, φ(n) = φ(13)φ(k) = 12φ(k) = 12 and φ(k) = 1. Check that only
k = 1, 2 work here, so that n = 13 or 2·13.
Case 2. Similarly, if 7 participates, reduce to φ(k) = 2, which is satisfied
only by k = 3, 4, 6 (why?), so that $n = 3 \cdot 7$, $2^2 \cdot 7$, or $2 \cdot 3 \cdot 7$.
Case 3. If 5 participates, reduce to φ(k) = 3, which never works! (why?).
Case 4. If none of 13, 7, or 5 participates, $n = 2^{a_1} 3^{a_2}$. But $\varphi(2^{a_1}) = 2^{a_1 - 1}$
and $\varphi(3^{a_2}) = 3^{a_2 - 1} \cdot 2$, neither of which ever equals 12 because of missing
factors of 3 or 2. Thus, both 2 and 3 must participate in n with $a_1 = 2, 3$
and $a_2 = 2$ (why?). Check that $n = 2^2 3^2$ works, but $n = 2^3 3^2$ doesn’t. ♦
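What is too many for a human is nothing for a machine. The sketch below (illustrative only; `phi` and `BOUND` are names chosen here, not in the text) uses the exponent bounds derived in part (b): every solution divides $2^3 3^2 \cdot 5 \cdot 7 \cdot 13$, so it suffices to search up to that number.

```python
def phi(n):
    """Euler's totient via n * prod(1 - 1/p) over the primes p dividing n."""
    result, p = n, 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p
        p += 1
    if n > 1:
        result -= result // n
    return result

# Largest candidate permitted by a1 <= 3, a2 <= 2, a3, a4, a5 <= 1.
BOUND = 2**3 * 3**2 * 5 * 7 * 13

solutions = [n for n in range(1, BOUND + 1) if phi(n) == 12]
print(solutions)   # [13, 21, 26, 28, 36, 42]
```

The list agrees with Cases 1 to 4: 13 and 2·13 from Case 1; 3·7, 2²·7, and 2·3·7 from Case 2; nothing from Case 3; and 2²3² from Case 4.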
(c) Our first approach follows the techniques from the text. If φ(n) =
n/2, then 2 | n, so that $n = 2^a k$ with k odd and $a \ge 1$. Substituting gives
$2^{a-1}\varphi(k) = 2^{a-1} k$, and canceling yields φ(k) = k, whose only solution is k = 1
(for k > 1, the number k itself is not relatively prime to k, so φ(k) < k).
The final answer is $n = 2^a$ with $a \ge 1$. ♦
In a second approach, we apply the formula for φ and clear denominators:

$$n \prod_i \Bigl(1 - \frac{1}{p_i}\Bigr) = \frac{n}{2} \;\Rightarrow\; 2(p_1 - 1)(p_2 - 1) \cdots (p_r - 1) = p_1 p_2 \cdots p_r.$$
As 2 | LHS, $p_1 = 2$ on the RHS. Canceling results in $(p_2 - 1) \cdots (p_r - 1) = p_2 \cdots p_r$,
which doesn’t have solutions. Indeed, any further prime $p_2$ would be
odd; hence, $p_2 - 1$ would be even, making the LHS even; but all primes on the
RHS are odd, which makes the RHS odd! Therefore, the only participating
prime is $p_1 = 2$, and $n = 2^a$ with $a \ge 1$.

(d) For the second approach started in the text, consider again (23) and
PST 78. The RHS is divisible by 2, possibly by 4 (if $p_1 = 2$), but never by 8
(why?). Correspondingly, the same must be true for the LHS. Each factor
$p_i - 1$ is even, except for possibly $p_1 - 1 = 1$ (if $p_1 = 2$). We conclude that
there are at most two odd primes $p_2$ and $p_3$, and possibly one even prime
$p_1 = 2$. Further, since 3 | LHS, then $p_2 = 3$. So, the only possibilities are
$$3\,\widehat{(2 - 1)}\,(3 - 1)\,\widehat{(p_3 - 1)} = 2 \cdot \widehat{2} \cdot 3 \cdot \widehat{p_3} \;\Rightarrow\; \widehat{p_3 - 1} = \widehat{2} \cdot \widehat{p_3},$$
where anything under a hat $\widehat{\ \ }$ could be missing. But as $p_3 - 1$ and $p_3$ are
relatively prime, if $p_3$ occurs then $(p_3 - 1) \mid \widehat{2}$, i.e., $p_3 = 3$, contradicting $p_2 = 3$.
We are left with $1 = \widehat{2}$, which simply means that $p_1 = 2$ does not participate
either. This leads to the final answer $n = 3^a$ with $a \ge 1$.
Alternatively, in (23), 3 | LHS hence 3 | RHS; so WLOG $p_1 = 3$. Dividing
through by 6, $(p_2 - 1) \cdots (p_r - 1) = p_2 \cdots p_r$. Thus LHS < RHS, unless there
are no primes $p_2, p_3, \ldots, p_r$, i.e., r = 1 and n is a power of 3.
Exercise 19. All sums represent multiplicative functions. The only unclear
case may be in (d): $S_{\mu/\varphi}$. As in Lemma 2, the ordinary ratio f/g of two
multiplicative functions is also multiplicative, provided we can actually divide
by g. We don’t have a problem with division by φ since φ(n) ≥ 1 for all n
(as incidentally, the ∞-Raffle puzzle also required). The end results are:
(a) $S_{\mu\varphi}(n) = \prod_i (2 - p_i)$;   (b) $(\mu * \varphi)(n) = n \prod_i \bigl(1 - \frac{1}{p_i}\bigr)^2 = \frac{\varphi^2(n)}{n}$;
(c) $S_{\mu^2\varphi^2}(n) = \prod_i \bigl((p_i - 1)^2 + 1\bigr)$;   (d) $S_{\mu/\varphi}(n) = \prod_i \frac{p_i - 2}{p_i - 1}$. ♦

Exercise 20. Since n and $n^k$ have identical prime divisors $\{p_i\}$, we obtain:
$$\varphi(n^k) = n^k \prod_i \Bigl(1 - \frac{1}{p_i}\Bigr) = n^{k-1} \Bigl(n \prod_i \Bigl(1 - \frac{1}{p_i}\Bigr)\Bigr) = n^{k-1} \varphi(n).$$

Proof of (24). The text points to the prime decompositions of gcd(m, n) =
(m, n) and lcm(m, n) = [m, n]. Let $n = \prod_{i=1}^{r} p_i^{a_i}$ and $m = \prod_{i=1}^{r} p_i^{b_i}$ for the
same primes $p_i$, with $a_i \ge 0$ and $b_i \ge 0$. Then
(29) $(m, n) = p_1^{\min\{a_1, b_1\}} p_2^{\min\{a_2, b_2\}} \cdots p_r^{\min\{a_r, b_r\}}$, and
(30) $[m, n] = p_1^{\max\{a_1, b_1\}} p_2^{\max\{a_2, b_2\}} \cdots p_r^{\max\{a_r, b_r\}}$,
because the gcd picks the smaller of the two prime powers $p_i^{a_i}$ and $p_i^{b_i}$, while
the lcm picks the larger. For example, for $4 = 2^2 3^0$ and $6 = 2^1 3^1$, we obtain
$(4, 6) = 2^1 3^0 = 2$ and $[4, 6] = 2^2 3^1 = 12$. ♦
Now suppose that $a_i \le b_i$ for some i. Then $\min\{a_i, b_i\} = a_i$ and
$\max\{a_i, b_i\} = b_i$, so that the product $p_i^{\min\{a_i, b_i\}} p_i^{\max\{a_i, b_i\}} = p_i^{a_i + b_i}$. The
same result holds true if $a_i \ge b_i$. Thus, multiplying expressions for (m, n)
and [m, n] in (29)–(30) yields:
$$\prod_i p_i^{\min\{a_i, b_i\}} \prod_i p_i^{\max\{a_i, b_i\}} = \prod_i p_i^{a_i + b_i} = \prod_{i=1}^{r} p_i^{a_i} \prod_{i=1}^{r} p_i^{b_i} = mn.$$

Exercise 21. (a) Since $f \in M$, we can split everything along the corresponding
prime powers and use (24). To establish the equality, it will be,
therefore, sufficient to compare only the resulting prime-power pieces:
(31) $f(p_i^{b_i})\, f(p_i^{a_i}) \overset{?}{=} f\bigl(p_i^{\min\{a_i, b_i\}}\bigr)\, f\bigl(p_i^{\max\{a_i, b_i\}}\bigr)$.
This is true, as seen before: either $\min\{a_i, b_i\} = a_i$ and $\max\{a_i, b_i\} = b_i$, or
the other way around. Multiplying (31) for all i gives the desired relation
$$f(m) f(n) = f\bigl((m, n)\bigr)\, f\bigl([m, n]\bigr).$$

(b) We apply the formula for φ, keeping in mind that the prime divisors
of mn and of [m, n] comprise the same set $\{p_1, p_2, \ldots, p_r\}$ (why?):
$$(m, n)\, \varphi([m, n]) \overset{\varphi}{=} (m, n)[m, n] \prod_i \Bigl(1 - \frac{1}{p_i}\Bigr) \overset{(24)}{=} mn \prod_i \Bigl(1 - \frac{1}{p_i}\Bigr) \overset{\varphi}{=} \varphi(mn).$$

Trivial examples that satisfy (b) in place of φ are O, ε, and id; but none of
ι, μ, τ, or σ works. In fact,
Exercise 24. For any $f \in M$ show that Exercises 20 and 21(b), with f in
place of φ, are equivalent:
$$f(n^k) = n^{k-1} f(n)\ \ \forall n, k \in \mathbb{N} \;\iff\; f(mn) = (m, n)\, f([m, n])\ \ \forall m, n \in \mathbb{N}.$$
Hint: Split along prime powers and rewrite for any prime p as follows:
$$\frac{f(p^k)}{p^k} = \frac{f(p)}{p}\ \ \forall k \in \mathbb{N} \;\overset{?}{\iff}\; \frac{f(p^{a+b})}{p^{a+b}} = \frac{f(p^b)}{p^b}\ \ \forall a, b \in \mathbb{N},\ a \le b.$$
You can further simplify by substituting g(n) = f(n)/n for all n ∈ N. ♦
Problem 13. To use Gauss’s approach, we split all numbers from 1 to n
into pairs {t, n − t}. Note that t is relatively prime to n iff n − t is relatively
prime to n; thus, either both numbers t and n−t participate in the sum η(n),
or neither of them does. More good news: each pair adds up to n. Finally,
to avoid overcounting because of the pairing, we divide by 2 and skip writing
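that each sum runs over all t’s such that (t, n) = 1 and 1 ≤ t ≤ n:
$$\eta(n) = \tfrac{1}{2} \sum \bigl(t + (n - t)\bigr) = \tfrac{1}{2} \sum n = \tfrac{1}{2}\, n \sum 1 \overset{\text{def. } \varphi}{=} \tfrac{1}{2}\, n \varphi(n).$$
For the alternative combinatorial approach suggested in the text, let’s calculate
a few preliminary sums. If not indicated, all t’s run from 1 to n;
$\{p_1, p_2, \ldots, p_r\}$ are, as usual, the prime factors of n, and d is a divisor of n.
The basic “Gauss” sum-formula is $\sum_{t=1}^{n} t = \frac{n(n+1)}{2}$. If we restrict the sum to
multiples t of d, we can write t = dq for q = 1, 2, ..., n/d and add up:
$$\sum_{d \mid t} t = \sum_{q=1}^{n/d} dq = d \sum_{q=1}^{n/d} q \overset{\text{Gauss}}{=} d \cdot \frac{\frac{n}{d}\bigl(\frac{n}{d} + 1\bigr)}{2} = \frac{n(n + d)}{2d} = \frac{n^2}{2} \cdot \frac{1}{d} + \frac{n}{2}.$$
If, say, $d = p_i p_j p_k$, recall that $\binom{r}{3}$ counts the triples $\{p_i, p_j, p_k\}$, so that
$$\sum_{i<j<k}\ \sum_{p_i p_j p_k \mid t} t = \sum_{i<j<k}\ \sum_{d \mid t} t = \frac{n^2}{2} \sum_{i<j<k} \frac{1}{p_i p_j p_k} + \binom{r}{3} \frac{n}{2}.$$

Thus, the whole sum η(n) equals:

$$\eta(n) = \frac{n^2}{2} \sum_{k=0}^{r} (-1)^k \sum_{i_1 < i_2 < \cdots < i_k} \frac{1}{p_{i_1} p_{i_2} \cdots p_{i_k}} + \frac{n}{2} \sum_{k=0}^{r} (-1)^k \binom{r}{k}$$
$$= \frac{n^2}{2} \prod_{i=1}^{r} \Bigl(1 - \frac{1}{p_i}\Bigr) + \frac{n}{2} (1 - 1)^r = \frac{n}{2} \Bigl(n \prod_{i=1}^{r} \Bigl(1 - \frac{1}{p_i}\Bigr)\Bigr) + 0 = \frac{n \varphi(n)}{2}.$$ ♦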
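Both derivations can be checked against the definition of η. The following is a minimal sketch under the assumption that η(n) is the sum of all t with 1 ≤ t ≤ n and (t, n) = 1, as described above; the function names `eta` and `phi` are chosen here for illustration. The check starts at n = 2, since the pairing {t, n − t} needs at least one prime factor (r ≥ 1) to work.

```python
from math import gcd

def eta(n):
    """Sum of all t in 1..n that are relatively prime to n."""
    return sum(t for t in range(1, n + 1) if gcd(t, n) == 1)

def phi(n):
    """Euler's function, straight from the definition (a count)."""
    return sum(1 for t in range(1, n + 1) if gcd(t, n) == 1)

# Compare 2 * eta(n) with n * phi(n) to stay within integer arithmetic.
print(eta(10), 10 * phi(10) // 2)                       # 1 + 3 + 7 + 9 on the left
print(all(2 * eta(n) == n * phi(n) for n in range(2, 500)))
```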
i=1 i=1

Problem 14. Let $f(x) = \sum_{m=1}^{n} \alpha_m x^m$ where all $\alpha_m \in \mathbb{Z}$. As in the table-proof
for φ, for any relatively prime a and b we set up an a × b table of the
values {f(1), f(2), ..., f(ba)}.

f(1)             ···   f(i)             ···   f(a)
f(a + 1)         ···   f(a + i)         ···   f(2a)
···              ···   ···              ···   ···
f(ja + 1)        ···   f(ja + i)        ···   f((j + 1)a)
···              ···   ···              ···   ···
f((b − 1)a + 1)  ···   f((b − 1)a + i)  ···   f(ba)

As we saw for φ, all entries in the same column here are congruent modulo a
(that is, have the same remainder when divided by a). Indeed, since ja + i ≡
i (mod a), applying standard algebraic properties of remainders (cf. Number
Theory I), we have in the ith column:
(32) $f(ja + i) = \sum_m \alpha_m (ja + i)^m \equiv \sum_m \alpha_m i^m = f(i) \pmod{a}$.

Thus, if f (ja+i) is relatively prime to a, then f (i) will be relatively prime to

a, and consequently, all numbers in the ith column will be relatively prime to
a. In other words, “relative primeness with a” applies to a whole column or
to nothing in a column. By definition of λf (a), the first row contains exactly
λf (a) numbers relatively prime to a; hence, their λf (a) columns comprise
all entries in the table that are relatively prime to a. For example, in the
6 × 5 table for $f(x) = x^2 + 3x - 3$ in Figure 6a, we see $\lambda_f(6) = 4$ columns
indicated by ↑ and comprised of all entries relatively prime with 6.
As for b, unfortunately, the entries in a column do not necessarily have
distinct remainders modulo b. Check, for instance, that each column in
Figure 6a is comprised of entries with remainders {0, 0, 1, 2, 2} modulo 5; in
particular, remainders 3 and 4 (mod 5) are altogether missing from the table.

(a) The 6 × 5 table (6 columns, 5 rows):

     1     7    15    25    37    51
    67    85   105   127   151   177
   205   235   267   301   337   375
   415   457   501   547   595   645
   697   751   807   865   925   987
     ↑     ↑           ↑     ↑

(b) The 5 × 6 table (5 columns, 6 rows):

     1     7    15    25    37
    51    67    85   105   127
   151   177   205   235   267
   301   337   375   415   457
   501   547   595   645   697
   751   807   865   925   987
     ↑     ↑                 ↑

Figure 6. $\lambda_f(30) = \lambda_f(6)\lambda_f(5) = 4 \cdot 3$ for $f(x) = x^2 + 3x - 3$


To resolve this apparent obstacle, we need to modify our previous solu-

tion for φ. Notice, by symmetry, that in the b×a table containing overall the
same numbers {f (1), f (2), . . . , f (ab)}, the entries relatively prime to b are
grouped in λf (b) columns, and in each such column all entries are congruent
to each other modulo b. For example, Figure 6b displays three columns (in-
dicated by ↑) of entries congruent (mod 5) to 1, 2, and 2, respectively. Let’s
call these columns the “b-columns.”
 The key idea is simple: each a-column and each b-column share exactly
one entry (in the sense of wrapping any b-column around the a × b table and
landing exactly once on any pre-specified a-column). This is true regardless
of the specific polynomial f : the property depends only on the positioning
of cells in the table and on the relative primeness of a and b. For example,
the 6-column {25, 127, 301, 547, 865} in Figure 6a shares the single entry
127 with the 5-column {37, 127, 267, 457, 697, 987} in Figure 6b.
To show that the a-column {f (i), f (a + i), f (2a + i), . . . , f ((b − 1)a + i)}
intersects the b-column $\{f(l), f(b + l), f(2b + l), \ldots, f((a - 1)b + l)\}$ exactly
once (in a table position) is equivalent to solving the equation
(33) ja + i = kb + l
for some unique j = 0, 1, . . . , b − 1 and k = 0, 1, . . . , a − 1. In our example
above, the entry 127 = f (10) corresponds to the unique solution 1 · 6 + 4 =
2 · 5 + 0 = 10 of the equation 6j + 4 = 5k + 0 for 0 ≤ j ≤ 4 and 0 ≤ k ≤ 5.
Since we have done the whole problem for φ, we already know that (33) has
a unique solution: this is the substance of the third paragraph on page 244.
In conclusion, the a × b table contains exactly λf (a)λf (b) entries that
are relatively prime with a and b, i.e., λf (ab) = λf (a)λf (b) and λf ∈ M. ♦
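The counting in Figure 6 can be reproduced mechanically. The sketch below is an illustration under the assumption that $\lambda_f(n)$ counts the t in 1..n with gcd(f(t), n) = 1, as the solution uses it; the names `f` and `lam` are chosen here and are not the book's notation. It uses the sample polynomial $f(x) = x^2 + 3x - 3$ from the figure.

```python
from math import gcd

def f(x):
    """The sample polynomial from Figure 6."""
    return x * x + 3 * x - 3

def lam(n, poly=f):
    """lambda_f(n): how many of poly(1), ..., poly(n) are relatively prime to n."""
    return sum(1 for t in range(1, n + 1) if gcd(poly(t), n) == 1)

print(lam(6), lam(5), lam(30))     # the three values in the caption of Figure 6

# Multiplicativity on all relatively prime pairs (a, b) in a small range.
print(all(lam(a * b) == lam(a) * lam(b)
          for a in range(1, 16) for b in range(1, 16) if gcd(a, b) == 1))
```

Running it on pairs that are not relatively prime, say a = b = 2, shows why the hypothesis (a, b) = 1 cannot be dropped: here $\lambda_f(4) = 4$ while $\lambda_f(2)^2 = 4$ happens to agree, but a = b = 3 gives $\lambda_f(9) = 6 \ne 4$.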
To show that $\lambda_f(p^a) = p^{a-1} \lambda_f(p)$, wrap the $p^a$ values $\{f(1), f(2), \ldots, f(p^a)\}$
in a $p \times p^{a-1}$ table as usual:

f(1)                  f(2)                  ···   f(p)
f(p + 1)              f(p + 2)              ···   f(2p)
f(2p + 1)             f(2p + 2)             ···   f(3p)
···                   ···                   ···   ···
f(jp + 1)             f(jp + 2)             ···   f((j + 1)p)
···                   ···                   ···   ···
f((p^{a−1} − 1)p + 1)  f((p^{a−1} − 1)p + 2)  ···   f(p^a)

Figure 7. $\lambda_f(p^a) = p^{a-1} \lambda_f(p)$

By definition, $\lambda_f(p^a)$ counts all f(j) with $\gcd(f(j), p^a) = 1$, which is equivalent
to gcd(f(j), p) = 1. Further, each column in Figure 7 consists (again!) of
entries congruent to each other modulo p: f(jp + i) ≡ f(i) (mod p) (cf. (32)
with a = p). Thus, every row contains the same number of entries relatively
prime to $p^a$ as does the first row. As there are $p^{a-1}$ rows and the first row has
$\lambda_f(p)$ entries relatively prime to p, we conclude that $\lambda_f(p^a) = p^{a-1} \lambda_f(p)$.

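Using that $\lambda_f \in M$, we split along prime powers:

$$\lambda_f(n) = \prod_{i=1}^{r} \lambda_f(p_i^{a_i}) = \prod_{i=1}^{r} p_i^{a_i - 1} \lambda_f(p_i) = \prod_{i=1}^{r} p_i^{a_i} \prod_{i=1}^{r} \frac{\lambda_f(p_i)}{p_i} = n \prod_{i=1}^{r} \frac{\lambda_f(p_i)}{p_i}$$

$$\Rightarrow\; \frac{\lambda_f(n)}{n} = \prod_{i=1}^{r} \frac{\lambda_f(p_i)}{p_i}.$$

Being multiplicative, $S_{\lambda_f}$ also succumbs to prime-power splitting:

$$S_{\lambda_f}(n) = \prod_{i=1}^{r} S_{\lambda_f}(p_i^{a_i}) = \prod_{i=1}^{r} \sum_{j=0}^{a_i} \lambda_f(p_i^{j}) = \prod_{i=1}^{r} \Bigl(1 + \sum_{j=1}^{a_i} p_i^{j-1} \lambda_f(p_i)\Bigr).$$
Adding up the geometric series produces $S_{\lambda_f}(n) = \prod_{i=1}^{r} \Bigl(1 + \lambda_f(p_i) \frac{p_i^{a_i} - 1}{p_i - 1}\Bigr)$,
not as “comfortable”-looking as $S_{\lambda_x} = S_\varphi = \mathrm{id}$.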

Problem 15. Evidently, the next suitable modulus will be $2^{2^{k_2}}$, yielding
$3 + 1 \equiv 4 \cdot 0 \pmod{2^{2^{k_2}}}$, i.e., $4 \equiv 0 \pmod{2^{2^{k_2}}}$ and $2^{2^{k_2}} \mid 4$. Thus, $k_2 = 1$, and
$p_2 = 5$ is the second smallest appearing Fermat prime. Continuing in this
fashion, we discover that
• $3 \cdot 5 + 1 \equiv 0 \pmod{2^{2^{k_3}}}$, $2^{2^{k_3}} \mid 16$, $k_3 = 2$, and $p_3 = 17$;
• $3 \cdot 5 \cdot 17 + 1 \equiv 0 \pmod{2^{2^{k_4}}}$, $2^{2^{k_4}} \mid 256$, $k_4 = 3$, and $p_4 = 257$;
• $3 \cdot 5 \cdot 17 \cdot 257 + 1 \equiv 0 \pmod{2^{2^{k_5}}}$, $2^{2^{k_5}} \mid 65536$, $k_5 = 4$, and $p_5 = 65537$.
Something “bad” is about to happen, as $F_5$ is not prime! Indeed, the
smallest five primes appearing in our problem have been already determined.
If a sixth prime $p_6$ also appears, our technique of reducing modulo $2^{2^{k_6}}$ yields
(34) $3 \cdot 5 \cdot 17 \cdot 257 \cdot 65537 + 1 \equiv 0 \pmod{2^{2^{k_6}}}$.
The horrendous LHS is easily calculated by Lemma 6 invoked in the text:
$$F_0 F_1 F_2 F_3 F_4 + 1 \overset{\text{Lem. 6}}{=} F_5 - 1 \equiv 0 \pmod{2^{2^{k_6}}} \;\Rightarrow\; 2^{2^5} \equiv 0 \pmod{2^{2^{k_6}}} \;\Rightarrow\; k_6 \le 5.$$
As $k_5 = 4$ and $k_6 > k_5$, we conclude that $k_6 = 5$. But this is impossible since
$p_6 = 2^{2^5} + 1 = F_5$ is not prime, as Euler established several centuries ago!
Thus, if they participate, the primes are $p_1 = 3$, $p_2 = 5$, $p_3 = 17$, $p_4 = 257$,
and $p_5 = 65537$. Therefore, $2^n = \prod_{i=0}^{j} (p_{i+1} - 1) = \prod_{i=0}^{j} 2^{2^i}$, and we get
$n = 2^0 + 2^1 + \cdots + 2^j = 2^{j+1} - 1$ for $0 \le j \le 4$. Including n = 0, the final
answers are $n = 2^s - 1$ for s = 0, 1, 2, 3, 4, 5.
Lemma 6. We use induction on k. For k = 0, the basis step checks out:
$F_0 + 2 = 3 + 2 = 5 = F_1$. Assuming the statement is true for some $k \ge 0$,
i.e., $F_{k+1} = F_0 F_1 F_2 \cdots F_k + 2$, we move to the case for k + 1:
$$\text{RHS} = \bigl(F_0 F_1 \cdots F_k\bigr) F_{k+1} + 2 \overset{\text{IH}}{=} (F_{k+1} - 2) F_{k+1} + 2 = (F_{k+1}^2 - 2F_{k+1} + 1) + 1$$
$$= (F_{k+1} - 1)^2 + 1 = \bigl(2^{2^{k+1}}\bigr)^2 + 1 = 2^{2^{k+2}} + 1 = F_{k+2} = \text{LHS}.$$
This completes the inductive step and proves the lemma.
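Python's unbounded integers make Lemma 6 and Euler's observation about $F_5$ easy to confirm. This is a small sketch for illustration; the names `fermat` and `lemma6_holds` are hypothetical, and the factor 641 is the classical one found by Euler.

```python
def fermat(k):
    """The k-th Fermat number F_k = 2^(2^k) + 1."""
    return 2 ** (2 ** k) + 1

def lemma6_holds(k):
    """Check the statement F_{k+1} = F_0 * F_1 * ... * F_k + 2 for one k."""
    product = 1
    for i in range(k + 1):
        product *= fermat(i)
    return fermat(k + 1) == product + 2

print(all(lemma6_holds(k) for k in range(10)))
print([fermat(k) for k in range(5)])     # the five primes p1, ..., p5 of Problem 15
print(fermat(5) == 641 * 6700417)        # F_5 is composite
print([2 ** s - 1 for s in range(6)])    # the final answers n = 2^s - 1
```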
Session 11

Monovariants. Part III

Smoothing Inequalities

based on Gabriel Carroll’s session

Sneak Preview. In this final part, we will see a different kind of application
of monovariants. Previously we used monovariants instrumentally, to prove that
certain situations will or will not be reached. Now we will look at problems
where the monovariant itself is, explicitly, the whole point of the problem. This
specifically happens with inequality problems. We will smooth or unsmooth them,
and relying on convex functions, we will prove the famous relations between means
from Inequalities I. Applications to Olympiad problems will bring us full circle
to Monovariants I, to gain a deeper understanding of its signature problem on
mansion walks. As a bonus, the reader familiar with limits and continuity will
construct GPS devices to extend our techniques to endless smoothing processes.
While a review of Inequalities I is optional but highly recommended, a working
knowledge of induction is essential in this session.

1. The Balkan Roots Challenge

If the Olympiad problem below looks unfamiliar, puzzling, or simply

worth solving, then you are in the right session (alas, you have to be patient
until almost the end to see one solution of it via monovariants):

Problem 1. (Balkan ’98, [3]) If $n \ge 2$ and $0 < a_1 < \cdots < a_{2n+1}$, prove
$$\sqrt[n]{a_1} - \sqrt[n]{a_2} + \sqrt[n]{a_3} - \cdots + \sqrt[n]{a_{2n+1}} < \sqrt[n]{a_1 - a_2 + a_3 - \cdots + a_{2n+1}}.$$

In this Part III of Monovariants we will study and

apply a simple-looking but surprisingly powerful
method for approaching such problems:
PST 82. To prove an inequality A ≤ B for some
 expressions A and B, make incremental changes to
turn A into B, and show that the expression A can
only increase at each step, or turn B into A and
show that the expression B can never increase.

2. Smoothing and Unsmoothing

2.1. Striving for the golden median. Let’s illustrate the technique by
proving one of the most famous classic inequalities, AM-GM, known to the
prolix by its full name, the Arithmetic Mean–Geometric Mean inequality.¹
Problem 2. (AM-GM) Let $x_1, x_2, \ldots, x_n$ be positive numbers. Prove that
(1) $\dfrac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}$,
with equality if and only if all the $x_i$’s are equal.

Our approach is going to be to gradually change the values of the $x_i$’s in
a way that keeps their sum constant and makes their product increase. At
the end of the process, all $x_i$’s will be equal, and therefore (1) will hold with
equality. Since the product of the $x_i$’s has only increased throughout this
process, we can conclude that (1) was true initially.
Proof: Let x = (x1 + · · · + xn )/n be the average of the xi ’s. If all xi ’s are
equal to x, then (1) reads x ≥ x, which holds with equality as required.
Now suppose not all xi ’s are equal to x. They cannot all be less than or
equal to x (since then their average would be < x), so one of them must be
greater than x; call it x + a. Similarly, one of them must be less than x; call
it x − b (with a, b > 0).

Figure 1. Smoothing AM-GM (a: the points x − b, x, x + a − b, x + a on a number line) and Endless smoothing (b: the points $x_1$, x, $x_2$)

 The key operation in solving the problem is to replace x + a and x − b

by x + a − b and x (cf. Fig. 1a), which amounts to pulling them closer to
the mean x. The two new numbers x and x + a − b are both positive: we
know x + a − b is positive since it is greater than x − b, which was one of the
numbers in the original collection; and they have the same sum as before:
(x + a − b) + x = (x + a) + (x − b). Even better, their product is larger than
the previous:
(x + a − b) x − (x + a)(x − b) = ab > 0,
so (x + a − b) x > (x + a)(x − b). Multiplying by the other xi ’s shows that
the product of all the xi ’s strictly increases when we replace {x − b, x + a}
by {x, x + a − b} as above.
To summarize, each time we perform our operation:
• the sum of all xi ’s stays constant (so their average, the LHS, is still x);
• their product (and hence, the RHS) strictly increases;
• the number of xi ’s that are equal to x increases.
¹ Cf. Inequalities I for the context, background, and a list of classic inequalities.

It follows from the third bullet point that this replacement process, per-
formed repeatedly, must eventually come to an end (namely, after no more
than n repetitions), with all xi ’s equal to x, and at this point (1) holds with
equality. But throughout the replacement process, the LHS of (1) has stayed
the same and the RHS has strictly increased. Therefore, it follows that for
the original xi ’s, (1) must have been true with strict inequality. 
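To watch the monovariant at work, one can run the replacement process on actual numbers. The sketch below is an illustration only: the function names `smooth_once` and `smooth_fully` are chosen here, and exact rational arithmetic (`fractions.Fraction`) is an implementation choice to avoid rounding. It performs exactly the operation from the proof, replacing some x + a and x − b by x and x + a − b, and records the sum and the product after every step.

```python
from fractions import Fraction
from math import prod

def smooth_once(xs):
    """One step of the proof: replace x+a and x-b by x and x+a-b (x = the mean).

    Returns the new list, or None if all entries already equal the mean.
    """
    mean = sum(xs) / len(xs)
    big = next((v for v in xs if v > mean), None)      # some x + a
    if big is None:
        return None
    small = next(v for v in xs if v < mean)            # some x - b, must exist
    ys = list(xs)
    ys.remove(big)
    ys.remove(small)
    return ys + [mean, big + small - mean]             # x and x + a - b

def smooth_fully(xs):
    """Repeat smooth_once until all entries are equal; record (sum, product)."""
    xs = [Fraction(v) for v in xs]
    history = [(sum(xs), prod(xs))]
    while (nxt := smooth_once(xs)) is not None:
        xs = nxt
        history.append((sum(xs), prod(xs)))
    return xs, history

final, history = smooth_fully([1, 2, 3, 10])
print(final)
for s, p in history:
    print(s, p)       # the sum stays 16; the product climbs 60, 168, 240, 256
```

Starting from {1, 2, 3, 10} with mean 4, the process stops after three steps at {4, 4, 4, 4}, and the final product 256 equals $4^4$, the equality case of (1).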
This solution is an example of a very general, intuitive technique for
proving inequalities that was outlined in PST 82 and is made specific here:
PST 83. To prove an inequality with several variables gradually replace the

(values of the) variables with others to reach the equality case (or another
convenient extreme case) in such a way that one side of the inequality stays
the same, and the other side always increases or always decreases (e.g., the
sum stays the same while the product increases).
This technique is informally called smoothing, or unsmoothing when it
involves pulling variables apart instead of bringing them together. The example
above was one of smoothing. At least one of the later exercises in this
section will use unsmoothing.
Try smoothing the Arithmetic Mean–Harmonic Mean inequality:
Exercise 1. (AM-HM) If $x_1, x_2, \ldots, x_n$ are positive numbers, prove that
$$\frac{x_1 + x_2 + \cdots + x_n}{n} \ge \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}},$$
with equality if and only if all $x_i$’s are equal. Can you also derive this
inequality directly as a consequence of AM-GM?
2.2. Endless smoothing. There are many ways in which to smooth an
inequality, i.e., to bring the variables closer together.
Exercise 2. Suppose that in the preceding proof of AM-GM we instead try

to smooth in a perhaps more obvious way: if not all the numbers are equal
to their average x, then take two numbers xi < x and xj > x, and replace
them both by their average (xi + xj )/2 (cf. Fig. 1b). What if we average any
pair of numbers xi and xj , not necessarily coming from the opposite sides
of x? What goes wrong with the proof in either situation?
This shows that one has to be a little careful in choosing the smoothing
operation. If it’s not clear right away why the above doesn’t work, doing the
next two exercises – one concrete and the other more abstract – might help
give you a hint.
Exercise 3. In the setup of Exercise 2,
(a) Given any $2^n$ numbers, show that there is a way to equalize all numbers
after a finite number of replacement operations.
(b) Given any 3 numbers, if they are not all equal after the first replacement
operation, prove that they will never all be equal.

Now, if you still want to make the smoothing in Exercise 2 work, you may
resort to the idea that the process may go on forever and compensate for
this by use of limits (cf. Appendix on Limits); or you may apply a cool trick:
add a bunch of extra terms equal to the average x so that the total number
of terms becomes a power of 2, then apply Exercise 3(a) to smooth them all
to x, and finally drop the extra terms. Check that this works!

Exercise 4. Recast the argument in Problem 2 used to prove AM-GM above
as an induction on n: perform just one replacement and apply the induction
hypothesis (IH).
This exercise indicates how smoothing is really
logically equivalent to a kind of induction: a finite
induction, which eventually ends (as suggested by
the picture on the left). Smoothing is nonetheless
useful as a conceptual tool.
The discussion of smoothing applications also fully rounds out our dis-
cussion of monovariants. We started out Monovariants I with examples of
problems where the steps are given, and you need to come up with a mono-
variant. In using smoothing to prove inequalities, the monovariant is more
or less given, and you need to come up with the appropriate steps.

3. Rearranging Terms

3.1. Pairwise improvements. A classic example of using interesting steps

is the Rearrangement inequality. Here the steps involve not adding and
subtracting as in the proof of AM-GM above, but rather changing an ordering
of the terms. A similar idea occurred in Monovariants II, where we started
with an arbitrary arrangement of the objects (e.g., a random seating of the
knights at the Round Table) and then reshuffled these objects in order to
“improve” the situation and get it closer to the desired arrangement (e.g.,
one with the number of pairs of adjacent enemies reduced).
Let’s see how this rearrangement idea works with numbers in inequalities.
Problem 3. (Rearrangement Inequality (RI)) Let x1 ≥ x2 ≥ · · · ≥ xn
and y1 ≥ y2 ≥ · · · ≥ yn be real numbers, and let z1 , z2 , . . . , zn be equal to
y1 , y2 , . . . , yn in some order. Prove that
(3) x1 z1 + x2 z2 + · · · + xn zn ≤ x1 y1 + x2 y2 + · · · + xn yn .

Solution: We will use induction on n. We need two base cases. The

case n = 1 is trivial (equality). If n = 2, there are two possibilities for the
rearrangement {z1 , z2 } of {y1 , y2 }: either z1 = y1 and z2 = y2 , which again
just gives equality in (3), or z1 = y2 and z2 = y1 , in which case
(x1 y1 + x2 y2 ) − (x1 y2 + x2 y1 ) = (x1 − x2 )(y1 − y2 ) ≥ 0
and (3) follows.

Now let n > 2 and suppose the statement has been proven for n − 1. If
y1 = z1 , then we can simply apply the induction hypothesis to the remaining
n − 1 variables x2 , . . . , xn and y2 , . . . , yn . So suppose y1 = zk for some k > 1.
Since x1 ≥ xk and zk ≥ z1 , we can apply the n = 2 case to these four
variables, switching z1 and zk (x1 z1 + xk zk ≤ x1 zk + xk z1 ) and thereby
increasing the LHS:
LHS = x1 z1 + x2 z2 + · · · + xk−1 zk−1 + xk zk + xk+1 zk+1 + · · · + xn zn

(4) ≤ x1 zk + x2 z2 + · · · + xk−1 zk−1 + xk z1 + xk+1 zk+1 + · · · + xn zn .

Now, zk = y1 and all the other zi ’s must be equal to y2 , y3 , . . . , yn in some
order, so applying IH to these n − 1 variables, we get:
x2 z2 + · · · + xk−1 zk−1 + xk z1 + xk+1 zk+1 + · · · + xn zn ≤ x2 y2 + · · · + xn yn .
Combining this with (4) gives the desired result (3). 
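The proof can be replayed step by step on concrete numbers. The following sketch is illustrative (the helper names `rearrangement_sum` and `sort_by_swaps` are assumptions introduced here): it moves the largest remaining y into place by a single swap, exactly as in the inductive step, and records the sum of products after each swap. A brute-force search over all orderings confirms the maximum.

```python
from itertools import permutations

def rearrangement_sum(xs, zs):
    """The LHS of (3): x1*z1 + x2*z2 + ... + xn*zn."""
    return sum(x * z for x, z in zip(xs, zs))

def sort_by_swaps(xs, ys_sorted, zs):
    """Turn zs into ys_sorted by transpositions, as in the inductive proof.

    Both xs and ys_sorted must be in non-increasing order.
    Returns the final ordering and the sum of products after each swap.
    """
    zs = list(zs)
    sums = [rearrangement_sum(xs, zs)]
    for i, target in enumerate(ys_sorted):
        if zs[i] != target:
            k = zs.index(target, i + 1)      # position k > i currently holding y_i
            zs[i], zs[k] = zs[k], zs[i]      # the n = 2 case applied to x_i >= x_k
            sums.append(rearrangement_sum(xs, zs))
    return zs, sums

xs = [5, 3, 2, 1]
ys = [4, 3, 2, 1]
final, sums = sort_by_swaps(xs, ys, [2, 4, 1, 3])
print(final, sums)                            # the sums never decrease
best = max(rearrangement_sum(xs, p) for p in permutations(ys))
print(best == rearrangement_sum(xs, ys))
```

For this sample the recorded sums are 27, 31, 33, 34: each transposition raises the monovariant until the ordering $y_1, \ldots, y_n$ is reached.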
What did we do here? No monovariant was explicitly mentioned, but
one was stealthily lurking behind this solution: the sum of the products
xi zj . The step we performed was to:
PST 84. Switch the order of the variables $z_j$’s, gradually going from the arbitrary
initial ordering to the ordering $y_1, \ldots, y_n$, so that the sum of products
$x_i z_j$ (i.e., the LHS of RI) increases at each step.
We could have attempted any permutation of the zj ’s at each step. De-
spite the final application of IH that permuted n − 1 of the zj ’s, at its
basic level our solution really switches only two numbers at a time (why?).
Switching or, formally, transposing two elements is the simplest non-trivial
permutation of a set. But we know (say, from Rubik II) that any permuta-
tion of a set can be achieved by consecutive applications of transpositions,
which is the reason why we eventually managed to rearrange the zj ’s in the
pre-determined order of the yj ’s.

PST 85. The advantage of using only transpositions, as in the proof of RI,
is that we need to keep track of what happens to both sides of the inequality,
and having the simplest steps makes this possible and easy.

3.2. Dualizing. The Rearrangement inequality can be summarized by say-

ing that the sum of the products xi zi is maximized when larger numbers
are matched in size with larger numbers. Certainly, it is natural to ask when
this sum of the products will be minimized. Will the opposite hold true: that
the numbers we multiply have to be mismatched in size as much as possible?
Indeed, the “dual” version of RI says precisely that:
Exercise 5. (Dual RI) If x1 ≥ · · · ≥ xn and y1 ≥ · · · ≥ yn are any real
numbers, and {z1 , . . . , zn } is any reordering of {y1 , . . . , yn }, then prove that
 x1 yn + x2 yn−1 + · · · + xn y1 ≤ x1 z1 + x2 z2 + · · · + xn zn .
Can you prove this using a monovariant argument similar to the one above?
Can you deduce it as a direct consequence of the original RI?

4. Convexity and Smoothing

4.1. Convex functions readily yield to smoothing. Powerful, well-known,
and widely-applicable inequalities frequently involve convex functions
one way or another. Let’s recall that a function f is strictly convex on
an interval I if for all x ≠ y in I and all 0 < λ < 1, the following holds:
(5) f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y).
In particular, equality in (5) holds iff x = y. Visually, the graph of a convex
function looks like a smile (or part of a smile as in Fig. 2a); more formally,
the segment connecting any two points on the graph lies above the part of
the graph between these two points. A function f (x) is just convex on I if
inequality (5) is non-strict (i.e., it is satisfied with a “≤” sign).
Convex functions are quite important in the study of inequalities, as
we discovered in the earlier session Inequalities I. There, we also noted an
equivalent and faster way of verifying convexity for continuous² functions:
\[ f\left(\frac{x+y}{2}\right) < \frac{f(x)+f(y)}{2} \quad \text{for all } x \ne y \text{ in } I. \tag{6} \]
This is called the Midpoint Rule (illustrated in Fig. 2a-b) and it is a sufficient
condition for f to be strictly convex. In other words, for a continuous
function we need to verify the definition of convexity just for λ = 1/2, in order
to conclude that the function is convex everywhere.³
Exercise 6. Use the Midpoint Rule to conclude that each of the functions
f(x) = 1/x and g(x) = − ln x (cf. Fig. 2a-b) is strictly convex for x > 0.

Figure 2. Midpoint Rule for 1/x and − ln x, and Smoothing Lemma

But we didn’t need convex functions before when we proved the AM-
GM, AM-HM, or the Rearrangement inequality! Were convex functions
there? Yes, in each and every solution so far that incorporated smoothing/unsmoothing
or rearranging of terms, convexity quietly stood in the background
and was the reason that our arguments worked!
If you don’t believe it, let’s take another . . .
² Technically speaking, f(x) is continuous at x = a if lim_{x→a} f(x) = f(a).
³ See the Appendix on Limits for a proof of the Midpoint Rule. There are also other,
more technical ways to investigate whether functions are convex or not, for example, via
derivatives, which we won’t use in this session.

4.1.1. Convex look at AM-HM. The key smoothing step in Exercise 1 was to
replace two of the numbers, x1 = x − b and x2 = x + a, by x + a − b and the
average x of our n non-negative xi ’s. This kept the sum x1 + x2 constant,
but it also “miraculously” decreased the sum of the reciprocals 1/x1 + 1/x2 , which
is what we needed for our monovariant argument to go through. The convex
function used was none other than f(x) = 1/x for x > 0.
This precise smoothing situation is so common that we phrase it as:
Lemma 1. (Smoothing) If f (x) is a convex function on interval I, and
A < B < C < D are numbers in I such that the middle two are equidistant
from the end ones, i.e., B − A = D − C, then
 f (B) + f (C) ≤ f (A) + f (D).
Moreover, if f (x) is strictly convex, then the inequality above is strict.
Concisely put, bringing the inputs closer together (as {A, D} → {B, C}
in Fig. 2c) decreases the sum of the outputs of a convex function. For the
novice, it will be a worthwhile experience to attack this Smoothing Lemma
about four points by the definition of convexity, which relates only three
points at a time.
PST 86. If you are given a statement P1 about k objects, but you are trying
to prove some other statement P2 about n objects (where n > k), select all
 or several suitable k-element subsets out of the n objects, apply P1 to each
subset, and then bring together your results in P2 by adding, multiplying, or
performing some other such symmetric operation.
Hint for Smoothing Lemma 1: It turns out that of the four possible
triplets of points from {A, B, C, D} only two will suffice (hinted by Fig. 2c):
you just have to choose them symmetrically, apply the definition of convexity
to each triplet, and then add up your inequalities. The geometry-oriented
reader may want to find a fast trapezoidal explanation. ♦

Going back to AM-HM, we apply Lemma 1 to the convex f(x) = 1/x on
(0, ∞) and the numbers {x − b, x, x + a − b, x + a} (possibly x > x + a − b):
\[ f(x-b) + f(x+a) > f(x) + f(x+a-b) \;\Rightarrow\; \frac{1}{x-b} + \frac{1}{x+a} > \frac{1}{x} + \frac{1}{x+a-b}. \]

Since the LHS (AM) remains constant, and from the above the RHS (HM)
increases (why?), our monovariant argument for AM-HM works out. 
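For readers who like to see numbers, here is a minimal check of the Smoothing Lemma on the AM-HM step. It is a sketch under our own assumptions: the helper name smoothing_gap and the sample values x = 5, a = 3, b = 2 are ours, and exact fractions are used so that no rounding interferes.

```python
from fractions import Fraction


def smoothing_gap(f, A, B, C, D):
    """Return f(A) + f(D) - f(B) - f(C) for inner points equidistant from the ends."""
    if not (A < B and A < C and B < D and C < D and B - A == D - C):
        raise ValueError("need A < B, C < D with B - A == D - C")
    return f(A) + f(D) - f(B) - f(C)


def reciprocal(t):
    # The strictly convex function f(x) = 1/x on (0, infinity).
    return 1 / Fraction(t)


x, a, b = 5, 3, 2
# Outer points x - b and x + a, inner points x and x + a - b.
gap = smoothing_gap(reciprocal, x - b, x, x + a - b, x + a)
print(gap)  # 11/120
```

A positive gap says that moving {x − b, x + a} to {x, x + a − b} lowered the sum of the reciprocals, as Lemma 1 predicts for a strictly convex function.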

4.1.2. Convex look at AM-GM. To locate the convex function behind the
proof of AM-GM requires some manipulation of the inequality. The key
smoothing step was the same as in the AM-HM proof: to replace x1 = x + a
and x2 = x − b by x + a − b and the average x of the given n non-negative
numbers. This kept the sum x1 + x2 constant, but how do we explain that
it also increased the product x1 x2 ?

Products are not conveniently recognized as convex functions. Still, we

can turn them into sums by applying, say, ln x: ln(xy) = ln(x) + ln(y) and
ln x^{1/n} = (1/n) ln x. The effect of this on both sides of AM-GM is:
\[ \ln\frac{x_1 + x_2 + \cdots + x_n}{n} \overset{?}{\ge} \ln\sqrt[n]{x_1 x_2 \cdots x_n} = \frac{\ln x_1 + \ln x_2 + \cdots + \ln x_n}{n}. \]
But this is the opposite to the inequality that we would expect for a convex
function, and not surprisingly, ln x is not convex: it is commonly called concave,
the opposite of convex. To reverse the inequality, the convex function
to be used here is f(x) = − ln x:
\[ -\ln\frac{x_1 + x_2 + \cdots + x_n}{n} \overset{?}{\le} \frac{-\ln x_1 - \ln x_2 - \cdots - \ln x_n}{n}. \]
Again by Lemma 1, − ln(x + a) − ln(x − b) > − ln(x) − ln(x + a − b), from
which the monovariant argument for AM-GM can be completed. 

4.2. Smoothing Jensen’s inequality. Smoothing or unsmoothing famous

inequalities is a treat for problem-solvers. Inequalities I discussed the rela-
tions between ubiquitous means such as AM, GM, HM, and more generally,
power means and their weighted versions. Our next goal is to prove two
fundamental inequalities: Jensen’s inequality (ordinary and weighted) and
Hardy-Littlewood-Pólya’s inequality. These two inequalities imply the rest of
the standard inequalities among power means.
To start off, Jensen’s inequality can be recognized as generalizing the
Midpoint Rule from two numbers x and y to any n numbers:
Problem 4. (Jensen’s Inequality (JI)) Let f be a strictly convex func-
tion on some interval I. If x1 , x2 , . . . , xn are any numbers in I, prove that

\[ f\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \le \frac{f(x_1) + f(x_2) + \cdots + f(x_n)}{n}, \]
with equality if and only if all xi ’s are equal.
Hint: Let x be the average of x1 , x2 , . . . , xn . Use the same method as in
the proof of AM-GM: if the xi ’s are not all equal to x, then change two of
them so that one becomes x but the average of all stays the same. Lemma 1
will show that, meanwhile, the sum in the RHS has decreased. ♦
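The hint describes a finite procedure, which can be sketched in a few lines. This is an illustration under our own assumptions, not the book's solution: the function name jensen_smoothing is ours, we always pair the current smallest and largest numbers (one convenient choice), and exact fractions make the equality tests reliable.

```python
from fractions import Fraction


def jensen_smoothing(xs, f):
    """Smooth xs towards their average; return sum(f(x_i)) after each step."""
    xs = [Fraction(x) for x in xs]
    mean = sum(xs) / len(xs)
    history = [sum(f(x) for x in xs)]
    while any(x != mean for x in xs):
        i = min(range(len(xs)), key=lambda j: xs[j])  # a number below the average
        k = max(range(len(xs)), key=lambda j: xs[j])  # a number above the average
        # One number becomes the average; the other absorbs the difference,
        # so the total (and hence the average) is unchanged.
        xs[i], xs[k] = mean, xs[i] + xs[k] - mean
        history.append(sum(f(x) for x in xs))
    return history


history = jensen_smoothing([1, 2, 6], lambda t: t * t)
print([int(h) for h in history])  # [41, 29, 27]
```

Each step makes at least one more number equal to the average, so the loop stops after at most n − 1 steps, and the last recorded value is n·f(x).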
Exercise 7. (AM-GM again!) Modify the previous proof of AM-GM
with another convex function in place of the logarithmic f (x) = − ln x; for
example, the exponential 2^x or, frankly, b^x for any constant b > 1. Further
shorten these proofs by applying Jensen’s inequality.
Problem 5. (Weighted JI) Let f be a strictly convex function on some
interval I. Prove that if x1 , x2 , . . . , xn are any numbers in I and λ1 , λ2 , . . . , λn
are any non-negative numbers whose sum is 1, then
f (λ1 x1 + λ2 x2 + · · · + λn xn ) ≤ λ1 f (x1 ) + λ2 f (x2 ) + · · · + λn f (xn ),
with equality if and only if all xi ’s with non-zero weights λi are equal.

The reason for the sum of the weights λi to be 1 is two-fold. On the

one hand, it forces the weighted average x̃ = λ1 x1 + λ2 x2 + · · · + λn xn to be
between the smallest and the largest of the xi ’s (why?) and, therefore, on the
interval I where f is defined and convex. On the other hand, something that
might be self-evident: now there is no need to divide by n when taking the
weighted average of the xi ’s because the λi ’s have taken care of this (how?).
Although we warned in Exercise 2 against a (possibly endless) smoothing
argument that replaced each of two variables by their pairwise average, let’s
see how this can be turned to our advantage:

Proof of weighted JI: Take two variables that are not equal, say,
x1 ≠ x2 , and replace each by their weighted average
x̃ = (λ1/(λ1 + λ2)) x1 + (λ2/(λ1 + λ2)) x2 .
Here we divided the original weights λ1 and λ2 by (λ1 + λ2 ) in order to make
the new weights of x1 and x2 sum to 1. From the definition of a convex
function on [x1 , x2 ] (or [x2 , x1 ]):
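\[ \frac{\lambda_1}{\lambda_1+\lambda_2} f(x_1) + \frac{\lambda_2}{\lambda_1+\lambda_2} f(x_2) \ge f\left(\frac{\lambda_1}{\lambda_1+\lambda_2} x_1 + \frac{\lambda_2}{\lambda_1+\lambda_2} x_2\right) = f(\tilde{x}) \]
\[ \overset{\times(\lambda_1+\lambda_2)}{\Longrightarrow} \quad \lambda_1 f(x_1) + \lambda_2 f(x_2) \ge (\lambda_1+\lambda_2) f(\tilde{x}) = \lambda_1 f(\tilde{x}) + \lambda_2 f(\tilde{x}), \]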
which shows that the RHS of the weighted JI decreased. Meanwhile, the
weighted average of all numbers did not change:
(λ1 x1 + λ2 x2 ) + λ3 x3 + · · · + λn xn = (λ1 x̃ + λ2 x̃) + λ3 x3 + · · · + λn xn ,
so the LHS stayed constant. This provides the intended smoothing argument,
alas, possibly never ending!

However, something more happened: not only have we replaced each
x1 and x2 by x̃, but we can actually combine these two variables into one
variable x̃, with weight λ̃ = λ1 + λ2 , and prove instead the inequality:
f (λ̃x̃ + λ3 x3 + · · · + λn xn ) ≤ λ̃f (x̃) + λ3 f (x3 ) + · · · + λn f (xn ),
where λ̃ + λ3 + · · · + λn = 1. So, there are actually one invariant and two
monovariants anchoring this proof:
• the constant LHS and the decreasing RHS;
• the total number of variables (and not the number of variables equal
to the average), which decreases by 1 at every step.
At the end, we are left with only one variable or with all variables equal to
each other, both of which cases are trivially true. Backtracking, equality is
obtained iff all original variables with non-zero weights are equal (why?). 
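The merging procedure of this proof can be traced numerically. The sketch below rests on our own assumptions: the name weighted_jensen_merge is ours, the first two variables are always the ones merged (the proof allows any unequal pair), and the weights are taken positive with sum 1.

```python
from fractions import Fraction


def weighted_jensen_merge(weights, xs, f):
    """Merge variables two at a time; return the final point and the RHS history."""
    pts = [(Fraction(w), Fraction(x)) for w, x in zip(weights, xs)]
    rhs_history = [sum(w * f(x) for w, x in pts)]
    while len(pts) > 1:
        (w1, x1), (w2, x2) = pts[0], pts[1]
        w = w1 + w2                          # combined weight
        x_tilde = (w1 * x1 + w2 * x2) / w    # weighted average of the pair
        pts = [(w, x_tilde)] + pts[2:]       # one variable fewer
        rhs_history.append(sum(wi * f(xi) for wi, xi in pts))
    return pts[0][1], rhs_history


x_final, rhs = weighted_jensen_merge(
    [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)], [1, 4, 7], lambda t: t * t
)
print(x_final, [str(r) for r in rhs])  # 3 ['14', '61/5', '9']
```

The number of variables drops by 1 at every step, the RHS values go down, and the last RHS value is f of the overall weighted average, i.e., the LHS.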
In Inequalities I we showed that the weighted JI implies the weighted
versions of AM-GM, AM-HM, and other inequalities among means. With
our new understanding of convex functions and smoothing techniques, the
reader may want to redo these proofs here. For “extra credit,”
 Problem 6. Invent other problems that can be solved by (weighted) JI.

4.3. Hardy-Littlewood-Pólya’s inequality (HLP) (or Karamata’s inequality)
will require more than just straightforward smoothing. First recall
how we defined majorization of sequences in Inequalities I:
Definition 1. The n-tuple {x1 , x2 , . . . , xn } is said to majorize the n-tuple
{y1 , y2 , . . . , yn } if x1 ≥ x2 ≥ · · · ≥ xn , y1 ≥ y2 ≥ · · · ≥ yn and
\[
\begin{aligned}
x_1 &\ge y_1; \\
x_1 + x_2 &\ge y_1 + y_2; \\
&\;\;\vdots \\
x_1 + \cdots + x_{n-1} &\ge y_1 + \cdots + y_{n-1}; \\
x_1 + \cdots + x_{n-1} + x_n &= y_1 + \cdots + y_{n-1} + y_n.
\end{aligned}
\tag{7}
\]

The majorizing conditions (7) appear commonly in the theory of inequalities,

as well as the theory of partitions and elsewhere.
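Conditions (7) are easy to test mechanically. Here is a small sketch; the function name majorizes is our own, and as a convenience (an assumption beyond Definition 1, which expects sorted input) it first sorts both tuples in decreasing order.

```python
from itertools import accumulate


def majorizes(xs, ys):
    """Check conditions (7) after sorting both tuples in decreasing order."""
    if len(xs) != len(ys) or not xs:
        return False
    px = list(accumulate(sorted(xs, reverse=True)))  # partial sums x1 + ... + xi
    py = list(accumulate(sorted(ys, reverse=True)))  # partial sums y1 + ... + yi
    first_n_minus_1 = all(a >= b for a, b in zip(px[:-1], py[:-1]))
    return first_n_minus_1 and px[-1] == py[-1]


print(majorizes([5, 3, 1], [4, 3, 2]))  # True
print(majorizes([4, 3, 2], [5, 3, 1]))  # False
```

With non-integer data, exact types such as fractions are safer than floats for the final equality of the totals.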
Problem 7. (HLP) Suppose that {x1 , . . . , xn } majorizes {y1 , . . . , yn } and
f is a convex function on an interval I containing the xi ’s and yi ’s. Then
f (x1 ) + f (x2 ) + · · · + f (xn ) ≥ f (y1 ) + f (y2 ) + · · · + f (yn ).
If f is strictly convex on I, then equality is attained iff xi = yi for all i.
The proof of HLP is tricky, because we have to choose the steps so that
the constraints (7) of majorization never get violated. We proceed in stages.

4.3.1. Happy endings that could happen. In two cases the HLP situation will
be majorly simplified:
Happy ending 1. All xi are equal. Then the last equality of (7) yields:
nx1 = y1 + · · · + yn ≤ ny1 , i.e., x1 ≤ y1 . Combining with the first inequality x1 ≥ y1 , we
conclude x1 = y1 . Cancelling both x1 and y1 from all inequalities, we remain
in the same situation but for only n − 1 numbers. Continuing inductively,
we arrive at xi = yi for all i, and then HLP follows trivially. 
Happy ending 2. An inequality in (7) is an equality: x1 + · · · + xk =
y1 + · · · + yk for some k ≤ n − 1. We can then restrict the problem to
the first k inequalities and variables. Moreover, canceling x1 + · · · + xk
and y1 + · · · + yk from both sides of the remaining inequalities, we again
arrive at the HLP problem but only for the sequences {xk+1 , . . . , xn } and
{yk+1 , . . . , yn }. Applying induction on the number of variables, we conclude
that HLP works for the first k variables, and also for the last n − k variables:
f (x1 ) + f (x2 ) + · · · + f (xk ) ≥ f (y1 ) + f (y2 ) + · · · + f (yk ),
f (xk+1 ) + f (xk+2 ) + · · · + f (xn ) ≥ f (yk+1 ) + f (yk+2 ) + · · · + f (yn ).
Summing, we obtain the HLP inequality for all n variables. 

4.3.2. How to get to a happy ending? We will come up with an operation

which, at every step, will:
(1) increase the number of initial xi ’s that are all equal to each other,
x1 = · · · = xk ; or failing this,
(2) lead us to the Happy ending 2.
The operation, of course, will help us prove the HLP inequality: we will
ensure that the sum of the f (xi )’s (the LHS) decreases due to convexity,
while the yi ’s (and hence the RHS) remain the same.
Start by letting k be the first index for which x1 = · · · = xk−1 > xk . Let
x = (x1 +· · ·+xk )/k be the average of the first k xi ’s. We will perform one of
two smoothings, being careful to preserve the majorization inequalities (7).
Smoothing 1. If we can replace each of x1 , . . . , xk by x (cf. Fig. 3a) so
that {x1 , . . . , xn } still majorizes {y1 , . . . , yn }, then we do so. This equalizes
the first k numbers: x1 = · · · = xk , for which Jensen’s inequality implies
f(x1) + · · · + f(xk) ≥ kf(x), i.e., the LHS of the desired HLP has decreased.

Figure 3. Smoothing 1 and Smoothing 2 of HLP

Smoothing 2. If Smoothing 1 disturbs some inequality in (7), apply

Lemma 2. For some positive value a < x1 −x we can shift down x1 , . . . , xk−1
by a to x1 − a, . . . , xk−1 − a, and compensate by shifting up xk to xk + (k − 1)a
(cf. Fig. 3b), so that one of the majorization inequalities (7) becomes an
equality while the other inequalities are preserved.
Proof: Since x1 + . . . + xk−1 + xk is not changed, the inequalities after the
k th in (7) are not affected by Smoothing 2. If one of the first k inequalities
were an equality from the get-go, then we are already at Happy ending 2.
If not, let ai = (1/i)(LHSi − RHSi ) > 0 be the difference of the two sides of
the ith inequality, divided by i. Note that ai is exactly as much as we would
decrease each xj in the ith inequality in order to make it into an equality:
x1 + · · · + xi > y1 + · · · + yi ⇒ (x1 − ai ) + · · · + (xi − ai ) = y1 + · · · + yi .
Set a = am to be min{a1 , a2 , . . . , ak−1 } and use it to perform Smoothing 2,
i.e., decrease each x1 , . . . , xk−1 by a, and increase xk by (k − 1)a. This
will make the mth inequality into an equality, while preserving the rest of
(7). Since Smoothing 1 (x1 → x) would have violated something in (7) but
Smoothing 2 (x1 → x1 − a) does not, we must have x1 − x > a (why?). We
can now arrange the players in Smoothing 2 in increasing order (cf. Fig. 3b):
xk < xk + (k − 1)a < x < x1 − a < x1 ,
where the second inequality (∗), xk + (k − 1)a < x, is true, or else all k new
numbers would be ≥ their average x, with some of them > x!
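One Smoothing 2 step is short enough to compute by hand or by machine. The sketch below is an illustration under our own assumptions: the function name smoothing2 and the sample tuples are ours, both tuples are sorted in decreasing order, the x's majorize the y's, not all x's are equal, and the first k − 1 inequalities of (7) are strict.

```python
from fractions import Fraction


def smoothing2(xs, ys):
    """Perform one Smoothing 2 step; return the shift a and the new x-tuple."""
    xs = [Fraction(x) for x in xs]
    # k (1-based) is the first index with x_1 = ... = x_{k-1} > x_k.
    k = next(i for i in range(1, len(xs)) if xs[i] < xs[0]) + 1
    # a_i = (LHS_i - RHS_i) / i for the first k - 1 inequalities of (7).
    gaps = [(sum(xs[:i]) - sum(ys[:i])) / i for i in range(1, k)]
    a = min(gaps)
    # Shift x_1, ..., x_{k-1} down by a and x_k up by (k - 1) * a.
    new_xs = [x - a for x in xs[:k - 1]] + [xs[k - 1] + (k - 1) * a] + xs[k:]
    return a, new_xs


a, new_xs = smoothing2([6, 6, 1, 1], [5, 4, 3, 2])
print(a, [int(x) for x in new_xs])  # 1 [5, 5, 3, 1]
```

In this example k = 3 and the average of the first three x's is 13/3 < 5, so Smoothing 1 would break the first inequality of (7). After Smoothing 2 the first inequality has become the equality 5 = 5, which is Happy ending 2.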

What happened to the LHS? Did it decrease under Smoothing 2, i.e.,

f(x1) + · · · + f(xk) = (k − 1)f(x1) + f(xk) ≥ (k − 1)f(x1 − a) + f(xk + (k − 1)a)?
By making appropriate replacements, you can identify this inequality with
the following generalization of the Smoothing Lemma:
Lemma 3. (Multi-smoothing) If f (x) is a convex function on interval I,
and A < B < C < D are numbers in I such that C is l times closer to D
than B is to A for some l ∈ N, i.e., B − A = l(D − C), then
 f (A) + lf (D) ≥ f (B) + lf (C).
Moreover, if f (x) is strictly convex, then the inequality above is strict.
Hint: You could repeatedly apply the Smoothing Lemma, or you
could devise a fast-track geometry argument with trapezoids. In the
case of l = 4, the drawing on the right has already set up the ground
for both solutions. ♦
[Figure: the points A, B1 , . . . , Bl−1 , B, C, D.]

4.3.3. Weaving inductively the proof of HLP. We now have all pieces to put
together: we know where we would like to end, and we know how to get
there. Naturally, we use induction on n. The HLP inequality is trivially true
for n = 1, since then x1 = y1 . Suppose HLP is true for n − 1 variables.
For n variables, we keep applying our Smoothing 1 or 2 operations until
we make all variables equal to each other, or until one of the majorization
(7) inequalities (other than the nth one) becomes an equality. These two
situations were addressed before, and each ends happily. 
A “bifurcation” phenomenon persisted throughout our solution of HLP:
there were two happy endings and two smoothing procedures to get there.
PST 87. While it is often possible to construct a smoothing operation leading
eventually to all variables being equal, some inequalities call for alternative
smoothing operations that lead to other favorable outcomes. In such
problems, you have to simultaneously take into account two or more scenarios
throughout the induction (or smoothing) process.
Now that we have proven HLP,
Exercise 8. Can you recognize the Smoothing and the Multi-smoothing
Lemmas as special cases of HLP?
For practice, explain why HLP implies the following:
Corollary 1. (HLP for Products) If {x1 , . . . , xn } majorizes {y1 , . . . , yn }
 on interval I and all xi ’s and yj ’s are positive, then x1 x2 · · · xn ≤ y1 y2 · · · yn .
Equality is attained if and only if xi = yi for all i.

5. Random Fun with Smoothing

Here are two beautiful Olympiad problems that will challenge us to com-
bine old monovariant ideas in creative new ways. We will only discuss how
to link the problems to what we have already learned, and leave it to the
reader to “smooth out” (pun intended) all arguments into complete solutions.

5.1. Alternating sums are featured in the Balkan Olympiad Problem 1

that started our session: if n ≥ 2 and 0 < a1 < · · · < a2n+1 , then
\[ \sqrt[n]{a_1} - \sqrt[n]{a_2} + \sqrt[n]{a_3} - \cdots + \sqrt[n]{a_{2n-1}} - \sqrt[n]{a_{2n}} + \sqrt[n]{a_{2n+1}} \overset{?}{<} \sqrt[n]{a_1 - a_2 + a_3 - \cdots + a_{2n-1} - a_{2n} + a_{2n+1}}. \]

Constructing a clever smoothing (or unsmoothing) procedure for a convex
(or a concave) function will finally unravel the inequality for us. Asking
the right questions is half of the smoothing! So, let’s start.

5.1.1. What function should we use? The only function around is ⁿ√x for
x > 0. Even if we apply induction on n and manage to decrease the number
of variables, it is unlikely that we will succeed in reducing to the previous
root function ⁿ⁻¹√x. So, we have to live with the same function f(x) = ⁿ√x
throughout the whole solution. To make matters slightly more annoying, ⁿ√x
is concave (why?), which can be resolved easily: all of our known inequalities
will be applied with the opposite signs.

5.1.2. What is our monovariant: what feature of the inequality is it feasible

to preserve? Not the sum, evidently, but the alternating sum of the variables.
Hence, we could try to keep the RHS constant, while increasing the LHS.
For the same reason, useful smoothing cannot possibly collect terms together
towards an average, because preserving the sum seems irrelevant here.

5.1.3. Can we reduce the number of variables? If we combine two consecutive

terms into one, i.e., −a2n + a2n+1 → −a2n , the number of terms will change
from odd to even, and the nature of the problem will change.⁴
Instead, we could combine three consecutive terms
into one: something like a2n−1 − a2n + a2n+1 → a, and
thereby involve the four variables a2n−1 , a2n , a, a2n+1 ,
along with their nth roots in our smoothing argument.
Note that a2n−1 < a < a2n+1 (why?), but we don’t
know (and won’t care) which of a and a2n is larger.
What we do care about is that a and a2n are symmetrically placed in
the interval [a2n−1 , a2n+1 ] (why?). Which obvious tool comes to mind? ♦

⁴ Besides, the inequality doesn’t make sense with an even number of variables; e.g.,
√1 − √2 <? √(1 − 2) = √(−1) = i. . . . Nope, complex numbers cannot be compared like that!

5.2. Concentration monovariant revisited! Our last Olympiad problem

will bring us full circle to the very first serious monovariant we constructed
way back in Part I. Does something below look familiar?
Problem 8. (USAMO ’99, [4]) Let n > 3, and let a1 , a2 , . . . , an be real
numbers such that a1 + a2 + · · · + an ≥ n and a1² + a2² + · · · + an² ≥ n². Prove
that at least one of the numbers a1 , a2 , . . . , an is ≥ 2.
5.2.1. Easy come, easy go! Before we plunge into a deep discussion, we
should discard the possibility of a trivial solution. At a first glance, the
solution rests on inequalities about the sum of squares Σ ai², along with the
assumption that all ai < 2: n² ≤ a1² + a2² + · · · + an² < 4n, from which n < 4,
a contradiction. . . . or is it? We forgot that ai² < 4 would be implied if the
ai > 0, but our ai ’s could be any real numbers! The inequality P1 ≤ P2
between the arithmetic mean and the root mean square also requires ai ≥ 0
and yields the unhelpful 1 < 2. There goes any chance of a trivial solution.
5.2.2. “Rewriting” history. Our Problem 8 resembles a lot the signature prob-
lem of Monovariants I about the mansion. Let’s take a fresh look at it.
Problem 9. (Mansion) P people reside in the rooms
of an n-room mansion. Each minute a person walks from
one room into another with at least as many people in
it. Prove that eventually everyone will be in one room.
Proof : The mansion problem came with an explicit
operation – the movement of people between rooms. But
it did not have a readily available monovariant. In our solution, we created
the latter as the “concentration of the mansion,” namely, the sum of the
squares a1² + a2² + · · · + an² where ak was the number of people in room k.
Was the given operation smoothing with respect to our monovariant?
To the contrary: it was unsmoothing! To see this,
picture the action happening along the graph of the
convex f(x) = x² for x > 0: whenever ai ≤ aj
a person could move from room i to room j and
(ai , aj ) → (ai − 1, aj + 1). This kept the sum of
all ak ’s constant, while it pulled them apart.
So by Smoothing Lemma 1 applied in reverse:
f(ai) + f(aj) < f(ai − 1) + f(aj + 1) ⇒ ai² + aj² < (ai − 1)² + (aj + 1)²,
i.e., the monovariant increased at each step. To complete the argument:
each ai could assume only finitely many values (why?), and hence so could
the monovariant Σ ai², which prevented it from increasing forever.
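The mansion process is pleasant to simulate. The following sketch makes its own assumptions: the function name mansion, the random choice among legal moves, and the seed are ours; the problem itself allows any legal move at each minute.

```python
import random


def mansion(rooms, seed=0):
    """Simulate legal moves until one room holds everyone; track the concentration."""
    rng = random.Random(seed)
    rooms = list(rooms)
    concentration = [sum(a * a for a in rooms)]
    while sum(1 for a in rooms if a > 0) > 1:
        # A legal move takes a person from room i to a different room j
        # that holds at least as many people as room i.
        moves = [(i, j)
                 for i, ai in enumerate(rooms) if ai > 0
                 for j, aj in enumerate(rooms) if j != i and aj >= ai]
        i, j = rng.choice(moves)
        rooms[i] -= 1
        rooms[j] += 1
        concentration.append(sum(a * a for a in rooms))
    return rooms, concentration


final, conc = mansion([2, 1, 3, 0])
print(sorted(final), conc[0], conc[-1])  # [0, 0, 0, 6] 14 36
```

Whatever moves are chosen, the concentration grows by 2(aj − ai) + 2 ≥ 2 at each step and cannot exceed P², which is why the loop must stop.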
5.2.3. Learning from the past. To attack our current problem, let’s try to
change the numbers a1 , a2 , . . . , an in a systematic way to reach the extreme
case, in the spirit of smoothing. In order to preserve the given inequality
a1² + a2² + · · · + an² ≥ n², we need each step to increase the sum of squares,

not decrease it. This is where the mansion problem comes in; it suggests
that we should try an unsmoothing operation.
There are minor differences between the two problems, all resolvable:
• Before, the ai ’s were integers ≥ 0. Now they are any real numbers.
• Before, we could only make changes of (−1, +1) to pairs ai < aj . Now
the unsmoothing change (−a, +a) is allowed for any a > 0.
• Before, the monovariant eventually came to a full stop, simply because
it had only finitely many possible values. But now, we have continuous
variables ak and, therefore, infinitely many values for Σ ai². How do we
make the monovariant stop changing?
We need another, discrete monovariant to put the brakes on the process.
Recall the goal of the problem: to show that some ai ≥ 2. Assuming to the
contrary that all ai < 2, create an unsmoothing operation that increases at
each step the number of ai ’s ≥ 2. ♦
In both the mansion and USAMO ’99 problems the sum of squares acted
as a “concentration” monovariant. From our discussion of convex functions,
we know that we don’t have to use squares. In the latter problem the mono-
variant is, at least hypothetically, in danger of continuing to increase for-
ever, so we helped it with an auxiliary monovariant. These ideas are general
enough to be written out as PSTs.
PST 88. If you have a collection of numbers xi whose sum stays constant,

and need a monovariant that increases when the numbers become more
“spread out,” try using the sum of their squares. More generally, you can
try using the sum of f (xi ) where f is any strictly convex function.

PST 89. If the monovariant is continuous (or can take on infinitely many
values), create another, discrete monovariant (e.g., the number of variables
with some specific property) that will cause the smoothing process to end.
For fun and to understand better the technique of smoothing:
Exercise 9. Redo the problems from Monovariants I about gender bal-
ance and hybrid mansions, leaping frogs along collinear lilies or in a circular
swamp, and simultaneous switches. (The images below should help bring
on a flashback.) Identify explicitly the convex functions and the smoothing
operations used in the solutions. Create more exercises of the same type.

6. Appendix on Limits and Endless Smoothing

We avoided using limits so far in the Monovariants sessions, relying on

discrete arguments. As promised, we shall prove here several technical state-
ments for the reader advanced in limits and continuous functions.

6.1. We need a GPS device! Smoothing by pairwise averaging did not

work in Exercise 3(b). We started there with three numbers, {a, a, b}. At
each step we replaced two of the unequal numbers with their average and
left the third number unchanged:
\[ \{a, a, b\} \to \left\{a, \frac{a+b}{2}, \frac{a+b}{2}\right\} \to \left\{\frac{3a+b}{4}, \frac{3a+b}{4}, \frac{a+b}{2}\right\} \to \cdots \tag{8} \]
Unfortunately, this smoothing process never ends because it keeps producing
the same type of configuration {c, c, d} with c ≠ d. Still, performing a
few steps of the process leads to the inevitable observation that the three
numbers are crowding closer and closer around their average x = (2a + b)/3
(cf. Fig. 4a). Can we make this precise?
Figure 4. Approaching x and Decreasing dk

Exercise 10. Let {ak , bk , ck } be the three numbers after performing k steps
of the pairwise averaging in (8). Then
\[ \lim_{k\to\infty} a_k = \lim_{k\to\infty} b_k = \lim_{k\to\infty} c_k = x. \]
There are many ways to prove this, but perhaps the most easily gener-
alized way rests on a standard “sandwich” idea:
PST 90. To see why several sequences converge to the same limit x, set dk
to be the maximal distance between x and all the numbers after the k th step.
 Show that lim dk = 0 to force all sequences to converge to x.
Solution to Exercise 10: The process averages only pairs of numbers
that come from opposite sides of x (why?), i.e., if ak < x < bk , then ak
and bk each go to (ak + bk )/2. Suppose ak is further away from x than bk ,
i.e., dk = |x − ak |. Using the notation in Figure 4b, the simple geometric
argument CX < CB = (1/2) AB < AX = dk shows that ak shortened its
distance to x by a factor of at least 2. Applying the averaging once more
will bring in the third number at least twice as close to x as it was before.
To summarize, dk decreases at each step and gets at least halved every
other step, i.e., dk+2 ≤ dk /2. This results in limk→∞ dk = 0, and by PST 90
all three sequences ak , bk , and ck converge to x. 
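The behaviour of dk can be watched directly. In the sketch below the function name distance_history and the starting values a = 0, b = 12 are our own choices; exact fractions keep the configuration {c, c, d} recognizable at every step.

```python
from fractions import Fraction


def distance_history(a, b, steps):
    """Run the averaging (8) from {a, a, b}; return d_0, d_1, ..., d_steps."""
    nums = sorted([Fraction(a), Fraction(a), Fraction(b)])
    mean = sum(nums) / 3
    history = [max(abs(t - mean) for t in nums)]
    for _ in range(steps):
        if nums[0] == nums[1]:
            # Configuration {c, c, d} with d largest: average one c with d.
            avg = (nums[1] + nums[2]) / 2
            nums = [nums[0], avg, avg]
        else:
            # Configuration {d, c, c} with d smallest: average d with one c.
            avg = (nums[0] + nums[1]) / 2
            nums = [avg, avg, nums[2]]
        history.append(max(abs(t - mean) for t in nums))
    return history


print([str(d) for d in distance_history(0, 12, 4)])  # ['8', '4', '2', '1', '1/2']
```

In this particular run dk happens to halve at every step, which is consistent with (and slightly better than) the bound dk+2 ≤ dk /2.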
With four or more numbers, however, there are choices for the order of
pairs to average, and if we are not careful, our numbers may not converge
to the same place! Using a distance monovariant again,

Exercise 11. Devise an algorithm for pairwise averaging of x1 , x2 , . . . , xn
that forces them to approach their arithmetic average x = (x1 + x2 + · · · + xn)/n.
We will refer to such an algorithm as a good pairwise smoothing, or for
short, a GPS directing all numbers x1 , x2 , . . . , xn towards their average x.
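One natural candidate for a GPS (our suggestion, not necessarily the intended answer to Exercise 11) is to average the current smallest and largest numbers at every step. Replacing u and v by their average lowers the sum of squared distances to x by exactly (u − v)²/2, which is what drives the numbers together. The function name gps_max_min, the tolerance, and the step cap below are our own choices.

```python
def gps_max_min(xs, tol=1e-9, max_steps=10_000):
    """Repeatedly average the smallest and the largest number."""
    xs = [float(x) for x in xs]
    steps = 0
    while max(xs) - min(xs) > tol and steps < max_steps:
        i = xs.index(min(xs))
        j = xs.index(max(xs))
        xs[i] = xs[j] = (xs[i] + xs[j]) / 2  # pairwise averaging keeps the sum fixed
        steps += 1
    return xs, steps


result, steps = gps_max_min([0, 1, 5, 10])
print(all(abs(v - 4.0) < 1e-6 for v in result))  # True
```

The process is still infinite in principle; the code simply stops once all numbers agree to within the tolerance.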

6.2. Limits tame inequalities. To see how this is useful in working with
inequalities, suppose you want to prove something as general as:
(9) LHS = F (x1 , x2 , . . . , xn ) ≤ G(x1 , x2 , . . . , xn ) = RHS
for two functions F and G, continuous for all xi in some interval I. Suppose
further that under pairwise averaging of the inputs (xi , xj ) → ((xi + xj)/2, (xi + xj)/2),
F increases and G decreases. Using a GPS, we have lim_{k→∞} xi = x for all i,
where the steps of the GPS are indexed by k.⁵ By continuity of F and G and
properties of limits, we are left to prove only the middle inequality below:
\[ \text{LHS} \le \lim_{k\to\infty} F(x_1, \dots, x_n) = F(x, \dots, x) \le G(x, \dots, x) = \lim_{k\to\infty} G(x_1, \dots, x_n) \le \text{RHS}. \]

We have gained a powerful insight:

PST 91. If pairwise smoothing does not disturb the inequality between two

continuous functions, i.e., it lifts the smaller side up and lowers the larger
side down, then all you need to show is that the inequality is true when all
variables are equal.
This simplifies the proof of some inequalities we have encountered so far.
If you are bothered by the continuity condition, rest assured that:
Lemma 4. Any composition of the four arithmetic operations ±, ×, ÷, the
algebraic operations of raising to a power, and any of the standard continuous
functions such as exponential, logarithmic, or trigonometric functions, is
continuous on any interval where this composition is well-defined.

6.3. Infinite pairwise smoothing is useful, after all! With this sea of
continuous functions, we can attack a number of inequalities. Keep in mind
that any convex function on interval I is necessarily continuous on I (why?).

Exercise 12. Using pairwise smoothing, prove Jensen’s inequality and the
inequalities between the arithmetic mean P1 and any other power mean Pr .

Partial Proof: Since Pr = ((x1^r + · · · + xn^r)/n)^{1/r} is continuous for xi > 0 and
Pr = P1 for equal inputs, when r > 1 the proof of P1 ≤ Pr boils down to
showing that Pr strictly decreases under pairwise smoothing for x ≠ y:
\[ x^r + y^r \overset{?}{>} 2\left(\frac{x+y}{2}\right)^r \;\Leftrightarrow\; \sqrt[r]{\frac{x^r + y^r}{2}} \overset{?}{>} \frac{x+y}{2}. \]
⁵ Strictly speaking, each xi should also be indexed by k since xi changes with the steps.

The latter is the original inequality P1 < Pr but only for two variables, which
is a substantial reduction brought about by the pairwise smoothing.
Setting f (x) = xr and I = (0, ∞), we can rewrite the last inequality as:
\[ f\left(\frac{x+y}{2}\right) \overset{?}{<} \frac{f(x) + f(y)}{2} \quad \text{for all } x \ne y \text{ in } I. \tag{10} \]
“Surprise!” This is the Midpoint Rule for the strictly convex f (x) on I! ♦

6.4. Why is the Midpoint Rule (MR) true? Taking (10) as given, we
need to show the definition of a strictly convex function, i.e.,
(11) f (λx + (1 − λ)y) < λf (x) + (1 − λ)f (y) for any λ ∈ (0, 1).
Starting with x and y, we have to get to x̃ = λx +(1−λ)y, their weighted
average, by using only ordinary averages of pairs so that we can apply MR
at each step. Another (weighted) GPS construction is in order here:
Lemma 5. There is a sequence {an } inside interval (x, y) that converges to x̃
 so that each an is midway between some previous ai ’s, including possibly x
and y. If an = λn x + (1 − λn )y for some λn ∈ (0, 1) then lim λn = λ.
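One way to build such a sequence (a sketch of one possible construction, not necessarily the intended one) is bisection on the weight: keep two already reached weights lo ≤ λ ≤ hi, starting from 0 (the point y) and 1 (the point x), and take their midpoint. The function name midpoint_weights and the sample λ = 1/3 are ours.

```python
from fractions import Fraction


def midpoint_weights(lam, n_terms):
    """Return weights lambda_1, ..., lambda_n, each the midpoint of two earlier weights."""
    lam = Fraction(lam)
    lo, hi = Fraction(0), Fraction(1)  # weight 0 is the point y, weight 1 is the point x
    weights = []
    for _ in range(n_terms):
        mid = (lo + hi) / 2            # a_n is midway between two previous points
        weights.append(mid)
        if lam < mid:
            hi = mid
        else:
            lo = mid
    return weights


ws = midpoint_weights(Fraction(1, 3), 5)
print([str(w) for w in ws])  # ['1/2', '1/4', '3/8', '5/16', '11/32']
```

Since the interval [lo, hi] halves at every step and always contains λ, the n-th weight is within 1/2^n of λ, so the weights converge to λ.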

By induction on n, show that inequality (11) works for these an ’s:

Lemma 6. f(an) < λn f(x) + (1 − λn)f(y) for all n ≥ 1.

To complete the proof of the Midpoint Rule, we apply limits to both

sides of the inequality in Lemma 6:
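\[ \lim_{n\to\infty} f(a_n) \le \left(\lim_{n\to\infty} \lambda_n\right) f(x) + \left(\lim_{n\to\infty} (1 - \lambda_n)\right) f(y) \;\Rightarrow\; f(\tilde{x}) \le \lambda f(x) + (1 - \lambda) f(y), \]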
which is almost what we wanted, but since the limits on both sides of the
inequality could be equal, we lost along the way the strict inequality! How-
ever, we did prove (11) as a non-strict “≤” inequality for all λ ∈ (0, 1), and
thereby for all weighted averages of x and y. Geometrically, this means that
the graph of f on (x, y) is below or on the segment XY . (In the figure, for
any z ∈ I we denote by Z the point on the graph of f over z).
If (11) were an equality for some λ ∈ (0, 1), then the point X̃ (above the
weighted average x̃) must lie on segment XY. Because of the Midpoint Rule
hypothesis, λ ≠ 1/2 (why?), i.e., x̃ is not the midpoint between x and y; so
say, x̃ is closer to x than to y. Then x̃ is midway between x and another
x′ ∈ (x, y). Again, from the strict inequality in the Midpoint Rule we know that
2f(x̃) < f(x) + f(x′). Geometrically, this puts point X̃ below XX′, i.e., X′
is strictly above line XX̃ = XY! This contradicts our previous conclusion!
Thus, all inequalities (11) are strict, f (x) is strictly convex on I, and the
Midpoint Rule is indeed true. 

6.5. Final Quiz. Recall Problem 2 from Inequalities I: if a1, a2, . . . , an are
positive numbers and $g = \sqrt[n]{a_1 a_2 \cdots a_n}$ is their geometric mean, then
\[ (1 + a_1)(1 + a_2) \cdots (1 + a_n) \ge (1 + g)^n. \tag{12} \]
The solution in Inequalities I relied on a fair amount of combinatorics and
some algebra, in addition to clever multiple applications of AM-GM to a
variety of terms. It is reasonable now to attempt to prove this inequality
using smoothing, but since the sum of the ai ’s does not appear in the prob-
lem, at least not in an obvious way, can smoothing be performed at all?
Expanding by multiplying out the LHS will “create” the desired sum Σ ai;
unfortunately, various products of the ai ’s will also pop up on that side of the
intended inequality, messing up any smoothing attempts that fix the sum!
Problem 10. Challenge your understanding of the techniques developed in
this session by finding a quick and elegant proof to inequality (12) via:
 (a) a finite smoothing argument;
(b) an infinite smoothing argument.

7. Hints and Solutions to Selected Problems

Exercise 1. Suppose we want to run smoothing with the same idea as in the
proof of AM-GM: fix the sum of the xi ’s and increase the other side. Thus,
we replace a pair of numbers x + a and x − b not equal to the average x by x
and x + a − b. We know that their sum remains constant and their product
(x + a)(x − b) increases to x(x + a − b). But we also need to figure out what
happens to the sum of their reciprocals in the denominator of the RHS:
\[ \frac{1}{x+a} + \frac{1}{x-b} = \frac{(x+a) + (x-b)}{(x+a)(x-b)} > \frac{x + (x+a-b)}{x\,(x+a-b)} = \frac{1}{x} + \frac{1}{x+a-b}. \tag{13} \]
Adding the other unchanged 1/xi’s and reciprocating (13) flips the sign of the
inequality and forces the RHS to increase under our operation. Just as in the
proof of AM-GM, the number of variables xi equal to the average x increases
at each step, i.e., after at most n repetitions all variables will be equal to x
and equality will be then obtained. Hence, the original AM-HM inequality
must have been true, with equality iff all variables are equal. 
It is possible to also deduce AM-HM from two applications of AM-GM.
Rewrite equivalently the desired AM-HM by pulling everything to the LHS:
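\[ \frac{x_1 + x_2 + \cdots + x_n}{n} \cdot \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} \overset{?}{\ge} 1. \]
Now, AM-GM applied separately to x1, x2, . . . , xn and to 1/x1, 1/x2, . . . , 1/xn yields
two geometric means that cancel each other:
\[ \frac{x_1 + \cdots + x_n}{n} \cdot \frac{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}{n} \overset{\text{AM-GM}}{\ge} \sqrt[n]{x_1 \cdots x_n} \cdot \sqrt[n]{\frac{1}{x_1} \cdots \frac{1}{x_n}} = 1. \]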

Equality is obtained iff AM-GM yields equalities, i.e., for x1 = · · · = xn . 


Exercise 2. The sum remains constant and the product increases under the
operation of replacing each of two different numbers xi and xj by (xi + xj)/2:
\[ x_i + x_j = \frac{x_i + x_j}{2} + \frac{x_i + x_j}{2} \quad \text{and} \quad x_i x_j \overset{\text{AM-GM}}{<} \left(\frac{x_i + x_j}{2}\right)^{2}. \]
The problem is that the process may never end! In fact, Exercise 3(b)
provides a simple and convincing example of an endless smoothing. 
Exercise 3. (a) This can be done by induction on n. The base case n = 1
of two numbers x1 and x2 being replaced by their average is trivially true.
Assuming that we can equalize 2^n numbers, take any 2^(n+1) numbers and
arbitrarily split them into two groups of 2^n numbers each. By IH, equalize
the numbers in each group to some common values a and b, respectively.
Performing then the operation on 2^n pairs of numbers {a, b} will make
everything equal to (a + b)/2, completing the induction step.
(b) After one step there will be two numbers equal and the third different
from them: {a, a, b}. After another step, we will have exactly the same
configuration of numbers: two equal and one unequal to them, and so on
and so forth. The process will therefore never stop. 
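Both behaviors in Exercise 3 can be watched on a computer. The following is a minimal Python sketch, not part of the original solution: the names `equalize_power_of_two` and `stuck_triple` are invented for illustration, and exact `Fraction` arithmetic is an assumption made so that tests for equality are reliable.

```python
from fractions import Fraction


def equalize_power_of_two(values):
    """Part (a): equalize 2**n numbers by pairwise averaging.

    Mirrors the induction: blocks of length `block` are already equal,
    then matching entries of neighboring blocks are averaged.
    Returns the final list and the number of averaging steps used.
    """
    nums = [Fraction(v) for v in values]
    size = len(nums)
    assert size > 0 and size & (size - 1) == 0, "length must be a power of 2"
    steps = 0
    block = 1
    while block < size:
        for start in range(0, size, 2 * block):
            for k in range(block):
                i, j = start + k, start + block + k
                nums[i] = nums[j] = (nums[i] + nums[j]) / 2
                steps += 1
        block *= 2
    return nums, steps


def stuck_triple(a, b, steps):
    """Part (b): start from {a, a, b} with a != b and keep averaging.

    Returns the number of distinct values seen after each step.
    """
    triple = [Fraction(a), Fraction(a), Fraction(b)]
    distinct_counts = []
    for _ in range(steps):
        triple.sort()
        # The smallest and largest entries differ, so average those two.
        triple[0] = triple[2] = (triple[0] + triple[2]) / 2
        distinct_counts.append(len(set(triple)))
    return distinct_counts
```

In this sketch the eight numbers 1, . . . , 8 are equalized in 12 steps, while the triple {a, a, b} always keeps exactly two distinct values, in line with part (b).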
Read the Limit Appendix for a discussion that pushes through the re-
placement operation in Exercise 2 to a successful proof, despite the fact that
the smoothing process itself never stops.
Exercise 4. With the pairwise averaging operation in the AM-GM solution,
we claim that any n > 1 numbers will be equal to each other after n−1 steps.
More precisely, if n − k of the numbers are already equal to the average x,
with k ≥ 1, then it will take at most k − 1 steps for the process to end in
equality, fixing the LHS at each step and increasing the RHS. We leave it to
the reader to finish the formal proof by induction on k, taking into account
that every time the operation reduces the count of numbers not equal to x. ♦
Exercise 5. Just reverse the monovariant step from the proof of RI: for
n = 2, if z1 = y2 and z2 = y1 , do nothing; if z1 = y1 and z2 = y2 , then switch
the zi ’s to decrease the RHS. Now apply an analogous inductive argument
as in the proof of RI. ♦
Exercise 6. The Midpoint Rule boils down to 2-variable AM-HM/AM-GM:
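• \[ \frac{1}{\frac{x+y}{2}} \overset{?}{<} \frac{\frac{1}{x} + \frac{1}{y}}{2} \iff \frac{2}{\frac{1}{x} + \frac{1}{y}} \overset{?}{<} \frac{x+y}{2}; \]
• \[ -\ln\left(\frac{x+y}{2}\right) \overset{?}{<} \frac{-\ln x - \ln y}{2} \iff \ln\left(\frac{x+y}{2}\right) \overset{?}{>} \ln\sqrt{xy} \iff \frac{x+y}{2} \overset{?}{>} \sqrt{xy}. \] ♦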

Smoothing Lemma 1: Since B is between A and D, it can be viewed

as a weighted average of A and D for some λ ∈ (0, 1) (cf. Fig. 2c), i.e.,
B = λA+(1−λ)D. Since B −A = D −C, for symmetry reasons C is also the
weighted average of A and D with the reverse weights: C = (1 − λ)A + λD.

By the convexity definition applied twice to these two triplets of points:

f (B) ≤ λf (A) + (1 − λ)f (D),
f (C) ≤ (1 − λ)f (A) + λf (D).
Adding and canceling on the RHS yields:
f (B) + f (C) ≤ f (A) + f (D). 

Problem 4. For xi < x̄ < xj, smoothing as usual (xi, xj) → (x̄, xi + xj − x̄)
results in decreasing the RHS according to Smoothing Lemma 1:
\[ f(x_i) + f(x_j) = f(A) + f(D) \ge f(B) + f(C) = f(\bar{x}) + f(x_i + x_j - \bar{x}). \]
When all variables are equal, JI is trivially true. 
Exercise 7. To show that 2^x is convex, apply the Midpoint Rule:
\[ 2^{(x+y)/2} = \sqrt{2^x\, 2^y} \overset{?}{<} \frac{2^x + 2^y}{2}, \]
which is AM-GM for the positive numbers 2^x and 2^y.
Alternatively, if you know derivatives, (2^x)′ = (ln 2) 2^x is an increasing
function, so by the First Derivative Test (cf. Inequalities I) 2^x is convex for all x.
Applying JI to the convex function f(x) = 2^x and any numbers y1, . . . , yn:
\[ 2^{\frac{y_1 + \cdots + y_n}{n}} \overset{\text{JI}}{\le} \frac{2^{y_1} + \cdots + 2^{y_n}}{n} \iff \sqrt[n]{2^{y_1} \cdots 2^{y_n}} \le \frac{2^{y_1} + \cdots + 2^{y_n}}{n}. \tag{14} \]
Now, any xi > 0 can be uniquely written as xi = 2^(yi): simply set yi =
log2(xi). Substituting in (14) yields the AM-GM for x1, . . . , xn. Equality is
obtained iff all yi’s are equal, i.e., all xi’s are equal.
Lemma 3. The figure on page 274 shows points A = B0 , B1 ,. . . ,Bl = B that
divide AB into l equal parts. Applying repeatedly the Smoothing Lemma to
the quadruple of points {Bi−1 , Bi , C, D} for 1 ≤ i ≤ l, we obtain l inequal-
ities f (Bi−1 ) + f (D) > f (Bi ) + f (C). Adding them up and canceling all
intermediate f (Bi ) for i = 1, . . . , l − 1, but not f (D) and f (C) which appear
l times, we arrive at the desired f (A) + lf (D) ≥ f (B) + lf (C). 
For the geometric argument, in trapezoid ADD′A′ segments BB′ and
CC′ are parallel to AA′ and DD′. From AB/CD = l and the similar
shaded right triangles (with hypotenuses along A′D′ and one leg horizontal):
l·C′D′ = A′B′, from which l(CC′ − DD′) = AA′ − BB′ (why?). Rearranging
this, AA′ + l·DD′ = BB′ + l·CC′. Using that the graph of f(x) between A′ and
D′ lies underneath the segment A′D′, we have BB′ > f(B) and CC′ > f(C),
which yields again f(A) + lf(D) ≥ f(B) + lf(C). ♦
Exercise 8. The Smoothing Lemma is the case with two inequalities in HLP:
x1 ≥ y1 and x1 +x2 = y1 +y2 , where x1 , x2 , y1 , y2 are D, A, C, B, respectively.
The Multi-smoothing Lemma is the following case of l inequalities in HLP:
D ≥ C, D + D ≥ C + C, . . . , lD ≥ lC, lD + A = lC + B. 
Corollary 1. Apply − ln(x) to both sides of the intended inequality and
split the products to sums: $\sum_{i=1}^{n} (-\ln(x_i)) \ge \sum_{i=1}^{n} (-\ln(y_i))$, which is HLP
for the convex f(x) = − ln x on (0, ∞).

Problem 1. To show that $\sqrt[n]{x}$ is concave, use the Midpoint Rule:
\[ \sqrt[n]{\frac{x+y}{2}} \overset{?}{>} \frac{\sqrt[n]{x} + \sqrt[n]{y}}{2} \iff \frac{x+y}{2} \overset{?}{>} \left(\frac{x^{1/n} + y^{1/n}}{2}\right)^{n} \iff P_1 \overset{?}{>} P_{1/n}, \]
which is the power mean inequality for two variables with r = 1/n < 1.
Alternatively, using the Second Derivative Test (cf. Inequalities I), we calculate
$(\sqrt[n]{x})'' = \frac{1}{n}\left(\frac{1}{n} - 1\right) x^{\frac{1}{n} - 2} < 0$ for x > 0, so $\sqrt[n]{x}$ is concave there. Thus, all
inequalities work with the opposite signs.

More generally, we will show that Problem 1 is true for any function $\sqrt[m]{x}$ instead
of $\sqrt[n]{x}$, where m > 1 is fixed. Indeed, by the Smoothing Lemma for the
concave $\sqrt[m]{x}$ and $a_{2n-1}$, $a_{2n}$, $a = a_{2n+1} - a_{2n} + a_{2n-1}$, and $a_{2n+1}$:
\[ \sqrt[m]{a_{2n+1}} + \sqrt[m]{a_{2n-1}} < \sqrt[m]{a_{2n}} + \sqrt[m]{a_{2n+1} - a_{2n} + a_{2n-1}} \]
\[ \Rightarrow\; \sqrt[m]{a_{2n+1}} - \sqrt[m]{a_{2n}} + \sqrt[m]{a_{2n-1}} < \sqrt[m]{a_{2n+1} - a_{2n} + a_{2n-1}}. \tag{15} \]
In the key step, replace $a_{2n} \to a_{2n-1}$ and $a_{2n+1} \to a = a_{2n+1} - a_{2n} + a_{2n-1}$.
This increases the LHS and fixes the RHS. Cancelling out the resulting two
$a_{2n-1}$ terms, we end up with two fewer radicals and n becomes n − 1. By
induction on n (for function $\sqrt[m]{x}$), we only need the base case for n = 2:
\[ \sqrt[m]{a_1} - \sqrt[m]{a_2} + \sqrt[m]{a_3} \overset{?}{<} \sqrt[m]{a_1 - a_2 + a_3}. \]
But this is the general inequality (15) that we already proved earlier with
n = 1. Finally, note that our original problem is just the special case of the
above for the function $\sqrt[n]{x}$ (when m = n) and 2n + 1 variables.
Problem 8. Suppose that all ai < 2. Iterate the following operation as long
as possible: take two numbers ai and aj both less than 2, and replace them
by 2 and ai + aj − 2. Now Σ ai stays the same, but what happens to Σ ai²?
The operation is unsmoothing: ai + aj − 2 < ai, aj < 2, so the middle
numbers ai and aj are pulled apart to ai + aj − 2 and 2. As f(x) = x²
is convex, by Lemma 1 the sum of the aj²’s acts as a “concentration”
monovariant and goes up:
\[ f(a_i) + f(a_j) < f(a_i + a_j - 2) + f(2). \]
(Figure: the points ai + aj − 2, ai, aj, and 2 marked in this order on the number line.)
Both given inequalities are preserved by the operation; however, one more
of the ai’s became 2. As we cannot repeat this operation forever, eventually
exactly one aj is < 2 and the rest all 2’s. The two inequalities then read:
• aj + 2(n − 1) ≥ n ⇒ aj ≥ −(n − 2) ⇒ aj ≮ −(n − 2);
• aj² + 4(n − 1) ≥ n² ⇒ aj² ≥ (n − 2)² ⇒ |aj| ≥ |n − 2|.
From here aj ≥ n − 2 ≥ 4 − 2 = 2, a contradiction with aj < 2.
Hence, indeed, one of the original ai’s must be ≥ 2.
Exercise 11. At every step simply average the number that is furthest away
from x̄ and any other number on the other side of x̄. After n (really, ⌈n/2⌉)
such steps the maximum distance from x̄ will be at least halved. ♦
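The averaging rule of Exercise 11 is easy to simulate. Below is a hedged Python sketch (the helper names `gps_step`, `max_distance`, and `run_gps` are hypothetical, and exact fractions are an assumption made to avoid rounding): it averages the entry furthest from the mean with the first entry found on the other side, and lets one check that the maximum distance from the mean is at least halved after n steps.

```python
from fractions import Fraction


def gps_step(nums, mean):
    """Average the entry furthest from the mean with one on the other side.

    Mutates nums in place; returns False if all entries equal the mean.
    """
    far = max(range(len(nums)), key=lambda i: abs(nums[i] - mean))
    if nums[far] == mean:
        return False
    far_is_above = nums[far] > mean
    # Such an entry exists because averaging never changes the mean.
    other = next(i for i, v in enumerate(nums)
                 if v != mean and (v > mean) != far_is_above)
    nums[far] = nums[other] = (nums[far] + nums[other]) / 2
    return True


def max_distance(nums, mean):
    """Largest distance from any entry to the mean."""
    return max(abs(v - mean) for v in nums)


def run_gps(values, steps):
    """Run up to `steps` averaging steps; return the final list and the mean."""
    nums = [Fraction(v) for v in values]
    mean = sum(nums) / len(nums)
    for _ in range(steps):
        if not gps_step(nums, mean):
            break
    return nums, mean
```

For example, starting from 0, 1, 5, 10, 14 (mean 6, maximum distance 8), five steps of this sketch bring every number within distance 4 of the mean, and every further block of five steps halves the bound again.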

Lemma 5. We modify the GPS algorithm to approach the weighted average
x̃ of x and y. Set d = y − x, a−1 = x, and a0 = y. Since x and y are
on opposite sides of x̃, their average a1 = (x + y)/2 is at most d/2 from
x̃. Continue inductively: at the nth step, average an−1 and the ai on the
opposite side of x̃ that is closest to x̃, resulting in an = (an−1 + ai)/2 at most
d/2^n from x̃. Writing an = λn x + (1 − λn)y for some λn ∈ (0, 1), we have
\[ 0 = \lim_{n\to\infty} (a_n - \tilde{x}) = \lim_{n\to\infty} (\lambda_n - \lambda)(x - y) \;\Rightarrow\; \lim_{n\to\infty} \lambda_n = \lambda. \] ♦
Incidentally, this GPS proof shows that any real number is the limit of some
sequence of rational numbers {b_n/2^n} whose denominators are just powers of 2.
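The construction in Lemma 5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the book's own code: the name `weighted_gps` is made up, the inputs are taken to be rational so that `Fraction` keeps everything exact, and the rule "average the latest term with the closest earlier term on the other side of x̃" is implemented as ordinary bisection, which should coincide with it because the latest term is always one of the two terms currently bracketing x̃.

```python
from fractions import Fraction


def weighted_gps(x, y, lam, steps):
    """Approach the weighted average lam*x + (1 - lam)*y by midpoints only.

    Returns the target and the list a_1, a_2, ... of midpoints produced.
    """
    x, y, lam = Fraction(x), Fraction(y), Fraction(lam)
    target = lam * x + (1 - lam) * y
    lo, hi = min(x, y), max(x, y)  # closest known terms below and above the target
    sequence = []
    for _ in range(steps):
        mid = (lo + hi) / 2
        sequence.append(mid)
        if mid == target:
            break  # the target itself is a dyadic combination of x and y
        if mid < target:
            lo = mid
        else:
            hi = mid
    return target, sequence
```

With x = 0, y = 1, and λ = 1/3 the sketch produces 1/2, 3/4, 5/8, 11/16, . . . , each term within 1/2^n of the target 2/3 and each with a power of 2 in the denominator.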

Lemma 6. For a−1 = x and a0 = y, the inequality in Lemma 6 becomes
trivially an equality (why?). Assume that Lemma 6 is true for all k < n.
Since an = (an−1 + ai)/2 for some i < n, add the IH inequalities for an−1 and ai:
\[ f(a_n) = f\left(\frac{a_{n-1} + a_i}{2}\right) \overset{\text{MR}}{<} \frac{f(a_{n-1}) + f(a_i)}{2} \le \frac{\lambda_{n-1} + \lambda_i}{2}\, f(x) + \frac{2 - \lambda_{n-1} - \lambda_i}{2}\, f(y). \]
Since we can write an = λx + (1 − λ)y for a unique λ ∈ (0, 1) (why?), we
conclude that λn = (λn−1 + λi)/2, so that f(an) < λn f(x) + (1 − λn)f(y).
Problem 10. The RHS of (12) will not change if we fix the product
a1 a2 · · · an. Thus, instead of using the arithmetic mean of the ai’s, we will
use their geometric mean g. If the ai’s are not all equal, then one of them
will be larger and another one smaller than g, e.g., ai < g < aj. Replace:
(a) (ai, aj) → (g, ai aj/g); (b) (ai, aj) → (√(ai aj), √(ai aj)).
The part of the LHS that changes is (1 + ai)(1 + aj) = (1 + ai aj) + (ai + aj).
The product ai aj is fixed, while the sum ai + aj decreases:
(a) \[ a_i + a_j \overset{?}{\ge} g + \frac{a_i a_j}{g} \iff g a_i + g a_j - g^2 - a_i a_j = (g - a_i)(a_j - g) \overset{?}{\ge} 0, \]
which is true by construction of ai and aj.
(b) ai + aj ≥ 2√(ai aj) by AM-GM.
In (a), the smoothing will be finite (why?), resulting in all ai’s equal to
g and the inequality then being trivially an equality.
In (b), the smoothing may be infinite, but a limit argument will easily
complete the proof, as long as there is a GPS that directs all aj’s towards
their geometric mean g. To this end, recall the algorithm in Exercise 11
for pairwise averaging of x1, x2, . . . , xn that forced them to approach their
arithmetic average x̄ = (Σi xi)/n. Setting xi = ln ai and applying the same
algorithm to the ai’s will replace at each step (ai, aj) by (√(ai aj), √(ai aj)) and
will force the ai’s to approach g = e^x̄ (why?). This will provide the necessary
GPS. Since both sides of (12) are continuous functions, applying limits to
both sides (where k indexes the steps of the GPS) yields:
\[ \lim_{k\to\infty} (1 + a_1)(1 + a_2) \cdots (1 + a_n) = (1 + g)(1 + g) \cdots (1 + g) = (1 + g)^n. \]
From PST 91, we conclude that inequality (12) is always true.
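The finite smoothing in part (a) can be tried out numerically. The Python sketch below is only an illustration: the names `lhs` and `finite_smoothing` are invented here, floating-point numbers with a small tolerance stand in for exact values, and choosing the smallest and largest entries as the pair ai < g < aj is just one admissible choice.

```python
import math


def lhs(nums):
    """Left-hand side of (12): the product of (1 + a_i)."""
    return math.prod(1 + a for a in nums)


def finite_smoothing(values, tol=1e-12):
    """Replace (a_i, a_j) by (g, a_i*a_j/g) until all entries equal g.

    Returns g, the final list, and the LHS value recorded after each step.
    """
    nums = [float(v) for v in values]
    n = len(nums)
    g = math.prod(nums) ** (1.0 / n)  # geometric mean, fixed by every step
    history = [lhs(nums)]
    for _ in range(n - 1):  # each step creates at least one new entry equal to g
        i = min(range(n), key=nums.__getitem__)
        j = max(range(n), key=nums.__getitem__)
        if nums[j] - nums[i] <= tol * g:
            break  # all entries already equal g up to rounding
        nums[i], nums[j] = g, nums[i] * nums[j] / g
        history.append(lhs(nums))
    return g, nums, history
```

Starting from 1, 2, 4, 8, the recorded values of the LHS go down from 270 to approximately (1 + g)^4 with g = √8, which is what inequality (12) predicts.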
Session 12

Geometric Re-Constructions. Part III

Optimal Bridges and Infinitely Many Squares

Zvezdelina Stankova

Sneak Preview. If you thought that there are no more solutions to the Farmer-
and-Cow problem worth discussing . . . you are in for a surprise! To produce a
solution radically different from anything we have done so far, we will re-discover
a special case of the famous Minkowski’s inequality; and yet another solution will
invoke some more advanced derivative techniques for optimizing functions. We
will try to utilize all generated ideas to conquer the Optimal Bridge challenge from
Part II. However, in the purely geometric solution, a new “magic” transformation
in the plane will have to be created, to replace the reflection across the river
used in Part I. Everyone from beginners to advanced will find a solution here
corresponding to their level.
In contrast, the section devoted to generalizing the Three-Squares problem to
infinitely many squares is intended only for the most advanced readers. Our
solution will depend entirely on Calculus techniques: we will invoke a particular
Taylor series and some PSTs for determining when sums are finite or infinite.
Our geometric journey will end with a historical detour through a 2000-year
old puzzle attributed to Archimedes. This will bring us full circle back to where
we started: our brilliant 5th-grade solution to the Three-Squares Problem in
Part I. In an attempt to circumvent all Calculus machinery in this article, a final
geometric challenge will be posed as an open problem.

1. Farmer-and-Cow via Inequalities and Calculus

The Farmer-and-Cow problem from Parts I-II is essentially an optimiza-

tion problem: it asks about the minimal length of the farmer’s path. In this
section, we explore two standard but technically more demanding routes that
can be taken in such optimization situations. Some basic familiarity with
inequalities will be helpful along the first route (and you may want to review
parts of Inequalities I), and knowledge of some standard Calculus techniques
in applying derivatives will be expected along the second route.

1.1. The initial set-up. Both routes start the same way.

PST 92. In solving optimization problems, first label appropriately the
unknowns (and knowns). If some values of the unknowns are unrealistic in
terms of achieving the desired optimum, prove that they can be discarded.
This will restrict the given domain and may simplify the ensuing calculations.
To begin with, let F A = a and CB = b be the distances from the farmer
and the cow to the river (cf. Fig. 1b). It is intuitively obvious that, in order
to travel along the shortest path, the farmer should go to a point X between
A and B. But intuition is not equal to proof.

Exercise 1. If Y is a point on the river outside segment AB, find another
path from the farmer to the cow that is shorter than the path F → Y → C.

Figure 1. Restricting the domain and Conjecturing the optimum (panels a, b, c)

Hint: If A is between Y and B (cf. Fig. 1a), show that F Y +Y C > F A+AC.
The Pythagorean Theorem may be helpful. ♦
Because of Exercise 1, we can now safely assume that the shortest path
of the farmer must pass through a point X on segment AB (cf. Fig. 1b).
Hence we can introduce the non-negative unknowns x = AX and y = BX
such that x + y = AB. Moving on to the next step:
 PST 93. Describe the quantity in question via some function of the un-
knowns (and the knowns).
A specific case of this was done in Exercise 4 in Part II. To generalize, the
farmer’s route is made of hypotenuses FX and XC in △FAX and △CBX:
\[ f(x, y) = \sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \quad \text{where } x, y \ge 0 \text{ with } x + y = AB. \]

1.2. Famous inequality in disguise. The reader may wonder: why do we

not substitute y = AB − x to get rid of one variable in f (x, y)? We will do
this later in our Calculus solution; but it is more convenient to proceed in a
slightly different fashion in our first approach with inequalities below.
PST 94. Set up the problem as an inequality A ≥ B, where A is the quantity
in question and B is a conjectured minimal¹ value of A.
¹ Analogously, setting up the opposite inequality A ≤ B will work for establishing the
maximal value for A.

1.2.1. Conjecturing by “cheating”. While the function f (x, y) will undoubt-

edly serve as the LHS A of our inequality, it is not at all obvious what the
RHS B should be! We can take advantage of our prior experience with this
problem and make an intelligent “guess”: we can conjecture that the length
B of the shortest path will be equal to F′C, where F′ is the reflection of the
farmer F across the river (cf. Fig. 1c). So, what is B?
Exercise 2. Calculate the length of F′C in terms of a, b, x, and y.

Solution: If AF′EB is a rectangle (cf. Fig. 1c), using the right △F′EC
shortens our calculations: $F'C = \sqrt{F'E^2 + EC^2} = \sqrt{(x + y)^2 + (a + b)^2}$.
We can now algebraically re-formulate the Farmer-and-Cow problem:
Problem 1. (Inequality Version) For any x, y, a, b ≥ 0 prove that
\[ \sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \overset{?}{\ge} \sqrt{(a + b)^2 + (x + y)^2}. \tag{1} \]
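Before proving inequality (1), one can at least test it on random data. The following Python sketch is not a proof and not part of the original text; the function names `path_length`, `mirror_distance`, and `smallest_gap` are hypothetical, and the sampling range [0, 10] is an arbitrary choice.

```python
import math
import random


def path_length(a, b, x, y):
    """LHS of (1): the farmer's path, made of two hypotenuses."""
    return math.hypot(a, x) + math.hypot(b, y)


def mirror_distance(a, b, x, y):
    """RHS of (1): the straight distance from the reflected farmer to the cow."""
    return math.hypot(a + b, x + y)


def smallest_gap(trials=10_000, seed=0):
    """Smallest observed value of LHS - RHS over random non-negative inputs."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        a, b, x, y = (rng.uniform(0, 10) for _ in range(4))
        worst = min(worst, path_length(a, b, x, y) - mirror_distance(a, b, x, y))
    return worst
```

If (1) holds, the smallest gap should never be negative beyond rounding error. For a = 1, b = 2, x = 3, y = 6 both sides equal 3√10.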

1.2.2. Nothing new under the sun. Inequality (1) is a symmetrically phrased,
elegant inequality, so it should not be a surprise that it is well-known. Indeed,
it is a special case of a famous and much more general inequality.
Theorem 1. (Minkowski’s Inequality) If all ai, bj ≥ 0 and r ≥ 1, then
\[ \sqrt[r]{a_1^r + \cdots + a_n^r} + \sqrt[r]{b_1^r + \cdots + b_n^r} \ge \sqrt[r]{(a_1 + b_1)^r + \cdots + (a_n + b_n)^r}. \]
If 0 < r < 1, then the inequality is reversed.
We shall not attempt to prove Minkowski’s Inequality here (it will be dis-
cussed in its full generality in Inequalities II, vol. III). We can, though, prove
our special inequality (1) practically with “bare hands”.

1.2.3. Proof by reasoning backwards. We will rewrite (1) in a sequence of

simpler but equivalent ways. Since everything is positive, we can square
both sides without changing the direction of the inequality:
\[ (a^2 + x^2) + 2\sqrt{(a^2 + x^2)(b^2 + y^2)} + (b^2 + y^2) \overset{?}{\ge} (a + b)^2 + (x + y)^2. \]
The RHS expands to a² + b² + 2ab + x² + 2xy + y², causing four squares to
cancel from each side and yielding other equivalent versions of the inequality:
\[ 2\sqrt{(a^2 + x^2)(b^2 + y^2)} \overset{?}{\ge} 2ab + 2xy \iff (a^2 + x^2)(b^2 + y^2) \overset{?}{\ge} (ab + xy)^2, \]
where we divided by 2 and squared once again. We now expand:
\[ a^2 b^2 + x^2 y^2 + a^2 y^2 + x^2 b^2 \ge (ab)^2 + 2(ab)(xy) + (xy)^2, \]
cancel a²b² + x²y² = (ab)² + (xy)², pull everything to the LHS, and rearrange
a bit to recognize the well-known formula c² + d² − 2cd = (c − d)²:
\[ a^2 y^2 + x^2 b^2 - 2(ay)(xb) \overset{?}{\ge} 0 \iff (ay - xb)^2 \overset{?}{\ge} 0. \]
The last inequality is certainly true for all a, b, x, and y.

1.2.4. Don’t forget the equality! Our proof so far verified that all paths of
the farmer are at least $\sqrt{(a + b)^2 + (x + y)^2}$. As per Exercise 2, the latter is
the length of F′C where F′ is the reflected, “phantom” farmer. But we must
ask: is the corresponding path F → X → C (where X is the intersection of
F′C with the river) the unique shortest path for the real farmer F?
PST 95. To complete the proof of an inequality A ≥ B, investigate when
 equality is obtained. In other words, find a condition (algebraic, geometric,
or other) on the involved letters that makes the two sides equal.
The task of solving A = B has been trivialized by the last step of our
proof: inequality (1) is equivalent to (ay − xb)2 ≥ 0. Equality is obviously
obtained exactly when ay = xb. Furthermore, substituting y = AB − x and
solving a(AB − x) = xb for x yields the only possible value x = a ·AB/(a + b),
as long as we can divide by a + b. Therefore, if at least one of a or b is non-
zero, there is a unique place on the riverbank such that the corresponding
path F → X → C is a shortest path for the farmer; namely, this is the point
X between A and B with AX = a · AB/(a + b).
If a = b = 0, then (1) is always trivially satisfied (check it!). In reality,
this corresponds to the situation when both the farmer and the cow are at
the riverbank: the farmer can dip his bucket into the river anywhere along
his way to the cow; i.e., AX = x can be any number between 0 and AB. 
1.2.5. Why be restricted to the plane? Rewriting the condition for equality
in (1) in the form x/a = y/b, makes it reasonable to expect that the general
Minkowski’s Inequality will become equality if and only if all ratios ai /bi are
equal.² The reader curious about more general versions of the Farmer-and-Cow
problem can formulate and solve the problem in space (a flying farmer
and a flying cow?) and even venture into four or more dimensions.

1.3. A Calculus drill. Only the reader versed in Calculus techniques

should read this solution, since we will use but not justify those techniques.
1.3.1. Translation from life to Calculus has essentially been done. Recall our
function f (x, y) that measures the length of the farmer’s path for x, y ≥ 0
such that x + y = AB. Setting AB = c, we can substitute y = c − x in order
to reduce to a one-variable function F (x) for 0 ≤ x ≤ c:
\[ F(x) = f(x, c - x) = \sqrt{a^2 + x^2} + \sqrt{b^2 + (c - x)^2}. \]
We are asked to find the minimum value of F (x) on the interval [0, c].
² Technically, to avoid division by 0, we must stipulate that when some bi = 0 then
ai = 0, or rewrite the conditions for equality as ai bj = aj bi for all i, j.

1.3.2. The critical points of F(x) are places where optimum values could
potentially occur. To locate them, we take the derivative F′(x) and set it to
equal 0. The derivative itself is calculated by applying the Chain Rule twice:
\[ F'(x) = \frac{x}{\sqrt{a^2 + x^2}} - \frac{c - x}{\sqrt{b^2 + (c - x)^2}} = 0 \iff x\sqrt{b^2 + (c - x)^2} = (c - x)\sqrt{a^2 + x^2}. \]
The last manipulation was simply clearing the denominators. Now note that
both sides of the equality are non-negative since x ∈ [0, c]. Squaring and
multiplying through leads to:
\[ x^2\big(b^2 + (c - x)^2\big) = (c - x)^2 (a^2 + x^2) \iff x^2 b^2 + x^2 (c - x)^2 = (c - x)^2 a^2 + (c - x)^2 x^2. \]

An obvious cancellation leaves us with x²b² = (c − x)²a². Again, since all

involved quantities are non-negative, taking the square root on both sides
reduces this to xb = (c − x)a, or xb = ya as previously discovered. If both
a = 0 and b = 0 (the farmer and the cow are both on the riverbank), the
function F (x) is the constant x + (c − x) = c = AB, so it doesn’t matter
where the farmer dips his bucket between A and B: he will end up walking
straight to the cow and covering the same (minimal) distance AB.
If at least one of a and b is not 0 then a + b > 0, and we can solve
xb = (c − x)a for x to get x0 = ac/(a + b). This is where the only critical
point of F (x) occurs and where F (x) has a potential minimum or maximum.

1.3.3. Optimum realized. To check what really happens at x0, we investigate
how the sign of F′(x) changes as x moves through x0. Thus, instead of the
equality F′(x) = 0, we try to solve the inequality F′(x) > 0.
Exercise 3. Check that all of the algebraic manipulations of the equalities
above in Subsection 1.3.2 can be redone as inequalities: start with F′(x) > 0
for 0 ≤ x ≤ c, replace everywhere “=” by “>”, and show that eventually you
will arrive at the following equivalent inequalities:
\[ xb \overset{?}{>} (c - x)a \iff x(a + b) \overset{?}{>} ca \iff x \overset{?}{>} \frac{ac}{a + b} = x_0. \]

To summarize, F′(x) > 0 if x > x0 and F′(x) < 0 if x < x0 (cf. Fig. 2a
on p. 292). This means that the original function F(x) decreases before x0
and increases after x0, i.e., F(x0) is the global minimum of F(x) on [0, c].

1.3.4. No more “guessing”. We can now find the minimum of our function:
Exercise 4. Calculate F(x0) and simplify it as much as possible.
Answer: After some non-taxing algebraic manipulations, one arrives at
\[ F(x_0) = F\left(\frac{ac}{a + b}\right) = \sqrt{(a + b)^2 + c^2}. \] ♦
If we recall that c = AB = x + y, the expression for F(x0) should not be
surprising: F(x0) is precisely the “mysterious” RHS of the inequality A ≥ B
in our previous approach. We conclude that
\[ F(x) \ge F(x_0) = \sqrt{(a + b)^2 + c^2} \quad \text{with equality iff } x = x_0 = \frac{ac}{a + b}. \]
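The calculus conclusions of Subsections 1.3.2 to 1.3.4 can be sanity-checked numerically. The Python sketch below is an illustration only: the names `F`, `F_prime`, and `critical_point` are chosen here for convenience, and it assumes a > 0 and b > 0 so that no denominator vanishes.

```python
import math


def F(x, a, b, c):
    """Length of the farmer's path when he reaches the river at distance x from A."""
    return math.hypot(a, x) + math.hypot(b, c - x)


def F_prime(x, a, b, c):
    """Derivative of F, as obtained from the Chain Rule (assumes a, b > 0)."""
    return x / math.hypot(a, x) - (c - x) / math.hypot(b, c - x)


def critical_point(a, b, c):
    """The solution x0 = ac/(a + b) of xb = (c - x)a (assumes a + b > 0)."""
    return a * c / (a + b)
```

With the values a = 2, b = 6, c = 4 used in the text, the critical point is x0 = 1, the derivative changes sign from negative to positive there, and F(x0) = √80 = √((a + b)² + c²).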

1.3.5. The big versus the really big picture. Our investigation of the derivative
F′(x) can be used to show that the local behavior of F(x) on the interval
[0, c] extends to a global behavior on (−∞, ∞). More precisely,
Exercise 5. Using the sign of F′(x) again, show that F(x) decreases for all
x < 0 and increases for all x > c.

Figure 2. Graphs of $F(x) = \sqrt{2^2 + x^2} + \sqrt{6^2 + (4 - x)^2}$: (a) the sign of F′ on either side of x0; (b) the graph on [−2, 6]; (c) the graph on [−100, 100].
The expected shape of the graph of F (x) is confirmed by Figure 2b in
the original case of the problem with a = 2, b = 6, and c = 4, on the
interval [−2, 6]. The graph basically looks like a smile,³ with the bottom of
the smile at x0 = ac/(a + b) = 1. However, as we enlarge the interval to, say,
[−100, 100] (cf. Fig. 2c), the graph of the function starts resembling a wedge:
it “straightens out” into two lines as x moves further away from x0 = 1. If
you are familiar with the necessary Calculus techniques,
Exercise 6. Show that $F(x) = \sqrt{2^2 + x^2} + \sqrt{6^2 + (4 - x)^2}$ has two slant
asymptotes: y = 2x − 4 when x → ∞ and y = 4 − 2x when x → −∞.
Alternatively said, F(x) ≈ |x| + |4 − x| when |x| is large.
Thus, the length of the farmer’s path changes approximately linearly
when he approaches the river at places X very far from the cow.

2. Optimal Bridge Located!

Let us apply the techniques we have developed so far to:

Problem 2. (Optimal Bridge) Two villages are sit-

uated on opposite banks, not necessarily across from
each other. The river has constant width. The farm-
ers’ market is always held in the same village. The
other village wants to build a bridge across the river
(and perpendicular to the banks of the river) so that
the total trip to the farmers’ market is as short as
possible. Where should the bridge be built and why?
³ The “smile” refers to the formal term convex: to show that F(x) is convex on
(−∞, ∞), you could verify that the second derivative F″(x) > 0 for all x.

2.1. Inequalities: something old. The memory from applying inequali-

ties should be still fresh in our minds. So, let’s first attempt the inequalities
approach in the Optimal-Bridge problem.
2.1.1. Appropriate labeling. For clarity, let’s draw a rectangle whose sides are
parallel or perpendicular to the river and two of whose diagonally opposite
vertices are our villages V1 and V2 (cf. the picture below).
Let V1A = a and V2B = b be the distances from V1 and V2 to the river.
If X1 and X2 are the beginning and the end of the bridge, then X1X2 = d
is the fixed width of the river. Our unknowns are the distances from A and B
to the respective ends of the bridge: AX1 = x and BX2 = y. Note that
the height of our rectangle is the fixed a + d + b, while its base is some c,
also fixed.

2.1.2. Restricting the possibilities. As ridiculous as it may seem, the bridge

could be built outside our rectangle. We know how to proceed:

Exercise 7. Eliminate the cases when the bridge is outside the rectangle
by showing that the resulting routes are not the shortest possible.

Sketch: If the bridge X1X2 is to the left of our rectangle, then
V1 → A → A′ → V2, the (dashed) route on the picture going straight up from
village V1 to the river, will be shorter than the route V1 → X1 → X2 → V2
through the bridge X1X2. ♦

2.1.3. We’ve done this before! We have justified that the optimal bridge must
be inside our rectangle, and hence our unknowns x and y are non-negative
and add up to x + y = c. Further, the total length of the route from V1 to V2
is V1 X1 + X1 X2 + X2 V2 , which can be expressed as the following function:
\[ f(x, y) = \sqrt{a^2 + x^2} + d + \sqrt{b^2 + y^2} \quad \text{for } a, b, x, y \ge 0. \]
By now the reader has, no doubt, seen the connection with the special case
of Minkowski’s inequality, proven in the Farmer-and-Cow situation:
\[ \sqrt{a^2 + x^2} + \sqrt{b^2 + y^2} \ge \sqrt{(a + b)^2 + (x + y)^2} = \sqrt{(a + b)^2 + c^2}. \]
Therefore, the length of the shortest route is $\sqrt{(a + b)^2 + c^2} + d$, attained
(again!) iff ay = bx.

2.2. The “magic” transformation: something new. If neither of the

villages is on the riverbank, we found that the shortest route is obtained
when x/a = y/b. In turn, this implies that two right triangles are similar:
△V1AX1 ∼ △V2BX2. Is there a geometric explanation of this phenomenon?

Unfortunately, both the width d of the river and the fact that the villages
are on opposite sides of the river make our previous idea of reflecting across
the river useless here. Below we uncover another transformation that will
elegantly explain the situation and lead to a purely geometric solution.
2.2.1. Rearranging parts for a better understanding. As we observed, every
route consists of three parts: walking from V1 to the bridge, walking across
the bridge, and then walking to V2 . While the first and the third parts
depend on where the bridge is built, the middle part is kind of a “constant”:
• it always goes in the same direction; e.g., we can assume (as in our
figures) that walking across the bridge is in the north direction; and
• it has a fixed length of d.

PST 96. If some quantity (whether algebraic or geometric) consists of sev-
eral parts, try swapping some of these parts: this may give you an advanta-
geous angle by viewing the quantity in a different, easier way.
In the bridge situation, why not first walk the “constant” middle part of
the route and then follow it by the other two parts of route? To this end, we
ignore temporarily the river: this will enable us to arbitrarily build “bridges”
on land and walk on water in any direction without a bridge. Thus,
(a) First walk north from V1 to point Y for a distance of d.
(b) Then walk straight from Y to village V2.
In effect, this swapped the first segment V1X1 of the route with the second,
bridge-part X1X2. To recover our original route, we need to swap back these
two parts! There is one convenient place to break the walk YV2 in order to
build the bridge:
(c) Let Y V2 intersect the riverbank of V2 in point X2 .
(d) Build the bridge at X2 , going back to point X1 across the river.
(e) Let the villagers take the route V1 → X1 → X2 → V2 .
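Steps (a)-(e) translate directly into coordinates. The sketch below is one possible Python rendering under assumptions that are not in the original text: V1 is placed at the origin, the riverbanks are the horizontal lines at heights a and a + d, V2 sits at (c, a + d + b), and the names `optimal_bridge` and `route_length` are hypothetical. It also assumes a + b > 0.

```python
import math


def optimal_bridge(a, b, c, d):
    """Locate the bridge ends X1, X2 by the translation steps (a)-(e).

    Coordinates (an assumption of this sketch): V1 = (0, 0), the banks are
    the lines y = a and y = a + d, and V2 = (c, a + d + b).
    """
    y_point = (0.0, d)           # step (a): walk north from V1 by d
    v2 = (c, a + d + b)
    # Steps (b)-(c): the segment from Y to V2 rises by a + b in total and
    # meets V2's bank (height a + d) after rising by a.
    t = a / (a + b)
    bridge_x = y_point[0] + t * (v2[0] - y_point[0])
    x1 = (bridge_x, a)           # step (d): go back across the river
    x2 = (bridge_x, a + d)
    return x1, x2


def route_length(a, b, c, d, bridge_x):
    """Length of V1 -> X1 -> X2 -> V2 for a bridge at horizontal position bridge_x."""
    return math.hypot(bridge_x, a) + d + math.hypot(c - bridge_x, b)
```

For a = 2, b = 6, c = 4, d = 1 the construction places the bridge at horizontal distance 1 from A, matching x = ac/(a + b), and the route has length √80 + 1.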
2.2.2. Wait! Is this the most optimal route? A proof is in order here.
Exercise 8. Take another route V1 → Z1 → Z2 → V2 (going over an
 actual bridge Z1 Z2 , of course!) and show that it is longer than the route
V1 → X1 → X2 → V2 proposed by the algorithm (a)-(e) above.
Hint: Using two parallelograms and the Triangle Inequality, re-direct ev-
erything through Y without changing the overall length of the routes. ♦
2.2.3. What is the “magic” transformation? If you think about what hap-
pened above, for each route V1 → Z1 → Z2 → V2 we found another route
V1 → Y → Z2 → V2 of equal length (not necessarily through a bridge) that

always started with segment V1 Y ; i.e., the useful transformation turned out
to be a translation V1 → Y from village V1 to the north by distance d.

The idea of the translation in the plane can also explain the aftermath
of our previous inequality solution, where by algebraic calculations we dis-
covered that the shortest route occurs if x/a = y/b.
Exercise 9. Justify geometrically that △V1 AX1 ∼ △V2 BX2 for the optimal
bridge X1 X2 .

Solution: According to our algorithm (a)-(e), in the shortest route
V1 → X1 → X2 → V2 the first and third segments are parallel: by construction,
V1 X1 || Y V2 and X2 V2 lies on Y V2 , so V1 X1 || X2 V2 . But the two riverbanks
are also parallel to each other. Thus, angles ∠V1 X1 A and ∠V2 X2 B are formed
by two pairs of parallel sides and therefore they are equal (why?). Since
△V1 AX1 and △V2 BX2 are right triangles, the AA criterion implies the desired
similarity △V1 AX1 ∼ △V2 BX2 . □
To wrap up the discussion, from equal ratios in △V1 AX1 and △V2 BX2
we have AX1 /V1 A = BX2 /V2 B, i.e., x/a = y/b. In other words, the
inequalities and the translation solutions yield the same optimal bridge.
2.2.4. Is the optimal bridge always unique? So far we worked only with the
case when none of the villages was directly at the river. This caused the
existence of △Y Z2 V2 and a unique optimal bridge. To complete the picture,
Exercise 10. Investigate the special cases when one or both of the villages
are on their corresponding riverbanks. How many optimal bridges are there
and how do we locate them?

2.3. Why only two villages? If you want to test everything you’ve learned
so far in Parts I-III about solving optimization problems, bump up the num-
ber of villages to three, change the river to a railroad track, and try to come up
with a variety of approaches (purely geometric, inequalities, and Calculus –
anything counts!) to the following challenge problem:

Problem 3. (Optimal Station) There are three villages near a railroad
track: one is situated right by the track, and the other two are built
symmetrically on opposite sides of the track. Where should the villages build
a joint train station so that the total commute from the three villages to the
station is the shortest possible?

Hint: Let V3 be the village at the railroad track and V1 and V2 the other two
villages. Two different situations occur depending on how ∠V1 V3 V2 compares
to 120°. ♦

3. Infinitely Many Angles and Infinite Series

The next challenge will require both trigonometry and advanced Calculus
techniques. Read on only if you are fluent in both.
Problem 4. (ℵ0 –Squares) Glue to each other infinitely many identical
squares with bases AA1 , A1 A2 , A2 A3 , A3 A4 , A4 A5 , and so on, to form an
infinite row (cf. Fig. 3). If D is the top left corner of the first square, right
above A, what is the sum ∠AA1 D + ∠AA2 D + ∠AA3 D + ∠AA4 D + · · · ?

Figure 3. α1 + α2 + α3 + α4 + α5 + · · · = ?

3.1. Finite or infinite? Problem 4 asks us to find the sum of all angles
αi . From the Three-Squares problem, we know that α1 + α2 + α3 = 90◦ . So,
let’s concentrate on finding the sum of the rest of the αi ’s.
To this end, define the partial sum sn to be sn = α4 + α5 + · · · + αn
for any n ≥ 4. From right △DAAn , tan αn = 1/n. Luckily, the formula from
Part II for tangent of a sum will link recursively all values of tan sn :
$$\tan s_n = \tan(s_{n-1} + \alpha_n) = \frac{\tan s_{n-1} + \tan \alpha_n}{1 - \tan s_{n-1}\tan \alpha_n} = \frac{\tan s_{n-1} + \frac{1}{n}}{1 - \tan s_{n-1}\cdot\frac{1}{n}}.$$
Thus, tan α4 = tan s4 = 1/4 and tan s5 = (1/4 + 1/5)/(1 − 1/4 · 1/5) = 9/19.

Exercise 11. Starting with tan s5 and rounding to the nearest tenth, calculate
the first dozen terms of {tan sn }. Is the sequence increasing?
Solution: The values of tan s5 through tan s16 are approximately:
0.5, 0.7, 0.9, 1.2, 1.5, 1.9, 2.4, 3.1, 4.2, 6.0, 10.1, and 27.9.
Since the tangent function is increasing on [0◦ , 90◦ ), it is no surprise that
the sequence seems to be increasing. . . . But the very next term will make us
stop in our tracks: tan s17 = −43.6 < 0. To cause the tangent to be negative,
we must have gone over the right angle, i.e., s17 = α4 + · · · + α17 > 90◦ .
Thus, the sequence {tan sn } is not increasing. 

PST 97. To find out if a sum is finite or infinite, investigate the first partial
sums and make a conjecture in order to know what type of proof to expect,
because the techniques in the finite vs. infinite case will be different.
With this in mind, we keep on investigating the sequence {tan sn }. To
go over another 90◦ , i.e., to turn the tangent positive again, check that you
will need to wait much longer: tan s81 ≈ −0.01 and tan s82 ≈ 0.002. So far,
(α1 + α2 + α3 ) + (α4 + · · · + α17 ) + (α18 + · · · + α82 ) > 3 · 90◦ = 270◦ .
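The values of tan sn quoted above can be reproduced with exact fractions. The following is only a sketch: the function name and the choice of Python's Fraction type are ours, and the recursion is the tangent-of-a-sum formula with tan αn = 1/n.

```python
from fractions import Fraction

def tan_partial_sums(last_n):
    """Exact tan(s_n), where s_n = alpha_4 + ... + alpha_n, for 4 <= n <= last_n."""
    values = {4: Fraction(1, 4)}                  # tan(s_4) = tan(alpha_4) = 1/4
    for n in range(5, last_n + 1):
        prev, t = values[n - 1], Fraction(1, n)   # tan(alpha_n) = 1/n
        values[n] = (prev + t) / (1 - prev * t)   # tangent of a sum
    return values

tans = tan_partial_sums(82)
print(tans[5])
print([round(float(tans[n]), 1) for n in range(5, 17)])
# Indices n at which tan(s_n) changes sign compared with tan(s_(n-1)).
sign_changes = [n for n in range(5, 83) if (tans[n - 1] < 0) != (tans[n] < 0)]
print(sign_changes)
```

Each sign change marks the partial sum crossing another multiple of 90°, which is how the text detects that the angles keep piling up.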

Given the evidence, there is no reason to expect that the sum of the infinitely
many angles αn will be finite! We are compelled to make the following

Conjecture 1. The sum α1 + α2 + · · · + αn + · · · is unbounded.

The conjectured infinite sum will necessitate a completely different ap-
proach compared to that in the Three-Squares problem. Before we dedicate
the rest of the section to proving the conjecture, it is interesting to ponder
over an analogous, “semi-finite” version of the Three-Squares problem:
Problem 5. We know that α1 + α2 + α3 = 90◦ . Is there a place beyond α3
where the sum of the angles up to αn is an exact multiple of 90◦ , i.e., are
there natural numbers n, k ≥ 4 for which α1 +α2 +α3 +α4 +· · ·+αn = 90◦ k?
3.2. Inverse trigonometry. As stated, Conjecture 1 is difficult to prove
because we can easily calculate tan αn = 1/n, but we are trying to find the
sum of the inputs αn , not of the outputs 1/n. We are looking at the “wrong”
function: tan x! Instead, we should be looking at its inverse arctan y, which
is defined for all reals. In particular, arctan(1/n) = αn and we can rewrite:
Conjecture 1′. arctan 1 + arctan(1/2) + · · · + arctan(1/n) + · · · = ∞.
This is a Calculus problem about series, and a sequence of Calculus
exercises will help us justify that the series diverges.
3.3. Bounding from below will serve as our first step.

PST 98. Let the terms an of a sequence be given by some function f (x).
To show that the an ’s add up to ∞ (the sum has no upper bound), find a
lower bound for f (x), i.e., another function g(x) such that f (x) ≥ g(x), and show
instead that the corresponding terms bn given by g(x) add up to ∞.
In our case, an = f (1/n) with f (x) = arctan x, and the bn ’s should be
given as bn = g(1/n). If you haven’t worked before with Taylor series, the
choice we will make here for the lower bound for arctan x will seem to come
out of nowhere. As we shall see later, it is not a guess at all.
Exercise 12. The function arctan x for 0 ≤ x ≤ 1 is bounded from below
by a cubic polynomial g(x); namely, arctan x ≥ x − x³/3 for all x ∈ [0, 1].
We shall first go through a less technical proof that avoids Taylor series
and relies on analysis with derivatives to minimize a function.

Proof 1: Pull everything to the LHS to form a new function h(x) =
arctan x − x + x³/3. To show that h(x) ≥ 0 on [0, 1], calculate and simplify
the derivative: h′(x) = x⁴/(1 + x²), and note that it is always non-negative!
Hence h(x) increases for all x. In particular, for x ≥ 0 we have h(x) ≥ h(0) = 0.
Unraveling, arctan x − x + x³/3 ≥ 0, i.e., arctan x ≥ x − x³/3 for x ≥ 0. □

As the picture shows, arctan x is sandwiched between the polynomials
x − x³/3 and x. We proved above that arctan x ≥ x − x³/3 for x ≥ 0. For
practice, using the derivative techniques above,
Exercise 13. Show that x ≥ arctan x for x ≥ 0. What happens among the
functions x, arctan x, and x − x³/3 when x ≤ 0?
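A quick numerical sanity check of the sandwich is easy to run. It is no substitute for the proofs, and the sample grid and names below are our own choices.

```python
import math

def cubic_lower_bound(x):
    """The cubic polynomial x - x^3/3 from Exercise 12."""
    return x - x**3 / 3

grid = [k / 1000 for k in range(1001)]          # sample points in [0, 1]
holds_right = all(cubic_lower_bound(x) <= math.atan(x) <= x for x in grid)
# For x <= 0 all three functions are odd, so the inequalities reverse.
holds_left = all(-x <= math.atan(-x) <= cubic_lower_bound(-x) for x in grid)
print(holds_right, holds_left)
```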
3.4. The price to pay for demystifying the cubic polynomial x − x³/3 is
using Taylor expansions. It is a standard exercise in Calculus to derive the
Taylor expansion of arctan x centered at x = 0 and find the interval where
it converges to arctan x. We will discuss this calculation only in the Hints
section, and leave it to the advanced reader to investigate the topic in a
Calculus textbook. The result needed for our purposes is:
Exercise 14. For any x ∈ [−1, 1],
$$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots.$$
The RHS looks like a polynomial of “infinite” degree, but we need only the
degree-3 polynomial x − x³/3 made of its first terms! Why does dropping the
higher powers of x yield the desired inequality arctan x ≥ x − x³/3 for x ≥ 0?
PST 99. Given an equality between a function and an infinite series (such

as in Exer. 14), group the unwanted terms in the RHS and show that each
group is positive (or each group is negative, as needed). Then drop all such
grouped terms to produce an inequality in the desired direction.
Equipped with PST 99, we can justify again the lower bound for arctan x.
Proof 2 of Exercise 12: Restricting the Taylor expansion of arctan x
to 0 < x ≤ 1, note that the absolute values of the terms decrease as n grows:
$$\frac{x^{2n+1}}{2n+1} \overset{?}{\ge} \frac{x^{2n+3}}{2n+3} \;\Leftrightarrow\; \frac{2n+3}{2n+1} \overset{?}{\ge} x^2 \;\Leftrightarrow\; 1 + \frac{2}{2n+1} \overset{?}{\ge} x^2,$$
and the last is certainly true because 1 ≥ x². Leaving alone the first two
terms x − x³/3, we can therefore group the remaining (unwanted) terms into
pairs with non-negative differences when x ∈ [0, 1]:
$$\left(\frac{x^5}{5} - \frac{x^7}{7}\right) + \left(\frac{x^9}{9} - \frac{x^{11}}{11}\right) + \cdots + \left(\frac{x^{2n+1}}{2n+1} - \frac{x^{2n+3}}{2n+3}\right) + \cdots \ge 0.$$
As a result, arctan x ≥ x − x³/3 for x ∈ [0, 1]. □
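The grouping trick can be watched in action with a short sketch. The helper names are ours; the pairs are indexed by even n = 2, 4, 6, ..., matching the exponents 5, 9, 13, ... in the display above.

```python
import math

def pair(x, n):
    """Grouped pair x^(2n+1)/(2n+1) - x^(2n+3)/(2n+3) from the Taylor series."""
    return x**(2 * n + 1) / (2 * n + 1) - x**(2 * n + 3) / (2 * n + 3)

def arctan_from_pairs(x, pairs):
    """x - x^3/3 plus the first `pairs` grouped pairs (n = 2, 4, 6, ...)."""
    return x - x**3 / 3 + sum(pair(x, 2 * j) for j in range(1, pairs + 1))

xs = [k / 100 for k in range(101)]               # sample points in [0, 1]
pairs_nonnegative = all(pair(x, n) >= 0 for x in xs for n in range(2, 40, 2))
print(pairs_nonnegative, arctan_from_pairs(0.5, 20), math.atan(0.5))
```

Since every pair is non-negative on [0, 1], adding more pairs only raises the sum toward arctan x, so dropping all of them leaves the lower bound x − x³/3.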

3.5. Classic infinite and finite sums. Recall that we wanted to show
that all arctan(1/n) add up to ∞. From Exercise 12, we know that their sum
will be at least the corresponding sum of values of x − x³/3; namely,
$$\sum_{n=1}^{\infty} \arctan\frac{1}{n} \;\ge\; \sum_{n=1}^{\infty}\left(\frac{1}{n} - \frac{1}{3n^3}\right). \tag{2}$$

So by PST 98 we need to verify that the RHS of (2) adds up to ∞. Part of
this RHS sum, known as the harmonic series, will be infinite:
Exercise 15. Show that the reciprocals of all natural numbers add up to ∞:
$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} + \cdots = \infty.$$
On the other hand, the rest of the RHS of (2) is known to be finite:
Exercise 16. Show that the sum of the reciprocals of all cubes of natural
numbers is bounded from above, i.e., for some number B:
$$1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{n^3} + \cdots < B.$$
Note that Exercise 16 is not asking us to find the exact value of the sum of all
1/n³. This value is denoted by ζ(3), after the famous Riemann zeta-function
ζ(z). It turns out that ζ(2) = Σ 1/n² = π²/6; but no closed formula is known
for ζ(3)! Hence, our task here is only to show that ζ(3) is finite, or what
is equivalent here, that it is bounded from above by a number B.
There are many ways to do Exercises 15-16; for example, by integrals.
We will go instead along paths accessible without Calculus knowledge but
requiring advanced PSTs for sequences and some non-trivial thinking.

3.5.1. Doubling the index adds another half. To start off, for any n ≥ 1 let
an = 1 + 1/2 + 1/3 + · · · + 1/n. The an ’s are called the partial sums of the
harmonic series. Since 1/n > 0, the sequence of partial sums {an } is increasing.
PST 100. To show that an increasing sequence {an } goes to ∞, it is enough
to show that a subsequence $\{a_{n_k}\}$ of it goes to ∞.
The choice of a convenient subsequence $\{a_{n_k}\}$ depends on the specific
example. For our harmonic series, something inventive needs to be done.
Solution to Exercise 15: The slick approach is to consider the subsequence
$\{a_{2^k}\}$ made of every $(2^k)$th term. Now, every next term $a_{2^{k+1}}$ is a sum
of twice as many fractions as the previous term $a_{2^k}$. How will this increase
the value of $a_{2^k}$? Check the beginning: $a_{2^0} = a_1 = 1$, $a_{2^1} = a_2 = 1 + \frac12 = 1\frac12$,
$$a_{2^2} = a_4 = 1 + \tfrac12 + \left(\tfrac13 + \tfrac14\right) > 1 + \tfrac12 + \left(\tfrac14 + \tfrac14\right) = 2,$$
$$a_{2^3} = a_8 = a_4 + \left(\tfrac15 + \tfrac16 + \tfrac17 + \tfrac18\right) > 2 + \left(\tfrac18 + \tfrac18 + \tfrac18 + \tfrac18\right) = 2 + 4\cdot\tfrac18 = 2\tfrac12.$$
A pattern emerges: when we double the index from $2^k$ to $2^{k+1}$, the terms
$a_{2^k}$ increase by at least a half, which is the brilliant idea in this approach:
$$a_{2^{k+1}} = a_{2^k} + \frac{1}{2^k+1} + \frac{1}{2^k+2} + \cdots + \frac{1}{2^{k+1}} \ge a_{2^k} + 2^k\cdot\frac{1}{2^{k+1}} = a_{2^k} + \frac12.$$
Using induction, one can formally show that $a_{2^k} \ge 1 + \frac{k}{2}$ for all k ≥ 1. But
the new, smaller sequence $\{1 + \frac{k}{2}\}$ obviously goes to ∞, pushing the larger
sequence $\{a_{2^k}\}$ to go to ∞. Retracing our steps, by PST 100 we conclude
that the original (increasing) sequence {an } is also forced to go to ∞. □
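The doubling pattern can be checked exactly for the first few powers of 2. This is a small sketch with a hypothetical helper name, using exact fractions so that no rounding interferes.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum a_n = 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, j) for j in range(1, n + 1))

for k in range(0, 9):
    a = harmonic(2**k)
    # index 2^k, the value a_(2^k), and whether a_(2^k) >= 1 + k/2 holds
    print(2**k, float(a), a >= 1 + Fraction(k, 2))
```

The printed values grow very slowly, which is why the divergence of the harmonic series is hard to guess from a few terms alone.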

3.5.2. Telescoping for convergence. Turning now to the partial sums of the
series Σ 1/n³, bn = 1 + 1/2³ + 1/3³ + · · · + 1/n³, we must change tactics because
Σ 1/n³ and the harmonic series Σ 1/n behave in opposite ways!

PST 101. To prove that a sequence {bn } is bounded from above, find another
sequence {cn } greater than it and bounded from above. Symbolically, if
bn ≤ cn and cn ≤ B for all n, then bn ≤ B for all n.

Solution to Exercise 16: Confirm the following chain of events:
$$\frac{1}{n^3} < \frac{1}{n^2} < \frac{1}{n(n-1)} \overset{(*)}{=} \frac{1}{n-1} - \frac{1}{n} \quad\text{for any } n > 1,$$
where in (∗) we split the fraction as a difference of two simpler fractions.
If you remember the telescoping method from Induction (vol. I), you will
recognize that our solution is about to employ this method:
$$b_n = 1 + \frac{1}{2^3} + \frac{1}{3^3} + \cdots + \frac{1}{(n-1)^3} + \frac{1}{n^3}$$
$$\le 1 + \left(\frac{1}{1} - \frac{1}{2}\right) + \left(\frac{1}{2} - \frac{1}{3}\right) + \cdots + \left(\frac{1}{n-2} - \frac{1}{n-1}\right) + \left(\frac{1}{n-1} - \frac{1}{n}\right).$$
Almost all intermediate terms cancel, leaving only three surviving fractions:
$$b_n \le 1 + \frac{1}{1} - \frac{1}{n} = 2 - \frac{1}{n} < 2.$$
Thus, an upper bound for all bn ’s is B = 2, and ultimately, Σ 1/n³ ≤ 2. □
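The telescoping bound is easy to test exactly. In the sketch below, the helper name and the cutoff of 200 terms are our own choices.

```python
from fractions import Fraction

def cube_partial_sums(last_n):
    """Exact partial sums b_1, ..., b_last_n of the series 1/n^3."""
    sums, total = [], Fraction(0)
    for n in range(1, last_n + 1):
        total += Fraction(1, n**3)
        sums.append(total)
    return sums

b = cube_partial_sums(200)
# Check b_n <= 2 - 1/n for every n up to 200 (b[n - 1] stores b_n).
within_bound = all(b[n - 1] <= 2 - Fraction(1, n) for n in range(1, 201))
print(within_bound, float(b[-1]))
```

The partial sums stay well below 2, so the bound B = 2 is far from sharp; it only has to be finite.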
To show that all fractions 1/n³ actually add up to something, a strong Real
Analysis theorem needs to be invoked (cf. the Hints section).

3.6. Concluding arguments. Recall inequality (2) from Section 3.5, which gives
a lower bound for our desired sum of arc-tangents:
$$\sum_{n=1}^{\infty} \arctan\frac{1}{n} \;\ge\; \sum_{n=1}^{\infty}\left(\frac{1}{n} - \frac{1}{3n^3}\right).$$
If we stop the sum on the RHS at some n and regroup the terms, the partial
sums an and bn discussed above will spring up:
$$\left(\frac{1}{1} - \frac{1}{3\cdot 1^3}\right) + \left(\frac{1}{2} - \frac{1}{3\cdot 2^3}\right) + \cdots + \left(\frac{1}{n} - \frac{1}{3\cdot n^3}\right)$$
$$= \left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{n}\right) - \frac{1}{3}\left(\frac{1}{1^3} + \frac{1}{2^3} + \cdots + \frac{1}{n^3}\right) = a_n - \frac{1}{3}b_n > a_n - \frac{2}{3},$$
where in the last inequality we used bn < 2. Since {an } goes to ∞, then
{an − 2/3} also goes to ∞, making the whole RHS of (2) also go to ∞. This
in turn pushes the larger sum Σ arctan(1/n) on the LHS of (2) to go to ∞.

Translating back to our original Problem 4, arctan(1/n) = αn and the
infinitely many angles in Figure 3 do add up to ∞: α1 + α2 + · · · + αn + · · · = ∞.
This completes the proof of Conjecture 1. □
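To see the chain of bounds at work, one can compare the angle sums with the two lower bounds numerically. This is a floating-point sketch only; the helper name and sample values of n are ours.

```python
import math

def partial_sums(n):
    """Return (sum of arctan(1/k), a_n, b_n) for k = 1..n, in floating point."""
    angles = sum(math.atan(1 / k) for k in range(1, n + 1))
    a_n = sum(1 / k for k in range(1, n + 1))
    b_n = sum(1 / k**3 for k in range(1, n + 1))
    return angles, a_n, b_n

for n in (1, 10, 100, 1000, 10000):
    angles, a_n, b_n = partial_sums(n)
    # angle sum in degrees, then the two lower bounds (in radians) from the text
    print(n, math.degrees(angles), a_n - b_n / 3, a_n - 2 / 3)
```

The angle sum grows roughly like the logarithm of n, so each additional 90° takes many more squares than the previous one.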

4. Historical Detour: from Today back to Archimedes?

We managed to conquer the Infinitely Many Squares problem using anything
but geometry! Yet, its predecessor, the Three-Squares problem, yielded
to a variety of geometric ideas. In fact, 54 proofs of it that use only
elementary geometry can be found in Charles Trigg’s article [82] from 1971
in the Journal of Recreational Mathematics. Our own investigation of the
Three-Squares problem prominently included the brilliant 5th -grade solution
in Part I, based on the specific tiling of the 2 × 3 grid-rectangle shown in
Figure 4a.

Figure 4. Tilings in the Three-Squares and Stomachion

For someone who has followed the recent great discoveries of ancient
mathematical works, this discussion may have triggered a memory of other
tilings: Figure 4b represents one possible solution to the famous Stomachion,
a 14-piece puzzle attributed to Archimedes.⁴ The task is to take the pieces
out and then reassemble them back into the square shape.
At a first glance, the pieces are so distinct that it seems just a few
configurations are possible; but our intuition is very far from the truth! It
was only in 2003 that William Cutler, via a computer program, proved that
there are 17,152 possibilities. Discarding those that can be obtained from
each other by rotations and reflections, he showed that the number of truly
different ways to arrange the puzzle is exactly 536 [18]. And there is more
amazing combinatorics related to the problem! For example, as pointed out by
Fan Chung and Ron Graham [14], there are 3 pairs of pieces such that no
matter how we rearrange the 14 original pieces, these 6 pieces will line up
within each pair next to each other exactly as shown by the shaded figures
in Figure 4c (and as one can check too in Figure 4b). In other words, after
gluing the pieces within these pairs, we are left to play with only 11 pieces.
⁴ The Stomachion is a 950 AD copy of a work of Archimedes by a Byzantine scribe.
It is also the last paper in the Palimpsest, a collection of several manuscripts that were
scraped, washed, and reused in the 13th century for a Christian liturgical book. Having
a fascinating history on its own of being discovered, re-discovered, and lost in the 19th
and 20th centuries, the Palimpsest finally became available again to the public after it
was purchased by an anonymous bidder in 1998 for over $2,000,000. This led to a decade
of scholarly research that heavily relied on technological advances, making the original
papers in the Palimpsest readable and overturning century-held beliefs.

Back to Archimedes, it is not completely clear what his ultimate goal
was in working on the puzzle. Unfortunately, only the beginning of the
Stomachion is preserved in the manuscript, and it is hard to judge where the
text was actually leading. Alexander Givental from UC Berkeley conjectured
that if the whole of the paper were recovered it would show that Archimedes
was solving the problem of comparing angles of triangles on a grid lattice,
thus discovering the basics of trigonometry.
We may never know whether this conjecture is true or not. But we
certainly did use trigonometry in the 8th -grade solution to the Three-Squares
problem, and the general idea of tilings on the grid lattice appeared in both
the Stomachion and in the 5th -grade solution to the Three-Squares problem.
Since geometry (and, for that matter, combinatorics) was entirely absent in
our approach to the Infinitely Many Squares problem, a gap is begging to
be filled by the most curious, persistent, and advanced readers:
Problem 6. (Super Challenge) Find a purely geometric argument, per-
haps along the lines of tiling up the grid lattice, to prove that the sum of
all angles αn in the Infinitely Many Squares problem is ∞. Do you think
Archimedes would have been able to come up with your solution?

5. Hints and Solutions to Selected Problems

Exercise 1. From right △F AY we have F Y > F A. By the Pythagorean
Theorem for right △CBY and right △CBA, and from Y B > AB (A is
between Y and B), we have Y C = √(CB² + Y B²) > √(CB² + AB²) = AC.
Adding the two inequalities verifies that the path through Y is longer than
the path through A: F Y + Y C > F A + AC. □
Exercise 5. The text formula for F′(x) still works when x < 0 or x > c:
$$F'(x) = \frac{x}{\sqrt{a^2 + x^2}} - \frac{c - x}{\sqrt{b^2 + (c - x)^2}}.$$
In case x < 0, the first fraction is negative while the second fraction is
positive (why?), making the overall difference negative: F′(x) < 0 for x < 0.
This implies that F (x) decreases when x < 0.
Argue similarly to show that F′(x) > 0 for x > c. ♦
Exercise 6. More generally, for any function g(x) = √(A² + (x − B)²) we
will show that g(x) ≈ |x − B| when |x| is large. Indeed, rationalizing the
“numerator” of the difference g(x) − |x − B|, we obtain:
$$\frac{g(x) - |x - B|}{1}\cdot\frac{g(x) + |x - B|}{g(x) + |x - B|} = \frac{g^2(x) - (x - B)^2}{g(x) + |x - B|} = \frac{A^2}{g(x) + |x - B|}.$$
Since both g(x) and |x − B| go to ∞ when x → ±∞ (why?), the denominator
goes to ∞, forcing the whole fraction to converge to 0. In other words, when
|x| is large we have g(x) − |x − B| ≈ 0, i.e., g(x) ≈ |x − B|.

Applying this to the two square root functions appearing in our F (x), we
have √(2² + x²) ≈ |x| and √(6² + (4 − x)²) ≈ |4 − x|, so that F (x) ≈ |x| + |4 − x|
when |x| is large. Thus, when x → ∞, F (x) ≈ x + (x − 4) = 2x − 4, and
when x → −∞, F (x) ≈ −x + (4 − x) = 4 − 2x. □
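The rationalization identity and the shrinking gap can be checked numerically. In this sketch the constants A = 6 and B = 4 are sample values picked for illustration, and the function names are ours.

```python
import math

def g(x, A, B):
    """The square-root function g(x) = sqrt(A^2 + (x - B)^2)."""
    return math.sqrt(A**2 + (x - B)**2)

def gap(x, A, B):
    """g(x) - |x - B|, which rationalizing shows equals A^2 / (g(x) + |x - B|)."""
    return g(x, A, B) - abs(x - B)

A, B = 6.0, 4.0   # sample constants chosen for illustration
for x in (10.0, 1e2, 1e4, -1e4):
    # the gap computed directly, then via the rationalized formula
    print(x, gap(x, A, B), A**2 / (g(x, A, B) + abs(x - B)))
```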

Exercise 7. The middle parts of the routes are equal: X1 X2 = AA′ = d. From
right △V1 AX1 , we have V1 A < V1 X1 , and from right triangles △X2 BV2
and △A′BV2 , we have A′V2 = √(A′B² + BV2²) < √(X2 B² + BV2²) = X2 V2 .
Adding up, V1 A + AA′ + A′V2 < V1 X1 + X1 X2 + X2 V2 . □

Exercise 8. By construction, the three segments V1 Y , X1 X2 , and Z1 Z2

are parallel to each other (they all go north!) and of the same length d. Hence,
two parallelograms sharing side V1 Y are born: V1 Y X2 X1 and V1 Y Z2 Z1 . By
taking different paths along the sides of these parallelograms we can partially
straighten each route without changing its length:
• V1 → X1 → X2 → V2 is as long as V1 → Y → X2 → V2 (why?).
• V1 → Z1 → Z2 → V2 is as long as V1 → Y → Z2 → V2 (why?).
Note that both replacement routes start with V1 Y and then continue from Y
to V2 along sides of △Y Z2 V2 : the first route along side Y V2 , and the second
route along sides Y Z2 and Z2 V2 . By the Triangle Inequality for △Y Z2 V2 ,
the second route is longer than the first route.
The only way for the two (original) routes through X1 and Z1 to have
the same length is for Z2 to slide along the riverbank until it coincides with
X2 , thereby pulling along Z1 until it coincides with X1 . To summarize, there
is a unique shortest path and it is given by our algorithm (a)-(e). 

Exercise 10. When exactly one of the villages is on the riverbank, our
solution goes through and yields a unique optimal bridge built at that village.
If both villages are on the riverbanks, then the two bridges built directly at
the villages and any bridge between these two bridges will be optimal. 
Exercise 12, Proof 1. The derivative of arctan x − x + x³/3 simplifies to
$$h'(x) = \frac{1}{1 + x^2} - 1 + x^2 = \frac{1 + (x^2 - 1)(x^2 + 1)}{1 + x^2} = \frac{1 + x^4 - 1}{1 + x^2} = \frac{x^4}{1 + x^2}. \qquad\square$$

Exercise 13. Set q(x) = x − arctan x. Its derivative is q′(x) = 1 − 1/(1 + x²) ≥ 0
for all x. Thus, q(x) is increasing for all x. In particular, q(x) ≥
q(0) = 0, i.e., x − arctan x ≥ 0 and x ≥ arctan x for x ≥ 0.
Since all three functions x, arctan x, and x − x³/3 are odd functions⁵, the
inequalities between them flip signs when moving from positive to negative
inputs. Therefore, x ≤ arctan x ≤ x − x³/3 for x ≤ 0. □

⁵ An odd function F (x) is such that F (−x) = −F (x).

Exercise 14. The well-known geometric series 1 + x + x² + · · · + xⁿ + · · ·
converges iff its ratio x ∈ (−1, 1); and the sum in such a case is 1/(1 − x).
Plugging in −x² for x we obtain:
$$1 - x^2 + x^4 - x^6 + x^8 - \cdots + (-1)^n x^{2n} + \cdots = \frac{1}{1 + x^2} \quad\text{for } |x| < 1.$$
A theorem for integrating infinitely many terms of a series allows us to replace
both sides by their antiderivatives:
$$\int \left(1 - x^2 + x^4 - x^6 + x^8 - \cdots + (-1)^n x^{2n} + \cdots\right) dx = \int \frac{1}{1 + x^2}\,dx$$
$$\Rightarrow\; x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \frac{x^9}{9} - \cdots = \arctan x + C, \quad\text{for } |x| < 1.$$
Plugging in x = 0 in both sides forces C = 0. Somewhat more sophisticated
techniques are necessary to show that the equality above holds also for x =
±1. But for the purposes of just showing arctan x ≥ x − x³/3 when x ∈ [0, 1]
(as in Exercise 12), one can simply verify by hand the inequality for x = 1:
arctan 1 = π/4 > 3/4 > 2/3 = 1 − 1/3. ♦
Exercise 16. To show that the fractions 1/n³ add up to something (i.e., the
partial sums bn converge to a limit, called the sum of the series Σ 1/n³), we
enlist the following theorem from Real Analysis:
Theorem 2. (Monotone Bounded Theorem (MBT)) If a sequence is
monotone (i.e., either only increasing or only decreasing) and bounded (from
above and from below), then it converges to a number, called its limit.
In our case, the partial sums bn of Σ 1/n³ increase because we keep adding
positive fractions. The bn ’s are also bounded from below by, say, 0, and from
above by 2 (as shown in the text). Thus, by MBT, {bn } converges to some
(finite) limit L = ζ(3). □


Figure 5. Optimal Station when ∠V1 V3 V2 is < 120° or ≥ 120°
Problem 3. With V3 the village at the track, as in the hint to the problem,
the optimal station will be located at a point T along the railroad, inside
△V1 V2 V3 , and such that the three angles between arms T V1 , T V2 , and T V3
are as equal to each other as possible. In case ∠V1 V3 V2 < 120°, T will be
the unique such point with ∠V1 T V2 = 120°, making the three angles all equal
to 120° (cf. Fig. 5a). If ∠V1 V3 V2 ≥ 120°, then T will coincide with village
V3 (cf. Fig. 5b). Can you find a geometric way to justify the answer? ♦
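The 120° rule can be compared against a brute-force search along the track. The sketch below rests on our own coordinate assumptions, not on the book's figure: the track is the x-axis, the village at the track sits at the origin, and the two symmetric villages are at (p, q) and (p, −q). All names are hypothetical.

```python
import math

def total_commute(t, p, q):
    """Total distance from the three villages to a station at (t, 0).

    Assumed coordinates (ours): the track is the x-axis, the village at the
    track is the origin, the symmetric villages are (p, q) and (p, -q), p, q > 0.
    """
    return abs(t) + 2 * math.hypot(p - t, q)

def station_by_120_rule(p, q):
    """Station position predicted by the 120-degree rule in these coordinates."""
    # If the angle at the track village is below 120 degrees (p > q / sqrt(3)),
    # the station sits where the arms to the symmetric villages make 60 degrees
    # with the track; otherwise it sits at the track village itself.
    return max(0.0, p - q / math.sqrt(3))

def station_by_scan(p, q, step=0.001):
    """Brute-force search over candidate stations t in [-1, 4] along the track."""
    candidates = [k * step for k in range(-1000, 4001)]
    return min(candidates, key=lambda t: total_commute(t, p, q))

for p, q in ((3.0, 1.0), (0.2, 1.0)):
    print(p, q, station_by_120_rule(p, q), station_by_scan(p, q))
```

The first sample has an angle below 120° at the track village and the second an angle above it, so the two cases of the answer both show up.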

1. What Comes from Within

It is the 1980s. A sunny 5th grade classroom in Bulgaria. The math
teacher opens the class register, calls two girls to the board, and gives each a
problem. Soon enough, one of the girls writes a correct solution, receives an
A, and goes back to her seat. The other girl is stuck; she tries one approach,
then another; but the boat and ship in her problem go up and down the river
and refuse to meet in simple mathematical equations . . . . Meanwhile, the
other students “tame” the vessels in their notebooks, and the teacher moves
on with the new lesson. The girl remains at the board for the rest of the
period, her tears making it even harder to think about the problem.
The bell rings. The teacher beckons the girl and asks: “You know what
grade you deserve, yes?” A nod. “Well, I will not give it to you if you explain
the correct solution to me by the next math class.” The still sobbing girl goes
home in a miserable mood, yet with a big hope. Her father (a shipbuilding
engineer, speaking of coincidences) helps her derive a system of two linear
equations in two variables. From here on the solution is easy, and so the girl
explains it to the teacher the next day. Having avoided the poor grade, she
doesn’t stop there: “May I come to your math circle?” she inquires.
Three months later, to her classmates’, parents’, and her own amaze-
ment, that girl wins the local Math Olympiad with a perfect score. Her
fate is sealed right then: math will be her future. Sure enough, she will
continue for years with her ballet, piano, and guitar lessons; she will attend
a poetry circle and compete at science and literature olympiads; but her
passion . . . her passion will always be for math problem solving.

Later that year she would devise her own way of conquering the last
row of the Rubik’s Cube (having learned to solve the first two at her math
circle); in a couple of years she would represent Bulgaria at the International
Mathematical Olympiads (IMO); then go on to a math major at Bryn Mawr
and a doctorate at Harvard; train the USA math team for the IMOs . . . and
come full circle by founding the Berkeley Math Circle in 1998.

That girl is me – not angry at my middle school math teacher for putting
me on the spot in front of the whole class, rather, grateful to her for giving
me a second chance, for seeing the seed of talent in me, for accepting me and
nurturing my mathematical curiosity at her math circle, and for propelling
me forward with the belief that “what comes from within will take you far.”

2. The Culture of Circles

2.1. All you need is love. There is more than one way to fall in love with
mathematics. Many Eastern European mathematicians have come along the
path of math circles, where they have learned for the first time that the world
of math is larger than one could imagine, more interesting, and more diverse.
The math circle culture is ingrained in the societies in these countries. Dur-
ing the communist era, established mathematicians and pre-college teachers
considered it their duty to expose the younger generation to the wonders of
mathematics. And so they teamed together to found and run math circles.
In my hometown of Rousse (Русе), for example, the math circles used
to meet twice a week in the afternoons or after dinner for 1.5-hour sessions.
The elementary/middle school math circle started in 3rd grade and included
about 25 kids of the same age from my school. The high school math circle
started in 8th grade, held its sessions at the local science youth center, and in-
volved about 15 students from several schools, about half of whom made the
circle’s core and competed at local, national, and international olympiads.
Concurrently, there were identically organized circles at all grade levels 3–11.
The material covered ranged from basic algebra and geometry to advanced
olympiad problem solving, to lower- and upper-division college topics.
2.2. Worthy of a circle. Mathematics was not the only subject “worthy
of a circle”. Starting in late middle or early high school, there were circles
in chemistry, physics, and biology; in English, poetry, and literature. I
participated in just about all of them at one time or another. I tried many
fields because the opportunities were there for me to explore.
The math circles were only part of a large net of pre-college circles created
to draw children and discover their talents. It was no more prestigious or
“cool” to attend a soccer club or take music lessons than to be a member
of, say, a high school physics circle. In fact, parents knew how important
the advanced knowledge gained in circles would be for their children’s future
and hence enthusiastically supported circle participation.
2.3. And they said higher math wasn’t practical? It was to my advan-
tage to attend math circles in particular. The type of thinking and specific
knowledge I mastered there helped me win science olympiads, e.g., devise
systems of equations to balance chemical elements or solve a quadratic equa-
tion in physics in 7th grade. I was heavily courted by my high school teacher
to participate in biology olympiads, for they often involved combinatorial
gene-counting or probability theory: a piece of cake for math circlers.

Even composing poetry and critiquing literature apparently benefited
from my “math-set” of mind. My favorite story here (which, incidentally,
landed in the Philadelphia Inquirer in the early 1990s) goes back to the
mandatory two-semester freshman English course at Bryn Mawr.
As the only non-native speaker of English in my class, I put a tremen-
dous effort into the weekly essay assignments, practically sleeping with the
dictionary under my pillow every weekend before the homework was due.
Still awaiting my final grade in January, I was sitting one day on the floor in
my dormitory and assembling my spring schedule. The phone rang, and, to
my surprise, my English instructor spoke at the other end of the line: “Are
you a math major, by any chance?” I answered affirmatively and steeled
myself for the worst. The instructor exclaimed:
“It figures! You write so clearly and in such a structured way, yet
your personality shows through your words! Even though I disagree with
half of the arguments in your final essay [‘How to Read the Beatles’]
you wrote them so convincingly, like a true mathematician . . . . I exempt
you from the second semester of English. You should take a higher-level
course: I can teach you nothing more in writing in this course.”
I didn’t end up taking another English course (probably a mistake on
my part); but needless to say, my math (and literature) circle training was
responsible for the above remarkable exemption.

3. Eastern European vs. USA Math Circles

3.1. He loves me; he loves me not! On the larger scale, the math circles
shaped my future by drawing me like a powerful magnet to the world of
mathematical problem solving. Since that 5th grade dramatic experience, I
knew within me that no subject but math would complete me, and no profes-
sion other than one in mathematics would be satisfactory to me. It is because
I loved math at school that I went to the math circle to get more of it.
Unfortunately, students in the U.S. by-and-large do not like their math
classes. And let us not deceive ourselves: generally, the talented middle
and high school students are bored by the low-level math, the relentless
repetition, and the lack of advanced ideas or challenging problems. And it
is because they don’t like math in school that they come to the Berkeley Math
Circle. Ironic, isn’t it?

3.2. Frequently asked questions. Here are some more differences between
Eastern European and U.S. math circles. Keep in mind that not all U.S.
circles follow the BMC model, nor are my hometown math circles
(HMC) identical twins of the other Eastern European math circles.
3.2.1. Age of circlers. While in HMC all students were about the same age,
U.S. math circles may incorporate students of a variety of ages: e.g., BMC
ordinarily engages students in two or three different grades, but sometimes
in grades ranging from 4th to 12th, all sitting and learning in the same room.

3.2.2. Logistics. HMC met twice a week for 1.5 (or more) hours. The HMC
were numerous and organized in such a way that students ordinarily could
go there and get home without parents’ assistance. U.S. math circles, due to
transportation issues and conflict with other established school and out-of-
school activities (e.g., volleyball team, music lessons, chorus, etc.), may meet
only once a week for 2-hour sessions. The large area covered by the one BMC
(from Sacramento to San Jose, from Palo Alto to Orinda and Danville) calls
for parents to drive their kids across the long distances and forces the evening
BMC time (6–8 pm) during the week, or alternative weekend sessions whose
timing presents other obstacles to families and organizers.
3.2.3. Home base. While HMC were either based at a school or at a local
math/science center, their U.S. counterparts are usually university-based.
A sufficient number of teachers in Eastern Europe were qualified to lead
math circles on their own, with some occasional support of materials and
instructors from a nearby university. Alas, this is not the case in the U.S.
3.2.4. Topics in HMC were organized in modules, providing continuity and
gradual increase of difficulty and depth of the material. This was possi-
ble mostly because the students had very similar math background, level
of knowledge, and mathematical maturity and because circlers attended all
sessions: transportation issues did not exist and other activities were de-
prioritized by the math circles. In the U.S., the circlers may vary from
beginners to seasoned members of the national USA math team, and hence
single powerful sessions incorporating the various levels and backgrounds are
more practical than long sequences of linked sessions. Besides, the sparsity of
U.S. math circles and competing activities (which become more numerous the older
the student gets) mean regular weekly attendance is not always possible; hence
missing one session should not preclude understanding the following one.
For the BMC-advanced group, the sessions are usually singletons, with occa-
sional series of 2 sessions. For the BMC-intermediate group, the sessions are
often in a series of 2, while for the BMC-beginners group a single instructor
undertakes a module of 3–4 thematically arranged sessions. (The BMC-
elementary groups have the same instructor throughout the whole year, and
topics tend to last for a month or two of sessions.) The younger the students,
the more continuity in topics and instructors is provided at BMC.
3.2.5. Session leaders in HMC were only one or two teachers who organized
the specific math circle. Occasionally we had guest speakers from the lo-
cal university, and once in a while we were visited by professors from Sofia
University or the National Youth Science/Math Center who trained the Bul-
garian national team. In contrast with HMC, each BMC instructor leads
an average of 2 sessions per year, accounting for approximately 50 instruc-
tors at the BMC-Upper every year. They are mathematicians from nearby
universities and colleges, some specially trained high school teachers, some
professionals working in related fields, and even some alumni and current
advanced circlers.

3.2.6. Popularity. Everyone in Eastern Europe knew about the math circles;
children and parents alike were well aware of the opportunity to enroll and
of the possibilities which successful participation might open in the students’
future. What portion of the U.S. population has an inkling that math cir-
cles exist? Negligible. What status do math circles have in U.S. society
and its educational system? Unclear. Can they compare in popularity to
membership of a high school football or debate team? No, they can’t.

Figure 1. Football or Math Circle?

3.2.7. Government support. The overall organization and funding in the so-
cialist model math circle was entirely secured by the state; a math circle was
an extracurricular activity roughly equivalent to one course each semester
and was thus correspondingly compensated by the Ministry of Education. In
contrast, SF Bay Area math circles, for instance, are partially funded (if
at all) by private sources; the remaining “funds” are donated by volunteers’
time, effort, professionalism, and enthusiasm.
Undoubtedly, the reader has more questions, and the comparison list can
go on and on. But this Epilogue is not intended as an exhaustive study of
the math circle phenomenon. For more details on U.S. Math Circles, see
Sam Vandervelde’s “Circle in a Box” [84].

3.3. Get to the point. One way to resolve most of the problems associated
with math circles in the U.S. is . . . (OK, start dreaming!) . . . to have a math
circle at every college and university.
(1) The professor organizing and running the math circle will receive a
one- or two-course release from the math department, depending on
the frequency, length, and intensity of the circle sessions. This will
compensate for the huge effort involved in directing a math circle and
will hopefully encourage more mathematicians to get involved in edu-
cating the talented youth of the U.S.
(2) The math circle can be formally organized as a math course and, thus,
be open also to undergraduates.
(3) Undergraduate and graduate students, as well as interested postdocs
and tenured faculty, can be vertically integrated in this model.

(4) A modest semester fee for non-university participants (pre-college
students and teachers) will provide honoraria to the session leaders.
(5) The math department can provide secretarial and computing support
and office supplies, as well as a work-study student assistant and web
The math circle will be an invaluable math program offered to the local
community and can be viewed as part of the math department’s outreach
activities. This network model will resolve transportation problems at least
for the urban and suburban areas (i.e., areas with an institution of higher
education), will mobilize previously uninterested math faculty, and will give
some tangible and formal recognition to the work of math circle leaders.
An NSF VIGRE grant for the University of Utah ensured the above
model for their math circle, led by Peter Trapa and Dan Ciubotaru [83].
Other university-based circles approaching this model were founded at San
Jose State University [71], University of California at Davis [19], Stanford
University [78], University of California at Los Angeles [47], and others.
Below we’ll examine more closely the model of the Berkeley Math Circle [11].

4. History and Power

Despite the shortfalls of U.S. math circles’ set-up, don’t get me wrong: I
founded and ran one such circle for a decade and plan on doing so for at least
another decade. If I had to describe the Berkeley Math Circle in one phrase,
it would simply be a “high-power version of my hometown math circle”. But
let’s start from the beginning.
4.1. To marvel and to be appalled. By my last year of graduate stud-
ies at Harvard, I had taught enough math courses to question the quality
and depth of pre-college math education in the U.S. The few strong (very
strong!) undergraduates never took calculus or linear algebra (apparently
having taken them at some university while in high school) but jumped di-
rectly to upper-division courses like real analysis, abstract algebra, or number
theory, to name a few. The cream of the crop, former USAMO winners and
IMO medalists, even ventured into graduate courses like algebraic geometry
or topology, or Lie algebras (why not?). Each and every such top student had
beaten his/her own path out of the jungle of U.S. secondary math education
by hiring tutors, by escaping to a nearby university, or, if extremely talented
in problem solving, by qualifying for the 30-student, one-month Mathematical
Olympiad Summer Program (MOSP), in preparation for the IMOs.
As I marveled at the super-advanced math knowledge and skills those
relatively rare students had acquired through very special personal circum-
stances, I was appalled at the general math level of the remaining huge bulk
of undergraduates. We are talking here about problems in dealing with frac-
tions and simple algebraic manipulations, with which, I am sure, a 6th grader
in Bulgaria would have felt perfectly comfortable!

4.2. The missing link. In addition to the outrageous discrepancy between
the “top” and the “generic” math student, the link between secondary and
college math education – the math circles – was nonexistent as a system. It
seemed to me there was no statewide system in the U.S. to meet the needs
of talented math students, to discover and train them, to inspire them to
continue on with advanced mathematics.
And so, I decided it was high time to get acquainted, first-hand, with
secondary education in the U.S.: I enrolled in the Massachusetts teacher
certification program. The two high schools for my practicum, Newton
North and Chelmsford, offered me an interesting mixture of classes from al-
most remedial algebra to a problem-solving course of my design. I saw the
mathematical potential in a number of students, the desire to go beyond the
regular school curriculum. But I realized too that the math teachers were
overburdened with courses, never-ending administrative chores and extracur-
ricular activities; the additional load of running math circles (assuming some
teachers were qualified and willing) was inconceivable unless the school sup-
ported the enterprise financially and administratively. And these were two
of the good and prosperous schools in the Boston area.

4.3. The chicken or the egg. I didn’t have time to think about the
situation in the bad schools, as I graduated from Harvard and moved in 1997
to Berkeley to take up a postdoctoral position at the Mathematical Sciences
Research Institute (MSRI).
I wasn’t a month into my new job when I got an e-mail from Hugo Rossi
(then the Deputy Director of MSRI) asking MSRI members for suggestions
on possible outreach activities to the community. About 10 minutes later,
Hugo and I were in agreement that a regional Math Olympiad for pre-college
students would be the right thing to do: an Olympiad different from the
numerous fast-type calculational contests, an Olympiad consisting of a few
hard essay-proof problems for several hours, in the true fashion of Eastern
Europe. I met Paul Zeitz (University of San Francisco) a week later, and
definite plans to start the Bay Area Mathematical Olympiad (BAMO) were
set in motion.
To publicize the plan, in the late fall
of 1997 MSRI asked me to give a talk to
an audience of 400 people at a bi-annual
public event. Sandwiched between two
spectacular lectures on the mathematics
behind “Brain Waves” and “Toy Story”,
was my modest presentation “The High
School Olympiads - Excitement, Talent,
and Determination” (cf. MSRI streaming
video [79]). Years afterward, people still
remember it by a single picture: that of a
chicken and an egg.

The idea was that BAMO would get its participants mainly through
newly founded school-based math circles around the SF Bay Area and would
serve as an annual focal event for their activities. The Olympiad and the
math circles would complete and strengthen each other and would be founded
at the same time: neither would exist without the other. The mathematical
community would support the math circles with materials and occasional
session leaders; but the circles would be run by teachers at their schools.
In the audience were Tom Davis (Silicon Graphics), Tom Rike (Oakland
High School), Quan Lam (UC Berkeley President’s Office), Brian Conrey
(Director of the American Institute of Mathematics in Palo Alto (AIM)), and
Donald Knuth (Stanford), who all expressed desire to help with the new circle
and Olympiad movement. MSRI and AIM then launched a series of events
with local teachers and the media to publicize BAMO and to encourage the
start-up of many math circles. Alexander Givental and Bjorn Poonen (UC
Berkeley), John McCuan (MSRI), Dmitry Fuchs (UC Davis), Tatiana Shubin
(SJSU), Joshua Zucker (then at Henry Gunn High School), and others were
attracted through these events and pledged their support.

4.4. The “temporary” is the most permanent. One of these public
events stands out in my mind as the conception of the Berkeley Math Circle.
It was half a year later, in April 1998. Thirty or so local teachers had gathered
at MSRI to learn about BAMO and to experience a math circle mock-session.
Everyone was elated after the presentations; people were talking excitedly.
But when a poll was taken of how many teachers were interested in
starting a math circle at their own school, guess what? There was not a
single hand up in the air! This was a wake-up call for all of us . . . more
precisely, a bucket of icy water on my hot head. I remember sitting in my
chair and puzzling over it: “What shall we do? BAMO can’t survive without
math circles . . . . But the teachers are obviously not ready to undertake the
enterprise on their own. Is this the end of it?”
Still reeling with the thought, I started circulating among my colleague-
professors asking if they were willing to deliver several sessions a year at
a temporary math circle, to serve as an example to teachers, so that they
would learn how it is done and would then start their own math circles. I
got affirmative answers from seven and undertook the task of organizing a
1–2 year trial math circle in Berkeley.
I must have been out of my mind, not realizing at the time what an
enormous responsibility, both academic and administrative, I was willingly
adding to my full-time job. But that’s what a new baby requires: sacrifice
and effort and devotion. I had more than enough of each, as I was carrying
a lifelong gratitude for my own childhood math circles and wanted to convey
the wonders of mathematics to the young generation of the United States, my
new home. What I didn’t know was that this project was far from temporary:
that it would go on year after year, until now, when we are celebrating 15
years of BMC and the present book series is our new baby.

There must have been more “crazy” people in the SF Bay Area at that
time. A twin to BMC was born: the San Jose Math Circle [71] came into
existence the same week as BMC, mid-September 1998, under the tender
care and never-ending enthusiasm of Tatiana Shubin and Tom Davis, and is
still operational. For a few years Tom Rike, Joshua Zucker, and John Howe
led their own school-based circles in Oakland, Henry Gunn, and Presentation
High Schools, respectively. Sam Vandervelde had a circle for two years at
Stanford [78] (now led by parents). With MSRI’s guidance and support,
Paul Zeitz and Brandy Wiegers launched a different type of math circle in
San Francisco [70] and Oakland [62]. Sharon Madison opened the Sudbury
Math Circle (Canada) as a chapter of BMC, and Olga Radko also fashioned
the LA Math Circle [47] after BMC. The SF Bay Area network has expanded
now to a number of math circles across the U.S.: very few school-based and
not nearly as many as needed, but certainly way more than a decade ago.

4.5. Mapping out the future. Zooming back in on the Berkeley Math
Circle, the services it offers begin with the weekly sessions and the monthly
contests, but certainly do not end there. BMC has become a center for com-
munications between students, parents, instructors, teachers, educators, and
university administrators, where the circlers’ present and future mathemat-
ical education is mapped out. This kind of mentoring is possible only in the
presence of both “sides”: high quality instructors and students.
The more than 50 BMC instructors per year range from teachers and
students to university faculty and real-world tycoons. Among them are
mathematicians: Alexander Givental, Alexandre Chorin, Bernd Sturmfels,
Bjorn Poonen, Dmitry Fuchs, Elwyn Berlekamp, Federico Ardila, Joe Buhler,
Kiran Kedlaya, Olga Holtz, Ravi Vakil, Robin Hartshorne, Serge Lang,
Vera Serganova, and many more. Some famous alumni have also contributed
sessions to the circle: Gabriel Carroll, Maksim Maydanskiy, Inna Zakharevich,
Neil Herriot, Andrew Dudzik, Austin Shapiro, Oaz Nir, and Evan O’Dorney,
all of whom have chosen career paths in or related to mathematics.
The accomplishments of the BMCers are stellar. For example, half of
the BAMO grand prizes and brilliancy awards have been captured by the
BMCers, including the only brilliancy award won by a girl, Hoan Ngo (Oak-
land High School), and the only BAMO-8 grand prize won by a girl, Laura
Pierson (then a 6th grader at Oakland’s Hillcrest School), as well as a dozen
gold and silver medals at the IMOs and a dozen USAMO wins. In 2007,
Evan O’Dorney, as an 8th grader, scored perfectly at BAMO and won the
National Spelling Bee, meeting and enchanting the then-President Bush; the
next year he scored highest at the USAMO and received the Clay Olympiad
Scholar Award [15] for one of his solutions; he went on to earn the second
highest score in the world at the IMO ’10 in Kazakhstan and received a
congratulatory call from President Obama, meeting him a year later when
in Washington to be awarded the first place prize at the Intel Science Talent
Search in 2011. Several multiple-time Putnam Fellows¹ are also among our
students. But most importantly, original mathematical research has been
conducted by several circlers, including Gabriel Carroll, Tiankai Liu, Mak-
sim Maydanskiy, Evan O’Dorney, and others.

5. Does the U.S. Need Top-Tier Math Circles?²

“I wish to state in no uncertain terms how important programs for our
talented young people are to the future of this country. The best place
to develop the highest end mathematical talent is in groups where young
people can feed off each others’ excitement, guided by the best minds
in the field. The model of top-tier math circles has been honed over
decades in other countries. An American version has been in place for
a decade and has shown measurable and almost unbelievable results.
Now is the time to make these programs a permanent feature of our
educational landscape. The community is ready to assist in any way pos-
sible. Universities are happy to provide facilities. Professors are happy
to volunteer their time. Parents are happy to spend countless hours.
And the reason we do this is that when you see these kids catch fire,
it takes your breath away.”
Ravi Vakil
Four-time Putnam Fellow
Professor of Mathematics
Stanford University

5.1. Early birds. Creative people start at a very young age to think
“outside-of-the-box” and to make significant contributions to the world. Some
notable examples are Bill Gates, who at age 20 dropped out of Harvard
to run Microsoft full-time; Steve Jobs, who co-founded Apple at age 21; and,
more recently, Mark Zuckerberg, who created Facebook, a social graph platform,
at age 19. The best young minds in the U.S. deserve our support. The Top-Tier
Math Circles are venues for such support: they nurture individuals who are
capable of significant accomplishments by giving them advanced training in
problem-solving tools that are found in no other U.S. educational institution.
As another example, a month before Evan O’Dorney [50] qualified for
his first IMO in Spain ’08, the 9th grader was exempted from his final in a
linear algebra class at UC Berkeley. The reason: he solved an open problem
posed in an article by Professor William Kahan [45]; more precisely, Evan
found out how small one can make the Cayley transform of a real orthogonal
matrix by reversing the signs on selected columns.

¹ The William Lowell Putnam Mathematical Competition [64] is the premier Mathematical
Olympiad for college students in the world. A Putnam Fellow is among the top 5 scorers.
² Excerpts from [85].

“BMC has taught me a number of useful mathematical concepts and
theories and exposed me to challenging problems. Writing problems for
the Monthly Contests provided an outlet for my creative mind. BMC also
introduced me to the top local, national, and international mathematical
contests. The mentorship I receive through BMC is invaluable.”
Evan O’Dorney, BMC alumnus
Junior at Harvard University
Two gold, two silver IMO medals
1st prize, Intel Science Talent Search ’11
National Spelling Bee Champion ’07
BMCer Gabriel Carroll was a high school junior when he took time off
from IMO participation to work at the Research Science Institute at MIT.
Without any prior experience in algebraic topology, he studied the link be-
tween posets and geometric figures, and his paper “Homology of Narrow
Posets” [63] won the third place prize at the Intel Science Talent Search ’01.
Gabriel went on to win two gold and one silver medals at the IMOs, achieving
one of only four perfect scores at IMO ’01. He conquered the Putnam four
times, two of those four while still in high school. A quote by him appears
in the beginning of the Introduction.
After winning a BAMO grand prize and the Regents’ and Chancellor’s
Scholarship to UC Berkeley, BMC alumnus Maksim Maydanskiy attended
two top undergraduate research programs: the Penn State REU and the REU
in Duluth, Minnesota. His first project was inspired by Monsky’s Theorem
on triangulations of the square and resulted in the paper “Triangles Gone
Wild” [46]. His Duluth work “The Incidence Coloring Conjecture for Graphs
of Maximum Degree 3” [51] extended the previously known result that all
Hamiltonian cubic graphs have incidence 5-coloring to all cubic graphs.

“The impact of the math circle program on my personal mathematical
development is hard to overestimate. It was, and continues to be,
the single most vibrant source of mathematical activity for high school
students in the Bay Area. The lectures introduced me to many areas
of mathematics, a number of which came up again in my later studies.
The opportunity to meet a variety of people from fellow students to pro-
fessors, the college campus setting, the overall atmosphere – all of that
made BMC unique. The program helped me to shape my plans for un-
dergraduate education. It was an experience no other sources could pro-
vide. The program has a great effect on mathematical youth in the Bay
Area. It provides an interaction media and stimulating environment,
both encouraging further involvement from students already interested
in mathematics and promoting mathematics to a wider audience.”
Maksim Maydanskiy, BMC alumnus
BAMO ’00 grand prize
Ph.D. in mathematics, MIT
Institut de Mathématiques de Jussieu, Paris

5.2. The ultimate measure: more testimonials. An important contribution
that top-tier math circles make is to challenge the exceptional stu-
dents and by doing so to keep them interested in science and mathematics.
“The math circle was so crucial to my education and interest in math;
I can hardly imagine studying math at Harvard if it weren’t for it.”
Tiankai Liu, SJMC and BMC alumnus
Three-time IMO gold medalist
Two-time Putnam Fellow
Ph.D. student in mathematics, MIT
Over and over again, our circlers write about the impact of the program
on their understanding of mathematics and their future; about a “different
side of math” which they can acquire at the math circle but not at school;
about “mind-bending” and “constantly challenging” sessions; about “gaining
confidence” and finding a place where they “feel accepted”. Starting with a
senior at BMC, we will move to quotes from younger and younger circlers.
Evan Chen from Fremont, who teamed up with Evan O’Dorney for the
last three years to coordinate the Monthly Contest, was a USA IMO ’13
Candidate and a USAJMO ’10, ’11 Winner. He received perfect scores at
BAMO ’12 and Asian-Pacific MO ’13 and was selected to participate at the
Research Science Institute in the summer of 2013 at MIT.
“The Berkeley Math Circle was an unparalleled educational opportu-
nity for me, both as a student and instructor. The lectures burrowed
into countless different areas of mathematics, most of which I otherwise
would not have seen until much later, and many which I would likely
have not seen at all. The opportunity to plan and deliver my own ses-
sions and to teach students proof-writing through the monthly contests
has also been an invaluable pedagogical experience (and lots of fun!).”
Evan Chen, BMCer, 12th grader

The moment Laura Pierson from Oakland walked into BMC as a 5th
grader, it was obvious that she was special beyond any regular measures. As
a 6th grader, she made history: she won the BAMO-8 Grand Prize in 2012
with a perfect score and conquered USAJMO ’12, thereby becoming the
youngest to have been invited to MOSP. She went on to win silver medals
on the U.S. (high school!) teams at the European and China Girls Math
Olympiads in 2013 and 2012, respectively. She astounded her professors at
UCB when, as a seventh grader, she received the top scores in multi-hundred
student Calculus II and the upper-division Linear Algebra courses. She was
accepted to College Preparatory School in Oakland, skipping 8th grade.
“BMC has opened up a whole new world for me. It sparked my passion
for math and introduced me to whole new areas of math I had no idea
existed. I’ve also gotten to meet so many amazing people who share my
passions and who I can connect with and learn from. In many ways it’s
been a really life-changing experience.”
Laura Pierson, BMCer, 9th grader

Nico Brown from Mill Valley is the kind of kid about whom you have no
doubt: he “breathes” mathematics just as he breathes air. Being precocious
does not come even close to describing the mature interest in pure math-
ematics which Nico spontaneously exudes. He has 13 accepted sequences
on the On-Line Encyclopedia of Integer Sequences, a database peer-reviewed
by mathematicians. A multiple winner of the Monthly Contest and the Winner
in the Individual Countdown Round of the Berkeley Mini-Math Tournament ’13,
Nico’s passion is expressed most prominently through his work
at on “original mathematics and proof writing, particularly in
number theory.”
“Most weeks start on Monday mornings, but mine start on Tuesday
nights with the Berkeley Math Circle. It’s the highlight of my week for a
couple of reasons. Reason #1: The math, of course, but math I wouldn’t
see otherwise, such as the chromatic number of the plane or matrices,
brought in by people who love math like me. Reason #2: I’ve met two
of my best friends at BMC. For kids who love math, it’s rare to meet
others who feel the same; so combining math with friendship is why I
keep coming back. BMC also stands for ‘Best Math Community’.”
Nico Brown, BMCer, 6th grader

Vincent Pisani from Castro Valley has been in BMC for three years
and, as one of the youngest participants, has bravely taken any and all tests
offered at the circle, including AMC8, AMC10, and BAMO. Having been
awarded the Johns Hopkins 2012 High Honors, it may come as an anticlimax
to know that he also received the credit for the California High School Alge-
bra requirement based on test results taken as a 4th grader. A programmer
and iPad App developer, Vincent is an accomplished trumpet player.
“I really enjoy going to the Berkeley Math Circle. Each week has a
new topic, so I get to learn about a huge variety of mathematical topics,
unlike school. I have also met several great friends who also enjoy math,
including a professor from USF. I get together with them often to share
and work on math. BMC feeds my appetite for learning about math, and
I think it is worth driving all the way to Berkeley each Tuesday.”
Vincent Pisani, BMCer, 6th grader
Arav Karighattam from Davis joined the circle two years ago and won
over everyone with his smile and irrepressible enthusiasm for math. He re-
ceived the BAMO Young Student Achievement Award in ’12 and ’13, was
one of the top students in the Junior High category of the
California State Championships in ’12 and ’13, qualified for AIME in ’13 (as
a 4th grader) and in ’14, and continues to amaze his UC Davis professors in
upper-division courses such as Combinatorics, Euclidean Geometry, Number
Theory, and Real Analysis. He has also won music and poetry competi-
tions, including the Composers Today California State Contest in ’13 and
the ‘Voices of Lincoln’ Young Poet Contest in ’11, ’12, and ’13.

“There are many things I love about the Berkeley Math Circle. First,
I like the range of advanced topics taught at each session. Second, I
enjoy all the open problems presented at the circle during certain lectures
(e.g., which permutations are Wilf-equivalent?). That is why I don’t like
to miss a single session of BMC, rain or shine. It is an extraordinary
Arav Karighattam, BMCer, 5th grader

Espen Slettnes is a third grader at BMC, who rapidly moved from the
BMC-Elementary to the BMC-Intermediate group in only two years and
received, not surprisingly, the 2012 High Honors Award from Johns Hopkins
University’s Center for Talented Youth and Math Kangaroo’s 2013 5th place
in California and 10th place nationwide. He is also a Young Scholar at
Davidson Institute for Talent Development and was selected to participate
at the Epsilon Camp for exceptionally gifted young children in 2013 and 2014.
“I am 8 years old, and I love math. BMC is an important part of my
math education, because it is one of the only places I get to work on real
math that I don’t get to do in school. The lectures introduce me to many
different math topics and help me dive deeper into topics I already know.
I also love participating in the BMC monthly contests, which exercise
my mind and help me improve my skills in writing mathematical proofs.
I am very glad to be part of BMC.”
Espen Slettnes, BMCer, 3rd grader

5.3. The gathering storm. There are a number of studies of the deteri-
orating situation in U.S. math and science education and its impact on the
scientific and technological presence of the U.S. in the world. To describe
just how critical the situation is, we refer below to three such reports.
“The United States is losing its edge in innovation and is watching the
erosion of its capacity to create new scientific and technological break-
throughs. Increased global competition, lackluster performance in mathe-
matics and science education, and a lack of national focus on renewing its
science and technology infrastructure have created a new economic and
technological vulnerability as serious as any military or terrorist threat.”
A Commitment to America’s Future, 2005 [13]

The National Academy of Sciences has also called to our attention the
need for the U.S. to raise its capabilities in mathematics, science, and en-
gineering, in a report “Rising Above the Gathering Storm: Energizing and
Employing America for a Brighter Economic Future” [58]. According to it:
• The U.S. has long depended on foreign-born and -trained mathematicians,
engineers and scientists to help maintain its intellectual lead.
• The global competition for these talented individuals has greatly intensified
in recent years and will continue to do so, as the rest of the world increases
its technical capabilities and living standards.
• To remain competitive, the U.S. needs to devote considerably more effort and
resources to foster excellence in mathematics, science and engineering.

The majority of talented individuals in these fields recruited by U.S.
universities and technology companies are from China, Europe, India, and
the former Soviet Union. A 2006 report on Science, Technology, Engineering,
and Mathematics Education (STEM, [61]) brought forward related troubling
trends and numbers:
• In 2004, China graduated approximately 500,000 engineers; India graduated
200,000 engineers; and the U.S. graduated 70,000 engineers. On the other
hand, South Korea graduates as many engineers as the U.S. even though it
has only one sixth of the U.S. population.
• More than half of all engineering doctorates awarded in the U.S. go to foreign-
born students. In 2003, 25% of all college-educated workers and 40% of all
doctorate holders were foreign-born. Over half of the doctorate holders in
several fields who resided in the U.S. were foreign-born: computer science;
electrical, civil, and mechanical engineering.
• From 1994 to 2004, there was a steady increase in the percentage of U.S.
patents granted with a foreign origin, including foreign-owned companies and
foreign inventors. In one decade this share rose from 18% in 1994
to 48% in 2004!

What do these foreign countries do differently from the U.S.? There
are many differences and each country is unique. India and China value
technical education as a path to prosperity; admission to technical schools
there is based on rank in national exams. In the former Soviet Union and
Eastern Europe, mathematically talented individuals are identified very early
and are provided with the resources needed to reach their full potential.

5.4. Raising the ceiling. What can be done in the U.S.? Hung-Hsi Wu,
Professor of Mathematics at UC Berkeley, has been involved in the educa-
tion of U.S. mathematics teachers for the last decade. He was on the Task
Group on Teachers in the National Mathematics Advisory Panel appointed
by President Bush and is currently serving on the National Research Council
Panel on the Study of Teacher Preparation Programs.
According to Professor Wu, a main purpose of both panels is to address
the crisis in teacher quality among math teachers so as to ensure the pro-
duction of a large enough pool of mathematically literate students to fill our
technological needs. However, to ensure that we also produce first-rate scien-
tists and mathematicians, a different kind of approach would be necessary:

“This is where the Math Circles come in. It is programs like the
Math Circles that can provide the needed guidance and stimulation for
the cream of the crop of this pool. While the work done by the above-
mentioned panels is designed to raise the floor to make our nation com-
petitive in the global market, what the Math Circles do is to raise the
ceiling in order to maintain our worldwide leadership position in science
and technology.

At a time of need in our nation’s mathematics education, the work
done in top-tier math circles such as the Berkeley Math Circle and the
San Jose Math Circle is of vital importance.”
Hung-Hsi Wu
Professor of Mathematics
University of California at Berkeley
While it is unlikely that math circles will have a large impact on the
value system of the American public, the top-tier math circles in the U.S. do
play a significant role in meeting the challenges described above by preparing
our best young minds for their future role as mathematics, science, and tech-
nology leaders. With your help, we can establish a dense network of math
circles across the U.S.

With hope,
Zvezdelina Stankova
Berkeley Math Circle Director
Berkeley, March 17, 2014

Symbols and Notation

Set and Logic Notation

N set of natural numbers
Z set of integer numbers
Q set of rational numbers
I set of irrational numbers
R set of real numbers
C set of complex numbers
∞ infinity or infinitely many
(a, b) open interval: all x ∈ R such that a < x < b
[a, b] closed interval: all x ∈ R such that a ≤ x ≤ b
(a, ∞) semi-infinite interval: all x ∈ R such that x > a
(−∞, b) semi-infinite interval: all x ∈ R such that x < b
(−∞, ∞) infinite interval: all real numbers, R
∈; ∉ is an element of; is not an element of
(; ( passing; not passing through
⊂; ⊄ is a subset of; is not a subset of
⊃; ⊅ contains; does not contain
⊊ is contained in but is not equal to
⊋ contains but is not equal to
A∩B intersection of set A and set B
A∪B union of set A and set B
A⊔B disjoint union of set A and set B
A\B set A but without the elements of set B
A×B all pairs (a, b) of elements a in A and b in B
Ā the complement of set A
|A| number of elements in set A
Σ(A) sum of elements in set A
Π(A) product of elements in set A
⇒ implies, only if
⇐ if, is implied by
⇔, iff if and only if
□ end of proof
♦ end of hint or partial solution
? questionable proof


Geometry Notation
: divide or take the ratio of segments
α, β, γ, δ alpha, beta, gamma, or delta: letters from the Greek alphabet
aA mass point (a, A)
I(A, r) inversion with center A and radius r
I(A) inversion with center A and unspecified radius
[ABC] area of triangle ABC
AB segment AB or its length depending on context
|AB| distance from A to B; used if AB is ambiguous

AB arc AB
AB ray AB
∠ABC angle ABC
△ABC triangle ABC
I Triangle Inequality
⊥ is perpendicular to
∥ is parallel to
≅ geometric congruence
∼ geometric similarity
∠A = ∠B congruence of angles written also as ∠A ≅ ∠B

Group Theory Notation

R Rubik’s Cube group
B,F,U,D,L,R quarter-turn clockwise twist about the back, front, up, down, left,
and right faces of the Rubik’s Cube.
e, id identity element (e.g., in a group)
g⁻¹ inverse of an element in a group or reciprocal of a number
o(a) order of element a of a group
Dn the nth dihedral group: the group of symmetries of a regular n-gon
Sn the symmetric group of permutations on n objects
An (usually) the alternating group of even permutations on n objects
Zn the group of remainders modulo n under addition
Q∗ , R∗ , C∗ same sets but without 0; all groups under multiplication
G1 × G2 direct product of groups G1 and G2
G1 ⋊ G2 semidirect product of groups G1 and G2

Complex Numbers Notation

C the unit circle (as viewed in the C-plane)
i imaginary unit, √−1
Re{z} real part of complex number z
Im{z} imaginary part of complex number z
z̄ conjugate of complex number z
(|z|, θ) polar form of z with modulus |z| and argument θ
ζn , ω1 primitive nth root of unity
ζnk , ωk kth power of a primitive nth root of unity
Cn the group of all nth roots of unity under multiplication
Pz smooth curve through all integer powers z^n of z
ζ(s) Riemann zeta-function

Number Theory Notation

⌊x⌋ = [x] floor of x or integer part of x: greatest integer ≤ x
⌈x⌉ ceiling of x: least integer ≥ x
{x} fractional part of x: x − [x]
min{a, b} minimum of a and b
max{a, b} maximum of a and b
a | b (a ∤ b) a divides b without remainder (a does not divide b)
a ≡ b (mod c) a is congruent to b modulo c
gcd(a, b) greatest common divisor of a and b
lcm(a, b) least common multiple of a and b
R(n) ∞-Raffle function
id(n) the identity function: id(n) = n for all n ∈ N
ι(n) the constant function 1: ι(n) = 1 for all n ∈ N
O(n) the zero-function: O(n) = 0 for all n ∈ N
ε(n) a two-value function: ε(1) = 1 and 0 elsewhere
φ(n) Euler function
μ(n) Möbius function
Λ(n) von Mangoldt function
τ (n) number of the divisors of n
σ(n) sum of the divisors of n
π(n) product of the divisors of n
A set of arithmetic functions
M set of multiplicative functions
S set of strongly multiplicative functions
Sf sum-function of the function f
f ∗ g Dirichlet convolution of functions f and g
x xi is missing from the product of the other xj ’s

Combinatorics Notation

n! n factorial, 1 · 2 · 3 · · · (n − 1) · n
P (n, k) number of permutations of n objects taken k at a time
\binom{n}{k} binomial coefficient n choose k, n!/(k!(n − k)!)

Knot Theory Notation

U unknot
T (right-hand) trefoil
4₁ figure 8
H Hopf link
W Whitehead link
B Borromean rings
S Square knot
R1, R2, R3 Reidemeister moves on links
τ (L) the number of tricolorings of a link L
K1 #K2 connected sum of two knots
L mirror image of link L
VL Jones polynomial of a link L

Linear Algebra Notation

x vector x
3-D three-dimensional
Null(A) null space (or kernel) of a matrix A
dimV dimension of space V

Functions, Means, and Calculus Notation

≈ approximately
→, ↦ goes to (under a function or a process)
e base of natural log, ≈ 2.71828
π ratio of circumference to diameter of a circle, ≈ 3.14159
φ; φ̄ golden ratio, (1 + √5)/2; its conjugate, (1 − √5)/2
e^x natural exponential function
ln x natural logarithmic function, log_e(x)
sin x sine function
cos x cosine function
tan x tangent function
cot x cotangent function
arctan x inverse of the tangent function
|x| modulus, or absolute value of, x
|x − y| distance between numbers x and y
√x square root of x
ⁿ√x nth root of x
Pr rth power mean
P1 arithmetic mean
P0 geometric mean
P−1 harmonic mean
P2 root mean square
P∞ max{x1 , . . . , xn }
P−∞ min{x1 , . . . , xn }
x̃ weighted average
lim_{x→a} f(x) limit of function f(x) as x goes to a
lim_{n→∞} xn limit of sequence xn
f′(x), f″(x) first and second derivatives of f(x)
∫ f(x) dx integral (or antiderivative) of f(x)

Abbreviations

AA Angle-Angle Criterion for similarity of triangles
ASA Angle-Side-Angle Criterion for congruence of triangles
AHSME American High School Mathematics Examination
AIM American Institute of Mathematics
AIME American Invitational Mathematics Examination
AM Arithmetic Mean
AMC American Mathematics Competition
AMS American Mathematical Society
ARML American Regional Mathematics League
AWM Association for Women in Mathematics
BAMM Bay Area Mathematics Meet
BAMO Bay Area Mathematical Olympiad
BMC Berkeley Math Circle
CM Continuity and Midpoint Criterion
Cor Corollary
CTY Center for Talented Youth at Johns Hopkins University
Def Definition
gcd Greatest Common Divisor
GM Geometric Mean
GPHP Generalized Pigeonhole Principle
HM Harmonic Mean
HMC Hometown Math Circles
HL Hypotenuse-Leg Criterion for congruence of right triangles
H/L Hypotenuse-Leg Criterion for similarity of right triangles
HLP Hardy-Littlewood-Pólya Inequality
iff If and only if
IH, IHs Inductive Hypothesis, Strong Inductive Hypothesis
IMO International Mathematical Olympiad
JI Jensen’s Inequality
l’H l’Hôpital’s Rule
LA Los Angeles
lcm Least Common Multiple
Lem Lemma
LHS Left-Hand Side
MAA Mathematical Association of America
MASS Mathematics Advanced Study Semesters
MC Monthly Contest at the Berkeley Math Circle
MI, MIs Mathematical Induction, Strong Form of Mathematical Induction
MIT Massachusetts Institute of Technology
MOSP Mathematical Olympiad Summer Program
MSRI Mathematical Sciences Research Institute
Mult Multiplicative
NSF National Science Foundation
NYCML New York City Mathematical League
OEIS On-Line Encyclopedia of Integer Sequences
PM Power Mean
PHP Pigeonhole Principle
Prop Proposition
PST Problem Solving Technique
RA Ratio-Angle Criterion for similarity of triangles
R’A Ratio-Opposite-Angle Criterion for similarity of triangles
REU Research Experience for Undergraduates
RHS Right-Hand Side
RI Rearrangement Inequality
RR Ratio-Ratio Criterion for similarity of triangles
SAS Side-Angle-Side Criterion for congruence of triangles
SF San Francisco
SJMC San Jose Math Circle
SJSU San Jose State University
SOS Sum of Squares
SsA Side-Side-Angle Criterion for congruence of triangles
SSS Side-Side-Side Criterion for congruence of triangles
Thm Theorem
TLC Tangent Line-Chord
USAMO USA Mathematical Olympiad
USAJMO USA Junior Mathematical Olympiad
USSR Union of Soviet Socialist Republics
VIGRE Vertical Integration of Research and Education
WLOG Without Loss of Generality
wrt With Respect To

Biographical Data

Bjorn Poonen is the Claude Shannon Professor of Mathematics at MIT.
He received AB and PhD degrees from Harvard and Berkeley, respectively,
and held positions at MSRI, Princeton, and Berkeley before moving to MIT
in 2008. He was involved with the Berkeley Math Circle from its creation in
1998 until 2008; he first led a session on inequalities there in 2001.
Poonen’s research focuses mainly on number theory and algebraic ge-
ometry; in particular, he is interested in the rational number solutions to
equations. Poonen is the founding managing editor of Algebra & Number
Theory. He is a fellow of the American Academy of Arts and Sciences and
of the American Mathematical Society. He has received the Guggenheim,
Packard, Rosenbaum, and Sloan fellowships, as well as a Miller Professor-
ship, and the Chauvenet Prize (in 2011). Earlier, he was a four-time Putnam
Competition winner, an International Mathematical Olympiad silver medal-
ist, and the unique perfect scorer out of 385,000 participants in the 1985
American High School Mathematics Exam. Thirteen mathematicians have
completed a PhD thesis under his guidance.

Gabriel Carroll was a student at Oakland Technical High School when he
attended the Berkeley Math Circle for three years. He won three consecutive
BAMO grand prizes and three ARML top individual prizes, received two gold
medals and one silver IMO medal (including a perfect score at IMO ’01 in
Washington, D.C.), and was among the top five-ranked Putnam scorers from
2000–2003, becoming one of only seven four-time Putnam Fellows. Gabriel
co-coordinated the BMC Monthly Contest for two years. While still a circler,
he presented a number of topics to the more advanced BMC students; one
of these sessions became the basis for Monovariants in this book series.
After a stint teaching English in Hunan province in China, Gabriel has
since proceeded to put his mathematics background to use studying theoret-
ical economics. He completed his PhD at MIT and a post-doc at Microsoft
Research, and is now an assistant professor in the economics department
at Stanford. He continues to write problems for contests such as BAMO,
Gabriel’s recent activities outside academia include poking at piano key-
boards, making ceramics, playing Go, and eating unexpected vegetables.

Maia Averett is on the faculty at Mills College. She completed her PhD
in 2008 at UC San Diego where she was a UC Regents Dissertation Year
Fellow. Her area of mathematical specialty is the wobbly world of topology.
She started off as a homotopy theorist, but lately her research has been in
the fascinating new area of topological data analysis, a field that applies the
abstract machinery of algebraic topology to point cloud data to gain insight
about topics ranging from breast cancer to basketball.
Since finding mathematics as her passion came relatively late in college,
she has made mathematical outreach to young people a central objective
in her career. She has created and conducted math circle sessions since 2008
for both the Berkeley and the Marin Math Circles. Maia also has a special
interest in fostering women in mathematics. She has been engaged in the Ex-
panding Your Horizons program at UC San Diego and at Mills. She founded
student chapters of the Association for Women in Mathematics (AWM) at
UC San Diego while in graduate school and later at Mills, where the chapter
goes by the name of The Möbius Band in honor of her love of topology.
Events organized by The Möbius Band regularly attract upwards of 30 peo-
ple – quite a feat at a school like Mills, which has only 950 undergraduates.
Maia has also taken an active role in the AWM on a national level, serving on
and chairing the student chapters committee and creating chapter meet-ups
at national math meetings.
When she’s not teaching, researching, programming, outreaching, or
otherwise mathematically engaged, Maia enjoys cooking Thai food, circuit-
bending children’s toys, and hiking in the Oakland hills with her dog.

Tom Davis competed on the Caltech Putnam team as an undergraduate
and earned his PhD in probability and partial differential equations at Stan-
ford. He was a founder of Silicon Graphics and a Principal Scientist there.
For fifteen years he has been a freelance mathematician, pursuing his pas-
sion: to work on challenging problems with talented students in the setting
of math circles. Tom has been involved in the Berkeley Math Circle from
its inception. He co-founded and co-directs the San Jose Math Circle [71]
and regularly leads sessions at all SF Bay Area math circles. He has also
co-organized Teachers’ Math Circles at AIM and MSRI.
According to Tom, calculus is not enough to do computer graphics: “Peo-
ple who are interested in making computer-generated dinosaurs for ‘Jurassic
Park’ or a liquid metal man in ‘Terminator II’ or who want to have Forrest
Gump shake hands with Richard Nixon had better have a solid grounding in
advanced calculus and in differential and projective geometry.” Tom’s web
site [20] contains free dynamic geometry software, the Rubik’s Cube software
discussed in his articles, and an extensive collection of math circle talks.
Tom is also an avid fan of endurance athletics. He has completed three
ironman-distance triathlons and one ultramarathon, but hopes that he finally
has the good sense not to do another.

Tom Rike graduated from San Francisco State College in 1968. After
spending the next two years taking graduate courses and getting a teaching
credential, he taught six years at Westlake Junior High School. In 1974, he
went back to school in the evenings and received his M.S. in mathematics
from Holy Names College in 1976. Moving to Oakland High School, he
taught until he retired in 2003 and now volunteers there three days a week.
Tom served as the high school liaison for BMC and BAMO from the
beginning until 2009. He has been fascinated by giants of mathematical
thought such as Archimedes, Euler, and Gauss and has shared their works
at BMC sessions on a number of occasions: using the arbelos from the Book
of Lemmas by Archimedes; showing Euler’s solution to the Basel Problem;
and demonstrating Gauss’s proof from Disquisitiones Arithmeticae that a
17-gon is constructible. It is Archimedes’ lever, on which he “balanced” his
Mass Point article in Volume I. For a number of years, Tom ran a math
circle at Oakland High.
Tom has been deeply involved as a coach and is now director of the East
Bay Mathletes, a monthly competition among local high schools, which began
in 1978. In 1983, he helped found in Oakland a middle school mathematics
competition, All-Star Mathletes, which takes place four times a year. Among
his interests outside mathematics are the San Jose Sharks, the SF Giants,
and opera. He has been playing Go and studying the Japanese language
with devotion for over 45 years.

Tatiana Shubin went as a high school student to the Special Mathematics
and Physics Boarding School of the Academy of Sciences in Novosibirsk. She
did her undergraduate work in the USSR at the Kazakh and Moscow State
Universities. In 1983 she received her PhD in Mathematics at UC Santa
Barbara, and after a couple of years at UC Davis she joined the Mathematics
Department of San Jose State University in 1985.
At the outset, Tatiana was a strong proponent of math circles. She has
often said that she owes her life to math circles and is now paying back
her debt of gratitude by making math circles available to others. In 1998,
Tatiana co-founded the San Jose Math Circle [71] and the highly successful
Bay Area Math Adventures (BAMA) talks. The latter have been preserved
in two volumes published by MAA: Mathematical Adventures for Students
and Amateurs, and Expeditions in Mathematics, co-edited by Tatiana, David
Hayes, and Gerald Alexanderson. [39, 75]
Tatiana has also contributed sessions to BMC since 2001, with emphasis
on group theory and a variety of hybrid geometry topics. She has been a co-founder
and a leadership team member of the Math Teachers’ Circle Network since
2006, and a co-founder and a member of the Executive Committee of a
Special Interest Group of MAA on Math Circles for Students and Teachers
since 2009. She translated and edited several books published by the AMS
in the MSRI Mathematical Circles Library. She is also the founder and a
co-director of the Navajo Nation Math Circles project, aimed at launching
and supporting mathematically rich experiences such as math circles and
math summer camps for children and teachers in the Navajo Nation.
Tatiana’s outstanding teaching was recognized in 2006 via MAA’s Distin-
guished College or University Teaching of Mathematics Award, of the North-
ern California, Nevada, and Hawaii section.
With all this, Tatiana still finds opportunities for her favorite pastime,
rock hounding, resulting in an expansive rock collection at her home.

Zvezdelina Stankova was drawn into the world of mathematics when,
as a 5th grader, she joined the math circle at her school in Bulgaria and
three months later won the Regional Math Olympiad. She represented her
home country at two IMOs, earning silver medals. Some of her articles in
this book series are inspired by the lectures she heard during the training of
the Bulgarian IMO team.
As a freshwoman at Sofia University, Zvezda won a competition to study
in the U.S. and completed her undergraduate degree at Bryn Mawr College
in 1993. She did her first math research in enumerative combinatorics at
two summer REU’s in Duluth, Minnesota. The resulting papers contributed
to her Alice T. Schafer Prize for Excellence in Mathematics by an Under-
graduate Woman, awarded by the Association for Women in Mathematics.
In 1997, Zvezda received a PhD from Harvard University, with a thesis on
moduli spaces of curves, in the field of algebraic geometry. Meanwhile, she
earned a high school teaching certificate in the state of Massachusetts and
later in California.
As a postdoctoral fellow at MSRI and UC Berkeley in 1997–1999, Zvezda
co-founded BAMO [8] and started BMC [11]. She trained the USA national
team for the IMOs for six years, including the memorable year 2001 when
three of the six team members were BMCers, and USA tied with Russia for a
second overall place in the world. Since 1999, she has been at Mills College.
Her current research interests include classification of restricted patterns in
the area of enumerative and algebraic combinatorics.
Zvezda’s inspiring style and passion to teach have been recognized by
the MAA: in 2004 she was selected as a recipient of the first Henry L.
Alder Award for Distinguished Teaching by a Beginning College or Univer-
sity Mathematics Faculty Member. In 2011 MAA awarded her the highest
math teaching award in the United States, the Deborah and Franklin Tepper
Haimo Award for Distinguished College or University Teaching of Mathe-
matics. Zvezda was featured in the Salutes Program of the ABC 7 News
in spring 2011. In 2012, she was listed in Princeton’s Review “300 Best
Zvezda’s most enduring passion remains working at BMC with young
students motivated to discover new mathematical wonders. She spends a
lot of time with her girl and boy, studying foreign languages with them and
playing the piano, and teaching them mathematics the “Bulgarian” way.

Bibliography

1. C. Adams, The Knot Book: An Elementary Introduction to the Mathematical Theory
of Knots, Amer. Math. Soc., 2004.
2. American Mathematics Competitions,
3. T. Andreescu and Z. Feng, Mathematical Olympiads 1998-1999, Math. Assoc. of Amer-
ica, 2000.
4. T. Andreescu and Z. Feng, Mathematical Olympiads 1999-2000, Math. Assoc. of America, 2002.
5. T. Andreescu and Z. Feng, 103 Trigonometry Problems from the Training of the USA IMO Team,
Birkhäuser, 2005.
6. M. Armstrong, Groups and Symmetry, Springer, 1987.
7. V. Arnol’d, Trivium Mathematique, translated by C.J. Shaddock,
hans.math.upenn.edu/Arnold/Arnold-Trivium-1991.pdf, 1991.
8. Bay Area Mathematical Olympiad,
9. E. Berlekamp, J.H. Conway, and R. Guy, Winning Ways for Your Mathematical Plays,
Vol. 4: Solitaire Army, A K Peters/CRC Press; 2nd edition, 2004.
10. E. Birrell, The Knot Quandle,, Fall
11. Berkeley Math Circle (BMC),
12. S. Budurov and D. Serafimov, Mathematical Olympiads, part II, State Publishing
Company “Narodna Prosveta”, 1985.
13. Business-Higher Education Forum, A Commitment to America’s Future,
crisis-mathematics-and-science-education, 2005.
14. F. Chung and R. Graham, A Tour of Archimedes’ Stomachion,
∼fan/stomach/tour/stomach.html, 1993.
15. Clay Olympiad Scholar Award and USAMO winners,
16. J. Cofman, What to Solve? Problems and Suggestions for Young Mathematicians,
Oxford University Press, 1990.
17. L. Cohen and G. Ehrlich, The Structure of the Real Number System, D. Van Nostrand,
18. B. Cutler, Stomachion,, Nov. 2003.
19. Davis Math Circle,
20. T. Davis,
21. D. Djukić, V. Janković, I. Matić, and N. Petrović, The IMO Compendium:1959-2004,
Problem Books in Mathematics, Springer, 2006.
22. Roy Dubish, Groups (Topics for Mathematics Clubs), National Council of Teachers of
Mathematics, 1973.
23. S. Eliahou, L. Kauffman, and M. Thistlethwaite, Infinite families of links with trivial
Jones polynomial, Topology 42 (2003), no. 1, 155–69.


24. Euclid, Euclid’s Elements, Green Lion Press, 2003.

25. Zuming Feng and Yi Sun, USA and International Mathematical Olympiads 2007-2008,
Math. Assoc. of America, 2008.
26. J. Gallian, Contemporary Abstract Algebra, 7th ed., Brooks Cole, 2009.
27. I. Gelfand, Functions and Graphs, Dover, 2002.
28. I. Gelfand, E. Glagoleva, and A. Kirillov, The Method of Coordinates, Dover, 2011.
29. I. Gelfand and M. Saul, Trigonometry, Birkhäuser, 2013.
30. I. Gelfand and A. Shen, Algebra, Birkhäuser, 2013.
31. S. Gelfand, M. Gerver, A. Kirillov, and N. Konstantinov, Sequences, Combinations,
Limits, Dover, 2002.
32. A. Givental, Kiselev’s Geometry: Book I. Planimetry, Sumizdat, 2006.
33. M. Greenberg, Euclidean and Non-Euclidean Geometry, W. H. Freeman & Company,
34. H. Guerber, The Story of the Greeks,
guerber&book=greeks&story=knot, 1923.
35. L. Hahn, Complex Numbers & Geometry, Math. Assoc. of America, 1994.
36. G. Hardy, J. Littlewood, and G. Pólya, Inequalities, Cambridge Univ. Press, 1988.
37. Robin Hartshorne, Geometry: Euclid and Beyond, Springer, 2000.
38. J. Hass and J. Lagarias, The Number of Reidemeister Moves Needed for Unknotting,
Journal of Amer. Math. Soc. 14 (2001), 399–428.
39. D. Hayes and T. Shubin (eds.), Mathematical Adventures for Students and Amateurs,
Spectrum Series, Math. Assoc. of America, 2004.
40. D. Hilbert, Foundations of Geometry, Open Court, 1990.
41. H. Hoà, United States of America Mathematical Olympiad (USAMO), www., 2007.
42. J. Hoste, M. Thistlethwaite, and J. Weeks, The First 1,701,936 Knots, Math. Intell.
20 (1998), 33–48.
43. H. Jacobs, Geometry, third ed., W. H. Freeman and Company, 2003.
44. V. Jones, The Jones Polynomial,∼ vfr/jones.pdf, Au-
gust 2005.
45. W. Kahan, Is There a Small Skew Cayley Transform with Zero Diagonal?, Linear
Algebra and Its Applications (2006), 335–341.
46. J. Kantor and M. Maydanskiy, Triangles Gone Wild, MASS selecta (2003), 277–288.
47. Los Angeles Math Circle,∼ radko/circles/.
48. D. Leites, 60-odd YEARS of Moscow Mathematical Olympiads,
ostalo/gimnazija/math/ruske_olimpijade/11a-olym-1.pdf, 1997.
49. C. Livingston, Knot Theory, Carus Monograph, vol. 24, Mathematical Association of
America, 1993.
50. MAA Online, Evan O’Dorney: Spelling Champ and Math Whiz,
51. M. Maydanskiy, The Incidence Coloring Conjecture for Graphs of Maximum Degree
3, Discrete Mathematics 292 (2005), 131–141.
52. Marin Math Circle,
53. N. McCoy, Introduction to Modern Algebra, 4th ed., Allyn and Bacon, Inc., 1987.
54. W. Menasco and M. Thistlethwaite, The Tait Flyping Conjecture, Bull. Amer. Math.
Soc. 25 (1991), 403–12.
55. W. Menasco and M. Thistlethwaite, The Classification of Alternating Links, Ann. Math 138 (1993), 113–73.
56. J. Milnor, Link Groups, Ann. Math 59 (1954), no. 2, 177–195.
57. S. Morrison and D. Bar-Natan, “Rubberband” Brunnian Links, http://katlas.math., May 2009.
58. National Academy of Sciences et al., Rising Above the Gathering Storm: Energizing
and Employing America for a Brighter Economic Future,
php?record_id=11463, 2007.

59. Nauka, Kvant,, 1994.

60. T. Needham, Visual Complex Analysis, Clarendon Press, 1997.
61. Northern Illinois University, Illinois Status Report on Science, Technology, Engineering,
and Mathematics Education,
pdfs/STEM_ed_report.pdf, 2006.
62. Oakland/East Bay Math Circle,
63. I. Peterson, Prized Geometric Logic,, 2001.
64. William Lowell Putnam Mathematical Competition,
65. K. Reidemeister, Elementare Begründung der Knotentheorie, Abh. Math. Sem. Univ.
Hamburg 5 (1926), 24–32.
66. K. Reidemeister, Knoten und Gruppen, Abh. Math. Sem. Univ. Hamburg 5 (1926), 7–23.
67. K. Reidemeister, Knotentheorie, Chelsea, 1948.
68. J. Roberts, Knot Knotes,∼ justin/Papers/knotes.pdf, 1999.
69. W. Rudin, Principles of Mathematical Analysis, 3rd edition, McGraw-Hill, 1976.
70. San Francisco Math Circle,
71. San Jose Math Circle,
72. H. Schwerdtfeger, Geometry of Complex Numbers, Dover, 1979.
73. D. Shklarsky, N. Chentzov, and I. Yaglom, The USSR Olympiad Problem Book, Dover,
74. D. Shklarsky, N. Chentzov, and I. Yaglom, The U.S.S.R. Olympiad Problem Book, Dover, 2013.
75. T. Shubin, D. Hayes, and G. Alexanderson (eds.), Expeditions in Mathematics,
Spectrum Series, Math. Assoc. of America, 2011.
76. A. Sosinski, Marching Orders, Quantum 2 (1991), no. 2, 8–11.
77. A. Sosinski, Finite Groups (in Russian), Kvant (1996), no. 6.
78. Stanford Math Circle,
79. Z. Stankova, The High School Olympiads: Excitement, Talent, and Determination,, 1997.
80. I. Stewart, Galois Theory, 3rd ed., Chapman & Hall/CRC Mathematics,
81. M. Thistlethwaite, Links with trivial Jones polynomial, Journal of Knot Theory Ram-
ifications 10 (2001), no. 4, 641–3.
82. C. Trigg, A Three-Square Geometry Problem, Journal of Recreational Mathematics
4 (1971), 90–99.
83. Utah Math Circle,
84. S. Vandervelde, Circle in a Box, MSRI Math Circle Library, Vol. 2, AMS and MSRI,
85. M. Whitlow, M. Breen, Z. Stankova, and T. Shubin, Sustainable Funding of Top Tier
Math Circles, Proposal, 2007.
86. I. Yaglom, Complex Numbers in Geometry, Academic Press, 1968.
87. B. Youse, The Number System, Dickenson, 1965.

The American Mathematical Society gratefully acknowledges these institu-
tions and individuals for granting the following permissions:
Business-Higher Education Forum
The quotation “A Commitment to America’s Future: Responding to the
Crisis in Mathematics and Science Education” in the Epilogue,
to_americas_future_0.pdf, Business-Higher Education Forum, 2005.
The Mathematical Association of America, American Mathematics Competitions
Problems from USAMO ’80, USAMO ’93, USAMO ’97, USAMO ’99,
USAMO ’07 used with permission.
Alexander the Great, The Story of the Greeks, by Helene A. Guerber [34]
on page 51.
The International Mathematical Olympiad (IMO) logo
Robert Scharein’s KnotPlot software at
A few references and the public domain portrait of Galois have been taken
from Wikipedia at


Index

n-factorial, 247
AMS-inclusion, 94
Rubik program, 33
Macro gizmo, 33
“convex hall” of fame, 219
15-puzzle, 103, 139

Abel, Niels Henrik, 105
abstract algebra, xvi, 31, 48, 81, 105, 114, 234, 310
AIM, 312, 328
Alexander the Great, 49
Alexanderson, Gerald, 329
algebra, xx, 104, 212, 306, 330
algebraic topology, 315
algebraic geometry, xiv, 310, 327, 330
algebraic number theory, 53
algebraic structure, 110
algorithm, 11, 63, 125, 140, 271, 294
  GPS, 278, 285
  Rubik’s Cube, 45, 48
  smoothing, 264
  unsmoothing, 265
Alper, Ted, xxiv
altitude, 126, 175, 185
  foot of, 175
AMC, xxiv, 317
American Academy of Arts and Sciences, 327
AMS, 327
angles, 2, 172
  acute, 178, 182, 184
  alternate interior, xvii, 8–10, 16, 18, 21, 173
  central, 198
  congruent, 17
  equal, 176, 177, 304
  exterior, 184
  inscribed, 171, 178, 224
  obtuse, 19, 178
  remote interior, 184
  right, 14, 20, 178, 296
  straight, 22
  supplementary, 186
  vertical, 9, 16, 21
arc, 181, 183
Archer Design Inc., xxiv
Archimedes, 2, 18, 301, 329
  Archimedes’ Axiom, 18
  Stomachion, 301
Ardila, Federico, 313
argument, 190, 194
ARML, 327
array, 142
Ars Magna, 104
art, xxii, 51, 68
asymptote, 181, 292
Auckly, David, xxiii
automorphism, 192
average, 167, 211
  ordinary, 280
  weighted, 271, 280, 282
Averett, Maia, 328
AWM, 328
axes in the C-plane, 205
axiom, 16, 23, 177

baby AM-GM, 225
Balkan Math Olympiad, 263, 275
BAMO, 311–313, 315–317, 329
Barcelo, Hélène, xxiii
Bay Area Math Adventures, 329
Bay Area math circles, xix, 309, 312, 313, 328
Beatles, 307
Beltrami, Eugenio
  Beltrami-Klein model, 18
Beltrami, Eugenio, 18
Berkeley Math Circle, xiv
  BMC-Elementary, xiv
  BMC-Upper, xiv
Berkeley Math Circle (BMC), 307, 310, 312, 313, 330
Berkeley Math Circle (BMC), 305
Berkeley Mini-Math Tournament, 317
Berlekamp, Elwyn, 313

bijection, 241 complement, 245

binomial coefficients, 98, 165, 238 computer graphics, 328
Binomial Theorem, 193, 228, 238 computer science, xiii, xiv, 319
biology, 306 congruences modulo n, 132
BMC, xxiii conjecture, xxi, 2, 4, 14, 59, 68, 289, 296
Bolyai, János, 17 conjugate, 133, 191, 192
Boston, 311 conjugation, 48, 190
bound, 156, 248, 297 Conrey, Brian, 312
bounded, 214 constraints, 213
Breen, Mike, xiiin, xxiii convexity, 218
Brown, Ian, xxiii Conway’s checkers, 158
Brown, Nico, 317 Conway, John Horton, 158
Brown, Tom, xxiv corollary, xxi
Bryant, Robert, xxiii counterexample, 19, 62
Bryn Mawr College, 305, 307, 330 criteria
Buhler, Joe, xxiii, 313 AA, 10, 13, 173, 179, 181, 183, 295
Bulgaria, 1, 305, 310, 330 ASA, 7, 8, 21
Bush, George W., 314 congruence, 7, 16, 19, 20, 173
H/L, 21
calculus, xvi, xvii, 5, 213, 290, 296, 316 HL, 19, 22
advanced, 217, 310, 328 R’A, 7
Caltech, 328 RA, 9
Cardano, Gerolamo, 104 RR, 7
Carroll, Gabriel, xiii, xxiv, 313, 315, 327 SAS, 13, 17, 21
Cartesian form, 190 similarity, 7, 171, 173, 178
Cayley transform, 314 SsA, 19, 21, 22
central object, 161 SSS, 21
central symmetry, 139 critical point, 290
centroid, 9, 10, 16, 204 Crossbar Theorem, 19
Chain Rule, 290 crystallography, 105
chemistry, 306 CTY, 318
Chen, Evan, xxiv, 316 cube, 138, 139
Chen, William, xxiv Cutler, William, 301
China, 319
China Girls Math Olympiad, 316 Davidson Institute, 318
chord, 18 Davis Math Circle, 310
Chorin, Alexandre, 313 Davis, Tom, xxiv, 312, 313, 328
Chu, Timothy, 102 de Moivre’s formula, 113, 193
Chung, Fan, 301 de Souza, Paulo, xxiv
circle, xvii, xx, 27 de Vera, Wycee, xxiv
diameter of, 188, 224 deductive reasoning, xxii
symmetries of, 27 definition, xxi, 16, 23
unit, 113, 133, 136, 180, 198, 202 del Ferro, Scipione, 104
circumcenter, 9 denominator, 148, 167, 191, 226, 246
circumcircle, 202, 204 derivative, 5, 68, 287, 297
Ciubotaru, Dan, 310 DeRose, Tony, xxiii
Clay Mathematics Institute, xxiii differential geometry, 53, 328
Clay Olympiad Scholar Award, 313 dimensions, 290
coding theory, 105 dinosaur, 328
collinear, 149 Dirac’s Theorem, 152
combinatorial gene-counting, 306 Dirichlet
combinatorics, xiv, 118, 150, 238, 302, inverse, 236
317, 330 product, 81, 233, 234, 247

series, 82 Extreme Principle, 151

distance, 200 extreme value, 213
divisible, xvii
division in C, 191 family of similar triangles, 174
divisors, 32, 79, 80, 86, 130, 147, 243 Fermat prime, 250, 251, 261
odd, 250 Ferrari, Lodovico, 104
of zero, 25 field, 61
prime, 90, 97, 239 Fields Medal, 64
domain, 82, 288 figure eight, 52
Dubish, Roy, 117 First Derivative Test, 220, 230, 231, 283
Dudzik, Andrew, xxiv, 313 football, 309
Duluth REU, 315, 330 Foundation
Dunne, Edward, xxiv Merriam-Webster, xxiii
Mosse, xxiii
Eastern Europe, xiv, 306, 307, 311, 319 National Science, xxiii
economics, xiii, xiv, 327 Packard, xxiii
Eisenbud, David, xxiii Toyota, xxiii
ellipse, 26, 31, 47, 215 foundation, xix
ellipsoid, 215 of geometry, 1, 5, 16, 19
elliptic curves, xiv fractional part, 148
embed, 157 fractions, xvii, 5, 46, 188, 191, 299, 302
endurance athletics, 328 Fuchs, Dmitry, xxiv, 312, 313
engineering, 318 function, xvii, 80, 141, 288, 289
English, 306, 307 arctan x, 297, 299, 300, 303
Epsilon Camp, 318 arithmetic, 82, 92
equality, 99, 211, 280, 290 concave, 270
equation, 97 constant, 83, 229
cubic, 104 continuous, 217, 219, 268, 278
linear, 5, 104, 305 convex, 218, 268, 283
of a plane, 228 differentiable, 220
polynomial, 104, 114 exponential, xvii, 278
quadratic, 104, 160, 306 increasing, 303
quartic, 104 linear, xvii, 154, 229
quintic, 104 logarithmic, 270, 278
system, 74, 306 max, 147
ergodic theory, 64 multiplicative, 79, 81, 82, 93, 233
Escape of the Clones, 158 odd, 303
Euclid, 16, 17 power, 83
Fifth Postulate, 17 quadratic, xvii
Euclidean motions, 107, 127 Riemann zeta, 81, 82, 299
isometries, 127 square root, 303
rigid motions, 127 strictly convex, 269, 277
rigid symmetries, 133 strictly increasing, 95, 102
Euler, 251, 329 strongly multiplicative, 83, 87, 255
function, 81, 243, 245, 247, 250 sum-function, 92–95, 233, 239, 245
Theorem, 243 symmetric, 228
Europe, 319 tangent, 296
European Girls Math Olympiad, 316 trigonometric, xvii, 278
example, xx, 34 Fundamental Theorem of Algebra, 208
existence, 149
Expanding Your Horizons, 328 Galileo, xx
experiment, 2, 4, 9, 12, 14, 34 Gallian, Joe, 31
extra construction, 3, 5, 173, 178 Galois theory, 114

Galois, Évariste, 103, 114 existence of inverses, 46, 47

game theory, xvi generator, 25, 26, 46, 47, 114, 134
Gates, Bill, 314 identity, 24, 110, 117, 124, 130–132
Gauss, Carl Friedrich, 17, 89, 329 intersection of subgroups, 32
gcd, 91, 135, 143, 166, 244 inverse, 24, 110, 117
geometry, 26, 306, 330 isomorphic, 30, 130
analytic, 223 isomorphism, 27, 134
basic tools, 5 multiplication table, 29
circle, 1, 171 of permutations, 27
classical, 1 of Rubik’s Cube, 27
elliptic, 16 of symmetries, 26, 107, 117, 127, 132
Euclidean, 1, 16, 20, 317 order of, 31, 32, 108
hyperbolic, 1, 16, 17, 20 order of an element, 31, 32, 115
inversion in the plane, 1, 18, 178 order of Rubik’s Cube, 39
mass point, 1 properties, 31
non-Euclidean, 17 representations, 64
plane, xvi, 1, 5, 8, 16, 171, 212, 218, semidirect product, 113, 133
223, 290 simple, 38
projective, 328 single face subgroup, 34
space, 290 slice moves, 32
synthetic solution, 5, 171, 287, 302 slice subgroup, 34
Givental, Alexander, xxiii, xxiv, 302, structure, 81, 82
312, 313 subgroup, 31, 34, 47, 48, 104, 109, 123
glide reflection, 127 subgroups of Rubik’s Cube, 32
Graham, Ron, 301 symmetric, on n objects, 29, 30
graph theory, xiv, 152 symmetries in space, 108
graphs, 315 theory, xiv, 23, 31, 38, 103, 330
cubic, 315 theory, combinatorial, 53
edges of, 152 transposition, 119, 137
Hamiltonian, 152, 315 trivial, 25, 129
of convex functions, 229 Gump, Forrest, 328
of functions, xvii, 181, 292
of trigonometric functions, 188 Harris, Joe, xxiii
vertices of, 152 Hartshorne, Robin, 313
grid, 14, 302 Harvard, 305, 310, 314, 327, 330
group, 23, 103 Hayes, David, 329
n-cycle, 35 Herriot, Neil, xxiv, 313
r-cycle, 119, 121, 137 hexagonal plate, 108
2-cycle, 35, 119 High School
3-cycle, 139 Oakland Technical, 327
abelian, 25, 111, 131, 135 High School
action, 105 Chelmsford, 311
alternating, on n objects, 38, 122 College Preparatory, 316
commutative, 25 Henry Gunn, 312, 313
cube symmetries, 122 Newton North, 311
cycle, 29 Oakland, xix, 312, 313, 329
cyclic, 25, 47, 48, 114, 115, 133–135 Presentation, 313
cyclic subgroup, 136 Westlake Junior, 329
definition, 23, 105, 110 high-voltage symbol, xxi
dihedral, 27, 107, 113 Hilbert, David, 16, 17
direct product, 34, 48 Congruence Axiom, 17
disjoint cycles, 32, 36, 119, 137 Parallel Axiom, 18
examples, 24 Holtz, Olga, 313

Holy Names College, 329 Karighattam, Arav, 317, 318

homotopy, 328 Kazakh State University, 329
Howe, John, 313 Kazakhstan, 329
hypotenuse, 175, 185, 288 Kedlaya, Kiran, 313
King Arthur, 152
IMO, xiii, xvi, 101, 225, 305, 310, King Solomon, 68
313–316, 327, 330 Kiselev, A. P., 20
logo, 68 knot, 49
incenter, 9 7_4 knot, 57
incidence coloring, 315 n_1, 59
Inclusion-Exclusion Principle, 250 amphichiral, 67, 68, 77
India, 319 Celtic, 68, 78
induction, xx, 67, 99, 144, 266, 282, 300 chiral, 67, 68
strong, 94, 101, 137 connected sum, 60
inequalities, xvi, 81, 100, 147, 186, 211, crossing, 51
263, 288, 289, 302, 327 crossing number, 55, 58
AM-GM, 215, 226, 227, 232, 264, 266, diagram, 51, 54
269, 270, 285 equivalent, 50, 53
AM-GM-HM, 217 figure eight, 57, 58, 60
AM-HM, 228, 265, 269, 281 flype transformation, 70
baby AM-GM, 212 fundamental group, 53
chain of, 214 Gordian, 49
diagram of, 225 invariant, 50, 54, 58
HLP, 222, 231, 271 KnotPlot, 70
Jensen’s, 230–232, 270, 278, 283 mirror image, 67, 68, 77
Jensen’s, JI, 221 not equivalent, 55
Karamata’s, 271 polynomial invariant, 70
Minkowski’s, 287, 289, 292 quandle invariant, 70
PM, 216, 229, 232 square, 60
Rearrangement, 266, 282 strand, 53, 56
weighted, 231 string, 51
weighted AM-GM, 222, 232 surgery, 72
weighted Jensen’s, 223, 232, 270 theory, xvi, 50, 53
weighted PM, 223, 232 two-colorability, 58
infinite raffle, 79 unknotting number, 56, 58, 71
input, 82, 83, 297 Knuth, Donald, 312
integer powers in C, 195
integral, 299, 304 l’Hôpital’s Rule, 217, 229
antiderivative, 304 Lam, Quan, xxiv, 312
Intel Science Talent Search, 315 Lang, Serge, 313
interval, xvii, 219 law, xiii
invariant, 145, 148, 164, 203, 271 associative, 61, 234
under multiplication, 205 commutative, 234
Ishikawa, Yuki, xxiii distributive, 61, 86
Japanese, 329 lcm, 32, 119, 137, 143, 164, 249
Jobs, Steve, 314 Lee, Hojae, xxiv
Johns Hopkins, 317, 318 lemma, xxi
Jones, Vaughan, 64 Lie algebras, 310
jumping fleas, 153 limit, 217, 266, 278, 304
Jurassic Park, 328 linear algebra, xiv, 61, 63, 128, 310,
Jussieu, Paris, 315 314, 316
real orthogonal matrix, 314
Kahan, William, 314 lines, xvii, 205

concurrent, 10 Maydanskiy, Maksim, xxiv, 313, 315

parallel, 8, 16, 17, 173, 207, 295, 303 McCuan, John, 312
perpendicular, 176 mean
link, 51 arithmetic, 212, 215, 224, 276, 278
4-crossing, 68 geometric, 212, 216, 224, 281
Borromean rings, 52, 54, 58 harmonic, 216
Brunnian, 52, 60, 71 power, 216, 278
component, 51 root mean square, 216, 276
Hopf, 52, 54, 58, 62, 64, 66, 67 median, 9
invariant, 58, 59, 64 Megginson, Bob, xxiii
linear chain of rings, 60 Microsoft Research, 327
local coloring, 62 midpoint, 220
necklace of rings, 60 Midpoint Rule, 220, 229, 268, 280, 283
number of components, 55 midsegment, 9, 16, 130
orientation, 64, 67 Mills College, 139, 328, 330
wedding/trinity ring, 68, 78 Milnor invariant, 71
Whitehead, 52, 54, 58, 68, 71 Minimality Principle, 121
literature, 306 minimize, 204, 267
Liu, Tiankai, 314, 316 Mirin, Alison, 139
Lobachevsky, Nikolai, 17 Mironov, Dmitri, xxiv
logic, xx, 53 MIT, 316, 327
Los Angeles Math Circle, xv, 310, 313 modular arithmetic, 25, 61, 260
moduli spaces of curves, 330
Möbius modulo n
function, 237, 239 addition, 46, 112
inverse, 239, 241 multiplication, 46, 47
inversion, 81, 82, 241, 247, 253 modulus, 136, 190, 192, 193
relation, 239 mono-coloring, 74
Möbius Band, 328 monochromatic, 63, 71, 74
MAA, 330 Monotone Bounded Theorem, 304
Madison, Sharon, 313 monovariant, 213, 266, 271, 275, 327
mansion problem, 276 concentration, 276
margin pictures, xxi continuous, 151, 277
Marin Math Circle, xv, 328 decreasing, 144
mass point, 329 discrete, 151, 277
math, 68, 176 distance, 278
math circle, xiii, 309, 319, 328 doubly-symmetric, 163
Math Kangaroo, 318 extremal, 143
Math Teachers’ Circle Network, 329 geometric, 150
mathematical research, xv, xvi, 314, numerical, 141
315, 330 operation, 151
mathematics education, 319 sum-monovariant, 142
Mathletes, 329 with sequences, 146
Matić, Ivan, xxiii, xxiv Monsky’s Theorem, 315
matrix, 61 Monthly Contest, xv, xxiv, 315, 327
augmented, 61 Moscow State University, 329
coefficient, 61, 63 MOSP, 310
echelon form, 61, 63, 74 MSRI, xxiii, 311–313, 328, 330
inverse, 61 Multi-smoothing Lemma, 274, 283
null space, 74 multiplication table, 106, 126, 129
row operations, 61 multiplicative identity, 234, 255
maximize, 213 multiplicative inverse, 236
Maximum Principle, 220, 221 music, 306, 317

mysticism, 68 diagonals of, xvii, 8, 21

National Academy of Sciences, 318 of cubie’s turn, 39
natural sciences, xiv of edges, 40
Navajo Nation Math Circles, 329 partial differential equations, 328
Ngo, Hoan, 313 Pascal’s Triangle, 164, 238
Nir, Oaz, 313 path
Nixon, Richard, 328 broken, 12, 178, 187
normal (line), 176 closed, 104, 124, 178
Novosibirsk, 329 of sunlight, 177, 187
number optimal, 10
complex, xvi, 24, 46, 82, 113, 131, 190 straight, 178
composite, 99 Paulos, John Allen, 39
integer, xvii, 24, 46, 82, 112 Peano axioms, xix
irrational, 148, 166 Peavy, Barbara, xxiv
natural, xix, 24, 79, 82, 297 Pejic, Michael, xxiv
non-negative, 212, 288 Penn State REU, 315
prime, 25 perfect square, 91
purely imaginary, 195 permutations, 23, 28, 104, 116, 137
rational, 24, 46, 61, 148, 166, 223, 285 2-cycle, 35
real, xvii, 18, 24, 46, 61, 111, 112, 195 3-cycle, 43
number theory, xiv even, 35, 120, 124, 125, 138
number theory, xvi, 25, 86, 132, 310, group, 27
317, 327, 330 identity, 37
numerator, 148, 215 multiplying, 28
O’Dorney, Evan, xxiv, 101, 254, 313 odd, 35, 120, 124, 138, 139
O’Dorney, Jennifer, xxiii perpendicular bisector, 135
Oakland/East Bay Math Circles, 313 philosophy, xiii, xx, 53
Obama, Barack, 314 PHP, xx, xxi, 31, 131, 136, 137
OEIS, 317 physics, xvi, 171, 176, 306
Olsson, Martin, xxiii law of, 5, 176
operation in a group laws of reflection, 176, 177
associative, 24, 46, 131, 132 piano, 305, 330
binary, 24, 110 Pierson, Laura, 313
closed, 24, 110 Pierson, Laura, 316
commutative, 24 Pisani, Vincent, 317
symmetry, 26 Platonic solids, 208
operator theory, 64 poetry, 306, 317
optimization, 212, 214, 287, 288, 303 polar form, 190
global behavior, 292 policy analysis, xiii
global extremum, 291 polygon, 5
local behavior, 292 convex, 222
minimal, 287, 290 regular, 189, 201, 202, 204, 210, 222
potential extremum, 291 regular n-gon, 27, 112, 135
ordered pairs, 190 regular nonagon, 200
orthocenter, 9 regular pentagon, 196, 205
orthogonal matrices, 128 polynomial, xvii, 103, 203, 299
output, 83, 297 algebra, 36
outreach activities, 310, 311 Jones, 64, 65, 68
real, 208
Palimpsest, 301 symmetric, 254
parallelogram, xvii, 8, 16, 21, 294, 303 Poonen, Bjorn, xxiii, xxiv, 312, 313, 327
center of, 8, 16 poset, 315

possible states, 142 moves, 53, 54, 58, 59, 67, 68, 71, 77
power curve, 194 Theorem, 53, 54
powers, 161 Reidemeister, Kurt, 53
pre-calculus, xvii relatively prime, 46, 83, 96, 98, 99, 134,
prime, xix, 92, 148, 256 167, 243, 244, 259
decomposition, 84, 248 remainders, xvii, 135, 259
decomposition, square-free, 252 system of, 244
power, 87, 97, 253 rescaling, 199
prime-power reduction, 87 Research Science Institute, 315, 316
primitive root of unity, 113, 198, 201 restricted patterns, 330
probability, 306, 328 reverse weights, 282
problem solving techniques, xxi revolution, 75
abstract and develop a theory, 81 Rike, Tom, xix, xxiv, 312, 313, 329
introduce stronger object, 80 roots
proof via example, 86 formula, 196
reduce to prime powers, 87 in C, 196
problem-solving techniques, xxii of unity, 198, 202, 205
programming, 317, 328 Rossi, Hugo, xxiii, 311
proof, xiv, xvi, xx, xxi, 288 rotation, 26, 30, 33, 40, 41, 46, 122, 126,
property, xxi 129, 135, 179, 199
proposition, xxi Rousse, 306
protractor, 3, 20 Rubik’s Cube, xvi, 23, 103, 116, 328
Ptolemy’s Theorem, 179, 182 Rubik’s cube, 81
Putnam, xiii, 101, 314, 316, 327, 328 Rubik’s Cube group, 27
pyramid, 108
Pythagorean Theorem, xvii, 3, 20, 22, San Francisco Math Circle, 313
174, 175, 185, 187, 202, 288, 302 San Francisco State College, 329
baby Pythagorean, 175, 180, 186 San Jose Math Circle, xv, 313, 328, 329
San Jose State University, 312, 329
quadrilateral, xvii Savine, Igor, xxiv
convex, 150 science, 51, 305, 306, 316, 318, 319
cyclic, 182, 188 Scripps Spelling Bee, xxiii, 313
diagonals of, 150, 179 Second Derivative Test, 220, 230, 284
inscribed, 179 segment, 19, 185, 192, 218, 283, 295
quantum mechanics, 64 self-correcting process, 151
quotient, 135 sequence, 144
constant, 222
radians, 181 convergent, 278, 280, 300, 304
radicals, 105 increasing, 101, 296, 299
Radko, Olga, 313 majorizes, 222, 231, 271
ratio, 4, 9, 10, 161, 174, 185, 190, 209 monotone, 304
golden, 160 of averages, 167
rationalizing, 302 of moves, 104
ray, 19, 176, 186 of transformations, 57
real analysis, xvi, 217, 304, 310, 317 recursive, xiv, 76
real number system, xix stabilizes, 167
reciprocal, 133, 236 subsequence, 299
rectangle, 21, 110, 128, 130, 215, 292 Serganova, Vera, xxiv, 313
reflection, 11, 16, 26, 46, 126, 129, 132, series, xiv, 203
133, 135, 172, 176, 186, 199, 289 arithmetic, 89
regular polyhedra, 208 geometric, 88, 161, 170, 205, 304
Reidemeister harmonic, 299
change-of-crossing move, 54 Taylor, 287, 297

set, 205 top-tier math circle, xiv, xx, 314, 320

convex, 218 topology, xvi, 310, 328
of numbers, xvii algebraic, 328
subset of moves, 33 combinatorial, 53
subsets, 227, 269 data analysis, 328
theory, xiv low-dimensional, 64
Shapiro, Austin, 313 transformation, xiv, 172, 199, 287, 292
Shubin, Tatiana, xiiin, xxiv, 313, 329 translation, 115, 132, 134, 135, 199, 294
Silicon Graphics, 328 transpositions, 267
Singer, Michael, xxiii transversal, 10, 17, 173
Sizemore, Steve, xxiv Trapa, Peter, 310
skein relation, 65, 75, 78 trapezoid, xvii, 218, 224, 269, 283
Slettnes, Espen, 318 trefoil, 52, 56–58, 60, 62
smooth power curve, 197 left-handed, 60, 71
smoothing, 225, 264 right-handed, 51, 60, 66, 71
endless, 265 via skein relation, 65
Smoothing Lemma, 269, 274, 283 Triangle Inequality, 5, 12, 16, 175, 192,
Snow, Marsha, xxiv 209, 294
soccer, 306 triangles, xvii, xx, 315
Sofia University, 330 center of, 9
South Korea, 319 congruent, 3, 6
Soviet Union, USSR, xix, 319, 329 equilateral, 26, 30, 46, 107, 128, 175,
span of vectors, 74 185
Special Interest Group of MAA on isosceles, 20, 22, 185
Math Circles, 329 obtuse, 15
square, 26, 46, 130, 315 right, 15, 20, 182, 295
Squeeze Theorem, 217 right isosceles, 15, 16
stabilize, 144, 147 similar, 3, 5, 6, 13, 224
Stanford, xiii, xxiii, 310, 314, 327, 328 symmetries of, 26
Stanford Math Circle, 313 triangulation, 315
Stankova, Zvezdelina, 330 tricoloring, 56, 62, 74
Stankova, Zvezdelina, xiiin, xxiv not tricolorable, 57
Sturmfels, Bernd, 313 number of, 59
subfigures, labeling of, xxii set of, 63
subtraction in C, 193 tricolorability, 58, 72
Sudbury Math Circle, 313 tricolorable, 56, 59, 60
sum trivial, 56
bounded from above, 299 Trigg, Charles, 301
finite, 297, 299 trigonometry, xvi, 3, 171, 179, 223, 302
infinite, 297, 299 application of, 184
of squares, 202 cosine, 180, 182
partial, 296, 299, 304 cotangent, 180, 181
system of equations, 61 inverse, 297
homogeneous, 62 sine, 180, 182
tangent, 180, 181, 184
Tartaglia, Niccolò, 104 Tung, Stephanie, xxiv
Taylor expansion, 172, 299 Turkey, 49
technical writing skills, xv
technology, 319 UC Berkeley, 330
telescoping, 300 UC Berkeley, xxiii, 64, 302, 312, 314,
Terminator II, 328 315, 319, 327
tetrahedron, 108, 122, 138 UC Davis, 312, 329
theorem, xxi, 16, 23, 177 UC San Diego, 328

UC Santa Barbara, 329

uniqueness, 31, 47, 111, 149, 177, 234,
290, 295, 303
University of San Francisco, 311
unknot, 52, 56, 58, 62
unsmoothing, 276
US, xx, 307, 308, 310, 313, 318
USAMO, xxiv, 313, 315
Utah Math Circle, 310

Vakil, Ravi, xxiii, 313, 314

Vandervelde, Sam, 313
variable, 62, 74
Viète’s Formulas, 203, 204
Vladimir Arnol’d, 208
von Mangoldt function, 242
von Neumann algebras, 64

warning road sign, xxi

Washington, D.C., 327
weight, 222
Wertheimer, David, xxiv
Whitlow, Marc, xiiin, xxiii
Wiegers, Brandy, 313
Wikipedia, xxiv
Wiles, Andrew, xxii, 39
Wu, Hung-Hsi, 319

Yeung, Joyce, xxiv

Zakharevich, Inna, xxiv, 313

Zeitz, Paul, xxiv, 311, 313
Zucker, Joshua, xxiv, 312, 313
Zuckerberg, Mark, 314
Photo courtesy of Rudolph Chung

Many mathematicians have been drawn
to mathematics through their experience
with math circles. The Berkeley Math Circle
(BMC) started in 1998 as one of the very
first math circles in the U.S.
Over the last decade and a half,
100 instructors—university professors,
business tycoons, high school teachers,
and more—have shared their passion
for mathematics by delivering over 800 BMC sessions on the UC Berkeley campus
every week during the school year.
This second volume of the book series is based on a dozen of these sessions,
encompassing a variety of enticing and stimulating mathematical topics, some new
and some continuing from Volume I:
• from dismantling Rubik’s Cube and randomly putting it back together to solving
it with the power of group theory;
• from raising knot-eating machines and letting Alexander the Great cut the
Gordian Knot to breaking through knot theory via the Jones polynomial;
• from entering a seemingly hopeless infinite raffle to becoming friendly with
multiplicative functions in the land of Dirichlet, Möbius, and Euler;
• from leading an army of jumping fleas in an old problem from the International
Mathematical Olympiads to improving our own essay-writing strategies;
• from searching for optimal paths on a hot summer day to questioning whether
Archimedes was on his way to discovering trigonometry 2000 years ago.

Do some of these scenarios sound bizarre, having never before been associated with
mathematics? Mathematicians love having fun while doing serious mathematics and
that love is what this book intends to share with the reader. Whether at a beginner,
an intermediate, or an advanced level, anyone can find a place here to be provoked
to think deeply and to be inspired to create.
In the interest of fostering a greater awareness and appreciation of mathematics
and its connections to other disciplines and everyday life, MSRI and the AMS are
publishing books in the Mathematical Circles Library series as a service to young
people, their parents and teachers, and the mathematics profession.
