
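[Cover figure: a parabola $G_a$ through the origin $O$, a point $P = (x, ax^2)$ on it, and the translation $T$ by the vector $V = (p, q)$, which sends $P$ to $T(P) = (x + p, ax^2 + q)$ and $G_a$ to $T(G_a)$.]

Teaching School Mathematics: Algebra
Hung-Hsi Wu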
Department of Mathematics
University of California, Berkeley


Providence, Rhode Island
2010 Mathematics Subject Classification. Primary 97-01, 00-01, 97H20, 97G70,
97H30, 97F80.

For additional information and updates on this book, visit

Library of Congress Cataloging-in-Publication Data

Names: Wu, Hongxi, 1940-
Title: Teaching school mathematics. Algebra / Hung-Hsi Wu.
Description: Providence, Rhode Island : American Mathematical Society, 2016. | Audience:
Grades 6 to 8.- | Includes bibliographical references.
Identifiers: LCCN 2016000118 | ISBN 9781470427214 (alk. paper)
Subjects: LCSH: Algebra–Textbooks. | Algebra–Study and teaching (Elementary) | Algebra–
Study and teaching (Middle school) | AMS: Mathematics education – Instructional exposition
(textbooks, tutorial papers, etc.). msc | General – Instructional exposition (textbooks, tutorial
papers, etc.). msc | Mathematics education – Algebra – Elementary algebra. msc | Mathe-
matics education – Geometry – Analytic geometry. Vector algebra. msc | Mathematics educa-
tion – Algebra – Equations and inequalities. msc | Mathematics education – Arithmetic, number
theory – Ratio and proportion, percentages. msc
Classification: LCC QA159 .W8 2016 | DDC 512.9071/2–dc23 LC record available at http://lccn.

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit:
Send requests for translation rights and licensed reprints to
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.

© 2016 by the author. All rights reserved.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at
10 9 8 7 6 5 4 3 2 1 21 20 19 18 17 16
To Kuniko
Through hardship and joy we have
walked hand in hand;
from our wandering we both now rest
above the quiet land.
Im Abendrot (At Sunset)
Joseph von Eichendorff (1788–1857)

Chapters in the Companion Volume ix

Preface xi

Suggestions on How to Read This Volume xix

Chapter 1. Symbolic Expressions 1

1.1. Basic protocol in the use of symbols 2
1.2. Expressions and identities 5
1.3. Mersenne primes and finite geometric series 11
1.4. Polynomials and order of operations 17
1.5. Rational expressions 24

Chapter 2. Translation of Verbal Information into Symbols 27

2.1. Equations and inequalities 27
2.2. Some examples of translation 30

Chapter 3. Linear Equations in One Variable 37

3.1. Solving linear equations 37
3.2. Some word problems 45

Chapter 4. Linear Equations in Two Variables and Their Graphs 53

4.1. Coordinate system in the plane 54
4.2. Linear equations in two variables 57
4.3. The concept of slope 61
4.4. Proof that graphs of linear equations are lines 72
4.5. Every line is the graph of a linear equation 76
4.6. Useful facts and examples 78

Chapter 5. Simultaneous Linear Equations 85

5.1. Solutions of linear systems and the geometric interpretation 85
5.2. The algebraic method of solution 87
5.3. Characterization of parallel lines by slope 93
5.4. Algebraic criterion for solvability 98
5.5. Partial fractions and Pythagorean triples 101
5.6. Appendix 109

Chapter 6. Functions and Their Graphs 117

6.1. The basic definitions 117
6.2. Why functions? 122
6.3. Some examples of graphs 126
6.4. Remarks on graphs and coordinate systems 133

Chapter 7. Linear Functions and Proportional Reasoning 137

7.1. Constant rate and linear functions 137
7.2. Proportional reasoning 144

Chapter 8. Linear Inequalities and Their Graphs 155

8.1. How do inequalities arise in real life? 155
8.2. The symbolic translation 157
8.3. Basic facts about inequalities and applications 160
8.4. Graphs of inequalities in the plane 163
8.5. Solution of the manufacturing problem 180
8.6. Behavior of linear functions in the plane 187

Chapter 9. Exponents 191

9.1. Positive-integer exponents 194
9.2. Rational exponents 200
9.3. Laws of exponents 205
9.4. Scientific notation 214
9.5. Three additional remarks on rational exponents 220

Chapter 10. Quadratic Functions and Their Graphs 223

10.1. Quadratic equations 224
10.2. A special class of quadratic functions 238
10.3. Properties of quadratic functions 246
10.4. The graph and the parabola 252
10.5. Some applications 260

Appendix: Facts from [Wu-PreAlg] 265

Bibliography 273

Chapters in the Companion Volume

Teaching School Mathematics: Pre-Algebra ([Wu-PreAlg])

Chapter 1: Fractions
Chapter 2: Rational Numbers
Chapter 3: The Euclidean Algorithm
Chapter 4: Experimental Geometry
Chapter 5: Length, Area, and Volume


[Figure: Structure of the chapters in [Wu-PreAlg] ([PA]) and this volume ([A]): a diagram of arrows relating [PA] Chapters 1–5 and [A] Chapters 1–10.]

Preface

A main obstacle in the learning of school mathematics has always been how to cope with the steady increase in abstraction with the passage of each school year.

This volume and its companion volume, Teaching School Mathematics: Pre-Algebra ([Wu-PreAlg]), are textbooks written for teachers, especially middle
school teachers. They address the mathematics that is generally taught in grades
6–8. In this volume, we give a presentation of school algebra as a direct continuation
of arithmetic—whole numbers, fractions, decimals, and negative numbers—and we
also assume a basic acquaintance with the geometry of congruence and similarity.
For this reason, we must draw on the readers’ knowledge of these topics. In the
Appendix (pages 265 ff.), one can find a brief summary of most of the relevant
facts from [Wu-PreAlg] that we need.
The topics to be taken up in this volume are those to be found in any mid-
dle school or high school course on Algebra I: linear equations in one and two
variables, linear inequalities in one and two variables, simultaneous linear equa-
tions, the concept of a function, polynomial functions and exponential functions,
and a detailed study of linear and quadratic functions. These topics are entirely
unexceptional. Such being the case, one may well ask why this volume deserved
to be written. In general terms, an answer to this question has been given in the
Preface to [Wu-PreAlg]. What follows is a more focused answer in the context of
the teaching and learning of introductory school algebra.
At the moment, Algebra for All is a national goal (see Chapter 3 of [NMP]), and
there are various theories as to why this goal seems to be out of reach. Could it be
that the appropriate classroom manipulatives have not been sufficiently exploited,
that the latest advances in technology have not yet been fully integrated into the
instruction, or that the teaching has slighted so-called sense-making, conceptual
understanding, and higher-order thinking skills? Perhaps. All these questions,
however, ignore a fundamental issue: there is ample evidence that students can-
not learn algebra, not because they don’t like the packaging of the product, but
because they find the product itself to be incomprehensible. We will refer to
this product—the mathematics in almost all the standard school textbooks of the
past four decades—as Textbook School Mathematics ([TSM]).1 TSM fails, often
in spectacular fashion, to explain to students, clearly and correctly, what they are
supposed to learn. Education researchers who look into the nonlearning of algebra
do not appear to have given much thought to the fact that the TSM that
resides in student textbooks or standard professional development materials is
riddled with ambiguities and errors, big and small. In short, TSM is not learnable.
Until a mathematically correct version of school algebra is readily accessible
to one and all, it will be premature to draw any conclusions about why students
cannot learn algebra. With this in mind, the main justification for this volume’s
existence is that it gives a logical and coherent exposition of the standard mathematical
topics in Algebra I in a way that not only is grade-level appropriate for
eighth and ninth graders, but also meets the requirements of the following five
fundamental principles of mathematics:
(I) Precise definitions are essential.
(II) Every statement must be supported by mathematical reasoning.
(III) Mathematical statements are precise.
(IV) Mathematics is coherent.
(V) Mathematics is purposeful.
We refer the reader to the Preface of [Wu-PreAlg] for a fuller discussion of
these fundamental principles.

1 See, for example, [Wu2013] or [Wu2015] for more details.
The grade-level requirements we have imposed on this volume by no means
imply that this is a student textbook. This volume is unequivocally a book for
teachers with a sharp focus on mathematics. What this requirement means is
that a conscientious attempt has been made to minimize the distance between the
content in this volume and what teachers have to teach in middle school (see, for
example, [Wu2006]). Consequently, this volume will not touch on any advanced
topics such as vector spaces and linear transformations, groups, rings, fields, and
especially finite fields. It turns out that the need for such advanced considerations
is not critical at this stage and, in any case, there will be no advanced topics to be
found in this volume. Instead, we will focus on probing the basic structure that
undergirds the standard topics of school algebra. In the course of this probe, how-
ever, the need for advanced—and often quite subtle—considerations does surface
from time to time. On these occasions, we will not shy away from giving the
full explanation in order to bring mathematical closure to the discussion. All the
same, we will also be explicit in pointing out that these advanced considerations
are more for broadening the teachers’ knowledge base than for school classroom instruction.
The fundamental principles of mathematics are of critical importance in the
teaching of school algebra because algebra is inherently an abstract subject com-
pared to arithmetic, and TSM’s lack of precise definitions and logical reasoning in
an abstract environment has rendered the subject unlearnable. In greater detail,
let us consider the following specific manifestations of these flaws in the algebra
portion of TSM:

1. TSM considers the concept of a “variable” to be basic in school algebra. For example:

Understanding the concept of variable is crucial to the study of algebra; a major problem in students’ efforts to understand and do algebra results from their narrow interpretation of the term. ([NCTM], page 102)

Many in the education establishment may be surprised to learn that “variable” is not a mathematically well-defined concept and is only used informally in mathematical discussions in order to remove excessive verbiage.2 One should not expend scarce instructional time trying to teach a phantom concept, much less make
it the cornerstone of algebra learning. When textbooks follow suit and elaborate
on a “variable” as a quantity that changes or varies, they block beginners at the
gate of the gate-keeper course that is algebra.
2. Once the concept of “variable” has taken root, an equation will naturally be
defined in terms of a “variable”. Here is a typical example:
A variable is a symbol used to represent one or more num-
bers. A variable expression is an expression that contains a vari-
able. . . . An equation is a statement formed by placing an equal
sign between two numerical or variable expressions. ([Dolciani],
pages 724 and 731)
This then raises the question of what it means for two variable expressions to be
equal: if a variable can represent more than one number, does the equality of
two variable expressions mean the expressions are equal for all the numbers so
represented? If so, isn’t that an identity? If not, then for which numbers are they equal?
When basic questions like these cannot be answered, it is a foregone conclu-
sion that the fundamental process of solving an equation, in the way it is taught in
school algebra, becomes a faith-based ritual divorced from mathematical reason-
ing (see the discussion on pages 37 ff.).
3. TSM introduces students to the concept of the slope of a nonvertical line
strictly as a rote skill: fix two chosen points on the line and compute their “rise
over run”. There is no mention of the fact that, if two other points are chosen,
the resulting “rise over run” will still be the same. Some students even ignore the
“rise over run” and simply expect every line to come equipped with an equation
y = mx + b so that they can conveniently identify the slope of the line with the
constant “m”. Recently, the scope of the misconception about slope has been cap-
tured quantitatively in [Postelnicu-Greenes], but the education research literature
still seems oblivious to the fundamental mathematical error in TSM’s definition of
slope and the glaring absence of reasoning surrounding this concept. Education
research also appears to be unaware that, until this error is honestly confronted,
it will be premature—not to say futile—to talk about students’ “conceptual un-
derstanding” of slope.
4. A natural consequence of not having a correct definition of slope is the
absence of any explanations for the interplay between a linear equation in two
variables and its graph. For example, why is the graph of a linear equation in
two variables a straight line? And is every straight line necessarily the graph
of some linear equation in two variables? TSM’s answer to the first question is
that when several points in the graph of the linear equation are plotted, “they
look straight”. Reasoning plays no role. Consequently, students can only learn
how to find the equation of a line satisfying certain geometric conditions (e.g.,
passing through two given points, passing through a given point with a given
slope, etc.) as a rote skill. Since linear equations constitute a major part of the
first half of Algebra I, this means that students’ first encounter with algebra will
consist mainly of a deeper immersion in learning-by-rote. After years of bruising
battles with fraction-as-a-piece-of-pizza, students become convinced by such an
encounter that math is unlearnable except by brute-force memorization.

2 We have already done so above by referring to “linear equations in one and two variables”, etc.
5. The theorem that two lines being parallel is equivalent to the lines having
the same slope is routinely offered in textbooks as a definition or as a key concept of
parallel lines. Likewise, the theorem that two lines being perpendicular is equiv-
alent to the product of the slopes of the lines being equal to −1 is often given as
a seemingly sophisticated definition of perpendicularity. Because students are al-
ready familiar with the concepts of parallel and perpendicular lines from earlier
grades, they are confused by this spectacular about-face. Does a mathematical
concept have any permanence, or is it liable to change with each grade? The
likely conclusion from such confusion is that algebra doesn’t make sense. This is one
reason that the current discussion about “sense-making” in mathematics learning
has no real traction: until we have a curriculum that makes sense, we cannot ask
students to make sense of the mathematics.
6. In elementary and middle school, students have already used the concept
of constant rate (e.g., constant speed) extensively, but there is no precise definition of
this concept in TSM. What there is in TSM is an abstruse discussion of a concept
called proportional reasoning; the implicit assumption is that if students have a
conceptual understanding of proportional reasoning, they will be able to handle
constant rate. An introductory algebra course is the first opportunity to bring
clarity and closure to “constant rate” by pointing out what it means and why it
corresponds to the linearity of an appropriate function. Yet this is hardly ever
done. This is a prime example of the fractured school curriculum: the intrinsic
coherence between the mathematics of grades 5–7 and the foundations of algebra
is too often missing.
7. The concept of the graph of an equation is not precisely defined in TSM, and
consequently not emphasized. It follows that simple facts about graphs such as
the solution of simultaneous linear equations being the coordinates of the point
of intersection of the two linear graphs become articles of faith rather than simple
logical consequences of the definitions. Students do not learn mathematics if all
they do is memorize facts on faith alone. Not surprisingly, some students do lose
faith, which then makes any kind of learning—by rote or otherwise—impossible.
8. In TSM, the graph of a linear inequality of two variables is almost never defined,
and the concept of a half-plane is also left undefined. Consequently, the theorem
that the graph of a linear inequality is a half-plane becomes either a decree or a
definition, and it is impossible to decide which it is. In asking students to learn
about linear inequalities and linear programming, we are in effect asking them
(once again) to wade through, and memorize by rote, a morass of disconnected
shadowy statements while making believe that we are teaching mathematics. Un-
der these circumstances, how can any mathematics learning take place?
9. The concept of a rational3 exponent of a positive number is a source of
immense confusion. TSM makes believe that, for any positive number $a$, $a^0 = 1$
is a theorem rather than a definition, and the same goes for $a^{-n} = \frac{1}{a^n}$ (for any
positive integer $n$). Moreover, TSM does not explain that the reason we want a
definition of $a^r$ for all rational numbers $r$ is that these are special values of the
exponential function $x \mapsto a^x$ when $x$ is an arbitrary number. As a consequence,
the laws of exponents become just another set of senseless rote skills about a
strange notation rather than remarkable properties of the exponential function.

3 We are using the term “rational numbers” in its correct mathematical sense: fractions and negative fractions.
10. TSM’s presentation of quadratic equations and functions is chaotic: too
many facts to memorize while no conceptual framework is provided for their un-
derstanding. For example, students learn how to factor quadratic polynomials
with leading coefficients other than 1, learn the quadratic formula, learn the for-
mula for the axis of symmetry of the graph, learn the formula for the vertex of
the graph of a quadratic function, etc. How are these related to each other?
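Item 3 mentions a fact that TSM leaves unsaid: on a nonvertical line, the “rise over run” does not depend on which two points are chosen. The Python sketch below checks this for every pair drawn from a few points on one sample line. The line $y = \frac{2}{3}x + 1$, the chosen $x$-values, and the helper name `rise_over_run` are illustrative assumptions, not taken from the text. Exact fractions avoid rounding. Checking instances is not a proof; the proof, via similar triangles, is what Section 4.3 supplies.

```python
from fractions import Fraction
from itertools import combinations

def rise_over_run(p, q):
    """Return (y2 - y1) / (x2 - x1) for two points with different x-coordinates."""
    (x1, y1), (x2, y2) = p, q
    return Fraction(y2 - y1) / Fraction(x2 - x1)

# Hypothetical sample line y = (2/3)x + 1, sampled at several distinct x-values.
m, b = Fraction(2, 3), Fraction(1)
points = [(Fraction(x), m * x + b) for x in (-3, 0, 1, 4, 7)]

# Collect the ratio for every pair of distinct points; a set keeps distinct values only.
slopes = {rise_over_run(p, q) for p, q in combinations(points, 2)}
print(slopes)  # {Fraction(2, 3)}
```

All ten pairs give the same value, so the set has a single element.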

If one goes through the algebra curriculum of TSM carefully, one will uncover
these and many more serious mathematical issues. (Many of them will be pointed
out in this volume in due course.) The prospect of a student learning algebra is
therefore daunting: it may be likened to walking through a minefield where all
the mines were put there by human errors. The least we can do is to remove
the mines (and some of students’ concomitant fears)—in other words, eradicate
TSM—in order to give learning a chance. The modest goal of this volume is to
give you the tools to do exactly that. Briefly, one will find in the following pages
ways to deal with the preceding difficulties:

1a. What students should be learning is not what a “variable” is but the
proper use of symbols; see pages 4 ff. The meaning of each symbol must be
specified before it is put to use. For example, the equality of two functions of one
variable, $f(x)$ and $g(x)$, may be a prototypical statement involving variables, but
the precise definition of the equality $f = g$ is that, for each fixed number $x$ in their
common domain of definition, $f(x) = g(x)$. Nothing varies.
2a. The solving of equations is strictly a matter of computations with numbers.
No variables are involved, and therefore there is no reason to confuse the issue
by using balance scales or algebra tiles to explain the solution process. See the
discussion in Section 3.1 on page 37.
3a. The concept of slope needs to be defined with far greater care than TSM
has let on. One has to explain what “slope” tries to measure, how to measure it,
and, most importantly, why this way of measuring it is correct and useful. In Section
4.3 on page 61, there is an extended discussion to this effect. In particular, this is
where the discussion of congruent triangles and similar triangles in Chapter 4 of
[Wu-PreAlg] becomes absolutely essential.
4a. In Sections 4.4 and 4.5 on pages 72 and 76, we will give a careful proof
of why the graph of a linear equation of two variables is a line and why each
line is the graph of some linear equation of two variables. In the process, it will
become obvious how to write down the equation of a line that satisfies any of the
standard geometric conditions. See Section 4.6 on page 78.
5a. Because perpendicularity and parallelism have been defined in Chapter 4
of [Wu-PreAlg], and because slope has been defined in Section 4.3 on page 61, any
assertion about parallelism (or perpendicularity) and slope becomes a theorem to
be proved. We will do exactly that in Sections 5.3 and 5.6 on pages 93 and 109, respectively.

6a. In Section 7.1 on page 137, we review the definition of constant rate, and
then prove that constant rate is equivalent to the existence of an appropriate linear
function that represents work done over time. In Section 7.2, we closely examine
the possible meanings of proportional reasoning and point out how—by eliminating
it altogether—its purported applications in school mathematics can all be put on
a firm mathematical foundation.
7a. In Section 5.1, we explain precisely why the set of solutions to a pair of equations
is the set of all the points of intersection of the graphs of the two equations
in question. Such an explanation is possible only because the graph of an equation
has been precisely defined and put to use in reasoning.
8a. In Section 8.4, we define the half-planes of a line and the graph of a linear
inequality. Then in Theorem 8.4 on page 172, we prove that the graph of a linear
inequality is a half-plane of the graph of the associated linear equation.
9a. Section 9.2 re-orients the discussion of rational exponents by assuming
the existence of exponential functions from the beginning. (This is analogous to
the discussion of solving polynomial equations by assuming—at the outset—the
Fundamental Theorem of Algebra. In school mathematics, sometimes a central
theorem has to be taken on faith for pedagogical reasons.) Then we make use
of the characteristic property of the exponential functions (i.e., $a^x \cdot a^y = a^{x+y}$)
to prove that $a^0 = 1$ and $a^{-x} = 1/a^x$. This makes it possible for the following
section (Section 9.3) to present complete proofs of the other laws of exponents for
rational exponents.
10a. Chapter 10 begins with a general discussion of the shape of the graph of
a quadratic function and then shows how the graph can provide a framework for
the understanding of quadratic functions in the same way that straight lines pro-
vide a framework for the understanding of linear functions. The basic technique
here is that of completing the square; it will be seen that this technique unifies the
diverse skills related to quadratic functions.
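To illustrate 2a, take the hypothetical equation $3x + 5 = 11$. This example is chosen here and is not taken from the text. If $x$ is a number satisfying it, then each line below is an ordinary computation with numbers:

```latex
\begin{aligned}
3x + 5 &= 11 \\
3x &= (3x + 5) - 5 = 11 - 5 = 6 \\
x &= \tfrac{1}{3}(3x) = \tfrac{1}{3} \cdot 6 = 2
\end{aligned}
```

Conversely, $3 \cdot 2 + 5 = 11$, so $2$ is the solution. Throughout the computation, $x$ is a single fixed number that is at first unknown. Nothing varies.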
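As a small numerical companion to 7a, the sketch below solves a system $a_1x + b_1y = c_1$, $a_2x + b_2y = c_2$ exactly and then checks that the resulting point lies on both graphs. It uses Cramer's rule, a standard formula that this passage does not mention. The function name `solve_2x2` and the sample system $2x + 3y = 7$, $x - y = 1$ are hypothetical. The check confirms one instance only; the general argument is the one given in Section 5.1.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly (integer inputs assumed).

    Returns the unique solution (x, y) as Fractions, or None when
    a1*b2 - a2*b1 == 0 (the two lines are then parallel or coincide).
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# Hypothetical system: 2x + 3y = 7 and x - y = 1.
point = solve_2x2(2, 3, 7, 1, -1, 1)
x, y = point
# The solution is a point on the first graph AND on the second graph.
assert 2 * x + 3 * y == 7 and x - y == 1
print(point)  # (Fraction(2, 1), Fraction(1, 1))
```

When the determinant is $0$, the sketch returns `None` instead of distinguishing between parallel and coincident lines. That distinction is left to the reasoning in the text.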
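The two deductions in 9a are short enough to display. The sketch assumes, as 9a does, that $a > 0$ and that $a^x \cdot a^y = a^{x+y}$ for all numbers $x$ and $y$. It also uses $a^1 = a$, a normalization that this passage does not state explicitly.

```latex
\begin{aligned}
a^0 \cdot a = a^0 \cdot a^1 &= a^{0+1} = a^1 = a
  &&\Longrightarrow\quad a^0 = 1 \quad (\text{divide both sides by } a \neq 0), \\
a^x \cdot a^{-x} &= a^{x+(-x)} = a^0 = 1
  &&\Longrightarrow\quad a^{-x} = \frac{1}{a^x}.
\end{aligned}
```

Note that the second line also shows $a^x \neq 0$, because the product $a^x \cdot a^{-x}$ equals $1$.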
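The unifying role that 10a assigns to completing the square can be seen in the standard computation below. It is sketched for a generic $ax^2 + bx + c$ with $a \neq 0$; the letters $a$, $b$, $c$ are illustrative coefficients and are not notation fixed by this passage.

```latex
\begin{aligned}
ax^2 + bx + c
  &= a\left(x^2 + \frac{b}{a}x\right) + c \\
  &= a\left(x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}\right) + c - \frac{b^2}{4a} \\
  &= a\left(x + \frac{b}{2a}\right)^2 + \frac{4ac - b^2}{4a}
\end{aligned}

ax^2 + bx + c = 0
\iff \left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2}
\iff x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \qquad (b^2 - 4ac \ge 0)
```

Several of the separately memorized facts from item 10 can be read off this one identity:

- the axis of symmetry $x = -\frac{b}{2a}$;
- the vertex $\left(-\frac{b}{2a}, \frac{4ac - b^2}{4a}\right)$;
- the quadratic formula.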

It can be persuasively argued that any form of professional development for
middle school teachers that makes any claim to legitimacy must make the needed
corrections of these flagrant errors in TSM. The content of this volume—in its var-
ious incarnations—has been used for both inservice and preservice professional
development since 2006. Nevertheless, I have come to realize that, as of the year
2015, this offering comes with some liabilities. While it provides an opportunity
for teachers to learn correct school mathematics, perhaps for the first time, it also
obligates them to put in a tremendous amount of work in order to teach this ma-
terial in the school classroom. In addition, the amount of steely resolve that is
needed to teach it without the support of a compatible student textbook and a
school’s or a district’s pacing guide may well be beyond the normal call of duty.
To give a somewhat extreme example, if a teacher teaches slope more or less ac-
cording to Section 4.3 on page 61 (see 3a above), then inevitably he or she will
have to steal many hours from other topics in order to introduce students to the
basic facts about similar triangles.
The advent of the CCSSM ([CCSSM]) should mitigate some of the difficulties
teachers have in teaching correct algebra. If they wish to implement the content of
this volume in their own classrooms, they can do so now with the assurance that,
in the Common Core era, much of what used to be outlandish in this volume is
now becoming the accepted norm. I can only hope that, in the forthcoming years,
better student textbooks will be written so that the CCSSM will finally bring about
better student learning in school algebra.

Acknowledgements. This volume and its companion volume [Wu-PreAlg]
evolved from the lecture notes ([Wu2010a] and [Wu2010b]) for the Pre-Algebra
and Algebra summer institutes that I used to teach to middle school mathemat-
ics teachers from 2004 to 2013. My ideas on professional development for K–12
mathematics teachers were derived from two sources: my understanding as a
professional mathematician of the minimum requirements of mathematics (see the
five fundamental principles of mathematics in the Preface of [Wu-PreAlg]) and
the blatant corrosive effects of TSM on the teaching and learning of mathematics.
Those summer institutes therefore placed a special emphasis on improving teach-
ers’ content knowledge. I would not have had the opportunity to try out these
ideas on teachers but for the generous financial support from 2004 to 2006 by the
Los Angeles County Office of Education (LACOE), and from 2007 to 2013 by the
S. D. Bechtel, Jr. Foundation. Because of the difficulty I have had with funding by
government agencies—they did not (and perhaps still do not) consider the kind of
content-based professional development I insist on to be worthy of support—my
debt to Henry Mothner and Tim Murphy of LACOE and Stephen D. Bechtel, Jr. is
Through the years, I have benefited from the help of many dedicated teachers;
to Bob LeBoeuf, Monique Maynard, Marlene Wilson, and Betty Zamudio, I owe
the corrections of a large number of linguistic infelicities and typos, among other
things. Winnie Gilbert, Stefanie Hassan, and Sunil Koswatta were my assistants in
the professional development institutes, and their comments on the daily lectures
of the institutes could not help but leave their mark on these volumes. In addition,
Sunil created some animations (referenced in Chapters 2 and 4) at my request.
Phil Daro graciously shared with me his insight on how to communicate with
teachers. Sergei Gelfand made editorial suggestions on these volumes—including
their titles—that left an indelible imprint on their looks as well as their user-
friendliness. R. A. Askey read through a late draft with greater care than I had
imagined possible, and he suggested many improvements as well as corrections.
I shudder to think what these volumes would have been like had he not caught
those errors. Finally, Larry Francis helped me in multiple ways. He created
animations for me that can be found in Chapter 4. He is also the only person
who has read almost as many drafts as I have written. (He claimed to have read
twenty-seven, but I think he overestimated it!) He met numerous last-minute
requests with unfailing good humor, and he never ceased to be supportive; more
importantly, he offered many fruitful corrections and suggestions.
To all of them, it gives me great pleasure to express my heartfelt thanks.
Hung-Hsi Wu
Berkeley, California
April 15, 2016
Suggestions on How to Read This Volume

The major conclusions in this book, as in all mathematics books, are sum-
marized into theorems; depending on the author’s (and other mathematicians’)
whims, theorems are sometimes called propositions, lemmas, or corollaries as a
way of indicating which theorems are deemed more important than others (note
that a formula or an algorithm is just a theorem). This idiosyncratic classification
of theorems started with Euclid around 300 B.C., and it is too late to change now.
The main concepts of mathematics are codified into definitions. Definitions are
set in boldface in this book when they appear for the first time. A few truly basic
definitions are even individually displayed in a separate paragraph, but most of
the definitions are embedded in the text itself. Be sure to watch out for them.
The statements of the theorems as well as their proofs depend on the defini-
tions, and proofs (= reasoning) are the guts of mathematics.
A preliminary suggestion to help you master the content of this book is for
you to
copy out the statements of every definition, theorem, proposi-
tion, lemma, and corollary, along with page references so that
they can be examined in detail if necessary,
and also to
summarize the main idea of each proof.
These are good study habits. When it is your turn to teach your students, be
sure to pass on these suggestions to them. A further suggestion is that you might
consider posting some of these theorems and definitions in your classroom.
You should also be aware that reading mathematics is not the same as reading
a gossip magazine. You can probably flip through such a magazine in an hour, if
not less. But in this book, there will be many passages that require careful reading
and re-reading, perhaps many times. I cannot single out those passages for you
because they will be different for different people. We do not all learn the same
way. What is true under all circumstances is that you should accept as a given
that mathematics books make for exceedingly slow reading. I learned this very
early in my career. On my very first day as a graduate student many years ago,
a professor, who was eventually to become my thesis advisor, was lecturing on
a particular theorem in a newly published volume. He mentioned casually that
in the proof he was going to present, there were two lines in that book that took
him fourteen hours to understand and he was going to tell us what he found out
in those long hours. That comment greatly emboldened me not to be afraid to
spend a lot of time on any passage in my own reading.
If you ever get stuck in any passage of this book, take heart, because that is
nothing but par for the course.



Chapter 1. Symbolic Expressions
It can be argued that the most basic part of the learning of algebra is learning how
to use symbols correctly. This point of view is eloquently expounded in Chapter 3 of
the National Mathematics Advisory Panel Report [NMP]. If there is any meaning
at all to the phrase “algebraic thinking” in school mathematics, it would be “the
ability to use symbols precisely and fluently”. In this regard, there is a need
to single out the treatment of polynomials in this chapter. In mathematics, a
polynomial is either a polynomial function or an element of the polynomial ring
R[ x ], where R is the real numbers and “x” is an “indeterminate”. These two
concepts are distinct in general and every book on algebra has to come to grips
with the problem of how to reconcile these two notions. Happily, so long as we
work only with real or complex numbers, these two concepts are essentially the
same.1 Therefore we can afford to eschew the abstract concept of R[ x ] and simply
present a polynomial as a polynomial function so that the x in a polynomial
an x n + · · · + a1 x + a0 can be taken to be a number. The purpose of this chapter is
to demonstrate how one can do algebra by taking x to be just a number and turn
at least the introductory part of school algebra into generalized arithmetic, literally.
Formal algebra in the sense of R[ x ] can be left to a later date, e.g., a second course
in school algebra.2
This chapter is thus entirely elementary, and is nothing more than a direct ex-
tension of arithmetic. The exposition therefore intentionally emphasizes its close
affinity to arithmetic. (However, we do take the liberty of making more advanced
mathematical comments in the chapter preambles and, at times, in footnotes in or-
der to round out the picture; it is not necessary to understand the more advanced
comments for the reading of the text proper.) There is a danger, however, that
precisely because of its elementary character, you may take this chapter lightly
because it is “something you already know” and therefore not worthy of deeper
consideration. I would like to explicitly ask you to recognize that what is in this
chapter is genuine algebra, and that, most likely, whatever you think you already
know has been cast here in a new light. For example, whereas “variable” is re-
garded as a gateway concept for the learning of algebra at the time symbols are
introduced in TSM,3 this chapter shows why there is no need to try to learn what
a “variable” might be in order to learn algebra. We will restore simplicity to the

1 $\mathbb{R}[x]$ is ring-isomorphic to the ring of polynomial functions over $\mathbb{R}$. The same holds if $\mathbb{R}$ is replaced by the complex numbers.

2 See Chapter 11 in Volume II of [Wu-HighSchool].
3 See page xi for the definition of TSM.


study of symbolic expressions, and simplicity is precisely the reason that algebra
can be taught without any fanfare.
It is not easy to learn to do things simply. It will take effort.

1.1. Basic protocol in the use of symbols

Recall from [Wu-PreAlg] that in these volumes, a number is a real number, i.e., a
point on the number line, unless stated to the contrary.
We are going to embark on a wholesale use of symbols. Why symbols? Be-
cause when we try to assert that something is valid for a large collection of num-
bers (e.g., for all positive integers, or for all rational numbers) instead of just for a
few specific numbers, we have to resort to the use of symbols to express this assertion correctly and succinctly. For example, suppose we observe that $2 \times 3 = 3 \times 2$, $3 \times 4 = 4 \times 3$, $\frac{17}{6} \times 49 = 49 \times \frac{17}{6}$, $\left(-\frac{8}{3}\right) \times 82 = 82 \times \left(-\frac{8}{3}\right)$, and so on. We want to express our observation in general as follows:
For any two numbers, if we multiply them one way and, switch-
ing the order, we multiply them again, we get the same number.
Of course, what we wish to assert is what is known as the commutative law of
multiplication for numbers. The question is how to say it completely, unambiguously, and succinctly. After much trial and error over many centuries,
starting with Diophantus around the third century4 and continuing up to René
Descartes (1596–1650)5 , people finally settled on the use of symbols as we know it
today. For the problem at hand, the accepted way of enunciating the commutative
law of multiplication is to say:
ab = ba for all numbers a and b.
Compared with the preceding indented verbal statement, the brevity resulting
from the use of symbols should be obvious.
It would seem that the fruits of some seventeen centuries of development of
the symbolic notation have not filtered down to our school curriculum, and the
use of symbols in standard textbooks is irresponsible and reckless at best. Major
misconceptions ensue. A main theme throughout this volume is to give careful
guidance on the etiquette of using symbols in order to undo these misconceptions.
One of the misconceptions that accompany the abuse of the symbolic notation is the concept of a variable. At present, variable occupies a prominent position
in school mathematics, especially in algebra. In standard algebra texts as well as in
major documents in the mathematics education literature, there may be no explicit
definition of what a variable is, but students are nevertheless asked to understand
this concept because it is considered to be the gateway to algebra. When students
are asked to understand something that is mathematically fictitious, nonlearning

4 Diophantus was a Greek mathematician who lived in Alexandria, Egypt (Alexandria was a Greek colony named after Alexander the Great). Unfortunately, his dates are unknown other than the fact that he probably lived in the third century A.D. His influence in the development of mathematics is considerable, as evidenced by the fact that the terminology of Diophantine equations is standard in
5 A co-discoverer of analytic geometry with Pierre de Fermat (1601–1665). He is also an important philosopher noted for the statement “I think, therefore I am”.


inevitably follows. Sometimes, a variable is described as a quantity that changes or varies. The mathematical meaning of the last statement is vague and obscure.
At other times it is asserted that “students’ understanding of variable should go
far beyond simply recognizing that letters can be used to stand for unknown
numbers in equations” ([NCTM2000, page 225]), but nothing is said about what
lies “beyond” this recognition. Similarly, [NRC] states that
students emerging from elementary school often carry the “perception of letters
as representing unknowns but not variables” (page 270). The difference between
“unknowns” and “variables” is unfortunately not clarified. All this deepens the
mystery of what a variable might be.
In mathematics, a variable is an informal shorthand for “an element in the domain of a function”. It is not a mathematical concept.

This volume will not ask for an understanding of a variable in the learning of algebra. There is no need for that in mathematics.6 Instead, we will explain the correct way to use symbols, and once you understand that, you will feel no compunction about pushing variable aside and going on with your study of algebra. However, the word variable has been in use for more than three centuries and, sooner or later, you will run across it in the
mathematics literature. The point is not to pretend that this word doesn’t exist
but, rather, to understand enough about the use of symbols to put so-called “vari-
ables” in the proper perspective. Think of the analogy with the concept of alchemy
in chemistry; this word has been in use longer than variable. On the one hand, we
do not want alchemy to be a basic building block of school chemistry, and, on the
other hand, we want every school student to acquire enough knowledge about
the structure of molecules to know why alchemy is an absurd idea. In a similar
vein, while we do not make the concept of “variable” a basic building block of
algebra, we want students to be so at ease with the use of symbols that they are
not fazed by the abuse of the word “variable” because they know how to interpret
it correctly. We hope you will carry this message about “variable” back to your classroom.
Let a letter x stand for a (real) number, in the same way that the pronoun
“he” stands for a boy or man. All the knowledge accumulated about rational
numbers7 can now be brought to bear on this x. There should be no discomfort
about the use of symbols any more than there should be discomfort about the use
of pronouns. The analogy with a pronoun is apt: just as one does not begin a sentence with a pronoun without saying what the pronoun stands for, one never uses a symbol without saying what the symbol stands for.

6 In mathematics, a variable is an informal abbreviation for “an element in the domain of definition of a function” or a symbol that represents such, which is of course a perfectly well-defined concept (see Chapter 7). If, for example, the domain of definition of a function (see page 117) is a set of ordered pairs of numbers, it is informally referred to as “a function of two variables”, and it must be said that, in that case, the emphasis is more on the word “two” than on the word “variables”. In the sciences and engineering, the word “variable” is bandied about with gusto. However, to the extent that mathematics is just a tool rather than the central object of study in such situations, scientists and engineers can afford to be cavalier about mathematical terminology. In this volume, we have to be more careful because we are trying to learn mathematics.
7 Because of FASM (page 265; a longer discussion is in Section 1.8 of [Wu-PreAlg]), all the operations on rational numbers transfer to real numbers.
Here then is what might be called the Basic Protocol in the use of symbols:
Each time one uses a symbol, one must specify precisely what the
symbol stands for.
In a situation where we want to determine which number x satisfies an equality
such as $2x^2 + x - 6 = 0$, the value of the number x would be unknown for the
moment and x is then also called an unknown. In broad outline, this is all there
is to it as far as the use of symbols is concerned.
A closer examination of this usage reveals some subtleties, however. Consider
first the following three cases of the equality xy = yx:
(V1) xy = yx.
(V2) xy = yx for all whole numbers x and y such that 0 ≤ x, y ≤ 10.
(V3) xy = yx for all real numbers x and y.
The statement (V1) has no meaning, because we don’t know what the symbols x
and y stand for. To pursue the analogy with pronouns, suppose someone makes
the statement, “He is 7 foot 6”. Without indicating who “he” refers to, this state-
ment is neither true nor false.8 It is simply meaningless. For example, if x and y in
(V1) are real numbers, then (V1) is true, but there are other mathematical objects
x and y for which (V1) would be false.9 There is thus no way to decide if (V1) is
true or false. On the other hand, (V2) is true, but it is a trivial statement because
its truth can be checked by successively letting both x and y be the numbers 0,
1, 2, . . . , 9, 10, and then computing xy and yx for comparison. The statement
(V3) is however both true and more profound. As mentioned implicitly above,
this is the commutative law of multiplication among real numbers. It is either
something you take on faith, or, in some other context,10 a not-so-trivial theo-
rem to prove. Thus, despite the fact that all three statements (V1)–(V3) contain
the equality xy = yx, they are in fact radically different statements because the
quantifications (i.e., the precise descriptions) of the symbols x and y are different.
This reinforces the message of the above Basic Protocol that the quantification of
a symbol is critically important.
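To make the contrast between (V2) and (V3) concrete, here is a minimal Python sketch of the finite check described above for (V2). The function name `check_v2` and its `limit` parameter are our own, chosen only for illustration. No loop of this kind can establish (V3), which concerns infinitely many real numbers.

```python
from itertools import product


def check_v2(limit: int = 10) -> bool:
    """Check xy == yx for every pair of whole numbers 0 <= x, y <= limit."""
    values = range(limit + 1)
    # product(values, repeat=2) yields all (limit + 1)**2 ordered pairs (x, y).
    return all(x * y == y * x for x, y in product(values, repeat=2))


print(check_v2())  # True: (V2) is settled by 121 direct comparisons
```

The loop terminates precisely because the quantification in (V2) is finite; that is what makes (V2) trivial.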
The preceding examples may convey the false message that each time a sym-
bol is used, it stands for “many” numbers, e.g., all real numbers. It remains to
point out that such is not the case in general. There are many equalities involv-
ing a number x where the x stands only for a finite collection of numbers. For
example, the x in the equality $3x + 7 = 5$ can only be a single number, namely, $x = -\frac{2}{3}$. This familiar process of “solving an equation” will be discussed in some detail in Section 3.1 on page 37; there is more to it than meets the eye. An even more telling example is the following: let numbers a, b, c be fixed and let $a \neq 0$; then the number x in the equality $ax + b = c$ is the number
$$x = \frac{c - b}{a}.$$
We leave the verification of this claim to an exercise, but note that in this case,
not only does x stand for a single number, but also the symbols a, b, c are each
8 It is true if “he” refers to the basketball star Yao Ming, but is false for Woody Allen.
9 For example, if x and y are certain 2 × 2 matrices.
10 Such as the set-theoretical foundation of mathematics.

explicitly restricted to be single numbers. In textbooks, because the numbers a, b, and c are fixed, they are called constants. The symbols a, b, and c therefore furnish examples of a symbol that “does not vary”. On the other hand, there could be, a priori, many numbers x that make a given equation valid. For example, each of ±1 and ±2 is a solution of $x^4 - 5x^2 + 4 = 0$. For this reason, one often refers to x in a general equation in x as a variable. Thus we use the terminology of a “variable” here as an afterthought; keep in mind that there is clearly no need for it.
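The claims about $3x + 7 = 5$ and $x^4 - 5x^2 + 4 = 0$ can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction`; the helper names `solve_linear` and `quartic` are hypothetical. Checking that ±1 and ±2 satisfy the quartic does not by itself show that there are no other solutions.

```python
from fractions import Fraction


def solve_linear(a, b, c):
    """Return the unique x with a*x + b = c, assuming a != 0 (x = (c - b)/a)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    return Fraction(c - b) / a


def quartic(x):
    """Left-hand side of x^4 - 5x^2 + 4 = 0."""
    return x**4 - 5 * x**2 + 4


x = solve_linear(3, 7, 5)
print(x)               # -2/3
print(3 * x + 7 == 5)  # True
print([t for t in (-2, -1, 1, 2) if quartic(t) == 0])  # [-2, -1, 1, 2]
```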

Exercises 1.1

In doing these and subsequent exercises, observe the following basic rules:
(a) Use only what you have learned so far in this volume. This is the situation you face when you teach.
(b) Show your work. The explanation is as important as the answer.
(c) Be clear. Get used to the idea that everything you say has to be understood.
(1) Verify the above assertion that with the numbers a, b, c fixed and $a \neq 0$, the only number x that satisfies $ax + b = c$ is the number $x = \frac{c - b}{a}$.
(2) If a and b are two numbers, what are $(a + b)^3$ and $(a - b)^3$? (These are
useful identities to bear in mind.)
(3) Is the following true or false for any numbers s and t?
$$(s^2 - t^2)^2 + (2st)^2 = (s^2 + t^2)^2.$$

Do you see why such an identity could be of interest?11

(4) Determine all the numbers x so that $(x + 3)(x - \frac{1}{2}) = 0$. Give the detailed reasoning.

1.2. Expressions and identities

Meaning of an expression
A notational convention
Meaning of an identity

11 Look ahead to page 107 if you wish.


Meaning of an expression
It is time to recall that in arithmetic there are many occasions when the use of
symbols is unavoidable. In addition to the commutative law of multiplication,
the statements of the commutative law for addition, the associative laws for ad-
dition and multiplication, and also the distributive law require a similar use of
symbols. In addition, the formulas for the addition, subtraction, multiplication,
and division of fractions likewise cannot be stated without the use of symbols.
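We repeat these formulas here to emphasize this point: let $\frac{k}{\ell}$ and $\frac{m}{n}$ be arbitrary rational numbers. In other words, k, ℓ, m, n are integers and $\ell \neq 0$, $n \neq 0$, and $m \neq 0$. Then:12
$$\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n},$$
$$\frac{k}{\ell} \cdot \frac{m}{n} = \frac{km}{\ell n},$$
$$\frac{\frac{k}{\ell}}{\frac{m}{n}} = \frac{kn}{\ell m}.$$
We emphasize that in each of these formulas, we don't need to know the exact value of each of k, ℓ, m, n; so long as they are integers, they will have to satisfy $\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}$, etc. For example, with $k = 11$, $\ell = -7$, $m = 5$, and $n = 23$, the above formulas imply that
$$\frac{11}{-7} \pm \frac{5}{23} = \frac{(11 \times 23) \pm (5 \times (-7))}{(-7) \times 23} = -\frac{218}{161} \ \text{ or } \ -\frac{288}{161},$$
$$\frac{11}{-7} \cdot \frac{5}{23} = \frac{11 \times 5}{-7 \times 23} = -\frac{55}{161},$$
$$\frac{\frac{11}{-7}}{\frac{5}{23}} = \frac{11 \times 23}{5 \times (-7)} = -\frac{253}{35}.$$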
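The three formulas can be checked against the worked numbers with Python's `fractions.Fraction`, which stores a rational number in lowest terms with a positive denominator. The function names `add`, `sub`, `mul`, and `div` are ours, and `ell` stands for ℓ. This is a sketch for checking the arithmetic, not a replacement for the derivations in [Wu-PreAlg].

```python
from fractions import Fraction


def add(k, ell, m, n):
    # k/ell + m/n = (kn + ell*m)/(ell*n), assuming ell != 0 and n != 0
    return Fraction(k * n + ell * m, ell * n)


def sub(k, ell, m, n):
    # k/ell - m/n = (kn - ell*m)/(ell*n)
    return Fraction(k * n - ell * m, ell * n)


def mul(k, ell, m, n):
    # (k/ell)(m/n) = km/(ell*n)
    return Fraction(k * m, ell * n)


def div(k, ell, m, n):
    # (k/ell)/(m/n) = kn/(ell*m), which also requires m != 0
    return Fraction(k * n, ell * m)


k, ell, m, n = 11, -7, 5, 23
print(add(k, ell, m, n), sub(k, ell, m, n))  # -218/161 -288/161
print(mul(k, ell, m, n), div(k, ell, m, n))  # -55/161 -253/35
```

Note that `Fraction(218, -161)` is normalized to `-218/161`, matching the sign convention used in the text.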
As a natural extension of these ideas, we now give some well-known algebraic identities. The term identity is used in mathematics to indicate, informally, that an equality is valid for a “large set” of numbers of interest. What “large” means will be clearly indicated in each situation and, in any case, is usually clear from the context. The term “identity” is definitely not a well-defined mathematical concept that requires a 100% precise definition. However, since the meaning of this term seems at present to be endlessly (and, one might say, unnecessarily) debated, we will now try to clarify its meaning as best we can. By a number expression, or more simply an expression, in a given collection of numbers x, y, . . . , w, we mean a number obtained from these x, y, . . . , w and from a collection of specific real numbers (e.g., 16, $\frac{1}{8}$, 5, etc.) by the use of

12 See page 270; a detailed discussion is given in Section 2.5 of [Wu-PreAlg].


a combination of arithmetic operations (i.e., addition, subtraction, multiplication, and division). For example, if w, x, y, z are numbers, then
xy z − y3 2
+ x3 (16z − y2 ) − z21 , 1 5
, w4 + y4 + z4 − wyz
xyz + 2 8 + ( yz )2

are examples of expressions in the numbers w, x, y, z (we have to assume $xyz \neq -2$ in the first expression and $y \neq 0$ and $z \neq 0$ in the second expression to avoid dividing by 0). More precisely, the first expression is the number obtained by applying +, −, ×, and ÷ to the numbers x, y, z and to the specific numbers 2 and 16. Similarly, the second expression is the number obtained by applying +, −, ×, and ÷ to the numbers y, z and the specific numbers 5 and $\frac{1}{8}$, and the third expression is the number obtained by applying +, −, ×, and ÷ to the numbers w, y, z and the specific number $\frac{2}{7}$. And so on. Later on in Chapter 9, we shall expand
the meaning of expression after we have defined taking the n-th root.

A notational convention
You may have noticed that the above expressions would be ambiguous unless a
notational convention concerning the arithmetic operations among the symbols
is understood. With the help of parentheses, we can make the order of the arithmetic operations explicit. For example, under this convention,
$$\frac{xy}{xyz + 2} + x^3(16z - y^2) - z^{21}$$
is always understood to mean
$$xy \cdot \{(xyz) + 2\}^{-1} + (x^3)\{(16z) + (-(y^2))\} + \{-(z^{21})\}. \tag{1.1}$$
(The notation $\{A\}^{-1}$ for a nonzero number A stands for the multiplicative inverse of A; see page 270 or Section 2.5 of [Wu-PreAlg].) The ungainly sight of (1.1) should be reason enough for the adoption of this notational convention. Postponing the exact description of this notational convention to Section 1.4 on page 17 so as not to disrupt the flow of the exposition, we may roughly describe this convention as follows: first do the multiplications indicated by the exponents, then the remaining multiplications, and finally the additions. Recall in this connection that subtraction is nothing but addition in disguise, i.e., $a - b = a + (-b)$ by definition, for any two rational numbers a, b (Section 2.3 of [Wu-PreAlg]). Similarly, division is nothing but multiplication in disguise, i.e., the division $\frac{xy}{xyz + 2}$ above is nothing other than the multiplication $xy \cdot (xyz + 2)^{-1}$ (see Section 2.5 of [Wu-PreAlg]).
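Python's operator precedence happens to agree with this convention for the operations used here: `**` is done before `*` and `/`, which are done before `+` and `-`. The sketch below, with hypothetical function names, evaluates the expression as written and in the fully parenthesized form (1.1), using exact fractions; a negative exponent `-1` on a `Fraction` gives its multiplicative inverse.

```python
from fractions import Fraction


def conventional(x, y, z):
    # Relies on the convention: powers, then products/quotients, then sums.
    return x * y / (x * y * z + 2) + x**3 * (16 * z - y**2) - z**21


def fully_parenthesized(x, y, z):
    # Expression (1.1): every grouping made explicit; A ** -1 is the inverse of A.
    return (
        (x * y) * ((x * y * z) + 2) ** -1
        + (x**3) * ((16 * z) + (-(y**2)))
        + (-(z**21))
    )


x, y, z = Fraction(1, 2), Fraction(-3), Fraction(5, 7)  # here xyz + 2 != 0
print(conventional(x, y, z) == fully_parenthesized(x, y, z))  # True
```

Other programming languages may differ in details (for instance, in how a unary minus interacts with powers), so the agreement shown here is specific to Python.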

Meaning of an identity
Now we can give “an approximate definition” of an algebraic identity, or more
simply an identity, as a statement that two given number expressions are equal
for every number in a given collection under discussion (such as all whole num-
bers, all positive numbers, or all numbers13 ) allowing for a small set of exceptions.
We emphasize again that an identity is not a precise concept within mathematics
but a piece of terminology used loosely for convenience. In specific situations,

13 Recall that a number, or a real number, is just a point on the number line.

there will be plenty of opportunities to discern what “the given collection under
discussion” is and what the “small set of exceptions” may be. A few examples
will be given below.
The assertion that ab = ba is true for all numbers a and b is an example of an identity, and so is $\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}$ for all integers k, ℓ, m, n provided $\ell \neq 0$ and $n \neq 0$. Right here, we see an identity that makes allowance for the exceptions of $\ell = 0$ and $n = 0$. More is true. We have just stated the equality $\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}$ for integers k, ℓ, m, n, but we know from considerations of rational quotients14 that this equality remains true even if k, ℓ, m, n are arbitrary rational numbers. Therefore, in this form, this identity is valid for all rational numbers k, ℓ, m, n provided $\ell \neq 0$ and $n \neq 0$. The fact that the identity remains valid for all real numbers is then a consequence of FASM.15 But even here, there is a “small set of exceptions” to this general identity, namely, $\ell = 0$ and $n = 0$.
In case it helps to further illustrate the cavalier manner in which the termi-
nology of identity is used, we give two advanced examples without attempting to
define the relevant concepts. The equality log xy = log x + log y is an identity
for all positive numbers x and y. The equality $1 + \cot^2 x = \csc^2 x$ is an identity for all numbers x except the integer multiples of π.
We now turn to more interesting identities. Consider the computation of the square $104^2$, for example. One can compute it directly, of course. But one can also proceed by appealing to the distributive law, as follows:
$$\begin{aligned}
104^2 = (100 + 4)^2 &= (100 + 4)(100 + 4) \\
&= (100 + 4) \times 100 + (100 + 4) \times 4 && \text{(dist. law)} \\
&= 100^2 + (4 \times 100) + (100 \times 4) + 4^2 && \text{(dist. law again)} \\
&= 100^2 + 2 \times (100 \times 4) + 4^2.
\end{aligned}$$
At this point, it should be possible to mentally finish the computation as 10000 +
800 + 16 = 10816. More than a trick, this idea of computing the square of a
sum using the distributive law turns out to be almost omnipresent in algebraic
manipulations of all kinds. It is a good idea to formalize it once and for all. Exactly the same computation, with arbitrary numbers a and b in place of 100 and 4, gives:
$$(a + b)^2 = a^2 + 2ab + b^2 \quad \text{for all numbers } a \text{ and } b. \tag{1.2}$$
This is our first identity of note.
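Identity (1.2) turns a square into easier squares and products, as in the computation of $104^2$. The sketch below (the function name `square_by_split` is ours) does exactly that; the loop over a small grid of fractions is only a spot check, since finitely many examples can never prove an identity asserted for all numbers.

```python
from fractions import Fraction
from itertools import product


def square_by_split(a, b):
    """Compute (a + b)^2 through the right side of identity (1.2)."""
    return a * a + 2 * a * b + b * b


print(square_by_split(100, 4))  # 10816, i.e., 10000 + 800 + 16

# Spot check (not a proof): compare both sides of (1.2) on a grid of rationals.
grid = [Fraction(p, q) for p in range(-3, 4) for q in (1, 2, 3)]
print(all((a + b) ** 2 == square_by_split(a, b)
          for a, b in product(grid, repeat=2)))  # True
```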
A similar consideration, but one worth pointing out in any case, is the computation of the square of 497, for example. We recognize it as $(500 - 3)^2$, so that
$$\begin{aligned}
497^2 = (500 - 3)^2 &= (500 - 3)(500 - 3) \\
&= (500 - 3) \times 500 - (500 - 3) \times 3 && \text{(dist. law)} \\
&= 500^2 - (3 \times 500) - (500 \times 3) + 3^2 && \text{(dist. law again)} \\
&= 500^2 - 2 \times (500 \times 3) + 3^2.
\end{aligned}$$

14 Recall that these are quotients $\frac{A}{B}$ where A and B are rational numbers. See page 268 of this volume or Section 2.5 of [Wu-PreAlg].
15 See page 265 of this volume or Section 2.7 of [Wu-PreAlg].

(Note that the preceding computation furnishes a good review of the basic arith-
metic of rational numbers: the distributive law for a difference, a(b − c) = ab − ac
for all numbers a, b, c, and the removal of parentheses by −( a − b) = − a + b for
all a, b. See Section 2.4 of [Wu-PreAlg].) Again, we stop the calculation at this
point because it can now be finished in one’s head: 250000 − 3000 + 9 = 247009.
The same computation also leads to:
$$(a - b)^2 = a^2 - 2ab + b^2 \quad \text{for all numbers } a \text{ and } b. \tag{1.3}$$
It is a good illustration of the power of the symbolic notation, and the at-
tendant generality the symbolic method brings, to note that identity (1.3) can
be obtained directly from identity (1.2). Indeed, since the identity $(a + b)^2 = a^2 + 2ab + b^2$ is valid for all numbers a and b, we may replace b by an arbitrary number −c to get
$$(a + (-c))^2 = a^2 + 2a(-c) + (-c)^2 = a^2 - 2ac + c^2.$$
Since $a + (-c) = a - c$ by definition, we get $(a - c)^2 = a^2 - 2ac + c^2$, and since c is arbitrary anyway, we may replace c by b to obtain $(a - b)^2 = a^2 - 2ab + b^2$ for any numbers a and b. Thus we have retrieved identity (1.3) by way of identity (1.2).
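The substitution argument can be mirrored in code: treat the right side of (1.2) as a function of two inputs and feed it $-c$ in place of $b$. In the Python sketch below, the names `rhs_12` and `rhs_13` are hypothetical labels for the right sides of (1.2) and (1.3); the last line redoes the mental computation of $497^2$.

```python
def rhs_12(a, b):
    # Right side of (1.2): a^2 + 2ab + b^2, which equals (a + b)^2.
    return a**2 + 2 * a * b + b**2


def rhs_13(a, b):
    # Right side of (1.3): a^2 - 2ab + b^2, which equals (a - b)^2.
    return a**2 - 2 * a * b + b**2


# Replacing b by -c in (1.2) gives (1.3) with c in the role of b.
print(all(rhs_12(a, -c) == rhs_13(a, c)
          for a in range(-5, 6) for c in range(-5, 6)))  # True
print(rhs_13(500, 3), 497**2)  # 247009 247009
```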


What is the following number?
$$\left(\frac{145}{196}\right)^2 + 2\left(\frac{145}{196}\right)\left(\frac{51}{196}\right) + \left(\frac{51}{196}\right)^2.$$

A third common identity can be introduced by a computation of another kind: $409 \times 391 = {?}$ We recognize that $409 \times 391 = (400 + 9)(400 - 9)$, so that
$$\begin{aligned}
409 \times 391 &= \{(400 + 9) \times 400\} - \{(400 + 9) \times 9\} && \text{(dist. law)} \\
&= 400^2 + (9 \times 400) - (400 \times 9) - 9^2 && \text{(dist. law again)} \\
&= 400^2 - 9^2.
\end{aligned}$$

It follows that 409 × 391 = 160000 − 81 = 159919. The same reasoning carries
over to any two numbers a and b, so that
$$\begin{aligned}
(a + b)(a - b) &= (a + b)a - (a + b)b \\
&= a^2 + ba - ab - b^2 \\
&= a^2 - b^2.
\end{aligned}$$
When the symbolic computation is given in such detail, we see that in passing from the second line to the third, the commutative law for multiplication ($ba = ab$) was used. We have obtained our
third identity:
$$(a + b)(a - b) = a^2 - b^2 \quad \text{for all numbers } a \text{ and } b. \tag{1.4}$$
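Read in the direction used for $409 \times 391$, identity (1.4) converts a product of two numbers equidistant from a midpoint $c$ into a difference of squares. Here is a minimal Python sketch; the helper `product_by_squares(c, d)` is hypothetical and computes $(c + d)(c - d)$ as $c^2 - d^2$:

```python
def product_by_squares(c, d):
    """Return (c + d)(c - d), computed as c^2 - d^2 by identity (1.4)."""
    return c * c - d * d


print(product_by_squares(400, 9))  # 159919 = 160000 - 81
print(409 * 391)                   # 159919, computed directly
```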
The preceding three identities, (1.2)–(1.4), may be considered the most basic
identities in algebra. Note that their usefulness comes not just from the expansion,
$(a + b)^2 = a^2 + 2ab + b^2$, $(a - b)^2 = a^2 - 2ab + b^2$, etc., but even more so from the recognition, for example, that in (1.4), the expression $a^2 - b^2$ in the numbers a and b is equal to a product, $(a + b)(a - b)$. Informally, we may say that the power of the identities (1.2)–(1.4) often results from reading these identities from right to left, i.e., for all numbers a and b,
$$\begin{aligned}
a^2 + 2ab + b^2 &= (a + b)^2, \\
a^2 - 2ab + b^2 &= (a - b)^2, \\
a^2 - b^2 &= (a + b)(a - b).
\end{aligned}$$
The last equality, i.e.,
$$a^2 - b^2 = (a + b)(a - b) \quad \text{for all numbers } a \text{ and } b, \tag{1.5}$$
which is identity (1.4) written backward, is what is known as a factorization or
factoring of $a^2 - b^2$, which merely means expressing $a^2 - b^2$ as a product, in the same sense that 24 = 3 × 8 is a factorization of 24. Knowing such a factorization for a number expression involving two arbitrary numbers a and b can be very useful. Thus, if $a \neq b$, we can simplify the division $\frac{a^2 - b^2}{a - b}$ to
$$\frac{a^2 - b^2}{a - b} = a + b, \tag{1.6}$$
because $a^2 - b^2 = (a + b)(a - b)$, so that we can cancel the (nonzero) number a − b in the numerator and the denominator. Here then is another identity that holds for all a and b except when a = b. We explicitly point out that, insofar as a and b can be rational numbers (say, $\frac{17}{5}$ and $\frac{2}{7}$), we are using the cancellation law for
rational quotients here.16 One cannot over-emphasize the importance of the role
played by complex fractions or rational quotients17 in school mathematics.
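Identity (1.6) with the sample values $a = \frac{17}{5}$ and $b = \frac{2}{7}$ can be checked exactly; here Python's `Fraction` carries out the arithmetic of rational quotients. The sketch also shows the excluded case $a = b$: the quotient then involves division by 0, which Python reports as a `ZeroDivisionError`.

```python
from fractions import Fraction

a, b = Fraction(17, 5), Fraction(2, 7)
quotient = (a**2 - b**2) / (a - b)
print(quotient, a + b, quotient == a + b)  # 129/35 129/35 True

try:
    (a**2 - a**2) / (a - a)  # the case a = b, excluded from (1.6)
except ZeroDivisionError:
    print("undefined when a = b")
```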

Exercises 1.2
In doing these and subsequent exercises, observe the following basic rules:
(a) Use only what you have learned so far in this volume. This is the situation you face when you teach.
(b) Show your work. The explanation is as important as the answer.
(c) Be clear. Get used to the idea that everything you say has to be understood.
(1) Let x and y be numbers so that $x \neq y$ and $x \neq -y$. (i) Simplify $\ldots + \frac{y}{x - y}$. (ii) Simplify $\frac{1}{x^2 - y^2} - \frac{1}{x^2 + y^2}$.
(2) If a is a number, one can compute $\left(a^2 - \frac{5}{3}a - \frac{2}{3}\right)\left(a^2 + \frac{5}{3}a - \frac{2}{3}\right)$ by a
straightforward application of the distributive law. Do you see an easier
way to do this computation? (There is more than one way.)
(3) Simplify for all numbers x and y: (i) $(x + y)^2 + (x - y)^2$. (ii) $(x + y)^2 - (x - y)^2$. Observe that $(x + y)^2 - (x - y)^2 \le (x + y)^2$; in view of (ii), what do you conclude? (Compare Exercise 13 in Section 2.6 of
(4) Is the whole number $9876^7 - 123^7$ a prime number?
16 See page 269 of this volume or Section 2.5 of [Wu-PreAlg].
17 Both concepts are neglected in TSM (see page xi for the definition of TSM).

(5) Mental math: compute $\left(\frac{879}{22}\right)^2 - 2\left(\frac{879}{22}\right)\left(\frac{868}{22}\right) + \left(\frac{868}{22}\right)^2$.
(6) Show that for all numbers x, y, and $c \neq 0$,
$$|x + y|^2 \le \left(1 + \frac{1}{c^2}\right)|x|^2 + (1 + c^2)|y|^2.$$

(7) Can you see why if x and y are any two numbers (in particular, they could be negative), then $\frac{1}{9}x^2 - \frac{1}{12}xy + \frac{1}{64}y^2 \ge 0$?
(8) For numbers a and b, compute ( a + b) . (There is a generalization of
identity (1.2) for any positive integer n that states
$$(a + b)^n = a^n + \binom{n}{1}a^{n-1}b + \binom{n}{2}a^{n-2}b^2 + \cdots + \binom{n}{n-1}ab^{n-1} + b^n,$$
where the numbers $\binom{n}{r}$ for r = 1, 2, . . . , n − 1 are the binomial coefficients
(see page 266). This is called the binomial theorem (see, e.g., Chapter 11
in Volume II of [Wu-HighSchool]). It would be instructive for you to
check that your result for $(a + b)^3$ coincides with the special case of the
binomial theorem for n = 3.)

1.3. Mersenne primes and finite geometric series

A basic identity
Mersenne primes
Finite geometric series

A basic identity
There is an identity that generalizes identity (1.4); it is equally elementary but has far-reaching applications in mathematics. This time, we start with a symbolic calculation using the distributive law twice: if a, b are any two numbers, then
$$\begin{aligned}
(a^2 + ab + b^2)(a - b) &= (a^2 + ab + b^2)a - (a^2 + ab + b^2)b \\
&= (a^3 + a^2b + ab^2) - (a^2b + ab^2 + b^3) \\
&= a^3 - b^3.
\end{aligned}$$
Notice two features in the preceding calculation. First, if we call any of the products separated by two consecutive +’s a term of the number expression,18 e.g., $a^3, a^2b, ab^2, \ldots, b^3$, then the way to remember the expression $a^2 + ab + b^2$ is to observe that the power of a decreases by 1 and the power of b increases by 1 as we go through the terms from left to right. Second, the cancellation in the second line,
$$(a^3 + a^2b + ab^2) - (a^2b + ab^2 + b^3),$$
is due to the matching of each term in the first pair of parentheses with a term in the second pair of parentheses, except for the first term $a^3$ and the last term $b^3$,
18 Recall that since a subtraction is an addition in disguise, this reference to + automatically includes all the −’s.


so that the only survivors at the end are the two terms $a^3 - b^3$. The same pattern repeats itself if we multiply $(a^3 + a^2b + ab^2 + b^3)$ by $(a - b)$. Thus,
$$\begin{aligned}
(a^3 + a^2b + ab^2 + b^3)(a - b) &= (a^3 + a^2b + ab^2 + b^3)a - (a^3 + a^2b + ab^2 + b^3)b \\
&= (a^4 + a^3b + a^2b^2 + ab^3) - (a^3b + a^2b^2 + ab^3 + b^4) \\
&= a^4 - b^4.
\end{aligned}$$
If we form the products
$$(a^4 + a^3b + a^2b^2 + ab^3 + b^4)(a - b), \qquad (a^5 + a^4b + a^3b^2 + a^2b^3 + ab^4 + b^5)(a - b),$$
the results would be $a^5 - b^5$ and $a^6 - b^6$, respectively. Let us write these down. For any two numbers a and b, we have
$$\begin{aligned}
(a - b)(a^2 + ab + b^2) &= a^3 - b^3, \\
(a - b)(a^3 + a^2b + ab^2 + b^3) &= a^4 - b^4, \\
(a - b)(a^4 + a^3b + a^2b^2 + ab^3 + b^4) &= a^5 - b^5, \\
(a - b)(a^5 + a^4b + a^3b^2 + a^2b^3 + ab^4 + b^5) &= a^6 - b^6.
\end{aligned}$$


Verify that $(a - b)(a^4 + a^3b + a^2b^2 + ab^3 + b^4) = a^5 - b^5$.

At this point, it should not be difficult to discern a pattern. So let n be a positive integer, and form the product
$$(a^n + a^{n-1}b + a^{n-2}b^2 + a^{n-3}b^3 + \cdots + ab^{n-1} + b^n)(a - b).$$
Then we get the following sum:
$$(a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n)a - (a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n)b,$$
which, upon applying the distributive law again, becomes:
$$\begin{array}{ccccccc}
a^{n+1} & + a^n b & + a^{n-1}b^2 & + a^{n-2}b^3 & + \cdots & + ab^n & \\
 & - a^n b & - a^{n-1}b^2 & - a^{n-2}b^3 & - \cdots & - ab^n & - b^{n+1}.
\end{array}$$
We now see that the terms which are vertically aligned cancel each other. What is left is then $a^{n+1}$ and $-b^{n+1}$. Thus we have: for any integer $n \ge 1$,
$$a^{n+1} - b^{n+1} = (a - b)(a^n + a^{n-1}b + a^{n-2}b^2 + \cdots + ab^{n-1} + b^n).$$
It is more convenient for subsequent discussions to restate this (with n in place of n + 1) as:
$$a^n - b^n = (a - b)(a^{n-1} + a^{n-2}b + \cdots + ab^{n-2} + b^{n-1}) \tag{1.7}$$
for any numbers a and b, and any integer $n \ge 2$.
The case n = 3 of identity (1.7) comes up so often that we call attention to it by
stating it separately:
$$a^3 - b^3 = (a - b)(a^2 + ab + b^2) \quad \text{for any numbers } a, b. \tag{1.8}$$
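The pattern in (1.7), with powers of a falling and powers of b rising across the terms, translates directly into a loop. This Python sketch (the names `second_factor` and `check_17` are ours) builds the second factor of (1.7) and compares both sides for a range of integers. As before, such checks illustrate the identity but do not replace the cancellation argument above.

```python
from itertools import product


def second_factor(a, b, n):
    """a^(n-1) + a^(n-2) b + ... + a b^(n-2) + b^(n-1): n terms in all."""
    return sum(a ** (n - 1 - i) * b ** i for i in range(n))


def check_17(a, b, n):
    """Compare both sides of identity (1.7) for particular a, b, n."""
    return (a - b) * second_factor(a, b, n) == a**n - b**n


print(second_factor(5, 2, 3))  # 39 = 25 + 10 + 4, the factor in (1.8)
print(all(check_17(a, b, n)
          for a, b in product(range(-4, 5), repeat=2)
          for n in range(2, 9)))  # True
```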
The rest of this section is devoted to two observations about identity (1.7).

Mersenne primes
First, we consider identity (1.7) only when a and b are whole numbers with $a \ge b$. Then of course $a^n - b^n$ is also a whole number for any positive integer n. It may come
as a surprise that (1.7) has very interesting things to say about prime numbers
in this case. Recall that a whole number ≥ 2 is a prime if it has no divisor other
than 1 and itself (see, e.g., Section 3.1 of [Wu-PreAlg]). Therefore, when two
whole numbers a and b satisfy a − b > 1, (1.7) says that $a^n - b^n$ is never a prime when n ≥ 2, because it has a − b as a divisor, and this divisor is smaller than $a^n - b^n$ itself (the second factor in (1.7) is greater than 1). For example, $25^{41} - 6^{41}$ is not a prime because, although we don’t know this big number exactly, we know that 19 (= 25 − 6) is a divisor.
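Python's integers have arbitrary precision, so the claim about $25^{41} - 6^{41}$ can be checked outright, although the point of (1.7) is that no such computation is needed. A minimal sketch:

```python
N = 25**41 - 6**41
print(N % 19)      # 0, so 19 = 25 - 6 divides N
print(1 < 19 < N)  # True, so 19 is a proper divisor and N is not prime

# The same fact via modular exponentiation: 25 and 6 leave the same remainder mod 19.
print(pow(25, 41, 19) == pow(6, 41, 19))  # True
```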
Why is the fact that $a^n - b^n$ is never a prime when n ≥ 2 and a − b > 1 worthy of attention? Because the study of the integers is a primary concern of a major branch of mathematics, number theory, and an important part of number theory is devoted to the understanding of prime numbers. An obvious question about primes is how to decide, simply, whether a given number is prime or not and, if it is not, how to find a divisor. For the second question, no efficient general method is known when the number is very large. There is a silver lining to this failure, however. If we had a fast way to factor large numbers, our daily life might become dramatically different because, for example, banking and online purchasing would not have evolved the way they did (see, e.g., [Wiki-cryptography]). Therefore knowing that a large number such as 21560887 − 1 (it has 426 digits!) is never a prime is something to write home about.


(a) Explain, without using identity (1.7), why the number $391^{87} - 353^{87}$ is not a prime. (b) Verify that $29^2 - 28^2 = 57$ by mental math.

When a = b + 1, then of course (1.7) ceases to give any direct information on the primality of $a^n - b^n$, because the factor (a − b) in (1.7) is equal to 1. However, this does not mean that (1.7) has nothing more to say. Consider, for example, $29^6 - 28^6$. Since
$$29^6 = 29 \cdot 29 \cdot 29 \cdot 29 \cdot 29 \cdot 29 = (29 \cdot 29)^3 = (29^2)^3$$
and similarly $28^6 = (28^2)^3$, we can use (1.8) to conclude that $29^6 - 28^6$ is not a prime because
$$\begin{aligned} 29^6 - 28^6 &= (29^2)^3 - (28^2)^3 \\ &= (29^2 - 28^2)\left((29^2)^2 + 29^2 \cdot 28^2 + (28^2)^2\right), \end{aligned}$$
and 57 (= $29^2 - 28^2$) is therefore a divisor of $29^6 - 28^6$. In general, the same reasoning shows that, if a = b + 1 but n is composite (i.e., it has a divisor other than 1 and itself), then $a^n - b^n$ is not a prime. More precisely, let n = pq, where p and q are integers both > 1. Then $a^n = (a^p)^q$ and $b^n = (b^p)^q$, so that
$$\begin{aligned} a^n - b^n = (a^p - b^p)\big( &(a^p)^{q-1} + (a^p)^{q-2}\, b^p + (a^p)^{q-3} (b^p)^2 \\ &+ \cdots + a^p (b^p)^{q-2} + (b^p)^{q-1} \big). \end{aligned}$$
Thus the number $a^p - b^p$ is greater than 1 (see Exercise 9 on page 17) and is a divisor of $a^n - b^n$; it is not $a^n - b^n$ itself, because the second factor, which is at least $(a^p)^{q-1}$, is also greater than 1.
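The argument for composite exponents can be checked numerically as well. This sketch (our own illustration, not from the text) confirms that 57 divides $29^6 - 28^6$, and then, for a = b + 1 with b ≥ 1 and n = pq with p, q > 1, that $a^p - b^p$ divides $a^n - b^n$ and lies strictly between 1 and $a^n - b^n$.

```python
# The text's example: 57 = 29**2 - 28**2 divides 29**6 - 28**6.
print(29**2 - 28**2)             # 57
print((29**6 - 28**6) % 57)      # 0

# General pattern, assuming a = b + 1 with b >= 1 and n = p * q, p > 1, q > 1.
for b in range(1, 20):
    a = b + 1
    for p in range(2, 5):
        for q in range(2, 5):
            n = p * q
            divisor = a**p - b**p
            # a**p - b**p is a proper divisor: bigger than 1, smaller than a**n - b**n.
            assert 1 < divisor < a**n - b**n
            assert (a**n - b**n) % divisor == 0

print("all composite-exponent checks passed")
```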

There remains the case where a − b = 1 and n is a prime. Then $a^n - (a - 1)^n$ can be prime or composite, depending on a and n. For example, $3^2 - 2^2$ is a prime (it is 5), but $5^2 - 4^2$ is not (it is 9). Similarly, $3^3 - 2^3$ is a prime (it is 19), but $6^3 - 5^3$ is not (it is 91 = 7 · 13). However, the most intriguing case is when a = 2 and b = 1: the numbers $2^p - 1$ (= $2^p - 1^p$), as p runs through all the primes, are interesting for a historical reason. First, observe that, for the first few primes p, we have

$$\begin{aligned} 2^2 - 1 &= 3, \\ 2^3 - 1 &= 7, \\ 2^5 - 1 &= 31, \\ 2^7 - 1 &= 127, \\ 2^{11} - 1 &= 2047, \\ 2^{13} - 1 &= 8191, \\ 2^{17} - 1 &= 131071. \end{aligned}$$

On this list, every number is a prime¹⁹ except the case of p = 11: 2047 = 23 × 89. Those numbers of the form $2^p - 1$ which are primes are called Mersenne primes. Marin Mersenne (1588–1648) was a French monk, a scholar of science and mathematics, and the central clearinghouse of European science and mathematics of his time. There were no scholarly journals in those days, but Mersenne, through his correspondence with the leading scientists and mathematicians of Europe—including Descartes, Pascal, Fermat, and Huygens—helped disseminate the latest discoveries to a wider audience. He came upon the primes that are named after him in his (unsuccessful) search for an expression that would yield only primes. He claimed that when a prime p is at most 257, then $2^p - 1$ is a prime exactly when
$$p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, \text{ and } 257.$$

It turns out that he was wrong about p = 67 and 257, and he also missed p = 61, 89, and 107 ($2^{61} - 1$, $2^{89} - 1$, and $2^{107} - 1$ are all primes). Nevertheless, the interest in Mersenne primes has endured.
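The claims about Mersenne's list can be verified by computer. The sketch below uses trial division for the small cases in the table above, and for larger exponents it uses the Lucas–Lehmer test, a standard test for numbers of the form $2^p - 1$ with p an odd prime that the text does not discuss. The function names are our own. Running the test over all primes p ≤ 257 reproduces the corrections to Mersenne's claim: 67 and 257 drop out, while 61, 89, and 107 appear.

```python
def is_prime(n):
    """Trial division; adequate for the small numbers used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def smallest_divisor(n):
    """Smallest divisor of n greater than 1 (n itself when n is prime); assumes n >= 2."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def lucas_lehmer(p):
    """For a prime p, return True exactly when 2**p - 1 is prime."""
    if p == 2:
        return True                  # 2**2 - 1 = 3; the loop below needs an odd prime p
    m = 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

# The small cases listed in the text, checked by trial division.
for p in [2, 3, 5, 7, 11, 13, 17]:
    m = 2**p - 1
    print(p, m, is_prime(m), smallest_divisor(m))

# Mersenne's range: every prime p <= 257.
exponents = [p for p in range(2, 258) if is_prime(p) and lucas_lehmer(p)]
print(exponents)   # [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```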
The overriding fact about Mersenne primes is that it is not known whether there are infinitely many of them; as of April 2016, only 49 Mersenne primes are known ([Wiki-GIMPS]). This fact colors everything we have to say about these primes. There is an online society devoted to the search for Mersenne primes, the Great Internet Mersenne Prime Search (GIMPS), which has been responsible for the discovery of all the Mersenne primes since 1997 (see [GIMPS], also [Wiki-GIMPS]). The largest known Mersenne prime as of April 2016 has 22,338,618 digits; it corresponds to p = 74,207,281 (discovered on January 7, 2016). Incidentally, this is also the largest known prime number. If it could be proved that there are only finitely many Mersenne primes, then finding the largest one would obviously be of great interest.

19 The primality of these numbers (other than 131071) can be decided with a modicum of patience. The fact that 131071 is a prime was first discovered by Pietro Cataldi (1548–1626) in 1588.

Finite geometric series

It is time to return to the original identity (1.7) for arbitrary numbers a and b. Our second observation about this identity begins by rewriting it as a division:
$$\frac{a^{n+1} - b^{n+1}}{a - b} = a^n + a^{n-1}b + a^{n-2}b^2 + a^{n-3}b^3 + \cdots + ab^{n-1} + b^n \tag{1.9}$$
for any a and b, with a ≠ b, and any positive integer n.

Note that this identity generalizes identity (1.6) on page 10. Now if b = 1 and a ≠ 1, then we get (by writing (1.9) backward):
$$1 + a + a^2 + \cdots + a^{n-1} + a^n = \frac{a^{n+1} - 1}{a - 1} \quad \text{for any number } a \neq 1. \tag{1.10}$$
In this form, identity (1.10) is called a summation formula for the finite geometric series²⁰ of n + 1 terms in a, $1 + a + a^2 + \cdots + a^{n-1} + a^n$. For example, if a = 5 and n = 11, then (using a calculator!)
$$1 + 5 + 5^2 + 5^3 + \cdots + 5^{10} + 5^{11} = \frac{5^{12} - 1}{5 - 1} = \frac{5^{12} - 1}{4}.$$
Since $5^{12} - 1 = 244{,}140{,}624$, we have
$$1 + 5 + 5^2 + 5^3 + \cdots + 5^{10} + 5^{11} = 61{,}035{,}156.$$
If a = −3 and n = 15, then
$$1 - 3 + 3^2 - 3^3 + 3^4 - \cdots + 3^{14} - 3^{15} = \frac{(-3)^{16} - 1}{-3 - 1} = -\frac{43046720}{4} = -10761680.$$
And finally, if a = 3/4 and n = 10, we have
$$1 + \tfrac34 + \left(\tfrac34\right)^2 + \left(\tfrac34\right)^3 + \cdots + \left(\tfrac34\right)^{10} = \frac{\left(\tfrac34\right)^{11} - 1}{\tfrac34 - 1},$$
which is equal to $4\left(1 - \left(\tfrac34\right)^{11}\right) = 3.83\ldots$.
As another example,
$$\begin{aligned} 3^8 + 3^9 + \cdots + 3^{25} = 3^8(1 + 3 + \cdots + 3^{17}) &= 3^8 \cdot \frac{3^{18} - 1}{3 - 1} \\ &= 6561 \times \tfrac12 (387420488) \\ &= 1270932910884. \end{aligned}$$
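Formula (1.10) can be compared with direct addition using exact fractions. In this Python sketch (the function names are our own), `fractions.Fraction` keeps every computation exact, so the closed form and the term-by-term sum can be tested for equality, and the worked examples above can be recomputed.

```python
from fractions import Fraction

def geometric_sum_direct(a, n):
    """1 + a + a**2 + ... + a**n, added term by term."""
    a = Fraction(a)
    return sum(a**k for k in range(n + 1))

def geometric_sum_formula(a, n):
    """Right side of (1.10): (a**(n + 1) - 1) / (a - 1), valid only for a != 1."""
    a = Fraction(a)
    if a == 1:
        raise ValueError("formula (1.10) requires a != 1")
    return (a**(n + 1) - 1) / (a - 1)

# The closed form agrees with direct addition on a range of test values.
for a in [Fraction(-3), Fraction(-1, 2), Fraction(0), Fraction(3, 4), Fraction(5)]:
    for n in range(0, 12):
        assert geometric_sum_formula(a, n) == geometric_sum_direct(a, n)

print(geometric_sum_formula(5, 11))                      # 61035156
print(geometric_sum_formula(-3, 15))                     # -10761680
print(float(geometric_sum_formula(Fraction(3, 4), 10)))  # about 3.83
print(3**8 * geometric_sum_formula(3, 17))               # 1270932910884
```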

20 The reason for calling such a series “geometric” is obscure, and everybody seems to be—at best—guessing. The most reasonable guess, to me, is the picture of the sequence of segments in

In summary: Identity (1.7) exemplifies the power of the symbolic notation (or more generally, the power of abstraction). It is a useful identity in its own right; for example, it can be used to find the derivative of a polynomial in calculus. When a and b are restricted to be whole numbers, the identity leads to the factorizations of many whole numbers and also leads to the consideration of Mersenne primes. Finally, in the form of (1.9), it also tells us how to sum a finite geometric series. In any case, this identity comes up naturally on many occasions. Geometric series appear often in both science and mathematics and, for this reason, identity (1.7) belongs in the tool kit of every teacher and student.

Exercises 1.3

In doing these and subsequent exercises, observe the following basic rules:

(a) Use only what you have learned so far in this volume. This is the situation you face when you teach.
(b) Show your work. The explanation is as important as the answer.
(c) Be clear. Get used to the idea that everything you say has to be understood.

(1) If y is a nonzero number, what is $1 + \frac{1}{y} + \frac{1}{y^2} + \frac{1}{y^3} + \cdots + \frac{1}{y^{19}}$?
(2) In the last example of this section, we found $3^8 + 3^9 + \cdots + 3^{25} = 3^8\,\frac{3^{18} - 1}{3 - 1}$. Can you find another way to prove this?


(3) (a) Sum $5^6 + 5^7 + 5^8 + \cdots + 5^{27}$. (b) Sum $\frac{1}{2^{15}} + \frac{1}{2^{16}} + \frac{1}{2^{17}} + \cdots + \frac{1}{2^{32}}$.
(4) If y is a nonzero number and n is an integer ≥ 3, what is
$$\frac{1}{y^3} + \frac{1}{y^4} + \cdots + \frac{1}{y^n}?$$

(5) Sum $\frac{1}{4^3} - \frac{1}{4^5} + \frac{1}{4^7} - \frac{1}{4^9} + \cdots - \frac{1}{4^{33}}$.
(6) If x is a nonzero number and n is a positive integer, what is
$$-1 + \frac{1}{x^3} - \frac{1}{x^6} + \frac{1}{x^9} - \cdots + \frac{1}{x^{3(2n-1)}} - \frac{1}{x^{3(2n)}}?$$

(7) Show that, for any positive integer n, $\frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^n} < 1$. (Note: a popular representation of this inequality is the following picture:
[figure not reproduced]
Do you see the relationship between the two? Explain.)

(8) Show that if a and n are integers so that a ≥ 3 and n ≥ 2, then $a^n - 1$ is never a prime.
(9) Show that if a and b are positive integers and a > b, and if p is an integer > 1, then $a^p - b^p > 1$.

1.4. Polynomials and order of operations

Order of operations and other conventions

Monomials and polynomials
Factoring quadratic polynomials

Order of operations and other conventions

Before we can define polynomials, we have to set up some more notational conventions. Underlying the whole discussion of polynomials will be a simple observation based on the distributive law, and we deal with this first. Suppose we have a sum such as
$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3).$$
One can compute this sum by multiplying out each term $18 \times 5^3$, $5^3 \times 23$, and $69 \times 5^3$, and then adding the resulting numbers to get
$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3) = 2250 + 2875 + 8625 = 13750.$$
Now if we reflect for a moment, we will realize that we wasted precious time doing three multiplications before adding. If we apply the distributive law, then the computation becomes easier:
$$\begin{aligned} (18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3) &= (18 + 23 + 69) \times 5^3 \\ &= 110 \times 125 = 13750. \end{aligned}$$
(Notice that we have made use of the commutative law of multiplication to change $5^3 \times 23$ to $23 \times 5^3$ in the process.) You may think that with the advent of high-speed computers, it doesn't make any difference whether we get the answer by multiplying three times and then adding, or (as in the second case) adding first and then multiplying once. This is true, but the difference in conceptual clarity between
$$(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3) \quad \text{and} \quad (18 + 23 + 69) \times 5^3$$
is enormous. This is because multiplication is a far more complicated concept than addition; for example, on the level of whole numbers, every whole number is just a sum of 1's, but every whole number greater than 1 is a product of primes (the Fundamental Theorem of Arithmetic; see page 270), and primes are complicated. On a more mundane level, 234 + 677 merely means lumping 234 and 677 together, but 234 × 677 means adding 234 copies of 677. It is therefore conceptually simpler to add first and multiply once than to multiply three times and then add. In addition, the second way of writing, $(18 + 23 + 69) \times 5^3$, is more succinct, and therefore clearer. Because conceptual clarity is very important in learning and doing mathematics, whenever we see terms involving the same number raised to a fixed power (such as $5^3$ in $(18 \times 5^3) + (5^3 \times 23) + (69 \times 5^3)$), we will always collect them together by the use of the distributive law. For example, we will always rewrite
$$(181 \times 2^5) + (67 \times 2^5) + (2^5 \times 96) - (257 \times 2^5)$$
as a product,
$$(181 + 67 + 96 - 257) \times 2^5 \quad (= 87 \times 2^5).$$

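Similarly, we write
$$24 \times 59^{14} - \left(\tfrac35\right)^8 \times 89 + (59^{14} \times 73) + (59^{14} \times 66) + 25 \times \left(\tfrac35\right)^8 + \left(\tfrac35\right)^8 \times 11$$
as a sum of two terms:
$$(163 \times 59^{14}) - 53 \times \left(\tfrac35\right)^8,$$
where 163 = 24 + 73 + 66 and −53 = −89 + 25 + 11. Recall that we consider a subtraction to be a “sum” because $-\left(\tfrac35\right)^8 \times 89 = +\left(-\left(\tfrac35\right)^8 \times 89\right)$.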
In an entirely similar manner, suppose we are given a sum of multiples of whole number powers of a fixed number x, where multiple here means simply multiplication by any number and not necessarily by a whole number. Then we would automatically collect together the terms involving the same power of x as before. For example, we would rewrite
$$\tfrac12 x^3 + 16 - 8x^2 + \tfrac13 x^3 - x^5 - 6x^2 + 75x + 2x^3$$
as
$$-x^5 + \tfrac{17}{6}x^3 - 14x^2 + 75x + 16.$$
Observe that we have implicitly followed three conventions in writing the latter sum involving the powers of a fixed number x:
(i) Parentheses are suppressed with the understanding that exponents be computed first, multiplications second, and additions third. (This is the so-called order of operations, and was already mentioned on page 7.)
(ii) Powers of x are placed last in each term (so that instead of $-x^2\,14$, we write $-14x^2$).
(iii) The terms are written in decreasing powers²¹ of the number x in question. (We make the ad hoc definition in this situation that $x^0 = 1$ regardless of whether x is 0 or not.²² The term 16 is then the term $16x^0$; incidentally, this is where we need the concept of the zeroth power of x.)
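Collecting terms by powers of x, as in the example above, is mechanical enough to sketch in code. The helpers `collect_terms` and `show` below are hypothetical, not from the text: each term is a pair (coefficient, power), coefficients of equal powers are added (the distributive law), zero coefficients are dropped, and the result is listed in decreasing powers of x as in convention (iii).

```python
from collections import defaultdict
from fractions import Fraction

def collect_terms(terms):
    """Combine (coefficient, power) pairs with the same power of x, then list the
    result in decreasing powers of x, omitting terms whose coefficient is 0."""
    totals = defaultdict(Fraction)          # Fraction() is 0
    for coefficient, power in terms:
        totals[power] += Fraction(coefficient)
    return [(totals[p], p) for p in sorted(totals, reverse=True) if totals[p] != 0]

def show(collected):
    """Write the collected terms strictly as a sum of multiples of powers of x."""
    return " + ".join(f"({c})x^{p}" for c, p in collected)

# (1/2)x^3 + 16 - 8x^2 + (1/3)x^3 - x^5 - 6x^2 + 75x + 2x^3
terms = [(Fraction(1, 2), 3), (16, 0), (-8, 2), (Fraction(1, 3), 3),
         (-1, 5), (-6, 2), (75, 1), (2, 3)]
print(show(collect_terms(terms)))
# (-1)x^5 + (17/6)x^3 + (-14)x^2 + (75)x^1 + (16)x^0
```

The printed form keeps every coefficient and writes $x^1$ and $x^0$ explicitly; conventions (v) and (vi) below are what allow the shorter form $-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$.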

Monomials and polynomials

Let x be a number. A multiple of a single nonnegative power of x, such as $58x^{12}$, is called a monomial. The number in front of a power of x is called the coefficient of that particular power of x; thus 58 is the coefficient of $x^{12}$ in $58x^{12}$. A (finite) sum of multiples of whole number powers of x is called a polynomial in x. We emphasize that, in this terminology, the definition in (iii) above says that, for a polynomial in x, we define $x^0 = 1$ regardless of whether x is 0 or not. A monomial is a polynomial with only one term.
The highest power of x with a nonzero coefficient in a polynomial is called the degree of the polynomial. The terminology about “nonzero coefficient” refers to the fact that the polynomial of the last subsection, $-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$, could be written as $0 \cdot x^{37} - x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16$, but the 37th power of x clearly doesn't count. This polynomial has degree 5, and not 37 (and not any whole number different from 5, for that matter). Moreover, −1 is the coefficient of $x^5$, 0 is the coefficient of $x^4$, and −14 is the coefficient of $x^2$, because, strictly as a sum of the powers of x, this polynomial is, in reality,
$$(-1)x^5 + 0x^4 + \tfrac{17}{6}x^3 + (-14)x^2 + 75x + 16x^0.$$
Similarly, 16 is the coefficient of $x^0$.
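The role of “nonzero coefficient” in the definition of degree can be made concrete. In this sketch a polynomial is stored as a dictionary from powers of x to coefficients (our own representation), and the degree is the largest power whose coefficient is not 0, so a stray $0 \cdot x^{37}$ term does not count. Returning `None` for the zero polynomial is our own choice, since the text does not assign it a degree.

```python
from fractions import Fraction

def degree(coefficients):
    """coefficients maps each power of x to its coefficient.
    Return the highest power whose coefficient is nonzero (None if all are zero)."""
    nonzero_powers = [p for p, c in coefficients.items() if c != 0]
    return max(nonzero_powers) if nonzero_powers else None

# 0*x^37 - x^5 + 0*x^4 + (17/6)x^3 - 14x^2 + 75x + 16x^0
poly = {37: 0, 5: -1, 4: 0, 3: Fraction(17, 6), 2: -14, 1: 75, 0: 16}
print(degree(poly))   # 5, not 37
print(poly[4])        # 0, the coefficient of x^4
```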
As is well known, a polynomial of degree 1 is called a linear polynomial, and that of degree 2 is called a quadratic polynomial. Because a general quadratic polynomial has only three terms $ax^2 + bx + c$ (where a, b, and c are constants), it is sometimes called a trinomial in school mathematics. It must be said that the terminology of “trinomial” is not one that is used in advanced mathematics, so you should avoid using it as much as possible. We will discuss quadratic polynomials in some detail in the last chapter (Chapter 10). A polynomial of degree 3 is called a cubic polynomial.
There is no reason why we must restrict ourselves to polynomials in one variable. If x, y, z, etc., are numbers, then sums of multiples of the products of nonnegative powers of x, y, z, etc., are called polynomials in x, y, z, etc. For example, $19x^3y^{21} - 8y^9z^5 - xyz + 31$ is such a polynomial.

21 This is a good rule most of the time but not all the time. There will be times when we want to write such sums in increasing powers of a number x.
22 For a fuller discussion of the zeroth power of a number, see Chapter 9.
Here we should make a comment on the order of operations, a convention about
the particular order of carrying out the arithmetic operations on a polynomial.
This is a topic in school mathematics that has been as wrongly over-emphasized
as the insistence on having all fractions reduced to lowest terms.²³ This is just a
convention and, like all other conventions in mathematics, it has no mathematical
substance. You should try to explain to your students, as clearly as you can, what
this convention is all about, why we adopt it, and then go on to spend time on
more important topics, such as those in the remaining chapters of this volume.
While we are discussing conventions, we may mention a few others:
(iv) In symbolic expressions, we usually use a dot · in place of × for the multiplication between specific numbers, e.g., 24 · 95 instead of 24 × 95.
(v) We usually omit even the dot · between a letter and a number, e.g., write $42x^2$ instead of $42 \cdot x^2$, unless we wish to achieve an extra degree of clarity.
(vi) We also write 1x simply as x, and we agree to omit all terms of the form $0x^m$ where m is any whole number.
The reason for (iv) is to avoid confusing the letter x with the multiplication symbol ×.
You have seen polynomials before. The so-called expanded form of a multidigit whole number such as 75018 is a special example of a polynomial in the number 10:
$$(7 \cdot 10^4) + (5 \cdot 10^3) + (0 \cdot 10^2) + (1 \cdot 10^1) + (8 \cdot 10^0).$$
This is a fourth-degree polynomial in the number 10. Similarly, the expanded form of any k-digit whole number is a special polynomial of degree (k − 1) in 10. We refer to these polynomials as “special” because, by the requirement of the Hindu-Arabic numeral system, the coefficient of any power of 10 in the expanded form must be a single-digit whole number, whereas a general polynomial in 10 could be one like the following:
5 · 106 + 293 · 103 + · 102 − 9.
This is not the expanded form of a whole number. To further illustrate this point, observe that none of the following polynomials in 10 is the expanded form of a whole number:
$$(35 \cdot 10^2) + (2 \cdot 10^1) + (8 \cdot 10^0),$$
$$(3 \cdot 10^3) - (6 \cdot 10^2) + (7 \cdot 10^1) + (4 \cdot 10^0),$$
$$(4 \cdot 10^3) + (2 \cdot 10^2) + \left(\tfrac23 \cdot 10^1\right) + (7 \cdot 10^0).$$
The first is not the expanded form of a whole number because 35 is not a single-digit number, the second because the coefficient of $10^2$ is −6, which is not a whole number, and the third because 2/3 is not a whole number. However, if we choose to rewrite the first of these three polynomials in 10 as
$$(3 \times 10^3) + (5 \times 10^2) + (2 \times 10^1) + (8 \times 10^0),$$
then it would be the expanded form of 3528.

23 The obsession in TSM (see page xi) with order of operations has no mathematical merit; this terminology (order of operations) is in fact unknown to most working mathematicians. For a fuller discussion of the issues involved, see [Wu2004].
In the same vein, the so-called complete expanded form of a finite decimal with at least one nonzero digit after the decimal point, such as 32.58,
$$(3 \cdot 10^1) + (2 \cdot 10^0) + (5 \cdot 10^{-1}) + (8 \cdot 10^{-2}),$$
is not a polynomial in 10, for the reason that it contains negative powers of 10.
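The difference between an expanded form and a general polynomial in 10 can be illustrated in Python. In this sketch (the helper names are ours), a polynomial in 10 is a list of coefficients, highest power first; `evaluate_in_10` computes its value by Horner's rule, a standard way of evaluating a polynomial that the text does not use, and `is_expanded_form` checks only the single-digit requirement. Negative powers of 10, as in the decimal example just given, are outside this representation.

```python
def expanded_form(n):
    """Coefficients of the expanded form of a whole number n, highest power of 10 first."""
    return [int(digit) for digit in str(n)]

def evaluate_in_10(coefficients):
    """Value of c_k*10^k + ... + c_1*10 + c_0 for coefficients listed highest power first.
    Any coefficients are allowed, not only single digits (Horner's rule)."""
    value = 0
    for c in coefficients:
        value = value * 10 + c
    return value

def is_expanded_form(coefficients):
    """True when every coefficient is a single-digit whole number 0, 1, ..., 9."""
    return all(isinstance(c, int) and 0 <= c <= 9 for c in coefficients)

print(expanded_form(75018))                   # [7, 5, 0, 1, 8]
print(evaluate_in_10([35, 2, 8]))             # 3528
print(is_expanded_form([35, 2, 8]))           # False: 35 is not a single digit
print(is_expanded_form([3, -6, 7, 4]))        # False: -6 is not a whole number
print(is_expanded_form(expanded_form(3528)))  # True
```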
Because polynomials are just numbers, we can add, subtract, multiply, and
divide them as usual. With the exception of division, the other three arithmetic
operations produce another polynomial in a routine manner.


Take a few minutes to verify the preceding statement that the sum, differ-
ence, and product of two polynomials in the same number x are polynomials
in x.
Division of polynomials does not generally produce a polynomial and will be
looked at separately in the next section.

Factoring quadratic polynomials

In this subsection, we consider the multiplication of polynomials—specifically, linear polynomials—and then read the results backward. If a and b (respectively, c and d) are the coefficients of the linear polynomial (i.e., polynomial of degree 1) ax + b (resp. cx + d) in x, with a ≠ 0 and c ≠ 0, then
$$\begin{aligned} (ax + b)(cx + d) &= (ax + b)(cx) + (ax + b)d && \text{(dist. law)} \\ &= acx^2 + bcx + adx + bd && \text{(dist. law)} \\ &= acx^2 + (ad + bc)x + bd && \text{(dist. law)}. \end{aligned}$$
Because ac ≠ 0, the product is a quadratic polynomial. Of course we had to collect terms of the same degree using the distributive law and rearrange the terms so that they are in descending powers of x in accordance with convention (iii) on page 19. The main point is to emphasize the role played by the distributive law and to showcase the fact that multiplying polynomials is no different from the usual operations with numbers. If the arithmetic of numbers (whole numbers and fractions) is taught correctly, such operations with polynomials are just more of the same and would not be a problem. In particular, the uncivilized mnemonic device called FOIL is to be studiously avoided; instead, learn how to use the distributive law.
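Multiplying polynomials by the distributive law amounts to multiplying every term of one factor by every term of the other and then collecting terms by power. A minimal sketch, assuming a polynomial is stored as a list of coefficients with the coefficient of $x^k$ at index k (our own convention, lowest power first), and using the hypothetical name `multiply`:

```python
from fractions import Fraction

def multiply(p, q):
    """Multiply polynomials given as coefficient lists, lowest power first
    (p[k] is the coefficient of x^k). Each term of p is multiplied by each term
    of q (the distributive law), and products with equal powers are collected."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b      # a x^i times b x^j contributes to x^(i+j)
    return product

# (15x - 8)(x + 12), written lowest power first as [-8, 15] and [12, 1].
print(multiply([-8, 15], [12, 1]))             # [-96, 172, 15], i.e. 15x^2 + 172x - 96

# (2x - 3)((1/4)x + 1), with a rational coefficient.
print(multiply([-3, 2], [1, Fraction(1, 4)]))  # [-3, Fraction(5, 4), Fraction(1, 2)]
```

The second result says $(2x - 3)(\frac14 x + 1) = \frac12 x^2 + \frac54 x - 3$; no mnemonic is involved, only the double loop over pairs of terms.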
We have mentioned the need to sometimes look at an equality backward, i.e., instead of just reading it from left to right, we should also take note of the fact that the right side is equal to the left side. Now we will have to repeat this message. What we obtained above,
$$(ax + b)(cx + d) = acx^2 + (ad + bc)x + bd, \tag{1.11}$$
is nothing but routine applications of the distributive law. However, when this equality is read backward, it becomes the statement that the sum of the three terms on the right of (1.11) is actually equal to the product of the two linear polynomials on the left, i.e.,
$$acx^2 + (ad + bc)x + bd = (ax + b)(cx + d). \tag{1.12}$$
This is not a priori obvious. For example,
$$15x^2 + 172x - 96 = (15x - 8)(x + 12).$$
In general, if the polynomials p(x), q(x), and r(x) in x satisfy p(x) = q(x)r(x), then we say that q(x)r(x) is a factorization of p(x) if the degrees of both q(x) and r(x) are positive; the polynomials q(x) and r(x) are called the factors of the polynomial p(x). (Thus $\frac53 x^3 - 2x^2 + \frac23 = \left(\frac13\right)(5x^3 - 6x^2 + 2)$ is not a factorization of $\frac53 x^3 - 2x^2 + \frac23$, because the degree of $\frac13$ is zero.) Compare the comments made in connection with identity (1.5) on page 10.
In this terminology, the equation (1.12) gives a factorization of $acx^2 + (ad + bc)x + bd$ as a product (ax + b)(cx + d), where it is understood that a ≠ 0 and c ≠ 0. For example, we get
$$\tfrac12 x^2 + \tfrac54 x - 3 = (2x - 3)\left(\tfrac14 x + 1\right)$$
by letting a = 2, b = −3, c = 1/4, and d = 1. With some practice, the factorization of $\frac12 x^2 + \frac54 x - 3$ can be done directly. One way is the following. Since it is much easier to deal with integers than with rational numbers, we rewrite the polynomial by using the distributive law to take out the denominators of all the coefficients, as follows:
$$\tfrac12 x^2 + \tfrac54 x - 3 = \tfrac14(2x^2 + 5x - 12).$$
Then we recognize that
$$2x^2 + 5x - 12 = (2x - 3)(x + 4)$$
because, assuming there is such a factorization into polynomials with integer coefficients, we learn from equation (1.12) that the zero-degree term (i.e., −12) of $2x^2 + 5x - 12$ has to be the product of two integers that are the zero-degree terms of the factors—thus ±3 and ∓4, or ±2 and ∓6, or ±1 and ∓12. Likewise, the coefficient 2 of $x^2$ in $2x^2 + 5x - 12$ has to be the product of the coefficients of x in the factors—±2 and ±1. Finally, the coefficient 5 of x in $2x^2 + 5x - 12$ has to be the sum of the “cross products” of these four numbers in the sense of (ad + bc) in (1.11) above. So a little trial and error should get it done. Hence, we obtain
$$\tfrac12 x^2 + \tfrac54 x - 3 = \tfrac14(2x^2 + 5x - 12) = \tfrac14(2x - 3)(x + 4),$$
which is the same factorization as above.
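The trial-and-error search just described can be sketched as a small program. Assuming A, B, C are integers with A ≠ 0 and C ≠ 0, the hypothetical function `factor_quadratic` below tries every integer divisor a of A and b of C, sets c = A/a and d = C/b, and checks whether ad + bc = B, exactly as dictated by (1.11) and (1.12). It only looks for factorizations with integer coefficients.

```python
def divisors(n):
    """All positive and negative integer divisors of a nonzero integer n."""
    n = abs(n)
    positive = [d for d in range(1, n + 1) if n % d == 0]
    return positive + [-d for d in positive]

def factor_quadratic(A, B, C):
    """Search for integers a, b, c, d with ac = A, bd = C, ad + bc = B, so that
    Ax^2 + Bx + C = (ax + b)(cx + d). Assumes A != 0 and C != 0.
    Returns the first (a, b, c, d) found, or None if no such integers exist."""
    for a in divisors(A):
        c = A // a                       # exact, since a divides A
        for b in divisors(C):
            d = C // b                   # exact, since b divides C
            if a * d + b * c == B:       # the "cross products" of (1.11)
                return (a, b, c, d)
    return None

print(factor_quadratic(2, 5, -12))   # (1, 4, 2, -3): (x + 4)(2x - 3)
print(factor_quadratic(1, 2, -35))   # (1, 7, 1, -5): (x + 7)(x - 5)
print(factor_quadratic(1, 1, 1))     # None: x^2 + x + 1 has no such factorization
```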
At present, the teaching of factoring quadratic polynomials with integer coefficients figures prominently, not to say obsessively, in a typical algebra course. For this reason, some perspective on this subject is called for. All that those exercises in factoring
$$Ax^2 + Bx + C = (ax + b)(cx + d)$$
can do for students is to help them learn to decompose two integers A and C into products of integers A = ac and C = bd so that B = ad + bc, i.e.,
$$Ax^2 + Bx + C = acx^2 + (ad + bc)x + bd.$$
There is no denying that beginning students ought to acquire some facility with decomposing integers into products of other integers. It is also important that they can effortlessly factor a simple quadratic polynomial such as $x^2 + 2x - 35$ into $(x + 7)(x - 5)$. But it often happens that although a little bit of something is good, a lot of it can actually be bad for you. (Think of fluoride in your drinking water.) This seems to be the case here: the teaching of a small skill gets blown up to be a major topic, with the consequence that other topics that are more central and more substantial (such as learning about the graphs of linear equations, solving constant rate problems correctly, or the effective use of completing the square) get slighted. The teaching of algebra should avoid this pitfall. Please also keep in mind the fact that once the quadratic formula becomes available (see Theorem 10.3 on page 234), there will be an algorithm to accomplish this factorization (in all the cases where factoring is possible) no matter what the coefficients of the quadratic polynomial may be.
We give one more illustration of the multiplication of polynomials where each step except the last makes use of the distributive law:
$$\begin{aligned} \left(5x^3 - \tfrac12 x\right)(x^2 + 2x - 4) &= \left(5x^3 - \tfrac12 x\right)x^2 + \left(5x^3 - \tfrac12 x\right)2x - \left(5x^3 - \tfrac12 x\right)4 \\ &= \left(5x^5 - \tfrac12 x^3\right) + (10x^4 - x^2) - (20x^3 - 2x) \\ &= 5x^5 + 10x^4 - \tfrac{41}{2}x^3 - x^2 + 2x. \end{aligned}$$
Now, reading this equality backward gives a factorization that is (for a change) not so easy:
$$5x^5 + 10x^4 - \tfrac{41}{2}x^3 - x^2 + 2x = \left(5x^3 - \tfrac12 x\right)(x^2 + 2x - 4).$$

Note the fact that if p(x) and q(x) are polynomials of degree m and n, respectively, then the degree of the product p(x)q(x) is (m + n). In other words, the degree of a product is the sum of the degrees of the individual polynomial factors. For example, the preceding calculation, which multiplies a degree 3 polynomial by a degree 2 polynomial, yields a polynomial of degree 5 (= 3 + 2).
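The rule that degrees add under multiplication can be checked on the example above. This sketch uses a coefficient-list representation (index k holds the coefficient of $x^k$, our own convention), exact fractions for the coefficient $-\frac12$, and hypothetical helper names `multiply` and `degree`.

```python
from fractions import Fraction

def multiply(p, q):
    """Coefficient lists, lowest power first; the distributive law, term by term."""
    product = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product

def degree(p):
    """Highest power with a nonzero coefficient (None for the zero polynomial)."""
    nonzero = [k for k, c in enumerate(p) if c != 0]
    return max(nonzero) if nonzero else None

p = [0, Fraction(-1, 2), 0, 5]    # 5x^3 - (1/2)x
q = [-4, 2, 1]                     # x^2 + 2x - 4
pq = multiply(p, q)                # coefficients of 2x - x^2 - (41/2)x^3 + 10x^4 + 5x^5
print(degree(p), degree(q), degree(pq))   # 3 2 5
```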


Discuss whether the sum of two n-th degree polynomials is always an n-th
degree polynomial.

Exercises 1.4

(1) Factor the following expressions in a number x: (i) $4x^2 - 12x + 9$. (ii) $25x^2 + 40x + 16$. (iii) $81x^2 - 121$. (iv) $(4x^2 - 9)^2 - 5x(4x^2 - 9) + 4$.
(2) Factor $a^3 + b^3$ for any numbers a and b. Factor $a^{2n+1} + b^{2n+1}$ for any positive integer n. Show that $5^{89} + 6^{89}$ is not a prime.
(3) (i) Factor $x^2 - 5xy + 6y^2$ for any numbers x and y. (ii) Factor $s^4 + s^2t^2 + t^4$ for any numbers s and t. (Hint: Expand $(s^2 + t^2)^2$.) Factor $s^{4k} + s^{2k}t^{2k} + t^{4k}$ for any positive integer k.
(4) If x, y, and z are numbers, compute (x + y + z)(x − y − z). (Obviously, you can compute it by brute force. Equally obviously, such is not the expectation of this exercise; see if you can do better than using brute force.)
(5) Let a, b, c be three one-digit numbers, no two of them the same. Form all
six distinct two-digit numbers by using these three digits, and add these
six numbers. If you divide the sum of these six numbers by the sum of
the three digits a + b + c, what number do you get?

1.5. Rational expressions

A quotient (i.e., division) of two polynomials in a number x is called a rational expression in x. Here is an example:
$$\frac{3x^5 + 16x^4 - 25x^2 - 7}{x^2 - 1}$$
We note that in the case of rational expressions, we need to exercise some care in not allowing division by 0 to take place. For example, in the preceding rational expression, x can be any number except ±1 because if x = ±1, then $x^2 - 1 = 0$ and the denominator would be 0.


Prove that ±1 are the only numbers that satisfy $x^2 - 1 = 0$. (Remember: we don't know how to solve quadratic equations yet.)

Convention: In writing rational expressions in x, it is understood that only those numbers x for which the denominator is nonzero are considered.
In middle school, we are mainly interested in rational numbers and, as a consequence, all computations with numbers tacitly assume that the numbers involved are rational numbers. With this in mind, since x is a (rational) number, a rational expression is just a rational quotient (see page 268 of this volume or Section 2.5 of [Wu-PreAlg]) and can therefore be added, subtracted, multiplied, and divided like any other rational quotient. For example, in case x = 1/2 in the foregoing rational expression, we would be looking at the rational quotient
$$\frac{3\left(\frac{1}{32}\right) + 16\left(\frac{1}{16}\right) - 25\left(\frac14\right) - 7}{\left(\frac14\right) - 1},$$
which is equal to $16\frac{5}{24}$, by the formulas for rational quotients (page 270). In general, no matter what x may be, we can likewise compute with rational expressions:
$$\frac{5x^3 + 1}{x^8 + x - 2} + \frac{2x^7}{x^3 + 4} = \frac{(5x^3 + 1)(x^3 + 4) + (2x^7)(x^8 + x - 2)}{(x^8 + x - 2)(x^3 + 4)},$$
$$\frac{x^2 + 1}{x^2 + 4x - 7} \cdot \frac{6}{3x^4 - 5} = \frac{(x^2 + 1)(6)}{(x^2 + 4x - 7)(3x^4 - 5)},$$
$$\frac{\;\frac{2x + 1}{x^2 - 3}\;}{\;\frac{4x^3 - x + 11}{2x}\;} = \frac{(2x + 1)(2x)}{(x^2 - 3)(4x^3 - x + 11)}.$$
These are just computations with rational quotients. At the risk of belaboring a point, we emphasize that these computations are exactly the same as those with rational quotients and not just “analogous to” them. There is so much in introductory algebra that is just a revisit of arithmetic.
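Because a rational expression evaluated at a rational number x is just a rational quotient, exact fraction arithmetic reproduces the computations above. In this sketch the function name `r` is ours; it evaluates the example at x = 1/2 and then spot-checks the addition formula at a few rational values of x where no denominator vanishes.

```python
from fractions import Fraction

def r(x):
    """The rational expression (3x^5 + 16x^4 - 25x^2 - 7) / (x^2 - 1); needs x != 1, -1."""
    return (3 * x**5 + 16 * x**4 - 25 * x**2 - 7) / (x**2 - 1)

value = r(Fraction(1, 2))
print(value)                                        # 389/24
whole, rest = divmod(value.numerator, value.denominator)
print(whole, Fraction(rest, value.denominator))     # 16 5/24

# Spot-check the addition formula for rational expressions at a few rational x.
for x in [Fraction(1, 3), Fraction(-2, 5), Fraction(7)]:
    left = (5 * x**3 + 1) / (x**8 + x - 2) + (2 * x**7) / (x**3 + 4)
    right = (((5 * x**3 + 1) * (x**3 + 4) + (2 * x**7) * (x**8 + x - 2))
             / ((x**8 + x - 2) * (x**3 + 4)))
    assert left == right
```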
Because the cancellation law is valid for rational quotients (i.e., $\frac{CA}{CB} = \frac{A}{B}$ for all rational numbers A, B, and C, with B ≠ 0 and C ≠ 0),²⁴ some rational expressions can be simplified. Sometimes the cancellation presents itself, as in
$$\frac{(5x^4 - x^3 + 2)(2x - 15)}{(14x^2 + 3x - 28)(5x^4 - x^3 + 2)}.$$
Here, the nonzero number $(5x^4 - x^3 + 2)$ in both the numerator and denominator can be cancelled,²⁵ resulting in
$$\frac{(5x^4 - x^3 + 2)(2x - 15)}{(14x^2 + 3x - 28)(5x^4 - x^3 + 2)} = \frac{2x - 15}{14x^2 + 3x - 28}.$$

Sometimes, the cancellation can be less obvious. For example, the rational expression
$$\frac{x^3 - 8}{x^2 + 2x + 4}$$
can be simplified to x − 2 because, by identity (1.8) on page 12,
$$x^3 - 8 = x^3 - 2^3 = (x - 2)(x^2 + 2x + 4),$$
and we can cancel the nonzero number $(x^2 + 2x + 4)$ from the numerator and denominator. (As we will see when we come to Chapter 10—more precisely, page 232—it turns out that $x^2 + 2x + 4$ is never equal to 0. Therefore, we actually have an identity $\frac{x^3 - 8}{x^2 + 2x + 4} = x - 2$ for all x.)
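A numerical spot check (not a proof) of the simplification $\frac{x^3 - 8}{x^2 + 2x + 4} = x - 2$: the sketch below tests 205 rational values of x exactly with `fractions.Fraction`. The code comment uses the standard completing-the-square identity $x^2 + 2x + 4 = (x + 1)^2 + 3$ to explain why the denominator never vanishes.

```python
from fractions import Fraction

# Check (x^3 - 8) / (x^2 + 2x + 4) == x - 2 on a grid of rational numbers x.
# The denominator equals (x + 1)^2 + 3, which is at least 3, so it is never 0.
samples = [Fraction(n, d) for n in range(-20, 21) for d in range(1, 6)]
for x in samples:
    denominator = x**2 + 2 * x + 4
    assert denominator >= 3
    assert (x**3 - 8) / denominator == x - 2

print("checked", len(samples), "values of x")   # checked 205 values of x
```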

24 Again, see page 269 or Section 2.5 of [Wu-PreAlg].

25 Remember: By our convention, we only consider those x so that $5x^4 - x^3 + 2 \neq 0$.

In introductory algebra, students are too often required to automatically reduce every rational expression to lowest terms, i.e., so that the numerator and denominator of a rational expression have no factor in common. Please do not inflict this requirement on your students. There is no mathematical reason to automatically reduce every rational expression to lowest terms. This is a leftover from the ill-advised practice of teaching fractions by insisting on the reduction of all fractions to lowest terms.
It remains to round off this discussion by mentioning that, just as one can easily define polynomials in several numbers x, y, z, etc., one can likewise define rational expressions in x, y, z, etc.

Exercises 1.5
In each of the following exercises, x and y are numbers.
(1) Compute and simplify:
$$\frac{x}{x^4 - 16} - \frac{x - 2}{x^3 + 2x^2 + 4x + 8}.$$
(2) If x is a number different from 2, −3, and −1, what is
$$\frac{2}{x - 2} + \frac{3}{x + 3} - \frac{1}{x + 1} = \,?$$
(3) If x is a number that makes all the denominators nonzero in the following, simplify:
$$\frac{\;\frac{2x^3 - 9x^2 - 5x}{(x - 2)^2}\;}{\;\frac{x^2 - 3x - 10}{x^4 - 16}\;}$$
(4) Simplify: (i) $\frac{15x^3y^4}{-60x^2y^7}$. (ii) $\frac{4x^4 - 9y^4}{4x^4 + 12x^2y^2 + 9y^4}$.
(5) Simplify: $\frac{x^4 - 16}{x^2 - 4} \cdot \frac{3x + 6}{x^3 + 2x^2 + 4x + 8}$.


Translation of Verbal Information into Symbols
Word problems are the bugbears of students (and some teachers too). Part of this
difficulty stems from a habit that was probably acquired in elementary school
from some of their teachers and textbooks. Students learn to skip the crucial
step of trying to understand what the problem is about and look instead for so-
called “key words” in order to make the replacement of words by symbols into
an automatic, rote skill. Thus, “increase by” becomes +, “less than” becomes −,
“of” becomes ×, etc. (Google “key words math” to get an idea of the extent of
this phenomenon.)
This chapter confronts the key word syndrome head-on. We recognize that students' difficulty with solving word problems can be separated into three stages: the first stage is reading the text carefully to know what the problem is about, the second stage is the translation of verbal information into symbols, and the third stage is the extraction of the solution from the symbolic statements, be they equations or inequalities. The need for such a separation does not seem to be widely recognized at present in school mathematics education. Many teachers—after routinely writing down the equations associated with a problem by the “key words” method—spend most of their effort on the skill of solving equations, i.e., the third stage. Their students learn to follow suit. Consequently, many students fail to learn the most fundamental aspect of algebra, namely, the proper use of symbols to capture an abstract thought or—in the case of solving word problems—to translate verbal information accurately into equations or inequalities. For the purpose of good mathematics education, we should, and must, reverse this trend and promote the importance of the translation process. In this chapter, we will address—exclusively—the first and second stages (reading carefully and translating accurately) and leave the third stage to later chapters (see Chapters 3, 5, and 8–10).

2.1. Equations and inequalities

We first pause to formally define an equation in a symbol x, which is always assumed to be a number. Many readers probably feel scandalized to be called upon to do something this ridiculous: define an equation? Have you not been solving equations all your lives? Perhaps. But before giving the definition, let us see what an equation is supposed to be in TSM.¹ First x is a “variable”, which means it is
some “quantity” and all you know is that it varies. Then when something like
3x − 5 = 7x − 1
is given, you immediately set about “solving” it by going through the motions.
What is 3x − 5, and what is 7x − 1? Both are combinations of a “variable” and
some numbers, and therefore both “vary” so that you don’t know what they
are. In what sense can such combinations be “equal”? Yet you are supposed to
accept that they are somehow “equal” and go about computing with them as if
they were plain, ordinary numbers. Are you making any sense? Is mathematics
so incomprehensible that it is reduced to a collection of symbolic manipulations
devoid of any meaning, and you just go through them because “this is what it
takes to get the right answer”?
Such thoughts should give you pause and make you feel uneasy about teach-
ing your own students by recycling the same unfathomable TSM that you were
subjected to. It is time to take a fresh look at what an equation is, get it right, and
try to teach your own students better.
An equation in x is a question asking which numbers x would make two given
expressions in x equal. Therefore the symbolic statement
3x − 5 = 7x − 1
is nothing more than an abbreviation of the question:
For which numbers x are the two expressions in x, 3x − 5 and 7x − 1, equal?
Any number x that makes the expressions equal is called a solution. In this
terminology, to solve an equation is to obtain all the solutions of the equation.
For example, suppose one expression is 3x − 5 and the other is 7x − 1. The
equation 3x − 5 = 7x − 1 asks for the collection of (all the) numbers x so that
3x − 5 = 7x − 1. It is not difficult to see that the only solution in this case is
−1, and we will discuss the solution process in the next section. In textbooks
and education materials, the whole question is usually presented as the following
symbolic statement with no preamble:
Solve 3x − 5 = 7x − 1.
Notice that such a statement violates the Basic Protocol in the use of symbols because the symbol x has not been quantified and we have no idea what it is. Does this mean that each time we see an equation, we must repeat the cumbersome question “For which collection of numbers x is it true that 3x − 5 = 7x − 1?” No, not if an equation has been clearly defined, and understood, from the beginning to be the abbreviation of that question. Therefore, after students have come to terms with what an equation is and what it means to solve an equation, you will be able to properly employ the time-honored, cryptic shortcut: “Solve 3x − 5 = 7x − 1.” At that point, we hope there will be no misunderstanding about an equation being the abbreviation of a question. Before reaching that point, however, it is a good idea to remind ourselves of the real meaning of an equation, because if we don’t, the process of “solving an equation” will degenerate into the sequence of meaningless moves that you have witnessed firsthand in TSM. If you cannot make sense of something as basic as an equation to yourself, how can you make sense of it to your students, and if you don’t do that, how can you be a teacher?

¹ See the footnote on page xi for the concept of TSM.
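The definition can be made concrete with a small sketch. The Python below is our illustration, not the author's, and the name `is_solution` is hypothetical. It treats 3x − 5 = 7x − 1 literally as a question that is asked about one number at a time, and it uses exact rational arithmetic from the standard `fractions` module. Asking the question for finitely many numbers can confirm that a particular number is a solution. It can never establish that all solutions have been found; that takes reasoning about numbers, as in Chapter 3.

```python
from fractions import Fraction


def is_solution(x: Fraction) -> bool:
    """Answer the question "is 3x - 5 equal to 7x - 1?" for one number x."""
    return 3 * x - 5 == 7 * x - 1


# Ask the question for the sample numbers -3, -5/2, ..., 5/2, 3.
# A finite search like this can never show that no other solutions exist.
candidates = [Fraction(n, 2) for n in range(-6, 7)]
solutions_found = [x for x in candidates if is_solution(x)]
print(solutions_found)
```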
Given an equation in x, there are three distinct possibilities: every x is a solution, there is no solution, or the equation has some solutions but not every number is a solution. An equation for which every number is a solution is what we called on page 6 an identity. For example, $(x + 1)^2 = x^2 + 2x + 1$ is an identity in x. An example of an equation with no solution is $x^2 + 3 = x^2$. Intuitively, it is clear that such an equation has no solution, but let us prove it. We will use proof by contradiction (see the proof of Lemma 3.1 in Section 3.1 of [Wu-PreAlg]). Suppose there is a solution $x_0$; then we have an equality of numbers, $x_0^2 + 3 = x_0^2$. Therefore we can compute with numbers as usual: $(-x_0^2) + x_0^2 + 3 = (-x_0^2) + x_0^2$, from which it follows that 3 = 0, which is absurd. So $x^2 + 3 = x^2$ can have no solution, as claimed. Finally, we give a not-so-obvious example of an equation which has some solutions but is nevertheless not an identity: $x^3 - 1 = -\frac{5}{2}x^2 - \frac{1}{2}x$. One can verify by a direct calculation that $\frac{1}{2}$, −1, and −2 are solutions.² Clearly 0 is not a solution, and neither is 1. Thus this equation is not an identity.
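The three possibilities can be spot-checked by machine. In this sketch, written in Python with exact fractions, all names are hypothetical. Each equation is stored as a pair of functions, one for each side, and we ask which of a few sample numbers make the two sides agree. Agreement at every sample does not prove that the first equation is an identity. Likewise, disagreement at every sample does not prove that the second equation has no solution. Only arguments like the ones above can establish either fact.

```python
from fractions import Fraction as F

# Each equation is stored as a pair (left side, right side) of functions of x.
IDENTITY = (lambda x: (x + 1) ** 2, lambda x: x**2 + 2 * x + 1)
NO_SOLUTION = (lambda x: x**2 + 3, lambda x: x**2)
SOME_SOLUTIONS = (lambda x: x**3 - 1, lambda x: -F(5, 2) * x**2 - F(1, 2) * x)


def holds(equation, x):
    """Is the number x a solution of the given equation?"""
    left, right = equation
    return left(x) == right(x)


samples = [F(-2), F(-1), F(0), F(1, 2), F(1), F(3)]
for name, eq in [("identity", IDENTITY), ("no solution", NO_SOLUTION),
                 ("some solutions", SOME_SOLUTIONS)]:
    print(name, [x for x in samples if holds(eq, x)])
```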

Because an equation in x involves only one number x, it is usually called
an equation of one variable or an equation in one variable out of respect for
tradition. This terminology is retained in the mathematics literature because, as
we said, it was used in the past and, like “identity”, it is convenient to have
around. However, you should also see that, strictly speaking, we don’t need this
terminology for solving equations, so please don’t lose any sleep over what “variable”
might mean.
Equations in a collection of (yet-to-be-determined) numbers x, y, . . . are simi-
larly defined. We will deal with equations in two variables in Chapter 4.
In a similar vein, an inequality in a number x is a question about whether there are numbers x that make one expression in x bigger than (or bigger than or equal to) the other. Again, the inequality may be valid for all x, for some x, or for no x. The inequality $x^2 + 1 < 0$, for example, is satisfied by no number x (do you know why?). As in the case of equations, the explicit statement that an inequality in x is a question about whether there are numbers x that make one expression in x bigger than (or bigger than or equal to) the other will usually be omitted in the future.

² By the so-called factor theorem (see, e.g., Section 11.1 in Volume II of [Wu-HighSchool]), it is not difficult to prove that there can be no more than three solutions. Thus $\frac{1}{2}$, −1, and −2 are the only solutions of $x^3 - 1 = -\frac{5}{2}x^2 - \frac{1}{2}x$.

Exercises 2.1
(1) (You may freely make use of Theorem 9.2 on page 201 in order to do this exercise.) (i) Does the equation $x^2 + 2x + 1 = -4$ in the number x have solutions? Why? (ii) Does the equation $x^2 + 2x = -4$ have solutions? Why? (iii) Does the equation $x^2 - 6x + 7 = 0$ have solutions? Why? (iv) Does the equation $x^2 + y^2 - 4y = -9$ in the numbers x and y have solutions? Why?
(2) Does the equation $\frac{3}{4x - 2} = \frac{1}{2x + 3}$ have a solution? Why?

2.2. Some examples of translation

The critical first step in solving word problems is to read the problem carefully and understand the situation. The next step, which is our main concern here, is to correctly translate verbal information into equations or inequalities. There can be no hope of getting a correct solution to a word problem if we try to get it from the wrong equations or inequalities. We will try to redress the traditional neglect of this critical second step by focusing on this translation process alone. For emphasis, we will intentionally ignore the subsequent solving of equations or inequalities and concentrate instead only on the translation.
We will give a few illustrative examples. In these examples, notice that the
starting point is always a systematic, sentence-by-sentence translation of the verbal data
into symbolic language. Then all the information is pulled together at the end to
arrive at the correct equations or inequalities. Let us begin with a simple one.
Let $\frac{a}{b}$ be a fraction. If $\frac{a}{b}$ of 57 is taken away from 57, what remains exceeds $\frac{2}{3}$ of 57 by 4. Express this information as an equation in $\frac{a}{b}$.
Solution. We know from the definition of the multiplication of fractions (see Section 1.5 of [Wu-PreAlg]) that “$\frac{a}{b}$ of 57” is just $\frac{a}{b} \times 57$. (We put in the × symbol here for clarity, as $\frac{a}{b} \cdot 57$ or $\frac{a}{b}57$ would look somewhat odd, while writing it as $57\frac{a}{b}$ might confuse it with a mixed number. Lest we forget, the main purpose of the symbolic notation is to add clarity and brevity to the verbal expression. Consequently, any symbolic convention should be put aside whenever clarity or brevity appears to be at risk.) Thus the statement, “If a fraction $\frac{a}{b}$ of 57 is taken away from 57”, becomes
$$57 - \frac{a}{b} \times 57$$
because of the exact definition of subtraction (see Section 1.3 of [Wu-PreAlg]). According to the given information, this number is 4 more than $\frac{2}{3}$ of 57, i.e., 4 more than $\frac{2}{3} \times 57$. This is of course equal to $\left(\frac{2}{3} \times 57\right) + 4$. Here then is the direct translation of all this information:
$$57 - \frac{a}{b} \times 57 = \frac{2}{3} \times 57 + 4.$$
This is the equation in the fraction $\frac{a}{b}$ that we must solve.
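One way to test a translation, without solving anything, is to encode each phrase as its own function and compare two readings of the same sentence on trial values. The sketch below assumes exact arithmetic with Python's `fractions.Fraction`. The function names are hypothetical, and r stands for the fraction a/b.

```python
from fractions import Fraction

TWO_THIRDS_OF_57 = Fraction(2, 3) * 57


def remains(r: Fraction) -> Fraction:
    """'r of 57 is taken away from 57', i.e., 57 - r x 57."""
    return 57 - r * 57


def translated_equation(r: Fraction) -> bool:
    """The translation: 57 - r x 57 = (2/3 x 57) + 4."""
    return remains(r) == TWO_THIRDS_OF_57 + 4


def exceeds_by_4(r: Fraction) -> bool:
    """A second reading of 'exceeds 2/3 of 57 by 4': the difference is 4."""
    return remains(r) - TWO_THIRDS_OF_57 == 4
```

If `translated_equation` and `exceeds_by_4` ever disagreed on a trial fraction r, one of the two readings would be a mistranslation. Here they always agree, because adding $\frac{2}{3} \times 57$ to both sides of the second condition gives the first.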

The following example is a bit more complicated.

Johnny has three siblings, two brothers and a sister. His sister is half
the age of his older brother, and three-fourths the age of his younger
brother. Johnny’s older brother is four years older than Johnny, and
his younger brother is two years younger than Johnny. Let J be the
age of Johnny, A the age of Johnny’s older brother, and B the age of
his younger brother. Express the above information in terms of J, A,
and B.
Solution. We observe right away that the given data of the problem involve
Johnny’s sister, but we are asked to “express the above information in terms of
J, A, and B”, i.e., the sister is left out. There are many ways to deal with this
situation, and one of them is to directly translate all the information by bringing
in the sister, and then try to leave out any reference to her at the end while still
faithfully retaining all the given information. This is what we are going to do.
So let S be the age of the sister. “His sister is half the age of his older brother” then becomes $S = \frac{1}{2}A$, and “His sister is . . . three-fourths the age of his younger brother” becomes $S = \frac{3}{4}B$. “Johnny’s older brother is four years older than Johnny” becomes A = J + 4, while “his younger brother is two years younger than Johnny” translates into B = J − 2.
At this point, the two equations, A = J + 4 and B = J − 2, would appear to be the answer because they are the only equations directly involving A, B, and J. But these two equations fail to capture the part of the given information about how the brothers are related to the sister, which indirectly gives information on how the brothers are related to each other. So we go back to look at $S = \frac{1}{2}A$ and $S = \frac{3}{4}B$. They show that both $\frac{1}{2}A$ and $\frac{3}{4}B$ are equal to S, and therefore equal to each other. Thus we also have $\frac{1}{2}A = \frac{3}{4}B$. This is the last piece of information concerning J, A, and B, and we have the following three equations:
$$A = J + 4, \qquad B = J - 2, \qquad \frac{1}{2}A = \frac{3}{4}B.$$
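The elimination of S can be sanity-checked numerically. In this sketch, whose function names are hypothetical, `sentences_hold` keeps the sister's age S and translates each sentence separately. By contrast, `translation_holds` uses only the three equations in J, A, and B. The first sentence forces S to equal A/2, so choosing S = A/2 loses no generality. The two predicates should therefore agree on every triple (J, A, B). The test only samples a finite grid of whole-number ages.

```python
from fractions import Fraction

HALF, THREE_QUARTERS = Fraction(1, 2), Fraction(3, 4)


def sentences_hold(J, A, B, S):
    """Sentence-by-sentence translation, with the sister's age S kept in."""
    return (S == HALF * A                # sister is half the older brother's age
            and S == THREE_QUARTERS * B  # ... and 3/4 of the younger brother's age
            and A == J + 4               # older brother is 4 years older than Johnny
            and B == J - 2)              # younger brother is 2 years younger


def translation_holds(J, A, B):
    """The three equations in J, A, B alone."""
    return A == J + 4 and B == J - 2 and HALF * A == THREE_QUARTERS * B


def nothing_lost(J, A, B):
    """True when the three equations and the full story (with S = A/2) agree."""
    return translation_holds(J, A, B) == sentences_hold(J, A, B, HALF * A)
```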
We next give an example requiring the use of inequalities.
Erin has 10 dollars and she wants to buy as many of her two favorite
pastries as possible. She finds that she can buy either 10 of one and
9 of the other, or 13 of one and 6 of the other, and in both cases she
will not have enough money left over to buy more of either pastry. If
the prices of the pastries are x dollars and y dollars, respectively, write
down the inequalities satisfied by x and y.
Solution. With x and y understood, Erin spends a total of $(10x + 9y) in
the first option, and then $(13x + 6y) in the second option. The key point is that
in either case, “she will not have enough money left over to buy more of either
pastry”. Consider then the first option: the total number of dollars left over is
10 − (10x + 9y). Consider first the relationship of this amount with the pastry
costing x dollars each. If this amount exceeds or equals x, then Erin would be
able to purchase one more of this pastry. Such not being the case, we have a strict
inequality: 10 − (10x + 9y) < x. But there is something more: the preceding inequality does not rule out the possibility that the amount of money she is spending (i.e., 10x + 9y) exceeds $10, whereas we are given that this amount is ≤ the $10 she has. Therefore in order to make a faithful translation of this situation, we must add another inequality, namely, 10 − (10x + 9y) ≥ 0. We combine these two inequalities into the following double inequality:³
0 ≤ 10 − (10x + 9y) < x.
Similarly, switching to the other pastry and replacing x by y, we also get
0 ≤ 10 − (10x + 9y) < y.
We apply the same considerations to the second option, that Erin buys “13 of one
and 6 of the other”. Altogether, we have the following collection of four double
inequalities that completely captures the verbal information:
0 ≤ 10 − (10x + 9y) < x, 0 ≤ 10 − (10x + 9y) < y,

0 ≤ 10 − (13x + 6y) < x, 0 ≤ 10 − (13x + 6y) < y.
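The double inequalities can be compared against a literal reading of “she will not have enough money left over to buy more of either pastry”. The sketch below uses hypothetical names and assumes prices are exact fractions of a dollar. Here `cannot_buy_more` says that the purchase fits within the $10 but one more pastry of either kind would not, and `double_inequalities` is the translation. The function `erin_conditions` collects all four double inequalities.

```python
from fractions import Fraction

BUDGET = 10  # Erin's 10 dollars


def cannot_buy_more(spent, x, y):
    """Literal reading: she can afford the purchase, but not one more pastry of either kind."""
    return spent <= BUDGET and spent + x > BUDGET and spent + y > BUDGET


def double_inequalities(spent, x, y):
    """The translation: 0 <= 10 - spent < x and 0 <= 10 - spent < y."""
    left_over = BUDGET - spent
    return 0 <= left_over < x and 0 <= left_over < y


def erin_conditions(x, y):
    """All four double inequalities, one pair for each purchase option."""
    return (double_inequalities(10 * x + 9 * y, x, y)
            and double_inequalities(13 * x + 6 * y, x, y))
```

The two readings agree because spent ≤ 10 is the same as 0 ≤ 10 − spent, and spent + x > 10 is the same as 10 − spent < x.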

As a final illustration, we do a problem that is a trifle more sophisticated than
the previous three. It is very instructive.
Two women started at sunrise and each walked at constant speed. One
went straight from City A to City B while the other went straight from
B to A. They met at noon and, continuing without stopping, arrived
respectively at B at 4 pm and at A at 9 pm.
If the sunrise was x hours before noon, and if L is the speed of
the woman going from A to B and R is the speed of the woman going
from B to A, translate the information above into equations using the
symbols L, R, and x.
Solution. For the concept of constant speed, see page 266. However, we will
give a more elaborate discussion of this concept in Section 3.2 on page 46 and
especially in Theorem 7.1 on page 138.
For ease of discussion, we will refer to the woman going from City A to City
B as the First Woman, and the other as the Second Woman. Before looking at a
correct solution, we first look at one that may be what most people would write
down and we will explain why that is not good enough. Here is a picture that
will guide our explanation.

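[Figure: the segment from A to B, marked at the point where the women meet at noon. Above it are the First Woman's distances: Lx from A to the meeting point and 4L from there to B. Below it are the Second Woman's distances: 9R from A to the meeting point and Rx from there to B.]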

The distance between City A and City B is fixed, and both the First Woman
and Second Woman walked this distance in the time given. Let the First Woman
walk—at a constant speed of L mph—x hours before noon, and then another
4 hours (from noon till 4 pm), and let the Second Woman walk—at a constant
speed of R mph—x hours before noon and then another 9 hours (from noon till 9
pm). So the First Woman walked a total of x + 4 hours while the Second Woman
walked a total of x + 9 hours. Given that the former walked with speed L mph,
the total distance she walked in x + 4 hours is of course L( x + 4) miles. Similarly
the total distance the Second Woman walked in x + 9 hours is R(x + 9) miles. By a previous remark, both distances are the same since both women walked between cities A and B. Therefore we get
$$L(x + 4) = R(x + 9). \tag{2.1}$$

³ Compare Section 2.6 of [Wu-PreAlg].
This is supposed to be the equation we have to solve. But is it?
The answer is not quite, because all that this equation says is that, after the First Woman walked x + 4 hours and after the Second Woman walked x + 9 hours, they had both covered the same distance. What this equation fails to capture is the information that, at noon, the two women met at a certain point between A and B, and that at the time of the meeting both had walked exactly x hours in opposite directions from City A and City B. This means that the total distance the two of them had covered after walking x hours (which is Lx + Rx miles) is equal to the distance between the cities, which is L(x + 4) or R(x + 9) miles, as we have seen. Therefore the additional piece of information that must be incorporated into the symbolic translation is Lx + Rx = L(x + 4) or, what is the same, Lx + Rx = R(x + 9). (In view of (2.1), it makes no difference which of the two is used.) Therefore, it takes the following two equations to completely capture the verbal information embedded in the problem:
$$L(x + 4) = R(x + 9), \qquad R(x + 9) = Lx + Rx. \tag{2.2}$$
One should always check to make sure that the symbolic translation has completely captured the verbal information.
Now one may also reason slightly differently. Let the meeting point of the two women be C:
[Figure: points A, C, and B on a line, with C between A and B.]
Consider the distance between A and C. The First Woman covered it in x hours (before noon) while the Second Woman covered it in 9 hours (after noon). But in x hours the First Woman walked Lx miles, while in 9 hours the Second Woman walked 9R miles. Therefore Lx = 9R. Similarly, if we consider the distance between C and B, we get in exactly the same fashion that 4L = Rx. The following set of equations therefore also faithfully captures the verbal information of the problem:
$$Lx = 9R, \qquad 4L = Rx. \tag{2.3}$$
We leave it as an exercise to show that the two sets of equations, (2.2) and (2.3), are “the same”, in a precise sense. See Exercise 1 immediately following.
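Equations (2.1), (2.2), and (2.3) can be written as Python predicates. In this sketch the names are hypothetical, arithmetic is exact, and L, R, and x are treated as plain numbers with no units. The sketch first exhibits a triple that satisfies (2.1) but not (2.2), which is exactly the gap discussed above. It then spot-checks (2.2) against (2.3) on a finite grid. A grid check is no substitute for the proof asked for in Exercise 1.

```python
from fractions import Fraction
from itertools import product


def eq_2_1(L, R, x):
    return L * (x + 4) == R * (x + 9)


def eq_2_2(L, R, x):
    return L * (x + 4) == R * (x + 9) and R * (x + 9) == L * x + R * x


def eq_2_3(L, R, x):
    return L * x == 9 * R and 4 * L == R * x


# (2.1) alone is too weak: this triple satisfies it, but the women would
# not meet at noon (5*16 + 4*16 = 144, not the 100-mile distance).
print(eq_2_1(5, 4, 16), eq_2_2(5, 4, 16))

# Spot check (not a proof) that (2.2) and (2.3) agree on a grid of values.
grid = [Fraction(n, 2) for n in range(1, 25)]
print(all(eq_2_2(L, R, x) == eq_2_3(L, R, x) for L, R, x in product(grid, repeat=3)))
```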

Exercises 2.2

Do not attempt to solve any of the following problems. Do only what the problem tells you to do, which is always about translating the verbal information into the needed equations or inequalities rather than getting the answers.
(1) Prove that from the equations in (2.2) one can derive the equations in
(2.3), and conversely, from the equations in (2.3) one can derive the equa-
tions in (2.2).

(2) The sum of the squares of three consecutive integers exceeds three times
the square of the middle integer by 2. If the middle integer is x, express
this fact in terms of x. If the smallest of the three integers is y, express
the same fact in terms of y.
(3) Paulo read a number of pages of a book with N pages, then he read
43 pages more and finished three-fifths of the book. If p is the number
of pages Paulo read the first time, write an equation using p and N to
express the above information.
(4) A whole number has the property that when the square of half this num-
ber is subtracted from 5 times this number, we get back the number itself.
If y is this number, write down an equation for y.
(5) Helena buys two books. The total cost is 49 dollars, and the difference of
the squares of the prices is 735. If the prices are x and y dollars, express
the above information in terms of x and y. (See Exercise 5 on page 93.)
(6) I have two numbers x and y. Take 20% of x from x, then what remains
would be 7 less than y. If however I enlarge y by 20%, then it would
exceed x by 8. Express this information in equations in terms of x and y.
(See Exercise 6 on page 93.)
(7) I have $4.60 worth of nickels, dimes, and quarters. There are 40 coins
in all, and the number of nickels and dimes together is three times the
number of quarters. If N, D, and Q denote the number of nickels, dimes,
and quarters, respectively, write equations in terms of these symbols to
capture the given information.
(8) We have two whole numbers. The division-with-remainder of the larger
number by the smaller number has quotient 9 and remainder 15. Also,
the larger number is 97.5% of ten times the smaller number. Let the
larger number be x and the smaller number be y. Express the given
information in equations in terms of x and y. (See Exercise 7 on page
(9) We look for two whole numbers so that the larger exceeds the smaller
by at least 10, but that the cube of the smaller exceeds the square of the
larger number by at least 500. If the larger number is x and the smaller
number is y, translate the above information in terms of x and y.
(10) If the digits of a three-digit number are reversed, the sum of the new
number and the original number is 1615. If 99 is added to the original
number, the digits of the original number are reversed. Let the hundreds,
tens, and ones digits of the original number be a, b, and c, respectively.
Write equations in a, b, and c to express the given information. (Caution:
Be very careful with the writing of your symbolic expressions.)
(11) A sum of money is to be divided equally among x people, each receiving
y dollars. If there were 3 more people, each person would receive 1 dollar
less, and if there were 6 fewer people, each would receive 5 dollars more.
Write equations in x and y to express this information. (See Exercise 8
on page 93.)
(12) The denominator of a fraction exceeds twice the numerator by 2, and the difference between the fraction and its reciprocal is $\frac{55}{24}$. If the numerator is x and the denominator y, write equations in terms of x and y that express the above information. (You will be able to determine this fraction after you have read Chapter 10.)
(13) A video game manufacturer sells out every game he brings to a game
show. He has two games, an A Game and a B Game. He can bring 50
of A Games and B Games in total to the show. Each A Game costs $75
to manufacture and will bring in a net profit of $125. Each B Game costs
$165 to manufacture and will bring in a net profit of $185. However, he
only has $6,000 to spend on manufacturing. If he brings x A Games and
y B Games, describe in terms of x and y how he can maximize his profit.
(You will be able to maximize the profit after you have read Chapter 8.)
(14) Here are the instructions for a “magic trick”:
(1) Grab a calculator.
(2) Key in the first three digits of your phone number (NOT the area code).
(3) Multiply by 80.
(4) Add 1.
(5) Multiply by 250.
(6) Add the last 4 digits of your phone number.
(7) Add the last 4 digits of your phone number again.
(8) Subtract 250.
(9) Divide the number by 2.
Let the 3-digit number which is the first three digits of your phone
number be denoted by x, and let the 4-digit number which is the last four
digits of your phone number be denoted by y. Write down the equation
that shows that you always get back your phone number at the end.
(15) Here are the instructions for another “magic trick”:
(1) Pick any 3-digit number between 000 and 999.
(2) Reverse the order of digits.
(3) Subtract the smaller number from the larger, getting another
three-digit number.
(4) Reverse those digits.
(5) Add this number to the last one.
Let the hundreds digit, the tens digit, and the ones digit of the 3-
digit number be a, b, and c, respectively. Write down the equations that
show that you always get 1089 or 0 at the end.


Linear Equations in One Variable

This chapter treats the most basic topic in algebra: linear equations in one vari-
able. This is such a simple subject that you are entitled to ask whether we are not
wasting time by doing this. We are not, because there is widespread confusion
at the moment about what it means to solve an equation. We will therefore begin
by explaining why the usual method of solution taught in TSM (Textbook School Mathematics¹) is nothing more than a meaningless rote procedure. If we teach
students something that makes no sense, then either students tune us out and
end up not learning it, or—as often happens—they feel compelled to memorize
the rote procedure because they are left with no choice if they want to get the
right answers. Either outcome is inimical to the goals of a good mathematics education.
In the first section, we will make sense of this procedure in order to make
it teachable and learnable. The following section then discusses the solutions of
some prototypical word problems involving such equations.
We go into such detail about solving linear equations in one variable because, once
the reasoning is understood, it can be applied to the solution of any equation or system of
equations. See for example the solution of linear systems in Section 5.2, page 87,
and the solution of quadratic equations in Section 10.2 on page 238.

3.1. Solving linear equations

The meaning of solving an equation
The correct procedure for solving an equation
The meaning of solving an equation

There is no question that linear equations involving a number x arise naturally.
(See the next section.) In this section, we make a first attempt at solving such
equations. We reiterate the rationale for paying close attention to solving linear equations: the reasoning is perfectly general and will be applicable to the solution of any equation.

1 For the explanation of TSM, see the footnote on page xi.


Formally, a linear equation of one variable asks for all the numbers x that make two given polynomials in x of degree at most 1 equal. Examples are: 12x − 7 = 5x + 13, $-\frac{5}{6}x + 1 = 23x - 4$, and 9 = 27x − 4.
Now, you may feel that these are equations that you can solve with one eye
closed. Nevertheless, we are going to make you feel some discomfort by carefully
analyzing the usual procedure taught in TSM for solving such equations, step by
step. Then we will ask you to decide if it makes any sense. Let us first look at
how a simple equation such as 2x − 3 = 4x is solved according to TSM.

Step 1: 2x − 3 = 4x
Step 2: (2x − 3) − 2x = 4x − 2x
Step 3: −3 = 2x
Step 4: 2x = −3
Step 5: $x = -\frac{3}{2}$

How are Step 2 and Step 5 justified? Let us concentrate on Step 2 first. Don’t
forget that in TSM, x is a variable, something that varies. Not knowing precisely
what a variable is, we cannot say what it means in Step 1 for 2x − 3 and 4x
to be equal, much less why the equality is undisturbed when (−2x ) is added to
both sides in Step 2. For example, if x can “vary”, then what about if x = 5?
In that case, the left side is 7 while the right side is 20, and the two sides are
definitely not equal. So once again, what does it mean for 2x − 3 and 4x to be
equal? Without answering this question, TSM nevertheless proposes that, if such
an equality is there, then adding the same object, −2x, to both 2x − 3 and 4x will
preserve the equality. This is a questionable attempt to imitate what Euclid wrote
some twenty-three centuries ago:

If equals be added to equals, the wholes are equal. ([Euclid, p. 155])

Of course even TSM tries to be persuasive. So it tries to compensate for the lack
of understanding by setting up an analogy. Imagine that we have a balance scale
and on the two sides of the balance are 2x − 3 and 4x, which balance each other:
[Figure: a balance scale with 2x − 3 on one pan and 4x on the other, and −2x placed on each pan.]

It seems reasonable that putting −2x on both sides will not “tip the balance”, and
this explains Step 2.
In case this is unconvincing, TSM presents a second strategy that makes use
of algebra tiles to “model” this solution of 2x − 3 = 4x. Let a green rectangle
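model a variable and a red square model −1. Then the two sides of the equation 2x − 3 = 4x are modeled by the algebra tiles on both sides of the dotted line:
[Figure: algebra tiles modeling 2x − 3 on one side of a dotted line and 4x on the other, with arrows marking the removal of two green tiles from each side.]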
It seems “natural” that, if we remove two green tiles on the left (i.e., adding −2x)
and also remove two green tiles on the right (indicated by the two arrows on each
side), the state of equality between the two sides remains undisturbed. This is
how we arrive at Step 2 above.
These analogies are useful psychological ploys to win students’ trust, but in mathematics, we cannot replace logical reasoning by offering suggestions of why something might be true on account of analogies. Since no advanced mathematics or science can be done this way, it would be unfair to make school students go down this slippery slope. All the more so when the correct way of solving an equation is so simple to explain.
A teacher has to be able to explain what it means to solve an equation without using balance scales or algebra tiles.

The correct procedure for solving an equation

Let us see how to solve an equation correctly. If we want a solution to 2x −
3 = 4x, the way to look for it is to use the time-honored method of pretending
that we already know what the solution is, let us say x0 , and then make use of this
information to find out what x0 might be. This then narrows down the number
of candidates for a solution. Once we have this information, we will do a simple
check to verify that these candidates are in fact solutions.
So we assume that a number x0 is a solution of 2x − 3 = 4x. (We use the
symbol x0 to emphasize that we are now looking at one fixed number.) Thus
2x0 − 3 = 4x0 .
Note that this is an equality between two numbers and, as such, we can bring to bear all we know about numbers² on the equality. Therefore, the following
computation is now entirely routine:
Step 1a: 2x0 − 3 = 4x0
Step 2a: (2x0 − 3) − 2x0 = 4x0 − 2x0
Step 3a: −3 = 2x0
Step 4a: 2x0 = −3
Step 5a: $x_0 = -\frac{3}{2}$
In view of the fact that these are assertions about numbers, every one of the steps
from 1a to 5a is transparent and there are no more dark clouds about the meaning
of a variable hanging over us.
Some additional comments, however, will be helpful.

2 We use what we know about rational numbers, and remember FASM.


(A) Take the transition from Step 2a to Step 3a. The reason that the right side
of Step 3a is correct is clear: it is a simple application of the distributive law:
4x0 − 2x0 = (4 − 2) x0 = 2x0 .
Observe that if we say 4 · 173 − 2 · 173 = (4 − 2)173, then there is no need for the
distributive law: the left side and the right side are both equal to 346 by a direct
computation. But what we are claiming is that, without knowing what x0 is, we can
nevertheless assert that 4x0 − 2x0 = (4 − 2) x0 . Then the only way we can justify
this is by invoking the distributive law. In the same vein, let us examine closely
how to arrive at the left side of Step 3a:
$$\begin{aligned}
(2x_0 - 3) - 2x_0 &= (2x_0 + (-3)) + (-2x_0) && \text{(by definition of subtraction)}\\
&= (2x_0 + (-2x_0)) + (-3) && \text{(Theorem 1 in Section 1.11 in [Wu-PreAlg])}\\
&= 0 + (-3) = -3.
\end{aligned}$$
We have just made use of the commutative and associative laws of addition in the
second line (see the Appendix (Section 1.11) of Chapter 1 in [Wu-PreAlg]). When
we do computations with specific (known) numbers, the use of the commutative
or associative law is unnecessary. For example, we hardly need the associative
and commutative laws to justify (15 − 7) − 16 = (15 − 16) − 7, because the left
side is 8 − 16 = −8 and the right side is −1 − 7 = −8. But now look at
(2x0 + (−3)) + (−2x0 ) = (2x0 + (−2x0 )) + (−3).
Here we don’t know the specific value of x0, so no explicit computation is possible. How can we claim that the two sides are indeed equal except by invoking Theorem 1 in the Appendix of Chapter 1 in [Wu-PreAlg] (see page 270 in this volume)?
It is only when we do algebra and have to compute with unknown numbers
that the import of the general laws (associative and commutative laws of + and
×, and the distributive law) begins to be apparent. Each time we solve an equation,
we depend crucially on these laws.
(B) What has been accomplished in Steps 1a to 5a is that if we know there is a solution $x_0$ to 2x − 3 = 4x, then $x_0 = -\frac{3}{2}$. This has nothing to say about whether $-\frac{3}{2}$ is a solution of 2x − 3 = 4x or not. Of course, there is a simple way to check whether such is the case: if $x_0 = -\frac{3}{2}$, then
$$2\left(-\frac{3}{2}\right) - 3 = -3 - 3 = -6$$
and also
$$4\left(-\frac{3}{2}\right) = 2(-3) = -6.$$
Thus indeed $2x_0 - 3 = 4x_0$, and the solution of 2x − 3 = 4x is $-\frac{3}{2}$.
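The direct check in (B) is ordinary arithmetic with known numbers, so exact rational arithmetic can reproduce it. This is only a sketch, and it assumes Python's standard `fractions` module:

```python
from fractions import Fraction

x0 = Fraction(-3, 2)   # the candidate produced by Steps 1a-5a
left = 2 * x0 - 3      # the number 2x0 - 3
right = 4 * x0         # the number 4x0
print(left, right, left == right)
```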
Now we have to point out that there is a downside to the above direct checking: it may be simple, but it also appears to be so dependent on the specific equation 2x − 3 = 4x being used that, perhaps, the reasoning will not carry over to another equation. To counteract this impression, we proceed to give a clumsier way of checking that $-\frac{3}{2}$ is a solution of 2x − 3 = 4x. We do so by observing that Steps 1a to 5a can be done in reverse order so that if we start with $x_0 = -\frac{3}{2}$ (which is Step 5a), we will arrive at $2x_0 - 3 = 4x_0$ (which is Step 1a), thereby proving that $-\frac{3}{2}$ is a solution of 2x − 3 = 4x. In greater detail:
• $x_0 = -\frac{3}{2}$ implies $2x_0 = -3$ (multiply both sides by 2)
• $2x_0 = -3$ implies $-3 = 2x_0$
• $-3 = 2x_0$ implies $(2x_0 - 3) - 2x_0 = 4x_0 - 2x_0$ (add $2x_0$ to, and then subtract $2x_0$ from, each side)
• $(2x_0 - 3) - 2x_0 = 4x_0 - 2x_0$ implies $2x_0 - 3 = 4x_0$ (add $2x_0$ to both sides)
This then shows that, by reversing the very steps that show what a solution might
be, we get to see that the putative solution is indeed a genuine solution. A little reflec-
tion will show that the general reasoning that leads from Step 1a to Step 5a can
always be reversed, because it only makes use of the basic laws of operations: the
associative and commutative laws of addition and multiplication together with
the distributive law. Thus this is no accident: this method of solution is universally valid for all equations.
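Steps 1a to 5a, followed by the reverse check, can be packaged for any equation of the form ax + b = cx + d with rational a, b, c, d. The sketch below is our own. The function `solve_linear` and its return format are hypothetical and do not come from the text. When a ≠ c, the function finds the only possible candidate and then checks it by substitution, mirroring the two halves of the argument above. When a = c, subtracting cx from both sides leaves the statement b = d, which is about numbers alone. That statement yields either an identity or an equation with no solution, which are two of the three possibilities of Section 2.1.

```python
from fractions import Fraction


def solve_linear(a, b, c, d):
    """Solve ax + b = cx + d over the rationals.

    Returns ("all", None) if every number is a solution,
    ("none", None) if there is no solution, and ("one", x0) otherwise.
    """
    a, b, c, d = map(Fraction, (a, b, c, d))
    if a == c:
        # Subtracting cx from both sides leaves b = d, a statement about numbers only.
        return ("all", None) if b == d else ("none", None)
    # If x0 is a solution, then (a - c)x0 = d - b, so x0 must be (d - b)/(a - c).
    x0 = (d - b) / (a - c)
    # Reversing the steps: sanity-check that the candidate really is a solution.
    assert a * x0 + b == c * x0 + d
    return ("one", x0)
```

For example, `solve_linear(2, -3, 4, 0)` encodes 2x − 3 = 4x and returns `("one", Fraction(-3, 2))`.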
(C) We hope you are beginning to appreciate the earlier remark that there is
no need for the concept of a variable. We have solved the equation by dealing
strictly with numbers, and by observing the Basic Protocol in the use of symbols
(page 4). This is a lesson you will want to bring back to your classroom.
(D) The preceding elaborate justification of Steps 1–5 on page 38 raises the
specter that, in order to do mathematics correctly in middle school, even the solu-
tion of a simple equation such as 2x − 3 = 4x would always have to be accompa-
nied by an unreasonably stodgy explanation. There is no fear of that, however, if
we exercise proper pedagogical judgment. One way to deal with this issue would
be to go over Steps 1a–5a with care the first time a linear equation or a quadratic
equation is solved. Then as soon as students grasp the underlying principle of
equation solving, it would be safe to allow them to use the rote procedure of Steps
1–5 on page 38. Naturally, if a teacher wants to test students’ understanding of
the reasoning underlying Steps 1–5, putting such a question on a test would be
entirely appropriate.3
The issue surrounding the teaching of equation solving is actually no differ-
ent from the teaching of the standard algorithms in arithmetic. Take the most
notorious case, the long division algorithm, for example. The algorithm itself is
brief, while its mathematical explanation is anything but.4 While students should
be exposed to some form of the mathematical explanation of the algorithm at the
beginning, it would be wrong to ask for the explanation each time the algorithm
is executed. The same is true of the justification for the procedure of solving equations.

What is indisputable is that if a teacher hopes to inspire trust in her students,
she has to thoroughly understand what it means to solve an equation in order
for her teaching to achieve the necessary transparency. Everything we have said
above is therefore an integral part of a teacher’s repertoire. Where TSM fails, and
fails spectacularly, is in never giving a correct explanation of what it means to
solve an equation.

3 Such questions will never appear on standardized tests, and this is one reason why we should not rely solely on standardized test scores to evaluate the quality of math education.
4 See Chapter 7 of [Wu2011].

(E) A final comment is on the practical issue of solving equations with rational coefficients, e.g.,
$$\frac{7}{6}x - \frac{3}{4} = 2x - \frac{1}{3}. \tag{3.1}$$
With an adequate understanding of rational numbers, this equation can be easily solved according to Steps 1a to 5a. However, since students are more likely to make computational errors with fractions than with integers, there is some advantage in being able to get around the fractions of this equation by clearing the denominators, namely, multiplying both sides by the product of all the denominators, $6 \times 4 \times 3 = 72$, to get:
$$84x - 54 = 144x - 24. \tag{3.2}$$
The important thing to note is that this equation is equivalent to the original equation (3.1), in the sense that any solution of equation (3.1) is a solution of (3.2), and vice versa (see Exercise 3 on page 45). In any case, we get $60x = -30$, and $x = -\frac{1}{2}$ is the solution.
You may have noticed that, for the purpose of clearing the denominators in
(3.1), it suffices to multiply both sides by 12 instead of 72 (12 is the LCM of 6, 4,
and 3; see page 267 for the meaning of LCM). If we do that, we find that
$$14x - 9 = 24x - 4.$$
Therefore $10x = -5$ and we get $x = -\frac{1}{2}$ again. The choice of 12 rather than 72 is
a nice shortcut, but since it is not absolutely necessary for the solution of the equation,
there is no need to emphasize it in the school classroom.
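The claim that clearing denominators does not change the solution can be spot-checked with exact fractions. This is a minimal sketch under a few assumptions. An equation $ax + b = cx + d$ is stored as a hypothetical tuple `(a, b, c, d)`. `math.lcm` is called with several arguments, which requires Python 3.9 or later. The solution is computed with the formula $(d - b)/(a - c)$, which the text derives for the general case in part (II).

```python
from fractions import Fraction
from math import lcm

# Equation (3.1): (7/6)x - 3/4 = 2x - 1/3, stored as (a, b, c, d) for ax + b = cx + d.
eq31 = (Fraction(7, 6), Fraction(-3, 4), Fraction(2), Fraction(-1, 3))


def scale(eq, k):
    """Multiply both sides of ax + b = cx + d by the nonzero number k."""
    return tuple(k * t for t in eq)


def solve(eq):
    """Solution (d - b)/(a - c) of ax + b = cx + d, assuming a != c."""
    a, b, c, d = eq
    return (d - b) / (a - c)


eq32 = scale(eq31, 6 * 4 * 3)         # multiply by 72
eq_small = scale(eq31, lcm(6, 4, 3))  # multiply by the LCM, 12

print([int(t) for t in eq32])      # [84, -54, 144, -24]
print([int(t) for t in eq_small])  # [14, -9, 24, -4]
print(solve(eq31), solve(eq32), solve(eq_small))  # -1/2 -1/2 -1/2
```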
Now that we have a fresh understanding of Steps 1–5, we will use this language to describe the structure of solving a general linear equation. There are two parts:
(I) Solve equations in $x$ of the form $ax = b$, where $a$ and $b$ are constants with $a \neq 0$. Clearly the solution is $\frac{b}{a}$, as one can check:
$$a\left(\frac{b}{a}\right) = b.$$
Notice that the fact $a \neq 0$ guarantees that the fraction $\frac{b}{a}$ is well-defined. For example, the solution to $3x = -7$ is $-\frac{7}{3}$ $\left(= \frac{-7}{3}\right)$.
(II) Any linear equation $Ax + B = Cx + D$ (where $A$, $B$, $C$, and $D$ are constants and $A \neq C$) has the same solution as a linear equation of the form $ax = b$.
Let us go into part (II) in some detail. It claims: any number x that satisfies the
former equation also satisfies the latter equation for some appropriate constants
a and b, and vice versa.
The reason is that if $x$ is a number so that $Ax + B = Cx + D$, then
$$(Ax + B) + (-Cx - B) = (Cx + D) + (-Cx - B).$$
Therefore, by Theorem 1 in the Appendix of Chapter 1 in [Wu-PreAlg] (see page 270 of this volume), we have $Ax - Cx = D - B$, i.e., $(A - C)x = D - B$. In other words, the original equation $Ax + B = Cx + D$ is now in the form of $ax = b$, with $a = A - C$ and $b = D - B$. By part (I), the solution to $(A - C)x = D - B$ is
$$x = \frac{D - B}{A - C}.$$

We now check that the solution $(D - B)/(A - C)$ of $(A - C)x = D - B$ is indeed a solution of the original equation $Ax + B = Cx + D$. Note that the following computations have to rely heavily on the formulas for rational quotients in Section 2.5 in [Wu-PreAlg] (see page 270 of this volume) because $A$, $B$, $C$, and $D$ are rational numbers. With that understood, we have:
$$A\left(\frac{D-B}{A-C}\right) + B = \frac{AD - AB}{A-C} + B = \frac{AD - AB}{A-C} + \frac{B(A-C)}{A-C} = \frac{AD - AB + AB - BC}{A-C} = \frac{AD - BC}{A-C}.$$
On the other hand,
$$C\left(\frac{D-B}{A-C}\right) + D = \frac{CD - CB}{A-C} + D = \frac{CD - CB}{A-C} + \frac{D(A-C)}{A-C} = \frac{CD - BC + AD - CD}{A-C} = \frac{AD - BC}{A-C}.$$
It follows that
$$A\left(\frac{D-B}{A-C}\right) + B = C\left(\frac{D-B}{A-C}\right) + D,$$
or, what is the same thing, $(D - B)/(A - C)$ is a solution of $Ax + B = Cx + D$.

Notice that we need the assumption $A \neq C$ in order that the fraction $\frac{D-B}{A-C}$ makes sense.
We should now make contact with the terminology of school mathematics.
What we did in part (II) is sometimes referred to as isolating the variable.5 The
process of going from Ax + B = Cx + D to Ax − Cx = D − B is usually referred
to as transposing the terms Cx and B to the other side.
For example, the equation $3x - 1 = 8x + 7$ becomes $3x - 8x = 7 + 1$, and therefore its solution is
$$\frac{7+1}{3-8} = -\frac{8}{5}.$$
It remains to examine the case of Ax + B = Cx + D where A, B, C, and D are
constants and A = C. In this case, it is time to remember what an equation means. We
are trying to determine the collection of all numbers x so that Ax + B = Cx + D
when $A = C$. Suppose there is such a number $x$; then the same procedure as above leads to $(A - C)x = D - B$, which is $0 = D - B$. If $D \neq B$, then the assumption that such a number $x$ exists leads to the fact that 0 is equal to a nonzero number, which is absurd. We therefore must conclude that there is no such number $x$. On the other hand, if $D = B$, then we have $0 = 0$, which is fine. In fact, let us go back to square one: suppose we assume $A = C$ and $B = D$; then of course the two sides of the equation $Ax + B = Cx + D$ are the same number for every $x$. Thus in this case we have the trivial identity $Ax + B = Ax + B$.

5 At the risk of sounding like a broken record, we call attention to the fact that we freely use the word variable here in place of x without saying what “variable” means, because it doesn’t matter.
We summarize the whole discussion in the following theorem:

Theorem 3.1. Let a linear equation $Ax + B = Cx + D$ be given, where $A$, $B$, $C$, and $D$ are constants. Then:
(i) The equation has a unique solution $(D - B)/(A - C)$ if $A \neq C$.
(ii) The equation has no solution if $A = C$ but $B \neq D$.
(iii) Every number is a solution if $A = C$ and $B = D$.
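For readers who like to see the case analysis spelled out, here is a short Python sketch of the three cases of Theorem 3.1. The function name `classify` and its return format are our own choices, not the book's. Exact `Fraction` arithmetic stands in for the rational-number computations. As the text stresses, the point is the reasoning in the proof, not a formula to memorize.

```python
from fractions import Fraction


def classify(A, B, C, D):
    """Case analysis of Theorem 3.1 for Ax + B = Cx + D (hypothetical helper).

    Returns ("unique", solution), ("none", None), or ("all", None).
    """
    A, B, C, D = (Fraction(t) for t in (A, B, C, D))
    if A != C:                      # case (i): transpose and divide by A - C
        return ("unique", (D - B) / (A - C))
    if B != D:                      # case (ii): 0 = D - B with D - B nonzero
        return ("none", None)
    return ("all", None)            # case (iii): both sides are identical


print(classify(3, -1, 8, 7))   # ('unique', Fraction(-8, 5))
print(classify(2, 1, 2, 5))    # ('none', None)
print(classify(2, 1, 2, 1))    # ('all', None)
```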
The theorem should not be memorized. Rather, one should be totally fluent
in repeating the steps in its proof in each case. Here is another example. To solve $-\frac{2}{3}x + 4 = -\frac{1}{5}x + 5\frac{1}{3}$, we transpose $-\frac{1}{5}x$ to the left, in the sense that we add to both sides the number $+\frac{1}{5}x$ (which is the negative of $-\frac{1}{5}x$) so that the $-\frac{1}{5}x$ disappears from the right side, but its negative, $-\left(-\frac{1}{5}x\right) = +\frac{1}{5}x$, appears on the left. In greater detail, adding $+\frac{1}{5}x$ to each side yields
$$-\frac{2}{3}x + 4 + \frac{1}{5}x = -\frac{1}{5}x + 5\frac{1}{3} + \frac{1}{5}x.$$
The right side is therefore equal to $-\frac{1}{5}x + \frac{1}{5}x + 5\frac{1}{3} = 5\frac{1}{3}$ (see Theorem 1 in the Appendix of Chapter 1 (in [Wu-PreAlg]) on page 270 of this volume). We therefore obtain:
$$-\frac{2}{3}x + 4 + \frac{1}{5}x = 5\frac{1}{3}.$$
Now the left side is equal to $-\frac{2}{3}x + 4 + \frac{1}{5}x = -\frac{2}{3}x + \frac{1}{5}x + 4 = \frac{-7}{15}x + 4$. Thus the equation becomes
$$\frac{-7}{15}x + 4 = 5\frac{1}{3}.$$
We next transpose 4 to the right side: $\frac{-7}{15}x = 5\frac{1}{3} - 4$, and so $\frac{-7}{15}x = \frac{4}{3}$. Thus the solution is $\frac{4}{3} \Big/ \frac{-7}{15} = -\frac{20}{7}$.

We can also make use of comment (E) on page 42 to clear the denominators of the equation $-\frac{2}{3}x + 4 = -\frac{1}{5}x + 5\frac{1}{3}$ in order to solve this equation. Multiplying both sides by 15, we get
$$-10x + 60 = -3x + 80,$$
so that $7x = -20$, and $x = -\frac{20}{7}$ as before.
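Both routes to the answer can be replayed with exact fractions. In this sketch (ours, not the book's), the equation is written as $Ax + B = Cx + D$ with the mixed number $5\frac{1}{3}$ entered as $\frac{16}{3}$. The code prints the transposed coefficients, the solution, and the integer coefficients obtained by multiplying through by 15.

```python
from fractions import Fraction as F

# -(2/3)x + 4 = -(1/5)x + 5 1/3, written as Ax + B = Cx + D (5 1/3 = 16/3)
A, B, C, D = F(-2, 3), F(4), F(-1, 5), F(16, 3)

# Transposing gives (A - C)x = D - B
print(A - C, D - B)        # -7/15 4/3
x = (D - B) / (A - C)
print(x)                   # -20/7
assert A * x + B == C * x + D

# Clearing denominators with 15 gives -10x + 60 = -3x + 80
print(*(int(15 * t) for t in (A, B, C, D)))  # -10 60 -3 80
```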
To summarize, solving a linear equation in a number $x$ depends on two simple ideas: by transposing terms, we isolate $x$ on one side of the equation, and then we solve an equation of the type $ax = b$. From this point of view, the common practice of classifying linear equations into one-step equations, two-step equations, three-step equations, and four-step equations, and then teaching the solving of linear equations according to this classification, simply does not make sense. You should avoid teaching the solution of linear equations according to this classification.
It remains to point out that sometimes a linear equation is disguised as one involving rational expressions. For example, consider a number $x$ that satisfies
$$\frac{2}{3x - 1} = \frac{4}{x + \frac{1}{3}}.$$
By the cross-multiplication algorithm (which is valid also for rational quotients; see page 270), this equation is equivalent to
$$2\left(x + \frac{1}{3}\right) = 4(3x - 1).$$
We now have a linear equation in $x$, and the solution is $x = \frac{7}{15}$ (see Exercise 1 in Exercises 3.1).
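A quick numerical check of the value $x = \frac{7}{15}$ stated in Exercise 1 is sketched below. The code is only an illustration. It also confirms that both denominators are nonzero at this value, which is what makes the original rational equation, and the cross-multiplication, meaningful there.

```python
from fractions import Fraction as F

x = F(7, 15)

# The denominators must be nonzero for the original equation to make sense.
assert 3 * x - 1 != 0 and x + F(1, 3) != 0

# Original equation: 2/(3x - 1) = 4/(x + 1/3)
print(2 / (3 * x - 1), 4 / (x + F(1, 3)))   # 5 5

# Cross-multiplied linear equation: 2(x + 1/3) = 4(3x - 1)
print(2 * (x + F(1, 3)), 4 * (3 * x - 1))   # 8/5 8/5
```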

Exercises 3.1
(1) Prove that x = 7/15 is the unique solution of 2/(3x − 1) = 4/(x + 1/3).
(2) Solve: (i) 2x − 8 = 15 + 43 x. (ii) 73 x + 2 = 32 − 25 x. (iii) 11 9 − 3x =

−6x + 18 1
. (iv) ax + 6 = 8 − 7ax, where a is a nonzero number.
(v) 4bx + 13 = 2x + 26b, where b is a number not equal to 1/2.
(vi) 12 − 83 x = 56 x + 23 . (vii) 25 ax − 17 = 13 ax − 15
2 .
(3) Let a linear equation ax + b = cx + d be given so that a, b, c, and d are
constants and a − c ≠ 0; call this Equation A. Let k be a nonzero constant
and call the equation (ka) x + (kb) = (kc) x + (kd) Equation B. Prove that
a number is a solution of Equation A if and only if it is a solution of
Equation B.
(4) Consider the equation 3x − 8 = ax + 7, where a is a number. For what values of a does the equation have a unique solution? Have no solution? Can it have an infinite number of solutions?
(5) Solve: (a) (3 − x)/(x − 1) = −3/2. (b) 5/(2x − 3) = 4/(2 − x).

3.2. Some word problems

Here are some examples of word problems involving the solution of linear equa-
tions in one variable.
Example 1. There are 39 coins made up of quarters and pennies, and they are
worth $4.47. How many quarters are there?
Solution. We follow the practice of Section 2.2 on page 30 and simply trans-
late the information faithfully into symbolic language before doing anything. So
if there are Q quarters, then there are 39 − Q pennies. In terms of cents, we have
447 cents, of which Q × 25 cents come from the quarters and 39 − Q cents come from the pennies. Obviously,
25Q + (39 − Q) = 447.
This is a linear equation in one variable, so the technique of the last section allows
us to solve this easily. Transposing, we have 25Q − Q = 447 − 39, therefore
24Q = 408, and therefore Q = 17. There are 17 quarters.

One should always check: 17 quarters amount to 17 · 25 = 425 cents. Adding to it 39 − 17 = 22 cents from the pennies does give 447 cents.
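Because the number of quarters can only be one of the 40 whole numbers from 0 to 39, a brute-force search gives an independent check that 17 is the only answer. This is a swapped-in technique, not the book's method, which is to solve the linear equation.

```python
# Try every possible number of quarters Q from 0 to 39.
solutions = [Q for Q in range(40) if 25 * Q + (39 - Q) == 447]
print(solutions)  # [17]
```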
Example 2. Find four consecutive odd integers so that the product of the
second and fourth integers exceeds the product of the first and third integers
by 64.
Solution. Let the smallest of the four odd integers be x. (At the moment
we do not worry about whether x is even or odd; we just translate the given
information and wait to see what happens. No reason to do more than you have
to!) Thus the next three integers are x + 2, x + 4, and x + 6. The given data is that (x + 2)(x + 6) is bigger than x(x + 4) by 64. So
$$(x + 2)(x + 6) - x(x + 4) = 64.$$
The solution of this equation, which is not linear, begins with a simplification
of the left side by the use of the distributive law. We get $x^2 + 8x + 12 - x^2 - 4x = 4x + 12$ no matter what the number $x$ may be. Thus the equation becomes
4x + 12 = 64, which is a linear equation in one variable after all. From 4x = 52,
we obtain x = 13. Thus the four integers are 13, 15, 17, 19. We check that
(15 × 19) − (13 × 17) = 285 − 221 = 64.
By the way, the solution suggests that the initial assumption that the four
consecutive integers be odd is irrelevant. All we need to know is that each integer
is 2 more than the preceding one.
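The simplification and the final answer can be spot-checked in a few lines of Python. The helper name `gap` is hypothetical. Checking the identity on finitely many integers is evidence, not a proof (the distributive law is the proof). The search over the same range also illustrates the remark above: nothing about oddness was assumed, yet the only starting value found is 13.

```python
def gap(x):
    """(x + 2)(x + 6) - x(x + 4) for the integers x, x + 2, x + 4, x + 6."""
    return (x + 2) * (x + 6) - x * (x + 4)


# Spot check (not a proof) that the left side simplifies to 4x + 12.
assert all(gap(x) == 4 * x + 12 for x in range(-100, 101))

# Only one starting value in this range gives a difference of 64.
print([x for x in range(-100, 101) if gap(x) == 64])  # [13]

x = (64 - 12) // 4            # from 4x + 12 = 64 (52 is divisible by 4)
print([x, x + 2, x + 4, x + 6], gap(x))  # [13, 15, 17, 19] 64
```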
Example 3. Break 48 into two parts so that the smaller part is $\frac{2}{3}$ of the greater part. (Compare Example 2 in Section 1.9 of [Wu-PreAlg].)
Solution. Let $s$ be the smaller part; then the greater part is $48 - s$. It is given that $s = \frac{2}{3}(48 - s)$. Thus $\frac{3}{2}s = 48 - s$, and $\frac{5}{2}s = 48$. It follows that $s = \frac{96}{5} = 19\frac{1}{5}$. We check and see that the greater part is $48 - \frac{96}{5} = \frac{144}{5}$, and
$$\frac{96}{5} = \frac{2}{3} \cdot \frac{144}{5}.$$
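The check at the end of the solution can also be done with exact fractions. This is a minimal sketch of our own that starts from the equation $\frac{5}{2}s = 48$ obtained above.

```python
from fractions import Fraction as F

s = 48 / F(5, 2)            # from (5/2)s = 48
greater = 48 - s
print(s, greater)           # 96/5 144/5
assert s == F(2, 3) * greater
assert s < greater
```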

The next few problems are about so-called constant rates: the constant rate of walking (which we call constant speed), the constant rate of water pouring into a tub, the constant rate of work, such as the number of square feet of lawn mowed per hour, etc. In view of the fact that not only is the concept of rate mangled in the standard materials, but the concept of constant rate, which is central to the solution of this class of problems, is hardly ever clearly defined, we begin by recalling the needed precise definitions (see Section 1.9 in [Wu-PreAlg]).
We will concentrate on speed; we have seen that the extrapolation of the speed
discussion to other kinds of rates is not difficult (see Section 1.9 of [Wu-PreAlg]).
In general, for a given motion, let us say the distance is measured in terms of
miles and the time is measured in terms of hours. We define the average speed
over a time interval from hour $t_0$ to hour $t$, $t_0 < t$, to be the division
$$\frac{\text{total distance traveled (in miles) from } t_0 \text{ hours to } t \text{ hours}}{t - t_0 \text{ hours}}. \tag{3.3}$$
In this case, the unit of average speed is mph (miles per hour). However, it is understood that the unit of distance (here in miles) and the unit of time (here in hours) can be any pre-assigned units. We say the motion has constant speed $v$ mph ($v$ being a fixed positive number), or more simply that it has speed $v$ mph, if its average speed over any time interval is always equal to $v$.
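The definition (3.3) can be illustrated with a small sketch. We assume that the motion goes in one direction only and is described by a hypothetical function `distance_at(t)` giving the total miles traveled after $t$ hours. Then the distance traveled from $t_0$ to $t$ is `distance_at(t) - distance_at(t0)`. A motion at constant speed has the same average speed on every interval. A motion that speeds up does not.

```python
from fractions import Fraction


def average_speed(distance_at, t0, t):
    """Average speed (3.3) over the interval from t0 to t hours.

    `distance_at(t)` is a hypothetical function giving total miles traveled
    after t hours; we assume travel in one direction only.
    """
    if not t0 < t:
        raise ValueError("need t0 < t")
    return Fraction(distance_at(t) - distance_at(t0)) / (t - t0)


def steady(t):
    return 45 * t        # a motion at constant speed 45 mph


def speeding_up(t):
    return 5 * t * t     # a motion that is not at constant speed


intervals = [(0, 1), (Fraction(1, 3), 2), (2, Fraction(7, 2))]
assert all(average_speed(steady, a, b) == 45 for a, b in intervals)
print([str(average_speed(speeding_up, a, b)) for a, b in intervals])  # ['5', '35/3', '55/2']
```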
In doing word problems about a motion of constant speed v, the important
thing is to remember that no matter what time interval is used, the average speed
over this interval will always be the same, namely, v mph.
We have already done some (constant) rate problems in Section 1.9 of
[Wu-PreAlg], but we can now take up more complicated ones that require a more
substantial application of linear equations. We start with a prototypical problem
of this genre.
Example 4. Regina drives from Town A to Town B in 10 hours, and Eric in
12. Assume that each drives at constant speed. If Regina drives from Town A to
Town B, and Eric from Town B to Town A, and they leave at the same time and
drive on the same highway, after how many hours will they meet in between?
There is an implicit convention for problems of this type and it should be brought out: Regina and Eric are implicitly assumed to drive cars, and their cars are idealized to be two points.6 Likewise, the two towns A and B are also idealized to be two points.7 Without these two idealizations, it would be unclear as to what it means to say, for example, that “Regina drives from A to B in (exactly) 10 hours”. We should keep these idealizations in mind when doing this kind of problem.
Solution. We first determine the speeds of Regina and Eric. We do not know the distance between Towns A and B, so to facilitate thinking, let us say this distance is D miles.
[Figure: segment from Town A to Town B of length D mi, divided at the meeting point into $D_R$ mi and $D_E$ mi.]
Since Regina’s (constant) speed is also her average speed in the 10-hour drive
from Town A to Town B, her speed is therefore $\frac{D}{10}$ mph. Likewise, Eric’s speed is $\frac{D}{12}$ mph. We are trying to find out how long it will be before Regina and Eric
meet in between; let us say Regina and Eric meet after T hours. Note that D and
T are real numbers and, a priori, we do not know whether they will be fractions or
not. Therefore the following computations will have to invoke FASM (page 265)
many times, and we will not mention this fact again. In particular, we will make
use of the fact that the distributive law and formulas (a)–(d) for rational quotients
on page 270 are valid for real numbers.
Knowing that Regina has driven $T$ hours when she meets Eric, we can now determine the distance she has driven in $T$ hours; let us call this distance $D_R$ miles (see the preceding picture). Because Regina’s (constant) speed of $\frac{D}{10}$ mph is also her average speed during those $T$ hours, we have
$$\frac{D}{10} = \frac{D_R}{T}.$$
6 This is an example of “modeling”.
7 “Modeling” again.

Therefore, by multiplying both sides by $T$, we get:
$$D_R = \frac{D}{10} \cdot T = \frac{DT}{10}.$$
Similarly, if Eric has driven $D_E$ miles by the time he meets Regina, then
$$D_E = \frac{D}{12} \cdot T = \frac{DT}{12}.$$
Now observe that after $T$ hours, they meet in between the towns, so $D_R + D_E = D$ (see the preceding picture again). Consequently,
$$\frac{DT}{10} + \frac{DT}{12} = D.$$
This is the equation we have to work with, and you may be concerned about the fact that there are two unknowns in the equation: $D$ and $T$. But you can see that the presence of $D$ in the equation is spurious because “we can cancel the $D$”. Precisely, if we multiply both sides by $\frac{1}{D}$ and apply the distributive law to the left side, we obtain:
$$\frac{T}{10} + \frac{T}{12} = 1.$$
This is the equation we have to solve. Since it is $T\left(\frac{1}{10} + \frac{1}{12}\right) = 1$, we get
$$T = \frac{1}{\frac{1}{10} + \frac{1}{12}} = 5\frac{5}{11} \text{ (hours)}.$$
In other words, Regina and Eric meet after $5\frac{5}{11}$ hours.
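Since $D$ cancels, the whole computation reduces to $T = 1 / \left(\frac{1}{10} + \frac{1}{12}\right)$. The same formula covers the reformulations (4a)–(4c) that follow, with "hours to drive the whole way" replaced by "time to finish the job alone". The sketch below is illustrative only. The name `time_together` is hypothetical, and the sample distance $D = 660$ miles is our own choice, used only to confirm that the two partial distances add up to $D$.

```python
from fractions import Fraction as F


def time_together(*hours_alone):
    """Solve T * (1/h1 + 1/h2 + ...) = 1, where each hi is the time needed
    alone; as in the text, the total distance (or job size) cancels out."""
    return 1 / sum(1 / F(h) for h in hours_alone)


T = time_together(10, 12)
whole, rest = divmod(T, 1)
print(T, whole, rest)   # 60/11 5 5/11

# Check with a sample distance (D = 660 miles is our own choice).
D = 660
assert D / F(10) * T + D / F(12) * T == D
```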
It was pointed out in Section 1.9 of [Wu-PreAlg] that there is a certain “monot-
ony” to constant rate problems. For example, Example 4, which is about speed,
can be easily reformulated in terms of water flow or painting a house or mowing
a lawn. Consider, for example, the following problems.
(4a) Regina mows a lawn in 10 hours, and Eric in 12. Assuming that each
mows at constant rate, how long would it take them to mow the same lawn if
they mow together without interfering with each other?
(4b) Regina paints a house in 10 hours and Eric in 12. Assuming that each
paints at constant rate, how long would it take them to paint the same house if
they paint together without interfering with each other?
(4c) A faucet can fill a tub in 10 minutes, and a second faucet in 12. Assuming
that the rate of the water flow remains constant in each faucet, how long would it
take to fill the same tub if both faucets are turned on at the same time?
It is important to be able to recognize that the mathematics behind Example
4 and (4a)–(4c) is the same, and that if you can solve any one of these, the same
reasoning will allow you to solve them all.


Solve (4b).

Example 5. Water flows out of two faucets A and B at constant rate. Suppose
the water flow from faucet A is 10 gallons per minute more than that from faucet
B, and suppose a container has a capacity of 150 gallons. If both faucets are turned on at the same time and the container is filled in $1\frac{1}{2}$ minutes, what are the rates
of the water flows in both faucets?
Solution. Let the rate of water flow from faucet A be $x$ gallons per minute. Then the rate from faucet B is $x - 10$ gallons per minute. Suppose the amount of water coming out of faucet A after $1\frac{1}{2}$ minutes is $w_A$ gallons; then the average rate of the water flow from faucet A in $1\frac{1}{2}$ minutes is, by definition,
$$\frac{w_A}{1\frac{1}{2}} \text{ gal/min}.$$
Since this average rate is equal to $x$ gal/min (because of the constancy of the rate of water flow), we have
$$x = \frac{w_A}{1\frac{1}{2}},$$
and therefore $w_A = 1\frac{1}{2} \cdot x$. Similarly, the amount of water $w_B$ coming out of faucet B after $1\frac{1}{2}$ minutes is $w_B = 1\frac{1}{2} \cdot (x - 10)$. Since by hypothesis the container of 150 gallons is filled after $1\frac{1}{2}$ minutes when both faucets A and B are turned on at the same time, we see that $w_A + w_B = 150$. Therefore
$$1\frac{1}{2}x + 1\frac{1}{2}(x - 10) = 150. \tag{3.4}$$
There are many ways to solve this equation. One can, for example, clear the denominators of the equation (see page 42). However, it is actually simpler in this case to use the distributive law to expand the left side to get $1\frac{1}{2}x + 1\frac{1}{2}x - \left(1\frac{1}{2} \times 10\right)$, which is immediately seen to be equal to $3x - 15$. Thus $3x - 15 = 150$, so that $3x = 165$ and $x = 55$. The answer is therefore: the rate of water flow from faucet A is 55 gal/min and that from faucet B is 45 (= 55 − 10) gal/min.
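Equation (3.4) can be solved and checked with exact arithmetic. In this sketch, `t` is our own name for the $1\frac{1}{2}$ minutes. Expanding $t x + t(x - 10) = 150$ gives $2t\,x = 150 + 10t$, which the code solves for $x$.

```python
from fractions import Fraction as F

t = F(3, 2)   # 1 1/2 minutes
# Equation (3.4): t*x + t*(x - 10) = 150, that is, 2t*x = 150 + 10t
x = (150 + 10 * t) / (2 * t)
print(x, x - 10)            # 55 45
assert t * x + t * (x - 10) == 150
```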
Example 6. Karen and Lisa paint houses at a constant rate. Suppose Karen
paints 10 square meters more per hour than Lisa, and suppose a wall has an area
of 150 square meters. If both Karen and Lisa paint this wall at the same time and
they finish it in $1\frac{1}{2}$ hours, what are the rates at which each paints?
Again, there is an unspoken convention for this kind of collaborative-work problem: Karen and Lisa are supposed to be able to work simultaneously without any interference from the other person.8
Solution. At this point, we may assume that we know how to define constant
rate of painting (in sq. m per hour) as the number r so that the average rate of
painting from time t0 to time t is equal to r sq. m per hour no matter what t0
and t may be.
Let Karen paint $x$ square meters per hour. Then Lisa paints $x - 10$ square meters per hour. If after $1\frac{1}{2}$ hours Karen has painted $K$ sq. m, then $x = K/\left(1\frac{1}{2}\right)$ because she paints at a constant rate, so that $K = 1\frac{1}{2}x$ sq. m. Similarly, Lisa paints $1\frac{1}{2}(x - 10)$ sq. m in $1\frac{1}{2}$ hours. So in $1\frac{1}{2}$ hours they have painted a combined area of $1\frac{1}{2}x + 1\frac{1}{2}(x - 10)$ sq. m. Since the area of the wall is 150 square meters, we have
$$1\frac{1}{2}x + 1\frac{1}{2}(x - 10) = 150.$$
8 Again, an example of modeling.

Comparing this equation with (3.4), we realize that we are doing the same prob-
lem as Example 5! Therefore the solution is that Karen paints at a rate of 55 square
meters per hour and Lisa 45 square meters per hour.
Example 7. Tom and May drive on the same highway at constant speed. May
starts 30 minutes before Tom, and her speed is 45 mph. Tom’s speed is 50 mph.
How many hours after May leaves will Tom catch up with her?
Solution. (As in the preceding examples, we will have to make use of FASM
throughout the following discussion.) We give two slightly different solutions.
Suppose T hours after May leaves, Tom catches up with May. In those hours,
May has driven 45T miles. Now Tom does not start driving until half an hour
after May does, therefore at the time he catches up with May, he has only driven
$T - \frac{1}{2}$ hours. The total distance he travels in that time duration is thus $50\left(T - \frac{1}{2}\right)$ miles. The fact that Tom catches up with May after $T$ hours means that the two distances, $45T$ miles and $50\left(T - \frac{1}{2}\right)$ miles, are equal, i.e., $45T = 50\left(T - \frac{1}{2}\right)$. By
the distributive law, 45T = 50T − 25. Adding 25 to both sides, we get 45T + 25 =
50T, and so we get 25 = 5T after adding −45T to both sides. Thus T = 5, i.e., 5
hours after May leaves, Tom catches up with her.
A second solution is obtained by imagining we can watch Tom’s car from
May’s car. Since she travels 45 miles in an hour and her speed is constant, she
travels $\frac{1}{2} \times 45 = 22.5$ miles in half an hour. So when we watch Tom’s car from
May’s car half an hour after she leaves, we see Tom’s car coming from a distance
of 22.5 miles. Suppose after Tom has driven t hours, he catches up with May.
In those t hours, May’s car travels 45t miles, whereas Tom’s car travels 50t
miles. The fact that Tom catches up with May after t hours means in t hours, Tom
has driven 22.5 miles more than May. Consequently, 50t − 45t = 22.5, so that
5t = 22.5 and t = 4.5 hours. Since Tom starts 0.5 hours after May leaves, it takes
Tom 4.5 + 0.5 = 5 hours after May leaves to catch up with her.
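The two solutions of Example 7 can be compared side by side. This is a minimal sketch with variable names of our own choosing. The first computation solves $45T = 50\left(T - \frac{1}{2}\right)$ directly. The second closes a head start of $45 \cdot \frac{1}{2}$ miles at the relative speed $50 - 45$ mph.

```python
from fractions import Fraction as F

head_start = F(1, 2)      # May leaves half an hour earlier (hours)
v_may, v_tom = 45, 50     # constant speeds in mph

# First solution: 45T = 50(T - 1/2), so (50 - 45)T = 50 * (1/2)
T = v_tom * head_start / (v_tom - v_may)

# Second solution: Tom must close a gap of 45 * (1/2) miles at 50 - 45 mph
gap = v_may * head_start
t = gap / (v_tom - v_may)

print(T, gap, t, t + head_start)   # 5 45/2 9/2 5
assert T == t + head_start
```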

Exercises 3.2

(1) A man has six hours at his disposal. What is the furthest he can ride in
a car going at a constant speed of 25 mph if he has to get back to the
starting point by riding a bicycle at the constant rate of 6 mph?
(2) A train loses 16 of its passengers at the first stop, 25 at the second, 20%
of the remainder at the third, and three quarters of the remainder at
the fourth. After all that, 25 passengers remain. What was the original
number of passengers?
(3) Water flows out of two faucets A and B at constant rate. Faucet A fills
a given container in 5 minutes, while faucet B fills it in 6 minutes. How
long would it take to fill the container if both faucets are turned on at
the same time?
(4) The numerator of a fraction is 7 less than the denominator. If 4 is subtracted from the numerator and 1 added to the denominator, the resulting fraction equals 1/3. What is the fraction?
(5) Alan had twice as much money as Bill, but after giving Bill $28, he has
3 as much as Bill. How much did each have at first?
(6) Find two numbers whose sum is 76 and whose difference is 16 .

(7) Lisa and Karen mow lawns at a constant rate. Lisa mows a certain lawn
by herself in 4 hours, but with Karen’s help from the beginning, she does
it in 3 hours. How long would it take Karen to mow it alone?
(8) There are two heaps of coins, one containing nickels and the other dimes.
The second heap is worth 20 cents more than the first, and has 8 fewer
coins. Find the number in each heap.
(9) If A has $566 and B has $370, how much money must A give B so that B
has 4/5 as much as A?
(10) A woman drives a car for 3½ hours and she finds that she has covered a
distance of 130 miles. If she drives at a constant speed of 45 mph in the
country and 20 mph within city limits, how many miles of her trip is in
the country?
(11) (Sixth-grade Japanese exam question) A train 132 meters long travels at
87 kilometers per hour and another train 118 meters long travels at 93
kilometers per hour. Both trains are traveling in the same direction on
parallel tracks. How many seconds does it take from the time the front
of the locomotive of the faster train reaches the end of the slower train to
the time that the end of the faster train reaches the front of the locomotive
on the slower one?
(12) Two trains A and B run at constant speed. Train A goes from City P to
City Q in two hours whereas Train B goes from Q to P in three hours. If
A leaves P for Q at the same time that B leaves Q for P (on a separate
but identical rail!), after how many hours will they meet?
(13) Winnie and Reggie working together can paint a house in 56 hours. If
Reggie paints the same house alone, it takes him 90 hours to get it done.
How long would it take Winnie to paint the house if she works alone?
(Assume each paints at a constant rate, and that when they paint together
there is no mutual interference.)
(14) Two cars A and B move at constant speed. A starts from P to Q, 150
miles apart, at the same time that B starts from Q to P. They meet at the
end of 1½ hours. If A moves 10 miles per hour faster than B, what are
their speeds?
(15) Alfred, Bruce, and Chuck mow lawns at a constant rate. It takes them 2
hours, 1.5 hours, and 2.5 hours, respectively, to finish mowing a certain
lawn. If they mow the same lawn at the same time, and if there is no
interference in their work, how long would it take them to get it done?
(16) Paul can mow a certain lawn all by himself in 11 hours. After working for 2½ hours, however, Paul is joined by Henry and the two together finish mowing the lawn in another 5 hours. Assuming as always that both mow the lawn at a constant rate, how long would it take Henry to mow the lawn alone? Explain clearly how you get the solution.
(17) Water flows out of two faucets, A and B, at a constant rate. If both faucets
are turned on at the same time, a tub is filled in 36 minutes. If faucet A
alone can fill this tub in 58 minutes, how long would it take for faucet B
to fill it alone?
(18) A man walked at constant speed from one place to another in 5½ hours. If he had walked ¼ of a mile faster in each hour, the walk would have taken only 5 hours. How long is the walk and what was his original speed?
(19) A solution consisting of water and alcohol has 70% alcohol. If 25 cc of
water is added to the solution, how much alcohol must be added in order
for the solution to still contain 70% alcohol?
(20) Fifteen minutes after Colin leaves for school, his mother discovers that he
forgot to take his homework. She drives at a constant rate, and it takes
her 6 minutes to get to school. Colin walks to school at a constant rate,
and it takes him 24 minutes to get there. (i) Use mental math to decide
if Colin’s mother can catch up with him. (ii) If she does, compute how
soon this happens after Colin leaves.


Linear Equations in Two Variables and Their Graphs
The subject of linear equations of two variables and their graphs (lines) is central
to introductory algebra. It also happens to be an area in which TSM1 commits
some of its most flagrant errors.
Any discussion of the graph of a linear equation of two variables requires a
knowledge of geometry and algebra. We have done the groundwork in geometry
in Chapter 4 of [Wu-PreAlg], and the preceding three chapters provide the preparation in algebra. The first serious confrontation of algebra with geometry occurs in the definition of the slope of a line, which is the key concept that unlocks the mystery of why the graph of a linear equation in two variables is a line. In TSM, one is supposed to understand that this is true because, when points of the graph are plotted, they look like part of a line. When linear equations are presented this way on the basis of faith, mathematics (not being faith-based) becomes unlearnable. Students’ misery in trying to cope with slope and the graphs of linear
equations has been well documented (cf. [Postelnicu], [Postelnicu-Greenes], and
[Stump]). The misery is set in motion by TSM’s refusal (or inability) to define
slope correctly (compare the discussion in Section 4.1 of [Wu-PreAlg]). Without
a correct definition, students do not realize that the slope of a line is a number
that describes its slant, and reasoning with slope becomes impossible. Thus ev-
erything about the graph of a linear equation must henceforth be committed to
rote memorization. It is shocking that this glaring defect in TSM—the incorrect
definition of slope—has been consistently overlooked in mathematics education
research of the past decades and has been allowed to stay in the school curriculum
for so long.
The first goal of this chapter is, therefore, to give a correct definition of slope
(page 66).2 It then uses this definition to prove the theorem that the graph of a
linear equation of two variables is a line (Theorem 4.2 on page 60). This theorem

1 See page xi for the definition of TSM.

2 The definition and treatment of slope given here were first presented in my 2013 Mathematics
Professional Development Institute and are published here for the first time. In the meantime, I agreed
to let it be used in [EngageNY] and [Eureka].


is never stated in TSM, and therefore not proved either, but it is the central the-
orem of this topic because the reasoning in the proof provides students with the
tools that render all standard problems involving equations of lines into routine exercises.

4.1. Coordinate system in the plane

Before discussing the graphs of linear equations, we have to set up a coordinate
system in the plane, in the sense that we will associate to each point of the plane
a unique ordered pair of numbers, and vice versa. Because this is a standard
process, we will merely outline the main points of how you can teach this in the
school classroom. In the procedure, we will need the fact that opposite sides of a
parallelogram are equal as well as the concept of the distance of a point from a given
line. Let us address this preparatory material first.

Theorem 4.1. Opposite sides of a parallelogram are equal.3

Proof. Let the parallelogram be ABCD. We have to prove that AB = CD and
BC = DA.

@ c

a @

@ b

d @


By hypothesis, AB ∥ DC, so the alternate interior angles ∠a and ∠b with respect
to the diagonal AC are equal (Theorem 4.7 on page 271; it is proved in Section
4.6 of [Wu-PreAlg]). For the same reason, ∠c = ∠d. Of course, the triangles ABC
and CDA have side AC in common. Therefore △ABC ≅ △CDA on account of
ASA. Corresponding sides of congruent triangles being equal, we have AB = CD
and BC = DA. The proof is complete.
Next, let P be a point not lying on a line ℓ, and let Q be a point on ℓ so that
the line $L_{PQ}$ ⊥ ℓ.
Take any point Q′ on ℓ so that Q′ ≠ Q; then |PQ′| > |PQ| on account of the
Pythagorean Theorem applied to the right triangle PQQ′, whose hypotenuse is PQ′
(see page 270; this is Theorem 4.15 of [Wu-PreAlg]). Thus |PQ| is the shortest
distance from P to a point on ℓ. For this reason, we call |PQ| the distance of P from ℓ.
Now choose two perpendicular lines in the plane which intersect at a point
to be called O. It is traditional to make one of the lines horizontal in the sense of

3 See Exercise 11 in Exercises 4.6 of [Wu-PreAlg] for the suggestion of another proof.

being parallel to the lower edge of the page; then the other line is vertical in the
sense of being parallel to the left and right edges of the page. Also by tradition,
the horizontal line is designated as the x-axis, and the vertical one the y-axis.
By regarding these two lines as number lines, we may henceforth identify every
point on these coordinate axes (as the x- and y-axes have come to be called) with
a number. As expected, we choose the positive numbers on the x-axis to be on
the right of O so that O is the 0 of the x-axis, and we choose the positive numbers
on the y-axis to be above O on the y-axis so that O is also the 0 of the y-axis. The
ray on the x-axis with vertex O and which contains the positive numbers is called
the positive x-axis; the positive y-axis is similarly defined. O is called the origin
of the coordinate system.
Recall that a number line depends on the choices of a point as 0 and another
point as 1. In the case of the x-axis and y-axis, the choice of 0 is already specified
by the requirement that the point of intersection O be also the 0 on both axes.
Once the choice of 1 on one axis, let us say the x-axis, has been made (to the right
of O), then the choice of 1 on the y-axis will be uniquely determined because the
counterclockwise rotation ϕ (the lower case Greek letter phi) of 90 degrees around
O has to be length-preserving (see assumption (Iso1) on page 265). Therefore if
the 1 on the x-axis is denoted by A, then the 1 on the y-axis has to be the point
ϕ ( A ).
[Figure: the origin O, the 1 on the x-axis, and the 1 on the y-axis, which is ϕ(A).]

Now we can associate to each point P in the plane an ordered pair of numbers
in the following way. Let us agree to call any line parallel to the x-axis a horizontal
line, and also any line parallel to the y-axis a vertical line. Then through P draw
two lines, one vertical and one horizontal, so that they intersect the x-axis at
a number a and the y-axis at a number b, respectively. Then the ordered pair of
numbers ( a, b) is said to be the coordinates of P (relative to the chosen coordinate
axes); a is called the x-coordinate and b the y-coordinate of P (relative to the
chosen coordinate axes), as shown:

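[Figure: a point P, with the vertical line through P meeting the x-axis at a and the horizontal line through P meeting the y-axis at b.]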

Notice that the coordinate pair associated with a point is unique, i.e., unambigu-
ous: it cannot happen that a given P is associated with two distinct pairs of
numbers (a, b) and (a′, b′), where a ≠ a′ or b ≠ b′. This is because, by the Parallel
Postulate, the horizontal and vertical lines passing through P are unique, and
therefore their intersections with the x-axis and y-axis are also unique.
Notice also that, because the pair ( a, b) is ordered, the first number a is always
the x-coordinate rather than the y-coordinate. Similarly, the second number b
will always be the y-coordinate and not the x-coordinate. The coordinates of a
number x on the x-axis are ( x, 0), and the coordinates of a number y on the y-axis
are (0, y).
Now, by construction, PaOb is a parallelogram. By Theorem 4.1, the length of
the segment from P to b, | Pb|, is just | a|. Likewise, the length of the segment from
P to a, | Pa|, is just |b|. Since the line L Pa joining P to a is parallel to the y-axis
and the y-axis is perpendicular to the x-axis, we see that L Pa is perpendicular
to the x-axis (again, see Theorem 4.7 on page 271). For the same reason, L Pb is
perpendicular to the y-axis. Thus, | a| is in fact the distance from P to the y-axis,
and |b| is the distance from P to the x-axis. We have therefore obtained a different
interpretation of the coordinates of P:
The x-coordinate of P is the distance from P to the y-axis if P is in the
right half-plane of the y-axis, and is minus this distance from P to the
y-axis if P is in the left half-plane of the y-axis.
The y-coordinate of P is likewise the distance from P to the x-axis if
P is in the upper half-plane of the x-axis, and is minus this distance
from P to the x-axis if P is in the lower half-plane of the x-axis.
Conversely, with a chosen pair of coordinate axes understood, given an
ordered pair of numbers (a, b), there is one and only one point in the plane with
coordinates ( a, b). Precisely, this is the point of intersection of the vertical line
passing through ( a, 0) and the horizontal line passing through (0, b). These two
lines being unique, by virtue of the Parallel Postulate, the point of intersection is
also unique.
We therefore see that, with a pair of coordinate axes in place, there is a one-to-
one correspondence between all the points in the plane and all the ordered pairs
of numbers, in the sense that we assign to each point in the plane a unique ordered
pair of numbers and, conversely, we assign to each ordered pair of numbers a
unique point in the plane. These assignments have the property that
if we assign to a point P the ordered pair of numbers ( a, b), then the
point in the plane we assign to the ordered pair of numbers ( a, b) is
also P, and
if we assign to an ordered pair of numbers ( a, b) the point P in the
plane, then the ordered pair of numbers we assign to P is also ( a, b).
This one-to-one correspondence is the reason that the coordinate plane is denoted
by the symbol ℝ² in mathematics, i.e., we identify the plane with the collection of
all ordered pairs of real numbers.
With this one-to-one correspondence understood, we proceed to adopt the usual
abuse of notation by identifying a point with its corresponding ordered pair of
numbers. In the plane, we define (a, b) = (c, d) to mean that the points
represented by (a, b) and (c, d) are the same point. Since we have just shown
that every point corresponds to one and only one ordered pair of numbers, we see that
(a, b) = (c, d) is equivalent to a = c and b = d.
[Margin note: We usually identify a point of the plane with its corresponding ordered pair of numbers.]
Again, note that there is no ambiguity as to what the equality between two or-
dered pairs of numbers means.
We now make contact with a few geometric concepts that we have introduced
earlier. The first is a standard application of the Pythagorean Theorem: the dis-
tance between any two points (a, b) and (c, d) is
$$\sqrt{(a-c)^2 + (b-d)^2}.$$
This is usually called the distance formula for two points. The proof is so straight-
forward that it can be left as an exercise.
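As a numerical companion to the distance formula, here is a minimal Python sketch. The function name `distance` is our own. Note that `math.sqrt` returns a floating-point approximation, so exact equality is safe only in examples like the one below, where the square root is a whole number.

```python
import math


def distance(p, q):
    """Distance between p = (a, b) and q = (c, d): sqrt((a - c)^2 + (b - d)^2)."""
    (a, b), (c, d) = p, q
    return math.sqrt((a - c) ** 2 + (b - d) ** 2)


print(distance((1, 2), (4, 6)))  # 5.0: the legs have lengths 3 and 4
```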
We can also express some basic isometries in terms of coordinates. The re-
flection across the x-axis maps a point ( x, y) to ( x, −y), and the reflection across
the y-axis maps a point ( x, y) to (− x, y) (x and y are any numbers). These fol-
low directly from the way coordinates are defined and from the definition of a
reflection (see Exercise 5 below). On page 114, one also finds a description of the
coordinates of points under reflection across the diagonal line y = x. In addition,
we can also express a translation in terms of coordinates; see Lemma 5.3 on page
95 in the next chapter; this lemma plays an important role in the discussion of
quadratic functions in Chapter 10.
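The coordinate descriptions of these reflections can be written as functions on ordered pairs. The sketch below is illustrative only, and its function names are hypothetical. It encodes the two reflections across the axes stated above, together with the formula (x, y) ↦ (y, x) for the reflection across the line y = x that appears in Exercise 3 below. It then checks that applying each reflection twice returns every test point to where it started.

```python
def reflect_across_x_axis(point):
    """Reflection across the x-axis: (x, y) -> (x, -y)."""
    x, y = point
    return (x, -y)


def reflect_across_y_axis(point):
    """Reflection across the y-axis: (x, y) -> (-x, y)."""
    x, y = point
    return (-x, y)


def reflect_across_diagonal(point):
    """Reflection across the line y = x: (x, y) -> (y, x)."""
    x, y = point
    return (y, x)


reflections = (reflect_across_x_axis, reflect_across_y_axis, reflect_across_diagonal)
for reflect in reflections:
    for point in [(3, -5), (0, 2), (-1, -1)]:
        # Applying a reflection twice gives back the original point.
        assert reflect(reflect(point)) == point

print([reflect((3, -5)) for reflect in reflections])  # [(3, 5), (-3, -5), (-5, 3)]
```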
There are some fine points about the drawing of a coordinate system that we
will have to confront at some point; see the discussion in Section 6.4.

Exercises 4.1
(1) (i) Let L be the vertical line passing through (5, 0) and let R denote the
reflection across L. What are the coordinates of R( x, y), the reflection of
( x, y) across L? (ii) Repeat part (i) when L is now the horizontal line
passing through (2, −3).
(2) Prove the distance formula for two points.
(3) Let D be the line which bisects the right angle whose sides are the posi-
tive x-axis and the positive y-axis. (i) Prove that the coordinates of every
point on D are (t, t) for some number t. (ii) Let Λ be the reflection with re-
spect to D. Prove that for any point (x, y) in the plane, Λ(x, y) = (y, x).
(4) Let R be the 180° rotation with respect to the origin O of a coordinate
system. Then for any point ( x, y) in the plane, prove that R( x, y) =
(− x, −y).
(5) Prove the claims about the coordinates of points under reflections across
the x- and y-axes above.

4.2. Linear equations in two variables

An equation in two numbers x and y such as ax + by = c for some constants
a, b, and c is called a linear equation in two variables. Thus x − 2y = −2 is
an example of a linear equation in two variables. Recall, by the definition of an
equation (page 28), the equation x − 2y = −2 is the question that asks whether
there are numbers x and y that satisfy x − 2y = −2. A solution of this equation

is an ordered⁴ pair of numbers (A, B) so that A and B satisfy the equation, i.e.,
A − 2B = −2. We observe that in this situation, it is easy to find all the solutions
with a prescribed first number A or a prescribed second number B. For example,
with the first number prescribed as 3, we solve the linear equation in y,
3 − 2y = −2, to get $y = \frac{5}{2}$ (see Section 3.1). Therefore $(3, \frac{5}{2})$ is the sought-
for solution. Or, if the second number is prescribed to be −1, then we solve the
linear equation in x, x − 2(−1) = −2, to get x = −4. The solution is now
(−4, −1). Relative to a pair of coordinate axes in the plane, the collection of all
the points ( A, B) in the coordinate plane so that each pair ( A, B) is a solution
of the equation x − 2y = −2 is called the graph of x − 2y = −2 in the plane.
Using the above method of getting all the solutions of the equation x − 2y = −2,
we can plot as many points of the graph as we please to get a good idea of the
graph. For example, the picture below shows six points on the graph (the dots),
going from left to right:

(0, 1), (2, 2), (2.5, 2.25), (4, 3), (6, 4), (7, 4.5).
These points strongly suggest that the graph of x − 2y = −2 is a (straight) line,
and we will presently prove in Section 4.4 that such is in fact the case.


[Figure: the six points above plotted in the coordinate plane, with 2, 4, 6 marked on the x-axis.]
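The method just described is easy to mechanize: prescribe one number, then solve a linear equation in one variable for the other. The following minimal Python sketch uses exact fractions, so that 5/2 stays 5/2 rather than 2.5. The function names are hypothetical, and each function assumes that the coefficient it divides by is nonzero.

```python
from fractions import Fraction


def y_given_x(a, b, c, x):
    """Solve a*x + b*y = c for y when x is prescribed (requires b != 0)."""
    return (Fraction(c) - a * x) / b


def x_given_y(a, b, c, y):
    """Solve a*x + b*y = c for x when y is prescribed (requires a != 0)."""
    return (Fraction(c) - b * y) / a


a, b, c = 1, -2, -2  # the equation x - 2y = -2
print(y_given_x(a, b, c, 3))   # 5/2, so (3, 5/2) is a solution
print(x_given_y(a, b, c, -1))  # -4, so (-4, -1) is a solution

# The six plotted points, recomputed from their first coordinates.
points = [(x, y_given_x(a, b, c, x)) for x in (0, 2, Fraction(5, 2), 4, 6, 7)]
assert all(x - 2 * y == -2 for x, y in points)
```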

However, for the graphs of the two special kinds of linear equations in two
variables in the form of x = a or y = b, where a and b are specific numbers, we
can prove that their graphs are lines without further ado. We single out these
two equations for another reason: their graphs are confusing to students, partly
because TSM⁵ does not explain them well. Let us go over these cases carefully.
Consider, for example, y = 3, which, as an equation in two variables, is in
reality the abbreviated form of the equation 0 · x + 1 · y = 3. The collection of all
solutions of y = 3 is then exactly all the pairs (s, 3), where s is an arbitrary number,
for the following reason. Every one of these (s, 3)’s is clearly a solution because

4 We emphasize that (x₀, y₀) being an ordered pair means (x₀, y₀) ≠ (y₀, x₀), unless of course
x₀ = y₀. Thus the point (3, 5) is not the same as the point (5, 3), and this is most obvious when we
think of them as points in the plane: (3, 5) and (5, 3) are two distinct points which lie on different
sides of the line that is the graph of x − y = 0.
5 See page xi for the concept of TSM. The confusion in this case is largely the result of not
emphasizing the precise definition of the graph of an equation.


0 · s + (1 × 3) = 3. Are there perhaps other pairs of numbers which are also
solutions? For example, (s, 3.1)? But 0 · s + (1 × 3.1) = 3.1 ≠ 3, so (s, 3.1) is not
a solution of y = 3 for any number s. Similarly, if a number t is not equal to
3, then (s, t) is not a solution of y = 3 no matter what s may be. This shows
that the preceding assertion about the pairs (s, 3) is true. In terms of the graph,
the points with coordinates (s, 3) always lie on the horizontal line (i.e., parallel to
the x-axis) passing through the point (0, 3) on the y-axis, and since s is arbitrary,
these points (s, 3) then comprise the complete horizontal line passing through
(0, 3). In short, the graph of the equation y = 3 in the plane is exactly the horizontal
line passing through the point (0, 3) on the y-axis.


Similarly, the graph of the equation x = −2 (as an equation in two variables)

is the vertical line (i.e., parallel to the y-axis) passing through the point (−2, 0)
on the x-axis. In general, we conclude in a similar manner:
The graph (in ℝ²) of x = c for a given number c is the vertical line
passing through the point (c, 0) on the x-axis, and the graph (in ℝ²)
of y = b for a given number b is the horizontal line passing through
the point (0, b) on the y-axis.
Since there is only one horizontal (respectively, vertical) line passing through a
given point of the plane (do you know why?), it follows that every vertical line is the
graph of some equation x = c, and every horizontal line is the graph of some equation
y = b. Both of these simple facts are well known, but the precise reasoning behind
them may have been missing. We have supplied so much detail to explain them
because every middle school student should understand that these facts are not
facts to be memorized by brute force, but are consequences of careful reasoning
and the precise definition of the graph of an equation.
We next treat the general case. A linear equation in two variables x and y
is an equation in the numbers x and y which is either of the form ax + by = c,
where a, b, and c are constants (see page 5 for the definition) and at least one
of a and b is nonzero, or can be rewritten in this form after transposing and
using the four arithmetic operations. Thus $-2x = \frac{2}{5}y + 7$ and
$6 + \frac{3}{8}y = 179 - 5x$ are examples of linear equations of two variables, as is
$x^2 - x + 5 = x^2 + 2y + 16$. We call attention to the requirement in the defini-
tion that at least one of a and b be nonzero. There will be ample occasions to make
use of this requirement.
[Margin note: The fact that the graph of a linear equation in two variables is a line has to be proved.]
A solution of this equation is an ordered pair of numbers A and B, written in
the expected fashion as ( A, B), so that they satisfy the equation ax + by = c, in
the sense that aA + bB = c. The graph of ax + by = c (in ℝ²) is the collection
of all the points in the plane with coordinates (A, B) (relative to a given pair of
coordinate axes), so that each is a solution of ax + by = c. As we have seen, and
as we will continue to see, the study of linear equations of two variables
is grounded in the study of linear equations of one variable.
Armed with these precise definitions, we are now in a position to state the
main theorem of this chapter.
Theorem 4.2. The graph of a linear equation in two variables is a line. Conversely,
every line in ℝ² is the graph of a linear equation in two variables.
This theorem establishes a correspondence between lines in the plane and
the graphs of linear equations in two variables: the graph of a linear equation
ax + by = c is a line L, and every line L is the graph of some equation of the form
ax + by = c for suitable constants a, b, and c. It is customary to call ax + by = c
the equation of the line L if L is the graph of ax + by = c, and say that L is
defined by ax + by = c.6 Incidentally, this theorem explains why equations of
the form ax + by = c are called linear equations, because their graphs are lines.
The reasoning in the proof of this theorem, given in this and the following two sec-
tions, provides the key to the understanding of almost everything about linear equations
in two variables in introductory algebra.
We want to make a minor, but significant, simplification in the subsequent
discussion of Theorem 4.2. Suppose we start with a linear equation ax + by = c.
If b = 0, then by the definition of a linear equation in two variables, a ≠ 0. The
equation may therefore be rewritten as x = c′, where c′ is the constant $c' = \frac{c}{a}$.
In this case, we have seen (page 59) that the graph is a vertical line. The first part
of Theorem 4.2 is therefore true in this case, and we may assume from now on
that b ≠ 0 in a given equation ax + by = c. Such being the case, we may rewrite
the equation as by = −ax + c, and therefore y = mx + k, where $m = -\frac{a}{b}$ and
$k = \frac{c}{b}$. On the other hand, we have seen that a vertical line ℓ is the graph of x = c′
(i.e., x + 0 · y = c′), where (c′, 0) is the point at which ℓ intersects the x-axis (see
page 59 again). In other words, the second part of Theorem 4.2 is also true for
vertical lines. We may therefore assume from now on that a given line is not a vertical
line. In summary,
we may assume in the subsequent discussion of Theorem 4.2 that a
given linear equation is of the form y = mx + k, where m and k are
constants, and that a given straight line is nonvertical.
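The case analysis above is mechanical enough to write down as a short function. In this minimal Python sketch, the name `normalize` and the tagged-tuple return values are our own choices. Exact fractions keep m = −a/b and k = c/b free of rounding. When b = 0, the function returns the vertical line x = c/a instead.

```python
from fractions import Fraction


def normalize(a, b, c):
    """Rewrite a*x + b*y = c, where a and b are not both 0.

    Returns ("vertical", c/a) for the line x = c/a when b == 0, and
    ("slope-intercept", m, k) for y = m*x + k with m = -a/b, k = c/b otherwise.
    """
    if a == 0 and b == 0:
        raise ValueError("not a linear equation in two variables: a = b = 0")
    if b == 0:
        return ("vertical", Fraction(c) / a)
    return ("slope-intercept", Fraction(-a) / b, Fraction(c) / b)


# x - 2y = -2 (a = 1, b = -2, c = -2) becomes y = (1/2)x + 1.
assert normalize(1, -2, -2) == ("slope-intercept", Fraction(1, 2), Fraction(1))
# 4x + 0*y = 6 is the vertical line x = 3/2.
assert normalize(4, 0, 6) == ("vertical", Fraction(3, 2))
```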
Finally, we note that while the preceding definitions of equations, solutions,
and graphs of equations appear to be valid only for linear equations in two vari-
ables, they are in fact valid for any equation (not necessarily linear) in any number
of variables. For example, let F( x, y) be an expression in the two numbers x and y
and let c be a fixed number. Then the equation F( x, y) = c is a question that asks
whether there are numbers x0 and y0 that satisfy F( x0 , y0 ) = c. An ordered pair
( x0 , y0 ) so that F( x0 , y0 ) = c is called a solution of the equation F( x, y) = c, and
the set of all the solutions of the equation F( x, y) = c is by definition the graph of
F ( x, y) = c. We will come across equations in two variables many more times in
the remainder of this volume.

6 Lemma 4.10 on page 79 below shows that the equation of a given line is unique up to a constant
multiple, so the terminology of the equation of a line is justified.



Let F(x, y) = x² + y². What is the graph of F(x, y) = 9?

Exercises 4.2

(1) Explain clearly why each of the following figures fails to be the graph
of the equation y = 3. (a) The figure consisting of the horizontal line
passing through the point (0, 3.1). (b) The figure consisting of all the
points (x, 3) so that x ≠ 2. (c) The figure consisting of the horizontal
line passing through (0, 3) together with (0, 0).
(2) Let G be the graph of the equation −5x + y = 8. What is the point of
intersection of the x-axis with G? Explain your answer as clearly as you can.
(3) Let G be the graph of the equation $x + \frac{2}{3}y = 1$. What is the point of
intersection of the y-axis with G? Explain your answer as clearly as you can.

4.3. The concept of slope

Let us approach the proof of Theorem 4.2 by first looking at a special case such
as $y = \frac{2}{3}x + 2$. Why is the graph of this equation a line? The reasoning in this
case will shed light on the general case. So let G be the graph of $y = \frac{2}{3}x + 2$. Notice
that the point (0, 2) on the y-axis and the point (−3, 0) on the x-axis are on
G, because $2 = (\frac{2}{3} \times 0) + 2$ and $0 = \frac{2}{3} \times (-3) + 2$. Let L be the (straight) line
joining (0, 2) and (−3, 0). We are going to prove that G is the line L.


[Figure: the line L through (0, 2) on the y-axis and (−3, 0) on the x-axis.]

Let us recall how to show that the two geometric figures G and L are equal
(see page 267; the discussion of equality of geometric figures is given in Section 4.4
of [Wu-PreAlg]). We first have to show that every point on the graph G lies on
the line L. But this is not enough because G could just be part of L and not all of
L. For example, G could be the segment on L indicated by the thickened portion
of L in the following picture:


[Figure: the line L with a segment of it thickened, representing a graph G that is only part of L.]
So we must also show that every point of L is a point of G. Therefore, we must
show two things (we label them by (α) and (β) to raise their profile):
(α) Every point on the graph G is a point on the line L.
(β) Every point on the line L is a point on the graph G.
The proofs of these assertions require some preparation, and we will address
the preparatory material before returning to these proofs on page 73. We begin
by reviewing some facts concerning similar triangles.7 Recall that two geometric
figures are similar if one is mapped onto the other by a dilation followed by a con-
gruence. The fundamental fact governing dilation is the following theorem; it is
an immediate consequence of Theorems 4.4 and 4.5 in Section 4.6 of [Wu-PreAlg].
Theorem 4.3. Let △ABC be given, and let D be a point on AB. Let the line passing
through D and parallel to BC intersect AC at E. Then
$$\frac{|AB|}{|AD|} = \frac{|AC|}{|AE|} = \frac{|BC|}{|DE|}.$$




Check that indeed Theorem 4.3 follows from Theorems 4.4 and 4.5 in Section
4.6 of [Wu-PreAlg].
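Theorem 4.3 can be sanity-checked numerically; this is a check, not a proof. The Python sketch below rests on assumptions of our own. The helper names are hypothetical, and the sketch locates E with a cross-product test for parallelism, which is a coordinate technique the text does not use. It compares squared ratios with exact fractions to avoid irrational square roots. Since all lengths are positive, equal squared ratios mean equal ratios.

```python
from fractions import Fraction


def vector(p, q):
    """The vector from point p to point q."""
    return (q[0] - p[0], q[1] - p[1])


def cross(u, v):
    """u0*v1 - u1*v0, which is 0 exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


def squared_distance(p, q):
    dx, dy = vector(p, q)
    return dx * dx + dy * dy


def point_e(a, b, c, d):
    """Point E on line AC such that DE is parallel to BC (A, B, C not collinear)."""
    ac, bc = vector(a, c), vector(b, c)
    # E = A + s*(C - A); requiring cross(E - D, C - B) = 0 gives this s.
    s = Fraction(cross(vector(a, d), bc), cross(ac, bc))
    return (a[0] + s * ac[0], a[1] + s * ac[1])


A, B, C = (0, 0), (6, 0), (2, 5)
D = (4, 0)  # a point on side AB
E = point_e(A, B, C, D)

ratios = [
    Fraction(squared_distance(A, B), squared_distance(A, D)),
    Fraction(squared_distance(A, C), squared_distance(A, E)),
    Fraction(squared_distance(B, C), squared_distance(D, E)),
]
# Each squared ratio is 9/4, so each ratio in Theorem 4.3 is 3/2 here.
assert ratios == [Fraction(9, 4)] * 3
```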

We will also need the following theorem, which is Theorem 4.13 in Section
4.7 of [Wu-PreAlg] (AA stands for angle-angle).
Theorem 4.13 (AA criterion for similarity). If two triangles have two pairs of
equal angles, they are similar.
In order to use this criterion effectively, one needs to know when two angles
are equal. In this context, the theorem about corresponding angles and alternate
interior angles intercepted on a pair of parallel lines by a transversal (Theorem
4.7 of [Wu-PreAlg], on page 271) will come in handy.
7 See Sections 4.6 and 4.7 of [Wu-PreAlg].
Our first goal is to arrive at a well-defined concept of slope. At this point, we
will simply refer to the preamble of this chapter on page 53 and to Section 4.1
of [Wu-PreAlg] for a discussion of the rationale of such an undertaking. Simply
put, TSM8 defines the slope of a line by taking two chosen points on the line to
form the ratio of “rise over run”, but neglects to explain why this ratio remains
the same when any two points on the line are chosen. We will avoid this pitfall.
We will begin with the intuitive meaning of slope. Given a nonvertical line
L, fix a point P on L. We first localize our attention to a neighborhood of the
point P and consider a slightly simpler problem, namely, from the vantage point
of P, how to distinguish this L among all the possible nonvertical lines passing
through P. Common sense equates slope with steepness: greater slope means a
steeper incline. So our immediate problem becomes one of distinguishing among
all the nonvertical lines passing through P by their varying degrees of steepness
(see the picture below). We would like to be able to assign a single number
(which could be positive or negative) to each line passing through P so that the
“bigger” the number, the steeper the line. (The general problem of “the steepness
of L at each of its points” will be dealt with on page 66.)
[Margin note: The slope of a line is a single number attached to the line; it is not two things (rise over run).]

We have to be careful, however. Both lines in the left picture below are intuitively
considered to be rather steep, but as we look at them from left to right, one is
ascending and the other is descending. In order to distinguish between these two
kinds of steepness, we agree, by tradition, to assign a positive number to a line
slanted this way, /, and assign a negative number to a line slanted this way, \.
More precisely, we want the assignment of numbers to nonvertical lines passing
through P to satisfy the following natural requirements: (i) distinct numbers are
assigned to distinct lines, (ii) when the absolute value of this number is large, the
line would look like those on the left below (very steep), but when the absolute
value of this number is small (i.e., close to 0), then the line would look like those on
the right (not steep, almost horizontal), and (iii) 0 is assigned to the horizontal line.
8 See page xi for the definition.


[Figure: left, two steep lines through P, one ascending and one descending; right, two nearly horizontal lines through P.]

The reason we exclude the vertical line from our consideration is twofold. The
first is technical and has to do with the inability to define division by zero (see
Theorem 4.4 on page 67). The second one is intuitive: if a line is vertical, it is
already the ultimate of “steep” and there would be no need to discuss it. (There
is an interesting story that indirectly reveals the woeful neglect of a precise defi-
nition for slope and the resulting damage on student learning on pages 241 ff. of
With this intuitive picture in mind, the following definition of the local slope
at P gives a natural and direct way to assign such a number to a line passing
through P. The definition will require the concept of the image of a set under
a translation (Section 4.4 of [Wu-PreAlg]); recall that if T is a translation of the
plane (see page 269) and P is a point in the plane, then T ( P ) denotes the point to
which T moves (translates) P . Now if S is a geometric figure, then the translated
image of S under T, denoted by T (S ), is the collection of all the points T ( P ), where
P is a point of S .
We now return to the consideration of all the lines passing through the fixed
point P. Pass a horizontal line through P and let Q be the point on this
horizontal line to the right of P so that |PQ| = 1. Also recall that O is the
origin of the coordinate system.
Let ℓ be the translated image of the y-axis by the translation along the vector
$\overrightarrow{OQ}$; the numbers on the y-axis are also translated to ℓ through this
translation so that, in particular, the number 0 of ℓ is at the point Q. The
line ℓ is now the vertical number line passing through Q. This line ℓ then
allows us to define the local slope of a nonvertical line L passing through P, as
follows: Let L intersect ℓ at a point; then the coordinate of this point of
intersection on the number line ℓ is by definition the local slope of L at P.
[Figure: the horizontal line through P and Q with |PQ| = 1, and the vertical number line ℓ through Q, whose 0 is at Q, with −1, 1, and 2 marked on it; the x- and y-axes meet at O.]
The following picture gives a better idea of what is happening in two cases.
First, when a line L₁ passing through P intersects ℓ at a point Q₁ above Q, then
the local slope of L₁ at P is the number Q₁ on the number line ℓ. This Q₁ is
positive because, like the y-axis, the positive numbers on ℓ are those above its 0,
which is Q. On the other hand, if a line L₂ intersects ℓ at a point Q₂ below Q, then
Q₂ is a negative number on ℓ and the local slope of L₂ at P (being Q₂) is negative.

[Figure: lines L₁ and L₂ through P meeting ℓ at Q₁ above Q and at Q₂ below Q, where |PQ| = 1.]




Show that a line passing through P has local slope equal to 1 if and only if it
is the 45° counterclockwise rotation around P of the horizontal line, and that
it has local slope equal to −1 if and only if it is the 45° clockwise rotation of
the horizontal line around P.ᵃ
ᵃ Also see the comments about coordinate systems in Section 6.4.

The virtue of this definition of the local slope of a line at P is that it shows in a
natural way why some lines, such as L₁ in the picture, have positive local slope
while others, such as L₂ in the picture, have negative local slope. It also allows
the value of the local slope of a line at P to be read off directly from the number
line ℓ itself. To this end, observe that we may give an equivalent description of the
number line ℓ as follows. On the vertical line ℓ through Q, we choose the number
0 to be the point Q itself, and choose the number 1 to be the point on ℓ which is of
distance 1 above Q. Recall that once 0 and 1 have been chosen, the number line is
completely determined; it is straightforward to see that this procedure describes
the same number line as the one obtained by translating the y-axis to ℓ along the
vector $\overrightarrow{OQ}$. Therefore we have:
[Alternate definition of local slope of L at P.] If the given line
intersects ℓ at a point Q₁ above Q, then the local slope of this line at
P is the length of QQ₁, but if the line intersects ℓ at a point Q₂ below
Q, then the local slope of the line at P is the negative of the length
of QQ₂.
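The alternate definition can be carried out in coordinates. The following Python sketch rests on assumptions of our own. The line is specified by a second point R on it, and the intersection with ℓ is found by parametrizing the line as P + t(R − P). That parametrization is our computational shortcut, not the geometric construction above. Once the intersection point (p₁ + 1, y) is known, its coordinate on ℓ is the signed length y − p₂. The function name is hypothetical, and coordinates are assumed to be integers or fractions. Notice that the arithmetic collapses to a familiar ratio, anticipating Theorem 4.4.

```python
from fractions import Fraction


def local_slope(p, r):
    """Local slope at P of the nonvertical line through distinct points P and R.

    The number line ell is the vertical line x = p1 + 1 through Q = (p1 + 1, p2);
    its 0 is Q and its 1 is one unit above Q, so the point (p1 + 1, y) of ell
    has coordinate y - p2 on ell.  Coordinates should be ints or Fractions.
    """
    (p1, p2), (r1, r2) = p, r
    if r1 == p1:
        raise ValueError("the line through P and R is vertical")
    # Points of the line are P + t*(R - P); pick t so the x-coordinate is p1 + 1.
    t = Fraction(1) / (r1 - p1)
    y = p2 + t * (r2 - p2)  # y-coordinate where the line meets ell
    return y - p2           # read this point off as a number on ell


# The line through (0, 2) and (-3, 0) from the start of this section, seen from each point.
print(local_slope((0, 2), (-3, 0)), local_slope((-3, 0), (0, 2)))  # 2/3 2/3
```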
We can say more. This definition of local slope of a line at P immediately
implies that the local slope of a horizontal line passing through P is 0. Moreover,
suppose the point of intersection Q₁ of a line L₁ (passing through P) with ℓ is
very high above Q; then the local slope of L₁ would also be very large.
Correspondingly, L₁ would be very steep. However, if the point of intersection Q₂
of a line L₂ (passing through P) with ℓ is very far below Q, then the local slope of
L₂ would be a negative number with a very large absolute value.⁹ But if Q₂ is very
far below Q, L₂ would have to be very steep as well. Examples like these show that
this definition of the local slope of a line at P captures the intuitive meaning of
slope as a measurement of steepness (see (i)–(iii) on page 63).

Suppose L is the line passing through (5, 1) and (−2, 3). What is the slope
of L at (5, 1)? What is the slope of L at (−2, 3)? (Hint: Use Theorem 4.3 on
page 62.)

So far, we have been looking at the steepness of a line L at a fixed point P on
L. The question naturally arises as to whether, if a different point M on the line
L were to be chosen, the local slope of L at M would be equal to the local slope of
L at P. (Compare the preceding Activity, and think of P = (5, 1) and
M = (−2, 3).) We will show presently that indeed such is the case, i.e., the two
local slopes are always equal. This then makes possible the definition of the slope
of L for any nonvertical line L: it is by definition the local slope of L at P as
defined on page 64, where P is an arbitrary point of L.
[Margin note: For a cognitively complex concept such as slope, one must make sure that the definition makes sense mathematically.]
Let us hasten to show that this definition of slope makes sense by showing
that if two points P and M are chosen on a given line L, then the local slope of L
at P is equal to the local slope of L at M.
At the point P, we get the vertical number line ℓ at Q, where Q lies on the hor-
izontal line passing through P and is of distance 1 to the right of P (see page 64).
Consider first the case where L intersects the number line ℓ at a point Q′ above
Q (see the picture below). At M, let N be a point on the horizontal line passing
through M so that N is of distance 1 to the right of M. Let the vertical line passing
through N meet L at N′, as shown below. We are going to show that QQ′ and
NN′ have the same length, i.e., |QQ′| = |NN′|. This will show that the local slope
of L at P is equal to the local slope of L at M (see the alternate definition of local
slope at P on page 65).


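[Figure: the line L through P and M; Q is 1 unit to the right of P and N is 1 unit to the right of M, and the vertical lines through Q and N meet L at Q′ and N′.]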

9 Recall that the absolute value of a number is its distance from 0. In this instance, 0 is just Q.

The reasoning is as follows. We are given
|PQ| = |MN| = 1 and |∠PQQ′| = |∠MNN′| = 90°.
Also |∠Q′PQ| = |∠N′MN| because they are corresponding angles of the parallel
(horizontal) lines MN and PQ with respect to the transversal L (Theorem 4.7 (of
[Wu-PreAlg]) on page 271). Hence △PQQ′ ≅ △MNN′ because of ASA. It follows
that |QQ′| = |NN′| because they are corresponding sides of congruent triangles.
Next, consider the case where the line L intersects the number line ℓ at a point
Q′ below Q. Then the corresponding picture is shown below.


We have to show that |QQ′| = |NN′|. Because the reasoning is so similar to the
preceding case, we will leave it as an Activity. In any case, we have proved that
the concept of slope as given on page 66 is well-defined.


Prove that |QQ′| = |NN′|.

One more question about this definition of slope: while it is conceptually
transparent, isn't it too clumsy for computations? It is indeed clumsy, and the
clumsiness would be fatal but for the intervention of the following theorem. In
order to state this theorem, we introduce a notational convention for the coordi-
nates of a point that will be used consistently for the rest of this volume. For a
point P in the coordinate plane, we denote its coordinates by (p₁, p₂), using the
corresponding small letter p as well as the subscripts 1 and 2 on p to indicate
its first (i.e., x-) coordinate and its second (i.e., y-) coordinate, respectively.
This notation may take some getting used to, but it has the advantage of being
self-explanatory once the convention has been established, e.g., the coordinates
of a point B will be (b₁, b₂) and those of another point Q will be (q₁, q₂), etc. No
thinking is involved. In addition, this is the notation that can be used in any
dimension, e.g., in 3 dimensions, the coordinates of a point P will be (p₁, p₂, p₃).
This said, the theorem to be proved is the following.
Theorem 4.4. On a given nonvertical line L, let any two distinct points P and R be chosen. Then the slope of L is equal to the ratio
$$\frac{p_2 - r_2}{p_1 - r_1}.$$

Several trivial remarks should be made about this ratio right away.

First, because $\frac{a}{b} = \frac{-a}{-b}$ for any a and b with b ≠ 0 (see Lemma 2.12 of Section 2.5 in [Wu-PreAlg] when a and b are rational, then use FASM), we see that
$$\frac{p_2 - r_2}{p_1 - r_1} = \frac{r_2 - p_2}{r_1 - p_1}.$$
This shows that in writing the ratio, the order of P and R doesn't matter (so long as the order of their appearance in the numerator and denominator remains the same).
Next, observe that the denominator of this ratio is never 0, because if it were, then $r_1 - p_1 = 0$ and $r_1 = p_1$. The distinct points P and R would then have the same x-coordinate and therefore lie on a vertical line. This implies that the line L is a vertical line, contradicting the hypothesis that L is nonvertical. Thus the denominator of this quotient is never zero, and the quotient makes sense.
Finally, we should point out where the ratio $\frac{p_2 - r_2}{p_1 - r_1}$ originally comes from. To this end, consider the situation in the definition of the local slope of L at the point P, as given on page 64. Thus let Q be the point lying on the horizontal line passing through P of distance 1 to the right of P, and let ℓ be the vertical number line passing through Q with Q = 0. Now let the point R of Theorem 4.4 be the point of intersection of L with the vertical number line ℓ (see both pictures below). Then we claim:
$$\text{slope of } L = \frac{p_2 - r_2}{p_1 - r_1}. \tag{4.1}$$

[Figures: left, R above Q on ℓ; right, R below Q on ℓ; in both, |PQ| = 1]

In order to prove (4.1), observe that $p_2 = q_2$ because P and Q lie on the same horizontal line and therefore have the same y-coordinate, and $r_1 = q_1$ because R and Q lie on the same vertical line ℓ and therefore have the same x-coordinate. If R is above Q (as in the left picture above), then
$$\frac{p_2 - r_2}{p_1 - r_1} = \frac{r_2 - p_2}{r_1 - p_1} = \frac{r_2 - q_2}{q_1 - p_1} = \frac{|RQ|}{|PQ|} = \frac{|RQ|}{1} = |RQ|. \tag{4.2}$$
On the other hand, if R is below Q (see the right picture above), then, again using $p_2 = q_2$ and $r_1 = q_1$, we have
$$\frac{p_2 - r_2}{p_1 - r_1} = \frac{q_2 - r_2}{p_1 - q_1} = \frac{|RQ|}{-|PQ|} = \frac{|RQ|}{-1} = -|RQ|. \tag{4.3}$$
In view of the alternate definition of local slope at P on page 65, equations (4.2) and (4.3) together prove that $\frac{p_2 - r_2}{p_1 - r_1}$ is equal to the local slope of L at P, which is of course the slope of L. The proof of (4.1) is complete.

Proof of Theorem 4.4. We will prove the theorem for the case of a positive slope for the line L. The remaining part of the proof for the case of a negative slope will be left as an exercise.
Because it doesn't matter in the writing of the ratio $\frac{p_2 - r_2}{p_1 - r_1}$ which of P and R comes first, we may assume that P is the point to the left of R, i.e., we may assume $p_1 < r_1$. As usual, if Q is the point to the right of P on the horizontal line H passing through P so that |PQ| = 1, then the vertical line through Q intersects L at a point M so that M is above Q on this vertical line. We also take this opportunity to recall that |MQ| is equal to the slope of L (see the alternate definition of local slope at P on page 65). Let the vertical line passing through R intersect the horizontal line H at a point S. Either |PS| ≤ 1 or |PS| > 1. The following picture shows the case |PS| > 1, but the reasoning for both cases is the same.

[Figure: the line L through P, M, and R, above the horizontal line H through P, Q, and S, with |PQ| = 1]

The reasoning that led to (4.2) or (4.3) now shows that
$$\frac{p_2 - r_2}{p_1 - r_1} = \frac{|RS|}{|PS|}. \tag{4.4}$$
Here comes the critical idea: the triangles PRS and PMQ have two pairs of equal angles (they share an angle ∠P, and |∠MQP| = |∠RSP| because both are right angles) so that by the AA criterion for similarity (Theorem 4.13 on page 62), the triangles are similar.¹⁰ Therefore their corresponding sides are proportional (Theorem 4.12 on page 271):
$$\frac{|RS|}{|PS|} = \frac{|MQ|}{|PQ|}. \tag{4.5}$$
However, $\frac{|MQ|}{|PQ|} = \frac{|MQ|}{1} = |MQ|$, which as noted is equal to the slope of L. Therefore, by combining this fact with equalities (4.4) and (4.5), we obtain
$$\frac{p_2 - r_2}{p_1 - r_1} = \text{slope of } L.$$

The proof of Theorem 4.4 is complete.

10 In this case, note that Theorem 4.3 on page 62 also suffices for the purpose at hand.

The important conclusion to draw from Theorem 4.4 is this: given any nonvertical line L, the slope of L is given by the ratio
$$\frac{p_2 - r_2}{p_1 - r_1} \tag{4.6}$$
for any two distinct points P and R on L.
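As a quick numerical illustration of Theorem 4.4 (our own, not part of the text), the following Python sketch computes the ratio (4.6) exactly with `fractions.Fraction` for every pair of points from a short list. The helper name `slope` is hypothetical, and we assume the sample points all lie on the line joining (−3, 0) and (0, 2).

```python
from fractions import Fraction
from itertools import combinations


def slope(P, R):
    """Return the ratio (p2 - r2)/(p1 - r1) of Theorem 4.4, computed exactly."""
    (p1, p2), (r1, r2) = P, R
    if p1 == r1:
        # Equal x-coordinates: the points coincide or lie on a vertical line.
        raise ValueError("p1 == r1, so the ratio is undefined")
    return Fraction(p2 - r2) / Fraction(p1 - r1)


# Sample points, assumed to lie on the line joining (-3, 0) and (0, 2).
points = [(-3, 0), (0, 2), (3, 4), (6, 6)]

# Every pair of distinct points gives the same ratio.
ratios = {slope(P, R) for P, R in combinations(points, 2)}
print(ratios)  # {Fraction(2, 3)}

# The order of P and R does not matter (first remark after Theorem 4.4).
print(slope((0, 2), (6, 6)) == slope((6, 6), (0, 2)))  # True
```

Exact rational arithmetic is used so that "the same ratio" means literally equal numbers, with no floating-point rounding.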


Suppose a line L passes through (2, −3) and (−4, 1). If P is a point on L with x-coordinate $\frac{2}{3}$, what is the y-coordinate of P?

Textbooks usually define the slope of a line by picking two points P and R on the line and then declaring the ratio formed from the coordinates of these two points (as in (4.6)) to be the slope of the line. A priori, the ratio resulting from a different choice of points on the line could be a different number, so that a line could have many slopes. This would render any discussion of "the slope of a line" nonsensical. For example, suppose instead of a straight line we have the circle of radius 5 around the origin (0, 0), denoted by C.

[Figure: the circle C of radius 5 centered at O, passing through (−5, 0) and (5, 0)]

If we take the two points (−5, 0) and (3, 4) on C and form the usual ratio, we get $\frac{4 - 0}{3 - (-5)} = \frac{1}{2}$. On the other hand, taking another pair of points (5, 0) and (0, 5) on C leads to the ratio $\frac{5 - 0}{0 - 5} = -1$. For the curve C, the ratios formed from different pairs of points on it are therefore not always the same. Yet for a line, these ratios are always the same, and the question is: why? We have answered this question by the use of similar triangles. Be sure your students know the answer too, because
the fact that the computation of the slope of a line in Theorem 4.4
can be done by using any two points on the line is a powerful tool in
dealing with all kinds of questions related to linear equations.
The discussion in the next section will amply bear out this assertion.
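The contrast with the circle C can also be checked mechanically. This minimal sketch (our own illustration; the helper `ratio` is hypothetical) confirms that the four points used above lie on C and recomputes the two ratios.

```python
from fractions import Fraction


def ratio(P, R):
    """The 'usual ratio' (p2 - r2)/(p1 - r1) for two points with p1 != r1."""
    (p1, p2), (r1, r2) = P, R
    return Fraction(p2 - r2, p1 - r1)


# The circle C of radius 5 about the origin: x^2 + y^2 = 25.
pts = [(-5, 0), (3, 4), (5, 0), (0, 5)]
print(all(x * x + y * y == 25 for x, y in pts))  # True

print(ratio((-5, 0), (3, 4)))  # 1/2
print(ratio((5, 0), (0, 5)))   # -1
```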
We will round out the discussion of slope by addressing two questions concerning its definition that may be baffling to some. First, why do we choose a Q (on the horizontal line through P) to be of distance exactly 1 from P? The answer is that there is no reason at all except that we have to decide on a consistent choice of such a point Q so that we can compare the slopes of different lines passing through different points. For example, we could choose this Q to be of distance 2 from P once and for all. Such a choice would not change the reasoning in any discussion of slope except that the values of the slopes of lines would be consistently larger,¹¹ as the following picture on the left indicates.

[Figures: left, the line L1 through P with Q at distance 2 to the right of P; right, the lines L1 and L2 with Q taken 1 unit to the left of P]

Another question is why choose a point Q to the right of P rather than to the left? Again, there is no reason at all except to maintain a consistency in order to make the discussion possible. In the above picture on the right, if we choose Q to be 1 unit to the left of P, then the slope of a line like L1 would be negative, because the point of intersection Q1 of L1 and the vertical number line passing through Q is now below Q (the 0 of the vertical number line). The same reasoning leads to the fact that the slope of a line like L2 would be positive. There is nothing wrong with that except that this is not the convention about slope that we are used to. (This is similar to asking why we want the positive numbers on the x-axis to be on the right of 0, or why we want the positive numbers on the y-axis to be above 0. The answers would be the same: no reason other than to conform with an a priori accepted convention.)
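The two alternative conventions just discussed can be made concrete. In the sketch below (our own illustration), `local_slope` returns the coordinate, on the vertical number line whose 0 is at the height of P, of the point where the graph meets that vertical line at horizontal offset `run` from P. `run=1` is the book's convention, `run=2` corresponds to choosing Q at distance 2, and `run=-1` to choosing Q on the left. The example line of slope 3/4 is an arbitrary, hypothetical choice.

```python
from fractions import Fraction


def line(x):
    """An arbitrary example line, y = (3/4)x - 1."""
    return Fraction(3, 4) * x - 1


def local_slope(f, x0, run=1):
    """Coordinate, on the vertical number line through the point at horizontal
    offset `run` from P = (x0, f(x0)) (with 0 at the height of P), of the
    point where the graph of f meets that vertical line."""
    return f(x0 + run) - f(x0)


print(local_slope(line, 2))          # 3/4   the book's convention
print(local_slope(line, 2, run=2))   # 3/2   distance 2: twice as large
print(local_slope(line, 2, run=-1))  # -3/4  Q on the left: the sign flips
```

The doubling for `run=2` matches the remark in footnote 11.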

Exercises 4.3
(1) (This exercise shows that the use of a vertical number line ℓ on page 64 to define slope is not strictly necessary.) Referring to the picture used for the definition of slope on page 65 and using the "coordinates of a point" to refer to the coordinates with respect to the x- and y-axes, prove that the slope of L1 is
(y-coordinate of Q1) − (y-coordinate of Q).
Similarly, prove that the slope of L2 is
(y-coordinate of Q2) − (y-coordinate of Q).
(2) Prove the case of Theorem 4.4 when the slope of L is negative.
(3) (i) Let L be the line joining (1, 2) to (p, −4), where p is some number. For what value of p would L pass through (10, 25)? (ii) Let ℓ be the line joining $(-\frac{3}{2}, 4)$ and $(\frac{4}{5}, q)$, where q is some number. For what value of q would ℓ pass through (2, −3)?
(4) Does the line joining (3, −2) and (6, 2) contain the point (9, 6)? Explain in two different ways.

11 They would be twice as large, which one can prove by using Theorem 4.3 on page 62.

(5) (i) Let A = (a, a′) and B = (b, b′). Prove that the midpoint of the segment AB is $\left(\frac{a+b}{2}, \frac{a'+b'}{2}\right)$. Hint: Use Theorem 4.3 on page 62. (ii) Generalize part (i): Given positive numbers s and t, prove that the coordinates of the point C on the segment AB so that $\frac{|AC|}{|CB|} = \frac{s}{t}$ are given by
$$\left(\frac{ta + sb}{s+t}, \frac{ta' + sb'}{s+t}\right).$$
(6) Let D be a dilation of the coordinate plane with center at the origin O
and let L be a line whose slope is s. What is the slope of D ( L)?
(7) (i) Let L be the line passing through (1, −2) with slope m. For which value of m would L pass through (20, 72)? (ii) Let ℓ be the line with slope m passing through $(\frac{1}{2}, \frac{3}{4})$. For which value of m would ℓ pass through $(\frac{5}{3}, \frac{1}{3})$?

4.4. Proof that graphs of linear equations are lines

We are now in a position to prove that the first part of Theorem 4.2 on page 60 is
true, i.e., the graph of a linear equation y = mx + k is a line. Consider first the
seemingly obvious question: if two lines have the same slope and pass through
the same point, are they identical? The answer is given in the following theorem.
Theorem 4.5. If two lines have the same slope and pass through the same point, then
they are the same line.
Proof. Let L1 and L2 be two lines passing through the same point P, without assuming for the time being that they have the same slope. On the horizontal line passing through P, let Q be the point to the right of P so that |PQ| = 1. Let ℓ be the vertical line passing through Q, and let Q1 and Q2 be the points of intersection of L1 and L2 with ℓ, respectively, as shown:

[Figure: lines L1 and L2 through P, meeting the vertical line ℓ through Q at Q1 and Q2, with |PQ| = 1]
By definition (see page 66), the slope of L1 (respectively, L2) is the coordinate of Q1 (resp. of Q2) on the number line ℓ. Now suppose L1 and L2 have the same slope; then Q1 and Q2 coincide, and L1 and L2, both being lines joining the two distinct points P and Q1, must likewise coincide. The proof of Theorem 4.5 is complete.
Armed with Theorem 4.5, we can now conclude the discussion of the previous special case of Theorem 4.2 when $m = \frac{2}{3}$ and k = 2, i.e., the equation $y = \frac{2}{3}x + 2$ (see page 62). As before, let
L be the line joining (−3, 0) and (0, 2),
G be the graph of $y = \frac{2}{3}x + 2$.
Recall that our strategy is to prove that G coincides with L by proving:
(α) Every point on the graph G is a point on the line L.
(β) Every point on the line L is a point on the graph G.
We first prove (α). Let the point (0, 2) on the y-axis be denoted by P. Take an arbitrary point R on the graph G distinct from P; we must prove that R lies on L. We will do so by showing that the line L′ joining P to R coincides with L so that, in particular, R lies on L. So why would L′ and L coincide? There is no a priori reason that they should, because if the graph G were really "curved", as in the following picture,

[Figure: a curved graph G through P and R, with the line L and the line L′ joining P to R]

then L and L′ would be distinct. What we are going to show is that, because G is the graph of a linear equation $y = \frac{2}{3}x + 2$, L and L′ must have the same slope. Once that is done, since the lines L and L′ both pass through P, they will have to coincide because of Theorem 4.5.

[Figure: the line L through (−3, 0) and P = (0, 2), and the line L′ through P and R = (r1, r2)]

The slope of a line L can be computed by using any two distinct points of L.

The slope of L can be computed from any pair of points on L, in particular, from (0, 2) (i.e., P) and (−3, 0): it is $\frac{0 - 2}{-3 - 0} = \frac{2}{3}$. What about the slope of L′? We compute it using the points (0, 2) and R, where R is the previously chosen point on the graph G distinct from (0, 2). Let the coordinates of R be $(r_1, r_2)$. Observe that $r_1 \neq 0$. For if $r_1 = 0$, then, $R = (r_1, r_2)$ being a point of the graph of $y = \frac{2}{3}x + 2$, we would have
$$r_2 = \frac{2}{3} \cdot r_1 + 2 = \frac{2}{3} \cdot 0 + 2 = 2.$$
Therefore R = (0, 2). This would contradict the fact that R is distinct from P = (0, 2). Thus $r_1 \neq 0$, and the slope of L′ computed from P and R is
$$\frac{r_2 - 2}{r_1 - 0} = \frac{\left(\frac{2}{3} r_1 + 2\right) - 2}{r_1} = \frac{\frac{2}{3} r_1}{r_1} = \frac{2}{3}.$$
So both L and L′ have the same slope $\frac{2}{3}$, as claimed. Since they also pass through the same point P, by Theorem 4.5, they must coincide and step (α) is proved.
To prove step (β) for the equation $y = \frac{2}{3}x + 2$, we must show that if a point R lies on the line L joining P (= (0, 2)) and (−3, 0), then R also lies on G. This means that if the coordinates of R are $(r_1, r_2)$, then we must show $r_2 = \frac{2}{3} r_1 + 2$. We prove this by computing the slope of L in two different ways, first using the two points (−3, 0) and (0, 2), and then using (0, 2) and $R = (r_1, r_2)$. (If R = P, there is nothing to prove; otherwise $r_1 \neq 0$ because L is not vertical, so the second computation makes sense.)

[Figure: the line L through (−3, 0), P = (0, 2), and R = (r1, r2)]

Since both give the same number (the slope of L), we get
$$\frac{r_2 - 2}{r_1 - 0} = \frac{2 - 0}{0 - (-3)}.$$
Thus $r_2 - 2 = \frac{2}{3} r_1$, so that $r_2 = \frac{2}{3} r_1 + 2$, as desired. This completes the proof of step (β), and therewith also the proof that the graph of $y = \frac{2}{3}x + 2$ is the line L joining (0, 2) and (−3, 0).
Observe how the preceding proof depends critically on the fact that we can compute
the slope of a line by using any two points of our choosing.
We now give the proof that, for any numbers m and k, the graph of the equation y = mx + k is a line. (This is the first part of Theorem 4.2 on page 60.) Let any two distinct points P and R be chosen on the graph G of y = mx + k, and let L be the line joining P and R. For simplicity, we take P to be the point (0, k) on the y-axis (check that (0, k) is on the graph of y = mx + k). We will use the same method as in the special case of $y = \frac{2}{3}x + 2$ to show that L and G are equal, i.e., we go through the same two steps:
(α) Every point on the graph G is a point on the line L.
(β) Every point on the line L is a point on the graph G.
We begin with step (α): we have to show that any point R′ on the graph G distinct from P lies on L. We do so by proving that the line L′ joining P to R′ coincides with L; consequently, R′ has to lie on L.

[Figure: the line L through P and R, and the line L′ through P and R′]

As in the case of $y = \frac{2}{3}x + 2$, we will prove the coincidence of L and L′ by showing that they have the same slope. It then follows from Theorem 4.5 on page 72 that L = L′ because they also pass through the same point P. To this end, we will prove the following general lemma.
Lemma 4.6. The slope of the line joining any two distinct points on the graph of a
linear equation y = mx + k is always equal to m.
(Caution: One may be tempted to assert instead that “the slope of the graph
of y = mx + k is m”, but at this particular juncture, it is not yet known that the
graph of y = mx + k is a line, so it would be premature to talk about the “slope
of the graph of y = mx + k”.)
Proof. Let the two points on the graph of y = mx + k be $(p_1, p_2)$ and $(q_1, q_2)$. The slope of the line joining them is then $(q_2 - p_2)/(q_1 - p_1)$.¹² Being on the graph, the coordinates of these points satisfy, by definition, the equations
$$p_2 = m p_1 + k \quad\text{and}\quad q_2 = m q_1 + k.$$
Therefore, the slope is
$$\frac{q_2 - p_2}{q_1 - p_1} = \frac{(m q_1 + k) - (m p_1 + k)}{q_1 - p_1} = \frac{m(q_1 - p_1)}{q_1 - p_1} = m.$$
This proves Lemma 4.6.
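Lemma 4.6 can be spot-checked numerically, although the algebra above is the actual proof. The sketch below (our own; the function name `graph_slopes` and the sample values of m, k, and x are arbitrary, hypothetical choices) forms the points (x, mx + k) for several distinct x and collects the slopes of the lines joining each pair.

```python
from fractions import Fraction
from itertools import combinations


def graph_slopes(m, k, xs):
    """Slopes (q2 - p2)/(q1 - p1) over all pairs of distinct points (x, m*x + k)."""
    pts = [(x, m * x + k) for x in sorted(set(xs))]  # distinct x-coordinates only
    return {Fraction(q2 - p2) / (q1 - p1)
            for (p1, p2), (q1, q2) in combinations(pts, 2)}


print(graph_slopes(Fraction(-5, 6), Fraction(8, 3), [-4, -1, 0, 2, 7]))
# {Fraction(-5, 6)}
```

Duplicate x-values are discarded first, mirroring footnote 12: distinct points on the graph always have distinct x-coordinates, so no denominator is 0.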
Because L is the line joining the points P and R on the graph G of y = mx + k, and L′ is the line joining P and R′, also on G, it immediately follows from the preceding lemma that both of their slopes are m. So L = L′ by Theorem 4.5 on page 72. The proof of step (α) is complete.

12 Note that this ratio always makes sense because $q_1 - p_1$ is never 0. The reason for the latter is that, if it were, we would have $p_1 = q_1$. But $(p_1, p_2)$ and $(q_1, q_2)$ being distinct points on the graph of y = mx + k, we have $p_2 = m p_1 + k$ and $q_2 = m q_1 + k$. Thus also $p_2 = q_2$, and the two points $(p_1, p_2)$ and $(q_1, q_2)$ would not be distinct, a contradiction. Hence $q_1 - p_1$ is never 0.

Now, step (β): why every point of L lies on the graph G. Because G is the set of all solutions of the equation y = mx + k, we have to prove that any point $Q = (q_1, q_2)$ on L satisfies $q_2 = m q_1 + k$. The reasoning is very simple now. We have just seen that the slope of L is m. Since P = (0, k), the slope of L computed using P and Q (for Q ≠ P, so that $q_1 \neq 0$; if Q = P, there is nothing to prove) is still equal to m, i.e.,
$$\frac{q_2 - k}{q_1 - 0} = m.$$
This implies $q_2 - k = m q_1$, which is the same as $q_2 = m q_1 + k$. Thus Q lies on G, and the proof of the first part of Theorem 4.2 is complete.
We will make the observation once more, but will not repeat it in the future, about
the need to be able to compute the slope of a line by using any two points of our choosing.
We would like to single out a useful intermediate step in the preceding proof
for future reference. According to Lemma 4.6, the slope of the line joining any
two points of the graph of a linear equation in two variables, y = mx + k, must
be m. Now that the first part of Theorem 4.2 has been proved and we know that
the graph of y = mx + k is a line, this graph must coincide with any line joining
two of its points. Hence we can conclude the following.

Lemma 4.7. The slope of the line which is the graph of y = mx + k is m.

Exercises 4.4

(1) (a) What is the most general linear equation of two variables whose graph passes through the point (−2, 1)? (b) Write down the three linear equations of two variables whose graphs all have slope $\frac{3}{2}$ but intersect the y-axis at (0, 1), (0, −2), and $(0, \frac{3}{2})$, respectively.
(2) Consider the graphs of y = 125x − 7 and y = 126x − 7 over the negative
x-axis, i.e., left of the y-axis. Which graph lies above the other? Explain
in two different ways.
(3) On the basis of what we know thus far, what can you say about the graphs of the following two equations? Explain in two different ways.
$$y = \frac{67}{895}x + 21 \quad\text{and}\quad y = \frac{67}{895}x + 21.5.$$

(4) Do the graphs of 6x − 2y = 7 and $\frac{1}{5}x - \frac{1}{15}y = 329$ intersect? Explain why or why not using what we have done so far.

4.5. Every line is the graph of a linear equation

We proceed to finish the proof of Theorem 4.2 on page 60 by showing that every
straight line is the graph of a linear equation. Let us begin as usual with a special
case: we look at the line L that joins the points (−4, 0) and (0, 6). What equation
is it the graph of?


[Figure: the line L through (−4, 0) and (0, 6), with a point (x, y) on it]

We can compute the slope of L by using (−4, 0) and (0, 6), and it is $\frac{3}{2}$. By the first part of Theorem 4.2, the graph of the equation $y = \frac{3}{2}x + k$, where k is some constant, is a line ℓ whose slope, in view of Lemma 4.7 on page 76, is also $\frac{3}{2}$. We are therefore tempted to show that ℓ = L for a suitably chosen k. By Theorem 4.5 on page 72, this would be the case if ℓ passes through (0, 6). But the latter can be easily arranged because we only need to choose k in $y = \frac{3}{2}x + k$ so that (0, 6) is a solution. In other words, it suffices to choose k so that $6 = \frac{3}{2}(0) + k$. The latter is true precisely when k = 6. Therefore the graph ℓ of $y = \frac{3}{2}x + 6$ is a line which passes through (0, 6) and whose slope is $\frac{3}{2}$. As noted, Theorem 4.5 now implies that indeed L = ℓ. Therefore the given line L is the graph of $y = \frac{3}{2}x + 6$.
Observe that the "6" in $y = \frac{3}{2}x + 6$ is the "6" in (0, 6).
It remains to tackle the general case: Given a (nonvertical) straight line L, we
must find a linear equation whose graph is exactly L. The idea of the proof is the
same as in the above special case. Since L and the y-axis are not parallel, they
must meet at some point. Let L intersect the y-axis at (0, k). Let the slope of L
be m. We are going to show that L is the graph G of the equation y = mx + k.
For the sake of variety, let us assume m is negative so that we have the following picture (for convenience, we have drawn the picture for the case k > 0):

[Figure: a line L with negative slope crossing the y-axis at (0, k), with k > 0]

By the first part of Theorem 4.2, the graph G of y = mx + k is a line. By Lemma 4.7 at the end of the last section, the slope of G is m. Since (0, k) is obviously a
solution of y = mx + k, G also passes through the point (0, k). Therefore G and
L are two lines which have the same slope m and pass through the same point
(0, k). By Theorem 4.5 on page 72, G and L are the same line. It follows that the
given line L is the graph of y = mx + k. This completes the proof of Theorem 4.2.

Exercises 4.5
(1) (i) Find the equation of the line passing through (1, −1) with slope −1. Does it pass through (−85, 85)? (ii) Find the equation of the line passing through $(-\frac{2}{5}, 3)$ with slope $\frac{1}{7}$. Where does it intersect the x-axis?
(2) Practice explaining to an eighth grader why the line joining the origin
and the point (−1, −1) is the graph of a linear equation of two vari-
ables. Do it once assuming that the student knows the graph of a linear
equation of two variables is a line, and do it also without making that
assumption. In any case, be clear about what you assume the student
knows, and make it as simple as possible.

4.6. Useful facts and examples

The preceding proof of Theorem 4.2 (page 60) may seem a bit long, but almost every piece of reasoning in the proof will show up in subsequent discussions of lines and linear equations. We now give a demonstration of this fact by extracting four useful consequences from the proof.

All the skills needed to find the equation of a line are included in the proof that the graph of ax + by = c is a line.

We first introduce two standard concepts.
If a line ℓ intersects the y-axis at (0, k), then the number k, or the point (0, k), is called the y-intercept of the line ℓ. Similarly, if ℓ intersects the x-axis at (c, 0),
then c, or the point (c, 0), is called its x-intercept. It follows from Theorem 4.5 on
page 72 that there is at most one line with a given slope and a given y-intercept
(respectively, a given x-intercept). Observe that if a nonvertical line has a y-intercept
of 0, then it also has an x-intercept of 0. Recall also that we are only dealing with
equations of the form y = mx + k and lines which are not vertical.
Lemma 4.8. The graph of y = mx + k is the unique nonvertical line with slope m
and y-intercept k.
Proof. It follows from Theorem 4.2 on page 60 that the graph of y = mx + k is a
line; the fact that the slope of this line is m is implied by Lemma 4.7 on page 76.
The fact that its y-intercept is k follows immediately from the fact that (0, k) is a
solution of y = mx + k, and the fact that this line with slope m and y-intercept k
is unique is implied by Theorem 4.5 on page 72, as noted before the lemma. The
proof is complete.
A second useful fact is a restatement of the fact that the slope of a line can be
computed using any two points on the line.
Lemma 4.9. The equation of the line passing through two given points $(p_1, p_2)$ and $(q_1, q_2)$, where $p_1 \neq q_1$, is $(y - p_2) = m(x - p_1)$, where $m = \frac{q_2 - p_2}{q_1 - p_1}$.
Proof. Let L be the line passing through $(p_1, p_2)$ and $(q_1, q_2)$. We want to show that L is the graph of $(y - p_2) = m(x - p_1)$.
By Theorem 4.4 on page 67, the slope of L is $\frac{q_2 - p_2}{q_1 - p_1}$, which is precisely the number m in the lemma. And, of course, L passes through $(p_1, p_2)$. Therefore, to prove the lemma, it suffices to prove that the graph of $(y - p_2) = m(x - p_1)$, to be denoted by ℓ, also passes through $(p_1, p_2)$ and has slope m. For then, L = ℓ by Theorem 4.5 on page 72. But the equation $(y - p_2) = m(x - p_1)$ can be rewritten as $y = mx + (p_2 - m p_1)$, so that the slope of ℓ is m (by Lemma 4.7). Moreover, $(p_1, p_2)$ is obviously a solution of $(y - p_2) = m(x - p_1)$. Thus L = ℓ, as desired.
Remark. In Lemma 4.9, there is no preference between the two points $(p_1, p_2)$ and $(q_1, q_2)$. The equation $(y - p_2) = m(x - p_1)$ in the lemma is stated in terms of $(p_1, p_2)$, but we could equally well use $(q_1, q_2)$. Therefore:
Corollary. The equation of the line passing through two given points $(p_1, p_2)$ and $(q_1, q_2)$, where $p_1 \neq q_1$, is $(y - q_2) = m(x - q_1)$, where $m = \frac{q_2 - p_2}{q_1 - p_1}$.

In practice, one can also get the equation in Lemma 4.9 differently. The line L obviously has slope $m = \frac{q_2 - p_2}{q_1 - p_1}$, so it must be defined by an equation of the form y = mx + k for some constant k (second part of Theorem 4.2). It suffices to determine what k is, and that can be done by recalling that $(p_1, p_2)$ lies on the graph of y = mx + k, i.e., $(p_1, p_2)$ is a solution of the equation so that $p_2 = m p_1 + k$. Thus $k = p_2 - m p_1$ and the equation sought by Lemma 4.9 is now
$$y = mx + (p_2 - m p_1), \quad\text{where } m = \frac{q_2 - p_2}{q_1 - p_1}.$$
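The computation of m and k just described is easy to carry out exactly. The following minimal sketch is our own (the name `line_through` is hypothetical); it is meant as a check on hand computations, not as a formula to memorize. It also confirms, in line with the Corollary, that using (q1, q2) as the reference point instead of (p1, p2) gives the same k.

```python
from fractions import Fraction


def line_through(P, Q):
    """Return (m, k) such that y = m*x + k is the line through P and Q.

    Follows the text: m = (q2 - p2)/(q1 - p1) and k = p2 - m*p1.
    Requires p1 != q1 (a nonvertical line)."""
    (p1, p2), (q1, q2) = P, Q
    if p1 == q1:
        raise ValueError("p1 == q1: the line through P and Q is vertical")
    m = Fraction(q2 - p2) / Fraction(q1 - p1)
    k = p2 - m * p1
    return m, k


m, k = line_through((1, 5), (4, -1))
print(m, k)             # -2 7
# Using Q = (4, -1) as the reference point gives the same k.
print(k == -1 - m * 4)  # True
```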

Needless to say, this equation need not—and should not—be memorized. More
importantly, we hope it is clear by now that the whole point of going through
the details of the last three sections is to show that, once you get to know the
interplay between the algebra of the equation and the geometry of the graph, no
memorization is necessary.
Example 2 on page 80 shows that the actual computation of k is even simpler
than the abstract description.
We now come to a third fact that is usually glossed over in TSM.

Lemma 4.10. The lines defined by the two equations ax + by = c and a′x + b′y = c′ are the same if and only if there is a nonzero number λ so that a′ = λa, b′ = λb, and c′ = λc.

Proof. First of all, if there is a nonzero number λ so that a′ = λa, b′ = λb, and c′ = λc, then the solutions of ax + by = c and a′x + b′y = c′ are clearly identical because a′x + b′y = c′ may be rewritten as λ(ax + by) = λc. Since the line defined by a linear equation is just the set of all its solutions, the lines defined by ax + by = c and a′x + b′y = c′ have to be the same because these equations have the same set of solutions. Conversely, suppose the two lines defined by ax + by = c and a′x + b′y = c′ coincide. Writing the equations equivalently as
$$y = -\frac{a}{b}x + \frac{c}{b} \quad\text{and}\quad y = -\frac{a'}{b'}x + \frac{c'}{b'},$$
we must have, by Lemma 4.8 on page 78, $\frac{c}{b} = \frac{c'}{b'}$ and $\frac{a}{b} = \frac{a'}{b'}$ because the graphs are the same line and must therefore have the same y-intercept and slope. By the cross-multiplication algorithm (see page 270), we get, equivalently,
$$\frac{b'}{b} = \frac{c'}{c} \quad\text{and}\quad \frac{b'}{b} = \frac{a'}{a}.$$
Let $\lambda = \frac{b'}{b}$, which is nonzero because b′ ≠ 0. Then
$$\lambda = \frac{b'}{b} = \frac{c'}{c} = \frac{a'}{a}.$$
It follows that a′ = λa, b′ = λb, and c′ = λc. The proof is complete.
In the situation of Lemma 4.10, we can retrieve the equation ax + by = c of the line from the equation a′x + b′y = c′ by multiplying both sides by $\frac{1}{\lambda}$. For this reason, one normally regards any two equations defining a line as "the same", and speaks of the defining equation of a line.
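Lemma 4.10 suggests a simple test for whether two equations define the same nonvertical line. Mirroring the proof, the sketch below (ours; the name `proportionality_factor` is hypothetical) sets λ = b′/b and checks whether a′ = λa and c′ = λc, returning λ when both hold and `None` otherwise. It assumes, as the text does here, that b and b′ are nonzero.

```python
from fractions import Fraction


def proportionality_factor(eq1, eq2):
    """eq = (a, b, c) represents ax + by = c with b != 0 (a nonvertical line).

    Returns the nonzero number lam with (a', b', c') = lam * (a, b, c) if the
    two equations define the same line, and None otherwise."""
    (a, b, c), (a2, b2, c2) = eq1, eq2
    if b == 0 or b2 == 0:
        raise ValueError("this sketch only handles nonvertical lines (b != 0)")
    lam = Fraction(b2) / Fraction(b)  # nonzero because b2 != 0
    if a2 == lam * a and c2 == lam * c:
        return lam
    return None


print(proportionality_factor((4, -2, 6), (-2, 1, -3)))  # -1/2
print(proportionality_factor((4, -2, 6), (2, -1, 5)))   # None
```

In the second call, 4x − 2y = 6 and 2x − y = 5 have proportional x- and y-coefficients but not proportional constants, so they define two distinct parallel lines. Taking λ = b′/b, as in the proof, avoids dividing by a or c, which may be 0.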
The fourth and final fact is a consequence of Lemma 4.8 and Lemma 4.10.

Lemma 4.11. A line ℓ defined by ax + by = c with a ≠ 0 and b ≠ 0 has slope $-\frac{a}{b}$ and x-intercept $\frac{c}{a}$.

Proof. By Lemma 4.10, ℓ is also defined by $y = \left(-\frac{a}{b}\right)x + \frac{c}{b}$, as this equation can be obtained from ax + by = c by multiplying both sides by $\frac{1}{b}$. By Lemma 4.8, ℓ has slope $-\frac{a}{b}$. It is also obvious that $\left(\frac{c}{a}, 0\right)$ is a solution of $y = \left(-\frac{a}{b}\right)x + \frac{c}{b}$, so ℓ has x-intercept $\frac{c}{a}$. The lemma is proved.
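Lemma 4.11 translates directly into a short computation. This sketch (ours; the name `slope_and_x_intercept` and the sample equation 2x + 3y = 12 are hypothetical choices) returns −a/b and c/a exactly.

```python
from fractions import Fraction


def slope_and_x_intercept(a, b, c):
    """For ax + by = c with a != 0 and b != 0, return (slope, x-intercept)."""
    if a == 0 or b == 0:
        raise ValueError("Lemma 4.11 assumes a != 0 and b != 0")
    m = Fraction(-a) / Fraction(b)  # slope -a/b
    x0 = Fraction(c) / Fraction(a)  # x-intercept c/a
    return m, x0


print(slope_and_x_intercept(2, 3, 12))  # (Fraction(-2, 3), Fraction(6, 1))
```

For 2x + 3y = 12 this gives slope −2/3 and x-intercept 6; indeed (6, 0) satisfies 2 · 6 + 3 · 0 = 12.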
Next, we give some examples on how to write down the equation of a line.
Example 1. What is the equation of the line passing through the point (2, −1) with slope $\frac{2}{3}$?
Solution. We already know that the equation has the form $y = \frac{2}{3}x + k$ (Lemma 4.8), so we only need to find out what k is. It is not necessary to directly compute the y-intercept. Since the line contains (2, −1) (and since the line is the graph of $y = \frac{2}{3}x + k$), we know $-1 = \left(\frac{2}{3}\right)2 + k$, from which $k = -\frac{7}{3}$. Thus the equation is $y = \frac{2}{3}x - \frac{7}{3}$.
This is the proper place to comment on a common misconception. Sometimes it is taught in classrooms, and written up in textbooks, that the equation of this line is
$$\frac{y - (-1)}{x - 2} = \frac{2}{3}.$$
This is not correct inasmuch as the point (2, −1) would not be a solution of this equation because when x = 2, the denominator on the left side would be 0. Therefore the graph of $\frac{y - (-1)}{x - 2} = \frac{2}{3}$ contains every point of the line passing through the point (2, −1) with slope $\frac{2}{3}$ except the point (2, −1) itself. However, if we re-express this equation as $y - (-1) = \frac{2}{3}(x - 2)$, then certainly this is an equation whose graph is the desired line. That said, we should add that the equation $\frac{y - (-1)}{x - 2} = \frac{2}{3}$ contains the correct geometric conception of the desired line, because what it says is that this line consists of all the points (x, y), other than (2, −1), so that the slope of the line containing (x, y) and (2, −1) is $\frac{2}{3}$. The point of this comment is therefore that even correct thinking needs to be complemented by correct technical execution.
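The difference between the two equations in this comment shows up as soon as the point (2, −1) itself is tested. In the sketch below (ours; the predicate names are hypothetical), `on_line` tests the cross-multiplied equation y − (−1) = (2/3)(x − 2). `on_quotient_form` tests the quotient equation, which has no value, and so is not satisfied, when x = 2. The first lines also recompute k from Example 1.

```python
from fractions import Fraction

m = Fraction(2, 3)
x0, y0 = 2, -1

# Example 1: from -1 = (2/3)*2 + k.
k = y0 - m * x0
print(k)  # -7/3


def on_line(x, y):
    """Is (x, y) a solution of y - (-1) = (2/3)(x - 2)?"""
    return y - y0 == m * (x - x0)


def on_quotient_form(x, y):
    """Is (x, y) a solution of (y - (-1))/(x - 2) = 2/3?"""
    if x == x0:
        return False  # the left side is undefined when x = 2
    return Fraction(y - y0) / (x - x0) == m


print(on_line(2, -1), on_quotient_form(2, -1))  # True False
print(on_line(5, 1), on_quotient_form(5, 1))    # True True
```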
Example 2. What is the equation of the line passing through the points (−1, 3) and $(\frac{1}{2}, 4)$?

Solution. Call this line ℓ. The slope of ℓ is
$$\frac{4 - 3}{\frac{1}{2} - (-1)} = \frac{2}{3},$$
so the equation of ℓ has the form $y = \frac{2}{3}x + k$ (Lemma 4.8), where the constant k is determined by the fact that (−1, 3) is a solution of the equation since (−1, 3) lies on ℓ. Thus $3 = \frac{2}{3}(-1) + k$, and $k = \frac{11}{3}$. The equation of ℓ is $y = \frac{2}{3}x + \frac{11}{3}$.
There is another way to approach this problem. Let (x, y) be an arbitrary point on ℓ not equal to (−1, 3). By Theorem 4.4 on page 67, we can compute the slope of ℓ by using (x, y) and (−1, 3), getting
$$\frac{y - 3}{x + 1} = \frac{2}{3}.$$
When x ≠ −1, this is equivalent to $y - 3 = \frac{2}{3}(x + 1)$, and the graph of the latter now contains every point on ℓ, including (−1, 3). But $y - 3 = \frac{2}{3}(x + 1)$ is equivalent to $y = \frac{2}{3}x + \frac{11}{3}$.
The preceding solutions need to be complemented by an observation. We have used the point (−1, 3) instead of $(\frac{1}{2}, 4)$ as the reference point in both solutions, but of course the outcome would have been the same had $(\frac{1}{2}, 4)$ been used. This is the message of the Corollary on page 79. For example, suppose in the first solution we make use of the fact that $(\frac{1}{2}, 4)$ lies on the graph of $y = \frac{2}{3}x + k$; then we have $4 = \frac{2}{3} \cdot \frac{1}{2} + k$, so that
$$k = 4 - \frac{1}{3} = \frac{11}{3}$$
as before. In the second solution, if we use the point $(\frac{1}{2}, 4)$ as the point of reference, then for all points (x, y) on ℓ except $(\frac{1}{2}, 4)$, we would have
$$\frac{y - 4}{x - \frac{1}{2}} = \frac{2}{3}.$$
The same reasoning as before shows that the equation of ℓ is, again, $y = \frac{2}{3}x + \frac{11}{3}$.
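The agreement between the two choices of reference point in Example 2 can be confirmed with exact arithmetic. This short sketch (our own check, not part of the text) computes the slope, then k from each reference point, and then checks the equation of the second method at a few points of ℓ other than (−1, 3).

```python
from fractions import Fraction

A = (Fraction(-1), Fraction(3))
B = (Fraction(1, 2), Fraction(4))

m = (B[1] - A[1]) / (B[0] - A[0])
k_from_A = A[1] - m * A[0]  # from 3 = (2/3)(-1) + k
k_from_B = B[1] - m * B[0]  # from 4 = (2/3)(1/2) + k
print(m, k_from_A, k_from_B)  # 2/3 11/3 11/3

# Second method: (y - 3)/(x + 1) = 2/3 for points (x, y) of the line with x != -1.
print(all((m * x + k_from_A - 3) / (x + 1) == m for x in (0, 2, 5, -4)))  # True
```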
It is worth emphasizing that none of these methods should be memorized
by brute force beyond the fact that the equation of a nonvertical line is of the
form y = mx + k for some constants m and k, where m is the slope of the line.
Instead, one should get to know the reasoning underlying these procedures and
do a simple computation each time to get at the equation.
Example 3. What is the x-intercept of the line joining the points (−4, 6) and (2, 1)?
Solution. The slope of the line is $\frac{6 - 1}{-4 - 2} = -\frac{5}{6}$, so the equation of the line is $y = -\frac{5}{6}x + k$ for some constant k. Since it contains the point (2, 1), we have $1 = -\frac{5}{6} \cdot 2 + k$, and so $k = 1 + \frac{5}{3} = \frac{8}{3}$. The equation of the line is therefore $y = -\frac{5}{6}x + \frac{8}{3}$. The point where this line intersects the x-axis has a y-coordinate equal to 0; let it be (c, 0). Since (c, 0) lies on the line, we also get $0 = -\frac{5}{6}c + \frac{8}{3}$, which is the same as $\frac{5}{6}c = \frac{8}{3}$. Multiplying through by $\frac{6}{5}$, we get $c = \frac{16}{5}$. So the x-intercept is $\frac{16}{5}$.
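The steps of Example 3 can be replayed exactly. This sketch (ours) follows them in order: the slope, then k from the point (2, 1), then c from 0 = mc + k.

```python
from fractions import Fraction

P, R = (-4, 6), (2, 1)

m = Fraction(P[1] - R[1], P[0] - R[0])  # (6 - 1)/(-4 - 2) = -5/6
k = R[1] - m * R[0]                     # from 1 = -(5/6)*2 + k
c = -k / m                              # from 0 = -(5/6)c + 8/3
print(m, k, c)  # -5/6 8/3 16/5

# Both given points satisfy y = m x + k, and so does (c, 0).
print(all(y == m * x + k for x, y in (P, R, (c, 0))))  # True
```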

We conclude this section with a coordinate description of the concept of dilation in the plane (see page 267) that will be useful later on. We claim:
Lemma 4.12. If D is a dilation of the coordinate plane with center at the origin O and with scale factor r (r > 0), then for any point (a, b),
D(a, b) = (ra, rb).
Proof. If we define the multiplication of a point (x, y) by a number c as
c(x, y) = (cx, cy),
then we can rewrite the conclusion of the lemma as
D(a, b) = r(a, b).
To prove this, denote the points (a, b) and D(a, b) by P and P′, respectively. First recall the definition of P′: on the ray $R_{OP}$ from O to P, P′ is the point so that the distance |OP′| from O to P′ is r times the distance |OP| from O to P, i.e.,
|OP′| = r|OP|.
For simplicity, first assume both a and b are positive. Let vertical lines passing
through P and P meet the x-axis at Q and Q , as shown:

[Figure: the points $P = (a, b)$ and $P' = D(a, b)$ on the ray from $O$, with $Q$ and $Q'$ on the x-axis]

Notice that the x-coordinates of $P$ and $P'$ are $|OQ|$ and $|OQ'|$, respectively. Because $PQ \parallel P'Q'$, Theorem 4.3 on page 62 implies immediately that
$$\frac{|OQ'|}{|OQ|} = \frac{|OP'|}{|OP|}.$$
But we have seen that $\frac{|OP'|}{|OP|} = r$, so we get
$$\frac{|OQ'|}{|OQ|} = r,$$
or, what is the same thing, $|OQ'| = r\,|OQ|$. Thus the x-coordinate of $D(a, b)$ is $r$ times the x-coordinate $a$ of $(a, b)$. By using horizontal lines passing through $P$ and $P'$ and by repeating the same argument with respect to the y-axis, we get in a similar manner that the y-coordinate of $D(a, b)$ is also $r$ times the y-coordinate $b$ of $(a, b)$. This shows $D(a, b) = (ra, rb) = r(a, b)$, as claimed.

In case one or both of $a$ and $b$ are negative, the reasoning is essentially the same. Consider, for instance, the case $a < 0$ but $b > 0$ for definiteness. Then the preceding picture becomes:
[Figure: the case $a < 0 < b$, with $P = (a, b)$ and $P' = D(a, b)$ in the second quadrant]

In this case, the only difference is that $|OQ| = -a$ and $|OQ'|$ is equal to the negative of the x-coordinate of $D(a, b)$. So from $|OQ'| = r\,|OQ|$, we conclude that the x-coordinate of $D(a, b)$ is again $r$ times the x-coordinate $a$ of $(a, b)$, as before. Similarly, the y-coordinate of $D(a, b)$ is again $r$ times the y-coordinate $b$ of $(a, b)$. The proof is complete.
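Lemma 4.12 lends itself to a quick numerical check. Checking examples is of course not a proof. The sketch below uses function names of our own. It applies the formula $D(a, b) = (ra, rb)$ to a point with $a < 0 < b$. It then confirms the two properties of $P' = D(P)$ used in the proof:

- $|OP'| = r\,|OP|$, compared via squares to avoid square roots;
- $P'$ lies on the line through $O$ and $P$.

```python
from fractions import Fraction

def dilate(point, r):
    """Dilation with center O = (0, 0) and scale factor r > 0: (a, b) -> (r*a, r*b)."""
    if r <= 0:
        raise ValueError("the scale factor must be positive")
    a, b = point
    return (r * a, r * b)

def dist_sq(p, q):
    """Square of the distance |pq|, from the distance formula."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

O = (0, 0)
P = (-3, 4)               # a < 0 < b, as in the second case of the proof
r = Fraction(5, 2)
P_prime = dilate(P, r)    # (-15/2, 10)

# |OP'| = r|OP|, compared via squares to avoid square roots
assert dist_sq(O, P_prime) == r ** 2 * dist_sq(O, P)
# O, P, P' are collinear (zero cross product); since r > 0, P' is on the ray R_OP
assert P[0] * P_prime[1] - P[1] * P_prime[0] == 0
```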

Exercises 4.6

(1) (i) Find the equation of the line joining $(2, -1)$ and $(-3, -11)$. What is its x-intercept? (ii) Find the equation of the line joining $(-\frac{1}{4}, \frac{2}{3})$ and $(5, \frac{3}{2})$. What is its y-intercept?
(2) (i) What is the equation of the line with x-intercept equal to $-2$ and slope $-\frac{1}{3}$? What is its y-intercept? (ii) What is the equation of the line with x-intercept equal to $\frac{2}{5}$ and y-intercept equal to $-\frac{4}{3}$?
(3) What is the x-intercept of the line passing through $(-5, 2)$ with slope $-\frac{7}{2}$?
(4) (i) What is the y-intercept of the graph of $x = -5y + 7$? What is its slope? Does the point $(352\frac{2}{5}, -70\frac{1}{5})$ lie on the graph? (ii) What is the x-intercept of the line passing through $(5, -\frac{2}{3})$ and $(-\frac{4}{3}, \frac{1}{2})$? What is its
(5) Let $L$ and $L'$ be the lines defined by $2x - 3y = 0$ and $3x + 2y = 0$, respectively.
[Figure: the lines $L$ and $L'$ through $O$, with a point $P$ on $L$ and a point $P'$ on $L'$]
Let $P$ and $P'$ be points on $L$ and $L'$, respectively. We may assume that the coordinates of $P$ and $P'$ are $(3t, 2t)$ and $(-2s, 3s)$, respectively, for some numbers $t$ and $s$. (a) Compute the squares of the lengths, $|OP|^2$, $|OP'|^2$, and $|PP'|^2$. What do you observe? (b) What can you conclude about $L$ and $L'$?
(6) Let $L$ and $L'$ be the lines defined by $ax - by = 0$ and $bx + ay = 0$, respectively, where $a$ and $b$ are constants. What can you say about $L$ and $L'$? (Hint: Look at the preceding exercise.)
(7) Let $D$ be the dilation with center $O$ (the origin of a coordinate system) and scale factor $\frac{1}{3}$. Let $\Delta$ be the triangle with vertices $(6, 3)$, $(12, 15)$, and $(9, -17)$. What are the vertices of $D(\Delta)$?


Simultaneous Linear Equations

In this chapter, we consider the situation where two linear equations appear si-
multaneously. Since their graphs are two lines (Theorem 4.2 on page 60), the in-
terplay between the geometry (e.g., do the lines intersect?) and the algebra (e.g., is
there a common solution to both equations?) deserves a close inspection. We will
explain why the usual algebraic method of solution is correct, analyze the nature
of the solution, and call special attention to the precise geometric interpretation
of the algebraic conclusions.
In the Appendix of this chapter, we give a characterization of the perpendic-
ularity of lines in terms of slope. The proof is an instructive exercise in the use of
the geometric tools we have carefully assembled.

5.1. Solutions of linear systems and the geometric interpretation

Recall that a solution to a linear equation such as $x + 3y = \frac{1}{2}$ is a pair of numbers $(A, B)$ so that $A + 3B = \frac{1}{2}$. For example, $(\frac{1}{2}, 0)$ is a solution. There are infinitely many solutions to a linear equation of two variables, and their totality is by definition the graph of the equation, which is a line (Theorem 4.2 on page 60).
Still with $x + 3y = \frac{1}{2}$, suppose we also consider another equation $x - 2y = -2$. We ask if there could be a pair of numbers $(A, B)$ that is a solution of both $x + 3y = \frac{1}{2}$ and $x - 2y = -2$. Indeed there is: for example, $(-1, \frac{1}{2})$, as is easy to check. We say the pair of linear equations
$$\begin{cases} x + 3y = \frac{1}{2} \\ x - 2y = -2 \end{cases}$$
is a system of linear equations, or simultaneous (linear) equations, in the numbers $x$ and $y$. To be precise, one would have to refer to such a pair of equations as a linear system of two equations in two unknowns or in two variables. As in the case of a single equation in one variable, such a system is a question that asks whether there are pairs of numbers $(A, B)$ which are solutions of both linear equations. This implicit statement will be taken for granted and will not be repeated in subsequent discussions.
To solve the system is to find all the ordered pairs of numbers $(A, B)$ which are solutions of both equations. Each such $(A, B)$ is called a solution of the system. Sometimes we also call the collection of all these $(A, B)$'s the solution of the system. Thus $(-1, \frac{1}{2})$ is a solution of the above system. A priori, there may be others. It will turn out, however, that the solution of this particular system consists only of $(-1, \frac{1}{2})$, so that $(-1, \frac{1}{2})$ is the solution of the system. At present we are only concerned with systems consisting of two equations in two variables. Note, however, that a similar discussion also holds for systems consisting of any number of equations in any number of variables.

The study of a pair of linear equations is equivalent to the study of a pair of lines in the plane.
Postponing for the moment the discussion of how to get a solution such as $(-1, \frac{1}{2})$ to the above system, let us first give a geometric interpretation of this point $(-1, \frac{1}{2})$. Now $(-1, \frac{1}{2})$, being a solution of the first equation $x + 3y = \frac{1}{2}$, lies on the line defined by $x + 3y = \frac{1}{2}$. Similarly, $(-1, \frac{1}{2})$ also lies on the line defined by the second equation $x - 2y = -2$. This means the solution $(-1, \frac{1}{2})$ is the point of intersection of the two lines defined by the equations in the linear system, as shown:
[Figure: the lines $x + 3y = \frac{1}{2}$ and $x - 2y = -2$ meeting at $(-1, \frac{1}{2})$]

Now, are there perhaps other solutions? The answer is no. Suppose a point $(A, B)$ is not at the intersection of these two lines. Then it fails to lie on at least one of them; say $(A, B)$ does not lie on the graph of $x + 3y = \frac{1}{2}$ (the other case is the same). The graph of $x + 3y = \frac{1}{2}$ is, by definition, the collection of all the points $(\alpha, \beta)$ which are solutions of $x + 3y = \frac{1}{2}$. Hence $(A, B)$ cannot be a solution of $x + 3y = \frac{1}{2}$, and therefore it cannot be a solution of the system. Therefore, the point of intersection of the lines defined by the equations of the linear system is exactly the solution of the linear system.
This reasoning is perfectly general.
Theorem 5.1. Suppose we are given a linear system in two unknowns $x$ and $y$:
$$\begin{cases} ax + by = e \\ cx + dy = f \end{cases}$$
where $a, b, \ldots, f$ are constants. Let $\ell_1$ and $\ell_2$ be the lines defined by the equations $ax + by = e$ and $cx + dy = f$, respectively. Then the solution to the system is (the set of points in) the intersection of the lines $\ell_1$ and $\ell_2$.
Proof. We first show that if $(A, B)$ lies in the intersection of $\ell_1$ and $\ell_2$, then it is a solution of the system. Indeed, $(A, B)$ being on $\ell_1$ implies that it is a solution of $ax + by = e$, and being on $\ell_2$ implies that it is a solution of $cx + dy = f$. Thus $(A, B)$ is a solution of both equations, and therefore a solution of the system.
Is there any solution that does not lie in the intersection of $\ell_1$ and $\ell_2$? We now show that there is not. Let $(A, B)$ be a solution of the system. Then $(A, B)$ is a solution of $ax + by = e$, and therefore $(A, B)$ lies on $\ell_1$ by the definition of the graph of an equation. For the same reason, $(A, B)$ lies on $\ell_2$ as well. Therefore $(A, B)$ is a point of intersection of $\ell_1$ and $\ell_2$. Thus the set of all the solutions coincides with the intersection of $\ell_1$ and $\ell_2$. The proof is complete.
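Theorem 5.1 rests only on the definition of the graph. A point solves the system exactly when it satisfies each equation, i.e., lies on each line. Below is a minimal Python sketch of that definition, applied to the example of this section. Encoding $ax + by = e$ as a triple `(a, b, e)` is our own convention, chosen for illustration.

```python
from fractions import Fraction

def satisfies(eq, point):
    """eq = (a, b, e) encodes a*x + b*y = e; True if point lies on its graph."""
    a, b, e = eq
    x, y = point
    return a * x + b * y == e

def solves_system(eq1, eq2, point):
    """A point solves the system exactly when it lies on both graphs."""
    return satisfies(eq1, point) and satisfies(eq2, point)

eq1 = (1, 3, Fraction(1, 2))   # x + 3y = 1/2
eq2 = (1, -2, -2)              # x - 2y = -2
print(solves_system(eq1, eq2, (-1, Fraction(1, 2))))  # True
print(solves_system(eq1, eq2, (Fraction(1, 2), 0)))   # False: on the first line only
```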

Suppose the lines $\ell_1$ and $\ell_2$ are distinct nonparallel lines. Then we know they intersect at exactly one point. We have just given the precise reasoning for the following fact. If the lines defined by the two equations of a linear system of two linear equations in two unknowns are distinct nonparallel lines, then the solution of the linear system is the point of intersection of the two lines. This fact is usually decreed by fiat in TSM, with no reason given. This is probably because the precise definition of the graph of an equation is rarely given or, if given, is not put to use. It is very important that you learn to make use of definitions in your teaching. In particular, please do not forget to explain why the solution of a linear system can be obtained from the intersection of the graphs of the linear equations.

Exercises 5.1
(1) Write down a system of equations so that the following picture is its geometric interpretation (you may assume that one of the lines intersects the x-axis at $(1.5, 0)$ and the other intersects the y-axis at $(0, 0.5)$):
[Figure: two intersecting lines; the x-axis is marked 1, 2, 3]

5.2. The algebraic method of solution

The substitution method

The elimination method
Two examples

The substitution method

Next, we turn to the question of how to get a solution of a given linear system algebraically, i.e., without looking at the graphs of the equations. We adopt the standard method (see Section 3.1). We first assume that there is a solution to the given linear system, and then we use this information to find out what it has to be. We then turn around and verify that the presumptive solution is indeed a solution of the linear system.

To solve a linear system of equations, we assume that there is a solution and then try to find out what this solution might be.
The first method of solution is by substitution. We use one equation to get an expression of (let us say) $y$ in terms of $x$, and then replace the $y$ in the other equation by this expression of $y$.¹ Then we solve the resulting linear equation in $x$ as in Section 3.1 on page 37. Finally, we solve for $y$.
We will illustrate with the following specific linear system in the hope of making the explanation more accessible:
$$\begin{cases} 4x + 5y = -3 \\ -2x + y = 5 \end{cases} \tag{5.1}$$
Note, however, that the reasoning given below is perfectly general.
We want to show that if there is an ordered pair of numbers $(x, y)$ satisfying the system (5.1), then necessarily $(x, y) = (-2, 1)$ (in the sense that $x = -2$ and $y = 1$; see page 56). So let $(x, y)$ be such a solution. Then the system (5.1) becomes a pair of equations about numbers to which we can apply the usual arithmetic operations. Look at the second equation: it practically hands over an expression of $y$ in terms of $x$, namely, $y = 2x + 5$. Now the substitution method calls for “substituting” this expression of $y$ into the first equation of (5.1) to get
$$4x + 5(2x + 5) = -3. \tag{5.2}$$
Expanding gives $14x + 25 = -3$, so $14x = -28$ and $x = -2$ (see Section 3.1 on page 37). Since $y = 2x + 5$, we have $y = -4 + 5 = 1$. Thus $(x, y)$ being a solution of (5.1) implies that it has to be $(-2, 1)$, as claimed.
Needless to say, it is easy to check that $(-2, 1)$ is indeed a solution of (5.1). (In practice, the routine check that a purported solution is the solution of the linear system should be made mandatory in the school classroom.)
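The steps just carried out for (5.1) can be written once for a general system $ax + by = e$, $cx + dy = f$. The sketch below is our own illustration, not a procedure from the text. It assumes $d \neq 0$, so that the second equation can be solved for $y$. It substitutes into the first equation as in (5.2). Following the advice above, it checks the presumptive solution against the original system before returning it.

```python
from fractions import Fraction

def solve_by_substitution(a, b, e, c, d, f):
    """Solve a*x + b*y = e, c*x + d*y = f by substituting for y (assumes d != 0).

    Returns (x, y), or None if the equation in x has zero coefficient
    (then there is no solution or there are infinitely many)."""
    a, b, e, c, d, f = map(Fraction, (a, b, e, c, d, f))
    if d == 0:
        raise ValueError("the second equation cannot be solved for y")
    # Second equation: y = (f - c*x)/d.  Substituting into the first gives
    #   a*x + b*(f - c*x)/d = e,  i.e.  (a - b*c/d) * x = e - b*f/d.
    coeff = a - b * c / d
    if coeff == 0:
        return None
    x = (e - b * f / d) / coeff
    y = (f - c * x) / d
    # Check that the presumptive solution really solves the original system.
    if a * x + b * y != e or c * x + d * y != f:
        raise ArithmeticError("check failed")
    return x, y

print(solve_by_substitution(4, 5, -3, -2, 1, 5))  # the system (5.1): x = -2, y = 1
```

For (5.1), the coefficient computed in the code is $4 - 5 \cdot (-2) = 14$ and the right side is $-3 - 25 = -28$. That is the equation $14x = -28$ obtained from (5.2).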
We hasten to explain why the substitution method works. Precisely, what does equation (5.2) mean, and why is its solution part of the solution of the system (5.1)?
The first equation of (5.1) is equivalent (in the sense of having the same solutions) to $5y = -4x - 3$, since the latter is obtained from the former by transposing the term $4x$. Now the second equation of (5.1) is equivalent to $y = 2x + 5$ for the same reason, and the latter is in turn equivalent to $5y = 5(2x + 5)$. Let us define two linear systems to be equivalent if they have the same solutions. Then we see that the system (5.1) is equivalent to the following linear system:
$$\begin{cases} 5y = -4x - 3 \\ 5y = 5(2x + 5) \end{cases} \tag{5.3}$$
Therefore, solving (5.1) is equivalent to solving (5.3). Let us pause to reflect on what it means to solve (5.3). If $x$ is any number, say 3, would $x = 3$ be part of the solution of the system (5.3)? No. If $x = 3$, then the first equation of (5.3) implies that $5y = -15$, and therefore $y$ must be $-3$. However, if we let $x = 3$ in the second equation of (5.3), then necessarily $5y = 55$ and $y = 11$. This contradicts the fact that $y$ is already known to be equal to $-3$ because of the first equation. Thus for a value of $x$ to be part of the solution of the system (5.3), this value of $x$ must have the following property. When the right sides of both equations of (5.3) are given this value, they are equal to the same number, namely, $5y$. Such being the case, it is then clear how to solve the system (5.3). We want a value of $x$ so that the right sides of both equations in (5.3) are equal, i.e., we want $x$ to be the solution of
$$-4x - 3 = 5(2x + 5). \tag{5.4}$$
The solution of (5.4) then guarantees that for this value of $x$, the solutions of $y$ from both equations in the system (5.3) will coincide. That is, these values of $x$ and $y$ furnish the solution of the system (5.3), and therefore also of system (5.1).

¹ Thus we “substitute” this expression of $y$ into the other equation.
Now observe that (5.4) is equivalent to equation (5.2), because the former is
obtained from the latter by transposing the term 4x. This then explains what the
method of substitution is all about and why solving equation (5.2) gives part of
the solution of (5.1).
We purposely chose system (5.1) for illustration because its second equation
immediately suggests an expression of y in terms of x. This obviates any need to
search for an expression of y in terms of x and allows us to completely focus on
the subsequent explanation about the substitution of x for y in the first equation.
However, the underlying reasoning of the preceding explanation is perfectly gen-
eral, and we will now demonstrate this generality by using the second equation
of (5.1) to get an expression of x in terms of y (i.e., rather than y in terms of x)
and use entirely similar reasoning to explain why the corresponding substitution
of y for x in the first equation of (5.1) also leads to a solution of system (5.1).
For this purpose, let $(x, y)$ be a solution of the system (5.1) as usual. Recall the system (5.1):
$$\begin{cases} 4x + 5y = -3 \\ -2x + y = 5 \end{cases}$$
We will deduce that $(x, y) = (-2, 1)$. We do this by using the second equation to get an expression of $x$ in terms of $y$. We then substitute this expression for $x$ in the first equation to get an equation in $y$ alone. Thus, rewrite the second equation as $-2x = -y + 5$, from which we conclude
$$x = \left(-\tfrac{1}{2}\right)(-y + 5).$$
Now substitute this value of $x$ into the first equation above to get $4\left(-\frac{1}{2}\right)(-y + 5) + 5y = -3$, which may be rewritten as
$$(-2)(-y + 5) + 5y = -3. \tag{5.5}$$
The method of substitution now calls for the solution of equation (5.5) in $y$. This yields $7y = 7$ and therefore $y = 1$. From $x = \left(-\frac{1}{2}\right)(-y + 5)$, we get $x = \left(-\frac{1}{2}\right)(-1 + 5) = -2$. We conclude that if there is a solution $(x, y)$ to (5.1), then $(x, y) = (-2, 1)$. As before, we should always check that $(-2, 1)$ is indeed a solution of the system (5.1).
Once again, why does solving equation (5.5) yield a solution of system (5.1)? To see this, we first show that the system (5.1) has the same solutions as another system. The first equation of (5.1) is equivalent to $4x = -5y - 3$. The second equation is equivalent to $x = \left(-\frac{1}{2}\right)(-y + 5)$, as we have observed. The latter is clearly also equivalent to $4x = (-2)(-y + 5)$. Therefore the system (5.1) is equivalent to the following linear system:
$$\begin{cases} 4x = -5y - 3 \\ 4x = (-2)(-y + 5) \end{cases} \tag{5.6}$$
We may therefore solve system (5.1) by solving system (5.6) instead. Now if $(x, y)$ is a solution of system (5.6), then, of course,
$$-5y - 3 = (-2)(-y + 5) \tag{5.7}$$
because both sides are equal to $4x$. Let us look at (5.7) as an equation in $y$. The first observation about equation (5.7) concerns its solution $y$. For this value of $y$, the solution $x$ from either equation of (5.6) will automatically satisfy the other equation in (5.6). Therefore the pair $(x, y)$ is a solution of the system (5.6), and hence of system (5.1). The second observation is that equation (5.7) is equivalent to equation (5.5), as the former is obtained from the latter by transposing the term $5y$. This then explains why solving equation (5.5) yields a solution of (5.1).
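The second explanation can also be mirrored in code. This sketch is again our own illustration, and it assumes $c \neq 0$. Both equations are rewritten to give $ax$, as in (5.6). The two right sides are then equated, as in (5.7), leaving one linear equation in $y$.

```python
from fractions import Fraction

def solve_by_substituting_for_x(a, b, e, c, d, f):
    """Solve a*x + b*y = e, c*x + d*y = f via an expression of x in y (assumes c != 0).

    Following (5.6)-(5.7): the equations give  a*x = e - b*y  and
    a*x = a*(f - d*y)/c;  equating the right sides leaves one equation in y."""
    a, b, e, c, d, f = map(Fraction, (a, b, e, c, d, f))
    if c == 0:
        raise ValueError("the second equation cannot be solved for x")
    # e - b*y = a*(f - d*y)/c   <=>   (a*d/c - b) * y = a*f/c - e
    coeff = a * d / c - b
    rhs = a * f / c - e
    if coeff == 0:
        return None            # no single value of y is singled out
    y = rhs / coeff
    x = (f - d * y) / c        # x from the second equation
    return x, y

print(solve_by_substituting_for_x(4, 5, -3, -2, 1, 5))  # x = -2, y = 1
```

For (5.1), `coeff` and `rhs` are both $-7$. That is, $-7y = -7$, which is (5.7) after collecting terms. So $y = 1$, and then $x = -2$.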
Remark. It is worth repeating that, once students have solved a linear system
using the analog of the substitution equation (5.2) or (5.5), they should check that
the solution so obtained actually satisfies the original system. This practice is not
only a good way to avoid unintended errors, but is also a reminder of the overall
structure of solving equations, i.e., assuming that there is a solution, we first find
out what this solution has to be, and then we confirm that the purported solution
is a solution (see Section 3.1).
In a middle school classroom, it would be entirely appropriate to assess stu-
dents’ understanding by asking them to explain why the method of substitution
works, in the sense of the explanation after equation (5.2) or after equation (5.5).


4x − y = −1
9x − 2y = 1

The elimination method

Recall the system (5.1):
$$\begin{cases} 4x + 5y = -3 \\ -2x + y = 5 \end{cases}$$
We have written out the method of getting the solution $(-2, 1)$ via the steps associated with (5.2)–(5.4) to facilitate the explanation of why $(-2, 1)$ is a solution. In practice, however, one can sometimes simplify matters by skirting the need to solve equation (5.2). The reason is that the terms involving $x$ in both equations of the system (i.e., $4x$ and $-2x$) are so similar that we can easily “eliminate” them to arrive at a single equation in $y$ (which we can then solve). In more detail, let $(x, y)$ be a solution of (5.1). Then multiplying both sides of the second equation by 2, we get:
$$\begin{cases} 4x + 5y = -3 \\ -4x + 2y = 10 \end{cases}$$
Now add both sides of the second equation (don’t forget each side is just a number) to the corresponding sides of the first, and we obtain:
$$5y + 2y = (-3) + 10. \tag{5.8}$$

This leads to $y = 1$ as before. One then solves for $x$ to get the solution of the system.
What is even more important about equation (5.8) is that it becomes the same equation as (5.5) if we expand the latter to get
$$2y - 10 + 5y = -3$$
and then transpose $-10$ to the right side. The method proceeds in two steps. First, make the coefficients of the $x$ terms in the two equations opposite numbers. Second, eliminate $x$ by adding the corresponding sides of both equations. This achieves the same result as the method of substitution embodied in equations (5.5)–(5.7). This so-called method of elimination is thus nothing more than a different presentation of the method of substitution. Still, one should keep this method in mind as an additional tool to solve simultaneous equations.
We will give another illustration of the method of elimination by eliminating the $y$ terms in the linear system (5.1) instead. Still with $(x, y)$ as a solution of (5.1), multiplying both sides of the second equation by 5, we obtain $-10x + 5y = 25$. Now subtract both sides of this equation from the corresponding sides of the first equation in (5.1),² and we get:
$$4x - (-10x) = -3 - 25. \tag{5.9}$$
Thus $14x = -28$ and $x = -2$. The fact that $y = 1$ follows by letting $x = -2$ in the second equation, $-2x + y = 5$, of (5.1).
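The elimination just performed has a simple pattern. It multiplied the second equation of (5.1) by 5, the $y$-coefficient of the first equation. It left the first equation alone, which amounts to multiplying it by 1, the $y$-coefficient of the second equation. For a general system $ax + by = e$, $cx + dy = f$, the same idea multiplies the first equation by $d$ and the second by $b$. Subtracting then removes $y$. The sketch below is our own generalization, offered only as an illustration.

```python
from fractions import Fraction

def solve_by_elimination(a, b, e, c, d, f):
    """Solve a*x + b*y = e, c*x + d*y = f by eliminating y.

    Multiplying the first equation by d and the second by b makes both
    y-terms equal to b*d*y; subtracting the second from the first leaves
        (a*d - b*c) * x = e*d - b*f.
    Returns (x, y), or None when a*d - b*c = 0 (no unique solution)."""
    a, b, e, c, d, f = map(Fraction, (a, b, e, c, d, f))
    coeff = a * d - b * c
    if coeff == 0:
        return None
    x = (e * d - b * f) / coeff
    # Recover y from an equation whose y-coefficient is nonzero
    # (b and d cannot both be 0 here, since coeff != 0).
    if d != 0:
        y = (f - c * x) / d
    else:
        y = (e - a * x) / b
    return x, y

print(solve_by_elimination(4, 5, -3, -2, 1, 5))  # the system (5.1): x = -2, y = 1
```

With the inputs of (5.1), the equation in $x$ is $14x = -28$, exactly the content of (5.9).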


Explain why this way of eliminating the y terms and solving equation (5.9)
in x leads to a solution of system (5.1). (Hint: Compare equations (5.9) and

Two examples
Example 1. Solve the following system:
$$\begin{cases} 2x + 3y = 2 \\ 3x - 4y = \frac{1}{6} \end{cases} \tag{5.10}$$
We first solve the system by a brute force substitution. From the first equation, we get an equivalent equation $x + \frac{3}{2}y = 1$, so that $x = -\frac{3}{2}y + 1$. Substituting this value of $x$ into the second equation, we get:
$$3\left(-\tfrac{3}{2}y + 1\right) - 4y = \tfrac{1}{6}.$$
Upon simplification, this becomes
$$-\tfrac{9}{2}y + 3 - 4y = \tfrac{1}{6}.$$
We can solve this equation by clearing the denominators (see (E) on page 42). Thus, multiplying both sides of the equation by 6, we get $-27y + 18 - 24y = 1$, so that $-51y = -17$ and $y = \frac{1}{3}$. To solve for $x$, we can make use of either equation of (5.10). Suppose we use the first equation; then $2x + 3\left(\frac{1}{3}\right) = 2$ and $x = \frac{1}{2}$. Thus the solution of (5.10) is $(\frac{1}{2}, \frac{1}{3})$.
² Again, don’t lose sight of the fact that each side is just a number!

We can also solve (5.10) by the method of elimination, as follows. Let us eliminate $x$. To this end, we multiply the first equation of (5.10) by 3 and the second equation by 2 to obtain an obviously equivalent system:
$$\begin{cases} 6x + 9y = 6 \\ 6x - 8y = \frac{1}{3} \end{cases}$$
Subtract both sides of the second equation from the corresponding sides of the first equation to get $17y = \frac{17}{3}$, which implies $y = \frac{1}{3}$. The fact that $x = \frac{1}{2}$ now follows as before.


Solve (5.10) by eliminating y.

Example 2. A fraction has the following two properties. When 2 is added to both the numerator and the denominator, the new fraction is equal to $\frac{4}{3}$. When the denominator of the original fraction is subtracted from the numerator, the result is 5. What is the fraction?
Let the fraction be $\frac{x}{y}$. Then we are given that
$$\frac{x + 2}{y + 2} = \frac{4}{3} \quad\text{and}\quad x - y = 5.$$
Using the cross-multiplication algorithm (see page 270), the first equation is equivalent to $3x + 6 = 4y + 8$, which is in turn equivalent to $3x - 4y = 2$. Therefore $(x, y)$ is a solution of the following linear system:
$$\begin{cases} 3x - 4y = 2 \\ x - y = 5 \end{cases}$$
This system is equivalent to:
$$\begin{cases} 3x = 4y + 2 \\ 3x = 3y + 15 \end{cases}$$
Equating the right sides of the two equations gives $4y + 2 = 3y + 15$, and therefore $y = 13$. To solve for $x$, we can use either of the equations in the linear system. Since the second equation is $x - y = 5$, we get $x = 18$. The fraction is $\frac{18}{13}$.
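In keeping with the Remark that a purported solution should always be checked, here is a two-line check of the answer to Example 2. It is a minimal sketch using Python's `fractions` module, which automatically reduces $\frac{20}{15}$ to $\frac{4}{3}$.

```python
from fractions import Fraction

x, y = 18, 13                                    # numerator and denominator found above
print(Fraction(x + 2, y + 2) == Fraction(4, 3))  # True: 20/15 = 4/3
print(x - y == 5)                                # True
```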

Exercises 5.2
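(1) Solve:
(a) $\begin{cases} 7x - 3y = 10 \\ 3x - 5y = -5 \end{cases}$ (b) $\begin{cases} \frac{1}{3}x - 3y = 5 \\ 4y - \frac{1}{2}x = -3.5 \end{cases}$
(c) $\begin{cases} \frac{2}{5}x - \frac{5}{6}y = -\frac{1}{2} \\ \frac{1}{6}x + \frac{5}{9}y = \frac{5}{2} \end{cases}$ (d) $\begin{cases} 12x + 11y = 172 \\ 28x - 17y = 60 \end{cases}$
(e) $\begin{cases} 0.08x + 0.9y = 0.46 \\ 0.1x - 0.04y = 0.16 \end{cases}$ (f) $\begin{cases} 5x - \frac{3}{4}y = 2 \\ x + 2y = 11 \end{cases}$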

(2) Solve:
⎧ ⎧
⎨ 6
x + 12
y = −1 ⎨ 9
− 3y = 4
(a) (b)
⎩ 8
− 9
= 7 ⎩ 3
+ 2y = 10
x y x 3
(3) The second digit of a two-digit number is $\frac{1}{3}$ of the first digit. If the number is divided by the difference of the digits, the quotient is 15 and the remainder is 3. Find the number.
(4) Alan’s age is $\frac{6}{5}$ of Bill’s, but 15 years ago his age was $\frac{13}{10}$ of Bill’s. Find their current ages.
(5) Helena buys two books. The total cost is 49 dollars, and the difference
of the squares of the prices is 735. What is the cost of each book?
(6) I have two numbers x and y. Take 20% of x from x, then what remains
would be 7 less than y. If however I enlarge y by 20%, then it would
exceed x by 8. What are the two numbers?
(7) We have two whole numbers. The division-with-remainder of the larger
number by the smaller number has quotient 9 and remainder 15. Also,
the larger number is 97.5% of ten times the smaller number. What are
these numbers?
(8) A sum of money is to be divided equally among x people, each receiving
y dollars. If there were 3 more people, each person would receive 1 dollar
less, and if there were 6 fewer people, each would receive 5 dollars more.
Determine x and y.
(9) If 3 is added to the numerator of a fraction and 7 is subtracted from the denominator, its value is $\frac{6}{7}$. But if 1 is subtracted from the numerator and 7 is added to the denominator, its value is $\frac{2}{5}$. Find the fraction.
(10) Barrels are filled with wine and water. The contents of one barrel are $\frac{5}{6}$ wine, and of another $\frac{8}{9}$ wine. How many gallons must be taken from each to fill another barrel whose capacity is 24 gallons, so that the mixture will be $\frac{7}{8}$ wine?
(11) Two marathon runners run at constant speeds. If they start running at
the same time from separate cities, 22 kilometers apart, towards each
other, they are 11 kilometers apart after 1 hour. Suppose they start over
and the first runner now runs twice as fast as before but the second run-
ner continues to run at his usual speed; then they would be 5 kilometers
apart after one hour. What are their respective speeds?

5.3. Characterization of parallel lines by slope

It is time to point out that not every linear system of two equations in two un-
knowns has a solution. Indeed, Theorem 5.1 on page 86 shows that the solutions
of a linear system coincide with the collection of points in the intersection of the
graphs of the equations in the system (they are lines, of course, by Theorem 4.2
on page 60). It follows that if these graphs are parallel, then they will have no
intersection, and therefore the linear system will have no solution. For example,
obviously the system
$$\begin{cases} x + 0 \cdot y = 1 \\ x + 0 \cdot y = 2 \end{cases}$$
can have no solution. We understand this phenomenon from our present perspective by noting that the lines defined by the equations $x = 1$ and $x = 2$ are distinct vertical lines and are thus parallel. In general, when linear equations are given, it comes down to the question of how we can recognize that their graphs are parallel. To this end, we will prove the following basic property of a pair of lines in the plane.

The characterization of parallel lines by their slopes illustrates the symbiotic relationship between algebra and geometry.

Theorem 5.2. Two distinct, nonvertical lines in $\mathbb{R}^2$ are parallel if and only if they have the same slope.
Proof. Let the lines be $\ell_1$ and $\ell_2$. We first assume that they are parallel and prove that they have the same slope.
If either of $\ell_1$ and $\ell_2$ is horizontal, then since $\ell_1 \parallel \ell_2$, the other is also horizontal. Both then have slope 0, and there is nothing to prove in this case. So we may assume that neither $\ell_1$ nor $\ell_2$ is horizontal. Referring to the picture below, take a point $P$ on $\ell_1$ and let a vertical line through $P$ intersect $\ell_2$ at $Q$. (This vertical line must intersect $\ell_2$ because the latter is not vertical.) Since the lines are distinct, $P \neq Q$. From $Q$, draw a horizontal line which meets $\ell_1$ at $S$. Then from $S$, draw a vertical line which (as before) meets $\ell_2$ at a point $T$.






$\triangle PQS$ and $\triangle TSQ$ are right triangles with legs parallel to the coordinate axes. Write $P = (p_1, p_2)$, $Q = (q_1, q_2)$, $S = (s_1, s_2)$, and $T = (t_1, t_2)$. By Theorem 4.4 on page 67 and by equation (4.4) on page 69, the slopes of $\ell_1$ and $\ell_2$ are, respectively,
$$\frac{p_2 - s_2}{p_1 - s_1} = \frac{|PQ|}{|SQ|} \quad\text{and}\quad \frac{q_2 - t_2}{q_1 - t_1} = \frac{|ST|}{|SQ|}.$$
We have to show that these two numbers are equal. It suffices to show that $|PQ| = |ST|$. Observe that $PQ \parallel ST$ because both are vertical. We are also given that $\ell_1 \parallel \ell_2$. Therefore $PQTS$ is a parallelogram and, by Theorem 4.1 on page 54, the opposite sides $PQ$ and $ST$ are equal. This shows that nonvertical parallel lines have the same slope.
Conversely, suppose two distinct, nonvertical lines $\ell_1$ and $\ell_2$ have the same slope; we have to show that they are parallel.
We give three proofs. The first is a direct continuation of the preceding line of geometric reasoning, the second is algebraic, and the third is the geometric version of the second. First, if $\ell_1$ and $\ell_2$ have slope 0, then they are both horizontal and are therefore parallel. We may therefore assume that they have nonzero slope, so that neither is horizontal. We now perform the same construction as before to get triangles $\triangle PQS$ and $\triangle TSQ$. This time we start with a point $P$ on $\ell_1$ that does not lie on $\ell_2$, which is possible because the lines are distinct. The fact that $\ell_1$ and $\ell_2$ have the same slope then implies that (see Theorem 4.4 and equation (4.4))
$$\frac{|PQ|}{|SQ|} = \frac{|ST|}{|SQ|}.$$
Multiplying both sides by $|SQ|$ yields $|PQ| = |ST|$. This immediately implies that $\triangle PQS$ and $\triangle TSQ$ are congruent because of SAS. In greater detail, they have a side $SQ$ in common, and have another pair of equal sides in $|PQ| = |ST|$. Finally, $\angle PQS$ and $\angle TSQ$ are equal because they are both right angles, so the congruence conditions of SAS are met. It follows that $\angle PSQ$ and $\angle TQS$ are equal because they are corresponding angles of congruent triangles. But then $\ell_1 \parallel \ell_2$, because their alternate interior angles relative to the transversal $L_{SQ}$ are equal (see Theorem 4.9 of [Wu-PreAlg] on page 271). The first proof of Theorem 5.2 is complete.
Here is a second proof; it is algebraic. Since $\ell_1$ and $\ell_2$ are both nonvertical and have the same slope, say $m$, let the equations defining them be $y = mx + k$ and $y = mx + k'$, respectively. Here $k \neq k'$ because by assumption the lines are distinct. Suppose they intersect at a point $(A, B)$; then $B = mA + k$ and $B = mA + k'$. This implies that $mA + k = mA + k'$, which in turn implies that $k = k'$. This contradicts the earlier conclusion that $k \neq k'$. Again, we are done.
Finally, we give a third proof of Theorem 5.2 by contradiction (see the proof of Lemma 3.1 in [Wu-PreAlg]). Suppose $\ell_1$ and $\ell_2$ are distinct and have the same slope. If they are not parallel, then they meet at a point $Q$. Now $\ell_1$ and $\ell_2$ are two lines which have the same slope and pass through the same point (i.e., $Q$). Hence Theorem 4.5 on page 72 implies that $\ell_1 = \ell_2$. This contradicts the hypothesis that the lines are distinct, thereby completing the third proof of Theorem 5.2.
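Combined with Theorem 5.1, Theorem 5.2 tells us in advance when a system of two equations with nonvertical graphs has no solution. That happens when the lines are distinct and have the same slope. The sketch below is our own illustration. Encoding $ax + by = e$ as the triple `(a, b, e)` is an assumption, and vertical lines are excluded, as in the theorem. The sketch writes each equation as $y = -\frac{a}{b}x + \frac{e}{b}$ and compares slopes and y-intercepts.

```python
from fractions import Fraction

def slope_intercept(eq):
    """Rewrite a*x + b*y = e as y = m*x + k; requires b != 0 (nonvertical)."""
    a, b, e = eq
    if b == 0:
        raise ValueError("vertical line: no slope")
    return Fraction(-a) / b, Fraction(e) / b

def classify(eq1, eq2):
    """Compare the graphs of two linear equations whose graphs are nonvertical."""
    m1, k1 = slope_intercept(eq1)
    m2, k2 = slope_intercept(eq2)
    if m1 != m2:
        return "intersecting"   # different slopes: exactly one common point
    if k1 != k2:
        return "parallel"       # distinct lines, same slope (Theorem 5.2): no solution
    return "same line"          # every point of the line is a solution

print(classify((2, 3, 1), (4, 6, 5)))                 # parallel
print(classify((1, 3, Fraction(1, 2)), (1, -2, -2)))  # intersecting
```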


In the preceding proof of Theorem 5.2, the picture on page 94 seems to suggest that the reasoning only works when $\ell_1$ is above $\ell_2$. Draw a picture where $\ell_2$ is above $\ell_1$, and go through the whole proof again to convince yourself that the proof is also valid in this case.

Theorem 5.2 has many applications, and here is one of them.

Lemma 5.3. Let $T$ be the translation along the vector $\overrightarrow{AB}$, where $A = (a_1, a_2)$ and $B = (b_1, b_2)$. Then for all $(x, y)$ in $\mathbb{R}^2$, $T(x, y) = (x + c_1, y + c_2)$, where $(c_1, c_2) = (b_1 - a_1, b_2 - a_2)$.

Proof. Let the line passing through $A$ and $B$ be denoted by $L$. Let $P$ be a point with coordinates $(p_1, p_2)$. We will prove that $T(P)$ has coordinates $(p_1 + c_1, p_2 + c_2)$. The proof is broken into two cases: $P$ lies on $L$, and $P$ does not lie on $L$.

Case 1: $P$ lies on $L$. First assume $L$ is not vertical, i.e., $a_1 \neq b_1$. Let $Q$ denote the point $(p_1 + c_1, p_2 + c_2)$, where, as in the lemma, $(c_1, c_2) = (b_1 - a_1, b_2 - a_2)$. We want to prove that $Q = T(P)$. According to the definition of translation (see page 269), we have to prove three things: $Q$ lies on $L$; $|PQ| = |AB|$; and $Q$ is to the left (respectively, the right) of $P$ if $B$ is to the left (respectively, the right) of $A$ on the line $L$.
[Figure: the points $A$, $B$, $P$, and $Q$ on the line $L$]
Now $|PQ| = |AB|$ because the distance formula (page 57) implies
$$|PQ| = \sqrt{\big((p_1 + c_1) - p_1\big)^2 + \big((p_2 + c_2) - p_2\big)^2} = \sqrt{c_1^2 + c_2^2} = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2} = |AB|.$$
Next, we prove that $Q$ lies on $L$ ($= L_{AB}$). Let $L_{PQ}$ denote the line containing $P$ and $Q$ as usual. Now $L$ and $L_{PQ}$ are two lines that contain the point $P$. They also have the same slope, because the slope of $L_{PQ}$ is
$$\frac{(p_2 + c_2) - p_2}{(p_1 + c_1) - p_1} = \frac{c_2}{c_1} = \frac{b_2 - a_2}{b_1 - a_1},$$
and the latter is the slope of $L$. Therefore, by Theorem 4.5 on page 72, the lines $L$ and $L_{PQ}$ coincide, and so $Q$ lies on $L_{PQ} = L$. Finally, suppose $B$ is to the right of $A$ (as shown in the preceding picture). Then $b_1 > a_1$, so that $b_1 - a_1 > 0$, which is to say, $c_1 > 0$. This implies that $p_1 + c_1 > p_1$, i.e., the first coordinate of $Q$ is bigger than the first coordinate of $P$. Therefore $Q$ is to the right of $P$. The proof that if $B$ is to the left of $A$ then $Q$ is also to the left of $P$ is entirely similar. Thus $T(P) = Q$ if $P$ lies on $L$ and $L$ is not vertical.
If L is vertical, then a1 = b1 and c1 = 0. It is straightforward to see that,
in this case, the preceding argument simplifies, e.g., we prove that if A is above
(respectively, below) B, then P is also above (respectively, below) Q. The proof of
Case 1 is complete.
Case 2: $P$ does not lie on $L = L_{AB}$. As before, first assume $L$ is not vertical, i.e., $a_1 \neq b_1$. Again let $P = (p_1, p_2)$ and let $Q = (p_1 + c_1, p_2 + c_2)$. According to the definition of translation (see page 269), we have to prove three things: $L_{PQ} \parallel L_{AB}$; $|PQ| = |AB|$; and $Q$ is to the left (respectively, the right) of $P$ if $B$ is to the left (respectively, the right) of $A$ on the line $L$.



Since c1 = b1 − a1 ≠ 0, the first coordinate (p1 + c1) of Q differs from the first
coordinate p1 of P and therefore L_PQ is not vertical. Thus the slopes of L (= L_AB)
and L_PQ are well-defined; the fact that the slopes are equal can be proved in
exactly the same way as in Case 1. Therefore by Theorem 4.5 on page 72, the
lines L and L_PQ are parallel (they are distinct, because P lies on L_PQ but not on L).
The fact that |PQ| = |AB| is proved by the same
calculation as in Case 1 using the distance formula, and, finally, the reasoning in
Case 1 concerning Q being to the left (respectively, the right) of P on L_PQ if B is
to the left (respectively, the right) of A on the line L remains the same for Case 2.
Now suppose L is vertical. In that case, a1 = b1 and therefore c1 = 0. The
first coordinates of P and Q are now the same (= p1) and therefore L_PQ is also
vertical. Then it is straightforward to see that if A is above (respectively, below)
B, then P is also above (respectively, below) Q. The proof of Case 2 is complete,
and therefore the proof of Lemma 5.3 is also complete.
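The three conditions verified in the proof can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper names (`translate`, `sq_dist`, `check_translation`) are hypothetical, coordinates are exact `Fraction`s so that equality tests are reliable, and lengths are compared through their squares to avoid square roots.

```python
from fractions import Fraction as F


def translate(P, A, B):
    """Return Q = (p1 + c1, p2 + c2), where (c1, c2) = (b1 - a1, b2 - a2)."""
    c1, c2 = B[0] - A[0], B[1] - A[1]
    return (P[0] + c1, P[1] + c2)


def sq_dist(X, Y):
    # |XY|^2 from the distance formula; comparing squares avoids square roots
    return (Y[0] - X[0]) ** 2 + (Y[1] - X[1]) ** 2


def sign(x):
    return (x > 0) - (x < 0)


def check_translation(P, A, B):
    """Return Q and three booleans: equal length, matching direction, same side."""
    Q = translate(P, A, B)
    equal_length = sq_dist(P, Q) == sq_dist(A, B)
    if B[0] != A[0]:
        # L is not vertical: c1 != 0, so both slopes are defined
        same_slope = F(Q[1] - P[1]) / (Q[0] - P[0]) == F(B[1] - A[1]) / (B[0] - A[0])
        same_side = sign(Q[0] - P[0]) == sign(B[0] - A[0])  # left/right agreement
    else:
        # L is vertical: c1 = 0, so L_PQ is vertical as well
        same_slope = Q[0] == P[0]
        same_side = sign(Q[1] - P[1]) == sign(B[1] - A[1])  # above/below agreement
    return Q, equal_length, same_slope, same_side


# Here P does not lie on L = L_AB (Case 2 of the lemma)
print(check_translation((F(-2), F(5, 2)), (F(1), F(2)), (F(4), F(3))))
```

In this example Q = (1, 7/2) and all three checks come out true; a vertical L can be tried by giving A and B the same first coordinate.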

Exercises 5.3

(1) Mental math: Without solving the following linear system, explain using
geometry why it has no solution:
$$\begin{cases} 4x + 67y = 567 \\ 16x + 268y = 931 \end{cases}$$

(2) Does the following linear system have a solution? Explain.

23x + 17y = 56
299x + 221y = 931

(3) Use coordinates to prove that the three medians of a triangle (the lines
joining a vertex to the midpoint of the opposite side) meet at a point, as
follows. We may assume that the vertices of the triangle are A = ( a1 , a2 )
(a2 > 0), B = (0, 0), and C = (c, 0) (c > 0), i.e., A is above the x-axis,
B is the origin and C lies on the positive x-axis. Let the midpoints of
AB, AC, and BC be D, E, and F, respectively, and let BE and CD meet
at a point G. Prove that A, G, and F are collinear by computing the
coordinates of G and F. Hint: Use Exercise 5 on page 72.
(4) Theorem 3.1 on page 44 proves that a linear equation of one variable
ax + b = cx + d has a unique solution if a ≠ c. Fill in the details of the
following reasoning, which gives a second proof of this theorem: Consider
the linear system, where a ≠ c:
$$\begin{cases} y = ax + b \\ y = cx + d \end{cases}$$
Since the graphs of the lines y = ax + b and y = cx + d are not parallel,
they intersect and the system has a unique solution. Therefore ax + b =
cx + d has a unique solution.3

5.4. Algebraic criterion for solvability

Linear systems with a solution

The determinant and linear systems with no solution
The main theorem on solvability

Linear systems with a solution

From Theorem 5.1 on page 86, we know that a linear system of two equations in
two variables has a unique solution if and only if the graphs of the equations are
distinct nonparallel lines. This is a decisive result, except that there are occasions
when one wants to solve a linear system without once looking at the graphs
(for example, suppose one wants to write a computer program to solve linear
systems). The goal of this section is to make use of Theorem 5.2 on page 94
to translate the geometric information in Theorem 5.1 into algebra and, in the
process, completely describe the solvability of a linear system algebraically.
Let the linear system of two equations in x and y be
$$\begin{cases} ax + by = e \\ cx + dy = f \end{cases} \tag{5.12}$$
where a, b, . . . , f are constants. Let ℓ1, ℓ2 be the graphs of the first equation
ax + by = e and the second equation cx + dy = f, respectively. According to
Theorem 5.1 (page 86), the linear system has solutions if and only if ℓ1 and ℓ2
either intersect at one point or coincide completely. In other words, if a linear
system has a solution, then there are exactly two possibilities:
Case 1. The system has a unique solution.
Case 2. The system has an infinite number of solutions.
We will now describe each of these two scenarios algebraically.
Case 1. The system has a unique solution.
By Theorem 5.1, this would be the case if and only if ℓ1 and ℓ2 intersect at one
point. This happens if either both ℓ1 and ℓ2 are nonvertical, or one of ℓ1 and ℓ2 is
vertical and the other is nonvertical.

3 I owe this problem to Bob LeBoeuf.


Case 1a. Both ℓ1 and ℓ2 are nonvertical; let them intersect at
a point P. According to Theorem 4.5, the lines have distinct
slopes. Since both are nonvertical, b ≠ 0 and d ≠ 0, so that
the slopes of ℓ1 and ℓ2 are −a/b and −c/d, respectively. Therefore
a/b ≠ c/d. By the cross-multiplication algorithm (page 270), this is
equivalent to ad ≠ bc or, what is the same thing, ad − bc ≠ 0.
Therefore Case 1a happens if and only if b ≠ 0, d ≠ 0, and
ad − bc ≠ 0.
Case 1b. Exactly one of ℓ1 and ℓ2 is vertical. If ℓ1 is vertical, then
b = 0, and since ℓ2 is not vertical, d ≠ 0. Similarly, if ℓ2 is
vertical, then d = 0, and b ≠ 0. Thus this case happens if and
only if either b = 0 but d ≠ 0, or d = 0 but b ≠ 0.
Now let us reflect on the conclusion from Case 1: it looks complicated, because
it says that the system (5.12) has a unique solution if and only if either
(i) b ≠ 0, d ≠ 0, and ad − bc ≠ 0, or
(ii) b = 0 but d ≠ 0, or d = 0 but b ≠ 0.
It is not obvious, but we can actually replace the clumsy condition of “either (i)
or (ii)” by a much more user-friendly statement. We claim:
The system (5.12) has a unique solution if and only if
(iii) ad − bc ≠ 0.
Thus we must prove that if (iii) holds, either (i) or (ii) must hold, and conversely,
if either (i) or (ii) holds, then (iii) also holds. First suppose (iii) is true, and we
will prove that either (i) or (ii) is true. Now if b = d = 0, then ad − bc = 0.
Therefore since (iii) is true, b and d cannot be 0 at the same time; it follows easily
that either (i) or (ii) must be true. The converse is more interesting: if either (i)
or (ii) is true, we will prove (iii) is true. If (i) holds, then (iii) is trivially true.
So suppose (ii) holds. Let b = 0 but d ≠ 0. Then ad − bc = ad. But in the first
equation of (5.12), the fact that b = 0 requires that a ≠ 0 (see the definition of a
linear equation on page 59). It follows that ad − bc = ad ≠ 0. Similarly, if d = 0
but b ≠ 0, then c ≠ 0 (for the same reason, applied to the second equation), so
ad − bc = −bc ≠ 0. Therefore (iii) is proved in either case. The
proof of the claim is complete.
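Because the claim is a statement about finitely many sign patterns, it can also be checked by brute force over small integer coefficients. The sketch below (function names hypothetical) is evidence, not a proof; it also shows why the step “b = 0 requires a ≠ 0” matters: if a pair like (a, b) = (0, 0) were allowed, condition (ii) could hold while ad − bc = 0.

```python
from itertools import product


def is_linear_equation(a, b):
    # ax + by = e counts as a linear equation only if a and b are not both zero
    return a != 0 or b != 0


def case_analysis(a, b, c, d):
    """Condition (i) or (ii) from the case analysis for a unique solution."""
    cond_i = b != 0 and d != 0 and a * d - b * c != 0
    cond_ii = (b == 0 and d != 0) or (d == 0 and b != 0)
    return cond_i or cond_ii


def claim_holds(bound=3):
    """Check '(i) or (ii)' <=> '(iii) ad - bc != 0' for coefficients in [-bound, bound]."""
    rng = range(-bound, bound + 1)
    for a, b, c, d in product(rng, repeat=4):
        if not (is_linear_equation(a, b) and is_linear_equation(c, d)):
            continue  # skip pairs that are not linear equations
        if case_analysis(a, b, c, d) != (a * d - b * c != 0):
            return False
    return True


print(claim_holds(3))
print(case_analysis(0, 0, 1, 1))  # (ii) holds here, yet ad - bc = 0
```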
Case 2. The system has an infinite number of solutions.
By Theorem 5.1, this would be the case if and only if ℓ1 and ℓ2 coincide.
According to Lemma 4.10 on page 79, the graphs ℓ1 and ℓ2 of the two equations
in the system (5.12) coincide if and only if there is a nonzero λ so that c = λa,
d = λb, and f = λe. For simplicity, we will agree to express the preceding three
equalities by the notation (c, d, f ) = λ( a, b, e). Thus the linear system (5.12)
has an infinite number of solutions if and only if there is a nonzero λ so that
(c, d, f ) = λ( a, b, e).

The determinant and linear systems with no solution

Observe that if (c, d, f ) = λ( a, b, e) for a nonzero λ, then
ad − bc = a(λb) − b(λa) = 0.
Compare this fact with condition (iii) above and it becomes clear that the number
ad − bc has a significant bearing on the solvability of (5.12): Case 1 is about the
case of ad − bc ≠ 0, while we now know that Case 2 should be subsumed under
the heading of ad − bc = 0.
It is time to formalize this number ad − bc. It is called the determinant Δ of
the linear system (5.12): Δ = ad − bc. In terms of the determinant, we have now
acquired the perspective that Case 1 is about a nonzero determinant whereas Case
2 is about what happens when the determinant is zero. More precisely, the system
(5.12) has a unique solution if and only if its determinant is nonzero (Case 1), and
it has an infinite number of solutions if and only if the determinant is zero and
(c, d, f ) = λ( a, b, e) for some nonzero number λ.


Decide by visual inspection whether the following system has a unique so-
lution or an infinite number of solutions, and explain why:

12 13 x − 24y = 1
18.2x + 35.5y = 2.8

It remains to consider the situation where the system (5.12) has no solution.
In that case, the determinant of the system (5.12) must be zero because if the
determinant were nonzero, the system would have a unique solution, which is a
contradiction. Moreover, we claim that, if there is no solution, then there is no
nonzero number λ so that (c, d, f ) = λ( a, b, e). This is because Case 2 above says
(c, d, f ) = λ( a, b, e) for a nonzero λ implies the system has an infinite number of
solutions, which is a contradiction again.
We now show that, conversely, if the system (5.12) has a zero determinant
and if there is no nonzero number λ so that (c, d, f ) = λ( a, b, e), then the system
has no solution. We give a proof by contradiction (see the proof of Lemma 3.1 in
Section 3.1 of [Wu-PreAlg]). Suppose under these assumptions, the system (5.12)
has a solution. According to the earlier conclusion on page 98, the linear system
either has a unique solution or an infinite number of solutions. In the former case,
the determinant is nonzero (see Case 1 above), a contradiction to the hypothesis
of a zero determinant. But in the latter case, we have just seen that there must
be a nonzero number λ so that (c, d, f ) = λ( a, b, e) (see Case 2 above), again a
contradiction. Thus the converse is proved.
We have now proved that the system (5.12) has no solution if and only if it has
a zero determinant and there is no nonzero number λ so that (c, d, f ) = λ( a, b, e).
According to Theorem 5.1 on page 86, this means that if the determinant of the
system (5.12) is zero but there is no nonzero number λ so that (c, d, f ) = λ( a, b, e),
then the graphs ℓ1 and ℓ2 of the equations in (5.12) are parallel lines.


Decide by mental computation alone whether the following system has any
solutions:
$$\begin{cases} 2\tfrac{1}{3}x - 5y = 1 \\ 7x - 15y = 2.8 \end{cases}$$

The main theorem on solvability

We summarize the preceding discussion into one theorem.
Theorem 5.4. Given a linear system
$$\begin{cases} ax + by = e \\ cx + dy = f \end{cases}$$
where a, b, . . . , f are constants, let Δ denote the determinant of the system. Then:
(1) If Δ ≠ 0, the system has a unique solution.
(2) If Δ = 0, then
(2a) the system has an infinite number of solutions if there
is a nonzero number λ so that (c, d, f) = λ(a, b, e),
(2b) the system has no solution if there is no nonzero num-
ber λ so that (c, d, f) = λ(a, b, e).
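Theorem 5.4 is exactly what a program needs in order to decide solvability without looking at graphs. Here is a minimal sketch (the function name `classify` is hypothetical). It assumes both equations are genuinely linear, i.e., (a, b) ≠ (0, 0) and (c, d) ≠ (0, 0), and that the inputs are integers or `Fraction`s: with floating-point coefficients, a test such as Δ = 0 is unreliable.

```python
from fractions import Fraction as F


def classify(a, b, e, c, d, f):
    """Classify ax + by = e, cx + dy = f as 'unique', 'infinite', or 'none'."""
    delta = a * d - b * c  # the determinant of the system
    if delta != 0:
        return "unique"  # part (1)
    # Delta = 0: the only candidate lambda is forced by a nonzero entry of (a, b)
    lam = F(c) / F(a) if a != 0 else F(d) / F(b)
    if lam != 0 and (c, d, f) == (lam * a, lam * b, lam * e):
        return "infinite"  # part (2a)
    return "none"  # part (2b)


print(classify(2, -3, 1, 1, 1, 2))   # determinant 5
print(classify(1, 2, 3, 2, 4, 6))    # second equation = 2 x first
print(classify(1, 2, 3, 2, 4, 7))    # same left sides up to 2, different right side
```

When Δ = 0 and a ≠ 0, the first coordinate forces λ = c/a, and Δ = 0 then already gives d = λb; so only f = λe remains to be tested, which is why a single candidate λ suffices.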
Remark. The concept of the determinant of a linear system in two variables generalizes to
linear systems of n equations in n variables, and Theorem 5.4 likewise has a generalization
to linear systems of n equations in n variables. These are standard topics in any book on
linear algebra.
The criterion for solvability is the algebraic translation of the fact that two lines intersect
if and only if they are nonparallel.

Exercises 5.4
(1) Without solving any of the following systems of equations, discuss the
nature of their solutions: a unique solution, infinitely many solutions, or
no solution?
(i ) 
2x − 3y = 1
x + 23 y = 2
(− 14 ) x − 3546 y = 23
697 x + 4239 y = 890
23 x − 85 y = 22
69 x + 255 y = 67
(2) Prove that the determinant of a linear system is zero if and only if the
graphs ℓ1 and ℓ2 of the two equations coincide or are parallel lines. Give
two proofs, one using Theorem 5.4, and a direct proof without using
Theorem 5.4.

5.5. Partial fractions and Pythagorean triples

Partial fractions
Pythagorean triples

Partial fractions
This section gives two applications of linear systems in two variables.
The first application shows how to express certain rational expressions in a
number x as a sum of simpler rational expressions also in x. Consider the simple
computation
$$\frac{5}{x-2} + \frac{4}{x+3} = \frac{5(x+3) + 4(x-2)}{(x-2)(x+3)}.$$
After simplifying the numerator of the right side and multiplying out
$$(x-2)(x+3) = x^2 + x - 6,$$
we get the identity
$$\frac{5}{x-2} + \frac{4}{x+3} = \frac{9x+7}{x^2+x-6}. \tag{5.13}$$

This is straightforward. Things get interesting, however, if you happen not to
know identity (5.13) but are asked whether (9x + 7)/(x² + x − 6) can be expressed as a sum of
(constant) multiples of the simple rational expressions 1/(x − 2) and 1/(x + 3). In general
terms, the question may be understood as part of our overall desire to express
complicated objects in terms of simpler ones (e.g., think of the prime decomposition
of a positive integer, to the effect that every integer > 1 is the product of a finite
number of primes; see Section 3.2 of [Wu-PreAlg]). This question arises naturally
in calculus, and is a special case of the so-called partial fraction decomposition of a
rational expression.
The answer to the preceding question is by no means obvious, for two reasons.
One is that even knowing x² + x − 6 = (x − 2)(x + 3) ahead of time, one would be
inclined to believe that (9x + 7)/(x² + x − 6) is a sum of more complicated rational expressions
such as (ax + b)/(x − 2) and (cx + d)/(x + 3) for linear polynomials ax + b and cx + d, rather than just
having constants in the numerators, such as 5/(x − 2) and 4/(x + 3). The other reason
is that even if you believe that such an expression is possible, there remains the
question of how to get the precise values of the numerators, i.e., 5 and 4 in (5.13).
In order to answer this question, we have to first quote two facts without
proof; the proofs are not too difficult, but they do take up time and space that we
cannot afford at this point.4
(A) Let (a1x + b1), (a2x + b2), . . . , (anx + bn) be n linear poly-
nomials in x (n is a positive integer) so that none is a constant
multiple of another. Let p(x) be a polynomial in x of degree
less than n. Then there are constants c1, c2, . . . , cn so that
$$\frac{p(x)}{(a_1x + b_1)\cdots(a_nx + b_n)} = \frac{c_1}{a_1x + b_1} + \cdots + \frac{c_n}{a_nx + b_n}.$$

4 For (A), see [Birkhoff-MacLane], Section 3.11; (B) follows from the fact that a polynomial of
degree n has at most n roots, which is easy to prove; see, e.g., Chapter 9 in Volume II of [Wu-HighSchool].

(B) Suppose the following two n-th degree polynomials in x (n
is a positive integer) are equal for all x with at most a finite
number of exceptions:
$$a_n x^n + \cdots + a_1 x + a_0 = b_n x^n + \cdots + b_1 x + b_0.$$
Then the coefficients of the polynomials are pairwise equal:
$$a_n = b_n, \quad a_{n-1} = b_{n-1}, \quad \ldots, \quad a_0 = b_0.$$
Using Fact (A), we see that there must be constants a and b so that
$$\frac{9x+7}{(x-2)(x+3)} = \frac{a}{x-2} + \frac{b}{x+3}, \tag{5.14}$$
which is valid for all x ≠ 2, −3. We now use Fact (B) to recover equation (5.13),
i.e., to obtain the values of the constants a and b as 5 and 4, respectively.5 By the
addition of rational expressions,
$$\frac{a}{x-2} + \frac{b}{x+3} = \frac{a(x+3) + b(x-2)}{(x-2)(x+3)} = \frac{(a+b)x + (3a-2b)}{(x-2)(x+3)}.$$

Comparing this equality with (5.14), we get
$$\frac{9x+7}{(x-2)(x+3)} = \frac{(a+b)x + (3a-2b)}{(x-2)(x+3)}$$
for all x ≠ 2, −3. If we multiply both sides by (x − 2)(x + 3), we see that this is
equivalent to
$$9x + 7 = (a+b)x + (3a - 2b)$$
for all x ≠ 2, −3. From Fact (B), we know that the coefficients a + b and 3a − 2b
must be equal to 9 and 7, respectively. In other words, we have the following
simultaneous linear equations in a and b:

$$\begin{cases} a + b = 9 \\ 3a - 2b = 7 \end{cases}$$

We can solve this linear system by simply multiplying the first equation by 2 and
then adding it to the second equation. This yields 5a = 25 and therefore a = 5.
Either equation then yields b = 4, as claimed.
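The same computation can be sketched in code: solve the 2×2 system for the numerators, then spot-check identity (5.13) at a few values of x. This is an illustration under stated assumptions, not the text's method: `solve_2x2` is a hypothetical helper, and it uses Cramer's rule (a standard formula not derived in this section) in place of the elimination carried out above. Exact `Fraction` arithmetic is used so that the equality checks are meaningful.

```python
from fractions import Fraction as F


def solve_2x2(a, b, e, c, d, f):
    """Solve ax + by = e, cx + dy = f, assuming the determinant ad - bc is nonzero."""
    delta = a * d - b * c
    if delta == 0:
        raise ValueError("determinant is zero; no unique solution")
    # Cramer's rule, used here instead of the elimination in the text
    x = F(e * d - b * f) / delta
    y = F(a * f - e * c) / delta
    return x, y


# Matching coefficients gives a + b = 9 and 3a - 2b = 7
a, b = solve_2x2(1, 1, 9, 3, -2, 7)
print(a, b)

# Spot-check identity (5.13) at several rational x other than 2 and -3
for x in [F(0), F(1), F(5), F(1, 2), F(-7)]:
    assert (9 * x + 7) / ((x - 2) * (x + 3)) == a / (x - 2) + b / (x + 3)
```

Agreement at a handful of points is only a sanity check; it is Fact (B) that justifies matching coefficients in general.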
Remark. Because we have only learned about solving linear systems in two
unknowns, we can only make use of (A) and (B) on page 102 for the case n = 2.
Once we learn how to deal with linear systems in n unknowns, the preceding
method will allow us to determine all the coefficients c1 , . . . , cn in (A).

5 There is another method to achieve the same goal; again see [Birkhoff-MacLane], Section 3.11.

Pythagorean triples
We now give a second application of linear systems. We say three positive integers a, b, c
form a Pythagorean triple {a, b, c} if a² + b² = c². In other words, a, b, and c are the
lengths of three sides of a right triangle, and
our convention is that the third member of a Pythagorean triple is, by definition, the
length of the hypotenuse of the right triangle. It goes without saying that the key
point of the definition of a Pythagorean triple is that all three numbers are positive
integers. Everybody knows that 3, 4, 5 form a Pythagorean triple; some may even
know that {5, 12, 13} is another Pythagorean triple, or even that {8, 15, 17} is
yet another example. But are there others?
Pythagorean triples furnish another example of how algebra and geometry are intertwined.
Before answering this question, we should make the trivial observation that
given any two positive integers m and n, we can find a right triangle Δ so that m
and n are (the lengths of) two of its sides. Indeed, simply construct two perpen-
dicular segments with a common endpoint so that one has length m and the other
has length n, and then join the other endpoints of the segments to form a right
triangle. (If m < n, we have the additional freedom of constructing a right trian-
gle whose hypotenuse has length n and one leg has length m; see Exercise 1 on
page 108.) However, the length of the third side of this right triangle will not be
an integer in general. The classic example is the isosceles right triangle with both
legs equal to 1: the third side, the hypotenuse, then of course has length
√2, which is not even a rational number.6 The main attraction of Pythagorean
triples is therefore that all three numbers are positive integers.
Our purpose is to produce Pythagorean triples at will by solving an extremely
simple linear system of equations. It will be obvious that we will get an infinite
number of Pythagorean triples by this method. It is even true that the method
produces all the Pythagorean triples, though we will not prove this fact here. One
would like to say that this method is due to the Babylonians some thirty-eight
centuries ago, circa 1800 B.C. (Babylon, about sixty miles south of Baghdad in
present-day Iraq), but a more accurate statement would be that it is the algebraic
rendition of the method one infers from a close reading of the celebrated cuneiform
tablet, Plimpton 322, which lists fifteen Pythagorean triples.7 See [Robson].
Let us first perform a conceptual simplification. Take {3, 4, 5}, for exam-
ple. Once we are in possession of this triple, we will in fact be in possession
of an infinite number of Pythagorean triples, namely, {6, 8, 10}, {9, 12, 15},
{12, 16, 20}, and in general, {3n, 4n, 5n} for any positive integer n. Clearly,
if you already have the Pythagorean triple {3, 4, 5}, there is not much glory in
claiming that you also have another Pythagorean triple such as {6, 8, 10}. Ac-
cordingly, we define a Pythagorean triple { a, b, c} to be primitive if the integers
a, b, and c have no common divisor other than 1 (i.e., if k is a positive integer
that divides all three a, b, and c, then k = 1), and will henceforth concentrate
on getting primitive Pythagorean triples. We say a Pythagorean triple { a, b, c}

6 See, e.g., the end of Section 3.2 in [Wu-PreAlg].
7 Lest you entertain for even a split second the idle thought that people couldn’t have known
such advanced mathematics thirty-eight centuries ago and that these triples were probably hit upon
by trial and error, let it be noted that the largest triple in Plimpton 322 is {12709, 13500, 18541}.

is a multiple of another Pythagorean triple {a′, b′, c′} if there is a positive
integer n so that a = na′, b = nb′, and c = nc′. In this terminology, a given
Pythagorean triple is either primitive, or is a multiple of a primitive Pythagorean
triple (see Exercise 6 on page 108). Therefore whenever a Pythagorean triple is
given, we lose nothing by replacing it with the primitive Pythagorean triple of
which the first Pythagorean triple is a multiple. For example, instead of dealing
with {15, 36, 39}, we will replace it by {5, 12, 13}.
We will give a proof of the following theorem. Observe that its statement
makes use of the fact that any two fractions can be written as two fractions with
the same denominator (FFFP).8
Theorem 5.5. Let (u, v) be the solution of the linear system
$$\begin{cases} u + v = \dfrac{t}{s} \\[4pt] u - v = \dfrac{s}{t} \end{cases}$$
where s, t are positive integers with s < t. If we write u and v as two fractions with the
same denominator, u = c/b and v = a/b, then {a, b, c} form a Pythagorean triple.
There is extra incentive in providing a proof of this theorem, not only because
the proof is very simple, but also because it actually tells us why the solution (u, v)
of the linear system furnishes a Pythagorean triple.
Proof. With (u, v) as the solution of the linear system, we multiply the corre-
sponding sides of the two equations in the theorem to get (u + v)(u − v) = (t/s) · (s/t),
or u² − v² = 1 (by (1.4) on page 9). So with u = c/b and v = a/b, we have
$$\left(\frac{c}{b}\right)^2 - \left(\frac{a}{b}\right)^2 = 1.$$
Multiplying both sides of this equality by b² gives c² − a² = b² and therefore
$$a^2 + b^2 = c^2.$$
We have our Pythagorean triple and the proof of Theorem 5.5 is complete.
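The recipe of Theorem 5.5 translates directly into a few lines of exact arithmetic. In this minimal sketch (the name `triple_from_system` is hypothetical), Python's `Fraction` keeps u and v in lowest terms automatically, and the common denominator b is chosen to be the least one (an lcm); any common denominator would do, and a different choice yields a multiple of the same triple.

```python
from fractions import Fraction as F
from math import lcm  # requires Python 3.9+


def triple_from_system(s, t):
    """Solve u + v = t/s, u - v = s/t and read off (a, b, c) from u = c/b, v = a/b."""
    if not 0 < s < t:
        raise ValueError("need positive integers s < t")
    u = (F(t, s) + F(s, t)) / 2  # add the two equations: 2u = t/s + s/t
    v = u - F(s, t)              # second equation: v = u - s/t
    b = lcm(u.denominator, v.denominator)  # a common denominator for u and v
    a, c = int(v * b), int(u * b)
    assert a * a + b * b == c * c  # the conclusion of Theorem 5.5
    return a, b, c


for s, t in [(1, 2), (2, 3), (3, 4)]:
    print(s, t, triple_from_system(s, t))
```

With (s, t) = (1, 2), (2, 3), (3, 4) this reproduces the triples {3, 4, 5}, {5, 12, 13}, and {7, 24, 25}. Because `Fraction` always cancels, (s, t) = (1, 5) gives (12, 5, 13) directly rather than the multiple {10, 24, 26}.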
It is easy to explain how Theorem 5.5 came about, but before doing that, let
us put it to use to produce some new Pythagorean triples.
Example 1. Consider:
$$\begin{cases} u + v = \dfrac{2}{1} \\[4pt] u - v = \dfrac{1}{2} \end{cases}$$
Adding the equations gives 2u = 5/2 so that u = 5/4. From the second equation,
we get v = u − 1/2 = 5/4 − 1/2 = 3/4. Thus we have retrieved the grandfather of all
Pythagorean triples, {3, 4, 5}.
Example 2. Consider:
$$\begin{cases} u + v = \dfrac{3}{2} \\[4pt] u - v = \dfrac{2}{3} \end{cases}$$
Adding the equations gives 2u = 13/6 so that u = 13/12. From the second equation,
we get v = u − 2/3 = 13/12 − 2/3 = 5/12. By Theorem 5.5, {5, 12, 13} is a Pythagorean
triple, which of course we already know.

8 See page 270; it is first discussed in Section 1.2 of [Wu-PreAlg].

Example 3. Consider:
$$\begin{cases} u + v = \dfrac{4}{3} \\[4pt] u - v = \dfrac{3}{4} \end{cases}$$
Adding the equations gives 2u = 25/12 so that u = 25/24, and the second equation
gives v = 25/24 − 3/4 = 7/24. By Theorem 5.5, {7, 24, 25} is a Pythagorean triple.
Since this is new to most people, one should check directly that 7² + 24² = 25².
Example 4. Consider:
$$\begin{cases} u + v = \dfrac{69}{2} \\[4pt] u - v = \dfrac{2}{69} \end{cases}$$
Adding the two equations gives 2u = 4765/138 so that u = 4765/276. From the sec-
ond equation, we get v = 4765/276 − 2/69 = 4757/276. This time we get an unfamiliar
Pythagorean triple {276, 4757, 4765}. Although Theorem 5.5 guarantees that
this is indeed a Pythagorean triple, it would be good for your soul to directly
check that 276² + 4757² = 4765² is in fact true.
Observe that thus far, every single Pythagorean triple has been primitive.
Example 5. Consider:
$$\begin{cases} u + v = \dfrac{5}{1} \\[4pt] u - v = \dfrac{1}{5} \end{cases}$$
Adding the equations, we obtain 2u = 26/5, and multiplying both sides by 1/2 gives
u = 26/10. From the second equation of the system, we get v = 26/10 − 1/5 = 24/10. By
Theorem 5.5, {10, 24, 26} is a Pythagorean triple. This is not a primitive triple
because it is a multiple of {5, 12, 13}, which we already know and which is
clearly primitive.
However, if we had taken the trouble to do the obvious cancellation to get
$$u = \frac{1}{2} \times \frac{26}{5} = \frac{13}{5},$$
then we would have obtained from the second equation v = 13/5 − 1/5 = 12/5, and
the primitive triple {5, 12, 13} would be the result. Thus we see that different
values of s and t do not always lead to distinct primitive Pythagorean triples.
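To see how often different pairs (s, t) collapse to the same primitive triple, one can solve the system for every pair 0 < s < t ≤ N, write u and v over the uncancelled denominator 2st (as in the first computation of Example 5), and group the results by the primitive triple they are multiples of. The helper names below are hypothetical; this is an illustrative sketch only.

```python
from fractions import Fraction as F
from math import gcd


def triple_unreduced(s, t):
    # Solve u + v = t/s, u - v = s/t, then write u and v over the denominator 2st
    u = (F(t, s) + F(s, t)) / 2
    v = u - F(s, t)
    b = 2 * s * t
    return int(v * b), b, int(u * b)


def primitive_version(a, b, c):
    """Divide a triple by the greatest common divisor of its three members."""
    g = gcd(gcd(a, b), c)
    return a // g, b // g, c // g


def group_by_primitive(max_t):
    """Map each primitive triple (sorted) to the pairs (s, t) that produce it."""
    groups = {}
    for t in range(2, max_t + 1):
        for s in range(1, t):
            key = tuple(sorted(primitive_version(*triple_unreduced(s, t))))
            groups.setdefault(key, []).append((s, t))
    return groups


for triple, pairs in group_by_primitive(5).items():
    print(triple, pairs)
```

For N = 5 the ten pairs produce only six primitive triples; for instance, (2, 3) of Example 2 and (1, 5) of Example 5 both lead to {5, 12, 13}.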
We now explain the genesis of Theorem 5.5, which, we must add, is already
implicit in the above proof of the theorem. We will follow the standard method
of assuming that we already have a Pythagorean triple { a, b, c}, and proceed
to find out what equation or equations they must satisfy. The new idea here is
that, by rewriting the Pythagorean Theorem a² + b² = c² as b² = c² − a² so that
1 = (c/b)² − (a/b)², we succeed in expressing 1 as a difference of squares to which
the standard identity (1.4) on page 9 can be applied. In fact, by letting
$$u = \frac{c}{b} \quad\text{and}\quad v = \frac{a}{b} \tag{5.15}$$
we get
$$u^2 - v^2 = 1$$
and therefore
$$(u+v)(u-v) = 1.$$
Since u + v and u − v are positive fractions, and since c > b (so that u > 1) and v > 0,
we have u + v > 1; we may therefore let s and t be positive integers with s < t so that
u + v = t/s. But because (u + v)(u − v) = 1, necessarily, u − v = s/t. Therefore we have:
$$\begin{cases} u + v = \dfrac{t}{s} \\[4pt] u - v = \dfrac{s}{t} \end{cases} \tag{5.16}$$
where, we recall for emphasis, the s and t are positive integers with s < t. We
may regard this system as a system of linear equations in the variables u and v,
and it is exactly the system in the statement of Theorem 5.5. From this point of
view, Theorem 5.5 becomes inevitable.
Optional Reading: We will give a refinement of Theorem 5.5 by
directly solving system (5.16). Adding the two equations, we
get 2u = t/s + s/t = (t² + s²)/(st), and therefore
$$u = \frac{t^2 + s^2}{2st}.$$
From the second equation of (5.16), we then obtain
$$v = u - \frac{s}{t} = \frac{s^2 + t^2}{2st} - \frac{s}{t} = \frac{s^2 + t^2}{2st} - \frac{2s^2}{2st} = \frac{t^2 - s^2}{2st}$$
so that
$$v = \frac{t^2 - s^2}{2st}.$$
Since u² − v² = 1, we have
$$\left(\frac{t^2 + s^2}{2st}\right)^2 - \left(\frac{t^2 - s^2}{2st}\right)^2 = 1.$$
Multiplying both sides by (2st)², we get
$$(t^2 + s^2)^2 - (t^2 - s^2)^2 = (2st)^2,$$
i.e.,
$$(t^2 - s^2)^2 + (2st)^2 = (t^2 + s^2)^2.$$
(Compare Exercise 3 on page 5.) This shows that if s, t are
positive integers and t > s, then
{2st, t² − s², t² + s²} is a Pythagorean triple.
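The formula and the linear system can be checked against each other mechanically: with b = 2st, a = t² − s², and c = t² + s², the fractions u = c/b and v = a/b should satisfy both equations of (5.16). The following sketch (function names hypothetical) does this for all 0 < s < t < 50; it is a finite check that complements, and does not replace, the derivation above.

```python
from fractions import Fraction as F


def formula_triple(s, t):
    """Return (2st, t^2 - s^2, t^2 + s^2), in the order used in the text."""
    return 2 * s * t, t * t - s * s, t * t + s * s


def agrees_with_system(s, t):
    # u = c/b and v = a/b should solve u + v = t/s and u - v = s/t
    b, a, c = formula_triple(s, t)
    u, v = F(c, b), F(a, b)
    return u + v == F(t, s) and u - v == F(s, t) and a * a + b * b == c * c


print(formula_triple(2, 3))
print(all(agrees_with_system(s, t) for t in range(2, 50) for s in range(1, t)))
```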
We have therefore presented two ways of obtaining Pythagorean
triples: by giving explicit values of positive integers s and t in
the preceding formula, or by using Theorem 5.5 and solving the
linear system there. Of course the former is a consequence of
the latter, but for school mathematics, the latter is more instructive.
With a little more work (see Exercise 7 immediately following), one can
prove that if s and t are relatively prime (i.e., no
common divisor other than 1), and if one of them is even and the
other odd, then the triple {2st, t2 − s2 , t2 + s2 } is primitive. With
considerably more work (see Exercise 11 on page 109), it can be
shown that every primitive Pythagorean triple is represented in
terms of suitable s and t in this manner.
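Both claims (primitivity, from Exercise 7, and completeness, from Exercise 11) can be tested by comparing the parametrized triples against a direct search for primitive triples up to a bound on the hypotenuse. In this sketch the function names are hypothetical and the bound is arbitrary; agreement up to the bound is evidence, not a proof.

```python
from math import gcd, isqrt


def param_triples(limit):
    """Triples from (t^2 - s^2, 2st, t^2 + s^2) with gcd(s, t) = 1 and s, t of
    opposite parity, hypotenuse <= limit, legs listed in increasing order."""
    found = set()
    t = 2
    while t * t + 1 <= limit:  # s = 1 gives the smallest hypotenuse for this t
        for s in range(1, t):
            c = t * t + s * s
            if gcd(s, t) == 1 and (s + t) % 2 == 1 and c <= limit:
                a, b = t * t - s * s, 2 * s * t
                found.add((min(a, b), max(a, b), c))
        t += 1
    return found


def brute_force_primitive(limit):
    """All primitive triples with hypotenuse <= limit, found by direct search."""
    found = set()
    for c in range(1, limit + 1):
        for a in range(1, c):
            b = isqrt(c * c - a * a)
            if a < b and a * a + b * b == c * c and gcd(gcd(a, b), c) == 1:
                found.add((a, b, c))
    return found


print(sorted(param_triples(30)))
print(param_triples(300) == brute_force_primitive(300))
```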

Exercises 5.5

(1) Given any two positive numbers (not necessarily integers) x and y so
that x < y, describe a ruler-and-compass construction of a right triangle
so that its hypotenuse has length y and one leg has length x. If x and y
are fixed, are all such triangles congruent?
(2) Express (8x + 2)/(x² − 1) as a sum of constant multiples of 1/(x + 1) and 1/(x − 1).
(3) Express −5(x + 1)/(3(x² + x − 12)) as a sum of constant multiples of 1/(x + 4) and 1/(x − 3).
(4) In each of the following, you are asked to solve the linear system in The-
orem 5.5 with the given values of s and t to obtain Pythagorean triples.
You may use a scientific calculator, especially for (i) and (j) below.
(a) s = 2, t = 5.
(b) s = 4, t = 5.
(c) s = 1, t = 4.
(d) s = 1, t = 3.
(e) s = 1, t = 6.
(f) s = 1, t = 12.
(g) s = 3, t = 17.
(h) s = 3, t = 13.
(i) s = 12, t = 13.
(j) s = 54, t = 125.
(k) s = 8, t = 9907.
(5) In part (k) of the last problem, the largest number in the Pythagorean
triple has 8 digits. Suppose you have a calculator with only a 12-digit
display on the screen. Explain how you can use such a calculator to
directly verify that the triple of numbers so obtained is a Pythagorean triple.
(6) Prove that a Pythagorean triple is either primitive, or is a multiple of a
primitive Pythagorean triple.
(7) Prove that if s and t are relatively prime positive integers (i.e., no com-
mon divisor other than 1), 0 < s < t, and one of them is even and the
other odd, then {2st, t2 − s2 , t2 + s2 } is a primitive Pythagorean triple.
(Hint: Make strong use of the Key Lemma in Section 3.1 of [Wu-PreAlg]
(also see page 270 of this volume).)
(8) Let { a, b, c} be a Pythagorean triple. Prove that the following four condi-
tions are equivalent: (i) { a, b, c} is primitive. (ii) a and b are relatively
prime. (iii) a and c are relatively prime. (iv) b and c are relatively prime.

(9) Let { a, b, c} be a primitive Pythagorean triple. Prove that one of a and b

is even and the other is odd.
(10) Let {a, b, c} be a primitive Pythagorean triple so that a is odd and b is
even (see preceding exercise). Let t/s be a fraction in lowest terms (i.e., t
and s are relatively prime integers9) so that
$$\frac{c}{b} + \frac{a}{b} = \frac{t}{s}.$$
(i) Prove that
$$0 < s < t \quad\text{and}\quad \frac{c}{b} - \frac{a}{b} = \frac{s}{t}$$
and that
$$\frac{c}{b} = \frac{t^2 + s^2}{2st} \quad\text{and}\quad \frac{a}{b} = \frac{t^2 - s^2}{2st}.$$
(ii) Use the last expression of a/b to prove that one of s and t is even and
the other is odd. (iii) Finally, prove that
$$a = t^2 - s^2, \quad b = 2st, \quad\text{and}\quad c = t^2 + s^2.$$
(Hint: Make use of Exercise 7 above and the uniqueness of the reduced
form of a fraction (Theorem 3.1 in [Wu-PreAlg]).)
(11) Make use of Exercises 8–10 to prove that a Pythagorean triple { a, b, c}
is primitive if and only if there exist relatively prime integers s and t,
0 < s < t, so that one of them is even and the other is odd, and so that
a = t2 − s2 , b = 2st, and c = t2 + s2 .

5.6. Appendix
In Theorem 5.2 on page 94, we characterized parallelism in terms of slope. There is a
companion theorem that gives a characterization of perpendicularity in terms of slope,
and the purpose of this Appendix is to state and prove the latter. We do so not only
for reasons of completeness, but also because it is now becoming common in high school
algebra textbooks to adopt the absurd practice of defining perpendicularity in terms of
slope. The absurdity comes from the fact that the concept of perpendicularity has already
been defined in elementary school in terms of the degree of an angle (i.e., 90° angles at
the point of intersection). What we need is therefore the proof of a theorem, not a second
definition.
The fact that two lines are perpendicular if the product of their slopes is −1 is something
that must be proved.
Theorem 5.6. Two nonvertical lines are perpendicular if and only if the product of
their slopes is equal to −1.
Proof. We first prove a special case of the theorem: Let ℓ1 and ℓ2 be the two given
lines passing through the origin O. We will prove that they are perpendicular if
and only if the product of their slopes is −1.
Because ℓ1 and ℓ2 are nonvertical and are perpendicular to each other, neither
is horizontal. To describe the relative positions of the lines, observe that the four
9 See page 268.

right angles10 formed by the positive and negative coordinate axes with vertex
at the origin O, minus the coordinate axes themselves, are usually called the four
quadrants of the coordinate system and are labeled I, II, III, and IV, as shown:



Since ℓ1 and ℓ2 are neither vertical nor horizontal, with the exception of the point
O, they must lie completely inside either quadrants I and III, or quadrants II and
IV, as shown below:


If both ℓ1 and ℓ2 lie in quadrants I and III, the degree of the angle between
the rays on the lines is either greater than 90° or less than 90°, and ℓ1 cannot be
perpendicular to ℓ2, as shown:


For a similar reason, ℓ1 and ℓ2 cannot both lie in quadrants II and IV. We may
therefore assume that ℓ1 lies in quadrants I and III and ℓ2 lies in quadrants II and
IV.
We choose points P1 = (x1, y1) and P2 = (x2, y2) on ℓ1 and ℓ2, respectively, so
that both lie above the x-axis. Then P1 lies in quadrant I and P2 lies in quadrant
II. It follows that y1, y2 > 0, but x1 > 0 and x2 < 0.

10 Remember that, in these volumes, an angle is a region in the plane rather than two rays issuing
from a common vertex.



The slope of ℓ1 computed using points P1 and O is y1/x1, which is positive,
while the slope of ℓ2 computed using points P2 and O is y2/x2, which is negative.
Therefore, the perpendicularity of ℓ1 and ℓ2 implies that the product of their
slopes must be negative. It remains to check that the absolute value of the product
of the slopes is 1.
So far, P1 and P2 are any two points on ℓ1 and ℓ2, respectively, subject only
to the restriction that they lie above the x-axis. Now we further specify that P1,
P2 be chosen so that they are equidistant from O, i.e., |OP1| = |OP2|. Because
the rotation ρ of 90 degrees around the origin O carries ℓ1 to ℓ2, the fact that ρ
is a congruence implies that ρ carries P1 to P2. Let the vertical line from P1 meet
the x-axis at Q1, and let Q2 = ρ(Q1). Now Q2 lies on the y-axis and since ρ also
preserves angles, P2Q2 ⊥ y-axis. But ρ is also length-preserving, so
$$|P_1Q_1| = |P_2Q_2| \quad\text{and}\quad |OQ_1| = |OQ_2|. \tag{5.17}$$

By the way a coordinate system is set up (see Section 4.1), we know that |P2Q2|
and |OQ2| are the absolute values of the x- and y-coordinates of the point P2. Thus
computing the slope of ℓ2 using the points P2 and O (Theorem 4.4 on page 67),
we see that the absolute value of this slope is |OQ2|/|P2Q2|. The absolute value
of the slope of ℓ1 is of course |P1Q1|/|OQ1|. Thus, taking (5.17) into account, the
product of the absolute values of the slopes of ℓ1 and ℓ2 is
$$\frac{|P_1Q_1|}{|OQ_1|} \cdot \frac{|OQ_2|}{|P_2Q_2|} = \frac{|P_2Q_2|}{|OQ_2|} \cdot \frac{|OQ_2|}{|P_2Q_2|} = 1.$$
This completes the proof that the product of the slopes of two nonvertical perpen-
dicular lines which pass through the origin O must be equal to −1.
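As a quick numerical illustration of this argument (a check on sample points, not a substitute for the proof), the Python sketch below uses exact fractions. It assumes the standard coordinate formula (x, y) ↦ (−y, x) for the counterclockwise rotation of 90 degrees about O; the proof above uses only the fact that this rotation is a congruence. The names rotate_90 and slope_through_origin are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

def rotate_90(point):
    """Rotate a point 90 degrees counterclockwise about O, using (x, y) -> (-y, x)."""
    x, y = point
    return (-y, x)

def slope_through_origin(point):
    """Slope y/x of the nonvertical line through O and the given point."""
    x, y = point
    if x == 0:
        raise ValueError("the line through O and this point is vertical")
    return Fraction(y) / Fraction(x)

# A sample point P1 in quadrant I on l1; its image P2 lies on l2, the rotated line.
P1 = (Fraction(3), Fraction(2))
P2 = rotate_90(P1)                 # (-2, 3), in quadrant II
m1 = slope_through_origin(P1)      # 2/3
m2 = slope_through_origin(P2)      # -3/2
print(P2, m1, m2, m1 * m2)

# The same cancellation happens for other choices of P1 in quadrant I.
for P in [(Fraction(1), Fraction(5)), (Fraction(7, 2), Fraction(1, 3))]:
    assert slope_through_origin(P) * slope_through_origin(rotate_90(P)) == -1
```

The product m1 · m2 is −1 for each sample, mirroring the cancellation in the displayed computation.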
Still assuming that the two lines ℓ1 and ℓ2 pass through the origin, how
shall we approach the proof of the converse, namely, that if the product
of the slopes of ℓ1 and ℓ2 is −1, then ℓ1 ⊥ ℓ2? In other words,
if P2 and P1 are two points on ℓ2 and ℓ1, respectively, how do we prove that

$$|\angle P_2OP_1| = 90^\circ?$$
[Figure: P2 on ℓ2 and P1 on ℓ1 above the x-axis, with Q2 on the y-axis and Q1 on the x-axis]
Let Q1 and Q2 be points on the positive x-axis and positive y-axis, respectively.
Since ∠Q2OQ1 is a right angle and |∠Q2OQ1| = |∠P1OQ1| + |∠Q2OP1|, if we can show

$$|\angle P_1OQ_1| = |\angle P_2OQ_2|,$$

then we would have

$$|\angle P_2OP_1| = |\angle P_2OQ_2| + |\angle Q_2OP_1| = |\angle P_1OQ_1| + |\angle Q_2OP_1| = 90^\circ.$$

So how can we show |∠P1OQ1| = |∠P2OQ2|? One way is to identify these angles
as corresponding parts of similar or congruent triangles. For simplicity, we will
only make use of congruent triangles.
We now begin the formal proof of the converse, namely, if two lines ℓ1 and ℓ2
pass through O and the product of their slopes is −1, then ℓ1 ⊥ ℓ2. Since the slopes of
ℓ1 and ℓ2 have opposite signs, the above discussion shows that we may assume ℓ1
lies in quadrants I and III and ℓ2 lies in quadrants II and IV. Let P1 be some point
on ℓ1 lying in quadrant I. Drop a vertical line from P1 so that it meets the x-axis
at Q1.

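[Figure: P1 on ℓ1 in quadrant I with foot Q1 on the x-axis; Q2 on the positive y-axis and P2 on ℓ2 on the horizontal line through Q2]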
Let Q2 be the point on the positive y-axis so that |OQ2| = |OQ1|, and let the
horizontal line through Q2 meet ℓ2 at P2. If we can prove that

$$\triangle P_1OQ_1 \cong \triangle P_2OQ_2,$$

then we would have |∠P2OQ2| = |∠P1OQ1|, so that

$$\begin{aligned}
|\angle P_2OP_1| &= |\angle P_2OQ_2| + |\angle Q_2OP_1| \\
&= |\angle P_1OQ_1| + |\angle Q_2OP_1| \\
&= |\angle Q_2OQ_1| = 90^\circ.
\end{aligned}$$

In other words, ℓ1 ⊥ ℓ2.

It remains to prove △P1OQ1 ≅ △P2OQ2. Since the product of the slopes of
ℓ1 and ℓ2 is −1, the product of the absolute values of the slopes of ℓ1 and ℓ2 is equal
to 1. By reasoning that is familiar to us by now, this means

$$\frac{|P_1Q_1|}{|OQ_1|}\cdot\frac{|OQ_2|}{|P_2Q_2|} = 1.$$

Since |OQ1| = |OQ2|, we have

$$\frac{|P_1Q_1|}{|P_2Q_2|} = 1,$$

and therefore |P1Q1| = |P2Q2|. Recall that |OQ1| = |OQ2| by the definition
of Q2. Since also ∠P1Q1O and ∠P2Q2O are right angles, the SAS criterion for
congruence (see page 270) implies that △P1OQ1 ≅ △P2OQ2, as desired. This
proves that if two lines ℓ1 and ℓ2 pass through O and the product of their slopes
is −1, then ℓ1 ⊥ ℓ2.
We have just proved that Theorem 5.6 is true for nonvertical lines passing
through the origin O.
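The construction in the proof of the converse can likewise be traced on a sample point. In this sketch (exact arithmetic via Python's fractions module; the helper names construct_P2 and sq_dist are hypothetical), ℓ2 is taken to be the line through O with slope −a/b, so that the product of the slopes is −1, and P2 is computed as the intersection of ℓ2 with the horizontal line through Q2. The last check is the Pythagorean relation that Exercise 4 below uses as an alternative route.

```python
from fractions import Fraction

def construct_P2(P1):
    """Carry out the construction in the proof for P1 = (a, b) in quadrant I.

    l1 passes through O and P1, so its slope is b/a; l2 is taken to be the line
    y = (-a/b) x through O, so the product of the two slopes is -1.
    """
    a, b = P1
    m2 = -a / b                  # slope of l2
    Q1 = (a, Fraction(0))        # foot of the vertical line from P1
    Q2 = (Fraction(0), a)        # on the positive y-axis, |OQ2| = |OQ1| = a
    P2 = (a / m2, a)             # horizontal line y = a meets y = m2 x at x = a / m2
    return Q1, Q2, P2

def sq_dist(P, Q):
    """Square of the distance between P and Q."""
    return (P[0] - Q[0]) ** 2 + (P[1] - Q[1]) ** 2

O = (Fraction(0), Fraction(0))
P1 = (Fraction(4), Fraction(3))
Q1, Q2, P2 = construct_P2(P1)
print(P2)

# SAS data for triangles P1OQ1 and P2OQ2: equal legs (the angles at Q1, Q2 are right).
print(sq_dist(P1, Q1) == sq_dist(P2, Q2), sq_dist(O, Q1) == sq_dist(O, Q2))
# Pythagorean relation, the route taken in Exercise 4 below.
print(sq_dist(O, P1) + sq_dist(O, P2) == sq_dist(P1, P2))
```

For P1 = (4, 3) the construction gives P2 = (−3, 4), and both legs match: |P1Q1| = |P2Q2| = 3 and |OQ1| = |OQ2| = 4.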
We now finish the proof of Theorem 5.6 by dealing with the general case
where the two given lines ℓ1 and ℓ2 need not pass through the origin. Let L1 and
L2 be lines passing through the origin O so that ℓ1 ∥ L1 and ℓ2 ∥ L2. We need the
following simple lemma.

Lemma 5.7. Let ℓ1 and ℓ2 be intersecting lines, and let lines L1 and L2 be parallel
to ℓ1 and ℓ2, respectively. Then the perpendicularity of ℓ1 and ℓ2 is equivalent to the
perpendicularity of L1 and L2.
The proof is an immediate consequence of the consideration of corresponding
angles of parallel lines, as shown (the details are left as an exercise):
[Figure: lines L1 ∥ ℓ1 and L2 ∥ ℓ2, with corresponding angles marked where ℓ1 meets ℓ2 and where L1 meets L2]

Now suppose ℓ1 ⊥ ℓ2; then we want to prove that the product of the slopes
of ℓ1 and ℓ2 is −1. By Lemma 5.7, ℓ1 ⊥ ℓ2 implies L1 ⊥ L2. Since L1 and L2
pass through O, the preceding proof of the special case of Theorem 5.6 for lines
passing through the origin shows that the product of the slopes of L1 and L2 is
−1. By Theorem 5.2, the slopes of ℓ1 and L1 are equal, as are the slopes of ℓ2 and
L2. Hence the product of the slopes of ℓ1 and ℓ2 is also −1. Conversely, suppose
the product of the slopes of ℓ1 and ℓ2 is −1; we will prove that ℓ1 ⊥ ℓ2. By
Theorem 5.2, the product of the slopes of L1 and L2 is also −1. Since L1 and L2
pass through O, we already know L1 ⊥ L2. By Lemma 5.7, ℓ1 ⊥ ℓ2. The proof of
Theorem 5.6 is complete.
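For the general case, the only new ingredient is that parallel lines have equal slopes (Theorem 5.2). The sketch below, with hypothetical helper names and sample points chosen only for illustration, translates each line so that one of its points goes to O, producing lines L1 ∥ ℓ1 and L2 ∥ ℓ2 through the origin, and confirms that the slopes, and hence their product, are unchanged.

```python
from fractions import Fraction

def slope(P, Q):
    """Slope (y2 - y1)/(x2 - x1) of the nonvertical line through P and Q."""
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    return Fraction(y2 - y1, x2 - x1)

def translate(P, v):
    """Translation by the vector v = (p, q): (x, y) -> (x + p, y + q)."""
    return (P[0] + v[0], P[1] + v[1])

# Sample lines l1 and l2 that do not pass through O (chosen for illustration).
A1, B1 = (1, 1), (3, 5)     # two points on l1
A2, B2 = (4, 0), (0, 2)     # two points on l2

# Translate each line so that one of its points lands on O: L1 || l1, L2 || l2.
L1 = (translate(A1, (-1, -1)), translate(B1, (-1, -1)))
L2 = (translate(A2, (-4, 0)), translate(B2, (-4, 0)))

m1, m2 = slope(A1, B1), slope(A2, B2)
print(m1, m2, m1 * m2)
print(slope(*L1) == m1, slope(*L2) == m2)
```

Here ℓ1 has slope 2 and ℓ2 has slope −1/2, so the product is −1, and the translated lines L1 and L2 carry exactly the same slopes.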

We give an application of Theorem 5.6. Let L be the diagonal line that is the
graph of y = x and let Λ be the reflection across L. Then we claim that for any
x and y, Λ(x, y) = (y, x). Let P = (x, y) and Q = (y, x); then it suffices to prove
that L is the perpendicular bisector (see page 267) of PQ. For this purpose, use
Theorem 4.2 of [Wu-PreAlg] (see page 270 in this volume); we leave the details to
Exercise 5 on page 115.

[Figure: the point P = (x, y), its mirror image Q = (y, x), and the diagonal line L]
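The following sketch checks the claim Λ(x, y) = (y, x) on a couple of sample points by testing the two conditions for L to be the perpendicular bisector of PQ: the midpoint of PQ lies on L, and the slope of PQ times the slope 1 of L is −1 (Theorem 5.6). It is a numerical spot check only, not the proof requested in Exercise 5; reflect_diagonal and check_perpendicular_bisector are hypothetical names.

```python
from fractions import Fraction

SLOPE_OF_L = 1   # L is the graph of y = x

def reflect_diagonal(P):
    """The claimed formula for the reflection across L: (x, y) -> (y, x)."""
    x, y = P
    return (y, x)

def check_perpendicular_bisector(P):
    """For P not on L, test whether L is the perpendicular bisector of PQ, Q = reflect_diagonal(P)."""
    Q = reflect_diagonal(P)
    M = (Fraction(P[0] + Q[0], 2), Fraction(P[1] + Q[1], 2))   # midpoint of PQ
    midpoint_on_L = M[0] == M[1]
    slope_PQ = Fraction(Q[1] - P[1]) / Fraction(Q[0] - P[0])
    return midpoint_on_L and slope_PQ * SLOPE_OF_L == -1

samples = [(Fraction(5), Fraction(2)), (Fraction(-1), Fraction(3, 4))]
print(all(check_perpendicular_bisector(P) for P in samples))
```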


Exercises 5.6

(1) Write out a detailed proof of Lemma 5.7 on page 113.

(2) What is the equation of the line that passes through (1, 2) and is perpen-
dicular to the graph of 3x − y = 2?
(3) Explain in two different ways why the linear system

ax + by = e
−bx + ay = f

(where a and b are not both 0) always has a unique solution regardless of what
the numbers e and f may be.
(4) The following is the outline of a new proof that if the product of the
slopes of two nonvertical lines ℓ1 and ℓ2 is −1, then they are perpendicular.
Fill in the details.
(A) First prove it for the special case that the two lines ℓ1 and ℓ2
pass through the origin O. Using the notational setup as above, we may
assume that ℓ1 lies in quadrants I and III and ℓ2 lies in quadrants II
and IV. Choose an arbitrary point P1 = (x1, y1) on ℓ1 so that P1 lies
in quadrant I. Let the vertical line passing through P1 intersect the
x-axis at a point Q1. Choose a point Q2 on the positive y-axis so that
|OQ2| = |OQ1|. Let the horizontal line through Q2 meet the line ℓ2 at
P2 = (x2, y2), as shown below.

[Figure: P2 = (x2, y2) on ℓ2 and Q2 on the same horizontal line, P1 = (x1, y1) on ℓ1, Q1 on the x-axis, and the origin O]
Step 1. Use the fact that the product of the slopes of ℓ1 and ℓ2 is
equal to −1 to prove that (x2, y2) = (−y1, x1).
Step 2. Show that |OP1|² + |OP2|² = |P1P2|².
Step 3. Conclude that △P1OP2 is a right triangle and therefore ℓ1 ⊥ ℓ2.
(B) Prove in general that if the product of the slopes of two nonvertical
lines ℓ1 and ℓ2 is −1, then they are perpendicular.
(5) Prove that the reflection Λ across the diagonal line y = x maps a point
(a, b) to (b, a).


Functions and Their Graphs

A major concern of algebra, and in fact of all mathematics, is with functions. In
this chapter, we give the definitions of a function and its graph, and single out
the graphs of so-called real-valued functions of one variable for emphasis. Section
6.2 on pages 122 ff. explains why numbers are inadequate for the description of
the phenomena around us and why functions must be used. Although numbers
are sufficient for doing arithmetic, functions are now needed to describe any phe-
nomena, be they social or natural, if we want anything more than “snapshots”
of an evolving situation. Functions are the alphabet of the language of higher
mathematics. At a time when the concept of a function is too often honored in
a pro forma manner in TSM,1 where the emphasis is mistakenly focused almost
exclusively on equations, perhaps Section 6.2 can serve the purpose of beginning
to reverse this unfortunate trend. The remainder of this volume will be devoted
to the study of functions.

6.1. The basic definitions

A function from a set A to a set B is a rule (i.e., a precise prescription) that assigns
(or associates) to each element of A an element of B. To be precise, we should
emphasize that a function, by definition, assigns to each element of A only one
element of B. If the function is denoted by f , then f : A → B is the correct
notation to capsulize this information. However, when A and B are understood,
the function f is often denoted generically by f ( x); this is not a good notation,
but it is one that appears in most textbooks, so you may as well get used to it. If
f assigns the element b of B to an element a of the set A, we write
f ( a) = b
and say b is the value of f at a. If A and B are subsets of the real numbers,
such an f is called a real function of (or in) one variable, or more correctly, a
real-valued function of (or in) one variable.
The set A in the definition of a function f : A → B is called the domain, or
domain of definition of f ; there is no universally accepted terminology for the
set B. Some call it the target, others the range, and yet others the co-domain. You
will have to be alert to this ambiguity in the literature. In the remainder of these
two volumes, we will try to minimize any references to B precisely because of this
ambiguity.

1 See page xi for its definition.


Because we normally put so much emphasis on precise language, you may
have been startled to find that the definition of a function above speaks of “a rule
that assigns an element to another element” when the meaning of a “rule” is less
than transparent. Let it be noted that this slight ambiguity is intentional rather
than an intrinsic flaw. There is a way to arrive at a completely precise definition
of a function f : A → B: replace the “rule that assigns b of B to a in A” (i.e.,
what was written above as f ( a) = b) by an explicit pairing of a and b together. In
symbols, we simply define a function f : A → B as a collection of ordered pairs
( a, b), where the first element belongs to A and the second element belongs to B.
Therefore a function is, in this setting, just a collection of such ordered pairs {( a, b)}.
To further ensure that f does assign to each a0 in A an element b0 of B (i.e., to
ensure that the domain of f is A and not a smaller set), we add the requirement:
(i) To each a0 in A there is an ordered pair (a0, b0) in f. To make explicit the
fact that f assigns only one element of B to a, we further specify: (ii) If both
ordered pairs (a, b) and (a, b′) are in f (where b and b′ are elements of B), then
necessarily b = b′. In summary, then, the precise definition of a function f : A → B
is a (presumably very large) collection of ordered pairs {(a, b)}, with a in A and b
in B, so that the preceding two requirements (i) and (ii) are satisfied. The
aforementioned “rule” then corresponds to the explicit enumeration of all the ordered
pairs in this collection. Furthermore, given an ordered pair ( a, b) in f : A → B,
we usually rewrite it in the more suggestive notation of f ( a) = b.
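The two requirements (i) and (ii) are easy to state as a test on a finite collection of ordered pairs. The Python sketch below (the function name is_function and the small sets A and B are made up for illustration) checks that every pair lies in A × B, that every element of A occurs as a first coordinate, and that no element of A is paired with two different elements of B.

```python
def is_function(pairs, A, B):
    """Return True if the finite set of ordered pairs `pairs` is a function from A to B."""
    # Every pair (a, b) must have a in A and b in B.
    if any(a not in A or b not in B for a, b in pairs):
        return False
    # Requirement (i): each element of A is the first coordinate of some pair.
    if {a for a, _ in pairs} != set(A):
        return False
    # Requirement (ii): if (a, b) and (a, b') are both present, then b = b'.
    assigned = {}
    for a, b in pairs:
        if a in assigned and assigned[a] != b:
            return False
        assigned[a] = b
    return True

A = {1, 2, 3}
B = {"x", "y"}
f = {(1, "x"), (2, "y"), (3, "x")}
not_total = {(1, "x"), (2, "y")}                        # fails (i): nothing paired with 3
not_single = {(1, "x"), (1, "y"), (2, "x"), (3, "y")}   # fails (ii): 1 is paired with x and y
print(is_function(f, A, B), is_function(not_total, A, B), is_function(not_single, A, B))
```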
Complete precision in the definition of a function can thus be achieved, but only
at a considerable cost. A common belief, in the context of school mathematics, is that
the cost is too high, because there is no evidence that most students in K–12 are
ready for this level of precision and abstraction. We are therefore willing to trade
complete precision for a little common sense, and speak of a “rule” and an
“assignment”. To compensate for the lack of precision, we will give an extended
discussion of the raison d’être of functions in the following section and give as
many concrete examples as possible. The hope is that such an informal approach
to functions will be an acceptable compromise between clarity and accessibility.
In a course on introductory algebra, the functions one encounters are usually
real-valued functions of one variable. Often, such functions can be described
symbolically by formulas. Thus, letting R denote the real numbers, the function
F : R → R that assigns to each number its square can be succinctly given as F(x) = x²
for each number x. So F(5) = 25, F(11) = 121, F(73) = 5329, etc. We note
for emphasis that F(x) is always ≥ 0 for any x (the square of a number is never
negative because negative × negative is positive), so that if we write instead
F : R → {all real numbers ≥ 0},
then we would also be correct. Another example of a function is
G : {all nonzero numbers} → R

so that G assigns the multiplicative inverse x⁻¹ (see page 267 for the definition)
to each nonzero number x. In symbols: G(x) = x⁻¹ for each x ≠ 0. Thus,

$$G(5) = \frac{1}{5},\quad G\!\left(\frac{17}{9}\right) = \frac{9}{17},\quad G(-0.28) = \frac{1}{-0.28},\quad G\!\left(-\frac{2}{15}\right) = -\frac{15}{2},\ \text{etc.}$$

Observe once again that, because G(x) ≠ 0 for every x in the domain of G (all the
nonzero numbers), we can also write the same function G as
G : {all nonzero numbers} → {all nonzero numbers}.
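Both functions can be written down directly as code. This sketch uses Python's exact Fraction type so that values such as G(17/9) = 9/17 come out exactly rather than as rounded decimals; passing −0.28 as the string "-0.28" keeps it an exact fraction (−7/25). Raising an error at 0 is one way, assumed here, to encode the fact that 0 is not in the domain of G.

```python
from fractions import Fraction

def F(x):
    """F : R -> R, F(x) = x^2."""
    return x * x

def G(x):
    """G : {all nonzero numbers} -> R, G(x) = x^(-1); 0 is outside the domain."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("G is not defined at 0")
    return 1 / x

print(F(5), F(11), F(73))                                        # 25 121 5329
print(G(5), G(Fraction(17, 9)), G("-0.28"), G(Fraction(-2, 15)))  # 1/5 9/17 -25/7 -15/2
```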
There is another kind of function that is almost as important as real-valued
functions of one variable: those functions defined on the whole numbers N.
Consider, for example, buying many copies of the same book (think of your-
self as the owner of a bookshop). If one copy costs $17.85, then two copies cost
17.85 + 17.85 = 35.70 dollars, and for any whole number n, n copies will cost:
17.85 + 17.85 + · · · + 17.85 = n × 17.85 dollars,

where we have made use of the fact that for any fraction A, nA = A + A + · · · + A
(n times) (see the product formula on page 268). In accordance with the notational
convention on page 19, we will henceforth write the cost of n copies as 17.85n
dollars. We may express this information more compactly by introducing a function
h : N → R so that h(n) = the cost of n copies of the book. We have just shown
that h(n) = 17.85n for all whole numbers n.
There is a real-valued function H : R → R that is closely related to the preceding
function h : N → R, namely, H(x) = 17.85x for all real numbers x. In TSM, the
function h is often conflated with the real-valued function H from the beginning.
We shall see that on some occasions, we do want to explicitly replace the function h
by the function H (see, for example, the discussion on pages 157 ff.), but if we do
that, we will be doing it for a reason. In general, especially at the beginning of the
discussion of functions, we should not conflate these two functions because they are
different functions with different domains of definition. We will return to this point
on page 128 below.

The function h(n) = 17.85n for all n ∈ N should not be conflated with the function
H(x) = 17.85x for all x ∈ R.
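The distinction between h and H can be made concrete in code by giving the two functions different domains. In this sketch (an illustration, not a prescribed implementation), h refuses anything that is not a whole number, while H accepts any decimal; Python's Decimal type is used so that 17.85 is represented exactly.

```python
from decimal import Decimal

PRICE = Decimal("17.85")   # cost of one copy, in dollars

def h(n):
    """h : N -> R, the cost of n copies; defined only for whole numbers n."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("h is defined only on the whole numbers")
    return n * PRICE

def H(x):
    """H : R -> R, H(x) = 17.85x; x is given as a string or an integer."""
    return PRICE * Decimal(x)

print(h(2), h(10))   # 35.70 178.50
print(H("2.5"))      # 44.625, meaningless as the cost of a number of copies
```

Calling h(2.5) raises an error, which is the code's way of saying that 2.5 is not in the domain of h, whereas H(2.5) is perfectly well defined.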
We do not want to create the impression that every function can be expressed
by a formula. For example, if S is the function
S : {a deck of cards} → {club, diamond, heart, spade}
that assigns to each card its suit (where “{club, diamond, heart, spade}” stands
for the set consisting of the four possible suits of a card), then what S does to each
card would be difficult to describe in symbols. One can only illustrate by giving
some examples, such as
S(King of diamonds) = diamond,
S(Two of spades) = spade,
S(Queen of hearts) = heart, etc.
Yet another example of a function is a person’s age or, more precisely, the
assignment to each person his or her age. What we have is a function K :
{people} → {whole numbers} so that if p is a person, K ( p) = the age of p.
Once you get this idea, you begin to see many examples of functions in real
life. For example, writing down a person’s height is in effect a function g :
{people} → R, and writing down a person’s Social Security number is a func-
tion S : {American working adults} → N. And so on.
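A function without a formula can still be specified completely by listing its values, which is exactly the “collection of ordered pairs” point of view. The sketch below builds the suit function S on a standard 52-card deck as a Python dictionary; the card names are an assumed spelling, chosen to match the examples above.

```python
SUITS = ["club", "diamond", "heart", "spade"]
RANKS = ["Ace", "Two", "Three", "Four", "Five", "Six", "Seven",
         "Eight", "Nine", "Ten", "Jack", "Queen", "King"]

# S is stored as an explicit table of (card, suit) pairs: one entry per card.
S = {f"{rank} of {suit}s": suit for suit in SUITS for rank in RANKS}

print(len(S), S["King of diamonds"], S["Two of spades"], S["Queen of hearts"])
```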

An effective way to get to know the concept of a function is to look at many
examples of real functions of one variable and to examine their graphs. We now
define this concept.2 Let f be a function from a set of numbers A to a set of
numbers B, f : A → B. Then the graph of f is the set of all the points (x, f(x))
in R², where x is a number in A. In general, of course, the set A is infinite, so
that the graph of f is an infinite set as well. Although it is impossible to literally
get hold of the whole graph of any function, it is usually the case that plotting a
finite number of well-chosen points in the graph is enough to reveal the essential
features of the graph, and therefore of the function itself. In a later section, we
will give some standard examples of graphs of functions. It should be noted that
plenty of practice in plotting points on a graph is an essential component in the
learning of functions and graphs. So please remember: don’t let your students use
a graphing calculator until they have achieved fluency in plotting points.
The graph of a real function of one variable is thus a subset of the plane and,
as such, it can be represented pictorially most of the time. We shall implement
this idea in Section 6.3 on pages 126 ff. Here, we want to make two simple
observations about the graphs of real-valued functions of one variable. First, the
graph G of such a function is in fact what we called on page 118 the abstract
definition of the function, i.e., a collection G of ordered pairs of numbers with a
special property about the second coordinate of its points, namely, if two points
(x0, y0) and (x0, y1) (which have the same first coordinate) are in G, then necessarily
y0 = y1 (see requirement (ii) in the precise definition of a function on page
118). Second, this property of the graph G of such a function has a geometric
interpretation: the intersection of the graph G with a vertical line x = x0 (which
is the set of all the points of the form ( x0 , y), where y is any number) is either
empty (i.e., nothing) or just one point. This is called the vertical line rule: a
vertical line cannot intersect the graph of a function at more than one point.
Naturally, a general subset S of the plane does not necessarily satisfy this
special property, i.e., it can happen that (x0, y0) and (x0, y1) are both in S but
y0 ≠ y1. This would be the case, for example, if S turns out to be a vertical line
or something like the following:

[Figure: a subset S of the plane containing two points (x0, y0) and (x0, y1) on the vertical line x = x0]

In general, a subset of the plane is called a relation. In advanced mathematics,
some relations are of fundamental importance, e.g., the graphs of polynomial
equations in two variables such as y² = x² − 1 or y² = x³ + 2x + 1. These are not
graphs of functions, but they are not pursued as relations per se either. They are
studied seriously because of the geometry or the number theory that is associated with
them. However, it is common in TSM to devote many pages to various aspects
of relations, including definitions of the domain and range of a relation. TSM
also spawns standardized-test items that demand that your students know what
the “range of a given relation” is. As far as mathematics is concerned, such
information is of negligible import in K–12 and therefore does not deserve your
rapt attention. If we can get students in K–12 to use functions fluently, we will
already be ahead of the game.

2 Although we are defining the graph of a function only for real functions of one variable here,
the same definition is valid for any function. But of course when the function is not a real function of
one variable, the “picture” of its graph becomes much more elusive.
At this point, we want to point out that the graphs of real-valued functions
defined on the whole numbers N, such as the cost function of a book, h : N →
R on page 119, are also subsets of the plane and therefore also have pictorial
representations. See the discussion on page 128.
We can now revisit the concept of the graph of an equation in two variables
that was introduced on page 60. Suppose we are given a function H from the
plane to the set of all numbers, i.e., H : R² → R, so that, for any point (x, y)
in R², H assigns to it a number H(x, y). Such a function H is called a function
of two variables, or more precisely, a real-valued function of two variables. An
equation in x and y such as H ( x, y) = 5 is thus a question asking whether there
are points ( A, B) in the plane that satisfy H ( A, B) = 5. In general, an equation
of the form H ( x, y) = c for some fixed number c is called an equation in two
variables.3 The graph of the equation in two variables, H ( x, y) = c, is the
collection of all the points ( A, B) in the plane so that H ( A, B) = c.
In Chapters 4 and 5, we already came across many functions of two variables
though without the name, e.g., g( x, y) = 3x − y for all numbers x and y. We
recognize that, for example, the equation in two variables, g( x, y) = 1 (i.e., 3x −
y = 1) is exactly what we called a linear equation in two variables (page 59). We
also recognize that, in this case, the graph of 3x − y = 1 is a line (Theorem 4.2 on
page 60).
We have now defined the concept of the graph of a real function of one vari-
able and, for any function of two variables H, we have also defined the graph
of an equation H ( x, y) = c for a fixed constant c. Both are subsets of the plane.
Where these concepts of graphs come together is in the following situation. Let f
be a real function of one variable: f : R → R. Define a function of two variables
F : R² → R by F(x, y) = y − f(x) for all x and y. Then we claim:
(6.1) {the graph of f } = {the graph of the equation F( x, y) = 0}.
We can make this relationship more explicit by rewriting it as follows: let f be a
real-valued function of one variable; then
(6.2) {graph of the function f } = {graph of the equation y − f ( x ) = 0}.
In order to prove (6.2), we have to prove that the two sets are equal. This means
we have to prove that a point in the graph of the function f is also a point in the
graph of the equation y − f(x) = 0, and vice versa (see page 267). So suppose (x0, y0)
is a point on the graph of the function f; by the definition of the graph of f, this
means (x0, y0) = (x0, f(x0)), so that y0 = f(x0). Thus (x0, y0) is on the graph of
the equation y − f(x) = 0. The proof of the converse is entirely similar; (6.2) is
proved, and therewith, also (6.1).

3 For example, when H(x, y) is an expression in the numbers x and y. But of course, a function
of two variables x and y need not be representable by an expression in x and y.
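A finite spot check can make (6.2) concrete. In the sketch below, the function f(x) = 3x² − 1 is an arbitrary sample chosen for illustration (it is not from the text); the code samples points (x, f(x)) of the graph of f and confirms that each satisfies F(x, y) = y − f(x) = 0, and that a point satisfying the equation has second coordinate f(x0). Sampling finitely many points is, of course, no substitute for the proof.

```python
from fractions import Fraction

def f(x):
    """A sample real function of one variable (chosen for illustration only)."""
    return 3 * x * x - 1

def F(x, y):
    """The associated function of two variables: F(x, y) = y - f(x)."""
    return y - f(x)

# Sample points of the graph of f, using exact fractions x = -3, -5/2, ..., 3.
graph_sample = [(x, f(x)) for x in (Fraction(k, 2) for k in range(-6, 7))]
print(all(F(x, y) == 0 for x, y in graph_sample))

# Conversely, a point satisfying F(x0, y0) = 0 has y0 = f(x0), so it lies on the graph of f.
x0, y0 = Fraction(1, 3), Fraction(-2, 3)
print(F(x0, y0) == 0, y0 == f(x0))
```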
One consequence of equation (6.1) is that the graph of a real-valued function of
one variable is always the graph of an equation in two variables. However, the converse
is false, i.e., there are equations in two variables whose graphs are not the graphs
of any real-valued functions of one variable. One such equation is G ( x, y) = 0,
where G ( x, y) = x − 1. Indeed, the equation G ( x, y) = 0 is then the same as
the equation x = 1, whose graph is the vertical line in the plane passing through
(1, 0). Since a vertical line is not the graph of any function of one variable (because
of the vertical line rule), we conclude that the graph of the equation G ( x, y) = 0
for G ( x, y) = x − 1 is not the graph of any function of one variable.

Exercises 6.1
(1) (a) Describe the graph of f : R → R, where f ( x ) = 2x − 8. (b) Express
the graph of g : R → R, so that g( x ) = 5 − 3x, as the graph of an
equation of two variables.
(2) Let a real-valued function of two variables H be defined by H ( x, y) =
x − y. What is the graph of the equation H = 0? What is the graph of
the equation H = 1? What is the graph of the equation H = −25? How
are these graphs related?
(3) Can the circle of radius 1 around the origin (0, 0) be the graph of a
function in one variable? Is it the graph of an equation in two variables?
(4) Describe the graph of the following function of one variable, S : R → R,
defined as follows: for any number x, if n is the integer so that n ≤ x <
(n + 1), then S( x ) = n. (Incidentally, this is an example of a real-valued
function of one variable that does not have a formula in terms of the stan-
dard notation for addition, subtraction, multiplication, division, or rais-
ing to a power. However, it shows up often enough that a special notation
has been devised for it and related functions; see [Wiki-floorfunction].)

6.2. Why functions?

In the usual school mathematics curriculum, students coming to algebra have
only been exposed to arithmetic, and therefore may not realize that arithmetic
is only capable of capturing a frozen moment in a changing world. Your job as a
teacher is therefore to make them aware of this fact as a prelude to convincing
them that numbers alone are not enough to deal with phenomena related to change in
the real world, and therefore functions are necessary. To explain what this means,
consider a typical word problem in arithmetic:
Jason runs 2 miles in 8½ minutes. At this speed, how long would
it take him to run 13 miles?
Common sense will tell you that this problem is nothing but a fairy tale. One
may be able to run 2 miles in 8½ minutes, but it is highly unlikely that one can
keep up such a pace over 13 miles. (As it turns out, at this pace, Jason can run 13
miles in (8½ ÷ 2) × 13 = 55¼ minutes, and this is faster than the current world
record for a half-marathon, which is approximately 13.1 miles.) However, for the
sake of creating a word problem in arithmetic, such a fairy tale is necessary because,
otherwise, what other kinds of “rate problems” can we make up?
Such an arithmetic problem in effect makes us stare at a snapshot of Jason’s
running in a 2-mile race and extrapolate it to a distance of 13 miles, whereas we
should be looking at a video to see how Jason’s running evolves over the whole time
that he runs the 13 miles. How can numbers alone adequately describe this evolution
over a span of more than an hour? In this sense, a function is the mathematical
analog of a video that frees us from staring only at one frozen moment and allows
for an analysis from moment to moment. A function offers the potential to capture
change over time.

Numbers are to functions as a snapshot of an event is to a video of the same event.
The simple answer to the question of “why functions?” is that, because nature
and human activities are never static and always changing, we need functions
to describe these activities truthfully. The following three examples may give a
better idea of what we have in mind.
For our first example, suppose you have just brewed a cup of coffee and are
waiting for it to cool down. You may formulate an arithmetic problem of the
following type:
Let us say the coffee is 195° (Fahrenheit) at the beginning, but after 4
minutes it is 143°. What is its average rate of cooling in the first four
minutes?4
The answer is, of course,

$$\frac{\text{total change in temp. from 0 to 4 minutes}}{4 \text{ minutes}} = \frac{195 - 143}{4} = 13 \text{ deg/min}.$$
More important than the answer is the framework that undergirds the formulation
of this problem: Take two snapshots of the evolving state of your cup of coffee,
once at the beginning and a second time at the 4 minute mark. It is only by freezing
those two moments in time that we can come up with the above arithmetic problem. But
what if we ask a more realistic question, one that is perhaps of pressing concern:
how long do you have to wait before the cup of coffee becomes drinkable? It is
obvious that such arithmetic techniques bear little relevance to this question.
What we need, for starters, is a way to describe the temperature of the coffee
at various times. A primitive response to this need may be to create a table:
time after brewing (min)    temperature (°F)
0                           195
1                           180
2                           165
3                           153
4                           143
5                           135
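The arithmetic above generalizes to any two snapshots in the table. The short sketch below (the dictionary table and the function name are illustrative) computes the average rate of cooling over [0, 4] and over each one-minute interval; the per-minute rates are not constant, which is precisely the information a single average cannot convey.

```python
from fractions import Fraction

# Temperature of the coffee (degrees F) at whole minutes, from the table above.
table = {0: 195, 1: 180, 2: 165, 3: 153, 4: 143, 5: 135}

def average_rate_of_cooling(t_start, t_end):
    """Average rate of cooling over [t_start, t_end], in degrees per minute."""
    return Fraction(table[t_start] - table[t_end], t_end - t_start)

print(average_rate_of_cooling(0, 4))                          # 13
print(*(average_rate_of_cooling(t, t + 1) for t in range(5)))  # 15 15 12 10 8
```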

4 See the definition of average speed on page 266. The concept of average rate is similar, and is
discussed in Section 1.8 of [Wu-PreAlg].


However, if you want to know the temperatures at the half-minute marks, then
you’d need a bigger table:

time after brewing (min)    temperature (°F)
0                           195
0.5                         187
1                           180
1.5                         172
2                           165
2.5                         159
3                           153
3.5                         148
4                           143
4.5                         139
5                           135
5.5                         132

Now if you also want the temperatures at the quarter-minute marks, you’d
need a table that is even bigger. Clearly there is no end in sight to the size of the
table you need if you want a complete profile of the whole evolving situation, and
you soon realize that what you really need is not a table of enormous size but a
function f(t), where

f : {all numbers ≥ 0} → R,
f (t) = the temperature of the coffee t minutes after it is brewed.

(Observe how this function f literally “assigns” the temperature f(t) to each moment
that is t minutes after the coffee is brewed.) Once we have the right concept in the
form of a function (and not a table), the next step is to determine this function in
the form of a reasonable formula. The rest of the story is related to Newton’s law
of cooling; the long and short of it is that there is such a formula using concepts in …:

$$f(t) = 70 + 125\left(\frac{110}{125}\right)^{t},$$

which assumes an ambient temperature of 70 degrees Fahrenheit. (For a bit of
explanation of the second term on the right-hand side, see Theorem 9.1 on page
198 and the subsequent discussion.) What is important for us is the realization
that without the concept of a function to describe the change in temperature, such
scientific progress would not be possible.
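The formula can be evaluated and compared with the tables above, and, unlike the tables, it can also be solved for the time at which the coffee reaches a given temperature. In this sketch the threshold of 140°F for “drinkable” is a hypothetical value, not something stated in the text, and the logarithm used to solve f(t) = 140 is a tool from later in the subject; the printed values of f(t) are close to, but not identical with, the tabulated ones.

```python
import math

AMBIENT = 70           # ambient temperature assumed in the text (degrees F)
INITIAL_EXCESS = 125   # 195 - 70: how far above ambient the coffee starts
RATIO = 110 / 125      # factor by which the excess shrinks each minute

def f(t):
    """Temperature (degrees F) of the coffee t minutes after brewing."""
    return AMBIENT + INITIAL_EXCESS * RATIO ** t

def minutes_until(temp):
    """Solve f(t) = temp for t; valid for 70 < temp <= 195."""
    return math.log((temp - AMBIENT) / INITIAL_EXCESS) / math.log(RATIO)

for t in range(6):
    print(t, round(f(t), 1))

DRINKABLE = 140        # hypothetical "drinkable" temperature, not from the text
print(round(minutes_until(DRINKABLE), 2))
```

With this assumed threshold, the computed wait is roughly four and a half minutes.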
Consider a second example: a man drives to the airport which is 25 miles
away. He plans to leave his house two hours before departure time. If we want
to see how far he is from the airport, clearly one number won’t get the job done
because this distance depends on the time when the distance is measured. Our
experience with the coffee problem suggests that we make use of a function F for
this description, such that

F(t) = his distance (in miles) from the airport t minutes after
he leaves his house.

Thus F(0) = 25. In general, F assigns to each number t ≥ 0 another number,
which is his distance in miles from the airport at time t. Even a skeletal description
of this function in terms of a few values of t can tell a story, as for instance:

t (min)   F(t) (miles)      t (min)   F(t) (miles)
0         25                30        13
5         24                35        19
10        22.5              40        24
15        21                43        25
20        16                44        25
25        10                45        24.5
26        9                 55        13
27        10                60        7.5
28        11                67        0
We can see that he has to start his trip slowly, probably because of city traffic,
so that after 10 minutes he has only traveled two and a half miles. Around the
26th minute after he leaves home, he turns around, as the values of F(27) and
F(28) and those of subsequent minutes show that he is driving away from the
airport. He has forgotten to bring his photo ID (a guess!). He manages to get home
43 minutes after his departure, and it takes him only about a minute to get the
necessary document. Then he speeds a bit, as he makes it to the airport in 23
minutes (67 − 44 = 23), not trivial considering the traffic conditions these days.
He has a few minutes to spare.
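The story read off from the table can also be extracted mechanically. The sketch below stores the tabulated values of F in a dictionary and locates the turnaround time (the first sampled time after which the distance increases), the return home, and the arrival at the airport, then computes the average speed on the final leg. It relies only on the sampled values, so the times it finds are accurate only to the resolution of the table.

```python
# Distance F(t) from the airport, in miles, t minutes after leaving home (from the table).
F = {0: 25, 5: 24, 10: 22.5, 15: 21, 20: 16, 25: 10, 26: 9, 27: 10, 28: 11,
     30: 13, 35: 19, 40: 24, 43: 25, 44: 25, 45: 24.5, 55: 13, 60: 7.5, 67: 0}
HOME = 25  # F(t) = 25 means he is at home

times = sorted(F)
# First sampled time after which the distance increases: he has turned around.
turnaround = next(t for t, t_next in zip(times, times[1:]) if F[t_next] > F[t])
# First sampled time after the turnaround at which he is back home.
back_home = next(t for t in times if t > turnaround and F[t] == HOME)
# Last sampled time at home, i.e., when he sets out again.
last_departure = max(t for t in times if F[t] == HOME)
# First sampled time at which the distance is 0.
arrival = next(t for t in times if F[t] == 0)

miles_per_minute = HOME / (arrival - last_departure)
print(turnaround, back_home, last_departure, arrival)
print(round(60 * miles_per_minute, 1), "miles per hour on the final leg")
```

For the table above it reports the turnaround at minute 26, the return home at minute 43, and an average speed of about 65.2 miles per hour from minute 44 to minute 67.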
As a final example, consider the problem of the temperature of the city of
Berkeley on a certain day. To say that Berkeley is 67° (Fahrenheit) makes no sense,
strictly speaking. Is the temperature taken in the early dawn or in the afternoon?
In Berkeley, this could mean a 25° difference. And where is the temperature
measured: at the top of the hill (about 1000 feet high), downtown, or by the
Bay? The difference here could be another 15°. If we start measuring the time
t in hours from midnight, then 0 ≤ t ≤ 24. To specify the geographic location,
we need two more numbers, which may be thought of as the idealized x- and
y-coordinates. Berkeley being a small city, 5 miles from the city center in any
direction would include everything. Therefore, a scientifically usable description
of the temperature of Berkeley would make use of a function F, so that, if S is
the region in 3-space consisting of all ordered triples of numbers (x, y, t) so that x
and y satisfy |x|, |y| ≤ 5 (miles) and 0 ≤ t ≤ 24 (hours), then5
F : S → R,
F(x, y, t) = the temperature of Berkeley, t hours past midnight,
at a spot specified by the x- and y-coordinates.
Incidentally, such a function F is said to be a function of three variables, because
three numbers x, y, and t are involved in its definition. In general, an accurate
description of the temperature in a given geographic area would require four
numbers, (x, y, z, t), where z specifies the height above the point with
coordinates (x, y) at which the temperature is measured (think about the approach of a
storm and you would appreciate the importance of taking height into account).
Therefore, any serious study of temperature will be a study of this function of
four variables.
5 Notice also the use of absolute value to describe the physical extent of the city. It means of
course that −5 ≤ x ≤ 5 and −5 ≤ y ≤ 5. See Section 2.6 in [Wu-PreAlg] or pages 161 ff. in this
volume.

It may not have escaped your attention that we have been talking about func-
tions of three variables and four variables without holding forth on the philo-
sophical implications of what a “variable” is. This is as it should be. (Compare
the discussion of the term “variable” in Section 1.1 and Section 3.1.)
After all that, you are still entitled to ask: what is the point of describing
a mundane concept such as “the temperature of Berkeley” with such exquisite
precision using a function of three variables? Is it just to show off? No, the reason
is that our need for accurate weather forecasting—long-term and short-term—will
never be met until we can make a science out of the study of the climate. There
can be no science if the temperature of a wide area remains a single number rather
than a function of four variables; the same holds for other quantities such as air
pressure, wind speed, etc. If functions are taken out of climate science, we wouldn’t
be able to predict rain or shine in the next 24 hours, much less the onset of global warming.
Quite apart from the description of change, functions have already forced
their way into our work whether we know it or not. Transformations of the plane,
including translations, reflections, and rotations, are functions which assign to
each point of the plane another point of the plane, i.e., these are examples of
functions T : R² → R².
For example, the translation T which moves every point of the plane 2 units to
the left horizontally is precisely given by (see Lemma 5.3 on page 95):
T ( x, y) = ( x − 2, y).
These are among the simplest examples of how functions naturally arise. Of
course, functions are everywhere as soon as you look around. You will be seeing
many more from now on. The purpose of this discussion is to make it plain that
the concept of a function is not something artificially concocted for the purpose
of giving students a hard time. Rather, it is a tool, created out of necessity, to
succinctly describe the phenomena around us, be they natural or social. Functions
are indispensable.
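For readers who like to experiment, here is a small Python sketch (our own illustration, not from the text) of the translation T(x, y) = (x − 2, y) written as a function that takes a point to a point. The helper `dist` and the two sample points are assumptions chosen for illustration; the check only confirms, for one pair of points, that T leaves the distance between them unchanged.

```python
import math

def T(x, y):
    """The translation 2 units to the left: T(x, y) = (x - 2, y)."""
    return (x - 2, y)

def dist(p, q):
    """Distance between points p and q in the coordinate plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

P, Q = (1.0, 4.0), (5.0, 1.0)          # two sample points (our choice)
print(T(*P), T(*Q))                    # (-1.0, 4.0) (3.0, 1.0)
print(dist(P, Q), dist(T(*P), T(*Q)))  # 5.0 5.0
```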

Exercises 6.2

(1) Consider the concept of “the population of a city”. Does it make sense?
If not a number, what would you use to more accurately describe the
number of people living in the city?
(2) Consider the medical question of how much a person weighs. Does it
make sense? What would you use to describe more accurately a person’s weight?

6.3. Some examples of graphs

There is little or no mystery to the concept of the graph of a function, at least
not when the function is a real-valued function of one variable. Just follow the
precise definition and plot as many points as possible in the coordinate plane to
get a feeling of what the graph looks like. For the school classroom, we repeat:
The importance of actually plotting points on the graph by hand cannot
be over-emphasized, and this is especially true in an age of affordable
graphing calculators. Be sure to insist on it in your classroom.
There is probably no better illustration of the need for students to learn the
definition of the graph of a function, and to form the habit—at the beginning, at
least—of plotting the graph by hand, point-by-point, than the 2015 blog of Dan
Meyer ([Meyer]). Here is a quote from the blog:
I left high school adept at graphing functions. I could complete
the square and change forms easily. I knew how to identify the
asymptotes, holes, and limiting behavior of those thorny ratio-
nal expressions. But it wasn’t until I had graduated university
math and was several years into teaching that I really, really un-
derstood that the graph is a picture of all the points that make
the function true. This was difficult for me because graphs don’t
often look like a bunch of points. They look like a line.6
This quote, together with readers’ responses in [Meyer], shows clearly the
devastating effect of TSM on mathematics learning: students have no idea that the
graph of a function f is the totality of all the points {(x, f(x))}, because TSM does
not concern itself with giving the precise definition of the graph of a function. In
this light, one can better appreciate the real purpose of graphing functions by hand,
point-by-point: it is to leave no doubt in students’ minds that the graph of a
function is “a picture of all the points that make the function true.”
(Margin note: Beginners benefit from graphing a function manually, point by point,
for many points. A graphing calculator can be used later.)
Let us start with a linear function of one variable, i.e., a function f defined
on R so that for some constants a and b, f has the expression f ( x ) = ax + b for
all numbers x. To get an idea of what a linear function is like, one can graph a
simple function such as g( x ) = 2x − 3 by plotting a few points of its graph and
observing that they all line up in a straight line, e.g.,
(0, −3), (1, −1), (1.5, 0), (2, 1), (4, 5), (4.5, 6), (5.5, 8), (6, 9).
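A quick way to check (though not to prove) that these eight points line up is to compute the slope between consecutive points. The Python sketch below is our own illustration; it uses exact fractions so that no rounding creeps in, and equal slopes confirm only that these particular points are collinear.

```python
from fractions import Fraction as Fr

def g(x):
    """The linear function g(x) = 2x - 3."""
    return 2 * x - 3

# The x values of the eight points listed above, as exact fractions.
xs = [Fr(0), Fr(1), Fr(3, 2), Fr(2), Fr(4), Fr(9, 2), Fr(11, 2), Fr(6)]
points = [(x, g(x)) for x in xs]

# Slope between each pair of consecutive points; equal slopes mean the points line up.
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
print(set(slopes))  # {Fraction(2, 1)}
```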
But is the graph of g really a line? More generally, is the graph of f ( x ) = ax + b a
line? We now show that such is the case, because we already observed in (6.2) on
page 121 that the graph of the function f ( x ) = ax + b is the graph of the equation
y = ax + b which, according to Section 4.4, is a line. Thus, we have:
Lemma 6.1. The graphs of linear functions of one variable are lines.
If a function h is defined on the whole numbers N, h : N → R , so that for
some constants c and d, h(n) = cn + d for all whole numbers n, then we also call
such an h a linear function. If there is any danger of confusion, we will be careful
to say h is a linear function defined on the whole numbers. The graph of such an h
will be a collection of dots, e.g.,
(0, −3), (1, −1), (2, 1), (3, 3), (4, 5), (5, 7), (6, 9), (7, 11).
6 What is meant is probably that “They look like a curve.”

Now consider the two functions that arose in connection with the cost of a
book, h : N → R and the real-valued function of one variable H : R → R, so
that h(n) = 17.85 n for each whole number n and H ( x ) = 17.85 x for every real
number x (see page 119). Let the number 17.85 be denoted by c; then we rescale
the y-axis so that the unit is not 1 (dollar) but c (dollars) in order to be able to
draw the graph of H within the page of a book.

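[Figure: graph of H, drawn with the unit on the y-axis equal to c; x-axis marked 1 through 9]
As noted above, the graph of h is a sequence of points because its domain is the
whole numbers:
[Figure: graph of h, a sequence of dots at heights c, 2c, ..., 7c; x-axis marked 1 through 9]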
The standard terminology is that the function H interpolates h (“connects the
dots of the graph of h”), or that H is an interpolation of the function h. We will
return to this concept of interpolation in Chapter 9.
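The difference between h and its interpolation H can be made concrete in code. In this sketch (our own; the names `h` and `H` follow the text, but the error check is an illustrative assumption), h refuses inputs that are not whole numbers, while H accepts any number and agrees with h at every whole number.

```python
c = 17.85  # the number 17.85 from the text, called c

def h(n):
    """h(n) = cn, defined only on the whole numbers 0, 1, 2, ..."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("h is defined only on whole numbers")
    return c * n

def H(x):
    """H(x) = cx, defined for every number x; it interpolates h."""
    return c * x

# H agrees with h at every whole number we try ...
print(all(H(n) == h(n) for n in range(100)))  # True
# ... and also has values between the dots, where h has none.
print(H(2.5))
```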
In TSM, a discussion of the function h usually displays the graph of H, but
not the graph of h as a sequence of dots. As we mentioned earlier, TSM usually
conflates h with H. This tends to create a crisis in students’ perception of math-
ematics: is the graph of a function what it is supposed to be, or is it something
the textbook makes up as it goes along? For this reason, we will be careful to
draw a distinction between a linear function defined on the whole numbers and
its interpolation.

Given a real-valued function of one variable f (for now, a linear one, f(x) = ax + b),
we have seen (e.g., page 121) how to associate with it a function of two variables,
H(x, y), which is defined by:
H(x, y) = y − f(x) (= y − (ax + b)).
Thus H is a function defined on R². Consider now the following problem: for a
fixed constant c, what is the set of all the points (x, y) so that H(x, y) = c? (Of
course this is the same as saying all the points (x, y) so that y − (ax + b) = c.)
This set, the set of all the points (x, y) so that H(x, y) = c, is called a level
set of the function of two variables H, and is denoted by {H = c}. Of course, if
c = 0, this would be the same question as asking for the graph of f. Now (x, y)
being in {H = c} is equivalent to y − (ax + b) = c, which in turn is equivalent
to (−a)x + y = b + c, which is equivalent to (x, y) being a solution of the linear
equation in two variables (−a)x + y = b + c. The conclusion is that a level set
{H = c} is always a line, namely, the graph of the equation (−a)x + y = b + c.
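The conclusion can be spot-checked numerically. The sketch below is our own illustration with arbitrarily chosen sample coefficients a = 2 and b = −3 (assumptions, not from the text); it takes points on the line (−a)x + y = b + c and confirms that H takes the value c at each of them.

```python
from fractions import Fraction as Fr

a, b = Fr(2), Fr(-3)  # sample coefficients for f(x) = ax + b (our choice)

def H(x, y):
    """H(x, y) = y - f(x) = y - (ax + b)."""
    return y - (a * x + b)

def point_on_line(x, c):
    """The point with first coordinate x on the line (-a)x + y = b + c."""
    return (x, a * x + b + c)

for c in [Fr(0), Fr(1), Fr(-4)]:
    pts = [point_on_line(Fr(x), c) for x in range(-3, 4)]
    # Every sampled point of the line lies in the level set {H = c}.
    assert all(H(x, y) == c for x, y in pts)
print("all sampled points lie in the expected level sets")
```

The case c = 0 recovers points on the graph of f itself.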
The reason for the terminology of “level set” for { H = c} comes
from the fact that if we graph the function H : R2 → R in 3-space
R3 , then the graph is a surface. The intersection of this surface with
the “horizontal” plane z = c is a (plane) curve, and what we call
{ H = c} in the xy-plane is exactly the vertical projection of this
curve on the xy-plane. Of course, any horizontal plane is considered
in everyday life to be “level”, and this accounts for the name.


Let H(x, y) = 3x − y. Describe the level sets {H = 0}, {H = 1}, {H = 2},
and {H = −4}.

Let us graph a function of one variable that is not linear. For example, take the
square function s : R → R, s(x) = x² for all numbers x. The graph of s consists
of all the points of the form (x, x²), where x is arbitrary. Since (−x)² = x²,
we see that the graph includes both (x, x²) and (−x, x²), no matter what x may
be. The point (0, 0) is an obvious point on the graph. We can put in values of
x = ±1, ±2, ±3, ±4 to get the points
(±1, 1), (±2, 4), (±3, 9), (±4, 16).
Let us also throw in the points
(±0.5, 0.25), (±1.5, 2.25), (±2.5, 6.25), (±3.5, 12.25)
for good measure, and we get a sequence of points on the graph of s, displayed
on the left picture below. Note that in order to make the picture small enough to
fit the page, we have shrunk the scale of the y-axis by a factor of 4.
[Figure: left, the plotted points of the graph of s for −4 ≤ x ≤ 4; right, the curve through them. The y-axis is shrunk by a factor of 4.]

It is not difficult to extrapolate from these points to envision the graph as the
curve in the above picture on the right. This curve is an example of what is called
a parabola. Parabolas will be defined and discussed more fully in Chapter 10 (see
page 252).
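The plotted points and the rescaling can be reproduced in a few lines. This Python sketch is our own illustration: it lists the points (x, x²) for x = −4, −3.5, ..., 4, checks the symmetry noted above, and shows the coordinates actually drawn when the y-axis is shrunk by a factor of 4.

```python
# x = -4, -3.5, ..., 3.5, 4 (halves are exact in binary floating point)
xs = [k / 2 for k in range(-8, 9)]
points = [(x, x * x) for x in xs]

# Symmetry: whenever (x, x^2) is on the graph, so is (-x, x^2).
print(all((-x, y) in points for x, y in points))  # True

# Coordinates actually drawn when the y-axis is shrunk by a factor of 4.
drawn = [(x, y / 4) for x, y in points]
print(max(y for _, y in drawn))  # 4.0, i.e. 16 / 4
```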
With the availability of scientific calculators, there should be no hesitation in
asking students to graph quite sophisticated functions, e.g., a function such as
x → x x−27x
4 +5
. Let us illustrate with a simpler one such as G : R → R given by
G(x) = x³ − 3x + 6. Recall from Section 1.4 that this is called a cubic polynomial or,
more simply, a cubic. Since we have no idea what to expect, we try some obvious
numbers, e.g., G(0), G(±1), G(±2), G(±3), G(±4), getting the following points
on the graph of G:

(−4, −46), (−3, −12), (−2, 4), (−1, 8), (0, 6),
(1, 4), (2, 8), (3, 24), (4, 58).

Because the jumps in the values of G between the values of x at 2 and 3, 3 and 4,
−2 and −3, and −3 and −4 are so great, we also get the following points on the
graph of G:

(−3.5, −26.375), (−2.5, −2.125), (2.5, 14.125), (3.5, 38.375).
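Values like these are easy to get wrong by hand (or by calculator keystrokes). The following sketch, an illustration only, recomputes G(x) = x³ − 3x + 6 at the x values used above so that the printed pairs can be compared with the two lists of points.

```python
def G(x):
    """The cubic G(x) = x^3 - 3x + 6."""
    return x**3 - 3 * x + 6

for x in [-4, -3.5, -3, -2.5, -2, -1, 0, 1, 2, 2.5, 3, 3.5, 4]:
    print((x, G(x)))
```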

By compressing the y-axis by a factor of 40, we can exhibit these points as follows:

[Figure: the plotted points of the graph of G for −4 ≤ x ≤ 4, with the y-axis compressed; y-axis marked from −40 to 60]

The graph seems to cross the x-axis between −3 and −2. Suppose it crosses the
x-axis at (x₀, 0); then 0 = G(x₀) by definition of the graph of G. This means
x₀³ − 3x₀ + 6 = 0. Such an x₀ is called a root or solution of the cubic polynomial
equation x³ − 3x + 6 = 0. The roots of a polynomial equation are of great interest
in mathematics. For this reason, one may try to get a better estimate of this x₀.
We have
G(−2.5) = −2.125, G(−2.4) = −0.624, G(−2.3) = 0.733,
so it is intuitively clear that x₀ is between −2.4 and −2.3. By experimenting
with G(−2.31), etc., we can get even better estimates of this root.
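One systematic way to carry out this “experimenting” is the bisection method, which the text does not name; the sketch below is our own illustration. Starting from the interval [−2.4, −2.3], on which G changes sign, it repeatedly halves the interval and keeps the half on which G still changes sign.

```python
def G(x):
    """The cubic G(x) = x^3 - 3x + 6."""
    return x**3 - 3 * x + 6

def bisect(f, lo, hi, steps=40):
    """Approximate a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # the sign change is in [lo, mid]
        else:
            lo = mid  # the sign change is in [mid, hi]
    return (lo + hi) / 2

x0 = bisect(G, -2.4, -2.3)
print(round(x0, 4))  # -2.3553
```

Each step halves the interval, so 40 steps shrink the original width 0.1 to about 10⁻¹³.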
Notice that the graph has a “bump” above (roughly) −1, and has a “trough”
above (roughly) 1. It is a known fact, proved in advanced courses, that the graph
of a cubic polynomial can have at most one “bump” and one “trough”. Therefore we
are very fortunate that with the choice of the nine obvious points on this graph,
we already know that there are no further troughs or bumps below x = −4 and
above x = 4. So the graph will continue to go down as it goes to the left on the
x-axis, and continue to go up as it goes to the right on the x-axis.
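Plotting many more points makes the “bump” and “trough” easy to locate. The sketch below (our own illustration; the grid spacing 0.01 is an arbitrary choice) samples G on [−4, 4] and reports the sample points that are higher, or lower, than both neighbours. With calculus one can show they sit exactly at x = −1 and x = 1, which happen to be grid points here.

```python
def G(x):
    """The cubic G(x) = x^3 - 3x + 6."""
    return x**3 - 3 * x + 6

# Sample G on a grid of x values from -4 to 4 in steps of 0.01.
xs = [k / 100 for k in range(-400, 401)]
ys = [G(x) for x in xs]

# A "bump" is a sample higher than both neighbours; a "trough" is lower than both.
bumps = [xs[i] for i in range(1, len(xs) - 1) if ys[i - 1] < ys[i] > ys[i + 1]]
troughs = [xs[i] for i in range(1, len(xs) - 1) if ys[i - 1] > ys[i] < ys[i + 1]]
print(bumps, troughs)  # [-1.0] [1.0]
```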
If we were less fortunate and the chosen points happen not to reveal the
“bump” and the “trough” of the graph, then we would have to plot more points,
since these features may not have appeared yet or may not even exist. The
following graph of g(x) = x³, for example, has no “bump” and no “trough”:

Let us graph the function h given by h(x) = 1/x.

We first address an issue concerning this h that we have not confronted thus
far, and it is this. Would it be correct to say that h is a function from all numbers
to all numbers? In other words, would it be correct to say that h : R → R ? The
answer is no, because h cannot assign any number to 0, as division by 0 is not
defined. So the correct statement is, rather, that

h : {all nonzero numbers} → R.

That said, we start plotting points. Again, there are two obvious points:

(1, 1) and (−1, −1).

Beyond that we take some random values of x and compute 1/x; we will remark
on the following points on the graph in due course:

(0.1, 10), (0.2, 5), (0.4, 2.5), (0.5, 2), (2, 0.5),
(4, 0.25), (5, 0.2), (8, 0.125), (10, 0.1), (−0.1, −10),
(−0.2, −5), (−0.4, −2.5), (−0.5, −2), (−2, −0.5),
(−4, −0.25), (−5, −0.2), (−8, −0.125), (−10, −0.1).

The corresponding picture is then:

[Figure: the plotted points of the graph of h; x-axis marked 2, 4, 6, 8, 10]

Notice that there are two separate curves here, and they are called the two branches
of a hyperbola. Hyperbolas are related to parabolas (page 130) as they can both be
obtained by intersecting a plane with a (double-napped) cone (see, for example,
[Wiki-conic] for an elementary introduction and related references). For this rea-
son, both the parabola and the hyperbola are examples of so-called conic sections.
Someone has yet to write an elementary account of conic sections that begins
with the geometric definitions of all the conic sections and then identifies these
curves with those obtained from plane intersections with a cone; for the time
being, see [Teukolsky]. Regrettably, we will not pursue the study of hyperbolas
in this volume, but see Chapter 8 of Volume II and Chapter 15 of Volume III in
The plotted points above exhibit a pattern: if 0 < a < b or a < b < 0, then
1/a > 1/b. (In Exercise 1 on page 132, you are asked to prove this in general.) This
pattern suggests that the limited choice of the points on the graph above is enough
to reveal the general behavior of the graph: it tells us that as the upper right curve
extends to the right end of the positive x-axis, all it does is get closer and closer
to the x-axis, and as it approaches 0 from the positive x-direction, all it does is
get closer and closer to the positive y-axis. A similar statement also applies to the
lower left curve.
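The pattern can be checked (not proved; that is Exercise 1) on the plotted points. This sketch is our own illustration using exact fractions: on each branch it confirms that the values 1/x decrease as x increases, and it evaluates h at one large and one small positive number to show the two kinds of behavior just described.

```python
from fractions import Fraction as Fr

def h(x):
    """h(x) = 1/x, defined only for x != 0."""
    if x == 0:
        raise ZeroDivisionError("h is not defined at 0")
    return 1 / x

positives = [Fr(1, 10), Fr(1, 5), Fr(2, 5), Fr(1, 2), Fr(1), Fr(2), Fr(4), Fr(5), Fr(8), Fr(10)]
negatives = [-x for x in reversed(positives)]  # -10 < -8 < ... < -1/10

for xs in (positives, negatives):
    ys = [h(x) for x in xs]
    # If a < b (both positive or both negative), then 1/a > 1/b.
    assert all(y1 > y2 for y1, y2 in zip(ys, ys[1:]))

# Far from 0 the graph hugs the x-axis; near 0 it climbs along the y-axis.
print(h(Fr(1000)), h(Fr(1, 1000)))  # 1/1000 1000
```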

Exercises 6.3

(1) Prove that if 0 < a < b or a < b < 0, then 1/a > 1/b.

(2) Plot enough points in the graph of each of the following functions to get
an accurate picture of the graph: (i) x² − 2x + 5, (ii) x³, (iii) x³ + 2,
(iv) (x − 5)³. (Use a scientific calculator.)

(3) Plot enough points in the graph of each of the following functions to
get an accurate picture of the graph: (i) x² − x, (ii) 3x² − 4x + 1,
(iii) x³ − x² − 4x + 4, (iv) 2x³ − 4x² + x + 6. (Use a scientific calculator.)
(4) Plot enough points in the graph of each of the following functions to get
an accurate picture of the graph: f₁(x) = 2x², f₂(x) = 2x² + 3, f₃(x) =
2(x − 1)², and f₄(x) = 2(x − 1)² + 3. How are they related? Explain in detail.
(5) Let f(x) = ax² and g(x) = a(x − b)² + c, where a, b, c are constants.
Describe how the graphs of f and g are related. Explain in detail.
(6) (a) Let H be the function of two variables defined by H(x, y) = (2/3)x − (1/4)y +
5. Describe the level sets {H = 1} and {H = −2} individually and how
they are related. (b) Let H be the function of two variables defined by
H(x, y) = ax + by + d, where a, b, d are constants with b ≠ 0. Let c and
c′ be distinct constants. Describe the level sets {H = c} and {H = c′}
individually and how they are related.

6.4. Remarks on graphs and coordinate systems

In Section 4.1, we described how a coordinate system is set up in the plane.
One of the most important features is that, once the unit segment [0, 1] has been
chosen on the x-axis, the unit segment on the y-axis will be automatically fixed
too, because the 90° counterclockwise rotation ϕ around the origin O is length-preserving.
[Figure: the point 1 = ϕ(A) on the y-axis and the point 1 on the x-axis]

It is time to point out that sometimes we have to intentionally ignore the fact
that rotations and reflections are length-preserving and angle-preserving in order
to rescale a coordinate axis for a particular need. Indeed, we already had to
perform such rescaling for the graph of the cost function h on page 128, the
square function s on page 129, and the cubic function x³ on page 131. Here is
another example. Let us graph the linear function f : R → R defined by
f(x) = 300x + 60
for all x in the segment [0, 3]. Then it would be impossible to draw this graph on
a page of a book: 3 units of length horizontally but 960 units vertically? Common
sense dictates that we shrink the y-axis in order to make the drawing of the graph
possible. For example, we can let 1 unit along the y-axis stand for 300:
[Figure: graph of f from (0, 60) to (3, 960), with 1 unit on the y-axis standing for 300; x-axis marked 1, 2, 3]
However, by making the graph presentable, we pay a price in terms of geometry.
A (counterclockwise) rotation of 90 degrees around O will no longer be
length-preserving because it will map a segment of length 1 on the x-axis
(i.e., [0, 1]) to a segment that represents a length of 300 on the y-axis
(i.e., [0, 300]).
(Margin note: In theory, the unit segments on the x- and y-axes have the same
length. In practice, one of the axes often has to be scaled.)
Moreover, consider the graph of the linear function g of one variable, g(x) = x.
This is a line of slope equal to 1; according to the Activity on page 65, this line
should make a 45° angle with the positive x-axis. While this is true in a properly
set-up coordinate system where rotations are length-preserving, it won’t be true
here. In the present coordinate system, what is usually 1 unit of length along the
y-axis becomes a length of 300, so that a “rise” of 1 unit in the y value amounts to
a vertical “rise” of 1/300 of the unit length. Because of this geometric distortion,
the graph of g would be indistinguishable from the x-axis because it is, for all
practical purposes, horizontal. In particular, it will not make a 45° angle with the
x-axis. Instead, it is the graph of f, whose slope is 300, that makes a
45° angle with the positive x-axis, as shown in the picture above.
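The distortion can be quantified. On paper, a line of slope m rises m/k drawn units per drawn unit of run when one drawn unit on the y-axis stands for k units, so it appears at the angle arctan(m/k). The function name `drawn_angle` and the assumption that only the y-axis is rescaled are ours.

```python
import math

def drawn_angle(slope, y_unit):
    """Angle in degrees, as it appears on paper, between the x-axis and a line of the
    given slope, when one drawn unit on the y-axis stands for y_unit units and the
    x-axis is not rescaled."""
    return math.degrees(math.atan(slope / y_unit))

print(drawn_angle(1, 1))      # about 45: g(x) = x in an unscaled system
print(drawn_angle(1, 300))    # about 0.19: g(x) = x looks horizontal
print(drawn_angle(300, 300))  # about 45: the graph of f(x) = 300x + 60
```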
Observe also that, as a consequence of the distortion along the y-axis in the
preceding coordinate system, the reflection across the line that passes through
the origin O and makes a 45° angle with the x-axis (the broken line above) is no
longer length-preserving, because it maps the unit segment [0, 1] on the x-axis to
the segment [0, 300] on the y-axis.
For the purpose of communication, we may call a coordinate system in which
the unit of length in one of the coordinate axes has been intentionally modified a
scaled coordinate system. Be aware that in such a coordinate system, rotations
and reflections do not preserve lengths.
In the exercise below, one sees a natural example of a scaled coordinate sys-
tem where the rescaling takes place in the x-axis rather than the y-axis. Other
examples of scaled coordinate systems can be found on pages 197 and 198.

Exercises 6.4
(1) Define a function h : [0, 360] → R as follows. Let P₀ be the point (1, 0)
on the circle of radius 1 (the unit circle) around the origin O. For each t so
that 0 ≤ t ≤ 360, let Pₜ be the point on the unit circle which is the image
of P₀ under the t-degree counterclockwise rotation around the origin O.
Let the coordinates of Pₜ be (xₜ, yₜ) (so that y₀ = 0 and −1 ≤ yₜ ≤ 1 for
all t in [0, 360]). See the picture:
[Figure: the unit circle around O, with P₀ = (1, 0) and Pₜ = (xₜ, yₜ)]
We define h by h(t) = yₜ. (You may recall from trigonometry that h(t)
is in fact the sine of t degrees, but this fact is irrelevant to us.)
Graph this function h on a regular sheet of paper, and specify the t
value that the unit length on the x-axis represents.


Linear Functions
and Proportional Reasoning
With the concept of a function available, we are now in a position to revisit and
shed light on the earlier discussion of rates and constant rates in Section 1.9 of
[Wu-PreAlg] and Section 3.2 of this volume. In terms of functions, the concept
of constant rate now takes on a strikingly simple form: the constancy of the rate
is equivalent to the linearity of a well-defined associated function.1 This is the
content of Theorem 7.1 (page 138). We give some examples in Section 7.1 to
illustrate how problems about constant rate can now be done in a much more
conceptual way from the perspective of Theorem 7.1.
We also take this opportunity to critically examine the concept of proportional
reasoning, a mainstay of the middle school mathematics curriculum in TSM.2 This
concept is not mathematically well-defined, and it is unclear in what sense this
concept could be rendered mathematically valid. The purpose of the extended
discussion of proportional reasoning in Section 7.2 is to alert teachers to approach
all things related to proportional reasoning with a great deal of caution, particularly
the exhortation that this concept, being of allegedly great importance, “merits
whatever time and effort must be expended to assure its careful development”
([NCTM, page 82]).

7.1. Constant rate and linear functions

Constant rate as linear function



Let us take up again the following problem that was already solved in Section 1.9
(Example 4) of [Wu-PreAlg]:
If Ina walks at a constant speed and she walks 1½ miles in 30 minutes,
how long would it take her to walk 7/8 mile?

1 In advanced mathematics, this function is one whose derivative is the rate.

2 See page xi for the definition of TSM.


(Margin note: The constancy of speed is equivalent to the linearity of the
distance function in terms of time.)
Using the concept of a function, we will now rephrase the solution of this problem
to make it more mathematically transparent. Let us say that Ina starts walking at
time t = 0. For ease of exposition, we introduce the standard notation [0, ∞) to
denote all the nonnegative numbers, i.e., the point 0 together with all the
points on the number line to the right of 0. Now define a function of one variable
f : [0, ∞) → R, so that for all t ≥ 0,
f(t) = the distance Ina walks from 0 hours to t hours.
Thus f (0) = 0. Call this function f Ina’s distance function. We are going to express
the fact that Ina walks at a constant speed in terms of this function f .
We first express Ina’s average speed in the time interval [0, t], for any t > 0, in
terms of f. By definition (page 266), this average speed is
$$\frac{\text{distance Ina walks (in miles) from 0 to } t \text{ hours}}{t \text{ hours}} = \frac{f(t)}{t} \text{ mph}.$$
Now if Ina walks at a constant speed of v mph, then her average speed over the
time interval [0, t] (for any t > 0) is equal to v. It follows that, for any t > 0, we
have f(t)/t = v or, equivalently, f(t) = vt for any t > 0. Since by definition of f,
f(0) = 0 anyway, we see that the equality f(t) = vt is in fact valid not only for
t > 0 but for all t ≥ 0. So, in short, constant speed v implies f(t) = vt for all
t ≥ 0.
Conversely, we claim that, still with the same setting (Ina starts walking at
time t = 0 and Ina’s distance function is f(t)), if the function f satisfies f(t) = vt
for some positive constant v, then Ina walks at a constant speed of v mph. We
must therefore show that Ina’s average speed in any time interval [t₀, t], where
0 ≤ t₀ < t, is v. This is so because the distance Ina walks from time t₀ to time t is,
by the definition of Ina’s distance function, equal to the difference:
(the distance Ina walks from 0 to t hours)
− (the distance Ina walks from 0 to t₀ hours).
This is of course equal to
f(t) − f(t₀) = vt − vt₀ = v(t − t₀).
Thus the distance Ina walks in the time interval [t₀, t] is v(t − t₀) miles. By the
definition of average speed, Ina’s average speed in the time interval [t₀, t] is then
$$\frac{v(t - t_0)}{t - t_0} = v \text{ mph}.$$
This proves the claim.
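The computation in this proof can be mirrored with exact arithmetic. In the sketch below (our own illustration; the speed v = 3 and the sample intervals are arbitrary choices), the average speed of f(t) = vt over every sampled interval [t₀, t] comes out equal to v.

```python
from fractions import Fraction as Fr

v = Fr(3)  # a sample constant speed in mph (our choice)

def f(t):
    """Distance function f(t) = vt: miles walked from 0 to t hours."""
    return v * t

def average_speed(dist, t0, t):
    """Average speed on [t0, t]: distance walked from t0 to t, divided by t - t0."""
    return (dist(t) - dist(t0)) / (t - t0)

intervals = [(Fr(0), Fr(1, 2)), (Fr(1, 3), Fr(2)), (Fr(5, 4), Fr(7, 3))]
print([average_speed(f, t0, t) for t0, t in intervals])  # every entry equals 3
```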
What has just been shown is that Ina walks at a constant speed of v mph if
and only if Ina’s distance function f satisfies f (t) = vt for all t ≥ 0. We have
therefore proved the following theorem.
Theorem 7.1. Let an object in motion be described by a function f : [0, ∞) → R
so that, when measured in miles,
f(t) = the total distance traveled from 0 to t hours,
and so that f(0) = 0, i.e., the motion begins at time 0. Then the motion has a constant
speed of v mph if, and only if, f(t) = vt for a fixed positive number v.
It goes without saying that there is a corresponding theorem for other kinds
of work done at a constant rate: water flow, lawn-mowing, house-painting, etc.
(see Section 3.2).
We now use Theorem 7.1 to solve the original problem about Ina. We want to know
how long it will take her to walk 7/8 mile. In terms of Ina’s distance function
f, this means we want to know the value of t₀ so that f(t₀) = 7/8. Noting that
f(t) = vt, we are looking for a t₀ so that v t₀ = 7/8. Thus, this t₀ satisfies
$$t_0 = \frac{7/8}{v}.$$
We need to know the value of v. From the given data: Ina walks 1½ miles in 1/2 of
an hour. Therefore f(1/2) = 1½, so that
$$v \cdot \frac{1}{2} = 1\tfrac{1}{2} \implies v = 3.$$
Therefore
$$t_0 = \frac{7/8}{3} = \frac{7}{24} \text{ hours}.$$
Since 7/24 hours is 17.5 minutes, this answer is the same as the one obtained in
Section 1.9 of [Wu-PreAlg].
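The two small equations in this solution can be solved with exact fractions, which avoids any rounding in 7/24. A minimal sketch, using only the data of the problem:

```python
from fractions import Fraction as Fr

# From f(1/2) = v * (1/2) = 1 1/2, solve for the speed v (mph).
v = Fr(3, 2) / Fr(1, 2)
# From f(t0) = v * t0 = 7/8, solve for the time t0 (hours).
t0 = Fr(7, 8) / v
print(v, t0, t0 * 60)  # 3 7/24 35/2   (35/2 minutes is 17.5 minutes)
```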
Recall from page 127 that a linear function of one variable x is a function g
of the form g(x) = ax + b for some constants a and b; the number b is called
the constant term of the linear function. We say g is a linear function without
constant term if b = 0. Thus the distance function in Theorem 7.1 is an example of
a linear function without constant term. The reason we single out linear functions
without constant terms is that for such a function, g(x) = ax, we have
$$\frac{g(x)}{x} = a \quad \text{for all } x > 0.$$
In particular, if x₁ and x₂ are any two positive numbers, then
$$\frac{g(x_1)}{x_1} = \frac{g(x_2)}{x_2} \tag{7.1}$$
because both sides are equal to the constant a. Equation (7.1) is the precise
meaning of the statement in TSM that “two quantities g(x) and x are in a proportional
relationship”, provided it is known that g(x) is a linear function of x without
constant term. Unfortunately, such an explanation is missing in TSM. In particular,
TSM usually does not make explicit the fact that g(x) is a linear function without
constant term but expects students to somehow guess it.
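The role of the constant term in (7.1) can be seen by computing the quotients g(x)/x for a few positive x. In this sketch (our own illustration; the sample functions 3x and 3x + 1 are arbitrary choices), the quotients collapse to a single value only when there is no constant term.

```python
from fractions import Fraction as Fr

def quotients(g, xs):
    """The set of values g(x)/x for the given positive x; (7.1) says it has one element."""
    return {g(x) / x for x in xs}

xs = [Fr(1), Fr(2), Fr(5, 2), Fr(10)]
print(quotients(lambda x: 3 * x, xs))           # {Fraction(3, 1)}: no constant term
print(len(quotients(lambda x: 3 * x + 1, xs)))  # 4: constant term 1, so (7.1) fails
```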
In general, the constant term of such a linear function need not be zero. For
example, in the situation of Theorem 7.1 suppose we begin observing the motion of
the object, not from the beginning, but only after the object has traveled b miles.
Thus at time 0, the object has already traveled b miles. (Compare Paul’s distance
function in Example 2 on page 142.) Define the associated distance function of
one variable F : [0, ∞) → R so that
F(t) = the total distance traveled up to time t (hours).
Then we are given that F(0) = b (miles). The average speed of the motion in the
time interval [t₀, t] is
$$\frac{\text{distance traveled (in miles) from } t_0 \text{ hours to } t \text{ hours}}{t - t_0 \text{ (hours)}}.$$
As before, the distance traveled from time t₀ to time t is the difference:
(the total distance traveled up to time t)
− (the total distance traveled up to time t₀),
which is equal to F(t) − F(t₀) miles. The average speed of the motion in the time
interval [t₀, t] is, therefore,
$$\frac{F(t) - F(t_0)}{t - t_0}. \tag{7.2}$$
Suppose the motion has constant speed v mph. Then for every t > 0, the
average speed of the motion in the time interval [0, t] is equal to v. So,
$$\frac{F(t) - F(0)}{t} = v.$$
Therefore F(t) − F(0) = vt. Since F(0) = b, we get F(t) = vt + b for every t > 0,
and therefore also for every t ≥ 0. (Again, compare Paul’s distance function in
Example 2 on page 142.)
Conversely, suppose F(t) = vt + b for some constants v and b; then we claim
that the motion is one of constant speed v. Indeed, from (7.2), we have that the
average speed of the motion in the time interval [t₀, t] is equal to
$$\frac{(vt + b) - (vt_0 + b)}{t - t_0} = \frac{v(t - t_0)}{t - t_0} = v \text{ mph}.$$
Since this is true for all t₀ and t with 0 ≤ t₀ < t, the motion has constant speed v
by the definition of constant speed. We have therefore proved the following slightly
more general version of Theorem 7.1.

Theorem 7.2. Let the motion of an object be described by a function F : [0, ∞) → R
so that, when measured in miles,
F(t) = the total distance traveled up to time t (hours)
and so that F(0) = b (miles). Then the motion is one of constant speed v mph if, and
only if, F(t) = vt + b for a fixed positive number v.
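A quick numerical check of the “if” direction of Theorem 7.2, as a sketch with arbitrary sample values v = 3 and b = 2 (our choices): the average speed (7.2) equals v on every sampled interval, even though F(t)/t is not constant, so (7.1) does not hold for F.

```python
from fractions import Fraction as Fr

v, b = Fr(3), Fr(2)  # sample speed (mph) and distance already traveled at time 0 (miles)

def F(t):
    """Distance function F(t) = vt + b."""
    return v * t + b

def average_speed(dist, t0, t):
    """(7.2): the average speed on [t0, t]."""
    return (dist(t) - dist(t0)) / (t - t0)

intervals = [(Fr(0), Fr(1)), (Fr(1, 2), Fr(4)), (Fr(2), Fr(9, 4))]
print(all(average_speed(F, t0, t) == v for t0, t in intervals))  # True

# By contrast, F(t)/t is not constant when b != 0, so (7.1) fails for F.
print(F(Fr(1)) / 1, F(Fr(2)) / 2)  # 5 4
```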

Naturally, an entirely similar discussion can be given for water flow at a con-
stant rate, work done at a constant rate, etc. For example, in the case of water
flowing out of a faucet into a container (let us say), let F be the function so that
F(t) is the amount of water (in gallons) in the container at time t (in minutes). Let
F(0) = b gallons, i.e., there are already b gallons of water in the container at time
0. One then proves in exactly the same way that the rate of the water flow being a
constant r gallons per minute is equivalent to F(t) = rt + b gallons for all t ≥ 0.

We now solve two prototypical “constant rate problems” using linear functions.
These problems can be done without algebra, so the point of interest here is
the relative simplicity and conceptual clarity of the solutions that come from the
formulation of constant rate in terms of linear functions (see Theorem 7.1). In
the ensuing discussion, one can also appreciate the importance of being able to
translate verbal information into equations (Chapter 2).
Example 1. Joshua, Li, and Manfred are going to paint a house together. It is
estimated that, individually, it would take them 18 hours, 15 hours, and 16 hours,
respectively, to paint the whole house. Assuming that each person paints at a
constant rate, estimate how long it would take them to do it together.
Since each person paints at a constant rate, Theorem 7.1 implies that there are
fixed positive constants j, ℓ, and m so that the areas of the house that Joshua, Li,
and Manfred paint in t hours are, respectively,
J(t) = jt sq ft,
L(t) = ℓt sq ft,
M(t) = mt sq ft.
(Thus J(0) = L(0) = M(0) = 0.) We can determine each of these constants j, ℓ,
and m, as follows. Let A be the number of square feet of the house that needs
painting. Since it takes Joshua 18 hours to paint the house, we see that J(18) = A.
Thus j · 18 = A and j = A/18. In like manner, we get ℓ = A/15 and m = A/16. If
all three paint together, then in t hours, each of Joshua, Li, and Manfred paints,
respectively, J(t), L(t), and M(t) sq ft, i.e., (A/18)t, (A/15)t, and (A/16)t sq ft,
respectively. Therefore, if all three work together, they paint
$$\frac{A}{18}t + \frac{A}{15}t + \frac{A}{16}t = \left(\frac{1}{18} + \frac{1}{15} + \frac{1}{16}\right)At \tag{7.3}$$
sq ft in t hours. Let t₀ hours be the time it takes these three people to paint the
whole house, i.e., A sq ft. Then,
$$\left(\frac{1}{18} + \frac{1}{15} + \frac{1}{16}\right)A\,t_0 = A.$$
Multiplying both sides by 1/A, we get
$$\left(\frac{1}{18} + \frac{1}{15} + \frac{1}{16}\right)t_0 = 1,$$
and therefore the answer is:
$$t_0 = \frac{1}{\frac{1}{18} + \frac{1}{15} + \frac{1}{16}} = \frac{1}{\frac{798}{4320}} = 5\tfrac{55}{133} \text{ hours}.$$

Check that 5 55/133 hours is the correct answer, i.e., the areas painted by Joshua,
Li, and Manfred after 5 55/133 hours do add up to A sq ft.

Two things are noteworthy. First, (7.3) clearly shows that when Joshua, Li,
and Manfred work together, they paint at the constant rate of A/18 + A/15 + A/16
square feet per hour (see Theorem 7.1 on page 138). This would be clumsy to
prove without the availability of linear functions. A second thing of note is that,
if one is fluent in the use of functions, then the preceding solution is entirely
straightforward and is devoid of subtlety. Compare this with any solution that
does not use functions.
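The arithmetic of Example 1 can be redone with exact fractions. In this sketch, A = 1 is an arbitrary choice for the area (any positive A gives the same time, since A cancels); it computes the combined rate from (7.3) and the time t₀, and writes t₀ as a mixed number.

```python
from fractions import Fraction as Fr

A = Fr(1)                          # area of the house; any positive value gives the same t0
rates = [A / 18, A / 15, A / 16]   # Joshua, Li, Manfred, in sq ft per hour
combined = sum(rates)              # (1/18 + 1/15 + 1/16) A, as in (7.3)
t0 = A / combined
print(combined, t0)                # 133/720 720/133

whole, part = divmod(t0.numerator, t0.denominator)
print(whole, Fr(part, t0.denominator))  # 5 55/133
```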
Example 2. Paul and Geneviève walk at a constant rate. Paul walks from their
house to the train station in 30 minutes while Geneviève needs only 24 minutes to
do the same. Geneviève gives Paul a head start of 4 minutes and then she starts
off. Does she catch up with Paul, and if so, after how many minutes?
Let us first make a rough estimate of whether Geneviève can overtake Paul.
Since it takes Geneviève only 24 minutes to get to the station, it takes only 4 +
24 = 28 minutes after Paul leaves the house before she gets to the station. But 28
minutes after Paul leaves, he is still on his way to the station because it takes him
30 minutes to get there. Therefore Geneviève overtakes him at some point on her
way to the station. The question is exactly when.
Let $G(t)$ be Geneviève’s distance from the house $t$ minutes after her departure.
Let the distance between the house and the train station be $D$ miles. So $G(0) = 0$,
and by Theorem 7.1, we know $G(t) = at$, where $a$ is Geneviève’s (constant)
speed. Since she covers the $D$ miles in 24 minutes, her speed is $\frac{D}{24}$ miles per minute, and we have $G(t) = \frac{D}{24}t$.
Let $P(t)$ be Paul’s distance from the house $t$ minutes after Geneviève’s departure.
Now, by the same reasoning as in Geneviève’s case, Paul’s speed is $\frac{D}{30}$ miles per
minute, so that in 4 minutes, he would be $4\left(\frac{D}{30}\right)$ miles from the house. Thus
$P(0) = 4\left(\frac{D}{30}\right)$, and therefore $P(t) = \frac{D}{30}t + 4\left(\frac{D}{30}\right) = \frac{D}{30}(t + 4)$ (compare Theorem
7.2). The problem then becomes: what is the time $t_0$ so that $G(t_0) = P(t_0)$? That
is, we must solve the equation $G(t) = P(t)$, i.e., we must solve:³
$$\frac{D}{24}t = \frac{D}{30}(t + 4).$$
This again looks like an equation in the two numbers $D$ and $t$, but once again, the
$D$ goes away as soon as we multiply both sides by $\frac{1}{D}$. So $\frac{1}{24}t = \frac{1}{30}(t + 4)$, and
therefore $\frac{30}{24}t = t + 4$. This leads to $\frac{1}{4}t = 4$ and $t = 16$. So 16 minutes after
Geneviève leaves the house, she catches up with Paul.


Check that 16 minutes is the correct answer.
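As a check on the claim that $D$ drops out, here is a small Python sketch (the function `catch_up_time` and its parameters are hypothetical names introduced for illustration). It solves $\frac{D}{g}t = \frac{D}{p}(t + h)$ for $t$, where $g$ and $p$ are the minutes Geneviève and Paul need to reach the station and $h$ is Paul’s head start, and confirms that the answer is the same for several values of $D$.

```python
from fractions import Fraction

def catch_up_time(g_minutes, p_minutes, head_start, D=1):
    """Minutes after Geneviève leaves until she reaches Paul.

    G(t) = (D/g) t and P(t) = (D/p)(t + h).  Setting G(t) = P(t) gives
    (D/g - D/p) t = (D/p) h, i.e., t = (D/p) h / (D/g - D/p).
    Assumes both walk at constant speed and g_minutes < p_minutes.
    """
    D = Fraction(D)
    g_speed = D / g_minutes      # Geneviève's speed, miles per minute
    p_speed = D / p_minutes      # Paul's speed, miles per minute
    return p_speed * head_start / (g_speed - p_speed)

for D in (1, 2, Fraction(7, 3)):
    t = catch_up_time(24, 30, 4, D)
    G = Fraction(D) / 24 * t         # Geneviève's distance at time t
    P = Fraction(D) / 30 * (t + 4)   # Paul's distance at time t
    print(D, t, G == P)
```

Every line printed shows the same catch-up time, 16 minutes, with the two distances equal.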

³ Please take note of the precise meaning of an equation as explained on page 28 and how it
dovetails with the way we try to solve for $t_0$ here.

It is instructive to look at the graphs of the equations describing the motion
of Paul and Geneviève. For the sake of clarity, we need to give the distance $D$
between the house and the station a definite value so as to be able to draw the
picture; let us say $D = 2$ miles (2 miles is a reasonable number for the distance
from the house to the station). Then $G(t) = \frac{2}{24}t = \frac{1}{12}t$ and $P(t) = \frac{2}{30}(t + 4) = \frac{1}{15}t + \frac{4}{15}$. We now graph both linear functions
$$G(t) = \frac{1}{12}t \quad\text{and}\quad P(t) = \frac{1}{15}t + \frac{4}{15} \tag{7.4}$$
on the same set of coordinate axes ($D$ stands for “distance from the house”):
[Figure: graphs of $G(t)$ and $P(t)$ on the same axes, with time $t$ in minutes on the horizontal axis and $D$ on the vertical axis; the two lines intersect at $(16, \frac{4}{3})$.]
The intercept of $P(t)$ on the $D$-axis (which is $\frac{4}{15}$, as we saw above) now has a
graphic interpretation: it gives Paul’s distance from the house at the moment
Geneviève leaves the house. The point of intersection of the two graphs, which is
$(16, \frac{4}{3})$, also has an interpretation: the $t$-coordinate tells the time when Geneviève
catches up with Paul because at that instant, both are exactly the same distance
($\frac{4}{3}$ miles, the $D$-coordinate of the point) from the house.
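The point of intersection can also be found by solving $\frac{1}{12}t = \frac{1}{15}t + \frac{4}{15}$ directly. A minimal Python sketch, with the hypothetical helper `intersect` for two nonparallel lines $y = a_1t + b_1$ and $y = a_2t + b_2$, applied to the functions in (7.4):

```python
from fractions import Fraction

def intersect(a1, b1, a2, b2):
    """Point where y = a1*t + b1 meets y = a2*t + b2 (requires a1 != a2)."""
    t = (b2 - b1) / (a1 - a2)
    return t, a1 * t + b1

# G(t) = t/12 and P(t) = t/15 + 4/15, i.e., D = 2 miles as in (7.4)
t, d = intersect(Fraction(1, 12), Fraction(0), Fraction(1, 15), Fraction(4, 15))
print(t, d)  # 16 4/3
```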

Exercises 7.1
Each of the following exercises can be done as in Section 3.2; they are given
here so that you can get some practice doing them with linear functions.
(1) Suppose Jessica can paint a house in 5 days, and Jessica and Helena
together can do it in 3 days. Assuming that each paints at a constant
rate, in how many days can Helena do the work alone?
(2) A man walks from point A to point B at a constant rate. If he walks at
the rate of 1 yard per second, then it takes him $5\frac{1}{2}$ minutes more to get
to point B than if he walks at the rate of 4 yards per 3 seconds. How far
is point A from point B?
(3) A freight train runs 6 miles an hour slower than a passenger train. It
runs 80 miles in the same time that the passenger train runs 112 miles.
Assuming that both trains run at a constant rate, find the speed of each train.
(4) A train left A for B, 112 miles apart, at 9 am, and one hour later a train
left B for A; they met at 12 noon. If the second train had started at 9 am
and the first at 9:50 am, they would also have met at noon. Assuming
that each train runs at a fixed constant speed, find their speeds.
(5) Two faucets pour into a tub. The first faucet alone can fill the tub in
18 minutes, and the second faucet alone can fill the tub in 22 minutes.
Assume the constancy of the rates of the water flow as usual. The first
faucet is turned on for 4 minutes before the second faucet is turned on,
and t minutes later the tub is filled. What is t?
(6) Two people A and B walk straight towards each other at constant speed.
A walks $2\frac{1}{2}$ times as fast as B. If they are 2000 feet apart initially, and if
they meet after $3\frac{1}{3}$ minutes, how fast does each walk?
(7) Joshua, Li, and Manfred mow lawns at a constant rate. How long would
it take the three of them to mow a lawn if, for the same lawn, it takes
Joshua and Li 2 hours to mow it together, Li and Manfred 3 hours to
mow it together, and Joshua and Manfred 4 hours to mow it together?
(8) A can do a piece of work in $\frac{2}{3}$ as many days as B, and B can do it in
$\frac{4}{5}$ as many days as C. Together they can do it in $3\frac{7}{11}$ days. Assuming
constant rate of work, in how many days can each do it alone? (Recall
the comment on this kind of abstract “work problem” in Section 1.9 of
[Wu-PreAlg], at the end of the subsection The concept of constant rate:
do this exercise by imagining the “work” to be something concrete, like
painting a house or mowing a lawn.)

7.2. Proportional reasoning

The discrete case
The continuous case

We are devoting a whole section to the discussion of proportional reasoning for a
good reason. This is one of the key topics in school mathematics on which TSM
has inflicted severe damage (the other comparable topics being fractions, negative
numbers, “variable”, and slope). First, TSM promotes “setting up a proportion”
in a way that is unteachable and therefore unlearnable. Then the resulting mas-
sive nonlearning triggers an extreme reaction that ends up codifying proportional
reasoning as the capstone of elementary school mathematics and the gateway to
higher mathematics ([Post-Behr-Lesh]). In fact, proportional reasoning has come to
be regarded as a concept of such great importance that it “merits whatever time
and effort must be expended to assure its careful development” [NCTM, page 82].
It is precisely because of the putative importance of proportional reasoning in the
middle school curriculum—according to TSM—that we feel obligated to investi-
gate what proportional reasoning might be.
What is proportional reasoning? Apparently, nobody knows for sure. Accord-
ing to the volume [Siegler-etal.], “the literature consists of several different defini-
tions of proportional reasoning. On a basic level, the term means understanding
and working with the underlying relations in proportions” (page 48). When we
seek clarification in [NRC], we are told that it is “understanding the underlying
relationships in a proportional situation and working with these relationships”
(page 241). Such circular statements obviously fall short of being informative,
and the following more discursive approach from page 5 of [Lamon] does not
seem to yield more information either:
Proportional reasoning results after one has built up compe-
tence in a number of practical and mathematical areas. For ex-
ample, what is a function? What is space? What is a limit?
Someone might give you a definition of those terms, but truly
understanding them is a more difficult task. They are not ab-
solute ideas with a single source of meaning. Instead, meaning
is built over time and is facilitated by interactions with many
closely related situations, each of which embodies some, but
not all, of the critical aspects of an idea. This is true of propor-
tional reasoning. It draws on a huge web of knowledge.
So the answer is that we cannot say in a concise way what
proportional reasoning is, nor can we say how a person learns
to reason proportionally.
This explanation of proportional reasoning clearly fails the requirement of precision
in mathematics (see the fundamental principles of mathematics on page xii).
For all these reasons, what we are going to do is to take a look at a few of these
“proportional situations” to see how proportional reasoning is applied. It appears
to have escaped the attention of the education literature that, in fact, “proportional
situations” fall into two distinct categories: the discrete and the continuous. First,
a problem is classified as a discrete problem if it revolves around a smallest unit.
To explain what this means, we turn to the following problem that belongs to the
discrete category:
Suppose you want to know how many sheets are in a particular stack
of paper, but don’t want to count the pages directly. You have the
following information:
• The given stack has height 4.50 cm.
• A ream of 500 sheets has height 6.25 cm.
How many sheets of paper do you think are in the given stack? ([Stanley])
This is a problem about the height of a stack of papers, and it is easy to discern
a smallest unit: the height (i.e., the thickness) of one sheet of paper. The existence
of this smallest unit simplifies the reasoning because the solution of the problem
depends only on counting or calculating the total number of this smallest unit in a
stack of papers; there is no need to worry about fractional multiples of this smallest
unit. For example, the question of the thickness of a third of a piece of paper simply
does not arise. Therefore, such problems are essentially problems about whole
numbers. See pages 146 ff.

Discrete “proportional reasoning” problems are best solved without setting up a
proportion.
We bring up what seems like a fine point now, but we will revisit it in greater
detail later (pages 150 and 151). In this problem, it is tacitly assumed that every
single sheet of paper has the same thickness. In context, this is an entirely
reasonable assumption, just as, in a problem about buying pencils (let us say),
one may assume that all of the pencils in question cost the same. However,
we will come across situations where such an assumption can by no means be
taken for granted, and the lack of this explicit assumption then interferes with
mathematics learning.
The next problem is classified as a continuous problem for a reason to be explained below:
John’s grandfather enjoys knitting. He can knit a scarf 30 inches long
in 10 hours. He always knits for 2 hours each day.
(1) How many inches can he knit in 1 hour?
(2) How many days will it take Grandpa to knit a scarf 30 inches long?
(3) How many inches long will the scarf be at the end of 2 days?
Explain how you figured it out.
(4) How many hours will it take Grandpa to knit a scarf 27 inches
long? Explain your reasoning. ([MAC])
Here the key concept is the speed of Grandpa’s knitting: how many inches he can
knit in each unit of time. Unfortunately, there is no smallest unit of time. More
precisely, we will show that this problem requires a critical assumption that
is not articulated in the above formulation of the problem, namely, that Grandpa
knits at constant speed (see pages 47 ff.). If there were a smallest unit of time, say
one minute, then the constancy of the speed of Grandpa’s knitting could be easily
defined as “Grandpa knits k inches per minute” for a fixed number k. Were this
the case, this problem—like many others of the same genre—would be as easily
solvable as those in the discrete case, because everything would then be essentially
reduced to counting whole number multiples of k and computations with whole
numbers would once again be all that is required. Such a difficulty could also be
bypassed if the problem explicitly stated that Grandpa somehow manages to knit
only in whole minutes and always knits k inches per minute (with k fixed). But
such an assumption is not stated, so this problem has to be treated as a
purely mathematical problem and, as such, we have to ask what it means to knit at
a constant speed. The lack of a smallest unit of time (e.g., a minute is 60 seconds,
and a second is $10^3$ milliseconds, $10^6$ microseconds, $10^9$ nanoseconds, $10^{12}$
picoseconds, etc.) makes it impossible to describe constant speed in simplistic terms,
such as “k inches per second for a fixed k”. (Compare the discussion in Example
3 of Section 1.9 in [Wu-PreAlg] on why “traveling 40 miles during each 60-minute
time interval” does not guarantee constant speed.) Under these circumstances,
even if Grandpa is known to knit the same number of inches during each minute,
the rigor of mathematical reasoning demands that we further inquire whether he
knits the same number of inches during each second, millisecond, microsecond,
nanosecond, etc. When the data in a given problem cannot be formulated in terms
of a smallest unit, we call such a problem a continuous problem.
After this preamble, it is time to get down to business by examining in detail
the discrete case and the continuous case separately.

The discrete case

Let us begin with the earlier problem about stacks of paper.
Suppose you want to know how many sheets are in a particular stack
of paper, but don’t want to count the pages directly. You have the
following information:
• The given stack has height 4.50 cm.
• A ream of 500 sheets has height 6.25 cm.
How many sheets of paper do you think are in the given stack? ([Stanley])
Because the context (very thin papers in a stack) may create a psychological
barrier, one can put students at ease by first doing a problem in a more familiar
setting:
A principal wants to buy 50 chairs (the price of each chair is fixed) for
his newly refurbished classroom. He was told that it would cost him
$6250. Because of a last-minute reduction of his budget, he now has
only $4500 to spend. How many chairs can he buy with this amount?
Obviously we have to find out how much each chair costs: it is $\frac{6250}{50} = 125$ dollars.
Therefore 4500 dollars can buy $\frac{4500}{125} = 36$ chairs.
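The two steps (find the price of one chair, then see how many such prices fit in the budget) amount to two whole number divisions. A Python sketch, assuming, as the problem states, that every chair has the same price:

```python
total_cost, chairs = 6250, 50
assert total_cost % chairs == 0     # each chair costs the same whole number of dollars
price = total_cost // chairs        # 125 dollars per chair

budget = 4500
affordable = budget // price        # number of whole chairs the budget can buy
print(price, affordable)  # 125 36
```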
Now back to the paper-stack problem. The reasoning is entirely similar: first
find out how thick a sheet of paper is. Since all sheets have the same thickness
(this is the universally accepted unspoken assumption), the fact that 500 sheets
stack up to 6.25 cm means the thickness of one sheet is
$$\frac{6.25}{500} = \frac{1.25}{100} = 0.0125 \text{ cm}. \tag{7.5}$$
If there are $n$ sheets in the given stack of paper with height 4.50 cm, then obviously
$n(0.0125) = 4.50$. Therefore,
$$n = \frac{4.50}{0.0125} = \frac{45000}{125} = 360.$$
The answer is 360 sheets.
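The same two steps apply to the paper stack. The sketch below uses Python’s `fractions.Fraction`, which accepts decimal strings such as `"6.25"`, so that the arithmetic is exact; as in the text, it assumes that all sheets have the same thickness.

```python
from fractions import Fraction

ream_sheets = 500
ream_height = Fraction("6.25")      # cm
stack_height = Fraction("4.50")     # cm

thickness = ream_height / ream_sheets   # cm per sheet, as in (7.5)
n = stack_height / thickness            # number of sheets in the given stack
print(thickness, float(thickness), n)   # 1/80 0.0125 360
```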
It is worth noting that the preceding solution is entirely straightforward; it has
no subtleties, and it certainly involves no “proportional reasoning”, regardless of
how the latter is defined. Students should be taught how to reason through such a
problem because this kind of reasoning is truly basic. Unfortunately, TSM does not
promote such reasoning. In any case, if students were taught this down-to-earth
solution, there would be nothing more to talk about.
However, TSM believes that students need to know what a “proportional
relationship” is and what a “constant of proportionality” is for the purpose of
doing this simple problem. Accordingly, TSM wants this problem to be solved using
proportional reasoning, in the following way. Let n = number of sheets and h =
height of a stack with n sheets. Then n and h are “variable quantities”, and TSM
wants students to believe, on faith, that the “variable quantity” n is proportional
to the “variable quantity” h, in the sense that there is an “invariant” (constant) k
so that
$$n = kh. \tag{7.6}$$
This $k$ is called the unit rate of the proportionality relationship (7.6), and because
$k = \frac{n}{h}$, this $k$ measures the number of sheets per cm, cm being the unit height. In
this way of thinking, one first computes $k$ from the given data that a stack of 500
sheets has a height of 6.25 cm:
$$k = \frac{500}{6.25} = 80 \text{ sheets/cm}.$$
The problem asks for the number of sheets in a stack that is 4.50 cm high. In a
stack that is 4 cm high, there are 4 × 80 = 320 sheets; this is easy to see. For
the additional 0.50 cm (half of a cm), we see that, “proportionally”, we should get
half of 80 sheets, i.e., 40 sheets. Altogether, there are 320 + 40 = 360 sheets in 4.50 cm.
Once students get used to this reasoning, they can do it in one step: if they can
compute the unit rate k = 80 sheets/cm, they can conclude that the number of
sheets in a stack of 4.50 cm is:

80 × 4.50 = 360 sheets.
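For comparison, the unit-rate computation can be sketched the same way. In this Python illustration $k$ is the number of sheets per cm, and the two-step version (4 cm, then the extra 0.50 cm) gives the same total as the one-step product $80 \times 4.50$.

```python
from fractions import Fraction

k = 500 / Fraction("6.25")                   # sheets per cm: 80
two_steps = 4 * k + Fraction("0.50") * k     # 320 + 40
one_step = k * Fraction("4.50")
print(k, two_steps, one_step)  # 80 360 360
```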

On pages 2 ff., we already explained the futility of trying to understand what
a “variable” is; along this same line of reasoning, we suggest that you not try to
teach what a “variable quantity” means in your classroom. Nor, indeed, should
you ask your students to acquire the “conceptual understanding” of why equation
(7.6) is correct, i.e., why for this particular problem there is a unit rate, unless of
course you can prove it.
We have seen that the paper-stack problem, like all discrete problems in pro-
portional reasoning, is a simple problem to solve. There is no reason whatsoever that
you should force your students to solve such simple problems by using a more complicated
method that invokes some mysterious principle about two “variable quantities”
like n and h above being “in a proportional relationship”. However, it is possible
to make sense of “variable quantities” and “proportional relationships”, in the
same way that we made sense of a “variable” in Section 1.1. There is even some
incentive for us to do so because when we deal with problems in the continuous
category, we will have to take “proportional relationships” (once such a thing has
been correctly defined) seriously. Thus, if students are already fluent in solving
discrete proportional reasoning problems as on pages 147 ff., then they can af-
ford the luxury of learning the following approach to these problems that has the
advantage of wider applicability.
We will now freely make use of the concept of a linear function without
constant term (see page 139) to take a second look at the paper-stack problem.
Thus we define a function h : N → R (N denotes the whole numbers) so that
h(n) is the height of a stack of n sheets of paper. If the thickness of one sheet of
paper is T cm, then h(0) = 0 and for any positive integer n,

$$h(n) = \underbrace{T + T + \cdots + T}_{n} = nT,$$
so that $h(n) = Tn$, where we write $Tn$ instead of $nT$ to bring out the fact that in
the expression for the function $h(n)$, $T$ is the constant. (Compare the discussion
of the cost of multiple copies of the same book on page 119.) Thus $h(n)$ is a linear
function without constant term:
$$h(n) = Tn \quad \text{for any whole number } n. \tag{7.7}$$

This equation is the precise meaning of the common expression that “the number
of sheets n is proportional to the height h(n) of a stack with n sheets”. Now equa-
tion (7.7) explains why equation (7.6) is correct: the h(n) and n in equation (7.7)
correspond, respectively, to the $h$ and $n$ in equation (7.6), and the $T$ in equation
(7.7) then corresponds to $\frac{1}{k}$ in equation (7.6). In other words, the “invariant” $k$ in
equation (7.6) is precisely the number $\frac{1}{T}$, but you may have noticed that it is
far easier to think of $T$ (the thickness of one sheet) than to think of $\frac{1}{T}$.

We can rewrite equation (7.7) in a form that leads directly to setting up a proportion:
$$\frac{h(n)}{n} = T \quad \text{for any positive integer } n. \tag{7.8}$$
We can now solve the paper-stack problem anew by using equation (7.8): let $x$ be
the number of sheets in the stack of height 4.50 cm; then
$$\frac{h(500)}{500} = \frac{h(x)}{x}$$
because both ratios are equal to $T$ according to (7.8). From the given data of the
problem, we get
$$\frac{6.25}{500} = \frac{4.50}{x}.$$
The cross-multiplication algorithm again yields x = 360 as before.
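The logic of (7.7) through (7.9) can be mirrored in a short Python sketch: define $h(n) = Tn$, observe that $\frac{h(n)}{n} = T$ for every positive $n$, and then solve $\frac{a}{b} = \frac{c}{x}$ by cross-multiplication. The helper `solve_proportion` is a hypothetical name for illustration.

```python
from fractions import Fraction

T = Fraction("6.25") / 500          # thickness of one sheet in cm (all sheets equal)

def h(n):
    """Height in cm of a stack of n sheets: a linear function without constant term."""
    return T * n

# (7.8): h(n)/n = T for every positive integer n, hence the proportion (7.9).
assert all(h(n) / n == T for n in range(1, 1000))

def solve_proportion(a, b, c):
    """Return x with a/b = c/x, using cross-multiplication a*x = b*c (a nonzero)."""
    return Fraction(b) * c / a

x = solve_proportion(h(500), 500, Fraction("4.50"))   # 6.25/500 = 4.50/x
print(x)  # 360
```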
It may be useful to put the preceding solution in a more general context. Each
time we have a linear function without constant term such as (7.7), we will also
have equation (7.8). It is time to point out that the equality of two ratios is usually
called a proportion. Setting up a proportion vis-à-vis equation (7.7) means writing,
for any two distinct positive integers $m$ and $n$,
$$\frac{h(m)}{m} = \frac{h(n)}{n}. \tag{7.9}$$
The validity of the proportion in (7.9) follows immediately from equation (7.8) as
both sides of (7.9) are equal to T.
It is time to reflect on the fact that we are introducing the concept of a propor-
tion at this late date only because we have had no use for it until now. Don’t forget
that even now, we feel the need to bring it up only because we are trying to make
contact with what has been going on in TSM. The truth is that if we have a correct
definition of constant rate or if we have a linear function without constant term,
the equality of two ratios, such as equation (7.1) on page 139 or equation (7.9), is a
natural consequence of the definition or the property of the linear function. Once
we have achieved an understanding of the definition or how the linear function
is derived, a proportion such as (7.9) is nothing to write home about. One may
speculate that setting up a proportion is considered a major skill in TSM only
because TSM, having nothing to say about why a proportion can be set up, is
obliged to magnify the skill itself.
We can now put the paper-stack problem (page 146) as well as all other dis-
crete problems in proportional reasoning in the proper perspective: these are
problems about a linear function without constant term defined on the whole
numbers N. If students are already fluent in working with linear functions, they
should indeed be taught how to solve such problems by setting up a proportion,
e.g., equation (7.9). Nevertheless, we strongly recommend that all students be
taught the basic method of solution as described on page 147 before they embark
on the more sophisticated method of solution involving a linear function with-
out constant term (e.g., equation (7.8)) and the setting up of a proportion (e.g.,
equation (7.9)).

Let us briefly consider, in succession, two discrete problems because they
point to another kind of defect in TSM that every teacher ought to be aware of.
First, a camping problem:
A group of 8 people are [sic] going camping for three days and need
to carry their own water. They read in a guide book that 12.5 liters
are needed for a party of 5 persons for 1 day. How much water should
they carry? [NCTM, page 83].
Although it might be easy for most students to accept a uniform thickness
for sheets of paper, many students are unlikely to take for granted that everybody
drinks the same amount of water each day. Because this assumption is needed
for the solution of the problem, it must be made explicit in the problem.
Assume then that everybody drinks the same amount of water each day, let
us say c liters. To the extent that we are not concerned with how much water a
person drinks in half a day or a third of a day, one day is the shortest time duration
that matters in this problem. Consequently, since we are only interested in water
consumption over a certain number of days, c is the smallest unit in the problem.
Let us first determine this c. We know 5 people will need c + c + c + c + c = 5c
liters per day, and it is given that 5 people need 12.5 liters per day. Therefore
5c = 12.5, and c = 2.5 liters. Thus, each person needs 2.5 liters per day, and 8
people will need

c + c + · · · + c = 8 × c = 8 × 2.5 = 20 liters per day.


If the 8 people stay for 3 days, then they need 3 × 20 = 60 liters. The answer is
that the 8 people should carry 60 liters of water.
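The direct solution is one division followed by two multiplications. A Python sketch, assuming (as the text insists must be stated) that every person drinks the same amount each day:

```python
from fractions import Fraction

c = Fraction("12.5") / 5            # liters per person per day: 2.5
per_day = 8 * c                     # 8 people: 20 liters per day
total = 3 * per_day                 # 3 days: 60 liters
print(float(c), per_day, total)  # 2.5 20 60
```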
As before, we note that the solution is simplicity itself (provided we make the
assumption that everybody drinks the same amount each day) and “proportional
reasoning” does not intrude at all. However, if we want to put the problem in the
general context of linear functions without constant term, we can.
Again we assume that each person drinks c liters per day. Define a function
f : N → R so that for each whole number n, f (n) = the amount of water (in
liters) that n people drink per day. We are given that f (1) = c, and therefore for
each whole number n ≥ 1,

$$f(n) = \underbrace{c + c + \cdots + c}_{n} = nc,$$
where the second equality is because of the definition of multiplication of whole
numbers. Therefore, we have an explicit formula for the linear function without
constant term $f(n)$: $f(n) = cn$ for all whole numbers $n$. In particular,
$$\frac{f(n)}{n} = c \quad \text{for all positive integers } n. \tag{7.10}$$
Equation (7.10) allows us to solve the camping problem by setting up a proportion:
let $n$ be, successively, 5 and 8 in (7.10); then
$$\frac{f(5)}{5} = \frac{f(8)}{8}$$
because both ratios are equal to $c$, according to (7.10). Using the given data in the
problem, we get
$$\frac{12.5}{5} = \frac{f(8)}{8}.$$
The cross-multiplication algorithm gives $5f(8) = 100$, and $f(8) = 20$. Therefore
8 people will need 20 liters each day. If they want to camp for 3 days, they will
need 3 × 20 = 60 liters as before.
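The function-based solution can be sketched as follows: `f` is built literally as the repeated sum $c + c + \cdots + c$, and the code checks that it agrees with $cn$, which is what makes the proportion $\frac{f(5)}{5} = \frac{f(8)}{8}$ legitimate. The names are illustrative only.

```python
from fractions import Fraction

c = Fraction("12.5") / 5        # liters per person per day (same for everyone, by assumption)

def f(n):
    """Liters that n people drink per day: c + c + ... + c with n terms."""
    return sum([c] * n, Fraction(0))

# f(n) = cn for every whole number n, so f(n)/n = c for n >= 1, as in (7.10).
assert all(f(n) == c * n for n in range(50))

# Cross-multiplying 12.5/5 = f(8)/8 gives 5 f(8) = 8 * 12.5 = 100.
f8 = 8 * Fraction("12.5") / 5
print(f8, f8 == f(8), 3 * f8)  # 20 True 60
```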
One more example:
Which is the better buy: 12 tickets for $15.00 or 20 tickets for $23.00?
([NCTM2000, page 221]).
We begin by removing two flaws from this problem. The first one is relatively
minor: what is missing is a clear statement that each ticket within a group costs
the same amount; after all, tickets for a musical or theatrical performance usually
come in a wide range of prices. While adults reading this item might realize that
they must assume that all tickets in each group cost the same before this problem
can be solved, an adolescent may not be sophisticated enough or lucky enough
to come to the same realization. The major issue, however, is that it is not clear
what is meant by “better buy”. For example, suppose the first kind of ticket is
for a regular concert of the San Francisco Symphony while the second kind of
ticket is for a performance by the local high school band. Then the former may be
considered a “better buy” even if it turns out to cost twice as much as the latter.
For these reasons, the problem will have to be rephrased. Here is one possibility:
For a certain event, there are two kinds of tickets on sale: 12 tickets
for $15.00 or 20 tickets for $23.00. Assuming that all tickets in each
group cost the same amount, which of the two kinds of tickets is less expensive?
When the problem has been properly reformulated this way, it is clearly a discrete
problem because, within each group of tickets, the smallest unit is the price of one
ticket. Let us solve this problem.
Just as in the preceding problems, this is a simple problem in 5th-grade arith-
metic. The price of the first kind of ticket is computed as follows: $15 is par-
titioned into 12 equal parts, so by the division interpretation of a fraction (see
Section 1.2 in [Wu-PreAlg]), the size of one part (= the price of one ticket) is
$$\frac{15}{12} = 1.25 \text{ dollars}.$$
In like manner, the price of one ticket of the second kind is
$$\frac{23}{20} = 1.15 \text{ dollars}.$$
Clearly the second kind of ticket is less expensive. (If we use a cent as the unit for
the price of a ticket, this would be a problem in whole number arithmetic.)
We can treat this problem as one about linear functions without constant term
but, for something this simple, such a conceptual detour would indeed be a waste
of time. Instead, let us see where “proportional reasoning” might play a role. We
are looking at the ratio
$$\frac{\text{the price of } n \text{ tickets}}{n \text{ tickets}},$$
and the two groups of tickets give rise to two ratios,
$$\frac{15}{12} \quad\text{and}\quad \frac{23}{20}.$$
Instead of claiming that the ratios are equal (i.e., setting up a proportion), the
problem asks for a determination of the smaller of the ratios. One can simply use
the cross-multiplication algorithm (see page 270):
$$\frac{23}{20} < \frac{15}{12} \quad\text{because}\quad 23 \times 12 = 276 < 300 = 15 \times 20.$$
Thus the second kind of ticket is less expensive.
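Both approaches, comparing the unit prices and comparing the ratios by cross-multiplication, take only a few lines. In this Python sketch the helper `less_than` is a hypothetical name; it relies on the fact that for positive $b$ and $d$, $\frac{a}{b} < \frac{c}{d}$ exactly when $ad < cb$. Working in cents, as suggested in the text, keeps everything in whole numbers.

```python
def less_than(a, b, c, d):
    """Decide a/b < c/d for positive b and d by cross-multiplication: a*d < c*b."""
    return a * d < c * b

# Unit prices in cents, assuming all tickets in a group cost the same
# (both divisions happen to be exact here).
first = 1500 // 12     # 125 cents per ticket
second = 2300 // 20    # 115 cents per ticket
print(first, second, less_than(23, 20, 15, 12))  # 125 115 True
```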
In summary: (1) Discrete problems can be solved using whole number arith-
metic, and there is no need for “proportional reasoning”.
(2) We can set up the solution of such discrete problems by the use of a linear
function without constant term defined on N. Then setting up a proportion becomes
a logical consequence of having such a linear function.
(3) The formulation of such discrete problems in TSM often leaves out criti-
cal assumptions (see the camping problem and the ticket problem) or lacks the
necessary clarity (see the ticket problem), so one must be alert to such defects.

The continuous case

Let us take up the knitting problem in the Overview subsection (page 146):
John’s grandfather enjoys knitting. He can knit a scarf 30 inches long
in 10 hours. He always knits for 2 hours each day.
(1) How many inches can he knit in 1 hour?
(2) How many days will it take Grandpa to knit a scarf 30 inches long?
(3) How many inches long will the scarf be at the end of 2 days?
Explain how you figured it out.
(4) How many hours will it take Grandpa to knit a scarf 27 inches
long? Explain your reasoning ([MAC]).
Before we discuss how to solve this problem, please think it through, and do it
yourself first. Then compare your solution with the following suggested solution
that comes with the problem:
(1) 3 inches, by division: 30 ÷ 10.
(2) 5 days, by division: 10 ÷ 2.
(3) 12 inches. Give an explanation such as: In one day he knits
3 × 2 = 6 inches. In 2 days he knits 2 × 6 inches.
(4) 9 hours. Give an explanation such as: To knit 27 inches
takes 27 ÷ 3 hours.
Continuous “proportional reasoning” problems can be solved only if some assumption
of constant rate is made explicit.

Let us take a critical look at this problem and its proposed solution. As a problem
in mathematics, with the possible exception of part (2), this problem is not solvable
as it stands because what is given cannot support any kind of logical reasoning for
its solution. The missing assumption is that grandfather knits at a constant rate.
Without this assumption, how can anyone begin to reason about such a problem?
Let us see, for example, why part (1) cannot be solved under the circumstances.
Suppose grandfather adopts the following routine about his daily
knitting: each day he knits 4 inches in the first hour and 2 inches in the second
hour. This way of knitting then satisfies every piece of the given data: he knits 6
inches in the 2 hours of knitting each day, and in 10 hours (i.e., 5 days), he knits
30 inches. Now consider how to answer the question in part (1): Which “1 hour”
interval are we talking about? If it is the first hour of the day, the answer is 4
inches, but if it is the second hour of the day, then it is 2 inches. Exactly as remarked
above, this problem is not solvable as is.
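The counterexample is easy to tabulate. The Python sketch below lists, hour by hour, the uneven routine described in the text (4 inches, then 2 inches, each day) next to a constant rate of 3 inches per hour; both fit every piece of the given data, yet they give different answers to part (1). (Recording only whole hours is a simplification made for this illustration.)

```python
# Inches knitted in each of the 10 hours (2 hours a day for 5 days).
uneven = [4, 2] * 5        # the routine described in the text
constant = [3] * 10        # 3 inches in every hour

for name, hours in (("uneven", uneven), ("constant", constant)):
    per_day = [hours[i] + hours[i + 1] for i in range(0, len(hours), 2)]
    print(name, "total:", sum(hours), "per day:", per_day,
          "inches in a single hour:", sorted(set(hours)))
```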
Let us therefore add the assumption that grandfather knits at a constant rate.
Then we can solve the problem, as follows. Define a function g : [0, ∞) → R so
that g(t) is the number of inches grandfather knits in t hours. (Observe that the
domain of $g$ is the half-line $[0, \infty)$, roughly half of $\mathbb{R}$, but not $\mathbb{N}$. See the discussion on pages 145 ff.)
We may as well assume that grandfather starts knitting at $t = 0$ so that $g(0) = 0$.
By Theorem 7.1 on page 138, $g(t)$ is a linear function without constant term:
$$g(t) = \ell t \quad \text{for some constant } \ell.$$
Then from the given data that $g(10) = 30$, we get $10\ell = 30$, and $\ell = 3$. Thus
$g(t) = 3t$. The answers to the four parts are then, in succession:
(1) $g(1) = 3$,
(2) $\frac{10}{2} = 5$,
(3) $g(4) = 12$, and
(4) the value of $t_0$ so that $g(t_0) = 27$ is $t_0 = 9$.
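With the constant-rate assumption in place, the four answers follow mechanically from $g(t) = \ell t$. A Python sketch (the names are illustrative); note that $g$ accepts any $t \ge 0$, not just whole numbers of hours, reflecting the fact that its domain is $[0, \infty)$:

```python
from fractions import Fraction

ell = Fraction(30, 10)          # from g(10) = 30, so ell = 3 inches per hour

def g(t):
    """Inches knitted in t hours at the constant rate ell (t >= 0)."""
    return ell * t

answers = (g(1), Fraction(10, 2), g(4), Fraction(27) / ell)
print(*answers)           # 3 5 12 9
print(g(Fraction(1, 2)))  # 3/2
```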
To drive home the relevance of linear functions without constant term to all
such “proportional reasoning” problems, we make one more comment on the
preceding solution to part (4), “How many hours will it take Grandpa to knit a
scarf 27 inches long?” Now we have $g(t) = 3t$ for all $t > 0$, so that $\frac{g(t)}{t} = 3$ for
all $t > 0$. In particular, if it takes grandfather $t_0$ hours to knit 27 inches (so that
$g(t_0) = 27$), then
$$\frac{g(10)}{10} = \frac{g(t_0)}{t_0} \tag{7.11}$$
as both are equal to 3. Using the data in the problem, we get
$$\frac{30}{10} = \frac{27}{t_0},$$
which yields $t_0 = \frac{27}{3}$. Note that equation (7.11) is the proportion that TSM asks
students to set up—without making it explicit that grandpa’s knitting is done at a
constant speed—in order to solve the problem.
The rote skill of writing down equations as in (7.11)—without knowing about,
or being the least bit concerned with, the constant speed of the knitting—would
seem to be the essence of TSM’s concept of “proportional reasoning”. (Compare
the comments after equation (7.1) on page 139.)
In one way or another, such continuous problems are solvable only when
they are known to involve a linear function without constant term; in the case at
hand, this function comes from the assumption of knitting at a constant rate. As
mentioned in the preceding paragraph, TSM wants students to write down pro-
portions of the type (7.11) without making use of any assumption about constant
rate of knitting. This kind of mathematics education has no place in the school
classroom. What we ask you to do, instead, is to make an effort to teach your
students about constant rate (see the preceding section) and how to use the linear
function without constant term that follows from constant rate to deduce the pro-
portion of the type in equation (7.11). Yes, these continuous problems should be
done by “setting up a proportion”, but only after the why and the how of “setting
up a proportion” have been carefully explained.
In the education literature, students are usually blamed for their inability to
“reason multiplicatively” in order to solve these “proportional reasoning” prob-
lems. However, if you review everything we have done in this chapter, you will
undoubtedly conclude that if students were taught proportional reasoning with
all the necessary assumptions clearly stated (e.g., that every person drinks the
same amount of water each day, or that the knitting is done at a constant rate),
and with a reason provided for every step of the solution, then these problems
would be no more difficult than any other problem we have discussed so far. In
particular, proportional reasoning is entirely learnable when it is formulated correctly
and taught correctly. In TSM, unfortunately for teachers and their students, neither
takes place. When it is your turn to teach proportional reasoning, just remember
not to follow TSM or its affiliated literature, but teach proportional reasoning as
mathematics, with all that this term implies (see the five principles on page xii).
Further discussion of the role of proportional reasoning in the school math-
ematics curriculum is given in the section on Rate and Proportional Reasoning in

Exercises 7.2

(1) If 15 cupcakes cost $5.10, find the cost of 37 cupcakes, (i) without using
any proportions, and (ii) by setting up a proportion. To show that you
now know more than TSM, give an explanation in (ii) about why you
can set up a proportion.
(2) Ann and Betty both run at constant speed. They start running together
at the same time, but after Ann has run 3 laps, Betty has only run 2.5
laps. By the time Ann finishes running 7 laps, how many laps will Betty
have run? You must be able to explain every step of your solution.
(3) The following is a favorite problem in middle school mathematics: “On
a certain map, the scale indicates that 3 centimeters represent the actual
distance of 8 miles. Suppose the distance between two cities on this map
measures 1.7 centimeters. What is the actual distance between these two
cities?” (a) Suppose you are the teacher. What additional explanation
must you give your students about this problem before they can solve it?
(b) Solve it.
(4) Consider this problem: “If it took 8 hours to mow 5 lawns, then at that rate,
how many lawns could be mowed in 32 hours? At what rate were lawns being
mowed?” (i) Critique it in terms of clarity. (ii) What does the last sentence
mean? (iii) How is this problem without the last sentence different from
the following: “A ballpoint pen sells only in bundles of 5, and each
bundle costs $8. How many pens can you get for $32?”
(5) Consider this problem: “If 25 cows consume 400 lb. of hay in a week, how
long will 300 lb. of hay last for 12 cows?” (a) What other assumptions do
you need to add to make the problem solvable? (b) Solve it.


Linear Inequalities and Their Graphs
So far we have only discussed equations because school algebra is primarily about
equations. But school algebra also includes everything related to number compu-
tations and, to the extent that inequalities arise naturally in various mathematical
contexts as well as in real life, they should also be an integral part of the alge-
bra curriculum. In this chapter, we pay special attention to inequalities by giving
careful definitions of the basic concepts and proving the most rudimentary facts
related to inequalities in two variables. Then we pull all these pieces together to
solve a typical optimization problem in Section 8.5, i.e., a problem that looks for
the largest or smallest value of a given function in a given region.

8.1. How do inequalities arise in real life?

Inequalities are important because real life numerical data tend to take the form of inequalities rather than of equalities.

Real life numerical data tend to appear as inequalities rather than as equalities.
After all, it is rare that two measurements are exactly the same, e.g., people’s
heights, weights, exam scores, and world records in athletic events. Equalities,
such as identities, are the exceptions, and we understand why the exceptions
deserve to be celebrated (e.g., identity (1.7) on page 12). All the same, we must
also study the generic case and, therefore, have to learn about inequalities. For
the consideration of linear inequalities, the following problem is a prototypical one.
[Manufacturing Problem] A video game manufacturer is invited
to a game show, and is told that she can bring up to 50 games.
She has two games, A and B, and has up to $6000 to spend on
manufacturing costs. Game A costs $75 to manufacture and will
bring in a net profit of $125, while Game B costs $165 to man-
ufacture but will bring in a net profit of $185. Assuming that
she sells every game she brings, how many games of each kind
should she manufacture if she wants to maximize her profit?
It is clear that in this case there is no equation to solve, because the answer to
the problem consists of a pair of numbers, a certain number of A games and a
certain number of B games, so that this combination brings in a profit bigger than
any other possible combination. The emphasis here is on the words bigger than,
i.e., an inequality.
One way to understand what is involved in a problem of this nature is to
approach it in a naive way in order to see why naivety doesn’t pay. For exam-
ple, a casual glance at the data would suggest that it is more profitable to sell
Game A than Game B, in the following sense. Suppose you have $165. Then
if you manufacture one B game, you only make $185, but if you use the same
amount to manufacture two A games (each costing $75), you’d not only make $250
(= 2 × 125) but would also have $15 left over from your $165.
A precise way to think about this is to notice that each Game A brings
in a profit that is $1\frac{2}{3}$ of its manufacturing cost (because $\frac{125}{75} = 1\frac{2}{3}$),
but each Game B only brings in a profit of about $1\frac{1}{8}$ of its
manufacturing cost (because $\frac{185}{165} = 1\frac{4}{33}$, which is about $1\frac{4}{32} = 1\frac{1}{8}$).
One’s first impulse is therefore to say that the manufacturer should bring
only A games to the show. We will show why this is a bad strategy in terms of
profit-making. Remember: there is a limit to how many games in total she can
bring to the show: 50. Does she have the money to manufacture 50 A games? Yes,
because it takes only $75 to manufacture one A game so that the manufacturing
cost of these 50 games is 50 × 75 = 3750 dollars. Since she has $6000 to spend,
she is well within her budget. However, with only 50 A games, she can only
make 50 × 125 = 6250 dollars. There is at least one alternative strategy that makes
a greater profit: bring 40 A games and 10 B games. Is this possible? Yes, because
she would still be bringing 40 + 10 = 50 games, and moreover, the manufacturing
cost for 40 A games and 10 B games is only (40 × 75) + (10 × 165) = 4650 dollars,
which is less than the budget of $6000. However, the resulting profit is

(40 × 125) + (10 × 185) = 6850 dollars.

Needless to say, $6850 is more than the earlier profit of $6250.

She can also approach this problem from the opposite end, namely, knowing
that each Game B brings a profit of $185 whereas each Game A brings a mere
$125, she could decide to concentrate entirely on selling Game B and forget about
Game A. The problem now is that she cannot bring 50 B games to the show
because her budget of $6000 won’t allow it: the cost of manufacturing 50 B games
is 50 × 165 = 8250 dollars, which is more than $6000. Again, this would suggest
that bringing all B games to the show is a poor strategy for maximizing profit.
For confirmation, notice that a budget of $6000 can produce at most 36 B games
because $\frac{6000}{165} = 36\frac{4}{11}$, and her profit from 36 B games would be 36 × 185 = 6660
dollars, whereas we have already seen that bringing 40 A games and 10 B games
would bring in a greater profit of $6850.
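The arithmetic in the last few paragraphs is easy to tabulate. The following Python sketch (names such as `feasible`, `cost`, and `profit` are our own, not part of the problem) checks each strategy mentioned so far against the 50-game quota and the $6000 budget, and also records the profit-to-cost ratios 5/3 and 37/33.

```python
from fractions import Fraction

COST = {"A": 75, "B": 165}     # manufacturing cost per game, in dollars
PROFIT = {"A": 125, "B": 185}  # net profit per game, in dollars
QUOTA, BUDGET = 50, 6000

def cost(a, b):
    """Manufacturing cost of a A games and b B games."""
    return a * COST["A"] + b * COST["B"]

def profit(a, b):
    """Profit from selling a A games and b B games."""
    return a * PROFIT["A"] + b * PROFIT["B"]

def feasible(a, b):
    """True if (a, b) respects both the quota and the budget."""
    return a >= 0 and b >= 0 and a + b <= QUOTA and cost(a, b) <= BUDGET

# Profit as a fraction of cost: 125/75 = 5/3 for A, 185/165 = 37/33 for B.
RATIO = {game: Fraction(PROFIT[game], COST[game]) for game in COST}

for a, b in [(50, 0), (40, 10), (0, 36), (0, 37), (0, 50)]:
    print(f"{a:2d} A, {b:2d} B: feasible={feasible(a, b)}, "
          f"cost={cost(a, b)}, profit={profit(a, b)}")
```

The table confirms that 36 is the largest number of B games the budget allows (37 of them would cost $6105), and that 40 A games with 10 B games beat both “all A” and “all B”.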
Finding the maximum profit usually requires a balance between opposing demands.

It is now clear that there is an inherent push-pull in this problem: bringing only
A games would under-utilize the $6000 manufacturing budget because of the
50-game quota, and bringing only B games would under-utilize the 50-game quota
because of the $6000 manufacturing budget. Neither of these extreme options would
bring in the maximum profit. Intuitively, the combination of A games and B games
that brings in the maximum profit must be a kind of “equilibrium” between “all A
games” and “all B games”. What we need to understand in mathematical terms is
how to negotiate the push-pull in a systematic and logical fashion in order to arrive
at this equilibrium. The main theme of this chapter is the mathematical
understanding of this push-pull which, as adumbrated above, is grounded on an
understanding of inequalities.
One more comment before we proceed. While we are trying to
promote the need to better understand inequalities, this manu-
facturing problem may suggest that we forget about inequalities
and get a solution by simple trial-and-error instead. And why
not? Consider the pair of whole numbers (m, 50 − m), where m
(respectively, 50 − m) is the number of A games (respectively,
B games) the manufacturer produces. As m runs from 0 to 50,
the 51 possible profits of {125m + 185(50 − m)} dollars exhaust
all possibilities, and one of these 51 numbers will then be the
solution (we can worry about not exceeding the manufacturing
budget at the end). This is correct. However, we use small num-
bers here (50, 6000, 75, etc.) only for ease of illustration. Sim-
ilar problems coming from industry would involve far bigger
numbers (e.g., the budgets involved may be millions of dollars)
and far more choices than the two in our problem (A games and B games).
In such situations, the trial-and-error
method for the purpose of getting an answer would in general
take too long even on a computer, and a more efficient method
would be needed. Getting a more efficient method then requires
a better understanding of inequalities and what optimization is
all about, and the remainder of this chapter will take a first step
toward such an understanding.
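For the small numbers of this problem, trial and error is easy to automate. Here is a minimal Python sketch, with hypothetical parameter names, that checks every pair of whole numbers (x, y) with x + y ≤ 50 (so it also covers bringing fewer than 50 games) and discards pairs that are over budget. You may prefer to do Exercise 1 of Exercises 8.1 by hand before running it.

```python
def best_whole_number_plan(quota=50, budget=6000,
                           cost_a=75, cost_b=165,
                           profit_a=125, profit_b=185):
    """Trial and error: try every whole-number pair (x, y) with x + y <= quota.

    Returns (best profit, x, y, number of pairs examined).
    """
    best = None
    examined = 0
    for x in range(quota + 1):
        for y in range(quota - x + 1):   # guarantees x + y <= quota
            examined += 1
            if cost_a * x + cost_b * y > budget:
                continue                  # over budget: not allowed
            p = profit_a * x + profit_b * y
            if best is None or p > best[0]:
                best = (p, x, y)
    return best + (examined,)
```

Even here the loop examines 1326 pairs. With k products instead of two and a quota of N, the number of whole-number candidates grows roughly like N^k / k!, which is the practical reason the text looks for a better method.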

Exercises 8.1
(1) Referring to the preceding problem, we have seen that neither “50 A
games” nor “36 B games” would maximize the profit. The strategy that
maximizes profit must lie somewhere in between. Note that if, instead
of 36 B games, the manufacturer brings 35 B games and 2 A games, then
35 + 2 = 37 < 50, so the new strategy still meets the quota of
“up to 50 games”, and
the manufacturing cost is (35 × 165) + (2 × 75) = 5925 < 6000,
so it is within budget.
Use this trial-and-error method to find the number of games of each kind
that maximizes profit.

8.2. The symbolic translation

The need for a better understanding of inequalities becomes more evident if
we begin by translating the given data of the Manufacturing Problem into
symbolic language. (Review Chapter 2 at this point if necessary.)

Suppose the manufacturer produces x A games and y B games. Then the
resulting profit is a function of two variables, H(x, y), so that
H(x, y) = 125x + 185y for all whole numbers x and y.
We want to find two whole numbers x0 and y0 so that at ( x0 , y0 ), the profit
H ( x0 , y0 ) is a maximum, i.e.,
H ( x0 , y0 ) ≥ H ( x, y) for all whole numbers x and y.
A function ℓ of two variables is said to be a linear function of two variables if it
is of the form ℓ(x, y) = ax + by + c, where a, b, c are constants. Thus the profit
function H ( x, y) is an example of a linear function of two variables. We will see
that the linearity of H ( x, y) is of critical importance to the solution of the problem,
e.g., without the linearity, Lemma 8.3 on page 168 would not be applicable to
H ( x, y).
Now this problem, as stated, does not make sense because there can be no
maximum for the simple reason that the larger x and y are, the larger H ( x, y) =
125x + 185y gets. It is time that we take note of the fact that the whole numbers x
and y are not arbitrary but are under the constraints (a technical term for “restric-
tions”) that come with the problem. Because the game manufacturer can bring at
most 50 games, x and y are constrained by the inequality
x + y ≤ 50.
Her manufacturing budget imposes another constraint in that she has at most
$6000 to spend on the production. Therefore,
75x + 165y ≤ 6000.
There are also two other obvious but indispensable constraints that come with the
fact that x and y are whole numbers: x ≥ 0 and y ≥ 0. In summary then, we have
the following symbolic formulation of the problem:
(I) Among whole numbers x and y satisfying
$$\begin{cases} x \ge 0, \quad y \ge 0 \\ x + y \le 50 \\ 75x + 165y \le 6000 \end{cases} \tag{8.1}$$
find $x_1$ and $y_1$ so that at $(x_1, y_1)$, the profit H(x, y) = 125x + 185y
is a maximum.
Notice that (I) is an awkward problem: if we are going to work only with
whole numbers x and y, then we really have no tools to find the maximum of
H ( x, y) except to do guess-and-check. Such a strategy does not serve the purpose
of achieving any kind of mathematical understanding of the push-pull in this
problem (see the end of the preceding section). It turns out that a better strategy
is to break up the problem into two smaller problems:
[Step 1]: Allow both x and y to be real numbers and find the
ordered pair ( x0 , y0 ) (where x0 and y0 are now real numbers)
so that
H ( x0 , y0 ) ≥ H ( x, y)
for all real numbers x and y satisfying the constraints in (8.1).

[Step 2]: Check whether x0 and y0 are whole numbers and, if
not, use (x0, y0) as a stepping stone to find an ordered pair of
whole numbers (x1, y1) so that (x1, y1) is a solution of Problem (I).
With this in mind, we address [Step 1] by reformulating Problem (I) as follows:
(II) Among all real numbers x and y satisfying
$$\begin{cases} x \ge 0, \quad y \ge 0 \\ x + y \le 50 \\ 75x + 165y \le 6000 \end{cases} \tag{8.2}$$
find $x_0$ and $y_0$ so that at $(x_0, y_0)$, the profit H(x, y) = 125x + 185y
is a maximum.
If a given inequality becomes a linear equation (see pages 38 and 57) when the
inequality symbol “≤” is replaced by the equal sign, then we call the inequality a
linear inequality. Any of the inequalities in (8.1) (for whole numbers) and (8.2)
(for real numbers) is therefore a linear inequality.
The collection of all the points (x, y) satisfying these four constraints (8.2)
is now a region R in the plane. In a terminology that will be formally introduced
in Section 8.4, R is called the graph of these inequalities (in the plane).

Certain problems with whole-number data and whole-number answers are best solved by going beyond whole number arithmetic.

The profit function H(x, y) = 125x + 185y can now be thought of as a function
whose domain is this R. Observe that this part of the manufacturing problem has
now become a purely mathematical problem independent of any context: Among
the points in the graph R, at which point (x0, y0) in R does the profit function
H(x, y) = 125x + 185y achieve the maximum value, in the sense that
H(x0, y0) ≥ H(x, y) for all the points (x, y) in R? The point (x0, y0) is called a
maximum point of the profit function in R. (Similarly, H is said to achieve the
minimum value at a point (x0, y0) in R if H(x0, y0) ≤ H(x, y) for all the points
(x, y) in R. The point (x0, y0) is then called a minimum point of H in R.)
The virtue of Problem (II) is that it points to clearly defined mathematical
questions:
(A) What does the graph R of a collection of inequalities look like?
(B) Can we achieve enough of an understanding of the profit
function H(x, y) = 125x + 185y to predict where it might achieve
its maximum value in R?
In the following sections, we will do the necessary spade work for analyzing
graphs of inequalities.

Exercises 8.2
(1) Translate into symbolic language the following manufacturing problem
(no solution is required): A small firm tries to introduce two products, to
be called A and B. It has invested $60,000 in the production cost. It takes
$215 to produce one item of product A and $95 to produce one item of
product B. The projection is that it takes 3.2 hours to produce one item
of product A and 5.5 hours to produce an item of product B. Because its
manufacturing facilities are limited, the firm can only devote 1500 hours
to the production of these two products. Each item of product A brings
in a profit of $310 and each item of product B brings in a profit of $230.
Assuming that every item produced will be sold, how many items of
product A and how many items of product B should the firm produce in
order to maximize the profit?

8.3. Basic facts about inequalities and applications

Our first concern is with the behavior of inequalities with respect to arithmetic
operations. This topic has been treated in Section 2.6 of [Wu-PreAlg], but we will
briefly review the relevant facts without proof, because we will be using them
extensively without comment.
Let x, y, z, w be arbitrary numbers in the following discussion. Recall that the
inequality x < y (or written differently as y > x) means, by definition, that x is
to the left of y on the number line. Recall also the
Trichotomy law. Given two numbers x and y, then one and only
one of the following three possibilities holds: x = y, or x < y, or
x > y.
Two applications of this law can be found in the proofs of Corollary 1 and Corol-
lary 2 to Lemma 9.4 on page 209. There is also a weaker notion of inequality
in the form of x ≤ y, which means x is less than or equal to y, or in symbols,
x < y or x = y. For emphasis, we may sometimes explicitly refer to an inequality
involving “≤” as a weak inequality. Observe the following simple consequences
of the definitions:
x < y and y < z =⇒ x < z,
x ≤ y and y ≤ x =⇒ x = y.
The second one uses the trichotomy law, of course.
In the first five of the following assertions about inequalities, we state ev-
erything in terms of <, but they will all remain valid if “<” is replaced by “≤”
throughout for the same reasons. Moreover, while each of these inequalities has
been proved only for rational numbers x, y, z, w in [Wu-PreAlg], they remain valid
when x, y, z, w are arbitrary numbers, thanks to FASM (see page 265).
Recall that the symbol “ ⇐⇒ ” means “is equivalent to” and Q denotes the
rational numbers.
(A) For any x, y in Q, x < y ⇐⇒ − x > −y.
(B) For any x, y, z in Q, x < y ⇐⇒ x + z < y + z.
(C) For any x, y in Q, x < y ⇐⇒ y − x > 0.
(D) For any x, y, z in Q, if z > 0, then x < y ⇐⇒ xz < yz.
(E) For any x, y, z in Q, if z < 0, then x < y ⇐⇒ xz > yz.
We also recall from Section 2.6 of [Wu-PreAlg] that the following two proper-
ties follow from (A)–(E):
(8.3) For any nonzero x in Q, $x > 0 \iff \frac{1}{x} > 0$.

Let x, y, z be in Q. Then:
(8.4) Let x < y. If z > 0, then $\frac{x}{z} < \frac{y}{z}$, but if z < 0, then $\frac{x}{z} > \frac{y}{z}$.
Here is one simple application of (B) and (D).
Example 1. Exhibit all the numbers x on the number line that satisfy
(5 − x ) + 12 > 4 − (3x − 5).
The set of all these numbers x is called the graph of (5 − x) + 12 > 4 −
(3x − 5) on the number line, and Example 1 is usually expressed as: Graph the
inequality (5 − x ) + 12 > 4 − (3x − 5) on the number line.
As in the case of solving linear equations, one simply isolates the variable
x in the inequality, in the sense of transposing all the x’s to one side of the
inequality by making repeated use of (B) (compare page 44 for a similar concept).
(5 − x ) + 12 > 4 − (3x − 5) ⇐⇒ (5 − x ) > 4 − (3x − 5) − 12.
Since 4 − (3x − 5) − 12 = −3 − 3x, we have:
(5 − x ) + 12 > 4 − (3x − 5) ⇐⇒ 5 − x > −3 − 3x ,
which, by (B) again, is equivalent to 5 − x + (3x − 5) > −3 − 3x + (3x − 5), i.e.,
equivalent to 2x > −8. By (D), this is equivalent to $\frac{1}{2} \cdot 2x > \frac{1}{2} \cdot (-8)$, i.e., x > −4.
Thus we see that
(5 − x ) + 12 > 4 − (3x − 5) ⇐⇒ x > −4.
In other words, x satisfies (5 − x ) + 12 > 4 − (3x − 5) if and only if x satisfies
x > −4. These x’s therefore can be represented by the thickened semi-infinite
line segment below.

[Figure: the number line with −4 and 0 marked; the part to the right of −4 is thickened.]
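A quick numerical spot check of Example 1 can be run in Python. It is only a sanity check: finitely many sample points prove nothing, and the equivalence itself comes from (B) and (D). The function names are hypothetical.

```python
from fractions import Fraction

def original(x):
    """The inequality of Example 1: (5 - x) + 12 > 4 - (3x - 5)."""
    return (5 - x) + 12 > 4 - (3 * x - 5)

def simplified(x):
    """The equivalent inequality obtained in the text: x > -4."""
    return x > -4

# Exact sample points from -10 to 10 in steps of 1/4, including -4 itself.
samples = [Fraction(n, 4) for n in range(-40, 41)]
mismatches = [x for x in samples if original(x) != simplified(x)]
print(mismatches)  # prints []
```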
Things get a bit more interesting when absolute value appears in inequalities.
Recall (again, see Section 2.6 of [Wu-PreAlg]) that for any number x, the absolute
value | x| of x is by definition:
|x| = the distance of x from 0.
In particular, |x| ≥ 0 for every number x, and |x| = 0 is equivalent to x = 0.
There are two basic properties of absolute value. The first is:
(8.5) | xy| = | x | · |y| for all numbers x and y.
The next is the Triangle inequality:
(8.6) | x + y| ≤ | x | + |y| for all numbers x and y.
A key point about absolute value is that the inequality | x | < b for numbers
x and b (b > 0) can be expressed directly in terms of ordinary inequalities. Let us
introduce for this purpose the double inequality a ≤ b ≤ c, where a, b, c are
numbers, to stand for the two inequalities:
a ≤ b and b ≤ c.

Then we have:
Let x, c be arbitrary numbers and let ε be a positive number. Then
|x − c| ≤ ε is equivalent to the double inequality c − ε ≤ x ≤ c + ε.

[Figure: the number line with c − ε, c, and c + ε marked; x lies in the segment from c − ε to c + ε.]
A useful observation about absolute values is the following:

Lemma 8.1. For two numbers x and y, | x − y| is the distance between x and y.

Proof. We split the proof into three cases: Case 1: both x and y are positive. Case
2: one is positive and the other is negative. Case 3: both are negative.
Let us prove the first case, and we leave the remaining cases to an exercise
(Exercise 1 on page 163). Thus suppose both x and y are positive. Since |y − x | =
| x − y|, we may assume x < y, so that |y − x | = y − x. The lemma is then
obvious. (You may find it instructive to recall that, since y and x are the lengths of
the segments [0, y] and [0, x ], respectively, the definition of subtraction therefore
implies, literally, that y − x is the length of the remaining segment when [0, x ] has
been taken away from [0, y], which is to say, it is the length of the segment [ x, y].
See the definition of subtraction in Section 1.3 of [Wu-PreAlg].)

[Figure: the number line with 0, x, and y marked; the segment [x, y] has length y − x.]


Describe all the numbers x so that |3 + x | < 12 .

The following examples give a good indication of how to handle inequalities
which contain absolute values by making use of these facts.
Example 2. Graph |2x + 3| − 6 < 2 on the number line.
The inequality is equivalent to |2x + 3| < 8, which is equivalent to −8 < 2x + 3 < 8.
The left inequality is −8 < 2x + 3 ⇐⇒ −11 < 2x ⇐⇒ −5.5 < x. The right inequality
is 2x + 3 < 8 ⇐⇒ 2x < 5 ⇐⇒ x < 2.5. Thus |2x + 3| − 6 < 2 is equivalent to
−5.5 < x < 2.5, and the graph of |2x + 3| − 6 < 2 is the open interval (−5.5, 2.5),
displayed as the thickened segment below (not including the endpoints):

[Figure: the number line with −6, −5, 0, 2.5, and 5 marked; the segment from −5.5 to 2.5 is thickened.]
Example 3. Graph |6 + 2x| ≥ 1 on the number line.
We want to change the left side to something like |x − a| for some number
a, because we want to apply Lemma 8.1. With this in mind, the inequality is
equivalent to $\frac{1}{2}|6 + 2x| \ge \frac{1}{2} \cdot 1$, which is equivalent to $|\frac{1}{2}| \cdot |6 + 2x| \ge \frac{1}{2}$, which in
turn is equivalent to $|\frac{1}{2}(6 + 2x)| \ge \frac{1}{2}$ (by (8.5)), i.e., $|x + 3| \ge \frac{1}{2}$. Since |3 + x| =
|x − (−3)|, the original inequality is therefore equivalent to $|x - (-3)| \ge \frac{1}{2}$. By
Lemma 8.1, this means we have to find all the points x so that their distance from
−3 is greater than or equal to $\frac{1}{2}$. From the picture,

[Figure: the number line with $-3\frac{1}{2}$, −3, and $-2\frac{1}{2}$ marked.]

we see that the graph is the union of two semi-infinite segments: the segment
to the left of $-3\frac{1}{2}$ and including $-3\frac{1}{2}$, and the segment to the right of $-2\frac{1}{2}$ and
including $-2\frac{1}{2}$.
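Examples 2 and 3 follow one pattern: by (8.5), |ax + b| = |a| · |x − (−b/a)| when a ≠ 0, so dividing by |a| turns "|ax + b| compared with c" into "|x − center| compared with c/|a|", and Lemma 8.1 then reads off the graph. Here is a minimal Python sketch of that rewriting, assuming a ≠ 0 and c > 0; the function name `distance_form` is hypothetical.

```python
from fractions import Fraction

def distance_form(a, b, c):
    """Rewrite |a*x + b| (compared with c) as |x - center| (compared with radius).

    By (8.5), |a*x + b| = |a| * |x - (-b/a)|; dividing both sides of the
    inequality by |a| > 0 keeps its direction. Assumes a != 0 and c > 0.
    Returns (center, radius, left endpoint, right endpoint).
    """
    if a == 0 or c <= 0:
        raise ValueError("need a != 0 and c > 0")
    center = Fraction(-b, a)
    radius = Fraction(c) / abs(a)
    return center, radius, center - radius, center + radius
```

For Example 2, `distance_form(2, 3, 8)` gives center −3/2 and radius 4, so |2x + 3| < 8 is the open interval (−11/2, 5/2) = (−5.5, 2.5). For Example 3, `distance_form(2, 6, 1)` gives center −3 and radius 1/2, so |6 + 2x| ≥ 1 means x ≤ −7/2 or x ≥ −5/2. The function only locates the endpoints; whether the graph is the segment between them or the two rays outside them depends on the direction of the inequality.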


Graph |6 − 2x | ≥ 2 on the number line.

Exercises 8.3

(1) Complete the proof on page 162 about | x − y| being the distance between
x and y by proving Cases 2 and 3.
(2) (a) Graph the inequality $\frac{2}{3}x - (2 + 7x) \ge (6 + x) - (1 - \frac{1}{2}x)$ on the
number line. (b) Graph the inequality $\frac{2}{5} - \frac{1}{2}x \ge \frac{1}{5}x + \frac{1}{6}$ on the number line.
(3) Graph on the number line each of the following: (i) |x| − 14 > −8.
(ii) | x | − 4 < 13 . (iii) 9 − |3x − 1| < 4. (iv) |2x + 35 | ≥ 15 .
(v) |6x + 1| + 2 4 < 5.

8.4. Graphs of inequalities in the plane

We can now begin to tackle question ( A) on page 159, i.e., what does “the graph
of a collection of inequalities” look like in the plane? First, we need a formal
definition of the graph of a linear inequality of two variables

ax + by ≥ c

(where a, b, c are given constants): it is the set of all the points ( x, y) in the plane
whose coordinates x and y satisfy this inequality, i.e., ax + by ≥ c. For example,
the point (1, 2) does not lie on the graph of 3x + 2y ≥ 25, for the simple reason
that (3 × 1) + (2 × 2) = 7 < 25, whereas (10, 10) is easily seen to lie on this
graph. The graph of ax + by > c is defined in like manner, as are the graphs
of ax + by ≤ c and ax + by < c. It is customary in mathematics to denote the
graph of an inequality such as ax + by ≥ c by the notation { ax + by ≥ c}, and
we will use this notation below.
Given a collection of linear inequalities of two variables, the graph of the
inequalities is by definition the set of all the points which satisfy each of the
inequalities in the collection. It follows that the graph of a collection of inequalities is
the intersection of all the graphs of the individual inequalities.
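Because these definitions are so concrete, membership in a graph is a one-line test. The Python sketch below (hypothetical names) stores each inequality ax + by ≥ c as a triple (a, b, c); an inequality with “≤” is first turned around using (E), e.g., x + y ≤ 50 becomes −x − y ≥ −50. The graph of a collection is then tested point by point as an intersection.

```python
def on_graph(ineq, point):
    """True if point (x, y) lies on the graph {a x + b y >= c}, where ineq = (a, b, c)."""
    a, b, c = ineq
    x, y = point
    return a * x + b * y >= c

def on_graph_of_collection(inequalities, point):
    """The graph of a collection is the intersection: every inequality must hold."""
    return all(on_graph(ineq, point) for ineq in inequalities)

# The constraints (8.2), each rewritten in the form a x + b y >= c.
REGION_R = [
    (1, 0, 0),            # x >= 0
    (0, 1, 0),            # y >= 0
    (-1, -1, -50),        # x + y <= 50
    (-75, -165, -6000),   # 75x + 165y <= 6000
]
```

For example, `on_graph((3, 2, 25), (1, 2))` is False and `on_graph((3, 2, 25), (10, 10))` is True, matching the text; and (40, 10) lies in R while (0, 50) does not.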

The concepts of the graph of an inequality, half-planes, above a line, and below a line all need precise definitions.

Note that the concept of the graph of an inequality or a collection of inequalities
is usually used in TSM¹ without an explicit definition. When this happens, no
mathematical reasoning will be possible in any discussion concerning graphs of
linear inequalities and, indeed, reasoning plays little or no role in such
discussions in TSM.
To describe the graph of an inequality in greater detail, we will show that
every line separates the plane into two half-planes. Now the “half-planes” of a given
line have been introduced in Section 4.4 of [Wu-PreAlg] intuitively, but with the
availability of a coordinate system in the plane, we will now define this concept
precisely. For the y-axis, this is easy: its two half-planes are the left half-plane
L− and the right half-plane L+ consisting of all the points with negative x-
coordinates and those with positive x-coordinates, respectively (see the left picture
below). Similarly, every vertical line defined by x = c separates the plane into the
half-plane L− consisting of all the points ( x, y) so that x < c and the half-plane
L+ consisting of those with x > c (see the right picture below).


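[Figure: left, the y-axis through O with the half-planes L− (left) and L+ (right); right, the vertical line x = c with the half-planes L− (left) and L+ (right).]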


If L is the line defined by x = 2, is (3.1, 1.5) in L+? (3.1, −1)? (−1, 2)?
$(1.5, -\frac{1}{2})$? (2, 2)? (0, 2)?

When the given line L is horizontal, such as when L is the graph of y = c,
then the two half-planes L+ and L− are still easy to define: they consist of the points
with y-coordinates > c and those with y-coordinates < c, respectively.


When L is neither vertical nor horizontal, it is intuitively clear that there are still
points that are “above” L and those that are “below” L, as shown below.

[Figure: a line L, neither vertical nor horizontal, in the coordinate plane with origin O.]

¹ See page xi for a definition.

However, describing these two “halves” precisely is more subtle. We
propose to test whether a point (s, t) is “above” or “below” L by passing a vertical
line ℓ through (s, t) and letting it intersect L; since L is not vertical, L and ℓ must
intersect, let us say at a point P. Since P and (s, t) lie on the same vertical line,
they have the same x-coordinate, namely s. Therefore let the coordinates of P be
(s, y0 ). Now we compare t, the y-coordinate of (s, t), with y0 , the y-coordinate of
P. We see that if t > y0 , then, pictorially, (s, t) lies above P, as is shown in the
following picture:

[Figure: the point (s, t) directly above the point P = (s, y0) on L.]


Similarly, if t < y0 , then (s, t) lies below P.

Formally, given a nonvertical line L, we define a point (s, t) to be above L
if the vertical line passing through (s, t) meets L at a point (s, y0 ) and t > y0 .
Similarly, (s, t) is said to be below L if the vertical line passing through (s, t)
meets L at (s, y0 ) and t < y0 . The collection of all the points above L will be
denoted by L+ and the collection of all the points below L will be denoted by L− .
(Notice that these definitions are consistent with the preceding definitions of the
half-planes of a horizontal line.) These sets, L+ and L− , associated with a line L
will be referred to as the half-planes of the line L. At times we will refer to L+
as the upper half-plane and L− as the lower half-plane. The following theorem
describes the basic properties of these half-planes.
First, we introduce a definition: a geometric figure in the plane is said to be
nonempty if it has at least one point in it; otherwise we say it is empty. Thus, an
empty figure (or set) is a figure with no elements in it, e.g., the collection of all
points below the x-axis whose coordinates are of the form (x, b²) for some
numbers x and b is an empty set.

Theorem 8.2 (Plane separation). A line L divides the coordinate plane into two
nonempty half-planes, L+ and L− , with the following properties:
(i) The plane is the disjoint union of L, L+ , and L− , in the sense
that the union of L, L+ , and L− is the whole plane and no two of
these sets have any point in common.
(ii) If two points P and Q in the plane belong to the same half-plane,
then the line segment PQ lies in the same half-plane.




(iii) If two points P and Q in the plane belong to different half-planes,
then the line segment PQ must intersect the line L.




Proof of the theorem. This proof is quite long (it ends on page 172), so this proof
is not one that you will learn for the purpose of presenting it to your students
in class one day. However, it is given in great detail here because the reasoning
is extremely instructive, in much the same way that the reasoning in the proof of
Theorem 4.2 on pages 60 ff. is instructive. You will get to see how precise defi-
nitions are the foundation for reasoning, why the concept of a function is useful,
and why it is important to know the detailed interplay between the algebra and
the geometry of a linear equation in two variables. In particular, Lemma 8.3 on
page 168 is not only instrumental for the proof of Theorem 8.2, but it will be seen
to be crucial for the solution of the manufacturing problem in the next section.
When all is said and done, however, this proof deserves to be learned because ev-
ery middle school teacher should have a good idea of how to approach the proof
of a basic theorem such as Theorem 8.2. If we want to encourage students to
always ask why and be at ease with reasoning, then we have to start this tradition
at home and be prepared to ask and answer these questions ourselves.
Suppose L is vertical; it is obvious that property (i) holds. The fact that (ii)
and (iii) also hold when L is vertical is simple to prove and will be left as an
exercise (Exercise 6 on page 179).
For the rest of the proof here, we will assume L is nonvertical.
Proof of property (i). Since L is not vertical,
L is defined by y = mx + k for some constants m and k.
See Theorem 4.2 on page 60. We recall the definitions of L+ and L− : these are all
the points above and below L, respectively. More precisely, let ( x, y) be any point
in the plane not on L. Then the vertical line passing through ( x, y) will intersect
L at a point whose first coordinate is x and whose second coordinate is mx + k
(because (x, mx + k) lies on L). Then, by definition,

(8.7) ( x, y) is in L+ ⇐⇒ y > mx + k,
(8.8) ( x, y) is in L− ⇐⇒ y < mx + k.
The following picture shows the case of (x, y) below L:

[Figure: the point (x, y) below L and the point (x, mx + k) of L on the same vertical line.]

Again, it is obvious that there can be no point in common between any two of
L+ , L− , and L, that each is nonempty, and that their union is the whole plane.
We have therefore proved that L+ , L− , and L satisfy property (i) of the theorem.
Proof of property (ii). We will deal with L+; the proof for L− is similar (see Exercise 6 on page 179). Thus let P and Q be points in L+. We must prove that the segment PQ also lies in L+. Precisely, we will prove: if P and Q are above L and S is a point on the segment PQ, then S is also above L. To this end, we are going to introduce a function h : R² → R (see page 56 for the notation R²) so that h(x, y) measures how far the point (x, y) is, vertically, above L or below L. Recall that L is the graph of the equation y = mx + k and, as we have seen, the vertical line passing through (x, y) intersects L at the point (x, mx + k), as shown:

[Figure: a point (x, y) in L+, the point (x, mx + k) of L directly below it, and the vertical distance h(x, y) between them.]
By definition, h(x, y) = y − (mx + k). This function h is clearly a linear function of two variables (see page 158). From (8.7) and (8.8), we have:
h( x, y) > 0 ⇐⇒ ( x, y) is above L, i.e., ( x, y) is in L+ .
h( x, y) < 0 ⇐⇒ ( x, y) is below L, i.e., ( x, y) is in L− .
h( x, y) = 0 ⇐⇒ ( x, y) is on L.
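The three equivalences above amount to a sign test, which the following Python sketch carries out. The function names h and side and the sample line y = 2x + 1 are illustrative choices, not taken from the text; exact rational arithmetic (Python's fractions module) is used so that a point on L is not misclassified through rounding.

```python
from fractions import Fraction

def h(m, k, x, y):
    """h(x, y) = y - (mx + k): signed vertical offset of (x, y) from y = mx + k."""
    return y - (m * x + k)

def side(m, k, x, y):
    """Classify (x, y) as in L+, in L-, or on L by the sign of h."""
    value = h(m, k, x, y)
    if value > 0:
        return "L+"
    if value < 0:
        return "L-"
    return "L"

# Hypothetical sample line: y = 2x + 1.
m, k = Fraction(2), Fraction(1)
print(side(m, k, 0, 3))   # 3 > 2*0 + 1, so L+
print(side(m, k, 1, 0))   # 0 < 2*1 + 1, so L-
print(side(m, k, 2, 5))   # 5 = 2*2 + 1, so on L
```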
In terms of the function h, property (ii) of Theorem 8.2 now states: given
two points P and Q so that h( P) > 0 and h( Q) > 0, if a point S lies on PQ,
then also h(S) > 0. In this form, the next lemma, Lemma 8.3, clearly implies
property (ii). The essential content of the lemma is that for a linear function of two variables f(x, y), the maximum or minimum among the values it assigns to points in a segment PQ is achieved at the endpoints (see page 56 for the notation R² in the lemma).2
Lemma 8.3. Let f(x, y) be a linear function of two variables, f : R² → R, and let PQ be a given segment in the plane. Then for any point S in PQ, either f(S) = f(P) = f(Q), or f(S) is between f(P) and f(Q).

[Figure: a segment PQ containing a point S; the first coordinates of P, S, Q are p < s < q.]

Proof of Lemma 8.3. First assume the line L_PQ joining P and Q is not vertical. Then L_PQ is the graph of an equation y = μx + κ for some constants μ and κ (Theorem 4.2 on page 60).3 Thus P = (p, μp + κ) and Q = (q, μq + κ) for some constants p and q. We may assume without loss of generality that p < q. Let f(x, y) = ax + by + c for some constants a, b, and c. Therefore,
f ( P) = ap + b(μp + κ ) + c = ( a + bμ) p + (bκ + c).
f ( Q) = ( a + bμ)q + (bκ + c).
If S is between P and Q, then S = (s, μs + κ ) for some constant s, and
p < s < q
(see the definition of between on page 266). We also have
f (S) = ( a + bμ)s + (bκ + c).
Obtaining these explicit expressions for f ( P), f (S), and f ( Q) is the key point of
this proof. Once that is done, the rest of the proof is nothing more than a straightforward
computation with these expressions to arrive at the desired conclusions.
We will in fact prove something slightly more precise, namely:
(i) If f(P) = f(Q), then f(S) = f(P) = f(Q), i.e., f is constant on PQ.
(ii) If f(P) ≠ f(Q), then f(S) is between f(P) and f(Q).
The lemma is easily seen to follow from (i) and (ii) (when L_PQ is not vertical).
We first prove (i). If f(P) = f(Q), then
(a + bμ)p + (bκ + c) = (a + bμ)q + (bκ + c),
which implies (a + bμ)p = (a + bμ)q or, equivalently, (a + bμ)(p − q) = 0. Since p < q, (p − q) ≠ 0 and therefore (a + bμ) = 0. Hence in this case,
f(S) = f(P) = f(Q) = (bκ + c).

2 From an advanced standpoint, this lemma is about linear functions of one variable, in the following sense. If PQ is taken to be the image of a linear mapping φ : [a, b] → R², then the composition f ◦ φ : [a, b] → R is a linear function of one variable, and the maximum or minimum of a linear function of one variable on a closed interval is achieved at the endpoints of the interval.
3 The letters μ and κ are the lower case Greek letters for “m” (mu) and “k” (kappa), respectively. We use these because we have run out of appropriate lower case Latin letters.
Next, we prove (ii). Suppose f(P) ≠ f(Q); then
(a + bμ)p + (bκ + c) ≠ (a + bμ)q + (bκ + c)
so that (a + bμ)p ≠ (a + bμ)q and (a + bμ)(p − q) ≠ 0. Thus, (a + bμ) ≠ 0.
Either ( a + bμ) > 0 or ( a + bμ) < 0. If ( a + bμ) > 0, then on account of
p < s < q, we have
( a + bμ) p < ( a + bμ)s < ( a + bμ)q
(see (D) on page 160). This implies
( a + bμ) p + (bκ + c) < ( a + bμ)s + (bκ + c) < ( a + bμ)q + (bκ + c).
Equivalently, f ( P) < f (S) < f ( Q), i.e., f (S) is between f ( P) and f ( Q). If,
instead, ( a + bμ) < 0, then the fact that p < s < q and (E) on page 160 together
imply that
( a + bμ) p > ( a + bμ)s > ( a + bμ)q
and therefore
( a + bμ) p + (bκ + c) > ( a + bμ)s + (bκ + c) > ( a + bμ)q + (bκ + c).
Equivalently, f(P) > f(S) > f(Q) and, once again, f(S) is between f(P) and f(Q). The proof of Lemma 8.3 is complete in case L_PQ is not vertical.
Finally, suppose L_PQ is vertical. Then P, S, and Q have the same first coordinate, let us say, p. Let P = (p, p'), S = (p, s'), and Q = (p, q'). Since S is between P and Q, we may assume without loss of generality that p' < s' < q'.

[Figure: the points P = (p, p'), S = (p, s'), and Q = (p, q') on the vertical line through (p, 0).]

From f(x, y) = ax + by + c, we have
f(P) = ap + bp' + c, f(S) = ap + bs' + c, f(Q) = ap + bq' + c.
To prove (i), suppose f(P) = f(Q). Then bp' = bq', so that b(p' − q') = 0. Since p' < q', we have b = 0. In that case,
f(P) = f(Q) = f(S) = ap + c.
Now we prove (ii). If f(P) ≠ f(Q), then bp' ≠ bq' and b(p' − q') ≠ 0. In particular, b ≠ 0. If b > 0, then p' < s' < q' implies bp' < bs' < bq', which in turn implies
ap + bp' + c < ap + bs' + c < ap + bq' + c.
Hence f(P) < f(S) < f(Q), and f(S) is between f(P) and f(Q). If, however, b < 0, then the by-now familiar argument proves that
ap + bp' + c > ap + bs' + c > ap + bq' + c
and therefore f(P) > f(S) > f(Q). So f(S) is again between f(P) and f(Q).
The proof of Lemma 8.3 is complete.
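The conclusion of Lemma 8.3 can be spot-checked with exact arithmetic. The Python sketch below is only a check at finitely many points, not a proof, and it parametrizes the segment as (1 − t)P + tQ in the spirit of footnote 2 rather than by the coordinate argument above; the function names and the sample data f(x, y) = 3x − 2y + 5, P = (−1, 4), Q = (6, 1) are hypothetical.

```python
from fractions import Fraction

def f(a, b, c, point):
    """The linear function of two variables f(x, y) = ax + by + c."""
    x, y = point
    return a * x + b * y + c

def point_between(P, Q, t):
    """(1 - t)P + tQ; for 0 < t < 1 this point lies strictly between P and Q."""
    return tuple((1 - t) * p + t * q for p, q in zip(P, Q))

def lemma_8_3_holds(a, b, c, P, Q, S):
    """Check the conclusion of Lemma 8.3 at one point S of the segment PQ."""
    fP, fQ, fS = f(a, b, c, P), f(a, b, c, Q), f(a, b, c, S)
    if fP == fQ:
        return fS == fP                     # case (i): f is constant on PQ
    return min(fP, fQ) < fS < max(fP, fQ)   # case (ii): strictly between

# Hypothetical data: f(x, y) = 3x - 2y + 5 on the segment from (-1, 4) to (6, 1).
a, b, c = 3, -2, 5
P, Q = (Fraction(-1), Fraction(4)), (Fraction(6), Fraction(1))
samples = [point_between(P, Q, Fraction(j, 10)) for j in range(1, 10)]
print(all(lemma_8_3_holds(a, b, c, P, Q, S) for S in samples))
```

Here f(P) = −6 and f(Q) = 21, so case (ii) is the one being exercised; a function such as x + y on the segment from (0, 2) to (2, 0) exercises case (i).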


Let L be the graph of y = (1/2)x, and let P = (0, 1/2) and Q = (3, 2). (a) Verify that P and Q are in L+. (b) Produce two points on the segment PQ, and verify directly, without using Theorem 8.2, that they also lie in L+.

Proof of property (iii). Recall that the line L is defined by y = mx + k. Let P and Q be points in L+ and L−, respectively; we have to prove that the segment PQ intersects L.
The case of a vertical L_PQ is simpler and will be left as an exercise (see Exercise 7 on page 179). From now on, we assume that L_PQ is not vertical. Thus L_PQ is defined by an equation y = cx + d for some constants c and d. We first tackle a simpler problem by showing that the lines L and L_PQ intersect. For this, it suffices to show that the following linear system has a solution (see Theorem 5.1 on page 86):

$$
\begin{cases}
y = mx + k \\
y = cx + d
\end{cases}
\tag{8.9}
$$
A solution for x of this system comes from solving mx + k = cx + d, and we see immediately that this equation has a unique solution if m ≠ c. Now m is the slope of L and c is the slope of L_PQ (see Lemma 4.8 on p. 78). We are in fact going to show that c < m by a direct computation, as follows. Let P = (p, p') and Q = (q, q'). We may assume p < q. Let the vertical lines passing through P and Q intersect L at U = (p, u') and V = (q, v'), respectively, as shown:

[Figure: P = (p, p') lies above U = (p, u') on L, Q = (q, q') lies below V = (q, v') on L, and L_PQ meets L at (x0, y0), with p < x0 < q.]

Using P and Q to compute the slope c of L_PQ and using U and V to compute the slope m of L, we get

$$
c = \frac{q' - p'}{q - p} \quad\text{and}\quad m = \frac{v' - u'}{q - p}.
\tag{8.10}
$$

Because P is in L+ and Q is in L−, we have, by virtue of (8.7) and (8.8) on page 167, that

$$
p' > u' \quad\text{and}\quad q' < v'.
\tag{8.11}
$$

Therefore q' − v' < 0 < p' − u', as the left side is negative and the right side is positive. By (B) on page 160, the fact that q' − v' < p' − u' implies

$$
q' - p' < v' - u'.
\tag{8.12}
$$

Moreover, since we are assuming p < q, (q − p) > 0. Therefore, dividing both sides of (8.12) by q − p preserves the inequality (by (8.4) on page 161). Hence,

$$
\frac{q' - p'}{q - p} < \frac{v' - u'}{q - p}.
$$

According to (8.10), this means
c < m.
Therefore the equation mx + k = cx + d has a unique solution:

$$
x_0 = \frac{d - k}{m - c}.
\tag{8.13}
$$
Since this x0 is a solution of the system (8.9), x0 is the first coordinate of the point (x0, y0) where the line L_PQ intersects L. (We are not concerned with the exact value of y0 because it is irrelevant to what we are doing.)
Recall that property (iii) requires the proof that the segment PQ intersects L, i.e., the point (x0, y0) is between P and Q on L_PQ. The time has come for us to recall precisely what it means to say S is between P and Q (see page 266). What we observe is that since every point on L_PQ is of the form (x, cx + d) for a number x, the first coordinate of the point then confers a number-line structure on L_PQ, e.g., the point 0 on L_PQ would be the point (0, d) and unit 1 on L_PQ would be the point (1, c + d). Since we are assuming that p < q, the fact that (x0, y0) is between P and Q therefore means:
p < x0 < q.
We first prove the left inequality. By (8.13), we have:

$$
\begin{aligned}
p < x_0 &\iff p < \frac{d - k}{m - c} \\
&\iff (m - c)p < d - k \qquad (\text{because } m - c > 0) \\
&\iff mp - cp < d - k.
\end{aligned}
$$

By (B) on page 160, mp − cp < d − k ⇐⇒ mp + k < cp + d. Thus

p < x0 ⇐⇒ mp + k < cp + d.

But mp + k = u', the second coordinate of U, and cp + d = p', the second coordinate of P. Thus the right side is a valid inequality because of (8.11), and therefore it is true that p < x0. Similarly, to prove x0 < q, again by (8.13), we have:

$$
\begin{aligned}
x_0 < q &\iff \frac{d - k}{m - c} < q \\
&\iff d - k < (m - c)q \qquad (\text{because } m - c > 0) \\
&\iff d - k < mq - cq.
\end{aligned}
$$

Using (B) on page 160 once more, we see that d − k < mq − cq ⇐⇒ cq + d < mq + k, which is the same as q' < v'. By (8.11), the last inequality is valid, and therefore so is x0 < q. Consequently, (x0, y0) is between P and Q after all, and the proof of property (iii) is complete. The proof of Theorem 8.2 is therefore also complete.
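As an illustration of the computation (not part of the proof), the Python sketch below computes x0 by formula (8.13) and confirms p < x0 < q for one example. The function name crossing_x and the data (L: y = x, P = (0, 3) above L, Q = (4, 1) below L) are hypothetical; they are deliberately not the data of the activity that follows.

```python
from fractions import Fraction

def crossing_x(m, k, P, Q):
    """First coordinate x0 of the point where L_PQ meets L: y = mx + k, via (8.13).

    Assumes P = (p, p') is above L, Q = (q, q') is below L, and p < q.
    """
    (p, p_prime), (q, q_prime) = P, Q
    c = (q_prime - p_prime) / (q - p)   # slope of L_PQ, as in (8.10)
    d = p_prime - c * p                 # chosen so that P lies on y = cx + d
    assert c < m, "expected c < m, as proved above"
    return (d - k) / (m - c)            # formula (8.13)

# Hypothetical data: L is y = x, P = (0, 3) lies above L, Q = (4, 1) lies below L.
m, k = Fraction(1), Fraction(0)
P, Q = (Fraction(0), Fraction(3)), (Fraction(4), Fraction(1))
x0 = crossing_x(m, k, P, Q)
print(x0, P[0] < x0 < Q[0])   # prints: 2 True
```

Here c = −1/2 and d = 3, so x0 = 3/(3/2) = 2, which indeed lies strictly between 0 and 4.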


Let L be the graph of y = −(1/2)x + 3, and let P = (12, 0) and Q = (−1, 2). Where does L intersect PQ?

The definition of half-planes given on page 165 is intuitive enough, but it is somewhat clumsy in applications (such as solving the manufacturing problem in Section 8.1). The following characterization of half-planes is a refinement of Theorem 8.2.

(Margin note: The fact that the graph of a linear inequality is a half-plane is a theorem to be proved rather than a factoid to be memorized by rote.)

For the precise statement of the theorem, let a line L be given so that its defining equation is ax + by = c. We will use ℓ to denote the corresponding linear function of two variables ℓ : R² → R so that, with the same a and b as in the equation ax + by = c, ℓ(x, y) = ax + by for every (x, y). In this notation, L is just the level set {ℓ = c} (see page 129 for the definition). Given any number k, the graph of the inequality ax + by > k can now be denoted more simply as {ℓ > k}, and the graph of the inequality ax + by < k will likewise be denoted by {ℓ < k}. Thus, {ℓ > k} is the set of all the points (x, y) in the plane so that ax + by > k.
Theorem 8.4. Let a line L be given and let its equation be ax + by = c, where a, b, c are constants and at least one of a and b is not equal to 0. With ℓ as the function on the plane so that ℓ(x, y) = ax + by for all (x, y):
(i) The sets {ℓ < c} and {ℓ > c} are the two half-planes L+ and L− (defined on page 165) of the line L.
(ii) If b > 0, then L+ = {ℓ > c} and L− = {ℓ < c}. If b < 0, then L+ = {ℓ < c} and L− = {ℓ > c}.
Remark. We first explain the significance of this theorem. The definitions
of L+ and L− given on page 165 are purely geometric and do not involve the
defining equation of L. What Theorem 8.4 does is to show that these half-planes
can also be described algebraically in terms of the defining equation of L.
Part (ii) of this theorem makes part (i) more precise, but in practice, there is no
need for such a precise statement. There are, after all, only two half-planes of L, and
if it is a matter of deciding which of the two is {ax + by > c}, it can be done very simply, as follows. Take a point (x0, y0) of L+ and check whether the inequality ax0 + by0 > c is true or not. If it is, then (x0, y0) belongs to {ax + by > c} by definition and, necessarily, {ax + by > c} = L+. If it is not, then (x0, y0) does not belong to {ax + by > c}, so that {ax + by > c} cannot be L+. But since Theorem 8.4 guarantees that {ax + by > c} must be a half-plane, we conclude that {ax + by > c} = L−.
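The test-point procedure just described is easy to mechanize. The Python sketch below is a minimal version under the assumption b ≠ 0 (so that L is not vertical and "above/below" makes sense); it accepts any test point not on L rather than only a point of L+, and it compares the answer with the sign rule of Theorem 8.4(ii). All function names are hypothetical.

```python
from fractions import Fraction

def side_of_line(a, b, c, point):
    """'L+' if the point is above the line ax + by = c, 'L-' if below (requires b != 0).

    Uses the geometric definition: compare y with the second coordinate of the
    point of L on the same vertical line.
    """
    x, y = (Fraction(v) for v in point)
    y_on_L = (Fraction(c) - a * x) / b
    if y == y_on_L:
        raise ValueError("the test point lies on L")
    return "L+" if y > y_on_L else "L-"

def which_half_plane_is_greater(a, b, c, test_point):
    """Identify {ax + by > c} as 'L+' or 'L-' using one test point not on L.

    Theorem 8.4(i) guarantees that {ax + by > c} is one of the two half-planes.
    """
    x, y = test_point
    where = side_of_line(a, b, c, test_point)
    if a * x + b * y > c:
        return where
    return "L-" if where == "L+" else "L+"

def greater_half_plane_by_sign_of_b(b):
    """Theorem 8.4(ii): {ax + by > c} is L+ when b > 0 and L- when b < 0."""
    if b == 0:
        raise ValueError("Theorem 8.4(ii) needs b != 0")
    return "L+" if b > 0 else "L-"

# The inequalities 2x - 3y > -6, 3x - 2y > -5, and -2x + 3y > 0, each with a test point.
for (a, b, c), point in [((2, -3, -6), (-4, 2)), ((3, -2, -5), (0, 0)), ((-2, 3, 0), (-3, 0))]:
    print(which_half_plane_is_greater(a, b, c, point), greater_half_plane_by_sign_of_b(b))
```

The two methods agree on each line of output (L−, L−, and L+), which is what the Remark leads one to expect.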
Before proving the theorem, let us begin with an example.
Example 4. Let L be the line defined by 2x − 3y = −6 and let ℓ(x, y) be the function ℓ(x, y) = 2x − 3y. Using notation as in Theorem 8.4, we want to know whether the half-plane {ℓ > −6} is L+ or L−.
Take a random point (−4, 2). One can directly check that
ℓ(−4, 2) = 2(−4) − 3(2) = −14 < −6
and therefore (−4, 2) lies in {ℓ < −6}. On the other hand, the vertical line x = −4 passing through (−4, 2) intersects L at a point, and this point is easily seen to be (−4, −2/3) by solving for y in the equation 2x − 3y = −6 with x = −4. See the picture below.

[Figure: the line L: 2x − 3y = −6, crossing the axes at (−3, 0) and (0, 2), with the points (−4, 2), (−4, −2/3), (3, 4), and (3, 1) marked.]
Since 2 > −2/3, we see that (−4, 2) lies above (−4, −2/3) and is therefore in L+. Hence the half-plane {ℓ < −6}, which by Theorem 8.4 is one of the two half-planes and contains this point of L+, is L+. The answer to the original question is that the other half-plane, {ℓ > −6}, has to be L− and not L+.
What is noteworthy is the fact that one intuitively associates the half-plane {ℓ > c} (which is defined by the inequality “>”) with L+ (which is also defined by the inequality “>”), but in this case, {ℓ > −6} is equal to L−, not L+. It is instructive for the understanding of the proof of Theorem 8.4 to analyze the computation to see why ℓ(−4, 2) < −6. The crucial fact is that the coefficient −3 of y in ℓ(x, y) = 2x − 3y is negative. In greater detail, the point (−4, 2) sits above the point (−4, −2/3) because
2 > −2/3.
If we multiply both sides of this inequality by (−3), assertion (E) on page 160 implies that the inequality is reversed:
(−3)2 < (−3)(−2/3).
Therefore, we get an inequality for ℓ(−4, 2):
ℓ(−4, 2) = 2(−4) + (−3)2 < 2(−4) + (−3)(−2/3) = ℓ(−4, −2/3) = −6.
This is how the inequality ℓ(−4, 2) < −6 came about, and this is why (−4, 2) belongs to {ℓ < −6}.
To further bring out this idea, consider another point (3, 1) (see the preceding picture). The vertical line x = 3 passes through it and intersects L at (3, 4); since 1 < 4, we see that (3, 1) lies in L−. We now perform the same computation to verify that (3, 1) belongs to {ℓ > −6}. Indeed, 1 < 4 implies (−3)1 > (−3)4 by (E) on page 160, so that
ℓ(3, 1) = 2(3) + (−3)(1) > 2(3) + (−3)4 = ℓ(3, 4) = −6.
Thus ℓ(3, 1) > −6, as desired.
This example tells us how to prove Theorem 8.4.
Proof of Theorem 8.4. Part (ii) implies part (i), so we will prove part (ii). There
are two cases to consider: b > 0 and b < 0. We will prove the latter case because
it is more involved, and leave the case of b > 0 to Exercise 11 on page 180.
Thus we assume henceforth that b < 0 in ax + by = c. We will prove:
(8.14) L− = {ℓ > c}.
Take a point (x, y) in L−; we first show that (x, y) lies in {ℓ > c}. Referring to the picture below, let the vertical line passing through (x, y) intersect L at (x, y0). Since (x, y) is in L−, y < y0. Because b < 0, (E) on page 160 implies that
(8.15) by > by0.

[Figure: the vertical line through (x, y) meets L at the point (x, y0), which lies above (x, y).]

We now see that (x, y) belongs to {ℓ > c} because, by (8.15),
ℓ(x, y) = ax + by > ax + by0 = ℓ(x, y0) = c,
where the last step is because (x, y0) lies on L.
To complete the proof of (8.14), we have to show that every (x, y) in {ℓ > c} is a point of L− (see equal sets on page 267). Thus we are given that ℓ(x, y) > c. By Theorem 8.2, this (x, y) is either in L, or in L+, or in L−, and there are no other possibilities. If it is in L, then by the definition of L, ℓ(x, y) = c, and this contradicts ℓ(x, y) > c. Next, suppose (x, y) is in L+. We have just finished showing that every point (x, y) in L− must lie in {ℓ > c}; an entirely similar argument will show that every point of L+ must be in {ℓ < c}. Thus we would have ℓ(x, y) < c for this (x, y). Again this contradicts ℓ(x, y) > c. Hence by elimination, we are left with the conclusion that such an (x, y) in {ℓ > c} has to be a point of L−. The proof of (8.14) is complete.
The proof of L+ = {ℓ < c} is entirely similar. This proves Theorem 8.4.


Let ℓ(x, y) = 2x − 3y − 4, so that ℓ(3, 0) = 2. Let L be the level set of ℓ passing through (3, 0). Without using Theorem 8.4, directly determine the following four sets: L+, L−, {ℓ > 2}, and {ℓ < 2}.

The occasion will arise when more precision regarding half-planes is needed,
for the following reason. The half-planes L+ and L− do not include L, but we
shall see presently that there is sometimes a need to also consider half-planes
together with L itself. For this need, it will be advantageous to formally introduce
two common concepts regarding geometric figures (see page 267).4 Let A and B
be two figures in the plane. Then the union of A and B, to be denoted by A ∪ B,
is the totality of all the points that are in A or B, or both. Their intersection, to
be denoted by A ∩ B, is the totality of all the points that are in both A and B.
In this new language, we will refer to L+ ∪ L and L− ∪ L as the two closed
half-planes of L. The two closed half-planes are not disjoint as they have L in
common. If there is any fear of confusion, we will refer to L+ and L− as the two
open half-planes of L for emphasis.5
Theorem 8.4 allows us to see why the concept of a closed half-plane is relevant. Indeed, suppose we want to know the graph of the weak inequality ax + by ≤ c; it is natural to denote this graph by {ℓ ≤ c}. By Theorem 8.4, we know {ℓ < c} is one of L+ and L−. Let us say for definiteness that {ℓ < c} = L−. It follows that {ℓ ≤ c} is the closed half-plane L− ∪ L. Similarly, if we define {ℓ ≥ c} to be all the points (A, B) so that ℓ(A, B) ≥ c, and if {ℓ > c} = L+, then {ℓ ≥ c} is equal to the closed half-plane L+ ∪ L. Incidentally, we have {ℓ ≤ c} ∩ {ℓ ≥ c} = L.
Now recall that we are interested in the region R consisting of all the points satisfying the inequalities of (8.2). Therefore, by the definition of the graph of a collection of inequalities on page 163, R is the intersection of a finite number of closed half-planes.
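To make "intersection of closed half-planes" concrete, the Python sketch below stores each closed half-plane {ax + by ≤ c} as a triple (a, b, c) and tests membership in all of them at once. It assumes the four inequalities of (8.2) are those restated as (8.16) in Section 8.5; the names CONSTRAINTS_8_16 and in_region are hypothetical.

```python
# Each closed half-plane {a*x + b*y <= c} is stored as the triple (a, b, c).
# A ">=" inequality is first multiplied by -1, which reverses it.
CONSTRAINTS_8_16 = [
    (-1, 0, 0),        # x >= 0          becomes  -x <= 0
    (0, -1, 0),        # y >= 0          becomes  -y <= 0
    (1, 1, 50),        # x + y <= 50
    (75, 165, 6000),   # 75x + 165y <= 6000
]

def in_region(point, constraints):
    """True exactly when the point lies in every closed half-plane, i.e., in their intersection."""
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)

print(in_region((25, 25), CONSTRAINTS_8_16))   # True: on two of the boundary lines
print(in_region((40, 20), CONSTRAINTS_8_16))   # False: 40 + 20 > 50
```

Because the half-planes are closed (the test uses ≤), boundary points such as (25, 25) count as members of R.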
The following examples illustrate how to make use of Theorem 8.4.
Example 5. Graph 3x − 2y > −5 in the plane.
The line L defined by 3x − 2y = −5 is shown below.

[Figure: the line L: 3x − 2y = −5, with x-intercept −5/3.]

4 The concepts of union and intersection apply to any two sets (collections of objects).
5 This way of using “open” and “closed” is standard in mathematics, but one must be careful to
keep in mind the fact that a half-plane that is not closed may not necessarily be open. For example, the
union of the open upper half-plane together with the positive x-axis is not a closed half-plane, but it is
not open either.

The coefficient of y being −2 and therefore negative, Theorem 8.4(ii) says the
graph of 3x − 2y > −5 is L− . However, as we have mentioned more than once,
there is no reason to rely on part (ii) of Theorem 8.4 to make this determination.
This fact is more easily deduced by the cruder, but eminently practical, method of checking to which half-plane O belongs. Visibly, (0, 0) belongs to L− (the point of L with first coordinate 0 is (0, 5/2), and 0 < 5/2), but it also belongs to {3x − 2y > −5} as 0 > −5. Since Theorem 8.4 says {3x − 2y > −5} must be either L+ or L−, we know that it is L−.

Example 6. Find the graph of the pair of inequalities − x − 2y < 4 and −2x +
3y > 0, i.e., find all the ( x, y) that satisfy both inequalities.

This example asks for the totality of all the points in both graphs {−x − 2y < 4} and {−2x + 3y > 0}. In other words, we want the intersection of the graphs of the individual inequalities. Let L1 be the line −x − 2y = 4. Now (0, 0) lies above L1 (the point of L1 with first coordinate 0 is (0, −2)), and (0, 0) belongs to {−x − 2y < 4} because 0 < 4. Hence, by Theorem 8.4, the graph of −x − 2y < 4 is the upper half-plane L1+ of L1, as shown below.

[Figure: the line L1: −x − 2y = 4 and the half-plane {−x − 2y < 4} = L1+ above it.]

It remains to determine the graph of −2x + 3y > 0. Let L be the line defined by −2x + 3y = 0. Then the picture is the following:

[Figure: the line L: −2x + 3y = 0 through O, and the half-plane L+ = {−2x + 3y > 0} above it.]

Since the coefficient of y in −2x + 3y is 3 and 3 > 0, Theorem 8.4(ii) implies that {−2x + 3y > 0} is also the upper half-plane L+ of L. Another way is to check that the point (−3, 0) is in L+ because (−3, −2) is on L and 0 > −2, and it is also in {−2x + 3y > 0} because (−2)(−3) + 3 · 0 > 0. Thus by Theorem 8.4, {−2x + 3y > 0} = L+.

The graph of the pair −x − 2y < 4 and −2x + 3y > 0 is therefore the intersection of two half-planes: L+ ∩ L1+. This is the shaded region in the following picture, and one should take note of the fact that the region does not include the two rays on the boundary6 of the region.

[Figure: the shaded region L+ ∩ L1+, with −4 and O marked on the x-axis.]

The graph in Example 6 is an “unbounded region” in a sense that is self-explanatory
(although “unbounded” in this context can be precisely described in advanced
mathematics). In applications such as the manufacturing problem of this chapter,
however, the graph would tend to be a polygon with the edges included. As an
illustration of such a polygon, let us see how we can obtain one by a slight elab-
oration on Example 6. The graph of the two weak inequalities − x − 2y ≤ 4 and
−2x + 3y ≥ 0 is the intersection of the closed half-planes L+ ∪ L and L1+ ∪ L1 ,
and is the same shaded region as above plus the two rays. Call this region S .
Now consider not just the graph of this pair but the graph of this pair plus a third
inequality, namely, the graph of the three weak inequalities:
− x − 2y ≤ 4, −2x + 3y ≥ 0, and y ≤ 0.
We see that this graph is the intersection of S with the closed lower half-plane of the x-axis; it is therefore the following shaded triangular region together with the three edges:

[Figure: the shaded triangular region with vertices (−4, 0), O, and the point where L and L1 meet.]
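One way to double-check this triangle, a technique not used in the text, is to intersect the three boundary lines in pairs (by Cramer's rule) and keep the intersection points that satisfy all three weak inequalities. The Python sketch below does this with exact fractions; it assumes the region is bounded and no two boundary lines coincide, and the names TRIANGLE, intersect, satisfies_all, and vertices are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each weak inequality a*x + b*y <= c is stored as (a, b, c);
# -2x + 3y >= 0 is rewritten as 2x - 3y <= 0.
TRIANGLE = [
    (-1, -2, 4),   # -x - 2y <= 4   (boundary L1)
    (2, -3, 0),    # -2x + 3y >= 0  (boundary L)
    (0, 1, 0),     # y <= 0         (boundary: the x-axis)
]

def intersect(l1, l2):
    """Solve a1 x + b1 y = c1, a2 x + b2 y = c2 by Cramer's rule; None if parallel."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

def satisfies_all(point, constraints):
    """True if the point lies in every closed half-plane of the list."""
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)

def vertices(constraints):
    """Pairwise intersections of the boundary lines that satisfy every inequality."""
    result = []
    for l1, l2 in combinations(constraints, 2):
        point = intersect(l1, l2)
        if point is not None and satisfies_all(point, constraints):
            result.append(point)
    return result

for x, y in vertices(TRIANGLE):
    print(x, y)
```

The three points found are (−12/7, −8/7), where L and L1 meet, together with (−4, 0) and O.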


Finally, we bring closure to this section by answering the question: what does
the graph of a collection of weak inequalities look like? The answer is that it is the
intersection of a collection of closed half-planes. More can be said. A geometric
figure R in a plane is said to be convex if, given any two points A and B in R,
the segment AB lies completely in R. For example, the region enclosed by the cross below is not convex (can you prove it is not convex by using this definition of convexity?).

6 In this volume, we will use the term “boundary” in an intuitive sense, in the same way that we have used the term “region” in an intuitive sense without a proper definition. These are precise concepts in advanced mathematics.
On the other hand, a half-plane and a closed half-plane are both convex, and with
a little bit more effort, one can prove that the intersection of a finite number of convex
sets is also convex (see Exercise 12 on page 180). It follows that the intersection of
a finite number of closed half-planes is convex. For example, a triangular region
or a rectangular region is convex (Exercise 12 again). If the intersection of a finite
number of closed half-planes is known to be a bounded region, then it is actually a
convex polygon; see Lemma 8.7 on page 188 for further discussion.
A backward glance. In TSM, it is usually asserted that “the graph of a linear
inequality in two variables is a half-plane”, but no definition is given for “graph
of an inequality” or “half-plane” and no reasoning is given for why this is true.
When such an assertion is made without proof about two undefined concepts, it
promotes sloppy thinking and precludes any meaningful mathematics learning.
Learning by rote is the inevitable result. Students get the idea that mathematics
is a faith-based discipline that tolerates no questioning about why something
might be true. What this section tries to do is to disabuse students of that sort
of anti-mathematical thinking by precisely defining what a half-plane is and what
the graph of a linear inequality means; it also provides the reasoning for why the
graph of a linear inequality is a half-plane and how to figure out which half-
plane corresponds to which inequality. This kind of knowledge is indispensable
to a middle school mathematics teacher, even if something like the complete proof
of Theorem 8.2 may be too sophisticated (or too technical) for a K–12 audience.
However, if you as a teacher can bring the basic spirit of this section back to
your classroom—at least giving precise definitions of all the terms you use and
exposing your students to a judicious choice of the arguments, such as the proof
of Lemma 8.3—you will be taking a major step toward restoring good sense to
your mathematics classroom.

Exercises 8.4

(1) Prove that distinct level sets of a linear function of two variables are
parallel lines.
(2) Let the linear function of two variables ℓ be defined by ℓ(x, y) = 5x − y + 7. Sketch {ℓ = 1}, {ℓ = 10}, {ℓ < 1}, and {ℓ < 10}. What is the relationship between {ℓ < 1} and {ℓ < 10}?
(3) Referring to the picture below, let a line L be defined by ax + by = c, where a, b, c are constants and a ≠ 0 (thus L is not horizontal). Let its x-intercept be x_L. Let A (respectively, B) be the set of all points (x', y') with the following property: (x', y') does not lie on L, and if the line passing through (x', y') and parallel to L has x-intercept k, then k > x_L (respectively k < x_L). Prove that A and B are the half-planes of L.

[Figure: the line L with x-intercept x_L, and the parallel line through (x', y') with x-intercept k; B lies on one side of L and A on the other.]

(4) Let L be the line defined by y = 3x − 5. (a) Find a linear function of two
variables F( x, y) so that L is the level set { F = −5}. (b) Find a linear
function of two variables G ( x, y) so that L is the level set { G = 1}.
(5) Suppose we have two lines both with slope a/b (b ≠ 0), as shown:

[Figure: two parallel lines of slope a/b, with the points (p, q), (s, t), and (p', q') marked.]

Let P(x, y) be a linear function of two variables P(x, y) = ax − by + e with a > 0. (a) Compare the values that P(x, y) assigns to (s, t) and (p, q). (b) Compare the values that P(x, y) assigns to (p, q) and (p', q').
(6) The following two statements refer to the proof of Theorem 8.2 on page
165: (a) Complete the proof by proving it for the case of a vertical L, i.e.,
show that the L+ and L− so defined (page 164) satisfy properties (ii)
and (iii). (b) Write out a complete proof of the fact that L− satisfies
property (ii) (see page 167).
(7) Prove property (iii) of Theorem 8.2 when the line joining P in L+ and Q
in L− is vertical (see page 170).
(8) Graph the following inequalities in the plane:

$$
\begin{cases}
5x + 2y \le 6 \\
2x - 3\tfrac{1}{2}\,y \le -3 \\
-x + 1\tfrac{3}{4}\,y \le 5
\end{cases}
$$
(9) Graph the following inequalities in the plane:

$$
\begin{cases}
x - y \le \tfrac{5}{3} \\
2x + y \le -4 \\
x \ge -3
\end{cases}
$$

(10) Graph the following inequalities in the plane:

$$
\begin{cases}
-3x + 2y \le 0 \\
3x + 2y \le 12 \\
y \ge 0
\end{cases}
$$

How would you describe the shape of this region in words?
(11) Give a proof of the half of Theorem 8.4 on page 172 concerning b > 0.
(12) (a) Prove that a half-plane of a given line in the plane is convex.
(b) Prove that a closed half-plane of a given line in the plane is convex.
(c) Prove that the intersection of a finite number of convex sets is convex.
(d) Prove that all rectangular regions and triangular regions are convex.

8.5. Solution of the manufacturing problem

At this point, we have all the needed information to tackle task (B) on page 159. Let R be the graph of the following four inequalities (see (8.2) on page 159):

$$
\begin{cases}
x \ge 0, \quad y \ge 0 \\
x + y \le 50 \\
75x + 165y \le 6000
\end{cases}
\tag{8.16}
$$
R is thus the intersection of the four closed half-planes defined by the four weak inequalities in (8.16); it is called the feasibility region of the problem. It is easy to see that R is the dotted quadrilateral region AOBC below:

[Figure: the feasibility region R, the quadrilateral AOBC bounded by the coordinate axes and the lines x + y = 50 and 75x + 165y = 6000, which meet at C; the intercepts 50 (on the y-axis) and 80 (on the x-axis) are marked.]


Confirm that this graph of the region R is correct.


For a later need, we record the coordinates of the vertices of the quadrilateral
(8.17) A = (0, 36 4/11), O = (0, 0), B = (50, 0), C = (25, 25).
The only coordinate expression worthy of comment here is that of C. It is the
point of intersection of the two lines defined by x + y = 50 and 75x + 165y = 6000
and is therefore the solution (by Theorem 5.1 on page 86) of the linear system:

x + y = 50
75x + 165y = 6000
Solving this system in the standard way (but noting that a simplification can
be achieved by reducing the second equation to x + 2.2 y = 80), we get the
solution (25, 25). So C = (25, 25).
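For readers who want to check the arithmetic, the Python sketch below solves the system by Cramer's rule with exact fractions. This is one standard method, not necessarily the "standard way" the text has in mind (elimination works equally well); the function name solve_2x2 is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1 x + b1 y = c1 and a2 x + b2 y = c2 by Cramer's rule, exactly."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("no unique solution: the lines are parallel or identical")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

x, y = solve_2x2(1, 1, 50, 75, 165, 6000)
print(x, y)   # prints: 25 25
```

Replacing 75 by 60, i.e., solve_2x2(1, 1, 50, 60, 165, 6000), gives 150/7 and 200/7, that is, (21 3/7, 28 4/7), the vertex that appears in the perturbed problem later in this section. Likewise solve_2x2(1, 0, 0, 75, 165, 6000) gives A = (0, 400/11).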
Task (B) asks at which point (x0, y0) of R the profit function H(x, y) = 125x + 185y achieves its maximum value in R. Thus we look for an (x0, y0) in R so that
(8.18) H(x, y) ≤ H(x0, y0) for all (x, y) in R.
Let us first informally discuss where to look for such an ( x0 , y0 ). The overriding
fact is that H is a linear function of two variables so that we can apply Theorem 8.4
on page 172 to H. With this understood, letting c = H ( x0 , y0 ) and L be the level
set { H = c} that passes through ( x0 , y0 ),7 we see that—on account of (8.18)—the
whole region R must lie in the closed half-plane { H ≤ c} . This suggests strongly
that this ( x0 , y0 ) had better not be inside the region R because, were this the case,
the line L = { H = c} would “split” R into two parts and R could in no way lie
in either half-plane of L, as the following picture shows:

[Figure: a level line L = {H = c} through a point (x0, y0) inside R cuts R into two parts.]
What this informal discussion suggests is that, if H achieves a maximum
value at ( x0 , y0 ) in R, then this point ( x0 , y0 ) will have to be at the boundary. We
can go further, however. We claim that the maximum value of H is in fact achieved
at a vertex.8 Suppose this ( x0 , y0 ) lies on a side of the quadrilateral AOBC—let
us say on AC—but H ( x0 , y0 ) is greater than H ( A) and H (C ). According to
Lemma 8.3 on page 168, this is impossible, because either H ( x0 , y0 ) = H ( A) =
H (C ) or H ( x0 , y0 ) has to be between H ( A) and H (C ). Therefore if H ( x0 , y0 ) is a
maximum value of H, then H ( x0 , y0 ) ≤ H ( A) or H (C ). Since we already know
that H(x0, y0) is a maximum value of H in R, we must have H(x0, y0) = H(A) = H(C) after all. This means H also achieves its maximum value at both vertices A and C, which then proves the claim. Our informal knowledge therefore tells us that if we are looking for such an (x0, y0), we should be looking for it at one of the vertices of AOBC. With this understood, the following theorem then becomes somewhat anticlimactic, and the only excitement left is to find an honest proof for the theorem.

7 Don’t forget L is the line 125x + 185y = c.
8 Caution: we are not saying that H achieves its maximum value only at a vertex. All we are saying is that, if H achieves a maximum value somewhere in R, it already does so at a vertex.
Theorem 8.5. For the region R defined by (8.16), the linear function of two vari-
ables H ( x, y) = 125x + 185y achieves a maximum value in R at a vertex.
It may seem strange that we would be proving a general theorem about a
specific problem. The reason for doing this is that the idea behind this proof is in
fact perfectly general and therefore will serve as the model for a proof of a general
theorem about the maximum value of linear functions of two variables on convex
polygons. See Theorem 8.6 on page 187.
Proof. Referring to (8.17), we have:
H(A) = 6727 3/11, H(O) = 0, H(B) = 6250, H(C) = 7750.
Thus among the four vertices, H is largest at C. Now we will prove that H(x, y) ≤ H(C) for every point (x, y) in R. To this end, let the horizontal line passing through a given (x, y) meet the boundary of R at two points P and Q.9

[Figure: the horizontal line through (x, y) meets the boundary of R at P on side AO and at Q on side BC.]
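By Lemma 8.3 on page 168, H(x, y) ≤ H(P) or H(x, y) ≤ H(Q). By the same lemma, H(P) ≤ H(A) or H(P) ≤ H(O) and, as we know, both H(A) and H(O) are ≤ H(C). For the same reason, H(Q) ≤ H(B) or H(Q) ≤ H(C), and therefore H(Q) ≤ H(C). (If the horizontal line lies above C, so that Q is on side AC instead, then H(Q) ≤ H(A) or H(Q) ≤ H(C), and again H(Q) ≤ H(C).) In either case, we get H(x, y) ≤ H(C). The proof of the theorem is complete.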
Recall from (8.17) that C = (25, 25). Thus the fact that H(C) is the maximum value of H in the feasibility region R means that manufacturing 25 of the A games and 25 of the B games brings in the maximum profit under the given constraints. Theorem 8.5 therefore solves the manufacturing problem.
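For readers who like to double-check such computations, here is a minimal Python sketch that is not part of the original argument. It assumes, as the text implies, that the last inequality of (8.16) is 75x + 165y ≤ 6000 and that the vertices in (8.17) are A = (0, 400/11), O = (0, 0), B = (50, 0), and C = (25, 25). It evaluates H at the vertices with exact fractions and then spot-checks H(x, y) ≤ H(C) on a grid of points of R; a finite grid illustrates Theorem 8.5 but of course proves nothing.

```python
from fractions import Fraction as F

def H(x, y):
    """Profit function H(x, y) = 125x + 185y."""
    return 125 * x + 185 * y

def in_R(x, y):
    """Assumed form of (8.16): x, y >= 0, x + y <= 50, 75x + 165y <= 6000."""
    return x >= 0 and y >= 0 and x + y <= 50 and 75 * x + 165 * y <= 6000

# Vertices of R; A = (0, 6000/165) = (0, 400/11).
vertices = {"A": (F(0), F(400, 11)), "O": (F(0), F(0)),
            "B": (F(50), F(0)), "C": (F(25), F(25))}
values = {name: H(*p) for name, p in vertices.items()}
best = max(values, key=values.get)

# Spot-check Theorem 8.5 on a grid of step 1/2 (exact fractions) inside R.
grid = [F(i, 2) for i in range(101)]  # 0, 1/2, 1, ..., 50
ok = all(H(x, y) <= values["C"] for x in grid for y in grid if in_R(x, y))

print(values)    # H(A) = 74000/11 = 6727 3/11, H(B) = 6250, H(C) = 7750
print(best, ok)  # C True
```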
9 It is not necessary to use a horizontal line; any line passing through (x, y) would do. But a horizontal line (like a vertical line) has the virtue of simplicity. Furthermore, there should be no doubt that this horizontal line meets the boundary of R at two points P and Q. Indeed, since the equations of the four lines containing the sides AO, OB, BC, and AC are all explicitly known, the points of intersection of this horizontal line with these four lines can all be explicitly computed (thanks to Theorem 5.1 on page 86), and one can verify directly which of them lie on the sides of AOBC.

It should not have escaped your attention that the preceding solution of the manufacturing problem depends on a stroke of luck: the coordinates of the vertex C at which H achieves its maximum value are positive integers. Imagine, for example, that the coordinates of C were \((21\tfrac{3}{7}, 28\tfrac{4}{7})\). Such would be the case if we slightly perturbed the original problem by specifying the cost of manufacturing an A game to be $60 instead of $75. Thus suppose the problem is:
[Second Manufacturing Problem] A video game manufacturer is
invited to a game show, and is told that she can bring up to 50
games. She has two games, A and B, and has up to $6000 to
spend on manufacturing costs. Game A costs $60 to manufac-
ture and will bring in a net profit of $125, while Game B costs
$165 to manufacture but will bring in a net profit of $185. As-
suming that she sells every game she brings, how many games
of each kind should she manufacture if she wants to maximize
her profit?
Let x and y denote the number of A and B games manufactured, respectively.
While the profit function H stays the same,

\[ H(x, y) = 125x + 185y, \]

we are now looking for a pair of integers x and y that satisfy the following inequalities (they are the same as those in (8.16) except the last):

\[ \begin{cases} x \ge 0,\ y \ge 0 \\ x + y \le 50 \\ 60x + 165y \le 6000 \end{cases} \tag{8.19} \]

If we let x and y be arbitrary real numbers that satisfy (8.19), then we have a new
feasibility region R shown below:
[Figure: the new feasibility region R, the quadrilateral AOBC bounded by the two coordinate axes, the line x + y = 50 (which meets the y-axis at 50), and the line 60x + 165y = 6000 (which meets the x-axis at 100).]

Now the coordinates of the vertices are:

\[ A = \left(0, 36\tfrac{4}{11}\right),\quad O = (0, 0),\quad B = (50, 0),\quad C = \left(21\tfrac{3}{7}, 28\tfrac{4}{7}\right), \]

where the coordinates of C are obtained by solving the linear system:

\[ \begin{cases} x + y = 50 \\ 60x + 165y = 6000 \end{cases} \]

The values of H at the vertices are:

\[ H(A) = 6727\tfrac{3}{11},\quad H(O) = 0,\quad H(B) = 6250,\quad H(C) = 7964\tfrac{2}{7}. \tag{8.20} \]
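The vertex C and the values in (8.20) can be recomputed exactly. The sketch below solves a 2 × 2 linear system by Cramer’s rule using Python’s fractions module; Cramer’s rule is our choice here, and the helper name solve_2x2 is hypothetical rather than taken from the text.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2 by Cramer's rule (exact)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel or identical")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

def H(x, y):
    return 125 * x + 185 * y

# Vertex C: intersection of x + y = 50 and 60x + 165y = 6000.
C = solve_2x2(1, 1, 50, 60, 165, 6000)
# Vertex A: intersection of x = 0 and 60x + 165y = 6000.
A = solve_2x2(1, 0, 0, 60, 165, 6000)
vertices = {"A": A, "O": (0, 0), "B": (50, 0), "C": C}
values = {name: H(*p) for name, p in vertices.items()}

print(C)       # (Fraction(150, 7), Fraction(200, 7)), i.e. (21 3/7, 28 4/7)
print(values)  # H(C) = 55750/7 = 7964 2/7
```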

Just as before, among the four vertices, H achieves its maximum value at C. Then, arguing exactly as in the proof of Theorem 8.5, we conclude that H achieves its maximum value in the feasibility region R at \(C = (21\tfrac{3}{7}, 28\tfrac{4}{7})\). So far so good.
But now the formal conclusion is that if we manufacture \(21\tfrac{3}{7}\) A games and \(28\tfrac{4}{7}\) B games, then the profit would be the maximum. The only problem with this conclusion is that it doesn’t make any sense. For example, how does one manufacture \(21\tfrac{3}{7}\) A games? We need solutions that are a pair of whole numbers, not fractions.
The second manufacturing problem, in effect, compels us to recall that there
are two steps to the solution of this kind of problem, as explained on page 158.
What we have been doing is to address Step 1, whose goal is to obtain the max-
imum value of the profit function in the feasibility region using real numbers
without regard to the original problem itself.
[Margin note: A real world problem is sometimes solved by solving an abstract version first before going back to the original problem.]

The second step is to transition from the purely mathematical solution of Step 1, such as \(H(21\tfrac{3}{7}, 28\tfrac{4}{7})\), to a solution in whole numbers as demanded by the original problem. In general, the second step is not simple, and the middle school curriculum correctly de-emphasizes this step and only deals with problems in which the vertices of the given feasibility regions have integer coordinates; cf., e.g., the original manufacturing problem on page 155 and its solution in Theorem 8.5 (page 182). However, the second manufacturing problem is sufficiently simple that we will give a complete solution (but it can be skipped if necessary).
We are given that \(C = (21\tfrac{3}{7}, 28\tfrac{4}{7})\) is the vertex of R at which H achieves its maximum value in R. If we want a point in R with integer coordinates at which H is as large as possible, the point (21, 28) comes to mind immediately, and we have H(21, 28) = 7805. It is simple to check (8.19) to see that (21, 28) is in R. Now since 21 + 28 < 50, we should also consider the points (22, 28) and (21, 29). Unfortunately, (21, 29) does not satisfy 60x + 165y ≤ 6000 (we get 60 · 21 + 165 · 29 = 6045), so (21, 29) does not lie in R, but (22, 28) satisfies all the inequalities in (8.19) and therefore lies in R. We have H(22, 28) = 7930, which is larger than 7805 (= H(21, 28)). We claim:

H achieves its maximum of 7930 at (22, 28) among all the points in
R with integer coordinates.

To prove this claim, let P = (22, 28). P lies on the line {x + y = 50}, which is the line containing the side BC of the feasibility region R = AOBC. Since H(22, 28) = 7930, P also lies on the level set {H = 7930}. Let {H = 7930} intersect the line {60x + 165y = 6000} at a point Q. The coordinates of Q can be obtained by solving the following linear system:

\[ \begin{cases} 125x + 185y = 7930 \\ 60x + 165y = 6000 \end{cases} \]

The solution is \(Q = (20\tfrac{106}{127}, 28\tfrac{100}{127})\). Now what we have is an expression of the feasibility region R as the union of the polygonal region AOBPQ and the triangular region QCP. In order to give a clearly visible pictorial representation of this union, the following picture intentionally magnifies the triangle QCP out of proportion. QCP is highlighted as the dotted region.
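[Figure (triangle QCP magnified, not to scale): the level line {H = 7930} passes through \(Q = (20\tfrac{106}{127}, 28\tfrac{100}{127})\) and P = (22, 28) and cuts R = AOBC into the polygonal region AOBPQ and the dotted triangle QCP, where \(C = (21\tfrac{3}{7}, 28\tfrac{4}{7})\).]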
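The coordinates of Q can be recovered exactly as well. In the sketch below, a hypothetical helper solve_2x2 (Cramer’s rule with exact fractions; our choice, not the text’s) solves the system for Q, a small helper mixed prints fractions as mixed numbers, and the last line confirms that P and Q both lie on the level line {H = 7930} while C lies on the side where H > 7930.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2 by Cramer's rule (exact)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel or identical")
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

def H(x, y):
    return 125 * x + 185 * y

def mixed(q):
    """Write a nonnegative fraction as a mixed number string such as '20 106/127'."""
    whole, rem = divmod(q.numerator, q.denominator)
    return f"{whole} {rem}/{q.denominator}" if rem else str(whole)

# Q: intersection of the level line {H = 7930} with 60x + 165y = 6000.
Q = solve_2x2(125, 185, 7930, 60, 165, 6000)
P = (22, 28)
C = (Fraction(150, 7), Fraction(200, 7))

print(mixed(Q[0]), mixed(Q[1]))    # 20 106/127 28 100/127
print(H(*P), H(*Q), H(*C) > 7930)  # P and Q lie on {H = 7930}; H(C) exceeds 7930
```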

The points of R with integer coordinates either lie in AOBPQ or lie in QCP. Looking at the coordinates of Q, C, and P, we can see that there is no point with integer coordinates in QCP except P itself. The reason is simple: if (a, b) is such a point, where a and b are both positive integers, then (a, b) lies in the horizontal strip bounded by the horizontal lines passing through P and Q (because the y-coordinates of P, C, and Q are 28, \(28\tfrac{4}{7}\), and \(28\tfrac{100}{127}\), respectively). Thus b is a positive integer such that
\[ 28 \le b \le 28\tfrac{100}{127}. \]
Clearly there is no such integer other than 28, so b = 28. But P is the only vertex of QCP on the line y = 28, while Q and C lie strictly above this line, so the line y = 28 meets the triangle QCP only at P. Thus (a, b) = (22, 28). It follows that the points with integer coordinates in R are the same as those points with integer coordinates in the polygon AOBPQ.
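The claim that P is the only point of QCP with integer coordinates can also be checked mechanically. The Python sketch below scans the lattice points in the bounding box of the triangle and tests membership with the usual sign-of-cross-product (orientation) test; that test is a standard computational device we are substituting for the strip argument above, not the text’s method, and the function names are ours.

```python
from fractions import Fraction
from math import ceil, floor

Q = (Fraction(2646, 127), Fraction(3656, 127))  # (20 106/127, 28 100/127)
C = (Fraction(150, 7), Fraction(200, 7))        # (21 3/7, 28 4/7)
P = (Fraction(22), Fraction(28))

def cross(o, a, b):
    """z-component of (a - o) x (b - o); its sign tells on which side of line oa the point b lies."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(pt, u, v, w):
    """True if pt lies in the closed triangular region uvw (boundary included)."""
    d1, d2, d3 = cross(u, v, pt), cross(v, w, pt), cross(w, u, pt)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

# Scan every lattice point in the bounding box of the triangle QCP.
xs = [p[0] for p in (Q, C, P)]
ys = [p[1] for p in (Q, C, P)]
lattice = [(a, b)
           for a in range(ceil(min(xs)), floor(max(xs)) + 1)
           for b in range(ceil(min(ys)), floor(max(ys)) + 1)
           if in_triangle((a, b), Q, C, P)]
print(lattice)  # [(22, 28)]
```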
Therefore in order to prove the claim, it suffices to prove that H ( P) ≥ H ( a, b)
for all ( a, b) lying in AOBPQ, where a and b are integers. We will in fact prove
more: H ( P) ≥ H ( x, y) for all the points ( x, y) lying in AOBPQ. This is because,
by Theorem 8.4 on page 172, the polygonal region AOBPQ lies in the closed half-
plane { H ≤ 7930} of the line { H = 7930} (e.g., O clearly lies in { H ≤ 7930}). The
claim is proved.
It follows that by manufacturing 22 A games and 28 B games, the manufac-
turer will make the maximum profit of $7930.
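Since every pair of whole numbers satisfying (8.19) has 0 ≤ x ≤ 50 and 0 ≤ y ≤ 50, the answer can also be confirmed by brute force. This Python sketch simply tries all such pairs; it is a check on the answer rather than a substitute for the reasoning, and it is practical only because the search space is so small.

```python
def H(x, y):
    return 125 * x + 185 * y

def feasible(x, y):
    """The inequalities (8.19) of the second manufacturing problem."""
    return x >= 0 and y >= 0 and x + y <= 50 and 60 * x + 165 * y <= 6000

# Every whole-number pair satisfying (8.19) has 0 <= x, y <= 50.
points = [(x, y) for x in range(51) for y in range(51) if feasible(x, y)]
best_value = max(H(x, y) for x, y in points)
best_points = [p for p in points if H(*p) == best_value]
print(best_value, best_points)  # 7930 [(22, 28)]
```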
Remark. While the profit function H ( x, y) = 125x + 185y for whole numbers
x and y is something concrete and down-to-earth, the consideration of H ( x, y) as
a function defined on the feasibility region elevates the problem to an abstract
mathematical problem about linear functions of two variables. It will be seen
that a substantial part of learning algebra consists of learning when to take an abstract approach to a problem in order to get the solution to the original at the end.
The extension of the concept of “profit” to include 125x + 185y for any numbers
x and y is a good example of the needed abstraction for the solution of many
problems in algebra.

Exercises 8.5

(1) On the feasibility region R of the manufacturing problem as defined by (8.16) on page 180, prove that the linear function of two variables F(x, y) = x − 3y − 2 achieves its maximum value at a vertex. Do the same for its minimum. What are the maximum and minimum values of F in R?
(2) Given a region R and a point (s, t) in R, as shown:
[Figure: the region R with the point (s, t) marked.]
Can a linear function of two variables F(x, y) achieve its maximum value in R at (s, t)? Its minimum value? Explain. (There are at least two different explanations that you can give.)
(3) Let S be the region defined by the inequalities:
\[ \begin{cases} 3x - 3y \le 5 \\ 2x + y \le -4 \end{cases} \]
Does the linear function \(P(x, y) = \tfrac{2}{3}x + y\) achieve a maximum value in S? Does it achieve a minimum value in S? Explain.
(4) Prove that the linear function 10x + 3y achieves its maximum and minimum values in the graph of the inequalities:
\[ \begin{cases} x \ge 0 \\ x \le 10 \\ y \ge 5 \\ 0.5x + y \le 20 \end{cases} \]
What are the maximum and minimum values of 10x + 3y in this graph?
(5) Let R be the region defined by x ≥ 1, y ≥ −2, x + y ≤ 10, and y ≤ 5.
Also let P( x, y) = x − 2y. (a) Graph R. (b) Prove that P( x, y) achieves its
maximum and minimum values at the vertices of R. (c) Determine the
maximum and minimum values of P( x, y) in R.

8.6. Behavior of linear functions in the plane

In resolving Step 1 of the manufacturing problem (see page 158), we relied on the
explicit description of the feasibility region and bypassed some theoretical con-
siderations in order to get to the solution directly (see Theorem 8.5 on page 182).
Now it is time to step back and provide some context to Theorem 8.5 by stating
the general theorem that lies behind Theorem 8.5. Proofs will be either outlined
or entirely missing; they are not the focus of our attention here.
We begin with two definitions. A region that is the intersection of a finite
number of closed half-planes—such as the one arising from the manufacturing
problem (see page 159)—will be called a finite intersection of closed half-planes
for short. A geometric figure S in the plane is said to be bounded if there is a
positive number K so that the distance between any two points in S is less than
K. Thus polygons and disks are bounded, but rays or half-planes are not. We
also call attention to the fact that, whereas a polygon is usually conflated with its
associated polygonal region, consisting of the polygon and all the points inside it
(see page 267), we will try to distinguish between the two in this section in the
interest of clarity. In particular, we will refer to the polygon as the boundary of
the polygonal region it defines.10 We also recall that a geometric figure S is convex
if the segment joining any two points in S also lies completely in S (see page 177).
Disks and rectangular regions are convex, but the proof of the convexity of the
former is not as easy as one might believe.
Here is the theorem in question.
Theorem 8.6. Let R be a bounded region that is a finite intersection of closed half-planes, and let the vertices of its boundary be \(A_1, A_2, \ldots, A_n\). Let H(x, y) = ax + by + e be a linear function of two variables, and let \(A_1\) be a vertex such that \(H(A_1) \ge H(A_j)\) for all j. Then H achieves its maximum value in R at \(A_1\), i.e., \(H(A_1) \ge H(x, y)\) for all (x, y) in R.
Similarly, if \(H(A_1) \le H(A_j)\) for all j, then H achieves its minimum value in R at \(A_1\), i.e., \(H(A_1) \le H(x, y)\) for all (x, y) in R.
First of all, note that the theorem is false if we remove the hypothesis that R be
bounded. For example, let S be the shaded region indicated below (understood
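to extend infinitely to the left) and let H(x, y) = y.
[Figure: the shaded region S, extending infinitely to the left, whose boundary consists of a ray from A, the segment AB, the segment BC, and a ray from C.]

10 Note that we still refrain from defining precisely what a region is and what the boundary of a region is. Such omissions are part of the reason we do not emphasize proofs here.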

This region S is the intersection of the following four closed half-planes (instead
of saying “the upper half-plane of the line containing the ray”, etc., we will simply
say “the upper half-plane of the ray”, etc.):

the closed upper half-plane of the ray containing A,

the closed upper half-plane of the segment AB,
the closed lower half-plane of the segment BC,
the closed lower half-plane of the ray containing C.

Clearly H(C) is the largest of the three numbers H(A), H(B), and H(C), but there is no point in S at which the linear function H(x, y) achieves the maximum value.
The critical component in the proof of Theorem 8.6 is the following basic lemma.

Lemma 8.7. A bounded region that is a finite intersection of closed half-planes is a convex polygonal region. More precisely, if a bounded region R is the intersection of a designated closed half-plane of each of the lines \(L_1, L_2, \ldots, L_n\) (where n is a positive integer ≥ 3), then R is a convex polygonal region such that each side of its boundary polygon is a segment on a line \(L_j\) for some j, 1 ≤ j ≤ n.

A proof of this lemma can be found in Section 7.6 of Volume II of [Wu-HighSchool]. Like many technical facts, this lemma has the not uncommon distinction of being completely plausible and, at the same time, tedious to prove. Once we draw a few pictures of intersecting closed half-planes that end up being bounded, we cannot help but see a convex polygon at the end. The left picture below is an example of a triangular region obtained by intersecting four closed half-planes (the closed half-plane in each case is indicated by an arrow), while the polygonal region on the right is the intersection of five closed half-planes.
[Figure: left, a triangular region cut out by four closed half-planes with boundary lines L1 to L4; right, a polygonal region cut out by five closed half-planes with boundary lines L1 to L5.]

Activity. Let S be a finite intersection of closed half-planes. Prove that if the intersection of S with a line is not empty, then it is a segment (possibly a single point), a ray, or the whole line.

Proof Outline of Theorem 8.6. It suffices to prove the case of a maximum, as the
case of a minimum will be seen to be entirely similar. Let P be a point in R, and
we will prove that H ( P) ≤ H ( A1 ).
Let L be the horizontal line passing through P. By hypothesis, R is a finite
intersection of closed half-planes. According to the Activity above, the intersec-
tion L ∩ R has to be a segment CD because R is bounded. C and D are therefore
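points on the boundary polygon; let us say C lies on \(A_3A_4\) and D lies on \(A_7A_8\).
[Figure: the horizontal line through P meets the boundary polygon at C on the side \(A_3A_4\) and at D on the side \(A_7A_8\).]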

By Lemma 8.3 on page 168, either (i) H(P) = H(C) = H(D), or (ii) H(P) is between H(C) and H(D). If (i) holds, then by Lemma 8.3 again, H(C) is ≤ one of \(H(A_3)\) and \(H(A_4)\). Since both \(H(A_3)\) and \(H(A_4)\) are ≤ \(H(A_1)\), we see that
\[ H(P) = H(C) \le H(A_1). \]
The theorem is proved in this case. If (ii) holds, let us say H(P) < H(D). By Lemma 8.3, H(D) ≤ one of \(H(A_7)\) and \(H(A_8)\), and since both \(H(A_7)\) and \(H(A_8)\) are ≤ \(H(A_1)\), we see that \(H(P) < H(A_1)\). The proof of Theorem 8.6 is complete.
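Theorem 8.6 suggests a simple procedure for bounded regions: find the vertices and compare H there. The Python sketch below is one way to do this under stated assumptions: each closed half-plane is encoded as a triple (a, b, c) meaning ax + by ≤ c (a hypothetical encoding of ours), the vertices are found by intersecting the boundary lines two at a time and keeping the points that satisfy every inequality, and the region defined by (8.19) serves as example data. The random spot-check at the end samples points of the region without using the vertices at all.

```python
import random
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding: (a, b, c) stands for the closed half-plane a*x + b*y <= c.
# Example data: the inequalities (8.19) rewritten in this form.
HALF_PLANES = [(-1, 0, 0), (0, -1, 0), (1, 1, 50), (60, 165, 6000)]

def vertices(half_planes):
    """Intersect the boundary lines pairwise; keep points lying in every half-plane."""
    found = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(half_planes, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet in a single point
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in half_planes):
            found.add((x, y))
    return sorted(found)

def H(x, y):
    return 125 * x + 185 * y

V = vertices(HALF_PLANES)
best = max(V, key=lambda p: H(*p))  # Theorem 8.6: compare H at the vertices only

# Spot-check: random points of the box [0, 50] x [0, 50] (which contains this
# region) that satisfy every inequality should never beat the best vertex.
rng = random.Random(0)
samples = [(rng.uniform(0, 50), rng.uniform(0, 50)) for _ in range(5000)]
inside = [(x, y) for x, y in samples
          if all(a * x + b * y <= c for a, b, c in HALF_PLANES)]
ok = all(H(x, y) <= float(H(*best)) + 1e-9 for x, y in inside)

print(V)
print(best, ok)
```

Because Theorem 8.6 requires R to be bounded, this procedure should not be used on an unbounded region such as S above: the code would still return a vertex, but H need not achieve any maximum there.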
Remark. The part of Theorem 8.6 concerning the fact that a linear function
must attain a maximum or a minimum in a bounded finite intersection of closed
half-planes is a special case of a general theorem about continuous functions. See,
for example, Theorem 13.12 on page 65 and Corollary 2.5 on page 122 of [Ross].

Exercises 8.6

(1) Let R be the graph of the following inequalities:

⎨ 1
+ 4
5y ≥ 4
−2x + 5y ≤ 10
⎩ 1
3x − ≤ 2

Where would the linear function \(2x + \tfrac{1}{3}y - 4\) achieve its maximum value in R? Its minimum value? What are the maximum and minimum values?
(2) Let R be the polygonal region with vertices (3, 1), (6, 1), (6, 7), (4, 2),
and (3, 7). (a) Does the linear function \(H(x, y) = \tfrac{1}{2}x - 4y + 2\) achieve
its maximum value in R? If so, what is it, and which are the maximum
points of H in R? (b) Does it achieve its minimum value in R? If so,
what is it, and which are the minimum points of H in R? Explain your
answers fully.

(3) The nutritional values of a basic unit of each of two food items are tabulated as follows:

Item   calories   vitamin C (i.u.)   protein (mg)
A      156        50                 30
B      116        75                 80

A mountain climber wants to bring enough of both items for her trip so that she would get at least 2600 calories, 1500 i.u. of vitamin C, and 1250 mg of protein.
Suppose each unit of Item A costs $2.80 and each unit of Item B
costs $5. How many units of each should she buy so that the total cost
is minimum and her nutritional requirements are met? Use a scientific


So far we have dealt with linear equations (pages 38 and 57), linear inequalities
(page 159), and linear functions (page 127) of one or two variables. Linear objects
are important because they are the basic building blocks of mathematics, but life
is often not linear. A good example is Kepler’s famous Third Law governing the
motion of an object around the sun: the square of the period1 divided by the cube
of the so-called semi-major axis of the elliptic orbit2 is a fixed constant no matter
what the object may be (e.g., any planet, any meteor, any asteroid). In symbols,
this means there is a number c so that, if T is the period and D is the semi-major axis of an object revolving around the sun,
\[ \frac{T^2}{D^3} = c. \]
Thus if the object is very far from the sun compared with the earth (e.g., Pluto), so that D is very large, then T must be correspondingly very large, and therefore it would take much more than an earth-year for that object to complete a revolution around the sun. Multiplying both sides by \(D^3\) and rearranging, we get
\[ T^2 - cD^3 = 0. \]
You can see that this is not a linear equation of two variables in T and D. What this implies is that, in order to progress further into mathematics, at the very least we will have to deal with powers of numbers, such as \(T^2\) and \(D^3\). These are the most basic nonlinear quantities.
This is of course hardly news to us because we have already come across polynomials of any degree in Section 1.4, and polynomials of degree exceeding 1 are not linear. If we isolate a monomial of degree 4, let us say \(x^4\), then it is a real-valued function defined on R that assigns the number \(x^4\) to each number x. However, it is possible to tweak this idea to arrive at a different kind of nonlinear function. Instead of fixing the exponent and raising all numbers to that power, we fix a number such as 3 and assign to each positive integer n the number \(3^n\). If we denote this function for the moment by g, then \(g(n) = 3^n\) for each positive integer n, e.g., \(g(5) = 3^5 = 3\cdot 3\cdot 3\cdot 3\cdot 3\). The domain of definition of g is the collection of positive integers. It is clear that g is not a linear function, in the sense that there are no fixed constants a and b such that g(n) = an + b for all positive integers n (compare the definition of a linear function defined on whole numbers on page 127).

1 The time it takes the object to complete a revolution around the sun.
2 Half the length of the major axis of the elliptic orbit; equivalently, the average of the object’s greatest and least distances from the sun. In some school mathematics and physics textbooks, this law is stated using “mean distance” in place of “semi-major axis”, and that is an error.
What interests us in this context is the fact that a number such as 5.1 is not in the domain of definition of g, so the symbol \(3^{5.1}\) has no meaning. Until we can define what it means to raise 3 to a power that is a rational number, the domain of definition of g must remain the positive integers.
Functions that are defined only on positive integers, such as \(g(n) = 3^n\), are awkward to use.3 We have already encountered an analogous situation in Chapter 8, when the profit function H(m, n), which was initially defined only on pairs of whole numbers (m, n), had to be extended so that it became defined on pairs of nonnegative real numbers (x, y) (see pages 158 ff.). In that case, the extension was handed to us on a silver platter due to the linearity of the profit function: if for whole numbers m and n, H(m, n) is given by H(m, n) = 125m + 185n, then for any nonnegative real numbers x and y, we simply replace m and n by x and y, respectively, and H(x, y) = 125x + 185y continues to make perfect sense as a function of the real numbers x and y. Unfortunately, when we replace the n in \(g(n) = 3^n\) by a real number x that is not a positive integer, we have no idea what \(3^x\) could mean.
The quest for the meaning of \(3^x\) where x is not a positive integer is therefore a quest for a function that “naturally extends” the function \(g(n) = 3^n\) defined on positive integers, in the sense that we want a function G whose domain of definition is all of R so that G has properties “similar to those of g” and so that, when n is a positive integer, \(G(n) = 3^n\) as before. Thus, for all positive integers n, G(n) is the same as g(n), but G(x) makes sense for any real number x. Such a natural extension was in fact found long ago, and it turns out that, for rational numbers x, such as x = 5/2, the value of \(G(\tfrac{5}{2})\) is something familiar to us, namely, \((\sqrt{3})^5\), and in general, for any fraction \(\tfrac{m}{n}\), \(G(\tfrac{m}{n}) = (\sqrt[n]{3})^m\), where \(\sqrt[n]{3}\) stands for the positive n-th root of 3 (see page 201 for the definition of “n-th root”). Therefore, when we know that there is a natural extension of g(n) to a function G(x) that is defined for all x, we will write \(3^x\) for G(x). This is why, in this notation,
\[ 3^{m/n} = \left(\sqrt[n]{3}\right)^m \quad \text{for any fraction } \tfrac{m}{n}. \]
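As a numerical illustration (not a proof) of \(G(\tfrac{m}{n}) = (\sqrt[n]{3})^m\), the sketch below computes the positive n-th root of 3 by bisection, using only positive-integer powers in the sense of repeated multiplication, and compares \((\sqrt[n]{3})^m\) with Python’s floating-point 3.0 ** (m / n). We are assuming that Python’s ** on floats is a faithful floating-point version of the exponential function G; agreement is only up to rounding error, which is why math.isclose is used. The function names are ours.

```python
import math

def int_power(r, n):
    """r**n for a positive integer n, by repeated multiplication."""
    result = 1.0
    for _ in range(n):
        result *= r
    return result

def nth_root(beta, n, iterations=200):
    """Positive n-th root of beta > 0 by bisection, using only integer powers."""
    lo, hi = 0.0, max(1.0, beta)  # the root lies in [0, max(1, beta)]
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if int_power(mid, n) < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def G(m, n, beta=3.0):
    """(n-th root of beta) raised to the m-th power, for positive integers m, n."""
    return int_power(nth_root(beta, n), m)

for m, n in [(5, 2), (1, 3), (7, 4)]:
    # Compare with Python's float exponentiation, taken as a stand-in for 3^x.
    print(m, n, G(m, n), 3.0 ** (m / n), math.isclose(G(m, n), 3.0 ** (m / n)))
```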
The problem of explaining this phenomenon to school students in an optimal
way has not been completely solved as of 2015, but what we know with some
certainty is that the way TSM4 handles this issue does not work. Briefly, here
is the situation. When n is a positive integer and β is a nonzero number, the definition of \(\beta^n\) is straightforward:
\[ \beta^n = \underbrace{\beta\cdot\beta\cdots\beta}_{n}. \]
The key facts governing this exponential notation are the following laws of exponents: Let α, β be real numbers and let m and n be any positive integers. Then:
(E1) \(\beta^m\cdot\beta^n = \beta^{m+n}\).
(E2) \((\beta^m)^n = \beta^{mn}\).
(E3) \((\alpha\cdot\beta)^n = \alpha^n\cdot\beta^n\).

3 The real reason for the awkwardness comes from advanced mathematics: such functions cannot be differentiated or integrated in the usual way.

4 See page xi for the definition.

What TSM does is to introduce the idea that we can extend the meaning of the notation \(\beta^n\) so that n is allowed to be a rational number rather than just a positive integer, and, moreover, when so extended, (E1)–(E3) continue to hold for rational numbers m and n. To this end, the following definitions are given: let α and β be positive from now on and let n and m be positive integers as before. Then, by definition,
\[ \beta^0 = 1 \quad\text{and}\quad \beta^{m/n} = \left(\sqrt[n]{\beta}\right)^m, \]
and then negative rational exponents are defined by
\[ \beta^{-m/n} = \frac{1}{\beta^{m/n}}. \]

It is here that the first sign of trouble appears because, in trying to motivate these
definitions, heuristic arguments are given to the effect that, if (E1)–(E3) are already
known to be true for all rational numbers m and n, then the preceding definitions
would be the inevitable outcome. There is nothing wrong with this approach if
precision and reasoning are the rule of the day and great care is taken to ensure
that such speculative reasoning is clearly understood to be speculative and not
part of regular reasoning. (See Section 10 of [Wu2010b] for such an exposition.)
In TSM, however, reasoning is largely absent, precision is a rarity, and there seems
to be little difference between reasoning and speculative reasoning. Consequently,
heuristic arguments are commonly misunderstood to be valid proofs, and the defi-
nitions of 0 exponents, fractional exponents, and negative rational exponents have
been misconstrued by many to be theorems.
Recall that the purpose of these definitions is to extend (E1)–(E3) to rational
values of m and n. Under normal circumstances, the definitions would be put in
the service of reasoning, and seeing these definitions in action naturally leads to
an increased understanding. In typical TSM fashion, however, these definitions
are not put to use for reasoning except as tools for drills, and no proofs of the
generalized laws of exponents are offered.5 Not even for important special cases.
For example, the ubiquitous identity (which is a special case of the generalized version of (E3) for rational numbers m and n)
\[ \sqrt{\alpha}\,\sqrt{\beta} = \sqrt{\alpha\beta} \quad\text{for all positive } \alpha \text{ and } \beta \]
is rarely proved, or if a proof is attempted, it is not done correctly.6 It may therefore be said that, in TSM, students’ learning of rational exponents begins with the misconception that definitions are theorems, and continues with the rote-learning of the generalized versions of (E1)–(E3). This kind of mathematics “learning” should not be the intended outcome of K–12 mathematics education.
More can be said about rational exponents and TSM, however. Most textbooks discuss the laws of exponents as “number facts” but say nothing about functions. The fact that the exponential functions \(\alpha^x\) and \(\beta^x\) satisfy (E1)–(E3) for all real numbers s and t, i.e.,
\[ \beta^s\cdot\beta^t = \beta^{s+t},\qquad (\beta^s)^t = \beta^{st},\qquad (\alpha\cdot\beta)^s = \alpha^s\cdot\beta^s, \]
is then unceremoniously dumped on students. And this in spite of the fact that students’ grasp of rational exponents is still tenuous and they do not have the proper context (i.e., the problem of extending the domain of definition of \(\beta^n\) defined for positive integers n) to take in something like \(3^{\sqrt{2}}\), much less to understand why the following version of (E1) might be true:
\[ 3^{\pi}\cdot 3^{\sqrt{2}} = 3^{\pi+\sqrt{2}}. \]
The promulgation of such “higher order rote-learning skill” becomes part of the lore of TSM. It goes without saying that, amidst such chaos, the fact that the exponential functions are the main reason one studies the laws of exponents is bound to get lost.
Incidentally, anyone interested in seeing how all five fundamental principles of mathematics (see page xii) are violated can do no better than examining what TSM does to the laws of exponents.

5 It must be admitted that a direct proof of any of the generalized versions of (E1)–(E3) would be extremely tedious.
6 We have made a special effort to give a self-contained proof of this identity on pages 220 ff.
Given all the attendant pitfalls in the preceding approach, there is a compelling reason to introduce, at the outset, the problem of “extension” (of \(3^n\) to \(3^x\)) in order to set the stage properly for our interest in rational exponents. So long as the existence of exponential functions will be assumed without proof, why not assume it sooner rather than later? We will therefore quote, at the outset, a known theorem (Theorem 9.1 on page 198) that guarantees the existence of a unique “extension” of the function \(g(n) = \beta^n\) (β > 0) that satisfies property (E1), i.e., \(\beta^s\cdot\beta^t = \beta^{s+t}\) for all s and t. We will then use this theorem to develop all the known facts about rational exponents systematically and logically. In particular, the erstwhile definitions of \(\beta^0\), \(\beta^{m/n}\), and \(\beta^{-m/n}\) can now be unambiguously proved as theorems; there will no longer be any logical subtleties to induce any confusion between what is being proved and what is being defined. In this way, we will be able
to provide the reasoning for what is known about rational exponents in K–12
mathematics (see Section 9.3). What we are doing for rational exponents here
is not unlike what we did in Chapter 1 of [Wu-PreAlg] for fractions: reorganize
the known facts in such a way that they become mathematically coherent and
pedagogically feasible.
To understand this coherence first-hand, with the goal of making this mate-
rial more understandable and learnable by your students, you are invited to go
through the detailed reasoning in this chapter.

9.1. Positive-integer exponents

We are used to the notation of the square \(x^2\) or the cube \(x^3\) of a given number x.
In this chapter, we will give a somewhat systematic approach to the concept of
raising a given positive number β to the s-th power for any real number s. Because
this concept is traditionally not treated well in TSM, we begin by explaining the
need for exponential notation.

[Margin note: In the same way that repeated addition leads to multiplication, repeated multiplication leads to exponentiation.]

When we have to add the same whole number to itself many times (which happens often, as we all know), we simplify by introducing a new concept (multiplication) and a new notation (×). Thus, instead of writing 5 + 5 + 5, we write 3 × 5. In general, if k and n are positive integers, then

\[ n\times k = \underbrace{k + k + \cdots + k}_{n}. \tag{9.1} \]

Later on, we will also encounter the phenomenon of having to multiply the same whole number k by itself many times. For that, we introduce the concept of exponentiation: 5 × 5 × 5 is denoted by \(5^3\). In general, if k and n are whole numbers and n > 0, then

\[ k^n = \underbrace{k\times k\times\cdots\times k}_{n}. \tag{9.2} \]

As is well known, n is called the power, or exponent, of \(k^n\), and k is the base of \(k^n\). One also speaks of \(k^n\) as raising k to the n-th power.
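Definitions (9.1) and (9.2) translate directly into loops. The Python sketch below (the function names times and power are ours, for illustration) implements n × k as repeated addition and \(k^n\) as repeated multiplication, for positive integers n only, exactly mirroring the parallel between the two definitions.

```python
def times(n, k):
    """n x k as in (9.1): k added to itself n times (n a positive integer)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = k
    for _ in range(n - 1):
        total += k
    return total

def power(k, n):
    """k^n as in (9.2): k multiplied by itself n times (n a positive integer)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = k
    for _ in range(n - 1):
        result *= k
    return result

print(times(3, 5), power(5, 3))  # 15 125
```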
The definition of \(k^n\) only requires that we can multiply k by itself, and it doesn’t matter whether k is a fraction or a rational number, or in fact (by virtue of FASM; see page 265) a real number. With this understood, we will let the base be a real number from the beginning. Thus let α, β be real numbers and let m and n be any positive integers. Then we have the following laws of exponents for positive integer exponents:

(E1) \(\beta^m\cdot\beta^n = \beta^{m+n}\).
(E2) \((\beta^m)^n = \beta^{mn}\).
(E3) \((\alpha\cdot\beta)^n = \alpha^n\cdot\beta^n\).

These three facts are, simultaneously, trivial to prove and “fun” to use due to their
simplicity. For example, (E1) says that, in an intuitive sense, exponents are additive
under multiplication. As to the triviality of their proofs, there is no doubt of that.
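For example, here is the proof of a special case of (E2) when m = 3, n = 5:

\[ (\beta^3)^5 = (\beta\beta\beta)^5 = (\beta\beta\beta)(\beta\beta\beta)(\beta\beta\beta)(\beta\beta\beta)(\beta\beta\beta) = \underbrace{\beta\beta\cdots\beta}_{15} = \beta^{3\times 5}. \]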

The general proof of (E2) is almost identical, and the proofs of the other two
identities are equally straightforward and will be left as exercises (see Exercise 1
on page 200).
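Checking (E1)–(E3) on examples is no substitute for the proofs, but it can be reassuring. The following Python sketch uses exact fractions (including a negative base) and the repeated-multiplication definition (9.2), so every comparison is exact rather than approximate; the particular bases and the range of exponents are arbitrary choices of ours.

```python
from fractions import Fraction
from itertools import product

def power(beta, n):
    """beta^n for a positive integer n, by repeated multiplication as in (9.2)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    result = beta
    for _ in range(n - 1):
        result *= beta
    return result

bases = [Fraction(-3, 2), Fraction(2, 5), Fraction(7)]
exps = range(1, 6)

# (E1), (E2), (E3) checked exactly for every combination of sample values.
E1 = all(power(b, m) * power(b, n) == power(b, m + n)
         for b, m, n in product(bases, exps, exps))
E2 = all(power(power(b, m), n) == power(b, m * n)
         for b, m, n in product(bases, exps, exps))
E3 = all(power(a * b, n) == power(a, n) * power(b, n)
         for a, b, n in product(bases, bases, exps))
print(E1, E2, E3)  # True True True
```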

Let us go back to (9.1) for a moment: The k and n in the definition of multipli-
cation in (9.1) must be extended from whole numbers to fractions (Chapter 1 in
[Wu-PreAlg]), to rational numbers (Chapter 2 in [Wu-PreAlg]), and—by virtue of
FASM—to all real numbers. Problems in nature and everyday life have exposed
us to the need for such extensions. We recall briefly that the extensions are not
done randomly but are always made to satisfy some basic requirements:

(1) Each extension respects the definition in the preceding stage, so that the multiplication of fractions coincides with the multiplication of whole numbers when the fractions are whole numbers, the multiplication of rational numbers coincides with the multiplication of fractions when the rational numbers are fractions, and the multiplication of real numbers coincides with the multiplication of rational numbers when the real numbers are rational numbers.

(2) The extended meaning of multiplication continues to satisfy the associative, commutative, and distributive laws.

(3) 1 · x = x regardless of whether x is a fraction, rational number, or real number.

These requirements are self-explanatory and need no justification.

Now consider the parallel situation regarding exponentiation. We have just remarked in connection with (E1)–(E3) that, in (9.2), k can be any real number. While eventually the exponent n in (9.2) will also be extended to real numbers, our immediate concern is to at least make sense of (9.2) when n is a rational number. Now, while it is easy to accept that we want to multiply any two real numbers, it is less obvious why one should bother with \(5^{3/4}\) or even \(3^{-\sqrt{2}}\). TSM would have students believe that it is necessary to get to know this new notation because standardized tests say they must, but this is not how we want to teach mathematics. The real need for allowing n in (9.2) to be an arbitrary number is actually very natural, although not one that is often encountered in school mathematics. It has to do with how to interpolate a function defined on whole numbers to one defined on R, in the following sense. Let us fix a positive (real) number β to be the base (the requirement that β be positive when we extend n to the rational numbers will be explained on page 204 below). Then we have a function

\[ E_0 : \{\text{positive integers}\} \to \mathbb{R} \]

defined by

\[ E_0(n) = \beta^n \quad\text{for any positive integer } n. \tag{9.3} \]

Because the domain of definition of \(E_0\) is the positive integers, the graph of \(E_0\) is just a sequence of dots (we are taking the domain of definition of the function \(E_0\) very seriously at this point). For β = 3, the graph of the function that assigns \(3^n\) to each positive integer n (with the y-axis compressed) looks like this (observe that we are using a scaled coordinate system in the sense of page 134):
[Figure: the graph of \(n \mapsto 3^n\) on the positive integers, a sequence of isolated dots.]

Of course when we see dots, we try to connect them! (That is the basic impulse anyway.) Formally, we would like to get a reasonable-looking curve that is the graph of a function to interpolate these dots, in the sense of page 128. Thus, let [1, ∞) stand for the right-pointing ray on the number line with vertex at 1. Then what we want to do is define a function
$$E : [1, \infty) \to \mathbb{R}$$
so that
$$E(n) = E_0(n) = \beta^n \quad \text{for all positive integers } n. \tag{9.4}$$
Such an E satisfying (9.4) is said to interpolate $E_0$ on [1, ∞). Clearly, the graph of E is a curve that “connects all the dots” of the graph of $E_0$.
There are many ways to achieve an interpolation of $E_0$, the most primitive one being to connect any two adjacent dots of the graph of $E_0$ by a line segment and then define the interpolating function to be the one whose graph is the sequence of connected segments so obtained. This way of interpolating a function has in fact been implicitly described in connection with the book-cost function h(n) and its corresponding linear function of one variable H(x) on page 128.⁷ However, there is usually a “natural” interpolation of a given function which “respects” its characteristic properties. For the case at hand, it will be seen that property (E1) is the defining characteristic of $E_0$. Now when (E1) is stated in terms of the function $E_0$ (see (9.4)), it becomes:
$$E_0(m)\,E_0(n) = E_0(m+n) \quad \text{for all positive integers } m \text{ and } n.$$
Therefore we expect to have an interpolation $E : [1, \infty) \to \mathbb{R}$ of $E_0$ with a similar property, namely,
$$E(s)\,E(t) = E(s+t) \quad \text{for } s, t \ge 1.$$
7 With a little reflection, one would realize that the extension of the profit function in the manufacturing problem on page 159 is the precise 2-dimensional analog of the extension from h to H in

As often happens, nature surprises us by offering more than we ask for: by the
use of more advanced mathematics, one in fact proves that there is a function E
defined not just on [1, ∞) but on all of R, so that
E(s) E(t) = E(s + t) for all numbers s and t.
See Chapter 21 in Volume III of [Wu-HighSchool]. Because it is this interpolating
function that brings the correct perspective to the definitions of rational exponents
and to the laws of exponents, we will make this function the cornerstone of our
discussion. From now on, we will assume the validity of this theorem without
proof.⁸ Here is the theorem in question (in part (A) the term “continuous” is
used, but we will not even define what it means because it is a technical condition
that ensures the validity of the theorem but is otherwise never mentioned again
in this volume).
Theorem 9.1. For a given positive constant β, there exists a unique function $E : \mathbb{R} \to \mathbb{R}$, defined on the whole number line R, so that E satisfies:
(A) E is continuous, and for all positive integers n, $E(n) = \beta^n$.
(B) For all (real) numbers s and t, $E(s)\,E(t) = E(s+t)$.
The function E in the theorem is known as an exponential function with base
β. Here is the graph of E for β = 3; observe that this graph connects the dots of
the graph on page 197.

We accept Theorem 9.1 without proof partly because its proof is not appropri-
ate for school mathematics, but partly also because, at this point, the proof is not
as important as what this theorem has to say about our general understanding of
the rational exponents of a given number.
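Since Theorem 9.1 is accepted here without proof, a numerical experiment can at least make conditions (A) and (B) concrete. The sketch below is only an illustration, not a proof: it assumes that Python's `math.pow` is an adequate floating-point stand-in for the exponential function E with base β = 3, and the helper name `E` and the sample points are our own choices.

```python
import math

BETA = 3.0  # the base beta of Theorem 9.1 (any positive number would do)

def E(x: float) -> float:
    """Floating-point stand-in for the exponential function with base BETA."""
    return math.pow(BETA, x)

# Condition (A): E agrees with repeated multiplication at positive integers.
for n in range(1, 11):
    product = 1.0
    for _ in range(n):
        product *= BETA
    assert math.isclose(E(n), product)

# Condition (B): E(s) * E(t) equals E(s + t), up to rounding error.
samples = [-2.5, -1.0, 0.0, 0.3, 1.0, math.sqrt(2), 4.75]
for s in samples:
    for t in samples:
        assert math.isclose(E(s) * E(t), E(s + t), rel_tol=1e-12)

print("conditions (A) and (B) hold at all sample points")
```

Floating-point powers are only approximations, which is why `math.isclose` is used instead of `==`.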
Let us rewrite Condition (B) of Theorem 9.1 into a more familiar form. Because of Condition (A), it is tempting to rewrite E(x) as $\beta^x$ for every number x; indeed, if x is a positive integer n, then (A) says E(n) is $\beta^n$ after all. If we agree to do that, and we do so from now on, then (B) becomes:

$$\beta^s \cdot \beta^t = \beta^{s+t} \quad \text{for all numbers } s, t. \tag{9.5}$$

8 This is not unlike the discussion of polynomials of one variable: without assuming the Fundamental Theorem of Algebra, such a discussion would be superficial and essentially pointless.
In this form, (9.5) is recognized as one of the general laws of exponents; see Section 9.3 for further discussion. At the risk of harping on the obvious, notice that (A) guarantees that the meaning of $\beta^n$ for a positive integer n is unambiguous: it is $\beta \cdot \beta \cdots \beta$ (n times).
Because we use Theorem 9.1 as the starting point of our discussion of rational exponents, we will rely on the identity (9.5) to survey and reorganize the landscape of rational exponents. We begin with some simple deductions. First of all,
$$\beta^0 = 1. \tag{9.6}$$
Indeed, letting s = 0 and t = 1 in (9.5), we obtain $\beta^1 = \beta^0 \beta^1$, i.e., $\beta = \beta^0 \beta$. Multiplying both sides by $\frac{1}{\beta}$ (recall that β > 0), we get (9.6).
Equation (9.6) has the important consequence that
$$\beta^x > 0 \quad \text{for every } x. \tag{9.7}$$
To prove this, we first show that $\beta^{x/2} \neq 0$. By (9.5) again, we have
$$\beta^{x/2} \cdot \beta^{-x/2} = \beta^0 = 1,$$
and therefore neither $\beta^{x/2}$ nor $\beta^{-x/2}$ can be 0. Such being the case,
$$\beta^x = \beta^{x/2} \cdot \beta^{x/2} = \left(\beta^{x/2}\right)^2 > 0$$
because the square of a nonzero number is always positive. Thus (9.7) is proved.
(9.6) allows us to draw another conclusion about the relationship between $\beta^x$ and $\beta^{-x}$. Let s = x and let t = −x in (9.5); then we get $\beta^0 = \beta^x \cdot \beta^{-x}$, so that $1 = \beta^x \cdot \beta^{-x}$. Since $\beta^x \neq 0$ by (9.7), we may divide by it. Therefore,
$$\beta^{-x} = \frac{1}{\beta^x} \quad \text{for every } x. \tag{9.8}$$
One consequence of (9.8) is that if we know the value of $\beta^x$ when x is a fraction, then (together with (9.6)) we know the value of $\beta^x$ for all rational numbers x. In the next section, we will make use of (9.5) in a more substantial way to determine the value of $\beta^x$ at a fraction x, which will then determine the values of $\beta^x$ at all rational numbers.
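The three consequences (9.6)–(9.8) can likewise be spot-checked numerically. This is a minimal sketch, again assuming `math.pow` as a floating-point model of $\beta^x$; the function name `check_consequences` is hypothetical, and agreement at finitely many points proves nothing by itself.

```python
import math

def check_consequences(beta: float, xs: list[float]) -> None:
    """Spot-check (9.6), (9.7), (9.8) at the points xs for a base beta > 0."""
    assert beta > 0, "the base must be positive"
    assert math.pow(beta, 0) == 1.0                # (9.6): beta^0 = 1
    for x in xs:
        value = math.pow(beta, x)
        assert value > 0                           # (9.7): beta^x > 0
        # (9.8): beta^(-x) = 1 / beta^x, up to rounding error
        assert math.isclose(math.pow(beta, -x), 1 / value, rel_tol=1e-12)

check_consequences(3.0, [-4.0, -0.5, 0.0, 0.25, 2.0, 7.5])
check_consequences(0.2, [-3.0, 1.5, 10.0])
print("(9.6)-(9.8) hold at the sample points")
```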
Remark. The status of (9.6)–(9.8) in TSM is ambiguous at best, in the sense that it is never clear whether these are definitions to be memorized or theorems to be proved. The confusion is the direct result of pseudo-mathematical arguments in TSM that purport to prove (9.6) and (9.8), whereas in the usual treatment of TSM, (9.6) and (9.8) have to be definitions that give meaning to the symbols $\beta^0$ and $\beta^{-m/n}$. As is to be expected, the fact that (9.6)–(9.8) are definitions is almost never made clear in TSM. In the context of the logical development of this chapter, (9.6)–(9.8) are, unambiguously, theorems we have proved on the basis of Theorem 9.1. This approach to exponents that are not positive integers has the advantage of logical clarity: we see here, as we shall see in the following two sections (Sections 9.2 and 9.3), that everything we know about exponents follows from identity (9.5). Therefore all the facts we know about exponents are not isolated skills but are spinoffs from a single entity, namely, identity (9.5). This is one example of mathematical coherence, except that we do not flaunt it in these volumes (in the same way that we try not to call your attention to the fact that these volumes are written in English), because this is what mathematics is.
Let us bring closure to the discussion of interpolations of functions. What we have just witnessed in (9.6) and (9.8) is that, once we know (9.5) is valid, it becomes very simple to understand why the zeroth power and the negative powers of β must be what they are. In the next section, we will see that (9.5) brings the same clarity to the fractional powers. But (9.5) would not come to the forefront of these considerations without a clear understanding that the desired interpolation of the function $E_0(n) = \beta^n$, given in Theorem 9.1, exists. So the question becomes how one could dream up such interpolations, i.e., the exponential functions. The simple answer is that these exponential functions do not need to be dreamt up because they have been there all along. Indeed, many natural processes of growth and decay (growth of a bacterial population, decay of plutonium atoms, etc.) have to be modeled by the use of exponential functions. In addition, exponential functions appear naturally in mathematics: solutions of many basic (differential) equations involve exponential functions. For these reasons, exponential functions, i.e., the interpolations of $\beta^n$, provide natural guideposts for our understanding of rational exponents.

Exercises 9.1
(1) (i) Write out a proof of (E1) and (E3) on page 195 for the special case of m = 3 and n = 5. (ii) Write out a general proof of (E1) and (E3).
(2) Here is a common explanation of why $5^0$ must be equal to 1:
Consider the sequence of powers of 5:
$5^4 = 625$, $5^3 = 125$, $5^2 = 25$, $5^1 = 5$, and $5^0 = {?}$
Notice that you obtain $5^3$ from $5^4$ by dividing by 5, obtain $5^2$ from $5^3$ by dividing by 5, and obtain $5^1$ from $5^2$ by dividing by 5. Therefore, we should also obtain $5^0$ from $5^1$ by dividing by 5. So $5^0 = \frac{5}{5} = 1$.
Do you think this is a valid proof? Why or why not?
(3) Let H : {whole numbers} → {all real numbers ≥ 0} be the function $H(x) = 2^x$. Plot $(x, 2^x)$ for x = 0, 1, ..., 12 to get an idea of the general shape of the graph.
(4) Mental math (x and y are real numbers):
(i) Simplify $(3x - y)^7 (3x + y)^7$.
(ii) $42^4 \cdot 14^{-4} = {?}$
(iii) If it is known that $17^3 = 4913$, what is $34^3$ when it is rounded to the nearest $10^4$?
(iv) Simplify $(x^4 - y^4)^{-5} (x^3 + x^2 y + x y^2 + y^3)^5$.

9.2. Rational exponents

Rational exponents are next. We will use the notation of the preceding section; in particular, β is a positive number.
We start with the simplest case: what is $\beta^{1/2}$? By (9.5),
$$\beta^{1/2} \cdot \beta^{1/2} = \beta^{\frac12 + \frac12} = \beta^1 = \beta.$$
If we write γ for $\beta^{1/2}$, this says γ is a positive number (by (9.7)) so that $\gamma^2 = \beta$, i.e., γ is the positive square root of β. This prompts us to take a more serious look at the concept of a square root.
A good mathematics education sometimes has the beneficial effect of making you stop and think about things you may have taken for granted all along and, in the process, helping you gain new understanding of these things. A case in point for most of us is our experience with the number π. We may have learned in some vague sense that π is the ratio of circumference over diameter, but we may not stop to think what “circumference” means until we encounter a formal definition of length of a curve such as the one offered in Section 5.2 of [Wu-PreAlg]. Moreover, we may not have thought about how to get a reasonably accurate estimate of π until we realize that π is also the area of the unit disk and, therefore, with a precise concept of area available (such as the one offered in Section 5.3 of [Wu-PreAlg]), we can actually achieve such an estimate by hand. The situation with the “square root of a number” is similar. It is a concept that is all too familiar, but how do we know that there is a number whose square is exactly the given number? Take the square root of 2, for instance. You can perhaps rattle off 1.4142135... as that number, but you may also be aware that the decimal expansion of the square root of 2 is nonrepeating, so that no matter how many decimal digits you write down, it will just be an approximation. For example, $1.4142135^2 = 1.99999982358225$. So what gives you the confidence that there is a number which is the “square root of 2”?
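The observation that every finite decimal approximation of √2 misses 2 can be checked with exact arithmetic. The following sketch uses Python's standard `fractions.Fraction`, so the squares are computed exactly rather than rounded; the list of truncations is simply a choice of sample inputs, and checking finitely many decimals is of course not a proof that √2 is irrational.

```python
from fractions import Fraction

# Successive decimal truncations of the square root of 2.
truncations = ["1.4", "1.41", "1.414", "1.4142", "1.41421", "1.414213", "1.4142135"]

for digits in truncations:
    x = Fraction(digits)      # exact rational value of the decimal string
    square = x * x            # exact square, no rounding involved
    print(f"{digits}^2 = {float(square):.15f}")
    assert square < 2         # each truncation falls short of 2
```

The `assert` compares exact fractions; only the printed value is rounded for display.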
This is where mathematical knowledge can help by providing us with the answers we need. There is a theorem, proved in advanced courses, that not only square roots, but any so-called n-th roots, exist and are unique. Precisely, let n be a positive integer. Then given a positive number β, a positive number γ is said to be a positive n-th root of β if $\gamma^n = \beta$. (Recall that $\gamma^n = \gamma\gamma\cdots\gamma$ (n times).)

The fact that there is always a positive square root √x of a positive number x is highly nontrivial.

Note the emphasis throughout on the positivity of β and γ. This is because if, for example, β = −2, then there is no number on the number line whose square is β, since no number on the number line has a negative square (see Exercise 2 on page 204). Moreover, in case β > 0, e.g., β = 4, there will be at least two numbers whose square is 4, namely, 2 and −2. This is why we have to specify the positivity of γ in the preceding definition.
Then the theorem that resolves all doubts in this context is the following (it is
proved in Section 16.5 of Volume III of [Wu-HighSchool]; also in §18 of [Ross]).
Theorem 9.2. Given a positive number β and a positive integer n, there is one and only one positive number γ so that $\gamma^n = \beta$.
To paraphrase: every positive number has a unique positive n-th root (n is a positive integer). The uniqueness part of the theorem, which says that there is at most one such γ, is actually not difficult to prove, but we will postpone this proof to Corollary 2 on page 209 so as not to interrupt our discussion. Henceforth, we shall refer to the γ in the theorem as the positive n-th root of β and, if there is no fear of confusion, more simply as the n-th root of β. The standard notation for the positive n-th root of β is $\sqrt[n]{\beta}$. Note that the case n = 2 is distinguished: the notation for the positive square root is $\sqrt{\beta}$ rather than the more elaborate $\sqrt[2]{\beta}$.
Please remember that $\sqrt[n]{\beta}$ is always positive, by convention. Therefore $\sqrt{4} = 2$, and never −2. In any case, the main point of Theorem 9.2 is that there is such a thing as the positive square root of 2.
The third root of β is traditionally called its cube root.
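Theorem 9.2 is an existence-and-uniqueness statement whose proof is not given here. A common way to approximate the positive n-th root in practice is bisection, which relies on the fact (Lemma 9.4 below) that γ ↦ γⁿ is increasing for γ > 0. The sketch below assumes moderately sized inputs (a very large β could overflow `mid ** n`), and the name `nth_root` is our own; it is an approximation procedure, not a substitute for the theorem.

```python
def nth_root(beta: float, n: int, iterations: int = 200) -> float:
    """Approximate the positive n-th root of beta by bisection.

    Assumes beta > 0 and n >= 1.  The root lies in (0, max(1, beta)]:
    if beta >= 1, then 1 <= gamma <= gamma**n = beta; if beta < 1, then gamma < 1.
    """
    if beta <= 0 or n < 1:
        raise ValueError("need beta > 0 and a positive integer n")
    lo, hi = 0.0, max(1.0, beta)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** n < beta:
            lo = mid          # mid is too small to be the root
        else:
            hi = mid          # mid is at least as large as the root
    return (lo + hi) / 2

print(nth_root(2.0, 2))    # close to 1.41421356...
print(nth_root(27.0, 3))   # close to 3
```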

Example. Graph the function given by $r(x) = \sqrt{x}$.
We note that this is not a function from all numbers to all numbers, but
r : {all real numbers ≥ 0} → {all real numbers}.
The following sequence of points on the graph of r(x) is self-explanatory:

[Figure: points $(x, \sqrt{x})$ on the graph of r, with marks 1, 4, 9, 16 on the x-axis and 1, 2, 3, 4 on the y-axis.]

This sequence of points exhibits two different patterns: when 0 < x < 1, $x < \sqrt{x} = r(x)$, but when 1 < x, $x > \sqrt{x} = r(x)$ (see Exercise 4 on page 205). Note also that if 0 < a < b, then r(a) < r(b), i.e., $\sqrt{a} < \sqrt{b}$ (see Exercise 5 on page 205). This fact then tells us that there is no need to see more points on the graph of r(x) because as we go towards the right on the positive x-axis, the graph will simply rise slowly in the same way as it does here for the values of 1 ≤ x ≤ 16.
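The two patterns and the monotonicity of r can be spot-checked with a few lines of Python, using the standard `math.sqrt` for r(x). This is a minimal sketch on sample points of our own choosing; Exercises 4 and 5 ask for the actual proofs.

```python
import math

# A few points (x, sqrt(x)) on the graph of r(x) = sqrt(x), x >= 0.
for x in [0, 0.25, 0.5, 1, 2, 4, 9, 16]:
    print(f"x = {x:>5}: r(x) = {math.sqrt(x):.4f}")

# The two patterns noted in the text.
for x in [0.01, 0.25, 0.5, 0.99]:
    assert x < math.sqrt(x)        # 0 < x < 1  implies  x < sqrt(x)
for x in [1.01, 2, 4, 16, 100]:
    assert x > math.sqrt(x)        # 1 < x      implies  x > sqrt(x)

# r is increasing: a < b implies sqrt(a) < sqrt(b), checked on a grid.
grid = [k / 10 for k in range(0, 161)]    # 0.0, 0.1, ..., 16.0
assert all(math.sqrt(a) < math.sqrt(b) for a, b in zip(grid, grid[1:]))
```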


Verify that the function G : R → R, where G(x) is the number t so that $t^3 = x$, is well-defined, i.e., for any number x (positive or negative), there is one and only one real number t so that $t^3 = x$. Of course if x ≥ 0, then $G(x) = \sqrt[3]{x}$. Now plot a sequence of points on the graph of G to get the general shape of this graph.
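One way to see the behavior of G for negative inputs is through the identity $(-t)^3 = -t^3$, which suggests setting G(x) = −G(−x) for x < 0. The sketch below is a floating-point illustration under that assumption (the helper `G` is ours). It also reflects a Python detail: raising a negative float to the power `1/3` yields a complex number, so the sign is handled separately with `math.copysign`.

```python
import math

def G(x: float) -> float:
    """Real cube root of x: the unique real t with t**3 == x (floating-point sketch).

    For x >= 0 this is x ** (1/3); for x < 0 it uses G(x) = -G(-x),
    because (-t)**3 == -(t**3).
    """
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

for x in [-27.0, -8.0, -1.0, -0.125, 0.0, 0.125, 1.0, 8.0, 27.0]:
    t = G(x)
    print(f"G({x}) = {t:.6f}")
    assert math.isclose(t ** 3, x, rel_tol=1e-12, abs_tol=1e-12)
```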

We are now in a position to resume our discussion of rational exponents.

Recall that we had $\beta^{1/2} \cdot \beta^{1/2} = \beta$. Since $\beta^{1/2} > 0$ (by (9.7)), Theorem 9.2 implies that $\beta^{1/2}$ is the positive square root of β, i.e.,
$$\beta^{1/2} = \sqrt{\beta}.$$
In general, for any positive integer n, we claim that
$$\beta^{1/n} = \sqrt[n]{\beta}. \tag{9.9}$$
Because $\beta^{1/n} > 0$, the uniqueness part of Theorem 9.2 says that (9.9) is equivalent to proving that
$$\left(\beta^{1/n}\right)^n = \beta.$$

To this end, we have to first slightly generalize (9.5). Let $s_1, s_2, \ldots, s_n$ be any n numbers (n is a positive integer). Then, we shall prove:
$$\beta^{s_1} \cdot \beta^{s_2} \cdots \beta^{s_n} = \beta^{s_1 + \cdots + s_n}. \tag{9.10}$$
For example, we can understand a special case of (9.10) by proving $3^a \cdot 3^b \cdot 3^c = 3^{a+b+c}$ for any numbers a, b, c, as follows. We will make use of (9.5) liberally, and we will also take for granted that we can add or multiply a collection of numbers in any order (see Theorem 1 and Theorem 2 on page 270). Thus,
$$3^a \cdot 3^b \cdot 3^c = 3^a \cdot (3^b \cdot 3^c) = 3^a \cdot 3^{b+c} = 3^{a+(b+c)} = 3^{a+b+c}.$$
In general, the proof of (9.10) goes as follows. (9.5) already proves this for n = 2. Consider three numbers $s_1, s_2, s_3$. By applying (9.5) to the two numbers $s_1$ and $(s_2 + s_3)$, we have
$$\beta^{s_1} \cdot \beta^{s_2+s_3} = \beta^{s_1 + (s_2 + s_3)}.$$
By applying (9.5) again, this time to the two numbers $s_2$ and $s_3$, the left side becomes (by making use of Theorem 2 in the Appendix of Chapter 1 in [Wu-PreAlg]; see page 270)
$$\beta^{s_1} \cdot \beta^{s_2} \cdot \beta^{s_3},$$
and the right side is, of course, just
$$\beta^{s_1 + s_2 + s_3},$$
where, strictly speaking, we have just made use of Theorem 1 in the Appendix of Chapter 1 in [Wu-PreAlg]; see page 270. Therefore (9.10) is proved when n = 3. Now consider four numbers $s_1, s_2, s_3, s_4$. By applying (9.5) to the two numbers $s_1$ and $(s_2 + s_3 + s_4)$, we have
$$\beta^{s_1} \cdot \beta^{s_2+s_3+s_4} = \beta^{s_1 + (s_2 + s_3 + s_4)}.$$
Because we know (9.10) is valid for n = 3, the left side becomes (using Theorem 2 on page 270)
$$\beta^{s_1} \cdot \beta^{s_2} \cdot \beta^{s_3} \cdot \beta^{s_4},$$
while the right side is, using Theorem 1 on page 270,
$$\beta^{s_1 + s_2 + s_3 + s_4}.$$
Therefore (9.10) is proved when n = 4. We can now go on to prove that (9.10) is valid for n = 5 in a similar manner. The pattern is clear: we can prove the validity of (9.10) for any positive integer n, as claimed.
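The step-by-step argument for (9.10) splits off one factor at a time, which is also how a recursive product is computed. The following sketch (the function name `product_of_powers` is hypothetical, and `math.pow` again serves as a floating-point model of $\beta^x$) multiplies $\beta^{s_1}$ by the product of the remaining powers and compares the result with $\beta^{s_1 + \cdots + s_n}$.

```python
import math

def product_of_powers(beta: float, exponents: list[float]) -> float:
    """Return beta**s1 * beta**s2 * ... * beta**sn, one factor at a time.

    The recursion mirrors the argument for (9.10): split off beta**s1 and
    multiply it by the product of the remaining powers.
    """
    if not exponents:
        return 1.0                              # empty product (beta^0 = 1)
    first, rest = exponents[0], exponents[1:]
    return math.pow(beta, first) * product_of_powers(beta, rest)

exps = [0.5, -1.25, 2.0, math.pi, 0.1]
lhs = product_of_powers(3.0, exps)     # left side of (9.10)
rhs = math.pow(3.0, sum(exps))         # right side of (9.10)
print(lhs, rhs)
assert math.isclose(lhs, rhs, rel_tol=1e-12)
```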
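We can now prove (9.9). Noting that
$$\underbrace{\frac1n + \frac1n + \cdots + \frac1n}_{n \text{ terms}} = 1,$$
we have, by (9.10),
$$\underbrace{\beta^{1/n} \cdot \beta^{1/n} \cdots \beta^{1/n}}_{n \text{ factors}} = \beta^1 = \beta.$$
Thus $\left(\beta^{1/n}\right)^n = \beta$. Since $\beta^{1/n} > 0$ by (9.7), this says $\beta^{1/n}$ is $\sqrt[n]{\beta}$, by Theorem 9.2. This proves (9.9).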

The equality (9.9) explains why we had to assume β > 0. If β were a negative number, then the value of $\beta^{1/2}$, or for that matter the value of $\beta^{1/n}$ for any even positive integer n, would not be a real number (see Exercise 2 on page 204).
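Finally, we can determine the value of $\beta^x$ when x is a fraction:
$$\beta^{m/n} = \left(\sqrt[n]{\beta}\right)^m \quad \text{for all positive integers } m \text{ and } n. \tag{9.11}$$
One can see the reason behind (9.11) in a special case: using (9.10) and the fact that $\frac14 + \frac14 + \frac14 = \frac34$, we have
$$5^{3/4} = 5^{\frac14 + \frac14 + \frac14} = 5^{1/4} \cdot 5^{1/4} \cdot 5^{1/4} = \left(5^{1/4}\right)^3 = \left(\sqrt[4]{5}\right)^3.$$
In general, because
$$\frac{m}{n} = \underbrace{\frac1n + \frac1n + \cdots + \frac1n}_{m \text{ terms}},$$
(9.10) implies that
$$\beta^{m/n} = \underbrace{\beta^{1/n} \cdot \beta^{1/n} \cdots \beta^{1/n}}_{m \text{ factors}}.$$
Therefore, using (9.9), the right side is equal to
$$\underbrace{\sqrt[n]{\beta} \cdot \sqrt[n]{\beta} \cdots \sqrt[n]{\beta}}_{m \text{ factors}} = \left(\sqrt[n]{\beta}\right)^m;$$
this then proves (9.11).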
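It follows from (9.8) and (9.11) that
$$\beta^{-m/n} = \frac{1}{\left(\sqrt[n]{\beta}\right)^m} \quad \text{for all positive integers } m \text{ and } n.$$
Together with (9.6) and (9.11), we have the following complete determination of the values of the function $\beta^x$ when x is a rational number: for all positive integers m and n,
$$\begin{cases} \beta^{m/n} = \left(\sqrt[n]{\beta}\right)^m, \\ \beta^0 = 1, \\ \beta^{-m/n} = \dfrac{1}{\left(\sqrt[n]{\beta}\right)^m}. \end{cases} \tag{9.12}$$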
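Formula (9.12) is effectively a recipe for computing $\beta^x$ at any rational x from n-th roots alone. The sketch below follows that recipe case by case. It assumes a bisection routine for the n-th root (our own helper, standing in for Theorem 9.2) and uses Python's `fractions.Fraction` to hold the exponent m/n in lowest terms; the names `nth_root` and `rational_power` are hypothetical.

```python
from fractions import Fraction

def nth_root(beta: float, n: int) -> float:
    """Positive n-th root of beta > 0 by bisection (moderate beta assumed)."""
    lo, hi = 0.0, max(1.0, beta)
    for _ in range(200):                      # far more halvings than needed
        mid = (lo + hi) / 2
        if mid ** n < beta:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rational_power(beta: float, q: Fraction) -> float:
    """Compute beta**q for beta > 0 and rational q, case by case as in (9.12)."""
    if beta <= 0:
        raise ValueError("the base must be positive")
    if q == 0:
        return 1.0                                   # beta^0 = 1
    m, n = abs(q.numerator), q.denominator           # |q| = m/n in lowest terms
    root_power = nth_root(beta, n) ** m              # (n-th root of beta)^m
    return root_power if q > 0 else 1 / root_power   # beta^(-m/n) = 1/(...)^m

print(rational_power(8.0, Fraction(2, 3)))    # close to 4
print(rational_power(8.0, Fraction(-2, 3)))   # close to 0.25
```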

Our work is not yet done, however, because there are still tantalizing questions we cannot answer at this point, e.g., is it true that $(\beta^m)^{1/n} = (\beta^{1/n})^m$? We shall deal with this and other related questions in the next section.

Exercises 9.2
(1) Do not use a calculator for the following. (a) $125^{-2/3} - 32^{-2/5} = {?}$ (b) $512^{4/3} = {?}$ (c) $3125^{2/5} \cdot 256^{1/4} = {?}$
(2) (a) Explain why there is no number on the number line whose square is a negative number. (b) If n is an even positive integer, explain why there is no number on the number line whose n-th power is a negative number.
(3) Verify each of the following by direct computations (no calculators):
(a) $729^{1/2} \cdot 729^{1/3} = 729^{5/6}$.
(b) $\left(64^{5/6}\right)^{1/2} = 64^{(5/6)(1/2)}$.
(c) $216^{4/3} \cdot 125^{4/3} = (216 \cdot 125)^{4/3}$.
(4) Prove that when 0 < x < 1, $x < \sqrt{x}$, and when 1 < x, $x > \sqrt{x}$.
(5) Prove that if 0 < a < b, then $\sqrt{a} < \sqrt{b}$.
(6) Let S be the segment from −1 to 1 on the number line, and let F be the function F : S → {the real numbers} defined by $F(x) = 4096^x$. Plot enough points, without the use of a calculator, to get a general picture of the graph. (Note that $4096 = 2^{12}$.)

9.3. Laws of exponents

For positive integer exponents, we have the laws of exponents (E1)–(E3) (page 195). Now that we have a definition of $\beta^x$ for any real number x, it seems natural to try to generalize (E1)–(E3) to all exponents, i.e., for all positive real numbers α and β, and for all real numbers s and t:
(E4) $\beta^s \cdot \beta^t = \beta^{s+t}$.
(E5) $(\beta^s)^t = \beta^{st}$.
(E6) $(\alpha \cdot \beta)^s = \alpha^s \cdot \beta^s$.
These are indeed valid, and are called the general laws of exponents. We recognize that (E4) is just (9.5) on page 199. Incidentally, (E5) answers affirmatively the question raised at the end of the preceding section: indeed, it is the case that $(\beta^m)^{1/n} = (\beta^{1/n})^m$, i.e., $\sqrt[n]{\beta^m} = \left(\sqrt[n]{\beta}\right)^m$.
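Before turning to proofs, one can test (E4)–(E6) at randomly chosen positive bases and real exponents. This sketch uses Python's built-in `**` on floats as the model of $\beta^x$, so equality can only be checked up to rounding (via `math.isclose`); the ranges and the random seed are arbitrary choices, and passing such a test is evidence, not proof.

```python
import math
import random

random.seed(0)  # reproducible sample points
for _ in range(1000):
    alpha = random.uniform(0.1, 10.0)   # positive bases
    beta = random.uniform(0.1, 10.0)
    s = random.uniform(-3.0, 3.0)       # real exponents
    t = random.uniform(-3.0, 3.0)
    assert math.isclose(beta**s * beta**t, beta**(s + t), rel_tol=1e-9)       # (E4)
    assert math.isclose((beta**s)**t, beta**(s * t), rel_tol=1e-9)            # (E5)
    assert math.isclose((alpha * beta)**s, alpha**s * beta**s, rel_tol=1e-9)  # (E6)
print("(E4)-(E6) hold at 1000 random sample points, up to rounding")
```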

In advanced courses, both the definition of $\beta^x$ for any number x and the proofs of these laws, again for all s and t, are done in one fell swoop (see Chapter 21 in Volume III of [Wu-HighSchool]). Our more modest goal in this section is to at least prove (E5) and (E6) for rational numbers s and t on the basis of (E4), i.e., on the basis of identity (9.5). ((E5) is the statement of Theorem 9.3 on page 206, and (E6) is the statement of Theorem 9.7 on page 212.) Let it be stated at the outset that, even with this limited goal, the proofs in the next ten pages are not entirely straightforward, and most of them will not likely see the light of day in a middle school classroom. But we provide these proofs nonetheless because the reasoning is extremely instructive; if you read them through carefully (and even if you do not follow everything), you are bound to become more comfortable with rational exponents. At the very least, study Corollary 2 on page 209 and the proof of Theorem 9.6 on page 211 carefully, because they will enable you to explain to your students the fundamental fact that, for all positive numbers x and y,
$$\sqrt[n]{xy} = \sqrt[n]{x} \cdot \sqrt[n]{y}.$$
The special case of this identity for n = 2,
$$\sqrt{xy} = \sqrt{x} \cdot \sqrt{y},$$
is so ubiquitous that we will give it a self-contained proof in Section 9.5 on page 220.

Remark. In TSM, these laws of exponents are presented as isolated skills about numbers, and the main selling point in such presentations would seem to be the novelty of rewriting something familiar, e.g., $\sqrt[5]{3^4}$, in the esoteric notation of $3^{4/5}$. The unspoken message is that these new skills, in the absence of any reason why they deserve to be learned, must be memorized because they are needed for standardized tests. We hope to have convinced you, however, that these laws are not tricks to memorize for standardized tests, but are remarkable properties of the exponential functions (in the sense of Theorem 9.1) that you must get to know if you want to understand these functions. As mentioned earlier, the exponential functions are truly basic in both mathematics and the sciences, and therefore these laws of exponents are eminently worth knowing.

The laws of exponents are not number facts written in esoteric notation. They are remarkable properties of exponential functions.


Prove each of the following in two different ways: first, appeal to (E4)–(E6), and second, directly compute the values of both sides with a four-function calculator. (i) $27^{2/3} \cdot 125^{2/3} = (27 \cdot 125)^{2/3}$. (ii) 3438/3 = 3434.

The first goal of this section is the proof of the following special case of (E5).
Theorem 9.3. Let β > 0. Then
$$(\beta^s)^t = \beta^{st} \quad \text{for all rational numbers } s \text{ and } t. \tag{9.13}$$
We would like to point out, first of all, that (9.13) does not come out of nowhere, but is rather the comprehensive summary of various known special cases. For example, we proved earlier that if m and n are positive integers, then
$$\beta^{m/n} = \left(\sqrt[n]{\beta}\right)^m$$
(see (9.11) on page 204). But by (9.9), $\beta^{1/n} = \sqrt[n]{\beta}$. Therefore the preceding equality can be written as
$$\beta^{m/n} = \left(\beta^{1/n}\right)^m.$$
Or, equivalently,
$$\left(\beta^{1/n}\right)^m = \beta^{m/n}. \tag{9.14}$$
We see that this is a special case of (9.13) when s = 1/n and t = m.
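In addition, we know from (9.12) on page 204 that
$$\beta^{-m/n} = \frac{1}{\left(\sqrt[n]{\beta}\right)^m}.$$
But the product formula for rational quotients (see page 270)⁹ implies that
$$\frac{1}{\left(\sqrt[n]{\beta}\right)^m} = \underbrace{\frac{1}{\sqrt[n]{\beta}} \times \cdots \times \frac{1}{\sqrt[n]{\beta}}}_{m \text{ factors}}.$$

9 Together with FASM.

By combining the two and using $\beta^{1/n} = \sqrt[n]{\beta}$, we get:
$$\beta^{-m/n} = \underbrace{\frac{1}{\beta^{1/n}} \times \cdots \times \frac{1}{\beta^{1/n}}}_{m \text{ factors}}.$$
Now using the fact that $\beta^{-s} = 1/\beta^s$ for any number s (see (9.8) on page 199), we have
$$\beta^{-m/n} = \underbrace{\beta^{-1/n} \cdots \beta^{-1/n}}_{m \text{ factors}} = \left(\beta^{-1/n}\right)^m.$$
Written another way, this says:
$$\left(\beta^{-1/n}\right)^m = \beta^{-m/n}. \tag{9.15}$$
This is then a special case of (9.13) when s = −1/n and t = m.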
As another variation on this theme, again we start with (9.12) on page 204, to the effect that
$$\beta^{-m/n} = \frac{1}{\left(\sqrt[n]{\beta}\right)^m}.$$
Using $\beta^{-s} = 1/\beta^s$ for any number s, we rewrite the right side as
$$\frac{1}{\left(\sqrt[n]{\beta}\right)^m} = \left(\sqrt[n]{\beta}\right)^{-m} = \left(\beta^{1/n}\right)^{-m}.$$
Combining the two, we have
$$\left(\beta^{1/n}\right)^{-m} = \beta^{-m/n}. \tag{9.16}$$
We see that this is a special case of (9.13) when s = 1/n and t = −m. We will build on (9.14)–(9.16) to prove (9.13).
A second comment about the proof of (9.13) is that it can be done by a brute-force “grinding out” process, but with a little finesse the proof can be greatly simplified and made conceptually more transparent. Anticipating Corollary 2 on page 209 below (which is logically independent of Theorem 9.3), we will illustrate with the proof of a very simple special case of (9.13):
$$\left(2^{1/3}\right)^{1/2} = 2^{(1/3)\cdot(1/2)}. \tag{9.17}$$
We first give the brute-force proof. The key idea is that we don’t have to prove directly that both sides of (9.17) are equal because, in view of Corollary 2, all we have to do is to make sure that both sides are positive numbers (they are) and that when we raise both sides to a large positive integer power, they are equal. Of course the whole point of raising both sides to a large positive integer power is to bypass the unpleasant fractional exponents. In this case, for example, if we raise the left side of (9.17) to the 6th power (6 is the product of the denominators of 1/3 and 1/2 in the exponents), then
$$\left(\left(2^{1/3}\right)^{1/2}\right)^6 = \underbrace{\left(2^{1/3}\right)^{1/2} \cdot \left(2^{1/3}\right)^{1/2} \cdots \left(2^{1/3}\right)^{1/2}}_{6 \text{ factors}}.$$
If we let $\beta = 2^{1/3}$, then $\sqrt{\beta} = \left(2^{1/3}\right)^{1/2}$, by (9.9) on page 202. Thus we get:
$$\left(\left(2^{1/3}\right)^{1/2}\right)^6 = \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta}.$$
Since
$$\sqrt{\beta} \cdot \sqrt{\beta} = \beta = 2^{1/3} = \sqrt[3]{2},$$
where the last equality is by (9.11) on page 204, we have
$$\left(\left(2^{1/3}\right)^{1/2}\right)^6 = \left(\sqrt[3]{2}\right)\left(\sqrt[3]{2}\right)\left(\sqrt[3]{2}\right) = 2,$$
where the last equality is because of the definition of $\sqrt[3]{2}$. By Corollary 2, (9.17) will be proved as soon as we can show that the right side of (9.17), when raised to the 6th power, is also equal to 2. Now the right side of (9.17), when raised to the 6th power, is equal to
$$\left(2^{(1/3)\cdot(1/2)}\right)^6 = \left(2^{1/6}\right)^6.$$
By (9.9), $2^{1/6} = \sqrt[6]{2}$, so that by the definition of the 6th root of 2, we have
$$\left(2^{1/6}\right)^6 = 2.$$
Therefore the right side of (9.17), when raised to the 6th power, is indeed equal to 2. This then proves (9.17).
Let us now rephrase the same proof using (9.14) on page 206. If we raise the left side of (9.17) to the 6th power, we get, by applying (9.14) twice in succession:
$$\left(\left(2^{1/3}\right)^{1/2}\right)^6 = \left(2^{1/3}\right)^{(1/2)\cdot 6} = \left(2^{1/3}\right)^3 = 2^{(1/3)\cdot 3} = 2.$$
We can also simplify the right side of (9.17) when raised to the 6th power by applying (9.14):
$$\left(2^{(1/3)\cdot(1/2)}\right)^6 = \left(2^{1/6}\right)^6 = 2^{(1/6)\cdot 6} = 2.$$
The two sides of (9.17) being equal when raised to the 6th power, Corollary 2 again concludes the proof of (9.17).


Prove the following special case of (9.13) on page 206: $\left(5^{1/2}\right)^{4/3} = 5^{2/3}$.

We will give a general proof of (9.13) by pushing harder along the line of
reasoning of (9.14)–(9.16). We need some preparation. Let us note that, for our
immediate needs, Corollary 2 on page 209 is the most important, and Corollary
2 can be proved directly (see Exercise 11 on page 213). However, Lemma 9.4
and Corollaries 1 and 2 are extremely useful facts (see Sections 10.2 and 10.4, for
example), so all three deserve to be learned.
Lemma 9.4. If two positive numbers α and β satisfy α < β, then for any positive integer n, $\alpha^n < \beta^n$.
Proof. Suppose α < β. We will prove $\alpha^n < \beta^n$ for n = 2, 3, 4, ... in succession. First, we prove $\alpha^2 < \beta^2$. Because α > 0, we may multiply both sides of α < β by α to get
$$\alpha^2 < \alpha\beta. \tag{9.18}$$
Next we multiply both sides of α < β by β (which is also positive) to get
$$\alpha\beta < \beta^2. \tag{9.19}$$
Combining (9.18) and (9.19), we get $\alpha^2 < \beta^2$.
Next we prove $\alpha^3 < \beta^3$. Multiplying both sides of $\alpha^2 < \beta^2$ by α, we get
$$\alpha^3 < \alpha\beta^2. \tag{9.20}$$
Now multiply both sides of (9.19) by β to get
$$\alpha\beta^2 < \beta^3. \tag{9.21}$$
Combining (9.20) and (9.21), we get $\alpha^3 < \beta^3$.
Let us take one more step and show that $\alpha^4 < \beta^4$. Multiplying both sides of $\alpha^3 < \beta^3$ by α, we get
$$\alpha^4 < \alpha\beta^3. \tag{9.22}$$
Now multiply both sides of (9.21) by β to get
$$\alpha\beta^3 < \beta^4. \tag{9.23}$$
Combining (9.22) and (9.23), we get $\alpha^4 < \beta^4$. In this way, we will eventually get to $\alpha^n < \beta^n$ no matter what n is. This completes the proof.
Lemma 9.4 is entirely intuitive. What may not be so intuitive is the fact that
the same lemma actually has implications in “the reverse direction”, namely, if
we know something about the n-th powers of two numbers, then we can draw
conclusions about the numbers themselves. This is the essential content of the
following two corollaries of Lemma 9.4 which turn out to be even more useful to
us than Lemma 9.4.
Corollary 1. For two positive numbers α and β, if $\alpha^n < \beta^n$ for some positive integer n, then α < β.
This is of course the converse of Lemma 9.4, so it is a curious fact that the converse of Lemma 9.4 is itself a corollary of Lemma 9.4. Before explaining why this is so, let us note first of all what Corollary 1 does not say: it does not say that if any two numbers α and β satisfy $\alpha^n < \beta^n$ for some positive integer n, then α < β. For example, $3^2 < (-4)^2$, but instead of 3 < −4, we have 3 > −4. So the truth of Corollary 1 depends critically on the positivity of both α and β.
As to the deduction of Corollary 1 from Lemma 9.4, we make use of the trichotomy law among numbers (page 160). What we want to prove is that, under the assumption of α > 0, β > 0, and $\alpha^n < \beta^n$, we can conclude α < β. By the trichotomy law, it is sufficient to show that neither α = β nor α > β is a possibility. Let us first rule out α = β: in this case, clearly $\alpha^n = \beta^n$ for any positive integer n, thereby contradicting the hypothesis that $\alpha^n < \beta^n$. If α > β, then Lemma 9.4 (with the roles of α and β interchanged) implies $\alpha^n > \beta^n$ for any positive integer n, again contradicting the hypothesis. Therefore, only α < β is possible. This proves Corollary 1.
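Lemma 9.4, Corollary 1, and the warning about negative numbers can all be checked on finitely many examples with exact rational arithmetic (Python's standard `fractions.Fraction`), which avoids any rounding. The sample set in this sketch is an arbitrary choice; the proofs above are what establish the statements in general.

```python
from fractions import Fraction
from itertools import combinations

# Positive sample numbers 1/7, 2/7, ..., 49/7, listed in increasing order.
positives = [Fraction(k, 7) for k in range(1, 50)]

# Lemma 9.4: if 0 < a < b, then a**n < b**n.
for a, b in combinations(positives, 2):   # the list is sorted, so a < b here
    for n in range(1, 8):
        assert a ** n < b ** n

# Corollary 1: for positive a and b, a**n < b**n forces a < b.
for a in positives:
    for b in positives:
        for n in range(1, 8):
            if a ** n < b ** n:
                assert a < b

# Positivity matters: 3**2 < (-4)**2, yet 3 > -4.
assert 3 ** 2 < (-4) ** 2 and 3 > -4
print("all checks passed")
```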
The next corollary is the statement that the positive n-th root of a positive number is unique, thereby proving one-half of Theorem 9.2 on page 201.
Corollary 2. If two positive numbers α and β satisfy $\alpha^n = \beta^n$ for some positive integer n, then α = β.

We use the trichotomy law again to eliminate the possibility of either α < β or α > β. Lemma 9.4 says that if either is true, then $\alpha^n \neq \beta^n$, which contradicts the hypothesis. Thus α = β, and Corollary 2 is proved.
We can now prove the following lemma which is already halfway towards Theorem 9.3 (if the positive integer k in the lemma were a rational number, then the lemma would be exactly Theorem 9.3).
Lemma 9.5. For any positive number β, any rational number s, and any positive integer k,
$$\left(\beta^s\right)^k = \beta^{sk}.$$
Proof. If s = 0, the lemma is trivial (see (9.6) on page 199). We may therefore assume s ≠ 0. First we assume s > 0. Then s = m/n for some positive integers m and n and we have to prove:
$$\left(\beta^{m/n}\right)^k = \beta^{mk/n}. \tag{9.24}$$
If we look at (9.24) “the right way”, then it is not complicated at all: the right way is to think of both sides of (9.24) as statements about $\beta^{1/n}$. Then, writing α for $\beta^{1/n}$, and using $\beta^{m/n} = \left(\beta^{1/n}\right)^m$ (this is (9.14) on page 206), we see that
$$\beta^{m/n} = \alpha^m.$$
Therefore, the left side of (9.24) is
$$\left(\beta^{m/n}\right)^k = \left(\alpha^m\right)^k = \alpha^{mk}, \tag{9.25}$$
where we make use of (E2) on page 195 in the second equality. Now apply (9.14) to the right side of (9.24); we get
$$\beta^{mk/n} = \left(\beta^{1/n}\right)^{mk} = \alpha^{mk}.$$
Comparing this with (9.25), we see that we have proved (9.24).
Now suppose s < 0. Then s = −m/n for some positive integers m and n, and we have to prove:
$$\left(\beta^{-m/n}\right)^k = \beta^{-mk/n}. \tag{9.26}$$
We now think of both sides of (9.26) as statements, not about $\beta^{1/n}$, but about $\beta^{-1/n}$. Then the preceding argument can be followed verbatim except that (9.14) has to be replaced everywhere by (9.15). We will leave the details as an exercise (Exercise 4 on page 213). The proof of the lemma is complete.
Proof of Theorem 9.3. We have to prove that
( βs )t = βst for all rational numbers s and t.
If t = 0, both sides are equal to 1 (see (9.6) on page 199). Henceforth, we may
assume t = 0. First suppose t > 0. Then t = m/n for some positive integers m
and n. Then we have to prove:
 s m/n
(9.27) β = β s · (m/n) for all rational s.
By Corollary 2, it suffices to prove that, when raised to the n-th power, both sides
of (9.27) are equal. The left side of (9.27) when raised to the n-th power is equal
to, by Lemma 9.5,
 s m/n n
β = ( βs )(m/n) · n = ( βs )m .

By Lemma 9.5 again, ( βs )m = βsm . Thus,

 s m/n n
(9.28) β = βsm .
Now we raise the right side of (9.27), namely βs · (m/n) , also to the n-th power.
Then Lemma 9.5 implies that
 s · (m/n) n
β = βs · (m/n) · n = βsm .
Comparing this with (9.28), we see that both sides of (9.27) when raised to the
n-th power are equal. Therefore (9.27) is proved.
It remains to prove $(\beta^s)^t = \beta^{st}$ when t < 0. In this case, t = −m/n for some positive integers m and n. Thus we have to prove
(9.29)  $(\beta^s)^{-m/n} = \beta^{s\cdot(-m/n)}$ for all rational s.
Because $s\cdot(-m/n) = -s\cdot(m/n)$, (9.8) (on page 199) implies that this is equivalent to proving
$\dfrac{1}{(\beta^s)^{m/n}} = \dfrac{1}{\beta^{s\cdot(m/n)}}$.
However, this follows immediately from (9.27). The proof of Theorem 9.3 is complete.
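As a numerical sanity check (not a proof), the Python sketch below compares both sides of Theorem 9.3, $(\beta^s)^t = \beta^{st}$, for a few positive $\beta$ and rational $s, t$ of both signs. Exponents are represented with `fractions.Fraction` and converted to floating point for the power, so agreement can only be expected up to rounding; that is why `math.isclose` is used. The helper name `check_power_of_power` is ours, chosen for illustration.

```python
import math
from fractions import Fraction

def check_power_of_power(beta, s, t):
    """Return True if (beta**s)**t and beta**(s*t) agree up to rounding.

    beta is a positive float; s and t are rational exponents given as Fractions.
    """
    left = (beta ** float(s)) ** float(t)
    right = beta ** float(s * t)
    return math.isclose(left, right, rel_tol=1e-12)

# Sample exponents: zero, positive and negative fractions, and an integer.
exponents = [Fraction(0), Fraction(2, 3), Fraction(-5, 4), Fraction(7), Fraction(-1, 6)]
for beta in [0.5, 2.0, 10.0]:
    for s in exponents:
        for t in exponents:
            assert check_power_of_power(beta, s, t), (beta, s, t)
print("(beta**s)**t == beta**(s*t) held in every sampled case")
```

Passing such checks is evidence, not a proof: the argument above covers every rational s and t, while the sketch only samples a handful.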
Finally, we can prove (E6) on page 205 when s is rational. However, the
following special case of (E6) when s = 1/n for a positive integer n occupies
such a central position in school mathematics that we single it out as a separate
theorem before we prove (E6).
Theorem 9.6. Let α, β > 0 and let n be a positive integer. Then
(9.30)  $\sqrt[n]{\alpha\beta} = \sqrt[n]{\alpha}\,\sqrt[n]{\beta}$.
The special case of the square root (n = 2) is already used frequently in middle school, but often without explanation. We suggest you make an effort to prove this special case in the classroom (for a self-contained proof of it, see Section 9.5 on page 220).

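Proof. By the definition of $\sqrt[n]{\alpha\beta}$, the n-th power of the left side of (9.30) is equal to αβ. By Corollary 2, it suffices to prove that the n-th power of the right side of (9.30) is also equal to αβ. The computation is straightforward:
$\left(\sqrt[n]{\alpha}\,\sqrt[n]{\beta}\right)^n = \left(\sqrt[n]{\alpha}\,\sqrt[n]{\beta}\right)\cdots\left(\sqrt[n]{\alpha}\,\sqrt[n]{\beta}\right)$  (n times)
$\qquad = \left(\sqrt[n]{\alpha}\cdots\sqrt[n]{\alpha}\right)\left(\sqrt[n]{\beta}\cdots\sqrt[n]{\beta}\right)$
$\qquad = \left(\sqrt[n]{\alpha}\right)^n\left(\sqrt[n]{\beta}\right)^n = \alpha\beta$.
The proof of Theorem 9.6 is complete.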
We should point out the following consequence of Theorem 9.6, which is an equally useful fact. Let α, β > 0 and let n be a positive integer. Then
(9.31)  $\sqrt[n]{\dfrac{\alpha}{\beta}} = \dfrac{\sqrt[n]{\alpha}}{\sqrt[n]{\beta}}$.
Because the proof is only a straightforward adaptation of the reasoning in the proof of Theorem 9.6, and especially because the reasoning is something every teacher must be comfortable with, we will leave it as an exercise to give readers the opportunity to learn it well (Exercise 5 on page 213).
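Theorem 9.6 and its consequence (9.31) can be spot-checked numerically in the same spirit. The sketch below assumes a helper `nth_root(x, n)` (our own name) computed as `x ** (1 / n)`, which for positive x agrees with the positive n-th root up to floating-point rounding.

```python
import math

def nth_root(x, n):
    """Positive n-th root of a positive number x (floating-point approximation)."""
    return x ** (1.0 / n)

def check_root_rules(alpha, beta, n):
    """Check (9.30) and (9.31) for positive alpha, beta and a positive integer n."""
    product_ok = math.isclose(nth_root(alpha * beta, n),
                              nth_root(alpha, n) * nth_root(beta, n), rel_tol=1e-12)
    quotient_ok = math.isclose(nth_root(alpha / beta, n),
                               nth_root(alpha, n) / nth_root(beta, n), rel_tol=1e-12)
    return product_ok and quotient_ok

for n in [2, 3, 5, 12]:
    for alpha, beta in [(2.0, 3.0), (125.0, 0.008), (7.5, 1e6)]:
        assert check_root_rules(alpha, beta, n)
```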
The following theorem proves (E6) when s is rational.
Theorem 9.7. Let α, β > 0 and let s be a rational number. Then
(9.32)  $(\alpha\cdot\beta)^s = \alpha^s\cdot\beta^s$.
Proof. If s = 0, there is nothing to prove as both sides will be equal to 1 (see (9.6) on page 199). Therefore we may assume $s \neq 0$. If s > 0, then s = m/n for some positive integers m and n. In this case, we have to prove:
(9.33)  $(\alpha\cdot\beta)^{m/n} = \alpha^{m/n}\cdot\beta^{m/n}$.
But by Theorem 9.6, we have
$(\alpha\cdot\beta)^{1/n} = \alpha^{1/n}\cdot\beta^{1/n}$.
Raising both sides to the m-th power, we get
(9.34)  $\left((\alpha\cdot\beta)^{1/n}\right)^m = \left(\alpha^{1/n}\cdot\beta^{1/n}\right)^m$.
By (9.14) on page 206, the left side of (9.34) is equal to the left side of (9.33). To prove (9.33), it suffices to prove that the right side of (9.34) is equal to the right side of (9.33). Now, according to (E3) on page 195,
$\left(\alpha^{1/n}\cdot\beta^{1/n}\right)^m = (\alpha^{1/n})^m\cdot(\beta^{1/n})^m$.
By (9.14) on page 206 again,
$(\alpha^{1/n})^m\cdot(\beta^{1/n})^m = \alpha^{m/n}\cdot\beta^{m/n}$.
Therefore the right side of (9.34) is also equal to the right side of (9.33). We have proved (9.33).
It remains to prove (9.32) when s < 0. In this case, s = −(m/n) for some positive integers m and n. Therefore we must prove:
(9.35)  $(\alpha\cdot\beta)^{-(m/n)} = \alpha^{-(m/n)}\cdot\beta^{-(m/n)}$.
By (9.8) on page 199, this is equivalent to proving
$\dfrac{1}{(\alpha\cdot\beta)^{m/n}} = \dfrac{1}{\alpha^{m/n}}\cdot\dfrac{1}{\beta^{m/n}}$.
By applying the product formula for rational quotients (see page 270) to the right side,10 this in turn is equivalent to proving:
$\dfrac{1}{(\alpha\cdot\beta)^{m/n}} = \dfrac{1}{\alpha^{m/n}\cdot\beta^{m/n}}$.
But according to (9.33), this equality is correct. Therefore (9.35) is true, and the proof of Theorem 9.7 is complete.
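Theorem 9.7 can be checked numerically as well. The sketch below (function names are ours) tests $(\alpha\beta)^s = \alpha^s\beta^s$ for several rational s, and separately mirrors the last step of the proof: for s > 0, $(\alpha\beta)^{-s}$ should equal $1/(\alpha^s\beta^s)$. As before, floating-point arithmetic only allows agreement up to rounding.

```python
import math
from fractions import Fraction

def product_rule_holds(alpha, beta, s):
    """Compare (alpha*beta)**s with alpha**s * beta**s for positive alpha, beta and rational s."""
    e = float(s)
    return math.isclose((alpha * beta) ** e, alpha ** e * beta ** e, rel_tol=1e-12)

def negative_case_via_reciprocal(alpha, beta, s):
    """For s > 0, check (alpha*beta)**(-s) == 1 / (alpha**s * beta**s), the reduction used for (9.35)."""
    e = float(s)
    return math.isclose((alpha * beta) ** (-e), 1.0 / (alpha ** e * beta ** e), rel_tol=1e-12)

for s in [Fraction(3, 2), Fraction(-2, 3), Fraction(5, 7), Fraction(0)]:
    assert product_rule_holds(4.0, 0.25, s)
    assert product_rule_holds(3.0, 11.0, s)
for s in [Fraction(3, 2), Fraction(5, 7)]:
    assert negative_case_via_reciprocal(3.0, 11.0, s)
```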

10 Together with FASM.


Exercises 9.3

(1) In each of the following, find the number s that makes the equality valid, and then verify the equality directly by computing the value of each side with a four-function calculator.
(i) If $729^{5/6}\cdot 729^s = 729^{8/6}$, what is s?
(ii) If $729^s = 729^{5/6}$, what is s?
(iii) If $117649^{1/2}\cdot 117649^s = 117649^{2/3}$, what is s?

(2) Which of the following is bigger: $\sqrt[4]{\sqrt[3]{125}}$ or $\sqrt[12]{125}$?
(3) Prove that if 0 < a < b, then $\sqrt[n]{a} < \sqrt[n]{b}$ for any positive integer n.
(4) Complete the proof of Lemma 9.5 on page 210 by proving (9.26).
(5) Prove equation (9.31) on page 211.
(6) Without making use of (E5) (= (9.13) on page 206), give a direct proof that $(\beta^4)^{1/3} = \beta^{4/3}$ for all positive numbers β.
(7) Give a direct proof of the following special case of Lemma 9.5 on page 210: $(\beta^{2/3})^4 = \beta^{8/3}$. (The numbers are so small that the proof can be achieved with a minimal appeal to symbolic notation.)
(8) Prove that for all rational numbers r, s, and t, and for all α, β > 0, $(\alpha^r\beta^s)^t = \alpha^{rt}\beta^{st}$.
(9) For positive numbers α and β, prove that $\sqrt{\alpha+\beta} \le \sqrt{\alpha} + \sqrt{\beta}$ and that equality holds if and only if α = 0 or β = 0.
(10) Prove that for any number x, $|x| = \sqrt{x^2}$.
(11) Here is the outline of a second proof that the positive n-th root of a positive number β is unique: Let a and b be two such positive n-th roots; then $a^n - b^n = 0$. By identity (1.7) on page 12, we conclude that a = b. Write out this proof in detail, and point out exactly where the proof uses the positivity of both a and b.
(12) Prove that if a > 1, then for any rational numbers r and s such that r < s, $a^r < a^s$, while if 0 < b < 1, then $b^r > b^s$.
(13) Given two similar triangles with the lengths of a pair of corresponding sides as shown:
[Figure: two similar triangles; a pair of corresponding sides is labeled a and a′.]
If the ratio of the area of the bigger triangle to the area of the smaller triangle is s, what is the ratio $a'/a$ in terms of s?
(14) Recall that an annual interest rate of x percent means that an account of P dollars earns, at the end of one full year, an amount of
$\dfrac{x}{100}\cdot P$
dollars. Derive a formula that gives the amount of money in an account at the end of n years if the account has an initial deposit of P dollars and an annual interest rate of x percent.

9.4. Scientific notation

In addition to applications within mathematics, the exponential notation is indispensable in science. It is used to clearly display the magnitude of a measurement: how big? how small? We will explore this aspect of the exponential notation in this section.
It will be seen that understanding magnitude comes down to understanding the integer powers of 10. We therefore begin by addressing two fundamental issues about the integer powers of 10: what does it mean to say that $10^n$ for large positive integers n are big numbers and that $10^{-n}$ for large positive integers n are small numbers?
Fact 1. The numbers $10^n$ for arbitrarily large positive integers n are big numbers, in the sense that given a number M (no matter how big it is), there is a power of 10 that exceeds M.
Fact 2. The numbers $10^{-n}$ for arbitrarily large positive integers n are small numbers, in the sense that given a positive number S (no matter how small it is), there is a (negative) power of 10 that is smaller than S.
It will turn out that Fact 2 is a consequence of Fact 1. We address Fact 1 first.
Let us first show why this is true in two special cases, using a slightly different
reasoning in each case.
Example 1. Let M be the world population as of March 23, 2013. Approximately, M = 7,073,981,143. It has 10 digits and is therefore smaller than any whole number with 11 digits, such as 10,000,000,000. But 10,000,000,000 = $10^{10}$, so $M < 10^{10}$, i.e., the 10th power of 10 exceeds this M.
Example 2. Let M be the U.S. national debt as of March 23, 2013: M = 16,755,133,009,522, to the nearest dollar. It has 14 digits. Since the largest 14-digit number is 99,999,999,999,999,
M < 99,999,999,999,999 < 100,000,000,000,000 = $10^{14}$.
That is, the 14th power of 10 exceeds M.
Now, the general case. First, let M be a positive integer with m digits. As we know, the integer 99···99 (with m 9’s) is ≥ M. Therefore 100···00 (with m zeros), being a number with (m + 1) digits, exceeds M and, since $10^m$ = 100···00 (with m 0’s), we have
$M \le \underbrace{99\cdots9}_{m} < 1\underbrace{00\cdots0}_{m} = 10^m$.
So for an m-digit positive integer M, $10^m$ always exceeds M.
In general, let M be an arbitrary positive number (i.e., a point on the number line) not necessarily equal to an integer. Recall that on the number line there is a special sequence of equispaced points, namely the integers. If M coincides with one of these points, then M is a positive integer and we have just taken care of that. Therefore we may assume M is not one of these points. Then M has to lie between two consecutive points of this special sequence, i.e., there is some integer N so that N < M < N + 1. (For example, the number 45572.384 lies between 45572 and 45573.)
[Figure: number line showing N < M < N + 1 < $10^n$.]
Consider the positive integer N + 1: By the preceding reasoning, there is a positive integer n so that $10^n > N + 1$. Since N + 1 > M, we have $10^n > M$ as well. Consequently, for this number M, $10^n$ exceeds it. This proves Fact 1.
For the proof of Fact 2 as well as for subsequent discussions, we need the
following three inequalities.
Inequality (A): Let x and y be numbers and let z > 0. Then
x < y if and only if xz < yz. (This is (D) on page 160.)
Inequality (B): For two positive numbers x and y, x < y if and only if $\frac{1}{y} < \frac{1}{x}$. (This follows immediately from the cross-multiplication algorithm on page 270.)
Inequality (C): For any number x, $x > 0 \iff \frac{1}{x} > 0$. (This is (8.3) on page 160.)
Before giving the proof of Fact 2, we first look at a special case.
Example 3. Given $S = \frac{73}{45678}$, find a specific positive integer n so that $10^{-n} < S$.
Let $M = \frac{1}{S} = \frac{45678}{73}$. Then also $S = \frac{1}{M}$, so that finding a positive integer n so that $10^{-n} < S$ is equivalent to finding a positive integer n so that
$\dfrac{1}{10^n} < \dfrac{1}{M}$.
By Inequality (B), this is equivalent to finding a positive integer n so that $M < 10^n$. Now Fact 1 already guarantees the latter, so we are done.
However, it is much more enlightening to get a specific n in this case. We proceed as follows: we have
$M = \dfrac{45678}{73} < 45678 < 10^5$.
By Inequality (B), we get
$\dfrac{1}{10^5} < \dfrac{1}{M}$,
which can be rewritten as
$10^{-5} < S$.
Therefore, n = 5 would work in this case.


Let $S = \frac{0.7}{456789}$. Find a specific positive integer n so that $10^{-n} < S$.

The proof of Fact 2 in general is no different. Precisely, let S be given, S > 0. We have to find a positive integer n so that $10^{-n} < S$. Let $M = \frac{1}{S}$. Then of course $S = \frac{1}{M}$. By Inequality (C), M > 0, so that by Inequality (B),
$10^{-n} < S \iff \dfrac{1}{10^n} < \dfrac{1}{M} \iff M < 10^n$.
Therefore finding a positive integer n so that $10^{-n} < S$ is now seen to be equivalent to finding a positive integer n so that $M < 10^n$. But Fact 1 implies that there is a positive integer n so that $M < 10^n$, so Fact 2 is proved.
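Because Fact 2 is reduced to Fact 1 by passing to M = 1/S, the procedure sketched for Fact 1 carries over unchanged. In the Python sketch below (the function name `negative_power_of_ten_below` is ours), S is inverted exactly with `fractions.Fraction`, and n is again the digit count of floor(M) + 1.

```python
import math
from fractions import Fraction

def negative_power_of_ten_below(S):
    """Return a positive integer n with 10**(-n) < S, following the proof of Fact 2.

    S is a positive int or Fraction. We pass to M = 1/S and use the digit
    count of floor(M) + 1, exactly as in the proof of Fact 1.
    """
    M = 1 / Fraction(S)                 # S = 1/M, and M > 0 by Inequality (C)
    n = len(str(math.floor(M) + 1))     # 10**n > floor(M) + 1 > M
    return n                            # by Inequality (B), 10**(-n) < 1/M = S

S = Fraction(73, 45678)                 # the S of Example 3
n = negative_power_of_ten_below(S)
print(n)                                # prints 3
assert Fraction(1, 10 ** n) < S
```

For Example 3 this returns n = 3, smaller than the n = 5 found by hand; any n that works is acceptable.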
As mentioned above, the reason we want to achieve an intuitive understand-
ing of the integer powers of 10 (such as Fact 1 and Fact 2) is that they are important
for the next concept, scientific notation.
[Sidebar: When a number s is written in scientific notation, $s = d\times10^n$, the exponent n clearly displays the magnitude of s.]
Consider the number that is the current estimate of the number of stars in the universe: $6\times10^{22}$. Of course this is the 23-digit whole number with the leading digit (i.e., the leftmost digit) 6 followed by 22 zeros, but when it is written in the form $6\times10^{22}$, it is said to be given in scientific notation. Formally, a positive, finite decimal11 s is said to be written in scientific notation if it is expressed as a product $d\times10^n$, where d is a finite decimal ≥ 1 and < 10 (i.e., 1 ≤ d < 10), and n is an integer. (In other words, d is a finite decimal with only a single nonzero digit to the left of the decimal point.) The integer n is called the order of magnitude of the decimal $d\times10^n$.12
Take the finite decimal 234.567. It is clearly equal to every one of the following:
$2.34567\times10^2$, $0.234567\times10^3$, $23.4567\times10$,
$234.567\times10^0$, $234567\times10^{-3}$, $234567000\times10^{-6}$.
However, only the first is a representation of 234.567 in scientific notation.


Is the number $9\times10^{-1}$ written in scientific notation? What about $2.653979\times10^0$?
The most obvious reason for calling the exponent n the order of magnitude of the positive number $s = d\times10^n$ (expressed in scientific notation) is that the following inequalities hold (multiply 1 ≤ d < 10 through by $10^n > 0$ and use Inequality (A)):
(9.36)  $10^n \le s < 10^{n+1}$.
Thus the exponent n serves to give an approximate location of s on the number line in terms of successive powers of 10, i.e., n gives the approximate “size” of s.
[Figure: number line marking $10^{n-1}$, $10^n$, s, and $10^{n+1}$.]

A word of caution about (9.36): When n ≥ 0, the number $10^n$ is a number with (n + 1) digits (not n digits) and $10^{n+1}$ is a number with (n + 2) digits (not (n + 1) digits). Therefore, when n ≥ 0 and s is a whole number, (9.36) says that s is an (n + 1)-digit number.
It is absolutely essential that we express very large and very small numbers in scientific notation. To see why, consider once again the estimated number of stars in the universe. The advantage of presenting it as $6\times10^{22}$ rather than as
60,000,000,000,000,000,000,000
11 Recall that every whole number is a finite decimal.
12 Sometimes the place value $10^n$ of the leading digit of $d\times10^n$ is called the order of magnitude. Since each of n and $10^n$ determines the other, no confusion is possible.

(in standard notation) is perhaps too obvious for discussion: in the standard form, one can’t keep track of the number of zeros! The advantage goes deeper, however. When faced with a very big number, one’s natural first question is: roughly how big? Is it something like a few hundred billion (a number with 12 digits), or even a few trillion (a number with 13 digits)? The exponent 22 in the scientific notation $6\times10^{22}$ tells you immediately that it is a 23-digit number and therefore far bigger than “a few trillion”.
We should elaborate on the last statement. Observe that the number $6234.5\times10^{22}$ does not have 23 digits but 26 digits because it is the number
62,345,000,000,000,000,000,000,000.
So the reason we are confident about $6\times10^{22}$ having only 23 digits is that 6 is at least 1 and less than 10. Therefore by requiring the d in $d\times10^n$ to satisfy 1 ≤ d < 10, we are ensuring that the exponent n will unfailingly convey the intuitive sense of the “order of magnitude” of $d\times10^n$.
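The digit counts in the last paragraph are easy to confirm. The sketch below (the helper name `digit_count` is ours) reads a number written in the form "d E n", i.e., $d\times10^n$, converts it exactly to a whole number, and counts its digits.

```python
from decimal import Decimal

def digit_count(x):
    """Number of digits of the whole number written as x, e.g. "6E22" for 6 x 10^22."""
    return len(str(int(Decimal(x))))

print(digit_count("6E22"))        # prints 23
print(digit_count("6234.5E22"))   # prints 26, not 23
```

Only the first input has its leading part between 1 and 10, which is why only there does the exponent 22 translate directly into 22 + 1 = 23 digits.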


All planets revolve around the sun in elliptical orbits. Uranus’s farthest distance from the sun is approximately $3.004\times10^9$ km, and its closest distance is approximately $2.749\times10^9$ km. What is the average of these two distances?

We continue with the discussion of why we are interested in writing numbers in scientific notation. Consider the mass of the proton:
0.000 000 000 000 000 000 000 000 001 672 621 777 kg.
By the definition of finite decimals, since there are 36 digits to the right of the decimal point, this number is the fraction
$\dfrac{1\,672\,621\,777}{10^{36}}$.
Since the whole number in the numerator has 10 digits, it is equal to $1.672\,621\,777\times10^9$. Therefore the mass of the proton is
$\dfrac{1.672\,621\,777\times10^9}{10^{36}} = 1.672\,621\,777\times10^{-27}$ kg.
The exponent −27 in the scientific presentation of this number reveals that the first nonzero digit (i.e., 1) of this decimal occurs in the 27th digit after the decimal point, because 27 = 36 − 9. Similarly, the mass of the electron is
0.000 000 000 000 000 000 000 000 000 000 910 938 291 kg.
In scientific notation, it is
$9.109\,382\,91\times10^{-31}$ kg.
In this case, the exponent −31 indicates that the first nonzero digit (i.e., 9) of this decimal occurs in the 31st digit after the decimal point.
The advantage of scientific notation becomes even more pronounced when we have to carry out a computation involving very large or very small numbers such as those we have been working with. We illustrate by computing approximately how many times a proton is heavier than an electron. Without using scientific notation, we would have to compute the ratio
$r = \dfrac{0.000\,000\,000\,000\,000\,000\,000\,000\,001\,672\,622}{0.000\,000\,000\,000\,000\,000\,000\,000\,000\,000\,910}$.
This is not an inviting prospect. However, we can write this ratio as
$r = \dfrac{1.672\,622\times10^{-27}}{9.109\,382\,91\times10^{-31}}$.
Using the general cancellation law (see page 269), we can eliminate the negative powers of 10 in the numerator and the denominator in order to better see what we are doing. Anticipating that $10^{-31}\times10^{31} = 1$, we are led to multiply the numerator and denominator of the (complex) fraction by $10^{31}$:
$r = \dfrac{1.672\,622\times10^{-27}\times10^{31}}{9.109\,382\,91\times10^{-31}\times10^{31}}$.
Using (E1) on page 195, we get
$r = \dfrac{1.672\,622\times10^{4}}{9.109\,382\,91} = \dfrac{1.672\,622}{9.109\,382\,91}\times10^{4}$.
We pause to note that, because we are using scientific notation, we can read off an approximate value of r right away: this is because (using “≈” for “approximately equal to”)
$\dfrac{1.672\,622}{9.109\,382\,91} \approx \dfrac{1.7}{9.1} = \dfrac{17}{91} \approx \dfrac{1}{5}$,
so that r is approximately $\frac{1}{5}\times10000$, which is 2000. Thus we expect a proton to be about two thousand times heavier than an electron.
A more precise computation would go as follows. Using the general cancellation law (see page 269) again, we can simplify the complex fraction
$\dfrac{1.672\,622}{9.109\,382\,91}$
to an ordinary fraction:
$\dfrac{1.672\,622}{9.109\,382\,91} = \dfrac{1.672\,622\times10^8}{9.109\,382\,91\times10^8} = \dfrac{167{,}262{,}200}{910{,}938{,}291}$.
We can now use the method discussed at the end of Section 1.6 of [Wu-PreAlg] to convert the last fraction to a decimal, but since our focus is on understanding scientific notation, it would be appropriate at this point to use a calculator to find an approximate value of
$\dfrac{167{,}262{,}200}{910{,}938{,}291}$,
which is 0.183 615 291 675. Thus the ratio of the mass of a proton to the mass of an electron is
$r = 0.183\,615\,291\,675\times10^4 = 1836.152\,916\,75 \approx 1836$
if we round to the nearest whole number. In everyday conversation, we round to the nearest hundred and say that the proton is about 1,800 times heavier than the electron.
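The same computation can be organized exactly as in scientific notation: divide the leading parts, and handle the powers of 10 by subtracting exponents. The Python sketch below uses the standard `decimal` module so that the long strings of zeros never have to be written out; the variable names are ours.

```python
from decimal import Decimal

# Leading parts and exponents of the two masses (in kg), as in the text.
proton_d, proton_n = Decimal("1.672622"), -27
electron_d, electron_n = Decimal("9.10938291"), -31

# Dividing powers of 10 subtracts exponents: 10^-27 / 10^-31 = 10^4.
leading_ratio = proton_d / electron_d          # about 0.1836
r = leading_ratio * Decimal(10) ** (proton_n - electron_n)

print(leading_ratio)
print(r)
print(round(r))                                # prints 1836
```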

Exercises 9.4

(1) Let M = 46813902 43 88. Find a positive integer n so that $M < 10^n$.
(2) Given M = 55 3095487, find a positive integer m so that $M < 10^m$.
(3) Write each of the following numbers in scientific notation:
(a) 0.00002758. (b) 413,692,508,000,000,000. (c) 817.00042009.
(4) Let M = 96754.89 × 234567.345. Find a positive integer n so that $M < 10^n$. (Don’t forget, there is no prize given out for getting the smallest n to make this work.)
(5) Given S = 0.000 012 345, find a specific positive integer k so that $10^{-k} < S$.
(6) Given $S = 69^{-27}\times87^{-31}$, find a specific positive integer ℓ so that $10^{-\ell} < S$.
(7) The mass of the earth is 5,972,190,000,000,000,000,000,000 kg. Write this number
in scientific notation. Which of the two representations of this number
do you prefer? Explain.
(8) The Sun’s mass is 1,989,100,000,000,000,000,000,000,000,000 kg. Write
this number in scientific notation. Which of the two representations of
this number do you prefer? Explain.
(9) Here are the masses of the so-called Inner Planets of the Solar System:
Mercury   $3.3022\times10^{23}$ kg
Venus     $4.8685\times10^{24}$ kg
Earth     $5.9722\times10^{24}$ kg
Mars      $6.4185\times10^{23}$ kg
Compute the average mass of the Inner Planets and write it in scientific notation.
(10) Let the geographic areas of California and the U.S. be 163,700 and 3,800,000 sq. mi, respectively. California’s population (as of 2012) is $3.804\times10^7$, approximately. If population were proportional to area, what would the population of the U.S. be?
(11) The 2012 population of the U.S. is actually $3.14\times10^8$, approximately. How does the population density of California (number of people per sq. mi) compare with the population density of the U.S.?
(12) The planets in the solar system consist of four inner planets (Mercury,
Venus, Earth, and Mars), and four outer planets (Jupiter, Saturn, Uranus,
and Neptune). The inner planets are generally compact and “solid”,
composed mainly of rock and metal, while the outer planets are gener-
ally bloated and “gaseous”, composed mainly of hydrogen and helium.
You are asked to compute the average density (total mass divided by to-
tal volume) of the inner planets and that of the outer planets from the
following data so that you can see the difference. Here is a table of
approximate masses:

Mercury   $3.3022\times10^{23}$ kg      Jupiter   $1.8986\times10^{27}$ kg
Venus     $4.8685\times10^{24}$ kg      Saturn    $5.6846\times10^{26}$ kg
Earth     $5.9736\times10^{24}$ kg      Uranus    $8.68\times10^{25}$ kg
Mars      $6.4185\times10^{23}$ kg      Neptune   $1.0243\times10^{26}$ kg

Here is a table of approximate volumes:

Mercury   60,830,000,000 km³          Jupiter   $1.4313\times10^{15}$ km³
Venus     928,000,000,000 km³         Saturn    $8.2713\times10^{14}$ km³
Earth     1,083,210,000,000 km³       Uranus    $6.833\times10^{13}$ km³
Mars      163,180,000,000 km³         Neptune   $6.254\times10^{13}$ km³

9.5. Three additional remarks on rational exponents

(A) The identity
(9.37)  $\sqrt{a}\cdot\sqrt{b} = \sqrt{ab}$ for all a, b > 0
is ubiquitous in any discussion about equations, particularly quadratic equations
(see the next chapter) in school algebra. For this reason, we want to give a self-
contained proof of this identity to facilitate your teaching.
We start from the beginning. If a > 0, we assume there is some positive number x so that $x^2 = a$. We want to show that this x is unique. This would follow from the following assertion:
(∗) If x, y are two positive numbers and $x^2 = y^2$, then x = y.
This is because $0 = x^2 - y^2 = (x - y)(x + y)$ (identity (1.5) on page 10). At this point, we recall a basic fact about numbers which was proved in Section 2.5 of [Wu-PreAlg] (Corollary 1 of Theorem 2.11) for rational numbers: let a, b be two (real) numbers; then
(9.38)  ab = 0 implies a = 0 or b = 0.
We will prove (9.38) by invoking FASM (see page 265). If a = 0, there is nothing to prove, so suppose $a \neq 0$. By the Cancellation Law (the first of the formulas for rational quotients in Section 2.5 on page 269), if $a \neq 0$, then $\frac{1}{a}\cdot a = 1$ when a is rational. By FASM, the same is true when a is any real number. Therefore, if we multiply both sides of ab = 0 by $\frac{1}{a}$, we get 1 · b = 0, which implies b = 0. This proves (9.38). Therefore the fact that $(x - y)(x + y) = 0$ necessarily implies that (at least) one of the factors, (x − y) or (x + y), is zero. But x and y being both positive, x + y > 0. Hence x − y = 0, i.e., x = y.
The unique positive number x so that $x^2 = a$ is called the positive square root of a and will be denoted by $\sqrt{a}$.
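Now we can prove identity (9.37). Since $\sqrt{a}\cdot\sqrt{b}$ and $\sqrt{ab}$ are both positive, by (∗) it suffices to prove that
$\left(\sqrt{a}\cdot\sqrt{b}\right)^2 = \left(\sqrt{ab}\right)^2$.
The right side is equal to ab, by definition. The left side is equal to
$(\sqrt{a}\,\sqrt{b})(\sqrt{a}\,\sqrt{b}) = (\sqrt{a}\,\sqrt{a})(\sqrt{b}\,\sqrt{b}) = ab$.
This then completes the proof of identity (9.37).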
(B) First, we want to expand the meaning of an expression as defined on page
6 in Chapter 1. Recall that an expression (or number expression) is simply a
collection of numbers x, y, etc. connected by the four arithmetic operations. Now
that exponents are available, we can generalize the meaning of an expression by
defining it to mean a collection of numbers x, y, etc. which are connected by

the four arithmetic operations and the use of rational exponents. In this context,
an expression is also called an algebraic expression. Thus, the following is an
algebraic expression:
$x^{-3} + \left((yz)^2 + 5\right)^{3/4} - \left(\dfrac{x}{y}\right)^5$.
(C) The availability of integer exponents allows us to bring closure to the
discussion of decimals in Chapter 1 of [Wu-PreAlg] by introducing the concept
of scientific notation. Through the preceding discussion of scientific notation, we
see that the numbers that are important in the sciences are not fractions but finite
decimals (though it behooves us to remember that finite decimals are a special
class of fractions).


Quadratic Functions
and Their Graphs
A quadratic polynomial function or, more simply, a quadratic function f is, by
definition, a function from R to R given by

$f(x) = ax^2 + bx + c$

for some constants a, b, and c. In the school curriculum, the topic of quadratic
equations (see (10.3) on page 225) precedes the introduction of quadratic functions,
and rightly so, because an equation is conceptually simpler than a function to
students of school age. This curricular decision—no matter how pedagogically
laudable—has led to an undesirable side effect (at least in TSM1 ), namely, that
functions and equations have become two separate topics. This is but one example
of how TSM has wreaked havoc with the school curriculum, because the study of
functions should properly include the study of equations. Thus, if we define $x_0$ to be a zero (or a root) of a function f if $f(x_0) = 0$, then a very natural question in the study of any function is to ask for the locations of all its zeros because these zeros are usually part of the “signature” of the function. For example, if $f(x) = 3x^2 - 7x + 2$ is given and we want to find all its zeros, then the problem, properly phrased, is one of finding all the numbers $x_0$ so that $3x_0^2 - 7x_0 + 2 = 0$. If we compare this problem with the definition of what it means to solve an equation (see pages 28 ff.), we see that this is exactly the problem asking for the solution of the quadratic equation $3x^2 - 7x + 2 = 0$.
In order to emphasize the fact that solving equations is part of the study of func-
tions, the main body of this chapter (Sections 10.2–10.5)—devoted to the study of
quadratic functions—only mentions quadratic equations along the way. However,
out of respect for the school curriculum and teachers’ pedagogical needs, we be-
gin the chapter with a section on quadratic equations that gives the fundamental
idea of completing the square its due. We hope nevertheless that you will come
away with a real appreciation of the connection between equations and functions.
In particular, be sure to take note of the fact that the technique of completing
the square, far from being a one-time trick designed specifically for deriving the
quadratic formula, is the key that unlocks the secrets of quadratic functions in

1 See page xi for the definition.


You should savor your time spent on studying quadratic functions, because
the algebra of quadratic functions of one variable is part of mathematical fairy-
land: this is an area in mathematics where everything we need to know is known,
and the answers are both simple and beautiful. Of course, we also know as much
as we need to know about the algebra of linear functions of one variable, but
the mathematics there is really too simple for us to take much pride in that ac-
complishment. By comparison, nobody should feel any embarrassment about
rediscovering the method of completing the square and the quadratic formula2
(see page 236). That would be a notable achievement indeed.
We will explore the most basic facts about quadratic functions and their
graphs. We have already come across the simplest quadratic function $F_1(x) = x^2$ on page 129.3 A main theme of this chapter is to show that if we understand the functions $F_a(x) = ax^2$, then we have already come very close to understanding all quadratic functions. For this purpose, we must first obtain a good knowledge of the geometry of the graph of $F_a(x)$ for any a. This is hardly an accident as the geometry of the line weaves through the whole discussion of the algebra of linear
equations. In the same way, we will begin the study of quadratic functions by
trying to find out as much as we can about the graphs of quadratic functions;
then we will make use of the geometry of these graphs to facilitate the under-
standing of the functions themselves. In Sections 10.2 and 10.3, we will get to
witness the central role played by basic isometries (page 266)—specifically, reflec-
tions and translations—in the study of these graphs. In particular, we will correct
an egregious error in TSM by explaining what it means to say that the graph of a
quadratic function is a parabola.

10.1. Quadratic equations

In Section 3.1 on page 37, we defined what an equation is and what it means
to solve an equation. Then we solved polynomial equations of degree 1, otherwise
known as linear equations in one variable. Recall that, given a linear equation in x, ax + b = 0 with $a \neq 0$, Theorem 3.1 on page 44 implies that it always has a unique solution. For quadratic equations, the equations on the next level of difficulty, the situation will be more complicated and yet remain simple enough to be manageable, as we shall see.
Thus, consider an equation that asks for all the numbers x that make two quadratic polynomial expressions in x equal, e.g.,
$3x^2 - x + 8 = 3x^2 + 2x - 5$
or
$3x^2 - x + 8 = x^2 + 2x$.

2 The honor of discovery apparently belongs to the Babylonians of some twenty-four centuries
3 There we denoted it by s to suggest “square”. The reason for the present notation F will be obvious once we begin discussing the functions $F_a(x) = ax^2$.

Would either qualify to be called a “quadratic equation”? After transposition, the first equation becomes −3x + 13 = 0 and the second equation becomes $2x^2 - 3x + 8 = 0$. Clearly, we would not call $3x^2 - x + 8 = 3x^2 + 2x - 5$ a “quadratic equation”, but $3x^2 - x + 8 = x^2 + 2x$ would seem to be a genuine quadratic equation by any account.
In general, given such an equation,
(10.1)  $ax^2 + bx + c = a'x^2 + b'x + c'$,
where $a$, $a'$, etc., are constants, we can add $(-a'x^2)$ to both sides to get
$(a - a')x^2 + bx + c = b'x + c'$.
Similar manipulations with regard to $b'x$ and $c'$ lead to an equation
(10.2)  $(a - a')x^2 + (b - b')x + (c - c') = 0$.
Therefore solving (10.1) is equivalent to solving (10.2). In the interest of simplicity, let us concentrate on solving (10.2).
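The passage from (10.1) to (10.2) is pure bookkeeping on coefficients. The Python sketch below, with hypothetical helper names `transpose` and `is_quadratic`, represents each side of (10.1) by its coefficient triple, subtracts, and then reports whether the resulting $x^2$ coefficient is nonzero. It reproduces the two examples above: the first collapses to a linear equation, the second does not.

```python
def transpose(left, right):
    """Move everything in  a x^2 + b x + c = a' x^2 + b' x + c'  to the left side.

    left = (a, b, c), right = (a', b', c'); returns the coefficients of (10.2).
    """
    a, b, c = left
    a2, b2, c2 = right
    return (a - a2, b - b2, c - c2)

def is_quadratic(coeffs):
    """An equation A x^2 + B x + C = 0 has a genuine x^2 term exactly when A != 0."""
    return coeffs[0] != 0

first = transpose((3, -1, 8), (3, 2, -5))    # 3x^2 - x + 8 = 3x^2 + 2x - 5
second = transpose((3, -1, 8), (1, 2, 0))    # 3x^2 - x + 8 = x^2 + 2x
print(first, is_quadratic(first))    # prints (0, -3, 13) False, i.e. -3x + 13 = 0
print(second, is_quadratic(second))  # prints (2, -3, 8) True, i.e. 2x^2 - 3x + 8 = 0
```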
[Sidebar: In the theory of quadratic equations and functions, everything that should be known is known, and the answers are simple and beautiful.]
Since the constants $a$, $a'$, $b$, $b'$, $c$, and $c'$ in (10.2) are arbitrary, each of $(a - a')$, $(b - b')$, and $(c - c')$ is also arbitrary and we can therefore rewrite (10.2) in simpler notation. We are thus led to consider an equation in a number x of the form
(10.3)  $ax^2 + bx + c = 0$,
where a, b, and c are constants. If a = 0, we are back to solving linear equations; we therefore exclude this case (see the above example of −3x + 13 = 0). Thus we define a quadratic equation to be an equation of the form (10.3) where the leading coefficient a satisfies $a \neq 0$. A solution of a quadratic equation is also called a root of the equation. The intuitive meaning of the root of a quadratic equation can be given pictorially, as follows.
In Section 4.2 on page 57, we considered linear equations in two variables and their graphs. Similarly, given the equation $ax^2 + bx + c = 0$ in the number x, we can consider the associated quadratic equation in two variables $y = ax^2 + bx + c$. As in Section 4.2, the graph of $y = ax^2 + bx + c$ is the collection of all the points (s, t) that satisfy this equation, i.e., $t = as^2 + bs + c$. Then s is a root of $ax^2 + bx + c = 0$ if and only if the graph of $y = ax^2 + bx + c$ intersects the x-axis at (s, 0). Before giving the proof of this statement, let us assume it for the moment and give a simple application: to solve $2x^2 - x - 3 = 0$, we graph
$y = 2x^2 - x - 3$.

Here is part of the graph of $y = 2x^2 - x - 3$:
[Figure: graph of $y = 2x^2 - x - 3$; x-axis ticks from −2 to 3 in steps of 0.5.]
The graph of $y = 2x^2 - x - 3$ seems to intersect the x-axis at (−1, 0) and (1.5, 0), and a simple computation confirms that −1 and 1.5 are indeed roots of $2x^2 - x - 3 = 0$.
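The “simple computation” amounts to substituting each candidate into $2x^2 - x - 3$. The sketch below does this with exact fractions, so that a result of 0 really means (s, 0) lies on the graph; the function name `f` is ours.

```python
from fractions import Fraction

def f(x):
    """The quadratic polynomial 2x^2 - x - 3 from the example."""
    return 2 * x**2 - x - 3

for s in [Fraction(-1), Fraction(3, 2)]:
    print(s, f(s))   # both values are 0, so (s, 0) lies on the graph
```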
Now, to explain why s is a root of $ax^2 + bx + c = 0$ if and only if the graph of $y = ax^2 + bx + c$ intersects the x-axis at (s, 0), recall that the graph of $y = ax^2 + bx + c$ is the collection of points of the form $(x, ax^2 + bx + c)$. In this light, s being a root of the equation $ax^2 + bx + c = 0$ means $as^2 + bs + c = 0$, so that (s, 0) is on the graph of $y = ax^2 + bx + c$. But the point (s, 0) is on the x-axis, and so the graph of $y = ax^2 + bx + c$ passes through the point (s, 0) of the x-axis. Conversely, if the graph of $y = ax^2 + bx + c$ intersects the x-axis at (s, 0), then (s, 0) is on the graph of $y = ax^2 + bx + c$, which means $as^2 + bs + c = 0$ by the definition of the graph. So s is a root of $ax^2 + bx + c = 0$.
We have shown the graph of $y = 2x^2 - x - 3$ roughly on the open interval (−2, 2.5); it would be impractical to show the graph over a larger interval because the values of y get large quite fast outside (−2, 2.5). Here is a table of some values of x and the corresponding y:
   x      y        x      y
  −2      7       2.5     7
  −3     18       3.5    18
  −4     33       4.5    33
  −5     52       5.5    52
  −6     75       6.5    75
  −7    102       7.5   102
  −8    133       8.5   133
Note that the values of y are the same in the second and fourth columns; this fact will be explained in due course (see Exercise 2 on page 252).
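The table is easy to regenerate, and doing so makes the pattern visible: each x in the first column is paired with $\frac{1}{2} - x$ in the third (−2 with 2.5, −3 with 3.5, and so on). The sketch below (the function name `f` is ours) rebuilds the table with exact fractions and asserts that the two y columns agree. It only checks the pattern on these values; the explanation is the one promised in the text.

```python
from fractions import Fraction

def f(x):
    """y = 2x^2 - x - 3, the function tabulated in the text."""
    return 2 * x**2 - x - 3

print("  x     y      x     y")
for x in range(-2, -9, -1):            # x = -2, -3, ..., -8
    x_right = Fraction(1, 2) - x       # 2.5, 3.5, ..., 8.5
    assert f(x) == f(x_right)          # same y in the second and fourth columns
    print(f"{x:>3} {f(x):>5}   {float(x_right):>4} {int(f(x_right)):>5}")
```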
We begin by looking at some simple quadratic equations. Does $x^2 - 4 = 0$ have any solutions? This is equivalent to $x^2 = 4$, and right away we know two numbers whose square is 4: 2 and −2, usually abbreviated to ±2. Similarly, $2x^2 - 50 = 0$ has two solutions, ±5, because multiplying this equation through by $\frac{1}{2}$ gives $x^2 - 25 = 0$. Clearly $2x^2 - 50 = 0$ has a solution if and only if $x^2 - 25 = 0$ does. Since the latter is equivalent to $x^2 = 25$, we have the obvious solutions ±5 to $2x^2 - 50 = 0$. Next, does $2x^2 - 5 = 0$ have any solutions? We can easily conclude by now that this equation is equivalent to $x^2 = 2.5$. Recalling that the symbol $\sqrt{N}$ for a positive number N denotes the unique positive number whose square is equal to N (see Theorem 9.2 again), we conclude that there are two solutions: $\pm\sqrt{2.5}$.
Are there others? The answer is no due to the following lemma.

Lemma 10.1. Let $N > 0$. Then a number $X$ satisfies $X^2 - N = 0$ if and only if $X = \pm\sqrt{N}$.
Proof. Since $N$ is positive, it has a positive square root $\sqrt{N}$, so that $(\sqrt{N})^2 = N$ (see Theorem 9.2 on page 201). Thus $X^2 - N = 0$ is equivalent to $X^2 - (\sqrt{N})^2 = 0$. By identity (1.5) on page 10, we get
$$(X - \sqrt{N})(X + \sqrt{N}) = 0.$$
By (9.38) on page 220, either $X - \sqrt{N} = 0$ or $X + \sqrt{N} = 0$, which is to say, $X$ is equal to either $+\sqrt{N}$ or $-\sqrt{N}$. The lemma is proved.
Thus we have proved that the equation $2x^2 - 5 = 0$ has exactly two solutions: $\sqrt{2.5}$ and $-\sqrt{2.5}$.
What happens to Lemma 10.1 when $N \le 0$? Consider, for instance, the equation $x^2 - (-5) = 0$. For any number $x$, $x^2 \ge 0$, so that
$$x^2 - (-5) = x^2 + 5 \ge 0 + 5 = 5 > 0.$$
Therefore the equation $x^2 - (-5) = 0$ has no solution. We also observe trivially that the equation $x^2 = 0$ has $0$ as its only solution. Similar reasoning allows us to conclude that the equation $X^2 - N = 0$ has two solutions, one solution, or no solution depending on whether $N$ is positive, zero, or negative, respectively.
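The three-way conclusion just reached can be summarized in a short Python sketch. The function name `solutions_of_square_eq` is hypothetical, and floating-point square roots only approximate the exact numbers $\pm\sqrt{N}$.

```python
import math

def solutions_of_square_eq(n):
    """Real solutions of X^2 - N = 0.

    N > 0: two solutions +sqrt(N) and -sqrt(N) (Lemma 10.1).
    N = 0: the single solution 0.
    N < 0: no solution, since X^2 - N = X^2 + |N| > 0 for every X.
    """
    if n > 0:
        r = math.sqrt(n)          # floating-point approximation of sqrt(N)
        return [r, -r]
    if n == 0:
        return [0.0]
    return []

print(solutions_of_square_eq(4))     # the equation x^2 - 4 = 0
print(solutions_of_square_eq(2.5))   # equivalent to 2x^2 - 5 = 0
print(solutions_of_square_eq(-5))    # the equation x^2 - (-5) = 0
```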
Now this is a very untidy conclusion about such a simple equation and, from a more advanced point of view, we can do better. The correct conclusion is that if complex roots are allowed, every quadratic equation has two solutions.4 Without going into the details, we outline what this means using the equation $x^2 - N = 0$. With $i$ denoting $\sqrt{-1}$ and $N < 0$, the two roots of $x^2 - N = 0$ are $i\sqrt{|N|}$ and $-i\sqrt{|N|}$. As to the case $N = 0$, i.e., $x^2 - 0 = 0$, we see that the equation is actually $x \cdot x = 0$, so that $0$ being a solution means $0 \cdot 0 = 0$. Thus $0$ is a root "doubly", i.e., two roots that happen to coincide. For this reason, we say $0$ is a double root. We can better explain the reason for calling this a double root by considering a related equation $x^2 - bx = x(x - b) = 0$ for a nonzero real number $b$. The two distinct roots are clearly $0$ and $b$. Now let $b$ get closer and closer to $0$; in the limit, the equation $x^2 - bx = 0$ becomes $x \cdot x = x^2 = 0$, while the two distinct roots $0$ and $b$ merge into $0$ and $0$. We therefore see better why there are actually two roots for $x^2 = 0$, namely $0$ and $0$.

4 The Fundamental Theorem of Algebra asserts that every polynomial equation of degree $n$ has exactly $n$ complex roots, counted with multiplicity (see Chapter 11 in Volume II of [Wu-HighSchool]).
Next, consider $x^2 + 2x + 1 = 0$. We will adopt the methodology of pages 39 ff.: assume that there is a solution $x$, find out what $x$ has to be, and then use this information to check whether the candidate is actually a solution.
So suppose $x$ is a solution of $x^2 + 2x + 1 = 0$. By identity (1.2) on page 8, we recognize that
$$x^2 + 2x + 1 = (x + 1)^2.$$
Therefore $(x + 1)(x + 1) = 0$, and so $x = -1$ or $x = -1$, by (9.38) on page 220. Thus, if $x$ is a solution of $x^2 + 2x + 1 = 0$, then $x = -1$. It is then easy to verify that, indeed, $-1$ is a solution of $x^2 + 2x + 1 = (x + 1)(x + 1) = 0$. Again, because the roots of $(x + 1)(x + 1) = 0$ are actually $-1$ and $-1$, we call $-1$ a double root of the equation $x^2 + 2x + 1 = 0$.
Now look at a slightly different equation: x2 + 2x − 4 = 0. Knowing that
( x + 1)2 = x2 + 2x + 1, we can see that x2 + 2x − 4 is not the square of a lin-
ear polynomial. However, the breakthrough idea here is to insist on exploiting
( x + 1)2 = x2 + 2x + 1 nevertheless, and so we have
x2 + 2x − 4 = ( x2 + 2x + 1) − 1 − 4 = ( x + 1)2 − 5.
The payoff for the insistence is immediate: $x^2 + 2x - 4 = 0$ is now seen to be the same equation as $(x + 1)^2 - 5 = 0$, so that, according to Lemma 10.1 (letting $X = x + 1$), $x$ is a solution of $x^2 + 2x - 4 = 0$ if and only if $x + 1 = \pm\sqrt{5}$. In other words, the solutions of $x^2 + 2x - 4 = 0$ are $-1 \pm \sqrt{5}$.

Consider a related equation: x2 + 2x + 2 = 0. We would have

x2 + 2x + 2 = ( x2 + 2x + 1) − 1 + 2 = ( x + 1)2 + 1
so that x2 + 2x + 2 = 0 can be rewritten as ( x + 1)2 + 1 = 0. Seeing that
( x + 1)2 ≥ 0, we realize that there will be no solution for x2 + 2x + 2 = 0 this
time because, for any number x,

x2 + 2x + 2 = ( x + 1)2 + 1 ≥ 0 + 1 > 0
and x2 + 2x + 2 will always be positive no matter what x may be. Thus we
witness once more the phenomenon that a quadratic equation can have two roots
(including the possibility of a double root), or no roots.5


Solve x2 − 6x + 1 = 0 and x2 − 6x + 10 = 0.
What about solving x2 + 5x + 2 = 0 ? If we follow the preceding line of
reasoning, we will be looking for a number c so that

x2 + 5x + 2 = ( x2 + 5x + c2 ) − c2 + 2 = ( x + c)2 + (−c2 + 2).

5 Recall from the preceding footnote on page 227 that if complex roots are allowed, then this

equation has two roots.


Expanding $(x + c)^2$, we get $x^2 + 2cx + c^2$, which will be equal to $x^2 + 5x + c^2$ if and only if $2c = 5$, i.e., if and only if $c = \tfrac{5}{2}$. Knowing this, we can now proceed with the following calculation:
$$x^2 + 5x + 2 = x^2 + 5x + \left(\tfrac{5}{2}\right)^2 - \left(\tfrac{5}{2}\right)^2 + 2 = \left(x + \tfrac{5}{2}\right)^2 - \tfrac{17}{4}.$$
As before, this means $x$ is a solution of $x^2 + 5x + 2 = 0$ if and only if $x$ is a solution of
$$\left(x + \tfrac{5}{2}\right)^2 - \tfrac{17}{4} = 0.$$
By Lemma 10.1, this is true if and only if $x$ satisfies
$$x + \tfrac{5}{2} = \pm\sqrt{\tfrac{17}{4}} = \pm\frac{\sqrt{17}}{2}. \tag{10.4}$$
(Note that the second equality in (10.4) makes use of (9.31) on page 211.) Therefore, $x$ is a solution of $x^2 + 5x + 2 = 0$ if and only if $x = -\tfrac{5}{2} \pm \tfrac{\sqrt{17}}{2}$.
Incidentally, given $x^2 + 5x$, the preceding process of finding $\tfrac{5}{2}$ so that $x^2 + 5x + (\tfrac{5}{2})^2$ becomes a square, $(x + \tfrac{5}{2})^2$, is worth singling out. Let $B$ be a number. Then we have an obvious expansion (see identity (1.2) on page 8):
$$\left(x + \tfrac{B}{2}\right)^2 = x^2 + Bx + \left(\tfrac{B}{2}\right)^2.$$
By reading this identity from right to left (the need for doing this was first brought up on page 21), we get
$$x^2 + Bx + \left(\tfrac{B}{2}\right)^2 = \left(x + \tfrac{B}{2}\right)^2 \quad \text{for any number } B. \tag{10.5}$$
Letting $B = 5$, we retrieve the previous result for $x^2 + 5x$.
The process of adding $(\tfrac{B}{2})^2$ to $x^2 + Bx$ to get a square $(x + \tfrac{B}{2})^2$ is called completing the square. In order not to interrupt the flow of this discussion, we will leave an explanation of this term to the end of this section (page 236).
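Identity (10.5) turns directly into a procedure. The following Python sketch (the names `complete_square` and `roots_monic` are our own, hypothetical choices) completes the square for $x^2 + Bx + C$ with exact fractions, returning $h = \tfrac{B}{2}$ and $k = C - (\tfrac{B}{2})^2$ so that $x^2 + Bx + C = (x + h)^2 + k$, and then applies Lemma 10.1 as in the examples above. Square roots are floating-point approximations.

```python
import math
from fractions import Fraction

def complete_square(B, C):
    """Return (h, k) with x^2 + Bx + C = (x + h)^2 + k, using identity (10.5)."""
    B, C = Fraction(B), Fraction(C)
    h = B / 2                  # the number added to x inside the square
    k = C - h * h              # what is left after adding and subtracting (B/2)^2
    return h, k

def roots_monic(B, C):
    """Real roots of x^2 + Bx + C = 0 (a double root is listed twice)."""
    h, k = complete_square(B, C)
    if k > 0:
        return []              # (x + h)^2 + k >= k > 0 for every x
    s = math.sqrt(float(-k))   # Lemma 10.1 applied to X = x + h and N = -k
    return [float(-h) + s, float(-h) - s]

print(complete_square(5, 2))   # x^2 + 5x + 2
print(roots_monic(2, -4))      # x^2 + 2x - 4 = 0
print(roots_monic(2, 2))       # x^2 + 2x + 2 = 0
```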


Solve x2 − x + 1 = 0.
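As a final example, consider the equation $3x^2 - 4x + 5 = 0$. In view of (10.5), we will make sure that we work with quadratic polynomials whose leading coefficient is 1. Therefore, factoring out 3 and letting $B = -\tfrac{4}{3}$ in (10.5), we get:
$$3x^2 - 4x + 5 = 3\left(x^2 - \tfrac{4}{3}x\right) + 5 = 3\left[x^2 - \tfrac{4}{3}x + \left(\tfrac{4}{6}\right)^2 - \left(\tfrac{4}{6}\right)^2\right] + 5.$$
Since $\tfrac{4}{6} = \tfrac{2}{3}$, this gives
$$3x^2 - 4x + 5 = 3\left[x^2 - \tfrac{4}{3}x + \left(\tfrac{2}{3}\right)^2\right] - 3\left(\tfrac{2}{3}\right)^2 + 5 = 3\left(x - \tfrac{2}{3}\right)^2 + \tfrac{11}{3}. \tag{10.6}$$
It follows that a number $x$ is a solution of $3x^2 - 4x + 5 = 0$ if and only if $x$ is a solution of $3(x - \tfrac{2}{3})^2 + \tfrac{11}{3} = 0$. But for any $x$,
$$3\left(x - \tfrac{2}{3}\right)^2 + \tfrac{11}{3} \ge 0 + \tfrac{11}{3} = \tfrac{11}{3} > 0.$$
Therefore $3(x - \tfrac{2}{3})^2 + \tfrac{11}{3} = 0$ has no solution and, consequently, $3x^2 - 4x + 5 = 0$ has no solution.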


Solve 2x2 − 3x + 3 = 0.

If we switch the leading coefficient of $3x^2 - 4x + 5 = 0$ from 3 to $(-3)$, then the situation changes. We would be solving $(-3)x^2 - 4x + 5 = 0$, and the counterpart of (10.6) would be:
$$(-3)x^2 - 4x + 5 = (-3)\left[x^2 + \tfrac{4}{3}x + \left(\tfrac{2}{3}\right)^2\right] + 3\left(\tfrac{2}{3}\right)^2 + 5 = (-3)\left(x + \tfrac{2}{3}\right)^2 + \tfrac{19}{3}.$$
Thus $x$ is a solution of $(-3)x^2 - 4x + 5 = 0$ if and only if $x$ is a solution of $(-3)(x + \tfrac{2}{3})^2 + \tfrac{19}{3} = 0$. Dividing through by $(-3)$, we see that the equation $(-3)(x + \tfrac{2}{3})^2 + \tfrac{19}{3} = 0$ has the same solutions as $(x + \tfrac{2}{3})^2 - \tfrac{19}{9} = 0$. By Lemma 10.1, it follows that $x$ is a solution of $(-3)x^2 - 4x + 5 = 0$ if and only if $x + \tfrac{2}{3} = \pm\sqrt{\tfrac{19}{9}}$ or, what is the same thing, if and only if
$$x = -\tfrac{2}{3} \pm \tfrac{\sqrt{19}}{3},$$
where we have made use of (9.31) on page 211. In particular, $(-3)x^2 - 4x + 5 = 0$ has these two distinct roots.
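Both computations follow one pattern, which the general argument below makes precise: with $p = -\tfrac{b}{2a}$ and $q = c - \tfrac{b^2}{4a}$ one has $ax^2 + bx + c = a(x - p)^2 + q$. As a preview, here is a Python sketch (the name `vertex_form` is our own) that computes $p$ and $q$ with exact fractions and recovers (10.6) as well as its counterpart for the leading coefficient $-3$.

```python
from fractions import Fraction

def vertex_form(a, b, c):
    """Return (p, q) with ax^2 + bx + c = a(x - p)^2 + q.

    p = -b/(2a) and q = c - b^2/(4a); a must be nonzero.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a quadratic needs a nonzero leading coefficient")
    p = -b / (2 * a)
    q = c - b * b / (4 * a)
    return p, q

print(vertex_form(3, -4, 5))    # compare with (10.6)
print(vertex_form(-3, -4, 5))   # the leading coefficient switched to -3

# Spot-check the identity at a few integer points.
a, b, c = 3, -4, 5
p, q = vertex_form(a, b, c)
assert all(a * x * x + b * x + c == a * (x - p) ** 2 + q for x in range(-5, 6))
```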
At this point, we have gained some experience with solving quadratic equa-
tions, and it is time to take stock of where we are. Are we satisfied with just ac-
quiring the new skills and the knowledge—through specific examples—that some
quadratic equations have no roots while others have double roots or two distinct
roots? Or, do we want to find out whether these are fundamentally haphazard
phenomena or whether there is an underlying reason that explains everything? A
main goal of mathematics is to look into questions like these and try to find such
underlying reasons. In this case, there is a complete answer, as we now explain.
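To this end, let us start with a general quadratic equation (see (10.3)):
$$ax^2 + bx + c = 0,$$
where $a$, $b$, and $c$ are constants and $a \ne 0$. By letting $B$ in (10.5) be $\tfrac{b}{a}$, we get
$$ax^2 + bx + c = a\left(x^2 + \tfrac{b}{a}x\right) + c = a\left[x^2 + \tfrac{b}{a}x + \left(\tfrac{b}{2a}\right)^2\right] - a\left(\tfrac{b}{2a}\right)^2 + c.$$
Since
$$-a\left(\tfrac{b}{2a}\right)^2 + c = \frac{-b^2}{4a} + c = \frac{-b^2 + 4ac}{4a},$$
we have:
$$ax^2 + bx + c = a\left(x + \tfrac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a}.$$
Therefore $x$ is a solution of $ax^2 + bx + c = 0$ if and only if $x$ is a solution of
$$a\left(x + \tfrac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a} = 0. \tag{10.7}$$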

Equation (10.7) is said to be a quadratic equation in vertex form:6 $a(x - p)^2 + q = 0$. (In the case of (10.7), $p = -\tfrac{b}{2a}$ and $q = -\tfrac{b^2 - 4ac}{4a}$.) Multiplying through equation (10.7) by $\tfrac{1}{a}$, we get the equation
$$\left(x + \tfrac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2} = 0. \tag{10.8}$$
It is clear that $x$ is a solution of (10.7) if and only if it is a solution of (10.8). Hence, $x$ is a solution of $ax^2 + bx + c = 0$ if and only if $x$ is a solution of (10.8). We now concentrate on the solution of (10.8).
For ease of discussion, we let $M$ denote the constant subtracted in (10.8), i.e.,
$$M \overset{\text{def}}{=} \frac{b^2 - 4ac}{4a^2}.$$
Then (10.8) may be rewritten as:
$$\left(x + \tfrac{b}{2a}\right)^2 - M = 0.$$
Since $4a^2 > 0$ (recall that $a \ne 0$ by assumption), we see from the definition of $M$ that $M$ is equal to $0$, $> 0$, or $< 0$ if and only if $b^2 - 4ac$ is equal to $0$, $> 0$, or $< 0$, respectively.

We consider these three possibilities separately.
(i) $b^2 - 4ac < 0$. Then $M < 0$, so that $-M > 0$, and the left side of (10.8) satisfies
$$\left(x + \tfrac{b}{2a}\right)^2 - M > \left(x + \tfrac{b}{2a}\right)^2 \ge 0.$$
Thus the left side of (10.8) is positive for every $x$, so it can never be zero regardless of what $x$ may be. In other words, equation (10.8) can never hold for any $x$ and, therefore, (10.8) has no solution. Equivalently, $ax^2 + bx + c = 0$ has no solution in this case.
(ii) $b^2 - 4ac > 0$. Then $M > 0$, and Lemma 10.1 implies that the equation $ax^2 + bx + c = 0$ has two distinct roots $x$ so that $x + \tfrac{b}{2a} = \pm\sqrt{M}$ or, what is the same thing,
$$x = -\frac{b}{2a} \pm \sqrt{\frac{b^2 - 4ac}{4a^2}}.$$
We can simplify the second term on the right side: by (9.31) on page 211,
$$\pm\sqrt{\frac{b^2 - 4ac}{4a^2}} = \pm\frac{\sqrt{b^2 - 4ac}}{\sqrt{4a^2}}. \tag{10.9}$$
If $a > 0$, then $\sqrt{4a^2}$, being the positive square root of $4a^2$, is equal to $2a$. If $a < 0$, then $\sqrt{4a^2}$ is equal to $-2a$. Thus $\sqrt{4a^2} = 2a$ or $-2a$. Since the symbol $\pm$ on the right side of (10.9) already allows for the variation in the sign of the number (i.e., whether it is positive or negative), we see that
$$\pm\frac{\sqrt{b^2 - 4ac}}{\sqrt{4a^2}} = \pm\frac{\sqrt{b^2 - 4ac}}{2a}.$$
Therefore we obtain the explicit expressions of the two roots in this case:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

6 The terminology of "vertex form" is used only in school mathematics. The standard practice in mathematics is to call this the normal form of the equation.
(iii) $b^2 - 4ac = 0$. Then (10.8) implies that
$$\left(x + \tfrac{b}{2a}\right)^2 = 0.$$
Therefore $-b/(2a)$ is a double root of this equation, and hence of $ax^2 + bx + c = 0$.
Observe that the formula in Case (ii) already yields $-b/(2a)$ as a root if $b^2 - 4ac = 0$. Therefore we may combine Case (iii) and Case (ii) into a single case of $b^2 - 4ac \ge 0$, provided $-b/(2a)$ is understood to be a double root in Case (iii). We can now summarize our findings in one theorem.
Theorem 10.2. (Quadratic formula) A quadratic equation $ax^2 + bx + c = 0$ has no solution in the real numbers if $b^2 - 4ac < 0$. If $b^2 - 4ac \ge 0$, then the roots $r_1, r_2$ are given by
$$r_1, r_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{10.10}$$
The formula in (10.10) expressing the roots in terms of the coefficients a, b, c
is called the quadratic formula for the equation ax2 + bx + c = 0. Recall that
at this point, the formula is meaningful only if b2 − 4ac ≥ 0.7 The number
b2 − 4ac is called the discriminant of the quadratic equation or the quadratic
polynomial ax2 + bx + c (it discriminates between the equation ax2 + bx + c = 0
having roots and not having roots). If we now review the examples of specific
quadratic equations solved in the first part of this section, it will be seen that
Theorem 10.2 explains all the earlier solutions.
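Theorem 10.2 translates into a few lines of Python. This is a floating-point sketch under our own naming (`quadratic_roots` is hypothetical); it returns an empty list when the discriminant is negative and lists a double root twice, matching the convention of Case (iii).

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of ax^2 + bx + c = 0 by the quadratic formula (10.10)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    disc = b * b - 4 * a * c          # the discriminant
    if disc < 0:
        return []                     # Case (i): no real solution
    s = math.sqrt(disc)
    return [(-b + s) / (2 * a), (-b - s) / (2 * a)]

for coeffs in [(2, -1, -3), (1, 2, 1), (3, -4, 5), (-3, -4, 5)]:
    print(coeffs, quadratic_roots(*coeffs))
```

A design remark: when $b^2$ is much larger than $|4ac|$, one of $-b + s$ and $-b - s$ subtracts nearly equal numbers and can lose accuracy in floating point; a common remedy computes one root from the formula and the other from the product relation $r_1 r_2 = \tfrac{c}{a}$ of Theorem 10.3, but the plain version is enough for the examples of this section.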


Use Theorem 10.2 to explain: (i) why $x^2 + 2x - 4 = 0$ has two distinct roots and why $x^2 + 2x + 2 = 0$ has no roots, and (ii) why $3x^2 - 4x + 5 = 0$ has no roots but $(-3)x^2 - 4x + 5 = 0$ has two distinct roots.

We now give an application of Theorem 10.2 by deriving an inequality. Let

s and t be two numbers, and let a quadratic function g be defined by g( x ) =
( x − s)( x − t). Therefore
g( x ) = x2 − (s + t) x + st.

7 Once complex numbers are available, the quadratic formula will be shown to be valid under all


The discriminant $\Delta$ of $g$ is then $\Delta = (s + t)^2 - 4st$. Because $g$ has two roots $s$ and $t$, Theorem 10.2 implies that $\Delta \ge 0$. Thus we have $st \le \left(\tfrac{1}{2}(s + t)\right)^2$.
Suppose $s$ and $t$ are nonnegative; then we can take the square root of both sides to get
$$\sqrt{st} \le \frac{s + t}{2}, \quad \text{and equality holds if and only if } s = t. \tag{10.11}$$
The validity of the inequality is clear. As to the assertion about equality, if $s = t$, then both sides of (10.11) are equal to $s$. Conversely, suppose equality holds in (10.11). Then $(s + t)^2 - 4st = 0$, so that $\Delta = 0$. By the quadratic formula (10.10), the two roots $s$ and $t$ of $g$ are then both equal to $\tfrac{s + t}{2}$, so $s = t$. This proves (10.11).
For nonnegative numbers $s$ and $t$, $\tfrac{1}{2}(s + t)$ is called the arithmetic mean of $s$ and $t$ (more commonly referred to as the average of $s$ and $t$); the number $\sqrt{st}$ is called the geometric mean of $s$ and $t$ (clearly the geometric mean is the multiplicative analog of the arithmetic mean). (10.11) is the arithmetic-geometric mean inequality for two (nonnegative) numbers. There is a corresponding inequality for $n$ nonnegative numbers; see [Wiki-AGM] for a simple introduction.
The inequality (10.11) has a geometric interpretation. Let $R$ be a rectangle with two sides of lengths $s$ and $t$. Then its perimeter is $2(s + t)$, and (10.11) becomes
$$\sqrt{\text{area of } R} \le \tfrac{1}{4}\,(\text{perimeter of } R), \tag{10.12}$$
and equality holds if and only if the two sides are of equal length, $s = t$, i.e., if and only if the rectangle is a square. In other words, among all rectangles with a fixed perimeter, say $c$, the square will have the maximum area, namely, $\tfrac{1}{16}c^2$; conversely, if a rectangle with a given perimeter $c$ has maximum area, it is the square of side length $\tfrac{c}{4}$. This is why (10.12) is called the isoperimetric inequality for rectangles. Compare Exercise 13 in Section 2.6 of [Wu-PreAlg] and Exercise 3 on page 10 of this volume.
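A small numerical experiment illustrates the isoperimetric inequality (10.12). The Python sketch below is our own illustration: the perimeter $c = 20$ and the grid of side lengths (multiples of $\tfrac{1}{4}$) are arbitrary choices. It computes the areas exactly with fractions and confirms that no area exceeds $\tfrac{1}{16}c^2$, with equality only for the square; it is a finite check, not a proof.

```python
from fractions import Fraction

def area_with_perimeter(c, s):
    """Area of a rectangle with perimeter c and one side s (the other side is c/2 - s)."""
    t = Fraction(c) / 2 - s
    return s * t

c = Fraction(20)
sides = [Fraction(k, 4) for k in range(0, 41)]      # s = 0, 1/4, ..., 10
areas = [area_with_perimeter(c, s) for s in sides]
bound = c * c / 16                                  # area of the square of side c/4

assert all(A <= bound for A in areas)
# Equality holds only at s = c/4, i.e., when the rectangle is a square.
assert [s for s, A in zip(sides, areas) if A == bound] == [c / 4]

# The same check phrased as (10.11): st <= ((s + t)/2)^2, with equality iff s = t.
for s in sides:
    t = c / 2 - s
    assert s * t <= ((s + t) / 2) ** 2
    assert (s * t == ((s + t) / 2) ** 2) == (s == t)

print(max(areas), bound)
```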
At this point, we can return to the issue of factoring quadratic polynomials
adumbrated on page 23. Take a quadratic polynomial with leading coefficient
equal to 1, x2 + Bx + C, where B and C are integers. The usual skill of factoring
this polynomial consists of finding two other integers r1 and r2 so that
$$r_1 + r_2 = -B \quad \text{and} \quad r_1 r_2 = C. \tag{10.13}$$
Then because $(x - r_1)(x - r_2) = x^2 - (r_1 + r_2)x + r_1 r_2$, we get the factorization
$$x^2 + Bx + C = (x - r_1)(x - r_2). \tag{10.14}$$

Let us emphasize that in order to get the factorization (10.14), our sole concern
with the integers r1 and r2 is that they satisfy (10.13). However, (10.14) has a
ramification for r1 and r2 that was never part of our original thinking, namely,
the numbers r1 and r2 turn out to be the roots of the equation x2 + Bx + C = 0. This
is because when we substitute r1 or r2 for x in (10.14), the right side of (10.14) is
obviously equal to zero and therefore so is the left side, which then shows r1 and
r2 are the roots of x2 + Bx + C = 0.
Now suppose we are given x2 + Bx + C = 0, where B and C are no longer
required to be integers. Let r1 and r2 be its roots, which are not necessarily integers
either. Would (10.14) still be correct? The astonishing answer is that it is. Keep in
mind that all we know about r1 and r2 is that they are the roots of x2 + Bx + C = 0;
there is no indication that they would satisfy (10.13). But we are going to prove

that they do! To this end, observe that because we are assuming the equation
x2 + Bx + C = 0 has roots, we know its discriminant cannot be negative (see
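Case (i) on page 231). Therefore its discriminant is $\ge 0$. The quadratic formula (Theorem 10.2) implies that $r_1$ and $r_2$ are given by (10.10), i.e.,
$$r_1 = \tfrac{1}{2}\left(-B + \sqrt{B^2 - 4C}\right) \quad\text{and}\quad r_2 = \tfrac{1}{2}\left(-B - \sqrt{B^2 - 4C}\right).$$
Therefore
$$r_1 r_2 = \tfrac{1}{4}\left(-B + \sqrt{B^2 - 4C}\right)\left(-B - \sqrt{B^2 - 4C}\right).$$
Using the identity $(X - A)(X + A) = X^2 - A^2$ with $X = -B$ and $A = \sqrt{B^2 - 4C}$, we get
$$r_1 r_2 = \tfrac{1}{4}\left(B^2 - (B^2 - 4C)\right) = C.$$
Also,
$$r_1 + r_2 = \tfrac{1}{2}(-B) + \tfrac{1}{2}(-B) = -B.$$
This proves (10.13). Of course, (10.14) is then an immediate consequence.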
What we have done can be stated more generally, as in the following theorem.

Theorem 10.3. Let $ax^2 + bx + c$ be a quadratic polynomial with nonnegative discriminant, and let $r_1$ and $r_2$ be the roots of $ax^2 + bx + c = 0$. Then
$$r_1 + r_2 = -\frac{b}{a} \quad\text{and}\quad r_1 r_2 = \frac{c}{a}, \tag{10.15}$$
and we have an identity in $x$:
$$ax^2 + bx + c = a(x - r_1)(x - r_2). \tag{10.16}$$
Proof. Because $ax^2 + bx + c = 0$ can be written as $a\left(x^2 + \tfrac{b}{a}x + \tfrac{c}{a}\right) = 0$, we see that $r_1, r_2$ are also the roots of the equation $x^2 + \tfrac{b}{a}x + \tfrac{c}{a} = 0$. Letting $B = \tfrac{b}{a}$ and $C = \tfrac{c}{a}$, we see from (10.13) that the equalities in (10.15) hold. Moreover, we have from (10.14) that
$$x^2 + \tfrac{b}{a}x + \tfrac{c}{a} = (x - r_1)(x - r_2).$$
Multiplying both sides by $a$, we get (10.16). The proof is complete.
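The relations (10.15) and the factorization (10.16) are easy to test numerically. In this floating-point Python sketch (the names are ours), the roots come from the quadratic formula, and we compare $r_1 + r_2$ with $-\tfrac{b}{a}$, $r_1 r_2$ with $\tfrac{c}{a}$, and both sides of (10.16) at a few sample points; agreement is only up to rounding error.

```python
import math

def roots(a, b, c):
    """Both roots from (10.10); assumes a != 0 and b^2 - 4ac >= 0."""
    s = math.sqrt(b * b - 4 * a * c)
    return (-b + s) / (2 * a), (-b - s) / (2 * a)

def check_theorem_10_3(a, b, c, tol=1e-9):
    r1, r2 = roots(a, b, c)
    sum_ok = math.isclose(r1 + r2, -b / a, abs_tol=tol)       # (10.15), sum of roots
    prod_ok = math.isclose(r1 * r2, c / a, abs_tol=tol)       # (10.15), product of roots
    factor_ok = all(                                          # (10.16) at sample points
        math.isclose(a * x * x + b * x + c, a * (x - r1) * (x - r2), abs_tol=tol)
        for x in (-2.0, 0.0, 0.5, 3.0)
    )
    return sum_ok and prod_ok and factor_ok

for coeffs in [(1, 2, -4), (2, -1, -3), (-3, -4, 5), (1, 2, 1)]:
    print(coeffs, check_theorem_10_3(*coeffs))
```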

[Sidebar: The quadratic formula trivializes the topic of factoring quadratic polynomials.]

Equation (10.16) implies that a quadratic polynomial $ax^2 + bx + c$ is completely determined by its two roots and its leading coefficient $a$. In addition, it shows that the excessive attention that TSM lavishes on tricks to factor a quadratic polynomial with integer coefficients is unnecessary. Indeed, (10.16) says that all it takes to achieve the factoring is to use the quadratic formula to find the roots of the associated quadratic equation. Here is an example of how this is done.
Example 1. Factor $105x^2 - 22x - 96$.

By the quadratic formula, the roots are (use a calculator!):
$$\frac{22 \pm \sqrt{22^2 + 4 \cdot 105 \cdot 96}}{2 \cdot 105} = \frac{22 \pm 202}{210} = \frac{-90}{105},\ \frac{112}{105}.$$
Therefore, by (10.16),
$$105x^2 - 22x - 96 = 105\left(x + \frac{90}{105}\right)\left(x - \frac{112}{105}\right).$$
This would be a perfectly legitimate factorization. However, it is possible to simplify the right side to make it look better by observing that
$$\frac{112}{105} = \frac{16}{15}.$$
Hence
$$105x^2 - 22x - 96 = (105x + 90)\left(x - \frac{16}{15}\right) = (7x + 6)(15x - 16).$$
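The procedure of Example 1 can be automated for integer coefficients when the discriminant is a perfect square, so that the roots are fractions. The Python sketch below is our own illustration (the function name and the output convention are hypothetical): it computes $r_1, r_2$ exactly with `fractions.Fraction`, which automatically reduces $\tfrac{112}{105}$ to $\tfrac{16}{15}$, and then clears denominators as in the last step of the example.

```python
import math
from fractions import Fraction

def factor_integer_quadratic(a, b, c):
    """Factor ax^2 + bx + c (integers, a != 0) via (10.16) when the roots are rational.

    Returns (k, (d1, e1), (d2, e2)) meaning k(d1*x + e1)(d2*x + e2),
    or None when b^2 - 4ac is negative or not a perfect square.
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    s = math.isqrt(disc)
    if s * s != disc:
        return None                         # irrational roots; (10.16) still applies
    r1 = Fraction(-b + s, 2 * a)            # Fraction stores lowest terms
    r2 = Fraction(-b - s, 2 * a)
    # x - n/d = (d*x - n)/d, so a(x - r1)(x - r2) = k(d1*x - n1)(d2*x - n2)
    k = Fraction(a, r1.denominator * r2.denominator)
    return k, (r1.denominator, -r1.numerator), (r2.denominator, -r2.numerator)

print(factor_integer_quadratic(105, -22, -96))
```

For Example 1 this returns $k = 1$ together with the pairs $(15, -16)$ and $(7, 6)$, i.e., $(15x - 16)(7x + 6)$, in agreement with the factorization above.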


Factor 3x2 − x − 1 and x2 + 7x + 11.

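We already emphasized above that the $a$, $b$, and $c$ in Theorem 10.3 need not be integers. The following example illustrates the kind of fancy factoring one can do.

Example 2. Factor $3x^2 - \sqrt{2}\,x - \tfrac{1}{2}$.

By the quadratic formula, the roots of $3x^2 - \sqrt{2}\,x - \tfrac{1}{2} = 0$ are
$$\frac{\sqrt{2} \pm \sqrt{2 + 6}}{6} = \frac{\sqrt{2} \pm 2\sqrt{2}}{6} = \frac{\sqrt{2}}{2} \ \text{ or } \ -\frac{\sqrt{2}}{6}.$$
By (10.16),
$$3x^2 - \sqrt{2}\,x - \tfrac{1}{2} = 3\left(x - \frac{\sqrt{2}}{2}\right)\left(x + \frac{\sqrt{2}}{6}\right) = \left(x - \frac{\sqrt{2}}{2}\right)\left(3x + \frac{\sqrt{2}}{2}\right).$$
Incidentally, it is instructive to multiply out the product of linear polynomials on the right to obtain $3x^2 - \sqrt{2}\,x - \tfrac{1}{2}$.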
We conclude this section with three comments.
First, the proof of Theorem 10.2, together with the various solutions of specific quadratic equations in the first part of this section, lays bare the fact that solving a quadratic equation $ax^2 + bx + c = 0$ is a relatively simple two-step affair, namely, (a) use completing the square (page 229) to rewrite the equation in vertex form (see page 231):
$$a(x - p)^2 + q = 0$$
for some constants $p$ and $q$, and (b) use Lemma 10.1 (page 227) to solve $a(x - p)^2 + q = 0$, resulting in the two roots $p \pm \sqrt{-\tfrac{q}{a}}$, provided either $q = 0$ or $q$ and $a$ have opposite signs (see page 267).
A word about the notation we have just employed may clear up
some confusion. It would seem that if we are going to make
use of Lemma 10.1, then we should follow the notational con-
vention in that lemma and write the vertex form of the equa-
tion as a( x − p)2 − q = 0. However, the reason for writing
a( x − p)2 + q = 0 instead is Theorem 10.5 on page 242 below.
This is a case of conflicting mathematical demands on a no-
tation, and the choice one makes ultimately comes down to a
judgment call.
[Sidebar: Solving a quadratic equation $ax^2 + bx + c = 0$ is nothing more than asking for the zeros of the quadratic function $f(x) = ax^2 + bx + c$.]

A second comment is that we have to put the focus of school mathematics on solving quadratic equations in perspective. Instead of solving $ax^2 + bx + c = 0$, let us also consider the quadratic function $f(x) = ax^2 + bx + c$. From the vantage point of $f(x)$, solving $ax^2 + bx + c = 0$ is the same as asking whether there is a number $x_0$ so that $f(x_0) = 0$. This then opens the floodgates to asking a host of other questions, such as whether there is a number $x_1$ so that $f(x_1) = d$ for a given number $d$. This can get interesting. For example, consider the function $F(x) = x^2 - 4x + 3$.
Is there an $x_0$ so that $F(x_0) = 0$? Yes. $x_0 = 1, 3$.
Is there an $x_1$ so that $F(x_1) = -1$? Yes. $x_1 = 2$. (Notice that 1 and 3 are equidistant from 2.)
Is there an $x_2$ so that $F(x_2) = k$ for a $k$ so that $k < -1$? No.
Is there an $x_3$ so that $F(x_3) = \ell$ for an $\ell$ so that $\ell \ge -1$? Yes.


Prove the above assertions.
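The following Python sketch is a numerical illustration, not a proof, of the four assertions about $F(x) = x^2 - 4x + 3$ (the name `preimages` is ours). It solves $F(x) = d$ as the quadratic equation $x^2 - 4x + (3 - d) = 0$, whose discriminant is $16 - 4(3 - d) = 4(1 + d)$.

```python
import math

def preimages(d):
    """All real x with x^2 - 4x + 3 = d, i.e., the roots of x^2 - 4x + (3 - d) = 0."""
    disc = 16 - 4 * (3 - d)           # equals 4(1 + d)
    if disc < 0:
        return []                     # happens exactly when d < -1
    s = math.sqrt(disc)
    return [(4 + s) / 2, (4 - s) / 2]

for d in [0, -1, -1.5, 5]:
    xs = preimages(d)
    print(d, xs)
    # Whenever there are solutions, they are equidistant from 2.
    assert not xs or math.isclose((xs[0] + xs[1]) / 2, 2.0)
```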

Now the question becomes whether phenomena similar to the above also
prevail for quadratic functions in general and, if so, whether there is a general
explanation for them. In this broad context, one begins to see that the issue
of solving a quadratic equation is but a small part of the overriding effort to
understand quadratic functions. This is why the rest of this chapter will be devoted
to the study of quadratic functions.
The last comment brings closure to the concept of completing the square (see page 229). We saw that, given $x^2 + Bx$, adding to it the term $(\tfrac{B}{2})^2$ leads to a square $(x + \tfrac{B}{2})^2$. Why is this called "completing the square"? It is because the Babylonians (perhaps around 400 B.C.) originated this idea pictorially, as follows.
Let both $x$ and $B$ be positive, and write $x^2 + Bx$ as
$$x^2 + 2 \cdot \tfrac{B}{2}\,x.$$
Then $x^2 + Bx$ is exactly the area of the following figure consisting of a square with one side of length $x$ and two rectangles each with sides of length $\tfrac{B}{2}$ and $x$.
[Figure: a square of side $x$ together with two $\tfrac{B}{2} \times x$ rectangles attached along two adjacent sides.]
Looking at this picture, it would be natural to complete it by making it part of a bigger square with sides of length $x + \tfrac{B}{2}$ by adding the small dotted square at the lower right corner:
[Figure: the same figure completed to a square of side $x + \tfrac{B}{2}$ by a dotted $\tfrac{B}{2} \times \tfrac{B}{2}$ square in the lower right corner.]
Now the dotted square has a side of length $\tfrac{B}{2}$, so its area is $(\tfrac{B}{2})^2$. With this dotted square added to the original figure, the total area is now the area of the big square with sides of length $x + \tfrac{B}{2}$. Thus:
$$(x^2 + Bx) + \left(\tfrac{B}{2}\right)^2 = \left(x + \tfrac{B}{2}\right)^2.$$
This is exactly identity (10.5) on page 229.

Exercises 10.1

(1) Without using the quadratic formula, directly solve: (a) $16x^2 + 8x + 1 = 0$. (b) $x^2 - \tfrac{3}{2}x + 1 = 0$. (c) $3x^2 + 12x + 11 = 0$. (d) $6x^2 + 3x - 2 = 0$.
(2) (Everybody must do this problem!) Starting with px2 + qx + r = 0 (do not
change the notation!), give a self-contained and coherent derivation of
the quadratic formula.

(3) Solve each of the following quadratic

√ equations: (i) 6x2 − 13x = 5. (ii)
s + 6s = 0. (iii) −3x + 4 3 x − 4 = 0. (iv) 16 x2 − 13 = 0. (v)
2 2
√ √
x2 + 14 x = 14 . (vi) −t2 − 13t + 3 = 0. (vii) − x2 − 13x = 3. (viii)
√ √ √
180x2 + 7x = 5. (ix) x2 − 3 2x + 4 = 0. ( x ) 3s2 − √4 s + 15 = 0.
(4) Factor the following polynomials: (i) $30x^2 + 13x - 36$. (ii) $5x^2 - x - 7$. (iii) $105x^2 + 766x + 72$. (iv) $4x^2 - \tfrac{11}{6}x - 3$. (v) $10x^2 - 13x - 30$.

(5) Find coefficients a, b, and c of the quadratic polynomial ax2 + bx + c so

that 2 and −5 are the roots of ax2 + bx + c = 0. Do the same for each√of
the following pairs of roots: (i) − 35 and 4. (ii) 34 and 73 . (iii) 2 + 5
√ √ √ √ √ √
and 2 − 5. (iv) 6 and 5. (v) 2 and 3. (vi) 23 + 5 and 23 − 5.
√ √
(vii) 1 − 310 and 1 + 310 .
(6) Explain why if a quadratic equation ax2 + bx + c = 0 has a (real) root,
then the discriminant is ≥ 0, i.e., b2 − 4ac ≥ 0.

10.2. A special class of quadratic functions

[Sidebar: Quadratic functions in vertex form are important because every quadratic function can be written in vertex form.]

In this section, we begin the study of quadratic functions. The discussion of the solution of quadratic equations on page 236 already gives us a hint that the collection of quadratic functions in so-called vertex form, $f(x) = a(x - p)^2 + q$ for some constants $p$ and $q$, occupies a special position. We will begin with this collection. (If we want to look ahead, the real reason we are interested in them is given in Theorem 10.7 on page 246.)
Before proceeding further, let us first make sure that such an f ( x ) =
a( x − p)2 + q is indeed a quadratic function. By expanding ( x − p)2 (see identity
(1.3) on page 9), we get:

(10.17) a( x − p)2 + q = ax2 − (2ap) x + ( ap2 + q).

Thus the function f ( x ) = a( x − p)2 + q is a quadratic function f ( x ) = ax2 + bx + c
where the coefficients b and c are, respectively,

(10.18) b = −2ap and c = ap2 + q.
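Formula (10.18) is a one-line conversion. The Python sketch below (names ours) converts vertex-form data $(a, p, q)$ into coefficients $(a, b, c)$ with exact fractions and checks (10.17) at a few points; the sample values are those of (10.6).

```python
from fractions import Fraction

def standard_from_vertex(a, p, q):
    """Coefficients (a, b, c) of a(x - p)^2 + q, using (10.18): b = -2ap, c = ap^2 + q."""
    a, p, q = Fraction(a), Fraction(p), Fraction(q)
    return a, -2 * a * p, a * p * p + q

a, b, c = standard_from_vertex(3, Fraction(2, 3), Fraction(11, 3))
print(a, b, c)
# (10.17): both sides agree at sample points.
assert all(3 * (x - Fraction(2, 3)) ** 2 + Fraction(11, 3) == a * x * x + b * x + c
           for x in range(-4, 5))
```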

If p = q = 0, then the function f ( x ) = ax2 + bx + c simplifies to f ( x ) = ax2 .
We will henceforth denote this function by Fa ( x ). Thus by definition, Fa : R → R,
where a is in R and
Fa ( x ) = ax2 .
Let us get to know this function first.
As mentioned in the introduction to this chapter, we have already come across
the function F1 ( x ) = x2 on page 129. If we graph the more general functions
Fa ( x ) = ax2 , then we will notice that, qualitatively, these graphs look “alike”, i.e.,
the graphs of the Fa ’s for a > 0 look like each other (see the left picture below) and
those of the Fa ’s for a < 0 also look like each other (see the right picture below).


A first goal of this section is to give more precision to the statement that the Fa ’s
for a > 0 look “alike” and those of the Fa ’s for a < 0 also look “alike”.
We note that, first of all, the difference between the graphs of Fa and F− a
would disappear if we allow the reflection Λ across the x-axis (see page 268) to
identify a graph with its reflected image. In greater detail, we claim: the reflection
$\Lambda$ maps the graph $G_a$ of $F_a$ to the graph $G_{-a}$ of $F_{-a}$. To see this, let $a > 0$. Recall that this reflection satisfies $\Lambda(x, y) = (x, -y)$ (page 57). Given a number $x$, the point $(x, F_a(x))$ lies on $G_a$ while $(x, F_{-a}(x))$ lies on $G_{-a}$. Note that $(x, F_a(x)) = (x, ax^2)$ and $(x, F_{-a}(x)) = (x, -ax^2)$. Therefore,
$$\Lambda(x, F_a(x)) = (x, F_{-a}(x)) \quad\text{and}\quad \Lambda(x, F_{-a}(x)) = (x, F_a(x)).$$
So $\Lambda(G_a) = G_{-a}$ for any number $a$, and $\Lambda$ interchanges the graphs $G_a$ and $G_{-a}$, so that the graphs $G_a$ and $G_{-a}$ are congruent.
[Figure: the graphs $G_a$ (above the $x$-axis) and $G_{-a}$ (below it), with the points $(x, F_a(x))$ and $(x, F_{-a}(x))$ marked.]
Because congruence preserves lengths of segments and degrees of angles, any

theorem about one of Ga or G− a will carry with it a corresponding theorem about
the other. In general terms, this means that anything we can prove about one of
the functions Fa or F− a will have a counterpart for the other. This is why in the
future we usually concentrate on the case Fa for a > 0 only. If this sounds too
vague, just wait until we get to specific cases, and the meaning of this statement
will be clarified.
We now begin a more detailed analysis of Fa and its graph Ga .
Given a function f from R to R (e.g., a quadratic function), we say f achieves
a minimum at x0 if f ( x ) ≥ f ( x0 ) for any number x. Similarly, we say f achieves
a maximum at x0 if f ( x ) ≤ f ( x0 ) for any number x.
In terms of the graph G of f , if the function f achieves a minimum at x0 ,
then no other point on the graph G can be lower than ( x0 , f ( x0 )). In view of this
observation, the graph of Fa for a > 0 suggests that Fa achieves a minimum at
the origin $O = (0, 0)$. We now prove that this intuition is correct: for any $x \ne 0$, $x^2 > 0$, and since $a > 0$, (D) on page 160 implies that $ax^2 > 0$. Therefore, if $x \ne 0$,
$$F_a(x) = ax^2 > 0 = F_a(0).$$
This not only proves that $F_a$ achieves a minimum at 0, but also that $F_a$ achieves a unique minimum at 0.

Similarly, the graph $G_{-a}$ of $F_{-a}$ (still with $a > 0$) suggests that $F_{-a}$ achieves a unique maximum at 0, and the proof is entirely similar.
Next, a function $f$ is said to be increasing on an interval (open or closed; see pages 266 and 267) if for all $x$ and $x'$ in the interval, $x < x'$ implies $f(x) < f(x')$; $f$ is said to be decreasing on the interval if for all $x, x'$ in it, $x < x'$ implies $f(x) > f(x')$.


Show that a linear function whose graph has positive slope is increasing on
R, and one whose graph has negative slope is decreasing on R.

The linear functions give the correct idea that the graph of an increasing
function goes up as we move to the right, while the graph of a decreasing function
goes down as we move to the right. To continue with the discussion, we have to
introduce two pieces of new notation: for a number p, we let (−∞, p] denote
all the numbers x so that x ≤ p, and we let [ p, ∞) denote all the numbers x so
that x ≥ p. Then the graph Ga of Fa for a > 0 suggests that on (−∞, 0], Fa is
decreasing, while on [0, ∞), Fa is increasing. The picture of the graph Ga serves
the useful purpose of telling us intuitively what is correct or what is incorrect, but
because we are still learning how to write proofs, we should not rely on intuition
alone but also write down the precise reasoning.

[Figure: the graph $G_a$ with the points $(x, F_a(x))$ and $(x', F_a(x'))$ for $0 \le x < x'$, and two points $s < s'$ on the negative $x$-axis.]
Thus let us prove that if $0 \le x < x'$, then $F_a(x) < F_a(x')$. In other words, we have to prove that
$$0 \le x < x' \ \text{implies}\ ax^2 < a(x')^2.$$
By Lemma 9.4 on page 208, $0 \le x < x'$ implies $x^2 < (x')^2$. Since $a > 0$, (D) on page 160 implies that $ax^2 < a(x')^2$, as desired. Next, we prove that $s < s' \le 0$ implies that $F_a(s) > F_a(s')$. Thus we have to prove:
$$s < s' \le 0 \ \text{implies}\ as^2 > a(s')^2.$$
By (A) of page 160, $s < s' \le 0$ implies $0 \le -s' < -s$, and therefore by Lemma 9.4 again, we have $(-s')^2 < (-s)^2$, which is equivalent to $(s')^2 < s^2$. Since $a > 0$, (D) of page 160 now gives $a(s')^2 < as^2$, which is equivalent to $as^2 > a(s')^2$. The proof of the increasing and decreasing properties of $F_a$ for $a > 0$ is complete.
We pause to make an observation: the fact that Fa ( x ) (a > 0) is decreasing on
(−∞, 0] and increasing on [0, ∞) gives a second proof that Fa achieves a unique
minimum at x = 0.

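Still with $a > 0$, the graph $G_{-a}$ of $F_{-a}$ is now the reflection of $G_a$ across the $x$-axis:
[Figure: the graph $G_{-a}$, opening downward, with the points $s < s' \le 0$ and $0 \le x < x'$ marked on the $x$-axis.]
In this case, $F_{-a}$ is increasing on the negative $x$-axis and decreasing on the positive $x$-axis. The details are left to an exercise (Exercise 1 on page 246).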
As before, the fact that Fa ( x ) (a < 0) is increasing for x < 0 and decreasing
for x > 0 gives a second proof that Fa achieves a unique maximum at x = 0.
Next, we recall that a set S in the plane is said to have bilateral symmetry
with respect to a line L if for every point Q in S, the reflection Λ across L maps
Q to a point that also lies in S (see page 266, also Exercise 2 on page 246). If we
use the terminology of Chapter 8 (page 165), S being symmetric with respect to L
means the part of S in the half-plane L+ is congruent to the part of S in the other
half-plane L− . If there is such a symmetry, then the study of S itself reduces to a
study of one of the two halves, S ∩ L+ or S ∩ L− . This explains our interest in
such a symmetry.
We claim that, for every $a \ne 0$, the graph $G_a$ of $F_a$ has bilateral symmetry with respect to the $y$-axis. Let us prove this. With $\Lambda$ denoting the reflection across the $y$-axis, let $P$ be a point of the graph $G_a$. Then we have to prove that $\Lambda(P)$ is in $G_a$. But $P$ being a point in $G_a$ means that $P = (x, ax^2)$ for some number $x$. Since $\Lambda$ is the reflection across the $y$-axis, we have $\Lambda(P) = (-x, ax^2)$ (see page 57). But
$$(-x, ax^2) = (-x, a(-x)^2) = (-x, F_a(-x)),$$
and $(-x, F_a(-x))$ is of course a point on the graph $G_a$. Thus $\Lambda(P)$ is in $G_a$. This proves the bilateral symmetry of $G_a$ with respect to the $y$-axis.
We summarize our findings about the functions Fa in the following theorem,
which states precisely in what way the graphs { Ga } look “alike”.

Theorem 10.4. For $a \ne 0$, let $F_a : \mathbb{R} \to \mathbb{R}$ denote the function $F_a(x) = ax^2$, and let $G_a$ be the graph of $F_a$. Then:
(i) Ga has bilateral symmetry with respect to the y-axis.
(ii) The reflection across the x-axis is a congruence between Ga and G− a .
(iii) If a > 0, then Fa is decreasing on (−∞, 0] and increasing on [0, ∞) . If a < 0,
then Fa is increasing on (−∞, 0] and decreasing on [0, ∞) .
(iv) If a > 0, then Fa achieves a unique minimum at 0; if a < 0, then Fa achieves a
unique maximum at 0.
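A finite computation cannot replace the proof of Theorem 10.4, but the following Python sketch spot-checks parts (i), (ii), and (iii) at a few sample points. The helper names (`F`, `reflect_y_axis`, `reflect_x_axis`, `on_graph`) and the sample values of $a$ are hypothetical choices for illustration; exact rational arithmetic via `fractions.Fraction` avoids rounding questions.

```python
from fractions import Fraction

def F(a, x):
    """F_a(x) = a x^2."""
    return a * x * x

def reflect_y_axis(point):
    """Reflection across the y-axis: (x, y) -> (-x, y)."""
    x, y = point
    return (-x, y)

def reflect_x_axis(point):
    """Reflection across the x-axis: (x, y) -> (x, -y)."""
    x, y = point
    return (x, -y)

def on_graph(a, point):
    """True exactly when point lies on G_a, i.e. y = a x^2."""
    x, y = point
    return y == F(a, x)

samples = [Fraction(n, 4) for n in range(-12, 13)]      # x = -3, -11/4, ..., 3
for a in (Fraction(1, 2), Fraction(3), Fraction(-2)):
    for x in samples:
        P = (x, F(a, x))
        assert on_graph(a, reflect_y_axis(P))         # part (i)
        assert on_graph(-a, reflect_x_axis(P))        # part (ii)
    # part (iii) on [0, oo): consecutive sample values strictly increase (a > 0)
    # or strictly decrease (a < 0)
    values = [F(a, x) for x in samples if x >= 0]
    steps = list(zip(values, values[1:]))
    if a > 0:
        assert all(u < v for u, v in steps)
    else:
        assert all(u > v for u, v in steps)
```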

Recall that the goal of this section is to understand quadratic functions in vertex form, $f(x) = a(x-p)^2 + q$, where $p$ and $q$ are constants. We are now ready to get on with this task by proving the following theorem. The statement of the theorem and its proof both draw on Lemma 5.3 on page 95. Incidentally, the theorem shows why we are interested in the function $F_a$ and its graph $G_a$ in the first place.
In a sense, if we know $F_a(x) = ax^2$ for every $a$, then we know every quadratic function.

Theorem 10.5. Let a quadratic function $f$ be in vertex form, $f(x) = a(x-p)^2 + q$, where $p$ and $q$ are constants, and let $G$ denote its graph. Then $G$ is the translated image of the graph $G_a$ of $F_a(x) = ax^2$ under the translation $T$ along the vector $\overrightarrow{OV}$, where $O$ is the origin and $V = (p, q)$.
Proof. For definiteness, let us assume a > 0. The proof for the case a < 0 will be
entirely similar.
Let V = ( p, q) and let T be the translation along the vector OV. Then T
translates the graph Ga of Fa to T ( Ga ), as shown in the following picture. We
want to show that T ( Ga ) = G.

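[Figure: the graph $G_a$ and its image $T(G_a)$; the point $P = (x, ax^2)$ of $G_a$ is carried to $T(P) = (x+p, ax^2+q)$ by the translation along $\overrightarrow{OV}$, where $V = (p, q)$.]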

By the definition of the equality of sets (see page 267), we have to show:
(A) T ( Ga ) ⊂ G, i.e., if P is a point of Ga , then T ( P) is a point of G.
(B) G ⊂ T ( Ga ), i.e., if Q is a point of G, then Q = T ( P) for some P on Ga .
Let us first prove (A). By definition, G consists of all the points of the form
(t, f (t)) = (t, a(t − p)2 + q), where t is some number (and therefore the formidable-
looking expression, a(t − p)2 + q , is also just a number). Now if P is a point of
Ga , we are going to show that T ( P) is equal to (t, a(t − p)2 + q) for some t, and
this will prove (A).
Since P is in Ga , by definition, P = ( x, ax2 ) for some number x. By Lemma
5.3 on page 95,
T ( P) = ( x + p, ax2 + q).
Let t = x + p; then a(t − p)2 + q = a( x + p − p)2 + q = ax2 + q. Thus T ( P) =
(t, a(t − p)2 + q), which, as noted above, means that T ( P) is a point of G. The
proof of (A) is complete.
We next prove (B). Suppose Q is a point of G; then Q = (t, a(t − p)2 + q) for
some number t. We have to prove that Q = T ( P) for some point P in Ga ; since a
point of Ga is necessarily equal to ( x, ax2 ) for some number x, we have to prove

(10.19) (t, a(t − p)2 + q) = T ( x, ax2 ) for some number x.
Now T ( x, ax2 ) = ( x + p, ax2 + q), so if we want this to be equal to the given
point (t, a(t − p)2 + q), then—by equating the two x-coordinates—we must have
x + p = t, so that, necessarily, x = t − p. This suggests that, in order to prove
(10.19), we let x = t − p. Then,
T ( x, ax2 ) = ( x + p, ax2 + q) = ((t − p) + p, a(t − p)2 + q) = (t, a(t − p)2 + q).
This proves (10.19) and hence also (B). The proof of Theorem 10.5 is complete.
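The two inclusions (A) and (B) in the proof of Theorem 10.5 can be mirrored numerically. This is only a sketch under assumed sample constants $a = 3/2$, $p = -2$, $q = 5$; the function names `translate` and `f_vertex` are hypothetical. Part (B) uses exactly the choice $x = t - p$ made in the proof.

```python
from fractions import Fraction

def translate(point, v):
    """Translation along the vector from O to V = (p, q): (x, y) -> (x + p, y + q)."""
    (x, y), (p, q) = point, v
    return (x + p, y + q)

def f_vertex(a, p, q, t):
    """f(t) = a(t - p)^2 + q."""
    return a * (t - p) ** 2 + q

a, p, q = Fraction(3, 2), Fraction(-2), Fraction(5)   # sample constants, a != 0
grid = [Fraction(n, 3) for n in range(-9, 10)]

# (A): T carries each point (x, a x^2) of G_a to a point of G.
for x in grid:
    t, y = translate((x, a * x * x), (p, q))
    assert y == f_vertex(a, p, q, t)

# (B): each point (t, f(t)) of G equals T(P) for P = (t - p, a(t - p)^2) on G_a.
for t in grid:
    x = t - p
    assert translate((x, a * x * x), (p, q)) == (t, f_vertex(a, p, q, t))
```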
Theorem 10.5 tells us that the graph $G$ of $f(x) = a(x-p)^2 + q$ is just the translated image of the graph $G_a$ of $F_a$ under $T$.
[Figure: the graph $G = T(G_a)$, with vertex $V = (p, q)$.]
Looking at the graph $G$ of $f(x) = a(x-p)^2 + q$ from this perspective, and remembering the properties of $F_a$ and its graph $G_a$ in Theorem 10.4, we see that the following theorem about $f$ and its graph $G$ is nothing but an afterthought.
Theorem 10.6. Let a quadratic function f be in the form f ( x ) = a( x − p)2 + q,
where p and q are constants, and let G denote the graph of f . Then:
(i) G has bilateral symmetry with respect to the vertical line x = p.
(ii) If a > 0, f is decreasing on (−∞, p] and increasing on [ p, ∞). If a < 0, f is
increasing on (−∞, p] and decreasing on [ p, ∞).
(iii) If a > 0, then p is the number at which f achieves a unique minimum, and
f ( p) = q; if a < 0, then p is the number at which f achieves a unique maximum, and
again f ( p) = q.
Although Theorem 10.6 is now seen to be intuitively obvious in view of The-
orems 10.4 and 10.5, it still needs a proof. While it is possible to prove it ge-
ometrically by making use of the translation T in Theorem 10.5, it turns out to
be much simpler—once we are convinced of its truth visually—to give a direct
algebraic proof. It will be noted that it is the geometric understanding of the
situation that provides the context for the algebraic arguments. In this sense,
Theorem 10.5 is indispensable for the conceptual understanding of the function
f ( x ) = a( x − p)2 + q.
Proof. We first prove (i). Let $L$ denote the vertical line $x = p$. It is straightforward to show that the reflection $\Lambda$ across $L$ maps a point $(p+k, y)$ (for any number $k$, positive or negative, and for any number $y$) to the point $(p-k, y)$. (The following picture assumes that $k$ is positive, but if $k < 0$, then $(p+k, y)$ would be on the left while $(p-k, y)$ would be on the right.)
[Figure: the points $(p-k, y)$ and $(p+k, y)$, placed symmetrically about the line $x = p$ above the points $p-k$, $p$, $p+k$ of the $x$-axis.]

Now suppose $(p+k, y)$ is on the graph $G$ of $f(x) = a(x-p)^2 + q$. Then
$$y = f(p+k) = a\big((p+k) - p\big)^2 + q = ak^2 + q.$$
But also
$$f(p-k) = a\big((p-k) - p\big)^2 + q = ak^2 + q = y.$$
Therefore the point $(p-k, y)$ lies on $G$ as well. Since $\Lambda(p+k, y) = (p-k, y)$, we have just proved that $\Lambda(G) \subset G$ and therefore $G$ has bilateral symmetry with respect to $L$ (see bilateral symmetry on page 266).
Next, we prove (ii). First, consider the case a > 0.
We begin by proving that $f$ is increasing on $[p, \infty)$. Let $k$ and $k'$ be $\geq 0$ and let $k < k'$. Then $p + k$ and $p + k'$ are both in $[p, \infty)$ and $p + k < p + k'$. We have to prove $f(p+k) < f(p+k')$. This means we must prove:
$$a\big((p+k) - p\big)^2 + q < a\big((p+k') - p\big)^2 + q.$$
In other words, we must prove
$$ak^2 + q < a(k')^2 + q. \tag{10.20}$$
By part (iii) of Theorem 10.4 on page 241, $F_a$ is increasing on $[0, \infty)$ and therefore $ak^2 < a(k')^2$ because $0 \leq k < k'$. Therefore (10.20) follows from (B) on page 160.
Next, we prove that $f$ is decreasing on $(-\infty, p]$. So this time let $k$ and $k'$ be $\leq 0$ and let $k < k'$. Since $k \leq 0$, (B) of page 160 implies $p + k \leq p$. Similarly, $p + k' \leq p$, so both $p + k$ and $p + k'$ are in $(-\infty, p]$. Moreover, $k < k'$ implies $p + k < p + k'$. Therefore, we have to prove $f(p+k) > f(p+k')$. This means we must prove
$$a\big((p+k) - p\big)^2 + q > a\big((p+k') - p\big)^2 + q.$$
Or, what is the same thing, we must prove
$$ak^2 + q > a(k')^2 + q. \tag{10.21}$$
Again, by part (iii) of Theorem 10.4 on page 241, $F_a$ is decreasing on $(-\infty, 0]$ and so $ak^2 > a(k')^2$ because $k < k' \leq 0$. Therefore (10.21) follows from (B) on page 160.
The proof for the case of $a < 0$ is entirely similar and may be left to an exercise (see Exercise 3 on page 246). The proof of (ii) is complete.
Finally, we prove (iii). Suppose $a > 0$. As noted earlier (page 240), the fact that $f$ is decreasing on $(-\infty, p]$ and increasing on $[p, \infty)$ easily implies that $f$ achieves a unique minimum at $p$. Similarly, if $a < 0$, the fact that $f$ is increasing on $(-\infty, p]$ and decreasing on $[p, \infty)$ implies that $f$ achieves a unique maximum at $p$. The fact that $f(p) = q$ follows from the definition of $f$. The proof of Theorem 10.6 is complete.
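As with Theorem 10.4, a few exact computations can illustrate (not prove) Theorem 10.6. The sketch below, with hypothetical helper names `vertex_form` and `check_theorem_10_6`, samples the points $p \pm k$ for $k = 1/4, 2/4, \dots$ and checks symmetry, monotonicity on $[p, \infty)$ (the left side then follows from the symmetry), and the extreme value $f(p) = q$.

```python
from fractions import Fraction

def vertex_form(a, p, q):
    """Return the function x -> a(x - p)^2 + q."""
    return lambda x: a * (x - p) ** 2 + q

def check_theorem_10_6(a, p, q, n=20):
    """Spot-check Theorem 10.6 at the points p +/- k, k = 1/4, 2/4, ..., n/4."""
    f = vertex_form(a, p, q)
    ks = [Fraction(j, 4) for j in range(1, n + 1)]
    # (i) symmetry about the vertical line x = p
    assert all(f(p + k) == f(p - k) for k in ks)
    # (ii) on [p, oo): increasing if a > 0, decreasing if a < 0
    right = [f(p)] + [f(p + k) for k in ks]
    steps = list(zip(right, right[1:]))
    if a > 0:
        assert all(u < v for u, v in steps)
    else:
        assert all(u > v for u, v in steps)
    # (iii) f(p) = q, and q is the strict minimum (a > 0) or maximum (a < 0)
    # among the sampled values
    assert f(p) == q
    others = [f(p + k) for k in ks] + [f(p - k) for k in ks]
    assert all((v > q) if a > 0 else (v < q) for v in others)
    return True
```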


Plot points on the graph of $f(x) = 4(x-2)^2 + 13$ which are symmetric with respect to the line $x = 2$. Does the graph cross the $x$-axis?

Still with $f(x) = a(x-p)^2 + q$, consider now the zeros of $f$. Recall from page 223 that $x_0$ is called a zero (or a root) of $f$ if $f(x_0) = 0$. This number $x_0$ is then a root of the equation $a(x-p)^2 + q = 0$. On page 235, we saw that the equation $a(x-p)^2 + q = 0$ has two distinct roots if and only if $a$ and $q$ have opposite signs. In terms of $f$, this means likewise that $f$ has two distinct zeros if and only if $a$ and $q$ have opposite signs. Algebraically, this is a simple assertion that follows from Lemma 10.1 on page 227. However, using Theorem 10.5, we can understand this assertion geometrically as well. The zeros of $f(x) = a(x-p)^2 + q$ are the $x$-coordinates of the intersections of its graph $G$ with the $x$-axis (see the discussion on page 226). Therefore the question of whether $f$ has two distinct zeros becomes one of whether $G$ intersects the $x$-axis at two distinct points. The following three graphs show the graph of an $f$ with $a > 0$ for the three cases $q < 0$, $q = 0$, and $q > 0$, respectively.
[Figure: the graph of $f$ for $a > 0$, with vertex $(p, q)$, in the three cases $q < 0$, $q = 0$, and $q > 0$.]

If on the other hand a < 0, then we have the following three graphs:
[Figure: the graph of $f$ for $a < 0$, with vertex $(p, q)$, in the three cases $q > 0$, $q = 0$, and $q < 0$.]
From the pictures, it is clear that the graph $G$ intersects the $x$-axis at two distinct points if and only if either $a > 0$ and $q < 0$, or $a < 0$ and $q > 0$. This is exactly the same as saying that the equation $a(x-p)^2 + q = 0$ has two distinct roots if and only if $a$ and $q$ have opposite signs. The pictures also make it clear that the zeros of $f$ are equidistant from $p$, because $G$ has bilateral symmetry with respect to the vertical line $x = p$. We will also re-derive this result purely algebraically on page 249.
The pictures of G also explain why f has a double zero if and only if q = 0,
and has no zero if and only if a and q have the same sign.
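The case analysis above translates directly into a small function. This is a sketch, assuming floating-point square roots are acceptable for illustration; the name `zeros_vertex_form` is hypothetical. When $a$ and $q$ have opposite signs, $-q/a > 0$, and the zeros $p \pm \sqrt{-q/a}$ are visibly equidistant from $p$.

```python
import math

def zeros_vertex_form(a, p, q):
    """Zeros of f(x) = a(x - p)^2 + q, a != 0, in increasing order (double zero listed once)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    if q == 0:
        return [p]                       # double zero at the vertex
    if (a > 0) == (q > 0):
        return []                        # a and q have the same sign: no zero
    h = math.sqrt(-q / a)                # -q/a > 0 because a and q have opposite signs
    return [p - h, p + h]                # equidistant from p
```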

Exercises 10.2
(1) Let b < 0 and let Fb ( x ) = bx2 . Prove that Fb is increasing on the negative
x-axis and is decreasing on the positive x-axis.
(2) Let Λ be the reflection across a given line L, and let S be a geometric
figure in the plane. Prove that Λ(S ) = S if and only if Λ maps every
point of S to a point of S .
(3) Prove the case of a < 0 in part (ii) of Theorem 10.6 on page 243.
(4) A quadratic function f is given in vertex form: f ( x ) = 2( x − p)2 + q. It
is known that f (−1) = f (2) = 0. (i) If f (1) = −4, what is f (0) ? Do it
in two different ways. (ii) What are p and q?
(5) Given a quadratic function $f(x) = a(x-p)^2 + q$, suppose it is known that $f(-4) = 0$ and $f(-2) > 0$. Can $f(-3)$ be negative? Give a detailed explanation.
(6) (a) Let $G$ be the graph of $g(x) = x^2$. Let $G'$ be the set obtained by changing each point $(x, y)$ of $G$ to $(x+5, y)$. Then $G'$ is the graph of which function? (b) With $G$ as above, let $G'$ be the set obtained by changing each point $(x, y)$ of $G$ to $(x, y-2)$. Then $G'$ is the graph of which function? (c) With $G$ as above, let $G'$ be the set obtained by changing each point $(x, y)$ of $G$ to $(x-3, y+2)$. Then $G'$ is the graph of which function? (d) Let $G$ be the graph of the function $h(x) = x^3$. If $G'$ is the set obtained by changing each point $(x, y)$ of $G$ to $(x+1, y+2)$, then $G'$ is the graph of which function?

10.3. Properties of quadratic functions

In view of the reasoning on page 231, it should come as no surprise that the
reason we spent the whole of the last section on quadratic functions in vertex
form is that every quadratic function can be put in vertex form. Precisely, we
have the following pivotal theorem.
Theorem 10.7. Every quadratic function f ( x ) = ax2 + bx + c can be put in the
form f ( x ) = a( x − p)2 + q for suitably chosen constants p and q.
Proof. We give two proofs. For the first proof, we go back to (10.18) on page 238; there we saw that if we write a polynomial $a(x-p)^2 + q$ as a quadratic polynomial $ax^2 + bx + c$, then the coefficients $b$ and $c$ satisfy
$$b = -2ap \quad\text{and}\quad c = ap^2 + q.$$
This then suggests that if there are such $p$ and $q$ so that $ax^2 + bx + c = a(x-p)^2 + q$, then we can solve for $p$ and $q$ from the preceding pair of equations to get $p = -\frac{b}{2a}$ and
$$q = c - ap^2 = c - a\left(\frac{-b}{2a}\right)^2 = \frac{4ac - b^2}{4a}.$$

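We can now begin the proof proper. Given $f(x) = ax^2 + bx + c$, let
$$p = -\frac{b}{2a} \quad\text{and}\quad q = \frac{4ac - b^2}{4a}.$$
Then for all $x$:
$$\begin{aligned}
a(x-p)^2 + q &= a\left(x + \frac{b}{2a}\right)^2 + \frac{4ac - b^2}{4a} \\
&= a\left(x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}\right) + \frac{4ac - b^2}{4a} \\
&= ax^2 + bx + \frac{b^2}{4a} + \frac{4ac - b^2}{4a} \\
&= ax^2 + bx + c.
\end{aligned}$$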
It follows that with these values for p and q, the given function f ( x ) = ax2 + bx +
c is also equal to f ( x ) = a( x − p)2 + q. The first proof is complete.
A second proof is to retrace the steps of completing the square on page 231. Starting with a general quadratic function $f(x) = ax^2 + bx + c$, we let $B$ in (10.5) be $\frac{b}{a}$ to get
$$f(x) = a\left(x^2 + \frac{b}{a}x\right) + c = a\left(x^2 + \frac{b}{a}x + \left(\frac{b}{2a}\right)^2\right) - a\left(\frac{b}{2a}\right)^2 + c.$$
Since
$$-a\left(\frac{b}{2a}\right)^2 + c = \frac{-b^2}{4a} + c = \frac{-b^2 + 4ac}{4a},$$
we have:
$$f(x) = a\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a}.$$
Therefore the theorem would be proved if we let
$$p = -\frac{b}{2a} \quad\text{and}\quad q = \frac{4ac - b^2}{4a}. \tag{10.22}$$
We have completed the proof of Theorem 10.7.
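Formula (10.22) gives a direct recipe for converting $ax^2 + bx + c$ to vertex form. The following Python sketch (the name `to_vertex_form` is hypothetical) uses exact fractions and then re-checks the identity $a(x-p)^2 + q = ax^2 + bx + c$ at several integers, mirroring numerically what the first proof does symbolically. The sample function $3x^2 + 12x + 5$ is an arbitrary choice.

```python
from fractions import Fraction

def to_vertex_form(a, b, c):
    """Return (a, p, q) with ax^2 + bx + c = a(x - p)^2 + q, using (10.22)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic function")
    p = -b / (2 * a)
    q = (4 * a * c - b * b) / (4 * a)
    return a, p, q

# Check the identity at a few points for the sample function 3x^2 + 12x + 5.
a, p, q = to_vertex_form(3, 12, 5)
for x in range(-5, 6):
    assert a * (x - p) ** 2 + q == 3 * x**2 + 12 * x + 5
```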
The point ( p, q) in Theorem 10.7 is, not surprisingly, called the vertex of the
graph. Note that the vertex being on the graph of f means f ( p) = q, which is
obvious anyway from the expression f ( x ) = a( x − p)2 + q.


Put f ( x ) = 2x2 + x − 2 in vertex form. Does it have any zeros? Where does
it achieve its minimum, and what is its minimum value?

Once we have Theorem 10.7, the theorems in the last section about quadratic functions in vertex form become theorems about all quadratic functions. The following theorem then summarizes our findings about general quadratic functions. (Recall that the number $b^2 - 4ac$ is called the discriminant of $ax^2 + bx + c$; see page 232.)

Theorem 10.8. For any quadratic function $f(x) = ax^2 + bx + c$, let $p$ and $q$ be defined as in (10.22). Then:
(i) The graph of $f$ has bilateral symmetry with respect to the vertical line defined by $x = p$. If $a > 0$, $f$ is decreasing on $(-\infty, p]$ and increasing on $[p, \infty)$; if however $a < 0$, then $f$ is increasing on $(-\infty, p]$ and decreasing on $[p, \infty)$.
(ii) If $a > 0$, then $f$ achieves its minimum at $p$. If $a < 0$, then $f$ achieves its maximum at $p$. In both cases, $f(p) = q$.
(iii) The function $f$ has two zeros, a double zero, or no zero according to whether its discriminant $b^2 - 4ac$ is positive, zero, or negative, respectively.

Proof. Parts (i) and (ii) are immediate consequences of Theorem 10.7 and Theorem 10.6 on page 243.
For part (iii), we know from Theorem 10.7 that the function $f$ can be written as $f(x) = a(x-p)^2 + q$, where $p$ and $q$ are as in (10.22). Therefore,
$$f(x) = a\left((x-p)^2 - M\right), \quad\text{where}\quad M = \frac{b^2 - 4ac}{4a^2}.$$
Since $a \neq 0$, $x_0$ is a zero of $f$ if and only if it is a solution of the quadratic equation $(x-p)^2 - M = 0$. Suppose the discriminant $b^2 - 4ac$ is negative. Since $4a^2 > 0$, we have that $M < 0$ as well and therefore $-M > 0$. Hence $(x-p)^2 - M > 0$ because $(x-p)^2 \geq 0$ no matter what $x$ is. This implies that the equation $(x-p)^2 - M = 0$ can have no solution. If however the discriminant is $\geq 0$, we know from the quadratic formula (Theorem 10.2 on page 232) that the equation $ax^2 + bx + c = 0$ has a double root (when the discriminant is 0) or two distinct roots (when the discriminant is positive), and therefore the same holds for the zeros of $f(x) = ax^2 + bx + c$. In fact, the quadratic formula gives the two zeros (including the double zero) as
$$r_1, r_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \tag{10.23}$$
See equation (10.10) on page 232. The proof of Theorem 10.8 is complete.
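Part (iii) of Theorem 10.8 and formula (10.23) can be sketched as follows; `zeros_standard_form` is a hypothetical name, and floating-point `math.sqrt` is assumed to be accurate enough for illustration. The zeros are returned in increasing order, with a double zero listed once.

```python
import math

def zeros_standard_form(a, b, c):
    """Zeros of ax^2 + bx + c (a != 0), following Theorem 10.8(iii) and (10.23)."""
    if a == 0:
        raise ValueError("a must be nonzero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                               # negative discriminant: no zero
    if disc == 0:
        return [-b / (2 * a)]                   # zero discriminant: double zero
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])
```

For instance, the midpoint of the two zeros of $2x^2 - 5x - 3$ computed this way is $5/4 = -b/(2a)$, in line with the symmetry of the graph about $x = p$.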


Use mental math to decide whether each of the following quadratic func-
tions has two distinct zeros, only one zero, or no zero: (i) f ( x ) = 215x2 −
87x + 21. (ii) f (s) = 5s2 + 223 s + 7. (iii) g ( x ) = −83x + 5.2x − 76 . (iv)
2 9

h(s) = 12 s2 − 15
7 s + 1.5. (v) h ( x ) = 3.2x − 9.5x + 22.

In view of (10.22), the vertex $(p, q)$ of the graph of $f$ is the point
$$\left(\frac{-b}{2a},\ \frac{4ac - b^2}{4a}\right). \tag{10.24}$$
In particular, the vertex lies on the line of symmetry of the graph of $f$ given in part (i) of Theorem 10.8, which is the vertical line $L$ defined by $x = p = -\frac{b}{2a}$. When $b^2 - 4ac \geq 0$, the zeros of $f$ are, according to (10.23),
$$\frac{-b}{2a} + \frac{\sqrt{b^2 - 4ac}}{2a} \quad\text{and}\quad \frac{-b}{2a} - \frac{\sqrt{b^2 - 4ac}}{2a},$$
so we see that the zeros (being the points of intersection of the graph of $f$ with the $x$-axis) are equidistant from the line of symmetry $L$, as demanded by part (i) of Theorem 10.8. The vertex is either the highest point on the graph of $f(x) = ax^2 + bx + c$ (in the case that $a < 0$) or the lowest point on the graph of $f$ (in the case that $a > 0$).
It is common to refer to the graph of a quadratic function in the case of a > 0
as an up parabola, and in the case of a < 0 as a down parabola.


[Figure: an up parabola ($a > 0$) and a down parabola ($a < 0$).]
This terminology is very slippery because it assumes that the definition of a parabola is known and that the reason why the graph of a quadratic function
is a parabola is also known. The truth is the opposite: TSM has not defined what
a parabola is up to this point, and no reason is ever given as to why the graph of a
quadratic function is a parabola. However, we will justify this terminology in the
next section, Section 10.4, by first defining a parabola precisely, and then proving
that the graph of a quadratic function is a parabola (Theorem 10.11 on page 254).
Example. Let f ( x ) = ax2 + bx + c be a quadratic function with a < 0. Sup-
pose its graph G contains the two points P = (−1, 2) and Q = (4, 2). Where
might f achieve its maximum? Does f have zeros?
The graph $G$ has bilateral symmetry with respect to a vertical line $L$ (part (i) of Theorem 10.8). Can $P$ and $Q$ be on the same side of $L$? No, for the following reason. Let $L$ intersect the $x$-axis at $p$. Then because $a < 0$, $f$ is increasing on $(-\infty, p]$ and decreasing on $[p, \infty)$ (by part (i) of Theorem 10.8). Therefore if $P$ and $Q$ are on the left side of $L$, the fact that $f$ is increasing on $(-\infty, p]$ would mean $f(-1) < f(4)$. This contradicts $f(-1) = f(4) = 2$. Similarly, $P$ and $Q$ cannot be on the right side of $L$. Thus $P$ and $Q$ are necessarily on opposite sides of $L$, as shown below.
[Figure: $P$ to the left and $Q$ to the right of the vertical line $L$ through $(p, 0)$, together with the reflected point $\Lambda(Q)$ drawn distinct from $P$.]
Where could $P$ be? Let $\Lambda$ be the reflection with respect to $L$. We claim that $\Lambda(Q) = P$. We will prove this by contradiction. Suppose the point $Q' = \Lambda(Q)$ is not equal to $P$, as shown in the picture. Since $L$ is the line of symmetry of $G$, we see that $Q'$ is also a point of $G$. Since $Q$ lies to the right of $L$, its reflection $Q'$ lies to the left of $L$, so let $Q' = (x', 2)$ for some $x' < p$, $x' \neq -1$; the $y$-coordinate of $Q'$ is 2 because $Q' = \Lambda(Q) = \Lambda(4, 2)$, and $\Lambda(4, 2) = (m, 2)$ for some number $m$ by the definition of $\Lambda$. Now $Q'$ and $P$ are on the same side of $L$ and are both on $G$; because $f$ is increasing on $(-\infty, p]$, we have $f(x') < f(-1)$ or $f(-1) < f(x')$. But this contradicts $f(x') = f(-1) = 2$. Therefore $\Lambda(Q) = P$, which implies that the intersection $(p, 0)$ of $L$ with the $x$-axis must be $(1.5, 0)$, because $(1.5, 0)$ is the midpoint between $(-1, 0)$ and $(4, 0)$.
[Figure: $P$ and $Q$ symmetric with respect to the line $L$, which meets the $x$-axis at 1.5.]
Since the vertex of G must lie on the line of symmetry L, we see that f must
achieve its maximum at 1.5. Moreover, because f is increasing on (−∞, 1.5], we
see that f (1.5) > f (−1) = 2 > 0. Thus f has a positive maximum and therefore,
if we express f in vertex form, f ( x ) = a( x − 1.5)2 + q, then q = f (1.5) > 0 and
q and a have opposite signs. By the observation on page 235, f has two distinct
zeros. We are done.
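The conclusion of the Example can be spot-checked on concrete functions. Assuming, for illustration only, that $f(x) = a(x+1)(x-4) + 2$ for a few hypothetical negative values of $a$, each such $f$ satisfies $f(-1) = f(4) = 2$, and the sketch below confirms via (10.22) that the vertex is at $x = 1.5$, that the maximum value exceeds 2, and that the discriminant is positive.

```python
from fractions import Fraction

# Hypothetical sample values of a < 0; for each, f(x) = a(x + 1)(x - 4) + 2
# is one quadratic function with a < 0 whose graph contains (-1, 2) and (4, 2).
for a in (Fraction(-1), Fraction(-1, 3), Fraction(-7, 2)):
    b, c = -3 * a, 2 - 4 * a                   # expand a(x^2 - 3x - 4) + 2
    f = lambda x: a * x * x + b * x + c
    assert f(-1) == 2 and f(4) == 2            # P and Q lie on the graph
    p = -b / (2 * a)                           # vertex x-coordinate, (10.22)
    q = (4 * a * c - b * b) / (4 * a)          # maximum value, (10.22)
    assert p == Fraction(3, 2)                 # maximum at 1.5
    assert q > 2                               # f(1.5) > f(-1) = 2
    assert b * b - 4 * a * c > 0               # two distinct zeros
```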
Let us conclude with a few additional remarks. First, part (ii) of Theorem
10.8 can be made more precise, and the added precision will be important for
applications of quadratic functions to word problems. In (10.22), we have the ex-
plicit values of p and q given in terms of the coefficients a, b, and c of f . Therefore
part (ii) implies the following corollary.
Corollary. Given a quadratic function $f(x) = ax^2 + bx + c$. Then:
(i) If $a > 0$, $f$ achieves a unique minimum at $-\frac{b}{2a}$, and the minimum value $f\left(-\frac{b}{2a}\right)$ is $\frac{4ac - b^2}{4a}$.
(ii) If $a < 0$, $f$ achieves a unique maximum at $-\frac{b}{2a}$, and the maximum value $f\left(-\frac{b}{2a}\right)$ is $\frac{4ac - b^2}{4a}$.
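The Corollary is what one actually computes when applying quadratic functions to word problems. Here is a minimal sketch with the hypothetical name `extremum` and exact fractions, followed by a brute-force comparison on a grid of sample points for the arbitrarily chosen function $-x^2 + 4x + 1$:

```python
from fractions import Fraction

def extremum(a, b, c):
    """Return (kind, x_star, value) for f(x) = ax^2 + bx + c, following the Corollary."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic function")
    kind = "minimum" if a > 0 else "maximum"
    x_star = -b / (2 * a)
    value = (4 * a * c - b * b) / (4 * a)
    return kind, x_star, value

# Usage: f(x) = -x^2 + 4x + 1, compared against a grid of sample points.
kind, x_star, value = extremum(-1, 4, 1)
f = lambda x: -x * x + 4 * x + 1
assert f(x_star) == value
assert all(f(Fraction(n, 10)) <= value for n in range(-50, 51))
```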

Next, we should reformulate Theorem 10.3 on page 234 in terms of quadratic functions, because it is actually about functions.

Theorem 10.9. Let $f(x) = ax^2 + bx + c$ be a quadratic function so that its discriminant $b^2 - 4ac$ is nonnegative, and let $r_1$ and $r_2$ be the zeros of $f$ ($r_1$ could be equal to $r_2$). Then
$$r_1 + r_2 = -\frac{b}{a} \quad\text{and}\quad r_1 r_2 = \frac{c}{a}, \tag{10.25}$$
and
$$f(x) = a(x - r_1)(x - r_2) \quad\text{for all } x. \tag{10.26}$$
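Theorem 10.9 can be checked numerically for a particular function. In this sketch (the name `vieta_check` is hypothetical), the zeros come from (10.23), and the tolerances passed to `math.isclose` absorb floating-point rounding.

```python
import math

def vieta_check(a, b, c):
    """Spot-check (10.25) and (10.26) for ax^2 + bx + c with b^2 - 4ac >= 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("Theorem 10.9 assumes a nonnegative discriminant")
    root = math.sqrt(disc)
    r1, r2 = (-b + root) / (2 * a), (-b - root) / (2 * a)        # zeros from (10.23)
    assert math.isclose(r1 + r2, -b / a, abs_tol=1e-9)            # (10.25), sum
    assert math.isclose(r1 * r2, c / a, abs_tol=1e-9)             # (10.25), product
    for x in range(-5, 6):                                        # (10.26) at sample x
        assert math.isclose(a * (x - r1) * (x - r2), a * x * x + b * x + c,
                            abs_tol=1e-9)
    return r1, r2
```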

Once again, we point out that, by (10.26), all quadratic functions with the
same two zeros r1 and r2 are equal to a constant times ( x − r1 )( x − r2 ).
Theorem 10.9 needs the assumption of a nonnegative discriminant because if
the discriminant is negative, the function f has no zeros and it would not make
sense to talk about r1 and r2 . However, once we have complex numbers, there
will always be zeros for complex quadratic polynomial functions f and Theorem
10.9 will be true verbatim without any assumption about the discriminant. See,
e.g., Chapter 11 in Volume II of [Wu-HighSchool].
Finally, let us observe that we have by now obtained at least three different
representations of a quadratic function:
(1) Its standard representation: f ( x ) = ax2 + bx + c.
(2) Its representation in vertex form: f ( x ) = a( x − p)2 + q.
(3) Its representation in factored form: f ( x ) = a( x − r1 )( x − r2 ), where r1 and
r2 are the zeros of f .
Each is important in its own right: the quadratic formula is expressed in terms of (1), for example, and (2) displays the line of symmetry of the graph of $f$ and also where it achieves its maximum or minimum. If the zeros of $f$ are our main interest, then (3) displays its zeros explicitly. Together, the three representations give a well-rounded picture of $f$; none gets it done alone.
Each of the three representations of a quadratic function reveals a different facet of the function.
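The logical relationships among the three representations can be made concrete: going from (2) or (3) back to (1) only requires the coefficient relations $b = -2ap$, $c = ap^2 + q$ of (10.18) and $b = -a(r_1 + r_2)$, $c = a r_1 r_2$ of (10.25). The function names below are hypothetical, and the example $2x^2 - 6x + 4$ is an arbitrary choice; this is a sketch, not a general library.

```python
from fractions import Fraction

def standard_from_vertex(a, p, q):
    """a(x - p)^2 + q  ->  coefficients (a, b, c), using b = -2ap and c = ap^2 + q."""
    return a, -2 * a * p, a * p * p + q

def standard_from_factored(a, r1, r2):
    """a(x - r1)(x - r2)  ->  coefficients (a, b, c), using b = -a(r1 + r2) and c = a r1 r2."""
    return a, -a * (r1 + r2), a * r1 * r2

# Example: 2x^2 - 6x + 4 in all three representations.
standard = (2, -6, 4)
vertex = standard_from_vertex(2, Fraction(3, 2), Fraction(-1, 2))
factored = standard_from_factored(2, 1, 2)
assert vertex == standard and factored == standard
```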
Needless to say, the more representations we have of a concept, the more we
can claim to understand it. However, these representations in mathematics are not
randomly put together. There is always a clear logical relationship between them.
For example, in our situation, (1) was the starting point, i.e., the definition of a
quadratic function, and (2) and (3) were obtained only after serious hard work:
see Theorem 10.7 on page 246 and Theorem 10.3 on page 234, respectively.
Recently, it has become acceptable practice to make amassing different rep-
resentations of a concept a goal in itself, with no imperative to show any logical
interrelationship among them. For example, the concept of a fraction is supposed
to be a part-of-a-whole, a division, a ratio, etc. (see Section 1.1 in [Wu-PreAlg]),
and it is never clear which is the starting point and how the other representations
are related to the starting point, logically speaking. When this happens, it is TSM
and not mathematics. You may wish to stay alert to this fact.

Exercises 10.3

(1) Let $G$ be the graph of the quadratic function $f(x) = ax^2 + bx + c$. Prove that there is a translation $T$ so that $G$ is the translated image under $T$ of the graph $G_a$ of $F_a(x) = ax^2$.
(2) In the table on page 226, it was observed that the values of y in the
second and fourth columns are identical. Can you explain why?
(3) Write each of the following quadratic functions in the form a( x − p)2 +
q for suitably chosen numbers a, p, and q by completing the square:
(i) f ( x ) = x2 − 8x + 7. (ii) g( x ) = −2x2 + 6x − 21. (iii) h( x ) = 3x2 +
4x + 6. (iv) f ( x ) = − 23 x2 + x − 1. (v) g( x ) = 5x2 − 25 + 2. (vi) h( x ) =
√ 2 √ √
2x − 6x + 5. (vii) f ( x ) = −3x2 + 5x − 1.
(4) Find the maximum or the minimum of each of the following quadratic
functions: (i) f ( x ) = 2x2 + 3x + 4. (ii) g( x ) = 34 x2 − 2x + 85 . (iii) h( x ) =
−6x2 + x + 53 . (iv) f ( x ) = 3x2 − 2x + 83 .
(5) Sketch the graph of each of the functions in the preceding exercise; indi-
cate whether it is an up parabola or a down parabola, estimate the zeros
of the function if there are any, and locate the vertex.
(6) Suppose we have two quadratic functions $f(x) = ax^2 + bx + c$ and $g(x) = a'x^2 + b'x + c'$, and suppose $f(x) = g(x)$ for all $x$. Prove that $a = a'$, $b = b'$, and $c = c'$.
(7) (i) In the quadratic function f ( x ) = 3x2 − ux + 2, u is a number. For
what values of u would f have two zeros? One zero? No zero? (ii) In
the quadratic function g( x ) = 3x2 + x + 2u, u is a number. For what
values of u would g have two zeros? One zero? No zero?
(8) Among all rectangles with a perimeter of P meters, which has the great-
est area? Prove that your answer is correct.
(9) A line passing through the points (t, 2) and (3, t) has slope 2t. What is t?
(10) If a quadratic function f has two zeros at 1 and 1.7, where does it achieve
its maximum or minimum? If also f (1.2) = 0.5, describe f completely.

10.4. The graph and the parabola

Until one knows what a parabola is, it makes no sense to say that “the graph of a quadratic function is a parabola”.
We will now fulfill the promise made in the preamble of this chapter (page 224) by clarifying the relationship between graphs of quadratic functions and parabolas. Let us first introduce a few definitions. A parabola $G$ is the set of all the points equidistant from a fixed point $A$ and a fixed line $L$ (for the concept of distance from a line, see page 54). The point $A$ is called the focus of the parabola, and the line $L$ its directrix. Thus if $P$ is a point on $G$, then $|PA| = |PC|$, where $C$ is the point of intersection of $L$ with the perpendicular from $P$ to $L$, as shown below.
[Figure: a point $P$ of the parabola $G$, with $|PA| = |PC|$, where $C$ is the foot of the perpendicular from $P$ to $L$.]



Let the line passing through the focus A and perpendicular to the directrix L
intersect G at a point O and L at a point B. O is called the vertex of the parabola
G. It will be seen from Theorem 10.11 below that this use of vertex does not conflict
with the use of the same word on page 247. Also note that from the definition of
the parabola, we have | AB| = 2 | AO|. The length of the segment AO is called the
focal length of the parabola G.
Before proceeding any further, it would be a good idea to first acquire some
intuition for parabolas. We will describe a simple way to construct points on a
parabola with a given focus and a given directrix, in much the same way that we
plot points on the graph of a given function. Thus draw a point A (the focus)
and a line L (the directrix), and we will describe how to draw as many points
equidistant from A and L as we wish.

[Figure: the focus $A$, the directrix $L$, and a point $P$ at distance $d$ from both.]
Let $B$ be the point of intersection of $L$ with the line passing through $A$ and perpendicular to $L$. The midpoint $O$ of $AB$ is of course equidistant from $A$ and $L$. For other such points, draw a line $L'$ parallel to $L$ and lying in the same half-plane of $L$ as the focus $A$. The distance $d$ of $L'$ from $L$ (see distance between parallel lines on page 267) should be so large that $d > |OB|$. Now draw a circle of radius $d$ centered at $A$. Let the circle intersect $L'$ at $P$ and $P'$. Then $P$ and $P'$ are both equidistant from $L$ and $A$. Draw many such pairs of $P$ and $P'$, and the totality of these points (together with $O$) is the parabola $G$ with focus $A$ and directrix $L$. A moment of reflection will show that the line $L_{AB}$ is a line of bilateral symmetry of $G$.
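The compass-and-ruler construction above can be imitated in coordinates. This sketch assumes, for convenience, that the directrix $L$ is the $x$-axis and the focus is $A = (0, s)$ with $s > 0$, so that $B = (0, 0)$, $|OB| = s/2$, and $L'$ is the line $y = d$; the names `construct_pair` and `dist` are hypothetical. Intersecting the circle $x^2 + (y - s)^2 = d^2$ with $y = d$ gives $x = \pm\sqrt{d^2 - (d - s)^2}$, which yields two points exactly when $d > s/2$, that is, $d > |OB|$.

```python
import math

def construct_pair(s, d):
    """Circle construction with focus A = (0, s), s > 0, and directrix y = 0.

    L' is the line y = d; the circle of radius d centered at A meets L'
    at P and P' (requires d > |OB| = s/2).
    """
    if not d > s / 2:
        raise ValueError("need d > |OB| = s/2")
    half_width = math.sqrt(d * d - (d - s) ** 2)   # solve x^2 + (d - s)^2 = d^2
    return (-half_width, d), (half_width, d)

def dist(P, Q):
    """Euclidean distance between points P and Q."""
    return math.hypot(P[0] - Q[0], P[1] - Q[1])

s = 2.0
A = (0.0, s)
for d in (1.5, 2.0, 3.0, 10.0):
    for P in construct_pair(s, d):
        distance_to_directrix = abs(P[1])          # directrix is the line y = 0
        assert math.isclose(dist(P, A), distance_to_directrix)
```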


Construct three points on the parabola $G$ with focus $(0, 1)$ and the line defined by $y = -1$ as its directrix. What is the $x$-coordinate of the point $P$ on $G$ if the distance of $P$ from $(0, 1)$ is 3?

The theorems we are going to prove are the following.

Theorem 10.10. (i) Any geometric figure similar to a parabola is a parabola. (ii)
Any two parabolas are similar.
Theorem 10.11. The graph of every quadratic function is a parabola.
An immediate corollary of these two theorems is the following.
Corollary. (i) Any geometric figure congruent to the graph of a quadratic function is a
parabola. (ii) All the graphs of quadratic functions are similar to each other. In particular,
the graph of every quadratic function is similar to the graph of f ( x ) = x2 .
Part (i) of the Corollary is implied by Theorem 10.11 and Theorem 10.10 (i), because if two geometric figures are congruent, then they are also similar (see similar figures on page 268). Part (ii) follows from Theorem 10.11 and Theorem 10.10 (ii).
Without a precise definition of similarity, it does not make sense to say “any two parabolas are similar”.
It is quite clear that these theorems are not part of the standard algebra curriculum, whether it be Algebra I or Algebra II or any integrated curriculum. So why spend time on them here? There are three reasons. The first is that part (ii) of the Corollary is a surprising result, and even mentioning something like this in an algebra class could be inspiring or intriguing to students. A second reason is that the proof of Theorem 10.10 shows why it is essential to know a precise definition of similarity (see page 268). In this case, we have to prove that any
two parabolas are similar. Since parabolas offer no line segments to measure and
no angles to compare, we are forced to use the definition of similarity in terms
of dilation and congruence for the proof. The final reason is that, in TSM, either
Theorem 10.11 is stated without any definition of what a parabola is or, worse, it is
offered as the definition of a parabola, i.e., a parabola is by definition the graph of
a quadratic function. One consequence of the latter is that, according to TSM, the
graph of the equation in two variables, $x = y^2$, is not a parabola.⁸ Of course it is
(a parabola), because it can be viewed as the 90-degree clockwise rotated image
(around the origin) of the graph of y = x2 . It is simply not acceptable that TSM
be allowed to mislead students and teachers to this extent, and we wish to set the
record straight.
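The claim that the graph of $x = y^2$ is a parabola can be checked numerically. This sketch takes as known the standard fact that the graph of $y = x^2$ is the parabola with focus $(0, 1/4)$ and directrix $y = -1/4$; the 90-degree clockwise rotation about the origin, $(x, y) \mapsto (y, -x)$, then carries this focus and directrix to $(1/4, 0)$ and the line $x = -1/4$. The function name `rotate_clockwise_90` is hypothetical.

```python
import math

def rotate_clockwise_90(point):
    """Rotation by 90 degrees clockwise about the origin: (x, y) -> (y, -x)."""
    x, y = point
    return (y, -x)

focus = (0.25, 0.0)          # image of the focus (0, 1/4) of y = x^2
directrix_x = -0.25          # image of the directrix y = -1/4 is the line x = -1/4
for n in range(-20, 21):
    t = n / 4
    X, Y = rotate_clockwise_90((t, t * t))   # image of the point (t, t^2) of y = x^2
    assert math.isclose(X, Y * Y)            # the image lies on x = y^2
    d_focus = math.hypot(X - focus[0], Y - focus[1])
    d_directrix = abs(X - directrix_x)
    assert math.isclose(d_focus, d_directrix)
```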
For the proof of Theorem 10.10, we recall some basic facts about similarity from Section 4.7 of [Wu-PreAlg]. Given two geometric figures $S$ and $S'$, we say $S$ is similar to $S'$, in symbols $S \sim S'$, if there is a dilation $D$ (see page 267 for the definition of dilation) so that $D(S)$ is congruent to $S'$. More explicitly, $S \sim S'$ means there is a dilation $D$ and a congruence $\varphi$ (the lower case Greek letter phi) so that
$$\varphi(D(S)) = S'. \tag{10.27}$$
⁸ The author was inspired to write down a proof of the Corollary after he learned from a friend at Harvard that one of his daughter’s teachers told her “the graph of $x = y^2$ is not a parabola”. The culprit is not the teacher; it is TSM.

The composition $\varphi \circ D$ is called a similarity. The scale factor $r$ of the dilation $D$ is then also called the scale factor of the similarity $\varphi \circ D$, and the significance of $r$ is that the similarity $\varphi \circ D$ changes distance by this fixed factor of $r$, i.e., if the distance between $P$ and $Q$ is $d$, and the similarity maps $P$ to $P'$ and $Q$ to $Q'$, then the distance between $P'$ and $Q'$ is $rd$.
In order to prove Theorem 10.10, we will have to prove the similarity of geo-
metric figures without appealing to the usual AA or SAS criteria for similarity (see
pages 269 and 270), because there are no angles or sides in a parabola. We will
have to argue directly by using the definition of similarity in terms of dilations
and congruences.
Proof of Theorem 10.10. We will first prove part (i). The proof depends on the
following two lemmas:

Lemma 10.12. A geometric figure congruent to a parabola is itself a parabola.
Lemma 10.13. If $G$ is a parabola and $D$ is a dilation, then the image $D(G)$ is a parabola.

We first prove Lemma 10.12. Let $G$ be a parabola and let $\varphi$ be a congruence; we will prove that the image $\varphi(G)$ is a parabola. Let the focus and directrix of $G$ be $A$ and $L$ as usual. We claim that $G' = \varphi(G)$ is the parabola with $A' = \varphi(A)$ as focus and $L' = \varphi(L)$ as directrix. To this end, take a point $P'$ on $G'$; we first show that $P'$ is equidistant from $A'$ and $L'$. Since $P'$ lies on $G'$, there is a point $P$ on $G$ so that $P' = \varphi(P)$. But $P$ being on $G$ implies, by the definition of $G$, that $P$ is equidistant from $A$ and $L$. Since a congruence preserves distance, $P'$ is also equidistant from $A'$ and $L'$.
We are not finished, however, because for $G'$ to be the parabola with focus $A'$ and directrix $L'$, we must show that $G'$ contains all the points equidistant from $A'$ and $L'$. Therefore it remains to show that if $P'$ is a point equidistant from $A'$ and $L'$, then $P'$ lies in $G'$. Now let $P$ be the point in the plane so that $\varphi(P) = P'$. Because $\varphi$ is a congruence and is therefore distance-preserving, we see that $P$ is equidistant from $A$ (because $\varphi(A) = A'$) and $L$ (because $\varphi(L) = L'$). Since $G$ contains all the points equidistant from $A$ and $L$, $P$ belongs to the parabola $G$. Therefore $\varphi(P)$ belongs to $\varphi(G)$, and therefore $P'$ $(= \varphi(P))$ lies in $G'$ $(= \varphi(G))$ after all. The proof of Lemma 10.12 is complete.
Next we prove Lemma 10.13; the proof is similar to the preceding one. Thus let D
be a dilation with scale factor r and let G be the given parabola with focus and
directrix A and L, respectively. Also let A′ = D(A) and L′ = D(L). We claim that
the image G′ = D(G) is the parabola with focus A′ and directrix L′. Take a point
P′ on G′; we will show that P′ is equidistant from A′ and L′. Let P be the point
on G so that D(P) = P′.

Then the distance of P from A is equal to the distance of P from L; call this
common distance d. By Theorem 4.4 in [Wu-PreAlg] (see page 271 in this volume),
the distance of P′ to either A′ or L′ is rd. In particular, P′ is equidistant from A′
and L′, as claimed. It remains to prove that if a point P′ is equidistant from A′ and
L′, then P′ lies in G′ = D(G). Let P be the point in the plane so that D(P) = P′.
Then we know (again by Theorem 4.4 in [Wu-PreAlg]) that |P′A′| = r|PA|, and
the distance from P′ to L′ is also r times the distance from P to L. Because P′
is equidistant from A′ and L′, we conclude that P is also equidistant from A
and L. Since G is the parabola with A as focus and L as directrix, P lies in G,
and therefore P′ (= D(P)) lies in G′ (= D(G)). The proof of Lemma 10.13 is
complete.
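The key fact used above, that a dilation with scale factor r multiplies both the distance to the focus and the distance to the directrix by r, can be checked numerically. In this Python sketch the parabola y = x²/4 (focus A = (0, 1), directrix L = {y = −1}), the center (2, 5), and the scale factor r = 3 are arbitrary assumptions chosen for illustration.

```python
import math

def dilate(p, center, r):
    """Dilation with the given center and scale factor r > 0."""
    return (center[0] + r * (p[0] - center[0]),
            center[1] + r * (p[1] - center[1]))

def dist_point_line(p, q1, q2):
    """Distance from point p to the line through the distinct points q1 and q2."""
    (x0, y0), (x1, y1), (x2, y2) = p, q1, q2
    cross = (x2 - x1) * (y0 - y1) - (y2 - y1) * (x0 - x1)
    return abs(cross) / math.hypot(x2 - x1, y2 - y1)

center, r = (2.0, 5.0), 3.0                      # hypothetical dilation D
A = (0.0, 1.0)                                   # focus of G: y = x^2 / 4
L = ((0.0, -1.0), (1.0, -1.0))                   # two points on the directrix
A_img = dilate(A, center, r)                     # A' = D(A)
L_img = tuple(dilate(q, center, r) for q in L)   # L' = D(L), the line through these

def distances(x):
    """For P = (x, x^2/4) on G, return (d, |P'A'|, dist(P', L')) where P' = D(P)."""
    P = (x, x * x / 4)
    P_img = dilate(P, center, r)
    d = math.dist(P, A)                          # common distance of P from A and from L
    return d, math.dist(P_img, A_img), dist_point_line(P_img, *L_img)

for x in [-4, -1, 0, 0.5, 2, 6]:
    d, to_focus, to_line = distances(x)
    print(f"x = {x:>4}: r*d = {r * d:.6f}, |P'A'| = {to_focus:.6f}, dist(P', L') = {to_line:.6f}")
```

In each printed row the three numbers agree, which is the content of the first half of the proof for this particular D.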
The proof of part (i) of Theorem 10.10 is now immediate: Suppose a parabola
G is similar to G′; then we have to prove that G′ is a parabola. Let ϕ(D(G)) = G′,
where ϕ is a congruence and D is a dilation. By Lemma 10.13, D(G) is a parabola.
By Lemma 10.12, ϕ(D(G)) is also a parabola. In other words, G′ = ϕ(D(G)) is
a parabola. The proof of part (i) of Theorem 10.10 is complete.

Remark. The preceding proof of part (i) of Theorem 10.10 implicitly assumes
the so-called symmetry of the similarity relation, i.e., if G ∼ G′, then also G′ ∼ G
(see Section 4.7 of [Wu-PreAlg] for this discussion). In greater detail, we have
just shown that if a parabola G is similar to G′, then G′ is a parabola. However,
we should have also proved that if G′ is similar to a parabola G, then G′ is a
parabola. The reason we did not address this issue is that we have been assuming
all along that this symmetry is valid and have avoided the proof of this fact because
the proof is unpleasant as well as noninstructive; such a proof can be found in
Section 5.4 in Volume I of [Wu-HighSchool].

Proof of Theorem 10.10. (cont.) The proof of part (ii) of the theorem makes use
of the following lemma, which is of independent interest. Recall that on page 253,
we defined the concept of the focal length of a parabola, which is the distance from
the focus to the vertex of the parabola.

Lemma 10.14. Two parabolas with the same focal length are congruent.

To prove the lemma, let G be the parabola with focus A and directrix L, and
let G′ be the parabola with focus A′ and directrix L′. By hypothesis, G and G′
have the same focal length. Let the line passing through A (respectively, A′) and
perpendicular to L (resp., L′) intersect L at B (resp., L′ at B′).

Since |AB| is twice the focal length of G and |A′B′| is twice the focal length of G′,
we see that |AB| = |A′B′|. Thus there is a congruence ϕ so that ϕ(A) = A′ and
ϕ(B) = B′. Since a congruence preserves degrees of angles, the fact that L ⊥ L_AB
implies that ϕ(L) ⊥ ϕ(L_AB). But ϕ(L_AB) = L_A′B′; therefore ϕ(L) ⊥ L_A′B′. Since
also L′ ⊥ L_A′B′, we see that ϕ(L) and L′ are two lines that are both perpendicular
to L_A′B′ at the point B′. Therefore ϕ(L) = L′.
We claim:

ϕ(G) = G′.

To prove the claim, we first prove that if P′ is on ϕ(G), then P′ is on G′.

Now, P′ being on ϕ(G) means there is a point P on G so that P′ = ϕ(P). Since
P is equidistant from A and L and ϕ is a congruence, ϕ(P) is
equidistant from ϕ(A) and ϕ(L), i.e., P′ is equidistant from A′ and L′ and, hence,
P′ is on G′. Conversely, suppose P′ is on G′; then we have to prove that P′ is on
ϕ(G), i.e., we have to prove that there is a point P on G so that P′ = ϕ(P). Let P
be the point in the plane so that ϕ(P) = P′; then it suffices to prove that P lies
on G. This is so because P′ being on G′ implies that P′ is equidistant from A′ and
L′. Since ϕ(P) = P′, ϕ(A) = A′, and ϕ(L) = L′, and since ϕ is a congruence, P
is indeed equidistant from A and L. Since G is the parabola with focus A and
directrix L, we conclude that P lies on G. This proves the claim and hence also
Lemma 10.14.
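The congruence ϕ in the proof of Lemma 10.14 can be written down explicitly. The following Python sketch is one way to do it, under assumptions of our own: ϕ is taken to be the rotation about A by the angle from the direction of AB to the direction of A′B′, followed by the translation along the vector from A to A′. Because Python identifiers cannot carry primes, A′ and B′ are spelled A2 and B2. The sample parabolas are hypothetical: G is y = x²/4 with focus A = (0, 1) and directrix y = −1 (so B = (0, −1)), and G′ has focus A′ = (3, 2) and B′ = (4.2, 0.4), so that |A′B′| = 2 = |AB|.

```python
import math

def congruence_from_segments(A, B, A2, B2):
    """Return a congruence phi with phi(A) = A2 and phi(B) = B2, assuming |AB| = |A2B2| > 0.

    phi rotates the plane about A by the angle from direction AB to direction A2B2,
    then translates along the vector from A to A2.
    """
    angle = (math.atan2(B2[1] - A2[1], B2[0] - A2[0])
             - math.atan2(B[1] - A[1], B[0] - A[0]))
    c, s = math.cos(angle), math.sin(angle)

    def phi(p):
        x, y = p[0] - A[0], p[1] - A[1]                 # coordinates relative to A
        return (c * x - s * y + A2[0], s * x + c * y + A2[1])
    return phi

def dist_point_line(p, q1, q2):
    """Distance from point p to the line through the distinct points q1 and q2."""
    (x0, y0), (x1, y1), (x2, y2) = p, q1, q2
    cross = (x2 - x1) * (y0 - y1) - (y2 - y1) * (x0 - x1)
    return abs(cross) / math.hypot(x2 - x1, y2 - y1)

A, B = (0.0, 1.0), (0.0, -1.0)       # focus of G and the foot B on its directrix
A2, B2 = (3.0, 2.0), (4.2, 0.4)      # focus A' of G' and the foot B' on its directrix
phi = congruence_from_segments(A, B, A2, B2)

# L' is the line through B' perpendicular to A'B'; u is a direction vector for it.
u = (-(B2[1] - A2[1]), B2[0] - A2[0])
L2 = (B2, (B2[0] + u[0], B2[1] + u[1]))

def gap(x):
    """How far phi(P) is from being equidistant from A' and L', for P = (x, x^2/4) on G."""
    P_img = phi((x, x * x / 4))
    return abs(math.dist(P_img, A2) - dist_point_line(P_img, *L2))

print(phi(A), phi(B))                 # approximately A' and B'
print(max(gap(x) for x in [-6, -1, 0, 2.5, 8]))
```

The second printed number is 0 up to rounding, consistent with the claim ϕ(G) = G′ for this particular pair of parabolas.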

We are now in a position to prove part (ii) of Theorem 10.10. Given two
parabolas G and G′, we will prove that there is a similarity that maps G to G′.
Let the focus and directrix of G be A and L, and let the focus and directrix
of G′ be A′ and L′. Furthermore, let the line passing through A (respectively, A′)
and perpendicular to L (resp., L′) intersect L (resp., L′) at the point B (resp., B′).

Let r = |A′B′|/|AB|. Let D be the dilation with center at B and scale factor r.
Let Ã = D(A), B̃ = D(B), L̃ = D(L), and G̃ = D(G). Observe that L̃ = L
and B̃ = B, because the center B of D lies on L. By Lemma 10.13 on page 255, G̃ is a parabola with focus Ã and
directrix L̃. We now compute the focal length of G̃: it is (compare Theorem 4.4 of
[Wu-PreAlg] on page 271)

$$\tfrac{1}{2}\,|\tilde{A}\tilde{B}| = \tfrac{1}{2}\,r\,|AB| = \tfrac{1}{2}\cdot\frac{|A'B'|}{|AB|}\cdot|AB| = \tfrac{1}{2}\,|A'B'|.$$

Thus G̃ and G′ have the same focal length and, by Lemma 10.14, there is a
congruence ϕ so that ϕ(G̃) = G′. In other words, G′ = ϕ(D(G)). Therefore, ϕ ∘ D
is the desired similarity. The proof of Theorem 10.10 is complete.
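In the special case of two parabolas positioned as in Theorem 10.11, namely the graphs G_a and G_b of F_a(x) = ax² and F_b(x) = bx² with a, b > 0, no congruence is needed: the dilation centered at the origin O with scale factor r = a/b (the ratio of the focal lengths 1/(4b) and 1/(4a)) already maps G_a into G_b, because r·ax² = b(rx)² exactly when r = a/b. Centering the dilation at O rather than at B, as the proof does, is our own choice for this illustration. The Python sketch below checks this with exact rational arithmetic on sample points.

```python
from fractions import Fraction

def dilate_at_origin(p, r):
    """Dilation with center O = (0, 0) and scale factor r > 0."""
    return (r * p[0], r * p[1])

def lands_on_G_b(a, b, xs):
    """Check that the dilation with r = a/b sends each point (x, a x^2) of G_a onto G_b."""
    a, b = Fraction(a), Fraction(b)
    r = a / b                       # equals (1/(4b)) / (1/(4a)), the ratio of focal lengths
    for x in map(Fraction, xs):
        X, Y = dilate_at_origin((x, a * x * x), r)
        if Y != b * X * X:
            return False
    return True

print(lands_on_G_b(4, 1, range(-5, 6)))
print(lands_on_G_b(Fraction(1, 3), 2, [-2, 0, Fraction(7, 5)]))
```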
Before we give the proof of Theorem 10.11, some motivation may shed light
on the proof itself. Consider the graph G_a of F_a(x) = ax² for
some nonzero constant a. If we believe Theorem 10.11, then G_a must be a parabola. Since
a parabola is defined in terms of a focus and a directrix, one naturally wants to
know which point is the focus and which line is the directrix of G_a. Since the
y-axis is the line of symmetry for G_a, we should look for the focus of G_a among
points along the y-axis, i.e., points of the form (0, k). For simplicity, we assume
a > 0. Then k > 0, and since the origin O should be equidistant from (0, k) and
the directrix, it follows that the directrix of G_a has to be the line y = −k. We will
now determine this k.
[Figure: the graph G_a with a point P = (x, y) on it, the focus A = (0, k), and the directrix L = {y = −k}]

Let P = (x, y) be a point on G_a. The distance of P from the x-axis is of course y,
so that the distance of P from L is y + k. The distance of P from A is √(x² + (y − k)²),
by the distance formula (page 57). G_a being a parabola, these two distances are
equal and therefore so are their squares. Thus,

$$(y + k)^2 = x^2 + (y - k)^2.$$

Expanding, we find:

$$y^2 + 2ky + k^2 = x^2 + y^2 - 2ky + k^2,$$

which becomes

$$y = \frac{1}{4k}\,x^2 \quad \text{for all } (x, y) \text{ on } G_a.$$

Since G_a is the graph of F_a(x) = ax², we also have y = ax² for all (x, y) on G_a.
Comparing these two equations, y = x²/(4k) and y = ax², we get a = 1/(4k) and
k = 1/(4a).
Since the focus of G_a is (0, k) and the directrix is the graph of y = −k, we see that
if G_a is a parabola, then

(10.28) its focus should be (0, 1/(4a)),

(10.29) its directrix should be the line {y = −1/(4a)}.

Note that, if a < 0, then k < 0, but the preceding reasoning and the conclusions
in (10.28) and (10.29) remain valid.


Prove this assertion about the case a < 0.
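The computation of k can be replayed with exact fractions. In this Python sketch (the function name and the sample values of a are our own choices), we solve (y + k)² = x² + (y − k)², which simplifies to 4ky = x², for k at several points (x, ax²) of G_a. Every point gives the same value 1/(4a), for negative as well as positive a; this is a numerical illustration of the assertion, not a substitute for proving it.

```python
from fractions import Fraction

def focus_height(x, y):
    """Solve (y + k)^2 = x^2 + (y - k)^2 for k; it reduces to 4ky = x^2 (needs y != 0)."""
    x, y = Fraction(x), Fraction(y)
    return x * x / (4 * y)

for a in [Fraction(3, 2), Fraction(1), Fraction(-2)]:
    ks = {focus_height(x, a * x * x) for x in map(Fraction, [1, 2, -5, Fraction(1, 3)])}
    print(a, ks, 1 / (4 * a))        # the set ks has one element, equal to 1/(4a)
```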

In summary: If we want to prove that the graph G_a of F_a(x) = ax² is a
parabola, our best bet would be to prove that G_a coincides with the parabola
with focus at (0, 1/(4a)) and with the line defined by y = −1/(4a) as its directrix. The
following proof will become much more understandable if you keep this in mind.
Proof of Theorem 10.11. Let G be the graph of a quadratic function f(x) =
ax² + bx + c. We have to show that G is a parabola. Now, Theorem 10.7 (page
246) and Theorem 10.5 (page 242) together show that there is a translation T so
that the image T(G_a) of the graph G_a of the quadratic function F_a(x) = ax² is
G. Therefore, if we can prove that G_a is a parabola, then Lemma 10.12 (page 255)
will guarantee that G is itself a parabola (a translation is a congruence), and the theorem will be proved. We
will in fact prove that G_a is the parabola with focus at A = (0, 1/(4a)) and with the
line L defined by y = −1/(4a) as its directrix.

To this end, we have to prove two things: (i) Every point P on G_a is equidistant
from A and L. (ii) Any point equidistant from A and L lies on G_a. Clearly,
what lies ahead is just a straightforward computation.
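If P is a point on G_a, then P = (x, ax²). The distance formula (page 57) implies
that the square of the distance from P to A is:

$$
\begin{aligned}
x^2 + \left(ax^2 - \tfrac{1}{4a}\right)^2 &= x^2 + a^2x^4 - \tfrac{1}{2}x^2 + \left(\tfrac{1}{4a}\right)^2 \\
&= a^2x^4 + \tfrac{1}{2}x^2 + \left(\tfrac{1}{4a}\right)^2 \\
&= \left(ax^2 + \tfrac{1}{4a}\right)^2.
\end{aligned}
$$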

But the square of the distance from P = (x, ax²) to the horizontal line L defined
by y = −1/(4a) is simply the square of the difference of their y-coordinates:

$$\left(ax^2 - \left(-\tfrac{1}{4a}\right)\right)^2 = \left(ax^2 + \tfrac{1}{4a}\right)^2.$$

Hence the squares of the two distances are equal and by Corollary 2 on page 209,
the distances themselves are equal. This shows that every point of G_a is equidistant
from A and L. This proves (i).
To prove (ii), suppose P = (x, y) is equidistant from both A and L. Then the
squares of the distances are equal. But the square of the distance from P to A is
x² + (y − 1/(4a))², while the square of the distance from P to L is (y − (−1/(4a)))², i.e.,
(y + 1/(4a))². Thus,

$$x^2 + \left(y - \tfrac{1}{4a}\right)^2 = \left(y + \tfrac{1}{4a}\right)^2.$$

This implies

$$x^2 + y^2 - \tfrac{1}{2a}\,y + \left(\tfrac{1}{4a}\right)^2 = y^2 + \tfrac{1}{2a}\,y + \left(\tfrac{1}{4a}\right)^2.$$

After cancellation and collecting terms, we get x² = (1/a)y, so that y = ax². There-
fore P = (x, ax²) and P is a point on G_a. This proves (ii) and, therewith, also
Theorem 10.11.
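Both halves of the proof can be spot-checked with exact rational arithmetic. The Python sketch below (function names and sample values are ours) compares the squared distance from P = (x, y) to the focus A = (0, 1/(4a)) with the squared distance to the directrix {y = −1/(4a)}. Part (i) says they agree for every P = (x, ax²); part (ii) rests on the fact that their difference simplifies to x² − y/a, which vanishes only when y = ax².

```python
from fractions import Fraction

def sq_to_focus(x, y, a):
    """Squared distance from (x, y) to A = (0, 1/(4a))."""
    return x * x + (y - 1 / (4 * a)) ** 2

def sq_to_directrix(x, y, a):
    """Squared distance from (x, y) to the line y = -1/(4a)."""
    return (y + 1 / (4 * a)) ** 2

def equidistant(x, y, a):
    return sq_to_focus(x, y, a) == sq_to_directrix(x, y, a)

a = Fraction(-3, 5)                          # any nonzero a; this value is arbitrary
xs = [Fraction(n, 3) for n in range(-6, 7)]

# (i) every point (x, a x^2) of G_a is equidistant from A and L
print(all(equidistant(x, a * x * x, a) for x in xs))

# (ii) the difference of squared distances is x^2 - y/a, so moving a point
#      off the graph (here y = a x^2 + 1) destroys equidistance
print(any(equidistant(x, a * x * x + 1, a) for x in xs))
```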

Exercises 10.4

(1) Let G be the graph of f(x) = a(x − p)² + q. Locate the focus and find
the equation of the directrix of G.
(2) Write down the explicit similarity that maps the graph of f(x) = 4x² + 12x
to the graph G_1 of F_1(x) = x². Do the same with the graph of
g(x) = −4x² + 12x − 9.
(3) Let A = (p, q + b), and let L be the line {y = q − b}, where b, p, and
q are real numbers and b ≠ 0. Prove that if a point Q is equidistant from A and L,
then Q must be of the form (x, (1/(4b))(x − p)² + q).
(4) Let D be the dilation centered at the origin O so that D(x, y) = (4x, 4y),
and let T be the translation along the vector from O to (1, −3). If G_2
is the graph of F_2(x) = 2x², write down the function whose graph is
T(D(G_2)).
(5) Let C be the graph of the quadratic equation x = (1/4)y² − y. Prove that C
is a parabola.

10.5. Some applications

We give three examples of how quadratic equations and functions typically arise
in applications, within or outside of mathematics. The first example is essentially
Exercise 12 on page 34.
Example 1. The denominator of a fraction exceeds twice the numerator by 2,
and the difference between the fraction and its reciprocal is 55/24. If the numerator
is x and the denominator y, what is the fraction?
Since the denominator is bigger than twice the numerator, the fraction x/y is
a proper fraction so that its reciprocal is bigger than the fraction itself. Thus the
given data about the difference between the fraction and its reciprocal has to be
expressed as

$$\frac{y}{x} - \frac{x}{y} = \frac{55}{24}.$$

That is,

$$\frac{y^2 - x^2}{xy} = \frac{55}{24},$$

and we get 55xy = 24(y² − x²), by the cross-multiplication algorithm (see page 270).
We are also given that y − 2x = 2, i.e., y = 2x + 2. Therefore,

$$55x(2x + 2) = 24\left((2x + 2)^2 - x^2\right).$$

Expand both sides to get

$$110x^2 + 110x = 72x^2 + 192x + 96.$$

After simplifying, we get the quadratic equation in x,

$$19x^2 - 41x - 48 = 0.$$
One either sees the left side as (x − 3)(19x + 16) or one uses the quadratic
formula. In any case, the latter gives the roots as:

$$\frac{41 \pm \sqrt{41^2 + (4 \cdot 19 \cdot 48)}}{38} = \frac{41 \pm 73}{38} = 3 \ \text{ or } \ -\frac{32}{38}.$$

Only 3 makes sense in context, so we reject the other root. Thus x = 3 and y = 8.
The fraction is therefore 3/8. (Check: 8/3 − 3/8 = 55/24.)
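For readers who want to double-check the arithmetic, here is a short Python sketch using exact fractions (the variable names are ours). It recomputes the discriminant and the two roots of 19x² − 41x − 48 = 0, keeps the positive root, and confirms the check 8/3 − 3/8 = 55/24.

```python
import math
from fractions import Fraction

a, b, c = 19, -41, -48                      # 19x^2 - 41x - 48 = 0
disc = b * b - 4 * a * c                    # 41^2 + 4*19*48
root = math.isqrt(disc)
assert root * root == disc                  # the discriminant is a perfect square

roots = [Fraction(-b + root, 2 * a), Fraction(-b - root, 2 * a)]
x = max(roots)                              # the numerator must be positive
y = 2 * x + 2                               # the denominator exceeds twice the numerator by 2
fraction = x / y

print(disc, roots, fraction)
print(1 / fraction - fraction == Fraction(55, 24))
```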
Example 2. (Golden ratio) Given a rectangle ABCD and an embedded square
ABEF so that ABCD is similar to the smaller rectangle ECDF, as shown:

Let |AB| = x, |BC| = y, and |FD| = z. Find the ratio y/x.
Since ABCD ∼ ECDF, we have y/x = x/z, so that x² = yz, by the cross-
multiplication algorithm. But from the picture, z = y − |AF| = y − |AB| = y − x,
because ABEF is a square. Therefore we get x² − y(y − x) = 0, and this is a
quadratic equation in y whose coefficients are expressed in terms of the number x:

$$y^2 - xy - x^2 = 0. \qquad (10.30)$$
In the usual way of writing a quadratic equation as ax² + bx + c = 0, the y in
(10.30) takes the place of x, a is 1, b is −x, and c is −x². The solutions by the
quadratic formula are therefore:

$$\frac{x \pm \sqrt{x^2 + 4x^2}}{2} = \tfrac{1}{2}\left(1 \pm \sqrt{5}\right)x.$$

Since 1 − √5 < 0 and x and y are positive numbers, ½(1 − √5)x is a spurious
solution in this context. Therefore,

$$\frac{y}{x} = \tfrac{1}{2}\left(1 + \sqrt{5}\right) \approx 1.6.$$
The number ½(1 + √5) is called the golden ratio; next to 0, 1, π, e, and i, it
may be the most famous number in mathematics. It has a habit of coming up in
unexpected situations, e.g., in the discussion of Fibonacci numbers; see the article
[Wiki-goldenratio] for a general introduction to the golden ratio and for further
references. It has been a tradition to make index cards so that the ratio of their
side lengths is roughly the golden ratio; be sure to verify this fact to your own
satisfaction as soon as possible.
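A small Python sketch can confirm the algebra and illustrate the remark about Fibonacci numbers. Taking x = 1 (an arbitrary choice, since only the ratio y/x matters), it checks that y = ½(1 + √5) satisfies (10.30) and the similarity condition y/x = x/z with z = y − x, and then prints ratios of consecutive Fibonacci numbers, which approach the golden ratio. The Fibonacci part is an illustration we added, not part of the example.

```python
import math

golden = (1 + math.sqrt(5)) / 2

x = 1.0
y = golden * x
z = y - x
print(y * y - x * y - x * x)        # left side of (10.30): 0 up to rounding
print(y / x, x / z)                  # ABCD ~ ECDF: the two ratios agree

def fibonacci_ratios(n):
    """Ratios F(k+1)/F(k) of consecutive Fibonacci numbers 1, 1, 2, 3, 5, ..."""
    prev, cur = 1, 1
    ratios = []
    for _ in range(n):
        prev, cur = cur, prev + cur
        ratios.append(cur / prev)
    return ratios

print(fibonacci_ratios(10))
```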
Example 3. If an object is thrown directly upwards from a height of h meters
from the ground with an initial velocity of v₀ m/sec, then (ignoring air resistance) its distance (in meters)
f(t) above the ground t seconds after it is thrown is

$$f(t) = -4.9t^2 + v_0 t + h.$$

(This follows from Newton’s second law and the law of universal gravitation.)
Now if h = 20 meters and v₀ = 2 m/sec, what is the highest point of the object
above the ground, when does it get there, and when does it hit the ground?
The highest point above the ground is the maximum of the quadratic function
f(t) = −4.9t² + 2t + 20. We can make use of the Corollary on page 250 to get the
maximum. It is

$$\frac{4(-4.9)(20) - 2^2}{4(-4.9)} = \frac{(4.9)20 + 1}{4.9} = 20\tfrac{10}{49} \text{ meters}.$$

However, it would be futile, not to say impossible, to keep that corollary in acces-
sible memory all the time; there are more important things to memorize in one’s
life. Form the habit right now of completing the square to get the vertex form until
you are fluent in this skill, as follows:

$$
\begin{aligned}
f(t) = -4.9t^2 + 2t + 20 &= (-4.9)\left(t^2 - \frac{2}{4.9}\,t\right) + 20 \\
&= (-4.9)\left(t^2 - \frac{2}{4.9}\,t + \left(\frac{1}{4.9}\right)^2\right) + \frac{1}{4.9} + 20 \\
&= (-4.9)\left(t - \frac{1}{4.9}\right)^2 + \frac{10}{49} + 20.
\end{aligned}
$$

At this point, the maximum value of f(t) is all too visible: 10/49 + 20 = 20 10/49 meters
(about 20.2 meters), and equally visible is the fact that f attains this value when t = 1/4.9 sec., which is
of course 10/49 ≈ 0.2 sec.
The object hits the ground after t₀ seconds if f(t₀) = 0. Thus solving
−4.9t² + 2t + 20 = 0 by the quadratic formula, we get

$$t_0 = \frac{-2 \pm \sqrt{2^2 + 4(4.9)(20)}}{2(-4.9)} = \frac{1 \pm \sqrt{99}}{4.9}.$$

The only viable solution (the other one is negative) is

$$t_0 = \frac{1 + \sqrt{99}}{4.9} \approx 2.2 \text{ seconds}.$$
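The numbers in Example 3 can be reproduced in a few lines of Python. The sketch below (names are ours) uses exact fractions, with 4.9 written as 49/10, for the vertex obtained by completing the square, and floating point only for the square root in the landing time.

```python
import math
from fractions import Fraction

a, b, c = Fraction(-49, 10), Fraction(2), Fraction(20)   # f(t) = -4.9t^2 + 2t + 20

def f(t):
    return a * t * t + b * t + c

# Vertex form f(t) = a(t - t_top)^2 + f_max, as obtained by completing the square
t_top = -b / (2 * a)                 # 1/4.9 = 10/49 seconds
f_max = c - b * b / (4 * a)          # 20 + 10/49 meters

# Landing time: the positive root of f(t) = 0
disc = b * b - 4 * a * c             # 2^2 + 4(4.9)(20) = 4 * 99
# Since a < 0, taking the minus sign in the numerator gives the positive root.
t_land = (-float(b) - math.sqrt(float(disc))) / (2 * float(a))

print(t_top, float(t_top))
print(f_max, float(f_max))
print(t_land, (1 + math.sqrt(99)) / 4.9)
```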

Exercises 10.5
(1) A rectangle has a perimeter of 180 linear units and an area of 1800 area
units. What are its dimensions?
(2) A hifi store sells only 35 CD players of a particular brand each month
when the price is marked up to make a profit of $50 per player. Suppose
the store decides to change the price in integer multiples of 2 dollars,
and it is known that (roughly) for each $2 decrease in the price, the store
can sell 5 more players. What should the price be per player in order to
maximize total monthly profit? What will the total profit be per month?
(3) Find two numbers whose difference is 7 and the difference of whose cubes
is 721.
(4) A merchant has a cask full of wine. He draws out 6 gallons and fills the
cask with water. Again he draws out 6 gallons, and fills the cask with
water. There are now 25 gallons of pure wine in the cask. How many
gallons does the cask hold?
(5) Two workmen can do a piece of work (think of painting a house) together
in 6 days. In how many days can each do it alone if it takes one of them
5 days longer than the other? (Assume both work at a constant rate and
that they do not interfere with each other.)
(6) George drove from town A to town B at an average speed of x mph. On
the way back along the same road from town B to town A, he ran into
rush hour traffic and his average speed slowed down to ( x − 10) mph.
The driving round trip took (about) an hour and fifteen minutes. If the
driving distance between towns A and B is 30 miles, what is x (rounded
to the nearest one)?
(7) Two trains go from City A to City B at constant rates; the distance between
the cities is 200 miles. The second train starts one hour later than the first
but, traveling 5 mph faster, gets to City B only 30 minutes later than the
first train. Find the time of travel for each train. (Idealize each train as
a single point in your reasoning.)
(8) A tank can be filled by the larger of two faucets in 5 hours less time than
by the smaller one. It is filled by them both together in 6 hours. If the
water flows from the faucets at a constant rate, how many hours will it
take to fill the tank by each faucet separately?
Appendix: Facts from [Wu-PreAlg]

There are three parts in this Appendix:

Part 1. Assumptions
Part 2. Definitions
Part 3. Theorems and Lemmas

The section in [Wu-PreAlg] where each item first appears is indicated parenthetically at
the end of that item.

Part 1. Assumptions
Fundamental Assumption of School Mathematics (FASM). We can add and
multiply real numbers, and the laws of operations for both addition and
multiplication (associative, commutative, and distributive), the formu-
las (a)–(d) for rational quotients (page 270), and the basic facts about
inequalities (A)–(E) for rational numbers (page 269) continue to be valid
when the rational numbers are replaced by real numbers. (Section 2.7)
(Iso1). Translations, reflections, and rotations preserve lengths of segments and
degrees of angles. (Section 4.4)
(Iso2). Under a translation or reflection or rotation, the image of a line is a line,
the image of a segment is a segment, and the image of a ray is a ray.
(Section 4.4)

Part 2. Definitions

Alternate interior angles. Let two distinct lines L1, L2 be given. A transversal of
L1 and L2 is any line ℓ that meets both lines in distinct points. Suppose
ℓ meets L1 and L2 at P1 and P2, respectively. Let Q1 and Q2 be points
on L1 and L2, respectively, so that they lie in opposite half-planes of ℓ.
Then ∠Q1P1P2 and ∠P1P2Q2 are said to be alternate interior angles of
the transversal ℓ with respect to L1 and L2. (Section 4.6)
Angle. An angle ∠ AOB is by definition a region in the plane whose boundary
consists of two rays ROA and ROB , with a common vertex O; each of ROA
and ROB is called a side of the angle and O is called the vertex of the
angle. Because of the inherent ambiguity in this definition, ∠ AOB is
usually taken to be the intersection of two closed half-planes: the closed
half-plane of LOA that contains B, and the closed half-plane of LOB that
contains A. (Section 4.4)

Average speed. For an object in motion, its average speed over the time interval
from t1 to t2, t1 < t2, is

$$\frac{\text{distance traveled from } t_1 \text{ to } t_2}{t_2 - t_1}.$$

(Section 1.9)
Basic isometry. In the plane, a basic isometry refers to a translation, a rotation,
or a reflection. (Section 4.4)
Between. Given a line L in the plane, let P and Q be two points on L. A point S
is said to be between P and Q if S lies on L and if, when we make L into
a number line, either P < S < Q or Q < S < P holds. The fact that one
and only one of these inequalities holds is independent of the way L is
made into a number line. (Section 4.4)
Bilateral symmetry. A geometric figure S is said to have bilateral symmetry with
respect to a line L if the reflection Λ across L has the property that
Λ(S) = S. Equivalently, S is symmetric with respect to L if Λ maps every
point of S to a point of S (this is because Λ ◦ Λ = identity transforma-
tion). The line L is called the line of symmetry or the axis of symmetry.
(Section 4.4)
Binomial coefficients. Let n and k be whole numbers. Then the binomial coefficient
(n k), for k satisfying 0 ≤ k ≤ n, is the whole number

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}.$$

(Exercises 1.4)
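The definition translates directly into code. This Python sketch (the name binom is ours) evaluates n!/((n − k)! k!) with the standard library's math.factorial and compares the result with math.comb, which computes the same number (available since Python 3.8).

```python
import math

def binom(n, k):
    """Binomial coefficient n! / ((n - k)! k!) for whole numbers 0 <= k <= n."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

print([binom(4, k) for k in range(5)])   # [1, 4, 6, 4, 1]
print(all(binom(n, k) == math.comb(n, k) for n in range(15) for k in range(n + 1)))
```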
Closed interval. Let a and b be two numbers so that a < b. Then the closed
interval [ a, b] is the set of all numbers x satisfying a ≤ x ≤ b. (Section
Closed half-plane. It is the union of a half-plane of a line together with the line
itself. (Section 4.4)
Complex fraction. A complex fraction is a fraction obtained by a division A/B of
two fractions A and B (B > 0). We continue to call A and B the numera-
tor and denominator of A/B, respectively. (Section 1.7)
Congruence. A congruence is a transformation of the plane that is the composi-
tion of a finite number of reflections, rotations, and translations. (Section
Congruent figures. A geometric figure S is congruent to another geometric fig-
ure S′, in symbols, S ≅ S′, if there is a congruence ϕ so that ϕ(S) = S′.
(Section 4.5)
Constant speed. An object in motion is said to have constant speed if the average
speed of the motion over any time interval (see page 266) is equal to a
fixed constant. This fixed constant is then called the (constant) speed of
the motion. (Section 1.9)
Corresponding angles of a transversal. A pair of angles formed when two paral-
lel lines are intersected by a transversal are called corresponding angles
if they are obtained by replacing one angle in a pair of alternate interior
angles (relative to this transversal) by its opposite angle. (Section 4.6)

Dilation. A transformation D of the plane is a dilation with center O and scale factor
r (r > 0) if
(1) D(O) = O.
(2) If P ≠ O, the point D(P), to be denoted by P′, is the point
on the ray ROP so that |OP′| = r|OP|. (Section 4.6)
Distance between parallel lines. Given two parallel lines L1 and L2 , the length
of the segment intercepted on a transversal that is perpendicular to both
L1 and L2 is a constant, and this length is the distance between L1 and
L2 . (Section 5.3)
Equal sets or equal geometric figures. Two geometric figures S and S′ are equal,
in symbols S = S′, if
(i) every point P of S is also a point in S′, and
(ii) every point Q of S′ is also a point in S.
(Section 3.1; Section 4.4)
Exponent. Let b be a nonzero number and let n be a positive integer. Then bⁿ
is by definition equal to b · b · · · b (n times). In this case, n is called
the exponent of bⁿ. [In Chapter 9, this concept of an exponent will be
expanded to include n as a rational number, and even as a real number.]
(Section 1.1)
Figure. See geometric figure.
Fraction division. If m/n and k/ℓ are fractions (k/ℓ ≠ 0), then the division, or
quotient, of m/n by k/ℓ, in symbols, (m/n)/(k/ℓ), is the fraction a/b so that
m/n = (a/b) × (k/ℓ). (Section 1.6)
Fraction multiplication. The multiplication of two fractions (k/ℓ) × (m/n) is by
definition the length of the concatenation of k parts when [0, m/n] is partitioned
into ℓ equal parts. (Section 1.5)
Fraction subtraction. If k/ℓ > m/n, then the subtraction k/ℓ − m/n is by definition the
length of the remaining segment when a segment of length m/n is taken
from one end of a segment of length k/ℓ. (Section 1.4)
Geometric figure. A figure, or geometric figure, is just a subset of the plane. Oc-
casionally, it also refers to a subset in 3-dimensional space. (Section 1.5)
Intersection. The intersection of a collection of sets consists of all the points which
belong to each and every set in the collection. (Section 3.1)
LCM. The LCM (least common multiple) of a finite collection of positive in-
tegers is the smallest positive integer that is a multiple of each positive
integer in the collection. (Exercises 3.2)
Multiplicative inverse. The multiplicative inverse of a nonzero number x is the number
x⁻¹ so that x · x⁻¹ = 1. (Section 1.6; Section 2.5)
Open interval. Let a and b be two numbers so that a < b. Then the open interval
( a, b) is the set of all numbers x satisfying a < x < b. In other words, the
open interval ( a, b) is the closed interval [ a, b] without the two endpoints
a and b. (Section 2.6)
Opposite signs. Two numbers are said to have opposite signs if one of them is
positive and the other is negative. (Section 2.6)
Perpendicular bisector. Given a segment AB, its perpendicular bisector is the
line passing through the midpoint of AB and perpendicular to AB. (Sec-
tion 4.2)

Polygon. A polygonal segment A1 A2 · · · An An+1 in the plane so that An+1 = A1

and so that the segments Ai Ai+1 (for i = 1, 2, . . . , n) do not intersect each
other except at the points A1 , . . . , An+1 , called the corners or vertices of
the polygon. Each of A1 A2 , A2 A3 , . . . , An A1 is called a side or an edge
of the polygon. When a polygon A1 A2 · · · An A1 is clearly understood,
we denote it by the simpler notation A1 A2 · · · An . (Section 5.2)
Polygonal region. The union of a polygon together with the region inside the
polygon. (Section 5.2)
Polygonal segment. A polygonal segment A1 A2 · · · An is a sequence of segments
A1 A2 , A2 A3 , . . . An−1 An which need not be collinear and which could
intersect among themselves. The points A1 , A2 , . . . , An are called the
corners or vertices of A1 A2 · · · An . (Section 5.2)
Product formula. For any two complex fractions A/B and C/D,
(A/B) × (C/D) = AC/BD.
(Section 1.5)
Ratio. Given two fractions A and B, the ratio of A to B, usually denoted by
A : B, is the complex fraction A/B. (Section 1.9)
Rational quotient. A number that is the quotient (or division) of one rational
number by another. For example, if x and y are rational numbers and
y ≠ 0, then x/y is a rational quotient. (Section 2.5)
Ray. A ray is a semi-infinite line with a beginning point called its vertex.
(Section 4.4)
Reflection. Given a line ℓ in the plane, the reflection across ℓ is the transformation
Λ of the plane so that, for every point P in the plane, ℓ is the perpendic-
ular bisector of the segment joining P to Λ(P). (Section 4.4)
Relatively prime. Two positive integers are relatively prime if their only common
divisor is 1. (Section 3.1)
Removing parentheses. This refers to any one of the following three identities
about all rational numbers x and y:
−( x − y) = − x + y, −(− x + y) = x − y, and − (− x − y) = x + y.
(Section 2.3)
Rotation. The rotation of the plane with center O and degree e (−360 ≤ e ≤ 360)
is the transformation R so that R(O) = O, and for a point P ≠ O, P and
P′ = R(P) lie on the same circle around O, so that (i) the degree of the
angle ∠POP′ is |e|, and (ii) P′ is in the counterclockwise direction of P
if e > 0 and P′ is in the clockwise direction of P if e < 0. (Section 4.4)
Same sign. Two numbers are said to have the same sign if they are either both
positive or both negative. (Section 2.6)
Segment. Given a line L in the plane and two points P and Q on L, the segment
PQ consists of P and Q together with all the points S between P and Q
(see between on page 266). (Section 4.4)
Similar figures. A geometric figure S is similar to another geometric figure S′,
in symbols, S ∼ S′, if there is a dilation D so that D(S) is congruent to
S′. (Section 4.7)
Similarity. A similarity is a transformation of the plane that is the composition
of a dilation followed by a congruence. (Section 4.7)

Square root. A square root t of a positive number x is a number so that t² = x.
(Section 3.1)
Transformation. A transformation F of the plane is a rule that assigns to each
point P of the plane a unique point F( P) (read: “F of P”) in the plane.
(Section 4.4)
Translation. A translation along a vector AB is the transformation of the plane T
so that, if T maps a point P to Q, then Q has the following properties:
(i) If P lies on the line L AB , then so does Q; if P does not lie
on the line L AB , then the (line containing the) segment PQ is
parallel to the (line containing the) segment AB,
(ii) PQ has the same length as AB, and
(iii) the two vectors PQ and AB point in the same direction.
(Section 4.4)
Transversal. A transversal of two given lines L1 and L2 is a line that intersects
both L1 and L2 . (Section 4.6)
Two-sided number line. With the fractions already defined on the right side of 0
on the number line, given a fraction p, −p is the point on the left
side of 0 whose distance from 0 is the same as that of p.

[Figure: number line showing the points −q, −p, 0, p, q]

(Section 2.1)
Union. The union of a collection of sets consists of all the points which belong
to at least one set in the collection. (Section 5.1)
Vector. A vector AB is a segment AB so that A is the starting point and B is the
endpoint. (Section 2.2)

Part 3. Theorems and Lemmas

AA criterion for similarity. If two triangles have two pairs of equal angles, they
are similar. (Section 4.7)
ASA. If two triangles have two pairs of equal angles and the common side of
the angles in one triangle is equal to the corresponding side in the other
triangle, then the triangles are congruent. (Section 4.5)
Basic facts about inequalities in Section 2.6. If x, y, z, . . . are rational numbers,
(A) x < y ⇐⇒ − x > −y.
(B) x < y ⇐⇒ x + z < y + z.
(C) x < y ⇐⇒ x − y < 0.
(D) If z > 0, then x < y ⇐⇒ xz < yz.
(E) If z < 0, then x < y ⇐⇒ xz > yz.
(Section 2.6)
Cancellation law for rational quotients. If x, y, and z are rational numbers, and
y, z ≠ 0, then,
x/y = zx/zy.
(Section 2.5)

Cross-multiplication algorithm. (i) For rational numbers x, y, z, and w, with
y ≠ 0 and w ≠ 0: x/y = z/w if and only if xw = yz. (ii) For positive
numbers a, b, c, and d: a/b < c/d if and only if ad < bc. (Section 2.5 and
Section 1.7, respectively)
FFFP (Fundamental Fact of Fraction-Pairs). Any two fractions a/b and c/d may be
regarded as two fractions with the same denominator, e.g., ad/bd and bc/bd.
(Section 1.3)
Formulas for rational quotients in Section 2.5. Let x, y, z, w, . . . be rational num-
bers so that they are nonzero where appropriate in the following.
(a) Cancellation law: x/y = zx/zy for any nonzero z.
(b) Cross-multiplication algorithm: x/y = z/w if and only if xw = yz.
(c) x/y ± z/w = (xw ± yz)/yw.
(d) x/y × z/w = xz/yw.
(Section 2.5)
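Python's fractions.Fraction type does exact arithmetic on rational numbers, so it offers a convenient way to spot-check formulas (a) to (d) on sample values. The sketch below (the function name and the sample values are ours) is an illustration on finitely many nonzero rationals, not a proof.

```python
from fractions import Fraction
from itertools import product

def formulas_hold(x, y, z, w):
    """True if formulas (a)-(d) hold for the nonzero rationals x, y, z, w."""
    return (x / y == (z * x) / (z * y)                        # (a) cancellation law
            and (x / y == z / w) == (x * w == y * z)          # (b) cross-multiplication
            and x / y + z / w == (x * w + y * z) / (y * w)    # (c) with +
            and x / y - z / w == (x * w - y * z) / (y * w)    # (c) with -
            and (x / y) * (z / w) == (x * z) / (y * w))       # (d)

values = [Fraction(-3, 2), Fraction(1, 3), Fraction(2), Fraction(-5, 7)]
print(all(formulas_hold(*choice) for choice in product(values, repeat=4)))
```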
Fundamental Theorem of Arithmetic. Every positive integer > 1 is a product of
a finite number of primes, and this collection of primes is unique (except
possibly for order). (Section 3.2)
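The existence half of the theorem can be illustrated by trial division. The Python sketch below (the function name is ours) repeatedly divides out the smallest divisor greater than 1, which is necessarily prime, and returns the prime factors in nondecreasing order. The uniqueness half is what the theorem guarantees; the code does not prove it.

```python
def prime_factors(n):
    """Return the primes, with repetition and in nondecreasing order, whose product is n > 1."""
    if n < 2:
        raise ValueError("n must be an integer greater than 1")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:          # d is prime here: all smaller factors were already removed
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                      # what remains has no divisor up to its square root
        factors.append(n)
    return factors

print(prime_factors(360))   # [2, 2, 2, 3, 3, 5]
```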
Key Lemma. Suppose ℓ, m, n are nonzero whole numbers, and ℓ divides mn. If ℓ
and m are relatively prime, then ℓ divides n. (Section 3.1)
Pythagorean Theorem. If the lengths of the legs of a right triangle are a and b,
and the length of the hypotenuse is c, then a² + b² = c². (Section 4.7)
SAS. If two triangles have a pair of equal angles (i.e., same degree) and the
corresponding sides of these angles in the two triangles are pairwise
equal (e.g., given △ABC and △A′B′C′, the following holds: |∠A| =
|∠A′|, |AB| = |A′B′|, and |AC| = |A′C′|), then the two triangles are
congruent. (Section 4.5)
SAS criterion for similarity. Given two triangles ABC and A′B′C′, if |∠A| =
|∠A′| and

|AB|/|A′B′| = |AC|/|A′C′|,

then △ABC ∼ △A′B′C′. (Section 4.7)
SSS. If the three sides of a triangle and the three corresponding sides of an-
other triangle are pairwise equal, then the two triangles are congruent.
(Section 4.5)
Theorem 1 in the Appendix of Chapter 1. For any finite collection of numbers,
the sums obtained by adding them up in any order are all equal. (Section
Theorem 2 in the Appendix of Chapter 1. For any finite collection of numbers,
the products obtained by multiplying them in any order are all equal.
(Section 1.11)
Theorem 4.2. (a) An isosceles triangle has equal base angles. (b) In an isosceles
triangle, the perpendicular bisector of the base, the angle bisector of the
top angle, the median from the top vertex, and the altitude on the base
all coincide. (Section 4.5)

Theorem 4.4. If D is a dilation with center O and scale factor r, then for any two
points P and Q in the plane, so that P′ = D(P) and Q′ = D(Q) are their
dilated images, we have
|P′Q′| = r|PQ|.
(Section 4.6)
Theorem 4.5. Let D be a dilation with center O and scale factor r, and let P, Q be
two points not collinear with O. Further let P′ denote D(P). Then the
dilated image Q′ of Q is the intersection of line L_OQ and the line passing
through P′ and parallel to L_PQ. (Section 4.6)
Theorem 4.7. Alternate interior angles of a transversal with respect to a pair of
parallel lines are equal. The same is true of corresponding angles.
(Section 4.6)
Theorem 4.9. If two lines have a pair of equal alternate interior angles or
corresponding angles with respect to a transversal, they are parallel. (Section
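Theorem 4.12. Given two triangles $\triangle ABC$ and $\triangle A'B'C'$, their similarity, i.e.,
$\triangle ABC \sim \triangle A'B'C'$, implies the following equalities:
$$|\angle A| = |\angle A'|,\quad |\angle B| = |\angle B'|,\quad |\angle C| = |\angle C'|,$$
$$\frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|} = \frac{|BC|}{|B'C'|}.$$
(Section 4.7)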
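As a numerical sanity check of Theorem 4.4 (an illustration, not a proof), the following Python sketch places the plane in coordinates and assumes that the dilation with center O and scale factor r > 0 is described by D(P) = O + r(P − O). The function names `dilate`, `dist_squared`, and `check_theorem_4_4` are hypothetical, introduced only for this example. Exact `Fraction` arithmetic is used and squared distances are compared, so no square roots or rounding errors enter.

```python
from fractions import Fraction

# A point in the coordinate plane, with exact rational coordinates.
Point = tuple[Fraction, Fraction]


def dilate(center: Point, r: Fraction, p: Point) -> Point:
    """Image of p under the dilation with the given center and scale factor r > 0.

    Assumption (not from the text): D(P) = O + r(P - O) in coordinates.
    """
    ox, oy = center
    px, py = p
    return (ox + r * (px - ox), oy + r * (py - oy))


def dist_squared(p: Point, q: Point) -> Fraction:
    """Square of the distance |PQ|, kept squared so the arithmetic stays exact."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def check_theorem_4_4(center: Point, r: Fraction, p: Point, q: Point) -> bool:
    """True when |P'Q'|^2 == r^2 |PQ|^2, which for r > 0 means |P'Q'| = r|PQ|."""
    p_img = dilate(center, r, p)
    q_img = dilate(center, r, q)
    return dist_squared(p_img, q_img) == r ** 2 * dist_squared(p, q)


if __name__ == "__main__":
    O = (Fraction(1), Fraction(2))
    P = (Fraction(3), Fraction(-1))
    Q = (Fraction(-2), Fraction(5, 2))
    print(check_theorem_4_4(O, Fraction(5, 3), P, Q))  # True
```

Comparing squared distances is a design choice: it keeps every quantity rational, so the equality test is exact rather than approximate.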

[Birkhoff-MacLane] G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra, 4th Edition, Macmillan,
NY, 1977.
[CCSSM] Common Core State Standards for Mathematics (2010). Retrieved from
[Dolciani] R. G. Brown, M. P. Dolciani, R. H. Sorgenfrey, and W. L. Cole, Algebra: Structure and Method,
Book 1, California Teacher’s Edition, McDougal Littell, Evanston, IL, 2000.
[EngageNY] Grade 8 Mathematics Module 4: Teacher Materials.
[Euclid] Euclid, The Thirteen Books of the Elements, transl. Thomas L. Heath, Volume I (Books I and
II), Dover Publications, New York, NY, 1956.
[Eureka] Eureka Math - Grade 8.
[GIMPS] Great Internet Mersenne Prime Search.
[Gladwell] M. Gladwell, Outliers: The Story of Success, Little, Brown and Company, New York, NY,
[Lamon] S. J. Lamon, Teaching Fractions and Ratios for Understanding, Lawrence Erlbaum, Mahwah,
NJ, 1999.
[MAC] Mathematics Assessment Collaborative, Grade Six Performance Assessment, Spring 2002.
Retrieved June 1, 2013 from\_6A.pdf
[Meyer] Dan Meyer, The Math I Learned After I Thought Had Already Learned Math, August 11, 2015.
Retrieved from
[MSE] Why is a geometric progression called so?, Mathematics Stack Exchange. Retrieved from
[NCTM] Curriculum and Evaluation Standards for School Mathematics, National Council of Teachers of
Mathematics, Reston, VA, 1989.
[NCTM2000] Principles and Standards for School Mathematics, National Council of Teachers of
Mathematics, Reston, VA, 2000.
[NMP] National Mathematics Advisory Panel, Foundations for Success: Reports of the Task Groups
and Sub-Committees, U.S. Department of Education, Washington DC, 2008. Retrieved from
[NRC] Adding It Up, National Research Council, The National Academy Press, Washington DC,
[Post-Behr-Lesh] T. Post, M. Behr, and R. Lesh, Proportionality and the development of pre-algebra
understanding, in The Ideas of Algebra, K–12 (1988 Yearbook of the National Council of Teachers
of Mathematics), A. F. Coxford and A. P. Shulte, eds., Reston, VA, 1988, pp. 78–90.
[Postelnicu] V. Postelnicu, Student Difficulties with Linearity and Linear Functions and Teachers’
Understanding of Student Difficulties, Dissertation, Arizona State University, 2011. Retrieved from
[Postelnicu-Greenes] V. Postelnicu and C. Greenes, Do teachers know what their students know?, National
Council of Supervisors of Mathematics Newsletter 42 (3) (2012), 14–15.
[Robson] E. Robson, Neither Sherlock Holmes nor Babylon: A reassessment of Plimpton 322,
Historia Mathematica 28 (2001), 167–206.
[Ross] K. A. Ross, Elementary Analysis: The Theory of Calculus, Springer, New York, NY, 1980.


[Siegler-etal.] R. Siegler et al., Developing Effective Fractions Instruction for Kindergarten Through
8th Grade: A Practice Guide (NCEE #2010-4039), Washington DC: NCEE, Institute of
Education Sciences, U.S. Department of Education, 2010.
[Stanley] D. Stanley, Proportionality confusion.
[Stump] S. L. Stump, High School Precalculus Students’ Understanding of Slope as Measure,
School Science and Mathematics 101 (2) (2001), 81–89.
[Teukolsky] R. Teukolsky, Conic sections, an exciting enrichment topic, in Learning and Teaching
Geometry, National Council of Teachers of Mathematics 1987 Yearbook, M. M. Lindquist and
A. P. Shulte, eds., National Council of Teachers of Mathematics, Reston, VA, 1987, pp. 155-
[Wiki-AGM] Inequality of arithmetic and geometric means, Wikipedia.
[Wiki-conic] Conic Sections, Wikipedia.
[Wiki-cryptography] Public-key cryptography, Wikipedia.
[Wiki-floorfunction] Floor and ceiling functions, Wikipedia.
[Wiki-GIMPS] Great Internet Mersenne Prime Search, Wikipedia.
[Wiki-goldenratio] Golden ratio, Wikipedia. Retrieved from
[Wu2004] H. Wu, “Order of operations” and other oddities in school mathematics, 2004. Retrieved from
[Wu2006] H. Wu, How mathematicians can contribute to K-12 mathematics education, Proceedings of
International Congress of Mathematicians, Madrid 2006, Volume III, European Mathematical
Society, Zürich, 2006, pp. 1676–1688. Also
[Wu2010a] H. Wu, Pre-Algebra (Draft of textbook for teachers of grades 6-8) (April 21, 2010). Retrieved
[Wu2010b] H. Wu, Introduction to School Algebra (Draft of textbook for teachers of grades 6-8) (August
14, 2010). Retrieved from
[Wu2011] H. Wu, Understanding Numbers in Elementary School Mathematics, Amer. Math. Soc., Provi-
dence, RI, 2011.
[Wu2013] H. Wu, Potential Impact of the Common Core Mathematics Standards on the
American Curriculum, in Mathematics Curriculum in School Education, Yeping Li and
Glenda Lappan, eds., Springer, Berlin-Heidelberg-New York, 2013, pp. 119–143. Also
[Wu2015] H. Wu, Textbook School Mathematics and the preparation of mathematics teachers. Retrieved from
[Wu-PreAlg] H. Wu, Teaching School Mathematics: Pre-Algebra, Amer. Math. Soc., Providence, RI, 2016.
[Wu-HighSchool] H. Wu, Mathematics of the Secondary School Curriculum, I, II, and III (to appear).


cally for Common Core era teachers. The emphasis of the exposition is to
give a mathematically correct treatment of introductory algebra. For example,
it explains the proper use of symbols, why “variable” is not a mathematical
slope of a line correctly, why the graph of a linear equation in two variables is
a straight line, why every straight line is the graph of a linear equation in two
variables, how to use the shape of the graph of a quadratic function as a guide
graph of a quadratic function is a parabola, why all parabolas are similar, etc.
are written the way they are.
