Teaching School Mathematics - Algebra
Teaching School Mathematics:
Algebra
Hung-Hsi Wu
[Cover figure: coordinate axes with origin O and the labels $T(G_a) = L$, $G_a$, $P = (x, ax^2)$, $T(P) = (x + p, ax^2 + q)$, and $V = (p, q)$.]
https://doi.org/10.1090//mbk/099
Hung-Hsi Wu
Department of Mathematics
University of California, Berkeley
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Permissions to reuse
portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink
service. For more information, please visit: http://www.ams.org/rightslink.
Send requests for translation rights and licensed reprints to reprint-permission@ams.org.
Excluded from these provisions is material for which the author holds copyright. In such cases,
requests for permission to reuse or reprint material should be addressed directly to the author(s).
Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the
first page of each article within proceedings volumes.
© 2016 by the author. All rights reserved.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
To Kuniko
Through trouble and joy we have
walked hand in hand;
from our wandering we both now rest
above the quiet land.
Im Abendrot
Joseph von Eichendorff (1788–1857)
Contents
Preface xi
Chapter 1: Fractions
Chapter 2: Rational Numbers
Chapter 3: The Euclidean Algorithm
Chapter 4: Experimental Geometry
Chapter 5: Length, Area, and Volume
[Figure: diagram connecting [PA]Chapter 1 through [PA]Chapter 5 and [A]Chapter 1 through [A]Chapter 10.]
Preface
supposed to learn. Education researchers who look into the nonlearning of al-
gebra do not appear to have given much thought to the fact that the TSM that
resides in student textbooks or standard professional development materials is
riddled with ambiguities and errors, big and small. In short, TSM is not learn-
able. Until a mathematically correct version of school algebra is readily accessible
to one and all, it will be premature to draw any conclusions about why students
cannot learn algebra. With this in mind, the main justification for this volume’s
existence is that it gives a logical and coherent exposition of the standard math-
ematical topics in Algebra I in a way that not only is grade-level appropriate for
eighth and ninth graders, but also meets the requirements of the following five
fundamental principles of mathematics:
(I) Precise definitions are essential.
(II) Every statement must be supported by mathematical reasoning.
(III) Mathematical statements are precise.
(IV) Mathematics is coherent.
(V) Mathematics is purposeful.
We refer the reader to the Preface of [Wu-PreAlg] for a fuller discussion of
these fundamental principles.
The grade-level requirements we have imposed on this volume by no means
imply that this is a student textbook. This volume is unequivocally a book for
teachers with a sharp focus on mathematics. What this requirement means is
that a conscientious attempt has been made to minimize the distance between the
content in this volume and what teachers have to teach in middle school (see, for
example, [Wu2006]). Consequently, this volume will not touch on any advanced
topics such as vector spaces and linear transformations, groups, rings, fields, and
especially finite fields. It turns out that the need for such advanced considerations
is not critical at this stage and, in any case, there will be no advanced topics to be
found in this volume. Instead, we will focus on probing the basic structure that
undergirds the standard topics of school algebra. In the course of this probe, how-
ever, the need for advanced—and often quite subtle—considerations does surface
from time to time. On these occasions, we will not shy away from giving the
full explanation in order to bring mathematical closure to the discussion. All the
same, we will also be explicit in pointing out that these advanced considerations
are more for broadening the teachers’ knowledge base than for school classroom
presentations.
The fundamental principles of mathematics are of critical importance in the
teaching of school algebra because algebra is inherently an abstract subject com-
pared to arithmetic, and TSM’s lack of precise definitions and logical reasoning in
an abstract environment has rendered the subject unlearnable. In greater detail,
let us consider the following specific manifestations of these flaws in the algebra
portion of TSM:
3 We are using the term “rational numbers” in its correct mathematical sense: fractions and
negative fractions.
definition of $a^r$ for all rational numbers r is that these are special values of the
exponential function $x \mapsto a^x$ when x is an arbitrary number. As a consequence,
the laws of exponents become just another set of senseless rote skills about a
strange notation rather than remarkable properties of the exponential function.
10. TSM’s presentation of quadratic equations and functions is chaotic: too
many facts to memorize while no conceptual framework is provided for their un-
derstanding. For example, students learn how to factor quadratic polynomials
with leading coefficients other than 1, learn the quadratic formula, learn the for-
mula for the axis of symmetry of the graph, learn the formula for the vertex of
the graph of a quadratic function, etc. How are these related to each other?
If one goes through the algebra curriculum of TSM carefully, one will uncover
these and many more serious mathematical issues. (Many of them will be pointed
out in this volume in due course.) The prospect of a student learning algebra is
therefore daunting: it may be likened to walking through a minefield where all
the mines were put there by human errors. The least we can do is to remove
the mines (and some of students’ concomitant fears)—in other words, eradicate
TSM—in order to give learning a chance. The modest goal of this volume is to
give you the tools to do exactly that. Briefly, one will find in the following pages
ways to deal with the preceding difficulties:
1a. What students should be learning is not what a “variable” is but the
proper use of symbols; see pages 4 ff. The meaning of each symbol must be
specified before it is put to use. For example, the equality of two functions of one
variable, $f(x)$ and $g(x)$, may be a prototypical statement involving variables, but
the precise definition of the equality f = g is that, for each fixed number x in their
common domain of definition, $f(x) = g(x)$. Nothing varies.
2a. The solving of equations is strictly a matter of computations with numbers.
No variables are involved, and therefore there is no reason to confuse the issue
by using balance scales or algebra tiles to explain the solution process. See the
discussion in Section 3.1 on page 37.
3a. The concept of slope needs to be defined with far greater care than TSM
has let on. One has to explain what “slope” tries to measure, how to measure it,
and, most importantly, why this way of measuring it is correct and useful. In Section
4.3 on page 61, there is an extended discussion to this effect. In particular, this is
where the discussion of congruent triangles and similar triangles in Chapter 4 of
[Wu-PreAlg] becomes absolutely essential.
4a. In Sections 4.4 and 4.5 on pages 72 and 76, we will give a careful proof
of why the graph of a linear equation of two variables is a line and why each
line is the graph of some linear equation of two variables. In the process, it will
become obvious how to write down the equation of a line that satisfies any of the
standard geometric conditions. See Section 4.6 on page 78.
5a. Because perpendicularity and parallelism have been defined in Chapter 4
of [Wu-PreAlg], and because slope has been defined in Section 4.3 on page 61, any
assertion about parallelism (or perpendicularity) and slope becomes a theorem to
be proved. We will do exactly that in Sections 5.3 and 5.6 on pages 93 and 109,
respectively.
6a. In Section 7.1 on page 137, we review the definition of constant rate, and
then prove that constant rate is equivalent to the existence of an appropriate linear
function that represents work done over time. In Section 7.2, we closely examine
the possible meanings of proportional reasoning and point out how—by eliminating
it altogether—its purported applications in school mathematics can all be put on
a firm mathematical foundation.
7a. In Section 5.1, we explain precisely why the solutions to a pair of equa-
tions are the set of all the points of intersection of the graphs of the two equations
in question. Such an explanation is possible only because the graph of an equation
has been precisely defined and put to use in reasoning.
8a. In Section 8.4, we define the half-planes of a line and the graph of a linear
inequality. Then in Theorem 8.4 on page 172, we prove that the graph of a linear
inequality is a half-plane of the graph of the associated linear equation.
9a. Section 9.2 re-orients the discussion of rational exponents by assuming
the existence of exponential functions from the beginning. (This is analogous to
the discussion of solving polynomial equations by assuming—at the outset—the
Fundamental Theorem of Algebra. In school mathematics, sometimes a central
theorem has to be taken on faith for pedagogical reasons.) Then we make use
of the characteristic property of the exponential functions (i.e., $a^x \cdot a^y = a^{x+y}$)
to prove that $a^0 = 1$ and $a^{-x} = 1/a^x$. This makes it possible for the following
section (Section 9.3) to present complete proofs of the other laws of exponents for
rational exponents.
10a. Chapter 10 begins with a general discussion of the shape of the graph of
a quadratic function and then shows how the graph can provide a framework for
the understanding of quadratic functions in the same way that straight lines pro-
vide a framework for the understanding of linear functions. The basic technique
here is that of completing the square; it will be seen that this technique unifies the
diverse skills related to quadratic functions.
now becoming the accepted norm. I can only hope that, in the forthcoming years,
better student textbooks will be written so that the CCSSM will finally bring about
better student learning in school algebra.
The major conclusions in this book, as in all mathematics books, are sum-
marized into theorems; depending on the author’s (and other mathematicians’)
whims, theorems are sometimes called propositions, lemmas, or corollaries as a
way of indicating which theorems are deemed more important than others (note
that a formula or an algorithm is just a theorem). This idiosyncratic classification
of theorems started with Euclid around 300 B.C., and it is too late to change now.
The main concepts of mathematics are codified into definitions. Definitions are
set in boldface in this book when they appear for the first time. A few truly basic
definitions are even individually displayed in a separate paragraph, but most of
the definitions are embedded in the text itself. Be sure to watch out for them.
The statements of the theorems as well as their proofs depend on the defini-
tions, and proofs (= reasoning) are the guts of mathematics.
A preliminary suggestion to help you master the content of this book is for
you to
copy out the statements of every definition, theorem, proposi-
tion, lemma, and corollary, along with page references so that
they can be examined in detail if necessary,
and also to
summarize the main idea of each proof.
These are good study habits. When it is your turn to teach your students, be
sure to pass on these suggestions to them. A further suggestion is that you might
consider posting some of these theorems and definitions in your classroom.
You should also be aware that reading mathematics is not the same as reading
a gossip magazine. You can probably flip through such a magazine in an hour, if
not less. But in this book, there will be many passages that require careful reading
and re-reading, perhaps many times. I cannot single out those passages for you
because they will be different for different people. We do not all learn the same
way. What is true under all circumstances is that you should accept as a given
that mathematics books make for exceedingly slow reading. I learned this very
early in my career. On my very first day as a graduate student many years ago,
a professor, who was eventually to become my thesis advisor, was lecturing on
a particular theorem in a newly published volume. He mentioned casually that
in the proof he was going to present, there were two lines in that book that took
him fourteen hours to understand and he was going to tell us what he found out
in those long hours. That comment greatly emboldened me not to be afraid to
spend a lot of time on any passage in my own reading.
If you ever get stuck in any passage of this book, take heart, because that is
nothing but par for the course.
https://doi.org/10.1090//mbk/099/01
CHAPTER 1
Symbolic Expressions
It can be argued that the most basic part of the learning of algebra is learning how
to use symbols correctly. This point of view is eloquently exposed in Chapter 3 of
the National Mathematics Advisory Panel Report [NMP]. If there is any meaning
at all to the phrase “algebraic thinking” in school mathematics, it would be “the
ability to use symbols precisely and fluently”. In this regard, there is a need
to single out the treatment of polynomials in this chapter. In mathematics, a
polynomial is either a polynomial function or an element of the polynomial ring
$R[x]$, where R is the real numbers and “x” is an “indeterminate”. These two
concepts are distinct in general and every book on algebra has to come to grips
with the problem of how to reconcile these two notions. Happily, so long as we
work only with real or complex numbers, these two concepts are essentially the
same.¹ Therefore we can afford to eschew the abstract concept of $R[x]$ and simply
present a polynomial as a polynomial function so that the x in a polynomial
$a_n x^n + \cdots + a_1 x + a_0$ can be taken to be a number. The purpose of this chapter is
to demonstrate how one can do algebra by taking x to be just a number and turn
at least the introductory part of school algebra into generalized arithmetic, literally.
Formal algebra in the sense of $R[x]$ can be left to a later date, e.g., a second course
in school algebra.²
This chapter is thus entirely elementary, and is nothing more than a direct ex-
tension of arithmetic. The exposition therefore intentionally emphasizes its close
affinity to arithmetic. (However, we do take the liberty of making more advanced
mathematical comments in the chapter preambles and, at times, in footnotes in or-
der to round out the picture; it is not necessary to understand the more advanced
comments for the reading of the text proper.) There is a danger, however, that
precisely because of its elementary character, you may take this chapter lightly
because it is “something you already know” and therefore not worthy of deeper
consideration. I would like to explicitly ask you to recognize that what is in this
chapter is genuine algebra, and that, most likely, whatever you think you already
know has been cast here in a new light. For example, whereas “variable” is re-
garded as a gateway concept for the learning of algebra at the time symbols are
introduced in TSM,³ this chapter shows why there is no need to try to learn what
a “variable” might be in order to learn algebra. We will restore simplicity to the
study of symbolic expressions, and simplicity is precisely the reason that algebra
can be taught without any fanfare.
It is not easy to learn to do things simply. It will take effort.
1 $R[x]$ is ring isomorphic to the ring of polynomial functions over R. The same holds if R is
4 Diophantus was a Greek mathematician who lived in Alexandria, Egypt (Alexandria was a
Greek colony named after Alexander the Great). Unfortunately, his dates are unknown other than the
fact that he probably lived in the third century A.D. His influence in the development of mathematics
is considerable, as evidenced by the fact that the terminology of Diophantine equations is standard in
mathematics.
5 A co-discoverer of analytic geometry with Pierre Fermat (1601–1665). He is also an important
6 In mathematics, a variable is an informal abbreviation for “an element in the domain of definition
of a function” or a symbol that represents such, which is of course a perfectly well-defined concept
(see Chapter 7). If, for example, the domain of definition of a function (see page 117) is a set of
ordered pairs of numbers, it is informally referred to as “a function of two variables”, and it must be
said that, in that case, the emphasis is more on the word “two” than on the word “variables”. In the
sciences and engineering, the word “variable” is bandied about with gusto. However, to the extent
that mathematics is just a tool rather than the central object of study in such situations, scientists and
engineers can afford to be cavalier about mathematical terminology. In this volume, we have to be
more careful because we are trying to learn mathematics.
7 Because of FASM (page 265; a longer discussion is in Section 1.8 of [Wu-PreAlg]), all the opera-
then one also never uses a symbol without saying what the symbol stands for.
Here then is what might be called the Basic Protocol in the use of symbols:
Each time one uses a symbol, one must specify precisely what the
symbol stands for.
In a situation where we want to determine which number x satisfies an equality
such as $2x^2 + x - 6 = 0$, the value of the number x would be unknown for the
moment and x is then also called an unknown. In broad outline, this is all there
is to it as far as the use of symbols is concerned.
A closer examination of this usage reveals some subtleties, however. Consider
first the following three cases of the equality xy = yx:
(V1) xy = yx.
(V2) xy = yx for all whole numbers x and y so that 0 ≤ x, y ≤ 10.
(V3) xy = yx for all real numbers x and y.
The statement (V1) has no meaning, because we don’t know what the symbols x
and y stand for. To pursue the analogy with pronouns, suppose someone makes
the statement, “He is 7 foot 6”. Without indicating who “he” refers to, this
statement is neither true nor false.⁸ It is simply meaningless. For example, if x and y in
(V1) are real numbers, then (V1) is true, but there are other mathematical objects
x and y for which (V1) would be false.⁹ There is thus no way to decide if (V1) is
true or false. On the other hand, (V2) is true, but it is a trivial statement because
its truth can be checked by successively letting both x and y be the numbers 0,
1, 2, . . . , 9, 10, and then computing xy and yx for comparison. The statement
(V3) is however both true and more profound. As mentioned implicitly above,
this is the commutative law of multiplication among real numbers. It is either
something you take on faith, or, in some other context,¹⁰ a not-so-trivial theorem
to prove. Thus, despite the fact that all three statements (V1)–(V3) contain
the equality xy = yx, they are in fact radically different statements because the
quantifications (i.e., the precise descriptions) of the symbols x and y are different.
This reinforces the message of the above Basic Protocol that the quantification of
a symbol is critically important.
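The difference that quantification makes can be tried out on a computer. The following is only an illustrative sketch, not part of the text: it checks (V2) by exhausting all 121 pairs, and it exhibits one pair of 2 × 2 matrices (an arbitrary choice, in the spirit of footnote 9) for which xy = yx fails. The helper name `matmul2` is hypothetical.

```python
from itertools import product

# (V2): the quantification is finite, so the statement can be checked
# by trying every pair of whole numbers x, y with 0 <= x, y <= 10.
v2_holds = all(x * y == y * x for x, y in product(range(11), repeat=2))

def matmul2(A, B):
    """Multiply two 2x2 matrices given as nested tuples (hypothetical helper)."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

# One choice of "other mathematical objects" for which xy = yx is false.
X = ((0, 1), (0, 0))
Y = ((0, 0), (1, 0))
XY = matmul2(X, Y)
YX = matmul2(Y, X)

print(v2_holds)   # True
print(XY == YX)   # False
```

No such exhaustive check is available for (V3), since there are infinitely many real numbers; that is one way to see why (V3) is the more profound statement.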
The preceding examples may convey the false message that each time a symbol
is used, it stands for “many” numbers, e.g., all real numbers. It remains to
point out that such is not the case in general. There are many equalities involving
a number x where the x stands only for a finite collection of numbers. For
example, the x in the equality $3x + 7 = 5$ can only be a single number, namely,
$x = -\frac{2}{3}$. This familiar process of “solving an equation” will be discussed in some
detail in Section 3.1 on page 37; it is not as simple as meets the eye. An even more
telling example is the following: let numbers a, b, c be fixed and let $a \neq 0$; then
the number x in the equality $ax + b = c$ is the number
\[ x = \frac{c - b}{a}. \]
We leave the verification of this claim to an exercise, but note that in this case,
not only does x stand for a single number, but also the symbols a, b, c are each
8 It is true if “he” refers to the basketball star Yao Ming, but is false for Woody Allen.
9 For example, if x and y are certain 2 × 2 matrices.
10 Such as the set-theoretical foundation of mathematics.
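The claim above that the x in ax + b = c is the single number (c − b)/a whenever a ≠ 0 can be spot-checked with exact rational arithmetic. This is a minimal sketch under the assumption that a, b, c are rational; the function name `solve_linear` is hypothetical, and a computer check of one case is of course not a substitute for the verification left as an exercise.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Return the number x with a*x + b == c, assuming a != 0 (hypothetical helper)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    return (c - b) / a

# The example from the text: 3x + 7 = 5.
x = solve_linear(3, 7, 5)
assert 3 * x + 7 == 5   # x really does satisfy the equality
print(x)                # -2/3
```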
1.2. EXPRESSIONS AND IDENTITIES 5
Exercises 1.1
Meaning of an expression
A notational convention
Meaning of an identity
Meaning of an expression
It is time to recall that in arithmetic there are many occasions when the use of
symbols is unavoidable. In addition to the commutative law of multiplication,
the statements of the commutative law for addition, the associative laws for addition
and multiplication, and also the distributive law require a similar use of
symbols. In addition, the formulas for the addition, subtraction, multiplication,
and division of fractions likewise cannot be stated without the use of symbols.
We repeat these formulas here to emphasize this point: let $\frac{k}{\ell}$, $\frac{m}{n}$ be arbitrary
rational numbers. In other words, k, $\ell$, m, n are integers and $\ell \neq 0$, $n \neq 0$, and
$m \neq 0$. Then:¹²
\[ \frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}, \]
\[ \frac{k}{\ell} \cdot \frac{m}{n} = \frac{km}{\ell n}, \]
\[ \frac{\ \frac{k}{\ell}\ }{\frac{m}{n}} = \frac{kn}{\ell m}. \]
We emphasize that in each of these formulas, we don’t need to know the exact value
for each of k, $\ell$, m, n, but so long as they are integers, they will have to satisfy
$\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}$, etc. For example, with k = 11, $\ell = -7$, m = 5, and n = 23, the
above formulas imply that
\[ \frac{11}{-7} \pm \frac{5}{23} = \frac{(11 \times 23) \pm (5 \times (-7))}{(-7) \times 23} = -\frac{218}{161} \ \text{or} \ -\frac{288}{161}, \]
\[ \frac{11}{-7} \cdot \frac{5}{23} = \frac{11 \times 5}{-7 \times 23} = -\frac{55}{161}, \]
\[ \frac{\ \frac{11}{-7}\ }{\frac{5}{23}} = \frac{11 \times 23}{5 \times (-7)} = -\frac{253}{35}. \]
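The numerical example above can be confirmed with exact rational arithmetic. The sketch below is only a check of this one instance; the variable `l` stands in for the symbol ℓ of the text, and the use of Python's `fractions` module is a choice made here, not something the text prescribes.

```python
from fractions import Fraction

k, l, m, n = 11, -7, 5, 23   # l plays the role of the symbol "ell" in the text

total      = Fraction(k, l) + Fraction(m, n)
difference = Fraction(k, l) - Fraction(m, n)
product    = Fraction(k, l) * Fraction(m, n)
quotient   = Fraction(k, l) / Fraction(m, n)

# Each arithmetic operation agrees with the corresponding formula.
assert total      == Fraction(k * n + l * m, l * n) == Fraction(-218, 161)
assert difference == Fraction(k * n - l * m, l * n) == Fraction(-288, 161)
assert product    == Fraction(k * m, l * n)         == Fraction(-55, 161)
assert quotient   == Fraction(k * n, l * m)         == Fraction(-253, 35)
```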
As a natural extension of these ideas, we now give some well-known algebraic
identities. The term identity is used in mathematics to indicate, informally, that
an equality is valid for a “large set” of numbers of interest. What “large” means
will be clearly indicated in each situation and, in any case, is usually clear from
the context. The term “identity” is definitely not a well-defined mathematical
concept that requires a 100% precise definition. However, since the meaning
of this term seems at present to be endlessly (and, one might say, unnecessarily)
debated, we will now try to clarify its meaning as best we can. By a number
expression or more simply an expression, in a given collection of numbers
x, y, . . . , w, we mean a number obtained from these x, y, . . . , w and from a
collection of specific real numbers (e.g., 16, 18 , 5, etc.) by the use of
A notational convention
You may have noticed that the above expressions would be ambiguous unless a
notational convention concerning the arithmetic operations among the symbols
is understood. With the help of parentheses, the correct order in carrying out the
arithmetic operations in, for example,
\[ \frac{xy}{xyz + 2} + x^3 (16z - y^2) - z^{21} \]
will always be understood in this convention to mean
\[ \Bigl( xy \cdot \{ (xyz) + 2 \}^{-1} \Bigr) + \Bigl( (x^3) \{ (16z) + (-(y^2)) \} \Bigr) + \Bigl\{ -(z^{21}) \Bigr\}. \tag{1.1} \]
(The notation $\{A\}^{-1}$ for a number A stands for the multiplicative inverse of A;
see page 270 or Section 2.5 of [Wu-PreAlg].) The ungainly sight of (1.1) should
be reason enough for the adoption of this notational convention. Postponing the
exact description of this notational convention to Section 1.4 on page 17 so as not
to disrupt the flow of the exposition, we may roughly describe this convention as
follows: do the multiplication indicated by the exponents first, then the multiplications,
and finally the additions. Recall in this connection that subtraction is nothing but
addition in disguise, i.e., $a - b = a + (-b)$ by definition, for any two rational
numbers a, b (Section 2.3 of [Wu-PreAlg]). Similarly, division is nothing but
multiplication in disguise, i.e., the division in $\frac{xy}{xyz + 2}$ above is nothing other than the
multiplication $xy \cdot (xyz + 2)^{-1}$ (see Section 2.5 of [Wu-PreAlg]).
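Most programming languages build in the same convention (exponents first, then multiplications, then additions), so the agreement between the short form and the fully grouped form (1.1) can be tried out directly. The sketch below assumes rational inputs chosen arbitrarily for illustration; the function names are hypothetical.

```python
from fractions import Fraction

def conventional(x, y, z):
    """The expression as usually written, relying on the notational convention."""
    return x * y / (x * y * z + 2) + x**3 * (16 * z - y**2) - z**21

def fully_parenthesized(x, y, z):
    """The same expression with every operation grouped explicitly, as in (1.1)."""
    inverse = 1 / ((x * y * z) + 2)   # the multiplicative inverse {(xyz) + 2}^{-1}
    return ((x * y) * inverse) + ((x**3) * ((16 * z) + (-(y**2)))) + (-(z**21))

# Arbitrary sample values (an assumption for illustration; xyz + 2 is nonzero here).
x, y, z = Fraction(2, 3), Fraction(-5, 4), Fraction(1, 2)
assert conventional(x, y, z) == fully_parenthesized(x, y, z)
```

Note that division and subtraction appear in the second function only as multiplication by an inverse and addition of a negative, which mirrors the remark that they are multiplication and addition in disguise.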
Meaning of an identity
Now we can give “an approximate definition” of an algebraic identity, or more
simply an identity, as a statement that two given number expressions are equal
for every number in a given collection under discussion (such as all whole numbers,
all positive numbers, or all numbers¹³) allowing for a small set of exceptions.
We emphasize again that an identity is not a precise concept within mathematics
but a piece of terminology used loosely for convenience. In specific situations,
there will be plenty of opportunities to discern what “the given collection under
discussion” is and what the “small set of exceptions” may be. A few examples
will be given below.
13 Recall that a number, or a real number, is just a point on the number line.
The assertion that ab = ba is true for all numbers a and b is an example of an
identity, and so is $\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}$ for all integers k, $\ell$, m, n provided $\ell \neq 0$ and
$n \neq 0$. Right here, we see an identity that makes allowance for the exceptions of
$\ell = 0$ and n = 0. More is true. We have just stated the equality $\frac{k}{\ell} \pm \frac{m}{n} = \frac{kn \pm \ell m}{\ell n}$ for
integers k, $\ell$, m, n, but we know from considerations of rational quotients¹⁴ that this
equality remains true even if k, $\ell$, m, n are arbitrary rational numbers. Therefore,
in this form, this identity is valid for all rational numbers k, $\ell$, m, n provided $\ell \neq 0$
and $n \neq 0$. The fact that the identity remains valid for all real numbers is then a
consequence of FASM.¹⁵ But even here, there are a “small number of exceptions”
to this general identity, namely, $\ell = 0$ and n = 0.
In case it helps to further illustrate the cavalier manner in which the terminology
of identity is used, we give two advanced examples without attempting to
define the relevant concepts. The equality $\log xy = \log x + \log y$ is an identity
for all positive numbers x and y. The equality $1 + \cot^2 x = \csc^2 x$ is an identity
for all numbers x except for all integer multiples of $\pi$.
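For readers who want to see the two advanced examples in action, here is a rough numerical spot check. It uses floating-point arithmetic, so equality is tested only up to a small tolerance (the tolerances are arbitrary choices made here), and a few sample values prove nothing in general. The function names are hypothetical.

```python
import math

def log_identity_holds(x, y):
    """Spot-check log(xy) = log x + log y for positive x and y (up to rounding)."""
    return math.isclose(math.log(x * y), math.log(x) + math.log(y),
                        rel_tol=1e-12, abs_tol=1e-12)

def trig_identity_holds(x):
    """Spot-check 1 + cot^2 x = csc^2 x; fails to evaluate when sin x is 0."""
    s = math.sin(x)
    cot = math.cos(x) / s
    csc = 1 / s
    return math.isclose(1 + cot**2, csc**2, rel_tol=1e-9)

print(log_identity_holds(2.5, 7.0))   # True
print(trig_identity_holds(1.0))       # True

# x = 0 is an integer multiple of pi, one of the exceptions: cot and csc
# are not defined there, and the computation stops with a division by zero.
try:
    trig_identity_holds(0.0)
except ZeroDivisionError:
    print("x = 0 is an exception")
```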
We want to get more interesting identities. Consider the computation of the
square $104^2$, for example. One can compute it directly, of course. But one can also
proceed by appealing to the distributive law, as follows:
14 Recall that these are quotients $\frac{A}{B}$ where A and B are rational numbers. See page 268 of this
volume or Section 2.5 of [Wu-PreAlg].
15 See page 265 of this volume or Section 2.7 of [Wu-PreAlg].
(Note that the preceding computation furnishes a good review of the basic arithmetic
of rational numbers: the distributive law for a difference, $a(b - c) = ab - ac$
for all numbers a, b, c, and the removal of parentheses by $-(a - b) = -a + b$ for
all a, b. See Section 2.4 of [Wu-PreAlg].) Again, we stop the calculation at this
point because it can now be finished in one’s head: 250000 − 3000 + 9 = 247009.
The same computation also leads to:
\[ (a - b)^2 = a^2 - 2ab + b^2 \quad \text{for all numbers $a$ and $b$.} \tag{1.3} \]
It is a good illustration of the power of the symbolic notation, and the attendant
generality the symbolic method brings, to note that identity (1.3) can
be obtained directly from identity (1.2). Indeed, since the identity $(a + b)^2 =
a^2 + 2ab + b^2$ is valid for all numbers a and b, we may replace b by an arbitrary
number −c to get
\[ (a + (-c))^2 = a^2 + 2a(-c) + (-c)^2 = a^2 - 2ac + c^2. \]
Since $a + (-c) = a - c$ by definition, we get $(a - c)^2 = a^2 - 2ac + c^2$, and since c
is arbitrary anyway, we may replace c by b to obtain $(a - b)^2 = a^2 - 2ab + b^2$ for
any numbers a and b. Thus we have retrieved identity (1.3) by way of the identity
(1.2).
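The passage from (1.2) to (1.3) by replacing b with −b can be mirrored in a few lines of code. This is a sketch for illustration only: the function names are hypothetical, the sample rational numbers are arbitrary, and reading 250000 − 3000 + 9 as the expansion of $(500 - 3)^2$ is an inference from the numbers shown above.

```python
from fractions import Fraction
from itertools import product

def square_of_sum(a, b):
    """Right-hand side of identity (1.2): a^2 + 2ab + b^2."""
    return a * a + 2 * a * b + b * b

def square_of_difference(a, b):
    """Identity (1.3), obtained from (1.2) by replacing b with -b."""
    return square_of_sum(a, -b)

# Mental-math instances.
assert square_of_sum(100, 4) == 104**2 == 10816
assert square_of_difference(500, 3) == 250000 - 3000 + 9 == 497**2 == 247009

# A spot check on a few rational numbers (a finite check, not a proof).
samples = [Fraction(-7, 3), Fraction(0), Fraction(5, 8), Fraction(12)]
for a, b in product(samples, repeat=2):
    assert square_of_difference(a, b) == (a - b) ** 2 == a * a - 2 * a * b + b * b
```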
Activity
It follows that 409 × 391 = 160000 − 81 = 159919. The same reasoning carries
over to any two numbers a and b, so that
\begin{align*}
(a + b)(a - b) &= (a + b)a - (a + b)b \\
&= a^2 + ba - ab - b^2 \\
&= a^2 - b^2.
\end{align*}
When the symbolic computation is given in such detail, we see that in the second
line, the commutative law for multiplication was used. We have obtained our
third identity:
\[ (a + b)(a - b) = a^2 - b^2 \quad \text{for all numbers $a$ and $b$.} \tag{1.4} \]
The preceding three identities, (1.2)–(1.4), may be considered the most basic
identities in algebra. Note that their usefulness comes not just from the expansion,
$(a + b)^2 = a^2 + 2ab + b^2$, $(a - b)^2 = a^2 - 2ab + b^2$, etc., but even more so from
the recognition, for example, that in (1.4), the expression $a^2 - b^2$ in the numbers a
and b is equal to a product, $(a + b)(a - b)$. Informally, we may say that the power
of the identities (1.2)–(1.4) often results from reading these identities from right to left,
i.e., for all numbers a and b,
\begin{align*}
a^2 + 2ab + b^2 &= (a + b)^2, \\
a^2 - 2ab + b^2 &= (a - b)^2, \\
a^2 - b^2 &= (a + b)(a - b).
\end{align*}
The last equality, i.e.,
\[ a^2 - b^2 = (a + b)(a - b) \quad \text{for all numbers $a$ and $b$,} \tag{1.5} \]
which is identity (1.4) written backward, is what is known as a factorization or
factoring of $a^2 - b^2$, which merely means expressing $a^2 - b^2$ as a product, in the
same sense that 24 = 3 × 8 is a factorization of 24. Knowing such a factorization
for a number expression involving two arbitrary numbers a and b can be very
useful. Thus, if $a \neq b$, we can simplify the division $\frac{a^2 - b^2}{a - b}$ to
\[ \frac{a^2 - b^2}{a - b} = a + b, \tag{1.6} \]
because $a^2 - b^2 = (a + b)(a - b)$, so that we can cancel the (nonzero) number $a - b$
in the numerator and the denominator. Here then is another identity that holds
for all a and b except when a = b. We explicitly point out that, insofar as a and
b can be rational numbers (say, $\frac{17}{5}$ and $\frac{2}{7}$), we are using the cancellation law for
rational quotients here.¹⁶ One cannot over-emphasize the importance of the role
played by complex fractions or rational quotients¹⁷ in school mathematics.
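The factorization (1.5) and the simplification (1.6) can be tried on the numbers mentioned in the text. The sketch below uses exact rational arithmetic; the function name `difference_of_squares` is hypothetical, and the check covers only these particular values.

```python
from fractions import Fraction

def difference_of_squares(a, b):
    """The product (a + b)(a - b), which identity (1.4) says equals a^2 - b^2."""
    return (a + b) * (a - b)

# 409 x 391 read as (400 + 9)(400 - 9).
assert 409 * 391 == difference_of_squares(400, 9) == 400**2 - 9**2 == 159919

# Identity (1.6) with the rational numbers 17/5 and 2/7 from the text.
a, b = Fraction(17, 5), Fraction(2, 7)
quotient = (a**2 - b**2) / (a - b)
assert quotient == a + b == Fraction(129, 35)

# When a = b the division is not defined, which is the stated exception.
try:
    (a**2 - a**2) / (a - a)
except ZeroDivisionError:
    print("a = b is excluded")
```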
Exercises 1.2
In doing these and subsequent exercises, observe the following basic
rules:
(a) Use only what you have learned so far in this volume.
This is the situation you face when you teach.
(b) Show your work. The explanation is as important as the
answer.
(c) Be clear. Get used to the idea that everything you say has to
be understood.
(1) Let x and y be numbers so that $x \neq y$ and $x \neq -y$. (i) Simplify
$\frac{x}{x + y} + \frac{y}{x - y}$. (ii) Simplify $\frac{1}{x^2 - y^2} - \frac{1}{x^2 + y^2}$.
(2) If a is a number, one can compute ( a2 − 53 a − 23 )( a2 + 53 a − 23 ) by a
straightforward application of the distributive law. Do you see an easier
way to do this computation? (There is more than one way.)
(3) Simplify for all numbers x and y: (i) $(x + y)^2 + (x - y)^2$. (ii) $(x + y)^2
- (x - y)^2$. Observe that $(x + y)^2 - (x - y)^2 \leq (x + y)^2$; in view of
(ii), what do you conclude? (Compare Exercise 13 in Section 2.6 of
[Wu-PreAlg].)
(4) Is the whole number 98767 − 1237 a prime number?
16 See page 269 of this volume or Section 2.5 of [Wu-PreAlg].
17 Both concepts are neglected in TSM (see page xi for the definition of TSM).
1.3. MERSENNE PRIMES AND FINITE GEOMETRIC SERIES 11
879 2 868 879 868 2
(5) Mental math: compute −2 + .
22 22 22 22
(6) Show that for all numbers x, y, and $c \neq 0$,
$|x + y|^2 \le \left(1 + \frac{1}{c^2}\right)|x|^2 + (1 + c^2)|y|^2.$
(7) Can you see why if x and y are any two numbers (in particular, they
could be negative), then $\frac{1}{9}x^2 - \frac{1}{12}xy + \frac{1}{64}y^2 \ge 0$?
(8) For numbers a and b, compute $(a + b)^3$. (There is a generalization of
identity (1.2) for any positive integer n that states
$(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \cdots + \binom{n}{n-1} a b^{n-1} + b^n,$
where the numbers $\binom{n}{r}$ for r = 1, 2, . . . , n − 1 are the binomial coefficients
(see page 266). This is called the binomial theorem (see, e.g., Chapter 11
in Volume II of [Wu-HighSchool]). It would be instructive for you to
check that your result for $(a + b)^3$ coincides with the special case of the
binomial theorem for n = 3.)
A basic identity
Mersenne primes
Finite geometric series
A basic identity
There is an identity that generalizes identity (1.4), that is equally elementary but
has far-reaching applications in mathematics. This time, we start with a symbolic
calculation using the distributive law twice: if a, b are any two numbers, then
$(a^2 + ab + b^2)(a - b) = (a^2 + ab + b^2)a - (a^2 + ab + b^2)b$
$= (a^3 + a^2 b + ab^2) - (a^2 b + ab^2 + b^3)$
$= a^3 - b^3.$
Notice two features in the preceding calculation. First, if we call any of the
products separated by two consecutive +’s a term of the number expression,18 e.g.,
$a^3$, $a^2 b$, $ab^2$, . . . , $b^3$, then the way to remember the expression $a^2 + ab + b^2$ is to
observe that the power of a decreases by 1 and the power of b increases by 1 as
we go through the terms from left to right. Second, the cancellation in the second
line
$(a^3 + a^2 b + ab^2) - (a^2 b + ab^2 + b^3)$
is due to the matching of each term in the first pair of parentheses with a term
in the second pair of parentheses, except for the first term $a^3$ and the last term $b^3$,
so that the only survivors at the end are the two terms $a^3 - b^3$. The same pattern
repeats itself if we multiply $(a^3 + a^2 b + ab^2 + b^3)$ by $(a - b)$. Thus,
$(a^3 + a^2 b + ab^2 + b^3)(a - b) = (a^3 + a^2 b + ab^2 + b^3)a - (a^3 + a^2 b + ab^2 + b^3)b$
$= (a^4 + a^3 b + a^2 b^2 + ab^3) - (a^3 b + a^2 b^2 + ab^3 + b^4)$
$= a^4 - b^4.$
18 Recall that since a subtraction is an addition in disguise, this reference to + includes automat-
If we form the products
$(a^4 + a^3 b + a^2 b^2 + ab^3 + b^4)(a - b),$
$(a^5 + a^4 b + a^3 b^2 + a^2 b^3 + ab^4 + b^5)(a - b),$
the results would be $a^5 - b^5$, $a^6 - b^6$. Let us write these down. For any two
numbers a and b, we have
$(a - b)(a^2 + ab + b^2) = a^3 - b^3,$
$(a - b)(a^3 + a^2 b + ab^2 + b^3) = a^4 - b^4,$
$(a - b)(a^4 + a^3 b + a^2 b^2 + ab^3 + b^4) = a^5 - b^5,$
$(a - b)(a^5 + a^4 b + a^3 b^2 + a^2 b^3 + ab^4 + b^5) = a^6 - b^6.$
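The pattern in these four identities is easy to test numerically. The following is a small sketch in Python, offered only as a sanity check and not as a proof; the helper name `cofactor` and the sample pairs are our own choices.

```python
def cofactor(a, b, n):
    """Return a^n + a^(n-1)*b + ... + a*b^(n-1) + b^n (a sum of n + 1 terms)."""
    return sum(a ** (n - k) * b ** k for k in range(n + 1))


# Spot-check (a - b) * cofactor(a, b, n) == a^(n+1) - b^(n+1) for n = 2, 3, 4, 5,
# which are exactly the four identities displayed above.
samples = [(5, 2), (-3, 7), (10, 10), (0, 4)]
for n in range(2, 6):
    for a, b in samples:
        assert (a - b) * cofactor(a, b, n) == a ** (n + 1) - b ** (n + 1)

print(cofactor(2, 1, 2))  # 4 + 2 + 1
```

Note that the pair (10, 10) is allowed here: in product form the identity holds even when a = b, since both sides are then 0.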
Activity
Mersenne primes
First, we consider identity (1.7) only when a and b are whole numbers. Then of
course $a^n - b^n$ is also a whole number for any positive integer n. It may come
as a surprise that (1.7) has very interesting things to say about prime numbers
in this case. Recall that a whole number ≥ 2 is a prime if it has no divisor other
than 1 and itself (see, e.g., Section 3.1 of [Wu-PreAlg]). Therefore, when two
whole numbers a and b satisfy a − b > 1, (1.7) says that $a^n - b^n$ is never a prime
when n ≥ 2 because it has a − b as a divisor. For example, $25^{41} - 6^{41}$ is not a
prime because, although we don’t know this big number exactly, we know that
19 (= 25 − 6) is a divisor.
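Although we would not want to write this big number out by hand, a computer can, and the divisibility claim can be confirmed directly. The lines below are a sketch in Python (the variable names are our own); they rely only on the fact that Python integers have arbitrary precision.

```python
a, b, n = 25, 6, 41

big = a ** n - b ** n      # exact value of 25^41 - 6^41
print(big % (a - b))       # remainder on division by 19
print(len(str(big)))       # how many decimal digits the number has
```

A remainder of 0 says that 19 is a divisor, and since 19 is neither 1 nor the number itself, the number is not a prime.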
Why is the fact that $a^n - b^n$ is never a prime when n ≥ 2 and a − b > 1 worthy
of attention? Because the study of the integers is a primary concern of a major
branch of mathematics, number theory, and an important part of number theory
is devoted to the understanding of prime numbers. An obvious question about
primes is how to decide, simply, whether a given number is prime or not. Unfortunately,
we have no complete answer to this question yet. There is a silver lining to this
failure, however. If we had a simple way to detect primes, our daily life might
become dramatically different because, for example, banking and online purchasing
would not have evolved the way they did (see, e.g., [Wiki-cryptography]).
Therefore knowing that a large number such as 21560887 − 1 (it has 426 digits!) is not
a prime is something to write home about.
Activity
(a) Explain, without using identity (1.7), why the number 39187 − 35387 is
not a prime. (b) Verify that $29^2 - 28^2 = 57$ by mental math.
$2^2 - 1 = 3,$
$2^3 - 1 = 7,$
$2^5 - 1 = 31,$
$2^7 - 1 = 127,$
$2^{11} - 1 = 2047,$
$2^{13} - 1 = 8191,$
$2^{17} - 1 = 131071.$
On this list, every number is a prime19 except the case of p = 11: 2047 = 23 ×
89. Those numbers of the form $2^p - 1$ which are primes are called Mersenne
primes. Marin Mersenne (1588–1648) was a French monk, a scholar of science and
mathematics, and the central clearinghouse of European science and mathematics
of his time. There were no scholarly journals in those days, but Mersenne, through
his correspondence with the leading scientists and mathematicians of Europe
(including Descartes, Pascal, Fermat, and Huygens), helped disseminate the latest
discoveries to a wider audience. He came upon the primes that are named after
him in his (unsuccessful) search for an expression that would yield only primes.
He claimed that when a prime p is at most 257, then $2^p - 1$ is a prime exactly
when
p = 2, 3, 5, 7, 13, 17, 19, 31, 67, 127, and 257.
It turns out that he was wrong about p = 67 and 257, and he also missed p = 61,
89, and 107 ($2^{61} - 1$, $2^{89} - 1$, and $2^{107} - 1$ are all primes). Nevertheless, the interest
in Mersenne primes has endured.
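Mersenne's claim can be tested by machine. The sketch below is not the method of this text: it uses the Lucas-Lehmer test, a standard primality test for numbers of the form 2^p − 1 with p prime, which we bring in only because trial division is hopeless for a number as large as 2^127 − 1. The function names are our own.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents p considered here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def mersenne_is_prime(p):
    """Lucas-Lehmer test: decide whether 2^p - 1 is prime, for a prime p."""
    if p == 2:
        return True              # 2^2 - 1 = 3; the test below needs p odd
    m = (1 << p) - 1             # the number 2^p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# All primes p <= 257 for which 2^p - 1 is a prime.
exponents = [p for p in range(2, 258) if is_prime(p) and mersenne_is_prime(p)]
print(exponents)

mersenne_list = [2, 3, 5, 7, 13, 17, 19, 31, 67, 127, 257]
print(sorted(set(mersenne_list) - set(exponents)))   # exponents he wrongly included
print(sorted(set(exponents) - set(mersenne_list)))   # exponents he missed
```

The last two lines compare the computed exponents with Mersenne's list, so that the two kinds of error described above (wrong inclusions and omissions) show up separately.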
The overriding fact about Mersenne primes is that it is not known whether
there are an infinite number of them; as of April 2016, only 49 Mersenne primes
are known ([Wiki-GIMPS]). This fact colors everything we have to say about
these primes. There is an online society devoted to the search for Mersenne
primes, the Great Internet Mersenne Prime Search (GIMPS), which has been
responsible for the discovery of all the Mersenne primes since 1997 (see [GIMPS],
also [Wiki-GIMPS]). The largest known Mersenne prime as of April 2016 has
22,338,618 digits; it corresponds to p = 74,207,281 (discovered on January 7, 2016).
Incidentally, this is also the largest known prime number. If we can prove that
there is only a finite number of Mersenne primes, then finding the largest one
would obviously be of great interest.
19 The primality of these numbers (other than 131071) can be decided with a modicum of patience.
The fact that 131071 is a prime was first discovered by Pietro Cataldi (1548–1626) in year 1588.
(1.9)    $\frac{a^{n+1} - b^{n+1}}{a - b} = a^n + a^{n-1} b + a^{n-2} b^2 + a^{n-3} b^3 + \cdots + a b^{n-1} + b^n$
for any a and b, with $a \neq b$, and any positive integer n.
Note that this identity generalizes identity (1.6) on page 10. Now if b = 1 and
$a \neq 1$, then we get (by writing (1.9) backward):
(1.10)    $1 + a + a^2 + \cdots + a^{n-1} + a^n = \frac{a^{n+1} - 1}{a - 1}$    for any number $a \neq 1$.
In this form, identity (1.10) is called a summation formula for the finite geometric
series20 of n + 1 terms in a, $1 + a + a^2 + \cdots + a^{n-1} + a^n$. For example, if a = 5
and n = 11, then (using a calculator!)
$1 + 5 + 5^2 + 5^3 + \cdots + 5^{10} + 5^{11} = \frac{5^{12} - 1}{5 - 1} = \frac{5^{12} - 1}{4}.$
Similarly, if a = −3 and n = 15, then
$1 - 3 + 3^2 - 3^3 + 3^4 - \cdots + 3^{14} - 3^{15} = \frac{3^{16} - 1}{-3 - 1} = -\frac{43046720}{4} = -10761680.$
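And finally, if $a = \frac{3}{4}$ and n = 10, we have
$1 + \frac{3}{4} + \left(\frac{3}{4}\right)^2 + \left(\frac{3}{4}\right)^3 + \cdots + \left(\frac{3}{4}\right)^{10} = \frac{\left(\frac{3}{4}\right)^{11} - 1}{\frac{3}{4} - 1},$
which is equal to
$\frac{16068628}{4194304} = 3.83\ldots.$
As another example,
$3^8 + 3^9 + \cdots + 3^{25} = 3^8 (1 + 3 + \cdots + 3^{17}) = 3^8 \cdot \frac{3^{18} - 1}{3 - 1}$
$= 6561 \times \frac{1}{2}(387420488)$
$= 1270932910884.$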
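Each of these sums can be recomputed exactly, both from the summation formula (1.10) and by adding the terms one at a time. The following Python sketch does so with exact fractions; the names `geometric_sum` and `direct_sum` are our own, and the code is meant as a check on the arithmetic, not as a substitute for the formula.

```python
from fractions import Fraction


def geometric_sum(a, n):
    """Value of 1 + a + a^2 + ... + a^n from formula (1.10); requires a != 1."""
    a = Fraction(a)
    if a == 1:
        raise ValueError("formula (1.10) requires a != 1")
    return (a ** (n + 1) - 1) / (a - 1)


def direct_sum(a, n):
    """The same sum, computed by adding the n + 1 terms one at a time."""
    a = Fraction(a)
    return sum(a ** k for k in range(n + 1))


# The examples of this section: (a, n) = (5, 11), (-3, 15), (3/4, 10).
for a, n in [(5, 11), (-3, 15), (Fraction(3, 4), 10)]:
    assert geometric_sum(a, n) == direct_sum(a, n)
    print(a, n, geometric_sum(a, n))

# The last example: 3^8 + 3^9 + ... + 3^25 = 3^8 * (1 + 3 + ... + 3^17).
print(3 ** 8 * geometric_sum(3, 17))
```

For a = 3/4 the program prints the fraction in lowest terms, which is equal to 16068628/4194304.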
20 The reason for calling such a series “geometric” is obscure, and everybody seems to be—at
best—guessing. The most reasonable guess, to me, is the picture of the sequence of segments in
[MSE].
16 1. SYMBOLIC EXPRESSIONS
Exercises 1.3
(1) If y is a nonzero number, what is $1 + \frac{1}{y} + \frac{1}{y^2} + \frac{1}{y^3} + \cdots + \frac{1}{y^{19}}$?
(2) In the last example of this section, we found $3^8 + 3^9 + \cdots + 3^{25} =$
(3) (a) Sum $5^6 + 5^7 + 5^8 + \cdots + 5^{27}$. (b) Sum $\frac{1}{2^{15}} + \frac{1}{2^{16}} + \frac{1}{2^{17}} + \cdots + \frac{1}{2^{32}}$.
(4) If y is a nonzero number and n is a positive integer, what is
$\frac{1}{y^3} + \frac{1}{y^4} + \cdots + \frac{1}{y^n}$?
(5) Sum $\frac{1}{4^3} - \frac{1}{4^5} + \frac{1}{4^7} - \frac{1}{4^9} + \cdots - \frac{1}{4^{33}}$.
(6) If x is a nonzero number and n is a positive integer, what is
$-1 + \frac{1}{x^3} - \frac{1}{x^6} + \frac{1}{x^9} - \cdots + \frac{1}{x^{(2n-1)3}} - \frac{1}{x^{(2n)3}}$?
1.4. POLYNOMIALS AND ORDER OF OPERATIONS 17
(7) Show that, for any positive integer n, $\frac{1}{2} + \frac{1}{2^2} + \cdots + \frac{1}{2^n} < 1$. (Note: a
popular representation of this inequality is the following picture:
three times and multiplying once. This is true, but the difference in conceptual
clarity between
and
(18 + 23 + 69) × 53
as a product,
(181 + 67 + 96 − 257) × 25 (= 87 × 25 ).
Similarly, we write
24 × 5914 − ( 35 )8 × 89 + (5914 × 73) + (5914 × 66) + 25 × ( 35 )8 + ( 35 )8 × 11
$\frac{1}{2}x^3 + 16 - 8x^2 + \frac{1}{3}x^3 - x^5 - 6x^2 + 75x + 2x^3$
as
$-x^5 + \frac{17}{6}x^3 - 14x^2 + 75x + 16.$
Observe that we have implicitly followed three conventions in writing the latter
sum involving the powers of a fixed number x:
(i) Parentheses are suppressed with the understanding that exponents be computed first,
multiplications second, and additions third. (This is the so-called order of operations,
and was already mentioned on page 7.)
(ii) Powers of x are placed last in each term (so that
instead of $-x^2 14$, we write $-14x^2$).
(iii) The terms are written in decreasing powers21
of the number x in question. (We make the ad hoc
definition in this situation that $x^0 = 1$ regardless
of whether x is 0 or not.22 The term 16 is
then the term $16x^0$; incidentally, this is where we need the concept of the zeroth
power of x.)
The order of operations is just a convention and, like all other
conventions in mathematics, it has no mathematical substance.
x clearly doesn’t count. This polynomial has degree 5, and not 37 (and not any
whole number different from 5, for that matter.) Moreover, −1 is the coefficient
of $x^5$, 0 is the coefficient of $x^4$, and −14 is the coefficient of $x^2$, because, strictly as
a sum of the powers of x, this polynomial is, in reality,
$(-1)x^5 + 0x^4 + \frac{17}{6}x^3 + (-14)x^2 + 75x + 16x^0.$
Similarly, 16 is the coefficient of $x^0$.
As is well known, a polynomial of degree 1 is called a linear polynomial, and
that of degree 2 is called a quadratic polynomial. Because a general quadratic
polynomial has only three terms $ax^2 + bx + c$ (where a, b, and c are constants),
it is sometimes called a trinomial in school mathematics. It must be said that
the terminology of “trinomial” is not one that is used in advanced mathematics,
so you should avoid using it as much as possible. We will discuss quadratic
polynomials in some detail in the last chapter (Chapter 10). A polynomial of
degree 3 is called a cubic polynomial.
There is no reason why we must restrict ourselves to polynomials in one
variable. If x, y, z, etc., are numbers, then sums of multiples of the products of
21 This is a good rule most of the time but not all the time. There will be times when we want to
23 The obsession in TSM (see page xi) with order of operations has no mathematical merit; this
terminology (order of operations) is in fact unknown to most working mathematicians. For a fuller
discussion of the issues involved, see [Wu2004].
number, and the third because $\frac{2}{3}$ is not a whole number. However, if we choose
to rewrite the first of these three polynomials in 10 as
$(3 \times 10^3) + (5 \times 10^2) + (2 \times 10^1) + (8 \times 10^0),$
then it would be the expanded form of 3528.
In the same vein, the so-called complete expanded form of a finite decimal
with any nonzero decimal digits, such as 32.58,
$(3 \cdot 10^1) + (2 \cdot 10^0) + (5 \cdot 10^{-1}) + (8 \cdot 10^{-2}),$
is not a polynomial in 10, for the reason that it contains negative powers of 10.
Because polynomials are just numbers, we can add, subtract, multiply, and
divide them as usual. With the exception of division, the other three arithmetic
operations produce another polynomial in a routine manner.
Activity
Take a few minutes to verify the preceding statement that the sum, differ-
ence, and product of two polynomials in the same number x are polynomials
in x.
Division of polynomials does not generally produce a polynomial and will be
looked at separately in the next section.
is nothing but routine applications of the distributive law. However, when this
equality is read backward, it becomes the statement that the sum of the three
terms on the right of (1.11) is actually equal to the product of the two linear
polynomials on the left, i.e.,
(1.12)    $acx^2 + (ad + bc)x + bd = (ax + b)(cx + d).$
This is not a priori obvious. For example,
$15x^2 + 172x - 96 = (15x - 8)(x + 12).$
In general, if the polynomials p(x), q(x), and r(x) in x satisfy p(x) = q(x)r(x),
then we say that q(x)r(x) is a factorization of p(x) if the degrees of both q(x)
and r(x) are positive; the polynomials q(x) and r(x) are called the factors of the
polynomial p(x). (Thus $\frac{5}{3}x^3 - 2x^2 + \frac{2}{3} = \left(\frac{1}{3}\right)(5x^3 - 6x^2 + 2)$ is not a
factorization of $\frac{5}{3}x^3 - 2x^2 + \frac{2}{3}$, because the degree of $\frac{1}{3}$ is zero.) Compare the comments
made in connection with identity (1.5) on page 10.
Do not read an identity only from left to right; be aware that
the right side is also equal to the left side.
In this terminology, the equation (1.12) gives a factorization of $acx^2 +
(ad + bc)x + bd$ as a product (ax + b)(cx + d), where it is understood that
$a \neq 0$ and $c \neq 0$. For example, we get
$\frac{1}{2}x^2 + \frac{5}{4}x - 3 = (2x - 3)\left(\frac{1}{4}x + 1\right)$
by letting a = 2, b = −3, $c = \frac{1}{4}$, and d = 1. With some practice, the factorization
of $\frac{1}{2}x^2 + \frac{5}{4}x - 3$ can be done directly. One way is the following. Since it is
much easier to deal with integers rather than rational numbers, we rewrite the
polynomial by using the distributive law to take out the denominators of all the
coefficients, as follows:
$\frac{1}{2}x^2 + \frac{5}{4}x - 3 = \frac{1}{4}(2x^2 + 5x - 12).$
Then we recognize that
$2x^2 + 5x - 12 = (2x - 3)(x + 4)$
because, assuming there is such a factorization into polynomials with integer coefficients,
we learn from equation (1.12) that the zero-degree term (i.e., −12) of $2x^2 + 5x - 12$
has to be the product of two integers that are the zeroth degree terms of the
factors: thus ±3 and ∓4, or ±2 and ∓6, or ±1 and ∓12. Likewise, the coefficient
2 of $x^2$ in $2x^2 + 5x - 12$ has to be the product of the coefficients of x in the factors: ±2
and ±1. Finally the coefficient 5 of x in $2x^2 + 5x - 12$ has to be the sum of the “cross
products” of these four numbers in the sense of (ad + bc) in (1.11) above. So a
little trial and error should get it done. Hence, we obtain
$\frac{1}{2}x^2 + \frac{5}{4}x - 3 = \frac{1}{4}(2x^2 + 5x - 12) = \frac{1}{4}(2x - 3)(x + 4),$
which is the same factorization as above.
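Both forms of this factorization can be checked by multiplying the factors back out. Here is a minimal Python sketch of that check; it represents a polynomial by the list of its coefficients (index k holds the coefficient of the k-th power of x), a convention we adopt only for this illustration, and the name `poly_mul` is our own.

```python
from fractions import Fraction as F


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj     # the distributive law, term by term
    return out


target = [F(-3), F(5, 4), F(1, 2)]                  # (1/2)x^2 + (5/4)x - 3

# First form: (2x - 3)((1/4)x + 1).
first = poly_mul([F(-3), F(2)], [F(1), F(1, 4)])

# Second form: (1/4)(2x - 3)(x + 4).
second = [F(1, 4) * c for c in poly_mul([F(-3), F(2)], [F(4), F(1)])]

print(first == target, second == target)
```

The two forms agree because (1/4)(x + 4) is the same polynomial as (1/4)x + 1.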
At present, the teaching of factoring quadratic polynomials with integer coef-
ficients figures prominently, not to say obsessively, in a typical algebra course. For
this reason, some perspective on this subject is called for. All that those exercises
in factoring
$Ax^2 + Bx + C = (ax + b)(cx + d)$
can do for students is to help them learn to decompose two whole numbers A
and C into products of integers A = ac and C = bd so that B = ad + bc, i.e.,
There is no denying that beginning students ought to acquire some facility with
decomposing integers into products of other integers. It is also important that
they can effortlessly factor a simple quadratic polynomial such as $x^2 + 2x - 35$
into ( x + 7)( x − 5). But it often happens that although a little bit of something is good,
a lot of it can actually be bad for you. (Think of fluoride in your drinking water.) This
seems to be the case here: the teaching of a small skill gets blown up to be a
major topic, with the consequence that other topics that are more central and
more substantial (such as learning about the graphs of linear equations, solving
constant rate problems correctly, or the effective use of completing the square) get
slighted. The teaching of algebra should avoid this pitfall. Please also keep in
mind the fact that once the quadratic formula becomes available (see Theorem 10.3 on
page 234), there will be an algorithm to accomplish this factorization (in all the cases
where factoring is possible) no matter what the coefficients of the quadratic polynomial
may be.
We give one more illustration of the multiplication of polynomials where each
step except the last makes use of the distributive law:
$\left(5x^3 - \frac{1}{2}x\right)(x^2 + 2x - 4) = \left(5x^3 - \frac{1}{2}x\right)x^2 + \left(5x^3 - \frac{1}{2}x\right)2x - \left(5x^3 - \frac{1}{2}x\right)4$
$= \left(5x^5 - \frac{1}{2}x^3\right) + (10x^4 - x^2) - (20x^3 - 2x)$
$= 5x^5 + 10x^4 - \frac{41}{2}x^3 - x^2 + 2x.$
Now, reading this equality backward gives a factorization that is (for a change)
not so easy:
$5x^5 + 10x^4 - \frac{41}{2}x^3 - x^2 + 2x = \left(5x^3 - \frac{1}{2}x\right)(x^2 + 2x - 4).$
Note the fact that if p( x ) and q( x ) are polynomials of degree m and n, re-
spectively, then the degree of the product p( x )q( x ) is (m + n). In other words,
the degree of a product is the sum of the degrees of the individual polynomial factors. For
example, the preceding calculation which multiplies a degree 3 polynomial with
a degree 2 polynomial yields a polynomial of degree 5 (= 3 + 2).
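The preceding multiplication, and the statement about degrees, can be confirmed mechanically. The sketch below is in Python and is only an illustration: polynomials are stored as coefficient lists (index k holds the coefficient of the k-th power of x), and the names `poly_mul` and `degree` are our own.

```python
from fractions import Fraction as F


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def degree(p):
    """Largest power of x with a nonzero coefficient (p must not be identically 0)."""
    return max(k for k, c in enumerate(p) if c != 0)


p = [F(0), F(-1, 2), F(0), F(5)]      # 5x^3 - (1/2)x
q = [F(-4), F(2), F(1)]               # x^2 + 2x - 4

product = poly_mul(p, q)
print(product)                         # coefficients of x^0, x^1, ..., x^5
print(degree(p), degree(q), degree(product))
```

Reading the printed list from the right gives the coefficients 5, 10, −41/2, −1, 2, 0 of the product, in decreasing powers of x.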
Activity
Discuss whether the sum of two n-th degree polynomials is always an n-th
degree polynomial.
Exercises 1.4
Activity
which is equal to $16\frac{5}{24}$, by the formulas for rational quotients (page 270). In
general, no matter what x may be, we can likewise compute with rational expressions:
and
$\frac{x^2 + 1}{x^2 + 4x - 7} \cdot \frac{6}{3x^4 - 5} = \frac{(x^2 + 1)(6)}{(x^2 + 4x - 7)(3x^4 - 5)}$
and
$\frac{\;\frac{2x + 1}{x^2 - 3}\;}{\;\frac{4x^3 - x + 11}{2x}\;} = \frac{(2x + 1)(2x)}{(x^2 - 3)(4x^3 - x + 11)}.$
These are just computations with rational quotients. At the risk of belaboring a point,
we emphasize that these computations are exactly the same as those with rational
quotients and not just “analogous to” them. There is so much in introductory
algebra that is just a revisit of arithmetic.
Because the cancellation law is valid for rational quotients (i.e., $\frac{AB}{AC} = \frac{B}{C}$ for all
rational numbers A, B, and C, with $A \neq 0$ and $C \neq 0$),24 some rational expressions
can be simplified. Sometimes the cancellation presents itself, as in
Here, the nonzero number $(5x^4 - x^3 + 2)$ in both the numerator and denominator
can be cancelled,25 resulting in
Sometimes, the cancellation can be less obvious. For example, the rational expres-
sion
$\frac{x^3 - 8}{x^2 + 2x + 4}$
can be simplified to x − 2 because, by identity (1.8) on page 12,
$x^3 - 8 = x^3 - 2^3 = (x - 2)(x^2 + 2x + 4)$
and we can cancel the nonzero number $(x^2 + 2x + 4)$ from the numerator and
denominator. (As we will see when we come to Chapter 10, more precisely page
232, it turns out that $x^2 + 2x + 4$ is never equal to 0. Therefore, we actually have
an identity $\frac{x^3 - 8}{x^2 + 2x + 4} = x - 2$ for all x.)
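A few sample values make this simplification concrete. The following Python sketch evaluates both sides exactly at several rational numbers; the sample values and the function name `rational_expression` are our own, and agreement at finitely many points is evidence for the identity, not a proof of it.

```python
from fractions import Fraction


def rational_expression(x):
    """Value of (x^3 - 8) / (x^2 + 2x + 4) at the number x."""
    return (x ** 3 - 8) / (x * x + 2 * x + 4)


samples = [Fraction(-5), Fraction(-1), Fraction(0), Fraction(2), Fraction(7, 3)]
for x in samples:
    assert x * x + 2 * x + 4 != 0          # the denominator is nonzero here
    assert rational_expression(x) == x - 2
    print(x, rational_expression(x))
```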
Exercises 1.5
In each of the following exercises, x and y are numbers.
(1) Compute and simplify:
$\frac{x^2}{x^4 - 16} - \frac{x - 2}{x^3 + 2x^2 + 4x + 8}.$
(2) If x is a number different from 2, −3, and −1, what is
$\frac{2}{x - 2} + \frac{3}{x + 3} - \frac{1}{x + 1} = ?$
(3) If x is a number that makes all the denominators nonzero in the following, simplify:
$\frac{\;\frac{2x^3 - 9x^2 - 5x}{(x - 2)^2}\;}{\;\frac{x^2 - 3x - 10}{x^4 - 16}\;}.$
(4) Simplify: (i) $\frac{15x^3 y^4}{-60x^2 y^7}$. (ii) $\frac{4x^4 - 9y^4}{4x^4 + 12x^2 y^2 + 9y^4}$.
(5) Simplify: $\frac{x^4 - 16}{x^2 - 4} \cdot \frac{3x + 6}{x^3 + 2x^2 + 4x + 8}$.
https://doi.org/10.1090//mbk/099/02
CHAPTER 2
Translation
of Verbal Information
into Symbols
Word problems are the bugbears of students (and some teachers too). Part of this
difficulty stems from a habit that was probably acquired in elementary school
from some of their teachers and textbooks. Students learn to skip the crucial
step of trying to understand what the problem is about and look instead for so-
called “key words” in order to make the replacement of words by symbols into
an automatic, rote skill. Thus, “increase by” becomes +, “less than” becomes −,
“of” becomes ×, etc. (Google “key words math” to get an idea of the extent of
this phenomenon.)
This chapter confronts the key word syndrome head-on. We recognize that stu-
dents’ difficulty with solving word problems can be separated into three stages:
the first stage is reading the text carefully to know what the problem is about,
the second stage is the translation of verbal information into symbols, and the
third stage is the extraction of the solution from the symbolic statements, be
they equations or inequalities. The need for such a separation does not seem
to be widely recognized at present in school mathematics education. Many teach-
ers—after routinely writing down the equations associated with a problem by the
“key words” method—spend most of their effort on the skill of solving equations,
i.e., the third stage. Their students learn to follow suit. Consequently, many stu-
dents fail to learn the most fundamental aspect of algebra, namely, the proper
use of symbols to capture an abstract thought or—in the case of solving word
problems—translate verbal information accurately into equations or inequalities.
For the purpose of good mathematics education, we should, and must, reverse
this trend and promote the importance of the translation process. In this chapter,
we will address—exclusively—the first and second stages (reading carefully and
translating accurately) and leave the third stage to later chapters (see Chapters 3,
5, and 8–10).
equations all your lives? Perhaps. But before giving the definition, let us see what
an equation is supposed to be in TSM.1 First x is a “variable”, which means it is
some “quantity” and all you know is that it varies. Then when something like
3x − 5 = 7x − 1
is given, you immediately set about “solving” it by going through the motions.
What is 3x − 5, and what is 7x − 1? Both are combinations of a “variable” and
some numbers, and therefore both “vary” so that you don’t know what they
are. In what sense can such combinations be “equal”? Yet you are supposed to
accept that they are somehow “equal” and go about computing with them as if
they were plain, ordinary numbers. Are you making any sense? Is mathematics
so incomprehensible that it is reduced to a collection of symbolic manipulations
devoid of any meaning, and you just go through them because “this is what it
takes to get the right answer”?
Such thoughts should give you pause and make you feel uneasy about teach-
ing your own students by recycling the same unfathomable TSM that you were
subjected to. It is time to take a fresh look at what an equation is, get it right, and
try to teach your own students better.
An equation in x is a question asking which numbers x would make two given
expressions in x equal. Therefore the symbolic statement
3x − 5 = 7x − 1
is nothing more than an abbreviation of the question:
For which numbers x are the two expressions in x, 3x − 5 and 7x − 1, equal?
Any number x that makes the expressions equal is called a solution. In this
terminology, to solve an equation is to obtain all the solutions of the equation.
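The definition can be read quite literally as a test that a given number either passes or fails. The short Python sketch below applies that test for the equation 3x − 5 = 7x − 1 to a handful of candidate numbers; the function name `is_solution` and the list of candidates are our own choices. Trying finitely many numbers can exhibit a solution, but it cannot by itself show that there are no others.

```python
from fractions import Fraction


def is_solution(x):
    """Does the number x make the two expressions 3x - 5 and 7x - 1 equal?"""
    return 3 * x - 5 == 7 * x - 1


# Candidates: all halves k/2 between -5 and 5.
candidates = [Fraction(k, 2) for k in range(-10, 11)]
solutions = [x for x in candidates if is_solution(x)]
print(solutions)
```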
For example, suppose one expression is 3x − 5 and the other is 7x − 1. The
equation 3x − 5 = 7x − 1 asks for the collection of (all the) numbers x so that
3x − 5 = 7x − 1. It is not difficult to see that the only solution in this case is
−1, and we will discuss the solution process in the next section. In textbooks
and education materials, the whole question is usually presented as the following
symbolic statement with no preamble:
Solve 3x − 5 = 7x − 1.
Notice that such a statement violates the Basic Protocol in the use of symbols
because the symbol x has not been quantified and we have no idea what it is.
An equation in x is a question asking which numbers x make
two given expressions in x equal.
Does this mean that each time we see an equation, we must repeat the cumbersome
statement about “for which collection of numbers x is it true that 3x − 5 =
7x − 1?” No, not if an equation has been clearly defined, and understood, from
the beginning to be the abbreviation of that question. Therefore, after students
have come to terms with what an equation is and what it means to solve an
equation, you will be able to properly employ the time-honored, cryptic
shortcut: “Solve 3x − 5 = 7x − 1.” At that point, we hope there will be no
misunderstanding about an equation being the abbreviation of a question. Before
reaching that point, however, it is a good idea to remind ourselves of the real
2 By the so-called factor theorem (see e.g., Section 11.1 in Volume II of [Wu-HighSchool]), it is not
difficult to prove that there can be no more than three solutions. Thus $\frac{1}{2}$, −1, and −2 are the only
solutions of $x^3 - 1 = -\frac{5}{2}x^2 - \frac{1}{2}x$.
30 2. TRANSLATION OF VERBAL INFORMATION INTO SYMBOLS
Exercises 2.1
(1) (You may freely make use of Theorem 9.2 on page 201 in order to do this
exercise.) (i) Does the equation $x^2 + 2x + 1 = -4$ in the number x have
solutions? Why? (ii) Does the equation $x^2 + 2x = -4$ have solutions?
Why? (iii) Does the equation $x^2 - 6x + 7 = 0$ have solutions? Why?
(iv) Does the equation in the numbers x and y, $x^2 + y^2 - 4y = -9$, have
solutions? Why?
3 1
(2) Does the equation 4x− 2 = 2x +3 have a solution? Why?
must add another inequality, namely, 10 − (10x + 9y) ≥ 0. We combine these two
inequalities into the following double inequality:3
0 ≤ 10 − (10x + 9y) < x.
Similarly, switching to the other pastry and replacing x by y, we also get
0 ≤ 10 − (10x + 9y) < y.
We apply the same considerations to the second option, that Erin buys “13 of one
and 6 of the other”. Altogether, we have the following collection of four double
inequalities that completely captures the verbal information:
0 ≤ 10 − (10x + 9y) < x, 0 ≤ 10 − (10x + 9y) < y,
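[Figure: the segment from City A to City B, divided at the meeting point; the First Woman's distances Lx and 4L are marked above the segment, and the Second Woman's distances 9R and Rx below it.]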
The distance between City A and City B is fixed, and both the First Woman
and Second Woman walked this distance in the time given. Let the First Woman
walk—at a constant speed of L mph—x hours before noon, and then another
4 hours (from noon till 4 pm), and let the Second Woman walk—at a constant
speed of R mph—x hours before noon and then another 9 hours (from noon till 9
pm). So the First Woman walked a total of x + 4 hours while the Second Woman
walked a total of x + 9 hours. Given that the former walked with speed L mph,
the total distance she walked in x + 4 hours is of course L( x + 4) miles. Similarly
the total distance the Second Woman walked in x + 9 hours is R( x + 9) miles. By a
previous remark, both distances are the same since both women walked between
cities A and B. Therefore we get
(2.1) L ( x + 4) = R ( x + 9).
This is supposed to be the equation we have to solve. But is it?
The answer is not quite, because all that this equation says is that, after the
First Woman walked x + 4 hours and after the Second Woman walked x + 9, they
had both covered the same distance. What this equation fails to capture is the
information that, at noon, the two women met at a certain point between A and B,
and that at the time of the meeting both had walked exactly x hours in opposite
directions from City A and City B. This means that the total distance the two of
them had covered after walking x hours (which is Lx + Rx miles) is equal
to the distance between the cities, which is L(x + 4) or R(x + 9) miles, as we
have seen. Therefore the additional piece of information that must be incorporated
into the symbolic translation is Lx + Rx = L(x + 4) or, what is the same,
Lx + Rx = R(x + 9). (In view of (2.1), it makes no difference which of the two is
used.) Therefore, it takes the following two equations to completely capture the
verbal information embedded in the problem:
(2.2)    L(x + 4) = R(x + 9),    R(x + 9) = Lx + Rx.
One should always check to make sure that the symbolic
translation has completely captured the verbal information.
Now one may also reason slightly differently. Let the meeting point of the
two women be C:
[Figure: the segment AB with the meeting point C marked between A and B.]
Consider the distance between A and C. The First Woman covered it in x hours
(before noon) while the Second Woman covered it in 9 hours (after noon). But in
x hours the First Woman walked Lx miles, while in 9 hours the Second Woman
walked 9R miles. Therefore Lx = 9R. Similarly, if we consider the distance
between C and B, we get in exactly the same fashion that 4L = Rx. The following
set of equations therefore also faithfully captures the verbal information of the
problem:
(2.3) Lx = 9R, 4L = Rx.
We will leave as an exercise to show that the two sets of equations, (2.2) and (2.3),
are “the same”, in a precise sense. See Exercise 1 immediately following.
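Before attempting a proof, it may help to see the two sets of equations agree on concrete numbers. The Python sketch below tests a few triples (L, R, x); all of the sample values are hypothetical and were chosen by us only for illustration, as were the function names. Agreement on a handful of triples is of course no substitute for the argument asked for in the exercise.

```python
def satisfies_2_2(L, R, x):
    """Do L, R, x satisfy both equations in (2.2)?"""
    return L * (x + 4) == R * (x + 9) and R * (x + 9) == L * x + R * x


def satisfies_2_3(L, R, x):
    """Do L, R, x satisfy both equations in (2.3)?"""
    return L * x == 9 * R and 4 * L == R * x


# Hypothetical sample values (L mph, R mph, x hours); not taken from the text.
trials = [(3, 2, 6), (6, 4, 6), (3, 2, 5), (1, 1, 6), (5, 2, 3)]
for L, R, x in trials:
    print((L, R, x), satisfies_2_2(L, R, x), satisfies_2_3(L, R, x))
```

In each row the two truth values coincide: a triple satisfies (2.2) exactly when it satisfies (2.3), at least for these samples.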
Exercises 2.2
(2) The sum of the squares of three consecutive integers exceeds three times
the square of the middle integer by 2. If the middle integer is x, express
this fact in terms of x. If the smallest of the three integers is y, express
the same fact in terms of y.
(3) Paulo read a number of pages of a book with N pages, then he read
43 pages more and finished three-fifths of the book. If p is the number
of pages Paulo read the first time, write an equation using p and N to
express the above information.
(4) A whole number has the property that when the square of half this num-
ber is subtracted from 5 times this number, we get back the number itself.
If y is this number, write down an equation for y.
(5) Helena buys two books. The total cost is 49 dollars, and the difference of
the squares of the prices is 735. If the prices are x and y dollars, express
the above information in terms of x and y. (See Exercise 5 on page 93.)
(6) I have two numbers x and y. Take 20% of x from x, then what remains
would be 7 less than y. If however I enlarge y by 20%, then it would
exceed x by 8. Express this information in equations in terms of x and y.
(See Exercise 6 on page 93.)
(7) I have $4.60 worth of nickels, dimes, and quarters. There are 40 coins
in all, and the number of nickels and dimes together is three times the
number of quarters. If N, D, and Q denote the number of nickels, dimes,
and quarters, respectively, write equations in terms of these symbols to
capture the given information.
(8) We have two whole numbers. The division-with-remainder of the larger
number by the smaller number has quotient 9 and remainder 15. Also,
the larger number is 97.5% of ten times the smaller number. Let the
larger number be x and the smaller number be y. Express the given
information in equations in terms of x and y. (See Exercise 7 on page
93.)
(9) We look for two whole numbers so that the larger exceeds the smaller
by at least 10, but that the cube of the smaller exceeds the square of the
larger number by at least 500. If the larger number is x and the smaller
number is y, translate the above information in terms of x and y.
(10) If the digits of a three-digit number are reversed, the sum of the new
number and the original number is 1615. If 99 is added to the original
number, the digits of the original number are reversed. Let the hundreds,
tens, and ones digits of the original numbers be a, b, and c, respectively.
Write equations in a, b, and c to express the given information. (Caution:
Be very careful with the writing of your symbolic expressions.)
(11) A sum of money is to be divided equally among x people, each receiving
y dollars. If there were 3 more people, each person would receive 1 dollar
less, and if there were 6 fewer people, each would receive 5 dollars more.
Write equations in x and y to express this information. (See Exercise 8
on page 93.)
(12) The denominator of a fraction exceeds twice the numerator by 2, and the
difference between the fraction and its reciprocal is $\frac{55}{24}$. If the numerator
2.2. SOME EXAMPLES OF TRANSLATION 35
CHAPTER 3
38 3. LINEAR EQUATIONS IN ONE VARIABLE
Formally, a linear equation of one variable asks for all the numbers x that
make two given polynomials in x of degree at most 1 equal. Examples are: 12x −
7 = 5x + 13, − 56 x + 1 = 23x − 4, and 9 = 27x − 4.
Now, you may feel that these are equations that you can solve with one eye
closed. Nevertheless, we are going to make you feel some discomfort by carefully
analyzing the usual procedure taught in TSM for solving such equations, step by
step. Then we will ask you to decide if it makes any sense. Let us first look at
how a simple equation such as 2x − 3 = 4x is solved according to TSM.
Step 1: 2x − 3 = 4x
Step 2: (2x − 3) − 2x = 4x − 2x
Step 3: −3 = 2x
Step 4: 2x = −3
Step 5: x = −3/2
How are Step 2 and Step 5 justified? Let us concentrate on Step 2 first. Don’t
forget that in TSM, x is a variable, something that varies. Not knowing precisely
what a variable is, we cannot say what it means in Step 1 for 2x − 3 and 4x
to be equal, much less why the equality is undisturbed when (−2x ) is added to
both sides in Step 2. For example, if x can “vary”, then what about if x = 5?
In that case, the left side is 7 while the right side is 20, and the two sides are
definitely not equal. So once again, what does it mean for 2x − 3 and 4x to be
equal? Without answering this question, TSM nevertheless proposes that, if such
an equality is there, then adding the same object, −2x, to both 2x − 3 and 4x will
preserve the equality. This is a questionable attempt to imitate what Euclid wrote
some twenty-three centuries ago:
Of course even TSM tries to be persuasive. So it tries to compensate for the lack
of understanding by setting up an analogy. Imagine that we have a balance scale
and on the two sides of the balance are 2x − 3 and 4x, which balance each other
out.
[Figure: a balance scale with 2x − 3 on one side and 4x on the other; −2x is placed on each side.]
It seems reasonable that putting −2x on both sides will not “tip the balance”, and
this explains Step 2.
In case this is unconvincing, TSM presents a second strategy that makes use
of algebra tiles to “model” this solution of 2x − 3 = 4x. Let a green rectangle
model a variable and a red square model −1. Then the two sides of the equation
3.1. SOLVING LINEAR EQUATIONS 39
2x − 3 = 4x are modeled by the algebra tiles on both sides of the dotted line
below:
It seems “natural” that, if we remove two green tiles on the left (i.e., adding −2x)
and also remove two green tiles on the right (indicated by the two arrows on each
side), the state of equality between the two sides remains undisturbed. This is
how we arrive at Step 2 above.
These analogies are useful psychological ploys to win students’ trust, but in
mathematics, we cannot replace logical reasoning by offering suggestions of why
something might be true on account of analogies. Since no advanced mathematics
or science can be done this way, it would be unfair to make school students go
down this slippery slope. All the more so when the correct way of solving an
equation is so simple to explain.
[Margin note: A teacher has to be able to explain what it means to solve an equation without using balance scales or algebra tiles.]
(A) Take the transition from Step 2a to Step 3a. The reason that the right side
of Step 3a is correct is clear: it is a simple application of the distributive law:
4x0 − 2x0 = (4 − 2) x0 = 2x0 .
Observe that if we say 4 · 173 − 2 · 173 = (4 − 2)173, then there is no need for the
distributive law: the left side and the right side are both equal to 346 by a direct
computation. But what we are claiming is that, without knowing what x0 is, we can
nevertheless assert that 4x0 − 2x0 = (4 − 2) x0 . Then the only way we can justify
this is by invoking the distributive law. In the same vein, let us examine closely
how to arrive at the left side of Step 3a:
(2x0 − 3) − 2x0 = (2x0 + (−3)) + (−2x0 ) (by definition of subtraction)
= (2x0 + (−2x0 )) + (−3) (Theorem 1 in Section 1.11
in [Wu-PreAlg])
= 0 + (−3) = −3.
We have just made use of the commutative and associative laws of addition in the
second line (see the Appendix (Section 1.11) of Chapter 1 in [Wu-PreAlg]). When
we do computations with specific (known) numbers, the use of the commutative
or associative law is unnecessary. For example, we hardly need the associative
and commutative laws to justify (15 − 7) − 16 = (15 − 16) − 7, because the left
side is 8 − 16 = −8 and the right side is −1 − 7 = −8. But now look at
(2x0 + (−3)) + (−2x0 ) = (2x0 + (−2x0 )) + (−3).
Here we don’t know the specific value of x0 so that no explicit computation is possi-
ble. How can we claim that the two sides are indeed equal except by invoking
Theorem 1 in the Appendix of Chapter 1 in [Wu-PreAlg] (see page 270 in this
volume)?
It is only when we do algebra and have to compute with unknown numbers
that the import of the general laws (associative and commutative laws of + and
×, and the distributive law) begins to be apparent. Each time we solve an equation,
we depend crucially on these laws.
(B) What has been accomplished in Steps 1a to 5a is that if we know there is a
solution x0 to 2x − 3 = 4x, then x0 = −3/2. This has nothing to say about whether
−3/2 is a solution of 2x − 3 = 4x or not. Of course, there is a simple way to check
whether such is the case: if x0 = −3/2, then
\[ 2\left(-\frac{3}{2}\right) - 3 = -3 - 3 = -6 \]
and also
\[ 4\left(-\frac{3}{2}\right) = 2(-3) = -6. \]
Thus indeed 2x0 − 3 = 4x0, and the solution of 2x − 3 = 4x is −3/2.
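The direct check can also be carried out mechanically. The following is a minimal sketch in Python, using exact rational arithmetic from the standard `fractions` module; the name `is_solution` is an illustrative choice and does not come from the text.

```python
from fractions import Fraction

def is_solution(x0):
    """Return True if the number x0 satisfies 2*x0 - 3 == 4*x0."""
    return 2 * x0 - 3 == 4 * x0

candidate = Fraction(-3, 2)
left_side = 2 * candidate - 3    # equals -6
right_side = 4 * candidate       # equals -6
print(left_side, right_side, is_solution(candidate))

# A number such as 5 does not satisfy the equation: the two sides are 7 and 20.
print(is_solution(Fraction(5)))
```

Note that the code only tests specific numbers; it confirms that −3/2 is a solution, but it says nothing about whether other solutions exist. That is what Steps 1a to 5a establish.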
Now we have to point out that there is a downside to the above direct check-
ing: it may be simple, but it also appears to be so dependent on the specific equa-
tion 2x − 3 = 4x being used that, perhaps, the reasoning will not carry over
to another equation. To counteract this impression, we proceed to give a more
clumsy way of checking that −3/2 is a solution of 2x − 3 = 4x. We do so by
observing that Steps 1a to 5a can be done in reverse order so that if we start with
x0 = −3/2 (which is Step 5a), we will arrive at 2x0 − 3 = 4x0 (which is Step 1a),
thereby proving that −3/2 is a solution of 2x − 3 = 4x. In greater detail:
• x0 = −3/2 implies 2x0 = −3 (multiply both sides by 2)
• 2x0 = −3 implies −3 = 2x0
• −3 = 2x0 implies (2x0 − 3) − 2x0 = 4x0 − 2x0 (the left side is equal to −3
and the right side is equal to 2x0, as shown in (A))
• (2x0 − 3) − 2x0 = 4x0 − 2x0 implies 2x0 − 3 = 4x0 (add 2x0 to both
sides)
This then shows that, by reversing the very steps that show what a solution might
be, we get to see that the putative solution is indeed a genuine solution. A little reflec-
tion will show that the general reasoning that leads from Step 1a to Step 5a can
always be reversed, because it only makes use of the basic laws of operations: the
associative and commutative laws of addition and multiplication together with
the distributive law. Thus there is no accident: this method of solution is univer-
sally valid for all equations.
(C) We hope you are beginning to appreciate the earlier remark that there is
no need for the concept of a variable. We have solved the equation by dealing
strictly with numbers, and by observing the Basic Protocol in the use of symbols
(page 4). This is a lesson you will want to bring back to your classroom.
(D) The preceding elaborate justification of Steps 1–5 on page 38 raises the
specter that, in order to do mathematics correctly in middle school, even the solu-
tion of a simple equation such as 2x − 3 = 4x would always have to be accompa-
nied by an unreasonably stodgy explanation. There is no fear of that, however, if
we exercise proper pedagogical judgment. One way to deal with this issue would
be to go over Steps 1a–5a with care the first time a linear equation or a quadratic
equation is solved. Then as soon as students grasp the underlying principle of
equation solving, it would be safe to allow them to use the rote procedure of Steps
1–5 on page 38. Naturally, if a teacher wants to test students’ understanding of
the reasoning underlying Steps 1–5, putting such a question on a test would be
entirely appropriate.3
The issue surrounding the teaching of equation solving is actually no differ-
ent from the teaching of the standard algorithms in arithmetic. Take the most
notorious case, the long division algorithm, for example. The algorithm itself is
brief, while its mathematical explanation is anything but.4 While students should
be exposed to some form of the mathematical explanation of the algorithm at the
beginning, it would be wrong to ask for the explanation each time the algorithm
is executed. The same is true of the justification for the procedure of solving
equations.
What is indisputable is that if a teacher hopes to inspire trust in her students,
she has to thoroughly understand what it means to solve an equation in order
for her teaching to achieve the necessary transparency. Everything we have said
above is therefore an integral part of a teacher’s repertoire. Where TSM fails, and
fails spectacularly, is in never giving a correct explanation of what it means to
solve an equation.
3 Such questions will never appear on standardized tests, and this is one reason why we should
not rely solely on standardized test scores to evaluate the quality of math education.
4 See Chapter 7 of [Wu2011].
(E) A final comment is on the practical issue of solving equations with rational
coefficients, e.g.,
\[ \frac{7}{6}x - \frac{3}{4} = 2x - \frac{1}{3}. \tag{3.1} \]
With an adequate understanding of rational numbers, this equation can be easily
solved according to Steps 1a to 5a. However, since students are more likely to
make computational errors with fractions than with integers, there is some ad-
vantage in being able to get around the fractions of this equation by clearing the
denominators, namely, multiplying both sides by the product of all the denomi-
nators, 6 × 4 × 3 = 72, to get:
(3.2) 84x − 54 = 144x − 24.
The important thing to note is that this equation is equivalent to the original
equation (3.1), in the sense that any solution of equation (3.1) is a solution of (3.2),
and vice versa (see Exercise 3 on page 45). In any case, we get 60x = −30, and
x = −1/2 is the solution.
You may have noticed that, for the purpose of clearing the denominators in
(3.1), it suffices to multiply both sides by 12 instead of 72 (12 is the LCM of 6, 4,
and 3; see page 267 for the meaning of LCM). If we do that, we find that
14x − 9 = 24x − 4.
Therefore 10x = −5 and we get x = −1/2 again. The choice of 12 rather than 72 is
a nice shortcut, but since it is not absolutely necessary for the solution of the equation,
there is no need to emphasize it in the school classroom.
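As an illustration of clearing denominators, here is a small Python sketch that multiplies both sides of (3.1) by 72 and by 12 and confirms that x = −1/2 satisfies every version of the equation. The helper names `scale`, `satisfies`, and `lcm` are assumptions made for this example only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

# Each side of (3.1) is stored as (coefficient of x, constant term).
left = (Fraction(7, 6), Fraction(-3, 4))
right = (Fraction(2), Fraction(-1, 3))

def scale(side, k):
    """Multiply both terms of one side of the equation by k."""
    return tuple(k * term for term in side)

def satisfies(x, left, right):
    """Check whether the number x makes the two sides equal."""
    return left[0] * x + left[1] == right[0] * x + right[1]

def lcm(*numbers):
    """Least common multiple of positive whole numbers."""
    return reduce(lambda m, n: m * n // gcd(m, n), numbers)

x = Fraction(-1, 2)
for k in (6 * 4 * 3, lcm(6, 4, 3)):   # the multipliers 72 and 12
    new_left, new_right = scale(left, k), scale(right, k)
    print(k, new_left, new_right, satisfies(x, new_left, new_right))
```

Either multiplier produces whole-number coefficients, which is all that clearing denominators requires; the smaller multiplier merely keeps the numbers smaller.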
Now that we have a fresh understanding of Steps 1–5, we will use this lan-
guage to describe the structure of solving a general linear equation. There are two
parts.
(I) Solve equations in x of the form ax = b, where a and b are constants with a ≠ 0.
Clearly the solution is b/a, as one can check:
\[ a \cdot \frac{b}{a} = b. \]
Notice that the fact a ≠ 0 guarantees that the fraction b/a is well-defined. For
example, the solution to 3x = −7 is −7/3 (= (−7)/3).
(II) Any linear equation Ax + B = Cx + D (where A, B, C, and D are constants
and A ≠ C) has the same solution as a linear equation of the form ax = b.
Let us go into part (II) in some detail. It claims: any number x that satisfies the
former equation also satisfies the latter equation for some appropriate constants
a and b, and vice versa.
The reason is that if x is a number so that Ax + B = Cx + D, then
( Ax + B) + (−Cx − B) = (Cx + D ) + (−Cx − B).
Therefore, by Theorem 1 in the Appendix of Chapter 1 in [Wu-PreAlg] (see
page 270 of this volume), we have Ax − Cx = D − B, i.e., ( A − C ) x = D − B. In
other words, the original equation Ax + B = Cx + D is now in the form of ax = b,
with a = A − C and b = D − B. By part (I), the solution to ( A − C ) x = D − B is
\[ x = \frac{D - B}{A - C}. \]
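\[ A \cdot \frac{D - B}{A - C} + B = \frac{AD - AB + AB - BC}{A - C} = \frac{AD - BC}{A - C}. \]
On the other hand,
\[ C \cdot \frac{D - B}{A - C} + D = \frac{CD - CB}{A - C} + D = \frac{CD - CB}{A - C} + \frac{D(A - C)}{A - C} = \frac{CD - BC + AD - CD}{A - C} = \frac{AD - BC}{A - C}. \]
It follows that
\[ A \cdot \frac{D - B}{A - C} + B = C \cdot \frac{D - B}{A - C} + D \]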
word variable here in place of x without saying what “variable” means, because it doesn’t matter.
sides of the equation Ax + B = Cx + D are the same number for every x. Thus in
this case we have the trivial identity that Ax + B = Ax + B.
We summarize the whole discussion in the following theorem:
We can also make use of comment (E) on page 42 to clear the denominators
of the equation −(2/3)x + 4 = −(1/5)x + 5⅓ in order to solve this equation. Multiplying
both sides by 15, we get
−10x + 60 = −3x + 80
so that 7x = −20, and x = −20/7 as before.
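The formula x = (D − B)/(A − C) translates directly into a short routine. The following Python sketch is an illustration under stated assumptions: the function name `solve_linear` and the strings returned when A = C are choices made for this example, and the A = C branch follows the standard case analysis (either every number is a solution, or none is).

```python
from fractions import Fraction

def solve_linear(A, B, C, D):
    """Solve Ax + B = Cx + D for given numbers A, B, C, D.

    When A != C, the unique solution is (D - B)/(A - C).
    When A == C, the two sides agree for every x if B == D,
    and for no x otherwise.
    """
    if A != C:
        return Fraction(D - B) / Fraction(A - C)
    return "every number" if B == D else "no solution"

# The equation -(2/3)x + 4 = -(1/5)x + 5 1/3 from the text:
x = solve_linear(Fraction(-2, 3), 4, Fraction(-1, 5), Fraction(16, 3))
print(x)   # -20/7
```

Applied to 2x − 3 = 4x, that is, `solve_linear(2, -3, 4, 0)`, the same routine returns −3/2.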
To summarize, solving a linear equation in a number x depends on two simple
ideas: by transposing terms, we isolate x on one side of the equation, and then we
solve an equation of the type ax = b. From this point of view, the common practice
of classifying linear equations into one-step equations, two-step equations,
three-step equations, and four-step equations, and then teaching the solving of
linear equations according to this classification simply does not make sense. You
should avoid teaching the solution of linear equations according to this
classification.
[Margin note: The practice of teaching linear equations as one-, two-, three-, and four-step equations does not make mathematical sense.]
3.2. SOME WORD PROBLEMS 45
Exercises 3.1
(1) Prove that x = 7/15 is the unique solution of 2/(3x − 1) = 4/(x + 1/3).
(2) Solve: (i) 2x − 8 = 15 + 43 x. (ii) 73 x + 2 = 32 − 25 x. (iii) 11 9 − 3x =
5
−6x + 18 1
. (iv) ax + 6 = 8 − 7ax, where a is a nonzero number.
(v) 4bx + 13 = 2x + 26b, where b is a number not equal to 1/2.
(vi) 12 − 83 x = 56 x + 23 . (vii) 25 ax − 17 = 13 ax − 15
2 .
(3) Let a linear equation ax + b = cx + d be given so that a, b, c, and d are
constants and a − c ≠ 0; call this Equation A. Let k be a nonzero constant
and call the equation (ka) x + (kb) = (kc) x + (kd) Equation B. Prove that
a number is a solution of Equation A if and only if it is a solution of
Equation B.
(4) Given an equation 3x − 8 = ax + 7, where a is a number. For what values
of a does the equation have a unique solution? have no solution? Can it
have an infinite number of solutions?
(5) Solve: (a) (3 − x)/(x − 1) = −3/2. (b) 5/(2x − 3) = 4/(2 − x).
\[ \frac{5}{96} = \frac{3}{2} \cdot \frac{5}{144}. \]
The next few problems are about so-called constant rates: the constant rate of
walking (which we call constant speed), the constant rate of water pouring into a
tub, the constant rate of work, such as the number of square feet a lawn is mowed, etc.
In view of the fact that not only is the concept of rate mangled in the standard
materials, but the concept of constant rate, which is central to the solution of
this class of problems, is hardly ever clearly defined, we begin by recalling the
needed precise definitions (see Section 1.9 in [Wu-PreAlg]).
[Margin note: The concept of constant rate has to be defined before rate problems in school mathematics can be solved.]
We will concentrate on speed; we have seen that the extrapolation of the speed
discussion to other kinds of rates is not difficult (see Section 1.9 of [Wu-PreAlg]).
In general, for a given motion, let us say the distance is measured in terms of
miles and the time is measured in terms of hours. We define the average speed
over a time interval from hour t0 to hour t, t0 < t, to be the division
hours) can be any pre-assigned units. We say the motion has constant speed v
mph (v being a fixed positive number), or more simply that it has speed v mph,
if its average speed over any time interval is always equal to v.
In doing word problems about a motion of constant speed v, the important
thing is to remember that no matter what time interval is used, the average speed
over this interval will always be the same, namely, v mph.
We have already done some (constant) rate problems in Section 1.9 of
[Wu-PreAlg], but we can now take up more complicated ones that require a more
substantial application of linear equations. We start with a prototypical problem
of this genre.
Example 4. Regina drives from Town A to Town B in 10 hours, and Eric in
12. Assume that each drives at constant speed. If Regina drives from Town A to
Town B, and Eric from Town B to Town A, and they leave at the same time and
drive on the same highway, after how many hours will they meet in between?
There is an implicit convention for problems of this type and it should
be brought out: Regina and Eric are implicitly assumed to drive cars,
and their cars are idealized to be two points.6 Likewise, the two
towns A and B are also idealized to be two points.7 Without
these two idealizations, it would be unclear as to what it means to say,
for example, that “Regina drives from A to B in (exactly) 10 hours”.
We should keep these idealizations in mind when doing this kind of
problem.
Solution. We first determine the speeds of Regina and Eric. We do not
know the distance between Towns A and B, so to facilitate thinking, let us say this
distance is D miles.
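[Figure: the road from A to B, of length D mi, divided at the meeting point into a segment of DR mi starting at A and a segment of DE mi ending at B.]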
Since Regina’s (constant) speed is also her average speed in the 10-hour drive
from Town A to Town B, her speed is therefore D/10 mph. Likewise, Eric’s speed
is D/12 mph. We are trying to find out how long it will be before Regina and Eric
meet in between; let us say Regina and Eric meet after T hours. Note that D and
T are real numbers and, a priori, we do not know whether they will be fractions or
not. Therefore the following computations will have to invoke FASM (page 265)
many times, and we will not mention this fact again. In particular, we will make
use of the fact that the distributive law and formulas (a)–(d) for rational quotients
on page 270 are valid for real numbers.
Knowing that Regina has driven T hours when she meets Eric, we can now
determine the distance she has driven in T hours; let us call this distance DR miles
(see the preceding picture). Because Regina’s (constant) speed of D/10 mph is also
her average speed during those T hours, we have
\[ \frac{D}{10} = \frac{D_R}{T}. \]
6 This is an example of “modeling”.
7 “Modeling” again.
In other words, Regina and Eric meet after 5 5/11 hours.
It was pointed out in Section 1.9 of [Wu-PreAlg] that there is a certain “monot-
ony” to constant rate problems. For example, Example 4, which is about speed,
can be easily reformulated in terms of water flow or painting a house or mowing
a lawn. Consider, for example, the following problems.
(4a) Regina mows a lawn in 10 hours, and Eric in 12. Assuming that each
mows at constant rate, how long would it take them to mow the same lawn if
they mow together without interfering with each other?
(4b) Regina paints a house in 10 hours and Eric in 12. Assuming that each
paints at constant rate, how long would it take them to paint the same house if
they paint together without interfering with each other?
(4c) A faucet can fill a tub in 10 minutes, and a second faucet in 12. Assuming
that the rate of the water flow remains constant in each faucet, how long would it
take to fill the same tub if both faucets are turned on at the same time?
It is important to be able to recognize that the mathematics behind Example
4 and (4a)–(4c) is the same, and that if you can solve any one of these, the same
reasoning will allow you to solve them all.
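The common structure of Example 4 and (4a)–(4c) can be made explicit in a few lines of Python. This is a sketch under two assumptions that are not spelled out in this form in the text: the whole job (the distance D, the lawn, the house, or the tub) is counted as one unit, and the two constant rates simply add when both parties work at the same time. The function name `time_together` is hypothetical.

```python
from fractions import Fraction

def time_together(t1, t2):
    """Time needed to finish one job when two constant-rate workers,
    who alone need t1 and t2 time units respectively, work together."""
    rate1 = Fraction(1, t1)   # fraction of the job done per time unit
    rate2 = Fraction(1, t2)
    return 1 / (rate1 + rate2)

T = time_together(10, 12)
print(T)   # 60/11
```

The value 60/11 is the 5 5/11 hours found in Example 4. Because the distance D never enters the computation, the same call answers (4a) and (4b) in hours and (4c) in minutes.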
Activity
Solve (4b).
Example 5. Water flows out of two faucets A and B at constant rate. Suppose
the water flow from faucet A is 10 gallons per minute more than that from faucet
B, and suppose a container has a capacity of 150 gallons. If both faucets are turned
on at the same time and the container is filled in 1½ minutes, what are the rates
of the water flows in both faucets?
Solution. Let the rate of water flow from faucet A be x gallons per minute.
Then the rate from faucet B is x − 10 gallons per minute. Suppose the amount
of water coming out of faucet A after 1½ minutes is wA gallons; then the average
rate of the water flow from faucet A in 1½ minutes is, by definition,
\[ \frac{w_A}{1\frac{1}{2}} \ \text{gal/min}. \]
Since this average rate is equal to x gal/min (because of the constancy of the rate
of water flow), we have
\[ x = \frac{w_A}{1\frac{1}{2}} \]
and therefore wA = 1½ · x. Similarly, the amount of water wB coming out of faucet
B after 1½ minutes is wB = 1½ · (x − 10). Since by hypothesis, the container of
150 gallons is filled after 1½ minutes when both faucets A and B are turned on at
the same time, we see that wA + wB = 150. Therefore
\[ 1\tfrac{1}{2}\,x + 1\tfrac{1}{2}\,(x - 10) = 150. \tag{3.4} \]
There are many ways to solve this equation. One can, for example, clear the
denominators of the equation (see page 42). However, it is actually simpler in
this case to use the distributive law to expand the left side to get 1½x + 1½x −
(1½ × 10), which is immediately seen to be equal to 3x − 15. Thus 3x − 15 = 150,
so that 3x = 165 and x = 55. The answer is therefore: the rate of water
flow from faucet A is 55 gal/min and that from faucet B is 45 (= 55 − 10) gal/min.
Example 6. Karen and Lisa paint houses at a constant rate. Suppose Karen
paints 10 square meters more per hour than Lisa, and suppose a wall has an area
of 150 square meters. If both Karen and Lisa paint this wall at the same time and
they finish it in 1½ hours, what are the rates at which each paints?
Again, there is an unspoken convention for this kind of collaborative-work
problem: Karen and Lisa are supposed to be able to work simultaneously
without any interference from the other person.8
Solution. At this point, we may assume that we know how to define constant
rate of painting (in sq. m per hour) as the number r so that the average rate of
painting from time t0 to time t is equal to r sq. m per hour no matter what t0
and t may be.
Let Karen paint x square meters per hour. Then Lisa paints x − 10 square
meters per hour. If after 1½ hours, Karen has painted K sq. m, then x = K/(1½)
because she paints at a constant rate, so that K = 1½x sq. m. Similarly, Lisa paints
1½(x − 10) sq. m in 1½ hours. So in 1½ hours they have painted a combined
area of 1½x + 1½(x − 10) sq. m. Since the area of the wall is 150 square meters,
we have
\[ 1\tfrac{1}{2}\,x + 1\tfrac{1}{2}\,(x - 10) = 150. \]
8 Again, an example of modeling.
Comparing this equation with (3.4), we realize that we are doing the same prob-
lem as Example 5! Therefore the solution is that Karen paints at a rate of 55 square
meters per hour and Lisa 45 square meters per hour.
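That Examples 5 and 6 are one and the same problem can be seen by solving the shared equation once. The Python sketch below is illustrative only; the function name `two_rates` and its parameter names are assumptions introduced here. It uses the distributive law to rewrite time·x + time·(x − difference) = total as 2·time·x = total + time·difference.

```python
from fractions import Fraction

def two_rates(total, difference, time):
    """Solve time*x + time*(x - difference) = total for x.

    Returns the pair (x, x - difference): the faster and the slower rate.
    """
    time = Fraction(time)
    x = (total + time * difference) / (2 * time)
    return x, x - difference

faucets = two_rates(150, 10, Fraction(3, 2))    # Example 5, in gal/min
painters = two_rates(150, 10, Fraction(3, 2))   # Example 6, in sq. m per hour
print(faucets == painters, faucets)
```

Only the units attached to the numbers differ between the two calls; the arithmetic is identical.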
Example 7. Tom and May drive on the same highway at constant speed. May
starts 30 minutes before Tom, and her speed is 45 mph. Tom’s speed is 50 mph.
How many hours after May leaves will Tom catch up with her?
Solution. (As in the preceding examples, we will have to make use of FASM
throughout the following discussion.) We give two slightly different solutions.
Suppose T hours after May leaves, Tom catches up with May. In those hours,
May has driven 45T miles. Now Tom does not start driving until half an hour
after May does, therefore at the time he catches up with May, he has only driven
T − 1/2 hours. The total distance he travels in that time duration is thus 50(T − 1/2)
miles. The fact that Tom catches up with May after T hours means that the two
distances, 45T miles and 50(T − 1/2) miles, are equal, i.e., 45T = 50(T − 1/2). By
the distributive law, 45T = 50T − 25. Adding 25 to both sides, we get 45T + 25 =
50T, and so we get 25 = 5T after adding −45T to both sides. Thus T = 5, i.e., 5
hours after May leaves, Tom catches up with her.
A second solution is obtained by imagining we can watch Tom’s car from
May’s car. Since she travels 45 miles in an hour and her speed is constant, she
travels (1/2) × 45 = 22.5 miles in half an hour. So when we watch Tom’s car from
May’s car half an hour after she leaves, we see Tom’s car coming from a distance
of 22.5 miles. Suppose after Tom has driven t hours, he catches up with May.
In those t hours, May’s car travels 45t miles, whereas Tom’s car travels 50t
miles. The fact that Tom catches up with May after t hours means in t hours, Tom
has driven 22.5 miles more than May. Consequently, 50t − 45t = 22.5, so that
5t = 22.5 and t = 4.5 hours. Since Tom starts 0.5 hours after May leaves, it takes
Tom 4.5 + 0.5 = 5 hours after May leaves to catch up with her.
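The two solutions of Example 7 can be compared side by side in a short Python sketch. The variable names are illustrative assumptions; the arithmetic follows the two arguments given above.

```python
from fractions import Fraction

may_speed = 45                 # mph
tom_speed = 50                 # mph
head_start = Fraction(1, 2)    # May leaves half an hour before Tom

# First solution: 45T = 50(T - 1/2), so (50 - 45)T = 50 * (1/2).
T = tom_speed * head_start / (tom_speed - may_speed)

# Second solution: Tom must make up May's head-start distance of 45 * (1/2)
# miles, and he gains 50 - 45 miles on her in every hour that he drives.
gap = may_speed * head_start
t = gap / (tom_speed - may_speed)

print(T, t, t + head_start)
```

Both routes give the same answer: T is 5 hours after May leaves, and t is 4.5 hours of driving for Tom, which is again 5 hours after May leaves once the half-hour head start is added.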
Exercises 3.2
(1) A man has six hours at his disposal. What is the furthest he can ride in
a car going at a constant speed of 25 mph if he has to get back to the
starting point by riding a bicycle at the constant rate of 6 mph?
(2) A train loses 1/6 of its passengers at the first stop, 25 at the second, 20%
of the remainder at the third, and three quarters of the remainder at
the fourth. After all that, 25 passengers remain. What was the original
number of passengers?
(3) Water flows out of two faucets A and B at constant rate. Faucet A fills
a given container in 5 minutes, while faucet B fills it in 6 minutes. How
long would it take to fill the container if both faucets are turned on at
the same time?
(4) The numerator of a fraction is 7 less than the denominator. If 4 is subtracted
from the numerator and 1 added to the denominator, the resulting
fraction equals 1/3. What is the fraction?
(5) Alan had twice as much money as Bill, but after giving Bill $28, he has
2/3 as much as Bill. How much did each have at first?
(6) Find two numbers whose sum is 7/6 and whose difference is 1/6.
(7) Lisa and Karen mow lawns at a constant rate. Lisa mows a certain lawn
by herself in 4 hours, but with Karen’s help from the beginning, she does
it in 3 hours. How long would it take Karen to mow it alone?
(8) There are two heaps of coins, one containing nickels and the other dimes.
The second heap is worth 20 cents more than the first, and has 8 fewer
coins. Find the number in each heap.
(9) If A has $566 and B has $370, how much money must A give B so that B
has 4/5 as much as A?
(10) A woman drives a car for 3½ hours and she finds that she has covered a
distance of 130 miles. If she drives at a constant speed of 45 mph in the
country and 20 mph within city limits, how many miles of her trip is in
the country?
(11) (Sixth-grade Japanese exam question) A train 132 meters long travels at
87 kilometers per hour and another train 118 meters long travels at 93
kilometers per hour. Both trains are traveling in the same direction on
parallel tracks. How many seconds does it take from the time the front
of the locomotive of the faster train reaches the end of the slower train to
the time that the end of the faster train reaches the front of the locomotive
on the slower one?
(12) Two trains A and B run at constant speed. Train A goes from City P to
City Q in two hours whereas Train B goes from Q to P in three hours. If
A leaves P for Q at the same time that B leaves Q for P (on a separate
but identical rail!), after how many hours will they meet?
(13) Winnie and Reggie working together can paint a house in 56 hours. If
Reggie paints the same house alone, it takes him 90 hours to get it done.
How long would it take Winnie to paint the house if she works alone?
(Assume each paints at a constant rate, and that when they paint together
there is no mutual interference.)
(14) Two cars A and B move at constant speed. A starts from P to Q, 150
miles apart, at the same time that B starts from Q to P. They meet at the
end of 1½ hours. If A moves 10 miles per hour faster than B, what are
their speeds?
(15) Alfred, Bruce, and Chuck mow lawns at a constant rate. It takes them 2
hours, 1.5 hours, and 2.5 hours, respectively, to finish mowing a certain
lawn. If they mow the same lawn at the same time, and if there is no
interference in their work, how long would it take them to get it done?
(16) Paul can mow a certain lawn all by himself in 11 hours. After working
for 2½ hours, however, Paul is joined by Henry and the two together
finish mowing the lawn in another 5 hours. Assuming, as always, that both
mow the lawn at a constant rate, how long would it take Henry to mow
the lawn alone? Explain clearly how you get the solution.
(17) Water flows out of two faucets, A and B, at a constant rate. If both faucets
are turned on at the same time, a tub is filled in 36 minutes. If faucet A
alone can fill this tub in 58 minutes, how long would it take for faucet B
to fill it alone?
(18) A man walked at constant speed from one place to another in 5½ hours.
If he had walked 1/4 of a mile faster in each hour, the walk would have
taken only 5 hours. How long is the walk and what was his original
speed?
(19) A solution consisting of water and alcohol has 70% alcohol. If 25 cc of
water is added to the solution, how much alcohol must be added in order
for the solution to still contain 70% alcohol?
(20) Fifteen minutes after Colin leaves for school, his mother discovers that he
forgot to take his homework. She drives at a constant rate, and it takes
her 6 minutes to get to school. Colin walks to school at a constant rate,
and it takes him 24 minutes to get there. (i) Use mental math to decide
if Colin’s mother can catch up with him. (ii) If she does, compute how
soon this happens after Colin leaves.
https://doi.org/10.1090//mbk/099/04
CHAPTER 4
54 4. LINEAR EQUATIONS IN TWO VARIABLES AND THEIR GRAPHS
is never stated in TSM, and therefore not proved either, but it is the central the-
orem of this topic because the reasoning in the proof provides students with the
tools that render all standard problems involving equations of lines into routine
exercises.
[Figure: a diagram with labeled points A, B, C, D and segments labeled a, b, c, d.]
3 See Exercise 11 in Exercises 4.6 of [Wu-PreAlg] for the suggestion of another proof.
4.1. COORDINATE SYSTEM IN THE PLANE 55
being parallel to the lower edge of the page; then the other line is vertical in the
sense of being parallel to the left and right edges of the page. Also by tradition,
the horizontal line is designated as the x-axis, and the vertical one the y-axis.
By regarding these two lines as number lines, we may henceforth identify every
point on these coordinate axes (as the x- and y-axes have come to be called) with
a number. As expected, we choose the positive numbers on the x-axis to be on
the right of O so that O is the 0 of the x-axis, and we choose the positive numbers
on the y-axis to be above O on the y-axis so that O is also the 0 of the y-axis. The
ray on the x-axis with vertex O and which contains the positive numbers is called
the positive x-axis; the positive y-axis is similarly defined. O is called the origin
of the coordinate system.
Recall that a number line depends on the choices of a point as 0 and another
point as 1. In the case of the x-axis and y-axis, the choice of 0 is already specified
by the requirement that the point of intersection O be also the 0 on both axes.
Once the choice of 1 on one axis, let us say the x-axis, has been made (to the right
of O), then the choice of 1 on the y-axis will be uniquely determined because the
counterclockwise rotation ϕ (the lower case Greek letter phi) of 90 degrees around
O has to be length-preserving (see assumption (Iso1) on page 265). Therefore if
the 1 on the x-axis is denoted by A, then the 1 on the y-axis has to be the point
ϕ ( A ).
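[Figure: the x-axis (X) and the y-axis (Y) meeting at O; the point A is the 1 on the x-axis, and ϕ(A) is the 1 on the y-axis.]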
Now we can associate to each point P in the plane an ordered pair of numbers
in the following way. Let us agree to call any line parallel to the x-axis a horizontal
line, and also any line parallel to the y-axis a vertical line. Then through P draw
two lines, one vertical and one horizontal, so that they intersect the x-axis at
a number a and the y-axis at a number b, respectively. Then the ordered pair of
numbers ( a, b) is said to be the coordinates of P (relative to the chosen coordinate
axes); a is called the x-coordinate and b the y-coordinate of P (relative to the
chosen coordinate axes), as shown:
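[Figure: the point P, with the vertical line through P meeting the x-axis at a and the horizontal line through P meeting the y-axis at b.]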
Notice that the coordinate pair associated with a point is unique, i.e., unambigu-
ous, i.e., it cannot happen that a given P is associated with two distinct pairs of
shown that every point corresponds to one and only one ordered pair of numbers,
we see that
( a, b) = (c, d) is equivalent to a = c and b = d.
Again, note that there is no ambiguity as to what the equality between two ordered
pairs of numbers means.
We now make contact with a few geometric concepts that we have introduced
earlier. The first is a standard application of the Pythagorean Theorem: The distance
between any two points ( a, b) and (c, d) is
\[ \sqrt{(a - c)^2 + (b - d)^2}. \]
This is usually called the distance formula for two points. The proof is so
straightforward that it can be left as an exercise.
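As a quick numerical illustration, the distance formula can be sketched in Python as follows. The function name `distance` is our own choice for this sketch and is not notation from the text.

```python
import math

def distance(p, q):
    """Distance between p = (a, b) and q = (c, d), as given by the distance formula."""
    (a, b), (c, d) = p, q
    return math.sqrt((a - c) ** 2 + (b - d) ** 2)

print(distance((0, 0), (3, 4)))  # 5.0
```

The order of the two points does not matter, because each difference is squared.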
We can also express some basic isometries in terms of coordinates. The reflection
across the x-axis maps a point ( x, y) to ( x, −y), and the reflection across
the y-axis maps a point ( x, y) to (− x, y) (x and y are any numbers). These follow
directly from the way coordinates are defined and from the definition of a
reflection (see Exercise 5 below). On page 114, one also finds a description of the
coordinates of points under reflection across the diagonal line y = x. In addition,
we can also express a translation in terms of coordinates; see Lemma 5.3 on page
95 in the next chapter; this lemma plays an important role in the discussion of
quadratic functions in Chapter 10.
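The coordinate descriptions of the two reflections above can be sketched in Python. The function names below are hypothetical and chosen only for this illustration.

```python
def reflect_across_x_axis(point):
    """Reflection across the x-axis: (x, y) -> (x, -y)."""
    x, y = point
    return (x, -y)

def reflect_across_y_axis(point):
    """Reflection across the y-axis: (x, y) -> (-x, y)."""
    x, y = point
    return (-x, y)

P = (3, 2)
print(reflect_across_x_axis(P))  # (3, -2)
print(reflect_across_y_axis(P))  # (-3, 2)
```

Applying either reflection twice returns the original point, as one expects of a reflection.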
There are some fine points about the drawing of a coordinate system that we
will have to confront at some point; see the discussion in Section 6.4.
Exercises 4.1
(1) (i) Let L be the vertical line passing through (5, 0) and let R denote the
reflection across L. What are the coordinates of R( x, y), the reflection of
( x, y) across L? (ii) Repeat part (i) when L is now the horizontal line
passing through (2, −3).
(2) Prove the distance formula for two points.
(3) Let D be the line which bisects the right angle whose sides are the positive
x-axis and the positive y-axis. (i) Prove that the coordinates of every
point on D are of the form (t, t) for some number t. (ii) Let Λ be the reflection
with respect to D. Prove that for any point ( x, y) in the plane, Λ( x, y) = (y, x ).
(4) Let R be the 180° rotation with respect to the origin O of a coordinate
system. Then for any point ( x, y) in the plane, prove that R( x, y) =
(− x, −y).
(5) Prove the claims about the coordinates of points under reflections across
the x- and y-axes above.
(0, 1), (2, 2), (2.5, 2.25), (4, 3), (6, 4), (7, 4.5).
These points strongly suggest that the graph of x − 2y = −2 is a (straight) line,
and we will presently prove in Section 4.4 that such is in fact the case.
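One can confirm by direct substitution that each of the listed points is a solution of x − 2y = −2. The following Python sketch does this; the helper name `is_solution` is our own.

```python
# The six points listed in the text.
points = [(0, 1), (2, 2), (2.5, 2.25), (4, 3), (6, 4), (7, 4.5)]

def is_solution(x, y):
    """True if (x, y) satisfies the equation x - 2y = -2."""
    return x - 2 * y == -2

print(all(is_solution(x, y) for x, y in points))  # True
```

The decimal coordinates used here are exactly representable in binary floating point, so the equality test is exact. For general fractions, `fractions.Fraction` would be the safer choice.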
[Figure: the six points above plotted in the coordinate plane.]
However, for the graphs of the two special kinds of linear equations in two
variables in the form of x = a or y = b, where a and b are specific numbers, we
can prove that their graphs are lines without further ado. We single out these
two equations for another reason: their graphs are confusing to students, partly
because TSM5 does not explain it well. Let us go over these cases carefully.
Consider, for example, y = 3, which, as an equation in two variables, is in
reality the abbreviated form of the equation 0 · x + 1 · y = 3. The collection of all
solutions of y = 3 is then exactly all the pairs (s, 3), where s is an arbitrary number,
for the following reason. Every one of these (s, 3)’s is clearly a solution because
6 Lemma 4.10 on page 79 below shows that the equation of a given line is unique up to a constant
Activity
Exercises 4.2
(1) Explain clearly why each of the following figures fails to be the graph
of the equation y = 3. (a) The figure consisting of the horizontal line
passing through the point (0, 3.1). (b) The figure consisting of all the
points ( x, 3) so that x = 2. (c) The figure consisting of the horizontal
line passing through (0, 3) together with (0, 0).
(2) Let G be the graph of the equation −5x + y = 8. What is the point of
intersection of the x-axis with G? Explain your answer as clearly as you can.
(3) Let G be the graph of the equation $x + \frac{2}{3}y = 1$. What is the point of
intersection of the y-axis with G? Explain your answer as clearly as you can.
[Figure: the line L in the coordinate plane, crossing the x-axis at −3.]
So we must also show that every point of L is a point of G. Therefore, we must
show two things (we label them by (α) and (β) to raise their profile):
(α) Every point on the graph G is a point on the line L.
(β) Every point on the line L is a point on the graph G.
The proofs of these assertions require some preparation, and we will address
the preparatory material before returning to these proofs on page 73. We begin
by reviewing some facts concerning similar triangles.7 Recall that two geometric
figures are similar if one is mapped onto the other by a dilation followed by a con-
gruence. The fundamental fact governing dilation is the following theorem; it is
an immediate consequence of Theorems 4.4 and 4.5 in Section 4.6 of [Wu-PreAlg].
Theorem 4.3. Let △ABC be given, and let D be a point on AB. Let the line passing
through D and parallel to BC intersect AC at E. Then
\[ \frac{|AB|}{|AD|} = \frac{|AC|}{|AE|} = \frac{|BC|}{|DE|}. \]
[Figure: triangle ABC with D on AB and E on AC, where DE is parallel to BC.]
Activity
Check that indeed Theorem 4.3 follows from Theorems 4.4 and 4.5 in Section
4.6 of [Wu-PreAlg].
We will also need the following theorem, which is Theorem 4.13 in Section
4.7 of [Wu-PreAlg] (AA stands for angle-angle).
Theorem 4.13 (AA criterion for similarity). If two triangles have two pairs of
equal angles, they are similar.
In order to use this criterion effectively, one needs to know when two angles
are equal. In this context, the theorem about corresponding angles and alternate
7 See Sections 4.6 and 4.7 of [Wu-PreAlg].
4.3. THE CONCEPT OF SLOPE 63
We have to be careful, however. Both lines in the left picture below are intuitively
considered to be rather steep, but as we look at them from left to right, one is
ascending and the other is descending. In order to distinguish between these two
kinds of steepness, we agree, by tradition, to assign a positive number to a line
slanted this way, /, and assign a negative number to a line slanted this way, \.
More precisely, we want the assignment of numbers to nonvertical lines passing
through P to satisfy the following natural requirements: (i) distinct numbers are
assigned to distinct lines, (ii) when the absolute value of this number is large, the
line would look like those on the left below—very steep—but when the absolute
value of this number is small (i.e., close to 0), then the line would look like those on
the right—not steep, almost horizontal—and (iii) 0 is assigned to the horizontal
line.
[Figure: two pictures of a pair of lines through a point P. Left: two steep lines, one ascending and one descending. Right: two nearly horizontal lines.]
The reason we exclude the vertical line from our consideration is twofold. The
first is technical and has to do with the inability to define division by zero (see
Theorem 4.4 on page 67). The second one is intuitive: if a line is vertical, it is
already the ultimate of “steep” and there would be no need to discuss it. (There
is an interesting story that indirectly reveals the woeful neglect of a precise defi-
nition for slope and the resulting damage on student learning on pages 241 ff. of
[Gladwell].)
With this intuitive picture in mind, the following definition of the local slope
at P gives a natural and direct way to assign such a number to a line passing
through P. The definition will require the concept of the image of a set under
a translation (Section 4.4 of [Wu-PreAlg]); recall that if T is a translation of the
plane (see page 269) and $P'$ is a point in the plane, then $T(P')$ denotes the point to
which T moves (translates) $P'$. Now if S is a geometric figure, then the translated
image of S under T, denoted by T (S ), is the collection of all the points $T(P')$, where
$P'$ is a point of S .
We now return to the consideration of all the lines passing through the fixed
point P. Pass a horizontal line through P and let Q be the point on this horizontal
line to the right of P so that | PQ| = 1. Also recall that O is the origin of the
coordinate system.
Let $\ell$ be the translated image of the y-axis by the translation along the
vector $\overrightarrow{OQ}$; the numbers on the y-axis are also translated to $\ell$ through this
translation so that, in particular, the number 0 of $\ell$ is at the point Q. The
line $\ell$ is now the vertical number line passing through Q. This line $\ell$ then
allows us to define the local slope of a nonvertical line L passing through P, as
follows: Let L intersect $\ell$ at a point; then the coordinate of this point of
intersection on the number line $\ell$ is by definition the local slope of L at P.
[Figure: the point P, the point Q one unit to the right of P, and the vertical number line through Q with its 0 at Q.]
The following picture gives a better idea of what is happening in two cases.
First, when a line L1 passing through P intersects $\ell$ at a point Q1 above Q, then
the local slope of L1 at P is the number Q1 on the number line $\ell$. This Q1 is
positive because, like the y-axis, the positive numbers on $\ell$ are those above its 0,
[Figure: lines L1 and L2 through P; L1 meets the vertical number line through Q at Q1 above Q, and L2 meets it at Q2 below Q.]
Activity
Show that a line passing through P has local slope equal to 1 if and only if it
is the 45° counterclockwise rotation around P of the horizontal line, and that
it has local slope equal to −1 if and only if it is the 45° clockwise rotation of
the horizontal line around P.a
a Also see the comments about coordinate systems in Section 6.4.
The virtue of this definition of the local slope of a line at P is that it shows in a
natural way why some lines, such as L1 in the picture, have positive local slope
while others, such as L2 in the picture, have negative local slope. It also allows
the value of the local slope of a line at P to be read off directly from the number
line itself. To this end, observe that we may give an equivalent description of the
number line $\ell$ as follows. On the vertical line $\ell$ through Q, we choose the number
0 to be the point Q itself, and choose the number 1 to be the point on $\ell$ which is of
distance 1 above Q. Recall that once 0 and 1 have been chosen, the number line is
completely determined; it is straightforward to see that this procedure describes
the same number line as the one obtained by translating the y-axis to $\ell$ along the
vector $\overrightarrow{OQ}$. Therefore we have:
[Alternate definition of local slope of L at P.] If the given line
intersects $\ell$ at a point Q1 above Q, then the local slope of this line at
P is the length of QQ1 , but if the line intersects $\ell$ at a point Q2 below
Q, then the local slope of the line at P is the negative of the length
of QQ2 .
We can say more. This definition of local slope of a line at P immediately
implies that the local slope of a horizontal line passing through P is 0. Moreover,
suppose the point of intersection Q1 of a line L1 (passing through P) with $\ell$ is
very high above Q; then the (absolute value of the) local slope of L1 would also
be very large. Correspondingly, L1 would be very steep. However, if the point
of intersection Q2 of a line L2 (passing through P) with $\ell$ is very far below Q,
then the local slope of L2 would be a negative number with a very large absolute
value.9 But if Q2 is very far down below Q, L2 would have to be very steep as
well. Examples like these show that this definition of the local slope of a line
at P captures the intuitive meaning of slope as a measurement of steepness (see
(i)–(iii) on page 63).
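The definition of local slope can be traced numerically. The sketch below assumes the line is specified by P together with a second point R on it (an assumption made only for this illustration), finds where the line meets the vertical line through Q = (p1 + 1, p2), and reports the signed height of that point above Q. The name `local_slope` is hypothetical.

```python
from fractions import Fraction

def local_slope(P, R):
    """Local slope at P of the nonvertical line through P and R.

    Q is the point one unit to the right of P. The line meets the vertical
    number line through Q at some height; the local slope is that height
    measured from Q (positive above Q, negative below Q).
    """
    p1, p2 = map(Fraction, P)
    r1, r2 = map(Fraction, R)
    if p1 == r1:
        raise ValueError("vertical line: local slope is not defined")
    # Points on the line are P + t(R - P); pick t so the x-coordinate is p1 + 1.
    t = 1 / (r1 - p1)
    height_of_intersection = p2 + t * (r2 - p2)
    return height_of_intersection - p2  # Q has y-coordinate p2

print(local_slope((1, 1), (3, 5)))   # 2
print(local_slope((0, 0), (4, -2)))  # -1/2
```

A line that rises from left to right meets the number line above Q and gets a positive value; a line that falls gets a negative value; a horizontal line gets 0.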
Activity
Suppose L is the line passing through (5, 1) and (−2, 3). What is the slope
of L at (5, 1)? What is the slope of L at (−2, 3)? (Hint: Use Theorem 4.3 on
page 62.)
[Figure: the line L in the coordinate plane with the points P, Q, M, and N marked.]
9 Recall that the absolute value of a number is its distance from 0. In this instance, 0 is just Q.
[Figure: the line L in the coordinate plane with the points P, Q, and N marked.]
We have to show that $|QQ'| = |NN'|$. Because the reasoning is so similar to the
preceding case, we will leave it as an Activity. In any case, we have proved that
the concept of slope as given on page 66 is well-defined.
Activity
Prove that $|QQ'| = |NN'|$.
Several trivial remarks should be made about this ratio right away.
First, because $\frac{a}{b} = \frac{-a}{-b}$ for any a and b (see Lemma 2.12 of Section 2.5 in
[Wu-PreAlg] when a and b are rational, then use FASM), we see that
\[ \frac{p_2 - r_2}{p_1 - r_1} = \frac{r_2 - p_2}{r_1 - p_1}. \]
This shows that in writing the ratio, the order of P and R doesn’t matter (so long
as the order of their appearance in the numerator and denominator remains the
same).
Next, observe that the denominator of this ratio is never 0 because if it is, then
r1 − p1 = 0 and r1 = p1 . The distinct points P, R now have the same x-coordinate
and therefore lie on a vertical line. This implies that the line L is a vertical line,
contradicting the hypothesis that L is nonvertical. Thus the denominator of this
quotient is never zero, and the quotient makes sense.
Finally, we should point out where the ratio $\frac{p_2 - r_2}{p_1 - r_1}$ originally comes from.
To this end, consider the situation in the definition of the local slope of L at the
point P, as given on page 64. Thus let Q be the point lying on the horizontal
line passing through P of distance 1 to the right of P, and let $\ell$ be the vertical
number line passing through Q with Q = 0. Now let the point R of Theorem 4.4
be the point of intersection of L with the vertical number line $\ell$ (see both pictures
below). Then we claim:
\[ \text{slope of } L = \frac{p_2 - r_2}{p_1 - r_1}. \tag{4.1} \]
[Figure: two pictures of L through P, with Q one unit to the right of P. Left: R lies above Q. Right: R lies below Q.]
In order to prove (4.1), observe that p2 = q2 because P and Q lie on the same
horizontal line and therefore have the same y-coordinate, and r1 = q1 because R
and Q lie on the same vertical line and therefore have the same x-coordinate. If
R is above Q (as in the left picture above), then
\[ \frac{p_2 - r_2}{p_1 - r_1} = \frac{r_2 - p_2}{r_1 - p_1} = \frac{r_2 - q_2}{q_1 - p_1} = \frac{|RQ|}{|PQ|} = \frac{|RQ|}{1} = |RQ|. \tag{4.2} \]
On the other hand, if R is below Q (see the right picture above), then, again using
p2 = q2 and r1 = q1 , we have
\[ \frac{p_2 - r_2}{p_1 - r_1} = \frac{q_2 - r_2}{p_1 - q_1} = \frac{|RQ|}{-|PQ|} = \frac{|RQ|}{-1} = -|RQ|. \tag{4.3} \]
In view of the alternate definition of local slope at P on page 65, equations (4.2)
and (4.3) together prove that $\frac{p_2 - r_2}{p_1 - r_1}$ is equal to the local slope of L at P, which is
of course the slope of L. The proof of (4.1) is complete.
Proof of Theorem 4.4. We will prove the theorem for the case of a positive slope
for the line L. The remaining part of the proof for the case of a negative slope will
be left as an exercise.
Because it doesn’t matter in the writing of the ratio $\frac{p_2 - r_2}{p_1 - r_1}$ which of P and
R comes first, we may assume that P is the point to the left of R, i.e., we may
assume p1 < r1 . As usual, if Q is the point to the right of P on the horizontal
line H passing through P so that | PQ| = 1, then the vertical line through Q
intersects L at a point M so that M is above Q on this vertical line. We also take
this opportunity to recall that | MQ| is equal to the slope of L (see the alternate
definition of local slope at P on page 65). Let the vertical line passing through
R intersect the horizontal line H at a point S. Either | PS| ≤ 1 or | PS| > 1. The
following picture shows the case | PS| > 1, but the reasoning for both cases is
identical.
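[Figure: the line L through P and R, the horizontal line H through P, the point Q on H one unit to the right of P, and the point S on H directly below R.]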
The reasoning that led to (4.2) or (4.3) now shows that
\[ \frac{p_2 - r_2}{p_1 - r_1} = \frac{|RS|}{|PS|}. \tag{4.4} \]
Here comes the critical idea: the triangles PRS and PMQ have two pairs of equal
angles (they share an angle ∠ P, and |∠ MQP| = |∠ RSP| because both are right
angles) so that by the AA criterion for similarity (Theorem 4.13 on page 62), the
triangles are similar.10 Therefore their corresponding sides are proportional
(Theorem 4.12 on page 271):
\[ \frac{|RS|}{|PS|} = \frac{|MQ|}{|PQ|}. \tag{4.5} \]
However, $\frac{|MQ|}{|PQ|} = \frac{|MQ|}{1} = |MQ|$, which as noted is equal to the slope of L.
Therefore, by combining this fact with equalities (4.4) and (4.5), we obtain
\[ \frac{p_2 - r_2}{p_1 - r_1} = \text{slope of } L. \]
10 In this case, note that Theorem 4.3 on page 62 also suffices for the purpose at hand.
The important conclusion to draw from Theorem 4.4 is this: given any straight
line L which is not vertical, then the slope of L is given by the ratio
\[ \frac{p_2 - r_2}{p_1 - r_1} \tag{4.6} \]
for any two points $P = (p_1, p_2)$ and $R = (r_1, r_2)$ on L.
Activity
Suppose a line L passes through (2, −3) and (−4, 1). If P is a point on L
with x-coordinate 23 , what is the y-coordinate of P?
Textbooks usually define the slope of a line by picking two points P and R
on the line and then declaring the ratio formed from the coordinates of these two
points—as in (4.6)—to be the slope of the line. A priori, the ratio resulting from a
different choice of points on the line could be a different number so that a line
could have many slopes. This would render any discussion of “the slope of a line”
nonsensical. For example, suppose instead of a straight line we have the circle of
radius 5 around the origin (0, 0), denoted by C .
[Figure: the circle C of radius 5 around O, with the points (−5, 0), (5, 0), (0, 5), and (3, 4) marked.]
If we take the two points (−5, 0) and (3, 4) on C , and form the usual ratio, we get
$\frac{4 - 0}{3 - (-5)} = \frac{1}{2}$. On the other hand, taking another pair of points (5, 0) and (0, 5) on
C leads to the ratio of $\frac{5 - 0}{0 - 5} = -1$. For the curve C , the ratios formed from different
pairs of points on it are therefore not always the same. Yet for a line, these ratios
are always the same, and the question is why. We have answered this question by
the use of similar triangles. Be sure your students know the answer too, because
the fact that the computation of the slope of a line in Theorem 4.4
can be done by using any two points on the line is a powerful tool in
dealing with all kinds of questions related to linear equations.
The discussion in the next section will amply bear out this assertion.
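The contrast between the circle and a line can be checked numerically. The sketch below forms the ratio for the two pairs of points on the circle C used above, and then for every pair among a few points on a sample line. The line y = 2x + 1 and the helper name `ratio` are our own choices for this illustration.

```python
from fractions import Fraction

def ratio(P, R):
    """The ratio (p2 - r2) / (p1 - r1) formed from P = (p1, p2) and R = (r1, r2)."""
    (p1, p2), (r1, r2) = P, R
    return Fraction(p2 - r2, p1 - r1)

# Two pairs of points on the circle of radius 5 around the origin.
print(ratio((-5, 0), (3, 4)))  # 1/2
print(ratio((5, 0), (0, 5)))   # -1

# Several points on the sample line y = 2x + 1 (chosen for illustration).
line_points = [(x, 2 * x + 1) for x in range(-3, 4)]
ratios = {ratio(P, R) for P in line_points for R in line_points if P != R}
print(ratios)  # {Fraction(2, 1)}
```

Every pair of points on the line produces the same ratio, while the circle already gives two different values from two pairs.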
We will round out the discussion of slope by addressing two questions con-
cerning its definition that may be baffling to some. First, why do we choose a Q
(on the horizontal line through P) to be of distance exactly 1 from P? The answer
is that there is no reason at all except that we have to decide on a consistent choice
of such a point Q so that we can compare the slopes of different lines passing
through different points. For example, we can choose this Q to be of distance 2
from P once and for all. Such a choice would not change the reasoning in any dis-
cussion of slope except that the values of the slopes of lines would be consistently
larger,11 as the following picture on the left indicates.
[Figure: two pictures of lines L1 and L2 through P. Left: Q is chosen at distance 2 to the right of P. Right: Q is chosen 1 unit to the left of P.]
Another question is why choose a point Q to the right of P rather than to the
left? Again, no reason at all except to maintain a consistency in order to make
the discussion possible. In the above picture on the right, suppose we choose
Q to be 1 unit to the left of P, then the slope of a line like L1 in the picture on
the right would be negative because the point of intersection $Q'$ of L1 and the
vertical number line passing through Q is now below Q (the 0 of the vertical
number line). The same reasoning leads to the fact that the slope of a line like
L2 would be positive. There is nothing wrong with that except this is not the
convention about slope that we are used to. (This is similar to asking why we
want the positive numbers on the x-axis to be on the right of 0, or why we want
the positive numbers on the y-axis to be above 0. The answers would be the same:
no reason other than to conform with an a priori accepted convention.)
Exercises 4.3
(1) (This exercise shows that the use of a vertical number line on page 64 to
define slope is not strictly necessary.) Referring to the picture used for the
definition of slope on page 65 and using the “coordinates of a point” to
refer to the coordinates with respect to the x- and y-axes, prove that the
slope of L1 is
(y-coordinate of Q1 ) − (y-coordinate of Q).
Similarly, prove that the slope of L2 is
(y-coordinate of Q2 ) − (y-coordinate of Q).
(2) Prove the case of Theorem 4.4 when the slope of L is negative.
(3) (i) Let L be the line joining (1, 2) to ( p, −4), where p is some number.
For what value of p would L pass through (10, 25)? (ii) Let $\ell$ be the line
joining $(-\frac{3}{2}, 4)$ and $(\frac{4}{5}, q)$, where q is some number. For what value of q
would $\ell$ pass through (2, −3)?
(4) Does the line joining (3, −2) and (6, 2) contain the point (9, 6)? Explain
it two different ways.
11 They would be twice as large, which one can prove by using Theorem 4.3 on page 62.
(5) (i) Let $A = (a, a')$ and $B = (b, b')$. Prove that the midpoint of the segment
AB is $\left(\frac{a + b}{2}, \frac{a' + b'}{2}\right)$. Hint: Use Theorem 4.3 on page 62. (ii) Generalize
part (i): Given positive numbers s and t. Prove that the coordinates of
the point C on the segment AB so that $\frac{|AC|}{|CB|} = \frac{s}{t}$ are given by
\[ \left( \frac{ta + sb}{s + t}, \frac{ta' + sb'}{s + t} \right). \]
(6) Let D be a dilation of the coordinate plane with center at the origin O
and let L be a line whose slope is s. What is the slope of D ( L)?
(7) (i) Let L be the line passing through (1, −2) with slope m. For which
value of m would L pass through (20, 72)? (ii) Let $\ell$ be the line with
slope m passing through $(\frac{1}{2}, \frac{3}{4})$. For which value of m would $\ell$ pass
through $(\frac{5}{3}, \frac{1}{3})$?
[Figure: lines L1 and L2 through P meeting the vertical number line through Q at Q1 and Q2.]
distinct points P and Q1 —must likewise coincide. The proof of Theorem 4.5 is
complete.
Armed with Theorem 4.5, we can now conclude the discussion of the previous
special case of Theorem 4.2 when $m = \frac{2}{3}$ and k = 2, i.e., the equation $y = \frac{2}{3}x + 2$
(see page 62). As before, let
L be the line joining (−3, 0) and (0, 2),
G be the graph of $y = \frac{2}{3}x + 2$.
Recall that our strategy is to prove that G coincides with L by proving:
(α) Every point on the graph G is a point on the line L.
(β) Every point on the line L is a point on the graph G.
We first prove (α). Let the point (0, 2) on the y-axis be denoted by P. Take
an arbitrary point $R'$ on the graph G distinct from P, and we must prove that $R'$
lies on L. We will do so by showing that the line $L'$ joining P to $R'$ coincides with
L so that, in particular, $R'$ lies on L. So why would L and $L'$ coincide? There is
no a priori reason that they should, because if the graph G is really “curved”, as
shown,
[Figure: a curved graph G through P, together with the line L and a different line L′ joining P to a point R′ on G.]
then L and $L'$ would be distinct. What we are going to show is that, because G is
the graph of a linear equation $y = \frac{2}{3}x + 2$, L and $L'$ must have the same slope.
Once that is done, since the lines L and $L'$ both pass through P, they will have to
coincide because of Theorem 4.5.
[Figure: the lines L and L′ through P = (0, 2), with L crossing the x-axis at −3 and the point R′ = (r1′, r2′) on L′.]
the same method as in the special case of $y = \frac{2}{3}x + 2$ to show that L and G are
equal, i.e., we go through the same two steps:
(α) Every point on the graph G is a point on the line L.
(β) Every point on the line L is a point on the graph G.
We begin with step (α): we have to show that any point $R'$ on the graph G
distinct from P lies on L. We do so by proving that the line $L'$ joining P to $R'$
coincides with L; consequently, $R'$ has to lie on L.
[Figure: the line L through P and R, and the line L′ through P and R′.]
As in the case of $y = \frac{2}{3}x + 2$, we will prove the coincidence of L and $L'$ by
showing that they have the same slope. It then follows from Theorem 4.5 on page
72 that $L = L'$ because they also pass through the same point P. To this end, we
will prove in general the following lemma.
Lemma 4.6. The slope of the line joining any two distinct points on the graph of a
linear equation y = mx + k is always equal to m.
(Caution: One may be tempted to assert instead that “the slope of the graph
of y = mx + k is m”, but at this particular juncture, it is not yet known that the
graph of y = mx + k is a line, so it would be premature to talk about the “slope
of the graph of y = mx + k”.)
Proof. Let the two points on the graph of y = mx + k be ( p1 , p2 ) and (q1 , q2 ). The
slope of the line joining them is then (q2 − p2 )/(q1 − p1 ).12 Being on the graph,
the coordinates of these points satisfy, by definition, the equations
p2 = mp1 + k and q2 = mq1 + k.
Therefore, the slope is
\[ \frac{q_2 - p_2}{q_1 - p_1} = \frac{(mq_1 + k) - (mp_1 + k)}{q_1 - p_1} = \frac{m(q_1 - p_1)}{q_1 - p_1} = m. \]
This proves Lemma 4.6.
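Lemma 4.6 can be spot-checked with exact arithmetic. The sketch below takes the sample values m = 2/3 and k = 2 from the special case discussed earlier, picks a few x-values of our own choosing, and confirms that every pair of the resulting points on the graph gives the slope m. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def slope(P, Q):
    """Slope of the line joining P = (p1, p2) and Q = (q1, q2), assuming p1 != q1."""
    (p1, p2), (q1, q2) = P, Q
    return (q2 - p2) / (q1 - p1)

def points_on_graph(m, k, xs):
    """Points (x, mx + k) on the graph of y = mx + k for the given x-values."""
    return [(Fraction(x), m * Fraction(x) + k) for x in xs]

m, k = Fraction(2, 3), Fraction(2)
pts = points_on_graph(m, k, [-3, 0, 1, 5, 10])
print(all(slope(P, Q) == m for P, Q in combinations(pts, 2)))  # True
```

A finite check like this is of course no substitute for the proof; it only illustrates what the lemma asserts.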
Because L is the line joining the points P and R on the graph G of y = mx + k,
and $L'$ is the line joining P and $R'$, also on G, it immediately follows from the
preceding lemma that both of their slopes are m. So $L = L'$ by Theorem 4.5 on
page 72. The proof of step (α) is complete.
12 Note that this ratio always makes sense because $q_1 - p_1$ is never 0. The reason for the latter is
that, if it were, we would have p1 = q1 . But ( p1 , p2 ) and ( q1 , q2 ) being distinct points on the graph of
y = mx + k, we have p2 = mp1 + k and q2 = mq1 + k. Thus also p2 = q2 and the two points ( p1 , p2 )
and ( q1 , q2 ) would not be distinct, a contradiction. Hence q1 − p1 is never 0.
Now, step (β): why every point of L lies on the graph G. Because G is the
set of all solutions of the equation y = mx + k, we have to prove that any point
Q = (q1 , q2 ) on L satisfies q2 = mq1 + k. The reasoning is very simple now. We
have just seen that the slope of L is m. Since P = (0, k), the slope of L computed
using P and Q is still equal to m, i.e.,
\[ \frac{q_2 - k}{q_1 - 0} = m. \]
Exercises 4.4
(1) (a) What is the most general linear equation of two variables whose
graph passes through the point (−2, 1)? (b) Write down the three linear
equations of two variables whose graphs all have slope $\frac{3}{2}$ but intersect
the y-axis at (0, 1), (0, −2), and $(0, \frac{3}{2})$, respectively.
(2) Consider the graphs of y = 125x − 7 and y = 126x − 7 over the negative
x-axis, i.e., left of the y-axis. Which graph lies above the other? Explain
in two different ways.
(3) On the basis of what we know thus far, what can you say about the
graphs of the following two equations? Explain in two different ways.
\[ y = \frac{67}{895}x + 21 \quad \text{and} \quad y = \frac{67}{895}x + 21.5. \]
[Figure: the line L in the coordinate plane, crossing the x-axis at −4, with a point (x, y) marked on L.]
We can compute the slope of L by using (−4, 0) and (0, 6), and it is $\frac{3}{2}$. By the
first part of Theorem 4.2, the graph $\ell$ of the equation $y = \frac{3}{2}x + k$, where k is some
constant, is a line whose slope, in view of Lemma 4.7 on page 76, is also $\frac{3}{2}$. We
are therefore tempted to show that $\ell = L$ for a suitably chosen k. By Theorem 4.5
on page 72, this would be the case if $\ell$ passes through (0, 6). But the latter can be
easily arranged because we only need to choose k in $y = \frac{3}{2}x + k$ so that (0, 6) is a
solution. In other words, it suffices to choose k so that $6 = \frac{3}{2}(0) + k$. The latter is
true precisely when k = 6. Therefore the graph $\ell$ of $y = \frac{3}{2}x + 6$ is a line which
passes through (0, 6) and whose slope is $\frac{3}{2}$. As noted, Theorem 4.5 now implies
that indeed $L = \ell$. Therefore the given line L is the graph of $y = \frac{3}{2}x + 6$.
Observe that the “6” in $y = \frac{3}{2}x + 6$ is the “6” in (0, 6).
It remains to tackle the general case: Given a (nonvertical) straight line L, we
must find a linear equation whose graph is exactly L. The idea of the proof is the
same as in the above special case. Since L and the y-axis are not parallel, they
must meet at some point. Let L intersect the y-axis at (0, k). Let the slope of L
be m. We are going to show that L is the graph G of the equation y = mx + k.
For the sake of variety, let us assume m is negative so that we have the following
picture (for convenience, we have drawn the picture for the case k > 0):
[Figure: a line L of negative slope crossing the y-axis at k, where k > 0.]
By the first part of Theorem 4.2, the graph G of y = mx + k is a line. By Lemma
4.7 at the end of the last section, the slope of G is m. Since (0, k) is obviously a
solution of y = mx + k, G also passes through the point (0, k). Therefore G and
L are two lines which have the same slope m and pass through the same point
(0, k). By Theorem 4.5 on page 72, G and L are the same line. It follows that the
given line L is the graph of y = mx + k. This completes the proof of Theorem 4.2.
Exercises 4.5
(1) (i) Find the equation of the line passing through (1, −1) with slope −1.
Does it pass through (−85, 85)? (ii) Find the equation of the line passing
through $(-\frac{2}{5}, 3)$ with slope $\frac{1}{7}$. Where does it intersect the x-axis?
(2) Practice explaining to an eighth grader why the line joining the origin
and the point (−1, −1) is the graph of a linear equation of two vari-
ables. Do it once assuming that the student knows the graph of a linear
equation of two variables is a line, and do it also without making that
assumption. In any case, be clear about what you assume the student
knows, and make it as simple as possible.
In practice, one can also get the equation in Lemma 4.9 differently. The line
L obviously has slope $m = \frac{q_2 - p_2}{q_1 - p_1}$, so it must be defined by an equation of the
form y = mx + k for some constant k (second part of Theorem 4.2). It suffices
to determine what k is, and that can be done by recalling that ( p1 , p2 ) lies on
the graph of y = mx + k, i.e., ( p1 , p2 ) is a solution of the equation so that p2 =
mp1 + k. Thus k = p2 − mp1 and the equation sought by Lemma 4.9 is now
\[ y = mx + (p_2 - mp_1), \quad \text{where } m = \frac{q_2 - p_2}{q_1 - p_1}. \]
Needless to say, this equation need not—and should not—be memorized. More
importantly, we hope it is clear by now that the whole point of going through
the details of the last three sections is to show that, once you get to know the
interplay between the algebra of the equation and the geometry of the graph, no
memorization is necessary.
Example 2 on page 80 shows that the actual computation of k is even simpler
than the abstract description.
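The two-step computation described above (first the slope m, then k = p2 − m·p1) can be sketched in Python. The function name `line_through` is hypothetical, and the sample points (−4, 0) and (0, 6) are those of the special case treated earlier in the text.

```python
from fractions import Fraction

def line_through(P, Q):
    """Return (m, k) so that the nonvertical line through P and Q is y = mx + k."""
    p1, p2 = map(Fraction, P)
    q1, q2 = map(Fraction, Q)
    if p1 == q1:
        raise ValueError("vertical line: not of the form y = mx + k")
    m = (q2 - p2) / (q1 - p1)  # slope computed from the two given points
    k = p2 - m * p1            # chosen so that (p1, p2) is a solution
    return m, k

print(line_through((-4, 0), (0, 6)))  # (Fraction(3, 2), Fraction(6, 1))
```

Using Q instead of P in the second step gives the same k, since both points lie on the line.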
We now come to a third fact that is usually glossed over in TSM.
\[ y = -\frac{a}{b}x + \frac{c}{b} \quad \text{and} \quad y = -\frac{a'}{b'}x + \frac{c'}{b'} \]
we must have, by Lemma 4.8 on page 78, $\frac{c}{b} = \frac{c'}{b'}$ and $\frac{a}{b} = \frac{a'}{b'}$ because the graphs
are the same line and must therefore have the same y-intercept and slope. By the
cross-multiplication algorithm (see page 270), we get, equivalently,
\[ \frac{b'}{b} = \frac{c'}{c} \quad \text{and} \quad \frac{b'}{b} = \frac{a'}{a}. \]
Let $\lambda = \frac{b'}{b}$. Then
\[ \lambda = \frac{b'}{b} = \frac{c'}{c} = \frac{a'}{a}. \]
It follows that $a' = \lambda a$, $b' = \lambda b$, and $c' = \lambda c$. The proof is complete.
In the situation of Lemma 4.10, we can retrieve the equation ax + by = c
of $\ell$ from the equation $a'x + b'y = c'$ by multiplying both sides by $\frac{1}{\lambda}$. For this
reason, one normally regards any two equations defining a line as “the same”,
and speaks of the defining equation of a line.
The final and fourth fact is a consequence of Lemma 4.8 and Lemma 4.10.
so the equation of $\ell$ has the form $y = \frac{2}{3}x + k$ (Lemma 4.8), where the constant
k is determined by the fact that (−1, 3) is a solution of the equation since (−1, 3)
lies on $\ell$. Thus $3 = \frac{2}{3}(-1) + k$, and $k = \frac{11}{3}$. The equation of $\ell$ is $y = \frac{2}{3}x + \frac{11}{3}$.
There is another way to approach this problem. Let ( x, y) be an arbitrary
point on $\ell$ not equal to (−1, 3). By Theorem 4.4 on page 67, we can compute the
slope of $\ell$ by using ( x, y) and (−1, 3), getting
\[ \frac{y - 3}{x + 1} = \frac{2}{3}. \]
When $x \neq -1$, this is equivalent to $y - 3 = \frac{2}{3}(x + 1)$, and the graph of the
latter now contains every point on $\ell$, including (−1, 3). But $y - 3 = \frac{2}{3}(x + 1)$ is
equivalent to $y = \frac{2}{3}x + \frac{11}{3}$.
The preceding solutions need to be complemented by an observation. We
have used the point (−1, 3) instead of $(\frac{1}{2}, 4)$ as the reference point in both
solutions, but of course the outcome would have been the same had $(\frac{1}{2}, 4)$ been used.
This is the message of the Corollary on page 79. For example, suppose in the first
solution we make use of the fact that $(\frac{1}{2}, 4)$ lies on the graph of $y = \frac{2}{3}x + k$; then
we have $4 = \frac{2}{3} \cdot \frac{1}{2} + k$, so that
\[ k = 4 - \frac{1}{3} = \frac{11}{3} \]
as before. In the second solution, if we use the point $(\frac{1}{2}, 4)$ as the point of
reference, then for all points ( x, y) on $\ell$ except $(\frac{1}{2}, 4)$, we would have
\[ \frac{y - 4}{x - \frac{1}{2}} = \frac{2}{3}. \]
The same reasoning as before shows that the equation of $\ell$ is, again, $y = \frac{2}{3}x + \frac{11}{3}$.
It is worth emphasizing that none of these methods should be memorized
by brute force beyond the fact that the equation of a nonvertical line is of the
form y = mx + k for some constants m and k, where m is the slope of the line.
Instead, one should get to know the reasoning underlying these procedures and
do a simple computation each time to get at the equation.
Example 3. What is the x-intercept of the line joining the points (−4, 6) and
(2, 1)?
Solution. The slope of the line is $\frac{6 - 1}{-4 - 2} = -\frac{5}{6}$, so the equation of the line
is $y = -\frac{5}{6}x + k$, for some constant k. Since it contains the point (2, 1), we have
$1 = -\frac{5}{6} \cdot 2 + k$, and so $k = 1 + \frac{5}{3} = \frac{8}{3}$. The equation of the line is therefore
$y = -\frac{5}{6}x + \frac{8}{3}$. The point where this line intersects the x-axis has a y-coordinate
equal to 0; let it be (c, 0). Since (c, 0) lies on the line, we also get $0 = -\frac{5}{6}c + \frac{8}{3}$,
which is the same as $\frac{5}{6}c = \frac{8}{3}$. Multiplying through by $\frac{6}{5}$, we get $c = \frac{16}{5}$. So the
x-intercept is $\frac{16}{5}$.
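The steps of Example 3 can be retraced mechanically. The following Python sketch is an illustration only; the function name `x_intercept` is a hypothetical helper, and it assumes the line is neither vertical nor horizontal.

```python
from fractions import Fraction


def x_intercept(p, q):
    """x-coordinate of the point where the line through p and q meets the x-axis."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("vertical line: the slope is not defined")
    m = Fraction(y2 - y1) / Fraction(x2 - x1)  # slope of the line
    if m == 0:
        raise ValueError("horizontal line: no single x-intercept")
    k = y1 - m * x1                            # from y1 = m*x1 + k
    return -k / m                              # solve 0 = m*c + k for c


c = x_intercept((-4, 6), (2, 1))
print(c)
```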
82 4. LINEAR EQUATIONS IN TWO VARIABLES AND THEIR GRAPHS
[Figure: the ray from the origin O through P = (a, b) and P′ = D(a, b); Q and Q′ are the points where the vertical lines through P and P′ meet the x-axis.]
Notice that the x-coordinates of P and P′ are |OQ| and |OQ′|, respectively.
Because PQ ∥ P′Q′, Theorem 4.3 on page 62 implies immediately that
$$\frac{|OQ'|}{|OQ|} = \frac{|OP'|}{|OP|}.$$
But we have seen that $\frac{|OP'|}{|OP|} = r$, so we get
$$\frac{|OQ'|}{|OQ|} = r,$$
or, what is the same thing, |OQ′| = r|OQ|. Thus the x-coordinate of D(a, b) is
r times the x-coordinate a of (a, b). By using horizontal lines passing through P
and P′ and by repeating the same argument with respect to the y-axis, we get in
a similar manner that the y-coordinate of D(a, b) is also r times the y-coordinate
b of (a, b). This shows D(a, b) = (ra, rb) = r(a, b), as claimed.
4.6. USEFUL FACTS AND EXAMPLES 83
In this case, the only difference is that |OQ| = −a and |OQ′| is equal to the
negative of the x-coordinate of D(a, b), so that from |OQ′| = r|OQ|, we conclude
the x-coordinate of D ( a, b) is again r times the x-coordinate a of ( a, b) as before.
Similarly, the y-coordinate of D ( a, b) is again r times the y-coordinate b of ( a, b).
The proof is complete.
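The formula D(a, b) = (ra, rb) is easy to try out numerically. The sketch below is an illustration only; the names `dilate` and `squared_distance_from_origin` are hypothetical. It applies a dilation centered at O to a point with a negative x-coordinate and checks that the distance from O is multiplied by r, by comparing squared distances so that no square roots are needed.

```python
from fractions import Fraction


def dilate(point, r):
    """Image of `point` under the dilation with center O = (0, 0) and scale factor r."""
    a, b = point
    return (r * a, r * b)


def squared_distance_from_origin(point):
    x, y = point
    return x * x + y * y


P = (Fraction(-4), Fraction(6))
r = Fraction(3, 2)
P_image = dilate(P, r)

# |OP'| = r|OP| is equivalent to |OP'|^2 = r^2 * |OP|^2 when r > 0.
scaled_correctly = (
    squared_distance_from_origin(P_image) == r * r * squared_distance_from_origin(P)
)
print(P_image, scaled_correctly)
```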
Exercises 4.6
(1) (i) Find the equation of the line joining (2, −1) and (−3, −11). What is its
x-intercept? (ii) Find the equation of the line joining $(-\frac{1}{4}, \frac{2}{3})$ and $(5, \frac{3}{2})$.
What is its y-intercept?
(2) (i) What is the equation of the line with x-intercept equal to −2 and slope
$-\frac{1}{3}$? What is its y-intercept? (ii) What is the equation of the line with
x-intercept equal to $\frac{2}{5}$ and y-intercept equal to $-\frac{4}{3}$?
(3) What is the x-intercept of the line passing through (−5, 2) with slope
$-\frac{7}{2}$?
(4) (i) What is the y-intercept of the graph of x = −5y + 7? What is its
slope? Does the point $(352\frac{2}{5}, -70\frac{1}{5})$ lie on the graph? (ii) What is the
x-intercept of the line passing through $(5, -\frac{2}{3})$ and $(-\frac{4}{3}, \frac{1}{2})$? What is its
slope?
(5) Let L and L′ be the lines defined by 2x − 3y = 0 and 3x + 2y = 0,
respectively.
[Figure: the lines L and L′ through the origin O, with a point P on L and a point P′ on L′.]
Let P and P′ be points on L and L′, respectively. We may assume that
the coordinates of P and P′ are (3t, 2t) and (−2s, 3s), respectively, for
some numbers t and s. (a) Compute the squares of the lengths, $|OP|^2$,
$|OP'|^2$, and $|PP'|^2$. What do you observe? (b) What can you conclude
about L and L′?
(6) Let L and L′ be the lines defined by ax − by = 0 and bx + ay = 0,
respectively, where a and b are constants. What can you say about L and
L′? (Hint: Look at the preceding exercise.)
(7) Let D be the dilation with center O (center of a coordinate system) and
scale factor $\frac{1}{3}$. Let Δ be the triangle with vertices (6, 3), (12, 15), and
(9, −17). What are the vertices of D(Δ)?
https://doi.org/10.1090//mbk/099/05
CHAPTER 5
system. A priori, there may be others, but it will turn out that the solution of
this particular system consists only of $(-1, \frac{1}{2})$ so that $(-1, \frac{1}{2})$ is the solution of the
system. At present we are only concerned with systems consisting of two equations
in two variables, but note that a similar discussion also holds for systems of
equations consisting of many equations in any number of variables.
Postponing for the moment the discussion of how to get a solution such as
$(-1, \frac{1}{2})$ to the above system, let us first give a geometric interpretation of this
point $(-1, \frac{1}{2})$. Now $(-1, \frac{1}{2})$, being a solution of the first equation $x + 3y = \frac{1}{2}$,
lies on the line defined by $x + 3y = \frac{1}{2}$. Similarly, $(-1, \frac{1}{2})$ also lies on the line
defined by the second equation x − 2y = −2. This means the solution $(-1, \frac{1}{2})$ is
the point of intersection of the two lines defined by the equations in the linear system, as
shown:
[Figure: the lines x − 2y = −2 and $x + 3y = \frac{1}{2}$ in the coordinate plane, intersecting at the point $(-1, \frac{1}{2})$.]
Now, are there perhaps other solutions? The answer is no, because if the point
(A, B) is not at the intersection of these two lines, then let us say (A, B) does
not lie on the graph of $x + 3y = \frac{1}{2}$. Therefore, by the definition of the graph of
$x + 3y = \frac{1}{2}$ (which is the collection of all the points (α, β) which are solutions of
$x + 3y = \frac{1}{2}$), (A, B) cannot be a solution of $x + 3y = \frac{1}{2}$. Therefore, the point of
intersection of the lines defined by the equations of the linear system is exactly
the solution of the linear system.
This reasoning is perfectly general.
Theorem 5.1. Suppose we are given a linear system in two unknowns x and y:
$$\begin{cases} ax + by = e \\ cx + dy = f \end{cases}$$
where a, b, . . . , f are constants. Let $\ell_1$ and $\ell_2$ be the lines defined by the equations
ax + by = e and cx + dy = f, respectively. Then the solution to the system is (the set
of points in) the intersection of the lines $\ell_1$ and $\ell_2$.
Proof. We first show that if (A, B) lies in the intersection of $\ell_1$ and $\ell_2$, then it is
a solution of the system. Indeed, (A, B) being on $\ell_1$ implies that it is a solution
of ax + by = e and being on $\ell_2$ implies that it is a solution of cx + dy = f. Thus
(A, B) is a solution of both equations, and therefore a solution of the system. Is
there any solution that does not lie in the intersection of $\ell_1$ and $\ell_2$? We now show
that there is not. Let (A, B) be a solution of the system. Then (A, B) is a solution
of ax + by = e, and therefore (A, B) lies on $\ell_1$ by the definition of the graph of
an equation. For the same reason, (A, B) lies on $\ell_2$ as well. Therefore (A, B) is a
point of intersection of $\ell_1$ and $\ell_2$. Thus the set of all the solutions coincides with
the intersection of $\ell_1$ and $\ell_2$. The proof is complete.
5.2. THE ALGEBRAIC METHOD OF SOLUTION 87
Suppose the lines $\ell_1$ and $\ell_2$ are distinct nonparallel lines. Then we know they
intersect at exactly one point. We have just given the precise reasoning why, if the
lines defined by the two equations of a linear system of two linear equations in
two unknowns are distinct nonparallel lines, then the solution of the linear system
is the point of intersection of the two lines. This fact is usually decreed by fiat in
TSM—with no reason given—probably because the precise definition of the graph
of an equation is rarely given or, if given, is not put to use. It is very important
that you learn to make use of definitions in your teaching. In particular, please
do not forget to explain why the solution of a linear system can be obtained from
the intersection of the graphs of the linear equations.
Exercises 5.1
(1) Write down a system of equations so that the following picture is its
geometric interpretation (you may assume that one of the lines intersects
the x-axis at (1.5, 0) and the other intersects the y-axis at (0, 0.5)):
[Figure: two lines in the coordinate plane intersecting at the point (5/3, −1/3).]
then turn around and verify that the presumptive solution is indeed a solution of
the linear system.
The first method of solution is by substitution. We use one equation to get
an expression of (let us say) y in terms of x, and then replace the y in the other
equation by this expression of y.1 Then we solve the resulting linear equation in
x as in Section 3.1 on page 37. Finally, we solve for y.
We will illustrate with the following specific linear system in the hope of
making the explanation more accessible:
$$\begin{cases} 4x + 5y = -3 \\ -2x + y = 5 \end{cases} \tag{5.1}$$
Note, however, that the reasoning given below is perfectly general.
We want to show that if there is an ordered pair of numbers ( x, y) satisfying
the system (5.1), then necessarily ( x, y) = (−2, 1) (in the sense that x = −2
and y = 1; see page 56). So let ( x, y) be such a solution. Then the system (5.1)
becomes a pair of equations about numbers to which we can apply the usual
arithmetic operations. Look at the second equation: it practically hands over an
expression of y in terms of x, namely, y = 2x + 5. Now the substitution method
calls for “substituting” this expression of y into the first equation of (5.1) to get
(5.2) 4x + 5(2x + 5) = −3.
This implies 14x = −28 so that x = −2 (see Section 3.1 on page 37). Since
y = 2x + 5, we have y = −4 + 5 = 1. Thus ( x, y) being a solution of (5.1) implies
that it has to be (−2, 1), as claimed.
Needless to say, it is easy to check that (−2, 1) is indeed a solution of (5.1).
(In practice, the routine checking that a purported solution is the solution of the
linear system should be made mandatory in the school classroom.)
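The two-step structure just described (first find what a solution has to be, then check it) can be sketched in Python. This is an illustration under stated assumptions, not a method taken from the text: the function names `solve_by_substitution` and `is_solution` are hypothetical, the system is written as ax + by = e, cx + dy = f, and the sketch assumes d ≠ 0 so that the second equation yields y in terms of x.

```python
from fractions import Fraction


def solve_by_substitution(a, b, e, c, d, f):
    """Solve a*x + b*y = e, c*x + d*y = f by substituting y from the second equation.

    Assumes d != 0, so the second equation can be rewritten as y = (f - c*x)/d.
    """
    a, b, e, c, d, f = (Fraction(v) for v in (a, b, e, c, d, f))
    if d == 0:
        raise ValueError("this sketch needs d != 0 to express y in terms of x")
    y_slope, y_const = -c / d, f / d          # y = y_slope*x + y_const
    # Substituting into the first equation: (a + b*y_slope)*x = e - b*y_const.
    x_coeff = a + b * y_slope
    if x_coeff == 0:
        raise ValueError("the substituted equation does not determine x uniquely")
    x = (e - b * y_const) / x_coeff
    y = y_slope * x + y_const
    return x, y


def is_solution(x, y, a, b, e, c, d, f):
    """Check the purported solution against both original equations."""
    return a * x + b * y == e and c * x + d * y == f


x, y = solve_by_substitution(4, 5, -3, -2, 1, 5)   # the system (5.1)
print(x, y, is_solution(x, y, 4, 5, -3, -2, 1, 5))
```

For system (5.1) the substituted equation inside the function is exactly 14x = −28, in agreement with equation (5.2).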
We hasten to explain why the substitution method works. Precisely, what
does equation (5.2) mean, and why is its solution part of the solution of the system
(5.1)?
The first equation of (5.1) is equivalent (in the sense of having the same solu-
tions) to 5y = −4x − 3 since the latter is obtained from the former by transposing
the term 4x. Now the second equation of (5.1) is equivalent to y = 2x + 5 for the
same reason, and the latter is in turn equivalent to 5y = 5(2x + 5). Let us define
two linear systems to be equivalent if they have the same solutions. Then we see
that the system (5.1) is equivalent to the following linear system:
$$\begin{cases} 5y = -4x - 3 \\ 5y = 5(2x + 5) \end{cases} \tag{5.3}$$
Therefore, solving (5.1) is equivalent to solving (5.3). Let us pause to reflect on
what it means to solve (5.3). If x is any number, say 3, would x = 3 be part of
the solution of the system (5.3)? No, because if x = 3, then the first equation of
(5.3) implies that 5y = −15 and therefore y must be y = −3. However, if we let
x = 3 in the second equation of (5.3), then necessarily 5y = 55 and y = 11; this
contradicts the fact that y is already known to be equal to −3 because of the first
equation. Thus for a value of x to be part of the solution of the system (5.3), this
value of x must be one such that when the right sides of both equations of (5.3)
are given this value, they are equal to the same number, namely, 5y. Such being
the case, it is then clear how to solve the system (5.3): we want a value of x so
that the right sides of both equations in (5.3) are equal, i.e., we want x to be the
solution of
(5.4) −4x − 3 = 5(2x + 5).
The solution of (5.4) then guarantees that for this value of x, the solutions of y
from both equations in the system (5.3) will coincide, i.e., these values of x and y
furnish the solution of the system (5.3), and therefore also of system (5.1).
1 Thus we “substitute” this expression of y into the other equation.
Now observe that (5.4) is equivalent to equation (5.2), because the former is
obtained from the latter by transposing the term 4x. This then explains what the
method of substitution is all about and why solving equation (5.2) gives part of
the solution of (5.1).
We purposely chose system (5.1) for illustration because its second equation
immediately suggests an expression of y in terms of x. This obviates any need to
search for an expression of y in terms of x and allows us to completely focus on
the subsequent explanation about the substitution of x for y in the first equation.
However, the underlying reasoning of the preceding explanation is perfectly gen-
eral, and we will now demonstrate this generality by using the second equation
of (5.1) to get an expression of x in terms of y (i.e., rather than y in terms of x)
and use entirely similar reasoning to explain why the corresponding substitution
of y for x in the first equation of (5.1) also leads to a solution of system (5.1).
For this purpose, let ( x, y) be a solution of the system (5.1) as usual. Recall
the system (5.1):
4x + 5y = −3
−2x + y = 5
We will deduce that (x, y) = (−2, 1). We do this by using the second equation to
get an expression of x in terms of y and then substitute y for x in the first equation
to get an equation in y alone. Thus, rewrite the second equation as −2x = −y + 5,
from which we conclude
$$x = \left(-\tfrac{1}{2}\right)(-y + 5).$$
Now substitute this value of x into the first equation above to get
$4\left(-\frac{1}{2}\right)(-y + 5) + 5y = -3$, which may be rewritten as
(5.5) (−2)(−y + 5) + 5y = −3.
The method of substitution now calls for the solution of equation (5.5) in y. This
yields 7y = 7 and therefore y = 1. From $x = (-\frac{1}{2})(-y + 5)$, we get
$x = (-\frac{1}{2})(-1 + 5) = -2$. We conclude that if there is a solution (x, y) to (5.1), then
(x, y) = (−2, 1). As before, we should always check that (−2, 1) is indeed a
solution of the system (5.1).
Once again, why does solving equation (5.5) yield a solution of system (5.1)?
To see this, we first show that the system (5.1) has the same solutions as another
system. The first equation of (5.1) is equivalent to 4x = −5y − 3. The second
equation is equivalent to $x = (-\frac{1}{2})(-y + 5)$, as we have observed, and the latter
is clearly also equivalent to 4x = (−2)(−y + 5). Therefore the system (5.1) is
equivalent to the following linear system:
$$\begin{cases} 4x = -5y - 3 \\ 4x = (-2)(-y + 5) \end{cases} \tag{5.6}$$
90 5. SIMULTANEOUS LINEAR EQUATIONS
We may therefore solve system (5.1) by solving system (5.6) instead. Now if ( x, y)
is a solution of system (5.6), then, of course,
(5.7) −5y − 3 = (−2)(−y + 5)
because both sides are equal to 4x. Let us look at (5.7) as an equation in y.
The first observation about equation (5.7) is that its solution y guarantees that
for this value of y, the solution x from either equation of (5.6) will automatically
satisfy the other equation in (5.6), and therefore the pair ( x, y) is a solution of the
system (5.6), and hence of system (5.1). The second observation is that equation
(5.7) is equivalent to equation (5.5) as the former is obtained from the latter by
transposing the term 5y. This then explains why solving equation (5.5) yields a
solution of (5.1).
Remark. It is worth repeating that, once students have solved a linear system
using the analog of the substitution equation (5.2) or (5.5), they should check that
the solution so obtained actually satisfies the original system. This practice is not
only a good way to avoid unintended errors, but is also a reminder of the overall
structure of solving equations, i.e., assuming that there is a solution, we first find
out what this solution has to be, and then we confirm that the purported solution
is a solution (see Section 3.1).
In a middle school classroom, it would be entirely appropriate to assess stu-
dents’ understanding by asking them to explain why the method of substitution
works, in the sense of the explanation after equation (5.2) or after equation (5.5).
Activity
Solve:
4x − y = −1
9x − 2y = 1
This leads to y = 1 as before. One then solves for x to get the solution of the
system.
What is even more important about equation (5.8) is the fact that it becomes
the same equation as (5.5) if we expand the latter to get
2y − 10 + 5y = −3
and then transpose −10 to the right side. Therefore, this way of “bringing the co-
efficient of the x term in both equations to have opposite signs and then eliminate
x by adding the corresponding sides of both equations” achieves the same result
as the method of substitution embodied in equations (5.2)–(5.4). This so-called
method of elimination is thus nothing more than a different presentation of the
method of substitution, but one should keep this method in mind as an additional
tool to solve simultaneous equations.
We will give another illustration of the method of elimination by eliminating
the y terms in the linear system (5.1) instead. Still with ( x, y) as a solution of (5.1),
multiplying both sides of the second equation by 5, we obtain −10x + 5y = 25.
Now subtract both sides of this equation from the corresponding sides of the first
equation in (5.1),2 and we get:
(5.9) 4x − (−10x ) = −3 − 25.
Thus 14x = −28 and x = −2. The fact that y = 1 follows by letting x = −2 in
the second equation, −2x + y = 5, in (5.1).
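The elimination steps can also be written out for a general system ax + by = e, cx + dy = f. The following Python sketch is an illustration only; `solve_by_elimination` is a hypothetical name, and the sketch assumes ad − bc ≠ 0. For system (5.1), where d = 1 and b = 5, the first elimination step below is exactly the subtraction that produces equation (5.9).

```python
from fractions import Fraction


def solve_by_elimination(a, b, e, c, d, f):
    """Solve a*x + b*y = e, c*x + d*y = f by eliminating one variable at a time."""
    a, b, e, c, d, f = (Fraction(v) for v in (a, b, e, c, d, f))
    det = a * d - b * c
    if det == 0:
        raise ValueError("elimination removes both variables: no unique solution")
    # d*(first) - b*(second): the y terms cancel, leaving (a*d - b*c)*x = e*d - b*f.
    x = (e * d - b * f) / det
    # a*(second) - c*(first): the x terms cancel, leaving (a*d - b*c)*y = a*f - c*e.
    y = (a * f - c * e) / det
    return x, y


x, y = solve_by_elimination(4, 5, -3, -2, 1, 5)   # the system (5.1)
print(x, y)
# As always, check the purported solution in the original equations.
print(4 * x + 5 * y == -3 and -2 * x + y == 5)
```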
Activity
Explain why this way of eliminating the y terms and solving equation (5.9)
in x leads to a solution of system (5.1). (Hint: Compare equations (5.9) and
(5.2).)
Two examples
Example 1. Solve the following system:
$$\begin{cases} 2x + 3y = 2 \\ 3x - 4y = \frac{1}{6} \end{cases} \tag{5.10}$$
We first solve the system by a brute force substitution. From the first equation,
we get an equivalent equation $x + \frac{3}{2}y = 1$, so that $x = -\frac{3}{2}y + 1$. Substituting
this value of x into the second equation, we get:
$$3\left(-\tfrac{3}{2}y + 1\right) - 4y = \tfrac{1}{6}.$$
Upon simplification, this becomes
$$-\tfrac{9}{2}y + 3 - 4y = \tfrac{1}{6}.$$
We can solve this equation by clearing the denominators (see (E) on page 42). Thus,
multiplying both sides of the equation by 6, we get −27y + 18 − 24y = 1, so that
$y = \frac{1}{3}$. To solve for x, we can make use of either equation of (5.10). Suppose we
use the first equation; then $2x + 3(\frac{1}{3}) = 2$ and $x = \frac{1}{2}$. Thus the solution of (5.10)
is $(\frac{1}{2}, \frac{1}{3})$.
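Clearing denominators is a purely mechanical step, which the following Python sketch makes explicit. It is an illustration only; `clear_denominators` is a hypothetical helper that multiplies all coefficients of an equation by the least common multiple of their denominators. It is applied to the simplified equation of Example 1, and the resulting values of x and y are then checked in the original system (5.10).

```python
from fractions import Fraction
from math import gcd


def clear_denominators(coefficients):
    """Multiply every coefficient by the least common multiple of the denominators."""
    multiplier = 1
    for c in coefficients:
        d = Fraction(c).denominator
        multiplier = multiplier * d // gcd(multiplier, d)   # running lcm
    return multiplier, [int(Fraction(c) * multiplier) for c in coefficients]


# -(9/2)y + 3 - 4y = 1/6, listed as: y-coefficient, constant, y-coefficient, right side.
multiplier, cleared = clear_denominators([Fraction(-9, 2), 3, -4, Fraction(1, 6)])

# cleared now encodes -27y + 18 - 24y = 1; collect the y terms and solve.
y = Fraction(cleared[3] - cleared[1], cleared[0] + cleared[2])
x = (2 - 3 * y) / 2                      # from the first equation 2x + 3y = 2
print(multiplier, cleared, x, y)
print(2 * x + 3 * y == 2 and 3 * x - 4 * y == Fraction(1, 6))
```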
2 Again, don’t lose sight of the fact that each side is just a number!
Activity
Example 2. A fraction has the property that, when 2 is added to both the
numerator and the denominator, the new fraction is equal to $\frac{4}{3}$, but when the
denominator of the original fraction is subtracted from the numerator, the result
is 5. What is the fraction?
Let the fraction be $\frac{x}{y}$. Then we are given that
$$\frac{x + 2}{y + 2} = \frac{4}{3} \quad\text{and}\quad x - y = 5.$$
Using the cross-multiplication algorithm (see page 270), the first equation is
equivalent to 3x + 6 = 4y + 8, which is in turn equivalent to 3x − 4y = 2. Therefore
(x, y) is a solution of the following linear system:
$$\begin{cases} 3x - 4y = 2 \\ x - y = 5 \end{cases}$$
This system is equivalent to:
$$\begin{cases} 3x = 4y + 2 \\ 3x = 3y + 15 \end{cases}$$
Equating the right sides of the two equations gives 4y + 2 = 3y + 15, and therefore
y = 13. To solve for x, we can use either of the equations in the linear system,
but since the second equation is x − y = 5, we get x = 18. The fraction is $\frac{18}{13}$.
Exercises 5.2
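(1) Solve:
(a) $\begin{cases} 7x - 3y = 10 \\ 3x - 5y = -5 \end{cases}$
(b) $\begin{cases} \frac{1}{3}x - 3y = 5 \\ 4y - \frac{1}{2}x = -3.5 \end{cases}$
(c) $\begin{cases} \frac{2}{5}x - \frac{5}{6}y = -\frac{1}{2} \\ \frac{1}{6}x + \frac{5}{9}y = \frac{5}{2} \end{cases}$
(d) $\begin{cases} 12x + 11y = 172 \\ 28x - 17y = 60 \end{cases}$
(e) $\begin{cases} 0.08x + 0.9y = 0.46 \\ 0.1x - 0.04y = 0.16 \end{cases}$
(f) $\begin{cases} 5x - \frac{3}{4}y = 2 \\ x + 2y = \frac{11}{6} \end{cases}$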
5.3. CHARACTERIZATION OF PARALLEL LINES BY SLOPE 93
(2) Solve:
(a) $\begin{cases} \frac{6}{x} + \frac{12}{y} = -1 \\ \frac{8}{x} - \frac{9}{y} = 7 \end{cases}$
(b) $\begin{cases} \frac{9}{x} - 3y = 4 \\ \frac{3}{x} + 2y = \frac{10}{3} \end{cases}$
(3) The second digit of a two-digit number is $\frac{1}{3}$ of the first digit. If the
number is divided by the difference of the digits, the quotient is 15 and
the remainder is 3. Find the number.
(4) Alan’s age is $\frac{6}{5}$ of Bill’s, but 15 years ago his age was $\frac{13}{10}$ of Bill’s. Find
their current ages.
(5) Helena buys two books. The total cost is 49 dollars, and the difference
of the squares of the prices is 735. What is the cost of each book?
(6) I have two numbers x and y. Take 20% of x from x, then what remains
would be 7 less than y. If however I enlarge y by 20%, then it would
exceed x by 8. What are the two numbers?
(7) We have two whole numbers. The division-with-remainder of the larger
number by the smaller number has quotient 9 and remainder 15. Also,
the larger number is 97.5% of ten times the smaller number. What are
these numbers?
(8) A sum of money is to be divided equally among x people, each receiving
y dollars. If there were 3 more people, each person would receive 1 dollar
less, and if there were 6 fewer people, each would receive 5 dollars more.
Determine x and y.
(9) If 3 is added to the numerator of a fraction and 7 is subtracted from the
denominator, its value is $\frac{6}{7}$. But if 1 is subtracted from the numerator
and 7 is added to the denominator, its value is $\frac{2}{5}$. Find the fraction.
(10) Barrels are filled with wine and water. The contents of one barrel is
$\frac{5}{6}$ wine, and of another $\frac{8}{9}$ wine. How many gallons must be taken
from each to fill another barrel whose capacity is 24 gallons, so that the
mixture will be $\frac{7}{8}$ wine?
(11) Two marathon runners run at constant speeds. If they start running at
the same time from separate cities, 22 kilometers apart, towards each
other, they are 11 kilometers apart after 1 hour. Suppose they start over
and the first runner now runs twice as fast as before but the second run-
ner continues to run at his usual speed; then they would be 5 kilometers
apart after one hour. What are their respective speeds?
Theorem 5.2. Two distinct, nonvertical lines in $\mathbb{R}^2$ are parallel if and only if they
have the same slope.
Proof. Let the lines be $\ell_1$ and $\ell_2$. We first assume that they are parallel and prove
that they have the same slope.
If either of $\ell_1$ and $\ell_2$ is horizontal, then since $\ell_1 \parallel \ell_2$, the other is also horizontal
and both would have 0 slope and there would be nothing to prove in this case.
So we may assume both $\ell_1$ and $\ell_2$ are not horizontal. Referring to the picture
below, take a point P on $\ell_1$ and let a vertical line through P intersect $\ell_2$ at Q. (This
vertical line must intersect $\ell_2$ because the latter is not vertical.) Since the lines are
distinct, P ≠ Q. From Q, draw a horizontal line which meets $\ell_1$ at S. Then from
S, draw a vertical line which (as before) meets $\ell_2$ at a point T.
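[Figure: the parallel lines $\ell_1$ and $\ell_2$, with P and S on $\ell_1$, Q and T on $\ell_2$, the segments PQ and ST vertical, and the segment QS horizontal.]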
Now △PQS and △TSQ are right triangles with legs parallel to the coordinate
axes. By Theorem 4.4 on page 67 and by equation (4.4) on page 69, the slopes of
$\ell_1$ and $\ell_2$ are, respectively,
$$\frac{p_2 - s_2}{p_1 - s_1} = \frac{|PQ|}{|SQ|} \quad\text{and}\quad \frac{q_2 - t_2}{q_1 - t_1} = \frac{|ST|}{|SQ|}.$$
We have to show that these two numbers are equal. It suffices to show that
|PQ| = |ST|. Observe that PQ ∥ ST because both are vertical. We are also given
that $\ell_1 \parallel \ell_2$. Therefore PQTS is a parallelogram and, by Theorem 4.1 on page 54,
the opposite sides PQ and ST are equal. This shows that nonvertical parallel lines
have the same slope.
Conversely, suppose two distinct, nonvertical lines $\ell_1$ and $\ell_2$ have the same
slope, and we have to show that they are parallel.
We give three proofs. The first is a direct continuation of the preceding line
of geometric reasoning; the second is algebraic; while the third is the geometric
version of the second. First, if $\ell_1$ and $\ell_2$ have slope 0, then they are both horizontal
and are therefore parallel. We may therefore assume that they have nonzero slope
so that they are both nonhorizontal. We now perform the same construction as
before to get triangles △PQS and △TSQ. The fact that $\ell_1$ and $\ell_2$ have the same
slope then implies that (see Theorem 4.4 and equation (4.4))
$$\frac{|PQ|}{|SQ|} = \frac{|ST|}{|SQ|}.$$
Multiplying both sides by |SQ| yields |PQ| = |ST|. This immediately implies
that △PQS and △TSQ are congruent because of SAS. In greater detail, they have
a side SQ in common, and have another pair of equal sides in |PQ| = |ST|.
Finally, ∠PQS and ∠TSQ are equal because they are both right angles, so the
congruence conditions of SAS are met. It follows that ∠PSQ and ∠TQS are equal
because they are corresponding angles of congruent triangles. But then $\ell_1 \parallel \ell_2$
because their alternate interior angles relative to the transversal $L_{SQ}$ are equal
(see Theorem 4.9 (of [Wu-PreAlg]) on page 271). The first proof of Theorem 5.2 is
complete.
Here is a second proof; it is algebraic. Since $\ell_1$ and $\ell_2$ are both nonvertical,
say they have slope m. Then let the equations defining them be y = mx + k
and y = mx + k′, respectively, where k ≠ k′ because by assumption the lines
are distinct. Suppose they intersect at a point (A, B); then B = mA + k and
B = mA + k′, which then implies that mA + k = mA + k′, which in turn implies
that k = k′. This is a contradiction to the earlier conclusion that k ≠ k′. Again,
we are done.
Finally, we give a third proof of Theorem 5.2 by contradiction (see the proof
of Lemma 3.1 in [Wu-PreAlg]). Suppose $\ell_1$ and $\ell_2$ are distinct and have the same
slope. If they are not parallel, then they meet at a point Q. Since $\ell_1$ and $\ell_2$ are
now two lines which have the same slope and pass through the same point (i.e.,
Q), Theorem 4.5 on page 72 implies that $\ell_1 = \ell_2$. This contradicts the hypothesis
that the lines are distinct, thereby completing the third proof of Theorem 5.2.
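The algebraic content of Theorem 5.2 can be sketched in Python for lines given in the form y = mx + k. This is an illustration only; the representation of a line as a pair (m, k) and the names `are_parallel` and `intersection` are assumptions made for the sketch. When the slopes differ, equating mx + k with the other right side gives exactly one common point; when the slopes agree, no such x can be computed, which mirrors the second proof.

```python
from fractions import Fraction


def are_parallel(line1, line2):
    """Distinct nonvertical lines y = m*x + k are parallel exactly when slopes agree."""
    (m1, k1), (m2, k2) = line1, line2
    return m1 == m2 and k1 != k2


def intersection(line1, line2):
    """Common point of y = m1*x + k1 and y = m2*x + k2 when the slopes differ."""
    (m1, k1), (m2, k2) = line1, line2
    if m1 == m2:
        raise ValueError("equal slopes: the lines are parallel or identical")
    x = Fraction(k2 - k1) / Fraction(m1 - m2)   # from m1*x + k1 = m2*x + k2
    return (x, m1 * x + k1)


print(are_parallel((2, 1), (2, -3)))     # same slope, different constant terms
print(intersection((2, 1), (-1, 4)))     # different slopes: exactly one point
```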
Activity
Proof. Let the line passing through A and B be denoted by L. Let P be a point with
coordinates ( p1 , p2 ). We will prove that T ( P) has coordinates ( p1 + c1 , p2 + c2 ).
The proof is broken into two cases, P lies on L and P does not lie on L.
Now |PQ| = |AB| because the distance formula (page 57) implies
$$|PQ| = \sqrt{((p_1 + c_1) - p_1)^2 + ((p_2 + c_2) - p_2)^2} = \sqrt{c_1^2 + c_2^2} = \sqrt{(b_1 - a_1)^2 + (b_2 - a_2)^2} = |AB|.$$
Next, we prove that Q lies on L (= $L_{AB}$). Let $L_{PQ}$ denote the line containing P and
Q as usual. Now L and $L_{PQ}$ are two lines that contain the point P, and they also
have the same slope because the slope of $L_{PQ}$ is
$$\frac{(p_2 + c_2) - p_2}{(p_1 + c_1) - p_1} = \frac{c_2}{c_1} = \frac{b_2 - a_2}{b_1 - a_1}$$
and the latter is the slope of L. Therefore, by Theorem 4.5 on page 72, the lines
L and $L_{PQ}$ coincide. Therefore Q lies on $L_{PQ}$ = L. Finally, if B is to the right of
A (as shown in the preceding picture), then b1 > a1 , so that b1 − a1 > 0, which
is to say, c1 > 0. This then implies that p1 + c1 > p1 , i.e., the first coordinate of
Q is bigger than the first coordinate of P and therefore Q is to the right of P. The
proof that if B is to the left of A then Q is also to the left of P is entirely similar.
Thus T ( P) = Q if P lies on L and L is not vertical.
If L is vertical, then a1 = b1 and c1 = 0. It is straightforward to see that,
in this case, the preceding argument simplifies, e.g., we prove that if A is above
(respectively, below) B, then P is also above (respectively, below) Q. The proof of
Case 1 is complete.
Case 2: P does not lie on L = $L_{AB}$. As before, first assume L is not vertical,
i.e., a1 ≠ b1. Again let P = (p1, p2) and let Q = (p1 + c1, p2 + c2). According
to the definition of translation (see page 269), we have to prove that PQ ∥ AB,
|PQ| = |AB|, and Q is to the left (respectively, the right) of P if B is to the left
(respectively, the right) of A on the line L.
[Figure: the line L through A and B, and the line $L_{PQ}$ through P and Q, drawn in the coordinate plane.]
Since c1 = b1 − a1 ≠ 0, the first coordinate (p1 + c1) of Q differs from the first
coordinate p1 of P and therefore $L_{PQ}$ is not vertical. Thus the slopes of L (= $L_{AB}$)
and $L_{PQ}$ are well-defined; the fact that the slopes are equal can be proved in
exactly the same way as in Case 1. Therefore by Theorem 4.5 on page 72, the
lines L and $L_{PQ}$ are parallel. The fact that |PQ| = |AB| is proved by the same
calculation as in Case 1 using the distance formula, and, finally, the reasoning in
Case 1 concerning Q being to the left (respectively, the right) of P on $L_{PQ}$ if B is
to the left (respectively, the right) of A on the line L remains the same for Case 2.
Now suppose L is vertical. In that case, a1 = b1 and therefore c1 = 0. The
first coordinates of P and Q are now the same (= p1) and therefore $L_{PQ}$ is also
vertical. Then it is straightforward to see that if A is above (respectively, below)
B, then P is also above (respectively, below) Q. The proof of Case 2 is complete,
and therefore the proof of Lemma 5.3 is also complete.
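The coordinate description of a translation used in this proof, T(P) = (p1 + c1, p2 + c2) with c1 = b1 − a1 and c2 = b2 − a2, can be checked on a numerical example. The Python sketch below is an illustration only; the names `translate` and `squared_distance` and the sample points are assumptions. Equality of slopes is tested in cross-multiplied form so that vertical lines need no special treatment.

```python
def translate(point, a, b):
    """Image of `point` under the translation along the vector from a to b."""
    c1, c2 = b[0] - a[0], b[1] - a[1]
    return (point[0] + c1, point[1] + c2)


def squared_distance(p, q):
    return (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2


A, B, P = (1, 2), (4, 6), (-3, 5)      # P does not lie on the line through A and B
Q = translate(P, A, B)

# |PQ| = |AB|, compared through squared lengths to avoid square roots.
same_length = squared_distance(P, Q) == squared_distance(A, B)
# (q2 - p2)/(q1 - p1) = (b2 - a2)/(b1 - a1), written without division.
same_direction = (Q[1] - P[1]) * (B[0] - A[0]) == (B[1] - A[1]) * (Q[0] - P[0])
# Q is to the right of P exactly when B is to the right of A.
same_side = (Q[0] > P[0]) == (B[0] > A[0])
print(Q, same_length, same_direction, same_side)
```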
Exercises 5.3
(1) Mental math: Without solving the following linear system, explain using
geometry why it has no solution:
$$\begin{cases} \frac{1}{4}x + 67y = 567 \\ x + 268y = 931 \end{cases}$$
(3) Use coordinates to prove that the three medians of a triangle (the lines
joining a vertex to the midpoint of the opposite side) meet at a point, as
follows. We may assume that the vertices of the triangle are A = ( a1 , a2 )
(a2 > 0), B = (0, 0), and C = (c, 0) (c > 0), i.e., A is above the x-axis,
B is the origin and C lies on the positive x-axis. Let the midpoints of
AB, AC, and BC be D, E, and F, respectively, and let BE and CD meet
at a point G. Prove that A, G, and F are collinear by computing the
coordinates of G and F. Hint: Use Exercise 5 on page 72.
(4) Theorem 3.1 on page 44 proves that a linear equation of one variable
ax + b = cx + d has a unique solution if a ≠ c. Fill in the details of the
Activity
Decide by visual inspection whether the following system has a unique solution
or an infinite number of solutions, and explain why:
$$\begin{cases} 12\frac{1}{3}x - 24y = 1 \\ 18.2x + 35.5y = 2.8 \end{cases}$$
It remains to consider the situation where the system (5.12) has no solution.
In that case, the determinant of the system (5.12) must be zero because if the
determinant were nonzero, the system would have a unique solution, which is a
contradiction. Moreover, we claim that, if there is no solution, then there is no
nonzero number λ so that (c, d, f ) = λ( a, b, e). This is because Case 2 above says
(c, d, f ) = λ( a, b, e) for a nonzero λ implies the system has an infinite number of
solutions, which is a contradiction again.
We now show that, conversely, if the system (5.12) has a zero determinant
and if there is no nonzero number λ so that (c, d, f ) = λ( a, b, e), then the system
has no solution. We give a proof by contradiction (see the proof of Lemma 3.1 in
Section 3.1 of [Wu-PreAlg]). Suppose under these assumptions, the system (5.12)
has a solution. According to the earlier conclusion on page 98, the linear system
either has a unique solution or an infinite number of solutions. In the former case,
the determinant is nonzero (see Case 1 above), a contradiction to the hypothesis
of a zero determinant. But in the latter case, we have just seen that there must
be a nonzero number λ so that (c, d, f ) = λ( a, b, e) (see Case 2 above), again a
contradiction. Thus the converse is proved.
We have now proved that the system (5.12) has no solution if and only if it has
a zero determinant and there is no nonzero number λ so that (c, d, f ) = λ( a, b, e).
According to Theorem 5.1 on page 86, this means that if the determinant of the
system (5.12) is zero but there is no nonzero number λ so that (c, d, f ) = λ( a, b, e),
then the graphs $\ell_1$ and $\ell_2$ of the equations in (5.12) are parallel lines.
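The three possibilities can be summarized in a short Python sketch. It is an illustration under stated assumptions: the system is taken to be ax + by = e, cx + dy = f as in Theorem 5.1, its determinant is taken to be ad − bc, each equation is assumed to define a line (so a, b are not both zero and c, d are not both zero), and the name `classify` is hypothetical. Under these assumptions, when ad − bc = 0 the triple (c, d, f) is a nonzero multiple of (a, b, e) exactly when af − ce and bf − de are also zero, which is what the code tests.

```python
from fractions import Fraction


def classify(a, b, e, c, d, f):
    """Describe the solutions of a*x + b*y = e, c*x + d*y = f without solving."""
    a, b, e, c, d, f = (Fraction(v) for v in (a, b, e, c, d, f))
    if (a == 0 and b == 0) or (c == 0 and d == 0):
        raise ValueError("each equation is assumed to define a line")
    if a * d - b * c != 0:
        return "unique solution"             # distinct nonparallel lines
    if a * f - c * e == 0 and b * f - d * e == 0:
        return "infinitely many solutions"   # the two lines coincide
    return "no solution"                     # parallel lines


print(classify(4, 5, -3, -2, 1, 5))   # the system (5.1)
print(classify(1, 1, 1, 2, 2, 2))     # second equation is twice the first
print(classify(1, 1, 1, 2, 2, 3))     # same left sides up to a factor, different right sides
```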
Activity
Decide by mental computation alone whether the following system has any
solutions:
$$\begin{cases} 2\frac{1}{3}x - 5y = 1 \\ 7x - 15y = 2.8 \end{cases}$$
5.5. PARTIAL FRACTIONS AND PYTHAGOREAN TRIPLES 101
Exercises 5.4
(1) Without solving any of the following systems of equations, discuss the
nature of their solutions: a unique solution, infinitely many solutions, or
no solution?
(i )
2x − 3y = 1
x + 23 y = 2
(ii)
(− 14 ) x − 3546 y = 23
697 x + 4239 y = 890
(iii)
23 x − 85 y = 22
69 x + 255 y = 67
(2) Prove that the determinant of a linear system is zero if and only if the
graphs of the two equations $\ell_1$ and $\ell_2$ coincide or are parallel lines. Give
two proofs, one using Theorem 5.4, and a direct proof without using
Theorem 5.4.
Partial fractions
Pythagorean triples
Partial fractions
This section gives two applications of linear systems in two variables.
The first application shows how to express certain rational expressions in a
number x as a sum of simpler rational expressions also in x. Consider the simple
sum:
$$\frac{5}{x - 2} + \frac{4}{x + 3} = \frac{5(x + 3) + 4(x - 2)}{(x - 2)(x + 3)}.$$
After simplifying the numerator of the right side and multiplying out
$(x - 2)(x + 3) = x^2 + x - 6$, we get
$$\frac{5}{x - 2} + \frac{4}{x + 3} = \frac{9x + 7}{x^2 + x - 6}. \tag{5.13}$$
$$\frac{p(x)}{(a_1x + b_1) \cdots (a_nx + b_n)} = \frac{c_1}{a_1x + b_1} + \cdots + \frac{c_n}{a_nx + b_n}.$$
4 For (A), see [Birkhoff-MacLane], Section 3.11; (B) follows from the fact that a polynomial of
degree n has at most n roots, which is easy to prove, e.g., Chapter 9 in Volume II of [Wu-HighSchool].
$$a_nx^n + \cdots + a_1x + a_0 = b_nx^n + \cdots + b_1x + b_0.$$
9x + 7 a b
(5.14) = + ,
( x − 2)( x + 3) x−2 x+3
which is valid for all x = 2, −3. We now use Fact (B) to recover equation (5.13),
i.e., to obtain the values of the constants a and b as 5 and 4, respectively.5 By the
addition of rational expressions,
a b a ( x + 3) + b ( x − 2)
+ =
x−2 x+3 ( x − 2)( x + 3)
( a + b) x + (3a − 2b)
= .
( x − 2)( x + 3)
9x + 7 ( a + b) x + (3a − 2b)
=
( x − 2)( x + 3) ( x − 2)( x + 3)
for all x = 2, −3. If we multiply both sides by ( x − 2)( x + 3), we see that this is
equivalent to
9x + 7 = ( a + b) x + (3a − 2b)
for all x = 2, −3. From Fact (B), we know that the coefficients a + b and 3a − 2b
must be equal to 9 and 7, respectively. In other words, we have the following
simultaneous linear equations in a and b:
a + b = 9
3a − 2b = 7
We can solve this linear system by simply multiplying the first equation by 2 and
then adding it to the second equation. This yields 5a = 25 and therefore a = 5.
Either equation then yields b = 4, as claimed.
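
The computation above can be checked mechanically. The following is a minimal sketch in Python, not part of the original text: the helper name `solve_2x2` is hypothetical, and it solves the system by the standard determinant formula rather than by the elimination steps described above. It then confirms (5.14) at a few sample values of x other than 2 and −3.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*a + b1*b = c1 and a2*a + b2*b = c2 exactly (hypothetical helper)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the system has no unique solution")
    a = Fraction(c1 * b2 - c2 * b1, det)
    b = Fraction(a1 * c2 - a2 * c1, det)
    return a, b

# The system a + b = 9, 3a - 2b = 7 from the text.
a, b = solve_2x2(1, 1, 9, 3, -2, 7)

def left_side(x):
    """(9x + 7) / ((x - 2)(x + 3)) for an integer x other than 2 and -3."""
    return Fraction(9 * x + 7, (x - 2) * (x + 3))

def right_side(x):
    """a/(x - 2) + b/(x + 3) with the constants found above."""
    return a / (x - 2) + b / (x + 3)

# Spot-check the identity at several admissible integers.
samples = [x for x in range(-10, 11) if x not in (2, -3)]
identity_holds = all(left_side(x) == right_side(x) for x in samples)
```

A spot-check at finitely many values is of course not a proof; the proof is the appeal to Fact (B) given above.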
Remark. Because we have only learned about solving linear systems in two
unknowns, we can only make use of (A) and (B) on page 102 for the case n = 2.
Once we learn how to deal with linear systems in n unknowns, the preceding
method will allow us to determine all the coefficients c1 , . . . , cn in (A).
5 There is another method to achieve the same goal; again see [Birkhoff-MacLane], Section 3.11.
Pythagorean triples
[Sidebar: Pythagorean triples furnish another example of how algebra and geometry are intertwined.]
We now give a second application of linear systems. We say three positive integers a, b, c
form a Pythagorean triple {a, b, c} if $a^2 + b^2 = c^2$. In other words, a, b, and c are the
lengths of three sides of a right triangle, and
our convention is that the third member of a Pythagorean triple is, by definition, the
length of the hypotenuse of the right triangle. It goes without saying that the key
point of the definition of a Pythagorean triple is that all three numbers are positive
integers. Everybody knows that 3, 4, 5 form a Pythagorean triple; some may even
know that {5, 12, 13} is another Pythagorean triple, or even that {8, 15, 17} is
yet another example. But are there others?
Before answering this question, we should make the trivial observation that
given any two positive integers m and n, we can find a right triangle Δ so that m
and n are (the lengths of) two of its sides. Indeed, simply construct two perpen-
dicular segments with a common endpoint so that one has length m and the other
has length n, and then join the other endpoints of the segments to form a right
triangle. (If m < n, we have the additional freedom of constructing a right trian-
gle whose hypotenuse has length n and one leg has length m; see Exercise 1 on
page 108.) However, the length of the third side of this right triangle will not be
an integer in general. The classic example is the isosceles right triangle with both
legs equal to 1, and the third side (the hypotenuse) then of course has length
$\sqrt{2}$, which is not even a rational number.6 The main attraction of Pythagorean
triples is therefore that all three numbers are positive integers.
Our purpose is to produce Pythagorean triples at will by solving an extremely
simple linear system of equations. It will be obvious that we will get an infinite
number of Pythagorean triples by this method. It is even true that the method
produces all the Pythagorean triples, though we will not prove this fact here. One
would like to say that this method is due to the Babylonians some thirty-eight
centuries ago, circa 1800 B.C. (Babylon, about sixty miles south of Baghdad in
present day Iraq), but a more accurate statement would be that it is the algebraic
rendition of the method one infers from a close reading of the celebrated cuneiform
tablet, Plimpton 322, which lists fifteen Pythagorean triples.7 See [Robson].
Let us first perform a conceptual simplification. Take {3, 4, 5}, for exam-
ple. Once we are in possession of this triple, we will in fact be in possession
of an infinite number of Pythagorean triples, namely, {6, 8, 10}, {9, 12, 15},
{12, 16, 20}, and in general, {3n, 4n, 5n} for any positive integer n. Clearly,
if you already have the Pythagorean triple {3, 4, 5}, there is not much glory in
claiming that you also have another Pythagorean triple such as {6, 8, 10}. Ac-
cordingly, we define a Pythagorean triple { a, b, c} to be primitive if the integers
a, b, and c have no common divisor other than 1 (i.e., if k is a positive integer
that divides all three a, b, and c, then k = 1), and will henceforth concentrate
on getting primitive Pythagorean triples. We say a Pythagorean triple { a, b, c}
6 See, e.g., the end of Section 3.2 in [Wu-PreAlg].
7 Lest you entertain for even a split second the idle thought that people couldn’t have known
such advanced mathematics thirty-eight centuries ago and that these triples were probably hit upon
by trial and error, let it be noted that the largest triple in Plimpton 322 is {12709, 13500, 18541}.
\[ u - v = \tfrac{1}{2} \]
Adding the equations gives $2u = \frac{5}{2}$ so that $u = \frac{5}{4}$. From the second equation,
we get $v = u - \frac{1}{2} = \frac{5}{4} - \frac{1}{2} = \frac{3}{4}$. Thus we have retrieved the grandfather of all
Pythagorean triples, {3, 4, 5}.
Example 2. Consider:
\[ \begin{cases} u + v = \frac{3}{2} \\ u - v = \frac{2}{3} \end{cases} \]
Since this is new to most people, one should check directly that $7^2 + 24^2 = 25^2$.
Example 4. Consider:
\[ \begin{cases} u + v = \frac{69}{2} \\ u - v = \frac{2}{69} \end{cases} \]
Pythagorean triple {276, 4757, 4765}. Although Theorem 5.5 guarantees that
this is indeed a Pythagorean triple, it would be good for your soul to directly
check that $276^2 + 4757^2 = 4765^2$ is in fact true.
Observe that thus far, every single Pythagorean triple has been primitive.
Now consider:
Example 5. Consider:
\[ \begin{cases} u + v = 5 \\ u - v = \frac{1}{5} \end{cases} \]
the primitive triple {5, 12, 13} would be the result. Thus we see that different
values of s and t do not always lead to distinct primitive Pythagorean triples.
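
The examples above can be replayed mechanically. The sketch below rests on two assumptions read off from the examples rather than quoted from Theorem 5.5: that the linear system has the form u + v = t/s, u − v = s/t for positive integers s < t, and that the triple {a, b, c} is obtained by writing v = a/b and u = c/b over a common denominator b. The function name `triple_from` is hypothetical.

```python
from fractions import Fraction
from math import gcd

def triple_from(s, t):
    """Solve u + v = t/s, u - v = s/t exactly and return (a, b, c)
    with v = a/b and u = c/b over a common denominator b (assumed reading)."""
    if not (0 < s < t):
        raise ValueError("need 0 < s < t")
    p, q = Fraction(t, s), Fraction(s, t)
    u = (p + q) / 2   # adding the two equations gives 2u
    v = (p - q) / 2   # subtracting the second from the first gives 2v
    # Least common denominator of u and v.
    b = u.denominator * v.denominator // gcd(u.denominator, v.denominator)
    a, c = int(v * b), int(u * b)
    if a * a + b * b != c * c:
        raise AssertionError("not a Pythagorean triple")
    return a, b, c

example_1 = triple_from(1, 2)    # the system with right sides 2 and 1/2
example_2 = triple_from(2, 3)    # right sides 3/2 and 2/3
example_4 = triple_from(2, 69)   # right sides 69/2 and 2/69
example_5 = triple_from(1, 5)    # right sides 5 and 1/5
```

Under these assumptions, `example_2` and `example_5` consist of the same three numbers 5, 12, 13 (with the two legs in opposite order), which illustrates the remark that different values of s and t need not lead to distinct primitive triples.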
We now explain the genesis of Theorem 5.5, which, we must add, is already
implicit in the above proof of the theorem. We will follow the standard method
of assuming that we already have a Pythagorean triple { a, b, c}, and proceed
to find out what equation or equations they must satisfy. The new idea here is
that, by rewriting the Pythagorean Theorem $a^2 + b^2 = c^2$ as $b^2 = c^2 - a^2$ so that
the latter, but for school mathematics, the latter is more instruc-
tive.
With a little more work (see Exercise 7 immediately follow-
ing), one can prove that if s and t are relatively prime (i.e., no
common divisor other than 1), and if one of them is even and the
other odd, then the triple $\{2st,\ t^2 - s^2,\ t^2 + s^2\}$ is primitive. With
considerably more work (see Exercise 11 on page 109), it can be
shown that every primitive Pythagorean triple is represented in
terms of suitable s and t in this manner.
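
The first claim in the note above is easy to test numerically before attempting a proof. The following sketch (the function names are hypothetical) lists the triples $\{2st,\ t^2 - s^2,\ t^2 + s^2\}$ for relatively prime s < t of opposite parity and checks that each is a primitive Pythagorean triple. A finite check is only evidence, not a substitute for the exercise.

```python
from math import gcd

def candidate_triples(max_t):
    """Yield (s, t, (2st, t^2 - s^2, t^2 + s^2)) for relatively prime
    s < t <= max_t with one of s, t even and the other odd."""
    for t in range(2, max_t + 1):
        for s in range(1, t):
            if gcd(s, t) == 1 and (s + t) % 2 == 1:
                yield s, t, (2 * s * t, t * t - s * s, t * t + s * s)

def is_pythagorean(a, b, c):
    return a * a + b * b == c * c

def is_primitive(a, b, c):
    """True when a, b, c have no common divisor other than 1."""
    return gcd(gcd(a, b), c) == 1

found = list(candidate_triples(20))
all_ok = all(is_pythagorean(*tr) and is_primitive(*tr) for _, _, tr in found)
```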
Exercises 5.5
(1) Given any two positive numbers (not necessarily integers) x and y so
that x < y, describe a ruler-and-compass construction of a right triangle
so that its hypotenuse has length y and one leg has length x. If x and y
are fixed, are all such triangles congruent?
(2) Express $\frac{8x+2}{x^2-1}$ as a sum of constant multiples of $\frac{1}{x+1}$ and $\frac{1}{x-1}$.
(3) Express $\frac{-5(x+1)}{3(x^2+x-12)}$ as a sum of constant multiples of $\frac{1}{x+4}$ and $\frac{1}{x-3}$.
(4) In each of the following, you are asked to solve the linear system in The-
orem 5.5 with the given values of s and t to obtain Pythagorean triples.
You may use a scientific calculator, especially for (i) and (j) below.
(a) s = 2, t = 5.
(b) s = 4, t = 5.
(c) s = 1, t = 4.
(d) s = 1, t = 3.
(e) s = 1, t = 6.
(f) s = 1, t = 12.
(g) s = 3, t = 17.
(h) s = 3, t = 13.
(i) s = 12, t = 13.
(j) s = 54, t = 125.
(k) s = 8, t = 9907.
(5) In part (k) of the last problem, the largest number in the Pythagorean
triple has 8 digits. Suppose you have a calculator with only a 12-digit
display on the screen. Explain how you can use such a calculator to
directly verify that the triple of numbers so obtained is a Pythagorean
triple.
(6) Prove that a Pythagorean triple is either primitive, or is a multiple of a
primitive Pythagorean triple.
(7) Prove that if s and t are relatively prime positive integers (i.e., no com-
mon divisor other than 1), 0 < s < t, and one of them is even and the
other odd, then $\{2st,\ t^2 - s^2,\ t^2 + s^2\}$ is a primitive Pythagorean triple.
(Hint: Make strong use of the Key Lemma in Section 3.1 of [Wu-PreAlg]
(also see page 270 of this volume).)
(8) Let { a, b, c} be a Pythagorean triple. Prove that the following four condi-
tions are equivalent: (i) { a, b, c} is primitive. (ii) a and b are relatively
prime. (iii) a and c are relatively prime. (iv) b and c are relatively
prime.
5.6. APPENDIX 109
5.6. Appendix
[Sidebar: The fact that two lines are perpendicular if the product of their slopes is −1 is something that must be proved.]
In Theorem 5.2 on page 94, we characterized parallelism in terms of slope. There is a
companion theorem that gives a characterization of perpendicularity in terms of slope,
and the purpose of this Appendix is to state and prove the latter. We do so not only for
reasons of completeness, but also because it is now becoming common in high school
algebra textbooks to adopt the absurd practice of
defining perpendicularity in terms of slope. The absurdity comes from the fact that
the concept of perpendicularity has already been defined in elementary school in
terms of the degree of an angle (i.e., 90° angles at the point of intersection). What
we need is therefore the proof of a theorem, not a second definition.
Theorem 5.6. Two nonvertical lines are perpendicular if and only if the product of
their slopes is equal to −1.
Proof. We first prove a special case of the theorem: Let $\ell_1$ and $\ell_2$ be the two given
lines passing through the origin O. We will prove that they are perpendicular if
and only if the product of their slopes is −1.
Because $\ell_1$ and $\ell_2$ are nonvertical and are perpendicular to each other, neither
is horizontal. To describe the relative positions of the lines, observe that the four
9 See page 268.
right angles10 formed by the positive and negative coordinate axes with vertex
at the origin O, minus the coordinate axes themselves, are usually called the four
quadrants of the coordinate system and are labeled I, II, III, and IV, as shown:
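[Figure: the coordinate axes through O, with the quadrants labeled I (upper right), II (upper left), III (lower left), and IV (lower right).]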
Since $\ell_1$ and $\ell_2$ are neither vertical nor horizontal, with the exception of the point
O, they must lie completely inside either quadrants I and III, or quadrants II and
IV, as shown below:
[Figure: two sketches of a line through O, one lying in quadrants I and III, the other in quadrants II and IV.]
If both $\ell_1$ and $\ell_2$ lie in quadrants I and III, the degree of the angle between
the rays on the lines is either greater than 90° or less than 90°, and $\ell_1$ cannot be
perpendicular to $\ell_2$, as shown:
[Figure: lines $\ell_1$ and $\ell_2$ through O, both lying in quadrants I and III.]
For a similar reason, $\ell_1$ and $\ell_2$ cannot both lie in quadrants II and IV. We may
therefore assume that $\ell_1$ lies in quadrants I and III and $\ell_2$ lies in quadrants II and
IV.
We choose points $P_1 = (x_1, y_1)$ and $P_2 = (x_2, y_2)$ on $\ell_1$ and $\ell_2$, respectively, so
that both lie above the x-axis. Then P1 lies in quadrant I and P2 lies in quadrant
II. It follows that y1 , y2 > 0, but x1 > 0 and x2 < 0.
10 Remember that, in these volumes, an angle is a region in the plane rather than two rays issuing
[Figure: lines $\ell_1$ and $\ell_2$ through O, with $P_1 = (x_1, y_1)$ on $\ell_1$ in quadrant I and $P_2 = (x_2, y_2)$ on $\ell_2$ in quadrant II.]
The slope of $\ell_1$ computed using points $P_1$ and O is $y_1/x_1$, which is positive,
while the slope of $\ell_2$ computed using points $P_2$ and O is $y_2/x_2$, which is negative.
Therefore, the perpendicularity of $\ell_1$ and $\ell_2$ implies that the product of their
slopes must be negative. It remains to check that the absolute value of the product
of the slopes is 1.
So far, $P_1$ and $P_2$ are any two points on $\ell_1$ and $\ell_2$, respectively, subject only
to the restriction that they lie above the x-axis. Now we further specify that $P_1$,
$P_2$ be chosen so that they are equidistant from O, i.e., $|OP_1| = |OP_2|$. Because
the rotation $\varrho$ of 90 degrees around the origin O carries $\ell_1$ to $\ell_2$, the fact that $\varrho$
is a congruence implies that $\varrho$ carries $P_1$ to $P_2$. Let the vertical line from $P_1$ meet
the x-axis at $Q_1$, and let $Q_2 = \varrho(Q_1)$. Now $Q_2$ lies on the y-axis and since $\varrho$ also
preserves angles, $P_2Q_2 \perp$ y-axis. But $\varrho$ is also length-preserving, so
(5.17) | P1 Q1 | = | P2 Q2 | and |OQ1 | = |OQ2 |.
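[Figure: lines $\ell_1$ and $\ell_2$ through O, with $P_1$ on $\ell_1$ directly above $Q_1$ on the x-axis, and $P_2$ on $\ell_2$ at the same height as $Q_2$ on the y-axis.]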
By the way a coordinate system is set up (see Section 4.1), we know that | P2 Q2 |
and |OQ2 | are the absolute values of the x- and y-coordinates of the point P2 . Thus
computing the slope of $\ell_2$ using the points $P_2$ and O (Theorem 4.4 on page 67),
we see that the absolute value of this slope is $|OQ_2|/|P_2Q_2|$. The absolute value
of the slope of $\ell_1$ is of course $|P_1Q_1|/|OQ_1|$. Thus, taking (5.17) into account, the
product of the absolute values of the slopes of $\ell_1$ and $\ell_2$ is
\[ \frac{|P_1Q_1|}{|OQ_1|} \cdot \frac{|OQ_2|}{|P_2Q_2|} = \frac{|P_2Q_2|}{|OQ_2|} \cdot \frac{|OQ_2|}{|P_2Q_2|} = 1. \]
This completes the proof that the product of the slopes of two nonvertical perpen-
dicular lines which pass through the origin O must be equal to −1.
Still assuming that two lines $\ell_1$ and $\ell_2$ pass through the origin, how
shall we approach the proof of the converse, namely, that if the product
of the slopes of $\ell_1$ and $\ell_2$ is −1, then $\ell_1 \perp \ell_2$? In other words,
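[Figure: lines $\ell_1$ and $\ell_2$ through O, with $P_1$ on $\ell_1$ above $Q_1$ on the x-axis and $P_2$ on $\ell_2$ level with $Q_2$ on the y-axis; the angles $\angle P_1OQ_1$ and $\angle P_2OQ_2$ are both marked s.]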
Let $Q_2$ be the point on the positive y-axis so that $|OQ_2| = |OQ_1|$ and let a horizontal
line from $Q_2$ meet the line $\ell_2$ at $P_2$. If we can prove that
\[ \triangle P_1OQ_1 \cong \triangle P_2OQ_2, \]
then we would have $|\angle P_2OQ_2| = |\angle P_1OQ_1|$, so that
\[ \begin{aligned} |\angle P_2OP_1| &= |\angle P_2OQ_2| + |\angle Q_2OP_1| \\ &= |\angle P_1OQ_1| + |\angle Q_2OP_1| \\ &= |\angle Q_2OQ_1| = 90^\circ. \end{aligned} \]
In other words, $\ell_1 \perp \ell_2$.
It remains to prove $\triangle P_1OQ_1 \cong \triangle P_2OQ_2$. Since the product of the slopes of
$\ell_1$ and $\ell_2$ is −1, the product of the absolute values of the slopes of $\ell_1$ and $\ell_2$ is equal
to 1. By a reasoning that is familiar to us by now, this means
\[ \frac{|P_1Q_1|}{|OQ_1|} \cdot \frac{|OQ_2|}{|P_2Q_2|} = 1. \]
Since $|OQ_1| = |OQ_2|$, we have
\[ \frac{|P_1Q_1|}{|P_2Q_2|} = 1 \]
and therefore $|P_1Q_1| = |P_2Q_2|$. Recall that $|OQ_1| = |OQ_2|$, by the definition
of $Q_2$. Since also $\angle P_1Q_1O$ and $\angle P_2Q_2O$ are right angles, the SAS criterion for
congruence (see page 270) implies that $\triangle P_1OQ_1 \cong \triangle P_2OQ_2$, as desired. This
proves that if two lines $\ell_1$ and $\ell_2$ pass through O and the product of their slopes
is −1, then $\ell_1 \perp \ell_2$.
We have just proved that Theorem 5.6 is true for nonvertical lines passing
through the origin O.
We now finish the proof of Theorem 5.6 by dealing with the general case
where the two given lines $\ell_1$ and $\ell_2$ need not pass through the origin. Let $L_1$ and
$L_2$ be lines passing through the origin O so that $\ell_1 \parallel L_1$ and $\ell_2 \parallel L_2$. We need the
following simple lemma.
Lemma 5.7. Let $\ell_1$ and $\ell_2$ be intersecting lines, and let lines $L_1$ and $L_2$ be parallel
to $\ell_1$ and $\ell_2$, respectively. Then the perpendicularity of $\ell_1$ and $\ell_2$ is equivalent to the
perpendicularity of $L_1$ and $L_2$.
The proof is an immediate consequence of the considerations of correspond-
ing angles of parallel lines, as shown (the details can be left as an exercise):
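[Figure: intersecting lines $\ell_1$ and $\ell_2$, together with lines $L_1 \parallel \ell_1$ and $L_2 \parallel \ell_2$; the two points of intersection are marked.]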
Now suppose $\ell_1 \perp \ell_2$; then we want to prove that the product of the slopes
of $\ell_1$ and $\ell_2$ is −1. By Lemma 5.7, $\ell_1 \perp \ell_2$ implies $L_1 \perp L_2$. Since $L_1$ and $L_2$
pass through O, the preceding proof of the special case of Theorem 5.6 for lines
passing through the origin shows that the product of the slopes of $L_1$ and $L_2$ is
−1. By Theorem 5.2, the slopes of $\ell_1$ and $L_1$ are equal, as are the slopes of $\ell_2$ and
$L_2$. Hence the product of the slopes of $\ell_1$ and $\ell_2$ is also −1. Conversely, suppose
the product of the slopes of $\ell_1$ and $\ell_2$ is −1, and we will prove that $\ell_1 \perp \ell_2$. By
Theorem 5.2, the product of the slopes of $L_1$ and $L_2$ is also −1. Since $L_1$ and $L_2$
pass through O, we already know $L_1 \perp L_2$. By Lemma 5.7, $\ell_1 \perp \ell_2$. The proof of
Theorem 5.6 is complete.
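
Theorem 5.6 can be illustrated with exact arithmetic. The sketch below is only a numerical illustration, not part of the proof. It assumes the standard coordinate formula for the 90-degree counterclockwise rotation about the origin, (x, y) ↦ (−y, x), which is not derived in this Appendix; the function names are hypothetical. It also tests perpendicularity independently of slope, through the converse of the Pythagorean Theorem.

```python
from fractions import Fraction

def slope(p, q):
    """Slope of the nonvertical line through the points p and q."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    return Fraction(y2 - y1, x2 - x1)

def rotate90(p):
    """Assumed formula for the 90-degree counterclockwise rotation about O."""
    x, y = p
    return (-y, x)

def dist_sq(p, q):
    """Square of the distance between p and q."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def right_angle_at(o, p, q):
    """True when |op|^2 + |oq|^2 = |pq|^2, i.e., the angle at o is a right angle."""
    return dist_sq(o, p) + dist_sq(o, q) == dist_sq(p, q)

O = (0, 0)
P1 = (3, 2)              # a point in quadrant I
P2 = rotate90(P1)        # lands in quadrant II
product = slope(O, P1) * slope(O, P2)
perpendicular = right_angle_at(O, P1, P2)
```

Here the slope of the line through O and P1 is 2/3, the slope of the line through O and P2 is −3/2, and both the product test and the distance test agree.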
We give an application of Theorem 5.6. Let L be the diagonal line that is the
graph of y = x and let Λ be the reflection across L. Then we claim that for any
x and y, Λ( x, y) = (y, x ). Let P = ( x, y) and Q = (y, x ); then it suffices to prove
that L is the perpendicular bisector (see page 267) of PQ. For this purpose, use
Theorem 4.2 of [Wu-PreAlg] (see page 270 in this volume); we leave the details to
Exercise 5 on page 115.
[Figure: the diagonal line L in the coordinate plane, with the points P = (x, y) and Q = (y, x) on opposite sides of L.]
Exercises 5.6
\[ \begin{cases} ax + by = e \\ -bx + ay = f \end{cases} \]
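[Figure: lines $\ell_1$ and $\ell_2$ through O, with $P_1 = (x_1, y_1)$ on $\ell_1$ above $Q_1$ on the x-axis and $P_2 = (x_2, y_2)$ on $\ell_2$ level with $Q_2$ on the y-axis.]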
Step 1. Use the fact that the product of the slopes of $\ell_1$ and $\ell_2$ is
equal to −1 to prove that $(x_1, y_1) = (-y_2, x_2)$.
Step 2. Show that $|OP_1|^2 + |OP_2|^2 = |P_1P_2|^2$.
Step 3. $\triangle P_1OP_2$ is a right triangle and therefore $\ell_1 \perp \ell_2$.
(B) Prove in general that if the product of the slopes of two nonvertical
lines $\ell_1$ and $\ell_2$ is −1, then they are perpendicular.
(5) Prove that the reflection Λ across the diagonal line y = x maps a point
( a, b) to (b, a).
https://doi.org/10.1090//mbk/099/06
CHAPTER 6
118 6. FUNCTIONS AND THEIR GRAPHS
so that G assigns the multiplicative inverse $x^{-1}$ (see page 267 for the definition)
to each nonzero number x. In symbols: $G(x) = x^{-1}$ for each $x \neq 0$. Thus,
\[ G(5) = \tfrac{1}{5}, \quad G\left(\tfrac{17}{9}\right) = \tfrac{9}{17}, \quad G(-0.28) = \tfrac{1}{-0.28}, \quad G\left(-\tfrac{15}{2}\right) = -\tfrac{2}{15}, \quad \text{etc.} \]
6.1. THE BASIC DEFINITIONS 119
Observe once again that, insofar as $G(x) \neq 0$ for any x in the domain of G (all the
nonzero numbers), we can also write the same function G as
G : {all nonzero numbers} → {all nonzero numbers}.
There is another kind of function that is almost as important as real-valued
functions of one variable: those functions defined on the whole numbers N.
Consider, for example, buying many copies of the same book (think of your-
self as the owner of a bookshop). If one copy costs $17.85, then two copies cost
17.85 + 17.85 = 35.70 dollars, and for any whole number n, n copies will cost:
\[ \underbrace{17.85 + 17.85 + \cdots + 17.85}_{n} = n \times 17.85 \text{ dollars}, \]
where we have made use of the fact that for any fraction A, nA = A + A + · · · + A
(n times) (see product formula on page 268). In accordance with the notational con-
vention on page 19, we will henceforth write the cost of n copies as 17.85 n dol-
lars. We may express this information more compactly by introducing a function
h : N → R so that h(n) = the cost of n copies of the book. We have just shown
that h(n) = 17.85n for all whole numbers n.
[Sidebar: The function h(n) = 17.85n for all n ∈ N should not be conflated with the function H(x) = 17.85x for all x ∈ R.]
There is a real-valued function H : R → R that is closely related to the preceding
function h : N → R, namely, H(x) = 17.85x for all real numbers x. In TSM, the function
h is often conflated with the real-valued function H from the beginning. We shall see
that on some occasions, we do want to explicitly replace the
function h by the function H (see, for example, the discussion on pages 157 ff.),
but if we do that, we will be doing it for a reason. In general, especially at the be-
ginning of the discussion of functions, we should not conflate these two functions
because they are different functions with different domains of definition. We will
return to this point on page 128 below.
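
The difference between h and H is a difference of domains, and a short program can make it tangible. In the sketch below (an illustration only; the names `PRICE`, `h`, and `H` mirror the text, and the use of Python's `decimal` module for exact dollar amounts is an assumption of this sketch), h refuses any input that is not a whole number, while H accepts any number that can be written as a decimal.

```python
from decimal import Decimal

PRICE = Decimal("17.85")  # cost of one copy, in dollars

def h(n):
    """Cost of n copies of the book; defined only for whole numbers n."""
    if isinstance(n, bool) or not isinstance(n, int) or n < 0:
        raise ValueError("h is defined only on the whole numbers")
    return PRICE * n

def H(x):
    """H(x) = 17.85x for any number x that has a decimal expansion we can type."""
    return PRICE * Decimal(str(x))

# Repeated addition of the price agrees with h(n), as in the text.
seven_copies_by_addition = sum([PRICE] * 7, Decimal(0))
```

On the whole numbers the two functions agree, for example `h(2)` and `H(2)` are both 35.70 dollars, but `H(0.5)` is defined whereas `h(0.5)` raises an error.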
We do not want to create the impression that every function can be expressed
by a formula. For example, if S is the function
S : {a deck of cards} → {club, diamond, heart, spade}
that assigns to each card its suit (where “{club, diamond, heart, spade}” stands
for the set consisting of the four possible suits of a card), then what S does to each
card would be difficult to describe in symbols. One can only illustrate by giving
some examples, such as
S(King of diamonds) = diamond,
S(Two of spades) = spade,
S(Queen of hearts) = heart, etc.
Yet another example of a function is a person’s age or, more precisely, the
assignment to each person his or her age. What we have is a function K :
{people} → {whole numbers} so that if p is a person, K ( p) = the age of p.
Once you get this idea, you begin to see many examples of functions in real
life. For example, writing down a person’s height is in effect a function g :
{people} → R, and writing down a person’s Social Security number is a func-
tion S : {American working adults} → N. And so on.
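[Figure: a subset S of the coordinate plane containing two distinct points $(x_0, y_0)$ and $(x_0, y_1)$ with the same x-coordinate $x_0$.]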
of functions, but they are not pursued as relations per se either. They are studied
seriously because of the geometry or the number theory that is associated with
them. However, it is common in TSM to devote many pages to various aspects
of relations, including definitions of the domain and range of a relation. TSM
also spawns standardized-test items that demand that your students know what
the “range of a given relation” is. As far as mathematics is concerned, such
information is of negligible import in K–12 and therefore does not deserve your
rapt attention. If we can get students in K–12 to use functions fluently, we will
already be ahead of the game.
At this point, we want to point out that the graphs of real-valued functions
defined on the whole numbers N, such as the cost function of a book, h : N →
R on page 119, are also subsets of the plane and therefore also have pictorial
representations. See the discussion on page 128.
We can now revisit the concept of the graph of an equation in two variables
that was introduced on page 60. Suppose we are given a function H from the
plane to the set of all numbers, i.e., H : R2 → R, so that, for any point ( x, y)
in R2 , H assigns to it a number H ( x, y). Such a function H is called a function
of two variables, or more precisely, a real-valued function of two variables. An
equation in x and y such as H ( x, y) = 5 is thus a question asking whether there
are points ( A, B) in the plane that satisfy H ( A, B) = 5. In general, an equation
of the form H ( x, y) = c for some fixed number c is called an equation in two
variables.3 The graph of the equation in two variables, H ( x, y) = c, is the
collection of all the points ( A, B) in the plane so that H ( A, B) = c.
In Chapters 4 and 5, we already came across many functions of two variables
though without the name, e.g., g( x, y) = 3x − y for all numbers x and y. We
recognize that, for example, the equation in two variables, g( x, y) = 1 (i.e., 3x −
y = 1) is exactly what we called a linear equation in two variables (page 59). We
also recognize that, in this case, the graph of 3x − y = 1 is a line (Theorem 4.2 on
page 60).
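
To make the definition of the graph of an equation concrete, the following sketch tests whether given points belong to the graph of g(x, y) = 1 for the function g(x, y) = 3x − y of the text. The helper names are hypothetical. The second test uses the one-variable function f(x) = 3x − 1, obtained here by solving 3x − y = 1 for y; that step is an addition of this sketch.

```python
from fractions import Fraction

def g(x, y):
    """The function of two variables g(x, y) = 3x - y from the text."""
    return 3 * x - y

def f(x):
    """f(x) = 3x - 1, obtained by solving 3x - y = 1 for y (added for illustration)."""
    return 3 * x - 1

def on_graph_of_equation(point, func, c):
    """True when the point (A, B) satisfies func(A, B) = c."""
    A, B = point
    return func(A, B) == c

def on_graph_of_function(point, func):
    """True when the point has the form (x, func(x))."""
    x, y = point
    return y == func(x)

# Compare the two membership tests on a grid of points with rational coordinates.
grid = [(Fraction(i, 2), Fraction(j, 2)) for i in range(-6, 7) for j in range(-20, 21)]
tests_agree = all(
    on_graph_of_equation(p, g, 1) == on_graph_of_function(p, f) for p in grid
)
```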
We have now defined the concept of the graph of a real function of one vari-
able and, for any function of two variables H, we have also defined the graph
of an equation H ( x, y) = c for a fixed constant c. Both are subsets of the plane.
Where these concepts of graphs come together is in the following situation. Let f
be a real function of one variable: f : R → R. Define a function of two variables
F : R2 → R by F( x, y) = y − f ( x ) for all x and y. Then we claim:
(6.1) {the graph of f } = {the graph of the equation F( x, y) = 0}.
We can make this relationship more explicit by rewriting it as follows: let f be a
real-valued function of one variable; then
(6.2) {graph of the function f } = {graph of the equation y − f ( x ) = 0}.
In order to prove (6.2), we have to prove that the two sets are equal. This means
we have to prove that a point in the graph of the function f is also a point in the
graph of the equation y = f ( x ), and vice versa (see page 267). So suppose ( x0 , y0 )
is a point on the graph of the function f ; by the definition of the graph of f , this
means ( x0 , y0 ) = ( x0 , f ( x0 )), so that y0 = f ( x0 ). Thus ( x0 , y0 ) is on the graph of
3 For example, when H ( x, y ) is an expression in the numbers x and y. But of course, a function
Exercises 6.1
(1) (a) Describe the graph of f : R → R, where f ( x ) = 2x − 8. (b) Express
the graph of g : R → R, so that g( x ) = 5 − 3x, as the graph of an
equation of two variables.
(2) Let a real-valued function of two variables H be defined by H ( x, y) =
x − y. What is the graph of the equation H = 0? What is the graph of
the equation H = 1? What is the graph of the equation H = −25? How
are these graphs related?
(3) Can the circle of radius 1 around the origin (0, 0) be the graph of a
function in one variable? Is it the graph of an equation in two variables?
Explain.
(4) Describe the graph of the following function of one variable, S : R → R,
defined as follows: for any number x, if n is the integer so that n ≤ x <
(n + 1), then S( x ) = n. (Incidentally, this is an example of a real-valued
function of one variable that does not have a formula in terms of the stan-
dard notation for addition, subtraction, multiplication, division, or rais-
ing to a power. However, it shows up often enough that a special notation
has been devised for it and related functions; see [Wiki-floorfunction].)
4 See the definition of average speed on page 266. The concept of average rate is similar, and is
However, if you want to know the temperatures at the half-minute marks, then
you’d need a bigger table:
Now if you also want the temperatures at the quarter minute marks, you’d
need a table that is even bigger. Clearly there is no end in sight of the size of the
table you need if you want a complete profile of the whole evolving situation, and
you soon realize that what you really need is not a table of enormous size but a
function f (t), where
f : {all numbers ≥ 0} → R,
f (t) = the temperature of the coffee t minutes after it is brewed.
(Observe how this function f literally “assigns” the temperature f (t) to each mo-
ment that is t minutes after it is brewed.) Once we have the right concept in the
form of a function (and not a table), the next step is to determine this function in
the form of a reasonable formula. The rest of the story is related to Newton’s law
of cooling; the long and short of it is that there is such a formula using concepts in
calculus:
\[ f(t) = 70 + 125\left(\frac{110}{125}\right)^{t}, \]
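
Once a formula is available, a table of any fineness can be produced on demand. The sketch below assumes that the formula reads f(t) = 70 + 125 (110/125)^t, with t in minutes; this reading of the displayed expression is an assumption of the sketch, and the function name `tabulate` is hypothetical.

```python
def f(t):
    """Assumed reading of the formula: 70 + 125 * (110/125)**t, for t >= 0 minutes."""
    if t < 0:
        raise ValueError("f is defined only for t >= 0")
    return 70 + 125 * (110 / 125) ** t

def tabulate(step, end):
    """Return (t, f(t)) pairs for t = 0, step, 2*step, ..., up to end."""
    count = int(round(end / step))
    return [(k * step, f(k * step)) for k in range(count + 1)]

half_minute_table = tabulate(0.5, 3)      # the half-minute marks
quarter_minute_table = tabulate(0.25, 3)  # the quarter-minute marks

for t, temperature in quarter_minute_table:
    print(f"{t:5.2f} min   {temperature:7.2f}")
```

The point of the example is the one made in the text: the function, not any particular table, carries the complete information.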
F(t) = his distance (in miles) from the airport t minutes after
he leaves his house.
of this function in terms of a few values of t can tell a story, as for instance:
t (min)   f(t) (miles)      t (min)   f(t) (miles)
   0         25                30        13
   5         24                35        19
  10         22.5              40        24
  15         21                43        25
  20         16                44        25
  25         10                45        24.5
  26          9                55        13
  27         10                60         7.5
  28         11                67         0
We can see that he has to start his trip slowly probably because of city traffic,
so that after 10 minutes he has only traveled two and a half miles. Around the
26th minute after he leaves home, he turns around as the values of f (27) and
f (28) and those of subsequent minutes show that he is driving away from the
airport. He forgets to bring his photo-ID (a guess!). He manages to get home at
43 minutes after his departure and it takes him only about a minute to get the
necessary document. Then he speeds a bit as he makes it to the airport in 23
minutes (67 − 44 = 23), not trivial considering the traffic conditions these days.
He has a few minutes to spare.
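
The story can be read off the table by simple arithmetic, which the following sketch carries out. The table values are those given above; the variable and function names are hypothetical, and the average speed on the last leg is a computation added here, not a figure quoted from the text.

```python
# Distance from the airport (miles) at t minutes after leaving the house.
table = {0: 25, 5: 24, 10: 22.5, 15: 21, 20: 16, 25: 10, 26: 9, 27: 10, 28: 11,
         30: 13, 35: 19, 40: 24, 43: 25, 44: 25, 45: 24.5, 55: 13, 60: 7.5, 67: 0}

def average_speed_mph(t0, t1):
    """Average speed toward the airport between two tabulated times, in miles per hour."""
    hours = (t1 - t0) / 60
    return (table[t0] - table[t1]) / hours

# Miles covered in the first 10 minutes.
traveled_first_10 = table[0] - table[10]

# Tabulated time of the closest approach to the airport before he is back home at minute 43.
turnaround = min((t for t in table if t < 43), key=lambda t: table[t])

# Duration and average speed of the final leg, from minute 44 to arrival.
final_leg_minutes = 67 - 44
final_leg_speed = average_speed_mph(44, 67)
```

The final leg covers 25 miles in 23 minutes, an average of a little over 65 miles per hour, which is consistent with the remark that he speeds a bit.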
As a final example, consider the problem of the temperature of the city of
Berkeley on a certain day. To say that Berkeley is 67° (Fahrenheit) makes no sense,
strictly speaking. Is the temperature taken in the early dawn or in the afternoon?
In Berkeley, this could mean a 25° difference. And where is the temperature
measured: at the top of the hill (about 1000 feet high), downtown, or by the
Bay? The difference here could be another 15°. If we start measuring the time
t in hours from midnight, then 0 ≤ t ≤ 24. To specify the geographic location,
we need two more numbers which may be thought of as the idealized x- and
y-coordinates. Berkeley being a small city, 5 miles from the city center in any
direction would include everything. Therefore, a scientifically usable description
of the temperature of Berkeley would make use of a function F, so that, if S is
the region in 3-space consisting of all ordered triples of numbers ( x, y, t) so that x
and y satisfy | x |, |y| ≤ 5 (miles), and 0 ≤ t ≤ 24 (hours), then5
F : S → R,
F( x, y, t) = the temperature of Berkeley, t hours past midnight
at a spot specified by the x- and y-coordinates.
Incidentally, such a function F is said to be a function of three variables, because
three numbers x, y, and t are involved in its definition. In general, an accurate
description of the temperature in a given geographic area would require four
numbers, (x, y, z, t), where z will specify the height above the point with coordi-
nates ( x, y) at which the temperature is measured (think about the approach of a
storm and you would appreciate the importance of taking height into account).
Therefore, any serious study of temperature will be a study of this function of
four variables.
5 Notice also the use of absolute value to describe the physical extent of the city. It means of
course that −5 ≤ x ≤ 5 and −5 ≤ y ≤ 5. See Section 2.6 in [Wu-PreAlg] or pages 161 ff. in this
volume.
It may not have escaped your attention that we have been talking about func-
tions of three variables and four variables without holding forth on the philo-
sophical implications of what a “variable” is. This is as it should be. (Compare
the discussion of the term “variable” in Section 1.1 and Section 3.1.)
After all that, you are still entitled to ask: what is the point of describing
a mundane concept such as “the temperature of Berkeley” with such exquisite
precision using a function of three variables? Is it just to show off? No, the reason
is that our need for accurate weather forecasting—long-term and short-term—will
never be met until we can make a science out of the study of the climate. There
can be no science if the temperature of a wide area remains a single number rather
than a function of four variables; the same holds for other quantities such as air
pressure, wind speed, etc. If functions are taken out of climate science, we wouldn’t
be able to predict rain or shine in the next 24 hours, much less the onset of global
warming.
Quite apart from the description of change, functions have already forced
their way into our work whether we know it or not. Transformations of the plane,
including translations, reflections, and rotations, are functions which assign to
each point of the plane another point of the plane, i.e., these are examples of
functions T, so that
T : R2 → R2 .
For example, the translation T which moves every point of the plane 2 units to
the left horizontally is precisely given by (see Lemma 5.3 on page 95):
T ( x, y) = ( x − 2, y).
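
Viewed as a function from the plane to the plane, this translation is a one-line program. The sketch below is only an illustration: it represents points as pairs of numbers and checks, on sample points, that T moves each point 2 units to the left and does not change distances.

```python
from math import dist

def T(point):
    """The translation of the text: move a point 2 units to the left."""
    x, y = point
    return (x - 2, y)

P, Q = (5, 3), (1, -4)
image_P, image_Q = T(P), T(Q)

# A translation preserves the distance between any two points.
distance_preserved = dist(P, Q) == dist(image_P, image_Q)
```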
These are among the simplest examples of how functions naturally arise. Of
course, functions are everywhere as soon as you look around. You will be seeing
many more from now on. The purpose of this discussion is to make it plain that
the concept of a function is not something artificially concocted for the purpose
of giving students a hard time. Rather, it is a tool, created out of necessity, to
succinctly describe the phenomena around us, be they natural or social. Functions
are indispensable.
Exercises 6.2
(1) Consider the concept of “the population of a city”. Does it make sense?
If not a number, what would you use to more accurately describe the
number of people living in the city?
(2) Consider the medical question of how much a person weighs. Does it
make sense? What would you use to describe more accurately a person’s
weight?
get a feeling of what the graph looks like. For the school classroom, we repeat:
The importance of actually plotting points on the graph by hand can-
not be over-emphasized, and this is especially true in an age of afford-
able graphing calculators. Be sure to insist on it in your class-
room.
There is probably no better illustration of the need for students to learn the
definition of the graph of a function, and to form the habit—at the beginning, at
least—of plotting the graph by hand, point-by-point, than the 2015 blog of Dan
Meyer ([Meyer]). Here is a quote from the blog:
I left high school adept at graphing functions. I could complete
the square and change forms easily. I knew how to identify the
asymptotes, holes, and limiting behavior of those thorny ratio-
nal expressions. But it wasn’t until I had graduated university
math and was several years into teaching that I really, really un-
derstood that the graph is a picture of all the points that make
the function true. This was difficult for me because graphs don’t
often look like a bunch of points. They look like a line.6
[Sidebar: Beginners benefit from graphing a function manually, point by point, for many points. A graphing calculator can be used later.]
This quote, together with readers’ responses in [Meyer], shows clearly the
devastating effect of TSM on mathematics learning: students have no idea that the
graph of a function f is the totality of all the points {(x, f(x))}, because TSM does not
concern itself with giving the precise definition of the graph of a function. In this
light, one can better appreciate the real purpose of graphing functions by hand,
point-by-point: it is to leave no doubt in students’ minds that the graph of
a function is “a picture of all the points that make the function true.”
Let us start with a linear function of one variable, i.e., a function f defined
on R so that for some constants a and b, f has the expression f ( x ) = ax + b for
all numbers x. To get an idea of what a linear function is like, one can graph a
simple function such as g( x ) = 2x − 3 by plotting a few points of its graph and
observing that they all line up in a straight line, e.g.,
(0, −3), (1, −1), (1.5, 0), (2, 1), (4, 5), (4.5, 6), (5.5, 8), (6, 9).
But is the graph of g really a line? More generally, is the graph of f ( x ) = ax + b a
line? We now show that such is the case, because we already observed in (6.2) on
page 121 that the graph of the function f ( x ) = ax + b is the graph of the equation
y = ax + b which, according to Section 4.4, is a line. Thus, we have:
Lemma 6.1. The graphs of linear functions of one variable are lines.
If a function h is defined on the whole numbers N, h : N → R , so that for
some constants c and d, h(n) = cn + d for all whole numbers n, then we also call
such an h a linear function. If there is any danger of confusion, we will be careful
to say h is a linear function defined on the whole numbers. The graph of such an h
will be a collection of dots, e.g.,
(0, −3), (1, −1), (2, 1), (3, 3), (4, 5), (5, 7), (6, 9), (7, 11).
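The two lists of points above can be recomputed with a short Python sketch. The helper names `g` and `collinear` are ours, and we assume (as the listed dots suggest) that the example defined on the whole numbers is h(n) = 2n − 3. The check covers only finitely many points, so it illustrates Lemma 6.1 without proving it.

```python
from fractions import Fraction

def g(x):
    """The linear function g(x) = 2x - 3 from the text."""
    return 2 * x - 3

def collinear(points):
    """True if every pair of consecutive points has the same slope."""
    slopes = {(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in zip(points, points[1:])}
    return len(slopes) == 1

# x-values used in the text for g, which is defined on all numbers
xs_real = [Fraction(v) for v in ("0", "1", "1.5", "2", "4", "4.5", "5.5", "6")]
points_g = [(x, g(x)) for x in xs_real]

# the same formula restricted to the whole numbers 0..7 gives isolated dots
points_h = [(n, g(n)) for n in range(8)]

print(points_h)
print(collinear(points_g), collinear(points_h))
```

Both lists pass the slope test, but `points_h` contains nothing between consecutive whole numbers, which is the distinction the text draws between a collection of dots and a line.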
6 What is meant is probably that “They look like a curve.”
128 6. FUNCTIONS AND THEIR GRAPHS
Now consider the two functions that arose in connection with the cost of a
book, h : N → R and the real-valued function of one variable H : R → R, so
that h(n) = 17.85 n for each whole number n and H ( x ) = 17.85 x for every real
number x (see page 119). Let the number 17.85 be denoted by c; then we rescale
the y-axis so that the unit is not 1 (dollar) but c (dollars) in order to be able to
draw the graph of H within the page of a book.
[Figure: the graph of H, a line through the origin O. The x-axis is marked 1 through 9 and the y-axis is marked c, 2c, ..., 7c.]
As noted above, the graph of h is a sequence of points because its domain is the
whole numbers:
[Figure: the graph of h, a sequence of dots starting at the origin O. The x-axis is marked 1 through 9 and the y-axis is marked c, 2c, ..., 7c.]
The standard terminology is that the function H interpolates h (“connects the
dots of the graph of h”), or that H is an interpolation of the function h. We will
return to this concept of interpolation in Chapter 9.
In TSM, a discussion of the function h usually displays the graph of H, but
not the graph of h as a sequence of dots. As we mentioned earlier, TSM usually
conflates h with H. This tends to create a crisis in students’ perception of math-
ematics: is the graph of a function what it is supposed to be, or is it something
the textbook makes up as it goes along? For this reason, we will be careful to
draw a distinction between a linear function defined on the whole numbers and
its interpolation.
6.3. SOME EXAMPLES OF GRAPHS 129
Given a real-valued function of one variable f , we have seen (e.g., page 121)
how to associate with it a function of two variables, F( x, y), which is defined by:
F( x, y) = y − f ( x ) (= y − ( ax + b) when f ( x ) = ax + b).
Thus F is a function defined on $\mathbb{R}^2$. Consider now the following problem: for a
fixed constant c, what is the set of all the points ( x', y') so that F( x', y') = c? (Of
course this is the same as saying all the points ( x', y') so that y' − ( ax' + b) = c.)
This set, the set of all the points ( x', y') so that F( x', y') = c, is called a level
set of the function of two variables F, and is denoted by { F = c}. Of course, if
c = 0, this would be the same question as asking for the graph of f . Now ( x', y')
being in { F = c} is equivalent to y' − ( ax' + b) = c, which in turn is equivalent
to (− a) x' + y' = b + c, which is equivalent to ( x', y') being a solution of the linear
equation in two variables (− a) x + y = b + c. The conclusion is that a level set
{ F = c} is always a line, namely, the graph of the equation (− a) x + y = b + c.
The reason for the terminology of “level set” for { F = c} comes
from the fact that if we graph the function F : $\mathbb{R}^2$ → R in 3-space
$\mathbb{R}^3$, then the graph is a surface. The intersection of this surface with
the “horizontal” plane z = c is a (plane) curve, and what we call
{ F = c} in the xy-plane is exactly the vertical projection of this
curve on the xy-plane. Of course, any horizontal plane is considered
in everyday life to be “level”, and this accounts for the name.
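The conclusion about level sets can be spot-checked numerically. In the following Python sketch, the function names and the sample constants a = 2, b = −3, c = 5 are our own choices for illustration: we take points on the line (−a)x + y = b + c and confirm that the two-variable function y − (ax + b) takes the value c at each of them.

```python
def F(x, y, a, b):
    """The two-variable function y - (a*x + b) built from f(x) = a*x + b."""
    return y - (a * x + b)

def on_level_line(x, a, b, c):
    """The point with the given x on the line (-a)x + y = b + c."""
    return (x, a * x + b + c)

# sample constants, chosen only for illustration
a, b, c = 2, -3, 5

samples = [on_level_line(x, a, b, c) for x in range(-3, 4)]
values = [F(x, y, a, b) for x, y in samples]
print(values)  # every entry equals c
```

Taking c = 0 in the same sketch gives points on the graph of f itself.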
Activity
Let us graph a function of one variable that is not linear. For example, take the
square function s : R → R, $s(x) = x^2$ for all numbers x. The graph of s consists
of all the points of the form $(x, x^2)$, where x is arbitrary. Since $(-x)^2 = x^2$,
we see that the graph includes both $(x, x^2)$ and $(-x, x^2)$, no matter what x may
be. The point (0, 0) is an obvious point on the graph. We can put in values of
x = ±1, ±2, ±3, ±4 to get the points
(±1, 1), (±2, 4), (±3, 9), (±4, 16).
Let us also throw in the points
(±0.5, 0.25), (±1.5, 2.25), (±2.5, 6.25), (±3.5, 12.25)
for good measure, and we get a sequence of points on the graph of s, displayed
on the left picture below. Note that in order to make the picture small enough to
fit the page, we have shrunk the scale of the y-axis by a factor of 4.
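[Figure: on the left, the plotted points of the graph of s; on the right, the curve through these points. The x-axis runs from −4 to 4 and the y-axis from 0 to 16.]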
It is not difficult to extrapolate from these points to envision the graph as the
curve in the above picture on the right. This curve is an example of what is called
a parabola. Parabolas will be defined and discussed more fully in Chapter 10 (see
page 252).
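The seventeen points used above for the square function can be generated mechanically. The Python sketch below is one way to do it; the names `s`, `xs`, and `points` are ours. It also checks, for the sampled values only, the symmetry observed in the text.

```python
def s(x):
    """The square function s(x) = x^2."""
    return x * x

# x = -4, -3.5, ..., 3.5, 4 (halves are exact in binary floating point)
xs = [k / 2 for k in range(-8, 9)]
points = [(x, s(x)) for x in xs]

# (x, x^2) and (-x, x^2) are both on the graph
symmetric = all(s(x) == s(-x) for x in xs)

for p in points:
    print(p)
```

Plotting these pairs by hand, with the y-axis shrunk as described, gives the dots of the left picture.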
With the availability of scientific calculators, there should be no hesitation in
asking students to graph quite sophisticated functions, e.g., a function such as
x → x x−27x
4 +5
+2
. Let us illustrate with a simpler one such as G : R → R given by
$G(x) = x^3 - 3x + 6$. Recall from Section 1.4 that this is called a cubic polynomial or,
more simply, a cubic. Since we have no idea what to expect, we try some obvious
numbers, e.g., G (0), G (±1), G (±2), G (±3), G (±4), getting the following points
on the graph of G:
(−4, −46), (−3, −12), (−2, 4), (−1, 8), (0, 6),
(1, 4), (2, 8), (3, 24), (4, 58).
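The table of values above is easy to reproduce. The following Python sketch (the names `table` and `sign_changes` are ours) recomputes the nine points and also reports each pair of consecutive integers between which the value of G changes sign.

```python
def G(x):
    """The cubic G(x) = x^3 - 3x + 6."""
    return x**3 - 3 * x + 6

# the integer inputs -4, ..., 4 used in the text
table = [(x, G(x)) for x in range(-4, 5)]

# consecutive inputs where the values have opposite signs
sign_changes = [(x0, x1)
                for (x0, y0), (x1, y1) in zip(table, table[1:])
                if y0 * y1 < 0]

print(table)
print(sign_changes)
```

A sign change between two inputs only suggests where the plotted points straddle the x-axis; it says nothing by itself about what the graph does between the sampled points.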
Because the jumps in the values of G between the values of x at 2 and 3, 3 and 4,
−2 and −3, and −3 and −4 are so great, we also get the following points on the
graph of G:
By compressing the y-axis by a factor of 40, we can exhibit these points as follows:
[Figure: the plotted points of the graph of G. The x-axis runs from −4 to 4 and the y-axis from −60 to 60.]
The graph seems to cross the x-axis between −3 and −2. Suppose it crosses the
x-axis at $(x_0, 0)$; then $0 = G(x_0)$ by definition of the graph of G. This means
$x_0^3 - 3x_0 + 6 = 0$. Such an $x_0$ is called a root or solution of the cubic polynomial
equation $x^3 - 3x + 6 = 0$. The roots of a polynomial equation are of great interest
in mathematics. For this reason, one may try to get a better estimate of this $x_0$.
We have
we already know that there are no further troughs or bumps below x = −4 and
above x = 4. So the graph will continue to go down as it goes to the left on the
x-axis, and continue to go up as it goes to the right on the x-axis.
If we were less fortunate and the chosen points happened not to reveal the
“bump” and the “trough” of the graph, then we would have to plot more points,
since these features may not have appeared yet or may not even exist. The following
graph of $g(x) = x^3$, for example, has no “bump” and no “trough”:
That said, we start plotting points. Again, there are two obvious points:
Beyond that we take some random values of x and compute $\frac{1}{x}$; we will remark
on the following points on the graph in due course:
(0.1, 10), (0.2, 5), (0.4, 2.5), (0.5, 2), (2, 0.5),
(4, 0.25), (5, 0.2), (8, 0.125), (10, 0.1), (−0.1, −10),
(−0.2, −5), (−0.4, −2.5), (−0.5, −2), (−2, −0.5),
(−4, −0.25), (−5, −0.2), (−8, −0.125), (−10, −0.1).
[Figure: the plotted points of the graph of 1/x. The axes are marked 2, 4, 6, 8, 10.]
Notice that there are two separate curves here, and they are called the two branches
of a hyperbola. Hyperbolas are related to parabolas (page 130) as they can both be
obtained by intersecting a plane with a (double-napped) cone (see, for example,
[Wiki-conic] for an elementary introduction and related references). For this rea-
son, both the parabola and the hyperbola are examples of so-called conic sections.
Someone has yet to write an elementary account of conic sections that begins
with the geometric definitions of all the conic sections and then identifies these
curves with those obtained from plane intersections with a cone; for the time
being, see [Teukolsky]. Regrettably, we will not pursue the study of hyperbolas
in this volume, but see Chapter 8 of Volume II and Chapter 15 of Volume III in
[Wu-HighSchool].
The plotted points above exhibit a pattern: if 0 < a < b or a < b < 0, then
$\frac{1}{a} > \frac{1}{b}$. (In Exercise 1 on page 132, you are asked to prove this in general.) This
pattern suggests that the limited choice of the points on the graph above is enough
to reveal the general behavior of the graph: it tells us that as the upper right curve
extends to the right end of the positive x-axis, all it does is get closer and closer
to the x-axis, and as it approaches 0 from the positive x-direction, all it does is
get closer and closer to the positive y-axis. A similar statement also applies to the
lower left curve.
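The pattern can be checked on the plotted points themselves. The Python sketch below uses exact fractions; the names `recip` and `decreasing` are ours. It confirms that on each branch a larger x gives a smaller 1/x, for the sampled points only, so it is no substitute for the proof asked for in the exercise.

```python
from fractions import Fraction

def recip(x):
    """The function x -> 1/x for nonzero x."""
    return 1 / x

# the positive x-values used in the text
xs = [Fraction(v) for v in ("0.1", "0.2", "0.4", "0.5", "2", "4", "5", "8", "10")]
positive = [(x, recip(x)) for x in xs]
negative = [(-x, recip(-x)) for x in xs]

def decreasing(points):
    """True if y strictly decreases as x increases through the points."""
    ordered = sorted(points)
    return all(y1 > y2 for (_, y1), (_, y2) in zip(ordered, ordered[1:]))

print(decreasing(positive), decreasing(negative))
```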
Exercises 6.3
(3) Plot enough points in the graph of each of the following functions to
get an accurate picture of the graph: (i) $x^2 - x$, (ii) $3x^2 - 4x + 1$,
(iii) $x^3 - x^2 - 4x + 4$, (iv) $2x^3 - 4x^2 + x + 6$. (Use a scientific calculator.)
(4) Plot enough points in the graph of each of the following functions to get
an accurate picture of the graph: $f_1(x) = 2x^2$, $f_2(x) = 2x^2 + 3$,
$f_3(x) = 2(x - 1)^2$, and $f_4(x) = 2(x - 1)^2 + 3$. How are they related? Explain in
detail.
(5) Let $f(x) = ax^2$ and $g(x) = a(x - b)^2 + c$, where a, b, c are constants.
Describe how the graphs of f and g are related. Explain in detail.
(6) (a) Let H be the function of two variables defined by
$H(x, y) = \frac{2}{3}x - \frac{1}{4}y + 5$. Describe the level sets { H = 1} and { H = −2} individually and how
they are related. (b) Let H be the function of two variables defined by
H ( x, y) = ax + by + d, where a, b, d are constants with b ≠ 0. Let c and
c' be distinct constants. Describe the level sets { H = c} and { H = c'}
individually and how they are related.
[Figure: coordinate axes X and Y with origin O, showing a point A and the point ϕ(A).]
It is time to point out that sometimes we have to intentionally ignore the fact
that rotations and reflections are length-preserving and angle-preserving in order
to rescale a coordinate axis for a particular need. Indeed, we already had to
perform such rescaling for the graph of the cost function h on page 128, the
square function s on page 129, and the cubic function $x^3$ on page 131. Here is
another example. Let us graph the linear function f : R → R defined by
f ( x ) = 300x + 60
for all x in the segment [0, 3]. Then it would be impossible to draw this graph on
a page of a book: 3 units of length horizontally but 960 units vertically? Common
sense dictates that we shrink the y-axis in order to make the drawing of the graph
possible. For example, we can let 1 unit along the y-axis stand for 300:
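[Figure: the graph of f over [0, 3] in the rescaled coordinate system, running from (0, 60) to (3, 960). The x-axis is marked 1, 2, 3 and the y-axis is marked 300, 600, 900.]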
However, by making the graph presentable, we pay a price in terms of geometry.
A (counterclockwise) rotation of 90 degrees around O will no longer be
length-preserving because it will map a segment of length 1 on the x-axis (i.e.,
[0, 1]) to a segment that represents a length of 300 on the y-axis (i.e., [0, 300]).
In theory, the unit segments on the x- and y-axes have the same length. In
practice, one of the axes often has to be scaled.
Moreover, consider the graph of the linear function g of one variable, g( x ) = x.
This is a line of slope equal to 1; according to the Activity on page 65, this line
should make a 45◦ angle with the positive x-axis. While this is true in a properly
set-up coordinate system where rotations are length-preserving, it won’t be true
here. In the present coordinate system, what is usually 1 unit of length along the
y-axis becomes a length of 300, so that a “rise” of 1 unit in the y value amounts to
a vertical “rise” of 1/300 of the unit length. Because of this geometric distortion,
the graph of g would be indistinguishable from the x-axis because it is, for all
practical purposes, horizontal. In particular, it will not make a 45◦ angle with the
x-axis. Instead, it is the graph of f —the slope of which being 300—that makes a
45◦ angle with the positive x-axis, as shown in the picture above.
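The distortion can be made quantitative. In the Python sketch below, `drawn_angle_degrees` is a helper of our own: it assumes that one drawn unit on the y-axis stands for `y_unit` actual units, so a line of slope m is drawn with apparent slope m / y_unit, and it returns the angle that the drawn line makes with the positive x-axis.

```python
import math

def drawn_angle_degrees(slope, y_unit):
    """Angle (in degrees) between the drawn line and the positive x-axis
    when one drawn unit on the y-axis stands for y_unit actual units."""
    apparent_slope = slope / y_unit
    return math.degrees(math.atan(apparent_slope))

angle_f = drawn_angle_degrees(300, 300)  # f(x) = 300x + 60
angle_g = drawn_angle_degrees(1, 300)    # g(x) = x

print(angle_f, angle_g)
```

With the scale of 300 used above, the graph of f is drawn at about 45 degrees, while the graph of g is drawn at well under one degree, which is why it would be indistinguishable from the x-axis.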
Observe also that, as a consequence of the distortion along the y-axis in the
preceding coordinate system, the reflection across the line that passes through
the origin O and makes a 45° angle with the x-axis (the broken line above) is no
longer length-preserving, because it maps the unit segment [0, 1] on the x-axis to
the segment [0, 300] on the y-axis.
For the purpose of communication, we may call a coordinate system in which
the unit of length in one of the coordinate axes has been intentionally modified a
scaled coordinate system. Be aware that in such a coordinate system, rotations
and reflections do not preserve lengths.
In the exercise below, one sees a natural example of a scaled coordinate sys-
tem where the rescaling takes place in the x-axis rather than the y-axis. Other
examples of scaled coordinate systems can be found on pages 197 and 198.
6.4. REMARKS ON GRAPHS AND COORDINATE SYSTEMS 135
Exercises 6.4
(1) Define a function h : [0, 360] → R as follows. Let $P_0$ be the point (1, 0)
on the circle of radius 1 (the unit circle) around the origin O. For each t so
that 0 ≤ t ≤ 360, let $P_t$ be the point on the unit circle which is the image
of $P_0$ under the t-degree counterclockwise rotation around the origin O.
Let the coordinates of $P_t$ be $(x_t, y_t)$ (so that $y_0 = 0$ and $-1 \le y_t \le 1$ for
all t in [0, 360]). See the picture:
[Figure: the unit circle around O, with $P_0 = (1, 0)$ and $P_t = (x_t, y_t)$ obtained by rotating $P_0$ counterclockwise through t degrees.]
CHAPTER 7
Linear Functions
and Proportional Reasoning
With the concept of a function available, we are now in a position to revisit and
shed light on the earlier discussion of rates and constant rates in Section 1.9 of
[Wu-PreAlg] and Section 3.2 of this volume. In terms of functions, the concept
of constant rate now takes on a strikingly simple form: the constancy of the rate
is equivalent to the linearity of a well-defined associated function.1 This is the
content of Theorem 7.1 (page 138). We give some examples in Section 7.1 to
illustrate how problems about constant rate can now be done in a much more
conceptual way from the perspective of Theorem 7.1.
We also take this opportunity to critically examine the concept of proportional
reasoning, a mainstay of the middle school mathematics curriculum in TSM.2 This
concept is not mathematically well-defined, and it is unclear in what sense this
concept could be rendered mathematically valid. The purpose of the extended
discussion of proportional reasoning in Section 7.2 is to alert teachers to approach
all things related to proportional reasoning with a great deal of caution, particularly
the exhortation that this concept, being of allegedly great importance, “merits
whatever time and effort must be expended to assure its careful development”
([NCTM, page 82]).
138 7. LINEAR FUNCTIONS AND PROPORTIONAL REASONING
and so that f (0) = 0, i.e., the motion begins at time 0. Then the motion has a constant
speed of v mph if, and only if, f (t) = vt for a fixed positive number v.
It goes without saying that there is a corresponding theorem for other kinds
of work done at a constant rate: water flow, lawn-mowing, house-painting, etc.
(see Section 3.2).
We now use Theorem 7.1 to solve the original problem about Ina. We want to know
how long it will take her to walk $\frac{7}{8}$ miles. In terms of Ina’s distance function
f , this means we want to know the value of $t_0$ so that $f(t_0) = \frac{7}{8}$. Noting that
f (t) = vt, we are looking for a $t_0$ so that $v\,t_0 = \frac{7}{8}$. Thus, this $t_0$ satisfies
\[ t_0 = \frac{7/8}{v}. \]
We need to know the value of v. From the given data: Ina walks $1\frac{1}{2}$ miles in $\frac{1}{2}$ of
an hour. Therefore $f(\frac{1}{2}) = 1\frac{1}{2}$, so that
\[ v \cdot \frac{1}{2} = 1\frac{1}{2} \implies v = 3. \]
Thus,
\[ t_0 = \frac{7/8}{3} = \frac{7}{24} \text{ hours.} \]
Since $\frac{7}{24}$ hours is 17.5 minutes, this answer is the same as the one obtained in
Section 1.9 of [Wu-PreAlg].
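The arithmetic in this solution can be replayed with exact fractions. The Python sketch below follows the same two steps, first finding v from the given data and then solving v t0 = 7/8; the variable names are ours.

```python
from fractions import Fraction

# given data: Ina walks 1 1/2 miles in 1/2 of an hour
known_distance = Fraction(3, 2)   # miles
known_time = Fraction(1, 2)       # hours

# f(t) = v t, so v = f(1/2) / (1/2)
v = known_distance / known_time

# solve v * t0 = 7/8 for t0
t0 = Fraction(7, 8) / v
minutes = t0 * 60

print(v, t0, float(minutes))
```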
Recall from page 127 that a linear function of one variable x is a function g
of the form g( x ) = ax + b for some constants a and b; the number b is called
the constant term of the linear function. We say g is a linear function without
constant term if b = 0. Thus the distance function in Theorem 7.1 is an example of
a linear function without constant term. The reason we single out linear functions
without constant terms is that for such a function, g( x ) = ax, we have
\[ \frac{g(x)}{x} = a \quad \text{for all } x > 0. \]
In particular, if $x_1$ and $x_2$ are any two positive numbers, then
(7.1)   $\dfrac{g(x_1)}{x_1} = \dfrac{g(x_2)}{x_2}$
because both sides are equal to the constant a. Equation (7.1) is the precise mean-
ing of the statement in TSM that “two quantities g( x ) and x are in a proportional
relationship”, provided it is known that g( x ) is a linear function of x without con-
stant term. Unfortunately, such an explanation is missing in TSM. In particular,
TSM usually does not make explicit the fact that g( x ) is a linear function without
constant term but expects students to somehow guess it.
In general, the constant term of such a linear function is not zero. For exam-
ple, in the situation of Theorem 7.1 suppose we begin observing the motion of
the object, not from the beginning, but only after the object has traveled b miles.
Thus at time 0, the object has already traveled b miles. (Compare Paul’s distance
function in Example 2 on page 142.) Define the associated distance function of
one variable F : [0, ∞) → R so that
F(t) = the total distance traveled up to time t (hours).
Then we are given that F(0) = b (miles). The distance traveled in the time
interval $[t_0, t]$ is the total distance traveled up to time t minus the total
distance traveled up to time $t_0$, which is equal to $F(t) - F(t_0)$ miles. The
average speed of the motion in the time interval $[t_0, t]$ is, therefore,
(7.2)   $\dfrac{F(t) - F(t_0)}{t - t_0}$.
Suppose the motion has constant speed v mph. Then for every t > 0, the
average speed of the motion in the time interval [0, t] is equal to v. So,
\[ \frac{F(t) - F(0)}{t - 0} = v. \]
Therefore F(t) − F(0) = vt. Since F(0) = b, we get F(t) = vt + b for every t > 0,
and therefore also for every t ≥ 0. (Again, compare Paul’s distance function in
Example 2 on page 142.)
Conversely, suppose F(t) = vt + b for some constants v and b; then we claim
that the motion is one of constant speed v. Indeed, from (7.2), we have that the
average speed of the motion in the time interval [t0 , t] is equal to
\[ \frac{(vt + b) - (vt_0 + b)}{t - t_0} = \frac{v(t - t_0)}{t - t_0} = v \text{ mph.} \]
Since this is true for all t0 and t, the motion has constant speed v by the definition
of constant speed. We have therefore proved the following slightly more general
version of Theorem 7.1.
and so that F(0) = b (miles). Then the motion is one of constant speed v mph if, and
only if, F(t) = vt + b for a fixed positive number v.
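The converse direction can be illustrated numerically. In the Python sketch below, the values v = 3 and b = 5/2 and the three time intervals are arbitrary choices of ours; the function `average_speed` is the quotient in (7.2). For a function of the form vt + b, every interval tried gives the same average speed v.

```python
from fractions import Fraction

def average_speed(F, t0, t):
    """Average speed over [t0, t]: (F(t) - F(t0)) / (t - t0), as in (7.2)."""
    return (F(t) - F(t0)) / (t - t0)

# sample constants, chosen only for illustration
v, b = Fraction(3), Fraction(5, 2)

def F(t):
    """Total distance traveled up to time t for this sample motion."""
    return v * t + b

intervals = [(Fraction(0), Fraction(1)),
             (Fraction(1, 3), Fraction(7, 2)),
             (Fraction(2), Fraction(10))]
speeds = [average_speed(F, t0, t) for t0, t in intervals]

print(speeds)
```

Changing b shifts the starting distance but leaves every average speed unchanged, in line with the cancellation in the computation above.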
Naturally, an entirely similar discussion can be given for water flow at a con-
stant rate, work done at a constant rate, etc. For example, in the case of water
flowing out of a faucet into a container (let us say), let F be the function so that
F(t) is the amount of water (in gallons) in the container at time t (in minutes). Let
F(0) = b gallons, i.e., there are already b gallons of water in the container at time
0. One then proves in exactly the same way that the rate of the water flow being a
constant r gallons per minute is equivalent to F(t) = rt + b gallons for all t ≥ 0.
7.1. CONSTANT RATE AND LINEAR FUNCTIONS 141
Applications
We now solve two prototypical “constant rate problems” using linear functions.
These problems can be done without algebra, so the point of interest here is
the relative simplicity and conceptual clarity of the solutions that come from the
formulation of constant rate in terms of linear functions (see Theorem 7.1). In
the ensuing discussion, one can also appreciate the importance of being able to
translate verbal information into equations (Chapter 2).
Example 1. Joshua, Li, and Manfred are going to paint a house together. It is
estimated that, individually, it would take them 18 hours, 15 hours, and 16 hours,
respectively, to paint the whole house. Assuming that each person paints at a
constant rate, estimate how long it would take them to do it together.
Since each person paints at a constant rate, Theorem 7.1 implies that there are
fixed positive constants j, ℓ, and m so that the areas of the house that Joshua, Li,
and Manfred paint in t hours are, respectively,
J (t) = jt sq ft,
L(t) = ℓt sq ft,
M (t) = mt sq ft.
(Thus J (0) = L(0) = M (0) = 0.) We can determine each of these constants j, ℓ,
and m, as follows. Let A be the number of square feet of the house that needs
painting. Since it takes Joshua 18 hours to paint the house, we see that J (18) = A.
Thus j · 18 = A and $j = \frac{A}{18}$. In like manner, we get $\ell = \frac{A}{15}$ and $m = \frac{A}{16}$. If
all three paint together, then in t hours, each of Joshua, Li, and Manfred paints,
respectively, J (t), L(t), and M (t) sq ft, i.e., $\frac{A}{18}t$, $\frac{A}{15}t$, and $\frac{A}{16}t$ sq ft, respectively.
Therefore, if all three work together, they paint
(7.3)   $\dfrac{A}{18}t + \dfrac{A}{15}t + \dfrac{A}{16}t = \left(\dfrac{1}{18} + \dfrac{1}{15} + \dfrac{1}{16}\right) A t$
sq ft in t hours. Let $t_0$ hours be the time it takes these three people to paint the
whole house, i.e., A sq ft. Then,
\[ \left(\frac{1}{18} + \frac{1}{15} + \frac{1}{16}\right) A\, t_0 = A. \]
Multiplying both sides by $\frac{1}{A}$, we get
\[ \left(\frac{1}{18} + \frac{1}{15} + \frac{1}{16}\right) t_0 = 1 \]
and therefore the answer is:
\[ t_0 = \frac{1}{\frac{1}{18} + \frac{1}{15} + \frac{1}{16}} = \frac{1}{\frac{798}{4320}} = 5\frac{55}{133} \text{ hours.} \]
Activity
Check that $5\frac{55}{133}$ hours is the correct answer, i.e., the areas painted by Joshua,
Li, and Manfred after $5\frac{55}{133}$ hours do add up to A sq ft.
Two things are noteworthy. First, (7.3) clearly shows that when Joshua, Li,
and Manfred work together, they paint at the constant rate of
$\frac{A}{18} + \frac{A}{15} + \frac{A}{16}$
square feet per hour (see Theorem 7.1 on page 138). This would be clumsy to
prove without the availability of linear functions. A second thing of note is that,
if one is fluent in the use of functions, then the preceding solution is entirely
straightforward and is devoid of subtlety. Compare this with any solution that
does not use functions.
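The computation in Example 1 can be carried out with exact fractions. In the Python sketch below (the variable names are ours), the area A is taken as the unit, so each rate is a fraction of the house per hour; this is legitimate because A cancels in the equation for the time.

```python
from fractions import Fraction

# hours needed by Joshua, Li, and Manfred working alone
hours_alone = [18, 15, 16]

# with A as the unit of area, the rates are 1/18, 1/15, 1/16 of the house per hour
combined_rate = sum(Fraction(1, h) for h in hours_alone)

# (1/18 + 1/15 + 1/16) * t0 = 1
t0 = 1 / combined_rate
whole, remainder = divmod(t0.numerator, t0.denominator)

# fraction of the house each person paints in t0 hours
painted = [t0 / h for h in hours_alone]

print(t0, whole, remainder, sum(painted))
```

The last quantity printed is the total fraction of the house painted after t0 hours.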
Example 2. Paul and Geneviève walk at a constant rate. Paul walks from their
house to the train station in 30 minutes while Geneviève needs only 24 minutes to
do the same. Geneviève gives Paul a head start of 4 minutes and then she starts
off. Does she catch up with Paul, and if so, after how many minutes?
Let us first make a rough estimate of whether Geneviève can overtake Paul.
Since it takes Geneviève only 24 minutes to get to the station, it takes only 4 +
24 = 28 minutes after Paul leaves the house before she gets to the station. But 28
minutes after Paul leaves, he is still on his way to the station because it takes him
30 minutes to get there. Therefore Geneviève overtakes him at some point on her
way to the station. The question is exactly when.
Let G (t) be Geneviève’s distance from the house t minutes after her departure.
Let the distance between the house and the train station be D miles. So G (0) = 0,
and by Theorem 7.1, we know G (t) = a t, where a is Geneviève’s (constant)
speed. Since her speed is given as $\frac{D}{24}$ miles per minute, we have $G(t) = \frac{D}{24}t$.
Let P(t) be Paul’s distance from the house t minutes after Geneviève’s departure.
Now, by the same reasoning as in Geneviève’s case, Paul’s speed is $\frac{D}{30}$ miles per
minute, so that in 4 minutes, he would be $4(\frac{D}{30})$ miles from the house. Thus
$P(0) = 4(\frac{D}{30})$, and therefore $P(t) = (\frac{D}{30})t + 4(\frac{D}{30}) = \frac{D}{30}(t + 4)$ (compare Theorem
7.2). The problem then becomes: what is the time $t_0$ so that $G(t_0) = P(t_0)$? That
is, we must solve the equation G (t) = P(t), i.e., we must solve:3
\[ \frac{D}{24}\,t = \frac{D}{30}\,(t + 4). \]
This again looks like an equation in the two numbers D and t, but once again, the
D goes away as soon as we multiply both sides by $\frac{1}{D}$. So $\frac{1}{24}t = \frac{1}{30}(t + 4)$ and
therefore $\frac{30}{24}t = t + 4$. This leads to $\frac{1}{4}t = 4$ and t = 16. So 16 minutes after
Geneviève leaves the house, she catches up with Paul.
Activity
3 Please take note of the precise meaning of an equation as explained on page 28 and how it
from the house to the station). Then $G(t) = \frac{2}{24}t = \frac{1}{12}t$ and
$P(t) = \frac{2}{30}(t + 4) = \frac{1}{15}t + \frac{4}{15}$. We now graph both linear functions
(7.4)   $G(t) = \frac{1}{12}t$ and $P(t) = \frac{1}{15}t + \frac{4}{15}$
on the same set of coordinate axes (D stands for “distance from the house”):
[Figure: the graphs of G (t) and P(t) drawn on the same axes, with horizontal axis T and vertical axis D; the two lines intersect at the point $(16, \frac{4}{3})$.]
The intercept of P(t) on the D-axis (which is $\frac{4}{15}$, as we saw above) now has a
graphic interpretation: it gives Paul’s distance from the house at the moment
Geneviève leaves the house. The point of intersection of the two graphs, which is
$(16, \frac{4}{3})$, also has an interpretation: the T-coordinate tells the time when Geneviève
catches up with Paul because at that instant, both are exactly the same distance
($\frac{4}{3}$ miles, the D-coordinate of the point) from the house.
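The catch-up time and the point of intersection can be confirmed with exact fractions. The Python sketch below uses function names of our own; the distance D is kept as a parameter to show that the catch-up time does not depend on it, and D = 2 is used only to reproduce the point on the graph.

```python
from fractions import Fraction

def G(t, D):
    """Genevieve's distance from the house t minutes after she leaves."""
    return Fraction(D) / 24 * t

def P(t, D):
    """Paul's distance from the house t minutes after Genevieve leaves."""
    return Fraction(D) / 30 * (t + 4)

# t/24 = (t + 4)/30  is the same as  t * (1/24 - 1/30) = 4/30
t0 = Fraction(4, 30) / (Fraction(1, 24) - Fraction(1, 30))

print(t0, G(t0, 2), P(t0, 2))
```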
Exercises 7.1
Each of the following exercises can be done as in Section 3.2. Therefore
the reason for giving these exercises here is for you to get some practice
doing them using linear functions.
(1) Suppose Jessica can paint a house in 5 days, and Jessica and Helena
together can do it in 3 days. Assuming that each paints at a constant
rate, in how many days can Helena do the work alone?
(2) A man walks from point A to point B at a constant rate. If he walks at
the rate of 1 yard per second, then it takes him $5\frac{1}{2}$ minutes more to get
to point B than if he walks at the rate of 4 yards per 3 seconds. How far
is point A from point B?
(3) A freight train runs 6 miles an hour slower than a passenger train. It
runs 80 miles in the same time that the passenger train runs 112 miles.
Assuming that both trains run at a constant rate, find the speed of each
train.
(4) A train left A for B, 112 miles apart, at 9 am, and one hour later a train
left B for A; they met at 12 noon. If the second train had started at 9 am
and the first at 9:50 am, they would also have met at noon. Assuming
that each train runs at a fixed constant speed, find their speeds.
(5) Two faucets pour into a tub. The first faucet alone can fill the tub in
18 minutes, and the second faucet alone can fill the tub in 22 minutes.
Assume the constancy of the rates of the water flow as usual. The first
faucet is turned on for 4 minutes before the second faucet is turned on,
and t minutes later the tub is filled. What is t?
(6) Two people A and B walk straight towards each other at constant speed.
A walks $2\frac{1}{2}$ times as fast as B. If they are 2000 feet apart initially, and if
they meet after $3\frac{1}{3}$ minutes, how fast does each walk?
(7) Joshua, Li, and Manfred mow lawns at a constant rate. How long would
it take the three of them to mow a lawn if, for the same lawn, it takes
Joshua and Li 2 hours to mow it together, Li and Manfred 3 hours to
mow it together, and Joshua and Manfred 4 hours to mow it together?
(8) A can do a piece of work in $\frac{2}{3}$ as many days as B, and B can do it in
$\frac{4}{5}$ as many days as C. Together they can do it in $3\frac{7}{11}$ days. Assuming
constant rate of work, in how many days can each do it alone? (Recall
the comment on this kind of abstract “work problem” in Section 1.9 of
[Wu-PreAlg], at the end of the subsection The concept of constant rate:
do this exercise by imagining the “work” to be something concrete, like
painting a house or mowing a lawn.)
Overview
The discrete case
The continuous case
Overview
We are devoting a whole section to the discussion of proportional reasoning for a
good reason. This is one of the key topics in school mathematics on which TSM
has inflicted severe damage (the other comparable topics being fractions, negative
numbers, “variable”, and slope). First, TSM promotes “setting up a proportion”
in a way that is unteachable and therefore unlearnable. Then the resulting mas-
sive nonlearning triggers an extreme reaction that ends up codifying proportional
reasoning as the capstone of elementary school mathematics and the gateway to
higher mathematics ([Post-Behr-Lesh]). In fact, proportional reasoning has come to
be regarded as a concept of such great importance that it “merits whatever time
and effort must be expended to assure its careful development” [NCTM, page 82].
It is precisely because of the putative importance of proportional reasoning in the
middle school curriculum—according to TSM—that we feel obligated to investi-
gate what proportional reasoning might be.
What is proportional reasoning? Apparently, nobody knows for sure. Accord-
ing to the volume [Siegler-etal.], “the literature consists of several different defini-
tions of proportional reasoning. On a basic level, the term means understanding
and working with the underlying relations in proportions” (page 48). When we
seek clarification in [NRC], we are told that it is “understanding the underlying
7.2. PROPORTIONAL REASONING 145
the additional 0.50 cm (half of a cm), we see that, “proportionally”, we should get
half of 80 sheets, i.e., 40 sheets. Altogether, there are 320 + 40 sheets in 4.50 cm.
Once students get used to this reasoning, they can do it in one step: if they can
compute the unit rate k = 80 sheets/cm, they can conclude that the number of
sheets in a stack of 4.50 cm is:
\[ h(n) = \underbrace{T + T + \cdots + T}_{n} = nT \]
so that h(n) = Tn, where we write Tn instead of nT to bring out the fact that in
the expression for the function h(n), T is the constant. (Compare the discussion
of the cost of multiple copies of the same book on page 119.) Thus h(n) is a linear
function without constant term:
This equation is the precise meaning of the common expression that “the number
of sheets n is proportional to the height h(n) of a stack with n sheets”. Now equa-
tion (7.7) explains why equation (7.6) is correct: the h(n) and n in equation (7.7)
correspond, respectively, to the h and n in equation (7.6), and the T in equation
(7.7) then corresponds to $\frac{1}{k}$ in equation (7.6). In other words, the “invariant” k in
equation (7.6) is precisely the number 1/T, but you may have noticed that it is
far easier to think of T (the thickness of one sheet) than to think of 1/T.
If the 8 people stay for 3 days, then they need 3 × 20 = 60 liters. The answer is
that the 8 people should carry 60 liters of water.
As before, we note that the solution is simplicity itself (provided we make the
assumption that everybody drinks the same amount each day) and “proportional
reasoning” does not intrude at all. However, if we want to put the problem in the
general context of linear equations without constant term, we can.
Again we assume that each person drinks c liters per day. Define a function
f : N → R so that for each whole number n, f (n) = the amount of water (in
liters) that n people drink per day. We are given that f (1) = c, and therefore for
each whole number n ≥ 1,
\[ f(n) = \underbrace{c + c + \cdots + c}_{n} = n\,c, \]
because both ratios are equal to c, according to (7.10). Using the given data in the
problem, we get
\[ \frac{12.5}{5} = \frac{f(8)}{8}. \]
The cross-multiplication algorithm gives 5 f (8) = 100, and f (8) = 20. Therefore
8 people will need 20 liters each day. If they want to camp for 3 days, they will
need 3 × 20 = 60 liters as before.
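The same numbers come out of the linear function f directly. The Python sketch below assumes, as the displayed proportion indicates, that the given data are 12.5 liters per day for 5 people; the variable names are ours.

```python
from fractions import Fraction

# assumed given data: 5 people drink 12.5 liters per day
c = Fraction("12.5") / 5          # liters per person per day

def f(n):
    """Liters that n people drink per day: f(n) = n c."""
    return n * c

daily_for_8 = f(8)
total_for_3_days = 3 * daily_for_8

# the proportion f(5)/5 = f(8)/8 holds because both sides equal c
same_ratio = f(5) / 5 == f(8) / 8

print(c, daily_for_8, total_for_3_days, same_ratio)
```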
One more example:
Which is the better buy: 12 tickets for $15.00 or 20 tickets for $23.00?
([NCTM2000, page 221]).
We begin by removing two flaws from this problem. The first one is relatively
minor: what is missing is a clear statement that each ticket within a group costs
the same amount; after all, tickets for a musical or theatrical performance usually
come in a wide range of prices. While adults reading this item might realize that
they must assume that all tickets in each group cost the same before this problem
can be solved, an adolescent may not be sophisticated enough or lucky enough
to come to the same realization. The major issue, however, is that it is not clear
what is meant by “better buy”. For example, suppose the first kind of ticket is
for a regular concert of the San Francisco Symphony while the second kind of
ticket is for a performance by the local high school band. Then the former may be
considered a “better buy” even if it turns out to cost twice as much as the latter.
For these reasons, the problem will have to be rephrased. Here is one possibility:
For a certain event, there are two kinds of tickets on sale: 12 tickets
for $15.00 or 20 tickets for $23.00. Assuming that all tickets in each
group cost the same amount, which of the two kinds of tickets is less
expensive?
When the problem has been properly reformulated this way, it is clearly a discrete
problem because, within each group of tickets, the smallest unit is the price of one
ticket. Let us solve this problem.
Just as in the preceding problems, this is a simple problem in 5th-grade arith-
metic. The price of the first kind of ticket is computed as follows: $15 is par-
titioned into 12 equal parts, so by the division interpretation of a fraction (see
Section 1.2 in [Wu-PreAlg]), the size of one part (= the price of one ticket) is
$$\frac{15}{12} = 1.25 \text{ dollars}.$$
In like manner, the price of one ticket of the second kind is
$$\frac{23}{20} = 1.15 \text{ dollars}.$$
Clearly the second kind of ticket is less expensive. (If we use a cent as the unit for
the price of a ticket, this would be a problem in whole number arithmetic.)
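As a small check of the arithmetic, the two unit prices can be compared with exact fractions. This is a sketch under the assumption stated in the reformulated problem (all tickets in a group cost the same); the variable names are illustrative only.

```python
from fractions import Fraction

# Price per ticket = (price of the group) / (number of tickets in the group).
first_kind = Fraction(15, 12)    # 12 tickets for $15.00
second_kind = Fraction(23, 20)   # 20 tickets for $23.00

cheaper = "second" if second_kind < first_kind else "first"
print(float(first_kind), float(second_kind), cheaper)
```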
We can treat this problem as one about linear functions without constant term
but, for something this simple, such a conceptual detour would indeed be a waste
of time. Instead, let us see where “proportional reasoning” might play a role. We
are looking at the ratio:
$$\frac{\text{the price of } n \text{ tickets}}{n \text{ tickets}}$$
problem? Let us see, for example, why part (1) cannot be solved under the cir-
cumstances. Suppose grandfather adopts the following routine about his daily
knitting: each day he knits 4 inches in the first hour and 2 inches in the second
hour. This way of knitting then satisfies every piece of the given data: he knits 6
inches in the 2 hours of knitting each day, and in 10 hours (i.e., 5 days), he knits
30 inches. Now consider how to answer the question in part (1): Which “1 hour”
interval are we talking about? If it is the first hour of the day, the answer is 4
inches, but if it is the second hour of the day, then it is 2 inches. Exactly as remarked
above, this problem is not solvable as is.
Let us therefore add the assumption that grandfather knits at a constant rate.
Then we can solve the problem, as follows. Define a function g : [0, ∞) → R so
that g(t) is the number of inches grandfather knits in t hours. (Observe that the
domain of g is roughly half of R, but not N. See the discussion on pages 145 ff.)
We may as well assume that grandfather starts knitting at t = 0 so that g(0) = 0.
By Theorem 7.1 on page 138, g(t) is a linear function without constant term:
g(t) = ct for some constant c.
Then from the given data that g(10) = 30, we get 10c = 30, and c = 3. Thus
g(t) = 3t. The answers to the four parts are then, in succession:
(1) g(1) = 3,
(2) 10/2 = 5,
(3) g(4) = 12, and
(4) the value of t0 so that g(t0 ) = 27 is t0 = 9.
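Parts (1), (3), and (4) can be reproduced with a short sketch, assuming the constant rate of knitting introduced above. The helper names `g` and `hours_needed` are this sketch's own; the rate is computed from the given data g(10) = 30.

```python
from fractions import Fraction

# Constant rate: g(t) = rate * t, with the rate fixed by g(10) = 30.
rate = Fraction(30, 10)   # inches per hour

def g(t):
    """Inches knitted in t hours."""
    return rate * t

def hours_needed(inches):
    """Solve g(t0) = inches for t0."""
    return Fraction(inches) / rate

print(g(1), g(4), hours_needed(27))
```

Without the constant-rate assumption there is no function g of this form, and none of these lines would be justified.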
To drive home the relevance of linear functions without constant term to all
such “proportional reasoning” problems, we make one more comment on the
preceding solution to part (4), “How many hours will it take Grandpa to knit a
scarf 27 inches long?” Now we have g(t) = 3t for all t > 0, so that g(t)/t = 3 for
all t > 0. In particular, if it takes grandfather t0 hours to knit 27 inches (so that
g(t0 ) = 27), then
(7.11) $$\frac{g(10)}{10} = \frac{g(t_0)}{t_0}$$
as both are equal to 3. Using the data in the problem, we get:
$$\frac{30}{10} = \frac{27}{t_0},$$
which yields $t_0 = \frac{27}{3} = 9$. Note that equation (7.11) is the proportion that TSM asks
students to set up—without making it explicit that grandpa’s knitting is done at a
constant speed—in order to solve the problem.
The rote skill of writing down equations as in (7.11)—without knowing about,
or being the least bit concerned with, the constant speed of the knitting—would
seem to be the essence of TSM’s concept of “proportional reasoning”. (Compare
the comments after equation (7.1) on page 139.)
In one way or another, such continuous problems are solvable only when
they are known to involve a linear function without constant term; in the case at
hand, this function comes from the assumption of knitting at a constant rate. As
mentioned in the preceding paragraph, TSM wants students to write down pro-
portions of the type (7.11) without making use of any assumption about constant
rate of knitting. This kind of mathematics education has no place in the school
classroom. What we ask you to do, instead, is to make an effort to teach your
students about constant rate (see the preceding section) and how to use the linear
function without constant term that follows from constant rate to deduce the pro-
portion of the type in equation (7.11). Yes, these continuous problems should be
done by “setting up a proportion”, but only after the why and the how of “setting
up a proportion” have been carefully explained.
In the education literature, students are usually blamed for their inability to
“reason multiplicatively” in order to solve these “proportional reasoning” prob-
lems. However, if you review everything we have done in this chapter, you will
undoubtedly conclude that if students were taught proportional reasoning with
all the necessary assumptions clearly stated (e.g., that every person drinks the
same amount of water each day, or that the knitting is done at a constant rate),
and with a reason provided for every step of the solution, then these problems
would be no more difficult than any other problem we have discussed so far. In
particular, proportional reasoning is entirely learnable when it is formulated correctly
and taught correctly. In TSM, unfortunately for teachers and their students, neither
takes place. When it is your turn to teach proportional reasoning, just remember
not to follow TSM or its affiliated literature, but teach proportional reasoning as
mathematics, with all that this term implies (see the five principles on page xii).
Further discussion of the role of proportional reasoning in the school math-
ematics curriculum is given in the section on Rate and Proportional Reasoning in
[Wu2013].
Exercises 7.2
(1) If 15 cupcakes cost $5.10, find the cost of 37 cupcakes, (i) without using
any proportions, and (ii) by setting up a proportion. To show that you
now know more than TSM, give an explanation in (ii) about why you
can set up a proportion.
(2) Ann and Betty both run at constant speed. They start running together
at the same time, but after Ann has run 3 laps, Betty has only run 2.5
laps. By the time Ann finishes running 7 laps, how many laps will Betty
have run? You must be able to explain every step of your solution.
(3) The following is a favorite problem in middle school mathematics: “On
a certain map, the scale indicates that 3 centimeters represent the actual
distance of 8 miles. Suppose the distance between two cities on this map
measures 1.7 centimeters. What is the actual distance between these two
cities?” (a) Suppose you are the teacher. What additional explanation
must you give your students about this problem before they can solve it?
(b) Solve it.
(4) Consider this problem: “If it took 8 hours to mow 5 lawns, then at that rate,
how many lawns could be mowed in 32 hours? At what rate were lawns being
mowed?” (i) Critique it in terms of clarity. (ii) What does the last sentence
mean? (iii) How is this problem without the last sentence different from
the following: “A ballpoint pen sells only in bundles of 5, and each
bundle costs $8. How many pens can you get for $32?”
(5) Consider this problem: “If 25 cows consume 400 lb. of hay in a week, how
long will 300 lb. of hay last for 12 cows?” (a) What other assumptions do
you need to add to make the problem solvable? (b) Solve it.
https://doi.org/10.1090//mbk/099/08
CHAPTER 8
Linear Inequalities
and Their Graphs
So far we have only discussed equations because school algebra is primarily about
equations. But school algebra also includes everything related to number compu-
tations and, to the extent that inequalities arise naturally in various mathematical
contexts as well as in real life, they should also be an integral part of the alge-
bra curriculum. In this chapter, we pay special attention to inequalities by giving
careful definitions of the basic concepts and proving the most rudimentary facts
related to inequalities in two variables. Then we pull all these pieces together to
solve a typical optimization problem in Section 8.5, i.e., a problem that looks for
the largest or smallest value of a given function in a given region.
any other possible combination. The emphasis here is on the words bigger than,
i.e., an inequality.
One way to understand what is involved in a problem of this nature is to
approach it in a naive way in order to see why naivety doesn’t pay. For exam-
ple, a casual glance at the data would suggest that it is more profitable to sell
Game A than Game B, in the following sense. Suppose you have $165. Then
if you manufacture one B game, you only make $185, but if you use the same
amount to manufacture two A games (each costing $75), you’d not only make $250
(= 2 × 125) but would also have $15 left over from your $165.
A precise way to think about this is to notice that each Game A brings
in a profit that is $1\frac{2}{3}$ of its manufacturing cost (because
$\frac{125}{75} = 1\frac{2}{3}$), but each Game B only brings in a profit of
about $1\frac{1}{8}$ of its manufacturing cost (because
$\frac{185}{165} = 1\frac{4}{33}$, which is about $1\frac{4}{32} = 1\frac{1}{8}$).
One’s first impulse is therefore to say that the manufacturer should bring
only A games to the show. We will show why this is a bad strategy in terms of
profit-making. Remember: there is a limit to how many games in total she can
bring to the show: 50. Does she have the money to manufacture 50 A games? Yes,
because it takes only $75 to manufacture one A game so that the manufacturing
costs for these 50 games is 50 × 75 = 3750 dollars. Since she has $6000 to spend,
she is well within her budget. However, with only 50 A games, she can only
make 50 × 125 = 6250 dollars. There is at least one alternate strategy that makes
a greater profit: bring 40 A games and 10 B games. Is this possible? Yes, because
she would still be bringing 40 + 10 = 50 games, and moreover, the manufacturing
cost for 40 A games and 10 B games is only (40 × 75) + (10 × 165) = 4650 dollars,
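which is less than the budget of $6000. However, the resulting profit is
(40 × 125) + (10 × 185) = 6850 dollars, which is greater than 6250 dollars.
Bringing only B games does not do as well either: the $6000 budget allows at most
36 B games (since 36 × 165 = 5940 but 37 × 165 = 6105), and these would make only
36 × 185 = 6660 dollars, whereas we have already seen that bringing 40 A games and
10 B games would bring in a greater profit of $6850.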
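The feasibility checks and profit computations for individual strategies can be sketched as follows. The figures (costs of $75 and $165, profits of $125 and $185, the $6000 budget, and the 50-game quota) are taken from the discussion above; the function and dictionary names are this sketch's own.

```python
COST = {"A": 75, "B": 165}      # manufacturing cost per game, in dollars
PROFIT = {"A": 125, "B": 185}   # amount each game brings in, in dollars
BUDGET, QUOTA = 6000, 50

def evaluate(a, b):
    """Return (feasible, cost, profit) for a games of type A and b games of type B."""
    cost = a * COST["A"] + b * COST["B"]
    profit = a * PROFIT["A"] + b * PROFIT["B"]
    feasible = a + b <= QUOTA and cost <= BUDGET
    return feasible, cost, profit

print(evaluate(50, 0))    # all A games
print(evaluate(40, 10))   # the mixed strategy from the text
```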
Finding the maximum profit usually requires a balance between opposing demands.
It is now clear that there is an inherent push-pull in this problem: bringing
only A games would under-utilize the $6000 manufacturing budget because of the
50-game quota, and bringing only B games would under-utilize the 50-game quota
because of the $6000 manufacturing budget. Neither of these extreme options
would bring in the maximum profit. Intuitively, the combination of
A games and B games that brings in the maximum profit must be a kind of “equi-
librium” between “all A games” and “all B games”. What we need to understand
in mathematical terms is how to negotiate the push-pull in a systematic and logical
fashion in order to arrive at this equilibrium. The main theme of this chapter is
about the mathematical understanding of this push-pull which, as adumbrated
above, is grounded on an understanding of inequalities.
One more comment before we proceed. While we are trying to
promote the need to better understand inequalities, this manu-
facturing problem may suggest that we forget about inequalities
and get a solution by simple trial-and-error instead. And why
not? Consider the pair of whole numbers (m, 50 − m), where m
(respectively, 50 − m) is the number of A games (respectively,
B games) the manufacturer produces. As m runs from 0 to 50,
the 51 possible profits of {125m + 185(50 − m)} dollars exhaust
all possibilities, and one of these 51 numbers will then be the
solution (we can worry about not exceeding the manufacturing
budget at the end). This is correct. However, we use small num-
bers here (50, 6000, 75, etc.) only for ease of illustration. Sim-
ilar problems coming from industry would involve far bigger
numbers (e.g., the budgets involved may be millions of dollars)
and far more choices than just two, namely, our choice between
A games and B games. In such situations, the trial-and-error
method for the purpose of getting an answer would in general
take too long even on a computer, and a more efficient method
would be needed. Getting a more efficient method then requires
a better understanding of inequalities and what optimization is
all about, and the remainder of this chapter will take a first step
toward such an understanding.
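For numbers as small as the ones in this problem, the trial-and-error search described above is easy to carry out by machine. The following is a minimal sketch, assuming the constraints are "at most 50 games in total" and "manufacturing cost at most $6000"; unlike the remark, it checks the budget during the search rather than at the end, and it tries every pair of whole numbers rather than only the pairs (m, 50 − m). The function name is hypothetical.

```python
def best_combination(budget=6000, quota=50):
    """Try every pair (a, b) of whole numbers; keep the most profitable feasible one."""
    best = None
    for a in range(quota + 1):
        for b in range(quota + 1 - a):        # guarantees a + b <= quota
            if 75 * a + 165 * b > budget:     # over budget: skip this pair
                continue
            profit = 125 * a + 185 * b
            if best is None or profit > best[0]:
                best = (profit, a, b)
    return best   # (profit, number of A games, number of B games)

print(best_combination())
```

The double loop examines on the order of quota² pairs, which is harmless here but grows quickly with more products and larger numbers; this is the inefficiency that the remark points to.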
Exercises 8.1
(1) Referring to the preceding problem, we have seen that neither “50 A
games” nor “36 B games” would maximize the profit. The strategy that
maximizes profit must lie somewhere in between. Note that instead
of 36 B games, the manufacturer can bring 35 B games and 2 A games,
because
35 + 2 = 37 < 50, so the new strategy still meets the quota of
“up to 50 games”, and
the manufacturing cost is (35 × 165) + (2 × 75) = 5925 < 6000,
and is within budget.
Use this trial and error method to find the number of games of each kind
that maximizes profit.
Exercises 8.2
(1) Translate into symbolic language the following manufacturing problem
(no solution is required): A small firm tries to introduce two products, to
be called A and B. It has invested $60,000 in the production cost. It takes
$215 to produce one item of product A and $95 to produce one item of
product B. The projection is that it takes 3.2 hours to produce one item
of product A and 5.5 hours to produce an item of product B. Because its
manufacturing facilities are limited, the firm can only devote 1500 hours
to the production of these two products. Each item of product A brings
in a profit of $310 and each item of product B brings in a profit of $230.
Assuming that every item produced will be sold, how many items of
product A and how many items of product B should the firm produce in
order to maximize the profit?
Let x, y, z be in Q. Then:
(8.4) Let $x < y$. If $z > 0$, then $\frac{x}{z} < \frac{y}{z}$, but if $z < 0$, then $\frac{x}{z} > \frac{y}{z}$.
Here is one simple application of (B) and (D).
Example 1. Exhibit all the numbers x on the number line that satisfy
(5 − x ) + 12 > 4 − (3x − 5).
The set of all these numbers x is called the graph of (5 − x) + 12 > 4 −
(3x − 5) on the number line, and Example 1 is usually expressed as: Graph the
inequality (5 − x ) + 12 > 4 − (3x − 5) on the number line.
As in the case of solving linear equations, one simply isolates the variable
x in the inequality, in the sense of transposing all the x’s to one side of the
inequality by making repeated use of (B) (compare page 44 for a similar concept).
Thus
(5 − x ) + 12 > 4 − (3x − 5) ⇐⇒ (5 − x ) > 4 − (3x − 5) − 12.
Since 4 − (3x − 5) − 12 = −3 − 3x, we have:
(5 − x ) + 12 > 4 − (3x − 5) ⇐⇒ 5 − x > −3 − 3x ,
which, by (B) again, is equivalent to 5 − x + (3x − 5) > −3 − 3x + (3x − 5), i.e.,
equivalent to 2x > −8. By (D), this is equivalent to $\frac{1}{2} \cdot 2x > \frac{1}{2} \cdot (-8)$, i.e., x > −4.
Thus we see that
(5 − x ) + 12 > 4 − (3x − 5) ⇐⇒ x > −4.
In other words, x satisfies (5 − x ) + 12 > 4 − (3x − 5) if and only if x satisfies
x > −4. These x’s therefore can be represented by the thickened semi-infinite
line segment below.
[Figure: the number line with −4 and 0 marked; the part to the right of −4 is thickened.]
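The equivalence obtained in Example 1 can be spot-checked numerically. The sketch below only tests a finite sample of rational numbers, so it illustrates the result rather than proving it; the function names are chosen for this sketch.

```python
from fractions import Fraction

def original(x):
    """The inequality of Example 1."""
    return (5 - x) + 12 > 4 - (3 * x - 5)

def simplified(x):
    """The equivalent inequality x > -4."""
    return x > -4

# Sample rationals on both sides of -4, including -4 itself.
samples = [Fraction(n, 4) for n in range(-40, 41)]
agree = all(original(x) == simplified(x) for x in samples)
print(agree)
```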
Things get a bit more interesting when absolute value appears in inequalities.
Recall (again, see Section 2.6 of [Wu-PreAlg]) that for any number x, the absolute
value | x| of x is by definition:
| x | = the distance of x from 0.
Thus
| x | ≥ 0 for every number x, and | x | = 0 is equivalent to x = 0.
There are two basic properties of absolute value. The first is:
(8.5) | xy| = | x | · |y| for all numbers x and y.
The next is the Triangle inequality:
(8.6) | x + y| ≤ | x | + |y| for all numbers x and y.
A key point about absolute value is that the inequality | x | < b for numbers
x and b (b > 0) can be expressed directly in terms of ordinary inequalities. Let us
introduce for this purpose the double inequality a ≤ b ≤ c, where a, b, c are
numbers, to stand for the two inequalities:
a≤b and b ≤ c.
Then we have:
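Let x, c be arbitrary numbers and let $\varepsilon$ be a positive number. Then
$|x - c| \le \varepsilon$ is equivalent to the double inequality $c - \varepsilon \le x \le c + \varepsilon$.
[Figure: the number line with the points $c - \varepsilon$, $c$, and $c + \varepsilon$ marked.]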
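The equivalence between the absolute-value inequality and the double inequality can be illustrated on sample values. In this sketch the positive number is called `eps` and the particular values of `c` and `eps` are arbitrary choices; a finite check like this is only an illustration.

```python
from fractions import Fraction

def within(x, c, eps):
    """|x - c| <= eps, for a positive number eps."""
    return abs(x - c) <= eps

def double_inequality(x, c, eps):
    """c - eps <= x <= c + eps."""
    return c - eps <= x <= c + eps

c, eps = Fraction(3, 2), Fraction(5, 4)
samples = [Fraction(n, 8) for n in range(-40, 41)]
same = all(within(x, c, eps) == double_inequality(x, c, eps) for x in samples)
print(same)
```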
A useful observation about absolute values is the following:
Lemma 8.1. For two numbers x and y, | x − y| is the distance between x and y.
Proof. We split the proof into three cases: Case 1: both x and y are positive. Case
2: one is positive and the other is negative. Case 3: both are negative.
Let us prove the first case, and we leave the remaining cases to an exercise
(Exercise 1 on page 163). Thus suppose both x and y are positive. Since |y − x | =
| x − y|, we may assume x < y, so that |y − x | = y − x. The lemma is then
obvious. (You may find it instructive to recall that, since y and x are the lengths of
the segments [0, y] and [0, x ], respectively, the definition of subtraction therefore
implies, literally, that y − x is the length of the remaining segment when [0, x ] has
been taken away from [0, y], which is to say, it is the length of the segment [ x, y].
See the definition of subtraction in Section 1.3 of [Wu-PreAlg].)
[Figure: the number line with 0, x, and y marked, where 0 < x < y.]
Example 3. Graph |6 + 2x | ≥ 1 on the number line.
We want to change the left side to something like | x − a| for some number
a, because we want to apply Lemma 8.1. With this in mind, the inequality is
equivalent to $\frac{1}{2}|6 + 2x| \ge \frac{1}{2} \cdot 1$, which is equivalent to $|\frac{1}{2}| \cdot |6 + 2x| \ge \frac{1}{2}$, which in
turn is equivalent to $|\frac{1}{2}(6 + 2x)| \ge \frac{1}{2}$ (by (8.5)), i.e., $|x + 3| \ge \frac{1}{2}$. Since $|3 + x| = |x - (-3)|$,
the original inequality is therefore equivalent to $|x - (-3)| \ge \frac{1}{2}$. By
Lemma 8.1, this means we have to find all the points x so that their distance from
we see that the graph is the union of two semi-infinite segments: the segment
to the left of $-3\frac{1}{2}$ and including $-3\frac{1}{2}$, and the segment to the right of $-2\frac{1}{2}$ and
including $-2\frac{1}{2}$.
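As a sanity check on Example 3, the original inequality, its restatement as a distance from −3, and the final description as a union of two semi-infinite segments can be compared on sample points. This is a sketch with names of its own choosing, and it tests only finitely many rational numbers.

```python
from fractions import Fraction

def original(x):
    return abs(6 + 2 * x) >= 1

def as_distance(x):
    """The distance from x to -3 is at least 1/2."""
    return abs(x - (-3)) >= Fraction(1, 2)

def union_of_rays(x):
    """x <= -3 1/2 or x >= -2 1/2."""
    return x <= Fraction(-7, 2) or x >= Fraction(-5, 2)

samples = [Fraction(n, 8) for n in range(-48, 1)]
consistent = all(
    original(x) == as_distance(x) and as_distance(x) == union_of_rays(x)
    for x in samples
)
print(consistent)
```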
Exercises 8.3
(1) Complete the proof on page 162 about | x − y| being the distance between
x and y by proving Cases 2 and 3.
(2) (a) Graph the inequality 23 x − (2 + 7x ) ≥ (6 + x ) − (1 − 12 x ) on the
number line. (b) Graph the inequality 25 − 12 x ≥ 15 x + 16 on the number
line.
(3) Graph on the number line each of the following: (i) | x | − 14 > −8.
(ii) | x | − 4 < 13 . (iii) 9 − |3x − 1| < 4. (iv) |2x + 35 | ≥ 15 .
(v) $|6x + 1| + 2\frac{1}{4} < 5$.
ax + by ≥ c
(where a, b, c are given constants): it is the set of all the points ( x, y) in the plane
whose coordinates x and y satisfy this inequality, i.e., ax + by ≥ c. For example,
the point (1, 2) does not lie on the graph of 3x + 2y ≥ 25, for the simple reason
that (3 × 1) + (2 × 2) = 7 < 25, whereas (10, 10) is easily seen to lie on this
graph. The graph of ax + by > c is defined in like manner, as are the graphs
of ax + by ≤ c and ax + by < c. It is customary in mathematics to denote the
graph of an inequality such as ax + by ≥ c by the notation { ax + by ≥ c}, and
we will use this notation below.
Given a collection of linear inequalities of two variables, the graph of the
inequalities is by definition the set of all the points which satisfy each of the
inequalities in the collection. It follows that the graph of a collection of inequalities is
the intersection of all the graphs of the individual inequalities.
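The definition translates directly into a membership test: a point lies on the graph of a collection of inequalities exactly when it satisfies every one of them. The sketch below is one possible encoding (the tuple format and function names are assumptions of this sketch), checked against the two points mentioned above.

```python
import operator

# Each inequality "a*x + b*y (rel) c" is stored as the tuple (a, b, rel, c).
RELATIONS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}

def satisfies(point, inequality):
    x, y = point
    a, b, rel, c = inequality
    return RELATIONS[rel](a * x + b * y, c)

def in_graph(point, inequalities):
    """True when the point satisfies every inequality in the collection."""
    return all(satisfies(point, ineq) for ineq in inequalities)

single = [(3, 2, ">=", 25)]   # the inequality 3x + 2y >= 25
print(in_graph((1, 2), single), in_graph((10, 10), single))
```

Using `all` over the collection is the computational counterpart of taking the intersection of the individual graphs.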
[Figures: a vertical line L with the half-plane L− to its left and L+ to its right; a horizontal line L with L+ above it and L− below it.]
When L is neither vertical nor horizontal, it is intuitively clear that there are still
points that are “above” L and those that are “below” L, as shown below.
[Figure: a line L in the coordinate plane that is neither vertical nor horizontal.]
However, how to precisely describe these two “halves” becomes more subtle. We
propose to test whether a point (s, t) is “above” or “below” L by passing a vertical
line ℓ through (s, t) and letting it intersect L; since L is not vertical, L and ℓ must
intersect, let us say at a point P. Since P and (s, t) lie on the same vertical line,
they have the same x-coordinate, namely s. Therefore let the coordinates of P be
(s, y0 ). Now we compare t, the y-coordinate of (s, t), with y0 , the y-coordinate of
P. We see that if t > y0 , then, pictorially, (s, t) lies above P, as is shown in the
following picture:
[Figure: the line L, the point P = (s, y0) on L, and the point (s, t) above P on the same vertical line.]
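The test just described (compare t with the y-coordinate y0 of the point of L on the same vertical line) can be sketched as a small function. It assumes L is nonvertical and is given as y = mx + k; the function name and the sample line y = 2x − 1 are illustrative only.

```python
def side_of_line(point, m, k):
    """Classify (s, t) relative to the nonvertical line y = m*x + k."""
    s, t = point
    y0 = m * s + k   # P = (s, y0) is where the vertical line through (s, t) meets L
    if t > y0:
        return "above"
    if t < y0:
        return "below"
    return "on"

print(side_of_line((3, 1), 2, -1), side_of_line((0, 5), 2, -1), side_of_line((1, 1), 2, -1))
```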
Theorem 8.2 (Plane separation). A line L divides the coordinate plane into two
nonempty half-planes, L+ and L− , with the following properties:
(i) The plane is the disjoint union of L, L+ , and L− , in the sense
that the union of L, L+ , and L− is the whole plane and no two of
these sets have any point in common.
(ii) If two points P and Q in the plane belong to the same half-plane,
then the line segment PQ lies in the same half-plane.
[Figure: two points P and Q in the same half-plane of L; the segment PQ does not meet L.]
(iii) If two points P and Q in the plane belong to different half-planes,
then the line segment PQ must intersect the line L.
[Figure: two points P and Q in different half-planes of L; the segment PQ crosses L.]
Proof of the theorem. This proof is quite long (it ends on page 172), so this proof
is not one that you will learn for the purpose of presenting it to your students
in class one day. However, it is given in great detail here because the reasoning
is extremely instructive, in much the same way that the reasoning in the proof of
Theorem 4.2 on pages 60 ff. is instructive. You will get to see how precise defi-
nitions are the foundation for reasoning, why the concept of a function is useful,
and why it is important to know the detailed interplay between the algebra and
the geometry of a linear equation in two variables. In particular, Lemma 8.3 on
page 168 is not only instrumental for the proof of Theorem 8.2, but it will be seen
to be crucial for the solution of the manufacturing problem in the next section.
When all is said and done, however, this proof deserves to be learned because ev-
ery middle school teacher should have a good idea of how to approach the proof
of a basic theorem such as Theorem 8.2. If we want to encourage students to
always ask why and be at ease with reasoning, then we have to start this tradition
at home and be prepared to ask and answer these questions ourselves.
Suppose L is vertical; it is obvious that property (i) holds. The fact that (ii)
and (iii) also hold when L is vertical is simple to prove and will be left as an
exercise (Exercise 6 on page 179).
For the rest of the proof here, we will assume L is nonvertical.
Proof of property (i). Since L is not vertical,
L is defined by y = mx + k for some constants m and k.
See Theorem 4.2 on page 60. We recall the definitions of L+ and L− : these are all
the points above and below L, respectively. More precisely, let ( x, y) be any point
in the plane not on L. Then the vertical line passing through ( x, y) will intersect
L at a point whose first coordinate is x and whose second coordinate is mx + k
Again, it is obvious that there can be no point in common between any two of
L+ , L− , and L, that each is nonempty, and that their union is the whole plane.
We have therefore proved that L+ , L− , and L satisfy property (i) of the theorem.
Proof of property (ii). We will deal with L+ ; the proof for L− is similar (see
Exercise 6 on page 179). Thus let P and Q be points in L+ . We must prove that
the segment PQ also lies in L+ . Precisely, we will prove: if P and Q are above L and
S is a point on the segment PQ, then S is also above L. To this end, we are going to
introduce a function h : R^2 → R (see page 56 for the notation R^2) so that h(x, y)
measures how far the point ( x, y) is, vertically, above L or below L. Recall that L
is the graph of the equation y = mx + k and, as we have seen, the vertical line
passing through ( x, y) intersects L at the point ( x, mx + k), as shown:
[Figure: a point (x, y) in L+, the point (x, mx + k) of L on the same vertical line, and the vertical displacement h between them; L− lies below L.]
points in a segment PQ is achieved at the endpoints (see page 56 for the notation
R^2 in the lemma).2
Lemma 8.3. Let f(x, y) be a linear function of two variables, f : R^2 → R, and let
PQ be a given segment in the plane. Then for any point S in PQ, either f (S) = f ( P) =
f ( Q), or f (S) is between f ( P) and f ( Q).
[Figure: a segment PQ with a point S between P and Q; p, s, and q are the x-coordinates of P, S, and Q, with p < s < q.]
Proof of Lemma 8.3. First assume the line $L_{PQ}$ joining P and Q is not vertical.
Then $L_{PQ}$ is the graph of an equation y = μx + κ for some constants μ and κ
(Theorem 4.2 on page 60).3 Thus P = (p, μp + κ) and Q = (q, μq + κ) for some
constants p and q. We may assume without loss of generality that p < q. Let
f ( x, y) = ax + by + c for some constants a, b, and c. Therefore,
f ( P) = ap + b(μp + κ ) + c = ( a + bμ) p + (bκ + c).
Similarly,
f ( Q) = ( a + bμ)q + (bκ + c).
If S is between P and Q, then S = (s, μs + κ ) for some constant s, and
p < s < q
(see the definition of between on page 266). We also have
f (S) = ( a + bμ)s + (bκ + c).
Obtaining these explicit expressions for f ( P), f (S), and f ( Q) is the key point of
this proof. Once that is done, the rest of the proof is nothing more than a straightforward
computation with these expressions to arrive at the desired conclusions.
We will in fact prove something slightly more precise, namely:
(i) If f ( P) = f ( Q), then f (S) = f ( P) = f ( Q), i.e., f is constant
on PQ.
(ii) If f(P) ≠ f(Q), then f(S) is between f(P) and f(Q).
The lemma is easily seen to follow from (i) and (ii) (when $L_{PQ}$ is not vertical).
We first prove (i). If f ( P) = f ( Q), then
( a + bμ) p + (bκ + c) = ( a + bμ)q + (bκ + c),
2 From an advanced standpoint, this lemma is about linear functions of one variable, in the fol-
lowing sense. If PQ is taken to be the image of a linear mapping φ : [ a, b ] → R2 , then the composition
f ◦ φ : [ a, b ] → R is a linear function of one variable, and the maximum or minimum of a linear
function of one variable on a closed interval is achieved at the endpoints of the interval.
3 The letters μ and κ are the lower case Greek letters for “m” (mu) and “k” (kappa), respectively.
We use these because we have run out of appropriate lower case Latin letters.
[Figure: the vertical segment PQ, with P = (p, p′), S = (p, s′), and Q = (p, q′).]
From f(x, y) = ax + by + c, we have
$$f(P) = ap + bp' + c, \quad f(S) = ap + bs' + c, \quad f(Q) = ap + bq' + c.$$
To prove (i), suppose f(P) = f(Q). Then $bp' = bq'$, so that $b(p' - q') = 0$. Since
$p' < q'$, we have b = 0. In that case,
f(P) = f(Q) = f(S) = ap + c.
Now we prove (ii). If f(P) ≠ f(Q), then $bp' \ne bq'$ and $b(p' - q') \ne 0$. In
particular, b ≠ 0. If b > 0, then $p' < s' < q'$ implies $bp' < bs' < bq'$, which
implies
$$ap + bp' + c < ap + bs' + c < ap + bq' + c.$$
Hence f(P) < f(S) < f(Q), and f(S) is between f(P) and f(Q). If, however,
b < 0, then the by-now familiar argument proves that
$$ap + bp' + c > ap + bs' + c > ap + bq' + c$$
and therefore f(P) > f(S) > f(Q). So f(S) is again between f(P) and f(Q).
The proof of Lemma 8.3 is complete.
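Lemma 8.3 can be illustrated numerically. The sketch below picks an arbitrary linear function and an arbitrary segment (both are assumptions of this sketch, not data from the text), produces points S on PQ by the parametrization S = P + t(Q − P) with 0 ≤ t ≤ 1, and checks that every value f(S) lies between f(P) and f(Q).

```python
from fractions import Fraction

def f(point, a=2, b=-3, c=1):
    """A linear function of two variables, f(x, y) = a*x + b*y + c."""
    x, y = point
    return a * x + b * y + c

def point_on_segment(P, Q, t):
    """The point P + t(Q - P), which lies on the segment PQ when 0 <= t <= 1."""
    return (P[0] + t * (Q[0] - P[0]), P[1] + t * (Q[1] - P[1]))

P, Q = (Fraction(-1), Fraction(4)), (Fraction(5), Fraction(2))
low, high = sorted([f(P), f(Q)])
values = [f(point_on_segment(P, Q, Fraction(i, 10))) for i in range(11)]
between = all(low <= v <= high for v in values)
print(between)
```

In particular, the largest and smallest values of f on the segment occur at the endpoints, which is the form in which the lemma is used for optimization.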
Activity
Let L be the graph of $y = \frac{1}{2}x$, and let $P = (0, \frac{1}{2})$ and $Q = (3, 2)$. (a) Verify
that P and Q are in L+. (b) Produce two points on the segment PQ, and
verify directly—without using Theorem 8.2—that they also lie in L+ .
Therefore $q' - v' < 0 < p' - u'$, as the left side is negative and the right side is
positive. By (B) on page 160, the fact that $q' - v' < p' - u'$ implies
(8.12) $q' - p' < v' - u'$.
p < x0 ⇐⇒ mp + k < cp + d.
have:
$x_0 < q \iff \frac{d - k}{m - c} < q$
$\iff (d - k) < (m - c)q$ (because $m - c > 0$)
$\iff d - k < mq - cq$.
Using (B) on page 160 once more, we see that d − k < mq − cq ⇐⇒ cq + d <
mq + k, which is the same as $q' < v'$. By (8.11), the last inequality is valid, and
therefore so is x0 < q. Consequently, ( x0 , y0 ) is between P and Q after all, and
the proof of property (iii) is complete. The proof of Theorem 8.2 is therefore also
complete.
This is how the inequality $\ell(-4, 2) < -6$ came about, and this is why (−4, 2)
belongs to $\{\ell < -6\}$.
To further bring out this idea, consider another point (3, 1) (see preceding
picture). The vertical line x = 3 passes through it and intersects L at (3, 4); since
1 < 4, we see that (3, 1) lies in L−. We now perform the same computation to
verify that (3, 1) belongs to $\{\ell > -6\}$. Indeed, 1 < 4 implies (−3)1 > (−3)4 by
(E) on page 160, so that
$$\ell(3, 1) = 2(3) + (-3)(1) > 2(3) + (-3)4 = \ell(3, 4) = -6.$$
Thus $\ell(3, 1) > -6$, as desired.
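The three values appearing in this computation can be reproduced directly. In the sketch below, `ell` is this sketch's name for the expression 2x + (−3)y read off from the computation above, so that the line L is where it equals −6.

```python
def ell(x, y):
    """The linear expression 2x + (-3)y from the computation above."""
    return 2 * x + (-3) * y

# (3, 4) lies on L, (3, 1) lies below it, and (-4, 2) lies above it.
print(ell(3, 4), ell(3, 1), ell(-4, 2))
```

Because the coefficient of y is negative, moving a point down increases the value of the expression and moving it up decreases the value, which is why the lower half-plane corresponds to "greater than −6".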
This example tells us how to prove Theorem 8.4.
Proof of Theorem 8.4. Part (ii) implies part (i), so we will prove part (ii). There
are two cases to consider: b > 0 and b < 0. We will prove the latter case because
it is more involved, and leave the case of b > 0 to Exercise 11 on page 180.
Thus we assume henceforth that b < 0 in ax + by = c. We will prove:
(8.14) $L^- = \{\ell > c\}$.
Take a point (x, y) in L−; we first show that (x, y) lies in $\{\ell > c\}$. Referring to
the picture below, let the vertical line passing through ( x, y) intersect L at ( x, y0 ).
Since ( x, y) is in L− , y < y0 . Because b < 0, (E) on page 160 implies that
(8.15) by > by0 .
[Figure: the line L, a point (x, y) in L−, and the point (x, y0) of L directly above it.]
We now see that (x, y) belongs to $\{\ell > c\}$ because, by (8.15),
$$\ell(x, y) = ax + by > ax + by_0 = \ell(x, y_0) = c,$$
where the last step is because (x, y0) lies on L.
To complete the proof of (8.14), we have to show that every (x, y) in $\{\ell > c\}$
is a point of L− (see equal sets on page 267). Thus we are given that $\ell(x, y) > c$.
By Theorem 8.2, this (x, y) is either in L, or in L+, or in L−, and there are no
other possibilities. If it is in L, then by the definition of L, $\ell(x, y) = c$, and this
contradicts $\ell(x, y) > c$. Next, suppose (x, y) is in L+. We have just finished
showing that every point (x, y) in L− must lie in $\{\ell > c\}$; an entirely similar
argument will show that every point of L+ must be in $\{\ell < c\}$. Thus we would
have $\ell(x, y) < c$ for this (x, y). Again this contradicts $\ell(x, y) > c$. Hence by
elimination, we are left with the conclusion that such an (x, y) in $\{\ell > c\}$ has to
be a point of L−. The proof of (8.14) is complete.
The proof of $L^+ = \{\ell < c\}$ is entirely similar. This proves Theorem 8.4.
The occasion will arise when more precision regarding half-planes is needed,
for the following reason. The half-planes L+ and L− do not include L, but we
shall see presently that there is sometimes a need to also consider half-planes
together with L itself. For this need, it will be advantageous to formally introduce
two common concepts regarding geometric figures (see page 267).4 Let A and B
be two figures in the plane. Then the union of A and B, to be denoted by A ∪ B,
is the totality of all the points that are in A or B, or both. Their intersection, to
be denoted by A ∩ B, is the totality of all the points that are in both A and B.
In this new language, we will refer to L+ ∪ L and L− ∪ L as the two closed
half-planes of L. The two closed half-planes are not disjoint as they have L in
common. If there is any fear of confusion, we will refer to L+ and L− as the two
open half-planes of L for emphasis.5
Theorem 8.4 allows us to see why the concept of a closed half-plane is relevant.
Indeed, suppose we want to know the graph of the weak inequality ax + by ≤ c;
it is natural to denote this graph by {ℓ ≤ c}. By Theorem 8.4, we know
{ℓ < c} is one of L+ and L−. Let us say for definiteness that {ℓ < c} = L−.
It follows that {ℓ ≤ c} is the closed half-plane L− ∪ L. Similarly, if we define
{ℓ ≥ c} to be all the points (A, B) so that ℓ(A, B) ≥ c, and if {ℓ > c} = L+,
then {ℓ ≥ c} is equal to the closed half-plane L+ ∪ L. Incidentally, we have
{ℓ ≤ c} ∩ {ℓ ≥ c} = L.
Now recall that we are interested in the region R consisting of all the points
satisfying the inequalities of (8.2). Therefore by the definition of the graph of a
collection of inequalities on page 163, R is the intersection of a finite number of closed
half-planes.
The following examples illustrate how to make use of Theorem 8.4.
Example 5. Graph 3x − 2y > −5 in the plane.
The line L defined by 3x − 2y = −5 is shown below.
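[Figure: the line L defined by 3x − 2y = −5, with x-intercept −5/3 and y-intercept 5/2, and its two half-planes L+ and L−.]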
4 The concepts of union and intersection apply to any two sets (collections of objects).
5 This way of using “open” and “closed” is standard in mathematics, but one must be careful to
keep in mind the fact that a half-plane that is not closed may not necessarily be open. For example, the
union of the open upper half-plane together with the positive x-axis is not a closed half-plane, but it is
not open either.
176 8. LINEAR INEQUALITIES AND THEIR GRAPHS
The coefficient of y being −2 and therefore negative, Theorem 8.4(ii) says the
graph of 3x − 2y > −5 is L− . However, as we have mentioned more than once,
there is no reason to rely on part (ii) of Theorem 8.4 to make this determination.
This fact is more easily deduced by the cruder, but eminently practical, method of
checking which half-plane O belongs to. Visibly, (0, 0) belongs to L− , but it also
belongs to {3x − 2y > −5} as 0 > −5. Since Theorem 8.4 says {3x − 2y > −5}
must be either L+ or L− , we know that it is L− .
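The test-point method used in Example 5 is easy to mechanize. The following Python sketch (the function name `compare_with_c` is our own and not part of the text) evaluates ax + by at a point and reports how the value compares with c. By Theorem 8.4, a single test point off the line decides the inequality for its whole half-plane.

```python
def compare_with_c(a, b, c, point):
    """Report whether a*x + b*y is '>', '<' or '=' relative to c at the given point."""
    x, y = point
    value = a * x + b * y
    if value > c:
        return ">"
    if value < c:
        return "<"
    return "="


# Example 5: the line 3x - 2y = -5, tested at the origin O = (0, 0).
print(compare_with_c(3, -2, -5, (0, 0)))   # the sign that holds on O's side of L
print(compare_with_c(3, -2, -5, (-5, 0)))  # a point on the other side of L
print(compare_with_c(3, -2, -5, (1, 4)))   # a point on L itself
```

Any point known to lie off L would serve equally well; the origin is simply the cheapest one to evaluate.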
Example 6. Find the graph of the pair of inequalities − x − 2y < 4 and −2x +
3y > 0, i.e., find all the ( x, y) that satisfy both inequalities.
This example asks for the totality of all the points in both graphs {− x − 2y <
4} and {−2x + 3y > 0}. In other words, we want the intersection of the graphs of
the individual inequalities. Let L1 be the line − x − 2y = 4. Now (0, 0) belongs to
{− x − 2y < 4} because 0 < 4, so the graph of − x − 2y < 4 is the upper half-plane
L1+ of L1 , as shown below.
[Figure: the line L1 defined by −x − 2y = 4, with x-intercept −4 and y-intercept −2; the half-plane above it is {−x − 2y < 4} = L1+.]
It remains to determine the graph of −2x + 3y > 0. Let L be the line defined by
−2x + 3y = 0. Then the picture is the following:
[Figure: the line L defined by −2x + 3y = 0, passing through O and the point (−3, −2), with L+ = {−2x + 3y > 0}.]
The graph of the pair − x − 2y < 4 and −2x + 3y > 0 is therefore the inter-
section of two half-planes: L+ ∩ L1+ . This is the shaded region in the following
picture, and one should take note of the fact that the region does not include the
two rays on the boundary6 of the region.
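[Figure: the shaded region L+ ∩ L1+, bounded by a ray on L and a ray on L1; L1 has x-intercept −4 and y-intercept −2.]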
The graph in Example 6 is an “unbounded region” in a sense that is self-explanatory
(although “unbounded” in this context can be precisely described in advanced
mathematics). In applications such as the manufacturing problem of this chapter,
however, the graph would tend to be a polygon with the edges included. As an
illustration of such a polygon, let us see how we can obtain one by a slight elab-
oration on Example 6. The graph of the two weak inequalities − x − 2y ≤ 4 and
−2x + 3y ≥ 0 is the intersection of the closed half-planes L+ ∪ L and L1+ ∪ L1 ,
and is the same shaded region as above plus the two rays. Call this region S .
Now consider not just the graph of this pair but the graph of this pair plus a third
inequality, namely, the graph of the three weak inequalities:
− x − 2y ≤ 4, −2x + 3y ≥ 0, and y ≤ 0.
We see that this graph is the intersection of S with the closed lower half-plane of
the x-axis; it is therefore the following shaded triangular region together with the
three edges:
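[Figure: the shaded triangular region bounded by L, L1, and the x-axis; L1 has x-intercept −4 and y-intercept −2.]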
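Since the graph of a collection of inequalities is the intersection of the individual graphs, membership in it can be tested one inequality at a time. The sketch below is only an illustration under our own conventions: each inequality is stored as a tuple (a, b, comparison, c) standing for "ax + by compared with c", and the names `in_graph`, `open_region`, and `closed_triangle` are hypothetical.

```python
import operator


def in_graph(point, inequalities):
    """Return True if the point satisfies every inequality (a, b, compare, c)."""
    x, y = point
    return all(compare(a * x + b * y, c) for a, b, compare, c in inequalities)


# Example 6: -x - 2y < 4 and -2x + 3y > 0 (open half-planes, boundary excluded).
open_region = [
    (-1, -2, operator.lt, 4),
    (-2, 3, operator.gt, 0),
]

# The three weak inequalities: -x - 2y <= 4, -2x + 3y >= 0, and y <= 0.
closed_triangle = [
    (-1, -2, operator.le, 4),
    (-2, 3, operator.ge, 0),
    (0, 1, operator.le, 0),
]

print(in_graph((1, 1), open_region))          # a point of the unbounded region
print(in_graph((1, 1), closed_triangle))      # fails y <= 0
print(in_graph((0, 0), open_region))          # O lies on L, so it is excluded
print(in_graph((0, 0), closed_triangle))      # O is a vertex of the triangle
print(in_graph((-2, -0.5), closed_triangle))  # an interior point of the triangle
```

Switching between strict and weak comparisons is exactly the switch between open and closed half-planes.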
Finally, we bring closure to this section by answering the question: what does
the graph of a collection of weak inequalities look like? The answer is that it is the
intersection of a collection of closed half-planes. More can be said. A geometric
figure R in a plane is said to be convex if, given any two points A and B in R,
the segment AB lies completely in R. For example, the region enclosed by the
6 In this volume, we will use the term “boundary” in an intuitive sense, in the same way that
we have used the term “region” in an intuitive sense without a proper definition. These are precise
concepts in advanced mathematics.
cross below is not convex (can you prove it is not convex by using this definition
of convexity?).
[Figure: a cross-shaped region.]
On the other hand, a half-plane and a closed half-plane are both convex, and with
a little bit more effort, one can prove that the intersection of a finite number of convex
sets is also convex (see Exercise 12 on page 180). It follows that the intersection of
a finite number of closed half-planes is convex. For example, a triangular region
or a rectangular region is convex (Exercise 12 again). If the intersection of a finite
number of closed half-planes is known to be a bounded region, then it is actually a
convex polygon; see Lemma 8.7 on page 188 for further discussion.
A backward glance. In TSM, it is usually asserted that “the graph of a linear
inequality in two variables is a half-plane”, but no definition is given for “graph
of an inequality” or “half-plane” and no reasoning is given for why this is true.
When such an assertion is made without proof about two undefined concepts, it
promotes sloppy thinking and precludes any meaningful mathematics learning.
Learning by rote is the inevitable result. Students get the idea that mathematics
is a faith-based discipline that tolerates no questioning about why something
might be true. What this section tries to do is to disabuse students of that sort
of anti-mathematical thinking by precisely defining what a half-plane is and what
the graph of a linear inequality means; it also provides the reasoning for why the
graph of a linear inequality is a half-plane and how to figure out which half-
plane corresponds to which inequality. This kind of knowledge is indispensable
to a middle school mathematics teacher, even if something like the complete proof
of Theorem 8.2 may be too sophisticated (or too technical) for a K–12 audience.
However, if you as a teacher can bring the basic spirit of this section back to
your classroom—at least giving precise definitions of all the terms you use and
exposing your students to a judicious choice of the arguments, such as the proof
of Lemma 8.3—you will be taking a major step toward restoring good sense to
your mathematics classroom.
Exercises 8.4
(1) Prove that distinct level sets of a linear function of two variables are
parallel lines.
(2) Let the linear function of two variables be defined by ℓ(x, y) = 5x −
y + 7. Sketch {ℓ = 1}, {ℓ = 10}, {ℓ < 1}, and {ℓ < 10}. What is the
relationship between {ℓ < 1} and {ℓ < 10}?
(3) Referring to the picture below, let a line L be defined by ax + by = c,
where a, b, c are constants and a ≠ 0 (thus L is not horizontal). Let its
x-intercept be x_L. Let A (respectively, B) be the set of all points (x′, y′)
with the following property: (x′, y′) does not lie on L, and if the line
passing through (x′, y′) and parallel to L has x-intercept k, then k > x_L
(respectively, k < x_L). Prove that A and B are the half-planes of L.
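[Figure: the line L with x-intercept x_L, a point (x′, y′) off L, and the line through (x′, y′) parallel to L with x-intercept k; B lies to the left of L and A to the right.]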
(4) Let L be the line defined by y = 3x − 5. (a) Find a linear function of two
variables F( x, y) so that L is the level set { F = −5}. (b) Find a linear
function of two variables G ( x, y) so that L is the level set { G = 1}.
(5) Suppose we have two lines both with slope a/b (b ≠ 0), as shown:
[Figure: two parallel lines of slope a/b in the coordinate plane, with marked points (p, q), (s, t), and (p′, q′).]
Let P(x, y) be a linear function of two variables P(x, y) = ax − by + e
with a > 0. (a) Compare the values that P(x, y) assigns to (s, t) and
(p, q). (b) Compare the values that P(x, y) assigns to (p, q) and (p′, q′).
(6) The following two statements refer to the proof of Theorem 8.2 on page
165: (a) Complete the proof by proving it for the case of a vertical L, i.e.,
show that the L+ and L− so defined (page 164) satisfy properties (ii)
and (iii). (b) Write out a complete proof of the fact that L− satisfies
property (ii) (see page 167).
(7) Prove property (iii) of Theorem 8.2 when the line joining P in L+ and Q
in L− is vertical (see page 170).
(8) Graph the following inequalities in the plane:
$$\begin{cases} 5x + 2y \le 6 \\ 2x - 3\tfrac{1}{2}\,y \le -3 \\ -x + 1\tfrac{3}{4}\,y \le 5 \end{cases}$$
(9) Graph the following inequalities in the plane:
$$\begin{cases} x - y \le 53 \\ 2x + y \le -4 \\ x \ge -3 \end{cases}$$
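[Figure: the feasibility region R, the quadrilateral AOBC in the first quadrant, bounded by the coordinate axes, the line x + y = 50 (both intercepts 50), and the line 75x + 165y = 6000 (x-intercept 80); the points of R with integer coordinates are marked by dots.]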
Activity
For a later need, we record the coordinates of the vertices of the quadrilateral
AOBC:
(8.17) $A = (0,\ 36\tfrac{4}{11}),\quad O = (0, 0),\quad B = (50, 0),\quad C = (25, 25).$
The only coordinate expression worthy of comment here is that of C. It is the
point of intersection of the two lines defined by x + y = 50 and 75x + 165y = 6000
and is therefore the solution (by Theorem 5.1 on page 86) of the linear system:
x + y = 50
75x + 165y = 6000
Solving this system in the standard way (but noting that a simplification can
be achieved by reducing the second equation to x + 2.2 y = 80), we get the
solution (25, 25). So C = (25, 25).
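As a check on this computation, a 2 × 2 linear system with a unique solution can be solved exactly by Cramer's rule. The sketch below is one way to do it in Python with exact rational arithmetic; the helper name `solve_2x2` is our own, and the text itself solves the system "in the standard way" rather than by this formula.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2 exactly (Cramer's rule)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel or identical")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# The vertex C: intersection of x + y = 50 and 75x + 165y = 6000.
C = solve_2x2(1, 1, 50, 75, 165, 6000)
print(C)
```

Using `Fraction` rather than floating-point numbers keeps the coordinates exact, which matters later when a vertex turns out not to have integer coordinates.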
What task ( B) asks is at which point ( x0 , y0 ) of R the profit function H ( x, y) =
125x + 185y will achieve its maximum value in R. Thus we look for an ( x0 , y0 )
in R so that
(8.18) H ( x, y) ≤ H ( x0 , y0 ) for all ( x, y) in R.
Let us first informally discuss where to look for such an ( x0 , y0 ). The overriding
fact is that H is a linear function of two variables so that we can apply Theorem 8.4
on page 172 to H. With this understood, letting c = H ( x0 , y0 ) and L be the level
set { H = c} that passes through ( x0 , y0 ),7 we see that—on account of (8.18)—the
whole region R must lie in the closed half-plane { H ≤ c} . This suggests strongly
that this ( x0 , y0 ) had better not be inside the region R because, were this the case,
the line L = { H = c} would “split” R into two parts and R could in no way lie
in either half-plane of L, as the following picture shows:
[Figure: the quadrilateral AOBC with an interior point (x0, y0); the level line L = {H = c} through (x0, y0) splits R into two parts.]
What this informal discussion suggests is that, if H achieves a maximum
value at ( x0 , y0 ) in R, then this point ( x0 , y0 ) will have to be at the boundary. We
can go further, however. We claim that the maximum value of H is in fact achieved
at a vertex.8 Suppose this ( x0 , y0 ) lies on a side of the quadrilateral AOBC—let
us say on AC—but H ( x0 , y0 ) is greater than H ( A) and H (C ). According to
Lemma 8.3 on page 168, this is impossible, because either H ( x0 , y0 ) = H ( A) =
H (C ) or H ( x0 , y0 ) has to be between H ( A) and H (C ). Therefore if H ( x0 , y0 ) is a
maximum value of H, then H ( x0 , y0 ) ≤ H ( A) or H (C ). Since we already know
that H ( x0 , y0 ) is a maximum value of H in R, we must have H ( x0 , y0 ) = H ( A) =
H (C ) after all. This means H also achieves its maximum value at both vertices
7 Don’t forget L is the line 125x + 185y = c.
8 Caution: we are not saying that H achieves its maximum value only at a vertex. All we are
saying is that, if H achieves a maximum value somewhere in R, it already does so at a vertex.
A and C, which then proves the claim. Our informal knowledge therefore tells
us if we are looking for such an ( x0 , y0 ), we should be looking for it at one of
the vertices of AOBC. With this understood, the following theorem then becomes
somewhat anti-climactic, and the only excitement left is to find an honest proof for
the theorem.
Theorem 8.5. For the region R defined by (8.16), the linear function of two vari-
ables H ( x, y) = 125x + 185y achieves a maximum value in R at a vertex.
It may seem strange that we would be proving a general theorem about a
specific problem. The reason for doing this is that the idea behind this proof is in
fact perfectly general and therefore will serve as the model for a proof of a general
theorem about the maximum value of linear functions of two variables on convex
polygons. See Theorem 8.6 on page 187.
Proof. Referring to (8.17), we have:
$$H(A) = 6727\tfrac{3}{11},\quad H(O) = 0,\quad H(B) = 6250,\quad H(C) = 7750.$$
Thus among the four vertices, H is largest at C. Now we will prove that H ( x, y) ≤
H (C ) for every point ( x, y) in R. To this end, let the horizontal line passing
through a given ( x, y) meet the boundary of R at two points P and Q.9
[Figure: the quadrilateral AOBC with a horizontal line through a point (x, y) of R, meeting the boundary of R at P on the left and Q on the right.]
By Lemma 8.3 on page 168, H ( x, y) ≤ H ( P) or H ( Q). By the same lemma,
H ( P) ≤ H ( A) or H (O) and, as we know, both are ≤ H (C ). For the same
reason, H ( Q) ≤ H ( B) or H (C ) and therefore ≤ H (C ). In either case, we get
H ( x, y) ≤ H (C ). The proof of the theorem is complete.
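The vertex values used in the proof can be recomputed exactly. The following Python sketch evaluates H at the four vertices of (8.17) and, as a sanity check that is not a substitute for the proof, confirms that no point of R with integer coordinates gives a larger value. The function names and the reading of (8.16) as x ≥ 0, y ≥ 0, x + y ≤ 50, 75x + 165y ≤ 6000 are our assumptions, the latter based on the two boundary lines named above.

```python
from fractions import Fraction


def H(x, y):
    """The profit function H(x, y) = 125x + 185y."""
    return 125 * x + 185 * y


def in_R(x, y):
    """Membership in the feasibility region R (assumed form of (8.16))."""
    return x >= 0 and y >= 0 and x + y <= 50 and 75 * x + 165 * y <= 6000


vertices = {
    "A": (Fraction(0), Fraction(400, 11)),  # 36 4/11 = 400/11
    "O": (0, 0),
    "B": (50, 0),
    "C": (25, 25),
}
values = {name: H(x, y) for name, (x, y) in vertices.items()}
best_vertex = max(values, key=values.get)

# Spot check on the points of R with integer coordinates.
grid_ok = all(
    H(x, y) <= values[best_vertex]
    for x in range(51)
    for y in range(51)
    if in_R(x, y)
)

for name, value in values.items():
    print(name, value)
print(best_vertex, grid_ok)
```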
Recall from (8.17) that C = (25, 25). Thus the fact that H (C ) is the maximum
value of H in the feasibility region R means that the profit from manufacturing
25 of the A games and 25 of the B games will bring in the maximum profit under
the given constraints. Theorem 8.5 therefore solves the manufacturing problem.
It should not have escaped your attention that the preceding solution of the
manufacturing problem depends on a stroke of luck: the fact that the coordinates
of the vertex C at which H achieves its maximum value are positive integers.
Imagine, for example, that the coordinates of C were $(21\tfrac{3}{7},\ 28\tfrac{4}{7})$. Such would
be the case if we slightly perturb the original problem by specifying the cost of
9 It is not necessary to use a horizontal line; any line passing through ( x, y ) would do. But a
horizontal line (like a vertical line) has the virtue of simplicity. Furthermore, there should be no
doubts about the claim that this horizontal line will meet the boundary of R at two points P and
Q. Indeed, since all the equations of the four lines containing the sides AO, OB, BC, and AC are
explicitly known, the points of intersection of this horizontal line with these four lines can all be
explicitly computed (thanks to Theorem 5.1 on page 86) and verified to lie on the four sides.
8.5. SOLUTION OF THE MANUFACTURING PROBLEM 183
manufacturing an A game to be $60 instead of $75. Thus suppose the problem is:
[Second Manufacturing Problem] A video game manufacturer is
invited to a game show, and is told that she can bring up to 50
games. She has two games, A and B, and has up to $6000 to
spend on manufacturing costs. Game A costs $60 to manufac-
ture and will bring in a net profit of $125, while Game B costs
$165 to manufacture but will bring in a net profit of $185. As-
suming that she sells every game she brings, how many games
of each kind should she manufacture if she wants to maximize
her profit?
Let x and y denote the number of A and B games manufactured, respectively.
While the profit function H stays the same,
H ( x, y) = 125x + 185y ,
we are now looking at the pair of integers x and y which satisfy the following
inequalities (they are the same as those in (8.16) except the last):
(8.19) $\left.\begin{array}{l} x \ge 0,\ y \ge 0 \\ x + y \le 50 \\ 60x + 165y \le 6000 \end{array}\right\}$
If we let x and y be arbitrary real numbers that satisfy (8.19), then we have a new
feasibility region R′ shown below:
[Figure: the new feasibility region R′, the quadrilateral AOBC in the first quadrant, bounded by the coordinate axes, the line x + y = 50 (both intercepts 50), and the line 60x + 165y = 6000 (x-intercept 100); the points with integer coordinates are marked by dots.]
(8.20) $H(A) = 6727\tfrac{3}{11},\quad H(O) = 0,\quad H(B) = 6250,\quad H(C) = 7964\tfrac{2}{7}.$
Just as before, among the four vertices, H achieves its maximum value at C. Then
arguing exactly as in the proof of Theorem 8.5, we conclude that H achieves its
maximum value in the feasibility region R′ at $C = (21\tfrac{3}{7},\ 28\tfrac{4}{7})$. So far so good.
But now the formal conclusion is that if we manufacture $21\tfrac{3}{7}$ A games and
$28\tfrac{4}{7}$ B games, then the profit would be the maximum. The only problem with this
conclusion is that it doesn’t make any sense. For example, how does one manufacture
$21\tfrac{3}{7}$ A games? We need solutions that are a pair of whole numbers, not fractions.
The second manufacturing problem, in effect, compels us to recall that there
are two steps to the solution of this kind of problem, as explained on page 158.
What we have been doing is to address Step 1, whose goal is to obtain the maximum
value of the profit function in the feasibility region using real numbers
without regard to the original problem itself.
[Margin note: A real world problem is sometimes solved by solving an abstract version first before going back to the original problem.]
The second step is to transition from the purely mathematical solution of Step 1
(such as $H(21\tfrac{3}{7},\ 28\tfrac{4}{7})$) to a solution in whole numbers as
demanded by the original problem. In general, the second step is not simple, and the
middle school curriculum correctly de-emphasizes this step and only deals with
problems in which the vertices of the given feasibility regions have integer
coordinates; cf., e.g., the original manufacturing problem on page 155 and its
solution in Theorem 8.5 (page 182). However, the second manufacturing problem is
sufficiently simple that we will give a complete solution (but it can be skipped
if necessary).
We are given that $C = (21\tfrac{3}{7},\ 28\tfrac{4}{7})$ is the vertex of R′ at which H achieves
the maximum value in R′. If we want a point in R′ with integer coordinates
at which H is the largest possible, the point (21, 28) comes to mind immediately
and we have H(21, 28) = 7805. It is simple to check (8.19) to see that (21, 28)
is in R′. Now since 21 + 28 < 50, we should also consider the points (22, 28)
and (21, 29). Unfortunately, (21, 29) does not satisfy 60x + 165y ≤ 6000, so that
(21, 29) does not lie in R′; but (22, 28) does satisfy (8.19) and therefore lies in R′. We
have H(22, 28) = 7930, which is larger than 7805 (= H(21, 28)). We claim:
H achieves its maximum of 7930 at (22, 28) among all the points in
R′ with integer coordinates.
To prove this claim, let P = (22, 28). P lies on the line { x + y = 50}; this
is the line containing the side BC of the feasibility region R′, which is AOBC.
Since H (22, 28) = 7930, P also lies on the level set { H = 7930}. Let { H = 7930}
intersect the line {60x + 165y = 6000} at a point Q. The coordinates of Q can be
obtained by solving the following linear system:
125x + 185y = 7930
60x + 165y = 6000
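The system above can be solved exactly. The Python sketch below does so by Cramer's rule with rational arithmetic (the helper `solve_2x2` is a name of our own choosing, not the text's) and prints the coordinates of Q as fractions, together with the amount by which the y-coordinate of Q exceeds 28.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1, a2*x + b2*y = c2 exactly (Cramer's rule)."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the two lines are parallel or identical")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# Q: intersection of the level set {H = 7930} with the line 60x + 165y = 6000.
Q = solve_2x2(125, 185, 7930, 60, 165, 6000)
print(Q)
print(Q[1] - 28)  # how far the y-coordinate of Q lies above 28
```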
The points with integer coordinates either lie in AOBPQ or lie in QCP. Looking
at the coordinates of Q, C, and P, we can see that there is no point with integer
coordinates in QCP except P itself. The reason is simple: if ( a, b) is such a point,
where a and b are both positive integers, then ( a, b) lies in the horizontal region
whose boundary consists of horizontal lines passing through Q and P. Thus b is
a positive integer so that
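$$28 \le b \le 28\tfrac{100}{127}.$$
Clearly there is no such integer other than 28. Thus b = 28; because P is the lowest
vertex of the triangle QCP, the horizontal line through P meets QCP only at P,
so that (a, b) = (22, 28). It follows that
the points with integer coordinates in R′ are the same as those points
with integer coordinates in the polygon AOBPQ.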
Therefore in order to prove the claim, it suffices to prove that H ( P) ≥ H ( a, b)
for all ( a, b) lying in AOBPQ, where a and b are integers. We will in fact prove
more: H ( P) ≥ H ( x, y) for all the points ( x, y) lying in AOBPQ. This is because,
by Theorem 8.4 on page 172, the polygonal region AOBPQ lies in the closed half-
plane { H ≤ 7930} of the line { H = 7930} (e.g., O clearly lies in { H ≤ 7930}). The
claim is proved.
It follows that by manufacturing 22 A games and 28 B games, the manufac-
turer will make the maximum profit of $7930.
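Because the feasibility region of the second manufacturing problem contains only finitely many points with whole-number coordinates, the conclusion can also be confirmed by brute force. The sketch below is an independent check rather than the argument of the text; the function names are ours, and the constraints are those of (8.19).

```python
def feasible(x, y):
    """The constraints (8.19) of the second manufacturing problem."""
    return x >= 0 and y >= 0 and x + y <= 50 and 60 * x + 165 * y <= 6000


def profit(x, y):
    """The profit function H(x, y) = 125x + 185y."""
    return 125 * x + 185 * y


# x + y <= 50 with x, y >= 0 forces both coordinates to lie between 0 and 50.
candidates = [(x, y) for x in range(51) for y in range(51) if feasible(x, y)]
best_value = max(profit(x, y) for x, y in candidates)
best_points = [p for p in candidates if profit(*p) == best_value]

print(best_value, best_points)
```

Such an enumeration is practical only because the region is small; the geometric argument above explains why the answer sits next to the vertex C.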
Remark. While the profit function H ( x, y) = 125x + 185y for whole numbers
x and y is something concrete and down-to-earth, the consideration of H ( x, y) as
a function defined on the feasibility region elevates the problem to an abstract
mathematical problem about linear functions of two variables. It will be seen
that a substantial part of learning algebra consists of learning when to take an
abstract approach to a problem in order to get the solution to the original at the end.
The extension of the concept of “profit” to include 125x + 185y for any numbers
x and y is a good example of the needed abstraction for the solution of many
problems in algebra.
Exercises 8.5
10 Note that we still refrain from defining precisely what a region is and what the boundary of a
region is. Such omissions are part of the reason we do not emphasize proofs here.
This region S is the intersection of the following four closed half-planes (instead
of saying “the upper half-plane of the line containing the ray”, etc., we will simply
say “the upper half-plane of the ray”, etc.):
Clearly H (C ) is the largest of the three numbers H ( A), H ( B), and H (C ), but
there is no point in S at which the linear function H ( x, y) achieves the maximum
value.
The critical component in the proof of Theorem 8.6 is the following basic
lemma.
[Figure: two diagrams showing the lines L1, L2, L3 and, in the second, L4.]
Activity
Proof Outline of Theorem 8.6. It suffices to prove the case of a maximum, as the
case of a minimum will be seen to be entirely similar. Let P be a point in R, and
we will prove that H ( P) ≤ H ( A1 ).
Let L be the horizontal line passing through P. By hypothesis, R is a finite
intersection of closed half-planes. According to the Activity above, the intersec-
tion L ∩ R has to be a segment CD because R is bounded. C and D are therefore
points on the boundary polygon; let us say C lies on A3 A4 and D lies on A7 A8 .
[Figure: the horizontal segment CD through P, with C on the side A3A4 and D on the side A7A8.]
H ( P ) = H ( C ) ≤ H ( A1 ).
The theorem is proved in this case. If (ii) holds, let us say H ( P) < H ( D ). By
Lemma 8.3, H ( D ) ≤ one of H ( A7 ) or H ( A8 ), and since both H ( A7 ) and H ( A8 )
are ≤ H ( A1 ), we see that H ( P) < H ( A1 ). The proof of Theorem 8.6 is complete.
Remark. The part of Theorem 8.6 concerning the fact that a linear function
must attain a maximum or a minimum in a bounded finite intersection of closed
half-planes is a special case of a general theorem about continuous functions. See,
for example, Theorem 13.12 on page 65 and Corollary 2.5 on page 122 of [Ross].
Exercises 8.6
(3) The nutritional values of a basic unit of two food items are tabulated
below:
Item   calories   vitamin C (i.u.)   protein (mg)
A      156        50                 30
B      116        75                 80
A mountain climber wants to bring enough of both items for her trip
so that she would get at least 2600 calories, 1500 i.u. of vitamin C, and
1250 mgs of protein.
Suppose each unit of Item A costs $2.80 and each unit of Item B
costs $5. How many units of each should she buy so that the total cost
is minimum and her nutritional requirements are met? Use a scientific
calculator.
https://doi.org/10.1090//mbk/099/09
CHAPTER 9
Exponents
So far we have dealt with linear equations (pages 38 and 57), linear inequalities
(page 159), and linear functions (page 127) of one or two variables. Linear objects
are important because they are the basic building blocks of mathematics, but life
is often not linear. A good example is Kepler’s famous Third Law governing the
motion of an object around the sun: the square of the period1 divided by the cube
of the so-called semi-major axis of the elliptic orbit2 is a fixed constant no matter
what the object may be (e.g., any planet, any meteor, any asteroid). In symbols,
this means there is a number c so that, if T is the period and D is the semi-major
axis of an object revolving around the sun,
$$\frac{T^2}{D^3} = c.$$
Thus if the object is very far from the sun compared with the earth (e.g., Pluto)—
so that D is very large—then T must be correspondingly very large and therefore
it would take much more than an earth-year for that object to complete a revolution
around the sun. Multiplying both sides by $D^3$ and rearranging, we get
$$T^2 - cD^3 = 0.$$
You can see that this is not a linear equation of two variables in T and D. What
this implies is that, in order to progress further into mathematics, at the very least
we will have to deal with powers of numbers, such as $T^2$ and $D^3$. These are the
most basic nonlinear quantities.
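To see the nonlinearity numerically, one can solve $T^2 = cD^3$ for T. In the Python sketch below we assume, for illustration only, that T is measured in earth-years and D in astronomical units, so that the earth itself has T = 1 and D = 1 and therefore c = 1. The figure of roughly 39.5 astronomical units for Pluto is an approximate outside value, not taken from the text.

```python
import math


def period_in_years(semi_major_axis_au, c=1.0):
    """Solve T^2 = c * D^3 for the period T (units assumed: years and AU)."""
    return math.sqrt(c * semi_major_axis_au ** 3)


print(period_in_years(1.0))   # the earth, by the choice of units
print(period_in_years(4.0))   # 4 times the distance gives 8 times the period
print(period_in_years(39.5))  # approximate value for Pluto (assumed input)
```

Quadrupling D multiplies T by eight rather than four, which is exactly what a linear relation between T and D would forbid.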
This is of course hardly news to us because we have already come across
polynomials of any degree in Section 1.4, and polynomials of degree exceeding
1 are not linear. If we isolate a monomial of degree 4, let us say $x^4$, then it is
a real-valued function defined on R that assigns the number $x^4$ to each number
x. However, it is possible to tweak this idea to arrive at a different kind of
nonlinear function. Instead of fixing the exponent and raising all numbers to
that power, we fix a number such as 3 and assign to each positive integer n the
number $3^n$. If we denote this function for the moment by g, then $g(n) = 3^n$ for
each positive integer n, e.g., $g(5) = 3^5 = 3 \cdot 3 \cdot 3 \cdot 3 \cdot 3$. The domain of definition
of g is the collection of positive integers. It is clear that g is not a linear function,
in the sense that there are no fixed constants a and b so that g(n) = an + b for all
1 The time it takes the object to complete a revolution around the sun.
2 The maximum distance of the object from the sun. In some school mathematics and physics
textbooks, this law is stated using “mean distance” in place of “major axis”, and that is an error.
192 9. EXPONENTS
3 The real reason for the awkwardness comes from advanced mathematics: such functions cannot
What TSM does is to introduce the idea that we can extend the meaning of the
notation $\beta^n$ so that n is allowed to be a rational number rather than just a
positive integer, and, moreover, when so extended, (E1)–(E3) continue to hold for
rational numbers m and n. To this end, the following definitions are given: let α
and β be positive from now on and let n and m be positive integers as before.
Then, by definition,
$$\beta^0 = 1 \quad\text{and}\quad \beta^{m/n} = \left(\sqrt[n]{\beta}\right)^m,$$
$$\beta^{-m/n} = \frac{1}{\beta^{m/n}}.$$
It is here that the first sign of trouble appears because, in trying to motivate these
definitions, heuristic arguments are given to the effect that, if (E1)–(E3) are already
known to be true for all rational numbers m and n, then the preceding definitions
would be the inevitable outcome. There is nothing wrong with this approach if
precision and reasoning are the rule of the day and great care is taken to ensure
that such speculative reasoning is clearly understood to be speculative and not
part of regular reasoning. (See Section 10 of [Wu2010b] for such an exposition.)
In TSM, however, reasoning is largely absent, precision is a rarity, and there seems
to be little difference between reasoning and speculative reasoning. Consequently,
heuristic arguments are commonly misunderstood to be valid proofs, and the defi-
nitions of 0 exponents, fractional exponents, and negative rational exponents have
been misconstrued by many to be theorems.
Recall that the purpose of these definitions is to extend (E1)–(E3) to rational
values of m and n. Under normal circumstances, the definitions would be put in
the service of reasoning, and seeing these definitions in action naturally leads to
an increased understanding. In typical TSM fashion, however, these definitions
are not put to use for reasoning except as tools for drills, and no proofs of the
generalized laws of exponents are offered.5 Not even for important special cases.
For example, the ubiquitous identity (which is a special case of the generalized
version of (E3) for rational numbers m and n)
$$\sqrt{\alpha\beta} = \sqrt{\alpha}\,\sqrt{\beta} \quad\text{for all positive } \alpha \text{ and } \beta$$
5 It must be admitted that a direct proof of any of the generalized versions of (E1)–(E3) would be
extremely tedious.
6 We have made a special effort to give a self-contained proof of this identity on pages 220 ff.
(9.1) $n \times k \overset{\text{def}}{=} \underbrace{k + k + \cdots + k}_{n}.$
Later on, we will also encounter the phenomenon of having to multiply the same
whole number k by itself many times. For that, we introduce the concept of
exponentiation: 5 × 5 × 5 is denoted by $5^3$. In general, if k and n are whole
numbers and n > 0, then
(9.2) $k^n \overset{\text{def}}{=} \underbrace{k \times k \times \cdots \times k}_{n}.$
(E1) $\beta^m \cdot \beta^n = \beta^{m+n}$.
(E2) $(\beta^m)^n = \beta^{mn}$.
(E3) $(\alpha \cdot \beta)^n = \alpha^n \cdot \beta^n$.
These three facts are, simultaneously, trivial to prove and “fun” to use due to their
simplicity. For example, (E1) says that, in an intuitive sense, exponents are additive
under multiplication. As to the triviality of their proofs, there is no doubt of that.
For example, here is the proof of a special case of (E2) when m = 3, n = 5:
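$$\begin{aligned} (\beta^3)^5 &= (\beta\beta\beta)^5 \\ &= (\beta\beta\beta)(\beta\beta\beta)(\beta\beta\beta)(\beta\beta\beta)(\beta\beta\beta) \\ &= \underbrace{\beta\beta\cdots\beta}_{15} = \beta^{3 \times 5}. \end{aligned}$$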
The general proof of (E2) is almost identical, and the proofs of the other two
identities are equally straightforward and will be left as exercises (see Exercise 1
on page 200).
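For whole-number bases, the three laws can also be spot-checked by computing powers exactly as in definition (9.2), that is, by repeated multiplication. The sketch below is a numerical check on a small range of values, not a proof, and the function name `power` is our own.

```python
def power(base, n):
    """base^n for a positive integer n, as an n-fold product (definition (9.2))."""
    result = 1
    for _ in range(n):
        result *= base
    return result


bases = range(1, 6)
exponents = range(1, 6)

# (E1): exponents add under multiplication.
e1 = all(power(b, m) * power(b, n) == power(b, m + n)
         for b in bases for m in exponents for n in exponents)
# (E2): a power of a power multiplies the exponents.
e2 = all(power(power(b, m), n) == power(b, m * n)
         for b in bases for m in exponents for n in exponents)
# (E3): a power of a product is the product of the powers.
e3 = all(power(a * b, n) == power(a, n) * power(b, n)
         for a in bases for b in bases for n in exponents)

print(power(5, 3), e1, e2, e3)
```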
Let us go back to (9.1) for a moment: The k and n in the definition of multipli-
cation in (9.1) must be extended from whole numbers to fractions (Chapter 1 in
[Wu-PreAlg]), to rational numbers (Chapter 2 in [Wu-PreAlg]), and—by virtue of
FASM—to all real numbers. Problems in nature and everyday life have exposed
us to the need for such extensions. We recall briefly that the extensions are not
done randomly but are always made to satisfy some basic requirements:
E0 : {positive integers} → R
defined by
$3^n$ to each positive integer n (with the y-axis compressed) looks like this (observe
that we are using a scaled coordinate system in the sense of page 134):
Of course when we see dots, we try to connect them! (That is the basic impulse
anyway.) Formally, we would like to get a reasonable looking curve that is the
graph of a function to interpolate these dots, in the sense of page 128. Thus, let
[1, ∞) stand for the right-pointing ray on the number line with vertex at 1. Then
what we want to do is define a function
E : [1, ∞) → R
so that
(9.4) $E(n) = E_0(n) = \beta^n$ for all positive integers n.
Such an E satisfying (9.4) is said to interpolate E0 on [1, ∞). Clearly, the graph of
E is a curve that “connects all the dots” of the graph of E0 .
There are many ways to achieve an interpolation of E0 , the most primitive one
being to connect any two adjacent dots of the graph of E0 by a line segment and
then define the interpolating function to be the one whose graph is the sequence
of connected segments so obtained. This way of interpolating a function has in
fact been implicitly described in connection with the book-cost function h(n) and
its corresponding linear function of one variable H ( x ) on page 128.7 However,
there is usually a “natural” interpolation of a given function which “respects” its
characteristic properties. For the case at hand, it will be seen that property (E1) is
the defining characteristic of E0 . Now when (E1) is stated in terms of the function
E0 (see (9.4)), it becomes:
E0 (m) E0 (n) = E0 (m + n) for all positive integers m and n.
Therefore we expect to have an interpolation E : [1, ∞) → R of E0 with a similar
property, namely,
E(s) E(t) = E(s + t) for s, t ≥ 1.
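To see why the primitive "connect the dots" interpolation is not the natural one, the following Python sketch builds the piecewise-linear interpolation of $3^n$ and tests the multiplicative property at s = t = 1.5. The function name `piecewise_linear` is ours, and the comparison with Python's built-in power `3 ** x` is only a numerical illustration of the kind of function we are looking for, not a construction of it.

```python
import math


def piecewise_linear(x, base=3):
    """Join the dots (n, base^n) by line segments and evaluate at x >= 1."""
    n = math.floor(x)
    fraction = x - n
    left, right = base ** n, base ** (n + 1)
    return left + fraction * (right - left)


s = t = 1.5

# The segment-by-segment interpolation does agree with 3^n at the integers ...
print(piecewise_linear(2), piecewise_linear(3))
# ... but it does not satisfy E(s)E(t) = E(s + t):
print(piecewise_linear(s) * piecewise_linear(t), piecewise_linear(s + t))
# The built-in power function does, up to rounding:
print(3 ** s * 3 ** t, 3 ** (s + t))
```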
7 With a little reflection, one would realize that the extension of the profit function in the man-
ufacturing problem on page 159 is the precise 2-dimensional analog of the extension from h to H in
1-dimension.
As often happens, nature surprises us by offering more than we ask for: by the
use of more advanced mathematics, one in fact proves that there is a function E
defined not just on [1, ∞) but on all of R, so that
E(s) E(t) = E(s + t) for all numbers s and t.
See Chapter 21 in Volume III of [Wu-HighSchool]. Because it is this interpolating
function that brings the correct perspective to the definitions of rational exponents
and to the laws of exponents, we will make this function the cornerstone of our
discussion. From now on, we will assume the validity of this theorem without
proof.8 Here is the theorem in question (in part (A) the term “continuous” is
used, but we will not even define what it means because it is a technical condition
that ensures the validity of the theorem but is otherwise never mentioned again
in this volume).
Theorem 9.1. For a given positive constant β, there exists a unique function E :
R → R, defined on the whole number line R, so that E satisfies:
(A) $E$ is continuous, and for all positive integers $n$, $E(n) = \beta^n$.
(B) For all (real) numbers s and t, E(s) E(t) = E(s + t).
The function E in the theorem is known as an exponential function with base
β. Here is the graph of E for β = 3; observe that this graph connects the dots of
the graph on page 197.
We accept Theorem 9.1 without proof partly because its proof is not appropri-
ate for school mathematics, but partly also because, at this point, the proof is not
as important as what this theorem has to say about our general understanding of
the rational exponents of a given number.
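As a rough numerical illustration of what Theorem 9.1 asserts, the sketch below assumes that Python's floating-point power `beta ** x` may stand in for the function $E$ with base $\beta = 3$ (an assumption made only for this illustration; the name `E` in the code is just a label). It spot-checks Condition (A) at a few positive integers and Condition (B) at a few sample points. A finite check of this kind is evidence, not a proof.

```python
import math

def E(x, beta=3.0):
    """Stand-in for the exponential function with base beta (assumed: beta ** x)."""
    return beta ** x

# Condition (A) at positive integers: E(n) equals beta multiplied by itself n times.
for n in range(1, 6):
    product = 1.0
    for _ in range(n):
        product *= 3.0
    assert math.isclose(E(n), product)

# Condition (B) at a few non-integer sample points: E(s) E(t) = E(s + t).
samples = [(0.5, 1.25), (-2.0, 3.7), (1.0 / 3.0, 2.0 / 3.0)]
for s, t in samples:
    assert math.isclose(E(s) * E(t), E(s + t))
```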
Let us rewrite Condition (B) of Theorem 9.1 in a more familiar form. Because
of Condition (A), it is tempting to rewrite $E(x)$ as $\beta^x$ for every number $x$;
indeed, if $x$ is a positive integer $n$, then (A) says $E(n)$ is $\beta^n$ after all. If we agree to
8 This is not unlike the discussion of polynomials of one variable: without assuming the Funda-
mental Theorem of Algebra, such a discussion would be superficial and essentially pointless.
9.1. POSITIVE-INTEGER EXPONENTS 199
way we try not to call your attention to the fact that the writing of these volumes
is in English—because this is what mathematics is.
Let us bring closure to the discussion of interpolations of functions. What
we have just witnessed in (9.6) and (9.8) is that, once we know (9.5) is valid, it be-
comes very simple to understand why the zeroth power and the negative powers
of β must be what they are. In the next section, we will see that (9.5) brings the
same clarity to the fractional powers. But (9.5) would not come to the forefront
of these considerations without a clear understanding that the desired interpolation
of the function $E_0(n) = \beta^n$, given in Theorem 9.1, exists. So the question
becomes how one could dream up such interpolations, i.e., the exponential functions.
The simple answer is that these exponential functions do not need to be dreamt
up because they have been there all along. Indeed, many natural processes of growth
and decay (growth of a bacterial population, decay of plutonium atoms, etc.)
have to be modeled by the use of exponential functions. In addition, exponential
functions appear naturally in mathematics: solutions of many basic (differential)
equations involve exponential functions. For these reasons, exponential functions,
i.e., the interpolations of $\beta^n$, provide natural guideposts for our understanding
of rational exponents.
Exercises 9.1
(1) (i) Write out a proof of (E1) and (E3) on page 195 for the special case of
m = 3 and n = 5. (ii) Write out a general proof of (E1) and (E3).
(2) Here is a common explanation of why $5^0$ must be equal to 1:
Consider the sequence of powers of 5:
$5^4 = 625$, $5^3 = 125$, $5^2 = 25$, $5^1 = 5$, and $5^0 = {?}$
Notice that you obtain $5^3$ from $5^4$ by dividing by 5, obtain
$5^2$ from $5^3$ by dividing by 5, and obtain $5^1$ from $5^2$ by
dividing by 5. Therefore, we should also obtain $5^0$ from
$5^1$ by dividing by 5. So $5^0 = \frac{5}{5} = 1$.
Do you think this is a valid proof? Why or why not?
(3) Let $H$ : {whole numbers} → {all real numbers ≥ 0} be the function
$H(x) = 2^x$. Plot $(x, 2^x)$ for $x = 0, 1, \ldots, 12$ to get an idea of the general
shape of the graph.
(4) Mental math ($x$ and $y$ are real numbers):
(i) Simplify $(3x - y)^7 (3x + y)^7$.
(ii) $42^4 \cdot 14^{-4} = {?}$
(iii) If it is known that $17^3 = 4913$, what is $34^3$ when it is
rounded to the nearest $10^4$?
(iv) Simplify $(x^4 - y^4)^{-5} (x^3 + x^2 y + x y^2 + y^3)^5$.
If we write $\gamma$ for $\beta^{1/2}$, this says $\gamma$ is a positive number so that $\gamma^2 = \beta$, i.e., $\gamma$ is the
square root of β. This prompts us to take a more serious look at the concept of a
square root.
A good mathematics education sometimes has the beneficial effect of making
you stop and think about things you may have taken for granted all along and, in
the process, gain new understanding of these things. A case in point
for most of us is our experience with the number π. We may have learned in some
vague sense that π is the ratio of circumference over diameter, but we may not
stop to think what “circumference” means until we encounter a formal definition
of length of a curve such as the one offered in Section 5.2 of [Wu-PreAlg]. Moreover,
we may not have thought about how to get a reasonably accurate estimate of π
until we realize that π is also the area of the unit disk and, therefore, with a precise
concept of area available (such as the one offered in Section 5.3 of [Wu-PreAlg]),
we can actually achieve such an estimate by hand. The situation with the “square
root of a number” is similar. It is a concept that is all too familiar, but how do
we know that there is a number whose square is exactly the given number ? Take the
square root of 2, for instance. You can perhaps rattle off 1.4142135 . . . as that
number, but you may also be aware that the decimal expansion of the square root
of 2 is nonrepeating, so that no matter how many decimal digits you write down,
it will just be an approximation. For example, $1.4142135^2 = 1.99999982358225$. So
what gives you the confidence that there is a number which is the “square root of 2”?
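The point about decimal approximations can be checked with exact arithmetic. The following is a minimal sketch (the helper name `truncation` is introduced here for illustration only): it squares the successive truncations 1.4, 1.41, ..., 1.4142135 as exact fractions and confirms that each square falls strictly short of 2, so no finite list of these digits gives a number whose square is exactly 2.

```python
from fractions import Fraction

digits = "14142135"  # leading digits of the decimal 1.4142135...

def truncation(k):
    """Return 1.4142135... truncated to k decimal places, as an exact fraction."""
    return Fraction(int(digits[: k + 1]), 10 ** k)

for k in range(1, 8):
    x = truncation(k)
    gap = 2 - x * x  # computed exactly, so rounding cannot hide a difference
    print(k, float(x), float(gap))
    assert gap > 0   # every truncation squares to something strictly below 2
```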
This is where mathematical knowledge can help by providing us with the answers we
need. There is a theorem, proved in advanced courses, that not only square roots, but any
so-called $n$-th roots, exist and are unique. Precisely, let $n$ be a positive integer. Then given a
positive number $\beta$, a positive number $\gamma$ is said
to be a positive $n$-th root of $\beta$ if $\gamma^n = \beta$. (Recall that $\gamma^n = \gamma\gamma \cdots \gamma$ ($n$ times), by
definition.)
The fact that there is always a positive square root $\sqrt{x}$ of a positive number $x$ is highly nontrivial.
Note the emphasis throughout on the positivity of β and γ.
This is because if β = −2, then there is no number on the num-
ber line whose square is a negative number (see Exercise 2 on
page 204). Moreover, in case β > 0, e.g., β = 4, there will be at
least two numbers whose square is 4, namely, 2 and −2. This
is why we have to specify the positivity of γ in the preceding
paragraph.
Then the theorem that resolves all doubts in this context is the following (it is
proved in Section 16.5 of Volume III of [Wu-HighSchool]; also in §18 of [Ross]).
Theorem 9.2. Given a positive number β and a positive integer n, there is one and
only one positive number γ so that γn = β.
To paraphrase: every positive number has a unique positive n-th root (n is a
positive integer). The uniqueness part of the theorem, which says that there is at
most one such γ, is actually not difficult to prove, but we will postpone this proof
to Corollary 2 on page 209 so as not to interrupt our discussion. Henceforth, we
shall refer to the $\gamma$ in the theorem as the positive $n$-th root of $\beta$ and, if there is no
fear of confusion, more simply as the $n$-th root of $\beta$. The standard notation for the
positive $n$-th root of a positive number $\alpha$ is $\sqrt[n]{\alpha}$. Note that the case $n = 2$ is distinguished, and the
notation for the positive square root is $\sqrt{\alpha}$ rather than the more elaborate $\sqrt[2]{\alpha}$.
Please remember that $\sqrt[n]{\alpha}$ is always positive, by convention. Therefore $\sqrt{4} = 2$,
and never $-2$. In any case, the main point of Theorem 9.2 is that there is such a
thing as the positive square root of 2.
The third root of α is traditionally called its cube root.
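Theorem 9.2 guarantees that the positive $n$-th root exists and is unique; it does not say how to compute it. One way to approximate it, sketched below under the assumption that ordinary floating-point arithmetic is accurate enough for the purpose, is bisection: since $x^n$ increases with $x$ for positive $x$, one can repeatedly halve an interval known to contain the root. The function name `positive_nth_root` is introduced here for illustration and is not from the text.

```python
def positive_nth_root(beta, n, steps=100):
    """Approximate the positive number g with g**n == beta by bisection.

    Relies on the fact that x -> x**n is increasing for x > 0.
    """
    if beta <= 0 or n < 1:
        raise ValueError("beta must be positive and n a positive integer")
    low, high = 0.0, max(1.0, beta)  # low**n <= beta <= high**n
    for _ in range(steps):
        mid = (low + high) / 2
        if mid ** n < beta:
            low = mid    # the root lies in the upper half
        else:
            high = mid   # the root lies in the lower half
    return (low + high) / 2

print(positive_nth_root(2, 2))    # an approximation to the square root of 2
print(positive_nth_root(125, 3))  # an approximation to the cube root of 125
```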
Example. Graph the function given by $r(x) = \sqrt{x}$.
We note that this is not a function from all numbers to all numbers, but
r : {all real numbers ≥ 0} → {all real numbers}.
The following sequence of points on the graph of r ( x ) is self-explanatory:
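[Figure: points on the graph of $r(x) = \sqrt{x}$ for $0 \le x \le 16$; the horizontal axis is marked at 1, 4, 9, 16 and the vertical axis at 1, 2, 3, 4.]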
This sequence of points exhibits two different patterns: when $0 < x < 1$, $x < \sqrt{x} = r(x)$,
but when $1 < x$, $x > \sqrt{x} = r(x)$ (see Exercise 4 on page 205). Note
also that if $0 < a < b$, then $r(a) < r(b)$, i.e., $\sqrt{a} < \sqrt{b}$ (see Exercise 5 on page
205). This fact then tells us that there is no need to see more points on the graph
of r ( x ) because as we go towards the right on the positive x-axis, the graph will
simply rise slowly in the same way as it does here for the values of 1 ≤ x ≤ 16.
Activity
we have, by (9.10),
$\underbrace{\beta^{1/n} \cdot \beta^{1/n} \cdots \beta^{1/n}}_{n} = \beta^1 = \beta.$
Thus $(\beta^{1/n})^n = \beta$. Since $\beta^{1/n} > 0$ by (9.7), this says $\beta^{1/n}$ is $\sqrt[n]{\beta}$, by Theorem 9.2.
This proves (9.9).
The equality (9.9) explains why we had to assume $\beta > 0$. If $\beta$ were a negative
number, then the value of $\beta^{1/2}$, or for that matter the value of $\beta^{1/n}$ for any even
positive integer $n$, would not be a real number (see Exercise 2 on page 204).
Finally, we can determine the value of $\beta^x$ when $x$ is a fraction:
(9.11) $\beta^{m/n} = \left(\sqrt[n]{\beta}\right)^m$ for all positive integers $m$ and $n$.
One can see the reason behind (9.11) in a special case: using (9.10) and the fact
that $\frac14 + \frac14 + \frac14 = \frac34$, we have
$5^{3/4} = 5^{\frac14 + \frac14 + \frac14} = 5^{1/4} \cdot 5^{1/4} \cdot 5^{1/4} = (5^{1/4})^3 = (\sqrt[4]{5})^3.$
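In general, because
$\frac{m}{n} = \underbrace{\frac{1}{n} + \frac{1}{n} + \cdots + \frac{1}{n}}_{m},$
(9.10) implies that
$\beta^{m/n} = \underbrace{\beta^{1/n} \cdot \beta^{1/n} \cdots \beta^{1/n}}_{m}.$
Therefore, using (9.9), the right side is equal to
$\underbrace{\sqrt[n]{\beta} \cdot \sqrt[n]{\beta} \cdots \sqrt[n]{\beta}}_{m} = (\sqrt[n]{\beta})^m;$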
this then proves (9.11).
It follows from (9.8) and (9.11) that
$\beta^{-m/n} = \frac{1}{(\sqrt[n]{\beta})^m}$ for all positive integers $m$ and $n$.
Together with (9.6) and (9.11), we have the following complete determination of
the values of the function $\beta^x$ when $x$ is a rational number: For all positive integers
$m$ and $n$,
(9.12) $\begin{cases} \beta^{m/n} = (\sqrt[n]{\beta})^m, \\ \beta^{0} = 1, \\ \beta^{-m/n} = \dfrac{1}{(\sqrt[n]{\beta})^m}. \end{cases}$
Our work is not yet done, however, because there are still tantalizing questions
we cannot answer at this point, e.g., is it true that $(\beta^m)^{1/n} = (\beta^{1/n})^m$? We
shall deal with this and other related questions in the next section.
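The three cases of (9.12) translate directly into a small routine. The sketch below is only an illustration: the function name `rational_power` is hypothetical, and the floating-point expression `beta ** (1.0 / n)` is assumed to be an adequate stand-in for the positive $n$-th root $\sqrt[n]{\beta}$.

```python
import math

def rational_power(beta, m, n):
    """Evaluate beta**(m/n) following (9.12), for beta > 0, integer m, positive integer n."""
    if beta <= 0 or n < 1:
        raise ValueError("beta must be positive and n a positive integer")
    root = beta ** (1.0 / n)       # assumed stand-in for the positive n-th root of beta
    if m > 0:
        return root ** m           # first case of (9.12)
    if m == 0:
        return 1.0                 # second case of (9.12)
    return 1.0 / root ** (-m)      # third case of (9.12)

print(rational_power(8, 2, 3))     # close to 4, since the cube root of 8 is 2
print(rational_power(16, -3, 4))   # close to 1/8, since the 4th root of 16 is 2
```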
Exercises 9.2
(1) Do not use a calculator for the following. (a) $125^{-2/3} - 32^{-2/5} = {?}$
(b) $512^{4/3} = {?}$ (c) $3125^{2/5} \cdot 256^{1/4} = {?}$
(2) (a) Explain why there is no number on the number line whose square
is a negative number. (b) If n is an even positive integer, explain why
there is no number on the number line whose n-th power is a negative
number.
9.3. LAWS OF EXPONENTS 205
In advanced courses, both the definition of $\beta^x$ for any number $x$ and the
proofs of these laws—again for all s and t—are done in one fell swoop (see Chapter
21 in Volume III of [Wu-HighSchool]). Our more modest goal in this section is to
at least prove (E5) and (E6) for rational numbers s and t on the basis of (E4), i.e., on
the basis of identity (9.5). ((E5) is the statement of Theorem 9.3 on page 206, and
(E6) is the statement of Theorem 9.7 on page 212.) Let it be stated at the outset
that, even with this limited goal, the proofs in the next ten pages are not entirely
straightforward, and most of them will not likely see the light of day in a middle
school classroom. But we provide these proofs nonetheless because the reasoning
is extremely instructive; if you read them through carefully—and even if you do
not follow everything—you are bound to become more comfortable with rational
exponents. At the very least, study Corollary 2 on page 209 and the proof of Theorem
9.6 on page 211 carefully, because they will enable you to explain to your students
the fundamental fact that, for all positive numbers x and y,
$\sqrt[n]{xy} = \sqrt[n]{x} \cdot \sqrt[n]{y}.$
Activity
Prove each of the following in two different ways: first, appeal to (E4)–(E6),
and second, directly compute the values of both sides with a four-function
calculator. (i) $(27^{2/3})(125^{2/3}) = (27 \cdot 125)^{2/3}$. (ii) $343^{8/3} = (343^4)^{2/3}$.
The first goal of this section is the proof of the following special case of (E5).
Theorem 9.3. Let β > 0. Then
(9.13) $(\beta^s)^t = \beta^{st}$ for all rational numbers $s$ and $t$.
We would like to point out, first of all, that (9.13) does not come out of
nowhere, but is rather the comprehensive summary of various known special
cases. For example, we proved earlier that if m and n are positive integers, then
$\beta^{m/n} = (\sqrt[n]{\beta})^m$
(see (9.11) on page 204). But by (9.9), $\beta^{1/n} = \sqrt[n]{\beta}$. Therefore the preceding
equality can be written as
$\beta^{m/n} = (\beta^{1/n})^m.$
Or, equivalently,
(9.14) $(\beta^{1/n})^m = \beta^{m/n}.$
We see that this is a special case of (9.13) when $s = \frac{1}{n}$ and $t = m$.
In addition, we know from (9.12) on page 204 that
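$\beta^{-m/n} = \frac{1}{(\sqrt[n]{\beta})^m}.$
But the product formula for rational quotients (see page 270)⁹ implies that
$\frac{1}{(\sqrt[n]{\beta})^m} = \underbrace{\frac{1}{\sqrt[n]{\beta}} \times \cdots \times \frac{1}{\sqrt[n]{\beta}}}_{m}.$
By combining the two and using $\beta^{1/n} = \sqrt[n]{\beta}$, we get:
$\beta^{-m/n} = \underbrace{\frac{1}{\beta^{1/n}} \times \cdots \times \frac{1}{\beta^{1/n}}}_{m}.$
Now using the fact that $\beta^{-s} = 1/\beta^s$ for any number $s$ (see (9.8) on page 199), we
have
$\beta^{-m/n} = \underbrace{\beta^{-1/n} \cdots \beta^{-1/n}}_{m} = (\beta^{-1/n})^m.$
Written another way, this says:
(9.15) $(\beta^{-1/n})^m = \beta^{-m/n}.$
This is then a special case of (9.13) when $s = -\frac{1}{n}$ and $t = m$.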
As another variation on this theme, again we start with (9.12) on page 204, to
the effect that
$\beta^{-m/n} = \frac{1}{(\sqrt[n]{\beta})^m}.$
Using $\beta^{-s} = 1/\beta^s$ for any number $s$, we rewrite the right side as
$\frac{1}{(\sqrt[n]{\beta})^m} = (\sqrt[n]{\beta})^{-m} = (\beta^{1/n})^{-m}.$
Combining the two, we have
(9.16) $(\beta^{1/n})^{-m} = \beta^{-m/n}.$
We see that this is a special case of (9.13) when $s = \frac{1}{n}$ and $t = -m$. We will build
on (9.14)–(9.16) to prove (9.13).
A second comment about the proof of (9.13) is that it can be done by a brute-
force “grinding out” process, but with a little finesse the proof can be greatly
simplified and made conceptually more transparent. Anticipating Corollary 2 on
page 209 below (which is logically independent of Theorem 9.3), we will illustrate
with the proof of a very simple special case of (9.13):
(9.17) $(2^{1/3})^{1/2} = 2^{(1/3)\cdot(1/2)}.$
We first give the brute-force proof. The key idea is that we don’t have to prove
directly that both sides of (9.17) are equal because, in view of Corollary 2, all we
have to do is to make sure that both sides are positive numbers (they are) and
that when we raise both sides to a large positive integer power, they are equal. Of
course the whole point of raising both sides to a large positive integer power is to
bypass the unpleasant fractional exponents. In this case, for example, if we raise
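the left side of (9.17) to the 6th power (6 is the product of the denominators of $\frac13$
and $\frac12$ in the exponents), then
$\left((2^{1/3})^{1/2}\right)^6 = \underbrace{(2^{1/3})^{1/2} \cdot (2^{1/3})^{1/2} \cdots (2^{1/3})^{1/2}}_{6}.$
If we let $\beta = 2^{1/3}$, then $\sqrt{\beta} = (2^{1/3})^{1/2}$, by (9.9) on page 202. Thus we get:
$\left((2^{1/3})^{1/2}\right)^6 = \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta} \cdot \sqrt{\beta}.$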
Since
$\sqrt{\beta} \cdot \sqrt{\beta} = \beta = 2^{1/3} = \sqrt[3]{2},$
where the last equality is by (9.11) on page 204, we have
$\left((2^{1/3})^{1/2}\right)^6 = (\sqrt[3]{2})(\sqrt[3]{2})(\sqrt[3]{2}) = 2,$
where the last equality is because of the definition of $\sqrt[3]{2}$. By Corollary 2, (9.17)
will be proved as soon as we can show that the right side of (9.17), when raised
to the 6th power, is also equal to 2. Now the right side of (9.17), when raised to
the 6th power, is equal to
$\left(2^{(1/3)\cdot(1/2)}\right)^6 = \left(2^{1/6}\right)^6.$
By (9.9), $2^{1/6} = \sqrt[6]{2}$ so that by the definition of the 6th root of 2, we have
$\left(2^{1/6}\right)^6 = 2.$
Therefore the right side of (9.17), when raised to the 6th power, is indeed equal to
2. This then proves (9.17).
Let us now rephrase the same proof using (9.14) on page 206. If we raise the
left side of (9.17) to the 6th power, we get, by applying (9.14) twice in succession:
$\left((2^{1/3})^{1/2}\right)^6 = (2^{1/3})^{(1/2)\cdot 6} = (2^{1/3})^3 = 2^{(1/3)\cdot 3} = 2.$
We can also simplify the right side of (9.17) when raised to the 6th power by
applying (9.14):
$\left(2^{(1/3)\cdot(1/2)}\right)^6 = \left(2^{1/6}\right)^6 = 2^{(1/6)\cdot 6} = 2.$
The two sides of (9.17) being equal when raised to the 6th power, Corollary 2
again concludes the proof of (9.17).
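The strategy of this argument, raising both sides to the 6th power to clear the fractional exponents, can be mirrored numerically. The sketch below is a sanity check rather than a proof: the exponent bookkeeping is done exactly with fractions, while the comparison of the two sides of (9.17) uses floating-point powers, which are assumed here to approximate the true values closely. The variable names are illustrative only.

```python
from fractions import Fraction
import math

inner, outer, power = Fraction(1, 3), Fraction(1, 2), 6

# Exponent bookkeeping, done exactly: both sides reduce to the exponent 1.
left_exponent = inner * (outer * power)     # (1/3) * ((1/2) * 6)
right_exponent = (inner * outer) * power    # ((1/3) * (1/2)) * 6
assert left_exponent == right_exponent == 1

# Floating-point spot check of (9.17) and of the 6th powers of its two sides.
left = (2 ** float(inner)) ** float(outer)
right = 2 ** float(inner * outer)
assert math.isclose(left, right)
assert math.isclose(left ** 6, 2.0)
assert math.isclose(right ** 6, 2.0)
```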
Activity
Prove the following special case of (9.13) on page 206: $(5^{1/2})^{4/3} = 5^{2/3}$.
We will give a general proof of (9.13) by pushing harder along the line of
reasoning of (9.14)–(9.16). We need some preparation. Let us note that, for our
immediate needs, Corollary 2 on page 209 is the most important, and Corollary
2 can be proved directly (see Exercise 11 on page 213). However, Lemma 9.4
and Corollaries 1 and 2 are extremely useful facts (see Sections 10.2 and 10.4, for
example), so all three deserve to be learned.
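Lemma 9.4. If two positive numbers $\alpha$ and $\beta$ satisfy $\alpha < \beta$, then for any positive
integer $n$, $\alpha^n < \beta^n$.
Proof. Suppose $\alpha < \beta$. We will prove $\alpha^n < \beta^n$ for $n = 2, 3, 4, \ldots$ in succession.
First, we prove $\alpha^2 < \beta^2$. Because $\alpha > 0$, we may multiply both sides of $\alpha < \beta$ by
$\alpha$ to get
(9.18) $\alpha^2 < \alpha\beta.$
Similarly, because $\beta > 0$, we may multiply both sides of $\alpha < \beta$ by $\beta$ to get
(9.19) $\alpha\beta < \beta^2.$
Combining (9.18) and (9.19), we get $\alpha^2 < \beta^2$.
Next we prove $\alpha^3 < \beta^3$. Multiplying both sides of $\alpha^2 < \beta^2$ by $\alpha$, we get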
We use the trichotomy law again to eliminate the possibility of either $\alpha < \beta$
or $\alpha > \beta$. Lemma 9.4 says that if either is true, then $\alpha^n \neq \beta^n$, which contradicts
the hypothesis. Thus $\alpha = \beta$, and Corollary 2 is proved.
We can now prove the following lemma which is already halfway towards
Theorem 9.3 (if the positive integer k in the lemma were a rational number, then
the lemma would be exactly Theorem 9.3).
Lemma 9.5. For any positive number $\beta$ and for any rational number $s$ and any
positive integer $k$,
$(\beta^s)^k = \beta^{sk}.$
Proof. If $s = 0$, the lemma is trivial (see (9.6) on page 199). We may therefore
assume $s \neq 0$. First we assume $s > 0$. Then $s = m/n$ for some positive integers $m$
and $n$ and we have to prove:
(9.24) $(\beta^{m/n})^k = \beta^{mk/n}.$
If we look at (9.24) “the right way”, then it is not complicated at all: the right way
is to think of both sides of (9.24) as statements about $\beta^{1/n}$. Then, writing $\alpha$ for
$\beta^{1/n}$, and using $\beta^{m/n} = (\beta^{1/n})^m$ (this is (9.14) on page 206), we see that
$\beta^{m/n} = \alpha^m.$
Therefore, the left side of (9.24) is
(9.25) $(\beta^{m/n})^k = (\alpha^m)^k = \alpha^{mk},$
where we make use of (E2) on page 195 in the second equality. Now apply (9.14)
to the right side of (9.24); we get
$\beta^{mk/n} = (\beta^{1/n})^{mk} = \alpha^{mk}.$
Comparing this with (9.25), we see that we have proved (9.24).
Now suppose $s < 0$. Then $s = -m/n$ for some positive integers $m$ and $n$,
and we have to prove:
(9.26) $(\beta^{-m/n})^k = \beta^{-mk/n}.$
We now think of both sides of (9.26) as statements, not about $\beta^{1/n}$, but about
$\beta^{-1/n}$. Then the preceding argument can be followed verbatim except that (9.14)
has to be replaced everywhere by (9.15). We will leave the details as an exercise
(Exercise 4 on page 213). The proof of the lemma is complete.
Proof of Theorem 9.3. We have to prove that
$(\beta^s)^t = \beta^{st}$ for all rational numbers $s$ and $t$.
If $t = 0$, both sides are equal to 1 (see (9.6) on page 199). Henceforth, we may
assume $t \neq 0$. First suppose $t > 0$. Then $t = m/n$ for some positive integers $m$
and $n$. Then we have to prove:
(9.27) $(\beta^s)^{m/n} = \beta^{s\cdot(m/n)}$ for all rational $s$.
By Corollary 2, it suffices to prove that, when raised to the $n$-th power, both sides
of (9.27) are equal. The left side of (9.27) when raised to the $n$-th power is equal
to, by Lemma 9.5,
$\left((\beta^s)^{m/n}\right)^n = (\beta^s)^{(m/n)\cdot n} = (\beta^s)^m.$
Exercises 9.3
(1) In each of the following, find the number $s$ that makes the equality valid,
and then verify the equality directly by computing the value of each side
with a four-function calculator. (i) If $729^{5/6} \cdot 729^s = 729^{8/6}$, what is $s$?
(ii) If $(729^s)^{3/2} = 729^{5/6}$, what is $s$?
(iii) If $117649^{1/2} \cdot 117649^s = 117649^{2/3}$, what is $s$?
(2) Which of the following is bigger: $\sqrt[4]{\sqrt[3]{125}}$ or $\sqrt[12]{125}$?
(3) Prove that if $0 < a < b$, then $\sqrt[n]{a} < \sqrt[n]{b}$ for any positive integer $n$.
(4) Complete the proof of Lemma 9.5 on page 210 by proving (9.26).
(5) Prove equation (9.31) on page 211.
(6) Without making use of (E5) (= (9.13) on page 206), give a direct proof
that $(\beta^4)^{1/3} = \beta^{4/3}$ for all positive numbers $\beta$.
(7) Give a direct proof of the following special case of Lemma 9.5 on page 210:
$(\beta^{2/3})^4 = \beta^{8/3}$. (The numbers are so small that the proof can be achieved
with a minimal appeal to symbolic notation.)
(8) Prove that for all rational numbers $r$, $s$, and $t$, and for all $\alpha, \beta > 0$,
$(\alpha^r \beta^s)^t = \alpha^{rt} \beta^{st}.$
(9) For nonnegative numbers $\alpha$ and $\beta$, prove that $\sqrt{\alpha + \beta} \le \sqrt{\alpha} + \sqrt{\beta}$ and that
equality holds if and only if $\alpha = 0$ or $\beta = 0$.
(10) Prove that for any number $x$, $|x| = \sqrt{x^2}$.
(11) Here is the outline of a second proof that the positive $n$-th root of a
positive number $\beta$ is unique: Let $a$ and $b$ be two such positive $n$-th roots;
then $a^n - b^n = 0$. By identity (1.7) on page 12, we conclude that $a = b$.
Write out this proof in detail, and make sure the proof requires the
positivity of both $a$ and $b$.
(12) Prove that if $a > 1$, then for any rational numbers $r$ and $s$ so that $r < s$,
$a^r < a^s$, while if $0 < b < 1$, $b^r > b^s$.
(13) Given two similar triangles with the lengths of a pair of corresponding
sides as shown:
[Figure: two similar triangles, with the length of one pair of corresponding sides marked on each.]
If the ratio of the area of the bigger triangle to the area of the smaller
triangle is s, what is the ratio aa in terms of s ?
(14) Recall that an annual interest rate of x percent means that an account of
P dollars earns at the end of one full year an amount of
$\frac{x}{100}\, P$
some integer N so that N < M < N + 1. (For example, the number 45572.384
lies between 45572 and 45573.)
[Figure: a number line marking $N$, $M$, $N + 1$, and $10^n$.]
Let $M = \frac{1}{S} = \frac{45678}{73}$. Then also $S = \frac{1}{M}$, so that finding a positive integer $n$ so
that $10^{-n} < S$ is equivalent to finding an integer $n$ so that
$\frac{1}{10^n} < \frac{1}{M}.$
By Inequality (B), this is equivalent to finding a positive integer $n$ so that $M < 10^n$.
Now Fact 1 already guarantees the latter, so we are done.
However, it is much more enlightening to get a specific n in this case. We
proceed as follows: we have
$M = \frac{45678}{73} < 45678 < 10^5.$
By Inequality (B), we get
$\frac{1}{10^5} < \frac{1}{M},$
which can be rewritten as
$10^{-5} < S.$
Therefore, n = 5 would work in this case.
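The search for a suitable $n$ can also be carried out mechanically. The sketch below assumes, as the discussion above indicates, that $S = 73/45678$, and it uses exact fractions so that the inequalities are checked without rounding. The helper name `smallest_power_of_ten_above` is introduced here only for illustration. It finds the least $n$ that works and confirms that the cruder choice $n = 5$ works as well.

```python
from fractions import Fraction

def smallest_power_of_ten_above(M):
    """Return the least positive integer n with M < 10**n, for a positive number M."""
    n = 1
    while 10 ** n <= M:
        n += 1
    return n

S = Fraction(73, 45678)
M = 1 / S                          # M = 45678/73, exactly
n = smallest_power_of_ten_above(M)
print(n)                           # prints 3
assert Fraction(1, 10 ** n) < S
assert Fraction(1, 10 ** 5) < S    # the choice n = 5 from the text also works
```

Any larger $n$ works too, which is why an easy bound such as $M < 45678 < 10^5$ is enough for the argument.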
Activity
Let $S = \frac{0.7}{456789}$. Find a specific positive integer $n$ so that $10^{-n} < S$.
Therefore finding a positive integer $n$ so that $10^{-n} < S$ is now seen to be equivalent
to finding a positive integer $n$ so that $M < 10^n$. But Fact 1 implies that there
is a positive integer $n$ so that $M < 10^n$, so Fact 2 is proved.
As mentioned above, the reason we want to achieve an intuitive understand-
ing of the integer powers of 10 (such as Fact 1 and Fact 2) is that they are important
for the next concept, scientific notation.
Consider the number that is the current estimate of the number of stars in the universe:
$6 \times 10^{22}$. Of course this is the 23-digit whole number with the leading digit
(i.e., the leftmost digit) 6 followed by 22 zeros, but when
it is written in the form $6 \times 10^{22}$, it is said to
be given in scientific notation. Formally, a positive, finite decimal¹¹ $s$ is said to be
written in scientific notation if it is expressed as a product $d \times 10^n$, where $d$ is a
finite decimal ≥ 1 and < 10 (i.e., $1 \le d < 10$), and $n$ is an integer. (In other words,
$d$ is a finite decimal with only a single nonzero digit to the left of the decimal
point.) The integer $n$ is called the order of magnitude of the decimal $d \times 10^n$.¹²
When a number $s$ is written in scientific notation, $s = d \times 10^n$, the exponent $n$ clearly displays the magnitude of $s$.
Take the finite decimal 234.567. It is clearly equal to every one of the following:
$2.34567 \times 10^2$, $0.234567 \times 10^3$, $23.4567 \times 10$,
$234.567 \times 10^0$, $234567 \times 10^{-3}$, $234567000 \times 10^{-6}$.
However, only the first is a representation of 234.567 in scientific notation.
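The passage from a finite decimal to its scientific notation can be sketched with Python's `decimal` module, which stores finite decimals exactly. In the sketch below, the function name `to_scientific` is hypothetical; `Decimal.adjusted()` returns the exponent of the leading digit and `Decimal.scaleb()` shifts the decimal point, so together they produce the pair $(d, n)$ with $1 \le d < 10$.

```python
from decimal import Decimal

def to_scientific(text):
    """Split a positive finite decimal, given as a string, into (d, n) with 1 <= d < 10."""
    s = Decimal(text)
    if not s.is_finite() or s <= 0:
        raise ValueError("expected a positive finite decimal")
    n = s.adjusted()     # exponent of the leading digit: the order of magnitude
    d = s.scaleb(-n)     # move the decimal point; the digits themselves are unchanged
    return d, n

d, n = to_scientific("234.567")
print(d, n)              # prints 2.34567 2
```

Passing the number as a string avoids the binary rounding that a float literal such as `234.567` would introduce.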
Activity
(in standard notation) is perhaps too obvious for discussion: in the standard form,
one can’t keep track of the number of zeros! The advantage goes deeper, however.
When faced with a very big number, one’s natural first question is: roughly how
big? Is it something like a few hundred billion (a number with 12 digits), or even
a few trillion (a number with 13 digits)? The exponent 22 in the scientific notation
$6 \times 10^{22}$ tells you immediately that it is a 23-digit number and therefore far bigger
than “a few trillion”.
We should elaborate on the last statement. Observe that the number $6234.5 \times 10^{22}$
does not have 23 digits but 26 digits because it is the number
62,345,000,000,000,000,000,000,000.
So the reason we are confident about $6 \times 10^{22}$ having only 23 digits is that 6 is
at least 1 and less than 10. Therefore by requiring the $d$ in $d \times 10^n$ to satisfy
$1 \le d < 10$, we are ensuring that the exponent $n$ will unfailingly convey the
intuitive sense of the “order of magnitude” of $d \times 10^n$.
Activity
All planets revolve around the sun in elliptical orbits. Uranus’s furthest distance
from the sun is approximately $3.004 \times 10^9$ km, and its closest distance
is approximately $2.749 \times 10^9$ km. What is the average of these two distances?
how many times a proton is heavier than an electron. Without using scientific
notation, we would have to compute the ratio r of
$\frac{0.000\,000\,000\,000\,000\,000\,000\,000\,001\,672\,622}{0.000\,000\,000\,000\,000\,000\,000\,000\,000\,000\,910\,938\,291}.$
This is not an inviting prospect. However, we can write this ratio as
$r = \frac{1.672\,622 \times 10^{-27}}{9.109\,382\,91 \times 10^{-31}}.$
Using the general cancellation law (see page 269), we can eliminate the negative
power of 10 in the numerator and the denominator in order to better see what
we are doing. Anticipating that $10^{-31} \times 10^{31} = 1$, we are led to multiply the
numerator and denominator of the (complex) fraction by $10^{31}$:
Exercises 9.4
the four arithmetic operations and the use of rational exponents. In this context,
an expression is also called an algebraic expression. Thus, the following is an
algebraic expression:
3/4 xy
x −3 + (yz)2 + 5 − ( )5 .
z
(C) The availability of integer exponents allows us to bring closure to the
discussion of decimals in Chapter 1 of [Wu-PreAlg] by introducing the concept
of scientific notation. Through the preceding discussion of scientific notation, we
see that the numbers that are important in the sciences are not fractions but finite
decimals (though it behooves us to remember that finite decimals are a special
class of fractions).
https://doi.org/10.1090//mbk/099/10
CHAPTER 10
Quadratic Functions
and Their Graphs
A quadratic polynomial function or, more simply, a quadratic function f is, by
definition, a function from R to R given by
$f(x) = ax^2 + bx + c$
for some constants a, b, and c. In the school curriculum, the topic of quadratic
equations (see (10.3) on page 225) precedes the introduction of quadratic functions,
and rightly so, because an equation is conceptually simpler than a function to
students of school age. This curricular decision, no matter how pedagogically
laudable, has led to an undesirable side effect (at least in TSM¹), namely, that
functions and equations have become two separate topics. This is but one example
of how TSM has wreaked havoc with the school curriculum, because the study of
functions should properly include the study of equations. Thus, if we define $x_0$ to
be a zero (or a root) of a function $f$ if $f(x_0) = 0$, then a very natural question in
the study of any function is to ask for the locations of all its zeros because these
zeros are usually part of the “signature” of the function. For example, if
$f(x) = 3x^2 - 7x + 2$ is given and we want to find all its zeros, then the problem, properly
phrased, is one of finding all the numbers $x_0$ so that $3x_0^2 - 7x_0 + 2 = 0$. If we
compare this problem with the definition of what it means to solve an equation
(see pages 28 ff.), we see that this is exactly the problem asking for the solution of
the quadratic equation $3x^2 - 7x + 2 = 0$.
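To make the link between zeros of a function and solutions of an equation concrete, the sketch below evaluates the function from this example at two candidate zeros. The candidates $\frac13$ and $2$ are not given in the text; they are an assumption introduced here, obtained from the factorization $3x^2 - 7x + 2 = (3x - 1)(x - 2)$, and the code merely verifies them with exact fractions.

```python
from fractions import Fraction

def f(x):
    """The quadratic function f(x) = 3x^2 - 7x + 2 from the example."""
    return 3 * x * x - 7 * x + 2

# Candidate zeros (an assumption to be checked, taken from the
# factorization 3x^2 - 7x + 2 = (3x - 1)(x - 2)).
candidates = [Fraction(1, 3), Fraction(2)]
for x0 in candidates:
    assert f(x0) == 0    # x0 is a zero of f, i.e., a solution of 3x^2 - 7x + 2 = 0

assert f(Fraction(1)) != 0   # by contrast, 1 is not a zero: f(1) = -2
```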
In order to emphasize the fact that solving equations is part of the study of func-
tions, the main body of this chapter (Sections 10.2–10.5)—devoted to the study of
quadratic functions—only mentions quadratic equations along the way. However,
out of respect for the school curriculum and teachers’ pedagogical needs, we be-
gin the chapter with a section on quadratic equations that gives the fundamental
idea of completing the square its due. We hope nevertheless that you will come
away with a real appreciation of the connection between equations and functions.
In particular, be sure to take note of the fact that the technique of completing
the square, far from being a one-time trick designed specifically for deriving the
quadratic formula, is the key that unlocks the secrets of quadratic functions in
general.
224 10. QUADRATIC FUNCTIONS AND THEIR GRAPHS
You should savor your time spent on studying quadratic functions, because
the algebra of quadratic functions of one variable is part of mathematical fairy-
land: this is an area in mathematics where everything we need to know is known,
and the answers are both simple and beautiful. Of course, we also know as much
as we need to know about the algebra of linear functions of one variable, but
the mathematics there is really too simple for us to take much pride in that ac-
complishment. By comparison, nobody should feel any embarrassment about
rediscovering the method of completing the square and the quadratic formula²
(see page 236). That would be a notable achievement indeed.
We will explore the most basic facts about quadratic functions and their
graphs. We have already come across the simplest quadratic function $F_1(x) = x^2$
on page 129.³ A main theme of this chapter is to show that if we understand the
functions $F_a(x) = ax^2$, then we have already come very close to understanding all quadratic
functions. For this purpose, we must first obtain a good knowledge of the
geometry of the graph of $F_a(x)$ for any $a$. This is hardly an accident as the
geometry of the line weaves through the whole discussion of the algebra of linear
equations. In the same way, we will begin the study of quadratic functions by
trying to find out as much as we can about the graphs of quadratic functions;
then we will make use of the geometry of these graphs to facilitate the under-
standing of the functions themselves. In Sections 10.2 and 10.3, we will get to
witness the central role played by basic isometries (page 266)—specifically, reflec-
tions and translations—in the study of these graphs. In particular, we will correct
an egregious error in TSM by explaining what it means to say that the graph of a
quadratic function is a parabola.
$3x^2 - x + 8 = 3x^2 + 2x - 5$
or
$3x^2 - x + 8 = x^2 + 2x.$
2 The honor of discovery apparently belongs to the Babylonians of some twenty-four centuries
ago.
3 There we denoted it by $s$ to suggest “square”. The reason for the present notation $F_1$ will be
obvious once we begin discussing the functions $F_a(x) = ax^2$.
10.1. QUADRATIC EQUATIONS 225
(10.1) $ax^2 + bx + c = a'x^2 + b'x + c',$
$(a - a')x^2 + bx + c = b'x + c'.$
(10.2) $(a - a')x^2 + (b - b')x + (c - c') = 0.$
(10.3) $ax^2 + bx + c = 0,$
giving the proof of this statement, let us assume it for the moment and give a
simple application: to solve $2x^2 - x - 3 = 0$, we graph
$y = 2x^2 - x - 3.$
[Figure: graph of $y = 2x^2 - x - 3$.]
The graph of $y = 2x^2 - x - 3$ seems to intersect the $x$-axis at $(-1, 0)$ and $(1.5, 0)$,
and a simple computation confirms that $-1$ and $1.5$ are indeed roots of
$2x^2 - x - 3 = 0$.
Now, to explain why $s$ is a root of $ax^2 + bx + c = 0$ if and only if the graph of
$y = ax^2 + bx + c$ intersects the $x$-axis at $(s, 0)$, recall that the graph of $y = ax^2 + bx + c$
is the collection of points of the form $(x, ax^2 + bx + c)$. In this light, $s$ being a root
of the equation $ax^2 + bx + c = 0$ means $as^2 + bs + c = 0$, so that $(s, 0)$ is on the
graph of $y = ax^2 + bx + c$. But the point $(s, 0)$ is on the $x$-axis, and so the graph
of $y = ax^2 + bx + c$ passes through the point $(s, 0)$ of the $x$-axis. Conversely, if
the graph of $y = ax^2 + bx + c$ intersects the $x$-axis at $(s, 0)$, then $(s, 0)$ is on the
graph of $y = ax^2 + bx + c$, which means $as^2 + bs + c = 0$ by the definition of the
graph. So $s$ is a root of $ax^2 + bx + c = 0$.
We have shown the graph of $2x^2 - x - 3$ roughly on the open interval $(-2, 2.5)$;
it would be impractical to show the graph over a larger interval because the val-
ues of y get large quite fast outside (−2, 2.5). Here is a table of some values of x
and the corresponding y:
x y x y
−2 7 2.5 7
−3 18 3.5 18
−4 33 4.5 33
−5 52 5.5 52
−6 75 6.5 75
−7 102 7.5 102
−8 133 8.5 133
Note that the values of y are the same in the second and fourth columns; this fact
will be explained in due course (see Exercise 2 on page 252).
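The table can be reproduced with a few lines of Python. This is only a sketch: the function name `y` and the pairing of each x in the first column with 0.5 − x in the third are our own reading of the table rows, not something stated in the text.

```python
def y(x):
    """Value of 2x^2 - x - 3 at x."""
    return 2 * x * x - x - 3

# The two x-intercepts read off the graph are genuine roots.
roots_check = [y(-1), y(1.5)]

# Rebuild the table: each row pairs x with 0.5 - x (an observed pattern).
rows = [(x, y(x), 0.5 - x, y(0.5 - x)) for x in range(-2, -9, -1)]
for x_left, y_left, x_right, y_right in rows:
    print(x_left, y_left, x_right, y_right)
```

Each printed row corresponds to one row of the table, with the second and fourth entries equal.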
We begin by looking at some simple quadratic equations. Does x2 − 4 = 0
have any solutions? This is equivalent to x2 = 4, and right away we know two
numbers whose square is 4: 2 and −2, usually abbreviated to ±2. Similarly,
$2x^2 - 50 = 0$ has two solutions, $\pm 5$, because multiplying this equation through
by $\tfrac12$ gives $x^2 - 25 = 0$. Clearly $2x^2 - 50 = 0$ has a solution if and only if
$x^2 - 25 = 0$ does. Since the latter is equivalent to $x^2 = 25$, we have the obvious
solutions $\pm 5$ to $2x^2 - 50 = 0$. Next, does $2x^2 - 5 = 0$ have any solutions? We
can easily conclude by now that this equation is equivalent to $x^2 = 2.5$. Recalling
that the symbol $\sqrt{N}$ for a positive number $N$ denotes the unique positive number
whose square is equal to $N$ (see Theorem 9.2 again), we conclude that there are
two solutions: $\pm\sqrt{2.5}$.
Are there others? The answer is no due to the following lemma.
$X = \pm\sqrt{N}$.
Proof. Since $N$ is positive, it has a positive square root $\sqrt{N}$, so that $(\sqrt{N})^2 = N$
(see Theorem 9.2 on page 201). Thus $X^2 - N = 0$ is equivalent to $X^2 - (\sqrt{N})^2 = 0$.
By identity (1.5) on page 10, we get
$(X - \sqrt{N})(X + \sqrt{N}) = 0$.
By (9.38) on page 220, either $X - \sqrt{N} = 0$ or $X + \sqrt{N} = 0$, which is to say, $X$ is
equal to either $+\sqrt{N}$ or $-\sqrt{N}$. The lemma is proved.
Thus we have proved that the equation $2x^2 - 5 = 0$ has exactly two solutions:
$\sqrt{2.5}$ and $-\sqrt{2.5}$.
What happens to Lemma 10.1 when N ≤ 0? Consider, for instance, the
equation x2 − (−5) = 0. But for any number x, x2 ≥ 0 so that
x2 − (−5) = x2 + 5 ≥ 0 + 5 ≥ 5 > 0.
Therefore the equation x2 − (−5) = 0 has no solution. We also observe trivially
that the equation x2 = 0 has 0 as a solution. A similar reasoning allows us to
conclude that the equation X 2 − N = 0 has two solutions, one solution, or no
solution depending on whether N is positive, zero, or negative, respectively.
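The three cases can be summarized in a short Python sketch. The function name `solve_square_equation` is hypothetical, and the sketch only looks for real solutions, in keeping with the discussion so far.

```python
import math

def solve_square_equation(n):
    """Real solutions of X^2 - n = 0, listed in increasing order."""
    if n > 0:
        root = math.sqrt(n)      # the unique positive square root of n
        return [-root, root]     # two solutions, as in Lemma 10.1
    if n == 0:
        return [0.0]             # the single solution 0
    return []                    # n < 0: X^2 - n is always positive
```

For instance, `solve_square_equation(4)` gives the two solutions ±2, while `solve_square_equation(-5)` gives an empty list.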
Now this is a very untidy conclusion about such a simple equation and, from
a more advanced point of view, we can do better. The correct conclusion is that
if complex roots are allowed, every quadratic equation has two solutions.4 Without
going into the details, we outline what this means using the equation $x^2 - N = 0$.
With $i$ denoting $\sqrt{-1}$ and $N < 0$, the two roots of $x^2 - N = 0$ are $i\sqrt{|N|}$
and $-i\sqrt{|N|}$. As to the case $N = 0$, i.e., $x^2 - 0 = 0$, we see that the equation
is actually x · x = 0, so that 0 being a solution means 0 · 0 = 0. Thus 0 is a root
“doubly”, i.e., two roots that happen to coincide. For this reason, we say 0 is a
double root. We can better explain the reason for calling this a double root by
considering a related equation x2 − bx = x ( x − b) = 0 for a nonzero real number
b. The two distinct roots are clearly 0 and b. Now let b get closer and closer to 0;
4 The Fundamental Theorem of Algebra asserts that every polynomial equation of degree n has
x2 + 2x + 2 = ( x2 + 2x + 1) − 1 + 2 = ( x + 1)2 + 1
so that x2 + 2x + 2 = 0 can be rewritten as ( x + 1)2 + 1 = 0. Seeing that
( x + 1)2 ≥ 0, we realize that there will be no solution for x2 + 2x + 2 = 0 this
time because, for any number x,
x2 + 2x + 2 = ( x + 1)2 + 1 ≥ 0 + 1 > 0
and x2 + 2x + 2 will always be positive no matter what x may be. Thus we
witness once more the phenomenon that a quadratic equation can have two roots
(including the possibility of a double root), or no roots.5
Activity
Solve x2 − 6x + 1 = 0 and x2 − 6x + 10 = 0.
What about solving x2 + 5x + 2 = 0 ? If we follow the preceding line of
reasoning, we will be looking for a number c so that
5 Recall from the preceding footnote on page 227 that if complex roots are allowed, then this
(Note that the second equality in (10.4) makes use of (9.31) on page 211.) Therefore,
$x$ is a solution of $x^2 + 5x + 2 = 0$ if and only if $x = -\tfrac52 \pm \tfrac{\sqrt{17}}{2}$.
Incidentally, given $x^2 + 5x$, the preceding process of finding $\tfrac52$ so that
$x^2 + 5x + (\tfrac52)^2$ becomes a square, $(x + \tfrac52)^2$, is worthy of being singled out. Let $B$ be a
number. Then we have an obvious expansion (see identity (1.2) on page 8):
$(x + \tfrac{B}{2})^2 = x^2 + Bx + (\tfrac{B}{2})^2$.
By reading this identity from right to left (the need for doing this having been
first brought up on page 21), we get
(10.5) $x^2 + Bx + (\tfrac{B}{2})^2 = (x + \tfrac{B}{2})^2$ for any number $B$.
Letting $B = 5$, we retrieve the previous result of $x^2 + 5x$.
The process of adding $(\tfrac{B}{2})^2$ to $x^2 + Bx$ to get a square $(x + \tfrac{B}{2})^2$ is called
completing the square. In order not to interrupt the flow of this discussion, we
will leave an explanation of this term to the end of this section (page 236).
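Identity (10.5) can be checked with exact rational arithmetic. The following is a minimal sketch; the helper name `complete_the_square` is our own, and the last line redoes the computation for $x^2 + 5x + 2 = 0$ discussed above.

```python
from fractions import Fraction

def complete_the_square(B):
    """Return (h, k) with h = B/2 and k = (B/2)^2, so that
    x^2 + B x + k = (x + h)^2 as in identity (10.5)."""
    h = Fraction(B) / 2
    return h, h * h

h, k = complete_the_square(5)

# x^2 + 5x + 2 = 0 becomes (x + 5/2)^2 = (5/2)^2 - 2.
right_side = k - 2
```

Here `right_side` is the number whose square roots, shifted by −5/2, give the two solutions.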
Activity
Solve x2 − x + 1 = 0.
As a final example, consider the equation $3x^2 - 4x + 5 = 0$. In view of
(10.5), we will make sure that we work with quadratic polynomials whose leading
coefficient is 1. Therefore, letting $B = -\tfrac43$ in (10.5), we get:
$3x^2 - 4x + 5 = 3\left(x^2 - \tfrac43 x\right) + 5 = 3\left[x^2 - \tfrac43 x + (\tfrac46)^2 - (\tfrac46)^2\right] + 5$.
Therefore,
(10.6) $3x^2 - 4x + 5 = 3\left[x^2 - \tfrac43 x + (\tfrac23)^2\right] - 3(\tfrac23)^2 + 5 = 3\left(x - \tfrac23\right)^2 + \tfrac{11}{3}$.
Therefore $3(x - \tfrac23)^2 + \tfrac{11}{3} = 0$ has no solution and, consequently, $3x^2 - 4x + 5 = 0$
has no solution.
230 10. QUADRATIC FUNCTIONS AND THEIR GRAPHS
Activity
Solve 2x2 − 3x + 3 = 0.
ax2 + bx + c = 0,
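Since
$-a\left(\dfrac{b}{2a}\right)^2 + c = -\dfrac{b^2}{4a} + c = \dfrac{-b^2 + 4ac}{4a}$,
we have:
$ax^2 + bx + c = a\left(x + \dfrac{b}{2a}\right)^2 - \dfrac{b^2 - 4ac}{4a}$.
Therefore $x$ is a solution of $ax^2 + bx + c = 0$ if and only if $x$ is a solution of
(10.7) $a\left(x + \dfrac{b}{2a}\right)^2 - \dfrac{b^2 - 4ac}{4a} = 0$.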
6 The terminology of “vertex form” is used only in school mathematics. The standard practice in
right side of (10.9) already allows for the variation in the sign of the number (i.e.,
whether it is positive or negative), we see that
$\pm\sqrt{\dfrac{b^2 - 4ac}{4a^2}} = \pm\dfrac{\sqrt{b^2 - 4ac}}{2a}$.
Therefore we obtain the explicit expressions of the two roots in this case:
$x = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
(iii) $b^2 - 4ac = 0$. Then (10.8) implies that
$\left(x + \dfrac{b}{2a}\right)^2 = 0$.
Therefore −b/(2a) is a double root of this equation, and therefore of ax2 + bx +
c = 0.
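The three cases translate directly into a small Python sketch. The function name `real_roots` is hypothetical, the case labels in the comments follow the labeling used here, and floating-point square roots mean the answers are approximate in general.

```python
import math

def real_roots(a, b, c):
    """Real roots of a x^2 + b x + c = 0 (a nonzero), in increasing order."""
    if a == 0:
        raise ValueError("not a quadratic equation: a must be nonzero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                         # case (i): no real root
    if disc == 0:
        return [-b / (2 * a)]             # case (iii): a double root
    s = math.sqrt(disc)                   # case (ii): two distinct roots
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])
```

Applied to $2x^2 - x - 3 = 0$ this returns the roots −1 and 1.5 found graphically earlier, and applied to $3x^2 - 4x + 5 = 0$ it returns an empty list.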
Observe that the formula in Case (ii) already yields $-b/(2a)$ as a root if
$b^2 - 4ac = 0$. Therefore we may combine Case (iii) and Case (ii) into a single
Activity
Use Theorem 10.2 to explain: (i) why x2 + 2x − 4 = 0 has two distinct roots,
why x2 + 2x + 2 = 0 has no roots, and (ii) why 3x2 − 4x + 5 = 0 has no
roots but (−3) x2 − 4x + 5 = 0 has two distinct roots.
7 Once complex numbers are available, the quadratic formula will be shown to be valid under all
circumstances.
The discriminant Δ of g is then $\Delta = (s + t)^2 - 4st$. Because g has two roots s and
t, Theorem 10.2 implies that $\Delta \ge 0$. Thus we have $st \le \left(\tfrac12(s + t)\right)^2$.
Suppose s and t are nonnegative; then we can take the square root of st to get
(10.11) $\sqrt{st} \le \dfrac{s + t}{2}$, and equality holds if and only if $s = t$.
The validity of the inequality is clear. As to the assertion about equality, if s = t,
then both sides of (10.11) are equal to s. Conversely, suppose equality holds in
(10.11). Then (s + t)2 − 4st = 0, so that Δ = 0. By the quadratic formula (10.10),
s = t. This proves (10.11).
For nonnegative numbers s and t, $\tfrac12(s + t)$ is called the arithmetic mean of
s and t (more commonly referred to as the average of s and t); the number $\sqrt{st}$
is called the geometric mean of s and t (clearly the geometric mean is the multi-
plicative analog of the arithmetic mean). (10.11) is the arithmetic-geometric mean
inequality for two (nonnegative) numbers. There is a corresponding inequality
for n nonnegative numbers; see [Wiki-AGM] for a simple introduction.
The inequality (10.11) has a geometric interpretation. Let R be a rectangle
with two sides of lengths s and t. Then its perimeter is 2(s + t), and (10.11)
becomes:
(10.12) $\sqrt{\text{area of } R} \le \tfrac14\,(\text{perimeter of } R)$
and equality holds if and only if the two sides are of equal length, $s = t$, i.e., if
and only if the rectangle is a square. In other words, among all rectangles with
a fixed perimeter, say $c$, the square will have the maximum area, namely, $\tfrac{1}{16}c^2$;
conversely, if a rectangle with a given perimeter $c$ has maximum area, it is the
square of side length $\tfrac{c}{4}$. This is why (10.12) is called the isoperimetric inequality
for rectangles. Compare Exercise 13 in Section 2.6 of [Wu-PreAlg] and Exercise 3
on page 10 of this volume.
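A numerical experiment makes the isoperimetric inequality plausible, although it proves nothing. In the sketch below the perimeter c = 20 and the sampled side lengths are arbitrary choices of ours, and the names `rectangle_area`, `best_side`, and `am_gm_holds` are hypothetical.

```python
import math

def rectangle_area(s, perimeter):
    """Area of the rectangle with one side s and the given perimeter."""
    t = perimeter / 2 - s          # the other side
    return s * t

c = 20.0                                        # a fixed perimeter (assumed)
sides = [k / 10 for k in range(1, 100)]         # s = 0.1, 0.2, ..., 9.9
areas = [rectangle_area(s, c) for s in sides]
best_side = sides[areas.index(max(areas))]      # side giving the largest area

# (10.12): the square root of the area never exceeds a quarter of the perimeter.
am_gm_holds = all(math.sqrt(area) <= c / 4 for area in areas)
```

Among the sampled rectangles, the largest area occurs at the side length c/4, i.e., at the square.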
At this point, we can return to the issue of factoring quadratic polynomials
adumbrated on page 23. Take a quadratic polynomial with leading coefficient
equal to 1, x2 + Bx + C, where B and C are integers. The usual skill of factoring
this polynomial consists of finding two other integers r1 and r2 so that
(10.13) r1 + r2 = − B and r1 r2 = C.
Then because ( x − r1 )( x − r2 ) = x2 − (r1 + r2 ) x + r1 r2 , we get the factorization
(10.14) $x^2 + Bx + C = (x - r_1)(x - r_2)$.
Let us emphasize that in order to get the factorization (10.14), our sole concern
with the integers r1 and r2 is that they satisfy (10.13). However, (10.14) has a
ramification for r1 and r2 that was never part of our original thinking, namely,
the numbers r1 and r2 turn out to be the roots of the equation x2 + Bx + C = 0. This
is because when we substitute r1 or r2 for x in (10.14), the right side of (10.14) is
obviously equal to zero and therefore so is the left side, which then shows r1 and
r2 are the roots of x2 + Bx + C = 0.
Now suppose we are given x2 + Bx + C = 0, where B and C are no longer
required to be integers. Let r1 and r2 be its roots, which are not necessarily integers
either. Would (10.14) still be correct? The astonishing answer is that it is. Keep in
mind that all we know about r1 and r2 is that they are the roots of x2 + Bx + C = 0;
there is no indication that they would satisfy (10.13). But we are going to prove
that they do! To this end, observe that because we are assuming the equation
x2 + Bx + C = 0 has roots, we know its discriminant cannot be negative (see
Case (i) on page 231). Therefore its discriminant is ≥ 0. The quadratic formula
(Theorem 10.2) implies that r1 and r2 are given by (10.10), i.e.,
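$r_1 = \tfrac12\left(-B + \sqrt{B^2 - 4C}\right)$ and $r_2 = \tfrac12\left(-B - \sqrt{B^2 - 4C}\right)$.
Therefore,
$r_1 r_2 = \tfrac14\left(-B + \sqrt{B^2 - 4C}\right)\left(-B - \sqrt{B^2 - 4C}\right)$.
Using the identity $(X - A)(X + A) = X^2 - A^2$, we get
$r_1 r_2 = \tfrac14\left(B^2 - (B^2 - 4C)\right) = C$.
Furthermore,
$r_1 + r_2 = \tfrac12(-B) + \tfrac12(-B) = -B$.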
This proves (10.13). Of course, (10.14) is then an immediate consequence.
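The relations (10.13) and the factorization (10.14) can be spot-checked numerically for non-integer roots. This is a sketch under assumptions: the coefficients B = −3 and C = 1 are an arbitrary example of ours with irrational roots, the name `monic_roots` is hypothetical, and equality is tested only up to floating-point error.

```python
import math

def monic_roots(B, C):
    """The two roots of x^2 + B x + C = 0, assuming B^2 - 4C >= 0."""
    disc = B * B - 4 * C
    if disc < 0:
        raise ValueError("negative discriminant: no real roots")
    s = math.sqrt(disc)
    return (-B + s) / 2, (-B - s) / 2

B, C = -3.0, 1.0                     # assumed example: roots (3 ± sqrt 5)/2
r1, r2 = monic_roots(B, C)

sum_error = abs((r1 + r2) - (-B))    # should be (nearly) zero
product_error = abs(r1 * r2 - C)     # should be (nearly) zero
```

Both errors come out at the level of rounding noise, in agreement with (10.13).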
What we have done can be stated more generally, as in the following theorem.
(10.15) $r_1 + r_2 = -\dfrac{b}{a}$ and $r_1 r_2 = \dfrac{c}{a}$
(10.16) $ax^2 + bx + c = a(x - r_1)(x - r_2)$.
Proof. Because $ax^2 + bx + c = 0$ can be written as $a\left(x^2 + \tfrac{b}{a}x + \tfrac{c}{a}\right) = 0$, we see
that $r_1$, $r_2$ are also the roots of the equation $x^2 + \tfrac{b}{a}x + \tfrac{c}{a} = 0$. Letting $B = \tfrac{b}{a}$ and
$C = \tfrac{c}{a}$, we see from (10.13) that the equalities in (10.15) hold. Moreover, we have
from (10.14) that
$x^2 + \tfrac{b}{a}x + \tfrac{c}{a} = (x - r_1)(x - r_2)$.
$\dfrac{112}{105} = \dfrac{16}{15}$.
Therefore,
$105x^2 - 22x - 96 = (105x + 90)\left(x - \dfrac{16}{15}\right) = (7x + 6)(15x - 16)$.
Activity
Incidentally, it is instructive to multiply out the product of linear polynomials on
the right to obtain $3x^2 - \sqrt{2}\,x - \tfrac12$.
We conclude this section with three comments.
First, the proof of Theorem 10.2, together with the various solutions of specific
quadratic equations in the first part of this section, lays bare the fact that solving a
quadratic equation ax2 + bx + c = 0 is a relatively simple two-step affair, namely,
(a) use completing the square (page 229) to rewrite the equation in vertex form
(see page 231):
a ( x − p )2 + q = 0
for some constants p and q, and (b) use Lemma 10.1 (page 227) to solve
$a(x - p)^2 + q = 0$, resulting in the two roots $p \pm \sqrt{-q/a}$, provided either $q = 0$ or
q and a have opposite signs (see page 267).
A word about the notation we have just employed may clear up
some confusion. It would seem that if we are going to make
use of Lemma 10.1, then we should follow the notational con-
vention in that lemma and write the vertex form of the equa-
tion as a( x − p)2 − q = 0. However, the reason for writing
a( x − p)2 + q = 0 instead is Theorem 10.5 on page 242 below.
This is a case of conflicting mathematical demands on a no-
tation, and the choice one makes ultimately comes down to a
judgment call.
A second comment is that we have to put the focus of school mathematics on
solving quadratic equations in perspective. Instead of solving $ax^2 + bx + c = 0$,
let us also consider the quadratic function $f(x) = ax^2 + bx + c$. From the vantage
point of $f(x)$, solving $ax^2 + bx + c = 0$ is the same as asking whether there is a
number $x_0$ so that $f(x_0) = 0$. This then opens
the floodgates to asking a host of other questions, such as whether there is a
number $x_1$ so that $f(x_1) = d$ for a given number $d$. This can get interesting. For
example, consider the function $F(x) = x^2 - 4x + 3$.
Is there an $x_0$ so that $F(x_0) = 0$? Yes. $x_0 = 1, 3$.
Is there an $x_1$ so that $F(x_1) = -1$? Yes. $x_1 = 2$. (Notice that 1
and 3 are equidistant from 2.)
Is there an $x_2$ so that $F(x_2) = k$ for a $k$ so that $k < -1$? No.
Is there an $x_3$ so that $F(x_3) = \ell$ for an $\ell$ so that $\ell \ge -1$? Yes.
Solving a quadratic equation $ax^2 + bx + c = 0$ is nothing more than asking for the zeros
of the quadratic function $f(x) = ax^2 + bx + c$.
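The four questions about F can be answered uniformly once F is written in vertex form. The sketch below assumes the rewriting $F(x) = (x - 2)^2 - 1$, which follows from completing the square; the function name `solutions_of_F_equals` is hypothetical.

```python
import math

def solutions_of_F_equals(d):
    """Real x with x^2 - 4x + 3 = d, using F(x) = (x - 2)^2 - 1."""
    gap = d + 1                      # the equation is (x - 2)^2 = d + 1
    if gap < 0:
        return []                    # d < -1: no solution
    if gap == 0:
        return [2.0]                 # d = -1: only x = 2
    r = math.sqrt(gap)
    return [2 - r, 2 + r]            # d > -1: two solutions equidistant from 2
```

For d = 0 this recovers the zeros 1 and 3, and for any d below −1 it returns an empty list.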
Activity
[Figure: the region of area $x^2 + Bx$ together with a dotted square of side $\tfrac{B}{2}$.]
Now the dotted square has a side of length $\tfrac{B}{2}$, so its area is $(\tfrac{B}{2})^2$. With this
dotted square added to the original figure, the total area is now the area of the
big square with sides of length $x + \tfrac{B}{2}$. Thus:
$(x^2 + Bx) + \left(\dfrac{B}{2}\right)^2 = \left(x + \dfrac{B}{2}\right)^2$.
This is exactly identity (10.5) on page 229.
Exercises 10.1
(1) Without using the quadratic formula, directly solve: (a) $16x^2 + 8x + 1 = 0$.
(b) $x^2 - \tfrac32 x + 1 = 0$. (c) $3x^2 + 12x + 11 = 0$. (d) $6x^2 + 3x - 2 = 0$.
(2) (Everybody must do this problem!) Starting with px2 + qx + r = 0 (do not
change the notation!), give a self-contained and coherent derivation of
the quadratic formula.
A first goal of this section is to give more precision to the statement that the Fa ’s
for a > 0 look “alike” and those of the Fa ’s for a < 0 also look “alike”.
We note that, first of all, the difference between the graphs of Fa and F− a
would disappear if we allow the reflection Λ across the x-axis (see page 268) to
identify a graph with its reflected image. In greater detail, we claim: the reflection
Λ maps the graph Ga of Fa to the graph G− a of F− a . To see this, let a > 0. Recall
that this reflection satisfies Λ( x, y) = ( x, −y) (page 57). Given a number x, then
( x, Fa ( x )) lies on Ga while ( x, F− a ( x )) lies on G− a .
Note that $(x, F_a(x)) = (x, ax^2)$ and $(x, F_{-a}(x)) = (x, -ax^2)$.
Therefore, $\Lambda(x, F_a(x)) = (x, F_{-a}(x))$ and $\Lambda(x, F_{-a}(x)) = (x, F_a(x))$.
So $\Lambda(G_a) = G_{-a}$ for any number a, and Λ interchanges the graphs $G_a$ and $G_{-a}$,
so that the graphs $G_a$ and $G_{-a}$ are congruent.
[Figure: the graphs $G_a$ and $G_{-a}$, with the points $(x, F_a(x))$ and $(x, F_{-a}(x))$ marked.]
Activity
Show that a linear function whose graph has positive slope is increasing on
R, and one whose graph has negative slope is decreasing on R.
The linear functions give the correct idea that the graph of an increasing
function goes up as we move to the right, while the graph of a decreasing function
goes down as we move to the right. To continue with the discussion, we have to
introduce two pieces of new notation: for a number p, we let (−∞, p] denote
all the numbers x so that x ≤ p, and we let [ p, ∞) denote all the numbers x so
that x ≥ p. Then the graph Ga of Fa for a > 0 suggests that on (−∞, 0], Fa is
decreasing, while on [0, ∞), Fa is increasing. The picture of the graph Ga serves
the useful purpose of telling us intuitively what is correct or what is incorrect, but
because we are still learning how to write proofs, we should not rely on intuition
alone but also write down the precise reasoning.
[Figure: the graph $G_a$ for $a > 0$, with the points $(x, F_a(x))$ and $(x', F_a(x'))$ for $0 \le x < x'$, and $s < s' \le 0$ marked on the x-axis.]
Thus let us prove that if $0 \le x < x'$, then $F_a(x) < F_a(x')$. In other words, we
have to prove that
$0 \le x < x'$ implies $ax^2 < a(x')^2$.
By Lemma 9.4 on page 208, $0 \le x < x'$ implies $x^2 < (x')^2$. Since $a > 0$, (D) on
page 160 implies that $ax^2 < a(x')^2$, as desired. Next, we prove that $s < s' \le 0$
implies that $F_a(s) > F_a(s')$. Thus we have to prove:
$s < s' \le 0$ implies $as^2 > a(s')^2$.
By (A) of page 160, $s < s' \le 0$ implies $0 \le -s' < -s$, and therefore by Lemma
9.4 again, we have $(-s')^2 < s^2$, which is equivalent to $(s')^2 < s^2$. By (A) of page
160 once more, we get $a(s')^2 < as^2$, which is equivalent to $as^2 > a(s')^2$. The
proof of the increasing and decreasing properties of $F_a$ for $a > 0$ is complete.
We pause to make an observation: the fact that Fa ( x ) (a > 0) is decreasing on
(−∞, 0] and increasing on [0, ∞) gives a second proof that Fa achieves a unique
minimum at x = 0.
10.2. A SPECIAL CLASS OF QUADRATIC FUNCTIONS 241
Still with a > 0, the graph G− a of F− a is now the reflection of Ga across the
x-axis.
[Figure: the graph $G_{-a}$, with $s < s' \le 0$ and $0 \le x < x'$ marked on the x-axis.]
In this case, F− a is increasing on the negative x-axis and decreasing on the positive
x-axis. The details are left to an exercise (Exercise 1 on page 246).
As before, the fact that Fa ( x ) (a < 0) is increasing for x < 0 and decreasing
for x > 0 gives a second proof that Fa achieves a unique maximum at x = 0.
Next, we recall that a set S in the plane is said to have bilateral symmetry
with respect to a line L if for every point Q in S, the reflection Λ across L maps
Q to a point that also lies in S (see page 266, also Exercise 2 on page 246). If we
use the terminology of Chapter 8 (page 165), S being symmetric with respect to L
means the part of S in the half-plane L+ is congruent to the part of S in the other
half-plane L− . If there is such a symmetry, then the study of S itself reduces to a
study of one of the two halves, S ∩ L+ or S ∩ L− . This explains our interest in
such a symmetry.
We claim that, for every $a \neq 0$, the graph Ga of Fa has bilateral symmetry with
respect to the y-axis. Let us prove this. With Λ denoting the reflection across the
y-axis, let P be a point of the graph Ga . Then we have to prove that Λ( P) is in
Ga . But P being a point in Ga means that P = ( x, ax2 ) for some number x. Since
Λ is the reflection across the y-axis, we have Λ( P) = (− x, ax2 ) (see page 57).
However,
(− x, ax2 ) = (− x, a(− x )2 ) = (− x, Fa (− x ))
and (− x, Fa (− x )) is of course a point on the graph Ga . Thus Λ( P) is in Ga . This
proves the bilateral symmetry of Ga with respect to the y-axis.
We summarize our findings about the functions Fa in the following theorem,
which states precisely in what way the graphs { Ga } look “alike”.
statement of the theorem and its proof both draw on Lemma 5.3 on page 95. In-
cidentally, the theorem shows why we are interested in the function Fa and its
graph Ga in the first place.
[Figure: the graph $G_a$ and its translated image $T(G_a)$, showing $P = (x, ax^2)$, $T(P) = (x + p, ax^2 + q)$, and $V = (p, q)$.]
By the definition of the equality of sets (see page 267), we have to show:
(A) T ( Ga ) ⊂ G, i.e., if P is a point of Ga , then T ( P) is a point of G.
(B) G ⊂ T ( Ga ), i.e., if Q is a point of G, then Q = T ( P) for some P on Ga .
Let us first prove (A). By definition, G consists of all the points of the form
(t, f (t)) = (t, a(t − p)2 + q), where t is some number (and therefore the formidable-
looking expression, a(t − p)2 + q , is also just a number). Now if P is a point of
Ga , we are going to show that T ( P) is equal to (t, a(t − p)2 + q) for some t, and
this will prove (A).
Since P is in Ga , by definition, P = ( x, ax2 ) for some number x. By Lemma
5.3 on page 95,
T ( P) = ( x + p, ax2 + q).
Let t = x + p; then a(t − p)2 + q = a( x + p − p)2 + q = ax2 + q. Thus T ( P) =
(t, a(t − p)2 + q), which, as noted above, means that T ( P) is a point of G. The
proof of (A) is complete.
We next prove (B). Suppose Q is a point of G; then Q = (t, a(t − p)2 + q) for
some number t. We have to prove that Q = T ( P) for some point P in Ga ; since a
point of Ga is necessarily equal to ( x, ax2 ) for some number x, we have to prove
that
(10.19) (t, a(t − p)2 + q) = T ( x, ax2 ) for some number x.
Now T ( x, ax2 ) = ( x + p, ax2 + q), so if we want this to be equal to the given
point (t, a(t − p)2 + q), then—by equating the two x-coordinates—we must have
x + p = t, so that, necessarily, x = t − p. This suggests that, in order to prove
(10.19), we let x = t − p. Then,
T ( x, ax2 ) = ( x + p, ax2 + q) = ((t − p) + p, a(t − p)2 + q) = (t, a(t − p)2 + q).
This proves (10.19) and hence also (B). The proof of Theorem 10.5 is complete.
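Parts (A) and (B) of the proof can be illustrated on a handful of points with exact integer arithmetic. This is only a sketch: the values a = 2, p = 3, q = −1 and the sample inputs are our own choices, and the helper names are hypothetical.

```python
def F(a, x):
    """F_a(x) = a x^2."""
    return a * x * x

def f(a, p, q, t):
    """f(t) = a (t - p)^2 + q."""
    return a * (t - p) ** 2 + q

def translate(point, p, q):
    """The translation T: (x, y) -> (x + p, y + q)."""
    x, y = point
    return (x + p, y + q)

a, p, q = 2, 3, -1                      # assumed sample values
xs = range(-5, 6)

# Part (A): T sends each sampled point of G_a to a point of G.
images = [translate((x, F(a, x)), p, q) for x in xs]
on_G = all(y == f(a, p, q, t) for t, y in images)

# Part (B): each sampled point (t, f(t)) of G is T(P) with P = (t - p, F_a(t - p)).
preimages_ok = all(
    translate((t - p, F(a, t - p)), p, q) == (t, f(a, p, q, t)) for t in xs
)
```

The point (0, 0) of $G_a$ is sent to (p, q), which is the sixth entry of `images`.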
Theorem 10.5 tells us that the graph G of f ( x ) = a( x − p)2 + q is just the
translated image of the graph Ga of Fa under T.
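[Figure: the graph $G_a$ and the graph $G = T(G_a)$, with $V = (p, q)$.]
[Figure: two points $(p - k, y)$ and $(p + k, y)$ of the graph, on opposite sides of the vertical line $x = p$.]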
p. The fact that f ( p) = q follows from the definition of f . The proof of Theorem
10.6 is complete.
Activity
Still with f ( x ) = a( x − p)2 + q, consider now the zeros of f . Recall from page
223 that x0 is called a zero (or a root) of f if f ( x0 ) = 0. This number x0 is then
a root of the equation a( x − p)2 + q = 0. On page 235, we saw that the equation
a( x − p)2 + q = 0 has two distinct zeros if and only if a and q have opposite
signs. In terms of f , this means likewise f has two distinct zeros if and only if
a and q have opposite signs. Algebraically, this is a simple assertion that follows
from Lemma 10.1 on page 227. However, using Theorem 10.5, we can understand
this assertion geometrically as well. The zeros of f ( x ) = a( x − p)2 + q are the
x-coordinates of the intersections of its graph G with the x-axis (see the discussion
on page 226). Therefore the question of whether f has two distinct zeros becomes
one of whether G intersects the x-axis at two distinct points. The following three
graphs show the graph of an f with a > 0 for the three cases q < 0, q = 0, and
q > 0, respectively.
[Figure: three graphs of f with $a > 0$, each with the point $(p, q)$ marked, for the cases $q < 0$, $q = 0$, and $q > 0$.]
to the vertical line x = p. We will also re-derive this result purely algebraically on
page 249.
The pictures of G also explain why f has a double zero if and only if q = 0,
and has no zero if and only if a and q have the same sign.
Exercises 10.2
(1) Let b < 0 and let Fb ( x ) = bx2 . Prove that Fb is increasing on the negative
x-axis and is decreasing on the positive x-axis.
(2) Let Λ be the reflection across a given line L, and let S be a geometric
figure in the plane. Prove that Λ(S ) = S if and only if Λ maps every
point of S to a point of S .
(3) Prove the case of a < 0 in part (ii) of Theorem 10.6 on page 243.
(4) A quadratic function f is given in vertex form: f ( x ) = 2( x − p)2 + q. It
is known that f (−1) = f (2) = 0. (i) If f (1) = −4, what is f (0) ? Do it
in two different ways. (ii) What are p and q?
(5) Given a quadratic function f ( x ) = a( x − p)2 + q. Suppose it is known
that f (−4) = 0 and f (−2) > 0. Can f (−3) be negative? Give a detailed
explanation.
(6) (a) Let G be the graph of $g(x) = x^2$. Let G′ be the set obtained by changing
each point (x, y) of G to (x + 5, y). Then G′ is the graph of which
function? (b) G as above, let G′ be the set obtained by changing each
point (x, y) of G to (x, y − 2). Then G′ is the graph of which function?
(c) G as above, let G′ be the set obtained by changing each point (x, y)
of G to (x − 3, y + 2). Then G′ is the graph of which function? (d) Let
G be the graph of the function $h(x) = x^3$. If G′ is the set obtained by
changing each point (x, y) of G to (x + 1, y + 2), then G′ is the graph of
which function?
$= a\left(x^2 + \dfrac{b}{a}x + \dfrac{b^2}{4a^2}\right) + \dfrac{4ac - b^2}{4a}$
$= ax^2 + bx + \dfrac{b^2}{4a} + \dfrac{4ac - b^2}{4a}$
$= ax^2 + bx + c$.
It follows that with these values for p and q, the given function f ( x ) = ax2 + bx +
c is also equal to f ( x ) = a( x − p)2 + q. The first proof is complete.
A second proof is to retrace the steps of completing the square on page 231.
Starting with a general quadratic function $f(x) = ax^2 + bx + c$, we let B in (10.5)
be $\tfrac{b}{a}$ to get
$f(x) = a\left(x^2 + \dfrac{b}{a}x\right) + c = a\left(x^2 + \dfrac{b}{a}x + \left(\dfrac{b}{2a}\right)^2\right) - a\left(\dfrac{b}{2a}\right)^2 + c$.
Since
$-a\left(\dfrac{b}{2a}\right)^2 + c = -\dfrac{b^2}{4a} + c = \dfrac{-b^2 + 4ac}{4a}$,
we have:
$f(x) = a\left(x + \dfrac{b}{2a}\right)^2 - \dfrac{b^2 - 4ac}{4a}$.
Therefore the theorem would be proved if we let
(10.22) $p = -\dfrac{b}{2a}$ and $q = \dfrac{4ac - b^2}{4a}$.
We have completed the proof of Theorem 10.7.
The point ( p, q) in Theorem 10.7 is, not surprisingly, called the vertex of the
graph. Note that the vertex being on the graph of f means f ( p) = q, which is
obvious anyway from the expression f ( x ) = a( x − p)2 + q.
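Formula (10.22) is easy to put to work with exact fractions. The sketch below uses a hypothetical helper `vertex_form` and applies it to the earlier example $3x^2 - 4x + 5$ from (10.6).

```python
from fractions import Fraction

def vertex_form(a, b, c):
    """Return (p, q) with a x^2 + b x + c = a (x - p)^2 + q, as in (10.22)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be nonzero")
    p = -b / (2 * a)
    q = (4 * a * c - b * b) / (4 * a)
    return p, q

p, q = vertex_form(3, -4, 5)   # expect p = 2/3 and q = 11/3, matching (10.6)
```

Evaluating both $3x^2 - 4x + 5$ and $3(x - p)^2 + q$ at a few values of x gives identical results, as the theorem asserts.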
Activity
Put f ( x ) = 2x2 + x − 2 in vertex form. Does it have any zeros? Where does
it achieve its minimum, and what is its minimum value?
Once we have Theorem 10.7, the theorems in the last section about quadratic
functions in vertex form now become theorems about any quadratic functions.
The following theorem then summarizes our findings about general quadratic
functions. (Recall that the number b2 − 4ac is called the discriminant of ax2 +
bx + c; see page 232.)
Proof. Parts (i) and (ii) are immediate consequences of Theorem 10.7 and Theo-
rem 10.6 on page 243.
For part (iii), we know from Theorem 10.7 that the function f can be written
as f ( x ) = a( x − p)2 + q, where p and q are as in (10.22). Therefore,
$f(x) = a\left[(x - p)^2 - M\right]$,
where
$M = \dfrac{b^2 - 4ac}{4a^2}$.
Since $a \neq 0$, $x_0$ is a zero of f if and only if it is a solution of the quadratic
equation ( x − p)2 − M = 0. Suppose the discriminant b2 − 4ac is negative.
Since 4a2 > 0, we have that M < 0 as well and therefore − M > 0. Hence
( x − p)2 − M > 0 because ( x − p)2 ≥ 0 no matter what x is. This implies that
the equation ( x − p)2 − M = 0 can have no solution. If however the discriminant
is ≥ 0, we know from the quadratic formula (Theorem 10.2 on page 232) that the
equation ax2 + bx + c = 0 has a double root (when the discriminant is 0) or two
distinct roots (when the discriminant is positive), and therefore the same holds
for the zeros of f ( x ) = ax2 + bx + c. In fact, the quadratic formula gives the two
zeros (including the double zero) as
(10.23) $r_1, r_2 = \dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$.
See equation (10.10) on page 232. The proof of Theorem 10.8 is complete.
Activity
Use mental math to decide whether each of the following quadratic func-
tions has two distinct zeros, only one zero, or no zero: (i) f ( x ) = 215x2 −
87x + 21. (ii) f (s) = 5s2 + 223 s + 7. (iii) g ( x ) = −83x + 5.2x − 76 . (iv)
2 9
h(s) = 12 s2 − 15
7 s + 1.5. (v) h ( x ) = 3.2x − 9.5x + 22.
2
10.3. PROPERTIES OF QUADRATIC FUNCTIONS 249
In particular, the vertex lies on the line of symmetry of the graph of f given in part (i) of
Theorem 10.8, which is the vertical line L defined by $x = p = -\dfrac{b}{2a}$. Since the zeros
of f are, according to (10.23),
$-\dfrac{b}{2a} + \dfrac{\sqrt{b^2 - 4ac}}{2a}$ and $-\dfrac{b}{2a} - \dfrac{\sqrt{b^2 - 4ac}}{2a}$,
we see that the zeros—being the points of intersection of the graph of f with the
x-axis—are equidistant from the line of symmetry L, as demanded by part (i)
of Theorem 10.8. The vertex is either the highest point on the graph of f ( x ) =
ax2 + bx + c (in the case that a < 0) or the lowest point on the graph of f (in the
case that a > 0).
It is common to refer to the graph of a quadratic function in the case of a > 0
as an up parabola, and in the case of a < 0 as a down parabola.
[Figure: two graphs, one for the case $a > 0$ and one for the case $a < 0$.]
This terminology is very slippery because it assumes that the definition of a
parabola is known and the reason for why the graph of a quadratic function
is a parabola is also known. The truth is the opposite: TSM has not defined what
a parabola is up to this point, and no reason is ever given as to why the graph of a
quadratic function is a parabola. However, we will justify this terminology in the
next section, Section 10.4, by first defining a parabola precisely, and then proving
that the graph of a quadratic function is a parabola (Theorem 10.11 on page 254).
Example. Let f ( x ) = ax2 + bx + c be a quadratic function with a < 0. Sup-
pose its graph G contains the two points P = (−1, 2) and Q = (4, 2). Where
might f achieve its maximum? Does f have zeros?
The graph G has bilateral symmetry with respect to a vertical line L (part (i)
of Theorem 10.8). Can P and Q be on the same side of L? No, for the following
reason. Let L intersect the x-axis at p. Then because a < 0, f is increasing on
(−∞, p] and decreasing on [ p, ∞) (by part (ii) of Theorem 10.8). Therefore if P
and Q are on the left side of L, then f being increasing on (−∞, p] would mean
f (−1) < f (4). This contradicts f (−1) = f (4) = 2. Similarly, P and Q cannot
be on the right side of L. Thus P and Q are necessarily on opposite sides of L, as
shown below.
[Figure: the points P, Q, and Q′ = Λ(Q), with the line L meeting the x-axis at p.]
Where could P be? Let Λ be the reflection with respect to L. We claim that
Λ(Q) = P. We will prove this by contradiction. Suppose the point Q′ = Λ(Q)
is not equal to P (as shown in the picture). Since L is the line of symmetry of G,
we see that Q′ is also a point of G. Let Q′ = (x′, 2) for some x′ < p, x′ ≠ −1;
the fact that the y-coordinate of Q′ is 2 is because Q′ = Λ(Q) = Λ(4, 2), and
Λ(4, 2) = (m, 2) for some number m by the definition of Λ. Now both Q′ and P
are on the same side of L and are both on G; because f is increasing on (−∞, p], we
have f(x′) < f(−1) or f(−1) < f(x′). But this contradicts f(x′) = f(−1) = 2.
Therefore Λ( Q) = P, which implies that the intersection ( p, 0) of L with the x-axis
must be (1.5, 0) because (1.5, 0) is the midpoint between (−1, 0) and (4, 0).
[Figure: the points P and Q on opposite sides of the line L, which meets the x-axis at 1.5.]
Since the vertex of G must lie on the line of symmetry L, we see that f must
achieve its maximum at 1.5. Moreover, because f is increasing on (−∞, 1.5], we
see that f (1.5) > f (−1) = 2 > 0. Thus f has a positive maximum and therefore,
if we express f in vertex form, f ( x ) = a( x − 1.5)2 + q, then q = f (1.5) > 0 and
q and a have opposite signs. By the observation on page 235, f has two distinct
zeros. We are done.
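The conclusions of this example can be tested on one concrete function. The sketch below is an illustration only: the particular choice $f(x) = -(x + 1)(x - 4) + 2$ is our own assumption (any a < 0 in place of −1 would do), picked because it visibly passes through P = (−1, 2) and Q = (4, 2).

```python
def f(x, a=-1.0):
    """A hypothetical f(x) = a (x + 1)(x - 4) + 2 with a < 0."""
    return a * (x + 1) * (x - 4) + 2

axis = (-1 + 4) / 2        # midpoint of the x-coordinates of P and Q

# f takes equal values at points equidistant from the line x = axis ...
symmetric = all(f(axis - k) == f(axis + k) for k in (0.5, 1.0, 2.5, 4.0))
# ... and its value on the axis exceeds its value elsewhere.
is_max = all(f(axis) > f(axis + k) for k in (-3.0, -0.5, 0.5, 3.0))
# A positive maximum and negative values far out signal two zeros.
sign_change = f(axis) > 0 and f(-10.0) < 0 and f(10.0) < 0
```

For this particular f the maximum value at 1.5 is 8.25, which is indeed greater than 2.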
Let us conclude with a few additional remarks. First, part (ii) of Theorem
10.8 can be made more precise, and the added precision will be important for
applications of quadratic functions to word problems. In (10.22), we have the ex-
plicit values of p and q given in terms of the coefficients a, b, and c of f . Therefore
part (ii) implies the following corollary.
Corollary. Given a quadratic function $f(x) = ax^2 + bx + c$. Then:
(i) If $a > 0$, f achieves a unique minimum at $-\dfrac{b}{2a}$, and the minimum
value $f\left(-\dfrac{b}{2a}\right)$ is $\dfrac{4ac - b^2}{4a}$.
(ii) If $a < 0$, f achieves a unique maximum at $-\dfrac{b}{2a}$, and the maximum
value $f\left(-\dfrac{b}{2a}\right)$ is $\dfrac{4ac - b^2}{4a}$.
(10.25) $r_1 + r_2 = -\dfrac{b}{a}$ and $r_1 r_2 = \dfrac{c}{a}$.
Furthermore,
Once again, we point out that, by (10.26), all quadratic functions with the
same two zeros r1 and r2 are equal to a constant times ( x − r1 )( x − r2 ).
Theorem 10.9 needs the assumption of a nonnegative discriminant because if
the discriminant is negative, the function f has no zeros and it would not make
sense to talk about r1 and r2 . However, once we have complex numbers, there
will always be zeros for complex quadratic polynomial functions f and Theorem
10.9 will be true verbatim without any assumption about the discriminant. See,
e.g., Chapter 11 in Volume II of [Wu-HighSchool].
Finally, let us observe that we have by now obtained at least three different
representations of a quadratic function:
(1) Its standard representation: f ( x ) = ax2 + bx + c.
(2) Its representation in vertex form: f ( x ) = a( x − p)2 + q.
(3) Its representation in factored form: f ( x ) = a( x − r1 )( x − r2 ), where r1 and
r2 are the zeros of f .
Each of the three representations of a quadratic function reveals a different facet of the function.
Each is important in its own right: the quadratic formula is expressed in terms of (1), for
example, and (2) displays the line of symmetry of the graph of f and also where it achieves its
maximum or minimum. If the zeros of f are our main interest, then (3) displays its zeros
explicitly. Together, the three representations give a well-rounded picture of f; none gets it
done alone.
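To see how the three representations describe one and the same function, here is a minimal Python sketch that passes from the standard representation to the other two. The function names are hypothetical, floating-point arithmetic is assumed, and the factored form is returned only when the discriminant is nonnegative.

```python
import math


def vertex_form(a, b, c):
    """Return (a, p, q) with a*x^2 + b*x + c == a*(x - p)^2 + q."""
    p = -b / (2 * a)
    q = (4 * a * c - b * b) / (4 * a)
    return a, p, q


def factored_form(a, b, c):
    """Return (a, r1, r2) with a*x^2 + b*x + c == a*(x - r1)*(x - r2),
    or None when the discriminant is negative (no real zeros)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return a, (-b - root) / (2 * a), (-b + root) / (2 * a)


def standard(a, b, c, x):
    """Evaluate the standard representation at x."""
    return a * x * x + b * x + c
```

For f(x) = 2x^2 − 8x + 6, the sketch gives the vertex form 2(x − 2)^2 − 2 and the factored form 2(x − 1)(x − 3).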
Needless to say, the more representations we have of a concept, the more we
can claim to understand it. However, these representations in mathematics are not
randomly put together. There is always a clear logical relationship between them.
For example, in our situation, (1) was the starting point, i.e., the definition of a
quadratic function, and (2) and (3) were obtained only after serious hard work:
see Theorem 10.7 on page 246 and Theorem 10.3 on page 234, respectively.
Recently, it has become acceptable practice to make amassing different rep-
resentations of a concept a goal in itself, with no imperative to show any logical
interrelationship among them. For example, the concept of a fraction is supposed
to be a part-of-a-whole, a division, a ratio, etc. (see Section 1.1 in [Wu-PreAlg]),
and it is never clear which is the starting point and how the other representations
are related to the starting point, logically speaking. When this happens, it is TSM
and not mathematics. You may wish to stay alert to this fact.
252 10. QUADRATIC FUNCTIONS AND THEIR GRAPHS
Exercises 10.3
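[Figure: a parabola with focus A and directrix L; a point P on the parabola lies at the same distance r from A and from L; the points O and B lie on the line through A perpendicular to L.]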
Let the line passing through the focus A and perpendicular to the directrix L
intersect G at a point O and L at a point B. O is called the vertex of the parabola
G. It will be seen from Theorem 10.11 below that this use of vertex does not conflict
with the use of the same word on page 247. Also note that from the definition of
the parabola, we have | AB| = 2 | AO|. The length of the segment AO is called the
focal length of the parabola G.
Before proceeding any further, it would be a good idea to first acquire some
intuition for parabolas. We will describe a simple way to construct points on a
parabola with a given focus and a given directrix, in much the same way that we
plot points on the graph of a given function. Thus draw a point A (the focus)
and a line L (the directrix), and we will describe how to draw as many points
equidistant from A and L as we wish.
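[Figure: constructing points of the parabola G with focus A and directrix L by intersecting lines parallel to L with circles centered at A.]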
Let B be the point of intersection of L with the line passing through A and
perpendicular to L. The midpoint O of AB is of course equidistant from A and L. For
other such points, draw a line L′ parallel to L and lying in the same half-plane
of L as the focus A. The distance d of L′ from L (see distance between parallel lines
on page 267) should be so large that d > |OB|. Now draw a circle of radius d
and centered at A. Let the circle intersect L′ at P and P′. Then P and P′ are both
equidistant from L and A. Draw many such pairs of P and P′, and the totality
of these points is the parabola G with focus A and directrix L. A moment of
reflection will show that the line L_{AB} is a line of bilateral symmetry of G.
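The construction just described can be imitated numerically. The Python sketch below makes a simplifying assumption that is not in the text: the directrix is horizontal, given by y = directrix_y. The function name `parabola_points` is ours. For each distance d it intersects the parallel line at distance d from the directrix with the circle of radius d around the focus.

```python
import math


def parabola_points(focus, directrix_y, d):
    """Return the two points at distance d from both the focus and the
    horizontal directrix y = directrix_y (the same point twice if d = |OB|)."""
    ax, ay = focus
    gap = abs(ay - directrix_y)          # this is |AB|
    if gap == 0:
        raise ValueError("the focus must not lie on the directrix")
    if d < gap / 2:                      # |OB| = |AB| / 2
        raise ValueError("d must be at least |OB|")
    side = 1 if ay > directrix_y else -1
    y = directrix_y + side * d           # parallel line on the focus's side
    # Circle (x - ax)^2 + (y - ay)^2 = d^2 meets that line at ax +/- half_chord.
    half_chord = math.sqrt(d * d - (y - ay) ** 2)
    return (ax - half_chord, y), (ax + half_chord, y)
```

With focus (0, 1), directrix y = −1, and d = 5, the two points are (−4, 4) and (4, 4); each is at distance 5 from the focus and from the directrix.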
Activity
Construct three points on the parabola G with focus (1, 0) and the line de-
fined by y = −1 as its directrix. What is the x-coordinate of the point P on
G if the distance of P from (0, 1) is 3?
at Harvard that one of his daughter’s teachers told her “the graph of x = y2 is not a parabola”. The
culprit is not the teacher; it is TSM.
10.4. THE GRAPH AND THE PARABOLA 255
so that
(10.27) ϕ(D(S)) = S′.
We are not finished, however, because for G′ to be the parabola with focus A′ and
directrix L′, we must show that G′ contains all the points equidistant from A′ and
L′. Therefore it remains to show that if P′ is a point equidistant from A′ and L′,
then P′ lies in G′. Now let P be the point in the plane so that ϕ(P) = P′. Because ϕ
is a congruence and is therefore distance-preserving, we see that P is equidistant
from A (because ϕ(A) = A′) and L (because ϕ(L) = L′). Since G contains all the
Then the distance of P from A is equal to the distance of P from L; call this
common distance d. By Theorem 4.4 in [Wu-PreAlg] (see page 271 in this volume),
the distance of P′ to either A′ or L′ is rd. In particular, P′ is equidistant from A′
and L′, as claimed. It remains to prove that if a point P′ is equidistant from A′ and
L′, then P′ lies in G′ = D(G). Let P be the point in the plane so that D(P) = P′.
Then we know (again by Theorem 4.4 in [Wu-PreAlg]) that |P′A′| = r|PA|, and
the distance from P′ to L′ is also r times the distance from P to L. Because P′
is equidistant from A′ and L′, we conclude that P is also equidistant from A
and L. Since G is the parabola with A as focus and L as directrix, P lies in G,
and therefore P′ (= D(P)) lies in G′ (= D(G)). The proof of Lemma 10.13 is
complete.
The proof of part (i) of Theorem 10.10 is now immediate: Suppose a parabola
G is similar to G′; then we have to prove that G′ is a parabola. Let ϕ(D(G)) = G′,
where ϕ is a congruence and D is a dilation. By Lemma 10.13, D(G) is a parabola.
By Lemma 10.12, ϕ(D(G)) is also a parabola. In other words, G′ = ϕ(D(G)) is
a parabola. The proof of part (i) of Theorem 10.10 is complete.
Remark. The preceding proof of part (i) of Theorem 10.10 implicitly assumes
the so-called symmetry of the similarity relation, i.e., if G ∼ G′, then also G′ ∼ G
(see Section 4.7 of [Wu-PreAlg] for this discussion). In greater detail, we have
just shown that if a parabola G is similar to G′, then G′ is a parabola. However,
we should have also proved that if G is similar to a parabola G′, then G is a
parabola. The reason we did not address this issue is that we have been assuming
all along that this symmetry is valid and have avoided the proof of this fact because
the proof is unpleasant as well as noninstructive; such a proof can be found in
Section 5.4 in Volume I of [Wu-HighSchool].
Proof of Theorem 10.10. (cont.) The proof of part (ii) of the theorem makes use
of the following lemma which is of independent interest. Recall that on page 253,
we defined the concept of the focal length of a parabola, which is the distance from
the focus to the vertex of the parabola.
Lemma 10.14. Two parabolas with the same focal length are congruent.
To prove the lemma, let G be the parabola with focus A and directrix L, and
let G′ be the parabola with focus A′ and directrix L′. By hypothesis, G and G′
have the same focal length. Let the line passing through A (respectively, A′) and
perpendicular to L (resp., L′) intersect L at B (resp., L′ at B′).
Since |AB| is twice the focal length of G and |A′B′| is twice the focal length of G′,
we see that |AB| = |A′B′|. Thus there is a congruence ϕ so that ϕ(A) = A′ and
ϕ(B) = B′. Since a congruence preserves degrees of angles, the fact that L ⊥ L_{AB}
implies that ϕ(L) ⊥ ϕ(L_{AB}). But ϕ(L_{AB}) = L_{A′B′}, therefore ϕ(L) ⊥ L_{A′B′}. Since
also L′ ⊥ L_{A′B′}, we see that ϕ(L) and L′ are two lines that are both perpendicular
to L_{A′B′} at the point B′. Therefore ϕ(L) = L′.
We claim:
ϕ(G) = G′.
We are now in a position to prove part (ii) of Theorem 10.10. Given two
parabolas G and G′, we will prove that there is a similarity that maps G to G′.
Let the focus and directrix of G be A and L, and let the focus and directrix
of G′ be A′ and L′. Furthermore, let the line passing through A (respectively, A′)
and perpendicular to L (resp., L′) intersect L (resp., L′) at the point B (resp., B′),
as shown below.
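[Figure: the foci and the directrices L and L′ of the two parabolas, with B and B′ the feet of the perpendiculars from the foci to the directrices.]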
Let r = |A′B′|/|AB|. Let D be the dilation with center at B and scale factor r.
Let $\bar{A} = D(A)$, $\bar{B} = D(B)$, $\bar{L} = D(L)$, and $\bar{G} = D(G)$. Observe that $\bar{L} = L$
and $\bar{B} = B$. By Lemma 10.13 on page 255, $\bar{G}$ is a parabola with focus $\bar{A}$ and
directrix $\bar{L}$. We now compute the focal length of $\bar{G}$: it is (compare Theorem 4.4 of
[Wu-PreAlg] on page 271)
$\frac{1}{2}|\bar{A}\bar{B}| = \frac{1}{2}\, r\, |AB| = \frac{1}{2} \cdot \frac{|A'B'|}{|AB|} \cdot |AB| = \frac{1}{2}|A'B'|.$
Thus $\bar{G}$ and G′ have the same focal length and, by Lemma 10.14, there is a
congruence ϕ so that $\phi(\bar{G}) = G'$. In other words, G′ = ϕ(D(G)). Therefore, ϕ ◦ D
is the desired similarity. The proof of Theorem 10.10 is complete.
Before we give the proof of Theorem 10.11, some motivation for the proof
may shed light on the proof itself. Consider the graph Ga of Fa ( x ) = ax2 for
some constant a. If we believe Theorem 10.11, then Ga must be a parabola. Since
a parabola is defined in terms of a focus and a directrix, one naturally wants to
know which point is the focus and which line is the directrix of Ga ? Since the
y-axis is the line of symmetry for Ga , we should look for the focus of Ga among
points along the y-axis, i.e., points of the form (0, k). For simplicity, we assume
a > 0. Then k > 0, and since the origin O should be equidistant from (0, k) and
the directrix, it follows that the directrix of Ga has to be the line y = −k. We will
now determine this k.
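[Figure: the graph G_a with a point P = (x, y) at the same distance r from A = (0, k) and from the line L = {y = −k}.]
If P = (x, y) is a point on G_a, then equating the square of its distance from L with
the square of its distance from A gives
$(y + k)^2 = x^2 + (y - k)^2.$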
Expanding, we find:
$y^2 + 2ky + k^2 = x^2 + y^2 - 2ky + k^2,$
which becomes
$y = \frac{1}{4k}x^2$ for all (x, y) on G_a.
Since G_a is the graph of $F_a(x) = ax^2$, we also have $y = ax^2$ for all (x, y) on G_a.
Comparing these two equations, $y = \frac{1}{4k}x^2$ and $y = ax^2$, we get $a = \frac{1}{4k}$ and
therefore
$k = \frac{1}{4a}.$
Since the focus of G_a is (0, k) and the directrix is the graph of y = −k, we see that
if G_a is a parabola, then
(10.28) its focus should be $(0, \frac{1}{4a})$,
(10.29) its directrix should be the line $\{y = -\frac{1}{4a}\}$.
Note that, if a < 0, then k < 0, but the preceding reasoning and the conclusions
in (10.28) and (10.29) remain valid.
Activity
To this end, we have to prove two things: (i) Every point P on Ga is equidis-
tant from A and L. (ii) Any point equidistant from A and L lies on Ga . Clearly,
what lies ahead is just a straightforward computation.
If P is a point on G_a, then $P = (x, ax^2)$. The distance formula (page 57) implies
that the square of the distance from P to A is:
$x^2 + \left(ax^2 - \frac{1}{4a}\right)^2 = x^2 + a^2x^4 - \frac{1}{2}x^2 + \left(\frac{1}{4a}\right)^2$
$= a^2x^4 + \frac{1}{2}x^2 + \left(\frac{1}{4a}\right)^2$
$= \left(ax^2 + \frac{1}{4a}\right)^2.$
But the square of the distance from $P = (x, ax^2)$ to the horizontal line L defined
by $y = -\frac{1}{4a}$ is simply the square of the difference of their y-coordinates:
$\left(ax^2 - \left(-\frac{1}{4a}\right)\right)^2 = \left(ax^2 + \frac{1}{4a}\right)^2.$
Hence the squares of the two distances are equal and by Corollary 2 on page 209,
the distances themselves are equal. This shows that every point of G_a is equidistant
from A and L. This proves (i).
To prove (ii), suppose P = (x, y) is equidistant from both A and L. Then the
squares of the distances are equal. But the square of the distance from P to A is
$x^2 + (y - \frac{1}{4a})^2$, while the square of the distance from P to L is
$(y - (-\frac{1}{4a}))^2$, i.e., $(y + \frac{1}{4a})^2$. Thus,
$x^2 + \left(y - \frac{1}{4a}\right)^2 = \left(y + \frac{1}{4a}\right)^2.$
This implies
$x^2 + y^2 - \frac{1}{2a}y + \left(\frac{1}{4a}\right)^2 = y^2 + \frac{1}{2a}y + \left(\frac{1}{4a}\right)^2.$
After cancellation and collecting terms, we get $x^2 = \frac{1}{a}y$, so that $y = ax^2$.
Therefore $P = (x, ax^2)$ and P is a point on G_a. This proves (ii) and, therewith, also
Theorem 10.11.
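Part (i) of the proof can be spot-checked numerically. The Python sketch below takes the focus and directrix from (10.28) and (10.29) and compares the two distances for a point (x, ax^2); the function names are ours, and a floating-point check of this kind is an illustration, not a substitute for the proof.

```python
import math


def focus_and_directrix(a):
    """Focus (0, 1/(4a)) and directrix height -1/(4a) of the graph of a*x^2."""
    k = 1 / (4 * a)
    return (0.0, k), -k


def distances(a, x):
    """Distances from (x, a*x^2) to the focus and to the directrix."""
    (fx, fy), directrix_y = focus_and_directrix(a)
    y = a * x * x
    to_focus = math.hypot(x - fx, y - fy)
    to_directrix = abs(y - directrix_y)   # the directrix is horizontal
    return to_focus, to_directrix
```

The check also covers negative a, in agreement with the note following (10.29).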
Exercises 10.4
(1) Let G be the graph of $f(x) = a(x - p)^2 + q$. Locate the focus and find
the equation of the directrix of G.
(2) Write down the explicit similarity that maps the graph of $f(x) = 4x^2 + 12x$
to the graph G_1 of $F_1(x) = x^2$. Do the same with the graph of
$g(x) = -4x^2 + 12x - 9$.
(3) Let A = (p, q + b), and let L be the line {y = q − b}, where b, p, and
q are real numbers. Prove that if a point Q is equidistant from A and L,
then Q must be of the form $\left(x, \frac{1}{4b}(x - p)^2 + q\right)$.
(4) Let D be the dilation centered at the origin O so that D(x, y) = (4x, 4y),
and let T be the translation along the vector from O to (1, −3). If G_2
is the graph of $F_2(x) = 2x^2$, write down the function whose graph is
T(D(G_2)).
(5) Let C be the graph of the quadratic equation $x = \frac{1}{4}y^2 - y$. Prove that C
is a parabola.
given data about the difference between the fraction and its reciprocal has to be
expressed as
$\frac{y}{x} - \frac{x}{y} = \frac{55}{24}.$
Therefore,
$\frac{y^2 - x^2}{xy} = \frac{55}{24}$
and we get $55xy = 24(y^2 - x^2)$, by the cross-multiplication algorithm (see page 270).
We are also given that y − 2x = 2. Therefore,
$55x(2x + 2) = 24\left((2x + 2)^2 - x^2\right).$
Expand both sides to get
$110x^2 + 110x = 72x^2 + 192x + 96.$
After simplifying, we get the quadratic equation in x,
$19x^2 - 41x - 48 = 0.$
One either sees the left side as (x − 3)(19x + 16) or one uses the quadratic
formula. In any case, the latter gives the roots as:
$\frac{41 \pm \sqrt{41^2 + (4 \cdot 19 \cdot 48)}}{38} = \frac{41 \pm 73}{38} = 3 \text{ or } -\frac{32}{38}.$
Only 3 makes sense in context, so we reject the other root. Thus x = 3 and y = 8.
The fraction is therefore $\frac{3}{8}$. (Check: $\frac{8}{3} - \frac{3}{8} = \frac{55}{24}$.)
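The arithmetic of this example can be replayed with exact fractions. The following Python sketch is an illustration only; the variable names are ours, and it assumes, as in the example, that the quadratic equation is $19x^2 - 41x - 48 = 0$ and that y = 2x + 2.

```python
from fractions import Fraction
from math import isqrt

# Coefficients of 19x^2 - 41x - 48 = 0.
a, b, c = 19, -41, -48

disc = b * b - 4 * a * c          # 41^2 + 4*19*48
root = isqrt(disc)                # integer square root of the discriminant
assert root * root == disc        # the discriminant is a perfect square

# Both roots from the quadratic formula, as exact fractions.
roots = [Fraction(-b + root, 2 * a), Fraction(-b - root, 2 * a)]

# Only the positive root makes sense as a numerator.
x = next(r for r in roots if r > 0)
y = 2 * x + 2                     # from the condition y - 2x = 2

# Difference between the reciprocal y/x and the fraction x/y.
difference = y / x - x / y
```

Here `disc` is 5329 = 73^2, the positive root is 3, and `difference` equals 55/24, in agreement with the check at the end of the example.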
Example 2. (Golden ratio) Given a rectangle ABCD and an embedded square
ABEF so that ABCD is similar to the smaller rectangle ECDF, as shown:
[Figure: rectangle ABCD with F on side AD and E on side BC, so that ABEF is a square; |FD| = z and |BC| = y.]
Let |AB| = x, |BC| = y, and |FD| = z. Find the ratio $\frac{y}{x}$.
Since ABCD ∼ ECDF, we have $\frac{y}{x} = \frac{x}{z}$, so that $x^2 = yz$, by the
cross-multiplication algorithm. But from the picture, z = y − |AF| = y − |AB| = y − x,
because ABEF is a square. Therefore we get $x^2 - y(y - x) = 0$, and this is a
quadratic equation in y whose coefficients are expressed in terms of the number x:
(10.30) $y^2 - xy - x^2 = 0.$
In the usual way of writing a quadratic equation as $ax^2 + bx + c = 0$, the y in
(10.30) takes the place of x, a is 1, b is −x, and c is $-x^2$. The solutions by the
quadratic formula are therefore:
$\frac{x \pm \sqrt{x^2 + 4x^2}}{2} = \frac{1}{2}(1 \pm \sqrt{5})\,x.$
Since $1 - \sqrt{5} < 0$ and x and y are positive numbers, $\frac{1}{2}(1 - \sqrt{5})\,x$ is a spurious
solution in this context. Therefore,
$\frac{y}{x} = \frac{1}{2}(1 + \sqrt{5}) \approx 1.6.$
The number $\frac{1}{2}(1 + \sqrt{5})$ is called the golden ratio; next to 0, 1, π, e, and i, it
may be the most famous number in mathematics. It has a habit of coming up in
unexpected situations, e.g., in the discussion of Fibonacci numbers; see the article
[Wiki-goldenratio] for a general introduction to the golden ratio and for further
references. It has been a tradition to make index cards so that the ratio of their
side lengths is roughly the golden ratio; be sure to verify this fact to your own
satisfaction as soon as possible.
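As a quick numerical companion to Example 2, the Python sketch below starts from a side length x, sets y to the golden ratio times x, and recovers z = y − x. The names `PHI` and `rectangle_sides` are ours; the point is only to confirm in floating point that the similarity condition and equation (10.30) hold for this choice of y.

```python
import math

# The golden ratio (1 + sqrt(5)) / 2.
PHI = (1 + math.sqrt(5)) / 2


def rectangle_sides(x):
    """Given |AB| = x, return (y, z) with y = PHI * x and z = y - x."""
    y = PHI * x
    z = y - x
    return y, z
```

For x = 3 one can check that y/x and x/z agree and that $y^2 - xy - x^2$ vanishes up to rounding error.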
Example 3. If an object is thrown directly upwards from a height of h meters
from the ground with an initial velocity of v0 m/sec, then its distance (in meters)
f (t) above the ground t seconds after it is thrown is
$f(t) = -4.9t^2 + v_0 t + h.$
(This follows from Newton’s second law and the law of universal gravitation.)
Now if h = 20 meters and v0 = 2 m/sec, what is the highest point of the object
above the ground, when does it get there, and when does it hit the ground?
The highest point above the ground is the maximum of the quadratic function
$f(t) = -4.9t^2 + 2t + 20$. We can make use of the Corollary on page 250 to get the
maximum. It is
$\frac{4(-4.9)(20) - 2^2}{4(-4.9)} = \frac{(4.9)20 + 1}{4.9} = 20\tfrac{10}{49}$ meters.
However, it would be futile, not to say impossible, to keep that corollary in acces-
sible memory all the time; there are more important things to memorize in one’s
life. Form the habit right now of just completing the square to get the vertex form until
you are fluent in this skill, as follows:
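$f(t) = -4.9t^2 + 2t + 20 = (-4.9)\left(t^2 - \frac{2}{4.9}t\right) + 20$
$= (-4.9)\left(t^2 - \frac{2}{4.9}t + \left(\frac{1}{4.9}\right)^2\right) + \frac{1}{4.9} + 20$
$= (-4.9)\left(t - \frac{1}{4.9}\right)^2 + \frac{10}{49} + 20.$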
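The three questions of Example 3 (highest point, time of the highest point, time of impact) can be answered numerically from the vertex form and the quadratic formula. The Python sketch below is an illustration under the example's assumptions h = 20 and v0 = 2; the function names are ours.

```python
import math


def height(t, v0=2.0, h=20.0):
    """Height above the ground after t seconds: -4.9 t^2 + v0 t + h."""
    return -4.9 * t * t + v0 * t + h


def peak(v0=2.0, h=20.0):
    """Time and height of the highest point, from the vertex t = v0 / 9.8."""
    t_peak = v0 / 9.8
    return t_peak, height(t_peak, v0, h)


def landing_time(v0=2.0, h=20.0):
    """Positive zero of the height function, from the quadratic formula."""
    disc = v0 * v0 + 4 * 4.9 * h
    return (v0 + math.sqrt(disc)) / 9.8
```

With the default values, the peak occurs at t = 1/4.9 second (about 0.2 second) at a height of 20 + 10/49 meters, and the positive zero of f is roughly 2.23 seconds.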
Exercises 10.5
(1) A rectangle has a perimeter of 180 linear units and an area of 1800 area
units. What are its dimensions?
(2) A hifi store sells only 35 CD players of a particular brand each month
when the price is marked up to make a profit of $50 per player. Suppose
the store decides to change the price in integer multiples of 2 dollars,
and it is known that (roughly) for each $2 decrease in the price, the store
can sell 5 more players. What should the price be per player in order to
maximize total monthly profit? What will the total profit be per month?
(3) Find two numbers whose difference is 7, and the difference of their cubes
is 721.
(4) A merchant has a cask full of wine. He draws out 6 gallons and fills the
cask with water. Again he draws out 6 gallons, and fills the cask with
water. There are now 25 gallons of pure wine in the cask. How many
gallons does the cask hold?
(5) Two workmen can do a piece of work (think of painting a house) together
in 6 days. In how many days can each do it alone if it takes one of them
5 days longer than the other? (Assume both work at a constant rate and
that they do not interfere with each other.)
(6) George drove from town A to town B at an average speed of x mph. On
the way back along the same road from Town B to town A, he ran into
rush hour traffic and his average speed slowed down to ( x − 10) mph.
The driving round trip took (about) an hour and fifteen minutes. If the
driving distance between towns A and B is 30 miles, what is x (rounded
to the nearest one)?
(7) Two trains go from City A to City B at constant rate; the distance between
the cities is 200 miles. The second train starts one hour later than the first,
but, traveling 5 mph faster, gets to City B only 30 minutes later than the
first train. Find the time of travel for each train. (Idealize both trains to
be a single point in your reasoning.)
(8) A tank can be filled by the larger of two faucets in 5 hours less time than
by the smaller one. It is filled by them both together in 6 hours. If the
water flows from the faucets at a constant rate, how many hours will it
take to fill the tank by each faucet separately?
Appendix: Facts from [Wu-PreAlg]
Part 1. Assumptions
Part 2. Definitions
Part 3. Theorems and Lemmas
The section in [Wu-PreAlg] where each item first appears is indicated parenthetically at
the end of that item.
Part 1. Assumptions
Fundamental Assumption of School Mathematics (FASM). We can add and
multiply real numbers, and the laws of operations for both addition and
multiplication (associative, commutative, and distributive), the formu-
las (a)–(d) for rational quotients (page 270), and the basic facts about
inequalities (A)–(E) for rational numbers (page 269) continue to be valid
when the rational numbers are replaced by real numbers. (Section 2.7)
(Iso1). Translations, reflections, and rotations preserve lengths of segments and
degrees of angles. (Section 4.4)
(Iso2). Under a translation or reflection or rotation, the image of a line is a line,
the image of a segment is a segment, and the image of a ray is a ray.
(Section 4.4)
Part 2. Definitions
Average speed. For an object in motion, its average speed over the time interval
from t_1 to t_2, t_1 < t_2, is
(distance traveled from t_1 to t_2) / (t_2 − t_1).
(Section 1.9)
Basic isometry. In the plane, a basic isometry refers to a translation, a rotation,
or a reflection. (Section 4.4)
Between. Given a line L in the plane, let P and Q be two points on L. A point S
is said to be between P and Q if S lies on L and if, when we make L into
a number line, either P < S < Q or Q < S < P holds. The fact that one
and only one of these inequalities holds is independent of the way L is
made into a number line. (Section 4.4)
Bilateral symmetry. A geometric figure S is said to have bilateral symmetry with
respect to a line L if the reflection Λ across L has the property that
Λ(S) = S. Equivalently, S is symmetric with respect to L if Λ maps every
point of S to a point of S (this is because Λ ◦ Λ = identity transforma-
tion). The line L is called the line of symmetry or the axis of symmetry.
(Section 4.4)
Binomial coefficients. Let n and k be whole numbers. Then the binomial coefficient
$\binom{n}{k}$, for k satisfying 0 ≤ k ≤ n, is the whole number
$\binom{n}{k} = \frac{n!}{(n - k)!\,k!}.$
(Exercises 1.4)
Closed interval. Let a and b be two numbers so that a < b. Then the closed
interval [ a, b] is the set of all numbers x satisfying a ≤ x ≤ b. (Section
2.6)
Closed half-plane. It is the union of a half-plane of a line together with the line
itself. (Section 4.4)
Complex fraction. A complex fraction is a fraction obtained by a division $\frac{A}{B}$ of
two fractions A and B (B > 0). We continue to call A and B the numerator
and denominator of $\frac{A}{B}$, respectively. (Section 1.7)
Congruence. A congruence is a transformation of the plane that is the composi-
tion of a finite number of reflections, rotations, and translations. (Section
4.5)
Congruent figures. A geometric figure S is congruent to another geometric figure
S′, in symbols, S ≅ S′, if there is a congruence ϕ so that ϕ(S) = S′.
(Section 4.5)
Constant speed. An object in motion is said to have constant speed if the average
speed of the motion over any time interval (see page 266) is equal to a
fixed constant. This fixed constant is then called the (constant) speed of
the motion. (Section 1.9)
Corresponding angles of a transversal. A pair of angles formed when two paral-
lel lines are intersected by a transversal are called corresponding angles
if they are obtained by replacing one angle in a pair of alternate interior
angles (relative to this transversal) by its opposite angle. (Section 4.6)
Dilation. A transformation D of the plane is a dilation with center O and scale factor
r (r > 0) if
(1) D(O) = O.
(2) If P ≠ O, the point D(P), to be denoted by P′, is the point
on the ray R_{OP} so that |OP′| = r|OP|. (Section 4.6)
Distance between parallel lines. Given two parallel lines L1 and L2 , the length
of the segment intercepted on a transversal that is perpendicular to both
L1 and L2 is a constant, and this length is the distance between L1 and
L2 . (Section 5.3)
Equal sets or equal geometric figures. Two geometric figures S and S′ are equal,
in symbols S = S′, if
(i) every point P of S is also a point in S′, and
(ii) every point Q of S′ is also a point in S.
(Section 3.1; Section 4.4)
Exponent. Let b be a nonzero number and let n be a positive integer. Then $b^n$
is by definition equal to b · b · · · b (n times). In this case, n is called
the exponent of $b^n$. [In Chapter 9, this concept of an exponent will be
expanded to include n as a rational number, and even as a real number.]
(Section 1.1)
Figure. See geometric figure.
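Fraction division. If $\frac{m}{n}$ and $\frac{k}{\ell}$ are fractions ($\frac{k}{\ell} \neq 0$), then the division, or
quotient, of $\frac{m}{n}$ by $\frac{k}{\ell}$, in symbols, $\frac{m/n}{k/\ell}$, is the fraction $\frac{a}{b}$ so that
$\frac{m}{n} = \frac{a}{b} \times \frac{k}{\ell}$. (Section 1.6)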
Fraction multiplication. The multiplication of two fractions $\frac{k}{\ell} \times \frac{m}{n}$ is by definition
the length of the concatenation of k parts when $[0, \frac{m}{n}]$ is partitioned
into $\ell$ equal parts. (Section 1.5)
Fraction subtraction. If $\frac{k}{\ell} > \frac{m}{n}$, then the subtraction $\frac{k}{\ell} - \frac{m}{n}$ is by definition the
−q −p 0 p q
(Section 2.1)
Union. The union of a collection of sets consists of all the points which belong
to at least one set in the collection. (Section 5.1)
Vector. A vector $\overrightarrow{AB}$ is a segment AB so that A is the starting point and B is the
endpoint. (Section 2.2)
AA criterion for similarity. If two triangles have two pairs of equal angles, they
are similar. (Section 4.7)
ASA. If two triangles have two pairs of equal angles and the common side of
the angles in one triangle is equal to the corresponding side in the other
triangle, then the triangles are congruent. (Section 4.5)
Basic facts about inequalities in Section 2.6. If x, y, z, . . . are rational numbers,
then:
(A) x < y ⇐⇒ − x > −y.
(B) x < y ⇐⇒ x + z < y + z.
(C) x < y ⇐⇒ x − y < 0.
(D) If z > 0, then x < y ⇐⇒ xz < yz.
(E) If z < 0, then x < y ⇐⇒ xz > yz.
(Section 2.6)
Cancellation law for rational quotients. If x, y, and z are rational numbers, and
y, z ≠ 0, then,
$\frac{x}{y} = \frac{zx}{zy}.$
(Section 2.5)
$\frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|},$
then △ABC ∼ △A′B′C′. (Section 4.7)
SSS. If the three sides of a triangle and the three corresponding sides of an-
other triangle are pairwise equal, then the two triangles are congruent.
(Section 4.5)
Theorem 1 in the Appendix of Chapter 1. For any finite collection of numbers,
the sums obtained by adding them up in any order are all equal. (Section
1.11)
Theorem 2 in the Appendix of Chapter 1. For any finite collection of numbers,
the products obtained by multiplying them in any order are all equal.
(Section 1.11)
Theorem 4.2. ( a) An isosceles triangle has equal base angles. (b) In an isosceles
triangle, the perpendicular bisector of the base, the angle bisector of the
top angle, the median from the top vertex, and the altitude on the base
all coincide. (Section 4.5)
Theorem 4.4. If D is a dilation with center O and scale factor r, then for any two
points P and Q in the plane, so that P′ = D(P) and Q′ = D(Q) are their
dilated images, we have
|P′Q′| = r|PQ|.
(Section 4.6)
Theorem 4.5. Let D be a dilation with center O and scale factor r, and let P, Q be
two points not collinear with O. Further let P′ denote D(P). Then the
dilated image Q′ of Q is the intersection of line L_{OQ} and the line passing
through P′ and parallel to L_{PQ}. (Section 4.6)
Theorem 4.7. Alternate interior angles of a transversal with respect to a pair of
parallel lines are equal. The same is true of corresponding angles. (Sec-
tion 4.6)
Theorem 4.9. If two lines have a pair of equal alternate interior angles or
corresponding angles with respect to a transversal, they are parallel. (Section
4.6)
Theorem 4.12. Given two triangles ABC and A′B′C′, their similarity, i.e.,
△ABC ∼ △A′B′C′, implies the following equalities:
|∠A| = |∠A′|, |∠B| = |∠B′|, |∠C| = |∠C′|,
$\frac{|AB|}{|A'B'|} = \frac{|AC|}{|A'C'|} = \frac{|BC|}{|B'C'|}.$
(Section 4.7)
Bibliography
[Birkhoff-MacLane] G. Birkhoff and S. Mac Lane, A Survey of Modern Algebra, 4th Edition, MacMillan,
NY, 1977.
[CCSSM] Common Core State Standards for Mathematics (2010). Retrieved from
http://www.corestandards.org/Math/
[Dolciani] R. G. Brown, M. P. Dolciani, R. H. Sorgenfrey, and W. L. Cole, Algebra. Structure and Method.
Book 1, California Teacher’s Edition, McDougall Litell, Evanston, IL, 2000.
[EngageNY] Grade 8 Mathematics Module 4: Teacher Materials.
https://www.engageny.org/resource/grade-8-mathematics-module-4
[Euclid] Euclid, The Thirteen Books of the Elements, transl. Thomas L. Heath, Volume I (Books I and
II), Dover Publications, New York, NY, 1956.
[Eureka] Eureka Math - Grade 8. http://greatminds.net/maps/math/module-pdfs-v3
[GIMPS] Great Internet Mersenne Prime Search. http://www.mersenne.org/
[Gladwell] M. Gladwell, Outliers: The Story of Success, Little, Brown and Company, New York, NY,
2008.
[Lamon] S. J. Lamon, Teaching Fractions and Ratios for Understanding, Lawrence Erlbaum, Mahwah,
NJ, 1999.
[MAC] Mathematics Assessment Collaborative, Grade Six Performance Assessment, Spring 2002.
Retrieved June 1, 2013 from http://www.svmimac.org/images/MARS2002_6A.pdf
[Meyer] Dan Meyer, The Math I Learned After I Thought Had Already Learned Math, August 11, 2015.
Retrieved from http://blog.mrmeyer.com/2015/
the-math-i-learned-after-i-thought-had-already-learned-math/
[MSE] Why is a geometric progression called so?, Mathematics Stack Exchange. Retrieved from
http://math.stackexchange.com/questions/1281856/why-is-a-geometric-
progression-called-so
[NCTM] Curriculum and Evaluation Standards for School Mathematics, National Council of Teachers of
Mathematics, Reston, VA, 1989.
[NCTM2000] Principles and standards for school mathematics, National Council of Teachers of Mathemat-
ics, Reston, VA, 2000.
[NMP] National Mathematics Advisory Panel, Foundations for Success: Reports of the Task Groups
and Sub-Committees, U.S. Department of Education, Washington DC, 2008. Retrieved from
http://tinyurl.com/kvxw3zc
[NRC] Adding It Up, National Research Council, The National Academy Press, Washington DC,
2001.
[Post-Behr-Lesh] T. Post, M. Behr, and R. Lesh, Proportionality and the development of pre-algebra under-
standing, in The Idea of Algebra, K–12 (1998 Year Book of the National Council of Teachers
of Mathematics), A. F. Coxford and A. P. Shulte, eds., Reston, VA, 1988, pp. 78–90.
[Postelnicu] V. Postelnicu, Student Difficulties with Linearity and Linear Functions and Teachers’ Under-
standing of Student Difficulties, Dissertation, Arizona State University, 2011. Retrieved from
http://repository.asu.edu/attachments/56417/content/
Postelnicu_asu_0010E_10384.pdf
[Postelnicu-Greenes] V. Postelnicu and C. Greenes, Do teachers know what their students know?, National
Council of Supervisors of Mathematics Newsletter 42 (3) (2012), 14–15.
[Robson] E. Robson, Neither Sherlock Holmes nor Babylon: A reassessment of Plimpton 322, Historia Math-
ematica 28 (2001), 167-206.
[Ross] K. A. Ross, Elementary Analysis: The Theory of Calculus, Springer, New York, NY, 1980.
[Siegler-etal.] R. Siegler et al., Developing Effective Fractions Instruction for Kindergarten Through
8th Grade: A Practice Guide (NCEE #2010-4039), Washington DC: NCEE, Institute of Edu-
cation Sciences, U.S. Department of Education, 2010.
http://ies.ed.gov/ncee/wwc/pdf/practice_guides/fractions_pg_093010.pdf
[Stanley] D. Stanley, Proportionality confusion. http://blogs.ams.org/matheducation/2014/11/20/
proportionality-confusion/
[Stump] S. L. Stump, High School Precalculus Students’ Understanding of Slope as Measure, School Sci-
ence and Mathematics 101 (2) (2001), 81-89.
[Teukolsky] R. Teukolsky, Conic sections, an exciting enrichment topic, in Learning and Teaching Ge-
ometry, National Council of Teachers of Mathematics 1987 Yearbook, M. M. Lindquist and
A. P. Shulte, eds., National Council of Teachers of Mathematics, Reston, VA, 1987, pp. 155-
174.
[Wiki-AGM] Inequality of arithmetic and geometric means, Wikipedia. https://en.wikipedia.org/
wiki/Inequality_of_arithmetic_and_geometric_means
[Wiki-conic] Conic Sections, Wikipedia. http://en.wikipedia.org/wiki/Conic_section
[Wiki-cryptography] Public-key cryptography, Wikipedia. http://en.wikipedia.org/wiki/
Public-key_cryptography
[Wiki-floorfunction] Floor and ceiling functions, Wikipedia. https://en.wikipedia.org/wiki/
Floor_and_ceiling_functions
[Wiki-GIMPS] Great Internet Mersenne Prime Search, Wikipedia. http://en.wikipedia.org/
wiki/Great_Internet_Mersenne_Prime_Search
[Wiki-goldenratio] Golden ratio, Wikipedia. Retrieved from http://en.wikipedia.org/wiki/
Golden_ratio
[Wu2004] H. Wu, “Order of operations” and other oddities in school mathematics, 2004. Retrieved from
http://math.berkeley.edu/~wu/order5.pdf
[Wu2006] H. Wu, How mathematicians can contribute to K-12 mathematics education, Proceedings of In-
ternational Congress of Mathematicians, Madrid 2006, Volume III, European Mathematical
Society, Zürich, 2006, pp. 1676-1688. Also http://math.berkeley.edu/~wu/ICMtalk.pdf
[Wu2010a] H. Wu, Pre-Algebra (Draft of textbook for teachers of grades 6-8) (April 21, 2010). Retrieved
from http://math.berkeley.edu/~wu/Pre-Algebra.pdf
[Wu2010b] H. Wu, Introduction to School Algebra (Draft of textbook for teachers of grades 6-8) (August
14, 2010). Retrieved from http://math.berkeley.edu/~wu/Algebrasummary.pdf
[Wu2011] H. Wu, Understanding Numbers in Elementary School Mathematics, Amer. Math. Soc., Provi-
dence, RI, 2011.
[Wu2013] H. Wu, Potential Impact of the Common Core Mathematics Standards on the Ameri-
can Curriculum, in Mathematics Curriculum in School Education, Yeping Li and
Glenda Lappan, eds., Springer, Berlin-Heidelberg-New York, 2013, pp. 119-143. Also
http://math.berkeley.edu/~wu/Common_Core_on_Curriculum_1.pdf
[Wu2015] H. Wu, Textbook School Mathematics and the preparation of mathematics teachers. Retrieved from
https://math.berkeley.edu/~wu/Stony_Brook_2014.pdf
[Wu-PreAlg] H. Wu, Teaching School Mathematics: Pre-Algebra, Amer. Math. Soc., Providence, RI, 2016.
[Wu-HighSchool] H. Wu, Mathematics of the Secondary School Curriculum, I, II, and III (to appear).
This is a systematic exposition of introductory school algebra written specifically
for Common Core era teachers. The emphasis of the exposition is to
give a mathematically correct treatment of introductory algebra. For example,
it explains the proper use of symbols, why “variable” is not a mathematical
concept, what an equation is, what equation solving means, how to define the
slope of a line correctly, why the graph of a linear equation in two variables is
a straight line, why every straight line is the graph of a linear equation in two
variables, how to use the shape of the graph of a quadratic function as a guide
for the study of quadratic functions, how to define a parabola correctly, why the
graph of a quadratic function is a parabola, why all parabolas are similar, etc.
This exposition of algebra makes full use of the geometric concepts of congruence
and similarity, and it justifies why the Common Core Standards on algebra
are written the way they are.