
Mathematics and Applied Mathematics

Mathematical Sciences

Course Notes

WTW 124
Mathematics 124

2022
Semester 2

© Copyright reserved
Compiled by: Dr. M Mabula
Dr. J H van der Walt
Prof. N van Rensburg
Dr. H Wiggins

Edited by: Prof. L M Pretorius


Contents

1 Vectors and a Model for Space
1.1 Algebraic Vectors and Vector Algebra
1.2 The Dot Product and the Norm
1.3 A Mathematical Model for Space
1.4 Lines in Space
1.5 Angles and Angle Measurement
1.6 Planes in Space
1.7 The Cross Product
1.8 Geometric Vectors

2 Matrices and Systems of Linear Equations
2.1 Systems of Equations
2.2 Matrices and Matrix Algebra
2.3 Gauss Elimination for Systems of Linear Equations
2.4 The Inverse of a Matrix
2.5 The Determinant of a Matrix
2.5.1 Definition of the Determinant
2.5.2 Properties of the Determinant
2.5.3 The Determinant and the Inverse
2.6 An Application to Integration

3 The Definition of a Limit
3.1 The Limit at a Point
3.2 The Limit Laws
3.3 One-sided Limits
3.4 Limits at ±∞

4 Integration and Applications
4.1 Integration by Parts
4.2 Integration of Trigonometric Functions
4.3 The Fundamental Theorem of Calculus
4.4 The Integral Mean Value Theorem
4.5 Arc Length
4.6 The Natural Logarithmic and Exponential Functions
4.7 Improper Integrals
4.8 Taylor Polynomials

5 Curves in R2 and R3
5.1 Vector Functions
5.2 Limits and Continuity
5.3 Differentiation of Vector Functions
5.4 Curve Sketching in R2

6 Complex numbers
6.1 Definition and Algebraic Operations
6.2 Modulus of a Complex Number
6.3 Polar Form and de Moivre’s Theorem
6.4 The Complex Exponential
6.5 Roots of Complex Numbers

7 Polynomials over R and C
7.1 Polynomials
7.2 The Division Algorithm and the Factor Theorem
7.3 Roots of polynomials

Appendix A
A.1 Some Theorems You Should Know
A.2 Mathematical Induction

Chapter 1

Vectors and a Model for Space

Most of mathematics originated as attempts to describe aspects of our physical reality.


Indeed, geometry was invented in ancient Egypt, among other places, as an aid for the
measurement of agricultural land along the river Nile. More recently, Newton developed
Differential and Integral Calculus in order to write down the laws of physics in precise
mathematical form.

In this chapter we introduce a mathematical model for three-dimensional space. Consider


first two simple examples. How would we describe the position of a player on a soccer pitch?
If we fix one goal line and one side line of the pitch, we can describe the position of a player
on the field in terms of exactly two numbers; namely, the distance x1 between the player
and the chosen side line, and the distance x2 from the player to the specified goal line.
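[Figure: a soccer pitch showing the distance x1 from the player to the side line and the distance x2 from the player to the goal line.]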

We may therefore represent the position of a player on the pitch as an ordered pair of real
numbers ⟨x1 , x2 ⟩.

Now consider a rectangular room. By fixing two non-opposing walls, we describe the ‘po-
sition’ of any ‘point’ in the room in terms of an ordered triple ⟨x1 , x2 , x3 ⟩ of real numbers.
Here x1 and x2 are the distances from the ‘point’ to the two walls, respectively, while x3 is
the distance from the ‘point’ to the floor of the room. This gives a unique description of
every ‘point’ in the room as an ordered triple of real numbers. These ideas will be made
precise in Section 1.3, after we have introduced the necessary mathematical prerequisites.

1.1 Algebraic Vectors and Vector Algebra

In this section, we formalise some of the ideas presented in the introductory paragraph of
this chapter. At this point our definitions and theorems are purely mathematical, and have
no connection with the physical world.
Definition 1.1.1. Let n be a natural number. An algebraic vector with n components is
an ordered set ⟨x1 , x2 , . . . , xn ⟩ of n real numbers. The numbers x1 , x2 , . . . , xn are called the
components of the algebraic vector.
Remark 1.1.2. For a natural number n, denote the set of all algebraic vectors with n
components by Rn . The elements of Rn are denoted by x̄ = ⟨x1 , x2 , · · · , xn ⟩.
Example 1.1.3. The vectors x̄ = ⟨1, 2, −3⟩ and ȳ = ⟨−2, 0, 7⟩ are elements of R3 , while
z̄ = ⟨−1, 2, 3, −3⟩ is a vector in R4 .
Definition 1.1.4. Two algebraic vectors x̄ = ⟨x1 , x2 , . . . , xn ⟩ and ȳ = ⟨y1 , y2 , . . . , yn ⟩ in Rn
are equal if xi = yi for all i = 1, 2, . . . , n.
Example 1.1.5. The vectors x̄ = ⟨1, 2, −3⟩ and ȳ = ⟨1, −3, 2⟩ in R3 are not equal, since

x3 = −3 ≠ 2 = y3 .

The algebraic vectors z̄ = ⟨1, 1, 1⟩ and w̄ = ⟨1, 1, 1, 1⟩ are not equal, since z̄ ∈ R3 and
w̄ ∈ R4 .

The set Rn is equipped with algebraic operations in a natural way.


Definition 1.1.6. If x̄ = ⟨x1 , . . . , xn ⟩ and ȳ = ⟨y1 , . . . , yn ⟩ are algebraic vectors in Rn , and
α is a real number, then the sum of x̄ and ȳ is the algebraic vector

x̄ + ȳ = ⟨x1 + y1 , x2 + y2 , . . . , xn + yn ⟩.

The scalar multiple αx̄ is the vector

αx̄ = ⟨αx1 , αx2 , . . . , αxn ⟩.

Example 1.1.7. Let x̄ = ⟨2, 3, 9⟩, ȳ = ⟨−2, 4, 1⟩, z̄ = ⟨2, 5, 1, 4⟩ and w̄ = ⟨−4, 0, 2, −1⟩.
Then
x̄ + ȳ = ⟨2, 3, 9⟩ + ⟨−2, 4, 1⟩ = ⟨2 − 2, 3 + 4, 9 + 1⟩ = ⟨0, 7, 10⟩,
z̄ + w̄ = ⟨2, 5, 1, 4⟩ + ⟨−4, 0, 2, −1⟩ = ⟨2 − 4, 5 + 0, 1 + 2, 4 − 1⟩ = ⟨−2, 5, 3, 3⟩
and
7x̄ = 7⟨2, 3, 9⟩ = ⟨7 × 2, 7 × 3, 7 × 9⟩ = ⟨14, 21, 63⟩.
Note that x̄ + z̄ is undefined, since x̄ ∈ R3 and z̄ ∈ R4 .
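
The componentwise operations of Definition 1.1.6 can be sketched in Python as follows. This is only an illustration: the function names add and scale, and the choice of tuples to stand for algebraic vectors, are assumptions made here and are not part of the notes.

```python
def add(x, y):
    """Return the sum x + y of two algebraic vectors given as tuples."""
    if len(x) != len(y):
        # A vector in R^n cannot be added to a vector in R^m when n != m.
        raise ValueError("sum is undefined: different numbers of components")
    return tuple(xi + yi for xi, yi in zip(x, y))


def scale(alpha, x):
    """Return the scalar multiple alpha * x, computed component by component."""
    return tuple(alpha * xi for xi in x)


x = (2, 3, 9)
y = (-2, 4, 1)
z = (2, 5, 1, 4)
w = (-4, 0, 2, -1)

print(add(x, y))    # (0, 7, 10)
print(add(z, w))    # (-2, 5, 3, 3)
print(scale(7, x))  # (14, 21, 63)
```

Calling add(x, z) raises an error, which mirrors the fact that x̄ + z̄ is undefined.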

Before we investigate the properties of vector addition and scalar multiplication, we intro-
duce the following.

Definition 1.1.8. The zero vector in Rn is the algebraic vector
0̄ = ⟨0, 0, . . . , 0⟩.
For an algebraic vector x̄ ∈ Rn , −x̄ is the algebraic vector
−x̄ = ⟨−x1 , −x2 , . . . , −xn ⟩.
The difference between algebraic vectors x̄ and ȳ in Rn is the algebraic vector
x̄ − ȳ = x̄ + (−ȳ).

Vector addition and scalar multiplication satisfy a number of familiar properties, as we show
in the following theorem.
Theorem 1.1.9. If x̄, ȳ and z̄ are algebraic vectors in Rn , and α and β are real numbers,
then the following hold.

(1) Vector addition is commutative: x̄ + ȳ = ȳ + x̄.


(2) Vector addition is associative: x̄ + (ȳ + z̄) = (x̄ + ȳ) + z̄.
(3) 0̄ is the additive identity: x̄ + 0̄ = x̄.
(4) −x̄ is the additive inverse of x̄: x̄ + (−x̄) = 0̄.
(5) Scalar multiplication is associative: α(β x̄) = (αβ)x̄.
(6) 1 is the multiplicative identity: 1x̄ = x̄.
(7) (−1)x̄ = −x̄.
(8) 0x̄ = 0̄.
(9) α0̄ = 0̄.
(10) Scalar multiplication is distributive over vector addition: α(x̄ + ȳ) = αx̄ + αȳ.
(11) Scalar multiplication is distributive over addition in R: (α + β)x̄ = αx̄ + β x̄.

Proof of (10). For the sake of simplicity of notation, we give a proof for algebraic vectors
in R3 . Consider x̄, ȳ ∈ R3 and a real number α. Then
α(x̄ + ȳ) = α⟨x1 + y1 , x2 + y2 , x3 + y3 ⟩ [Definition of vector addition]

= ⟨α(x1 + y1 ), α(x2 + y2 ), α(x3 + y3 )⟩ [Definition of scalar multiplication]

= ⟨αx1 + αy1 , αx2 + αy2 , αx3 + αy3 ⟩ [Distributive law in R]

= ⟨αx1 , αx2 , αx3 ⟩ + ⟨αy1 , αy2 , αy3 ⟩ [Definition of vector addition]

= αx̄ + αȳ. [Definition of scalar multiplication]

The proofs of the remaining properties of vector addition and scalar multiplication listed
in Theorem 1.1.9 use essentially similar arguments, and are therefore given as exercises, see
Exercise 1.1 number 4.
Example 1.1.10. Let x̄ = ⟨−2, 3, 1⟩, ȳ = ⟨7, 0, 5⟩ and z̄ = ⟨4, 1, 8⟩. We calculate

5(x̄ + 2ȳ) and x̄ − 3(x̄ + z̄).

Solution. By Theorem 1.1.9 (10) and (5),

5(x̄ + 2ȳ) = 5x̄ + 5(2ȳ)

= 5x̄ + 10ȳ

= ⟨−10, 15, 5⟩ + ⟨70, 0, 50⟩

= ⟨60, 15, 55⟩.

Using Theorem 1.1.9 (10) and (2) we find that

x̄ − 3(x̄ + z̄) = x̄ − (3x̄ + 3z̄)

= (x̄ − 3x̄) − 3z̄

= −2x̄ − 3z̄

= ⟨4, −6, −2⟩ + ⟨−12, −3, −24⟩

= ⟨−8, −9, −26⟩.

It is shown in subsequent sections how algebraic vectors in R3 , addition and scalar multi-
plication are used to give a mathematical description of space, and certain objects in space.

Exercise 1.1

1. Let v̄ = ⟨3, −6, 7⟩, x̄ = ⟨2, 1, 2⟩, ȳ = ⟨−1, 8, 1⟩, z̄ = ⟨−2, 3, 0, 2⟩, w̄ = ⟨9, −2, 1, 1⟩.
Calculate each of the following algebraic vectors, if it is defined. If the algebraic
vector is not defined, explain why.
(a) 5x̄ − 2ȳ (b) v̄ + 6(ȳ − x̄) (c) z̄ − 2(x̄ + ȳ) (d) 2x̄ − 7(v̄ + 3ȳ)
(e) 2 + x̄ (f) 3z̄ − 2(w̄ + z̄) (g) 6(3x̄ + ȳ − 2v̄) (h) 7ȳ − 2x̄ + 3v̄
(i) x̄ + (v̄ − w̄) (j) x̄ + 0ȳ − 2v̄
2. Let x̄ = ⟨1, α, −2⟩, ȳ = ⟨β, 1−β, α⟩ and z̄ = ⟨1, 8, −1⟩ where α and β are real numbers.
Find all values for α and β, if any, for which each of the following equations is true.
(a) 2x̄ + 3ȳ = z̄ (b) x̄ − ȳ = 0̄ (c) αx̄ + 2ȳ = ⟨7, −3, 0⟩

3. If ā, b̄ and c̄ are algebraic vectors in Rn , then ā is a linear combination of b̄ and c̄ if
there are real numbers α and β so that ā = αb̄ + βc̄.
Let b̄ = ⟨−1, 2, 1⟩ and c̄ = ⟨1, 1, 1⟩. Determine whether p̄ = ⟨2, 5, 4⟩, q̄ = ⟨−4, 2, 0⟩
and r̄ = ⟨2, −4, −1⟩ are linear combinations of b̄ and c̄.

4. Prove Theorem 1.1.9 for algebraic vectors in R3 .

5. Use Theorem 1.1.9 to prove the following: If x̄, ȳ and z̄ are algebraic vectors in Rn so
that x̄ + z̄ = ȳ + z̄, then x̄ = ȳ.

6. Prove that if x̄ and ȳ are algebraic vectors in R3 and α is a non-zero real number so
that αx̄ = αȳ, then x̄ = ȳ.

1.2 The Dot Product and the Norm

In this section, we introduce an additional operation on algebraic vectors in Rn ; namely,


the dot product of two vectors. The dot product is used to define the norm of an algebraic
vector in Rn . The meaning of these concepts is discussed in Sections 1.3 and 1.5.

Definition 1.2.1. Let x̄ = ⟨x1 , . . . , xn ⟩ and ȳ = ⟨y1 , . . . , yn ⟩ be algebraic vectors in Rn . The
dot product of x̄ and ȳ is the real number

x̄ · ȳ = Σ_{i=1}^{n} xi yi = x1 y1 + x2 y2 + · · · + xn yn .

Example 1.2.2. Let x̄ = ⟨2, 3, 9⟩, ȳ = ⟨−2, 4, 1⟩, z̄ = ⟨2, 5, 1, 4⟩ and w̄ = ⟨−4, 0, 2, −1⟩.
Then
x̄ · ȳ = 2 × (−2) + 3 × 4 + 9 × 1 = 17
and
z̄ · w̄ = 2 × (−4) + 5 × 0 + 1 × 2 + 4 × (−1) = −10.
Note that x̄ · z̄ is undefined, since x̄ ∈ R3 and z̄ ∈ R4 .
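
As a small illustration of Definition 1.2.1, the dot product can be computed in Python as a sum of componentwise products. The function name dot and the tuple representation are assumptions made for this sketch.

```python
def dot(x, y):
    """Return the dot product of two algebraic vectors given as tuples."""
    if len(x) != len(y):
        # The dot product is only defined for vectors with the same number of components.
        raise ValueError("dot product is undefined: different numbers of components")
    return sum(xi * yi for xi, yi in zip(x, y))


x = (2, 3, 9)
y = (-2, 4, 1)
z = (2, 5, 1, 4)
w = (-4, 0, 2, -1)

print(dot(x, y))  # 17
print(dot(z, w))  # -10
```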

The dot product satisfies the following properties.

Theorem 1.2.3. Let x̄, ȳ and z̄ be algebraic vectors in Rn , and α and β real numbers.
Then the following hold.

(1) Positive definiteness: x̄ · x̄ ≥ 0, and x̄ · x̄ = 0 if and only if x̄ = 0̄.

(2) Commutativity: x̄ · ȳ = ȳ · x̄.

(3) Distributive law: x̄ · (αȳ + β z̄) = α(x̄ · ȳ) + β(x̄ · z̄).

Proof of (1). We give a proof for algebraic vectors in R3 . Let x̄ be an algebraic vector in
R3 . Then
x̄ · x̄ = x1² + x2² + x3² ≥ 0
because x1², x2², x3² ≥ 0.

Assume that x̄ · x̄ = 0. Then

0 = x̄ · x̄ = x1² + x2² + x3². (1.1)

If x̄ ≠ 0̄, then by Definition 1.1.4, xi ≠ 0 for some i = 1, 2, 3. Then

x̄ · x̄ = x1² + x2² + x3² ≥ xi² > 0,

contradicting (1.1). Therefore x̄ = 0̄.

Conversely, if x̄ = 0̄, then x1 = x2 = x3 = 0 by Definition 1.1.4. Hence

x̄ · x̄ = x1² + x2² + x3² = 0.

The proofs of the remaining two properties of the dot product are given as exercises, see
Exercise 1.2 numbers 2 and 3.
Example 1.2.4. Let x̄ = ⟨−2, 3, 1⟩, ȳ = ⟨7, 0, 5⟩ and z̄ = ⟨4, 1, 8⟩. We calculate

x̄ · (2ȳ − z̄) and (x̄ + ȳ) · (x̄ + 2z̄).

Solution. According to Theorem 1.2.3 (3),

x̄ · (2ȳ − z̄) = 2(x̄ · ȳ) − (x̄ · z̄)

= 2(−14 + 5) − (−8 + 3 + 8)

= −21.

Using Theorem 1.2.3 (2) and (3) we find that

(x̄ + ȳ) · (x̄ + 2z̄) = (x̄ + ȳ) · x̄ + 2((x̄ + ȳ) · z̄)

= x̄ · (x̄ + ȳ) + 2(z̄ · (x̄ + ȳ))

= x̄ · x̄ + x̄ · ȳ + 2(z̄ · x̄) + 2(z̄ · ȳ)

= 14 − 9 + 6 + 136

= 147.

Next we introduce the norm of an algebraic vector in Rn .

Definition 1.2.5. The norm of an algebraic vector x̄ = ⟨x1 , . . . , xn ⟩ in Rn is the real number

∥x̄∥ = √(x̄ · x̄) = √(x1² + x2² + · · · + xn²).

Remark 1.2.6. Take note of the following concerning the norm of a vector.

(1) Due to Theorem 1.2.3 (1), the norm ∥x̄∥ of x̄ is well defined for every x̄ ∈ Rn .

(2) If ū is an algebraic vector in Rn so that ∥ū∥ = 1, then we call ū a unit vector. In


particular,
ī = ⟨1, 0, 0⟩, j̄ = ⟨0, 1, 0⟩ and k̄ = ⟨0, 0, 1⟩
are unit vectors in R3 . The set {ī, j̄, k̄} is called the standard basis for R3 . Every
algebraic vector x̄ = ⟨x1 , x2 , x3 ⟩ can be expressed in a unique way in terms of ī, j̄ and
k̄ as
x̄ = x1 ī + x2 j̄ + x3 k̄.

Example 1.2.7. Let x̄ = ⟨1, 2, −3⟩, ȳ = ⟨−2, 0, 7⟩ and z̄ = ⟨−1, 2, 3, −3⟩. Then
∥x̄∥ = √(x̄ · x̄) = √(1 + 4 + 9) = √14,
∥ȳ∥ = √(ȳ · ȳ) = √(4 + 0 + 49) = √53
and
∥z̄∥ = √(z̄ · z̄) = √(1 + 4 + 9 + 9) = √23.
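
The norm of Definition 1.2.5 and the unit-vector test of Remark 1.2.6 (2) can be sketched in Python as below. The names norm and is_unit, and the tolerance used to compare floating-point values, are assumptions of this sketch.

```python
import math


def norm(x):
    """Return the norm of x, the square root of the dot product of x with itself."""
    return math.sqrt(sum(xi * xi for xi in x))


def is_unit(x, tol=1e-12):
    """Return True if the norm of x equals 1 up to a small floating-point tolerance."""
    return math.isclose(norm(x), 1.0, abs_tol=tol)


print(norm((1, 2, -3)))     # the square root of 14, roughly 3.74
print(is_unit((0, 1, 0)))   # True
print(is_unit((1, 1, 0)))   # False
```

Because square roots are computed in floating-point arithmetic, the printed norms are approximations of the exact values √14, √53 and √23.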

One of the most important properties of the norm of an algebraic vector, in relation to the
dot product, is the following result, known as the Cauchy-Schwarz Inequality. As we show
in Section 1.5, it is essential for some of the applications of algebraic vectors.

Theorem 1.2.8 (Cauchy-Schwarz Inequality). Let x̄ and ȳ be algebraic vectors in Rn .


Then |x̄ · ȳ| ≤ ∥x̄∥∥ȳ∥.

Proof. We give a proof for algebraic vectors in R3 . Fix x̄, ȳ ∈ R3 . If ȳ = 0̄, then

|x̄ · ȳ| = |x1 (0) + x2 (0) + x3 (0)| = 0

and ∥ȳ∥ = √(ȳ · ȳ) = 0 by Theorem 1.2.3 (1). Therefore

|x̄ · ȳ| = 0 = ∥x̄∥∥ȳ∥.

Now assume that ȳ ≠ 0̄. Then ∥ȳ∥² = ȳ · ȳ ≠ 0 by Theorem 1.2.3 (1). Hence

α = (x̄ · ȳ)/∥ȳ∥²

is a well defined real number. By Theorem 1.2.3 (3) we have
(x̄ − αȳ) · (x̄ − αȳ) = (x̄ − αȳ) · x̄ + (x̄ − αȳ) · (−αȳ)

= (x̄ − αȳ) · x̄ − α((x̄ − αȳ) · ȳ)

= x̄ · (x̄ − αȳ) − α(ȳ · (x̄ − αȳ))

= x̄ · x̄ − α(x̄ · ȳ) − α(ȳ · x̄) + α²(ȳ · ȳ).


It now follows from Theorem 1.2.3 (2) that
(x̄ − αȳ) · (x̄ − αȳ) = ∥x̄∥² − 2α(x̄ · ȳ) + α²∥ȳ∥²

= ∥x̄∥² − 2(x̄ · ȳ)²/∥ȳ∥² + (x̄ · ȳ)²/∥ȳ∥²

= ∥x̄∥² − (x̄ · ȳ)²/∥ȳ∥².
By Theorem 1.2.3 (1) we have
0 ≤ (x̄ − αȳ) · (x̄ − αȳ) = ∥x̄∥² − (x̄ · ȳ)²/∥ȳ∥².
Therefore
(x̄ · ȳ)²/∥ȳ∥² ≤ ∥x̄∥²
so that
(x̄ · ȳ)² ≤ ∥x̄∥²∥ȳ∥².
Hence |x̄ · ȳ| = √((x̄ · ȳ)²) ≤ √(∥x̄∥²∥ȳ∥²) = ∥x̄∥∥ȳ∥.
Remark 1.2.9. Let x̄ and ȳ be algebraic vectors in Rn . According to the Cauchy-Schwarz
Inequality,
|x̄ · ȳ| ≤ ∥x̄∥∥ȳ∥.
This inequality includes two possibilities; namely,
|x̄ · ȳ| < ∥x̄∥∥ȳ∥ or |x̄ · ȳ| = ∥x̄∥∥ȳ∥.
In applications, it is important to know when equality holds in the Cauchy-Schwarz Inequal-
ity; that is, when |x̄ · ȳ| = ∥x̄∥∥ȳ∥. We have
|x̄ · ȳ| = ∥x̄∥∥ȳ∥ if and only if ȳ = αx̄ or x̄ = αȳ for some α ∈ R,
see Exercise 1.2 number 6.
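
The two cases in Remark 1.2.9 can be checked numerically. The sketch below works with the squared form (x̄ · ȳ)² ≤ ∥x̄∥²∥ȳ∥² of the inequality, which is equivalent because both sides of the Cauchy-Schwarz Inequality are non-negative, and which keeps the arithmetic exact for integer components. The function names and the sample vectors are choices made for this illustration.

```python
def dot(x, y):
    """Return the dot product of two vectors with the same number of components."""
    return sum(xi * yi for xi, yi in zip(x, y))


def cauchy_schwarz_gap(x, y):
    """Return (x.x)(y.y) - (x.y)^2, which is non-negative by Cauchy-Schwarz."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2


x = (2, 3, 9)
print(cauchy_schwarz_gap(x, (-2, 4, 1)))     # 1685, so the inequality is strict
print(cauchy_schwarz_gap(x, (-6, -9, -27)))  # 0, equality because y = -3x
```

A gap of zero corresponds to equality in the Cauchy-Schwarz Inequality, and a positive gap to strict inequality.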

Using the properties of the dot product given in Theorem 1.2.3, and the Cauchy-Schwarz
Inequality, Theorem 1.2.8, we obtain the following properties of the norm.

Theorem 1.2.10. If x̄ and ȳ are algebraic vectors in Rn , and α is a real number, then the
following hold.

(1) ∥x̄∥ ≥ 0, and ∥x̄∥ = 0 if and only if x̄ = 0̄.

(2) ∥αx̄∥ = |α|∥x̄∥.

(3) ∥x̄ + ȳ∥ ≤ ∥x̄∥ + ∥ȳ∥.

Proof of (1). We give a proof for algebraic vectors in R3 . Let x̄ ∈ R3 . Then

∥x̄∥ = √(x̄ · x̄) = √(x1² + x2² + x3²) ≥ 0

by the definition of the square root function.

Assume that ∥x̄∥ = 0. Then x̄ · x̄ = ∥x̄∥² = 0 so that x̄ = 0̄ by Theorem 1.2.3 (1). Conversely,
if x̄ = 0̄ then x̄ · x̄ = 0 by Theorem 1.2.3 (1) so that ∥x̄∥ = √0 = 0.

The proofs of the remaining properties of the norm are given as exercises, see Exercise 1.2
numbers 4 and 5.

The meaning of the dot product and the norm of an algebraic vector depends on how we
interpret vectors in Rn . For algebraic vectors in R2 and R3 there are natural interpretations
of these concepts, which are the topics of Sections 1.3 and 1.5.

Exercise 1.2

1. Let v̄ = ⟨3, −6, 7⟩, x̄ = ⟨2, 1, 2⟩, ȳ = ⟨−1, 8, 1⟩, z̄ = ⟨−2, 3, 0, 2⟩, w̄ = ⟨9, −2, 1, 1⟩. Cal-
culate the following, if possible. Otherwise, explain why it is not possible to evaluate
the given expression.
(a) ∥5x̄ − 2ȳ∥ (b) v̄ · (ȳ − 2x̄) (c) ∥z̄ − x̄ + ȳ∥
(d) (w̄ − 2z̄) · (w̄ + 2z̄) (e) ∥x̄ · ȳ∥ (f) (2v̄ − x̄ + 3ȳ) · (x̄ − v̄)
(g) (∥x̄∥ȳ − ∥ȳ∥x̄) · (∥x̄∥ȳ − ∥ȳ∥x̄) (h) x̄ · (v̄ − ȳ) (i) (2x̄) · ȳ + v̄
(j) ∥7ȳ − 2x̄ + 3v̄∥

2. Prove Theorem 1.2.3 (2) for algebraic vectors in R3 .

3. Prove Theorem 1.2.3 (3) for algebraic vectors in R3 .



4. Prove Theorem 1.2.10 (2) for algebraic vectors in R3 . [HINT: Remember, |α| = √(α²)
for all α ∈ R.]

5. Prove Theorem 1.2.10 (3) for vectors in R3 . [HINT: Calculate ∥x̄ + ȳ∥² using Definition
1.2.5, and use the Cauchy-Schwarz Inequality, Theorem 1.2.8. Remember, α ≤ |α| for
all real numbers α.]

6. Let x̄ and ȳ be algebraic vectors in R3 .
(a) If x̄ = αȳ for some α ∈ R, show that |x̄ · ȳ| = ∥x̄∥∥ȳ∥.
(b) Now suppose |x̄ · ȳ| = ∥x̄∥∥ȳ∥, and ȳ ≠ 0̄. In the proof of Theorem 1.2.8 it is
shown that
0 ≤ (x̄ − αȳ) · (x̄ − αȳ) = ∥x̄∥² − (x̄ · ȳ)²/∥ȳ∥²
for some real number α. Use this fact to prove that x̄ = αȳ.

7. Let ū and v̄ be algebraic vectors in R3 . Prove the following.

(a) ∥ū − v̄∥² + ∥ū + v̄∥² = 2∥ū∥² + 2∥v̄∥².
(b) ū · v̄ = (1/4)(∥ū + v̄∥² − ∥ū − v̄∥²).
[HINT: Use the definition of the norm, Definition 1.2.5, and the properties of the dot
product given in Theorem 1.2.3.]

1.3 A Mathematical Model for Space

In this section we show how the set R3 gives a mathematical model for three-dimensional
space. In particular, we show how algebraic vectors in R3 are used to specify ‘positions’
in space. Recall that in order to specify the ‘position’ of a ‘point’ in a rectangular room,
we select two non-opposing walls. We describe the ‘position’ of a ‘point’ in the room by
specifying the distances x1 , x2 and x3 from the ‘point’ to the two chosen walls and the floor
of the room, respectively. The result may be expressed as an algebraic vector x̄ = ⟨x1 , x2 , x3 ⟩
in R3 .

It should be noted that our mathematical description for a rectangular room is an idealiza-
tion, and does not correspond exactly to reality. For example, the position of a bed in a
small room does not make sense. On the other hand, the position of a mosquito does make
sense, even though the mosquito is not small enough to qualify as a ‘point’.

Let us now make precise our mathematical model for space. We assume that we know what
a point in space is, what is meant by the distance between two points, by direction and
perpendicular, and that the Theorem of Pythagoras holds.

A model for space Fix a point of reference in space, and three mutually perpendicular
directions, labeled x, y and z, respectively.

(1) We identify the point of reference with the zero vector 0̄ ∈ R3 .

(2) An algebraic vector x̄ = ⟨a, b, c⟩ in R3 represents the point in space obtained by moving
from 0̄ a distance of a units in the x-direction if a ≥ 0 or |a| units in the opposite
direction if a < 0, followed by a movement of b units in the y-direction if b ≥ 0 or |b|
units in the opposite direction if b < 0, and lastly followed by a movement of c units
in the z-direction if c ≥ 0 or |c| units in the opposite direction if c < 0.

[Figure: the point x̄ = ⟨a, b, c⟩ plotted relative to the reference point 0̄, with a, b and c marked along the x-, y- and z-directions.]

Remark 1.3.1. Note the following regarding our model for space.

(1) Since we identify algebraic vectors in R3 with points in three-dimensional space, we


may refer to any algebraic vector p̄ ∈ R3 as a point in space, or a point in R3 .

(2) It follows from the Theorem of Pythagoras that the distance between 0̄ and a point p̄
is ∥p̄∥.

(3) In general, the distance between two points p̄ and q̄ is ∥p̄ − q̄∥. Note that ∥p̄ − q̄∥ =
∥q̄ − p̄∥.

(4) When we use an algebraic vector to represent a point in space, we denote its com-
ponents by lowercase roman characters. For instance, we write p̄ = ⟨a, b, c⟩ or x̄ =
⟨x, y, z⟩. For a point p̄ = ⟨a, b, c⟩ in space, we call the components of the vector p̄ the
Cartesian coordinates of the point. Specifically, a is the x-coordinate of p̄, b is the
y-coordinate of p̄ and c is the z-coordinate of p̄.

Example 1.3.2. The distance between the points p̄ = ⟨3, 2, −1⟩ and q̄ = ⟨1, 4, 0⟩ is

∥p̄ − q̄∥ = ∥⟨2, −2, −1⟩∥ = √(4 + 4 + 1) = 3.

Our everyday experience tells us that, given points p̄, q̄ and r̄, the distance between p̄ and q̄
is strictly less than the distance between p̄ and r̄ plus the distance between r̄ and q̄, unless
r̄ is ‘between’ p̄ and q̄. If it is to be of any use, our model for space should be in agreement
with this observation. The following theorem partially addresses this issue, to which we
return in Section 1.4.

Theorem 1.3.3 (Triangle Inequality). If p̄, q̄ and r̄ are points in R3 , then

∥p̄ − q̄∥ ≤ ∥p̄ − r̄∥ + ∥r̄ − q̄∥.

This result is a direct consequence of Theorem 1.2.10 (3), and the proof is therefore left as
an exercise, see Exercise 1.3 number 3. Note that the inequality
∥p̄ − q̄∥ ≤ ∥p̄ − r̄∥ + ∥r̄ − q̄∥
includes two possibilities, namely,
∥p̄ − q̄∥ < ∥p̄ − r̄∥ + ∥r̄ − q̄∥
or
∥p̄ − q̄∥ = ∥p̄ − r̄∥ + ∥r̄ − q̄∥.
Both these possibilities occur, as we demonstrate by means of an example.
Example 1.3.4. Let p̄ = ⟨2, 0, 2⟩, q̄ = ⟨0, 1, 0⟩ and r̄ = ⟨−2, 0, −2⟩. Then

∥p̄ − q̄∥ = 3, ∥p̄∥ = 2√2 and ∥q̄∥ = 1.
Hence
∥p̄ − 0̄∥ + ∥0̄ − q̄∥ = 2√2 + 1
so that
∥p̄ − q̄∥ < ∥p̄ − 0̄∥ + ∥0̄ − q̄∥.
On the other hand,
∥p̄ − r̄∥ = 4√2 and ∥p̄∥ = ∥r̄∥ = 2√2.
Therefore
∥p̄ − r̄∥ = ∥p̄ − 0̄∥ + ∥0̄ − r̄∥.
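
The distances in this example can be reproduced with a short Python sketch. The function name dist is an assumption made here, and the equality case is compared with math.isclose because the square roots are computed in floating-point arithmetic.

```python
import math


def dist(p, q):
    """Return the distance between the points p and q, that is, the norm of p - q."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


origin = (0, 0, 0)
p, q, r = (2, 0, 2), (0, 1, 0), (-2, 0, -2)

# Strict inequality: going from p to q via the origin is a detour.
print(dist(p, q))                         # 3.0
print(dist(p, origin) + dist(origin, q))  # 2*sqrt(2) + 1, roughly 3.83

# Equality: the origin lies on the way from p to r.
print(math.isclose(dist(p, r), dist(p, origin) + dist(origin, r)))  # True
```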

In this section it is shown how R3 serves as a model for space. The algebraic vectors in R3
represent points in space, and ∥p̄ − q̄∥ gives the distance between points p̄ and q̄. Theorem
1.3.3 is an indication that our model is in agreement with reality.

Exercise 1.3

1. In each case, calculate the distance between the points p̄ and q̄. Determine whether the
distance between p̄ and q̄ is less than the distance between p̄ and 0̄ plus the distance
between 0̄ and q̄.
(a) p̄ = ⟨1, 2, 2⟩, q̄ = ⟨2, 0, −1⟩
(b) p̄ = ⟨2, −1, 2⟩, q̄ = ⟨−4, 2, −4⟩
(c) p̄ = ⟨3, 1, −1⟩, q̄ = ⟨1, 2, 3⟩
2. Consider the points p̄ = ⟨1, α, 2⟩ and q̄ = ⟨1, 0, 4⟩ where α is a real number. Find α if
(a) the distance between p̄ and q̄ is 3 units.
(b) the distance between p̄ and q̄ is 1 unit.
3. Use Theorem 1.2.10 (3) to prove Theorem 1.3.3.
4. Let p̄ = ⟨1, 2, 1⟩, q̄ = ⟨−1, 0, −1⟩ and r̄ = ⟨x, y, z⟩. Show that ∥p̄ − r̄∥ = ∥q̄ − r̄∥ if and
only if x + y + z = 1.

1.4 Lines in Space

We all have an intuitive understanding of what a ‘straight line’ in space is. For instance,
we would all agree that the edge of a rectangular tabletop is a ‘straight line’, and if we were
asked to draw a ‘straight line’ on a sheet of paper, we would know what to do. In this
section we introduce a mathematical model for what we call a ‘straight line’. This is done
in the context of the model for space introduced in Section 1.3.

The concept of a ‘straight line’ is closely related to that of ‘betweenness’. Intuitively, if


three points in space lie on a ‘straight line’, then one of the three points is ‘between’ the
other two. In order to motivate our definition of a straight line, we therefore first consider
what it means for a point r̄ to be ‘between’ two points p̄ and q̄.

Definition 1.4.1. Consider distinct points p̄ and q̄. We say that a point r̄ is between p̄
and q̄ if r̄ = tp̄ + (1 − t)q̄ for some 0 < t < 1.

Our experience of physical reality tells us that, given three points p̄, q̄ and r̄, the distance
between p̄ and q̄ is less than the distance between p̄ and r̄ plus the distance between r̄ and
q̄, unless r̄ is between p̄ and q̄. In terms of our model for space, this fact is expressed as

∥q̄ − p̄∥ < ∥q̄ − r̄∥ + ∥r̄ − p̄∥ if r̄ is not between p̄ and q̄

and
∥q̄ − p̄∥ = ∥q̄ − r̄∥ + ∥r̄ − p̄∥ if r̄ is between p̄ and q̄.
As a motivation for our definition of betweenness, we show that this fact is true in our model
for space.

Theorem 1.4.2. Consider points p̄, q̄ and r̄ in R3 so that p̄ ≠ q̄, r̄ ≠ p̄ and r̄ ≠ q̄. Then

∥q̄ − p̄∥ = ∥q̄ − r̄∥ + ∥r̄ − p̄∥

if and only if r̄ is between p̄ and q̄.

Proof. Assume that ∥q̄ − p̄∥ = ∥q̄ − r̄∥ + ∥r̄ − p̄∥. Then

∥p̄ − q̄∥² = ∥q̄ − r̄∥² + 2∥q̄ − r̄∥∥r̄ − p̄∥ + ∥r̄ − p̄∥². (1.2)

By the definition of the norm, Definition 1.2.5, and the distributivity and commutativity of
the dot product, Theorem 1.2.3 (3) and (2), we have

∥p̄ − q̄∥² = ∥(p̄ − r̄) + (r̄ − q̄)∥²

= [(p̄ − r̄) + (r̄ − q̄)] · [(p̄ − r̄) + (r̄ − q̄)] (1.3)

= ∥p̄ − r̄∥² + 2[(p̄ − r̄) · (r̄ − q̄)] + ∥r̄ − q̄∥²,

see Exercise 1.4 number 9 (a). Combining (1.2) and (1.3) we have

(q̄ − r̄) · (r̄ − p̄) = ∥q̄ − r̄∥∥r̄ − p̄∥. (1.4)

Therefore, see Remark 1.2.9 and Exercise 1.2 number 6 (b), there exists a real number α so
that

q̄ − r̄ = α(r̄ − p̄). (1.5)

By Theorem 1.2.3 (3) and (1.4) we have

α∥r̄ − p̄∥² = ∥q̄ − r̄∥∥r̄ − p̄∥.

Because r̄ ≠ p̄ and r̄ ≠ q̄, ∥r̄ − p̄∥ > 0 and ∥q̄ − r̄∥ > 0. Therefore

α = ∥q̄ − r̄∥/∥r̄ − p̄∥ > 0.

We solve for r̄ in (1.5) and find

r̄ = (α/(1 + α))p̄ + (1 − α/(1 + α))q̄,

see Exercise 1.4 number 9 (b). Because α > 0, we have

0 < α/(1 + α) < 1

so that r̄ is between p̄ and q̄, see Exercise 1.4 number 9 (c).

The proof of the converse statement is left as an exercise, see Exercise 1.4 number 10.

We illustrate the concept of betweenness by means of an example.

Example 1.4.3. Consider the points p̄ = ⟨−4, −1, 1⟩ and q̄ = ⟨2, 1, 3⟩. We determine
whether or not the points 0̄ = ⟨0, 0, 0⟩ and r̄ = ⟨−1, 0, 2⟩ are between p̄ and q̄.

Solution. The point 0̄ is between p̄ and q̄ if and only if

0̄ = tp̄ + (1 − t)q̄ = ⟨2 − 6t, 1 − 2t, 3 − 2t⟩ for some 0 < t < 1.

Hence, according to Definition 1.1.4, 0̄ is between p̄ and q̄ if and only if

0 = 2 − 6t, 0 = 1 − 2t and 0 = 3 − 2t for some 0 < t < 1.

The first equation implies that t = 1/3, but the second equation gives t = 1/2. Since 1/3 ≠ 1/2,
there is no value for t for which 0̄ = tp̄ + (1 − t)q̄. Therefore 0̄ is not between p̄ and q̄.

The point r̄ is between p̄ and q̄ if and only if

r̄ = tp̄ + (1 − t)q̄ = ⟨2 − 6t, 1 − 2t, 3 − 2t⟩ for some 0 < t < 1.

According to Definition 1.1.4, r̄ = ⟨−1, 0, 2⟩ is between p̄ and q̄ if and only if

−1 = 2 − 6t, 0 = 1 − 2t and 2 = 3 − 2t for some 0 < t < 1.

According to the first equation, t = 1/2. This value for t also satisfies the second and third
equations, so that
r̄ = (1/2)p̄ + (1 − 1/2)q̄.
Because 0 < 1/2 < 1, it follows that r̄ is between p̄ and q̄.
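
The procedure used in this example (solve one component equation for t, then check that the same t satisfies the remaining equations) can be sketched in Python. The function names line_parameter and is_between are hypothetical, and the sketch assumes that the points have integer components and that p̄ ≠ q̄; exact fractions are used so that values such as 1/3 are not rounded.

```python
from fractions import Fraction


def line_parameter(p, q, r):
    """Return t with r = t*p + (1 - t)*q, or None if no such t exists.

    Assumes integer components and p != q.
    """
    d = [pi - qi for pi, qi in zip(p, q)]  # p - q, which is non-zero because p != q
    e = [ri - qi for ri, qi in zip(r, q)]  # r - q
    # Solve e_i = t * d_i using the first component in which p and q differ.
    i = next(k for k, dk in enumerate(d) if dk != 0)
    t = Fraction(e[i], d[i])
    # The same t must satisfy the equation in every component.
    if all(ek == t * dk for ek, dk in zip(e, d)):
        return t
    return None


def is_between(p, q, r):
    """Return True if r is between p and q in the sense of Definition 1.4.1."""
    t = line_parameter(p, q, r)
    return t is not None and 0 < t < 1


p, q = (-4, -1, 1), (2, 1, 3)
print(line_parameter(p, q, (-1, 0, 2)))  # 1/2
print(is_between(p, q, (-1, 0, 2)))      # True
print(is_between(p, q, (0, 0, 0)))       # False
```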

We now come to the main object of study for this section; namely, the definition of a line in
our model for space. As mentioned, we expect that if three points in space lie on a ‘straight
line’, then one of the three points is between the other two. We therefore define a line in
space as follows.

Definition 1.4.4. A line is a set of points of the form

L = {tp̄ + (1 − t)q̄ : t ∈ R}

where p̄, q̄ ∈ R3 with p̄ ≠ q̄.

Remark 1.4.5. Consider a line L = {tp̄ + (1 − t)q̄ : t ∈ R} in R3 .

(1) It is important to note that L is a set in R3 . A point x̄ ∈ R3 is either an element of


L, or it is not an element of L. If x̄ ∈ L, we say that x̄ is a point on L, or x̄ lies on
L.

(2) Both p̄ and q̄ belong to the set L. We therefore refer to L as the line through p̄ and q̄.

(3) A point x̄ ∈ R3 is on L, that is, x̄ ∈ L, if and only if x̄ = tp̄ + (1 − t)q̄ for some t ∈ R.
We therefore speak of the line determined by the equation

x̄ = tp̄ + (1 − t)q̄, t ∈ R.

It is often convenient to write the equation for L in the equivalent form

x̄ = q̄ + t(p̄ − q̄), t ∈ R.

(4) If ū and v̄ are distinct points on L, then L = {tū + (1 − t)v̄ : t ∈ R}, see Example
1.4.15 and Theorem 1.4.16. Therefore L has more than one equation; that is, a single
line can be described using infinitely many different equations.

The next result shows that our definition of a line in R3 is in agreement with our intuition
of what a ‘straight line’ in space is.

Theorem 1.4.6. Let L be a line in R3 . If p̄, q̄ and r̄ are distinct points on L, then exactly
one of the following statements is true.

(1) p̄ is between q̄ and r̄.
(2) q̄ is between p̄ and r̄.
(3) r̄ is between p̄ and q̄.

Proof. We first show that at most one of (1), (2) and (3) is true. Suppose that (1) and (2)
are true. Then there exist real numbers s, t ∈ (0, 1) so that
p̄ = tq̄ + (1 − t)r̄ and q̄ = sp̄ + (1 − s)r̄.
Then
(1 − st)p̄ = (1 − st)r̄,
see Exercise 1.4 number 11 (a). But 1 − st ≠ 0, see Exercise 1.4 number 11 (b). Therefore
p̄ = r̄, which contradicts the assumption that p̄ ≠ r̄. Therefore (1) and (2) cannot both be
true. In the same way, (1) and (3) cannot both be true, and, (2) and (3) cannot both be
true. Therefore at most one of (1), (2) and (3) is true.

Now we show that at least one of (1), (2) and (3) is true. Assume that
x̄ = ū + tv̄, t ∈ R
is an equation for L. Since p̄, q̄, r̄ ∈ L, there exist real numbers t0 , t1 and t2 so that
p̄ = ū + t0 v̄, q̄ = ū + t1 v̄ and r̄ = ū + t2 v̄.
Since p̄, q̄ and r̄ are all different, it follows that t0 ≠ t1 , t0 ≠ t2 and t1 ≠ t2 . There are six
possibilities to consider; namely,
t0 < t1 < t2 , t0 < t2 < t1 , t1 < t0 < t2 , t1 < t2 < t0 , t2 < t0 < t1 or t2 < t1 < t0 .
We now have the following, see Exercise 1.4 number 11 (c).
If t0 < t1 < t2 or t2 < t1 < t0 , then q̄ is between p̄ and r̄.
If t0 < t2 < t1 or t1 < t2 < t0 , then r̄ is between p̄ and q̄.
If t1 < t0 < t2 or t2 < t0 < t1 , then p̄ is between q̄ and r̄.

This shows that at least one of (1), (2) and (3) is true. But at most one of (1), (2) and (3)
is true. Therefore exactly one of (1), (2) and (3) is true.
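
The six cases in the second part of the proof amount to a single rule: the point whose parameter is the middle one of t0 , t1 and t2 lies between the other two. A minimal Python sketch of this rule follows; the function name point_between and the labels "p", "q" and "r" are assumptions made for the illustration.

```python
def point_between(t0, t1, t2):
    """Name the point between the other two, given the parameters of p, q and r.

    The parameters are those of p = u + t0*v, q = u + t1*v and r = u + t2*v
    and must be distinct.
    """
    if len({t0, t1, t2}) != 3:
        raise ValueError("the three parameters must be distinct")
    labelled = sorted([(t0, "p"), (t1, "q"), (t2, "r")])
    # After sorting by parameter, the middle entry belongs to the point in between.
    return labelled[1][1]


print(point_between(0, 1, 2))   # q, since t0 < t1 < t2
print(point_between(5, -1, 2))  # r, since t1 < t2 < t0
print(point_between(1, 0, 3))   # p, since t1 < t0 < t2
```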

Given a line L in R3 , there are two important kinds of subsets of L. These are line segments
and rays.
Definition 1.4.7. Consider distinct points p̄ and q̄ in R3 . The line segment between p̄ and
q̄ is the set of points
S = {tp̄ + (1 − t)q̄ : 0 ≤ t ≤ 1}.
Remark 1.4.8. The line segment S between two points p̄ and q̄ in R3 consists of the points
p̄ and q̄ and all points x̄ between p̄ and q̄. In particular, a point x̄ is between p̄ and q̄ if and
only if x̄ ∈ S and x̄ ≠ p̄ and x̄ ≠ q̄.

Definition 1.4.9. Consider distinct points p̄ and q̄ in R3 . A ray is a set of points of the
form
R = {tp̄ + (1 − t)q̄ : t ≥ 0} = {q̄ + t(p̄ − q̄) : t ≥ 0}.
The point q̄ is called the origin of the ray.

Remark 1.4.10. Let L be a line in space.

(1) Any point q̄ ∈ L divides L into exactly two rays. Indeed, let p̄ and r̄ be points on
L so that q̄ is between p̄ and r̄. Then R1 = {q̄ + t(p̄ − q̄) : t ≥ 0} and R2 =
{q̄ + t(r̄ − q̄) : t ≥ 0} are rays with origin q̄ such that

L = R1 ∪ R2 and R1 ∩ R2 = {q̄}.

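[Figure: the line L through the points r̄, q̄ and p̄, with q̄ between p̄ and r̄. The ray R1 starts at q̄ and passes through p̄; the ray R2 starts at q̄ and passes through r̄.]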

(2) If x̄ ∈ L, then x̄ ∈ R1 if and only if x̄ is on the line segment between q̄ and p̄, or p̄ is
between q̄ and x̄. We therefore refer to R1 as the ray with origin q̄ in the direction of
p̄, and call (p̄ − q̄) the direction vector of R1 .


[Figure: the ray R1 with origin q̄ passing through p̄. Points x̄ = q̄ + t(p̄ − q̄) with 0 < t < 1 lie between q̄ and p̄; points x̄ = q̄ + t(p̄ − q̄) with t > 1 lie beyond p̄.]

(3) For any two points p̄ and q̄ in R3 with p̄ ≠ 0̄, the set R = {q̄ + tp̄ : t ≥ 0} is a ray. Indeed,

R = {q̄ + t([p̄ + q̄] − q̄) : t ≥ 0}.

Therefore R is the ray with origin q̄ in the direction of p̄ + q̄.

The following table shows how the concepts of lines, rays, line segments and betweenness
are related to one another.

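x̄ = tp̄ + (1 − t)q̄

Concept        Parameter range    Points described
Line           t ∈ R              all points on the line through q̄ and p̄
Ray            t ≥ 0              the ray with origin q̄ in the direction of p̄
Line segment   0 ≤ t ≤ 1          q̄, p̄ and all points between them
Between        0 < t < 1          the points x̄ between q̄ and p̄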

We illustrate the concepts of a line, a line segment and a ray by means of some examples.
Example 1.4.11. We determine whether or not the points ā = ⟨5, −3, 6⟩ and b̄ = ⟨−4, −1, −3⟩
are on the line L through p̄ = ⟨2, −1, 3⟩ and q̄ = ⟨−1, 1, 0⟩.

Solution. A point x̄ ∈ R3 is on L if and only if

x̄ = q̄ + t(p̄ − q̄) = ⟨3t − 1, 1 − 2t, 3t⟩ for some t ∈ R.

Therefore, according to Definition 1.1.4, ā = ⟨5, −3, 6⟩ is on L if and only if there is a real
number t so that
5 = 3t − 1, −3 = 1 − 2t and 6 = 3t.
Note that we require one value for t that satisfies all three equations. The third equation
implies that t = 2. Furthermore,

3 × 2 − 1 = 5 and 1 − 2 × 2 = −3

so that t = 2 also satisfies the first and second equations. Therefore ā = q̄ + 2(p̄ − q̄) ∈ L.

The point b̄ = ⟨−4, −1, −3⟩ lies on L if and only if there is a real number t so that

−4 = 3t − 1, −1 = 1 − 2t and −3 = 3t.

The third equation implies that t = −1, but the second equation implies that t = 1.
Therefore there is no real number t so that b̄ = tp̄ + (1 − t)q̄. Therefore b̄ is not on the line
L.
Example 1.4.12. We determine whether or not the points b̄ = ⟨0, 3, 0⟩ and c̄ = ⟨3/2, 0, 3/2⟩
are on the line segment S between the points p̄ = ⟨1, 1, 1⟩ and q̄ = ⟨2, −1, 2⟩.

Solution. For a point x̄ we have

x̄ ∈ S if and only if x̄ = tp̄ + (1 − t)q̄ for some 0 ≤ t ≤ 1.

Hence the point b̄ = ⟨0, 3, 0⟩ is on the line segment S if and only if there exists a real number
0 ≤ t ≤ 1 so that
b̄ = tp̄ + (1 − t)q̄ = ⟨2 − t, 2t − 1, 2 − t⟩.

Hence b̄ is on S if and only if

0 = 2 − t, 3 = 2t − 1 and 0 = 2 − t for some 0 ≤ t ≤ 1.

According to the first equation, t = 2. This value of t also satisfies the second and third
equations, so that
b̄ = 2p̄ + (1 − 2)q̄.
Therefore b̄ is on the line through p̄ and q̄. However, t = 2 > 1 so that b̄ is not on the line
segment S between p̄ and q̄.

The point c̄ = ⟨3/2, 0, 3/2⟩ is on S if and only if

c̄ = tp̄ + (1 − t)q̄ = ⟨2 − t, 2t − 1, 2 − t⟩ for some 0 ≤ t ≤ 1.

Thus c̄ ∈ S if and only if


3/2 = 2 − t, 0 = 2t − 1 and 3/2 = 2 − t for some 0 ≤ t ≤ 1.

The first equation implies that t = 1/2. This value of t also satisfies the second and third
equations. Therefore
c̄ = (1/2)p̄ + (1 − 1/2)q̄.

Since 0 ≤ 1/2 ≤ 1, it follows that c̄ ∈ S.
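
The procedure used in Examples 1.4.11 and 1.4.12 (solve one coordinate equation for t, check the other two, then compare t with 0 and 1) can be sketched in Python. This is only an illustration under stated assumptions: the function names `parameter` and `classify` and the wording of the returned labels are hypothetical, and exact `Fraction` arithmetic is used so that values such as 3/2 are handled without rounding.

```python
from fractions import Fraction

def parameter(x, p, q):
    """Return t with x = t*p + (1 - t)*q, or None if x is not on the line through p and q."""
    d = [Fraction(p[k] - q[k]) for k in range(3)]      # direction vector p - q
    i = next(k for k in range(3) if d[k] != 0)          # p != q, so some entry is nonzero
    t = (Fraction(x[i]) - q[i]) / d[i]
    # One value of t has to satisfy all three coordinate equations.
    if all(q[k] + t * d[k] == x[k] for k in range(3)):
        return t
    return None

def classify(x, p, q):
    """Describe where x lies relative to the distinct points p and q."""
    t = parameter(x, p, q)
    if t is None:
        return "not on the line"
    if 0 < t < 1:
        return "between p and q"
    if t == 0 or t == 1:
        return "endpoint of the segment"
    if t > 1:
        return "on the ray from q through p, beyond p"
    return "on the line, outside the ray from q through p"

print(classify((5, -3, 6), (2, -1, 3), (-1, 1, 0)))
print(classify((Fraction(3, 2), 0, Fraction(3, 2)), (1, 1, 1), (2, -1, 2)))
```

The first call uses the data of Example 1.4.11 (where t = 2) and the second the point c̄ of Example 1.4.12 (where t = 1/2).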

Two lines L1 = {q̄ + t(p̄ − q̄) : t ∈ R} and L2 = {b̄ + s(ā − b̄) : s ∈ R} in space intersect at
a point x̄ if and only if x̄ ∈ L1 ∩ L2 . That is, L1 and L2 intersect at x̄ if and only if x̄ ∈ L1
and x̄ ∈ L2 . Therefore L1 intersects L2 at x̄ if and only if

x̄ = q̄ + t(p̄ − q̄) for some t ∈ R

and
x̄ = b̄ + s(ā − b̄) for some s ∈ R.
Note that it is possible that L1 and L2 intersect at no point, or at more than one point.
This is illustrated in the following examples.
Example 1.4.13. Consider the line L1 through p̄ = ⟨1, 0, 1⟩ and q̄ = ⟨0, 2, 0⟩, and the line
L2 through the points ā = ⟨2, 1, 2⟩ and b̄ = ⟨0, 3, 0⟩. We determine whether or not L1 and
L2 intersect, and find all points of intersection, if any exist.

Solution. The lines L1 and L2 intersect at a point x̄ = ⟨x, y, z⟩ if and only if x̄ ∈ L1 and
x̄ ∈ L2 . Therefore L1 and L2 intersect at x̄ if and only if

q̄ + t(p̄ − q̄) = x̄ = b̄ + s(ā − b̄) for some t, s ∈ R. (1.6)

It then follows from Definition 1.1.4 that

t = x = 2s, 2 − 2t = y = 3 − 2s and t = z = 2s.

The third equation implies that t = 2s. Substituting t = 2s into the second equation yields

2 − 2(2s) = 3 − 2s

so that s = −1/2. Hence t = 2s = −1. These values for s and t also satisfy the first equation,
since
t = −1 = 2(−1/2) = 2s.
Therefore these are the only values for t and s that satisfy (1.6). Therefore L1 and L2
intersect at the single point x̄ = q̄ − (p̄ − q̄) = ⟨−1, 4, −1⟩.

Example 1.4.14. We determine whether or not the lines L1 = {⟨1 − t, 1 − t, 2t⟩ : t ∈ R}


and L2 = {⟨s, 2, s − 1⟩ : s ∈ R} intersect, and find the points of intersection, if any such
points exist.

Solution. The lines L1 and L2 intersect at a point x̄ = ⟨x, y, z⟩ if and only if x̄ ∈ L1 and
x̄ ∈ L2 . Therefore L1 and L2 intersect at x̄ if and only if

⟨1 − t, 1 − t, 2t⟩ = x̄ = ⟨s, 2, s − 1⟩ for some t, s ∈ R. (1.7)

From Definition 1.1.4 it follows that

1 − t = x = s, 1 − t = y = 2 and 2t = z = s − 1.

The second equation implies that t = −1. Substituting t = −1 into the first equation gives
s = 2. Substituting t = −1 into the third equation yields s = −1. Since −1 ̸= 2, it follows
that there are no real numbers t and s so that (1.7) is true. Therefore the lines L1 and L2
do not intersect.

Example 1.4.15. Let L1 be the line with equation x̄ = ⟨1 + t, 2 + t, 3 + t⟩, t ∈ R, and L2


the line with equation x̄ = ⟨2 − 2s, 3 − 2s, 4 − 2s⟩, s ∈ R. We determine whether or not
these lines intersect, and find the points of intersection, if any such points exist.

Solution. The lines L1 and L2 intersect at a point x̄ = ⟨x, y, z⟩ if and only if x̄ ∈ L1 and
x̄ ∈ L2 . Therefore L1 and L2 intersect at x̄ if and only if

⟨1 + t, 2 + t, 3 + t⟩ = x̄ = ⟨2 − 2s, 3 − 2s, 4 − 2s⟩ for some t, s ∈ R. (1.8)

It follows from Definition 1.1.4 that

1 + t = x = 2 − 2s, 2 + t = y = 3 − 2s and 3 + t = z = 4 − 2s.

All three equations imply that t = 1 − 2s. Hence for every real number s, there exists a real
number t = 1 − 2s so that t and s satisfy (1.8). Therefore, if x̄ = ⟨2 − 2s, 3 − 2s, 4 − 2s⟩ ∈ L2 ,
then x̄ = ⟨1+(1−2s), 2+(1−2s), 3+(1−2s)⟩ ∈ L1 . Conversely, if x̄ = ⟨1+t, 2+t, 3+t⟩ ∈ L1 ,
then x̄ = ⟨2 − 2 × (1 − t)/2, 3 − 2 × (1 − t)/2, 4 − 2 × (1 − t)/2⟩ ∈ L2 . Therefore L1 = L2 so that every point
on the line is a point of intersection.
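
The three outcomes seen in Examples 1.4.13, 1.4.14 and 1.4.15 (one point, no point, the same line) can be reproduced with a short Python sketch. It is an illustration only: the function name `intersect_lines` and its return format are hypothetical, integer coordinates are assumed, and the two unknowns t and s are found with Cramer's rule on a pair of coordinate equations, which is a tool from outside this section.

```python
from fractions import Fraction

def intersect_lines(a, u, b, v):
    """Intersect L1 = {a + t*u} and L2 = {b + s*v}; u and v are nonzero direction vectors.

    Returns ("none", None), ("point", x) or ("same line", None).
    """
    w = [b[k] - a[k] for k in range(3)]            # we need t*u - s*v = w
    pairs = [(0, 1), (0, 2), (1, 2)]
    for i, j in pairs:
        det = u[i] * (-v[j]) - (-v[i]) * u[j]      # determinant of the 2x2 system in rows i, j
        if det != 0:
            # Cramer's rule on the two chosen coordinate equations.
            t = Fraction(w[i] * (-v[j]) - (-v[i]) * w[j], det)
            s = Fraction(u[i] * w[j] - u[j] * w[i], det)
            # The remaining coordinate equation must hold as well.
            if all(a[k] + t * u[k] == b[k] + s * v[k] for k in range(3)):
                return ("point", tuple(a[k] + t * u[k] for k in range(3)))
            return ("none", None)
    # Every determinant vanished, so u and v are scalar multiples of each other.
    # The lines coincide exactly when b lies on L1, that is, when w is a multiple of u.
    if all(u[i] * w[j] - u[j] * w[i] == 0 for i, j in pairs):
        return ("same line", None)
    return ("none", None)

print(intersect_lines((0, 2, 0), (1, -2, 1), (0, 3, 0), (2, -2, 2)))   # data of Example 1.4.13
print(intersect_lines((1, 1, 0), (-1, -1, 2), (0, 2, -1), (1, 0, 1)))  # data of Example 1.4.14
print(intersect_lines((1, 2, 3), (1, 1, 1), (2, 3, 4), (-2, -2, -2)))  # data of Example 1.4.15
```

Each line is passed as a point on the line together with a direction vector, read off from the equations in the examples.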

Our observations of the physical world tell us that, given two points p̄ ≠ q̄, there is exactly
one straight line through p̄ and q̄. Another way of expressing this fact is that if two lines L1
and L2 intersect in more than one point, then L1 = L2 . Examples 1.4.13, 1.4.14 and 1.4.15
suggest that this fact is true in our model for space. In the following theorem, we prove
that this is indeed the case. This result serves as a further motivation for our definition of
a line in space.

Theorem 1.4.16. If two lines L1 and L2 intersect at more than one point, then L1 = L2 .

Proof. Let L1 = {tp̄ + (1 − t)q̄ : t ∈ R}, and L2 = {sā + (1 − s)b̄ : s ∈ R}. Assume that
L1 and L2 intersect at ū and v̄, with ū ̸= v̄. Let L be the line through ū and v̄. We prove
that
L1 = L = L2
Since ū, v̄ ∈ L1 , there exist real numbers t0 and t1 so that

ū = t0 p̄ + (1 − t0 )q̄ and v̄ = t1 p̄ + (1 − t1 )q̄. (1.9)

Note that t0 ̸= t1 , see Exercise 1.4 number 12.

We first show that every point on L is also on L1 . Consider a point x̄ = rū + (1 − r)v̄ ∈ L.
It follows from (1.9) that

x̄ = rū + (1 − r)v̄

= r(t0 p̄ + (1 − t0 )q̄) + (1 − r)(t1 p̄ + (1 − t1 )q̄).

We use the properties of vector addition and scalar multiplication in Theorem 1.1.9 and find
that
x̄ = rt0 p̄ + r(1 − t0 )q̄ + (1 − r)t1 p̄ + (1 − r)(1 − t1 )q̄

= (rt0 + (1 − r)t1 )p̄ + (r(1 − t0 ) + (1 − r)(1 − t1 ))q̄

= (rt0 + t1 − rt1 )p̄ + (r − rt0 + 1 − t1 − r + rt1 )q̄

= (rt0 + t1 − rt1 )p̄ + (1 − (rt0 + t1 − rt1 ))q̄.


Therefore x̄ ∈ L1 .

Now we show that every point on L1 is also on L. It follows from (1.9) and Theorem 1.1.9
that
\[ t_1\bar{u} = t_0 t_1\bar{p} + t_1(1 - t_0)\bar{q} \]
\[ t_0\bar{v} = t_0 t_1\bar{p} + t_0(1 - t_1)\bar{q}. \]
We subtract the first equation from the second. The result is
\[ t_0\bar{v} - t_1\bar{u} = \bigl(t_0(1 - t_1) - t_1(1 - t_0)\bigr)\bar{q} = (t_0 - t_1)\bar{q}. \]
Therefore, since t0 ≠ t1,
\[ \bar{q} = \frac{t_1}{t_1 - t_0}\,\bar{u} - \frac{t_0}{t_1 - t_0}\,\bar{v}. \tag{1.10} \]
In the same way, see Exercise 1.4 number 12, we find that
\[ \bar{p} = \frac{1 - t_1}{t_0 - t_1}\,\bar{u} - \frac{1 - t_0}{t_0 - t_1}\,\bar{v}. \tag{1.11} \]

Suppose that ȳ ∈ L1 . Then ȳ = tp̄ + (1 − t)q̄ for some t ∈ R. It follows from (1.10), (1.11)
and Theorem 1.1.9 that
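\[
\begin{aligned}
\bar{y} &= t\bar{p} + (1 - t)\bar{q} \\
&= t\left(\frac{1 - t_1}{t_0 - t_1}\,\bar{u} - \frac{1 - t_0}{t_0 - t_1}\,\bar{v}\right) + (1 - t)\left(\frac{t_1}{t_1 - t_0}\,\bar{u} - \frac{t_0}{t_1 - t_0}\,\bar{v}\right) \\
&= \left(\frac{t(1 - t_1)}{t_0 - t_1} - \frac{t_1(1 - t)}{t_0 - t_1}\right)\bar{u} + \left(\frac{t_0(1 - t)}{t_0 - t_1} - \frac{t(1 - t_0)}{t_0 - t_1}\right)\bar{v} \\
&= \frac{t - t_1}{t_0 - t_1}\,\bar{u} + \frac{t_0 - t}{t_0 - t_1}\,\bar{v} \\
&= \frac{t - t_1}{t_0 - t_1}\,\bar{u} + \left(1 - \frac{t - t_1}{t_0 - t_1}\right)\bar{v}.
\end{aligned}
\]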

Therefore ȳ ∈ L.

It now follows that L1 = L. In exactly the same way, L2 = L. Therefore L1 = L2 .

The concept of parallel lines is an important one in applications. In our model for space,
parallel lines are defined as follows.

Definition 1.4.17. Two lines L1 = {q̄ + t(p̄ − q̄) : t ∈ R} and L2 = {b̄ + t(ā − b̄) : t ∈ R}
in R3 are parallel if p̄ − q̄ = α(ā − b̄) for some nonzero real number α.

Remark 1.4.18. Consider lines L1 = {q̄ +t(p̄− q̄) : t ∈ R} and L2 = {b̄+t(ā− b̄) : t ∈ R}
in R3 . If α is a nonzero real number, then p̄ − q̄ = α(ā − b̄) if and only if ā − b̄ = (1/α)(p̄ − q̄).
Therefore L1 and L2 are parallel if and only if ā− b̄ = β(p̄− q̄) for some nonzero real number
β.

Intuitively, if two different lines L1 and L2 are parallel, then L1 and L2 do not intersect.
The next result shows that this fact is true for lines in our model for space, thus motivating
our definition for parallel lines.

Theorem 1.4.19. If L1 and L2 are distinct parallel lines in R3 , then they do not intersect.

Proof. Let L1 = {q̄ + t(p̄ − q̄) : t ∈ R} and L2 = {b̄ + s(ā − b̄) : s ∈ R} be distinct
parallel lines in R3 . Since L1 ≠ L2 we may, by Theorem 1.4.16, assume that q̄ ∉ L2 . Then

q̄ ̸= b̄ + s(ā − b̄) for all s ∈ R. (1.12)

Since L1 and L2 are parallel, there exists a nonzero real number α so that p̄ − q̄ = α(ā − b̄).
Therefore
L1 = {q̄ + αt(ā − b̄) : t ∈ R}.
Suppose that L1 and L2 intersect at a point x̄. Then there exist real numbers s0 and t0 so
that
q̄ = b̄ + (s0 − αt0 )(ā − b̄),
see Exercise 1.4 number 13. This contradicts (1.12). Hence L1 and L2 do not intersect.
Remark 1.4.20. Theorem 1.4.19 states that distinct parallel lines do not intersect. The
converse of this theorem is, in general, false. If two lines L1 and L2 do not intersect, it does
not follow that the lines are parallel, see Example . We will return to this issue in Section
1.6.

We illustrate the concept of parallel lines by means of some examples.


Example 1.4.21. Let L1 be the line through p̄ = ⟨2, 3, 1⟩ and q̄ = ⟨4, 5, 5⟩, and L2 the
line through ā = ⟨−1, 1, 2⟩ and b̄ = ⟨5, 7, 14⟩. We determine whether or not L1 and L2 are
parallel.

Solution. We have p̄ − q̄ = ⟨−2, −2, −4⟩ and ā − b̄ = ⟨−6, −6, −12⟩. Therefore

ā − b̄ = ⟨−6, −6, −12⟩ = 3⟨−2, −2, −4⟩ = 3(p̄ − q̄).

Therefore L1 and L2 are parallel.


Example 1.4.22. Let L1 = {⟨1 − t, 1 − t, 2t⟩ : t ∈ R} and L2 = {⟨s, 2, s − 1⟩ : s ∈ R}.
According to Example 1.4.14, the lines L1 and L2 do not intersect. We show that L1 and
L2 are not parallel.

Solution. We express L1 and L2 as

L1 = {⟨1, 1, 0⟩ + t⟨−1, −1, 2⟩ : t ∈ R}

and
L2 = {⟨0, 2, −1⟩ + s⟨1, 0, 1⟩ : s ∈ R}.
If L1 and L2 were parallel, then

⟨−1, −1, 2⟩ = α⟨1, 0, 1⟩ = ⟨α, 0, α⟩ for some α ∈ R.

In this case, Definition 1.1.4 implies that −1 = 0, which is not true. Therefore L1 and L2
are not parallel.
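
Definition 1.4.17 reduces the question of whether two lines are parallel to a comparison of direction vectors. The following Python sketch shows one way to automate that comparison. The names `are_parallel` and `direction` are hypothetical, and the test by cross-multiplying coordinates is a substitute technique chosen to avoid division: for nonzero vectors it is equivalent to asking for a nonzero α with d1 = α d2.

```python
def direction(p, q):
    """Direction vector p - q of the line through the distinct points p and q."""
    return tuple(p[k] - q[k] for k in range(3))

def are_parallel(d1, d2):
    """True if the nonzero direction vectors satisfy d1 = alpha * d2 for some nonzero alpha."""
    if not any(d1) or not any(d2):
        raise ValueError("direction vectors must be nonzero")
    # d1 is a scalar multiple of d2 exactly when every pair of coordinates cross-multiplies equally.
    return all(d1[i] * d2[j] == d1[j] * d2[i] for i in range(3) for j in range(i + 1, 3))

# Data of Example 1.4.21: the two lines are parallel.
print(are_parallel(direction((2, 3, 1), (4, 5, 5)), direction((-1, 1, 2), (5, 7, 14))))
# Data of Example 1.4.22: the two lines are not parallel.
print(are_parallel((-1, -1, 2), (1, 0, 1)))
```

Because α is never computed, the sketch only answers yes or no; the value of α can be read off from any coordinate in which both direction vectors are nonzero.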

Example 1.4.23. Consider the points p̄ = ⟨1, 0, 2⟩, q̄ = ⟨c, 2, 1⟩, ā = ⟨5, c, 3⟩ and b̄ =
⟨−7, −1, 1⟩ in R3 , where c ∈ R is a constant. We determine all the values of c, if any, so
that the line through p̄ and q̄ is parallel to the line through ā and b̄.

Solution. According to Definition 1.4.17, the line through p̄ and q̄ is parallel to the line
through ā and b̄ if and only if there exists a nonzero real number α so that

ā − b̄ = α(p̄ − q̄).

We have ā − b̄ = ⟨12, c + 1, 2⟩ and p̄ − q̄ = ⟨1 − c, −2, 1⟩. Therefore, for a real number α, we


have
ā − b̄ = α(p̄ − q̄) if and only if 12 = α(1 − c), c + 1 = −2α, α = 2.
According to the last equation, if ā − b̄ = α(p̄ − q̄) then α = 2. Substituting α = 2 into the
first and second equations yields

12 = 2 − 2c and c + 1 = −4.

Both equations imply that c = −5. Therefore the line through p̄ and q̄ is parallel to the line
through ā and b̄ if and only if c = −5.

Another object that appears frequently in applications is the sphere. For instance, the
surface of the earth is often approximated as a sphere. Within our model for space, a sphere
is defined as follows.
Definition 1.4.24. Let c̄ be a point in R3 , and r > 0 a real number. The sphere with
centre c̄ and radius r is the set

S = {x̄ ∈ R3 : ∥c̄ − x̄∥ = r}.

Remark 1.4.25. Consider a sphere

S = {x̄ ∈ R3 : ∥c̄ − x̄∥ = r}.

S consists of all points in space that are at a distance of r from the fixed point c̄, the centre
of the sphere. The sphere S is shown in the figure below.



[Figure: the sphere S with centre c̄ and radius r.]
Our everyday experience tells us that a line intersects a sphere in either one or two points,
or not at all. Indeed, if you shoot an arrow at a soccer ball at a very high speed and do not
miss, then the arrow will enter the ball at one point and exit at another, or just graze the
surface of the ball. We illustrate this fact by means of some examples.

Example 1.4.26. Let S be the sphere with radius 2 and centre 0̄, and L the line through
the points p̄ = ⟨3, 1, 0⟩ and q̄ = ⟨0, 0, 3⟩. We determine the points where L intersects S, if
any such points exist.

Solution. L intersects S at a point r̄ if and only if r̄ ∈ S and r̄ ∈ L. Therefore L intersects


S at r̄ if and only if

r̄ = tp̄ + (1 − t)q̄ = ⟨3t, t, 3 − 3t⟩ for some t ∈ R (1.13)

and
∥r̄ − 0̄∥ = 2.
It is convenient to replace the last equation with

∥r̄ − 0̄∥² = 4. (1.14)

If we substitute (1.13) into (1.14) we find that

4 = ∥r̄ − 0̄∥² = ∥⟨3t, t, 3 − 3t⟩∥² = 9t² + t² + 9 − 18t + 9t².

Hence

19t² − 18t + 5 = 0. (1.15)

Because the discriminant of the quadratic equation satisfies

∆ = (−18)² − 4 × 19 × 5 = −56 < 0,

equation (1.15) has no real solutions. Therefore the line L does not intersect the sphere
S.

Example 1.4.27. Let S be the sphere with radius 1 and centre c̄ = ⟨1, 1, 1⟩, and L the
line through the points p̄ = ⟨2, −1, 1⟩ and q̄ = ⟨−1, 2, 1⟩. We determine the points where L
intersects S, if any such points exist.

Solution. A point r̄ lies on both L and S if and only if r̄ ∈ S and r̄ ∈ L. Therefore L


intersects S at r̄ if and only if

r̄ = tp̄ + (1 − t)q̄ = ⟨3t − 1, 2 − 3t, 1⟩ for some t ∈ R (1.16)

and
∥r̄ − c̄∥ = 1.

In order to simplify our calculations, we replace this second equation with
∥r̄ − c̄∥² = 1. (1.17)
Substituting (1.16) into (1.17) we find that
1 = ∥r̄ − c̄∥² = ∥⟨3t − 2, 1 − 3t, 0⟩∥² = 9t² − 12t + 4 + 1 − 6t + 9t².
Hence
18t² − 18t + 4 = 0.
Solving for t we find that t = 2/3 or t = 1/3. Substituting these values for t back into equation
(1.16) we find that the line L intersects the sphere S at the points
r̄0 = ⟨3 × (2/3) − 1, 2 − 3 × (2/3), 1⟩ = ⟨1, 0, 1⟩ and r̄1 = ⟨3 × (1/3) − 1, 2 − 3 × (1/3), 1⟩ = ⟨0, 1, 1⟩.

Example 1.4.28. Let S be the sphere with radius √2 and centre c̄ = ⟨1, 0, 1⟩, and L the
line through the points p̄ = ⟨−2, 2, 2⟩ and q̄ = ⟨1, −1, −1⟩. We determine the points where
L intersects S, if any such points exist.

Solution. L intersects S at a point r̄ if and only if r̄ ∈ S and r̄ ∈ L. Therefore L intersects


S at r̄ if and only if
r̄ = tp̄ + (1 − t)q̄ = ⟨1 − 3t, 3t − 1, 3t − 1⟩ for some t ∈ R (1.18)
and
∥r̄ − c̄∥ = √2.
It is convenient to replace the last equation with
∥r̄ − c̄∥² = 2. (1.19)
Substituting (1.18) into (1.19) we find that
2 = ∥r̄ − c̄∥² = ∥⟨−3t, 3t − 1, 3t − 2⟩∥² = 9t² + 9t² − 6t + 1 + 9t² − 12t + 4.
Hence
9t² − 6t + 1 = 0.
This equation has only one solution, namely, t = 1/3. Therefore the line L intersects the
sphere S at precisely one point, namely
r̄ = ⟨1 − 3 × (1/3), 3 × (1/3) − 1, 3 × (1/3) − 1⟩ = 0̄.
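
Examples 1.4.26, 1.4.27 and 1.4.28 all follow the same pattern: substitute the equation of the line into the squared equation of the sphere and study the resulting quadratic in t. A minimal Python sketch of that pattern follows. The function name `line_sphere_intersections` is hypothetical, and passing the squared radius instead of the radius is a design assumption that keeps the discriminant exact for integer data.

```python
import math

def line_sphere_intersections(p, q, c, r_squared):
    """Points where the line through p and q meets the sphere {x : ||x - c||^2 = r_squared}."""
    d = [p[k] - q[k] for k in range(3)]            # direction p - q, so x = q + t*d
    m = [q[k] - c[k] for k in range(3)]            # q - c
    # ||m + t*d||^2 = r_squared is the quadratic A t^2 + B t + C = 0.
    A = sum(x * x for x in d)
    B = 2 * sum(d[k] * m[k] for k in range(3))
    C = sum(x * x for x in m) - r_squared
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                                   # the line misses the sphere
    if disc == 0:
        ts = [-B / (2 * A)]                         # the line touches the sphere once
    else:
        root = math.sqrt(disc)
        ts = [(-B + root) / (2 * A), (-B - root) / (2 * A)]
    return [tuple(q[k] + t * d[k] for k in range(3)) for t in ts]

print(line_sphere_intersections((3, 1, 0), (0, 0, 3), (0, 0, 0), 4))       # Example 1.4.26
print(line_sphere_intersections((2, -1, 1), (-1, 2, 1), (1, 1, 1), 1))     # Example 1.4.27
print(line_sphere_intersections((-2, 2, 2), (1, -1, -1), (1, 0, 1), 2))    # Example 1.4.28
```

For the data of Example 1.4.26 the coefficients are A = 19, B = −18 and C = 5, which is equation (1.15). The returned coordinates are floating-point numbers, so they should be compared with the exact answers up to rounding.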

In this section lines and related objects are defined in our model for space. It is shown that
these objects behave in the way that we would expect, based on our observations of physical
reality. In particular, it is proved that given two lines L1 and L2 , there are exactly three
possibilities: Either the lines don’t intersect, or they intersect at a single point, or they are
equal. This idea comes up again in Chapter 2.

Exercise 1.4

1. Let L be the line through p̄ and q̄. In each case, determine whether or not the given
points ā and b̄ are on L. If one or both of the points is on L, also determine whether
or not the point is between p̄ and q̄.
(a) p̄ = ⟨2, 2, −3⟩ and q̄ = 0̄; ā = ⟨1, 1, −2⟩ and b̄ = ⟨−4, −4, 6⟩
(b) p̄ = ⟨1, 2, 1⟩ and q̄ = ⟨1, 0, 2⟩; ā = ⟨1, 8, −2⟩ and b̄ = ⟨1, 4, 1⟩
(c) p̄ = ⟨−1, 1, 1⟩ and q̄ = ⟨1, 2, 2⟩; ā = ⟨−5, −1, −1⟩ and b̄ = ⟨−3, 3, 3⟩
(d) p̄ = ⟨2, 3, −5⟩ and q̄ = ⟨−1, 2, 1⟩; ā = ⟨5, 4, −10⟩ and b̄ = ⟨−4, 3, 7⟩
(e) p̄ = ⟨0, 2, −1⟩ and q̄ = ⟨1, 0, −1⟩; ā = ⟨−4, 10, −1⟩ and b̄ = ⟨−2, 7, −1⟩

2. Consider the points p̄ = ⟨1, 0, 1⟩ and q̄ = ⟨4, 2, 6⟩. Determine whether or not the
points ā, b̄ and c̄ are between p̄ and q̄, where

ā = ⟨5/2, 1, 7/2⟩, b̄ = ⟨3, 4/3, 13/3⟩, c̄ = ⟨−2, −2, −4⟩.

3. Determine whether or not the lines L1 and L2 intersect, and find the points of
intersection, if any exist.
(a) L1 is the line through p̄ = ⟨1, 2, −1⟩ and q̄ = ⟨2, 0, 1⟩, and L2 is the line through
ā = ⟨0, 4, 1⟩ and b̄ = ⟨−2, 8, −11⟩.
(b) L1 is the line through p̄ = ⟨−1, 0, −1⟩ and q̄ = ⟨1, 1, 1⟩, and L2 is the line through
ā = ⟨−1, 0, −5⟩ and b̄ = ⟨1, 1, −1⟩.
(c) L1 is the line through p̄ = ⟨1, 2, 1⟩ and q̄ = ⟨−1, 0, 2⟩, and L2 is the line through
ā = ⟨−2, −2, 5⟩ and b̄ = ⟨−1, −1, 6⟩.
(d) L1 is the line through p̄ = ⟨2, 0, −1⟩ and q̄ = ⟨1, −2, 1⟩, and L2 is the line through
ā = ⟨0, −4, 1⟩ and b̄ = ⟨3, 2, −3⟩.
(e) L1 is the line through p̄ = ⟨2, 1, −2⟩ and q̄ = ⟨−2, −1, 2⟩, and L2 is the line
through ā = ⟨5, 3, −6⟩ and b̄ = ⟨1, 5, −4⟩.
(f) L1 is the line through p̄ = ⟨2, 3, −1⟩ and q̄ = ⟨−1, 4, 1⟩, and L2 is the line through
ā = ⟨5, 2, −3⟩ and b̄ = ⟨−1, 4, 0⟩.

4. Let c̄ = ⟨1, 0, 0⟩ and let S be the sphere with centre c̄ and radius 2. Determine if the
line L intersects the sphere S, and find the points of intersection, if any exist.
(a) L is the line through p̄ = ⟨1, −2, 2⟩ and q̄ = ⟨2, −1, 1⟩.
(b) L = {⟨3t, 4 − 4t, 1⟩ : t ∈ R}.

(c) L is the line through p̄ = ⟨0, −1, 3√2⟩ and q̄ = ⟨3, 2, 0⟩.
(d) L is the line through p̄ = ⟨3, 2√3, −2⟩ and q̄ = ⟨0, −√3, 4⟩.

5. Let L be the line through p̄ = ⟨1, 1, −1⟩ and q̄ = ⟨−1, 2, 1⟩. For the given points ā
and b̄, determine the value(s) of the real number α so that the line through ā and b̄
intersects L at exactly one point.

(a) ā = ⟨1, 0, 1⟩ and b̄ = ⟨−3, 1, 3α + 1⟩ (b) ā = ⟨−9, 6, α²⟩ and b̄ = ⟨3, 0, −3⟩
(c) ā = ⟨1, 0, 1⟩ and b̄ = ⟨−3, 1, 3α + 1⟩ (d) ā = ⟨2 + α, 2, 3⟩ and b̄ = ⟨2, 1, −1⟩

6. For vectors ā and b̄ as given below, sketch the line segments between 0̄ and ā, 0̄ and
b̄, ā and ā + b̄, and b̄ and ā + b̄ in the xy-plane {⟨x, y, 0⟩ : x, y ∈ R}. Identify the
figure formed by the four line segments.
(a) ā = ⟨1, 0, 0⟩, b̄ = ⟨0, 1, 0⟩ (b) ā = ⟨1, 2, 0⟩, b̄ = ⟨2, 1, 0⟩
(c) ā = ⟨−1, 1, 0⟩, b̄ = ⟨2, 1, 0⟩ (d) ā = ⟨3, 1, 0⟩, b̄ = ⟨−2, 3, 0⟩
(e) ā = ⟨4, 2, 0⟩, b̄ = ⟨2, 1, 0⟩ (f) ā = ⟨5, 3, 0⟩, b̄ = ⟨2, −3, 0⟩
7. Determine whether or not the given lines L1 and L2 are parallel.
(a) L1 is the line with equation x̄ = ⟨2 + 3t, −t, 2 + t⟩, t ∈ R, and L2 is the line with
equation x̄ = ⟨1, 3, 1⟩ + s⟨6, −2, 2⟩, s ∈ R.
(b) L1 is the line through p̄ = ⟨−2, 1, 2⟩ and q̄ = ⟨5, 4, −1⟩, and L2 is the line with
equation x̄ = ⟨8 − 14s, −6s, 1 + 5s⟩, s ∈ R.
8. Determine the value(s) of the real number c for which the given lines L1 and L2 are
parallel.
(a) L1 is the line with equation x̄ = ⟨4 + c²t, −1, t + 1⟩, t ∈ R, and L2 is the line with
equation x̄ = ⟨4, 2, 1⟩ + s⟨4, 0, 2⟩, s ∈ R.
(b) L1 is the line through p̄ = ⟨1, 2c, 0⟩ and q̄ = ⟨−1, 1, 1⟩, and L2 is the line through
ā = ⟨0, 1, 1⟩ and b̄ = ⟨1, 2, −1⟩.
(c) L1 is the line through p̄ = ⟨c², 1, 1⟩ and q̄ = ⟨−1, 0, 2⟩, and L2 is the line through
ā = ⟨3, 4, c⟩ and b̄ = ⟨−1, 2, 3⟩.
(d) L1 is the line with equation x̄ = ⟨1 + ct, 2, 2t + 1⟩, t ∈ R, and L2 is the line with
equation x̄ = ⟨2, 5, −3⟩ + s⟨1, 3, −1⟩, s ∈ R.
(e) L1 is the line with equation x̄ = ⟨c + t, 2t − c, c − 3t⟩, t ∈ R, and L2 is the line
with equation x̄ = ⟨−2, 0, 3⟩ + s⟨−2, −4, 6⟩, s ∈ R.
9. Study the proof of Theorem 1.4.2. Assume that p̄, q̄ and r̄ are points in R3 , all
different, so that ∥p̄ − q̄∥² = ∥q̄ − r̄∥² + 2∥q̄ − r̄∥∥r̄ − p̄∥ + ∥r̄ − p̄∥².
(a) Verify equation (1.3).
(b) If q̄ − r̄ = α(r̄ − p̄) with α > 0, show that r̄ = (α/(1 + α))p̄ + (1 − α/(1 + α))q̄.
(c) If α > 0 is a real number, show that 0 < α/(1 + α) < 1.
10. Complete the proof of Theorem 1.4.2: Assume that r̄ is between p̄ and q̄, and show
that ∥p̄ − r̄∥ + ∥r̄ − q̄∥ = ∥p̄ − q̄∥.
11. Study the proof of Theorem 1.4.6.
(a) Assume that
p̄ = tq̄ + (1 − t)r̄ and q̄ = sp̄ + (1 − s)r̄
with s, t ∈ (0, 1). Show that (1 − st)p̄ = (1 − st)r̄.

(b) With s and t as in (a), show that 1 − st > 0.
(c) With p̄, q̄, r̄, t0 , t1 and t2 as in the proof, prove the following. If t0 < t1 < t2 or
t2 < t1 < t0 , then q̄ is between p̄ and r̄.

12. Study the proof of Theorem 1.4.16.


(a) Assume that ū = t0 p̄ + (1 − t0 )q̄ and v̄ = t1 p̄ + (1 − t1 )q̄ are distinct points on
the line through p̄ and q̄. Then t0 ≠ t1 . Show that
p̄ = ((1 − t1)/(t0 − t1))ū − ((1 − t0)/(t0 − t1))v̄.
(b) Why can we assume that t0 ̸= t1 ? How is this fact used in the proof?

13. Consider the proof of Theorem 1.4.19. Suppose that L1 and L2 intersect at a point x̄.
Prove that there exist real numbers s0 and t0 so that q̄ = b̄ + (s0 − αt0 )(ā − b̄).

1.5 Angles and Angle Measurement

Angles are common in the world around us. The side line and the goal line on a soccer pitch
form an angle at the point where they intersect, as do non-opposing walls in a rectangular
room. In this section we define angles in the context of our model for space, and introduce
angle measure. As a motivation for our definitions of angle and the magnitude of an angle,
we introduce triangles, and show that our definitions are consistent with the Cosine Law.
Definition 1.5.1. Consider two rays R1 = {ā + tp̄ : t ≥ 0} and R2 = {ā + tq̄ : t ≥ 0}
with the same origin ā. The set R1 ∪ R2 of all points on the two rays is called the angle
formed by the rays R1 and R2 .

The figure below shows the angle formed by the rays R1 and R2 .

[Figure: the rays R1 and R2 with common origin ā; R1 passes through ā + p̄ and R2 passes through ā + q̄.]

Remark 1.5.2. Take note of the following regarding angles.

(1) Consider two rays R1 = {ā + tp̄ : t ≥ 0} and R2 = {ā + tq̄ : t ≥ 0} with the
same origin ā. If the direction vectors p̄ and q̄ are scalar multiples of each other, then
R1 = R2 or R1 ∪ R2 is a line.

(2) Let ā, ū and v̄ be three points not on the same line. Then the line segments S1 =
{ā + t(ū − ā) : 0 ≤ t ≤ 1} and S2 = {ā + t(v̄ − ā) : 0 ≤ t ≤ 1} determine
an angle, namely, the angle formed by the rays R1 = {ā + t(ū − ā) : t ≥ 0} and
R2 = {ā + t(v̄ − ā) : t ≥ 0}, see the figure below.

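[Figure: the line segments from ā to ū and from ā to v̄, together with the rays R1 (through ū) and R2 (through v̄) that they determine.]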

Our aim now is to define the magnitude of an angle, that is, to measure the size of an angle.
In the figure below, the angle on the right is clearly ‘larger’ than the one on the left.

[Figure: two angles side by side; the angle on the right opens wider than the angle on the left.]

Our goal is to quantify what we mean by ‘the size of an angle’, that is, to attach a numerical
value to an angle that serves as a measure of its size.

First note that, for nonzero vectors x̄ and ȳ in R3 , |x̄ · ȳ| ≤ ∥x̄∥∥ȳ∥ by the Cauchy-Schwarz Inequality,
Theorem 1.2.8. Therefore
\[ -1 \le \frac{\bar{x}\cdot\bar{y}}{\|\bar{x}\|\,\|\bar{y}\|} \le 1. \]
Since the function f (θ) = cos θ is continuous and strictly decreasing on [0, π], with f (0) = 1
and f (π) = −1, there exists exactly one real number θ ∈ [0, π] such that
\[ \cos\theta = \frac{\bar{x}\cdot\bar{y}}{\|\bar{x}\|\,\|\bar{y}\|}. \]
We are therefore led to define the magnitude of an angle as follows.
Definition 1.5.3. Consider the angle formed by the rays with equations x̄ = ā + tp̄ and
x̄ = ā + tq̄, t ≥ 0. The magnitude of the angle is the unique real number θ ∈ [0, π] such that
\[ \cos\theta = \frac{\bar{p}\cdot\bar{q}}{\|\bar{p}\|\,\|\bar{q}\|}. \]
Remark 1.5.4. Consider the angle formed by the rays R1 and R2 with equations x̄ = ā + tp̄
and x̄ = ā + tq̄, t ≥ 0, respectively. Let θ be the magnitude of the angle. According to
Remark 1.2.9

|p̄ · q̄| = ∥p̄∥∥q̄∥ if and only if p̄ and q̄ are scalar multiples of each other,

see also Exercise 1.2 number 6. In particular,

p̄ · q̄ = ∥p̄∥∥q̄∥ if and only if p̄ = αq̄ for some α > 0

and
p̄ · q̄ = −∥p̄∥∥q̄∥ if and only if p̄ = αq̄ for some α < 0
We therefore have the following.

(1) If p̄ and q̄ are not scalar multiples of each other, then 0 < θ < π.
(2) If p̄ = αq̄ for some α > 0, then θ = 0. In this case, R1 = R2 so the angle R1 ∪ R2 is
a ray.
(3) If p̄ = αq̄ for some α < 0, then θ = π. In this case, the angle R1 ∪ R2 is a line.

We illustrate the definition by means of a number of examples.


Example 1.5.5. We determine the magnitude of the angle formed by the rays R1 and R2
with equations x̄ = 0̄ + t⟨2, −1, 2⟩ and x̄ = 0̄ + t⟨1, −1, 0⟩, t ≥ 0, respectively.

Solution. The direction vector for R1 is p̄ = ⟨2, −1, 2⟩, while that for R2 is q̄ = ⟨1, −1, 0⟩.
If θ is the magnitude of the angle formed by these two rays, then
\[ \cos\theta = \frac{\bar{p}\cdot\bar{q}}{\|\bar{p}\|\,\|\bar{q}\|} = \frac{3}{3\times\sqrt{2}} = \frac{1}{\sqrt{2}}. \]
Hence θ = π/4.
Example 1.5.6. We determine the magnitude of the angle formed by the rays R1 and R2
with equations x̄ = ⟨1, 1, 1⟩ + t⟨1, 2, 1⟩ and x̄ = ⟨1, 1, 1⟩ + t⟨2, 1, 2⟩, t ≥ 0, respectively.

Solution. The direction vector for R1 is p̄ = ⟨1, 2, 1⟩, while that for R2 is q̄ = ⟨2, 1, 2⟩. If θ
is the magnitude of the angle formed by these two rays, then
\[ \cos\theta = \frac{\bar{p}\cdot\bar{q}}{\|\bar{p}\|\,\|\bar{q}\|} = \frac{6}{\sqrt{6}\times 3} = \sqrt{\tfrac{2}{3}}. \]
Hence θ = arccos √(2/3) ≈ 0.615.
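
Definition 1.5.3 translates directly into a few lines of Python. The sketch below is an illustration, not part of the notes: the function name `angle_magnitude` is hypothetical, floating-point arithmetic is assumed, and the clamping step is an added safeguard against rounding error that the exact definition does not need.

```python
import math

def angle_magnitude(p, q):
    """Magnitude in [0, pi] of the angle formed by rays with nonzero direction vectors p and q."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    cos_theta = dot / (norm_p * norm_q)
    # Rounding can push the quotient slightly outside [-1, 1]; clamp before calling acos.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

print(angle_magnitude((2, -1, 2), (1, -1, 0)))   # direction vectors of Example 1.5.5
print(angle_magnitude((1, 2, 1), (2, 1, 2)))     # direction vectors of Example 1.5.6
```

Only the direction vectors enter the computation; the common origin of the two rays plays no role in the magnitude of the angle.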

Example 1.5.7. Consider the rays R1 and R2 with equations x̄ = 0̄ + t⟨1, −1, 1⟩ and
x̄ = 0̄ + t⟨2α, 1, 1⟩, t ≥ 0, respectively. We find all values for α ∈ R so that the angle formed
by R1 and R2 has magnitude π/3.

Solution. The direction vector for R1 is p̄ = ⟨1, −1, 1⟩, while that for R2 is q̄ = ⟨2α, 1, 1⟩.
The magnitude of the angle formed by these two rays is π/3 if and only if
\[ \frac{1}{2} = \cos\frac{\pi}{3} = \frac{\bar{p}\cdot\bar{q}}{\|\bar{p}\|\,\|\bar{q}\|} = \frac{2\alpha}{\sqrt{3}\,\sqrt{4\alpha^2 + 2}}. \tag{1.20} \]
Thus √3 √(4α² + 2) = 4α so that 16α² = 12α² + 6. We solve for α and find α = √(3/2) or
α = −√(3/2).

Because we squared both sides of (1.20), it is possible that one or more of the values we
found for α do not satisfy the original equation (1.20). In this case, α = √(3/2) is a solution
of (1.20), but α = −√(3/2) is not, because the right-hand side of (1.20) is negative when α < 0.
Therefore the only value for α for which the angle formed by the rays R1 and R2 has
magnitude π/3 is α = √(3/2).
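
The final check in Example 1.5.7, where each candidate obtained after squaring is substituted back into the unsquared equation (1.20), is easy to carry out numerically. The following Python sketch does this under the assumption that floating-point comparison with `math.isclose` is accurate enough; the names `cos_angle`, `candidates` and `solutions` are hypothetical.

```python
import math

def cos_angle(alpha):
    """Right-hand side of (1.20): cosine of the angle between <1, -1, 1> and <2*alpha, 1, 1>."""
    return 2 * alpha / (math.sqrt(3) * math.sqrt(4 * alpha**2 + 2))

# The two roots of 16*alpha^2 = 12*alpha^2 + 6.
candidates = [math.sqrt(3 / 2), -math.sqrt(3 / 2)]
# Keep only the candidates that satisfy the unsquared equation cos(theta) = 1/2.
solutions = [a for a in candidates if math.isclose(cos_angle(a), 0.5)]
print(solutions)
```

The negative candidate gives a cosine of −1/2, which corresponds to an angle of magnitude 2π/3, so only the positive candidate survives the check.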

As motivation for our definition of the magnitude of an angle, we define triangles within
our model for space.
Definition 1.5.8. Let ā, b̄ and c̄ be points in R3 , not all on the same line. Let S1 be the
line segment between ā and b̄, S2 the line segment between b̄ and c̄, and S3 the line segment
between ā and c̄. The triangle with vertices ā, b̄ and c̄ is the set S1 ∪ S2 ∪ S3 of points on
the three line segments.

The figure below illustrates a triangle, as defined above.




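[Figure: the triangle with vertices ā, b̄ and c̄, with sides S1 (between ā and b̄), S2 (between b̄ and c̄) and S3 (between ā and c̄).]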
Remark 1.5.9. Consider three points ā, b̄ and c̄ in R3 , not all on the same line, and the
triangle with vertices ā, b̄ and c̄.

(1) The line segments S1 , S2 and S3 joining ā, b̄ and c̄, respectively, are called the sides
of the triangle.
(2) The triangle determines three angles, namely,
(a) the angle at ā formed by the rays with equations x̄ = ā+t(b̄−ā) and x̄ = ā+t(c̄−ā),
t ≥ 0;
(b) the angle at b̄ formed by the rays with equations x̄ = b̄+t(ā− b̄) and x̄ = b̄+t(c̄− b̄),
t ≥ 0;
(c) the angle at c̄ formed by the rays with equations x̄ = c̄+t(ā−c̄) and x̄ = c̄+t(b̄−c̄),
t ≥ 0.
The triangle with vertices ā, b̄ and c̄, and the three angles determined by the triangle,
are illustrated in the figure below.

[Figure: three copies of the triangle with vertices ā, b̄ and c̄, showing in turn the angle at ā, the angle at b̄ and the angle at c̄.]

We now show that our definition of angle measure is consistent with the Cosine Rule.
Consider a triangle with vertices ā, b̄ and c̄. Let θ be the magnitude of the angle at b̄.


[Figure: the triangle with vertices ā, b̄ and c̄ and sides S1 , S2 and S3 .]
Calculating the square of the length of the side S3 we have

∥ā − c̄∥² = ∥(ā − b̄) − (c̄ − b̄)∥²

= [(ā − b̄) − (c̄ − b̄)] · [(ā − b̄) − (c̄ − b̄)]

= (ā − b̄) · (ā − b̄) − 2[(ā − b̄) · (c̄ − b̄)] + (c̄ − b̄) · (c̄ − b̄)

= ∥ā − b̄∥² − 2[(ā − b̄) · (c̄ − b̄)] + ∥c̄ − b̄∥².

Note that the angle at b̄ is determined by the rays with equations x̄ = b̄ + t(ā − b̄) and
x̄ = b̄ + t(c̄ − b̄), t ≥ 0. Therefore, calculating the magnitude θ of the angle at b̄, we find

\[ \cos\theta = \frac{(\bar{a} - \bar{b})\cdot(\bar{c} - \bar{b})}{\|\bar{a} - \bar{b}\|\,\|\bar{c} - \bar{b}\|}. \]

Hence

∥ā − c̄∥² = ∥ā − b̄∥² + ∥c̄ − b̄∥² − 2∥ā − b̄∥∥c̄ − b̄∥ cos θ. (1.21)

Note that this is in agreement with the Cosine Law. Indeed, the length of S1 is ∥b̄ − ā∥,
and that of S2 is ∥c̄ − b̄∥.

We illustrate some applications of (1.21) by means of a number of examples.

Example 1.5.10. Consider the triangle with vertices ā = ⟨1, 0, 1⟩, b̄ = ⟨1, 2, 1⟩ and c̄ =
⟨1, 2, 3⟩. We determine the magnitude of the angle at ā.

Solution. Let θ be the magnitude of the angle at ā. According to (1.21),

∥b̄ − c̄∥² = ∥b̄ − ā∥² + ∥c̄ − ā∥² − 2∥b̄ − ā∥∥c̄ − ā∥ cos θ.

Hence 4 = 4 + 8 − 8√2 cos θ so that cos θ = 1/√2. Therefore θ = π/4.
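
The angle in Example 1.5.10 can also be computed directly from the dot product, and the result substituted into the Cosine Rule as a consistency check. The Python sketch below does both. It is a sketch under assumptions: the helper names `sub`, `norm` and `angle_at` are hypothetical, and floating-point arithmetic means the two sides of the Cosine Rule agree only up to rounding.

```python
import math

def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def norm(u):
    """Norm of the vector u."""
    return math.sqrt(sum(a * a for a in u))

def angle_at(vertex, u, v):
    """Magnitude of the triangle's angle at `vertex`, with u and v the other two vertices."""
    d1, d2 = sub(u, vertex), sub(v, vertex)
    cos_theta = sum(a * b for a, b in zip(d1, d2)) / (norm(d1) * norm(d2))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

a, b, c = (1, 0, 1), (1, 2, 1), (1, 2, 3)       # vertices of Example 1.5.10
theta = angle_at(a, b, c)
# Both sides of the Cosine Rule, written for the angle at a.
lhs = norm(sub(b, c)) ** 2
rhs = norm(sub(b, a)) ** 2 + norm(sub(c, a)) ** 2 - 2 * norm(sub(b, a)) * norm(sub(c, a)) * math.cos(theta)
print(theta, lhs, rhs)
```

The two routes are equivalent, since the Cosine Rule was derived from the definition of the magnitude of an angle; in practice the dot product is usually the shorter computation when the vertices are known.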

Example 1.5.11. Consider the triangle with vertices ā = ⟨1, 0, 1⟩, b̄ = ⟨1, 2, 1⟩ and c̄ =
⟨α, 1, 0⟩. We find the values for α so that the angle at ā has magnitude π/3.

Solution. According to the Cosine Rule (1.21),

∥b̄ − c̄∥² = ∥b̄ − ā∥² + ∥c̄ − ā∥² − 2∥b̄ − ā∥∥c̄ − ā∥ cos(π/3)

= ∥b̄ − ā∥² + ∥c̄ − ā∥² − ∥b̄ − ā∥∥c̄ − ā∥.

We have

∥b̄ − c̄∥ = √((α − 1)² + 2), ∥b̄ − ā∥ = 2 and ∥c̄ − ā∥ = √((α − 1)² + 2).

Therefore
(α − 1)² + 2 = 4 + (α − 1)² + 2 − 2√((α − 1)² + 2)
so that
√((α − 1)² + 2) = 2.
Hence (α − 1)² = 2, so that α = 1 − √2 or α = 1 + √2.

Example 1.5.12. Consider the triangle with vertices ā = ⟨1, −1, 1⟩, b̄ = ⟨2, 1, 1⟩ and
c̄ = ⟨−2α − 1, α, 6⟩. We find the values for α so that the angle at b̄ has magnitude π/4.

Solution. If the angle at b̄ has magnitude π/4, then according to the Cosine Rule (1.21),

∥ā − c̄∥² = ∥ā − b̄∥² + ∥c̄ − b̄∥² − 2∥ā − b̄∥∥c̄ − b̄∥ cos(π/4)

= ∥ā − b̄∥² + ∥c̄ − b̄∥² − √2∥ā − b̄∥∥c̄ − b̄∥.

We have

∥ā − c̄∥ = √(5α² + 10α + 30), ∥ā − b̄∥ = √5 and ∥c̄ − b̄∥ = √(5α² + 10α + 35).

Therefore
5α² + 10α + 30 = 5 + 5α² + 10α + 35 − √(50α² + 100α + 350)
so that
10 = √(50α² + 100α + 350).
Hence
α² + 2α + 5 = 0.
Because ∆ = 2² − 4 × 5 = −16 < 0, the equation has no real solutions. Therefore we
conclude that there does not exist a value for α so that the angle at b̄ has magnitude π/4.

Example 1.5.13. Consider a triangle such as in the figure below.




[Figure: a triangle with vertices ā, b̄ and c̄.]

We determine ∥b̄ − c̄∥ if ∥ā − b̄∥ = 3, ∥c̄ − ā∥ = 4 and the angle at b̄ has magnitude π/6.

Solution. By the Cosine Rule (1.21),

∥ā − c̄∥² = ∥c̄ − b̄∥² + ∥ā − b̄∥² − 2∥ā − b̄∥∥c̄ − b̄∥ cos(π/6).

Therefore
16 = 9 + ∥b̄ − c̄∥² − 3√3∥b̄ − c̄∥.
That is,
∥b̄ − c̄∥² − 3√3∥b̄ − c̄∥ − 7 = 0.
We use the quadratic formula to solve for ∥b̄ − c̄∥, and find
∥b̄ − c̄∥ = (3√3 + √55)/2 or ∥b̄ − c̄∥ = (3√3 − √55)/2.
Since √55 > √27 = 3√3, it follows that
(3√3 − √55)/2 < 0.
Since ∥b̄ − c̄∥ ≥ 0, it therefore follows that ∥b̄ − c̄∥ = (3√3 + √55)/2.
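
The method of Example 1.5.13 (treat the Cosine Rule as a quadratic equation in the unknown side and discard any root that cannot be a length) can be sketched in Python as follows. The function name `unknown_side` and its parameter names are hypothetical, and floating-point arithmetic is assumed.

```python
import math

def unknown_side(known_side, opposite_side, theta):
    """Solve opposite^2 = known^2 + x^2 - 2*known*x*cos(theta) for the side length x > 0.

    theta is the magnitude of the angle between the known side and the unknown side.
    """
    # Rearranged: x^2 - (2*known*cos(theta)) x + (known^2 - opposite^2) = 0.
    b = -2 * known_side * math.cos(theta)
    c = known_side**2 - opposite_side**2
    disc = b * b - 4 * c
    if disc < 0:
        return []                                  # no triangle with these measurements
    roots = {(-b + math.sqrt(disc)) / 2, (-b - math.sqrt(disc)) / 2}
    return sorted(x for x in roots if x > 0)       # a side of a triangle has positive length

print(unknown_side(3, 4, math.pi / 6))             # data of Example 1.5.13
```

For the data of the example the quadratic is x² − 3√3 x − 7 = 0, and only the root (3√3 + √55)/2 is positive. For other measurements the quadratic may have two positive roots, in which case the sketch returns both.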

In this section angles and the magnitude of an angle are defined in the context of our
model R3 for space. As an application, we define triangles and show that our definitions are
consistent with the Cosine Law. We therefore have reason to believe that our definitions
are reasonable. The applications of angles and angle measurement, in particular through
the formula (1.21), are demonstrated by means of a number of examples.

Exercise 1.5

1. Find the magnitude of the given angle, if it is defined. Otherwise, explain why it is
not defined.
(a) The angle determined by the rays with equations x̄ = 0̄ + t⟨0, 2, 1⟩ and x̄ =
0̄ + t⟨3, −1, 2⟩, t ≥ 0
(b) The angle determined √by the rays with equations x̄ = ⟨1, 0, 1⟩ + t⟨0, 2, 0⟩ and
x̄ = ⟨1, 0, 1⟩ + t⟨−1, 3, 2⟩, t ≥ 0
(c) The angle determined by the rays with equations x̄ = ⟨1, 2, 3⟩ + t⟨√2, −3√2, −2⟩
and x̄ = ⟨1, 2, 3⟩ + t⟨0, 2, 0⟩, t ≥ 0
(d) The angle determined by the rays with equations x̄ = ⟨−2, 2, 1⟩ + t⟨4, 8, 4⟩ and
x̄ = ⟨−2, 2, 1⟩ + t⟨1, 2, 1⟩, t ≥ 0
(e) The angle determined by the rays with equations x̄ = ⟨3, 2, 1⟩ + t⟨1, 0, 1/√3⟩ and
x̄ = ⟨2, 2, 5⟩ + t⟨0, 0, −1⟩, t ≥ 0
(f) The angle determined by the rays with equations x̄ = ⟨2, 2, 5⟩ + t⟨3, 0, 3⟩ and
x̄ = ⟨2, 2, 5⟩ + t⟨0, 0, 2⟩, t ≥ 0
(g) The angle determined by the rays with equations x̄ = ⟨0, −1, −3⟩ + t⟨1 + √3, 2, 1 − √3⟩
and x̄ = ⟨0, −1, −3⟩ + t⟨1, 2, 1⟩, t ≥ 0
(h) The angle determined by the rays with equations x̄ = ⟨0, −1, −3⟩ + t⟨−1, 1 + √3, 2 + √3⟩
and x̄ = ⟨0, −1, −3⟩ + t⟨−2, −4, −2⟩, t ≥ 0
(i) The angle determined by the rays with equations x̄ = ⟨1, 2, 1⟩ + t⟨1, 1, 2⟩ and
x̄ = ⟨1, 2, 1⟩ + t⟨−1, −1, 2⟩, t ≥ 0

2. Consider the triangle with vertices ā, b̄ and c̄. In each case, find the magnitude of the
specified angle.
(a) ā = ⟨1, 1, 1⟩, b̄ = ⟨1, 3, 2⟩ and c̄ = ⟨7, −1, 5⟩; the angle at ā
(b) ā = ⟨1, −1, 1⟩, b̄ = ⟨1, 3, 1⟩ and c̄ = ⟨1 − √2, 3√2 − 1, 3⟩; the angle at ā
(c) ā = ⟨1, 1, 1⟩, b̄ = ⟨0, 1, 0⟩ and c̄ = ⟨0, 0, 1⟩; the angle at c̄
(d) ā = 0̄, b̄ = ⟨2, −6, −2√2⟩ and c̄ = ⟨0, 1, 0⟩; the angle at ā
(e) ā = ⟨√3, 3, −√3⟩, b̄ = ⟨1, 5, 1⟩ and c̄ = ⟨−1, 1, −1⟩; the angle at c̄
(f) ā = ⟨0, 2, −3⟩, b̄ = ⟨√3, 2, −2⟩ and c̄ = ⟨0, 2, −5⟩; the angle at ā
(g) ā = ⟨2, 1, 3√3⟩, b̄ = ⟨−1, 1, 2√3⟩ and c̄ = ⟨−1, 1, 4√3⟩; the angle at b̄

3. Consider the triangle with vertices ā = 0̄, b̄ = ⟨1, 2, 1⟩ and c̄ = ⟨0, α, 1⟩. Find the
values of α such that the magnitude of the angle at ā is
(a) π/6 (b) π/4 (c) π/3 (d) arccos(√2/(2√3)) (e) 5π/6 (f) 3π/4 (g) π/2
4. Consider a triangle with vertices ā, b̄ and c̄, such as the one in the figure below.

[Figure: a triangle with vertices ā, b̄ and c̄.]

(a) Find ∥ā − c̄∥ if ∥ā − b̄∥ = 2, ∥c̄ − b̄∥ = 3 and the angle at b̄ has magnitude π/6.
(b) Find ∥ā − b̄∥ if ∥ā − c̄∥ = 2, ∥c̄ − b̄∥ = 3 and the angle at b̄ has magnitude π/3.
(c) Find ∥ā − b̄∥ if ∥ā − c̄∥ = 3, ∥c̄ − b̄∥ = 2 and the angle at b̄ has magnitude π/2.
(d) Find ∥ā − b̄∥ if ∥ā − c̄∥ = 3, the angle at ā has magnitude π/3 and the angle at c̄
has magnitude π/2.

5. Consider a triangle with vertices ā, b̄ and c̄ such that the angle at ā has magnitude π/2.
Show that ∥b̄ − c̄∥² = ∥ā − c̄∥² + ∥ā − b̄∥².

6. Consider a triangle with vertices ā, b̄ and c̄ such that ∥b̄ − c̄∥² = ∥ā − c̄∥² + ∥ā − b̄∥².
Show that the angle at ā has magnitude π/2.

7. Consider a triangle with vertices ā, b̄ and c̄. If the angles at ā and c̄ have the same
magnitude, show that ∥b̄ − ā∥ = ∥b̄ − c̄∥.

8. Consider a triangle with vertices ā, b̄ and c̄. If ∥b̄ − ā∥ = ∥b̄ − c̄∥, show that the angles
at ā and c̄ have the same magnitude.

9. Consider a triangle with vertices ā, b̄ and c̄. If ∥b̄ − ā∥ = ∥b̄ − c̄∥ = ∥ā − c̄∥, show that
the angles at ā, b̄ and c̄ have the same magnitude.

10. Consider a triangle with vertices ā, b̄ and c̄. If the angles at ā, b̄ and c̄ have the same
magnitude, show that ∥b̄ − ā∥ = ∥b̄ − c̄∥ = ∥ā − c̄∥.

1.6 Planes in Space

When we think of a ‘flat surface’ in space, we imagine something like a table top. In this
section we discuss a mathematical model for such ‘flat surfaces’, called planes, within our
model R3 for space.

How should we define a plane? In order to arrive at a satisfactory definition we note the
following. A few physical observations should convince us that for two distinct lines in
space that intersect in one point, there is exactly one ‘flat surface’ containing both lines.
The figure below shows two lines intersecting at a point, and the ‘flat surface’ containing
these two lines.

[Figure: two lines L1 and L2 intersecting at a point, and the ‘flat surface’ containing both lines.]


On the other hand, if a straight line intersects a ‘flat surface’ in more than one point, then
the line is contained in this surface. The figure below shows a ‘flat surface’, a line intersecting
the surface in a single point, and a line intersecting the surface in two points. According to
the sketch, the second line is contained in the ‘flat surface’.

[Figure: a ‘flat surface’, a line intersecting the surface in a single point, and a line intersecting the surface in two points.]

Lastly, given three points in space, not all on the same line, there is exactly one ‘flat
surface’ containing these three points. Think of a rigid, rectangular, flat wooden board. If
three corners of the board are fixed to a wall, then the entire board is against the wall.

These observations lead us to a definition of a plane in R3 . Consider two lines L1 and L2
in space that intersect at a single point r̄. According to Theorem 1.4.16, L1 and L2 have
equations
x̄ = r̄ + t(p̄ − r̄), t ∈ R
and
ȳ = r̄ + s(q̄ − r̄), s ∈ R,
respectively, where p̄ ∈ L1 and q̄ ∈ L2 , with p̄, q̄ ̸= r̄. We believe that there is a ‘flat surface’
P in space containing L1 and L2 . For all t, s ∈ R, the points
x̄ = r̄ + t(p̄ − r̄), ȳ = r̄ + s(q̄ − r̄)
are on the ‘flat surface’ P . Therefore the line through x̄ and ȳ must be contained in P .
Hence for every α ∈ R, the point
z̄ = x̄ + α(ȳ − x̄) = r̄ + (αs)(q̄ − r̄) + (t − αt)(p̄ − r̄)
is a point on the ‘flat surface’ P .

[Figure: the lines L1 and L2 intersecting at r̄ on the ‘flat surface’, with x̄ on L1, ȳ on L2, and the point z̄ on the line through x̄ and ȳ.]

It seems reasonable that these are the only points on P . Therefore, based on these obser-
vations, we define a plane in space as follows.
Definition 1.6.1. A plane in R3 is a set
P = {r̄ + s(p̄ − r̄) + t(q̄ − r̄) : s, t ∈ R}
where p̄, q̄ and r̄ are vectors in R3 such that p̄ − r̄ and q̄ − r̄ are not scalar multiples of each
other.
Remark 1.6.2. Consider a set P = {r̄ + s(p̄ − r̄) + t(q̄ − r̄) : s, t ∈ R}, where p̄, q̄, r̄ ∈ R³.

(1) If p̄ − r̄ = α(q̄ − r̄) for some α ∈ R, then


P = {r̄ + s(α(q̄ − r̄)) + t(q̄ − r̄) : s, t ∈ R} = {r̄ + (αs + t)(q̄ − r̄) : s, t ∈ R}.
Therefore, in this case, the set P is a line in space, provided that q̄ ≠ r̄.
(2) If p̄ − r̄ and q̄ − r̄ are not scalar multiples of each other, then P is a plane in R3 . Note
that
r̄ = r̄ + 0(p̄ − r̄) + 0(q̄ − r̄), p̄ = r̄ + 1(p̄ − r̄) + 0(q̄ − r̄)
and
q̄ = r̄ + 0(p̄ − r̄) + 1(q̄ − r̄).
Thus r̄ ∈ P , p̄ ∈ P and q̄ ∈ P . We therefore call P the plane through p̄, q̄ and r̄.

(3) If P is a plane, then a point x̄ ∈ R3 is on the plane P if and only if

x̄ = r̄ + s(p̄ − r̄) + t(q̄ − r̄) for some s, t ∈ R.

We therefore speak of the plane described by the equation

x̄ = r̄ + s(p̄ − r̄) + t(q̄ − r̄).

(4) Let P = {r̄ + s(p̄ − r̄) + t(q̄ − r̄) : s, t ∈ R} be a plane through p̄, q̄ and r̄, and let
ā = α(p̄ − r̄) and b̄ = β(q̄ − r̄), with α and β nonzero real numbers. Then it follows
that, also,
P = {r̄ + sā + tb̄ : s, t ∈ R}.
Conversely, if ā and b̄ are nonzero vectors, not scalar multiples of each other, then

Q = {r̄ + sā + tb̄ : s, t ∈ R}

= {r̄ + s((ā + r̄) − r̄) + t((b̄ + r̄) − r̄) : s, t ∈ R}

is a plane through r̄, r̄ + ā and r̄ + b̄.

We illustrate the definition of a plane by means of the following examples.

Example 1.6.3. Consider the set P = {⟨2, 3, −2⟩+s⟨1, −4, 2⟩+t⟨2, −3, 4⟩ : s, t ∈ R}, and
let r̄ = ⟨2, 3, −2⟩, ā = ⟨1, −4, 2⟩ and b̄ = ⟨2, −3, 4⟩. Since ā and b̄ are not scalar multiples
of each other, P is a plane through r̄, r̄ + ā = ⟨3, −1, 0⟩ and r̄ + b̄ = ⟨4, 0, 2⟩.

Example 1.6.4. Consider the set P = {⟨−1, 1, 6⟩ + s⟨2, 3, −5⟩ + t⟨−6, −9, 15⟩ : s, t ∈ R}.
Then, since ⟨−6, −9, 15⟩ = −3⟨2, 3, −5⟩,

P = {⟨−1, 1, 6⟩ + (s − 3t)⟨2, 3, −5⟩ : s, t ∈ R}

= {⟨−1, 1, 6⟩ + α⟨2, 3, −5⟩ : α ∈ R}.

Therefore P is the line through ⟨−1, 1, 6⟩ and ⟨1, 4, 1⟩.

Example 1.6.5. Consider the plane P = {r̄ + s(p̄ − r̄) + t(q̄ − r̄) : s, t ∈ R}, where
r̄ = ⟨1, 1, 0⟩, p̄ = ⟨0, 3, 1⟩ and q̄ = ⟨1, 2, 1⟩. We determine whether or not the points
x̄ = ⟨2, 0, 0⟩ and ȳ = ⟨1, 2, 0⟩ are on P .

Solution. The point x̄ = ⟨2, 0, 0⟩ is on P if and only if

x̄ = r̄ + s(p̄ − r̄) + t(q̄ − r̄) = ⟨1 − s, 1 + 2s + t, s + t⟩ for some s, t ∈ R.

According to Definition 1.1.4, the point x̄ is on P if and only if

2 = 1 − s, 0 = 1 + 2s + t and 0 = s + t for some s, t ∈ R.

According to the first equation, s = −1. Substituting s = −1 into the second equation,
we find that t = 1. These values for s and t also satisfy the third equation. Therefore
x̄ = r̄ − (p̄ − r̄) + (q̄ − r̄), so that x̄ ∈ P .

To determine whether or not ȳ = ⟨1, 2, 0⟩ is on P, we proceed in the same way: ȳ is on P
if and only if

ȳ = r̄ + s(p̄ − r̄) + t(q̄ − r̄) = ⟨1 − s, 1 + 2s + t, s + t⟩ for some s, t ∈ R.

It follows from Definition 1.1.4 that ȳ ∈ P if and only if

1 = 1 − s, 2 = 1 + 2s + t and 0 = s + t for some s, t ∈ R.

The first equation implies that s = 0. We substitute s = 0 into the second equation and
find that t = 1. On the other hand, substituting s = 0 into the third equation yields t = 0.
Since 0 ̸= 1, there are no values for s and t so that ȳ = r̄ + s(p̄ − r̄) + t(q̄ − r̄). Therefore ȳ
is not a point on P .
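
The procedure in Example 1.6.5 can be sketched in Python as follows. This is an illustration under our own naming (the function plane_parameters is hypothetical): two of the three component equations are solved for s and t by Cramer's rule, and the result is then tested against all three equations. Exact rational arithmetic is used so that the equality tests are reliable.

```python
from fractions import Fraction

def plane_parameters(r, p, q, x):
    """Return (s, t) with x = r + s(p - r) + t(q - r), or None if x is not on the plane."""
    a = [Fraction(pi - ri) for pi, ri in zip(p, r)]   # p - r
    b = [Fraction(qi - ri) for qi, ri in zip(q, r)]   # q - r
    d = [Fraction(xi - ri) for xi, ri in zip(x, r)]   # x - r
    # Solve two of the three component equations by Cramer's rule,
    # then test the candidate (s, t) against all three equations.
    for i, j in ((0, 1), (0, 2), (1, 2)):
        det = a[i] * b[j] - a[j] * b[i]
        if det != 0:
            s = (d[i] * b[j] - d[j] * b[i]) / det
            t = (a[i] * d[j] - a[j] * d[i]) / det
            on_plane = all(s * a[k] + t * b[k] == d[k] for k in range(3))
            return (s, t) if on_plane else None
    return None  # p - r and q - r are scalar multiples: no plane is defined

r, p, q = (1, 1, 0), (0, 3, 1), (1, 2, 1)
print(plane_parameters(r, p, q, (2, 0, 0)))   # s = -1 and t = 1, as in the example
print(plane_parameters(r, p, q, (1, 2, 0)))   # None: the point is not on P
```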

At the beginning of this section, we mentioned three properties that a ‘flat surface’ in space
should satisfy. As a motivation for the definition of a plane in R³, we show that these
properties are satisfied.
Theorem 1.6.6. Let L be a line in R³ and P a plane in R³. If L intersects P in more than
one point, then L ⊆ P; that is, every point on L is also a point on P.

Proof. Let P = {r̄ + s(p̄ − r̄) + t(q̄ − r̄) : s, t ∈ R}, and suppose that L intersects P in
two points ā and b̄. Then, by Theorem 1.4.16,

L = {tā + (1 − t)b̄ : t ∈ R}.

Since ā, b̄ ∈ P , there exist real numbers s0 , s1 and t0 , t1 so that

ā = r̄ + s0 (p̄ − r̄) + t0 (q̄ − r̄) and b̄ = r̄ + s1 (p̄ − r̄) + t1 (q̄ − r̄).

Consider a point x̄ ∈ L. Then, for some t ∈ R,

x̄ = tā + (1 − t)b̄

= t[r̄ + s0 (p̄ − r̄) + t0 (q̄ − r̄)] + (1 − t)[r̄ + s1 (p̄ − r̄) + t1 (q̄ − r̄)]

= [t + (1 − t)]r̄ + [ts0 + (1 − t)s1 ](p̄ − r̄) + [tt0 + (1 − t)t1 ](q̄ − r̄)

= r̄ + [ts0 + (1 − t)s1 ](p̄ − r̄) + [tt0 + (1 − t)t1 ](q̄ − r̄).

Hence x̄ ∈ P so that L ⊆ P .
Theorem 1.6.7. Let L1 and L2 be lines in space that intersect at exactly one point. Then
there is exactly one plane P so that L1 ⊆ P and L2 ⊆ P .

Proof. We only prove that at least one plane P exists that contains L1 and L2 . Assume
that L1 and L2 intersect at a point ā. Let b̄ be a point on L1 and c̄ a point on L2 , both
different from ā. Then, in view of Theorem 1.4.16, we have

L1 = {ā + t(b̄ − ā) : t ∈ R} and L2 = {ā + s(c̄ − ā) : s ∈ R}.

The set P = {ā + t(b̄ − ā) + s(c̄ − ā) : s, t ∈ R} is a plane in R3 that contains both L1 and
L2 , see Exercise 1.6 number 3.

We illustrate Theorem 1.6.7 by means of the following example.


Example 1.6.8. Consider the lines L1 and L2 through p̄ = ⟨1, 0, 1⟩ and q̄ = ⟨1, 2, −1⟩, and
ū = ⟨−1, 2, 0⟩ and v̄ = ⟨−3, 3, 0⟩, respectively. We show that L1 and L2 intersect, and find
an equation for the plane containing both lines.

Solution. L1 and L2 intersect at a point ā = ⟨x, y, z⟩ if and only if

tp̄ + (1 − t)q̄ = ā = sū + (1 − s)v̄ for some s, t ∈ R.

Hence, by Definition 1.1.4, L1 and L2 intersect at ā if and only if

1 = x = 2s − 3, 2 − 2t = y = 3 − s and 2t − 1 = z = 0 for some s, t ∈ R.

The first equation implies that s = 2, and the third that t = 1/2. These values for s and t
also satisfy the second equation. Therefore the lines L1 and L2 intersect at the point

ā = (1/2)p̄ + (1 − 1/2)q̄ = ⟨1, 1, 0⟩.

According to Theorem 1.6.7, there is exactly one plane P containing both L1 and L2 . In
order to write down an equation for this plane, we need three points on the plane that are
not all on the same line. We have

ā ∈ L1 , ā ∈ L2 , p̄ ∈ L1 , ū ∈ L2 .

Since P contains L1 and L2 , it follows that ā, p̄, ū ∈ P . But p̄ ∉ L2 and ū ∉ L1 . Therefore
ā, p̄ and ū are not all on the same line. Hence an equation for P is

x̄ = ā + s(p̄ − ā) + t(ū − ā) = ⟨1, 1, 0⟩ + s⟨0, −1, 1⟩ + t⟨−2, 1, 0⟩.
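
The computation in Example 1.6.8 can be reproduced with a short Python sketch. It is an illustration only, and the function name intersect_lines is our own. The lines are written as q̄ + t(p̄ − q̄) and v̄ + s(ū − v̄), which agrees with tp̄ + (1 − t)q̄ and sū + (1 − s)v̄ in the example.

```python
from fractions import Fraction

def intersect_lines(p, q, u, v):
    """Common point of the line through p, q and the line through u, v.

    Returns the point as a tuple of Fractions, or None if no single
    intersection point is found.
    """
    a = [Fraction(pi - qi) for pi, qi in zip(p, q)]   # p - q
    b = [Fraction(ui - vi) for ui, vi in zip(u, v)]   # u - v
    d = [Fraction(vi - qi) for vi, qi in zip(v, q)]   # v - q
    # q + t*a = v + s*b is equivalent to t*a - s*b = d.
    for i, j in ((0, 1), (0, 2), (1, 2)):
        det = a[i] * (-b[j]) - a[j] * (-b[i])
        if det != 0:
            t = (d[i] * (-b[j]) - d[j] * (-b[i])) / det
            s = (a[i] * d[j] - a[j] * d[i]) / det
            if all(t * a[k] - s * b[k] == d[k] for k in range(3)):
                return tuple(qk + t * ak for qk, ak in zip(q, a))
            return None
    return None

p, q = (1, 0, 1), (1, 2, -1)
u, v = (-1, 2, 0), (-3, 3, 0)
a_point = intersect_lines(p, q, u, v)

# Direction vectors p - a and u - a of the plane containing both lines.
dir1 = tuple(pi - ai for pi, ai in zip(p, a_point))
dir2 = tuple(ui - ai for ui, ai in zip(u, a_point))
print(a_point, dir1, dir2)
```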

Theorem 1.6.9. Let p̄, q̄ and r̄ be points in R3 , not all on the same line. Then there exists
a unique plane P in R3 so that p̄, q̄, r̄ ∈ P .

Proof. Let L1 be the line through r̄ and p̄, and L2 the line through r̄ and q̄. Since p̄, q̄ and
r̄ are not all on the same line, it follows that L1 and L2 intersect at a single point, namely,
at r̄. It follows from Theorem 1.6.7 that there exists a unique plane P containing L1 and
L2 . Since p̄, r̄ ∈ L1 and q̄ ∈ L2 it follows that p̄, q̄, r̄ ∈ P . If Q is a plane containing p̄, q̄ and
r̄, then Q contains L1 and L2 by Theorem 1.6.6. Since P is the unique plane containing L1
and L2 , it follows that P = Q.

Recall from Section 1.4 that we call two lines L1 = {q̄ + t(p̄ − q̄) : t ∈ R} and L2 =
{b̄ + t(ā − b̄) : t ∈ R} parallel if p̄ − q̄ and ā − b̄ are scalar multiples of each other. We
showed, in Theorem 1.4.19, that if L1 and L2 are parallel and L1 ̸= L2 , then L1 and L2 do
not intersect. The converse of this statement is false. Indeed, in Example 1.4.22 we gave
an example of two lines that are not parallel, and do not intersect. Our next result clarifies
this issue.

Theorem 1.6.10. Consider two lines L1 = {q̄ + t(p̄ − q̄) : t ∈ R} and L2 = {b̄ + t(ā − b̄) : t ∈ R}
in R³ so that L1 ≠ L2 . The following statements are true.

(1) If L1 and L2 are parallel, then there exists exactly one plane P containing L1 and L2 .

(2) If L1 and L2 do not intersect, and there is a plane P so that L1 and L2 are contained
in P , then L1 and L2 are parallel.

We only prove (1). A proof of a special case of (2) is given as Exercise 1.6 number 5.

Proof of (1). Assume that L1 and L2 are parallel. Then by the definition of parallel lines,
Definition 1.4.17, there exists a nonzero real number α so that

ā − b̄ = α(p̄ − q̄).

Since L1 and L2 are parallel and L1 ≠ L2 , it follows from Theorem 1.4.19 that L1 and L2 do not intersect.
Therefore p̄, q̄ and b̄ are not all on the same line. By Theorem 1.6.9,

P = {q̄ + s(p̄ − q̄) + t(b̄ − q̄) : s, t ∈ R}

is a plane in R3 containing p̄, q̄ and b̄. The plane P contains the lines L1 and L2 , see Exercise
1.6 number 4(a).

Now assume that Q is a plane containing both L1 and L2 . Then P = Q, see Exercise 1.6
number 4(b). Therefore there is exactly one plane containing both L1 and L2 .

In this section planes in R3 are introduced as a mathematical model for a ‘flat surface’ in
space. It is shown that our model matches our intuition for what a ‘flat surface’ should look
like. In some situations, such as when determining the intersection of two planes, or the
intersection of a plane and a line, the definition of a plane given here is not convenient. In
Section 1.7 we introduce an alternative description of a plane which simplifies such problems.

Exercise 1.6

1. Determine whether the point ā is on the plane through the points p̄, q̄ and r̄.
(a) ā = ⟨2, 6, 5⟩; p̄ = ⟨−1, 3, 2⟩, q̄ = ⟨0, 4, 3⟩ and r̄ = ⟨1, 2, 1⟩.
(b) ā = ⟨5, 12, −35⟩; p̄ = ⟨1, 1, −1⟩, q̄ = ⟨4, 1, 1⟩ and r̄ = ⟨2, 0, 3⟩.
(c) ā = ⟨3, 1, 2⟩; p̄ = ⟨2, 1, 2⟩, q̄ = ⟨1, 1, 3⟩ and r̄ = ⟨0, −1, 1⟩.

(d) ā = ⟨1, −1, 1⟩; p̄ = ⟨2, 0, 3⟩, q̄ = ⟨−1, 2, 3⟩ and r̄ = ⟨−2, 3, 1⟩.
(e) ā = ⟨−1, 3, 2⟩; p̄ = ⟨0, 4, 3⟩, q̄ = ⟨1, 2, 1⟩ and r̄ = ⟨2, 6, 5⟩.
(f) ā = ⟨4, −8, 2⟩; p̄ = ⟨4, 2, 1⟩, q̄ = ⟨2, 6, 1⟩ and r̄ = ⟨1, 3, −1⟩.

2. Determine whether or not there is a plane containing the given lines, and find an
equation for the plane, if it exists.
(a) L1 is the line with equation x̄ = ⟨0, 1, 3⟩ + t⟨−1, 1, 2⟩ and L2 is the line with
equation x̄ = ⟨2, −1, −1⟩ + s⟨2, 1, 1⟩.
(b) L1 is the line through p̄ = ⟨1, 0, −1⟩ and q̄ = ⟨2, 2, 1⟩, and L2 is the line through
ū = ⟨2, −1, 3⟩ and v̄ = ⟨−2, −3, −9⟩.
(c) L1 is the line with equation x̄ = ⟨1, 0, 0⟩+t⟨0, 1, 0⟩ and L2 is the line with equation
x̄ = ⟨0, 0, 1⟩ + s⟨1, 0, 0⟩.
(d) L1 is the line through 0̄ and ī, and L2 is the line through j̄ and v̄ = ⟨1, 1, 0⟩.

3. Consider the proof of Theorem 1.6.7. Show that P = {ā + t(b̄ − ā) + s(c̄ − ā) : s, t ∈ R}
is a plane in R³ that contains both L1 and L2 .

4. Consider the proof of Theorem 1.6.10 (1).


(a) Show that the plane P contains the lines L1 and L2 .
(b) If Q is a plane containing the lines L1 and L2 , prove that Q = P .

5. The aim of this exercise is to prove a special case of Theorem 1.6.10 (2). Let P be the
xy-plane; that is, P = {tī + sj̄ : s, t ∈ R}. Assume that L1 = {p̄ + tq̄ : t ∈ R} and
L2 = {ā + sb̄ : s ∈ R} are lines in R3 , both contained in P .
(a) Assume that L1 and L2 are not parallel.
(i) Show that a3 = b3 = p3 = q3 = 0.
(ii) Explain why b1 ̸= 0 or b2 ̸= 0.
(iii) Assume that b1 ̸= 0. Use the fact that L1 and L2 are not parallel to prove
that b2 q1 − q2 b1 ̸= 0. [HINT: What happens if b2 q1 − q2 b1 = 0?]
(iv) Prove that L1 and L2 intersect at a single point.
(b) In (a) you prove that if L1 and L2 are not parallel, then they intersect in a single
point. Explain why the following is now true: If L1 and L2 do not intersect, then
they are parallel.

1.7 The Cross Product

In this section, we introduce a new operation on algebraic vectors in R3 , called the cross
product. Our motivation for doing so comes both from within mathematics itself, and from
applications of mathematics.

The following problem often arises in applications. Given two lines L1 and L2 which intersect
at a point ā, find a third line through ā that is perpendicular to both L1 and L2 . For
instance, an engineer designing a steel frame of a high rise building would have to solve this
kind of problem on a number of occasions. Furthermore, as mentioned at the end of Section
1.6, the definition of a plane is often inconvenient to work with, particularly in a situation
where more than one plane is involved. The cross product provides a convenient alternative
description of a plane.

Definition 1.7.1. If ū = ⟨u1 , u2 , u3 ⟩ and v̄ = ⟨v1 , v2 , v3 ⟩ are algebraic vectors in R³, then
the cross product of ū and v̄ is the algebraic vector

ū × v̄ = ⟨u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ⟩.

The following visual representation may help you to remember the formula for the cross
product of two algebraic vectors.

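[Figure: the components of ū = ⟨u1 , u2 , u3 ⟩ are written above those of v̄ = ⟨v1 , v2 , v3 ⟩ in three
blocks, with column pairs (2, 3), (3, 1) and (1, 2). In each block the product along the
diagonal running down to the right carries a plus sign and the product along the other
diagonal carries a minus sign, which gives
ū × v̄ = ⟨u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ⟩.]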

We illustrate Definition 1.7.1 by means of an example.

Example 1.7.2. Let ū = ⟨1, −2, 1⟩, v̄ = ⟨3, 1, −1⟩ and w̄ = ⟨2, −4, 2⟩. Then

ū × v̄ = ⟨[(−2) × (−1)] − [1 × 1], [1 × 3] − [1 × (−1)], [1 × 1] − [(−2) × 3]⟩

= ⟨1, 4, 7⟩,

v̄ × ū = ⟨[1 × 1] − [(−1) × (−2)], [(−1) × 1] − [3 × 1], [3 × (−2)] − [1 × 1]⟩

= ⟨−1, −4, −7⟩,

v̄ × w̄ = ⟨[1 × 2] − [(−1) × (−4)], [(−1) × 2] − [3 × 2], [3 × (−4)] − [1 × 2]⟩

= ⟨−2, −8, −14⟩


and
ū × w̄ = ⟨[(−2) × 2] − [1 × (−4)], [1 × 2] − [1 × 2], [1 × (−4)] − [(−2) × 2]⟩

= ⟨0, 0, 0⟩.
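
Definition 1.7.1 translates directly into code. The following Python sketch is an illustration only, and the function name cross is our own. It recomputes the four cross products of Example 1.7.2.

```python
def cross(u, v):
    """Cross product of two vectors in R^3, following Definition 1.7.1."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

u = (1, -2, 1)
v = (3, 1, -1)
w = (2, -4, 2)

print(cross(u, v))   # (1, 4, 7)
print(cross(v, u))   # (-1, -4, -7)
print(cross(v, w))   # (-2, -8, -14)
print(cross(u, w))   # (0, 0, 0)
```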

Remark 1.7.3. Consider the algebraic vectors ū, v̄ and w̄ in Example 1.7.2.

(1) Note that ū × v̄ ̸= v̄ × ū. In fact, ū × v̄ = −(v̄ × ū). Therefore the cross product is
not commutative.

(2) Recall that if a and b are real numbers such that ab = 0, then a = 0 or b = 0. Observe
that ū ≠ 0̄ and w̄ = 2ū ≠ 0̄, but ū × w̄ = 0̄. The cross product therefore admits zero divisors.

(3) Lastly, we have w̄ = 2ū, as noted above, and v̄ × w̄ = 2(v̄ × ū).

The observations in (1), (2) and (3) above are not incidental, as we shall see shortly.

The cross product satisfies the following properties.

Theorem 1.7.4. Let ū, v̄ and w̄ be algebraic vectors in R3 , and α a real number. Then the
following identities hold.

(1) ū × v̄ = − (v̄ × ū)

(2) ū × (v̄ + w̄) = ū × v̄ + ū × w̄

(3) (ū + v̄) × w̄ = ū × w̄ + v̄ × w̄

(4) (αū) × v̄ = α (ū × v̄) = ū × (αv̄)

(5) ū × v̄ = 0̄ if and only if v̄ is a scalar multiple of ū, or ū is a scalar multiple of v̄.

We give a proof of the identity in (2), with the remaining proofs given as exercises, see
Exercise 1.7 numbers 7 to 10.

Proof of (2). We have

ū × (v̄ + w̄) = ⟨u1 , u2 , u3 ⟩ × ⟨v1 + w1 , v2 + w2 , v3 + w3 ⟩ [Definition 1.1.6]

= ⟨u2 (v3 + w3 ) − u3 (v2 + w2 ) ,

u3 (v1 + w1 ) − u1 (v3 + w3 ) ,

u1 (v2 + w2 ) − u2 (v1 + w1 )⟩. [Definition 1.7.1]

Since multiplication in R is distributive over addition, and addition in R is associative and
commutative, it follows that
ū × (v̄ + w̄) = ⟨(u2 v3 − u3 v2 ) + (u2 w3 − u3 w2 ) ,

(u3 v1 − u1 v3 ) + (u3 w1 − u1 w3 ) ,

(u1 v2 − u2 v1 ) + (u1 w2 − u2 w1 )⟩

= ⟨u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ⟩+

⟨u2 w3 − u3 w2 , u3 w1 − u1 w3 , u1 w2 − u2 w1 ⟩ [Definition 1.1.6]

= ū × v̄ + ū × w̄. [Definition 1.7.1]

We illustrate some of the identities in Theorem 1.7.4 by means of an example.

Example 1.7.5. Consider the algebraic vectors ū = ⟨2, −1, 3⟩, v̄ = ⟨−1, 5, 1⟩ and w̄ =
⟨2, 1, 2⟩. We show that

(a) (ū + v̄) × w̄ = ū × w̄ + v̄ × w̄

(b) (3ū) × v̄ = 3 (ū × v̄)

thus illustrating the validity of Theorem 1.7.4 (3) and (4).

Solution. (a) We have

(ū + v̄) × w̄ = ⟨1, 4, 4⟩ × ⟨2, 1, 2⟩

= ⟨(4 × 2) − (4 × 1), (4 × 2) − (1 × 2), (1 × 1) − (4 × 2)⟩

= ⟨4, 6, −7⟩.

On the other hand,

ū × w̄ + v̄ × w̄ = ⟨2, −1, 3⟩ × ⟨2, 1, 2⟩ + ⟨−1, 5, 1⟩ × ⟨2, 1, 2⟩

= ⟨[(−1) × 2] − (3 × 1), (3 × 2) − (2 × 2), (2 × 1) − [(−1) × 2]⟩

+⟨(5 × 2) − (1 × 1), (1 × 2) − [(−1) × 2], [(−1) × 1] − (5 × 2)⟩

= ⟨−5, 2, 4⟩ + ⟨9, 4, −11⟩

= ⟨4, 6, −7⟩.

Therefore (ū + v̄) × w̄ = ū × w̄ + v̄ × w̄.


(b) We calculate (3ū) × v̄ and find that

(3ū) × v̄ = ⟨6, −3, 9⟩ × ⟨−1, 5, 1⟩

= ⟨[(−3) × 1] − (9 × 5), [9 × (−1)] − (6 × 1), (6 × 5) − [(−3) × (−1)]⟩

= ⟨−48, −15, 27⟩.

Likewise,

3 (ū × v̄) = 3 (⟨2, −1, 3⟩ × ⟨−1, 5, 1⟩)

= 3⟨[(−1) × 1] − (3 × 5), [3 × (−1)] − (2 × 1), (2 × 5) − [(−1) × (−1)]⟩

= 3⟨−16, −5, 9⟩

= ⟨−48, −15, 27⟩

so that (3ū) × v̄ = 3 (ū × v̄).
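
The identities (1) to (4) of Theorem 1.7.4 can also be tested on many vectors at once. The following Python sketch is an illustration only, with helper names of our own choosing. It uses random integer vectors, so all arithmetic is exact. Such a test does not prove the identities; it only fails to find a counterexample.

```python
import random

def cross(u, v):
    """Cross product of two vectors in R^3 (Definition 1.7.1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

def scale(alpha, u):
    return tuple(alpha * a for a in u)

def random_vector():
    return tuple(random.randint(-9, 9) for _ in range(3))

random.seed(0)  # fixed seed so that the run is repeatable

all_hold = True
for _ in range(1000):
    u, v, w = random_vector(), random_vector(), random_vector()
    alpha = random.randint(-9, 9)
    all_hold = all_hold and (
        cross(u, v) == scale(-1, cross(v, u))                          # (1)
        and cross(u, add(v, w)) == add(cross(u, v), cross(u, w))       # (2)
        and cross(add(u, v), w) == add(cross(u, w), cross(v, w))       # (3)
        and cross(scale(alpha, u), v) == scale(alpha, cross(u, v))     # (4)
        and scale(alpha, cross(u, v)) == cross(u, scale(alpha, v))     # (4)
    )

print(all_hold)
```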

The following identity shows how the cross product of two algebraic vectors relates to the
dot product. It has important applications to the interpretation of the cross product, as we
shall see shortly.

Theorem 1.7.6. Let ū and v̄ be algebraic vectors in R3 . Then ū · (ū × v̄) = 0 = v̄ · (ū × v̄).

The proof of Theorem 1.7.6 is an elementary application of the definitions of the cross and
dot products, respectively. Instead of proving the theorem, which is given as Exercise 1.7
number 11, we illustrate the result by means of an example.

Example 1.7.7. Let ū = ⟨1, −2, 2⟩ and v̄ = ⟨2, 1, −1⟩. Then

ū × v̄ = ⟨[(−2) × (−1)] − (2 × 1), (2 × 2) − [1 × (−1)], (1 × 1) − [(−2) × 2]⟩ = ⟨0, 5, 5⟩.

Therefore
ū · (ū × v̄) = ⟨1, −2, 2⟩ · ⟨0, 5, 5⟩ = 0
and
v̄ · (ū × v̄) = ⟨2, 1, −1⟩ · ⟨0, 5, 5⟩ = 0.

As an application of Theorem 1.7.6, we have the following interpretation of the cross product.
Definition 1.7.8. An angle is called a right angle if it has magnitude π/2.
Definition 1.7.9. If two lines L1 = {q̄ + t(p̄ − q̄) : t ∈ R} and L2 = {ū + t(v̄ − ū) : t ∈ R}
intersect at a point ā, then L1 and L2 are perpendicular if the angle formed by the rays with
equations x̄ = ā + t(p̄ − q̄) and x̄ = ā + t(v̄ − ū) is a right angle.

Theorem 1.7.10. Consider two rays R0 = {ā + tū : t ≥ 0} and R1 = {ā + tv̄ : t ≥ 0}
such that ū and v̄ are not scalar multiples of each other. If R2 = {ā + t(ū × v̄) : t ≥ 0},
then the angles R0 ∪ R2 and R1 ∪ R2 are right angles.

Proof. We prove that the angle formed by the rays R0 = {ā + tū : t ≥ 0} and R2 =
{ā + t(ū × v̄) : t ≥ 0} is a right angle. Since ū and v̄ are not scalar multiples of each other,
both vectors are nonzero, and ū × v̄ ̸= 0̄ by Theorem 1.7.4 (5). Let θ denote the magnitude
of the angle formed by R0 and R2 . According to Theorem 1.7.6,

ū · (ū × v̄) = 0

so that
cos θ = (ū · (ū × v̄)) / (∥ū∥∥ū × v̄∥) = 0.
Therefore θ = π/2 so that the angle is a right angle.

The proof that the angle formed by {ā + tv̄ : t ≥ 0} and {ā + t(ū × v̄) : t ≥ 0} is a right
angle is similar, and is therefore omitted.

The figure below is an illustration of Theorem 1.7.10.

[Figure: the rays x̄ = ā + tū, x̄ = ā + tv̄ and x̄ = ā + t(ū × v̄) starting at ā, with the points ā + ū, ā + v̄ and ā + ū × v̄ marked.]

We demonstrate the use of Theorem 1.7.10 by means of the following example.
Example 1.7.11. Consider the line L1 through p̄ = ⟨1, 1, 2⟩ and q̄ = ⟨2, 0, −1⟩, and the line
L2 through ū = ⟨1, 1, −1⟩ and v̄ = ⟨−1, 3, 3⟩. We

(a) show that L1 and L2 intersect in a single point ā;


(b) find an equation for a line through ā that is perpendicular to both L1 and L2 ;
(c) show that the line in (b) is the only line through ā that is perpendicular to both L1 and
L2 .

Solution. (a) The lines L1 and L2 intersect at a point ā = ⟨x, y, z⟩ if and only if

p̄ + t(q̄ − p̄) = ā = ū + s(v̄ − ū) for some s, t ∈ R.

In view of Definition 1.1.4, L1 and L2 intersect at ā if and only if

1 + t = x = 1 − 2s, 1 − t = y = 1 + 2s and 2 − 3t = z = 4s − 1 for some s, t ∈ R.

The first equation implies that t = −2s. Substituting t = −2s into the third equation, we
find that 2 + 6s = 4s − 1 so that s = −3/2. Hence t = −2s = 3. These values for s and t also
satisfy the second equation, so that L1 and L2 intersect at one point, namely

ā = p̄ + 3(q̄ − p̄) = ⟨4, −2, −7⟩.

(b) According to Definition 1.7.9 and Theorem 1.7.10, the line L3 with equation

x̄ = ā + t[(q̄ − p̄) × (v̄ − ū)]

is perpendicular to L1 and L2 . We have

(q̄ − p̄) × (v̄ − ū) = ⟨1, −1, −3⟩ × ⟨−2, 2, 4⟩ = ⟨2, 2, 0⟩.

Therefore the line L3 with equation

x̄ = ā + t⟨2, 2, 0⟩

passes through ā and is perpendicular to both L1 and L2 .


(c) Now suppose that the line L with equation x̄ = ā + tr̄, t ∈ R, is perpendicular to L1
and L2 . Let r̄ = ⟨r1 , r2 , r3 ⟩. According to Definitions 1.5.3, 1.7.8 and 1.7.9 we have

(q̄ − p̄) · r̄ = 0 and (v̄ − ū) · r̄ = 0.

We now have the following equations:


r1 − r2 − 3r3 = 0

−2r1 + 2r2 + 4r3 = 0.

Adding two times the first equation to the second equation yields r3 = 0. We substitute
r3 = 0 back into the original equations, and obtain the single equation
r1 − r2 = 0.
Therefore r1 = r2 so that
r̄ = ⟨r1 , r1 , 0⟩ = (r1 /2)⟨2, 2, 0⟩.
Hence every point x̄ ∈ L is of the form
x̄ = ā + tr̄ = ā + (r1 t/2)⟨2, 2, 0⟩
for some t ∈ R, and is therefore on L3 . By Theorem 1.4.16, L = L3 so that L3 is the only
line through ā and perpendicular to both L1 and L2 .
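
The numbers in Example 1.7.11 can be verified with the following Python sketch. It is an illustration only, with helper names of our own choosing. It checks that both parametrisations give the same intersection point, and that the cross product of the direction vectors is perpendicular to both of them.

```python
from fractions import Fraction

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3 (Definition 1.7.1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

p, q = (1, 1, 2), (2, 0, -1)
u, v = (1, 1, -1), (-1, 3, 3)

d1 = sub(q, p)   # direction of L1
d2 = sub(v, u)   # direction of L2

# Part (a): t = 3 on L1 and s = -3/2 on L2 give the same point.
a_from_l1 = tuple(pi + 3 * di for pi, di in zip(p, d1))
a_from_l2 = tuple(ui + Fraction(-3, 2) * di for ui, di in zip(u, d2))

# Part (b): the direction of L3 is the cross product of d1 and d2.
n = cross(d1, d2)

print(a_from_l1 == a_from_l2)       # True
print(n)                            # (2, 2, 0)
print(dot(n, d1), dot(n, d2))       # 0 0
```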
Remark 1.7.12. It is true, in general, that if two lines L1 and L2 intersect in a single point
ā, then there exists a unique line through ā that is perpendicular to both L1 and L2 .
Theorem 1.7.13. Let ū and v̄ be vectors in R3 . Then ∥ū × v̄∥2 = ∥ū∥2 ∥v̄∥2 − (ū · v̄)2 .

Proof. According to the definition of the cross product, Definition 1.7.1, we have
ū × v̄ = ⟨u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ⟩.
Therefore
∥ū × v̄∥² = (u2 v3 − u3 v2 )² + (u3 v1 − u1 v3 )² + (u1 v2 − u2 v1 )²

= Σ_{i=1}^{3} Σ_{j=1}^{3} u_i² v_j² − [ Σ_{i=1}^{3} u_i² v_i² + 2 (u2 v2 u3 v3 + u1 v1 u3 v3 + u1 v1 u2 v2 ) ]

= ( Σ_{i=1}^{3} u_i² ) × ( Σ_{j=1}^{3} v_j² ) − ( Σ_{i=1}^{3} u_i v_i )²

= ∥ū∥² ∥v̄∥² − (ū · v̄)².

We illustrate Theorem 1.7.13 by means of an example.


Example 1.7.14. Let ū = ⟨2, 1, −3⟩ and v̄ = ⟨−1, 2, 2⟩. Then
ū × v̄ = ⟨8, −1, 5⟩
so that
∥ū × v̄∥2 = 64 + 1 + 25 = 90.
On the other hand,
∥ū∥² = 14, ∥v̄∥² = 9 and ū · v̄ = −6
so that
∥ū∥2 ∥v̄∥2 − (ū · v̄)2 = 14 × 9 − 36 = 90 = ∥ū × v̄∥2 .
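
The arithmetic in Example 1.7.14 can be confirmed with a few lines of Python. This sketch is an illustration only; the helper names are our own. Since the components are integers, both sides of the identity are computed exactly.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3 (Definition 1.7.1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

u = (2, 1, -3)
v = (-1, 2, 2)

n = cross(u, v)
lhs = dot(n, n)                                  # squared norm of u x v
rhs = dot(u, u) * dot(v, v) - dot(u, v) ** 2     # right-hand side of Theorem 1.7.13

print(n, lhs, rhs)   # (8, -1, 5) 90 90
```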

Theorem 1.7.13 contains information on how the cross product of two algebraic vectors is
related to the angle formed by the rays determined by these vectors. In this regard, we
have the following result, the proof of which is given as an exercise, see Exercise 1.7 number
12.

Theorem 1.7.15. Consider points r̄, ū and v̄ in R3 with ū, v̄ ̸= 0̄. Let θ be the magnitude
of the angle formed by the rays {r̄ + tū : t ≥ 0} and {r̄ + tv̄ : t ≥ 0}. Then

sin θ = ∥ū × v̄∥ / (∥ū∥∥v̄∥).

Theorem 1.7.15 has important applications in Mechanics, but these fall beyond the scope
of this text, and are therefore not dealt with.
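
Theorem 1.7.15 can be illustrated numerically. In the following Python sketch, which is an illustration only, we assume that the magnitude θ of the angle is obtained from cos θ = (ū · v̄)/(∥ū∥∥v̄∥), as in the proof of Theorem 1.7.10. The sketch then compares sin θ with the quotient in Theorem 1.7.15. The helper names are our own, and floating-point arithmetic is used, so the comparison is made up to a small tolerance.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cross(u, v):
    """Cross product of two vectors in R^3 (Definition 1.7.1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

u = (2, 1, -3)
v = (-1, 2, 2)

cos_theta = dot(u, v) / (norm(u) * norm(v))
sin_theta = norm(cross(u, v)) / (norm(u) * norm(v))
theta = math.acos(cos_theta)   # magnitude of the angle, between 0 and pi

print(abs(math.sin(theta) - sin_theta) < 1e-12)
print(abs(sin_theta ** 2 + cos_theta ** 2 - 1.0) < 1e-12)
```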

The remainder of this section is concerned with planes in R3 . In particular, we will obtain an
alternative description for the plane through the points p̄, q̄ and r̄ which is more amenable
to manipulation. In this regard, we have the following.

Theorem 1.7.16. Let n̄ ∈ R3 and r̄ ∈ R3 with n̄ ̸= 0̄. Then the set

A = {x̄ ∈ R3 : n̄ · (x̄ − r̄) = 0}

is a plane in R3 .

Proof. Let n̄ = ⟨n1 , n2 , n3 ⟩ and r̄ = ⟨r1 , r2 , r3 ⟩. Since n̄ ̸= 0̄, at least one of the components
of n̄ is nonzero. Without loss of generality, we assume that n1 ̸= 0. Let

p̄ = r̄ + ⟨−n2 , n1 , 0⟩ and q̄ = r̄ + ⟨−n3 , 0, n1 ⟩.

Then
p̄ − r̄ = ⟨−n2 , n1 , 0⟩ and q̄ − r̄ = ⟨−n3 , 0, n1 ⟩.
If α is a real number, then

α(p̄ − r̄) = ⟨−αn2 , αn1 , 0⟩ ≠ ⟨−n3 , 0, n1 ⟩ = q̄ − r̄

because n1 ≠ 0. In the same way it follows that p̄ − r̄ is not a scalar multiple of q̄ − r̄. Hence

P = {r̄ + s(p̄ − r̄) + t(q̄ − r̄) : s, t ∈ R}

is a plane in R3 .

We now show that P = A. Let x̄ ∈ P . Then

x̄ = r̄ + s(p̄ − r̄) + t(q̄ − r̄) = r̄ + s⟨−n2 , n1 , 0⟩ + t⟨−n3 , 0, n1 ⟩

for some s, t ∈ R. Therefore
n̄ · (x̄ − r̄) = n̄ · (s⟨−n2 , n1 , 0⟩ + t⟨−n3 , 0, n1 ⟩)

= s(n̄ · ⟨−n2 , n1 , 0⟩) + t(n̄ · ⟨−n3 , 0, n1 ⟩) [By Theorem 1.2.3(3)]

= s(−n1 n2 + n2 n1 + 0) + t(−n1 n3 + 0 + n3 n1 )

= 0.
Therefore x̄ ∈ A so that P ⊆ A.

Assume that x̄ = ⟨x, y, z⟩ ∈ A. Then

n̄ · (x̄ − r̄) = n1 (x − r1 ) + n2 (y − r2 ) + n3 (z − r3 ) = 0.
Therefore
r1 = x + [n2 (y − r2 ) + n3 (z − r3 )]/n1 . (1.22)
Since n1 ≠ 0, the real numbers
s = (y − r2 )/n1 and t = (z − r3 )/n1
are well defined. We now have
r̄ + s(p̄ − r̄) + t(q̄ − r̄)

= r̄ + s⟨−n2 , n1 , 0⟩ + t⟨−n3 , 0, n1 ⟩

= r̄ + ⟨−n2 (y − r2 )/n1 , y − r2 , 0⟩ + ⟨−n3 (z − r3 )/n1 , 0, z − r3 ⟩

= ⟨r1 − [n2 (y − r2 ) + n3 (z − r3 )]/n1 , y, z⟩

= ⟨x, y, z⟩ [By (1.22)]

= x̄.
Therefore x̄ ∈ P so that A ⊆ P . Hence A = P .

Using Theorem 1.7.16, we obtain the following result, the proof of which is given as an
exercise, see Exercise 1.7 number 14.
Theorem 1.7.17. Let p̄, q̄ and r̄ be points in R3 such that p̄ − r̄ and q̄ − r̄ are not scalar
multiples of each other. A point x̄ ∈ R3 is on the plane P = {r̄+s(p̄−r̄)+t(q̄−r̄) : s, t ∈ R}
through p̄, q̄ and r̄ if and only if
n̄ · (x̄ − r̄) = 0
where n̄ = (p̄ − r̄) × (q̄ − r̄).

Remark 1.7.18. Consider a plane P through the points p̄, q̄ and r̄, with p̄, q̄ and r̄ not on
the same line.

(1) A point x̄ ∈ R3 is on P if and only if

n̄ · (x̄ − r̄) = 0,

where n̄ = (p̄ − r̄) × (q̄ − r̄). We therefore speak of the plane described by the equation

n̄ · (x̄ − r̄) = 0,

and refer to this equation as a Cartesian equation for the plane. It is sometimes
convenient to write the Cartesian equation for a plane in terms of the components of
x̄, that is, as
n1 x + n2 y + n3 z = n̄ · r̄.

(2) The equation

x̄ = r̄ + s(p̄ − r̄) + t(q̄ − r̄)

is called a vector parametric equation for the plane.

(3) Theorem 1.7.17 may be interpreted geometrically as follows: Given two points ā and
b̄ on a plane with Cartesian equation

n̄ · (x̄ − r̄) = 0,

the angle formed by the rays {ā + t(b̄ − ā) : t ≥ 0} and {ā + tn̄ : t ≥ 0} is a right
angle. This is illustrated in the figure below.

[Figure: the plane P containing the points ā and b̄, and the ray x̄ = ā + tn̄ through the point ā + n̄, forming a right angle with the plane.]

The vector n̄ is therefore referred to as a normal vector for P .

(4) If n̄ is a normal vector for a plane P , then so is any nonzero scalar multiple αn̄ of n̄.

We illustrate Theorem 1.7.17 by means of some examples.

Example 1.7.19. We determine whether or not the plane through the points p̄ = ⟨1, 2, −1⟩,
q̄ = ⟨2, 1, 0⟩ and r̄ = ⟨1, −1, 1⟩ is defined, and find a Cartesian equation for the plane if it
is defined.

Solution. Let

n̄ = (p̄ − r̄) × (q̄ − r̄) = ⟨0, 3, −2⟩ × ⟨1, 2, −1⟩ = ⟨1, −2, −3⟩.

Since n̄ ̸= 0̄, it follows from Theorem 1.7.4 (5) that p̄ − r̄ and q̄ − r̄ are not scalar multiples
of each other. Therefore the plane through p̄, q̄ and r̄ is defined. By Theorem 1.7.17,

⟨1, −2, −3⟩ · (x̄ − r̄) = 0

is a Cartesian equation for the plane. Computing the dot product, we find

x − 2y − 3z = 0.

Example 1.7.20. We determine whether or not the plane through the points p̄ = ⟨2, 0, 3⟩,
q̄ = ⟨−4, 0, 1⟩ and r̄ = ⟨−1, 0, 2⟩ is defined, and find a Cartesian equation for the plane if it
is defined.

Solution. Let
n̄ = (p̄ − r̄) × (q̄ − r̄) = ⟨3, 0, 1⟩ × ⟨−3, 0, −1⟩ = 0̄.
Since n̄ = 0̄, it follows from Theorem 1.7.4 (5) that p̄ − r̄ and q̄ − r̄ are scalar multiples of
each other. Therefore the plane through p̄, q̄ and r̄ is not defined.
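
The method of Examples 1.7.19 and 1.7.20 can be sketched in Python as follows. This is an illustration only, and the function name cartesian_equation is our own. It computes n̄ = (p̄ − r̄) × (q̄ − r̄) and, when n̄ is nonzero, returns n̄ together with the constant n̄ · r̄ of the Cartesian equation n1 x + n2 y + n3 z = n̄ · r̄.

```python
def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two vectors in R^3 (Definition 1.7.1)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def cartesian_equation(p, q, r):
    """Return (n, c) such that the plane through p, q, r is n . x = c.

    Returns None when (p - r) x (q - r) is the zero vector, in which case
    the plane through p, q and r is not defined.
    """
    n = cross(sub(p, r), sub(q, r))
    if n == (0, 0, 0):
        return None
    return n, dot(n, r)

print(cartesian_equation((1, 2, -1), (2, 1, 0), (1, -1, 1)))   # ((1, -2, -3), 0)
print(cartesian_equation((2, 0, 3), (-4, 0, 1), (-1, 0, 2)))   # None
```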
Example 1.7.21. We determine a vector parametric equation for the plane P with Carte-
sian equation 3x − y + z = 3.

Solution. In order to write down a parametric equation for P , we must find three points p̄,
q̄ and r̄ on P so that p̄ − r̄ and q̄ − r̄ are not scalar multiples of each other. There is no rule
that determines how this should be done. However, if we know the values of two coordinates
of a point ⟨x, y, z⟩ ∈ P , then the equation 3x − y + z = 3 for P uniquely determines the
value of the remaining coordinate.

Let r̄ = ⟨0, 0, z⟩. Then

r̄ ∈ P if and only if 3(0) − 0 + z = 3

so that z = 3. Therefore r̄ = ⟨0, 0, 3⟩ ∈ P .

Let p̄ = ⟨0, y, 0⟩. Then

p̄ ∈ P if and only if 3(0) − y + 0 = 3

so that y = −3. Therefore p̄ = ⟨0, −3, 0⟩ ∈ P .

Let q̄ = ⟨x, 0, 0⟩. Then

q̄ ∈ P if and only if 3x − 0 + 0 = 3

so that x = 1. Therefore q̄ = ⟨1, 0, 0⟩ ∈ P .

Now we check whether or not p̄ − r̄ and q̄ − r̄ are scalar multiples of each other. We have

(p̄ − r̄) × (q̄ − r̄) = ⟨0, −3, −3⟩ × ⟨1, 0, −3⟩ = ⟨9, −3, 3⟩ ≠ 0̄.

By Theorem 1.7.4 (5), p̄ − r̄ and q̄ − r̄ are not scalar multiples of each other. Therefore P
has vector parametric equation

x̄ = r̄ + t(p̄ − r̄) + s(q̄ − r̄) = ⟨0, 0, 3⟩ + t⟨0, −3, −3⟩ + s⟨1, 0, −3⟩, s, t ∈ R.
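
As a sanity check on Example 1.7.21, one may substitute points of the parametric form back into the Cartesian equation. The sketch below assumes integer parameter values and uses the hypothetical helper names on_plane and parametric_point. A finite sample is of course not a proof, only a quick test for arithmetic slips.

```python
def on_plane(x):
    """True if x = (x1, x2, x3) satisfies 3x - y + z = 3."""
    return 3 * x[0] - x[1] + x[2] == 3

def parametric_point(t, s):
    """x = r + t(p - r) + s(q - r) with r, p, q as in Example 1.7.21."""
    r, p, q = (0, 0, 3), (0, -3, 0), (1, 0, 0)
    return tuple(ri + t * (pi - ri) + s * (qi - ri)
                 for ri, pi, qi in zip(r, p, q))

# Every sampled parameter pair should give a point on P.
samples = [parametric_point(t, s) for t in range(-3, 4) for s in range(-3, 4)]
print(all(on_plane(x) for x in samples))
```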

The Cartesian form of a plane is much more amenable to manipulation than the vector
parametric form. We demonstrate this in two cases. First we consider the problem of
finding the intersection of a plane and a line. Recall from Theorem 1.6.6 that if a line L
intersects a plane P in more than one point, then L is contained in P , that is, if x̄ ∈ L then
x̄ ∈ P . This leads immediately to the following result.
Theorem 1.7.22. Consider a plane P and a line L in R3 . Then exactly one of the following
statements is true.

(1) L and P intersect in exactly one point.

(2) L is contained in P .

(3) L and P do not intersect; that is, L and P do not have a common point.

We demonstrate Theorem 1.7.22 by means of the following examples, which illustrate the
utility of the Cartesian equation of a plane.
Example 1.7.23. We determine whether or not the line L through ā = ⟨1, −1, 2⟩ and
b̄ = ⟨3, 0, 2⟩ intersects the plane P through p̄ = ⟨1, 2, −1⟩, q̄ = ⟨1, 0, 2⟩ and r̄ = ⟨2, 2, 0⟩, and
find the points of intersection, if any exist.

Solution. We have
n̄ = (p̄ − r̄) × (q̄ − r̄) = ⟨−2, 3, 2⟩
so that the plane P has a Cartesian equation

⟨−2, 3, 2⟩ · (x̄ − ⟨2, 2, 0⟩) = 0.

The line L through ā and b̄ has an equation

x̄ = ā + t(b̄ − ā) = ⟨1 + 2t, t − 1, 2⟩, t ∈ R.

Therefore L intersects P at a point x̄ ∈ R3 if and only if

x̄ = ⟨1 + 2t, t − 1, 2⟩ for some t ∈ R (1.23)

and

⟨−2, 3, 2⟩ · (x̄ − ⟨2, 2, 0⟩) = 0. (1.24)

Combining (1.23) and (1.24) we find that L intersects P if and only if

⟨−2, 3, 2⟩ · (⟨1 + 2t, t − 1, 2⟩ − ⟨2, 2, 0⟩) = 0 for some t ∈ R

For t ∈ R,
⟨−2, 3, 2⟩ · (⟨1 + 2t, t − 1, 2⟩ − ⟨2, 2, 0⟩) = −t − 3
so that
⟨−2, 3, 2⟩ · (⟨1 + 2t, t − 1, 2⟩ − ⟨2, 2, 0⟩) = 0 if and only if t = −3.
Therefore L intersects P at the single point x̄ = ā − 3(b̄ − ā) = ⟨−5, −4, 2⟩.

Example 1.7.24. We determine whether or not the line L through ā = ⟨1, 0, 1⟩ and b̄ =
⟨3, 2, 0⟩ intersects the plane P through p̄ = ⟨1, 2, −1⟩, q̄ = ⟨1, 0, 2⟩ and r̄ = ⟨2, 2, 0⟩, and find
the points of intersection, if any exist.

Solution. The plane P is the same as in Example 1.7.23. It has Cartesian equation

⟨−2, 3, 2⟩ · (x̄ − ⟨2, 2, 0⟩) = 0.

The line L through ā and b̄ has equation

x̄ = ā + t(b̄ − ā) = ⟨1 + 2t, 2t, 1 − t⟩, t ∈ R.

Therefore L intersects P at a point x̄ ∈ R3 if and only if

x̄ = ⟨1 + 2t, 2t, 1 − t⟩ for some t ∈ R (1.25)

and

⟨−2, 3, 2⟩ · (x̄ − ⟨2, 2, 0⟩) = 0. (1.26)

We combine (1.25) and (1.26) and find that L intersects P if and only if

⟨−2, 3, 2⟩ · (⟨1 + 2t, 2t, 1 − t⟩ − ⟨2, 2, 0⟩) = 0 for some t ∈ R

For t ∈ R,
⟨−2, 3, 2⟩ · (⟨1 + 2t, 2t, 1 − t⟩ − ⟨2, 2, 0⟩) = −2.
Therefore
⟨−2, 3, 2⟩ · (⟨1 + 2t, 2t, 1 − t⟩ − ⟨2, 2, 0⟩) ≠ 0 for all t ∈ R.
Therefore L does not intersect P .

Example 1.7.25. We determine whether or not the line L through ā = ⟨−1, 0, 0⟩ and
b̄ = ⟨−1, −2, 3⟩ intersects the plane P through p̄ = ⟨1, 2, −1⟩, q̄ = ⟨1, 0, 2⟩ and r̄ = ⟨2, 2, 0⟩,
and find the points of intersection, if any exist.

Solution. The plane P is the same as in Example 1.7.23. It has Cartesian equation

⟨−2, 3, 2⟩ · (x̄ − ⟨2, 2, 0⟩) = 0.

The line L through ā and b̄ has equation

x̄ = ā + t(b̄ − ā) = ⟨−1, −2t, 3t⟩, t ∈ R.

Therefore L intersects P at a point x̄ ∈ R3 if and only if

x̄ = ⟨−1, −2t, 3t⟩ for some t ∈ R (1.27)

and

⟨−2, 3, 2⟩ · (x̄ − ⟨2, 2, 0⟩) = 0. (1.28)

We combine (1.27) and (1.28) and find that L intersects P if and only if

⟨−2, 3, 2⟩ · (⟨−1, −2t, 3t⟩ − ⟨2, 2, 0⟩) = 0 for some t ∈ R.

For all t ∈ R we have

⟨−2, 3, 2⟩ · (⟨−1, −2t, 3t⟩ − ⟨2, 2, 0⟩) = 0.

Therefore every point on L is also on P so that L ⊆ P .
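
The three cases of Theorem 1.7.22 can be told apart mechanically, since n̄ · (ā + t(b̄ − ā) − r̄) is of the form (slope)·t + (constant). The following Python sketch, which assumes integer coordinates and uses the illustrative names dot, sub and line_plane, reproduces Examples 1.7.23, 1.7.24 and 1.7.25.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def line_plane(n, r, a, b):
    """Classify the line through a and b against the plane n . (x - r) = 0."""
    d = sub(b, a)                 # direction of the line
    slope = dot(n, d)             # coefficient of t in n . (a + t d - r)
    const = dot(n, sub(a, r))     # value of n . (a + t d - r) at t = 0
    if slope != 0:
        t = Fraction(-const, slope)
        return "one point", tuple(ai + t * di for ai, di in zip(a, d))
    if const == 0:
        return "contained", None
    return "no intersection", None

n, r = (-2, 3, 2), (2, 2, 0)
print(line_plane(n, r, (1, -1, 2), (3, 0, 2)))     # Example 1.7.23
print(line_plane(n, r, (1, 0, 1), (3, 2, 0)))      # Example 1.7.24
print(line_plane(n, r, (-1, 0, 0), (-1, -2, 3)))   # Example 1.7.25
```

Exact fractions are used so that the test for a zero slope is not affected by rounding.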

Next we consider the intersection of two planes.

Theorem 1.7.26. Let P and Q be planes in R3 that intersect at a point ā. Then the
following statements are true.

(1) There is a point b̄ ≠ ā so that b̄ ∈ P and b̄ ∈ Q.

(2) If x̄ is any point on the line through ā and b̄, then x̄ ∈ P and x̄ ∈ Q.

We give a proof of (1), leaving the proof of (2) as an exercise, see Exercise 1.7 number 16.

Proof of (1). Let p̄ and q̄ be points on P so that p̄ − ā and q̄ − ā are not scalar multiples
of each other. Then
x̄ = ā + t(p̄ − ā) + s(q̄ − ā)
is a vector parametric equation for P . Let n̄ be a normal vector for Q. Since ā ∈ Q,

n̄ · (x̄ − ā) = 0

is a Cartesian equation for Q. If n̄ · (p̄ − ā) = 0, then p̄ ∈ Q and we are done. Therefore we
assume that
n̄ · (p̄ − ā) ≠ 0.

Let b̄ = ā + t0 (p̄ − ā) + s0 (q̄ − ā) where

t0 = n̄ · (q̄ − ā) and s0 = −n̄ · (p̄ − ā).

Then b̄ ∈ P and, since s0 ≠ 0, it follows that b̄ ≠ ā, see Exercise 1.7 number 15(a). But

n̄ · (b̄ − ā) = 0,

see Exercise 1.7 number 15(b). Therefore b̄ ∈ Q.

The following result is an immediate consequence of Theorems 1.6.9 and 1.7.26. The proof
is given as an exercise, see Exercise 1.7 number 17.

Theorem 1.7.27. Let P and Q be planes in R3 . Then exactly one of the following state-
ments is true.

(1) P and Q do not intersect.

(2) The intersection of P and Q is a line.

(3) P and Q are equal.

We demonstrate Theorem 1.7.27 by means of the following examples.

Example 1.7.28. Consider the plane P through the points p̄ = ⟨1, 0, 2⟩, q̄ = ⟨2, 1, −1⟩ and
r̄ = ⟨1, 1, 0⟩, and the plane Q through ū = ⟨1, −1, 1⟩, v̄ = ⟨2, 1, 1⟩ and w̄ = ⟨−1, 1, 0⟩. We
determine whether or not the planes P and Q intersect each other, and find the points of
intersection, if any exist.

Solution. The plane P has normal vector

n̄ = (p̄ − r̄) × (q̄ − r̄) = ⟨1, 2, 1⟩.

Therefore a point x̄ = ⟨x, y, z⟩ is on P if and only if

n̄ · (x̄ − r̄) = x + 2y + z − 3 = 0.

The plane Q has normal vector

m̄ = (ū − w̄) × (v̄ − w̄) = ⟨−2, 1, 6⟩

so that x̄ is on Q if and only if

m̄ · (x̄ − w̄) = −2x + y + 6z − 3 = 0.

Therefore P and Q intersect at a point x̄ = ⟨x, y, z⟩ if and only if

x + 2y + z = 3. (1.29)

and

−2x + y + 6z = 3. (1.30)

Note that w̄ ∈ Q, but n̄ · (w̄ − r̄) = −2 ≠ 0 so that w̄ ∉ P . It therefore follows from Theorem
1.7.27 that P and Q intersect in a line, or not at all.

We eliminate x from (1.29) by multiplying throughout by 2 and adding (1.30). Then
x̄ = ⟨x, y, z⟩ ∈ P ∩ Q if and only if

5y + 8z = 9 and − 2x + y + 6z = 3.

Write the first equation as y = 9/5 − (8/5)z, and substitute this expression into the second
equation. We now have

y = 9/5 − (8/5)z and x = (11/5)z − 3/5.

Observe that these last two equations have many solutions. Every value for z determines a
value for x and a value for y. Therefore P and Q intersect in a line. In order to find the
equation of the line of intersection, we require two points on the line.

Setting z = 0, we find that x = −3/5 and y = 9/5. Hence

ā = ⟨−3/5, 9/5, 0⟩

is a point on the line of intersection of P and Q. Let z = 1. Then x = 8/5 and y = 1/5
so that

b̄ = ⟨8/5, 1/5, 1⟩

is a point on both P and Q. Hence P and Q intersect in the line with equation

x̄ = b̄ + t(ā − b̄) = ⟨8/5 − (11/5)t, 1/5 + (8/5)t, 1 − t⟩, t ∈ R.
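
The two points found in Example 1.7.28 can be recomputed with exact fractions. The sketch below fixes z and solves the remaining two equations x + 2y = 3 − z and −2x + y = 3 − 6z by Cramer's rule for a 2 × 2 system. This is used here only as a shortcut and is an assumption of the sketch, since the example itself eliminates by substitution. The names cross and point_on_both are illustrative.

```python
from fractions import Fraction

def cross(u, v):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def point_on_both(z):
    """Solve x + 2y = 3 - z and -2x + y = 3 - 6z for x and y, with z fixed."""
    z = Fraction(z)
    c1, c2 = 3 - z, 3 - 6 * z     # right-hand sides once z is fixed
    det = 1 * 1 - 2 * (-2)        # determinant of the coefficients, equal to 5
    x = (c1 * 1 - 2 * c2) / det
    y = (1 * c2 - (-2) * c1) / det
    return (x, y, z)

a = point_on_both(0)
b = point_on_both(1)
direction = tuple(ai - bi for ai, bi in zip(a, b))
print(a, b)
# The direction a - b should be parallel to the cross product of the two normals.
print(cross(direction, cross((1, 2, 1), (-2, 1, 6))))
```

The last line checks that ā − b̄ is a scalar multiple of n̄ × m̄, which is consistent with the line lying in both planes.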

Example 1.7.29. Let P be the plane through r̄ = ⟨1, 2, 1⟩ with normal vector n̄ = ⟨2, 1, −1⟩,
and Q the plane through w̄ = ⟨1, 1, 1⟩ with normal vector m̄ = ⟨−6, −3, 3⟩. We determine
whether or not the planes P and Q intersect each other, and find the points of intersection,
if any exist.

Solution. A point x̄ = ⟨x, y, z⟩ is on P if and only if

n̄ · (x̄ − r̄) = 2x + y − z − 3 = 0,

and x̄ ∈ Q if and only if

m̄ · (x̄ − w̄) = −6x − 3y + 3z + 6 = 0.

Therefore P and Q intersect at a point x̄ = ⟨x, y, z⟩ if and only if

2x + y − z = 3 (1.31)

and

−6x − 3y + 3z = −6. (1.32)

Adding (1.32) to three times (1.31), we see that if x̄ ∈ P ∩ Q then

0 = 3.

This is clearly not true. Therefore the two planes do not intersect.
Example 1.7.30. Let P be the plane through r̄ = ⟨1, 0, 1⟩ with normal vector n̄ = ⟨6, 3, 9⟩,
and Q the plane through w̄ = ⟨2, 1, 0⟩ with normal vector m̄ = ⟨2, 1, 3⟩. We determine
whether or not the planes P and Q intersect each other, and find the points of intersection,
if any exist.

Solution. A point x̄ = ⟨x, y, z⟩ is on P if and only if

n̄ · (x̄ − r̄) = 6x + 3y + 9z − 15 = 0, (1.33)

and x̄ ∈ Q if and only if

m̄ · (x̄ − w̄) = 2x + y + 3z − 5 = 0. (1.34)

Multiplying (1.33) by 1/3, we find that x̄ ∈ P if and only if

2x + y + 3z − 5 = 0.

Therefore x̄ ∈ P if and only if x̄ ∈ Q, so that P = Q.
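
The three cases of Theorem 1.7.27 can also be separated using only the normal vectors and one point of each plane. The sketch below relies on a fact that is not proved in this section and is assumed here: two planes whose normal vectors are not scalar multiples of each other intersect in a line. The function name classify_planes and the helpers are illustrative.

```python
def cross(u, v):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def classify_planes(n, r, m, w):
    """P: n . (x - r) = 0 and Q: m . (x - w) = 0."""
    if cross(n, m) != (0, 0, 0):
        return "line"             # normals not parallel (assumed fact)
    # Normals are parallel: the planes coincide exactly when w lies on P.
    return "equal" if dot(n, sub(w, r)) == 0 else "no intersection"

print(classify_planes((1, 2, 1), (1, 1, 0), (-2, 1, 6), (-1, 1, 0)))     # Example 1.7.28
print(classify_planes((2, 1, -1), (1, 2, 1), (-6, -3, 3), (1, 1, 1)))    # Example 1.7.29
print(classify_planes((6, 3, 9), (1, 0, 1), (2, 1, 3), (2, 1, 0)))       # Example 1.7.30
```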

We have introduced the cross product of two vectors in R3 , and derived some of its more
important properties. The cross product is an important construction, with many appli-
cations, amongst others in Mechanics. We have, however, focused on the use of the cross
product in the study of planes in R3 . In this regard, the cross product leads to an elegant
description of a plane. The utility of this alternative equation for a plane is demonstrated
by means of two problems; namely, that of finding the intersection of a plane and a line,
and of two planes, respectively.

Exercise 1.7

1. Let ū = ⟨1, −2, 1⟩, v̄ = ⟨2, 0, −1⟩, w̄ = ⟨0, 3, −2⟩ and x̄ = ⟨−1, 2, −1⟩. Evaluate the
given expression, if possible. Otherwise, explain why it is impossible to evaluate the
expression.
(a) ū × v̄ (b) v̄ × ū (c) x̄ × ū
(d) w̄ × v̄ (e) (ū + v̄) × w̄ (f) x̄ × (w̄ × v̄)
(g) (x̄ × w̄) × v̄ (h) (w̄ × x̄) · ū (i) (w̄ · x̄) × ū
(j) (2x̄ − 3v̄) × w̄ (k) (3x̄ − 5ū) × (2v̄ + w̄)

2. Write down a Cartesian equation for each of the following planes.
(a) P is the plane through r̄ = ⟨1, 0, 2⟩, p̄ = ⟨2, 1, 1⟩ and q̄ = ⟨1, 2, 2⟩.
(b) P is the plane with vector parametric equation x̄ = ⟨1 + t − s, 2s + t, 3 + t + s⟩,
s, t ∈ R.

3. Write down a vector parametric equation for the plane P with given Cartesian equa-
tion.

(a) x + y − 2z = 6 (b) x − y = 1 (c) 2x + 3y − z = 12 (d) z = 2

4. Determine whether or not the given plane P and line L intersect each other, and find
the points of intersection, if any exist.
(a) P is the plane through p̄ = ⟨1, −1, 1⟩, q̄ = ⟨2, 1, 0⟩ and r̄ = ⟨1, 1, 1⟩; L is the line
through ā = ⟨2, 3, 1⟩ and b̄ = ⟨4, 3, 2⟩.
(b) P is the plane through p̄ = ⟨1, −1, 1⟩, q̄ = ⟨2, 1, 0⟩ and r̄ = ⟨1, 1, 1⟩; L is the line
through ā = ⟨1, 1, 2⟩ and b̄ = ⟨0, 4, 3⟩.
(c) P is the plane through ī, j̄ and k̄; L is the line through ā = ⟨−1, −2, −3⟩ and
b̄ = ⟨2, 3, 1⟩.
(d) P is the plane through 2ī, −4j̄ and 3k̄; L is the line through ā = ⟨1, −2, 0⟩ and
b̄ = ⟨1, 0, 3/2⟩.
(e) P is the plane through r̄ = ⟨1, 2, 1⟩ and with normal vector n̄ = ⟨−1, 1, 1⟩; L is
the line with equation x̄ = ⟨1, 1, 1⟩ + t⟨2, 1, 1⟩.
(f) P is the plane with Cartesian equation ⟨2, −1, 1⟩ · x̄ = 0; L is the line with
equation x̄ = ⟨2, 1, 2⟩ + t⟨1, 3, −1⟩.
(g) P is the plane with Cartesian equation ⟨1, −1, 1⟩ · (x̄ − ī) = 0; L is the line with
equation x̄ = ⟨3, 4, 2⟩ + t⟨2, 3, 1⟩.

5. Determine whether or not the given planes P and Q intersect each other, and find the
points of intersection, if any exist.
(a) P is the plane through p̄ = ⟨1, −1, 1⟩, q̄ = ⟨2, 1, 0⟩ and r̄ = ⟨1, 1, 1⟩; Q is the
plane through ū = ⟨2, 0, 1⟩ and with normal vector n̄ = ⟨1, 2, −1⟩.
(b) P is the plane through p̄ = ⟨1, 0, 1⟩, q̄ = ⟨2, 2, 2⟩ and r̄ = ⟨0, −1, 1⟩; Q is the
plane through k̄ = ⟨0, 0, 1⟩ and with normal vector n̄ = ⟨−1, 3, 1⟩.
(c) P is the plane through p̄ = ⟨2, 2, 1⟩, q̄ = ⟨2, 1, 2⟩ and r̄ = ⟨1, 0, 1⟩; Q is the plane
through ū = ⟨1, 2, 3⟩ and with normal vector n̄ = ⟨−2, 1, 1⟩.
(d) P is the plane through p̄ = ⟨1, 3, 1⟩, q̄ = ⟨2, 3, 2⟩ and r̄ = ⟨1, 2, 1⟩; Q is the plane
through ū = ⟨4, 3, 4⟩ and with normal vector n̄ = ⟨3, 0, −3⟩.
(e) P is the plane through p̄ = ⟨2, 3, 1⟩, q̄ = ⟨1, 3, 2⟩ and r̄ = ⟨1, 0, 2⟩; Q is the plane
through 2j̄ and with normal vector n̄ = ⟨1, −1, 2⟩.
(f) P is the plane through p̄ = ⟨2, 0, 1⟩, q̄ = ⟨1, 1, 2⟩ and r̄ = ⟨2, 1, 0⟩; Q is the plane
through ū = ⟨3, 0, 1⟩ and with normal vector n̄ = ⟨2, 2, 2⟩.

(g) P is the plane through p̄ = ⟨2, −1, 1⟩, q̄ = ⟨2, 1, 2⟩ and r̄ = ⟨0, 1, 1⟩; Q is the
plane through ū = ⟨1, 1, 1⟩ and with normal vector n̄ = ⟨1, −1, 3⟩.

6. The given lines L1 and L2 intersect in a single point ā. Find an equation for the line
through ā that is perpendicular to both L1 and L2 .
(a) L1 is the line through ī and j̄; L2 is the line through p̄ = ⟨1, 1, 2⟩ and q̄ =
⟨2, −1, 0⟩.
(b) L1 is the line with equation x̄ = ⟨1 − t, t, 2t⟩; L2 is the line with equation x̄ =
⟨3 − t, 2t − 2, t − 4⟩.

7. Let ū and v̄ be vectors in R3 . Prove that ū × v̄ = −(v̄ × ū).

8. Use Theorem 1.7.4 (1) and (2) to prove Theorem 1.7.4 (3).

9. Prove Theorem 1.7.4 (4).

10. The aim of this exercise is to prove Theorem 1.7.4 (5). Let ū and v̄ be algebraic vectors
in R3 , and α any real number. We assume that ū ≠ 0̄ and v̄ ≠ 0̄.
(a) Assume that v̄ = αū.
i. Prove that ū × ū = 0̄.
ii. Now use Theorem 1.7.4 (4) to prove that ū × v̄ = 0̄.
(b) Assume that ū × v̄ = 0̄. Since ū ≠ 0̄, at least one of the components of ū is
nonzero. Suppose u3 ≠ 0. Prove that
v̄ = αū, with α = v3/u3.

11. Prove Theorem 1.7.6.

12. Use Theorem 1.7.13 and the identity cos²θ + sin²θ = 1 to prove Theorem 1.7.15.

13. Consider a triangle with vertices ā, b̄ and c̄. Let the angle at ā have magnitude θ1 , the
angle at b̄ magnitude θ2 and the angle at c̄ magnitude θ3 . Use Theorems 1.7.15 and
1.7.4 to prove the Sine Law; that is, show that
sin θ1 / ∥b̄ − c̄∥ = sin θ2 / ∥ā − c̄∥ = sin θ3 / ∥b̄ − ā∥.

14. The aim of this exercise is to prove Theorem 1.7.17. Let p̄, q̄, r̄, P and n̄ be as given
in the theorem.
(a) Explain why n̄ ≠ 0̄.
(b) Explain why the set A = {x̄ ∈ R3 : n̄ · (x̄ − r̄) = 0} is a plane in R3 .
(c) Show that p̄, q̄, r̄ ∈ A.
(d) Explain why the points p̄, q̄ and r̄ are not on the same line.
(e) Explain why A = P .

15. Consider the proof of Theorem 1.7.26 (1). Let b̄ = ā + t0 (p̄ − ā) + s0 (q̄ − ā) where

t0 = n̄ · (q̄ − ā) and s0 = −n̄ · (p̄ − ā).

(a) Explain why b̄ ≠ ā.


(b) Show that n̄ · (b̄ − ā) = 0.

16. The aim of this exercise is to prove Theorem 1.7.26 (2). Suppose that P and Q have
normal vectors n̄ and m̄ respectively. Let ā and b̄ be distinct points lying on both P
and Q. Let x̄ be a point on the line through ā and b̄. Use Theorem 1.2.3 (3) to prove
that x̄ ∈ P and x̄ ∈ Q.

17. The aim of this exercise is to prove Theorem 1.7.27. Let P and Q be planes in R3 .
Assume that (1) and (2) in Theorem 1.7.27 are not true. That is, P and Q have at
least one point in common, and the intersection of P and Q is not a line in R3 .
(a) Explain why there is a line L so that L ⊆ P and L ⊆ Q.
(b) Explain why there is a point x̄ ∈ P ∩ Q such that x̄ is not on L.
(c) Now use Theorem 1.6.9 to prove that P = Q.

1.8 Geometric Vectors

In elementary physics, a force is often modeled as an ‘arrow’, with the length of the ‘arrow’
representing the magnitude of the force, and the ‘direction’ of the arrow the direction in
which the force acts. If two forces, call them F1 and F2 , act on an object at the same time,
the resultant force is calculated as the ‘sum’ of the ‘arrows’ representing the forces, with
this ‘sum’ defined geometrically, as illustrated in the figure below.

[Figure: the forces F1 and F2 drawn as arrows, together with the resultant force F1 + F2.]

In this section, we show how these ‘arrows’ and their algebra are described in the context of
our model for space. We start by showing how algebraic vectors in R3 are used to represent
‘arrows’.

Recall from Section 1.4 that for points p̄ and q̄ in R3 , the line segment between q̄ and p̄ is
the set

S = {tp̄ + (1 − t)q̄ : 0 ≤ t ≤ 1}. (1.35)

It is easy to see that we also have

S = {sq̄ + (1 − s)p̄ : 0 ≤ s ≤ 1}. (1.36)

While the two descriptions of S give the same set of points in R3 , there is an important
difference. Each of the two expressions (1.35) and (1.36) of S induces an orientation, or
direction, on S.

Consider (1.35). We have q̄ = 0p̄ + (1 − 0)q̄ ∈ S and p̄ = 1p̄ + (1 − 1)q̄ ∈ S. Thus we ‘start’
at q̄, and ‘end’ at p̄, so that the line segment is directed by this specific representation of S.
[Figure: the line segment S, directed from q̄ (t = 0) to p̄ (t = 1).]

We say that q̄ is the initial point of S, and p̄ is the terminal point of S.

Now consider (1.36). In this case we have p̄ = 0q̄ + (1 − 0)p̄ ∈ S and q̄ = 1q̄ + (1 − 1)p̄ ∈ S.
Therefore, expressing S in this way changes the direction of the line segment.
[Figure: the line segment S, directed from p̄ (s = 0) to q̄ (s = 1).]

In particular, p̄ is now the initial point, and q̄ the terminal point.

Remark 1.8.1. In view of the comments above, we speak of the line segment between points
p̄ and q̄ when we consider only the set

S = {tp̄ + (1 − t)q̄ : 0 ≤ t ≤ 1}.




We speak of the directed line segment from q̄ to p̄, and write S⃗, when we consider the set
S and the particular orientation (from q̄ to p̄).

We now come to the definition of geometric vectors in the context of our model for space.


Definition 1.8.2. A geometric vector is a directed line segment S⃗ in R3 .

Definition 1.8.3. Let S⃗ and T⃗ be nonzero geometric vectors with initial points q̄ and b̄,
respectively, and terminal points p̄ and ā, respectively. Consider the rays R0 and R1 given
by R0 = {0̄ + t(p̄ − q̄) : t ≥ 0} and R1 = {0̄ + t(ā − b̄) : t ≥ 0}.

(1) If R0 = R1 , then S⃗ and T⃗ have the same direction.

(2) If R0 ∪ R1 is a line, then we say that S⃗ and T⃗ have opposite directions.

The figure below illustrates Definition 1.8.3 in the case of two vectors with the same direction.

[Figure: the geometric vectors S⃗ (from q̄ to p̄) and T⃗ (from b̄ to ā), and the points p̄ − q̄ and ā − b̄ on the ray R0 = R1.]

Remark 1.8.4. If S⃗ and T⃗ are geometric vectors with the same direction, or with opposite
directions, we say that S⃗ and T⃗ are parallel.

Definition 1.8.5. Let S⃗ be a geometric vector with initial point q̄ and terminal point p̄.
The magnitude of S⃗ is ∥p̄ − q̄∥.

Definition 1.8.6. Let S⃗ and T⃗ be nonzero geometric vectors. Then S⃗ and T⃗ are equal
as vectors if they have the same direction and the same magnitude. In this case we write

S⃗ = T⃗.

The figure below shows geometric vectors S⃗, T⃗, R⃗ and U⃗ with S⃗ = T⃗, S⃗ ≠ R⃗ and
S⃗ ≠ U⃗.

[Figure: the geometric vectors S⃗, T⃗, R⃗ and U⃗.]

The following result gives an algebraic test for equality of geometric vectors, as defined in
Definition 1.8.6.


Theorem 1.8.7. Let S⃗ be a geometric vector with initial point q̄ and terminal point p̄, and
T⃗ a geometric vector with initial point b̄ and terminal point ā. Then S⃗ = T⃗ if and only if
p̄ − q̄ = ā − b̄.

Proof. Assume that p̄ − q̄ = ā − b̄. Then

magnitude of S⃗ = ∥p̄ − q̄∥ = ∥ā − b̄∥ = magnitude of T⃗.

Furthermore,
{0̄ + t(p̄ − q̄) : t ≥ 0} = {0̄ + t(ā − b̄) : t ≥ 0}
so that S⃗ and T⃗ have the same direction. Therefore S⃗ = T⃗.

Conversely, assume that S⃗ = T⃗. Since S⃗ and T⃗ have the same direction, it follows from
Definition 1.8.3 that
{0̄ + t(p̄ − q̄) : t ≥ 0} = {0̄ + s(ā − b̄) : s ≥ 0}.
Then
ā − b̄ = 0̄ + (ā − b̄) ∈ {0̄ + t(p̄ − q̄) : t ≥ 0}.
Therefore there exists a real number t > 0 so that
ā − b̄ = 0̄ + t(p̄ − q̄) = t(p̄ − q̄).
But
∥p̄ − q̄∥ = magnitude of S⃗ = magnitude of T⃗ = ∥ā − b̄∥
so that
∥p̄ − q̄∥ = t∥p̄ − q̄∥.
Therefore t = 1 so that p̄ − q̄ = ā − b̄.
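
Theorem 1.8.7 reduces equality of geometric vectors to a comparison of two differences of points, which is easy to test. In the sketch below the sample points are hypothetical and chosen only for illustration, as is the function name equal_as_vectors.

```python
def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def equal_as_vectors(q, p, b, a):
    """S runs from q to p and T runs from b to a.
    By Theorem 1.8.7 they are equal as vectors iff p - q == a - b."""
    return sub(p, q) == sub(a, b)

# Same displacement (2, 1, 3), different initial points: equal as vectors.
print(equal_as_vectors((1, 0, 2), (3, 1, 5), (0, 0, 0), (2, 1, 3)))
# Reversing T gives the opposite displacement: not equal.
print(equal_as_vectors((1, 0, 2), (3, 1, 5), (2, 1, 3), (0, 0, 0)))
```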


Remark 1.8.8. Let S⃗ be a geometric vector with initial point q̄ and terminal point p̄, and
T⃗ a geometric vector with initial point b̄ and terminal point ā.

(1) It follows from Theorem 1.8.7 that S⃗ and T⃗ are equal as vectors if and only if there
exists r̄ ∈ R3 so that
p̄ = ā + r̄ and q̄ = b̄ + r̄.

(2) If it is convenient to do so, we may assume that S⃗ and T⃗ have the same initial point.
It is also permissible to assume that the initial point of T⃗ is equal to the terminal
point of S⃗.

The next result shows that every geometric vector can be represented by an algebraic vector
(point) in R3 in a unique way.


Theorem 1.8.9. Let S⃗ be a geometric vector with initial point q̄ and terminal point p̄.
Then there exists exactly one point x̄ ∈ R3 so that S⃗ = T⃗, with T⃗ the geometric vector
with initial point 0̄ and terminal point x̄.

Proof. Let x̄ = p̄ − q̄, and let T⃗ be the geometric vector with initial point 0̄ and terminal
point x̄. Then x̄ − 0̄ = p̄ − q̄ so that S⃗ = T⃗ by Theorem 1.8.7.

If ȳ ∈ R3 is such that S⃗ is equal to the geometric vector with initial point 0̄ and terminal
point ȳ, then by Theorem 1.8.7, ȳ = ȳ − 0̄ = p̄ − q̄ = x̄ − 0̄ = x̄.



Based on Theorem 1.8.9, every geometric vector S⃗ can be represented in a unique way as
an algebraic vector x̄ ∈ R3 . Conversely, every algebraic vector x̄ ∈ R3 can be interpreted
as any geometric vector S⃗ that is equal to the geometric vector with initial point 0̄ and
terminal point x̄.

[Figure: the vector x̄ = ⟨a, b, c⟩ shown relative to the x, y and z coordinate axes.]

In fact, for any point ā ∈ R3 , we interpret the algebraic vector x̄ as the geometric vector
with initial point ā and terminal point ā + x̄.

Remark 1.8.10. The elements of R3 are ordered sets of three real numbers, called algebraic
vectors and denoted x̄ = ⟨x1 , x2 , x3 ⟩. We have two interpretations of x̄ ∈ R3 .

(1) Each x̄ ∈ R3 represents a point in space.

(2) Each x̄ ∈ R3 represents the geometric vector in space with initial point ā and terminal
point ā + x̄, where ā is any point in space.

Due to the identification of algebraic vectors with geometric vectors, we make the following
conventions.

(3) We will from now on speak simply of a vector in R3 , even if we are considering
geometric vectors.


(4) Geometric vectors will be denoted by their algebraic representatives; that is, if S⃗ is a
geometric vector with initial point ā and terminal point ā + x̄, then we write S⃗ = x̄.

(5) When we use a vector in R3 to represent a point in space, we speak of a point x̄ in
space, or a point x̄ in R3 .

The interpretation of an algebraic vector as a geometric vector provides us with a way in
which to visualise vector addition and scalar multiplication. We consider scalar multiplication
first.

Theorem 1.8.11. Let p̄ be a nonzero vector in R3 , and α ̸= 0 a real number.

(1) If α > 0, then αp̄ is the vector in the same direction as p̄, with magnitude α∥p̄∥.

(2) If α < 0, then αp̄ is the vector in the direction opposite to p̄, with magnitude |α|∥p̄∥.

Proof. Both statements follow directly from the definition of a ray, and Definition 1.8.3.

Theorem 1.8.11 is illustrated by means of the following example.
Example 1.8.12. Consider the vectors p̄ = ⟨2, 3, −5⟩, q̄ = ⟨1, 3, 1⟩, r̄ = ⟨−4, −6, 10⟩ and
ū = ⟨8, 12, −20⟩. Then
r̄ = −2p̄ and ū = 4p̄
so that ū and p̄ have the same direction, and r̄ and p̄ have opposite directions. On the other
hand,
q̄ ≠ αp̄ for all α ∈ R,
so that p̄ and q̄ have neither the same nor opposite directions.
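
The comparison in Example 1.8.12 can be automated. The sketch below combines Theorem 1.7.4 (5), which says that nonzero vectors are scalar multiples of each other exactly when their cross product is 0̄, with the observation that if x̄ = αȳ then x̄ · ȳ = α∥ȳ∥² has the same sign as α. The function name direction_relation is illustrative, and both inputs are assumed to be nonzero.

```python
def cross(u, v):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))

def direction_relation(x, y):
    """Return 'same', 'opposite' or 'neither' for nonzero vectors x and y."""
    if cross(x, y) != (0, 0, 0):
        return "neither"          # not scalar multiples of each other
    return "same" if dot(x, y) > 0 else "opposite"

p, q = (2, 3, -5), (1, 3, 1)
r, u = (-4, -6, 10), (8, 12, -20)
print(direction_relation(u, p), direction_relation(r, p), direction_relation(q, p))
```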

Next we consider vector addition. In order to interpret addition of vectors geometrically, we
define parallelograms in the context of our model for space. Note that we define parallelograms
in space, whereas the parallelograms you studied in high school were all parallelograms
in the same plane.
Definition 1.8.13. Let ā, b̄, c̄ and d̄ be points in R3 , with no three points on the same
line. The set of points on the line segments between ā and b̄, b̄ and c̄, c̄ and d̄, and d̄ and ā
is called the parallelogram with vertices ā, b̄, c̄ and d̄ (in this order) if b̄ − ā = c̄ − d̄ and
d̄ − ā = c̄ − b̄, and is denoted by P (ā, b̄, c̄, d̄).

Remark 1.8.14 (Properties of Parallelograms). It is possible to prove that parallelograms,
as defined in our model for space, have the properties that we expect them to have.
For instance, the following statements are true.

(1) Let ā, b̄, c̄ and d̄ be points in space with no three on the same line. Then ā, b̄, c̄ and
d̄ are the vertices of a parallelogram P (ā, b̄, c̄, d̄) if and only if the line through ā and
b̄ is parallel to the line through d̄ and c̄, and the line through b̄ and c̄ is parallel to the
line through ā and d̄.

(2) If ā, b̄, c̄ and d̄ are the vertices of a parallelogram P (ā, b̄, c̄, d̄), then
(a) the angle formed by the rays R1 = {ā + t(b̄ − ā) : t ≥ 0} and R2 = {ā +
t(d̄ − ā) : t ≥ 0} has the same magnitude as the angle formed by the rays
R3 = {c̄ + t(b̄ − c̄) : t ≥ 0} and R4 = {c̄ + t(d̄ − c̄) : t ≥ 0};
(b) the angle formed by the rays R1′ = {b̄ + t(ā − b̄) : t ≥ 0} and R2′ = {b̄ +
t(c̄ − b̄) : t ≥ 0} has the same magnitude as the angle formed by the rays
R3′ = {d̄ + t(c̄ − d̄) : t ≥ 0} and R4′ = {d̄ + t(ā − d̄) : t ≥ 0}.

[Figure: a parallelogram with vertices a, b, c and d; the angles at a and c are marked θ, and the angles at b and d are marked ϕ.]
The above sketch illustrates the facts mentioned in (1) and (2).

Addition of vectors is closely related to parallelograms, as is shown in the following result.

Theorem 1.8.15. Consider points ā and b̄ in R3 that are not scalar multiples of each other.
Then the following statements are true.

(1) 0̄, ā, b̄ and ā + b̄ are on the same plane.

(2) 0̄, ā, ā + b̄ and b̄ are the vertices of a parallelogram P (0̄, ā, ā + b̄, b̄).

(3) If c̄ is a point in R3 so that 0̄, ā, c̄ and b̄ are the vertices of a parallelogram P (0̄, ā, c̄, b̄),
then c̄ = ā + b̄.

We give a proof of (2). The proofs of (1) and (3) are given as exercises, see Exercise 1.8
numbers 3 and 5.

Proof of (2). Since

ā − 0̄ = ā = (ā + b̄) − b̄ and b̄ − 0̄ = b̄ = (ā + b̄) − ā,

it remains only to show that no three of the points 0̄, ā, b̄ and ā + b̄ are on the same line.
The points 0̄, ā and b̄ are not all on the same line, see Exercise 1.8 number 4. Suppose that
ā + b̄ is on the line through 0̄ and ā. Then there exists a real number t so that

ā + b̄ = tā + (1 − t)0̄ = tā.

Hence b̄ = (t − 1)ā, contradicting the fact that ā and b̄ are not scalar multiples of each other.
In the same way, it follows that ā + b̄ is not on the line through 0̄ and b̄. Lastly, suppose
that ā + b̄ is on the line through ā and b̄. Then

ā + b̄ = tā + (1 − t)b̄ for some t ∈ R.

Hence tb̄ = (t − 1)ā, again contradicting the fact that ā and b̄ are not scalar multiples of
each other. Therefore no three of the points 0̄, ā, b̄ and ā + b̄ are on the same line. Thus 0̄,
ā, ā + b̄ and b̄ are the vertices of a parallelogram P (0̄, ā, ā + b̄, b̄).
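
Definition 1.8.13 can be turned into a direct test, which also shows why the order of the vertices matters in Theorem 1.8.15 (2). The sketch below uses hypothetical sample vectors a and b and illustrative helper names. Three points are treated as collinear when the cross product of two differences is 0̄, in line with Theorem 1.7.4 (5).

```python
from itertools import combinations

def cross(u, v):
    """Cross product of two vectors in R^3, given as 3-tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(x - y for x, y in zip(u, v))

def collinear(a, b, c):
    """True if the three points lie on one line."""
    return cross(sub(b, a), sub(c, a)) == (0, 0, 0)

def is_parallelogram(a, b, c, d):
    """Test the conditions of Definition 1.8.13 for P(a, b, c, d)."""
    if any(collinear(*triple) for triple in combinations((a, b, c, d), 3)):
        return False
    return sub(b, a) == sub(c, d) and sub(d, a) == sub(c, b)

zero, a, b = (0, 0, 0), (1, 0, 2), (0, 1, 1)     # sample data, not from the notes
a_plus_b = tuple(x + y for x, y in zip(a, b))
print(is_parallelogram(zero, a, a_plus_b, b))    # vertices in the order 0, a, a + b, b
print(is_parallelogram(zero, a, b, a_plus_b))    # same points, different order
```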

According to Theorem 1.8.15, if ā and b̄ are nonzero vectors that are not scalar multiples of
each other, then the points 0̄, ā, ā + b̄ and b̄ are the vertices of a parallelogram P (0̄, ā, ā + b̄, b̄).
We say that the parallelogram is determined by the vectors ā and b̄. If we interpret ā, b̄ and
ā + b̄ as vectors instead of points, we have the following.

Theorem 1.8.16. Let ā and b̄ be nonzero vectors in R3 , not scalar multiples of each other.
Then the vector ā + b̄ with initial point 0̄ is the diagonal of the parallelogram determined by
ā and b̄.

Proof. The result follows immediately from Theorem 1.8.15 and Definition 1.8.6.

The figures below illustrate the different interpretations of vectors and vector addition given
in Theorems 1.8.15 and 1.8.16.
[Figure, two panels: ā, b̄ and ā + b̄ as points (left); ā, b̄ and ā + b̄ as vectors (right).]

It is often convenient to use geometric terminology when considering vectors. This is jus-
tified, as vectors in R3 can be represented as directed line segments in space. In the table
below we list geometric concepts that are frequently used as terminology in connection
with vectors. Here x̄ and ȳ are vectors in R3 , L = {p̄ + tq̄ : t ∈ R} is a line and
P = {z̄ ∈ R3 : n̄ · (z̄ − r̄) = 0} a plane in space. Let ā and b̄ be points in space.

Terminology | Meaning | Motivation

x̄ is parallel to ȳ | x̄ = αȳ for some α ≠ 0 | {ā + tx̄ : t ∈ R} and {b̄ + tȳ : t ∈ R} are parallel lines.
x̄ and ȳ have the same direction | x̄ = αȳ for some α > 0 | The rays {tx̄ : t ≥ 0} and {tȳ : t ≥ 0} are equal.
x̄ and ȳ have opposite directions | x̄ = αȳ for some α < 0 | The rays {tx̄ : t ≥ 0} and {tȳ : t ≥ 0} form a line.
The angle between x̄ and ȳ | The angle formed by the rays {ā + tx̄ : t ≥ 0} and {ā + tȳ : t ≥ 0} | The angle and its magnitude are uniquely determined by x̄ and ȳ.
x̄ is perpendicular to ȳ | The angle between x̄ and ȳ has magnitude π/2 | The lines {ā + tx̄ : t ∈ R} and {ā + tȳ : t ∈ R} are perpendicular.
x̄ is parallel to L | x̄ is parallel to q̄ | The line {ā + tx̄ : t ∈ R} is parallel to L.
x̄ is perpendicular to L | x̄ is perpendicular to q̄ | The line {p̄ + tx̄ : t ∈ R} is perpendicular to L.
x̄ is parallel to P | x̄ is perpendicular to n̄ | The line {r̄ + tx̄ : t ∈ R} is contained in the plane P .
x̄ is perpendicular to P | x̄ is parallel to n̄ | The line {r̄ + tx̄ : t ∈ R} is perpendicular to every line through r̄ that is contained in P .

In this section it is shown how algebraic vectors in R3 are interpreted as geometric vectors
in space. We therefore have two interpretations of an algebraic vector x̄ ∈ R3 ; namely, as a
point in space or as a geometric vector. This dual interpretation of vectors in R3 is necessary

from the point of view of applications. Indeed, if a vector x̄ represents the position of a
particle in space, then it makes sense to think of x̄ as a point. On the other hand, if x̄
represents the velocity of the particle, then it is convenient to think of x̄ as a geometric
vector; that is, as an ‘arrow’ in space.

Exercise 1.8

1. Consider the points ā = ⟨1, 0, 1⟩, b̄ = ⟨2, 1, 2⟩, c̄ = ⟨α + β, α − β, 0⟩ and d̄ = ⟨3, 2, −1⟩,
where α and β are real numbers. Determine those values for α and β, if any, for which
ā, b̄, c̄ and d̄ are the vertices of a parallelogram P (ā, b̄, c̄, d̄).

2. Assume that ā, b̄, c̄ and d̄ are the vertices of a parallelogram P (ā, b̄, c̄, d̄). Prove that
the line through ā and b̄ is parallel to the line through d̄ and c̄, and the line through
b̄ and c̄ is parallel to the line through ā and d̄.

3. Prove Theorem 1.8.15 (1). [HINT: First show that {tā + sb̄ : s, t ∈ R} is a plane in
R3 .]

4. Consider the proof of Theorem 1.8.15 (2). Let ā and b̄ be as given in the theorem.
Show that 0̄, ā and b̄ are not all on the same line.

5. Prove Theorem 1.8.15 (3).

6. Write down an equation for the line L through p̄ = ⟨1, 2, 3⟩ and parallel to the vector
x̄ = ⟨1, −1, 2⟩.

7. Write down a Cartesian equation for the plane P through r̄ = ⟨2, 1, 1⟩ and parallel to
the vectors x̄ = ⟨1, 1, 1⟩ and ȳ = ⟨2, 0, −1⟩.

8. Determine the magnitude of the angle between the vectors.

(a) x̄ = ⟨1, 1, 0⟩ and ȳ = ⟨1, 0, −1⟩ (b) x̄ = ⟨1, 2, −1⟩ and ȳ = ⟨0, 1, −1⟩
(c) x̄ = ⟨2, 4, 1⟩ and ȳ = ⟨−1, 1, −2⟩

9. For which values of α ∈ R are the vectors x̄ = ⟨α, 2, 1⟩ and ȳ = ⟨α, α, −1⟩ perpendic-
ular?

10. For which values of α ∈ R are the vectors x̄ = ⟨α² − 1, 4, 2⟩ and ȳ = ⟨0, 2, 1⟩ parallel?
For which of these values do x̄ and ȳ have the same direction?

11. For which values of α ∈ R is the vector x̄ = ⟨1, 2 − α, −1⟩ parallel to the plane P with
Cartesian equation 2x − αy + 3z = 1?

12. For which values of α ∈ R is the vector x̄ = ⟨1, α, −4⟩ perpendicular to the plane P
with Cartesian equation αx + 9y − 12z = 0?

Chapter 2

Matrices and Systems of Linear Equations

Many mathematical problems, and the mathematical formulations of many physical prob-
lems, can be represented as systems of equations. Consider, for instance, the problem of
finding a parabola that passes through the points p̄ = ⟨−2, 15⟩, q̄ = ⟨1, 6⟩ and r̄ = ⟨3, 20⟩.
The general equation of a parabola is y = ax² + bx + c. We must determine those values,
if any, of the constants a, b and c so that the resulting parabola passes through the points
p̄, q̄ and r̄. By substituting the coordinates of the given points into the equation of the
parabola, we obtain the system of equations

4a − 2b + c = 15
a + b + c = 6
9a + 3b + c = 20.

The study of systems of linear equations, such as the one above, leads to the basic ideas in
matrix theory which we will explore in this chapter.
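
The substitution step above can be sketched in Python. This is only an illustration: the helper names `parabola_system` and `satisfies` are hypothetical, and the candidate values a = 2, b = −1, c = 5 were found by hand for this sketch; they are not stated in the notes.

```python
def parabola_system(points):
    """Return one row [x**2, x, 1, y] per point (x, y).

    Each row encodes the equation a*x**2 + b*x + c = y in the unknowns a, b, c.
    """
    return [[x * x, x, 1, y] for x, y in points]


def satisfies(rows, values):
    """Check whether the candidate values solve every equation in rows."""
    return all(
        sum(coeff * v for coeff, v in zip(row[:-1], values)) == row[-1]
        for row in rows
    )


rows = parabola_system([(-2, 15), (1, 6), (3, 20)])
for row in rows:
    print(row)          # coefficients of a, b, c followed by the right-hand side

# Candidate computed by hand (an assumption of this sketch).
print(satisfies(rows, (2, -1, 5)))
```

The printed rows match the three equations displayed above, one row per point.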

2.1 Systems of Equations

This section is an introduction to solving systems of equations. A system of equations is a
collection of two or more equations in one or more unknown quantities x₁, ..., xₙ. A solution
of a system of equations in the unknown quantities x₁, ..., xₙ consists of values

x₁ = a₁, x₂ = a₂, ..., xₙ = aₙ

for the unknowns that simultaneously satisfy all the equations in the system. We typically
write a solution of a system of equations in vector form, ⟨a₁, a₂, ..., aₙ⟩. A system of
equations can have more than one solution, no solution or exactly one solution. In the
examples that follow, each of these possibilities is demonstrated.
Example 2.1.1. We solve the system of equations
3x + 5y = 22
x + y = 6
and give a geometric interpretation for the solution.

Solution. Swapping the two equations, we obtain the system


x + y = 6
3x + 5y = 22.
Replacing the second equation with the second equation minus three times the first equation,
we obtain the system
x + y = 6
2y = 4.
From the last equation, it is clear that y = 2. Substituting y = 2 into the first equation, we
find x = 4. Thus the system has exactly one solution ⟨x, y⟩ = ⟨4, 2⟩.

The solution of the system can be interpreted geometrically.

[Figure: the lines x + y = 6 and 3x + 5y = 22 in the xy-plane, intersecting at the point ⟨4, 2⟩.]

Each of the equations in the system represents a straight line in R2 . The solution can be
interpreted as the point of intersection of these two lines. That is, the two lines intersect
only at the point ⟨4, 2⟩.
Example 2.1.2. We solve the system of equations
2x + 3y = 7
4x + 6y = 8
and interpret the system and its solutions geometrically.

Solution. Replacing the last equation with the last equation minus twice the first equation,
we obtain the system

2x + 3y = 7
0 = −6.

If a pair of real numbers ⟨x, y⟩ satisfies both equations, then 0 = −6. Since this is impossible,
the two equations cannot be satisfied simultaneously. Therefore the system has no solution.

As in Example 2.1.1, each equation in the system represents a straight line in the plane. In
this case, the two lines are parallel, and therefore they do not intersect.

[Figure: the parallel lines 2x + 3y = 7 and 4x + 6y = 8 in the xy-plane.]

Our conclusion that the system has no solutions is therefore consistent with the fact that
distinct parallel lines do not intersect, see Theorem 1.4.19.

Example 2.1.3. We solve the system of equations

2x − 5y = 7
6x − 15y = 21

and interpret the system and its solutions geometrically.

Solution. Replacing the second equation with the second equation minus three times the
first equation, we obtain the system

2x − 5y = 7
0 = 0.

The second equation is always true. Therefore each pair ⟨x, y⟩ of real numbers that satisfies
the first equation will satisfy both equations in the original system. If we choose a value
for x, say x = a for some real number a, then we calculate an accompanying value
y = (2/5)a − 7/5 so that the pair ⟨x, y⟩ is a solution of the system. Hence there are infinitely
many solutions; for example ⟨x, y⟩ = ⟨0, −7/5⟩, ⟨x, y⟩ = ⟨1, −1⟩ and ⟨x, y⟩ = ⟨6, 1⟩
are three solutions. It can be shown that every solution is of the form ⟨x, y⟩ = ⟨t, (2/5)t − 7/5⟩
for some real number t.

As in Examples 2.1.1 and 2.1.2, each equation in the system represents a line in the plane.
In this case, the two equations represent the same line.

[Figure: the line 2x − 5y = 7 in the xy-plane.]

Therefore every point on this line corresponds to a solution of the system. Hence the system
has infinitely many solutions.
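
The three outcomes seen in Examples 2.1.1 to 2.1.3 can be told apart by a small test on the coefficients. The Python sketch below is not the method used in the notes: the function name `classify` is hypothetical, and the cross-multiplication test it uses is an assumption of this sketch (it anticipates ideas that are only developed later in the chapter).

```python
def classify(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes each equation really is a line (a and b are not both zero).
    """
    if a1 * b2 - a2 * b1 != 0:
        return "exactly one solution"        # the lines cross in one point
    # The left-hand sides are proportional: the lines are parallel or identical.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"   # both equations give the same line
    return "no solution"                     # distinct parallel lines


print(classify(3, 5, 22, 1, 1, 6))      # Example 2.1.1
print(classify(2, 3, 7, 4, 6, 8))       # Example 2.1.2
print(classify(2, -5, 7, 6, -15, 21))   # Example 2.1.3
```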

Example 2.1.4. We solve the system

6x + 6y − 6z = 36
x + 2y + 3z = 12
x + z = 6

and interpret the system and its solutions geometrically.

Solution. Replace equation one with equation one times 1/6, to obtain the system

x + y − z = 6
x + 2y + 3z = 12
x + z = 6.

Replace equation two with equation two minus equation one, to obtain the system

x + y − z = 6
y + 4z = 6
x + z = 6.

Replace equation three with equation three minus equation one, to obtain the system

x + y − z = 6
y + 4z = 6
−y + 2z = 0.

Replace equation three with equation three plus equation two, to obtain the system

x + y − z = 6
y + 4z = 6
6z = 6.

From the last equation, we deduce that z = 1. If z = 1 is substituted into the second
equation, we find that y = 2. Substituting z = 1 and y = 2 into the first equation, we
deduce that x = 5. Thus the system has exactly one solution ⟨x, y, z⟩ = ⟨5, 2, 1⟩. Each
equation in the system represents a plane in R³. Hence a solution of the system corresponds
to a point in R³ that lies on all three planes. Since the system has a unique solution, there
is precisely one point in R³ that lies on all three planes.

Example 2.1.5. We find the solutions of the system

x + y + 2z = 0
2x − y + 3z = 1
3x + 5z = 0

and interpret the system and its solutions geometrically.

Solution. Replace the second equation with the second equation minus twice the first
equation. The result is

x + y + 2z = 0
− 3y − z = 1
3x + 5z = 0.

We replace the third equation with the third equation minus three times equation one, and
get
x + y + 2z = 0
− 3y − z = 1
− 3y − z = 0.
Replace the third equation with the third equation minus the second equation, to obtain
the system
x + y + 2z = 0
− 3y − z = 1
0 = −1.
If there were a triple ⟨x, y, z⟩ of real numbers that satisfied all three equations, then it would
mean that 0 = −1. Since this is false, we conclude that the system does not have a solution.

Each equation in the system represents a plane in R³, and a solution of the system
corresponds to a point in R³ that lies on all three planes. The fact that the system does not have
a solution means that the three planes do not have a point in common.

Example 2.1.6. We solve the system of equations

x1 + x2 = 3
x2 + 4x3 = 2
x1 + x3 = 1
x1 + x4 = 0.

Solution. Replace equation three with equation three minus equation one to obtain the
system
x1 + x2 = 3
x2 + 4x3 = 2
− x2 + x3 = −2
x1 + x4 = 0.
Replace equation four with equation four minus equation one to obtain the system

x1 + x2 = 3
x2 + 4x3 = 2
− x2 + x3 = −2
− x2 + x4 = −3.

Replace equation three with equation three plus equation two, and equation four with
equation four plus equation two to obtain the system

x1 + x2 = 3
x2 + 4x3 = 2
5x3 = 0
4x3 + x4 = −1.
Replace equation three with equation three times 1/5 to obtain the system

x1 + x2 = 3
x2 + 4x3 = 2
x3 = 0
4x3 + x4 = −1.

Replace equation four with equation four minus four times equation three to obtain the
system
x1 + x2 = 3
x2 + 4x3 = 2
x3 = 0
x4 = −1.
From the last two equations, it follows that x3 = 0 and x4 = −1. If x3 = 0 is substituted
into the second equation, we get x2 = 2. Using back-substitution again, we see from
the first equation that x1 = 1. Thus the only solution of the system is ⟨x1 , x2 , x3 , x4 ⟩ =
⟨1, 2, 0, −1⟩.
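
The back-substitution used at the end of Examples 2.1.4 and 2.1.6 can be sketched in Python. The function name `back_substitute` and the row layout (coefficients followed by the right-hand side) are choices made for this sketch, which assumes a square triangular system with nonzero diagonal coefficients.

```python
from fractions import Fraction


def back_substitute(rows):
    """Solve an upper triangular system given as rows [coefficients..., rhs].

    Assumes a square system whose diagonal coefficients are all nonzero.
    """
    n = len(rows)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):           # start with the last equation
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(rows[i][n]) - known) / rows[i][i]
    return x


# Final system of Example 2.1.6.
final_system = [
    [1, 1, 0, 0, 3],     # x1 + x2        = 3
    [0, 1, 4, 0, 2],     #      x2 + 4x3  = 2
    [0, 0, 1, 0, 0],     #            x3  = 0
    [0, 0, 0, 1, -1],    #            x4  = -1
]
print([int(v) for v in back_substitute(final_system)])   # [1, 2, 0, -1]
```

Exact fractions are used so that no rounding occurs when a diagonal coefficient is not 1, as in the final system of Example 2.1.4.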

Remark 2.1.7. In solving the systems in Examples 2.1.1 to 2.1.6, we construct in each
case a logical argument. This argument may be summarised as follows, in the case of a
system in three unknowns x, y and z.

(1) Assume that there exists a triple of numbers ⟨x, y, z⟩ that satisfies all the equations
making up the system; that is, we assume that the system has a solution.
(2) The assumption in (1) turns each equation in the system into an equality of real
numbers. We may therefore apply all invertible operations on real numbers that preserve
equality to the equations in the system. In particular, if a, b, c and d are real numbers,
then the following is true: For a nonzero real number α,

a = b if and only if αa = αb

and

a = b and c = d if and only if a = b and αa + c = αb + d.

This corresponds to multiplying an equation by a nonzero real number, and adding a
nonzero multiple of one equation to another.
(3) We apply the operations in (2) in a systematic way so that, at some point, we are
able to determine all the values of the unknown variables x, y and z in the system.
Logically, we have an equivalence

⟨x, y, z⟩ is a solution of the system, if and only if ⟨x, y, z⟩ = . . .

(4) If at some point we obtain an equation that is not true, as in Examples 2.1.2 and
2.1.5, it means that our assumption in (1) is false. We therefore conclude that the
system does not have a solution.

The systems of equations in Examples 2.1.1 to 2.1.6 are all of a particular type, namely,
systems of linear equations. The exact meaning of the term linear will be clarified in Section
2.3. We may notice that in each example, the system has either exactly one solution, no
solutions or infinitely many solutions. That these are the only possibilities for a system
of linear equations is proved in Section 2.3. However, for systems that are not linear, the
situation is more complicated. Indeed, a system of nonlinear equations can have any number
of solutions, as we demonstrate in the following two examples.
Example 2.1.8. We solve the system
x + y = −3
x² + y² = 17
and interpret the system and its solutions geometrically.

Solution. The first equation may be written as y = −3 − x. Substitute this into the second
equation to get
x² + (−3 − x)² = 17,
that is,
(x + 4)(x − 1) = 0.
Thus x = −4 or x = 1. If x = −4 then y = 1, and if x = 1 then y = −4. Therefore the
system has exactly two solutions, namely, ⟨x, y⟩ = ⟨−4, 1⟩ or ⟨x, y⟩ = ⟨1, −4⟩.

The first equation in the system represents a straight line in the plane, and the second a
circle with centre ⟨0, 0⟩ and radius √17. The solutions of the system correspond to the
points where the line intersects the circle.

[Figure: the line x + y = −3 and the circle x² + y² = 17 in the xy-plane, with the intersection points ⟨−4, 1⟩ and ⟨1, −4⟩ marked.]

As we see in the figure, the line intersects the circle in only two points.
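
The substitution in Example 2.1.8 works for any line of the form x + y = s and any circle x² + y² = r² centred at the origin. The Python sketch below illustrates this; the function name `line_circle` and its parameters are hypothetical, and floating-point arithmetic is used, so borderline (tangent) cases may need a tolerance.

```python
import math


def line_circle(s, r2):
    """Solve x + y = s together with x**2 + y**2 = r2 over the real numbers.

    Substituting y = s - x gives 2x**2 - 2sx + (s**2 - r2) = 0.
    """
    disc = 8 * r2 - 4 * s * s            # discriminant of the quadratic in x
    if disc < 0:
        return []                         # the line misses the circle
    root = math.sqrt(disc)
    xs = sorted({(2 * s - root) / 4, (2 * s + root) / 4})
    return [(x, s - x) for x in xs]


print(line_circle(-3, 17))   # [(-4.0, 1.0), (1.0, -4.0)]
```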
Example 2.1.9. We solve the system
xy = 3
x² + y² = 10
and interpret the system and its solutions geometrically.

Solution. Add 2 times the first equation to the second equation to obtain

x² + 2xy + y² = 16.

Therefore (x + y)² = 16 so that

x + y = 4 or x + y = −4.

Substitute y = 4 − x into the first equation. Then

x² − 4x + 3 = 0

so that x = 3 or x = 1. If x = 3 then y = 4 − x = 1. If x = 1 then y = 4 − x = 3. Therefore
⟨x, y⟩ = ⟨3, 1⟩ and ⟨x, y⟩ = ⟨1, 3⟩ are solutions of the system.

Substitute y = −4 − x into the first equation. Then

x² + 4x + 3 = 0

so that x = −3 or x = −1. If x = −3 then y = −4 − x = −1. If x = −1 then
y = −4 − x = −3. Therefore ⟨x, y⟩ = ⟨−3, −1⟩ and ⟨x, y⟩ = ⟨−1, −3⟩ are solutions of the
system.

The system therefore has four solutions: ⟨x, y⟩ = ⟨1, 3⟩, ⟨x, y⟩ = ⟨3, 1⟩, ⟨x, y⟩ = ⟨−1, −3⟩
and ⟨x, y⟩ = ⟨−3, −1⟩.

[Figure: the hyperbola xy = 3 and the circle x² + y² = 10 in the xy-plane, with the intersection points ⟨1, 3⟩, ⟨3, 1⟩, ⟨−3, −1⟩ and ⟨−1, −3⟩ marked.]

The first equation in the system represents a hyperbola, while the second equation represents
a circle with centre at the origin and radius √10. A solution of the system corresponds to
a point of intersection of these two curves. As we see in the figure above, the hyperbola
intersects the circle in exactly four points.
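
The argument of Example 2.1.9 carries over to any system of the form xy = p, x² + y² = q. The following Python sketch follows the same two steps; the function name `solve_product_and_squares` is hypothetical, and since floating-point square roots are used, borderline cases may need a tolerance.

```python
import math


def solve_product_and_squares(p, q):
    """Solve x*y = p together with x**2 + y**2 = q over the real numbers.

    Step 1: (x + y)**2 = q + 2p fixes the possible sums s = x + y.
    Step 2: for each s, x is a root of x**2 - s*x + p = 0 and y = s - x.
    """
    if q + 2 * p < 0:
        return []
    solutions = set()
    s0 = math.sqrt(q + 2 * p)
    for s in (s0, -s0):
        disc = s * s - 4 * p
        if disc < 0:
            continue
        root = math.sqrt(disc)
        for x in ((s + root) / 2, (s - root) / 2):
            solutions.add((x, s - x))
    return sorted(solutions)


print(solve_product_and_squares(3, 10))
# [(-3.0, -1.0), (-1.0, -3.0), (1.0, 3.0), (3.0, 1.0)]
```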

By means of a number of examples, we have demonstrated a logically sound method
for solving systems of equations, in particular systems of linear equations such as those in
Examples 2.1.1 to 2.1.6. In contrast to systems of nonlinear equations, such as in Examples
2.1.8 and 2.1.9, a system of linear equations always has either exactly one, infinitely many
or no solutions. After developing some preliminary concepts in Section 2.2, we will resume
our study of systems of linear equations in Section 2.3.

Exercise 2.1

1. Find all the solutions, if any, of the following systems of equations. Follow a logical
argument, as in Examples 2.1.1 to 2.1.9, and show all your steps.

(a) 12x + 7y = 0
    6x + 14y = 0

(b) x² − xy = 0
    y² + x = 2

(c) x + 2y = 10
    x − 2y + 2z = 2
    2x + 2y − 7z = 0

(d) x² + y² + z² = 4
    z² + z = 2
    z = 1

(e) x + y = 10
    x − 2y + z = 13
    −2x + y − z = 10

(f) x₁ + x₂ = 2
    x₁ − x₂ + x₃ = 1
    x₂ − x₃ = 0
    x₁ + x₄ = 2

(g) 2x² + y² = 24
    x² − y² = −12

(h) 2x + y = 0
    x − y² = 0

(i) 2x + 3y − z = 1
    x − y + z = 2
    3x + 2y + z = 1

(j) x₁ − x₂ + x₃ − x₄ = 0
    x₁ + x₂ + x₃ + x₄ = 6
    x₁ + 2x₂ + 4x₃ + 8x₄ = 21
    x₁ + 3x₂ + 9x₃ + 27x₄ = 52

(k) 3x − 2y + 8z = 9
    −2x + 2y + z = 3
    x + 2y − 3z = 8

2. Give a geometric interpretation of the systems in questions 1(a), 1(c), 1(e), 1(h), 1(k)
and 1(i) and their solutions, if any.
3. Write down a system of two equations in two unknowns with exactly three solutions.
4. Show that ⟨x, y, z⟩ = ⟨t, −t + 3, 2t + 7⟩ is a solution of the system

x + 3y + z = 16
x − y − z = −10
3x + 5y + z = 22
for any real number t.
5. Show that ⟨x1 , x2 , x3 , x4 ⟩ = ⟨s + t, s − t − 1, s, t⟩ is a solution of the system

x1 + 2x2 − 3x3 + x4 = −2
3x1 − x2 − 2x3 − 4x4 = 1
2x1 + 3x2 − 5x3 + x4 = −3
x1 − x3 − x4 = 0
for all real numbers s and t.

2.2 Matrices and Matrix Algebra

When solving a system of linear equations, such as

x + y = 6
3x + 5y = 22,          (2.1)

the particular names we use for the unknowns are irrelevant. Indeed, there is no mathe-
matical difference between the system (2.1) and the system

v + w = 6
3v + 5w = 22.

It is therefore only the coefficients of the unknowns and the right hand terms in each
equation that are relevant. A particularly convenient way in which to capture the essential
information in the system (2.1) is as two arrays of numbers
\[
\begin{bmatrix} 1 & 1 \\ 3 & 5 \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} 6 \\ 22 \end{bmatrix}.
\]

The first column of the first array contains the coefficients of x and the second the coefficients
of y, while the first row contains the coefficients appearing in the first equation, with the
second row the coefficients appearing in the second equation. This leads to the concept of
a matrix.
Definition 2.2.1. A matrix is an ordered rectangular array of numbers.
Remark 2.2.2. We note the following regarding the notation for matrices.

(1) Matrices are typically denoted by upper case letters such as A, B and C.

(2) The array of numbers defining a matrix is usually enclosed in parentheses or square
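brackets. For instance, we write
\[
A = \begin{bmatrix} 1 & -2 \\ 3 & 5 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} -1 & 0 & 3 \\ 2 & 1 & 4 \\ 4 & 5 & -6 \\ -3 & -1 & -1 \end{bmatrix}.
\]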

(3) The size of a matrix is specified by the number of (horizontal) rows and the number
of (vertical) columns that it contains. The matrix A above contains two rows and two
columns and is called a 2 × 2 (read “2 by 2”) matrix. Similarly, B is a 4 × 3 matrix.
In writing the notation m × n to describe the size of a matrix, we always write the
number of rows first.

(4) We adopt the notation A = [aᵢⱼ] where aᵢⱼ represents the entries of matrix A. The
entry aᵢⱼ is the entry in row i and column j. For instance, in the matrix A above,
a₁₁ = 1, a₁₂ = −2, a₂₁ = 3 and a₂₂ = 5. In general, an m × n matrix A may be written
as
\[
A = [a_{ij}] =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn}
\end{bmatrix}.
\]

(5) If the dimensions of a matrix A are equal; that is, if m = n, we call A a square matrix
and the special entries a11 , a22 , . . . , ann form the (main) diagonal of A.

(6) It is often convenient to write a vector as a matrix with a single column; that is, if
x̄ = ⟨x₁, ..., xₙ⟩ is a vector in Rⁿ, it can also be considered as an n × 1 matrix
\[
\bar{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]

Definition 2.2.3. Let A = [aij ] and B = [bij ] be two matrices of the same size m × n. Then
A = B if aij = bij for every i = 1, . . . , m and j = 1, . . . , n.
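
The notation of Remark 2.2.2 and the equality test of Definition 2.2.3 can be mirrored in a few lines of Python. Representing a matrix as a list of rows, and the helper names `size`, `entry` and `equal`, are assumptions of this sketch and not part of the notes.

```python
def size(A):
    """Return (m, n): the number of rows and the number of columns of A."""
    return len(A), len(A[0])


def entry(A, i, j):
    """Return the entry in row i and column j, with 1-based indices as in the notes."""
    return A[i - 1][j - 1]


def equal(A, B):
    """Definition 2.2.3: same size and equal entries in every position."""
    return size(A) == size(B) and all(
        entry(A, i, j) == entry(B, i, j)
        for i in range(1, len(A) + 1)
        for j in range(1, len(A[0]) + 1)
    )


A = [[1, -2], [3, 5]]
B = [[-1, 0, 3], [2, 1, 4], [4, 5, -6], [-3, -1, -1]]
print(size(A), size(B))    # (2, 2) (4, 3)
print(entry(A, 1, 2))      # -2
print(equal(A, B))         # False
```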

We now introduce two special types of matrices. In the matrix algebra we develop later in
this section, these matrices play the roles of the numbers 0 and 1, respectively.
Definition 2.2.4. A zero matrix O is a matrix with all entries equal to zero.
Example 2.2.5. The matrix
\[
O = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]
is the 2 × 4 zero matrix, while
\[
O = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
is the 2 × 2 zero matrix.
Definition 2.2.6. The n × n matrix with diagonal entries equal to one and the other entries
zero is called the n × n identity matrix and denoted by In or I.
Example 2.2.7. The matrix
\[
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
is the 2 × 2 identity matrix, and
\[
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
is the 3 × 3 identity matrix.

We now introduce algebraic operations for matrices.

Definition 2.2.8. Let A = [aij ] and B = [bij ] be two matrices of the same size, say m × n.
The sum A + B of these two matrices is the matrix C = [cij ] where cij = aij + bij for
i = 1, . . . , m and j = 1, . . . , n.

Definition 2.2.9. Let A = [aij ] be an m × n matrix and r any real number. The product
rA of the real number r with the matrix A is the matrix C = [cij ] where cij = raij for
i = 1, . . . , m and j = 1, . . . , n.

Definition 2.2.10. Let A = [aij ] be an m × n matrix. Then −A = (−1)A = [−aij ].

Definition 2.2.11. Let A = [aij ] and B = [bij ] be m × n matrices. The difference of A and
B is the matrix A − B = A + (−B).

Definition 2.2.12. Let A = [aij ] be an m × n matrix and B = [bij ] an n × p matrix. The
matrix AB is the m × p matrix C = [cij ], where

cij = ai1 b1j + ai2 b2j + · · · + ain bnj

for all i = 1, . . . , m and j = 1, . . . , p.

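Example 2.2.13. Consider the matrices A, B and C given by
\[
A = \begin{bmatrix} 2 & 0 & 1 & -4 \\ 4 & 1 & 2 & 8 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 5 & -3 \\ 0 & -3 & 1 \\ 1 & 10 & 1 \\ 2 & 11 & 7 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} -2 & 6 & 2 & 3 \\ 3 & 4 & 2 & 1 \end{bmatrix}.
\]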

We calculate AB, A + 2C and C − A.

Solution. First we calculate AB. Let D = [dij ] = AB. Then

d11 = a11 b11 + a12 b21 + a13 b31 + a14 b41


= 2 × 1 + 0 × 0 + 1 × 1 + (−4) × 2
= 2+0+1−8
= −5

and

d12 = a11 b12 + a12 b22 + a13 b32 + a14 b42


= 2 × 5 + 0 × (−3) + 1 × 10 + (−4) × 11
= 10 + 0 + 10 − 44
= −24.

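The remaining entries of D are calculated in the same way. The result is
\[
D = AB = \begin{bmatrix} -5 & -24 & -33 \\ 22 & 125 & 47 \end{bmatrix},
\]
as the reader may verify.

Next we determine A + 2C. We have
\[
2C = \begin{bmatrix} 2 \times (-2) & 2 \times 6 & 2 \times 2 & 2 \times 3 \\ 2 \times 3 & 2 \times 4 & 2 \times 2 & 2 \times 1 \end{bmatrix}
= \begin{bmatrix} -4 & 12 & 4 & 6 \\ 6 & 8 & 4 & 2 \end{bmatrix}.
\]
Therefore
\[
A + 2C = \begin{bmatrix} 2 - 4 & 0 + 12 & 1 + 4 & -4 + 6 \\ 4 + 6 & 1 + 8 & 2 + 4 & 8 + 2 \end{bmatrix}
= \begin{bmatrix} -2 & 12 & 5 & 2 \\ 10 & 9 & 6 & 10 \end{bmatrix}.
\]
In the same way,
\[
C - A = C + (-A) = \begin{bmatrix} -4 & 6 & 1 & 7 \\ -1 & 3 & 0 & -7 \end{bmatrix}.
\]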
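
Definitions 2.2.8, 2.2.9 and 2.2.12 translate almost word for word into code. The Python sketch below uses lists of rows and the hypothetical helper names `mat_add`, `scalar_mul` and `mat_mul`, and reproduces the calculations of Example 2.2.13 as a check.

```python
def mat_add(A, B):
    """Definition 2.2.8: entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def scalar_mul(r, A):
    """Definition 2.2.9: multiply every entry of A by the real number r."""
    return [[r * a for a in row] for row in A]


def mat_mul(A, B):
    """Definition 2.2.12: c_ij = a_i1*b_1j + ... + a_in*b_nj."""
    n = len(B)
    if len(A[0]) != n:
        raise ValueError("columns of A must equal rows of B")
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[2, 0, 1, -4], [4, 1, 2, 8]]
B = [[1, 5, -3], [0, -3, 1], [1, 10, 1], [2, 11, 7]]
C = [[-2, 6, 2, 3], [3, 4, 2, 1]]

print(mat_mul(A, B))                    # AB
print(mat_add(A, scalar_mul(2, C)))     # A + 2C
print(mat_add(C, scalar_mul(-1, A)))    # C - A = C + (-1)A
```

Note that `mat_mul(B, A)` is also defined here (B is 4 × 3 and A is 2 × 4 would not match), so the size check matters: the call raises an error whenever the inner dimensions disagree.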

Addition and multiplication of matrices satisfy many of the familiar properties of addition
and multiplication of real numbers, as is shown in the following results. The proofs of these
results are essentially similar, depending only on the properties of addition and multipli-
cation of real numbers and Definitions 2.2.3 to 2.2.12. We therefore illustrate the general
technique and leave the bulk of the proofs as exercises, see Exercise 2.2 number 3.

Theorem 2.2.14. Suppose that A, B and C are m × n matrices. Then the following
statements are true.

(1) Commutativity of addition: A + B = B + A.

(2) Associativity of addition: (A + B) + C = A + (B + C).

(3) Existence of an additive identity: A + O = A.

(4) Existence of an additive inverse: A + (−A) = O.

Proof of (1). According to Definition 2.2.8, A + B = [aij + bij ] and B + A = [bij + aij ].
But addition of real numbers is commutative. Therefore

aij + bij = bij + aij for every i = 1, . . . , m and j = 1, . . . , n.

By Definition 2.2.3 it follows that A + B = B + A.

Theorem 2.2.15. Suppose A and B are m × n matrices, and r and s are real numbers.
Then the following statements are true.

(1) First distributive law: r(A + B) = rA + rB.

(2) Second distributive law: (r + s)A = rA + sA.

(3) Associativity of scalar multiplication: (rs)A = r(sA).

(4) 1(A) = A.

(5) 0(A) = O.

(6) r(O) = O.

Proof of (3). By Definition 2.2.9, (rs)A = [(rs)aij ] and r(sA) = r[saij ] = [r(saij )]. But
multiplication of real numbers is associative. Therefore

(rs)aij = r(saij ) for every i = 1, . . . , m and j = 1, . . . , n.

It follows by Definition 2.2.3 that (rs)A = r(sA).

Theorem 2.2.16. Suppose A, B, C, I and O all have suitable sizes so that the relevant
operations are defined, and let r be a real number. Then the following are true.

(1) Associativity of multiplication: (AB)C = A(BC).

(2) First distributive law: A(B + C) = AB + AC.

(3) Second distributive law: (A + B)C = AC + BC.

(4) Third distributive law: r(AB) = (rA)B = A(rB).

(5) Existence of multiplicative identity: IA = A.

(6) AI = A.

(7) OA = O.

(8) AO = O.

Proof of (5). Let I = [oij ] be the m × m identity matrix, and A = [aij ] an m × n matrix.
Let D = [dij ] = IA. For each i = 1, . . . , m and j = 1, . . . , n we have

dij = oi1 a1j + oi2 a2j + · · · + oii aij + · · · + oim amj .

But oij = 0 if i ≠ j and oii = 1 for all i, j = 1, . . . , m. Therefore dij = aij for
all i = 1, . . . , m and j = 1, . . . , n. It follows from Definition 2.2.3 that IA = D = A.

Remark 2.2.17. If A is a square matrix, we denote by A² the matrix AA. In the same
way, A³ = AA², A⁴ = AA³ and, in general, for a positive integer n ≥ 2, Aⁿ = AAⁿ⁻¹.

As Theorems 2.2.14, 2.2.15 and 2.2.16 show, the algebraic operations on matrices satisfy
many of the familiar properties of addition and multiplication of real numbers. However,
matrix multiplication has some peculiar properties, as is shown in the following examples.

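Example 2.2.18. Multiplication of real numbers is commutative. That is, ab = ba for all
real numbers a and b. This is not true for matrices, in general. There exist n × n matrices
A and B so that AB ≠ BA. Let
\[
A = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}.
\]
Then
\[
AB = \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}
\quad\text{and}\quad
BA = \begin{bmatrix} 3 & -1 \\ 8 & -3 \end{bmatrix}.
\]
Therefore AB ≠ BA.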
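Example 2.2.19. For real numbers a and b it is well known that if ab = 0, then either
a = 0 or b = 0. That is, there are no zero divisors in R. For matrix multiplication this is
not the case. Let
\[
A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\quad\text{and}\quad
\bar{x} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}.
\]
Then A ≠ O and x̄ ≠ 0̄, but
\[
A\bar{x} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
This situation is not confined to the case when one of the matrices is a column vector.
Indeed, if
\[
A = \begin{bmatrix} 1 & -1 \\ 3 & -3 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} 2 & 10 \\ 2 & 10 \end{bmatrix},
\]
then A ≠ O and B ≠ O, but AB = O.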
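Example 2.2.20. Multiplication of real numbers satisfies the cancellation law. That is, if
a, b and c are real numbers so that ac = bc and c ≠ 0, then a = b. For multiplication of
matrices, this is in general not the case. Indeed, let
\[
A = \begin{bmatrix} 1 & 5 \\ 3 & 7 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]
Clearly, A ≠ B, but
\[
AC = \begin{bmatrix} 1 & 0 \\ 3 & 0 \end{bmatrix} = BC.
\]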
Example 2.2.21. If a is a nonzero real number, then it has a multiplicative inverse; that
is, there exists a real number a⁻¹ so that aa⁻¹ = a⁻¹a = 1. For matrices, this is generally
not the case. For example, let
\[
A = \begin{bmatrix} 1 & 0 \\ 3 & 0 \end{bmatrix}.
\]
Suppose that there exists a 2 × 2 matrix B = [bᵢⱼ] so that D = [dᵢⱼ] = AB = I. It follows
from the definitions of equality of matrices and matrix multiplication, Definitions 2.2.3 and
2.2.12 respectively, that
1 = d₁₁ = b₁₁ and 0 = d₂₁ = 3b₁₁.
Therefore 1 = b₁₁ = 0, a contradiction. Therefore the matrix B does not exist.
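
The claims of Examples 2.2.18 to 2.2.20 are easy to confirm numerically. The Python sketch below is only a check of the specific matrices given in those examples; the helper `mat_mul` and the variable names are hypothetical.

```python
def mat_mul(A, B):
    """Matrix product as in Definition 2.2.12 (matrices are lists of rows)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Example 2.2.18: multiplication is not commutative.
A = [[1, 0], [2, -1]]
B = [[1, 1], [2, 3]]
print(mat_mul(A, B), mat_mul(B, A))

# Example 2.2.19: two nonzero matrices whose product is the zero matrix.
Z1 = [[1, -1], [3, -3]]
Z2 = [[2, 10], [2, 10]]
print(mat_mul(Z1, Z2))

# Example 2.2.20: PC = QC although P and Q differ, so C cannot be cancelled.
P = [[1, 5], [3, 7]]
Q = [[1, 2], [3, 4]]
C = [[1, 0], [0, 0]]
print(mat_mul(P, C) == mat_mul(Q, C), P == Q)
```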

Besides the algebraic operations introduced in Definitions 2.2.8, 2.2.9 and 2.2.12, there are
many other interesting ways to generate a new matrix from a given matrix A. A particularly
useful way in which this can be done is through the process of transposition.

Definition 2.2.22. If A is an m × n matrix, then the transpose of A, written Aᵀ, is the
n × m matrix B = [bᵢⱼ] such that bᵢⱼ = aⱼᵢ for all i = 1, . . . , n and j = 1, . . . , m.

Remark 2.2.23. If A is an m × n matrix, then the ith row of Aᵀ is the ith column of A.

We illustrate the concept of the transpose of a matrix by means of an example.

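Example 2.2.24. The transpose of
\[
A = \begin{bmatrix} 2 & 3 & 4 \\ 5 & 6 & 7 \end{bmatrix}
\]
is
\[
A^T = \begin{bmatrix} 2 & 5 \\ 3 & 6 \\ 4 & 7 \end{bmatrix}.
\]
The transpose of
\[
\bar{x} = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}
\]
is
\[
\bar{x}^T = \begin{bmatrix} 7 & 8 & 9 \end{bmatrix}.
\]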

The process of transposition satisfies the following properties.

Theorem 2.2.25. Let A and B be matrices of the appropriate sizes. Then the following
statements are true.

(1) (Aᵀ)ᵀ = A.

(2) (A + B)ᵀ = Aᵀ + Bᵀ.

(3) (AB)ᵀ = BᵀAᵀ.

Proof of (2). Assume that A and B are m × n matrices. Let C = [cᵢⱼ] = (A + B)ᵀ and
D = [dᵢⱼ] = Aᵀ + Bᵀ. Then by Definitions 2.2.8 and 2.2.22,

cᵢⱼ = aⱼᵢ + bⱼᵢ = dᵢⱼ for every i = 1, . . . , n and j = 1, . . . , m.

It therefore follows from Definition 2.2.3 that (A + B)ᵀ = Aᵀ + Bᵀ.
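
The transpose and the three properties of Theorem 2.2.25 can be spot-checked in Python. This is a numerical check on sample matrices, not a proof; the helper names `transpose`, `mat_add` and `mat_mul`, and the matrices B and C, are chosen only for this sketch.

```python
def transpose(A):
    """Definition 2.2.22: the rows of the transpose are the columns of A."""
    return [list(column) for column in zip(*A)]


def mat_add(A, B):
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def mat_mul(A, B):
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[2, 3, 4], [5, 6, 7]]          # the matrix of Example 2.2.24
B = [[1, 0, -1], [2, 2, 0]]         # hypothetical 2 x 3 matrix
C = [[1, 0], [2, -1], [0, 3]]       # hypothetical 3 x 2 matrix

print(transpose(A))                                               # [[2, 5], [3, 6], [4, 7]]
print(transpose(transpose(A)) == A)                               # property (1)
print(transpose(mat_add(A, B)) == mat_add(transpose(A), transpose(B)))   # property (2)
print(transpose(mat_mul(A, C)) == mat_mul(transpose(C), transpose(A)))   # property (3)
```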

As is mentioned at the start of this chapter, matrices provide a convenient way in which the
relevant information in a system of linear equations can be represented. In fact, more can
be done! Using matrix multiplication and Definition 2.2.3, any system of linear equations
can be represented in matrix form. We illustrate this fact in a special case.
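Example 2.2.26. Consider the system of linear equations
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 &= b_3.
\end{aligned}
\tag{2.2}
\]
Let
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad
\bar{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\quad\text{and}\quad
\bar{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}.
\]
Then
\[
A\bar{x} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 \end{bmatrix}.
\]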
It follows from Definition 2.2.3 that a triple x̄ = ⟨x1 , x2 , x3 ⟩ is a solution of the system (2.2)
if and only if the (column) vector x̄ satisfies the matrix equation

Ax̄ = b̄. (2.3)

Therefore the system of linear equations (2.2) is equivalent to the matrix equation (2.3).
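
As a concrete instance of this equivalence, the system of Example 2.1.4 can be written as Ax̄ = b̄ and its solution checked with one matrix-vector product. The Python sketch below uses a hypothetical helper `mat_vec` and stores the column vectors as plain lists.

```python
def mat_vec(A, x):
    """Return the product of the matrix A (list of rows) with the column vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


# Coefficient matrix and right-hand side of the system in Example 2.1.4.
A = [[6, 6, -6],
     [1, 2, 3],
     [1, 0, 1]]
b = [36, 12, 6]

print(mat_vec(A, [5, 2, 1]) == b)   # True: <5, 2, 1> solves the system
print(mat_vec(A, [0, 0, 0]) == b)   # False: the zero vector does not
```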

Motivated by systems of linear equations, the concept of a matrix is introduced, and al-
gebraic operations on matrices are defined. These operations satisfy many of the familiar
properties of addition and multiplication of real numbers but have some peculiar properties.
Some of these are demonstrated in Examples 2.2.18 to 2.2.21. It is also shown how a system
of linear equations may be written in matrix form. This provides a compact and powerful
representation for systems of linear equations. The utility of the matrix representation of
systems of linear equations is demonstrated in the sections to come.

Exercise 2.2

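1. Let
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}, \quad
C = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}
\quad\text{and}\quad
\bar{x} = \begin{bmatrix} 1 \\ 10 \end{bmatrix}.
\]
Compute each of the following if it is defined: B − C, B + 2C, A + C, BA, Ax̄, ABx̄, ABC and BCB.
Otherwise, explain why the matrix is not defined.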

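2. Find matrices A and B so that
\[
2A + 3B = \begin{bmatrix} 11 & 10 \\ 7 & 0 \end{bmatrix}
\quad\text{and}\quad
3A + 2B = \begin{bmatrix} 9 & 10 \\ 8 & 0 \end{bmatrix}.
\]

3. Let
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}
\quad\text{and}\quad
C = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.
\]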
Prove the following special cases of Theorems 2.2.15 and 2.2.16.
(a) (r + s)A = rA + sA for all real numbers r and s.
(b) (AB)C = A(BC).
(c) (A + B)C = AC + BC.

4. This exercise relates to Examples 2.2.18 to 2.2.21.

(a) Compute AB and BA where \( A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \) and \( B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \).
(b) Find 2 × 2 matrices A and B such that A ≠ O, A ≠ I, B ≠ O, B ≠ I and A ≠ B,
but AB = BA.
(c) If A and B are n × n matrices, is it necessarily true that (A + B)² = A² + 2AB + B²?
Explain your answer.
(d) Find a 2 × 2 matrix A for which A ≠ O and A² = O.
(e) Find a 3 × 3 matrix B for which B ≠ O, B² ≠ O and B³ = O.

5. Consider the matrix \( A = \begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix} \).

(a) Show that the vectors \( \bar{u} = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \) and \( \bar{w} = \begin{bmatrix} -1 \\ 1 \end{bmatrix} \) are solutions of Ax̄ = 0̄.
(b) Assume that ū and w̄ are any two vectors so that Aū = 0̄ and Aw̄ = 0̄. If r, s ∈ R
and x̄ = rū + sw̄, show that Ax̄ = 0̄.

6. Let \( A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \). Find A², A³ and A⁴. Give a general formula for Aⁿ, where n ≥ 2 is
a natural number. Use Mathematical Induction to prove that your formula is correct.

7. Consider the matrices
\[
P = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}, \quad
Q = \begin{bmatrix} p & q \\ r & s \end{bmatrix}, \quad
R = \begin{bmatrix} 3 & 2 \\ 3 & 2 \end{bmatrix}, \quad
T = \begin{bmatrix} 0 & 0 & 3 \\ 2 & 0 & 0 \\ 0 & 4 & 1 \end{bmatrix}
\quad\text{and}\quad
S = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}.
\]
(a) Find real numbers p, q, r and s so that P Q = I. Now calculate QP .
(b) Find real numbers p, q, r and s so that RQ = I.
(c) Find real numbers a, b, c, d, e, f, g, h, i so that T S = I. Now calculate ST .
(d) In general, if A and B are n × n matrices then AB ≠ BA. What conjecture can
you make in the case when AB = I?

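8. Write the following systems of equations in matrix form; that is, in the form Ax̄ = b̄.

(a) 3x − 2y + 8z = 9
    −2x + 2y + z = 3
    x + 2y − 3z = 8

(b) x₁ − x₂ + x₃ − x₄ = 0
    x₁ + x₂ + x₃ + x₄ = 6
    x₁ + 2x₂ + 4x₃ + 8x₄ = 21
    x₁ + 3x₂ + 9x₃ + 27x₄ = 52

9. In each case, write the given matrix equation as a system of linear equations. Find
the solution of the system and give a geometric interpretation.

(a) \( \begin{bmatrix} 5 & 2 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 9 \\ 5 \end{bmatrix} \)

(b) \( \begin{bmatrix} 7 & 2 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ -12 \end{bmatrix} \)

(c) \( \begin{bmatrix} 2 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 11 \\ 0 \\ 6 \end{bmatrix} \)

(d) \( \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \)
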
10. Let A and B be 2 × 2 matrices. Show that AB − BA ≠ I, with I the 2 × 2 identity
matrix. [This is a special case of a more general result. As a challenge, show that if
A and B are n × n matrices, then AB − BA is not the n × n identity matrix.]

11. Prove Theorem 2.2.25 (1) and (3) for 3 × 3 matrices.

2.3 Gauss Elimination for Systems of Linear Equations

Matrices are introduced in Section 2.2, motivated by systems of linear equations. It is shown
how a system of linear equations can be written compactly in matrix form. In this section,
we will develop this idea further. We show how the method for solving systems of linear
equations demonstrated in Section 2.1 is implemented in matrix form. We also show how
the results of matrix algebra obtained in Section 2.2 yield elegant proofs of the important
theorems regarding systems of linear equations.

A system of m linear equations in n unknowns is given by
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\tag{2.4}
\]
where aᵢⱼ and bᵢ are constants for i = 1, . . . , m and j = 1, . . . , n. The system (2.4) is
completely determined by its m × n coefficient matrix A = [aᵢⱼ] and by the column of
numbers [b₁, b₂, . . . , bₘ]ᵀ which we denote by b̄.

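The augmented matrix or partitioned matrix
\[
[A \mid \bar{b}\,] =
\left[\begin{array}{cccc|c}
a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\
a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{array}\right]
\tag{2.5}
\]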

is a shorthand notation for the system (2.4). The coefficient matrix A has been augmented
by the column of numbers b̄. Note that all the information contained in the system of
equations (2.4) is captured in this more compact notation.

It is important to note that every row in the augmented matrix (2.5) represents an equation
in the system of equations (2.4). Therefore every time we manipulate the system of equations
to obtain a new system, the augmented matrix changes in a corresponding way to reflect
the new system. We illustrate this by means of an example.

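Example 2.3.1. Consider the system of linear equations, with augmented matrix [A | b̄ ] as
given below.
\[
\begin{aligned}
2y + 3z &= 13 \\
x + 2y + z &= 8 \\
2x + 2y &= 6
\end{aligned}
\qquad
\left[\begin{array}{ccc|c} 0 & 2 & 3 & 13 \\ 1 & 2 & 1 & 8 \\ 2 & 2 & 0 & 6 \end{array}\right]
\]

We solve the system, writing down the augmented matrix corresponding to the system at
each step. We also indicate the way in which the augmented matrix is modified.

Solution. By swapping equation one and three we get
\[
\begin{aligned}
2x + 2y &= 6 \\
x + 2y + z &= 8 \\
2y + 3z &= 13
\end{aligned}
\qquad (\text{Row 1} \leftrightarrow \text{Row 3}) \qquad
\left[\begin{array}{ccc|c} 2 & 2 & 0 & 6 \\ 1 & 2 & 1 & 8 \\ 0 & 2 & 3 & 13 \end{array}\right].
\]

By multiplying equation one with 1/2 we get
\[
\begin{aligned}
x + y &= 3 \\
x + 2y + z &= 8 \\
2y + 3z &= 13
\end{aligned}
\qquad (\text{Row 1} \to \tfrac{1}{2} \times \text{Row 1}) \qquad
\left[\begin{array}{ccc|c} 1 & 1 & 0 & 3 \\ 1 & 2 & 1 & 8 \\ 0 & 2 & 3 & 13 \end{array}\right].
\]

By replacing equation two with equation two minus equation one we get
\[
\begin{aligned}
x + y &= 3 \\
y + z &= 5 \\
2y + 3z &= 13
\end{aligned}
\qquad (\text{Row 2} \to \text{Row 2} - \text{Row 1}) \qquad
\left[\begin{array}{ccc|c} 1 & 1 & 0 & 3 \\ 0 & 1 & 1 & 5 \\ 0 & 2 & 3 & 13 \end{array}\right].
\]

By replacing equation three with equation three minus twice equation two, we get
\[
\begin{aligned}
x + y &= 3 \\
y + z &= 5 \\
z &= 3
\end{aligned}
\qquad (\text{Row 3} \to \text{Row 3} - 2 \times \text{Row 2}) \qquad
\left[\begin{array}{ccc|c} 1 & 1 & 0 & 3 \\ 0 & 1 & 1 & 5 \\ 0 & 0 & 1 & 3 \end{array}\right].
\]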
From the last equation it is clear z = 3. Working from the bottom to the top we see the
system has a unique solution ⟨x, y, z⟩ = ⟨1, 2, 3⟩.
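
The row manipulations of Example 2.3.1 can be replayed in Python. In the sketch below the augmented matrix is a list of rows, rows are numbered from 0 (so Row 1 of the example is index 0), and the helper names `swap`, `scale` and `add_multiple` are hypothetical.

```python
from fractions import Fraction


def swap(M, i, j):
    """Interchange rows i and j of M."""
    M[i], M[j] = M[j], M[i]


def scale(M, i, s):
    """Multiply row i of M by the nonzero scalar s."""
    M[i] = [s * a for a in M[i]]


def add_multiple(M, i, s, j):
    """Add s times row j to row i of M."""
    M[i] = [a + s * b for a, b in zip(M[i], M[j])]


# Augmented matrix of Example 2.3.1, stored with exact fractions.
M = [[Fraction(v) for v in row]
     for row in ([0, 2, 3, 13], [1, 2, 1, 8], [2, 2, 0, 6])]

swap(M, 0, 2)                  # Row 1 <-> Row 3
scale(M, 0, Fraction(1, 2))    # Row 1 -> 1/2 x Row 1
add_multiple(M, 1, -1, 0)      # Row 2 -> Row 2 - Row 1
add_multiple(M, 2, -2, 1)      # Row 3 -> Row 3 - 2 x Row 2

for row in M:
    print([int(v) for v in row])
```

The printed rows agree with the last augmented matrix of the example, from which the solution is read off from the bottom row upwards.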

Example 2.3.1 clearly demonstrates how the process of solving a system of linear equations
corresponds to the manipulation of the rows of the augmented matrix. The whole process
can therefore be carried out in terms of the augmented matrix corresponding to the system
of equations. This process is called Gauss elimination and is described in more detail in
what follows.

Before we discuss the process of Gauss elimination we introduce some preliminary concepts.
The permissible operations on the rows of an augmented matrix are the so-called elementary
row operations given in the table below.

Row Operation      Action                                                     Notation

Row interchange    Interchange the ith and jth rows in a matrix.              Ri ↔ Rj
Row scaling        Multiply the ith row in a matrix by a nonzero scalar s.    Ri → sRi
Row addition       Add s times the jth row to the ith row of a matrix.        Ri → Ri + sRj
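
The three elementary row operations can be sketched in code as follows. This is only an illustration: the function names `swap_rows`, `scale_row` and `add_multiple` are our own (they do not come from these notes), rows are numbered from 0 as is usual in Python, and exact fractions are used to avoid rounding. The sketch replays the row operations of Example 2.3.1.

```python
from fractions import Fraction

def swap_rows(M, i, j):
    """Ri <-> Rj: interchange rows i and j (0-based indices)."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

def scale_row(M, i, s):
    """Ri -> s Ri, with s a nonzero scalar."""
    if s == 0:
        raise ValueError("the scalar s must be nonzero")
    M = [row[:] for row in M]
    M[i] = [s * x for x in M[i]]
    return M

def add_multiple(M, i, j, s):
    """Ri -> Ri + s Rj: add s times row j to row i."""
    M = [row[:] for row in M]
    M[i] = [x + s * y for x, y in zip(M[i], M[j])]
    return M

# Augmented matrix of Example 2.3.1, with exact rational entries.
M = [[Fraction(x) for x in row]
     for row in [[0, 2, 3, 13], [1, 2, 1, 8], [2, 2, 0, 6]]]
M = swap_rows(M, 0, 2)                # Row 1 <-> Row 3
M = scale_row(M, 0, Fraction(1, 2))   # Row 1 -> (1/2) Row 1
M = add_multiple(M, 1, 0, -1)         # Row 2 -> Row 2 - Row 1
M = add_multiple(M, 2, 1, -2)         # Row 3 -> Row 3 - 2 Row 2
```

Each function returns a new matrix rather than modifying its argument, so intermediate steps can be kept and compared with the matrices written down by hand.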

Definition 2.3.2. Two augmented matrices [A | b̄ ] and [C | d̄ ] of the same size are row
equivalent, written [A | b̄ ] ∼ [C | d̄ ], if the matrix [A | b̄ ] can be transformed into the
matrix [C | d̄ ] by a finite sequence of elementary row operations.

Definition 2.3.3. A matrix is in row echelon form if it satisfies the following two conditions.

(1) All rows containing only zeros appear below those with nonzero entries.

(2) The first nonzero entry in any row appears in a column to the right of the first nonzero
entry in any preceding row.

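Example 2.3.4. The matrices

\[
A = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 2 & -2 \\ 0 & 0 & 5 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad
C = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & -2 & 3 \\ 0 & 0 & 0 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

are in row-echelon form. The matrices

\[
P = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & -2 \\ 0 & 0 & 5 \end{bmatrix}, \quad
Q = \begin{bmatrix} 1 & 0 & 1 \\ 3 & 1 & -2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad
R = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & -2 & 3 \\ 0 & 3 & 0 & 5 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]

are not in row-echelon form.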
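
The two conditions of Definition 2.3.3 can be checked mechanically. The sketch below is one possible way to do so; the names `leading_index` and `is_row_echelon` are hypothetical helpers introduced here for illustration, and column positions are counted from 0.

```python
def leading_index(row):
    """Column index of the first nonzero entry, or None for a zero row."""
    for j, x in enumerate(row):
        if x != 0:
            return j
    return None

def is_row_echelon(M):
    """Check conditions (1) and (2) of Definition 2.3.3."""
    previous = -1            # leading column of the preceding nonzero row
    seen_zero_row = False
    for row in M:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:    # nonzero row below a zero row: (1) fails
            return False
        if lead <= previous: # leading entry not strictly to the right: (2) fails
            return False
        previous = lead
    return True

# The matrices of Example 2.3.4.
A = [[2, 0, 1], [0, 2, -2], [0, 0, 5]]
B = [[2, 0, 1], [0, 1, -2], [0, 0, 3], [0, 0, 0]]
C = [[1, 0, 1, 0], [0, 2, -2, 3], [0, 0, 0, 5], [0, 0, 0, 0]]
P = [[0, 0, 0], [0, 2, -2], [0, 0, 5]]
Q = [[1, 0, 1], [3, 1, -2], [0, 0, 3], [0, 0, 0]]
R = [[1, 0, 1, 0], [0, 2, -2, 3], [0, 3, 0, 5], [0, 0, 0, 0]]

print([is_row_echelon(M) for M in (A, B, C, P, Q, R)])
```

Here P fails condition (1), while Q and R fail condition (2).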
Definition 2.3.5. The process of Gauss elimination consists of the application of elementary
row operations to the augmented matrix [A | b̄ ] corresponding to a system of linear
equations to obtain an augmented matrix [C | d̄ ], with the matrix C in row-echelon form.

Since our aim is to apply the process of Gauss elimination to solve systems of linear equations,
it is essential that the procedure does not affect the solutions of systems of linear
equations. In this regard, we have the following theorems.

Theorem 2.3.6. Consider two augmented matrices [A | b̄ ] and [C | d̄ ], with A and C both
m × n matrices, and a vector x̄ ∈ ℝⁿ. If [A | b̄ ] is row equivalent to [C | d̄ ], then
Ax̄ = b̄ if and only if C x̄ = d̄.

Theorem 2.3.7. Let A be an m × n matrix. Then there exists an m × n matrix B in
row-echelon form so that A and B are row equivalent.

Theorem 2.3.6 states that if Gauss elimination is applied to the augmented matrix [A | b̄ ]
corresponding to a system of linear equations, then the system of linear equations corresponding
to the resulting augmented matrix [C | d̄ ] has exactly the same solutions as the
original system. Theorem 2.3.7 states that the process of Gauss elimination can be applied
to any system. We now turn to some examples.
Example 2.3.8. We solve the system of linear equations
x2 − 3x3 = −5
2x1 + 3x2 − x3 = 7
4x1 + 5x2 − 2x3 = 10.

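Solution. We apply Gauss elimination to the augmented matrix corresponding to the system
of equations.

\[
\begin{aligned}
&\left[\begin{array}{ccc|c} 0 & 1 & -3 & -5 \\ 2 & 3 & -1 & 7 \\ 4 & 5 & -2 & 10 \end{array}\right] && R_1 \leftrightarrow R_2 \\
\sim{} &\left[\begin{array}{ccc|c} 2 & 3 & -1 & 7 \\ 0 & 1 & -3 & -5 \\ 4 & 5 & -2 & 10 \end{array}\right] && R_3 \to R_3 - 2R_1 \\
\sim{} &\left[\begin{array}{ccc|c} 2 & 3 & -1 & 7 \\ 0 & 1 & -3 & -5 \\ 0 & -1 & 0 & -4 \end{array}\right] && R_3 \to R_3 + R_2 \\
\sim{} &\left[\begin{array}{ccc|c} 2 & 3 & -1 & 7 \\ 0 & 1 & -3 & -5 \\ 0 & 0 & -3 & -9 \end{array}\right]
\end{aligned}
\]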
The last matrix is in row-echelon form. The system of equations corresponding to the final
matrix is
2x1 + 3x2 − x3 = 7
x2 − 3x3 = −5
− 3x3 = −9.
We apply backward substitution and obtain
x3 = 3
x2 − 3(3) = −5, so that x2 = 4
2x1 + 3(4) − 3 = 7, so that x1 = −1.
Therefore the solution of the system is ⟨x1 , x2 , x3 ⟩ = ⟨−1, 4, 3⟩.
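
The procedure of Example 2.3.8 (forward elimination followed by backward substitution) can be sketched in code. The function names `row_echelon` and `back_substitute` are our own, and the pivoting rule used here (take the first row with a nonzero entry in the current column) is only one of many valid choices; a different sequence of row operations may give a different, but row equivalent, row-echelon form. The function `back_substitute` assumes a square system with nonzero diagonal entries, as in this example.

```python
from fractions import Fraction

def row_echelon(aug):
    """Reduce an augmented matrix to row-echelon form using row operations."""
    M = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(M), len(M[0]) - 1   # last column holds the right-hand sides
    r = 0
    for c in range(cols):
        # Find a row at or below row r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]            # row interchange
        for i in range(r + 1, rows):               # row addition
            s = M[i][c] / M[r][c]
            M[i] = [x - s * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

def back_substitute(M):
    """Solve a square upper triangular system with nonzero diagonal entries."""
    n = len(M)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x

# The system of Example 2.3.8.
aug = [[0, 1, -3, -5], [2, 3, -1, 7], [4, 5, -2, 10]]
R = row_echelon(aug)
solution = back_substitute(R)
print([int(v) for v in solution])   # [-1, 4, 3]
```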
Example 2.3.9. We solve the system of linear equations
x1 + 3x2 = 13
2x1 − x2 = 19
x1 + 10x2 = 20.

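Solution. Apply Gauss elimination to the augmented matrix.

\[
\begin{aligned}
&\left[\begin{array}{cc|c} 1 & 3 & 13 \\ 2 & -1 & 19 \\ 1 & 10 & 20 \end{array}\right] && R_2 \to R_2 - 2R_1 \\
\sim{} &\left[\begin{array}{cc|c} 1 & 3 & 13 \\ 0 & -7 & -7 \\ 1 & 10 & 20 \end{array}\right] && R_3 \to R_3 - R_1 \\
\sim{} &\left[\begin{array}{cc|c} 1 & 3 & 13 \\ 0 & -7 & -7 \\ 0 & 7 & 7 \end{array}\right] && R_3 \to R_3 + R_2 \\
\sim{} &\left[\begin{array}{cc|c} 1 & 3 & 13 \\ 0 & -7 & -7 \\ 0 & 0 & 0 \end{array}\right]
\end{aligned}
\]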
Hence
x1 + 3x2 = 13
−7x2 = −7
0 = 0
so that
x2 = 1 and x1 = 13 − 3(1) = 10.
The solution is ⟨x1 , x2 ⟩ = ⟨10, 1⟩.

Example 2.3.10. We solve the system of linear equations
2x1 + 9x2 − 2x3 + x4 = 0
4x1 + 10x2 − 20x3 + 3x4 = 0
2x1 + 5x2 − 10x3 + 11x4 = 0
8x2 + 16x3 − x4 = 0.

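Solution. Applying Gauss elimination to the augmented matrix, we have

\[
\begin{aligned}
&\left[\begin{array}{cccc|c} 2 & 9 & -2 & 1 & 0 \\ 4 & 10 & -20 & 3 & 0 \\ 2 & 5 & -10 & 11 & 0 \\ 0 & 8 & 16 & -1 & 0 \end{array}\right] && R_2 \to R_2 - 2 \times R_1,\ R_3 \to R_3 - R_1 \\
\sim{} &\left[\begin{array}{cccc|c} 2 & 9 & -2 & 1 & 0 \\ 0 & -8 & -16 & 1 & 0 \\ 0 & -4 & -8 & 10 & 0 \\ 0 & 8 & 16 & -1 & 0 \end{array}\right] && R_3 \to R_3 - \tfrac{1}{2} \times R_2,\ R_4 \to R_4 + R_2 \\
\sim{} &\left[\begin{array}{cccc|c} 2 & 9 & -2 & 1 & 0 \\ 0 & -8 & -16 & 1 & 0 \\ 0 & 0 & 0 & 9.5 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\end{aligned}
\]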
Hence
2x1 + 9x2 − 2x3 + x4 = 0
−8x2 − 16x3 + x4 = 0
9.5x4 = 0
0 = 0.
It follows from equation three that
x4 = 0.
Back substitution into equation two yields
−8x2 − 16x3 = 0.
Therefore the values of x2 and x3 cannot be uniquely determined by the equations. Let
x3 = r, with r any real number. Then
x2 = −2r.
Back substitution into equation one yields
2x1 + 9(−2r) − 2r + 0 = 0
2x1 = 20r
x1 = 10r.
The system has infinitely many solutions, given by
⟨x1 , x2 , x3 , x4 ⟩ = ⟨10r, −2r, r, 0⟩, r ∈ R.

Example 2.3.11. We solve the system of linear equations

2x1 + 2x2 = 2
2x1 + 3x2 = −1
x1 − x2 = 1.

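Solution. As usual, we apply Gauss elimination to the augmented matrix.

\[
\begin{aligned}
&\left[\begin{array}{cc|c} 2 & 2 & 2 \\ 2 & 3 & -1 \\ 1 & -1 & 1 \end{array}\right] && R_1 \to \tfrac{1}{2} \times R_1 \\
\sim{} &\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 3 & -1 \\ 1 & -1 & 1 \end{array}\right] && R_2 \to R_2 - 2 \times R_1;\ R_3 \to R_3 - R_1 \\
\sim{} &\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & 1 & -3 \\ 0 & -2 & 0 \end{array}\right] && R_3 \to R_3 + 2 \times R_2 \\
\sim{} &\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & 1 & -3 \\ 0 & 0 & -6 \end{array}\right]
\end{aligned}
\]

Hence
x1 + x2 = 1
x2 = −3
0 = −6.
The last equation states that
0 = −6.
Since this is impossible, we conclude that the system has no solutions.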

Example 2.3.12. We solve the system of linear equations

2x1 + 9x2 − 2x3 + x4 = 10


4x1 + 10x2 − 20x3 + 3x4 = −3.

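Solution. Apply Gauss elimination to the augmented matrix. In particular, we apply the
row operation ‘R2 → R2 − 2 × R1 ’. The result is

\[
\left[\begin{array}{cccc|c} 2 & 9 & -2 & 1 & 10 \\ 4 & 10 & -20 & 3 & -3 \end{array}\right]
\sim
\left[\begin{array}{cccc|c} 2 & 9 & -2 & 1 & 10 \\ 0 & -8 & -16 & 1 & -23 \end{array}\right].
\]

Hence
2x1 + 9x2 − 2x3 + x4 = 10
−8x2 − 16x3 + x4 = −23.

Clearly, values for two of the xi cannot be uniquely determined by the equations. Let x3 = r
and x4 = s, with r and s arbitrary real numbers. Then

\[ x_2 = -2r + \tfrac{1}{8}s + \tfrac{23}{8}. \]

Substitution into equation one yields

\[
\begin{aligned}
2x_1 + 9\left(-2r + \tfrac{1}{8}s + \tfrac{23}{8}\right) - 2r + s &= 10 \\
2x_1 &= 20r - \tfrac{17}{8}s - \tfrac{127}{8} \\
x_1 &= 10r - \tfrac{17}{16}s - \tfrac{127}{16}.
\end{aligned}
\]

The system has infinitely many solutions, given by

\[
\langle x_1, x_2, x_3, x_4 \rangle = r\langle 10, -2, 1, 0 \rangle + \tfrac{1}{16}s\langle -17, 2, 0, 16 \rangle + \tfrac{1}{16}\langle -127, 46, 0, 0 \rangle, \quad r, s \in \mathbb{R}.
\]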
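
A parametric solution such as the one in Example 2.3.12 can be checked by substituting it back into the original equations for several values of the parameters. The sketch below does this with exact fractions; the function names `solution` and `residuals` are our own, and testing finitely many values of r and s is of course a sanity check, not a proof.

```python
from fractions import Fraction

def solution(r, s):
    """The parametric solution of Example 2.3.12 with x3 = r and x4 = s."""
    r, s = Fraction(r), Fraction(s)
    x2 = -2 * r + s / 8 + Fraction(23, 8)
    x1 = 10 * r - Fraction(17, 16) * s - Fraction(127, 16)
    return [x1, x2, r, s]

def residuals(x):
    """Left-hand side minus right-hand side for both equations."""
    x1, x2, x3, x4 = x
    return [2 * x1 + 9 * x2 - 2 * x3 + x4 - 10,
            4 * x1 + 10 * x2 - 20 * x3 + 3 * x4 + 3]

# Both residuals should vanish for every choice of r and s.
checks = [residuals(solution(r, s)) for r in range(-2, 3) for s in range(-2, 3)]
print(all(c == [0, 0] for c in checks))   # True
```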

We now consider the nature of the solutions of a system of linear equations. As mentioned
in Section 2.1, such a system has either exactly one solution, no solutions or infinitely many
solutions. We formulate this important result, and give a proof based on the properties of
matrix addition and multiplication.

Theorem 2.3.13. For any system of linear equations, exactly one of the following state-
ments is true.

(1) The system has no solutions.

(2) The system has exactly one solution.

(3) The system has infinitely many solutions.

Proof. Write the system in matrix form

Ax̄ = b̄

where A is the coefficient matrix, and b̄ the column vector containing the right-hand terms
of the equations making up the system. Assume that (1) and (2) are false. Then there exist
vectors x̄₁ ≠ x̄₂ so that

Ax̄₁ = b̄ and Ax̄₂ = b̄. (2.6)

For each real number t, let x̄(t) = x̄₁ + t(x̄₂ − x̄₁). It follows from Theorem 2.2.15 (1) and
Theorem 2.2.16 (1) and (4) that

Ax̄(t) = A(x̄₁ + t(x̄₂ − x̄₁)) = Ax̄₁ + A(t(x̄₂ − x̄₁)) = Ax̄₁ + t(Ax̄₂) − t(Ax̄₁).

Therefore (2.6) implies that
Ax̄(t) = b̄ + tb̄ − tb̄ = b̄.
Therefore x̄(t) is a solution of the equation Ax̄ = b̄ for every real number t.
If t₀ and t₁ are real numbers so that t₀ ≠ t₁, then
x̄(t₁) − x̄(t₀) = (x̄₁ + t₁(x̄₂ − x̄₁)) − (x̄₁ + t₀(x̄₂ − x̄₁)) = (t₁ − t₀)(x̄₂ − x̄₁).
Because x̄₂ − x̄₁ ≠ 0̄ we have
if t₀ ≠ t₁ then x̄(t₁) − x̄(t₀) ≠ 0̄.
Consequently, if t₀ ≠ t₁ then x̄(t₁) ≠ x̄(t₀). Therefore the equation Ax̄ = b̄ has infinitely
many solutions so that (3) is true.

We have shown that at least one of the statements (1), (2) and (3) is always true. Clearly,
it is not possible for two of these statements to be simultaneously true. Therefore exactly
one of these statements is true.

Systems of linear equations are classified in terms of the number of solutions of the system.
Definition 2.3.14. A system of linear equations is consistent if it has one or more solutions.
If the system has no solutions, the system is called inconsistent.

A careful study of Examples 2.3.8 to 2.3.12 reveals that it is possible to determine the number
of solutions of a system of linear equations without finding the solutions. In particular, the
shape of the augmented matrix in row-echelon form determines the number of solutions.
Theorem 2.3.15. Let [A | b̄ ] represent a system of linear equations, where the m×n matrix
A is in row-echelon form. Then the following statements are true.

(1) The system is inconsistent if and only if the coefficient matrix A has a row with all
entries equal to zero and the corresponding entry of the column b̄ is not zero.
(2) If the system is consistent, it has a unique solution if and only if m ≥ n and aₖₖ ≠ 0
for all k = 1, 2, . . . , n.
(3) If the system is consistent, it has infinitely many solutions if and only if m < n or
aₖₖ = 0 for some k = 1, 2, . . . , n.

We end this section with an example that illustrates how Theorem 2.3.15 may be applied.
Example 2.3.16. Consider the system of linear equations with corresponding augmented
matrix

\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 0 \\ 0 & k & 1 & 2 \\ 0 & 0 & k^2 - k - 2 & k - 2 \end{array}\right]
\]

where k is a real number. We determine the value(s) of k for which the system has a unique
solution, no solution and infinitely many solutions, respectively.

Solution. Since the coefficient matrix, call it A, is square, it follows from Theorem 2.3.15
(2) that the system has a unique solution if and only if all the entries on the main diagonal
of A are non-zero. Since k² − k − 2 = (k − 2)(k + 1), the system has a unique solution if and only if

k ≠ 0, k ≠ 2 and k ≠ −1.

If k = 0, the augmented matrix becomes

\[
\begin{aligned}
&\left[\begin{array}{ccc|c} 2 & -1 & 3 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & -2 & -2 \end{array}\right] && R_3 \to R_3 + 2R_2 \\
\sim{} &\left[\begin{array}{ccc|c} 2 & -1 & 3 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 2 \end{array}\right].
\end{aligned}
\]

From the last row and Theorem 2.3.15 (1) we see that if k = 0, then the system has no
solutions.
If k = −1, the augmented matrix becomes

\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 0 \\ 0 & -1 & 1 & 2 \\ 0 & 0 & 0 & -3 \end{array}\right].
\]

From the last row and Theorem 2.3.15 (1) we see that if k = −1, then the system has no
solutions.
If k = 2, the augmented matrix becomes

\[
\left[\begin{array}{ccc|c} 2 & -1 & 3 & 0 \\ 0 & 2 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right].
\]

It follows from Theorem 2.3.15 (1) and (3) that if k = 2, then the system has infinitely
many solutions.
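
The case analysis of Example 2.3.16 can be reproduced by a short program. The sketch below is an illustration under our own naming (`row_echelon`, `classify`, `example_2_3_16`). Instead of inspecting the diagonal entries as in Theorem 2.3.15, it first reduces the augmented matrix to row-echelon form and then counts the rows whose coefficient part is nonzero; a row of the form 0 . . . 0 | c with c ≠ 0 signals inconsistency.

```python
from fractions import Fraction

def row_echelon(aug):
    """Reduce an augmented matrix to row-echelon form using row operations."""
    M = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(M), len(M[0]) - 1
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            s = M[i][c] / M[r][c]
            M[i] = [x - s * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

def classify(aug):
    """Return 'none', 'unique' or 'infinite' for the system with augmented matrix aug."""
    M = row_echelon(aug)
    n = len(M[0]) - 1                 # number of unknowns
    nonzero_rows = 0
    for row in M:
        if any(x != 0 for x in row[:n]):
            nonzero_rows += 1
        elif row[n] != 0:
            return "none"             # a row 0 ... 0 | c with c nonzero
    return "unique" if nonzero_rows == n else "infinite"

def example_2_3_16(k):
    """Augmented matrix of Example 2.3.16 for a given value of k."""
    return [[2, -1, 3, 0], [0, k, 1, 2], [0, 0, k * k - k - 2, k - 2]]

for k in (0, -1, 2, 1):
    print(k, classify(example_2_3_16(k)))
```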

By expressing a system of linear equations in matrix form, we are able to give a systematic
method for solving such systems. Furthermore, we have a first example of a result where
the nature of the matrix representing a system of equations determines the nature of the
solutions of the system. This theme is developed further in the sections that follow.

Exercise 2.3

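1. Consider the given augmented matrices, each corresponding to a system of linear equations.
Use Gauss elimination to reduce each augmented matrix to row-echelon form. Make
sure you indicate the row operations that you use. State whether the system is consistent
or inconsistent.

(a) \( \left[\begin{array}{ccc|c} 2 & 0 & 3 & 4 \\ 1 & 3 & 0 & 5 \\ -1 & 1 & 1 & -1 \end{array}\right] \)

(b) \( \left[\begin{array}{ccc|c} 1 & 2 & 1 & 4 \\ -1 & 1 & 1 & 0 \\ 2 & 0 & 1 & 1 \end{array}\right] \)

(c) \( \left[\begin{array}{cc|c} 3 & 1 & 1 \\ 6 & 2 & 6 \end{array}\right] \)

(d) \( \left[\begin{array}{cccc|c} 1 & -2 & 1 & -1 & 3 \\ 2 & -3 & 0 & 0 & 3 \\ 1 & -1 & -1 & 1 & 0 \end{array}\right] \)

(e) \( \left[\begin{array}{cc|c} 3 & 4 & -1 \\ -1 & -1 & 1 \\ 1 & -2 & 0 \\ 2 & 3 & 0 \end{array}\right] \)
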
2. Use Gauss elimination and back substitution to solve the following systems of equations.
Once the augmented matrix is in row-echelon form, you must write down the
corresponding system of equations.

(a) x1 + x2 = −1
    2x1 + x3 = −1
    x2 + x3 + x4 = 11
    −5x3 + x4 = 5

(b) x + z = 4
    2y − z = 1
    x + y − z = 0

(c) x + 2y − 3z + w = 1
    2x + y − 2z + 3w = 0
    x − y + z + 2w = −1
    3x + 3y + z + 4w = 2

(d) x − y + 2z = −1
    2x + 2y − z = 1
    −x + 7y − 8z = 4

(e) x + 2y − z = 2
    3x + 2z − 2w = 1
    y + 2z − w = 10
    x − 3z + w = 5

(f) 3x + y − z = 3
    x + 3y − z = 1
    x − y = 1

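3. Solve the systems associated with the following augmented matrices. Show all the
steps. In particular, once the augmented matrix is in row-echelon form, you must
write down the corresponding system of equations.

(a) \( \left[\begin{array}{ccc|c} 1 & 2 & -2 & 1 \\ 2 & 0 & -2 & 0 \\ 0 & 1 & -1 & 0 \end{array}\right] \)

(b) \( \left[\begin{array}{ccc|c} 1 & 2 & -2 & 0 \\ 2 & 0 & -2 & 1 \\ 0 & 1 & -1 & 0 \end{array}\right] \)

(c) \( \left[\begin{array}{ccc|c} 1 & 2 & -2 & 0 \\ 2 & 0 & -2 & 0 \\ 0 & 1 & -1 & 1 \end{array}\right] \)

(d) \( \left[\begin{array}{cccc|c} 1 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 1 & 1 \end{array}\right] \)

(e) \( \left[\begin{array}{cccc|c} 2 & -2 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -2 & 2 & 0 \end{array}\right] \)
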
4. Consider the system of linear equations

x + 2y + 3z = 6
2x − y + 2z = 5.

Solve the system and interpret the system and its solutions geometrically.

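5. Determine the value(s) of k for which the system represented by the augmented matrix
below has a unique solution, no solution or infinitely many solutions, respectively.

(a) \( \left[\begin{array}{ccc|c} 1 & 0 & 3 & 1 \\ 0 & 1 & 1 & 1 \\ -1 & 1 & k & k \end{array}\right] \)

(b) \( \left[\begin{array}{ccc|c} 1 & 0 & 6 - 4k & 2 \\ 0 & 1 & 2k & 0 \\ 0 & 0 & 1 - k - 2k^2 & 1 - 2k \end{array}\right] \)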

6. Determine the value(s) of k for which the system

x + y + kz = 1
x + ky + z = 1
kx + y + z = −2

has a unique solution, no solution or infinitely many solutions, respectively.

7. A square matrix is called upper triangular if all the entries below its main diagonal
are zero.
(a) Solve the system of linear equations

x1 + 2x2 + 3x3 = 8
−2x2 + x3 = 5
4x3 = 12.

(b) Which of the following statements are always true? If a statement is false, produce
a counterexample to demonstrate that the statement is false.
(i) Every upper triangular matrix is in row-echelon form.

(ii) Every matrix in row-echelon form is an upper triangular matrix.
(iii) If A is an n × n upper triangular matrix with all its entries on the main
diagonal non-zero, then Ax̄ = b̄ has a unique solution for every b̄ ∈ Rn .

2.4 The Inverse of a Matrix

For matrix multiplication, the n × n identity matrix I fulfils the role that the real number
1 does for multiplication of real numbers; that is, just as

a×1=1×a=a

for every real number a, so


AI = IA = A
for each n × n matrix A. Every nonzero real number a has a multiplicative inverse; that is,
there exists a unique real number a⁻¹ so that aa⁻¹ = a⁻¹a = 1. As is seen in Section 2.2,
in particular Example 2.2.21, there exist square matrices without multiplicative inverses.
This section is devoted to those matrices that do have multiplicative inverses, the so-called
invertible matrices.

Definition 2.4.1. Let A be an n × n matrix. If there exists an n × n matrix B such that
AB = BA = I, then we call A invertible and B the inverse of A. In this case we write
B = A⁻¹.

Remark 2.4.2. It is clear from the definition that if A is an invertible n × n matrix with
inverse A⁻¹ = B, then B is invertible and B⁻¹ = A.

In general, matrix multiplication is not commutative; see Example 2.2.18. It is therefore
remarkable that the following is true.

Theorem 2.4.3. Let A and B be n × n matrices. Then the following statements are equiv-
alent.

(1) AB = I.

(2) BA = I.

(3) A is invertible, and A⁻¹ = B.

Proof. By Definition 2.4.1, (3) implies (1) and (3) implies (2), and (1) and (2) together
imply (3). It is therefore sufficient to prove that (1) and (2) are equivalent.
The proof of a special case is given as an exercise, see Exercise 2.4 number 5.

The meaning of Theorem 2.4.3 is that, in order to show that a square matrix A is invertible,
it is sufficient to find a matrix B so that AB = I. The matrix B is then the inverse of A.

Definition 2.4.1 refers to the inverse of an invertible matrix A. However, a matrix equation

AX = C

with A and C given matrices, may have more than one solution. The fact that for a square
matrix A the equation
AX = I
has at most one solution, is therefore exceptional. We leave the proof of this fact, contained
in the following theorem, as an exercise, see Exercise 2.4 number 3. The proof of the second
statement in the theorem is given as Exercise 2.4 number 4.

Theorem 2.4.4. Let A and B be n × n matrices. Then the following statements are true.

(1) If A is invertible, then A has exactly one inverse.

(2) If A and B are invertible, then AB is invertible, and its inverse is B⁻¹A⁻¹.

Before we continue with the discussion of invertible matrices and their inverses, we introduce
a convenient notation for matrix multiplication. Let A = [aij ] and B = [bij ] be n×n matrices.
Denote the rows of A by
ā1 , ā2 , . . . , ān
and the columns of B by
b̄1 , b̄2 , . . . , b̄n .
That is, for i = 1, . . . , n, the vector āi is the 1 × n matrix (row vector)

\[ \bar{a}_i = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}. \]

For j = 1, . . . , n, the vector b̄j is the n × 1 matrix (column vector)

\[ \bar{b}_j = \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}. \]

In this notation, we write the matrices A and B as

\[
A = \begin{bmatrix} \bar{a}_1 \\ \bar{a}_2 \\ \vdots \\ \bar{a}_n \end{bmatrix}, \quad
B = \begin{bmatrix} \bar{b}_1 & \bar{b}_2 & \cdots & \bar{b}_n \end{bmatrix}.
\]

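Example 2.4.5. Consider the matrices

\[
A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 4 \\ 7 & 3 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} -2 & 1 & 0 \\ 3 & -1 & 2 \\ 2 & 2 & 5 \end{bmatrix}.
\]

Then

\[
\bar{a}_1 = \begin{bmatrix} 1 & 3 & -2 \end{bmatrix}, \quad
\bar{a}_2 = \begin{bmatrix} 2 & 0 & 4 \end{bmatrix}, \quad
\bar{a}_3 = \begin{bmatrix} 7 & 3 & 1 \end{bmatrix}
\]

and

\[
\bar{b}_1 = \begin{bmatrix} -2 \\ 3 \\ 2 \end{bmatrix}, \quad
\bar{b}_2 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad
\bar{b}_3 = \begin{bmatrix} 0 \\ 2 \\ 5 \end{bmatrix}.
\]
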
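Theorem 2.4.6. Let A and B be n × n matrices given by

\[
A = \begin{bmatrix} \bar{a}_1 \\ \bar{a}_2 \\ \vdots \\ \bar{a}_n \end{bmatrix}, \quad
B = \begin{bmatrix} \bar{b}_1 & \bar{b}_2 & \cdots & \bar{b}_n \end{bmatrix}.
\]

Then

\[ AB = [\bar{a}_i \cdot \bar{b}_j] = \begin{bmatrix} A\bar{b}_1 & A\bar{b}_2 & \cdots & A\bar{b}_n \end{bmatrix}. \]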

We leave the proof of Theorem 2.4.6 as an exercise, see Exercise 2.4 number 6.
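
Although the proof is left as an exercise, the statement of Theorem 2.4.6 is easy to check numerically. The sketch below computes the product of the two matrices of Example 2.4.5 in both ways: entry by entry as dot products of rows of A with columns of B, and column by column as A times the columns of B. The helper names `dot`, `column` and `mat_vec` are our own.

```python
# The matrices of Example 2.4.5.
A = [[1, 3, -2], [2, 0, 4], [7, 3, 1]]
B = [[-2, 1, 0], [3, -1, 2], [2, 2, 5]]

def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(u, v))

def column(M, j):
    """Column j of M (0-based), returned as a list."""
    return [row[j] for row in M]

def mat_vec(M, v):
    """The product M v, with the column vector v stored as a list."""
    return [dot(row, v) for row in M]

n = len(A)
# Entry (i, j) of AB is the dot product of row i of A with column j of B.
AB = [[dot(A[i], column(B, j)) for j in range(n)] for i in range(n)]
# Column j of AB is A times column j of B.
columns_of_AB = [mat_vec(A, column(B, j)) for j in range(n)]

print(AB)
print(all(column(AB, j) == columns_of_AB[j] for j in range(n)))   # True
```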

We are now in a position to state one of the main theorems of linear algebra.
Theorem 2.4.7. Let A be an n × n matrix. Then the following statements are equivalent.

(1) A is invertible.

(2) The only solution of the equation Ax̄ = 0̄ in ℝⁿ is x̄ = 0̄.

(3) The equation Ax̄ = b̄ has a unique solution in ℝⁿ for every b̄ ∈ ℝⁿ.

Proof that (1) implies (2). Assume that (1) is true. It follows from Theorem 2.2.16 (7)
that x̄ = 0̄ is a solution of the equation Ax̄ = 0̄. Suppose that c̄ ∈ ℝⁿ is a solution of
Ax̄ = 0̄. Then
Ac̄ = 0̄.
Then, by Theorem 2.2.16 (6), (1) and (7)

c̄ = Ic̄ = (A⁻¹A)c̄ = A⁻¹(Ac̄) = A⁻¹0̄ = 0̄.

Therefore x̄ = 0̄ is the only solution of the equation Ax̄ = 0̄, so that (2) is true.

Proof that (2) implies (3). Assume that (2) is true, and let b̄ be a vector in ℝⁿ. According
to Theorems 2.3.6 and 2.3.15, [A | 0̄ ] is row equivalent to a matrix [C | 0̄ ] where
C is in row-echelon form and cᵢᵢ ≠ 0 for every i = 1, . . . , n. Then [A | b̄ ] is row equivalent
to [C | d̄ ], with d̄ a column vector in ℝⁿ. Since cᵢᵢ ≠ 0 for every i = 1, . . . , n, it follows
from Theorem 2.3.15 that the equation C x̄ = d̄ has a unique solution. But [A | b̄ ] is row
equivalent to [C | d̄ ], so by Theorem 2.3.6, the equations C x̄ = d̄ and Ax̄ = b̄ have
exactly the same solutions. Therefore the equation Ax̄ = b̄ has a unique solution.

Proof that (3) implies (1). Assume that (3) is true. Denote by ē1 , ē2 , ..., ēn the columns
of I. Then, for each j = 1, . . . , n, the equation Ax̄ = ēj has a unique solution b̄j ∈ ℝⁿ. Let

\[ B = \begin{bmatrix} \bar{b}_1 & \bar{b}_2 & \cdots & \bar{b}_n \end{bmatrix}. \]

By Theorem 2.4.6,

\[
AB = \begin{bmatrix} A\bar{b}_1 & A\bar{b}_2 & \cdots & A\bar{b}_n \end{bmatrix}
= \begin{bmatrix} \bar{e}_1 & \bar{e}_2 & \cdots & \bar{e}_n \end{bmatrix} = I.
\]

Therefore Theorem 2.4.3 implies that A is invertible, and B = A⁻¹. Hence (1) is true.

The rest of this section is dedicated to the problem of calculating the inverse of a matrix.
We start with a simple example.

Example 2.4.8. We determine whether or not the matrix


 
1 2
A=
3 −1

is invertible, and find its inverse if it exists.

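Solution. We must determine whether or not there exists a matrix

\[ B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} \]

so that AB = I. Now

\[ AB = \begin{bmatrix} b_{11} + 2b_{21} & b_{12} + 2b_{22} \\ 3b_{11} - b_{21} & 3b_{12} - b_{22} \end{bmatrix}. \]

Therefore the matrix equation AB = I is equivalent to the two systems of equations

\[
\begin{aligned} b_{11} + 2b_{21} &= 1 \\ 3b_{11} - b_{21} &= 0, \end{aligned}
\qquad
\begin{aligned} b_{12} + 2b_{22} &= 0 \\ 3b_{12} - b_{22} &= 1. \end{aligned}
\]

In matrix form, these two systems are given by
Ab̄1 = ē1 and Ab̄2 = ē2
where b̄1 and b̄2 are the columns of B and ē1 and ē2 are the columns of I; that is,

\[
\bar{b}_1 = \begin{bmatrix} b_{11} \\ b_{21} \end{bmatrix}, \quad
\bar{b}_2 = \begin{bmatrix} b_{12} \\ b_{22} \end{bmatrix}
\quad \text{and} \quad
\bar{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad
\bar{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

Because these two systems have the same coefficient matrix A, the same series of row
operations can be used to solve the two systems. We therefore perform Gauss elimination
on the single augmented matrix

\[ [A \mid I] = \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 3 & -1 & 0 & 1 \end{array}\right], \]

instead of the two augmented matrices [A | ē1 ] and [A | ē2 ]. The reader should verify that

\[ [A \mid I] \sim \left[\begin{array}{cc|cc} 1 & 0 & \tfrac{1}{7} & \tfrac{2}{7} \\ 0 & 1 & \tfrac{3}{7} & -\tfrac{1}{7} \end{array}\right]. \]

Then

\[
[A \mid \bar{e}_1] \sim \left[\begin{array}{cc|c} 1 & 0 & \tfrac{1}{7} \\ 0 & 1 & \tfrac{3}{7} \end{array}\right]
\quad \text{and} \quad
[A \mid \bar{e}_2] \sim \left[\begin{array}{cc|c} 1 & 0 & \tfrac{2}{7} \\ 0 & 1 & -\tfrac{1}{7} \end{array}\right].
\]

Therefore

\[
\bar{b}_1 = \begin{bmatrix} \tfrac{1}{7} \\ \tfrac{3}{7} \end{bmatrix}
\quad \text{and} \quad
\bar{b}_2 = \begin{bmatrix} \tfrac{2}{7} \\ -\tfrac{1}{7} \end{bmatrix}
\]

so that

\[ B = \begin{bmatrix} \bar{b}_1 & \bar{b}_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{7} & \tfrac{2}{7} \\ \tfrac{3}{7} & -\tfrac{1}{7} \end{bmatrix}. \]

Since the matrix B is a solution of the equation AB = I, it follows that A is invertible and
A⁻¹ = B.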

If we take a close look at Example 2.4.8, we notice that [A | I ] ∼ [I | A⁻¹ ]. This is true in
general; that is, if A is an invertible n × n matrix, then [A | I ] ∼ [I | A⁻¹ ]. The converse
of this statement is also true. If [A | I ] ∼ [I | B ], then A is invertible, and A⁻¹ = B.

Theorem 2.4.9. Let A be an n × n matrix. Then A is invertible if and only if there exists
an n × n matrix B so that [A | I ] ∼ [I | B ], and in this case A⁻¹ = B.

Proof. Suppose that [A | I ] ∼ [I | B ] for some n × n matrix B. Let b̄1 , . . . , b̄n denote the
columns of B, and ē1 , . . . , ēn the columns of I. Then

[A | ēj ] ∼ [I | b̄j ] for every j = 1, . . . , n.

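Therefore, according to Theorem 2.3.6, Ab̄j = ēj for every j = 1, . . . , n. Theorem 2.4.6
implies that

\[
AB = \begin{bmatrix} A\bar{b}_1 & A\bar{b}_2 & \cdots & A\bar{b}_n \end{bmatrix}
= \begin{bmatrix} \bar{e}_1 & \bar{e}_2 & \cdots & \bar{e}_n \end{bmatrix} = I.
\]

By Theorem 2.4.3, A is invertible and A⁻¹ = B.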

A special case of the proof of the converse is given as an exercise, see Exercise 2.4 number
8.

The power of Theorem 2.4.9 lies in the fact that it provides us with an algorithm to determine
the inverse of a matrix A, if it exists. We perform Gauss elimination on the augmented
matrix [A | I ], and if for some matrix B

[A | I ] ∼ [I | B ],

then A is invertible and A⁻¹ = B. If [A | I ] is not row equivalent to [I | B ] for any matrix
B, then A is not invertible. We demonstrate this procedure by means of some examples.

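Example 2.4.10. Given the matrix

\[ A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 2 & 4 \end{bmatrix}, \]

we determine the inverse of A if it exists.

Solution. We have

\[
\begin{aligned}
[A \mid I] = {} &\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 1 & 0 \\ 0 & 2 & 4 & 0 & 0 & 1 \end{array}\right] && R_2 \to R_2 - R_1 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 2 & 4 & 0 & 0 & 1 \end{array}\right] && R_3 \to R_3 - 2 \times R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 0 & 2 & 2 & -2 & 1 \end{array}\right] && R_3 \to \tfrac{1}{2} \times R_3 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & 1 & -1 & 0.5 \end{array}\right] && R_2 \to R_2 - R_3 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & -2 & 2 & -0.5 \\ 0 & 0 & 1 & 1 & -1 & 0.5 \end{array}\right] && R_1 \to R_1 - R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -2 & 0.5 \\ 0 & 1 & 0 & -2 & 2 & -0.5 \\ 0 & 0 & 1 & 1 & -1 & 0.5 \end{array}\right] = [I \mid B].
\end{aligned}
\]

Therefore, by Theorem 2.4.9, A is invertible, and

\[ A^{-1} = \begin{bmatrix} 3 & -2 & 0.5 \\ -2 & 2 & -0.5 \\ 1 & -1 & 0.5 \end{bmatrix}. \]
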
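Example 2.4.11. Let

\[ A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 1 \\ 0 & 2 & 0 \end{bmatrix}. \]

We determine whether or not A is invertible, and give its inverse if it exists.

Solution. We have

\[
\begin{aligned}
[A \mid I] = {} &\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ -1 & 2 & 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 & 0 & 1 \end{array}\right] && R_2 \to R_2 + R_1 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 0 & 1 & 1 & 0 \\ 0 & 2 & 0 & 0 & 0 & 1 \end{array}\right] && R_3 \to R_3 - R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 1 \end{array}\right].
\end{aligned}
\]

At this point we see that no row operation will result in a row-echelon form with a nonzero
entry in row three and column three. Therefore [A | I ] is not row equivalent to [I | B ] for
any matrix B. By Theorem 2.4.9 the matrix A is not invertible.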
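
The procedure used in Examples 2.4.10 and 2.4.11 can be sketched in code. The function name `inverse` is our own, and the order of the row operations differs from the hand calculations above (each pivot is scaled to 1 and the rest of its column is cleared immediately), but the idea is the same: reduce [A | I ] to [I | B ] if possible, and report failure otherwise. Exact fractions are used, so 0.5 appears as 1/2.

```python
from fractions import Fraction

def inverse(A):
    """Return the inverse of the square matrix A, or None if A is not invertible."""
    n = len(A)
    # Build the augmented matrix [A | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # Look for a nonzero entry in column c, at or below row c.
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return None                       # [A | I] cannot be reduced to [I | B]
        M[c], M[pivot] = M[pivot], M[c]       # row interchange
        p = M[c][c]
        M[c] = [x / p for x in M[c]]          # row scaling: make the pivot 1
        for i in range(n):                    # row addition: clear the rest of column c
            if i != c:
                s = M[i][c]
                M[i] = [x - s * y for x, y in zip(M[i], M[c])]
    return [row[n:] for row in M]             # the right-hand block B

print(inverse([[1, 1, 0], [1, 2, 1], [0, 2, 4]]))     # Example 2.4.10
print(inverse([[1, 0, -1], [-1, 2, 1], [0, 2, 0]]))   # Example 2.4.11: None
```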
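Example 2.4.12. Consider the matrix

\[ A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ -1 & 4 & 2 \end{bmatrix}. \]

We determine the inverse of A, if it exists, and use it to solve the system of equations
Ax̄ = b̄ with

\[ \bar{b} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}. \]

Solution. We have

\[
\begin{aligned}
[A \mid I] = {} &\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & 1 & 0 \\ -1 & 4 & 2 & 0 & 0 & 1 \end{array}\right] && R_3 \to R_3 + R_1 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & 1 & 0 \\ 0 & 4 & 1 & 1 & 0 & 1 \end{array}\right] && R_3 \to R_3 - 2 \times R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 & -2 & 1 \end{array}\right] && R_2 \to R_2 + R_3;\ R_1 \to R_1 - R_3 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 2 & -1 \\ 0 & 2 & 0 & 1 & -1 & 1 \\ 0 & 0 & -1 & 1 & -2 & 1 \end{array}\right] && R_3 \to -R_3;\ R_2 \to \tfrac{1}{2} \times R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & 2 & -1 \\ 0 & 1 & 0 & 0.5 & -0.5 & 0.5 \\ 0 & 0 & 1 & -1 & 2 & -1 \end{array}\right].
\end{aligned}
\]

Therefore, according to Theorem 2.4.9, A is invertible and

\[ A^{-1} = \begin{bmatrix} 0 & 2 & -1 \\ 0.5 & -0.5 & 0.5 \\ -1 & 2 & -1 \end{bmatrix}. \]

By Theorem 2.4.7 the system Ax̄ = b̄ has a unique solution. For a vector x̄ ∈ ℝ³,

\[
A\bar{x} = \bar{b} \ \text{ if and only if } \ A^{-1}(A\bar{x}) = A^{-1}\bar{b} \ \text{ if and only if } \ \bar{x} = I\bar{x} = A^{-1}\bar{b} = \begin{bmatrix} 4 \\ 0 \\ 3 \end{bmatrix}.
\]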
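
The final step of Example 2.4.12 is a single matrix-vector product, which can be checked as follows. This is a small sketch with a helper `mat_vec` of our own; the inverse is entered by hand from the example, with 0.5 written as an exact fraction.

```python
from fractions import Fraction

def mat_vec(M, v):
    """The product M v, with the column vector v stored as a list."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

half = Fraction(1, 2)
A = [[1, 0, -1], [0, 2, 1], [-1, 4, 2]]
A_inv = [[0, 2, -1], [half, -half, half], [-1, 2, -1]]   # from Example 2.4.12
b = [1, 3, 2]

x = mat_vec(A_inv, b)          # the solution of A x = b
print([int(v) for v in x])     # [4, 0, 3]
print(mat_vec(A, x) == b)      # True: x does satisfy A x = b
```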

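Example 2.4.13. Let

\[ A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 0 \\ 1 & 3 & 2 \end{bmatrix}. \]

We find all matrices B such that AB = A² + 2A, if any exist.

Solution. If A is invertible, then for a 3 × 3 matrix B,

AB = A² + 2A if and only if A⁻¹AB = A⁻¹AA + 2A⁻¹A if and only if B = A + 2I.

That is, if A is invertible, then B = A + 2I is the unique solution of the equation
AB = A² + 2A. We therefore start by determining whether or not A is invertible. We have

\[
\begin{aligned}
[A \mid I] = {} &\left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 3 & 2 & 0 & 0 & 1 \end{array}\right] && R_3 \to R_3 - R_1 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & -1 & 0 & 1 \end{array}\right] && R_3 \to R_3 - R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{array}\right] && R_1 \to R_1 - R_3 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & 2 & 1 & -1 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{array}\right] && R_1 \to R_1 - 2 \times R_2 \\
\sim{} &\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & -1 & -1 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & -1 & 1 \end{array}\right].
\end{aligned}
\]

By Theorem 2.4.9, A is invertible so that

\[ B = A + 2I = \begin{bmatrix} 3 & 2 & 1 \\ 0 & 3 & 0 \\ 1 & 3 & 4 \end{bmatrix}. \]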

The existence of the inverse of an n × n matrix A is closely connected to the solutions
of systems of equations for which A is the coefficient matrix; that is, systems of the form
Ax̄ = b̄. In fact, A is invertible if and only if Ax̄ = b̄ has a unique solution for every
vector b̄. Furthermore, the method by which the inverse of A is determined, if it exists,
is essentially based on simultaneously solving the n systems of linear equations Ax̄ = ē1 ,
Ax̄ = ē2 , . . . , Ax̄ = ēn . In the next section we discuss a method to determine whether or not
a matrix is invertible which does not involve Gauss elimination.

Exercise 2.4

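1. Compute the inverse for each of the following matrices, if it exists. If the inverse does
not exist, explain why this is the case.

(a) \( \begin{bmatrix} 1 & 2 \\ 3 & 7 \end{bmatrix} \)
(b) \( \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} \)
(c) \( \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \)
(d) \( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \)
(e) \( \begin{bmatrix} 1 & 0 & -2 \\ 4 & -2 & 1 \\ 1 & 2 & -10 \end{bmatrix} \)
(f) \( \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \)
(g) \( \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \)
(h) \( \begin{bmatrix} 1 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{bmatrix} \)
(i) \( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \)
(j) \( \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix} \)

2. Show that if
\[ A = \begin{bmatrix} 0 & -1 \\ 1 & -1 \end{bmatrix}, \]
then A³ = I. Use this fact to find A⁻¹.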
3. Prove Theorem 2.4.4 (1). [HINT: Assume that B and C are both inverses of A, and
prove that B = C.]
4. Prove Theorem 2.4.4 (2).
5. The aim of this exercise is to complete the proof of Theorem 2.4.3 in the special case
of 2 × 2 matrices. Let A = [aij ] and B = [bij ] be 2 × 2 matrices so that AB = I. Prove
that BA = I by expressing b11 , b12 , b21 and b22 in terms of a11 , a12 , a21 and a22 . In
each step, carefully explain why each calculation is permissible.
6. Prove Theorem 2.4.6 in the case where A and B are 3 × 3 matrices.
7. Prove that if a, b, c, d are real numbers such that ad − bc ≠ 0 then
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
is invertible, and
\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]
Now prove that if A is invertible, then ad − bc ≠ 0.

8. The aim of this exercise is to complete the proof of Theorem 2.4.9 in the special case
of 2 × 2 matrices. Let A be an invertible 2 × 2 matrix. Use the preceding exercise to
prove that [A | I ] ∼ [I | A⁻¹ ].

9. Let A be an n × n matrix such that A² = O. Prove that A − I is invertible.

10. Find x and y if
\[
\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{bmatrix}^{-1}
= \begin{bmatrix} \tfrac{12}{11} & x & y \\ \tfrac{5}{22} & \tfrac{3}{22} & -\tfrac{5}{22} \\ -\tfrac{2}{11} & \tfrac{1}{11} & \tfrac{2}{11} \end{bmatrix}.
\]

11. It is given that
\[
\text{if } A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}
\text{ then } A^{-1} = \begin{bmatrix} \tfrac{3}{4} & \tfrac{1}{2} & \tfrac{1}{4} \\ \tfrac{1}{2} & 1 & \tfrac{1}{2} \\ \tfrac{1}{4} & \tfrac{1}{2} & \tfrac{3}{4} \end{bmatrix}.
\]

(a) Solve for x̄ if Ax̄ = b̄ where
\[ \bar{b} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}. \]

(b) Solve for x̄ if A²x̄ = b̄ where
\[ \bar{b} = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}. \]

12. Assume A, B and C are invertible n × n matrices. In each case, solve for X in terms
of A, B and C.
(a) ABXAB = C (b) AXC⁻¹ + 3B = I (c) B(XA + C) = B + C

2.5 The Determinant of a Matrix

The determinant of an n × n matrix A is a number, denoted by det(A). This single number
captures a surprisingly large amount of information about A. In particular, it determines
whether or not the matrix A is invertible, and therefore also says something about the
solutions of the equation Ax̄ = b̄, with b̄ a column vector in ℝⁿ.

2.5.1 Definition of the Determinant

Consider a system of equations

ax + by = e
cx + dy = f.

Assume that a ≠ 0 and c ≠ 0. We multiply the first equation by c and the second equation
by a. The result is the system

acx + bcy = ec
acx + ady = af.

Next we subtract the first equation from the second, resulting in

acx + bcy = ec
(ad − bc)y = af − ec.

This system has a unique solution if and only if ad − bc ≠ 0. We call the number ad − bc
the determinant of the coefficient matrix

\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

of the system.

Definition 2.5.1. The determinant of a 2 × 2 matrix A = [aij ] is det(A) = a11 a22 − a12 a21 .

Example 2.5.2. The determinant of

\[ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \]

is det(A) = 1 × 4 − 2 × 3 = 4 − 6 = −2.

For 3 × 3 matrices, the determinant is defined as follows.

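Definition 2.5.3. The determinant of a 3 × 3 matrix A = [aij ] is

\[
\det(A) = a_{11} \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
- a_{12} \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}
+ a_{13} \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.
\]

Example 2.5.4. The determinant of

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ -4 & 5 & 6 \\ 7 & -8 & 9 \end{bmatrix} \]

is

\[
\begin{aligned}
\det(A) &= 1 \times \det \begin{bmatrix} 5 & 6 \\ -8 & 9 \end{bmatrix}
- 2 \times \det \begin{bmatrix} -4 & 6 \\ 7 & 9 \end{bmatrix}
+ 3 \times \det \begin{bmatrix} -4 & 5 \\ 7 & -8 \end{bmatrix} \\
&= 93 - 2 \times (-78) + 3 \times (-3) \\
&= 240.
\end{aligned}
\]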

For n × n matrices, with n ∈ N, the determinant is defined inductively; that is, we define
the determinant of (n + 1) × (n + 1) matrices in terms of the determinant of n × n matrices.
In order to formulate the definition, we introduce some notation.
Definition 2.5.5. If A = [aij ] is a square matrix, then Aij is the matrix obtained after
deleting row i and column j from A. The minor of entry aij (the entry in row i and column
j of A) is det(Aij ) and is denoted by Mij .
Remark 2.5.6. If A is an (n + 1) × (n + 1) matrix, then Aij is an n × n matrix for all
i, j = 1, . . . , n + 1.
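Example 2.5.7. If

\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & -5 & 6 \\ 7 & -8 & 9 \end{bmatrix}, \]

then

\[ A_{21} = \begin{bmatrix} 2 & 3 \\ -8 & 9 \end{bmatrix} \]

and

\[ M_{21} = \det(A_{21}) = \det \begin{bmatrix} 2 & 3 \\ -8 & 9 \end{bmatrix} = 18 + 24 = 42. \]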
Definition 2.5.8. Let A = [aij ] be an n × n matrix with n ≥ 4. Then

\[
\det(A) = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} M_{1j}
= (-1)^{2} a_{11} M_{11} + (-1)^{3} a_{12} M_{12} + \cdots + (-1)^{n+1} a_{1n} M_{1n}.
\]

We illustrate Definition 2.5.8 by means of an example.


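Example 2.5.9. We find the determinant of

\[ A = \begin{bmatrix} 0 & -1 & 2 & -1 \\ 1 & 0 & 1 & 2 \\ 3 & -1 & -2 & 1 \\ 1 & 0 & 2 & 1 \end{bmatrix}. \]

Solution. According to Definition 2.5.8,

\[ \det(A) = 0 \times (-1)^{2} M_{11} + (-1) \times (-1)^{3} M_{12} + 2 \times (-1)^{4} M_{13} + (-1) \times (-1)^{5} M_{14}. \]

Furthermore,

\[
\begin{aligned}
M_{12} &= \det \begin{bmatrix} 1 & 1 & 2 \\ 3 & -2 & 1 \\ 1 & 2 & 1 \end{bmatrix}
= \det \begin{bmatrix} -2 & 1 \\ 2 & 1 \end{bmatrix} - \det \begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix} + 2 \det \begin{bmatrix} 3 & -2 \\ 1 & 2 \end{bmatrix}
= -4 - 2 + 16 = 10, \\
M_{13} &= \det \begin{bmatrix} 1 & 0 & 2 \\ 3 & -1 & 1 \\ 1 & 0 & 1 \end{bmatrix}
= \det \begin{bmatrix} -1 & 1 \\ 0 & 1 \end{bmatrix} + 2 \det \begin{bmatrix} 3 & -1 \\ 1 & 0 \end{bmatrix}
= -1 + 2 = 1
\end{aligned}
\]

and

\[
M_{14} = \det \begin{bmatrix} 1 & 0 & 1 \\ 3 & -1 & -2 \\ 1 & 0 & 2 \end{bmatrix}
= \det \begin{bmatrix} -1 & -2 \\ 0 & 2 \end{bmatrix} + \det \begin{bmatrix} 3 & -1 \\ 1 & 0 \end{bmatrix}
= -2 + 1 = -1.
\]

Therefore det(A) = M12 + 2M13 + M14 = 10 + 2 − 1 = 11.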
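
The inductive definition of the determinant translates directly into a recursive function. The sketch below follows Definitions 2.5.1, 2.5.3 and 2.5.8 by always expanding according to the first row; the names `minor_matrix` and `det` are our own, and indices are counted from 0, so that the sign (−1)^(1+j) becomes (−1)^j in the code.

```python
def minor_matrix(A, i, j):
    """The matrix A_ij: delete row i and column j of A (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant of an n x n matrix (n >= 2) by expansion according to the first row."""
    n = len(A)
    if n < 2:
        raise ValueError("this sketch only handles n >= 2")
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return sum((-1) ** j * A[0][j] * det(minor_matrix(A, 0, j)) for j in range(n))

print(det([[1, 2], [3, 4]]))                      # Example 2.5.2: -2
print(det([[1, 2, 3], [-4, 5, 6], [7, -8, 9]]))   # Example 2.5.4: 240
print(det([[0, -1, 2, -1], [1, 0, 1, 2],
           [3, -1, -2, 1], [1, 0, 2, 1]]))        # Example 2.5.9: 11
```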

Definitions 2.5.3 and 2.5.8 may create the impression that there is something special about
the first row of a matrix. However, one can calculate the determinant of a matrix along any
row or column, with a minor modification of the formula.

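Theorem 2.5.10. Let A be an n × n matrix. Then for every i = 1, . . . , n

\[
\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij}
= (-1)^{i+1} a_{i1} M_{i1} + (-1)^{i+2} a_{i2} M_{i2} + \cdots + (-1)^{i+n} a_{in} M_{in}, \tag{2.7}
\]

and for every j = 1, . . . , n

\[
\det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij}
= (-1)^{1+j} a_{1j} M_{1j} + (-1)^{2+j} a_{2j} M_{2j} + \cdots + (-1)^{n+j} a_{nj} M_{nj}. \tag{2.8}
\]

Proof. We prove (2.7) for a 3 × 3 matrix A in the case i = 2. According to Definition 2.5.3,

\[
\begin{aligned}
\det(A) &= a_{11} \det \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}
- a_{12} \det \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}
+ a_{13} \det \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \\
&= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).
\end{aligned} \tag{2.9}
\]

On the other hand,

\[
\begin{aligned}
\sum_{j=1}^{3} (-1)^{2+j} a_{2j} M_{2j}
&= -a_{21} \det \begin{bmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{bmatrix}
+ a_{22} \det \begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}
- a_{23} \det \begin{bmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{bmatrix} \\
&= -a_{21}(a_{12}a_{33} - a_{13}a_{32}) + a_{22}(a_{11}a_{33} - a_{13}a_{31}) - a_{23}(a_{11}a_{32} - a_{12}a_{31}) \\
&= a_{13}a_{32}a_{21} - a_{12}a_{33}a_{21} + a_{11}a_{22}a_{33} - a_{13}a_{31}a_{22} + a_{12}a_{31}a_{23} - a_{11}a_{32}a_{23} \\
&= (a_{11}a_{22}a_{33} - a_{11}a_{32}a_{23}) + (a_{12}a_{31}a_{23} - a_{12}a_{33}a_{21}) + (a_{13}a_{32}a_{21} - a_{13}a_{31}a_{22}) \\
&= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}).
\end{aligned} \tag{2.10}
\]

By (2.9) and (2.10),

\[ \det(A) = \sum_{j=1}^{3} (-1)^{2+j} a_{2j} M_{2j}. \]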

Remark 2.5.11. Let A be an n × n matrix. For i = 1, . . . , n, the expression

\[ \det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} M_{ij} \]

is called the expansion of det(A) according to row i. For j = 1, . . . , n, the expression

\[ \det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} M_{ij} \]

is called the expansion of det(A) according to column j.

We demonstrate Theorem 2.5.10 by means of some examples.

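Example 2.5.12. According to Example 2.5.9, the determinant of

\[ A = \begin{bmatrix} 0 & -1 & 2 & -1 \\ 1 & 0 & 1 & 2 \\ 3 & -1 & -2 & 1 \\ 1 & 0 & 2 & 1 \end{bmatrix} \]

is det(A) = 11. We expand det(A) according to the second column.

Solution. We have

\[
\begin{aligned}
\det(A) &= \sum_{i=1}^{4} (-1)^{i+2} a_{i2} M_{i2} \\
&= (-1)^{1+2} a_{12} M_{12} + (-1)^{2+2} a_{22} M_{22} + (-1)^{3+2} a_{32} M_{32} + (-1)^{4+2} a_{42} M_{42} \\
&= \det \begin{bmatrix} 1 & 1 & 2 \\ 3 & -2 & 1 \\ 1 & 2 & 1 \end{bmatrix} + 0 + \det \begin{bmatrix} 0 & 2 & -1 \\ 1 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix} + 0 \\
&= [1(-2 - 2) - 1(3 - 1) + 2(6 + 2)] + [0 - 2(1 - 2) + (-1)(2 - 1)] \\
&= -4 - 2 + 16 + 2 - 1 \\
&= 11.
\end{aligned}
\]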

121
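Example 2.5.13. We evaluate the determinant of
\[
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 3 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\]
by expanding it according to the first column, and then by expanding it according to the second row.

Solution. We expand $\det(A)$ according to the first column and get
\[
\begin{aligned}
\det(A) &= (-1)^{1+1} a_{11} M_{11} + (-1)^{2+1} a_{21} M_{21} + (-1)^{3+1} a_{31} M_{31} \\
&= \det \begin{bmatrix} 3 & 0 \\ -1 & 1 \end{bmatrix}
- 2 \det \begin{bmatrix} -1 & 2 \\ -1 & 1 \end{bmatrix}
+ 0 \times \det \begin{bmatrix} -1 & 2 \\ 3 & 0 \end{bmatrix} \\
&= 1(3 + 0) - 2(-1 + 2) + 0 \\
&= 1.
\end{aligned}
\]
When the determinant is expanded according to row 2, we get
\[
\begin{aligned}
\det(A) &= (-1)^{2+1} a_{21} M_{21} + (-1)^{2+2} a_{22} M_{22} + (-1)^{2+3} a_{23} M_{23} \\
&= -2 \det \begin{bmatrix} -1 & 2 \\ -1 & 1 \end{bmatrix}
+ 3 \det \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
- 0 \det \begin{bmatrix} 1 & -1 \\ 0 & -1 \end{bmatrix} \\
&= -2(-1 + 2) + 3(1 - 0) - 0 \\
&= 1.
\end{aligned}
\]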
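The expansions in Theorem 2.5.10 are easy to check by computer. The following Python sketch is only an illustration: the function names `minor`, `det`, `expand_row` and `expand_col` are our own choices and do not come from these notes. It evaluates the expansion along each row and each column of the matrix in Example 2.5.13. Indices are counted from 0 in the code, which does not change the sign $(-1)^{i+j}$.

```python
def minor(a, i, j):
    """Matrix obtained from a by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by expansion along the first row (recursive)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def expand_row(a, i):
    """Expansion of det(a) according to row i, as in (2.7)."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for j in range(len(a)))


def expand_col(a, j):
    """Expansion of det(a) according to column j, as in (2.8)."""
    return sum((-1) ** (i + j) * a[i][j] * det(minor(a, i, j))
               for i in range(len(a)))


# Matrix of Example 2.5.13.
A = [[1, -1, 2], [2, 3, 0], [0, -1, 1]]
row_values = [expand_row(A, i) for i in range(3)]
col_values = [expand_col(A, j) for j in range(3)]
print(row_values, col_values)
```

Every entry of both lists equals 1, in agreement with the two expansions computed by hand above.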

Calculating a determinant is typically computationally expensive, even for relatively small matrices, because it involves performing a large number of arithmetic operations (additions and multiplications). Indeed, if we denote by $S_n$ the number of additions and multiplications needed to evaluate the determinant of an $n \times n$ matrix, then the equation
\[
S_{n+1} = (n + 1)(S_n + 1) + n
\]
holds for every natural number $n \geq 2$. Clearly, $S_2 = 3$ so that
\[
S_3 = 14, \quad S_4 = 63, \quad S_5 = 324, \quad S_6 = 1955, \quad S_7 = 13698, \quad \dots, \quad S_{10} = 9864099.
\]
To evaluate the determinant of a general $10 \times 10$ matrix involves nearly ten million calculations!
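The values of $S_n$ quoted above follow directly from the recurrence. As a small illustration, the Python sketch below iterates the recurrence starting from $S_2 = 3$; the function name `operation_counts` is our own and is not part of the notes.

```python
def operation_counts(n_max):
    """Return {n: S_n} for n = 2, ..., n_max using S_{n+1} = (n+1)(S_n + 1) + n."""
    s = {2: 3}
    for n in range(2, n_max):
        s[n + 1] = (n + 1) * (s[n] + 1) + n
    return s


counts = operation_counts(10)
for n in sorted(counts):
    print(n, counts[n])
```

The count grows roughly like $n!$, which is why cofactor expansion is impractical for large matrices.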

Theorem 2.5.10 provides us with a method by which the number of calculations involved in
evaluating a determinant can be reduced. The strategy is typically to expand the determinant
along that row or column that contains the most zeros. However, this is of very little use if a
matrix contains very few or no zero entries. In the next section, we discuss further methods
to reduce the number of calculations involved in evaluating a determinant.

Exercise 2.5.1

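1. Evaluate the following determinants.

(a) $\det \begin{bmatrix} 1 & 4 \\ 2 & 8 \end{bmatrix}$
(b) $\det \begin{bmatrix} -5 & 6 \\ -7 & -2 \end{bmatrix}$
(c) $\det \begin{bmatrix} \sqrt{2} & \sqrt{6} \\ 4 & \sqrt{3} \end{bmatrix}$
(d) $\det \begin{bmatrix} -2 & 7 & 6 \\ 5 & 1 & -2 \\ 3 & 8 & 4 \end{bmatrix}$
(e) $\det \begin{bmatrix} 1 & -2 & 4 \\ 5 & 3 & -7 \\ 6 & 1 & 2 \end{bmatrix}$

2. Find all values of $\lambda$ for which $\det(A) = 0$.

(a) $A = \begin{bmatrix} \lambda - 2 & 1 \\ -5 & \lambda + 4 \end{bmatrix}$
(b) $A = \begin{bmatrix} \lambda - 4 & 0 & 0 \\ 0 & \lambda & 2 \\ 0 & 3 & \lambda - 1 \end{bmatrix}$

3. Let
\[
A = \begin{bmatrix} 3 & 1 \\ 4 & 2 \end{bmatrix}.
\]
Is $\det(5A) = 5 \det(A)$?

4. Let
\[
A = \begin{bmatrix} -3 & 0 & 5 \\ x & 0 & 0 \\ 2 & 0 & 4 \end{bmatrix}.
\]
Evaluate $\det(A)$ by expanding it according to
(a) the first row;
(b) the third column;
(c) the easiest row or column.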

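5. Evaluate $\det(A)$ by expanding according to the easiest row or column.

(a) $A = \begin{bmatrix} 1 & 5 & 6 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix}$
(b) $A = \begin{bmatrix} 3 & -4 & 8 \\ 0 & 7 & 10 \\ 0 & 0 & 20 \end{bmatrix}$
(c) $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$
(d) $A = \begin{bmatrix} 10 & 20 & 30 \\ 0 & 40 & 50 \\ 60 & 0 & 70 \end{bmatrix}$

6. Evaluate the following determinants by expanding according to the easiest row or column.

(a) $\det \begin{bmatrix} 1 & -2 & 5 & 2 \\ 0 & 0 & 3 & 0 \\ 2 & -6 & -7 & 5 \\ 5 & 0 & 4 & 4 \end{bmatrix}$
(b) $\det \begin{bmatrix} 7 & -1 & 0 & 0 \\ 4 & 0 & 0 & 0 \\ 2 & 6 & 3 & 0 \\ 5 & 8 & 4 & -3 \end{bmatrix}$
(c) $\det \begin{bmatrix} 2 & -7 & 8 & 9 & -6 \\ 0 & 2 & -7 & 8 & 9 \\ 0 & 0 & 2 & -7 & 8 \\ 0 & 0 & 0 & 2 & -7 \\ 0 & 0 & 0 & 2 & 7 \end{bmatrix}$
(d) $\det \begin{bmatrix} a & b & c & d \\ 0 & e & f & g \\ 0 & 0 & h & i \\ 0 & 0 & 0 & j \end{bmatrix}$
(e) $\det \begin{bmatrix} a & 0 & 0 & 0 \\ b & c & 0 & 0 \\ d & e & f & 0 \\ g & h & i & j \end{bmatrix}$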

2.5.2 Properties of the Determinant

As is mentioned at the end of Section 2.5.1, evaluating the determinant of a square matrix
generally involves a large number of calculations. We can reduce the number of computations
used to determine the determinant of a matrix by expanding it according to a row or column
containing many zeros. In this section, we discuss further methods by which the evaluation
of determinants can be simplified.

Consider an $n \times n$ matrix $A = [a_{ij}]$ in row-echelon form. In this case, all the entries of $A$ below the main diagonal are zero; that is,
\[
a_{ij} = 0, \quad i = 1, \dots, n, \quad j < i,
\]
see Exercise 2.5.2 number 16. In this case
\[
\det(A) = a_{11} \times a_{22} \times \cdots \times a_{nn}.
\]
It is clearly much easier to calculate the determinant of a matrix in row-echelon form,
compared to that of a general matrix. Furthermore, every square matrix is row equivalent to
a matrix in row-echelon form. To exploit these facts, we must determine how row operations
affect the determinant of a matrix.
Theorem 2.5.14. Let $A$ be a square matrix. Then the following statements are true.

(1) If a multiple of one row of $A$ is added to another row to produce a matrix $B$, then $\det(A) = \det(B)$.

(2) If two rows of $A$ are interchanged to produce a matrix $B$, then $\det(B) = -\det(A)$.

(3) If one row of $A$ is multiplied by a scalar $k$ to produce a matrix $B$, then $\det(B) = k \det(A)$.

We illustrate Theorem 2.5.14 by means of the following example.


Example 2.5.15. Let
\[
A = \begin{bmatrix} -1 & 2 & -3 \\ 0 & 4 & -5 \\ 0 & 6 & -7 \end{bmatrix}.
\]
Then $\det(A) = (-1)(-28 + 30) = -2$.

(1) Apply the row operation $R_3 \to R_3 - \tfrac{3}{2} R_2$ to obtain the matrix
\[
B = \begin{bmatrix} -1 & 2 & -3 \\ 0 & 4 & -5 \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix}.
\]
Expanding according to the first column, we get
\[
\det(B) = -\left(4 \times \tfrac{1}{2} - 0\right) = -2 = \det(A).
\]

(2) Perform the row operation $R_1 \leftrightarrow R_3$ on $A$ to produce the matrix
\[
B = \begin{bmatrix} 0 & 6 & -7 \\ 0 & 4 & -5 \\ -1 & 2 & -3 \end{bmatrix}.
\]
Then, expanding according to the first column,
\[
\det(B) = 0 + 0 + (-1) \times (-30 + 28) = 2 = -(-2) = -\det(A).
\]

(3) Apply $R_2 \to 3 \times R_2$ to $A$ to obtain the matrix
\[
B = \begin{bmatrix} -1 & 2 & -3 \\ 0 & 12 & -15 \\ 0 & 6 & -7 \end{bmatrix}.
\]
Then
\[
\det(B) = -1 \times (-84 + 90) = -6 = 3(-2) = 3 \det(A),
\]
where we once again expanded according to the first column.

A version of Theorem 2.5.14 also holds for ‘column operations’ on matrices.

Theorem 2.5.16. Let $A$ be a square matrix. Then the following statements are true.

(1) If a multiple of one column of $A$ is added to another column to produce a matrix $B$, then $\det(A) = \det(B)$.

(2) If two columns of $A$ are interchanged to produce a matrix $B$, then $\det(B) = -\det(A)$.

(3) If one column of $A$ is multiplied by a scalar $k$ to produce a matrix $B$, then $\det(B) = k \det(A)$.

Theorems 2.5.14 and 2.5.16 are used to simplify the evaluation of determinants. The strategy
is to use row or column operations to reduce a given matrix A to a matrix B which is in
row-echelon form, or contains a row or column with many zeros. We must take care to keep
track of all the operations we perform to reduce A to B. It is important to note that only
multiplying a row or column by a scalar, or interchanging two rows or columns affects the
determinant of a matrix. Adding a scalar multiple of a row or column to another row or
column has no effect. We illustrate this strategy in the following examples.

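Example 2.5.17. We find the determinant of
\[
A = \begin{bmatrix} 2 & -8 & 4 \\ -2 & 8 & -9 \\ -1 & 7 & 0 \end{bmatrix}.
\]

Solution. By applying $R_1 \to R_1 + R_2$ to $A$ we obtain the matrix
\[
B = \begin{bmatrix} 0 & 0 & -5 \\ -2 & 8 & -9 \\ -1 & 7 & 0 \end{bmatrix}.
\]
According to Theorem 2.5.14 (1), and expanding $\det(B)$ according to the first row,
\[
\det(A) = \det(B) = -5 \times (-14 + 8) = 30.
\]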

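Example 2.5.18. We find the determinant of
\[
B = \begin{bmatrix} 1 & -4 & 3 & 4 \\ 0 & -9 & 6 & 8 \\ -6 & 0 & 2 & -4 \\ 1 & -4 & 0 & 6 \end{bmatrix}.
\]

Solution. We have
\[
\begin{aligned}
B &= \begin{bmatrix} 1 & -4 & 3 & 4 \\ 0 & -9 & 6 & 8 \\ -6 & 0 & 2 & -4 \\ 1 & -4 & 0 & 6 \end{bmatrix}
&& R_3 \to R_3 + 6 \times R_1; \ R_4 \to R_4 - R_1 \\
&\sim \begin{bmatrix} 1 & -4 & 3 & 4 \\ 0 & -9 & 6 & 8 \\ 0 & -24 & 20 & 20 \\ 0 & 0 & -3 & 2 \end{bmatrix}
&& R_3 \to \tfrac{3}{4} R_3 \\
&\sim \begin{bmatrix} 1 & -4 & 3 & 4 \\ 0 & -9 & 6 & 8 \\ 0 & -18 & 15 & 15 \\ 0 & 0 & -3 & 2 \end{bmatrix}
&& R_3 \to R_3 - 2 \times R_2 \\
&\sim \begin{bmatrix} 1 & -4 & 3 & 4 \\ 0 & -9 & 6 & 8 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & -3 & 2 \end{bmatrix}
&& R_4 \to R_4 + R_3 \\
&\sim \begin{bmatrix} 1 & -4 & 3 & 4 \\ 0 & -9 & 6 & 8 \\ 0 & 0 & 3 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix} = C.
\end{aligned}
\]
In the reduction of $B$ to $C$, the only row operation that affects the determinant is $R_3 \to \tfrac{3}{4} R_3$. Therefore, according to Theorem 2.5.14 (3),
\[
\det(C) = \tfrac{3}{4} \det(B)
\]
so that
\[
\det(B) = \tfrac{4}{3} \det(C) = \tfrac{4}{3} (1 \times (-9) \times 3 \times 1) = -36.
\]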
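The strategy of Examples 2.5.17 and 2.5.18 can be sketched in Python as follows. This is an illustration under our own assumptions, not a prescribed algorithm from the notes: the function name `det_by_row_reduction` is hypothetical, and the code uses only row interchanges and additions of multiples of rows (Theorem 2.5.14 (1) and (2)), so the only bookkeeping needed is a sign change for every interchange. Exact rational arithmetic from the standard `fractions` module avoids rounding errors.

```python
from fractions import Fraction


def det_by_row_reduction(rows):
    """Reduce to row-echelon (upper triangular) form and multiply the diagonal.

    Only two kinds of row operations are used:
      * R_r -> R_r - factor * R_col  (leaves the determinant unchanged),
      * row interchange              (changes the sign of the determinant).
    """
    a = [[Fraction(x) for x in row] for row in rows]
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # a zero appears on the diagonal of the echelon form
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    product = Fraction(sign)
    for i in range(n):
        product *= a[i][i]
    return product


# Matrices of Examples 2.5.17 and 2.5.18.
A_17 = [[2, -8, 4], [-2, 8, -9], [-1, 7, 0]]
B_18 = [[1, -4, 3, 4], [0, -9, 6, 8], [-6, 0, 2, -4], [1, -4, 0, 6]]
print(det_by_row_reduction(A_17), det_by_row_reduction(B_18))
```

The two printed values are 30 and -36, matching the hand calculations. Because no row is ever rescaled, no factor such as $\tfrac{4}{3}$ has to be tracked.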

Example 2.5.19. We evaluate the determinant of
\[
A = \begin{bmatrix} 2 & 1 & 3 & 4 & 1 \\ 1 & 2 & -1 & 1 & 2 \\ -1 & 3 & 1 & 4 & 4 \\ 4 & 2 & 2 & 4 & -1 \\ 5 & 1 & 3 & 4 & 0 \end{bmatrix}.
\]

Solution. We apply the column operation $C_4 \to C_4 - C_2 - C_3$ to obtain the matrix
\[
B = \begin{bmatrix} 2 & 1 & 3 & 0 & 1 \\ 1 & 2 & -1 & 0 & 2 \\ -1 & 3 & 1 & 0 & 4 \\ 4 & 2 & 2 & 0 & -1 \\ 5 & 1 & 3 & 0 & 0 \end{bmatrix}.
\]
By Theorem 2.5.16 (1),
\[
\det(A) = \det(B) = 0,
\]
where we expanded $\det(B)$ according to the fourth column.

We end this section by showing how algebraic operations on matrices, that is, addition,
multiplication and scalar multiplication affect the determinant.

Theorem 2.5.20. Let A and B be n × n matrices and let k be a real number. Then the
following statements are true.

(1) $\det(kA) = k^n \det(A)$.

(2) det(AB) = det(A) det(B).

Theorem 2.5.20 (1) follows from Theorem 2.5.14 (3), and the proof is given as an exercise,
see Exercise 2.5.2 number 9. We illustrate Theorem 2.5.20 (2) by means of an example.

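Example 2.5.21. Let
\[
A = \begin{bmatrix} -1 & 2 & 0 \\ 0 & 2 & -1 \\ 0 & 0 & 3 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 3 & 0 & 0 \\ 0 & -1 & 0 \\ 2 & 0 & -1 \end{bmatrix}.
\]
Then
\[
AB = \begin{bmatrix} -3 & -2 & 0 \\ -2 & -2 & 1 \\ 6 & 0 & -3 \end{bmatrix}.
\]
We have
\[
\det(A) = (-1)(6 - 0) = -6 \quad \text{and} \quad \det(B) = 3(1 - 0) = 3.
\]
Expanding $\det(AB)$ according to the third row, we have
\[
\det(AB) = 6 \times (-2 - 0) - 3 \times (6 - 4) = -18 = -6 \times 3 = \det(A) \det(B).
\]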

In general it is not true that det(A + B) = det(A) + det(B), as the following example
illustrates.

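Example 2.5.22. Let
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} -1 & -1 & -1 \\ 0 & -1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Then
\[
A + B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2 \end{bmatrix},
\]
so that
\[
\det(A) = \det(B) = 1,
\]
but
\[
\det(A + B) = 0 \neq 2 = \det(A) + \det(B).
\]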
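The statements of Theorem 2.5.20, and the failure of additivity, can be checked on the matrices of Examples 2.5.21 and 2.5.22. The Python sketch below is illustrative only; the helper names `det`, `matmul` and `scale` are our own, and `det` uses plain expansion along the first row, which is adequate for such small matrices.

```python
def det(a):
    """Determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def scale(k, a):
    """Scalar multiple k * a."""
    return [[k * x for x in row] for row in a]


# Example 2.5.21: det(AB) = det(A) det(B).
A = [[-1, 2, 0], [0, 2, -1], [0, 0, 3]]
B = [[3, 0, 0], [0, -1, 0], [2, 0, -1]]
AB = matmul(A, B)
print(det(A), det(B), det(AB))

# Theorem 2.5.20 (1) with n = 3 and k = 3: det(3A) = 3**3 * det(A).
print(det(scale(3, A)), 3 ** 3 * det(A))

# Example 2.5.22: det(P + Q) differs from det(P) + det(Q).
P = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
Q = [[-1, -1, -1], [0, -1, -1], [0, 0, 1]]
P_plus_Q = [[x + y for x, y in zip(r, s)] for r, s in zip(P, Q)]
print(det(P_plus_Q), det(P) + det(Q))
```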

The results and examples in this section demonstrate how row and column operations are
effectively used to simplify the evaluation of determinants. In the next section, we come to
the heart of the matter. There is a close connection between determinants and inverses of
matrices. In this regard, Theorem 2.5.20 (2) is a precursor of things to come.

Exercise 2.5.2

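1. Evaluate $\det(A)$.

(a) $A = \begin{bmatrix} 1 & 4 & -3 & 1 \\ 2 & 0 & 6 & 3 \\ 4 & -1 & 2 & 5 \\ 3 & 4 & 3 & 4 \end{bmatrix}$
(b) $A = \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 2 & 1 & 0 \\ 0 & 1 & 2 & 3 \end{bmatrix}$
(c) $A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 5 & 0 \end{bmatrix}$
(d) $A = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$

2. Let
\[
A = \begin{bmatrix} a & b & c \\ x & y & z \\ u & v & w \end{bmatrix}, \quad
B = \begin{bmatrix} b & a & c \\ y & x & z \\ v & u & w \end{bmatrix} \quad \text{and} \quad
C = \begin{bmatrix} u & v & w \\ a & b & c \\ x & y & z \end{bmatrix}.
\]
If $\det(A) = k$, express $\det(-A)$, $\det(2A)$, $\det(B)$, $\det(C)$ and $\det(A + B)$ in terms of $k$, without expanding the determinants.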

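3. Let
\[
\det \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = k.
\]
Express the determinants of the following matrices in terms of $k$.

(a) $A = \begin{bmatrix} d & e & f \\ g & h & i \\ a & b & c \end{bmatrix}$
(b) $A = \begin{bmatrix} -a & -b & -c \\ 2d & 2e & 2f \\ g & h & i \end{bmatrix}$
(c) $A = \begin{bmatrix} a + d & b + e & c + f \\ d & e & f \\ g & h & i \end{bmatrix}$
(d) $A = \begin{bmatrix} a & b & c \\ d - 3a & e - 3b & f - 3c \\ 2g & 2h & 2i \end{bmatrix}$
(e) $A = \begin{bmatrix} 0 & 0 & 0 \\ g & h & i \\ a & b & c \end{bmatrix}$
(f) $A = \begin{bmatrix} a & b & c \\ a & b & c \\ g & h & i \end{bmatrix}$

4. Without expanding, show that
\[
\det \begin{bmatrix} b + c & c + a & b + a \\ a & b & c \\ 1 & 1 & 1 \end{bmatrix} = 0.
\]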

5. If $A$ is a $5 \times 5$ matrix and $\det(A) = -4$, find $\det(2A)$, $\det(A^2)$ and $\det(-A)$.

6. Let A be a square matrix such that det(A) = 3 and det(2A) = 48. What is the size
of A?

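7. If
\[
A = \begin{bmatrix} \alpha & \beta & \gamma \\ x & y & z \\ 1 & 2 & 3 \end{bmatrix}
\]
has determinant $\det(A) = 7$, find $\det(B)$ if
\[
B = \begin{bmatrix} 2 & 4 & 6 \\ x & y & z \\ \alpha & \beta & \gamma \end{bmatrix}.
\]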

8. Prove Theorem 2.5.14 (3). [HINT: If B is obtained from A by the operation Ri → kRi ,
expand det(B) according to row i.]

9. Prove Theorem 2.5.20 (1).

10. Let A and B be 2 × 2 matrices. Prove that det(AB) = det(A) det(B).

11. Describe the set of points $\langle x, y \rangle \in \mathbb{R}^2$ so that
\[
\det \begin{bmatrix} x & y & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \end{bmatrix} = 0.
\]
12. Let A be a square matrix so that every entry in a specific row of A is zero. Prove that
det(A) = 0.

13. Let A be a square matrix so that every entry in a specific column of A is zero. Prove
that det(A) = 0.

14. Let A be a square matrix such that two of its rows are equal. Explain why det(A) = 0.

15. Let $A$ be a $2 \times 2$ matrix. Prove that $\det(A) = \det(A^T)$. (Note that this result is true
for any square matrix.)

16. Use Mathematical Induction to prove the following: If $A = [a_{ij}]$ is an $n \times n$ matrix in row-echelon form with $n \geq 2$, then $\det(A) = a_{11} a_{22} \cdots a_{nn}$.

2.5.3 The Determinant and the Inverse

As is mentioned at the end of Section 2.5.2, there is a close connection between determinants
and inverses of square matrices. In fact, the determinant completely determines whether or
not a matrix is invertible.
Theorem 2.5.23. An $n \times n$ matrix $A$ is invertible if and only if $\det(A) \neq 0$.

Proof. Assume that $\det(A) \neq 0$. By Theorem 2.3.7 there exists an $n \times n$ matrix $B$ in row-echelon form so that $A$ and $B$ are row equivalent. By Theorem 2.5.14, there exists a constant $k \neq 0$ so that
\[
\det(B) = k \det(A).
\]
Therefore
\[
\det(B) = b_{11} b_{22} \cdots b_{nn} \neq 0
\]
so that $b_{ii} \neq 0$ for every $i = 1, \dots, n$. By Theorem 2.3.15, the system
\[
B \bar{x} = \bar{0}
\]
has exactly one solution, namely $\bar{x} = \bar{0}$. But $[A \mid \bar{0}]$ is row equivalent to $[B \mid \bar{0}]$. Therefore, by Theorem 2.3.6, $\bar{x} = \bar{0}$ is the only solution of the system
\[
A \bar{x} = \bar{0}.
\]
By Theorem 2.4.7, the matrix $A$ is invertible.

We leave the proof of the converse as an exercise, see Exercise 2.5.3 number 4.

The next two results now follow easily. The proofs are given as exercises, see Exercise 2.5.3
numbers 5 and 6.
Theorem 2.5.24. If $A$ is an invertible matrix, then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.
Theorem 2.5.25. Let $A$ be an $n \times n$ matrix. Then the following statements are equivalent.

(1) $A$ is invertible.
(2) The only solution of the equation $A\bar{x} = \bar{0}$ in $\mathbb{R}^n$ is $\bar{x} = \bar{0}$.
(3) The equation $A\bar{x} = \bar{b}$ has a unique solution in $\mathbb{R}^n$ for every $\bar{b} \in \mathbb{R}^n$.
(4) $\det(A) \neq 0$.

We illustrate the use of Theorem 2.5.25 by means of the following example.
Example 2.5.26. Consider the system of linear equations
\[
\begin{aligned}
kx + y + 2z &= -2 \\
3x + 6y + kz &= 2 \\
-x - 2y + kz &= 1,
\end{aligned}
\]
where $k \in \mathbb{R}$ is a constant. We determine those values of $k$ for which the system has a unique solution.

Solution. We write the system in matrix form, $A\bar{x} = \bar{b}$, with
\[
A = \begin{bmatrix} k & 1 & 2 \\ 3 & 6 & k \\ -1 & -2 & k \end{bmatrix}
\quad \text{and} \quad
\bar{b} = \begin{bmatrix} -2 \\ 2 \\ 1 \end{bmatrix}.
\]
Expanding according to the first row, we have $\det(A) = k(6k + 2k) - (3k + k) + 2(-6 + 6) = 8k^2 - 4k$. According to Theorem 2.5.25, the system has a unique solution if and only if $\det(A) \neq 0$. Now
\[
\det(A) = 0 \quad \text{if and only if} \quad k = 0 \text{ or } k = \tfrac{1}{2}.
\]
Therefore the system has a unique solution if and only if $k \neq 0$ and $k \neq \tfrac{1}{2}$.

According to Theorem 2.5.25, the determinant of a square matrix A gives information about
the solutions of equations of the form
Ax̄ = b̄.
Indeed, such an equation has a unique solution for every column vector b̄ ∈ Rn if and only
if det(A) ̸= 0. However, more can be said. It is possible to express the solutions of this
equation directly in terms of the determinant of A. This is known as Cramer’s Rule.
Theorem 2.5.27. Let $A$ be an $n \times n$ matrix such that $\det(A) \neq 0$. For any $\bar{b} \in \mathbb{R}^n$, the unique solution of the system of equations $A\bar{x} = \bar{b}$ is
\[
x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \dots, \quad x_n = \frac{\det(A_n)}{\det(A)},
\]
where for $j = 1, \dots, n$ the matrix $A_j$ is obtained by replacing the entries in the $j$th column of $A$ by the entries in the matrix $\bar{b} = \begin{bmatrix} b_1 & b_2 & \dots & b_n \end{bmatrix}^T$.
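We illustrate Cramer's Rule by means of an example.
Example 2.5.28. We use Cramer's Rule to solve the system of equations
\[
\begin{aligned}
x_1 + 2x_3 &= 6 \\
-3x_1 + 4x_2 + 6x_3 &= 30 \\
-x_1 - 2x_2 + 3x_3 &= 8.
\end{aligned}
\]

Solution. We write the system in matrix form, $A\bar{x} = \bar{b}$, with
\[
A = \begin{bmatrix} 1 & 0 & 2 \\ -3 & 4 & 6 \\ -1 & -2 & 3 \end{bmatrix}
\quad \text{and} \quad
\bar{b} = \begin{bmatrix} 6 \\ 30 \\ 8 \end{bmatrix}.
\]
First we compute $\det(A)$. We have
\[
\det(A) = 1 \times (12 + 12) + 2 \times (6 + 4) = 44.
\]
Next we determine $\det(A_1)$, $\det(A_2)$ and $\det(A_3)$. We have
\[
A_1 = \begin{bmatrix} 6 & 0 & 2 \\ 30 & 4 & 6 \\ 8 & -2 & 3 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 6 & 2 \\ -3 & 30 & 6 \\ -1 & 8 & 3 \end{bmatrix} \quad \text{and} \quad
A_3 = \begin{bmatrix} 1 & 0 & 6 \\ -3 & 4 & 30 \\ -1 & -2 & 8 \end{bmatrix}.
\]
Therefore
\[
\det(A_1) = -40, \quad \det(A_2) = 72 \quad \text{and} \quad \det(A_3) = 152.
\]
Hence, by Cramer's Rule,
\[
x_1 = \frac{\det(A_1)}{\det(A)} = -\frac{40}{44} = -\frac{10}{11}, \quad
x_2 = \frac{\det(A_2)}{\det(A)} = \frac{72}{44} = \frac{18}{11}, \quad
x_3 = \frac{\det(A_3)}{\det(A)} = \frac{152}{44} = \frac{38}{11}.
\]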
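Cramer's Rule translates almost literally into code. The following Python sketch is an illustration only: the names `det` and `cramer` are our own, the determinant is computed by expansion along the first row, and the entries are assumed to be integers so that `Fraction` can form exact quotients.

```python
from fractions import Fraction


def det(a):
    """Determinant by expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def cramer(a, b):
    """Solve a x = b by Cramer's Rule; a is a square integer matrix, b a list."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's Rule requires det(A) != 0")
    solution = []
    for j in range(len(a)):
        # A_j: replace column j of A by the entries of b.
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution


# System of Example 2.5.28.
A = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
x = cramer(A, b)
print(x)
```

Substituting the result back into the first equation gives $x_1 + 2x_3 = -\tfrac{10}{11} + \tfrac{76}{11} = 6$, as required.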

Cramer’s Rule is of very little practical use, since for an n × n system it involves having to
calculate n + 1 determinants. Therefore, to solve a system of modest size, say a system of
ten equations in ten unknowns, requires 109 059 499 calculations! Solving the same system
using Gauss elimination involves far fewer calculations.

Exercise 2.5.3

1. Consider the system of equations
\[
\begin{aligned}
3kx - 2y &= 4 \\
-6x + ky &= 1.
\end{aligned}
\]
Determine (a) the value(s) of $k$ for which the system has a unique solution, and (b) the solution, using Cramer's Rule.

2. Use Cramer's Rule to solve for $x'$ and $y'$ in terms of $x$ and $y$ if
\[
\begin{aligned}
x &= \tfrac{3}{5} x' - \tfrac{4}{5} y' \\
y &= \tfrac{4}{5} x' + \tfrac{3}{5} y'.
\end{aligned}
\]

3. Let $\theta$ be an arbitrary real number. Use Cramer's Rule to solve for $x'$ and $y'$ in terms of $x$ and $y$ if
\[
\begin{aligned}
x &= \cos(\theta) x' - \sin(\theta) y' \\
y &= \sin(\theta) x' + \cos(\theta) y'.
\end{aligned}
\]

4. Complete the proof of Theorem 2.5.23. In particular, show that det(A) ̸= 0 whenever
A is an invertible matrix.

5. Prove Theorem 2.5.24.

6. Prove Theorem 2.5.25.

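7. Determine all values of $k \in \mathbb{R}$ for which the given matrix $A$ is invertible.

(a) $A = \begin{bmatrix} 1 & 0 & 1 \\ 3 & 1 - k & 2 \\ k & 0 & 1 + k \end{bmatrix}$
(b) $A = \begin{bmatrix} 1 & 0 & 1 \\ 0.5 & \cos k & 0 \\ -\sin k & 1 & 0 \end{bmatrix}$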

2.6 An Application to Integration

In this section, we show how the theory of systems of linear equations, matrices and determinants is used to solve a problem of integration. In particular, we develop a method for calculating integrals of rational functions; that is, functions of the form
\[
f(x) = \frac{a_n x^n + \cdots + a_1 x + a_0}{b_m x^m + \cdots + b_1 x + b_0} \tag{2.11}
\]
where $m$ and $n$ are nonnegative integers, and $a_0, \dots, a_n$ and $b_0, \dots, b_m$ are real numbers. Some rational functions are relatively easy to integrate. For instance, it is not hard to see that
\[
\int \frac{1}{x} \, dx = \ln|x| + C, \quad
\int \frac{1}{x + 1} \, dx = \ln|x + 1| + C, \quad
\int \frac{1}{(x + 1)^2} \, dx = -\frac{1}{x + 1} + C
\]
while
\[
\int \frac{x}{x^2 + 1} \, dx = \tfrac{1}{2} \ln(x^2 + 1) + C \quad \text{and} \quad
\int \frac{1}{x^2 + 1} \, dx = \arctan(x) + C.
\]
The strategy is to express a given rational function as a sum of such simpler rational functions. For instance, we have
\[
\frac{2x}{(x + 1)(x^2 + 1)} = \frac{x + 1}{x^2 + 1} - \frac{1}{x + 1}.
\]

More generally, a given rational function $f$ such as in (2.11) can always be expressed as a polynomial plus terms of the form
\[
\frac{A}{(ax + b)^n} \quad \text{and} \quad \frac{Bx + C}{(cx^2 + dx + e)^m}
\]
where $m$ and $n$ are positive integers and the quadratic term $cx^2 + dx + e$ is irreducible. The specific form of such a decomposition is determined by the factors of the denominator of $f$. As we show, finding such a decomposition boils down to having to solve a system of linear equations.
The method discussed below makes use of a few facts and results about polynomial functions on $\mathbb{R}$. Recall that a polynomial function on $\mathbb{R}$ is a function $p : \mathbb{R} \to \mathbb{R}$ given by
\[
p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \quad x \in \mathbb{R} \tag{2.12}
\]
where $n \geq 0$ is an integer, and $a_0, \dots, a_n$ are real numbers. A rational function on $\mathbb{R}$ is therefore a function $f$ from $\mathbb{R}$ to $\mathbb{R}$ given by
\[
f(x) = \frac{p(x)}{q(x)}
\]
where $p$ and $q$ are polynomial functions on $\mathbb{R}$ with $q$ not the zero polynomial.

The degree of a polynomial function $p$ as in (2.12) is the largest integer $k$ such that $a_k \neq 0$. If $a_0 = \dots = a_n = 0$, then the degree of $p$ is undefined.
Theorem 2.6.1. Let $n \geq 0$ be an integer. Let $p$ and $q$ be polynomial functions on $\mathbb{R}$ given by
\[
p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \quad x \in \mathbb{R}
\]
and
\[
q(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0, \quad x \in \mathbb{R}.
\]
Then $p(x) = q(x)$ for every $x \in \mathbb{R}$ if and only if $a_j = b_j$ for all $j = 0, \dots, n$.

Polynomials and polynomial functions are discussed in more detail in Sections 7.1, 7.2 and
7.3.

We are now in a position to address the problem of decomposing a rational function into a
sum of simpler rational functions. In order to illustrate the underlying principles involved,
we consider a special case of the general method, see Theorem 2.6.9.
Theorem 2.6.2 (Partial Fraction Decomposition). Consider polynomial functions $p$ and $q$ on $\mathbb{R}$ given by
\[
p(x) = bx^3 + dx^2 + ex + f, \quad x \in \mathbb{R}
\]
and
\[
q(x) = (x + a)^2 (x^2 + c), \quad x \in \mathbb{R}
\]
where $c > 0$. Then there exist unique real numbers $A$, $B$, $C$ and $D$ such that
\[
\frac{p(x)}{q(x)} = \frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{Cx + D}{x^2 + c}
\]
for every $x \in \mathbb{R}$ so that $x \neq -a$.
Proof. Let $A$, $B$, $C$ and $D$ be real numbers. We have
\[
\begin{aligned}
\frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{Cx + D}{x^2 + c}
&= \frac{A(x + a)(x^2 + c) + B(x^2 + c) + (Cx + D)(x + a)^2}{(x + a)^2 (x^2 + c)} \\
&= \frac{A(x^3 + ax^2 + cx + ac) + B(x^2 + c) + (Cx + D)(x^2 + 2ax + a^2)}{q(x)} \\
&= \frac{(A + C)x^3 + (aA + B + 2aC + D)x^2 + (cA + a^2 C + 2aD)x + (acA + cB + a^2 D)}{q(x)}
\end{aligned}
\]
for every $x \in \mathbb{R}$ such that $x \neq -a$. Therefore
\[
\frac{p(x)}{q(x)} = \frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{Cx + D}{x^2 + c}, \quad x \neq -a
\]
if and only if
\[
\frac{bx^3 + dx^2 + ex + f}{q(x)}
= \frac{(A + C)x^3 + (aA + B + 2aC + D)x^2 + (cA + a^2 C + 2aD)x + (acA + cB + a^2 D)}{q(x)}
\]
for every $x \neq -a$. Hence
\[
\frac{p(x)}{q(x)} = \frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{Cx + D}{x^2 + c}, \quad x \neq -a
\]
if and only if
\[
bx^3 + dx^2 + ex + f = (A + C)x^3 + (aA + B + 2aC + D)x^2 + (cA + a^2 C + 2aD)x + (acA + cB + a^2 D)
\]
for every $x \in \mathbb{R}$. It follows from Theorem 2.6.1 that
\[
\frac{p(x)}{q(x)} = \frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{Cx + D}{x^2 + c}, \quad x \neq -a
\]
if and only if
\[
\begin{aligned}
A + C &= b \\
aA + B + 2aC + D &= d \\
cA + a^2 C + 2aD &= e \\
acA + cB + a^2 D &= f.
\end{aligned} \tag{2.13}
\]
The coefficient matrix for this system of linear equations is
\[
M = \begin{bmatrix} 1 & 0 & 1 & 0 \\ a & 1 & 2a & 1 \\ c & 0 & a^2 & 2a \\ ac & c & 0 & a^2 \end{bmatrix}.
\]
We have
\[
\det(M) = (a^2 + c)^2.
\]
Since $c > 0$, it follows that $\det(M) > 0$. It therefore follows from Theorem 2.5.25 that the system of equations (2.13) has a unique solution. Therefore there exist unique real numbers $A$, $B$, $C$ and $D$ so that
\[
\frac{p(x)}{q(x)} = \frac{A}{x + a} + \frac{B}{(x + a)^2} + \frac{Cx + D}{x^2 + c}, \quad x \neq -a.
\]
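The value $\det(M) = (a^2 + c)^2$ used in the proof is stated without calculation. As a sanity check (not a proof), the Python sketch below compares both sides for a range of integer values of $a$ and $c$. The helper names `det` and `coefficient_matrix` are our own, and the sample ranges are arbitrary.

```python
def det(m):
    """Determinant by expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def coefficient_matrix(a, c):
    """Coefficient matrix M of system (2.13) for given a and c."""
    return [
        [1, 0, 1, 0],
        [a, 1, 2 * a, 1],
        [c, 0, a * a, 2 * a],
        [a * c, c, 0, a * a],
    ]


# Compare det(M) with (a^2 + c)^2 on a small grid of sample values.
mismatches = [
    (a, c)
    for a in range(-5, 6)
    for c in range(1, 6)
    if det(coefficient_matrix(a, c)) != (a * a + c) ** 2
]
print(mismatches)
```

An empty list of mismatches is consistent with the formula; a symbolic verification follows from the column operation $C_3 \to C_3 - C_1$ and an expansion according to the first row.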

Partial Fraction Decomposition. Let $p$ and $q$ be polynomial functions on $\mathbb{R}$ such that the following is true.

(1) $q$ has degree at least 1.

(2) The degree of $p$ is strictly less than the degree of $q$.

Then the rational function
\[
f(x) = \frac{p(x)}{q(x)}, \quad x \in \mathbb{R} \text{ such that } q(x) \neq 0
\]
can be expressed as a sum of terms of the form
\[
\frac{A}{(ax + b)^j} \quad \text{and} \quad \frac{Bx + C}{(cx^2 + dx + e)^k}
\]
where $j, k \geq 1$ are integers and all the coefficients appearing in these terms are real numbers. Such an expansion is called a decomposition of $f$ into partial fractions. The specific terms appearing in the partial fraction decomposition of $f$ are determined by the factors of $q$ as follows.

(1) If $ax + b$ is a factor of $q$, and repeats exactly $n$ times, then the partial fraction decomposition of $f$ contains the terms
\[
\frac{A_1}{ax + b}, \ \frac{A_2}{(ax + b)^2}, \ \dots, \ \frac{A_n}{(ax + b)^n}.
\]

(2) If an irreducible quadratic $cx^2 + dx + e$ is a factor of $q$, and repeats exactly $m$ times, then the partial fraction decomposition of $f$ contains the terms
\[
\frac{B_1 x + C_1}{cx^2 + dx + e}, \ \frac{B_2 x + C_2}{(cx^2 + dx + e)^2}, \ \dots, \ \frac{B_m x + C_m}{(cx^2 + dx + e)^m}.
\]

We demonstrate the method described above by means of a number of examples.

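Example 2.6.3. We evaluate the integral
\[
\int_2^3 \frac{5x^2 - 6x - 2}{2x^3 - x^2 - x} \, dx.
\]

Solution. Let $p(x) = 5x^2 - 6x - 2$ and $q(x) = 2x^3 - x^2 - x$ for every $x \in \mathbb{R}$. Then
\[
q(x) = x(2x^2 - x - 1) = x(x - 1)(2x + 1), \quad x \in \mathbb{R}.
\]
Therefore there exist unique real numbers $A$, $B$ and $C$ so that
\[
\frac{p(x)}{q(x)} = \frac{A}{x} + \frac{B}{x - 1} + \frac{C}{2x + 1} \quad \text{for } x \neq 0, \ x \neq 1 \text{ and } x \neq -\tfrac{1}{2}.
\]
We have
\[
\begin{aligned}
\frac{A}{x} + \frac{B}{x - 1} + \frac{C}{2x + 1}
&= \frac{A(x - 1)(2x + 1) + Bx(2x + 1) + Cx(x - 1)}{x(x - 1)(2x + 1)} \\
&= \frac{A(2x^2 - x - 1) + B(2x^2 + x) + C(x^2 - x)}{q(x)} \\
&= \frac{(2A + 2B + C)x^2 + (-A + B - C)x - A}{q(x)}.
\end{aligned}
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{A}{x} + \frac{B}{x - 1} + \frac{C}{2x + 1} \quad \text{for } x \neq 0, \ x \neq 1 \text{ and } x \neq -\tfrac{1}{2}
\]
if and only if
\[
5x^2 - 6x - 2 = p(x) = (2A + 2B + C)x^2 + (-A + B - C)x - A \quad \text{for all } x \in \mathbb{R}
\]
so that
\[
\begin{aligned}
2A + 2B + C &= 5 \\
-A + B - C &= -6 \\
-A &= -2.
\end{aligned}
\]
Solving the system of equations, we find that
\[
A = 2, \quad B = -1 \quad \text{and} \quad C = 3.
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{2}{x} - \frac{1}{x - 1} + \frac{3}{2x + 1} \quad \text{for } x \neq 0, \ x \neq 1 \text{ and } x \neq -\tfrac{1}{2}.
\]
Hence
\[
\begin{aligned}
\int_2^3 \frac{5x^2 - 6x - 2}{2x^3 - x^2 - x} \, dx
&= \int_2^3 \left( \frac{2}{x} - \frac{1}{x - 1} + \frac{3}{2x + 1} \right) dx \\
&= \left[ 2 \ln|x| - \ln|x - 1| + \frac{3 \ln|2x + 1|}{2} \right]_2^3 \\
&= 2 \ln 3 - 3 \ln 2 + \frac{3 \ln 7}{2} - \frac{3 \ln 5}{2}.
\end{aligned}
\]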
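The linear system for the coefficients $A$, $B$ and $C$ can be solved mechanically by Gauss elimination. The Python sketch below is one possible implementation under our own assumptions: the function name `solve` is hypothetical, the coefficient matrix is assumed to be invertible (as guaranteed here by the uniqueness of the decomposition), and exact fractions are used throughout.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions.

    Assumes a is square and invertible.
    """
    n = len(a)
    # Augmented matrix [a | b].
    m = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(a, b)]
    for col in range(n):
        # Bring a row with a nonzero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        # Clear the column in all other rows.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]


# System of Example 2.6.3 in the unknowns (A, B, C).
coefficients = solve([[2, 2, 1], [-1, 1, -1], [-1, 0, 0]], [5, -6, -2])
print(coefficients)
```

The result reproduces $A = 2$, $B = -1$ and $C = 3$. The same function applies unchanged to the larger systems in the examples that follow.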
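Example 2.6.4. We evaluate the integral
\[
\int \frac{2x^3 + 2x^2 - 25x + 2}{(3x + 2)(x - 2)^3} \, dx.
\]

Solution. Let $p(x) = 2x^3 + 2x^2 - 25x + 2$ and $q(x) = (3x + 2)(x - 2)^3$ for every $x \in \mathbb{R}$. Then there exist unique real numbers $A$, $B$, $C$ and $D$ so that
\[
\frac{p(x)}{q(x)} = \frac{A}{x - 2} + \frac{B}{(x - 2)^2} + \frac{C}{(x - 2)^3} + \frac{D}{3x + 2} \quad \text{for } x \neq 2 \text{ and } x \neq -\tfrac{2}{3}.
\]
For $x \neq 2$ and $x \neq -\tfrac{2}{3}$ we have
\[
\begin{aligned}
&\frac{A}{x - 2} + \frac{B}{(x - 2)^2} + \frac{C}{(x - 2)^3} + \frac{D}{3x + 2} \\
&\quad = \frac{A(x - 2)^2 (3x + 2) + B(x - 2)(3x + 2) + C(3x + 2) + D(x - 2)^3}{(x - 2)^3 (3x + 2)} \\
&\quad = \frac{(3A + D)x^3 + (-10A + 3B - 6D)x^2 + (4A - 4B + 3C + 12D)x + (8A - 4B + 2C - 8D)}{q(x)}.
\end{aligned}
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{A}{x - 2} + \frac{B}{(x - 2)^2} + \frac{C}{(x - 2)^3} + \frac{D}{3x + 2} \quad \text{for } x \neq 2 \text{ and } x \neq -\tfrac{2}{3}
\]
if and only if
\[
2x^3 + 2x^2 - 25x + 2 = p(x) = (3A + D)x^3 + (-10A + 3B - 6D)x^2 + (4A - 4B + 3C + 12D)x + (8A - 4B + 2C - 8D)
\]
for all $x \in \mathbb{R}$ so that
\[
\begin{aligned}
3A + D &= 2 \\
-10A + 3B - 6D &= 2 \\
4A - 4B + 3C + 12D &= -25 \\
8A - 4B + 2C - 8D &= 2.
\end{aligned}
\]
Solving the system of equations, we find that
\[
A = 1, \quad B = 2, \quad C = -3 \quad \text{and} \quad D = -1.
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{1}{x - 2} + \frac{2}{(x - 2)^2} - \frac{3}{(x - 2)^3} - \frac{1}{3x + 2} \quad \text{for } x \neq 2 \text{ and } x \neq -\tfrac{2}{3}.
\]
Hence
\[
\begin{aligned}
\int \frac{2x^3 + 2x^2 - 25x + 2}{(3x + 2)(x - 2)^3} \, dx
&= \int \left( \frac{1}{x - 2} + \frac{2}{(x - 2)^2} - \frac{3}{(x - 2)^3} - \frac{1}{3x + 2} \right) dx \\
&= \ln|x - 2| - \frac{2}{x - 2} + \frac{3}{2(x - 2)^2} - \frac{\ln|3x + 2|}{3} + C.
\end{aligned}
\]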
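Example 2.6.5. We evaluate the integral
\[
\int_0^1 \frac{10x^2 + 10x + 5}{(2x + 1)(x^2 + 2x + 2)} \, dx.
\]

Solution. Let $p(x) = 10x^2 + 10x + 5$ and $q(x) = (2x + 1)(x^2 + 2x + 2)$ for every $x \in \mathbb{R}$. Note that the quadratic term $x^2 + 2x + 2$ is irreducible, since
\[
2^2 - 4 \times 1 \times 2 = -4 < 0.
\]
Therefore there exist unique real numbers $A$, $B$ and $C$ so that
\[
\frac{p(x)}{q(x)} = \frac{A}{2x + 1} + \frac{Bx + C}{x^2 + 2x + 2}, \quad x \neq -\tfrac{1}{2}.
\]
For $x \neq -\tfrac{1}{2}$ we have
\[
\begin{aligned}
\frac{A}{2x + 1} + \frac{Bx + C}{x^2 + 2x + 2}
&= \frac{A(x^2 + 2x + 2) + (Bx + C)(2x + 1)}{(2x + 1)(x^2 + 2x + 2)} \\
&= \frac{(A + 2B)x^2 + (2A + B + 2C)x + (2A + C)}{q(x)}.
\end{aligned}
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{A}{2x + 1} + \frac{Bx + C}{x^2 + 2x + 2}, \quad x \neq -\tfrac{1}{2}
\]
if and only if
\[
10x^2 + 10x + 5 = p(x) = (A + 2B)x^2 + (2A + B + 2C)x + (2A + C)
\]
for all $x \in \mathbb{R}$ so that
\[
\begin{aligned}
A + 2B &= 10 \\
2A + B + 2C &= 10 \\
2A + C &= 5.
\end{aligned}
\]
Solving the system of equations, we find that
\[
A = 2, \quad B = 4 \quad \text{and} \quad C = 1.
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{2}{2x + 1} + \frac{4x + 1}{x^2 + 2x + 2}, \quad x \neq -\tfrac{1}{2}.
\]
Hence
\[
\begin{aligned}
\int_0^1 \frac{10x^2 + 10x + 5}{(2x + 1)(x^2 + 2x + 2)} \, dx
&= \int_0^1 \left( \frac{2}{2x + 1} + \frac{4x + 1}{x^2 + 2x + 2} \right) dx \\
&= \Big[ \ln|2x + 1| \Big]_0^1 + \int_0^1 \frac{4x + 1}{x^2 + 2x + 2} \, dx \\
&= \ln 3 + \int_0^1 \frac{4x + 1}{x^2 + 2x + 2} \, dx.
\end{aligned}
\]
In order to evaluate the remaining integral, we note that
\[
\begin{aligned}
\frac{d}{dx} \ln|x^2 + 2x + 2| &= \frac{2x + 2}{x^2 + 2x + 2} \\
&= \frac{1}{2} \left( \frac{4x + 4}{x^2 + 2x + 2} \right) \\
&= \frac{1}{2} \left( \frac{4x + 1}{x^2 + 2x + 2} + \frac{3}{x^2 + 2x + 2} \right).
\end{aligned}
\]
Therefore
\[
\frac{4x + 1}{x^2 + 2x + 2} = 2 \frac{d}{dx} \ln|x^2 + 2x + 2| - \frac{3}{x^2 + 2x + 2}.
\]
Completing the square in the last term we find that
\[
\frac{4x + 1}{x^2 + 2x + 2} = 2 \frac{d}{dx} \ln|x^2 + 2x + 2| - \frac{3}{(x + 1)^2 + 1}.
\]
Therefore
\[
\begin{aligned}
\int_0^1 \frac{4x + 1}{x^2 + 2x + 2} \, dx
&= \int_0^1 \left( 2 \frac{d}{dx} \ln|x^2 + 2x + 2| - \frac{3}{(x + 1)^2 + 1} \right) dx \\
&= \Big[ 2 \ln|x^2 + 2x + 2| - 3 \arctan(x + 1) \Big]_0^1 \\
&= 2 \ln 5 - 3 \arctan(2) - 2 \ln 2 + \frac{3\pi}{4}.
\end{aligned}
\]
Finally, we have
\[
\begin{aligned}
\int_0^1 \frac{10x^2 + 10x + 5}{(2x + 1)(x^2 + 2x + 2)} \, dx
&= \ln 3 + 2 \ln 5 - 3 \arctan(2) - 2 \ln 2 + \frac{3\pi}{4} \\
&= \ln\left( \tfrac{75}{4} \right) + \frac{3\pi}{4} - 3 \arctan(2).
\end{aligned}
\]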
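A closed-form answer of this kind can be checked against a numerical approximation of the integral. The Python sketch below uses the composite Simpson's rule, which is our own choice of method and is not discussed in these notes; the names `simpson` and `integrand` and the number of subintervals are likewise assumptions made for the illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        # Interior points alternate between weight 4 (odd k) and weight 2 (even k).
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def integrand(x):
    """Integrand of Example 2.6.5."""
    return (10 * x**2 + 10 * x + 5) / ((2 * x + 1) * (x**2 + 2 * x + 2))


numeric = simpson(integrand, 0.0, 1.0)
closed_form = math.log(75 / 4) + 3 * math.pi / 4 - 3 * math.atan(2)
print(numeric, closed_form)
```

The two printed values agree to many decimal places, which supports the sign of the $\arctan(2)$ term in the final answer.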
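Example 2.6.6. We evaluate the integral
\[
\int_1^2 \frac{-x^4 - 2x^3 + 4x^2 - 8x + 16}{2x^5 + 16x^3 + 32x} \, dx.
\]

Solution. Let $p(x) = -x^4 - 2x^3 + 4x^2 - 8x + 16$ and $q(x) = 2x^5 + 16x^3 + 32x$ for every $x \in \mathbb{R}$. We have
\[
q(x) = 2x(x^4 + 8x^2 + 16) = 2x(x^2 + 4)^2, \quad x \in \mathbb{R}.
\]
Since the quadratic term $x^2 + 4$ is irreducible, there exist unique real numbers $A$, $B$, $C$, $D$ and $E$ so that
\[
\frac{p(x)}{q(x)} = \frac{A}{2x} + \frac{Bx + C}{x^2 + 4} + \frac{Dx + E}{(x^2 + 4)^2}, \quad x \neq 0.
\]
For $x \neq 0$ we have
\[
\begin{aligned}
\frac{A}{2x} + \frac{Bx + C}{x^2 + 4} + \frac{Dx + E}{(x^2 + 4)^2}
&= \frac{A(x^2 + 4)^2 + 2x(x^2 + 4)(Bx + C) + 2x(Dx + E)}{2x(x^2 + 4)^2} \\
&= \frac{(A + 2B)x^4 + 2Cx^3 + (8A + 8B + 2D)x^2 + (8C + 2E)x + 16A}{q(x)}.
\end{aligned}
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{A}{2x} + \frac{Bx + C}{x^2 + 4} + \frac{Dx + E}{(x^2 + 4)^2}, \quad x \neq 0
\]
if and only if
\[
-x^4 - 2x^3 + 4x^2 - 8x + 16 = p(x) = (A + 2B)x^4 + 2Cx^3 + (8A + 8B + 2D)x^2 + (8C + 2E)x + 16A
\]
for all $x \in \mathbb{R}$ so that
\[
\begin{aligned}
A + 2B &= -1 \\
2C &= -2 \\
8A + 8B + 2D &= 4 \\
8C + 2E &= -8 \\
16A &= 16.
\end{aligned}
\]
Solving the system of equations, we find that
\[
A = 1, \quad B = -1, \quad C = -1, \quad D = 2 \quad \text{and} \quad E = 0.
\]
Therefore
\[
\frac{p(x)}{q(x)} = \frac{1}{2x} - \frac{x + 1}{x^2 + 4} + \frac{2x}{(x^2 + 4)^2}, \quad x \neq 0.
\]
Hence
\[
\begin{aligned}
\int_1^2 \frac{-x^4 - 2x^3 + 4x^2 - 8x + 16}{2x^5 + 16x^3 + 32x} \, dx
&= \int_1^2 \left( \frac{1}{2x} - \frac{x + 1}{x^2 + 4} + \frac{2x}{(x^2 + 4)^2} \right) dx \\
&= \int_1^2 \left( \frac{1}{2x} - \frac{x}{x^2 + 4} - \frac{1}{4((x/2)^2 + 1)} + \frac{2x}{(x^2 + 4)^2} \right) dx \\
&= \left[ \frac{\ln|x|}{2} - \frac{\ln(x^2 + 4)}{2} - \frac{\arctan(x/2)}{2} - \frac{1}{x^2 + 4} \right]_1^2 \\
&= \frac{\ln 2}{2} - \frac{\ln 8}{2} - \frac{\arctan(1)}{2} - \frac{1}{8} + \frac{\ln 5}{2} + \frac{\arctan(1/2)}{2} + \frac{1}{5} \\
&= \ln\sqrt{5} - \ln 2 + \frac{3}{40} - \frac{\pi}{8} + \frac{\arctan(1/2)}{2}.
\end{aligned}
\]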

A partial fraction decomposition for a rational function $f$ given by
\[
f(x) = \frac{p(x)}{q(x)}, \quad x \in \mathbb{R} \text{ such that } q(x) \neq 0
\]
exists only when the degree of $p$ is less than the degree of $q$ or $r(x) = 0$. However, if the degree of $p$ is greater than or equal to the degree of $q$, then there exist unique polynomial functions $u, r : \mathbb{R} \to \mathbb{R}$ such that
\[
p(x) = q(x)u(x) + r(x), \quad x \in \mathbb{R}
\]
and the degree of $r$ is less than the degree of $q$, see Theorem 7.2.2. In this case,
\[
f(x) = \frac{p(x)}{q(x)} = u(x) + \frac{r(x)}{q(x)}, \quad x \in \mathbb{R} \text{ such that } q(x) \neq 0.
\]
The polynomial functions $u$ and $r$ are determined using long division. A partial fraction decomposition can now be determined for $\dfrac{r(x)}{q(x)}$. In this regard, consider the following example.

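Example 2.6.7. We evaluate the integral
\[
\int \frac{3x^4 - x^3 + 4x - 2}{3x^2 - x} \, dx.
\]

Solution. Let $p(x) = 3x^4 - x^3 + 4x - 2$ and $q(x) = 3x^2 - x$ for all $x \in \mathbb{R}$. Note that $p$ has degree 4 and $q$ has degree 2. We therefore perform long division to find polynomial functions $u$ and $r$ so that
\[
\frac{p(x)}{q(x)} = u(x) + \frac{r(x)}{q(x)}
\]
for all $x \in \mathbb{R}$ such that $q(x) \neq 0$. Dividing $3x^4$ by $3x^2$ gives the quotient term $x^2$, and subtracting $x^2(3x^2 - x) = 3x^4 - x^3$ from $p(x)$ leaves the remainder $4x - 2$:
\[
3x^4 - x^3 + 4x - 2 = x^2 (3x^2 - x) + (4x - 2).
\]
Therefore $u(x) = x^2$ and $r(x) = 4x - 2$ so that
\[
\frac{p(x)}{q(x)} = x^2 + \frac{4x - 2}{3x^2 - x} = x^2 + \frac{4x - 2}{x(3x - 1)} \quad \text{for } x \neq 0 \text{ and } x \neq \tfrac{1}{3}.
\]
There exist unique real numbers $A$ and $B$ so that
\[
\frac{4x - 2}{3x^2 - x} = \frac{A}{x} + \frac{B}{3x - 1} \quad \text{for } x \neq 0 \text{ and } x \neq \tfrac{1}{3}.
\]
For $x \neq 0$ and $x \neq \tfrac{1}{3}$ we have
\[
\frac{A}{x} + \frac{B}{3x - 1} = \frac{(3A + B)x - A}{3x^2 - x}.
\]
Therefore
\[
\frac{4x - 2}{3x^2 - x} = \frac{A}{x} + \frac{B}{3x - 1} \quad \text{for } x \neq 0 \text{ and } x \neq \tfrac{1}{3}
\]
if and only if
\[
4x - 2 = (3A + B)x - A
\]
for all $x \in \mathbb{R}$ so that
\[
\begin{aligned}
3A + B &= 4 \\
-A &= -2.
\end{aligned}
\]
Solving the system of equations we find that
\[
A = 2 \quad \text{and} \quad B = -2.
\]
Therefore
\[
\frac{4x - 2}{3x^2 - x} = \frac{2}{x} - \frac{2}{3x - 1} \quad \text{for } x \neq 0 \text{ and } x \neq \tfrac{1}{3}
\]
so that
\[
\begin{aligned}
\int \frac{3x^4 - x^3 + 4x - 2}{3x^2 - x} \, dx
&= \int \left( x^2 + \frac{2}{x} - \frac{2}{3x - 1} \right) dx \\
&= \frac{x^3}{3} + 2 \ln|x| - \frac{2 \ln|3x - 1|}{3} + C.
\end{aligned}
\]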
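The long division step in this example can also be carried out by a short program. In the Python sketch below a polynomial is represented by its list of coefficients, highest power first; this representation and the function name `poly_divmod` are our own choices and do not come from these notes. The leading coefficient of the divisor is assumed to be nonzero.

```python
from fractions import Fraction


def poly_divmod(p, q):
    """Polynomial long division.

    p and q are coefficient lists, highest degree first, with q[0] != 0.
    Returns (u, r) with p = q * u + r and deg r < deg q, where r has
    len(q) - 1 coefficients (possibly with leading zeros).
    """
    r = [Fraction(c) for c in p]
    u = []
    while len(r) >= len(q):
        # Divide the leading terms, then subtract factor * q from the front of r.
        factor = r[0] / q[0]
        u.append(factor)
        for k, c in enumerate(q):
            r[k] -= factor * c
        r.pop(0)  # the leading coefficient is now zero
    return u, r


# p(x) = 3x^4 - x^3 + 4x - 2 and q(x) = 3x^2 - x from Example 2.6.7.
u, r = poly_divmod([3, -1, 0, 4, -2], [3, -1, 0])
print(u, r)
```

The quotient coefficients `[1, 0, 0]` represent $u(x) = x^2$ and the remainder `[4, -2]` represents $r(x) = 4x - 2$, as found above.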

It is sometimes possible to convert a given integral into an integral of a rational function by making an appropriate substitution. We demonstrate this idea in the following example.

Example 2.6.8. We evaluate the integral
\[
\int_{\ln 2}^{\ln 3} \frac{2e^{4x} + 10e^{3x} + 10e^{2x} - 2e^x}{(e^{2x} - 1)(e^{2x} + 4e^x + 5)} \, dx.
\]

Solution. For $x \in \mathbb{R}$, let $u = e^x$. Then
\[
\begin{aligned}
\frac{2e^{4x} + 10e^{3x} + 10e^{2x} - 2e^x}{(e^{2x} - 1)(e^{2x} + 4e^x + 5)}
&= \frac{2u^4 + 10u^3 + 10u^2 - 2u}{(u^2 - 1)(u^2 + 4u + 5)} \\
&= \frac{2u^3 + 10u^2 + 10u - 2}{(u^2 - 1)(u^2 + 4u + 5)} \times u.
\end{aligned}
\]
But
\[
\frac{du}{dx} = e^x = u
\]
so that
\[
\frac{2e^{4x} + 10e^{3x} + 10e^{2x} - 2e^x}{(e^{2x} - 1)(e^{2x} + 4e^x + 5)}
= \frac{2u^3 + 10u^2 + 10u - 2}{(u^2 - 1)(u^2 + 4u + 5)} \times \frac{du}{dx}.
\]
Also note that
\[
x = \ln 2 \text{ if and only if } u = 2
\]
and
\[
x = \ln 3 \text{ if and only if } u = 3.
\]
Therefore
\[
\int_{\ln 2}^{\ln 3} \frac{2e^{4x} + 10e^{3x} + 10e^{2x} - 2e^x}{(e^{2x} - 1)(e^{2x} + 4e^x + 5)} \, dx
= \int_2^3 \frac{2u^3 + 10u^2 + 10u - 2}{(u^2 - 1)(u^2 + 4u + 5)} \, du.
\]
Let $p(u) = 2u^3 + 10u^2 + 10u - 2$ and $q(u) = (u^2 - 1)(u^2 + 4u + 5)$ for all $u \in \mathbb{R}$. We have
\[
q(u) = (u - 1)(u + 1)(u^2 + 4u + 5), \quad u \in \mathbb{R}.
\]
Because
\[
4^2 - 4 \times 1 \times 5 = -4 < 0,
\]
the quadratic term $u^2 + 4u + 5$ is irreducible. Therefore there exist unique real numbers $A$, $B$, $C$ and $D$ so that
\[
\frac{p(u)}{q(u)} = \frac{A}{u - 1} + \frac{B}{u + 1} + \frac{Cu + D}{u^2 + 4u + 5}, \quad u \neq \pm 1.
\]
For $u \neq \pm 1$ we have
\[
\begin{aligned}
&\frac{A}{u - 1} + \frac{B}{u + 1} + \frac{Cu + D}{u^2 + 4u + 5} \\
&\quad = \frac{A(u + 1)(u^2 + 4u + 5) + B(u - 1)(u^2 + 4u + 5) + (Cu + D)(u - 1)(u + 1)}{(u - 1)(u + 1)(u^2 + 4u + 5)} \\
&\quad = \frac{(A + B + C)u^3 + (5A + 3B + D)u^2 + (9A + B - C)u + (5A - 5B - D)}{q(u)}.
\end{aligned}
\]
Therefore
\[
\frac{p(u)}{q(u)} = \frac{A}{u - 1} + \frac{B}{u + 1} + \frac{Cu + D}{u^2 + 4u + 5}, \quad u \neq \pm 1
\]
if and only if
\[
2u^3 + 10u^2 + 10u - 2 = p(u) = (A + B + C)u^3 + (5A + 3B + D)u^2 + (9A + B - C)u + (5A - 5B - D)
\]
for all $u \in \mathbb{R}$ so that
\[
\begin{aligned}
A + B + C &= 2 \\
5A + 3B + D &= 10 \\
9A + B - C &= 10 \\
5A - 5B - D &= -2.
\end{aligned}
\]
Solving the system of equations, we have
\[
A = 1, \quad B = 1, \quad C = 0 \quad \text{and} \quad D = 2.
\]
Therefore
\[
\frac{p(u)}{q(u)} = \frac{1}{u - 1} + \frac{1}{u + 1} + \frac{2}{u^2 + 4u + 5}, \quad u \neq \pm 1
\]
so that
\[
\begin{aligned}
\int_{\ln 2}^{\ln 3} \frac{2e^{4x} + 10e^{3x} + 10e^{2x} - 2e^x}{(e^{2x} - 1)(e^{2x} + 4e^x + 5)} \, dx
&= \int_2^3 \frac{2u^3 + 10u^2 + 10u - 2}{(u^2 - 1)(u^2 + 4u + 5)} \, du \\
&= \int_2^3 \left( \frac{1}{u - 1} + \frac{1}{u + 1} + \frac{2}{u^2 + 4u + 5} \right) du \\
&= \int_2^3 \left( \frac{1}{u - 1} + \frac{1}{u + 1} + \frac{2}{(u + 2)^2 + 1} \right) du \\
&= \Big[ \ln|u - 1| + \ln|u + 1| + 2 \arctan(u + 2) \Big]_2^3 \\
&= 3 \ln 2 - \ln 3 + 2 \arctan(5) - 2 \arctan(4).
\end{aligned}
\]

We end this section by stating, for the sake of completeness, the general result on the
partial fraction decomposition of a rational function. The proof of this result is beyond the
scope of this book, and is based on a generalisation of the Division Algorithm, see Theorem
7.2.2, called Bézout’s Identity. Alternatively, one could prove the result using the theory of
systems of linear equations, matrices and determinants, but this leads to extremely messy
symbolic calculations.

Theorem 2.6.9. Let $p$ and $q$ be polynomial functions on $\mathbb{R}$. Assume that the following is true.

(1) The degree of $p$ is strictly less than the degree of $q$, and $q$ has degree at least 1.

(2) The polynomial function $q$ can be written as the product of powers of distinct linear and quadratic terms
\[
q(x) = (a_1 x + b_1)^{n_1} \cdots (a_k x + b_k)^{n_k} (c_1 x^2 + d_1 x + e_1)^{m_1} \cdots (c_r x^2 + d_r x + e_r)^{m_r}, \quad x \in \mathbb{R}
\]
where $n_1, \dots, n_k \geq 0$ and $m_1, \dots, m_r \geq 0$ are integers, $a_i$, $b_i$, $c_j$, $d_j$ and $e_j$ are real numbers such that $a_i \neq 0$ and $d_j^2 - 4c_j e_j < 0$ for all $i = 1, \dots, k$ and $j = 1, \dots, r$.

Then for every $i = 1, \dots, k$ there exist unique real numbers $A_{i,1}, \dots, A_{i,n_i}$, and for every $j = 1, \dots, r$ there exist unique real numbers $B_{j,1}, \dots, B_{j,m_j}$ and $C_{j,1}, \dots, C_{j,m_j}$ such that
\[
\frac{p(x)}{q(x)} = \sum_{i=1}^{k} \sum_{t=1}^{n_i} \frac{A_{i,t}}{(a_i x + b_i)^t}
+ \sum_{j=1}^{r} \sum_{s=1}^{m_j} \frac{B_{j,s} x + C_{j,s}}{(c_j x^2 + d_j x + e_j)^s}
\]
for every $x \in \mathbb{R}$ such that $q(x) \neq 0$.

In this section, it is shown how the theory of systems of linear equations, matrices and
determinants is used to solve a problem of integration. The method we arrive at, partial fraction
decomposition of rational functions, can be used, in principle, to evaluate the integral of
any rational function, provided that we can determine the factors of its denominator.
However, once we have determined the partial fraction decomposition of a given rational
function, it may be very difficult to find the antiderivatives of the resulting functions. This
is particularly true in the case of repeated quadratic factors in the denominator. In order
to deal with this case in general, additional integration techniques are required. These are
discussed in Section 4.2.

Exercise 2.6

1. Evaluate the following integrals.

(a) \( \int_{1}^{2} \frac{x+4}{2x^2+x-1}\,dx \)
(b) \( \int_{2}^{3} \frac{1-3x}{-6x^2+7x-2}\,dx \)
(c) \( \int_{2}^{3} \frac{4x^2-7x-1}{(2x-1)(x^2+2x-3)}\,dx \)
(d) \( \int_{2}^{3} \frac{2x^3+x^2-4x+2}{x^4-x^3}\,dx \)
(e) \( \int_{3}^{4} \frac{6x-2x^3-4x^2+4}{(x^2-1)^2}\,dx \)
(f) \( \int_{\sqrt{3}}^{3} \frac{2x^3-x^2+3x-1}{2x^2-x-1}\,dx \)
(g) \( \int_{2}^{3} \frac{8x^2+2x-1}{(x+1)(4x^2+1)}\,dx \)
(h) \( \int_{0}^{1} \frac{3x+4}{(2x+1)(x^2+1)}\,dx \)
(i) \( \int_{1}^{\sqrt{3}} \frac{2x^2+2x+3}{x^3+3x}\,dx \)
(j) \( \int_{0}^{1} \frac{3x^2+6x+7}{(x+1)(x^2+2x+3)}\,dx \)
(k) \( \int_{1/\sqrt{2}}^{1} \frac{12x^2-2x^3-5x+2}{(1-2x)(2x^2+1)^2}\,dx \)
(l) \( \int_{-1}^{0} \frac{16x^3+40x^2+34x+13}{4(2x^2+2x+1)^2}\,dx \)
(m) \( \int_{-2}^{-1} \frac{x^5+4x^4+5x^3+2x^2+5x+5}{x^3+4x^2+5x}\,dx \)

2. Evaluate the following integrals by first making an appropriate substitution.

(a) \( \int_{\pi/6}^{\pi/4} \frac{\sin^2 x\cos x - 2\sin x\cos x - \cos x}{\sin^3 x - \sin x}\,dx \)
(b) \( \int_{0}^{\ln(\pi/3)} \frac{3e^{3x}+2e^{2x}+e^{x}}{e^{3x}+e^{x}}\,dx \)
(c) \( \int_{1}^{6} \frac{1}{2\sqrt{x+3}+3}\,dx \)

3. Let a, b and c be real numbers with c ̸= 0. Prove that there exist unique real numbers
A and B so that
\[
\frac{ax+b}{x(x+c)} = \frac{A}{x} + \frac{B}{x+c}, \qquad x \neq 0 \text{ and } x \neq -c.
\]

Chapter 3

The Definition of a Limit

The concept of a limit is fundamental to Calculus. Indeed, the derivative of a function is
defined in terms of a specific limit, and the definite integral is also defined as a type of limit.
It is possible to grasp some of the basic concepts of Calculus with only an intuitive idea
of what limits are. However, for much of Calculus a deeper understanding of this central
concept is required. In this chapter, we grasp the nettle that is the precise definition
of a limit. As an application, in Chapter 4 we prove a version of the Fundamental Theorem
of Calculus, a cornerstone result in Mathematics, and develop some of the consequences of
this mighty theorem.

3.1 The Limit at a Point

Throughout this section, a will denote a fixed real number and f , g, h etc. functions from
R to R that are defined on an open interval (c, d) containing a, except possibly at a itself.
For instance, we may take a = 1 and f the function
f(x) = (x² − 1)/(x − 1), x ≠ 1.
Clearly, f is defined for every real number, except for a = 1.
Definition 3.1.1. Let f be a function defined on some open interval containing a, except
possibly at a, and L a real number. We say that lim_{x→a} f(x) = L if the following holds: For
every ϵ > 0 there exists a number δ > 0 so that for all x ∈ R,
if 0 < |x − a| < δ then |f(x) − L| < ϵ.
Remark 3.1.2. Definition 3.1.1 makes precise our intuitive understanding of what it means
for ‘f (x) to tend towards L as x tends towards a’: The values f (x) of f are as close to L
as we like, provided only that we take x close enough to a. The real number ϵ > 0 specifies
how close we want f (x) to be to L, and the number δ > 0 tells us how close we need to take
x to a; see the figure below.

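[Figure: the graph of y = f(x) near x = a, with the values L − ϵ, L and L + ϵ marked on the y-axis and the points a − δ, a and a + δ marked on the x-axis.]
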
Note that |f (x) − L| < ϵ if and only if L − ϵ < f (x) < L + ϵ, and 0 < |x − a| < δ if and
only if a − δ < x < a + δ and x ̸= a.
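
The definition can be explored numerically. The sketch below tests a proposed δ for a given ϵ by sampling points x with 0 < |x − a| < δ, using the function f(x) = (x² − 1)/(x − 1) from the start of this section with a = 1 and L = 2. The helper name `delta_works` is hypothetical. Passing such a test at finitely many sample points is only evidence, not a proof that the definition is satisfied.

```python
def delta_works(f, a, L, eps, delta, samples=10_000):
    """Return True if |f(x) - L| < eps at every sampled x with 0 < |x - a| < delta."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # 0 < offset < delta
        for x in (a - offset, a + offset):
            if abs(f(x) - L) >= eps:
                return False
    return True

def f(x):
    # Defined for every real number except x = 1.
    return (x * x - 1) / (x - 1)

# For x != 1 we have f(x) = x + 1, so |f(x) - 2| = |x - 1| and delta = eps is a natural guess.
for eps in (1.0, 0.1, 0.001):
    print(eps, delta_works(f, a=1.0, L=2.0, eps=eps, delta=eps))
```

A δ that is too large for the given ϵ is rejected: `delta_works(f, 1.0, 2.0, 0.1, 0.5)` returns `False`.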

The remainder of this section is devoted to examples of how the definition is applied. In
each case, our task is to verify that Definition 3.1.1 is satisfied. You should note that the
important aspect in the examples to follow is not the particular algebraic manipulations,
but the underlying logic of the argument.
Example 3.1.3. We prove that the limit of f(x) = 2x − 1 as x tends to 3 is 5; that is,
lim_{x→3} f(x) = 5.

Solution. First note that for x ∈ R,


|f (x) − 5| = |2x − 1 − 5|

= |2x − 6| (3.1)

= 2|x − 3|.
Fix any ϵ > 0. Choose δ = ϵ/2. Then (3.1) implies that
if 0 < |x − 3| < δ then |f(x) − 5| = 2|x − 3| < 2δ = 2 × ϵ/2 = ϵ.
Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→3} f(x) = 5.

Example 3.1.4. We prove that the limit of h(x) = 2x cos(x² − 1) + 3 as x tends to 0 is 3;
that is, lim_{x→0} h(x) = 3.

Solution. First note that | cos α| ≤ 1 for all α ∈ R. Therefore we have


|h(x) − 3| = |2x cos(x² − 1) + 3 − 3|

= 2|x||cos(x² − 1)| (3.2)

≤ 2|x − 0|

for all x ∈ R. Fix any ϵ > 0. Choose δ = ϵ/2.

Let 0 < |x − 0| < δ. Then (3.2) implies that


|h(x) − 3| ≤ 2|x − 0| < 2δ = 2 × ϵ/2 = ϵ.
Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→0} h(x) = 3.

Example 3.1.5. We prove that the limit of g(x) = x² + 1 as x tends to 1 is 2; that is,
lim_{x→1} g(x) = 2.

Solution. First note that for x ∈ R we have


|g(x) − 2| = |x² + 1 − 2|

= |x² − 1| (3.3)

= |x + 1||x − 1|.
Note further that
if |x − 1| < 1 then |x + 1| = |x − 1 + 2| ≤ |x − 1| + |2| < 3.
Therefore (3.3) implies that
if |x − 1| < 1 then |g(x) − 2| ≤ 3|x − 1|. (3.4)
Fix any ϵ > 0. Choose δ = min{1, ϵ/3}.

Let 0 < |x − 1| < δ. Because δ ≤ 1, it follows that |x − 1| < 1. Therefore (3.4) implies that
|g(x) − 2| ≤ 3|x − 1| < 3δ.
But δ ≤ ϵ/3, so
|g(x) − 2| < 3δ ≤ 3 × ϵ/3 = ϵ.
Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→1} g(x) = 2.
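
The role of the 1 in δ = min{1, ϵ/3} can be seen numerically. The sketch below, with the hypothetical helper `violates`, searches sample points for a counterexample to the required inequality. For a large ϵ, the choice δ = ϵ/3 on its own fails, because the bound |x + 1| < 3 is available only when |x − 1| < 1. The search is a finite sampling check, not a proof.

```python
def g(x):
    return x * x + 1

def violates(eps, delta, samples=1000):
    """Return a sampled x with 0 < |x - 1| < delta and |g(x) - 2| >= eps, or None."""
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)  # 0 < offset < delta
        for x in (1 - offset, 1 + offset):
            if abs(g(x) - 2) >= eps:
                return x
    return None

eps = 30.0
print(violates(eps, delta=eps / 3))          # a sampled x at which the bound fails
print(violates(eps, delta=min(1, eps / 3)))  # None: no violation found
```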

Example 3.1.6. We prove that the limit of f(x) = 2x² + x − 2 as x tends to −2 is 4; that
is, lim_{x→−2} f(x) = 4.

Solution. First note that


|f(x) − 4| = |2x² + x − 2 − 4|

= |2x² + x − 6|

= |(2x − 3)(x + 2)| (3.5)

= |2x − 3||x + 2|

for all x ∈ R. Note further that

if |x + 2| < 1 then − 1 < x + 2 < 1 so that − 3 < x < −1.

Hence

if |x + 2| < 1 then − 6 < 2x < −2 so that − 9 < 2x − 3 < −5.

We now have

if |x + 2| < 1 then |2x − 3| < 9.

Therefore (3.5) implies that

if |x + 2| < 1 then |f(x) − 4| ≤ 9|x + 2|. (3.6)

Fix any ϵ > 0. Choose δ = min{1, ϵ/9}.

Let 0 < |x + 2| < δ. Since δ ≤ 1, it follows that |x + 2| < 1. Therefore (3.6) implies that
|f(x) − 4| ≤ 9|x + 2| < 9δ.
But δ ≤ ϵ/9, so
|f(x) − 4| < 9δ ≤ 9 × ϵ/9 = ϵ.
Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→−2} f(x) = 4.

Example 3.1.7. We prove that the limit of h(x) = 1/(2x) as x tends to 2 is 1/4; that is,
lim_{x→2} h(x) = 1/4.

Solution. If x ≠ 0, then

|h(x) − 1/4| = |1/(2x) − 1/4|

= |(2 − x)/(4x)|

= |2 − x|/|4x| (3.7)

= (1/|4x|) |2 − x|.

Note that if |x − 2| < 1 then −1 < x − 2 < 1 so that x > 1. Therefore

if |x − 2| < 1 then |4x| > 4 > 0 so that 1/|4x| < 1/4.

Therefore (3.7) implies that

if |x − 2| < 1 then |h(x) − 1/4| ≤ (1/4)|x − 2|. (3.8)

Fix any ϵ > 0. Choose δ = min{1, 4ϵ}.

Let 0 < |x − 2| < δ. Because δ ≤ 1, it follows that |x − 2| < 1. Therefore (3.8) implies that

|h(x) − 1/4| ≤ (1/4)|x − 2| < (1/4)δ.

But δ ≤ 4ϵ so that

|h(x) − 1/4| < (1/4)δ ≤ (1/4) × 4ϵ = ϵ.

Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→2} h(x) = 1/4.

Example 3.1.8. We prove that f(x) = (3x + 1)/(x² + 1) tends to 2 as x tends to 1. That is, we
prove that lim_{x→1} f(x) = 2.

Solution. For all x ∈ R we have

|f(x) − 2| = |(3x + 1)/(x² + 1) − 2|

= |(3x − 2x² − 1)/(x² + 1)|

= |2x² − 3x + 1|/(x² + 1).

Therefore, because x² + 1 ≥ 1, we have

|f(x) − 2| ≤ |3x − 2x² − 1|

= |2x − 1||1 − x| (3.9)

= |2x − 1||x − 1|.

Now note that

if |x − 1| < 1 then − 1 < x − 1 < 1 so that 0 < x < 2.

Thus

if |x − 1| < 1 then 0 < 2x < 4 so that − 1 < 2x − 1 < 3.

We therefore have

if |x − 1| < 1 then |2x − 1| < 3.

Then (3.9) implies that
if |x − 1| < 1 then |f(x) − 2| ≤ 3|x − 1|. (3.10)
Fix any ϵ > 0. Choose δ = min{1, ϵ/3}.

Let 0 < |x − 1| < δ. Since δ ≤ 1, it follows that |x − 1| < 1. Therefore (3.10) implies that
|f(x) − 2| ≤ 3|x − 1| < 3δ.
But δ ≤ ϵ/3. Therefore
|f(x) − 2| < 3δ ≤ 3 × ϵ/3 = ϵ.
Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→1} f(x) = 2.

The formal definition of a limit, as introduced in this section, is of theoretical importance.
As we will see in subsequent sections, and in some of the exercises that follow, it is an
indispensable tool for proving and understanding important results in mathematics. However,
working out examples such as those presented in this section is an excellent way
to become accustomed to the type of reasoning that underlies the theoretical applications
which we will deal with.

Exercise 3.1

1. Prove the following using the definition of a limit.

(a) lim_{x→4} [7x − 3] = 25
(b) lim_{x→0} [x cos(5x) + x] = 0
(c) lim_{x→0} [x e^{2x} + 3x⁴] = 0
(d) lim_{x→2} [x² + 3x + 1] = 11
(e) lim_{x→0} [x e^{−x²}] = 0
(f) lim_{x→3} [1/(x + 1)] = 1/4
(g) lim_{x→−2} [3x² + x + 2] = 12
(h) lim_{x→2} [x³ − x² + x − 1] = 5
(i) lim_{x→1} [x⁴] = 1
(j) lim_{x→2} [(x² + 1)/(x² + 2)] = 5/6
(k) lim_{x→3} [(x + 1)/(x − 1)] = 2
(l) lim_{x→−1} [x³ − 2x² + x] = −4

2. Suppose that f : R → R is continuous at c, and that f (c) > k for some real number
k. Prove that there exists a number δ > 0 so that f (x) > k for all x ∈ (c − δ, c + δ).
3. Suppose that f : R → R is differentiable at x = a. Prove that there exists a real
number δ > 0 so that |f (x) − f (a)| ≤ (1 + |f ′ (a)|)|x − a| whenever |x − a| < δ. [HINT:
If α and β are real numbers, then |α| − |β| ≤ ||α| − |β|| ≤ |α − β|.]

4. This exercise shows that any continuous function from R to R can be written as
the difference of two continuous functions with non-negative values. Consider a function
f : R → R. For each x ∈ R, let f⁺(x) = max{f(x), 0} and f⁻(x) = max{−f(x), 0}.
(a) Prove that |f⁺(a) − f⁺(x)| ≤ |f(a) − f(x)| for all a, x ∈ R.
(b) If f is continuous at a ∈ R, prove that f⁺ is continuous at a.
(c) Repeat questions (a) and (b) for the function f⁻.
(d) Show that f(x) = f⁺(x) − f⁻(x) for every x ∈ R.

3.2 The Limit Laws

In this section we consider a first application of the precise definition of a limit, discussed in
Section 3.1. We give rigorous proofs of the so-called ‘limit laws’. These are the operational
properties of limits that allow us to avoid having to deal with the definition every time we
encounter a limit.

We consider first the way in which basic algebraic operations affect limits.
Theorem 3.2.1. Suppose that f and g are functions defined on an open interval I containing
the point a, but possibly not at a. If lim_{x→a} f(x) = L and lim_{x→a} g(x) = M, with L and M
real numbers, then the following statements are true.

(1) lim_{x→a} [f(x) + g(x)] exists, and lim_{x→a} [f(x) + g(x)] = L + M.

(2) lim_{x→a} cf(x) exists and lim_{x→a} cf(x) = cL for all c ∈ R.

(3) lim_{x→a} [f(x)g(x)] exists and lim_{x→a} [f(x)g(x)] = LM.

(4) If M ≠ 0, then lim_{x→a} f(x)/g(x) exists and lim_{x→a} f(x)/g(x) = L/M.

Proof of (1). First note that, by the Triangle Inequality,


|[f (x) + g(x)] − [L + M ]| = |[f (x) − L] + [g(x) − M ]|
(3.11)
≤ |f (x) − L| + |g(x) − M |.
Fix ϵ > 0. Since lim_{x→a} f(x) = L there exists a number δ1 > 0 so that

if 0 < |x − a| < δ1 then |f(x) − L| < ϵ/2. (3.12)

Since lim_{x→a} g(x) = M there exists a number δ2 > 0 so that

if 0 < |x − a| < δ2 then |g(x) − M| < ϵ/2. (3.13)

Choose δ = min{δ1, δ2}.

Let 0 < |x − a| < δ. Because δ ≤ δ1 and δ ≤ δ2 , it follows that

0 < |x − a| < δ1 and 0 < |x − a| < δ2 .

It now follows from (3.12) and (3.13) that


|f(x) − L| < ϵ/2 and |g(x) − M| < ϵ/2.
The inequality (3.11) now implies that
|[f(x) + g(x)] − [L + M]| < ϵ/2 + ϵ/2 = ϵ.
Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→a} [f(x) + g(x)]
exists, and
lim_{x→a} [f(x) + g(x)] = L + M.

This completes the proof.
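
The choice δ = min{δ1, δ2} in the proof can be illustrated with concrete functions. In the sketch below, f(x) = 2x − 1 is taken from Example 3.1.3, while g(x) = 5x and the helper names `delta_for_sum` and `check` are hypothetical choices made for illustration. Each of δ1 and δ2 controls one summand with tolerance ϵ/2, and the smaller of the two controls the sum. The sampling check is only a sanity test, not a proof.

```python
def f(x):
    return 2 * x - 1   # tends to L = 5 as x tends to 3

def g(x):
    return 5 * x       # tends to M = 15 as x tends to 3

def delta_for_sum(eps):
    delta1 = (eps / 2) / 2   # ensures |f(x) - 5| = 2|x - 3| < eps/2
    delta2 = (eps / 2) / 5   # ensures |g(x) - 15| = 5|x - 3| < eps/2
    return min(delta1, delta2)

def check(eps, samples=1000):
    """Sample 0 < |x - 3| < delta and test |(f(x) + g(x)) - 20| < eps."""
    delta = delta_for_sum(eps)
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)
        for x in (3 - offset, 3 + offset):
            if abs((f(x) + g(x)) - 20) >= eps:
                return False
    return True

print(all(check(eps) for eps in (1.0, 0.1, 0.001)))
```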

Proof of (2). We have

|cf (x) − cL| = |c(f (x) − L)| = |c| |f (x) − L|. (3.14)

Fix ϵ > 0. Since lim_{x→a} f(x) = L, there exists a number δ > 0 so that

if 0 < |x − a| < δ then |f(x) − L| < ϵ/(|c| + 1). (3.15)

Let 0 < |x − a| < δ. It follows from (3.14) and (3.15) that

|cf(x) − cL| ≤ |c| × ϵ/(|c| + 1) < ϵ.

Since this is true for all ϵ > 0 it follows by the definition of a limit that lim_{x→a} cf(x) exists,
and lim_{x→a} cf(x) = cL.

Proof of (3). By the Triangle Inequality

|f (x)g(x) − LM | = |f (x)g(x) − Lg(x) + Lg(x) − LM |

≤ |f (x)g(x) − Lg(x)| + |Lg(x) − LM | (3.16)

= |g(x)| |f (x) − L| + |L| |g(x) − M |.

Fix ϵ > 0. Since lim_{x→a} g(x) = M, there exists a number δ1 > 0 so that

if 0 < |x − a| < δ1 then |g(x) − M| < 1.

Therefore
if 0 < |x − a| < δ1 then |g(x)| = |g(x) − M + M| ≤ |g(x) − M| + |M| < 1 + |M|. (3.17)
Since lim_{x→a} g(x) = M, there exists a number δ2 > 0 so that
if 0 < |x − a| < δ2 then |g(x) − M| < ϵ/(2(|L| + 1)). (3.18)
Since lim_{x→a} f(x) = L, there exists a number δ3 > 0 so that
if 0 < |x − a| < δ3 then |f(x) − L| < ϵ/(2(|M| + 1)). (3.19)
Choose δ = min{δ1, δ2, δ3}.

Let 0 < |x − a| < δ. Note that δ ≤ δ1 , δ ≤ δ2 and δ ≤ δ3 . Therefore


0 < |x − a| < δ1 and 0 < |x − a| < δ2 and 0 < |x − a| < δ3 .
It now follows from (3.17), (3.18) and (3.19) that
|g(x)| < 1 + |M| and |g(x) − M| < ϵ/(2(|L| + 1)) and |f(x) − L| < ϵ/(2(|M| + 1)).
Therefore (3.16) implies that
|f(x)g(x) − LM| < [1 + |M|] × ϵ/(2(|M| + 1)) + |L| × ϵ/(2(|L| + 1))

< ϵ/2 + ϵ/2

= ϵ.
This is true for all ϵ > 0, so it follows by the definition of a limit that lim_{x→a} [f(x)g(x)] exists
and lim_{x→a} [f(x)g(x)] = LM.
x→a

Proof of (4). If g(x) ≠ 0, then it follows from the Triangle Inequality that

|f(x)/g(x) − L/M| = |(f(x)M − g(x)L)/(g(x)M)|

= |(f(x)M − LM + LM − g(x)L)/(g(x)M)|

≤ (|f(x)M − LM| + |LM − g(x)L|)/(|g(x)| |M|) (3.20)

= (|M| |f(x) − L| + |L| |g(x) − M|)/(|g(x)| |M|).

Since M ≠ 0, it follows that |M|/2 > 0. Since lim_{x→a} g(x) = M, there exists a number δ1 > 0 so
that
if 0 < |x − a| < δ1 then |g(x) − M| < |M|/2.
Therefore, by the Triangle Inequality, if 0 < |x − a| < δ1 then
|M| = |M − g(x) + g(x)| ≤ |M − g(x)| + |g(x)| < |M|/2 + |g(x)|.
Hence
if 0 < |x − a| < δ1 then |g(x)| > |M| − |M|/2 = |M|/2 > 0. (3.21)
It now follows from (3.20) and (3.21) that
if 0 < |x − a| < δ1 then |f(x)/g(x) − L/M| ≤ 2|f(x) − L|/|M| + 2|L| |M − g(x)|/|M|². (3.22)
Fix ϵ > 0. Since lim_{x→a} g(x) = M, there exists a number δ2 > 0 so that

if 0 < |x − a| < δ2 then |g(x) − M| < ϵ|M|²/(4(|L| + 1)). (3.23)

Since lim_{x→a} f(x) = L, there exists a number δ3 > 0 so that

if 0 < |x − a| < δ3 then |f(x) − L| < ϵ|M|/4. (3.24)

Choose δ = min{δ1, δ2, δ3}.

Let 0 < |x − a| < δ. Note that δ ≤ δ1, δ ≤ δ2 and δ ≤ δ3. Therefore

0 < |x − a| < δ1 and 0 < |x − a| < δ2 and 0 < |x − a| < δ3.
It now follows from (3.22), (3.23) and (3.24) that
|f(x)/g(x) − L/M| < (2/|M|) × ϵ|M|/4 + (2|L|/|M|²) × ϵ|M|²/(4(|L| + 1)) < ϵ/2 + ϵ/2 = ϵ.
Since this is true for all ϵ > 0, it follows by the definition of a limit that lim_{x→a} f(x)/g(x) exists,
and lim_{x→a} f(x)/g(x) = L/M.

We now establish a comparison result for limits. The proof of this theorem makes use of
an important technique; namely, proof by contradiction. In essence, this method of proof
works as follows. We start by assuming that the conclusion of the theorem we wish to prove
is false, and then construct a logical argument leading to a result which we know to be
false. Since mathematics is not supposed to contain any contradictions, we conclude that
the conclusion of our theorem must be true.

Theorem 3.2.2. Suppose that f and g are functions defined on an open interval containing
the point a, but possibly not at a, and that lim_{x→a} f(x) = L and lim_{x→a} g(x) = M, with L and M
real numbers. If f(x) ≤ g(x) for all x in an open interval containing a, except possibly at
a, then L ≤ M.

Proof. Assume that the result is false; that is, suppose that L > M.
Then ϵ = (L − M)/2 > 0. Because lim_{x→a} f(x) = L there exists a real number δ1 > 0 so that

if 0 < |x − a| < δ1 then |f(x) − L| < ϵ.

Therefore
if 0 < |x − a| < δ1 then −ϵ < f(x) − L < ϵ so that f(x) > L − ϵ = (L + M)/2. (3.25)
Because lim_{x→a} g(x) = M there exists a real number δ2 > 0 so that

if 0 < |x − a| < δ2 then |g(x) − M| < ϵ.

Therefore
if 0 < |x − a| < δ2 then −ϵ < g(x) − M < ϵ so that g(x) < M + ϵ = (L + M)/2. (3.26)
Choose δ = min{δ1, δ2}.

Let 0 < |x − a| < δ. Because δ ≤ δ1 and δ ≤ δ2 ,

0 < |x − a| < δ1 and 0 < |x − a| < δ2 .

It follows from (3.25) and (3.26) that

g(x) < (L + M)/2 < f(x).
But this contradicts our assumption that f(x) ≤ g(x) for all x in an open interval containing
a, except possibly at a, since that interval contains points x with 0 < |x − a| < δ.
Therefore our assumption that L > M is false, so that L ≤ M.

The next theorem is a useful tool for establishing that the limit of a given function h, as
x tends to a, exists and equals some real number L. All that is required is to find two
functions f and g, both with limit L as x tends to a, so that f (x) ≤ h(x) ≤ g(x) for x in
some open interval containing a, except possibly when x = a. This result is often referred
to as the ‘Squeeze Theorem’.

Theorem 3.2.3. Suppose that f(x) ≤ h(x) ≤ g(x) for all x in an open interval containing
a, but possibly not at a. If lim_{x→a} f(x) = lim_{x→a} g(x) = L, with L a real number, then lim_{x→a} h(x)
exists, and lim_{x→a} h(x) = L.

Proof. Fix ϵ > 0. Since lim_{x→a} f(x) = L there exists a number δ1 > 0 so that

if 0 < |x − a| < δ1 then |f (x) − L| < ϵ.

Therefore

if 0 < |x − a| < δ1 then L − ϵ < f (x) < L + ϵ. (3.27)

Since lim_{x→a} g(x) = L there exists a number δ2 > 0 so that

if 0 < |x − a| < δ2 then |g(x) − L| < ϵ.

Therefore

if 0 < |x − a| < δ2 then L − ϵ < g(x) < L + ϵ. (3.28)

But f (x) ≤ h(x) ≤ g(x) for all x in an open interval containing a, but possibly not at a.
Therefore there exists a number δ3 > 0 so that

if 0 < |x − a| < δ3 then f (x) ≤ h(x) ≤ g(x). (3.29)

Choose δ = min{δ1 , δ2 , δ3 }.

Let 0 < |x − a| < δ. Note that δ ≤ δ1 , δ ≤ δ2 and δ ≤ δ3 . Therefore

0 < |x − a| < δ1 and 0 < |x − a| < δ2 and 0 < |x − a| < δ3 .

It now follows from (3.27), (3.28) and (3.29) that

L − ϵ < f (x) ≤ h(x) ≤ g(x) < L + ϵ.

Therefore −ϵ < h(x) − L < ϵ so that

|h(x) − L| < ϵ.

Since this is true for all ϵ > 0, it follows by the definition of a limit that lim_{x→a} h(x) exists, and
lim_{x→a} h(x) = L.
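
A standard illustration of the Squeeze Theorem, which does not appear in these notes and is chosen here as an assumption, is h(x) = x² sin(1/x) with lower bound −x² and upper bound x² as x tends to 0. The sketch below samples points near 0, checks the squeeze inequality, and checks that δ = √ϵ, which works for both bounding functions, also works for h. The helper name `squeeze_check` is hypothetical, and the sampling is a sanity check rather than a proof.

```python
import math

def lower(x):
    return -x * x

def upper(x):
    return x * x

def h(x):
    return x * x * math.sin(1 / x)   # defined for x != 0

def squeeze_check(eps, samples=1000):
    """Sample 0 < |x| < delta and test lower <= h <= upper and |h(x) - 0| < eps."""
    delta = math.sqrt(eps)   # |x| < delta gives |lower(x)| < eps and |upper(x)| < eps
    for k in range(1, samples + 1):
        offset = delta * k / (samples + 1)
        for x in (-offset, offset):
            if not (lower(x) <= h(x) <= upper(x)):
                return False
            if abs(h(x)) >= eps:
                return False
    return True

print(all(squeeze_check(eps) for eps in (1.0, 0.01, 1e-6)))
```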

The last result of this section deals with limits of composite functions.
Theorem 3.2.4. Assume that f is continuous at b, and lim_{x→a} g(x) = b. Then lim_{x→a} f(g(x))
exists, and lim_{x→a} f(g(x)) = f(b).

Proof. Fix ϵ > 0. Since f is continuous at b, it follows that lim_{t→b} f(t) = f(b). Therefore
there exists a number δ1 > 0 so that

if |t − b| < δ1 then |f(t) − f(b)| < ϵ. (3.30)

But lim_{x→a} g(x) = b and δ1 > 0. Therefore there exists a number δ > 0 so that

if 0 < |x − a| < δ then |g(x) − b| < δ1 . (3.31)

It follows from (3.30) and (3.31) that

if 0 < |x − a| < δ then |g(x) − b| < δ1 so |f (g(x)) − f (b)| < ϵ.

Since this is true for all ϵ > 0, it follows that lim_{x→a} f(g(x)) exists, and lim_{x→a} f(g(x)) = f(b).

As a first application of the precise definition of a limit, we established the validity of the
operational rules which are used to evaluate limits. Of much greater importance than the
proofs themselves are the techniques that are employed in the proofs. These ideas recur
with great frequency in Mathematics, and it is worthwhile identifying the particular ‘tricks’
used in the proofs, and the circumstances under which they can be used.

Exercise 3.2
1. Examine the proof of Theorem 3.2.1 (2). In (3.15), why do we take |f(x) − L| < ϵ/(|c| + 1)
instead of |f(x) − L| < ϵ/|c|?

2. Consider functions f : R → R and g : R → R and real numbers a, L and M. If
lim_{x→a} f(x) = L, and lim_{x→L} g(x) = M, is it necessarily true that lim_{x→a} g(f(x)) = M? If this
is true, prove it. Otherwise, give a counterexample.

3. Consider functions f : R → R and g : R → R and real numbers a, L and M. Assume
that lim_{x→a} f(x) = L, lim_{x→a} g(x) = M and L < M. Prove that there exists a number δ > 0
so that
if 0 < |x − a| < δ then f(x) < g(x).

4. Consider a function f : R → R. Assume that lim_{x→2} f(x) = 3. Use the definition of a
limit to prove that lim_{x→2} [x² f(x)] = 12.

5. Consider functions f and g defined on an open interval I containing the point a. If
both f and g are continuous at a, use appropriate theorems to prove that each of the
following functions is continuous at a.

(a) h(x) = f(x) + g(x), x ∈ I
(b) q(x) = f(x)/g(x), x ∈ I, if g(a) ≠ 0
(c) p(x) = f(x)g(x), x ∈ I
(d) u(x) = 2f(x) − 3g(x) + f(x)², x ∈ I


6. Let a be a real number. Use the Principle of Mathematical Induction, see Appendix
A.2, to prove the following statement. The function f (x) = xn is continuous at a for
every positive integer n.

7. Consider functions f, g, h : R → R such that f (x) ≤ h(x) ≤ g(x) for all x ∈ R. If f
and g are continuous at 0 and f (0) = g(0), prove that h is continuous at 0.

3.3 One-sided Limits

For a function f from R to R, it is only possible to define the limit of f (x) as x tends to a
when f is defined on some open interval containing a, except possibly at a itself. However,
there are many interesting and important situations that involve functions that are defined
only to the left or right of a point a. In order to deal with this kind of situation, the concept
of a one-sided limit is introduced.
Definition 3.3.1. Let f be a function defined on an open interval (a, b). We say that
lim_{x→a⁺} f(x) = L if the following holds: For every ϵ > 0 there exists a number δ > 0 so that
for all x ∈ R,
if a < x < a + δ then |f(x) − L| < ϵ. (3.32)
Definition 3.3.2. Let f be a function defined on an open interval (b, a). We say that
lim_{x→a⁻} f(x) = L if the following holds: For every ϵ > 0 there exists a number δ > 0 so that
for all x ∈ R,
if a − δ < x < a then |f(x) − L| < ϵ.

It should be noted that most of the results on limits at a point that are established in Section
3.2 hold also for one-sided limits. Indeed, Theorems 3.2.1, 3.2.2 and 3.2.3 are also valid for
one-sided limits. Since the formulations and proofs of these results are nearly identical to
those treated in Section 3.2, a lengthy discussion is unnecessary at this point. However,
since the Squeeze Theorem for one-sided limits will be used in Section 4.3, we state it explicitly.
Theorem 3.3.3. Let f, g and h be functions defined on an interval (a, b), and let L be a
real number. If lim_{x→a⁺} f(x) = L and lim_{x→a⁺} g(x) = L and f(x) ≤ h(x) ≤ g(x) for all x ∈ (a, b),
then lim_{x→a⁺} h(x) = L.

Theorem 3.3.4. Let f, g and h be functions defined on an interval (b, a), and let L be a
real number. If lim_{x→a⁻} f(x) = L and lim_{x→a⁻} g(x) = L and f(x) ≤ h(x) ≤ g(x) for all x ∈ (b, a),
then lim_{x→a⁻} h(x) = L.

Example 3.3.5. Consider the function
\[
f(x) = \begin{cases} x^2 & \text{if } x \le 2 \\ 5x - 3 & \text{if } x > 2. \end{cases}
\]
We prove that lim_{x→2⁺} f(x) = 7.

Solution. Let x > 2. Then
|f (x) − 7| = |5x − 3 − 7|

= |5x − 10|

= 5|x − 2|.

But x > 2 so that x − 2 > 0. Therefore

|f (x) − 7| = 5|x − 2| = 5(x − 2). (3.33)

Fix any ϵ > 0 and choose δ = ϵ/5.

Let 2 < x < 2 + δ. Then

0 < x − 2 < δ = ϵ/5.
Therefore (3.33) implies that
|f(x) − 7| < 5δ = 5 × ϵ/5 = ϵ.
Since this holds for all ϵ > 0 it follows from the definition of a one-sided limit that
lim_{x→2⁺} f(x) = 7.
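
The one-sided nature of this limit can be seen numerically. The sketch below checks the choice δ = ϵ/5 at sample points to the right of 2, and also prints a few values of f to the left of 2, which approach 4 rather than 7. The helper name `right_delta_works` is hypothetical, and the sampling is a sanity check rather than a proof.

```python
def f(x):
    return x * x if x <= 2 else 5 * x - 3

def right_delta_works(eps, delta, samples=1000):
    """Test |f(x) - 7| < eps at sampled x with 2 < x < 2 + delta."""
    return all(
        abs(f(2 + delta * k / (samples + 1)) - 7) < eps
        for k in range(1, samples + 1)
    )

print(right_delta_works(0.01, 0.01 / 5))
print([f(2 - t) for t in (0.1, 0.01, 0.001)])  # values from the left approach 4
```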

Example 3.3.6. Consider the function
\[
g(x) = \begin{cases} x^2 + 2x + 3 & \text{if } x \le 1 \\ \cos x & \text{if } x > 1. \end{cases}
\]
We prove that lim_{x→1⁻} g(x) = 6.

Solution. Let x < 1. Then


|g(x) − 6| = |x² + 2x + 3 − 6|

= |x² + 2x − 3|

= |x + 3||x − 1|.

But x < 1 so that x − 1 < 0. Therefore |x − 1| = 1 − x. Hence

|g(x) − 6| = |x + 3|(1 − x). (3.34)

Note that

if 0 < x < 1 then 3 < x + 3 < 4.

Therefore (3.34) implies that

if 0 < x < 1 then |g(x) − 6| < 4(1 − x). (3.35)

Fix any ϵ > 0. Choose δ = min{1, ϵ/4}.

Let 1 − δ < x < 1. Since δ ≤ 1 it follows that 0 < x < 1. Hence (3.35) implies that

|g(x) − 6| < 4(1 − x) < 4δ.

But δ ≤ ϵ/4. Therefore
|g(x) − 6| < 4δ ≤ 4 × ϵ/4 = ϵ.
Since this holds for all ϵ > 0 it follows from the definition of a one-sided limit that
lim_{x→1⁻} g(x) = 6.

The limit at a point, as defined in Definition 3.1.1, is related to one-sided limits in the
following way.
Theorem 3.3.7. Suppose that f is defined on an open interval containing the point a, but
possibly not at a, and let L be a real number. Then lim_{x→a} f(x) = L if and only if lim_{x→a⁻} f(x) = L
and lim_{x→a⁺} f(x) = L.

There are important cases in which lim_{x→a⁻} f(x) or lim_{x→a⁺} f(x) does not exist as a finite real
number, but we are still able to extract some information about the behaviour of the function
f near a. For instance, the values of the function f(x) = 1/x ‘tend to +∞’ as x tends to 0
from the right. The next two definitions make this idea precise.
Definition 3.3.8. Let f be a function defined on an open interval (a, b).

(1) We say that lim_{x→a⁺} f(x) = +∞ if the following holds: For every E > 0 there exists a
number δ > 0 so that for all x ∈ R,

if a < x < a + δ then f(x) > E.

(2) We say that lim_{x→a⁺} f(x) = −∞ if the following holds: For every E < 0 there exists a
number δ > 0 so that for all x ∈ R,

if a < x < a + δ then f(x) < E.

Definition 3.3.9. Let f be a function defined on an open interval (b, a).

(1) We say that lim_{x→a⁻} f(x) = +∞ if the following holds: For every E > 0 there exists a
number δ > 0 so that for all x ∈ R,

if a − δ < x < a then f(x) > E.

(2) We say that lim_{x→a⁻} f(x) = −∞ if the following holds: For every E < 0 there exists a
number δ > 0 so that for all x ∈ R,

if a − δ < x < a then f(x) < E.

We have four definitions for infinite one-sided limits for a function f : (a, b) → R; namely,
lim_{x→a⁺} f(x) = ∞, lim_{x→a⁺} f(x) = −∞, lim_{x→b⁻} f(x) = ∞ and lim_{x→b⁻} f(x) = −∞. However, as
we show in the next theorem, it is enough to thoroughly understand one of these, namely,
lim_{x→a⁺} f(x) = ∞.

Theorem 3.3.10. Let f be a function defined on an open interval (a, b). Then the following
statements are true.

(1) lim_{x→a⁺} f(x) = −∞ if and only if lim_{x→a⁺} [−f(x)] = ∞.

(2) lim_{x→b⁻} f(x) = −∞ if and only if lim_{x→b⁻} [−f(x)] = ∞.

(3) lim_{x→b⁻} f(x) = ∞ if and only if lim_{t→a⁺} f(a + b − t) = ∞.

Proof of (1). Assume that lim_{x→a⁺} f(x) = −∞. Fix E > 0. Then −E < 0, so by Definition
3.3.8 (2) there exists a real number δ > 0 such that

if a < x < a + δ then f(x) < −E.

Therefore
if a < x < a + δ then −f(x) > E.
Since this is true for all E > 0, it follows by the definition of an infinite one-sided limit,
Definition 3.3.8 (1), that lim_{x→a⁺} [−f(x)] = ∞.

Assume that lim_{x→a⁺} [−f(x)] = ∞. Fix E < 0. Then −E > 0, so by Definition 3.3.8 (1) there
exists a real number δ > 0 such that

if a < x < a + δ then −f(x) > −E.

Therefore
if a < x < a + δ then f(x) < E.
Since this is true for all E < 0, it follows by the definition of an infinite one-sided limit,
Definition 3.3.8 (2), that lim_{x→a⁺} f(x) = −∞.

Proof of (2). The proof is similar to that of (1) and is therefore left as an exercise for the
reader, see Exercise 3.3 number 7.

Proof of (3). Assume that lim_{x→b⁻} f(x) = ∞. Fix E > 0. By Definition 3.3.9 (1) there exists
a real number δ > 0 such that
if b − δ < x < b then f(x) > E.
Therefore
if b − δ < a + b − t < b then f(a + b − t) > E.
Let a < t < a + δ. Then −a − δ < −t < −a. Adding a + b throughout we get
b − δ < a + b − t < b.
Therefore
f(a + b − t) > E.
Since this is true for all E > 0, it follows by the definition of an infinite one-sided limit,
Definition 3.3.8 (1), that lim_{t→a⁺} f(a + b − t) = ∞.

Assume that lim_{t→a⁺} f(a + b − t) = ∞. Fix E > 0. By Definition 3.3.8 (1) there exists a
number δ > 0 so that
if a < t < a + δ then f(a + b − t) > E.
Therefore
if a < a + b − x < a + δ then f(x) = f(a + b − (a + b − x)) > E.
Let b − δ < x < b. Then −b < −x < −b + δ. Adding a + b throughout we get
a < a + b − x < a + δ.
Therefore
f(x) > E.
Since this is true for all E > 0, it follows by the definition of an infinite one-sided limit,
Definition 3.3.9 (1), that lim_{x→b⁻} f(x) = ∞.

In view of Theorem 3.3.10 we only consider examples of the form lim_{x→a⁺} f(x) = ∞.

Example 3.3.11. We prove that lim_{x→0⁺} 1/x = ∞.

Solution. Let f(x) = 1/x, x ≠ 0. Fix E > 0 and let δ = 1/E.

If 0 < x < δ then 0 < x < 1/E so that 1/x > 1/(1/E) = E.
Hence
if 0 < x < δ then f(x) = 1/x > E.
Since this is true for all E > 0, it follows from the definition of an infinite one-sided limit
that lim_{x→0⁺} 1/x = ∞.

Example 3.3.12. We prove that lim_{x→1⁺} (2x + 1)/(x − 1) = +∞.

Solution. Let f(x) = (2x + 1)/(x − 1), x ≠ 1. Let x > 1. Then 2x + 1 > 3 and 1/(x − 1) > 0. Therefore
f(x) = (2x + 1) × 1/(x − 1) > 3/(x − 1) if x > 1.
Fix E > 0 and choose δ = 3/E.

Let 1 < x < 1 + δ. Then

0 < x − 1 < 3/E so that 1/(x − 1) > E/3.
Hence
f(x) > 3 × 1/(x − 1) > 3 × E/3 = E.
Since this is true for all E > 0, it follows from the definition of an infinite one-sided limit
that lim_{x→1⁺} (2x + 1)/(x − 1) = ∞.
Example 3.3.13. We prove that lim_{x→3⁺} 1/(x² − 9) = ∞.

Solution. Let f(x) = 1/(x² − 9), x ≠ ±3. We have
f(x) = 1/(x + 3) × 1/(x − 3), x ≠ ±3.
Let 3 < x < 4. Then
6 < x + 3 < 7 so that 1/7 < 1/(x + 3) < 1/6.
Note also that 1/(x − 3) > 0. Therefore
if 3 < x < 4 then f(x) = 1/(x + 3) × 1/(x − 3) > 1/7 × 1/(x − 3). (3.36)
Fix E > 0 and choose δ = min{1/(7E), 1}.

Let 3 < x < 3 + δ. Because δ ≤ 1/(7E),
0 < x − 3 < δ ≤ 1/(7E) so that 1/(x − 3) > 7E.
Since δ ≤ 1 it follows that 3 < x < 3 + 1 = 4. It therefore follows from (3.36) that
f(x) > 1/7 × 1/(x − 3) > 1/7 × 7E = E.
Since this is true for all E > 0, it follows from the definition of an infinite one-sided limit
that lim_{x→3⁺} 1/(x² − 9) = ∞.
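
The choice δ = min{1/(7E), 1} made in Example 3.3.13 can be tested numerically. The sketch below, with the hypothetical helpers `delta_for` and `check`, samples points x with 3 < x < 3 + δ and tests whether f(x) > E for small, moderate and large values of E. This is a finite sampling check and not a substitute for the proof.

```python
def f(x):
    return 1 / (x * x - 9)

def delta_for(E):
    return min(1 / (7 * E), 1)

def check(E, samples=1000):
    """Test f(x) > E at sampled x with 3 < x < 3 + delta."""
    delta = delta_for(E)
    return all(
        f(3 + delta * k / (samples + 1)) > E
        for k in range(1, samples + 1)
    )

print(all(check(E) for E in (0.01, 1.0, 1000.0)))
```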

The concept of a one-sided limit is a necessary extension of the limit at a point. Indeed, it
is impossible to define what it means for a function f : [a, b] → R to be continuous on the
whole interval [a, b], hence also at a and at b, without the use of one-sided limits. You may
recall that continuity of a function on a closed interval [a, b] is an essential assumption in
some of the most important theorems in Calculus, such as the Extreme Value Theorem. In
Section 4.3 we will encounter yet another important application of one-sided limits.

Exercise 3.3

1. Prove the following, using the definition of a one-sided limit.

(a) lim_{x→0⁺} x² sin(√x) = 0
(b) lim_{x→2⁺} (sin²x + 1)/(x² − 4) = ∞
(c) lim_{x→1⁺} 2x/(x − 1) = ∞
(d) lim_{x→1⁻} g(x) = 2, where g(x) = x² + x if x < 1 and g(x) = x² − x if x ≥ 1.

2. Prove Theorem 3.3.7.

3. Prove Theorem 3.3.3 by suitably adapting the proof of Theorem 3.2.3.

4. Let f and g be functions defined on the open interval (a, b). Assume that L and M
are real numbers so that lim_{x→a⁺} f(x) = L and lim_{x→a⁺} g(x) = M. Prove that the following
statements are true.
(a) lim_{x→a⁺} [f(x) + g(x)] exists, and lim_{x→a⁺} [f(x) + g(x)] = L + M.
(b) lim_{x→a⁺} [f(x)g(x)] exists, and lim_{x→a⁺} [f(x)g(x)] = LM.

5. Let f and g be functions defined on the open interval (a, b) such that lim_{x→a⁺} f(x) = ∞
and lim_{x→a⁺} g(x) = ∞. Prove that the following statements are true.
(a) lim_{x→a⁺} [f(x) + g(x)] = ∞.
(b) lim_{x→a⁺} [f(x)g(x)] = ∞.

6. Let f and g be functions defined on the open interval (a, b) such that lim_{x→a⁺} f(x) = ∞
and lim_{x→a⁺} g(x) = −∞. Decide whether the following statements are true. If true, give
a proof. Otherwise give an example that shows that it is not true.
(a) lim_{x→a⁺} [f(x) + g(x)] exists as a finite real number, and lim_{x→a⁺} [f(x) + g(x)] = 0.
(b) If lim_{x→a⁺} [f(x) + g(x)] exists as a finite real number, then lim_{x→a⁺} [f(x) + g(x)] = 0.
(c) lim_{x→a⁺} [f(x) + g(x)] = ∞.
(d) lim_{x→a⁺} [f(x) + g(x)] = −∞.

7. Prove Theorem 3.3.10 (2).

3.4 Limits at ±∞

In many situations in mathematics and its applications, we are interested in the long-term
behaviour of a function. For instance, if f (t) gives the number of people suffering from a
certain disease, such as malaria, at time t > 0, then an epidemiologist is interested in the
values of f (t) for ‘large’ t. If ‘f (t) tends to 0 as t tends to +∞’, it means that, eventually,
the disease will all but die out. We are therefore led to the idea of a ‘limit at ∞’, and in
this section, we make this concept precise.
Definition 3.4.1. Let f be defined on (a, ∞) for some a ∈ R, and let L be a real number.
We say that lim_{x→∞} f(x) = L if the following holds: For every ϵ > 0 there exists an M > 0 so
that for all x ∈ R,

if x > M then |f(x) − L| < ϵ.

Definition 3.4.2. Let f be defined on (−∞, a) for some a ∈ R, and let L be a real number.
We say that lim_{x→−∞} f(x) = L if the following holds: For every ϵ > 0 there exists an N < 0
so that for all x ∈ R,

if x < N then |f(x) − L| < ϵ.

Definitions 3.4.1 and 3.4.2 make our intuitive understanding of what it means for ‘f (x) to
tend to L as x tends to ∞ or to −∞’ precise. Simply put, Definition 3.4.1 states that the
values of f (x) can be made as close to L as we like, simply by making x large enough.

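[Figure: illustration of Definition 3.4.1, with the levels $L - \epsilon$, $L$ and $L + \epsilon$ marked on the vertical axis and $M$ marked on the $x$-axis.]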

Remark 3.4.3. It should be noted that the Limit Laws, given in Theorem 3.2.1, hold also
for limits at ∞ and −∞. The proofs are similar to those for the Limit Laws in Theorem
3.2.1.

It is sufficient to properly understand the definition of $\lim_{x \to \infty} f(x) = L$, as the case $\lim_{x \to -\infty} f(x) = L$ can be reduced to that of a limit at $\infty$, as the following result shows.

Theorem 3.4.4. Let $f$ be defined on $(-\infty, a)$ for some $a \in \mathbb{R}$, and let $L$ be a real number. Then $\lim_{x \to -\infty} f(x) = L$ if and only if $\lim_{x \to \infty} f(-x) = L$.

Proof. Assume that $\lim_{x \to -\infty} f(x) = L$. Fix a real number $\epsilon > 0$. According to the definition of a limit at $-\infty$, Definition 3.4.2, there exists a real number $N < 0$ so that
\[ \text{if } x < N \text{ then } |f(x) - L| < \epsilon. \]
Therefore
\[ \text{if } -x < N \text{ then } |f(-x) - L| < \epsilon. \]
We now have
\[ \text{if } x > -N > 0 \text{ then } -x < N \text{ so that } |f(-x) - L| < \epsilon. \]
Since this holds for all $\epsilon > 0$, it follows from the definition of a limit at $\infty$, Definition 3.4.1, that $\lim_{x \to \infty} f(-x) = L$.

Now assume that $\lim_{x \to \infty} f(-x) = L$. Fix a real number $\epsilon > 0$. According to the definition of a limit at $\infty$, Definition 3.4.1, there exists a real number $M > 0$ so that
\[ \text{if } x > M \text{ then } |f(-x) - L| < \epsilon. \]
Therefore
\[ \text{if } -x > M \text{ then } |f(x) - L| = |f(-(-x)) - L| < \epsilon. \]
We now have
\[ \text{if } x < -M < 0 \text{ then } -x > M \text{ so that } |f(x) - L| < \epsilon. \]
Since this holds for all $\epsilon > 0$, it follows from the definition of a limit at $-\infty$, Definition 3.4.2, that $\lim_{x \to -\infty} f(x) = L$.

We now turn to some examples. In view of Theorem 3.4.4, we consider only the case $\lim_{x \to \infty} f(x) = L$.

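Example 3.4.5. We prove that $f(x) = \frac{1}{x}$ tends to $0$ as $x$ tends to $\infty$; that is, we show that $\lim_{x \to \infty} f(x) = 0$.

Solution. First note that $f$ is defined on $(0, \infty)$ and
\[ |f(x) - 0| = \left|\frac{1}{x}\right| = \frac{1}{x} \quad \text{if } x > 0. \tag{3.37} \]
Fix any $\epsilon > 0$, and choose $M = \frac{1}{\epsilon}$.

Let $x > M$. Then
\[ x > \frac{1}{\epsilon} > 0 \quad \text{so that} \quad \frac{1}{x} < \frac{1}{1/\epsilon} = \epsilon. \]
It therefore follows from (3.37) that
\[ |f(x) - 0| = \frac{1}{x} < \epsilon. \]
This is true for all $\epsilon > 0$. It therefore follows from the definition of a limit at $\infty$ that $\lim_{x \to \infty} f(x) = 0$.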

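Example 3.4.6. We prove that $f(x) = \frac{2x - 1}{x + 1}$ tends to $2$ as $x$ tends to $\infty$; that is, we show that $\lim_{x \to \infty} f(x) = 2$.

Solution. First note that $f$ is defined on $(-1, \infty)$ and
\begin{align*}
|f(x) - 2| &= \left|\frac{2x - 1}{x + 1} - 2\right| \\
&= \left|\frac{2x - 1 - 2(x + 1)}{x + 1}\right| \\
&= \left|\frac{-3}{x + 1}\right| \\
&= \frac{3}{x + 1} \quad \text{if } x > -1.
\end{align*}
But
\[ \text{if } x > 0 \text{ then } x + 1 > x > 0 \text{ so that } \frac{1}{x + 1} < \frac{1}{x}. \]
Therefore
\[ |f(x) - 2| < \frac{3}{x} \quad \text{if } x > 0. \tag{3.38} \]
Fix any $\epsilon > 0$, and choose $M = \frac{3}{\epsilon}$.

Let $x > M$. Then
\[ x > M = \frac{3}{\epsilon} > 0 \quad \text{so that} \quad \frac{1}{x} < \frac{1}{M} = \frac{1}{3/\epsilon} = \frac{\epsilon}{3}. \]
Therefore it follows from (3.38) that
\[ |f(x) - 2| < \frac{3}{x} < 3 \times \frac{\epsilon}{3} = \epsilon. \]
This holds for all $\epsilon > 0$. It therefore follows from the definition of a limit at $\infty$ that $\lim_{x \to \infty} f(x) = 2$.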
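
The choice $M = 3/\epsilon$ in Example 3.4.6 can be spot-checked numerically. The following is a minimal sketch, not part of the proof: the helper names `threshold` and `within_eps` are illustrative, and a finite sample of points $x > M$ can only fail to contradict the inequality, never establish it.

```python
def f(x):
    """The function from Example 3.4.6."""
    return (2 * x - 1) / (x + 1)


def threshold(eps):
    """The choice M = 3/eps made in the solution."""
    return 3 / eps


def within_eps(eps, samples=1000):
    """Check |f(x) - 2| < eps at a finite sample of points x > M."""
    M = threshold(eps)
    points = [M + k for k in range(1, samples + 1)]
    points += [M * 10 ** j + 1 for j in range(1, 7)]
    return all(abs(f(x) - 2) < eps for x in points)


for eps in (1.0, 0.1, 0.001):
    print(eps, threshold(eps), within_eps(eps))
```

Smaller values of `eps` force a larger threshold, which mirrors the order of the quantifiers in Definition 3.4.1: $\epsilon$ is fixed first, and $M$ is chosen afterwards.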

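Example 3.4.7. We prove that $h(x) = \frac{2 \sin x}{x + 2}$ tends to $0$ as $x$ tends to $\infty$; that is, we show that $\lim_{x \to \infty} h(x) = 0$.

Solution. First note that $h$ is defined on $(-2, \infty)$ and, since $|\sin x| \le 1$ for all $x \in \mathbb{R}$,
\[ |h(x) - 0| = \left|\frac{2 \sin x}{x + 2}\right| \le \left|\frac{2}{x + 2}\right| = \frac{2}{x + 2} \quad \text{if } x > -2. \]
But
\[ \text{if } x > 0 \text{ then } x + 2 > x > 0 \text{ so that } \frac{1}{x + 2} < \frac{1}{x}. \]
Therefore
\[ \text{if } x > 0 \text{ then } |h(x) - 0| < \frac{2}{x}. \tag{3.39} \]
Fix any $\epsilon > 0$, and choose $M = \frac{2}{\epsilon}$.

Let $x > M$. Then
\[ x > M = \frac{2}{\epsilon} > 0 \quad \text{so that} \quad \frac{1}{x} < \frac{1}{M} = \frac{1}{2/\epsilon} = \frac{\epsilon}{2}. \]
It therefore follows from (3.39) that
\[ |h(x) - 0| < \frac{2}{x} < 2 \times \frac{\epsilon}{2} = \epsilon. \]
This holds for all $\epsilon > 0$. It therefore follows from the definition of a limit at $\infty$ that $\lim_{x \to \infty} h(x) = 0$.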

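Example 3.4.8. We prove that $f(x) = \frac{x^2}{x^2 + x}$ tends to $1$ as $x$ tends to $\infty$; that is, we prove that $\lim_{x \to \infty} f(x) = 1$.

Solution. First note that $f$ is defined on $(0, \infty)$ and
\begin{align*}
|f(x) - 1| &= \left|\frac{x^2}{x^2 + x} - 1\right| \\
&= \left|\frac{-x}{x^2 + x}\right| \\
&= \frac{1}{x + 1} \quad \text{if } x > 0.
\end{align*}
But
\[ \text{if } x > 0 \text{ then } x + 1 > x > 0 \text{ so that } \frac{1}{x + 1} < \frac{1}{x}. \]
Therefore
\[ \text{if } x > 0 \text{ then } |f(x) - 1| < \frac{1}{x}. \tag{3.40} \]
Fix any $\epsilon > 0$, and choose $M = \frac{1}{\epsilon}$.

Let $x > M$. Then
\[ x > M = \frac{1}{\epsilon} > 0 \quad \text{so that} \quad \frac{1}{x} < \frac{1}{M} = \frac{1}{1/\epsilon} = \epsilon. \]
It now follows from (3.40) that
\[ |f(x) - 1| < \frac{1}{x} < \epsilon. \]
This holds for all $\epsilon > 0$. Therefore it follows from the definition of a limit at $\infty$ that $\lim_{x \to \infty} f(x) = 1$.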

As in the case of one-sided limits, there is a special case when the limit of a function f at
∞ or −∞ does not exist that is of interest. This is the case when ‘f (x) tends to ∞ or to
−∞ as x tends to ∞ or −∞’.

Definition 3.4.9. Let $f$ be defined on $(a, \infty)$ for some $a \in \mathbb{R}$. We say that

(1) $\lim_{x \to \infty} f(x) = \infty$ if the following holds: For every $E > 0$ there exists an $M > 0$ so that for all $x \in \mathbb{R}$,
\[ \text{if } x > M \text{ then } f(x) > E. \]

(2) $\lim_{x \to \infty} f(x) = -\infty$ if the following holds: For every $E < 0$ there exists an $M > 0$ so that for all $x \in \mathbb{R}$,
\[ \text{if } x > M \text{ then } f(x) < E. \]

Definition 3.4.10. Let $f$ be defined on $(-\infty, a)$ for some $a \in \mathbb{R}$. We say that

(1) $\lim_{x \to -\infty} f(x) = \infty$ if the following holds: For every $E > 0$ there exists an $N < 0$ so that for all $x \in \mathbb{R}$,
\[ \text{if } x < N \text{ then } f(x) > E. \]

(2) $\lim_{x \to -\infty} f(x) = -\infty$ if the following holds: For every $E < 0$ there exists an $N < 0$ so that for all $x \in \mathbb{R}$,
\[ \text{if } x < N \text{ then } f(x) < E. \]

As for infinite one-sided limits, a thorough understanding of the case $\lim_{x \to \infty} f(x) = \infty$ is sufficient because, as the next result shows, the remaining three cases can be reduced to this one.
Theorem 3.4.11. Let $f$ be a function from $\mathbb{R}$ to $\mathbb{R}$, and $a \in \mathbb{R}$. Then the following statements are true.

(1) If $f$ is defined on $(a, \infty)$, then $\lim_{x \to \infty} f(x) = -\infty$ if and only if $\lim_{x \to \infty} [-f(x)] = \infty$.

(2) If $f$ is defined on $(-\infty, a)$, then $\lim_{x \to -\infty} f(x) = -\infty$ if and only if $\lim_{x \to -\infty} [-f(x)] = \infty$.

(3) If $f$ is defined on $(-\infty, a)$, then $\lim_{x \to -\infty} f(x) = \infty$ if and only if $\lim_{x \to \infty} f(-x) = \infty$.

Proof of (1). Assume that $f$ is defined on $(a, \infty)$ and $\lim_{x \to \infty} f(x) = -\infty$. Fix a real number $E > 0$. Then $-E < 0$ so according to the definition of an infinite limit at $\infty$, Definition 3.4.9 (2), there exists a real number $M > 0$ so that
\[ \text{if } x > M \text{ then } f(x) < -E. \]
Therefore
\[ \text{if } x > M \text{ then } -f(x) > E. \]
This holds for all $E > 0$, so by the definition of an infinite limit at $\infty$, Definition 3.4.9 (1), it follows that $\lim_{x \to \infty} [-f(x)] = \infty$.

Now assume that $\lim_{x \to \infty} [-f(x)] = \infty$. Fix a real number $E < 0$. Then $-E > 0$ so according to the definition of an infinite limit at $\infty$, Definition 3.4.9 (1), there exists a real number $M > 0$ so that
\[ \text{if } x > M \text{ then } -f(x) > -E. \]
Therefore
\[ \text{if } x > M \text{ then } f(x) < E. \]
This holds for all $E < 0$, so by the definition of an infinite limit at $\infty$, Definition 3.4.9 (2), it follows that $\lim_{x \to \infty} f(x) = -\infty$.

Proof of (2). The proof is essentially the same as that of (1), and is therefore left as an exercise for the reader, see Exercise 3.4 number 7.

Proof of (3). Assume that $f$ is defined on $(-\infty, a)$ and $\lim_{x \to -\infty} f(x) = \infty$. Fix a real number $E > 0$. Then according to the definition of an infinite limit at $-\infty$, Definition 3.4.10 (1), there exists a real number $N < 0$ so that
\[ \text{if } x < N \text{ then } f(x) > E. \]
Therefore
\[ \text{if } -x < N \text{ then } f(-x) > E. \]
We now have
\[ \text{if } x > -N > 0 \text{ then } -x < N \text{ so that } f(-x) > E. \]
This holds for all $E > 0$, so by the definition of an infinite limit at $\infty$, Definition 3.4.9 (1), it follows that $\lim_{x \to \infty} f(-x) = \infty$.

Now assume that $\lim_{x \to \infty} f(-x) = \infty$. Fix a real number $E > 0$. According to the definition of an infinite limit at $\infty$, Definition 3.4.9 (1), there exists a real number $M > 0$ so that
\[ \text{if } x > M \text{ then } f(-x) > E. \]
Therefore
\[ \text{if } -x > M \text{ then } f(x) = f(-(-x)) > E. \]
We now have
\[ \text{if } x < -M < 0 \text{ then } -x > M \text{ so that } f(x) > E. \]
This holds for all $E > 0$, so by the definition of an infinite limit at $-\infty$, Definition 3.4.10 (1), it follows that $\lim_{x \to -\infty} f(x) = \infty$.

We consider two examples.


Example 3.4.12. We prove that $f(x) = \frac{x^2}{x + 1}$ tends to $\infty$ as $x$ tends to $\infty$; that is, we show that $\lim_{x \to \infty} f(x) = \infty$.

Solution. First note that $f$ is defined on $(1, \infty)$ and
\[ \text{if } x > 1 \text{ then } 0 < x + 1 < x + x = 2x. \]
Therefore
\[ \text{if } x > 1 \text{ then } \frac{1}{x + 1} > \frac{1}{2x} > 0. \]
Hence
\[ f(x) = x^2 \times \frac{1}{x + 1} > x^2 \times \frac{1}{2x} = \frac{x}{2} \quad \text{if } x > 1. \tag{3.41} \]
Fix any $E > 0$, and choose $M = \max\{2E, 1\}$.

Let $x > M$. Since $M \ge 1$ it follows that $x > 1$. Therefore it follows from (3.41) that
\[ f(x) > \frac{x}{2} > \frac{M}{2}. \]
But $M \ge 2E$. Therefore
\[ f(x) > \frac{M}{2} \ge \frac{2E}{2} = E. \]
Since this holds for all $E > 0$, it follows from the definition of an infinite limit at $\infty$ that $\lim_{x \to \infty} f(x) = \infty$.

Example 3.4.13. We prove that $f(x) = \frac{x}{\sin^2 x + 1}$ tends to $\infty$ as $x$ tends to $\infty$; that is, we prove that $\lim_{x \to \infty} f(x) = \infty$.

Solution. First note that, for all $x \in \mathbb{R}$,
\[ 0 \le \sin^2 x \le 1 \text{ so that } 1 \le \sin^2 x + 1 \le 2, \text{ hence } \frac{1}{2} \le \frac{1}{\sin^2 x + 1} \le 1. \]
Therefore
\[ \text{if } x > 0 \text{ then } f(x) = \frac{x}{\sin^2 x + 1} = x \times \frac{1}{\sin^2 x + 1} \ge \frac{x}{2}. \]
Fix any $E > 0$. Choose $M = 2E$.

Let $x > M$. Then
\[ x > M > 0 \quad \text{so that} \quad f(x) \ge \frac{x}{2} > \frac{M}{2} = \frac{2E}{2} = E. \]
Since this holds for all $E > 0$, it follows from the definition of an infinite limit at $\infty$ that $\lim_{x \to \infty} f(x) = \infty$.
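
The thresholds chosen in Examples 3.4.12 and 3.4.13 can be spot-checked in the same spirit. This is only a sketch under stated assumptions: the names `f12`, `f13`, `threshold12`, `threshold13` and `exceeds` are illustrative, and sampling finitely many points $x > M$ is a sanity check rather than a proof.

```python
import math


def f12(x):
    """The function from Example 3.4.12."""
    return x ** 2 / (x + 1)


def f13(x):
    """The function from Example 3.4.13."""
    return x / (math.sin(x) ** 2 + 1)


def threshold12(E):
    """M = max{2E, 1}, as chosen in Example 3.4.12."""
    return max(2 * E, 1)


def threshold13(E):
    """M = 2E, as chosen in Example 3.4.13."""
    return 2 * E


def exceeds(f, M, E, samples=500):
    """Check f(x) > E at the sample points x = M + k/4, all larger than M."""
    return all(f(M + k / 4) > E for k in range(1, samples + 1))


for E in (0.1, 10, 1000):
    print(E, exceeds(f12, threshold12(E), E), exceeds(f13, threshold13(E), E))
```

For small $E$ the maximum in `threshold12` returns 1, which is exactly the case where the estimate (3.41) would otherwise not be available.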

The limit of a function f as x tends to ±∞, if it exists, gives information on the long-term
behaviour of the function f . Furthermore, as we will see, it allows us to define integrals of
functions over unbounded intervals with the aid of the Fundamental Theorem of Calculus,
see Section 4.3.

Exercise 3.4

1. Prove the following, using the appropriate definition.

(a) $\lim_{x \to \infty} \frac{4x^2}{x^2 + 3} = 4$
(b) $\lim_{x \to \infty} \frac{2x + 1}{x + 3} = 2$
(c) $\lim_{x \to \infty} \frac{\cos x}{3x + 2} = 0$
(d) $\lim_{x \to \infty} \frac{x}{x^2 + 3} = 0$
(e) $\lim_{x \to \infty} \frac{x^4}{x^2 + 1} = \infty$
(f) $\lim_{x \to \infty} \frac{x^2}{x - 1} = \infty$
(g) $\lim_{x \to \infty} \ln\left(\frac{x}{x + 1}\right) = 0$ [HINT: $g(t) = \ln(t)$ is continuous at $t = 1$.]
(h) $\lim_{x \to \infty} \cos\left(\frac{1}{x}\right) = 1$ [HINT: $\lim_{t \to 0^+} \cos t = 1$.]

2. Let $a$ be a real number and $f, g : (a, \infty) \to \mathbb{R}$ functions so that $f(x) \le g(x)$ for all $x > a$. Prove the following, using the appropriate definitions.

(a) If $\lim_{x \to \infty} f(x) = \infty$ then $\lim_{x \to \infty} g(x) = \infty$.

(b) If $\lim_{x \to \infty} g(x) = -\infty$ then $\lim_{x \to \infty} f(x) = -\infty$.

3. Consider functions $f, g : (a, +\infty) \to \mathbb{R}$, and real numbers $L$ and $M$. Use the appropriate definition to prove that if $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to \infty} g(x) = M$, then $\lim_{x \to \infty} [f(x) + g(x)] = L + M$.

4. Consider a function $f : (a, \infty) \to \mathbb{R}$ such that $f(x) > 0$ for all $x > a$. Prove that $\lim_{x \to \infty} f(x) = 0$ if and only if $\lim_{x \to \infty} \frac{1}{f(x)} = \infty$.

5. This exercise is a generalisation of number 1 (g). Consider functions $f, g : \mathbb{R} \to \mathbb{R}$, and real numbers $a$ and $L$. Assume that $f(x) > a$ for all $x \in \mathbb{R}$, $\lim_{x \to \infty} f(x) = a$ and $\lim_{x \to a} g(x) = L$. Prove that $\lim_{x \to \infty} g(f(x)) = L$.

6. Consider functions $f, g : (a, \infty) \to \mathbb{R}$ and a real number $L > 0$. If $\lim_{x \to \infty} f(x) = L$ and $\lim_{x \to \infty} g(x) = \infty$, prove that $\lim_{x \to \infty} f(x)g(x) = \infty$.

7. Prove Theorem 3.4.11 (2).

Chapter 4

Integration and Applications

Integration is one of the two fundamental operations in Calculus, the other being differenti-
ation. The theory of integration has its origins in concrete problems, such as finding areas
of regions in a plane, and volumes of solid bodies. This chapter deals mainly with one of
the cornerstone results in Calculus, namely, the Fundamental Theorem of Calculus. This
result connects integration with differentiation. We also consider techniques for evaluating
integrals, and discuss some applications of integration. The definite integral is used to define
and calculate the length of certain simple curves in the plane. We show how the definite
integral is used to define the average value of a function on an interval [a, b]. We then prove
the Fundamental Theorem of Calculus. As an application of the Fundamental Theorem of
Calculus, we give rigorous definitions for the natural logarithmic and exponential functions
and prove a Mean Value Theorem for the integral. The concept of an improper integral is
introduced, and we end the chapter with a discussion of Taylor polynomials.

4.1 Integration by Parts

Differentiation of elementary functions is an inherently easier process than determining an


anti-derivative of such a function. There are two reasons for this. Firstly, elementary
functions are obtained via addition, multiplication and composition of functions from a
relatively small family of functions for which the exact form of the derivative is known.
Secondly, there are rules for expressing the derivative of such a function in terms of the
derivatives of its constituent parts, the so called ‘Differentiation Rules’. For instance, the
Product Rule states that if f, g : R → R are differentiable functions, then the function h
given by
h(x) = f (x)g(x), x ∈ R
is differentiable, and
h′ (x) = f ′ (x)g(x) + f (x)g ′ (x), x ∈ R.

When it comes to integration, there is no genuine counterpart for the Product Rule. In particular, it is not true that
\[ \int f(x)g(x)\,dx = f(x) \int g(x)\,dx + g(x) \int f(x)\,dx \]

for continuous functions f, g : R → R. The aim of this section is to introduce a method for
evaluating definite and indefinite integrals which is a close analogue of the Product Rule.
This method is known as Integration by Parts, and is based on the following result.
Theorem 4.1.1 (Integration by Parts). Suppose that $f, g : [a, b] \to \mathbb{R}$ are differentiable on $[a, b]$, and that $f'$ and $g'$ are continuous on $[a, b]$. Then
\[ \int_a^b f(x)g'(x)\,dx = \Big[ f(x)g(x) \Big]_a^b - \int_a^b f'(x)g(x)\,dx. \]

Theorem 4.1.1 is an easy consequence of the Product Rule for differentiation and the Fun-
damental Theorem of Calculus. The proof is therefore given as an exercise, see Exercise 4.1
number 3.
Remark 4.1.2. Note that if $f$ and $g$ are differentiable functions with continuous derivatives, then
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx. \]

We illustrate the usefulness of Theorem 4.1.1 by means of a number of examples.


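Example 4.1.3. We evaluate the integral
\[ \int_0^1 2xe^{x+1}\,dx. \]

Solution. Let
\[ f(x) = 2x \text{ and } g'(x) = e^{x+1}, \quad x \in \mathbb{R}. \]
Then
\[ f'(x) = 2 \text{ and } g(x) = e^{x+1}, \quad x \in \mathbb{R}. \]
Therefore
\begin{align*}
\int_0^1 2xe^{x+1}\,dx &= \int_0^1 f(x)g'(x)\,dx \\
&= \Big[ 2xe^{x+1} \Big]_0^1 - \int_0^1 2e^{x+1}\,dx \\
&= \Big[ 2xe^{x+1} - 2e^{x+1} \Big]_0^1 \\
&= 2e.
\end{align*}
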
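Example 4.1.4. We evaluate the integral
\[ \int \ln x\,dx. \]

Solution. Let
\[ f(x) = \ln x \text{ and } g'(x) = 1, \quad x > 0. \]
Then
\[ f'(x) = \frac{1}{x} \text{ and } g(x) = x, \quad x > 0. \]
Therefore
\begin{align*}
\int \ln x\,dx &= \int f(x)g'(x)\,dx \\
&= x \ln x - \int x \times \frac{1}{x}\,dx \\
&= x \ln x - x + C.
\end{align*}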

In some cases it is necessary to apply Integration by Parts more than once, as in the following
example.
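Example 4.1.5. We evaluate the integral
\[ \int_0^{\pi/6} 3x^2 \sin x\,dx. \]

Solution. Let
\[ f(x) = 3x^2 \text{ and } g'(x) = \sin x, \quad x \in \mathbb{R}. \]
Then
\[ f'(x) = 6x \text{ and } g(x) = -\cos x, \quad x \in \mathbb{R}. \]
Therefore
\begin{align*}
\int_0^{\pi/6} 3x^2 \sin x\,dx &= \int_0^{\pi/6} f(x)g'(x)\,dx \\
&= \Big[ -3x^2 \cos x \Big]_0^{\pi/6} + \int_0^{\pi/6} 6x \cos x\,dx \\
&= -\frac{\sqrt{3}\pi^2}{24} + \int_0^{\pi/6} 6x \cos x\,dx.
\end{align*}
We now set
\[ u(x) = 6x \text{ and } v'(x) = \cos x, \quad x \in \mathbb{R}. \]
Then
\[ u'(x) = 6 \text{ and } v(x) = \sin x, \quad x \in \mathbb{R}. \]
Therefore
\begin{align*}
\int_0^{\pi/6} 6x \cos x\,dx &= \int_0^{\pi/6} u(x)v'(x)\,dx \\
&= \Big[ 6x \sin x \Big]_0^{\pi/6} - \int_0^{\pi/6} 6 \sin x\,dx \\
&= \Big[ 6x \sin x + 6 \cos x \Big]_0^{\pi/6} \\
&= \frac{\pi}{2} + 3\sqrt{3} - 6.
\end{align*}
We then have
\[ \int_0^{\pi/6} 3x^2 \sin x\,dx = \frac{\pi}{2} + 3\sqrt{3} - 6 - \frac{\sqrt{3}\pi^2}{24}. \]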

Repeatedly applying Integration by Parts to an integral of the form
\[ \int_a^b f(x)g'(x)\,dx \]

may lead to a recurring pattern which can be exploited to evaluate the integral. This
technique is demonstrated in the following example.
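Example 4.1.6. We evaluate the integral
\[ \int_0^{\pi/2} e^{2x} \cos x\,dx. \]

Solution. Set
\[ I = \int_0^{\pi/2} e^{2x} \cos x\,dx. \]
Let
\[ f(x) = e^{2x} \text{ and } g'(x) = \cos x, \quad x \in \mathbb{R}. \]
Then
\[ f'(x) = 2e^{2x} \text{ and } g(x) = \sin x, \quad x \in \mathbb{R}, \]
so that
\begin{align*}
I &= \int_0^{\pi/2} f(x)g'(x)\,dx \\
&= \Big[ e^{2x} \sin x \Big]_0^{\pi/2} - \int_0^{\pi/2} 2e^{2x} \sin x\,dx \tag{4.1} \\
&= e^{\pi} - 2 \int_0^{\pi/2} e^{2x} \sin x\,dx.
\end{align*}
We now let
\[ u(x) = e^{2x} \text{ and } v'(x) = \sin x, \quad x \in \mathbb{R}. \]
Then
\[ u'(x) = 2e^{2x} \text{ and } v(x) = -\cos x, \quad x \in \mathbb{R}, \]
so that
\begin{align*}
\int_0^{\pi/2} e^{2x} \sin x\,dx &= \int_0^{\pi/2} u(x)v'(x)\,dx \\
&= \Big[ -e^{2x} \cos x \Big]_0^{\pi/2} + \int_0^{\pi/2} 2e^{2x} \cos x\,dx \tag{4.2} \\
&= 1 + 2I.
\end{align*}
Substituting (4.2) into (4.1) we have
\[ I = e^{\pi} - 2(1 + 2I). \]
Solving for $I$ we find that
\[ I = \frac{e^{\pi} - 2}{5}. \]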
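
The closed forms obtained in Examples 4.1.3, 4.1.5 and 4.1.6 can be cross-checked against a numerical approximation. The sketch below uses composite Simpson's rule, which is not discussed in these notes and is brought in here purely as an independent check; the function name `simpson` and the dictionary `examples` are illustrative choices.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# name -> (integrand, lower limit, upper limit, closed form obtained in the text)
examples = {
    "4.1.3": (lambda x: 2 * x * math.exp(x + 1), 0.0, 1.0, 2 * math.e),
    "4.1.5": (lambda x: 3 * x ** 2 * math.sin(x), 0.0, math.pi / 6,
              math.pi / 2 + 3 * math.sqrt(3) - 6 - math.sqrt(3) * math.pi ** 2 / 24),
    "4.1.6": (lambda x: math.exp(2 * x) * math.cos(x), 0.0, math.pi / 2,
              (math.exp(math.pi) - 2) / 5),
}

for name, (g, a, b, exact) in examples.items():
    print(name, simpson(g, a, b), exact)
```

Agreement between the two columns does not replace the derivation, but a sign error in any of the Integration by Parts steps would show up as a visible discrepancy.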

In this section, the method of Integration by Parts is introduced, and its utility is demonstrated by means of a number of examples. In particular, this technique is used to compute
the anti-derivative of the natural logarithmic function. It should be noted that Integration
by Parts is one of the most useful integration techniques, and has important theoretical
applications which go beyond the mere evaluation of definite and indefinite integrals.

Exercise 4.1

1. Evaluate the following integrals. For questions (e), (k) and (l), first evaluate the
integral using Integration by Parts, and then by making a suitable substitution. Which
method do you prefer?
Z ln 2 Z 1 Z e
x 2 2x
(a) 2xe dx (b) xe dx (c) x ln x dx
0 0 1

Z π Z 1 Z 1
x 2
(d) 3
x cos(3x) dx (e) dx (f) 4x3 e−x dx
0 0 x+1 0

Z 1 Z e2 Z π−1
ln x
(g) arctan x dx (h) dx (i) x3 sin(x + 1) dx
0 1 x3 −1

1 4 √ 3
x2
Z Z Z
2 2
(j) 2x ln(x + 1) dx (k) x 2x + 1 dx (l) √ dx
0 0 0 4 x+1

Z 2 Z π/6 Z π/2
(m) 2
(ln x) dx (n) 2x
e sin(3x) dx (o) e−2x cos x dx
1 0 0

Z 2 Z 1 Z 3
(p) x2 2x+1 dx (q) x cosh x dx (r) arctan(1/x) dx
1 0 1

2. Determine the area of the region above the x-axis and below the curve with equation
y = (1 − x2 )e−x , x ∈ R.

3. Prove Theorem 4.1.1.

4. Let $a$ and $b$ be nonzero real numbers. Prove that
\[ \int e^{ax} \sin(bx)\,dx = \frac{e^{ax}(a \sin(bx) - b \cos(bx))}{a^2 + b^2} + C. \]

5. Let $n$ be an integer.
(a) Prove that
\[ \int \sin^n x\,dx = -\frac{\cos x \sin^{n-1} x}{n} + \frac{n - 1}{n} \int \sin^{n-2} x\,dx \]
for all $n \ge 2$.
(b) Use the result in (a) to prove that
\[ \int_0^{\pi/2} \sin^n x\,dx = \frac{n - 1}{n} \int_0^{\pi/2} \sin^{n-2} x\,dx \]
for all $n \ge 2$.
(c) Use Mathematical Induction and the result in (b) to prove that
\[ \int_0^{\pi/2} \sin^{2n+1} x\,dx = \frac{2 \cdot 4 \cdot 6 \cdots 2n}{1 \cdot 3 \cdot 5 \cdots (2n + 1)} \]
for all $n \ge 1$.
(d) Use Mathematical Induction and the result in (b) to prove that
\[ \int_0^{\pi/2} \sin^{2n} x\,dx = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2 \cdot 4 \cdot 6 \cdots 2n} \cdot \frac{\pi}{2} \]
for all $n \ge 1$.

6. Assume that $f : (a, b) \to \mathbb{R}$ is differentiable with continuous derivative $f'$. Assume that $f^{-1}$ exists and has antiderivative $G$. Show that
\[ \int f(x)\,dx = xf(x) - G(f(x)). \]

4.2 Integration of Trigonometric Functions

In this section we develop techniques for integrating certain trigonometric functions. The
underlying principle is simple; we use trigonometric identities to convert the integrand into
a standard form that is more easily integrable.

We first consider integrals of the form
\[ \int_a^b \sin^n x \cos^m x\,dx \tag{4.3} \]
where $m$ and $n$ are positive integers. Note that if $m = 1$ or $n = 1$, then the antiderivative is easy to compute. For instance,
\[ \int \sin^n x \cos x\,dx = \int \sin^n x \frac{d}{dx} \sin x\,dx = \frac{\sin^{n+1} x}{n + 1} + C. \]
We therefore aim to convert an integral of the form (4.3) into one involving terms of the form
\[ \sin^n x \cos x \quad \text{or} \quad \cos^m x \sin x. \]
Consider the following example.

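Example 4.2.1. We evaluate the integral
\[ \int_0^{\pi/4} \cos^3 x \sin^6 x\,dx. \]

Solution. Using the identity $\cos^2 x + \sin^2 x = 1$ we have
\begin{align*}
\cos^3 x \sin^6 x &= \cos x \cos^2 x \sin^6 x \\
&= \cos x \left(1 - \sin^2 x\right) \sin^6 x \\
&= \left(\sin^6 x - \sin^8 x\right) \cos x
\end{align*}
for all $x \in \mathbb{R}$. But
\[ \frac{d}{dx} \sin x = \cos x \]
so that
\[ \cos^3 x \sin^6 x = \left(\sin^6 x - \sin^8 x\right) \frac{d}{dx} \sin x. \]
Hence
\begin{align*}
\int_0^{\pi/4} \cos^3 x \sin^6 x\,dx &= \int_0^{\pi/4} \left(\sin^6 x - \sin^8 x\right) \frac{d}{dx} \sin x\,dx \\
&= \left[ \frac{\sin^7 x}{7} - \frac{\sin^9 x}{9} \right]_0^{\pi/4} \\
&= \frac{1}{56\sqrt{2}} - \frac{1}{144\sqrt{2}} \\
&= \frac{11}{1008\sqrt{2}}.
\end{align*}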
Remark 4.2.2 (Odd powers of sine or cosine). The method of Example 4.2.1 can be applied whenever the power of sine or of cosine is odd. Let $m$ and $n$ be positive integers. The identity
\[ \sin^2 x + \cos^2 x = 1 \]
leads to the following.
(1) $\sin^n x \cos^{2m+1} x = \sin^n x \left(\cos^2 x\right)^m \cos x = \sin^n x \left(1 - \sin^2 x\right)^m \cos x$ for all $x \in \mathbb{R}$.
(2) $\sin^{2n+1} x \cos^m x = \sin x \left(\sin^2 x\right)^n \cos^m x = \sin x \left(1 - \cos^2 x\right)^n \cos^m x$ for all $x \in \mathbb{R}$.
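
The reduction in Remark 4.2.2 (1) can be turned into a small routine. The following is a sketch under the assumption that $(1 - \sin^2 x)^m$ is expanded with the binomial theorem, a step the remark leaves implicit; the name `odd_cosine_antiderivative` is illustrative. Each resulting term $\sin^{n+2k} x \cos x$ integrates to $\sin^{n+2k+1} x / (n + 2k + 1)$.

```python
import math


def odd_cosine_antiderivative(n, m):
    """Return an antiderivative F of sin(x)**n * cos(x)**(2*m + 1).

    Uses sin^n (1 - sin^2)^m cos = sum_k (-1)^k C(m, k) sin^(n + 2k) cos.
    """
    def F(x):
        s = math.sin(x)
        return sum((-1) ** k * math.comb(m, k) * s ** (n + 2 * k + 1) / (n + 2 * k + 1)
                   for k in range(m + 1))
    return F


# Example 4.2.1 corresponds to n = 6 and 2m + 1 = 3, that is, m = 1.
F = odd_cosine_antiderivative(6, 1)
value = F(math.pi / 4) - F(0.0)
print(value, 11 / (1008 * math.sqrt(2)))
```

For $n = 6$ and $m = 1$ the sum has two terms, $\sin^7 x / 7 - \sin^9 x / 9$, which is the antiderivative used in Example 4.2.1.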

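Example 4.2.3. We evaluate the integral
\[ \int_{-\pi/2}^{\pi/2} \cos^2(2x) \sin^5(2x)\,dx. \]

Solution. We use the identity $\cos^2(2x) + \sin^2(2x) = 1$ and find
\begin{align*}
\cos^2(2x) \sin^5(2x) &= \cos^2(2x) \sin^4(2x) \sin(2x) \\
&= \cos^2(2x) \left(1 - \cos^2(2x)\right)^2 \sin(2x) \\
&= \left(\cos^2(2x) - 2\cos^4(2x) + \cos^6(2x)\right) \sin(2x)
\end{align*}
for all $x \in \mathbb{R}$. But
\[ \frac{d}{dx} \cos(2x) = -2 \sin(2x) \]
so that
\[ \cos^2(2x) \sin^5(2x) = -\frac{1}{2} \left(\cos^2(2x) - 2\cos^4(2x) + \cos^6(2x)\right) \frac{d}{dx} \cos(2x). \]
Hence
\begin{align*}
\int_{-\pi/2}^{\pi/2} \cos^2(2x) \sin^5(2x)\,dx &= \int_{-\pi/2}^{\pi/2} -\frac{1}{2} \left(\cos^2(2x) - 2\cos^4(2x) + \cos^6(2x)\right) \frac{d}{dx} \cos(2x)\,dx \\
&= -\frac{1}{2} \left[ \frac{\cos^3(2x)}{3} - \frac{2\cos^5(2x)}{5} + \frac{\cos^7(2x)}{7} \right]_{-\pi/2}^{\pi/2} \\
&= 0.
\end{align*}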

For the method used in Examples 4.2.1 and 4.2.3 it is essential that either the power of
sine or of cosine is odd. If the powers of both sine and cosine are even, another approach is
needed.

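Example 4.2.4. We evaluate the integral
\[ \int_0^{\pi/2} 4\cos^4 x \sin^2 x\,dx. \]

Solution. We use the double angle identities
\[ \cos^2 x = \frac{1 + \cos(2x)}{2}, \quad \sin^2 x = \frac{1 - \cos(2x)}{2} \quad \text{and} \quad \sin(2x) = 2 \sin x \cos x, \quad x \in \mathbb{R} \]
and find that
\begin{align*}
4\cos^4 x \sin^2 x &= 4\cos^2 x (\cos x \sin x)^2 \\
&= 4 \times \frac{1 + \cos(2x)}{2} \times \frac{\sin^2(2x)}{4} \\
&= \frac{1}{2} \left( \cos(2x) \sin^2(2x) + \sin^2(2x) \right) \\
&= \frac{1}{2} \left( \cos(2x) \sin^2(2x) + \frac{1 - \cos(4x)}{2} \right) \\
&= \frac{\cos(2x) \sin^2(2x)}{2} - \frac{\cos(4x)}{4} + \frac{1}{4}.
\end{align*}
Therefore
\begin{align*}
\int_0^{\pi/2} 4\cos^4 x \sin^2 x\,dx &= \int_0^{\pi/2} \left( \frac{\cos(2x) \sin^2(2x)}{2} - \frac{\cos(4x)}{4} + \frac{1}{4} \right) dx \\
&= \left[ \frac{\sin^3(2x)}{12} - \frac{\sin(4x)}{16} + \frac{x}{4} \right]_0^{\pi/2} \\
&= \frac{\pi}{8}.
\end{align*}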
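
The reduction in Example 4.2.4 consists of several identity steps, so a pointwise comparison of the original and the reduced integrand is a cheap safeguard. The sketch below is illustrative only: the function names are not from the notes, and agreement at sample points is evidence, not a proof of the identity.

```python
import math


def integrand(x):
    """The original integrand 4 cos^4(x) sin^2(x)."""
    return 4 * math.cos(x) ** 4 * math.sin(x) ** 2


def reduced(x):
    """The expression obtained from the double angle identities."""
    return math.cos(2 * x) * math.sin(2 * x) ** 2 / 2 - math.cos(4 * x) / 4 + 0.25


def antiderivative(x):
    """Term-by-term antiderivative of the reduced expression."""
    return math.sin(2 * x) ** 3 / 12 - math.sin(4 * x) / 16 + x / 4


samples = [k * math.pi / 40 for k in range(21)]  # points in [0, pi/2]
max_gap = max(abs(integrand(x) - reduced(x)) for x in samples)
value = antiderivative(math.pi / 2) - antiderivative(0.0)
print(max_gap, value, math.pi / 8)
```

Only the term $x/4$ contributes at the endpoints, since $\sin(2x)$ and $\sin(4x)$ both vanish at $0$ and at $\pi/2$.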
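Remark 4.2.5 (Even powers of sine and cosine). The method employed in Example 4.2.4 can be applied whenever the powers of both sine and cosine are even. Let $m$ and $n$ be positive integers.

(1) Then the identities
\[ \cos^2 x = \frac{1 + \cos(2x)}{2} \quad \text{and} \quad \sin^2 x = \frac{1 - \cos(2x)}{2}, \quad x \in \mathbb{R} \]
yield
\[ \cos^{2m} x \sin^{2n} x = \left(\cos^2 x\right)^m \left(\sin^2 x\right)^n = \left( \frac{1 + \cos(2x)}{2} \right)^m \left( \frac{1 - \cos(2x)}{2} \right)^n. \]

(2) Alternatively, the identity
\[ \sin(2x) = 2 \sin x \cos x, \quad x \in \mathbb{R} \]
yields
\begin{align*}
\cos^{2m} x \sin^{2n} x &= \cos^{2m-2} x \sin^{2n-2} x (\cos x \sin x)^2 \\
&= \left(\cos^2 x\right)^{m-1} \left(\sin^2 x\right)^{n-1} (\cos x \sin x)^2 \\
&= \left( \frac{1 + \cos(2x)}{2} \right)^{m-1} \left( \frac{1 - \cos(2x)}{2} \right)^{n-1} \frac{\sin^2(2x)}{4}.
\end{align*}

(3) As in Example 4.2.4, we may have to apply the reductions in (1) and (2) more than once.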

We now turn to integrals of the form
\[ \int_a^b \tan^n x \sec^m x\,dx. \tag{4.4} \]
Since
\[ \frac{d}{dx} \tan x = \sec^2 x \quad \text{and} \quad \frac{d}{dx} \sec x = \sec x \tan x, \quad x \ne \frac{\pi}{2} + k\pi, \ k \in \mathbb{Z} \]
we have
\[ \int \tan^n x \sec^2 x\,dx = \int \tan^n x \frac{d}{dx} \tan x\,dx = \frac{\tan^{n+1} x}{n + 1} + C \]
and
\[ \int \sec^{m+1} x \tan x\,dx = \int \sec^m x \frac{d}{dx} \sec x\,dx = \frac{\sec^{m+1} x}{m + 1} + C \]
whenever $n \ge 1$ and $m \ge 0$ are integers. Our aim is therefore to convert an integral of the form (4.4) into one involving terms of the form
\[ \tan^n x \sec^2 x \quad \text{or} \quad \sec^m x (\sec x \tan x). \]
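Example 4.2.6. We evaluate the integral
\[ \int_0^{\pi/4} \tan^6 x \sec^4 x\,dx. \]

Solution. The identity $\tan^2 x + 1 = \sec^2 x$ yields
\begin{align*}
\sec^4 x \tan^6 x &= \tan^6 x \sec^2 x \sec^2 x \\
&= \tan^6 x \left(\tan^2 x + 1\right) \sec^2 x \\
&= \left(\tan^8 x + \tan^6 x\right) \sec^2 x
\end{align*}
for all $x \ne \frac{\pi}{2} + k\pi$, $k \in \mathbb{Z}$. But
\[ \frac{d}{dx} \tan x = \sec^2 x \]
so that
\[ \sec^4 x \tan^6 x = \left(\tan^8 x + \tan^6 x\right) \frac{d}{dx} \tan x, \quad x \ne \frac{\pi}{2} + k\pi, \ k \in \mathbb{Z}. \]
Therefore
\begin{align*}
\int_0^{\pi/4} \tan^6 x \sec^4 x\,dx &= \int_0^{\pi/4} \left(\tan^8 x + \tan^6 x\right) \frac{d}{dx} \tan x\,dx \\
&= \left[ \frac{\tan^9 x}{9} + \frac{\tan^7 x}{7} \right]_0^{\pi/4} \\
&= \frac{16}{63}.
\end{align*}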
Remark 4.2.7 (Even powers of secant). The technique used in Example 4.2.6 applies whenever the power of secant is even. Let $m \ge 1$ and $n \ge 0$ be integers. Then the identity
\[ \tan^2 x + 1 = \sec^2 x, \quad x \ne \frac{\pi}{2} + k\pi, \ k \in \mathbb{Z} \]
yields
\[ \tan^n x \sec^{2m} x = \tan^n x \sec^{2m-2} x \sec^2 x = \tan^n x \left(\tan^2 x + 1\right)^{m-1} \sec^2 x \]
whenever $x \ne \frac{\pi}{2} + k\pi$, $k \in \mathbb{Z}$.

Now consider the following example, to which the method in Remark 4.2.7 does not apply.
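Example 4.2.8. We evaluate the integral
\[ \int_0^{\pi/4} \sec^3 x \tan^5 x\,dx. \]

Solution. The identity $\tan^2 x + 1 = \sec^2 x$ yields
\begin{align*}
\sec^3 x \tan^5 x &= \sec^2 x \tan^4 x \sec x \tan x \\
&= \sec^2 x \left(\sec^2 x - 1\right)^2 \sec x \tan x \\
&= \left(\sec^6 x - 2\sec^4 x + \sec^2 x\right) \sec x \tan x
\end{align*}
for all $x \ne \frac{\pi}{2} + k\pi$, $k \in \mathbb{Z}$. But
\[ \frac{d}{dx} \sec x = \sec x \tan x \]
so that
\[ \sec^3 x \tan^5 x = \left(\sec^6 x - 2\sec^4 x + \sec^2 x\right) \frac{d}{dx} \sec x, \quad x \ne \frac{\pi}{2} + k\pi, \ k \in \mathbb{Z}. \]
Therefore
\begin{align*}
\int_0^{\pi/4} \sec^3 x \tan^5 x\,dx &= \int_0^{\pi/4} \left(\sec^6 x - 2\sec^4 x + \sec^2 x\right) \frac{d}{dx} \sec x\,dx \\
&= \left[ \frac{\sec^7 x}{7} - \frac{2\sec^5 x}{5} + \frac{\sec^3 x}{3} \right]_0^{\pi/4} \\
&= \frac{8\sqrt{2}}{7} - \frac{8\sqrt{2}}{5} + \frac{2\sqrt{2}}{3} - \frac{1}{7} + \frac{2}{5} - \frac{1}{3} \\
&= \frac{22\sqrt{2} - 8}{105}.
\end{align*}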
Remark 4.2.9 (Odd powers of tangent). The method demonstrated in Example 4.2.8 applies whenever the power of tangent is odd. Let $m \ge 1$ and $n \ge 0$ be integers. Then the identity
\[ \tan^2 x + 1 = \sec^2 x, \quad x \ne \frac{\pi}{2} + k\pi, \ k \in \mathbb{Z} \]
yields
\[ \tan^{2n+1} x \sec^m x = \tan^{2n} x \sec^{m-1} x (\sec x \tan x) = \sec^{m-1} x \left(\sec^2 x - 1\right)^n (\sec x \tan x) \]
whenever $x \ne \frac{\pi}{2} + k\pi$, $k \in \mathbb{Z}$.

The techniques for integrating powers of secant and tangent described in Remarks 4.2.7 and
4.2.9 are not exhaustive. That is, there are integrals of powers of secant and tangent to which neither of these methods applies. For instance, none of the integrals
\[ \int \tan x\,dx, \quad \int \sec x\,dx \quad \text{or} \quad \int \sec^3 x\,dx \]
can be evaluated in these ways. In such cases other integration techniques are required,
together with some ingenuity.

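Example 4.2.10. We evaluate the integral
\[ \int \tan x\,dx. \]

Solution. We have
\[ \tan x = \frac{\sin x}{\cos x} = -\frac{1}{\cos x} \times \frac{d}{dx} \cos x, \quad x \ne \frac{\pi}{2} + k\pi, \ k \in \mathbb{Z}. \]
Therefore
\[ \int \tan x\,dx = -\int \frac{1}{\cos x} \times \frac{d}{dx} \cos x\,dx = -\ln|\cos x| + C = \ln|\sec x| + C. \]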
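Example 4.2.11. We evaluate the integral
\[ \int \sec x\,dx. \]

Solution. Recall that
\[ \frac{d}{dx} \sec x = \sec x \tan x \quad \text{and} \quad \frac{d}{dx} \tan x = \sec^2 x. \]
Therefore
\begin{align*}
\sec x &= \sec x \times \frac{\sec x + \tan x}{\sec x + \tan x} \\
&= \frac{\sec^2 x + \sec x \tan x}{\sec x + \tan x} \\
&= \frac{1}{\sec x + \tan x} \times \frac{d}{dx} (\sec x + \tan x).
\end{align*}
Hence
\begin{align*}
\int \sec x\,dx &= \int \frac{1}{\sec x + \tan x} \times \frac{d}{dx} (\sec x + \tan x)\,dx \\
&= \ln|\sec x + \tan x| + C.
\end{align*}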
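
The antiderivatives found in Examples 4.2.10 and 4.2.11 can be checked by differentiating them numerically. The sketch below uses a symmetric difference quotient, which is an assumption of this illustration and not a method from the notes; the helper names are likewise illustrative, and the sample points are chosen inside $(-\pi/2, \pi/2)$, where $\sec x$ and $\tan x$ are defined.

```python
import math


def central_difference(F, x, h=1e-5):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)


def sec(x):
    return 1 / math.cos(x)


def tan_antiderivative(x):
    """ln|sec x|, from Example 4.2.10 (with C = 0)."""
    return math.log(abs(sec(x)))


def sec_antiderivative(x):
    """ln|sec x + tan x|, from Example 4.2.11 (with C = 0)."""
    return math.log(abs(sec(x) + math.tan(x)))


for x in (-1.0, 0.3, 1.2):
    print(x,
          central_difference(tan_antiderivative, x) - math.tan(x),
          central_difference(sec_antiderivative, x) - sec(x))
```

Both printed differences should be close to zero at every sample point, up to the truncation and rounding error of the difference quotient.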

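Example 4.2.12. We evaluate the integral
\[ \int_0^{\pi/3} \sec^3 x\,dx. \]

Solution. Set
\[ I = \int_0^{\pi/3} \sec^3 x\,dx. \]
Let
\[ f(x) = \sec x \text{ and } g'(x) = \sec^2 x, \quad x \in [0, \tfrac{\pi}{3}]. \]
Then
\[ f'(x) = \sec x \tan x \text{ and } g(x) = \tan x, \quad x \in [0, \tfrac{\pi}{3}]. \]
Hence
\begin{align*}
I &= \int_0^{\pi/3} f(x)g'(x)\,dx \\
&= \Big[ \sec x \tan x \Big]_0^{\pi/3} - \int_0^{\pi/3} \sec x \tan^2 x\,dx.
\end{align*}
We use the identity $\tan^2 x + 1 = \sec^2 x$ as well as Example 4.2.11 and find that
\begin{align*}
I &= 2\sqrt{3} - \int_0^{\pi/3} \left(\sec^3 x - \sec x\right) dx \\
&= 2\sqrt{3} + \Big[ \ln|\sec x + \tan x| \Big]_0^{\pi/3} - I \\
&= 2\sqrt{3} + \ln(2 + \sqrt{3}) - I.
\end{align*}
Solving for $I$ we have
\[ I = \frac{2\sqrt{3} + \ln(2 + \sqrt{3})}{2}. \]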
Remark 4.2.13. Note that the methods discussed for evaluating integrals of powers of secant and tangent may be modified so as to apply to integrals involving terms of the form
\[ \cot^n x \csc^m x \]
with $m$ and $n$ positive integers.

The methods we have discussed so far can be applied to integrals which seem not to involve
trigonometric functions at all. This can be done by making an appropriate substitution.
Such substitutions are particularly useful for evaluating integrals of rational functions $f$ given by
\[ f(x) = \frac{p(x)}{q(x)}, \quad x \in \mathbb{R} \text{ such that } q(x) \ne 0 \]
where the denominator q has a repeated irreducible quadratic factor. In this regard, consider
the following example.

Example 4.2.14. We evaluate the integral
\[ \int_0^1 \frac{2x^5 + x^4 + 4x^3 + 2x^2 + 2x + 2}{(x^2 + 1)^3}\,dx. \]

Solution. Let $p(x) = 2x^5 + x^4 + 4x^3 + 2x^2 + 2x + 2$ and $q(x) = (x^2 + 1)^3$ for all $x \in \mathbb{R}$. Since the quadratic term $x^2 + 1$ is irreducible, there exist unique real numbers $A$, $B$, $C$, $D$, $E$ and $F$ such that
\[ \frac{p(x)}{q(x)} = \frac{Ax + B}{x^2 + 1} + \frac{Cx + D}{(x^2 + 1)^2} + \frac{Ex + F}{(x^2 + 1)^3}, \quad x \in \mathbb{R}. \]
We have
\begin{align*}
&\frac{Ax + B}{x^2 + 1} + \frac{Cx + D}{(x^2 + 1)^2} + \frac{Ex + F}{(x^2 + 1)^3} \\
&\quad = \frac{(Ax + B)(x^2 + 1)^2 + (Cx + D)(x^2 + 1) + (Ex + F)}{(x^2 + 1)^3} \\
&\quad = \frac{(Ax + B)(x^4 + 2x^2 + 1) + (Cx + D)(x^2 + 1) + (Ex + F)}{(x^2 + 1)^3} \\
&\quad = \frac{Ax^5 + Bx^4 + (2A + C)x^3 + (2B + D)x^2 + (A + C + E)x + (B + D + F)}{(x^2 + 1)^3}.
\end{align*}
Therefore
\[ \frac{p(x)}{q(x)} = \frac{Ax + B}{x^2 + 1} + \frac{Cx + D}{(x^2 + 1)^2} + \frac{Ex + F}{(x^2 + 1)^3}, \quad x \in \mathbb{R} \]
if and only if
\begin{align*}
2x^5 + x^4 + 4x^3 + 2x^2 + 2x + 2 &= p(x) \\
&= Ax^5 + Bx^4 + (2A + C)x^3 + (2B + D)x^2 + (A + C + E)x + (B + D + F)
\end{align*}
for every $x \in \mathbb{R}$ so that
\begin{align*}
A &= 2 \\
B &= 1 \\
2A + C &= 4 \\
2B + D &= 2 \\
A + C + E &= 2 \\
B + D + F &= 2.
\end{align*}
Solving the system of equations, we find that
\[ A = 2, \ B = 1, \ C = 0, \ D = 0, \ E = 0 \text{ and } F = 1. \]

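Therefore
\[ \frac{p(x)}{q(x)} = \frac{2x + 1}{x^2 + 1} + \frac{1}{(x^2 + 1)^3}, \quad x \in \mathbb{R}. \]
We now have
\begin{align*}
\int_0^1 \frac{2x^5 + x^4 + 4x^3 + 2x^2 + 2x + 2}{(x^2 + 1)^3}\,dx &= \int_0^1 \left( \frac{2x + 1}{x^2 + 1} + \frac{1}{(x^2 + 1)^3} \right) dx \\
&= \int_0^1 \left( \frac{2x}{x^2 + 1} + \frac{1}{x^2 + 1} + \frac{1}{(x^2 + 1)^3} \right) dx \tag{4.5} \\
&= \Big[ \ln(x^2 + 1) + \arctan x \Big]_0^1 + \int_0^1 \frac{1}{(x^2 + 1)^3}\,dx \\
&= \ln 2 + \frac{\pi}{4} + \int_0^1 \frac{1}{(x^2 + 1)^3}\,dx.
\end{align*}
In order to evaluate the remaining integral, we set
\[ x = \tan\theta, \quad -\frac{\pi}{2} < \theta < \frac{\pi}{2}. \]
Then
\[ \frac{1}{(x^2 + 1)^3} = \frac{1}{\sec^6\theta} \]
and
\[ \frac{dx}{d\theta} = \sec^2\theta \]
whenever $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$. We also have
\[ x = 0 \text{ if and only if } \theta = \arctan 0 = 0 \]
and
\[ x = 1 \text{ if and only if } \theta = \arctan 1 = \frac{\pi}{4}. \]
Therefore
\begin{align*}
\int_0^1 \frac{1}{(x^2 + 1)^3}\,dx &= \int_0^{\pi/4} \frac{1}{\sec^6\theta} \times \sec^2\theta\,d\theta \\
&= \int_0^{\pi/4} \cos^4\theta\,d\theta.
\end{align*}
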
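Using the double angle identity for cosine we have
\begin{align*}
\cos^4\theta &= \left( \frac{\cos(2\theta) + 1}{2} \right)^2 \\
&= \frac{\cos^2(2\theta) + 2\cos(2\theta) + 1}{4} \\
&= \frac{1}{4} \left( \frac{\cos(4\theta) + 1}{2} + 2\cos(2\theta) + 1 \right) \\
&= \frac{\cos(4\theta)}{8} + \frac{\cos(2\theta)}{2} + \frac{3}{8}.
\end{align*}
Therefore
\begin{align*}
\int_0^1 \frac{1}{(x^2 + 1)^3}\,dx &= \int_0^{\pi/4} \left( \frac{\cos(4\theta)}{8} + \frac{\cos(2\theta)}{2} + \frac{3}{8} \right) d\theta \\
&= \left[ \frac{\sin(4\theta)}{32} + \frac{\sin(2\theta)}{4} + \frac{3\theta}{8} \right]_0^{\pi/4} \tag{4.6} \\
&= \frac{1}{4} + \frac{3\pi}{32}.
\end{align*}
We substitute (4.6) into (4.5) and find
\[ \int_0^1 \frac{2x^5 + x^4 + 4x^3 + 2x^2 + 2x + 2}{(x^2 + 1)^3}\,dx = \ln 2 + \frac{8 + 11\pi}{32}. \]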
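
Example 4.2.14 combines a partial fraction decomposition with a trigonometric substitution, so two separate checks are useful: one for the decomposition and one for the final value. The sketch below is illustrative only; the helper names are not from the notes, and composite Simpson's rule is brought in purely as an independent numerical approximation.

```python
import math


def p(x):
    return 2 * x ** 5 + x ** 4 + 4 * x ** 3 + 2 * x ** 2 + 2 * x + 2


def q(x):
    return (x ** 2 + 1) ** 3


def decomposition(x):
    """(2x + 1)/(x^2 + 1) + 1/(x^2 + 1)^3, i.e. A = 2, B = 1, F = 1, C = D = E = 0."""
    return (2 * x + 1) / (x ** 2 + 1) + 1 / (x ** 2 + 1) ** 3


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


# Pointwise comparison of p/q with its decomposition on a grid in [-3, 3].
max_gap = max(abs(p(x) / q(x) - decomposition(x)) for x in [k / 10 for k in range(-30, 31)])
numeric = simpson(lambda x: p(x) / q(x), 0.0, 1.0)
closed_form = math.log(2) + (8 + 11 * math.pi) / 32
print(max_gap, numeric, closed_form)
```

Since $q$ has no real zeros, the comparison grid may include negative values of $x$ as well.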

We end this section by considering integrals such as
\[ \int_a^b \cos(Ax) \sin(Bx)\,dx \quad \text{and} \quad \int_a^b \sin(Ax) \sin(Bx)\,dx \]

where A and B are nonzero real numbers. For the evaluation of such integrals, and similar
ones, the additive identities for sine and cosine are useful. Recall that for real numbers α
and β,
cos(α + β) = cos α cos β − sin α sin β,
cos(α − β) = cos α cos β + sin α sin β,
sin(α + β) = sin α cos β + cos α sin β,
and
sin(α − β) = sin α cos β − cos α sin β.
We illustrate the underlying idea by means of the following examples.

Example 4.2.15. We evaluate the integral
\[ \int_{-\pi/2}^{0} \sin(3x) \cos(2x)\,dx. \]

Solution. For $x \in \mathbb{R}$ we have
\[ \sin 5x = \sin(3x + 2x) = \sin 3x \cos 2x + \sin 2x \cos 3x \]
and
\[ \sin x = \sin(3x - 2x) = \sin 3x \cos 2x - \sin 2x \cos 3x. \]
Therefore
\[ \sin 5x + \sin x = 2 \sin 3x \cos 2x \]
so that
\begin{align*}
\int_{-\pi/2}^{0} \sin(3x) \cos(2x)\,dx &= \frac{1}{2} \int_{-\pi/2}^{0} (\sin 5x + \sin x)\,dx \\
&= \left[ -\frac{\cos 5x}{10} - \frac{\cos x}{2} \right]_{-\pi/2}^{0} \\
&= -\frac{3}{5}.
\end{align*}
Example 4.2.16. We evaluate the integral
\[ \int_0^{\pi/6} \cos(x + 1) \cos(x - 1)\,dx. \]

Solution. For $x \in \mathbb{R}$ we have
\[ \cos(2x) = \cos((x + 1) + (x - 1)) = \cos(x + 1) \cos(x - 1) - \sin(x + 1) \sin(x - 1) \]
and
\[ \cos 2 = \cos((x + 1) - (x - 1)) = \cos(x + 1) \cos(x - 1) + \sin(x + 1) \sin(x - 1). \]
Therefore
\[ \cos(2x) + \cos 2 = 2 \cos(x + 1) \cos(x - 1) \]
so that
\begin{align*}
\int_0^{\pi/6} \cos(x + 1) \cos(x - 1)\,dx &= \frac{1}{2} \int_0^{\pi/6} (\cos(2x) + \cos 2)\,dx \\
&= \left[ \frac{\sin 2x}{4} + \frac{\cos 2}{2} x \right]_0^{\pi/6} \\
&= \frac{\sqrt{3}}{8} + \frac{\pi \cos 2}{12}.
\end{align*}
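
The values obtained in Examples 4.2.15 and 4.2.16 can be compared with a direct numerical approximation of the original integrals. The sketch below uses a composite midpoint rule, which is an assumption of this illustration and not a method from the notes; `midpoint` and `product_to_sum_gap` are illustrative names.

```python
import math


def midpoint(f, a, b, n=4000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def product_to_sum_gap(A, B, x):
    """|sin(Ax)cos(Bx) - (sin((A + B)x) + sin((A - B)x))/2| at the point x."""
    lhs = math.sin(A * x) * math.cos(B * x)
    rhs = (math.sin((A + B) * x) + math.sin((A - B) * x)) / 2
    return abs(lhs - rhs)


value_15 = midpoint(lambda x: math.sin(3 * x) * math.cos(2 * x), -math.pi / 2, 0.0)
value_16 = midpoint(lambda x: math.cos(x + 1) * math.cos(x - 1), 0.0, math.pi / 6)

print(value_15, -3 / 5)
print(value_16, math.sqrt(3) / 8 + math.pi * math.cos(2) / 12)
print(max(product_to_sum_gap(3, 2, k / 7) for k in range(-20, 21)))
```

The last line checks the identity $\sin 5x + \sin x = 2 \sin 3x \cos 2x$ from Example 4.2.15 at a handful of points.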

In this section it is shown how trigonometric identities are used to evaluate integrals of trigonometric functions. As Example 4.2.14 shows, these methods can be used, via appropriate substitutions, to evaluate integrals of other kinds of functions as well.

Exercise 4.2

1. Evaluate the following integrals.


Z π/2 Z π
3
(a) sin x cos x dx (b) sin4 x cos3 x dx
0 0

Z π/4 Z π/3
3 2
(c) sin (2x) cos (2x) dx (d) sin2 x dx
−π 0

Z π/3 Z π/3
2
(e) cos (2x) dx (f) sin2 x cos2 x dx
Zπ/4
π/6 Z π/3
0

4
sin2 (3x) + cos3 x dx

(g) sin x dx (h)
0 0

Z π/4 Z π/3
2 2
(i) x sin x dx (j) sin3 x ln(cos x) dx
0 0

2. Evaluate the following integrals.


Z π/4 Z π/6
4
(a) sec x dx (b) sec x tan3 x dx
0 0

Z π/3 Z π/3
3 3
tan2 x + sec2 x dx

(c) sec x tan x dx (d)
−π/3 0

Z 0 Z π/6
5
(e) tan x sec x dx (f) tan3 (2x) sec2 (2x) dx
−π/4 0

Z π/6 Z π/6
3
(g) tan x dx (h) sec5 x dx
0 0

Z π/4 Z π/3
2
(i) x sec x dx (j) x2 sec3 x tan x dx
0 0
Z
(k) sec3 x dx

3. Evaluate the following integrals.
Z π Z π/6
(a) cos x cos(4x) dx (b) sin(2x) sin(3x) dx
0 0

Z 2π+2 Z π
(c) sin(3x − 2) cos(2x) dx (d) x cos(x2 − 1) sin(2x2 + 1) dx
−2 0

4. Evaluate the given integrals.


Z π/2 Z π/3
2 4
(a) cot x csc x dx (b) csc3 x cot3 x dx
Z π/4
5π/6 Z 3π/4 π/6

(c) csc3 x dx (d) cot3 x dx


π/2 π/4

5. Evaluate the integral by using the given substitution at the appropriate stage in your
calculation.
Z 1
1
(a) 2 2
dx; x = tan θ + 1
−1 (x − 2x + 2)

2 2

x2 − 4
Z
(b) dx; x = 2 sec θ
2 x
Z 3/2 √
(c) x2 9 − x2 dx; x = 3 sin θ
0

2
x4 + x3 + 7x2 + 13x − 19
Z
(d) dx; x = 2 tan θ
0 (x + 1)(x2 + 4)2
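6. Let $m, n \ge 1$ be integers. Prove the following.
(a) $\int_{-\pi}^{\pi} \sin(nx) \cos(mx)\,dx = 0$
(b) $\int_{-\pi}^{\pi} \sin(nx) \sin(mx)\,dx = \begin{cases} 0 & \text{if } m \ne n \\ \pi & \text{if } m = n \end{cases}$
(c) $\int_{-\pi}^{\pi} \cos(nx) \cos(mx)\,dx = \begin{cases} 0 & \text{if } m \ne n \\ \pi & \text{if } m = n \end{cases}$
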
4.3 The Fundamental Theorem of Calculus

In this section we prove an important and central result in Calculus, namely, a version of
the Fundamental Theorem of Calculus. We have already encountered the following version
of this theorem.

Theorem 4.3.1. If $f$ is a continuous function defined on an interval $[a, b]$, and $F$ is a differentiable function on $[a, b]$ such that $F'(t) = f(t)$, then
\[ \int_a^b f(t)\,dt = F(b) - F(a). \]

Suppose now that we let the upper bound of the definite integral vary through the interval $[a, b]$. That is, we consider the definite integrals
\[ \int_a^x f(t)\,dt, \quad a \le x \le b. \tag{4.7} \]
For each fixed value of $x$ we can calculate the value of the integral (4.7). If we keep the integrand $f$ and the interval $[a, b]$ fixed, then clearly the value of the definite integral (4.7) depends only on $x$. Therefore the expression (4.7) defines a function of $x$,
\[ F(x) = \int_a^x f(t)\,dt, \quad a \le x \le b. \tag{4.8} \]

It is important to note that in (4.8) we have defined a new function using an integral. The
result we are going to prove tells us something about this function.

Theorem 4.3.2 (Fundamental Theorem of Calculus). If f is continuous on the closed


interval [a, b], then the function
Z x
F (x) = f (t)dt, a ≤ x ≤ b
a

is continuous on [a, b] and differentiable on (a, b) and F ′ (x) = f (x) for every x ∈ (a, b).

We give the proof in two parts.

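Proof that F is differentiable on (a, b). Fix a point c with a < c < b.

Step 1 We prove that $\lim_{h \to 0^+} \frac{F(c+h) - F(c)}{h} = f(c)$.

For h > 0,
$$
\begin{aligned}
\frac{F(c+h) - F(c)}{h} &= \frac{1}{h}\left[\int_a^{c+h} f(t)\,dt - \int_a^{c} f(t)\,dt\right] \\
&= \frac{1}{h}\left[\int_a^{c} f(t)\,dt + \int_c^{c+h} f(t)\,dt - \int_a^{c} f(t)\,dt\right] \\
&= \frac{1}{h}\int_c^{c+h} f(t)\,dt.
\end{aligned} \tag{4.9}
$$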

Fix a real number ϵ > 0. Since f is continuous at c, it follows that $\lim_{t \to c} f(t) = f(c)$. Therefore there exists a real number δ > 0 so that

if |t − c| < δ then $|f(t) - f(c)| < \frac{\epsilon}{2}$.

Therefore we have

if |t − c| < δ then $f(c) - \frac{\epsilon}{2} < f(t) < f(c) + \frac{\epsilon}{2}$.

Let 0 < h < δ. Then we have

if c ≤ t ≤ c + h then |t − c| ≤ h < δ so that $f(c) - \frac{\epsilon}{2} < f(t) < f(c) + \frac{\epsilon}{2}$.

It now follows from Theorem A.1.10 (ii) that
$$h\left[f(c) - \frac{\epsilon}{2}\right] \le \int_c^{c+h} f(t)\,dt \le h\left[f(c) + \frac{\epsilon}{2}\right].$$
Therefore
$$-\frac{\epsilon}{2} \le \frac{1}{h}\int_c^{c+h} f(t)\,dt - f(c) \le \frac{\epsilon}{2}.$$
By (4.9) we have
$$-\frac{\epsilon}{2} \le \frac{F(c+h) - F(c)}{h} - f(c) \le \frac{\epsilon}{2}.$$
Therefore
$$\left|\frac{F(c+h) - F(c)}{h} - f(c)\right| \le \frac{\epsilon}{2} < \epsilon.$$
This is true for all ϵ > 0, so by the definition of a right-hand limit,
$$\lim_{h \to 0^+} \frac{F(c+h) - F(c)}{h} = f(c).$$
Step 2 We prove that $\lim_{h \to 0^-} \frac{F(c+h) - F(c)}{h} = f(c)$.
The proof is similar to that in Step 1, and is therefore given as an exercise, see Exercise 4.3 number 5 (a).
Step 3 We prove that F ′ (c) = f (c).
Since
$$\lim_{h \to 0^+} \frac{F(c+h) - F(c)}{h} = f(c) = \lim_{h \to 0^-} \frac{F(c+h) - F(c)}{h}$$
it follows by Theorem 3.3.7 that
$$F'(c) = \lim_{h \to 0} \frac{F(c+h) - F(c)}{h} = f(c).$$
Since c ∈ (a, b) is arbitrary, this shows that F is differentiable on (a, b) with derivative
F ′ (x) = f (x) for a < x < b.

Proof that F is continuous. Because F is differentiable on (a, b) it is also continuous on


(a, b), see Theorem A.1.4. We therefore only have to show that F is continuous at a and b.
Step 1 We prove that F is continuous at b.
Note that
Z b
F (b) = f (t)dt. (4.10)
a

Theorem A.1.11 implies that for all a ≤ x < b we have


Z x
F (x) = f (t)dt
a

Z b Z b
= f (t)dt − f (t)dt (4.11)
a x

Z b
= F (b) − f (t)dt.
x

Since f is continuous on [a, b] Theorem A.1.3 implies that there are points c and d in [a, b]
such that

f (c) ≤ f (t) ≤ f (d), a ≤ t ≤ b. (4.12)

It follows from (4.12) and Theorem A.1.10 (ii) that


Z b
(b − x)f (c) ≤ f (t)dt ≤ (b − x)f (d) for all x ∈ [a, b].
x

Therefore (4.11) implies that

F (b) − (b − x)f (d) ≤ F (x) ≤ F (b) − (b − x)f (c). (4.13)

But $\lim_{x \to b^-} [F(b) - (b - x)f(d)] = \lim_{x \to b^-} [F(b) - (b - x)f(c)] = F(b)$, so that Theorem 3.3.4 and (4.13) imply that $\lim_{x \to b^-} F(x) = F(b)$. Therefore F is continuous at b.
Step 2 We prove that F is continuous at a.
The proof is similar to that in Step 1, and is given as an exercise, see Exercise 4.3 number
5 (b).

Let us consider a simple example.


Example 4.3.3. Consider the function $F(x) = \int_0^x t^2 \sin(t^2)\,dt$, x ∈ R. We show that $F'(x) = x^2 \sin(x^2)$ for all x ∈ R.

Solution. The function f (t) = t2 sin(t2 ), t ∈ R, is continuous. Therefore, if x > 0 then


F ′ (x) exists and F ′ (x) = x2 sin(x2 ) by the Fundamental Theorem of Calculus, Theorem
4.3.2.

Next we deal with the case x ≤ 0. In this case we have
$$F(x) = \int_0^x t^2 \sin(t^2)\,dt = -\int_x^0 t^2 \sin(t^2)\,dt.$$
Note that Theorem 4.3.2 cannot be applied directly in this situation. However, for a fixed real number a < 0 we have
$$\int_a^0 t^2 \sin(t^2)\,dt = \int_a^x t^2 \sin(t^2)\,dt + \int_x^0 t^2 \sin(t^2)\,dt, \quad a < x \le 0.$$
Therefore
$$F(x) = -\int_x^0 t^2 \sin(t^2)\,dt = \int_a^x t^2 \sin(t^2)\,dt - \int_a^0 t^2 \sin(t^2)\,dt, \quad a < x \le 0. \tag{4.14}$$
By the Fundamental Theorem of Calculus, Theorem 4.3.2,
$$\frac{d}{dx} \int_a^x t^2 \sin(t^2)\,dt = x^2 \sin(x^2), \quad a < x \le 0.$$
The final integral in (4.14) is a constant, so it has derivative 0. Therefore
$$F'(x) = \frac{d}{dx} \int_a^x t^2 \sin(t^2)\,dt - \frac{d}{dx} \int_a^0 t^2 \sin(t^2)\,dt = x^2 \sin(x^2), \quad a < x \le 0.$$
This holds for every fixed a < 0 and all x ∈ (a, 0]. Therefore
$$F'(x) = x^2 \sin(x^2), \quad x \le 0.$$
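
The conclusion of Example 4.3.3 can be checked numerically. The following Python sketch is a sanity check, not part of the proof: it approximates F with a composite Simpson rule and compares a symmetric difference quotient of F with x² sin(x²), including at points x ≤ 0. The helper names `simpson` and `central_difference`, the step sizes and the sample points are illustrative choices that do not come from the notes.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def f(t):
    return t * t * math.sin(t * t)


def F(x):
    # F(x) = integral of f from 0 to x. For x < 0 the step h is negative,
    # which reproduces the convention that swapping the limits flips the sign.
    return simpson(f, 0.0, x)


def central_difference(g, x, h=1e-4):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (-1.5, -0.5, 0.0, 0.7, 2.0):
    print(f"x = {x:5.2f}   F'(x) ~ {central_difference(F, x): .6f}   "
          f"x^2 sin(x^2) = {f(x): .6f}")
```

The two printed columns should agree to several decimal places at every sample point, on both sides of 0.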

Note that Theorem 4.3.2 deals only with functions
Z x
F (x) = f (t)dt, a ≤ x ≤ b,
a

where f : [a, b] → R is continuous on [a, b]. Example 4.3.3 suggests a way in which Theorem
4.3.2 may be adapted to deal also with a function
Z x
F (x) = f (t)dt, x ∈ R,
a

where f : R → R is continuous.

Theorem 4.3.4. Suppose that f : R → R is continuous on R. For a fixed real number a, the function
$$F(x) = \int_a^x f(t)\,dt, \quad x \in \mathbb{R}$$
is differentiable on R, and F′(x) = f(x), x ∈ R.

The proof of this result is given as an exercise, see Exercise 4.3 number 6. We now turn
to slightly more complex examples. In these examples it is not always possible to apply
Theorems 4.3.2 or 4.3.4 directly to the given function F , which is defined as an integral.
One strategy is to express the function F in terms of simpler functions to which these
theorems can be applied. We demonstrate this method by means of some examples.
Example 4.3.5. Consider the function $F(x) = \int_1^{\cos x} e^{t^2}\,dt$. We show that F is differentiable on R, and find its derivative.

Solution. Because $f(t) = e^{t^2}$ is continuous on R, the function
$$G(x) = \int_1^x e^{t^2}\,dt$$
is differentiable on R by Theorem 4.3.4. Furthermore, cos is differentiable on R. Therefore the Chain Rule implies that
$$F(x) = \int_1^{\cos x} e^{t^2}\,dt = G(\cos x)$$
is differentiable on R. Applying Theorem 4.3.4 and the Chain Rule, we find that
$$F'(x) = G'(\cos x)\,\frac{d}{dx}\cos x = -\sin(x)\,e^{\cos^2 x}.$$
Example 4.3.6. Consider the function $F(x) = \int_{x^2}^{\arctan x} \cos^8 t\,dt$. We show that F is differentiable on R, and find its derivative.

Solution. Let
$$G(x) = \int_0^x \cos^8 t\,dt, \quad x \in \mathbb{R}.$$
Since $f(t) = \cos^8 t$, t ∈ R, is continuous on R, it follows from Theorem 4.3.4 that G is differentiable on R and
$$G'(x) = \cos^8 x, \quad x \in \mathbb{R}.$$
For x ∈ R we have
$$
\begin{aligned}
F(x) &= \int_0^{\arctan x} \cos^8 t\,dt + \int_{x^2}^{0} \cos^8 t\,dt \\
&= \int_0^{\arctan x} \cos^8 t\,dt - \int_0^{x^2} \cos^8 t\,dt \\
&= G(\arctan x) - G(x^2).
\end{aligned}
$$
By the Chain Rule, F is differentiable on R and
$$F'(x) = G'(\arctan x)\,\frac{d}{dx}\arctan x - G'(x^2)\,\frac{d}{dx}x^2 = \frac{\cos^8(\arctan x)}{1 + x^2} - 2x\cos^8(x^2)$$
for every x ∈ R.
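
The derivative found in Example 4.3.6 can be compared with a numerical derivative of F. The sketch below does this in Python, assuming a composite Simpson rule for the integral and a symmetric difference quotient for the derivative. The function names and sample points are illustrative and are not part of the notes.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cos8(t):
    return math.cos(t) ** 8


def F(x):
    # Both limits of integration depend on x: lower limit x^2, upper limit arctan x.
    return simpson(cos8, x * x, math.atan(x))


def F_prime_formula(x):
    # Formula obtained in Example 4.3.6 from Theorem 4.3.4 and the Chain Rule.
    return cos8(math.atan(x)) / (1 + x * x) - 2 * x * cos8(x * x)


def central_difference(g, x, h=1e-4):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


for x in (-1.0, 0.0, 0.5, 1.2):
    print(x, central_difference(F, x), F_prime_formula(x))
```

Writing F as G(arctan x) − G(x²) is what makes the formula work; the numerical check treats F as a single function of x and so tests the final result independently of that decomposition.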

The next example shows how the Fundamental Theorem of Calculus is used to obtain basic
information about the behaviour of a function which is defined as an integral, without
calculating explicitly the values of the function.
Example 4.3.7. Consider the function $F(x) = \int_0^x \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt$, x ∈ R. We make a rough sketch of the graph of F, indicating its turning points as well as its behaviour as x tends to ∞ and −∞, respectively.

Solution. First note that
$$F(0) = \int_0^0 \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt = 0.$$
Observe that this is the only value of F that we can find without calculating the antiderivative of the integrand.

Using Theorem 4.3.4, we see that
$$F'(x) = \frac{x^3 - 3x^2 + 2x}{x^2 + 1}, \quad x \in \mathbb{R}. \tag{4.15}$$
Thus we can find the critical points of F by setting F′(x) = 0. We have
$$F'(x) = \frac{x^3 - 3x^2 + 2x}{x^2 + 1} = 0 \quad \text{if and only if } x = 0,\ x = 1 \text{ or } x = 2.$$

We decide whether each of the critical points is a local minimum or a local maximum by using the first derivative test. That is, we look at the sign of F′(x) on each of the intervals (−∞, 0), (0, 1), (1, 2) and (2, ∞).

Note that x² + 1 > 0 for all x ∈ R, and
$$F'(x) = \frac{x(x - 1)(x - 2)}{x^2 + 1}, \quad x \in \mathbb{R}.$$
We have

if x < 0 then all three factors x, x − 1 and x − 2 are negative, so that F′(x) < 0.
We also have,

if 0 < x < 1 then x > 0, x − 1 < 0 and x − 2 < 0 so that F ′ (x) > 0.

Furthermore,

if 1 < x < 2 then x > 0, x − 1 > 0 and x − 2 < 0 so that F ′ (x) < 0.

Lastly,

if x > 2 then x > 0, x − 1 > 0 and x − 2 > 0 so that F ′ (x) > 0.

Therefore x = 0 is a local minimum of F , while x = 1 is a local maximum of F and x = 2


is a local minimum of F .

We now determine the behaviour of F as x tends to ∞. Observe that
$$\lim_{t \to \infty} \frac{t^3 - 3t^2 + 2t}{t^2 + 1} = \infty.$$
Therefore there exists a real number M > 0 so that

if t > M then $\frac{t^3 - 3t^2 + 2t}{t^2 + 1} > 1$.

Using Theorem A.1.10 (i) and Theorem A.1.11 we now have
$$
\begin{aligned}
F(x) &= \int_0^M \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt + \int_M^x \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt \\
&= F(M) + \int_M^x \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt \\
&\ge F(M) + \int_M^x 1\,dt \\
&= F(M) + x - M
\end{aligned}
$$
for all x ≥ M. But
$$\lim_{x \to \infty} [F(M) + x - M] = \infty.$$
Therefore, see Exercise 3.4 number 2(a), $\lim_{x \to \infty} F(x) = \infty$.

Lastly, we examine the behaviour of F (x) as x tends to −∞. Note that if t ≤ 0 then t3 ≤ 0
and −3t2 ≤ 0. Therefore

if t ≤ 0 then t3 − 3t2 + 2t ≤ 2t.

Therefore

if t ≤ 0 then $\frac{t^3 - 3t^2 + 2t}{t^2 + 1} \le \frac{2t}{t^2 + 1}$.

Note that, for x < 0,
$$F(x) = \int_0^x \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt = -\int_x^0 \frac{t^3 - 3t^2 + 2t}{t^2 + 1}\,dt.$$
Using Theorem A.1.10 (i) we get
$$F(x) \ge -\int_x^0 \frac{2t}{t^2 + 1}\,dt = \ln(x^2 + 1).$$
Since $\lim_{x \to -\infty} \ln(x^2 + 1) = \infty$, it follows that $\lim_{x \to -\infty} F(x) = \infty$.

Below we give a rough sketch of the graph of F , using the information that we have obtained.

[Figure: rough sketch of the graph of y = F(x), with the x-axis marked at 1, 2 and 3.]
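
The qualitative picture obtained in Example 4.3.7 can be compared with numerical values of F. The Python sketch below tabulates F with a composite Simpson rule. It also cross-checks against a closed form found by polynomial division of the integrand; that closed form is our own computation for testing purposes and is not used in the notes. All function names are illustrative.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def integrand(t):
    return (t ** 3 - 3 * t ** 2 + 2 * t) / (t ** 2 + 1)


def F(x):
    return simpson(integrand, 0.0, x)


def F_closed(x):
    # Assumption for cross-checking only: since
    # (t^3 - 3t^2 + 2t)/(t^2 + 1) = t - 3 + (t + 3)/(t^2 + 1),
    # an antiderivative vanishing at 0 is the expression below.
    return x * x / 2 - 3 * x + 0.5 * math.log(x * x + 1) + 3 * math.atan(x)


for x in (-3, -2, -1, 0, 1, 2, 3, 4):
    print(f"F({x:2d}) ~ {F(x):8.4f}   closed form: {F_closed(x):8.4f}")
```

The table should show F decreasing up to x = 0, increasing on (0, 1), decreasing on (1, 2) and increasing again after x = 2, in agreement with the first derivative test.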

In this section, we have introduced a new kind of function, namely, functions which are
defined using integrals. Such functions appear with great frequency in mathematics and its
applications. In fact, it can be shown that the solutions of a differential equation

u′(x) + p(x)u(x) = 0,

with p a continuous function on R, are of the form
$$u(x) = Ce^{-\int_0^x p(t)\,dt}.$$
Here C ∈ R is a constant. The Fundamental Theorem of Calculus is the basic tool we use
to analyse functions of this type.
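
As an illustration of the last remark, the following Python sketch checks numerically that u(x) = C exp(−∫₀ˣ p(t) dt) satisfies u′(x) + p(x)u(x) = 0. The coefficient p(t) = cos t and the constant C = 2 are hypothetical choices made only for this example; the integral is approximated by a composite Simpson rule and the derivative by a symmetric difference quotient.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def p(t):
    # Hypothetical continuous coefficient, chosen only for illustration.
    return math.cos(t)


C = 2.0  # hypothetical constant


def u(x):
    # u(x) = C * exp(-integral of p from 0 to x)
    return C * math.exp(-simpson(p, 0.0, x))


def residual(x, h=1e-5):
    """Approximates u'(x) + p(x) u(x), which should be close to 0."""
    u_prime = (u(x + h) - u(x - h)) / (2 * h)
    return u_prime + p(x) * u(x)


for x in (-2.0, 0.0, 1.0, 3.0):
    print(f"x = {x:4.1f}   u(x) ~ {u(x):.6f}   residual ~ {residual(x):.2e}")
```

Note that u(0) = C, because the integral from 0 to 0 vanishes.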

Exercise 4.3

1. Explain why each of the following functions is differentiable. Give the domain of each
function, and find its derivative without evaluating the integral.
(a) $F(x) = \int_0^x e^t \cos^6 t\,dt$    (b) $G(x) = \int_0^{\tan x} \frac{t^2}{t^2 + 2}\,dt$

(c) $H(x) = \int_{x^2}^{e^x} t^2 \ln(t^4 + t^2 + 1)\,dt$    (d) $F(x) = \int_{\sin x}^{\cos x} e^{t^2}\,dt$

(e) $G(x) = \int_{x^4}^{x^2} \cos^{10} t\,dt$    (f) $H(x) = \int_{\ln(x)}^{x^2} \ln(t)\,dt$
2. Consider the function $F(x) = \int_0^x f(t)\,dt$, x ∈ R, where
$$f(t) = \begin{cases} t - 1 & \text{if } t \le 0 \\ 1 & \text{if } t > 0. \end{cases}$$

(a) Show that F is not differentiable at x = 0, and explain why this does not con-
tradict Theorem 4.3.4.
(b) Determine all x ∈ R at which F is differentiable, and find F ′ (x) at each of these
values of x.
(c) Show that F is continuous on R.
3. Consider the function
$$F(x) = \int_0^x \frac{t^3 + 2t^2 - t - 2}{t^2 + 1}\,dt, \quad x \in \mathbb{R}.$$
(a) Find all the critical points of F , and determine whether or not these are local
maxima or local minima.
(b) Determine the intervals on which F is increasing and decreasing, respectively.
(c) Show that $\lim_{x \to \infty} F(x) = \infty$ and $\lim_{x \to -\infty} F(x) = \infty$.
(d) Make a rough sketch of the graph of F .
4. Suppose that p : R → R is a continuous function. Let
$$u(x) = Ce^{-\int_0^x p(t)\,dt}$$
where C is a constant.
(a) Prove that u′ (x) + p(x)u(x) = 0 for every x ∈ R.
(b) Show that for every u0 ∈ R, there exists exactly one value for the constant C so
that u(0) = u0 .
5. Consider the proof of Theorem 4.3.2.
(a) Prove that $\lim_{h \to 0^-} \frac{F(c+h) - F(c)}{h} = f(c)$ for a fixed c ∈ (a, b).
(b) Prove that F is continuous at a.
6. The aim of this exercise is to prove Theorem 4.3.4. Let F and a be as in Theorem
4.3.4.
(a) Assume that x > a. Prove that F is differentiable at x.
(b) Assume that x ≤ a. Prove that F is differentiable at x. [HINT: Apply the
argument used in Example 4.3.3]
7. One can use the Mean Value Theorem for the integral, Theorem 4.4.4, to prove the Fundamental Theorem of Calculus, Theorem 4.3.2. Now assume that the Fundamental Theorem of Calculus is true, and use it to prove the Mean Value Theorem for the integral. [HINT: You also need to use the Mean Value Theorem for the derivative.]
8. Assume that f : [a, b] → R is a bounded, integrable function. Consider the function
Z x
F (x) = f (t)dt, a ≤ x ≤ b.
a

(a) Fix a point c ∈ (a, b). Use the definition of a one-sided limit to prove that $\lim_{x \to c^-} F(x) = F(c)$ and $\lim_{x \to c^+} F(x) = F(c)$. [HINT: Because f is bounded, there exists a real number M > 0 so that |f(x)| ≤ M for all x ∈ [a, b].]
(b) Conclude that F is continuous on (a, b).
(c) Now prove that F is continuous at a and at b. [HINT: Proceed as in (a).]

4.4 The Integral Mean Value Theorem

In science, and in everyday life, we often make use of averages. For instance, meteorologists
usually speak of the average yearly rainfall in a region, or the average daily temperature in
some city. When wisely used, averages can give useful and easily understood information
about complex situations. Furthermore, averages are easy to calculate: The average of a
finite set of numbers S = {x1 , x2 , . . . , xn } is
x1 + x2 + · · · + xn
Average(S) = .
n
In this section, we will study averages of functions defined on an interval [a, b]. This enables
us to calculate averages of quantities that vary continuously, such as the speed of a car.

Our first task is to determine how the average value of a function on an interval [a, b] should
be defined. Suppose that f : [a, b] → R is integrable on [a, b]. That is, the definite integral
Z b
f (x)dx
a

exists. How do we define the average value of f over the interval [a, b]? For a fixed positive
integer n, let ∆x = (b − a)/n, and set
x1 = a + ∆x, x2 = a + 2∆x, x3 = a + 3∆x, ..., xn−1 = a + (n − 1)∆x = b − ∆x, xn = b.
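The average of the values of f at the points x1, x2, ..., xn is
$$
\begin{aligned}
A_n &= \frac{1}{n}\left(f(x_1) + f(x_2) + \dots + f(x_n)\right) \\
&= \frac{1}{b - a}\left(f(x_1)\Delta x + f(x_2)\Delta x + \dots + f(x_n)\Delta x\right) \\
&= \frac{1}{b - a}\sum_{i=1}^{n} f(x_i)\Delta x.
\end{aligned}
$$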

Because f is integrable on [a, b], we know that

‘$\sum_{i=1}^{n} f(x_i)\Delta x$ tends to $\int_a^b f(x)\,dx$ as n tends to ∞’.

Therefore

‘$A_n$ tends to $\frac{1}{b - a}\int_a^b f(x)\,dx$ as n tends to ∞’.
We are therefore led to define the average value of f as follows.
Definition 4.4.1. For a function f : [a, b] → R that is integrable on [a, b], the average value of f on [a, b] is
$$f_{\mathrm{ave}} = \frac{1}{b - a}\int_a^b f(x)\,dx.$$
Remark 4.4.2. The discussion preceding Definition 4.4.1 is not a proof that the average value of f : [a, b] → R is given by
$$f_{\mathrm{ave}} = \frac{1}{b - a}\int_a^b f(x)\,dx.$$
This fact cannot be proven, since we have up to now not defined the average value of a
function. All that we have done is to motivate our definition.

Note that the expressions

‘$\sum_{i=1}^{n} f(x_i)\Delta x$ tends to $\int_a^b f(x)\,dx$ as n tends to ∞’

and

‘$A_n$ tends to $\frac{1}{b - a}\int_a^b f(x)\,dx$ as n tends to ∞’

are not rigorous. These statements can be made rigorous, but this falls outside the scope of this text.

Consider the following example.


Example 4.4.3. We calculate the average value of the function f (x) = cos x + x on the
interval [0, 2π].

Solution. We have
$$\int_0^{2\pi} (\cos x + x)\,dx = \left[\sin x + \frac{x^2}{2}\right]_0^{2\pi} = 2\pi^2.$$
Therefore
$$f_{\mathrm{ave}} = \frac{1}{2\pi}\int_0^{2\pi} (\cos x + x)\,dx = \pi.$$
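
The motivation for Definition 4.4.1 can be illustrated with the function of Example 4.4.3. The Python sketch below computes the sample averages A_n of f(x) = cos x + x at the points x_i = a + iΔx on [0, 2π] and prints them next to π. The function name `sample_average` and the chosen values of n are illustrative.

```python
import math


def f(x):
    return math.cos(x) + x


def sample_average(f, a, b, n):
    """A_n: the ordinary average of f at x_i = a + i*dx, i = 1, ..., n."""
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(1, n + 1)) / n


a, b = 0.0, 2 * math.pi
for n in (10, 100, 1000, 10000):
    A_n = sample_average(f, a, b, n)
    print(f"n = {n:6d}   A_n = {A_n:.6f}   A_n - pi = {A_n - math.pi:.6f}")
```

For this particular f the gap A_n − π shrinks roughly like π/n, so the averages approach the value π found in the example.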

For a finite set of numbers S = {x1 , x2 , . . . , xn }, it is possible that

Average(S) ̸= xk for all k = 1, . . . , n.

Indeed, the average number of goals scored by a soccer player during a season could be, say
1.5 goals per game. Certainly, no one has ever scored half a goal in a soccer match! On the
other hand, a continuous function on an interval [a, b] always attains its average value at
some point in [a, b]. This result is known as the Mean Value Theorem for the integral. We
prove this result using the Fundamental Theorem of Calculus, Theorem 4.3.2.
Theorem 4.4.4 (Mean Value Theorem). If f is continuous on the closed interval [a, b], then there exists a number p in (a, b) such that
$$f(p) = f_{\mathrm{ave}} = \frac{1}{b - a}\int_a^b f(t)\,dt.$$

Proof. Let
$$F(x) = \int_a^x f(t)\,dt, \quad a \le x \le b.$$
Because f is continuous on the interval [a, b], the Fundamental Theorem of Calculus, Theorem 4.3.2, implies that F is continuous on [a, b], differentiable on (a, b) and F′(x) = f(x), a < x < b. By the Mean Value Theorem for the derivative, Theorem A.1.9, there exists a point p ∈ (a, b) so that
$$F'(p) = \frac{F(b) - F(a)}{b - a}.$$
But
$$F'(p) = f(p), \quad F(b) = \int_a^b f(t)\,dt \quad \text{and} \quad F(a) = \int_a^a f(t)\,dt = 0.$$
Therefore
$$f(p) = \frac{1}{b - a}\int_a^b f(t)\,dt = f_{\mathrm{ave}}.$$

The following example illustrates Theorem 4.4.4.


Example 4.4.5. We calculate the average value of the function f (x) = 3x2 − 8x + 1 on the
interval [0, 2], and find all points p ∈ [0, 2] where f takes on this value.

Solution. The average value of f on [0, 2] is
$$f_{\mathrm{ave}} = \frac{1}{2 - 0}\int_0^2 (3x^2 - 8x + 1)\,dx = -3.$$
According to Theorem 4.4.4, there exists at least one number p ∈ [0, 2] so that f(p) = f_ave = −3. Solving the equation
$$f(p) = 3p^2 - 8p + 1 = -3,$$
that is, (3p − 2)(p − 2) = 0, we find p = 2 or p = 2/3. Therefore f attains its average value on [0, 2] at the points x = 2 and x = 2/3.
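
The computation in Example 4.4.5 is easy to check by machine. The Python sketch below evaluates the average value from the antiderivative x³ − 4x² + x (via Theorem 4.3.1) and then solves the quadratic equation f(p) = f_ave with the quadratic formula. The variable names are illustrative.

```python
import math


def f(x):
    return 3 * x * x - 8 * x + 1


def antiderivative(x):
    # An antiderivative of f: its derivative is 3x^2 - 8x + 1.
    return x ** 3 - 4 * x ** 2 + x


a, b = 0.0, 2.0
f_ave = (antiderivative(b) - antiderivative(a)) / (b - a)

# Solve f(p) = f_ave, i.e. 3p^2 - 8p + (1 - f_ave) = 0, with the quadratic formula.
qa, qb, qc = 3.0, -8.0, 1.0 - f_ave
disc = qb * qb - 4 * qa * qc
roots = sorted([(-qb - math.sqrt(disc)) / (2 * qa),
                (-qb + math.sqrt(disc)) / (2 * qa)])
points = [p for p in roots if a <= p <= b]

print("average value:", f_ave)
print("points in [0, 2] where f equals its average:", points)
```

One of the two points is the endpoint 2, while the other lies in the open interval (0, 2), which is the point guaranteed by Theorem 4.4.4.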

We have shown how the average value of a continuously varying quantity is defined. As an
application of the Fundamental Theorem of Calculus, we obtained a version of the Mean
Value Theorem for the definite integral.

Exercise 4.4

1. Find the average value of f on the given interval.

(a) f (x) = cos2 (x) sin4 (x), 0 ≤ x ≤ π (b) f (x) = x4 − e2x , 0 ≤ x ≤ 5

√ 2 √
(c) f (x) = x sin(x2 + π2 ), 0 ≤ x ≤ π−1 (d) f (x) = , 0 ≤ x ≤ 3
x2 + 1

2x − 1 x+1
(e) f (x) = ,0≤x≤2 (f) f (x) = ,2≤x≤4
x3 + 2x2 + 2x + 1 x2 +x−2

(g) f (x) = cos3 (x) sin4 (x), 0 ≤ x ≤ π (h) f (x) = x2 e3x , 1 ≤ x ≤ 4

π
(i) f (x) = sin(2x)ex , ≤ x ≤ 2π
12
2. Find all points x ∈ [0, π] where the function f in question 1. (g) takes on its average
value.

3. Let
$$f(x) = \begin{cases} x + 2 & \text{if } -1 \le x \le 0 \\ x - 2 & \text{if } 0 < x \le 1. \end{cases}$$

(a) Show that f (x) ̸= fave for all x ∈ [−1, 1].


(b) Explain why (a) does not contradict Theorem 4.4.4.

4. Assume that Theorem 4.4.4 is true. Assume that f : [a, b] → R is differentiable on


[a, b], and f ′ is continuous on [a, b]. Prove that there exists a number c ∈ (a, b) such
that
f (b) − f (a)
f ′ (c) = .
b−a
This shows that, for a function with continuous derivative on [a, b], the Mean Value
Theorem for the integral, Theorem 4.4.4, implies the Mean Value Theorem for the
derivative, Theorem A.1.9.

5. We have noted that for a finite set of numbers S = {x1 , x2 , . . . , xn }, it is possible that

Average(S) ̸= xk for all k = 1, . . . , n.

However, the following is true: If f : [a, b] → R is continuous on [a, b], and x1 , . . . , xn ∈


[a, b], then there exists a real number c ∈ [a, b] so that
n
1X
f (c) = f (xi ).
n i=1

Use the Intermediate Value Theorem, Theorem A.1.1, to prove this result. [HINT:
You may assume that x1 < x2 < · · · < xn .]

6. The aim of this exercise is to prove a more general version of Theorem 4.4.4. Assume
that f, g : [a, b] → R are continuous on [a, b], and that g does not change sign on [a, b];
that is, either g(x) ≥ 0 for all x ∈ [a, b] or g(x) ≤ 0 for all x ∈ [a, b]. Prove that there
exists a real number c ∈ [a, b] such that
Z b Z b
f (c) g(x)dx = f (x)g(x)dx.
a a

In the above formula, the function g is a weight, and the integral


Z b
f (x)g(x)dx
a

is called the weighted average of f with respect to g. This kind of average is useful
when the values of f at some points x ∈ [a, b] are more important than its values at
other points. Think, for instance, of the average number of goals scored by a soccer
player during a season. We may want a goal scored during an away game to count
more than one scored during a home game.

4.5 Arc Length

In this section we present an application of integration. In particular, we consider the problem of finding the length of certain ‘curves’ in the plane. Let f : [a, b] → R be differentiable
at every point in an interval [a, b]. Also assume that f ′ is continuous on [a, b]. Let C denote
the graph of f ; that is,

C = {⟨x, y⟩ ∈ R2 : a ≤ x ≤ b, y = f (x)}.

Typically, we think of C as a ‘curve’ in the plane R2 , see the figure below. It is customary
to speak of ‘the curve C with equation y = f (x), x ∈ [a, b]’.

[Figure: the curve C, the graph of y = f(x) over [a, b], with a typical point ⟨x, f(x)⟩ marked.]

Before we can calculate the ‘length’ of C, we must first define what we mean by the ‘length’
of C. We note the following.

If C is the line segment between points p̄, q̄ ∈ R2 , then the length of C is the distance
between p̄ and q̄; that is,
Length of C = ∥p̄ − q̄∥,
see Remark 1.3.1 (3).

Suppose that C consists of a finite number of line segments C1 , C2 , . . . , Cn . Then we may


define the length of C as

Length of C = Length of C1 + Length of C2 + · · · + Length of Cn .

For instance, with C as in the sketch below, we have

Length of C = ∥p̄2 − p̄1 ∥ + ∥p̄3 − p̄2 ∥ + ∥p̄4 − p̄3 ∥ + ∥p̄5 − p̄4 ∥.

[Figure: a curve C consisting of four line segments joining the points p̄1, p̄2, p̄3, p̄4 and p̄5.]

For a general curve C with equation y = f (x), x ∈ [a, b], we approximate C using curves
consisting of a finite number of line segments. Let n ≥ 2 be a natural number, and let
∆x = (b − a)/n. Set

x0 = a, x1 = x0 + ∆x, x2 = x1 + ∆x, . . . , xn = xn−1 + ∆x = b.

For each i = 0, 1, . . . , n let p̄i = ⟨xi , f (xi )⟩. Denote by Ci the line segment between p̄i and
p̄i+1 for i = 0, 1, . . . , n − 1.

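[Figure: polygonal approximation of C through the points p̄0, p̄1, p̄2, ..., p̄i, p̄i+1, ..., p̄n, with x1, x2, ..., xi, xi+1 marked on the x-axis between a and b.]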

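Then
$$
\begin{aligned}
\text{Length of } C &\approx \text{Length of } C_0 + \text{Length of } C_1 + \dots + \text{Length of } C_{n-1} \\
&= \|\bar{p}_1 - \bar{p}_0\| + \|\bar{p}_2 - \bar{p}_1\| + \dots + \|\bar{p}_{i+1} - \bar{p}_i\| + \dots + \|\bar{p}_n - \bar{p}_{n-1}\|.
\end{aligned}
$$
For every i = 0, 1, . . . , n − 1 we have
$$
\begin{aligned}
\|\bar{p}_{i+1} - \bar{p}_i\| &= \sqrt{(x_{i+1} - x_i)^2 + (f(x_{i+1}) - f(x_i))^2} \\
&= \Delta x\,\sqrt{1 + \left[\frac{f(x_{i+1}) - f(x_i)}{\Delta x}\right]^2}.
\end{aligned}
$$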

Because f is differentiable, it follows from the Mean Value Theorem for the derivative, Theorem A.1.9, that for every i = 0, 1, . . . , n − 1 there exists an $x_i^* \in (x_i, x_{i+1})$ so that
$$\frac{f(x_{i+1}) - f(x_i)}{\Delta x} = f'(x_i^*).$$
Hence
$$\|\bar{p}_{i+1} - \bar{p}_i\| = \Delta x\,\sqrt{1 + [f'(x_i^*)]^2}, \quad i = 0, 1, \dots, n - 1$$
so that
$$\text{Length of } C \approx \sum_{i=0}^{n-1} \sqrt{1 + [f'(x_i^*)]^2}\,\Delta x.$$
This sum is a Riemann sum for the integral
$$\int_a^b \sqrt{1 + [f'(x)]^2}\,dx.$$
This integral exists, because f′ is continuous on [a, b]. Therefore

‘$\sum_{i=0}^{n-1} \sqrt{1 + [f'(x_i^*)]^2}\,\Delta x$ tends to $\int_a^b \sqrt{1 + [f'(x)]^2}\,dx$ as n tends to ∞’.

It therefore seems reasonable to define the length of C as follows.


Definition 4.5.1. Let f : [a, b] → R be differentiable at every point in an interval [a, b]. Also assume that f′ is continuous on [a, b]. The length of the curve with equation y = f(x), x ∈ [a, b], is
$$L = \int_a^b \sqrt{1 + [f'(x)]^2}\,dx.$$

We demonstrate Definition 4.5.1 by means of some examples.


Example 4.5.2. Let $f(x) = 2x^{3/2} + 1$, x ≥ 0. We calculate the length L of the curve C with equation
y = f(x), 1 ≤ x ≤ 3.

Solution. We have $f'(x) = 3\sqrt{x}$, x ≥ 0. Therefore
$$
\begin{aligned}
L &= \int_1^3 \sqrt{1 + [f'(x)]^2}\,dx \\
&= \int_1^3 \sqrt{1 + 9x}\,dx \\
&= \left[\frac{2(1 + 9x)^{3/2}}{27}\right]_1^3 \\
&= \frac{2(28^{3/2} - 10^{3/2})}{27}.
\end{aligned}
$$

Example 4.5.3. Let $f(x) = \frac{x^2}{2}$, x ∈ R. We calculate the length L of the curve C with equation
y = f(x), 0 ≤ x ≤ 1.

Solution. We have f′(x) = x, x ∈ R. Therefore
$$L = \int_0^1 \sqrt{1 + [f'(x)]^2}\,dx = \int_0^1 \sqrt{1 + x^2}\,dx.$$
In order to evaluate the integral, we make the substitution
$$x = \tan\theta, \quad -\frac{\pi}{2} < \theta < \frac{\pi}{2}.$$
Then, see Example 4.2.12 and Exercise 4.2 number 2 (k), we have
$$
\begin{aligned}
L &= \int_0^{\pi/4} \sec^3\theta\,d\theta \\
&= \left[\frac{\ln|\sec\theta + \tan\theta| + \sec\theta\tan\theta}{2}\right]_0^{\pi/4} \\
&= \frac{\ln(\sqrt{2} + 1) + \sqrt{2}}{2}.
\end{aligned}
$$
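
The construction that led to Definition 4.5.1 can be tried out on the curve of Example 4.5.3. The Python sketch below computes the lengths of the inscribed polygons for y = x²/2 on [0, 1] and compares them with the exact value (ln(√2 + 1) + √2)/2. The function name `polygon_length` and the values of n are illustrative choices.

```python
import math


def f(x):
    return x * x / 2


def polygon_length(f, a, b, n):
    """Total length of the n line segments joining <x_i, f(x_i)>, x_i = a + i*dx."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    return sum(math.hypot(xs[i + 1] - xs[i], f(xs[i + 1]) - f(xs[i]))
               for i in range(n))


exact = (math.log(math.sqrt(2) + 1) + math.sqrt(2)) / 2

for n in (2, 10, 100, 1000):
    approx = polygon_length(f, 0.0, 1.0, n)
    print(f"n = {n:5d}   polygon length = {approx:.8f}   error = {exact - approx:.2e}")
```

For this curve each chord is shorter than the arc it spans, so the polygon lengths approach the exact value from below as n grows.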

It should be noted that although, in principle, Definition 4.5.1 can be used to calculate the length of any curve that is the graph of a differentiable function with continuous derivative, we are often confronted with a definite integral that is extremely difficult, if not impossible, to evaluate by hand. This situation occurs even in relatively simple cases. Indeed, if we attempt to find the length of the curve with equation
y = sin x, 0 ≤ x ≤ π,
we encounter the definite integral
$$\int_0^{\pi} \sqrt{1 + \cos^2 x}\,dx.$$
It is impossible to evaluate this integral in terms of elementary functions, as the function $g(x) = \sqrt{1 + \cos^2 x}$, x ∈ R, does not have an elementary anti-derivative. In practice, we use numerical methods to calculate accurate approximations for this type of integral.
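
One such numerical method is the composite Simpson rule. The Python sketch below applies it to the arc length integral for y = sin x on [0, π]. The notes do not prescribe a particular method, so the choice of Simpson's rule, the function names and the numbers of subintervals are assumptions made for illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def g(x):
    # Integrand of the arc length integral for y = sin x.
    return math.sqrt(1 + math.cos(x) ** 2)


for n in (4, 16, 100, 1000):
    print(f"n = {n:5d}   L ~ {simpson(g, 0.0, math.pi, n):.10f}")
```

The printed values settle quickly at about 3.82, which is comfortably larger than the distance π between the endpoints of the curve.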

Exercise 4.5

1. In each case, calculate the length of the curve with given equation.

(a) y = 2x + 1, 0 ≤ x ≤ 2    (b) $y = 2(1 + x)^{3/2}$, 0 ≤ x ≤ 1

(c) y = ln(cos x), 0 ≤ x ≤ π/4    (d) $y = \frac{x^4}{8} + \frac{1}{4x^2}$, 1 ≤ x ≤ 2

(e) $y = \frac{x^2}{4} - \frac{\ln x}{2}$, 1 ≤ x ≤ 2    (f) $y = \sqrt{x - x^2} + \arcsin x$

(g) $y = \int_1^x \sqrt{t^3 - 1}\,dt$, 1 ≤ x ≤ 4

2. Consider the curve C in R2 with equation y = f (x), a ≤ x ≤ b where f is differentiable


with continuous derivative on [a, b].
(a) Write down a function s : [a, b] → R so that s(x) gives the length of the part of
the curve C from the point ⟨a, f (a)⟩ to ⟨x, f (x)⟩.
(b) Show that the rate of change in s(x) (with respect to x) is at least 1 for all
x ∈ (a, b).
(c) When is the rate of change in s(x) equal to 1?

4.6 The Natural Logarithmic and Exponential Functions

The exponential function $\exp(x) = e^x$ and its inverse, the logarithmic function ln, are among the most important functions in mathematics. We are familiar with their basic algebraic and analytical properties. These include the so-called exponential laws and the laws of logarithms, and the ‘rules’ for differentiating and integrating these functions. There is, however, a gap in our knowledge. Firstly, how is the number e defined? Secondly, what does an expression like $e^{\sqrt{2}}$ mean? For positive integers m and n, $e^n$ is defined as the number obtained by multiplying e by itself n times, while $e^{1/m}$ is defined as that number which, when multiplied by itself m times, gives e. The fact that the number $e^{1/m}$ exists is a consequence of the Intermediate Value Theorem, Theorem A.1.1. By using the definitions of $e^n$ and $e^{1/m}$, we can define $e^{n/m}$. However, $\sqrt{2}$ is not a rational number, so we cannot define $e^{\sqrt{2}}$ in this way. In this section, we give rigorous definitions for the logarithmic and exponential functions, as well as for the number e.

Definition 4.6.1. The natural logarithm is the function defined by
$$\ln(x) = \int_1^x \frac{1}{t}\,dt, \quad x > 0.$$

Remark 4.6.2. Since the function $f(t) = \frac{1}{t}$ is continuous on (0, +∞), it follows from Theorem 4.3.4 that ln is a differentiable function on (0, +∞), and $\frac{d}{dx}\ln(x) = \frac{1}{x}$ for all x > 0. Furthermore,
$$\ln(1) = \int_1^1 \frac{1}{t}\,dt = 0.$$
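
Definition 4.6.1 can be turned directly into a computation. The Python sketch below approximates ln(x) by applying a composite Simpson rule to the integral of 1/t from 1 to x and compares the result with the library function `math.log`. It also checks the identity ln(ab) = ln(a) + ln(b) for one pair of numbers. The name `ln_integral` and the sample values are illustrative.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def ln_integral(x):
    """ln(x) computed as the integral of 1/t from 1 to x, for x > 0."""
    if x <= 0:
        raise ValueError("ln is only defined for x > 0")
    # For 0 < x < 1 the step is negative, so the result is negative.
    return simpson(lambda t: 1.0 / t, 1.0, x)


for x in (0.5, 1.0, 2.0, 5.0):
    print(f"x = {x:3.1f}   integral = {ln_integral(x): .10f}   math.log = {math.log(x): .10f}")

a, b = 2.0, 3.0
print("ln(ab) - ln(a) - ln(b) ~", ln_integral(a * b) - ln_integral(a) - ln_integral(b))
```

The negative values for 0 < x < 1 come from the convention that interchanging the limits of integration changes the sign of the integral.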

We proceed to establish some algebraic properties of the function ln introduced in Definition


4.6.1.
Theorem 4.6.3. Let a and b be positive real numbers, and p a rational number. Then the following are true.

(1) $\ln(ab) = \ln(a) + \ln(b)$.

(2) $\ln\left(\frac{a}{b}\right) = \ln(a) - \ln(b)$.

(3) $\ln(a^p) = p\ln(a)$.

We give a proof of (1), leaving the proofs of (2) and (3) as exercises, see Exercise 4.6 numbers
1 and 2.

Proof of (1). Consider the function f(x) = ln(ax), x > 0. According to the Fundamental Theorem of Calculus, Theorem 4.3.2, and the Chain Rule,
$$f'(x) = \frac{1}{ax}\,\frac{d}{dx}(ax) = \frac{1}{x} = \frac{d}{dx}\ln(x), \quad x > 0.$$
Therefore f(x) = ln(x) + K, where K is a constant. But, since ln(1) = 0,

ln(a) = f(1) = ln(1) + K = K.

Therefore ln(ax) = f(x) = ln(x) + ln(a) for all x > 0. In particular, setting x = b we have

ln(ab) = f(b) = ln(b) + ln(a).

Next we consider some of the analytic properties of ln.


Theorem 4.6.4. The function ln satisfies the following properties.

(1) $\lim_{x \to \infty} \ln(x) = \infty$.

(2) $\lim_{x \to 0^+} \ln(x) = -\infty$.

(3) ln is one-to-one; that is, if ln(a) = ln(b) for some a, b > 0, then a = b.

(4) ln has range R.

We prove (1) and (4), and give the proofs of (2) and (3) as exercises, see Exercise 4.6
numbers 3 and 4.

Proof of (1). The proof makes use of the following fact: For every real number M there exists a natural number n so that n > M.

Because $\frac{d}{dx}\ln(x) = \frac{1}{x} > 0$ for all x > 0, ln is an increasing function on (0, ∞). In particular, ln(2) > ln(1) = 0. Consider a real number M > 0. Then there exists an integer n so that $n > \frac{M}{\ln(2)}$. Therefore, by Theorem 4.6.3 (3),

if $x > 2^n$ then $\ln(x) > \ln(2^n) = n\ln(2) > \frac{M}{\ln(2)}\ln(2) = M$.

Since this is true for all M > 0, it follows by Definition 3.4.9 that $\lim_{x \to \infty} \ln(x) = \infty$.

Proof of (4). We must show that for every real number y there exists a real number x > 0 so that ln(x) = y. Fix a real number y. Since $\lim_{x \to \infty} \ln(x) = \infty$, there exists a real number b > 0 so that ln(b) > y. Because $\lim_{x \to 0^+} \ln(x) = -\infty$, there exists a real number a > 0 so that ln(a) < y. Therefore
ln(a) < y < ln(b).
Since ln is continuous on (0, ∞) by the Fundamental Theorem of Calculus, Theorem 4.3.2, it follows from the Intermediate Value Theorem, Theorem A.1.1, that there is a real number x between a and b so that ln(x) = y. Since this is true for every real number y, it follows that ln has range R.

It follows from Theorem 4.6.4 (3) and (4) that there exists exactly one real number x > 0
so that ln(x) = 1. This is a very special number.

Definition 4.6.5. The number e is the unique positive real number so that ln(e) = 1.
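
Definition 4.6.5 also suggests a way of computing e: since ln is continuous and increasing, the equation ln(x) = 1 can be solved by bisection. The Python sketch below does this, with ln evaluated from its integral definition by a composite Simpson rule. The starting interval [2, 3], the tolerance and the function names are illustrative assumptions; the code itself checks that ln(2) < 1 < ln(3) before bisecting.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson approximation of the integral of f from a to b (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def ln_integral(x):
    """ln(x) computed as the integral of 1/t from 1 to x."""
    return simpson(lambda t: 1.0 / t, 1.0, x)


def find_e(tol=1e-12):
    """Bisection for the unique x with ln(x) = 1."""
    lo, hi = 2.0, 3.0
    # The bracket is valid only if ln(lo) < 1 < ln(hi).
    assert ln_integral(lo) < 1.0 < ln_integral(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ln_integral(mid) < 1.0:
            lo = mid  # ln is increasing, so the solution lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


e_approx = find_e()
print("e from ln(e) = 1:", e_approx)
print("math.e          :", math.e)
```

Bisection relies only on the continuity and monotonicity of ln, which are the same properties that guarantee existence and uniqueness of e in the text.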

Theorem 4.6.4 implies that ln : (0, ∞) → R is one-to-one and onto. Therefore ln has an
inverse

exp : R → (0, ∞)

that is one-to-one and onto. That is,

ln(exp(x)) = x, x ∈ R and exp(ln(x)) = x, x > 0. (4.16)

We now show how the number e is related to the inverse, exp, of ln.

Theorem 4.6.6. Let x be a rational number. Then $\exp(x) = e^x$.

Proof. According to (4.16), ln(exp(x)) = x. Since x is a rational number, Theorem 4.6.3 (3) implies that $\ln(e^x) = x\ln(e) = x$. Therefore $\ln(\exp(x)) = \ln(e^x)$. Since ln is one-to-one by Theorem 4.6.4 (3), it follows that $\exp(x) = e^x$.

Based on Theorem 4.6.6, we introduce the following definitions.

Definition 4.6.7. For an irrational number x, $e^x$ is the real number exp(x).

Definition 4.6.8. For all real numbers x and a > 0, $a^x = e^{x\ln(a)}$.

The algebraic properties of the function exp are given in the following result.

Theorem 4.6.9. Let a and b be real numbers, and r a rational number. Then the following statements are true.

(1) $e^{a+b} = e^a e^b$.

(2) $e^{a-b} = \frac{e^a}{e^b}$.

(3) $(e^a)^r = e^{ar}$.

We prove (1), and give the proofs of (2) and (3) as exercises, see Exercise 4.6 numbers 5
and 6.

Proof of (1). By Definition 4.6.7 we have $e^{a+b} = \exp(a + b)$. Therefore Theorem 4.6.3 (3) and Definition 4.6.5 imply that
$$\ln(e^{a+b}) = (a + b)\ln e = a + b = a\ln(e) + b\ln(e) = \ln(e^a) + \ln(e^b).$$
By Theorem 4.6.3 (1), $\ln(e^a) + \ln(e^b) = \ln(e^a e^b)$ so that $\ln(e^{a+b}) = \ln(e^a e^b)$. Since ln is one-to-one, see Theorem 4.6.4 (3), it follows that $e^{a+b} = e^a e^b$.

The analytical properties of exp are given below.

Theorem 4.6.10. The function exp satisfies the following properties.

(1) $\frac{d}{dx}e^x = e^x$ for all x ∈ R.

(2) $\lim_{x \to \infty} e^x = \infty$.

(3) $\lim_{x \to -\infty} e^x = 0$.

We prove (1) and (2), and leave the proof of (3) as an exercise, see Exercise 4.6 number 7.

Proof of (1). By the Fundamental Theorem of Calculus, Theorem 4.3.2,
$$\frac{d}{dx}\ln(x) = \frac{1}{x}.$$
Since $\exp(x) = e^x$ is the inverse of ln, it follows from the Inverse Function Theorem that exp is differentiable at every x ∈ R, and
$$\frac{d}{dx}\exp(x) = \left(\frac{1}{\exp(x)}\right)^{-1} = \exp(x), \quad x \in \mathbb{R}.$$
That is, $\frac{d}{dx}e^x = e^x$ for all x ∈ R.

Proof of (2). Since
$$\frac{d}{dx}\exp(x) = \exp(x) > 0, \quad x \in \mathbb{R}$$
it follows that exp is an increasing function. That is,

if a < b then exp(a) < exp(b). (4.17)

Fix ϵ > 0, and let M = ln(ϵ). It follows from (4.17) that

if x > M then exp(x) > exp(M) = exp(ln(ϵ)) = ϵ.

Since this is true for all ϵ > 0, it follows that $\lim_{x \to \infty} e^x = \infty$.

We have shown how to define the natural logarithmic and exponential functions using the
Fundamental Theorem of Calculus. This demonstrates the usefulness of the Fundamental
Theorem of Calculus. It is possible to define the exponential function in many different
ways. This is an example of an important mathematical idea; a mathematical object exists
independently of any particular definition. That is, there are different representations of the
same mathematical object, each with its advantages and disadvantages. The definition of
the exponential function that we give here has the advantage that the so-called ‘exponential
laws’ are easily derived, as is clearly demonstrated.

Exercise 4.6

1. Use Theorem 4.6.3 (1) to prove Theorem 4.6.3 (2). [HINT: For positive real numbers a and b, write $a = b \times \frac{a}{b}$.]

2. Prove Theorem 4.6.3 (3). [Hint: The proof is similar to that of Theorem 4.6.3 (1).]

3. Use Theorem 4.6.3 (3) and Theorem 4.6.4 (1) to prove Theorem 4.6.4 (2).

4. Use Theorem 4.3.2 and the Mean Value Theorem, Theorem A.1.9, to prove Theorem 4.6.4 (3). [Hint: Assume that there exist a > b > 0 so that ln(a) ≤ ln(b), and derive a contradiction.]

5. Use Theorem 4.6.9 (1) to prove Theorem 4.6.9 (2).

6. Use Theorem 4.6.3 (3) to prove Theorem 4.6.9 (3).

7. Prove Theorem 4.6.10 (3). [Hint: The proof is similar to that of Theorem 4.6.10 (2).]

8. Suppose that a, b > 0, and that x and y are real numbers. Use Definition 4.6.8 (and appropriate theorems) to prove the following.
(a) $a^{x+y} = a^x a^y$, $a^{x-y} = \frac{a^x}{a^y}$ and $(ab)^x = a^x b^x$.
(b) $\ln(a^x) = x\ln(a)$ and $(a^x)^y = a^{xy}$.
(c) $f(x) = a^x$ is differentiable on R, and $\frac{d}{dx}a^x = a^x\ln(a)$.

4.7 Improper Integrals

The definite integral is defined for a function defined on a closed and bounded interval; that
is, a function f : [a, b] → R. Furthermore, in order to evaluate the integral
\[ \int_a^b f(x)\,dx \]

using the Fundamental Theorem of Calculus, the integrand f must be continuous on [a, b].
However, for many applications, such as to Probability Theory and Statistics, a more general
notion of integral is required. In particular, it is often necessary to consider the ‘integral’ of
a function on an unbounded interval, or of a function f : [a, b] → R which is discontinuous
at some point in an interval [a, b]. We first consider the former case.

Definition 4.7.1 (Improper Integral of Type I). Let f be a function from R to R, and
a be a real number.

(1) If f is defined on [a, ∞), and
\[ \int_a^t f(x)\,dx \]
exists for every t ≥ a, then
\[ \int_a^\infty f(x)\,dx = \lim_{t\to\infty} \int_a^t f(x)\,dx \tag{4.18} \]
provided that the limit exists and is a real number. In this case, the improper integral
(4.18) is called convergent.

(2) If f is defined on (−∞, a], and
\[ \int_t^a f(x)\,dx \]
exists for every t ≤ a, then
\[ \int_{-\infty}^a f(x)\,dx = \lim_{t\to-\infty} \int_t^a f(x)\,dx \]
provided that the limit exists and is a real number. In this case, the improper integral
$\int_{-\infty}^a f(x)\,dx$ is called convergent.

Remark 4.7.2. Let a be a real number and f a function defined on [a, ∞) such that
\[ \int_a^t f(x)\,dx \]
exists for all t ≥ a.

(1) The expression
\[ F(t) = \int_a^t f(x)\,dx, \quad t \ge a \]
defines a function F : [a, ∞) → R. We have
\[ \int_a^\infty f(x)\,dx = \lim_{t\to\infty} F(t), \]
if the limit exists; that is, the improper integral $\int_a^\infty f(x)\,dx$ is the limit of the
function F as t tends to ∞.

(2) If the improper integral $\int_a^\infty f(x)\,dx$ is not convergent, then it is called divergent.

(3) If $\int_a^\infty f(x)\,dx$ is divergent, then
\[ \lim_{t\to\infty} \int_a^t f(x)\,dx = \pm\infty \quad\text{or}\quad \lim_{t\to\infty} \int_a^t f(x)\,dx \ \text{does not exist.} \]

The comments in (1) to (3) apply also to the improper integral
\[ \int_{-\infty}^a f(x)\,dx. \]

We illustrate Definition 4.7.1 by means of the following examples.

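Example 4.7.3. We determine whether or not the improper integral
\[ \int_1^\infty 2^{-x}\,dx \]
is convergent, and find its value if it is convergent.

Solution. For t ≥ 1 we have
\[ \int_1^t 2^{-x}\,dx = \left[-\frac{2^{-x}}{\ln 2}\right]_1^t = \frac{1}{2\ln 2} - \frac{2^{-t}}{\ln 2}. \]
We have
\[ \lim_{t\to\infty} \frac{2^{-t}}{\ln 2} = 0 \]
so that
\begin{align*}
\int_1^\infty 2^{-x}\,dx &= \lim_{t\to\infty} \int_1^t 2^{-x}\,dx \\
&= \frac{1}{2\ln 2} - \lim_{t\to\infty} \frac{2^{-t}}{\ln 2} \\
&= \frac{1}{2\ln 2}.
\end{align*}
The integral is therefore convergent.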
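
The computation in Example 4.7.3 can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `midpoint_rule` and `closed_form` are hypothetical, and the midpoint rule is just one convenient way to approximate a definite integral.

```python
import math


def midpoint_rule(f, a, b, n=100_000):
    """Approximate the definite integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def closed_form(t):
    """Integral of 2**(-x) over [1, t], as derived in Example 4.7.3."""
    return 1 / (2 * math.log(2)) - 2 ** (-t) / math.log(2)


# Claimed value of the improper integral.
limit_value = 1 / (2 * math.log(2))

for t in (2, 5, 10, 20):
    numeric = midpoint_rule(lambda x: 2 ** (-x), 1, t)
    print(f"t = {t:2d}  numeric = {numeric:.8f}  closed form = {closed_form(t):.8f}")
print(f"limit = {limit_value:.8f}")
```

As t grows, the printed values settle towards the limit, in line with the definition of a convergent improper integral as the limit of definite integrals.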

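Example 4.7.4. We determine whether or not the improper integral
\[ \int_{-\infty}^0 x e^x\,dx \]
is convergent, and find its value if it is convergent.

Solution. For every t ≤ 0 we have
\begin{align*}
\int_t^0 x e^x\,dx &= \Big[x e^x\Big]_t^0 - \int_t^0 e^x\,dx \\
&= e^t - t e^t - 1 \\
&= \frac{1-t}{e^{-t}} - 1.
\end{align*}
We have
\[ \lim_{t\to-\infty} (1-t) = \lim_{t\to-\infty} e^{-t} = \infty. \]
By l'Hospital's Rule
\[ \lim_{t\to-\infty} \frac{1-t}{e^{-t}} = \lim_{t\to-\infty} \frac{1}{e^{-t}} = \lim_{t\to-\infty} e^t = 0. \]
Therefore
\begin{align*}
\int_{-\infty}^0 x e^x\,dx &= \lim_{t\to-\infty} \int_t^0 x e^x\,dx \\
&= \lim_{t\to-\infty} \left(\frac{1-t}{e^{-t}} - 1\right) \\
&= -1
\end{align*}
so that the integral is convergent.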
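
As a quick sanity check of Example 4.7.4, the closed form for the integral over [t, 0] can be tabulated for increasingly negative t. This is a small illustrative sketch; the function name `tail_integral` is an assumption made here and does not appear in the notes.

```python
import math


def tail_integral(t):
    """Integral of x*exp(x) over [t, 0], using the closed form e^t - t*e^t - 1."""
    return math.exp(t) - t * math.exp(t) - 1.0


for t in (-1.0, -5.0, -10.0, -20.0, -40.0):
    print(f"t = {t:6.1f}   integral over [t, 0] = {tail_integral(t):.12f}")
```

The values approach -1, the value of the improper integral found above.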

Example 4.7.5. We determine whether or not
\[ \int_2^\infty \frac{x^2+x}{(x-1)(x^2+1)}\,dx \]
is convergent, and find its value if it is convergent.

Solution. The reader should verify that
\[ \frac{x^2+x}{(x-1)(x^2+1)} = \frac{1}{x-1} + \frac{1}{x^2+1}, \quad x \ne 1. \]
For all t ≥ 2 we therefore have
\[ \int_2^t \frac{x^2+x}{(x-1)(x^2+1)}\,dx = \int_2^t \left(\frac{1}{x-1} + \frac{1}{x^2+1}\right) dx = \ln(t-1) + \arctan t - \arctan 2. \]
Since
\[ \lim_{t\to\infty} \ln(t-1) = \infty \quad\text{and}\quad \lim_{t\to\infty} \arctan t = \frac{\pi}{2}, \]
it follows that
\begin{align*}
\int_2^\infty \frac{x^2+x}{(x-1)(x^2+1)}\,dx &= \lim_{t\to\infty} \int_2^t \frac{x^2+x}{(x-1)(x^2+1)}\,dx \\
&= \lim_{t\to\infty} \ln(t-1) + \lim_{t\to\infty} \arctan t - \arctan 2 \\
&= \infty.
\end{align*}
Therefore the integral is divergent.

We now consider improper integrals of the form
\[ \int_{-\infty}^\infty f(x)\,dx \]
where f is a function defined on all of R.

Definition 4.7.6. Consider a function f : R → R. If for some real number a the integrals
\[ \int_{-\infty}^a f(x)\,dx \quad\text{and}\quad \int_a^\infty f(x)\,dx \]
are both convergent, then
\[ \int_{-\infty}^\infty f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^\infty f(x)\,dx. \]
Remark 4.7.7. Consider a function f : R → R.

(1) If there exists a real number a so that
\[ \int_a^\infty f(x)\,dx \quad\text{and}\quad \int_{-\infty}^a f(x)\,dx \]
are both convergent, then
\[ \int_c^\infty f(x)\,dx \quad\text{and}\quad \int_{-\infty}^c f(x)\,dx \]
are convergent for every real number c. The real number a in Definition 4.7.6 may
therefore be chosen in whichever way is convenient.

(2) The integral $\int_{-\infty}^\infty f(x)\,dx$ is divergent if and only if
\[ \int_a^\infty f(x)\,dx \quad\text{or}\quad \int_{-\infty}^a f(x)\,dx \]
is divergent for some real number a.

(3) If
\[ \lim_{t\to\infty} \int_{-t}^t f(x)\,dx \]
is a real number, it does not necessarily mean that the integral
\[ \int_{-\infty}^\infty f(x)\,dx \]
is convergent, see Exercise 4.7 number 4.
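
The warning in Remark 4.7.7 (3) can be illustrated with a very simple function. The sketch below uses f(x) = x, which is chosen here purely for illustration and is not an example from the notes; the helper `integral_of_x` is a hypothetical name for the elementary antiderivative evaluation.

```python
def integral_of_x(lower, upper):
    """Definite integral of f(x) = x over [lower, upper], namely (upper^2 - lower^2) / 2."""
    return (upper ** 2 - lower ** 2) / 2


# The symmetric integrals over [-t, t] are all zero, so their limit is 0,
# but the integrals over [0, t] grow without bound.
symmetric = [integral_of_x(-t, t) for t in (1, 10, 100, 1000)]
right_half = [integral_of_x(0, t) for t in (1, 10, 100, 1000)]

print("integral over [-t, t]:", symmetric)
print("integral over [0, t]: ", right_half)
```

Since the integral over [0, ∞) is divergent, the integral over all of R is divergent by Definition 4.7.6, even though the symmetric limit exists.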


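Example 4.7.8. We determine whether or not the integral
\[ \int_{-\infty}^\infty x e^{-x^2}\,dx \]
is convergent, and find its value if it is convergent.

Solution. For all t ≥ 0 we have
\[ \int_0^t x e^{-x^2}\,dx = \left[-\frac{e^{-x^2}}{2}\right]_0^t = \frac{1}{2} - \frac{e^{-t^2}}{2}. \]
Since
\[ \lim_{t\to\infty} -t^2 = -\infty \quad\text{and}\quad \lim_{s\to-\infty} e^s = 0, \]
it follows that
\[ \lim_{t\to\infty} e^{-t^2} = 0. \]
Therefore
\[ \int_0^\infty x e^{-x^2}\,dx = \lim_{t\to\infty} \int_0^t x e^{-x^2}\,dx = \frac{1}{2} - \lim_{t\to\infty} \frac{e^{-t^2}}{2} = \frac{1}{2}. \]
For t ≤ 0 we have
\[ \int_t^0 x e^{-x^2}\,dx = \frac{e^{-t^2}}{2} - \frac{1}{2}. \]
Because
\[ \lim_{t\to-\infty} -t^2 = -\infty \quad\text{and}\quad \lim_{s\to-\infty} e^s = 0, \]
we have
\[ \lim_{t\to-\infty} e^{-t^2} = 0. \]
Hence
\[ \int_{-\infty}^0 x e^{-x^2}\,dx = \lim_{t\to-\infty} \int_t^0 x e^{-x^2}\,dx = \lim_{t\to-\infty} \frac{e^{-t^2}}{2} - \frac{1}{2} = -\frac{1}{2}. \]
It follows that
\[ \int_{-\infty}^\infty x e^{-x^2}\,dx \]
is convergent, and
\[ \int_{-\infty}^\infty x e^{-x^2}\,dx = \int_{-\infty}^0 x e^{-x^2}\,dx + \int_0^\infty x e^{-x^2}\,dx = 0. \]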

Example 4.7.9. We determine whether or not the improper integral
\[ \int_{-\infty}^\infty \frac{1}{1+|x|}\,dx \]
is convergent, and find its value if it is convergent.

Solution. For t ≥ 0 we have
\[ \int_0^t \frac{1}{1+|x|}\,dx = \int_0^t \frac{1}{1+x}\,dx = \ln(1+t). \]
Therefore
\[ \int_0^\infty \frac{1}{1+|x|}\,dx = \lim_{t\to\infty} \int_0^t \frac{1}{1+|x|}\,dx = \lim_{t\to\infty} \ln(1+t) = \infty. \]
Since
\[ \int_0^\infty \frac{1}{1+|x|}\,dx \]
is divergent, it follows that
\[ \int_{-\infty}^\infty \frac{1}{1+|x|}\,dx \]
is divergent.

It is often not possible to determine the value of a given improper integral. For instance,
the integral
\[ \int_0^\infty e^{-x^2}\,dx \]
is convergent, but it is not possible to determine its value using only tools from elementary
Calculus. Indeed,
\[ F(t) = \int_0^t e^{-x^2}\,dx, \quad t \ge 0 \]
is not an elementary function. The following result provides a method to determine whether
an improper integral is convergent without having to determine an anti-derivative for the
integrand.
Theorem 4.7.10 (Comparison Test). Let f, g : [a, ∞) → R be continuous on [a, ∞),
where a is a real number. Assume that there exists a real number c ≥ a so that 0 ≤ f (x) ≤
g(x) for all x ≥ c. Then the following is true.
(1) If $\int_a^\infty g(x)\,dx$ is convergent, then $\int_a^\infty f(x)\,dx$ is convergent.

(2) If $\int_a^\infty f(x)\,dx$ is divergent, then $\int_a^\infty g(x)\,dx$ is divergent.

A complete proof of Theorem 4.7.10 is beyond the scope of this book, but a partial proof
is given as an exercise, see Exercise 4.7 number 7. However, if we interpret the improper
integrals
\[ \int_a^\infty f(x)\,dx \quad\text{and}\quad \int_a^\infty g(x)\,dx \]
as the possibly infinite areas of the regions bounded by the x-axis, the line x = a and the
curves with equations y = f(x) and y = g(x), respectively, then Theorem 4.7.10 seems at
least plausible. Indeed, assume 0 ≤ f(x) ≤ g(x) whenever x ≥ a. Then
\[ \int_a^\infty f(x)\,dx = \text{Area under } y = f(x) \le \text{Area under } y = g(x) = \int_a^\infty g(x)\,dx, \]
see the sketch below.

[Sketch: the curves y = f(x) and y = g(x) to the right of the vertical line x = a.]

Therefore if
\[ \int_a^\infty g(x)\,dx = \text{Area under } y = g(x) < \infty \]
then
\[ \int_a^\infty f(x)\,dx = \text{Area under } y = f(x) < \infty. \]

The following result is a useful aid when applying the Comparison Test, Theorem 4.7.10.

Theorem 4.7.11 (The p-Test). Let a > 0 and p be real numbers. The improper integral
\[ \int_a^\infty \frac{1}{x^p}\,dx \]
is convergent if p > 1 and divergent if p ≤ 1.

The proof of Theorem 4.7.11 is given as an exercise, see Exercise 4.7 number 8. We now
turn to some examples.
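
Before the worked examples, the behaviour described by the p-Test can be seen numerically. The sketch below is an illustration only: the function name `p_integral` is hypothetical, and it relies on the standard antiderivatives of x^(-p) (the logarithm when p = 1, a power function otherwise).

```python
import math


def p_integral(p, t, a=1.0):
    """Integral of 1/x**p over [a, t] for 0 < a <= t, from the standard antiderivative."""
    if p == 1.0:
        return math.log(t / a)
    return (t ** (1.0 - p) - a ** (1.0 - p)) / (1.0 - p)


for p in (0.5, 1.0, 2.0):
    values = [p_integral(p, 10.0 ** k) for k in (1, 3, 6)]
    print(f"p = {p}: integrals up to t = 10, 10^3, 10^6 are {values}")
```

For p = 2 the values level off near 1, while for p = 1 and p = 0.5 they keep growing as t increases, in agreement with Theorem 4.7.11.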

Example 4.7.12. We determine whether or not the integral
\[ \int_1^\infty \frac{\cos^2 x}{x^2+2x+2}\,dx \]
is convergent.

Solution. Let x ≥ 1 be a real number. Then
\[ 0 \le \cos^2 x \le 1 \quad\text{and}\quad x^2+2x+2 > x^2+2x > x^2 > 0. \]
Therefore
\[ 0 \le \frac{\cos^2 x}{x^2+2x+2} = \cos^2 x \times \frac{1}{x^2+2x+2} < \frac{1}{x^2}. \tag{4.19} \]
By the p-Test,
\[ \int_1^\infty \frac{1}{x^2}\,dx \]
is convergent. It therefore follows from (4.19) and the Comparison Test that
\[ \int_1^\infty \frac{\cos^2 x}{x^2+2x+2}\,dx \]
is convergent.

Example 4.7.13. We determine whether or not the integral
\[ \int_0^\infty e^{-x^2}\,dx \]
is convergent.

Solution. Let x ≥ 1 be a real number. Then
\[ x^2 \ge x \quad\text{so that}\quad -x^2 \le -x. \]
Since the exponential function is strictly positive and increasing, it follows that
\[ 0 < e^{-x^2} \le e^{-x} \quad\text{for all } x \ge 1. \tag{4.20} \]
But
\[ \lim_{t\to\infty} \int_0^t e^{-x}\,dx = \lim_{t\to\infty} (1 - e^{-t}) = 1. \]
Therefore
\[ \int_0^\infty e^{-x}\,dx \]
is convergent. It now follows from (4.20) and the Comparison Test that
\[ \int_0^\infty e^{-x^2}\,dx \]
is convergent.
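
The comparison in Example 4.7.13 can be explored numerically. The sketch below is not part of the notes: it assumes the standard identity that the integral of e^(-x^2) over [0, t] equals (√π/2)·erf(t), where erf is the error function available in Python as `math.erf`. The function names are hypothetical.

```python
import math


def gauss_partial(t):
    """Integral of exp(-x**2) over [0, t], via the error function."""
    return 0.5 * math.sqrt(math.pi) * math.erf(t)


def exp_tail(t):
    """Integral of exp(-x) over [1, t]."""
    return math.exp(-1.0) - math.exp(-t)


# On [1, t] the integrand exp(-x**2) lies below exp(-x), see (4.20),
# so the corresponding integrals are ordered in the same way.
for t in (1.0, 2.0, 5.0, 10.0):
    gauss_tail = gauss_partial(t) - gauss_partial(1.0)
    print(f"t = {t:4.1f}  tail of exp(-x^2) = {gauss_tail:.6f}  tail of exp(-x) = {exp_tail(t):.6f}")

print("partial integral up to t = 10:", gauss_partial(10.0))
```

The partial integrals stay bounded and level off, which is what convergence of the improper integral means; the Comparison Test establishes this without needing the error function.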

Example 4.7.14. We determine whether or not the integral
\[ \int_1^\infty \frac{x^2+1}{x^3+1}\,dx \]
is convergent.

Solution. Let x ≥ 1 be a real number. Then x^3 ≥ 1 so that
\[ 0 < x^3 + 1 \le x^3 + x^3 = 2x^3. \]
Furthermore,
\[ x^2 + 1 \ge x^2 > 0. \]
Therefore
\[ \frac{x^2+1}{x^3+1} = (x^2+1) \times \frac{1}{x^3+1} \ge x^2 \times \frac{1}{2x^3} = \frac{1}{2x} > 0. \tag{4.21} \]
By the p-Test,
\[ \int_1^\infty \frac{1}{2x}\,dx = \frac{1}{2} \int_1^\infty \frac{1}{x}\,dx \]
is divergent. It now follows from (4.21) and the Comparison Test that
\[ \int_1^\infty \frac{x^2+1}{x^3+1}\,dx \]
is divergent.

Example 4.7.15. We determine whether or not the integral
\[ \int_2^\infty \frac{1}{\ln x}\,dx \]
is convergent.

Solution. Let x > 2 be a real number. Then
\[ \frac{d}{dx}(\ln x - x) = \frac{1}{x} - 1 \le \frac{1}{2} - 1 < 0. \]
It follows from Theorem A.1.5 that
\[ \ln x - x < \ln(1) - 1 < 0. \]
Therefore
\[ 0 < \ln x < x \quad\text{for all } x > 2 \]
so that
\[ 0 < \frac{1}{x} < \frac{1}{\ln x} \quad\text{for all } x > 2. \tag{4.22} \]
According to the p-Test,
\[ \int_2^\infty \frac{1}{x}\,dx \]
is divergent. It therefore follows from (4.22) and the Comparison Test that
\[ \int_2^\infty \frac{1}{\ln x}\,dx \]
is divergent.

We now turn to integrals of the form
\[ \int_a^b f(x)\,dx \]

where f : [a, b] → R is continuous everywhere except at one point in [a, b]. It should be
noted that the Fundamental Theorem of Calculus does not, in general, apply to such an
integral.

Definition 4.7.16 (Improper Integrals of Type II). Let f be a function from R to R
and a < b real numbers.

(1) If f is continuous on [a, b) but not at b, then
\[ \int_a^b f(x)\,dx = \lim_{t\to b^-} \int_a^t f(x)\,dx, \]
provided that the limit exists and is a real number. In this case we say that the improper
integral $\int_a^b f(x)\,dx$ is convergent.

(2) If f is continuous on (a, b] but not at a, then
\[ \int_a^b f(x)\,dx = \lim_{t\to a^+} \int_t^b f(x)\,dx, \]
provided that the limit exists and is a real number. In this case we say that the improper
integral $\int_a^b f(x)\,dx$ is convergent.

(3) If a < c < b and f is continuous on [a, c) and on (c, b], but discontinuous at c, then
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \]
provided that both
\[ \int_a^c f(x)\,dx \quad\text{and}\quad \int_c^b f(x)\,dx \]
are convergent.

Remark 4.7.17. Consider an improper integral
\[ \int_a^b f(x)\,dx \tag{4.23} \]
of Type II.

(1) If the integral (4.23) is not convergent, then it is called divergent.

(2) If for some a < c < b the function f is continuous on [a, c) and on (c, b] but
discontinuous at c, then the integral (4.23) is divergent if and only if
\[ \int_a^c f(x)\,dx \quad\text{or}\quad \int_c^b f(x)\,dx \]
is divergent.

(3) Definition 4.7.16 can be generalised so as to deal with a function f which is continuous
on [a, b] except at finitely many points x_1, ..., x_n ∈ [a, b].

We now turn to some examples.


Example 4.7.18. We determine whether or not the integral
\[ \int_0^1 \frac{x}{\sqrt{1-x}}\,dx \]
is convergent, and determine its value if it is convergent.

Solution. Note that the function given by
\[ f(x) = \frac{x}{\sqrt{1-x}}, \quad x < 1 \]
is continuous on [0, 1), but not at 1. Hence
\[ \int_0^1 \frac{x}{\sqrt{1-x}}\,dx = \lim_{t\to 1^-} \int_0^t \frac{x}{\sqrt{1-x}}\,dx, \]
provided that the limit exists.

For t < 1 we have, using Integration by Parts, that
\begin{align*}
\int_0^t \frac{x}{\sqrt{1-x}}\,dx &= \Big[-2x\sqrt{1-x}\Big]_0^t + 2\int_0^t \sqrt{1-x}\,dx \\
&= \left[-2x\sqrt{1-x} - \frac{4(1-x)^{3/2}}{3}\right]_0^t \\
&= \frac{4}{3} - 2t\sqrt{1-t} - \frac{4(1-t)^{3/2}}{3}.
\end{align*}
Since
\[ \lim_{t\to 1^-} \left(\frac{4}{3} - 2t\sqrt{1-t} - \frac{4(1-t)^{3/2}}{3}\right) = \frac{4}{3}, \]
it follows that the integral is convergent and
\[ \int_0^1 \frac{x}{\sqrt{1-x}}\,dx = \lim_{t\to 1^-} \int_0^t \frac{x}{\sqrt{1-x}}\,dx = \frac{4}{3}. \]
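
The antiderivative found in Example 4.7.18 can be cross-checked against a direct numerical approximation of the definite integrals over [0, t] for t < 1. This is a hedged sketch: `midpoint_rule`, `integrand` and `closed_form` are names introduced here for illustration, and the midpoint rule is used only because it never evaluates the integrand at the endpoints.

```python
import math


def midpoint_rule(f, a, b, n=100_000):
    """Approximate the definite integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def integrand(x):
    return x / math.sqrt(1.0 - x)


def closed_form(t):
    """Integral of x / sqrt(1 - x) over [0, t] for t < 1, from Example 4.7.18."""
    return 4 / 3 - 2 * t * math.sqrt(1 - t) - 4 * (1 - t) ** 1.5 / 3


for t in (0.5, 0.9, 0.99):
    print(f"t = {t}: numeric = {midpoint_rule(integrand, 0.0, t):.8f}, "
          f"closed form = {closed_form(t):.8f}")

# Closer to the singularity at x = 1 only the closed form is evaluated.
for t in (0.9999, 0.999999):
    print(f"t = {t}: closed form = {closed_form(t):.8f}")
```

The values increase towards 4/3 as t approaches 1 from the left.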

Example 4.7.19. We determine whether or not the integral
\[ \int_{-1}^0 \frac{1}{x+1}\,dx \]
is convergent, and determine its value if it is convergent.

Solution. Note that the function given by
\[ f(x) = \frac{1}{x+1}, \quad x \ne -1 \]
is continuous on (−1, 0], but not at −1. Hence
\[ \int_{-1}^0 \frac{1}{x+1}\,dx = \lim_{t\to -1^+} \int_t^0 \frac{1}{x+1}\,dx, \]
provided that the limit exists. For t > −1 we have
\[ \int_t^0 \frac{1}{x+1}\,dx = -\ln(t+1). \]
Since
\[ \lim_{t\to -1^+} (t+1) = 0 \quad\text{and}\quad \lim_{s\to 0^+} \ln s = -\infty, \]
it follows that
\[ \lim_{t\to -1^+} \int_t^0 \frac{1}{x+1}\,dx = -\lim_{t\to -1^+} \ln(t+1) = \infty. \]
Therefore the integral
\[ \int_{-1}^0 \frac{1}{x+1}\,dx \]
is divergent.

Example 4.7.20. We determine whether or not the integral
\[ \int_0^1 4x^3 \ln x\,dx \]
is convergent, and determine its value if it is convergent.

Solution. Note that the function given by
\[ f(x) = 4x^3 \ln x, \quad x > 0 \]
is continuous on (0, 1], but not at 0. Hence
\[ \int_0^1 4x^3 \ln x\,dx = \lim_{t\to 0^+} \int_t^1 4x^3 \ln x\,dx, \]
provided that the limit exists.

For t > 0 we have, using Integration by Parts, that
\begin{align*}
\int_t^1 4x^3 \ln x\,dx &= \Big[x^4 \ln x\Big]_t^1 - \int_t^1 x^3\,dx \\
&= \frac{t^4 - 1}{4} - t^4 \ln t.
\end{align*}
Note that
\[ t^4 \ln t = \frac{\ln t}{t^{-4}}, \quad t > 0 \]
and
\[ \lim_{t\to 0^+} \ln t = -\infty, \quad \lim_{t\to 0^+} t^{-4} = \infty. \]
By l'Hospital's Rule we have
\[ \lim_{t\to 0^+} t^4 \ln t = \lim_{t\to 0^+} \frac{t^{-1}}{-4t^{-5}} = \lim_{t\to 0^+} \frac{t^4}{-4} = 0. \]
It follows that the integral is convergent and
\begin{align*}
\int_0^1 4x^3 \ln x\,dx &= \lim_{t\to 0^+} \int_t^1 4x^3 \ln x\,dx \\
&= \lim_{t\to 0^+} \frac{t^4 - 1}{4} - \lim_{t\to 0^+} t^4 \ln t \\
&= -\frac{1}{4}.
\end{align*}

Example 4.7.21. We evaluate the integral
\[ \int_1^3 \frac{2}{x^2-2x}\,dx, \]
if possible.

Solution. Note that the function given by
\[ f(x) = \frac{2}{x^2-2x}, \quad x \ne 0,\ x \ne 2 \]
is continuous at every x ∈ [1, 3], except at x = 2. Therefore
\[ \int_1^3 \frac{2}{x^2-2x}\,dx \]
is an improper integral and
\begin{align*}
\int_1^3 \frac{2}{x^2-2x}\,dx &= \int_1^2 \frac{2}{x^2-2x}\,dx + \int_2^3 \frac{2}{x^2-2x}\,dx \\
&= \lim_{t\to 2^-} \int_1^t \frac{2}{x^2-2x}\,dx + \lim_{t\to 2^+} \int_t^3 \frac{2}{x^2-2x}\,dx,
\end{align*}
provided that both the limits exist and are real numbers.

For 1 < t < 2 we have
\begin{align*}
\int_1^t \frac{2}{x^2-2x}\,dx &= \int_1^t \left(\frac{1}{x-2} - \frac{1}{x}\right) dx \\
&= \ln|t-2| - \ln|t|.
\end{align*}
Since
\[ \lim_{t\to 2^-} (t-2) = 0 \quad\text{and}\quad \lim_{s\to 0} \ln|s| = -\infty, \]
it follows that
\[ \lim_{t\to 2^-} \int_1^t \frac{2}{x^2-2x}\,dx = \lim_{t\to 2^-} \ln|t-2| - \lim_{t\to 2^-} \ln t = -\infty. \]
Therefore the integral is divergent.


Remark 4.7.22. We have not discussed a Comparison Theorem for improper integrals of
Type II. The reader should, however, be aware that such a result does hold. For instance, if
f, g : [a, b) → R are continuous functions and there exists a ≤ c < b so that 0 ≤ f (x) ≤ g(x)
for all c ≤ x < b then the following hold.
(1) If $\int_a^b g(x)\,dx$ is convergent then $\int_a^b f(x)\,dx$ is convergent.

(2) If $\int_a^b f(x)\,dx$ is divergent then $\int_a^b g(x)\,dx$ is divergent.

Similar results hold for other Type II improper integrals.

In this section improper integrals of Type I and of Type II are introduced. An improper
integral is not a definite integral, but the limit of a function which is defined using a definite
integral. An important application of improper integrals is to probability theory. If f : R →
R is the probability density function of a random variable X, then
\[ P(X \le a) = \int_{-\infty}^a f(x)\,dx \]
for any real number a.

Exercise 4.7

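1. Determine whether the given improper integral is convergent. If the integral is
convergent, find its value.
(a) $\int_0^\infty 2^{1-x}\,dx$   (b) $\int_1^\infty \frac{\ln x}{x^2}\,dx$   (c) $\int_{-\infty}^\infty \frac{1}{x^2+1}\,dx$
(d) $\int_{-\infty}^0 \frac{e^x}{1+e^x}\,dx$   (e) $\int_{-\infty}^\infty x^3 e^{-x^2}\,dx$   (f) $\int_1^\infty \frac{1-x}{2x^2+5x+2}\,dx$
(g) $\int_0^\infty \frac{x+3}{x^2+3x+2}\,dx$   (h) $\int_\pi^\infty e^{-x}\cos x\,dx$   (i) $\int_0^\infty \frac{x^2}{\sqrt{x^3+1}}\,dx$
(j) $\int_1^\infty \frac{x-2}{6x^2-x-1}\,dx$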
2. Decide whether or not the given integral is an improper integral.
(a) $\int_{-1}^1 x^{2/3}\,dx$   (b) $\int_0^1 \frac{1}{\sqrt{1-x^2}}\,dx$   (c) $\int_0^2 e^{1/x}\,dx$

3. Determine whether the given improper integral is convergent. If the integral is
convergent, find its value.
(a) $\int_{-1}^1 \frac{1}{x+1}\,dx$   (b) $\int_0^{\pi/2} \sec x\,dx$   (c) $\int_{-1}^1 \ln|x|\,dx$
(d) $\int_1^2 \frac{1}{(x-2)^4}\,dx$   (e) $\int_0^1 \frac{\ln x}{\sqrt{x}}\,dx$   (f) $\int_0^4 \frac{x}{x-2}\,dx$
(g) $\int_0^{\pi/2} \cot x\,dx$   (h) $\int_{-1}^1 \frac{1}{x^3}\,dx$   (i) $\int_{-1}^0 \frac{e^{1/x}}{x^3}\,dx$
4. Consider the function f : R → R given by
\[ f(x) = \frac{2x}{x^2+1}, \quad x \in \mathbb{R}. \]
Show that $\int_{-\infty}^\infty f(x)\,dx$ is divergent, but $\lim_{t\to\infty} \int_{-t}^t f(x)\,dx = 0$.

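5. Use the Comparison Test to determine whether or not the given improper integral is
convergent.
(a) $\int_1^\infty \frac{2x-1}{x^3+x^2+1}\,dx$   (b) $\int_0^\infty \frac{x^2-1}{x^4+x+1}\,dx$   (c) $\int_0^\infty \frac{1}{2+e^{2x}}\,dx$
(d) $\int_1^\infty \frac{\sin^2 x}{x^2}\,dx$   (e) $\int_1^\infty \frac{x}{\sqrt{x^4+1}}\,dx$   (f) $\int_0^\infty \frac{\sqrt{x}}{x^2+1}\,dx$
(g) $\int_1^\infty \frac{x+1}{x^2+1}\,dx$   (h) $\int_1^\infty \frac{\sqrt{x}}{\sqrt{x^2+1}}\,dx$   (i) $\int_0^\infty \frac{\sqrt{x^2+1}}{\sqrt{x^3+x+1}}\,dx$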
6. For which values of the constant p is the given integral convergent?
(a) $\int_0^1 \frac{1}{x^p}\,dx$   (b) $\int_1^\infty x^{2p+3}\,dx$   (c) $\int_1^\infty \frac{1}{x^{p^2+p-6}}\,dx$   (d) $\int_0^1 x^p \ln x\,dx$

7. The aim of this exercise is to give a partial proof of Theorem 4.7.10. Let f, g : [a, ∞) →
R be continuous on [a, ∞), where a is a real number. Assume that there exists a real
number c ≥ a so that 0 ≤ f(x) ≤ g(x) for all x ≥ c. Assume that
\[ \lim_{t\to\infty} \int_a^t f(x)\,dx = \infty. \]
Prove that
\[ \int_a^\infty g(x)\,dx \]
is divergent.

8. Prove Theorem 4.7.11.

4.8 Taylor Polynomials

For a function f : R → R it is typically not possible, even in principle, to determine the
exact value f(x) of f for an arbitrary real number x. Consider, for example, the function
$f(x) = \sqrt{x}$, x ≥ 0. The real number $f(2) = \sqrt{2}$ exists, as you can verify using the
Intermediate Value Theorem. However, since $\sqrt{2}$ is an irrational number, and therefore has
an infinite, non-repeating decimal expansion, it is impossible to write down its exact decimal value.
You should bear in mind that $\sqrt{2}$ is only a symbol we use to represent the unique, positive
real number whose square is 2.

For many applications of mathematics, it is essential to be able to find accurate
approximations for the values of a function such as $f(x) = \sqrt{x}$, for instance $f(2) = \sqrt{2}$. As an
elementary example, consider a square with sides of length 1.

According to the Theorem of Pythagoras, the length of the diagonal is $\ell = \sqrt{2}$. Since we
cannot determine an exact numerical value for ℓ, the best we can do is to find an accurate
approximation for it. There are many ways in which to calculate such approximations. In
this section we consider one approach. The basic idea is to approximate a function f using
polynomials.
Definition 4.8.1. Let a be a real number, and f a function defined on an open interval
containing a. If f is n-times differentiable at a, then the nth Taylor polynomial for f about
a is
\begin{align*}
P_n(x) = {} & f(a) + f'(a)(x-a) + \frac{f''(a)(x-a)^2}{2} + \frac{f'''(a)(x-a)^3}{2\cdot 3} \\
& + \frac{f^{(4)}(a)(x-a)^4}{2\cdot 3\cdot 4} + \cdots + \frac{f^{(n-1)}(a)(x-a)^{n-1}}{(n-1)!} + \frac{f^{(n)}(a)(x-a)^n}{n!}.
\end{align*}
Remark 4.8.2. Let a be a real number, and f a function defined on an open interval
containing a. Suppose that f is n-times differentiable at a.

(1) For a natural number k ≥ 1, k!, pronounced k-factorial, is the natural number k! =
1 · 2 · 3 · · · · · k. We define 0! as 1.
(2) A compact notation for the nth Taylor Polynomial for f about a is
\[ P_n(x) = f(a) + \sum_{k=1}^{n} \frac{f^{(k)}(a)(x-a)^k}{k!}. \]

(3) Note that we evaluate f and its derivatives at a specific, fixed point a. Therefore
f (a), f ′ (a), . . . , f (n) (a) are real numbers.
(4) Pn is a function defined on the entire real line.
(5) The 1st order Taylor polynomial gives the tangent line to the curve y = f (x) at x = a.
(6) Note that $f(a) = P_n(a)$ and $f^{(k)}(a) = P_n^{(k)}(a)$ for all k = 1, ..., n. Therefore the graphs
of f and P_n have some features in common near x = a. For instance, if f has a critical
point at a, then so does P_n; if the graph of f is concave up (or down) at a, then so
is the graph of P_n. The sketch below shows the graph of the function $f(x) = \frac{1}{1+x}$ near
x = 1, and that of its third Taylor polynomial P_3 about x = 1.

[Sketch: the graphs of y = f(x) and y = P_3(x) near x = 1.]

The Taylor polynomials for a function f about a are intended to approximate the values of f
at points near a; that is, f(x) ≈ P_n(x) for x near a. It is important to know how good this
approximation is, and we will deal with this issue shortly. First we consider some examples.

Example 4.8.3. We find the 2nd Taylor polynomial for the function $f(x) = \sqrt[3]{x}$ about
x = 8, and use it to approximate $\sqrt[3]{7}$.

Solution. For x ≠ 0 we have
\[ f'(x) = \frac{x^{-2/3}}{3}, \quad f''(x) = -\frac{2x^{-5/3}}{9}. \]
Therefore
\[ f(8) = 2, \quad f'(8) = \frac{1}{12}, \quad f''(8) = -\frac{1}{144} \]
so that
\begin{align*}
P_2(x) &= 2 + \frac{x-8}{12} - \frac{1}{144}\,\frac{(x-8)^2}{2!} \\
&= 2 + \frac{x-8}{12} - \frac{(x-8)^2}{288}.
\end{align*}
Approximating $\sqrt[3]{7}$ using P_2 we find
\[ \sqrt[3]{7} = f(7) \approx P_2(7) = 2 - \frac{1}{12} - \frac{1}{288} = \frac{551}{288} = 1.91319\dot{4}. \]
The cube root of 7, rounded to three decimal places, is 1.913.
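
The arithmetic in Example 4.8.3 is easy to reproduce with exact fractions. The following sketch is for illustration only; the function name `p2` is hypothetical, and `7 ** (1 / 3)` is the floating-point value of the cube root used as a reference.

```python
from fractions import Fraction


def p2(x):
    """2nd Taylor polynomial of the cube root about 8, as found in Example 4.8.3."""
    x = Fraction(x)
    return 2 + (x - 8) / 12 - (x - 8) ** 2 / 288


approx = p2(7)            # exact rational value of P_2(7)
reference = 7 ** (1 / 3)  # floating-point cube root of 7

print("P_2(7) =", approx, "=", float(approx))
print("cube root of 7 (float) =", reference)
print("difference =", abs(reference - float(approx)))
```

Working with `Fraction` keeps P_2(7) exact, so the only rounding in the comparison comes from the floating-point reference value.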


Example 4.8.4. We find the Taylor polynomial of degree four for the function $f(x) = \sqrt{x}$
at x = 1, and use it to approximate $\sqrt{2}$.

Solution. For x > 0 we have
\[ f'(x) = \frac{x^{-1/2}}{2}, \quad f''(x) = -\frac{x^{-3/2}}{4}, \quad f'''(x) = \frac{3x^{-5/2}}{8}, \quad f^{(4)}(x) = -\frac{15x^{-7/2}}{16}. \]
Therefore
\[ f(1) = 1, \quad f'(1) = \frac{1}{2}, \quad f''(1) = -\frac{1}{4}, \quad f'''(1) = \frac{3}{8}, \quad f^{(4)}(1) = -\frac{15}{16} \]
so that
\begin{align*}
P_4(x) &= 1 + \frac{1}{2}(x-1) - \frac{1}{4}\,\frac{(x-1)^2}{2!} + \frac{3}{8}\,\frac{(x-1)^3}{3!} - \frac{15}{16}\,\frac{(x-1)^4}{4!} \\
&= 1 + \frac{1}{2}(x-1) - \frac{(x-1)^2}{8} + \frac{3(x-1)^3}{48} - \frac{15(x-1)^4}{384}.
\end{align*}
Approximating $\sqrt{2}$ using P_4 we find
\[ \sqrt{2} = f(2) \approx P_4(2) = 1 + \frac{1}{2} - \frac{1}{8} + \frac{3}{48} - \frac{15}{384} = \frac{179}{128} = 1.3984375. \]
The square root of 2, rounded to seven decimal places, is 1.4142136.

We now turn to the issue of the accuracy of the approximation of a function f by its Taylor
polynomials. In this regard, consider a function f : I → R where I is an open interval
containing the real number a. Assume that for some positive integer n, the function f is
n-times differentiable at a. This means that the nth Taylor polynomial for f about a is
well defined. We measure the accuracy of the approximation f (x) ≈ Pn (x) using the error
function

En (x) = f (x) − Pn (x), x ∈ I. (4.24)

The nearer the value En (x) is to 0, the better the approximation f (x) ≈ Pn (x). Our aim is
to find explicit expressions for the error function En .
Theorem 4.8.5. Let I be an open interval containing the real number a, and f a function
defined on I. If f (n+1) exists and is continuous on I for some positive integer n, then
\[ E_n(x) = \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt, \quad x \in I. \]

Proof. We prove the result using Mathematical Induction on n.

Assume that f′′ exists and is continuous on I. Then, according to Definition 4.8.1 and
(4.24),
\[ E_1(x) = f(x) - P_1(x) = f(x) - f(a) - f'(a)(x-a), \quad x \in I. \]
Fix x ∈ I. Using integration by parts, we have
\[ \int_a^x f''(t)(x-t)\,dt = E_1(x), \]
see Exercise 4.8 number 7. Therefore the result is true if n = 1.

Now assume that the result is true for some fixed but arbitrary natural number n_0. That
is, if $f^{(n_0+1)}$ exists and is continuous on I, then
\[ E_{n_0}(x) = \int_a^x \frac{f^{(n_0+1)}(t)}{n_0!}(x-t)^{n_0}\,dt, \quad x \in I. \tag{4.25} \]
Suppose that $f^{(n_0+2)}$ exists and is continuous on I. Then, for all x ∈ I,
\begin{align*}
E_{n_0+1}(x) &= f(x) - P_{n_0+1}(x) \\
&= f(x) - f(a) - \sum_{k=1}^{n_0+1} \frac{f^{(k)}(a)(x-a)^k}{k!} \\
&= f(x) - f(a) - \sum_{k=1}^{n_0} \frac{f^{(k)}(a)(x-a)^k}{k!} - \frac{f^{(n_0+1)}(a)(x-a)^{n_0+1}}{(n_0+1)!}.
\end{align*}
But
\[ P_{n_0}(x) = f(a) + \sum_{k=1}^{n_0} \frac{f^{(k)}(a)(x-a)^k}{k!}, \quad x \in I. \]
Therefore
\begin{align*}
E_{n_0+1}(x) &= f(x) - P_{n_0}(x) - \frac{f^{(n_0+1)}(a)(x-a)^{n_0+1}}{(n_0+1)!} \\
&= E_{n_0}(x) - \frac{f^{(n_0+1)}(a)(x-a)^{n_0+1}}{(n_0+1)!}
\end{align*}
for every x ∈ I. Since $f^{(n_0+2)}$ exists on I, it follows that $f^{(n_0+1)}$ exists and is continuous on
I. Therefore our assumption (4.25) implies that
\[ E_{n_0+1}(x) = \int_a^x \frac{f^{(n_0+1)}(t)}{n_0!}(x-t)^{n_0}\,dt - \frac{f^{(n_0+1)}(a)(x-a)^{n_0+1}}{(n_0+1)!}. \]
Using integration by parts, we find that
\begin{align*}
\int_a^x \frac{f^{(n_0+2)}(t)(x-t)^{n_0+1}}{(n_0+1)!}\,dt &= \left[\frac{f^{(n_0+1)}(t)(x-t)^{n_0+1}}{(n_0+1)!}\right]_a^x + \int_a^x \frac{f^{(n_0+1)}(t)(x-t)^{n_0}}{n_0!}\,dt \\
&= -\frac{f^{(n_0+1)}(a)(x-a)^{n_0+1}}{(n_0+1)!} + \int_a^x \frac{f^{(n_0+1)}(t)(x-t)^{n_0}}{n_0!}\,dt \\
&= E_{n_0+1}(x).
\end{align*}
We have shown that if the result holds for an arbitrary but fixed natural number n_0, then
it also holds for n_0 + 1. Therefore, by the Principle of Mathematical Induction, the result
holds for all natural numbers n.

Remark 4.8.6. Let I be an open interval containing the real number a, and f a function
defined on I such that f (n+1) exists and is continuous on I for some natural number n.

(1) Applying integration by parts repeatedly, it is possible to determine an exact
expression for the error E_n that does not involve an integral. This is, however, of very
limited practical use, since this expression always contains the function f which we
are attempting to approximate, see for instance Example 4.8.7.

(2) It is possible to express the error E_n directly in terms of the derivatives of f in the
following ways.

(a) For every x ∈ I, there exists an s_x between x and a so that
\[ E_n(x) = \frac{f^{(n+1)}(s_x)}{(n+1)!}(x-a)^{n+1}. \]
The proof of this fact is given as an exercise, see Exercise 4.8 number 8.

(b) For every x ∈ I, there exists a u_x between x and a so that
\[ E_n(x) = \frac{f^{(n+1)}(u_x)}{n!}(x-u_x)^n (x-a), \]
see Exercise 4.8 number 9 for a proof.

Example 4.8.7. According to Example 4.8.3, the 2nd Taylor polynomial for $f(x) = \sqrt[3]{x}$
about 8 is
\[ P_2(x) = 2 + \frac{x-8}{12} - \frac{(x-8)^2}{288}. \]
We determine the error E_2(x) for x > 0. (We restrict to x > 0 because f′′′ is not defined at 0,
so Theorem 4.8.5 applies on the interval I = (0, ∞).)

Solution. According to Theorem 4.8.5,
\[ E_2(x) = \int_8^x \frac{f'''(t)}{2}(x-t)^2\,dt = \int_8^x \frac{10t^{-8/3}}{54}(x-t)^2\,dt, \quad x > 0. \]
Evaluating the integral, we find
\begin{align*}
E_2(x) &= \frac{10}{54} \int_8^x t^{-8/3}(x^2 - 2xt + t^2)\,dt \\
&= \frac{10}{54} \left[-\frac{3x^2}{5} t^{-5/3} + 3x t^{-2/3} + 3t^{1/3}\right]_8^x \\
&= \sqrt[3]{x} + \frac{x^2}{288} - \frac{5x}{36} - \frac{10}{9}
\end{align*}
for every x > 0. Notice that
\[ E_2(x) = f(x) + \frac{x^2}{288} - \frac{5x}{36} - \frac{10}{9}, \quad x > 0. \]
If we therefore want to use this expression for E_2 to determine the exact error in the
approximation f(x) ≈ P_2(x) for some x > 0, we must already know the value of f(x),
rendering the approximation redundant.

As Example 4.8.7 demonstrates, the exact expression for the error En given in Theorem
4.8.5 is of little practical use. However, it can be used to obtain a useful upper bound for
the error.

Theorem 4.8.8. Let a be a real number, n a positive integer and I an open interval con-
taining a. Assume that f is a function defined on I such that f (n+1) exists and is continuous
on I. Let x ∈ I be a fixed number such that x ̸= a. If there is a real number Mx ≥ 0 so that
$|f^{(n+1)}(t)| \le M_x$ for all t between a and x, then
\[ |E_n(x)| \le \frac{M_x}{(n+1)!} |x-a|^{n+1}. \]

Proof. There are two cases to consider, namely, x > a and x < a. We consider only the
case when x > a, giving the proof of the remaining case as an exercise, see Exercise 4.8
number 11.

Consider a fixed real number x ∈ I such that x > a. Assume that $|f^{(n+1)}(t)| \le M_x$ for all t
between a and x. Then
\[ -M_x \le f^{(n+1)}(t) \le M_x, \quad a \le t \le x. \]
But (x − t)^n ≥ 0 for all a ≤ t ≤ x. Therefore
\[ -M_x \frac{(x-t)^n}{n!} \le f^{(n+1)}(t) \frac{(x-t)^n}{n!} \le M_x \frac{(x-t)^n}{n!}, \quad a \le t \le x. \]
Therefore Theorem A.1.10 (i) implies that
\[ -\frac{M_x}{n!} \int_a^x (x-t)^n\,dt \le \int_a^x f^{(n+1)}(t) \frac{(x-t)^n}{n!}\,dt \le \frac{M_x}{n!} \int_a^x (x-t)^n\,dt. \]
It now follows from Theorem 4.8.5 that
\[ -\frac{M_x}{n!} \int_a^x (x-t)^n\,dt \le E_n(x) \le \frac{M_x}{n!} \int_a^x (x-t)^n\,dt. \]
But
\[ \int_a^x (x-t)^n\,dt = \frac{(x-a)^{n+1}}{n+1}. \]
Therefore, since $(x-a)^{n+1} \ge 0$,
\[ -\frac{M_x}{(n+1)!} |x-a|^{n+1} = -\frac{M_x}{(n+1)!} (x-a)^{n+1} \le E_n(x) \le \frac{M_x}{(n+1)!} (x-a)^{n+1} = \frac{M_x}{(n+1)!} |x-a|^{n+1} \]
so that
\[ |E_n(x)| \le \frac{M_x}{(n+1)!} |x-a|^{n+1}. \]
Remark 4.8.9. Consider a function f defined on some open interval I containing the real
number a. Assume that f (n+1) exists and is continuous on I for some positive integer n.
We should keep the following points in mind when applying Theorem 4.8.8.

(1) If for some x ∈ I we want to find an upper bound for |E_n(x)|, we must find a real
number M_x ≥ 0 so that $|f^{(n+1)}(t)| \le M_x$ for all t between a and x. The smaller M_x is,
the better is the bound we obtain for the error. If $f^{(n+1)}$ attains an absolute minimum
and an absolute maximum value on the interval with endpoints a and x, say at
points c and d, respectively, then the smallest possible value for M_x is
\[ M = \max\{|f^{(n+1)}(c)|, |f^{(n+1)}(d)|\}. \]

(2) In some cases it may not be practically possible to determine the smallest value for
Mx as given in (1). In such cases we aim to find a reasonable value for Mx which is
easily computable.

We demonstrate the use of Theorem 4.8.8 by means of some examples.

Example 4.8.10. Consider again the function $f(x) = \sqrt[3]{x}$. According to Example 4.8.3,
the 2nd Taylor polynomial for f about a = 8 is
\[ P_2(x) = 2 + \frac{x-8}{12} - \frac{(x-8)^2}{288}. \]
Furthermore,
\[ \sqrt[3]{7} = f(7) \approx P_2(7) = \frac{551}{288} = 1.91319\dot{4}. \]
We now use Theorem 4.8.8 to find an upper bound for the error E_2(7).

Solution. We have
\[ f'''(x) = \frac{10x^{-8/3}}{27}, \quad x \ne 0. \]
Note that the functions $g(t) = t^{8/3}$ and $h(t) = 7^t$ are both increasing. So, if 7 ≤ x ≤ 8, we
have
\[ x^{8/3} \ge 7^{8/3} > 7^2 = 49. \]
Therefore
\[ 0 < f'''(x) < \frac{10}{27} \times \frac{1}{49} = \frac{10}{1323}, \quad 7 \le x \le 8. \]
Applying Theorem 4.8.8 with n = 2, x = 7 and $M = \frac{10}{1323}$, we have
\[ |E_2(7)| \le \frac{M}{6} |7-8|^3 = \frac{5}{3969} = 0.00125976\ldots. \]
Therefore, if we use P_2(7) to approximate $\sqrt[3]{7}$, the approximation is accurate at least up to
the second decimal place.
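
The bound from Example 4.8.10 can be compared with the error that actually occurs. The sketch below is illustrative; the variable names are assumptions made here, and the floating-point cube root `7 ** (1 / 3)` stands in for the exact value.

```python
from fractions import Fraction

approx = Fraction(551, 288)                       # P_2(7) from Example 4.8.3
bound = Fraction(10, 1323) / 6 * abs(7 - 8) ** 3  # M / (n + 1)! * |x - a|^(n + 1) with n = 2
actual_error = abs(7 ** (1 / 3) - float(approx))  # error measured against the float cube root

print("error bound  =", bound, "=", float(bound))
print("actual error =", actual_error)
```

The measured error is several times smaller than the bound, which is consistent with Theorem 4.8.8: the theorem guarantees an upper bound, not the exact size of the error.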


Example 4.8.11. We use the 5th Taylor polynomial for f(x) = sin x about x = 0 to
approximate sin(1). We also find a bound for the error E_5(1) in the approximation sin(1) ≈
P_5(1).

Solution. We have
\[ f'(x) = \cos x, \quad f''(x) = -\sin x, \quad f'''(x) = -\cos x, \quad f^{(4)}(x) = \sin x, \]
\[ f^{(5)}(x) = \cos x, \quad f^{(6)}(x) = -\sin x. \]
Therefore
\[ f(0) = 0, \quad f'(0) = 1, \quad f''(0) = 0, \quad f'''(0) = -1, \quad f^{(4)}(0) = 0, \quad f^{(5)}(0) = 1 \]
so that
\[ P_5(x) = x - \frac{x^3}{6} + \frac{x^5}{120}. \]
Using P_5 to approximate sin(1), we have
\[ \sin(1) = f(1) \approx P_5(1) = 1 - \frac{1}{6} + \frac{1}{120} = \frac{101}{120} = 0.841\dot{6}. \]
Now we find an upper bound for the error E_5(1). For 0 ≤ x ≤ 1 we have $|f^{(6)}(x)| = |\sin x| \le 1$.
We apply Theorem 4.8.8 with a = 0, x = 1, n = 5 and M = 1 to find
\[ |E_5(1)| \le \frac{|1|^6}{6!} = \frac{1}{720}. \]
Therefore $|\sin(1) - P_5(1)| = |E_5(1)| \le \frac{1}{720} = 0.0013\dot{8}$. The approximation sin(1) ≈ P_5(1) is
accurate at least up to the second decimal place.
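
The numbers in Example 4.8.11 can be verified directly. This is a small sketch under the assumption that `math.sin` is accurate enough to serve as the reference value; the variable names are introduced here for illustration.

```python
import math
from fractions import Fraction

# P_5(1) = 1 - 1/6 + 1/120, computed exactly.
p5_at_1 = Fraction(1) - Fraction(1, 6) + Fraction(1, 120)

bound = Fraction(1, 720)  # error bound from Theorem 4.8.8 with M = 1
actual_error = abs(math.sin(1.0) - float(p5_at_1))

print("P_5(1)       =", p5_at_1, "=", float(p5_at_1))
print("sin(1)       =", math.sin(1.0))
print("actual error =", actual_error)
print("error bound  =", float(bound))
```

The actual error is roughly a seventh of the bound 1/720.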

Example 4.8.12. Consider the function $f(x) = e^x$, x ∈ R. For a fixed x ∈ [−1, 1], we
find an upper bound for the error E_n(x) in the approximation f(x) ≈ P_n(x), where P_n is the
nth Taylor polynomial for f about a = 0 and n is an arbitrary natural number.

Solution. We have $f^{(n)}(x) = e^x$ for all n ∈ N and x ∈ R. Therefore
\[ |f^{(n)}(x)| = e^x \le e \quad\text{for all } n \in \mathbb{N} \text{ and } x \in [-1, 1]. \]
Applying Theorem 4.8.8 with a = 0 and M = e, we find
\[ |E_n(x)| \le \frac{e|x|^{n+1}}{(n+1)!} \quad\text{for all } n \in \mathbb{N} \text{ and } x \in [-1, 1]. \]
For a fixed x ∈ [−1, 1], what do you think happens to E_n(x) as n tends to +∞? What
happens to P_n(x)?

When using Taylor polynomials to approximate the value f (x) of a function f , it is useful
to know beforehand what the degree of the Taylor polynomial should be in order to obtain
a desired level of accuracy. This can be done using Theorem 4.8.8, as we now demonstrate.
Example 4.8.13. Using a Taylor polynomial P_n for f(x) = sin x about x = 0, we wish to
approximate sin(π/4) with an error of less than $10^{-3}$. We determine how large the degree n
of the Taylor polynomial should be in order to achieve this level of accuracy.

Solution. According to Theorem 4.8.8,
\[ |E_n(\pi/4)| \le \frac{M}{(n+1)!} |\pi/4|^{n+1}, \quad n \in \mathbb{N}, \]
where M is a constant so that $|f^{(n+1)}(t)| \le M$ for all t between 0 and π/4. Since $f^{(n+1)}(x) =
\pm\sin x$ or $f^{(n+1)}(x) = \pm\cos x$, it follows that $|f^{(n+1)}(t)| \le 1$ for all t ∈ [0, π/4] and n ∈ N.
Therefore
\[ |E_n(\pi/4)| \le \frac{1}{(n+1)!} |\pi/4|^{n+1} \quad\text{for all } n \in \mathbb{N}. \]
In particular, since π < 4 we have
\[ |E_n(\pi/4)| < \frac{1}{(n+1)!} \quad\text{for all } n \in \mathbb{N}. \]
Therefore
\[ \text{if } \frac{1}{(n+1)!} \le 10^{-3} \text{ then } |E_n(\pi/4)| < 10^{-3}. \]
That is,
\[ \text{if } (n+1)! \ge 10^3 \text{ then } |E_n(\pi/4)| < 10^{-3}. \]
Now consider the following table.

n n!
1 1
2 2
3 6
4 24
5 120
6 720
7 5040

Since n! is increasing in n we therefore have
\[ (n+1)! \ge 10^3 \quad\text{if and only if}\quad n \ge 6. \]
Therefore a Taylor polynomial of degree 6 about x = 0 will result in an approximation
sin(π/4) ≈ P_6(π/4) with error less than $10^{-3}$.
Remark 4.8.14. The lower bound on the degree of the Taylor polynomial found in Example
4.8.13 is not the best possible bound. In fact, you can check that $|E_5(\pi/4)| < 10^{-3}$. Therefore
a Taylor polynomial of degree only 5 is sufficient to obtain an error of less than $10^{-3}$. This
is due to the fact that, in general, the bound for the error
\[ |E_n(x)| \le \frac{M|x-a|^{n+1}}{(n+1)!} \]
is not necessarily the best possible bound; that is, the actual error may be smaller than the
upper bound $M|x-a|^{n+1}/(n+1)!$.
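
Both the degree found in Example 4.8.13 and the observation in Remark 4.8.14 can be checked by direct computation. The sketch below is illustrative only: `taylor_sin` is a hypothetical helper that sums the odd-power terms of the Taylor polynomial of sin about 0, and `math.sin` is used as the reference value.

```python
import math


def taylor_sin(x, n):
    """Taylor polynomial of degree at most n for sin about 0, evaluated at x."""
    return sum((-1) ** (k // 2) * x ** k / math.factorial(k)
               for k in range(1, n + 1, 2))  # only odd powers are non-zero


# Smallest n with (n + 1)! >= 10^3, as in Example 4.8.13.
n = 1
while math.factorial(n + 1) < 1000:
    n += 1
smallest_n = n

x = math.pi / 4
errors = {degree: abs(math.sin(x) - taylor_sin(x, degree)) for degree in range(1, 7)}

print("smallest n with (n + 1)! >= 1000:", smallest_n)
for degree, error in errors.items():
    print(f"n = {degree}: actual error = {error:.3e}")
```

The errors for n = 5 and n = 6 coincide because the degree 6 term of the Taylor polynomial of sin about 0 is zero, so P_5 and P_6 are the same polynomial.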

Taylor polynomials are particularly useful when dealing with a function which is defined as
an integral, as in Section 4.3. This is demonstrated in the following example.
Example 4.8.15. Consider the function F defined by
Z x
t2
F (x) = e− 2 dt, x ∈ R.
0

We use the 5th Taylor polynomial for F about x = 0 to approximate F ( 21 ) and F (1). We
find upper bounds for the errors E5 ( 12 ) and E5 (1).

246
Solution. We use the Fundamental Theorem of Calculus, Theorems 4.3.2 and 4.3.4, to find that
\[ F'(x) = e^{-x^2/2}, \quad F''(x) = -xe^{-x^2/2}, \quad F'''(x) = -e^{-x^2/2} + x^2e^{-x^2/2}, \quad F^{(4)}(x) = 3xe^{-x^2/2} - x^3e^{-x^2/2} \]
and
\[ F^{(5)}(x) = 3e^{-x^2/2} - 6x^2e^{-x^2/2} + x^4e^{-x^2/2} \]
for every x ∈ R. Therefore
\[ F(0) = 0, \quad F'(0) = 1, \quad F''(0) = 0, \quad F'''(0) = -1, \quad F^{(4)}(0) = 0 \quad \text{and} \quad F^{(5)}(0) = 3. \]
We now have
\[ P_5(x) = x - \frac{x^3}{6} + \frac{x^5}{40}, \quad x \in \mathbb{R}. \]
Hence
\[ F(\tfrac12) \approx P_5(\tfrac12) = \frac12 - \frac{1}{48} + \frac{1}{1280} = \frac{1843}{3840} = 0.4799479166\ldots \]
and
\[ F(1) \approx P_5(1) = 1 - \frac16 + \frac{1}{40} = \frac{103}{120} = 0.85833\ldots \]
We now use Theorem 4.8.8 to find upper bounds for the errors $E_5(\tfrac12)$ and $E_5(1)$. We have
\[ F^{(6)}(x) = -15xe^{-x^2/2} + 10x^3e^{-x^2/2} - x^5e^{-x^2/2} = xe^{-x^2/2}\left(-x^4 + 10x^2 - 15\right). \]

We start by finding a bound for $E_5(1)$. In order to use Theorem 4.8.8, we must find a real number $M_1 \ge 0$ so that
\[ |F^{(6)}(x)| \le M_1 \quad \text{for all } x \in [0, 1]. \]
Note that
\[ -\frac12 \le -\frac{x^2}{2} \le 0, \quad x \in [0, 1] \]
so that
\[ e^{-1/2} \le e^{-x^2/2} \le 1, \quad x \in [0, 1]. \]
Therefore
\[ |xe^{-x^2/2}| = |x|e^{-x^2/2} \le 1, \quad x \in [0, 1]. \tag{4.26} \]
Note further that if x ∈ [0, 1] then
\[ \frac{d}{dx}\left(-x^4 + 10x^2 - 15\right) = 0 \quad \text{if and only if} \quad x = 0. \]
Therefore the function $p(x) = -x^4 + 10x^2 - 15$ attains its global minimum and maximum values on [0, 1] at one or more of the points x = 0 or x = 1. Therefore
\[ |-x^4 + 10x^2 - 15| \le \max\{|p(0)|, |p(1)|\} = 15, \quad x \in [0, 1]. \tag{4.27} \]

It now follows from (4.26) and (4.27) that
\[ |F^{(6)}(x)| \le 15, \quad x \in [0, 1]. \]
We now apply Theorem 4.8.8 with n = 5 and $M_1 = 15$ and find that
\[ |E_5(1)| \le \frac{15}{6!} = \frac{1}{48} \approx 0.0208. \]
In exactly the same way, we find that
\[ |E_5(\tfrac12)| \le \frac{1}{48 \times 2^6} = \frac{1}{3072} \approx 0.000326. \]
Therefore the approximation F(1/2) ≈ $P_5(\tfrac12)$ is accurate to at least three decimal places, while the approximation F(1) ≈ $P_5(1)$ is accurate at least in the first decimal place.
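The two error bounds can be compared with the actual errors. The sketch below is an illustration only and rests on one assumption that is not in the notes: it evaluates F through the standard identity F(x) = √(π/2) · erf(x/√2), using the error function from Python's `math` module. The function names are hypothetical.

```python
import math

def F(x):
    """F(x) = integral of exp(-t^2/2) from 0 to x, written via the error function."""
    return math.sqrt(math.pi / 2) * math.erf(x / math.sqrt(2))

def P5(x):
    """The 5th Taylor polynomial of F about 0 found in the example."""
    return x - x**3 / 6 + x**5 / 40

def error_bound(x, M=15, n=5):
    """The bound M |x|^(n+1) / (n+1)! of Theorem 4.8.8 with a = 0."""
    return M * abs(x) ** (n + 1) / math.factorial(n + 1)

for x in (0.5, 1.0):
    actual = abs(F(x) - P5(x))
    print(x, P5(x), actual, error_bound(x))
```

In both cases the actual error is well below the bound, which agrees with the remark that the bound of Theorem 4.8.8 need not be the best possible.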
Remark 4.8.16. The function F discussed in Example 4.8.15 is important in statistics. Indeed, the normal distribution with mean µ and standard deviation σ has probability density function
\[ f(t; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(t-\mu)^2}{2\sigma^2}}, \quad t \in \mathbb{R}. \]
In particular, if µ = 0 and σ = 1 we have the standard normal distribution with probability density function
\[ f(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}, \quad t \in \mathbb{R}. \]
If X is a random variable that is normally distributed with mean µ = 0 and standard deviation σ = 1, then for any x ≥ 0,
\[ P(0 \le X \le x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-t^2/2}\,dt. \]

It is not always the case that the Taylor polynomials of a function give a good approximation
for the function. Consider, for instance, the following example.
Example 4.8.17. Consider the function
\[ f(x) = \begin{cases} e^{-x^{-2}} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \]
It can be shown that $f^{(n)}$ exists and is continuous on R for every n ∈ N. In particular, $f^{(n)}(0) = 0$ for every n ∈ N. We prove this fact for the case n = 1. If x ≠ 0, then
\[ f'(x) = \frac{2e^{-x^{-2}}}{x^3}. \]
When x = 0, we have
\[ \frac{f(h) - f(0)}{h} = \frac{e^{-h^{-2}}}{h}, \quad h \ne 0. \]
Using l'Hospital's Rule, we find
\[ \lim_{h\to 0} \frac{f(h) - f(0)}{h} = \lim_{h\to 0} \frac{h^{-1}}{e^{h^{-2}}} = \lim_{h\to 0} \frac{-h^{-2}}{-2h^{-3}e^{h^{-2}}} = \lim_{h\to 0} \frac{h}{2e^{h^{-2}}} = 0. \]
Therefore f′(0) exists, and f′(0) = 0. Again using l'Hospital's Rule, we find that
\[ \lim_{x\to 0} f'(x) = 0 = f'(0) \]
so that f′ is continuous at 0. Clearly, f′ is continuous at every x ≠ 0, so that it is continuous on R.

Since f(0) = 0 and $f^{(k)}(0) = 0$ for every k ∈ N, it follows that $P_n(x) = 0$ for every n ∈ N and x ∈ R. Therefore
\[ |E_n(x)| = |f(x) - P_n(x)| = e^{-x^{-2}} \quad \text{for all } n \in \mathbb{N} \text{ and } x \ne 0. \]
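A short numerical illustration of this conclusion follows. It is only a sketch with hypothetical function names: since every Taylor polynomial of f about 0 is the zero polynomial, the error of the approximation is simply the value of f, regardless of the degree n.

```python
import math

def f(x):
    """The function of the example: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

def taylor_error(x):
    """|E_n(x)| for any n: every Taylor polynomial P_n about 0 is identically 0."""
    return abs(f(x) - 0.0)

for x in (0.1, 0.5, 1.0, 2.0):
    print(x, taylor_error(x))
```

Close to 0 the error is tiny, but at x = 1 and x = 2 it is large, and increasing the degree n does not reduce it.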

Motivated by the problem of approximating the values of a function f from R to R, we introduced the Taylor polynomials $P_n$ for f about a point a. Based on an explicit expression for the error $E_n(x) = f(x) - P_n(x)$ in the approximation $f(x) \approx P_n(x)$, we derived an upper bound for the error. We also showed, by means of an example, that the Taylor polynomials do not necessarily give a good approximation for the function.

Exercise 4.8

1. Find the nth Taylor polynomial $P_n$ for f about x = a. Use $P_n$ to approximate f(b). Compare your approximation to the value of f(b) you get when using a calculator.
(a) f(x) = cos x, x ∈ R, n = 6, a = 0 and b = 2.
(b) $f(x) = x^x$, x > 0, n = 3, a = 1 and b = 2.
(c) $f(x) = \frac{1}{x+1}$, x ≠ −1, n = 5, a = 1 and b = 3.
(d) f(x) = arcsin x, −1 ≤ x ≤ 1, n = 3, a = 0 and b = 1/2.
2. Consider the function f(x) = x sin x, x ∈ R.
(a) Find the 4th Taylor polynomial for f(x) about x = 0.
(b) Show that
\[ |E_4(\tfrac12)| \le \frac{1}{2^6 \times 10}. \]
(c) Use $P_4$ to approximate sin(1/2). Based on your answer in (b), how accurate is your approximation?

3. Consider the function $f(x) = \sqrt{x}$, x ≥ 0.
(a) Find the 5th Taylor polynomial for f(x) about x = 4.
(b) Show that
\[ |E_5(2)| \le \frac{945}{2^{11/2} \times 6!}. \]
(c) Use $P_5$ to approximate $\sqrt{2}$. Based on your answer in (b), how accurate is your approximation?

4. Consider the function F : R → R given by
\[ F(x) = \int_0^x \sqrt{1 + \sin^2 t}\,dt, \quad x \in \mathbb{R}. \]
(a) Write down the 3rd Taylor polynomial for F about x = 0.
(b) Prove that
\[ |E_3(\tfrac12)| \le \frac{25}{192 \times 2^4}. \]
(c) Use $P_3$ to approximate
\[ \int_0^{1/2} \sqrt{1 + \sin^2 t}\,dt. \]
Use your answer in (b) to determine how accurate your answer is.

5. Let f(x) = ln(x + 1), x > −1.
(a) Use the Principle of Mathematical Induction to prove that
\[ f^{(n)}(x) = \frac{(-1)^{n+1}(n-1)!}{(x+1)^n} \]
for all n ∈ N.
(b) Let $P_n(x)$ be the nth Taylor polynomial for f(x) about x = 0. Write down an expression for $P_n(x)$, x > −1.
(c) Prove that $|E_n(-\tfrac12)| \le \frac{1}{n+1}$ for all n ∈ N.
(d) How large should n be so that $|E_n(-\tfrac12)| \le 10^{-6}$?

6. Let $f(x) = \frac{1}{1+x}$, x ≠ −1.
(a) Use the Principle of Mathematical Induction to prove that
\[ f^{(n)}(x) = \frac{(-1)^n\, n!}{(x+1)^{n+1}} \]
for all n ∈ N.
(b) Let $P_n(x)$ be the nth Taylor polynomial for f(x) about x = 0. Write down an expression for $P_n(x)$, x ∈ R.
(c) Prove that $|E_n(1)| \le 1$ and $|E_n(\tfrac12)| \le \frac{1}{2^{n+1}}$ for all n ∈ N.
(d) How large should n be so that $|E_n(\tfrac12)| \le 10^{-6}$?
(e) What do you think happens to $P_n(\tfrac12)$ as n becomes large? Based on this observation, conjecture a value for the infinite sum $\sum_{k=0}^{\infty} \frac{(-1)^k}{2^k}$.
(f) Do you think that $P_n(1)$ is ever a good approximation for f(1)? Calculate $P_n(1)$ for a few values of n, and see what happens.

7. Consider the proof of Theorem 4.8.5. Suppose that I is an open interval containing the real number a, and f a function defined on I. If f′′ exists and is continuous on I, show that
\[ E_1(x) = \int_a^x f''(t)(x - t)\,dt, \quad x \in I. \]

8. Let I be an open interval containing the real number a, and f a function defined on I such that $f^{(n+1)}$ exists and is continuous on I. Prove the following. For every x ∈ I there exists a real number $s_x$ between x and a so that
\[ E_n(x) = \frac{f^{(n+1)}(s_x)}{(n+1)!}(x - a)^{n+1}. \]
[Hint: For a fixed x ∈ I, apply the Weighted Mean Value Theorem for the integral, see Exercise 4.4, to the expression for $E_n(x)$ given in Theorem 4.8.5.]

9. Let I be an open interval containing the real number a, and f a function defined on I such that $f^{(n+1)}$ exists and is continuous on I. Prove the following. For every x ∈ I there exists a real number $u_x$ between x and a so that
\[ E_n(x) = \frac{f^{(n+1)}(u_x)}{n!}(x - u_x)^n (x - a). \]
[Hint: For a fixed x ∈ I, apply the Mean Value Theorem for the integral, Theorem 4.4.4, to the expression for $E_n(x)$ given in Theorem 4.8.5.]

10. The aim of this exercise is to complete the proof of Theorem 4.8.8. Let a, I, f, n and M be as given in the theorem.
(a) If x = a, show that $|E_n(x)| = 0 = \frac{M}{(n+1)!}|x - a|^{n+1}$.
(b) Suppose that x < a.
(i) Show that $(-1)^n (x - t)^n \ge 0$ for all x ≤ t ≤ a.
(ii) Use (i) to show that
\[ -\frac{M}{(n+1)!}(-1)^{n+1}(x - a)^{n+1} \le (-1)^{n+1}E_n(x) \le \frac{M}{(n+1)!}(-1)^{n+1}(x - a)^{n+1}. \]
(iii) Now show that $|E_n(x)| \le \frac{M}{(n+1)!}|x - a|^{n+1}$.

Chapter 5

Curves in R² and R³

Curves in the plane R², and in three-dimensional space R³, occur naturally as graphical representations of physical and mathematical processes. Consider, for instance, a particle moving continuously in the plane. At each time t ≥ 0, we record the position of the particle with respect to a reference frame (x-axis and y-axis) to obtain a point r̄(t) in R². Plotting the points r̄(t) for all t ≥ 0, we obtain a curve in R².

It is often useful to have an exact analytical description for a curve, such as a formula. For instance, if a curve is the graph of a function f : R → R, then we use the formula y = f(x) to describe the points on the curve; it is often possible to discover properties of the curve by investigating the function f. However, not all curves are graphs of functions. Indeed, the curve in the sketch above is certainly not the graph of a function. In fact, most curves in R² are not graphs of functions! In this chapter, we introduce an analytical description of curves in R² (and R³) which includes many curves that are not graphs of functions.

5.1 Vector Functions

Definition 5.1.1. A three-dimensional vector function is a function r̄ : I → R³, where I is a possibly unbounded interval in R. A two-dimensional vector function is a function r̄ : I → R².

Remark 5.1.2. All the definitions and results we discuss in this section and the next one
hold for three-dimensional as well as two-dimensional vector functions. The proofs of these
results are virtually identical in the two cases. We will therefore state all definitions and
results for three-dimensional vector functions, with the understanding that these are also
applicable to two-dimensional vector functions. It should be noted that the theory as well as
the applications of vector functions are not restricted to two and three dimensions.

Remark 5.1.3. Let r̄ : I → R³ be a vector function. For every t ∈ I, r̄(t) is a vector in R³; that is, r̄(t) = ⟨x(t), y(t), z(t)⟩. We may therefore associate with the vector function r̄ three real-valued functions

x : I → R, y : I → R, z : I → R.

These are called the component functions of the vector function r̄.

Example 5.1.4. Consider the vector function r̄(t) = ⟨t² − t, t − 1, 1 − t⟩, t ∈ [1, ∞). The component functions of r̄ are

x(t) = t² − t, y(t) = t − 1, z(t) = 1 − t, t ∈ [1, ∞).

Some vector functions can be used to describe ‘curves’ in R² and R³. Given a vector function r̄ : I → R³, its range
C = {r̄(t) : t ∈ I}
is a subset of R³. For some vector functions, this set can be thought of as a curve in R³. In this case, we call C the curve parameterised by r̄, and r̄ is a parameterisation of C. In the same way, a two-dimensional vector function may describe a ‘curve’ in R². We illustrate the idea of a vector function describing a ‘curve’ by means of some examples.

Example 5.1.5. Let r̄(t) = ⟨t² + t, t − 1⟩, t ∈ R. We show that a point x̄ = ⟨x, y⟩ in R² is in the range
C = {r̄(t) : t ∈ R}
of r̄ if and only if x = y² + 3y + 2. We also sketch the curve C.

Solution. A point x̄ = ⟨x, y⟩ is in C if and only if x̄ = r̄(t) for some t ∈ R. Therefore

⟨x, y⟩ ∈ C if and only if x = t² + t and y = t − 1 for some t ∈ R.

Hence

⟨x, y⟩ ∈ C if and only if x = t² + t and y + 1 = t for some t ∈ R,

so that

⟨x, y⟩ ∈ C if and only if x = (y + 1)² + y + 1 = y² + 3y + 2, y ∈ R.

Therefore C is the parabola with equation x = y² + 3y + 2.
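Both directions of this equivalence can be spot-checked numerically. The following sketch is an illustration with hypothetical function names: it samples points r̄(t) and tests the Cartesian equation, and it recovers the parameter from a point on the parabola through t = y + 1, as in the solution.

```python
def r(t):
    """The vector function of Example 5.1.5."""
    return (t * t + t, t - 1)

def on_parabola(point, tol=1e-9):
    """Check the Cartesian equation x = y^2 + 3y + 2 up to a rounding tolerance."""
    x, y = point
    return abs(x - (y * y + 3 * y + 2)) <= tol

def parameter_of(y):
    """Parameter value t = y + 1 that is mapped to the point with second coordinate y."""
    return y + 1

# Every sampled point of the range satisfies the equation.
samples = [k / 10 for k in range(-50, 51)]
print(all(on_parabola(r(t)) for t in samples))

# Conversely, a point on the parabola is reached at t = y + 1.
y = -1.5
x = y * y + 3 * y + 2
print((x, y), r(parameter_of(y)))
```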

[Figure: sketch of the parabola C with equation x = y² + 3y + 2.]

Example 5.1.6. Consider the vector function r̄(t) = ⟨1/(t + 1), t² − 1⟩, t ∈ [−1/2, 2]. We show that a point ⟨x, y⟩ in R² is in the range
C = {r̄(t) : t ∈ [−1/2, 2]}
of r̄ if and only if
\[ y = \frac{1}{x^2} - \frac{2}{x}, \quad \tfrac13 \le x \le 2. \]
We also sketch the curve C.

Solution. Let x̄ = ⟨x, y⟩ be a point in R². Then x̄ ∈ C if and only if x̄ = r̄(t) for some t ∈ [−1/2, 2]. Hence
\[ \bar{x} \in C \text{ if and only if } x = \frac{1}{t+1} \text{ and } y = t^2 - 1 \text{ for some } t \in [-\tfrac12, 2]. \]
Expressing t in terms of x, we have
\[ \bar{x} \in C \text{ if and only if } t = \frac{1}{x} - 1 \text{ and } y = t^2 - 1 \text{ for some } t \in [-\tfrac12, 2]. \]
Note that −1/2 ≤ t ≤ 2 if and only if 1/3 ≤ x ≤ 2. Substituting t = 1/x − 1 into y = t² − 1 gives y = (1/x − 1)² − 1 = 1/x² − 2/x. Hence
\[ \bar{x} \in C \text{ if and only if } y = \frac{1}{x^2} - \frac{2}{x}, \quad \tfrac13 \le x \le 2. \]
Therefore C is the curve with equation
\[ y = \frac{1}{x^2} - \frac{2}{x}, \quad \tfrac13 \le x \le 2. \]
We now sketch the curve C. Firstly, we have

if x = 1/3 then y = 3

and

if x = 2 then y = −3/4.

Next we find the x-intercept of C. We have
\[ 0 = y = \frac{1}{x^2} - \frac{2}{x} \text{ if and only if } 1 - 2x = 0 \text{ so that } x = \tfrac12. \]
Now we determine the turning points of C. For 1/3 ≤ x ≤ 2 we have
\[ \frac{dy}{dx} = \frac{2}{x^2} - \frac{2}{x^3} = \frac{2x - 2}{x^3} = 0 \text{ if and only if } x = 1. \]
Furthermore,
\[ \frac{dy}{dx} < 0 \text{ if } \tfrac13 \le x < 1 \quad \text{and} \quad \frac{dy}{dx} > 0 \text{ if } 1 < x \le 2. \]
Therefore the function given by y = 1/x² − 2/x has a local minimum at x = 1. The local minimum value is y = −1.

Lastly, we have
\[ \frac{d^2y}{dx^2} = \frac{6}{x^4} - \frac{4}{x^3} = \frac{6 - 4x}{x^4}. \]
Therefore
\[ \frac{d^2y}{dx^2} > 0 \text{ if } \tfrac13 \le x < \tfrac32 \quad \text{and} \quad \frac{d^2y}{dx^2} < 0 \text{ if } \tfrac32 < x \le 2. \]
Therefore the curve C is concave upward on the interval [1/3, 3/2) and concave downward on the interval (3/2, 2].

The curve C is shown in the sketch below.

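[Figure: sketch of the curve C from the point ⟨1/3, 3⟩ to the point ⟨2, −3/4⟩, with x-intercept at x = 1/2 and local minimum at ⟨1, −1⟩.]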

Example 5.1.7. Let r̄(t) = (1 − t)ā + tb̄, t ∈ R, where ā and b̄ are fixed points in R3 so
that ā ̸= b̄. The range of r̄,
C = {r̄(t) : t ∈ R} = {(1 − t)ā + tb̄ : t ∈ R},
is the straight line through ā and b̄.

Example 5.1.8. Consider the vector function r̄(t) = ⟨c cos t, c sin t⟩, t ∈ [0, 2π], where c > 0 is a constant real number. Let

C = {r̄(t) : t ∈ [0, 2π]}

be the range of r̄. We show that a point x̄ = ⟨x, y⟩ is on C if and only if

x² + y² = c².

That is, C is the circle with centre 0̄ and radius c.

Solution. Let x̄ = ⟨x, y⟩ be a point on C. Then there exists a real number t ∈ [0, 2π] such that
x = c cos t and y = c sin t.
We therefore have
x² + y² = c²(cos² t + sin² t) = c².
Consider a point x̄ = ⟨x, y⟩ in R² such that

x² + y² = c².

We show that x̄ ∈ C. There are two cases to consider. Suppose that y ≥ 0, and let t ∈ [0, π] be the magnitude of the angle between the positive x-axis and the ray with origin 0̄ in the direction of x̄, see the sketch below.

[Figure: the point x̄ in the upper half-plane, the vector ⟨1, 0⟩ on the x-axis, and the angle t between them.]

By the definition of the magnitude of an angle,
\[ \cos t = \frac{\bar{x} \cdot \langle 1, 0\rangle}{\|\bar{x}\|} = \frac{x}{\|\bar{x}\|} \]
so that
x = ∥x̄∥ cos t = c cos t.
Since x² + y² = c² and y ≥ 0 it follows that
\[ y = \sqrt{c^2 - x^2} = c\sqrt{1 - \cos^2 t} = c|\sin t|. \]
But t ∈ [0, π] so that sin t ≥ 0. Therefore y = c sin t so that x̄ = r̄(t) ∈ C.

Now suppose that y < 0. Let α be the magnitude of the angle between the negative x-axis and the ray with origin 0̄ in the direction of x̄, and let t = π + α. Then t ∈ (π, 2π) because 0 < α < π and, by the definition of the magnitude of an angle,
\[ \cos\alpha = \frac{\bar{x} \cdot \langle -1, 0\rangle}{\|\bar{x}\|} = \frac{-x}{\|\bar{x}\|}. \]
Therefore
x = −∥x̄∥ cos α = −c cos(t − π) = c cos t.
Because y < 0 and x² + y² = c² it follows that
\[ y = -\sqrt{c^2 - x^2} = -c\sqrt{1 - \cos^2 t} = -c|\sin t|. \]
But t ∈ (π, 2π) so that sin t < 0. Therefore |sin t| = −sin t so that y = c sin t. Hence x̄ = r̄(t) ∈ C.

We have shown that x̄ ∈ C if and only if x² + y² = c². Hence C is the circle with centre 0̄ and radius c; that is
C = {⟨x, y⟩ : x² + y² = c²}.

[Figure: the circle C with centre 0̄ and radius c, crossing the axes at ±c.]

Remark 5.1.9. Consider the vector function r̄(t) = ⟨c cos t, c sin t⟩, t ∈ [0, 2π], with c > 0 a constant. As shown in Example 5.1.8, the range of r̄ is the circle with centre 0̄ and radius c. This circle has equation
x² + y² = c².
We note the following.

(1) As t increases from 0 to 2π, the point r̄(t) moves around the circle in the anti-clockwise
direction, starting and ending at r̄(0) = ⟨c, 0⟩ = r̄(2π), see the sketch below.

[Figure: the circle C with the points r̄(0) = r̄(2π), r̄(π/2), r̄(π) and r̄(3π/2) marked.]

(2) By restricting the domain of r̄, we obtain different parts of the circle C. For instance, the set
{r̄(t) : t ∈ [0, π/2]}
is that part of the circle with centre 0̄ and radius c that lies in the first quadrant, see the figure below.

[Figure: the quarter circle from r̄(0) to r̄(π/2) in the first quadrant.]

(3) If p̄ = ⟨p1, p2⟩ is a fixed point in R², then the range S of the vector function

ū(t) = r̄(t) + p̄ = ⟨c cos t + p1, c sin t + p2⟩, t ∈ [0, 2π]

is the circle with centre p̄ and radius c. That is,

S = {ū(t) : t ∈ [0, 2π]} = {⟨x, y⟩ : (x − p1)² + (y − p2)² = c²},

see Exercise 5.1 number 3.
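The points of this remark can be explored numerically. The sketch below is illustrative only: the function names and the sample values c = 2, p̄ = ⟨1, −3⟩ are assumptions made for the example. It checks that points ū(t) lie at distance c from p̄, and it recovers a parameter value from a point on the circle using the two-argument arctangent `math.atan2`, which is a computational shortcut and not the geometric argument of Example 5.1.8.

```python
import math

def u(t, c, p):
    """Point on the circle with centre p and radius c at parameter t."""
    return (c * math.cos(t) + p[0], c * math.sin(t) + p[1])

def parameter_of(point, p):
    """Return t in [0, 2*pi) with u(t) = point, assuming the point lies on the circle."""
    return math.atan2(point[1] - p[1], point[0] - p[0]) % (2 * math.pi)

c, p = 2.0, (1.0, -3.0)  # hypothetical sample radius and centre
for t in (0.3, 1.7, 3.0, 4.5, 6.0):
    x, y = u(t, c, p)
    distance = math.hypot(x - p[0], y - p[1])
    print(t, distance, parameter_of((x, y), p))
```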

Example 5.1.10. Consider the vector function r̄(t) = ⟨2 cos t, 2 sin t, 3⟩, t ∈ [0, 2π], and its range
C = {r̄(t) : t ∈ [0, 2π]}.
We show that C is the circle in the plane with equation z = 3, with centre c̄ = ⟨0, 0, 3⟩ and radius 2.

Solution. Let S be the circle in the xy-plane with centre 0̄ and radius 2. The vector function
ū(t) = ⟨2 cos t, 2 sin t, 0⟩, t ∈ [0, 2π]
has range S, see Example 5.1.8. Hence
S = {ū(t) : t ∈ [0, 2π]}.
A point x̄ ∈ R³ is on C if and only if
x̄ = r̄(t) = ⟨2 cos t, 2 sin t, 3⟩ = ⟨2 cos t, 2 sin t, 0⟩ + ⟨0, 0, 3⟩ = ū(t) + ⟨0, 0, 3⟩
for some t ∈ [0, 2π]. Therefore
x̄ ∈ C if and only if x̄ = ȳ + ⟨0, 0, 3⟩ for some ȳ ∈ S.
Hence
x̄ = ⟨x, y, z⟩ ∈ C if and only if ⟨x, y⟩ ∈ S and z = 3.
Then
x̄ = ⟨x, y, z⟩ ∈ C if and only if x² + y² = 4 and z = 3.
Hence C is the circle in the plane with equation z = 3, with centre c̄ = ⟨0, 0, 3⟩ and radius 2.

[Figure: the circle S in the xy-plane with centre 0̄ and radius 2.]

Example 5.1.11. Consider the vector function r̄(t) = ⟨a cos t, b sin t⟩, t ∈ [0, 2π], where a, b > 0 are constant real numbers. We show that a point x̄ = ⟨x, y⟩ in R² is in the range
C = {r̄(t) : t ∈ [0, 2π]}
of r̄ if and only if
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
We also sketch the curve C.

Solution. Consider a point x̄ = ⟨x, y⟩ ∈ C. Then x̄ = r̄(t) for some t ∈ [0, 2π] so that
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{a^2\cos^2 t}{a^2} + \frac{b^2\sin^2 t}{b^2} = \cos^2 t + \sin^2 t = 1. \]
Let x̄ = ⟨x, y⟩ be a point in R² so that
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
Let ⟨u, v⟩ = ⟨x/a, y/b⟩. Then
\[ u^2 + v^2 = \left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
According to Example 5.1.8 (with c = 1), there exists a real number t ∈ [0, 2π] so that ⟨u, v⟩ = ⟨cos t, sin t⟩. Then
\[ \frac{x}{a} = u = \cos t \text{ so that } x = a\cos t \]
and
\[ \frac{y}{b} = v = \sin t \text{ so that } y = b\sin t. \]
Therefore x̄ = ⟨x, y⟩ = ⟨a cos t, b sin t⟩ = r̄(t) so that x̄ ∈ C.

We now sketch the curve C. Note that
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \]
if and only if
\[ y = \frac{b\sqrt{a^2 - x^2}}{a} \quad \text{or} \quad y = -\frac{b\sqrt{a^2 - x^2}}{a}, \quad -a \le x \le a. \]
We first consider that part C1 of the curve C given by
\[ y = \frac{b\sqrt{a^2 - x^2}}{a}, \quad -a \le x \le a. \]
For −a < x < a we have
\[ \frac{dy}{dx} = -\frac{bx}{a\sqrt{a^2 - x^2}} = 0 \text{ if and only if } x = 0 \]
and
\[ \frac{d^2y}{dx^2} = -\frac{b}{a(a^2 - x^2)^{1/2}} - \frac{bx^2}{a(a^2 - x^2)^{3/2}} = -\frac{b}{a(a^2 - x^2)^{3/2}}\left((a^2 - x^2) + x^2\right) = -\frac{ab}{(a^2 - x^2)^{3/2}} < 0. \]
Therefore C1 has a turning point at x = 0 and is concave down on the interval [−a, a].

We also have
\[ \lim_{x\to a^-} \frac{dy}{dx} = \lim_{x\to a^-} -\frac{bx}{a\sqrt{a^2 - x^2}} = \lim_{x\to a^-} -\frac{b}{a\sqrt{a^2/x^2 - 1}} = -\infty \]
and, in the same way,
\[ \lim_{x\to -a^+} \frac{dy}{dx} = \infty. \]
Since $y = \frac{b\sqrt{a^2 - x^2}}{a}$ defines a continuous function on the interval [−a, a], we see that C1 has vertical tangents at x = −a and at x = a.

The sketch below shows the curve C1 in the case a > b.

[Figure: the curve C1 from x = −a to x = a, with maximum value b at x = 0.]

Now consider the part C2 of the curve C given by
\[ y = -\frac{b\sqrt{a^2 - x^2}}{a}, \quad -a \le x \le a. \]
Note that C2 is the reflection of C1 in the x-axis; that is,

⟨x, y⟩ ∈ C1 if and only if ⟨x, −y⟩ ∈ C2.

We therefore arrive at the following sketch of the curve C.

[Figure: sketches of the curve C in the case a > b and in the case a < b, crossing the axes at x = ±a and y = ±b.]

Remark 5.1.12 (Ellipse). Consider the vector function r̄(t) = ⟨a cos t, b sin t⟩, t ∈ [0, 2π], where a, b > 0 are constants. In Example 5.1.11 it is shown that the range C of r̄ is the curve with equation
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
This curve is known as an ellipse. We note the following.

(1) You can think of an ellipse as a circle which has been stretched out along either the x-axis or the y-axis. Indeed, the ellipse fits exactly between the circles with centre 0̄ and radii a and b, respectively, see the sketch below.

[Figure: the ellipse C with a point x̄ on C, a point x̄′ and the angle t marked.]

(2) As t increases from 0 to 2π, the point r̄(t) moves around the ellipse in the anti-clockwise
direction, starting and ending at r̄(0) = ⟨a, 0⟩ = r̄(2π), see the sketch below.

[Figure: the ellipse C with the points r̄(0) = r̄(2π), r̄(π/2), r̄(π) and r̄(3π/2) marked.]

(3) The ellipse C is symmetric about the x-axis, meaning

⟨x, y⟩ ∈ C if and only if ⟨x, −y⟩ ∈ C,

and about the y-axis so that

⟨x, y⟩ ∈ C if and only if ⟨−x, y⟩ ∈ C.

(4) The ellipse C with equation
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \]
can be described in the following way. Assume that a > b, and let
\[ c = \sqrt{a^2 - b^2}. \]
It can be shown that
C = {x̄ ∈ R² : ∥x̄ − ⟨c, 0⟩∥ + ∥x̄ − ⟨−c, 0⟩∥ = 2a},
see the figure below.

[Figure: the ellipse C with the points ⟨−a, 0⟩, ⟨−c, 0⟩, ⟨c, 0⟩ and ⟨a, 0⟩ on the x-axis, and a point x̄ on C at distances L1 and L2 from the points ⟨−c, 0⟩ and ⟨c, 0⟩, so that L1 + L2 = ∥x̄ − ⟨−c, 0⟩∥ + ∥x̄ − ⟨c, 0⟩∥ = 2a.]

The points ⟨c, 0⟩ and ⟨−c, 0⟩ are called the foci of the ellipse C, and we call the points ⟨a, 0⟩ and ⟨−a, 0⟩ the vertices of C; the vertices are the two points on C that are furthest apart. The x-axis is the major axis of C.

(5) If b > a, we set $c = \sqrt{b^2 - a^2}$. In this case, the foci of C are at ⟨0, c⟩ and ⟨0, −c⟩, while the vertices are ⟨0, b⟩ and ⟨0, −b⟩. In this case, the y-axis is the major axis of the ellipse.

(6) If p̄ = ⟨p1, p2⟩ is a fixed point in R², then the range E of the vector function
ū(t) = r̄(t) + p̄ = ⟨a cos t + p1, b sin t + p2⟩, t ∈ [0, 2π]
is the ellipse
\[ E = \{\bar{u}(t) : t \in [0, 2\pi]\} = \left\{\langle x, y\rangle : \frac{(x - p_1)^2}{a^2} + \frac{(y - p_2)^2}{b^2} = 1\right\}. \]
If a > b, then the foci of E are
⟨p1 + c, p2⟩ and ⟨p1 − c, p2⟩
while the vertices of E are
⟨p1 − a, p2⟩ and ⟨p1 + a, p2⟩.
The ellipse E is symmetric about the lines with equations x = p1 and y = p2, respectively, see the figure below.

[Figure: the ellipse E with the points ⟨p1 − a, p2⟩, ⟨p1 + a, p2⟩, ⟨p1, p2 + b⟩ and ⟨p1, p2 − b⟩, and the foci ⟨p1 − c, p2⟩ and ⟨p1 + c, p2⟩.]
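The focal description in (4), together with the shift in (6), can be tested numerically. This is a minimal sketch with hypothetical function names, assuming a > b; the sample values a = 3, b = 2 and p̄ = ⟨−1, 1⟩ are chosen only for illustration. It relies on `math.dist`, which is available in Python 3.8 and later.

```python
import math

def ellipse_point(t, a, b, p=(0.0, 0.0)):
    """Point <a cos t + p1, b sin t + p2> on the shifted ellipse."""
    return (a * math.cos(t) + p[0], b * math.sin(t) + p[1])

def focal_distance_sum(point, a, b, p=(0.0, 0.0)):
    """Sum of the distances from point to the two foci, assuming a > b."""
    c = math.sqrt(a * a - b * b)
    focus_right = (p[0] + c, p[1])
    focus_left = (p[0] - c, p[1])
    return math.dist(point, focus_right) + math.dist(point, focus_left)

a, b, p = 3.0, 2.0, (-1.0, 1.0)  # hypothetical sample values
for k in range(8):
    t = k * math.pi / 4
    # Each printed sum should agree with 2a = 6 up to rounding.
    print(t, focal_distance_sum(ellipse_point(t, a, b, p), a, b, p))
```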

We illustrate the above remarks by means of an example.

Example 5.1.13. We sketch the curve C with equation
\[ \frac{(x + 1)^2}{9} + \frac{(y - 1)^2}{4} = 1, \]
indicating all axes of symmetry and turning points (vertices).

Solution. Consider the ellipse E with equation
\[ \frac{x^2}{9} + \frac{y^2}{4} = 1. \]
For a point x̄ = ⟨x, y⟩ ∈ R² we have
\[ \bar{x} \in C \text{ if and only if } \frac{(x + 1)^2}{9} + \frac{(y - 1)^2}{4} = 1, \text{ if and only if } \langle x + 1, y - 1\rangle \in E. \]
Therefore
x̄ ∈ C if and only if x̄ = ȳ + ⟨−1, 1⟩ for some ȳ ∈ E.
Therefore we obtain C from E by shifting E one unit to the left and one unit upward.

The turning points of E are at ⟨−3, 0⟩, ⟨3, 0⟩, ⟨0, −2⟩ and ⟨0, 2⟩. Therefore the turning
points of C are at

⟨−3, 0⟩ + ⟨−1, 1⟩ = ⟨−4, 1⟩, ⟨3, 0⟩ + ⟨−1, 1⟩ = ⟨2, 1⟩

and
⟨0, −2⟩ + ⟨−1, 1⟩ = ⟨−1, −1⟩, ⟨0, 2⟩ + ⟨−1, 1⟩ = ⟨−1, 3⟩.
The axes of symmetry of E are the x-axis and y-axis; that is, the lines y = 0 and x = 0.
Therefore the lines
y = 1 and x = −1
are the axes of symmetry of C.

The sketch of C is shown below.

[Figure: the ellipse C with centre ⟨−1, 1⟩ and turning points ⟨−4, 1⟩, ⟨2, 1⟩, ⟨−1, 3⟩ and ⟨−1, −1⟩.]

Example 5.1.14. Consider the vector function r̄(t) = ⟨a tan t, b sec t⟩, t ∈ (−π/2, 3π/2), t ≠ π/2, where a, b > 0 are constant real numbers. We show that a point x̄ = ⟨x, y⟩ in R² is in the range
C = {r̄(t) : t ∈ (−π/2, 3π/2), t ≠ π/2}
of r̄ if and only if
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1. \]
We also sketch the curve C.

Solution. Consider a point x̄ = ⟨x, y⟩ ∈ C. Then x̄ = r̄(t) for some t ∈ (−π/2, 3π/2), t ≠ π/2, so that
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = \frac{b^2\sec^2 t}{b^2} - \frac{a^2\tan^2 t}{a^2} = \sec^2 t - \tan^2 t = 1. \]
Let x̄ = ⟨x, y⟩ be a point in R² so that
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1. \]
We show that x̄ = r̄(t) for some t ∈ (−π/2, 3π/2), t ≠ π/2, so that x̄ ∈ C. We have
\[ \frac{y^2}{b^2} = 1 + \frac{x^2}{a^2} \ge 1 \]
so that
\[ \frac{y}{b} \ge 1 \quad \text{or} \quad \frac{y}{b} \le -1. \]
There are four cases to consider, namely, y ≥ b and x ≥ 0, y ≥ b and x < 0, y ≤ −b and x ≥ 0, and y ≤ −b and x < 0. Suppose that y ≥ b and x ≥ 0. The function b sec t is continuous on [0, π/2) and its range on this interval is [b, ∞). Therefore there exists a real number t ∈ [0, π/2) so that
y = b sec t.
But
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1 \]
so that
\[ \frac{x^2}{a^2} = \frac{b^2\sec^2 t}{b^2} - 1 = \sec^2 t - 1 = \tan^2 t. \]
Since x ≥ 0 and tan t ≥ 0 for t ∈ [0, π/2) it follows that
\[ x = \sqrt{a^2\tan^2 t} = a\tan t. \]
Hence
x̄ = ⟨a tan t, b sec t⟩ = r̄(t) ∈ C.
Now suppose that y ≥ b and x < 0. The function b sec t is continuous on (−π/2, 0] and its range on this interval is [b, ∞). Since y ≥ b, it once again follows that there exists a real number t ∈ (−π/2, 0] so that
y = b sec t.
As
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1 \]
we have
\[ \frac{x^2}{a^2} = \frac{b^2\sec^2 t}{b^2} - 1 = \sec^2 t - 1 = \tan^2 t. \]
In this case x < 0 and tan t ≤ 0 for t ∈ (−π/2, 0]. Therefore
\[ x = -\sqrt{a^2\tan^2 t} = -a(-\tan t) = a\tan t \]
so that
x̄ = ⟨a tan t, b sec t⟩ = r̄(t) ∈ C.
The remaining two cases follow in a similar way. If y ≤ −b and x ≥ 0 then there exists t ∈ [π, 3π/2) so that x̄ = r̄(t) ∈ C. If y ≤ −b and x < 0 then there exists t ∈ (π/2, π] so that x̄ = r̄(t) ∈ C, see Exercise 5.1 number 5.
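The four cases of the argument can be mirrored in a short computation. The sketch below is an illustration with hypothetical function names; it uses the inverse cosine `math.acos` to solve y = b sec t, which is a computational substitute for the continuity argument in the solution, and it selects the interval for t according to the signs of x and y exactly as in the four cases. The sample values a = 2 and b = 3 are assumptions.

```python
import math

def r(t, a, b):
    """The vector function <a tan t, b sec t>."""
    return (a * math.tan(t), b / math.cos(t))

def parameter_of(x, y, b):
    """Return t with r(t) = <x, y>, assuming y^2/b^2 - x^2/a^2 = 1."""
    # theta lies in [0, pi/2) if y >= b and in (pi/2, pi] if y <= -b.
    theta = math.acos(b / y)
    if y >= b:
        # Upper branch: t in [0, pi/2) for x >= 0, t in (-pi/2, 0] for x < 0.
        return theta if x >= 0 else -theta
    # Lower branch: t in [pi, 3pi/2) for x >= 0, t in (pi/2, pi] for x < 0.
    return 2 * math.pi - theta if x >= 0 else theta

a, b = 2.0, 3.0  # hypothetical sample values
for t in (-1.2, -0.4, 0.7, 1.3, 2.0, 2.9, 3.5, 4.5):
    x, y = r(t, a, b)
    print(t, y * y / b**2 - x * x / a**2, parameter_of(x, y, b))
```

For each sampled t the second printed value should agree with 1 up to rounding, and the third should reproduce t.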

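We now sketch the curve C. Note that
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1 \]
if and only if
\[ y = \frac{b\sqrt{x^2 + a^2}}{a} \quad \text{or} \quad y = -\frac{b\sqrt{x^2 + a^2}}{a}, \quad x \in \mathbb{R}. \]
The curve C therefore consists of two branches C1 and C2 corresponding to the positive and negative square root, respectively. We consider first the branch C1 with equation
\[ y = \frac{b\sqrt{x^2 + a^2}}{a}, \quad x \in \mathbb{R}. \]
We have
\[ \frac{dy}{dx} = \frac{bx}{a\sqrt{x^2 + a^2}} = 0 \text{ if and only if } x = 0 \]
and
\[ \frac{d^2y}{dx^2} = \frac{b}{a}\left(\frac{1}{(x^2 + a^2)^{1/2}} - \frac{x^2}{(x^2 + a^2)^{3/2}}\right) = \frac{b}{a(x^2 + a^2)^{3/2}}\left((x^2 + a^2) - x^2\right) = \frac{ab}{(x^2 + a^2)^{3/2}} > 0 \]
for all x ∈ R. Therefore C1 has a turning point at x = 0 and is concave upwards on R.

We also have
\[ \lim_{x\to\infty} \frac{b\sqrt{x^2 + a^2}}{a} = \lim_{x\to-\infty} \frac{b\sqrt{x^2 + a^2}}{a} = \infty. \]
The lines with equations
\[ y = \frac{bx}{a} \quad \text{and} \quad y = -\frac{bx}{a} \]
are asymptotes of C1. Indeed,
\[ \frac{b\sqrt{x^2 + a^2}}{a} > \frac{b\sqrt{x^2}}{a} = \frac{bx}{a}, \quad x \ge 0 \]
and
\[ \frac{b\sqrt{x^2 + a^2}}{a} > \frac{b\sqrt{x^2}}{a} = -\frac{bx}{a}, \quad x \le 0. \]
Furthermore, we have
\[ \lim_{x\to\infty}\left(\frac{b\sqrt{x^2 + a^2}}{a} - \frac{bx}{a}\right) = \lim_{x\to\infty} \frac{ab}{\sqrt{x^2 + a^2} + x} = 0, \]
see Exercise 5.1 number 6. In the same way,
\[ \lim_{x\to-\infty}\left(\frac{b\sqrt{x^2 + a^2}}{a} + \frac{bx}{a}\right) = 0. \]
The sketch below shows the branch C1 of the curve C.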

[Figure: the branch C1 with turning point at y = b and asymptotes y = −bx/a and y = bx/a.]

Note that the branch C2 of C with equation
\[ y = -\frac{b\sqrt{x^2 + a^2}}{a}, \quad x \in \mathbb{R} \]
is the reflection of C1 in the x-axis; that is,

⟨x, y⟩ ∈ C1 if and only if ⟨x, −y⟩ ∈ C2.

We now have the following sketch of the curve C.

[Figure: the curve C with both branches, turning points at y = b and y = −b, and asymptotes y = −bx/a and y = bx/a.]

Remark 5.1.15 (Hyperbola). Consider the vector function r̄(t) = ⟨a tan t, b sec t⟩, t ∈ (−π/2, 3π/2), t ≠ π/2, where a, b > 0 are constants. In Example 5.1.14 it is shown that the range C of r̄ is the curve with equation
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1. \]
This curve is known as a hyperbola. We note the following.

(1) The hyperbola consists of two branches. If ⟨x, y⟩ = r̄(t) with t ∈ (−π/2, π/2), then y ≥ b. If ⟨x, y⟩ = r̄(t) with t ∈ (π/2, 3π/2), then y ≤ −b. On each of the two branches, the point r̄(t) moves from left to right on the curve as t increases.

[Figure: the hyperbola C with the points r̄(0) at y = b and r̄(π) at y = −b; the upper branch is traced for −π/2 < t < 0 and 0 < t < π/2, and the lower branch for π/2 < t < π and π < t < 3π/2.]

(2) The hyperbola C is symmetric about the x-axis, meaning

⟨x, y⟩ ∈ C if and only if ⟨x, −y⟩ ∈ C,

and about the y-axis so that

⟨x, y⟩ ∈ C if and only if ⟨−x, y⟩ ∈ C.

The y-axis is the major axis of C.

(3) The hyperbola C with equation
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1 \]
can be described in the following way. Let
\[ c = \sqrt{a^2 + b^2}. \]
It can be shown that

C = {x̄ ∈ R² : | ∥x̄ − ⟨0, c⟩∥ − ∥x̄ − ⟨0, −c⟩∥ | = 2b},

see the figure below.

[Figure: the hyperbola C with the points ⟨0, c⟩ and ⟨0, −c⟩, and a point x̄ on C at distances L1 and L2 from these points, so that |L1 − L2| = | ∥x̄ − ⟨0, c⟩∥ − ∥x̄ − ⟨0, −c⟩∥ | = 2b.]

The points ⟨0, −c⟩ and ⟨0, c⟩ are called the foci of the hyperbola C, and we call the points ⟨0, b⟩ and ⟨0, −b⟩ the vertices of C; the vertices are the two points on C, one on each branch, which are closest together.

(4) Recall from Example 5.1.14 that the lines with equations $y = \frac{b}{a}x$ and $y = -\frac{b}{a}x$, respectively, are the asymptotes of the hyperbola C.

(5) If p̄ = ⟨p1, p2⟩ is a fixed point in R², then the range E of the vector function
ū(t) = r̄(t) + p̄ = ⟨a tan t + p1, b sec t + p2⟩, t ∈ (−π/2, 3π/2), t ≠ π/2
is the hyperbola
\[ E = \{\bar{u}(t) : t \in (-\tfrac{\pi}{2}, \tfrac{3\pi}{2}),\ t \ne \tfrac{\pi}{2}\} = \left\{\langle x, y\rangle : \frac{(y - p_2)^2}{b^2} - \frac{(x - p_1)^2}{a^2} = 1\right\}. \]
(a) The foci of E are ⟨p1, p2 + c⟩ and ⟨p1, p2 − c⟩, with $c = \sqrt{a^2 + b^2}$.
(b) The vertices of E are ⟨p1, p2 + b⟩ and ⟨p1, p2 − b⟩.
(c) The lines with equations
\[ y = \frac{b}{a}(x - p_1) + p_2 \quad \text{and} \quad y = -\frac{b}{a}(x - p_1) + p_2, \]
respectively, are the asymptotes of E.
(d) The hyperbola E is symmetric about the lines with equations x = p1 and y = p2, respectively.

[Figure: the hyperbola E with foci ⟨p1, p2 + c⟩ and ⟨p1, p2 − c⟩ and asymptotes y = (b/a)(x − p1) + p2 and y = −(b/a)(x − p1) + p2.]

(6) Let v̄(t) = ⟨b sec t, a tan t⟩, t ∈ (−π/2, 3π/2), t ≠ π/2. Then a point x̄ = ⟨x, y⟩ is in the range K of v̄ if and only if
\[ \frac{x^2}{b^2} - \frac{y^2}{a^2} = 1. \]
The curve K is a hyperbola. In this case, the vertices of K are ⟨b, 0⟩ and ⟨−b, 0⟩, the foci are ⟨c, 0⟩ and ⟨−c, 0⟩ with $c = \sqrt{a^2 + b^2}$, and the lines with equations $y = \frac{a}{b}x$ and $y = -\frac{a}{b}x$, respectively, are the asymptotes of K.

[Figure: the hyperbola K with vertices at x = −b and x = b, foci ⟨−c, 0⟩ and ⟨c, 0⟩, and asymptotes y = (a/b)x and y = −(a/b)x.]

The remarks above are illustrated in the following example.

Example 5.1.16. We sketch the curve C with equation
\[ \frac{(x - 2)^2}{1/4} - (y + 1)^2 = 1, \]
indicating all axes of symmetry, turning points (vertices) and asymptotes.

Solution. Consider the hyperbola H with equation
\[ \frac{x^2}{1/4} - y^2 = 1. \]
For a point x̄ = ⟨x, y⟩ ∈ R² we have
\[ \bar{x} \in C \text{ if and only if } \frac{(x - 2)^2}{1/4} - (y + 1)^2 = 1, \text{ if and only if } \langle x - 2, y + 1\rangle \in H. \]
Therefore
x̄ ∈ C if and only if x̄ = ȳ + ⟨2, −1⟩ for some ȳ ∈ H.
Therefore we obtain C from H by shifting H two units to the right and one unit downward.

The turning points of H are at ⟨−1/2, 0⟩ and ⟨1/2, 0⟩. Therefore the turning points of C are at

⟨−1/2, 0⟩ + ⟨2, −1⟩ = ⟨3/2, −1⟩ and ⟨1/2, 0⟩ + ⟨2, −1⟩ = ⟨5/2, −1⟩.

The axes of symmetry of H are the x-axis and y-axis; that is, the lines y = 0 and x = 0. Therefore the lines
y = −1 and x = 2
are the axes of symmetry of C.

The asymptotes of H are the lines with equations

y = −2x and y = 2x,

respectively. Therefore the asymptotes of C are the lines with equations

y = −2(x − 2) − 1 = −2x + 3 and y = 2(x − 2) − 1 = 2x − 5,

respectively.
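The conclusions of this example can be checked numerically. The sketch below assumes the parameterisation ⟨(1/2) sec t + 2, tan t − 1⟩ of C, obtained from Remark 5.1.15 (6) with a = 1, b = 1/2 and a shift by ⟨2, −1⟩; this parameterisation and the function names are illustrative assumptions and do not appear in the example itself.

```python
import math

def point_on_C(t):
    """Assumed parameterisation <(1/2) sec t + 2, tan t - 1> of the curve C."""
    return (0.5 / math.cos(t) + 2.0, math.tan(t) - 1.0)

def equation_lhs(x, y):
    """Left-hand side of (x - 2)^2 / (1/4) - (y + 1)^2 = 1."""
    return (x - 2) ** 2 / 0.25 - (y + 1) ** 2

def gap_to_asymptote(t):
    """Vertical distance from the point at parameter t to the line y = 2x - 5."""
    x, y = point_on_C(t)
    return abs(y - (2 * x - 5))

print(point_on_C(0.0))              # the vertex <5/2, -1>
print(equation_lhs(*point_on_C(1.0)))
for t in (1.0, 1.5, 1.57):
    # The gap shrinks as t approaches pi/2, that is, as x grows.
    print(t, gap_to_asymptote(t))
```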

The sketch of C is shown below.

[Figure: the hyperbola C with centre ⟨2, −1⟩, turning points ⟨3/2, −1⟩ and ⟨5/2, −1⟩, and asymptotes y = 2x − 5 and y = −2x + 3.]

It should be noted that, in general, the range of a vector function r̄ may not resemble anything we would think of as a ‘curve’. Indeed, when we think of a curve, we tend to think of something like a piece of string, bent and twisted in some way. However, the range of the vector function
\[ \bar{r}(t) = \begin{cases} \langle t, 1\rangle & \text{if } t \in \mathbb{Q} \\ \langle 1, t\rangle & \text{if } t \notin \mathbb{Q} \end{cases} \]
does not resemble this idea in any way. It can be proven that for any two real numbers a < b there exist real numbers t0 ∈ Q and t1 ∉ Q so that a < t0 < b and a < t1 < b. Therefore the points r̄(t) in the range of r̄ ‘jump’ between the lines y = 1 (when t ∈ Q) and x = 1 (when t ∉ Q), see the figure below.

[Figure: the range of r̄, contained in the lines y = 1 and x = 1.]

In order to avoid this kind of situation, we introduce the concepts of limit and continuity for vector functions in the next section.

Exercise 5.1

1. Determine a Cartesian equation for the range C of the vector function r̄, and sketch this curve.

(a) r̄(t) = ⟨1 − t, 2 + 3t⟩, t ∈ [0, 1]
(b) r̄(t) = ⟨t² − 1, t² + 1⟩, t ∈ R
(c) r̄(t) = ⟨3 cos(2t), 3 sin(2t)⟩, t ∈ [π/4, π]
(d) r̄(t) = ⟨e^{2t}, t + 1⟩, t ∈ R
(e) r̄(t) = ⟨2 cos t + 2, 2 sin t⟩, t ∈ [0, 3π/2]
(f) r̄(t) = ⟨t², t⁴⟩, t ∈ R
(g) r̄(t) = ⟨2 cos t, sin t⟩, t ∈ [0, 2π]
(h) r̄(t) = ⟨cos t, 1, sin t⟩, t ∈ R
(i) r̄(t) = ⟨sec t, tan t⟩, t ∈ (−π/2, 3π/2), t ≠ π/2
(j) r̄(t) = ⟨cos t, sin t, t⟩, t ∈ [0, 2π]

2. Sketch the curve with given Cartesian equation. Indicate the axes of symmetry, all turning points (vertices) and any asymptotes of the curve.

(a) x²/4 + y²/9 = 1
(b) x²/4 − y²/9 = 1
(c) x² + 3y² = 9
(d) y² − 2x² = 8
(e) (x + 2)² − 3y² = 9
(f) 9(x − 1)² + 4(y + 2)² = 36


3. Consider Remark 5.1.9 (3). Prove that if p̄ ∈ R² is a fixed point, then the range of the vector function
ū(t) = ⟨c cos t, c sin t⟩ + p̄, t ∈ [0, 2π]
is the circle {x̄ = ⟨x, y⟩ : ∥x̄ − p̄∥ = c}.
4. Consider Example 5.1.11. Let a, b and r̄ be as in the example. Let x̄ = ⟨x, y⟩ be a point in R² so that
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
Let $\bar{x}' = \langle x, \sqrt{a^2 - x^2}\rangle$.
(a) Assume that y ≥ 0, and let t be the magnitude of the angle formed by the positive x-axis and the ray with origin 0̄ in the direction of x̄′. Show that t ∈ [0, 2π] and x̄ = r̄(t).
(b) Assume that y < 0, and let α be the magnitude of the angle formed by the negative x-axis and the ray with origin 0̄ in the direction of x̄′. If t = π + α, show that t ∈ [0, 2π] and x̄ = r̄(t).
5. Study Example 5.1.14. Let a, b and r̄ be as in the example.
(a) If y ≤ −b and x ≥ 0, show that there exists t ∈ [π, 3π/2) so that x̄ = r̄(t).
(b) If y ≤ −b and x < 0, prove that there exists t ∈ (π/2, π] so that x̄ = r̄(t).
6. Let a, b > 0 be real numbers. Prove that
\[ \lim_{x\to\infty}\left(\frac{b\sqrt{x^2 + a^2}}{a} - \frac{bx}{a}\right) = \lim_{x\to\infty} \frac{ab}{\sqrt{x^2 + a^2} + x} = 0. \]

7. Find the points of intersection, if any, of the given curves C1 and C2 .



(a) C1 is parameterised by r̄(t) = ⟨ t + 2, t + 1⟩ and C2 is the circle with centre 0̄
and radius 1.
2
(b) C1 is parameterised by r̄(t) = ⟨e2t , e5−t ⟩ and C2 by v̄(s) = ⟨s2 , s4 ⟩, t > 0.
8. Determine the points of intersection (if any) of the curve C with parameterisation
r̄(t) = ⟨cos t, sin t, − cos t + 3⟩, t ∈ R, and the plane V through the point ā = ⟨1, 1, 1⟩
with normal vector n̄ = ⟨1, 2, 1⟩.
9. Show that the curve C with parameterisation r̄(t) = ⟨3 cos t, 5 sin t, 4 cos t⟩, t ∈ R, lies
on the sphere with centre 0̄ and radius 5.
10. Consider the curve C with parameterisation r̄(t) = ⟨t2 − 1, at2 , t2 + b⟩, t ∈ R, where a
and b are constant real numbers. For which values of a and b does the curve C lie on
the plane with equation x − y + 2z = 1?

5.2 Limits and Continuity

We all have an intuitive understanding of what a curve in R2 or R3 is. In order to obtain


a mathematically acceptable definition of a curve, we consider the concepts of limits and
continuity of vector functions.
Definition 5.2.1. Let r̄ be a three-dimensional vector function defined on an open interval
containing the real number a, except possibly at a. If c̄ is a vector in R^3, then $\lim_{t \to a} \bar{r}(t) = \bar{c}$
if the following is true. For every ϵ > 0 there exists a real number δ > 0 so that

if 0 < |t − a| < δ then ∥r̄(t) − c̄∥ < ϵ.

The limit of a vector function r̄(t) is completely determined by the limits of its component
functions, as the next result shows. It is therefore possible to reduce a lot of theoretical and
computational problems involving limits of vector functions to problems involving limits of
real-valued functions.
Theorem 5.2.2. Let r̄ be a three-dimensional vector function with components x, y and
z, defined on an open interval containing the real number a, except possibly at a. If c̄ =
⟨c1 , c2 , c3 ⟩ is a vector in R^3, then $\lim_{t \to a} \bar{r}(t) = \bar{c}$ if and only if

\[ \lim_{t \to a} x(t) = c_1, \quad \lim_{t \to a} y(t) = c_2 \quad \text{and} \quad \lim_{t \to a} z(t) = c_3. \]

We demonstrate the use of Theorem 5.2.2 by means of an example.


Example 5.2.3. Consider the vector function $\bar{r}(t) = \langle t^2 \ln(|t|), t^2 - 1 \rangle$, $t \neq 0$. We show
that $\lim_{t \to 0} \bar{r}(t)$ exists, and find the value of the limit.

Solution. The component functions of r̄ are

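\[ x(t) = t^2 \ln(|t|), \quad y(t) = t^2 - 1, \quad t \neq 0. \]

Clearly, $\lim_{t \to 0} y(t)$ exists, and $\lim_{t \to 0} y(t) = -1$. It therefore remains to show that $\lim_{t \to 0} x(t)$ exists,
and to find its value. Note that
\[ x(t) = t^2 \ln(|t|) = \frac{\ln(|t|)}{t^{-2}}. \]
Since $\lim_{t \to 0} \ln(|t|) = -\infty$ and $\lim_{t \to 0} t^{-2} = +\infty$, we apply l’Hospital’s Rule. We have
\[ \frac{\frac{d}{dt} \ln(|t|)}{\frac{d}{dt} t^{-2}} = \frac{t^{-1}}{-2t^{-3}} = -\frac{t^2}{2} \]
so that
\[ \lim_{t \to 0} \frac{\frac{d}{dt} \ln(|t|)}{\frac{d}{dt} t^{-2}} = 0. \]
By l’Hospital’s Rule, $\lim_{t \to 0} x(t)$ exists, and
\[ \lim_{t \to 0} x(t) = \lim_{t \to 0} \frac{\frac{d}{dt} \ln(|t|)}{\frac{d}{dt} t^{-2}} = 0. \]

Therefore, by Theorem 5.2.2, $\lim_{t \to 0} \bar{r}(t)$ exists, and $\lim_{t \to 0} \bar{r}(t) = \langle 0, -1 \rangle$.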
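
The limit in Example 5.2.3 can also be checked numerically against Definition 5.2.1. The following Python sketch is an illustration only and not part of the proof; the names `r`, `distance_to_limit`, `samples`, `right` and `left` are our own. It evaluates ∥r̄(t) − c̄∥ with c̄ = ⟨0, −1⟩ for values of t approaching 0 from both sides.

```python
import math

def r(t):
    """Vector function of Example 5.2.3; defined only for t != 0."""
    return (t**2 * math.log(abs(t)), t**2 - 1)

def distance_to_limit(t, c=(0.0, -1.0)):
    """Norm of r(t) - c, where c is the claimed limit <0, -1>."""
    x, y = r(t)
    return math.hypot(x - c[0], y - c[1])

# Sample points approaching 0 from the right and from the left.
samples = [10.0 ** (-k) for k in range(1, 7)]
right = [distance_to_limit(t) for t in samples]
left = [distance_to_limit(-t) for t in samples]

for t, d in zip(samples, right):
    print(f"t = {t:.0e}   ||r(t) - c|| = {d:.3e}")
```

The printed distances shrink as t approaches 0, which is consistent with the limit found above. A table of values can of course only suggest a limit; the argument via Theorem 5.2.2 is what establishes it.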

The following result lists a number of properties of limits of vector functions. The proofs of
these follow easily from Theorem 5.2.2 and the appropriate properties of limits of real-valued
functions, and are therefore given as exercises, see Exercise 5.2 number 2.

Theorem 5.2.4. Let r̄ and w̄ be three-dimensional vector functions defined on an open
interval containing the real number a, except possibly at a. Assume that $\lim_{t \to a} \bar{r}(t) = \bar{c}$ and
$\lim_{t \to a} \bar{w}(t) = \bar{d}$. Then the following statements are true.

(1) $\lim_{t \to a} [\bar{r}(t) + \bar{w}(t)]$ exists, and $\lim_{t \to a} [\bar{r}(t) + \bar{w}(t)] = \bar{c} + \bar{d}$.

(2) $\lim_{t \to a} \alpha \bar{r}(t)$ exists and $\lim_{t \to a} \alpha \bar{r}(t) = \alpha \bar{c}$ for all α ∈ R.

(3) $\lim_{t \to a} [\bar{r}(t) \cdot \bar{w}(t)]$ exists and $\lim_{t \to a} [\bar{r}(t) \cdot \bar{w}(t)] = \bar{c} \cdot \bar{d}$.

(4) $\lim_{t \to a} [\bar{r}(t) \times \bar{w}(t)]$ exists and $\lim_{t \to a} [\bar{r}(t) \times \bar{w}(t)] = \bar{c} \times \bar{d}$.

(5) If f is a real-valued function defined on an open interval containing the real number a,
except possibly at a, and $\lim_{t \to a} f(t) = L$, then $\lim_{t \to a} [f(t) \bar{r}(t)]$ exists, and $\lim_{t \to a} [f(t) \bar{r}(t)] = L \bar{c}$.

With the concept of a limit in hand, we are able to define what it means for a vector function
to be continuous.

Definition 5.2.5. Let r̄ be a three-dimensional vector function defined on an open interval


containing the real number a. Then r̄ is continuous at a if $\lim_{t \to a} \bar{r}(t) = \bar{r}(a)$.

It follows from Theorem 5.2.2 that continuity of a vector function is equivalent to continuity
of its component functions.

Theorem 5.2.6. Let r̄ be a three-dimensional vector function defined on an open interval


containing the real number a, with component functions x, y and z. Then r̄ is continuous
at a if and only if x, y and z are continuous at a.

The proof of Theorem 5.2.6 is an easy argument, based on Theorem 5.2.2, and is therefore
given as an exercise, see Exercise 5.2 number 3.

Definition 5.2.7. A curve C in R3 is the range of a continuous three-dimensional vector
function r̄ : I → R3 , where I is an interval in R. That is,
C = {r̄(t) : t ∈ I}.
The function r̄ is called a parameterisation of the curve C.

It is easily verified that the vector functions in Examples 5.1.7 to 5.1.6 in Section 5.1 are all
continuous, so that the range of each of these functions is a curve, in the sense of Definition
5.2.7.

In this section and the preceding one we have introduced vector functions of a real variable,
and shown that such functions provide useful analytical descriptions of curves in R2 and
R3 . However, at this stage our ability to deduce properties of a curve given by a vector
function r̄ from the function is rather limited. Our goal in the coming sections is to develop
the mathematical tools that enable us to gain a better understanding of the behaviour of a
curve.

Exercise 5.2

1. Determine whether or not the given limit exists, and find the value of the limit if it
does exist.

(a) $\lim_{t \to 1} \langle \frac{t^2 - 1}{t - 1}, t, t^3 \rangle$    (b) $\lim_{t \to 0} \langle \frac{t}{\sin t}, \cos t \rangle$    (c) $\lim_{t \to 0} \langle t e^t, t^2 e^{1/t} \rangle$

2. Use Theorem 5.2.2 to prove Theorem 5.2.4.


3. Use Theorem 5.2.2 to prove Theorem 5.2.6.
4. Let A be a 3 × 3 matrix. Suppose that r̄ : R → R3 is continuous at t = a. Prove that
the vector function ū(t) = r̄(t)A is continuous at t = a.

5.3 Differentiation of Vector Functions

The first and second derivatives of a function f : R → R determine much of the shape of
the graph of f . Indeed, the sign of f ′ (x) determines whether the function is
increasing or decreasing, while that of f ′′ (x) specifies whether the graph is concave upward
or downward. In this section, we introduce the derivative of a vector-valued function, and
discuss its geometric interpretation.
Definition 5.3.1. Let r̄ be a three-dimensional vector function defined on an open interval
containing the real number a. Then r̄ is differentiable at a if there exists a vector r̄′ (a) ∈ R3
so that
\[ \lim_{h \to 0} \frac{1}{h} [\bar{r}(a + h) - \bar{r}(a)] = \bar{r}'(a). \]

The vector r̄′ (a), if it exists, is called the derivative of r̄ at a.

As a consequence of Theorem 5.2.2, the derivative of a vector function is determined by the
derivatives of its component functions.

Theorem 5.3.2. Let r̄ be a three-dimensional vector function, with components x, y and


z, defined on an open interval containing the real number a. Then the following statements
are true.

(1) r̄ is differentiable at a if and only if x, y and z are differentiable at a.

(2) If r̄ is differentiable at a, then r̄′ (a) = ⟨x′ (a), y ′ (a), z ′ (a)⟩.

We give the proof of this result as an exercise, see Exercise 5.3 number 6.

Example 5.3.3. We find the derivative of the vector function $\bar{r}(t) = \langle t e^t, \cos t, t^2 - t + 2 \rangle$,
t ∈ R.

Solution. The component functions of r̄ are clearly differentiable, hence by Theorem 5.3.2,
r̄ is differentiable and
\[ \bar{r}'(t) = \langle \frac{d}{dt}(t e^t), \frac{d}{dt} \cos t, \frac{d}{dt}(t^2 - t + 2) \rangle = \langle e^t + t e^t, -\sin t, 2t - 1 \rangle. \]
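
As a sanity check on Example 5.3.3, the componentwise derivative can be compared with a difference quotient. The Python sketch below is illustrative only. Note that it uses a symmetric (central) difference quotient rather than the one-sided quotient of Definition 5.3.1, a choice made here purely for numerical accuracy; the helper names `central_difference` and `max_abs_diff` are our own.

```python
import math

def r(t):
    """Vector function of Example 5.3.3."""
    return (t * math.exp(t), math.cos(t), t**2 - t + 2)

def r_prime(t):
    """Derivative found in Example 5.3.3 via Theorem 5.3.2."""
    return (math.exp(t) + t * math.exp(t), -math.sin(t), 2 * t - 1)

def central_difference(f, t, h=1e-6):
    """Componentwise approximation (f(t + h) - f(t - h)) / (2h)."""
    forward, backward = f(t + h), f(t - h)
    return tuple((p - m) / (2 * h) for p, m in zip(forward, backward))

def max_abs_diff(u, v):
    """Largest componentwise difference between two vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

for t in (-1.0, 0.0, 0.5, 2.0):
    error = max_abs_diff(central_difference(r, t), r_prime(t))
    print(f"t = {t:5.2f}   max component error = {error:.2e}")
```

The reported errors are small for every sample value of t, in agreement with the formula for r̄′ (t).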

As for real-valued functions, differentiation of vector-valued functions satisfies a number of
so-called ‘differentiation rules’. We list a few of these.

Theorem 5.3.4. Let r̄, v̄ : I → R3 and f : R → R be differentiable functions, where I is


an open interval in R. Then the following statements are true.

(1) w̄(t) = r̄(t) + v̄(t), t ∈ I, is differentiable and w̄′ (t) = r̄′ (t) + v̄ ′ (t).

(2) If α is a real number, then w̄(t) = α r̄(t), t ∈ I, is differentiable and w̄′ (t) = α r̄′ (t).

(3) w̄(t) = f (t)r̄(t), t ∈ I, is differentiable and w̄′ (t) = f ′ (t)r̄(t) + f (t)r̄′ (t).

(4) g(t) = r̄(t) · v̄(t), t ∈ I, is differentiable and g ′ (t) = r̄′ (t) · v̄(t) + r̄(t) · v̄ ′ (t).

(5) w̄(t) = r̄(t) × v̄(t), t ∈ I, is differentiable and w̄′ (t) = r̄′ (t) × v̄(t) + r̄(t) × v̄ ′ (t).

(6) If f (a) ∈ I for some a ∈ R, then w̄(t) = r̄(f (t)) is differentiable at t = a and
w̄′ (a) = f ′ (a)r̄′ (f (a)).

We prove (5), and give the proofs of some of the remaining results as exercises, see Exercise
5.3 number 7.

Proof of (5). Let

r̄(t) = ⟨x(t), y(t), z(t)⟩ and v̄(t) = ⟨f (t), g(t), h(t)⟩, t ∈ I.

By the definition of the cross product, Definition 1.7.1, we have
w̄(t) = r̄(t) × v̄(t) = ⟨y(t)h(t) − z(t)g(t), z(t)f (t) − x(t)h(t), x(t)g(t) − y(t)f (t)⟩, t ∈ I.
Since r̄ and v̄ are differentiable, it follows from Theorem 5.3.2 that x, y, z, f , g and h are
differentiable, and
r̄′ (t) = ⟨x′ (t), y ′ (t), z ′ (t)⟩, v̄ ′ (t) = ⟨f ′ (t), g ′ (t), h′ (t)⟩.
Therefore yh − zg, zf − xh and xg − yf are differentiable. Again by Theorem 5.3.2, w̄ is
differentiable, and
d d d
w̄′ (t) = ⟨ [y(t)h(t) − z(t)g(t)], [z(t)f (t) − x(t)h(t)], [x(t)g(t) − y(t)f (t)]⟩.
dt dt dt
Applying the Product and Sum Rules for differentiation we get
w̄′ (t) = ⟨y ′ (t)h(t) + y(t)h′ (t) − z ′ (t)g(t) − z(t)g ′ (t),

z ′ (t)f (t) + z(t)f ′ (t) − x′ (t)h(t) − x(t)h′ (t),

x′ (t)g(t) + x(t)g ′ (t) − y ′ (t)f (t) − y(t)f ′ (t)⟩

= ⟨y ′ (t)h(t) − z ′ (t)g(t), z ′ (t)f (t) − x′ (t)h(t), x′ (t)g(t) − y ′ (t)f (t)⟩

+⟨y(t)h′ (t) − z(t)g ′ (t), z(t)f ′ (t) − x(t)h′ (t), x(t)g ′ (t) − y(t)f ′ (t)⟩.
By the definition of the cross product, Definition 1.7.1, we have
w̄′ (t) = ⟨x′ (t), y ′ (t), z ′ (t)⟩ × ⟨f (t), g(t), h(t)⟩

+⟨x(t), y(t), z(t)⟩ × ⟨f ′ (t), g ′ (t), h′ (t)⟩

= r̄′ (t) × v̄(t) + r̄(t) × v̄ ′ (t).


This completes the proof.
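
Rule (5) of Theorem 5.3.4 can be tested numerically on a concrete pair of functions. In the Python sketch below the functions `r` and `v` are arbitrary choices made for illustration (they do not come from the notes), and the derivative of r̄ × v̄ is approximated by a central difference quotient.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

# Two illustrative differentiable vector functions and their derivatives.
def r(t): return (t**2, math.sin(t), math.exp(t))
def r_prime(t): return (2 * t, math.cos(t), math.exp(t))
def v(t): return (math.cos(t), t, t**3)
def v_prime(t): return (-math.sin(t), 1.0, 3 * t**2)

def w(t):
    """w(t) = r(t) x v(t)."""
    return cross(r(t), v(t))

def w_prime_by_rule(t):
    """Right-hand side of rule (5): r'(t) x v(t) + r(t) x v'(t)."""
    first = cross(r_prime(t), v(t))
    second = cross(r(t), v_prime(t))
    return tuple(a + b for a, b in zip(first, second))

def w_prime_numeric(t, h=1e-6):
    """Central difference approximation of w'(t)."""
    return tuple((p - m) / (2 * h) for p, m in zip(w(t + h), w(t - h)))

for t in (-1.0, 0.3, 1.5):
    error = max(abs(a - b) for a, b in zip(w_prime_by_rule(t), w_prime_numeric(t)))
    print(f"t = {t:5.2f}   max component error = {error:.2e}")
```

Note that the order of the factors in each cross product is preserved in `w_prime_by_rule`; since the cross product is not commutative, swapping them would change the sign of the result.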

For a function f : R → R, the derivative f ′ (a) of f at a ∈ R, if it exists, is interpreted


geometrically as the slope of the tangent line to the curve y = f (x) at the point (a, f (a)).
In particular, the equation of the tangent line is
y = f ′ (a)(x − a) + f (a),
or, in vector form,
⟨x, y⟩ = ⟨a, f (a)⟩ + ⟨1, f ′ (a)⟩t, t ∈ R. (5.1)
Now consider the vector function r̄(t) = ⟨t, f (t)⟩, t ∈ R. Clearly, r̄ is a parameterisation of
the graph of f , and r̄(a) = ⟨a, f (a)⟩. Furthermore, due to Theorem 5.3.2,
r̄′ (a) = ⟨1, f ′ (a)⟩.

From equation (5.1) we deduce that the vector r̄′ (a) is parallel to the tangent line to the
curve at r̄(a) = ⟨a, f (a)⟩. We therefore interpret r̄′ (a) as the tangent vector to the curve at
r̄(a).

[Figure: the graph of f , with the tangent line and the tangent vector r̄′ (a) at the point r̄(a).]

In general, consider a curve in R2 (or R3 ) parameterised by a vector function r̄ that is defined


on an open interval containing the real number a. If the derivative r̄′ (a) of r̄ at a exists
and r̄′ (a) ̸= 0̄, then we interpret the vector r̄′ (a) as a tangent vector to the curve at r̄(a).
The picture below may provide some motivation for this interpretation of the derivative.
It shows a curve parameterised by a vector function r̄, the derivative r̄′ (a) and the vectors
$\frac{1}{h}(\bar{r}(a + h) - \bar{r}(a))$ for successively smaller values of h.

[Figure: a curve through the points r̄(a) and r̄(a + h), with the vectors r̄′ (a) and $\frac{1}{h}(\bar{r}(a + h) - \bar{r}(a))$ drawn at r̄(a).]

Remark 5.3.5. Consider a three-dimensional vector function r̄ defined on an open interval


I containing the real number a. Assume that r̄ is continuous on I and differentiable at a,
with r̄′ (a) ̸= 0̄.

(1) The range C = {r̄(t) : t ∈ I} is a curve in R3 . For t ∈ I we interpret r̄(t) as a point


on the curve C, see Section 1.3.
(2) The derivative r̄′ (a) of r̄ at t = a is considered as a tangent vector to the curve C at
the point r̄(a). Here we interpret r̄′ (a) as a geometric vector, or ‘arrow’, in space, see
Section 1.8.

Definition 5.3.6. Let r̄ be a continuous vector function defined on an open interval I
containing the real number a. Assume that r̄′ (a) exists and r̄′ (a) ≠ 0̄. Then the tangent line
to the curve C = {r̄(t) : t ∈ I} at r̄(a) ∈ C is the line through r̄(a) and with direction
vector r̄′ (a).

We illustrate Definition 5.3.6 by means of some examples.


Example 5.3.7. Consider the curve C with parameterisation $\bar{r}(t) = \langle t^2 - t, \frac{2t}{t+1}, t^3 \rangle$, t > −1.
We find an equation for the tangent line to C at the point p̄ = ⟨0, 1, 1⟩ ∈ C, if it exists.

Solution. For a real number t > −1, r̄(t) = p̄ = ⟨0, 1, 1⟩ if and only if
\[ t^2 - t = 0, \quad \frac{2t}{t+1} = 1 \quad \text{and} \quad t^3 = 1. \]
The only value for t that satisfies all three equations is t = 1; that is, r̄(1) = p̄. We have
\[ \bar{r}'(t) = \langle 2t - 1, \frac{2}{(t+1)^2}, 3t^2 \rangle, \quad t > -1 \]
so that $\bar{r}'(1) = \langle 1, \frac{1}{2}, 3 \rangle \neq \bar{0}$. Therefore the tangent line to C at p̄ exists, and has equation
\[ \bar{x} = \bar{p} + t\bar{r}'(1) = \langle 0, 1, 1 \rangle + t \langle 1, \tfrac{1}{2}, 3 \rangle, \quad t \in \mathbb{R}. \]
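
The computation in Example 5.3.7 can be reproduced in exact rational arithmetic. The sketch below uses Python's standard `fractions` module; the line parameter is called `s` (our choice) to keep it apart from the curve parameter t, and `tangent_line` is a hypothetical helper name.

```python
from fractions import Fraction

def r(t):
    """Parameterisation of the curve C in Example 5.3.7 (t > -1)."""
    return (t**2 - t, 2 * t / (t + 1), t**3)

def r_prime(t):
    """Derivative of r, computed componentwise."""
    return (2 * t - 1, 2 / (t + 1) ** 2, 3 * t**2)

t0 = Fraction(1)            # parameter value with r(t0) = p
point = r(t0)               # the point p on the curve
direction = r_prime(t0)     # the tangent vector r'(1)

def tangent_line(s):
    """Point on the tangent line: p + s * r'(1)."""
    return tuple(p + s * d for p, d in zip(point, direction))

print("point:    ", point)
print("direction:", direction)
print("s = 2:    ", tangent_line(2))
```

Working with `Fraction` avoids rounding, so the second component of the direction vector comes out as exactly 1/2.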
Example 5.3.8. Consider the curve C with parameterisation r̄(t) = ⟨cos t, sin t, t⟩, t ∈ R.
We find all points on the curve where the tangent line to the curve is parallel to the line
through the points p̄ = ⟨2, 1, 3⟩ and q̄ = ⟨1, 1, 2⟩.

Solution. According to Definition 1.4.17, two lines are parallel if and only if their direction
vectors are nonzero scalar multiples of each other. We therefore determine those values of t
for which r̄′ (t) = ⟨− sin t, cos t, 1⟩ is a nonzero scalar multiple of c̄ = p̄ − q̄ = ⟨1, 0, 1⟩. That
is, we find all t ∈ R so that
r̄′ (t) = αc̄ for some α ≠ 0.
For real numbers α and t, the equation
r̄′ (t) = ⟨− sin t, cos t, 1⟩ = αc̄ = ⟨α, 0, α⟩
can hold only if α = 1, as a comparison of the third components shows.
Therefore r̄′ (t) is parallel to c̄ if and only if
r̄′ (t) = ⟨− sin t, cos t, 1⟩ = ⟨1, 0, 1⟩,
that is, if and only if sin t = −1 and cos t = 0. It follows that r̄′ (t) is parallel to c̄ if and only if
\[ t = \frac{3\pi}{2} + 2\pi k, \quad k \in \mathbb{Z}. \]
Hence the tangent line to C is parallel to the line through p̄ = ⟨2, 1, 3⟩ and q̄ = ⟨1, 1, 2⟩
precisely at the points
\[ \bar{r}\left(\frac{3\pi}{2} + 2\pi k\right) = \langle 0, -1, \frac{3\pi}{2} + 2\pi k \rangle, \quad k \in \mathbb{Z}. \]
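
The parameter values found in Example 5.3.8 can be checked numerically. The sketch below relies on the standard fact that two nonzero vectors in R^3 are parallel exactly when their cross product is the zero vector; the tolerance `tol` and the helper `is_parallel` are our own choices, needed because the trigonometric values are computed in floating point.

```python
import math

def r_prime(t):
    """Derivative of r(t) = <cos t, sin t, t>."""
    return (-math.sin(t), math.cos(t), 1.0)

def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def is_parallel(u, v, tol=1e-9):
    """True if the cross product of u and v vanishes up to the tolerance."""
    return all(abs(x) < tol for x in cross(u, v))

c = (1.0, 0.0, 1.0)  # direction p - q of the given line

for k in range(-2, 3):
    t = 3 * math.pi / 2 + 2 * math.pi * k
    print(f"k = {k:2d}   parallel: {is_parallel(r_prime(t), c)}")

# For comparison, a parameter value that is not of the form above.
print("t = pi/2 parallel:", is_parallel(r_prime(math.pi / 2), c))
```

At t = π/2 the tangent vector is ⟨−1, 0, 1⟩, which is not a scalar multiple of c̄, so that value is correctly rejected.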

For a function f : R → R, the existence of the derivative f ′ at every x ∈ R implies that the
graph of f is a ‘smooth curve’. In particular, there is a tangent line to the graph of f at
every point (x, f (x)) on the graph. For a vector function r̄ from R to, say R2 , we have a
similar situation. If r̄′ (a) exists and is nonzero for some a ∈ R, then we interpret r̄′ (a) as a
tangent vector to the curve parameterised by r̄. We now show, by means of an example,
why the condition that r̄′ (a) ̸= 0̄ cannot, in general, be omitted.

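Example 5.3.9. Consider the vector function r̄(t) = ⟨x(t), y(t)⟩, t ∈ R, given by
\[ \bar{r}(t) = \begin{cases} \langle -t^2, t^3 \rangle & \text{if } t < 0 \\ \langle t^3, t^2 \rangle & \text{if } t \geq 0. \end{cases} \]

We have
\[ \lim_{h \to 0^+} \frac{x(h) - x(0)}{h} = \lim_{h \to 0^+} h^2 = 0 \]
and
\[ \lim_{h \to 0^-} \frac{x(h) - x(0)}{h} = \lim_{h \to 0^-} (-h) = 0 \]
so that x′ (0) = 0. In the same way, y ′ (0) = 0. Therefore r̄ is differentiable at every t ∈ R,
and
\[ \bar{r}'(t) = \begin{cases} \langle 3t^2, 2t \rangle & \text{if } t > 0 \\ \bar{0} & \text{if } t = 0 \\ \langle -2t, 3t^2 \rangle & \text{if } t < 0. \end{cases} \]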

In fact, r̄′ is even continuous on R. Notice that r̄′ (0) = 0̄.

As can be seen from the figure above, the curve does not have a tangent line at r̄(0) = 0̄.
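
One way to see the corner at r̄(0) in Example 5.3.9 without a picture is to normalise r̄′ (t) for t close to 0 on either side. The Python sketch below is an illustration under our own naming (`unit_tangent` is a hypothetical helper); it is only defined for t ≠ 0, since r̄′ (0) = 0̄ cannot be normalised.

```python
import math

def r_prime(t):
    """Derivative of the piecewise function in Example 5.3.9."""
    if t > 0:
        return (3 * t**2, 2 * t)
    if t < 0:
        return (-2 * t, 3 * t**2)
    return (0.0, 0.0)

def unit_tangent(t):
    """r'(t) scaled to length 1; requires t != 0."""
    dx, dy = r_prime(t)
    length = math.hypot(dx, dy)
    return (dx / length, dy / length)

for t in (1e-1, 1e-2, 1e-4):
    print(f"t = {t:.0e}:  right {unit_tangent(t)}   left {unit_tangent(-t)}")
```

As t decreases to 0 the unit tangent approaches ⟨0, 1⟩, while as t increases to 0 it approaches ⟨1, 0⟩. The two one-sided directions are perpendicular, which is why no single line can serve as a tangent line at r̄(0).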

Example 5.3.9 shows that, for a vector function r̄, if r̄′ (a) = 0̄, then the curve parameterised
by r̄ may not have a tangent line at r̄(a); that is, the curve may not be smooth at r̄(a).
However, it may happen that r̄′ (a) = 0̄, but the curve parameterised by r̄ does have a
tangent line at r̄(a). This is illustrated in the next example.
Example 5.3.10. Consider the function $\bar{r}(t) = \langle t^3, t^9 \rangle$, t ∈ R, and let C be the curve
parameterised by r̄. Then $\bar{r}'(t) = \langle 3t^2, 9t^8 \rangle$ for all t ∈ R. In particular, r̄′ (0) = 0̄. However,
the curve parameterised by r̄ has a tangent line at r̄(0) = 0̄. Indeed, $\bar{v}(t) = \langle t, t^3 \rangle$, t ∈ R, is
also a parameterisation of C. We have

v̄(0) = 0̄ = r̄(0)

and
v̄ ′ (0) = ⟨1, 0⟩ ≠ 0̄.

[Figure: the curve C in the xy-plane, passing through the origin.]

Therefore ⟨1, 0⟩ is a tangent vector to C at r̄(0) = 0̄, even though r̄′ (0) = 0̄.

We have introduced the derivative of a vector function, and shown how it is interpreted
geometrically. The derivative of r̄ : I → R3 can also be interpreted physically, depending on
the physical interpretation of the vector function r̄. One particularly important situation
occurs when r̄(t) is the displacement of a moving particle at time t ≥ 0. In this case, r̄′ (t)
is the velocity of the particle at time t. In the next section, we apply the derivative to the
sketching of curves in R2 that are parameterised by a vector function.

Exercise 5.3

1. Determine r̄′ (t) and r̄′ (a) for r̄ and a as given.


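(a) $\bar{r}(t) = \langle t \ln t, t^2, \frac{1}{t} \rangle$, t > 0; a = 1
(b) $\bar{r}(t) = \langle 2^{5-2t}, \sin(\pi t), \cos(3\pi t) \rangle$, t ∈ R; a = 2
(c) $\bar{r}(t) = \langle \frac{t^2 - 1}{t^2 + 1}, \sinh t, e^{t^2 + 1} \rangle$, t ∈ R; a = 0
(d) $\bar{r}(t) = \langle 3^{t^2 + 1}, t^3 - 1, \sec(\pi t/6) \rangle$, t ∈ (−3, 3); a = 1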

2. Find an equation for the tangent line to the curve parameterised by r̄ at the specified
point.
(a) $\bar{r}(t) = \langle t \ln(t + 1), t^2 + 1, \frac{t+1}{2t+1} \rangle$; at the point p̄ = ⟨0, 1, 1⟩
2 −1 2t
(b) r̄(t) = ⟨et , t2 +t−1 ⟩; at the point p̄ = ⟨1, −1⟩
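(c) $\bar{r}(t) = \langle 2\cos t, 4\sin t, \tan t \rangle$, $-\frac{\pi}{2} < t < \frac{\pi}{2}$; at the point $\bar{p} = \langle \sqrt{2}, 2\sqrt{2}, 1 \rangle$
(d) $\bar{r}(t) = \langle t^t, t^2 + 1, \frac{2t+1}{t^2+1} \rangle$; at the point p̄ = ⟨4, 5, 1⟩
(e) $\bar{r}(t) = \langle 2^t, \ln(t + 1) \rangle$; at the point p̄ = ⟨1, 0⟩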
(f) r̄(t) = ⟨cos t, sin(2t), cos(t + π)⟩, −π < t < 2π; at the point 0̄

3. Find all points (if any) on the curve C parameterised by r̄(t) = ⟨t3 + 3t, t2 , t2 + 1⟩,
t ∈ R, where the tangent line to the curve is
(a) parallel to the line with equation x̄ = ⟨1 + 3t, 2 + t, 1 + t⟩.
(b) parallel to the plane with equation x − 2y − 3z = 0.
(c) perpendicular to the line through the points p̄ = ⟨−1, 0, 1⟩ and q̄ = ⟨3, −16, 2⟩.

4. Consider a differentiable vector function r̄ : R → R3 . Suppose that ∥r̄(0)∥ = 1,


r̄′ (t) ̸= 0̄ and r̄′ (t) · r̄(t) = 0 for all t ∈ R. Prove that r̄(t) lies on the sphere with
radius 1 and centre 0̄ for every t ∈ R. [HINT: Prove that ∥r̄(t)∥2 = 1 for all t ∈ R by
first showing that the function h(t) = ∥r̄(t)∥2 is constant.]

5. Consider a differentiable vector function r̄ : R → R3 and a nonzero vector n̄ ∈ R3 .


Assume that r̄(0) = 0̄. Let V be the plane with equation

n̄ · x̄ = 0.

Prove that r̄(t) ∈ V for every t ∈ R if and only if r̄′ (t) · n̄ = 0 for every t ∈ R.

6. Use Theorem 5.2.2 to prove Theorem 5.3.2.

7. Prove Theorem 5.3.4 (1), (3), (4) and (6).

8. Let r̄ : R → R3 be differentiable at t = a. If A is a 3 × 3 matrix, prove that the


vector functions ū(t) = r̄(t)A and v̄(t) = Ar̄(t)T are differentiable at t = a with
ū′ (a) = r̄′ (a)A and v̄ ′ (a) = Ar̄′ (a)T .

9. Assume that r̄ : R → R3 is differentiable on R, and A is a 3 × 3 matrix. If Ar̄′ (t)T = 0̄


for every t ∈ R, and Ar̄(t0 )T = 0̄ for at least one t0 ∈ R, prove that Ar̄(t)T = 0̄ for all
t ∈ R.

5.4 Curve Sketching in R2

In this section we show how the derivative of a two-dimensional vector function is used as
an aid to the sketching of curves in the plane. So far, see Section 5.1, we are able to sketch
the curve parameterised by a vector function r̄ only in two special cases, namely, when we
recognise the curve as a familiar one, such as a circle or a line, or if we are able to represent
the curve as a graph of a function. In the current section we show how the derivative of
the vector function parameterising a curve in R2 is used to determine when the curve (or a
part of it) is the graph of a function. We will also show how to extract information about
the curve from the derivative of its parameterisation.

Consider a differentiable function f : I → R, where I is an open interval. As is shown in


Section 5.3, the graph of f is parameterised by the vector function

r̄(t) = ⟨x(t), y(t)⟩ = ⟨t, f (t)⟩, t ∈ I.

The derivative of r̄ is r̄′ (t) = ⟨1, f ′ (t)⟩, t ∈ I. Notice in particular that

x′ (t) = 1 ̸= 0, t ∈ I.

That is, the obvious parameterisation of the graph of f has the property that the derivative
of the x-component is nonzero. Conversely, for a general vector function r̄(t) = ⟨x(t), y(t)⟩, t ∈ I, the
condition that x′ (t) ≠ 0 is sufficient for the curve parameterised by r̄ to be, at least locally, the graph
of a function f .

Theorem 5.4.1. Consider a vector function r̄ : [a, b] → R2 such that r̄(t) = ⟨x(t), y(t)⟩,
t ∈ [a, b]. Assume that r̄′ (t) = ⟨x′ (t), y ′ (t)⟩ exists on (a, b) and is continuous at a point
c ∈ (a, b). If x′ (c) ̸= 0, then the following statements are true.

(1) There exists a real number δ > 0, an open interval I containing x(c) and a function
f : I → R so that for every t ∈ (c − δ, c + δ)

r̄(t) = ⟨x, y⟩ if and only if y = f (x) for some x ∈ I, y ∈ R. (5.2)

(2) The function f in (1) is differentiable on I and for every x ∈ I with x = x(t),
t ∈ (c − δ, c + δ),
\[ f'(x) = \frac{y'(t)}{x'(t)} = \frac{dy/dt}{dx/dt}. \]    (5.3)

We give a proof for the case when x′ (c) > 0. The case when x′ (c) < 0 follows in exactly the
same way.

Proof of (1). Since r̄′ (t) exists for all t ∈ (a, b), we know from Theorem 5.3.2 that x′ (t)
also exists for all t ∈ (a, b). Therefore x is a continuous function on (a, b) by Theorem A.1.4.

Since x′ (c) > 0 and x′ is continuous at c, Theorem A.1.2 implies that there exists a number
δ > 0 such that x′ (t) > 0 for all t ∈ (c − δ, c + δ). The Inverse Function Theorem, Theorem
A.1.6 now implies that x has an inverse x−1 defined on the interval I = (x(c − δ), x(c + δ)).
For each x ∈ I, let
f (x) = y(x−1 (x)), x ∈ I.
Now consider x0 ∈ I and y0 ∈ R so that
y0 = f (x0 ).
Let t0 = x−1 (x0 ) ∈ (c − δ, c + δ). By the definition of an inverse function, x0 = x(t0 ).
Furthermore,
y0 = f (x0 ) = y(x−1 (x0 )) = y(x−1 (x(t0 ))) = y(t0 ).
Therefore ⟨x0 , y0 ⟩ = r̄(t0 ).
Conversely, consider x0 ∈ I and y0 ∈ R so that
⟨x0 , y0 ⟩ = r̄(t0 ) for some t0 ∈ (c − δ, c + δ).
Then x0 = x(t0 ) so that t0 = x−1 (x0 ). But y0 = y(t0 ) so that
y0 = y(x−1 (x0 )) = f (x0 ).
We have proven that (5.2) holds.

Proof of (2). Since x = x(t) is differentiable with x′ (t) > 0 on (c − δ, c + δ), and x−1
is its inverse, it follows from the Inverse Function Theorem, Theorem A.1.6 that x−1 is
differentiable on I. Therefore f = y ◦ x−1 is differentiable on I. Because y(t) = f (x(t)) for
all t ∈ (c − δ, c + δ) it follows from the Chain Rule that
y ′ (t) = f ′ (x(t))x′ (t), t ∈ (c − δ, c + δ).
Since x′ (t) > 0 on this interval, we may divide by x′ (t). Therefore
\[ f'(x) = \frac{y'(t)}{x'(t)} = \frac{dy/dt}{dx/dt} \]
for all x = x(t) ∈ I with t ∈ (c − δ, c + δ). This completes the proof.
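
Formula (5.3) can be tested numerically by constructing f = y ∘ x⁻¹ explicitly. The Python sketch below does this for the parameterisation r̄(t) = ⟨te^t, t² − 2t⟩ near c = 0, where x′ (0) = 1 > 0. The inverse of x is computed by bisection on the interval [−0.5, 0.5], an interval chosen by us on which x is increasing; all function names are our own.

```python
import math

def x(t): return t * math.exp(t)
def y(t): return t**2 - 2 * t
def x_prime(t): return (1 + t) * math.exp(t)
def y_prime(t): return 2 * t - 2

def x_inverse(value, lo=-0.5, hi=0.5, iterations=60):
    """Invert x on [lo, hi] by bisection; x is increasing there since x'(t) > 0."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if x(mid) < value:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f(value):
    """The function f = y o x^{-1} from the proof of Theorem 5.4.1."""
    return y(x_inverse(value))

def f_prime_numeric(value, h=1e-5):
    """Central difference approximation of f'(value)."""
    return (f(value + h) - f(value - h)) / (2 * h)

t0 = 0.2
slope_formula = y_prime(t0) / x_prime(t0)   # right-hand side of (5.3)
slope_numeric = f_prime_numeric(x(t0))      # f'(x(t0)) computed directly
print(slope_formula, slope_numeric)
```

The two values agree to several decimal places, as (5.3) predicts. The point of the theorem is that the slope can be read off from x′ and y ′ without ever computing the inverse function.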


Remark 5.4.2. In Theorem 5.4.1, the roles of x and y can be interchanged. Consider a
vector function r̄ : [a, b] → R2 given by r̄(t) = ⟨x(t), y(t)⟩ such that r̄′ (t) = ⟨x′ (t), y ′ (t)⟩
exists on (a, b) and is continuous at a point c ∈ (a, b). If y ′ (c) ̸= 0, then there exist a real
number δ > 0, an open interval I containing y(c) and a function g : I → R so that for every
t ∈ (c − δ, c + δ)
r̄(t) = ⟨x, y⟩ if and only if x = g(y) for some y ∈ I, x ∈ R.
Furthermore, g is differentiable on I and
\[ g'(y) = \frac{x'(t)}{y'(t)} = \frac{dx/dt}{dy/dt}, \]
where y = y(t) ∈ I for some t ∈ (c − δ, c + δ).

Remark 5.4.3. Let r̄, [a, b] and c be as given in Theorem 5.4.1. A particularly important
case is when
x′ (c) ̸= 0 and y ′ (c) = 0.
In this case, according to Theorem 5.4.1, a small piece of the curve around the point r̄(c) is
the graph y = f (x) of a differentiable function f , and
\[ f'(x(c)) = \frac{y'(c)}{x'(c)} = 0. \]
Therefore the tangent line to the curve at r̄(c) is horizontal.

In the same way, utilising Remark 5.4.2, if

x′ (c) = 0 and y ′ (c) ̸= 0

then the curve has a vertical tangent line at the point r̄(c).

If x′ (c) = y ′ (c) = 0, then neither Theorem 5.4.1 nor Remark 5.4.2 applies. In this case,
further investigation is required to determine the behaviour of the curve at r̄(c).

We illustrate the application of Theorem 5.4.1 (and Remark 5.4.2) to the sketching of curves
by means of a few examples.
Example 5.4.4. We sketch the curve C with parameterisation $\bar{r}(t) = \langle t^3 - 3t, t^2 \rangle$, t ∈ [−2, 2].

Solution. Note that we are only considering values for t in the interval [−2, 2]. Therefore
all other values for t are disregarded.

We start by finding the end points of C:

r̄(−2) = ⟨−2, 4⟩ and r̄(2) = ⟨2, 4⟩. (5.4)

Now we find the intercepts of the curve with the coordinate axes. For the y-intercepts, we
set
\[ x(t) = t^3 - 3t = 0 \]
and solve for t to find
\[ t = 0 \quad \text{or} \quad t = \sqrt{3} \quad \text{or} \quad t = -\sqrt{3}. \]
Therefore the y-intercepts are at
\[ \bar{r}(0) = \langle 0, 0 \rangle \quad \text{and} \quad \bar{r}(\sqrt{3}) = \bar{r}(-\sqrt{3}) = \langle 0, 3 \rangle. \]    (5.5)

To determine the x-intercepts, we solve the equation

y(t) = t2 = 0

to find
t = 0.

Therefore the x-intercept is at

r̄(0) = ⟨0, 0⟩. (5.6)

Next we determine any possible turning points in the curve; that is, points on the curve
where the tangent line is either horizontal or vertical. According to Remark 5.4.3, for a
horizontal tangent, we must find t ∈ (−2, 2) such that

y ′ (t) = 2t = 0 and x′ (t) = 3t2 − 3 ̸= 0.

Clearly,
y ′ (t) = 0 if and only if t = 0, and x′ (0) = −3 ̸= 0.
Therefore the curve has a horizontal tangent at

r̄(0) = ⟨0, 0⟩. (5.7)

For a vertical tangent, we find t ∈ (−2, 2) such that

x′ (t) = 3t2 − 3 = 0 and y ′ (t) = 2t ̸= 0.

We have

x′ (t) = 0 if and only if t = −1 or t = 1, and y ′ (±1) = ±2 ̸= 0.

Therefore the tangent to the curve is vertical at

r̄(−1) = ⟨2, 1⟩ and r̄(1) = ⟨−2, 1⟩. (5.8)

We now have sufficient information to make a rough sketch of the curve C. We plot the
points (5.4), (5.5), (5.6), (5.7) and (5.8), keeping in mind that the points in (5.7) and (5.8) are
points where the tangent line to the curve is horizontal and vertical, respectively. Then it is
only a matter of playing ‘connect the dots’, starting at the smallest value of t, and ending
at the largest.

[Figure: sketch of the curve C in the xy-plane, with the points r̄(−2), r̄(2), r̄(√3) = r̄(−√3), r̄(1), r̄(−1) and r̄(0) marked.]
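
The bookkeeping in Example 5.4.4 lends itself to a short script. The Python sketch below tabulates the key parameter values in increasing order and classifies each one according to Remark 5.4.3. The function `classify` and the list `candidates` are our own names; the exact comparisons with zero are reliable here only because the critical parameter values −1, 0 and 1 are integers.

```python
import math

def r(t): return (t**3 - 3 * t, t**2)
def x_prime(t): return 3 * t**2 - 3
def y_prime(t): return 2 * t

def classify(t):
    """Classify the point r(t) following Remark 5.4.3."""
    xp, yp = x_prime(t), y_prime(t)
    if yp == 0 and xp != 0:
        return "horizontal tangent"
    if xp == 0 and yp != 0:
        return "vertical tangent"
    if xp == 0 and yp == 0:
        return "needs further investigation"
    return "ordinary point"

# End points, intercepts and critical parameter values, in increasing order of t.
candidates = [-2, -math.sqrt(3), -1, 0, 1, math.sqrt(3), 2]

for t in candidates:
    px, py = r(t)
    print(f"t = {t:6.3f}   r(t) = ({px:6.3f}, {py:6.3f})   {classify(t)}")
```

Reading the table from top to bottom gives the order in which the dots are connected: the curve starts at ⟨−2, 4⟩, crosses the y-axis at ⟨0, 3⟩, turns at ⟨2, 1⟩, passes through the origin, turns at ⟨−2, 1⟩, crosses the y-axis at ⟨0, 3⟩ again, and ends at ⟨2, 4⟩.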

Example 5.4.5. We sketch the curve parameterised by $\bar{r}(t) = \langle t e^t, t^2 - 2t \rangle$, $-\frac{3}{2} \leq t \leq \frac{3}{2}$.

Solution. We start by determining the end points of the curve.

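\[ \bar{r}(-\tfrac{3}{2}) = \langle -\frac{3e^{-3/2}}{2}, \frac{21}{4} \rangle \quad \text{and} \quad \bar{r}(\tfrac{3}{2}) = \langle \frac{3e^{3/2}}{2}, -\frac{3}{4} \rangle. \]    (5.9)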
For the y-intercepts, we set x(t) = tet = 0 and solve for t. Clearly,

x(t) = 0 if and only if t = 0.

Therefore the only y-intercept is at

r̄(0) = ⟨0, 0⟩. (5.10)

For the x-intercept, set y(t) = t2 − 2t = 0 and solve for t to find

\[ t = 0 \quad \text{or} \quad t = 2 > \tfrac{3}{2}. \]

Hence the only x-intercept of C is at

r̄(0) = ⟨0, 0⟩. (5.11)

To determine the points where the tangent to the curve is horizontal, we determine
$t \in (-\frac{3}{2}, \frac{3}{2})$ where
y ′ (t) = 2t − 2 = 0 and x′ (t) = et + tet ̸= 0.
Clearly,
y ′ (t) = 0 if and only if t = 1.
Furthermore,
x′ (1) = 2e ̸= 0.
Therefore, according to Remark 5.4.3, the curve has a horizontal tangent at the point

r̄(1) = ⟨e, −1⟩. (5.12)

We now determine those points, if any, where the curve has a vertical tangent. We have

x′ (t) = et + tet = 0 if and only if t = −1 and y ′ (−1) = −4 ̸= 0.

Therefore the curve has a vertical tangent at

\[ \bar{r}(-1) = \langle -e^{-1}, 3 \rangle. \]    (5.13)

To sketch the curve, we plot the points in (5.9), (5.10), (5.11), (5.12) and (5.13), remembering
where horizontal or vertical tangents appear, and connect the dots in order of increasing t.

[Figure: sketch of the curve in the xy-plane, with the points r̄(−3/2), r̄(−1), r̄(0), r̄(1) and r̄(3/2) marked.]

The result is the above sketch.


Example 5.4.6. We sketch the curve parameterised by r̄(t) = ⟨t − sin t, 1 − cos t⟩, where
−2π ≤ t ≤ 2π.

Solution. We start by determining the end points of the curve.

r̄(−2π) = ⟨−2π, 0⟩ and r̄(2π) = ⟨2π, 0⟩. (5.14)

For the y-intercepts, we set x(t) = t − sin t = 0 and solve for t. Clearly,

x(0) = 0.

Furthermore, x′ (t) = 1 − cos t > 0 for all nonzero t ∈ (−2π, 2π) so that

x(t) < x(0) = 0 if t < 0 and x(t) > x(0) = 0 if t > 0.

Therefore x(t) = 0 if and only if t = 0, so the only y-intercept is at

r̄(0) = ⟨0, 0⟩. (5.15)

For the x-intercept, set y(t) = 1 − cos t = 0 and solve for t to find

t = −2π, t = 0 or t = 2π.

Hence the x-intercepts are at

r̄(−2π) = ⟨−2π, 0⟩, r̄(0) = ⟨0, 0⟩ and r̄(2π) = ⟨2π, 0⟩. (5.16)

To determine the points where the tangent to the curve is horizontal, we determine t ∈
(−2π, 2π) where
y ′ (t) = sin t = 0 and x′ (t) = 1 − cos t ̸= 0.
Clearly,
y ′ (t) = 0 if and only if t = −π, t = 0 or t = π.

Furthermore,
x′ (0) = 0 and x′ (−π) = x′ (π) = 2 ̸= 0.
Therefore, according to Remark 5.4.3, the curve has a horizontal tangent at the points
r̄(−π) = ⟨−π, 2⟩ and r̄(π) = ⟨π, 2⟩ (5.17)
Note that Remark 5.4.3 does not give any information on what happens at points where
x′ (t) = y ′ (t) = 0; that is, at the point
r̄(0) = ⟨0, 0⟩. (5.18)
For −2π < t < 0, x′ (t) ̸= 0 so that this part of the curve is the graph y = f (x) of a function
f , and
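\[ \frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{\sin t}{1 - \cos t}. \]
Using l’Hospital’s Rule, we have
\[ \lim_{t \to 0^-} \frac{dy}{dx} = \lim_{t \to 0^-} \frac{\cos t}{\sin t} = -\infty. \]
In the same way,
\[ \lim_{t \to 0^+} \frac{dy}{dx} = \infty. \]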

Therefore the curve forms a ‘sharp cusp’ at the point r̄(0). We check whether or not the
curve has any vertical tangents. For −2π < t < 2π we have
x′ (t) = 1 − cos t = 0 if and only if t = 0.
This value of t has already been dealt with, so there are no points where the tangent to the
curve is vertical.

[Figure: sketch of the curve in the xy-plane, with the points r̄(−2π), r̄(−π), r̄(0), r̄(π) and r̄(2π) marked.]

Plotting the points in (5.14), (5.15), (5.16), (5.17) and (5.18), remembering where horizontal
tangents and ‘sharp cusps’ appear, and ‘connecting the dots’ in order of increasing t, we
arrive at a rough sketch of the curve.
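
The behaviour of the slope near the cusp in Example 5.4.6 can be observed numerically. The Python sketch below evaluates dy/dx = sin t/(1 − cos t) for small positive and negative t, and at t = ±π; the function name `slope` is our own. It is an illustration of the limits computed above, not a substitute for them.

```python
import math

def slope(t):
    """dy/dx = y'(t) / x'(t) for r(t) = <t - sin t, 1 - cos t>; requires cos t != 1."""
    return math.sin(t) / (1 - math.cos(t))

for t in (1e-1, 1e-2, 1e-3):
    print(f"t = {t:.0e}:  slope(t) = {slope(t):10.2f}   slope(-t) = {slope(-t):10.2f}")

print("slope at  pi:", slope(math.pi))
print("slope at -pi:", slope(-math.pi))
```

The slope grows without bound and is positive just to the right of t = 0 and negative just to the left, which matches the ‘sharp cusp’ at r̄(0). At t = ±π the slope is zero up to rounding error, in agreement with the horizontal tangents in (5.17).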

Theorem 5.4.1 gives a sufficient condition for (part of) a curve in R2 to be the graph
of a function f , and also determines the derivative of f in terms of the derivative of a
parameterisation of C. It should be noted that a given curve in R2 has more than one
parameterisation. It is possible that a curve has two parameterisations, say r̄ and v̄, where
r̄ satisfies the conditions of Theorem 5.4.1, while v̄ does not.

Exercise 5.4

1. Sketch the curve with given parameterisation, and indicate the end points of the curve,
all intercepts with the coordinate axes and turning points, including any ‘sharp cusps’.

3
(a) r̄(t) = ⟨t2 − 4t, t3 − t⟩, −2 ≤ t ≤ 3 (b) r̄(t) = ⟨t + 4t , t2 − 6t⟩, 1 ≤ t ≤ 4

(c) $\bar{r}(t) = \langle t^3 - 3t^2, t^3 - 3t \rangle$, −3.5 ≤ t ≤ 4    (d) $\bar{r}(t) = \langle e^{\sin t}, e^{\cos t} \rangle$, 0 ≤ t ≤ 2π

(e) $\bar{r}(t) = \langle t^3 - 3t, t^2 - 2t \rangle$, −2 ≤ t ≤ 2    (f) $\bar{r}(t) = \langle t^4 + 1, t^3 + t^2 \rangle$, −1 ≤ t ≤ 1

(g) r̄(t) = ⟨cos t + sin t cos t, sin t + sin t cos t⟩, 0 ≤ t ≤ 2π


2. Find all the points on the curve parameterised by $\bar{r}(t) = \langle \frac{5t^2}{2} - t, t^3 + \frac{t^2}{2} + 1 \rangle$ where
the slope $\frac{\Delta y}{\Delta x}$ of the tangent line to the curve is equal to 1.

3. Consider a vector function r̄ : [a, b] → R2 such that r̄(t) = ⟨x(t), y(t)⟩, t ∈ [a, b].
Assume that r̄′ (t) = ⟨x′ (t), y ′ (t)⟩ exists on (a, b) and is continuous at a point c ∈ (a, b).
If y ′ (c) ̸= 0, prove the following.
(a) There exist a real number δ > 0, an open interval I containing y(c) and a function
g : I → R so that for every t ∈ (c − δ, c + δ)

r̄(t) = ⟨x, y⟩ if and only if x = g(y) for some y ∈ I, x ∈ R.

(b) The function g in (a) is differentiable on I and for every y ∈ I with y = y(t),
t ∈ (c − δ, c + δ),
\[ g'(y) = \frac{x'(t)}{y'(t)} = \frac{dx/dt}{dy/dt}. \]

Chapter 6

Complex numbers

We know that the solutions of the equation

ax2 + bx + c = 0

with a ≠ 0, b and c constant real numbers, are given by

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \]
provided that $\Delta = b^2 - 4ac \geq 0$. If ∆ < 0, then the equation does not have real solutions.
Indeed, in this case there is no real number p so that $p = \sqrt{\Delta}$.

A similar but much more complicated formula exists for the solutions of the cubic equation

ax3 + bx2 + cx + d = 0, a ̸= 0.

However, even in the case when this equation has three real solutions, the formula may
contain the square root of a negative number, which is undefined in R. This problem inspired
the Italian mathematician Gerolamo Cardano to invent the complex numbers around the
year 1545. Since then, complex numbers have become a fundamental part of mathematics,
with numerous applications. This chapter is a brief introduction to the basic properties of
these numbers.

6.1 Definition and Algebraic Operations

Definition 6.1.1. A complex number is an ordered pair of real numbers, z = (a, b). The
real number a is called the real part of z, and the real number b is called the imaginary part
of z. We write Re(z) for the real part of z, and Im(z) for the imaginary part of z. We
denote by C the set of all complex numbers.

Definition 6.1.2. Two complex numbers z = (a, b) and w = (c, d) are equal if a = c and
b = d.

The algebraic operations on complex numbers are defined in the following way.

Definition 6.1.3. Let z = (a, b) and w = (c, d) be complex numbers. Then the sum of z
and w is the complex number
z + w = (a + c, b + d).
The product of z and w is the complex number

zw = (ac − bd, ad + bc).

The following example illustrates the preceding definitions.

Example 6.1.4. Let z = (2, 5) and w = (−3, 2). Then Re(z) = 2, Im(z) = 5, Re(w) = −3
and Im(w) = 2. Furthermore,

z + w = (2, 5) + (−3, 2) = (2 − 3, 5 + 2) = (−1, 7)

and
zw = (2, 5)(−3, 2) = (2(−3) − 5(2), 2(2) + 5(−3)) = (−16, −11).
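
Definition 6.1.3 translates directly into code. The Python sketch below represents a complex number as a pair of real numbers, exactly as in Definition 6.1.1, and reproduces the computations of Example 6.1.4. The function names `add` and `mul` are our own, and the comparison with Python's built-in `complex` type is included only as an independent cross-check.

```python
def add(z, w):
    """Sum of two complex numbers given as pairs (Definition 6.1.3)."""
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    """Product of two complex numbers given as pairs (Definition 6.1.3)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

z = (2, 5)
w = (-3, 2)

print("z + w =", add(z, w))
print("zw    =", mul(z, w))

# Cross-check against Python's built-in complex numbers.
builtin_product = complex(*z) * complex(*w)
print("built-in product:", builtin_product)
```

Both computations give the sum (−1, 7) and the product (−16, −11), as in Example 6.1.4.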

We now turn to the properties of the algebraic operations on C; that is, the arithmetic rules
for C.

Theorem 6.1.5. The following are true.

(1) Commutativity of addition: z + w = w + z for all z, w ∈ C.

(2) Associativity of addition: (z + w) + y = z + (w + y) for all z, w, y ∈ C.

(3) Existence of additive identity: z + (0, 0) = z for every z ∈ C.

(4) Existence of additive inverse: For every z ∈ C, there exists a number −z ∈ C so that
z + (−z) = (0, 0).

(5) Commutativity of multiplication: zw = wz for all z, w ∈ C.

(6) Associativity of multiplication: (zw)y = z(wy) for all z, w, y ∈ C.

(7) Existence of multiplicative identity: (1, 0)z = z for every z ∈ C.

(8) Distributive law: z(w + y) = zw + zy for all z, w, y ∈ C.

Proof of (3). For any complex number z = (a, b),

z + (0, 0) = (a + 0, b + 0) = (a, b) = z.

Proof of (6). Let z = (a, b), w = (c, d) and y = (e, f ) be complex numbers. Then

(zw) = (ac − bd, ad + bc).

Therefore
(zw)y = ([ac − db]e − [ad + bc]f, [ac − bd]f + [ad + bc]e)

= (ace − dbe − adf − bcf, acf − bdf + ade + bce).


But
wy = (ce − df, cf + de).
Therefore
z(wy) = (a[ce − df ] − b[cf + de], a[cf + de] + b[ce − df ])

= (ace − adf − bcf − bde, acf + ade + bce − bdf )

= (ace − dbe − adf − bcf, acf − bdf + ade + bce)

= (zw)y.

Proof of (7). For any complex number z = (a, b),

(1, 0)z = (1(a) − 0(b), 1(b) + 0(a)) = (a, b) = z.

The proofs of the remaining identities follow in the same way, and are given as exercises,
see Exercise 6.1 number 6.

Remark 6.1.6. Let z and w be complex numbers.

(1) The difference of z and w is the complex number z − w = z + (−w).

(2) If n is a positive integer, we denote by $z^n$ the complex number obtained by multiplying
z by itself n times. For instance, $z^3 = zzz$.

Remark 6.1.7. We associate with every real number a the complex number (a, 0). If a and
b are real numbers, then
(a, 0) + (b, 0) = (a + b, 0)
and
(a, 0)(b, 0) = (ab − 0(0), a(0) + 0(b)) = (ab, 0).
We may therefore think of the real numbers as a subset of the complex numbers. We call
a complex number z real if its imaginary part Im(z) is 0. That is, z is real if z = (a, 0)
for some real number a. The usual algebraic operations on real numbers agree with the
algebraic operations on complex numbers, when applied to real numbers. We therefore make
no distinction between a real number a and the complex number (a, 0). In particular, we
denote the complex number (0, 0) by 0, and call it ‘zero’.

Remark 6.1.8. Consider the complex numbers (1, 0) and i = (0, 1).

(1) Every complex number z = (a, b) can be written as

z = (a, 0) + (0, b) = (a, 0) + (b, 0)i.

Since we identify the real numbers a and b with the complex numbers (a, 0) and (b, 0),
we adopt the notation
z = a + bi
for the complex number z = (a, b). This is known as the standard form of a complex
number.

(2) If the imaginary part of a complex number z = a + bi is negative, we typically write
z = a − (−b)i instead of z = a + bi. For example, the complex number z = 2 + (−3)i
is written as z = 2 − 3i.

(3) If a complex number z = a + bi is real, that is, Im(z) = b = 0, it is acceptable to
write z = a instead of z = a + 0i. Likewise, if Re(z) = 0, we write z = bi instead of
z = 0 + bi.

(4) In standard form, the expressions defining addition and multiplication take the form

z + w = (a + c) + i(b + d), zw = (ac − bd) + i(ad + bc)

for complex numbers z = a + ib and w = c + id.

(5) Note that $i^2 = (0, 1)(0, 1) = (0^2 - 1^2, 0(1) + 1(0)) = (-1, 0) = -1$. As a motivation
for the definition of multiplication, consider the following calculation. For complex
numbers z = a + bi and w = c + di, we have

$$\begin{aligned}
zw &= (a + bi)(c + di) \\
&= a(c + di) + bi(c + di) \\
&= ac + adi + bci + bdi^2 \\
&= ac + adi + bci - bd \\
&= (ac - bd) + (ad + bc)i.
\end{aligned}$$

The following example illustrates the use of the properties of addition and multiplication
on C, given in Theorem 6.1.5.

Example 6.1.9. Let z = 2 + 3i, w = −7 + i and y = 2 − 4i. We write the following complex
numbers in standard form.

(a) $zw - y^2$

(b) $z(2w - zy)$

Solution. (a) We have

$$\begin{aligned}
zw - y^2 &= (2 + 3i)(-7 + i) - (2 - 4i)(2 - 4i) \\
&= -17 - 19i - (-12 - 16i) \\
&= -5 - 3i.
\end{aligned}$$

(b) We have

z(2w − zy) = (2 + 3i)(−14 + 2i − (2 + 3i)(2 − 4i))

= (2 + 3i)(−14 + 2i − (16 − 2i))

= (2 + 3i)(−30 + 4i)

= −72 − 82i.
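
The two answers in Example 6.1.9 can be double-checked with Python's built-in `complex` type, which writes the imaginary unit as `j`. This is only a numerical sanity check under that notation, not a replacement for the hand calculation.

```python
# The numbers of Example 6.1.9, written with Python's built-in complex type.
z = complex(2, 3)    # z = 2 + 3i
w = complex(-7, 1)   # w = -7 + i
y = complex(2, -4)   # y = 2 - 4i

part_a = z * w - y * y          # zw - y^2
part_b = z * (2 * w - z * y)    # z(2w - zy)

print(part_a)  # (-5-3j)
print(part_b)  # (-72-82j)
```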

Every nonzero real number has a multiplicative inverse; that is, if $a \neq 0$ is a real number,
then there exists a real number $a^{-1}$ so that $aa^{-1} = 1$. This is also true for complex numbers,
as we show in what follows.

Theorem 6.1.10. For every nonzero complex number z = a + ib, there exists a unique
complex number $z^{-1}$ so that $zz^{-1} = 1$.

Proof. Let z = a + bi be a fixed nonzero complex number. For any complex number
w = x + yi,
zw = (ax − by) + (bx + ay)i.
By the definition of equality of complex numbers, Definition 6.1.2, zw = 1 = 1 + 0i if and
only if
ax − by = 1
bx + ay = 0.
We express this system of equations in matrix form $A\bar{x} = \bar{b}$ where

$$A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \quad \bar{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \quad \bar{b} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$

Then

$$zw = 1 \text{ if and only if } A\bar{x} = \bar{b}. \tag{6.1}$$

We have
$$\det(A) = a^2 + b^2.$$
Because $z \neq 0$, at least one of a and b is nonzero, so that $\det(A) > 0$. Theorem 2.5.25
implies that the equation $A\bar{x} = \bar{b}$ has a unique solution. It now follows from (6.1) that there
exists a unique complex number $w = z^{-1}$ so that $zz^{-1} = 1$.

The multiplicative inverse of a nonzero complex number z can be calculated using the
conjugate of z, which we now introduce.

Definition 6.1.11. The conjugate of a complex number z = a + bi is the complex number


z̄ = a − bi.

Before showing how the conjugate is used to calculate the multiplicative inverse of a complex
number, we list some of its properties.

Theorem 6.1.12. Let z and w be complex numbers. Then the following are true.

(1) $\overline{z + w} = \bar{z} + \bar{w}$.

(2) $\overline{zw} = \bar{z}\,\bar{w}$.

(3) $z\bar{z}$ is a real number. In particular, if z = a + ib, then $z\bar{z} = a^2 + b^2$.

(4) $\overline{\bar{z}} = z$.

We prove (1), and give the proofs of (2) and (3) as exercises, see Exercise 6.1 number 7.

Proof of (1). Let z = a + bi and w = c + di. Then z + w = (a + c) + (b + d)i so that

$$\overline{z + w} = (a + c) + (-b - d)i.$$

Since $\bar{z} = a - bi$ and $\bar{w} = c - di$, we have

$$\bar{z} + \bar{w} = (a + c) + (-b - d)i = \overline{z + w}.$$
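
Theorem 6.1.13. Let z be a nonzero complex number. Then $z^{-1} = \dfrac{1}{z\bar{z}}\,\bar{z}$.

Proof. Let z = a + bi. By Theorem 6.1.12 (3), the complex number $z\bar{z} = a^2 + b^2$ is real.
Since $z \neq 0$, either $a \neq 0$ or $b \neq 0$. Therefore $z\bar{z}$ is nonzero. Let $w = \dfrac{1}{z\bar{z}}\,\bar{z}$. Then

$$w = \frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\, i$$

so that

$$\begin{aligned}
zw &= \left[ a\left(\frac{a}{a^2 + b^2}\right) - b\left(\frac{-b}{a^2 + b^2}\right) \right] + \left[ a\left(\frac{-b}{a^2 + b^2}\right) + b\left(\frac{a}{a^2 + b^2}\right) \right] i \\
&= \left( \frac{a^2}{a^2 + b^2} + \frac{b^2}{a^2 + b^2} \right) + \left( \frac{-ab}{a^2 + b^2} + \frac{ab}{a^2 + b^2} \right) i \\
&= 1 + 0i \\
&= 1.
\end{aligned}$$

But z has exactly one multiplicative inverse by Theorem 6.1.10, so $z^{-1} = w = \dfrac{1}{z\bar{z}}\,\bar{z}$.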

Using the multiplicative inverse of a nonzero complex number, we can define division for
complex numbers.

Definition 6.1.14. Let z and w be complex numbers, with $w \neq 0$. Then $\dfrac{z}{w} = zw^{-1}$.

Example 6.1.15. We write the complex number $\dfrac{zy + y^2}{w}$ in standard form, where z = 2 + i,
w = 1 − 3i and y = 2i.

Solution. According to Definition 6.1.14,

$$\frac{zy + y^2}{w} = (zy + y^2)w^{-1}.$$

We have

$$zy + y^2 = y(z + y) = 2i(2 + 3i) = -6 + 4i.$$

Furthermore,

$$w^{-1} = \frac{1}{w\bar{w}}\,\bar{w} = \frac{1}{10}(1 + 3i) = \frac{1}{10} + \frac{3}{10}\, i.$$

So

$$\frac{zy + y^2}{w} = (-6 + 4i)\left( \frac{1}{10} + \frac{3}{10}\, i \right) = -\frac{18}{10} - \frac{14}{10}\, i = -\frac{9}{5} - \frac{7}{5}\, i.$$
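
The formula of Theorem 6.1.13 can be sketched in exact arithmetic. The code below is an illustration only: the helper names `mul` and `inverse` are assumptions made here, complex numbers are stored as pairs with integer entries, and Python's `fractions.Fraction` is used so that no rounding occurs. It recomputes the inverse and quotient of Example 6.1.15.

```python
from fractions import Fraction

def mul(z, w):
    """Product of the pairs (a, b) and (c, d), as in Definition 6.1.3."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

def inverse(z):
    """Inverse of a + bi with integer a, b, using z^{-1} = conj(z) / (z conj(z))."""
    a, b = z
    n = a * a + b * b  # z * conj(z) = a^2 + b^2, see Theorem 6.1.12 (3)
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (Fraction(a, n), Fraction(-b, n))

# Example 6.1.15: w = 1 - 3i and zy + y^2 = -6 + 4i.
w_inv = inverse((1, -3))
quotient = mul((-6, 4), w_inv)
print(w_inv)     # (Fraction(1, 10), Fraction(3, 10))
print(quotient)  # (Fraction(-9, 5), Fraction(-7, 5))
```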

Remark 6.1.16. Using the multiplicative inverse of a nonzero complex number, we can
define negative integer powers for complex numbers. If z ∈ C is nonzero, and n is a positive
integer, then $z^{-n} = (z^{-1})^n$. For instance, if z = 2 + i, then

$$z^{-2} = (z^{-1})^2 = \left( \frac{2}{5} - \frac{1}{5}\, i \right)^2 = \frac{3}{25} - \frac{4}{25}\, i.$$

We have defined addition and multiplication for elements of C, and shown that these
operations satisfy all the basic properties of addition and multiplication of real numbers. These
are the properties that we expect ‘numbers’ to satisfy, so we are justified in calling the
elements of C complex numbers. We should note that it is by no means obvious that
operations of ‘addition’ and ‘multiplication’ which we may define on a set will satisfy all the
familiar properties given in Theorem 6.1.5. Indeed, in Section 2.2 we saw that multiplication
of, say, 2 × 2 matrices does not satisfy the commutative law, and a nonzero 2 × 2 matrix
does not always have a multiplicative inverse. The complex numbers, with addition and
multiplication as defined in this section, are therefore rather special.

Exercise 6.1

1. Write each of the following complex numbers in standard form.


(a) (1 − 6i) − (−3 − 4i) (b) i(1 − 2i)

(c) (i − 7)(3 + i) (d) (1 + 3i)(1 − 3i)

(e) (3 − i)(4 + 2i) − 2i(6 − i) (f) $(2 - i)^3$

(g) $i^3$, $i^4$, $i^5$, $i^6$, $i^7$, $i^{25}$

2. Write each of the following complex numbers in standard form.

$$\frac{1}{1+i}, \quad \frac{1}{2-3i}, \quad \frac{2+3i}{1-i}, \quad \frac{1+i}{2-3i}, \quad \frac{1}{3} + \frac{1}{5i}, \quad \frac{(2-i)^2 + (1+i)}{2+2i}$$

3. If z = a + bi, show that $a = \dfrac{1}{2}(z + \bar{z})$ and $b = \dfrac{1}{2i}(z - \bar{z})$.

4. Show that z + z̄ is real for any complex number z.

5. Let z be a complex number. Show that z = z̄ if and only if z is real.

6. Prove Theorem 6.1.5 (1), (5) and (8).

7. Prove Theorem 6.1.12 (2) and (3).

8. Find all real numbers α so that cos(α) + sin(α)i = 1 + i.

9. Consider the equation

$$ax^2 + bx + c = 0$$

where a, b and c are real numbers such that $a \neq 0$. Suppose that $b^2 - 4ac < 0$. Show
that the complex numbers

$$x = -\frac{b}{2a} \pm \frac{\sqrt{4ac - b^2}}{2a}\, i$$

are solutions of the equation.

10. Find all (possibly complex) solutions for the following equations.

(a) $x^2 + 1 = 0$ (b) $2x^2 - x + 4 = 0$ (c) $x^3 + 2x^2 + 3x = 0$

6.2 Modulus of a Complex Number

The real numbers are usually represented geometrically as points on an infinite, straight
line, while vectors in R2 may be interpreted as points in the plane, as discussed in Chapter
1. In this section we give a geometric interpretation of complex numbers, providing us with
a way in which to visualize these numbers and the algebraic operations defined in Section
6.1.

We identify each complex number z = a + bi with the point (a, b) in the plane, where a
is measured along the horizontal axis and b along the vertical axis. The horizontal axis is
called the real axis, and the vertical axis is called the imaginary axis. The resulting plane is
called the complex plane, or the Argand plane, named after the French-Swiss mathematician
Jean Robert Argand (1768 - 1822). In the figure below, the complex numbers x = −3 + 2i,
z = −1 − 2i, w = 2 + i, w̄ and z + w are plotted.

[Figure: the complex numbers x, z, w, w̄ and z + w plotted in the Argand plane, with the real axis (Re) horizontal and the imaginary axis (Im) vertical.]

The absolute value |a| of a real number a is interpreted as the distance between the point
on the line corresponding to a, and the point corresponding to 0. The modulus of a complex
number is a generalisation of the absolute value of a real number, and has a similar geometric
interpretation.

Definition 6.2.1. The modulus of a complex number z = a + bi is the real number
$|z| = \sqrt{a^2 + b^2}$.

Example 6.2.2. Let z = 2 − 3i. Then $|z| = \sqrt{2^2 + (-3)^2} = \sqrt{13}$.

Geometrically, the modulus of z = a + bi may be interpreted as the distance between z and
the complex number 0 in the Argand plane, see the figure below.

[Figure: a point z in the Argand plane, with |z| marked as the length of the line segment joining 0 and z.]

More generally, if z and w are complex numbers, then the real number |z − w| may be
interpreted as the distance between z and w in the complex plane.

If a complex number z = a + bi is real, that is, if b = 0, then the modulus of z is equal to
the absolute value of a. Indeed, in this case

$$|z| = \sqrt{a^2 + 0^2} = \sqrt{a^2} = \begin{cases} a & \text{if } a \geq 0 \\ -a & \text{if } a < 0. \end{cases}$$

We therefore expect that the modulus will satisfy properties similar to those of the absolute
value. This fact is established in the following theorem.

Theorem 6.2.3. Let z and w be complex numbers. Then the following statements are true.

(1) |z| ≥ 0, and |z| = 0 if and only if z = 0.

(2) Re(z) ≤ |z| and Im(z) ≤ |z|.

(3) $|z|^2 = z\bar{z}$.

(4) |zw| = |z||w|.

(5) |z̄| = |z|.

(6) If $w \neq 0$, then $|w^{-1}| = \dfrac{1}{|w|}$.

(7) If $w \neq 0$, then $\left| \dfrac{z}{w} \right| = \dfrac{|z|}{|w|}$.

(8) |z + w| ≤ |z| + |w|.

Proof of (1). Let z = a + bi. By definition, $|z| = \sqrt{a^2 + b^2} \geq 0$. If z = 0, then
$|z| = \sqrt{0^2 + 0^2} = 0$. If |z| = 0, then $|z|^2 = a^2 + b^2 = 0$. Because $a^2 \geq 0$ and $b^2 \geq 0$, it follows that
$a^2 = b^2 = 0$ so that a = b = 0. Therefore z = 0.

Proof of (8). According to property (3) in the theorem, $|z + w|^2 = (z + w)\overline{(z + w)}$. It
follows from Theorem 6.1.12 (1) that

$$|z + w|^2 = z\bar{z} + z\bar{w} + w\bar{z} + w\bar{w} = |z|^2 + z\bar{w} + w\bar{z} + |w|^2.$$

By Theorem 6.1.12 (2) and (4), $w\bar{z} = \overline{z\bar{w}}$. Therefore, see Exercise 6.1 number 3,

$$z\bar{w} + w\bar{z} = z\bar{w} + \overline{z\bar{w}} = 2\,\mathrm{Re}(z\bar{w}).$$

By properties (2), (4) and (5) of this theorem,

$$z\bar{w} + w\bar{z} \leq 2|z\bar{w}| = 2|z||w|.$$

Therefore
$$|z + w|^2 \leq |z|^2 + 2|z||w| + |w|^2 = (|z| + |w|)^2.$$
Because |z + w| ≥ 0 and |z| + |w| ≥ 0 by property (1) of this theorem, it follows that
|z + w| ≤ |z| + |w|.

The proofs of the remaining results are given as exercises, see Exercise 6.2 numbers 5 to 8.
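
A numerical spot check can make the statements of Theorem 6.2.3 more concrete, although it proves nothing. The sketch below assumes Python's built-in `complex` type; the function name `modulus` and the sample values of z and w are chosen here purely for illustration.

```python
import math

def modulus(z: complex) -> float:
    """Modulus of Definition 6.2.1: |a + bi| = sqrt(a^2 + b^2)."""
    return math.sqrt(z.real ** 2 + z.imag ** 2)

z = complex(2, -3)   # the number of Example 6.2.2
w = complex(-1, 4)   # an arbitrary second sample

# Property (3): |z|^2 = z * conj(z)
print(math.isclose(modulus(z) ** 2, (z * z.conjugate()).real))
# Property (4): |zw| = |z||w|
print(math.isclose(modulus(z * w), modulus(z) * modulus(w)))
# Property (8): |z + w| <= |z| + |w|
print(modulus(z + w) <= modulus(z) + modulus(w))
```

Each line prints `True` for these samples; floating-point comparisons use `math.isclose` because square roots are only computed approximately.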

We have given a geometric representation of complex numbers as points in the Argand plane,
providing us with a useful way to visualise complex numbers. The modulus of a complex
number adds further structure and richness to this geometric interpretation, and as we will
see in the next section, it enables us to obtain an alternative description of complex numbers,
one which is very useful from a computational point of view.

Exercise 6.2

1. Find the modulus of the complex numbers w = 2 − i, x = 3 + 2i, y = −5 + 4i and
z = 6i. Also plot each of these complex numbers on the Argand plane.

2. Sketch each of the following sets in the complex plane.

(a) A = {z ∈ C : Re(z) ≥ 1} (b) B = {z ∈ C : Im(z) ≤ 1, Re(z) ≥ 0}

(c) C = {z ∈ C : |z| ≤ 2} (d) D = {z ∈ C : |z − 2 + i| ≤ 2}

(e) E = {z ∈ C : |z| ≥ 3} (f) F = {z ∈ C : |z| ≤ 3, Im(z) ≥ 1}


3. Find all complex numbers z so that |z| = 2 and |z − (1 + i)| = 1. Interpret the set of
solutions geometrically, by sketching the appropriate sets in the Argand plane.

4. Find all complex numbers z so that |z| = |z + 1 − i|. Interpret the set of solutions
geometrically, by sketching the appropriate set in the Argand plane.

5. Prove Theorem 6.2.3 (2).

6. Prove Theorem 6.2.3 (4).

7. Prove Theorem 6.2.3 (6).
8. Prove Theorem 6.2.3 (7).
9. Let z be a complex number, and n a positive integer.
(a) Use Mathematical Induction to prove that $|z^n| = |z|^n$.
(b) Use (a) to prove that $|z^{-n}| = |z|^{-n}$ if $z \neq 0$.

6.3 Polar Form and de Moivre’s Theorem

There is typically more than one way in which to represent a given mathematical object. For
instance, a function f : R → R can be represented by a formula, such as $f(x) = x^2 - xe^x$, or
its graph. Depending on the situation at hand, one representation may be more useful than
another. In this section, we introduce a new representation of a complex number z = a + bi.
This representation is motivated by the definition of multiplication of complex numbers
which is cumbersome when dealing with expressions such as $(1 + 2i)^{10}$.

We start with the following observation.


Theorem 6.3.1. If p and q are real numbers so that $p^2 + q^2 = 1$, then there exists a unique
$\theta \in (-\pi, \pi]$ so that $p = \sin\theta$ and $q = \cos\theta$.

Proof. Since $p^2 \geq 0$ and $q^2 \geq 0$, it follows from $p^2 + q^2 = 1$ that $p^2 \leq 1$ and $q^2 \leq 1$. Hence

$$-1 \leq p \leq 1 \quad \text{and} \quad -1 \leq q \leq 1.$$

Since cos(π) = −1 and cos(0) = 1 and cos θ is continuous on [0, π], it follows from the
Intermediate Value Theorem, Theorem A.1.1, that there exists α ∈ [0, π] so that cos α = q.
Since cos θ is strictly decreasing on [0, π], there is only one such α.

If p ≥ 0, let θ = α. Then q = cos θ and

$$p^2 = 1 - q^2 = 1 - \cos^2\theta = \sin^2\theta.$$

Because θ ∈ [0, π], sin θ ≥ 0. Furthermore, p ≥ 0 so that

$$p = \sqrt{p^2} = \sqrt{\sin^2\theta} = \sin\theta.$$

If p < 0, let θ = −α. Then cos θ = cos(−α) = cos α = q. Since $p \neq 0$ and $p^2 + q^2 = 1$, it
follows that $\cos\theta = q \neq \pm 1$. Hence θ ∈ (−π, 0). As in the case p ≥ 0, we have $p^2 = \sin^2\theta$.
Because θ ∈ (−π, 0), it follows that sin θ < 0. Furthermore, p < 0 so that

$$p = -\sqrt{p^2} = -\sqrt{\sin^2\theta} = \sin\theta.$$

Since cos θ is strictly increasing on (−π, 0), there is exactly one value for θ ∈ (−π, 0) so that
cos θ = q.

Theorem 6.3.2. Let z = a + bi be a nonzero complex number. Then there exists a unique
real number θ ∈ (−π, π] such that Re(z) = r cos θ and Im(z) = r sin θ, where r = |z|.

Proof. By definition of the modulus of z it follows that

$$\left( \frac{a}{r} \right)^2 + \left( \frac{b}{r} \right)^2 = \frac{a^2 + b^2}{r^2} = 1.$$

Theorem 6.3.1 implies that there exists exactly one θ ∈ (−π, π] so that
Re(z) = a = r cos θ and Im(z) = b = r sin θ.

We are now in a position to introduce the promised representation of complex numbers.


Definition 6.3.3. Let z be a complex number. Then a polar form of z is
z = r[cos(θ) + sin(θ)i], if z ̸= 0
and
z = 0[cos(0) + sin(0)i], if z = 0
where r = |z| and θ is a real number so that Re(z) = r cos θ and Im(z) = r sin θ.
Remark 6.3.4. It follows from Theorem 6.3.2 that for any complex number z there exist
real numbers r ≥ 0 and θ so that
z = r cos(θ) + r sin(θ)i.
Since the sine and cosine functions are both periodic with period 2π,
z = r cos(θ + 2πk) + r sin(θ + 2πk)i, k ∈ Z.
There are therefore infinitely many ways in which to express a complex number in polar
form. However, by Theorem 6.3.2, there is exactly one number θ ∈ (−π, π] so that z =
r cos(θ) + r sin(θ)i. This number θ is called the principal argument of z.

The polar form of a complex number can be interpreted geometrically as in the sketch below.
For a nonzero z ∈ C, r is the length of the line segment joining 0 and z in the Argand plane,
and θ is the angle formed at 0 by this line segment and the positive real axis, measured
anticlockwise for θ ≥ 0 and clockwise for θ < 0.
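[Figure: two sketches of a point z in the Argand plane, showing the angle θ between the positive real axis and the line segment joining 0 and z. Left panel: positive value for θ. Right panel: negative value for θ.]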

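Example 6.3.5. We express $z = -\sqrt{3} + i$ in polar form with principal argument.

Solution. Firstly, r = |z| = 2. To find θ, note that

$$\cos\theta = \frac{\mathrm{Re}(z)}{r} = -\frac{\sqrt{3}}{2} \quad \text{with } \theta \in (-\pi, \pi].$$

Therefore $\theta = \dfrac{5\pi}{6}$ or $\theta = -\dfrac{5\pi}{6}$. But

$$\sin\theta = \frac{\mathrm{Im}(z)}{r} = \frac{1}{2} \quad \text{with } \theta \in (-\pi, \pi].$$

Therefore $\theta = \dfrac{5\pi}{6}$. Hence the polar form of z with principal argument is

$$z = 2\left[ \cos\left( \frac{5\pi}{6} \right) + \sin\left( \frac{5\pi}{6} \right) i \right].$$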
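
The modulus and principal argument found in Example 6.3.5 can be checked numerically. The sketch below uses `cmath.polar` from Python's standard library, which returns the pair (r, θ) with θ in [−π, π]; apart from the endpoint −π, which the library can return in a signed-zero edge case, this matches the principal argument used in these notes.

```python
import cmath
import math

# z = -sqrt(3) + i from Example 6.3.5
z = complex(-math.sqrt(3), 1)

r, theta = cmath.polar(z)  # modulus and argument of z
print(r, theta)            # approximately 2 and 5*pi/6

# Rebuild z from its polar form r[cos(theta) + sin(theta) i].
rebuilt = r * complex(math.cos(theta), math.sin(theta))
```

The values are floating-point approximations, so they should be compared with a tolerance rather than with exact equality.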

In polar form, multiplication of complex numbers has a particularly simple form. The proof
of this result is given as an exercise, see Exercise 6.3 number 3.
Theorem 6.3.6. Consider complex numbers z and w in polar form, z = r1 [cos(θ1 )+sin(θ1 )i]
and w = r2 [cos(θ2 ) + sin(θ2 )i]. Then

zw = r1 r2 [cos(θ1 + θ2 ) + sin(θ1 + θ2 )i].

Remark 6.3.7. For complex numbers z = r1 [cos(θ1 )+sin(θ1 )i] and w = r2 [cos(θ2 )+sin(θ2 )i]
in polar form,
zw = r1 r2 [cos(θ1 + θ2 ) + sin(θ1 + θ2 )i]
by Theorem 6.3.6. However, it is not necessarily true that $\theta_1 + \theta_2$ is the principal argument of
zw, even if $\theta_1$ and $\theta_2$ are the principal arguments of z and w, respectively. It may happen that
$\theta_1 + \theta_2 > \pi$ or $\theta_1 + \theta_2 \leq -\pi$. For instance, if $z = \cos(\pi) + \sin(\pi)i$ and $w = \cos(\frac{\pi}{2}) + \sin(\frac{\pi}{2})i$,
then

$$zw = \cos\left( \frac{3\pi}{2} \right) + \sin\left( \frac{3\pi}{2} \right) i.$$

But the principal argument of zw is $-\dfrac{\pi}{2}$.

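Example 6.3.8. Let $z = 2\sqrt{3} - 2i$ and $w = 4 + 4i$. We calculate zw using Theorem 6.3.6.

Solution. In polar form, $z = 4[\cos(-\frac{\pi}{6}) + \sin(-\frac{\pi}{6})i]$ and $w = 4\sqrt{2}[\cos(\frac{\pi}{4}) + \sin(\frac{\pi}{4})i]$.
Therefore

$$zw = 16\sqrt{2}\cos\left( \frac{\pi}{4} - \frac{\pi}{6} \right) + 16\sqrt{2}\sin\left( \frac{\pi}{4} - \frac{\pi}{6} \right) i = 16\sqrt{2}\cos\left( \frac{\pi}{12} \right) + 16\sqrt{2}\sin\left( \frac{\pi}{12} \right) i.$$

Using the double angle formula for cosine, we have

$$\cos\frac{\pi}{12} = \frac{1}{\sqrt{2}}\left( 1 + \cos\frac{\pi}{6} \right)^{1/2} = \frac{\sqrt{2 + \sqrt{3}}}{2}$$

and

$$\sin\frac{\pi}{12} = \frac{1}{\sqrt{2}}\left( 1 - \cos\frac{\pi}{6} \right)^{1/2} = \frac{\sqrt{2 - \sqrt{3}}}{2}.$$

Therefore

$$zw = 8\sqrt{4 + 2\sqrt{3}} + 8\sqrt{4 - 2\sqrt{3}}\, i.$$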
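
Multiplication in polar form, together with the adjustment of the argument discussed in Remark 6.3.7, can be sketched as follows. The helper names `principal` and `polar_mul` are assumptions made for this illustration; `cmath.rect(r, theta)` is the standard-library function that converts polar coordinates back to a complex number.

```python
import cmath
import math

def principal(theta: float) -> float:
    """Shift theta by a multiple of 2*pi so that it lies in (-pi, pi]."""
    two_pi = 2 * math.pi
    theta = theta % two_pi   # now in [0, 2*pi), up to rounding
    if theta > math.pi:
        theta -= two_pi      # move (pi, 2*pi) down to (-pi, 0)
    return theta

def polar_mul(r1: float, t1: float, r2: float, t2: float):
    """Theorem 6.3.6: multiply the moduli and add the arguments."""
    return r1 * r2, principal(t1 + t2)

# Remark 6.3.7: arguments pi and pi/2 add to 3*pi/2, which is not principal.
r, t = polar_mul(1.0, math.pi, 1.0, math.pi / 2)
print(t)  # approximately -pi/2

# Example 6.3.8: z = 2*sqrt(3) - 2i and w = 4 + 4i.
R, T = polar_mul(4.0, -math.pi / 6, 4 * math.sqrt(2), math.pi / 4)
product = cmath.rect(R, T)
direct = complex(2 * math.sqrt(3), -2) * complex(4, 4)
print(product, direct)  # the two values agree up to rounding
```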

Using Theorem 6.3.6 and the Principle of Mathematical Induction, see Appendix A.2, we
obtain the following powerful result, known as De Moivre’s Theorem.

Theorem 6.3.9 (De Moivre’s Theorem). Consider a complex number z in polar form,
z = r[cos(θ) + sin(θ)i]. If n is a nonzero integer, then

$$z^n = r^n[\cos(n\theta) + \sin(n\theta)i].$$

Proof. We prove the result for n > 0. The case when n < 0 is given as an exercise, see
Exercise 6.3 number 5.
It is trivially true that

$$z^1 = z = r[\cos(\theta) + \sin(\theta)i] = r^1[\cos(1 \times \theta) + \sin(1 \times \theta)i].$$

Hence the result is true in the case n = 1.

Fix any integer k ≥ 1, and assume the result holds for k. Then

$$z^k = r^k[\cos(k\theta) + \sin(k\theta)i].$$

By Theorem 6.3.6,

$$\begin{aligned}
z^{k+1} &= z^k z \\
&= (r^k\cos(k\theta) + r^k\sin(k\theta)i)(r\cos(\theta) + r\sin(\theta)i) \\
&= r^k r[\cos(k\theta + \theta) + \sin(k\theta + \theta)i] \\
&= r^{k+1}[\cos((k + 1)\theta) + \sin((k + 1)\theta)i].
\end{aligned}$$

Therefore, if $z^k = r^k[\cos(k\theta) + \sin(k\theta)i]$ for some fixed but arbitrary integer k ≥ 1, then
$z^{k+1} = r^{k+1}[\cos((k + 1)\theta) + \sin((k + 1)\theta)i]$. By the Principle of Mathematical Induction,
$z^n = r^n[\cos(n\theta) + \sin(n\theta)i]$ for all integers n ≥ 1.

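Example 6.3.10. For z = 1 − i, we calculate $z^9$ and $z^{-5}$, and write the answers in standard
form.

Solution. The polar form of z is

$$z = \sqrt{2}\left[ \cos\left( -\frac{\pi}{4} \right) + \sin\left( -\frac{\pi}{4} \right) i \right].$$

Therefore, by De Moivre’s Theorem 6.3.9,

$$\begin{aligned}
z^9 &= 2^{9/2}\left[ \cos\left( -\frac{9\pi}{4} \right) + \sin\left( -\frac{9\pi}{4} \right) i \right] \\
&= 2^{9/2}\left[ \cos\left( -2\pi - \frac{\pi}{4} \right) + \sin\left( -2\pi - \frac{\pi}{4} \right) i \right] \\
&= 2^{9/2}\left[ \cos\left( -\frac{\pi}{4} \right) + \sin\left( -\frac{\pi}{4} \right) i \right] \\
&= 16 - 16i.
\end{aligned}$$

In the same way, we have

$$\begin{aligned}
z^{-5} &= 2^{-5/2}\left[ \cos\left( \frac{5\pi}{4} \right) + \sin\left( \frac{5\pi}{4} \right) i \right] \\
&= 2^{-5/2}\left[ \cos\left( 2\pi - \frac{3\pi}{4} \right) + \sin\left( 2\pi - \frac{3\pi}{4} \right) i \right] \\
&= 2^{-5/2}\left[ \cos\left( -\frac{3\pi}{4} \right) + \sin\left( -\frac{3\pi}{4} \right) i \right] \\
&= -\frac{1}{8} - \frac{1}{8}\, i.
\end{aligned}$$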
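
De Moivre's Theorem can be turned into a short routine for integer powers. The sketch below is an illustration under the assumption that z is nonzero; the function name `de_moivre` is chosen here, and the results are floating-point approximations of the exact values in Example 6.3.10.

```python
import cmath

def de_moivre(z: complex, n: int) -> complex:
    """Compute z^n for nonzero z via z^n = r^n [cos(n theta) + sin(n theta) i]."""
    r, theta = cmath.polar(z)             # polar form of z
    return cmath.rect(r ** n, n * theta)  # raise the modulus, scale the argument

z = complex(1, -1)
ninth = de_moivre(z, 9)
minus_fifth = de_moivre(z, -5)
print(ninth)        # approximately 16 - 16j
print(minus_fifth)  # approximately -0.125 - 0.125j
```

Note that the routine does not reduce n·θ to a principal argument; this is harmless because sine and cosine are periodic.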

We have introduced an alternative representation of a complex number, namely, its polar
form. This new perspective has a significant computational advantage. Indeed, Theorem
6.3.9 shows that terms of the form $z^n$, with n an integer, are easily calculated using the
polar form of z. This should be compared with the rather tedious manipulations involved
in calculating, say, $(2 - 3i)^9$ in standard form. This is just one example among many in
mathematics where a change of perspective clarifies a complicated situation.

Exercise 6.3

1. Consider the following complex numbers.


√ √ √
z = 2 − 2i, w = −i, x = 1 + 3i, y = −2 3 − 2i, v = −2 + 3i.

(a) Express each number in polar form, with principal argument.


(b) Use your answers in (a) to calculate each of the following. Give your answer in
standard form.
$$z^6, \quad wy, \quad w^{100}, \quad v^{-6}, \quad x^4 + w^{100}$$
2. Let A be the set of all complex numbers z with principal argument $\theta \in [\frac{\pi}{4}, \frac{5\pi}{6}]$ so that
1 ≤ |z| ≤ 2. Sketch A on the Argand plane.
3. Prove Theorem 6.3.6.
4. Prove that if z = r[cos(θ) + sin(θ)i], then z̄ = r[cos(−θ) + sin(−θ)i].

5. Complete the proof of Theorem 6.3.9 by considering the case n < 0. [HINT: Use
Theorem 6.1.10 and the result in 4. above.]

6. Suppose that z = cos(θ) + sin(θ)i with $\theta \neq 0$. Geometrically, what is the effect of
multiplying a nonzero complex number w by z? [HINT: Sketch w, zw and the circle
with radius |z| and centre at 0 on the Argand plane.]

7. Consider complex numbers $z = r_1[\cos(\theta_1) + \sin(\theta_1)i]$ and $w = r_2[\cos(\theta_2) + \sin(\theta_2)i]$ in
polar form, with $w \neq 0$. Prove that

$$\frac{z}{w} = \frac{r_1}{r_2}[\cos(\theta_1 - \theta_2) + \sin(\theta_1 - \theta_2)i].$$

8. Use the fact that (2 + i)(3 + i) = 5 + 5i to prove that

$$\frac{\pi}{4} = \arctan\left( \frac{1}{2} \right) + \arctan\left( \frac{1}{3} \right).$$

[HINT: Write the numbers 2 + i, 3 + i and 5 + 5i in polar form and use the fact that
$\tan\theta = \dfrac{\sin\theta}{\cos\theta}$.]

6.4 The Complex Exponential

The exponential function $f(x) = e^x$, with x ∈ R, is a fundamental object in mathematics.
There is hardly a field in mathematics in which this function, in one form or another, does
not play a natural and important role. In this section we show how $e^z$ is to be defined for
a complex number z.

Definition 6.4.1. For a complex number z = a + bi, define $e^z$ to be the complex number
$e^z = e^a[\cos(b) + \sin(b)i]$.

Remark 6.4.2. If the complex number z = a + bi is real, that is, if b = 0, then
$e^z = e^a[\cos(0) + \sin(0)i] = e^a$. Therefore our definition of complex exponentiation is an extension
of the familiar real exponentiation.

The following properties of the complex exponential are easily verified by using the formula
for multiplication of complex numbers in polar form, given in Theorem 6.3.6, and De
Moivre’s Theorem, Theorem 6.3.9. The proofs are given as Exercise 6.4 number 1.

Theorem 6.4.3. Let z and w be complex numbers and n an integer. Then the following
statements are true.

(1) $e^z e^w = e^{z+w}$.

(2) $(e^z)^n = e^{nz}$.

The complex exponential provides a convenient way in which to express the polar form of a
complex number. In this regard, observe that for a real number θ we have

$$e^{\theta i} = e^0[\cos(\theta) + \sin(\theta)i] = \cos(\theta) + \sin(\theta)i. \tag{6.2}$$

Thus for a complex number z = r[cos(θ) + sin(θ)i] in polar form, we have

$$z = re^{\theta i}. \tag{6.3}$$

From this point onward, we adopt the notation in (6.3) for the polar form of a complex
number. Note that the formula for $z^n$ given in De Moivre’s Theorem 6.3.9 reads

$$z^n = r^n e^{n\theta i}$$

in the newly adopted notation.

The identity (6.2) implies a fundamental result in mathematics, known as Euler’s Identity,
which connects five of the most important constants in mathematics, namely e, π, i, 1 and 0.
Euler’s Identity states that
$$e^{\pi i} + 1 = 0.$$

It should be noted that the complex exponential function $\exp(z) = e^z$ satisfies many of the
properties of the usual real exponential functions, and some surprising new ones. However,
the further study of this function is far beyond the scope of this text.
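
Definition 6.4.1 can be compared numerically with the complex exponential provided by Python's standard library. In the sketch below, `complex_exp` is a name chosen for this illustration and implements the definition directly, while `cmath.exp` is the library function. Because π is only represented approximately, Euler's Identity holds up to a small rounding error rather than exactly.

```python
import cmath
import math

def complex_exp(z: complex) -> complex:
    """Definition 6.4.1: e^(a + bi) = e^a [cos(b) + sin(b) i]."""
    a, b = z.real, z.imag
    return math.exp(a) * complex(math.cos(b), math.sin(b))

z = complex(1, 2)     # arbitrary sample values
w = complex(-0.5, 3)

# The definition agrees with the library's complex exponential.
print(cmath.isclose(complex_exp(z), cmath.exp(z)))
# Theorem 6.4.3 (1): e^z e^w = e^(z + w), up to rounding.
print(cmath.isclose(complex_exp(z) * complex_exp(w), complex_exp(z + w)))
# Euler's Identity: e^(pi i) + 1 is zero up to rounding.
euler_residual = abs(complex_exp(complex(0, math.pi)) + 1)
print(euler_residual)
```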

Exercise 6.4

1. Use Theorems 6.3.6 and 6.3.9 to prove Theorem 6.4.3.

2. Prove Euler’s Identity.

3. Write each of the given complex numbers in polar form $re^{\theta i}$.



1 − i, 3 − 2i, 4 − 4i, − i

6.5 Roots of Complex Numbers

In this section we discuss, as an application of de Moivre’s Theorem, nth roots of complex
numbers. Recall that for a real number a and a positive integer n, an nth root of a is a
real number c so that $c^n = a$. If a < 0 and n is even, then no real nth root of a exists. For
instance, there is no real number c so that $c^2 = -1$. On the other hand, if n is odd, then
a has exactly one nth root, and for a > 0 and n even, a has exactly two nth roots. As is
shown in what follows, the situation is very different for complex numbers.

Definition 6.5.1. Let z be a complex number and n a positive integer. An nth root of z is
a complex number w so that $w^n = z$.


Example 6.5.2. Let z = i. We show that $w = \dfrac{\sqrt{3}}{2} + \dfrac{1}{2}\, i$ is a cube root of z.

Solution. The polar form of w is $w = e^{\frac{\pi}{6} i}$. Applying de Moivre’s Theorem 6.3.9 we have
$w^3 = e^{\frac{\pi}{2} i} = i$. Therefore w is a cube root of z = i.

The main result concerning nth roots of complex numbers states that every nonzero complex
number has exactly n distinct complex nth roots. The proof of this result makes use of the
following theorem.

Theorem 6.5.3. Consider nonzero complex numbers $z = r_1 e^{\theta_1 i}$ and $w = r_2 e^{\theta_2 i}$. Then
z = w if and only if $r_1 = r_2$ and $\theta_1 = \theta_2 + 2\pi k$ for some integer k.

Proof. Suppose that z = w. Then $r_1 = |z| = |w| = r_2$. Furthermore,

$$r_1\cos(\theta_1) + r_1\sin(\theta_1)i = z = w = r_2\cos(\theta_2) + r_2\sin(\theta_2)i.$$

By Definition 6.1.2 and the fact that $r_1 = r_2 \neq 0$ we have

$$\cos\theta_1 = \cos\theta_2 \quad \text{and} \quad \sin\theta_1 = \sin\theta_2.$$

Therefore

$$\cos(\theta_1 - \theta_2) = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2 = \cos^2\theta_1 + \sin^2\theta_1 = 1.$$

Therefore $\theta_1 - \theta_2 = 2\pi k$, hence $\theta_1 = \theta_2 + 2\pi k$, for some integer k.

We leave the proof of the converse statement as an exercise, see Exercise 6.5 number 6.

Theorem 6.5.4. Let $z = re^{\theta i}$ be a nonzero complex number, with θ the principal argument
of z. For every positive integer n, z has exactly n distinct nth roots $w_0, \ldots, w_{n-1}$ given by

$$w_k = \sqrt[n]{r}\, e^{\frac{\theta + 2k\pi}{n} i}, \quad k = 0, \ldots, n - 1.$$

Proof. The proof consists of three parts. First we show that if w is an nth root of z, then
$w = w_k$ for some k = 0, ..., n − 1. The second step is to show that each $w_k$ is an nth root
of z. Lastly, we show that the $w_k$ are all different. We prove only the first part, leaving the
second and third parts as exercises, see Exercise 6.5 number 7.

Suppose that $w = se^{\alpha i}$ is an nth root of z, with 0 < α ≤ 2π. By Definition 6.5.1 and de
Moivre’s Theorem 6.3.9,
$$s^n e^{n\alpha i} = w^n = z.$$
By Theorem 6.5.3, $s^n = r$ and $n\alpha = \theta + 2\pi k$ for some integer k. Therefore

$$s = \sqrt[n]{r} \quad \text{and} \quad \alpha = \frac{\theta + 2\pi k}{n} \quad \text{for some integer } k.$$

Since 0 < α ≤ 2π,
$$0 < \theta + 2\pi k \leq 2n\pi$$
so that
$$-\theta < 2\pi k \leq 2n\pi - \theta.$$
But −π < θ ≤ π so that
$$-\pi < 2\pi k < 2n\pi + \pi.$$
Hence
$$-\frac{1}{2} < k < n + \frac{1}{2}.$$
Since k and n are integers it follows that 0 ≤ k ≤ n so that

$$w = w_k \text{ for some } k = 0, \ldots, n - 1 \quad \text{or} \quad w = \sqrt[n]{r}\, e^{\frac{\theta + 2n\pi}{n} i}.$$

But by Theorem 6.5.3,

$$\sqrt[n]{r}\, e^{\frac{\theta + 2n\pi}{n} i} = \sqrt[n]{r}\, e^{\left( \frac{\theta}{n} + 2\pi \right) i} = \sqrt[n]{r}\, e^{\frac{\theta}{n} i} = w_0.$$

Therefore $w = w_k$ for some k = 0, . . . , n − 1.


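Example 6.5.5. We find the 4th roots of $z = 2\sqrt{2} - 2\sqrt{2}\, i$ in standard form.

Solution. In polar form, $z = 4e^{-\frac{\pi}{4} i}$. Therefore, applying Theorem 6.5.4, the 4th roots of z
are
$$w_k = \sqrt{2}\, e^{\frac{-\pi/4 + 2k\pi}{4} i}, \quad k = 0, 1, 2, 3.$$
In particular,

$$w_0 = \sqrt{2}\, e^{-\frac{\pi}{16} i} = \sqrt{2}\cos\left( -\frac{\pi}{16} \right) + \sqrt{2}\sin\left( -\frac{\pi}{16} \right) i \approx 1.387 - 0.276i,$$

$$w_1 = \sqrt{2}\, e^{\frac{7\pi}{16} i} = \sqrt{2}\cos\left( \frac{7\pi}{16} \right) + \sqrt{2}\sin\left( \frac{7\pi}{16} \right) i \approx 0.276 + 1.387i,$$

$$w_2 = \sqrt{2}\, e^{\frac{15\pi}{16} i} = \sqrt{2}\cos\left( \frac{15\pi}{16} \right) + \sqrt{2}\sin\left( \frac{15\pi}{16} \right) i \approx -1.387 + 0.276i$$

and

$$w_3 = \sqrt{2}\, e^{\frac{23\pi}{16} i} = \sqrt{2}\cos\left( \frac{23\pi}{16} \right) + \sqrt{2}\sin\left( \frac{23\pi}{16} \right) i \approx -0.276 - 1.387i.$$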
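
The formula of Theorem 6.5.4 can be sketched as a short routine. The function name `nth_roots` is an assumption made for this illustration; the routine expects a nonzero z and a positive integer n, and returns floating-point approximations of the roots $w_0, \ldots, w_{n-1}$.

```python
import cmath
import math

def nth_roots(z: complex, n: int) -> list:
    """Approximate the n distinct nth roots of a nonzero z (Theorem 6.5.4)."""
    r, theta = cmath.polar(z)   # theta lies in [-pi, pi]
    s = r ** (1.0 / n)          # nth root of the modulus
    return [cmath.rect(s, (theta + 2 * math.pi * k) / n) for k in range(n)]

# Example 6.5.5: the 4th roots of z = 2*sqrt(2) - 2*sqrt(2) i.
z = complex(2 * math.sqrt(2), -2 * math.sqrt(2))
roots = nth_roots(z, 4)
for w in roots:
    print(w)  # each printed value raised to the 4th power is approximately z
```

The first root printed is approximately 1.387 − 0.276i, in agreement with the value of $w_0$ found above.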

Example 6.5.6. Find all the solutions of the equation $z^4 = i^2$.

Solution. Note that $i^2 = -1 = e^{\pi i}$. Hence the solutions of $z^4 = i^2$ are the fourth roots of
−1. By Theorem 6.5.4,
$$z = w_k = e^{\frac{\pi + 2k\pi}{4} i}, \quad k = 0, 1, 2, 3.$$
In standard form, $z = w_0 = \frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}$, $z = w_1 = -\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}$, $z = w_2 = -\frac{1}{\sqrt{2}} - \frac{i}{\sqrt{2}}$ or
$z = w_3 = \frac{1}{\sqrt{2}} - \frac{i}{\sqrt{2}}$.

A particularly interesting situation arises when z = 1. According to Theorem 6.5.4, for each
positive integer n, there are n distinct nth roots of z = 1. These are called the roots of
unity. We study these roots more closely in the following.

Example 6.5.7. In polar form, $1 = e^{0i}$. Hence, by Theorem 6.5.4, the nth roots of 1 are
given by
$$w_k = e^{\frac{2k\pi}{n} i}, \quad k = 0, \ldots, n - 1.$$
Let us make a few observations regarding the nth roots $w_k$ of 1.

(1) For each k = 0, ..., n − 1 we have $|w_k| = 1$. Therefore all the nth roots of unity lie on
the circle with radius 1 and centre 0.

(2) The first nth root is $w_0 = e^{0i} = 1$. If n is even, then $w_{n/2} = e^{\pi i} = -1$. Therefore the
real nth roots of 1 are also complex nth roots of unity.

(3) Moving from $w_k$ to $w_{k+1}$ along the circle, the argument increases by $\dfrac{2\pi}{n}$ from $\dfrac{2k\pi}{n}$ to
$\dfrac{2(k + 1)\pi}{n}$. Therefore the roots of unity are evenly distributed along the circle |z| = 1.

As an illustration of the general observations (1), (2) and (3), we calculate the cube roots
of 1 and represent these graphically on the Argand plane. We have

$$w_0 = e^{0i} = 1, \quad w_1 = e^{\frac{2\pi}{3} i} = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i, \quad w_2 = e^{\frac{4\pi}{3} i} = -\frac{1}{2} - \frac{\sqrt{3}}{2}\, i.$$

[Figure: the cube roots of unity $w_0$, $w_1$ and $w_2$ plotted in the Argand plane, with the real axis (Re) horizontal and the imaginary axis (Im) vertical.]
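
Observations (1) and (3) can be illustrated numerically. The sketch below uses `cmath.exp` from Python's standard library to evaluate $e^{\frac{2k\pi}{n} i}$; the function name `roots_of_unity` is chosen here for illustration, and all values are floating-point approximations.

```python
import cmath
import math

def roots_of_unity(n: int) -> list:
    """Approximate the nth roots of unity w_k = e^(2 k pi i / n), k = 0, ..., n - 1."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

cube_roots = roots_of_unity(3)
for w in cube_roots:
    # Each root has modulus 1, and its cube is 1 up to rounding.
    print(w, abs(w), w ** 3)
```

The second value printed is approximately −0.5 + 0.866i, which matches $w_1 = -\frac{1}{2} + \frac{\sqrt{3}}{2}\, i$.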

As an application of de Moivre’s Theorem 6.3.9, we have found an explicit formula for the
nth roots of a complex number. Whereas a nonzero real number has either no real nth roots, exactly
one real nth root or exactly two real nth roots, a nonzero complex number has exactly n complex
nth roots. In particular, a nonzero real number has exactly n complex nth roots! This surprising result hints
at things to come. Indeed, the equation $z^6 = -1$ has no real solutions. However, if we allow
z to take on complex values, then the equation has exactly six distinct solutions! This is a
special case of the Fundamental Theorem of Algebra, which is discussed in the next chapter.

Exercise 6.5

1. In the proof of Theorem 6.5.4 we use the fact that for any complex number $se^{\alpha i}$, one
can suppose that 0 < α ≤ 2π. Explain why this is so.

2. Find the nth roots of the following complex numbers.

$$-1 + i,\ n = 3; \quad \sqrt{3} - i,\ n = 4; \quad i,\ n = 3; \quad -i,\ n = 3; \quad 32,\ n = 8; \quad 2 - 2\sqrt{3}\, i,\ n = 2$$

3. Find the 6th roots of unity in standard form and represent the roots graphically on
the Argand plane.

4. Find the solutions of the given equation. Write your answers in standard form.

(a) $z^3 - 1 = i$ (b) $z^9 + z^5 - z^4 = 1$

5. Compare the solutions of the equations $z^4 = (-i)^2$ and $z^2 = -i$. In general, for a fixed
nonzero complex number w, what can be said regarding the solutions of the equations
$z^{2n} = w^2$ and $z^n = w$?

6. Complete the proof of Theorem 6.5.3 by showing that if $r_1 = r_2$ and $\theta_1 = \theta_2 + 2k\pi$ for
some integer k, then $r_1 e^{\theta_1 i} = r_2 e^{\theta_2 i}$.

7. The aim of this exercise is to complete the proof of Theorem 6.5.4. Let z, n and $w_k$
be as in the theorem.
(a) Show that $w_k$ is an nth root of z for every k = 0, ..., n − 1.
(b) Show that $w_k \neq w_l$ whenever $k \neq l$. [HINT: Use Theorem 6.5.3.]

8. Let n ≥ 2 be an integer, and $w_0, \ldots, w_{n-1}$ the nth roots of unity. Show that
$w_0 + w_1 + \cdots + w_{n-1} = 0$.

9. Let a be a nonzero complex number. Show that for an integer n ≥ 2, the nth roots of
a form a geometric sequence $\alpha, \alpha\omega, \alpha\omega^2, \ldots, \alpha\omega^{n-1}$.

Chapter 7

Polynomials over R and C

Finding the solutions of polynomial equations is one of the oldest mathematical problems.
Indeed, cubic equations were known to the ancient Babylonians, Greeks, Chinese, Indians
and Egyptians. However, the behaviour of polynomials in general, and of their roots, is still
not completely understood. In this chapter we introduce some of the basic concepts and
results related to polynomials.

7.1 Polynomials

We begin our discussion with the definition of a polynomial and the algebraic operations on
polynomials.

For $n \in \mathbb{N}$, a polynomial of degree $n$ in one indeterminate $x$ is an expression of the form

\[ f(x) = a_n x^n + \cdots + a_1 x + a_0 \]

where $a_n \ne 0$. If the coefficients $a_0, a_1, \ldots, a_n$ are complex numbers, we say $f(x)$ is a
polynomial over $\mathbb{C}$, and if the coefficients are real numbers, $f(x)$ is a polynomial over $\mathbb{R}$.
Remark 7.1.1. A polynomial of degree 0 is an expression of the form $f(x) = a_0$, with
$a_0 \ne 0$. The polynomial $f(x) = 0$ is called the zero polynomial and its degree is undefined.
Definition 7.1.2. Two polynomials $f(x) = a_n x^n + \cdots + a_1 x + a_0$ and $g(x) = b_m x^m + \cdots + b_1 x + b_0$
of degree $n$ and $m$, respectively, are equal if $n = m$ and $a_j = b_j$ for all $j = 0, 1, \ldots, n$.
Remark 7.1.3. Let $f(x) = a_n x^n + \cdots + a_1 x + a_0$ be a polynomial of degree $n$.

(1) If $f(x)$ is a polynomial over $\mathbb{R}$, then we may view $f$ as a function $f : \mathbb{R} \to \mathbb{R}$. Likewise,
if $f(x)$ is a polynomial over $\mathbb{C}$, then we may view $f$ as a function $f : \mathbb{C} \to \mathbb{C}$.
(2) If f (x) and g(x) are equal, then the functions associated with f and g, respectively,
are also equal.
(3) Recall that every real number may be identified in a natural way with a complex number.
Therefore, if f (x) is a polynomial over R, then f (x) is also a polynomial over C. Since
we only consider polynomials over R or C, we will not mention the number system
again.

In this section, we are chiefly concerned with algebraic operations on polynomials. These
are defined as follows.
Definition 7.1.4. Let $f(x) = a_n x^n + \cdots + a_1 x + a_0$ and $g(x) = b_m x^m + \cdots + b_1 x + b_0$ be
polynomials of degree $n$ and $m$ respectively. The product of $f(x)$ and $g(x)$ is the polynomial
\[ f(x)g(x) = c_{m+n} x^{m+n} + \cdots + c_1 x + c_0 \]
where $c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0$ for every $k = 0, \ldots, m + n$, with $a_j = 0$ if $j > n$
and $b_j = 0$ if $j > m$.
Definition 7.1.5. Let $f(x) = a_n x^n + \cdots + a_1 x + a_0$ and $g(x) = b_m x^m + \cdots + b_1 x + b_0$ be
polynomials of degree $n$ and $m$ respectively. The sum of $f(x)$ and $g(x)$ is the polynomial
\[ f(x) + g(x) = d_k x^k + \cdots + d_1 x + d_0 \]
where $k = \max\{m, n\}$ and $d_j = a_j + b_j$ for all $j = 0, 1, \ldots, k$, with $a_j = 0$ if $j > n$ and $b_j = 0$
if $j > m$.
Remark 7.1.6. Consider polynomials f (x) and g(x).

(1) If f (x) = 0 or g(x) = 0, then f (x)g(x) = 0.


(2) If f (x) has degree n and g(x) has degree m, then f (x)g(x) has degree m + n, while
f (x) + g(x) has degree at most max{m, n}, or f (x) + g(x) = 0.

The definitions of multiplication and addition of polynomials may seem obscure, but are
in fact quite natural. Indeed, the definitions are based on the arithmetic rules for real
and complex numbers. If we consider polynomials f (x) = an xn + · · · + a1 x + a0 and
g(x) = bm xm + · · · + b1 x + b0 as functions, then f (x) and g(x) are (real or complex) numbers
for each fixed (real or complex) number x. Applying the arithmetic rules, we find that the
product and the sum of the numbers f (x) and g(x) are given as in Definitions 7.1.4 and
7.1.5, respectively. We demonstrate this fact in a special case.
Example 7.1.7. Let $f(x) = a_3x^3 + a_2x^2 + a_1x + a_0$ and $g(x) = b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0$.
Viewing these polynomials as functions, we have, for each fixed number $x$,
\[
\begin{aligned}
&(a_3x^3 + a_2x^2 + a_1x + a_0) + (b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0) \\
&\quad = b_4x^4 + (a_3 + b_3)x^3 + (a_2 + b_2)x^2 + (a_1 + b_1)x + (a_0 + b_0) \\
&\quad = f(x) + g(x)
\end{aligned}
\]
according to the definition of addition of polynomials, Definition 7.1.5. Likewise, for a fixed number $x$ we have
\[
\begin{aligned}
&(a_3x^3 + a_2x^2 + a_1x + a_0)(b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0) \\
&\quad = a_3x^3(b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0) \\
&\qquad + a_2x^2(b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0) \\
&\qquad + a_1x(b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0) \\
&\qquad + a_0(b_4x^4 + b_3x^3 + b_2x^2 + b_1x + b_0) \\
&\quad = a_3b_4x^7 + a_3b_3x^6 + a_3b_2x^5 + a_3b_1x^4 + a_3b_0x^3 \\
&\qquad + a_2b_4x^6 + a_2b_3x^5 + a_2b_2x^4 + a_2b_1x^3 + a_2b_0x^2 \\
&\qquad + a_1b_4x^5 + a_1b_3x^4 + a_1b_2x^3 + a_1b_1x^2 + a_1b_0x \\
&\qquad + a_0b_4x^4 + a_0b_3x^3 + a_0b_2x^2 + a_0b_1x + a_0b_0 \\
&\quad = a_3b_4x^7 + (a_3b_3 + a_2b_4)x^6 + (a_3b_2 + a_2b_3 + a_1b_4)x^5 \\
&\qquad + (a_3b_1 + a_2b_2 + a_1b_3 + a_0b_4)x^4 + (a_3b_0 + a_2b_1 + a_1b_2 + a_0b_3)x^3 \\
&\qquad + (a_2b_0 + a_1b_1 + a_0b_2)x^2 + (a_1b_0 + a_0b_1)x + a_0b_0 \\
&\quad = f(x)g(x)
\end{aligned}
\]
according to Definition 7.1.4.
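
The coefficient formulas in Definitions 7.1.4 and 7.1.5 can also be checked mechanically. The following Python sketch is an illustration only and not part of the course material. It assumes that a polynomial is stored as a list of coefficients [a0, a1, ..., an], lowest degree first, so that the list index equals the power of x; the function names poly_add and poly_mul are our own.

```python
def poly_add(a, b):
    """Sum of two polynomials given as coefficient lists [a0, a1, ..., an]."""
    k = max(len(a), len(b))
    # Pad the shorter list with zeros (a_j = 0 for j > n, b_j = 0 for j > m).
    a = a + [0] * (k - len(a))
    b = b + [0] * (k - len(b))
    return [a[j] + b[j] for j in range(k)]


def poly_mul(a, b):
    """Product of two polynomials via c_k = a_0 b_k + a_1 b_{k-1} + ... + a_k b_0."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            # The term a_i x^i times b_j x^j contributes to the coefficient of x^(i+j).
            c[i + j] += ai * bj
    return c


# f(x) = x^3 + 2x^2 + 3x + 4 and g(x) = x^4 + 1, lowest degree first.
f = [4, 3, 2, 1]
g = [1, 0, 0, 0, 1]
print(poly_add(f, g))  # [5, 3, 2, 1, 1]
print(poly_mul(f, g))  # [4, 3, 2, 1, 4, 3, 2, 1]
```

The double loop in poly_mul collects, for each k, exactly the products a_i b_j with i + j = k, which is the sum that defines c_k.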

Multiplication and addition of polynomials satisfy the following properties.


Theorem 7.1.8. Let f (x), g(x) and h(x) be polynomials. Denote by 0 the zero polynomial
and by 1 the polynomial of degree 0 with a0 = 1. Then the following statements are true.

(1) Addition is commutative: f (x) + g(x) = g(x) + f (x).


(2) Addition is associative: (f (x) + g(x)) + h(x) = f (x) + (g(x) + h(x)).
(3) Existence of additive identity: f (x) + 0 = f (x).
(4) Existence of additive inverse: f (x) + (−f (x)) = 0.
(5) Multiplication is commutative: f (x)g(x) = g(x)f (x).
(6) Multiplication is associative: (f (x)g(x))h(x) = f (x)(g(x)h(x)).
(7) Existence of multiplicative identity: f (x)1 = f (x).
(8) Distributive law: f (x)(g(x) + h(x)) = f (x)g(x) + f (x)h(x).

7.2 The Division Algorithm and the Factor Theorem

For real or complex numbers $a$ and $b$ with $b \ne 0$, there exists a unique number $c$ so that

\[ a = bc. \]

Indeed, $c$ is the quotient $\dfrac{a}{b}$. In general, given two polynomials $f(x)$ and $g(x)$, it is not
possible to find a polynomial $q(x)$ so that

\[ f(x) = g(x)q(x). \]

Indeed, consider the following example.

Example 7.2.1. Let $f(x) = x^2 - 2x + 1$ and $g(x) = x + 1$. Suppose that $q(x)$ is a polynomial
satisfying the equation
\[ f(x) = g(x)q(x). \]
If $q(x)$ has degree $n$, then $g(x)q(x)$ has degree $n + 1$. Since $f(x)$ has degree 2, it follows that
$q(x)$ has degree 1. That is, $q(x) = ax + b$, where $a \ne 0$, and therefore

\[ g(x)q(x) = ax^2 + (a + b)x + b. \]

From the definition of equality of polynomials, Definition 7.1.2, we obtain the system of
equations
\[
\begin{aligned}
a &= 1 \\
a + b &= -2 \\
b &= 1.
\end{aligned}
\]
Since this system has no solution, the polynomial $q(x)$ does not exist.

The situation described above for polynomials, and illustrated by Example 7.2.1, is not
entirely unfamiliar. Indeed, if $a$ and $b$ are integers with $b \ne 0$, there does not
necessarily exist an integer $c$ so that
\[ a = bc. \]
However, there exist unique integers $q$ and $r$ so that

\[ a = bq + r \quad\text{and}\quad 0 \le r < |b|. \]

This result is known as the Euclidean Division Algorithm, and an analogous result holds for polynomials.

Theorem 7.2.2 (Division Algorithm). Let f (x) and g(x) be nonzero polynomials. Then
there exist unique polynomials q(x) and r(x) so that

f (x) = q(x)g(x) + r(x)

with degree r(x) < degree g(x) or r(x) = 0.

The proofs of three special cases of Theorem 7.2.2 are given as exercises, see Exercise 7.2
number 3.
Remark 7.2.3. When we apply the Division Algorithm, Theorem 7.2.2, to polynomials f (x)
and g(x), we obtain two new polynomials q(x) and r(x). The polynomial q(x) is called the
quotient, and r(x) is called the remainder.

Given polynomials $f(x)$ and $g(x)$, the quotient and remainder can be determined algorithmically
using the process of long division. (When the divisor has the form $x - c$, the abbreviated
scheme known as synthetic division can also be used.) We illustrate this process by means of an example.
Example 7.2.4. Let $f(x) = 2x^3 + 5ix^2 - 4x - i$ and $g(x) = x - i$. We find the quotient
$q(x)$ and remainder $r(x)$ so that $f(x) = q(x)g(x) + r(x)$.

Solution. Performing long division, we have the following.

\[
\begin{array}{rll}
 & 2x^2 + 7ix - 11 & \\
x - i \;\big)\!\! & 2x^3 + 5ix^2 - 4x - i & \cdots\ f(x) \\
 & 2x^3 - 2ix^2 & \cdots\ 2x^2 \times g(x) \\
 & 7ix^2 - 4x - i & \cdots\ f(x) - [2x^2 \times g(x)] \\
 & 7ix^2 + 7x & \cdots\ 7ix \times g(x) \\
 & -11x - i & \cdots\ f(x) - [2x^2 \times g(x)] - [7ix \times g(x)] \\
 & -11x + 11i & \cdots\ -11 \times g(x) \\
 & -12i & \cdots\ f(x) - [2x^2 \times g(x)] - [7ix \times g(x)] - [-11 \times g(x)]
\end{array}
\]

The quotient is $q(x) = 2x^2 + 7ix - 11$ and the remainder is $r(x) = -12i$.
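
The long division steps above follow a fixed pattern, which can be sketched in Python as follows. This is an illustration under our own assumptions rather than part of the notes: a polynomial is stored as a list of coefficients with the highest degree first, the function name poly_divmod is hypothetical, and Python's built-in complex type (with j for the imaginary unit) stands in for complex coefficients.

```python
def poly_divmod(f, g):
    """Divide f by g; coefficients are listed highest degree first.

    Returns (q, r) with f = q*g + r, where r has fewer coefficients than g
    (it may carry leading zeros; an all-zero r is the zero polynomial).
    """
    if not g or g[0] == 0:
        raise ValueError("divisor needs a nonzero leading coefficient")
    r = list(f)
    q = []
    # Each pass cancels the current leading term of the running remainder.
    while len(r) >= len(g):
        coeff = r[0] / g[0]          # next coefficient of the quotient
        q.append(coeff)
        for j, gj in enumerate(g):   # subtract coeff * x^(shift) * g(x)
            r[j] -= coeff * gj
        r.pop(0)                     # the leading term is now zero
    return q, r


# Example 7.2.4: f(x) = 2x^3 + 5i x^2 - 4x - i divided by g(x) = x - i.
q, r = poly_divmod([2, 5j, -4, -1j], [1, -1j])
print(q)  # coefficients of 2x^2 + 7i x - 11
print(r)  # the single coefficient -12i
```

Each pass of the while loop corresponds to one pair of rows in the long division layout: one multiple of g(x) is subtracted and the degree of the running remainder drops by one.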

An important special case of Theorem 7.2.2 is the following result, known as the Remainder
Theorem.
Theorem 7.2.5 (Remainder Theorem). If a polynomial f (x) is divided according to the
Division Algorithm by a polynomial g(x) = x − c of degree one, then the remainder is equal
to f (c).

Proof. By the Division Algorithm, Theorem 7.2.2, there exist polynomials $r(x)$ and $q(x)$
so that $f(x) = q(x)g(x) + r(x)$ with $r(x) = 0$ or degree $r(x)$ < degree $g(x)$ = 1. Therefore $r(x) = p$
for some constant $p$. But

\[ f(c) = q(c)g(c) + r(c) = q(c)(c - c) + p = p. \]

Therefore $r(x) = f(c)$.

We illustrate the Remainder Theorem 7.2.5 by means of the following example.

Example 7.2.6. Consider the polynomials $f(x) = 2x^3 + 5ix^2 - 4x - i$ and $g(x) = x + i$. We
determine the remainder when $f(x)$ is divided by $g(x)$, first using the Division Algorithm,
and then using the Remainder Theorem.

Solution. First we perform long division to determine the remainder.

\[
\begin{array}{rl}
 & 2x^2 + 3ix - 1 \\
x + i \;\big)\!\! & 2x^3 + 5ix^2 - 4x - i \\
 & 2x^3 + 2ix^2 \\
 & 3ix^2 - 4x - i \\
 & 3ix^2 - 3x \\
 & -x - i \\
 & -x - i \\
 & 0
\end{array}
\]

Therefore, according to the Division Algorithm, the remainder is $r(x) = 0$.

According to the Remainder Theorem 7.2.5, the remainder is

\[ r(x) = f(-i) = 2(-i)^3 + 5i(-i)^2 - 4(-i) - i = 2i - 5i + 4i - i = 0. \]
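
The Remainder Theorem reduces the computation of a remainder to a single function evaluation. As a hedged illustration (the function name horner and the highest-degree-first coefficient list are our own conventions, not notation from these notes), the value f(c) can be computed with the nested multiplication scheme usually called Horner's method:

```python
def horner(coeffs, c):
    """Evaluate a polynomial at c; coefficients are listed highest degree first."""
    value = 0
    for a in coeffs:
        # Rewrites a_n c^n + ... + a_0 as (...((a_n) c + a_{n-1}) c + ...) c + a_0.
        value = value * c + a
    return value


f = [2, 5j, -4, -1j]           # f(x) = 2x^3 + 5i x^2 - 4x - i
print(horner(f, -1j) == 0)     # True: the remainder on division by x + i is 0
print(horner(f, 1j) == -12j)   # True: the remainder on division by x - i is -12i
```

The two values agree with the remainders found by long division in Examples 7.2.6 and 7.2.4, respectively.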

We now turn to the important concepts of factorisation of polynomials and roots of poly-
nomials.

Definition 7.2.7. Let f (x) and g(x) be polynomials. We say that g(x) is a factor of f (x)
if there exists a polynomial h(x) so that f (x) = g(x)h(x).

The following result relates factors and the Division Algorithm, Theorem 7.2.2.

Theorem 7.2.8. Let f (x) and g(x) be polynomials. Then g(x) is a factor of f (x) if and only
if the remainder obtained when dividing f (x) by g(x) according to the Division Algorithm is
the zero polynomial r(x) = 0.

As an immediate consequence of the Remainder Theorem 7.2.5 and Theorem 7.2.8 we obtain
the Factor Theorem.

Theorem 7.2.9 (Factor Theorem). A polynomial $g(x) = x - c$ is a factor of a polynomial
$f(x)$ if and only if $f(c) = 0$.

Definition 7.2.10. A number c is a root of a polynomial f (x) if f (c) = 0.

Remark 7.2.11. According to the Factor Theorem 7.2.9, a number c is a root of a polyno-
mial f (x) if and only if g(x) = x − c is a factor of f (x).

The results and concepts introduced in this section establish the relationship between the
roots of a polynomial, that is, the solutions of a polynomial equation

\[ a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0, \]

and the factors of the polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, and do so for
polynomials with possibly complex coefficients. An outstanding aspect is that the particular
number system over which the polynomials are viewed is essentially irrelevant, as long
as certain basic facts remain true. The Division Algorithm provides an implementable
mechanism by which one can test whether or not a polynomial $g(x)$ is a factor of $f(x)$ and,
more importantly, it provides the accompanying factor. In the next section we apply these
results to the centuries-old problem of solving polynomial equations.

Exercise 7.2

1. In each case, determine the quotient and remainder when $f(x)$ is divided by $g(x)$.
(a) $f(x) = 4x^4 - x^2 + 3x - 1$; $g(x) = x^2 - 1$
(b) $f(x) = x^3 - 2ix^2 - ix + 3$; $g(x) = x + 2i$
(c) $f(x) = x^5 - ix^4 + 2x^2 - 1$; $g(x) = ix^2 + 2$
(d) $f(x) = x^5 - 4x^2 + 3$; $g(x) = x^2 + i$
(e) $f(x) = x^4 - ix^3 + 2 - i$; $g(x) = ix - 1$

2. In each case, show that $g(x)$ is a factor of $f(x)$.
(a) $f(x) = x^3 - 2x^2 + x - 2$; $g(x) = x - 2$
(b) $f(x) = x^4 - 1$; $g(x) = 2x + 2i$
(c) $f(x) = x^4 + x^3 + 2x^2 + 3x - 3$; $g(x) = x^2 + x - 1$

3. The aim of this exercise is to prove three special cases of the Division Algorithm,
Theorem 7.2.2. Suppose that $f(x)$ has degree $n$ and $g(x)$ has degree $m$. Let

\[ f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \quad\text{and}\quad g(x) = b_m x^m + b_{m-1} x^{m-1} + \cdots + b_1 x + b_0. \]

(a) Assume that $m > n$.
(i) Show that $q(x) = 0$ and $r(x) = f(x)$ satisfy the conditions of Theorem 7.2.2.
(ii) Suppose that $f(x) = q_0(x)g(x) + r_0(x)$ where degree $r_0(x)$ < degree $g(x)$ or
$r_0(x) = 0$, and $q_0(x) \ne 0$. Show that

\[ n = \text{degree } f(x) = \text{degree } [q_0(x)g(x) + r_0(x)] \ge m, \]

contradicting the assumption that $m > n$. Now explain why $q_0(x) = q(x)$
and $r_0(x) = r(x)$.
(b) Assume that $m = n$. In this case, $g(x) = b_n x^n + b_{n-1} x^{n-1} + \cdots + b_1 x + b_0$.
(i) Show that $q(x) = \dfrac{a_n}{b_n}$ and $r(x) = f(x) - q(x)g(x)$ satisfy the conditions of
Theorem 7.2.2.

(ii) Now prove that q(x) and r(x) are the only polynomials that satisfy the
conditions in Theorem 7.2.2, see question 3 (a) (ii).
(c) Assume that $m = 2$ and $n = 3$; that is,

\[ f(x) = a_3x^3 + a_2x^2 + a_1x + a_0 \quad\text{and}\quad g(x) = b_2x^2 + b_1x + b_0. \]

The general case $m < n$ can be proven in a similar way.
(i) Show that if $q(x)$ and $r(x)$ satisfy the conclusions of Theorem 7.2.2, then
$q(x)$ has degree 1.
(ii) Let $q(x) = Ax + B$ and $r(x) = Cx + D$. By solving an appropriate system
of linear equations, show that there exist unique numbers $A$, $B$, $C$ and $D$ so
that $f(x) = q(x)g(x) + r(x)$.

4. Prove Theorem 7.2.8.

5. Prove Theorem 7.2.9.

6. Let f (x), g(x) and h(x) be polynomials. Assume that g(x) is a factor of f (x), and
h(x) is a factor of g(x). Prove that h(x) is a factor of f (x).

7. Let f (x), g(x) and h(x) be polynomials (over C). Assume that f (x) = g(x)h(x). If
c ∈ C is a root of h(x), prove that c is also a root of f (x). [HINT: Use the result in
question 6 and an appropriate theorem.]

8. Use Theorem 7.2.9 to prove that a polynomial of degree n has at most n roots.

7.3 Roots of polynomials

In this section we get to the root of the matter, as it were. We state the main result regarding
the existence of solutions of polynomial equations, namely, the Fundamental Theorem of
Algebra. As the name suggests, it is a truly important result in Algebra. It was stated,
although not entirely correctly, by Albert Girard in 1629. Many great mathematicians
attempted to prove this result, among them d’Alembert in 1746, Euler in 1749, Lagrange in
1772, and Laplace in 1795. Gauss gave a correct, but incomplete, proof in 1799. His proof
was only completed in 1920 by Alexander Ostrowski. We will not attempt to give a proof
of the Fundamental Theorem of Algebra, but will use it to prove that every polynomial of
degree n ≥ 1 over C has n complex roots.

Theorem 7.3.1 (Fundamental Theorem of Algebra). Let $f(x)$ be a polynomial over
$\mathbb{C}$ with degree $n \ge 1$. Then $f(x)$ has at least one root in $\mathbb{C}$.

Example 7.3.2. The polynomial $f(x) = x^2 + 1$ over $\mathbb{R}$ has no real roots. However, $f(x)$ is
also a polynomial over $\mathbb{C}$, and according to Theorem 7.3.1 it has at least one complex root.
In fact, $f(x)$ has exactly two complex roots, namely, $x = i$ and $x = -i$.

The following is an important consequence of the Fundamental Theorem of Algebra, Theo-
rem 7.3.1.
Theorem 7.3.3. Every polynomial $f(x) = a_n x^n + \cdots + a_1 x + a_0$ over $\mathbb{C}$ of degree $n \ge 1$
has exactly $n$ roots in $\mathbb{C}$.
Remark 7.3.4. In Theorem 7.3.3, the roots of a polynomial $f(x)$ are counted with multiplicity.
That is, if $g(x) = (x - c)$ appears exactly $k$ times as a factor of $f(x)$, then we count
$x = c$ exactly $k$ times as a root of $f(x)$. Therefore the $n$ roots of a polynomial of degree $n$
need not be different. For instance, the polynomial $f(x) = x^2 + 2x + 1 = (x + 1)^2$ has two equal roots,
namely $x = -1$ and $x = -1$.

Proof of Theorem 7.3.3. We prove the result using Mathematical Induction. Consider
a polynomial $f(x) = a_1 x + a_0$ of degree one. Since $a_1 \ne 0$, it follows that for a complex
number $x$,
\[ f(x) = 0 \quad\text{if and only if}\quad x = -\frac{a_0}{a_1}. \]
Therefore $f(x)$ has exactly one complex root.
Fix any natural number $k \ge 1$. Assume that every polynomial of degree $k$ has exactly $k$
complex roots. Consider a polynomial
\[ f(x) = a_{k+1} x^{k+1} + a_k x^k + \cdots + a_1 x + a_0 \]
of degree $k + 1$. According to the Fundamental Theorem of Algebra, Theorem 7.3.1, $f(x)$
has at least one complex root, say $c_1 \in \mathbb{C}$. It follows from the Factor Theorem, Theorem
7.2.9, that $g(x) = x - c_1$ is a factor of $f(x)$. That is,
\[ f(x) = g(x)h(x) \]
for some polynomial $h(x)$. Since
\[ k + 1 = \text{degree } f(x) = \text{degree } g(x) + \text{degree } h(x) = 1 + \text{degree } h(x), \]
it follows that $h(x)$ has degree $k$. Therefore, by assumption, $h(x)$ has exactly $k$ complex
roots, say $c_2, \ldots, c_{k+1} \in \mathbb{C}$. Since $h(x)$ is a factor of $f(x)$, every root of $h(x)$ is a root of $f(x)$,
see Exercise 7.2 number 7. Therefore $f(x)$ has at least $k + 1$ roots $c_1, c_2, \ldots, c_{k+1}$. But a
polynomial of degree $k + 1$ has at most $k + 1$ roots, see Exercise 7.2 number 8. Consequently,
$f(x)$ has exactly $k + 1$ roots.
By Mathematical Induction it follows that a polynomial $f(x) = a_n x^n + \cdots + a_1 x + a_0$ over
$\mathbb{C}$ of degree $n \ge 1$ has exactly $n$ roots in $\mathbb{C}$.

Theorem 7.3.3 shows that a polynomial of degree $n$ over $\mathbb{C}$ (or $\mathbb{R}$) has exactly $n$ roots in $\mathbb{C}$.
However, the theorem gives no information on how to calculate the roots. While there exist
formulas that determine the roots of polynomials of second, third and fourth degree, there
is no general formula, built from the coefficients using only arithmetic operations and
radicals, for the roots of polynomials of degree five and higher (this is the Abel-Ruffini
theorem). Therefore the process of finding the exact roots of a polynomial is, to some extent,
a matter of making educated guesses. An important aid in this regard is the following
theorem.

Theorem 7.3.5. Let $c \in \mathbb{C}$ be a root of a polynomial $f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$
over $\mathbb{R}$. Then $\bar{c}$ is also a root of $f(x)$.

We leave the proof as an exercise, see Exercise 7.3 number 5.

Remark 7.3.6. Even though there is no general formula for determining the roots of an arbitrary
polynomial, there are some strategies that can aid us in determining the roots of a polynomial
$f(x)$. The following is a (partial) list of such strategies.

(1) Try to guess a simple root of f (x), such as x = −1, x = 1, x = −i or x = i.

(2) If you find a non-real complex root $c$ of a polynomial $f(x)$ with real coefficients, it
follows from Theorem 7.3.5 that $\bar{c}$ is also a root of $f(x)$. Then $g(x) = (x - c)(x - \bar{c})$ is
a factor of $f(x)$. Use long division to determine the accompanying factor of $f(x)$.

(3) Identify a useful grouping of the terms in the polynomial. For instance,

\[ f(x) = x^4 + x^3 + x^2 + x \]

can be factorised by noting that

\[ f(x) = (x^4 + x^3) + (x^2 + x) = x^3(x + 1) + x(x + 1) = (x + 1)(x^3 + x) = x(x + 1)(x^2 + 1). \]

(4) If $f(x)$ has a factor of the form $g(x) = x^n - c$, use Theorem 6.5.4 to determine the $n$
roots of $g(x)$, namely, the complex nth roots of $c$.

We end this section with a number of examples, demonstrating how to determine the roots
of some polynomials.

Example 7.3.7. We find the roots of $f(x) = x^4 - x^3 - 5x^2 - x - 6$.

Solution. By trial and error, we find that $x = i$ is a root of $f(x)$, since

\[ f(i) = 1 + i + 5 - i - 6 = 0. \]

Because the coefficients of $f(x)$ are real numbers, it follows from Theorem 7.3.5 that $x = -i$
is also a root of $f(x)$. Therefore, by the Factor Theorem, Theorem 7.2.9, the polynomial
$g(x) = (x - i)(x + i) = x^2 + 1$ is a factor of $f(x)$. Long division yields

\[ f(x) = (x^2 + 1)(x^2 - x - 6). \]

Hence $f(x) = (x - i)(x + i)(x - 3)(x + 2)$, so that the roots of $f(x)$ are $i$, $-i$, $3$ and $-2$.

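Example 7.3.8. We find the roots of the polynomial $f(x) = x^9 + x^5 - x^4 - 1$.

Solution. Notice that

\[ f(x) = (x^9 + x^5) - (x^4 + 1) = x^5(x^4 + 1) - (x^4 + 1) = (x^5 - 1)(x^4 + 1). \]

Therefore the roots of $f(x)$ are the 4th roots of $-1$ and the 5th roots of $1$. We apply
Theorem 6.5.4 to find the roots of $f(x)$. In polar form,

\[ -1 = e^{\pi i} \quad\text{and}\quad 1 = e^{0i}. \]

Consequently the 4th roots of $-1$ are

\[ w_0 = e^{\frac{\pi i}{4}}, \quad w_1 = e^{\frac{3\pi i}{4}}, \quad w_2 = e^{\frac{5\pi i}{4}}, \quad w_3 = e^{\frac{7\pi i}{4}}. \]

The 5th roots of $1$ are

\[ v_0 = e^0 = 1, \quad v_1 = e^{\frac{2\pi i}{5}}, \quad v_2 = e^{\frac{4\pi i}{5}}, \quad v_3 = e^{\frac{6\pi i}{5}}, \quad v_4 = e^{\frac{8\pi i}{5}}. \]

The roots of $f(x)$ are $e^{\frac{\pi i}{4}}$, $e^{\frac{3\pi i}{4}}$, $e^{\frac{5\pi i}{4}}$, $e^{\frac{7\pi i}{4}}$, $1$, $e^{\frac{2\pi i}{5}}$, $e^{\frac{4\pi i}{5}}$, $e^{\frac{6\pi i}{5}}$ and $e^{\frac{8\pi i}{5}}$.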
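
The nine roots can be checked numerically. The sketch below is illustrative only: it assumes the usual formula for the nth roots of a nonzero number $re^{\theta i}$, namely $r^{1/n} e^{(\theta + 2k\pi)i/n}$ for $k = 0, \ldots, n - 1$ (which we take to be the content of Theorem 6.5.4), and the helper name nth_roots is our own. Because the computation uses floating-point arithmetic, the values of f at the roots are only approximately zero.

```python
import cmath


def nth_roots(z, n):
    """The n complex nth roots of a nonzero number z, computed from its polar form."""
    r, theta = cmath.polar(z)  # z = r * e^(theta * i) with -pi < theta <= pi
    return [r ** (1 / n) * cmath.exp(1j * (theta + 2 * cmath.pi * k) / n)
            for k in range(n)]


def f(x):
    return x**9 + x**5 - x**4 - 1


# The 4th roots of -1 followed by the 5th roots of 1.
roots = nth_roots(-1, 4) + nth_roots(1, 5)
for w in roots:
    print(w, abs(f(w)))  # each |f(w)| should be very close to 0
```

Since f has degree 9 and the nine values are distinct, Theorem 7.3.3 confirms that no roots have been missed.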

Example 7.3.9. We find the roots of the polynomial $f(x) = 2x^3 + 5ix^2 - 4x - i$.

Solution. Note that in this example, we cannot make use of Theorem 7.3.5, since not all
the coefficients of $f(x)$ are real numbers. According to Example 7.2.6,

\[ f(x) = (x + i)(2x^2 + 3ix - 1). \]

Therefore
\[ f(x) = 0 \quad\text{if and only if}\quad x = -i \ \text{ or } \ 2x^2 + 3ix - 1 = 0. \]
By completing the square we find that
\[ 2x^2 + 3ix - 1 = \left( \sqrt{2}\,x + \frac{3i}{2\sqrt{2}} \right)^2 + \frac{1}{8}. \]
Therefore
\[ 2x^2 + 3ix - 1 = 0 \quad\text{if and only if}\quad \sqrt{2}\,x + \frac{3i}{2\sqrt{2}} = \frac{i}{2\sqrt{2}} \ \text{ or } \ \sqrt{2}\,x + \frac{3i}{2\sqrt{2}} = -\frac{i}{2\sqrt{2}}, \]
so that
\[ 2x^2 + 3ix - 1 = 0 \quad\text{if and only if}\quad x = -\frac{i}{2} \ \text{ or } \ x = -i. \]
Hence $f(x) = 0$ if and only if $x = -i$, $x = -i$ or $x = -\dfrac{i}{2}$. That is, the roots of $f(x)$ are
$-i$, $-i$ and $-\dfrac{i}{2}$.
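
As a cross-check of the completed square, one can also use the quadratic formula. We assume here that the formula remains valid for complex coefficients (it is obtained by completing the square, exactly as above), and we take $\pm i$ as the two square roots of $-1$.

```latex
\[
x = \frac{-3i \pm \sqrt{(3i)^2 - 4(2)(-1)}}{2 \cdot 2}
  = \frac{-3i \pm \sqrt{-9 + 8}}{4}
  = \frac{-3i \pm i}{4},
\qquad\text{so}\qquad
x = -\frac{i}{2} \ \text{ or } \ x = -i .
\]
```

Both values agree with the roots of $2x^2 + 3ix - 1$ found by completing the square.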
Example 7.3.10. We find the roots of $f(x) = x^4 - 5x^3 + 10x^2 - 10x + 4$, given that $1 + i$
is a root of $f(x)$.

Solution. Since the coefficients of $f(x)$ are real, it follows from Theorem 7.3.5 that $1 - i$ is
also a root of $f(x)$. Therefore, by Theorem 7.2.9,

\[ g(x) = (x - 1 - i)(x - 1 + i) = x^2 - 2x + 2 \]

is a factor of $f(x)$. Long division yields

\[ f(x) = (x^2 - 2x + 2)(x^2 - 3x + 2). \]

Since $x^2 - 3x + 2 = (x - 1)(x - 2)$, the roots of $f(x)$ are $1 + i$, $1 - i$, $1$ and $2$.
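
For completeness, the long division referred to in the solution can be written out in the same layout as Example 7.2.4. The annotations on the right are ours.

```latex
\[
\begin{array}{rll}
 & x^2 - 3x + 2 & \\
x^2 - 2x + 2 \;\big)\!\! & x^4 - 5x^3 + 10x^2 - 10x + 4 & \cdots\ f(x) \\
 & x^4 - 2x^3 + 2x^2 & \cdots\ x^2 \times g(x) \\
 & -3x^3 + 8x^2 - 10x + 4 & \cdots\ f(x) - [x^2 \times g(x)] \\
 & -3x^3 + 6x^2 - 6x & \cdots\ -3x \times g(x) \\
 & 2x^2 - 4x + 4 & \cdots\ f(x) - [x^2 \times g(x)] - [-3x \times g(x)] \\
 & 2x^2 - 4x + 4 & \cdots\ 2 \times g(x) \\
 & 0 &
\end{array}
\]
```

The remainder is zero, so $g(x) = x^2 - 2x + 2$ is indeed a factor of $f(x)$, with accompanying factor $x^2 - 3x + 2$.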

The Fundamental Theorem of Algebra, Theorem 7.3.1, guarantees that a polynomial over
$\mathbb{C}$ (or $\mathbb{R}$) with degree at least 1 has at least one complex root. As a consequence of this result,
we proved that a polynomial of degree $n \ge 1$ has exactly $n$ complex roots, counting multiplicities.
Although this result does not specify a method by which to determine the roots of
a polynomial, it is possible to find such roots in relatively simple cases, as we demonstrated
by means of examples.

Exercise 7.3

1. Find the complex roots of $f(x) = x^6 - 2x^4 - ix^2 + 2i$.

2. If it is given that $2 + i$ is a root of $f(x) = x^3 - 9x^2 + 25x - 25$, solve the equation
$f(x) = 0$ in $\mathbb{C}$.

3. Decompose the given polynomials into linear factors (i.e. factors of the form $ax + b$).
(a) $f(x) = 2x^3 - 5x^2 - x + 6$
(b) $f(x) = x^3 - 7x + 6$
(c) $f(x) = x^3 - (2 + i)x^2 + 2(1 + i)x - 2i$ (Either $i$ or $-i$ is a zero.)
(d) $f(x) = 2x^3 + 5ix^2 - 4x - i$ (Either $i$ or $-i$ is a zero.)
(e) $f(x) = (2x^2 + 6x + 3)(x^2 - 4x + 2)$
(f) $f(x) = x^3 + 6x^2 + 6x + 5$ (One factor is $(x + 5)$.)
(g) $f(x) = x^4 - 2x^3 + 2x^2 - 2x + 1$ (One of the zeros is $i$.)
(h) $f(x) = x^3 - 8$
(i) $f(x) = x^4 + 16$

4. Solve the following equations in $\mathbb{C}$.

(a) $x^2 - ix = 0$ (b) $x^3 + 2x^2 - 2x - 1 = 0$

(c) $x^3 - x^2 - x - 2 = 0$ (d) $x^4 - 2x^3 + 2x^2 - 2x + 1 = 0$


5. Prove Theorem 7.3.5.

Appendix A

A.1 Some Theorems You Should Know

Theorem A.1.1 (Intermediate Value Theorem). Suppose that $f : [a, b] \to \mathbb{R}$ is continuous
on $[a, b]$. Then for every number $p$ between $f(a)$ and $f(b)$, there exists $c \in [a, b]$ so
that $f(c) = p$.

Theorem A.1.2. Assume that f : [a, b] → R is continuous at c ∈ (a, b), and that L is
a real number. If f (c) > L, then there is a number δ > 0 such that f (x) > L for all
x ∈ (c − δ, c + δ).

Theorem A.1.3 (Extreme Value Theorem). If $f$ is continuous on a closed interval
$[a, b]$, then there are points $c$ and $d$ in $[a, b]$ such that $f(c) \le f(x) \le f(d)$ for all $x$ in $[a, b]$.

Theorem A.1.4. Let f be a function defined on an open interval containing the point a.
If f is differentiable at a, then f is continuous at a.

Theorem A.1.5. Let $f$ be a function from $\mathbb{R}$ to $\mathbb{R}$. Assume that $f$ is differentiable on an
interval $(a, b)$. Then the following is true.

(1) If $f'(x) < 0$ for every $x \in (a, b)$, then $f$ is decreasing on $(a, b)$.

(2) If $f'(x) > 0$ for every $x \in (a, b)$, then $f$ is increasing on $(a, b)$.

Theorem A.1.6 (Inverse Function Theorem). Assume that $f : [a, b] \to \mathbb{R}$ is continuous
on $[a, b]$, differentiable on $(a, b)$ and $f'(x) > 0$ for all $x \in (a, b)$. Then the following
statements are true.

(1) $f$ is increasing on $[a, b]$.

(2) $f$ has an inverse $f^{-1} : [f(a), f(b)] \to [a, b]$.

(3) $f^{-1}$ is differentiable on $(f(a), f(b))$, and

\[ \frac{d}{dx} f^{-1}(x) = \frac{1}{f'(f^{-1}(x))}, \quad x \in (f(a), f(b)). \]

Theorem A.1.7 (Inverse Function Theorem). Assume that $f : [a, b] \to \mathbb{R}$ is continuous
on $[a, b]$, differentiable on $(a, b)$ and $f'(x) < 0$ for all $x \in (a, b)$. Then the following
statements are true.

(1) $f$ is decreasing on $[a, b]$.

(2) $f$ has an inverse $f^{-1} : [f(b), f(a)] \to [a, b]$.

(3) $f^{-1}$ is differentiable on $(f(b), f(a))$, and

\[ \frac{d}{dx} f^{-1}(x) = \frac{1}{f'(f^{-1}(x))}, \quad x \in (f(b), f(a)). \]
Theorem A.1.8 (Rolle's Theorem). Suppose that $f$ is continuous on the closed interval
$[a, b]$ and differentiable on the open interval $(a, b)$. If $f(a) = f(b)$, then there exists a number
$c \in (a, b)$ so that $f'(c) = 0$.
Theorem A.1.9 (Mean Value Theorem for the Derivative). Suppose that $f$ is continuous
on the closed interval $[a, b]$ and differentiable on the open interval $(a, b)$. Then there
exists a number $c \in (a, b)$ so that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
Theorem A.1.10. Assume that $f$ and $g$ are integrable on a closed interval $[a, b]$. Then the
following statements are true.

(i) If $f(x) \le g(x)$ for all $a \le x \le b$, then

\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]

(ii) If $L \le f(x) \le U$ for all $a \le x \le b$, then

\[ L(b - a) \le \int_a^b f(x)\,dx \le U(b - a). \]

Theorem A.1.11. If $f$ is integrable on a closed interval $[a, b]$ and $a < c < b$, then
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \]

A.2 Mathematical Induction

The Principle of Mathematical Induction is a method for proving statements about natural
numbers. An example of such a statement is
\[ 1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2} \quad\text{for every natural number } n. \]

This principle can be formulated as follows.

The Principle of Mathematical Induction


A statement about natural numbers is true for every natural number n provided that the
following hold.

(1) The statement is true when n = 1.

(2) If the statement is true when n = k for some fixed but arbitrary natural number k,
then the statement is also true when n = k + 1.

Using this principle involves two steps.


Basis step Prove that the statement is true if n = 1.
Induction Step Assume that the statement is true when n = k, for some fixed but
arbitrary natural number k, and prove that the statement is true
when n = k + 1.

We illustrate the method by means of some examples.

Example A.2.1. We prove that

\[ 1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2} \]
for every natural number $n$.

Solution. Assume that $n = 1$. Then

\[ 1 + 2 + 3 + \cdots + n = 1 \]

and
\[ \frac{n(n + 1)}{2} = \frac{1(1 + 1)}{2} = \frac{2}{2} = 1. \]
Therefore
\[ 1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2} \quad\text{if } n = 1. \]
Let $k$ be an arbitrary but fixed natural number. Assume that

\[ 1 + 2 + 3 + \cdots + k = \frac{k(k + 1)}{2}. \tag{A.1} \]
We must prove that

\[ 1 + 2 + 3 + \cdots + (k + 1) = \frac{(k + 1)((k + 1) + 1)}{2}. \tag{A.2} \]

We have
\[
\begin{aligned}
1 + 2 + 3 + \cdots + (k + 1) &= 1 + 2 + 3 + \cdots + k + (k + 1) \\
&= \frac{k(k + 1)}{2} + (k + 1) && \text{[By (A.1)]} \\
&= \frac{k(k + 1) + 2(k + 1)}{2} \\
&= \frac{(k + 1)(k + 2)}{2} \\
&= \frac{(k + 1)((k + 1) + 1)}{2}.
\end{aligned}
\]
Therefore (A.2) is true if (A.1) is true. By the Principle of Mathematical Induction,

\[ 1 + 2 + 3 + \cdots + n = \frac{n(n + 1)}{2} \]
for every natural number $n$.

Example A.2.2. Let $a$ and $r$ be real numbers so that $r \ne 1$. We prove that

\[ a + ar + ar^2 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r} \]
for every natural number $n$.

Solution. Assume that $n = 1$. Then

\[ a + ar + ar^2 + \cdots + ar^{n-1} = a \]

and
\[ a\,\frac{1 - r^n}{1 - r} = a\,\frac{1 - r}{1 - r} = a. \]
Therefore
\[ a + ar + ar^2 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r} \quad\text{if } n = 1. \]
Let $k$ be an arbitrary but fixed natural number. Assume that

\[ a + ar + ar^2 + \cdots + ar^{k-1} = a\,\frac{1 - r^k}{1 - r}. \tag{A.3} \]
We must prove that

\[ a + ar + ar^2 + \cdots + ar^{(k+1)-1} = a\,\frac{1 - r^{k+1}}{1 - r}. \tag{A.4} \]

We have
\[
\begin{aligned}
a + ar + ar^2 + \cdots + ar^{(k+1)-1} &= a + ar + ar^2 + \cdots + ar^k \\
&= a + ar + ar^2 + \cdots + ar^{k-1} + ar^k \\
&= a\,\frac{1 - r^k}{1 - r} + ar^k && \text{[By (A.3)]} \\
&= a\left( \frac{1 - r^k}{1 - r} + r^k \right) \\
&= a\left( \frac{1 - r^k + (1 - r)r^k}{1 - r} \right) \\
&= a\left( \frac{1 - r^k + r^k - r^{k+1}}{1 - r} \right) \\
&= a\left( \frac{1 - r^{k+1}}{1 - r} \right).
\end{aligned}
\]

Therefore (A.4) is true if (A.3) is true. By the Principle of Mathematical Induction,

\[ a + ar + ar^2 + ar^3 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r} \]
for every natural number $n$.

Example A.2.3. Assume that $f : \mathbb{R} \to \mathbb{R}$ is differentiable at some point $a \in \mathbb{R}$. We prove
that
\[ g_n(x) = [f(x)]^n, \quad x \in \mathbb{R} \]
is differentiable at $a$ and $g_n'(a) = n[f(a)]^{n-1} f'(a)$ for every natural number $n$.

Solution. Assume that $n = 1$. Then $g_1(x) = f(x)$ for every $x \in \mathbb{R}$, so $g_1$ is differentiable
at $a$ by assumption, and

\[ g_1'(a) = f'(a) = 1 \times [f(a)]^{1-1} \times f'(a). \]

Therefore $g_n$ is differentiable at $a$ and $g_n'(a) = n[f(a)]^{n-1} f'(a)$ if $n = 1$. Fix a natural
number $k$. Assume that

\[ g_k \text{ is differentiable at } a \text{ and } g_k'(a) = k[f(a)]^{k-1} f'(a). \tag{A.5} \]

We must prove that

\[ g_{k+1}(x) = [f(x)]^{k+1} \text{ is differentiable at } a, \text{ and } g_{k+1}'(a) = (k + 1)[f(a)]^k f'(a). \tag{A.6} \]

We have
\[ g_{k+1}(x) = [f(x)]^{k+1} = f(x) \times [f(x)]^k = f(x) \times g_k(x), \quad x \in \mathbb{R}. \]
Therefore, by (A.5) and the Product Rule, $g_{k+1}$ is differentiable at $a$ and

\[ g_{k+1}'(a) = f'(a)g_k(a) + f(a)g_k'(a) = f'(a)[f(a)]^k + f(a) \times k \times [f(a)]^{k-1} f'(a) = (k + 1)f'(a)[f(a)]^k. \]

Therefore (A.6) is true if (A.5) is true. By the Principle of Mathematical Induction,

\[ g_n \text{ is differentiable at } a \text{ and } g_n'(a) = n[f(a)]^{n-1} f'(a) \]

for every natural number $n$.


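Example A.2.4. We prove that
\[ \frac{d^n}{dx^n}[xe^x] = ne^x + xe^x, \quad x \in \mathbb{R}, \text{ for all } n \in \mathbb{N}. \]

Solution. Fix $x \in \mathbb{R}$.

Assume that $n = 1$. Then, by the Product Rule,

\[ \frac{d^n}{dx^n}[xe^x] = \frac{d}{dx}[xe^x] = e^x + xe^x = ne^x + xe^x. \]
Therefore, if $n = 1$ then
\[ \frac{d^n}{dx^n}[xe^x] = ne^x + xe^x. \]
Fix a natural number $k$. Assume that
\[ \frac{d^k}{dx^k}[xe^x] = ke^x + xe^x. \tag{A.7} \]
We must prove that

\[ \frac{d^{k+1}}{dx^{k+1}}[xe^x] = (k + 1)e^x + xe^x. \tag{A.8} \]
By (A.7) we have
\[
\begin{aligned}
\frac{d^{k+1}}{dx^{k+1}}[xe^x] &= \frac{d}{dx}\left( \frac{d^k}{dx^k}[xe^x] \right) \\
&= \frac{d}{dx}[ke^x + xe^x] \\
&= \frac{d}{dx}[ke^x] + \frac{d}{dx}[xe^x] \\
&= ke^x + e^x + xe^x \\
&= (k + 1)e^x + xe^x.
\end{aligned}
\]

Hence (A.8) is true if (A.7) is true. Therefore, by the Principle of Mathematical Induction,
\[ \frac{d^n}{dx^n}[xe^x] = ne^x + xe^x \]
for all $n \in \mathbb{N}$.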

We sometimes encounter statements about natural numbers that are not true for all natural
numbers, but only for all natural numbers n ≥ M , where M is some natural number. The
Principle of Mathematical Induction can be applied in this case, provided that we modify
the method slightly. For the basis step, we prove that the statement is true if n = M ,
instead of proving it if n = 1. In the induction step, we assume that the statement is true
for some fixed and arbitrary natural number k ≥ M . We illustrate this in the following
example.
Example A.2.5. We prove that
\[ 4n < 2^n \]
for all natural numbers $n \ge 5$.

Solution. Assume that $n = 5$. Then

\[ 4n = 4 \times 5 = 20 < 32 = 2^5. \]
Therefore $4n < 2^n$ if $n = 5$.

Let $k$ be an arbitrary but fixed natural number such that $k \ge 5$. Assume that
\[ 4k < 2^k. \tag{A.9} \]
We must prove that
\[ 4(k + 1) < 2^{k+1}. \tag{A.10} \]
We have
\[
\begin{aligned}
4(k + 1) &= 4k + 4 \\
&< 2^k + 4 && \text{[By (A.9)]} \\
&< 2^k + 4k && \text{[Because } 4 < 4k \text{]} \\
&< 2^k + 2^k && \text{[By (A.9)]} \\
&= 2 \times 2^k \\
&= 2^{k+1}.
\end{aligned}
\]
Therefore (A.10) is true if (A.9) is true. By the Principle of Mathematical Induction,
\[ 4n < 2^n \]
for every natural number $n \ge 5$.
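
A finite computation cannot replace the induction proof, but it does show why the basis step has to start at n = 5. The following Python check is a small illustration of our own (the function name holds is hypothetical); it tests the inequality for a range of values.

```python
def holds(n):
    """True if the inequality 4n < 2^n holds for this particular n."""
    return 4 * n < 2 ** n


# The inequality fails for n = 1, 2, 3, 4 and first holds at n = 5.
print([n for n in range(1, 11) if not holds(n)])  # [1, 2, 3, 4]

# It holds for every tested n from 5 up to 199; only induction covers all n >= 5.
print(all(holds(n) for n in range(5, 200)))       # True
```

Note that n = 4 gives equality, 16 = 16, so the strict inequality only starts at n = 5.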

It should be noted that the results in some of the examples considered in this section can be
proven using other methods besides Mathematical Induction. This is the case, for instance,
in Example A.2.5.

Exercise A.2

1. Prove the following statements using the Principle of Mathematical Induction.


(a) 2 + 4 + 6 + · · · + 2n = n(n + 1) for every natural number n.
(b) For every natural number $n \ge 2$, if $x_1, \ldots, x_n$ are real numbers, then
\[ \left| \sum_{i=1}^{n} x_i \right| \le \sum_{i=1}^{n} |x_i|. \]

(c) $3^{2n} - 1$ is divisible by 8 for every natural number $n$.


(d) For $x \in \mathbb{R}$, $\dfrac{d^n}{dx^n}[xe^{2x}] = n2^{n-1}e^{2x} + 2^n xe^{2x}$ for every $n \in \mathbb{N}$.
(e) For $x \in \mathbb{R}$, $\dfrac{d^{4n}}{dx^{4n}}[e^x \cos x] = 4^n(-1)^n e^x \cos x$ for every $n \in \mathbb{N}$.
(f) For a fixed real number $a > 0$,
\[ \int_0^a t^n e^t\,dt = \sum_{k=0}^{n} \frac{n!\,(-1)^{n+k} a^k e^a}{k!} + (-1)^{n+1} n! \]

for every $n \in \mathbb{N}$.
(g) For $x, y \in \mathbb{R}$, $x - y$ is a factor of $x^n - y^n$ for every natural number $n$.
(h) $\sin 1 + \sin 2 + \cdots + \sin n = \dfrac{\cos\frac{1}{2} - \cos\left(n + \frac{1}{2}\right)}{2\sin\frac{1}{2}}$ for all $n \in \mathbb{N}$.
