ST208 - Chapters 1-6


ST208 Mathematical Methods


Contents

Module Introduction and Information

1 Preliminaries
1.1 Subsets of R²
1.2 Inverse image of a function
1.3 Partial derivatives
1.4 Calculating determinants
1.5 Chapter 1 summary

2 Multiple integration
2.1 Introduction: Fubini’s theorem
2.2 Elementary regions of integration
  2.2.1 Finding the limits of integration 1: 2-d Cartesian
2.3 Change of variables
2.4 Common coordinate changes
  2.4.1 Finding the limits of integration 2: 2-d polar coordinates
2.5 Applications of multiple integrals to statistics
  2.5.1 Geometric Probability
2.6 Chapter 2 Consolidation Questions
2.7 Chapter 2 summary

3 Symmetric matrices, quadratic forms, positive definiteness
3.1 Some linear algebra
3.2 Quadratic forms and positive definiteness
3.3 Covariance matrices
3.4 A glimpse of the future: Monte Carlo simulation of stock portfolio
3.5 Chapter 3 Consolidation Questions
3.6 Chapter 3 summary

4 Differentiation in R𝑛
4.1 Generalising the single variable case
4.2 Basic properties of the derivative
4.3 Finding and classifying critical points
  4.3.1 Critical points
  4.3.2 Classifying critical points
4.4 Constrained optimisation
4.5 Chapter 4 Consolidation Questions
4.6 Chapter 4 summary
4.7 Appendix

5 Linear algebra
5.1 Vector spaces
5.2 Change of basis and linear maps
5.3 Inner products
5.4 Orthogonal/orthonormal bases
5.5 Gram-Schmidt orthogonalisation
5.6 Projections
5.7 The spectral decomposition theorem
5.8 Application to simple linear regression
5.9 Chapter 5 Consolidation Questions
5.10 Chapter 5 summary

6 Metric spaces
6.1 Introduction to metric spaces
6.2 Open and closed sets
6.3 Convergence
6.4 Compactness
6.5 Continuity
6.6 Chapter 6 Consolidation Questions
6.7 Chapter 6 summary
Module Introduction and Information

Image Yang Hui triangle using rod numerals, as depicted in a publication of Zhu Shijie in 1303 AD. Available
via https://commons.wikimedia.org/wiki/File:Yanghui_triangle.gif. Image licence: This work is in the public
domain and is free for reuse.

Historical note Yang Hui’s triangle will look familiar to many as it is the so-called ‘Pascal’s triangle,’ named after French mathematician Blaise Pascal. Although named after Pascal, the triangle had been studied centuries earlier in India, Persia, China, and parts of Europe. It is impossible to give a definitive answer as to the first occurrence of this mathematical object. Nevertheless, there is evidence to suggest knowledge of particular properties of the triangle as early as the second century BC.

Further reading C.F. Boyer, A History of Mathematics, Wiley, 1968.

Information The module web-page https://go.warwick.ac.uk/st208 contains full module details and information. (Note, you may be asked for your Warwick sign-on to view this page.) These notes are designed to be a self-contained study resource that you must use in conjunction with the lectures, tutorials and online resources. The notes contain gaps that you must complete. The nature of the gaps indicates how you will typically complete them.

Examples We will complete these together during class.

Questions We may complete some or all of these during class. Those questions that are not completed during class you must complete as part of your independent study.

Proofs and other gaps We will typically complete these together during class.

Each chapter contains a summary and consolidation questions. The consolidation questions are designed
for when you have finished working through a chapter and typically require deeper thought to complete.

Acknowledgements These lecture notes are closely based on those of the module’s previous lecturers,
Martyn Parker, Adam Johansen, David Croydon, Heather Humphries (which were typeset by Iain Carson)
and Paul Jenkins. Responsibility for any errors is mine.
Chapter 1

Preliminaries

Chapter outcomes

At the end of the chapter you should be familiar with and be able to:

• Sketch subsets of R² and select appropriate techniques to determine if subsets of R² are closed, bounded or compact.
• Determine the pre-image (or pullback) of a given set.
• Compute partial derivatives in a range of situations.
• Apply appropriate theory to calculate determinants.

Assumed background knowledge

Before starting this chapter it is assumed that you know the following material. (References are made to the
2019-20 printed notes.)

• MAT106: Linear Algebra. Chapter 11: The determinant of a matrix

– Definition of the determinant


– The effect of matrix operations on the determinant
– The determinant of a product
– Minors and cofactors

• MA137: Mathematical Analysis. Chapter 1: Functions

– Definition of intervals.
– Standard functions.

• ST116: Mathematical Techniques. The mathematical thinking, language and writing developed is
fundamental to everything presented in this module. This includes:

– Language of sets
– Function notation, including image and pre-image.

• General mathematical knowledge. That you can fluently apply pre-university differentiation techniques.


Chapter Statistician: Abraham Manie (Abe) Adelstein

Abe was a South African doctor who became the UK’s Chief Medical Statistician. He held posts at Manchester and the London School of Hygiene and Tropical Medicine. His work included improving cancer research, and he played leading parts in World Health Organisation meetings that developed recommendations for health information systems across Europe. He was a Fellow of the Royal College of Physicians and held several other honours.

Figure 1.1: Abraham Manie (Abe) Adelstein.

Image available via: https://imgix.ranker.com/user_node_img/20/392275/original/abraham-manie-adelstein-all-people-photo-u1?w=650/&q=50/&fm=pjpg/&fit=crop/&crop=faces Image licence: Creative commons licence.

1.1 Subsets of R²

“The numbers have no way of speaking for themselves. We speak for them. We imbue them with
meaning…”
— Nate Silver, The Signal and the Noise, Penguin UK, 2012

In order to be able to define properly the concepts of integration and differentiation in higher dimensions,
we need to gather and revise some ideas about sets and functions. The purpose of this chapter is to revise
these topics.

We will need to be able to sketch, quickly and efficiently, simple subsets of R2 defined by constraints such
as simple bounding curves and simple inequalities. We also need to be able to decide whether our sets are
closed, bounded or compact. We will meet the formal definitions of these terms in Chapter 6 (on metric spaces); for now, it will be enough to say that a set is:

• closed if it contains its own boundary, or if any point in its complement can be surrounded by a disc
of suitably small radius (𝜀 > 0) which also lies in the complement;
• bounded if it could be contained completely within a circle, centre at the origin, of sufficiently large
radius;
• compact if it is both closed and bounded. (This is adequate if we restrict our attention to ℝ𝑛 ; it is not
correct in more general metric spaces.)

There is an equivalent statement for closed sets. A set 𝑋 ⊆ ℝ𝑛 is closed if, and only if, the limit of every
convergent sequence in 𝑋 belongs to 𝑋 . Please remember that once we see these definitions formally in
Chapter 6 you will need to utilise the updated definitions.

Example 1.1
Sketch the following subsets of R2 , and decide if they are closed, if they are bounded and if they
are compact.

1. {(𝑥, 𝑦) ∈ R2 ∶ 𝑥 + 𝑦 ≥ 1, 𝑥2 + 𝑦2 < 1},


2. {(𝑥, 𝑦) ∈ R2 ∶ 𝑥2 + 𝑦 2 ≥ 1, |𝑥| + |𝑦| < 2},
3. {(𝑥, 𝑦) ∈ R2 ∶ |𝑥| + |𝑦| ≥ 1, max{|𝑥|, |𝑦|} ≤ 1}.
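The sketching itself we complete in class; as a quick computational cross-check, the following minimal sketch (assuming numpy and matplotlib are available) shades set 1 by evaluating its defining inequalities on a grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid over a window containing the set.
x, y = np.meshgrid(np.linspace(-1.5, 1.5, 400), np.linspace(-1.5, 1.5, 400))

# Boolean mask for set 1: x + y >= 1 and x^2 + y^2 < 1.
mask = (x + y >= 1) & (x**2 + y**2 < 1)

plt.contourf(x, y, mask.astype(int), levels=[0.5, 1.5])  # shade where mask holds
plt.gca().set_aspect("equal")
plt.title("x + y >= 1 and x^2 + y^2 < 1")
plt.show()
```

The same grid-and-mask pattern works for the other two sets by changing the boolean expression.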

Question 1.2
Sketch the following subsets of R2 , and decide if they are closed, if they are bounded and if they
are compact.

1. 𝐴1 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥 + 𝑦 ≥ 1},
2. 𝐴2 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥 + 𝑦 ≥ 1, 𝑥2 + 𝑦2 ≤ 1},
3. 𝐴3 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑦 = 0, 𝑥 ≤ 0},
4. 𝐴4 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥2 + 𝑦2 ≥ 3, 𝑥2 + 4𝑦2 ≤ 4},
5. 𝐴5 = ([0, 1] × [0, 1]) ∩ (Q × Q),
6. 𝐴6 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥2 = 𝑦2 },
7. 𝐴7 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥2 ≤ 𝑦},
8. 𝐴8 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥2 ≤ 𝑦 ≤ 1},
9. 𝐴9 = {(𝑥, 𝑦) ∈ R2 ∶ |𝑥| ≤ 1, 0 ≤ 𝑦2 < 1},
10. 𝐴10 = {(𝑥, 𝑦) ∈ R2 ∶ |𝑥| + |𝑦| ≥ 1, |𝑦| ≤ 2},
11. 𝐴11 = {(𝑥, 𝑦) ∈ R2 ∶ 𝑥2 + 𝑦2 ≤ 1, 𝑥 ∈ Q}.

1.2 Inverse image of a function


Consider a function 𝑓 ∶ 𝐴 → 𝐵, 𝑎 ↦ 𝑓(𝑎) = 𝑏. This may be injective (into), surjective (onto), neither
or both. A bijective function 𝑓 (i.e. one that is both injective and surjective and hence mapping exactly one
element of 𝐴 to each element of 𝐵 ) has a related inverse function 𝑓 −1 , which is defined as the function
𝑓 −1 such that if 𝑓(𝑎) = 𝑏, then 𝑓 −1 (𝑏) = 𝑎.
If 𝑓 is not bijective, then we might still be interested in which point(s) of 𝐴 map to a particular point 𝑏 ∈ 𝑓(𝐴) ⊆ 𝐵, where 𝑓(𝐴) is the image of 𝐴 under 𝑓. The preimage, inverse image or pull-back of the subset 𝑋 of 𝐵 is defined to be

𝑓 −1 (𝑋) = {𝑎 ∈ 𝐴 ∶ 𝑓(𝑎) ∈ 𝑋}.

In particular,
𝑓 −1 ({𝑏}) = {𝑎 ∈ 𝐴 ∶ 𝑓(𝑎) = 𝑏},
which is often referred to as the fibre of 𝑏.
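Preimages can also be explored numerically. The following minimal sketch (assuming numpy and matplotlib are available) shades a preimage by testing, pointwise on a grid, whether the image lands in the target set; it uses the function of Example 1.3 below.

```python
import numpy as np
import matplotlib.pyplot as plt

f = lambda x, y: 1 + x**2 + y**2          # the function from Example 1.3

x, y = np.meshgrid(np.linspace(-4, 4, 600), np.linspace(-4, 4, 600))
vals = f(x, y)

# A point (x, y) belongs to the preimage of X = [0, 2] ∪ [5, 10]
# exactly when f(x, y) lands in X.
in_X = ((vals >= 0) & (vals <= 2)) | ((vals >= 5) & (vals <= 10))

plt.contourf(x, y, in_X.astype(int), levels=[0.5, 1.5])
plt.gca().set_aspect("equal")
plt.title("Preimage of [0,2] ∪ [5,10] under f(x,y) = 1 + x^2 + y^2")
plt.show()
```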

Example 1.3
Let 𝑓 ∶ R2 → R be defined by (𝑥, 𝑦) ↦ 1 + 𝑥2 + 𝑦2 . Find 𝑓 −1 ((0, 1)), and sketch 𝑓 −1 ([0, 2] ∪
[5, 10]).

Question 1.4

1. Let 𝑓 ∶ R2 → R, (𝑥, 𝑦) ↦ 𝑥2 + 𝑦2 . Find 𝑓 −1 ([1, 4]), 𝑓 −1 ({−4}), 𝑓 −1 ([−4, 1]), 𝑓 −1 (Z).


2. Let 𝑓 ∶ R2 → R2 , (𝑥, 𝑦) ↦ (𝑥2 + 𝑦2 , 𝑥𝑦). Sketch 𝑓 −1 ([1, 4] × [0, 1]).
3. Let 𝑓 ∶ R2 → R2 , (𝑥, 𝑦) ↦ (𝑦, 𝑥2 + 𝑦2 ). Find 𝑓 −1 (R × {−1}), and sketch 𝑓 −1 ([0, 1] ×
[1, 2]).
4. Let 𝑓 ∶ R → R2 , 𝑥 ↦ (|𝑥|, 𝑥). Show 𝑓 −1 ({1} × R) = {−1, 1}, and find 𝑓 −1 ([0, 2] ×
[−1, 1]), 𝑓 −1 ([−1, 1] × [0, 2]).
5. Let 𝑓 ∶ R2 → R, (𝑥, 𝑦) ↦ max{|𝑥|, |𝑦|}. Sketch 𝑓 −1 ([−1, 1] ∪ [2, 4]).
6. Let f : R² → R, (x, y) ↦ |x| + 2|y|. Find f⁻¹({−1}), and sketch f⁻¹({1}), f⁻¹(f({1} × R)).

1.3 Partial derivatives


A function f : R → R is differentiable at x if the limit

lim_{h→0} ( f(x + h) − f(x) ) / h

exists. Write f′(x) for the limit and call this the derivative of f at x. For functions of one variable, the derivative gives the slope of the tangent to the graph.
Consider f : U (⊆ Rⁿ) → Rᵐ; what is the ‘derivative of f’? For the moment, we specialise to Rⁿ → R, and initially consider n = 2. This simplification permits graphical motivation; in particular, the question is: what is the derivative of f at a point (a, b)? The general question must wait until later in the module. Graphically, near (a, b) the graph of the function is a surface sitting over the (x, y)-plane.

Observe that if we slice through (a, b) parallel to the x- and y-axes, the result is the graph of two related functions which map R → R. The figure illustrates a slice of the function parallel to the x-axis. The graph of x ↦ f(x, b) is single-valued; that is, a map R → R. Differentiating the function x ↦ f(x, b) in the normal way one obtains ∂f/∂x (a, b), which we abbreviate as f_x(a, b). In a similar manner, parallel to the y-axis the graph of y ↦ f(a, y) is single-valued; that is, a map R → R. Once again, differentiating the function y ↦ f(a, y) in the normal way one obtains ∂f/∂y (a, b), which we abbreviate as f_y(a, b).
In the case 𝑓 ∶ R𝑛 → R, define

∂f/∂x_i (x) = lim_{h→0} ( f(x₁, …, x_{i−1}, x_i + h, x_{i+1}, …, x_n) − f(x) ) / h,

where x = (x₁, …, x_n). Finally, in the general case, where f(x) = (f₁(x), …, f_m(x)), this approach yields a matrix, J, of partial derivatives whose entries are

J_ij = ∂f_i / ∂x_j.

Returning to the general question: finding partial derivatives is only the first step in determining the derivative of f : U (⊆ Rⁿ) → Rᵐ.
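Partial derivatives of explicit formulas can be checked symbolically. The sketch below (assuming sympy is available) differentiates the function of Example 1.5 below and solves f_x = f_y = 0, so you can verify your hand computation.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x / (1 + x**2 + y**2)          # the function of Example 1.5

fx = sp.simplify(sp.diff(f, x))    # ∂f/∂x
fy = sp.simplify(sp.diff(f, y))    # ∂f/∂y
print(fx, fy)

# Points where both partial derivatives vanish:
print(sp.solve([sp.Eq(fx, 0), sp.Eq(fy, 0)], [x, y], dict=True))
```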

Example 1.5
Let f : R² → R, (x, y) ↦ f(x, y) be given by f(x, y) = x / (1 + x² + y²). Find f_x, f_y, and any points where f_x = f_y = 0.

Question 1.6

1. Let f : R² → R, (x, y) ↦ f(x, y) be as given below. Find f_x, f_y, and any points where f_x = f_y = 0.

a) f(x, y) = x³ + 6 − y³ − 2xy,
b) f(x, y) = 6x² − 2x³ + 3y² + 6xy,
c) f(x, y) = x³ + y³ + 3x² − 3y² − 8,
d) f(x, y) = x⁴ − 8x² + 3y² − 6y,
e) f(x, y) = 6xy e^{−(2x+3y)},
f) f(x, y) = x + 8y + 1/(xy). Note, here, f : (R \ {0})² → R,
g) f(x, y) = xy e^{−x²−y²}.

2. Let f : R³ → R, (x, y, z) ↦ f(x, y, z) be as given below. Find f_x, f_y, f_z, and any points where f_x = f_y = f_z = 0.

a) f(x, y, z) = x²y + y²z + z² − 2x,
b) f(x, y, z) = xyz − x² − y² − z²,
c) f(x, y, z) = 4xyz − x⁴ − y⁴ − z⁴.

1.4 Calculating determinants


The purpose of this section is to apply theoretical properties to simplify determinant computations. The determinant of a matrix is a function from the space of square matrices to R; that is, the determinant of a matrix A, written det A or |A|, is a number, or expression, that depends on the elements of A. The determinant possesses properties that simplify its computation, for example,

1. subtracting a multiple of one row (column) from another,


2. taking out a common factor from any row (column),
3. recognising upper triangular, lower triangular or diagonal matrices,
4. choosing the ‘best’ row (column) to expand along,
5. using ‘block’ matrices where appropriate,
6. identifying rows or columns which can be expressed as linear combinations of other rows or columns,
respectively.
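Properties 1–3 are easy to illustrate numerically; a minimal sketch (assuming numpy is available; the random matrix is just a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(3, 3)).astype(float)

# Property 1: subtracting a multiple of one row from another leaves det unchanged.
B = A.copy()
B[1] -= 2 * B[0]
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))       # True

# Property 2: a common factor c in one row scales the determinant by c.
C = A.copy()
C[2] *= 5
print(np.isclose(np.linalg.det(C), 5 * np.linalg.det(A)))   # True

# Property 3: the determinant of a triangular matrix is the product of its diagonal.
T = np.triu(A)
print(np.isclose(np.linalg.det(T), np.prod(np.diag(T))))    # True
```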

Example 1.7
Compute the determinants of the following matrices

1.
⎛ 2 1 1 ⎞
⎜ 1 1 1 ⎟
⎝ 2 2 2 ⎠

2.
⎛ 2 1 1 ⎞
⎜ 0 1 2 ⎟
⎝ 0 0 1 ⎠

3.
⎛ 2 1 2 ⎞
⎜ 0 1 0 ⎟
⎝ 2 0 1 ⎠

4.
⎛ 2 1 0 ⎞
⎜ 2 2 0 ⎟
⎝ 0 0 1 ⎠

5.
⎛ 0 1 0 0 ⎞
⎜ 0 1 0 5 ⎟
⎜ 0 2 1 6 ⎟
⎝ 1 2 6 1 ⎠

Question 1.8

1. Compute the determinant of


⎛ 4 2 4 ⎞
⎜ 0 1 0 ⎟
⎝ 2 0 1 ⎠
2. Let
    ⎛ 1 2 −2 ⎞
B = ⎜ 1 5  3 ⎟ .
    ⎝ 2 6 −1 ⎠
Compute det(𝐵) by

a) expanding along row 1.


b) expanding down column 3.
c) row reducing the matrix.

3. Let
    ⎛  1    3   −2  ⎞
C = ⎜  4  𝑘 + 3  5  ⎟ .
    ⎝ −2   −6  𝑘 + 2 ⎠
Determine the value(s) of 𝑘 for which det 𝐶 = 0.

1.5 Chapter 1 summary

Chapter outcomes review

Having reviewed the material and completed the assessment (formative and summative) material,
you should be able to:

• Sketch subsets of R² and select appropriate techniques to determine if subsets of R² are closed, bounded or compact.
• Determine the pre-image (or pullback) of a given set.
• Compute partial derivatives in a range of situations.
• Apply appropriate theory to simplify determinant calculations.

This chapter covers four topics.

Subsets of R². This section examines sketching (quickly and efficiently) subsets of R² defined by constraints such as bounding curves and inequalities, and deciding whether sets are closed, bounded or compact in R². Specifically, a set is:

• closed if it contains its own boundary, or if any point in its complement can be surrounded by a disc
of suitably small radius (𝜀 > 0) which also lies in the complement;

• bounded if it could be contained completely within a circle, centre at the origin, of sufficiently large
radius;
• compact if it is both closed and bounded.

Inverse images. Consider a function 𝑓 ∶ 𝐴 → 𝐵, where 𝑎 maps to 𝑓(𝑎) = 𝑏. This may be injective
(into), surjective (onto), neither or both. A bijective function 𝑓 (i.e. one that is both injective and surjective
and hence mapping exactly one element of 𝐴 to each element of 𝐵 ) has a related inverse function 𝑓 −1 ,
which is defined as the function 𝑓 −1 such that if 𝑓(𝑎) = 𝑏, then 𝑓 −1 (𝑏) = 𝑎. If 𝑓 is not bijective, then
we might still be interested in which point(s) of 𝐴 map to a particular point 𝑏 ∈ 𝑓(𝐴) ⊆ 𝐵 , where 𝑓(𝐴) is
the image of 𝐴 under 𝑓 . The preimage, inverse image or pull-back of the subset 𝑋 of 𝐵 is defined to be
𝑓 −1 (𝑋) = {𝑎 ∈ 𝐴 ∶ 𝑓(𝑎) ∈ 𝑋}.
Partial derivatives. A function f : R → R is differentiable at x if

lim_{h→0} ( f(x + h) − f(x) ) / h

exists. The limit is written f′(x) and is called the derivative of f at x.

Consider f : U (⊆ Rⁿ) → Rᵐ, with particular attention to the cases when (i) m = 1 and (ii) n = 2 and m = 1. Consider the specific case in (ii); that is, f : R² → R: what is the derivative of f at (a, b) ∈ R²? In this case, there are two natural functions to consider. For x ↦ f(x, b), since b is constant, the standard approach permits differentiation with respect to x to give ∂f/∂x (a, b) or f_x(a, b). In a similar manner, the standard approach applied to y ↦ f(a, y) gives ∂f/∂y (a, b) or f_y(a, b). In the case f : Rⁿ → R, define

∂f/∂x_i (x) = lim_{h→0} ( f(x₁, …, x_{i−1}, x_i + h, x_{i+1}, …, x_n) − f(x) ) / h,

where x = (x₁, …, x_n). Finally, in the general case, where f(x) = (f₁(x), …, f_m(x)), this approach yields a matrix, J, of partial derivatives whose entries are

J_ij = ∂f_i / ∂x_j.

Calculating determinants. Theoretical considerations simplify determinant computation through the following approaches.

• subtracting a multiple of one row (column) from another,


• taking out a common factor from any row (column),
• recognising upper triangular, lower triangular or diagonal matrices,
• choosing the best row (column) to expand along,
• using block matrices where appropriate,
• identifying rows or columns which can be expressed as linear combinations of other rows or columns,
respectively.
Chapter 2

Multiple integration

Chapter outcomes

At the end of the chapter you should be familiar with and be able to:

• apply, where appropriate, a special case of Fubini’s theorem;


• solve problems that require reversing the order of integration;
• select and apply change of variables to suitable integrals.
• solve using appropriate integration techniques a range of problems, including some involving
statistics.

Assumed background knowledge

Before starting this chapter it is assumed that you know the following material. (References are made to the
2019-20 printed notes.)

• MAT106: Linear Algebra. Chapter 11: The determinant of a matrix

– Definition of the determinant.


– Calculation of the determinant

• MA137: Mathematical Analysis. Chapter 1: Functions

– Definition of intervals.
– Standard functions.

• ST116: Mathematical Techniques. The mathematical thinking, language and writing developed is
fundamental to everything presented in this module. This includes:

– Language of sets
– Function notation, including image and pre-image.

• General mathematical knowledge. It is assumed that you have differentiation proficiency; for example, that you can differentiate standard functions, utilising rules such as the product, quotient and chain rules. It is also assumed that you can fluently integrate functions, including applying standard integration rules.


Chapter Statistician: Kiyoshi Ito

Kiyoshi Ito was a Japanese mathematician and a major contributor to the theory of probability. Ito developed the work of others to apply the techniques of differential and integral calculus to stochastic processes (random phenomena that evolve over time), such as Brownian motion. This work became known as the Ito stochastic calculus. Ito calculus has applications in several fields, including engineering, population genetics, and mathematical finance.

Figure 2.1: Kiyoshi Ito.

Image available via: https://upload.wikimedia.org/wikipedia/commons/c/c1/Kiyosi_Ito.jpg Image licence:


Creative Commons Attribution-Share Alike 2.0 Germany.

2.1 Introduction: Fubini’s theorem

We know how to define the (Riemann) integral of a function f : R → R using a limit of (Riemann) sums (by partitioning the x-axis using tiny intervals), and we have seen that this can be used to obtain the area under the graph of f. It is clear that a similar limiting process can be carried out by partitioning the rectangle beneath a bivariate function into ‘tiny’ rectangles, and we can thus define the integral of a function f(x, y) over a rectangle R, denoted:

∫_R f(x, y) dA = ∬_R f(x, y) dx dy,

and of course we could extend this to cuboids and their higher-dimensional equivalents.

Figure 2.2: Surface approximated by cuboids.

Being able to define an integral is not the same as being able to evaluate it, and we do not wish to be
restricted to integrating over rectangular regions, so we now develop some ideas to allow us to address
these problems.

Fubini’s theorem is a mathematical result which makes precise when we can expect to change the order of integration (or of integration and other operations in which limits are taken) without affecting the result. For the avoidance of later confusion, note that the general and formal form of Fubini’s theorem normally considers the Lebesgue rather than the Riemann integral; we’ll consider only functions sufficiently regular that the distinction is unimportant. You’ll see formal versions of this result in modules such as MA359: Measure Theory, ST342: Mathematics of Random Events and ST318: Probability Theory.

We will make use of the following simple special case of Fubini’s theorem in this module:

Fubini’s Theorem

If 𝑓 is a continuous function with domain 𝑅 = [𝑎, 𝑏] × [𝑐, 𝑑], then


∫_a^b ( ∫_c^d f(x, y) dy ) dx = ∫_R f(x, y) dA = ∫_c^d ( ∫_a^b f(x, y) dx ) dy.

Cavalieri’s principle states that each of the above integrals will find the volume under the graph of f over the rectangle R (by cutting into slices parallel to one axis and integrating to obtain the cross-sectional area). The next figure illustrates this principle for f : R² → R defined by f(x, y) = ½ y sin(x) + 2.

Figure 2.3: Illustration of Cavalieri’s principle

Example 2.1
Evaluate the integrals

(a) ∫_{x=0}^{3} ( ∫_{y=1}^{2} x²y dy ) dx    (b) ∫_{y=1}^{2} ( ∫_{x=0}^{3} x²y dx ) dy.
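A numerical cross-check of Fubini’s theorem on this example, as a minimal sketch assuming scipy is available (note that dblquad expects the integrand with the inner variable first):

```python
from scipy.integrate import dblquad

f = lambda y, x: x**2 * y          # dblquad expects f(inner, outer)

# (a): inner integral over y ∈ [1, 2], outer over x ∈ [0, 3].
I_a, _ = dblquad(f, 0, 3, lambda x: 1, lambda x: 2)

# (b): swap the roles, inner over x ∈ [0, 3], outer over y ∈ [1, 2].
g = lambda x, y: x**2 * y
I_b, _ = dblquad(g, 1, 2, lambda y: 0, lambda y: 3)

print(I_a, I_b)   # the two iterated integrals agree, as Fubini guarantees
```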

The conditions of Fubini’s theorem cannot be relaxed

For example,

∫_{x=0}^{2} ( ∫_{y=0}^{1} xy(x² − y²)/(x² + y²)³ dy ) dx = 1/5,

whereas

∫_{y=0}^{1} ( ∫_{x=0}^{2} xy(x² − y²)/(x² + y²)³ dx ) dy = −1/20,

where (x, y) ≠ (0, 0). The exercises provide a method for you to evaluate these integrals.

• With a bit of luck, we will be able to integrate one of the iterated integrals using familiar integrals of
functions R → R.
• It is not necessary for 𝑓 to be continuous for it to be integrable, and Fubini’s theorem does generalise
to bounded functions with ‘nice’ discontinuities, but the statement is messy, and so we will restrict to
continuous 𝑓 .
• ∫𝑅 𝑓(𝑥, 𝑦)𝑑𝐴 inherits all the properties we like from univariate integrals: linearity, monotonicity and
so on.

Example 2.2
Suppose f : R² → R satisfies f(x, y) = g(x)h(y) for some real-valued functions g and h. Determine ∫_a^b ( ∫_c^d f(x, y) dy ) dx.

Question 2.3
Evaluate the integral

∫_{y=1}^{2} ∫_{x=0}^{3} (1 + 8xy) dx dy.

2.2 Elementary regions of integration


The question arises: what happens with non-rectangular domains? The ideas described above can be extended to less restricted regions of integration. For example, consider integrating the function f(x, y) = y²x² over the region bounded below by y = −x, above by y = x, and to the right by the line x = 1. Graphically we have

Figure 2.4: Example type 1 region.

In general, we can classify certain regions of integration by type. Let φ₁ and φ₂ mapping [a, b] to R be continuous functions satisfying φ₂(t) ≤ φ₁(t), t ∈ [a, b]. Let R = {(x, y) : x ∈ [a, b], y ∈ [φ₂(x), φ₁(x)]}; then R is a region of type 1.

Let φ₃ and φ₄ mapping [c, d] to R be continuous functions satisfying φ₄(t) ≤ φ₃(t), t ∈ [c, d]. Let S = {(x, y) : y ∈ [c, d], x ∈ [φ₄(y), φ₃(y)]}; then S is a region of type 2.

A region of type 3 is one that is both type 1 and type 2.

These are our elementary regions, and their boundaries are just the sorts of sets which allow integrability; what’s more:

• if R is type 1, then ∫_R f(x, y) dA = ∫_a^b ∫_{φ₂(x)}^{φ₁(x)} f(x, y) dy dx;
• if R is type 2, then ∫_R f(x, y) dA = ∫_c^d ∫_{φ₄(y)}^{φ₃(y)} f(x, y) dx dy;
• if R is type 3, then ∫_R f(x, y) dA = ∫_a^b ∫_{φ₂(x)}^{φ₁(x)} f(x, y) dy dx = ∫_c^d ∫_{φ₄(y)}^{φ₃(y)} f(x, y) dx dy, and we may use whichever version we wish.

Higher dimensions We can generalise the regions over which we can integrate to 3-d and higher dimensions with similar, but much more technical, definitions.

It suffices to say that we may be able to ‘spear’ the region in a co-ordinate direction and move into the region,
pass through, and exit the region in the same simple fashion with nicely defined boundary curves, allowing
us to produce the iterated integral(s) of the type,

∫_W f(x, y, z) dV = ∫_a^b ∫_{φ₂(x)}^{φ₁(x)} ∫_{γ₂(x,y)}^{γ₁(x,y)} f(x, y, z) dz dy dx.

We will restrict our attention to very simple regions in 3-d or higher. Of course, even when we have an
elementary region and a free choice of the order in which we carry out the iterated integration, the function
to be integrated might be too complicated to work with!

If we consider f : C ⊆ R³ → R, where C is a cuboid, we can make analogous definitions for ∫_C f(x, y, z) dx dy dz, and Fubini’s theorem tells us that we may, if f is ‘nice’, evaluate this integral via one of the (six distinct sequences of three) iterated integrals for f over C. And similarly for higher dimensions.

2.2.1 Finding the limits of integration 1: 2-d Cartesian


Suppose we wish to evaluate

I = ∫_{x=0}^{3} ∫_{y=√(x/3)}^{1} e^{y³} dy dx.

At present, we cannot evaluate the integral; the integral of e^{y³} with respect to y cannot be expressed in terms of elementary functions. If, however, we were to integrate e^{y³} with respect to x (so e^{y³} is a constant) then the integration is straightforward.

Suppose we wish to reverse the order of integration; that is, given ∬_R f dx dy, we want to perform ∬_R f dy dx. We illustrate the process for

∫∫_R f dA = ∫_{y=0}^{1} ∫_{x=1−y}^{√(1−y²)} f dx dy.

1. Sketch and label the region 𝑅 and its bounding curves.



2. Introduce a cursor pointing in the direction of the first desired integration, and mark where it enters
and leaves 𝑅.

3. Move the cursor back and forth through R parallel to the second desired integration direction, and mark the values where it enters and leaves R.

4. Write the integral with the desired order of integration:



Note that

∬_R dx dy = Area(R),    ∭_R dx dy dz = Volume(R).

Example 2.4
Evaluate I = ∬_R (x² + y²) dy dx where R is the region {(x, y) ∈ R² : 0 ≤ x ≤ 1, x² ≤ y ≤ x}. Verify the answer by reversing the order of integration.
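A symbolic cross-check of this example is possible; the sketch below (assuming sympy is available) computes the integral over the type 1 description and over the reversed, type 2 description, and compares them.

```python
import sympy as sp

x, y = sp.symbols("x y", nonnegative=True)
f = x**2 + y**2

# Type 1 description: 0 <= x <= 1, x**2 <= y <= x (integrate y first).
I1 = sp.integrate(f, (y, x**2, x), (x, 0, 1))

# Type 2 description of the same region: 0 <= y <= 1, y <= x <= sqrt(y).
I2 = sp.integrate(f, (x, y, sp.sqrt(y)), (y, 0, 1))

print(I1, I2, sp.simplify(I1 - I2) == 0)   # same value either way
```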

Returning to the motivating example.

Example 2.5
Evaluate
I = ∫_{x=0}^{3} ∫_{y=√(x/3)}^{1} e^{y³} dy dx.

Question 2.6

1. Sketch the regions of integration:

a) ∫_{x=0}^{3} ∫_{y=0}^{2} f(x, y) dy dx,
b) ∫_{x=0}^{3} ∫_{y=−2}^{0} f(x, y) dy dx,
c) ∫_{y=1}^{ln 8} ∫_{x=0}^{ln y} f(x, y) dx dy,
d) ∫_{y=1}^{2} ∫_{x=y}^{y²} f(x, y) dx dy.

2. For the integrals below, write the integral equivalent with the order of integration reversed:

a) ∫_{y=0}^{1} ∫_{x=y}^{√y} dx dy,
b) ∫_{y=0}^{1} ∫_{x=−√(1−y²)}^{√(1−y²)} dx dy,
c) ∫_{y=0}^{ln 2} ∫_{x=e^y}^{2} dx dy.

3. Evaluate:

a) ∫_{x=0}^{2} ∫_{y=x}^{2} 2y² sin(xy) dy dx,
b) ∫_{y=0}^{√((ln 3)/2)} ∫_{x=y√2}^{√(ln 3)} e^{x²} dx dy.

Question 2.7
Determine the volume of the solid bounded above by the plane z = 4 − x − y and below by the rectangle R = {(x, y) ∈ R² : 0 ≤ x ≤ 1, 0 ≤ y ≤ 2}.

2.3 Change of variables

If a change in the order of integration is unhelpful or impossible, perhaps a change of variables can help. We motivate the approach using a 2-d integral; that is,

∫_R f(x, y) dA = ∬_R f(x, y) dx dy.

Suppose (x, y) = T(u, v) = (g(u, v), h(u, v)), where T is C¹ and bijective. Figure 2.5 provides a schematic representation.

Figure 2.5: Representation of 2d transformation.



So combining everything together gives

∫_R f(x, y) dx dy = ∫_G f*(u, v) ∣∂(x, y)/∂(u, v)∣ du dv,

where f*(u, v) = f(g(u, v), h(u, v)), G = T⁻¹(R) and

∣∂(x, y)/∂(u, v)∣ = ∣det J∣,  where  J = ⎛ ∂g/∂u  ∂g/∂v ⎞
                                         ⎝ ∂h/∂u  ∂h/∂v ⎠ .

Do not forget the modulus of the determinant of the matrix.

We can produce a corresponding (more complicated) formula for higher dimensions. In particular, for 3-d suppose we have a transformation T : (u, v, w) ↦ (x, y, z) that is C¹ and one-to-one (except possibly on a ‘nice’ set); then:

∫_W f(x, y, z) dx dy dz = ∫_{W*} f*(u, v, w) ∣∂(x, y, z)/∂(u, v, w)∣ du dv dw.

Here:

1. 𝑓 ∗ (𝑢, 𝑣, 𝑤) = 𝑓(𝑇 (𝑢, 𝑣, 𝑤)) is the function 𝑓 rewritten in the coordinates 𝑢, 𝑣, 𝑤, and hopefully
has a simpler form than 𝑓 ;
2. 𝑊 ∗ = 𝑇 −1 (𝑊 ) is the region in (𝑢, 𝑣, 𝑤)-space corresponding to 𝑊 in (𝑥, 𝑦, 𝑧)-space, see Figure
2.6.

Figure 2.6: Transformation 𝑇 mapping between the set 𝑊 ∗ and 𝑊 .



3. ∣∂(x, y, z)/∂(u, v, w)∣ denotes the modulus of the determinant of the Jacobian matrix, i.e. |det J(u, v, w)|, where

J(u, v, w) = ⎛ ∂x/∂u  ∂x/∂v  ∂x/∂w ⎞
             ⎜ ∂y/∂u  ∂y/∂v  ∂y/∂w ⎟
             ⎝ ∂z/∂u  ∂z/∂v  ∂z/∂w ⎠ .
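Jacobian determinants of explicit transformations can be computed symbolically. The sketch below (assuming sympy is available) does this for the polar and spherical changes of variables that appear in the next section.

```python
import sympy as sp

r, theta = sp.symbols("r theta", positive=True)

# Polar change of variables (x, y) = (r cos θ, r sin θ).
x = r * sp.cos(theta)
y = r * sp.sin(theta)
J = sp.Matrix([x, y]).jacobian([r, theta])
print(sp.simplify(J.det()))        # r

# Spherical change of variables from Section 2.4.
rho, phi = sp.symbols("rho phi", positive=True)
xs = rho * sp.sin(phi) * sp.cos(theta)
ys = rho * sp.sin(phi) * sp.sin(theta)
zs = rho * sp.cos(phi)
Js = sp.Matrix([xs, ys, zs]).jacobian([rho, phi, theta])
print(sp.simplify(Js.det()))       # rho**2 * sin(phi)
```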

Example 2.8
Find

I = ∬_R [ √(y/x) + √(xy) ] dx dy,

where R is illustrated below and given by the shaded area between y = 4x, y = x, xy = 1 and xy = 9.

The purpose of the next example is to illustrate how to approach change of variables when the region is not
bounded by contour curves.

Example 2.9
Evaluate
∫_{x=0}^{1} ∫_{y=0}^{x} dy dx

using the transformation 𝑢 = 𝑥 + 𝑦 and 𝑣 = 𝑥 − 𝑦.



Useful result

The following result is often useful. It is an application of the chain rule to multivariable functions. We
will see the chain rule in Chapter 4, but for the moment we can apply the result without verification.

( ∂(x, y)/∂(u, v) ) · ( ∂(u, v)/∂(x, y) ) = 1.

This result generalises.

Question 2.10
Using a suitable change of variables, evaluate ∬_R (x² + y²) dx dy, where R is the square region in R² bounded by the lines connecting (0, 0), (1, 1), (2, 0) and (1, −1), illustrated below.

Question 2.11

1. By using the change of variables u = x + y, v = y − 2x, or otherwise, evaluate

∫_{x=0}^{1} ∫_{y=0}^{1−x} [ x + y(y − 2x)² ] dy dx.

2. Evaluate

I = ∫_{z=0}^{3} ∫_{y=0}^{4} ∫_{x=y/2}^{(y/2)+1} ( (2x − y)/2 + z/3 ) dx dy dz

using the transformation x = u + v, y = 2v, z = 3w.



2.4 Common coordinate changes

A number of situations utilise the same or related coordinate changes.

2-d polar coordinates: The 2d polar coordinate system utilises a distance 𝑟 from the origin and an angle
𝜃 taken anti-clockwise from the 𝑥 axis. A point 𝑃 whose Cartesian coordinate is (𝑥, 𝑦) is represented by
the ordered pair (𝑟, 𝜃), where 𝑥 = 𝑟 cos 𝜃, 𝑦 = 𝑟 sin 𝜃. A computation, which you should do, shows

∣∂(x, y)/∂(r, θ)∣ = r.

Figure 2.7: Representation of a point in 2d polar coordinates

Example 2.12
By using 2-d polar coordinates evaluate

I = ∬_R e^{−(x²+y²)} dx dy,

where R denotes the region x ≥ 0, y ≥ 0 in the xy-plane.
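A numerical sanity check of this example, as a minimal sketch assuming scipy is available (the quarter-plane is truncated at a radius beyond which the integrand is negligible; π/4 is the classical value the polar computation produces):

```python
from scipy.integrate import dblquad
import numpy as np

f = lambda y, x: np.exp(-(x**2 + y**2))

# Truncate the quarter-plane at 10; exp(-100) is negligible.
I, _ = dblquad(f, 0, 10, lambda x: 0, lambda x: 10)

print(I, np.pi / 4)
```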



3-d cylindrical coordinates: The cylindrical coordinate system is a combination of the polar coordinate system in the xy-plane with an additional z-coordinate vertically. A point P whose Cartesian coordinate is (x, y, z) is represented by the ordered triple (r, θ, z), where (r, θ) is the polar coordinate of (x, y) and z is the (signed) height of P above the xy-plane.

3-d spherical coordinates: The 3-d spherical coordinate system is a coordinate system based on spherical geometry. A point P whose Cartesian coordinate is (x, y, z) is represented by the ordered triple (ρ, θ, φ), where x = ρ sin φ cos θ, y = ρ sin φ sin θ, z = ρ cos φ.

Note, there are different conventions regarding spherical polar coordinates. For example, some other conventions interchange θ and φ; you must remember this if you start looking on the web or in textbooks.

2.4.1 Finding the limits of integration 2: 2-d polar coordinates

Consider a region R in 2-d polar coordinates. We illustrate finding the limits of integration for ∬_R dr dθ where R is the region outside the circle r = 1, inside the cardioid r = 1 + cos θ, and with x ≥ 0.

Example 2.13
Determine

∬_R y dA

where R is the part of the first quadrant inside x² + y² = 4 and outside x² + y² = 1.

Example 2.14
Determine the volume of the region in R³ that is bounded above by the hemisphere z = √(a² − x² − y²) and below by the cone z = √(x² + y²).

Question 2.15

1. Evaluate ∭_D exp((x² + y² + z²)^{3/2}) dV, where D is the unit ball in R³.

2. Evaluate ∬_D log(x² + y²) dx dy, where D is the region below.

2.5 Applications of multiple integrals to statistics

The joint distribution of a pair of random variables X and Y is said to have probability density f : R² → [0, ∞) if

ℙ((X, Y) ∈ A) = ∬_A f(x, y) dx dy,

where ℙ((X, Y) ∈ A) is the probability that (X, Y) belongs to A.

The expected value of g(X, Y) is given by

𝔼(g(X, Y)) = ∬_{R²} g(x, y) f(x, y) dx dy

(provided this integral is well-defined and finite).
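Expectations of this kind can be approximated by Monte Carlo simulation. A minimal sketch (assuming numpy is available; the uniform density on the unit square and g(x, y) = xy are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative density: (X, Y) uniform on the unit square, so f(x, y) = 1 there.
n = 200_000
xs = rng.uniform(0, 1, n)
ys = rng.uniform(0, 1, n)

g = lambda x, y: x * y
print(g(xs, ys).mean())   # ≈ E(XY) = ∬ xy · 1 dx dy = 1/4
```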

Example 2.16
Suppose a random point is chosen uniformly from a subset 𝐷 ⊆ R2 of finite area. Determine the
probability density of (𝑋, 𝑌 ).

Question 2.17
If (X, Y) is chosen uniformly from D = {(x, y) : x² + y² ≤ 1}, determine 𝔼(XY). Determine 𝔼(√(X² + Y²)).

2.5.1 Geometric Probability


This is a practical experiment. Drop a needle of length l at random on a striped floor, with stripes a distance L apart. Suppose l < L. What is the probability that the needle does not lie across any line?

Figure 2.8: Striped floor, stripes a distance L apart, with needles of length l

A needle is described by its midpoint and the angle with the horizontal.

Figure 2.9: Striped floor, stripes a distance L apart, with needles of length l
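The drop can also be simulated. The sketch below (assuming numpy is available; the values of L and l are illustrative) parametrises each needle by its midpoint's distance to the nearest line and its angle, and compares the estimate with the classical Buffon's-needle value 1 − 2l/(πL).

```python
import numpy as np

rng = np.random.default_rng(2)
L, l = 1.0, 0.6            # stripe spacing L and needle length l < L
n = 1_000_000

# Midpoint distance to the nearest line is uniform on [0, L/2];
# the needle's angle to the lines is uniform on [0, π/2] by symmetry.
d = rng.uniform(0, L / 2, n)
theta = rng.uniform(0, np.pi / 2, n)

crosses = d <= (l / 2) * np.sin(theta)
print(1 - crosses.mean())           # estimated P(no crossing)
print(1 - 2 * l / (np.pi * L))      # classical value
```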



2.6 Chapter 2 Consolidation Questions

Question 2.18

1. Evaluate

∬_R ( (x − y)/(x + y + 2) )² dx dy

where R is the square region in the xy-plane with vertices at (1, 0), (0, 1), (−1, 0) and (0, −1).

2. Using the transformation u = x² − y² and v = y/x, evaluate ∬_D (y/x) dx dy, where D is the region between x² − y² = 1, x² − y² = 4 and y = x/2 in the positive quadrant, illustrated below.

Question 2.19
A random variable X is said to have a standard (one-dimensional) Normal distribution if it has density

f₁(x) = (1/√(2π)) e^{−x²/2}.

A pair (X, Y) is said to have a standard (two-dimensional) Normal distribution if each member of the pair has a standard one-dimensional Normal distribution and they are independent; their joint density is given by

f₂(x, y) = (1/√(2π)) e^{−x²/2} × (1/√(2π)) e^{−y²/2}.

1. Check that ∬_{R²} f₂(x, y) dx dy = 1 by:

a. using 2-d polar coordinates;

b. writing the integral as 4 ∫₀^∞ ∫₀^∞ f₂(x, y) dx dy, and then using the change of variables (u, v) ↦ (x, y) = (u, uv);

c. expressing the integral as a 3-d integral (i.e. the volume between z = 0 and the surface z = (1/(2π)) e^{−(x²+y²)/2}), and then evaluating with 3-d cylindrical coordinates.

Applying the result ∬_{R²} f₂(x, y) dx dy = 1 and Fubini’s theorem, check further that ∫_R f₁(x) dx = 1.

2. Suppose (X, Y) has a standard two-dimensional Normal distribution. Using the substitution u = (x + y)/√2, v = (x − y)/√2, show that

ℙ( (X + Y)/√2 ∈ [a, b] ) = ∫_a^b (1/√(2π)) e^{−x²/2} dx.

(Note that this implies that (X + Y)/√2 has a standard one-dimensional Normal distribution.)

Question 2.20
[This question is for interest.] Recall

1. ∫_{x=0}^{2} ( ∫_{y=0}^{1} xy(x² − y²)/(x² + y²)³ dy ) dx = 1/5,

2. ∫_{y=0}^{1} ( ∫_{x=0}^{2} xy(x² − y²)/(x² + y²)³ dx ) dy = −1/20.

To evaluate (1) consider the first integral

∫_{y=0}^{1} xy(x² − y²)/(x² + y²)³ dy

and make the substitution u = x² + y². Using the final result, evaluate the integral in (1).

To evaluate the integral in (2) consider the first integral

∫_{x=0}^{2} xy(x² − y²)/(x² + y²)³ dx

and once again make the substitution u = x² + y². Using the final result, evaluate the integral in (2).

2.7 Chapter 2 summary

Chapter outcomes review

Having reviewed the material and completed the assessment (formative and summative) material,
you should be able to:

• apply, where appropriate, a special case of Fubini’s theorem;


• solve problems that require reversing the order of integration;
• select and apply change of variables to suitable integrals.
• solve using appropriate integration techniques a range of problems, including some involving
statistics.

This chapter covers six topics.

Fubini’s theorem. Consider a continuous function on domain 𝑅 = [𝑎, 𝑏] × [𝑐, 𝑑] ⊆ R2 ; that is, 𝑅 is a
rectangle. Then, a special case of the general Fubini’s theorem states:

∫_a^b ( ∫_c^d f(x, y) dy ) dx = ∫_R f(x, y) dA = ∫_c^d ( ∫_a^b f(x, y) dx ) dy.

Cavalieri’s principle states that each of the above integrals will find the volume under the graph of 𝑓 over
the rectangle 𝑅.

Elementary regions of integration. The previous case generalises to less restricted regions.

• Let φ₁, φ₂ : [a, b] → R be continuous functions. Suppose φ₂(t) ≤ φ₁(t) for t ∈ [a, b]. Let

R = {(x, y) : x ∈ [a, b], y ∈ [φ₂(x), φ₁(x)]};

then R is a region of type 1.

• Let φ₃, φ₄ : [c, d] → R be continuous functions satisfying φ₄(t) ≤ φ₃(t) for t ∈ [c, d]. Let

S = {(x, y) : y ∈ [c, d], x ∈ [φ₄(y), φ₃(y)]};

then S is a region of type 2.

• A region of type 3 is a region that is both of type 1 and of type 2.

Regions of type 1, 2, 3 are elementary regions; their boundaries permit integrability, and moreover:

• if R is type 1, then ∫_R f(x, y) dA = ∫_a^b ∫_{φ₂(x)}^{φ₁(x)} f(x, y) dy dx;
• if R is type 2, then ∫_R f(x, y) dA = ∫_c^d ∫_{φ₄(y)}^{φ₃(y)} f(x, y) dx dy;
• if R is type 3, then ∫_R f(x, y) dA = ∫_a^b ∫_{φ₂(x)}^{φ₁(x)} f(x, y) dy dx = ∫_c^d ∫_{φ₄(y)}^{φ₃(y)} f(x, y) dx dy.

Generalisations to suitable regions in three and higher dimensions exist, but consideration is given only to simple regions in Rⁿ, n ≥ 3.

Finding the limits of integration 1: 2-d Cartesian. Consider the integral ∬_R f dx dy; what is the process to determine ∬_R f dy dx, that is, to reverse the order of integration? Note, this process is important since such transformations can greatly simplify the computations. The approach is as follows:

1. Sketch and label the region 𝑅 and its bounding curves.


2. Introduce a cursor pointing in the direction of the first desired integration, and mark where it enters
and leaves 𝑅.
3. Move the cursor back and forth through 𝑅 parallel to the second desired integration direction, and
mark the values where it enters and leaves 𝑅.
4. Write the integral with the desired order of integration.

Change of variables. Suppose we have a one-to-one C¹ transformation T : R³ → R³ defined by T(u, v, w) = (x, y, z). (Note, we can replace one-to-one with one-to-one on a ‘nice’ set.) Then

∫_W f(x, y, z) dx dy dz = ∫_{W*} f*(u, v, w) ∣∂(x, y, z)/∂(u, v, w)∣ du dv dw.

Here

• 𝑓 ∗ (𝑢, 𝑣, 𝑤) = 𝑓(𝑇 (𝑢, 𝑣, 𝑤)) is the function 𝑓 rewritten in the coordinates 𝑢, 𝑣, 𝑤. The aim is for
𝑓 ∗ to be simpler than 𝑓 .
• 𝑊 ∗ = 𝑇 −1 (𝑊 ) is the region in the (𝑢, 𝑣, 𝑤)-space corresponding to 𝑊 in (𝑥, 𝑦, 𝑧)-space.
• ∣∂(x, y, z)/∂(u, v, w)∣ denotes the modulus of the determinant of the Jacobian matrix; that is, |det J(u, v, w)|, where

J(u, v, w) = ⎛ ∂x/∂u  ∂x/∂v  ∂x/∂w ⎞
             ⎜ ∂y/∂u  ∂y/∂v  ∂y/∂w ⎟
             ⎝ ∂z/∂u  ∂z/∂v  ∂z/∂w ⎠ .

Common coordinate changes In many cases utilisation of alternative coordinate systems results in simplified integrals. Common coordinate changes are:

• 2-d polar coordinates, where x = r cos θ and y = r sin θ. In this case,

∣∂(x, y)/∂(r, θ)∣ = r.

• 3-d cylindrical coordinates, where x = r cos θ, y = r sin θ and z = z. In this case,

∣∂(x, y, z)/∂(r, θ, z)∣ = r.

• 3-d spherical coordinates, where x = ρ sin φ cos θ, y = ρ sin φ sin θ and z = ρ cos φ. In this case,

∣∂(x, y, z)/∂(ρ, φ, θ)∣ = ρ² sin φ.

Applications of multiple integrals to statistics. The joint distribution of a pair of random variables X and Y is said to have probability density f : R² → [0, ∞) if

ℙ((X, Y) ∈ A) = ∬_A f(x, y) dx dy,

where ℙ((X, Y) ∈ A) is the probability that (X, Y) belongs to A. The expected value of g(X, Y) is given by

𝔼(g(X, Y)) = ∬_{R²} g(x, y) f(x, y) dx dy,

provided the integral is well-defined and finite.

ST218: Mathematical Statistics provides further coverage of these applications.


Chapter 3

Symmetric matrices, quadratic forms, positive definiteness

Chapter outcomes

At the end of the chapter you should be familiar with and be able to:

• apply key theoretical results about symmetric matrices to solve problems;


• select and apply appropriate theoretical results regarding quadratic forms;
• solve problems involving covariance matrices;
• prove results involving symmetric matrices and associated concepts.

Assumed background knowledge

Before starting this chapter it is assumed that you know the following material. (References are made to the
2019-20 printed notes.)

• MAT106: Linear Algebra. Chapter 12 and 13

– Inverse matrices
– Determinants of matrices
– Similar matrices
– Eigenvalues and eigenvectors
– Symmetric matrices
– Orthogonal matrices

• MA137: Mathematical Analysis. None.


• ST116: Mathematical Techniques. The mathematical thinking, language and writing developed is
fundamental to everything presented in this module. This includes:

– Language of sets
– Function notation, including image and pre-image.


• General mathematical knowledge.

Chapter Statistician: Catherine Ann Sugar

Catherine Ann Sugar is an American biostatistician at the University of California, Los Angeles; her research focuses on clustering, functional data analysis, classification and patterns of covariation in data. Her work spans a wide range of applied areas, including mental health, dentistry, nephrology, and particularly health services research, where she has developed a set of cluster-analytic methods for defining and analysing health state models. She has also worked with the NHS on projects in cancer and multiple sclerosis.

Figure 3.1: Catherine Ann Sugar.

Image available via: https://ph.ucla.edu/faculty/sugar Image licence: Public



3.1 Some linear algebra


In this chapter, we collect together some results from linear algebra regarding symmetric matrices, quadratic
forms and positive definiteness.

We begin by recalling some definitions and results from Year 1 Linear Algebra. Recall that a set of vectors
in R𝑛 is orthonormal if for any a = (𝑎1 , … , 𝑎𝑛 ) and b = (𝑏1 , … , 𝑏𝑛 ) in the set, we have:

• a ⋅ b = a₁b₁ + ⋯ + aₙbₙ = 0, whenever a ≠ b,
• a ⋅ a = a₁² + ⋯ + aₙ² = 1; that is, the vectors are unit vectors.

Definition 3.1. Consider an 𝑛 × 𝑛 matrix, A. The matrix A is:

• symmetric if A = AT ;
• anti-symmetric/skew-symmetric if A = − AT ;
• orthogonal if AT A = I. Equivalently, the columns (or rows) of A form an orthonormal set of vectors
in R𝑛 .
Some useful properties of real 𝑛 × 𝑛 symmetric matrices are as follows:

1. A has a ‘full set’ of 𝑛, possibly repeated, real eigenvalues;


2. the eigenvectors corresponding to distinct eigenvalues are orthogonal;
3. there is an orthonormal basis of eigenvectors;
4. if P is a matrix constructed with such an orthonormal basis as its columns, then P−1 AP = D, where
D is a diagonal matrix whose diagonal entries are the eigenvalues of A (in order corresponding to
the order of the columns of P). Actually P−1 = PT (since the inverse of any orthogonal matrix is its
transpose), and so PT AP = D.

Theorem 3.1. Any 𝑛 × 𝑛 matrix A can always be written

A = S + C, (3.1)

where S is symmetric and C is antisymmetric. Moreover, this decomposition is unique.
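The proof is a gap we complete in class; as a numerical illustration (a minimal sketch assuming numpy is available), the decomposition uses the symmetric part S = ½(A + Aᵀ), which reappears in Theorem 3.2 below, and the antisymmetric part C = ½(A − Aᵀ).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))

S = (A + A.T) / 2        # symmetric part
C = (A - A.T) / 2        # antisymmetric part

print(np.allclose(A, S + C))    # True: A = S + C
print(np.allclose(S, S.T))      # True: S is symmetric
print(np.allclose(C, -C.T))     # True: C is antisymmetric
```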



Example 3.1
Let

A = ⎛  19 20 −16 ⎞
    ⎜  20 13   4 ⎟
    ⎝ −16  4  31 ⎠ .

Express A in the form RDRᵀ, where R is an orthogonal matrix and D is a diagonal matrix.
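Decompositions of this kind can be checked numerically; the sketch below (assuming numpy is available) uses numpy.linalg.eigh, which returns the eigenvalues of a symmetric matrix together with an orthonormal basis of eigenvectors, so you can verify a hand computation.

```python
import numpy as np

A = np.array([[ 19., 20., -16.],
              [ 20., 13.,   4.],
              [-16.,  4.,  31.]])

# eigh returns eigenvalues (ascending) and orthonormal eigenvectors as columns.
eigvals, R = np.linalg.eigh(A)
D = np.diag(eigvals)

print(np.allclose(R @ D @ R.T, A))          # A = R D Rᵀ
print(np.allclose(R.T @ R, np.eye(3)))      # R is orthogonal
```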

Question 3.2
Let

S = ⎛  3 −2 ⎞
    ⎝ −2  3 ⎠ .

Write S in the form S = RDRᵀ, where R is an orthogonal matrix and D is a diagonal matrix.

3.2 Quadratic forms and positive definiteness


Let A be an 𝑛 × 𝑛 matrix. The quadratic form generated by A is the map:

q : Rⁿ → R,  x ↦ xᵀAx.

Example 3.3
Find the quadratic form generated by

A = ⎛  3  12   4 ⎞
    ⎜ 12  −4  −8 ⎟
    ⎝  4  −8   7 ⎠ .

Theorem 3.2. The quadratic form generated by A is the same as that generated by its symmetric part S = ½(A + Aᵀ).
It is for this reason that we normally use symmetric matrices when discussing quadratic forms.
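A one-line numerical check of Theorem 3.2 (a sketch assuming numpy is available; the random matrix and vector are stand-ins):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
S = (A + A.T) / 2                 # symmetric part of A

x = rng.standard_normal(3)
print(np.isclose(x @ A @ x, x @ S @ x))   # True: A and S generate the same form
```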

Question 3.4

1. Show that for any x, we have that

xᵀ ⎛  0 1 −3 ⎞
   ⎜ −1 0 −2 ⎟ x = 0.
   ⎝  3 2  0 ⎠

2. Find the symmetric matrices which generate:

a) q(x, y) = 4x² + 5xy − 7y²;
b) q(x, y, z) = 4xy + 5y².

Definition 3.2. A quadratic form (or its associated symmetric matrix) is said to be positive definite (p.d.)
if and only if xT Ax > 0 for every x ∈ R𝑛 \{0}.
A quadratic form (or its associated symmetric matrix) is said to be non-negative definite (n.n.d.), or
positive semi-definite, if and only if xT Ax ≥ 0 for every x ∈ R𝑛 \{0}.
We would like to be able to tell if a symmetric matrix is positive definite or not. The following gives one useful
check.
Theorem 3.3. Let S be an 𝑛 × 𝑛 symmetric matrix. Then:

1. S is p.d. if and only if every eigenvalue of S is strictly positive (> 0);


2. S is n.n.d. if and only if every eigenvalue of S is non-negative (≥ 0).

Theorem 3.4. Let S be an 𝑛 × 𝑛 symmetric matrix. If S is p.d., then S is invertible.



Whilst the above results can be useful, finding the eigenvalues of even a 3 × 3 matrix can be time-consuming (although for certain matrices, like diagonal matrices, there are short cuts). Is there a simpler way to decide whether a matrix is positive definite?
Definition 3.3. A minor is the determinant of a submatrix.
A principal minor is the determinant of a submatrix obtained by deleting corresponding pairs of rows
and columns from the original matrix (so the diagonal entries of the submatrix were diagonal entries in
the original matrix).
The leading principal minors are the determinants of the submatrices of A (sometimes called the prin-
cipal submatrices) obtained by deleting the last 𝑘 rows and columns.
Example leading principal minors are:

M₁ = ∣ a₁₁ ∣ ,   M₂ = ∣ a₁₁ a₁₂ ∣ ,   M₃ = ∣ a₁₁ a₁₂ a₁₃ ∣ , …
                      ∣ a₂₁ a₂₂ ∣         ∣ a₂₁ a₂₂ a₂₃ ∣
                                          ∣ a₃₁ a₃₂ a₃₃ ∣

with the pattern continuing down the leading diagonal. So, a 4 × 4 matrix has 4 leading principal minors, and 1 principal minor of order 4 (the matrix itself), 4 principal minors of order 3, 6 principal minors of order 2, and 4 principal minors of order 1, for a total of 15 principal minors; that is, a matrix of order n has n choose p principal minors of order p.
Theorem 3.5. Let A be an 𝑛 × 𝑛 symmetric matrix. Then A is p.d. if and only if all leading principal
minors of A are strictly positive (> 0).
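Theorem 3.5 (often called Sylvester's criterion) translates directly into a computational test; a minimal sketch (assuming numpy is available) checks the leading principal minors and compares with the eigenvalue test of Theorem 3.3, using the matrix of Example 3.5(2) below.

```python
import numpy as np

def is_pd_by_minors(S):
    """S (symmetric) is p.d. iff every leading principal minor is > 0."""
    n = S.shape[0]
    return all(np.linalg.det(S[:k, :k]) > 0 for k in range(1, n + 1))

A = np.array([[3., 1., 1.],
              [1., 2., 0.],
              [1., 0., 2.]])     # the matrix of Example 3.5(2)

print(is_pd_by_minors(A))                    # True
print(np.all(np.linalg.eigvalsh(A) > 0))     # agrees with the eigenvalue test
```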

The previous proof does not adapt to the n.n.d. case. For example, consider the matrix

A = ⎛ 1 0 1 ⎞
    ⎜ 0 0 0 ⎟
    ⎝ 1 0 0 ⎠ .

This has leading principal minors 1, 0, 0 ≥ 0. However, if x = (1, 1, −2), then xᵀAx = −3. Alternatively, consider

⎛ 0  0 ⎞
⎝ 0 −1 ⎠ .

The eigenvalues are clearly 0 and −1, so this matrix is neither p.d. nor n.n.d.; nevertheless both leading principal minors are 0.
Theorem 3.6. Let A be an 𝑛 × 𝑛 symmetric matrix. Then A is n.n.d. if and only if all principal minors of
A are non-negative (≥ 0).
It is vital that you recognise this proposition requires all principal minors of A and not just the leading principal
minors. A symmetric 2×2 matrix is p.d. if and only if its first diagonal entry and its determinant strictly exceed
0.

Example 3.5

1. Show that q(x, y, z) = 12x² + 12xy + 8xz + 9y² − 4yz + 4z² is n.n.d.

2. Show that

A = ⎛ 3 1 1 ⎞
    ⎜ 1 2 0 ⎟
    ⎝ 1 0 2 ⎠

is p.d.

Question 3.6
Let

A = ⎛ 1 0 1 ⎞
    ⎜ 0 1 2 ⎟
    ⎝ 1 2 3 ⎠ .

Is A p.d.?

3.3 Covariance matrices

Suppose that Y is a random variable with mean 𝔼(Y) = μ_Y, and Z is a random variable with mean 𝔼(Z) = μ_Z. Recall that the covariance between Y and Z is

cov(Y, Z) = 𝔼[(Y − μ_Y)(Z − μ_Z)] = 𝔼(YZ) − μ_Y μ_Z.

This can be generalised to a collection Z = (Z₁, Z₂, …, Zₙ) of random variables (i.e. a random vector) by defining the n × n variance-covariance matrix (or just covariance matrix), Σ = (Σ_ij), as:

Σ_ij = cov(Z_i, Z_j).

Since cov(𝑍𝑖 , 𝑍𝑗 ) = cov(𝑍𝑗 , 𝑍𝑖 ), Σ is a symmetric matrix, and in this section we will also assume each
Σ𝑖𝑗 is finite. Furthermore:
Theorem 3.7. Let Σ be a covariance matrix. Then Σ is n.n.d.

A covariance matrix may or may not be p.d. If Σ is n.n.d. but not p.d., then there exists a nonzero vector x
such that

var( x₁Z₁ + ⋯ + xₙZₙ ) = 0.

In other words, one of the 𝑍𝑖 is degenerate in the sense that either it has zero variance or it is an affine
combination of some or all of the other 𝑍𝑗 .

Question: Given a symmetric, n.n.d. matrix A, does there exist a collection (𝑍1 , … , 𝑍𝑛 ) of random
variables whose covariance matrix is A? The answer is affirmative, and we can even construct the
collection explicitly, as follows.
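We complete the construction in class; as a hedged numerical sketch of one standard route (assuming numpy is available), take a matrix square root B with BBᵀ = A via the spectral decomposition, and set Z = BZ′ for a vector Z′ of independent standard normals, so that cov(Z) = BBᵀ = A.

```python
import numpy as np

rng = np.random.default_rng(5)

A = np.array([[2., 1.],
              [1., 2.]])        # a symmetric n.n.d. target covariance

# Matrix square root via the spectral decomposition: B Bᵀ = A.
eigvals, P = np.linalg.eigh(A)
B = P @ np.diag(np.sqrt(eigvals)) @ P.T

n = 500_000
Zp = rng.standard_normal((2, n))   # independent standard normals
Z = B @ Zp

print(np.cov(Z))   # ≈ A
```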

Summarising the previous discussion and Theorem 3.7 gives the following result.

Theorem 3.8. A matrix Σ is a covariance matrix if, and only if, it is n.n.d.

Example 3.7
Suppose Z₁′ ∼ N(0, 1), Z₂′ ∼ N(0, 1), with Z₁′ and Z₂′ independent. Let

Σ = ⎛ 1 ρ ⎞
    ⎝ ρ 1 ⎠ ,

where −1 ≤ ρ ≤ 1. Determine Z₁ and Z₂ so the corresponding covariance matrix is Σ.



3.4 A glimpse of the future: Monte Carlo simulation of stock portfolio


(Note, this section is provided as an example statistical application; none of this material will be assessed.)

A particular feature of a positive definite matrix is the Cholesky decomposition. The Cholesky decomposition decomposes a positive definite matrix into the product of a lower triangular matrix and its transpose; that is, A = LLᵀ, for some lower triangular matrix L which can be constructed.

⎛ 25 15 −5 ⎞   ⎛  5 0 0 ⎞ ⎛ 5 3 −1 ⎞
⎜ 15 18  0 ⎟ = ⎜  3 3 0 ⎟ ⎜ 0 3  1 ⎟
⎝ −5  0 11 ⎠   ⎝ −1 1 3 ⎠ ⎝ 0 0  3 ⎠
This decomposition can be used to generate correlated random variables by multiplying the lower triangular matrix from the decomposed covariance matrix by standard normals. An example application is Monte Carlo simulation of stock prices, where the portfolio return is dependent on an array of underlying assets. Assume the daily returns are distributed according to a multivariate normal with mean vector μ and covariance matrix Σ. Use the Cholesky decomposition to find a lower triangular matrix L such that Σ = LLᵀ. The returns can then be generated by R_t = μ + L Z_t, where Z_t ∼ N(0, I).

Example of simulated portfolio return.
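A minimal end-to-end sketch of this simulation (assuming numpy is available; the mean vector, covariance matrix and portfolio weights are illustrative numbers, not real market data):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical daily-return model for three assets: R_t = mu + L Z_t.
mu = np.array([0.0005, 0.0003, 0.0004])
Sigma = np.array([[1.0e-4, 4.0e-5, 2.0e-5],
                  [4.0e-5, 9.0e-5, 3.0e-5],
                  [2.0e-5, 3.0e-5, 1.6e-4]])

L = np.linalg.cholesky(Sigma)        # lower triangular, Sigma = L Lᵀ

days, weights = 250, np.array([0.4, 0.3, 0.3])
Z = rng.standard_normal((3, days))   # Z_t ~ N(0, I) for each day
returns = mu[:, None] + L @ Z        # one simulated year of daily returns

portfolio = weights @ returns        # daily portfolio returns
print((1 + portfolio).prod() - 1)    # simulated annual portfolio return
```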



3.5 Chapter 3 Consolidation Questions

Question 3.8

1. Decide whether 𝑞 is p.d., n.n.d., or neither for:

a) 𝑞(𝑥, 𝑦) = 𝑥2 + 𝑦2 ,
b) 𝑞(𝑥, 𝑦) = (𝑥 − 𝑦)2 ,
c) 𝑞(𝑥, 𝑦) = 𝑥2 + 6𝑥𝑦 + 7𝑦 2 .

2. a) Show that 𝑞(𝑥, 𝑦, 𝑧) = 4𝑥2 + 10𝑦2 + 2𝑧 2 − 8𝑦𝑧 − 4𝑥𝑧 + 4𝑥𝑦 is n.n.d.


b) Deduce that
        ⎛  4   2  −2 ⎞
    A = ⎜  2  10  −4 ⎟
        ⎝ −2  −4   3 ⎠
has three strictly positive eigenvalues.

Question 3.9

1. Are the following valid covariance matrices?

        ⎛ 2 2 1 ⎞           ⎛  1 1 −1 ⎞
    A = ⎜ 2 3 0 ⎟,      B = ⎜  1 1  1 ⎟.
        ⎝ 1 0 4 ⎠           ⎝ −1 1  1 ⎠

2. Determine if there exists a collection (𝑍1 , 𝑍2 ) of random variables whose covariance matrix
is
    Σ = ⎛ 1 2 ⎞
        ⎝ 2 4 ⎠.
If such a collection exists, then determine such a collection.
3.6 Chapter 3 summary

Chapter outcomes review

Having reviewed the material and completed the assessment (formative and summative) material,
you should be able to:

• apply key symmetric matrices theoretical results to solve problems;


• select and apply appropriate theoretical results regarding quadratic forms;
• solve problems involving covariance matrices;
• prove results involving symmetric matrices and associated concepts.

This chapter contains three main topics.

Symmetric matrices. Consider an 𝑛 × 𝑛 matrix, A. The matrix A is:

• symmetric if A = Aᵀ;
• anti-symmetric or skew-symmetric if A = −Aᵀ;
• orthogonal if the columns (or rows) of A form an orthonormal set of vectors in R𝑛 .

Any 𝑛 × 𝑛 matrix A can always be written uniquely as A = S + C, where S is symmetric and C is
antisymmetric. Key properties of a symmetric matrix A include:

1. A has a ‘full set’ of 𝑛, possibly repeated, real eigenvalues;


2. the eigenvectors corresponding to distinct eigenvalues are orthogonal;
3. there is an orthonormal basis of eigenvectors;
4. if P is a matrix constructed with the orthonormal basis of eigenvectors, then P⁻¹AP = D, where D is a
diagonal matrix whose diagonal entries are the eigenvalues of A. The order of the eigenvalues corresponds
to the order of the columns of P.

Quadratic forms and positive definiteness. The quadratic form generated by A is the map 𝑞 ∶ R𝑛 → R
defined by x ↦ xᵀAx. The quadratic form generated by A is the same as that generated by its symmetric
part S = ½(A + Aᵀ).
A quadratic form (or its associated symmetric matrix) is said to be positive definite (p.d.) if and only if
xᵀAx > 0 for every x ∈ R𝑛 \{0}.
A quadratic form (or its associated symmetric matrix) is said to be non-negative definite (n.n.d.) if and only
if xᵀAx ≥ 0 for every x ∈ R𝑛 \{0}.
Let S be an 𝑛 × 𝑛 symmetric matrix, then

• S is p.d. if and only if every eigenvalue of S is strictly positive;


• S is n.n.d. if and only if every eigenvalue of S is non-negative;
• if S is p.d. then S is invertible.
A minor is the determinant of a submatrix. A principal minor is the determinant of a submatrix obtained
by deleting corresponding pairs of rows and columns from the original matrix. (So the diagonal entries of the
submatrix are diagonal entries in the original matrix.) The leading principal minors are the determinants
of the submatrices of A:

    A1 = ( 𝑎11 ),

    A2 = ⎛ 𝑎11 𝑎12 ⎞
         ⎝ 𝑎21 𝑎22 ⎠,

    A3 = ⎛ 𝑎11 𝑎12 𝑎13 ⎞
         ⎜ 𝑎21 𝑎22 𝑎23 ⎟
         ⎝ 𝑎31 𝑎32 𝑎33 ⎠,
with the pattern continuing down the leading diagonal. (Sometimes these are called principal submatrices.)

Let S be an 𝑛 × 𝑛 symmetric matrix. Further key results are

• The matrix S is p.d. if and only if all leading principal minors of S are strictly positive.
• The matrix S is n.n.d. if and only if all principal minors of S are non-negative.

Covariance matrices

Suppose that 𝑌 is a random variable with mean 𝔼(𝑌) = 𝜇𝑌, and 𝑍 is a random variable with mean
𝔼(𝑍) = 𝜇𝑍 . The covariance between 𝑌 and 𝑍 is

Cov(𝑌 , 𝑍) = 𝔼[(𝑌 − 𝜇𝑌 )(𝑍 − 𝜇𝑍 )] = 𝔼(𝑌 𝑍) − 𝜇𝑌 𝜇𝑍 .

This definition generalises to a collection Z = (𝑍1, 𝑍2, … , 𝑍𝑛) of random variables by defining the 𝑛 × 𝑛
covariance matrix (or variance-covariance matrix), Σ = (Σ𝑖𝑗), as Σ𝑖𝑗 = Cov(𝑍𝑖, 𝑍𝑗). The matrix Σ is
symmetric. Properties include

• If Σ is a covariance matrix, then Σ is n.n.d.


• Given a symmetric, n.n.d. matrix A, there exists a random vector Z whose covariance matrix is A.

A covariance matrix may or may not be p.d.

ST218: Mathematical Statistics provides further coverage of this topic.


Chapter 4

Differentiation in R𝑛

Chapter outcomes

At the end of the chapter you should be familiar with and be able to:

• Determine derivatives of functions defined R𝑛 → R𝑚 .


𝑛
• Find and classify critical points of functions R → R.
• Solve problems involving differentiation of functions R𝑛 → R𝑚 , including constrained optimi-
sation.
• Prove results involving differentiation of functions R𝑛 → R𝑚 .

Assumed background knowledge

Before starting this chapter it is assumed that you know the following material. (References are made to the
2019-20 printed notes.)

• MAT106: Linear Algebra. Chapters 11 and 13

– The determinant of a matrix


– The scalar product.

• MA137: Mathematical Analysis. Chapter 4

– Definition of the derivative


– Properties of derivatives

• ST116: Mathematical Techniques. The mathematical thinking, language and writing developed is
fundamental to everything presented in this module. This includes:

– Language of sets
– Function notation, including image and pre-image.

• General mathematical knowledge. That you can fluently apply pre-university differentiation techniques.

Chapter Statistician: Kimiko (Kim) Osada Bowman

Kimiko (Kim) Osada Bowman is a Japanese-American statistician. Her work focused on the distributional
properties of estimators based on non-normal data. Kim was a member of the scientific staff of Oak Ridge
National Laboratory for 50 years. She authored or coauthored 3 books and more than 200 articles during
her career. She was a fellow of the American Statistical Association and the American Association for the
Advancement of Science, and was an elected fellow of the International Statistical Institute and the Institute
of Mathematical Statistics.

Figure 4.1: Kimiko Osada Bowman.

Image available via: https://charterfuneral.info/wp-content/uploads/2019/01/Bowman-Kimiko-Osada.jpg


Image licence: Unknown.
4.1 Generalising the single variable case


Recall that a function 𝑓 ∶ R → R is differentiable at 𝑥0 if

    lim_{ℎ→0} (𝑓(𝑥0 + ℎ) − 𝑓(𝑥0)) / ℎ
exists, in which case we denote the limit by 𝑓 ′ (𝑥0 ); the function 𝑓 ′ is the derivative of 𝑓 . This allows us to
define a line 𝐿 with equation

𝑦 = 𝑡(𝑥) = 𝑓(𝑥0 ) + 𝑓 ′ (𝑥0 )(𝑥 − 𝑥0 )

with the property


    lim_{𝑥→𝑥0} (𝑓(𝑥) − 𝑡(𝑥)) / (𝑥 − 𝑥0) = 0,
so that 𝑡 is a ‘good linear approximation’ for 𝑓 at 𝑥0 . The line 𝐿 is the tangent to the curve 𝑦 = 𝑓(𝑥) at
𝑥0 .
We now consider the situation in higher dimensions. Write x = (𝑥1, … , 𝑥𝑛)ᵀ. If 𝑓 ∶ R𝑛 → R𝑚 with
𝑓(x) = (𝑓1(x), … , 𝑓𝑚(x))ᵀ, then we have seen how to define its partial derivatives:

    (𝜕𝑓𝑖/𝜕𝑥𝑗)_{𝑖=1,…,𝑚, 𝑗=1,…,𝑛}.

It would be convenient to say that 𝑓 is differentiable at x0 if all of its partial derivatives exist there. How-
ever, our choice of coordinate axes is somewhat arbitrary, and doing this might mean we miss some ‘bad’
behaviour that happens away from the axes.
Partial derivatives are not sufficient. Consider 𝑓(𝑥, 𝑦) = (𝑥3 − 𝑥𝑦2)/(𝑥2 + 𝑦2), with 𝑓(0, 0) = 0.
Then 𝑓𝑥(0, 0) = 1 and 𝑓𝑦(0, 0) = 0. The function is not differentiable at (0, 0) due to the function's
behaviour near (0, 0). For example, 𝑓(𝑥, 𝑥) ≡ 0.

A picture can never capture the complete 3d behaviour and you may find it useful to manipulate this function
on a computer. Nevertheless, a rendering of 𝑓(𝑥, 𝑦) is given below.

Figure 4.2: Surface approximated by cuboids.


A more restrictive definition would be to say that 𝑓 is differentiable at x0 if 𝑓 is differentiable along a slice in
any direction. This is often referred to as Gateaux differentiability.

However, this definition is still somewhat unsatisfactory, as there are functions for which derivatives exist in
any direction, but for which there is still no tangent plane (i.e. good linear approximation). Even worse, the
usual rules of differentiation (e.g. chain rule, sum rule) are not guaranteed to hold in this situation.

So, we adopt a different approach, and consider the question: what would we need to ensure the existence
of a tangent plane? This is often referred to as the Fréchet derivative.

In R3, a (non-vertical) plane has equation 𝑧 = 𝛼𝑥 + 𝛽𝑦 + 𝛾. Clearly, for the plane to be a tangent to the
surface 𝑧 = 𝑓(𝑥, 𝑦), we would want the slopes in the 𝑥- and 𝑦-directions to be

    𝛼 = (𝜕𝑓/𝜕𝑥)(𝑥0, 𝑦0),      𝛽 = (𝜕𝑓/𝜕𝑦)(𝑥0, 𝑦0).

At x0 = (𝑥0, 𝑦0), we also require that 𝑧 = 𝑓(𝑥0, 𝑦0), which implies

    𝛾 = 𝑓(𝑥0, 𝑦0) − (𝜕𝑓/𝜕𝑥)(𝑥0, 𝑦0) × 𝑥0 − (𝜕𝑓/𝜕𝑦)(𝑥0, 𝑦0) × 𝑦0.

Putting this together, a tangent plane must have the equation

    𝑧 = 𝑇(𝑥, 𝑦) = 𝑓(𝑥0, 𝑦0) + (𝜕𝑓/𝜕𝑥)(𝑥0, 𝑦0) × (𝑥 − 𝑥0) + (𝜕𝑓/𝜕𝑦)(𝑥0, 𝑦0) × (𝑦 − 𝑦0).
We consequently define 𝑓 ∶ R2 → R to be differentiable at (𝑥0, 𝑦0) if 𝜕𝑓/𝜕𝑥 and 𝜕𝑓/𝜕𝑦 both exist
there, and also

    lim_{(𝑥,𝑦)→(𝑥0,𝑦0)} |𝑓(𝑥, 𝑦) − 𝑇(𝑥, 𝑦)| / ‖(𝑥, 𝑦) − (𝑥0, 𝑦0)‖ = 0;

that is, 𝑇 is a good linear approximation for 𝑓 .

To summarise, when 𝑓 is differentiable at (𝑥0 , 𝑦0 ), we can write

𝑓(𝑥, 𝑦) = 𝑇 (𝑥, 𝑦) + 𝑅((𝑥, 𝑦), (𝑥0 , 𝑦0 )),

where:
• the linear approximation can be expressed

      𝑇(𝑥, 𝑦) = 𝑓(𝑥0, 𝑦0) + 𝐷𝑓(𝑥0, 𝑦0) ⎛ 𝑥 − 𝑥0 ⎞
                                        ⎝ 𝑦 − 𝑦0 ⎠,

  with
      𝐷𝑓(𝑥0, 𝑦0) = ( (𝜕𝑓/𝜕𝑥)(𝑥0, 𝑦0)   (𝜕𝑓/𝜕𝑦)(𝑥0, 𝑦0) )
  being the derivative of 𝑓 at (𝑥0, 𝑦0);
• the remainder term satisfies

      lim_{(𝑥,𝑦)→(𝑥0,𝑦0)} |𝑅((𝑥, 𝑦), (𝑥0, 𝑦0))| / ‖(𝑥, 𝑦) − (𝑥0, 𝑦0)‖ = 0.

More generally, a function 𝑓 ∶ 𝑈 ⊆ R𝑛 → R𝑚 is differentiable at x0 ∈ 𝑈 if the partial derivatives exist at
x0 and

    lim_{x→x0} ‖𝑓(x) − 𝑓(x0) − 𝐷𝑓(x0)(x − x0)‖ / ‖x − x0‖ = 0,

where 𝐷𝑓(x0) is the linear transformation represented by the Jacobian matrix

    ( (𝜕𝑓𝑖/𝜕𝑥𝑗)(x0) )_{𝑖=1,…,𝑚, 𝑗=1,…,𝑛}.

We say the function x0 ↦ 𝐷𝑓(x0 ) is the derivative of 𝑓 .

In the special case 𝑓 ∶ 𝑈 ⊆ R𝑛 → R, we have that 𝐷𝑓(x0) is a 1 × 𝑛 matrix

    ( (𝜕𝑓/𝜕𝑥1)(x0)   …   (𝜕𝑓/𝜕𝑥𝑛)(x0) ).

The corresponding column vector

    ( (𝜕𝑓/𝜕𝑥1)(x0), … , (𝜕𝑓/𝜕𝑥𝑛)(x0) )ᵀ

is said to be the gradient of 𝑓 at x0, and is abbreviated as grad 𝑓(x0) or ∇𝑓(x0).

Directional derivative. For a unit vector u ∈ R𝑛, the directional derivative of 𝑓 at x0 in the direction u is
∇𝑓(x0) ⋅ u; it is largest when u points in the direction of ∇𝑓(x0), with maximum value ‖∇𝑓(x0)‖.
Example 4.1
Show that 𝑓(𝑥, 𝑦) = 2𝑥2 − 4𝑦 is differentiable at (2, −3).
Example 4.2

1. Determine the maximum rate of change of 𝑓 ∶ R2 → R defined by 𝑓(𝑥, 𝑦) = 𝑦𝑒𝑥𝑦 at (0, 2),
and the direction in which it occurs.
2. You stand on the surface of a planet at the coordinate (𝑥, 𝑦) = (0, 1). The temperature of
the planet at the point (𝑥, 𝑦) is given by 𝑇 (𝑥, 𝑦) = 𝑥𝑦 .

a. In which direction should you move to increase your temperature as quickly as possible?
b. You decide to move along the line 𝑥 + 𝑦 − 1 = 0 in the direction (−1, 1). What is the
maximum temperature you will experience along this path?
Showing that the limit defining the derivative exists can be difficult, but finding the partial derivatives and the
equation of the tangent is often simple! Moreover, the following theorem tells us that this is nearly enough.

Theorem 4.1. Let 𝑓 ∶ 𝑈 ⊆ R𝑛 → R𝑚 . If the partial derivatives exist and are continuous in a neigh-
bourhood of x, then 𝑓 is differentiable at x.

Example 4.3
Determine the tangent plane to 𝑓(𝑥, 𝑦) = 𝑥𝑒2𝑦 at (1, 0).
Example 4.4
Let 𝑓(𝑥, 𝑦) = √(41 − 4𝑥2 − 𝑦2). Approximate 𝑓(2.1, 2.9) using 𝐷𝑓(2, 3).
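
A numerical sketch of the approximation this example asks for (the partial derivatives in the code are computed by hand):

```python
import numpy as np

# Linear approximation: f(x, y) ≈ f(x0, y0) + Df(x0, y0) (x - x0, y - y0)^T
# for f(x, y) = sqrt(41 - 4x^2 - y^2) near (x0, y0) = (2, 3).
def f(x, y):
    return np.sqrt(41 - 4 * x**2 - y**2)

x0, y0 = 2.0, 3.0
# Hand-computed partials: f_x = -4x/f and f_y = -y/f, so Df(2, 3) = (-2, -3/4).
Df = np.array([-4 * x0 / f(x0, y0), -y0 / f(x0, y0)])

h = np.array([2.1 - x0, 2.9 - y0])
print(f(x0, y0) + Df @ h)   # 3.875
print(f(2.1, 2.9))          # the exact value, approximately 3.8665
```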
Question 4.5

1. Find the tangent plane to 𝑓(𝑥, 𝑦) = 2𝑥2 − 3𝑥𝑦 + 8𝑦2 + 2𝑥 − 4𝑦 + 4 at (2, −1).
2. Find the tangent plane to 𝑓(𝑥, 𝑦) = 𝑥3 − 𝑥2 𝑦 + 𝑦 2 − 2𝑥 + 3𝑦 − 2 at (−1, 3).
3. Let 𝑓(𝑥, 𝑦) = 𝑒5−2𝑥+3𝑦 . Determine an approximation to 𝑓(4.1, 0.9) using 𝐷𝑓(4, 1).
4. A ball sits at coordinate (1, 2) in the (𝑥, 𝑦)-plane on a slope whose height at (𝑥, 𝑦) is given
by

    ℎ(𝑥, 𝑦) = 𝑒^(−𝑥3 − 𝑦2).

In which direction will the ball roll?
4.2 Basic properties of the derivative


The derivative of 𝑓 , 𝐷𝑓 , has all of the properties that we expect of the derivative. These include the following.
Sum rule. If 𝑓 ∶ R𝑛 → R𝑚 , 𝑔 ∶ R𝑛 → R𝑚 , then

𝐷(𝑓 + 𝑔)(x) = 𝐷𝑓(x) + 𝐷𝑔(x).

Product rule. If 𝑓 ∶ R𝑛 → R, 𝑔 ∶ R𝑛 → R, then

𝐷(𝑓𝑔)(x) = 𝑓(x)𝐷𝑔(x) + 𝑔(x)𝐷𝑓(x).

(A similar identity holds when 𝑓 ∶ R𝑛 → R𝑚 .)
Quotient rule. If 𝑓 ∶ R𝑛 → R, 𝑔 ∶ R𝑛 → R, then

    𝐷(𝑓/𝑔)(x) = (𝑔(x)𝐷𝑓(x) − 𝑓(x)𝐷𝑔(x)) / 𝑔(x)2.

Dot product rule. If 𝑓 ∶ R𝑛 → R𝑚 , 𝑔 ∶ R𝑛 → R𝑚 , then

𝐷(𝑓 ⋅ 𝑔)(x) = 𝑓(x) ⋅ 𝐷𝑔(x) + 𝑔(x) ⋅ 𝐷𝑓(x),

where 𝑓 ⋅ 𝑔 ∶ R𝑛 → R is the function defined by setting

(𝑓 ⋅ 𝑔)(x) = 𝑓(x) ⋅ 𝑔(x).

Chain rule. Let 𝑔 ∶ 𝑈 ⊆ R𝑛 → R𝑚 and 𝑓 ∶ 𝑉 ⊆ R𝑚 → R𝑝 , with 𝑈 mapping into 𝑉 , so that 𝑓 ∘ 𝑔 is
defined. Let 𝑔 be differentiable at x0 , and 𝑓 be differentiable at 𝑔(x0 ). Then 𝑓 ∘ 𝑔 is differentiable at x0 , and

𝐷(𝑓 ∘ 𝑔)(x0 ) = (𝐷𝑓)(𝑔(x0 ))𝐷𝑔(x0 ).

Example 4.6
Let 𝑓∶ R2 → R2 be defined by (𝑦1 , 𝑦2 ) ↦ (2𝑦1 + 𝑦22 , 3𝑦12 − 𝑦2 ). Let 𝑔 ∶ R3 → R2 be defined
by (𝑥1 , 𝑥2 , 𝑥3 ) ↦ (𝑥1 𝑥2 , 𝑥2 𝑥3 ).

1. Determine 𝐷𝑔(𝑥1 , 𝑥2 , 𝑥3 ).
2. Find explicitly the map 𝑓 ∘ 𝑔 ∶ R3 → R2 , and hence find 𝐷(𝑓 ∘ 𝑔).
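
A numeric sketch checking the chain rule for this example at one illustrative point (the Jacobians in the code are the hand-computed ones; the finite-difference Jacobian is an independent check):

```python
import numpy as np

# g(x1, x2, x3) = (x1*x2, x2*x3) and f(y1, y2) = (2y1 + y2^2, 3y1^2 - y2).
def g(x):
    x1, x2, x3 = x
    return np.array([x1 * x2, x2 * x3])

def f(y):
    y1, y2 = y
    return np.array([2 * y1 + y2**2, 3 * y1**2 - y2])

def Dg(x):                      # hand-computed Jacobian of g
    x1, x2, x3 = x
    return np.array([[x2, x1, 0.0],
                     [0.0, x3, x2]])

def Df(y):                      # hand-computed Jacobian of f
    y1, y2 = y
    return np.array([[2.0, 2 * y2],
                     [6 * y1, -1.0]])

x0 = np.array([1.0, 2.0, 3.0])
chain = Df(g(x0)) @ Dg(x0)      # D(f∘g)(x0) = (Df)(g(x0)) Dg(x0)

eps = 1e-6                      # finite-difference check of the same Jacobian
fd = np.column_stack([(f(g(x0 + eps * e)) - f(g(x0 - eps * e))) / (2 * eps)
                      for e in np.eye(3)])
print(np.max(np.abs(chain - fd)))   # should be close to zero
```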
Question 4.7
Let 𝑓 ∶ R2 → R3 and 𝑔 ∶ R2 → R3 be defined by:

              ⎛ 𝑥𝑦    ⎞              ⎛ sin 𝑥  ⎞
    𝑓(𝑥, 𝑦) = ⎜ 𝑦𝑒𝑥   ⎟ ,   𝑔(𝑥, 𝑦) = ⎜ cos 2𝑦 ⎟ .
              ⎝ log 𝑥 ⎠              ⎝ 1      ⎠

1. Find 𝐷𝑓 and 𝐷𝑔 .
2. By using the dot product rule, find the derivative of the function 𝑓 ⋅ 𝑔 ∶ R2 → R, where
(𝑓 ⋅ 𝑔)(x) ∶= 𝑓(x) ⋅ 𝑔(x).
Question 4.8
Let 𝑔 ∶ ℝ2 → ℝ3 and 𝑓 ∶ ℝ3 → ℝ be defined by:
    𝑔(𝑥, 𝑦) = (sin(𝜋𝑥/2) + sin(𝜋𝑦), 𝑥𝑦, 2 + 𝑥2 + 𝑦2),      𝑓(𝑢, 𝑣, 𝑤) = 𝑣/(𝑢 + 𝑤).
1. Find 𝐷𝑓 and 𝐷𝑔 .
2. Hence, find the derivative of the composite function 𝑓 ∘ 𝑔 at the point (1, 1).
4.3 Finding and classifying critical points

4.3.1 Critical points


Let 𝑓 ∶ 𝑈 ⊆ ℝ𝑛 → ℝ. We say that x0 ∈ 𝑈 is a:

• local maximum of 𝑓 if there is a neighbourhood 𝑉 of x0 such that 𝑓(x) ≤ 𝑓(x0 ) for every x ∈ 𝑉 ;
• local minimum of 𝑓 if there is a neighbourhood 𝑉 of x0 such that 𝑓(x) ≥ 𝑓(x0 ) for every x ∈ 𝑉 .

A point x0 is a critical point of 𝑓 if 𝐷𝑓(x0 ) = 0. Note that in the case that 𝑓 ∶ ℝ → ℝ, we have that
𝐷𝑓(𝑥) = 0 if and only if 𝑥 is a local maximum, a local minimum or a point of inflection. In higher dimensions,
the situation is more complicated. Appendix 4.7 contains renderings that illustrate the complexity.

Theorem 4.2. Let 𝑓 ∶ ℝ𝑛 → ℝ be differentiable. If x0 is a local extremum (maximum or minimum),


then 𝐷𝑓(x0 ) = 0.
Example 4.9
Find the critical points of 𝑓(𝑥, 𝑦) = 𝑥2 + 2𝑥𝑦 − 4𝑦2 + 4𝑥 − 6𝑦 + 4.
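
Since ∇𝑓 is linear here, the critical points solve a linear system; a numerical sketch (the system below is derived by hand from 𝑓𝑥 = 2𝑥 + 2𝑦 + 4 and 𝑓𝑦 = 2𝑥 − 8𝑦 − 6):

```python
import numpy as np

# grad f(x, y) = (2x + 2y + 4, 2x - 8y - 6) = 0 is a 2x2 linear system.
A = np.array([[2.0, 2.0],
              [2.0, -8.0]])
b = np.array([-4.0, 6.0])
print(np.linalg.solve(A, b))   # the unique critical point, (-1, -1)
```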
Question 4.10
Find the critical points of each of the following functions, 𝑓 ∶ 𝑈 ⊆ R2 → R, where 𝑈 is the largest
subset of R2 such that 𝑓 is defined.

1. 𝑓(𝑥, 𝑦) = 𝑥3 + 2𝑥𝑦 − 2𝑥 − 4𝑦.


2. 𝑓(𝑥, 𝑦) = √(4𝑦2 − 9𝑥2 + 24𝑦 + 36𝑥 + 36), where 4𝑦2 − 9𝑥2 + 24𝑦 + 36𝑥 + 36 ≥ 0.

4.3.2 Classifying critical points

We start with some revision of the single-variable case. Recall that under suitable conditions we may write
𝑓 ∶ R → R as

    𝑓(𝑥0 + ℎ) = 𝑓(𝑥0) + 𝑓′(𝑥0)ℎ + ½𝑓′′(𝑥0)ℎ2 + 𝑅(𝑥0, ℎ),
where 𝑅(𝑥0 , ℎ) is the remainder term. If 𝑥0 is a critical point, then 𝑓 ′ (𝑥0 ) = 0. Therefore,

    𝑓(𝑥0 + ℎ) − 𝑓(𝑥0) = ½𝑓′′(𝑥0)ℎ2 + 𝑅(𝑥0, ℎ).

Therefore, for small ℎ, if 𝑓′′(𝑥0) > 0 then 𝑓(𝑥0 + ℎ) − 𝑓(𝑥0) > 0, so 𝑥0 is a minimum. The case
𝑓′′(𝑥0) < 0 similarly makes 𝑥0 a maximum. When 𝑓′′(𝑥0) = 0 the second-order term is insufficient
to determine the nature of 𝑥0. Standard examples are 𝑥3, 𝑥4 and so on, at 𝑥0 = 0.

As noted earlier, critical points are not necessarily maxima, minima or points of inflection. For example,
consider the function 𝑓(𝑥, 𝑦) = 𝑥2 − 𝑦2 .

At (0, 0), we have 𝑓𝑥 = 𝑓𝑦 = 0, i.e. (0, 0) is a critical point. However, this is neither a local maximum nor
a local minimum. Indeed, if we slice parallel to the 𝑥-axis, it appears to be a minimum. Whereas, if we slice
parallel to the 𝑦 -axis, it appears to be a local maximum. We call such a point a saddle point.

We now introduce a general technique for classifying critical points; as with the procedure you know for the
one-dimensional case, this will involve second derivatives. Let 𝑓 ∶ ℝ𝑛 → ℝ. Then

𝐷𝑓 ∶ ℝ𝑛 → ℝ1×𝑛
x ↦ 𝐷𝑓(x).

As already noted, this map can be represented by a 1 × 𝑛 matrix, corresponding to the 𝑛 × 1 vector

    ∇𝑓(x) = ( (𝜕𝑓/𝜕𝑥1)(x), … , (𝜕𝑓/𝜕𝑥𝑛)(x) )ᵀ,

to which we can apply the operator 𝐷 again. We define the second derivative of 𝑓 by setting

𝐷2 𝑓(x) = 𝐷(∇𝑓)(x).

Since ∇𝑓 ∶ ℝ𝑛 → ℝ𝑛 , 𝐷2 𝑓 can be represented by an 𝑛 × 𝑛 matrix, specifically:

        ⎛ 𝜕2𝑓/𝜕𝑥12     …   𝜕2𝑓/𝜕𝑥𝑛𝜕𝑥1 ⎞
    H = ⎜     ⋮        ⋱       ⋮      ⎟
        ⎝ 𝜕2𝑓/𝜕𝑥1𝜕𝑥𝑛   …   𝜕2𝑓/𝜕𝑥𝑛2   ⎠

where 𝜕2𝑓/𝜕𝑥𝑖𝜕𝑥𝑗 = (𝜕/𝜕𝑥𝑖)(𝜕𝑓/𝜕𝑥𝑗). The symbol H is used as the above matrix is often referred to as
the Hessian matrix, or simply the Hessian. (In some literature, the term Hessian is used to refer to the
determinant of the above matrix.) It will be convenient to use the shorthand notation 𝑓𝑥𝑖𝑥𝑗 ∶= 𝜕2𝑓/𝜕𝑥𝑖𝜕𝑥𝑗 .

Theorem 4.3. If 𝑓𝑥𝑖𝑥𝑗 , 𝑖, 𝑗 = 1, … , 𝑛, are defined and continuous throughout an open region containing
x, then
    𝑓𝑥𝑖𝑥𝑗 (x) = 𝑓𝑥𝑗𝑥𝑖 (x).

Note, the conditions in this theorem are required. Consider 𝑓 ∶ R2 → R defined by

    𝑓(𝑥, 𝑦) = 𝑥𝑦(𝑥2 − 𝑦2)/(𝑥2 + 𝑦2)  for (𝑥, 𝑦) ≠ (0, 0),      𝑓(0, 0) = 0.

In this case, 𝑓𝑥𝑦 (0, 0) = 1 and 𝑓𝑦𝑥 (0, 0) = −1.

An immediate consequence of the above result is that if the second order partial derivatives are continuous
in an open region containing x0 , then the Hessian H is symmetric at x0 . In this situation, 𝑓 has a second
order Taylor expansion:

    𝑓(x0 + h) = 𝑓(x0) + 𝐷𝑓(x0)h + ½hᵀ𝐷2𝑓(x0)h + 𝑅(x0, h),

where 𝑅(x0, h) = 𝑜(‖h‖2); that is, lim_{h→0} 𝑅(x0, h)/‖h‖2 = 0.
This leads to the following criteria for classifying local maxima and minima.

Theorem 4.4 (Classifying critical points). Suppose 𝑓 ∶ ℝ𝑛 → ℝ is twice differentiable at x0 , and its
second order partial derivatives are continuous in an open region containing x0 . If x0 is a critical point,
then:

• 𝐷2 𝑓(x0 ) being positive definite implies that x0 is a local minimum;


• 𝐷2 𝑓(x0 ) being negative definite implies that x0 is a local maximum.

Recall, a symmetric matrix 𝑀 is positive definite if and only if its leading principal minors are strictly positive.
There is a similar result for negative definite matrices.

Theorem 4.5. A symmetric matrix 𝑀 is negative definite if and only if its leading principal minors alter-
nate in sign, with the first being strictly negative.

If 𝐷2 𝑓(x0 ) is positive definite or negative definite, then it must be invertible. If 𝐷2 𝑓(x0 ) is neither positive
definite nor negative definite, but still invertible, then x0 is a non-degenerate saddle point.

If 𝐷2 𝑓(x0 ) is not invertible (including the semi-definite cases), then this is a degenerate case, and further
work is needed to decide on the type of critical point. We will not study the general theory of degenerate
critical points in this module.

Special case: If 𝑓 ∶ ℝ2 → ℝ, then the above results can be summarised as follows.
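
One common way to state this special case (a sketch of the standard statement; writing 𝐷 ∶= 𝑓𝑥𝑥𝑓𝑦𝑦 − 𝑓𝑥𝑦2 at the critical point): 𝐷 > 0 with 𝑓𝑥𝑥 > 0 gives a local minimum; 𝐷 > 0 with 𝑓𝑥𝑥 < 0 gives a local maximum; 𝐷 < 0 gives a saddle point; and 𝐷 = 0 is inconclusive. A small code sketch of this test:

```python
# A sketch of the standard two-variable second derivative test, assuming the
# usual formulation: D = fxx*fyy - fxy**2, evaluated at a critical point.
def classify_2d(fxx, fxy, fyy):
    D = fxx * fyy - fxy**2
    if D > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive (degenerate)"

# Example: f(x, y) = x^2 - y^2 at (0, 0) has fxx = 2, fxy = 0, fyy = -2.
print(classify_2d(2, 0, -2))   # saddle point
```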


Example 4.11
Find the critical points for each of the following functions and classify their nature.

1. 𝑓(𝑥, 𝑦) = 4𝑥2 + 9𝑦2 + 8𝑥 − 36𝑦 + 24.


2. 𝑓(𝑥, 𝑦, 𝑧) = 𝑥2 + 𝑦2 + 7𝑧2 − 𝑥𝑦 − 3𝑦𝑧. In each case 𝑓 's domain is the largest subset of R𝑛
on which 𝑓 is defined. (Note, this is our standing assumption for the functions we consider.)
Question 4.12
Find and classify the critical points of:

1. 𝑓(𝑥, 𝑦) = (1/3)𝑥3 + 𝑦2 + 2𝑥𝑦 − 6𝑥 − 3𝑦 + 4;
2. 𝑓(𝑥, 𝑦) = 8𝑥3 + 𝑦3 + 6𝑥𝑦;
3. 𝑓(𝑥, 𝑦) = 𝑥𝑦2 − 4𝑥2 − 𝑦2 + 4𝑥;
4. 𝑓(𝑥, 𝑦) = 𝑥4 + 𝑦4 ;
5. 𝑓(𝑥, 𝑦, 𝑧) = 4𝑥𝑦𝑧 − 𝑥4 − 𝑦4 − 𝑧 4 .
4.4 Constrained optimisation


Once we have found local extrema, it is natural to ask whether there exist absolute extrema: we may be able
to decide this using our knowledge of the function and the set on which it is defined (for example, if we have
a continuous function on a compact set, then the image will be compact, so we will have absolute extrema).
Another natural question is whether or not there is a maximum (or minimum) value when we consider 𝑓
subject to a constraint of the form 𝑔(x) = b, where 𝑓 ∶ ℝ𝑛 → ℝ and 𝑔 ∶ ℝ𝑛 → ℝ𝑚 are smooth functions
(we will mainly focus on the case 𝑛 = 2, 𝑚 = 1, but the technique for other 𝑛, 𝑚 is analogous). Note that
we can assume b = 0 by defining 𝑔∗ (x) = 𝑔(x) − b. Moreover, more than one constraint can be dealt
with by adjusting the value of 𝑚.

Theorem 4.6 (Lagrange multiplier theorem). Suppose 𝑓 ∶ ℝ𝑛 → ℝ has a local extremum at x0 when
restricted to the constraint set 𝐶 = {x ∶ 𝑔(x) = 0}, where 𝑔 ∶ ℝ𝑛 → ℝ. Suppose also that x0 is
not an end-point of 𝐶 , and ∇𝑔(x0 ) ≠ 0. Then there exists a number 𝜆0 such that (x0 , 𝜆0 ) is a critical
point of
𝐿(x, 𝜆) = 𝑓(x) − 𝜆𝑔(x).
If 𝑔∶ ℝ𝑛 → ℝ𝑚 , then we should take the Lagrange multiplier 𝜆 ∈ ℝ𝑚 , and take the Lagrangian to
be 𝐿(x, 𝜆) = 𝑓(x) − 𝜆 ⋅ 𝑔(x).

The above theorem tells us that we should search for the extrema of the constrained 𝑓 among the critical
points of the Lagrangian function. However, note that it does not guarantee that a solution exists. We will
not prove the theorem, but explain intuitively why it makes sense.

At every point in the domain, ∇𝑓 is normal to the level curve through the point (naively, ∇𝑓 measures the
change in 𝑓 , and there is no change in 𝑓 along the level curve, and so all change must be normal to it). Of
course, 𝐶 = {x ∶ 𝑔(x) = 0} is a level curve for the function 𝑔, so ∇𝑔 is normal to 𝐶 . Now, at a local
extremum x0 , the level curves for 𝑓 must be tangent to 𝐶 (if 𝐶 crossed a level curve, it would move from a
lower value to a greater, or vice versa). Hence ∇𝑓 and ∇𝑔 lie along the same line; they could point in the
same or opposite directions.

Figure 4.3: Geometry of Lagrange multipliers

To write this mathematically, we must have

∇𝑓(x0 ) = 𝜆0 ∇𝑔(x0 )

for some 𝜆0 . This implies 𝐷𝐿(x0 , 𝜆0 ) = 0, i.e. (x0 , 𝜆0 ) is a critical point of 𝐿.


Figure 4.4: Geometry of Lagrange multipliers

We can find the critical points of 𝐿 by finding all the partial derivatives and equating them to 0. We can
classify our extrema either by considering the geometry of the situation or by using a specialised second
derivative test. In particular, let 𝑓 ∶ ℝ𝑛 → ℝ be constrained by 𝑔(x) = 0 (where 𝑔 ∶ ℝ𝑛 → ℝ𝑚 ) with
constraint set 𝐶 and Lagrangian 𝐿. Define the bordered Hessian matrix to be:

         ⎛   0          𝐷𝑔                         ⎞
         ⎜           𝜕2𝐿/𝜕𝑥12     𝜕2𝐿/𝜕𝑥2𝜕𝑥1   …   ⎟
    H̄ =  ⎜ (𝐷𝑔)ᵀ     𝜕2𝐿/𝜕𝑥1𝜕𝑥2   𝜕2𝐿/𝜕𝑥22     …   ⎟,
         ⎝   ⋮           ⋮            ⋮          ⋱  ⎠

where 0 is 𝑚 × 𝑚.

Important note The bordered Hessian matrix is actually just the Hessian matrix of 𝐿 differentiated with
respect to 𝜆 first; that is, with 𝐿(x, 𝜆) = 𝑓(x) − 𝜆𝑔(x) we have

    ⎛ 𝜕2𝐿/𝜕𝜆2         𝜕2𝐿/𝜕𝜆𝜕x ⎞
    ⎝ (𝜕2𝐿/𝜕𝜆𝜕x)ᵀ     𝜕2𝐿/𝜕x2  ⎠

A direct computation of this matrix reveals that the upper right and bottom left entries are −(𝐷𝑔) and
−(𝐷𝑔)ᵀ. So why does our bordered Hessian omit the minus signs? We use the bordered Hessian to
compute leading principal minors; these minors are not affected by a simultaneous change of sign of both
(𝐷𝑔) and (𝐷𝑔)ᵀ, and hence the signs are set to be positive. This helps reduce the chances of sign
errors.

Calculate the leading principal minors of H̄ (x0 , 𝜆0 ) of order ≥ 2𝑚 + 1. If these:


• start with the sign of (−1)𝑚+1 and alternate, then we have a local maximum for 𝑓|𝐶 ;
• all have the sign of (−1)𝑚 , then we have a local minimum for 𝑓|𝐶 ;
• do not fall into either of the above categories, then the test is inconclusive.

Note that this recovers the second derivative test we met earlier when there are 𝑚 = 0 constraints.
In the case 𝑓 ∶ ℝ2 → ℝ, 𝑔 ∶ ℝ2 → ℝ, the bordered Hessian matrix is:

         ⎛   0        𝜕𝑔/𝜕𝑥      𝜕𝑔/𝜕𝑦    ⎞
    H̄ =  ⎜ 𝜕𝑔/𝜕𝑥    𝜕2𝐿/𝜕𝑥2    𝜕2𝐿/𝜕𝑦𝜕𝑥  ⎟.
         ⎝ 𝜕𝑔/𝜕𝑦    𝜕2𝐿/𝜕𝑥𝜕𝑦   𝜕2𝐿/𝜕𝑦2   ⎠
So, for a critical point x0 , the test becomes

• if det H̄ (x0 , 𝜆0 ) > 0, then we have a local maximum for 𝑓|𝐶 ;


• if det H̄ (x0 , 𝜆0 ) < 0, then we have a local minimum for 𝑓|𝐶 ;
• if det H̄ (x0 , 𝜆0 ) = 0, then the test is inconclusive.

Example 4.13
Let 𝑓(𝑥, 𝑦)= 𝑥2 + 4𝑦2 and 𝑔(𝑥, 𝑦) = 𝑥2 + 𝑦2 and let 𝐶 = {(𝑥, 𝑦) ∶ 𝑔(𝑥, 𝑦) = 1}. Use the
method of Lagrange multipliers to find the extrema for 𝑓 on 𝐶 .
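
A sketch of this example's computation using the sympy library (assumed available; the classification of the resulting points via the bordered Hessian is left to the hand calculation):

```python
import sympy as sp

# Critical points of the Lagrangian L(x, y, lam) = f - lam*g for
# f = x^2 + 4y^2 constrained to the circle x^2 + y^2 = 1.
x, y, lam = sp.symbols("x y lam", real=True)
f = x**2 + 4 * y**2
g = x**2 + y**2 - 1
L = f - lam * g

sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, "f =", f.subs(s))
# Expected output: (+-1, 0) with lam = 1 (f = 1, the minima) and (0, +-1)
# with lam = 4 (f = 4, the maxima).
```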
Question 4.14

1. Let 𝑓(𝑥, 𝑦) = 𝑥2 + 𝑦2 and 𝑔(𝑥, 𝑦) = 𝑦 − 𝑥 − 1, and let 𝐶 = {(𝑥, 𝑦) ∶ 𝑔(𝑥, 𝑦) = 0},


i.e. we are constrained to lie on the line 𝑦 = 𝑥 + 1. At what point is 𝐶 tangent to a level curve
of 𝑓 ? Is this a local extrema of 𝑓 on 𝐶 ? If so, what type?

2. Let 𝑓(𝑥, 𝑦) = 𝑥2 + 𝑦2 and 𝑔(𝑥, 𝑦) = 𝑥2𝑦 − 16, and let 𝐶 = {(𝑥, 𝑦) ∶ 𝑔(𝑥, 𝑦) = 0}.
Use the method of Lagrange multipliers to show that (±2√2, 2) are both local minima for 𝑓
on 𝐶 .

3. Use the method of Lagrange multipliers to find the maximum value of 𝑓(𝑥, 𝑦) = 2𝑥 + 𝑦 + 6
subject to 𝑥2 + 2𝑦 = 3.

4. An ant is on a metal plate with temperature at (𝑥, 𝑦) given by 𝑇 (𝑥, 𝑦) = 4𝑥2 − 4𝑥𝑦 + 𝑦2 .
He walks around a circle of centre 0 and radius 5. Where on the circular path followed by
the ant is the temperature highest? And where is it lowest? (You should use the method of
Lagrange multipliers and the second derivative test.)
5. Use the method of Lagrange multipliers to find the extrema of 𝑓(𝑥, 𝑦) = 3𝑥 + 2𝑦 subject to
2𝑥2 + 3𝑦2 = 3.
4.5 Chapter 4 Consolidation Questions

Question 4.15
Define 𝑓 ∶ R2 → R by

    𝑓(𝑥, 𝑦) = 𝑥𝑦/√(𝑥2 + 𝑦2)  if (𝑥, 𝑦) ≠ (0, 0),      𝑓(0, 0) = 0.

Show that 𝑓(0, 0), 𝑓𝑥 (0, 0) and 𝑓𝑦 (0, 0) all exist, but that there is no tangent plane to 𝑓 at (0, 0).
Why is Theorem 4.1 not contradicted?
[Hint: look at how 𝑓 behaves on different lines through the origin.]

Question 4.16
Let 𝑓(𝑥, 𝑦) = (𝑦 − 𝑥2)(𝑦 − 3𝑥2).

1. Show that (0, 0) is a critical point.
2. Show also that along any straight line through (0, 0), 𝑓 has a local minimum at (0, 0). (Consider
   the function 𝑡 ↦ 𝑓(𝑎𝑡, 𝑏𝑡) for different values of 𝑎, 𝑏.)
3. Is (0, 0) a local minimum? (Hint: Consider the curve 𝑦 = 2𝑥2 .) Check your conclusion does not
   conflict with the second derivative criterion for a local minimum.

Question 4.17
Let 𝑓(𝑥, 𝑦, 𝑧) = 𝑥2 + 𝑦2 + 𝑧2 . Find the extreme points of 𝑓 subject to the constraint that 𝑧 − 𝑥𝑦 = 2.

Question 4.18
Let 𝑓(𝑥, 𝑦, 𝑧) = 𝑧. Find the extrema of 𝑓 subject to the constraints that 𝑥 + 𝑦 + 𝑧 = 12 and
𝑥2 + 𝑦2 − 𝑧 = 0.
4.6 Chapter 4 summary

Chapter outcomes review

Having reviewed the material and completed the assessment (formative and summative) material,
you should be able to:

• Determine derivatives of functions defined R𝑛 → R𝑚 .


• Find and classify critical points of functions R𝑛 → R.
• Solve problems involving differentiation of functions R𝑛 → R𝑚 , including constrained optimi-
sation.
• Prove results involving differentiation of functions R𝑛 → R𝑚 .

Differentiability in R𝑛 Consider 𝑓 ∶ R2 → R. Let 𝑇 be the tangent plane to 𝑓 at (𝑥0 , 𝑦0 ). Define 𝑓 to be
differentiable at (𝑥0 , 𝑦0 ) if 𝜕𝑓/𝜕𝑥 and 𝜕𝑓/𝜕𝑦 both exist at (𝑥0 , 𝑦0 ) and also

    lim_{(𝑥,𝑦)→(𝑥0,𝑦0)} |𝑓(𝑥, 𝑦) − 𝑇(𝑥, 𝑦)| / ‖(𝑥, 𝑦) − (𝑥0, 𝑦0)‖ = 0.

That is, 𝑇 is a ‘good’ linear approximation to 𝑓 at (𝑥0 , 𝑦0 ).

In general, consider 𝑓 ∶ 𝑈 ⊆ R𝑛 → R𝑚 . Then 𝑓 is differentiable at x0 ∈ 𝑈 if the partial derivatives exist
at x0 and

    lim_{x→x0} ‖𝑓(x) − 𝑓(x0) − 𝐷𝑓(x0)(x − x0)‖ / ‖x − x0‖ = 0,

where 𝐷𝑓(x0) is the linear transformation represented by the Jacobian matrix

    ( (𝜕𝑓𝑖/𝜕𝑥𝑗)(x0) )_{𝑖=1,…,𝑚, 𝑗=1,…,𝑛}.

The function x0 ↦ 𝐷𝑓(x0 ) is the derivative of 𝑓 .


The special case 𝑓 ∶ 𝑈 ⊂ R𝑛 → R yields a 1 × 𝑛 matrix

    𝐷𝑓(x0) = ( (𝜕𝑓/𝜕𝑥1)(x0)   …   (𝜕𝑓/𝜕𝑥𝑛)(x0) ).

The corresponding column vector 𝐷𝑓(x0 )T is called the gradient of 𝑓 at x0 . The gradient is also ab-
breviated as grad𝑓(x0 ) or ∇𝑓(x0 ). So, ∇𝑓(x0 ) ∈ R𝑛 . The gradient is used to determine directional
derivatives.

Key properties are:

• If the partial derivatives exist and are continuous in a neighbourhood of x, then 𝑓 is differentiable at x.
• The gradient of 𝑓 points in the direction in which 𝑓 is increasing fastest.
• Sum rule. If 𝑓 ∶ R𝑛 → R𝑚 and 𝑔 ∶ R𝑛 → R𝑚 , then

𝐷(𝑓 + 𝑔)(x) = 𝐷𝑓(x) + 𝐷𝑔(x).

• Product rule. If 𝑓 ∶ R𝑛 → R and 𝑔 ∶ R𝑛 → R, then

𝐷(𝑓𝑔)(x) = 𝑓(x)𝐷𝑔(x) + 𝑔(x)𝐷𝑓(x).

• Quotient rule. If 𝑓 ∶ R𝑛 → R and 𝑔 ∶ R𝑛 → R, then

    𝐷(𝑓/𝑔)(x) = (𝑔(x)𝐷𝑓(x) − 𝑓(x)𝐷𝑔(x)) / 𝑔(x)2
provided all the expressions are defined at x.
• Dot product rule. If 𝑓 ∶ R𝑛 → R𝑚 and 𝑔 ∶ R𝑛 → R𝑚 , then

𝐷(𝑓 ⋅ 𝑔)(x) = 𝑓(x)T 𝐷𝑔(x) + 𝑔(x)T 𝐷𝑓(x),

where 𝑓 ⋅ 𝑔 ∶ R𝑛 → R is the function defined by (𝑓 ⋅ 𝑔)(x) = 𝑓(x) ⋅ 𝑔(x).


• Chain rule. Let 𝑔 ∶ 𝑈 ⊆ R𝑛 → R𝑚 and 𝑓 ∶ 𝑉 ⊆ R𝑚 → R𝑝 , with 𝑈 mapping into 𝑉 , so that 𝑓 ∘ 𝑔
is defined. Let 𝑔 be differentiable at x0 , and 𝑓 be differentiable at 𝑔(x0 ). Then 𝑓 ∘ 𝑔 is differentiable
at x0 , and
𝐷(𝑓 ∘ 𝑔)(x0 ) = (𝐷𝑓)(𝑔(x0 ))𝐷𝑔(x0 ).

Finding and classifying critical points Let 𝑓 ∶ 𝑈 ⊂ R𝑛 → R. A point x0 ∈ 𝑈 is a:

• local maximum of 𝑓 if there is a neighbourhood 𝑉 of x0 such that 𝑓(x) ≤ 𝑓(x0 ) for every x ∈ 𝑉 ;
• local minimum of 𝑓 if there is a neighbourhood 𝑉 of x0 such that 𝑓(x) ≥ 𝑓(x0 ) for every x ∈ 𝑉 .

If the inequality is strict, then an extremum is non-degenerate. A point x0 is a critical point of 𝑓 if


𝐷𝑓(x0 ) = 0.

• Let 𝑓 ∶ R𝑛 → R be differentiable. If x0 is a local extremum (so a local maximum or minimum), then


𝐷𝑓(x0 ) = 0.
• Consider 𝑓 ∶ R2 → R defined by 𝑓(𝑥, 𝑦) = 𝑥2 − 𝑦 2 for all (𝑥, 𝑦) ∈ R2 . At (0, 0), 𝑓𝑥 = 𝑓𝑦 = 0,
so (0, 0) is a critical point. The point (0, 0) is neither a local maximum nor a local minimum. This
point is a saddle point.

Define the second derivative of 𝑓 by setting 𝐷2 𝑓(x) = 𝐷(∇𝑓)(x). Since ∇𝑓 ∶ R𝑛 → R𝑛 , 𝐷2 𝑓 can
be represented by an 𝑛 × 𝑛 matrix,

        ⎛ 𝜕2𝑓/𝜕𝑥12     ⋯   𝜕2𝑓/𝜕𝑥𝑛𝜕𝑥1 ⎞
    H = ⎜     ⋮        ⋱       ⋮      ⎟,
        ⎝ 𝜕2𝑓/𝜕𝑥1𝜕𝑥𝑛   ⋯   𝜕2𝑓/𝜕𝑥𝑛2   ⎠

where 𝜕2𝑓/𝜕𝑥𝑖𝜕𝑥𝑗 = (𝜕/𝜕𝑥𝑖)(𝜕𝑓/𝜕𝑥𝑗). The symbol H is used as the above matrix is often referred to as
the Hessian matrix, or simply the Hessian. The shorthand notation 𝑓𝑥𝑖𝑥𝑗 ∶= 𝜕2𝑓/𝜕𝑥𝑖𝜕𝑥𝑗 is useful. Key results
include

• If 𝑓𝑥𝑖𝑥𝑗 , 𝑖, 𝑗 = 1, … , 𝑛, are defined and continuous throughout an open region containing x, then
  𝑓𝑥𝑖𝑥𝑗 (x) = 𝑓𝑥𝑗𝑥𝑖 (x).
• Suppose 𝑓 ∶ R𝑛 → R is twice differentiable at x0 , and its second order derivatives are continuous in
a region containing x0 . If x0 is a critical point, then

– 𝐷2 𝑓(x0 ) being positive definite implies that x0 is a local minimum;


– 𝐷2 𝑓(x0 ) being negative definite implies that x0 is a local maximum;

• If 𝐷2 𝑓(x0 ) is positive definite or negative definite, then it must be invertible. If 𝐷2 𝑓(x0 ) is neither
positive definite nor negative definite, but still invertible, then x0 is a non-degenerate saddle point.
If 𝐷2 𝑓(x0 ) is not invertible (including the semi-definite cases), then this is a degenerate case and
further analysis is required to determine the critical points type.

Constrained optimisation Suppose 𝑓 ∶ R𝑛 → R has a local extremum at x0 when restricted to the


constraint set 𝐶 = {x ∶ 𝑔(x) = 0}, where 𝑔 ∶ R𝑛 → R. Suppose also that x0 is not an end-point of 𝐶 ,
and ∇𝑔(x0 ) ≠ 0. Then there exists a number 𝜆0 such that (x0 , 𝜆0 ) is a critical point of

𝐿(x, 𝜆) = 𝑓(x) − 𝜆𝑔(x).

In the general case, take the Lagrange multiplier 𝜆 ∈ R𝑚 , and take the Lagrangian to be 𝐿(x, 𝜆) =
𝑓(x) − 𝜆 ⋅ 𝑔(x).
Classification follows by geometric considerations or use of a specialised second derivative test. Let 𝑓 ∶
R𝑛 → R be constrained by 𝑔(x) = 0 (where 𝑔 ∶ R𝑛 → R𝑚 ), with constraint set 𝐶 and Lagrangian 𝐿.
Define the bordered Hessian matrix to be:

         ⎛   0          𝐷𝑔                         ⎞
         ⎜           𝜕2𝐿/𝜕𝑥12     𝜕2𝐿/𝜕𝑥2𝜕𝑥1   …   ⎟
    H̄ =  ⎜ (𝐷𝑔)ᵀ     𝜕2𝐿/𝜕𝑥1𝜕𝑥2   𝜕2𝐿/𝜕𝑥22     …   ⎟
         ⎝   ⋮           ⋮            ⋮          ⋱  ⎠

where 0 is 𝑚 × 𝑚.

Calculate the leading principal minors of H̄ (x0 , 𝜆0 ) of order greater than or equal to 2𝑚 + 1. If these

• start with the sign of (−1)𝑚+1 and alternate, then the point is a local maximum for 𝑓|𝐶 .
• all have the sign of (−1)𝑚 , then the point is a local minimum for 𝑓|𝐶 .
• do not fall into either of the above two categories, then the test is inconclusive.
4.7 Appendix

The following renderings illustrate various functions R2 → R and their behaviour near critical points.

𝑓(𝑥, 𝑦) = cos(𝑥) cos(𝑦) 𝑒^(−√(𝑥2 + 𝑦2))

𝑓(𝑥, 𝑦) = (1/2)(||𝑥| − |𝑦|| − |𝑥| − |𝑦|)


𝑓(𝑥, 𝑦) = 𝑥2 𝑦/(𝑥2 + 𝑦2 )

𝑓(𝑥, 𝑦) = −6𝑦/(2 + 𝑥2 + 𝑦2 )
𝑓(𝑥, 𝑦) = 𝑒^(−𝑥2/2 − 𝑦2/5)

𝑓(𝑥, 𝑦) = 𝑥3 − 3𝑥𝑦2

𝑓(𝑥, 𝑦) = 𝑥2 − 𝑦2
𝑓(𝑥, 𝑦) = 1/(4𝑥2 + 𝑦2 )
𝑓(𝑥, 𝑦) = 𝑒−𝑦 cos(𝑥)


𝑓(𝑥, 𝑦) = 𝑥𝑦(𝑥2 − 𝑦2 )/(𝑥2 + 𝑦2 )


𝑓(𝑥, 𝑦) = 𝑦2 − 𝑦4 − 𝑥2
𝑓(𝑥, 𝑦) = sin(𝑥2 + 𝑦2 )/(𝑥2 + 𝑦2 )


𝑓(𝑥, 𝑦) = cos(𝑥) + 𝑦2
Chapter 5

Linear algebra

Chapter outcomes

At the end of the chapter you should be familiar with and be able to:

• Determine if a particular function defines an inner product or not.


• Select and apply appropriate theory to determine properties, including closest points, orthog-
onality, of vectors and vector spaces.
• Apply the Gram–Schmidt orthogonalisation process.
• Prove results involving inner product, orthogonality and projections.
• Analyse problems and select appropriate strategies to solve problems involving orthogonal
vectors, orthonormal vectors, vector spaces, inner products and projections.

Assumed background knowledge

Before starting this chapter it is assumed that you know the following material. (References are made to the
2019-20 printed notes.)

• MAT106: Linear Algebra. This chapter builds on significant parts of last year's Linear Algebra. Of
particular importance are

– Chapter 5: Vector spaces;


– Chapter 6: Linear independence, spanning and bases of vector spaces;
– Chapter 7: Subspaces;
– Chapter 8: Linear Transformations;
– Chapter 9: Kernels and Images;
– Chapter 10: Inverse of a linear transformation of a matrix;
– Chapter 12: Change of basis and equivalent matrices
– Chapter 13: Eigenvectors and eigenvalues.

• MA137: Mathematical Analysis. Elements of this module arise, for example, the triangle inequality.


• ST116: Mathematical Techniques. The mathematical thinking, language and writing developed is
fundamental to everything presented in this module. This includes:

– Language of sets
– Function notation, including image and pre-image.

Chapter Statistician: Calyampudi Radhakrishna Rao

Calyampudi Radhakrishna Rao, FRS was born in the state Karnataka in southern India. Perhaps his most
famous results are the Cramér–Rao bound, the Rao–Blackwell theorem, Fisher–Rao theorem and Rao dis-
tance. In 1991, The Journal of Quantitative Economics published a special issue in Rao’s honour, including
“Dr. Rao is a very distinguished scientist and a highly eminent statistician of our time. His contributions to
statistical theory and applications are well known, and many of his results, which bear his name, are included
in the curriculum of courses in statistics at bachelor’s and master’s level all over the world. He is an inspiring
teacher and has guided the research work of numerous students in all areas of statistics. His early work
had greatly influenced the course of statistical research during the last four decades. One of the purposes
of this special issue is to recognize Dr. Rao’s own contributions to econometrics and acknowledge his major
role in the development of econometric research in India.”

Figure 5.1: Calyampudi Radhakrishna Rao.

Image available via: https://upload.wikimedia.org/wikipedia/commons/9/97/Calyampudi_Radhakrishna_Rao_at_ISI_Chennai_%28cropped%29.JPG
Image licence: Creative Commons.
5.1 Vector spaces

We begin by recalling the definitions of vector space and some standard terminology from last year.

Definition 5.1 (Real Vector Space). A real vector space is a set 𝑉 equipped with two operations + ∶
𝑉 × 𝑉 → 𝑉 (addition) and ⋅ ∶ R × 𝑉 → 𝑉 (multiplication by scalars); that is, for every pair u, v ∈ 𝑉
and 𝑎 ∈ R, u + v and 𝑎v are defined and themselves in 𝑉 , that satisfies the following axioms:

1. (associativity of addition) u + (v + w) = (u + v) + w
2. (commutativity of addition) u + v = v + u
3. (existence of additive identity element) there exists an element 0 ∈ 𝑉 , called the zero vector, such
that v + 0 = v for all v ∈ 𝑉
4. (existence of additive inverse elements) for every v ∈ 𝑉 , there exists an element −v ∈ 𝑉 , called
the additive inverse of v, such that v + (−v) =0
5. (distributivity of scalar multiplication with respect to vector addition) 𝑎(u + v) = 𝑎 u + 𝑎v
6. (distributivity of scalar multiplication with respect to field addition) (𝑎 + 𝑏)v = 𝑎v + 𝑏v
7. (compatibility of scalar multiplication with field multiplication) 𝑎(𝑏v) = (𝑎𝑏)v
8. (identity element of scalar multiplication) 1v = v.

Note, in the above u, v, w are elements of 𝑉 and 𝑎, 𝑏 ∈ R. We can similarly define a complex vector
space by supposing 𝑎, 𝑏 ∈ C.

Example 5.1

1. R𝑛 is an 𝑛-dimensional vector space.


2. 𝑃𝑛 , the set of polynomials of degree ≤ 𝑛, is an (𝑛 + 1)-dimensional vector space.
3. The set of all functions 𝑓 ∶ R𝑛 → R𝑚 is an infinite-dimensional vector space. The subset of
all linear functions is a (finite-dimensional) subspace of this.
4. It is convenient to view {0}, i.e. the set containing only the zero vector, as a 0-dimensional
vector space.

Definition 5.2 (Subspace). A subset 𝑊 of a vector space 𝑉 is a subspace of 𝑉 if:

1. it is non-empty;
2. it is closed under addition;
3. it is closed under scalar multiplication.

Example 5.2

1. Let 𝑊 = {(𝑎, 0, 𝑏)T ∣ 𝑎, 𝑏 ∈ R}. Then 𝑊 is a subspace of R3 .


2. Let 𝑊 = {(𝑥, 𝑥 + 1)T ∣ 𝑥 ∈ R}. Is 𝑊 a subspace of R2 ?
Definition 5.3 (Span). For 𝑟 ≥ 1, given a set of vectors {v1 , … , v𝑟 } ⊆ 𝑉 , the set of all linear combi-
nations 𝜆1 v1 + ⋯ + 𝜆𝑟 v𝑟 forms a subspace of 𝑉 ; we call this the span of {v1 , … , v𝑟 }, and denote it
sp({v1 , … , v𝑟 }).
If sp({v1 , … , v𝑟 }) = 𝑉 , then we say that the vectors span 𝑉 .

Definition 5.4 (linearly (in)dependent). If 𝑆 = {v1 , … , v𝑟 } is such that 𝜆1 v1 + ⋯ + 𝜆𝑟 v𝑟 = 0 if and
only if 𝜆1 = ⋯ = 𝜆𝑟 = 0, then 𝑆 is said to be a linearly independent set. If there are non-zero
solutions, then we say that 𝑆 is linearly dependent.

Definition 5.5 (Basis). A basis is a set 𝑆 that is linearly independent and spans 𝑉 . A non-zero vector
space is finite-dimensional if there exists a finite set of vectors which forms a basis for that space,
otherwise it is infinite-dimensional. Any two bases for a finite-dimensional vector space have the same
number of vectors; we call that number the dimension.

Example 5.3
Find a basis of the subspace of R3 given by 𝑊 = {(𝑥, 𝑦, 𝑧)T ∣ 𝑥 + 𝑦 − 3𝑧 = 0}.
Question 5.4

1. Determine sp{(1, 2)T , (2, −3)T }.


2. Let 𝑊 = {(𝑎 + 2𝑏, 2𝑎 − 3𝑏)T ∣ 𝑎, 𝑏 ∈ R}. Is 𝑊 a subspace of R2 ?
3. Let 𝑊 = {(𝑎 + 2𝑏, 𝑎 + 1, 𝑎)T ∣ 𝑎, 𝑏 ∈ R}. Is 𝑊 a subspace of R3 ?
4. Let 𝑊 = { ⎛   2𝑎     𝑏  ⎞ ∣ 𝑎, 𝑏 ∈ R }. Is 𝑊 a subspace of the space of 2 × 2 matrices?
             ⎝ 3𝑎 + 𝑏   3𝑏 ⎠
5.2 Change of basis and linear maps


The canonical basis of R𝑛 is {e1 , … , e𝑛 }, where e𝑖 = (0, … , 0, 1, 0, … , 0)ᵀ (with the 1 being in the 𝑖th
place). For any x ∈ R𝑛 there are unique coefficients (𝑥𝑖)_{𝑖=1}^{𝑛} such that x = ∑_{𝑖=1}^{𝑛} 𝑥𝑖 e𝑖 .

Change of basis. Sometimes it is convenient to express vectors in R𝑛 in terms of another basis. In
particular, suppose we have a vector x = ∑_{𝑖=1}^{𝑛} 𝛼𝑖 e𝑖 , where a = (𝛼1 , … , 𝛼𝑛 )ᵀ is known, that we wish to
write in terms of the basis {e′1 , … , e′𝑛 }. We know that there exist unique coefficients b = (𝛽1 , … , 𝛽𝑛 )ᵀ
such that x = ∑_{𝑖=1}^{𝑛} 𝛽𝑖 e′𝑖 . But how do we compute (𝛽𝑖)_{𝑖=1}^{𝑛} ?

• First, since the e𝑖 form a basis and the e′𝑗 are vectors, we can write e′𝑗 = ∑_{𝑖=1}^{𝑛} 𝑏𝑖𝑗 e𝑖 for each 𝑗.
• Let B = (𝑏𝑖𝑗)_{𝑖,𝑗=1,…,𝑛} ; that is, the matrix whose columns are given by the vectors {e′1 , … , e′𝑛 }.
• Thus,

    x = ∑_{𝑖=1}^{𝑛} 𝛼𝑖 e𝑖 = ∑_{𝑗=1}^{𝑛} 𝛽𝑗 e′𝑗 = ∑_{𝑗=1}^{𝑛} 𝛽𝑗 ∑_{𝑖=1}^{𝑛} 𝑏𝑖𝑗 e𝑖 = ∑_{𝑖=1}^{𝑛} [∑_{𝑗=1}^{𝑛} 𝑏𝑖𝑗 𝛽𝑗] e𝑖 = ∑_{𝑖=1}^{𝑛} (Bb)𝑖 e𝑖 ,

  and as this holds for arbitrary x it follows that a = Bb.
• Multiplying by B⁻¹ yields b = B⁻¹a, and so we can read off the coefficients (𝛽𝑖)_{𝑖=1}^{𝑛} .

The same procedure can be applied to make a change of basis in any finite-dimensional vector space.
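
A minimal numeric sketch of this computation in R2 (the basis below is a made-up example):

```python
import numpy as np

# The columns of B are the new basis vectors e'_j written in the old basis,
# and the new coordinates are b = B^{-1} a.
B = np.array([[1.0, 1.0],
              [1.0, -1.0]])          # hypothetical new basis {(1,1), (1,-1)}
a = np.array([3.0, 1.0])             # coordinates of x in the old basis

b = np.linalg.solve(B, a)            # coordinates of x in the new basis
print(b)                             # here (2, 1): x = 2*(1,1) + 1*(1,-1)
print(B @ b)                         # recovers a, since a = B b
```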

Example 5.5
A basis for 𝑃2 is 𝐴 = {1, 𝑥, 𝑥2 } and another basis is 𝐵 = {𝑥 + 1, 𝑥 − 1, 2𝑥2 }.

1. Determine the change of basis matrix from the basis 𝐴 to the basis 𝐵 .
2. Consider 𝑓 ∈ 𝑃2 given by 𝑓(𝑥) = 𝑎 + 𝑏𝑥 + 𝑐𝑥2 . Write 𝑓 in terms of the basis 𝐴 and then
in terms of the basis 𝐵 .
In this example, the e′𝑖 is the basis 𝐴 and the e𝑖 is the basis 𝐵 .

Linear maps. If 𝐿 ∶ 𝑉 → 𝑉 is a linear map on a finite-dimensional vector space 𝑉 , then we can


represent 𝐿 by a matrix whose columns are the images under 𝐿 of the basis vectors of 𝑉 . Thus 𝐿 is
actually represented by a family of matrices, whose entries depend on the choice of basis. If 𝑉 = R𝑛 and
𝐿 is represented by A with respect to {e1 , … , e𝑛 }, then what matrix à represents 𝐿 with respect to the
alternative basis {e′1 , … , e′𝑛 }? Let a denote the representation of a vector v in basis (e𝑖)_{𝑖=1}^{𝑛} and b the
representation of v in basis (e′𝑖)_{𝑖=1}^{𝑛} .

• We know that a ↦ Aa (where vectors are written with respect to {e1 , … , e𝑛 }).
• We also have that b ↦ Ãb (where vectors are written with respect to {e′1 , … , e′𝑛 }).
• Using that b = B−1 a, the previous two lines imply b ↦ B−1 Aa, which yields b ↦ B−1 ABb.
• We conclude that à = B−1 AB.

Note that if A is symmetric matrix, and we take {e′1 , … , e′𝑛 } to be an orthonormal basis of eigenvectors,
then B−1 AB is diagonal, with entries given by the eigenvalues of A. Furthermore, if {e′1 , … , e′𝑛 } forms an
orthonormal basis, then B is an orthogonal matrix and B−1 = BT .

Example 5.6
Consider the linear transformation 𝐷 ∶ 𝑃2 → 𝑃2 that maps 𝑓 ∈ 𝑃2 to 𝑓𝑥 ; that is, the derivative of
𝑓 with respect to its variable which is denoted by 𝑥.

• Determine the matrix of 𝐷 with respect to the basis {1, 𝑥, 𝑥2 }.


• Determine the matrix of 𝐷 with respect to the basis {𝑥 + 1, 𝑥 − 1, 2𝑥2 }.
Question 5.7

1. Let 𝑇 ∶ R3 → R3 be defined by (𝑥, 𝑦, 𝑧)ᵀ ↦ (𝑥 + 𝑦 − 𝑧, 2𝑥 + 𝑧, 𝑥)ᵀ for all (𝑥, 𝑦, 𝑧)ᵀ ∈ R3 .


Find the matrix of 𝑇 with respect to the canonical basis of R3 .
2. Let 𝑉 = 𝑃2 , 𝐵 = {1, 1 + 𝑥, 1 + 𝑥 + 𝑥2 } and 𝐶 = {2 + 𝑥 + 𝑥2 , 𝑥 + 𝑥2 , 𝑥}. Both 𝐵
and 𝐶 are bases for 𝑉 . Determine the change of basis matrix from 𝐵 to 𝐶 .
3. Let 𝑇 ∶ R2 → R2 be defined by (𝑥, 𝑦)T ↦ (𝑥 − 𝑦, 2𝑥 + 𝑦)T . Determine the matrix of the
map 𝑇 with respect to

i) the canonical basis in the domain and codomain;


ii) the basis {(2, 1)T , (1, 2)T } in the domain and codomain.
Where does (6, 2)T map to? Ensure you check this using your earlier working.
5.3 Inner products


Let 𝑉 be a real vector space, and 𝑓 ∶ 𝑉 × 𝑉 → R a function. Write 𝑓(u, v) = ⟨u, v⟩. The function 𝑓 is an
inner product on 𝑉 if it satisfies:

linearity: ⟨𝑎u1 + 𝑏u2 , v⟩ = 𝑎⟨u1 , v⟩ + 𝑏⟨u2 , v⟩;


symmetry: ⟨u, v⟩ = ⟨v, u⟩;
positive definiteness: ⟨u, u⟩ > 0 for all u ≠ 0 and ⟨u, u⟩ = 0 if and only if u = 0.
A vector space 𝑉 equipped with an inner product is said to be an inner product space. We can associate
a norm or length with any inner product. We write ‖u‖ = √⟨u, u⟩; this is the norm of u. Similarly,
‖u − v‖ = √⟨u − v, u − v⟩ is the distance between u and v.

Example 5.8

1. In R𝑛 , the dot or scalar product ⟨u, v⟩ = 𝑢1 𝑣1 + ⋯ + 𝑢𝑛 𝑣𝑛 is an inner product; ‖u‖ =
   √(𝑢12 + ⋯ + 𝑢𝑛2). Since the latter is the usual definition of the length of a vector in R𝑛 , we call
   it the usual or Euclidean inner product. If we are discussing R𝑛 and do not explicitly mention
   another inner product, we mean this one!
2. If 𝑉 is the collection of continuous functions 𝑓 ∶ [𝑎, 𝑏] → R, then
    ⟨𝑓, 𝑔⟩ = ∫_𝑎^𝑏 𝑓(𝑥)𝑔(𝑥) 𝑑𝑥

defines an inner product on 𝑉 .


3. If 𝑉 is the collection of random variables 𝑋 such that E(𝑋) = 0 and Var(𝑋) < ∞, then
⟨𝑋1 , 𝑋2 ⟩ = Cov(𝑋1 , 𝑋2 ) is an inner product on 𝑉 .
4. If 𝑉 is the collection of 𝑛 × 𝑛 real-valued matrices, then ⟨A, B⟩ = trace(BT A) is an inner
product on 𝑉 .
Theorem 5.1 (Cauchy-Schwarz inequality). For any u, v ∈ 𝑉 , it holds that

⟨u, v⟩2 ≤ ⟨u, u⟩⟨v, v⟩.

Note that in R𝑛 , this says that

    (∑_{𝑖=1}^{𝑛} 𝑎𝑖 𝑏𝑖)2 ≤ (∑_{𝑖=1}^{𝑛} 𝑎𝑖2)(∑_{𝑖=1}^{𝑛} 𝑏𝑖2).
• We have equality in the Cauchy-Schwarz inequality if and only if u and v are linearly dependent, i.e. u =
  0 or v = 0 or there exists 𝜆 ∈ R such that u = 𝜆v.
• A consequence is the triangle inequality: ‖u + v‖ ≤ ‖u‖ + ‖v‖.

Parallelogram law We also have the parallelogram law: ‖u + v‖2 + ‖u − v‖2 = 2‖u‖2 + 2‖v‖2

For u, v ∈ 𝑉 \{0}, define 𝜃u,v ∈ [0, 𝜋] by

    cos(𝜃u,v) = ⟨u, v⟩ / (‖u‖ × ‖v‖).

This definition generalises the usual cosine rule for vectors in R𝑛 :


Definition 5.6.

1. We say that u, v are orthogonal (u ⟂ v) if and only if ⟨u, v⟩ = 0.
2. If 𝐸 ⊆ 𝑉 , then we say that u is orthogonal to 𝐸 (u ⟂ 𝐸 ) if and only if u is orthogonal to every v ∈ 𝐸 .
3. If 𝐸1 and 𝐸2 are subsets of 𝑉 , then they are said to be orthogonal (𝐸1 ⟂ 𝐸2 ) if ⟨v, w⟩ = 0 for all
   v ∈ 𝐸1 , w ∈ 𝐸2 .
4. Define 𝐸 ⟂ , the orthogonal complement of 𝐸 , by setting

    𝐸 ⟂ = {v ∈ 𝑉 ∶ ⟨v, w⟩ = 0, ∀w ∈ 𝐸} .

We note that 𝐸 ⟂ is a subspace of 𝑉 . Moreover, if 𝐸 is a subspace of 𝑉 , then 𝑉 = 𝐸 ⊕ 𝐸 ⟂.


Theorem 5.2. If u, v ∈ 𝑉 \{0} are orthogonal, then u and v are linearly independent.
Question 5.9

1. Show ⟨u, v⟩ = 3𝑢1 𝑣1 + 5𝑢2 𝑣2 defines an inner product on R2 .


2. Let
        U = ⎛ 𝑢1 𝑢2 ⎞ ,      V = ⎛ 𝑣1 𝑣2 ⎞
            ⎝ 𝑢3 𝑢4 ⎠            ⎝ 𝑣3 𝑣4 ⎠
be elements of 𝑀2 : the space of 2 × 2 matrices. Does ⟨U, V⟩ = 𝑢1 𝑣1 + 𝑢2 𝑣3 + 𝑢3 𝑣2 + 𝑢4 𝑣4
define an inner product on 𝑀2 ? Hint: Consider ⟨U, U⟩ when

        U = ⎛ 1 −1 ⎞          or          U = ⎛ 1/2   −1  ⎞ .
            ⎝ 1  1 ⎠                          ⎝  1   1/2  ⎠

3. Show ⟨𝑝, 𝑞⟩ = 𝑝(0)𝑞(0) + 𝑝(1/2)𝑞(1/2) + 𝑝(1)𝑞(1) defines an inner product on 𝑃2 , and find
   ‖1 + 𝑥 + 𝑥2 ‖.
5.4 Orthogonal/orthonormal bases

Recall, from Linear Algebra that a set 𝑆 of vectors is orthogonal if its elements are pairwise orthogonal.
The set is orthonormal if, in addition, ‖s‖ = 1 for all s ∈ 𝑆 . For example, the canonical basis for R𝑛 forms
an orthonormal basis.

Example 5.10
Show that if {u1 , … , u𝑛 } is an orthogonal set, then

‖u1 + ⋯ + u𝑛 ‖2 = ‖u1 ‖2 + ⋯ + ‖u𝑛 ‖2 . (5.1)

Note, this result is a generalisation of Pythagoras’ theorem.


If we have an orthogonal basis {v1 , … , v𝑛 }, then we can write x = ∑_{𝑖=1}^{𝑛} 𝑥𝑖 v𝑖 . So,

    ⟨x, v𝑖⟩ = ⟨∑_{𝑗=1}^{𝑛} 𝑥𝑗 v𝑗 , v𝑖⟩ = ∑_{𝑗=1}^{𝑛} 𝑥𝑗 ⟨v𝑗 , v𝑖⟩ = 𝑥𝑖 ⟨v𝑖 , v𝑖⟩.

Thus we obtain

    x = ∑_{𝑖=1}^{𝑛} (⟨x, v𝑖⟩ / ⟨v𝑖 , v𝑖⟩) v𝑖 .

We call 𝑐𝑖 = ⟨x, v𝑖⟩/⟨v𝑖 , v𝑖⟩ the (generalised) Fourier coefficient of x with respect to v𝑖 . Note that if ‖v𝑖‖ = 1
for every 𝑖 (so that the basis is orthonormal), then 𝑐𝑖 = ⟨x, v𝑖⟩. We say that 𝑐𝑖 v𝑖 is the component of x in
the direction v𝑖 .

From (5.1), we have that

    ‖x‖2 = ∑_{𝑖=1}^{𝑛} ‖𝑐𝑖 v𝑖‖2 = ∑_{𝑖=1}^{𝑛} 𝑐𝑖2 ‖v𝑖‖2 .      (5.2)

Theorem 5.3. Let {v1 , … , v𝑟 } be an orthogonal set (with each vector non-zero), and set 𝑆 =
sp({v1 , … , v𝑟 }). Then

    ‖ x − ∑_{𝑖=1}^{𝑟} 𝑎𝑖 v𝑖 ‖

is minimised by setting 𝑎𝑖 = 𝑐𝑖 ; i.e. ∑_{𝑖=1}^{𝑟} 𝑐𝑖 v𝑖 is the 'closest' vector in 𝑆 to x.
Example 5.11
Let 𝑊 = sp({(3, 1, −1, 1), (1, −1, 1, −1)}) ⊆ R4 . Find w ∈ 𝑊 closest to v = (3, 1, 5, 1).
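
A numeric sketch of this example (the two spanning vectors are orthogonal, so the Fourier coefficients give the closest point directly):

```python
import numpy as np

# Closest point of W = sp{w1, w2} to v, via c_i = <v, w_i>/<w_i, w_i>.
w1 = np.array([3.0, 1.0, -1.0, 1.0])
w2 = np.array([1.0, -1.0, 1.0, -1.0])
v = np.array([3.0, 1.0, 5.0, 1.0])

assert w1 @ w2 == 0                      # the given basis of W is orthogonal

w = (v @ w1) / (w1 @ w1) * w1 + (v @ w2) / (w2 @ w2) * w2
print(w)                                 # the closest vector in W to v
```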
The advantage of this approach is the relative ease of computing each 𝑐𝑖 , which requires only an inner
product. Nevertheless, if the provided vectors do not form an orthogonal set, further work is required.

Question 5.12
Let 𝑉 = sp{(1/2, 1/2, 1/2, 1/2), (−1/√6, −1/√6, 0, 2/√6), (1/√50, 3/√50, −6/√50, 2/√50)} ⊆ R4 .
Let x = (4, 6, 1, 1). Determine the vector v ∈ 𝑉 that minimises ‖x − v‖.
5.5 Gram-Schmidt orthogonalisation

Let 𝑉 be an inner product space, and let {v1 , … , v𝑛 } be a basis for 𝑉 . Can we define an orthogonal
basis {w1 , … , w𝑛 } with the property that 𝑊𝑟 = sp({w1 , … , w𝑟 }) is equal to sp({v1 , … , v𝑟 }) for each
𝑟 = 1, … 𝑛? (If we can, we can then of course produce an orthonormal basis with the same properties.)

Step 1: First, we need to take w1 ∈ sp({v1 }). We choose

w1 = v1 .

Step 2: We next want to find w2 ∈ sp({v1 , v2 }) such that ⟨w2 , w1⟩ = 0. We can do this by setting

    w2 = v2 − (⟨v2 , w1⟩ / ⟨w1 , w1⟩) w1 .

Generally: More generally, we need to find w𝑘 ∈ sp({v1 , … , v𝑘 }) such that ⟨w𝑘 , w𝑖 ⟩ = 0 for each
𝑖 = 1, … , 𝑘 − 1. We can do this by setting

w𝑘 = v𝑘 − Proj𝑊𝑘−1 (v𝑘 ),

where

    Proj_{𝑊𝑘−1}(v𝑘) = ∑_{𝑖=1}^{𝑘−1} (⟨v𝑘 , w𝑖⟩ / ⟨w𝑖 , w𝑖⟩) w𝑖
is the projection of v𝑘 onto the subspace 𝑊𝑘−1 . The following figure shows the situation for 𝑘 = 3.
It is possible to check that:

• we indeed have sp({w1 , … , w𝑟 }) = sp({v1 , … , v𝑟 });
• the set {w1 , … , w𝑛 } is orthogonal;
• each of the vectors in {w1 , … , w𝑛 } is non-zero.
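
A minimal sketch of the process for vectors in R𝑛 with the usual inner product (the input below uses the vectors of Example 5.13, which follows):

```python
import numpy as np

# Gram-Schmidt: w_k = v_k - sum_{i<k} (<v_k, w_i>/<w_i, w_i>) w_i.
def gram_schmidt(vectors):
    ws = []
    for v in vectors:
        w = v.astype(float)
        for u in ws:
            w = w - (v @ u) / (u @ u) * u   # subtract the projection onto u
        ws.append(w)                        # assumes the input is linearly
    return ws                               # independent, so no w is zero

vs = [np.array([1.0, 2.0, 3.0, 0.0]),
      np.array([1.0, 2.0, 0.0, 0.0]),
      np.array([1.0, 0.0, 0.0, 1.0])]
for w in gram_schmidt(vs):
    print(w)
```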

Example 5.13
Let
         ⎛ 1 ⎞          ⎛ 1 ⎞          ⎛ 1 ⎞
         ⎜ 2 ⎟          ⎜ 2 ⎟          ⎜ 0 ⎟
    v1 = ⎜ 3 ⎟ ,   v2 = ⎜ 0 ⎟ ,   v3 = ⎜ 0 ⎟ .
         ⎝ 0 ⎠          ⎝ 0 ⎠          ⎝ 1 ⎠
Find an orthogonal basis for the subspace 𝑊 spanned by {v1 , v2 , v3 }.
Critical point Gram-Schmidt orthogonalisation defines an orthogonal basis; this basis must be normalised
if one requires an orthonormal basis.

Question 5.14

1. Let
        u1 = (1/√2) ⎛ 1 ⎞ ,    u2 = (1/√2) ⎛ −1 ⎞ ,    x = ⎛ 3 ⎞ .
                    ⎝ 1 ⎠                  ⎝  1 ⎠          ⎝ 4 ⎠
Check that u1 , u2 is an orthonormal basis of R2 . Find 𝑎, 𝑏 such that x = 𝑎u1 + 𝑏u2 . Hence
deduce that ‖x‖2 = 𝑎2 + 𝑏2 .
2. What is the component of v = (1, 2, 3, 4) in the direction of w = (1, −3, 4, −2)?
3. Find an orthonormal basis for the subspace of R4 spanned by

         ⎛ 1 ⎞          ⎛ 1 ⎞          ⎛  1 ⎞
         ⎜ 1 ⎟          ⎜ 2 ⎟          ⎜ −3 ⎟
    v1 = ⎜ 1 ⎟ ,   v2 = ⎜ 4 ⎟ ,   v3 = ⎜  4 ⎟ .
         ⎝ 1 ⎠          ⎝ 5 ⎠          ⎝ −2 ⎠

4. Consider the vector space 𝑃3 ; that is, the collection of polynomials of the form 𝑎3 𝑡3 +
𝑎2 𝑡2 + 𝑎1 𝑡 + 𝑎0 . Endow this vector space with the inner product

    ⟨𝑓, 𝑔⟩ = ∫_0^1 𝑓(𝑥)𝑔(𝑥) 𝑑𝑥.

Let 𝑊 = sp({𝑡, 𝑡2 }). Find an orthonormal basis for 𝑊 .


5. Consider the vector space 𝑃2 , endowed with the inner product given by:

    ⟨𝑎2 𝑡2 + 𝑎1 𝑡 + 𝑎0 , 𝑏2 𝑡2 + 𝑏1 𝑡 + 𝑏0⟩ = 𝑎2 𝑏2 + 𝑎1 𝑏1 + 𝑎0 𝑏0 .

a. Find the cosine of the angle between p = 𝑥 + 𝑥2 and q = 7 + 3𝑥 + 3𝑥2 .
b. Show that u = 1 − 𝑥 + 2𝑥2 and v = 2𝑥 + 𝑥2 are orthogonal. Check also that these
   two vectors are not orthogonal with respect to the inner product introduced in part 4.
c. Find an orthonormal basis for sp({1, 𝑥 + 𝑥2 }).

6. Let 𝑊 = sp({(1, −1, 0, 0), (1, 2, 0, −1), (1, 0, 0, 1)}) ⊆ R4 . Find w ∈ 𝑊 closest to
   v = (0, 2, 1, 0).
5.6 Projections

A linear map P ∶ 𝑉 → 𝑉 is a projection if and only if P2 = P. Note, the map P is not invertible unless
P = I.

Example 5.15
Let {u1 , … , u𝑟 } be an orthonormal set, and let 𝑈 = sp({u1 , … , u𝑟 }). If we define
    P(x) = Proj𝑈(x) = ∑_{𝑖=1}^{𝑟} ⟨x, u𝑖⟩ u𝑖 ,

then P is a projection.
Recall that the kernel of a map, P ∶ 𝑉 → 𝑉 , is ker(P) = {x ∈ 𝑉 ∶ P(x) = 0} and its image is
im(P) = {x ∈ 𝑉 ∶ there exists v ∈ 𝑉 such that x = P(v)}.
Theorem 5.4. If P ∶ 𝑉 → 𝑉 is a projection, then the following statements hold.

1. ker(P) = im(I − P).

2. 𝑉 = im(P) ⊕ ker(P).
Note, if 𝑉 = 𝑈 ⊕ 𝑊 , then there exists a projection P with 𝑈 = im(P), 𝑊 = ker(P). Indeed, such a
projection is defined by noting v ∈ 𝑉 can be uniquely written as v = u + w, where u ∈ 𝑈 and w ∈ 𝑊 ,
and then setting P(v) = u.
Example 5.16
Check that the matrix
    P = ⎛ 1 0 ⎞
        ⎝ 4 0 ⎠
defines a projection. Compute its image and kernel, and show that R2 = im(P) ⊕ ker(P).
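
A quick numeric sketch of the checks this example asks for:

```python
import numpy as np

# Check that P is idempotent, and that R^2 = im(P) ⊕ ker(P).
P = np.array([[1.0, 0.0],
              [4.0, 0.0]])

print(np.allclose(P @ P, P))     # True: P is a projection
# im(P) is spanned by (1, 4)^T (the non-zero column); ker(P) by (0, 1)^T,
# since P (0, 1)^T = 0. These two vectors are linearly independent, so
# together they span R^2 (though not orthogonally: P^T != P here).
print(P @ np.array([0.0, 1.0]))  # the zero vector
```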
Definition 5.7. An orthogonal projection is a projection with im(P) ⟂ ker(P).


Note, the projection
𝑟
P(x) = Proj𝑈 (x) = ∑⟨x, u𝑖 ⟩u𝑖 ,
𝑖=1
264 CHAPTER 5. LINEAR ALGEBRA

is orthogonal.

Critical point An orthogonal projection is not the same as an orthogonal linear map. An orthogonal linear
map preserves an inner product on an inner product space. For example, in R2 with the standard Euclidean
inner product, a rotation about the origin is an orthogonal linear map.

Theorem 5.5. Let P be an 𝑛 × 𝑛 matrix representing a projection R𝑛 → R𝑛 , i.e. P2 = P. The following


three statements are equivalent:

1. The projection is orthogonal;


2. PᵀP = P;
3. Pᵀ = P.
Question 5.17

1. By appropriately applying Theorem 5.5 deduce that

        P = ⎛ 1/17    4/17 ⎞
            ⎝ 4/17   16/17 ⎠

   describes an orthogonal projection.

2. Show that

        P = ⎛ 4/13   6/13 ⎞
            ⎝ 6/13   9/13 ⎠

   represents an orthogonal projection.

a) Find imP and kerP, and verify that they are orthogonal.
b) The matrix represents an orthogonal projection from R2 onto a line 𝐿 in R2 . Write down
the equation of 𝐿.
5.7 The spectral decomposition theorem

Let {u1 , … , u𝑟 } be an orthonormal subset of R𝑛 . What matrix represents Proj𝑈 , the orthogonal projection
onto 𝑈 = sp({u1 , … , u𝑟 })?
In summary,

    Proj𝑈(x) = (∑_{𝑖=1}^{𝑟} u𝑖 u𝑖ᵀ) x.      (5.3)

Theorem 5.6 (The spectral decomposition theorem). A symmetric matrix A can be written as

A = 𝜆1 E1 + ⋯ + 𝜆𝑟 E𝑟 ,
where the 𝜆𝑖 are the distinct eigenvalues and the E𝑖 are the projections onto the corresponding
eigenspaces, with the following properties:

• E𝑖ᵀ = E𝑖 ;
• E1 + ⋯ + E𝑟 = I;
• E𝑖 E𝑗 = 0 for 𝑖 ≠ 𝑗.

The proof is accessible, but relegated to the Consolidation Questions.

Example 5.18
Let
    A = ⎛ 1 2 ⎞ .
        ⎝ 2 1 ⎠
Determine the spectral decomposition of A, verifying the conditions of the spectral decomposition
theorem.
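
A numeric sketch verifying the theorem's three properties for this matrix:

```python
import numpy as np

# Spectral decomposition of A = [[1, 2], [2, 1]]: eigenvalues -1 and 3,
# with E_i = u_i u_i^T for the orthonormal eigenvectors u_i.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigvals, U = np.linalg.eigh(A)           # columns of U are orthonormal

Es = [np.outer(U[:, i], U[:, i]) for i in range(2)]
recon = sum(lam * E for lam, E in zip(eigvals, Es))

print(np.allclose(recon, A))             # A = lambda_1 E_1 + lambda_2 E_2
print(np.allclose(sum(Es), np.eye(2)))   # E_1 + E_2 = I
print(np.allclose(Es[0] @ Es[1], 0))     # E_1 E_2 = 0
```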
Question 5.19

1. Find the matrix P which represents an orthogonal projection of R3 onto 𝑊 ∶= {(𝑥, 𝑦, 𝑧)T ∶
2𝑥 − 3𝑦 + 𝑧 = 0}. Find the image of (1, 2, 3)T under this projection. How far is the point
(1, 2, 3)T from 𝑊 ?
2. Determine the 3 × 3 matrix P which represents orthogonal projection of R3 onto the subspace
𝑈 generated by the orthonormal vectors
         ⎛ 1/3 ⎞          ⎛ −2/√5 ⎞
    u1 = ⎜ 2/3 ⎟ ,   u2 = ⎜  1/√5 ⎟ .
         ⎝ 2/3 ⎠          ⎝   0   ⎠

Find imP and kerP, and the point in 𝑈 closest to (8, 1, −5)T .
3. Let v1 = (1, 4, 2)𝑇 and v2 = (4, 9, 1)𝑇 . Let 𝑃 be the plane in R3 spanned by v1 and v2 .
a) Determine the matrix 𝑀 that represents orthogonal projection of R3 onto the plane 𝑃 .
b) Determine ker(𝑀 ) and the equation of the plane 𝑃 .

4. Apply the spectral decomposition theorem to find a decomposition for the matrix

        ⎛ 2 1  0 ⎞
    A = ⎜ 1 2  0 ⎟
        ⎝ 0 0 −1 ⎠
as a linear combination of orthogonal projections.
5.8 Application to simple linear regression


The problem of simple linear regression is as follows: given data (𝑥𝑖 , 𝑦𝑖 ), 𝑖 = 1, … , 𝑛, (for simplicity,
we suppose that the 𝑥𝑖 are distinct, but it is not difficult to drop this restriction,) find the linear function
𝑓(𝑥) = 𝛼 + 𝛽𝑥 that minimises:

    ∑_{𝑖=1}^{𝑛} (𝑦𝑖 − 𝑓(𝑥𝑖))2 .
𝑖=1

We will show how this can be solved using linear algebra.
Let 𝑉 be the collection of functions from {𝑥1 , … , 𝑥𝑛 } to R. This is a vector space, and can be equipped
with an inner product defined by
⟨𝑓, 𝑔⟩ = ∑_{𝑖=1}^{𝑛} 𝑓(𝑥𝑖)𝑔(𝑥𝑖).
Note that we can define a function 𝑦 ∈ 𝑉 by setting 𝑦(𝑥𝑖 ) = 𝑦𝑖 . Moreover, if we let 𝑆 be the subspace of
linear functions, then the problem of simple linear regression is to find a function 𝑓 ∈ 𝑆 such that ‖𝑓 − 𝑦‖
is minimised. From the results of the section so far, we deduce that the solution is given by the orthogonal
projection, i.e. 𝑓 = Proj𝑆 (𝑦). To compute this, we need to find an orthogonal basis for 𝑆 .
To find such a basis, we start by noting that 𝑆 = sp({𝑓1, 𝑓2}), where 𝑓1(𝑥) = 1 and 𝑓2(𝑥) = 𝑥. We can then use Gram–Schmidt to find an orthogonal basis. In particular, set

𝑔1 = 𝑓1,
𝑔2 = 𝑓2 − (⟨𝑔1, 𝑓2⟩/⟨𝑔1, 𝑔1⟩) 𝑔1 = 𝑓2 − ((∑_{𝑖=1}^{𝑛} 𝑥𝑖)/𝑛) 𝑔1 = 𝑓2 − x̄𝑔1,

i.e. 𝑔1(𝑥) = 1, 𝑔2(𝑥) = 𝑥 − x̄.
Consequently, we deduce that

𝑓(𝑥) = ∑_{𝑖=1}^{2} (⟨𝑦, 𝑔𝑖⟩/⟨𝑔𝑖, 𝑔𝑖⟩) 𝑔𝑖(𝑥)
     = ((∑_{𝑖=1}^{𝑛} 𝑦𝑖)/𝑛) 𝑔1(𝑥) + ((∑_{𝑖=1}^{𝑛} 𝑦𝑖(𝑥𝑖 − x̄))/(∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)²)) 𝑔2(𝑥)
     = ȳ + ((∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)(𝑦𝑖 − ȳ))/(∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)²)) (𝑥 − x̄),

giving the estimators

𝛽̂ = (∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)(𝑦𝑖 − ȳ))/(∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)²),    𝛼̂ = ȳ − 𝛽̂x̄.
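These estimators can be checked against a generic least-squares solver; the sketch below uses numpy with simulated data (the sample size and noise level are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=20)

# The projection-derived formulas for beta-hat and alpha-hat.
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# Cross-check with numpy's least-squares solver on the design matrix (1, x).
design = np.column_stack([np.ones_like(x), x])
coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
assert np.allclose([alpha_hat, beta_hat], coeffs)
```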
5.9 Chapter 5 Consolidation Questions
Question 5.20
Write
A = ⎛ 0 1 1 ⎞
    ⎜ 1 0 1 ⎟
    ⎝ 1 1 0 ⎠
in the form A = RDRᵀ, where R is an orthogonal matrix, and D is a diagonal matrix.
Question 5.21
Suppose 𝑉 is the vector space of continuous functions [0, 1] → R, equipped with the inner product given by
⟨𝑓, 𝑔⟩ = ∫₀¹ 𝑓(𝑥)𝑔(𝑥) 𝑑𝑥.
Define ℎ ∈ 𝑉 by setting ℎ(𝑥) = 1 for every 𝑥 ∈ [0, 1], and let 𝑆 = sp({ℎ}). Given a function 𝑓 ∈ 𝑉, determine 𝑔 ∈ 𝑆⟂ and 𝑐 ∈ R such that 𝑓 = 𝑔 + 𝑐ℎ.
Question 5.22
Let A be an 𝑛 × 𝑛 symmetric matrix. Suppose A's eigenvalues are 𝜆1, …, 𝜆𝑛, with corresponding orthonormal eigenvectors u1, …, u𝑛. Let x ∈ R𝑛 and write x = ∑_{𝑖=1}^{𝑛} 𝑥𝑖u𝑖.
1. Prove that A = 𝜆1E1 + ⋯ + 𝜆𝑛E𝑛, where E𝑖 = u𝑖u𝑖ᵀ.
2. Prove that E𝑖E𝑗 = 0 for 𝑖 ≠ 𝑗.
3. Prove that (∑ E𝑖)x = Ix for all x.
5.10 Chapter 5 summary
Chapter outcomes review

Having reviewed the material and completed the assessment (formative and summative) material, you should be able to:

• Determine if a particular function defines an inner product or not.
• Select and apply appropriate theory to determine properties, including closest points and orthogonality, of vectors and vector spaces.
• Apply the Gram–Schmidt orthogonalisation process.
• Prove results involving inner products, orthogonality and projections.
• Analyse problems and select appropriate strategies to solve problems involving orthogonal vectors, orthonormal vectors, vector spaces, inner products and projections.

This chapter covers five topics.
Vector spaces

• Let 𝑉 be a real vector space. For 𝑟 ≥ 1, given a set of vectors {v1, …, v𝑟} ⊆ 𝑉, the set of all linear combinations 𝜆1v1 + ⋯ + 𝜆𝑟v𝑟 forms a subspace of 𝑉 called the span of {v1, …, v𝑟}.
• Denote the span of {v1, …, v𝑟} by sp({v1, …, v𝑟}). If sp({v1, …, v𝑟}) = 𝑉, then the vectors span 𝑉.
• If 𝑆 = {v1, …, v𝑟} is such that 𝜆1v1 + ⋯ + 𝜆𝑟v𝑟 = 0 if, and only if, 𝜆1 = ⋯ = 𝜆𝑟 = 0, then 𝑆 is said to be a linearly independent set. If there are non-zero solutions, then 𝑆 is linearly dependent.
• A basis is a set 𝑆 that is linearly independent and spans 𝑉.
• A non-zero vector space is finite-dimensional if there exists a finite set of vectors which forms a basis for that space; otherwise it is infinite-dimensional.
• Any two bases for a finite-dimensional vector space have the same number of vectors; this number is the dimension of the vector space.
• The canonical basis of R𝑛 is {e1, …, e𝑛}, where e𝑖 = (0, …, 0, 1, 0, …, 0)ᵀ, with the 1 in the 𝑖th place. Given any x ∈ R𝑛, there are unique coefficients 𝛼1, …, 𝛼𝑛 such that x = ∑_{𝑖=1}^{𝑛} 𝛼𝑖e𝑖.
Sometimes it is convenient to express a vector x = ∑_{𝑖=1}^{𝑛} 𝛼𝑖e𝑖 in terms of another basis {e′1, …, e′𝑛} such that x = ∑_{𝑖=1}^{𝑛} 𝛽𝑖e′𝑖. Key points are:

• Each e′𝑗 can be written e′𝑗 = ∑_{𝑖=1}^{𝑛} 𝑏𝑖𝑗e𝑖.
• Let B = (𝑏𝑖𝑗)_{𝑖,𝑗=1,…,𝑛}; that is, the matrix whose columns are given by the vectors {e′1, …, e′𝑛}. Then

  x = ∑_{𝑖=1}^{𝑛} 𝛼𝑖e𝑖 = ∑_{𝑗=1}^{𝑛} 𝛽𝑗e′𝑗 = ∑_{𝑗=1}^{𝑛} 𝛽𝑗 ∑_{𝑖=1}^{𝑛} 𝑏𝑖𝑗e𝑖 = ∑_{𝑖=1}^{𝑛} [∑_{𝑗=1}^{𝑛} 𝑏𝑖𝑗𝛽𝑗] e𝑖 = ∑_{𝑖=1}^{𝑛} (Bb)𝑖 e𝑖.

  Since this holds for arbitrary x, it follows that a = Bb. The matrix B is invertible, so b = B⁻¹a. This result permits reading off the coefficients 𝛽1, …, 𝛽𝑛, and the process works in any finite-dimensional vector space.
• If 𝐿 ∶ 𝑉 → 𝑉 is a linear map on a finite-dimensional vector space 𝑉, then we can represent 𝐿 by a matrix whose columns are the images under 𝐿 of the basis vectors of 𝑉. Thus 𝐿 is actually represented by a family of matrices, whose entries depend on the choice of basis. If 𝑉 = R𝑛 and 𝐿 is represented by A with respect to {e1, …, e𝑛}, then the matrix representing 𝐿 with respect to the alternative basis {e′1, …, e′𝑛} is given by A′ = B⁻¹AB.
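A small numerical sketch of this recipe (the basis B and map A below are illustrative choices, not taken from the notes):

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # columns are the new basis vectors e'_1, e'_2
a = np.array([3.0, 2.0])     # coordinates of x in the canonical basis

b = np.linalg.solve(B, a)    # b = B^{-1} a: coordinates of x in the new basis
assert np.allclose(B @ b, a)

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])            # a linear map in the canonical basis
A_prime = np.linalg.solve(B, A @ B)   # A' = B^{-1} A B, the same map in the new basis
print(b, A_prime, sep="\n")
```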
Inner products Let 𝑉 be a real vector space, 𝑓 ∶ 𝑉 × 𝑉 → R a function. Write 𝑓(u, v) = ⟨u, v⟩. The function 𝑓 is an inner product on 𝑉 if it satisfies:

• linearity: ⟨𝑎u1 + 𝑏u2, v⟩ = 𝑎⟨u1, v⟩ + 𝑏⟨u2, v⟩;
• symmetry: ⟨u, v⟩ = ⟨v, u⟩;
• non-negativity: ⟨u, u⟩ ≥ 0, with equality if and only if u = 0.

A vector space 𝑉 equipped with an inner product is said to be an inner product space. Write ‖u‖ = √⟨u, u⟩; this is the norm of u and associates a ‘length’ with the inner product. Similarly, ‖u − v‖ = √⟨u − v, u − v⟩ is the distance between u and v.

• For any u, v ∈ 𝑉, the following inequality holds:

  ⟨u, v⟩² ≤ ⟨u, u⟩⟨v, v⟩.

  This is the Cauchy–Schwarz inequality. Equality holds in the Cauchy–Schwarz inequality if, and only if, u and v are linearly dependent; that is, u = 0 or v = 0 or there exists 𝜆 ∈ R such that u = 𝜆v.
• The Cauchy–Schwarz inequality implies the triangle inequality: ‖u + v‖ ≤ ‖u‖ + ‖v‖.
• The parallelogram law is: ‖u + v‖² + ‖u − v‖² = 2‖u‖² + 2‖v‖².
• For u, v ∈ 𝑉 ∖ {0}, define 𝜃_{u,v} ∈ [0, 𝜋] by

  cos(𝜃_{u,v}) = ⟨u, v⟩ / (‖u‖ ‖v‖).

  This generalises the usual cosine rule for vectors in R𝑛.
Orthogonal/orthonormal bases We say that u, v are orthogonal, and write u ⟂ v, if, and only if, ⟨u, v⟩ = 0. If 𝐸 ⊆ 𝑉, then u is orthogonal to 𝐸, written u ⟂ 𝐸, if, and only if, u is orthogonal to every v ∈ 𝐸. If 𝐸1 and 𝐸2 are subsets of 𝑉, then they are said to be orthogonal, written 𝐸1 ⟂ 𝐸2, if ⟨v, w⟩ = 0 for all v ∈ 𝐸1, w ∈ 𝐸2. Define 𝐸⟂, the orthogonal complement of 𝐸, by setting 𝐸⟂ = {v ∈ 𝑉 ∶ ⟨v, w⟩ = 0 for all w ∈ 𝐸}. Note that 𝐸⟂ is a subspace of 𝑉. Moreover, if 𝐸 is a subspace of 𝑉, then 𝑉 = 𝐸 ⊕ 𝐸⟂.

• If u, v ∈ 𝑉 ∖ {0} are orthogonal, then u and v are linearly independent.
A set 𝑆 of vectors is orthogonal if its elements are pairwise orthogonal. The set is orthonormal if, in
addition, ‖s‖ = 1 for all s ∈ 𝑆 . The canonical basis for R𝑛 is an example of an orthonormal basis.
• If we have an orthogonal basis {v1, …, v𝑛}, then we can write x = ∑_{𝑖=1}^{𝑛} 𝑥𝑖v𝑖 and show that 𝑥𝑖 = ⟨x, v𝑖⟩/⟨v𝑖, v𝑖⟩, the (generalised) Fourier coefficient of x with respect to v𝑖. The vector 𝑥𝑖v𝑖 is the component of x in the direction v𝑖.
Gram–Schmidt orthogonalisation Let 𝑉 be an inner product space, and let {v1, …, v𝑛} be a basis for 𝑉. A question is: does there exist an orthogonal basis {w1, …, w𝑛} with the property that 𝑊𝑟 = sp({w1, …, w𝑟}) is equal to sp({v1, …, v𝑟}) for each 𝑟 = 1, …, 𝑛? The answer is yes, and the steps are as follows.

• Take w1 ∈ sp({v1}). For convenience select w1 = v1.
• Now find w2 ∈ sp({v1, v2}) such that ⟨w2, w1⟩ = 0. Setting

  w2 = v2 − (⟨v2, w1⟩/⟨w1, w1⟩) w1

  yields an appropriate w2.
• In general, find w𝑘 ∈ sp({v1, …, v𝑘}) such that ⟨w𝑘, w𝑖⟩ = 0 for each 𝑖 = 1, …, 𝑘 − 1. Setting w𝑘 = v𝑘 − Proj_{𝑊𝑘−1}(v𝑘) works, where

  Proj_{𝑊𝑘−1}(v𝑘) = ∑_{𝑖=1}^{𝑘−1} (⟨v𝑘, w𝑖⟩/⟨w𝑖, w𝑖⟩) w𝑖

  is the projection of v𝑘 onto the subspace 𝑊𝑘−1.
• Let {v1, …, v𝑟} be an orthogonal set (with each vector non-zero), and set 𝑆 = sp({v1, …, v𝑟}). Then

  ‖x − ∑_{𝑖=1}^{𝑟} 𝑎𝑖v𝑖‖

  is minimised by setting 𝑎𝑖 = 𝑐𝑖, where 𝑐𝑖 = ⟨x, v𝑖⟩/⟨v𝑖, v𝑖⟩.
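The Gram–Schmidt steps above translate directly into code; a minimal sketch for R^n with the Euclidean inner product, tested on the regression basis from Section 5.8:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalise a list of linearly independent vectors in R^n."""
    ws = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        w = v.copy()
        for u in ws:
            w = w - (v @ u) / (u @ u) * u   # subtract the component of v along each earlier w_i
        ws.append(w)
    return ws

x = np.array([1.0, 2.0, 4.0])
g1, g2 = gram_schmidt([np.ones_like(x), x])   # f_1 = 1, f_2 = x
assert np.isclose(g1 @ g2, 0.0)
print(g2)   # x - x.mean(), i.e. [-4/3, -1/3, 5/3]
```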
A linear map P ∶ 𝑉 → 𝑉 is a projection if, and only if, P² = P. If P ∶ 𝑉 → 𝑉 is a projection, then ker(P) = im(I − P) and 𝑉 = im(P) ⊕ ker(P). Moreover, if 𝑉 = 𝑈 ⊕ 𝑊, then there exists a projection P with 𝑈 = im(P), 𝑊 = ker(P). An orthogonal projection is a projection with im(P) ⟂ ker(P). Some key results:
• Let P be an 𝑛 × 𝑛 matrix representing a projection R𝑛 → R𝑛; that is, P² = P. The following are equivalent:

  – The projection is orthogonal;
  – PᵀP = P;
  – Pᵀ = P.
• Let {u1, …, u𝑟} be an orthonormal subset of R𝑛. The matrix representing Proj𝑈, the orthogonal projection onto 𝑈 = sp({u1, …, u𝑟}), is given by

  Proj𝑈(x) = (∑_{𝑖=1}^{𝑟} u𝑖u𝑖ᵀ) x.
• Spectral decomposition theorem A symmetric matrix A can be written as A = 𝜆1E1 + ⋯ + 𝜆𝑟E𝑟, where the 𝜆𝑖 are the distinct eigenvalues and the E𝑖 are the projections onto the corresponding eigenspaces, with the following properties:

  – E𝑖ᵀ = E𝑖;
  – E1 + ⋯ + E𝑟 = I;
  – E𝑖E𝑗 = 0 for 𝑖 ≠ 𝑗.
Application to simple linear regression Let 𝑉 be the collection of functions from {𝑥1, …, 𝑥𝑛} to R. This is a vector space, and can be equipped with an inner product defined by ⟨𝑓, 𝑔⟩ = ∑_{𝑖=1}^{𝑛} 𝑓(𝑥𝑖)𝑔(𝑥𝑖). Define a function 𝑦 ∈ 𝑉 by setting 𝑦(𝑥𝑖) = 𝑦𝑖. Moreover, if we let 𝑆 be the subspace of linear functions, then the problem of simple linear regression is to find a function 𝑓 ∈ 𝑆 such that ‖𝑓 − 𝑦‖ is minimised. The solution is given by the orthogonal projection, i.e. 𝑓 = Proj𝑆(𝑦). To compute this, find an orthogonal basis for 𝑆 = sp({𝑓1, 𝑓2}), where 𝑓1(𝑥) = 1 and 𝑓2(𝑥) = 𝑥. Use Gram–Schmidt to show 𝑔1(𝑥) = 1 and 𝑔2(𝑥) = 𝑥 − x̄. Thus

𝑓(𝑥) = ∑_{𝑖=1}^{2} (⟨𝑦, 𝑔𝑖⟩/⟨𝑔𝑖, 𝑔𝑖⟩) 𝑔𝑖(𝑥) = ȳ + ((∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)(𝑦𝑖 − ȳ))/(∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)²)) (𝑥 − x̄),

giving the estimators

𝛼̂ = ȳ − 𝛽̂x̄, where 𝛽̂ = (∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)(𝑦𝑖 − ȳ))/(∑_{𝑖=1}^{𝑛} (𝑥𝑖 − x̄)²).
Chapter 6
Metric spaces
Chapter outcomes

At the end of the chapter you should be familiar with and be able to:

• recall and apply appropriate definitions to problems involving metric spaces;
• state and prove theorems involving metric spaces;
• evaluate a problem and then apply appropriate techniques to solve problems involving metric spaces.
Assumed background knowledge

Before starting this chapter it is assumed that you know the following material.

• MA137: Mathematical Analysis. The notation, definitions and theory of
  – Convergence, particularly limits of sequences and limits of functions.
  – Continuity of functions.
• ST116: Mathematical Techniques. The mathematical thinking, language and writing developed is fundamental to everything presented in this module. This includes:
  – Language of sets.
  – Function notation.
• General mathematical knowledge. We will also use some calculus, for example, finding maxima.
Chapter Statistician: Florence Nightingale
Although Florence Nightingale is remembered for her role in founding the modern nursing profession, she was a gifted mathematician, described as ‘a true pioneer in the graphical representation of statistics’. She used graphical representations in presentations to Parliament and civil servants. Her work improved public health through, for example, the Public Health Acts of 1874 and 1875. In 1859, she was elected a member of the Royal Statistical Society. In 1874, she became an honorary member of the American Statistical Association.
Figure 6.1: Florence Nightingale.

Image available via: https://en.wikipedia.org/wiki/File:Florence_Nightingale_(H_Hering_NPG_x82368).jpg
Image licence: The image is in the public domain.
6.1 Introduction to metric spaces
We have been considering inner products on vector spaces, which give us a concept of the distance between two vectors in the space. Is it possible to have a notion of distance without all the structure of a vector space?
Definition 6.1 (Metric Space). Let 𝑆 be a non-empty set, and let 𝑑 ∶ 𝑆 × 𝑆 → [0, ∞) be such that, for all 𝑥, 𝑦, 𝑧 ∈ 𝑆:

• (M1) 𝑑(𝑥, 𝑦) = 0 if and only if 𝑥 = 𝑦;
• (M2) 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥);
• (M3) 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧).

We then say that 𝑑 is a metric and (𝑆, 𝑑) is a metric space.
Notice that the codomain of 𝑑 is [0, ∞), so:

• 𝑑(𝑥, 𝑦) ≥ 0 for all 𝑥, 𝑦 ∈ 𝑆.

Some texts make this an explicit requirement as a fourth axiom (but you can in fact deduce it from (M1)–(M3)).
Example 6.1

1. If 𝑆 = 𝑉, a vector space equipped with an inner product, then 𝑑(𝑥, 𝑦) = ‖𝑥 − 𝑦‖ is a metric.
2. For any non-empty set 𝑆, define 𝑑(𝑥, 𝑦) = 0 if 𝑥 = 𝑦, and 𝑑(𝑥, 𝑦) = 1 otherwise; then 𝑑 is a metric. (This is the discrete metric.)
3. Let 𝑆 be the collection of functions 𝑓 ∶ [𝑎, 𝑏] → R that are bounded; then 𝑑(𝑓, 𝑔) = sup_{𝑥∈[𝑎,𝑏]} |𝑓(𝑥) − 𝑔(𝑥)| is a metric. (This is the uniform metric.)
Important metrics on R𝑛 If 𝑆 = R𝑛, then the Euclidean distance

𝑑(x, y) = (∑_{𝑖=1}^{𝑛} |𝑥𝑖 − 𝑦𝑖|²)^{1/2}

is a metric. More generally, if 𝑝 ≥ 1, then

𝑑𝑝(x, y) = (∑_{𝑖=1}^{𝑛} |𝑥𝑖 − 𝑦𝑖|^𝑝)^{1/𝑝}

is a metric. Furthermore, so is

𝑑∞(x, y) = max_{𝑖=1,…,𝑛} |𝑥𝑖 − 𝑦𝑖|.

The Metric Spaces module (if taken) discusses these in much more detail. Metric spaces can be found in many different areas of mathematics. For example, let 𝑆 be the vertices of a connected graph; then

𝑑(𝑥, 𝑦) = #{edges on the shortest path between 𝑥 and 𝑦}

is a metric.
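These distances are straightforward to compute; a short sketch (numpy assumed) for the 𝑑𝑝 family and its max-metric limit:

```python
import numpy as np

def d_p(x, y, p):
    """The l_p metric on R^n, for p >= 1."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def d_inf(x, y):
    """The max metric, the limit of d_p as p grows."""
    return np.max(np.abs(x - y))

x, y = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(d_p(x, y, 1))   # 7.0 (taxicab)
print(d_p(x, y, 2))   # 5.0 (Euclidean)
print(d_inf(x, y))    # 4.0
```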
Definition 6.2 (Bounded). A subset 𝐴 ⊆ 𝑆 is bounded if its diameter, diam(𝐴) = sup_{𝑥,𝑦∈𝐴} 𝑑(𝑥, 𝑦), is finite.
Example 6.2
Equip R with the metric 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦|. Show that [0, 2] and [0, 2) are bounded. Is [0, ∞)
bounded?
Question 6.3

1. Explain why the following are not metrics on R:
   a. 𝑑(𝑥, 𝑦) = |𝑥² − 𝑦²|;
   b. 𝑑(𝑥, 𝑦) = 𝑥 − 𝑦 if 𝑥 ≥ 𝑦, and 𝑑(𝑥, 𝑦) = (𝑦 − 𝑥)/2 otherwise;
   c. 𝑑(𝑥, 𝑦) = ||𝑥| − |𝑦||.
2. Show that 𝑑(𝑥, 𝑦) = 2|𝑥| + 2|𝑦| if 𝑥 ≠ 𝑦, and 𝑑(𝑥, 𝑥) = 0, is a metric on R.
3. Prove that diam(𝐴) = 0 if and only if 𝐴 consists of a single point.
6.2 Open and closed sets
Let (𝑆, 𝑑) be a metric space, 𝑥 ∈ 𝑆, 𝑟 > 0. The open ball centred at 𝑥 of radius 𝑟 is 𝐵(𝑥, 𝑟) = {𝑦 ∈ 𝑆 ∶ 𝑑(𝑥, 𝑦) < 𝑟}. The closed ball centred at 𝑥 of radius 𝑟 is

B̄(𝑥, 𝑟) = {𝑦 ∈ 𝑆 ∶ 𝑑(𝑥, 𝑦) ≤ 𝑟}.
Example 6.4

1. In (R, 𝑑𝑝), we have 𝐵(𝑥, 𝜀) = (𝑥 − 𝜀, 𝑥 + 𝜀).
2. In (R², 𝑑2), 𝐵(x, 𝜀) is the inside of a disc centred at x, radius 𝜀.
3. In (R², 𝑑1), 𝐵(x, 𝜀) = {y ∶ |𝑥1 − 𝑦1| + |𝑥2 − 𝑦2| < 𝜀} – this is a diamond.
4. In (R², 𝑑∞), 𝐵(x, 𝜀) = {y ∶ max{|𝑥1 − 𝑦1|, |𝑥2 − 𝑦2|} < 𝜀} – this is a square.
5. If 𝑑 is the discrete metric on 𝑆, then 𝐵(𝑥, 𝜀) = {𝑥} if 𝜀 ≤ 1, and 𝐵(𝑥, 𝜀) = 𝑆 otherwise.
Example 6.5

Assume that 𝑑, defined as follows, is a metric on R:

𝑑(𝑥, 𝑦) = 0 if 𝑥 = 𝑦, and 𝑑(𝑥, 𝑦) = |𝑥 − 1| + |𝑦 − 1| + 2|𝑥 − 𝑦| otherwise.

Find 𝐵(0, 𝛿), 𝐵(0, 1) and 𝐵(1, 𝛿).
Definition 6.3 (Open and Closed sets). A subset 𝐴 ⊆ 𝑆 is open if for every 𝑥 ∈ 𝐴, there exists an 𝜀 > 0 such that 𝐵(𝑥, 𝜀) ⊆ 𝐴. A subset is closed if it is the complement of an open set.
Example 6.6

As a subset of (R, 𝑑1), is {1/𝑛 ∶ 𝑛 ∈ N} open or closed?

Recall,

𝑑𝑝(x, y) = (∑_{𝑖=1}^{𝑛} |𝑥𝑖 − 𝑦𝑖|^𝑝)^{1/𝑝}.
Question 6.7

1. As a subset of (R, 𝑑1), is Q open or closed?
2. Let 𝑆 = [0, ∞) and 𝑑 ∶ 𝑆 × 𝑆 → R be defined by

   𝑑(𝑥, 𝑦) = 0 if 𝑥 = 𝑦, and 𝑑(𝑥, 𝑦) = 1 + |𝑥 − 𝑦| + 𝑥 + 𝑦 if 𝑥 ≠ 𝑦.

   a) Show that 𝑑 is a metric on 𝑆.
   b) Determine the open balls 𝐵(𝑥, 1), 𝐵(0, 2) and 𝐵(1, 𝛿), for 𝑥 ∈ 𝑆 and 𝛿 > 0.
3. Let 𝑑 ∶ R² × R² → [0, ∞) be defined by

   𝑑(x, y) = max{‖x‖, ‖y‖} if x ≠ y, and 𝑑(x, y) = 0 otherwise,

   where ‖x‖ is the Euclidean norm of x = (𝑥1, 𝑥2).

   a) Show that 𝑑 is a metric on R².
   b) Determine the balls 𝐵((0, 0), 1) and 𝐵((1, 0), 1).
   c) Is the set 𝐴 = {x ∈ R² ∶ 𝑑((0, 0), x) ≤ 1} open in (R², 𝑑)?
6.3 Convergence
Let (𝑥𝑛)_{𝑛≥1} be a sequence in a metric space (𝑆, 𝑑). We say that (𝑥𝑛) converges to 𝑥 ∈ 𝑆 if and only if 𝑑(𝑥𝑛, 𝑥) → 0; that is,

∀𝜀 > 0, ∃𝑛0 ∈ N such that 𝑥𝑛 ∈ 𝐵(𝑥, 𝜀), ∀𝑛 ≥ 𝑛0.

We do not assume that (𝑥𝑛) is a sequence of real numbers in R. This definition is general, as the next example illustrates.
Example 6.8

Let 𝑆 be the collection of functions 𝑓 ∶ [0, 1] → R that are bounded and 𝑑(𝑓, 𝑔) = sup_{𝑥∈[0,1]} |𝑓(𝑥) − 𝑔(𝑥)|. Define a sequence 𝑓𝑛 ∶ [0, 1] → R by 𝑓𝑛(𝑥) = 𝑛²𝑥(1 − 𝑥)ⁿ for all 𝑥 ∈ [0, 1].

1. Find a function 𝑓 ∶ [0, 1] → R such that 𝑓𝑛(𝑥) → 𝑓(𝑥) for every 𝑥 ∈ [0, 1].
2. Determine if (𝑓𝑛) converges to 𝑓 in (𝑆, 𝑑).
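A numerical probe of this example is sketched below. It is not a proof, but it suggests the answer to part 2: the pointwise limit is 0, yet the sup distance grows.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 100_001)

def f_n(n):
    return n**2 * xs * (1 - xs) ** n

for n in (10, 100, 1000):
    # For each fixed x, f_n(x) -> 0, but sup_x |f_n(x)| is roughly n/e,
    # attained near x = 1/(n + 1): no convergence in the uniform metric d.
    print(n, f_n(n).max())
```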
Question 6.9

Let 𝑆 be the collection of functions 𝑓 ∶ [0, 1] → R that are bounded and 𝑑(𝑓, 𝑔) = sup_{𝑥∈[0,1]} |𝑓(𝑥) − 𝑔(𝑥)|. Consider the sequence of functions 𝑓𝑛 ∶ [0, 1] → R defined by

𝑓𝑛(𝑥) = 𝑛𝑥 / (1 + 𝑛²𝑥²).

1. Sketch the graphs of 𝑓1, 𝑓2 and 𝑓3 on the interval [0, 1].
2. Find 𝑓 ∶ [0, 1] → R such that 𝑓𝑛(𝑥) → 𝑓(𝑥) for every 𝑥 ∈ [0, 1].
3. Determine if (𝑓𝑛) converges to 𝑓 in (𝑆, 𝑑).
4. Let 𝑎 ∈ (0, 1) be fixed. Does (𝑓𝑛) converge to 𝑓 in (𝑆, 𝑑) on the interval [𝑎, 1]?
6.4 Compactness
Chapter 1 introduced the concept of compactness in R². For this module, you may take the following as the definition of compactness for a metric space. (Note, the name ‘compact’ occurs in more general spaces, where the definition is different; in a metric space that alternative definition is equivalent to sequential compactness.)
Definition 6.4 (Sequentially compact). A subset 𝐴 ⊆ 𝑆 is compact if and only if it is sequentially compact, i.e. every sequence (𝑥𝑛)_{𝑛≥1} in 𝐴 has a subsequence (𝑥_{𝑛𝑖})_{𝑖≥1} converging to a point of 𝐴.
The real line R with the metric 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦| is not compact. For example, the sequence 𝑎𝑛 = 𝑛 for all 𝑛 ∈ N does not have a convergent subsequence; every subsequence is unbounded and hence not convergent.
Determining if a general metric space is compact can be difficult. In R𝑛 with the Euclidean metric, there is a complete answer. This is the result utilised in Chapter 1.
Theorem 6.1 (Heine–Borel theorem). A subset of R𝑛 (equipped with the Euclidean metric) is compact if and only if it is closed and bounded.
Chapter 1 provides further examples of compact and non-compact subsets of R². In particular, [0, 1] is compact and [0, 1) is not compact. The next example is similar to those in Chapter 1; you will find it useful to review the examples presented there.

Example 6.10
Is the set {(𝑥, 𝑦) ∈ R² ∶ 𝑥 − 𝑦 ≥ 0, 𝑥² + 𝑦² ≤ 1} compact?
Question 6.11
Define subsets of R² given by

𝐴 = {(𝑥, 𝑦) ∈ R² ∶ 𝑥² + 𝑦² ≥ 1, |𝑥| + |𝑦| ≤ 2},
𝐵 = {(𝑥, 𝑦) ∈ R² ∶ 2|𝑥| + |𝑦| < 1, 𝑥𝑦 < 0},
𝐶 = {(𝑥, 𝑦) ∈ R² ∶ 0 < 𝑥 < 2𝜋, 𝑦 = sin(𝑥)},

where 𝑑2 is the usual (Euclidean) metric given by

𝑑2((𝑥1, 𝑦1), (𝑥2, 𝑦2)) = √((𝑥1 − 𝑥2)² + (𝑦1 − 𝑦2)²).

Decide whether 𝐴, 𝐵 or 𝐶 are compact in (R², 𝑑2).
6.5 Continuity
Recall from Year 1 analysis that:

• A function 𝑓 ∶ 𝐴(⊆ R) → R is continuous at 𝑎 ∈ 𝐴 if given 𝜀 > 0, there exists 𝛿 > 0 such that |𝑥 − 𝑎| < 𝛿 implies |𝑓(𝑥) − 𝑓(𝑎)| < 𝜀.
• A function 𝑓 ∶ 𝐴(⊆ R) → R is sequentially continuous at 𝑎 ∈ 𝐴 if given any sequence (𝑥𝑛) ⊆ 𝐴 that converges to 𝑎, (𝑓(𝑥𝑛)) → 𝑓(𝑎).
• In R, a function is sequentially continuous at 𝑎 ∈ 𝐴 if and only if it is continuous at 𝑎.
This section generalises these ideas to a metric space. A function 𝑓 ∶ (𝑆, 𝑑) → (𝑆′, 𝑑′) is (𝑑, 𝑑′)-continuous at 𝑥 ∈ 𝑆 if and only if: ∀𝜀 > 0, ∃𝛿 > 0 such that

𝑑(𝑥, 𝑦) < 𝛿 ⇒ 𝑑′(𝑓(𝑥), 𝑓(𝑦)) < 𝜀.
Theorem 6.2 (Continuous). A function 𝑓 ∶ (𝑆, 𝑑) → (𝑆′, 𝑑′) is (𝑑, 𝑑′)-continuous at 𝑥 ∈ 𝑆 if and only if, for every sequence (𝑥𝑛) in 𝑆,

𝑑(𝑥𝑛, 𝑥) → 0 ⇒ 𝑑′(𝑓(𝑥𝑛), 𝑓(𝑥)) → 0.

The proof of this result is omitted and not examinable.
Example 6.12

1. On R, the following defines a metric:

   𝑑(𝑥, 𝑦) = 0 if 𝑥 = 𝑦, and 𝑑(𝑥, 𝑦) = |𝑥| + |𝑦| otherwise.

   Show that 𝑓 ∶ (R, 𝑑) → (R, 𝑑) defined by 𝑥 ↦ 1 + 𝑥 is not continuous at 0.
2. Let 𝑆 be the collection of continuous functions 𝑓 ∶ [𝑎, 𝑏] → R and 𝑑(𝑓, 𝑔) = sup_{𝑥∈[𝑎,𝑏]} |𝑓(𝑥) − 𝑔(𝑥)|. Show that the function 𝐼 ∶ (𝑆, 𝑑) → (R, 𝑑1) defined by

   𝐼(𝑓) = ∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥

   is continuous. (Note, it will be helpful to recall |∫_𝑎^𝑏 𝑓(𝑥) 𝑑𝑥| ≤ ∫_𝑎^𝑏 |𝑓(𝑥)| 𝑑𝑥.)
Question 6.13

1. On R, the following defines a metric:

   𝑑(𝑥, 𝑦) = 0 if 𝑥 = 𝑦, and 𝑑(𝑥, 𝑦) = 2|𝑥| + 2|𝑦| + |𝑥 − 𝑦| otherwise.

   Show that 𝑓 ∶ (R, 𝑑) → (R, 𝑑) defined by 𝑥 ↦ 1 + 𝑥 is not continuous at 0.
2. On R, the following defines a metric:

   𝑑(𝑥, 𝑦) = |𝑥 − 𝑦| + 1 if 𝑥 < 0 ≤ 𝑦 or 𝑦 < 0 ≤ 𝑥, and 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦| otherwise.

   Show that 𝑓 ∶ (R, 𝑑) → (R, 𝑑) defined by 𝑥 ↦ −𝑥 is not continuous at 0.
6.6 Chapter 6 Consolidation Questions
Question 6.14

1. Explain why the following are not metrics on R:
   a. 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦|³;
   b. 𝑑(𝑥, 𝑦) = 2|𝑥| + 3|𝑦| if 𝑥 ≠ 𝑦, and 𝑑(𝑥, 𝑥) = 0.
2. Show that 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦| + 1 if 𝑥 < 0 ≤ 𝑦 or 𝑦 < 0 ≤ 𝑥, and 𝑑(𝑥, 𝑦) = |𝑥 − 𝑦| otherwise, is a metric on R.
3. Is 𝑑(x, y) = min_{𝑖=1,…,𝑛} |𝑥𝑖 − 𝑦𝑖| a metric on R𝑛?
Question 6.15

Prove that the union of two bounded sets 𝐴 and 𝐵 in a metric space, with 𝐴 ∩ 𝐵 ≠ ∅, is a bounded set.
(Note, this result holds in the case when 𝐴 ∩ 𝐵 = ∅, but then we need to define the distance between two sets, hence the weaker result presented here.)
Question 6.16

1. Assume that 𝑑, defined as follows, is a metric on R:

   𝑑(𝑥, 𝑦) = 0 if 𝑥 = 𝑦, and 𝑑(𝑥, 𝑦) = 2|𝑥| + 2|𝑦| + |𝑥 − 𝑦| otherwise.

   Find 𝐵(0, 𝛿), 𝐵(0, 1) and 𝐵(1, 𝛿).
2. Assume that 𝑑, defined as follows, is a metric on R²:

   𝑑(x, y) = 0 if x = y; 𝑑(x, y) = 𝑑2(x, y) if x, y and 0 are collinear (lie on a line) with x ≠ y; and 𝑑(x, y) = 𝑑2(x, 0) + 𝑑2(0, y) otherwise.

   Sketch 𝐵(0, 1), 𝐵((2, 2), 1) and 𝐵((1, 2), 4).
6.7 Chapter 6 summary
Chapter outcomes review

Having reviewed the material and completed the assessment (formative and summative) material, you should be able to:

• recall and apply appropriate definitions to problems involving metric spaces;
• state and prove theorems involving metric spaces;
• evaluate a problem and then apply appropriate techniques to solve problems involving metric spaces.
This chapter covers key topics in metric spaces. The Metric Spaces module provides greater coverage.
Metric spaces. Let 𝑆 be a non-empty set and let 𝑑 ∶ 𝑆 × 𝑆 → [0, ∞) be such that, for all 𝑥, 𝑦, 𝑧 ∈ 𝑆:

• (M1) 𝑑(𝑥, 𝑦) = 0 if, and only if, 𝑥 = 𝑦;
• (M2) 𝑑(𝑥, 𝑦) = 𝑑(𝑦, 𝑥);
• (M3) 𝑑(𝑥, 𝑧) ≤ 𝑑(𝑥, 𝑦) + 𝑑(𝑦, 𝑧).

The function 𝑑 is a metric and (𝑆, 𝑑) is a metric space.

A subset 𝐴 ⊆ 𝑆 is bounded if its diameter, diam(𝐴) = sup_{𝑥,𝑦∈𝐴} 𝑑(𝑥, 𝑦), is finite.
Open and closed sets. Let (𝑆, 𝑑) be a metric space, 𝑥 ∈ 𝑆, 𝑟 > 0. The open ball centred at 𝑥 of radius 𝑟 is 𝐵(𝑥, 𝑟) = {𝑦 ∈ 𝑆 ∶ 𝑑(𝑥, 𝑦) < 𝑟}. A subset 𝐴 ⊆ 𝑆 is open in 𝑆 if for every 𝑥 ∈ 𝐴, there exists an 𝜀 > 0 such that 𝐵(𝑥, 𝜀) ⊆ 𝐴. A subset is closed if it is the complement of an open set in 𝑆.

• A set can be both open and closed.
• Every open ball is open and every closed ball is closed.
• The empty set ∅ and the entire set 𝑆 are both open, and hence both closed in 𝑆.
Convergence and compactness. Let (𝑥𝑛)_{𝑛≥1} be a sequence in a metric space (𝑆, 𝑑). We say that 𝑥𝑛 converges to 𝑥 ∈ 𝑆 if, and only if, 𝑑(𝑥𝑛, 𝑥) → 0; that is, for all 𝜀 > 0, there exists 𝑁 ∈ N such that

𝑥𝑛 ∈ 𝐵(𝑥, 𝜀) for all 𝑛 ≥ 𝑁.
• A subset 𝐴 ⊆ 𝑆 is sequentially compact if every sequence (𝑥𝑛)_{𝑛≥1} has a convergent subsequence (𝑥_{𝑛𝑖})_{𝑖≥1}. (Convergence is to a point in 𝐴.)
• A subset 𝐴 ⊆ 𝑆 is compact if, and only if, it is sequentially compact.
• Heine–Borel Theorem. A subset of R𝑛 (equipped with the Euclidean metric) is compact if, and only if, it is closed and bounded.
Continuity. Let 𝑓 ∶ (𝑆, 𝑑) → (𝑆′, 𝑑′); then 𝑓 is (𝑑, 𝑑′)-continuous at 𝑥 ∈ 𝑆 if, and only if, for all 𝜀 > 0, there exists 𝛿 > 0 such that

𝑑(𝑥, 𝑦) < 𝛿 ⟹ 𝑑′(𝑓(𝑥), 𝑓(𝑦)) < 𝜀.
• A function 𝑓 ∶ (𝑆, 𝑑) → (𝑆′, 𝑑′) is (𝑑, 𝑑′)-continuous at 𝑥 ∈ 𝑆 if, and only if, for every sequence (𝑥𝑛) in 𝑆 converging to 𝑥,

  𝑑(𝑥𝑛, 𝑥) → 0 ⟹ 𝑑′(𝑓(𝑥𝑛), 𝑓(𝑥)) → 0.