
Linear Algebra (Online)

MATH10390

Dr Richard Smith

May 20, 2023

Dr Richard Smith copyright © 2022/23.

Contents

0 Programme overview
  0.1 Programme outline
  0.2 Assessment and grading
  0.3 Continuous assessment schedules
  0.4 Discussion boards and MathJax
  0.5 Any other business

1 Matrices 1
  1.1 Introduction to matrices
  1.2 Matrix multiplication
  1.3 Further ideas from matrix arithmetic
  1.4 Inverses and determinants of 2 × 2 matrices
  1.5 Inverses and determinants of n × n matrices

2 Vector Geometry 1
  2.1 Euclidean n-space and vectors
  2.2 Vector arithmetic
  2.3 The scalar product
  2.4 Matrices acting on vectors
  2.5 Orthogonal projections

3 Systems of Linear Equations
  3.1 Systems of linear equations
  3.2 Elementary row operations
  3.3 Row-echelon and reduced row-echelon forms
  3.4 Parametric solutions and inconsistent systems
  3.5 Connections between linear systems and matrices

4 Vector Geometry 2
  4.1 Orthonormal bases of Rn and coordinate systems
  4.2 Orthonormal bases and orthogonal matrices

5 Eigenvalues and Eigenvectors of Matrices
  5.1 Eigenvalues and eigenvectors
  5.2 The characteristic equation
  5.3 Eigenvectors
  5.4 Symmetric matrices and orthonormal bases of eigenvectors

6 Matrices 2
  6.1 Quadratic forms
  6.2 Matrix norms

A Discussion board and WeBWorK guides
  A.1 How to use the Moodle discussion boards
  A.2 How to use WeBWorK

B Principal component analysis (non-examinable)

C Additional material (non-examinable)
  C.1 Additional proofs
  C.2 Vector subspaces of Rn and dimension
Chapter 0

Programme overview

0.1 Programme outline


Linear Algebra (Online) and Calculus (Online) comprise the Professional Certificate in Mathematics for Data Analytics and Statistics. (It should be possible to link to anything in blue.) The purpose of these modules is to teach the student fundamental concepts and techniques from linear algebra and calculus that are, for instance, necessary for the study of multivariate statistics. While the material in the modules will be quite generic in nature (and thus applicable to many other fields), the reader will find in some of the appendices specific techniques in multivariate statistics (e.g. MATH10390 Appendix B on Principal Component Analysis) that draw together many of the topics covered in the programme as a whole, either directly or indirectly.

MATH10390 Linear Algebra (Online) outline


• Matrices 1
Matrix arithmetic, determinants of n × n matrices and their com-
putation for small n, and the adjugate method of finding inverses.
Symmetric and orthogonal matrices.

• Vector geometry 1
Vectors in n-dimensional Euclidean space, vector arithmetic, scalar
products, the Cauchy-Schwarz inequality, angles between vectors,
and the action of matrices on vectors.


• Systems of linear equations


Solutions of systems of linear equations by Gaussian elimination,
connections between matrices and linear systems, including matrix
rank.

• Vector geometry 2
Orthonormal lists of vectors, orthonormal bases of Rn and coordi-
nate systems.

• Eigenvalues and eigenvectors of matrices


Eigenvalues and eigenvectors of n × n matrices, and their compu-
tation for n = 2 and n = 3. Symmetric matrices and orthonormal
bases of eigenvectors.

• Matrices 2
Quadratic forms and matrix norms.

MATH10400 Calculus (Online) outline


• Functions and Limits
Functions, domain, codomain, algebra of functions, injectivity, sur-
jectivity, inverses, limits, algebra of limits, continuity, polynomials,
rational functions, trigonometric, exponential and logarithmic func-
tions.

• Differentiation
Rates of change, differentiation from first principles, relationship
with continuity, the power, product, quotient and chain rules, deriva-
tives of polynomials, trigonometric, exponential and logarithmic func-
tions, and composites thereof.

• More about the derivative


Critical points, finding local maxima, minima and inflection points,
higher order derivatives, Rolle’s Theorem, the Mean Value Theorem,
linear approximation, and logarithmic and implicit differentiation.

• Functions of several variables


Functions on Rn (mostly n = 2) and their graphs, partial deriva-
tives, gradients, critical points and their classification, second order
partial derivatives, Hessian matrices, lines of best fit, least squares.

• Integration
Indefinite integrals as antiderivatives, standard examples, Riemann
sums, definite integrals and area, the Fundamental Theorem of Cal-
culus.

• Methods of integration
Integration by substitution and integration by parts.

• Numerical techniques
Solving equations numerically, the bisection and Newton-Raphson
methods, numerical integration, the trapezoidal rule and Simpson’s
rule.

Examples will be peppered throughout the two modules. While there


will be some theory, the emphasis will be on the introduction of ideas
and techniques. A small number of mathematical proofs are included, but
they will be relatively straightforward, and will be specific applications
of the techniques introduced during the course of the modules. Any of
the deeper, more involved proofs that would belong to more theoretical
courses on linear algebra or calculus will be confined to Appendix C.1
and Appendix B.1, respectively, should the reader be interested. These
appendices, replete with dark secrets and forbidden magic, will not be
examinable!

Lecturer – Dr Richard Smith


I am on the right. My email address is richard.smith@maths.ucd.ie and the address of my website is https://maths.ucd.ie/~rsmith. Please only use my email address in the event of an emergency! See point 4 below under 'Online material', and Section 0.4 for details on how to pose queries concerning the module. (In keeping with ancient academic tradition, the photo is comfortably out of date.)
My office is S1.71, first floor, Science Centre South. It is in building 12, square 6D, on the most recent version of the UCD map available.
In addition, the Mathematics and Statistics School Office is in G03, ground floor, Science North, in building 65, square 6C on the map.


Figure 0.1: School office: G03, ground floor, Science North

Online material
1. UCD Mathematics Moodle
All module material will be made available on

UCD Mathematics Moodle (https://vector.ucd.ie/moodle).

Once at the site, please log in using your UCD Connect creden-
tials and then enrol to both modules (the enrolment key for both is
‘ucdprofcert2023’).

2. Lecture notes and videos


The full set of lecture notes for each module will be made available when the programme opens. At 9am (summer time, i.e. 08:00 GMT) on each Monday of the first eight weeks of term, a set of short videos covering the central topics from the notes in further detail will be released. Note that this material is intended to be absorbed over the 12-week summer teaching term; in particular, the assessment will be spread across the 12 weeks. The compressed schedule is designed to allow people to look ahead if they wish. (You will see a series of exercises in the notes themselves. You do not need to submit solutions to these.)

3. Continuous assessment
Continuous assessment comes in two forms: written homework and
WeBWorK. Both will be issued and managed online – see Section
0.2 for more details. The full schedule of issue dates and assessment
deadlines is given in Section 0.3.

4. Discussion boards
Students can post queries and discuss topics via the weekly dis-
cussion boards – see Section 0.4 for more information.

5. News and announcements


I will make class announcements via the ‘MATH10390 announce-
ments’ and ‘MATH10400 announcements’ discussion boards at the
top of each site’s main page, respectively. These announcements
will be repeated in the ‘Latest news’ boxes to the right hand side.

Mobile access to online material


We recommend that you view UCD Mathematics Moodle via a web brow-
ser on a desktop or laptop computer, or tablet.
Moodle does have an app, available for both iOS and Android mobile
platforms. However, be warned that its functionality is limited: it does
not work with WeBWorK all that well, and there is no rendering of math-
ematical notation via MathJax (see Section 0.4). Given the app’s limita-
tions and given my strong doubts about whether smartphones can really
aid the acquisition of deep knowledge and understanding, I am simply
making you aware of the app rather than actively promoting it.
To gain access to UCD Mathematics Moodle via the app, please enter

vector.ucd.ie/moodle

when prompted for the URL, and then log in as usual.

0.2 Assessment and grading


The proportion of marks allocated to the various assessment components
will be the same for both modules. On the other hand, the modules’
assessment deadlines will be different – see Section 0.3 below.


Homework (20% of final mark)


Four homework sheets will be issued on Moodle over the course of the
module. Each is worth 5% of your final mark. You will receive a mark out
of 5 for each sheet. The only possible exception to this is MATH10390
Homework Sheet 4, where you may receive an additional 2 bonus marks,
owing to the length of the sheet.
Written solutions to the homework should be scanned to pdf files and
submitted online via Moodle. As well as ordinary scanners, there are
some apps (free or ‘freemium’) that use smartphone cameras to scan doc-
uments to pdf, such as CamScanner. Alternatively, with a suitable app,
you can write solutions directly onto a touchscreen device, such as a
tablet, and export to a pdf file.
Please ensure that your student number and solutions are clearly visible,
otherwise, you may lose marks! For each homework assignment, you
must submit exactly one pdf file containing your solutions and accept
the submission declaration before clicking the submit button (see the
end of Section 0.5). The maximum size of files to be uploaded is 10 MB,
which should be plenty.
Homework issue dates and submission deadlines are given in Section 0.3
below. Marks and feedback on submitted solutions will be provided on a
rolling basis.

WeBWorK (10% of final mark)


WeBWorK is an online homework system, again available via Moodle.
Five WeBWorK homework sets will be issued over the course of the mod-
ule. Each set is worth 2% of your final mark. You will receive a mark out
of 2 for each set.
Solutions are entered directly online. For advice on entering answers,
and comments on certain questions, please see the WeBWorK guide in
Appendix A.2 (of either module).
WeBWorK set issue dates and submission deadlines are given in Section
0.3 below. Answers to a given WeBWorK set will be released immediately
after the corresponding deadline.

Final exam (70% of final mark)


Prior to the pandemic the exams for the two modules in this programme were offered in-person in UCD. In 2020 and 2021 they were moved online due to the pandemic. Last year we reverted to in-person exams, and we will do the same this year. Online exams are problematic because they make it very difficult to protect the integrity of assessment; unfortunately plagiarism was committed when the exams were online.
The two final 2-hour written exams will take place from 10am – 12 noon,
and 2pm – 4pm in B108, Computer Science Building, University College
Dublin, Friday 25 August 2023 (building 18, square C6 on the UCD map).
No alternative exam date will be offered.
When travel to Dublin for the final exams is not possible, examination in
appropriate third party centres may be facilitated. Such arrangements
will need to be made well in advance of the exam and cannot be guar-
anteed. Contact Natalia Zadorozhnyaya (dataanalyticsonline@ucd.ie)
by Friday 9 June (week 3) to enquire. Students availing of an alternative
examination centre will need to bear all the associated costs.

Grading
You will receive a mark out of 30 for your continuous assessment, which
will be converted to a letter grade according to the University’s Standard
Conversion Grade Scale (see under Mark to Grade Conversion Scales).
Likewise, you will receive a mark out of 70 for your final exam which will
be converted into a letter grade in the same manner. These two letter
grades will be combined to make an overall module grade; the precise
mechanism by which this will be achieved can be seen under Module
Grade Calculation Points.

0.3 Continuous assessment schedules


All MATH10390 continuous assessment issue dates and deadlines will fall
at 9am (summer time, i.e. 08:00 GMT) on Tuesdays. All MATH10400
issue dates and deadlines will fall at 9am on Wednesdays.
Two weeks are given to complete each WeBWorK set. There are no
‘overall submit’ buttons. To obtain full credit for a set, simply enter the
correct solutions online before the deadline.
The amount of time given to complete written homework assignments varies. Early submission of written homework assignments is strongly encouraged, but the formal deadlines are structured in a way that allows students some flexibility in making their own work plan. The complete schedules are given in the tables below. (The deadlines start to pile up towards the end of the modules. Please be mindful of this!)


The WeBWorK deadlines are hard deadlines. Regarding homework dead-


lines, if homework is submitted late, your total mark will decrease linearly
to zero after 48 hours. For example, a piece of work ordinarily worth 5%
will receive 2.5% if submitted 24 hours late, and 1.25% if submitted 36
hours late, and so on.
WeBWork marks and feedback will be returned immediately after the
deadlines, and homework marks and feedback will be returned within 10
working days of the deadlines. For this reason UCD’s Late Submission
of Coursework Policy will not apply to these modules (see point 6.1 in
the policy). Penalties for late submission may be waived if the student
has valid extenuating circumstances (see Section 0.5).

MATH10390 assessment schedule (9am Tuesdays)

Week  Date   Assignment issue          Assignment deadline
2     30-05  WeBWorK 1
3     06-06  Homework 1
4     13-06  WeBWorK 2                 WeBWorK 1
5     20-06  Homework 2
6     27-06  WeBWorK 3, Homework 3     WeBWorK 2
7
8     11-07  WeBWorK 4, Homework 4     WeBWorK 3, Homework 1
9     18-07                            Homework 2
10    25-07  WeBWorK 5                 WeBWorK 4, Homework 3
11
12    08-08                            WeBWorK 5, Homework 4

(Again, while the videos are compressed into an 8-week period, the continuous assessment is spread throughout the 12-week summer teaching term. The homework deadlines for both modules are closely aligned, with the exception of homework sheet 4.)
MATH10400 assessment schedule (9am Wednesdays)

Week  Date   Assignment issue          Assignment deadline
2     31-05  WeBWorK 1
3     07-06  Homework 1
4     14-06  WeBWorK 2                 WeBWorK 1
5     21-06  Homework 2
6     28-06  WeBWorK 3, Homework 3     WeBWorK 2
7
8     12-07  WeBWorK 4, Homework 4     WeBWorK 3, Homework 1
9     19-07                            Homework 2
10    26-07  WeBWorK 5                 WeBWorK 4, Homework 3
11    02-08                            Homework 4
12    09-08                            WeBWorK 5

0.4 Discussion boards and MathJax


Weekly discussion boards
Each module will have its own set of discussion boards. Each week will
be given its own discussion board to keep conversations focussed. If you
have a query about the module or about its content, you are strongly
encouraged to post your query to the discussion boards. Please don’t
be afraid to ask questions!! From personal experience, I know that it can
feel daunting to ask questions (especially in an online environment), but
asking questions really is an excellent way of improving understanding!
Ordinarily, these boards will be monitored, and queries posted to them
will be answered, for up to two hours in the afternoon, Monday to Friday,
depending on the volume of queries. If I take leave at any point then I
will let you know via the ‘MATH10390 announcements’ and ‘MATH10400
announcements’ discussion boards and will arrange appropriate cover.
Please only contact me by email in the event of an emergency!
I will not monitor the boards at other times, or at weekends! Of course,
the online nature of these modules means that you can view module
content and work through assignments at any time, day or night. In
contrast, the lowly human behind it all (i.e. that person in the photo)
cannot be on hand at all times as well. Please do take this into account,
particularly when the Tuesday and Wednesday deadlines loom!

Posting mathematical content


The Moodle forums have a system called MathJax that allows people
to write mathematical notation directly into web pages, which enables
mathematical queries to be easily and clearly stated. Details of how to


use MathJax in the discussion boards are given in Appendix A.1 (of either
module).
You’re free to use MathJax to post mathematical content. Alternatively
you can post such content by writing it by hand and scanning it to a pdf
(see above) or by using a suitable pdf annotator, and then attaching the
pdf file to your post. This option may be preferable if you want to write
a lot of mathematical content.

0.5 Any other business


Suggested further reading
Regarding books, neither module formally follows a textbook. However, I
can suggest Anton, Rorres, Elementary Linear Algebra, Applications Ver-
sion, Chapters 1-3 and parts of Chapters 5 and 7 for MATH10390, and
Anton, Bivens, Davis, Calculus Early Transcendentals, parts of Chapters
0-5, 7 and 13 for MATH10400. It often helps to see concepts approached
from a second, slightly different perspective, and there are plenty of ex-
ercises to practice on.

Calculators permissible in the final exams


There is of course a huge range of calculators available and it is unre-
alistic to provide an explicit list of those that will be permissible in the
final exams. Generally speaking, the calculators that are not permissi-
ble are programmable ones or ones that are capable of more advanced
built-in functionality. As an example, the Casio fx-83GT PLUS model is
permissible, but the fx-991ES PLUS is not. If in doubt, please ask me
on the discussion boards. Of course, use of smartphones in the exams is
completely banned!

Registration, fee payment and withdrawals


Please confirm your personal details (including email address and photo)
and pay your programme fees using UCD’s Student Information System.
As this is a one-semester programme, payment is required in full be-
fore it starts on 22 May. For further assistance, please contact Natalia
Zadorozhnyaya (dataanalyticsonline@ucd.ie) or see UCD’s guide to
online registration and fee payment. Further information on fee payment
and deadlines can be found on the UCD Fees office website.

If you wish to withdraw from the programme, then please note that it is
essential to do so by Friday 11 August (week 12), to ensure you do not
have a failing grade recorded against your name on the University’s sys-
tem. Since this programme is only one trimester long, it is not possible
to get a refund upon withdrawal; see point 1.7 of UCD’s Refunds Policy.

Extenuating circumstances
The University has an Extenuating Circumstances Policy. The Univer-
sity defines extenuating circumstances to be ‘serious unforeseen circum-
stances beyond your control which prevented you from meeting the re-
quirements of your programme’. Note the following footnote on page 2
of the Student Guide on Extenuating Circumstances:

Work commitments are not normally considered to be extenu-


ating circumstances. However a student on a part-time and/or
continuing professional education programme may have work-
related extenuating circumstances outside of the norm (e.g. a
work-related court case that they legally must attend) and in
these exceptional cases, they should consult the appropriate
programme/school office for advice.

You can apply for extenuating circumstances online. For more details,
please contact Natalia Zadorozhnyaya (dataanalyticsonline@ucd.ie).

UCD Student Code and plagiarism


Concerning conduct and plagiarism in particular, please see the Univer-
sity’s Student Code of Conduct and specifically its Student Plagiarism
Policy. In addition to these documents, the School of Mathematics and
Statistics has its own Plagiarism Protocol. Please familiarise yourselves
with the second and third documents. The Library also has resources
and advice to help people avoid unintentional plagiarism.
In accordance with the School’s protocol (see §2.2), you will need to
accept the submission declaration before clicking the submit button. In
doing so, you acknowledge that you have neither given, sought, nor re-
ceived, aid in order to complete the assessment.

Chapter 1

Matrices 1

1.1 Introduction to matrices


Definition of matrices

Definition 1.1. An m × n ('m by n') matrix is a rectangular array of numbers having m rows and n columns, enclosed by brackets.

Example 1.2.

1. $\begin{pmatrix} 2 & -11 & 27 \\ 1 & 0 & -5 \end{pmatrix}$ is a 2 × 3 matrix.

2. Let θ be an angle, and define

   $$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

   In due course we will see that Rθ is the 2 × 2 rotation matrix which rotates points in 2 dimensions about the origin, through angle θ.

(The term 'matrix' was coined by James Sylvester (1814 – 1897), though much of what we now call matrix theory was known to mathematicians in preceding centuries.)

We usually label matrices using capital letters like A, B and C, etc.

In spite of their unassuming definition, matrices have an enormous range of applications to other fields, such as physical and biological sciences, probability and statistics, engineering, sociology, optimisation, computer science, and other areas of mathematics such as geometry and differential equations.

Remarks 1.3.

1. Two matrices have the same size if they have the same number of rows and the same number of columns, so a 2 × 3 matrix and a 3 × 2 matrix have different sizes.

2. A matrix having just one column is sometimes called a column vector – see Chapter 2 for more details.

3. If an m × n matrix has the same number of rows as columns, i.e., if m = n, then it is called a square matrix.

4. If A is an m × n matrix, the number appearing in the ith row and jth column is called the (i, j)-entry of A, and is denoted Aij.

5. Two matrices are equal if and only if they have the same size and the corresponding entries are all equal.

Example 1.4. Let $A = \begin{pmatrix} \sqrt{2} & 2 & -3 \\ 4 & 0 & 5 \end{pmatrix}$.

Then A12 = 2 (i = 1, j = 2), A23 = 5 (i = 2, j = 3), A13 = −3, etc.

Matrix addition and subtraction


Fundamental to linear algebra (and mathematics as a whole) is the idea that you can algebraically manipulate a host of mathematical objects that are sometimes more complicated than ordinary numbers.
In spite of the fact that matrices are composed of numbers, very often we prefer to treat them as whole entities in their own right, and perform operations on them such as addition, subtraction and multiplication, that in some ways (but certainly not all!) resemble those of numbers. (In real life, we are used to treating composite objects as whole things in their own right: we tend to treat people as whole people, rather than as bags of blood, bones and organs.)
In order to define matrix addition, subtraction and multiplication, we do need to open up the matrices, manipulate their innards so to speak (i.e. manipulate the matrix entries inside), and close them back up again. So let's get 'under the hood', or 'bonnet'.
Definition 1.5 (Matrix addition and subtraction). Let A and B be matrices of the same size (m × n). We define their sum, A + B, to be the m × n matrix whose entries are given by

$$(A + B)_{ij} = A_{ij} + B_{ij},$$

for i = 1, . . . , m and j = 1, . . . , n. Their difference, A − B, is the m × n matrix having entries

$$(A - B)_{ij} = A_{ij} - B_{ij},$$

where i and j are as above.

If A and B have different sizes then the matrix sums and differences A + B and A − B are undefined.

Thus A + B is obtained from A and B by adding entries in corresponding positions.

Example 1.6. Take two 2 × 4 matrices

$$A = \begin{pmatrix} 2 & 0 & -1 & -1 \\ 1 & 2 & 4 & 2 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -1 & 1 & 0 & -2 \\ 3 & -3 & 1 & 1 \end{pmatrix}.$$

Find A + B and A − B. (We can only add or subtract two matrices if they are the same size.)

Solution. We have

$$A + B = \begin{pmatrix} 2 + (-1) & 0 + 1 & -1 + 0 & -1 + (-2) \\ 1 + 3 & 2 + (-3) & 4 + 1 & 2 + 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & -1 & -3 \\ 4 & -1 & 5 & 3 \end{pmatrix},$$

and

$$A - B = \begin{pmatrix} 2 - (-1) & 0 - 1 & -1 - 0 & -1 - (-2) \\ 1 - 3 & 2 - (-3) & 4 - 1 & 2 - 1 \end{pmatrix} = \begin{pmatrix} 3 & -1 & -1 & 1 \\ -2 & 5 & 3 & 1 \end{pmatrix}.$$
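As a quick computational check of Example 1.6 (this snippet is not part of the original notes and assumes the third-party NumPy library is available), entrywise addition and subtraction can be carried out as follows.

    import numpy as np

    # The 2 x 4 matrices A and B from Example 1.6
    A = np.array([[2, 0, -1, -1],
                  [1, 2,  4,  2]])
    B = np.array([[-1,  1, 0, -2],
                  [ 3, -3, 1,  1]])

    print(A + B)   # entrywise sum:        [[ 1  1 -1 -3] [ 4 -1  5  3]]
    print(A - B)   # entrywise difference: [[ 3 -1 -1  1] [-2  5  3  1]]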

Matrix scalar multiplication


Matrix multiplication, that is, the multiplication of one matrix by another
to produce a third matrix, is defined below in Section 1.2. There is also
a way to multiply a matrix by a scalar, that is, an ordinary number, to
produce a new matrix.

Definition 1.7 (Matrix scalar multiplication). Let A be an m × n matrix and let c be a scalar (i.e. a number). Then cA is the m × n matrix having entries defined by

$$(cA)_{ij} = cA_{ij},$$

for i = 1, . . . , m and j = 1, . . . , n. I.e. cA is obtained from A by multiplying every entry in A by c.

Example 1.8. If $A = \begin{pmatrix} 2 & 5 \\ 3 & -4 \end{pmatrix}$, then

$$2A = \begin{pmatrix} 4 & 10 \\ 6 & -8 \end{pmatrix}, \quad -5A = \begin{pmatrix} -10 & -25 \\ -15 & 20 \end{pmatrix}, \quad 0A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Zero matrices
The humble 0 has a much more exciting history than you might think, and is in fact a very special number. It is the only number such that, when added to any other number, it has no effect: a + 0 = 0 + a = a. For this reason, mathematicians call it an additive identity.
The last example in Example 1.8 prompts the following definition.

Definition 1.9 (Zero matrices). The m × n matrix whose entries are all zero is called the zero (m × n) matrix.

The zero matrices are the matrix analogues of 0: if the m × n zero matrix is added to any m × n matrix, the matrix remains unchanged.

1.2 Matrix multiplication


Before we begin. . . summation notation
Matrix multiplication is very different from addition and subtraction. The
matrix product AB is definitely not computed in the ‘obvious’ way, i.e. by
multiplying the respective entries of A and B. The definition of ma-
trix multiplication is best expressed using summation notation. Readers
familiar with this kind of notation can skip on to the next subsection.
Otherwise, please read on. . .
In mathematics we are frequently required to add together sequences of numbers. Suppose that we set $x_i = 2i$, for every positive integer i. Then the sum of the first 4 terms of the sequence $x_i$ is given by

$$x_1 + x_2 + x_3 + x_4 = 2 + 4 + 6 + 8 = 20. \tag{1.1}$$

This is fine for small numbers of terms (in this case 4), but very often we need to add together much larger numbers of terms, such as a thousand terms or a million terms. Writing out such large numbers of terms as above would get upsetting very quickly.

Summation notation has been developed to deal with such eventualities. We will require it for matrix multiplication. The sum of terms $x_1 + x_2 + x_3 + x_4$ in (1.1) can be expressed instead as

$$\sum_{i=1}^{4} x_i. \tag{1.2}$$

The symbol $\sum$ is the summation symbol, i is the index of summation and 1 and 4 are the initial value (or lower limit) and final value (or upper limit) of the index of summation, respectively. Now we can express the sum of the first thousand terms of the sequence $x_i$ concisely as

$$\sum_{i=1}^{1000} x_i.$$

The sum of the terms $x_i$ from i = 34 to i = 781 can be written as

$$\sum_{i=34}^{781} x_i,$$

and so on.


Most of the time, the choice of letter to denote the index of summation does not matter. Letters such as j and k are commonly used instead of i. For example, the expressions

$$\sum_{j=1}^{4} x_j \quad\text{and}\quad \sum_{k=1}^{4} x_k$$

mean the same thing as (1.2): $x_1 + x_2 + x_3 + x_4$.

Roughly speaking, you are allowed to use whatever letter you like, provided it is not being used somewhere else. For instance, in matrix multiplication, we need to deal with quantities indexed by more than one variable. Suppose that $y_{ij} = j^2 i$, for all positive integers i and j. Then the sum

$$\sum_{j=1}^{4} y_{ij} \tag{1.3}$$

denotes the quantity

$$y_{i1} + y_{i2} + y_{i3} + y_{i4} = 1^2 i + 2^2 i + 3^2 i + 4^2 i = (1 + 4 + 9 + 16)i = 30i.$$

In (1.3), the index of summation is j: j increases from 1 to 4, while the quantity i remains constant during the summation process.

If we write $z_i = 30i$, then we see that

$$\sum_{j=1}^{4} y_{ij} = z_i, \tag{1.4}$$

for all positive integers i. On the left hand side of (1.4), i is fixed while j varies. Assuming i stays where it is, we can replace the summation index j with any letter, provided that it is not i. For example,

$$\sum_{k=1}^{4} y_{ik} = \sum_{\ell=1}^{4} y_{i\ell} = 1^2 i + 2^2 i + 3^2 i + 4^2 i = 30i = z_i,$$

however

$$\sum_{i=1}^{4} y_{ii} \tag{1.5}$$

means something completely different! The sum in (1.5) should be interpreted as

$$y_{11} + y_{22} + y_{33} + y_{44} = 1^3 + 2^3 + 3^3 + 4^3 = 1 + 8 + 27 + 64 = 100,$$

which is obviously not equal to (1.4). It would be legitimate to replace (1.4) by

$$\sum_{i=1}^{4} y_{ji} = z_j \quad\text{or}\quad \sum_{k=1}^{4} y_{\ell k} = z_\ell,$$

because in each case you are preserving those indices that are fixed and those that vary, respectively.

Finally, quite often the lower and upper limits of sums can be replaced by letters. For example, given positive integers k < n, the notation

$$\sum_{i=k}^{n} x_i \tag{1.6}$$

stands for

$$x_k + x_{k+1} + x_{k+2} + \cdots + x_{n-1} + x_n.$$

If $x_i = 2i$, as it was initially, then (1.6) happens to be equal to

$$n(n + 1) - (k - 1)k.$$
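For readers who like to experiment, summation notation translates directly into code. The following short snippet is not part of the original notes (plain Python, no extra libraries); it evaluates the sums in (1.1), (1.3) and (1.5).

    # (1.1): sum of x_i = 2i for i = 1, ..., 4
    print(sum(2 * i for i in range(1, 5)))       # 20

    # (1.3): sum over j of y_ij = j^2 * i, with i held fixed (here i = 7, say)
    i = 7
    print(sum(j**2 * i for j in range(1, 5)))    # 30 * i = 210

    # (1.5): sum over i of y_ii = i^2 * i = i^3
    print(sum(i**2 * i for i in range(1, 5)))    # 1 + 8 + 27 + 64 = 100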

Matrix multiplication

Definition 1.10 (Matrix multiplication). Let A, B be m × p and q × n matrices, respectively. The product AB of A and B is defined only if p = q, i.e. only if A has the same number of columns as B has rows. In this case, AB is an m × n matrix. The entries of AB are

$$(AB)_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + A_{i3}B_{3j} + \cdots + A_{ip}B_{pj} = \sum_{k=1}^{p} A_{ik}B_{kj}.$$

If p ≠ q then the product AB is undefined.

(The choice of letter k here as the summation index is common, but not essential. E.g. we could use ℓ, or indeed any letter, provided the letter is not already in use.)

Like the arithmetic of ordinary numbers, matrix multiplication is performed before addition and subtraction, e.g. AB + C = (AB) + C, not A(B + C), etc. (See MATH10400 Fact 1.4.)

When first encountered, this definition is perhaps best understood using examples.


Example 1.11. Find AB when

$$A = \begin{pmatrix} 2 & -1 & 3 \\ 1 & 0 & -1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 3 & 1 \\ 1 & -1 \\ 0 & 2 \end{pmatrix}.$$

Solution. There is a video for Example 1.11.

(Matrix multiplication was introduced by Arthur Cayley (1821 – 1895), in order to reproduce the behaviour of so-called linear transformations. Such things include geometric operations like rotations, dilations and shears.)

Example 1.12. If A and B are as in Example 1.11 then BA is a 3 × 3 matrix. By yourselves, verify that the complete product is

$$BA = \begin{pmatrix} 7 & -3 & 8 \\ 1 & -1 & 4 \\ 2 & 0 & -2 \end{pmatrix}.$$
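The products in Examples 1.11 and 1.12 can also be checked numerically. The sketch below is not part of the original notes; it assumes NumPy, whose @ operator performs matrix multiplication.

    import numpy as np

    A = np.array([[2, -1,  3],
                  [1,  0, -1]])      # 2 x 3
    B = np.array([[3,  1],
                  [1, -1],
                  [0,  2]])          # 3 x 2

    print(A @ B)   # the 2 x 2 product AB worked through in the video
    print(B @ A)   # the 3 x 3 product BA: [[ 7 -3  8] [ 1 -1  4] [ 2  0 -2]]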

Warning 1.13. Matrix multiplication is not commutative! That is to say, in general, AB ≠ BA. In Examples 1.11 and 1.12, we saw that AB and BA were both defined, but did not even have the same size. It is also possible for only one of AB and BA to be defined, e.g. if A is 2 × 4 and B is 4 × 3. Even if AB and BA are both defined and have the same size (e.g. if both are 3 × 3), the two products are typically different.

(It is absolutely essential to recognize this fact!! It is a pivotal difference between matrix arithmetic and the arithmetic of ordinary numbers.)
Matrix multiplication – some motivation
At first sight, the definition of matrix multiplication may seem unnatural
and overly complicated. However, there is an abundance of examples
which demonstrate that this is the ‘right’ way to do it. We consider one
example from geometry.
Happily, matrices and matrix arithmetic can be used to do things! In
general, we can represent a point (a, b) in 2 dimensions using polar
coordinates. If r > 0 is the distance from the origin to (a, b), and φ is
the angle from the positive x-axis to the point, measured anticlockwise
(in radians), then simple trigonometric considerations yield a = r cos φ
and b = r sin φ, i.e. (a, b) = (r cos φ, r sin φ).

Figure 1.1: Representation of a point (a, b) = (r cos φ, r sin φ) in 2 dimensions using polar coordinates. (For a very similar picture, see MATH10400 Figure 1.7.)

For those of you who are familiar with complex numbers, this is very much like representing the complex number z = a + ib in polar form.

If we represent this point as the 2 × 1 column vector $\begin{pmatrix} r\cos\phi \\ r\sin\phi \end{pmatrix}$, then multiplication on the left by Rθ gives

$$R_\theta \begin{pmatrix} r\cos\phi \\ r\sin\phi \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} r\cos\phi \\ r\sin\phi \end{pmatrix} = \begin{pmatrix} r\cos\theta\cos\phi - r\sin\theta\sin\phi \\ r\sin\theta\cos\phi + r\cos\theta\sin\phi \end{pmatrix} = \begin{pmatrix} r\cos(\theta + \phi) \\ r\sin(\theta + \phi) \end{pmatrix},$$

using the trigonometric addition formulae (see MATH10400 Lemma 1.33).

(By writing a point as a column vector, we are not doing anything deep. We are simply representing the point in a slightly different way, to take advantage of matrix multiplication. We return to this idea in Section 2.4.)

Thus the effect of Rθ on (a, b) is to preserve its distance r from the origin, but change the angle measured from the positive x-axis by θ, i.e., Rθ has rotated the point anticlockwise about the origin by θ. (The effect of matrices on column vectors is explored in more detail in Section 2.4.)

Example 1.14. If (a, b) = (3, 2) and θ = 5π/12 (75°), then

$$R_\theta \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} \cos\frac{5\pi}{12} & -\sin\frac{5\pi}{12} \\ \sin\frac{5\pi}{12} & \cos\frac{5\pi}{12} \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 3\cos\frac{5\pi}{12} - 2\sin\frac{5\pi}{12} \\ 3\sin\frac{5\pi}{12} + 2\cos\frac{5\pi}{12} \end{pmatrix} \approx \begin{pmatrix} -1.16 \\ 3.42 \end{pmatrix}.$$

(Matrices are used in computer graphics all the time: the 3-dimensional virtual information in the computer is converted into 2-dimensional information for display on the monitor or TV, using matrix multiplication. Thus, the ability to shoot zombies relies to some extent on matrix multiplication.)
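The rotation in Example 1.14 can be reproduced numerically; this is an illustrative sketch, not part of the original notes, assuming NumPy.

    import numpy as np

    theta = 5 * np.pi / 12                           # 75 degrees
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])  # the rotation matrix R_theta

    p = np.array([3.0, 2.0])                         # the point (a, b) = (3, 2)
    print(R @ p)                                     # approximately [-1.16  3.42]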

Figure 1.2: Rotation of points in 2 dimensions using rotation matrices.
Zero divisors
We have already seen that matrices defy the commutative law of multiplication: in general AB ≠ BA. Here is another example of how matrix multiplication violates once-sacrosanct traditions.

Example 1.15. If

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 6 \\ 0 & 0 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 6 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

which is the 2 × 2 zero matrix.

The point is that if a and b are ordinary numbers and ab = 0, then


either a = 0 or b = 0. We use this fact all the time without thinking,
for instance, when factorising polynomials. But this fact is not true of
matrices! The miscreants A and B in Example 1.15 are known in the
trade as zero divisors.
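The zero divisors of Example 1.15 are easy to reproduce; the following check is not part of the original notes and assumes NumPy.

    import numpy as np

    A = np.array([[0, 1],
                  [0, 0]])
    B = np.array([[0, 6],
                  [0, 0]])

    print(A @ B)   # [[0 0] [0 0]]: the product is zero although neither factor is
    print(B @ A)   # [[0 0] [0 0]] as well, in this particular case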

Square matrices and matrix powers


Recall that a matrix is square if it has the same number of rows as columns, i.e., if it is an n × n matrix for some n.

Definition 1.16. We denote the set of n × n matrices by Mn(R). If A is an n × n matrix then we can write A ∈ Mn(R), and vice-versa.

(If you are unfamiliar with set notation, please see MATH10400 Section 1.2, or these notes on sets, written for undergraduates. The symbol R here indicates that our matrices consist of real numbers. You can define matrices consisting of complex numbers, or even more exotic objects, but we won't consider such matrices very much in this module, if at all.)

Suppose n = 3. Given any two matrices in M3(R), we can add, subtract and multiply these matrices and get a result that is defined, and also in M3(R). This observation applies equally to any value of n.

Definition 1.17 (Matrix powers). Let A ∈ Mn(R), i.e. let A be an n × n matrix, and let k be a positive integer. The kth power A^k of A is the n × n matrix

$$\underbrace{A \times A \times \cdots \times A}_{k \text{ times}}.$$

(The set Mn(R), equipped with the operations of addition and multiplication, is an example of a mathematical structure known as a ring.)

Example 1.18. Let $A = \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} \in M_2(\mathbb{R})$. Then

$$A^2 = AA = \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & -4 \\ 0 & 9 \end{pmatrix}, \quad\text{and}\quad A^3 = AAA = A^2 A = \begin{pmatrix} 1 & -4 \\ 0 & 9 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & -13 \\ 0 & 27 \end{pmatrix}.$$

The product A × A × · · · × A in Definition 1.17 is unambiguous, in that


it doesn’t matter in which order the matrix product is computed. For
instance, AAA = (AA)A = A(AA). This is a consequence of Fact 1.23 (4)
in the next section.
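Matrix powers such as those in Example 1.18 can be computed by repeated multiplication or with NumPy's matrix_power helper; this sketch is not part of the original notes.

    import numpy as np

    A = np.array([[1, -1],
                  [0,  3]])

    print(A @ A)                          # A^2 = [[1 -4] [0  9]]
    print(A @ A @ A)                      # A^3 = [[1 -13] [0 27]]
    print(np.linalg.matrix_power(A, 3))   # same as A^3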

1.3 Further ideas from matrix arithmetic


Matrix transposes and symmetric matrices

Definition 1.19 (Matrix transposes). If A is an m × n matrix then the transpose A^T of A is the n × m matrix having entries given by

$$(A^T)_{ij} = A_{ji},$$

for i = 1, . . . , n and j = 1, . . . , m.

In practice, this means that we convert rows into columns, and vice-versa: the ith row of A is the ith column of A^T.

Example 1.20. If $A = \begin{pmatrix} 1 & 0 & 5 \\ 2 & -3 & 7 \end{pmatrix}$, then

$$A^T = \begin{pmatrix} 1 & 2 \\ 0 & -3 \\ 5 & 7 \end{pmatrix}.$$

Matrix transposes are taken before multiplication, e.g., AB^T means A(B^T), not (AB)^T, etc. Notice that if you take the transpose of a matrix twice, you get back to where you started: (A^T)^T = A.
The following type of matrix turns out to be very important, as we will
see in due course.

Definition 1.21 (Symmetric matrices). A matrix A ∈ Mn(R) is called symmetric if A^T = A, i.e. if Aji = Aij for 1 ≤ i, j ≤ n.

Symmetric matrices are easy to spot. If a given matrix is square, and if the entries along the 1st row equal the entries down the 1st column, and likewise for the other rows and columns, then the matrix is symmetric.

Example 1.22. The matrix $A = \begin{pmatrix} 2 & -1 & 3 \\ -1 & -4 & 7 \\ 3 & 7 & 8 \end{pmatrix}$ is symmetric.

There is another way to spot symmetric matrices. The main diagonal of a matrix A ∈ Mn(R) is the list of all entries running diagonally from top left to bottom right, i.e. the list of entries Aii, 1 ≤ i ≤ n. The main diagonal of A in Example 1.22 comprises the entries 2, −4 and 8. If we placed a mirror along the main diagonal, then the entries on either side would form a mirror image of each other: the entries −1, 3 and 7 in the top right can be seen reflected in the bottom left. This phenomenon occurs in all symmetric matrices.
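A transpose and a symmetry check can be done in one or two lines; this snippet is not part of the original notes and assumes NumPy.

    import numpy as np

    A = np.array([[ 2, -1, 3],
                  [-1, -4, 7],
                  [ 3,  7, 8]])     # the matrix from Example 1.22

    print(A.T)                      # the transpose A^T
    print(np.array_equal(A, A.T))   # True: A is symmetric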

The laws of matrix arithmetic


When performing arithmetic with matrices, it is important to understand
which rules apply and which do not. The following fact lists some general
properties of matrix arithmetic.

Fact 1.23. In the following, we assume that the sizes of the matrices A, B and C are such that all indicated sums and products are defined.

1. The commutative law of addition: A + B = B + A.

2. The associative law of addition: (A + B) + C = A + (B + C).

3. The failure of the commutative law of multiplication: in general, AB ≠ BA.

4. The associative law of multiplication: (AB)C = A(BC).

5. The distributive laws for matrix multiplication over matrix addition: (A + B)C = AC + BC and A(B + C) = AB + AC.

6. Properties of matrix scalar multiplication: given a number c, we have (cA)B = A(cB) = c(AB) and c(A + B) = cA + cB.

7. Matrix transpose properties: (AB)^T = B^T A^T and (A + B)^T = A^T + B^T.

(Matrices share properties (1), (2), (4) and (5) with ordinary numbers, but not (3)!)

Observe that in Fact 1.23, we are treating the matrices as whole entities, rather than as collections of numbers: besides (6), no numbers are present in the fact. Mastery of these laws is strongly recommended: they enable one to perform matrix computations much more quickly, and improve one's general understanding of matrices.

Fact 1.23 is something that needs to be proved, because in mathematics one should never accept something as being true unless it can be proved. However, proofs are not being emphasised in this module. The interested reader will eventually find proofs of some of the facts above in the 'forbidden section' of the library, namely Appendix C.1. The proof of fact (4) is the most difficult.

Fact 1.23 (2) and (4) above imply that we can write A + B + C and ABC without fear of ambiguity: the order of the brackets does not matter. This extends to sums and products of four, five matrices and so on. In particular, the product A × A × · · · × A in Definition 1.17 is unambiguous, and we can see for instance that A²A = (AA)A = AAA = A(AA) = AA².

1.4 Inverses and determinants of 2 × 2 matrices


If we divide a number by 5, we are multiplying it by 51 : the number 15 , or
5−1 , is the reciprocal or multiplicative inverse of 5. This means
1
5
× 5 = 1,

i.e. if you multiply 5 by 15 , you get 1; multiplying by 1


5
‘reverses’ the effect
of multiplying by 5.

The 2 × 2 identity matrix


We need a matrix analogue of division, but before we can cover this,
we need to find matrices that behave something like the number 1. The
number 1 is the multiplicative identity: it is distinguished because when
you multiply any number by 1, the number does not change (and 1 is the
only number with this property). First, we consider the 2 × 2 case.

Example 1.24. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{R})$ be an arbitrary 2 × 2 matrix, and let $I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Find AI2 and I2A.

Solution.

$$AI_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = A, \quad\text{and}\quad I_2 A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = A.$$

Both AI2 and I2A are equal to A: multiplying A by I2 (either on the left or the right) does not affect A.

Definition 1.25 (The 2 × 2 identity matrix). The matrix I2 above is called the 2 × 2 identity matrix.

Inverses of 2 × 2 matrices
Now that we have the 2 × 2 identity matrix, we can ask which 2 × 2
matrices have multiplicative inverses.

Example 1.26. Let $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}$. Compute AB and BA.

Solution.

$$AB = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2, \quad\text{and}\quad BA = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2.$$

(In the same way that 1/5 'reverses' the effect of 5, by getting us back to 1 (1/5 × 5 = 1), so B reverses the effect of A by getting us back to I2 (AB = BA = I2).)

Definition 1.27 (Matrix inverses). Let A ∈ M2(R). If B ∈ M2(R) satisfies

AB = I2 and BA = I2

then B is called an inverse of A.


It is worth pointing out that there is nothing in Definition 1.27 that stipulates that if B ∈ M2(R) is an inverse of A ∈ M2(R), then B is the only matrix that can perform that task. Fortunately, using a little matrix arithmetic, we can show that if A has an inverse, then said inverse must indeed be unique. This accords with our understanding of ordinary numbers: 1/5 is the only number such that, when multiplied by 5, produces 1.

Proposition 1.28. Suppose that B, C ∈ M2(R) are both inverses of a matrix A ∈ M2(R). Then B = C. In other words, if A has an inverse, then the inverse is unique.

Proof. If B and C are inverses of A then

BA = AB = I2 and CA = AC = I2.

It follows that

(BA)C = I2C = C,
and (BA)C = B(AC) = BI2 = B,    (using Fact 1.23 (4))

as required.

Thus we can talk about the inverse of a matrix, whenever it exists. Not every 2 × 2 matrix has an inverse: $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ has no inverse, but in addition, some non-zero matrices like $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ don't have inverses either.

(This is another departure from the arithmetic of numbers. Every non-zero number a has the multiplicative inverse 1/a, but this is not true of matrices!)

Computing inverses of 2 × 2 matrices


Provided it exists, the inverse of a matrix A is denoted A⁻¹.

Given a square matrix A, how do we

(a) decide if A⁻¹ exists, and

(b) if so, work out what it is?

The 2 × 2 case is relatively straightforward.

Definition 1.29 (Matrix adjugates). If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{R})$, we define the adjugate of A by

$$\operatorname{adj}(A) = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

(Be warned that in some literature the adjugate of a matrix is called the adjoint.)

What happens when we multiply A by its adjugate?

$$A \cdot \operatorname{adj}(A) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \begin{pmatrix} ad + b(-c) & a(-b) + ba \\ cd + d(-c) & c(-b) + da \end{pmatrix} = (ad - bc)I_2.$$

You should verify that, likewise, adj(A) · A = (ad − bc)I2.


Proposition 1.30. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{R})$, and let us suppose that

$$ad - bc \neq 0.$$

Then A⁻¹ exists and equals

$$\frac{1}{ad - bc}\operatorname{adj}(A).$$

Proof. Using Fact 1.23 (6) and the observation above, we have

$$A \cdot \frac{1}{ad - bc}\operatorname{adj}(A) = \frac{1}{ad - bc}\, A \cdot \operatorname{adj}(A) = \frac{1}{ad - bc}(ad - bc)I_2 = I_2.$$

Likewise, $\frac{1}{ad - bc}\operatorname{adj}(A) \cdot A = I_2$, so $\frac{1}{ad - bc}\operatorname{adj}(A)$ satisfies Definition 1.27. Thus A⁻¹ exists and equals $\frac{1}{ad - bc}\operatorname{adj}(A)$.

For instance, in Example 1.26 we had $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$, so here ad − bc = 2 × 3 − 1 × 5 = 1 ≠ 0 and

$$\frac{1}{ad - bc}\operatorname{adj}(A) = \frac{1}{1}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = B.$$
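Proposition 1.30 translates directly into a small function. The sketch below is not part of the original notes (the helper name inverse_2x2 is mine; NumPy assumed); it computes the inverse of a 2 × 2 matrix via its adjugate and determinant.

    import numpy as np

    def inverse_2x2(A):
        # Inverse of a 2 x 2 matrix via Proposition 1.30: adj(A) / (ad - bc)
        a, b = A[0, 0], A[0, 1]
        c, d = A[1, 0], A[1, 1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0, so A has no inverse")
        adj = np.array([[ d, -b],
                        [-c,  a]])
        return adj / det

    A = np.array([[2, 1],
                  [5, 3]])          # the matrix from Example 1.26
    print(inverse_2x2(A))           # [[ 3 -1] [-5  2]], i.e. the matrix B
    print(A @ inverse_2x2(A))       # the 2 x 2 identity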


Exercise 1.31. Compute the inverses A⁻¹ and B⁻¹ of

$$A = \begin{pmatrix} 11 & -8 \\ 13 & 10 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -7 & 5 \\ 2 & 35 \end{pmatrix}.$$

Verify that your solutions are correct by showing that AA⁻¹ = A⁻¹A = BB⁻¹ = B⁻¹B = I2.

(There will be the odd exercise in a mauve box. These exercises will not be graded. Feel free to discuss them on the discussion boards. The same applies to passages in the text or solutions of examples which ask you to verify certain things.)

Determinants of 2 × 2 matrices

Definition 1.32 (Determinants of 2 × 2 matrices). The number ad − bc above is called the determinant of A, and is denoted det(A).

Every square matrix has a determinant. Matrix determinants are of huge theoretical and practical importance. We visit them on multiple occasions in this module.

For example, given a matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, if we 'transform' a 2-dimensional shape by A, it turns out that the area of the shape changes by a factor of |det(A)|. (We won't prove this fact.) See the accompanying short video for more details.

Figure 1.3: Determinants and their effect on area.

Area of transformed shape = (Area of original shape) · |det(A)|.

Example 1.33 (Inverses of rotation matrices). Find det(Rθ) and (Rθ)⁻¹.

Solution. We have

$$\det(R_\theta) = (\cos\theta)(\cos\theta) - (-\sin\theta)(\sin\theta) = \cos^2\theta + \sin^2\theta = 1 \neq 0,$$

so Rθ⁻¹ exists and equals

$$\operatorname{adj}(R_\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix} = R_{-\theta}.$$

Figure 1.4: Inverses of rotation matrices. (Geometrically speaking, Rθ rotates points anticlockwise through θ, and Rθ⁻¹ = R₋θ rotates points clockwise through θ; thus Rθ⁻¹ reverses the effect of Rθ. See video.)

Remarks 1.34.

1. What happens if det(A) = 0? In this case, A does not have an inverse. In other words, there is no 2 × 2 matrix C for which CA = AC = I2.

2. If det(A) ≠ 0 then A is called invertible or non-singular. If det(A) = 0 then A is non-invertible or singular.

1.5 Inverses and determinants of n × n matrices


The n × n identity matrices
Definition 1.25 and Example 1.24 generalise to n × n matrices as follows.


Definition 1.35 (The identity matrices). The n × n identity matrix In ∈ Mn(R) has entries

$$(I_n)_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$

for 1 ≤ i, j ≤ n.

Recall that the main diagonal of a square matrix is the list of all the
entries running diagonally from top left to bottom right. The entries
along the main diagonal of In are all 1, while all other entries of In are 0.

Theorem 1.36. If A is an m × n matrix then AIn = ImA = A. In particular, if A ∈ Mn(R), then AIn = InA = A.

Proof. Relegated to Appendix C.1.

Inverses and determinants of n × n matrices


Definition 1.27 and Proposition 1.28 also generalise immediately to n × n
matrices: just replace all instances of I2 by In . In particular, notice that
the proof of Proposition 1.28 does not use the fact that we are working
with 2 × 2 matrices in an essential way at all. The only thing we demand
is that they are all square and of the same size.
So the question is how do we compute inverses of n × n matrices, and what about determinants? We will do this in more than one way in this module. The first method uses the idea of matrix minors and cofactors, and is done in several steps.

Definition 1.37 (Matrix minors). Let A ∈ Mn(R). Given 1 ≤ i, j ≤ n, define the (i, j)-minor Mij to be the determinant of the (n − 1) × (n − 1) matrix obtained from A by deleting the ith row and the jth column from A. The matrix M ∈ Mn(R) that has (i, j) entry Mij, 1 ≤ i, j ≤ n, is the matrix of minors of A.

This definition is a mouthful when first seen. To give you some purchase
on it, we will cover an example when n = 3.
Example 1.38. Find the matrix of minors of

$$A = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix}.$$

Solution. The computation of the minors is on the accompanying video. The outcome is

$$M = \begin{pmatrix} 1 & 2 & -6 \\ -3 & -1 & 13 \\ 3 & 1 & -8 \end{pmatrix}.$$

The next thing to do is to define the matrix of cofactors. This is easily


done once the matrix of minors has been defined, as it simply involves a
few changes of sign.

Definition 1.39 (Matrix cofactors). Let A ∈ Mn(R). The matrix of cofactors is the matrix C ∈ Mn(R) having entries given by

$$C_{ij} = (-1)^{i+j} M_{ij} = \begin{cases} M_{ij} & \text{if } i + j \text{ is even} \\ -M_{ij} & \text{if } i + j \text{ is odd,} \end{cases}$$

where Mij is the (i, j)-minor as above.

Example 1.40. Find the matrix of cofactors of

$$A = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix}.$$

Solution. We have the following pattern of signs

$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}.$$

In the positions marked '+', Cij = Mij, and in the positions marked '−', Cij = −Mij.

We now write down C, the matrix of cofactors of A. It differs from M in Example 1.38 only according to the pattern of signs above:

$$C = \begin{pmatrix} +1 & -2 & +(-6) \\ -(-3) & +(-1) & -13 \\ +3 & -1 & +(-8) \end{pmatrix} = \begin{pmatrix} 1 & -2 & -6 \\ 3 & -1 & -13 \\ 3 & -1 & -8 \end{pmatrix}.$$

Now we are in a position to define determinants of n × n matrices.

Definition 1.41 (Matrix determinants). If A ∈ Mn(R), then the determinant of A is given by

$$\det(A) = \underbrace{A_{11}C_{11} + A_{12}C_{12} + \cdots + A_{1n}C_{1n}}_{\text{entries of 1st row of } A \text{ multiplied by their cofactors}} = \sum_{k=1}^{n} A_{1k}C_{1k},$$

where C ∈ Mn(R) is the cofactor matrix of A.

Now we can compute the determinant of A in Example 1.38.

Example 1.42. Given A in Example 1.38, we have

det(A) = A11 C11 + A12 C12 + A13 C13 = 1 · 1 + 3 · (−2) + 0 · (−6) = −5.

Remarks 1.43.

1. We only used 3 of the cofactors of A when computing det(A) above. We will use all 9 when we find the inverse of A.

2. Definition 1.41 is an example of a recursive definition. By Definition 1.32, we can define det(A) for any A ∈ M2(R). Combining Definitions 1.37, 1.39 and Definition 1.41 means we can define det(A) for any A ∈ M3(R). By repeating this process we can define det(A) for A ∈ M4(R), and so it goes on: M5(R), M6(R), . . . .

   (In fact, we can start this process at n = 1. If A = (a) is a 1 × 1 matrix then det(A) = a. If we combine 1.37, 1.39 and 1.41, where n = 2, we recover Definition 1.32.)

   In principle, we can find the determinant of any n × n matrix in this way, whatever the value of n. In practice though, this is not advisable. In fact, for large enough n, attempting to compute determinants in this way would use up all the available atoms in the observable universe. Happily, so that we don't run out of atoms, there are more computationally efficient ways of finding determinants (though they are not covered in this module).
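The recursive nature of Definition 1.41 can be made explicit in code. The following sketch is not part of the original notes (plain Python with NumPy, and deliberately naive); it expands along the first row exactly as in the definition, which is fine for small n but, as remarked above, hopelessly inefficient for large n.

    import numpy as np

    def det_cofactor(A):
        # Determinant by cofactor expansion along the first row (Definition 1.41)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]                    # base case: det of a 1 x 1 matrix
        total = 0
        for k in range(n):
            # (1, k+1)-minor: delete row 0 and column k, then recurse
            minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
            cofactor = (-1) ** k * det_cofactor(minor)
            total += A[0, k] * cofactor
        return total

    A = np.array([[ 1,  3,  0],
                  [ 2, -2,  1],
                  [-4,  1, -1]])
    print(det_cofactor(A))                    # -5, as in Example 1.42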

The formula in Definition 1.41 is known as the cofactor or Laplace expansion along the first row. It turns out that the corresponding expansion along any row or any column of A also yields det(A).

Theorem 1.44. Let A ∈ Mn(R). Then

$$\det(A) = \underbrace{A_{i1}C_{i1} + A_{i2}C_{i2} + \cdots + A_{in}C_{in}}_{\text{entries of ith row of } A \text{ multiplied by their cofactors}} = \sum_{k=1}^{n} A_{ik}C_{ik} = \underbrace{A_{1j}C_{1j} + A_{2j}C_{2j} + \cdots + A_{nj}C_{nj}}_{\text{entries of jth column of } A \text{ multiplied by their cofactors}} = \sum_{k=1}^{n} A_{kj}C_{kj}.$$

(Pierre-Simon, marquis de Laplace (1749 – 1827).)

This theorem gives us n + n = 2n different ways of computing det(A). Its proof is beyond the scope of the module.

(You can simplify the computation of det(A) by choosing to expand along a row or column having as many zeros as possible: if Aij = 0, then there is no need to compute Cij, because AijCij = 0.)

We have looked at determinants, but it remains to complete the business of finding matrix inverses.

Definition 1.45 (Matrix adjugates). Let A ∈ Mn(R). The adjugate adj(A) of A is the transpose of the matrix of cofactors.

Example 1.46. Find the adjugate of A in Example 1.38.

Solution.

$$\operatorname{adj}(A) = C^T = \begin{pmatrix} 1 & 3 & 3 \\ -2 & -1 & -1 \\ -6 & -13 & -8 \end{pmatrix}.$$

Recall the computation A · adj(A) = adj(A) · A = det(A)I2 after Definition


1.29.


Example 1.47. Compute A · adj(A) and adj(A) · A, where A is as in Example 1.38, and thus find A⁻¹.

Solution.

$$A \cdot \operatorname{adj}(A) = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 3 & 3 \\ -2 & -1 & -1 \\ -6 & -13 & -8 \end{pmatrix} = \begin{pmatrix} -5 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & -5 \end{pmatrix} = -5I_3 = \det(A)I_3.$$

Likewise, adj(A) · A = −5I3 (verify this). Therefore A⁻¹ = −(1/5) adj(A), because

$$A \cdot \left(-\tfrac{1}{5}\operatorname{adj}(A)\right) = -\tfrac{1}{5}\, A \cdot \operatorname{adj}(A) = \left(-\tfrac{1}{5}\right)(-5)I_3 = I_3,$$

and likewise (−(1/5) adj(A)) · A = I3.

Crumbs! That took some work. That is the first method of finding matrix
inverses given in this module. With a bit of practice, it can be applied
reasonably well to 3 × 3 matrices, but applying it to larger matrices
will, in general, become very painful. The second method, which is more
computationally efficient and scales more effectively to larger matrices,
though conceptually slightly deeper, is given in Section 3.5.
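For readers who want to double-check the adjugate-method computations above (or their own answer to Exercise 1.48 below), NumPy provides determinant and inverse routines; this snippet is not part of the original notes.

    import numpy as np

    A = np.array([[ 1,  3,  0],
                  [ 2, -2,  1],
                  [-4,  1, -1]])       # the matrix from Example 1.38

    print(np.linalg.det(A))            # -5.0 (up to floating-point rounding)
    print(np.linalg.inv(A))            # equals -(1/5) * adj(A) from Example 1.47
    print(A @ np.linalg.inv(A))        # approximately the 3 x 3 identity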

Exercise 1.48. Compute the matrix of minors, the matrix of cofactors, the adjugate matrix, the determinant and finally the inverse A⁻¹ of

$$A = \begin{pmatrix} 1 & -4 & 3 \\ 2 & 10 & 1 \\ 5 & 1 & 9 \end{pmatrix}.$$

Verify that your solution is correct by showing that AA⁻¹ = A⁻¹A = I3.



The ‘basket-weave’ method


Some people like the following method of computing determinants of 3×3
matrices.

Remarks 1.49 (The 'basket-weave' method). Determinants of 3 × 3 matrices can be found using this method. (The basket-weave trick applies to 3 × 3 matrices only!) Take A from Example 1.38.

First, write down A and repeat columns 1 and 2 on the right:

 1   3   0   1   3
 2  −2   1   2  −2
−4   1  −1  −4   1

Second, det(A) is given by:

'Sum of products along the ↘ diagonals' − 'Sum of products along the ↙ diagonals'.

Hence we have

det(A) = 1 · (−2) · (−1) + 3 · 1 · (−4) + 0 · 2 · 1
         − (0 · (−2) · (−4) + 1 · 1 · 1 + 3 · 2 · (−1))
       = 2 − 12 + 0 − (0 + 1 − 6)
       = −10 − (−5) = −5.
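The basket-weave rule is also easy to implement for 3 × 3 matrices. The function below is not part of the original notes (the name det3_basket_weave is mine; NumPy assumed); it follows the two sums of diagonal products described above.

    import numpy as np

    def det3_basket_weave(A):
        # Determinant of a 3 x 3 matrix via the basket-weave rule (3 x 3 only!)
        down = (A[0, 0] * A[1, 1] * A[2, 2]
                + A[0, 1] * A[1, 2] * A[2, 0]
                + A[0, 2] * A[1, 0] * A[2, 1])   # products along the "down" diagonals
        up = (A[0, 2] * A[1, 1] * A[2, 0]
              + A[0, 0] * A[1, 2] * A[2, 1]
              + A[0, 1] * A[1, 0] * A[2, 2])     # products along the "up" diagonals
        return down - up

    A = np.array([[ 1,  3,  0],
                  [ 2, -2,  1],
                  [-4,  1, -1]])
    print(det3_basket_weave(A))                  # -5, agreeing with Example 1.42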

Orthogonal matrices
Recall the notion of matrix transpose from Definition 1.19.

Definition 1.50 (Orthogonal matrices). A matrix P ∈ Mn(R) is called orthogonal if its inverse P⁻¹ exists and equals its transpose P^T, i.e. if PP^T = P^T P = In.

In principle, it is easy to verify whether or not a matrix in Mn (R) is


orthogonal: simply multiply it by its transpose and check to see if the
outcome equals In .

Example 1.51. Show that

    P = [  1/√2    1/√6   1/√3 ]
        [ −1/√2    1/√6   1/√3 ]
        [   0     −2/√6   1/√3 ]

is orthogonal.

Solution. Verify that PP^T = P^T P = I3.
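Checking orthogonality is exactly the sort of arithmetic a computer does well. A numerical sketch (not part of the module; numpy assumed) for the matrix P above:

    import numpy as np

    s2, s6, s3 = np.sqrt(2), np.sqrt(6), np.sqrt(3)
    P = np.array([[ 1/s2,  1/s6, 1/s3],
                  [-1/s2,  1/s6, 1/s3],
                  [  0.0, -2/s6, 1/s3]])
    print(np.allclose(P @ P.T, np.eye(3)))   # True
    print(np.allclose(P.T @ P, np.eye(3)))   # True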

Example 1.52. The rotation matrices Rθ ∈ M2 (R) from Example 1.2


(2) are all orthogonal matrices.

Observe that if P is orthogonal then so is its inverse P T .


It turns out that orthogonal matrices are deeply connected to so-called
orthonormal bases of Rn and different coordinate systems (see Definition
4.7 and Section 4.2), and the eigenvectors of symmetric matrices (see
Chapter 5). They also feature in Appendix B.

General facts about inverses and determinants


We conclude this chapter with a series of general results about matrix
determinants and inverses, and provide proofs here where possible. Many
of these theoretical results help when trying to compute such things!

Proposition 1.53. The following facts apply to determinants of gen-


eral square matrices.

1. det(AT ) = det(A).

2. det(A) = 0 if A has a zero row or zero column.

3. If A is upper triangular, i.e. Aij = 0 whenever i > j, then

det(A) = A11 A22 A33 . . . Ann .

In particular, this holds if A is a diagonal matrix, i.e. Aij = 0 whenever i ≠ j.

4. det(cIn) = c^n. In particular, det(In) = 1 and det(0n) = 0.

The proof of this proposition has been banished to the forbidden zone
that is Appendix C.1.

The following result is the fundamental theorem of determinant theory.


Theorem 1.54. If A, B ∈ Mn(R) then det(AB) = det(A) det(B).

(There is no equivalent result for sums of matrices: there is no nice link between det(A + B) and det(A) and det(B).)
You should never forget this result! You should forget your own name be-
fore forgetting it. The proof of this result is so dark that it lies beyond the
scope of the module entirely. However, from it we reap a harvest of ad-
ditional properties of determinants which often help us when attempting
to compute them.
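A quick numerical illustration of Theorem 1.54 (a sketch only, not part of the module; numpy assumed, and any two square matrices of the same size will do):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, (3, 3))
    B = rng.integers(-3, 4, (3, 3))
    print(np.isclose(np.linalg.det(A @ B),
                     np.linalg.det(A) * np.linalg.det(B)))   # True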

Corollary 1.55.

1. If A ∈ Mn (R) and c is a number, then det(cA) = c n det(A).

2. If n is odd and A ∈ Mn (R) is skew-symmetric, that is AT = −A,


then det(A) = 0.

Proof.

1. We have det(cA) = det((cIn )A) = det(cIn ) det(A) = c n det(A), by


Theorem 1.54 and Proposition 1.53 (4).

2. Observe that det(A) = det(AT ) = det(−A) = (−1)n det(A) = − det(A),


by Proposition 1.53 (1), (1) immediately above, and the fact that n
is odd. Hence det(A) = 0.

The following result summarises the process for finding inverses of square
matrices presented above.

Theorem 1.56. If A ∈ Mn(R) then

    A · adj(A) = adj(A) · A = det(A)In,

and if det(A) ≠ 0, then A−1 exists and equals (1/det(A)) adj(A). On the other hand, if det(A) = 0, then A has no inverse.

Lastly, we present a couple of useful facts concerning matrix inverses.


Proposition 1.57.

1. If A is invertible, then det(A−1 ) = det(A)−1 .

2. If A, B ∈ Mn (R) are invertible, then so is AB, and (AB)−1 =


B−1 A−1 .

Proof.

1. We have

1 = det(In ) = det(AA−1 ) = det(A) det(A−1 ),

by Proposition 1.53 (4) and Theorem 1.54. Hence det(A−1 ) = det(A)−1 .

2. Note that

(AB)(B−1 A−1 ) = A(BB−1 )A−1 = AIn A−1 = AA−1 = In ,

and likewise

(B−1 A−1 )(AB) = B−1 (A−1 A)B = B−1 In B = B−1 B = In .

Thus (AB)−1 exists and equals B−1 A−1 .


Chapter 2

Vector Geometry 1

2.1 Euclidean n-space and vectors


Euclidean geometry is the term used to describe the geometry of (3-
dimensional) space, developed axiomatically by Euclid of Alexandria (ap-
prox. 325 – 265 BC) in his book the Elements. This book was the main
geometry text for 2000 years. Much later, René Descartes had the idea
of attaching coordinates to a point (cartesian coordinates), which allows for an algebraic description of geometry, called analytic geometry.

    Euclidean plane ↔ R2
    Point P ↔ (x1, x2)

(One of the oldest surviving fragments of the Elements, found at Oxyrhynchus and dated to around 100 AD. Image source: University of British Columbia.)

In this way, we see that R2 is an algebraic model of the Euclidean plane and R3 is an algebraic model of Euclidean space. The ancient Greeks did not do geometry in higher dimensions, but by using cartesian coordinates, we can easily study geometry in dimensions n > 3, using algebra, and Rn as an algebraic model, even though visualization becomes essentially impossible.

(Have you ever wondered why we live in a 3-dimensional world and not, say, a 5-dimensional one? Certainly, being restricted to 2 dimensions would cause digestion problems, as illustrated by this hypothetical chicken.)

Definition 2.1 (n-space). Given n ≥ 1, we define

    Rn = {(x1, . . . , xn) : x1, . . . , xn ∈ R}.

We identify points on a line with real numbers x ∈ R, points in a plane


with ordered pairs (x1 , x2 ) ∈ R2 , and points in 3-space with ordered
triples (x1 , x2 , x3 ) ∈ R3 . These correspondences are easy to visualize by
making pictures. We can continue in this way and interpret quadruples


(x1 , x2 , x3 , x4 ) ∈ R4 as points in 4-space, quintuples (x1 , x2 , x3 , x4 , x5 ) ∈ R5


as points in 5-space, etc., but it is not possible to make pictures anymore.
For an arbitrary positive integer n we can thus identify points in n-space with tuples in Rn:

    Point P ∈ n-space ↔ (x1, . . . , xn) ∈ Rn.

In this situation, we call (x1, . . . , xn) the coordinates of the point P and write P = (x1, . . . , xn). We therefore also refer to Rn as (Euclidean) n-space and to the number n as the dimension of Rn. The point O = (0, . . . , 0) is called the origin of Rn.

(Nobody is suggesting that n-space is wobbling around somewhere in physical reality. It is instead a mathematical conceit that is highly useful when understanding and solving (real-world) problems. Having said that, some physical theories (string theories) assert that the universe possesses extra spatial dimensions that are beyond our perception. Note that time is not the fourth dimension: R4 is simply a way of representing space-time.)

Example 2.2. Euclidean 3-space is of course a mathematical model for the space we live in, whereas 4-space is the model for ‘space-time’ R3 × R: three spatial dimensions and one temporal dimension, used for instance in Einstein’s theory of relativity.

Figure 2.1: Algebraic representation of 3-space as R3



(Googling ‘high dimensional data’ returns about 361 million search results.)

Any problem which has n variables in it naturally yields points in Rn. The visualization and modeling of ‘high-dimensional’ data sets is a topic of enormous importance in contemporary science, engineering and industry.
In multivariate statistics, data having n different measurable character-
istics x1 , . . . , xn is represented as ‘data points’ (x1 , . . . , xn ) in Rn – each
dimension corresponds to a different characteristic of the data. Then the
power of linear algebra can be brought to bear to analyse this data –
see, for example, Appendix B.

Vectors
Now the discussion moves from points in n-space to vectors. Given the Out there in the math-
ematical badlands, al-
right context, many different objects can be called vectors. In this module most anything (e.g. func-
we will focus on just a few. tions in calculus) can be
regarded as a vector in
the right context.
Definition 2.3. In the correct context, a number of objects can be For more information
vectors. about this, look up the
term ‘vector space’.

1. Points (x1 , . . . , xn ) ∈ Rn .

2. In the physical sciences, a vector is often a quantity having both


magnitude and direction, e.g. displacement, velocity, accelera-
tion or force.

3. It is sometimes helpful to use column vectors (see Remark 1.3


(1)), especially when we want to manipulate vectors using ma-
trices – recall Section 1.2 and see Section 2.4.

For the remainder of this section, we introduce concepts that can be


applied to all three types of vector listed above. There is an emphasis on
the second type of vector listed above, as such vectors can be interpreted
quite readily geometrically. However, these differences are skin deep
and, from an algebraic point of view (e.g. summing vectors, taking scalar
multiplies etc.), all three types are treated in the same way.
We can represent vectors of the second type using directed straight line
segments, i.e. a line segment in Rn , to which is associated one of the two
possible directions.
If P and Q are two points in Rn , we denote by PQ that straight line
segment having P and Q as endpoints, and directed from P to Q. P and
Q are the initial and terminal points of PQ, respectively.

Figure 2.2: A straight line segment PQ in R2


Q

The important thing to note is not the positions of P and Q as such, but
the position of the terminal point Q, relative to the initial point P.


Given a line segment in Rn , we can move it around as we please; provided


we don’t change its length or direction, it represents the same vector.

Definition 2.4. Line segments having the same length and the same
direction are said to be equivalent, and represent the same vector.

Figure 2.3: Equivalent directed line segments in R2

Now we need some way of quantifying the magnitude and direction.

Definition 2.5. Let x be a vector in Rn , represented by some line


To distinguish them from
ordinary numbers, we
will denote vectors in segment PQ. The entries of x are the coordinates of Q, relative to
P, i.e., the entries of x are ‘the coordinates of Q − the coordinates of
bold face in printed
notes, e.g. x or v.
When writing them by P’.
hand, e.g. when answer-
ing homework assign-
ments, it is usual prac-
tice to underline vectors,
e.g. x or v. Underlining
Example 2.6. Suppose x is represented by AB, where A = (2, 1) and
is easier by hand! B = (−3, 2). What are the entries of x?

Solution.

entries of x = ‘the coordinates of B − the coordinates of A’


= (−3, 2) − (2, 1)
= (−5, 1). 

It is helpful to see vector entries in terms of equivalent line segments. In


Example 2.6, if we move the initial point of AB to the origin, while pre-
serving magnitude and direction, then we get an equivalent line segment
OC, where C has coordinates

(−3 − 2, 2 − 1) = (−5, 1),


| {z }
(coordinates of B)−(coordinates of A)

which are precisely the entries of x.



Figure 2.4: The entries of a vector via equivalent line segments


y

B = (−3,2)

C A = (2,1)

x
O

In general, if x is represented by PQ, where P = (x1 , . . . , xn ) and Q = Even if n > 3 and we


can’t visualise Rn any-
(y1 , . . . , yn ), then the entries of x are more, we are still able
to work with the corre-

(y1 − x1 , y2 − x2 , . . . , yn − xn ).
sponding vectors using
algebra.

The difference between points and vectors


Hereafter, if x is represented by some line segment PQ, we simply write
x = PQ, and if x has entries (−5, 1), we write x = (−5, 1). If we write
x = (−5, 1), then x doesn’t look any different from the point (−5, 1), and
in many ways this is correct: when we do algebra with vectors in Rn ,
vectors and points become almost indistinguishable and interchangeable.
Some geometric differences remain: vectors tend to have non-zero length,
whereas points have zero length. Having said that, given a point P in Rn ,
there is a natural way to associate a vector to P: consider the directed
line segment OP, having initial point O and terminal point P.

Figure 2.5: The line segment OP associated with the point P


z

OP y

x

Of course, the entries of the vector OP equal the coordinates of the point
P. When we treat points as vectors and talk about the ‘length of P’, then
we really mean the length of OP.

2.2 Vector arithmetic


As with matrices, we will often treat vectors as whole entities in their
own right, rather than as collections of numbers. As with matrices, we
define vector addition and scalar multiplication ‘entry-wise’.
As with matrices, it only
makes sense to add two Definition 2.7 (Vector addition). Let x = (x1 , . . . , xn ) and y = (y1 , . . . , yn )
vectors having the same
size, e.g. trying to add be two vectors in Rn . Then the vector sum x + y is another vector in
(1, 2) to (3, 4, 9) doesn’t Rn , and is given by
make any sense and
should be avoided!
The sum of any two vec-
x + y = (x1 + y1 , . . . , xn + yn ).
tors of the same size
yields another vector of
the same size.
Definition 2.7 (1) certainly accords with our geometric intuition.

Example 2.8. If x = (−2, 3) and y = (4, −1), then x + y = (2, 2).


Geometrically, if we position the initial point of x at the origin, and
move y so that its initial point equals the terminal point of x, then
x + y is the directed line segment from O to the terminal point of y.

Figure 2.6: Summing vectors


y

y
(2,2)

x
x+y

x
O

Definition 2.9 (Zero vectors). The zero vector in Rn is 0 = (0, . . . , 0).

The zero vectors in R2 and R3 are (0, 0) and (0, 0, 0), respectively, and so
on. They correspond to the origins of these spaces. Just like the zero

matrices of Definition 1.9, adding a zero vector to a vector of the same The zero vectors don’t

size changes nothing: these zero vectors are additive identities.


have a definable direc-
tion.

Recall that a scalar is just another term for an ordinary number.

Definition 2.10 (Scalar multiples of vectors). Let x = (x1 , . . . , xn ) be


a vector in Rn and let c be a scalar. The scalar multiple cx of x is
cx = (cx 1 , . . . , cx n ).

As with matrices, if you multiply a vector in Rn by the scalar 0, you get


the corresponding zero vector in Rn .
Two non-zero vectors x and y are said to be parallel if x = cy for some
scalar c.

Example 2.11. Let x = (2, 1). What are 3x, −2x, −x and 12 x?

Solution. It follows from Definition 2.10 that 3x = (6, 3), −2x =


(−4, −2), −x = (−2, −1) and − 12 x = (−1, − 21 ). 

Figure 2.7: Scalar multiples of vectors in R2


y

3x
If x is non-zero and c >
0, then cx and x share
x
the same direction. If
x c < 0 then cx points in
the direction opposite to
−2x that of x.

The following general properties of vector addition and scalar multiplica-


tion are easy consequences of the definitions above. You can and should
compare them with the ones listed in Fact 1.23.

Fact 2.12. For all vectors x, y and z of the same size, and scalars k Compare this with Fact
and c, we have 1.23 (1), (2) and (part
of) (6).They are the same
laws!
1. (x + y) + z = x + (y + z) (associativity)

2. x + y = y + x (commutativity)


3. c(x + y) = cx + cy (dist. of scalar mult. over vector addition)

4. k(cx) = c(kx) = (kc)x.

There is no direct vector analogue of matrix multiplication, though there


is the scalar product – see Section 2.3.

Lengths of vectors and unit vectors


The length ∥x∥ of x = (x1, x2) in R2 is the length of a line segment representing x. This can be computed easily using Pythagoras: ∥x∥ = √(x1² + x2²). The length of a vector has physical significance, e.g. the length of the velocity vector of a moving object is the speed of that object.

Example 2.13. If x = (4, 3), then ∥x∥ = √(4² + 3²) = 5.

Figure 2.8: The length of a vector in R2


y

(4,3)
3

x
4

We can compute lengths of vectors in R3 by applying Pythagoras twice.

Figure 2.9: The length of a vector in R3



If x = (x1, x2, x3), then

    ∥x∥ = √( ∥(x1, x2, 0)∥² + x3² ) = √( (√(x1² + x2²))² + x3² ) = √(x1² + x2² + x3²).

Having this in mind, it is natural to make the following definition.

Definition 2.14 (Vector length). Let x = (x1, . . . , xn) be a vector in Rn. The length or magnitude of x is defined to be the quantity

    ∥x∥ = √(x1² + · · · + xn²).

(In mathematical circles, ∥x∥ is also called the norm of x.)

For any vector x, ∥x∥ is a non-negative number. For any vector x and scalar c, we have ∥cx∥ = |c| ∥x∥:

    ∥cx∥ = ∥c(x1, . . . , xn)∥ = ∥(cx1, . . . , cxn)∥
         = √((cx1)² + · · · + (cxn)²)
         = √(c²(x1² + · · · + xn²))
         = |c| √(x1² + · · · + xn²) = |c| ∥x∥.

Also, given a vector x in Rn, if ∥x∥ = 0 then x = 0.


Definition 2.15 (Unit vectors). A vector having length 1 is called a


unit vector.

Let x in Rn be non-zero (that is, x ≠ 0, i.e. not every entry of x is zero). Then ∥x∥ > 0, and we can form a new vector

    x̂ = (1/∥x∥) x.

Then

    ∥x̂∥ = (1/∥x∥) ∥x∥ = 1,

i.e. x̂ is the unit vector having the same direction as x.


Example 2.16. If x = (4, 3), then x̂ = (1/5)(4, 3) = (4/5, 3/5).

Exercise 2.17. Let x = (−1, 3, 2). Find

1. a vector of length 1 in the direction of x;

2. a vector of length 4 in the direction of x, and

3. a vector of length 2 in the direction opposite to that of x.

Example 2.18 (Vector Addition and Displacement). A sailor in a small


boat sails 3 km east, then 4 km southeast and a third leg of length d
km at an angle θ to the easterly direction. Her final position is 7 km
due east from her starting point. Find the magnitude and direction
of the third leg of her journey.

Solution. We want to find θ and d = ∥z∥. We have


(7, 0) = x+y+z
= (3, 0) + (4 cos π4 , −4 sin π4 ) + (d cos θ, d sin θ)
= (3 + 4 cos π4 + d cos θ, −4 sin π4 + d sin θ)
≈ (5.828 + d cos θ, −2.828 + d sin θ).

Equating the first and second entries gives

d cos θ ≈ 1.172 and d sin θ ≈ 2.828.

We divide to eliminate d: tan θ ≈ 2.413, so θ ≈ 1.178. Finally, d ≈ 1.172/cos θ ≈ 3.06 km. (With a bit more work, we can show that θ is exactly 3π/8, or 67.5°.)

Figure 2.10: A sailor in a small boat


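A numerical check of this example (a sketch only, not part of the module; numpy assumed):

    import numpy as np

    x = np.array([3.0, 0.0])                               # first leg
    y = np.array([4*np.cos(np.pi/4), -4*np.sin(np.pi/4)])  # second leg
    z = np.array([7.0, 0.0]) - x - y                       # third leg: x + y + z = (7, 0)
    print(np.linalg.norm(z))                               # about 3.06 km
    print(np.arctan2(z[1], z[0]))                          # about 1.178 rad, i.e. 3*pi/8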

2.3 The scalar product


The concepts of orthogonality and the scalar (or dot) product of two
vectors are of huge importance. In the physical sciences, mechanical work
and magnetic flux are calculated using scalar products. In statistics, the
scalar product is connected to correlation of sample data.

Definition 2.19. Let x = (x1, . . . , xn) and y = (y1, . . . , yn) be vectors in Rn. The scalar product or dot product of x and y is given by

    x · y = x1 y1 + · · · + xn yn = ∑ (i = 1 to n) xi yi.

(Don’t confuse the scalar product of two vectors with scalar multiplication of a vector by a scalar, or with matrix multiplication!)

For instance, (1, 2) · (−4, 7) = 1 × (−4) + 2 × 7 = 10, and (2, −3, 1) ·


(2, 1, −1) = 2 × 2 + (−3) × 1 + 1 × (−1) = 0, and so on. The following
really important list of properties follow from the definition above. As in
previous lists of facts, notice that, in the statement below, the vectors are
treated as whole entities – there is no mention of the individual entries
of the vectors. The proof requires us to open up the vectors and examine
their entries, but once this is done we can forget about the entries!

Fact 2.20. The following hold for all vectors x, y and z in Rn, and scalars c ∈ R. (These facts are extremely useful! You will gain great powers if you can master them.)

1. x · y = y · x.

2. x · x = ∥x∥².

3. x · (y + z) = x · y + x · z and (x + y) · z = x · z + y · z.

4. (cx) · y = x · (cy) = c(x · y).

Proof. Let x = (x1 , . . . , xn ), y = (y1 , . . . , yn ) and z = (z1 , . . . , zn ).

1. We have y · x = y1 x1 + · · · + yn xn = x1 y1 + · · · + xn yn = x · y.

2. We have x · x = x12 + · · · + xn2 = x 2 .


3. Since
y + z = (y1 + z1 , . . . , yn + zn ),


it follows that

x · (y + z) = x1 (y1 + z1 ) + · · · + xn (yn + zn )
= (x1 y1 + · · · + xn yn ) + (x1 z1 + · · · + xn zn )
= x · y + x · z.

Likewise for the other equality.

4. Because cx = (cx 1 , . . . , cx n ), we obtain

(cx) · y = (cx1 )y1 + · · · + (cxn )yn


= c(x1 y1 + · · · + xn yn ) = c(x · y).

Likewise for the other equality.

Angles between vectors and the Cauchy-Schwarz


inequality
We use all of the facts above in the next result, called the Cauchy-
Schwarz inequality, which is one of the most important inequalities in
all of mathematics. Its proof is a great exercise in applying the rules in
Fact 2.20 above and, for this reason, it avoided exorcism to Appendix C.1.

Theorem 2.21. Let x and y be non-zero vectors in Rn. Then

    |x · y| ≤ ∥x∥ ∥y∥.

Moreover, if equality holds, then x and y are parallel.

There are several established proofs of this result. The one given below
uses an intermediate result that is hugely significant in its own right.

Theorem 2.22. Let u and v be vectors in Rn, such that u · v = 0. Then

    ∥u + v∥² = ∥u∥² + ∥v∥².

In particular, ∥u∥ ≤ ∥u + v∥, with equality if and only if v = 0.


Proof. Using Fact 2.20, we have

    ∥u + v∥² = (u + v) · (u + v)                      Fact 2.20 (2)
             = u · (u + v) + v · (u + v)              Fact 2.20 (3)
             = u · u + u · v + v · u + v · v          Fact 2.20 (3) again
             = ∥u∥² + 2(u · v) + ∥v∥²                 Fact 2.20 (1) and (2)
             = ∥u∥² + ∥v∥²,

as u · v = 0. Finally,

    ∥u∥ ≤ √(∥u∥² + ∥v∥²) = ∥u + v∥,

with equality if and only if ∥v∥ = 0, which holds if and only if v = 0.


The geometric import of Theorem 2.22 is revealed later in Remark 2.27.

(Throughout these two proofs, we treat u, v, x and y as whole entities and rely largely on Fact 2.20 to manipulate them. We do not (and should not) concern ourselves with the entries u1, u2, . . . of the vectors: doing so could easily get us into a big pig’s breakfast, and we want to minimise the number of such breakfasts. In short, life is made easier by Fact 2.20!)

Proof of Theorem 2.21. Set

    u = (x · y)x   and   v = ∥x∥² y − u = ∥x∥² y − (x · y)x.

As ∥cw∥ = |c| ∥w∥ for any scalar c and vector w (see remarks after Definition 2.14), we have ∥u∥ = |x · y| ∥x∥. Moreover, u + v = ∥x∥² y, so ∥u + v∥ = ∥x∥² ∥y∥. Next, we show that u · v = 0. Indeed, we have

    u · v = u · (∥x∥² y) + u · (−u)                       Fact 2.20 (3)
          = ∥x∥² (u · y) − ∥u∥²                           Fact 2.20 (2), (4)
          = ∥x∥² (((x · y)x) · y) − (|x · y| ∥x∥)²
          = ∥x∥² (x · y)² − (x · y)² ∥x∥² = 0.            Fact 2.20 (4)

Applying Theorem 2.22 yields

    |x · y| ∥x∥ = ∥u∥ ≤ ∥u + v∥ = ∥x∥² ∥y∥.

Dividing both sides by ∥x∥ (which is a non-zero, positive number) yields the result. Moreover, equality holds if and only if v = 0, i.e.

    y = ((x · y)/∥x∥²) x.

This means that x and y are parallel.

This inequality allows us to define the angle between vectors.


Corollary 2.23. Let x and y be non-zero vectors in Rn. Then

    −1 ≤ (x · y)/(∥x∥ ∥y∥) ≤ 1.

Proof. Theorem 2.21 shows that

    |x · y| / (∥x∥ ∥y∥) ≤ 1,

from which the result follows.

Given any real number r in the range −1 ≤ r ≤ 1, there is a unique number θ in the range 0 ≤ θ ≤ π, such that

    cos θ = r.

(The proof of this fact belongs to a calculus course. It is outside the scope of MATH10400.)

Figure 2.11: The graph of cosine from 0 to π

Definition 2.24. Let x and y be non-zero vectors in Rn.

1. The angle between x and y is the unique number θ in the range 0 ≤ θ ≤ π, such that

    cos θ = (x · y)/(∥x∥ ∥y∥).

2. We say x and y are orthogonal, or perpendicular, written x ⊥ y, if the angle between x and y is π/2 (90°).

It is important to check that this definition of angle does indeed fit with
our geometric intuition. We do this in R2 , for which we require the cosine
rule:
c 2 = a2 + b2 − 2ab cos θ,
where a, b and c are the side lengths of the triangle as below, and θ is
the angle as below.

Figure 2.12: The cosine rule



We include a brief proof: if A = (b, 0) and B = (a cos θ, a sin θ), then

    c² = ∥AB∥² = (a cos θ − b)² + (a sin θ − 0)²
       = a² cos²θ − 2ab cos θ + b² + a² sin²θ
       = a² + b² − 2ab cos θ.

Now let x = (x1, x2), y = (y1, y2). Form the triangle having x and y as two of its sides. Let z be a vector representing the third side, so z = (x1 − y1, x2 − y2). By the cosine rule,

    ∥z∥² = ∥x∥² + ∥y∥² − 2 ∥x∥ ∥y∥ cos θ.

Now

    ∥z∥² = (x1 − y1)² + (x2 − y2)² = x1² − 2x1y1 + y1² + x2² − 2x2y2 + y2²,

and

    ∥x∥² + ∥y∥² − 2 ∥x∥ ∥y∥ cos θ = (x1² + x2²) + (y1² + y2²) − 2 ∥x∥ ∥y∥ cos θ,

so by cancelling the terms x1², y1², x2² and y2² from both sides, we obtain

    −2x1y1 − 2x2y2 = −2 ∥x∥ ∥y∥ cos θ
    ⇒ x · y = x1y1 + x2y2 = ∥x∥ ∥y∥ cos θ
    ⇒ (x · y)/(∥x∥ ∥y∥) = cos θ.
This result matches perfectly with Definition 2.24!


Proposition 2.25. Let x and y be non-zero vectors in Rn . Then x ⊥ y


if and only if x · y = 0.

Proof. By Definition 2.24, x ⊥ y if and only if

    (x · y)/(∥x∥ ∥y∥) = cos θ = cos(π/2) = 0.

Since x and y are non-zero, we have ∥x∥ ∥y∥ > 0. Thus x ⊥ y if and only if x · y = 0.

Example 2.26. Suppose x = (−1, 3) and y = (6, 2). Then x ⊥ y


because x · y = −1 × 6 + 3 × 2 = 0, so x ⊥ y.

The example above is trivial, but it is worth expanding on it and again


appealing to geometric intuition. Let A and B denote the points (−1, 3)
and (6, 2), respectively, and form the triangle OAB. If x ⊥ y then OAB is
right-angled at O, so we expect from the cosine rule that

    ∥AB∥² = ∥x∥² + ∥y∥².

We verify this: ∥x∥² = (−1)² + 3² = 10, ∥y∥² = 6² + 2² = 40 and AB = (6 − (−1), 2 − 3) = (7, −1), so ∥AB∥² = 7² + (−1)² = 50.
Thus the triangle OAB is right-angled at O, and x ⊥ y.

Figure 2.13: Orthogonal vectors in R2


y

B
x
y

x
−1 6

The sign of the scalar product


If x and y are non-zero vectors having angle θ between them, we can
glean some immediate information about θ simply by checking the sign
of x · y.

(a) x · y > 0 implies ∥x∥ ∥y∥ cos θ > 0, so cos θ > 0: θ is acute;
(b) x · y < 0 implies cos θ < 0: θ is obtuse;
(c) x · y = 0 implies cos θ = 0: θ = π/2 and x ⊥ y.
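These cases are easy to check numerically. A sketch (not part of the module; numpy assumed), using the vectors of Example 2.26:

    import numpy as np

    def angle(x, y):
        c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against rounding error

    x, y = np.array([-1.0, 3.0]), np.array([6.0, 2.0])
    print(np.dot(x, y), angle(x, y))   # 0.0 and pi/2: the vectors are orthogonal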

The geometric significance of Theorem 2.22


Theorem 2.22 was apparently parachuted in earlier in the section, simply
to help prove the Cauchy-Schwarz inequality. Moreover, its statement
and proof were purely algebraic. However, it has a deep and ancient
geometric connection, alluded to in some sense by Example 2.26.

Remarks 2.27. Let u and v be non-zero vectors in Rn . We know now


that these vectors are perpendicular if and only if u · v = 0. In this
case, we can form the right-angled triangle as shown below, where
the length of the hypotenuse and those of the other two sides are
given by ∥u + v∥ and ∥u∥ and ∥v∥, respectively. Given this, we see that the statement

    ∥u + v∥² = ∥u∥² + ∥v∥²

is precisely what we would expect from the theorem of Pythagoras. Moreover, the length of the hypotenuse ∥u + v∥ is going to strictly exceed ∥u∥ (as v ≠ 0). If v = 0 then these two values will be equal (and of course we won’t have a triangle anymore).

Figure 2.14: The Theorem of Pythagoras



2.4 Matrices acting on vectors


In this section, we consider vectors as given in Definition 2.3 (3). Recall
Section 1.2, where we treated points in R2 as 2 × 1 column vectors in
order to take advantage of matrix multiplication and show that certain
matrices can rotate such points.
(The arithmetic of column vectors: sums, scalar multiples, lengths, scalar products etc., proceeds in the same way as in previous sections.)

This principle can be generalised to a much greater degree. Suppose that A is an m × n matrix, and let x be a point or vector in Rn. We can regard x = (x1, . . . , xn) as an n × 1 column vector

    x = [ x1 ]
        [ ⋮  ]
        [ xn ],
to take advantage of matrix multiplication. With this in mind, the matrix
product Ax is a m × 1 column vector which, in turn, can be regarded as a
point or vector in Rm . In this way, we can use the matrix A to transform
points or vectors in Rn to points or vectors in Rm .
Quite often, we concentrate on the case where m = n, i.e., when A ∈
Mn (R) and x and Ax belong to the same space Rn . The rotation of points
in 2 dimensions is one such example. There are 3×3 matrices that rotate
points in 3 dimensions, and others that perform many more geometric
actions besides.
The next result ties together matrix transposes and scalar products and
has many applications.

Proposition 2.28. Let A ∈ Mn (R) and let x and y be (n × 1 column)


vectors in Rn . Then
(Ax) · y = x · (AT y).

The proof of Proposition 2.28 is in Appendix C.1. One of its applications


is to demonstrate a hugely important feature of orthogonal matrices,
namely that they preserve scalar products between vectors and they
preserve lengths of vectors, in the following sense.

Proposition 2.29. Let P ∈ Mn (R) be an orthogonal matrix, and let


x, y be vectors in Rn . Then

(Px) · (Py) = x · y.
In particular, ∥Px∥ = ∥x∥.

Proof. Since P is orthogonal, we have P T P = In . Given any vector y in


Rn , treated as a n × 1 column vector, note that In y = y by Theorem 1.36.
Thus, using Proposition 2.28, we obtain

(Px) · (Py) = x · (P T Py) = x · (In y) = x · y.


In particular, ∥Px∥² = (Px) · (Px) = x · x = ∥x∥². Taking positive square roots of both sides yields the second result.

This makes complete sense when you consider rotations. When you
rotate an object you don’t change the length of anything, nor do you
change any angles between anything in the object (whereas stretching,
twisting or otherwise deforming the object may alter such things).
Proposition 2.29 is used in Sections 4.2 and 6.1, and Appendix B.
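A small numerical illustration of this (a sketch only, not part of the module; numpy assumed), using a rotation matrix, which Example 1.52 tells us is orthogonal:

    import numpy as np

    theta = 0.7
    P = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # a rotation, hence orthogonal
    x, y = np.array([3.0, -1.0]), np.array([2.0, 5.0])
    print(np.isclose((P @ x) @ (P @ y), x @ y))                   # True
    print(np.isclose(np.linalg.norm(P @ x), np.linalg.norm(x)))   # True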

2.5 Orthogonal projections


We finish this chapter with a brief description of orthogonal projections,
which are used in Appendix B. Consider a unit vector v in Rn , some other
vector x in Rn , and the straight line L that runs through the origin 0,
parallel to v.

Definition 2.30. The orthogonal projection of the vector x onto the line L, denoted projL(x), is the vector y that lies on L, in such a way that the line running from x to y is orthogonal to L.

In Figure 2.15 below, L and y are represented by the dotted and dashed
lines, respectively, and the line from x to y is highlighted in red. As you
can see, the red line is orthogonal to L.
How do we determine y = projL (x)? The vector y has to lie on the line L,
which means that it must be a scalar multiple of v. So let’s set y = cv,
where c is to be determined. We need the red line from x to y to be
orthogonal to L. The red line is parallel to the vector x − y, and v is
parallel to L. If we take the scalar product of x − y with v, we obtain

(x − y) · v = x · v − y · v = x · v − c(v · v) = x · v − c,


because v · v = ∥v∥² = 1. For orthogonality, we require this quantity to be 0. Therefore, c = x · v, giving y = (x · v)v.

Figure 2.15: Orthogonal projection of x onto the line L


x

y = projL (x)
0 θ
v

From the point of view of trigonometry, this makes perfect sense. Let θ be the angle between v and x. Given the right-angled triangle in the picture above, the length of y should equal the length of x, multiplied by |cos θ|, i.e. ∥y∥ = ∥x∥ |cos θ|. From Definition 2.24, we know that x · v = ∥x∥ ∥v∥ cos θ = ∥x∥ cos θ, thus ∥y∥ should equal |x · v|, and indeed it does, again because ∥v∥ = 1.

(In the figure above, θ is acute and cos θ is positive. However, θ may be obtuse, so we require the absolute value |cos θ|, rather than cos θ.)

Example 2.31. Compute projL(x) in the following cases.

1. x = (5, −7, 11, 4) and v = (1/2, −1/2, 1/2, −1/2) = (1/2)(1, −1, 1, −1).

2. x = (2, 9, 1) and v = (3, 2, 1).

Solution. (In both cases you can and should verify that (x − projL(x)) · v = 0.)

1. We have projL(x) = (x · v)v = (19/2)v = (19/4)(1, −1, 1, −1).

2. Here, v is not a unit vector, so the formula above does not apply. In these cases, observe that L is also parallel to v̂, where v̂ is the unit vector parallel to v defined after Definition 2.15. Thus

    projL(x) = (x · v̂)v̂ = ((x · v)/∥v∥²) v.

(This formula for projL(x) works for any non-zero vector v.) In this example we have

    projL(x) = ((x · v)/∥v∥²) v = (25/14)v = (25/14)(3, 2, 1).

Exercise 2.32. Let v, x, y and L be as above, and let z be any vector


running parallel to L, i.e. z = av, for some number a. Show that x − y
and y − z are orthogonal, regardless of the value of a. Using this
result and Theorem 2.22, show that
2 2
x − z 2 = x − y + y − z ,


and thus x − y 6 x − z .

The exercise above shows that y is the vector on the line L that is closest
to x.
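The projection formula is easy to compute with. A sketch (not part of the module; numpy assumed), checked against Example 2.31:

    import numpy as np

    def proj(x, v):
        # proj_L(x) = ((x·v)/||v||^2) v, valid for any non-zero v
        v = np.asarray(v, dtype=float)
        return (np.dot(x, v) / np.dot(v, v)) * v

    print(proj([5, -7, 11, 4], [0.5, -0.5, 0.5, -0.5]))   # (19/4)(1, -1, 1, -1)
    print(proj([2, 9, 1], [3, 2, 1]))                     # (25/14)(3, 2, 1)
    x = np.array([2.0, 9.0, 1.0])
    print(np.dot(x - proj(x, [3, 2, 1]), [3, 2, 1]))      # approximately 0, as expected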

Chapter 3

Systems of Linear Equations

3.1 Systems of linear equations


Solution sets and simultaneous solutions
The equation 2x + y = 3 is a linear equation in the variables x and y.
A pair (x0 , y0 ) of real numbers (or a vector in R2 , if you like) is a solution of
the equation 2x + y = 3 if setting x = x0 and y = y0 makes the equation
true; i.e. if 2x0 + y0 = 3. Thus, (1, 1) and (0, 3) are solutions, but (1, 4) is
not a solution since setting x = 1, y = 4 gives 2x + y = 2 × 1 + 4 6= 3.

Definition 3.1 (Solution sets). The set of all solutions of a linear


equation is called its solution set.

Example 3.2. The solution set of the equation 2x + y = 3 forms a


straight line in the 2-dimensional Cartesian plane.

Figure 3.1: The solution set of 2x + y = 3.


y

Every point on the


(0,3) line is a solution to
the equation, and
vice-versa.
2
The solution set gives us
(1,1) the graph of the function
y = 3 − 2x.
x
−2 2 4
(2,−1)


Example 3.3. The solution set of −4x + 5y = 8 is another straight


line. The two lines intersect at a (unique) point ( 12 , 2). Equivalently,
x = 12 , y = 2 solve both equations 2x + y = 3 and −4x + 5y = 8
simultaneously.

Figure 3.2: The solution sets of 2x + y = 3 and −4x + 5y = 8.


y

2 ( 21 ,2)

x
−2 2 4

Personal computers
It is sometimes necessary to find solutions that solve several linear equa-
tions simultaneously. For example, weather prediction models often re-
quire simultaneous solutions of hundreds of thousands of such equations.

(Simple examples are required when introducing a topic. However, simple examples often look contrived. Please accept my apologies in advance.)

Example 3.4. We consider a straightforward example about making computers. Suppose that a factory makes two models of personal computer: one for standard use, that takes 2 hours to build, and a second, more powerful model for gaming purposes, that takes 3 hours to build. Imagine that the factory can produce a maximum of 300
vance. to build. Imagine that the factory can produce a maximum of 300
computers per week, and has at its disposal a total of 800 hours of
labour per week. How many computers of each type should be built,
in order to maximise capacity and time?

Solution. Let x and y denote the number of standard and gaming


computers to be built each week, respectively. To maximise capacity
and time, we require

x + y = 300
(3.1)
2x + 3y = 800.

If we multiply both sides of the second equation above by 1/2, we obtain

    x + (3/2) y = 400.    (3.2)

If we then subtract the first equation in (3.1) above from (3.2), we get

    (1/2) y = 100,

so y = 200. If we substitute this back into the first equation in (3.1),


we obtain x = 100. 

3.2 Elementary row operations


This kind of ‘ad hoc’ approach used above may not always work if we have a more complicated system, involving a greater number of variables, or more equations. We will devise a general strategy for solving complicated systems of linear equations.

(It is quite possible that you (like me) learned one way of solving certain systems of linear equations at school, akin to the ad hoc method above. At university, I learned a different method, the one below, which initially seemed unnatural and unnecessary. But in time I came to view it as superior. If you are in this position then I would heed the words of Yoda, who once said, ‘you must unlearn what you have learned’.)

Example 3.5. Find all simultaneous solutions of the system

    x1 + 2x2 − x3 = 5
    3x1 + x2 − 2x3 = 9
    −x1 + 4x2 + 2x3 = 0.
The approach used in simple systems like Example 3.4 will work for
Example 3.5 too, but with this and other, harder examples, it may not
always be clear how to proceed. Our new strategy describes and solves
linear systems more systematically, with greater clarity, and thus with
less scope for error.
We associate a matrix with our system in Example 3.5:

    [  1   2  −1   5 ]   eqn 1
    [  3   1  −2   9 ]   eqn 2
    [ −1   4   2   0 ]   eqn 3

(It is possible to solve Examples 3.4 and 3.5 using matrix inverses; see Section 3.5. However, this approach only applies if our linear system has a unique solution, and this is not true in all cases.)

1. Row i of this matrix comprises first the coefficients of the variables


and then the number on the right hand side of equation i.


2. Column i, 1 6 i 6 3, corresponds to the variable xi , and column


4 corresponds to the numbers on the right hand side of the whole
system.

Definition 3.6 (Augmented matrices). The above matrix is the aug-


mented matrix of the linear system in Example 3.5.

In Example 3.4, we performed operations such as:

1. multiply an equation by a non-zero constant, and


2. add one equation (or a non-zero constant multiple of one equation)
to another equation.

Such operations change the system of equations, but preserve the set of
simultaneous solutions of the system.
The operations correspond to the following operations on the augmented
matrix:

1. multiply a row by a non-zero constant, and


2. add a multiple of one row to another row.

We consider a third type, namely:

3. swap two rows in the matrix (this only amounts to writing down the
equations of the system in a different order).

Definition 3.7 (Elementary row operations (EROs)). Operations on


a matrix of these three types are called elementary row operations
(EROs).

Solution of Example 3.5. See the accompanying video.


Step 1
R3 −1 4 2 0
+
R1 1 2 −1 5
new R3 0 6 1 5

Step 2
R2 3 1 −2 9

3R1 3 6 −3 15
new R2 0 −5 1 −6

Step 3
R2 0 −5 1 −6
+
R3 0 6 1 5
new R2 0 1 2 −1

Step 4
R3 0 6 1 5

6R2 0 6 12 −6
new R3 0 0 −11 11

Step 5
R3 0 0 −11 11
− 11 R3 0 0
1
1 −1

We have produced a new, simpler, system of equations:

    x1 + 2x2 − x3 = 5    (A)
         x2 + 2x3 = −1   (B)
               x3 = −1   (C).

(The point of using EROs is that the system of equations gets simpler, but the solutions of the system are preserved. So the solutions of the final system are the same as the solutions of the original system.)

This is easily solved using back-substitution.



 (C ) x3 = −1

Back-substitution (B) x2 = −1 − 2x3 ⇒ x2 = 1

(A) x1 = 5 − 2x2 + x3 ⇒ x1 = 2.

So the solution is (2, 1, −1), or x1 = 2, x2 = 1, x3 = −1. Check that this


is a solution to the original system. It is the only solution, both of the
final system and of the original one (and every intermediate one). 
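Back-substitution is simple to automate. A sketch (not part of the module; numpy assumed) of the step above:

    import numpy as np

    # rows of [coefficients | right-hand side] for the row-echelon system (A), (B), (C)
    R = np.array([[1.0, 2.0, -1.0,  5.0],
                  [0.0, 1.0,  2.0, -1.0],
                  [0.0, 0.0,  1.0, -1.0]])

    x = np.zeros(3)
    for i in (2, 1, 0):                          # last equation first
        x[i] = R[i, 3] - R[i, i+1:3] @ x[i+1:3]  # each leading coefficient is 1
    print(x)                                     # [ 2.  1. -1.]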


3.3 Row-echelon and reduced row-echelon


forms
The matrix obtained in Step 5 above is in row-echelon form.

Definition 3.8 (Row echelon forms). A matrix A is said to be in row-


echelon form (REF), if

1. The first non-zero entry in each row is a 1 (called a leading 1).

2. If a column contains a leading 1, then every entry of the column


below the leading 1 is a zero.

3. As we move downwards through the rows of the matrix, the


leading 1s move from left to right.

4. Any zero rows (rows consisting entirely of 0s) are grouped to-
gether at the bottom of the matrix.

Equivalently, we say that A is a row-echelon matrix.

Example 3.9. The matrix


 
1 2 −1 5
2 −1 
 
 0 1
0 0 1 −1

at the end of the solution to Example 3.5 is in row-echelon form.

Example 3.10. The following are row-echelon matrices


     
1 4 7 1 −2 1 4 0 1 −7 3
 0 1 −9  ,  0 1 −10  ,
     
0 0 1  and  0 0
0 0 1 0 0 0 0 0 0 0 1

while those below are not


     
1 −2 8 1 3 3 −15 0 0 0
2 6 ,  0 0 0  0 1 0 .
     
 0 1  and
0 0 1 0 0 0 6 1 0 0

Exercise 3.11. Explain why the second set of matrices in Example


3.10 are not in row-echelon form.

Gauss-Jordan elimination
Theorems 3.12 and 3.15 lie at the heart of the new strategy for solving
linear systems. The proof of the first is algorithmic, as it describes a
method that can be implemented. We include the proof as it is of con-
siderable practical use when solving linear systems. There is a video to
accompany the proof, in which we apply the algorithm in the proof to
Example 3.13.

Theorem 3.12. Every m × n matrix can be reduced, via a series of


EROs, to a row-echelon matrix.

Proof. If A is the zero m × n matrix then there is nothing to do as A This algorithm gen-
is already in row-echelon form. If A is non-zero, look in A for the first erates a sequence of
EROs that is guar-
column (from the left) having a non-zero entry. This column is called the anteed to produce a
pivot column, and the first non-zero entry in the pivot column is called row-echelon matrix.

the pivot. Suppose the pivot column is column j and the pivot occurs in
However, this sequence
of EROs not the only
row i. Now interchange, if necessary, rows 1 and i and call the resulting one that does this! In

matrix B:
practice, in specific
cases it can be better to
R1 ↔ Ri A → B. use a different sequence
of EROs – see marginal
notes accompanying
Thus the pivot B1j is non-zero. Now perform the ERO Example 3.17.
Also be aware that, after
an alternate reduction,
1
R1 → × R1 B → C. you may get a REF that
B1j is different from the one
that the theorem pro-
vides. Such REFs are
Note that C1j = 1. Now whenever Ckj , 2 6 k 6 m, is non-zero, perform not unique!
the ERO
Rk → Rk − Ckj × R1


on C . Denote the resulting matrix by D. It follows that in D, the elements


in column j, in rows 2 to m, are zero:

D2j = D3j = . . . = Dmj = 0.

Next, consider the (m − 1) × n submatrix A0 of D obtained by deleting


the first row of D. Repeat the procedure above with A0 instead of A.
Continuing in this way, we obtain a matrix in row echelon form.

Example 3.13. Reduce the matrix A below to row-echelon form.


 
2 4 −2 0 6
A =  1 2 −1 4 8 .
 

−2 −4 9 11 −15

Solution. We implement the algorithm in the proof of Theorem 3.12.


See accompanying video. The REF obtained is
 
    [ 1  2  −1    0     3   ]
    [ 0  0   1  11/7  −9/7 ]
    [ 0  0   0    1    5/4 ]

Definition 3.14 (Reduced row-echelon forms). A matrix is in reduced


row-echelon form (RREF) if

1. it is in row-echelon form, and

2. if a particular column contains a leading 1, then all other entries


of that column are 0s.

We state without proof the next result.

Theorem 3.15. Every m × n matrix can be reduced, via a series of


Row-echelon forms of
matrices need not be
unique. However, for EROs, to a unique reduced row-echelon matrix.
any given A, there is
only one reduced row-
echelon form!
Once we have a row-echelon form of a matrix, we can use additional
EROs to obtain the reduced row-echelon form.

Example 3.16. Reduce the matrix A below to reduced row-echelon


form.  
2 4 −2 0 6
A =  1 2 −1 4 8 .
 

−2 −4 9 11 −15

Solution. See accompanying video. The RREF in this case is


 
    [ 1  2  0  0   −1/4  ]
    [ 0  0  1  0  −13/4 ]
    [ 0  0  0  1    5/4 ]

So every matrix can be reduced to reduced row-echelon form. But what


is the use of this? Let’s return to Example 3.5.

Example 3.17. Consider again the system

x1 + 2x2 − x3 = 5
3x1 + x2 − 2x3 = 9
−x1 + 4x2 + 2x3 = 0

in Example 3.5, reduce the augmented matrix of the system to reduced


row-echelon form, and hence solve the system (again).

Solution. See accompanying video. On the video, we begin at step In the video accompa-
nying Example 3.5, we
5, which the the final step on the video accompanying Example 3.5. did not follow the al-
gorithm in the proof of
The RREF in this case is Theorem 3.12 exactly. In
  step 2, we performed
1 0 0 2 R2 → R2 + R3, in-
stead of R2 → − 51 R2
,
 
 0 1 0 1 (the ERO suggested by

0 0 1 −1
the algorithm). Both
EROs turn the pivot in
column 2 into a 1, but
the first ERO avoids
fractions (which can in-
which corresponds to the linear system
crease errors in arith-
x1 = 2 metic).

x2 = 1
x3 = −1.


This system is easier to solve! These x1 , x2 and x3 agree with our


solution to Example 3.5. 

Using EROs in this way preserves the set of solutions of the system.
By reducing to the RREF, we have passed from a system that is hard to
solve to another that is trivial to solve. Moreover, the set of solutions
has been preserved, so the solution to the final system is the same as
the solution to the original system (and of every intermediate one).
This method of solving linear systems by reducing the augmented matrix
to reduced row-echelon form is called Gauss–Jordan elimination.
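Row reduction is also something software does routinely. A sketch (not part of the module): here the sympy library (assumed available) is used because its rref() works with exact fractions, like the hand calculation, applied to the matrix A of Examples 3.13 and 3.16.

    from sympy import Matrix

    aug = Matrix([[ 2,  4, -2,  0,   6],
                  [ 1,  2, -1,  4,   8],
                  [-2, -4,  9, 11, -15]])
    rref, pivot_columns = aug.rref()
    print(rref)            # rows (1, 2, 0, 0, -1/4), (0, 0, 1, 0, -13/4), (0, 0, 0, 1, 5/4)
    print(pivot_columns)   # (0, 2, 3): the columns containing leading 1s (counted from 0)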

3.4 Parametric solutions and inconsistent


systems
Parametric solutions
In Examples 3.4 and 3.5, we obtained unique solutions of the systems,
i.e., in both cases there is only one combination of x and y (in Example
3.4) or x1 , x2 and x3 (in Example 3.5) that solves the system. However,
very often there is more than one solution to a given system.

Example 3.18. Find the solution to

2x1 + 4x2 − 2x3 = 6


x1 + 2x2 − x3 + 4x4 = 8
−2x1 − 4x2 + 9x3 + 11x4 = −15.

Solution. The augmented matrix of this system is the matrix A in Examples 3.13 and 3.16. Using Gauss-Jordan elimination as above, we obtain the RREF

    [ 1  2  0  0   −1/4  ]
    [ 0  0  1  0  −13/4 ]
    [ 0  0  0  1    5/4 ]

(Carl Friedrich Gauss (1777 – 1855) is one of the greatest mathematicians of all time. He introduced this elimination technique to solve linear systems arising from the analysis of numerical data from planetary observations. Image source: Wikipedia.)

This corresponds to a new linear system

x1 + 2x2 = − 14
x3 = − 13
4
x4 = 5
4
.

The RREF involves 3 leading 1s, one in each of the columns corre-
sponding to the variables x1 , x3 and x4 . The column corresponding to
x2 contains no leading 1. We distinguish between these cases.
A variable whose associated column in the RREF contains a leading
1 is called a leading variable. A variable whose column in the RREF
does not contain a leading 1 is called a free variable. In this case
x1 , x3 and x4 are leading variables and x2 is free.
To each free variable we associate a parameter. Here, we let x2 = t.
Then we can express the remaining leading variables in terms of these
parameters to get a parametric solution:

    x1 = −1/4 − 2t,   x2 = t,   x3 = −13/4,   x4 = 5/4.

(Here, x3 and x4 don’t depend on t, but there are situations where all the variables depend on parameters. Dependency on parameters varies from case to case.)

from case to case.
solutions of the system.

Above, we get a particular solution by choosing a specific numerical value for t: e.g. if t = 1, then x1 = −9/4, x2 = 1, x3 = −13/4 and x4 = 5/4.
In general, we can have any number of free variables. To each one we
should associate a different parameter, so that it does not depend on
any of the others. Section 5.3 and the homework assignments contain
examples of systems having more than one parameter.

Inconsistent systems
We have dealt with systems having infinitely many solutions, by virtue
of parameters. It is also possible for a given system to have no solutions
at all.


Example 3.19. Show that the system

x1 + 2x2 + 3x3 = 5
2x1 + 5x2 + 7x3 = 13
3x2 + 3x3 = 10

has no solutions at all.

Solution. The system of equations corresponding to the REF (see


video) has as its third equation

0x1 + 0x2 + 0x3 = 1 i.e. 0 = 1.

This equation clearly has no solutions - no assignment of numerical


values to x1 , x2 and x3 will make the value of the expression 0x1 +0x2 +
0x3 equal to anything but 0. Hence the system has no solutions. 

Definition 3.20 (Inconsistent systems). A linear system is called in-


consistent if it has no solutions. A system which has at least one
solution is called consistent.

(An inconsistent system will always betray itself in this way: this is how we spot them!)

If a system is inconsistent, a REF obtained from its augmented matrix will include a row of the form 0 0 0 . . . 0 1, i.e. it will have a leading 1 in
we spot them! its rightmost column. Such a row corresponds to an equation of the form
0x1 + 0x2 + · · · + 0xn = 1, which has no solution.

Example 3.21. Consider the system 4x + 2y = 0 and 2x + y = 3. The


solution sets of these equations form two parallel lines which never
meet. This corresponds to the fact that there is no simultaneous
solution to both equations.

Figure 3.3: The solution sets of 4x + 2y = 0 and 2x + y = 3 never meet


y

x
−2 2

A summary of linear systems


We end this section by summarising the possible types of solution that
a linear system can have.

1. Unique solution: this happens if the system is consistent and there


are no free variables. In this case the RREF obtained from the
augmented matrix has the form
 
1 0 0 ... 0 ∗
 0 1 0 ... 0 ∗ 
 
 
 0 0 1 ... 0 ∗ 
 
 .. .. .. . . .. .. 
 . . . . . . 
0 0 0 ... 1 ∗

with possibly some additional rows consisting entirely of 0s at the


bottom. The unique solution can be read from the rightmost column.
See the RREF from Example 3.17 (step 8 on the video).
2. Parametric solution: this happens if the system is consistent and has at least one free variable. Systems of this type have infinitely many solutions. See Example 3.18. (Note that if the number of variables n strictly exceeds the number of equations m, then the system never has a unique solution: either it is inconsistent, or at least one of the variables must be free, giving a parametric solution.)

3. No Solutions: the system may be inconsistent, i.e., it has no solutions. This happens if a REF obtained from the augmented matrix has a leading 1 in its rightmost column:
 
..
.
 .. 
 
 . 
0 0 0 0 0 1

See Example 3.19.

3.5 Connections between linear systems and


matrices
Linear systems as matrix equations
Consider a general linear system of m equations in n variables x1 , x2 , . . . ,
xn . Suppose that, in equation i, the coefficient of the variable xj is given


by aij , and the right hand side is bi . Then our system can be written

a11 x1 + a12 x2 + . . . + a1n xn = b1


a21 x1 + a22 x2 + . . . + a2n xn = b2
.. .. ..
. . .
am1 x1 + am2 x2 + . . . + amn xn = bm .

Now consider the m × n matrix A whose (i, j)-entry is aij (so Aij = aij ),
and the n × 1 and m × 1 column vectors
   
x1 b1
 x2   b2 
   
x =  . 
  and b =  . 

,
.
 .   .. 
xn bm

respectively. Observe now that the m × 1 matrix product


  
a11 a12 . . . a1n x1
 a21 a22 . . . a2n   x2 
  
Ax =  .. .. ... ..   .. 
 

 . . .  . 
am1 am2 . . . amn xn
 
a11 x1 + a12 x2 + · · · + a1n xn
a21 x1 + a22 x2 + · · · + a2n xn
 
 
= 
 .. 

 . 
am1 x1 + am2 x2 + · · · + amn xn

encapsulates the left hand side of our whole system, and thus our system
of m equations can be rewritten as the single matrix equation

Ax = b.

The matrix A is called the coefficient matrix of the system. If A is a


n × n square matrix (i.e. if we have the same number of equations as
variables), and if A is invertible, then we can solve this equation using
matrix algebra:
x = In x = (A−1 A)x = A−1 b,
The fact that x = In x follows from Theorem 1.36.

Example 3.22. Consider once again the system in Example 3.5:

x1 + 2x2 − x3 = 5
3x1 + x2 − 2x3 = 9
−x1 + 4x2 + 2x3 = 0.

We will use matrix inverses to solve the system.

Solution. Here, m = n = 3 and


     
1 2 −1 x1 5
A =  3 1 −2  , x =  x2  and b =  9  .
     

−1 4 2 x3 0

In this case A is invertible and


 
    A−1 = (1/11) [ −10   8   3 ]
                 [   4  −1   1 ]
                 [ −13   6   5 ] ,

so

    x = (x1, x2, x3) = A−1 b = (1/11) [ −10   8   3 ] [ 5 ]     [  2 ]
                                      [   4  −1   1 ] [ 9 ]  =  [  1 ] ,
                                      [ −13   6   5 ] [ 0 ]     [ −1 ]

giving the solution x1 = 2, x2 = 1, x3 = −1, exactly as in Example


3.5. 
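In practice one lets the computer do this arithmetic. A sketch (not part of the module; numpy assumed):

    import numpy as np

    A = np.array([[ 1, 2, -1],
                  [ 3, 1, -2],
                  [-1, 4,  2]], dtype=float)
    b = np.array([5, 9, 0], dtype=float)
    print(np.linalg.inv(A) @ b)    # x = A^{-1} b = [ 2.  1. -1.]
    print(np.linalg.solve(A, b))   # same answer, without forming A^{-1} explicitly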

Theorem 3.23 gives a full description of the solutions of the matrix equa-
tion Ax = b, when m = n.

Theorem 3.23. Let A ∈ Mn (R) and let x and b be n×1 column vectors.

1. If A is invertible then the matrix equation Ax = b has the unique


solution x = A−1 b.

2. If A is not invertible then Ax = b either has no solutions (cor-


responding to an inconsistent system), or has infinitely many


solutions x (corresponding to a parametric solution).

The first part of this theorem was proved above, just before Example 3.22.
The proof of part 2 is beyond the scope of the module.
If A is not invertible, then the question of whether or not the equation
Ax = b has any solutions depends on the specific nature of A and b. This
problem must be approached on a case by case basis.

Computing matrix inverses by augmentation


Next, we present a second way to compute matrix inverses. The advan-
tage of the following method over that given in Chapter 1 (see Theorem
1.56) is that it is much more applicable for large values of n.
 
1 3 1
Example 3.24. Let A =  2 0 −1 . Find A−1 .
 

1 4 2

Solution. The inverse A−1 (if it exists) can be found using EROs as
follows.

1. Write down a 3 × 6 matrix B, whose first 3 columns comprise A


and whose second 3 columns comprise I3 :
 
1 3 1 1 0 0
B =  2 0 −1 0 1 0 
 

1 4 2 0 0 1

2. Use EROs to obtain the RREF corresponding to B:


 
1 3 1 1 0 0
 2 0 −1 0 1 0 
 

1 4 2 0 0 1
 
1 3 1 1 0 0
R2 → R2 − 2R1  0 −6 −3 −2 1 0 
 

R3 → R3 − R1 0 1 1 −1 0 1

 
1 3 1 1 0 0
R3 ↔ R2 1 −1 0 1
 
 0 1 
0 −6 −3 −2 1 0
 
R1 → R1 − 3R2 1 0 −2 4 0 −3
1 −1 0
 
 0 1 1 
R3 → R3 + 6R2 0 0 3 −8 1 6
 
1 0 −2 4 0 −3
1 −1 0
 
 0 1 1 
R3 → 13 R3 0 0 1 −3 3
8 1
2
 
R1 → R1 + 2R3 1 0 0 − 43 2
3
1
R2 → R2 − R3 − 3 −1 .
 5 1 
 0 1 0 3
0 0 1 −3 8 1
3
2

3. In this RREF, each of the first 3 columns contains a leading 1,


and the first 3 columns comprise the 3 × 3 identity matrix I3 .
As if by magic, the 3 × 3 matrix consisting of the last three
columns is the inverse of A:
   
    A−1 = [ −4/3    2/3    1 ]         [ −4   2   3 ]
          [  5/3   −1/3   −1 ] = (1/3) [  5  −1  −3 ] .
          [ −8/3    1/3    2 ]         [ −8   1   6 ]

The procedure outlined in the example above is called finding matrix


inverses by augmentation, and extends to general n × n matrices.
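A quick numerical check of the answer in Example 3.24 (a sketch only, not part of the module; numpy assumed):

    import numpy as np

    A = np.array([[1, 3,  1],
                  [2, 0, -1],
                  [1, 4,  2]], dtype=float)
    A_inv = np.linalg.inv(A)
    print(3 * A_inv)                           # [[-4, 2, 3], [5, -1, -3], [-8, 1, 6]]
    print(np.allclose(A @ A_inv, np.eye(3)))   # True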

Theorem 3.25. Let A ∈ Mn (R), form the n × 2n matrix B = (A | In ),


and use EROs to reduce B to its unique RREF (P | Q), where P, Q ∈
Mn (R) (possible by Theorem 3.15). If P = In , then A is invertible and
A−1 = Q. If P 6= In , then A is not invertible.

The proof of Theorem 3.25 is again beyond the scope of the module.


 
1 2 −1
Example 3.26. Does A =  0 1
 
1  have an inverse?
1 4 1
 
1 2 −1 1 0 0
Solution. 1. Form the 3 × 6 matrix B =  0 1
 
1 0 1 0 .
1 4 1 0 0 1

2. Use EROs to reduce B to RREF:


 
1 2 −1 1 0 0
 
 0 1 1 0 1 0 
1 4 1 0 0 1
 
1 2 −1 1 0 0
 
 0 1 1 0 1 0 
R3 → R3 − R1 0 2 2 −1 0 1
 
R1 → R1 − 2R2 1 0 −3 1 −2 0
 
 0 1 1 0 1 0 
R3 → R3 − 2R2 0 0 0 −1 −2 1
 
1 0 −3 1 −2 0
 
 0 1 1 0 1 0 
R3 → −R3 0 0 0 1 2 −1
 
R1 → R1 − R3 1 0 −3 0 −4 1
0 .
 
 0 1 1 0 1
0 0 0 1 2 −1

The left hand half of the RREF is not I3 , so A is not invertible. 

Matrix rank
Finally, we take a very brief look at the rank of a matrix.

Definition 3.27 (Matrix rank). The rank of a matrix A, denoted rank(A),


is the number of non-zero rows of any REF of A.

While a matrix can have more than one REF in general, the number of
non-zero rows of any such REF will always be the same (we won’t prove
this fact).
The notion of matrix rank can give us information about the number of
solutions and the number of parameters of solutions of systems of linear
equations.
Imagine that we have a system of equations in n variables. The system
is consistent, i.e. has a solution, if and only if the rank of the coefficient
matrix A of the system equals the rank of the augmented matrix. If this
is the case then we have more. We always have rank(A) ≤ n. Moreover,
if rank(A) = n then the system has a unique solution. Otherwise, the
system has infinitely many solutions, and the parametric solution will
involve n − rank(A) ≥ 1 parameters.
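As an aside, computer packages compute ranks numerically. The sketch below (not part of the module) uses numpy's matrix_rank to classify a system in exactly the manner described above; the right-hand side b is an invented example.

import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  1.0],
              [1.0, 4.0,  1.0]])      # coefficient matrix from Example 3.26
b = np.array([1.0, 2.0, 5.0])         # an illustrative right-hand side (my choice)

rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))

if rank_A < rank_aug:
    print("inconsistent: no solutions")
elif rank_A == A.shape[1]:
    print("unique solution")
else:
    print(f"infinitely many solutions with {A.shape[1] - rank_A} parameter(s)")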

Chapter 4

Vector Geometry 2

4.1 Orthonormal bases of Rn and coordinate systems
In this chapter, we are going to delve slightly more deeply into the theory
of vectors. Hold on tight.

Orthonormal lists of vectors

Warning 4.1. Throughout this chapter, we will be considering lots of


lists of vectors v1 , . . . , vk in Rn . Such lists of vectors will be denoted
in bold face, to distinguish them from the entries of a single vector
v = (v1 , . . . , vk ) in Rk . Do not confuse the two – doing so will cause
much anguish!

Definition 4.2 (Orthonormal lists of vectors). Let v1 , . . . , vk be a list


of vectors in Rn . We say that this list is orthonormal, or that the
vectors are orthonormal, if they satisfy two criteria:

1. ‖vi‖ = 1 for all i in the range 1 ≤ i ≤ k;

2. vi · vj = 0 whenever 1 ≤ i, j ≤ k and i ≠ j.

A list of vectors is orthonormal if each vector has unit length, and if each
vector is orthogonal to every other vector in the list. Below are the
archetypal examples of orthonormal lists of vectors.


Example 4.3.

1. In R2, set e1 = (1, 0) and e2 = (0, 1). It is easy to verify that ‖e1‖ = ‖e2‖ = 1 and e1 · e2 = 0. Thus e1, e2 is an orthonormal list of vectors in R2.

2. Likewise, in R3, set e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1). As above, one can verify that each vector has unit length, and that they are mutually orthogonal to each other.

3. More generally, fix a positive integer n. For each i between 1 and n, define ei = (0, . . . , 0, 1, 0, . . . , 0) in Rn, where the single 1 is the ith entry of the vector. Then the list e1, . . . , en is an orthonormal list of vectors of Rn. Evidently, each ei has unit length, and ei · ej = 0 whenever i ≠ j, because the 1s in the respective vectors appear in different places.

(It is clear from this example that the definition of the ei changes according to the value of n, but the context should make clear which list of ei one is dealing with.)

Figure 4.1: The vectors e1, e2, e3 in R3. The vectors e1, e2 and e3 point along the positive x-, y- and z-axes respectively, and are called the standard basis vectors in R3.

You should verify that the next examples yield orthonormal lists of vectors.

Example 4.4.

1. The vectors v1 = (1/√2)(1, 1) and v2 = (1/√2)(−1, 1) form an orthonormal list in R2 (you should draw these vectors on a piece of paper).

2. The vectors v1 = (1/√2)(1, −1, 0), v2 = (1/√6)(1, 1, −2) and v3 = (1/√3)(1, 1, 1) form an orthonormal list in R3.

Remarks 4.5. Notice that if v has unit length, then the 1-element list
v is orthonormal. Criterion (1) of Definition 4.2 is evidently fulfilled.
Criterion (2) is also fulfilled, because it can only fail if we can find
two vectors in the list that are not orthogonal to each other!

Let us consider further the list e1 , . . . , en in Rn from Example 4.3 (3). Take
another vector x = (x1 , . . . , xn ) in Rn . The ith entry of x is the number xi .
Observe that
x · ei = (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ) · (0, . . . , 0, 1, 0, . . . , 0) = xi .
Thus, the entries xi of x are equal to the scalar products x · ei , 1 ≤ i ≤ n.
Observe further that
x1 e1 + · · · + xn en = x1 (1, 0, . . . , 0) + · · · + xn (0, . . . , 0, 1)
= (x1 , 0, . . . , 0) + · · · + (0, . . . , 0, xn )
= (x1 , . . . , xn ) = x.

Therefore, we can write the vector x as the following sum of n vectors


x = (x · e1 )e1 + · · · + (x · en )en . (4.1)

This equality holds for every vector x in Rn . It turns out that (4.1) is
part of a wider pattern exhibited by certain special lists of orthonormal
vectors.

Orthonormal bases and coordinate systems


Hereafter, we shall abbreviate orthonormal to on. The following theorem
yields some deep facts about lists of on vectors. Its proof uses a hefty
slice of more advanced theory from linear algebra, and is too rich for this
module.

Theorem 4.6. Let v1 , . . . , vk be a list of on vectors in Rn . The following


statements hold.

1. The length of the list cannot exceed the dimension of Rn , i.e. k ≤ n.


2. If k = n, then given any vector x in Rn , we have

x = (x · v1 )v1 + · · · + (x · vn )vn . (4.2)

3. If k < n, then we can extend the list v1 , . . . , vk by new vectors


vk+1 , . . . , vn , in such a way that the extended list

v1 , . . . , vn ,

is again an on list of vectors in Rn . Necessarily, this extended


list satisfies part (2).

We will examine the consequences of Theorem 4.6 one by one. Part (1)
of the theorem imposes a limit on the length of lists of on vectors in Rn .
For example, it is not possible to have a list of 4 on vectors in R3 .
Theorem 4.6 part (2) requires more attention. First, notice that (4.1) is a
special case of (4.2), applied to the vectors e1 , . . . , en . Next, consider the
following definitions, which are prompted by this part of the theorem.

Definition 4.7 (Orthonormal bases and coordinates). An on list of


vectors v1 , . . . , vn in Rn is called an orthonormal basis or an on basis
of Rn .
Given x in Rn , the numbers x · v1 , . . . , x · vn , as in (4.2) above, are the
coordinates of x, with respect to the basis v1 , . . . , vn .
In particular, the list of vectors e1 , . . . , en from Example 4.3 (3) is
called the standard on basis of Rn .

Let’s see how Theorem 4.6 part (2), (4.2), and the ideas in Definition 4.7
apply in some examples.

Example 4.8.

1. Let x = (2, 4). The coordinates of x, with respect to the standard on basis e1, e2 of R2, are 2 and 4. Now consider v1, v2 from Example 4.4 (1). Because the length of this on list equals the dimension of R2, it is an on basis of R2, according to Definition 4.7. The coordinates of the same vector x are

x · v1 = (2, 4) · (1/√2)(1, 1) = 6/√2 = 3√2

and x · v2 = (2, 4) · (1/√2)(−1, 1) = 2/√2 = √2,

with respect to v1, v2. Moreover, we can see that

3√2 v1 + √2 v2 = 3√2 · (1/√2)(1, 1) + √2 · (1/√2)(−1, 1) = (3, 3) + (−1, 1) = (2, 4) = x.

(There is a video to accompany this example.)

2. Let x = (3, 4, 5). The coordinates of x with respect to the standard on basis e1, e2, e3 of R3 are 3, 4 and 5, respectively. Now consider v1, v2, v3 from Example 4.4 (2). This is an on basis of R3. The coordinates of x with respect to this basis are

x · v1 = −1/√2,  x · v2 = −3/√6  and  x · v3 = 12/√3.

As above, we can verify that

(x · v1)v1 + (x · v2)v2 + (x · v3)v3 = x.
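A quick computational check of Example 4.8 (2), for the curious (not part of the module): the coordinates with respect to an on basis are the scalar products x · vi, and the vector is recovered from them via (4.2).

import numpy as np

v1 = np.array([1, -1, 0]) / np.sqrt(2)
v2 = np.array([1, 1, -2]) / np.sqrt(6)
v3 = np.array([1, 1, 1]) / np.sqrt(3)
x = np.array([3.0, 4.0, 5.0])

coords = [x @ v for v in (v1, v2, v3)]
print(coords)                          # approx [-0.7071, -1.2247, 6.9282]
reconstructed = sum(c * v for c, v in zip(coords, (v1, v2, v3)))
print(np.allclose(reconstructed, x))   # True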

Figure 4.2: The coordinates of x with respect to v1 and v2. The figure shows x = (2, 4) decomposed as 3√2 v1 = (3, 3) plus √2 v2 = (−1, 1).

Let v1 , . . . , vn be an on basis of Rn . For every i between 1 and n, we


can associate a coordinate axis running parallel to the vector vi . As
the vectors in the list are mutually orthogonal, so are the corresponding
axes. The coordinate axes associated with the standard on basis of Rn
are simply the usual coordinate axes that we are familiar with.
Given a vector x in Rn , the coordinates x · v1 , . . . , x · vn of x defined above
are precisely the numbers that we would obtain if we measured the
coordinates of x with respect to the new set of coordinate axes, rather


than the usual ones. We consider this in the figure below, where x, v1
and v2 are as in Example 4.8 (1).

Figure 4.3: A change of coordinate axes. With respect to the axes determined by v1 and v2, the coordinates of x are x · v1 = 3√2 and x · v2 = √2. (See the accompanying short video.)

Changing axes in this way yields a different coordinate system. In applications, sometimes it is better to change the coordinate system, because doing so allows us to make more sense of the problem at hand. (This happens all the time in physics for example, when we have to switch between different observers who have their own local coordinate systems. You will also see it in action in Section 6.1 and Appendix B.)

Finally, we consider Theorem 4.6, part (3). Observe that if k < n, then (4.2) does not apply to every vector in Rn.
Example 4.9. Let v1 and v2 be the first two vectors in Example 4.4
(2). Certainly, v1 , v2 form an orthonormal list in R3 . Let x be the third
vector in that example. Then

(x · v1 )v1 + (x · v2 )v2 = 0v1 + 0v2 = 0,

which is certainly not equal to x. Thus (4.2) fails in this case.

More generally, if k < n, then given any extension of v1, . . . , vk to v1, . . . , vn, as in Theorem 4.6 (there are always at least two extensions, and often infinitely many), then

(vk+1 · v1)v1 + · · · + (vk+1 · vk)vk = 0v1 + · · · + 0vk = 0,

which does not equal vk+1, as ‖vk+1‖ = 1 ≠ 0. Thus (4.2) will always fail for some vectors.

4.2 Orthonormal bases and orthogonal matrices
All that sounds lovely, but how do we change from one coordinate system
to another in practice? This can be done using orthogonal matrices.
Take an on basis v1 , . . . , vn of Rn , and let’s consider them as n × 1 column
vectors, so that we can apply matrices to them, as in Section 2.4. Let
P ∈ Mn (R) be an orthogonal matrix, and let’s make a new list of vectors
w1, . . . , wn in Rn, by setting wi = Pvi, 1 ≤ i ≤ n. At this point we resurrect Proposition 2.29, because it tells us that w1, . . . , wn is also an on basis. Evidently, the length of the new list equals the dimension of Rn. We simply have to verify that it fulfils Definition 4.2. This is true, because

‖wi‖ = ‖vi‖ = 1 and wi · wj = (Pvi) · (Pvj) = vi · vj = 0,

whenever 1 ≤ i, j ≤ n are distinct.

Example 4.10. Consider the matrix P ∈ M3 (R), back in Example 1.51,


and the standard on basis e1 , e2 , e3 of R3 . For i between 1 and 3,
define wi = Pei . Show that w1 , w2 , w3 is the same as the on basis
v1 , v2 , v3 of R3 given in Example 4.4 (2).

Solution. Treating these things as 3 × 1 column vectors, we have

w1 = Pe1 = [  1/√2  1/√6  1/√3 ] [ 1 ]   [  1/√2 ]
           [ −1/√2  1/√6  1/√3 ] [ 0 ] = [ −1/√2 ] = v1.
           [  0    −2/√6  1/√3 ] [ 0 ]   [  0    ]

Notice that v1 is just the first column of P! Likewise, w2 = Pe2 is the second column of P, which equals v2, and similarly for w3.

The above example generalises to any number of dimensions. Suppose


we start with the standard basis of Rn , with its associated standard
coordinate system, and want to switch to a new coordinate system, based
on w1 , . . . , wn , where wi = Pei , 1 ≤ i ≤ n, and P ∈ Mn (R) is orthogonal.
We make two observations. First, the vector wi , as a column vector, is
simply the ith column of P.
Second, let x = (x1 , . . . , xn ) be a vector in Rn . Its coordinates with respect
to the standard system are just x1 , . . . , xn . The coordinates of x with


respect to the new system are given by x·w1 , . . . , x·wn . Using Proposition
2.28, notice that
x · wi = x · (Pei ) = (P T x) · ei ,
and this is the ith entry of the vector P T x, because the ith entry of ei
equals 1 and all other entries of ei are zero. Hence the coordinates of
x with respect to the new system are precisely the entries of the vector
P T x. Let’s see this in action.

Example 4.11. Recall Examples 4.10 and 4.8 (2). Show that the coordinates of the vector x = (3, 4, 5) with respect to w1, w2, w3 are the entries in Pᵀx.

Solution. From Example 4.8 (2), we know that the new coordinates are −1/√2, −3/√6 and 12/√3. Computation of Pᵀx yields

Pᵀx = [ 1/√2  −1/√2   0    ] [ 3 ]   [ −1/√2 ]
      [ 1/√6   1/√6  −2/√6 ] [ 4 ] = [ −3/√6 ]
      [ 1/√3   1/√3   1/√3 ] [ 5 ]   [ 12/√3 ],

as required.
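Again, for the curious, a short numpy check of Example 4.11 (not part of the module): the new coordinates are simply the entries of Pᵀx.

import numpy as np

P = np.array([[ 1/np.sqrt(2), 1/np.sqrt(6), 1/np.sqrt(3)],
              [-1/np.sqrt(2), 1/np.sqrt(6), 1/np.sqrt(3)],
              [ 0,           -2/np.sqrt(6), 1/np.sqrt(3)]])
x = np.array([3.0, 4.0, 5.0])

print(P.T @ x)                             # approx [-0.7071, -1.2247, 6.9282]
print(np.allclose(P @ P.T, np.eye(3)))     # True: P is orthogonal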

We finish the chapter with a reprise of Exercise 2.32.

Exercise 4.12. Let x be a vector in Rn, let v1, . . . , vk, 1 ≤ k ≤ n, be an on list of vectors in Rn, and let y be defined by

y = (x · v1)v1 + · · · + (x · vk)vk.

Let a1, . . . , ak be numbers, and let

z = a1 v1 + · · · + ak vk.

Show that x − y and y − z are orthogonal, regardless of the values of a1, . . . , ak (hint: first show that (x − y) · vi = 0 for 1 ≤ i ≤ k). As in Exercise 2.32, deduce that ‖x − y‖ ≤ ‖x − z‖.

(To a grizzled ancient like me, such an exercise is ‘interesting’. ‘Interesting’ exercises can be difficult for first-timers, so don’t worry overmuch if you find it tricky!)

If k = 1 then Exercise 4.12 boils down to Exercise 2.32 (just set v = v1 and a = a1). It turns out that the vector y in Exercise 4.12 is the orthogonal projection of x onto the vector subspace generated by the vectors v1, . . . , vk, and is the closest vector in the subspace to x.

Orthogonal projections onto subspaces are connected to the idea of least


squares solutions of linear systems.
More about subspaces and orthogonal projections can be found in Ap-
pendix C.2.

Chapter 5

Eigenvalues and Eigenvectors of Matrices

5.1 Eigenvalues and eigenvectors


Eigenvalues occur naturally in many physical and engineering problems.
A striking example is the collapse of Tacoma Narrows Bridge (built in
1940, collapsed in 1940). It is thought that the frequency of the wind
was too close to the ‘natural frequency’ of the bridge (the frequency at
which the bridge oscillates itself), causing an amplification effect which
destroyed the bridge. The natural frequency of the bridge is an example of an eigenvalue. (The collapse can be seen e.g. on YouTube.)

Another example of the use of eigenvalues and eigenvectors, this time


in the context of high volumes of data, is Google’s PageRank algorithm,
used in its search engine. There is a special eigenvector, the entries of
which can be used to rank search results.

Examples and definitions


In this chapter, we are going to consider vectors as column vectors, so
that we can take advantage of matrix multiplication, as in Section 2.4.
 
Example 5.1. The matrix

A = [ 0 1 ]
    [ 1 0 ]

reflects points in the line y = x:

A(a, a)ᵀ = (a, a)ᵀ = 1 · (a, a)ᵀ   and   A(a, −a)ᵀ = (−a, a)ᵀ = (−1) · (a, −a)ᵀ

for any a ∈ R.

Figure 5.1: Reflection matrices. The point (2, −2) is mapped by A to (−2, 2).

Definition 5.2 (Eigenvalues and eigenvectors). Let A ∈ Mn (R) and let


x be a non-zero n × 1 column vector (so not all of the entries of x are
zero). Then x is called an eigenvector of A if

Ax = λx

where λ is some number.


In this situation, λ is called an eigenvalue of A, and λ and x are said to correspond to each other.

(David Hilbert (1862 – 1943) was arguably the first to use the German word eigen to denote eigenvalues and eigenvectors in 1904. Hilbert was a pivotal figure in 19th and 20th century mathematics. Image source: Wikipedia.)

Example 5.3. If A is the reflection matrix above then 1 is an eigenvalue of A, having corresponding eigenvector, say (1, 1)ᵀ. Also, −1 is an eigenvalue of A, having eigenvector, say (1, −1)ᵀ.

Remarks 5.4. Notice that eigenvectors are not unique. If x is an


eigenvector of A, then any non-zero scalar multiple of x is also an
eigenvector (corresponding to the same eigenvalue). Indeed, given a

number c, we have

A(cx) = c(Ax) = c(λx) = λ(cx).


Hence, provided c is non-zero, cx is also an eigenvector, having the same eigenvalue λ. (It is convention to use the Greek letter λ (or µ or ν) to denote eigenvalues.)

Warning 5.5. In this module, we have dealt exclusively with real num-
bers. Our matrices and vectors have been composed entirely of such
numbers. It turns out that some matrices, consisting entirely of real
numbers, can have eigenvalues that are complex numbers, that is,
numbers of the form a + ib, where a, b ∈ R and i is an imaginary
number satisfying i2 = −1. Moreover, in such cases, the correspond-
ing eigenvectors consist of complex numbers in general.
For example, it turns out that the rotation matrix Rθ from Example
1.2 (2) has eigenvalues cos θ ± i sin θ, which are not real numbers in
general (they are real only if θ is an integer multiple of π).
Hereafter, all the eigenvalues of the matrices considered in examples
will be real!

Seeking eigenvalues
Given a general matrix A ∈ Mn (R), how can we find its eigenvalues and
eigenvectors? The next theorem tells us how to find both. In practice, we
usually find the eigenvalues before finding the corresponding eigenvec-
tors.

Theorem 5.6. Let A ∈ Mn (R). Then λ is an eigenvalue of A if and


only if det(A − λIn ) = 0.

Proof. We are looking for non-zero n × 1 column vectors x and numbers


λ satisfying Ax = λx. This can be written in a slightly different way as

Ax − λx = 0,
where 0 denotes the n × 1 zero column vector. Hence,
0 = Ax − λx = Ax − λIn x by Theorem 1.36


= (A − λIn)x,

where A − λIn is an n × n matrix.

By Theorem 3.23, this matrix equation has a unique solution if A−λIn has
an inverse, and either no solutions or infinitely many solutions otherwise.
Notice that x = 0 is always a solution of this equation. Since we are
looking for non-zero solutions, we require A − λIn to be singular, i.e. not
invertible (if A − λIn were invertible, then the zero vector 0 would be the
only solution). If A − λIn is singular, then since 0 is always a solution,
we will have infinitely many solutions. In particular, we will have a non-
zero solution. Now A − λIn is singular if and only if det(A − λIn ) = 0, by
Theorem 1.56.

We put this to use on the matrix in Example 5.1.

Example 5.7. Determine the eigenvalues of A in Example 5.1.

Solution. We have

A − λI2 = [ 0 1 ] − [ λ 0 ] = [ −λ  1 ]
          [ 1 0 ]   [ 0 λ ]   [  1 −λ ].

Hence

det(A − λI2) = det [ −λ 1 ; 1 −λ ] = (−λ)(−λ) − 1 × 1 = λ² − 1.

So det(A − λI2) is a quadratic polynomial in λ. It factorises to give (λ − 1)(λ + 1), so det(A − λI2) = 0 if and only if λ = 1 or λ = −1.

5.2 The characteristic equation


Given a matrix A ∈ M2 (R), the expression det(A − λI2 ) will always be a
quadratic in λ. In general, if A ∈ Mn (R), the expression det(A − λIn ) will
be a polynomial of degree n in λ.

Definition 5.8. Let A ∈ Mn (R). The characteristic polynomial of A is


the determinant of the matrix A − λIn ∈ Mn (R). It is a polynomial of

degree n in λ. The equation

det(A − λIn ) = 0

is called the characteristic equation of A.

Thus, we can restate Theorem 5.6 as saying that the eigenvalues of A


are exactly the roots or the solutions of the characteristic equation of A.
Finding eigenvalues boils down to finding roots of polynomials.
In applications we often consider 3 × 3 matrices. The characteristic poly-
nomials of such matrices have degree 3, i.e. they are cubic polynomials.
By considering the quadratic formula (if necessary), it is easy to find the
roots of a quadratic polynomial. However, it is generally difficult to find
the roots of a polynomial of degree 3 and above.
In exercises and examples, we often look at polynomials having integer coefficients. These may not have integer roots at all, but if they do, the following proposition is very helpful and, in particular, can be applied when looking for eigenvalues of matrices having integer entries.

Proposition 5.9. Let

p(x) = an xⁿ + an−1 xⁿ⁻¹ + · · · + a1 x + a0,

be a polynomial having integer coefficients. If p(x) has any integer roots, then they must be factors of its constant term a0.

Proof. Assume that p(x) has an integer root r. Then

0 = p(r) = an rⁿ + an−1 rⁿ⁻¹ + · · · + a1 r + a0,

meaning

a0 = −(an rⁿ⁻¹ + an−1 rⁿ⁻² + · · · + a1)r.

Since the expression in parentheses is again an integer, we see that a0 is an integer multiple of r.

(The formulae for finding roots of cubic and quartic polynomials were discovered by Italian mathematicians in the 16th century. However, they rank among the most terrible things known to humanity (I dare you to look them up online if you haven’t seen them already). For this reason, we will not be making use of them in this module. Interestingly, the ‘cubic formula’ led to the conception of complex numbers. As it turns out, it is impossible for a ‘quintic formula’ to exist. This remarkable result was proved by the brilliant Norwegian mathematician Niels Henrik Abel (1802 – 1829). Abel died of tuberculosis aged only 26. Image source: Wikipedia.)

Example 5.10. Find the eigenvalues of

A = [ 5  6  2 ]
    [ 0 −1 −8 ]
    [ 1  0 −2 ].

Solution. The characteristic equation of A is

0 = det(A − λI3)

  = det [ 5−λ   6     2   ]
        [ 0    −1−λ  −8   ]
        [ 1     0    −2−λ ]

  = (5 − λ) det [ −1−λ  −8 ; 0  −2−λ ] + det [ 6  2 ; −1−λ  −8 ]

  = (5 − λ)(1 + λ)(2 + λ) + (−48 + 2(1 + λ))
  = (5 − λ)(λ² + 3λ + 2) + 2λ − 46
  = −λ³ + 2λ² + 15λ − 36.

The constant term of p(λ) = −λ³ + 2λ² + 15λ − 36 is −36. Thus, by Proposition 5.9, the only possible integer roots of p(λ) are

±1, ±2, ±3, ±4, ±6, ±9, ±12, ±18, and ±36.

We test some of these:

p(1) = −1 + 2 + 15 − 36 ≠ 0
p(−1) = 1 + 2 − 15 − 36 ≠ 0
p(2) = −8 + 8 + 30 − 36 ≠ 0
p(−2) = 8 + 8 − 30 − 36 ≠ 0
p(3) = −27 + 18 + 45 − 36 = 0.

Since 3 is a root of p(λ), we know that (3 − λ) is a factor of p(λ), and so we can find the other, quadratic factor:

−λ³ + 2λ² + 15λ − 36 = (3 − λ)(λ² + λ − 12).

This factorises further to

(3 − λ)(λ + 4)(λ − 3).

(We can find the roots of the quadratic factor easily, using the quadratic formula (MATH10040 Fact 1.19) if necessary.)

Thus the eigenvalues of A are λ1 = λ2 = 3 and λ3 = −4. Repeated roots (as we have here) are sometimes called degenerate.

Example 5.11. Find the eigenvalues of

A = [  8 −3 −3 ]
    [ −3  8 −3 ]
    [ −3 −3  8 ].

(This matrix arises out of considering how easily a cube will spin around a given axis; specifically, A is related to the inertia tensor of a cube.)

Solution. The characteristic equation of A is

0 = det(A − λI3)

  = det [ 8−λ  −3   −3  ]
        [ −3   8−λ  −3  ]
        [ −3   −3   8−λ ]

  = (8 − λ)[(8 − λ)² − 9] + 3[−3(8 − λ) − 9] − 3[9 + 3(8 − λ)]
  = (8 − λ)(λ² − 16λ + 55) + 6(3λ − 33)
  = −λ³ + 24λ² − 165λ + 242.

(Sometimes, as in this example, there are ‘smarter’ ways of evaluating such determinants using EROs, which lead to a factorisation more quickly and with fewer chances of errors in arithmetic. However, there was not enough time to present such methods in the module.)

The constant term of the resulting polynomial p(λ) is 242. After some testing as above, we find that p(2) = 0, hence λ − 2 is a factor. Upon further factorisation we are left with

0 = (11 − λ)(λ − 11)(λ − 2).

Thus, the three eigenvalues are

λ1 = λ2 = 11 and λ3 = 2,

(so λ1 and λ2 are degenerate).

5.3 Eigenvectors
We know now how to find eigenvalues. How do we find the correspond-
ing eigenvectors? The matrix equation (A − λIn )x = 0 in the proof of
Theorem 5.6 may be regarded as a system of linear equations in which
the coefficient matrix is A − λIn and the variables are the n entries of the
column vector x, which we can denote by x1 , . . . , xn – see Section 3.5.


Thus, to find eigenvectors, we are looking for non-zero solutions of

(A − λIn) [ x1 ]   [ 0 ]
          [ ⋮  ] = [ ⋮ ]
          [ xn ]   [ 0 ].

Example 5.12. We revisit Example 5.10. If

A = [ 5  6  2 ]
    [ 0 −1 −8 ]
    [ 1  0 −2 ],

then A has eigenvalues λ1 = λ2 = 3 and λ3 = −4. Find an eigenvector of A corresponding to the eigenvalue λ3 = −4.

Solution. We need a column vector x, having entries x1, x2, x3, not all zero, for which Ax = −4x or, equivalently

(A − (−4)I3)x = (A + 4I3)x = 0.

In other words,

[ 9 6  2 ] [ x1 ]   [ 0 ]
[ 0 3 −8 ] [ x2 ] = [ 0 ]
[ 1 0  2 ] [ x3 ]   [ 0 ].

We will use EROs to solve this system of linear equations. We know that the determinant of the matrix on the left is 0, so we will get either no solutions or a parametric solution, yielding infinitely many solutions. However, since the zero vector solves the system, we know therefore that we will get a parametric solution.

(This fact has a practical benefit when finding eigenvectors: if your linear system does not have a parametric solution, then you know that you have done something wrong! When finding eigenvectors, the number of parameters, i.e. the number of free variables, equals the number of zero rows in your RREF. Thus there must be at least one zero row in your RREF – if not then you have made a mistake.)

                    [ 9 3  2 | 0 ]   →   correction: start from
                    [ 9 6  2 | 0 ]
                    [ 0 3 −8 | 0 ]
                    [ 1 0  2 | 0 ]

   R1 ↔ R3          [ 1 0  2 | 0 ]
                    [ 0 3 −8 | 0 ]
                    [ 9 6  2 | 0 ]

   R3 → R3 − 9R1    [ 1 0   2 | 0 ]
                    [ 0 3  −8 | 0 ]
                    [ 0 6 −16 | 0 ]

   R3 → R3 − 2R2    [ 1 0  2 | 0 ]
                    [ 0 3 −8 | 0 ]
                    [ 0 0  0 | 0 ]

   R2 → (1/3)R2     [ 1 0    2 | 0 ]
                    [ 0 1 −8/3 | 0 ]
                    [ 0 0    0 | 0 ]

(In this sequence of EROs, the 4th column is zero and thus never changes. Hence, in future eigenvector examples, we suppress it (remembering that it is still there), to avoid writing unnecessary zeros. We can do this whenever the right hand side of a linear system is zero (but only in this case!).)

Thus x1 and x2 are leading variables and x3 is free: if x3 = t then x1 = −2t and x2 = (8/3)t, so

x = [ −2t    ]       [ −2  ]
    [ (8/3)t ] = t   [ 8/3 ]
    [  t     ]       [  1  ].

Any non-zero choice of t produces an eigenvector.

For instance, if t = 3 above then

x = [ −6 ]
    [  8 ]
    [  3 ]

will suffice, and you can verify that

[ 5  6  2 ] [ −6 ]   [  24 ]        [ −6 ]
[ 0 −1 −8 ] [  8 ] = [ −32 ] = −4 · [  8 ]
[ 1  0 −2 ] [  3 ]   [ −12 ]        [  3 ].
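The following numpy sketch (not part of the module) cross-checks Example 5.12; np.linalg.eig returns unit eigenvectors as the columns of its second output, so the result is rescaled to match the parametric solution above.

import numpy as np

A = np.array([[5.0, 6.0, 2.0],
              [0.0, -1.0, -8.0],
              [1.0, 0.0, -2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
i = np.argmin(eigenvalues)                       # pick out lambda_3 = -4
v = eigenvectors[:, i]
print(eigenvalues[i])                            # approximately -4
print(v / v[-1])                                 # rescaled: approximately (-2, 8/3, 1)
print(np.allclose(A @ v, eigenvalues[i] * v))    # True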


Example 5.13. We revisit Example 5.11. Find unit eigenvectors v1, v2 and v3 corresponding to the eigenvalues of

A = [  8 −3 −3 ]
    [ −3  8 −3 ]
    [ −3 −3  8 ],

in Example 5.11. Moreover, choose the eigenvectors so that they are mutually orthogonal, and thus form an on basis of R3.

(The eigenvectors we obtain happen to be parallel to the so-called principal axes of the inertia tensor of a cube.)

Solution. The eigenvalues are λ1 = λ2 = 11 and λ3 = 2. Being more straightforward, we find v3 (corresponding to λ3 = 2) first. The vector must satisfy

(A − 2I3)x = 0,  i.e.  [  6 −3 −3 ] [ x1 ]   [ 0 ]
                       [ −3  6 −3 ] [ x2 ] = [ 0 ]
                       [ −3 −3  6 ] [ x3 ]   [ 0 ].

Using EROs yields

[  6 −3 −3 ]     [  1  1 −2 ]     [ 1  1 −2 ]     [ 1 1 −2 ]     [ 1 0 −1 ]
[ −3  6 −3 ]  →  [  1 −2  1 ]  →  [ 0 −3  3 ]  →  [ 0 1 −1 ]  →  [ 0 1 −1 ]
[ −3 −3  6 ]     [ −2  1  1 ]     [ 0  3 −3 ]     [ 0 0  0 ]     [ 0 0  0 ],

(note here that we have suppressed the fourth columns of these matrices, which consist entirely of zeros and never change), so if x3 = t then x1 = x2 = t. It follows that

x = [ t ]       [ 1 ]
    [ t ]  = t  [ 1 ]
    [ t ]       [ 1 ].

We seek a value of t to produce v3, in such a way that ‖v3‖ = 1. Since ‖x‖ = |t|√3, we can choose t = 1/√3 or t = −1/√3. Either choice is valid: let’s pick t = 1/√3. Then

v3 = (1/√3) [ 1 ]
            [ 1 ]
            [ 1 ].
Eigenvectors v1 and v2 must both satisfy


    
−3 −3 −3 x1 0
(A − 11I3 )x = 0, i.e.  −3 −3 −3   x2  =  0  .
    

−3 −3 −3 x3 0

Straightaway, EROs yield


   
−3 −3 −3 1 1 1
 −3 −3 −3  →  0 0 0  ,
   

−3 −3 −3 0 0 0

so x1 is leading and x2 , x3 are free. Here, we need two parameters: if


x2 = s and x3 = t then x1 = −s − t. Consequently,
     
−s − t −1 −1
x =  s  = s 1  + t  0 .
     

t 0 1

We must make two sets of choices of s and t, in such a way that we


produce mutually orthogonal unit vectors v1 and v2 (that also have
to be orthogonal to v3 , but don’t worry about that for now. . . ). There
are infinitely many valid choices. Here are two sets of choices that
work. We begin with v1 . If s = − √12 and t = 0, we obtain the unit
vector  
1
v1 = √12  −1  .
 

© 2023 Richard Smith. For personal use only, not for circulation or sharing.
104 Eigenvalues and Eigenvectors of Matrices

Then we make a second set of choices to obtain v2 . We need s and


t so that
   
1 −s − t
0 = v1 · x = √12  −1  ·  s  = − √12 (2s + t).
   

0 t

Hence t = −2s, giving


   
s 1
x =  s  = s 1 .
   

−2s −2

Bearing in the mind the condition v2 = 1, we see that s = √1
6
or
s = − √16 . Either will do: let’s set s = √16 .
We have ensured that v1 and v2 are orthogonal unit vectors. What
about orthogonality to v3 ? As it happens, we can see by inspection
that v3 is orthogonal to both v1 and v2 , so we are done. 

5.4 Symmetric matrices and orthonormal bases of eigenvectors
As you can see, the vectors v1 , v2 , v3 are the same as the ones in Example
4.4 (2). In Example 5.13 above, it may have seemed suspiciously conve-
nient that the vector v3 just happened to be orthogonal to the other two
vectors we chose. In fact, this was no accident, and is an instance of a
more general phenomenon.
The matrix A in Example 5.13 is symmetric. As it turns out, the eigen-
values and eigenvectors of symmetric matrices happen to be very well
behaved. This is one of the reasons why symmetric matrices are so useful.
The final result of the chapter is very important. It is used, for example, in
Section 6.1 and Appendix B. Its proof is beyond the scope of the module.

Theorem 5.14. Let A ∈ Mn (R) be symmetric. Then the following


statements hold.

1. Every eigenvalue of A is a real number; we can list them in descending order as λ1, . . . , λn, so λ1 ≥ · · · ≥ λn (note that they need not necessarily be distinct).

2. Let λi and λj be distinct eigenvalues of A, and let x and y be


two eigenvectors of A that correspond to λi and λj , respectively.
Then x · y = 0.

3. Furthermore, there exists an orthogonal matrix P ∈ Mn (R), such that

a) Pe1, . . . , Pen is an on basis of Rn;

b) each Pei is an eigenvector of A having eigenvalue λi, and

c) the matrix product

PᵀAP = [ λ1  0   ⋯   0  ]
       [  0  λ2      0  ]
       [  ⋮       ⋱  ⋮  ]
       [  0  ⋯   0   λn ]

is a diagonal matrix (see Proposition 1.53 (3)), where the eigenvalues listed above run along the main diagonal.

Theorem 5.14 (2) explains the happy orthogonality ‘accident’ in the solu-
tion of Example 5.13. The vectors v1 and v2 corresponded to λ1 = λ2 = 11,
while v3 corresponded to λ3 = 2. Hence we were bound to have v1 · v3 =
v2 · v3 = 0, no matter what our choices of v1 and v2 were.
This helps in such examples: given a symmetric matrix, we don’t need
to worry about the orthogonality of eigenvectors having different eigen-
values. We still need to engineer orthogonality between eigenvectors
having the same eigenvalue, however, as in the case of v1 and v2 above.
Theorem 5.14 (3) will help enormously when we consider quadratic forms
in Section 6.1. There, we will see the utility of listing the eigenvalues in
descending order.

Example 5.15. Let A be the symmetric matrix in Examples 5.11 and 5.13, and let

P = [  1/√2  1/√6  1/√3 ]
    [ −1/√2  1/√6  1/√3 ]
    [  0    −2/√6  1/√3 ],

be the orthogonal matrix from Example 1.51. Verify that Pe1, Pe2 and Pe3 are the eigenvectors v1, v2 and v3 found in Example 5.13, and that

PᵀAP = [ 11  0  0 ]
       [  0 11  0 ]
       [  0  0  2 ],

is a diagonal matrix, with the eigenvalues λ1 ≥ λ2 ≥ λ3 of A found in Example 5.11 running along the main diagonal.
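For the curious, the statement of Theorem 5.14 can be checked numerically. The sketch below (not part of the module) uses numpy's eigh, which is designed for symmetric matrices; note that it lists the eigenvalues in ascending order, unlike the convention above.

import numpy as np

A = np.array([[ 8.0, -3.0, -3.0],
              [-3.0,  8.0, -3.0],
              [-3.0, -3.0,  8.0]])

eigenvalues, P = np.linalg.eigh(A)        # eigh is intended for symmetric matrices
print(eigenvalues)                        # [2, 11, 11] (ascending order)
print(np.allclose(P.T @ P, np.eye(3)))    # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))          # diagonal matrix of eigenvalues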
Chapter 6

Matrices 2

6.1 Quadratic forms


Definition of quadratic forms
Quadratic forms are special types of function that map vectors x in Rn to numbers in R. They are closely associated with symmetric matrices and are used, for example, in Appendix B. Before giving their full definition, we look at some examples. (Quadratic forms have applications in calculus, number theory, geometry and statistics. They even managed to find their way into the theory behind wireless communications.)

Example 6.1.

1. The function q : R2 → R, given by

q(x) = x1² + x2²  where x = (x1, x2),

is a quadratic form. The function takes each vector x = (x1, x2) in R2 as input and returns as output the number x1² + x2², which happens to equal ‖x‖² in this case.

2. Fix numbers a, b, c ∈ R. In general, any function q : R2 → R of the form

q(x) = ax1² + bx2² + cx1x2  where x = (x1, x2),

is a quadratic form. In the first example above, we had a = b = 1 and c = 0. If a = 5, b = −3 and c = −4, then we obtain

q(x) = 5x1² − 3x2² − 4x1x2,


which is another quadratic form.

3. Now fix numbers a, b, c, d, e, f ∈ R. Any function q : R3 → R of the form

q(x) = ax1² + bx2² + cx3² + dx1x2 + ex2x3 + fx1x3,

where x = (x1, x2, x3), is a quadratic form, this time defined on R3. For instance

q(x) = 2x1² − 5x2² + x3² − 11x1x2 + 6x1x3,

yields a quadratic form (here, e = 0).

Figure 6.1: q(x) = 5x1² − 3x2² − 4x1x2 plotted as a 3D surface.

(Quadratic forms on R2, or indeed functions on R2 in general, can be plotted as 3D surfaces: the value of f(x1, x2) yields the height of the surface above the point (x1, x2). These days, plotting such things with computers is quite easy, and doing so helps to put flesh on them. There are some free graphing apps, such as Quick Graph for iOS, which enable the user to quickly sketch quadratic forms (and more general functions) on R2, and view them from different angles.)

Quadratic forms can be defined on Rn, for any positive integer n. Roughly speaking, if you define a function on Rn as a sum of multiples of terms of the form xi², 1 ≤ i ≤ n, or xixj, 1 ≤ i < j ≤ n, where x = (x1, . . . , xn), then you will obtain a quadratic form. Notice that all of the examples above follow that pattern. The function f : R2 → R defined by

f(x) = x1² − 4x1²x2,

is not a quadratic form, because the term x1²x2 does not fit into the pattern above.

Having that rough idea in mind, we gingerly approach the formal defini-
tion.

Definition 6.2 (Quadratic forms). A quadratic form is a function q : Rn → R of the form

q(x) = Σ_{i=1}^n ai xi² + Σ_{1≤i<j≤n} bij xi xj,

where x = (x1, . . . , xn) and where ai, 1 ≤ i ≤ n, and bij, 1 ≤ i < j ≤ n, are fixed numbers.

The expression ‘1 ≤ i < j ≤ n’ under the second summation sign means that you should sum over all possible combinations of i and j that obey that rule. So if n = 2, there is only one possible combination, namely i = 1, j = 2. If n = 3, there are three possible combinations: 1, 2; 2, 3; and 1, 3.

Example 6.3.

1. Let n = 2, a1 = 5, a2 = −3 and b12 = −4. Then

Σ_{i=1}^n ai xi² + Σ_{1≤i<j≤n} bij xi xj = 5x1² + (−3)x2² + (−4)x1x2,

which yields Example 6.1 (2).

2. If we let n = 3, a1 = 2, a2 = −5, a3 = 1, b12 = −11, b23 = 0 and b13 = 6, then we obtain Example 6.1 (3).

Quadratic forms and symmetric matrices


The following examples reveal the connection between quadratic forms
and symmetric matrices.

Example 6.4.

1. Let A = I2 . Given a vector x in R2 , treated as a column vector,


observe that

x · (Ax) = (x1, x2) · (x1, x2) = x1² + x2²  (since Ax = I2x = x),

which gives Example 6.1 (1).

2. Let A = [ 5 −2 ; −2 −3 ] and let x be a column vector in R2. Then

Ax = [  5 −2 ] [ x1 ]   [  5x1 − 2x2 ]
     [ −2 −3 ] [ x2 ] = [ −2x1 − 3x2 ],

so

x · (Ax) = x1(5x1 − 2x2) + x2(−2x1 − 3x2) = 5x1² − 3x2² − 4x1x2,

which is Example 6.1 (2).

3. Finally, let A = [ 2  −11/2  3 ; −11/2  −5  0 ; 3  0  1 ] and let x be a column vector in R3. Then

Ax = [  2     −11/2   3 ] [ x1 ]   [ 2x1 − (11/2)x2 + 3x3 ]
     [ −11/2   −5     0 ] [ x2 ] = [ −(11/2)x1 − 5x2      ]
     [  3       0     1 ] [ x3 ]   [ 3x1 + x3             ],

so

x · (Ax) = x1(2x1 − (11/2)x2 + 3x3) + x2(−(11/2)x1 − 5x2) + x3(3x1 + x3)
         = 2x1² − 5x2² + x3² − 11x1x2 + 6x1x3,

which is Example 6.1 (3).

Every quadratic form can be expressed succinctly in this way, using a symmetric matrix and the scalar product. We won’t prove the next result.

Theorem 6.5. Let q : Rn → R be a quadratic form. Then there exists


a unique symmetric matrix A ∈ Mn (R), such that

q(x) = x · (Ax).

Conversely, given a symmetric matrix A ∈ Mn (R), the function q de-


fined by the expression above is a quadratic form.

Given a quadratic form q with numbers ai, 1 ≤ i ≤ n, and bij, 1 ≤ i < j ≤ n, as in Definition 6.2, we can generate the associated symmetric matrix A by setting Aii = ai, 1 ≤ i ≤ n, and Aij = Aji = (1/2)bij, 1 ≤ i < j ≤ n. Note the division by 2 when we define the entries of A off the main diagonal – you can see this in action in Example 6.4 above.
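The recipe above is easy to automate. The following sketch (not part of the module) builds the symmetric matrix for Example 6.1 (3) and checks that q(x) = x · (Ax); the dictionaries a and b are simply my own way of storing the coefficients.

import numpy as np

a = {1: 2.0, 2: -5.0, 3: 1.0}                  # coefficients of x_i^2
b = {(1, 2): -11.0, (2, 3): 0.0, (1, 3): 6.0}  # coefficients of x_i x_j, i < j

n = 3
A = np.zeros((n, n))
for i, ai in a.items():
    A[i - 1, i - 1] = ai
for (i, j), bij in b.items():
    A[i - 1, j - 1] = A[j - 1, i - 1] = bij / 2   # note the division by 2

def q(x):
    return 2*x[0]**2 - 5*x[1]**2 + x[2]**2 - 11*x[0]*x[1] + 6*x[0]*x[2]

x = np.array([1.0, -2.0, 3.0])
print(np.isclose(q(x), x @ (A @ x)))          # True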
The next example of quadratic forms adopts some notions from calculus.

Example 6.6. Let f : Rn → R be a function having continuous second order partial derivatives. Then the Hessian matrix Hf(a) of f at the point (or vector) a in Rn is a symmetric matrix in Mn(R) which yields a corresponding quadratic form. For instance, if n = 2, f(x1, x2) = x1 sin x2 and a = (3, π/4), then

Hf(a) = [ ∂²f/∂x1²(3, π/4)     ∂²f/∂x1∂x2(3, π/4) ]   [ 0      1/√2  ]
        [ ∂²f/∂x2∂x1(3, π/4)   ∂²f/∂x2²(3, π/4)   ] = [ 1/√2  −3/√2  ],

which has corresponding quadratic form q(x) = −(3/√2)x2² + √2 x1x2. It is important to note that the quadratic form will generally vary with a: try different values of a above and see what happens.

Change of coordinates and diagonalisation


Recall Section 4.2. Let q : Rn → R be a quadratic form and let A be its
associated symmetric matrix, furnished by Theorem 6.5. It is possible to
change our coordinate system in such a way that, with respect to the
new system, our quadratic form will look simpler than it did originally.
Specifically, we will be able to rewrite it in such a way that the ‘cross
terms’, i.e. the terms of the form xi xj , i < j, will disappear. We call this
process diagonalisation of the quadratic form.


Let P be the orthogonal matrix kindly supplied to us by Theorem 5.14, part (3). We know that Pe1, . . . , Pen is an on basis of Rn, consisting of eigenvectors of A, and having corresponding eigenvalues λ1 ≥ · · · ≥ λn, listed in descending order. Let D = PᵀAP be the diagonal matrix having those eigenvalues along its main diagonal.

Given a column vector x in Rn, let us define a new column vector y = Pᵀx. Following the end of Section 4.2, after Example 4.10, we have that the ith entry yi of y is given by yi = y · ei = (Pᵀx) · ei = x · (Pei), which equals the ith coordinate of x, with respect to the new basis. In other words

x = y1(Pe1) + · · · + yn(Pen).

Moreover, since P is orthogonal, we have Py = PPᵀx = In x = x. Now we can piece all of this together to make something magical happen:

q(x) = x · (Ax) = (Py) · (APy)
     = y · (PᵀAPy)                       (Proposition 2.28)
     = y · (Dy)
     = (y1, . . . , yn) · (λ1y1, . . . , λnyn)
     = λ1y1² + λ2y2² + · · · + λnyn².

As promised, we have rewritten q(x) with respect to a new coordinate system, in such a way that all the cross terms have been removed. In this way we can gain a far better understanding of how quadratic forms behave.

Example 6.7. Diagonalise the quadratic form q : R3 → R given by

q(x) = 8x1² + 8x2² + 8x3² − 6x1x2 − 6x1x3 − 6x2x3.

Solution. Happily for us, the matrix A associated with the quadratic form is the same matrix as in Examples 5.11, 5.13 and 5.15. We collect the corresponding orthogonal matrix P from Example 5.15 and set y = Pᵀx. Bearing in mind the eigenvalues of A, we obtain

q(x) = 11y1² + 11y2² + 2y3².

This solution, while formally correct, seems rather terse and ill-mannered. In particular, it doesn’t really reveal what is happening, so let’s take a closer look. If we evaluate y we obtain

[ y1 ]          [ 1/√2  −1/√2   0    ] [ x1 ]   [ (1/√2)x1 − (1/√2)x2            ]
[ y2 ] = Pᵀx =  [ 1/√6   1/√6  −2/√6 ] [ x2 ] = [ (1/√6)x1 + (1/√6)x2 − (2/√6)x3 ]
[ y3 ]          [ 1/√3   1/√3   1/√3 ] [ x3 ]   [ (1/√3)x1 + (1/√3)x2 + (1/√3)x3 ].

Thus we have concrete expressions for the entries of y in terms of those of x. You should verify that, when these expressions are substituted into the diagonalised version of the quadratic form, the original version of it is recovered.
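A numerical version of this verification (not part of the module), using an arbitrary test vector of my own choosing:

import numpy as np

A = np.array([[ 8.0, -3.0, -3.0],
              [-3.0,  8.0, -3.0],
              [-3.0, -3.0,  8.0]])
eigenvalues, P = np.linalg.eigh(A)

x = np.array([1.0, 2.0, -1.0])              # an arbitrary test vector (my choice)
y = P.T @ x                                 # coordinates in the new system

q_original = x @ (A @ x)
q_diagonal = np.sum(eigenvalues * y**2)     # no cross terms in the new coordinates
print(np.isclose(q_original, q_diagonal))   # True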
We finish the section by giving one example of how diagonalisation can
help us understand quadratic forms, namely how to find the maximum
and minimum values of q(x), subject to the constraint that x is a unit
vector. Quite often this sort of problem is approached by considering the
method of Lagrange multipliers, but we do not need such things in this
particular case.

Proposition 6.8. Let q : Rn → R be a quadratic form, and let λ1 ≥ · · · ≥ λn be as above. The maximum and minimum values that q(x) can take, subject to the constraint ‖x‖ = 1, equal λ1 and λn, respectively.

Proof. With y = Pᵀx as above, observe that

y1² + · · · + yn² = ‖y‖² = ‖x‖²,

by Proposition 2.29. Hence

q(x) = λ1y1² + · · · + λnyn²
     ≤ λ1(y1² + · · · + yn²)      (as λ1 ≥ · · · ≥ λn)
     = λ1‖x‖² = λ1,

whenever ‖x‖ = 1. Likewise, q(x) ≥ λn whenever ‖x‖ = 1. Lastly, if we set x = Pe1, then ‖x‖ = 1 by Proposition 2.29 and y = Pᵀx = e1, thus

q(x) = λ1·1 + λ2·0 + · · · + λn·0 = λ1.

Likewise, setting x = Pen yields q(x) = λn.

Example 6.9. The maximum and minimum values that the quadratic form q(x) in Example 6.7 can take, subject to the condition ‖x‖ = 1, are 11 and 2, respectively. Verify that these values are attained at

Pe1 = [  1/√2 ]             [ 1/√3 ]
      [ −1/√2 ]  and  Pe3 = [ 1/√3 ]
      [  0    ]             [ 1/√3 ],

respectively (you can obtain the maximum at Pe2 as well).

(Note that if you are asked to find these maximum and minimum values of a quadratic form, and nothing else, then all you need to do is find the eigenvalues of the associated symmetric matrix and pick out the largest and smallest ones.)

Definition 6.10. Let A ∈ Mn (R) be symmetric. Then A is called pos-


itive definite or negative definite if its eigenvalues are all strictly
positive or all strictly negative, respectively.

In view of Proposition 6.8, A ∈ Mn (R) is positive definite if and only if the associated quadratic form x · (Ax) is strictly positive for all non-zero x in Rn. Indeed, A is positive definite if and only if λn > 0. Given non-zero x, we have

x · (Ax) = ‖x‖² x̂ · (Ax̂) ≥ ‖x‖² λn,

where x̂ = x/‖x‖ is the associated unit vector. Given that ‖x‖² is strictly positive whenever x is non-zero, it follows that x · (Ax) is always strictly positive for non-zero x if and only if λn > 0. A similar argument, using λ1 this time, shows that A is negative definite if and only if x · (Ax) < 0 for all non-zero x.
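In computational practice, definiteness is checked exactly as this discussion suggests: compute the eigenvalues and look at their signs. A small sketch (not part of the module):

import numpy as np

def definiteness(A):
    eigenvalues = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
    if np.all(eigenvalues > 0):
        return "positive definite"
    if np.all(eigenvalues < 0):
        return "negative definite"
    return "neither"

print(definiteness(np.array([[5.0, -2.0], [-2.0, -3.0]])))   # Example 6.4 (2): neither
print(definiteness(np.eye(2)))                               # positive definite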

Exercise 6.11. Establish whether or not the matrices in Example 6.4


(2) and (3) are positive definite, negative definite, or neither.

6.2 Matrix norms


Lengths of vectors were covered in Section 2.2. Beside Definition 2.14
is a mysterious marginal note concerning norms of vectors. A norm is
a more general notion than the length of a vector, that can be applied
equally to vectors and to things that don’t look very much like vectors at
all (but which are in fact vectors, depending on the linear algebra course
you are taking).
It so happens that matrices have norms. In fact there are many different
ways in which one can impose a norm on a matrix. In our brief treatment,
we will look at just one.
Suppose that we fix a matrix A ∈ Mn (R). Let’s consider a unit vector x in Rn, that is, ‖x‖ = 1. If we let A act on x, as in Section 2.4, we obtain a new vector Ax, and we can measure its length to get ‖Ax‖. Now imagine that we do this for all unit vectors x in Rn. The norm of A is defined to be the largest value of ‖Ax‖, subject to the constraint that ‖x‖ = 1.

Definition 6.12 (Matrix norms). Let A ∈ Mn (R). The norm of A, written ‖A‖, is defined to be the maximum of ‖Ax‖, subject to the constraint that ‖x‖ = 1.

At first glance, it may not be obvious that there should be such a maximum number at all. But there is, courtesy of a particular quadratic form. If we square ‖Ax‖, we obtain

‖Ax‖² = (Ax) · (Ax) = x · (AᵀAx),

by Fact 2.20 (2) and Proposition 2.28. This defines a quadratic form, because AᵀA is symmetric, no matter what the original matrix A is up to! Indeed, (AᵀA)ᵀ = AᵀAᵀᵀ = AᵀA, using Fact 1.23 (7), and the fact that Aᵀᵀ = A for any matrix.

Therefore, by Proposition 6.8, the maximum value of ‖Ax‖², subject to the constraint that ‖x‖ = 1, is equal to λ, where λ is the largest eigenvalue of the symmetric matrix AᵀA. Taking positive square roots of both sides now yields

‖A‖ = √λ.

Note that λ is non-negative (so we can take its square root legitimately), because it is the maximum value of ‖Ax‖², which can never be negative.


Example 6.13. Compute ‖A‖, where A = [ 1 1 ; 0 1 ].

Solution. We see that

Aᵀ = [ 1 0 ]      and      AᵀA = [ 1 1 ]
     [ 1 1 ]                     [ 1 2 ].

Now det(AᵀA − λI2) = (1 − λ)(2 − λ) − 1 = λ² − 3λ + 1, which has roots λ = 3/2 ± (1/2)√5. Therefore the larger of the two eigenvalues of AᵀA is λ = 3/2 + (1/2)√5, giving ‖A‖ = √(3/2 + (1/2)√5).
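For the curious, numpy computes this norm directly (it is the so-called 2-norm, or largest singular value); the sketch below (not part of the module) also recomputes it from the largest eigenvalue of AᵀA.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lam = np.max(np.linalg.eigvalsh(A.T @ A))   # largest eigenvalue of A^T A
print(np.sqrt(lam))                         # approximately 1.618
print(np.linalg.norm(A, 2))                 # the same value, computed by numpy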
Appendix A

Discussion board and WeBWorK guides

A.1 How to use the Moodle discussion boards


Writing mathematics on the discussion boards
Evidently, written mathematics uses a host of symbols and notation that
is not available in ordinary word processing software. These days, the
majority of mathematicians use LaTeX (or its predecessor TeX) to typeset
mathematical documents. All of the material in this module was typeset
using LaTeX. However, learning LaTeX takes some time, and it was not
designed for direct use on the web. Instead, the more recent MathJax
system enables mathematics to be written directly into web pages using
simple LaTeX expressions.
The Moodle site is supported by MathJax, so users can write simple
mathematical expressions when posting to forums. We encourage users
to use MathJax when posting mathematical queries; the results look good
and clear, and only a minimal knowledge of LaTeX is required.
For the remainder of this section, we cover the basics of how to post sim-
ple mathematical expressions using MathJax. Note that when MathJax
is used on the discussion boards, typically it takes a few seconds for
mathematical expressions to be rendered properly!

1. Enclose mathematical notation using dollar signs


All mathematical notation should be enclosed by a pair of dollar
signs. For example, if you want to post the query ‘I don’t understand
why a⁴ = 3 in that example.’, you should write
I don’t understand why $aˆ4 = 3$ in that example.


If you want to post a mathematical expression on a separate line,


enclose it using a pair of double dollar signs. For example, writing

$$aˆ4 = 3$$

renders the expression


a⁴ = 3
on a separate line, as above.

2. Arithmetic and fractions


Expressions such as ‘x + y = 5’ or ‘x − 2y = 3.6’ can rendered by
writing
$x + y = 5$ or $x - 2y = 3.6$,
respectively. You can write fractions either by using the division
sign or by using the \frac command, together with two pairs of
curly braces { and }. For instance,

$3/5$ and $\frac{3}{5}$

yield 3/5 and ³⁄₅ (a built-up fraction), respectively.

3. Exponents/superscripts and subscripts


The characters ˆ and _ are used to render exponents/superscripts
and subscripts, respectively. For example,

$xˆ5 + 3xˆ2 + 10 = 0$ and $x_1 + x_3 = 4$

yield x⁵ + 3x² + 10 = 0 and x1 + x3 = 4, respectively.

4. Surds
Expressions like √2 and ⁵√11 can be obtained by writing $\sqrt{2}$
or $\sqrt[5]{11}$, respectively (note the use of square brackets
as well as curly ones in the second example).

5. Use pairs of curly braces to nest expressions


For instance,
$xˆ{\frac{1}{2}} = eˆ{yˆ2}$
produces x^(1/2) = e^(y²).

6. Standard functions
The commands $\sin$, $\cos$, $\tan$ and $\log$ produce the
standard functions sin, cos, tan and log. For example,
$\cos(x) = \frac{\sqrt{3}}{2}$

produces cos(x) = √3/2.
7. Summation and integration
Use the commands $\sum$ and $\int$, together with ˆ and _ and
curly braces, to write expressions involving summation and integra-
tion. For instance,
$\sum_{k=1}ˆn k = \frac{1}{2}n(n+1)$

produces Σ_{k=1}^n k = (1/2)n(n + 1), and

$\int_1ˆ2 xˆ2 dx = \frac{7}{3}$

gives ∫₁² x² dx = 7/3.
8. Greek characters and special symbols
The Greek letters α, β, θ and π etc can be expressed using $\alpha$,
$\beta$, $\theta$ and $\pi$, respectively. Symbols such as R and
≈ require $\mathbb{R}$ and $\approx$, respectively.
9. Matrices
Alas, there is no quick way to write down matrices properly using
MathJax, because doing so requires a so-called ‘LaTeX environ-
ment’.
To begin, type \begin{pmatrix}. Then type in the entries of the
first row, separating each one by an ampersand & character. When
you reach end of the first row, type \\. Add the entries of the
second row as above, and repeat until you have reached the end
of the final row. To finish, type \end{pmatrix} (you do not need to
add \\ at the end of the final row).
Perhaps an example explains all of this best. Typing
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$
will produce

[ 1 2 ]
[ 3 4 ].


The best way to learn this stuff is through practice and experimenta-
tion. You can do so by using this Live Demo. The examples above can
be adapted and combined in all sorts of ways to produce expressions
of greater complexity (though it should not be necessary to write enor-
mously complicated expressions!).

Be mindful when using the curly braces { and }. MathJax will com-
plain with error messages, or will not render your expression properly, if
they are missing or are in the wrong place. Every opening { requires a
corresponding closing } (correctly placed).

Finally, we repeat that when MathJax is used, typically it takes a few


seconds for mathematical expressions to be rendered properly. Also, the
system is not perfect. Sometimes it can stop working for reasons that
are inexplicable and beyond our control!

A.2 How to use WeBWorK


Submitting WeBWorK solutions
WeBWorK solutions must be entered entirely online. Many solutions are
numerical in nature, in which case entering a numerical solution usually
suffices. Occasionally, it may be necessary to enter more complicated
types of solutions, such as a polynomial like x 2 + 3. Such solutions
require a syntax that is similar to, but not the same as, MathJax above
(unfortunately, we are not able to do anything about this). Advice on this
syntex is given below.

If a given answer is not an integer (e.g. a fraction or irrational number like √2), you can enter it either symbolically or by using a decimal expansion. For example, you can enter the fraction 2/3 either as 2/3 or as 0.666667.
In the latter case, give your answer to at least 6 significant digits so
that WeBWorK does not misinterpret it. The number of digits provided
by most calculator displays should be sufficient.

Below are some examples of types of expressions that may come up in


WeBWorK for this module (and maybe other ones), together with typical
examples and how to enter them.

If you are in any doubt about how WeBWorK is going to interpret your
answer, press the ‘Preview Answers’ button.

Expression type          Example              Enter into WeBWorK as
Fractions                3/4                  3/4 or 0.75
Powers, Exponents        n⁵, 4ᵏ, (−7)ⁿ        n^5, 4^k, (-7)^n (not -7^n)
Polynomials              2x² + 15x − 4        2x^2 + 15x - 4 or 2*x^2 + 15*x - 4
Trig functions           cos x                cos(x)
Exponential functions    eˣ                   e^x

WeBWorK will also interpret e.g. pi as π, pi/4 as π/4, cos(pi/4) as cos(π/4), log(2) as log 2, e^4 as e⁴, and so on. See

http://webwork.maa.org/wiki/Available_Functions

for other ways of entering answers. Also, sometimes you have to be


careful when using brackets ( and ). Just as entering things in a cal-
culator in a different order will produce different answers, so WeBWorK
will interpret things in a different order, depending on where you put the
brackets.

Notes on specific MATH10390 WeBWorK questions


Set 1, question 3
Omit the outer quotation marks (see advice given in the question) from
your solutions.

Set 2, questions 8–10


You should enter vectors using chevrons or ‘angle brackets’ < and >, e.g.
<1, 2, 3>.

Set 3, question 10
The definition of row-echelon form in the notes (Definition 3.8) does not
agree with the one used here. Here, the first non-zero entry of a row
does not have to be equal to 1. Such an entry is called a ‘leading entry’,
instead of a ‘leading 1’. In Definition 3.8 (2) and (3), replace ‘leading 1’
by ‘leading entry’.


The definition of reduced row-echelon form is the same as the one given
in the notes. In particular, all leading entries must be leading 1s.

Set 4, questions 1, 2 and 7


You’re asked to solve a system of linear equations such as

x1 + x2 − 3x3 = −9
4x1 + 3x2 + 2x3 = −5,

(the numbers of equations and variables will vary from question to ques-
tion). In each case, the format of the answer suggests that you will find
infinitely many solutions involving a single parameter, denoted by s.
So suppose that your general solution to the system above is

x1 = 3 + 2s
x2 = 2 − 7s
x3 = s

(this is incorrect!). If this is your solution, then input


Appendix B

Principal component analysis


(non-examinable)

Consider a situation in which we have a sample of m objects, each with


n characteristics or variables that we have measured. We can represent
the n measurements of the ith object as a column vector
 
xi = [ xi1 ]
     [ xi2 ]
     [  ⋮  ]
     [ xin ],

in Rn . Such a vector is called a data point. The number xij , where


1 ≤ i ≤ m and 1 ≤ j ≤ n, represents the measurement of the jth
characteristic or variable of the ith object. The number of variables we
are measuring equals the dimension of the space these data points live
in (see comments after Example 2.2).

Visualising or trying to extract patterns and meaning from data in Rn can be very difficult or impossible if the number of dimensions n is high. Principal Component Analysis (hereafter abbreviated to PCA) is a method of ‘projecting’ this high-dimensional data onto a lower-dimensional ‘subspace’ of Rn, in a way that is designed to give the user some information
about the sample that can be more easily interpreted, given the smaller
number of dimensions. See Appendix C.2 for more details about sub-
spaces of Rn , dimension and projections onto subspaces.


The first principal component


The principal components of the above sample data turn out to be an on
basis of Rn (see Chapter 4). We can give a complete description of how
they can be found. In doing so, we motivate the process and interpret
the meaning of the first principal component, in terms of maximising a
certain variance. Interpreting the meaning of the subsequent principal
components is a little trickier, because we didn’t cover the mathematics
needed for this in the module, but we give a brief sketch towards the end nonetheless.
The first thing we shall do is mean centre the data, so that the ensuing
computations are simplified. Let

x̃ = (1/m)(x1 + · · · + xm) = (1/m) Σ_{i=1}^m xi

be the sample mean vector of the data points x1 , . . . , xm . We mean centre


the data by considering new vectors yi = xi − x̃, 1 ≤ i ≤ m. Now the sample mean vector of the new data points y1, . . . , ym is

(1/m) Σ_{i=1}^m yi = (1/m) Σ_{i=1}^m (xi − x̃) = (1/m) Σ_{i=1}^m xi − (1/m) Σ_{i=1}^m x̃ = x̃ − x̃ = 0.

This has the effect of simplifying the subsequent calculations.


Recall Section 2.5 on orthogonal projections. In PCA, we begin by seek-
ing a unit vector v in such a way that, when we project each yi or-
thogonally onto the line L through the origin, parallel to v as above, the
variance of the projected data points is maximised.
Let v be a unit vector. From above, the orthogonal projection of yi onto
L equals (yi · v)v. The sample mean vector of the orthogonal projections
is also 0:
(1/m) Σ_{i=1}^m (yi · v)v = ( ((1/m) Σ_{i=1}^m yi) · v ) v = (0 · v)v = 0.

Consequently, the (unbiased sample) variance of the orthogonal projec-


tions is simply

(1/(m − 1)) Σ_{i=1}^m (yi · v)².     (B.1)

By multiplying by m − 1, maximising (B.1) is equivalent to maximising
the quantity
\[
\sum_{i=1}^{m} (y_i \cdot v)^2, \tag{B.2}
\]
over all possible choices of unit vector v.


It turns out that (B.2) is a quadratic form q(v) (Section 6.1). We will
find its maximum, subject to the condition that ‖v‖ = 1, by appealing to
Proposition 6.8.
Let the jth entry of the vector yi be denoted yij. Consider the quantities
\[
C_{jk} \;=\; \sum_{i=1}^{m} y_{ij}\, y_{ik}, \qquad 1 \le j, k \le n,
\]
and the matrix C ∈ Mn(R) having Cjk as its (j, k)-entry. Notice that C is
symmetric: Cjk = Ckj for all j and k.
The matrix C is known as the corrected sum of squares and products
(SSP) matrix of the original data points xi (remember that we mean
centred them by considering yi).
We observe that C equals the sum of matrix products \(\sum_{i=1}^{m} y_i\, y_i^T\). Indeed,
yi and \(y_i^T\) are n × 1 and 1 × n matrices, respectively, so \(y_i\, y_i^T \in M_n(\mathbb{R})\).
Moreover, observe that the (j, 1)-entry of yi, considered as a n × 1 matrix,
is equal to the jth entry of yi, i.e.
\[
(y_i)_{j1} \;=\; y_{ij}.
\]
Having this in mind, we see that
\[
(y_i\, y_i^T)_{jk} \;=\; (y_i)_{j1}\, (y_i^T)_{1k} \;=\; y_{ij}\, y_{ik},
\]
hence
\[
\Bigl(\sum_{i=1}^{m} y_i\, y_i^T\Bigr)_{jk} \;=\; \sum_{i=1}^{m} y_{ij}\, y_{ik} \;=\; C_{jk},
\]
for 1 ≤ j, k ≤ n.
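As an aside, the SSP matrix is easy to assemble numerically. In the sketch below (the same small made-up sample as in the earlier snippet), C is formed both as a sum of outer products and as the single matrix product YᵀY; the two agree, and C is symmetric:

    import numpy as np

    # Y: mean-centred data points as rows (same made-up sample as before).
    X = np.array([[2., 1., 0.], [1., 3., 1.], [0., 2., 4.], [1., 2., 3.]])
    Y = X - X.mean(axis=0)

    C = sum(np.outer(y, y) for y in Y)   # C = y_1 y_1^T + ... + y_m y_m^T
    print(np.allclose(C, Y.T @ Y))       # True: the same matrix as the single product Y^T Y
    print(np.allclose(C, C.T))           # True: C is symmetric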
Also, observe that, given two n × 1 column vectors a and b, we see that
a · b equals the single entry in the identical 1 × 1 matrices \(a^T b\) and \(b^T a\).
We abuse notation slightly and write \(a \cdot b = a^T b = b^T a\). With this in
mind,
\[
\sum_{i=1}^{m} (y_i \cdot v)^2 \;=\; \sum_{i=1}^{m} (v^T y_i)(y_i^T v)
\;=\; \sum_{i=1}^{m} v^T (y_i\, y_i^T)\, v
\;=\; v^T \Bigl(\sum_{i=1}^{m} y_i\, y_i^T\Bigr) v \;=\; v^T C\, v \;=\; v \cdot (C v), \tag{B.3}
\]

using Fact 1.23. This yields our quadratic form q, by Theorem 6.5.
Since C is symmetric, by Theorem 5.14, there is an orthogonal matrix P ∈
Mn (R), such that the on basis Pe1 , . . . , Pen of Rn consists of eigenvectors
of C , having corresponding eigenvalues λ1 , . . . , λn , listed in descending
order. Moreover, the product \(P^T C P\) is the diagonal matrix
\[
\begin{pmatrix}
\lambda_1 & 0 & \dots & 0 \\
0 & \lambda_2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & \dots & 0 & \lambda_n
\end{pmatrix}.
\]
Set vj = Pej, for 1 ≤ j ≤ n. According to Proposition 6.8, the maximum
and minimum values of q are q(v1) = λ1 and q(vn) = λn, respectively.
The vector v1 is said to be the first principal component. We maximise
(B.2) by setting v equal to the first principal component v1.
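A possible numerical route to the first principal component, assuming the SSP matrix C from the sketches above: numpy.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order, so the last column of P is an eigenvector for the largest eigenvalue λ1 (eigenvectors are only determined up to sign).

    import numpy as np

    X = np.array([[2., 1., 0.], [1., 3., 1.], [0., 2., 4.], [1., 2., 3.]])
    Y = X - X.mean(axis=0)
    C = Y.T @ Y                          # SSP matrix of the made-up sample

    eigenvalues, P = np.linalg.eigh(C)   # eigenvalues ascending; columns of P orthonormal
    v1 = P[:, -1]                        # first principal component
    lambda1 = eigenvalues[-1]

    # The quadratic form q(v) = v . (C v) attains its maximum lambda_1 over unit vectors at v1.
    print(np.isclose(v1 @ C @ v1, lambda1))   # True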

Subsequent principal components


In general, the ith eigenvector vi, 1 ≤ i ≤ n, is called the ith principal
component of the sample data. It is a little harder to explain the meaning
of these, for i ≥ 2, without having a proper discussion of so-called sub-
spaces of Rn and orthogonal projections onto subspaces – these concepts
are only very briefly alluded to at the end of Chapter 4.
In lieu of said discussion, we give a brief and somewhat sketchy inter-
pretation. Consider the vectors wi = yi − (yi · v1)v1, 1 ≤ i ≤ m. It is simple
to check that all of the vectors wi are orthogonal to v1 (indeed, this is by
design – see Section 2.5). Now imagine that we wanted to repeat the
process above of finding a vector v that maximises the variance as in (B.1)
(equivalently, maximises (B.2)), but where the yi have been replaced by
the wi. It turns out that v2 maximises said quantity.
Then consider a further set of vectors

wi − (wi · v2 )v2 ,

1 ≤ i ≤ m. With a little vector algebra, and noting that v1 · v2 = 0, we
can show that these new vectors equal

yi − (yi · v1)v1 − (yi · v2)v2,

1 ≤ i ≤ m. It is straightforward to check that all of these vectors are


orthogonal to both v1 and v2 . Repeating the process of maximisation once
more with these vectors will yield v3 . And so it goes on. We obtain v4
after repeating the process with

yi − (yi · v1 )v1 − (yi · v2 )v2 − (yi · v3 )v3 ,

1 ≤ i ≤ m, which are all orthogonal to v1, v2 and v3. At each stage, we
are maximising a certain variance with respect to a set of vectors that
are orthogonal to all of the principal components found up to that point.
We run out of things to do after the nth stage, because

yi − (yi · v1)v1 − (yi · v2)v2 − · · · − (yi · vn)vn = 0,

1 ≤ i ≤ m, as v1, . . . , vn is an on basis (recall Theorem 4.6 (2))!
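In practice the deflation procedure described above is not carried out step by step: all of the principal components can be read off at once from the eigenvectors of C (or from a singular value decomposition of Y). A rough sketch under the same assumptions as the earlier snippets, keeping an arbitrary number k of components:

    import numpy as np

    X = np.array([[2., 1., 0.], [1., 3., 1.], [0., 2., 4.], [1., 2., 3.]])
    Y = X - X.mean(axis=0)                   # rows are the mean-centred data points y_i
    C = Y.T @ Y

    eigenvalues, P = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]    # indices of the eigenvalues in descending order
    components = P[:, order]                 # column j is the (j+1)th principal component (up to sign)

    k = 2                                    # project onto the first k principal components
    scores = Y @ components[:, :k]           # coordinates (y_i . v_1, ..., y_i . v_k) of each point
    print(scores.shape)                      # (4, 2): the lower-dimensional representation of the sample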

Appendix C

Additional material (non-examinable)

C.1 Additional proofs


In this module, procedure is emphasised over theory. In other words, we
focus on presenting methods of doing things, rather than looking at why
those methods work in the first place. Those of you who want to delve
more deeply into why the mathematics in this module works are invited
to look at this section.

Missing proofs from Chapter 1


In the proof of Fact 1.23, you will see how we have to open up the matrices
involved, unravel things far enough to be able to manipulate things in the
way we require, and then close everything back up again.
Throughout the proof, we assume as given the standard laws of arithmetic
of ordinary numbers. In a couple of cases, namely (1) and (2) below, we
point out exactly which laws of arithmetic are being used, but do not do
so thereafter.

Proof of Fact 1.23.

1. Let A and B be m × n matrices. Let 1 ≤ i ≤ m and 1 ≤ j ≤ n. We


have

(A + B)ij = Aij + Bij Definition 1.5


= Bij + Aij commutative law of addition of numbers
= (B + A)ij . Definition 1.5


In other words, the (i, j)-entry of A + B equals the (i, j)-entry of


B + A. Since the choices of i and j were arbitrary, it follows that
every entry of A + B equals its corresponding entry of B + A. This
means that A + B = B + A, according to Remarks 1.3 (5).
2. The proof is quite similar to (1). Let A, B and C be m × n matrices.
Let 1 ≤ i ≤ m and 1 ≤ j ≤ n. We have

((A + B) + C )ij = (A + B)ij + Cij Definition 1.5


= (Aij + Bij ) + Cij Definition 1.5
= Aij + (Bij + Cij )
associative law of addition of numbers
= Aij + (B + C )ij Definition 1.5
= (A + (B + C ))ij . Definition 1.5

In other words, the (i, j)-entry of (A + B) + C equals the (i, j)-entry


of A + (B + C). As above, since the choices of i and j were arbitrary,
it follows that A + (B + C ) = (A + B) + C .
4. Let A, B and C be m × r, r × p and p × n matrices, respectively.
It follows that AB is a m × p matrix, so (AB)C is a m × n matrix.
Likewise, BC is a r × n matrix, so A(BC ) is a m × n matrix as well.
Hence (AB)C and A(BC ) have the same size. It remains to show
that the corresponding matrix entries are equal. Given 1 ≤ i ≤ m
and 1 ≤ j ≤ n, we have
\[
\begin{aligned}
((AB)C)_{ij} &= \sum_{k=1}^{p} (AB)_{ik}\, C_{kj} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} \Bigl(\sum_{\ell=1}^{r} A_{i\ell}\, B_{\ell k}\Bigr) C_{kj} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} \sum_{\ell=1}^{r} A_{i\ell}\, B_{\ell k}\, C_{kj}. && \text{(C.1)}
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
(A(BC))_{ij} &= \sum_{\ell=1}^{r} A_{i\ell}\, (BC)_{\ell j} && \text{Definition 1.10} \\
&= \sum_{\ell=1}^{r} A_{i\ell} \Bigl(\sum_{k=1}^{p} B_{\ell k}\, C_{kj}\Bigr) && \text{Definition 1.10} \\
&= \sum_{\ell=1}^{r} \sum_{k=1}^{p} A_{i\ell}\, B_{\ell k}\, C_{kj}. && \text{(C.2)}
\end{aligned}
\]

The point is that the double sums in lines (C.1) and (C.2) are equal!
The only difference between them is the order of summation: in (C.1)
we sum over ` first, and then over k, whereas in (C.2) we sum over k
first, and then `. In summary, ((AB)C )ij = (A(BC ))ij . As above, since
i and j were chosen arbitrarily, it follows that (AB)C = A(BC ).
5. Let A and B be m × p matrices and let C be a p × n matrix. Then
A + B is a m × p matrix and (A + B)C is a m × n matrix. Likewise,
AC , BC and AC + BC are m × n matrices. We must check that the
corresponding entries of (A + B)C and AC + BC are equal. Given
1 ≤ i ≤ m and 1 ≤ j ≤ n, we have
\[
\begin{aligned}
((A + B)C)_{ij} &= \sum_{k=1}^{p} (A + B)_{ik}\, C_{kj} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} (A_{ik} + B_{ik})\, C_{kj} && \text{Definition 1.5} \\
&= \sum_{k=1}^{p} \bigl(A_{ik} C_{kj} + B_{ik} C_{kj}\bigr) \\
&= \sum_{k=1}^{p} A_{ik} C_{kj} + \sum_{k=1}^{p} B_{ik} C_{kj} \\
&= (AC)_{ij} + (BC)_{ij} && \text{Definition 1.10} \\
&= (AC + BC)_{ij}. && \text{Definition 1.5}
\end{aligned}
\]
As above, this tells us that (A + B)C = AC + BC. The other equality
is covered in almost the same way.
6. Exercise!
7. Let A and B be m × p and p × n matrices, respectively. Then AB is
a m × n matrix, so (AB)T is a n × m matrix. Moreover, BT and AT
are n × p and p × m matrices, respectively, so BT AT is also a n × m
matrix. Thus (AB)T and BT AT have the same size. Now we check
that their respective entries agree. Given 1 ≤ i ≤ n and 1 ≤ j ≤ m,
we have
\[
\begin{aligned}
((AB)^T)_{ij} &= (AB)_{ji} && \text{Definition 1.19} \\
&= \sum_{k=1}^{p} A_{jk}\, B_{ki} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} (B^T)_{ik}\, (A^T)_{kj} && \text{Definition 1.19} \\
&= (B^T A^T)_{ij}. && \text{Definition 1.10}
\end{aligned}
\]
This shows that the corresponding matrix entries of (AB)T and BT AT
agree, hence (AB)T = BT AT .
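The proofs above are the real justification, but the identities of Fact 1.23 can also be spot-checked numerically on randomly generated matrices. A small sketch (the matrix sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 5))
    C = rng.standard_normal((5, 2))
    D = rng.standard_normal((3, 4))

    print(np.allclose((A @ B) @ C, A @ (B @ C)))      # associativity, Fact 1.23 (4)
    print(np.allclose((A + D) @ B, A @ B + D @ B))    # distributivity, Fact 1.23 (5)
    print(np.allclose((A @ B).T, B.T @ A.T))          # transpose of a product, Fact 1.23 (7)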

Proof of Theorem 1.36. Let A be a m × n matrix. Then AIn is again a
m × n matrix. As above, it remains to check that their respective entries
agree. Given 1 ≤ i ≤ m and 1 ≤ j ≤ n, we have
\[
(AI_n)_{ij} \;=\; \sum_{k=1}^{n} A_{ik}\, (I_n)_{kj} \;=\; A_{ij} \qquad \text{(Definition 1.10)},
\]
because (In)kj = 1 if k = j, and is zero otherwise! Hence AIn = A. The
proof of the other equality is almost the same.

The proofs of Proposition 1.53 (1) and (3) require mathematical induction.
Since we have not considered this method of proof in this module at all,
we just give very rough clues as to how these results can be proved.

Proof of Proposition 1.53.

1. This can be proved using Theorem 1.44 and by applying mathemat-


ical induction to the number of rows (or columns) of A.
2. If row i of A is zero then Ai1 = Ai2 = · · · = Ain = 0, so by expanding
along the ith row as in Theorem 1.44, we obtain
det(A) = 0 · Ci1 + 0 · Ci2 + · · · + 0 · Cin = 0.
Likewise if column j of A is zero.
3. This can be proved using Definition 1.41 and mathematical induc-
tion.
4. Observe that (kIn)ij = k if i = j, and is zero otherwise. Thus kIn is
a diagonal matrix, and so by applying (3) we have
\[
\det(kI_n) \;=\; \underbrace{k \times k \times \cdots \times k}_{n \text{ times}} \;=\; k^n.
\]
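For what it is worth, parts (2) and (4) can also be illustrated numerically; a sketch only, since np.linalg.det works in floating point (hence the use of np.isclose):

    import numpy as np

    A = np.arange(16.0).reshape(4, 4)
    A[1, :] = 0.0                                    # make the second row a zero row
    print(np.isclose(np.linalg.det(A), 0.0))         # part (2): a zero row forces det(A) = 0

    n, k = 4, 3.0
    print(np.isclose(np.linalg.det(k * np.eye(n)), k ** n))   # part (4): det(k I_n) = k^n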

Missing proofs from Chapter 2


In the proof of Proposition 2.28, we will make extensive use of the prop-
erties and machinery of matrix multiplication developed in Chapter 1.
Notice that if we treat x and y as n × 1 column vectors, then the matrix
product \(y^T x\) is a 1 × 1 matrix, and furthermore
\[
y^T x \;=\; \begin{pmatrix} y_1 & y_2 & \dots & y_n \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\;=\; \bigl(\, y_1 x_1 + y_2 x_2 + \cdots + y_n x_n \,\bigr) \;=\; \bigl(\, x \cdot y \,\bigr).
\]

In other words, the matrix product yT x is the 1 × 1 matrix whose single


entry is the scalar product x·y. Since any 1×1 matrix just identifies with
the single scalar entry inside, we drop the exterior brackets and simply
write yT x = x · y. (Abusing notation is generally not recommended, as
it often creates confusion and errors, but occasionally it is helpful, if the
context of the abuse is clear.)
Having this observation in mind, the proof of Proposition 2.28 becomes
an exercise in matrix arithmetic.

Proof of Proposition 2.28. From above,
\[
\begin{aligned}
x \cdot (A^T y) &= (A^T y)^T x \\
&= (y^T (A^T)^T)\, x && \text{Fact 1.23 (7)} \\
&= (y^T A)\, x && (A^T)^T = A \text{ for any matrix} \\
&= y^T (Ax) && \text{Fact 1.23 (4)} \\
&= (Ax) \cdot y.
\end{aligned}
\]


C.2 Vector subspaces of Rn and dimension


In this section we introduce the idea of a vector subspace of Rn , and on
bases of these subspaces. The reader may find that the material covered
here is a little more abstract than that of previous notes.

Vector subspaces
Vector subspaces are special sets of vectors. Throughout this chapter, we
will be using sets and set notation. Readers unfamiliar with such things
are advised to read through these notes before continuing.

Definition C.1 (Vector subspaces). Let S ⊆ Rn be some subset of


vectors in Rn . We say that S is a vector subspace, or just a subspace
of Rn , if it satisfies three tests:

1. the zero vector 0 belongs to S;

2. whenever we have two vectors x and y in S, their sum x + y is


also in S (i.e. x + y ∈ S whenever x, y ∈ S), and

3. whenever x is in S and k is a scalar, the vector kx is also in S


(i.e. kx ∈ S whenever x ∈ S and k ∈ R).

Typically, subsets of Rn that are subspaces are denoted by capital letters


such as V , U and W . Tests (2) and (3) of Definition C.1 are sometimes
expressed by saying that a subspace is closed under the operations of
vector addition and scalar multiplication. In other words, one does not
wander out of a subspace by adding together any two vectors inside it,
or by taking any scalar multiple of any vector inside.

Subsets that are not subspaces


To show that a given subset of vectors is a subspace, one must verify that
it passes all three of the tests in Definition C.1. Before we give some
examples of vector subspaces, we will consider some examples of subsets
of Rn that are not subspaces, because they fail one of the three tests. In
doing so, we will gain some insight into how these tests operate.

Example C.2. Show that the following subsets of R2 are not sub-
spaces of R2.

1. S = {(x1, x2) ∈ R2 : x1² + x2² = 1}.

2. S = {(x1, x2) ∈ R2 : x1 = 0 or x2 = 0}.

3. S = {(x1, x2) ∈ R2 : x1 ≥ 0}.

Solution.

1. Evidently S fails Definition C.1, test (1), because 0 = (0, 0) in
this case, and 0² + 0² = 0 ≠ 1. Failure of one test is enough to
show that S is not a subspace. For good measure, we show that
S fails the other two tests as well. Consider test (2). Note that
(1, 0), (0, 1) both belong to S, but the sum (1, 0) + (0, 1) = (1, 1)
does not, because 1² + 1² = 2 ≠ 1. Now consider test (3).
We have (1, 0) ∈ S and 2 ∈ R, but 2(1, 0) = (2, 0) is not in S,
because 2² + 0² = 4 ≠ 1.
Geometrically, the set S is the circle in R2, having centre the
origin and radius 1. This set is illustrated below, together with
the fact that it fails test (2), using the example above.
[Figure: the unit circle in R2, with (1, 0) and (0, 1) on the circle and their sum (1, 0) + (0, 1) = (1, 1) lying off it.]

2. The set S passes test (1) because 0 = (0, 0) ∈ S. However,
it fails test (2). For example (1, 0), (0, 1) ∈ S, but (1, 1) =
(1, 0) + (0, 1) ∉ S. Incidentally, S does pass test (3). Indeed,
let (x1, x2) ∈ S and k ∈ R. Then k(x1, x2) = (kx1, kx2). Either
x1 = 0 or x2 = 0, so either kx1 = 0 or kx2 = 0. Whatever the
case, k(x1, x2) ∈ S.
Geometrically, S is the set of all points along the two standard
coordinate axes in R2. The failure of test (2) is illustrated.


[Figure: the two coordinate axes in R2, with (1, 0) and (0, 1) on the axes and their sum (1, 0) + (0, 1) = (1, 1) lying off them.]

3. As above, S passes test (1). It passes test (2) as well. Indeed,
if (x1, x2) and (y1, y2) belong to S, then by definition x1, y1 ≥ 0.
Now, (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2) and x1 + y1 ≥ 0,
so (x1 + y1, x2 + y2) ∈ S. However, S fails test (3). For example,
(1, 0) ∈ S and −1 ∈ R, but −1(1, 0) = (−1, 0) ∉ S.
Geometrically, S is the set of all points on and to the right
of the vertical axis (the line x1 = 0), as shown shaded
below. The failure of test (3) is illustrated.
[Figure: the shaded half-plane x1 ≥ 0, with (1, 0) on its boundary and −1(1, 0) = (−1, 0) lying outside it.]
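Because tests (2) and (3) quantify over all vectors and scalars, no computer check can prove that a set is a subspace; a failed spot check, however, is enough to rule one out, exactly as in the example above. The sketch below is purely illustrative: the membership functions and witness vectors are ad hoc encodings of the three sets of Example C.2.

    import numpy as np

    # Membership tests for the three subsets of R^2 in Example C.2 (ad hoc helper functions).
    in_circle     = lambda x: np.isclose(x[0] ** 2 + x[1] ** 2, 1.0)        # set (1)
    in_axes       = lambda x: np.isclose(x[0], 0) or np.isclose(x[1], 0)    # set (2)
    in_half_plane = lambda x: x[0] >= 0                                     # set (3)

    def spot_check(contains, x, y, k):
        """Print whether the set passes each test for the witnesses x, y, k (True = passes)."""
        print("test (1), zero vector:    ", bool(contains(np.zeros(2))))
        print("test (2), contains x + y: ", bool(contains(x + y)))
        print("test (3), contains k*x:   ", bool(contains(k * x)))

    x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    spot_check(in_circle, x, y, 2.0)         # all three tests fail, as in Example C.2 (1)
    spot_check(in_axes, x, y, 2.0)           # only test (2) fails
    spot_check(in_half_plane, x, y, -1.0)    # only test (3) fails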

Subspaces generated by lists of vectors


Vector subspaces of Rn can be generated in a natural way, using lists
of vectors. Before we look at that, it will help to consider the notion of
linear combinations of vectors.

Definition C.3 (Linear combinations). Let v1, . . . , vk be a list of vectors
in Rn (not necessarily orthonormal). Any sum of the form
\[
a_1 v_1 + \cdots + a_k v_k \;=\; \sum_{i=1}^{k} a_i v_i,
\]
where a1, . . . , ak ∈ R are scalars, is called a linear combination of
the vectors v1, . . . , vk. Given x in Rn, if it is possible to find scalars
b1, . . . , bk ∈ R, such that
\[
x \;=\; \sum_{i=1}^{k} b_i v_i,
\]
then x is called a linear combination of v1, . . . , vk.

For example. . .

Example C.4. Let v1 = (1, 1, 0), v2 = (−1, 1, 0) and v3 = (3, 1, 0) be


vectors in R3 . Then the vector x = (5, 11, 0) is a linear combination
of v1 , v2 , v3 , whereas y = (0, 0, 1) is not.

Solution. Let b1 = 8, b2 = 3 and b3 = 0. Then

b1 v1 + b2 v2 + b3 v3 = 8(1, 1, 0) + 3(−1, 1, 0) + 0(3, 1, 0)


= (8, 8, 0) + (−3, 3, 0) + (0, 0, 0) = (5, 11, 0) = x.

Hence x is a linear combination of v1 , v2 , v3 .


On the other hand, given any scalars a1 , a2 , a3 , notice that

a1 v1 + a2 v2 + a3 v3 = (a1 − a2 + 3a3 , a1 + a2 + a3 , 0).

The third entry of the linear combination is always 0, no matter what


a1 , a2 , a3 are, whereas the third entry of y equals 1. Hence y is not
a linear combination of v1 , v2 , v3 . 
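One numerical way to hunt for suitable scalars is to solve the corresponding linear system. The sketch below uses numpy.linalg.lstsq, which returns one particular (least-squares) choice of scalars; if that choice reproduces the target vector exactly, the vector is a linear combination of the list, and otherwise it is not.

    import numpy as np

    v1, v2, v3 = np.array([1., 1., 0.]), np.array([-1., 1., 0.]), np.array([3., 1., 0.])
    M = np.column_stack([v1, v2, v3])          # columns are the vectors of the list

    x = np.array([5., 11., 0.])
    y = np.array([0., 0., 1.])

    b, *_ = np.linalg.lstsq(M, x, rcond=None)  # one particular choice of scalars
    print(np.allclose(M @ b, x))               # True: x is a linear combination of v1, v2, v3

    c, *_ = np.linalg.lstsq(M, y, rcond=None)
    print(np.allclose(M @ c, y))               # False: no scalars reproduce y

Note that the scalars returned for x need not be b1 = 8, b2 = 3 and b3 = 0; as Remarks C.5 below points out, they are not uniquely determined for this list.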

Remarks C.5. It is worth noting that if we set c1 = 10, c2 = 2 and
c3 = −1 and consider v1, v2, v3 in the example above, then

c1v1 + c2v2 + c3v3 = 10(1, 1, 0) + 2(−1, 1, 0) − 1(3, 1, 0)
= (10, 10, 0) + (−2, 2, 0) + (−3, −1, 0)
= (5, 11, 0) = x.

So a different set of scalars has, in this case, given us the same vector.
The scalars involved in the linear combination are not uniquely
determined by the vector in question. (If the list of vectors happens to
be linearly independent, then the scalars involved in any linear combination
will be uniquely determined: no two different sets of scalars will yield the
same linear combination. We will not explicitly cover linearly independent
lists of vectors in these notes; however, orthonormal lists of vectors happen
to be examples of them.)

Linear combinations can be used to define subspaces.

Definition C.6 (Subspaces generated by vectors). Let v1, . . . , vk be a
list of vectors in Rn. Define the set of all possible linear combinations
of the list

V = {a1v1 + · · · + akvk : a1, . . . , ak ∈ R}.

Then V is called the subspace spanned or generated by v1, . . . , vk.

In the definition above, V was claimed to be a subspace without any


proof of this fact. The next result repays this debt.

Proposition C.7. The set V given above is a subspace of Rn .

Proof. We must verify that V satisfies the criteria in Definition C.1.

1. If we set a1 = a2 = · · · = ak = 0, then

a1v1 + · · · + akvk = 0v1 + · · · + 0vk = 0.

Thus 0 ∈ V.

2. Let x, y ∈ V. By definition, x and y are linear combinations of
v1, . . . , vk, so there exist scalars a1, . . . , ak and b1, . . . , bk such that

x = a1v1 + · · · + akvk and y = b1v1 + · · · + bkvk.

Adding these gives

x + y = (a1v1 + · · · + akvk) + (b1v1 + · · · + bkvk)
= (a1 + b1)v1 + · · · + (ak + bk)vk ∈ V.

3. Finally, if c is a scalar, then (with x as above)

cx = c(a1v1 + · · · + akvk) = (ca1)v1 + · · · + (cak)vk ∈ V.

Hence V satisfies all three criteria and so is a subspace of Rn.

Given v1 , . . . , vk and V as above, observe that vi ∈ V for each i. Indeed,


vi can be expressed as the linear combination

vi = 0v1 + · · · + 0vi−1 + 1vi + 0vi+1 + · · · + 0vk ,

thus it is an element of V , by the definition of V .

Geometric examples of subspaces


The material above looks rather abstract at first glance. However, many
examples of subspaces turn out to be quite easily visualised geometri-
cally. We consider a few such examples. First, we begin with the ‘largest’
subspace of Rn , namely itself.

Example C.8. The set Rn is a subspace, because it is a subset of itself


and evidently satisfies all three of the tests present in Definition C.1.

This is considered to be the largest subspace of Rn because every sub-


space of Rn is, by definition, a subset of it.

Example C.9.

1. Fix a non-zero vector v in Rn , and let V be the set of all scalar


multiples of v, i.e.

V = {av : a ∈ R} .

The vector v comprises a 1-element list of vectors. By Propo-


sition C.7, V is a subspace of Rn . It is the set of all vectors
parallel to v. If we were to plot all the elements belonging to
V above, we would get a straight line through the origin of Rn ,
extending infinitely in both directions and running parallel to
v. Such a subspace is 1-dimensional. We will cover dimension
a little more later.

2. Specifically, let v = (1, 3, 4) in R3 . Then V as above is given by

V = {a(1, 3, 4) : a ∈ R} = {(a, 3a, 4a) : a ∈ R} .


Example C.10.

1. Fix non-zero vectors v1 and v2 in Rn that are not scalar multiples


of each other, i.e., for no scalar c do we have v2 = cv1 , and vice-
versa. Define

V = {a1 v1 + a2 v2 : a1 , a2 ∈ R} .

Again, by Proposition C.7, V is a subspace of Rn . It is a so-


called 2-dimensional subspace.

2. Specifically, let v1 = (1, 0, ½) and v2 = (0, 1, ½) in R3. Then V as
above is given by

V = {(a1, a2, ½(a1 + a2)) : a1, a2 ∈ R}.

If we were to plot all the elements of V , we would get a plane


in R3 , passing through the origin.

In the figure below, we plot a section of subspace V in Example C.10 (2).

Figure C.1: A section of the plane in Example C.10 (2)

The whole subspace, if plotted, would look like a flat surface extending
arbitrarily far in all directions. Any vector in V , if regarded as a straight
line segment having the origin as its initial point, would lie along the
surface, as depicted above.

Next, we consider the so-called ‘trivial’ subspaces.

Example C.11. Let 0 = (0, . . . , 0) in Rn be the zero vector. The set


{0} is called the trivial subspace of Rn .

The trivial subspace of Rn is 0-dimensional. Strictly speaking, there is


a different trivial subspace of Rn for each value of n, because the zero
vectors 0, (0, 0), (0, 0, 0),. . . , change with n. However, since they are all
simply the 1-element set containing the zero vector of the given Rn , in
some sense, they are all the same.
Given a subspace V ⊆ Rn , test (1) of Definition C.1 guarantees that {0}
is always a subset of V . Therefore, {0} can be regarded as the ‘smallest’
subspace of Rn , hence its name.
We finish this subsection by considering another type of subspace that
has strong connections to geometry.

Exercise C.12. Fix a non-zero vector n in Rn , and let V be the set of


all vectors x in Rn that are orthogonal to our fixed vector n, i.e.

V = {x ∈ Rn : n · x = 0} .

Use Fact 2.20 to show that V is a subspace of Rn .

Such a subspace as given in Exercise C.12 is called a hyperplane of Rn .


A hyperplane of Rn is any subspace of Rn that has dimension n − 1. If
n = 3 then a subspace defined in this way will yield a 2-dimensional
plane of the type described in Example C.10. Rather than hyperplanes,
such objects are simply called planes. The vector n in Exercise C.12 is
called normal to the hyperplane.

Example C.13. Let n = (−½, −½, 1). Then the subspace V of R3 de-
fined in Exercise C.12 happens to equal the subspace defined in Ex-
ample C.10 (2).

In the figure below, the plane in Example C.10 (2) is plotted once again,
this time with the vector n = (−½, −½, 1), lying normal to the plane. Every
vector in the plane is perpendicular to n. Normal vectors are not unique.
Any non-zero scalar multiple of a vector normal to a given hyperplane
will also be a vector normal to that same hyperplane.
(We cover planes and normal vectors in R3 in a very cursory manner
here. To understand these things properly, we require the notion of the
vector product or cross product, which is another form of product that
applies exclusively to vectors in R3. We will not cover this product in
this module.)

Figure C.2: The plane in Example C.10 (2) with normal vector n

Subspaces defined by orthonormal lists of vectors


For the remainder of this section, we will deal exclusively with orthonor-
mal lists of vectors v1, . . . , vk in Rn. According to Theorem 4.6 (1), we
must have k ≤ n. Given such a list, let us define V as in Definition C.6.

Proposition C.14. Let v1, . . . , vk be an on list of vectors in Rn and let

x = a1v1 + · · · + akvk ∈ V,

be a linear combination, where V is as in Definition C.6. Then ai =
x · vi whenever 1 ≤ i ≤ k. In particular, whenever x ∈ V, we have

x = (x · v1)v1 + · · · + (x · vk)vk.

Proof. Let 1 ≤ i ≤ k. Taking scalar products of both sides with vi gives


us
x · vi = (a1 v1 + · · · + ak vk ) · vi
= a1 (v1 · vi ) + · · · + ai (vi · vi ) + · · · + ak (vk · vi ) by Fact 2.20
= a1 (0) + · · · + ai (1) + · · · + ak (0)
= ai .
Now we turn to the second statement of the proposition. If x ∈ V then,
by definition of V , there exist scalars a1 , . . . , ak such that
x = a1 v1 + · · · + ak vk .

From above ai = x · vi for each i, thus

x = (x · v1 )v1 + · · · + (x · vk )vk ,

as claimed.

It will help to compare the next definition with Definition 4.7.

Definition C.15 (Orthonormal bases of subspaces and coordinates).


Let v1 , . . . , vk be an on list of vectors in Rn and define V as in Defi-
nition C.6.

1. The list v1 , . . . , vk is called an orthonormal basis or basis of V .

2. Given x ∈ V , the numbers x · v1 , . . . , x · vk are the coordinates


of x with respect to the basis v1 , . . . , vk .

Recall Remarks C.5. Unlike the list of vectors in that remark, given x ∈ V
as above, the scalars used to express x as a linear combination of the
vectors v1 , . . . , vk are uniquely determined: by Proposition C.14, they must
equal the coordinates x · v1 , . . . , x · vk , otherwise one would get a different
vector.
Definition C.15 is a generalisation of Definition 4.7 as it allows us to
define on bases of subspaces of Rn , and coordinates of vectors that lie in
said subspaces, rather than just Rn itself. If we set k = n and V = Rn ,
then Definition C.15 boils down to Definition 4.7.
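As a quick numerical illustration of Proposition C.14 (again, not part of the module), take the on list that appears in Example C.19 below, build a vector of V from known scalars, and recover those scalars as dot products:

    import numpy as np

    # An orthonormal list generating a 2-dimensional subspace V of R^3 (as in Example C.19 below).
    v1 = np.array([1.0, 0.0, 0.5]) * np.sqrt(4 / 5)
    v2 = np.array([1.0, -5.0, -2.0]) / np.sqrt(30)

    x = 2 * v1 + 3 * v2                 # a vector of V built from known scalars

    a1, a2 = x @ v1, x @ v2             # the coordinates of x, as in Proposition C.14
    print(np.isclose(a1, 2), np.isclose(a2, 3))      # True True
    print(np.allclose(a1 * v1 + a2 * v2, x))         # True: x is reconstructed exactly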

Dimension of subspaces

Definition C.16 (Subspace dimension). Let v1 , . . . , vk be an on list of


vectors in Rn and define V as in Definition C.6. The number k is
known as the dimension of the subspace V , and is denoted dim V .

On reflection, this is a natural definition of the dimension of a vector


subspace. Given an on list of vectors v1, . . . , vk, any vector x in the subspace
generated by the list has k coordinates x·v1 , . . . , x·vk . One can associate
with each vector vi (which is an element of V ), a coordinate axis running
parallel to vi . Since the vi are mutually orthogonal, so are the associated
axes. Thus we have k mutually orthogonal coordinate axes which can


be used to locate each and every vector in V , using their k coordinates.


Having this geometric picture in mind, it seems natural to define the
dimension of V to be k.
In particular, given that Rn itself is generated by the on list e1 , . . . , en , we
get dim Rn = n, which agrees with our intuition of what the dimension of
Rn ought to be.

Remarks C.17.

1. The definition above may not work if we do not require that
the list v1, . . . , vk is orthonormal. Given a general list of vectors
v1, . . . , vk, we can generate the corresponding space V, and it
will have a dimension. However, without further information
about v1, . . . , vk, we are only able to say that dim V ≤ k – we
may not have equality in general. (We will have dim V = k if
the list of vectors is linearly independent, but again, we won’t
treat this notion explicitly in these notes.)

2. The definition of dimension disguises a subtle point which is
worth mentioning. A subspace V of Rn can be generated by
different on lists of vectors. In fact, if dim V ≥ 2, then there are
infinitely many such lists. For instance, fix an angle θ and con-
sider the vectors v1 = (cos θ, sin θ, 0) and v2 = (− sin θ, cos θ, 0).
It turns out that the subspace V of R3 generated by the list
v1, v2, which is orthonormal, is equal to the set of vectors

V = {(a, b, 0) : a, b ∈ R},

regardless of the value of θ. (Verifying this fact would be an
instructive exercise.) Thus V is generated by infinitely many
on lists.
Since a general subspace V may be generated by many differ-
ent on lists of vectors, this raises the possibility that there are
on lists of vectors having different lengths, which generate V.
If this is the case then our definition of dimension would flail
about in a rather embarrassing way, because we would have
more than one competing candidate for the number dim V.
Fortunately, such a frightening spectre can never arise. It turns
out that, while V can be generated by many different on lists of
vectors, all such lists have the same length. Thus our definition
of dimension is safe and sound: the length of any one such list
will be the same as the length of any other! (The fact that all
bases of a given vector subspace have the same length follows
from the so-called Steinitz Exchange Lemma. It is a key result
in the theory of vector spaces.)

Orthogonal projections onto subspaces of Rn

This is a continuation and development of Section 2.5 and Exercise 4.12.
Section 2.5 should be regarded as the ‘1-dimensional’ version of what
follows.
One of the ways in which ‘high-dimensional’ data can be analysed is to
take this data, which can be represented by a set of vectors in Rn (where
n tends to be a large number, hence ‘high-dimensional’), and ‘project’ it
down onto a subspace V of Rn , having a dimension much lower than n.
Since the projected data exists in a lower-dimensional space, it tends
to be easier to interpret than the original data (picture a 3-dimensional
object, and then picture a 7-dimensional object. . . ). Whenever projecting
in this way we will lose some information about the original data, but the
trick is to proceed in such a way that the projected data retains much of
the most important information possessed by the original data.
Let us fix an on list of vectors v1, . . . , vk in Rn. This generates a sub-
space V of Rn as per Definition C.15. Compare the next definition with
Definition 2.30 and Exercise 4.12. (The key to retaining the important
features of a data set when projecting is to make an appropriate choice
of list of on vectors.)
Definition C.18. Given x in Rn , the vector

projV (x) = (x · v1 )v1 + · · · + (x · vk )vk ∈ V ,

is called the orthogonal projection of x onto V .

Geometrically speaking, projV (x) is defined in such a way that the vector
x − projV (x) is orthogonal to the entire subspace V , in the sense that
it is orthogonal to every vector in V . Imagine a fly hovering above a
horizontal table top. If we draw a vertical line down from the fly to the
table top, then the point at which the line meets the table would be the
orthogonal projection of the fly onto the table. This vertical line can be
said to be perpendicular to the entire table top.
Moreover, projV (x) is defined so that the distance between x and projV (x)
is less than the distance between x and any other vector z in V . In other
words,
‖x − projV(x)‖ ≤ ‖x − z‖,


whenever z ∈ V . The distance between x and projV (x) is the shortest


distance between x and the entire subspace V . In the context of the
hovering fly, the length of the vertical line between the fly and the table
top yields the shortest distance between the fly and the table top.

Example C.19. The plane V in Example C.10 (2) can be generated
by the on list v1, v2, where v1 = √(4/5) (1, 0, ½) and v2 = (1/√30) (1, −5, −2).
Let x = (¼, ¼, 1). Compute projV(x).

Solution. According to Definition C.18, we have

projV(x) = (x · v1)v1 + (x · v2)v2
= (3/5, 0, 3/10) + (−1/10, 1/2, 1/5) = (1/2, 1/2, 1/2). □
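A short numerical check of Example C.19 (illustration only):

    import numpy as np

    v1 = np.array([1.0, 0.0, 0.5]) * np.sqrt(4 / 5)
    v2 = np.array([1.0, -5.0, -2.0]) / np.sqrt(30)
    x = np.array([0.25, 0.25, 1.0])

    proj = (x @ v1) * v1 + (x @ v2) * v2     # Definition C.18
    print(proj)                              # approximately [0.5 0.5 0.5]

    # x - proj_V(x) is orthogonal to both basis vectors, hence to the whole of V.
    print(np.isclose((x - proj) @ v1, 0), np.isclose((x - proj) @ v2, 0))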

Example C.19 is depicted below. The vector projV(x) is represented by
the arrow in the plane. The arrow pointing perpendicular to the plane
represents the vector x − projV(x). The vector x is given by the third
arrow. Together, all three vectors form a right-angled triangle:
projV(x) · (x − projV(x)) = 0.

Figure C.3: Projection of x onto the plane in Example C.19

The claims made above are justified by the next theorem. What follows
is essentially the solution of Exercise 4.12.

Theorem C.20. Let v1, . . . , vk, V and x be as above. Then

1. (x − projV(x)) · z = 0 whenever z ∈ V, and

2. ‖x − projV(x)‖ ≤ ‖x − z‖ whenever z ∈ V.

Proof.

1. First of all, given 1 ≤ i ≤ k, observe that
\[
\begin{aligned}
(x - \mathrm{proj}_V(x)) \cdot v_i &= x \cdot v_i - \bigl((x \cdot v_1)v_1 + \cdots + (x \cdot v_k)v_k\bigr) \cdot v_i \\
&= x \cdot v_i - (x \cdot v_1)(v_1 \cdot v_i) - \cdots - (x \cdot v_i)(v_i \cdot v_i) - \cdots - (x \cdot v_k)(v_k \cdot v_i) \\
&= x \cdot v_i - x \cdot v_i \\
&= 0,
\end{aligned}
\]
using Fact 2.20. Now let z ∈ V. According to Proposition C.14, we
have

z = (z · v1)v1 + · · · + (z · vk)vk.

Consequently,
\[
\begin{aligned}
(x - \mathrm{proj}_V(x)) \cdot z &= (x - \mathrm{proj}_V(x)) \cdot \bigl((z \cdot v_1)v_1 + \cdots + (z \cdot v_k)v_k\bigr) \\
&= (z \cdot v_1)\bigl((x - \mathrm{proj}_V(x)) \cdot v_1\bigr) + \cdots + (z \cdot v_k)\bigl((x - \mathrm{proj}_V(x)) \cdot v_k\bigr) \\
&= 0 + \cdots + 0 \qquad \text{from above} \\
&= 0,
\end{aligned}
\]
again using Fact 2.20.

2. Given z ∈ V, set u = x − projV(x) and v = projV(x) − z. Observe that
v ∈ V, as V is a subspace. Thus u · v = 0 by part (1). It follows that

‖x − projV(x)‖ = ‖u‖ ≤ ‖u + v‖ = ‖x − z‖,

by Theorem 2.22.

Some congratulations are in order, because you have reached. . .


Well done!
