
Linear Algebra (Online)

MATH10390

Dr Richard Smith

May 20, 2023

Dr Richard Smith copyright © 2022/23.

Contents

0 Programme overview
  0.1 Programme outline
  0.2 Assessment and grading
  0.3 Continuous assessment schedules
  0.4 Discussion boards and MathJax
  0.5 Any other business

1 Matrices 1
  1.1 Introduction to matrices
  1.2 Matrix multiplication
  1.3 Further ideas from matrix arithmetic
  1.4 Inverses and determinants of 2 × 2 matrices
  1.5 Inverses and determinants of n × n matrices

2 Vector Geometry 1
  2.1 Euclidean n-space and vectors
  2.2 Vector arithmetic
  2.3 The scalar product
  2.4 Matrices acting on vectors
  2.5 Orthogonal projections

3 Systems of Linear Equations
  3.1 Systems of linear equations
  3.2 Elementary row operations
  3.3 Row-echelon and reduced row-echelon forms
  3.4 Parametric solutions and inconsistent systems
  3.5 Connections between linear systems and matrices

4 Vector Geometry 2
  4.1 Orthonormal bases of Rn and coordinate systems
  4.2 Orthonormal bases and orthogonal matrices

5 Eigenvalues and Eigenvectors of Matrices
  5.1 Eigenvalues and eigenvectors
  5.2 The characteristic equation
  5.3 Eigenvectors
  5.4 Symmetric matrices and orthonormal bases of eigenvectors

6 Matrices 2
  6.1 Quadratic forms
  6.2 Matrix norms

A Discussion board and WeBWorK guides
  A.1 How to use the Moodle discussion boards
  A.2 How to use WeBWorK

B Principal component analysis (non-examinable)

C Additional material (non-examinable)
  C.1 Additional proofs
  C.2 Vector subspaces of Rn and dimension
Chapter 0

Programme overview

0.1 Programme outline


Linear Algebra (Online) and Calculus (Online) comprise the Professional Certificate in Mathematics for Data Analytics and Statistics. (It should be possible to link to anything in blue.) The purpose of these modules is to teach the student fundamental concepts and techniques from linear algebra and calculus that are, for instance, necessary for the study of multivariate statistics. While the material in the modules will be quite generic in nature (and thus applicable to many other fields), the reader will find in some of the appendices specific techniques in multivariate statistics (e.g. MATH10390 Appendix B on Principal Component Analysis) that draw together many of the topics covered in the programme as a whole, either directly or indirectly.

MATH10390 Linear Algebra (Online) outline


• Matrices 1
Matrix arithmetic, determinants of n × n matrices and their com-
putation for small n, and the adjugate method of finding inverses.
Symmetric and orthogonal matrices.

• Vector geometry 1
Vectors in n-dimensional Euclidean space, vector arithmetic, scalar
products, the Cauchy-Schwarz inequality, angles between vectors,
and the action of matrices on vectors.


• Systems of linear equations


Solutions of systems of linear equations by Gaussian elimination,
connections between matrices and linear systems, including matrix
rank.

• Vector geometry 2
Orthonormal lists of vectors, orthonormal bases of Rn and coordi-
nate systems.

• Eigenvalues and eigenvectors of matrices


Eigenvalues and eigenvectors of n × n matrices, and their compu-
tation for n = 2 and n = 3. Symmetric matrices and orthonormal
bases of eigenvectors.

• Matrices 2
Quadratic forms and matrix norms.

MATH10400 Calculus (Online) outline


• Functions and Limits
Functions, domain, codomain, algebra of functions, injectivity, sur-
jectivity, inverses, limits, algebra of limits, continuity, polynomials,
rational functions, trigonometric, exponential and logarithmic func-
tions.

• Differentiation
Rates of change, differentiation from first principles, relationship
with continuity, the power, product, quotient and chain rules, deriva-
tives of polynomials, trigonometric, exponential and logarithmic func-
tions, and composites thereof.

• More about the derivative


Critical points, finding local maxima, minima and inflection points,
higher order derivatives, Rolle’s Theorem, the Mean Value Theorem,
linear approximation, and logarithmic and implicit differentiation.

• Functions of several variables


Functions on Rn (mostly n = 2) and their graphs, partial deriva-
tives, gradients, critical points and their classification, second order
partial derivatives, Hessian matrices, lines of best fit, least squares.

• Integration
Indefinite integrals as antiderivatives, standard examples, Riemann
sums, definite integrals and area, the Fundamental Theorem of Cal-
culus.

• Methods of integration
Integration by substitution and integration by parts.

• Numerical techniques
Solving equations numerically, the bisection and Newton-Raphson
methods, numerical integration, the trapezoidal rule and Simpson’s
rule.

Examples will be peppered throughout the two modules. While there


will be some theory, the emphasis will be on the introduction of ideas
and techniques. A small number of mathematical proofs are included, but
they will be relatively straightforward, and will be specific applications
of the techniques introduced during the course of the modules. Any of
the deeper, more involved proofs that would belong to more theoretical
courses on linear algebra or calculus will be confined to Appendix C.1
and Appendix B.1, respectively, should the reader be interested. These
appendices, replete with dark secrets and forbidden magic, will not be
examinable!

Lecturer – Dr Richard Smith


I am on the right. My email address is richard.smith@maths.ucd.ie and the address of my website is https://maths.ucd.ie/~rsmith. Please only use my email address in the event of an emergency! See point 4 below under 'Online material', and Section 0.4 for details on how to pose queries concerning the module. (In keeping with ancient academic tradition, the photo is comfortably out of date.)
My office is S1.71, first floor, Science Centre South. It is in building 12, square 6D, on the most recent version of the UCD map available.
In addition, the Mathematics and Statistics School Office is in G03, ground floor, Science North, in building 65, square 6C on the map.


Figure 0.1: School office: G03, ground floor, Science North

Online material
1. UCD Mathematics Moodle
All module material will be made available on

UCD Mathematics Moodle (https://vector.ucd.ie/moodle).

Once at the site, please log in using your UCD Connect creden-
tials and then enrol to both modules (the enrolment key for both is
‘ucdprofcert2023’).

2. Lecture notes and videos


The full set of lecture notes for each module will be made available when the programme opens. At 9am (summer time, i.e. 08:00 GMT) on each Monday of the first eight weeks of term, a set of short videos covering the central topics from the notes in further detail will be released. Note that this material is intended to be absorbed over the 12-week summer teaching term; in particular, the assessment will be spread across the 12 weeks. The compressed schedule is designed to allow people to look ahead if they wish. (You will see a series of exercises in the notes themselves. You do not need to submit solutions to these.)

3. Continuous assessment
Continuous assessment comes in two forms: written homework and
WeBWorK. Both will be issued and managed online – see Section
0.2 for more details. The full schedule of issue dates and assessment
deadlines is given in Section 0.3.

4. Discussion boards
Students can post queries and discuss topics via the weekly dis-
cussion boards – see Section 0.4 for more information.

5. News and announcements


I will make class announcements via the ‘MATH10390 announce-
ments’ and ‘MATH10400 announcements’ discussion boards at the
top of each site’s main page, respectively. These announcements
will be repeated in the ‘Latest news’ boxes to the right hand side.

Mobile access to online material


We recommend that you view UCD Mathematics Moodle via a web brow-
ser on a desktop or laptop computer, or tablet.
Moodle does have an app, available for both iOS and Android mobile
platforms. However, be warned that its functionality is limited: it does
not work with WeBWorK all that well, and there is no rendering of math-
ematical notation via MathJax (see Section 0.4). Given the app’s limita-
tions and given my strong doubts about whether smartphones can really
aid the acquisition of deep knowledge and understanding, I am simply
making you aware of the app rather than actively promoting it.
To gain access to UCD Mathematics Moodle via the app, please enter

vector.ucd.ie/moodle

when prompted for the URL, and then log in as usual.

0.2 Assessment and grading


The proportion of marks allocated to the various assessment components
will be the same for both modules. On the other hand, the modules’
assessment deadlines will be different – see Section 0.3 below.


Homework (20% of final mark)


Four homework sheets will be issued on Moodle over the course of the
module. Each is worth 5% of your final mark. You will receive a mark out
of 5 for each sheet. The only possible exception to this is MATH10390
Homework Sheet 4, where you may receive an additional 2 bonus marks,
owing to the length of the sheet.
Written solutions to the homework should be scanned to pdf files and
submitted online via Moodle. As well as ordinary scanners, there are
some apps (free or ‘freemium’) that use smartphone cameras to scan doc-
uments to pdf, such as CamScanner. Alternatively, with a suitable app,
you can write solutions directly onto a touchscreen device, such as a
tablet, and export to a pdf file.
Please ensure that your student number and solutions are clearly visible,
otherwise, you may lose marks! For each homework assignment, you
must submit exactly one pdf file containing your solutions and accept
the submission declaration before clicking the submit button (see the
end of Section 0.5). The maximum size of files to be uploaded is 10 MB,
which should be plenty.
Homework issue dates and submission deadlines are given in Section 0.3
below. Marks and feedback on submitted solutions will be provided on a
rolling basis.

WeBWorK (10% of final mark)


WeBWorK is an online homework system, again available via Moodle.
Five WeBWorK homework sets will be issued over the course of the mod-
ule. Each set is worth 2% of your final mark. You will receive a mark out
of 2 for each set.
Solutions are entered directly online. For advice on entering answers,
and comments on certain questions, please see the WeBWorK guide in
Appendix A.2 (of either module).
WeBWorK set issue dates and submission deadlines are given in Section
0.3 below. Answers to a given WeBWorK set will be released immediately
after the corresponding deadline.

Final exam (70% of final mark)


Prior to the pandemic the exams for the two modules in this programme were offered in-person in UCD. In 2020 and 2021 they were moved online due to the pandemic. Last year we reverted to in-person exams, and we will do the same this year. Online exams are problematic because they make it very difficult to protect the integrity of assessment; unfortunately plagiarism was committed when the exams were online.
The two final 2-hour written exams will take place from 10am – 12 noon,
and 2pm – 4pm in B108, Computer Science Building, University College
Dublin, Friday 25 August 2023 (building 18, square C6 on the UCD map).
No alternative exam date will be offered.
When travel to Dublin for the final exams is not possible, examination in
appropriate third party centres may be facilitated. Such arrangements
will need to be made well in advance of the exam and cannot be guar-
anteed. Contact Natalia Zadorozhnyaya (dataanalyticsonline@ucd.ie)
by Friday 9 June (week 3) to enquire. Students availing of an alternative
examination centre will need to bear all the associated costs.

Grading
You will receive a mark out of 30 for your continuous assessment, which
will be converted to a letter grade according to the University’s Standard
Conversion Grade Scale (see under Mark to Grade Conversion Scales).
Likewise, you will receive a mark out of 70 for your final exam which will
be converted into a letter grade in the same manner. These two letter
grades will be combined to make an overall module grade; the precise
mechanism by which this will be achieved can be seen under Module
Grade Calculation Points.

0.3 Continuous assessment schedules


All MATH10390 continuous assessment issue dates and deadlines will fall
at 9am (summer time, i.e. 08:00 GMT) on Tuesdays. All MATH10400
issue dates and deadlines will fall at 9am on Wednesdays.
Two weeks are given to complete each WeBWorK set. There are no
‘overall submit’ buttons. To obtain full credit for a set, simply enter the
correct solutions online before the deadline.
The amount of time given to complete written homework assignments varies. Early submission of written homework assignments is strongly encouraged, but the formal deadlines are structured in a way that allows students some flexibility in making their own work plan. The complete schedules are given in the tables below. (The deadlines start to pile up towards the end of the modules. Please be mindful of this!)


The WeBWorK deadlines are hard deadlines. Regarding homework dead-


lines, if homework is submitted late, your total mark will decrease linearly
to zero after 48 hours. For example, a piece of work ordinarily worth 5%
will receive 2.5% if submitted 24 hours late, and 1.25% if submitted 36
hours late, and so on.
WeBWork marks and feedback will be returned immediately after the
deadlines, and homework marks and feedback will be returned within 10
working days of the deadlines. For this reason UCD’s Late Submission
of Coursework Policy will not apply to these modules (see point 6.1 in
the policy). Penalties for late submission may be waived if the student
has valid extenuating circumstances (see Section 0.5).

MATH10390 assessment schedule (9am Tuesdays)

Week  Date   Assignment issue          Assignment deadline
2     30-05  WeBWorK 1
3     06-06  Homework 1
4     13-06  WeBWorK 2                 WeBWorK 1
5     20-06  Homework 2
6     27-06  WeBWorK 3, Homework 3     WeBWorK 2
7
8     11-07  WeBWorK 4, Homework 4     WeBWorK 3, Homework 1
9     18-07                            Homework 2
10    25-07  WeBWorK 5                 WeBWorK 4, Homework 3
11
12    08-08                            WeBWorK 5, Homework 4

(Again, while the videos are compressed into an 8-week period, the continuous assessment is spread throughout the 12-week summer teaching term. The homework deadlines for both modules are closely aligned, with the exception of homework sheet 4.)
MATH10400 assessment schedule (9am Wednesdays)

Week  Date   Assignment issue          Assignment deadline
2     31-05  WeBWorK 1
3     07-06  Homework 1
4     14-06  WeBWorK 2                 WeBWorK 1
5     21-06  Homework 2
6     28-06  WeBWorK 3, Homework 3     WeBWorK 2
7
8     12-07  WeBWorK 4, Homework 4     WeBWorK 3, Homework 1
9     19-07                            Homework 2
10    26-07  WeBWorK 5                 WeBWorK 4, Homework 3
11    02-08                            Homework 4
12    09-08                            WeBWorK 5

0.4 Discussion boards and MathJax


Weekly discussion boards
Each module will have its own set of discussion boards. Each week will
be given its own discussion board to keep conversations focussed. If you
have a query about the module or about its content, you are strongly
encouraged to post your query to the discussion boards. Please don’t
be afraid to ask questions!! From personal experience, I know that it can
feel daunting to ask questions (especially in an online environment), but
asking questions really is an excellent way of improving understanding!
Ordinarily, these boards will be monitored, and queries posted to them
will be answered, for up to two hours in the afternoon, Monday to Friday,
depending on the volume of queries. If I take leave at any point then I
will let you know via the ‘MATH10390 announcements’ and ‘MATH10400
announcements’ discussion boards and will arrange appropriate cover.
Please only contact me by email in the event of an emergency!
I will not monitor the boards at other times, or at weekends! Of course,
the online nature of these modules means that you can view module
content and work through assignments at any time, day or night. In
contrast, the lowly human behind it all (i.e. that person in the photo)
cannot be on hand at all times as well. Please do take this into account,
particularly when the Tuesday and Wednesday deadlines loom!

Posting mathematical content


The Moodle forums have a system called MathJax that allows people
to write mathematical notation directly into web pages, which enables
mathematical queries to be easily and clearly stated. Details of how to


use MathJax in the discussion boards are given in Appendix A.1 (of either
module).
You’re free to use MathJax to post mathematical content. Alternatively
you can post such content by writing it by hand and scanning it to a pdf
(see above) or by using a suitable pdf annotator, and then attaching the
pdf file to your post. This option may be preferable if you want to write
a lot of mathematical content.

0.5 Any other business


Suggested further reading
Regarding books, neither module formally follows a textbook. However, I
can suggest Anton, Rorres, Elementary Linear Algebra, Applications Ver-
sion, Chapters 1-3 and parts of Chapters 5 and 7 for MATH10390, and
Anton, Bivens, Davis, Calculus Early Transcendentals, parts of Chapters
0-5, 7 and 13 for MATH10400. It often helps to see concepts approached
from a second, slightly different perspective, and there are plenty of ex-
ercises to practice on.

Calculators permissible in the final exams


There is of course a huge range of calculators available and it is unre-
alistic to provide an explicit list of those that will be permissible in the
final exams. Generally speaking, the calculators that are not permissi-
ble are programmable ones or ones that are capable of more advanced
built-in functionality. As an example, the Casio fx-83GT PLUS model is
permissible, but the fx-991ES PLUS is not. If in doubt, please ask me
on the discussion boards. Of course, use of smartphones in the exams is
completely banned!

Registration, fee payment and withdrawals


Please confirm your personal details (including email address and photo)
and pay your programme fees using UCD’s Student Information System.
As this is a one-semester programme, payment is required in full be-
fore it starts on 22 May. For further assistance, please contact Natalia
Zadorozhnyaya (dataanalyticsonline@ucd.ie) or see UCD’s guide to
online registration and fee payment. Further information on fee payment
and deadlines can be found on the UCD Fees office website.

If you wish to withdraw from the programme, then please note that it is
essential to do so by Friday 11 August (week 12), to ensure you do not
have a failing grade recorded against your name on the University’s sys-
tem. Since this programme is only one trimester long, it is not possible
to get a refund upon withdrawal; see point 1.7 of UCD’s Refunds Policy.

Extenuating circumstances
The University has an Extenuating Circumstances Policy. The Univer-
sity defines extenuating circumstances to be ‘serious unforeseen circum-
stances beyond your control which prevented you from meeting the re-
quirements of your programme’. Note the following footnote on page 2
of the Student Guide on Extenuating Circumstances:

Work commitments are not normally considered to be extenu-


ating circumstances. However a student on a part-time and/or
continuing professional education programme may have work-
related extenuating circumstances outside of the norm (e.g. a
work-related court case that they legally must attend) and in
these exceptional cases, they should consult the appropriate
programme/school office for advice.

You can apply for extenuating circumstances online. For more details,
please contact Natalia Zadorozhnyaya (dataanalyticsonline@ucd.ie).

UCD Student Code and plagiarism


Concerning conduct and plagiarism in particular, please see the Univer-
sity’s Student Code of Conduct and specifically its Student Plagiarism
Policy. In addition to these documents, the School of Mathematics and
Statistics has its own Plagiarism Protocol. Please familiarise yourselves
with the second and third documents. The Library also has resources
and advice to help people avoid unintentional plagiarism.
In accordance with the School’s protocol (see §2.2), you will need to
accept the submission declaration before clicking the submit button. In
doing so, you acknowledge that you have neither given, sought, nor re-
ceived, aid in order to complete the assessment.

Chapter 1

Matrices 1

1.1 Introduction to matrices


Definition of matrices

Definition 1.1. An m × n ('m by n') matrix is a rectangular array of numbers having m rows and n columns, enclosed by brackets.

Example 1.2.

1. $\begin{pmatrix} 2 & -11 & 27 \\ 1 & 0 & -5 \end{pmatrix}$ is a 2 × 3 matrix.

2. Let θ be an angle, and define

   $$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

   In due course we will see that Rθ is the 2 × 2 rotation matrix which rotates points in 2 dimensions about the origin, through angle θ.

(The term 'matrix' was coined by James Sylvester (1814 – 1897), though much of what we now call matrix theory was known to mathematicians in preceding centuries.)

We usually label matrices using capital letters like A, B and C, etc.

In spite of their unassuming definition, matrices have an enormous range of applications to other fields, such as physical and biological sciences, probability and statistics, engineering, sociology, optimisation, computer science, and other areas of mathematics such as geometry and differential equations.

Remarks 1.3.

1. Two matrices have the same size if they have the same number of rows and the same number of columns, so a 2 × 3 matrix and a 3 × 2 matrix have different sizes.

2. A matrix having just one column is sometimes called a column vector – see Chapter 2 for more details.

3. If an m × n matrix has the same number of rows as columns, i.e., if m = n, then it is called a square matrix.

4. If A is an m × n matrix, the number appearing in the ith row and jth column is called the (i, j)-entry of A, and is denoted Aij.

5. Two matrices are equal if and only if they have the same size and the corresponding entries are all equal.

Example 1.4. Let $A = \begin{pmatrix} \sqrt{2} & 2 & -3 \\ 4 & 0 & 5 \end{pmatrix}$.

Then A12 = 2 (i = 1, j = 2), A23 = 5 (i = 2, j = 3), A13 = −3, etc.

Matrix addition and subtraction


Fundamental to linear algebra (and mathematics as a whole) is the idea that you can algebraically manipulate a host of mathematical objects that are sometimes more complicated than ordinary numbers.
In spite of the fact that matrices are composed of numbers, very often we prefer to treat them as whole entities in their own right, and perform operations on them such as addition, subtraction and multiplication, that in some ways (but certainly not all!) resemble those of numbers. (In real life, we are used to treating composite objects as whole things in their own right: we tend to treat people as whole people, rather than as bags of blood, bones and organs.)
In order to define matrix addition, subtraction and multiplication, we do need to open up the matrices, manipulate their innards so to speak (i.e. manipulate the matrix entries inside), and close them back up again. So let's get 'under the hood', or 'bonnet'.
Definition 1.5 (Matrix addition and subtraction). Let A and B be matrices of the same size (m × n). We define their sum, A + B, to be the m × n matrix whose entries are given by

$$(A + B)_{ij} = A_{ij} + B_{ij},$$

for i = 1, . . . , m and j = 1, . . . , n. Their difference, A − B, is the m × n matrix having entries

$$(A - B)_{ij} = A_{ij} - B_{ij},$$

where i and j are as above.

If A and B have different sizes then the matrix sums and differences A + B and A − B are undefined.

Thus A + B is obtained from A and B by adding entries in corresponding positions.

Example 1.6. Take two 2 × 4 matrices

$$A = \begin{pmatrix} 2 & 0 & -1 & -1 \\ 1 & 2 & 4 & 2 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -1 & 1 & 0 & -2 \\ 3 & -3 & 1 & 1 \end{pmatrix}.$$

Find A + B and A − B. (We can only add or subtract two matrices if they are the same size.)

Solution. We have

$$A + B = \begin{pmatrix} 2 + (-1) & 0 + 1 & -1 + 0 & -1 + (-2) \\ 1 + 3 & 2 + (-3) & 4 + 1 & 2 + 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & -1 & -3 \\ 4 & -1 & 5 & 3 \end{pmatrix},$$

and

$$A - B = \begin{pmatrix} 2 - (-1) & 0 - 1 & -1 - 0 & -1 - (-2) \\ 1 - 3 & 2 - (-3) & 4 - 1 & 2 - 1 \end{pmatrix} = \begin{pmatrix} 3 & -1 & -1 & 1 \\ -2 & 5 & 3 & 1 \end{pmatrix}.$$
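As a quick computational check of Example 1.6 (this snippet is not part of the original notes and assumes the third-party NumPy library is available), entrywise addition and subtraction can be carried out as follows.

    import numpy as np

    # The 2 x 4 matrices A and B from Example 1.6
    A = np.array([[2, 0, -1, -1],
                  [1, 2,  4,  2]])
    B = np.array([[-1,  1, 0, -2],
                  [ 3, -3, 1,  1]])

    print(A + B)   # entrywise sum:        [[ 1  1 -1 -3] [ 4 -1  5  3]]
    print(A - B)   # entrywise difference: [[ 3 -1 -1  1] [-2  5  3  1]]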

Matrix scalar multiplication


Matrix multiplication, that is, the multiplication of one matrix by another
to produce a third matrix, is defined below in Section 1.2. There is also
a way to multiply a matrix by a scalar, that is, an ordinary number, to
produce a new matrix.

Definition 1.7 (Matrix scalar multiplication). Let A be an m × n matrix and let c be a scalar (i.e. a number). Then cA is the m × n matrix having entries defined by

$$(cA)_{ij} = cA_{ij},$$

for i = 1, . . . , m and j = 1, . . . , n. I.e. cA is obtained from A by multiplying every entry in A by c.

Example 1.8. If $A = \begin{pmatrix} 2 & 5 \\ 3 & -4 \end{pmatrix}$, then

$$2A = \begin{pmatrix} 4 & 10 \\ 6 & -8 \end{pmatrix}, \quad -5A = \begin{pmatrix} -10 & -25 \\ -15 & 20 \end{pmatrix}, \quad 0A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$

Zero matrices
The humble 0 has a much more exciting history than you might think, and is in fact a very special number. It is the only number such that, when added to any other number, it has no effect: a + 0 = 0 + a = a. For this reason, mathematicians call it an additive identity.
The last example in Example 1.8 prompts the following definition.

Definition 1.9 (Zero matrices). The m × n matrix whose entries are all zero is called the zero (m × n) matrix.

The zero matrices are the matrix analogues of 0: if the m × n zero matrix is added to any m × n matrix, the matrix remains unchanged.

1.2 Matrix multiplication


Before we begin. . . summation notation
Matrix multiplication is very different from addition and subtraction. The
matrix product AB is definitely not computed in the ‘obvious’ way, i.e. by
multiplying the respective entries of A and B. The definition of ma-
trix multiplication is best expressed using summation notation. Readers
familiar with this kind of notation can skip on to the next subsection.
Otherwise, please read on. . .
In mathematics we are frequently required to add together sequences of numbers. Suppose that we set $x_i = 2i$, for every positive integer i. Then the sum of the first 4 terms of the sequence $x_i$ is given by

$$x_1 + x_2 + x_3 + x_4 = 2 + 4 + 6 + 8 = 20. \tag{1.1}$$

This is fine for small numbers of terms (in this case 4), but very often we need to add together much larger numbers of terms, such as a thousand terms or a million terms. Writing out such large numbers of terms as above would get upsetting very quickly.

Summation notation has been developed to deal with such eventualities. We will require it for matrix multiplication. The sum of terms $x_1 + x_2 + x_3 + x_4$ in (1.1) can be expressed instead as

$$\sum_{i=1}^{4} x_i. \tag{1.2}$$

The symbol $\sum$ is the summation symbol, i is the index of summation and 1 and 4 are the initial value (or lower limit) and final value (or upper limit) of the index of summation, respectively. Now we can express the sum of the first thousand terms of the sequence $x_i$ concisely as

$$\sum_{i=1}^{1000} x_i.$$

The sum of the terms $x_i$ from i = 34 to i = 781 can be written as

$$\sum_{i=34}^{781} x_i,$$

and so on.


Most of the time, the choice of letter to denote the index of summation does not matter. Letters such as j and k are commonly used instead of i. For example, the expressions

$$\sum_{j=1}^{4} x_j \quad\text{and}\quad \sum_{k=1}^{4} x_k$$

mean the same thing as (1.2): $x_1 + x_2 + x_3 + x_4$.

Roughly speaking, you are allowed to use whatever letter you like, provided it is not being used somewhere else. For instance, in matrix multiplication, we need to deal with quantities indexed by more than one variable. Suppose that $y_{ij} = j^2 i$, for all positive integers i and j. Then the sum

$$\sum_{j=1}^{4} y_{ij} \tag{1.3}$$

denotes the quantity

$$y_{i1} + y_{i2} + y_{i3} + y_{i4} = 1^2 i + 2^2 i + 3^2 i + 4^2 i = (1 + 4 + 9 + 16)i = 30i.$$

In (1.3), the index of summation is j: j increases from 1 to 4, while the quantity i remains constant during the summation process.

If we write $z_i = 30i$, then we see that

$$\sum_{j=1}^{4} y_{ij} = z_i, \tag{1.4}$$

for all positive integers i. On the left hand side of (1.4), i is fixed while j varies. Assuming i stays where it is, we can replace the summation index j with any letter, provided that it is not i. For example,

$$\sum_{k=1}^{4} y_{ik} = \sum_{\ell=1}^{4} y_{i\ell} = 1^2 i + 2^2 i + 3^2 i + 4^2 i = 30i = z_i,$$

however

$$\sum_{i=1}^{4} y_{ii} \tag{1.5}$$

means something completely different! The sum in (1.5) should be interpreted as

$$y_{11} + y_{22} + y_{33} + y_{44} = 1^3 + 2^3 + 3^3 + 4^3 = 1 + 8 + 27 + 64 = 100,$$

which is obviously not equal to (1.4). It would be legitimate to replace (1.4) by

$$\sum_{i=1}^{4} y_{ji} = z_j \quad\text{or}\quad \sum_{k=1}^{4} y_{\ell k} = z_\ell,$$

because in each case you are preserving those indices that are fixed and those that vary, respectively.

Finally, quite often the lower and upper limits of sums can be replaced by letters. For example, given positive integers k < n, the notation

$$\sum_{i=k}^{n} x_i \tag{1.6}$$

stands for

$$x_k + x_{k+1} + x_{k+2} + \cdots + x_{n-1} + x_n.$$

If $x_i = 2i$, as it was initially, then (1.6) happens to be equal to

$$n(n + 1) - (k - 1)k.$$
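For readers who like to experiment, summation notation translates directly into code. The following short snippet is not part of the original notes (plain Python, no extra libraries); it evaluates the sums in (1.1), (1.3) and (1.5).

    # (1.1): sum of x_i = 2i for i = 1, ..., 4
    print(sum(2 * i for i in range(1, 5)))       # 20

    # (1.3): sum over j of y_ij = j^2 * i, with i held fixed (here i = 7, say)
    i = 7
    print(sum(j**2 * i for j in range(1, 5)))    # 30 * i = 210

    # (1.5): sum over i of y_ii = i^2 * i = i^3
    print(sum(i**2 * i for i in range(1, 5)))    # 1 + 8 + 27 + 64 = 100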

Matrix multiplication

Definition 1.10 (Matrix multiplication). Let A, B be m × p and q × n matrices, respectively. The product AB of A and B is defined only if p = q, i.e. only if A has the same number of columns as B has rows. In this case, AB is an m × n matrix. The entries of AB are

$$(AB)_{ij} = A_{i1}B_{1j} + A_{i2}B_{2j} + A_{i3}B_{3j} + \cdots + A_{ip}B_{pj} = \sum_{k=1}^{p} A_{ik}B_{kj}.$$

If p ≠ q then the product AB is undefined.

(The choice of letter k here as the summation index is common, but not essential. E.g. we could use ℓ, or indeed any letter, provided the letter is not already in use.)

Like the arithmetic of ordinary numbers, matrix multiplication is performed before addition and subtraction, e.g. AB + C = (AB) + C, not A(B + C), etc. (See MATH10400 Fact 1.4.)

When first encountered, this definition is perhaps best understood using examples.


Example 1.11. Find AB when

$$A = \begin{pmatrix} 2 & -1 & 3 \\ 1 & 0 & -1 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 3 & 1 \\ 1 & -1 \\ 0 & 2 \end{pmatrix}.$$

Solution. There is a video for Example 1.11.

(Matrix multiplication was introduced by Arthur Cayley (1821 – 1895), in order to reproduce the behaviour of so-called linear transformations. Such things include geometric operations like rotations, dilations and shears.)

Example 1.12. If A and B are as in Example 1.11 then BA is a 3 × 3 matrix. By yourselves, verify that the complete product is

$$BA = \begin{pmatrix} 7 & -3 & 8 \\ 1 & -1 & 4 \\ 2 & 0 & -2 \end{pmatrix}.$$
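The products in Examples 1.11 and 1.12 can also be checked numerically. The sketch below is not part of the original notes; it assumes NumPy, whose @ operator performs matrix multiplication.

    import numpy as np

    A = np.array([[2, -1,  3],
                  [1,  0, -1]])      # 2 x 3
    B = np.array([[3,  1],
                  [1, -1],
                  [0,  2]])          # 3 x 2

    print(A @ B)   # the 2 x 2 product AB worked through in the video
    print(B @ A)   # the 3 x 3 product BA: [[ 7 -3  8] [ 1 -1  4] [ 2  0 -2]]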

Warning 1.13. Matrix multiplication is not commutative! That is to say, in general, AB ≠ BA. In Examples 1.11 and 1.12, we saw that AB and BA were both defined, but did not even have the same size. It is also possible for only one of AB and BA to be defined, e.g. if A is 2 × 4 and B is 4 × 3. Even if AB and BA are both defined and have the same size (e.g. if both are 3 × 3), the two products are typically different.

(It is absolutely essential to recognize this fact!! It is a pivotal difference between matrix arithmetic and the arithmetic of ordinary numbers.)
Matrix multiplication – some motivation
At first sight, the definition of matrix multiplication may seem unnatural
and overly complicated. However, there is an abundance of examples
which demonstrate that this is the ‘right’ way to do it. We consider one
example from geometry.
Happily, matrices and matrix arithmetic can be used to do things! In
general, we can represent a point (a, b) in 2 dimensions using polar
coordinates. If r > 0 is the distance from the origin to (a, b), and φ is
the angle from the positive x-axis to the point, measured anticlockwise
(in radians), then simple trigonometric considerations yield a = r cos φ
and b = r sin φ, i.e. (a, b) = (r cos φ, r sin φ).

Figure 1.1: Representation of a point (a, b) = (r cos φ, r sin φ) in 2 dimensions using polar coordinates. (For a very similar picture, see MATH10400 Figure 1.7.)

For those of you who are familiar with complex numbers, this is very much like representing the complex number z = a + ib in polar form.

If we represent this point as the 2 × 1 column vector $\begin{pmatrix} r\cos\phi \\ r\sin\phi \end{pmatrix}$, then multiplication on the left by Rθ gives

$$R_\theta \begin{pmatrix} r\cos\phi \\ r\sin\phi \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} r\cos\phi \\ r\sin\phi \end{pmatrix} = \begin{pmatrix} r\cos\theta\cos\phi - r\sin\theta\sin\phi \\ r\sin\theta\cos\phi + r\cos\theta\sin\phi \end{pmatrix} = \begin{pmatrix} r\cos(\theta + \phi) \\ r\sin(\theta + \phi) \end{pmatrix},$$

using the trigonometric addition formulae (see MATH10400 Lemma 1.33).

(By writing a point as a column vector, we are not doing anything deep. We are simply representing the point in a slightly different way, to take advantage of matrix multiplication. We return to this idea in Section 2.4.)

Thus the effect of Rθ on (a, b) is to preserve its distance r from the origin, but change the angle measured from the positive x-axis by θ, i.e., Rθ has rotated the point anticlockwise about the origin by θ. (The effect of matrices on column vectors is explored in more detail in Section 2.4.)

Example 1.14. If (a, b) = (3, 2) and θ = 5π/12 (75°), then

$$R_\theta \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} \cos\frac{5\pi}{12} & -\sin\frac{5\pi}{12} \\ \sin\frac{5\pi}{12} & \cos\frac{5\pi}{12} \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 3\cos\frac{5\pi}{12} - 2\sin\frac{5\pi}{12} \\ 3\sin\frac{5\pi}{12} + 2\cos\frac{5\pi}{12} \end{pmatrix} \approx \begin{pmatrix} -1.16 \\ 3.42 \end{pmatrix}.$$

(Matrices are used in computer graphics all the time: the 3-dimensional virtual information in the computer is converted into 2-dimensional information for display on the monitor or TV, using matrix multiplication. Thus, the ability to shoot zombies relies to some extent on matrix multiplication.)
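The rotation in Example 1.14 can be reproduced numerically; this is an illustrative sketch, not part of the original notes, assuming NumPy.

    import numpy as np

    theta = 5 * np.pi / 12                           # 75 degrees
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])  # the rotation matrix R_theta

    p = np.array([3.0, 2.0])                         # the point (a, b) = (3, 2)
    print(R @ p)                                     # approximately [-1.16  3.42]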

Figure 1.2: Rotation of points in 2 dimensions using rotation matrices.
Zero divisors
We have already seen that matrices defy the commutative law of multiplication: in general AB ≠ BA. Here is another example of how matrix multiplication violates once-sacrosanct traditions.

Example 1.15. If

$$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 6 \\ 0 & 0 \end{pmatrix},$$

then

$$AB = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 6 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$

which is the 2 × 2 zero matrix.

The point is that if a and b are ordinary numbers and ab = 0, then


either a = 0 or b = 0. We use this fact all the time without thinking,
for instance, when factorising polynomials. But this fact is not true of
matrices! The miscreants A and B in Example 1.15 are known in the
trade as zero divisors.
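The zero divisors of Example 1.15 are easy to reproduce; the following check is not part of the original notes and assumes NumPy.

    import numpy as np

    A = np.array([[0, 1],
                  [0, 0]])
    B = np.array([[0, 6],
                  [0, 0]])

    print(A @ B)   # [[0 0] [0 0]]: the product is zero although neither factor is
    print(B @ A)   # [[0 0] [0 0]] as well, in this particular case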

Square matrices and matrix powers


Recall that a matrix is square if it has the same number of rows as columns, i.e., if it is an n × n matrix for some n.

Definition 1.16. We denote the set of n × n matrices by Mn(R). If A is an n × n matrix then we can write A ∈ Mn(R), and vice-versa.

(If you are unfamiliar with set notation, please see MATH10400 Section 1.2, or these notes on sets, written for undergraduates. The symbol R here indicates that our matrices consist of real numbers. You can define matrices consisting of complex numbers, or even more exotic objects, but we won't consider such matrices very much in this module, if at all.)

Suppose n = 3. Given any two matrices in M3(R), we can add, subtract and multiply these matrices and get a result that is defined, and also in M3(R). This observation applies equally to any value of n.

Definition 1.17 (Matrix powers). Let A ∈ Mn(R), i.e. let A be an n × n matrix, and let k be a positive integer. The kth power A^k of A is the n × n matrix

$$\underbrace{A \times A \times \cdots \times A}_{k \text{ times}}.$$

(The set Mn(R), equipped with the operations of addition and multiplication, is an example of a mathematical structure known as a ring.)

Example 1.18. Let $A = \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} \in M_2(\mathbb{R})$. Then

$$A^2 = AA = \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & -4 \\ 0 & 9 \end{pmatrix}, \quad\text{and}\quad A^3 = AAA = A^2 A = \begin{pmatrix} 1 & -4 \\ 0 & 9 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & -13 \\ 0 & 27 \end{pmatrix}.$$

The product A × A × · · · × A in Definition 1.17 is unambiguous, in that


it doesn’t matter in which order the matrix product is computed. For
instance, AAA = (AA)A = A(AA). This is a consequence of Fact 1.23 (4)
in the next section.
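Matrix powers such as those in Example 1.18 can be computed by repeated multiplication or with NumPy's matrix_power helper; this sketch is not part of the original notes.

    import numpy as np

    A = np.array([[1, -1],
                  [0,  3]])

    print(A @ A)                          # A^2 = [[1 -4] [0  9]]
    print(A @ A @ A)                      # A^3 = [[1 -13] [0 27]]
    print(np.linalg.matrix_power(A, 3))   # same as A^3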

1.3 Further ideas from matrix arithmetic


Matrix transposes and symmetric matrices

Definition 1.19 (Matrix transposes). If A is an m × n matrix then the transpose A^T of A is the n × m matrix having entries given by

$$(A^T)_{ij} = A_{ji},$$

for i = 1, . . . , n and j = 1, . . . , m.

In practice, this means that we convert rows into columns, and vice-versa: the ith row of A is the ith column of A^T.

Example 1.20. If $A = \begin{pmatrix} 1 & 0 & 5 \\ 2 & -3 & 7 \end{pmatrix}$, then

$$A^T = \begin{pmatrix} 1 & 2 \\ 0 & -3 \\ 5 & 7 \end{pmatrix}.$$

Matrix transposes are taken before multiplication, e.g., AB^T means A(B^T), not (AB)^T, etc. Notice that if you take the transpose of a matrix twice, you get back to where you started: (A^T)^T = A.
The following type of matrix turns out to be very important, as we will
see in due course.

Definition 1.21 (Symmetric matrices). A matrix A ∈ Mn(R) is called symmetric if A^T = A, i.e. if Aji = Aij for 1 ≤ i, j ≤ n.

Symmetric matrices are easy to spot. If a given matrix is square, and if the entries along the 1st row equal the entries down the 1st column, and likewise for the other rows and columns, then the matrix is symmetric.

Example 1.22. The matrix $A = \begin{pmatrix} 2 & -1 & 3 \\ -1 & -4 & 7 \\ 3 & 7 & 8 \end{pmatrix}$ is symmetric.

There is another way to spot symmetric matrices. The main diagonal of a matrix A ∈ Mn(R) is the list of all entries running diagonally from top left to bottom right, i.e. the list of entries Aii, 1 ≤ i ≤ n. The main diagonal of A in Example 1.22 comprises the entries 2, −4 and 8. If we placed a mirror along the main diagonal, then the entries on either side would form a mirror image of each other: the entries −1, 3 and 7 in the top right can be seen reflected in the bottom left. This phenomenon occurs in all symmetric matrices.
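A transpose and a symmetry check can be done in one or two lines; this snippet is not part of the original notes and assumes NumPy.

    import numpy as np

    A = np.array([[ 2, -1, 3],
                  [-1, -4, 7],
                  [ 3,  7, 8]])     # the matrix from Example 1.22

    print(A.T)                      # the transpose A^T
    print(np.array_equal(A, A.T))   # True: A is symmetric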

The laws of matrix arithmetic


When performing arithmetic with matrices, it is important to understand
which rules apply and which do not. The following fact lists some general
properties of matrix arithmetic.

Fact 1.23. In the following, we assume that the sizes of the matrices A, B and C are such that all indicated sums and products are defined.

1. The commutative law of addition: A + B = B + A.

2. The associative law of addition: (A + B) + C = A + (B + C).

3. The failure of the commutative law of multiplication: in general, AB ≠ BA.

4. The associative law of multiplication: (AB)C = A(BC).

5. The distributive laws for matrix multiplication over matrix addition: (A + B)C = AC + BC and A(B + C) = AB + AC.

6. Properties of matrix scalar multiplication: given a number c, we have (cA)B = A(cB) = c(AB) and c(A + B) = cA + cB.

7. Matrix transpose properties: (AB)^T = B^T A^T and (A + B)^T = A^T + B^T.

(Matrices share properties (1), (2), (4) and (5) with ordinary numbers, but not (3)!)

Observe that in Fact 1.23, we are treating the matrices as whole entities, rather than as collections of numbers: besides (6), no numbers are present in the fact. Mastery of these laws is strongly recommended: they enable one to perform matrix computations much more quickly, and improve one's general understanding of matrices.

Fact 1.23 is something that needs to be proved, because in mathematics one should never accept something as being true unless it can be proved. However, proofs are not being emphasised in this module. The interested reader will eventually find proofs of some of the facts above in the 'forbidden section' of the library, namely Appendix C.1. The proof of fact (4) is the most difficult.

Fact 1.23 (2) and (4) above imply that we can write A + B + C and ABC without fear of ambiguity: the order of the brackets does not matter. This extends to sums and products of four, five matrices and so on. In particular, the product A × A × · · · × A in Definition 1.17 is unambiguous, and we can see for instance that A²A = (AA)A = AAA = A(AA) = AA².

1.4 Inverses and determinants of 2 × 2 matrices


If we divide a number by 5, we are multiplying it by 51 : the number 15 , or
5−1 , is the reciprocal or multiplicative inverse of 5. This means
1
5
× 5 = 1,

i.e. if you multiply 5 by 15 , you get 1; multiplying by 1


5
‘reverses’ the effect
of multiplying by 5.

The 2 × 2 identity matrix


We need a matrix analogue of division, but before we can cover this,
we need to find matrices that behave something like the number 1. The
number 1 is the multiplicative identity: it is distinguished because when
you multiply any number by 1, the number does not change (and 1 is the
only number with this property). First, we consider the 2 × 2 case.

Example 1.24. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{R})$ be an arbitrary 2 × 2 matrix, and let $I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Find AI2 and I2A.

Solution.

$$AI_2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = A, \quad\text{and}\quad I_2 A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = A.$$

Both AI2 and I2A are equal to A: multiplying A by I2 (either on the left or the right) does not affect A.

Definition 1.25 (The 2 × 2 identity matrix). The matrix I2 above is called the 2 × 2 identity matrix.

Inverses of 2 × 2 matrices
Now that we have the 2 × 2 identity matrix, we can ask which 2 × 2
matrices have multiplicative inverses.

Example 1.26. Let $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$ and $B = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}$. Compute AB and BA.

Solution.

$$AB = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2, \quad\text{and}\quad BA = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2.$$

(In the same way that 1/5 'reverses' the effect of 5, by getting us back to 1 (1/5 × 5 = 1), so B reverses the effect of A by getting us back to I2 (AB = BA = I2).)

Definition 1.27 (Matrix inverses). Let A ∈ M2(R). If B ∈ M2(R) satisfies

AB = I2 and BA = I2

then B is called an inverse of A.


It is worth pointing out that there is nothing in Definition 1.27 that stipulates that if B ∈ M2(R) is an inverse of A ∈ M2(R), then B is the only matrix that can perform that task. Fortunately, using a little matrix arithmetic, we can show that if A has an inverse, then said inverse must indeed be unique. This accords with our understanding of ordinary numbers: 1/5 is the only number such that, when multiplied by 5, produces 1.

Proposition 1.28. Suppose that B, C ∈ M2(R) are both inverses of a matrix A ∈ M2(R). Then B = C. In other words, if A has an inverse, then the inverse is unique.

Proof. If B and C are inverses of A then

BA = AB = I2 and CA = AC = I2.

It follows that

(BA)C = I2C = C,
and (BA)C = B(AC) = BI2 = B,    (using Fact 1.23 (4))

as required.

Thus we can talk about the inverse of a matrix, whenever it exists. Not every 2 × 2 matrix has an inverse: $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ has no inverse, but in addition, some non-zero matrices like $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$ don't have inverses either.

(This is another departure from the arithmetic of numbers. Every non-zero number a has the multiplicative inverse 1/a, but this is not true of matrices!)

Computing inverses of 2 × 2 matrices


Provided it exists, the inverse of a matrix A is denoted A⁻¹.

Given a square matrix A, how do we

(a) decide if A⁻¹ exists, and

(b) if so, work out what it is?

The 2 × 2 case is relatively straightforward.

Definition 1.29 (Matrix adjugates). If $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{R})$, we define the adjugate of A by

$$\operatorname{adj}(A) = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

(Be warned that in some literature the adjugate of a matrix is called the adjoint.)

What happens when we multiply A by its adjugate?

$$A \cdot \operatorname{adj}(A) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \begin{pmatrix} ad + b(-c) & a(-b) + ba \\ cd + d(-c) & c(-b) + da \end{pmatrix} = (ad - bc)I_2.$$

You should verify that, likewise, adj(A) · A = (ad − bc)I2.


Proposition 1.30. Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in M_2(\mathbb{R})$, and let us suppose that

$$ad - bc \neq 0.$$

Then A⁻¹ exists and equals

$$\frac{1}{ad - bc}\operatorname{adj}(A).$$

Proof. Using Fact 1.23 (6) and the observation above, we have

$$A \cdot \frac{1}{ad - bc}\operatorname{adj}(A) = \frac{1}{ad - bc}\, A \cdot \operatorname{adj}(A) = \frac{1}{ad - bc}(ad - bc)I_2 = I_2.$$

Likewise, $\frac{1}{ad - bc}\operatorname{adj}(A) \cdot A = I_2$, so $\frac{1}{ad - bc}\operatorname{adj}(A)$ satisfies Definition 1.27. Thus A⁻¹ exists and equals $\frac{1}{ad - bc}\operatorname{adj}(A)$.

For instance, in Example 1.26 we had $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$, so here ad − bc = 2 × 3 − 1 × 5 = 1 ≠ 0 and

$$\frac{1}{ad - bc}\operatorname{adj}(A) = \frac{1}{1}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = B.$$
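Proposition 1.30 translates directly into a small function. The sketch below is not part of the original notes (the helper name inverse_2x2 is mine; NumPy assumed); it computes the inverse of a 2 × 2 matrix via its adjugate and determinant.

    import numpy as np

    def inverse_2x2(A):
        # Inverse of a 2 x 2 matrix via Proposition 1.30: adj(A) / (ad - bc)
        a, b = A[0, 0], A[0, 1]
        c, d = A[1, 0], A[1, 1]
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0, so A has no inverse")
        adj = np.array([[ d, -b],
                        [-c,  a]])
        return adj / det

    A = np.array([[2, 1],
                  [5, 3]])          # the matrix from Example 1.26
    print(inverse_2x2(A))           # [[ 3 -1] [-5  2]], i.e. the matrix B
    print(A @ inverse_2x2(A))       # the 2 x 2 identity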


Exercise 1.31. Compute the inverses A⁻¹ and B⁻¹ of

$$A = \begin{pmatrix} 11 & -8 \\ 13 & 10 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} -7 & 5 \\ 2 & 35 \end{pmatrix}.$$

Verify that your solutions are correct by showing that AA⁻¹ = A⁻¹A = BB⁻¹ = B⁻¹B = I2.

(There will be the odd exercise in a mauve box. These exercises will not be graded. Feel free to discuss them on the discussion boards. The same applies to passages in the text or solutions of examples which ask you to verify certain things.)

Determinants of 2 × 2 matrices

Definition 1.32 (Determinants of 2 × 2 matrices). The number ad − bc above is called the determinant of A, and is denoted det(A).

Every square matrix has a determinant. Matrix determinants are of huge theoretical and practical importance. We visit them on multiple occasions in this module.

For example, given a matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, if we 'transform' a 2-dimensional shape by A, it turns out that the area of the shape changes by a factor of |det(A)|. (We won't prove this fact.) See the accompanying short video for more details.

Figure 1.3: Determinants and their effect on area.

Area of transformed shape = (Area of original shape) · |det(A)|.

Example 1.33 (Inverses of rotation matrices). Find det(Rθ) and (Rθ)⁻¹.

Solution. We have

$$\det(R_\theta) = (\cos\theta)(\cos\theta) - (-\sin\theta)(\sin\theta) = \cos^2\theta + \sin^2\theta = 1 \neq 0,$$

so Rθ⁻¹ exists and equals

$$\operatorname{adj}(R_\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix} = R_{-\theta}.$$

Figure 1.4: Inverses of rotation matrices. (Geometrically speaking, Rθ rotates points anticlockwise through θ, and Rθ⁻¹ = R₋θ rotates points clockwise through θ; thus Rθ⁻¹ reverses the effect of Rθ. See video.)

Remarks 1.34.

1. What happens if det(A) = 0? In this case, A does not have an inverse. In other words, there is no 2 × 2 matrix C for which CA = AC = I2.

2. If det(A) ≠ 0 then A is called invertible or non-singular. If det(A) = 0 then A is non-invertible or singular.

1.5 Inverses and determinants of n × n matrices


The n × n identity matrices
Definition 1.25 and Example 1.24 generalise to n × n matrices as follows.


Definition 1.35 (The identity matrices). The n × n identity matrix In ∈ Mn(R) has entries

$$(I_n)_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j, \end{cases}$$

for 1 ≤ i, j ≤ n.

Recall that the main diagonal of a square matrix is the list of all the
entries running diagonally from top left to bottom right. The entries
along the main diagonal of In are all 1, while all other entries of In are 0.

Theorem 1.36. If A is an m × n matrix then AIn = ImA = A. In particular, if A ∈ Mn(R), then AIn = InA = A.

Proof. Relegated to Appendix C.1.

Inverses and determinants of n × n matrices


Definition 1.27 and Proposition 1.28 also generalise immediately to n × n
matrices: just replace all instances of I2 by In . In particular, notice that
the proof of Proposition 1.28 does not use the fact that we are working
with 2 × 2 matrices in an essential way at all. The only thing we demand
is that they are all square and of the same size.
So the question is how do we compute inverses of n × n matrices, and what about determinants? We will do this in more than one way in this module. The first method uses the idea of matrix minors and cofactors, and is done in several steps.

Definition 1.37 (Matrix minors). Let A ∈ Mn(R). Given 1 ≤ i, j ≤ n, define the (i, j)-minor Mij to be the determinant of the (n − 1) × (n − 1) matrix obtained from A by deleting the ith row and the jth column from A. The matrix M ∈ Mn(R) that has (i, j) entry Mij, 1 ≤ i, j ≤ n, is the matrix of minors of A.

This definition is a mouthful when first seen. To give you some purchase
on it, we will cover an example when n = 3.
Example 1.38. Find the matrix of minors of

$$A = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix}.$$

Solution. The computation of the minors is on the accompanying video. The outcome is

$$M = \begin{pmatrix} 1 & 2 & -6 \\ -3 & -1 & 13 \\ 3 & 1 & -8 \end{pmatrix}.$$

The next thing to do is to define the matrix of cofactors. This is easily


done once the matrix of minors has been defined, as it simply involves a
few changes of sign.

Definition 1.39 (Matrix cofactors). Let A ∈ Mn(R). The matrix of cofactors is the matrix C ∈ Mn(R) having entries given by

$$C_{ij} = (-1)^{i+j} M_{ij} = \begin{cases} M_{ij} & \text{if } i + j \text{ is even} \\ -M_{ij} & \text{if } i + j \text{ is odd,} \end{cases}$$

where Mij is the (i, j)-minor as above.

Example 1.40. Find the matrix of cofactors of

$$A = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix}.$$

Solution. We have the following pattern of signs

$$\begin{pmatrix} + & - & + \\ - & + & - \\ + & - & + \end{pmatrix}.$$

In the positions marked '+', Cij = Mij, and in the positions marked '−', Cij = −Mij.

We now write down C, the matrix of cofactors of A. It differs from M in Example 1.38 only according to the pattern of signs above:

$$C = \begin{pmatrix} +1 & -2 & +(-6) \\ -(-3) & +(-1) & -13 \\ +3 & -1 & +(-8) \end{pmatrix} = \begin{pmatrix} 1 & -2 & -6 \\ 3 & -1 & -13 \\ 3 & -1 & -8 \end{pmatrix}.$$

Now we are in a position to define determinants of n × n matrices.

Definition 1.41 (Matrix determinants). If A ∈ Mn(R), then the determinant of A is given by

$$\det(A) = \underbrace{A_{11}C_{11} + A_{12}C_{12} + \cdots + A_{1n}C_{1n}}_{\text{entries of 1st row of } A \text{ multiplied by their cofactors}} = \sum_{k=1}^{n} A_{1k}C_{1k},$$

where C ∈ Mn(R) is the cofactor matrix of A.

Now we can compute the determinant of A in Example 1.38.

Example 1.42. Given A in Example 1.38, we have

det(A) = A11 C11 + A12 C12 + A13 C13 = 1 · 1 + 3 · (−2) + 0 · (−6) = −5.

Remarks 1.43.

1. We only used 3 of the cofactors of A when computing det(A) above. We will use all 9 when we find the inverse of A.

2. Definition 1.41 is an example of a recursive definition. By Definition 1.32, we can define det(A) for any A ∈ M2(R). Combining Definitions 1.37, 1.39 and Definition 1.41 means we can define det(A) for any A ∈ M3(R). By repeating this process we can define det(A) for A ∈ M4(R), and so it goes on: M5(R), M6(R), . . . .

   (In fact, we can start this process at n = 1. If A = (a) is a 1 × 1 matrix then det(A) = a. If we combine 1.37, 1.39 and 1.41, where n = 2, we recover Definition 1.32.)

   In principle, we can find the determinant of any n × n matrix in this way, whatever the value of n. In practice though, this is not advisable. In fact, for large enough n, attempting to compute determinants in this way would use up all the available atoms in the observable universe. Happily, so that we don't run out of atoms, there are more computationally efficient ways of finding determinants (though they are not covered in this module).
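The recursive nature of Definition 1.41 can be made explicit in code. The following sketch is not part of the original notes (plain Python with NumPy, and deliberately naive); it expands along the first row exactly as in the definition, which is fine for small n but, as remarked above, hopelessly inefficient for large n.

    import numpy as np

    def det_cofactor(A):
        # Determinant by cofactor expansion along the first row (Definition 1.41)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]                    # base case: det of a 1 x 1 matrix
        total = 0
        for k in range(n):
            # (1, k+1)-minor: delete row 0 and column k, then recurse
            minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
            cofactor = (-1) ** k * det_cofactor(minor)
            total += A[0, k] * cofactor
        return total

    A = np.array([[ 1,  3,  0],
                  [ 2, -2,  1],
                  [-4,  1, -1]])
    print(det_cofactor(A))                    # -5, as in Example 1.42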

The formula in Definition 1.41 is known as the cofactor or Laplace expansion along the first row. It turns out that the corresponding expansion along any row or any column of A also yields det(A).

Theorem 1.44. Let A ∈ Mn(R). Then

$$\det(A) = \underbrace{A_{i1}C_{i1} + A_{i2}C_{i2} + \cdots + A_{in}C_{in}}_{\text{entries of ith row of } A \text{ multiplied by their cofactors}} = \sum_{k=1}^{n} A_{ik}C_{ik} = \underbrace{A_{1j}C_{1j} + A_{2j}C_{2j} + \cdots + A_{nj}C_{nj}}_{\text{entries of jth column of } A \text{ multiplied by their cofactors}} = \sum_{k=1}^{n} A_{kj}C_{kj}.$$

(Pierre-Simon, marquis de Laplace (1749 – 1827).)

This theorem gives us n + n = 2n different ways of computing det(A). Its proof is beyond the scope of the module.

(You can simplify the computation of det(A) by choosing to expand along a row or column having as many zeros as possible: if Aij = 0, then there is no need to compute Cij, because AijCij = 0.)

We have looked at determinants, but it remains to complete the business of finding matrix inverses.

Definition 1.45 (Matrix adjugates). Let A ∈ Mn(R). The adjugate adj(A) of A is the transpose of the matrix of cofactors.

Example 1.46. Find the adjugate of A in Example 1.38.

Solution.

$$\operatorname{adj}(A) = C^T = \begin{pmatrix} 1 & 3 & 3 \\ -2 & -1 & -1 \\ -6 & -13 & -8 \end{pmatrix}.$$

Recall the computation A · adj(A) = adj(A) · A = det(A)I2 after Definition


1.29.


Example 1.47. Compute A · adj(A) and adj(A) · A, where A is as in Example 1.38, and thus find A⁻¹.

Solution.

$$A \cdot \operatorname{adj}(A) = \begin{pmatrix} 1 & 3 & 0 \\ 2 & -2 & 1 \\ -4 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 3 & 3 \\ -2 & -1 & -1 \\ -6 & -13 & -8 \end{pmatrix} = \begin{pmatrix} -5 & 0 & 0 \\ 0 & -5 & 0 \\ 0 & 0 & -5 \end{pmatrix} = -5I_3 = \det(A)I_3.$$

Likewise, adj(A) · A = −5I3 (verify this). Therefore A⁻¹ = −(1/5) adj(A), because

$$A \cdot \left(-\tfrac{1}{5}\operatorname{adj}(A)\right) = -\tfrac{1}{5}\, A \cdot \operatorname{adj}(A) = \left(-\tfrac{1}{5}\right)(-5)I_3 = I_3,$$

and likewise (−(1/5) adj(A)) · A = I3.

Crumbs! That took some work. That is the first method of finding matrix
inverses given in this module. With a bit of practice, it can be applied
reasonably well to 3 × 3 matrices, but applying it to larger matrices
will, in general, become very painful. The second method, which is more
computationally efficient and scales more effectively to larger matrices,
though conceptually slightly deeper, is given in Section 3.5.
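For readers who want to double-check the adjugate-method computations above (or their own answer to Exercise 1.48 below), NumPy provides determinant and inverse routines; this snippet is not part of the original notes.

    import numpy as np

    A = np.array([[ 1,  3,  0],
                  [ 2, -2,  1],
                  [-4,  1, -1]])       # the matrix from Example 1.38

    print(np.linalg.det(A))            # -5.0 (up to floating-point rounding)
    print(np.linalg.inv(A))            # equals -(1/5) * adj(A) from Example 1.47
    print(A @ np.linalg.inv(A))        # approximately the 3 x 3 identity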

Exercise 1.48. Compute the matrix of minors, the matrix of cofactors, the adjugate matrix, the determinant and finally the inverse A⁻¹ of

$$A = \begin{pmatrix} 1 & -4 & 3 \\ 2 & 10 & 1 \\ 5 & 1 & 9 \end{pmatrix}.$$

Verify that your solution is correct by showing that AA⁻¹ = A⁻¹A = I3.



The ‘basket-weave’ method


Some people like the following method of computing determinants of 3×3
matrices.

Remarks 1.49 (The 'basket-weave' method). Determinants of 3 × 3 matrices can be found using this method. (The basket-weave trick applies to 3 × 3 matrices only!) Take A from Example 1.38.

First, write down A and repeat columns 1 and 2 on the right:

 1   3   0   1   3
 2  −2   1   2  −2
−4   1  −1  −4   1

Second, det(A) is given by:

'Sum of products along the ↘ diagonals' − 'Sum of products along the ↙ diagonals'.

Hence we have

det(A) = 1 · (−2) · (−1) + 3 · 1 · (−4) + 0 · 2 · 1
         − (0 · (−2) · (−4) + 1 · 1 · 1 + 3 · 2 · (−1))
       = 2 − 12 + 0 − (0 + 1 − 6)
       = −10 − (−5) = −5.
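The basket-weave rule is also easy to implement for 3 × 3 matrices. The function below is not part of the original notes (the name det3_basket_weave is mine; NumPy assumed); it follows the two sums of diagonal products described above.

    import numpy as np

    def det3_basket_weave(A):
        # Determinant of a 3 x 3 matrix via the basket-weave rule (3 x 3 only!)
        down = (A[0, 0] * A[1, 1] * A[2, 2]
                + A[0, 1] * A[1, 2] * A[2, 0]
                + A[0, 2] * A[1, 0] * A[2, 1])   # products along the "down" diagonals
        up = (A[0, 2] * A[1, 1] * A[2, 0]
              + A[0, 0] * A[1, 2] * A[2, 1]
              + A[0, 1] * A[1, 0] * A[2, 2])     # products along the "up" diagonals
        return down - up

    A = np.array([[ 1,  3,  0],
                  [ 2, -2,  1],
                  [-4,  1, -1]])
    print(det3_basket_weave(A))                  # -5, agreeing with Example 1.42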

Orthogonal matrices
Recall the notion of matrix transpose from Definition 1.19.

Definition 1.50 (Orthogonal matrices). A matrix P ∈ Mn(R) is called orthogonal if its inverse P⁻¹ exists and equals its transpose P^T, i.e. if PP^T = P^T P = In.

In principle, it is easy to verify whether or not a matrix in Mn (R) is


orthogonal: simply multiply it by its transpose and check to see if the
outcome equals In .

Example 1.51. Show that

    P = [  1/√2    1/√6   1/√3 ]
        [ −1/√2    1/√6   1/√3 ]
        [   0     −2/√6   1/√3 ]

is orthogonal.

Solution. Verify that PP^T = P^T P = I3.
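Checking orthogonality is exactly the sort of arithmetic a computer does well. A numerical sketch (not part of the module; numpy assumed) for the matrix P above:

    import numpy as np

    s2, s6, s3 = np.sqrt(2), np.sqrt(6), np.sqrt(3)
    P = np.array([[ 1/s2,  1/s6, 1/s3],
                  [-1/s2,  1/s6, 1/s3],
                  [  0.0, -2/s6, 1/s3]])
    print(np.allclose(P @ P.T, np.eye(3)))   # True
    print(np.allclose(P.T @ P, np.eye(3)))   # True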

Example 1.52. The rotation matrices Rθ ∈ M2 (R) from Example 1.2


(2) are all orthogonal matrices.

Observe that if P is orthogonal then so is its inverse P T .


It turns out that orthogonal matrices are deeply connected to so-called
orthonormal bases of Rn and different coordinate systems (see Definition
4.7 and Section 4.2), and the eigenvectors of symmetric matrices (see
Chapter 5). They also feature in Appendix B.

General facts about inverses and determinants


We conclude this chapter with a series of general results about matrix
determinants and inverses, and provide proofs here where possible. Many
of these theoretical results help when trying to compute such things!

Proposition 1.53. The following facts apply to determinants of gen-


eral square matrices.

1. det(AT ) = det(A).

2. det(A) = 0 if A has a zero row or zero column.

3. If A is upper triangular, i.e. Aij = 0 whenever i > j, then

det(A) = A11 A22 A33 . . . Ann .

In particular, this holds if A is a diagonal matrix, i.e. Aij = 0 whenever i ≠ j.

4. det(cIn) = c^n. In particular, det(In) = 1 and det(0n) = 0.

The proof of this proposition has been banished to the forbidden zone
that is Appendix C.1.

The following result is the fundamental theorem of determinant theory.


Theorem 1.54. If A, B ∈ Mn(R) then det(AB) = det(A) det(B).

(There is no equivalent result for sums of matrices: there is no nice link between det(A + B) and det(A) and det(B).)
You should never forget this result! You should forget your own name be-
fore forgetting it. The proof of this result is so dark that it lies beyond the
scope of the module entirely. However, from it we reap a harvest of ad-
ditional properties of determinants which often help us when attempting
to compute them.
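A quick numerical illustration of Theorem 1.54 (a sketch only, not part of the module; numpy assumed, and any two square matrices of the same size will do):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-3, 4, (3, 3))
    B = rng.integers(-3, 4, (3, 3))
    print(np.isclose(np.linalg.det(A @ B),
                     np.linalg.det(A) * np.linalg.det(B)))   # True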

Corollary 1.55.

1. If A ∈ Mn (R) and c is a number, then det(cA) = c n det(A).

2. If n is odd and A ∈ Mn (R) is skew-symmetric, that is AT = −A,


then det(A) = 0.

Proof.

1. We have det(cA) = det((cIn )A) = det(cIn ) det(A) = c n det(A), by


Theorem 1.54 and Proposition 1.53 (4).

2. Observe that det(A) = det(AT ) = det(−A) = (−1)n det(A) = − det(A),


by Proposition 1.53 (1), (1) immediately above, and the fact that n
is odd. Hence det(A) = 0.

The following result summarises the process for finding inverses of square
matrices presented above.

Theorem 1.56. If A ∈ Mn(R) then

    A · adj(A) = adj(A) · A = det(A)In,

and if det(A) ≠ 0, then A−1 exists and equals (1/det(A)) adj(A). On the other hand, if det(A) = 0, then A has no inverse.

Lastly, we present a couple of useful facts concerning matrix inverses.


Proposition 1.57.

1. If A is invertible, then det(A−1 ) = det(A)−1 .

2. If A, B ∈ Mn (R) are invertible, then so is AB, and (AB)−1 =


B−1 A−1 .

Proof.

1. We have

1 = det(In ) = det(AA−1 ) = det(A) det(A−1 ),

by Proposition 1.53 (4) and Theorem 1.54. Hence det(A−1 ) = det(A)−1 .

2. Note that

(AB)(B−1 A−1 ) = A(BB−1 )A−1 = AIn A−1 = AA−1 = In ,

and likewise

(B−1 A−1 )(AB) = B−1 (A−1 A)B = B−1 In B = B−1 B = In .

Thus (AB)−1 exists and equals B−1 A−1 .


Chapter 2

Vector Geometry 1

2.1 Euclidean n-space and vectors


Euclidean geometry is the term used to describe the geometry of (3-
dimensional) space, developed axiomatically by Euclid of Alexandria (ap-
prox. 325 – 265 BC) in his book the Elements. This book was the main
geometry text for 2000 years. Much later, René Descartes had the idea
of attaching coordinates to a point (cartesian coordinates), which allows for an algebraic description of geometry, called analytic geometry.

    Euclidean plane ↔ R2
    Point P ↔ (x1, x2)

(One of the oldest surviving fragments of the Elements, found at Oxyrhynchus and dated to around 100 AD. Image source: University of British Columbia.)

In this way, we see that R2 is an algebraic model of the Euclidean plane and R3 is an algebraic model of Euclidean space. The ancient Greeks did not do geometry in higher dimensions, but by using cartesian coordinates, we can easily study geometry in dimensions n > 3, using algebra, and Rn as an algebraic model, even though visualization becomes essentially impossible.

(Have you ever wondered why we live in a 3-dimensional world and not, say, a 5-dimensional one? Certainly, being restricted to 2 dimensions would cause digestion problems, as illustrated by this hypothetical chicken.)

Definition 2.1 (n-space). Given n ≥ 1, we define

    Rn = {(x1, . . . , xn) : x1, . . . , xn ∈ R}.

We identify points on a line with real numbers x ∈ R, points in a plane


with ordered pairs (x1 , x2 ) ∈ R2 , and points in 3-space with ordered
triples (x1 , x2 , x3 ) ∈ R3 . These correspondences are easy to visualize by
making pictures. We can continue in this way and interpret quadruples


(x1 , x2 , x3 , x4 ) ∈ R4 as points in 4-space, quintuples (x1 , x2 , x3 , x4 , x5 ) ∈ R5


as points in 5-space, etc., but it is not possible to make pictures anymore.
For an arbitrary positive integer n we can thus identify points in n-space with tuples in Rn:

    Point P ∈ n-space ↔ (x1, . . . , xn) ∈ Rn.

In this situation, we call (x1, . . . , xn) the coordinates of the point P and write P = (x1, . . . , xn). We therefore also refer to Rn as (Euclidean) n-space and to the number n as the dimension of Rn. The point O = (0, . . . , 0) is called the origin of Rn.

(Nobody is suggesting that n-space is wobbling around somewhere in physical reality. It is instead a mathematical conceit that is highly useful when understanding and solving (real-world) problems. Having said that, some physical theories (string theories) assert that the universe possesses extra spatial dimensions that are beyond our perception. Note that time is not the fourth dimension: R4 is simply a way of representing space-time.)

Example 2.2. Euclidean 3-space is of course a mathematical model for the space we live in, whereas 4-space is the model for ‘space-time’ R3 × R: three spatial dimensions and one temporal dimension, used for instance in Einstein’s theory of relativity.

Figure 2.1: Algebraic representation of 3-space as R3



(Googling ‘high dimensional data’ returns about 361 million search results.)

Any problem which has n variables in it naturally yields points in Rn. The visualization and modeling of ‘high-dimensional’ data sets is a topic of enormous importance in contemporary science, engineering and industry.
In multivariate statistics, data having n different measurable character-
istics x1 , . . . , xn is represented as ‘data points’ (x1 , . . . , xn ) in Rn – each
dimension corresponds to a different characteristic of the data. Then the
power of linear algebra can be brought to bear to analyse this data –
see, for example, Appendix B.

Vectors
Now the discussion moves from points in n-space to vectors. Given the Out there in the math-
ematical badlands, al-
right context, many different objects can be called vectors. In this module most anything (e.g. func-
we will focus on just a few. tions in calculus) can be
regarded as a vector in
the right context.
Definition 2.3. In the correct context, a number of objects can be For more information
vectors. about this, look up the
term ‘vector space’.

1. Points (x1 , . . . , xn ) ∈ Rn .

2. In the physical sciences, a vector is often a quantity having both


magnitude and direction, e.g. displacement, velocity, accelera-
tion or force.

3. It is sometimes helpful to use column vectors (see Remark 1.3


(1)), especially when we want to manipulate vectors using ma-
trices – recall Section 1.2 and see Section 2.4.

For the remainder of this section, we introduce concepts that can be


applied to all three types of vector listed above. There is an emphasis on
the second type of vector listed above, as such vectors can be interpreted
quite readily geometrically. However, these differences are skin deep
and, from an algebraic point of view (e.g. summing vectors, taking scalar
multiplies etc.), all three types are treated in the same way.
We can represent vectors of the second type using directed straight line
segments, i.e. a line segment in Rn , to which is associated one of the two
possible directions.
If P and Q are two points in Rn , we denote by PQ that straight line
segment having P and Q as endpoints, and directed from P to Q. P and
Q are the initial and terminal points of PQ, respectively.

Figure 2.2: A straight line segment PQ in R2


Q

The important thing to note is not the positions of P and Q as such, but
the position of the terminal point Q, relative to the initial point P.


Given a line segment in Rn , we can move it around as we please; provided


we don’t change its length or direction, it represents the same vector.

Definition 2.4. Line segments having the same length and the same
direction are said to be equivalent, and represent the same vector.

Figure 2.3: Equivalent directed line segments in R2

Now we need some way of quantifying the magnitude and direction.

Definition 2.5. Let x be a vector in Rn , represented by some line


To distinguish them from
ordinary numbers, we
will denote vectors in segment PQ. The entries of x are the coordinates of Q, relative to
P, i.e., the entries of x are ‘the coordinates of Q − the coordinates of
bold face in printed
notes, e.g. x or v.
When writing them by P’.
hand, e.g. when answer-
ing homework assign-
ments, it is usual prac-
tice to underline vectors,
e.g. x or v. Underlining
Example 2.6. Suppose x is represented by AB, where A = (2, 1) and
is easier by hand! B = (−3, 2). What are the entries of x?

Solution.

entries of x = ‘the coordinates of B − the coordinates of A’


= (−3, 2) − (2, 1)
= (−5, 1). 

It is helpful to see vector entries in terms of equivalent line segments. In


Example 2.6, if we move the initial point of AB to the origin, while pre-
serving magnitude and direction, then we get an equivalent line segment
OC, where C has coordinates

(−3 − 2, 2 − 1) = (−5, 1),


| {z }
(coordinates of B)−(coordinates of A)

which are precisely the entries of x.



Figure 2.4: The entries of a vector via equivalent line segments


y

B = (−3,2)

C A = (2,1)

x
O

In general, if x is represented by PQ, where P = (x1 , . . . , xn ) and Q = Even if n > 3 and we


can’t visualise Rn any-
(y1 , . . . , yn ), then the entries of x are more, we are still able
to work with the corre-

(y1 − x1 , y2 − x2 , . . . , yn − xn ).
sponding vectors using
algebra.

The difference between points and vectors


Hereafter, if x is represented by some line segment PQ, we simply write
x = PQ, and if x has entries (−5, 1), we write x = (−5, 1). If we write
x = (−5, 1), then x doesn’t look any different from the point (−5, 1), and
in many ways this is correct: when we do algebra with vectors in Rn ,
vectors and points become almost indistinguishable and interchangeable.
Some geometric differences remain: vectors tend to have non-zero length,
whereas points have zero length. Having said that, given a point P in Rn ,
there is a natural way to associate a vector to P: consider the directed
line segment OP, having initial point O and terminal point P.

Figure 2.5: The line segment OP associated with the point P


z

OP y

x

Of course, the entries of the vector OP equal the coordinates of the point
P. When we treat points as vectors and talk about the ‘length of P’, then
we really mean the length of OP.

2.2 Vector arithmetic


As with matrices, we will often treat vectors as whole entities in their
own right, rather than as collections of numbers. As with matrices, we
define vector addition and scalar multiplication ‘entry-wise’.
As with matrices, it only
makes sense to add two Definition 2.7 (Vector addition). Let x = (x1 , . . . , xn ) and y = (y1 , . . . , yn )
vectors having the same
size, e.g. trying to add be two vectors in Rn . Then the vector sum x + y is another vector in
(1, 2) to (3, 4, 9) doesn’t Rn , and is given by
make any sense and
should be avoided!
The sum of any two vec-
x + y = (x1 + y1 , . . . , xn + yn ).
tors of the same size
yields another vector of
the same size.
Definition 2.7 (1) certainly accords with our geometric intuition.

Example 2.8. If x = (−2, 3) and y = (4, −1), then x + y = (2, 2).


Geometrically, if we position the initial point of x at the origin, and
move y so that its initial point equals the terminal point of x, then
x + y is the directed line segment from O to the terminal point of y.

Figure 2.6: Summing vectors


y

y
(2,2)

x
x+y

x
O

Definition 2.9 (Zero vectors). The zero vector in Rn is 0 = (0, . . . , 0).

The zero vectors in R2 and R3 are (0, 0) and (0, 0, 0), respectively, and so
on. They correspond to the origins of these spaces. Just like the zero

matrices of Definition 1.9, adding a zero vector to a vector of the same The zero vectors don’t

size changes nothing: these zero vectors are additive identities.


have a definable direc-
tion.

Recall that a scalar is just another term for an ordinary number.

Definition 2.10 (Scalar multiples of vectors). Let x = (x1 , . . . , xn ) be


a vector in Rn and let c be a scalar. The scalar multiple cx of x is
cx = (cx 1 , . . . , cx n ).

As with matrices, if you multiply a vector in Rn by the scalar 0, you get


the corresponding zero vector in Rn .
Two non-zero vectors x and y are said to be parallel if x = cy for some
scalar c.

Example 2.11. Let x = (2, 1). What are 3x, −2x, −x and 12 x?

Solution. It follows from Definition 2.10 that 3x = (6, 3), −2x =


(−4, −2), −x = (−2, −1) and − 12 x = (−1, − 21 ). 

Figure 2.7: Scalar multiples of vectors in R2


y

3x
If x is non-zero and c >
0, then cx and x share
x
the same direction. If
x c < 0 then cx points in
the direction opposite to
−2x that of x.

The following general properties of vector addition and scalar multiplica-


tion are easy consequences of the definitions above. You can and should
compare them with the ones listed in Fact 1.23.

Fact 2.12. For all vectors x, y and z of the same size, and scalars k Compare this with Fact
and c, we have 1.23 (1), (2) and (part
of) (6).They are the same
laws!
1. (x + y) + z = x + (y + z) (associativity)

2. x + y = y + x (commutativity)


3. c(x + y) = cx + cy (dist. of scalar mult. over vector addition)

4. k(cx) = c(kx) = (kc)x.

There is no direct vector analogue of matrix multiplication, though there


is the scalar product – see Section 2.3.

Lengths of vectors and unit vectors


The length ∥x∥ of x = (x1, x2) in R2 is the length of a line segment representing x. This can be computed easily using Pythagoras: ∥x∥ = √(x1² + x2²). The length of a vector has physical significance, e.g. the length of the velocity vector of a moving object is the speed of that object.

Example 2.13. If x = (4, 3), then ∥x∥ = √(4² + 3²) = 5.

Figure 2.8: The length of a vector in R2


y

(4,3)
3

x
4

We can compute lengths of vectors in R3 by applying Pythagoras twice.

Figure 2.9: The length of a vector in R3



If x = (x1, x2, x3), then

    ∥x∥ = √( ∥(x1, x2, 0)∥² + x3² ) = √( (√(x1² + x2²))² + x3² ) = √(x1² + x2² + x3²).

Having this in mind, it is natural to make the following definition.

Definition 2.14 (Vector length). Let x = (x1, . . . , xn) be a vector in Rn. The length or magnitude of x is defined to be the quantity

    ∥x∥ = √(x1² + · · · + xn²).

(In mathematical circles, ∥x∥ is also called the norm of x.)

For any vector x, ∥x∥ is a non-negative number. For any vector x and scalar c, we have ∥cx∥ = |c| ∥x∥:

    ∥cx∥ = ∥c(x1, . . . , xn)∥ = ∥(cx1, . . . , cxn)∥
         = √((cx1)² + · · · + (cxn)²)
         = √(c²(x1² + · · · + xn²))
         = |c| √(x1² + · · · + xn²) = |c| ∥x∥.

Also, given a vector x in Rn, if ∥x∥ = 0 then x = 0.


Definition 2.15 (Unit vectors). A vector having length 1 is called a


unit vector.

Let x in Rn be non-zero (that is, x ≠ 0, i.e. not every entry of x is zero). Then ∥x∥ > 0, and we can form a new vector

    x̂ = (1/∥x∥) x.

Then

    ∥x̂∥ = (1/∥x∥) ∥x∥ = 1,

i.e. x̂ is the unit vector having the same direction as x.


Example 2.16. If x = (4, 3), then x̂ = (1/5)(4, 3) = (4/5, 3/5).

Exercise 2.17. Let x = (−1, 3, 2). Find

1. a vector of length 1 in the direction of x;

2. a vector of length 4 in the direction of x, and

3. a vector of length 2 in the direction opposite to that of x.

Example 2.18 (Vector Addition and Displacement). A sailor in a small


boat sails 3 km east, then 4 km southeast and a third leg of length d
km at an angle θ to the easterly direction. Her final position is 7 km
due east from her starting point. Find the magnitude and direction
of the third leg of her journey.

Solution. We want to find θ and d = ∥z∥. We have


(7, 0) = x+y+z
= (3, 0) + (4 cos π4 , −4 sin π4 ) + (d cos θ, d sin θ)
= (3 + 4 cos π4 + d cos θ, −4 sin π4 + d sin θ)
≈ (5.828 + d cos θ, −2.828 + d sin θ).

Equating the first and second entries gives

d cos θ ≈ 1.172 and d sin θ ≈ 2.828.

We divide to eliminate d: tan θ ≈ 2.413, so θ ≈ 1.178. Finally, d ≈ 1.172/cos θ ≈ 3.06 km. (With a bit more work, we can show that θ is exactly 3π/8, or 67.5°.)

Figure 2.10: A sailor in a small boat


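A numerical check of this example (a sketch only, not part of the module; numpy assumed):

    import numpy as np

    x = np.array([3.0, 0.0])                               # first leg
    y = np.array([4*np.cos(np.pi/4), -4*np.sin(np.pi/4)])  # second leg
    z = np.array([7.0, 0.0]) - x - y                       # third leg: x + y + z = (7, 0)
    print(np.linalg.norm(z))                               # about 3.06 km
    print(np.arctan2(z[1], z[0]))                          # about 1.178 rad, i.e. 3*pi/8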

2.3 The scalar product


The concepts of orthogonality and the scalar (or dot) product of two
vectors are of huge importance. In the physical sciences, mechanical work
and magnetic flux are calculated using scalar products. In statistics, the
scalar product is connected to correlation of sample data.

Definition 2.19. Let x = (x1, . . . , xn) and y = (y1, . . . , yn) be vectors in Rn. The scalar product or dot product of x and y is given by

    x · y = x1 y1 + · · · + xn yn = ∑ (i = 1 to n) xi yi.

(Don’t confuse the scalar product of two vectors with scalar multiplication of a vector by a scalar, or with matrix multiplication!)

For instance, (1, 2) · (−4, 7) = 1 × (−4) + 2 × 7 = 10, and (2, −3, 1) ·


(2, 1, −1) = 2 × 2 + (−3) × 1 + 1 × (−1) = 0, and so on. The following
really important list of properties follow from the definition above. As in
previous lists of facts, notice that, in the statement below, the vectors are
treated as whole entities – there is no mention of the individual entries
of the vectors. The proof requires us to open up the vectors and examine
their entries, but once this is done we can forget about the entries!

Fact 2.20. The following hold for all vectors x, y and z in Rn, and scalars c ∈ R. (These facts are extremely useful! You will gain great powers if you can master them.)

1. x · y = y · x.

2. x · x = ∥x∥².

3. x · (y + z) = x · y + x · z and (x + y) · z = x · z + y · z.

4. (cx) · y = x · (cy) = c(x · y).

Proof. Let x = (x1 , . . . , xn ), y = (y1 , . . . , yn ) and z = (z1 , . . . , zn ).

1. We have y · x = y1 x1 + · · · + yn xn = x1 y1 + · · · + xn yn = x · y.

2. We have x · x = x12 + · · · + xn2 = x 2 .


3. Since
y + z = (y1 + z1 , . . . , yn + zn ),


it follows that

x · (y + z) = x1 (y1 + z1 ) + · · · + xn (yn + zn )
= (x1 y1 + · · · + xn yn ) + (x1 z1 + · · · + xn zn )
= x · y + x · z.

Likewise for the other equality.

4. Because cx = (cx 1 , . . . , cx n ), we obtain

(cx) · y = (cx1 )y1 + · · · + (cxn )yn


= c(x1 y1 + · · · + xn yn ) = c(x · y).

Likewise for the other equality.

Angles between vectors and the Cauchy-Schwarz


inequality
We use all of the facts above in the next result, called the Cauchy-
Schwarz inequality, which is one of the most important inequalities in
all of mathematics. Its proof is a great exercise in applying the rules in
Fact 2.20 above and, for this reason, it avoided exorcism to Appendix C.1.

Theorem 2.21. Let x and y be non-zero vectors in Rn. Then

    |x · y| ≤ ∥x∥ ∥y∥.

Moreover, if equality holds, then x and y are parallel.

There are several established proofs of this result. The one given below
uses an intermediate result that is hugely significant in its own right.

Theorem 2.22. Let u and v be vectors in Rn, such that u · v = 0. Then

    ∥u + v∥² = ∥u∥² + ∥v∥².

In particular, ∥u∥ ≤ ∥u + v∥, with equality if and only if v = 0.


Proof. Using Fact 2.20, we have

    ∥u + v∥² = (u + v) · (u + v)                      Fact 2.20 (2)
             = u · (u + v) + v · (u + v)              Fact 2.20 (3)
             = u · u + u · v + v · u + v · v          Fact 2.20 (3) again
             = ∥u∥² + 2(u · v) + ∥v∥²                 Fact 2.20 (1) and (2)
             = ∥u∥² + ∥v∥²,

as u · v = 0. Finally,

    ∥u∥ ≤ √(∥u∥² + ∥v∥²) = ∥u + v∥,

with equality if and only if ∥v∥ = 0, which holds if and only if v = 0.


The geometric import of Theorem 2.22 is revealed later in Remark 2.27.

(Throughout these two proofs, we treat u, v, x and y as whole entities and rely largely on Fact 2.20 to manipulate them. We do not (and should not) concern ourselves with the entries u1, u2, . . . of the vectors: doing so could easily get us into a big pig’s breakfast, and we want to minimise the number of such breakfasts. In short, life is made easier by Fact 2.20!)

Proof of Theorem 2.21. Set

    u = (x · y)x   and   v = ∥x∥² y − u = ∥x∥² y − (x · y)x.

As ∥cw∥ = |c| ∥w∥ for any scalar c and vector w (see remarks after Definition 2.14), we have ∥u∥ = |x · y| ∥x∥. Moreover, u + v = ∥x∥² y, so ∥u + v∥ = ∥x∥² ∥y∥. Next, we show that u · v = 0. Indeed, we have

    u · v = u · (∥x∥² y) + u · (−u)                       Fact 2.20 (3)
          = ∥x∥² (u · y) − ∥u∥²                           Fact 2.20 (2), (4)
          = ∥x∥² (((x · y)x) · y) − (|x · y| ∥x∥)²
          = ∥x∥² (x · y)² − (x · y)² ∥x∥² = 0.            Fact 2.20 (4)

Applying Theorem 2.22 yields

    |x · y| ∥x∥ = ∥u∥ ≤ ∥u + v∥ = ∥x∥² ∥y∥.

Dividing both sides by ∥x∥ (which is a non-zero, positive number) yields the result. Moreover, equality holds if and only if v = 0, i.e.

    y = ((x · y)/∥x∥²) x.

This means that x and y are parallel.

This inequality allows us to define the angle between vectors.


Corollary 2.23. Let x and y be non-zero vectors in Rn. Then

    −1 ≤ (x · y)/(∥x∥ ∥y∥) ≤ 1.

Proof. Theorem 2.21 shows that

    |x · y| / (∥x∥ ∥y∥) ≤ 1,

from which the result follows.

Given any real number r in the range −1 ≤ r ≤ 1, there is a unique number θ in the range 0 ≤ θ ≤ π, such that

    cos θ = r.

(The proof of this fact belongs to a calculus course. It is outside the scope of MATH10400.)

Figure 2.11: The graph of cosine from 0 to π

Definition 2.24. Let x and y be non-zero vectors in Rn.

1. The angle between x and y is the unique number θ in the range 0 ≤ θ ≤ π, such that

    cos θ = (x · y)/(∥x∥ ∥y∥).

2. We say x and y are orthogonal, or perpendicular, written x ⊥ y, if the angle between x and y is π/2 (90°).

It is important to check that this definition of angle does indeed fit with
our geometric intuition. We do this in R2 , for which we require the cosine
rule:
c 2 = a2 + b2 − 2ab cos θ,
where a, b and c are the side lengths of the triangle as below, and θ is
the angle as below.

Figure 2.12: The cosine rule



We include a brief proof: if A = (b, 0) and B = (a cos θ, a sin θ), then

    c² = ∥AB∥² = (a cos θ − b)² + (a sin θ − 0)²
       = a² cos²θ − 2ab cos θ + b² + a² sin²θ
       = a² + b² − 2ab cos θ.

Now let x = (x1, x2), y = (y1, y2). Form the triangle having x and y as two of its sides. Let z be a vector representing the third side, so z = (x1 − y1, x2 − y2). By the cosine rule,

    ∥z∥² = ∥x∥² + ∥y∥² − 2 ∥x∥ ∥y∥ cos θ.

Now

    ∥z∥² = (x1 − y1)² + (x2 − y2)² = x1² − 2x1y1 + y1² + x2² − 2x2y2 + y2²,

and

    ∥x∥² + ∥y∥² − 2 ∥x∥ ∥y∥ cos θ = (x1² + x2²) + (y1² + y2²) − 2 ∥x∥ ∥y∥ cos θ,

so by cancelling the terms x1², y1², x2² and y2² from both sides, we obtain

    −2x1y1 − 2x2y2 = −2 ∥x∥ ∥y∥ cos θ
    ⇒ x · y = x1y1 + x2y2 = ∥x∥ ∥y∥ cos θ
    ⇒ (x · y)/(∥x∥ ∥y∥) = cos θ.
This result matches perfectly with Definition 2.24!


Proposition 2.25. Let x and y be non-zero vectors in Rn . Then x ⊥ y


if and only if x · y = 0.

Proof. By Definition 2.24, x ⊥ y if and only if

    (x · y)/(∥x∥ ∥y∥) = cos θ = cos(π/2) = 0.

Since x and y are non-zero, we have ∥x∥ ∥y∥ > 0. Thus x ⊥ y if and only if x · y = 0.

Example 2.26. Suppose x = (−1, 3) and y = (6, 2). Then x ⊥ y


because x · y = −1 × 6 + 3 × 2 = 0, so x ⊥ y.

The example above is trivial, but it is worth expanding on it and again


appealing to geometric intuition. Let A and B denote the points (−1, 3)
and (6, 2), respectively, and form the triangle OAB. If x ⊥ y then OAB is
right-angled at O, so we expect from the cosine rule that

    ∥AB∥² = ∥x∥² + ∥y∥².

We verify this: ∥x∥² = (−1)² + 3² = 10, ∥y∥² = 6² + 2² = 40 and AB = (6 − (−1), 2 − 3) = (7, −1), so ∥AB∥² = 7² + (−1)² = 50.
Thus the triangle OAB is right-angled at O, and x ⊥ y.

Figure 2.13: Orthogonal vectors in R2


y

B
x
y

x
−1 6

The sign of the scalar product


If x and y are non-zero vectors having angle θ between them, we can
glean some immediate information about θ simply by checking the sign
of x · y.

(a) x · y > 0 implies ∥x∥ ∥y∥ cos θ > 0, so cos θ > 0: θ is acute;
(b) x · y < 0 implies cos θ < 0: θ is obtuse;
(c) x · y = 0 implies cos θ = 0: θ = π/2 and x ⊥ y.
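These cases are easy to check numerically. A sketch (not part of the module; numpy assumed), using the vectors of Example 2.26:

    import numpy as np

    def angle(x, y):
        c = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against rounding error

    x, y = np.array([-1.0, 3.0]), np.array([6.0, 2.0])
    print(np.dot(x, y), angle(x, y))   # 0.0 and pi/2: the vectors are orthogonal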

The geometric significance of Theorem 2.22


Theorem 2.22 was apparently parachuted in earlier in the section, simply
to help prove the Cauchy-Schwarz inequality. Moreover, its statement
and proof were purely algebraic. However, it has a deep and ancient
geometric connection, alluded to in some sense by Example 2.26.

Remarks 2.27. Let u and v be non-zero vectors in Rn . We know now


that these vectors are perpendicular if and only if u · v = 0. In this
case, we can form the right-angled triangle as shown below, where
the length of the hypotenuse and those of the other two sides are
given by ∥u + v∥ and ∥u∥ and ∥v∥, respectively. Given this, we see that the statement

    ∥u + v∥² = ∥u∥² + ∥v∥²

is precisely what we would expect from the theorem of Pythagoras. Moreover, the length of the hypotenuse ∥u + v∥ is going to strictly exceed ∥u∥ (as v ≠ 0). If v = 0 then these two values will be equal (and of course we won’t have a triangle anymore).

Figure 2.14: The Theorem of Pythagoras



2.4 Matrices acting on vectors


In this section, we consider vectors as given in Definition 2.3 (3). Recall
Section 1.2, where we treated points in R2 as 2 × 1 column vectors in
order to take advantage of matrix multiplication and show that certain
matrices can rotate such points.
(The arithmetic of column vectors: sums, scalar multiples, lengths, scalar products etc., proceeds in the same way as in previous sections.)

This principle can be generalised to a much greater degree. Suppose that A is an m × n matrix, and let x be a point or vector in Rn. We can regard x = (x1, . . . , xn) as an n × 1 column vector

    x = [ x1 ]
        [ ⋮  ]
        [ xn ],
to take advantage of matrix multiplication. With this in mind, the matrix
product Ax is a m × 1 column vector which, in turn, can be regarded as a
point or vector in Rm . In this way, we can use the matrix A to transform
points or vectors in Rn to points or vectors in Rm .
Quite often, we concentrate on the case where m = n, i.e., when A ∈
Mn (R) and x and Ax belong to the same space Rn . The rotation of points
in 2 dimensions is one such example. There are 3×3 matrices that rotate
points in 3 dimensions, and others that perform many more geometric
actions besides.
The next result ties together matrix transposes and scalar products and
has many applications.

Proposition 2.28. Let A ∈ Mn (R) and let x and y be (n × 1 column)


vectors in Rn . Then
(Ax) · y = x · (AT y).

The proof of Proposition 2.28 is in Appendix C.1. One of its applications


is to demonstrate a hugely important feature of orthogonal matrices,
namely that they preserve scalar products between vectors and they
preserve lengths of vectors, in the following sense.

Proposition 2.29. Let P ∈ Mn (R) be an orthogonal matrix, and let


x, y be vectors in Rn . Then

(Px) · (Py) = x · y.
In particular, ∥Px∥ = ∥x∥.

Proof. Since P is orthogonal, we have P T P = In . Given any vector y in


Rn , treated as a n × 1 column vector, note that In y = y by Theorem 1.36.
Thus, using Proposition 2.28, we obtain

(Px) · (Py) = x · (P T Py) = x · (In y) = x · y.


In particular, ∥Px∥² = (Px) · (Px) = x · x = ∥x∥². Taking positive square roots of both sides yields the second result.

This makes complete sense when you consider rotations. When you
rotate an object you don’t change the length of anything, nor do you
change any angles between anything in the object (whereas stretching,
twisting or otherwise deforming the object may alter such things).
Proposition 2.29 is used in Sections 4.2 and 6.1, and Appendix B.
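A small numerical illustration of this (a sketch only, not part of the module; numpy assumed), using a rotation matrix, which Example 1.52 tells us is orthogonal:

    import numpy as np

    theta = 0.7
    P = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # a rotation, hence orthogonal
    x, y = np.array([3.0, -1.0]), np.array([2.0, 5.0])
    print(np.isclose((P @ x) @ (P @ y), x @ y))                   # True
    print(np.isclose(np.linalg.norm(P @ x), np.linalg.norm(x)))   # True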

2.5 Orthogonal projections


We finish this chapter with a brief description of orthogonal projections,
which are used in Appendix B. Consider a unit vector v in Rn , some other
vector x in Rn , and the straight line L that runs through the origin 0,
parallel to v.

Definition 2.30. The orthogonal projection of the vector x onto the line L, denoted projL(x), is the vector y that lies on L, in such a way that the line running from x to y is orthogonal to L.

In Figure 2.15 below, L and y are represented by the dotted and dashed
lines, respectively, and the line from x to y is highlighted in red. As you
can see, the red line is orthogonal to L.
How do we determine y = projL (x)? The vector y has to lie on the line L,
which means that it must be a scalar multiple of v. So let’s set y = cv,
where c is to be determined. We need the red line from x to y to be
orthogonal to L. The red line is parallel to the vector x − y, and v is
parallel to L. If we take the scalar product of x − y with v, we obtain

(x − y) · v = x · v − y · v = x · v − c(v · v) = x · v − c,


because v · v = ∥v∥² = 1. For orthogonality, we require this quantity to be 0. Therefore, c = x · v, giving y = (x · v)v.

Figure 2.15: Orthogonal projection of x onto the line L


x

y = projL (x)
0 θ
v

From the point of view of trigonometry, this makes perfect sense. Let θ be the angle between v and x. Given the right-angled triangle in the picture above, the length of y should equal the length of x, multiplied by |cos θ|, i.e. ∥y∥ = ∥x∥ |cos θ|. From Definition 2.24, we know that x · v = ∥x∥ ∥v∥ cos θ = ∥x∥ cos θ, thus ∥y∥ should equal |x · v|, and indeed it does, again because ∥v∥ = 1.

(In the figure above, θ is acute and cos θ is positive. However, θ may be obtuse, so we require the absolute value |cos θ|, rather than cos θ.)

Example 2.31. Compute projL(x) in the following cases.

1. x = (5, −7, 11, 4) and v = (1/2, −1/2, 1/2, −1/2) = (1/2)(1, −1, 1, −1).

2. x = (2, 9, 1) and v = (3, 2, 1).

Solution. (In both cases you can and should verify that (x − projL(x)) · v = 0.)

1. We have projL(x) = (x · v)v = (19/2)v = (19/4)(1, −1, 1, −1).

2. Here, v is not a unit vector, so the formula above does not apply. In these cases, observe that L is also parallel to v̂, where v̂ is the unit vector parallel to v defined after Definition 2.15. Thus

    projL(x) = (x · v̂)v̂ = ((x · v)/∥v∥²) v.

(This formula for projL(x) works for any non-zero vector v.) In this example we have

    projL(x) = ((x · v)/∥v∥²) v = (25/14)v = (25/14)(3, 2, 1).

Exercise 2.32. Let v, x, y and L be as above, and let z be any vector


running parallel to L, i.e. z = av, for some number a. Show that x − y
and y − z are orthogonal, regardless of the value of a. Using this
result and Theorem 2.22, show that
2 2
x − z 2 = x − y + y − z ,


and thus x − y 6 x − z .

The exercise above shows that y is the vector on the line L that is closest
to x.
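The projection formula is easy to compute with. A sketch (not part of the module; numpy assumed), checked against Example 2.31:

    import numpy as np

    def proj(x, v):
        # proj_L(x) = ((x·v)/||v||^2) v, valid for any non-zero v
        v = np.asarray(v, dtype=float)
        return (np.dot(x, v) / np.dot(v, v)) * v

    print(proj([5, -7, 11, 4], [0.5, -0.5, 0.5, -0.5]))   # (19/4)(1, -1, 1, -1)
    print(proj([2, 9, 1], [3, 2, 1]))                     # (25/14)(3, 2, 1)
    x = np.array([2.0, 9.0, 1.0])
    print(np.dot(x - proj(x, [3, 2, 1]), [3, 2, 1]))      # approximately 0, as expected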

Chapter 3

Systems of Linear Equations

3.1 Systems of linear equations


Solution sets and simultaneous solutions
The equation 2x + y = 3 is a linear equation in the variables x and y.
A pair (x0 , y0 ) of real numbers (or a vector in R2 , if you like) is a solution of
the equation 2x + y = 3 if setting x = x0 and y = y0 makes the equation
true; i.e. if 2x0 + y0 = 3. Thus, (1, 1) and (0, 3) are solutions, but (1, 4) is
not a solution since setting x = 1, y = 4 gives 2x + y = 2 × 1 + 4 6= 3.

Definition 3.1 (Solution sets). The set of all solutions of a linear


equation is called its solution set.

Example 3.2. The solution set of the equation 2x + y = 3 forms a


straight line in the 2-dimensional Cartesian plane.

Figure 3.1: The solution set of 2x + y = 3.


y

Every point on the


(0,3) line is a solution to
the equation, and
vice-versa.
2
The solution set gives us
(1,1) the graph of the function
y = 3 − 2x.
x
−2 2 4
(2,−1)


Example 3.3. The solution set of −4x + 5y = 8 is another straight


line. The two lines intersect at a (unique) point ( 12 , 2). Equivalently,
x = 12 , y = 2 solve both equations 2x + y = 3 and −4x + 5y = 8
simultaneously.

Figure 3.2: The solution sets of 2x + y = 3 and −4x + 5y = 8.


y

2 ( 21 ,2)

x
−2 2 4

Personal computers
It is sometimes necessary to find solutions that solve several linear equa-
tions simultaneously. For example, weather prediction models often re-
quire simultaneous solutions of hundreds of thousands of such equations.

(Simple examples are required when introducing a topic. However, simple examples often look contrived. Please accept my apologies in advance.)

Example 3.4. We consider a straightforward example about making computers. Suppose that a factory makes two models of personal computer: one for standard use, that takes 2 hours to build, and a second, more powerful model for gaming purposes, that takes 3 hours to build. Imagine that the factory can produce a maximum of 300
vance. to build. Imagine that the factory can produce a maximum of 300
computers per week, and has at its disposal a total of 800 hours of
labour per week. How many computers of each type should be built,
in order to maximise capacity and time?

Solution. Let x and y denote the number of standard and gaming


computers to be built each week, respectively. To maximise capacity
and time, we require

x + y = 300
(3.1)
2x + 3y = 800.

If we multiply both sides of the second equation above by 1/2, we obtain

    x + (3/2) y = 400.    (3.2)

If we then subtract the first equation in (3.1) above from (3.2), we get

    (1/2) y = 100,

so y = 200. If we substitute this back into the first equation in (3.1),


we obtain x = 100. 

3.2 Elementary row operations


This kind of ‘ad hoc’ approach used above may not always work if we have a more complicated system, involving a greater number of variables, or more equations. We will devise a general strategy for solving complicated systems of linear equations.

(It is quite possible that you (like me) learned one way of solving certain systems of linear equations at school, akin to the ad hoc method above. At university, I learned a different method, the one below, which initially seemed unnatural and unnecessary. But in time I came to view it as superior. If you are in this position then I would heed the words of Yoda, who once said, ‘you must unlearn what you have learned’.)

Example 3.5. Find all simultaneous solutions of the system

    x1 + 2x2 − x3 = 5
    3x1 + x2 − 2x3 = 9
    −x1 + 4x2 + 2x3 = 0.
The approach used in simple systems like Example 3.4 will work for
Example 3.5 too, but with this and other, harder examples, it may not
always be clear how to proceed. Our new strategy describes and solves
linear systems more systematically, with greater clarity, and thus with
less scope for error.
We associate a matrix with our system in Example 3.5:

    [  1   2  −1   5 ]   eqn 1
    [  3   1  −2   9 ]   eqn 2
    [ −1   4   2   0 ]   eqn 3

(It is possible to solve Examples 3.4 and 3.5 using matrix inverses; see Section 3.5. However, this approach only applies if our linear system has a unique solution, and this is not true in all cases.)

1. Row i of this matrix comprises first the coefficients of the variables


and then the number on the right hand side of equation i.


2. Column i, 1 6 i 6 3, corresponds to the variable xi , and column


4 corresponds to the numbers on the right hand side of the whole
system.

Definition 3.6 (Augmented matrices). The above matrix is the aug-


mented matrix of the linear system in Example 3.5.

In Example 3.4, we performed operations such as:

1. multiply an equation by a non-zero constant, and


2. add one equation (or a non-zero constant multiple of one equation)
to another equation.

Such operations change the system of equations, but preserve the set of
simultaneous solutions of the system.
The operations correspond to the following operations on the augmented
matrix:

1. multiply a row by a non-zero constant, and


2. add a multiple of one row to another row.

We consider a third type, namely:

3. swap two rows in the matrix (this only amounts to writing down the
equations of the system in a different order).

Definition 3.7 (Elementary row operations (EROs)). Operations on


a matrix of these three types are called elementary row operations
(EROs).

Solution of Example 3.5. See the accompanying video.


Step 1
R3 −1 4 2 0
+
R1 1 2 −1 5
new R3 0 6 1 5

Step 2
R2 3 1 −2 9

3R1 3 6 −3 15
new R2 0 −5 1 −6

Step 3
R2 0 −5 1 −6
+
R3 0 6 1 5
new R2 0 1 2 −1

Step 4
R3 0 6 1 5

6R2 0 6 12 −6
new R3 0 0 −11 11

Step 5
R3 0 0 −11 11
− 11 R3 0 0
1
1 −1

We have produced a new, simpler, system of equations:

    x1 + 2x2 − x3 = 5    (A)
         x2 + 2x3 = −1   (B)
               x3 = −1   (C).

(The point of using EROs is that the system of equations gets simpler, but the solutions of the system are preserved. So the solutions of the final system are the same as the solutions of the original system.)

This is easily solved using back-substitution.



 (C ) x3 = −1

Back-substitution (B) x2 = −1 − 2x3 ⇒ x2 = 1

(A) x1 = 5 − 2x2 + x3 ⇒ x1 = 2.

So the solution is (2, 1, −1), or x1 = 2, x2 = 1, x3 = −1. Check that this


is a solution to the original system. It is the only solution, both of the
final system and of the original one (and every intermediate one). 
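Back-substitution is simple to automate. A sketch (not part of the module; numpy assumed) of the step above:

    import numpy as np

    # rows of [coefficients | right-hand side] for the row-echelon system (A), (B), (C)
    R = np.array([[1.0, 2.0, -1.0,  5.0],
                  [0.0, 1.0,  2.0, -1.0],
                  [0.0, 0.0,  1.0, -1.0]])

    x = np.zeros(3)
    for i in (2, 1, 0):                          # last equation first
        x[i] = R[i, 3] - R[i, i+1:3] @ x[i+1:3]  # each leading coefficient is 1
    print(x)                                     # [ 2.  1. -1.]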


3.3 Row-echelon and reduced row-echelon


forms
The matrix obtained in Step 5 above is in row-echelon form.

Definition 3.8 (Row echelon forms). A matrix A is said to be in row-


echelon form (REF), if

1. The first non-zero entry in each row is a 1 (called a leading 1).

2. If a column contains a leading 1, then every entry of the column


below the leading 1 is a zero.

3. As we move downwards through the rows of the matrix, the


leading 1s move from left to right.

4. Any zero rows (rows consisting entirely of 0s) are grouped to-
gether at the bottom of the matrix.

Equivalently, we say that A is a row-echelon matrix.

Example 3.9. The matrix


 
1 2 −1 5
2 −1 
 
 0 1
0 0 1 −1

at the end of the solution to Example 3.5 is in row-echelon form.

Example 3.10. The following are row-echelon matrices


     
1 4 7 1 −2 1 4 0 1 −7 3
 0 1 −9  ,  0 1 −10  ,
     
0 0 1  and  0 0
0 0 1 0 0 0 0 0 0 0 1

while those below are not


     
1 −2 8 1 3 3 −15 0 0 0
2 6 ,  0 0 0  0 1 0 .
     
 0 1  and
0 0 1 0 0 0 6 1 0 0

Exercise 3.11. Explain why the second set of matrices in Example


3.10 are not in row-echelon form.

Gauss-Jordan elimination
Theorems 3.12 and 3.15 lie at the heart of the new strategy for solving
linear systems. The proof of the first is algorithmic, as it describes a
method that can be implemented. We include the proof as it is of con-
siderable practical use when solving linear systems. There is a video to
accompany the proof, in which we apply the algorithm in the proof to
Example 3.13.

Theorem 3.12. Every m × n matrix can be reduced, via a series of


EROs, to a row-echelon matrix.

Proof. If A is the zero m × n matrix then there is nothing to do as A This algorithm gen-
is already in row-echelon form. If A is non-zero, look in A for the first erates a sequence of
EROs that is guar-
column (from the left) having a non-zero entry. This column is called the anteed to produce a
pivot column, and the first non-zero entry in the pivot column is called row-echelon matrix.

the pivot. Suppose the pivot column is column j and the pivot occurs in
However, this sequence
of EROs not the only
row i. Now interchange, if necessary, rows 1 and i and call the resulting one that does this! In

matrix B:
practice, in specific
cases it can be better to
R1 ↔ Ri A → B. use a different sequence
of EROs – see marginal
notes accompanying
Thus the pivot B1j is non-zero. Now perform the ERO Example 3.17.
Also be aware that, after
an alternate reduction,
1
R1 → × R1 B → C. you may get a REF that
B1j is different from the one
that the theorem pro-
vides. Such REFs are
Note that C1j = 1. Now whenever Ckj , 2 6 k 6 m, is non-zero, perform not unique!
the ERO
Rk → Rk − Ckj × R1


on C . Denote the resulting matrix by D. It follows that in D, the elements


in column j, in rows 2 to m, are zero:

D2j = D3j = . . . = Dmj = 0.

Next, consider the (m − 1) × n submatrix A0 of D obtained by deleting


the first row of D. Repeat the procedure above with A0 instead of A.
Continuing in this way, we obtain a matrix in row echelon form.

Example 3.13. Reduce the matrix A below to row-echelon form.


 
2 4 −2 0 6
A =  1 2 −1 4 8 .
 

−2 −4 9 11 −15

Solution. We implement the algorithm in the proof of Theorem 3.12.


See accompanying video. The REF obtained is
 
    [ 1  2  −1    0     3   ]
    [ 0  0   1  11/7  −9/7 ]
    [ 0  0   0    1    5/4 ]

Definition 3.14 (Reduced row-echelon forms). A matrix is in reduced


row-echelon form (RREF) if

1. it is in row-echelon form, and

2. if a particular column contains a leading 1, then all other entries


of that column are 0s.

We state without proof the next result.

Theorem 3.15. Every m × n matrix can be reduced, via a series of


Row-echelon forms of
matrices need not be
unique. However, for EROs, to a unique reduced row-echelon matrix.
any given A, there is
only one reduced row-
echelon form!
Once we have a row-echelon form of a matrix, we can use additional
EROs to obtain the reduced row-echelon form.

Example 3.16. Reduce the matrix A below to reduced row-echelon


form.  
2 4 −2 0 6
A =  1 2 −1 4 8 .
 

−2 −4 9 11 −15

Solution. See accompanying video. The RREF in this case is


 
    [ 1  2  0  0   −1/4  ]
    [ 0  0  1  0  −13/4 ]
    [ 0  0  0  1    5/4 ]

So every matrix can be reduced to reduced row-echelon form. But what


is the use of this? Let’s return to Example 3.5.

Example 3.17. Consider again the system

x1 + 2x2 − x3 = 5
3x1 + x2 − 2x3 = 9
−x1 + 4x2 + 2x3 = 0

in Example 3.5, reduce the augmented matrix of the system to reduced


row-echelon form, and hence solve the system (again).

Solution. See accompanying video. On the video, we begin at step In the video accompa-
nying Example 3.5, we
5, which the the final step on the video accompanying Example 3.5. did not follow the al-
gorithm in the proof of
The RREF in this case is Theorem 3.12 exactly. In
  step 2, we performed
1 0 0 2 R2 → R2 + R3, in-
stead of R2 → − 51 R2
,
 
 0 1 0 1 (the ERO suggested by

0 0 1 −1
the algorithm). Both
EROs turn the pivot in
column 2 into a 1, but
the first ERO avoids
fractions (which can in-
which corresponds to the linear system
crease errors in arith-
x1 = 2 metic).

x2 = 1
x3 = −1.


This system is easier to solve! These x1 , x2 and x3 agree with our


solution to Example 3.5. 

Using EROs in this way preserves the set of solutions of the system.
By reducing to the RREF, we have passed from a system that is hard to
solve to another that is trivial to solve. Moreover, the set of solutions
has been preserved, so the solution to the final system is the same as
the solution to the original system (and of every intermediate one).
This method of solving linear systems by reducing the augmented matrix
to reduced row-echelon form is called Gauss–Jordan elimination.
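Row reduction is also something software does routinely. A sketch (not part of the module): here the sympy library (assumed available) is used because its rref() works with exact fractions, like the hand calculation, applied to the matrix A of Examples 3.13 and 3.16.

    from sympy import Matrix

    aug = Matrix([[ 2,  4, -2,  0,   6],
                  [ 1,  2, -1,  4,   8],
                  [-2, -4,  9, 11, -15]])
    rref, pivot_columns = aug.rref()
    print(rref)            # rows (1, 2, 0, 0, -1/4), (0, 0, 1, 0, -13/4), (0, 0, 0, 1, 5/4)
    print(pivot_columns)   # (0, 2, 3): the columns containing leading 1s (counted from 0)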

3.4 Parametric solutions and inconsistent


systems
Parametric solutions
In Examples 3.4 and 3.5, we obtained unique solutions of the systems,
i.e., in both cases there is only one combination of x and y (in Example
3.4) or x1 , x2 and x3 (in Example 3.5) that solves the system. However,
very often there is more than one solution to a given system.

Example 3.18. Find the solution to

2x1 + 4x2 − 2x3 = 6


x1 + 2x2 − x3 + 4x4 = 8
−2x1 − 4x2 + 9x3 + 11x4 = −15.

Solution. The augmented matrix of this system is the matrix A in Examples 3.13 and 3.16. Using Gauss-Jordan elimination as above, we obtain the RREF

    [ 1  2  0  0   −1/4  ]
    [ 0  0  1  0  −13/4 ]
    [ 0  0  0  1    5/4 ]

(Carl Friedrich Gauss (1777 – 1855) is one of the greatest mathematicians of all time. He introduced this elimination technique to solve linear systems arising from the analysis of numerical data from planetary observations. Image source: Wikipedia.)

This corresponds to a new linear system

x1 + 2x2 = − 14
x3 = − 13
4
x4 = 5
4
.

The RREF involves 3 leading 1s, one in each of the columns corre-
sponding to the variables x1 , x3 and x4 . The column corresponding to
x2 contains no leading 1. We distinguish between these cases.
A variable whose associated column in the RREF contains a leading
1 is called a leading variable. A variable whose column in the RREF
does not contain a leading 1 is called a free variable. In this case
x1 , x3 and x4 are leading variables and x2 is free.
To each free variable we associate a parameter. Here, we let x2 = t.
Then we can express the remaining leading variables in terms of these
parameters to get a parametric solution:

    x1 = −1/4 − 2t,   x2 = t,   x3 = −13/4,   x4 = 5/4.

(Here, x3 and x4 don’t depend on t, but there are situations where all the variables depend on parameters. Dependency on parameters varies from case to case.)

from case to case.
solutions of the system.

Above, we get a particular solution by choosing a specific numerical value for t: e.g. if t = 1, then x1 = −9/4, x2 = 1, x3 = −13/4 and x4 = 5/4.
In general, we can have any number of free variables. To each one we
should associate a different parameter, so that it does not depend on
any of the others. Section 5.3 and the homework assignments contain
examples of systems having more than one parameter.

Inconsistent systems
We have dealt with systems having infinitely many solutions, by virtue
of parameters. It is also possible for a given system to have no solutions
at all.


Example 3.19. Show that the system

x1 + 2x2 + 3x3 = 5
2x1 + 5x2 + 7x3 = 13
3x2 + 3x3 = 10

has no solutions at all.

Solution. The system of equations corresponding to the REF (see


video) has as its third equation

0x1 + 0x2 + 0x3 = 1 i.e. 0 = 1.

This equation clearly has no solutions - no assignment of numerical


values to x1 , x2 and x3 will make the value of the expression 0x1 +0x2 +
0x3 equal to anything but 0. Hence the system has no solutions. 

Definition 3.20 (Inconsistent systems). A linear system is called in-


consistent if it has no solutions. A system which has at least one
solution is called consistent.

(An inconsistent system will always betray itself in this way: this is how we spot them!)

If a system is inconsistent, a REF obtained from its augmented matrix will include a row of the form 0 0 0 . . . 0 1, i.e. it will have a leading 1 in
we spot them! its rightmost column. Such a row corresponds to an equation of the form
0x1 + 0x2 + · · · + 0xn = 1, which has no solution.

Example 3.21. Consider the system 4x + 2y = 0 and 2x + y = 3. The


solution sets of these equations form two parallel lines which never
meet. This corresponds to the fact that there is no simultaneous
solution to both equations.

Figure 3.3: The solution sets of 4x + 2y = 0 and 2x + y = 3 never meet


y

x
−2 2

A summary of linear systems


We end this section by summarising the possible types of solution that
a linear system can have.

1. Unique solution: this happens if the system is consistent and there


are no free variables. In this case the RREF obtained from the
augmented matrix has the form
 
1 0 0 ... 0 ∗
 0 1 0 ... 0 ∗ 
 
 
 0 0 1 ... 0 ∗ 
 
 .. .. .. . . .. .. 
 . . . . . . 
0 0 0 ... 1 ∗

with possibly some additional rows consisting entirely of 0s at the


bottom. The unique solution can be read from the rightmost column.
See the RREF from Example 3.17 (step 8 on the video).
2. Parametric solution: this happens if the system is consistent and has at least one free variable. Systems of this type have infinitely many solutions. See Example 3.18. (Note that if the number of variables n strictly exceeds the number of equations m, then the system never has a unique solution: either it is inconsistent, or at least one of the variables must be free, giving a parametric solution.)

3. No Solutions: the system may be inconsistent, i.e., it has no solutions. This happens if a REF obtained from the augmented matrix has a leading 1 in its rightmost column:
 
..
.
 .. 
 
 . 
0 0 0 0 0 1

See Example 3.19.

3.5 Connections between linear systems and


matrices
Linear systems as matrix equations
Consider a general linear system of m equations in n variables x1 , x2 , . . . ,
xn . Suppose that, in equation i, the coefficient of the variable xj is given


by aij , and the right hand side is bi . Then our system can be written

a11 x1 + a12 x2 + . . . + a1n xn = b1


a21 x1 + a22 x2 + . . . + a2n xn = b2
.. .. ..
. . .
am1 x1 + am2 x2 + . . . + amn xn = bm .

Now consider the m × n matrix A whose (i, j)-entry is aij (so Aij = aij ),
and the n × 1 and m × 1 column vectors
   
x1 b1
 x2   b2 
   
x =  . 
  and b =  . 

,
.
 .   .. 
xn bm

respectively. Observe now that the m × 1 matrix product


  
a11 a12 . . . a1n x1
 a21 a22 . . . a2n   x2 
  
Ax =  .. .. ... ..   .. 
 

 . . .  . 
am1 am2 . . . amn xn
 
a11 x1 + a12 x2 + · · · + a1n xn
a21 x1 + a22 x2 + · · · + a2n xn
 
 
= 
 .. 

 . 
am1 x1 + am2 x2 + · · · + amn xn

encapsulates the left hand side of our whole system, and thus our system
of m equations can be rewritten as the single matrix equation

Ax = b.

The matrix A is called the coefficient matrix of the system. If A is a


n × n square matrix (i.e. if we have the same number of equations as
variables), and if A is invertible, then we can solve this equation using
matrix algebra:
x = In x = (A−1 A)x = A−1 b,
The fact that x = In x follows from Theorem 1.36.

Example 3.22. Consider once again the system in Example 3.5:

x1 + 2x2 − x3 = 5
3x1 + x2 − 2x3 = 9
−x1 + 4x2 + 2x3 = 0.

We will use matrix inverses to solve the system.

Solution. Here, m = n = 3 and


     
1 2 −1 x1 5
A =  3 1 −2  , x =  x2  and b =  9  .
     

−1 4 2 x3 0

In this case A is invertible and


 
    A−1 = (1/11) [ −10   8   3 ]
                 [   4  −1   1 ]
                 [ −13   6   5 ] ,

so

    x = (x1, x2, x3) = A−1 b = (1/11) [ −10   8   3 ] [ 5 ]     [  2 ]
                                      [   4  −1   1 ] [ 9 ]  =  [  1 ] ,
                                      [ −13   6   5 ] [ 0 ]     [ −1 ]

giving the solution x1 = 2, x2 = 1, x3 = −1, exactly as in Example


3.5. 
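In practice one lets the computer do this arithmetic. A sketch (not part of the module; numpy assumed):

    import numpy as np

    A = np.array([[ 1, 2, -1],
                  [ 3, 1, -2],
                  [-1, 4,  2]], dtype=float)
    b = np.array([5, 9, 0], dtype=float)
    print(np.linalg.inv(A) @ b)    # x = A^{-1} b = [ 2.  1. -1.]
    print(np.linalg.solve(A, b))   # same answer, without forming A^{-1} explicitly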

Theorem 3.23 gives a full description of the solutions of the matrix equa-
tion Ax = b, when m = n.

Theorem 3.23. Let A ∈ Mn (R) and let x and b be n×1 column vectors.

1. If A is invertible then the matrix equation Ax = b has the unique


solution x = A−1 b.

2. If A is not invertible then Ax = b either has no solutions (cor-


responding to an inconsistent system), or has infinitely many


solutions x (corresponding to a parametric solution).

The first part of this theorem was proved above, just before Example 3.22.
The proof of part 2 is beyond the scope of the module.
If A is not invertible, then the question of whether or not the equation
Ax = b has any solutions depends on the specific nature of A and b. This
problem must be approached on a case by case basis.

Computing matrix inverses by augmentation


Next, we present a second way to compute matrix inverses. The advan-
tage of the following method over that given in Chapter 1 (see Theorem
1.56) is that it is much more applicable for large values of n.
 
1 3 1
Example 3.24. Let A =  2 0 −1 . Find A−1 .
 

1 4 2

Solution. The inverse A−1 (if it exists) can be found using EROs as
follows.

1. Write down a 3 × 6 matrix B, whose first 3 columns comprise A


and whose second 3 columns comprise I3 :
 
1 3 1 1 0 0
B =  2 0 −1 0 1 0 
 

1 4 2 0 0 1

2. Use EROs to obtain the RREF corresponding to B:


 
1 3 1 1 0 0
 2 0 −1 0 1 0 
 

1 4 2 0 0 1
 
1 3 1 1 0 0
R2 → R2 − 2R1  0 −6 −3 −2 1 0 
 

R3 → R3 − R1 0 1 1 −1 0 1

 
1 3 1 1 0 0
R3 ↔ R2 1 −1 0 1
 
 0 1 
0 −6 −3 −2 1 0
 
R1 → R1 − 3R2 1 0 −2 4 0 −3
1 −1 0
 
 0 1 1 
R3 → R3 + 6R2 0 0 3 −8 1 6
 
1 0 −2 4 0 −3
1 −1 0
 
 0 1 1 
R3 → 13 R3 0 0 1 −3 3
8 1
2
 
R1 → R1 + 2R3 1 0 0 − 43 2
3
1
R2 → R2 − R3 − 3 −1 .
 5 1 
 0 1 0 3
0 0 1 −3 8 1
3
2

3. In this RREF, each of the first 3 columns contains a leading 1,


and the first 3 columns comprise the 3 × 3 identity matrix I3 .
As if by magic, the 3 × 3 matrix consisting of the last three
columns is the inverse of A:
   
    A−1 = [ −4/3    2/3    1 ]         [ −4   2   3 ]
          [  5/3   −1/3   −1 ] = (1/3) [  5  −1  −3 ] .
          [ −8/3    1/3    2 ]         [ −8   1   6 ]

The procedure outlined in the example above is called finding matrix


inverses by augmentation, and extends to general n × n matrices.
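A quick numerical check of the answer in Example 3.24 (a sketch only, not part of the module; numpy assumed):

    import numpy as np

    A = np.array([[1, 3,  1],
                  [2, 0, -1],
                  [1, 4,  2]], dtype=float)
    A_inv = np.linalg.inv(A)
    print(3 * A_inv)                           # [[-4, 2, 3], [5, -1, -3], [-8, 1, 6]]
    print(np.allclose(A @ A_inv, np.eye(3)))   # True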

Theorem 3.25. Let A ∈ Mn (R), form the n × 2n matrix B = (A | In ),


and use EROs to reduce B to its unique RREF (P | Q), where P, Q ∈
Mn (R) (possible by Theorem 3.15). If P = In , then A is invertible and
A−1 = Q. If P 6= In , then A is not invertible.

The proof of Theorem 3.25 is again beyond the scope of the module.


 
1 2 −1
Example 3.26. Does A =  0 1
 
1  have an inverse?
1 4 1
 
1 2 −1 1 0 0
Solution. 1. Form the 3 × 6 matrix B =  0 1
 
1 0 1 0 .
1 4 1 0 0 1

2. Use EROs to reduce B to RREF:


 
1 2 −1 1 0 0
 
 0 1 1 0 1 0 
1 4 1 0 0 1
 
1 2 −1 1 0 0
 
 0 1 1 0 1 0 
R3 → R3 − R1 0 2 2 −1 0 1
 
R1 → R1 − 2R2 1 0 −3 1 −2 0
 
 0 1 1 0 1 0 
R3 → R3 − 2R2 0 0 0 −1 −2 1
 
1 0 −3 1 −2 0
 
 0 1 1 0 1 0 
R3 → −R3 0 0 0 1 2 −1
 
R1 → R1 − R3 1 0 −3 0 −4 1
0 .
 
 0 1 1 0 1
0 0 0 1 2 −1

The left hand half of the RREF is not I3 , so A is not invertible. 

Matrix rank
Finally, we take a very brief look at the rank of a matrix.

Definition 3.27 (Matrix rank). The rank of a matrix A, denoted rank(A),


is the number of non-zero rows of any REF of A.

While a matrix can have more than one REF in general, the number of
non-zero rows of any such REF will always be the same (we won’t prove
this fact).
The notion of matrix rank can give us information about the number of
solutions and the number of parameters of solutions of systems of linear
equations.
Imagine that we have a system of equations in n variables. The system
is consistent, i.e. has a solution, if and only if the rank of the coefficient
matrix A of the system equals the rank of the augmented matrix. If this
is the case then we have more. We always have rank(A) ≤ n. Moreover,
if rank(A) = n then the system has a unique solution. Otherwise, the
system has infinitely many solutions, and the parametric solution will
involve n − rank(A) ≥ 1 parameters.
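As an aside, computer packages compute ranks numerically. The sketch below (not part of the module) uses numpy's matrix_rank to classify a system in exactly the manner described above; the right-hand side b is an invented example.

import numpy as np

A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  1.0],
              [1.0, 4.0,  1.0]])      # coefficient matrix from Example 3.26
b = np.array([1.0, 2.0, 5.0])         # an illustrative right-hand side (my choice)

rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))

if rank_A < rank_aug:
    print("inconsistent: no solutions")
elif rank_A == A.shape[1]:
    print("unique solution")
else:
    print(f"infinitely many solutions with {A.shape[1] - rank_A} parameter(s)")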

Chapter 4

Vector Geometry 2

4.1 Orthonormal bases of Rn and coordinate systems
In this chapter, we are going to delve slightly more deeply into the theory
of vectors. Hold on tight.

Orthonormal lists of vectors

Warning 4.1. Throughout this chapter, we will be considering lots of


lists of vectors v1 , . . . , vk in Rn . Such lists of vectors will be denoted
in bold face, to distinguish them from the entries of a single vector
v = (v1 , . . . , vk ) in Rk . Do not confuse the two – doing so will cause
much anguish!

Definition 4.2 (Orthonormal lists of vectors). Let v1 , . . . , vk be a list


of vectors in Rn . We say that this list is orthonormal, or that the
vectors are orthonormal, if they satisfy two criteria:

1. ‖vi‖ = 1 for all i in the range 1 ≤ i ≤ k;

2. vi · vj = 0 whenever 1 ≤ i, j ≤ k and i ≠ j.

A list of vectors is orthonormal if each vector has unit length, and if each
vector is orthogonal to every other vector in the list. Below are the
archetypal examples of orthonormal lists of vectors.


Example 4.3.

1. In R2, set e1 = (1, 0) and e2 = (0, 1). It is easy to verify that ‖e1‖ = ‖e2‖ = 1 and e1 · e2 = 0. Thus e1, e2 is an orthonormal list of vectors in R2.

2. Likewise, in R3, set e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1). As above, one can verify that each vector has unit length, and that they are mutually orthogonal to each other.

3. More generally, fix a positive integer n. For each i between 1 and n, define ei = (0, . . . , 0, 1, 0, . . . , 0) in Rn, where the single 1 is the ith entry of the vector. Then the list e1, . . . , en is an orthonormal list of vectors of Rn. Evidently, each ei has unit length, and ei · ej = 0 whenever i ≠ j, because the 1s in the respective vectors appear in different places.

(It is clear from this example that the definition of the ei changes according to the value of n, but the context should make clear which list of ei one is dealing with.)

Figure 4.1: The vectors e1, e2, e3 in R3. The vectors e1, e2 and e3 point along the positive x-, y- and z-axes respectively, and are called the standard basis vectors in R3.

You should verify that the next examples yield orthonormal lists of vectors.

Example 4.4.

1. The vectors v1 = (1/√2)(1, 1) and v2 = (1/√2)(−1, 1) form an orthonormal list in R2 (you should draw these vectors on a piece of paper).

2. The vectors v1 = (1/√2)(1, −1, 0), v2 = (1/√6)(1, 1, −2) and v3 = (1/√3)(1, 1, 1) form an orthonormal list in R3.

Remarks 4.5. Notice that if v has unit length, then the 1-element list
v is orthonormal. Criterion (1) of Definition 4.2 is evidently fulfilled.
Criterion (2) is also fulfilled, because it can only fail if we can find
two vectors in the list that are not orthogonal to each other!

Let us consider further the list e1 , . . . , en in Rn from Example 4.3 (3). Take
another vector x = (x1 , . . . , xn ) in Rn . The ith entry of x is the number xi .
Observe that
x · ei = (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ) · (0, . . . , 0, 1, 0, . . . , 0) = xi .
Thus, the entries xi of x are equal to the scalar products x · ei , 1 ≤ i ≤ n.
Observe further that
x1 e1 + · · · + xn en = x1 (1, 0, . . . , 0) + · · · + xn (0, . . . , 0, 1)
= (x1 , 0, . . . , 0) + · · · + (0, . . . , 0, xn )
= (x1 , . . . , xn ) = x.

Therefore, we can write the vector x as the following sum of n vectors


x = (x · e1 )e1 + · · · + (x · en )en . (4.1)

This equality holds for every vector x in Rn . It turns out that (4.1) is
part of a wider pattern exhibited by certain special lists of orthonormal
vectors.

Orthonormal bases and coordinate systems


Hereafter, we shall abbreviate orthonormal to on. The following theorem
yields some deep facts about lists of on vectors. Its proof uses a hefty
slice of more advanced theory from linear algebra, and is too rich for this
module.

Theorem 4.6. Let v1 , . . . , vk be a list of on vectors in Rn . The following


statements hold.

1. The length of the list cannot exceed the dimension of Rn , i.e. k ≤ n.


2. If k = n, then given any vector x in Rn , we have

x = (x · v1 )v1 + · · · + (x · vn )vn . (4.2)

3. If k < n, then we can extend the list v1 , . . . , vk by new vectors


vk+1 , . . . , vn , in such a way that the extended list

v1 , . . . , vn ,

is again an on list of vectors in Rn . Necessarily, this extended


list satisfies part (2).

We will examine the consequences of Theorem 4.6 one by one. Part (1)
of the theorem imposes a limit on the length of lists of on vectors in Rn .
For example, it is not possible to have a list of 4 on vectors in R3 .
Theorem 4.6 part (2) requires more attention. First, notice that (4.1) is a
special case of (4.2), applied to the vectors e1 , . . . , en . Next, consider the
following definitions, which are prompted by this part of the theorem.

Definition 4.7 (Orthonormal bases and coordinates). An on list of


vectors v1 , . . . , vn in Rn is called an orthonormal basis or an on basis
of Rn .
Given x in Rn , the numbers x · v1 , . . . , x · vn , as in (4.2) above, are the
coordinates of x, with respect to the basis v1 , . . . , vn .
In particular, the list of vectors e1 , . . . , en from Example 4.3 (3) is
called the standard on basis of Rn .

Let’s see how Theorem 4.6 part (2), (4.2), and the ideas in Definition 4.7
apply in some examples.

Example 4.8.

1. Let x = (2, 4). The coordinates of x, with respect to the standard on basis e1, e2 of R2, are 2 and 4. Now consider v1, v2 from Example 4.4 (1). Because the length of this on list equals the dimension of R2, it is an on basis of R2, according to Definition 4.7. The coordinates of the same vector x are

x · v1 = (2, 4) · (1/√2)(1, 1) = 6/√2 = 3√2

and x · v2 = (2, 4) · (1/√2)(−1, 1) = 2/√2 = √2,

with respect to v1, v2. Moreover, we can see that

3√2 v1 + √2 v2 = 3√2 · (1/√2)(1, 1) + √2 · (1/√2)(−1, 1) = (3, 3) + (−1, 1) = (2, 4) = x.

(There is a video to accompany this example.)

2. Let x = (3, 4, 5). The coordinates of x with respect to the standard on basis e1, e2, e3 of R3 are 3, 4 and 5, respectively. Now consider v1, v2, v3 from Example 4.4 (2). This is an on basis of R3. The coordinates of x with respect to this basis are

x · v1 = −1/√2,  x · v2 = −3/√6  and  x · v3 = 12/√3.

As above, we can verify that

(x · v1)v1 + (x · v2)v2 + (x · v3)v3 = x.
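A quick computational check of Example 4.8 (2), for the curious (not part of the module): the coordinates with respect to an on basis are the scalar products x · vi, and the vector is recovered from them via (4.2).

import numpy as np

v1 = np.array([1, -1, 0]) / np.sqrt(2)
v2 = np.array([1, 1, -2]) / np.sqrt(6)
v3 = np.array([1, 1, 1]) / np.sqrt(3)
x = np.array([3.0, 4.0, 5.0])

coords = [x @ v for v in (v1, v2, v3)]
print(coords)                          # approx [-0.7071, -1.2247, 6.9282]
reconstructed = sum(c * v for c, v in zip(coords, (v1, v2, v3)))
print(np.allclose(reconstructed, x))   # True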

Figure 4.2: The coordinates of x with respect to v1 and v2. The figure shows x = (2, 4) decomposed as 3√2 v1 = (3, 3) plus √2 v2 = (−1, 1).

Let v1 , . . . , vn be an on basis of Rn . For every i between 1 and n, we


can associate a coordinate axis running parallel to the vector vi . As
the vectors in the list are mutually orthogonal, so are the corresponding
axes. The coordinate axes associated with the standard on basis of Rn
are simply the usual coordinate axes that we are familiar with.
Given a vector x in Rn , the coordinates x · v1 , . . . , x · vn of x defined above
are precisely the numbers that we would obtain if we measured the
coordinates of x with respect to the new set of coordinate axes, rather


than the usual ones. We consider this in the figure below, where x, v1
and v2 are as in Example 4.8 (1).

Figure 4.3: A change of coordinate axes. With respect to the axes determined by v1 and v2, the coordinates of x are x · v1 = 3√2 and x · v2 = √2. (See the accompanying short video.)

Changing axes in this way yields a different coordinate system. In applications, sometimes it is better to change the coordinate system, because doing so allows us to make more sense of the problem at hand. (This happens all the time in physics for example, when we have to switch between different observers who have their own local coordinate systems. You will also see it in action in Section 6.1 and Appendix B.)

Finally, we consider Theorem 4.6, part (3). Observe that if k < n, then (4.2) does not apply to every vector in Rn.
Example 4.9. Let v1 and v2 be the first two vectors in Example 4.4
(2). Certainly, v1 , v2 form an orthonormal list in R3 . Let x be the third
vector in that example. Then

(x · v1 )v1 + (x · v2 )v2 = 0v1 + 0v2 = 0,

which is certainly not equal to x. Thus (4.2) fails in this case.

More generally, if k < n, then given any extension of v1, . . . , vk to v1, . . . , vn, as in Theorem 4.6 (there are always at least two extensions, and often infinitely many), then

(vk+1 · v1)v1 + · · · + (vk+1 · vk)vk = 0v1 + · · · + 0vk = 0,

which does not equal vk+1, as ‖vk+1‖ = 1 ≠ 0. Thus (4.2) will always fail for some vectors.

4.2 Orthonormal bases and orthogonal matrices
All that sounds lovely, but how do we change from one coordinate system
to another in practice? This can be done using orthogonal matrices.
Take an on basis v1 , . . . , vn of Rn , and let’s consider them as n × 1 column
vectors, so that we can apply matrices to them, as in Section 2.4. Let
P ∈ Mn (R) be an orthogonal matrix, and let’s make a new list of vectors
w1, . . . , wn in Rn, by setting wi = Pvi, 1 ≤ i ≤ n. At this point we resurrect Proposition 2.29, because it tells us that w1, . . . , wn is also an on basis. Evidently, the length of the new list equals the dimension of Rn. We simply have to verify that it fulfils Definition 4.2. This is true, because

‖wi‖ = ‖vi‖ = 1 and wi · wj = (Pvi) · (Pvj) = vi · vj = 0,

whenever 1 ≤ i, j ≤ n are distinct.

Example 4.10. Consider the matrix P ∈ M3 (R), back in Example 1.51,


and the standard on basis e1 , e2 , e3 of R3 . For i between 1 and 3,
define wi = Pei . Show that w1 , w2 , w3 is the same as the on basis
v1 , v2 , v3 of R3 given in Example 4.4 (2).

Solution. Treating these things as 3 × 1 column vectors, we have

w1 = Pe1 = [  1/√2  1/√6  1/√3 ] [ 1 ]   [  1/√2 ]
           [ −1/√2  1/√6  1/√3 ] [ 0 ] = [ −1/√2 ] = v1.
           [  0    −2/√6  1/√3 ] [ 0 ]   [  0    ]

Notice that v1 is just the first column of P! Likewise, w2 = Pe2 is the second column of P, which equals v2, and similarly for w3.

The above example generalises to any number of dimensions. Suppose


we start with the standard basis of Rn , with its associated standard
coordinate system, and want to switch to a new coordinate system, based
on w1 , . . . , wn , where wi = Pei , 1 ≤ i ≤ n, and P ∈ Mn (R) is orthogonal.
We make two observations. First, the vector wi , as a column vector, is
simply the ith column of P.
Second, let x = (x1 , . . . , xn ) be a vector in Rn . Its coordinates with respect
to the standard system are just x1 , . . . , xn . The coordinates of x with


respect to the new system are given by x·w1 , . . . , x·wn . Using Proposition
2.28, notice that
x · wi = x · (Pei ) = (P T x) · ei ,
and this is the ith entry of the vector P T x, because the ith entry of ei
equals 1 and all other entries of ei are zero. Hence the coordinates of
x with respect to the new system are precisely the entries of the vector
P T x. Let’s see this in action.

Example 4.11. Recall Examples 4.10 and 4.8 (2). Show that the coordinates of the vector x = (3, 4, 5) with respect to w1, w2, w3 are the entries in Pᵀx.

Solution. From Example 4.8 (2), we know that the new coordinates are −1/√2, −3/√6 and 12/√3. Computation of Pᵀx yields

Pᵀx = [ 1/√2  −1/√2   0    ] [ 3 ]   [ −1/√2 ]
      [ 1/√6   1/√6  −2/√6 ] [ 4 ] = [ −3/√6 ]
      [ 1/√3   1/√3   1/√3 ] [ 5 ]   [ 12/√3 ],

as required.
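Again, for the curious, a short numpy check of Example 4.11 (not part of the module): the new coordinates are simply the entries of Pᵀx.

import numpy as np

P = np.array([[ 1/np.sqrt(2), 1/np.sqrt(6), 1/np.sqrt(3)],
              [-1/np.sqrt(2), 1/np.sqrt(6), 1/np.sqrt(3)],
              [ 0,           -2/np.sqrt(6), 1/np.sqrt(3)]])
x = np.array([3.0, 4.0, 5.0])

print(P.T @ x)                             # approx [-0.7071, -1.2247, 6.9282]
print(np.allclose(P @ P.T, np.eye(3)))     # True: P is orthogonal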

We finish the chapter with a reprise of Exercise 2.32.

Exercise 4.12. Let x be a vector in Rn, let v1, . . . , vk, 1 ≤ k ≤ n, be an on list of vectors in Rn, and let y be defined by

y = (x · v1)v1 + · · · + (x · vk)vk.

Let a1, . . . , ak be numbers, and let

z = a1 v1 + · · · + ak vk.

Show that x − y and y − z are orthogonal, regardless of the values of a1, . . . , ak (hint: first show that (x − y) · vi = 0 for 1 ≤ i ≤ k). As in Exercise 2.32, deduce that ‖x − y‖ ≤ ‖x − z‖.

(To a grizzled ancient like me, such an exercise is ‘interesting’. ‘Interesting’ exercises can be difficult for first-timers, so don’t worry overmuch if you find it tricky!)

If k = 1 then Exercise 4.12 boils down to Exercise 2.32 (just set v = v1 and a = a1). It turns out that the vector y in Exercise 4.12 is the orthogonal projection of x onto the vector subspace generated by the vectors v1, . . . , vk, and is the closest vector in the subspace to x.

Orthogonal projections onto subspaces are connected to the idea of least


squares solutions of linear systems.
More about subspaces and orthogonal projections can be found in Ap-
pendix C.2.

Chapter 5

Eigenvalues and Eigenvectors of Matrices

5.1 Eigenvalues and eigenvectors


Eigenvalues occur naturally in many physical and engineering problems.
A striking example is the collapse of Tacoma Narrows Bridge (built in
1940, collapsed in 1940). It is thought that the frequency of the wind
was too close to the ‘natural frequency’ of the bridge (the frequency at
which the bridge oscillates itself), causing an amplification effect which
destroyed the bridge. The natural frequency of the bridge is an example of an eigenvalue. (The collapse can be seen e.g. on YouTube.)

Another example of the use of eigenvalues and eigenvectors, this time


in the context of high volumes of data, is Google’s PageRank algorithm,
used in its search engine. There is a special eigenvector, the entries of
which can be used to rank search results.

Examples and definitions


In this chapter, we are going to consider vectors as column vectors, so
that we can take advantage of matrix multiplication, as in Section 2.4.
 
Example 5.1. The matrix

A = [ 0 1 ]
    [ 1 0 ]

reflects points in the line y = x:

A(a, a)ᵀ = (a, a)ᵀ = 1 · (a, a)ᵀ   and   A(a, −a)ᵀ = (−a, a)ᵀ = (−1) · (a, −a)ᵀ

for any a ∈ R.

Figure 5.1: Reflection matrices. The point (2, −2) is mapped by A to (−2, 2).

Definition 5.2 (Eigenvalues and eigenvectors). Let A ∈ Mn (R) and let


x be a non-zero n × 1 column vector (so not all of the entries of x are
zero). Then x is called an eigenvector of A if

Ax = λx

where λ is some number.


In this situation, λ is called an eigenvalue of A, and λ and x are said to correspond to each other.

(David Hilbert (1862 – 1943) was arguably the first to use the German word eigen to denote eigenvalues and eigenvectors in 1904. Hilbert was a pivotal figure in 19th and 20th century mathematics. Image source: Wikipedia.)

Example 5.3. If A is the reflection matrix above then 1 is an eigenvalue of A, having corresponding eigenvector, say (1, 1)ᵀ. Also, −1 is an eigenvalue of A, having eigenvector, say (1, −1)ᵀ.

Remarks 5.4. Notice that eigenvectors are not unique. If x is an


eigenvector of A, then any non-zero scalar multiple of x is also an
eigenvector (corresponding to the same eigenvalue). Indeed, given a

number c, we have

A(cx) = c(Ax) = c(λx) = λ(cx).


Hence, provided c is non-zero, cx is also an eigenvector, having the same eigenvalue λ. (It is convention to use the Greek letter λ (or µ or ν) to denote eigenvalues.)

Warning 5.5. In this module, we have dealt exclusively with real num-
bers. Our matrices and vectors have been composed entirely of such
numbers. It turns out that some matrices, consisting entirely of real
numbers, can have eigenvalues that are complex numbers, that is,
numbers of the form a + ib, where a, b ∈ R and i is an imaginary
number satisfying i2 = −1. Moreover, in such cases, the correspond-
ing eigenvectors consist of complex numbers in general.
For example, it turns out that the rotation matrix Rθ from Example
1.2 (2) has eigenvalues cos θ ± i sin θ, which are not real numbers in
general (they are real only if θ is an integer multiple of π).
Hereafter, all the eigenvalues of the matrices considered in examples
will be real!

Seeking eigenvalues
Given a general matrix A ∈ Mn (R), how can we find its eigenvalues and
eigenvectors? The next theorem tells us how to find both. In practice, we
usually find the eigenvalues before finding the corresponding eigenvec-
tors.

Theorem 5.6. Let A ∈ Mn (R). Then λ is an eigenvalue of A if and


only if det(A − λIn ) = 0.

Proof. We are looking for non-zero n × 1 column vectors x and numbers


λ satisfying Ax = λx. This can be written in a slightly different way as

Ax − λx = 0,
where 0 denotes the n × 1 zero column vector. Hence,
0 = Ax − λx = Ax − λIn x by Theorem 1.36


= (A − λIn)x,

where A − λIn is an n × n matrix.

By Theorem 3.23, this matrix equation has a unique solution if A−λIn has
an inverse, and either no solutions or infinitely many solutions otherwise.
Notice that x = 0 is always a solution of this equation. Since we are
looking for non-zero solutions, we require A − λIn to be singular, i.e. not
invertible (if A − λIn were invertible, then the zero vector 0 would be the
only solution). If A − λIn is singular, then since 0 is always a solution,
we will have infinitely many solutions. In particular, we will have a non-
zero solution. Now A − λIn is singular if and only if det(A − λIn ) = 0, by
Theorem 1.56.

We put this to use on the matrix in Example 5.1.

Example 5.7. Determine the eigenvalues of A in Example 5.1.

Solution. We have

A − λI2 = [ 0 1 ] − [ λ 0 ] = [ −λ  1 ]
          [ 1 0 ]   [ 0 λ ]   [  1 −λ ].

Hence

det(A − λI2) = det [ −λ 1 ; 1 −λ ] = (−λ)(−λ) − 1 × 1 = λ² − 1.

So det(A − λI2) is a quadratic polynomial in λ. It factorises to give (λ − 1)(λ + 1), so det(A − λI2) = 0 if and only if λ = 1 or λ = −1.

5.2 The characteristic equation


Given a matrix A ∈ M2 (R), the expression det(A − λI2 ) will always be a
quadratic in λ. In general, if A ∈ Mn (R), the expression det(A − λIn ) will
be a polynomial of degree n in λ.

Definition 5.8. Let A ∈ Mn (R). The characteristic polynomial of A is


the determinant of the matrix A − λIn ∈ Mn (R). It is a polynomial of

degree n in λ. The equation

det(A − λIn ) = 0

is called the characteristic equation of A.

Thus, we can restate Theorem 5.6 as saying that the eigenvalues of A


are exactly the roots or the solutions of the characteristic equation of A.
Finding eigenvalues boils down to finding roots of polynomials.
In applications we often consider 3 × 3 matrices. The characteristic poly-
nomials of such matrices have degree 3, i.e. they are cubic polynomials.
By considering the quadratic formula (if necessary), it is easy to find the
roots of a quadratic polynomial. However, it is generally difficult to find
the roots of a polynomial of degree 3 and above.
In exercises and examples, we often look at polynomials having integer coefficients. These may not have integer roots at all, but if they do, the following proposition is very helpful and, in particular, can be applied when looking for eigenvalues of matrices having integer entries.

Proposition 5.9. Let

p(x) = an xⁿ + an−1 xⁿ⁻¹ + · · · + a1 x + a0,

be a polynomial having integer coefficients. If p(x) has any integer roots, then they must be factors of its constant term a0.

Proof. Assume that p(x) has an integer root r. Then

0 = p(r) = an rⁿ + an−1 rⁿ⁻¹ + · · · + a1 r + a0,

meaning

a0 = −(an rⁿ⁻¹ + an−1 rⁿ⁻² + · · · + a1)r.

Since the expression in parentheses is again an integer, we see that a0 is an integer multiple of r.

(The formulae for finding roots of cubic and quartic polynomials were discovered by Italian mathematicians in the 16th century. However, they rank among the most terrible things known to humanity (I dare you to look them up online if you haven’t seen them already). For this reason, we will not be making use of them in this module. Interestingly, the ‘cubic formula’ led to the conception of complex numbers. As it turns out, it is impossible for a ‘quintic formula’ to exist. This remarkable result was proved by the brilliant Norwegian mathematician Niels Henrik Abel (1802 – 1829). Abel died of tuberculosis aged only 26. Image source: Wikipedia.)

Example 5.10. Find the eigenvalues of

A = [ 5  6  2 ]
    [ 0 −1 −8 ]
    [ 1  0 −2 ].

Solution. The characteristic equation of A is

0 = det(A − λI3)

  = det [ 5−λ   6     2   ]
        [ 0    −1−λ  −8   ]
        [ 1     0    −2−λ ]

  = (5 − λ) det [ −1−λ  −8 ; 0  −2−λ ] + det [ 6  2 ; −1−λ  −8 ]

  = (5 − λ)(1 + λ)(2 + λ) + (−48 + 2(1 + λ))
  = (5 − λ)(λ² + 3λ + 2) + 2λ − 46
  = −λ³ + 2λ² + 15λ − 36.

The constant term of p(λ) = −λ³ + 2λ² + 15λ − 36 is −36. Thus, by Proposition 5.9, the only possible integer roots of p(λ) are

±1, ±2, ±3, ±4, ±6, ±9, ±12, ±18, and ±36.

We test some of these:

p(1) = −1 + 2 + 15 − 36 ≠ 0
p(−1) = 1 + 2 − 15 − 36 ≠ 0
p(2) = −8 + 8 + 30 − 36 ≠ 0
p(−2) = 8 + 8 − 30 − 36 ≠ 0
p(3) = −27 + 18 + 45 − 36 = 0.

Since 3 is a root of p(λ), we know that (3 − λ) is a factor of p(λ), and so we can find the other, quadratic factor:

−λ³ + 2λ² + 15λ − 36 = (3 − λ)(λ² + λ − 12).

This factorises further to

(3 − λ)(λ + 4)(λ − 3).

(We can find the roots of the quadratic factor easily, using the quadratic formula (MATH10040 Fact 1.19) if necessary.)

Thus the eigenvalues of A are λ1 = λ2 = 3 and λ3 = −4. Repeated roots (as we have here) are sometimes called degenerate.

Example 5.11. Find the eigenvalues of

A = [  8 −3 −3 ]
    [ −3  8 −3 ]
    [ −3 −3  8 ].

(This matrix arises out of considering how easily a cube will spin around a given axis; specifically, A is related to the inertia tensor of a cube.)

Solution. The characteristic equation of A is

0 = det(A − λI3)

  = det [ 8−λ  −3   −3  ]
        [ −3   8−λ  −3  ]
        [ −3   −3   8−λ ]

  = (8 − λ)[(8 − λ)² − 9] + 3[−3(8 − λ) − 9] − 3[9 + 3(8 − λ)]
  = (8 − λ)(λ² − 16λ + 55) + 6(3λ − 33)
  = −λ³ + 24λ² − 165λ + 242.

(Sometimes, as in this example, there are ‘smarter’ ways of evaluating such determinants using EROs, which lead to a factorisation more quickly and with fewer chances of errors in arithmetic. However, there was not enough time to present such methods in the module.)

The constant term of the resulting polynomial p(λ) is 242. After some testing as above, we find that p(2) = 0, hence λ − 2 is a factor. Upon further factorisation we are left with

0 = (11 − λ)(λ − 11)(λ − 2).

Thus, the three eigenvalues are

λ1 = λ2 = 11 and λ3 = 2,

(so λ1 and λ2 are degenerate).

5.3 Eigenvectors
We know now how to find eigenvalues. How do we find the correspond-
ing eigenvectors? The matrix equation (A − λIn )x = 0 in the proof of
Theorem 5.6 may be regarded as a system of linear equations in which
the coefficient matrix is A − λIn and the variables are the n entries of the
column vector x, which we can denote by x1 , . . . , xn – see Section 3.5.


Thus, to find eigenvectors, we are looking for non-zero solutions of

(A − λIn) [ x1 ]   [ 0 ]
          [ ⋮  ] = [ ⋮ ]
          [ xn ]   [ 0 ].

Example 5.12. We revisit Example 5.10. If

A = [ 5  6  2 ]
    [ 0 −1 −8 ]
    [ 1  0 −2 ],

then A has eigenvalues λ1 = λ2 = 3 and λ3 = −4. Find an eigenvector of A corresponding to the eigenvalue λ3 = −4.

Solution. We need a column vector x, having entries x1, x2, x3, not all zero, for which Ax = −4x or, equivalently

(A − (−4)I3)x = (A + 4I3)x = 0.

In other words,

[ 9 6  2 ] [ x1 ]   [ 0 ]
[ 0 3 −8 ] [ x2 ] = [ 0 ]
[ 1 0  2 ] [ x3 ]   [ 0 ].

We will use EROs to solve this system of linear equations. We know that the determinant of the matrix on the left is 0, so we will get either no solutions or a parametric solution, yielding infinitely many solutions. However, since the zero vector solves the system, we know therefore that we will get a parametric solution.

(This fact has a practical benefit when finding eigenvectors: if your linear system does not have a parametric solution, then you know that you have done something wrong! When finding eigenvectors, the number of parameters, i.e. the number of free variables, equals the number of zero rows in your RREF. Thus there must be at least one zero row in your RREF – if not then you have made a mistake.)

                    [ 9 3  2 | 0 ]   →   correction: start from
                    [ 9 6  2 | 0 ]
                    [ 0 3 −8 | 0 ]
                    [ 1 0  2 | 0 ]

   R1 ↔ R3          [ 1 0  2 | 0 ]
                    [ 0 3 −8 | 0 ]
                    [ 9 6  2 | 0 ]

   R3 → R3 − 9R1    [ 1 0   2 | 0 ]
                    [ 0 3  −8 | 0 ]
                    [ 0 6 −16 | 0 ]

   R3 → R3 − 2R2    [ 1 0  2 | 0 ]
                    [ 0 3 −8 | 0 ]
                    [ 0 0  0 | 0 ]

   R2 → (1/3)R2     [ 1 0    2 | 0 ]
                    [ 0 1 −8/3 | 0 ]
                    [ 0 0    0 | 0 ]

(In this sequence of EROs, the 4th column is zero and thus never changes. Hence, in future eigenvector examples, we suppress it (remembering that it is still there), to avoid writing unnecessary zeros. We can do this whenever the right hand side of a linear system is zero (but only in this case!).)

Thus x1 and x2 are leading variables and x3 is free: if x3 = t then x1 = −2t and x2 = (8/3)t, so

x = [ −2t    ]       [ −2  ]
    [ (8/3)t ] = t   [ 8/3 ]
    [  t     ]       [  1  ].

Any non-zero choice of t produces an eigenvector.

For instance, if t = 3 above then

x = [ −6 ]
    [  8 ]
    [  3 ]

will suffice, and you can verify that

[ 5  6  2 ] [ −6 ]   [  24 ]        [ −6 ]
[ 0 −1 −8 ] [  8 ] = [ −32 ] = −4 · [  8 ]
[ 1  0 −2 ] [  3 ]   [ −12 ]        [  3 ].
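The following numpy sketch (not part of the module) cross-checks Example 5.12; np.linalg.eig returns unit eigenvectors as the columns of its second output, so the result is rescaled to match the parametric solution above.

import numpy as np

A = np.array([[5.0, 6.0, 2.0],
              [0.0, -1.0, -8.0],
              [1.0, 0.0, -2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
i = np.argmin(eigenvalues)                       # pick out lambda_3 = -4
v = eigenvectors[:, i]
print(eigenvalues[i])                            # approximately -4
print(v / v[-1])                                 # rescaled: approximately (-2, 8/3, 1)
print(np.allclose(A @ v, eigenvalues[i] * v))    # True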


Example 5.13. We revisit Example 5.11. Find unit eigenvectors v1, v2 and v3 corresponding to the eigenvalues of

A = [  8 −3 −3 ]
    [ −3  8 −3 ]
    [ −3 −3  8 ],

in Example 5.11. Moreover, choose the eigenvectors so that they are mutually orthogonal, and thus form an on basis of R3.

(The eigenvectors we obtain happen to be parallel to the so-called principal axes of the inertia tensor of a cube.)

Solution. The eigenvalues are λ1 = λ2 = 11 and λ3 = 2. Being more straightforward, we find v3 (corresponding to λ3 = 2) first. The vector must satisfy

(A − 2I3)x = 0,  i.e.  [  6 −3 −3 ] [ x1 ]   [ 0 ]
                       [ −3  6 −3 ] [ x2 ] = [ 0 ]
                       [ −3 −3  6 ] [ x3 ]   [ 0 ].

Using EROs yields

[  6 −3 −3 ]     [  1  1 −2 ]     [ 1  1 −2 ]     [ 1 1 −2 ]     [ 1 0 −1 ]
[ −3  6 −3 ]  →  [  1 −2  1 ]  →  [ 0 −3  3 ]  →  [ 0 1 −1 ]  →  [ 0 1 −1 ]
[ −3 −3  6 ]     [ −2  1  1 ]     [ 0  3 −3 ]     [ 0 0  0 ]     [ 0 0  0 ],

(note here that we have suppressed the fourth columns of these matrices, which consist entirely of zeros and never change), so if x3 = t then x1 = x2 = t. It follows that

x = [ t ]       [ 1 ]
    [ t ]  = t  [ 1 ]
    [ t ]       [ 1 ].

We seek a value of t to produce v3, in such a way that ‖v3‖ = 1. Since ‖x‖ = |t|√3, we can choose t = 1/√3 or t = −1/√3. Either choice is valid: let’s pick t = 1/√3. Then

v3 = (1/√3) [ 1 ]
            [ 1 ]
            [ 1 ].
Eigenvectors v1 and v2 must both satisfy


    
−3 −3 −3 x1 0
(A − 11I3 )x = 0, i.e.  −3 −3 −3   x2  =  0  .
    

−3 −3 −3 x3 0

Straightaway, EROs yield


   
−3 −3 −3 1 1 1
 −3 −3 −3  →  0 0 0  ,
   

−3 −3 −3 0 0 0

so x1 is leading and x2 , x3 are free. Here, we need two parameters: if


x2 = s and x3 = t then x1 = −s − t. Consequently,
     
−s − t −1 −1
x =  s  = s 1  + t  0 .
     

t 0 1

We must make two sets of choices of s and t, in such a way that we


produce mutually orthogonal unit vectors v1 and v2 (that also have
to be orthogonal to v3 , but don’t worry about that for now. . . ). There
are infinitely many valid choices. Here are two sets of choices that
work. We begin with v1 . If s = − √12 and t = 0, we obtain the unit
vector  
1
v1 = √12  −1  .
 

© 2023 Richard Smith. For personal use only, not for circulation or sharing.
104 Eigenvalues and Eigenvectors of Matrices

Then we make a second set of choices to obtain v2 . We need s and


t so that
   
1 −s − t
0 = v1 · x = √12  −1  ·  s  = − √12 (2s + t).
   

0 t

Hence t = −2s, giving


   
s 1
x =  s  = s 1 .
   

−2s −2

Bearing in the mind the condition v2 = 1, we see that s = √1
6
or
s = − √16 . Either will do: let’s set s = √16 .
We have ensured that v1 and v2 are orthogonal unit vectors. What
about orthogonality to v3 ? As it happens, we can see by inspection
that v3 is orthogonal to both v1 and v2 , so we are done. 

5.4 Symmetric matrices and orthonormal bases of eigenvectors
As you can see, the vectors v1 , v2 , v3 are the same as the ones in Example
4.4 (2). In Example 5.13 above, it may have seemed suspiciously conve-
nient that the vector v3 just happened to be orthogonal to the other two
vectors we chose. In fact, this was no accident, and is an instance of a
more general phenomenon.
The matrix A in Example 5.13 is symmetric. As it turns out, the eigen-
values and eigenvectors of symmetric matrices happen to be very well
behaved. This is one of the reasons why symmetric matrices are so useful.
The final result of the chapter is very important. It is used, for example, in
Section 6.1 and Appendix B. Its proof is beyond the scope of the module.

Theorem 5.14. Let A ∈ Mn (R) be symmetric. Then the following


statements hold.

1. Every eigenvalue of A is a real number; we can list them in descending order as λ1, . . . , λn, so λ1 ≥ · · · ≥ λn (note that they need not necessarily be distinct).

2. Let λi and λj be distinct eigenvalues of A, and let x and y be


two eigenvectors of A that correspond to λi and λj , respectively.
Then x · y = 0.

3. Furthermore, there exists an orthogonal matrix P ∈ Mn (R), such that

a) Pe1, . . . , Pen is an on basis of Rn;

b) each Pei is an eigenvector of A having eigenvalue λi, and

c) the matrix product

PᵀAP = [ λ1  0   ⋯   0  ]
       [  0  λ2      0  ]
       [  ⋮       ⋱  ⋮  ]
       [  0  ⋯   0   λn ]

is a diagonal matrix (see Proposition 1.53 (3)), where the eigenvalues listed above run along the main diagonal.

Theorem 5.14 (2) explains the happy orthogonality ‘accident’ in the solu-
tion of Example 5.13. The vectors v1 and v2 corresponded to λ1 = λ2 = 11,
while v3 corresponded to λ3 = 2. Hence we were bound to have v1 · v3 =
v2 · v3 = 0, no matter what our choices of v1 and v2 were.
This helps in such examples: given a symmetric matrix, we don’t need
to worry about the orthogonality of eigenvectors having different eigen-
values. We still need to engineer orthogonality between eigenvectors
having the same eigenvalue, however, as in the case of v1 and v2 above.
Theorem 5.14 (3) will help enormously when we consider quadratic forms
in Section 6.1. There, we will see the utility of listing the eigenvalues in
descending order.

Example 5.15. Let A be the symmetric matrix in Examples 5.11 and 5.13, and let

P = [  1/√2  1/√6  1/√3 ]
    [ −1/√2  1/√6  1/√3 ]
    [  0    −2/√6  1/√3 ],

be the orthogonal matrix from Example 1.51. Verify that Pe1, Pe2 and Pe3 are the eigenvectors v1, v2 and v3 found in Example 5.13, and that

PᵀAP = [ 11  0  0 ]
       [  0 11  0 ]
       [  0  0  2 ],

is a diagonal matrix, with the eigenvalues λ1 ≥ λ2 ≥ λ3 of A found in Example 5.11 running along the main diagonal.
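For the curious, the statement of Theorem 5.14 can be checked numerically. The sketch below (not part of the module) uses numpy's eigh, which is designed for symmetric matrices; note that it lists the eigenvalues in ascending order, unlike the convention above.

import numpy as np

A = np.array([[ 8.0, -3.0, -3.0],
              [-3.0,  8.0, -3.0],
              [-3.0, -3.0,  8.0]])

eigenvalues, P = np.linalg.eigh(A)        # eigh is intended for symmetric matrices
print(eigenvalues)                        # [2, 11, 11] (ascending order)
print(np.allclose(P.T @ P, np.eye(3)))    # True: P is orthogonal
print(np.round(P.T @ A @ P, 10))          # diagonal matrix of eigenvalues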
Chapter 6

Matrices 2

6.1 Quadratic forms


Definition of quadratic forms
Quadratic forms are special types of function that map vectors x in Rn to numbers in R. They are closely associated with symmetric matrices and are used, for example, in Appendix B. Before giving their full definition, we look at some examples. (Quadratic forms have applications in calculus, number theory, geometry and statistics. They even managed to find their way into the theory behind wireless communications.)

Example 6.1.

1. The function q : R2 → R, given by

q(x) = x1² + x2²  where x = (x1, x2),

is a quadratic form. The function takes each vector x = (x1, x2) in R2 as input and returns as output the number x1² + x2², which happens to equal ‖x‖² in this case.

2. Fix numbers a, b, c ∈ R. In general, any function q : R2 → R of the form

q(x) = ax1² + bx2² + cx1x2  where x = (x1, x2),

is a quadratic form. In the first example above, we had a = b = 1 and c = 0. If a = 5, b = −3 and c = −4, then we obtain

q(x) = 5x1² − 3x2² − 4x1x2,


which is another quadratic form.

3. Now fix numbers a, b, c, d, e, f ∈ R. Any function q : R3 → R of the form

q(x) = ax1² + bx2² + cx3² + dx1x2 + ex2x3 + fx1x3,

where x = (x1, x2, x3), is a quadratic form, this time defined on R3. For instance

q(x) = 2x1² − 5x2² + x3² − 11x1x2 + 6x1x3,

yields a quadratic form (here, e = 0).

Figure 6.1: q(x) = 5x1² − 3x2² − 4x1x2 plotted as a 3D surface.

(Quadratic forms on R2, or indeed functions on R2 in general, can be plotted as 3D surfaces: the value of f(x1, x2) yields the height of the surface above the point (x1, x2). These days, plotting such things with computers is quite easy, and doing so helps to put flesh on them. There are some free graphing apps, such as Quick Graph for iOS, which enable the user to quickly sketch quadratic forms (and more general functions) on R2, and view them from different angles.)

Quadratic forms can be defined on Rn, for any positive integer n. Roughly speaking, if you define a function on Rn as a sum of multiples of terms of the form xi², 1 ≤ i ≤ n, or xixj, 1 ≤ i < j ≤ n, where x = (x1, . . . , xn), then you will obtain a quadratic form. Notice that all of the examples above follow that pattern. The function f : R2 → R defined by

f(x) = x1² − 4x1²x2,

is not a quadratic form, because the term x1²x2 does not fit into the pattern above.

Having that rough idea in mind, we gingerly approach the formal defini-
tion.

Definition 6.2 (Quadratic forms). A quadratic form is a function q : Rn → R of the form

q(x) = Σ_{i=1}^n ai xi² + Σ_{1≤i<j≤n} bij xi xj,

where x = (x1, . . . , xn) and where ai, 1 ≤ i ≤ n, and bij, 1 ≤ i < j ≤ n, are fixed numbers.

The expression ‘1 ≤ i < j ≤ n’ under the second summation sign means that you should sum over all possible combinations of i and j that obey that rule. So if n = 2, there is only one possible combination, namely i = 1, j = 2. If n = 3, there are three possible combinations: 1, 2; 2, 3; and 1, 3.

Example 6.3.

1. Let n = 2, a1 = 5, a2 = −3 and b12 = −4. Then

Σ_{i=1}^n ai xi² + Σ_{1≤i<j≤n} bij xi xj = 5x1² + (−3)x2² + (−4)x1x2,

which yields Example 6.1 (2).

2. If we let n = 3, a1 = 2, a2 = −5, a3 = 1, b12 = −11, b23 = 0 and b13 = 6, then we obtain Example 6.1 (3).

Quadratic forms and symmetric matrices


The following examples reveal the connection between quadratic forms
and symmetric matrices.

Example 6.4.

1. Let A = I2 . Given a vector x in R2 , treated as a column vector,


observe that

x · (Ax) = (x1, x2) · (x1, x2) = x1² + x2²  (since Ax = I2x = x),

which gives Example 6.1 (1).

2. Let A = [ 5 −2 ; −2 −3 ] and let x be a column vector in R2. Then

Ax = [  5 −2 ] [ x1 ]   [  5x1 − 2x2 ]
     [ −2 −3 ] [ x2 ] = [ −2x1 − 3x2 ],

so

x · (Ax) = x1(5x1 − 2x2) + x2(−2x1 − 3x2) = 5x1² − 3x2² − 4x1x2,

which is Example 6.1 (2).

3. Finally, let A = [ 2  −11/2  3 ; −11/2  −5  0 ; 3  0  1 ] and let x be a column vector in R3. Then

Ax = [  2     −11/2   3 ] [ x1 ]   [ 2x1 − (11/2)x2 + 3x3 ]
     [ −11/2   −5     0 ] [ x2 ] = [ −(11/2)x1 − 5x2      ]
     [  3       0     1 ] [ x3 ]   [ 3x1 + x3             ],

so

x · (Ax) = x1(2x1 − (11/2)x2 + 3x3) + x2(−(11/2)x1 − 5x2) + x3(3x1 + x3)
         = 2x1² − 5x2² + x3² − 11x1x2 + 6x1x3,

which is Example 6.1 (3).

Every quadratic form can be expressed succinctly in this way, using a symmetric matrix and the scalar product. We won’t prove the next result.

Theorem 6.5. Let q : Rn → R be a quadratic form. Then there exists


a unique symmetric matrix A ∈ Mn (R), such that

q(x) = x · (Ax).

Conversely, given a symmetric matrix A ∈ Mn (R), the function q de-


fined by the expression above is a quadratic form.

Given a quadratic form q with numbers ai, 1 ≤ i ≤ n, and bij, 1 ≤ i < j ≤ n, as in Definition 6.2, we can generate the associated symmetric matrix A by setting Aii = ai, 1 ≤ i ≤ n, and Aij = Aji = (1/2)bij, 1 ≤ i < j ≤ n. Note the division by 2 when we define the entries of A off the main diagonal – you can see this in action in Example 6.4 above.
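The recipe above is easy to automate. The following sketch (not part of the module) builds the symmetric matrix for Example 6.1 (3) and checks that q(x) = x · (Ax); the dictionaries a and b are simply my own way of storing the coefficients.

import numpy as np

a = {1: 2.0, 2: -5.0, 3: 1.0}                  # coefficients of x_i^2
b = {(1, 2): -11.0, (2, 3): 0.0, (1, 3): 6.0}  # coefficients of x_i x_j, i < j

n = 3
A = np.zeros((n, n))
for i, ai in a.items():
    A[i - 1, i - 1] = ai
for (i, j), bij in b.items():
    A[i - 1, j - 1] = A[j - 1, i - 1] = bij / 2   # note the division by 2

def q(x):
    return 2*x[0]**2 - 5*x[1]**2 + x[2]**2 - 11*x[0]*x[1] + 6*x[0]*x[2]

x = np.array([1.0, -2.0, 3.0])
print(np.isclose(q(x), x @ (A @ x)))          # True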
The next example of quadratic forms adopts some notions from calculus.

Example 6.6. Let f : Rn → R be a function having continuous second order partial derivatives. Then the Hessian matrix Hf(a) of f at the point (or vector) a in Rn is a symmetric matrix in Mn(R) which yields a corresponding quadratic form. For instance, if n = 2, f(x1, x2) = x1 sin x2 and a = (3, π/4), then

Hf(a) = [ ∂²f/∂x1²(3, π/4)     ∂²f/∂x1∂x2(3, π/4) ]   [ 0      1/√2  ]
        [ ∂²f/∂x2∂x1(3, π/4)   ∂²f/∂x2²(3, π/4)   ] = [ 1/√2  −3/√2  ],

which has corresponding quadratic form q(x) = −(3/√2)x2² + √2 x1x2. It is important to note that the quadratic form will generally vary with a: try different values of a above and see what happens.

Change of coordinates and diagonalisation


Recall Section 4.2. Let q : Rn → R be a quadratic form and let A be its
associated symmetric matrix, furnished by Theorem 6.5. It is possible to
change our coordinate system in such a way that, with respect to the
new system, our quadratic form will look simpler than it did originally.
Specifically, we will be able to rewrite it in such a way that the ‘cross
terms’, i.e. the terms of the form xi xj , i < j, will disappear. We call this
process diagonalisation of the quadratic form.


Let P be the orthogonal matrix kindly supplied to us by Theorem 5.14, part (3). We know that Pe1, . . . , Pen is an on basis of Rn, consisting of eigenvectors of A, and having corresponding eigenvalues λ1 ≥ · · · ≥ λn, listed in descending order. Let D = PᵀAP be the diagonal matrix having those eigenvalues along its main diagonal.

Given a column vector x in Rn, let us define a new column vector y = Pᵀx. Following the end of Section 4.2, after Example 4.10, we have that the ith entry yi of y is given by yi = y · ei = (Pᵀx) · ei = x · (Pei), which equals the ith coordinate of x, with respect to the new basis. In other words

x = y1(Pe1) + · · · + yn(Pen).

Moreover, since P is orthogonal, we have Py = PPᵀx = In x = x. Now we can piece all of this together to make something magical happen:

q(x) = x · (Ax) = (Py) · (APy)
     = y · (PᵀAPy)                       (Proposition 2.28)
     = y · (Dy)
     = (y1, . . . , yn) · (λ1y1, . . . , λnyn)
     = λ1y1² + λ2y2² + · · · + λnyn².

As promised, we have rewritten q(x) with respect to a new coordinate system, in such a way that all the cross terms have been removed. In this way we can gain a far better understanding of how quadratic forms behave.

Example 6.7. Diagonalise the quadratic form q : R3 → R given by

q(x) = 8x1² + 8x2² + 8x3² − 6x1x2 − 6x1x3 − 6x2x3.

Solution. Happily for us, the matrix A associated with the quadratic form is the same matrix as in Examples 5.11, 5.13 and 5.15. We collect the corresponding orthogonal matrix P from Example 5.15 and set y = Pᵀx. Bearing in mind the eigenvalues of A, we obtain

q(x) = 11y1² + 11y2² + 2y3².

This solution, while formally correct, seems rather terse and ill-mannered. In particular, it doesn’t really reveal what is happening, so let’s take a closer look. If we evaluate y we obtain

[ y1 ]          [ 1/√2  −1/√2   0    ] [ x1 ]   [ (1/√2)x1 − (1/√2)x2            ]
[ y2 ] = Pᵀx =  [ 1/√6   1/√6  −2/√6 ] [ x2 ] = [ (1/√6)x1 + (1/√6)x2 − (2/√6)x3 ]
[ y3 ]          [ 1/√3   1/√3   1/√3 ] [ x3 ]   [ (1/√3)x1 + (1/√3)x2 + (1/√3)x3 ].

Thus we have concrete expressions for the entries of y in terms of those of x. You should verify that, when these expressions are substituted into the diagonalised version of the quadratic form, the original version of it is recovered.
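A numerical version of this verification (not part of the module), using an arbitrary test vector of my own choosing:

import numpy as np

A = np.array([[ 8.0, -3.0, -3.0],
              [-3.0,  8.0, -3.0],
              [-3.0, -3.0,  8.0]])
eigenvalues, P = np.linalg.eigh(A)

x = np.array([1.0, 2.0, -1.0])              # an arbitrary test vector (my choice)
y = P.T @ x                                 # coordinates in the new system

q_original = x @ (A @ x)
q_diagonal = np.sum(eigenvalues * y**2)     # no cross terms in the new coordinates
print(np.isclose(q_original, q_diagonal))   # True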
We finish the section by giving one example of how diagonalisation can
help us understand quadratic forms, namely how to find the maximum
and minimum values of q(x), subject to the constraint that x is a unit
vector. Quite often this sort of problem is approached by considering the
method of Lagrange multipliers, but we do not need such things in this
particular case.

Proposition 6.8. Let q : Rn → R be a quadratic form, and let λ1 ≥ · · · ≥ λn be as above. The maximum and minimum values that q(x) can take, subject to the constraint ‖x‖ = 1, equal λ1 and λn, respectively.

Proof. With y = Pᵀx as above, observe that

y1² + · · · + yn² = ‖y‖² = ‖x‖²,

by Proposition 2.29. Hence

q(x) = λ1y1² + · · · + λnyn²
     ≤ λ1(y1² + · · · + yn²)      (as λ1 ≥ · · · ≥ λn)
     = λ1‖x‖² = λ1,

whenever ‖x‖ = 1. Likewise, q(x) ≥ λn whenever ‖x‖ = 1. Lastly, if we set x = Pe1, then ‖x‖ = 1 by Proposition 2.29 and y = Pᵀx = e1, thus

q(x) = λ1·1 + λ2·0 + · · · + λn·0 = λ1.

Likewise, setting x = Pen yields q(x) = λn.

Example 6.9. The maximum and minimum values that the quadratic form q(x) in Example 6.7 can take, subject to the condition ‖x‖ = 1, are 11 and 2, respectively. Verify that these values are attained at

Pe1 = [  1/√2 ]             [ 1/√3 ]
      [ −1/√2 ]  and  Pe3 = [ 1/√3 ]
      [  0    ]             [ 1/√3 ],

respectively (you can obtain the maximum at Pe2 as well).

(Note that if you are asked to find these maximum and minimum values of a quadratic form, and nothing else, then all you need to do is find the eigenvalues of the associated symmetric matrix and pick out the largest and smallest ones.)

Definition 6.10. Let A ∈ Mn (R) be symmetric. Then A is called pos-


itive definite or negative definite if its eigenvalues are all strictly
positive or all strictly negative, respectively.

In view of Proposition 6.8, A ∈ Mn (R) is positive definite if and only if the associated quadratic form x · (Ax) is strictly positive for all non-zero x in Rn. Indeed, A is positive definite if and only if λn > 0. Given non-zero x, we have

x · (Ax) = ‖x‖² x̂ · (Ax̂) ≥ ‖x‖² λn,

where x̂ = x/‖x‖ is the associated unit vector. Given that ‖x‖² is strictly positive whenever x is non-zero, it follows that x · (Ax) is always strictly positive for non-zero x if and only if λn > 0. A similar argument, using λ1 this time, shows that A is negative definite if and only if x · (Ax) < 0 for all non-zero x.
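In computational practice, definiteness is checked exactly as this discussion suggests: compute the eigenvalues and look at their signs. A small sketch (not part of the module):

import numpy as np

def definiteness(A):
    eigenvalues = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
    if np.all(eigenvalues > 0):
        return "positive definite"
    if np.all(eigenvalues < 0):
        return "negative definite"
    return "neither"

print(definiteness(np.array([[5.0, -2.0], [-2.0, -3.0]])))   # Example 6.4 (2): neither
print(definiteness(np.eye(2)))                               # positive definite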

Exercise 6.11. Establish whether or not the matrices in Example 6.4


(2) and (3) are positive definite, negative definite, or neither.

6.2 Matrix norms


Lengths of vectors were covered in Section 2.2. Beside Definition 2.14
is a mysterious marginal note concerning norms of vectors. A norm is
a more general notion than the length of a vector, that can be applied
equally to vectors and to things that don’t look very much like vectors at
all (but which are in fact vectors, depending on the linear algebra course
you are taking).
It so happens that matrices have norms. In fact there are many different
ways in which one can impose a norm on a matrix. In our brief treatment,
we will look at just one.
Suppose that we fix a matrix A ∈ Mn (R). Let’s consider a unit vector x in Rn, that is, ‖x‖ = 1. If we let A act on x, as in Section 2.4, we obtain a new vector Ax, and we can measure its length to get ‖Ax‖. Now imagine that we do this for all unit vectors x in Rn. The norm of A is defined to be the largest value of ‖Ax‖, subject to the constraint that ‖x‖ = 1.

Definition 6.12 (Matrix norms). Let A ∈ Mn (R). The norm of A, written ‖A‖, is defined to be the maximum of ‖Ax‖, subject to the constraint that ‖x‖ = 1.

At first glance, it may not be obvious that there should be such a maximum number at all. But there is, courtesy of a particular quadratic form. If we square ‖Ax‖, we obtain

‖Ax‖² = (Ax) · (Ax) = x · (AᵀAx),

by Fact 2.20 (2) and Proposition 2.28. This defines a quadratic form, because AᵀA is symmetric, no matter what the original matrix A is up to! Indeed, (AᵀA)ᵀ = AᵀAᵀᵀ = AᵀA, using Fact 1.23 (7), and the fact that Aᵀᵀ = A for any matrix.

Therefore, by Proposition 6.8, the maximum value of ‖Ax‖², subject to the constraint that ‖x‖ = 1, is equal to λ, where λ is the largest eigenvalue of the symmetric matrix AᵀA. Taking positive square roots of both sides now yields

‖A‖ = √λ.

Note that λ is non-negative (so we can take its square root legitimately), because it is the maximum value of ‖Ax‖², which can never be negative.


Example 6.13. Compute ‖A‖, where A = [ 1 1 ; 0 1 ].

Solution. We see that

Aᵀ = [ 1 0 ]      and      AᵀA = [ 1 1 ]
     [ 1 1 ]                     [ 1 2 ].

Now det(AᵀA − λI2) = (1 − λ)(2 − λ) − 1 = λ² − 3λ + 1, which has roots λ = 3/2 ± (1/2)√5. Therefore the larger of the two eigenvalues of AᵀA is λ = 3/2 + (1/2)√5, giving ‖A‖ = √(3/2 + (1/2)√5).
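For the curious, numpy computes this norm directly (it is the so-called 2-norm, or largest singular value); the sketch below (not part of the module) also recomputes it from the largest eigenvalue of AᵀA.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

lam = np.max(np.linalg.eigvalsh(A.T @ A))   # largest eigenvalue of A^T A
print(np.sqrt(lam))                         # approximately 1.618
print(np.linalg.norm(A, 2))                 # the same value, computed by numpy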
Appendix A

Discussion board and WeBWorK guides

A.1 How to use the Moodle discussion boards


Writing mathematics on the discussion boards
Evidently, written mathematics uses a host of symbols and notation that
is not available in ordinary word processing software. These days, the
majority of mathematicians use LaTeX (or its predecessor TeX) to typeset
mathematical documents. All of the material in this module was typeset
using LaTeX. However, learning LaTeX takes some time, and it was not
designed for direct use on the web. Instead, the more recent MathJax
system enables mathematics to be written directly into web pages using
simple LaTeX expressions.
The Moodle site is supported by MathJax, so users can write simple
mathematical expressions when posting to forums. We encourage users
to use MathJax when posting mathematical queries; the results look good
and clear, and only a minimal knowledge of LaTeX is required.
For the remainder of this section, we cover the basics of how to post sim-
ple mathematical expressions using MathJax. Note that when MathJax
is used on the discussion boards, typically it takes a few seconds for
mathematical expressions to be rendered properly!

1. Enclose mathematical notation using dollar signs


All mathematical notation should be enclosed by a pair of dollar
signs. For example, if you want to post the query ‘I don’t understand
why a⁴ = 3 in that example.’, you should write
I don’t understand why $aˆ4 = 3$ in that example.


If you want to post a mathematical expression on a separate line,


enclose it using a pair of double dollar signs. For example, writing

$$aˆ4 = 3$$

renders the expression


a⁴ = 3
on a separate line, as above.

2. Arithmetic and fractions


Expressions such as ‘x + y = 5’ or ‘x − 2y = 3.6’ can rendered by
writing
$x + y = 5$ or $x - 2y = 3.6$,
respectively. You can write fractions either by using the division
sign or by using the \frac command, together with two pairs of
curly braces { and }. For instance,

$3/5$ and $\frac{3}{5}$

yield 3/5 and ³⁄₅ (a built-up fraction), respectively.

3. Exponents/superscripts and subscripts


The characters ˆ and _ are used to render exponents/superscripts
and subscripts, respectively. For example,

$xˆ5 + 3xˆ2 + 10 = 0$ and $x_1 + x_3 = 4$

yield x⁵ + 3x² + 10 = 0 and x1 + x3 = 4, respectively.

4. Surds
Expressions like √2 and ⁵√11 can be obtained by writing $\sqrt{2}$
or $\sqrt[5]{11}$, respectively (note the use of square brackets
as well as curly ones in the second example).

5. Use pairs of curly braces to nest expressions


For instance,
$xˆ{\frac{1}{2}} = eˆ{yˆ2}$
produces x^(1/2) = e^(y²).

6. Standard functions
The commands $\sin$, $\cos$, $\tan$ and $\log$ produce the
standard functions sin, cos, tan and log. For example,
$\cos(x) = \frac{\sqrt{3}}{2}$

produces cos(x) = √3/2.
7. Summation and integration
Use the commands $\sum$ and $\int$, together with ˆ and _ and
curly braces, to write expressions involving summation and integra-
tion. For instance,
$\sum_{k=1}ˆn k = \frac{1}{2}n(n+1)$

produces Σ_{k=1}^n k = (1/2)n(n + 1), and

$\int_1ˆ2 xˆ2 dx = \frac{7}{3}$

gives ∫₁² x² dx = 7/3.
8. Greek characters and special symbols
The Greek letters α, β, θ and π etc can be expressed using $\alpha$,
$\beta$, $\theta$ and $\pi$, respectively. Symbols such as R and
≈ require $\mathbb{R}$ and $\approx$, respectively.
9. Matrices
Alas, there is no quick way to write down matrices properly using
MathJax, because doing so requires a so-called ‘LaTeX environ-
ment’.
To begin, type \begin{pmatrix}. Then type in the entries of the
first row, separating each one by an ampersand & character. When
you reach end of the first row, type \\. Add the entries of the
second row as above, and repeat until you have reached the end
of the final row. To finish, type \end{pmatrix} (you do not need to
add \\ at the end of the final row).
Perhaps an example explains all of this best. Typing
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$
will produce

[ 1 2 ]
[ 3 4 ].


The best way to learn this stuff is through practice and experimenta-
tion. You can do so by using this Live Demo. The examples above can
be adapted and combined in all sorts of ways to produce expressions
of greater complexity (though it should not be necessary to write enor-
mously complicated expressions!).

Be mindful when using the curly braces { and }. MathJax will com-
plain with error messages, or will not render your expression properly, if
they are missing or are in the wrong place. Every opening { requires a
corresponding closing } (correctly placed).

Finally, we repeat that when MathJax is used, typically it takes a few


seconds for mathematical expressions to be rendered properly. Also, the
system is not perfect. Sometimes it can stop working for reasons that
are inexplicable and beyond our control!

A.2 How to use WeBWorK


Submitting WeBWorK solutions
WeBWorK solutions must be entered entirely online. Many solutions are
numerical in nature, in which case entering a numerical solution usually
suffices. Occasionally, it may be necessary to enter more complicated
types of solutions, such as a polynomial like x 2 + 3. Such solutions
require a syntax that is similar to, but not the same as, MathJax above
(unfortunately, we are not able to do anything about this). Advice on this
syntex is given below.

If a given answer is not an integer (e.g. a fraction or irrational number like √2), you can enter it either symbolically or by using a decimal expansion. For example, you can enter the fraction 2/3 either as 2/3 or as 0.666667.
In the latter case, give your answer to at least 6 significant digits so
that WeBWorK does not misinterpret it. The number of digits provided
by most calculator displays should be sufficient.

Below are some examples of types of expressions that may come up in


WeBWorK for this module (and maybe other ones), together with typical
examples and how to enter them.

If you are in any doubt about how WeBWorK is going to interpret your
answer, press the ‘Preview Answers’ button.

Expression type          Example              Enter into WeBWorK as
Fractions                3/4                  3/4 or 0.75
Powers, Exponents        n⁵, 4ᵏ, (−7)ⁿ        n^5, 4^k, (-7)^n (not -7^n)
Polynomials              2x² + 15x − 4        2x^2 + 15x - 4 or 2*x^2 + 15*x - 4
Trig functions           cos x                cos(x)
Exponential functions    eˣ                   e^x

WeBWorK will also interpret e.g. pi as π, pi/4 as π/4, cos(pi/4) as cos(π/4), log(2) as log 2, e^4 as e⁴, and so on. See

http://webwork.maa.org/wiki/Available_Functions

for other ways of entering answers. Also, sometimes you have to be


careful when using brackets ( and ). Just as entering things in a cal-
culator in a different order will produce different answers, so WeBWorK
will interpret things in a different order, depending on where you put the
brackets.

Notes on specific MATH10390 WeBWorK questions


Set 1, question 3
Omit the outer quotation marks (see advice given in the question) from
your solutions.

Set 2, questions 8–10


You should enter vectors using chevrons or ‘angle brackets’ < and >, e.g.
<1, 2, 3>.

Set 3, question 10
The definition of row-echelon form in the notes (Definition 3.8) does not
agree with the one used here. Here, the first non-zero entry of a row
does not have to be equal to 1. Such an entry is called a ‘leading entry’,
instead of a ‘leading 1’. In Definition 3.8 (2) and (3), replace ‘leading 1’
by ‘leading entry’.


The definition of reduced row-echelon form is the same as the one given
in the notes. In particular, all leading entries must be leading 1s.

Set 4, questions 1, 2 and 7


You’re asked to solve a system of linear equations such as

x1 + x2 − 3x3 = −9
4x1 + 3x2 + 2x3 = −5,

(the numbers of equations and variables will vary from question to ques-
tion). In each case, the format of the answer suggests that you will find
infinitely many solutions involving a single parameter, denoted by s.
So suppose that your general solution to the system above is

x1 = 3 + 2s
x2 = 2 − 7s
x3 = s

(this is incorrect!). If this is your solution, then input


Appendix B

Principal component analysis


(non-examinable)

Consider a situation in which we have a sample of m objects, each with


n characteristics or variables that we have measured. We can represent
the n measurements of the ith object as a column vector
 
xi = [ xi1 ]
     [ xi2 ]
     [  ⋮  ]
     [ xin ],

in Rn . Such a vector is called a data point. The number xij , where


1 ≤ i ≤ m and 1 ≤ j ≤ n, represents the measurement of the jth
characteristic or variable of the ith object. The number of variables we
are measuring equals the dimension of the space these data points live
in (see comments after Example 2.2).

Visualising or trying to extract patterns and meaning from data in Rn can be very difficult or impossible if the number of dimensions n is high. Principal Component Analysis (hereafter abbreviated to PCA) is a method of ‘projecting’ this high-dimensional data onto a lower-dimensional ‘subspace’ of Rn, in a way that is designed to give the user some information
about the sample that can be more easily interpreted, given the smaller
number of dimensions. See Appendix C.2 for more details about sub-
spaces of Rn , dimension and projections onto subspaces.


The first principal component


The principal components of the above sample data turn out to be an on
basis of Rn (see Chapter 4). We can give a complete description of how
they can be found. In doing so, we motivate the process and interpret
the meaning of the first principal component, in terms of maximising a
certain variance. Interpreting the meaning of the subsequent principal
components is a little trickier, because we didn’t cover the mathematics
needed for this in the module, but we give a brief sketch towards the end nonetheless.
The first thing we shall do is mean centre the data, so that the ensuing
computations are simplified. Let

x̃ = (1/m)(x1 + · · · + xm) = (1/m) Σ_{i=1}^m xi

be the sample mean vector of the data points x1 , . . . , xm . We mean centre


the data by considering new vectors yi = xi − x̃, 1 ≤ i ≤ m. Now the sample mean vector of the new data points y1, . . . , ym is

(1/m) Σ_{i=1}^m yi = (1/m) Σ_{i=1}^m (xi − x̃) = (1/m) Σ_{i=1}^m xi − (1/m) Σ_{i=1}^m x̃ = x̃ − x̃ = 0.

This has the effect of simplifying the subsequent calculations.


Recall Section 2.5 on orthogonal projections. In PCA, we begin by seek-
ing a unit vector v in such a way that, when we project each yi or-
thogonally onto the line L through the origin, parallel to v as above, the
variance of the projected data points is maximised.
Let v be a unit vector. From above, the orthogonal projection of yi onto
L equals (yi · v)v. The sample mean vector of the orthogonal projections
is also 0:
(1/m) Σ_{i=1}^m (yi · v)v = ( ((1/m) Σ_{i=1}^m yi) · v ) v = (0 · v)v = 0.

Consequently, the (unbiased sample) variance of the orthogonal projec-


tions is simply

(1/(m − 1)) Σ_{i=1}^m (yi · v)².     (B.1)

By multiplying by m − 1, maximising (B.1) is equivalent to maximising
the quantity
\[
\sum_{i=1}^{m} (y_i \cdot v)^2, \tag{B.2}
\]
over all possible choices of unit vector v.


It turns out that (B.2) is a quadratic form q(v) (Section 6.1). We will
find its maximum, subject to the condition that ‖v‖ = 1, by appealing to
Proposition 6.8.
Let the jth entry of the vector yi be denoted yij. Consider the quantities
\[
C_{jk} \;=\; \sum_{i=1}^{m} y_{ij}\, y_{ik}, \qquad 1 \le j, k \le n,
\]
and the matrix C ∈ Mn(R) having Cjk as its (j, k)-entry. Notice that C is
symmetric: Cjk = Ckj for all j and k.
The matrix C is known as the corrected sum of squares and products
(SSP) matrix of the original data points xi (remember that we mean
centred them by considering yi).
We observe that C equals the sum of matrix products \(\sum_{i=1}^{m} y_i\, y_i^T\). Indeed,
yi and \(y_i^T\) are n × 1 and 1 × n matrices, respectively, so \(y_i\, y_i^T \in M_n(\mathbb{R})\).
Moreover, observe that the (j, 1)-entry of yi, considered as a n × 1 matrix,
is equal to the jth entry of yi, i.e.
\[
(y_i)_{j1} \;=\; y_{ij}.
\]
Having this in mind, we see that
\[
(y_i\, y_i^T)_{jk} \;=\; (y_i)_{j1}\, (y_i^T)_{1k} \;=\; y_{ij}\, y_{ik},
\]
hence
\[
\Bigl(\sum_{i=1}^{m} y_i\, y_i^T\Bigr)_{jk} \;=\; \sum_{i=1}^{m} y_{ij}\, y_{ik} \;=\; C_{jk},
\]
for 1 ≤ j, k ≤ n.
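As an aside, the SSP matrix is easy to assemble numerically. In the sketch below (the same small made-up sample as in the earlier snippet), C is formed both as a sum of outer products and as the single matrix product YᵀY; the two agree, and C is symmetric:

    import numpy as np

    # Y: mean-centred data points as rows (same made-up sample as before).
    X = np.array([[2., 1., 0.], [1., 3., 1.], [0., 2., 4.], [1., 2., 3.]])
    Y = X - X.mean(axis=0)

    C = sum(np.outer(y, y) for y in Y)   # C = y_1 y_1^T + ... + y_m y_m^T
    print(np.allclose(C, Y.T @ Y))       # True: the same matrix as the single product Y^T Y
    print(np.allclose(C, C.T))           # True: C is symmetric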
Also, observe that, given two n × 1 column vectors a and b, we see that
a · b equals the single entry in the identical 1 × 1 matrices \(a^T b\) and \(b^T a\).
We abuse notation slightly and write \(a \cdot b = a^T b = b^T a\). With this in
mind,
\[
\sum_{i=1}^{m} (y_i \cdot v)^2 \;=\; \sum_{i=1}^{m} (v^T y_i)(y_i^T v)
\;=\; \sum_{i=1}^{m} v^T (y_i\, y_i^T)\, v
\;=\; v^T \Bigl(\sum_{i=1}^{m} y_i\, y_i^T\Bigr) v \;=\; v^T C\, v \;=\; v \cdot (C v), \tag{B.3}
\]

using Fact 1.23. This yields our quadratic form q, by Theorem 6.5.
Since C is symmetric, by Theorem 5.14, there is an orthogonal matrix P ∈
Mn (R), such that the on basis Pe1 , . . . , Pen of Rn consists of eigenvectors
of C , having corresponding eigenvalues λ1 , . . . , λn , listed in descending
order. Moreover, the product \(P^T C P\) is the diagonal matrix
\[
\begin{pmatrix}
\lambda_1 & 0 & \dots & 0 \\
0 & \lambda_2 & & 0 \\
\vdots & & \ddots & \vdots \\
0 & \dots & 0 & \lambda_n
\end{pmatrix}.
\]
Set vj = Pej, for 1 ≤ j ≤ n. According to Proposition 6.8, the maximum
and minimum values of q are q(v1) = λ1 and q(vn) = λn, respectively.
The vector v1 is said to be the first principal component. We maximise
(B.2) by setting v equal to the first principal component v1.
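A possible numerical route to the first principal component, assuming the SSP matrix C from the sketches above: numpy.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order, so the last column of P is an eigenvector for the largest eigenvalue λ1 (eigenvectors are only determined up to sign).

    import numpy as np

    X = np.array([[2., 1., 0.], [1., 3., 1.], [0., 2., 4.], [1., 2., 3.]])
    Y = X - X.mean(axis=0)
    C = Y.T @ Y                          # SSP matrix of the made-up sample

    eigenvalues, P = np.linalg.eigh(C)   # eigenvalues ascending; columns of P orthonormal
    v1 = P[:, -1]                        # first principal component
    lambda1 = eigenvalues[-1]

    # The quadratic form q(v) = v . (C v) attains its maximum lambda_1 over unit vectors at v1.
    print(np.isclose(v1 @ C @ v1, lambda1))   # True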

Subsequent principal components


In general, the ith eigenvector vi, 1 ≤ i ≤ n, is called the ith principal
component of the sample data. It is a little harder to explain the meaning
of these, for i ≥ 2, without having a proper discussion of so-called sub-
spaces of Rn and orthogonal projections onto subspaces – these concepts
are only very briefly alluded to at the end of Chapter 4.
In lieu of said discussion, we give a brief and somewhat sketchy inter-
pretation. Consider the vectors wi = yi − (yi · v1)v1, 1 ≤ i ≤ m. It is simple
to check that all of the vectors wi are orthogonal to v1 (indeed, this is by
design – see Section 2.5). Now imagine that we wanted to repeat the
process above of finding a vector v that maximises the variance as in (B.1)
(equivalently, maximises (B.2)), but where the yi have been replaced by
the wi. It turns out that v2 maximises said quantity.
Then consider a further set of vectors

wi − (wi · v2 )v2 ,

1 ≤ i ≤ m. With a little vector algebra, and noting that v1 · v2 = 0, we
can show that these new vectors equal

yi − (yi · v1)v1 − (yi · v2)v2,

1 ≤ i ≤ m. It is straightforward to check that all of these vectors are


orthogonal to both v1 and v2 . Repeating the process of maximisation once
more with these vectors will yield v3 . And so it goes on. We obtain v4
after repeating the process with

yi − (yi · v1 )v1 − (yi · v2 )v2 − (yi · v3 )v3 ,

1 ≤ i ≤ m, which are all orthogonal to v1, v2 and v3. At each stage, we
are maximising a certain variance with respect to a set of vectors that
are orthogonal to all of the principal components found up to that point.
We run out of things to do after the nth stage, because

yi − (yi · v1)v1 − (yi · v2)v2 − · · · − (yi · vn)vn = 0,

1 ≤ i ≤ m, as v1, . . . , vn is an on basis (recall Theorem 4.6 (2))!
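In practice the deflation procedure described above is not carried out step by step: all of the principal components can be read off at once from the eigenvectors of C (or from a singular value decomposition of Y). A rough sketch under the same assumptions as the earlier snippets, keeping an arbitrary number k of components:

    import numpy as np

    X = np.array([[2., 1., 0.], [1., 3., 1.], [0., 2., 4.], [1., 2., 3.]])
    Y = X - X.mean(axis=0)                   # rows are the mean-centred data points y_i
    C = Y.T @ Y

    eigenvalues, P = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]    # indices of the eigenvalues in descending order
    components = P[:, order]                 # column j is the (j+1)th principal component (up to sign)

    k = 2                                    # project onto the first k principal components
    scores = Y @ components[:, :k]           # coordinates (y_i . v_1, ..., y_i . v_k) of each point
    print(scores.shape)                      # (4, 2): the lower-dimensional representation of the sample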

Appendix C

Additional material (non-examinable)

C.1 Additional proofs


In this module, procedure is emphasised over theory. In other words, we
focus on presenting methods of doing things, rather than looking at why
those methods work in the first place. Those of you who want to delve
more deeply into why the mathematics in this module works are invited
to look at this section.

Missing proofs from Chapter 1


In the proof of Fact 1.23, you will see how we have to open up the matrices
involved, unravel things far enough to be able to manipulate things in the
way we require, and then close everything back up again.
Throughout the proof, we assume as given the standard laws of arithmetic
of ordinary numbers. In a couple of cases, namely (1) and (2) below, we
point out exactly which laws of arithmetic are being used, but do not do
so thereafter.

Proof of Fact 1.23.

1. Let A and B be m × n matrices. Let 1 ≤ i ≤ m and 1 ≤ j ≤ n. We


have

(A + B)ij = Aij + Bij Definition 1.5


= Bij + Aij commutative law of addition of numbers
= (B + A)ij . Definition 1.5


In other words, the (i, j)-entry of A + B equals the (i, j)-entry of


B + A. Since the choices of i and j were arbitrary, it follows that
every entry of A + B equals its corresponding entry of B + A. This
means that A + B = B + A, according to Remarks 1.3 (5).
2. The proof is quite similar to (1). Let A, B and C be m × n matrices.
Let 1 ≤ i ≤ m and 1 ≤ j ≤ n. We have

((A + B) + C )ij = (A + B)ij + Cij Definition 1.5


= (Aij + Bij ) + Cij Definition 1.5
= Aij + (Bij + Cij )
associative law of addition of numbers
= Aij + (B + C )ij Definition 1.5
= (A + (B + C ))ij . Definition 1.5

In other words, the (i, j)-entry of (A + B) + C equals the (i, j)-entry


of A + (B + C). As above, since the choices of i and j were arbitrary,
it follows that A + (B + C ) = (A + B) + C .
4. Let A, B and C be m × r, r × p and p × n matrices, respectively.
It follows that AB is a m × p matrix, so (AB)C is a m × n matrix.
Likewise, BC is a r × n matrix, so A(BC ) is a m × n matrix as well.
Hence (AB)C and A(BC ) have the same size. It remains to show
that the corresponding matrix entries are equal. Given 1 ≤ i ≤ m
and 1 ≤ j ≤ n, we have
\[
\begin{aligned}
((AB)C)_{ij} &= \sum_{k=1}^{p} (AB)_{ik}\, C_{kj} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} \Bigl(\sum_{\ell=1}^{r} A_{i\ell}\, B_{\ell k}\Bigr) C_{kj} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} \sum_{\ell=1}^{r} A_{i\ell}\, B_{\ell k}\, C_{kj}. && \text{(C.1)}
\end{aligned}
\]
On the other hand,
\[
\begin{aligned}
(A(BC))_{ij} &= \sum_{\ell=1}^{r} A_{i\ell}\, (BC)_{\ell j} && \text{Definition 1.10} \\
&= \sum_{\ell=1}^{r} A_{i\ell} \Bigl(\sum_{k=1}^{p} B_{\ell k}\, C_{kj}\Bigr) && \text{Definition 1.10} \\
&= \sum_{\ell=1}^{r} \sum_{k=1}^{p} A_{i\ell}\, B_{\ell k}\, C_{kj}. && \text{(C.2)}
\end{aligned}
\]

The point is that the double sums in lines (C.1) and (C.2) are equal!
The only difference between them is the order of summation: in (C.1)
we sum over ` first, and then over k, whereas in (C.2) we sum over k
first, and then `. In summary, ((AB)C )ij = (A(BC ))ij . As above, since
i and j were chosen arbitrarily, it follows that (AB)C = A(BC ).
5. Let A and B be m × p matrices and let C be a p × n matrix. Then
A + B is a m × p matrix and (A + B)C is a m × n matrix. Likewise,
AC , BC and AC + BC are m × n matrices. We must check that the
corresponding entries of (A + B)C and AC + BC are equal. Given
1 ≤ i ≤ m and 1 ≤ j ≤ n, we have
\[
\begin{aligned}
((A + B)C)_{ij} &= \sum_{k=1}^{p} (A + B)_{ik}\, C_{kj} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} (A_{ik} + B_{ik})\, C_{kj} && \text{Definition 1.5} \\
&= \sum_{k=1}^{p} \bigl(A_{ik} C_{kj} + B_{ik} C_{kj}\bigr) \\
&= \sum_{k=1}^{p} A_{ik} C_{kj} + \sum_{k=1}^{p} B_{ik} C_{kj} \\
&= (AC)_{ij} + (BC)_{ij} && \text{Definition 1.10} \\
&= (AC + BC)_{ij}. && \text{Definition 1.5}
\end{aligned}
\]
As above, this tells us that (A + B)C = AC + BC. The other equality
is covered in almost the same way.
6. Exercise!
7. Let A and B be m × p and p × n matrices, respectively. Then AB is
a m × n matrix, so (AB)T is a n × m matrix. Moreover, BT and AT
are n × p and p × m matrices, respectively, so BT AT is also a n × m
matrix. Thus (AB)T and BT AT have the same size. Now we check
that their respective entries agree. Given 1 ≤ i ≤ n and 1 ≤ j ≤ m,
we have
\[
\begin{aligned}
((AB)^T)_{ij} &= (AB)_{ji} && \text{Definition 1.19} \\
&= \sum_{k=1}^{p} A_{jk}\, B_{ki} && \text{Definition 1.10} \\
&= \sum_{k=1}^{p} (B^T)_{ik}\, (A^T)_{kj} && \text{Definition 1.19} \\
&= (B^T A^T)_{ij}. && \text{Definition 1.10}
\end{aligned}
\]
This shows that the corresponding matrix entries of (AB)T and BT AT
agree, hence (AB)T = BT AT .
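The proofs above are the real justification, but the identities of Fact 1.23 can also be spot-checked numerically on randomly generated matrices. A small sketch (the matrix sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 5))
    C = rng.standard_normal((5, 2))
    D = rng.standard_normal((3, 4))

    print(np.allclose((A @ B) @ C, A @ (B @ C)))      # associativity, Fact 1.23 (4)
    print(np.allclose((A + D) @ B, A @ B + D @ B))    # distributivity, Fact 1.23 (5)
    print(np.allclose((A @ B).T, B.T @ A.T))          # transpose of a product, Fact 1.23 (7)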

Proof of Theorem 1.36. Let A be a m × n matrix. Then AIn is again a
m × n matrix. As above, it remains to check that their respective entries
agree. Given 1 ≤ i ≤ m and 1 ≤ j ≤ n, we have
\[
(AI_n)_{ij} \;=\; \sum_{k=1}^{n} A_{ik}\, (I_n)_{kj} \;=\; A_{ij} \qquad \text{(Definition 1.10)},
\]
because (In)kj = 1 if k = j, and is zero otherwise! Hence AIn = A. The
proof of the other equality is almost the same.

The proofs of Proposition 1.53 (1) and (3) require mathematical induction.
Since we have not considered this method of proof in this module at all,
we just give very rough clues as to how these results can be proved.

Proof of Proposition 1.53.

1. This can be proved using Theorem 1.44 and by applying mathemat-


ical induction to the number of rows (or columns) of A.
2. If row i of A is zero then Ai1 = Ai2 = · · · = Ain = 0, so by expanding
along the ith row as in Theorem 1.44, we obtain
det(A) = 0 · Ci1 + 0 · Ci2 + · · · + 0 · Cin = 0.
Likewise if column j of A is zero.
3. This can be proved using Definition 1.41 and mathematical induc-
tion.
4. Observe that (kIn)ij = k if i = j, and is zero otherwise. Thus kIn is
a diagonal matrix, and so by applying (3) we have
\[
\det(kI_n) \;=\; \underbrace{k \times k \times \cdots \times k}_{n \text{ times}} \;=\; k^n.
\]
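For what it is worth, parts (2) and (4) can also be illustrated numerically; a sketch only, since np.linalg.det works in floating point (hence the use of np.isclose):

    import numpy as np

    A = np.arange(16.0).reshape(4, 4)
    A[1, :] = 0.0                                    # make the second row a zero row
    print(np.isclose(np.linalg.det(A), 0.0))         # part (2): a zero row forces det(A) = 0

    n, k = 4, 3.0
    print(np.isclose(np.linalg.det(k * np.eye(n)), k ** n))   # part (4): det(k I_n) = k^n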

Missing proofs from Chapter 2


In the proof of Proposition 2.28, we will make extensive use of the prop-
erties and machinery of matrix multiplication developed in Chapter 1.
Notice that if we treat x and y as n × 1 column vectors, then the matrix
product \(y^T x\) is a 1 × 1 matrix, and furthermore
\[
y^T x \;=\; \begin{pmatrix} y_1 & y_2 & \dots & y_n \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\;=\; \bigl(\, y_1 x_1 + y_2 x_2 + \cdots + y_n x_n \,\bigr) \;=\; \bigl(\, x \cdot y \,\bigr).
\]

In other words, the matrix product yT x is the 1 × 1 matrix whose single


entry is the scalar product x·y. Since any 1×1 matrix just identifies with
the single scalar entry inside, we drop the exterior brackets and simply
write yT x = x · y. (Abusing notation is generally not recommended, as
it often creates confusion and errors, but occasionally it is helpful, if the
context of the abuse is clear.)
Having this observation in mind, the proof of Proposition 2.28 becomes
an exercise in matrix arithmetic.

Proof of Proposition 2.28. From above,
\[
\begin{aligned}
x \cdot (A^T y) &= (A^T y)^T x \\
&= (y^T (A^T)^T)\, x && \text{Fact 1.23 (7)} \\
&= (y^T A)\, x && (A^T)^T = A \text{ for any matrix} \\
&= y^T (Ax) && \text{Fact 1.23 (4)} \\
&= (Ax) \cdot y.
\end{aligned}
\]


C.2 Vector subspaces of Rn and dimension


In this section we introduce the idea of a vector subspace of Rn , and on
bases of these subspaces. The reader may find that the material covered
here is a little more abstract than that of previous notes.

Vector subspaces
Vector subspaces are special sets of vectors. Throughout this chapter, we
will be using sets and set notation. Readers unfamiliar with such things
are advised to read through these notes before continuing.

Definition C.1 (Vector subspaces). Let S ⊆ Rn be some subset of


vectors in Rn . We say that S is a vector subspace, or just a subspace
of Rn , if it satisfies three tests:

1. the zero vector 0 belongs to S;

2. whenever we have two vectors x and y in S, their sum x + y is


also in S (i.e. x + y ∈ S whenever x, y ∈ S), and

3. whenever x is in S and k is a scalar, the vector kx is also in S


(i.e. kx ∈ S whenever x ∈ S and k ∈ R).

Typically, subsets of Rn that are subspaces are denoted by capital letters


such as V , U and W . Tests (2) and (3) of Definition C.1 are sometimes
expressed by saying that a subspace is closed under the operations of
vector addition and scalar multiplication. In other words, one does not
wander out of a subspace by adding together any two vectors inside it,
or by taking any scalar multiple of any vector inside.

Subsets that are not subspaces


To show that a given subset of vectors is a subspace, one must verify that
it passes all three of the tests in Definition C.1. Before we give some
examples of vector subspaces, we will consider some examples of subsets
of Rn that are not subspaces, because they fail one of the three tests. In
doing so, we will gain some insight into how these tests operate.

Example C.2. Show that the following subsets of R2 are not sub-
spaces of R2.

1. S = {(x1, x2) ∈ R2 : x1² + x2² = 1}.

2. S = {(x1, x2) ∈ R2 : x1 = 0 or x2 = 0}.

3. S = {(x1, x2) ∈ R2 : x1 ≥ 0}.

Solution.

1. Evidently S fails Definition C.1, test (1), because 0 = (0, 0) in
this case, and 0² + 0² = 0 ≠ 1. Failure of one test is enough to
show that S is not a subspace. For good measure, we show that
S fails the other two tests as well. Consider test (2). Note that
(1, 0), (0, 1) both belong to S, but the sum (1, 0) + (0, 1) = (1, 1)
does not, because 1² + 1² = 2 ≠ 1. Now consider test (3).
We have (1, 0) ∈ S and 2 ∈ R, but 2(1, 0) = (2, 0) is not in S,
because 2² + 0² = 4 ≠ 1.
Geometrically, the set S is the circle in R2, having centre the
origin and radius 1. This set is illustrated below, together with
the fact that it fails test (2), using the example above.
[Figure: the unit circle in R2, with (1, 0) and (0, 1) on the circle and their sum (1, 0) + (0, 1) = (1, 1) lying off it.]

2. The set S passes test (1) because 0 = (0, 0) ∈ S. However,
it fails test (2). For example (1, 0), (0, 1) ∈ S, but (1, 1) =
(1, 0) + (0, 1) ∉ S. Incidentally, S does pass test (3). Indeed,
let (x1, x2) ∈ S and k ∈ R. Then k(x1, x2) = (kx1, kx2). Either
x1 = 0 or x2 = 0, so either kx1 = 0 or kx2 = 0. Whatever the
case, k(x1, x2) ∈ S.
Geometrically, S is the set of all points along the two standard
coordinate axes in R2. The failure of test (2) is illustrated.


[Figure: the two coordinate axes in R2, with (1, 0) and (0, 1) on the axes and their sum (1, 0) + (0, 1) = (1, 1) lying off them.]

3. As above, S passes test (1). It passes test (2) as well. Indeed,
if (x1, x2) and (y1, y2) belong to S, then by definition x1, y1 ≥ 0.
Now, (x1, x2) + (y1, y2) = (x1 + y1, x2 + y2) and x1 + y1 ≥ 0,
so (x1 + y1, x2 + y2) ∈ S. However, S fails test (3). For example,
(1, 0) ∈ S and −1 ∈ R, but −1(1, 0) = (−1, 0) ∉ S.
Geometrically, S is the set of all points on and to the right
of the vertical axis (the line x1 = 0), as shown shaded
below. The failure of test (3) is illustrated.
[Figure: the shaded half-plane x1 ≥ 0, with (1, 0) on its boundary and −1(1, 0) = (−1, 0) lying outside it.]
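Because tests (2) and (3) quantify over all vectors and scalars, no computer check can prove that a set is a subspace; a failed spot check, however, is enough to rule one out, exactly as in the example above. The sketch below is purely illustrative: the membership functions and witness vectors are ad hoc encodings of the three sets of Example C.2.

    import numpy as np

    # Membership tests for the three subsets of R^2 in Example C.2 (ad hoc helper functions).
    in_circle     = lambda x: np.isclose(x[0] ** 2 + x[1] ** 2, 1.0)        # set (1)
    in_axes       = lambda x: np.isclose(x[0], 0) or np.isclose(x[1], 0)    # set (2)
    in_half_plane = lambda x: x[0] >= 0                                     # set (3)

    def spot_check(contains, x, y, k):
        """Print whether the set passes each test for the witnesses x, y, k (True = passes)."""
        print("test (1), zero vector:    ", bool(contains(np.zeros(2))))
        print("test (2), contains x + y: ", bool(contains(x + y)))
        print("test (3), contains k*x:   ", bool(contains(k * x)))

    x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    spot_check(in_circle, x, y, 2.0)         # all three tests fail, as in Example C.2 (1)
    spot_check(in_axes, x, y, 2.0)           # only test (2) fails
    spot_check(in_half_plane, x, y, -1.0)    # only test (3) fails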

Subspaces generated by lists of vectors


Vector subspaces of Rn can be generated in a natural way, using lists
of vectors. Before we look at that, it will help to consider the notion of
linear combinations of vectors.

Definition C.3 (Linear combinations). Let v1, . . . , vk be a list of vectors
in Rn (not necessarily orthonormal). Any sum of the form
\[
a_1 v_1 + \cdots + a_k v_k \;=\; \sum_{i=1}^{k} a_i v_i,
\]
where a1, . . . , ak ∈ R are scalars, is called a linear combination of
the vectors v1, . . . , vk. Given x in Rn, if it is possible to find scalars
b1, . . . , bk ∈ R, such that
\[
x \;=\; \sum_{i=1}^{k} b_i v_i,
\]
then x is called a linear combination of v1, . . . , vk.

For example. . .

Example C.4. Let v1 = (1, 1, 0), v2 = (−1, 1, 0) and v3 = (3, 1, 0) be


vectors in R3 . Then the vector x = (5, 11, 0) is a linear combination
of v1 , v2 , v3 , whereas y = (0, 0, 1) is not.

Solution. Let b1 = 8, b2 = 3 and b3 = 0. Then

b1 v1 + b2 v2 + b3 v3 = 8(1, 1, 0) + 3(−1, 1, 0) + 0(3, 1, 0)


= (8, 8, 0) + (−3, 3, 0) + (0, 0, 0) = (5, 11, 0) = x.

Hence x is a linear combination of v1 , v2 , v3 .


On the other hand, given any scalars a1 , a2 , a3 , notice that

a1 v1 + a2 v2 + a3 v3 = (a1 − a2 + 3a3 , a1 + a2 + a3 , 0).

The third entry of the linear combination is always 0, no matter what


a1 , a2 , a3 are, whereas the third entry of y equals 1. Hence y is not
a linear combination of v1 , v2 , v3 . 
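One numerical way to hunt for suitable scalars is to solve the corresponding linear system. The sketch below uses numpy.linalg.lstsq, which returns one particular (least-squares) choice of scalars; if that choice reproduces the target vector exactly, the vector is a linear combination of the list, and otherwise it is not.

    import numpy as np

    v1, v2, v3 = np.array([1., 1., 0.]), np.array([-1., 1., 0.]), np.array([3., 1., 0.])
    M = np.column_stack([v1, v2, v3])          # columns are the vectors of the list

    x = np.array([5., 11., 0.])
    y = np.array([0., 0., 1.])

    b, *_ = np.linalg.lstsq(M, x, rcond=None)  # one particular choice of scalars
    print(np.allclose(M @ b, x))               # True: x is a linear combination of v1, v2, v3

    c, *_ = np.linalg.lstsq(M, y, rcond=None)
    print(np.allclose(M @ c, y))               # False: no scalars reproduce y

Note that the scalars returned for x need not be b1 = 8, b2 = 3 and b3 = 0; as Remarks C.5 below points out, they are not uniquely determined for this list.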

Remarks C.5. It is worth noting that if we set c1 = 10, c2 = 2 and
c3 = −1 and consider v1, v2, v3 in the example above, then

c1v1 + c2v2 + c3v3 = 10(1, 1, 0) + 2(−1, 1, 0) − 1(3, 1, 0)
= (10, 10, 0) + (−2, 2, 0) + (−3, −1, 0)
= (5, 11, 0) = x.

So a different set of scalars has, in this case, given us the same vector.
The scalars involved in the linear combination are not uniquely
determined by the vector in question. (If the list of vectors happens to
be linearly independent, then the scalars involved in any linear combination
will be uniquely determined: no two different sets of scalars will yield the
same linear combination. We will not explicitly cover linearly independent
lists of vectors in these notes; however, orthonormal lists of vectors happen
to be examples of them.)

Linear combinations can be used to define subspaces.

Definition C.6 (Subspaces generated by vectors). Let v1, . . . , vk be a
list of vectors in Rn. Define the set of all possible linear combinations
of the list

V = {a1v1 + · · · + akvk : a1, . . . , ak ∈ R}.

Then V is called the subspace spanned or generated by v1, . . . , vk.

In the definition above, V was claimed to be a subspace without any


proof of this fact. The next result repays this debt.

Proposition C.7. The set V given above is a subspace of Rn .

Proof. We must verify that V satisfies the criteria in Definition C.1.

1. If we set a1 = a2 = · · · = ak = 0, then

a1v1 + · · · + akvk = 0v1 + · · · + 0vk = 0.

Thus 0 ∈ V.

2. Let x, y ∈ V. By definition, x and y are linear combinations of
v1, . . . , vk, so there exist scalars a1, . . . , ak and b1, . . . , bk such that

x = a1v1 + · · · + akvk and y = b1v1 + · · · + bkvk.

Adding these gives

x + y = (a1v1 + · · · + akvk) + (b1v1 + · · · + bkvk)
= (a1 + b1)v1 + · · · + (ak + bk)vk ∈ V.

3. Finally, if c is a scalar, then (with x as above)

cx = c(a1v1 + · · · + akvk) = (ca1)v1 + · · · + (cak)vk ∈ V.

Hence V satisfies all three criteria and so is a subspace of Rn.

Given v1 , . . . , vk and V as above, observe that vi ∈ V for each i. Indeed,


vi can be expressed as the linear combination

vi = 0v1 + · · · + 0vi−1 + 1vi + 0vi+1 + · · · + 0vk ,

thus it is an element of V , by the definition of V .

Geometric examples of subspaces


The material above looks rather abstract at first glance. However, many
examples of subspaces turn out to be quite easily visualised geometri-
cally. We consider a few such examples. First, we begin with the ‘largest’
subspace of Rn , namely itself.

Example C.8. The set Rn is a subspace, because it is a subset of itself


and evidently satisfies all three of the tests present in Definition C.1.

This is considered to be the largest subspace of Rn because every sub-


space of Rn is, by definition, a subset of it.

Example C.9.

1. Fix a non-zero vector v in Rn , and let V be the set of all scalar


multiples of v, i.e.

V = {av : a ∈ R} .

The vector v comprises a 1-element list of vectors. By Propo-


sition C.7, V is a subspace of Rn . It is the set of all vectors
parallel to v. If we were to plot all the elements belonging to
V above, we would get a straight line through the origin of Rn ,
extending infinitely in both directions and running parallel to
v. Such a subspace is 1-dimensional. We will cover dimension
a little more later.

2. Specifically, let v = (1, 3, 4) in R3 . Then V as above is given by

V = {a(1, 3, 4) : a ∈ R} = {(a, 3a, 4a) : a ∈ R} .


Example C.10.

1. Fix non-zero vectors v1 and v2 in Rn that are not scalar multiples


of each other, i.e., for no scalar c do we have v2 = cv1 , and vice-
versa. Define

V = {a1 v1 + a2 v2 : a1 , a2 ∈ R} .

Again, by Proposition C.7, V is a subspace of Rn . It is a so-


called 2-dimensional subspace.

2. Specifically, let v1 = (1, 0, ½) and v2 = (0, 1, ½) in R3. Then V as
above is given by

V = {(a1, a2, ½(a1 + a2)) : a1, a2 ∈ R}.

If we were to plot all the elements of V , we would get a plane


in R3 , passing through the origin.

In the figure below, we plot a section of subspace V in Example C.10 (2).

Figure C.1: A section of the plane in Example C.10 (2)

The whole subspace, if plotted, would look like a flat surface extending
arbitrarily far in all directions. Any vector in V , if regarded as a straight
line segment having the origin as its initial point, would lie along the
surface, as depicted above.

Next, we consider the so-called ‘trivial’ subspaces.

Example C.11. Let 0 = (0, . . . , 0) in Rn be the zero vector. The set


{0} is called the trivial subspace of Rn .

The trivial subspace of Rn is 0-dimensional. Strictly speaking, there is


a different trivial subspace of Rn for each value of n, because the zero
vectors 0, (0, 0), (0, 0, 0),. . . , change with n. However, since they are all
simply the 1-element set containing the zero vector of the given Rn , in
some sense, they are all the same.
Given a subspace V ⊆ Rn , test (1) of Definition C.1 guarantees that {0}
is always a subset of V . Therefore, {0} can be regarded as the ‘smallest’
subspace of Rn , hence its name.
We finish this subsection by considering another type of subspace that
has strong connections to geometry.

Exercise C.12. Fix a non-zero vector n in Rn , and let V be the set of


all vectors x in Rn that are orthogonal to our fixed vector n, i.e.

V = {x ∈ Rn : n · x = 0} .

Use Fact 2.20 to show that V is a subspace of Rn .

Such a subspace as given in Exercise C.12 is called a hyperplane of Rn .


A hyperplane of Rn is any subspace of Rn that has dimension n − 1. If
n = 3 then a subspace defined in this way will yield a 2-dimensional
plane of the type described in Example C.10. Rather than hyperplanes,
such objects are simply called planes. The vector n in Exercise C.12 is
called normal to the hyperplane.

Example C.13. Let n = (−½, −½, 1). Then the subspace V of R3 de-
fined in Exercise C.12 happens to equal the subspace defined in Ex-
ample C.10 (2).

In the figure below, the plane in Example C.10 (2) is plotted once again,
this time with the vector n = (−½, −½, 1), lying normal to the plane. Every
vector in the plane is perpendicular to n. Normal vectors are not unique.
Any non-zero scalar multiple of a vector normal to a given hyperplane
will also be a vector normal to that same hyperplane.
(We cover planes and normal vectors in R3 in a very cursory manner
here. To understand these things properly, we require the notion of the
vector product or cross product, which is another form of product that
applies exclusively to vectors in R3. We will not cover this product in
this module.)

Figure C.2: The plane in Example C.10 (2) with normal vector n

Subspaces defined by orthonormal lists of vectors


For the remainder of this section, we will deal exclusively with orthonor-
mal lists of vectors v1, . . . , vk in Rn. According to Theorem 4.6 (1), we
must have k ≤ n. Given such a list, let us define V as in Definition C.6.

Proposition C.14. Let v1, . . . , vk be an on list of vectors in Rn and let

x = a1v1 + · · · + akvk ∈ V,

be a linear combination, where V is as in Definition C.6. Then ai =
x · vi whenever 1 ≤ i ≤ k. In particular, whenever x ∈ V, we have

x = (x · v1)v1 + · · · + (x · vk)vk.

Proof. Let 1 ≤ i ≤ k. Taking scalar products of both sides with vi gives


us
x · vi = (a1 v1 + · · · + ak vk ) · vi
= a1 (v1 · vi ) + · · · + ai (vi · vi ) + · · · + ak (vk · vi ) by Fact 2.20
= a1 (0) + · · · + ai (1) + · · · + ak (0)
= ai .
Now we turn to the second statement of the proposition. If x ∈ V then,
by definition of V , there exist scalars a1 , . . . , ak such that
x = a1 v1 + · · · + ak vk .

From above ai = x · vi for each i, thus

x = (x · v1 )v1 + · · · + (x · vk )vk ,

as claimed.

It will help to compare the next definition with Definition 4.7.

Definition C.15 (Orthonormal bases of subspaces and coordinates).


Let v1 , . . . , vk be an on list of vectors in Rn and define V as in Defi-
nition C.6.

1. The list v1 , . . . , vk is called an orthonormal basis or basis of V .

2. Given x ∈ V , the numbers x · v1 , . . . , x · vk are the coordinates


of x with respect to the basis v1 , . . . , vk .

Recall Remarks C.5. Unlike the list of vectors in that remark, given x ∈ V
as above, the scalars used to express x as a linear combination of the
vectors v1 , . . . , vk are uniquely determined: by Proposition C.14, they must
equal the coordinates x · v1 , . . . , x · vk , otherwise one would get a different
vector.
Definition C.15 is a generalisation of Definition 4.7 as it allows us to
define on bases of subspaces of Rn , and coordinates of vectors that lie in
said subspaces, rather than just Rn itself. If we set k = n and V = Rn ,
then Definition C.15 boils down to Definition 4.7.
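As a quick numerical illustration of Proposition C.14 (again, not part of the module), take the on list that appears in Example C.19 below, build a vector of V from known scalars, and recover those scalars as dot products:

    import numpy as np

    # An orthonormal list generating a 2-dimensional subspace V of R^3 (as in Example C.19 below).
    v1 = np.array([1.0, 0.0, 0.5]) * np.sqrt(4 / 5)
    v2 = np.array([1.0, -5.0, -2.0]) / np.sqrt(30)

    x = 2 * v1 + 3 * v2                 # a vector of V built from known scalars

    a1, a2 = x @ v1, x @ v2             # the coordinates of x, as in Proposition C.14
    print(np.isclose(a1, 2), np.isclose(a2, 3))      # True True
    print(np.allclose(a1 * v1 + a2 * v2, x))         # True: x is reconstructed exactly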

Dimension of subspaces

Definition C.16 (Subspace dimension). Let v1 , . . . , vk be an on list of


vectors in Rn and define V as in Definition C.6. The number k is
known as the dimension of the subspace V , and is denoted dim V .

On reflection, this is a natural definition of the dimension of a vector


subspace. Given an on list of vectors v1, . . . , vk, any vector x in the subspace
generated by the list has k coordinates x·v1 , . . . , x·vk . One can associate
with each vector vi (which is an element of V ), a coordinate axis running
parallel to vi . Since the vi are mutually orthogonal, so are the associated
axes. Thus we have k mutually orthogonal coordinate axes which can


be used to locate each and every vector in V , using their k coordinates.


Having this geometric picture in mind, it seems natural to define the
dimension of V to be k.
In particular, given that Rn itself is generated by the on list e1 , . . . , en , we
get dim Rn = n, which agrees with our intuition of what the dimension of
Rn ought to be.

Remarks C.17.

1. The definition above may not work if we do not require that
the list v1, . . . , vk is orthonormal. Given a general list of vectors
v1, . . . , vk, we can generate the corresponding space V, and it
will have a dimension. However, without further information
about v1, . . . , vk, we are only able to say that dim V ≤ k – we
may not have equality in general. (We will have dim V = k if
the list of vectors is linearly independent, but again, we won’t
treat this notion explicitly in these notes.)

2. The definition of dimension disguises a subtle point which is
worth mentioning. A subspace V of Rn can be generated by
different on lists of vectors. In fact, if dim V ≥ 2, then there are
infinitely many such lists. For instance, fix an angle θ and con-
sider the vectors v1 = (cos θ, sin θ, 0) and v2 = (− sin θ, cos θ, 0).
It turns out that the subspace V of R3 generated by the list
v1, v2, which is orthonormal, is equal to the set of vectors

V = {(a, b, 0) : a, b ∈ R},

regardless of the value of θ. (Verifying this fact would be an
instructive exercise.) Thus V is generated by infinitely many
on lists.
Since a general subspace V may be generated by many differ-
ent on lists of vectors, this raises the possibility that there are
on lists of vectors having different lengths, which generate V.
If this is the case then our definition of dimension would flail
about in a rather embarrassing way, because we would have
more than one competing candidate for the number dim V.
Fortunately, such a frightening spectre can never arise. It turns
out that, while V can be generated by many different on lists of
vectors, all such lists have the same length. Thus our definition
of dimension is safe and sound: the length of any one such list
will be the same as the length of any other! (The fact that all
bases of a given vector subspace have the same length follows
from the so-called Steinitz Exchange Lemma. It is a key result
in the theory of vector spaces.)

Orthogonal projections onto subspaces of Rn

This is a continuation and development of Section 2.5 and Exercise 4.12.
Section 2.5 should be regarded as the ‘1-dimensional’ version of what
follows.
One of the ways in which ‘high-dimensional’ data can be analysed is to
take this data, which can be represented by a set of vectors in Rn (where
n tends to be a large number, hence ‘high-dimensional’), and ‘project’ it
down onto a subspace V of Rn , having a dimension much lower than n.
Since the projected data exists in a lower-dimensional space, it tends
to be easier to interpret than the original data (picture a 3-dimensional
object, and then picture a 7-dimensional object. . . ). Whenever projecting
in this way we will lose some information about the original data, but the
trick is to proceed in such a way that the projected data retains much of
the most important information possessed by the original data.
Let us fix an on list of vectors v1, . . . , vk in Rn. This generates a sub-
space V of Rn as per Definition C.15. Compare the next definition with
Definition 2.30 and Exercise 4.12. (The key to retaining the important
features of a data set when projecting is to make an appropriate choice
of list of on vectors.)
Definition C.18. Given x in Rn , the vector

projV (x) = (x · v1 )v1 + · · · + (x · vk )vk ∈ V ,

is called the orthogonal projection of x onto V .

Geometrically speaking, projV (x) is defined in such a way that the vector
x − projV (x) is orthogonal to the entire subspace V , in the sense that
it is orthogonal to every vector in V . Imagine a fly hovering above a
horizontal table top. If we draw a vertical line down from the fly to the
table top, then the point at which the line meets the table would be the
orthogonal projection of the fly onto the table. This vertical line can be
said to be perpendicular to the entire table top.
Moreover, projV (x) is defined so that the distance between x and projV (x)
is less than the distance between x and any other vector z in V . In other
words,
‖x − projV(x)‖ ≤ ‖x − z‖,


whenever z ∈ V . The distance between x and projV (x) is the shortest


distance between x and the entire subspace V . In the context of the
hovering fly, the length of the vertical line between the fly and the table
top yields the shortest distance between the fly and the table top.

Example C.19. The plane V in Example C.10 (2) can be generated
by the on list v1, v2, where v1 = √(4/5) (1, 0, ½) and v2 = (1/√30) (1, −5, −2).
Let x = (¼, ¼, 1). Compute projV(x).

Solution. According to Definition C.18, we have

projV(x) = (x · v1)v1 + (x · v2)v2
= (3/5, 0, 3/10) + (−1/10, 1/2, 1/5) = (1/2, 1/2, 1/2). □
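A short numerical check of Example C.19 (illustration only):

    import numpy as np

    v1 = np.array([1.0, 0.0, 0.5]) * np.sqrt(4 / 5)
    v2 = np.array([1.0, -5.0, -2.0]) / np.sqrt(30)
    x = np.array([0.25, 0.25, 1.0])

    proj = (x @ v1) * v1 + (x @ v2) * v2     # Definition C.18
    print(proj)                              # approximately [0.5 0.5 0.5]

    # x - proj_V(x) is orthogonal to both basis vectors, hence to the whole of V.
    print(np.isclose((x - proj) @ v1, 0), np.isclose((x - proj) @ v2, 0))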

Example C.19 is depicted below. The vector projV(x) is represented by
the arrow in the plane. The arrow pointing perpendicular to the plane
represents the vector x − projV(x). The vector x is given by the third
arrow. Together, all three vectors form a right-angled triangle:
projV(x) · (x − projV(x)) = 0.

Figure C.3: Projection of x onto the plane in Example C.19

The claims made above are justified by the next theorem. What follows
is essentially the solution of Exercise 4.12.

Theorem C.20. Let v1, . . . , vk, V and x be as above. Then

1. (x − projV(x)) · z = 0 whenever z ∈ V, and

2. ‖x − projV(x)‖ ≤ ‖x − z‖ whenever z ∈ V.

Proof.

1. First of all, given 1 ≤ i ≤ k, observe that
\[
\begin{aligned}
(x - \mathrm{proj}_V(x)) \cdot v_i &= x \cdot v_i - \bigl((x \cdot v_1)v_1 + \cdots + (x \cdot v_k)v_k\bigr) \cdot v_i \\
&= x \cdot v_i - (x \cdot v_1)(v_1 \cdot v_i) - \cdots - (x \cdot v_i)(v_i \cdot v_i) - \cdots - (x \cdot v_k)(v_k \cdot v_i) \\
&= x \cdot v_i - x \cdot v_i \\
&= 0,
\end{aligned}
\]
using Fact 2.20. Now let z ∈ V. According to Proposition C.14, we
have

z = (z · v1)v1 + · · · + (z · vk)vk.

Consequently,
\[
\begin{aligned}
(x - \mathrm{proj}_V(x)) \cdot z &= (x - \mathrm{proj}_V(x)) \cdot \bigl((z \cdot v_1)v_1 + \cdots + (z \cdot v_k)v_k\bigr) \\
&= (z \cdot v_1)\bigl((x - \mathrm{proj}_V(x)) \cdot v_1\bigr) + \cdots + (z \cdot v_k)\bigl((x - \mathrm{proj}_V(x)) \cdot v_k\bigr) \\
&= 0 + \cdots + 0 \qquad \text{from above} \\
&= 0,
\end{aligned}
\]
again using Fact 2.20.

2. Given z ∈ V, set u = x − projV(x) and v = projV(x) − z. Observe that
v ∈ V, as V is a subspace. Thus u · v = 0 by part (1). It follows that

‖x − projV(x)‖ = ‖u‖ ≤ ‖u + v‖ = ‖x − z‖,

by Theorem 2.22.

Some congratulations are in order, because you have reached. . .


Well done!
