
Algebra

M. Anthony, M. Harvey
MT1173
2015

Undergraduate study in
Economics, Management,
Finance and the Social Sciences

This subject guide is for a 100 course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 4 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
M. Anthony, Professor of Mathematics, Department of Mathematics, London School of
Economics and Political Science.
M. Harvey, Course Leader, Department of Mathematics, London School of Economics and
Political Science.

This is one of a series of subject guides published by the University. We regret that due
to pressure of work the authors are unable to enter into any correspondence relating to,
or arising from, the guide. If you have any comments on this subject guide, favourable or
unfavourable, please use the form at the back of this guide.

University of London International Programmes


Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
www.londoninternational.ac.uk

Published by: University of London


© University of London 2015

The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher. We make every effort to
respect copyright. If you think we have inadvertently used your copyright material, please let
us know.
Contents

Preface 1

1 Introduction 3
1.1 This subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Aims of the course . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Route map to the guide . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Online study resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 The VLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Making use of the Online Library . . . . . . . . . . . . . . . . . . 7
1.4 Using the guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Examination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 The use of calculators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Preliminaries 9
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 Some basic set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3 Union and intersection . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.4 Showing two sets are equal . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Numbers, algebra and equations . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Basic notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Simple algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13


2.2.4 Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.5 Quadratic equations . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.6 Polynomial equations . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Mathematical statements and proof . . . . . . . . . . . . . . . . . . . . . 16
2.4 Introduction to proving statements . . . . . . . . . . . . . . . . . . . . . 18
2.5 Some basic logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.1 Negation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.2 Conjunction and disjunction . . . . . . . . . . . . . . . . . . . . . 22
2.6 Implications, converse and contrapositive . . . . . . . . . . . . . . . . . . 24
2.6.1 ‘If-then’ statements . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Converse statements . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.3 Contrapositive statements . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Proof by contradiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.8 Some terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 28
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3 Matrices 31
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1 Definitions and terminology . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Matrix addition and scalar multiplication . . . . . . . . . . . . . . . . . . 33
3.3 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Matrix inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5.1 Definition of a matrix inverse . . . . . . . . . . . . . . . . . . . . 38
3.5.2 Properties of the inverse . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 Powers of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.7 Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42


3.7.1 The transpose of a matrix . . . . . . . . . . . . . . . . . . . . . . 42


3.7.2 Symmetric matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 45
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4 Vectors 49
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1 Vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1.1 Definition of vector and Euclidean space . . . . . . . . . . . . . . 50
4.1.2 The inner product of two vectors . . . . . . . . . . . . . . . . . . 51
4.1.3 Vectors and matrices . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Developing geometric insight – R2 and R3 . . . . . . . . . . . . . . . . . 53
4.2.1 Vectors in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.2 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.3 Vectors in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 Lines in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 Lines in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Planes in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.5 Lines and hyperplanes in Rn . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5.1 Vectors and lines in Rn . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5.2 Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 72
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 72

5 Linear systems I: Gaussian elimination 77


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.1 Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2 Row operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.1 The algorithm — reduced row echelon form . . . . . . . . . . . . 81
5.3.2 Consistent and inconsistent systems . . . . . . . . . . . . . . . . . 85
5.3.3 Linear systems with free variables . . . . . . . . . . . . . . . . . . 86
5.3.4 Solution sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 90
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 90

6 Linear systems II: an application and homogeneous systems 93


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.1 Application: Leontief input-output analysis . . . . . . . . . . . . . . . . . 94
6.2 Homogeneous systems and null space . . . . . . . . . . . . . . . . . . . . 96
6.2.1 Homogeneous systems . . . . . . . . . . . . . . . . . . . . . . . . 96
6.2.2 Null space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 101
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 102

7 Matrix inversion 105


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.1 Elementary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.2 Row equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.3 The main theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.4 Using row operations to find the inverse matrix . . . . . . . . . . . . . . 110
7.5 Verifying an inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 113
Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Comment on exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

8 Determinants 115
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.2 Results on determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.2.1 Determinant using row operations . . . . . . . . . . . . . . . . . . 119
8.2.2 The determinant of a product . . . . . . . . . . . . . . . . . . . . 123
8.3 Matrix inverse using cofactors . . . . . . . . . . . . . . . . . . . . . . . . 123
8.4 Cramer’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 128
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 129

9 Rank, range and linear systems 131


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131


9.1 The rank of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132


9.2 Rank and systems of linear equations . . . . . . . . . . . . . . . . . . . . 133
9.3 General solution of a linear system in vector notation . . . . . . . . . . . 137
9.4 Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 141
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 142

10 Sequences and series 145


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
10.1.1 Sequences in general . . . . . . . . . . . . . . . . . . . . . . . . . 146
10.1.2 Arithmetic progressions . . . . . . . . . . . . . . . . . . . . . . . 146
10.1.3 Geometric progressions . . . . . . . . . . . . . . . . . . . . . . . . 146
10.1.4 Compound interest . . . . . . . . . . . . . . . . . . . . . . . . . . 146
10.1.5 Frequent compounding . . . . . . . . . . . . . . . . . . . . . . . . 147
10.2 Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
10.2.1 Arithmetic series . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
10.2.2 Geometric series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
10.3 Finding a formula for a sequence . . . . . . . . . . . . . . . . . . . . . . 149
10.4 Limiting behaviour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
10.5 Financial applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 152
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

11 Difference equations 157


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157


Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.1 First-order difference equations . . . . . . . . . . . . . . . . . . . . . . . 158
11.2 Solving first-order difference equations . . . . . . . . . . . . . . . . . . . 159
11.3 Long-term behaviour of solutions . . . . . . . . . . . . . . . . . . . . . . 161
11.4 The cobweb model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
11.5 Financial applications of first-order difference equations . . . . . . . . . . 163
11.6 Homogeneous second-order difference equations . . . . . . . . . . . . . . 164
11.7 Non-homogeneous second-order equations . . . . . . . . . . . . . . . . . . 168
11.8 Behaviour of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.9 Economic applications of second-order difference equations . . . . . . . . 169
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 171
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

12 Vector spaces and subspaces 179


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
12.1 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
12.1.1 Definition of a vector space . . . . . . . . . . . . . . . . . . . . . 180
12.1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
12.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
12.2.1 An alternative characterisation of a subspace . . . . . . . . . . . . 185
12.3 Subspaces connected with matrices . . . . . . . . . . . . . . . . . . . . . 185
12.3.1 Null space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
12.3.2 Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 186
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 186


13 Linear span and linear independence 189


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
13.1 Linear span . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
13.1.1 Lines and planes in R3 . . . . . . . . . . . . . . . . . . . . . . . . 190
13.1.2 Row space and column space . . . . . . . . . . . . . . . . . . . . 192
13.2 Linear independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
13.3 Testing for linear independence in Rn . . . . . . . . . . . . . . . . . . . . 194
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 197
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 197

14 Bases and dimension 201


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
14.1 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
14.1.1 Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
14.1.2 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
14.1.3 Dimension and bases of subspaces . . . . . . . . . . . . . . . . . . 206
14.2 Finding a basis for a linear span in Rn . . . . . . . . . . . . . . . . . . . 206
14.3 Basis and dimension of range and null space . . . . . . . . . . . . . . . . 208
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 211
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 211

15 Linear transformations 213


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213


Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213


Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
15.1 Linear transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
15.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
15.3 Linear transformations and matrices . . . . . . . . . . . . . . . . . . . . 215
15.3.1 Rotation in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
15.4 Linear transformations of any vector spaces . . . . . . . . . . . . . . . . 217
15.4.1 Identity and zero linear transformations . . . . . . . . . . . . . . 218
15.4.2 Composition and combinations of linear transformations . . . . . 218
15.4.3 Inverse linear transformations . . . . . . . . . . . . . . . . . . . . 218
15.5 Linear transformations determined by action on a basis . . . . . . . . . . 219
15.6 Range and null space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
15.7 Rank and nullity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 222
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 223

16 Coordinates and change of basis 225


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
16.1 Coordinates and coordinate change . . . . . . . . . . . . . . . . . . . . . 226
16.2 Change of basis and similarity . . . . . . . . . . . . . . . . . . . . . . . . 229
16.2.1 Matrices of linear transformations . . . . . . . . . . . . . . . . . . 229
16.2.2 Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 231

17 Diagonalisation 233
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233


Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233


Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
17.1 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . 234
17.1.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
17.1.2 Finding eigenvalues and eigenvectors . . . . . . . . . . . . . . . . 234
17.1.3 Eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
17.1.4 Eigenvalues and the determinant . . . . . . . . . . . . . . . . . . 238
17.2 Diagonalisation of a square matrix . . . . . . . . . . . . . . . . . . . . . 238
17.3 When is diagonalisation possible? . . . . . . . . . . . . . . . . . . . . . . 241
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 245
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 245

18 Applications of diagonalisation 247


Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
18.1 Powers of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
18.2 Systems of difference equations . . . . . . . . . . . . . . . . . . . . . . . 249
18.2.1 Systems of difference equations . . . . . . . . . . . . . . . . . . . 250
18.2.2 Solving by change of variable . . . . . . . . . . . . . . . . . . . . 250
18.2.3 Solving using matrix powers . . . . . . . . . . . . . . . . . . . . . 253
18.2.4 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 258
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 258

A Sample examination paper 259

B Commentary on the Sample examination paper 263

Preface

This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. The subject guide alone is not sufficient to gain a thorough
understanding of the subject; you will need to do the essential reading recommended for
each chapter and any further reading you find helpful.
We are very grateful to James Ward and Keith Martin for their careful readings of an
earlier edition of this guide and for their many helpful comments.


Chapter 1
Introduction

In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.

1.1 This subject


Algebra as studied in this course is primarily what is usually called linear algebra:
the study of matrices, systems of linear equations, eigenvalues and eigenvectors,
diagonalisation of matrices, and related topics. However, we also, in this course, study
sequences, series and difference equations.
Our approach here is not just to help you acquire proficiency in techniques and
methods, but also to understand some of the theoretical ideas behind these. For
example, after completing this course, you will hopefully understand why the number of
‘free parameters’ in the set of solutions of a system of linear equations is linked with the
idea of the ‘rank’ of the matrix that describes the system of equations. In addition to
this, we try to indicate the uses of some of the methods in applications to economics,
finance and related disciplines.

1.1.1 Aims of the course


The broad aims of this course are as follows:

to enable students to acquire skills in the methods of algebra, as required for their
use in further mathematics subjects and economics-based subjects
to prepare students for further courses in mathematics and/or related disciplines.
As emphasised above, however, we do also want you to understand why certain
methods work: this is one of the ‘skills’ that you should aim to acquire. The examination
will test not simply your ability to perform routine calculations, but will probe your
knowledge and understanding of the fundamental principles underlying the area.

1.1.2 Learning outcomes


We now state the broad learning outcomes of this course, as a whole. More specific
learning outcomes can be found at the end of each chapter.
At the end of this course and having completed the reading and activities you should
have:


used the concepts, terminology, methods and conventions covered in the course to
solve mathematical problems in this subject

the ability to solve unseen mathematical problems involving understanding of these
concepts and application of these methods

seen how algebra can be used to solve problems in economics and related subjects

the ability to demonstrate knowledge and understanding of the underlying
principles.
There are a couple of things we should stress at this point. First, note the intention that
you will be able to solve unseen problems. This means simply that you will be
expected to be able to use your knowledge and understanding of the material to solve
problems that are not completely standard. This is not something you should worry
unduly about: all mathematics topics expect this, and you will never be expected to do
anything that cannot be done using the material of this course. Second, we expect you
to be able to ‘demonstrate knowledge and understanding’ and you might well wonder
how you would demonstrate this in the examination. Well, it is precisely by being able
to grapple successfully with unseen, non-routine, questions that you will indicate that
you have a proper understanding of the topic.

1.1.3 Route map to the guide


Descriptions of topics to be covered appear in the relevant chapters. However, it is
useful to give a brief overview at this stage.
We start with a preliminary chapter, before we embark on the main material of
MT1173 Algebra. This preliminary chapter serves two purposes. It quickly discusses
some basics, things you may have seen before. It also contains some discussion about
how we prove things in mathematics, and for many of you this will be new material.
This subject is not a very formal ‘abstract’ one, but it is nonetheless useful for you to
begin to think about how we prove or disprove (establish truth or falsity) of
mathematical statements.
Then we move onto the main part of the guide. We start by introducing the
fundamental objects of study in linear algebra: these are matrices and vectors. We
examine the geometrical interpretation of vectors and use this to understand how to
find the equations of planes, for instance. Then we show how certain very important
types of equations — systems of simultaneous linear equations — can be looked at from
the perspective of matrices, and we develop systematic techniques (using, especially
‘row operations’) for solving such systems (or detecting that no solutions exist).
Continuing our study of matrices, we look at the idea of the inverse of a matrix and
show that inverses can be determined in two distinct ways, one of which uses row
operations and another the important notion of the ‘determinant’ of a matrix. We then
depart from linear algebra to look at sequences, series and difference equations. This is
an important topic in its own right, but also has many applications to finance and
economics. We will see later how linear algebra techniques can be used to solve certain
problems in difference equations, so it is not unrelated to linear algebra. Next, we
discuss basic ‘vector space’ concepts such as subspaces, linear independence, bases and
dimension, and linear transformations. Finally, we study the diagonalisation of matrices
and some of its many applications.
Throughout, the emphasis is on the theory as much as on the methods. That is to say,
the aim in this course is not only to provide you with useful techniques and methods of
algebra, but to enable you to understand why these techniques work.

1.2 Reading

1.2.1 Essential reading

The guide closely follows the following textbook (which is also available as an e-book):
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. (Cambridge
University Press, 2012) [ISBN 9780521279482].
You will need a copy of this book. The guide repeats some of the content of the book,
but the book has much more extensive discussion and explanation, which will help you
understand better. The guide will make frequent references to the book, asking you to
read various sections of it as you work through the guide. We will often abbreviate it, in
such references, to ‘A-H’. The guide contains some exercises for you to attempt, but the
main source of exercises for most chapters will be the textbook (which contains full
solutions to its exercises). We will also regularly ask you to attempt some Problems
from the book, and discussion of how to approach these will be found on the VLE.
The textbook by Anthony and Harvey covers almost all the material needed for this
course. (It also contains much more, in Chapters 10 to 13, that is not needed for this
course but which will be useful if you continue the study of linear algebra after this
course). There is one reasonably small exception: the book does not cover Sequences,
Series and Difference Equations. For that, we recommend the following textbook (also
available as an e-book):
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. (Cambridge: Cambridge University Press, 1996) [ISBN
9780521551137 (hardback); 9780521559138 (paperback)].

1.2.2 Further reading


Please note that as long as you read the Essential reading you are then free to read
around the subject area in any text, paper or online resource. It is sometimes useful to
read other textbooks to deepen your understanding and to work through additional
exercises. There are many other books that would be useful for this purpose. Almost
any text on linear algebra will be useful to some extent. The easiest way to locate a
given topic in another textbook is to use the index or table of contents. To help you
read extensively, you have free access to the virtual learning environment (VLE) and
University of London Online Library (see below).
Two examples of additional textbooks that will be useful are the following, though
many, many others will be useful too.
Anton, H. and C. Rorres. Elementary Linear Algebra with Supplemental
Applications (International Student Version). (John Wiley & Sons (Asia) Pte Ltd,
2010) tenth edition. [ISBN 9780470561577].¹

Lay, D.C. Linear Algebra and its Applications. (Pearson Education, Inc., 2013)
[ISBN 9781292020556].²

¹ There are many editions and variants of this book, such as the ‘Applications version’. Any one is
equally useful and you will not need more than one of them. You can find the relevant sections cited in
this guide in any edition by using the index.
² Any earlier edition of this text will also be useful.

1.3 Online study resources


In addition to the subject guide and the Essential reading, it is crucial that you take
advantage of the study resources that are available online for this course, including the
VLE and the Online Library.
You can access the VLE, the Online Library and your University of London email
account via the Student Portal at:
http://my.londoninternational.ac.uk
You should receive your login details in your study pack. If you have forgotten these
login details, please click on the ‘Forgotten your password’ link on the login page.

1.3.1 The VLE


The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:

Self-testing activities: Doing these allows you to test your own understanding of
subject material.
Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.
Past examination papers and Examiners’ commentaries: These provide advice on
how each examination question might best be answered.
A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.
Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.
Recorded lectures: For some courses, where appropriate, the sessions from previous
years’ Study Weekends have been recorded and made available.


Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.

Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.

1.3.2 Making use of the Online Library


The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library you will either need to use
your University of London Student Portal login details, or you will be required to
register and use an Athens login:
http://tinyurl.com/ollathens
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages:
www.external.shl.lon.ac.uk/summon/about.php

1.4 Using the guide


You must do the essential reading as you work through the guide. The exercises
suggested at the end of the chapters of this guide are a vital part of the learning
process. It is important, to derive as much benefit from them as possible, to make a
good attempt at exercises before consulting the solutions. It is vital that you develop
and enhance your problem-solving skills and the only way to do this is to try lots of
examples.
The exercises at the end of the chapters of the textbook and guide are part of the
learning process. As such, they are not all supposed to be ‘sample examination
questions’. However, a Sample examination paper is given as the final chapter to this
guide. Some exercises will be easy or routine. Others will be more challenging, and
others more challenging still. These are there to help you learn and understand.

1.5 Examination
Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the virtual
learning environment (VLE) where you should be advised of any forthcoming changes.


You should also carefully check the rubric/instructions on the paper you actually sit
and follow those instructions. Remember, it is important to check the VLE for:

up-to-date information on examination and assessment arrangements for this course

where available, past examination papers and Examiners’ commentaries for the
course which give advice on how each question might best be answered.
There are no optional topics in this subject: you should do them all. This is reflected in
the structure of the examination paper. There are five questions (each worth 20 marks)
and all questions are compulsory.
Please do not think that the questions in a real examination will necessarily be very
similar to those in the Sample examination paper. An examination is designed (by
definition) to test you. You will get examination questions unlike questions in this
guide. The whole point of examining is to see whether you can apply knowledge in
familiar and unfamiliar settings. The Examiners (nice people though they are) have an
obligation to surprise you! For this reason, it is important that you try as many
examples as possible, from the guide and from the textbooks. This is not so that you
can cover any possible type of question the Examiners can think of! It’s so that you get
used to confronting unfamiliar questions, grappling with them, and finally coming up
with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.

1.6 The use of calculators


You will not be permitted to use calculators of any type in the examination. This is not
something that you should panic about: the examiners are interested in assessing that
you understand the key concepts, ideas, methods and techniques, and will set questions
which do not require the use of a calculator.


Chapter 2
Preliminaries

Introduction
Before we embark on the main material, we need quickly to point out some basic
mathematical skills that you need, and to introduce some concepts of mathematical
proof. Do not worry if you do not feel comfortable or familiar with this material. You
should not let it detain you too long: proceed with the rest of the guide, and you will
pick up what you need as you go.

Aims
This chapter has two aims:

Briefly discuss some basic mathematics and mathematical notation you probably
already know.

Introduce some mathematical terminology, some basics of logic and techniques of
proof.

Reading
For the first part of the chapter (concerning sets, notation and functions), you might
well find that you have studied most of these topics in previous mathematics courses
and that nearly all of the material is revision. But don’t worry if a topic is new to you.
We will mention the main results which you will need to know. If you are unfamiliar
with a topic, or if you find any of the topics difficult, then you should look up that topic
in any basic mathematics text. There are many textbooks you could consult. For
example, Chapter 2 of the Anthony and Biggs book might be useful.
For the material on proof, the discussion is relatively self-contained, so no additional
reading should be required.

(For full publication details, see Chapter 1.)

Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapter 2.

Synopsis
We discuss the basic notation and ideas associated with sets. We then look at the
standard number systems and some associated important notation. Algebraic
manipulation is an important skill you should possess, and we review that briefly, and
look at how to solve quadratic and simple polynomial equations. Then we gently start
to explore mathematical proof. A proof is a way of establishing that a mathematical
statement is true and you will encounter proofs at various points in your study of this
subject. We give some examples of different proof methods and explore some of the
underlying mathematical logic.

2.1 Some basic set theory

2.1.1 Sets
A set may be thought of as a collection of objects.¹ A set is usually described by
listing or describing its members inside curly brackets. For example, when we write
A = {1, 2, 3}, we mean that the objects belonging to the set A are the numbers 1, 2, 3
(or, equivalently, the set A consists of the numbers 1, 2 and 3). Equally (and this is
what we mean by ‘describing’ its members), this set could have been written as

A = {n | n is a whole number and 1 ≤ n ≤ 3}.

Here, the symbol | stands for ‘such that’. Often, the symbol ‘:’ is used instead, so that
we might write
A = {n : n is a whole number and 1 ≤ n ≤ 3}.

As another example, the set

B = {x | x is a reader of this guide}

has as its members all of you (and nothing else). When x is an object in a set A, we
write x ∈ A and say ‘x belongs to A’ or ‘x is a member of A’.
The set which has no members is called the empty set and is denoted by ∅. The empty
set may seem like a strange concept, but it has its uses.

2.1.2 Subsets
We say that the set S is a subset of the set T , and we write S ⊆ T , or S ⊂ T , if every
member of S is a member of T . For example, {1, 2, 5} ⊆ {1, 2, 4, 5, 6, 40}. The difference
between the two symbols is that S ⊂ T literally means that S is a proper subset of T ,
meaning not all of T , and S ⊆ T means that S is a subset of T and possibly (but not
necessarily) all of T . So in the example just given we could have also written,
{1, 2, 5} ⊂ {1, 2, 4, 5, 6, 40}.
¹ See Anthony and Biggs, Section 2.1.

2.1.3 Union and intersection
Given two sets A and B, the union A ∪ B is the set whose members belong to A or B
(or both A and B): that is,

A ∪ B = {x | x ∈ A or x ∈ B}.

Example 2.1 If A = {1, 2, 3, 5} and B = {2, 4, 5, 7}, then A ∪ B = {1, 2, 3, 4, 5, 7}.

Similarly, we define the intersection A ∩ B to be the set whose members belong to
both A and B:²

A ∩ B = {x | x ∈ A and x ∈ B}.

Activity 2.1 Suppose A = {1, 2, 3, 5} and B = {2, 4, 5, 7}. Find A ∩ B.
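These definitions translate directly into code, which can be handy for checking your
work (though computers play no part in this course or its examination). A minimal
sketch in Python, whose built-in set type provides union and intersection; the sets are
those of Example 2.1 and Activity 2.1:

```python
# The sets of Example 2.1 and Activity 2.1.
A = {1, 2, 3, 5}
B = {2, 4, 5, 7}

print(A | B)   # union A ∪ B: {1, 2, 3, 4, 5, 7}, as in Example 2.1
print(A & B)   # intersection A ∩ B: use this to check your answer to Activity 2.1
```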

2.1.4 Showing two sets are equal


Sets A and B are equal if they contain exactly the same objects. A rather less obvious
— but very useful — way of thinking about this is as follows: A = B means the same as
saying A ⊆ B and B ⊆ A. This is, in fact, quite often the best way to show that two
sets are equal, by showing first that A ⊆ B and then, also, that B ⊆ A.
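The two-inclusions test looks like this in code (a Python sketch; the function name
sets_equal is our own):

```python
def sets_equal(A, B):
    # A = B exactly when A ⊆ B and B ⊆ A.
    return A.issubset(B) and B.issubset(A)

print(sets_equal({1, 2, 3}, {3, 2, 1}))   # True: the order of listing is irrelevant
print(sets_equal({1, 2}, {1, 2, 3}))      # False: {1, 2} is a proper subset of {1, 2, 3}
```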

2.2 Numbers, algebra and equations

2.2.1 Numbers
There are some standard notations for important sets of numbers.³ The set R of real
numbers may be thought of as the points on a line. Each such number can be
described by a decimal representation.
The set of real numbers R includes the following subsets.
N, the set of natural numbers: N = {1, 2, 3, . . .}, also referred to as the positive
integers.
Z, the set of integers: {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}.
Q, the set of rational numbers: numbers of the form p/q with p, q ∈ Z, q ≠ 0; for
example, 2/5, −9/2, 4/1 = 4.
The set of irrational numbers, that is, real numbers which are not rational; for
example, √2, π.
These sets are related by: N ⊂ Z ⊂ Q ⊂ R.
Given two real numbers a and b, we define intervals such as,

(a, b) = {x| a < x < b} [a, b] = {x| a ≤ x ≤ b}


and combinations of these. For example, [a, b) = {x | a ≤ x < b}. The numbers a and b
are called the endpoints of the interval. You should notice that when a square bracket,
‘[’ or ‘]’, is used to denote an interval, the number beside the bracket is included in the
interval, whereas if a round bracket, ‘(’ or ‘)’, is used, the adjacent number is not in the
interval. For example, [2, 3] contains the number 2, but (2, 3] does not. We can also
indicate unbounded intervals, such as

(−∞, b) = {x | x < b}   [a, ∞) = {x | a ≤ x}.

The symbol ∞ means ‘infinity’, but it is not a real number, merely a notational
convenience.

² See Anthony and Biggs for examples of union and intersection.
³ See Anthony and Biggs, Section 2.1.
The absolute value of a real number a is defined by:

|a| = a if a ≥ 0,  and  |a| = −a if a ≤ 0.

So the absolute value of a equals a if a is non-negative (that is, if a ≥ 0), and equals −a
otherwise. For instance, |6| = 6 and | − 2.5| = 2.5. (This is sometimes called the
modulus of a). Roughly speaking, the absolute value of a number is obtained just by
ignoring any minus sign the number has. Note that

√(a²) = |a|,

since by √x we always mean the positive square root to avoid ambiguity. So the two
solutions of the equation x² = 4 are x = ±2 (meaning x = 2 or x = −2), but √4 = 2.
The absolute value of real numbers satisfies the following inequality,

|a + b| ≤ |a| + |b|, a, b ∈ R.

This is called the triangle inequality.
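If you want to experiment with these facts numerically, the following Python sketch
uses the built-in abs function (the particular values chosen are just illustrations):

```python
import math

a, b = 6, -2.5

print(abs(a), abs(b))                 # |6| = 6 and |-2.5| = 2.5
print(math.sqrt(a**2) == abs(a))      # sqrt(a^2) = |a|: True
print(abs(a + b) <= abs(a) + abs(b))  # the triangle inequality: True
```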


Having defined R, we can define the set R² of ordered pairs (x, y) of real numbers.
Thus R² is the set usually depicted as the set of points in a plane, x and y being the
coordinates of a point with respect to a pair of axes. For instance, (−1, 3/2) is an
element of R² lying to the left of and above (0, 0), which is known as the origin.

2.2.2 Basic notations


Although there is a high degree of standardisation of notation within mathematical
texts, some differences do occur. The notation given here is indicative of what is used in
the rest of this guide and in most texts.⁴ You should endeavour to familiarise yourself
with as many of the common notations as possible. As an example, multiplication is
sometimes denoted by a dot, as in a · b rather than a × b. Beware of confusing
multiplication and the use of a dot to indicate a decimal point. Even more commonly,
one simply uses ab to denote the multiplication of a and b. Also, you should be aware of
implied multiplications, as in 2(3) = 6.

⁴ You may consult any of a large number of basic maths texts for further information on basic notations.
Some other useful notations are those for sums and factorials. We denote the sum

x₁ + x₂ + · · · + xₙ

of the numbers x₁, x₂, . . . , xₙ by

∑_{i=1}^{n} xᵢ.

The ‘Σ’ indicates that numbers are being summed, and the ‘i = 1’ and n below and
above the Σ show that it is the numbers xᵢ, as i runs from 1 to n, that are being
summed together. Sometimes we will be interested in adding up only some of the
numbers. For example,

∑_{i=2}^{n−1} xᵢ

would denote the sum x₂ + x₃ + · · · + xₙ₋₁, which is the sum of all the numbers except
the first and last.

Example 2.2 Suppose that x₁ = 1, x₂ = 3, x₃ = −1, x₄ = 5. Then

∑_{i=1}^{4} xᵢ = 1 + 3 + (−1) + 5 = 8,   ∑_{i=2}^{4} xᵢ = 3 + (−1) + 5 = 7.

For a positive whole number n, n! (read as n factorial) is the product of all the integers
from 1 up to n. For example, 4! = 1 · 2 · 3 · 4 = 24. By convention 0! is taken to be 1.
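Sigma notation and factorials correspond to the built-in sum and math.factorial in
Python; a small sketch using the numbers of Example 2.2:

```python
import math

x = [1, 3, -1, 5]   # x_1, ..., x_4 (Python lists are indexed from 0)

print(sum(x))             # the sum for i = 1 to 4: 8
print(sum(x[1:]))         # the sum for i = 2 to 4, dropping the first term: 7
print(math.factorial(4))  # 4! = 24
print(math.factorial(0))  # 0! = 1, by convention
```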
Finally, we often use the symbol □ to denote the end of a proof, where we have finished
explaining why a particular result is true. This is just to make it clear where the proof
ends and the following text begins.

2.2.3 Simple algebra


You should try to become confident and capable in handling simple algebraic
expressions and equations.
You should be proficient in:

collecting up terms: e.g.


2a + 3b − a + 5b = a + 8b.

multiplication of variables: e.g.

(−a)(b) + (a)(−b) − 3(a)(b) + (−2a)(−4b) = −ab − ab − 3ab + 8ab = 3ab.

expansion of bracketed terms: e.g.

(2x − 3y)(x + 4y) = 2x² − 3xy + 8xy − 12y² = 2x² + 5xy − 12y².

Activity 2.2 Expand (x − 1)(x + 1). Then use this to expand (x − 1)(x + 1)(x + 2).
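You can check expansions like these (including your answer to Activity 2.2) with a
computer algebra system. A sketch assuming the third-party sympy library is installed:

```python
from sympy import expand, symbols

x, y = symbols('x y')

# The worked example above:
print(expand((2*x - 3*y)*(x + 4*y)))    # 2*x**2 + 5*x*y - 12*y**2

# Activity 2.2 (run this to check your own answer):
print(expand((x - 1)*(x + 1)*(x + 2)))
```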

2.2.4 Powers
When n is a positive integer, the nth power⁵ of the number a, denoted aⁿ, is simply
the product of n copies of a, that is,

aⁿ = a × a × a × · · · × a  (n times).

The number n is called the power, exponent, or index. We have the power rules (or
rules of exponents):

aʳaˢ = aʳ⁺ˢ,   (aʳ)ˢ = aʳˢ,

whenever r and s are positive integers.

Activity 2.3
Prove the power rules above using the definition of aⁿ for n ∈ N.

The power a⁰ is defined to be 1.

The definition is extended to negative integers as follows. When n is a positive integer,
a⁻ⁿ means 1/aⁿ. For example, 3⁻² is 1/3² = 1/9. The power rules hold when r and s
are any integers, positive, negative or zero.
When n is a positive integer, a^(1/n) is the positive nth root of a; this is the positive
number x such that xⁿ = a. For example, a^(1/2) is usually denoted by √a, and is the
positive square root of a, so that 4^(1/2) = 2.
When m and n are integers and n is positive, a^(m/n) is (a^(1/n))^m. This extends the
definition of powers to the rational numbers. The definition is extended to real numbers
by ‘filling in the gaps’ between the rational numbers, and it can be shown that the rules
of exponents still apply.
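A few numerical spot checks of the power rules in Python, where ** is the power
operator (these checks illustrate, but of course do not prove, the rules; see Activity 2.3
for the proof):

```python
a, r, s = 2.0, 3, -2

print(a**r * a**s == a**(r + s))  # a^r a^s = a^(r+s): True
print((a**r)**s == a**(r*s))      # (a^r)^s = a^(rs): True
print(4**0.5)                     # 4^(1/2), the positive square root: 2.0
print(8**(2/3))                   # 8^(2/3): approximately 4.0 (floating point)
```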

Activity 2.4 Simplify the expression:

49x⁻²/(35y) − 4xy²/(2xy)³.

2.2.5 Quadratic equations


It is straightforward to find the solution of a linear equation, one of the form ax + b = 0
where a, b ∈ R. By a solution, we mean a real number x for which the equation is true.
A common problem is to find the set of solutions of a quadratic equation⁶

ax² + bx + c = 0,

where we may as well assume that a ≠ 0, because if a = 0 the equation reduces to a
linear one. In some cases the quadratic expression can be factorised, which means that
it can be written as the product of two linear terms. For example,

x² − 6x + 5 = (x − 1)(x − 5),

so the equation x² − 6x + 5 = 0 becomes (x − 1)(x − 5) = 0. Now the only way that two
numbers can multiply to give 0 is if at least one of the numbers is 0, so we can conclude
that x − 1 = 0 or x − 5 = 0; that is, the equation has two solutions, 1 and 5.

⁵ See Anthony and Biggs, Section 7.1.
⁶ See Anthony and Biggs, Section 2.4.

Activity 2.5
Use factorisation to find the solutions of each of these equations:

(a) x2 − 4 = 0, (b) x2 + 2x − 8 = 0, (c) 2x2 − 7x + 3 = 0.

Although factorisation may be difficult, there is a general method for determining the
solutions to a quadratic equation using the quadratic formula,7 as follows. Suppose
we have the quadratic equation ax2 + bx + c = 0, where a 6= 0. Then the solutions of
this equation are:
√ √
−b − b2 − 4ac −b + b2 − 4ac
x1 = x2 = .
2a 2a
The term b2 − 4ac is called the discriminant.

If b2 − 4ac > 0, the equation has two real solutions as given above.

If b2 − 4ac = 0, the equation has exactly one solution, x = −b/(2a). (In this case we
say that this is a solution of multiplicity two.)

If b2 − 4ac < 0, the equation has no real solutions.


For example, consider the equation 2x2 − 7x + 3 = 0. Using the quadratic formula, we
have √ p
−b ± b2 − 4ac 7 ± 49 − 4(2)(3) 7±5
x= = =
2a 2(2) 4
So the solutions are x = 3 and x = 21 .
The equation x2 + 6x + 9 = 0 has one solution of multiplicity two; its discriminant is
b2 − 4ac = 36 − 9(4) = 0. This equation is most easily solved by recognising that
x2 + 6x + 9 = (x + 3)2 , so the solution is x = −3.
On the other hand, consider the quadratic equation x2 − 2x + 3 = 0; here we have
a = 1, b = −2, c = 3. The quantity b2 − 4ac < 0, so this equation has no real solution.
(It does have solutions in complex numbers, but this is outside the scope of this course.)
This is less mysterious than it may seem. We can write the equation as (x − 1)2 + 2 = 0.
Rewriting the left-hand side of the equation in this form is known as completing the
square. Now, the square of a number is always greater than or equal to 0, so the
quantity on the left of this equation is always at least 2 and is therefore never equal to
0. The quadratic formula for the solutions to a quadratic equation is obtained using the
technique of completing the square.8 Quadratic polynomials which cannot be written as
a product of linear terms (so ones for which the discriminant is negative) are said to be
irreducible.
7
See Anthony and Biggs, Section 2.4.
8
See Anthony and Biggs, Section 2.4, if you haven’t already.

15
2. Preliminaries

2
2.2.6 Polynomial equations
A polynomial of degree n in x is an expression of the form,

Pn (x) = a0 + a1 x + a2 x2 + . . . + an xn

6 0, and x is a real variable. For example, a


where the ai are real constants, an =
quadratic expression such as those discussed above, is a polynomial of degree 2.
In general, a polynomial equation of degree n has at most n solutions. For example,
since
x3 − 7x + 6 = (x − 1)(x − 2)(x + 3),
the equation x3 − 7x + 6 = 0 has three solutions; namely, 1, 2, −3. The solutions of the
equation Pn (x) = 0 are called the roots or zeros of the polynomial. Unfortunately,
there is no general straightforward formula (as there is for quadratics) for the solutions
to Pn (x) = 0 for polynomials Pn of degree larger than 2.
To find the solutions to P (x) = 0 where P is a polynomial of degree n, we use the fact
that if α is such that P (α) = 0, then (x − α) must be a factor of P (x). We find such an
a by trial and error and then write P (x) in the form (x − α)Q(x), where Q(x) is a
polynomial of degree n − 1.
As an example, we’ll use this method to factorise the cubic polynomial x3 − 7x + 6.
Note that if this polynomial can be expressed as a product of linear factors, then it will
be of the form,
x3 − 7x + 6 = (x − r1 )(x − r2 )(x − r3 )
where its constant term is the product of the roots: 6 = −r1 r2 r3 . (To see this, just
substitute x = 0 into both sides of the above equation.) So if there is an integer root, it
will be a factor of 6. We will try x = 1. Substituting this value for x, we do indeed get
1 − 7 + 6 = 0, so (x − 1) is a factor. Then we can deduce that

x3 − 7x + 6 = (x − 1)(x2 + λx − 6)

for some number λ, as the coefficient of x2 must be 1 for the product to give x3 , and the
constant term must be −6 so that (−1)(−6) = 6, the constant term in the cubic. It only
remains to find λ. This is accomplished by comparing the coefficients of either x2 or x in
the cubic polynomial and the product. The coefficient of x2 in the cubic is 0, and in the
product the coefficient of x2 is obtained from the terms (−1)(x2 ) + (x)(λx), so that we
must have λ − 1 = 0 or λ = 1. Then

x3 − 7x + 6 = (x − 1)(x2 + x − 6),

and the quadratic term is easily factored into (x − 2)(x + 3), that is

x3 − 7x + 6 = (x − 1)(x − 2)(x + 3).

2.3 Mathematical statements and proof


In the next part of this chapter, we introduce the topics of mathematical statement and
proof. We start by giving some explicit examples. Later, we discuss some general theory

16
2.3. Mathematical statements and proof

2
and principles. Our discussion of this topic is limited because this course is not a course
in logic or proof, as such. What we do need is enough logic to understand what
mathematical statements mean and how we might prove or disprove them.
Consider the following statements (in which, you should recall that the natural numbers
are the positive integers):

(a) 20 is divisible by 4.

(b) 21 is not divisible by 7.

(c) 21 is divisible by 4.

(d) 21 is divisible by 3 or 5.

(e) 50 is divisible by 2 and 5.

(f) n2 is even.

(g) For every natural number n, the number n2 + n is even.

(h) There is a natural number n such that 2n = 2n .

(i) If n is even, then n2 is even.

(j) For all odd numbers n, n2 is odd.

(k) For natural numbers n, n2 is even if and only if n is even.


These are all mathematical statements, of different sorts. Statements (a) to (e) are
straightforward propositions about certain numbers, and these are either true or false.
Statements (d) and (e) are examples of compound statements. Statement (d) is true
precisely when either one (or both) of the statements ‘21 is divisible by 3’ and ‘21 is
divisible by 5’ is true. Statement (e) is true precisely when both of the statements ‘50 is
divisible by 2’ and ‘50 is divisible by 5’ are true.
Statement (f) is different, because the number n is not specified and whether the
statement is true or false will depend on the value of the so-called ‘free variable’ n. Such
a statement is known as a predicate.
Statement (g) makes an assertion about all natural numbers and is an example of a
universal statement.
Statement (h) asserts the existence of a particular number and is an example of an
existential statement.
Statement (i) can be considered as an assertion about all even numbers, and so it is a
universal statement, where the ‘universe’ is all even numbers. But it can also be
considered as an implication, asserting that if n happens to be even, then n2 is even.
Statement (j) is a universal statement about all odd numbers. It can also be thought of
(or rephrased) as an implication, for it says precisely the same as ‘if n is odd, then n2 is
odd’.
Statement (k) is an ‘if and only if’ statement: what it says is that n2 is even, for a
natural number n, precisely when n is even. But this means two things: namely that n2

17
2. Preliminaries

2
is even if n is even, and n is even if n2 is even. Equivalently, it means that n2 is even if
n is even and that n2 is odd if n is odd. So statement (k) will be true precisely if (i) and
(j) are true.

2.4 Introduction to proving statements


We’ve seen, above, various types of mathematical statement, and such statements are
either true or false. But how would we establish the truth or falsity of these?
We can, even at this early stage, prove (by which we mean establish the truth of) or
disprove (by which we mean establish the falsity of) most of the statements given
above. Here’s how we can do this.

(a) 20 is divisible by 4.
This statement is true. Yes, yes, we know it’s ‘obvious’, but stay with us. To give a
proper proof, we need first to understand exactly what the word ‘divisible’ means.
You will probably most likely think that this means that when we divide 20 by 4
we get no remainder. This is correct: in general, for natural numbers n and d, to
say that n is divisible by d (or, equivalently, that n is a multiple of d) means
precisely that there is some natural number m for which n = md. Since 20 = 5 × 4,
we see that 20 is divisible by 4. And that’s a proof! It’s utterly convincing,
watertight, and not open to debate.

(b) 21 is not divisible by 7.


This is false. It’s false because 21 is divisible by 7, because 21 = 3 × 7.

(c) 21 is divisible by 4.
This is false, as can be established in a number of ways. First, we note that if the
natural number m satisfies m ≤ 5, then m × 4 will be no more than 20. And if
m ≥ 6 then m × 4 will be at least 24. Well, any natural number m is either at most
5 or at least 6 so, for all possible m, we do not have m × 4 = 21 and hence there is
no natural number m for which m × 4 = 21. In other words, 21 is not divisible by 4.
Another argument (which is perhaps more straightforward, but which relies on
properties of rational numbers rather than just simple properties of natural
numbers) is to note that 21/4 = 5.25, and this is not a natural number, so 21 is not
divisible by 4. (This second approach is the same as showing that 21 has remainder
1, not 0, when we divide by 4.)

(d) 21 is divisible by 3 or 5.
As we noted above, this is a compound statement and it will be true precisely when
one (or both) of the following statements is true:
(i) 21 is divisible by 3
(ii) 21 is divisible by 5.
Statement (i) is true, because 21 = 7 × 3. Statement (ii) is false. Because at least
one of these two statements is true, statement (d) is true.

18
2.4. Introduction to proving statements

2
(e) 50 is divisible by 2 and 5.
This is true. Again, this is a compound statement and it is true precisely if both of
the following statements are true:
(i) 50 is divisible by 2
(ii) 50 is divisible by 5.
Statements (i) and (ii) are indeed true because 50 = 25 × 2 and 50 = 10 × 5. So
statement (e) is true.

(f) n2 is even.
As mentioned above, whether this is true or false depends on the value of n. For
example, if n = 2 then n2 = 4 is even, but if n = 3 then n2 = 9 is odd. So, unlike
the other statements (which are propositions), this is a predicate P (n). The
predicate will become a proposition when we assign a particular value to n to it,
and the truth or falsity of the proposition can then be established. Statements (i),
(j) and (k) below do this comprehensively.

(g) For every natural number n, the number n2 + n is even.


Here’s our first non-immediate, non-trivial, proof. How on earth can we prove this,
if it is true, or disprove it, if it is false? Suppose it was false. How would you
convince someone of that? Well, the statement says that for every natural number
n, n2 + n is even. So if you managed (somehow!) to find a particular N for which
N 2 + N happened to be odd, you could prove the statement false by simply
observing that ‘When n = N , it is not the case that n2 + n is even.’ And that
would be the end of it. So, in other words, if a universal statement about natural
numbers is false, you can prove it is false by showing that its conclusion is false for
some particular value of n. But suppose the statement is true. How could you prove
it? Well, you could prove it for n = 1, then n = 2, then n = 3, and so on, but at
some point you would expire and there would still be numbers n that you hadn’t
yet proved it for. And that simply wouldn’t do, because if you proved it true for
the first 9999 numbers, it might be false when n = 10000. So what you need is a
more sophisticated, general argument that shows the statement is true for any
arbitrary n.
Now, it turns out that this statement is true. So we need a nice general argument
to establish this. Well, here’s one approach. We can note that n2 + n = n(n + 1).
The numbers n and n + 1 are consecutive natural numbers. So one of them is odd
and one of them is even. When you multiply any odd number and any even number
together, you get an even number, so n2 + n is even. Are you convinced? Maybe
not? We really should be more explicit. Suppose n is even. What that means is
that, for some integer k, n = 2k. Then n + 1 = 2k + 1 and hence

n(n + 1) = 2k(2k + 1) = 2 (k(2k + 1)) .

Because k(2k + 1) is an integer, this shows that n2 + n = n(n + 1) is divisible by 2;


that is, it is even. We supposed here that n was even. But it might be odd, in
which case we would have n = 2k + 1 for some integer k. Then

n(n + 1) = (2k + 1)(2k + 2) = 2 ((2k + 1)(k + 1)) ,

19
2. Preliminaries

2
which is, again, even, because (2k + 1)(k + 1) is an integer.
Right, we’re really proving things now. This is a very general statement, asserting
something about all natural numbers, and we have managed to prove it.
(h) There is a natural number n such that 2n = 2n .
This is an existential statement, asserting that there exists n with 2n = 2n . Before
diving in, let’s pause for a moment and think about how we might deal with such
statements. If an existential statement like this is true we would need only to show
that its conclusion (which in this case is 2n = 2n ) holds for some particular n. That
is, we need only find an n that works. If the statement is false, we have a lot more
work to do in order to prove that it is false. Because, to show that it is false, we
would need to show that, for no value of n does the conclusion hold. Equivalently,
for every n, the conclusion fails. So we’d need to prove a universal statement and,
as we saw in the previous example, that would require us to come up with a
suitably general argument.
In fact, this statement is true. This is because when n = 1 we have
2n = 2 = 21 = 2n .
(i) If n is even, then n2 is even.
This is true. The most straightforward way to prove this is to assume that n is
some (that is, any) even number and then show that n2 is even. So suppose n is
even. Then n = 2k for some integer k and hence n2 = (2k)2 = 4k 2 . This is even
because it is 2(2k 2 ) and 2k 2 is an integer.
(j) For all odd numbers n, n2 is odd.
This is true. The most straightforward way to prove this is to assume that n is any
odd number and then show that n2 is also odd. So suppose n is odd. Then
n = 2k + 1 for some integer k and hence n2 = (2k + 1)2 = 4k 2 + 4k + 1. To establish
that this is odd, we need to show that it can be written in the form 2K + 1 for
some integer K. Well, 4k 2 + 4k + 1 = 2(2k 2 + 2k) + 1. This is indeed of the form
2K + 1, where K is the integer 2k 2 + 2k. Hence n2 is odd.
Another possible way to prove this result is to prove that if n2 is even then n must
be even. We won’t do that, but let’s think about why it would be a possible
strategy. Suppose we were able to prove the following statement, which we’ll call Q:
Q: If n2 is even then n is even.
Why would that establish what we want (namely that if n is odd then n2 is odd)?
Suppose we have proved statement Q and suppose that n is odd. Then it must be
the case that n2 is odd. For, if n2 was not odd, it would be even and then Q would
tell us that this means n is even. But we have assumed n is odd. It cannot be both
even and odd, so we have reached a contradiction. By assuming that the opposite
conclusion holds (n2 even) we have shown that something impossible happens. This
type of argument is known as a proof by contradiction and it is often very
powerful. We will see more about this later.
(k) For natural numbers n, n2 is even if and only if n is even.
This is true. What we have shown in proving (i) and (j) is that if n is even then n2
is even, and if n is odd then n2 is odd. The first, (statement (i)) establishes that if

20
2.5. Some basic logic

2
n is even, then n2 is even. The second of these (statement (j)) establishes that n2 is
even only if n is even. This is because it shows that n2 is odd if n is odd, from
which it follows that if n2 is even, n must not have been odd, and therefore must
have been even. ‘If and only if’ statements of this type are very important. As we
see here, the proof of such statements breaks down into the proof of two ‘If-then’
statements.
These examples hopefully demonstrate that there are a wide range of statements and
proof techniques, and in the rest of this chapter we will explore these further.
Right now, one thing we hope comes out very clearly from these examples is that to
prove a mathematical statement, you need to know precisely what it means. Well, that
sounds obvious, but you can see how detailed we had to be about the meanings (that is,
the definitions) of the terms ‘divisible’, ‘even’ and ‘odd’. Definitions are very important.

2.5 Some basic logic


Mathematical statements can be true or false. Let’s denote ‘true’ by T and ‘false’ by F.
Given a statement, or a number of statements, it is possible to form other statements.
This was indicated in some of the examples above (such as the compound statements).
A technique known as the use of ‘truth tables’ enables us to define ‘logical operations’
on statements, and to determine when such statements are true. This is all a bit vague,
so let’s get down to some concrete examples.

2.5.1 Negation
The simplest way to take a statement and form another statement is to negate the
statement. The negation of a statement P is the statement ¬P (sometimes just
denoted ‘not P ’), which is defined to be true exactly when P is false. This can be
described in the very simple truth table, Table 2.1:

P ¬P
T F
F T

Table 2.1: The truth table for ‘negation’ or ‘not’

What does the table signify? Quite simply, it tells us that if P is true then ¬P is false
and if P is false then ¬P is true.

Example 2.3 If P is ‘20 is divisible by 3’ then ¬P is ‘20 is not divisible by 3’.


Here, P is false and ¬P is true.

It has, we hope, been indicated in the examples earlier in this chapter, that to disprove
a universal statement about natural numbers amounts to proving an existential
statement. That is, if we want to disprove a statement of the form ‘for all natural
numbers n, property p(n) holds’ (where p(n) is some predicate, such as ‘n2 is even’) we

21
2. Preliminaries

2
need only produce some N for which p(N ) fails. Such an N is called a
counterexample. Equally, to disprove an existential statement of the form ‘there is
some n such that property p(n) holds’, one would have to show that for every n, p(n)
fails. That is, to disprove an existential statement amounts to proving a universal one.
But, now that we have the notion of the negation of a statement we can phrase this a
little more formally. Proving that a statement P is false is equivalent to proving that
the negation ¬P is true. In the language of logic, therefore, we have the following:

The negation of a universal statement is an existential statement.


The negation of an existential statement is a universal statement.
More precisely,

The negation of the universal statement ‘for all n, property p(n) holds’ is the
existential statement ‘there is n such that property p(n) does not hold’.
The negation of the existential statement ‘there is n such that property p(n) holds’
is the universal statement ‘for all n, property p(n) does not hold’.
We could be a little more formal about this, by defining the negation of a predicate p(n)
(which, recall, only has a definitive true or false value once n is specified) to be the
predicate ¬p(n) which is true (for any particular n) precisely when p(n) is false. Then
we might say that

The negation of the universal statement ‘for all n, p(n) is true’ is the existential
statement ‘there is n such that ¬p(n) is true’.
The negation of the existential statement ‘there is n such that p(n) is true’ is the
universal statement ‘for all n, ¬p(n) is true’.
Now, let’s not get confused here. None of this is really difficult or new. We meet such
logic in everyday life. If we say ‘It rains every day in London’ then either this statement
is true or it is false. If it is false, it is because on (at least) one day it does not rain. The
negation (or disproof) of the statement ‘On every day, it rains in London’ is simply
‘There is a day on which it does not rain in London’. The former is a universal
statement (‘On every day, . . .’) and the latter is an existential statement (‘there is . . .’).
Or, consider the statement ‘There is a student who enjoys reading this guide’. This is
an existential statement (‘There is . . .’). This is false if ‘No student enjoys reading this
guide’. Another way of phrasing this last statement is ‘Every student reading this guide
does not enjoy it’. This is a more awkward expression, but it emphasises that the
negation of the initial, existential statement, is a universal one (‘Every student . . .’).
We hope these examples illustrate the point that much of logic is simple common sense.

2.5.2 Conjunction and disjunction


There are two very basic ways of combining propositions: through the use of ‘and’
(known as conjunction) and the use of ‘or’ (known as disjunction).
Suppose that P and Q are two mathematical statements. Then ‘P and Q’, also denoted
P ∧ Q, and called the conjunction of P and Q, is the statement that is true precisely
when both P and Q are true. For example, statement (e) above, which is

22
2.5. Some basic logic

2
‘50 is divisible by 2 and 5’
is the conjunction of the two statements
‘50 is divisible by 2’
‘50 is divisible by 5’
Statement (e) is true because both of these two statements are true. Table 2.2 gives the
truth table for the conjunction P and Q.

P Q P ∧Q
T T T
T F F
F T F
F F F

Table 2.2: The truth table for ‘and’

What Table 2.2 says is simply that P ∧ Q is true precisely when both P and Q are true
(and in no other circumstances).
Suppose that P and Q are two mathematical statements. Then ‘P or Q’, also denoted
P ∨ Q, and called the disjunction of P and Q, is the statement that is true precisely
when P , or Q, or both, are true. For example, statement (d) above, which is
‘21 is divisible by 3 or 5’
is the disjunction of the two statements
‘21 is divisible by 3’
‘21 is divisible by 5’
Statement (d) is true because at least one (namely the first) of these two statements is
true.
Note one important thing about the mathematical interpretation of the word ‘or’. It is
always used in the ‘inclusive-or’ sense. So P ∨ Q is true in the case when P is true, or Q
is true, or both. In some ways, this use of the word ‘or’ contrasts with its use in normal
everyday language, where it is often used to specify a choice between mutually exclusive
alternatives. (For example ‘You’re either with us or against us’.) But if someone said
‘Tomorrow I will wear brown trousers or I will wear a yellow shirt’ then, in the
mathematical way in which the word ‘or’ is used, the statement would be true if they
wore brown trousers and any shirt, any trousers and a yellow shirt, and also if they
wore brown trousers and a yellow shirt. You might have your doubts about their dress
sense in this last case, but, logically, it makes the statement true.
Table 2.3 gives the truth table for the disjunction P and Q.

P Q P ∨Q
T T T
T F T
F T T
F F F

Table 2.3: The truth table for ‘or’

23
2. Preliminaries

2
What Table 2.3 says is simply that P ∨ Q is true precisely when at least one of P and Q
is true.

2.6 Implications, converse and contrapositive

2.6.1 ‘If-then’ statements


It is very important to understand the formal meaning of the word ‘if’ in mathematics.
The word is often used rather sloppily in everyday life, but has a very precise
mathematical meaning. Let me give you an example. Suppose someone tells you ‘If it
rains, then I wear a raincoat’, and suppose that this is a true statement. Well, then,
suppose it rains. You can certainly conclude they will wear a raincoat. But what if it
does not rain? Well, you can’t conclude anything. The statement only tells you about
what happens if it rains. If it does not, then they might, or they might not, wear a
raincoat: and whether they do or not does not affect the truth of the statement. You
have to be clear about this: an ‘if-then’ statement only tells you about what follows if
something particular happens.
More formally, suppose P and Q are mathematical statements (each of which can
therefore be either true or false). Then we can form the statement denoted P =⇒ Q
(‘P implies Q’ or, equivalently, ‘if P , then Q’), which has as its truth table Table 2.4.
(This type of statement is known as an if-then statement or an implication.)

P Q P =⇒ Q
T T T
T F F
F T T
F F T

Table 2.4: The truth table for ‘P =⇒ Q’

Note that the statement P =⇒ Q is false only when P is true but Q is false. (To go
back to the previous example, the statement ‘If it rains, I wear a raincoat’ is false
precisely if it does rain but they do not wear a raincoat.) This is tricky, so you may
have to spend a little time understanding it. As we’ve suggested, perhaps the easiest
way is to think about when a statement ‘if P , then Q’ is false.
The statement P =⇒ Q can also be written as Q ⇐= P . There are different ways of
describing P =⇒ Q, such as:

if P then Q

P implies Q

P is sufficient for Q

Q if P

P only if Q

24
2.6. Implications, converse and contrapositive

2
Q whenever P

Q is necessary for P .
All these mean the same thing. The first two are the ones we will use most frequently.
If P =⇒ Q and Q =⇒ P then this means that Q will be true precisely when P is. That
is Q is true if and only if P is. We use the single piece of notation P ⇐⇒ Q instead of
the two separate P =⇒ Q and Q ⇐= P . There are several phrases for describing what
P ⇐⇒ Q means, such as:

P if and only if Q (sometimes abbreviated to ‘P iff Q’)

P is equivalent to Q

P is necessary and sufficient for Q

Q is necessary and sufficient for P .


The truth table is shown in Table 2.5, where we have also indicated the truth or falsity
of P =⇒ Q and Q =⇒ P to emphasise that P ⇐⇒ Q is the same as the conjunction
(P =⇒ Q) ∧ (Q =⇒ P ).

P Q P =⇒ Q Q =⇒ P P ⇐⇒ Q
T T T T T
T F F T F
F T T F F
F F T T T

Table 2.5: The truth table for ‘P ⇐⇒ Q’

What the table shows is that P ⇐⇒ Q is true precisely when P and Q are either both
true or both false.

2.6.2 Converse statements


Given an implication P =⇒ Q, the ‘reverse’ implication Q =⇒ P is known as its
converse. Generally, there is no reason why the converse should be true just because
the implication is. For example, consider the statement ‘If it is Tuesday, then John buys
the Guardian newspaper.’ The converse is ‘If John buys the Guardian newspaper, then
it is Tuesday’. Well, John might buy that newspaper on other days too, in which case
the implication can be true but the converse false. We’ve seen, in fact, that if both
P =⇒ Q and Q =⇒ P then we have a special notation, P ⇐⇒ Q, for this situation.
Generally, then, the truth or falsity of the converse Q =⇒ P has to be determined
separately from that of the implication P =⇒ Q.

Activity 2.6 What is the converse of the statement ‘if the natural number n
divides 4 then n divides 12’ ? Is the converse true? Is the original statement true?

25
2. Preliminaries

2
2.6.3 Contrapositive statements
The contrapositive of an implication P =⇒ Q is the statement ¬Q =⇒ ¬P . The
contrapositive is equivalent to the implication, as Table 2.6 shows. (The columns
highlighted in bold are identical.)

P Q P =⇒ Q ¬P ¬Q ¬Q =⇒ ¬P
T T T F F T
T F F F T F
F T T T F T
F F T T T T

Table 2.6: The truth tables for P =⇒ Q and ¬Q =⇒ ¬P .

If you think about it, the equivalence of the implication and its contrapositive makes
sense. Because, ¬Q =⇒ ¬P says that if Q is false, P is false also. So, it tells us that we
cannot have Q false and P true, which is precisely the same information as is given by
P =⇒ Q.
So what’s the point of this? Well, sometimes you might want to prove P =⇒ Q and it
will, in fact, be easier to prove instead the equivalent (contrapositive) statement
¬Q =⇒ ¬P . See Anthony and Biggs, section 3.5 for an example.

2.7 Proof by contradiction


We’ve seen a small example of proof by contradiction earlier in the chapter. Suppose
you want to prove P =⇒ Q. One way to do this is by contradiction. What this means is
that you suppose P is true but Q is false (in other words, that the statement P =⇒ Q
is false) and you show that, somehow, this leads to a conclusion that you know,
definitely, to be false.
Here’s an example.

Example 2.4 There are no integers m, n such that 6m + 8n = 1099.


Proof
To prove this by contradiction, we can argue as follows:
Suppose that integers m, n do exist such that 6m + 8n = 1099. Then since 6 is even,
6n is also even; and, since 8 is even, 8n is even. Hence 6m + 8n, as a sum of two even
numbers, is even. But this means 1099 = 6m + 8n is an even number. But, in fact, it
is not even, so we have a contradiction. It follows that m, n of the type required do
not exist.

This sort of argument can be a bit perplexing when you first meet it. What’s going on
in the example just given? Well, what we show is that if such m, n exist, then something
impossible happens: namely the number 1099 is both even and odd. Well, this can’t be.
If supposing something leads to a conclusion you know to be false, then the initial
supposition must be false. So the conclusion is that such integers m, n do not exist.

26
2.8. Some terminology

2
2.8 Some terminology
At this point, it’s probably worth introducing some important terminology. When, in
mathematics, we prove a true statement, we often say we are proving a theorem, or a
proposition. (Usually the word ‘proposition’ is used if the statement does not seem
quite so significant as to merit the description ‘theorem’.) A theorem that is a
preliminary result leading up to a theorem is often called a lemma, and a minor
theorem that is a fairly direct consequence of, or special case of, a theorem is called a
corollary, if it is not significant enough itself to merit the title theorem. For your
purposes, it is important just to know that these words all mean true mathematical
statements. You should realise that these terms are used subjectively: for instance, the
person writing the mathematics has to make a decision about whether a particular
result merits the title ‘theorem’ or is, instead, merely to be called a ‘proposition’.

Overview
This chapter has explored some of the basics of algebra, together with an introduction
to mathematical proof. It is a preliminary chapter, and you should not let it detain you
from proceeding with the rest of the course. As we mentioned earlier, if you’re not
entirely comfortable with it, it is best to proceed anyway: particularly when it comes to
proving things, you can pick up the key ideas in context in what follows.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

demonstrate understanding of the key ideas and notations concerning sets


demonstrate an understanding of what mathematical statements are
demonstrate knowledge of standard mathematical notation
solve polynomial equations
prove whether certain mathematical statements are true or false
construct truth tables for simple logical statements
demonstrate knowledge of what is meant by conjunction and disjunction
demonstrate understanding of the meaning of ‘if-then’ statements and be able to
prove or disprove such statements
demonstrate understanding of the meaning of ‘if and only if’ statements and be
able to prove or disprove such statements
find the converse of statements
prove results by various methods, including directly, and by the method of proof by
contradiction

27
2. Preliminaries

2
Test your knowledge and understanding
Attempt the following exercises.

Exercises
Exercise 2.1
Simplify, then solve for a:
a
6ab − (b2 − 4bc) = 1.
b

Exercise 2.2
Given that the polynomial P (x) = x3 + 3x2 + 4x + 4 has an integer root, find it and
hence show that the polynomial can be expressed as a product P (x) = (x − r)Q(x)
where Q(x) is an irreducible quadratic polynomial.

Exercise 2.3
Is the following statement about natural numbers n true or false? Justify your answer
by giving a proof or a counterexample:

If n is divisible by 6 then n is divisible by 3.

What are the converse and contrapositive of this statement? Is the converse true? Is the
contrapositive true?

Exercise 2.4
Is the following statement about natural numbers n true or false? Justify your answer
by giving a proof or a counterexample:

If n is divisible by 2 then n is divisible by 4.

What are the converse and contrapositive of this statement? Is the converse true? Is the
contrapositive true?

Exercise 2.5
Prove by contradiction that there is no largest natural number.

Feedback on selected activities


Feedback to activity 2.1
A ∩ B is the set of objects in both sets, and so it is {2, 5}.

Feedback to activity 2.2


(x − 1)(x + 1) = x2 − 1. (x − 1)(x + 1)(x + 2) = x3 + 2x2 − x − 2.

28
2.8. Comments on exercises

2
Feedback to activity 2.3
We will show the first, and leave the second to you.
ar as = (a × a × a × · · · × a) (a × a × a × · · · × a) .
| {z }| {z }
r times s times

Removing the brackets, we have the product of a times itself a total of r + s times; that
is,
ar as = |a × a × a{z× · · · × a} = ar+s .
r+s times

Feedback to activity 2.4

49x−2 4xy 2 4xy 2


 
7 7 1 1 7 1 9 1
− = − = − = − =
35y (2xy)3 5x2 y 8x3 y 3 5x2 y 2x2 y x2 y 5 2 10 x2 y
Feedback to activity 2.5
(a) x2 − 4 = (x − 2)(x + 2) = 0, with solutions x = ±2.
(b) x2 + 2x − 8 = (x − 2)(x + 4) = 0, so x = 2 or x = −4
(c) 2x2 − 7x + 3 = (2x − 1)(x − 3) = 0, so x = 21 or x = 3.

Feedback to activity 2.6


The converse is ‘if n divides 12 then n divides 4’. This is false. For instance, n = 12 is a
counterexample. This is because 12 divides 12, but it does not divide 4. The original
statement is true, however. For, if n divides 4, then for some m ∈ Z, 4 = nm and hence
12 = 3 × 4 = 3nm = n(3m), which shows that n divides 12.

Comments on exercises
Solution to exercise 2.1
a
(a) 6ab − (b2 − 4bc) = 6ab − ab + 4ac = 5ab + 4ac = a(5b + 4c), so the equation
b
becomes a(5b + 4c) = 1, and solving for a:
1
a= , provided 5b + 4c 6= 0.
5b + 4c
Note that it is an important part of the solution to declare that it is only valid if
5b + 4c 6= 0, otherwise there is no solution.

Solution to exercise 2.2


Because all the terms are separated by + signs, the integer root must be a negative
number, so try x = −1. Substitution into the polynomial yields, −1 + 3 − 4 + 4 6= 0, so
−1 is not a root. Next try x = −2. This time it works, −8 + 3(4) + 4(−2) + 4 = 0, so
x3 + 3x2 + 4x + 4 = (x + 2)(x2 + λx + 2).
Comparing the coefficients of either x2 or x terms, you should obtain λ = 1. The
quadratic polynomial x2 + x + 2 cannot be factored over the real numbers, since its
discriminant is negative. Therefore
P (x) = x3 + 3x2 + 4x + 4 = (x + 2)(x2 + x + 2) = (x + 2)Q(x)

29
2. Preliminaries

2
where Q(x) is an irreducible quadratic polynomial.

Solution to exercise 2.3


The statement is true. Because, suppose n is divisible by 6. Then for some m ∈ N,
n = 6m, so n = 3(2m) and since 2m ∈ N, this proves that n is divisible by 3.
The converse is ‘If n is divisible by 3 then n is divisible by 6’. This is false. For example,
n = 3 is a counterexample: it is divisible by 3, but not by 6.
The contrapositive is ‘If n is not divisible by 3 then n is not divisible by 6’. This is true,
because it is logically equivalent to the initial statement, which we have proved to be
true.

Solution to exercise 2.4


The statement is false. For example, n = 2 is a counterexample: it is divisible by 2, but
not by 4.
The converse is ‘If n is divisible by 4 then n is divisible by 2’. This is true. Because,
suppose n is divisible by 4. Then for some m ∈ N, n = 4m, so n = 2(2m) and since
2m ∈ N, this proves that n is divisible by 2.
The contrapositive is ‘If n is not divisible by 4 then n is not divisible by 2’. This is false,
because it is logically equivalent to the initial statement, which we have proved to be
false. Alternatively, you can see that it’s false because 2 is a counterexample: it is not
divisible by 4, but it is divisible by 2.

Solution to exercise 2.5


Let’s prove by contradiction that there is no largest natural number. So suppose there is
a largest natural number. Let us call it N . (What we want to do now is somehow show
that a conclusion, or something we know for sure must be false, follows.) Well, consider
the number N + 1. This is a natural number. But since N is the largest natural number,
we must have N + 1 ≤ N , which means that 1 ≤ 0, and that’s nonsense. So it follows
that we must have been wrong in supposing there is a largest natural number. (That’s
the only place in this argument where we could have gone wrong.) So there is no largest
natural number. We could have argued the contradiction slightly differently. Instead of
using the fact that N + 1 ≤ N to obtain the absurd statement that 1 ≤ 0, we could
have argued as follows: N + 1 is a natural number. But N + 1 > N and this contradicts
the fact that N is the largest natural number.

30
Chapter 3 3
Matrices

Introduction
Matrices will be the main tool in our study of linear algebra, so we begin by learning
what they are and how to use them. This chapter contains a lot of definitions with
which you should become familiar, including terminology associated with a matrix and
the operations defined on matrices. All of the operations are defined purposefully to
ensure matrices are a useful tool, as we shall see in later chapters. In particular, the
definition of matrix multiplication may seem strange at first, but it turns out to be
exactly what we need.

Aims
The aims of this chapter are to:

Define a matrix, the terminology associated with a matrix, and the operations
defined on matrices.

Learn how to manipulate matrices algebraically using these operations.

Become familiar with the properties and rules of matrix operations, how they
combine and interact.

R
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 1,
Sections 1.1–1.7
This chapter of the subject guide closely follows the first half of Chapter 1 of the
textbook. You should read the corresponding sections of the textbook and work through
all the activities there while working through the sections of this subject guide.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

31
3. Matrices

Synopsis
3 We define a matrix and the basic terminology associated with a matrix (entry, row,
column, size, square matrix, diagonal matrix, equality of matrices) and then look at the
operations of addition, scalar multiplication and matrix multiplication. We show how to
algebraically manipulate matrices using these operations, and state what is meant by a
zero matrix and an identity matrix. We define the inverse of a square matrix, when it
exists, and its properties. Next we define what is meant by powers of a square matrix,
look at its properties and how it interacts with the inverse of a matrix. Then we define
the transpose of a matrix and what is meant by a symmetric matrix, and look at the
properties of the transpose of a matrix and how it interacts with the inverse of a matrix.

3.1 Definitions and terminology


Definition 3.1 (Matrix) A matrix is a rectangular array of numbers or symbols. It
can be written as
a11 a12 · · · a1n
 
 a21 a22 · · · a2n 
A= ... .. .. ..  .
. . . 
am1 am2 · · · amn

We denote this array by the single letter A or by (aij ), 1 ≤ i ≤ m, 1 ≤ j ≤ n, and we


say that A has m rows and n columns, or that it is an m × n matrix. We also say that
A is a matrix of size m × n.
The number aij in the ith row and jth column is called the (i, j)-entry. Note that the
first subscript on aij always refers to the row and the second subscript to the column.

Example 3.1 The matrix


 
4 0 9
 −2 5 −1 
A=
 7

3 0 
1 −1 2

is a 4 × 3 matrix whose entries are integers.

Activity 3.1 What is a31 for this matrix?

A square matrix is a matrix with the same number of rows as columns. The diagonal
of an n × n square matrix S is the list of entries s11 , s22 , . . . , snn .

Example 3.2 The matrix  


a b c
A = d e f 
g h i
is a 3 × 3 square matrix. The diagonal of this matrix is a, e, i.

32
3.2. Matrix addition and scalar multiplication

A diagonal matrix is a square matrix with all the entries which are not on the diagonal
equal to zero. So D = (dij ) is diagonal if it is n × n and dij = 0 if i 6= j,
3
d11 0 · · · 0
 
 0 d22 · · · 0 
D=  ... .. .. ..  .
. . . 
0 0 · · · dnn

Activity 3.2 Write down the diagonal matrix with the same diagonal as the matrix
A in the previous example.

Definition 3.2 (Equality of matrices) Two matrices are equal if they are the same
size and if corresponding entries are equal. That is, if A = (aij ) and B = (bij ) are both
m × n matrices, then

A = B ⇐⇒ aij = bij 1 ≤ i ≤ m, 1 ≤ j ≤ n

3.2 Matrix addition and scalar multiplication


If A and B are two matrices, then provided they are the same size we can add them
together to form a new matrix A + B. The entries of A + B are the sums of the
corresponding entries in A and B.
Definition 3.3 (Addition) If A = (aij ) and B = (bij ) are both m × n matrices, then

A + B = (aij + bij ) 1 ≤ i ≤ m, 1 ≤ j ≤ n

We can also multiply a matrix of any size by a real number, which we call a scalar in
this context. If λ is a scalar and A is a matrix, then λA is the matrix whose entries are
λ times each of the entries of A.
Definition 3.4 (Scalar multiplication) If A = (aij ) is an m × n matrix and λ ∈ R,
then
λA = (λaij ) 1 ≤ i ≤ m, 1 ≤ j ≤ n

We say that the matrix λA is a scalar multiple of the matrix A.

Example 3.3 If
   
1 2 −1 1 0 3
A= and B =
−2 3 5 −4 2 −1

Then what is A + 3B?


We have
     
1 2 −1 3 0 9 4 2 8
A + 3B = + = .
−2 3 5 −12 6 −3 −14 9 2

33
3. Matrices

We write −B for the matrix (−1)B and if A and B are m × n matrices, then A − B is
defined by A − B = A + (−1)B.
3
Activity 3.3 Find the matrix A − B for the matrices A and B in the above
example.

Activity 3.4 Write down the missing entries in the matrices below:
     
1 2 5 −1

R
+ = .
4 2 3 4
Read sections 1.1 and 1.2 of the text A-H, working through the activities there.

3.3 Matrix multiplication


Is there a way to multiply two matrices together? The answer is sometimes, depending
on the sizes of the matrices. If A and B are matrices such that the number of columns
of A is equal to the number of rows of B, then we can define a matrix C which is the
product of A and B. We do this by saying what the entry cij of the product matrix AB
should be.
Definition 3.5 (Matrix multiplication) If A is an m × n matrix and B is an n × p
matrix then the product is the matrix AB = C = (cij ) with
cij = ai1 b1j + ai2 b2j + · · · + ain bnj , 1 ≤ i ≤ m, 1 ≤ j ≤ p.

Although this formula looks daunting, it is quite easy to use in practice. What it says is
that the element in row i and column j of the product is obtained by taking each entry
of row i of A and multiplying it by the corresponding entry of column j of B, then
adding these n products together.
 
b1j
 
b2j
 
   
row i of A −→  a i1 a i2 · · · a in
 
  .
..


 
bnj

column j of B

What size is C = AB? The matrix C must be m × p since it will have one entry for
each of the m rows of A and each of the p columns of B.

Example 3.4 In the following product the element in row 2 and column 1 of the
product matrix (indicated in bold type) is found, as described above, by using the
row and column printed in bold type.
   
1 1 1   3 4
2 0 1  3 0
= 5 3 

AB =    1 1  −1 14 

1 2 4 
−1 3
2 2 −1 9 −1

34
3.4. Matrix algebra

This entry is 5 because

(2)(3) + (0)(1) + (1)(−1) = 5. 3


Notice the sizes of the three matrices. A is 4 × 3, B is 3 × 2, and the product AB
is 4 × 2.

We shall see in later chapters that this definition of matrix multiplication is exactly
what is needed for applying matrices in our study of linear algebra.
It is an important consequence of this definition that:

AB 6= BA in general. That is, matrix multiplication is not ‘commutative’.

To see just how non-commutative matrix multiplication is, let’s look at some examples,
starting with the two matrices A and B in the example above. The product AB is
defined, but the product BA is not even defined. Since A is 4 × 3 and B is 3 × 2 it is
not possible to multiply the matrices in the order BA.
Now consider the matrices
 
  3 1
2 1 3
A= and B = 1 0.
1 2 1
1 1

Both products AB and BA are defined, but they are different sizes, so they cannot be
equal. What sizes are they?

Activity 3.5 Answer the question just posed concerning the sizes of AB and BA.
Multiply the matrices to find the two product matrices, AB and BA.

Even if both products are defined and the same size, it is still generally true that
AB 6= BA.

Activity 3.6 Try this for any two 2 × 2 matrices. Write down two different matrices
A and B and find the products AB and BA. For example, you could use
   
1 2 1 1
A= B= .
3 4 0 1

3.4 Matrix algebra


Matrices are useful because they provide a compact notation and we can do algebra
with them.
For example, given a matrix equation such as

3A + 2B = 2(B − A + C),

35
3. Matrices

we can solve this for the matrix C using the rules of algebra. You must always bear in
mind that to perform the operations, they must be defined. In this equation it is
3 understood that all the matrices A, B and C are the same size, say m × n.
We list the rules of algebra satisfied by the operations of addition, scalar multiplication
and matrix multiplication. The sizes of the matrices are dictated by the operations
being defined.

A + B = B + A. Matrix addition is ‘commutative’.

This is easily shown to be true. We will carry out the proof as an example. The matrices
A and B must be of the same size, say m × n for the operation to be defined, so both
A + B and B + A are also m × n matrices. They also have the same entries. The (i, j)
entry of A + B is aij + bij and the (i, j) entry of B + A is bij + aij , but aij + bij = bij + aij
by the properties of real numbers. So the matrices A + B and B + A are equal.
On the other hand, as we have seen, matrix multiplication is not commutative:
AB 6= BA in general.
We have the following ‘associative’ laws:

(A + B) + C = A + (B + C)

λ(AB) = (λA)B = A(λB)

(AB)C = A(BC)

These rules allow us to remove brackets. For example the last rule says that we will get
the same result if we first multiply AB and then multiply by C on the right, as we will
if we first multiply BC and then multiply by A on the left, so the choice is ours.
All these rules follow from the definitions of the operations in the same way as we
showed the commutativity of addition. We need to know that the matrices on the left
and on the right of the equals sign have the same size and that corresponding entries
are equal. Only the associativity of multiplication presents any complications; it is
tedious, but it can be done.

Activity 3.7 Think about these rules. What sizes are each of the matrices?
Write down the (i, j) entry for each of the matrices λ(AB) and (λA)(B) and prove
that the matrices are equal.

36
3.4. Matrix algebra

Similarly, we have three ‘distributive’ laws:

A(B + C) = AB + AC 3
(B + C)A = BA + CA

λ(A + B) = λA + λB.

Why do we need both of the first two rules (which state that matrix multiplication
distributes through addition)? Since matrix multiplication is not commutative, we
cannot conclude the second distributive rule from the first; we have to prove it is true
separately. All these statements can be proved from the definitions using the same
technique as used earlier, but we will not take the time to do this here.
If A is an m × n matrix, what is the result of A − A? We obtain an m × n matrix all of
whose entries are 0. This is an ‘additive identity’: that is, it plays the same role for
matrices as the number 0 does for numbers, in the sense that A + 0 = 0 + A = A. There
is a zero matrix of any size m × n.
Definition 3.6 (Zero matrix) A zero matrix, denoted 0, is an m × n matrix with all
entries zero,  
0 0 ··· 0 0
0 0 ··· 0 0
0=  ... ... . . . ... ...  .

0 0 ··· 0 0

Then

A+0=A

A−A=0

0A = 0, A0 = 0

where the sizes of the zero matrices above must be compatible with the size of the
matrix A for the operations to be defined.

Activity 3.8 If A is a 2 × 3 matrix, write down the zero matrix for each of the rules
involving addition. What sizes are the zero matrices for the rules involving matrix
multiplication?

What about matrix multiplication? Is there a ‘multiplicative identity’, which acts like
the number 1 does for multiplication of numbers? The answer is ‘yes’.
Definition 3.7 (Identity matrix) The n × n identity matrix, denoted In or simply I,
is the diagonal matrix with aii = 1, 1 ≤ i ≤ n,
 
1 0 ··· 0
0 1 ··· 0
I=  ... ... . . . ...  .

0 0 ··· 1

37
3. Matrices

If A is any m × n matrix, then,

3 AI = A and IA = A,

where it is understood that the identity matrix is the appropriate size for the product
to be defined.

Activity 3.9 What size is the identity matrix if A is m × n and IA = A?

Activity 3.10 You can apply these rules to solve the matrix equation,

3A + 2B = 2(B − A + C)

for the matrix C. Do this; solve the equation for C, stating what rule of matrix

R
algebra you are using for each step of the solution.

You should now read sections 1.3 and 1.4 of the text A-H, working through the
activities there. You will find the solution of the last activity in the text.

3.5 Matrix inverses

3.5.1 Definition of a matrix inverse


If AB = AC, can we conclude that B = C? The answer is ‘no’, as the following example
shows.

Example 3.5 If
     
0 0 1 −1 8 0
A= , B= , C= ,
1 1 3 5 −4 4

then the matrices B and C are not equal, but


 
0 0
AB = AC = .
4 4

Activity 3.11 Check this by multiplying out the matrices.

On the other hand, If A + 5B = A + 5C, then we can conclude that B = C because the
operations of addition and scalar multiplication have inverses. If we have a matrix A,
then the matrix −A = (−1)A is an additive inverse because it satisfies A + (−A) = 0. If
we multiply a matrix A by a non-zero scalar c we can ‘undo’ this by multiplying cA
by 1/c.
What about matrix multiplication, is there a multiplicative inverse? The answer is
‘sometimes’.

38
3.5. Matrix inverses

Definition 3.8 (Inverse matrix) The n × n matrix A is invertible if there is a matrix


B such that
AB = BA = I 3
where I is the n × n identity matrix. The matrix B is called the inverse of A and is
denoted by A−1 .

Notice that the matrix A must be square, and that both I and B = A−1 must also be
square n × n matrices for the products to be defined.

Example 3.6 Let  


1 2
A= .
3 4
Then with  
−2 1
B= 3
2
− 12
we have AB = BA = I, so B = A−1 .

Activity 3.12 Check this. Multiply the matrices to show that AB = I and
BA = I, where I is the 2 × 2 identity matrix.

To show that a matrix B is equal to A−1 , find the matrix products AB and BA and
show that each product is equal to the identity matrix I.

You might have noticed that we have said that B is the inverse of A. This is because an
invertible matrix has only one inverse. We will prove this.
Theorem 3.1 If A is an n × n invertible matrix, then the matrix A−1 is unique.
Proof
Assume the matrix A has two inverses, B and C, so that AB = BA = I and
AC = CA = I. You need to show that B and C must actually be the same matrix, that
is, you need to show that B = C. Begin by the statement
B = BI = · · ·

R
and substitute an appropriate product for I until you obtain that B = C.

You can check your proof with the details in the text A-H, where this theorem is
labelled as Theorem 1.23.
Not all square matrices will have an inverse. We say that A is invertible or non-singular
if it has an inverse. We say that A is non-invertible or singular if it has no inverse.
 
0 0
Example 3.7 The matrix (used in Example 3.5) is not invertible. It is
1 1
not possible for a matrix to satisfy
    
0 0 a b 1 0
=
1 1 c d 0 1

39
3. Matrices

since the (1,1)-entry of the product is 0 and 0 6= 1.

3
On the other hand, if
 
a b
A= , where ad − bc 6= 0,
c d

then A has the inverse  


−1 1 d −b
A = .
ad − bc −c a

Activity 3.13 Check that this is indeed the inverse of A, by showing that if you
multiply A on the left or on the right by this matrix, then you obtain the identity
matrix I.

This tells us how to find the inverse of any 2 × 2 invertible matrix. If


 
a b
A= ,
c d

the scalar ad − bc is called the determinant of the matrix A, denoted |A|. We shall see
more about the determinant in Chapter 8. So if |A| = ad − bc 6= 0, then to construct
A−1 we take the matrix A, switch the main diagonal entries and put minus signs in
front of the other two entries, then multiply by the scalar 1/|A|.

Activity 3.14 Use this to find the inverse of the matrix


 
1 2
B= ,
3 4

and check your answer by looking at Example 3.6 on page 39.

If AB = AC, and A is invertible, can we conclude that B = C? This time the answer is
‘yes’, because we can multiply each side of the equation on the left by A−1 :

A−1 AB = A−1 AC =⇒ IB = IC =⇒ B = C.

But be careful, if AB = CA then we cannot conclude that B = C, only that


B = A−1 CA.
It is not possible to ‘divide’ by a matrix. We can only multiply on the right or left by
the inverse matrix.

3.5.2 Properties of the inverse


If A is an invertible matrix, then by definition A−1 exists and AA−1 = A−1 A = I. This
statement also says that the matrix A is the inverse of A−1 ; that is,

40
3.6. Powers of a matrix

(A−1 )−1 = A.

It is important to understand the definition of an inverse matrix and be able to use it. 3
Basically, if we can find a matrix that works in the definition, then that matrix is the
inverse, and the matrices are invertible. For example, if A is an invertible n × n matrix,
and λ 6= 0 ∈ R, then
1 −1
(λA)−1 = A .
λ
This statement says that the matrix λA is invertible, and its inverse is given by the
matrix C = (1/λ)A−1 . To prove this is true, we just need to show that the matrix C
satisfies (λA)C = C(λA) = I. This is straightforward using matrix algebra:
   
1 −1 1 −1 1 −1 1
(λA) A = λ AA = I and A (λA) = λA−1 A = I.
λ λ λ λ

If A and B are invertible n × n matrices, then using the definition of the inverse, you
can show that

(AB)−1 = B −1 A−1 .

This last statement says that if A and B are invertible matrices of the same size, then
the product AB is invertible and its inverse is the product of the inverses in the reverse
order. The proof of this statement is left as an exercise at the end of this chapter.

3.6 Powers of a matrix


If A is a square matrix, what do we mean by A2 ? We naturally mean the product of A
with itself, A2 = AA. In the same way, if A is an n × n matrix and r ∈ N, then
Ar = A . . . A} .
| A{z
r times

If A is an n × n matrix and r ∈ N, then the usual rules of exponents hold: for integers
r, s:

Ar As = Ar+s
(Ar )s = Ars .

As r and s are positive integers and matrix multiplication is associative, these


properties are easily verified in the same way as they are with real numbers.
What about the inverse of Ar ?

(Ar )−1 = (A−1 )r .

This follows immediately from the definition of inverse matrix and the associativity of
matrix multiplication. Think about what it says; that the inverse of the product of A
times itself r times, is the product of A−1 times itself r times.

41
3. Matrices

Activity 3.15 Verify the above three properties.


3
3.7 Transpose

3.7.1 The transpose of a matrix


The transpose of a matrix A is the matrix, denoted AT , which is obtained from A by
making the rows of A into the columns of AT .
Definition 3.9 (Transpose) The transpose of an m × n matrix
a11 a12 . . . a1n
 
 a21 a22 . . . a2n 
A = (aij ) = 
 ... .. .. .. 
. . . 
am1 am2 . . . amn
is the n × m matrix
a11 a21 . . . am1
 
 a12 a22 . . . am2 
AT = (aji ) = 
 ... .. .. .. 
. . . 
a1n a2n . . . amn
which is obtained from A by interchanging rows and columns.

That is, row i of A becomes column i of AT .

Example 3.8 If  
1 2
A= and B = ( 1 5 3 ) ,
3 4
then  
  1
1 3
AT = , T
B = 5.

2 4
3

Activity 3.16 Let D be a diagonal matrix

d11 0 ··· 0
 
 0 d22 ··· 0 
D=  ... .. .. ..  .
. . . 
0 0 · · · dnn

Show that DT = D.

Activity 3.17 Under what conditions on a, b, c, d will the matrix


 
a b
A=
c d
satisfy AT = A?

42
3.7. Transpose

Properties of transpose

If we take the transpose of a matrix A by switching the rows and columns, and then do 3
it again, we get back to the original matrix A. This is summarised in the following
equation:

(AT )T = A.

Two further properties relate to scalar multiplication and addition:

(λA)T = λAT and


(A + B)T = AT + B T .

These follow immediately from the definition. In particular, the (i, j) entry of (λA)T is
λaji which is also the (i, j) entry of λAT .
The next property tells you what happens when you take the transpose of a product of
matrices:
$$(AB)^T = B^TA^T.$$
This can be stated as: the transpose of the product of two matrices is the product of the
transposes in the reverse order.
Showing that this is true is slightly more complicated since it involves matrix
multiplication. It is important to understand why the product of the transposes must be
in the reverse order by carrying out the following activity.

Activity 3.18 If A is an m × n matrix and B is n × p, look at the sizes of the


matrices AT , B T , (AB)T and show that only the product B T AT is always defined.
Show also that its size is equal to the size of (AB)T .

If A is an m × n matrix and B is n × p, from the above activity you know that (AB)T
and B T AT are the same size. To prove that (AB)T = B T AT you need to show that the
(i, j)-entries are equal.
The (i, j) entry of (AB)T is the (j, i) entry of AB, which is obtained by taking row j of
A and multiplying each of the n terms by the corresponding entry of column i of B and
then summing the terms.

Activity 3.19 How can you similarly describe in words the (i, j) entry of B T AT ?

The final property in this section states that the inverse of the transpose of an invertible
matrix is the transpose of the inverse; that is, if A is invertible, then
$$(A^T)^{-1} = (A^{-1})^T.$$
This follows from the previous property and the definition of inverse. We have
$$A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I,$$
and in the same way (A^{-1})^TA^T = (AA^{-1})^T = I.
Therefore, by the definition of the inverse of a matrix, (A^{-1})^T must be the inverse of A^T.
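Both transpose properties can also be illustrated numerically. A minimal Python/NumPy sketch (for illustration only; the matrices are randomly generated):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.random((2, 3))                  # A is 2 x 3
    B = rng.random((3, 4))                  # B is 3 x 4, so AB is 2 x 4
    C = rng.random((3, 3)) + 3 * np.eye(3)  # an invertible square matrix

    print(np.allclose((A @ B).T, B.T @ A.T))                    # (AB)^T = B^T A^T
    print(np.allclose(np.linalg.inv(C.T), np.linalg.inv(C).T))  # (C^T)^(-1) = (C^(-1))^T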


3.7.2 Symmetric matrices


Definition 3.10 (Symmetric matrix) A matrix A is symmetric if it is equal to its
transpose, A = A^T.

Only square matrices can be symmetric. If A is symmetric, then aij = aji . That is,
entries diagonally opposite to each other must be equal: the matrix is symmetric about
its diagonal.

Activity 3.20 Fill in the missing numbers if the matrix A is symmetric:
$$A = \begin{pmatrix} 1 & 4 & \ast \\ \ast & 2 & -7 \\ 5 & \ast & 3 \end{pmatrix} = A^T.$$

If D is a diagonal matrix, then d_{ij} = 0 = d_{ji} for all i ≠ j. So, as we saw before in
Activity 3.16, D^T = D; that is, all diagonal matrices are symmetric.
Read the remaining parts of Chapter 1, Sections 1.1–1.7.

Overview
In this chapter we have looked at the terminology associated with matrices and the
operations defined for matrices, when they are defined and the properties they satisfy.
We have seen how to manipulate matrices algebraically.
You will be working with matrices throughout this course so it is important for you to
gain a facility with these definitions and operations – you should be able to use them
with ease.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

define a matrix and explain the terminology used with matrices, such as row,
column, size, square matrix, diagonal matrix, equality of matrices, transpose of a
matrix, symmetric matrix
define and use matrix addition, scalar multiplication and matrix multiplication
appropriately (know when and how these operations are defined)
manipulate matrices algebraically
define what is meant by the inverse of a square matrix, know and use the
properties of the inverse of a matrix and find the inverse of a 2 × 2 matrix
define what is meant by A^n where A is a square matrix and n is an integer;
demonstrate and use the fact that (A^n)^{-1} = (A^{-1})^n
state and use the properties of the transpose, use transpose in combination with
the other operations defined on matrices


Test your knowledge and understanding


Work Exercises 1.1–1.7 in the text A-H. The solutions to all exercises in the text can be
found at the end of the textbook.
Work Problems 1.5–1.7 in the text A-H. You will find the solutions on the VLE.
In addition, work the exercises below. The solutions are at the end of this chapter.

Exercises
Exercise 3.1
For fixed real numbers a, b, c, d and real numbers x, y, z, w, assume that
$$AB = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I.$$

Write down the four linear equations in x, y, z, w that you obtain by first multiplying
the matrices on the left and then equating the entries of the product to the entries of
the identity matrix. Then solve these equations for x, y, z, w.
You should find that a solution is only possible if ad − bc ≠ 0, and that the solution is
$$B = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

Compare this with the result you were given on page 40.

Exercise 3.2
What is meant by the statement that A is a symmetric matrix?
If B is an m × k matrix, show that the matrix B T B is a k × k symmetric matrix.

Comments on selected activities


Feedback to activity 3.1
This is the entry in the third row and first column, so a31 = 7.

Feedback to activity 3.2
$$D = \begin{pmatrix} a & 0 & 0 \\ 0 & e & 0 \\ 0 & 0 & i \end{pmatrix}.$$

Feedback to activity 3.3
$$A - B = \begin{pmatrix} 0 & 2 & -4 \\ 2 & 1 & 6 \end{pmatrix}.$$


Feedback to activity 3.4
$$\begin{pmatrix} 1 & 2 \\ 1 & 4 \end{pmatrix} + \begin{pmatrix} -2 & 5 \\ 2 & 0 \end{pmatrix} = \begin{pmatrix} -1 & 7 \\ 3 & 4 \end{pmatrix}.$$

Feedback to activity 3.5
AB is 2 × 2 and BA is 3 × 3:
$$AB = \begin{pmatrix} 10 & 5 \\ 6 & 2 \end{pmatrix}, \qquad BA = \begin{pmatrix} 7 & 5 & 10 \\ 2 & 1 & 3 \\ 3 & 3 & 4 \end{pmatrix}.$$
Feedback to activity 3.6
$$AB = \begin{pmatrix} 1 & 3 \\ 3 & 7 \end{pmatrix}, \qquad BA = \begin{pmatrix} 4 & 6 \\ 3 & 4 \end{pmatrix}.$$

Feedback to activity 3.7


If A is m × n and B is n × p, then AB is an m × p matrix. The size of a matrix is not
changed by scalar multiplication, so both λ(AB) and (λA)B are m × p. Looking at the
(i, j) entries of each,

$$\begin{aligned} (\lambda(AB))_{ij} &= \lambda(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}) \\ &= \lambda a_{i1}b_{1j} + \lambda a_{i2}b_{2j} + \cdots + \lambda a_{in}b_{nj} \\ &= (\lambda a_{i1})b_{1j} + (\lambda a_{i2})b_{2j} + \cdots + (\lambda a_{in})b_{nj} \\ &= ((\lambda A)B)_{ij}, \end{aligned}$$

so these two matrices are equal.


Feedback to activity 3.8
For the first two rules, the 0 matrix must be a 2 × 3 matrix,
$$0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
For the last set of rules, if A is 2 × 3, then for 0A = 0, the 0 matrix multiplying A must
be p × 2 where p is any positive integer, and then 0A is equal to the 0 matrix of size
p × 3. Similarly, for A0 = 0, the 0 matrix multiplying A must be 3 × q where q is any
positive integer, and then A0 is equal to the 0 matrix of size 2 × q.
Feedback to activity 3.9
In this case I is m × m.

Feedback to activity 3.10


Look at how this is done in Example 1.17 of the text A-H.

Feedback to activity 3.12
$$AB = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$


and
$$BA = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Therefore
$$A^{-1} = \begin{pmatrix} -2 & 1 \\ \tfrac{3}{2} & -\tfrac{1}{2} \end{pmatrix}.$$

Feedback to activity 3.13
We will show one way; you should show that A^{-1}A = I also holds. We have
$$AA^{-1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\,\frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{ad-bc}\begin{pmatrix} ad-bc & -ab+ba \\ cd-dc & -bc+ad \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Feedback to activity 3.15
We will do the last property and leave the others to you. The inverse of A^r is a matrix
B such that A^rB = BA^r = I. So show that the matrix B = (A^{-1})^r works:
$$A^r(A^{-1})^r = (\underbrace{AA\cdots A}_{r\ \text{times}})(\underbrace{A^{-1}A^{-1}\cdots A^{-1}}_{r\ \text{times}}).$$
Removing the brackets (matrix multiplication is associative) and replacing each central
product AA^{-1} by I, the expression eventually reduces to AIA^{-1} = AA^{-1} = I. In the same way,
$$(A^{-1})^rA^r = (\underbrace{A^{-1}A^{-1}\cdots A^{-1}}_{r\ \text{times}})(\underbrace{AA\cdots A}_{r\ \text{times}}) = I.$$
Therefore (A^r)^{-1} = (A^{-1})^r.


Feedback to activity 3.16
If D = (d_{ij}) is a diagonal n × n matrix, then d_{ij} = 0 for all i ≠ j. Therefore d_{ji} = 0 = d_{ij}
for all i ≠ j. And d_{ii} = d_{ii} does not change, so D^T = D.
Feedback to activity 3.17
For A = AT , we must have b = c.

Feedback to activity 3.18


Given the sizes of A and B, the matrix AB is m × p, so (AB)T is p × m. Also, AT is
n × m and B T is p × n, so the only way these matrices can be multiplied is as B T AT
(unless m = p).
Feedback to activity 3.19
The (i, j) entry of B T AT is obtained by taking row i of B T , which is column i of the
matrix B and multiplying each of the n terms by the corresponding entry of column j
of AT , which is row j of the matrix A, and then summing the products.
You can also write the entries as
$$\left((AB)^T\right)_{ij} = a_{j1}b_{1i} + a_{j2}b_{2i} + \cdots + a_{jn}b_{ni}$$
and
$$\left(B^TA^T\right)_{ij} = b_{1i}a_{j1} + b_{2i}a_{j2} + \cdots + b_{ni}a_{jn}.$$



Since multiplication of real numbers is commutative, these two expressions are the same
real number.
Feedback to activity 3.20
The matrix is
$$A = \begin{pmatrix} 1 & 4 & 5 \\ 4 & 2 & -7 \\ 5 & -7 & 3 \end{pmatrix} = A^T.$$

Comments on exercises
Solution to exercise 3.1
The equations are
$$\begin{cases} ax + bz = 1 \\ cx + dz = 0 \end{cases} \qquad\text{and}\qquad \begin{cases} ay + bw = 0 \\ cy + dw = 1. \end{cases}$$
To begin, you can solve the first set by multiplying the top equation by d and the
bottom equation by b, and then subtracting one equation from the other to eliminate
the terms in z. You will obtain (ad − bc)x = d. Then, provided ad − bc ≠ 0,
$$x = \frac{d}{ad - bc}.$$
Repeat the steps, this time eliminating the terms in x and solve for z = −c/(ad − bc).
Then solve the second set of equations in the same way.

Solution to exercise 3.2


A matrix A is symmetric if A^T = A.
Since B^T is a k × m matrix, B^TB is k × k. Moreover, (B^TB)^T = B^T(B^T)^T = B^TB, which
shows that it is symmetric.

Chapter 4
Vectors

Introduction
Matrices lead us to a study of vectors, which can be viewed as n × 1 matrices, but
which have far reaching applications viewed as elements of a Euclidean space, Rn . To
understand this, we develop our geometric intuition by looking at R2 and R3 , and use
vectors to obtain equations of familiar geometric objects, namely lines and planes.

Aims
The aims of this chapter are to:

Define a vector and define Rn , Euclidean n-space

Define the inner product of two vectors and establish the properties satisfied by
this operation

Develop geometric insight by looking at vectors in R2 and R3

Become familiar with forming lines and planes in R2 and R3 using linear
combinations of vectors

Extend these ideas to lines and hyperplanes in Rn .

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 1,
Sections 1.8–1.12
This chapter of the subject guide closely follows the second half of Chapter 1 of the
textbook. You should read the corresponding sections of the textbook and work through
all the activities there while working through the sections of this subject guide.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.


Synopsis
We define a vector, what we mean by a linear combination of vectors and say what we
mean by Euclidean n-space, Rn . Then we define the inner product of two vectors in Rn
and look at its properties. We pause to establish a fundamental relationship between
4 the vector Ax, where A is an m × n matrix and x ∈ Rn , and the column vectors of A.
We then focus on developing geometric insight, beginning with R2 , looking at what we
mean by the length and direction of a vector, and the angle between two vectors, and
extend these concepts to R3 . We then study lines in R2 , learning how to describe them
using Cartesian equations or vector equations and how to switch from one description
to the other. We then extend these ideas to vectors in R3 , where the extra dimension
increases the possibilities of how lines can interact. We next extend the idea of linear
combinations of vectors to planes in R3 and look at vector equations and Cartesian
equations of planes. We give several examples of determining the interactions of planes
and of lines and planes. Finally we extend these concepts to Rn , to lines and
hyperplanes.

4.1 Vectors in Rn

4.1.1 Definition of vector and Euclidean space


Definition 4.1 (Vector) An n × 1 matrix is a column vector, or simply a vector,
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad v_i \in \mathbb{R}.$$

We can also define a row vector to be a 1 × n matrix. However, in this text, by the
term vector we shall always mean a column vector.
The numbers v1 , v2 , . . . , vn , are known as the components (or entries) of the vector, v.
In order to distinguish vectors from scalars, and to emphasise that they are vectors and
not general matrices, in this text vectors are written in lowercase boldface type. (When
writing by hand, vectors should be underlined to avoid confusion with scalars.)
Addition and scalar multiplication are defined for vectors as for n × 1 matrices:
$$\mathbf{v} + \mathbf{w} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}, \qquad \lambda\mathbf{v} = \begin{pmatrix} \lambda v_1 \\ \lambda v_2 \\ \vdots \\ \lambda v_n \end{pmatrix}.$$

For a fixed positive integer n, the set of vectors together with the operations of addition
and scalar multiplication form Rn , usually called Euclidean n-space.
We will often write a column vector in the text as the transpose of a row vector.


Although
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (\,x_1 \;\; x_2 \;\; \cdots \;\; x_n\,)^T,$$
we will usually write x = (x1, x2, . . . , xn)^T, with commas separating the entries. A
matrix does not have commas; however, we will use the commas in order to clearly
distinguish the separate components of the vector.
For vectors v1 , v2 , . . . , vk in Rn and scalars α1 , α2 , . . . , αk in R, the vector

v = α1 v1 + · · · + αk vk ∈ Rn

is known as a linear combination of the vectors v1 , . . . , vk .


A zero vector, denoted 0, is a vector with all of its entries equal to 0. There is one
zero vector in each space Rn . As with matrices, this vector is an additive identity. For
any vector v ∈ Rn , 0 + v = v + 0 = v and multiplying v by the scalar zero results in
the zero vector, 0v = 0.
Although the matrix product of two vectors v and w in Rn cannot be calculated, it is
possible to form the matrix products vT w and vwT . The first is a 1 × 1 matrix, and the
latter is an n × n matrix.

Activity 4.1 Calculate a^Tb and ab^T for
$$\mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \qquad\text{and}\qquad \mathbf{b} = \begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix}.$$

The 1 × 1 matrix vT w can be identified with the real number, or scalar, which is its
unique entry. This turns out to be particularly useful, and is known as the inner product
or scalar product or dot product of v and w.

4.1.2 The inner product of two vectors


Definition 4.2 (Inner product) Given two vectors
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad \mathbf{w} = \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix},$$
the inner product, denoted ⟨v, w⟩, is the real number given by
$$\langle \mathbf{v}, \mathbf{w} \rangle = \left\langle \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} \right\rangle = v_1w_1 + v_2w_2 + \cdots + v_nw_n.$$


The inner product ⟨v, w⟩ is also known as the scalar product of v and w, or as the dot
product. In the latter case it is denoted by v · w.
The inner product of v and w is precisely the scalar quantity given by
$$\mathbf{v}^T\mathbf{w} = (\,v_1 \;\; v_2 \;\; \cdots \;\; v_n\,)\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = v_1w_1 + v_2w_2 + \cdots + v_nw_n,$$
so that we can write
⟨v, w⟩ = v^Tw.

Example 4.1 If x = (1, 2, 3)^T and y = (2, −1, 1)^T, then
⟨x, y⟩ = 1(2) + 2(−1) + 3(1) = 3.

It is important to realise that the inner product is just a number, a scalar, not another
vector or a matrix.
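If you wish to experiment, the inner product is a one-line computation in, for example, Python with NumPy (a sketch for illustration only, using the vectors of Example 4.1):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, -1.0, 1.0])

    # Two equivalent ways of computing <x, y>; both print the scalar 3.0
    print(np.dot(x, y))
    print(x @ y)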
The inner product on Rn satisfies certain basic properties as shown in the next theorem.
Theorem 4.1 The inner product
$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1y_1 + x_2y_2 + \cdots + x_ny_n, \qquad \mathbf{x}, \mathbf{y} \in \mathbb{R}^n,$$
satisfies the following properties for all x, y, z ∈ Rn and for all α ∈ R:
(i) ⟨x, y⟩ = ⟨y, x⟩
(ii) α⟨x, y⟩ = ⟨αx, y⟩ = ⟨x, αy⟩
(iii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
(iv) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.

Proof
We have
$$\langle \mathbf{x}, \mathbf{y} \rangle = x_1y_1 + x_2y_2 + \cdots + x_ny_n = y_1x_1 + y_2x_2 + \cdots + y_nx_n = \langle \mathbf{y}, \mathbf{x} \rangle,$$
which proves (i). We leave the proofs of (ii) and (iii) as an exercise. For (iv), note that
$$\langle \mathbf{x}, \mathbf{x} \rangle = x_1^2 + x_2^2 + \cdots + x_n^2$$
is a sum of squares, so ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if each term x_i² is equal to
zero, that is, if and only if each x_i = 0, so x is the zero vector, x = 0. □

Activity 4.2 Prove properties (ii) and (iii). Show, also, that these two properties
are equivalent to the single property
$$\langle \alpha\mathbf{x} + \beta\mathbf{y}, \mathbf{z} \rangle = \alpha\langle \mathbf{x}, \mathbf{z} \rangle + \beta\langle \mathbf{y}, \mathbf{z} \rangle.$$


From the definitions, it is clear that it is not possible to combine vectors in different
Euclidean spaces, either by addition or by taking the inner product. If v ∈ Rn and
w ∈ Rm, with m ≠ n, then these vectors live in different ‘worlds’, or more precisely, in
different ‘vector spaces’.

4.1.3 Vectors and matrices


If A is an m × n matrix, then the columns of A are vectors in Rm . If x is any vector in
Rn , then the product Ax is an m × 1 matrix, so it is also a vector in Rm . There is a
fundamental relationship between these vectors in Rm which follows from matrix
multiplication, and which you will use a great deal later in this course.
Theorem 4.2 Let A be an m × n matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$
and denote the columns of A by the column vectors c1, c2, . . . , cn, so that
$$\mathbf{c}_i = \begin{pmatrix} a_{1i} \\ a_{2i} \\ \vdots \\ a_{mi} \end{pmatrix}, \qquad 1 \le i \le n.$$
Then if x = (x1, x2, . . . , xn)^T is any vector in Rn,
$$A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n.$$

This theorem states that the matrix product Ax, which is a vector in Rm , can be
expressed as a linear combination of the column vectors of A.
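The following minimal Python/NumPy sketch (illustration only; the matrix and vector are made up for the example) confirms that Ax agrees with the linear combination x1c1 + x2c2 of the columns of A.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])      # a 3 x 2 matrix with columns c1 and c2
    x = np.array([2.0, -1.0])

    lhs = A @ x                               # the matrix product Ax
    rhs = x[0] * A[:, 0] + x[1] * A[:, 1]     # the combination x1*c1 + x2*c2
    print(lhs, rhs, np.allclose(lhs, rhs))    # the two vectors agree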

Activity 4.3 Prove this theorem; derive expressions for both the left-hand side and
the right-hand side of the equality as a single m × 1 vector and compare the
components to prove the equality.
Read section 1.8 of the text A-H, working through the activities there. You will
find the solution of the last activity in the text.

4.2 Developing geometric insight – R2 and R3


Vectors have a broader use beyond that of being special types of matrices. It is likely
that you have some previous knowledge of vectors; for example, in describing the
displacement of an object from one point to another in R2 or in R3 . Before we continue
our study of linear algebra it is important to consolidate this background, for it
provides valuable geometric insight into the definitions and uses of vectors in higher
dimensions. Parts of this section may be review for you.


4.2.1 Vectors in R2

The set R can be represented as points along a horizontal line, called a real-number line.
In order to represent pairs of real numbers, (a1 , a2 ), we use a Cartesian plane, a plane
with both a horizontal axis and a vertical axis, each axis being a copy of the
real-number line, and we mark A = (a1 , a2 ) as a point in this plane. We associate this
point with the vector a = (a1 , a2 )T , as representing a displacement from the origin (the
point (0, 0)) to the point A. In this context, a is the position vector of the point A.
This displacement is illustrated by an arrow, or directed line segment, with initial point
at the origin and terminal point at A.

[Figure: the position vector a, drawn as an arrow from the origin (0, 0) to the point (a1, a2).]
Even if a displacement does not begin at the origin, two displacements of the same
length and the same direction are considered to be equal. So, for example, the two
arrows below represent the same vector, v = (1, 2)T .

[Figure: two parallel arrows of the same length and direction, each representing the displacement vector v = (1, 2)^T.]
If an object is displaced from a point, say O, the origin, to a point P by the
displacement p, and then displaced from P to Q, by the displacement v, then the total
displacement is given by the vector from O to Q, which is the position vector q. So we
would expect vectors to satisfy q = p + v, both geometrically (in the sense of a
displacement) and algebraically (by the definition of vector addition). This is certainly
true in general, as illustrated below.

[Figure: the displacement p from O to P followed by the displacement v from P to Q, so that the total displacement is the position vector q = p + v.]
If v = (v1, v2)^T, then q1 = p1 + v1 and q2 = p2 + v2.


The order of displacements does not matter (nor does the order of vector addition), so
also q = v + p. For this reason the addition of vectors is said to follow the parallelogram
law.
[Figure: the parallelogram with sides p and v, illustrating p + v = v + p.]
From q = p + v, we have v = q − p. This is the displacement from P to Q. To help you
determine in which direction the vector v points, think of v = q − p as the vector which
is added to the vector p in order to obtain the vector q.
If v represents a displacement, then 2v must represent a displacement in the same
direction, but twice as far, and −v represents an equal displacement in the opposite
direction. This interpretation is compatible with the definition of scalar multiplication.

Activity 4.4 Sketch the vector v = (1, 2)T in a coordinate system. Then sketch 2v
and −v. Looking at the coordinates on your sketch, what are the components of 2v
and −v?

We have stated that a vector has both a length and a direction. Given a vector
a = (a1, a2)^T, its length, denoted by ‖a‖, can be calculated using Pythagoras’ theorem
applied to the right triangle shown below:


[Figure: the right triangle with horizontal side of length a1, vertical side of length a2, and hypotenuse a from (0, 0) to the point (a1, a2).]

So the length of a is the scalar quantity
$$\|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2}.$$
The length of a vector can be expressed in terms of the inner product,
$$\|\mathbf{a}\| = \sqrt{\langle \mathbf{a}, \mathbf{a} \rangle},$$
simply because ⟨a, a⟩ = a1² + a2². A unit vector is a vector of length 1.


Example 4.2 If v = (1, 2)^T, then ‖v‖ = √(1² + 2²) = √5. The vector
u = (1/√5, 2/√5)^T
is a unit vector in the same direction as v.

Activity 4.5 Check this. Calculate the length of u.

The direction of a vector is essentially given by the components of the vector. If we


have two vectors a and b which are (non-zero) scalar multiples, say

a = λb, λ ∈ R (λ ≠ 0),

then a and b are parallel. If λ > 0 then a and b have the same direction. If λ < 0
then we say that a and b have opposite directions.
The zero vector, 0, has length 0 and has no direction. For any other vector, v ≠ 0, there
is one unit vector in the same direction as v, namely
$$\mathbf{u} = \frac{1}{\|\mathbf{v}\|}\mathbf{v}.$$
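Numerically, the length and the unit vector are one-line computations. A minimal Python/NumPy sketch (illustration only), using v = (1, 2)^T from Example 4.2; np.linalg.norm computes the Euclidean length:

    import numpy as np

    v = np.array([1.0, 2.0])
    length = np.linalg.norm(v)   # ||v|| = sqrt(<v, v>) = sqrt(5)
    u = v / length               # the unit vector in the direction of v

    print(length)                # 2.2360679... = sqrt(5)
    print(u, np.linalg.norm(u))  # u has length 1.0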
 
Activity 4.6 Write down a unit vector, u, which is parallel to the vector a = (4, 3)^T.
Then write down a vector, w, of length 2 which is in the opposite direction to a.

4.2.2 Inner product


The inner product in R2 is closely linked with the geometric concepts of length and
angle. If a = (a1, a2)^T, we have already seen that
$$\|\mathbf{a}\|^2 = \langle \mathbf{a}, \mathbf{a} \rangle = a_1^2 + a_2^2.$$


Let a, b be two vectors in R2 , and let θ denote the angle between them. (Note that
angles are always measured in radians, not degrees, here. So, for example 45 degrees is
π/4 radians.) By the angle between two vectors we shall always mean the angle, θ, such
that 0 ≤ θ ≤ π. If θ < π, the vectors a, b, and c = b − a form a triangle, where c is the
side opposite the angle θ, as, for example, in the figure below.
[Figure: the triangle with sides a, b and c = b − a, where θ is the angle between a and b and c is the side opposite θ.]

The law of cosines (which you may or may not know — don’t worry if you don’t)
applied to this triangle gives us the important relationship stated in the following
theorem.
Theorem 4.3 Let a, b ∈ R2 and let θ denote the angle between them. Then
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta.$$

Proof
The law of cosines states that c² = a² + b² − 2ab cos θ, where c = ‖b − a‖, a = ‖a‖ and
b = ‖b‖. That is,
$$\|\mathbf{b} - \mathbf{a}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta. \tag{1}$$
Expanding the inner product and using its properties, we have
$$\|\mathbf{b} - \mathbf{a}\|^2 = \langle \mathbf{b} - \mathbf{a}, \mathbf{b} - \mathbf{a} \rangle = \langle \mathbf{b}, \mathbf{b} \rangle + \langle \mathbf{a}, \mathbf{a} \rangle - 2\langle \mathbf{a}, \mathbf{b} \rangle.$$
That is,
$$\|\mathbf{b} - \mathbf{a}\|^2 = \|\mathbf{a}\|^2 + \|\mathbf{b}\|^2 - 2\langle \mathbf{a}, \mathbf{b} \rangle. \tag{2}$$
Comparing equations (1) and (2) above, we conclude that
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta. \qquad\square$$

This theorem has many geometrical consequences. For example, we can use it to find
the angle between two vectors by using
$$\cos\theta = \frac{\langle \mathbf{a}, \mathbf{b} \rangle}{\|\mathbf{a}\|\,\|\mathbf{b}\|}.$$


Example 4.3 Let
$$\mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \qquad\text{and}\qquad \mathbf{w} = \begin{pmatrix} 3 \\ 1 \end{pmatrix},$$
and let θ be the angle between them. Then
$$\cos\theta = \frac{5}{\sqrt{5}\,\sqrt{10}} = \frac{1}{\sqrt{2}}, \qquad\text{so that}\quad \theta = \frac{\pi}{4}.$$
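A minimal Python/NumPy sketch of this calculation (illustration only):

    import numpy as np

    v = np.array([1.0, 2.0])
    w = np.array([3.0, 1.0])

    cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    theta = np.arccos(cos_theta)
    print(theta, np.pi / 4)      # both print 0.7853981..., i.e. pi/4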

Since
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta,$$
and −1 ≤ cos θ ≤ 1 for any real number θ, the maximum value of the inner product is
⟨a, b⟩ = ‖a‖ ‖b‖. This occurs precisely when cos θ = 1, that is, when θ = 0. In this case
the vectors a and b are parallel and in the same direction. If they point in opposite
directions, then θ = π and we have ⟨a, b⟩ = −‖a‖ ‖b‖. The inner product will be
positive if and only if the angle between the vectors is acute, meaning that 0 ≤ θ < π/2. It
will be negative if the angle is obtuse, meaning that π/2 < θ ≤ π.
The non-zero vectors a and b are orthogonal (or perpendicular or, sometimes,
normal) when the angle between them is θ = π/2. Since cos(π/2) = 0, this is precisely
when their inner product is zero. We restate this important fact:
The vectors a and b are orthogonal if and only if ⟨a, b⟩ = 0.

4.2.3 Vectors in R3
Everything we have said so far about the inner product and its geometric interpretation
in R2 extends to R3 .
 
If
$$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \qquad\text{then}\qquad \|\mathbf{a}\| = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$

Activity 4.7 Show this. Sketch a position vector a = (a1, a2, a3)^T in R3. Drop a
perpendicular to the xy-plane as in the figure below, and apply Pythagoras’ theorem
twice to obtain the result.
[Figure: the point (a1, a2, a3) in R3, with a perpendicular dropped to the point (a1, a2, 0) in the xy-plane.]


The vectors a, b and c = b − a in R3 lie in a plane, and the law of cosines can still be
applied to establish the result that
$$\langle \mathbf{a}, \mathbf{b} \rangle = \|\mathbf{a}\|\,\|\mathbf{b}\|\cos\theta.$$

Activity 4.8 Calculate the angles of the triangle with sides a, b, c and show it is an
isosceles right triangle, where
$$\mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} -1 \\ 1 \\ 4 \end{pmatrix}, \qquad \mathbf{c} = \mathbf{b} - \mathbf{a}.$$

4.3 Lines

4.3.1 Lines in R2
In R2 , a line is given by a single Cartesian equation, such as y = ax + b, and as such, we
can draw a graph of the line in the xy-plane. This line can also be expressed as a single
vector equation with one parameter. To see this, look at the following examples.

Example 4.4 Consider the line y = 2x. Any point (x, y) on this line must satisfy
this equation, and all points that satisfy the equation are on this line.

[Figure: the line y = 2x through the origin; the vector shown along the line is v = (1, 2)^T.]

Another way to describe the points on the line is by giving their position vectors. We
can let x = t where t is any real number. Then y is determined by y = 2x = 2t. So if
x = (x, y)^T is the position vector of a point on the line, then
$$\mathbf{x} = \begin{pmatrix} t \\ 2t \end{pmatrix} = t\begin{pmatrix} 1 \\ 2 \end{pmatrix} = t\mathbf{v}, \qquad t \in \mathbb{R}.$$
For example, if t = 2, we get the position vector of the point (2, 4) on the line, and if
t = −1 we obtain the point (−1, −2). As the parameter t runs through all real
numbers, this vector equation gives the position vectors of all the points on the line.
Starting with the vector equation
$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix} = t\mathbf{v} = t\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad t \in \mathbb{R},$$


we can retrieve the Cartesian equation using the fact that the two vectors are equal
if and only if their components are equal. This gives us the two equations x = t and
y = 2t. Eliminating the parameter t between these two equations yields y = 2x.

The line in the above example is a line through the origin. What about a line which
does not contain (0, 0)?

Example 4.5 Consider the line y = 2x + 1. Proceeding as above, we set x = t,
t ∈ R. Then y = 2x + 1 = 2t + 1, so the position vector of a point on this line is
given by
$$\mathbf{x} = \begin{pmatrix} t \\ 2t + 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} + \begin{pmatrix} t \\ 2t \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad t \in \mathbb{R}.$$
[Figure: the line y = 2x + 1; the vector shown along the line is v = (1, 2)^T.]

We can interpret this as follows. To locate any point on the line, first locate one
particular point which is on the line, for example the y intercept, (0, 1). Then the
position vector of any point on the line is a sum of two displacements, first going to
the point (0, 1) and then going along the line, in a direction parallel to the vector
v = (1, 2)T . It is important to notice that in this case the actual position vector of a
point on the line does not lie along the line. Only if the line goes through the origin
will that happen.

Activity 4.9 Sketch the line y = 2x + 1 and the position vector q of the point (3, 7)
which is on this line. Then express q as the sum of two vectors, q = p + tv where
p = (0, 1)T and v = (1, 2)T for some t ∈ R and add these vectors to your sketch.

In the vector equation, any point on the line can be used to locate the line, and any
vector parallel to the direction vector, v, can be used to give the direction. So, for
example,
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + s\begin{pmatrix} -2 \\ -4 \end{pmatrix}, \qquad s \in \mathbb{R},$$
is also a vector equation of this line.

Activity 4.10 If q = (3, 7)T , what is s in this expression of the line?


As before, we can retrieve the Cartesian equation of the line by equating components of
the vector and eliminating the parameter.

Activity 4.11 Do this for each of the vector equations given above for the line
y = 2x + 1.
In general, any line in R2 is given by a vector equation with one parameter of the form
x = p + tv,
where x is the position vector of a point on the line, p is the position vector of any
particular point on the line and v is the direction of the line.

Activity 4.12 Write down a vector equation of the line through the points
P = (−1, 1) and Q = (3, 2). What is the direction of this line? Find a value for c
such that the point (7, c) is on the line.

In R2 , two lines are either parallel or intersect in a unique point.

Example 4.6 The lines ℓ1 and ℓ2, given by
$$\ell_1:\ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \ell_2:\ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R},$$
are not parallel, since their direction vectors are not scalar multiples of one another.
Therefore they intersect in a unique point. We can find this point either by finding
the Cartesian equation of each line and solving the equations simultaneously, or
using the vector equations. We will do the latter. We are looking for a point (x, y) on
both lines, so its position vector will satisfy
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \end{pmatrix}$$
for some t ∈ R and for some s ∈ R. We need to use different symbols (s and t) in the
equations because they are unlikely to be the same number for each line. We are
looking for values of s and t which will give us the same point. Equating components
of the position vectors of points on the lines, we have
$$\begin{cases} 1 + t = 5 - 2s \\ 3 + 2t = 6 + s \end{cases} \;\Rightarrow\; \begin{cases} 2s + t = 4 \\ -s + 2t = 3 \end{cases} \;\Rightarrow\; \begin{cases} 2s + t = 4 \\ -2s + 4t = 6. \end{cases}$$
Adding these last two equations, we obtain t = 2, and therefore s = 1. Therefore the
point of intersection is (3, 7):
$$\begin{pmatrix} 1 \\ 3 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \end{pmatrix} + 1\begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$
What is the angle of intersection of these two lines? Since
$$\left\langle \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right\rangle = 0,$$
the lines are perpendicular.
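The pair of equations in s and t is itself a small linear system, so it can also be solved numerically. A minimal Python/NumPy sketch of this example (illustration only):

    import numpy as np

    # Lines x = p1 + t*v1 and x = p2 + s*v2; a common point requires
    # t*v1 - s*v2 = p2 - p1, a 2 x 2 linear system in (t, s).
    p1, v1 = np.array([1.0, 3.0]), np.array([1.0, 2.0])
    p2, v2 = np.array([5.0, 6.0]), np.array([-2.0, 1.0])

    M = np.column_stack((v1, -v2))     # columns multiply t and s respectively
    t, s = np.linalg.solve(M, p2 - p1)
    print(t, s)                        # 2.0 and 1.0
    print(p1 + t * v1, p2 + s * v2)    # both print the point (3, 7)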


4.3.2 Lines in R3
How can you describe a line in R3? Think about this. How do you describe the set of
points (x, y, z) which are on a given line?
Because there are three variables involved, the natural way is to use a vector equation.
To describe a line you locate one point on the line by its position vector, and then
travel along from that point in a given direction, or in the opposite direction.


[Figure: a line in R3, located by the position vector q of a point on it.]

Therefore, a line in R3 is given by a vector equation with one parameter,
x = p + tv,
where x is the position vector of any point on the line, p is the position vector of one
particular point on the line and v is the direction of the line:
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \end{pmatrix} + t\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}, \qquad t \in \mathbb{R}. \tag{$*$}$$
The equation x = tv represents a parallel line through the origin.

Example 4.7 The equations
$$\mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \qquad\text{and}\qquad \mathbf{x} = \begin{pmatrix} 3 \\ 7 \\ -2 \end{pmatrix} + s\begin{pmatrix} -3 \\ -6 \\ 3 \end{pmatrix}, \qquad s, t \in \mathbb{R},$$
describe the same line. This is not obvious, so how do we show it?
The lines represented by these equations are parallel since their direction vectors are
parallel:
$$\begin{pmatrix} -3 \\ -6 \\ 3 \end{pmatrix} = -3\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},$$
so they either have no points in common and are parallel, or they have all points in
common, and are really the same line. Since
$$\begin{pmatrix} 3 \\ 7 \\ -2 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix},$$

the point (3, 7, −2) is on both lines, so they must have all points in common. We say
that the lines are collinear.
On the other hand, the lines represented by the equations
$$\mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \qquad\text{and}\qquad \mathbf{x} = \begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix} + t\begin{pmatrix} -3 \\ -6 \\ 3 \end{pmatrix}, \qquad t \in \mathbb{R},$$
are parallel, with no points in common, since there is no value of t for which
$$\begin{pmatrix} 3 \\ 7 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}.$$

Activity 4.13 Verify this last statement.

Now try the following.

Activity 4.14 Write down a vector equation of the line through the points
P = (−1, 1, 2) and Q = (3, 2, 1). What is the direction of this line?
Is the point (7, 1, 3) on this line? Suppose you want a point on this line of the form
(c, d, 3). Find one such point. How many choices do you actually have for the values
of c and d?

We can also describe a line in R3 by Cartesian equations, but this time we need two
such equations because there are three variables. Equating components in the vector
equation (∗) above, we have
x = p1 + tv1, y = p2 + tv2, z = p3 + tv3.
Solving each of these equations for the parameter t and equating the results, we have
the two equations
$$\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2} = \frac{z - p_3}{v_3}, \qquad \text{provided } v_i \ne 0,\ i = 1, 2, 3.$$
Example 4.8 To find Cartesian equations of the line
$$\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + t\begin{pmatrix} -1 \\ 0 \\ 5 \end{pmatrix}, \qquad t \in \mathbb{R},$$
we equate components,
x = 1 − t, y = 2, z = 3 + 5t,
and then solve for t in the first and third equations. The Cartesian equations are
$$1 - x = \frac{z - 3}{5} \qquad\text{and}\qquad y = 2.$$
This is a line parallel to the xz-plane in R3. The direction vector has a 0 in the
second component, so there is no change in the y direction; the y coordinate has the
constant value y = 2.


In R2 , two lines are either parallel or intersect in a unique point. In R3 more can
happen. Two lines in R3 either intersect in a unique point, are parallel, or are skew,
which means that they lie in parallel planes and are not parallel.
Try to imagine what skew lines look like. If you are in a room with a ceiling parallel to
the floor, imagine a line drawn in the ceiling. It is possible for you to draw a parallel
line in the floor, but instead it is easier to draw a line in the floor which is not parallel
to the one in the ceiling. These lines will be skew. They lie in parallel planes (the ceiling
and the floor). If you could move the skew line in the floor onto the ceiling, then the
lines would intersect in a unique point.
Two lines are said to be coplanar if they lie in the same plane, in which case they are
either parallel or intersecting.

Example 4.9 Are the lines L1 and L2 intersecting, parallel or skew?
$$L_1:\ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad L_2:\ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad t \in \mathbb{R}.$$

Activity 4.15 Clearly the lines are not parallel. Why?

Example 4.16 (continued)


The lines intersect if there exist values of the parameters s, t such that
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + s\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}.$$
Equating components, we need to solve the three simultaneous equations in two
unknowns,
$$\begin{cases} 1 + t = 5 - 2s \\ 3 + 2t = 6 + s \\ 4 - t = 1 + 7s \end{cases} \qquad\Rightarrow\qquad \begin{cases} 2s + t = 4 \\ -s + 2t = 3 \\ 7s + t = 3. \end{cases}$$
We have already seen in Example 4.6 on page 61 that the first two equations have
the unique solution s = 1, t = 2. Substituting these values into the third equation,
$$7s + t = 7(1) + 2 \ne 3,$$
we see that the system has no solution. Therefore the lines do not intersect and must
be skew.
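The same check can be carried out numerically: solve the first two component equations for (t, s) and test the third. A minimal Python/NumPy sketch (illustration only):

    import numpy as np

    p1, v1 = np.array([1.0, 3.0, 4.0]), np.array([1.0, 2.0, -1.0])
    p2, v2 = np.array([5.0, 6.0, 1.0]), np.array([-2.0, 1.0, 7.0])

    # Solve the first two rows of t*v1 - s*v2 = p2 - p1 for (t, s) ...
    M = np.column_stack((v1, -v2))[:2]
    t, s = np.linalg.solve(M, (p2 - p1)[:2])
    print(t, s)                                   # 2.0 and 1.0, as in the text

    # ... then test whether those values satisfy all three components.
    print(np.allclose(p1 + t * v1, p2 + s * v2))  # False: the lines are skew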

Example 4.17 On the other hand, if we take a new line L3, which is parallel to L2
but which passes through the point (5, 6, −5), then the lines
$$L_1:\ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad L_3:\ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ -5 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad t \in \mathbb{R},$$
do intersect in the unique point (3, 7, 2).


Activity 4.16 Check this. Find the point of intersection of the two lines L1 and L3 .

4.4 Planes in R3
On a line, there is essentially one direction in which a point can move (forwards or
backwards) given as all possible scalar multiples of a given direction, but on a plane
there are more possibilities. A point can move in two different directions, and in any
linear combination of these two directions. So how do we describe a plane in R3 ?
The vector parametric equation
x = p + sv + tw, s, t ∈ R,
describes the position vectors of points on a plane in R3, provided that the vectors v and
w are non-zero and are not parallel. The vector p is the position vector of any
particular point on the plane and the vectors v and w are displacement vectors which
lie in the plane. By taking all possible linear combinations x = p + sv + tw, for s, t ∈ R,
we obtain all the points on the plane.
The equation
x = sv + tw, s, t ∈ R,
describes a plane through the origin. In this case the position vector, x, of any point on
the plane lies in the plane.

Activity 4.17 If v and w are parallel, what does the equation


x = p + sv + tw, s, t ∈ R, actually represent?

Example 4.18 You have shown that the lines L1 and L3 given in Example 4.17
intersect in the point (3, 7, 2). (See Activity 4.16 on page 65.) Two intersecting lines
determine a plane. A vector equation of the plane containing the two lines is given by
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad s, t \in \mathbb{R}.$$
Why? We know that (3, 7, 2) is a point on the plane, and the directions of each of
the lines must lie in the plane. As s and t run through all real numbers, this
equation gives the position vector of all points on the plane. Since the point (3, 7, 2)
is on both lines, if t = 0 we have the equation of L1, and if s = 0 we get L3.
Any point which is on the plane can take the place of the vector (3, 7, 2)^T, and any
non-parallel vectors which are linear combinations of v and w can replace these in
the equation. So, for example,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + s\begin{pmatrix} -3 \\ -1 \\ 8 \end{pmatrix}, \qquad s, t \in \mathbb{R},$$
is also an equation of this plane.


Activity 4.18 Verify this. Show that (1, 3, 4) is a point on the plane given by each
equation, and show that (−3, −1, 8)T is a linear combination of (1, 2, −1)T and
(−2, 1, 7)T .

There is another way to describe a plane in R3 geometrically which is often easier to
use. We begin with planes through the origin. Let n be a given vector in R3 and
consider all position vectors x which are orthogonal to n. Geometrically, the set of all
such vectors describes a plane through the origin in R3.
Try to imagine this by placing a pencil perpendicular to a table top. The pencil
represents a normal vector, the table top a plane, and the point where the pencil is
touching the table is the origin of your coordinate system. Then any vector which you
can draw on the table top is orthogonal to the pencil, and conversely any point on the
table top can be reached by a directed line segment (from the point of the pencil) which
is orthogonal to the pencil.
A vector, x, is orthogonal to n if and only if
⟨n, x⟩ = 0,
so this equation gives the position vectors, x, of points on the plane. If n = (a, b, c)^T
and x = (x, y, z)^T, then this equation can be written as
$$\langle \mathbf{n}, \mathbf{x} \rangle = \left\langle \begin{pmatrix} a \\ b \\ c \end{pmatrix}, \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right\rangle = 0,$$
or
ax + by + cz = 0.
This is a Cartesian equation of a plane through the origin in R3. The vector n is called
a normal vector to the plane. Any vector which is parallel to n will also be a normal
vector and will lead to the same Cartesian equation.
On the other hand, given a Cartesian equation
ax + by + cz = 0,
this equation represents a plane through the origin in R3 with normal vector
n = (a, b, c)^T.


To describe a plane which does not go through the origin, we choose a normal vector n
and one point P on the plane with position vector p. We then consider all displacement
vectors which lie in the plane with initial point at P. If x is the position vector of any
point on the plane, then the displacement vector x − p lies in the plane, and x − p is
orthogonal to n. Conversely, if the position vector x of a point satisfies ⟨x − p, n⟩ = 0,
then the vector x − p lies in the plane, so the point (with position vector x) is on the
plane.
(Again, think about the pencil perpendicular to the table top, only this time the point
where the pencil is touching the table is a point, P , on the plane, and the origin of your
coordinate system is somewhere else, say, in the corner on the floor.)
The orthogonality condition means that the position vector of any point on the plane is
given by the equation
⟨n, x − p⟩ = 0.
Using properties of the inner product, we can rewrite this as
⟨n, x⟩ = ⟨n, p⟩,
where ⟨n, p⟩ = d is a constant.


If n = (a, b, c)T and x = (x, y, z)T , then

ax + by + cz = d

is a Cartesian equation of a plane in R3 . The plane goes through the origin if and only if
d = 0.

Example 4.19 The equation

2x − 3y − 5z = 2

represents a plane which does not go through the origin, since (x, y, z) = (0, 0, 0)
does not satisfy the equation. To find a point on the plane we can choose any two of
the coordinates, say y = 0 and z = 0, and then the equation tells us that x = 1. So
the point (1, 0, 0) is on this plane. The components of a normal vector to the plane
can be read from this equation as the coefficients of x, y, z: n = (2, −3, −5)T .

How does the Cartesian equation of a plane relate to the vector parametric equation of
a plane? A Cartesian equation can be obtained from the vector equation algebraically,
by eliminating the parameters in the vector equation, and vice versa, as the following
example shows.

Example 4.20 Consider the plane
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix} = s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R},$$
which is a plane through the origin parallel to the plane in Example 4.18 on page 65.
The direction vectors v = (1, 2, −1)^T and w = (−2, 1, 7)^T lie in the plane.


To obtain a Cartesian equation in x, y and z, we equate the components in this
vector equation:
x = s − 2t, y = 2s + t, z = −s + 7t,
and eliminate the parameters s and t. We begin by solving the first equation for s,
and then substitute this into the second equation to solve for t in terms of x and y:
$$s = x + 2t \;\Rightarrow\; y = 2(x + 2t) + t = 2x + 5t \;\Rightarrow\; 5t = y - 2x \;\Rightarrow\; t = \frac{y - 2x}{5}.$$
Then substitute back into the first equation to obtain s in terms of x and y:
$$s = x + 2\left(\frac{y - 2x}{5}\right) \;\Rightarrow\; 5s = 5x + 2y - 4x \;\Rightarrow\; s = \frac{x + 2y}{5}.$$
Finally, we substitute for s and t in the third equation, z = −s + 7t, and simplify to
obtain a Cartesian equation of the plane,
3x − y + z = 0.

Activity 4.19 Carry out this last step to obtain the Cartesian equation of the
plane.

This Cartesian equation can be expressed as
$$\langle \mathbf{n}, \mathbf{x} \rangle = 0, \qquad\text{where}\quad \mathbf{n} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}, \quad \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

The vector n is a normal vector to the plane. We can check that n is, indeed,
orthogonal to the plane by taking the inner product with the vectors v and w, which lie
in the plane.

Activity 4.20 Do this. Calculate ⟨n, v⟩ and ⟨n, w⟩, and verify that both inner
products are equal to zero.

Since n is orthogonal to both v and w, it is orthogonal to all linear combinations of


these vectors, and hence to any vector in the plane. So this plane can equally be
described as the set of all position vectors which are orthogonal to n.

Activity 4.21 Using the properties of the inner product, show that this last statement
is true. That is, if ⟨n, v⟩ = 0 and ⟨n, w⟩ = 0, then ⟨n, sv + tw⟩ = 0 for any
s, t ∈ R.

Can we do the same for a plane which does not pass through the origin? Consider the
following example.


Example 4.21 The plane we just considered in Example 4.20 is parallel to the
plane with vector equation
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix} = \mathbf{p} + s\mathbf{v} + t\mathbf{w}, \qquad s, t \in \mathbb{R},$$
which passes through the point (3, 7, 2). Since the planes are parallel, they will have
the same normal vectors. So the Cartesian equation of this plane is of the form
3x − y + z = d.
Since (3, 7, 2) is a point on the plane, it must satisfy the equation for the plane.
Substituting into the equation, we find d = 3(3) − (7) + (2) = 4 (which is equivalent
to finding d by using d = ⟨n, p⟩). So the Cartesian equation we obtain is
3x − y + z = 4.
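A two-line numerical check of this computation (a Python/NumPy sketch, for illustration only):

    import numpy as np

    n = np.array([3.0, -1.0, 1.0])   # normal vector to the plane
    p = np.array([3.0, 7.0, 2.0])    # a point on the plane

    d = np.dot(n, p)                 # d = <n, p>
    print(d)                         # 4.0, so the plane is 3x - y + z = 4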

Conversely, starting with a Cartesian equation of a plane, we can obtain a vector
equation. We are looking for the position vector of a point on the plane whose
components satisfy 3x − y + z = 4, or equivalently, z = 4 − 3x + y. (We can solve for
any one of the variables x, y or z, but we chose z for simplicity.) So we are looking
for all vectors x such that
$$\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ 4 - 3x + y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + x\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
for any x, y ∈ R. Therefore
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + s\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + t\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \qquad s, t \in \mathbb{R},$$
is a vector equation of the same plane as that given by the original vector equation,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad s, t \in \mathbb{R},$$
although it is difficult to spot this at a glance.
There are many ways to show that these two vector equations do represent the same
plane, but we can use what we know about planes to find the easiest. The planes
represented by the two vector equations have the same normal vector n, since the
vectors (1, 0, −3)^T and (0, 1, 1)^T are also orthogonal to n. So we know that the two
vector equations represent parallel planes. They are the same plane if they have a
point in common. It is far easier to find values of s and t for which p = (3, 7, 2)^T
satisfies the new vector equation,
$$\begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} + s\begin{pmatrix} 1 \\ 0 \\ -3 \end{pmatrix} + t\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},$$
than the other way around (which is by showing that (0, 0, 4) satisfies the original
equation), because of the positions of the 0s and 1s in the direction vectors.


Activity 4.22 Do this. You should be able to immediately spot the values of s and
t which work.

Using the examples we have just done, you should now be able to tackle the following
question.
Activity 4.23 The two lines, L1 and L2,
$$L_1:\ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad L_2:\ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}, \qquad t \in \mathbb{R},$$
in Example 4.9 on page 64 are skew, and therefore are contained in parallel planes.
Find vector equations and Cartesian equations for these two planes.

Two planes in R3 are either parallel or intersect in a line. Considering such questions, it
is usually easier to use the Cartesian equations of the planes. If the planes are parallel,
then this will be obvious from looking at their normal vectors. If they are not parallel,
then the line of intersection can be found by solving the two Cartesian equations
simultaneously.

Example 4.22 The planes
x + 2y − 3z = 0 and −2x − 4y + 6z = 4
are parallel, since their normal vectors are related by (−2, −4, 6)^T = −2(1, 2, −3)^T.
The equations do not represent the same plane, since they have no points in
common; that is, there are no values of x, y, z which can satisfy both equations. The
first plane goes through the origin and the second plane does not.
On the other hand, the planes
x + 2y − 3z = 0 and x − 2y + 5z = 4
intersect in a line. The points of intersection are the points (x, y, z) which satisfy
both equations, so we solve the equations simultaneously. We begin by eliminating
the variable x from the second equation, by subtracting the first equation from the
second. This will naturally lead us to a vector equation of the line of intersection:
$$\begin{cases} x + 2y - 3z = 0 \\ x - 2y + 5z = 4 \end{cases} \qquad\Rightarrow\qquad \begin{cases} x + 2y - 3z = 0 \\ -4y + 8z = 4. \end{cases}$$
The last equation tells us that if z = t is any real number, then y = −1 + 2t.
Substituting these expressions into the first equation, we find x = 2 − t. Then a
vector equation of the line of intersection is
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 2 - t \\ -1 + 2t \\ t \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + t\begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}.$$
This can be verified by showing that the point (2, −1, 0) satisfies both Cartesian
equations and that the vector v = (−1, 2, 1)^T is orthogonal to the normal vectors of
each of the planes (and therefore lies in both planes).
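For two non-parallel planes, the direction of the line of intersection is orthogonal to both normal vectors, so in R3 it can also be found with the cross product. A minimal Python/NumPy sketch of this example (illustration only; the cross product itself is not developed in this chapter):

    import numpy as np

    # Planes x + 2y - 3z = 0 and x - 2y + 5z = 4, with normals n1 and n2.
    n1, d1 = np.array([1.0, 2.0, -3.0]), 0.0
    n2, d2 = np.array([1.0, -2.0, 5.0]), 4.0

    v = np.cross(n1, n2)   # orthogonal to both normals: a direction for the line
    print(v)               # [ 4. -8. -4.], which is -4 times (-1, 2, 1)

    # A particular point on the line: set z = 0 and solve for (x, y).
    M = np.array([[n1[0], n1[1]],
                  [n2[0], n2[1]]])
    x, y = np.linalg.solve(M, np.array([d1, d2]))
    print(x, y)            # 2.0 and -1.0, giving the point (2, -1, 0)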


Activity 4.24 Carry out the calculations in the above example and verify that the
line is in both planes.

4.5 Lines and hyperplanes in Rn


4.5.1 Vectors and lines in Rn
We can apply similar geometric language to vectors in Rn. Using the inner product in
Rn (defined in Section 4.1.2), we define the length of a vector x = (x1, x2, . . . , xn)^T by
$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}, \qquad\text{or}\quad \|\mathbf{x}\|^2 = \langle \mathbf{x}, \mathbf{x} \rangle.$$
We say that two vectors, v, w ∈ Rn, are orthogonal if and only if
⟨v, w⟩ = 0.
A line in Rn is the set of all points (x1, x2, . . . , xn) whose position vectors x satisfy a
vector equation of the form
x = p + tv, t ∈ R,
where p is the position vector of one particular point on the line and v is the direction
of the line. If we can write x = tv, t ∈ R, then the line goes through the origin.

4.5.2 Hyperplanes
The set of all points (x1 , x2 , . . . , xn ) which satisfy one Cartesian equation,
a1 x1 + a2 x2 + · · · + an xn = d,
is called a hyperplane in Rn .
In R2, a hyperplane is a line; in R3 it is a plane. For n > 3, we use the term hyperplane.
The vector
$$\mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
is a normal vector to the hyperplane. Writing the Cartesian equation in vector form, a
hyperplane is the set of all vectors, x ∈ Rn, such that
⟨n, x − p⟩ = 0,
where the normal vector n and the position vector p of a point on the hyperplane are
given.

Activity 4.25 How many Cartesian equations would you need to describe a line in
Rn? How many parameters would there be in a vector equation of a hyperplane?
Read the remaining parts of Chapter 1, Sections 1.8–1.12.


Overview
In this chapter we have defined and looked at vectors and Euclidean n-space, Rn ,
together with the definition and properties of the inner product in Rn . We have worked
with lines in R2 and lines and planes in R3 in order to gain geometric insight into the
possibilities that arise from linear combinations of vectors, so that we may be able to
apply this intuition to Rn . Vectors are the fundamental building blocks of linear
algebra, as we shall see in the next chapters.
At the end of this chapter and the relevant reading you should be able to:

explain what is meant by a vector and by Euclidean n-space, Rn .


define the inner product of two vectors, understand and use the properties of the
inner product
show that if A = (c1 c2 . . . cn ), and if x = (α1 , α2 , . . . , αn )T ∈ Rn , then
Ax = α1 c1 + α2 c2 + · · · + αn cn
state what is meant by the length and direction of a vector, what is meant by a
unit vector
state and apply the relationship between the inner product and the length and
angle between two vectors in R2 and R3
explain what is meant by two vectors being orthogonal and determine if two
vectors are orthogonal
find the equations, vector and Cartesian, of lines in R2 , lines and planes in R3 , and
work problems involving lines and planes
understand what is meant by a line and by a hyperplane in Rn .

Test your knowledge and understanding


Work Exercises 1.8–1.12 in the text A-H. The solutions to all exercises in the text can
be found at the end of the textbook.
Work Problems 1.10 – 1.12 in the text A-H. You will find the solutions on the VLE.

Comments on selected activities


Feedback to activity 4.1
$$\mathbf{a}^T\mathbf{b} = (\,1 \;\; 2 \;\; 3\,)\begin{pmatrix} 4 \\ -2 \\ 1 \end{pmatrix} = (3),$$
$$\mathbf{a}\mathbf{b}^T = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}(\,4 \;\; -2 \;\; 1\,) = \begin{pmatrix} 4 & -2 & 1 \\ 8 & -4 & 2 \\ 12 & -6 & 3 \end{pmatrix}.$$

Feedback to activity 4.2
To prove properties (ii) and (iii), apply the definition to the LHS (left-hand side) of the
equation and rearrange the terms to obtain the RHS (right-hand side). For example, for
x, y ∈ Rn, using the properties of real numbers,
$$\begin{aligned} \alpha\langle \mathbf{x}, \mathbf{y} \rangle &= \alpha(x_1y_1 + x_2y_2 + \cdots + x_ny_n) \\ &= \alpha x_1y_1 + \alpha x_2y_2 + \cdots + \alpha x_ny_n \\ &= (\alpha x_1)y_1 + (\alpha x_2)y_2 + \cdots + (\alpha x_n)y_n = \langle \alpha\mathbf{x}, \mathbf{y} \rangle. \end{aligned}$$
Do the same for property (iii).
The single property ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ implies property (ii) by letting
β = 0 for the first equality and then letting α = 0 for the second, and property (iii) by
letting α = β = 1. On the other hand, if properties (ii) and (iii) hold, then
⟨αx + βy, z⟩ = ⟨αx, z⟩ + ⟨βy, z⟩   by property (iii)
      = α⟨x, z⟩ + β⟨y, z⟩   by property (ii).

Feedback to activity 4.6
‖a‖ = 5, so
$$\mathbf{u} = \frac{1}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix} \qquad\text{and}\qquad \mathbf{w} = -\frac{2}{5}\begin{pmatrix} 4 \\ 3 \end{pmatrix}.$$

Feedback to activity 4.7
In the figure below,
[Figure: the point (a1, a2, a3), with a perpendicular dropped to the point (a1, a2, 0) in the xy-plane.]
the line from the origin to the point (a1, a2, 0) lies in the xy-plane, and by Pythagoras’
theorem it has length √(a1² + a2²). Applying Pythagoras’ theorem again to the right
triangle shown, we have
$$\|\mathbf{a}\| = \sqrt{\left(\sqrt{a_1^2 + a_2^2}\right)^2 + a_3^2} = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$

Feedback to activity 4.8
We have
$$\mathbf{a} = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} -1 \\ 1 \\ 4 \end{pmatrix}, \qquad \mathbf{c} = \mathbf{b} - \mathbf{a} = \begin{pmatrix} -2 \\ -1 \\ 2 \end{pmatrix}.$$
The cosines of the three angles are given by
$$\frac{\langle \mathbf{a}, \mathbf{b} \rangle}{\|\mathbf{a}\|\,\|\mathbf{b}\|} = \frac{-1+2+8}{\sqrt{9}\,\sqrt{18}} = \frac{1}{\sqrt{2}}, \qquad \frac{\langle \mathbf{a}, \mathbf{c} \rangle}{\|\mathbf{a}\|\,\|\mathbf{c}\|} = \frac{-2-2+4}{\sqrt{9}\,\sqrt{9}} = 0, \qquad \frac{\langle \mathbf{b}, \mathbf{c} \rangle}{\|\mathbf{b}\|\,\|\mathbf{c}\|} = \frac{2-1+8}{\sqrt{18}\,\sqrt{9}} = \frac{1}{\sqrt{2}}.$$
Thus the triangle has a right angle and two angles of π/4.
Alternatively, as the vectors a and c are orthogonal and have the same length, it
follows immediately that the triangle is right-angled and isosceles.
Feedback to activity 4.9
If t = 3, then q = (3, 7)T . You are asked to sketch the position vector q as this sum to
illustrate that the vector q does locate a point on the line, but the vector q does not lie
on the line.
Feedback to activity 4.10
Here s = −1.

Feedback to activity 4.11


We will work through this for the second equation and leave the first for you. We have,
for s ∈ R,
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + s\begin{pmatrix} -2 \\ -4 \end{pmatrix} \;\Rightarrow\; \begin{cases} x = 1 - 2s \\ y = 3 - 4s \end{cases} \;\Rightarrow\; \frac{1-x}{2} = s = \frac{3-y}{4},$$
which yields 2(1 − x) = 3 − y, or y = 2x + 1.

Feedback to activity 4.12
A vector equation of the line is
$$\mathbf{x} = \begin{pmatrix} -1 \\ 1 \end{pmatrix} + t\begin{pmatrix} 4 \\ 1 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R},$$
where we have used p to locate a point on the line, and the direction vector, v = q − p.
The point (7, 3) is on the line (t = 2), and this is the only point of this form on the line,
since once 7 is chosen for the x coordinate, the y coordinate is determined.
Feedback to activity 4.13
Once given, for example, that the x coordinate is x = 3, the parameter t of the vector
equation is determined, therefore, so are the other two coordinates. We saw in Example
4.7 that t = 2 satisfies the first two equations and it certainly does not satisfy the third
equation, 1 = 0 − t.
Feedback to activity 4.14
This is similar to the earlier activity in R2. A vector equation of the line is
$$\mathbf{x} = \begin{pmatrix} -1 \\ 1 \\ 2 \end{pmatrix} + t\begin{pmatrix} 4 \\ 1 \\ -1 \end{pmatrix} = \mathbf{p} + t\mathbf{v}, \qquad t \in \mathbb{R}.$$
The point (7, 1, 3) is not on this line, but the point (−5, 0, 3) is on the line. The value
t = −1 will then satisfy all three component equations. There is, of course, only one
possible choice for the values of c and d.


Feedback to activity 4.15


The lines are not parallel because their direction vectors are not parallel.

Feedback to activity 4.17
If v and w are parallel, then this equation represents a line in the direction v. If
w = λv, then this line can be written as
x = p + (s + λt)v = p + rv, where r = s + λt ∈ R.
Feedback to activity 4.21
Using the properties of the inner product, we have, for any s, t ∈ R,
⟨n, sv + tw⟩ = s⟨n, v⟩ + t⟨n, w⟩ = s · 0 + t · 0 = 0.
Feedback to activity 4.22
Equating components in the vector equation, we have 3 = s and 7 = t from the first two
equations, and these values do satisfy the third equation, 2 = 4 − 3s + t.
Feedback to activity 4.23
The parallel planes must each contain the direction vectors of each of the lines as
displacement vectors, so the vector equations of the planes are, respectively,
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix}$$
and
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 6 \\ 1 \end{pmatrix} + s\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + t\begin{pmatrix} -2 \\ 1 \\ 7 \end{pmatrix},$$
where s, t ∈ R.
The parallel planes have the same normal vector, which we need for the Cartesian
equations. Recall that in Example 4.21 on page 69 we found a Cartesian equation and a
normal vector to the first plane, the plane which contains L1:
$$3x - y + z = 4 \qquad\text{with}\qquad \mathbf{n} = \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}.$$
Note that the point (1, 3, 4) is on this plane because it satisfies the equation, but the
point (5, 6, 1) does not. Substituting (5, 6, 1) into the equation 3x − y + z = d, we find
that the Cartesian equation of the parallel plane which contains L2 is
3x − y + z = 10.
Feedback to activity 4.24
As stated, to verify that the line is in both planes, show that its direction vector is
perpendicular to the normal vector of each plane, and that the point (2, −1, 0) satisfies
both equations.
Feedback to activity 4.25
To describe a line in Rn you need n − 1 Cartesian equations. A vector parametric
equation of a hyperplane in Rn would require n − 1 parameters.

Chapter 5
Linear systems I: Gaussian elimination

Introduction
Being able to solve systems of many linear equations in many unknowns is a vital part
of linear algebra. This is where we begin to use matrices and vectors as essential
elements of obtaining and expressing the solutions. In this chapter we investigate linear
systems and present a useful method known as Gaussian elimination.

Aims
The aims of this chapter are to:

Define linear systems

Learn how to solve linear systems by using the method of Gaussian elimination.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 2,
Sections 2.1–2.3.
This chapter of the guide closely follows the first half of Chapter 2 of the textbook. You
should read the corresponding sections of the textbook and work through all the
activities there while working through the sections of this subject guide.

Further reading

The material of this chapter is also discussed in the following book:
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 16, 17.
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

77
5. Linear systems I: Gaussian elimination

Synopsis
We begin by expressing a system in matrix form and defining elementary row
operations on an augmented matrix. These operations mimic standard operations on
systems of equations. We then learn a precise algorithm to apply these operations in
order to put the matrix in a form called reduced echelon form, from which the general
solution to the system is readily obtained. The method of manipulating matrices in this
way to obtain the solution is known as Gaussian elimination.
5
5.1 Systems of linear equations
A system of m linear equations in n unknowns x1 , x2 , . . . , xn is a set of m
equations of the form

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. ..
. .
am1 x1 + am2 x2 + · · · + amn xn = bm .

The numbers aij are known as the coefficients of the system.


We say that s1 , s2 , . . . , sn is a solution of the system if all m equations hold true when

x1 = s1 , x2 = s2 , . . . , xn = sn .

Sometimes a system of linear equations is known as a set of simultaneous equations;


such terminology emphasises that a solution is an assignment of values to each of the n
unknowns such that each and every equation holds with this assignment. It is also
referred to simply as a linear system.
In order to deal with large systems of linear equations we write them in matrix form.
Definition 5.1 (Coefficient matrix) The matrix A = (aij ), whose (i, j)-entry is the
coefficient aij of the system of linear equations is called the coefficient matrix.

a11 a12 ... a1n


 
 a21 a22 ... a2n 
A=
 ... .. .. .. 
. . . 
am1 am2 ... amn

Let x = (x1 , x2 , . . . , xn )T be the vector of unknowns. Then the product Ax of the m × n


coefficient matrix A and the n × 1 column vector x is an m × 1 matrix,

a11 a12 ... a1n x1 a11 x1 + a12 x2 + · · · + a1n xn


    
 a21 a22 ... a2n   x2   a21 x1 + a22 x2 + · · · + a2n xn 
 . .. .. ..   . = .. .. ,
 .. . . .   ..   . . 
am1 am2 ... amn xn am1 x1 + an2 x2 + · · · + amn xn

78
5.2. Row operations

whose entries are the left-hand sides of our system of linear equations.
If we define another column vector b, whose m components are the right-hand sides bi ,
the system is equivalent to the matrix equation
Ax = b.

Example 5.1 Consider the following system of three linear equations in the three
unknowns, x1 , x2 , x3 :
5
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 − x2 + 2x3 = 5

This system can be written in matrix notation as Ax = b with


     
1 1 1 x1 3
A= 2 1
 1 ,
 x = x2 ,
  b = 4.

1 −1 2 x3 5

The entries of the matrix A are the coefficients of the xi . If we perform the matrix
multiplication of Ax,
    
1 1 1 x1 x1 + x2 + x3
2 1 1   x2  =  2x1 + x2 + x3 
1 −1 2 x3 x1 − x2 + 2x3

the matrix product is a 3 × 1 matrix, a column vector. If Ax = b, then


   
x1 + x2 + x 3 3
 2x1 + x2 + x3  =  4 
x1 − x2 + 2x3 5

and these two 3 × 1 matrices are equal if and only if their components are equal.
This gives precisely the three linear equations.

5.2 Row operations


Our purpose is to find an efficient means of finding the solutions of systems of linear
equations. To do this, we begin by looking at a simple example.
An elementary way of solving a system of linear equations such as
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 − x2 + 2x3 = 5
is to begin by eliminating one of the variables from two of the equations. For example,
we can eliminate x1 from the second equation by multiplying the first equation by 2 and

79
5. Linear systems I: Gaussian elimination

then subtracting it from the second equation. Let’s do this. Twice the first equation
gives the equation 2x1 + 2x2 + 2x3 = 6. Subtracting this from the second equation,
2x1 + x2 + x3 = 4 yields the equation −x2 − x3 = −2. We can now replace the second
equation in the original system by this new equation,

x1 + x2 + x3 = 3
−x2 − x3 = −2
5 x1 − x2 + 2x3 = 5

and the new system will have the same set of solutions as the original system.
We can continue in this manner to obtain a simpler set of equations with the same
solution set as the original system. So what are the operations that we can perform on
the equations of a linear system without altering the set of solutions? We can:
O1 multiply both sides of an equation by a non-zero constant
O2 interchange two equations
O3 add a multiple of one equation to another.
These operations do not alter the set of solutions since the restrictions on the variables
x1 , x2 , . . . , xn given by the new equations imply the restrictions given by the old ones
(that is, we can undo the manipulations made on the old system).
At the same time, we observe that these operations really only involve the coefficients of
the variables and the right sides of the equations.
For example, using the same system as above expressed in matrix form, Ax = b, then
the matrix  
1 1 1 3
(A|b) =  2 1 1 4 
1 −1 2 5
which is the coefficient matrix A together with the constants b as the last column,
contains all the information we need to use, and instead of manipulating the equations,
we can instead manipulate the rows of this matrix. For example, subtracting twice
equation 1 from equation 2 is executed by taking twice row 1 from row 2.
These observations form the motivation behind a method to solve systems of linear
equations, known as Gaussian elimination. To solve a linear system Ax = b we first
form the augmented matrix, denoted (A|b) which is A with column b tagged on.
Definition 5.2 (Augmented matrix) If Ax = b is a system of linear equations,
a11 a12 · · · a1n x1 b1
     
 a21 a22 · · · a2n  x2   b.2 
    
A=  ... .. .. ..  x =  .
 ..  b =  .. 
. . .
am1 am2 · · · amn xn bm
Then the matrix
a11 a12 · · · a1n b1
 
 a21 a22 · · · a2n b2 
(A|b) = 
 ... .. ... .. .. 
. . . 
am1 am2 · · · amn bm
is called the augmented matrix of the linear system.

80
5.3. Gaussian elimination

From the operations listed above for manipulating the equations of the linear system,
we define corresponding operations on the rows of the augmented matrix.
Definition 5.3 (Elementary row operations) These are:
RO1 multiply a row by a non-zero constant
RO2 interchange two rows
RO3 add a multiple of one row to another.
5
5.3 Gaussian elimination
We will describe a systematic method for solving systems of linear equations by an
algorithm which uses row operations to put the augmented matrix into a form from
which the solution of the linear system can be easily read. To illustrate the algorithm,
we will use two examples: the augmented matrix (A|b) of the example in the previous
section and the augmented matrix (B|b) of a second system of linear equations.
   
1 1 1 3 0 0 2 3
( A|b ) =  2 1 1 4  , ( B|b ) =  0 2 3 4  .
1 −1 2 5 0 0 1 5

5.3.1 The algorithm — reduced row echelon form


Using the above two examples, we will carry out the algorithm in detail.
(1) Find the leftmost column that is not all zeros.
   
1 1 1 3 0 0 2 3
2 1 1 4 0 2 3 4
1 −1 2 5 0 0 1 5
↑ ↑
(This is column 1 of (A|b) and column 2 of (B|b).)

(2) Get a non-zero entry at the top of this column.


The matrix on the left already has a non-zero entry at the top. For the matrix on the
right, we interchange row 1 and row 2.
   
1 1 1 3 0 2 3 4
2 1 1 4 0 0 2 3
1 −1 2 5 0 0 1 5

(3) Make this entry 1; multiply the first row by a suitable number or interchange two
rows. This is called a leading one.
The left-hand matrix already had a 1 in this position. For the second matrix, we
multiply row 1 by 12 .
3
   
1 1 1 3 0 1 2
2
2 1 1 4 0 0 2 3
1 −1 2 5 0 0 1 5

81
5. Linear systems I: Gaussian elimination

(4) Add suitable multiples of the top row to rows below to make all entries below the
leading one become zero.
For the matrix on the left, we add −2 times row 1 to row 2, then we add −1 times row 1
to row 3. The first operation is the same as the one we performed earlier on the example
using the equations. The matrix on the right already has zeros under the leading one.
0 1 32 2
   
1 1 1 3
 0 −1 −1 −2  0 0 2 3
5 0 −2 1 2 0 0 1 5

At any stage we can read the modified system of equations from the new augmented
matrix, remembering that column 1 gives the coefficients of x1 , column 2 the coefficients
of x2 and so on, and that the last column represents the right-hand side of the equations.
For example the matrix on the left is now the augmented matrix of the system

x1 + x2 + x3 = 3
−x2 − x3 = −2
−2x2 + x3 = 2

The next step in the algorithm is


(5) Cover up the top row and apply steps 1 to 4 again.
This time we will work on one matrix at a time. After the first four steps, we have
altered the augmented matrix (A|b) to:
 
1 1 1 3
(A|b) −→  0 −1 −1 −2 
0 −2 1 2
We now ignore the top row. Then the leftmost column which is not all zeros is column
2. This column already has a non-zero entry at the top. We make it into a leading one
by multiplying row 2 by −1:
 
1 1 1 3
−→  0 1 1 2 
0 −2 1 2
This is now a leading one, and we use it to obtain zeros below. We add 2 times row 2 to
row 3:  
1 1 1 3
−→ 0 1 1 2
0 0 3 6
Now we cover up the top two rows and start again with steps 1 to 4. The leftmost
column which is not all zeros is column 3. We multiply row 3 by 31 to obtain the final
leading one:  
1 1 1 3
−→  0 1 1 2  .
0 0 1 2

This last matrix is in row echelon form, or simply, echelon form.

82
5.3. Gaussian elimination

Definition 5.4 (Row echelon form) A matrix is said to be in row echelon form,
(or echelon form) if it has the following three properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.

Activity 5.1
Check that the above matrix satisfies these three properties. 5
The term echelon form takes its name from the form of the equations at this stage.
Reading from the matrix, these equations are

x1 + x2 + x3 = 3
x2 + x 3 = 2
x3 = 2

We could now use a method called back substitution to find the solution of the
system. The last equation tells us that x3 = 2. We can then substitute this into the
second equation to obtain x2 , and then use these two values to obtain x1 . This is an
acceptable approach, but we can effectively do the same calculations by continuing with
row operations. So we continue with one final step of our algorithm.
(6) Begin with the last row and add suitable multiples to each row above to get zeros
above the leading ones.
Continuing from the row echelon form and using row 3, we replace row 2 with row
2−row 3, and at the same time we replace row 1 with row 1−row 3.
   
1 1 1 3 1 1 0 1
(A|b) −→  0 1 1 2  −→  0 1 0 0 
0 0 1 2 0 0 1 2

We now have zeros above the leading one in column 3. There is only one more step to
do, and that is to get a zero above the leading one in column 2. So the final step is row
1−row 2,  
1 0 0 1
−→  0 1 0 0  .
0 0 1 2
This final matrix is now in reduced (row) echelon form. It has the additional property
that every column with a leading one has zeros elsewhere.
Definition 5.5 (Reduced row echelon form) A matrix is said to be in reduced
row echelon form (or reduced echelon form) if it has the following four properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
(4) Every column with a leading one has zeros elsewhere.

83
5. Linear systems I: Gaussian elimination

If R is the reduced row echelon form of a matrix M , we will sometimes write


R = RREF (M ).
The solution can now be read from the matrix. The top row says x1 = 1, the second row
says x2 = 0, and the third row says x3 = 2. The original system has been reduced to the
matrix equation     
1 0 0 x1 1
 0 1 0   x2  =  0 
0 0 1 x3 2
5
giving the solution    
x1 1
 x2  =  0  .
x3 2
This system of equations has a unique solution.
We can check that this solution is the correct solution of the original system by
substituting it into the equations, or equivalently, by multiplying out the matrices Ax
to show that Ax = b.

Activity 5.2 Do this: check that


    
1 1 1 1 3
2 1 10 = 4.
1 −1 2 2 5

We now return to the example (B|b) which we left after the first round of steps 1 to 4,
and apply step 5. We cover up the top row and apply steps 1 to 4 again. We need to
have a leading one in the second row, which we achieve by switching row 2 and row 3:
0 1 32 2 0 1 32 2
   

(B|b) −→  0 0 2 3  −→  0 0 1 5 
0 0 1 5 0 0 2 3
We obtain a zero under this leading one by replacing row 3 with row 3 + (−2) times
row 2,
0 1 32 2
 

−→  0 0 1 5 
0 0 0 −7
and then finally multiply row 3 by − 17
3
 
0 1 2
2
−→ 0 0
 1 5
0 0 0 1
This matrix is now in row echelon form, but we shall see that there is no point in going
on to reduced row echelon form. This last matrix is equivalent to the system
0 1 32
    
x1 2
 0 0 1   x2  =  5 
0 0 0 x3 1
What is the bottom equation of this system? Row 3 says 0x1 + 0x2 + 0x3 = 1, that is
0 = 1 which is impossible! This system has no solution.

84
5.3. Gaussian elimination

5.3.2 Consistent and inconsistent systems


Definition 5.6 (Consistent) A system of linear equations is said to be consistent if
it has at least one solution. It is inconsistent if there are no solutions.

If the row echelon form (REF) of the augmented matrix ( A|b ) contains a row
(0 0 · · · 0 1) then it is inconsistent.
5
It is instructive to look at the original systems represented by these augmented matrices,
   
1 1 1 3 0 0 2 3
( A|b ) =  2 1 1 4  ( B|b ) =  0 2 3 4 
1 −1 2 5 0 0 1 5
 
 x1 + x2 + x3 = 3  2x3 = 3
2x + x2 + x3 = 4 2x + 3x3 = 4 .
 1  2
x1 − x2 + 2x3 = 5 x3 = 5
We see immediately that the system Bx = b is inconsistent since it is not possible for
both the top and the bottom equation to hold.
Since these are systems of three equations in three variables, we can interpret these
results geometrically. Each of the equations above represents a plane in R3 . The system
Ax = b represents three planes which intersect in the point (1, 0, 2). This is the only
point which lies on all three planes. The system Bx = b represents three planes, two of
which are parallel (the horizontal planes 2x3 = 3 and x3 = 5), so there is no point which
lies on all three planes.
This method of reducing the augmented matrix to reduced row echelon form is known
as Gaussian elimination or Gauss-Jordan elimination.
We have been very careful in illustrating this method to explain what the row
operations were for each step of the algorithm, but in solving a system with this method
it is not necessary to include all this detail. The aim is to use row operations (following
the algorithm) to put the augmented matrix into reduced row echelon form, and then
read off the solutions from this form. Where it is useful to indicate the operations, you
can do so by writing, for example, R2 − 2R1 , where we always write down the row we
are replacing first, so that R2 − 2R1 indicates ‘replace row 2 (R2 ) with row 2 plus −2
times row 1 (R2 − 2R1 )’. Otherwise, you can just write down the sequence of matrices
linked by arrows. It is important to realise that once you have performed a row
operation on a matrix, the new matrix obtained is not equal to the previous one, this is
why you must use arrows between the steps and not equal signs.

Example 5.2 We repeat the reduction of (A|b) to illustrate this for the system

x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 − x2 + 2x3 = 5

Begin by writing down the augmented matrix, then apply the row operations to

85
5. Linear systems I: Gaussian elimination

carry out the algorithm. Here we will indicate the row operations.
 
1 1 1 3
(A|b) =  2 1 1 4  →
1 −1 2 5
 
1 1 1 3
R2 − 2R1  0 −1 −1 −2  →
5 R3 − R1 0 −2 1 2
 
1 1 1 3
(−1)R2 0 1 1 2  →

0 −2 1 2
 
1 1 1 3
0 1 1 2 →
R3 + 2R2 0 0 3 6
 
1 1 1 3
0 1 1 2.
1
( 3 )R3 0 0 1 2
The matrix is now in row echelon form, continue to reduced row echelon form,
 
R1 − R3 1 1 0 1
R2 − R3  0 1 0 0  →
0 0 1 2
 
R1 − R2 1 0 0 1
0 1 0 0.
0 0 1 2
The augmented matrix is now in reduced row echelon form.

Activity 5.3 Use Gaussian elimination to solve the following system of equations,

x1 + x2 + x3 = 6
2x1 + 4x2 + x3 = 5
2x1 + 3x2 + x3 = 6.

Be sure to follow the algorithm to put the augmented matrix into reduced row
echelon form using row operations.

5.3.3 Linear systems with free variables


Gaussian elimination can be used to solve systems of linear equations with any number
of equations and unknowns. We will now look at an example of a linear system with
four equations in five unknowns,

x 1 + x2 + x3 + x4 + x5 = 3
2x1 + x2 + x3 + x4 + 2x5 = 4

86
5.3. Gaussian elimination

x1 − x2 − x3 + x4 + x5 = 5
x1 + x4 + x5 = 4.

The augmented matrix is


 
1 1 1 1 1 3
2 1 1 1 2 4
(A|b) = 
 1 −1 −1 1
.
1 5
1 0 0 1 1 4 5
Check that your augmented matrix is correct before you proceed, or you could be
solving the wrong system! A good method is to first write down the coefficients by rows,
so reading across the equations, and then to check the columns do correspond to the
coefficients of that variable. Now follow the algorithm to put (A|b) into reduced row
echelon form.  
−→ 1 1 1 1 1 3
R2 − 2R1  0 −1 −1 −1 0 −2 

R3 − R1  0 −2 −2 0 0 2 
R4 − R1 0 −1 −1 0 0 1
 
1 1 1 1 1 3
(−1)R2 
 0 1 1 1 0 2
−→  0 −2 −2 0 0 2
0 −1 −1 0 0 1
 
1 1 1 1 1 3
−→ 0
 1 1 1 0 2
R3 + 2R2  0 0 0 2 0 6
R4 + R2 0 0 0 1 0 3
 
1 1 1 1 1 3
−→  0  1 1 1 0 2
( 21 )R3  0 0 0 1 0 3
0 0 0 1 0 3
 
1 1 1 1 1 3
−→  0 1 1 1 0 2
0 0 0 1 0 3
R4 − R3 0 0 0 0 0 0
This matrix is in row echelon form. We continue to reduced row echelon form, starting
with the third row,  
R1 − R3 1 1 1 0 1 0
R2 − R30
 1 1 0 0 −1 

−→ 0 0 0 1 0 3 
0 0 0 0 0 0
 
1 0 0 0 1 1
R1 − R2 
0 1 1 0 0 −1 
.
0 0 0 1 0 3 
−→
0 0 0 0 0 0

87
5. Linear systems I: Gaussian elimination

There are only three leading ones in the reduced row echelon form of this matrix. These
appear in columns 1, 2 and 4. Since the last row gives no information, but merely states
that 0 = 0, the matrix is equivalent to the system of equations

x1 + 0 + 0 + 0 + x5 = 1
x2 + x3 + 0 + 0 = −1
x4 + 0 = 3.
5 The form of these equations tells us that we can assign any values to x3 and x5 , and
then the values of x1 , x2 and x4 will be determined.
Definition 5.7 (Leading variables) The variables corresponding to the columns with
leading ones in the reduced row echelon form of an augmented matrix are called
leading variables. The other variables are called non-leading variables.

In this example the variables x1 , x2 and x4 are leading variables, x3 and x5 are
non-leading variables. We assign x3 , x5 the arbitrary values s, t, where s, t represent any
real numbers, and then solve for the leading variables in terms of these. We get

x4 = 3 x2 = −1 − s x1 = 1 − t.

Then we express this solution in vector form:


         
x1 1−t 1 0 −1
 x2   −1 − s   −1   −1   0 
         
x =  x3  =  s  =  0  + s  1  + t 
       
 0 .

 x4   3   3   0   0 
x5 t 0 0 1

Observe that there are infinitely many solutions, because any values of s ∈ R and t ∈ R
will give a solution.
The solution given above is called a general solution of the system, because it gives a
solution for any values of s and t. For any particular assignment of values to s and t,
such as s = 0, t = 1, we obtain a particular solution of the system.

Activity 5.4 Let s = 0 and t = 0 and show (by substituting it into the equation)
that x0 = (1, −1, 0, 3, 0)T is a solution of Ax = b. Then let s = 1 and t = 2 and show
that the new vector x1 you obtain is also a solution.

With practice, you will be able to read the general solution directly from the reduced
row echelon form of the augmented matrix. We have
 
1 0 0 0 1 1
 0 1 1 0 0 −1 
(A|b) −→  0 0 0 1 0 3 .

0 0 0 0 0 0

Locate the leading ones, and note which are the leading variables. Then locate the
non-leading variables and assign each an arbitrary parameter. So, as above, we note

88
5.3. Overview

that the leading ones correspond to x1 , x2 and x4 and we assign arbitrary parameters to
the non-leading variables; that is, values such as x3 = s and x5 = t where s and t
represent any real number. Then write down the vector x = (x1 , x2 , x3 , x4 , x5 )T (as a
column) and fill in the values starting with x5 and working up. We have x5 = t. Then
the third row tells us that x4 = 3. We have x3 = s. Now look at the second row, which
says x2 + x3 = −1, or x2 = −1 − s. Then the top row tells us that x1 = 1 − t. In this
way we obtain the solution in vector form.

5
Activity 5.5 Write down the system of three linear equations in three unknowns
represented by the matrix equation Ax = b, where
     
1 2 1 x 3
A= 2 2 0 ,
  x= y ,
  b = 2.

3 4 1 z 5

Use Gaussian elimination to solve the system. Express your solution in vector form.
If each equation represents the Cartesian equation of a plane in R3 , describe the
intersection of these three planes.

5.3.4 Solution sets


We have seen systems of linear equations which have a unique solution, no solution and
infinitely many solutions. It turns out that these are the only possibilities.
For suppose we have a linear system Ax = b which has two distinct solutions, p and q.
Thinking of these vector solutions as determining points in Rn , then we will show that
every point on the line through p and q is also a solution. Therefore, as soon as there is
more than one solution, there must be infinitely many.
To prove this claim, let p and q be vectors such that Ap = b and Aq = b, p 6= q. The
equation of the line is
v = p + t(q − p) t ∈ R.
Then for any vector v on the line we have Av = A(p + t(q − p)). Using the
distributive laws,

Av = Ap + tA(q − p) = Ap + t(Aq − Ap) = b + t(b − b) = b

Therefore v is also a solution for any t ∈ R, so there are infinitely many of them.
Notice that in this proof, the vector w = q − p satisfies the equation Ax = 0.

Overview
In this chapter, we have seen how the method of Gaussian elimination can be used to
solve linear systems. We will see some applications of linear systems shortly, but we will
also see that the basic method of Gaussian elimination (the use of elementary row
operations) is a crucial tool in many areas of linear algebra.

89
5. Linear systems I: Gaussian elimination

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

express a system of linear equations in matrix form as Ax = b and know what is


meant by the coefficient matrix and the augmented matrix
put a matrix into reduced row echelon form using row operations and following the
algorithm
5 recognise consistent and inconsistent systems of equations
solve a system of m linear equations in n unknowns using Gaussian elimination
express the solution in vector form
interpret systems with three unknowns as intersections of planes in R3

Test your knowledge and understanding


Work Exercises 2.1– 2.4 in the text A-H. The solutions to all exercises in the text can
be found at the end of the textbook.
Work Problem 2.8 in the text A-H. You will find the solution on the VLE.

Comments on selected activities


Feedback to activity 5.3
Put the augmented matrix into reduced row echelon form. It should take five steps:
   
1 1 1 6 1 0 0 2
 2 4 1 5  −→ (1) −→ (2) −→ (3) −→ (4) −→  0 1 0 −1  ,
2 3 1 6 0 0 1 5
from which you can read the solution, x = (2, −1, 5)T . We will state the row operations
at each stage. To obtain (1), do R2 − 2R1 and R3 − 2R1 ; for (2) switch R2 and R3 ; For
(3) do R3 − 2R2 . The augmented matrix is now in row echelon form, so starting with
the bottom row, for (4) do R2 + R3 and R1 − R3 . The final operation, R1 − R2 will yield
the matrix in reduced row echelon form.
Feedback to activity 5.4
Multiply the matrices below as instructed to obtain b:
   
  1   −1
1 1 1 1 1  1 1 1 1 1 
  −1   −2 
 
2 1 1 1 2 2 1 1 1 2
Ax0 =  1 −1 −1 1 1   0  and Ax1 =  1 −1 −1 1
    1 .
 3  1 3 

1 0 0 1 1 1 0 0 1 1
0 2
Feedback to activity 5.5
The equations are:
x1 + 2x2 + x3 = 3
2x1 + 2x2 = 2
3x1 + 4x2 + x3 = 5.

90
5.3. Comments on selected activities

Put the augmented matrix into reduced row echelon form


   
1 2 1 3 1 2 1 3
2 2 0 2  −→  0 −2 −2 −4 
3 4 1 5 0 −2 −2 −4
   
1 2 1 3 1 0 −1 −1
−→ . . . −→  0 1 1 2  −→  0 1 1 2 
0 0 0 0
    
0 0 0
 
0

5
x −1 + t −1 1
with solution  y  =  2 − t  =  2  + t  −1  ,
z t 0 1
t ∈ R. This is the equation of a line in R3 . So the three planes intersect in a line.

91
5. Linear systems I: Gaussian elimination

92
Chapter 6
Linear systems II: an application and
homogeneous systems

Introduction 6
In this chapter we give an economic application of linear systems. We also study some
general properties of the set of solutions to a linear system.

Aims
The aims of this chapter are to:

Describe the Leontief input-output analysis model

Explain what is meant by a homogeneous linear system and what is meant by the
null space of a matrix

Explain how the general solution to any linear system is related, through the
‘Principle of Linearity’ to that of a related homogeneous system.

R
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3,
Section 3.5 and Chapter 2, Section 2.4.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

R
Input-output analysis can also be found in the Anthony and Biggs book.

Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods


and Modelling. Chapter 19.

93
6. Linear systems II: an application and homogeneous systems

Synopsis
We begin by describing a model developed by Leontief known as input-output analysis.
This is an economic application of linear systems.
We then examine the forms of solutions to systems of linear equations and look at their
properties, defining what is meant by a homogeneous system and the null space of a
matrix. We explain how the solution set of a linear system is related to the null space of
the coefficient matrix (or, equivalently, to the solution set of the related homogeneous
linear system).

6
6.1 Application: Leontief input-output analysis
In 1973 Wassily Leontief was awarded the Nobel prize in Economics for work he did
analysing an economy with many interdependent industries using a system of linear
equations. We present a brief outline of his method here.
Suppose an economy has n interdependent production processes; the outputs of the n
industries are used to run the industries and to satisfy an outside demand. We will
assume that prices are fixed so that they can be used to measure the output. The
problem we wish to solve is to determine the level of output of each industry which will
satisfy all demands exactly; that is, both the demands of the other industries and the
outside demand. The problem can be described as a system of linear equations, as we
shall see by considering the following simple example.

Example 6.1 Suppose there are two industries: water and electricity. Let
x1 = total output of water ($ value)
x2 = total output of electricity ($ value)
We can express this as a vector,
 
x1
x= , called a production vector.
x2
Suppose we know that 
$0.01 water
water uses to produce $1.00 water output
$0.15 electricity 
$0.21 water
electric uses to produce $1.00 electricity.
$0.05 electricity
What is the total water used by the industries? Water is using $0.01 for each unit
output, so a total of 0.01x1 , and electricity is using $0.21 water for each unit of its
output, so a total of 0.21x2 . The total amount of water used by the industries is
therefore 0.01x1 + 0.21x2 . In the same way, the total amount of electricity used by
the industries is 0.15x1 + 0.05x2 . The totals can be expressed as
    
water 0.01 0.21 x1
= = Cx.
electricity 0.15 0.05 x2

The matrix C is known as a consumption matrix or a technology matrix.

94
6.1. Application: Leontief input-output analysis

After the industries have used water and electricity to produce their outputs, how
much water and electricity are left to satisfy the outside demand?

Activity 6.1 Think about this before continuing. Write down an expression for the
total amount of water which is left after the industries have each used what they
need to produce their output. Do the same for electricity.

Example 6.10 (continued)


Let d1 denote the outside demand for water, and d2 the demand for electricity. Then
in order for the output of these industries to supply the industries and satisfy the 6
outside demand exactly, the following equations must be satisfied:
(
x1 − 0.01x1 − 0.21x2 = d1 (water)
=⇒
x2 − 0.15x1 − 0.05x2 = d2 (electricity)

In matrix notation,
      
x1 0.01 0.21 x1 d1
− = ,
x2 0.15 0.05 x2 d2

or, x − Cx = d, where
 
d1
d= is the outside demand vector.
d2

If we use the fact that Ix = x, where I is the 2 × 2 identity matrix, then we can
rewrite this system in matrix form as

Ix − Cx = d or (I − C)x = d.

This is now in the usual matrix form for a system of linear equations. A solution, x,
to this system of equations will determine the output levels of each industry required
to satisfy all demands exactly.

Now let’s look at the general case. Suppose we have an economy with n interdependent
industries. If cij denotes the amount of industry i used by industry j to produce $1.00
of industry j, then the consumption or technology matrix is C = (cij ):

c11 c12 · · · c1n


 
 c21 c22 · · · c2n 
C=
 ... .. ... ..  ,
. . 
cn1 cn2 · · · cnn

where

Row i lists the amounts of industry i used by each industry.

Column j lists the amounts of each industry used by industry j.

95
6. Linear systems II: an application and homogeneous systems

If, as before, we denote by d the n × 1 outside demand vector, then in matrix form the
problem we wish to solve is to find the production vector x such that
(I − C)x = d,
a system of n linear equations in n unknowns.

Activity 6.2 Return to Example 6.1 and assume that the public demand for water
is $627 and for electricity is $4,955. Find the levels of output which satisfy all
demands exactly. (You should find that x1 = 1, 800 and x2 = 5, 500.)

6
6.2 Homogeneous systems and null space

6.2.1 Homogeneous systems


Definition 6.1 A homogeneous system of linear equations is a linear system of
the form Ax = 0.

A homogeneous system Ax = 0 is always consistent.

Why? Because A0 = 0, so it always has the solution x = 0. For this reason, x = 0 is


called the trivial solution.

Note that if Ax = 0 has a unique solution, then it must be the trivial solution,
x = 0.

If we form the augmented matrix, (A | 0), of a homogeneous system, then the last
column will consist entirely of zeros. This column will remain a column of zeros
throughout the entire row reduction, so there is no point in writing it. Instead, we use
Gaussian elimination on the coefficient matrix A, remembering that we are solving
Ax = 0.

Example 6.11 Find the solution of the homogeneous linear system,

x + y + 3z + w = 0
x−y+z+w = 0
y + 2z + 2w = 0

We reduce the coefficient matrix A to reduced row echelon form,


     
1 1 3 1 1 1 3 1 1 1 3 1
A =  1 −1 1 1  −→  0 −2 −2 0  −→  0 1 1 0
0 1 2 2 0 1 2 2 0 1 2 2
     
1 1 3 1 1 1 0 −5 1 0 0 −3
−→  0 1 1 0  −→  0 1 0 −2  −→  0 1 0 −2 
0 0 1 2 0 0 1 2 0 0 1 2

96
6.2. Homogeneous systems and null space

Activity 6.3 Work through the above calculation and state what row operation is
being done at each stage. For example, the first operation is R2 − R1 .
Then write down the solution from the reduced row echelon form of the matrix.

The solution is   
x 3
y
 = t 2 ,
 
x=
z  −2  t∈R
w 1
which is a line through the origin, x = tv, with v = (3, 2, −2, 1)T . There are infinitely
many solutions, one for every t ∈ R. 6
This example illustrates the following fact.
Theorem 6.1 If A is an m × n matrix with m < n then Ax = 0 has infinitely many

R
solutions.

Read the proof of this theorem in the textbook A-H, where it is labelled as
Theorem 2.21. But before you do so, think about why the theorem is true and try
to prove it yourself. Why were there infinitely many solutions in the above example?
What about a linear system Ax = b? If A is m × n with m < n, does Ax = b have
infinitely many solutions? The answer is, that provided the system is consistent, then
there are infinitely many solutions, as the following examples show.

Example 6.12 The linear system

x+y+z = 6
x+y+z = 1

is inconsistent, since there are no values of x, y, z which can satisfy both equations.
These equations represent parallel planes in R3 .

Example 6.13 On the other hand,

x + y + 3z + w = 2
x−y+z+w = 4
y + 2z + 2w = 0

is consistent and will have infinitely many solutions. Notice that the coefficient
matrix of this linear system is the same matrix A as that used in the previous
example of a homogeneous system.
The augmented matrix is
 
1 1 3 1 2
(A|b) =  1 −1 1 1 4  .
0 1 2 2 0

97
6. Linear systems II: an application and homogeneous systems

Activity 6.4 Show that the reduced row echelon form of the augmented matrix is
 
1 0 0 −3 1
0 1 0 −2 −2  .
0 0 1 2 1

Then write down the solution.

Example 6.5 (continued)


The general solution of this system,
6   
x 1
 
3

 y   −2   2 
x=  z  =  1  + t  −2  = p + tv
     t∈R
w 0 1

is a line which does not go through the origin. It is parallel to the line of solutions of
the homogeneous system, Ax = 0, and goes through the point determined by p.
This should come as no surprise, since the coefficient matrix forms the first four
columns of the augmented matrix. Compare the solution sets:

Ax = 0 : Ax = b :

RREF (A) RREF (A|b)


   
1 0 0 −3 1 0 0 −3 1
 0 1 0 −2   0 1 0 −2 −2 
0 0 1 2 0 0 1 2 1
     
3 1 3
 2   −2   2 
x = t 
 −2  x= 1  + t  −2 
  

1 0 1

The reduced row echelon form of the augmented matrix of a system Ax = b will always
contain the information for the solution of Ax = 0, since the matrix A is the first part
of (A|b). We therefore have the following definition.
Definition 6.2 (Associated homogeneous system) Given a system of linear
equations, Ax = b, the linear system Ax = 0 is called the associated homogeneous
system.

The solutions of the associated homogeneous system form an important part of the
solution of the system Ax = b, as we shall see in the next section.

Activity 6.5 Look at the reduced row echelon form of A in Example 6.13,
 
1 0 0 −3
 0 1 0 −2  .
0 0 1 2

98
6.2. Homogeneous systems and null space

Explain why you can tell from this matrix that for all b ∈ R3 , the linear system
Ax = b is consistent with infinitely many solutions.

Activity 6.6 Find the solution of the system of equations Ax = b given by

x1 + 2x2 + x3 = 1
2x1 + 2x2 = 2
3x1 + 4x2 + x3 = 2.

Find also the general solution of the associated homogeneous system, Ax = 0.


Describe the configuration of intersecting planes for each system of equations 6
(Ax = b and Ax = 0).

6.2.2 Null space


It is clear from what we have just seen that the general solution to a consistent linear
system Ax = b involves solutions to the system Ax = 0. This set of solutions is given a
special name: the null space or kernel of a matrix A. This null space, denoted N (A), is
the set of all solutions x to Ax = 0, where 0 is the zero vector. That is,
Definition 6.3 (Null space) For an m × n matrix A, the null space of A is the
subset of Rn given by,
N (A) = {x ∈ Rn | Ax = 0},
where 0 = (0, 0, . . . , 0)T is the zero vector of Rm .

We now formalise the connection between the solution set of a consistent linear system,
and the null space of the coefficient matrix of the system.
Theorem 6.2 Suppose that A is an m × n matrix, that b ∈ Rm , and that the system
Ax = b is consistent. Suppose that p is any solution of Ax = b. Then the set of all
solutions of Ax = b consists precisely of the vectors p + z for z ∈ N (A); that is,

R {x | Ax = b} = {p + z | z ∈ N (A)}.
Read the proof of this in the textbook A-H, where it is labelled as Theorem 2.29.
Note that it uses the strategy (Section 2.1.4) of proving that two sets are equal by
showing that each is a subset of the other.
The above result is the ‘Principle of Linearity’. It says that the general solution of a
consistent linear system Ax = b is equal to any one particular solution p, where
Ap = b, plus the general solution of the associated homogeneous system.

{solutions of Ax = b} = p + {solutions of Ax = 0}.


In light of this result, let’s have another look at some of the examples we worked earlier.
In the previous section (page 98) we observed that the solutions of
x + y + 3z + w = 2
x−y+z+w = 4
y + 2z + 2w = 0.

99
6. Linear systems II: an application and homogeneous systems

are of the form



    
x 1 3
 y   −2 
 + t  2  = p + tv,
 
x= =
z  1   −2  t ∈ R,
w 0 1
where x = tv is the general solution we had found of the associated homogeneous
system (page 96). It is clear that p is a particular solution of the linear system (take
t = 0), so this solution is of the form described in the theorem.
Now refer back to the first two examples Ax = b and Bx = b which we worked in
section 5.3.1.
6  
 x+y+z = 3  2z = 3
2x + y + z = 4 2y + 3z = 4 .
x − y + 2z = 5 z = 5
 

The echelon forms of the augmented matrices we found were


3
   
1 0 0 1 0 1 2
2
(A|b) −→  0 1 0 0  , (B|b) −→  0 0 1 5.
0 0 1 2 0 0 0 1
The first system, Ax = b, has a unique solution, p = (1, 0, 2)T , and the second system,
Bx = b, is inconsistent.
The reduced row echelon form of the matrix A is the identity matrix (these are the first
three columns of the augmented matrix). Therefore the homogeneous system Ax = 0
will only have the trivial solution. The unique solution of Ax = b is of the form
x = p + 0, which conforms with the Principle of Linearity.
This principle does not apply to the inconsistent system Bx = b; however, the
associated homogeneous system is consistent. Notice that the homogeneous system is

 2z = 0
2y + 3z = 0
z = 0

which represents the intersection of two planes, since the equations 2z = 0 and z = 0
each represent the xy-plane. To find the solution, we continue to reduce the matrix B to
reduced row echelon form.
0 1 23
   
0 1 0
B −→  0 0 1  −→  0 0 1  .
0 0 0 0 0 0
The non-leading variable is x, so we set x = t, and the solution is
   
t 1
x = 0 = t0, t ∈ R
0 0
which is a line through the origin, namely the x-axis. So the plane 2y + 3z = 0
intersects the xy-plane along the x-axis.
We summarise what we have noticed so far:

100
6.2. Overview

If Ax = b is consistent, the solutions are of the form x = p + z where p is any one


particular solution and z ∈ N (A), the null space of A.
(1) If Ax = b has a unique solution then Ax = 0 has only the trivial solution.
(2) If Ax = b has infinitely many solutions then Ax = 0 has infinitely many
solutions.

Ax = b may be inconsistent, but Ax = 0 is always consistent.

Activity 6.7
Look at the example we solved in section 5.3.3 on page 86.

x1 + x2 + x3 + x4 + x5 = 3
6
2x1 + x2 + x3 + x4 + 2x5 = 4
x 1 − x2 − x3 + x4 + x5 = 5
x1 + x4 + x5 = 4.

Show that the solution we found is of the form x = p + sv + tw, s, t ∈ R, where p is


a particular solution of Ax = b and sv + tw is a general solution of the associated
homogeneous system Ax = 0.

Overview
We have explored an economic application of linear systems, known as input-output
analysis. We have defined the null space of a matrix and expanded our understanding of
linear systems by looking at solutions of homogeneous systems, and showing how the
solution set of a consistent linear system is related to the solution set of the associated
homogeneous system.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

say what is meant by the Leontief input-output model


say what is meant by a homogeneous system of equations and what is meant by the
associated homogeneous system of any linear system of equations
say what is meant by the null space of a matrix
state and explain the Principle of Linearity

Test your knowledge and understanding


Work Exercises 2.5, 2.6 and 3.10 in the text A-H. The solutions to all exercises in the
text can be found at the end of the textbook.
Work Problem 2.6 in the text A-H. You will find the solution on the VLE.

101
6. Linear systems II: an application and homogeneous systems

Comments on selected activities


Feedback to activity 6.1
The total water output remaining is x1 − 0.01x1 − 0.21x2 , and the total electricity
output left is x2 − 0.15x1 − 0.05x2 .
Feedback to activity 6.2
Solve (I − C)x = d by Gaussian elimination, where
     
0.01 0.21 x1 627
C= , x= , d= .
0.15 0.05 x2 4955
6 Reducing the augmented matrix,
   
0.99 −0.21 627 33 −7 20900
((I − C)|d) = →
−0.15 0.95 4955 −3 19 99100
   
1 −7/33 1900/3 1 −7/33 1900/3

−3 19 99100 0 202/11 101000
     
1 −7/33 1900/3 1 0 1800 1800
→ , ⇒ x= .
0 1 5500 0 1 5500 5500

Feedback to activity 6.5


This is the reduced row echelon form of the coefficient matrix, A. The reduced row
echelon form of any augmented matrix. (A|b), will have the same four columns as the
first four of its five columns. As there is a leading one in every row, it is impossible to
have a row of the form (0 0 . . . 0 1), so the system will be consistent. There will be one
free (non-leading) variable, (fourth column, say x4 = t), so there will be infinitely many
solutions.
Feedback to activity 6.6
Using row operations to reduce the augmented matrix to echelon form, we obtain
       
1 2 1 1 1 2 1 1 1 2 1 1 1 2 1 1
 2 2 0 2  →  0 −2 −2 0  → 0 1
 1 0  → 0
 1 1 0 .
3 4 1 2 0 −2 −2 −1 0 −2 −2 −1 0 0 0 −1
There is no reason to reduce the matrix further, we conclude that the original system of
equations is inconsistent, there is no solution. For the homogeneous system, Ax = 0, the
row echelon form of A consists of the first three columns of the echelon form of the
augmented matrix. So starting from these and continuing to reduced row echelon form,
we obtain
     
1 2 1 1 2 1 1 0 −1
A = 2 2 0 → ... → 0 1 1 → 0 1 1 .
3 4 1 0 0 0 0 0 0
Setting the non-leading variable x3 = t, we find that the null space of A consists of all
vectors, x,  
1
x = t  −1  , t ∈ R.
1

102
6.2. Comments on selected activities

The system of equations Ax = 0 has infinitely many solutions.


Geometrically, the associated homogeneous system represents the equations of three
planes, all of which pass through the origin. These planes intersect in a line through the
origin. The equation of this line is given by the solution we found.
The original system represents three planes with no common points of intersection. No
two of the planes in either system are parallel. Why? Look at the normals to the planes,
no two of these are parallel, so no two planes are parallel. These planes intersect to form
a kind of triangular prism; any two planes intersect in a line, and the three lines of
intersection are parallel, but there are no points which lie on all three planes. (If you
have trouble visualising this, take three cards, place one flat on the table, and then get
the other two to balance on top, forming a triangle when viewed from the side.) 6

103
6. Linear systems II: an application and homogeneous systems

104
Chapter 7
Matrix inversion

Introduction
How do we know if a matrix A is invertible, and how do we find the inverse if it is? In
this chapter we will answer these two questions using row operations. The ‘main
theorem’ which accomplishes this will feature throughout the course.
Only a square matrix can have an inverse, so in this chapter all matrices will be square
7
unless explicitly stated otherwise.

Aims
The aims of this chapter are to:

Look carefully at row operations on a matrix, introduce elementary matrices and


establish how they can be used to perform row operations

State and prove (using elementary matrices) the main result answering the
question, “When is a matrix A invertible?”

Deduce and demonstrate a method to find the inverse of a matrix using row
operations

Establish the result that if A and B are square matrices with AB = I, then A and
B are invertible and one is the inverse of the other.

R
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3,
Section 3.1

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

105
7. Matrix inversion

Synopsis
We examine the effects of a row operation on a matrix and on the product of two
matrices. This leads us to the definition of an elementary matrix and we observe how
multiplying a matrix A on the left by an elementary matrix performs a single row
operation on A. This observation enables us to prove the main theorem, which states
that A is invertible if and only if any one of three other conditions holds. From the
proof of this theorem we deduce the method of finding the inverse of a matrix using row
operations. We then use the theorem to prove that if A and B are square matrices with
AB = I, then A and B are invertible and are inverses of each other.

7.1 Elementary matrices


7
Recall the three elementary row operations we use to put a matrix into reduced row
echelon form:
RO1 multiply a row by a non-zero constant
RO2 interchange two rows
RO3 add a multiple of one row to another.
These operations change a matrix into a different matrix. We want to examine this
process more closely. For this purpose, let A be an n × n matrix and let Ai denote the
ith row of A. Then we can write A as a column of n rows,
a11 a12 · · · a1n A1
   
 a21 a22 · · · a2n   A2 
A=  ... .. .. ..  =  ... 

. . . 
an1 an2 · · · ann An
We use this row notation to indicate row operations. For example, what row operations
are indicated below?
A1 A2 A1
     
 3A2   A1   A2 + 4A1 
 .   .   .. 
 ..   ..   . 
An An An
The first is multiply row 2 by 3, the second is interchange row 1 and row 2, and the
third is add 4 times row 1 to row 2. Each of these represent new matrices after the row
operation has been executed.
Now look at a product of two n × n matrices A and B. The (1, 1) entry in the product
is the inner product of row 1 of A and column 1 of B. The (1, 2) entry is the inner
product of row 1 of A and column 2 of B, and so on. In fact, row 1 of the product
matrix AB is obtained by taking the product of the row matrix A1 with the matrix B,
that is, A1 B. This is true of each row of the product; that is, each row i of the product
AB is obtained by taking Ai B. So we can express the product AB as,
a11 a12 · · · a1n b11 b12 · · · b1n A1 B
    
 a21 a22 · · · a2n   b21 b22 · · · b2n   A2 B 
 .
 .. .. .. ..   . .. .. ..  = . 
. . .   .. . . .   .. 
an1 an2 · · · ann bn1 bn2 · · · bnn An B

106
7.1. Elementary matrices

Now consider the effect of a row operation on a product AB. For example, the first
matrix below is the product AB after the row operation ‘add 4 times row 1 of AB to
row 2 of AB’.
A1 B A1 B A1
     
 A2 B + 4A1 B   (A2 + 4A1 )B   A2 + 4A1 
 .. = .. = .. B
 .   .   . 
An B An B An

In the second matrix we have used the distributive rule to write


A2 B + 4A1 B = (A2 + 4A1 )B. But comparing this matrix to the row form of a product
of two matrices given above, this matrix is just the product of the matrix obtained from
A after the same row operation, multiplying the matrix B.
This argument can be repeated for any row operation to show that the matrix obtained
by a row operation on the product AB is equal to the product of the matrix obtained 7
by the row operation on A with the matrix B; that is,
(matrix obtained by a row operation on AB) =
(matrix obtained by a row operation on A)B
This is true for any n × n matrices A and B. Now take A = I, the identity matrix.
Since IB = B, the previous statement now says that:
(matrix obtained by a row operation on B) =
(matrix obtained by a row operation on I)B

Definition 7.1 (Elementary matrix) An elementary matrix, E, is an n × n matrix


obtained by doing exactly one row operation on the n × n identity matrix, I.

For example,      
1 0 0 0 1 0 1 0 0
0 3 0 1 0 0 4 1 0
0 0 1 0 0 1 0 0 1
are elementary matrices. The first has had row 2 multiplied by 3, the second had row 1
and row 2 interchanged, and the last matrix had 4 times row 1 added to row 2.

Activity 7.1 Which of the matrices below are elementary matrices?


     
2 1 0 0 1 0 1 0 0
0 1 0   1 0 0  0 1 0
0 0 1 −1 0 1 −1 0 1

Write the first matrix as the product of two elementary matrices.

Elementary matrices provide a useful tool to relate a matrix to its reduced row echelon
form. We have shown above that the matrix obtained from a matrix B after performing
one row operation is equal to a product EB, where E is the elementary matrix obtained
from I by that same row operation.

107
7. Matrix inversion

Example 7.1 Suppose we want to put the matrix


 
1 2 4
B=  1 3 6
−1 0 1

into reduced row echelon form. Our first step is


   
1 2 4 1 2 4
R2 −R1
B=  1 3 6  −→  0 1 2
−1 0 1 −1 0 1

We perform the same operation on the identity matrix to obtain an elementary


matrix, which we will denote by E1 .
7    
1 0 0 1 0 0
R2 −R1
I= 0 1 0
  −→  −1 1 0  = E1
0 0 1 0 0 1

Then the matrix E1 B is


    
1 0 0 1 2 4 1 2 4
E1 B =  −1 1 0   1 3 6  =  0 1 2  ,
0 0 1 −1 0 1 −1 0 1

which is the matrix obtained from B after the row operation.

We now want to look at the invertibility of elementary matrices and row operations.
Any elementary row operation can be undone by an elementary row operation.
RO1 is multiply a row by a non-zero constant.
To undo RO1, multiply the row by 1/(constant).
RO2 is interchange two rows.
To undo RO2 interchange the rows again.
RO3 is add a multiple of one row to another.
To undo RO3 subtract the multiple of one row from the other.
If we obtain an elementary matrix by performing one row operation on the identity, and
another elementary matrix from the row operation which ‘undoes’ it, then multiplying
these matrices together will return the identity matrix. That is, they are inverses of one
another. This argument establishes the following theorem.
Theorem 7.1 Any elementary matrix is invertible, and the inverse is also an
elementary matrix.

Activity 7.2 For the matrix E below, write down E −1 , and then show that
EE −1 = I and E −1 E = I.  
1 0 0
E =  −4 1 0  .
0 0 1

108
7.2. Row equivalence

We saw earlier in Example 7.1 that multiplying E1 B we obtain


    
1 0 0 1 2 4 1 2 4
E1 B =  −1 1 0 1 3 6 =  0 1 2
0 0 1 −1 0 1 −1 0 1

We can undo this row operation and return the matrix B by multiplying on the left
by E1−1 ,     
1 0 0 1 2 4 1 2 4
1 1 0 0 1 2 =  1 3 6.
0 0 1 −1 0 1 −1 0 1

7.2 Row equivalence


7
Definition 7.2 If A and B are m × n matrices, we say that A is row equivalent to B
if and only if there is a sequence of elementary row operations to transform A into B.

This is an example of what is known as an equivalence relation. This means it


satisfies three important conditions: it is

reflexive A∼A

symmetric A∼B ⇒B∼A

transitive A ∼ B and B ∼ C ⇒ A ∼ C

Activity 7.3 Argue why this is true: that is, explain why row equivalence as defined
above satisfies these three conditions.

The algorithm for putting a matrix A into reduced row echelon form by a sequence of
row operations means that every matrix is row equivalent to a matrix in reduced row
echelon form. This fact is stated in the following theorem.
Theorem 7.2 Every matrix is row equivalent to a matrix in reduced row echelon form.

7.3 The main theorem


We are now ready to answer the first question, ‘When is a matrix invertible?’ We collect
our results in the following theorem.
Theorem 7.3 If A is an n × n matrix, then the following statements are equivalent
(meaning if any one of these statements is true for A, then all the statements are true).
(1) A−1 exists.
(2) Ax = b has a unique solution for any b ∈ Rn .
(3) Ax = 0 only has the trivial solution, x = 0.
(4) The reduced row echelon form of A is I.

109
7. Matrix inversion

It is particularly important for you to appreciate how one statement of this theorem
implies the next. As you read through the proof stop and think about this.
Proof
If we show that (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (1), then any one statement will imply all
the others, so the statements are equivalent.
(1) =⇒ (2). We assume that A−1 exists, and consider the system of linear equations
Ax = b where x is the vector of unknowns and b is any vector in Rn . We use the
matrix A−1 to solve for x by multiplying the equation on the left by A−1 ,
A−1 Ax = A−1 b =⇒ Ix = A−1 b =⇒ x = A−1 b.
This shows that x = A−1 b is a solution, and that it is the only possible solution. So
Ax = b has a unique solution for any b ∈ Rn .
7 (2) =⇒ (3). If Ax = b has a unique solution for all b ∈ Rn , then this is true for b = 0.
The unique solution of Ax = 0 must be the trivial solution, x = 0.

(3) =⇒ (4). If the only solution of Ax = 0 is x = 0, then there are no free


(non-leading) variables and the reduced row echelon form of A must have a leading one
in every column. Since the matrix is square and a leading one in a lower row is further
to the right, A must have a leading one in every row. Since every column with a leading
one has zeros elsewhere, this can only be the n × n identity matrix.

(4) =⇒ (1). We now make use of elementary matrices. If A is row equivalent to I, then
there is a sequence or row operations which reduce A to I, so there must exist
elementary matrices E1 , . . . , Er such that
Er Er−1 · · · E1 A = I.
Each elementary matrix has an inverse. We use these to solve the above equation for A,
−1
by first multiplying the equation on the left by Er−1 , then by Er−1 , and so on, to obtain
A = E1−1 · · · Er−1
−1
Er−1 I
This says that A is a product of invertible matrices, and therefore A is invertible.
(Recall from Chapter 3 that if A and B are invertible matrices of the same size, then
the product AB is invertible and its inverse is the product of the inverses in the reverse
order, (AB)−1 = B −1 A−1 .)
This proves the theorem.

7.4 Using row operations to find the inverse matrix


From the proof of the theorem we have
A = E1−1 · · · Er−1
where the matrices Ei are the elementary matrices corresponding to the row operations
used to reduce A to the identity matrix, I. Then taking the inverse of both sides,
A−1 = (E1−1 · · · Er−1 )−1 = Er · · · E1 = Er · · · E1 I.

110
7.4. Using row operations to find the inverse matrix

This tells us that if we apply the same row operations to the matrix I that we use to
reduce A to I, then we will obtain the matrix A−1 . That is,

Er Er−1 · · · E1 A = I , A−1 = Er · · · E1 .I

This gives us a method to find the inverse of a matrix A. We start with the matrix A
and we form a new, larger matrix by placing the identity matrix to the right of A,
obtaining the matrix denoted (A|I). We then use row operations to reduce this to
(I|B). If this is not possible (which will become apparent) then the matrix is not
invertible. If it can be done, then A is invertible and B = A−1 .

Example 7.2 We use this method to find the inverse of the matrix

7
 
1 2 4
A=  1 3 6.
−1 0 1

In order to determine if the matrix is invertible and, if so, to determine the inverse,
we form the matrix  
1 2 4 1 0 0
(A | I) =  1 3 6 0 1 0.
−1 0 1 0 0 1
(We have separated A from I by a vertical line just to emphasise how this matrix is
formed. It is also helpful in the calculations.) Then we carry out elementary row
operations.  
1 2 4 1 0 0
R2 − R1 
0 1 2 −1 1 0 
R3 + R1
0 2 5 1 0 1
 
1 2 4 1 0 0
 0 1 2 −1 1 0 
R3 − 2R2
0 0 1 3 −2 1
 
1 2 0 −11 8 −4
R1 − 4R3 
0 1 0 −7 5 −2 
R2 − 2R3
0 0 1 3 −2 1
 
1 0 0 3 −2 0
R1 − 2R2 
0 1 0 −7 5 −2  .
0 0 1 3 −2 1
This is now in the form (I|B) so we deduce that A is invertible and that
 
3 −2 0
A−1 =  −7 5 −2  .
3 −2 1

It is very easy to make mistakes when row reducing a matrix, so the next thing you
should do is check that AA−1 = I.

111
7. Matrix inversion

Activity 7.4 Do this. Check that when you multiply AA−1 , you get the identity
matrix I.
(In order to establish that this is the inverse matrix, you should also show
A−1 A = I, but we will forgo that here.)

If the matrix A is not invertible, what will happen? By the theorem, if A is not
invertible, then the reduced row echelon form of A cannot be I, so there will be a row of
zeros in the row echelon form of A.

Activity 7.5 Find the inverse, if it exists, of each of the following matrices:

A = [ −2  1  3 ]        B = [ 2  1  3 ]
    [  0 −1  1 ]            [ 0 −1  1 ]
    [  1  2  0 ],           [ 1  2  0 ].

7.5 Verifying an inverse


At this stage, in order to show that a square matrix B is the inverse of the n × n matrix
A, it seems we have to show that both statements, AB = I and BA = I are true. After
we have proved the following theorem, which follows from Theorem 7.3, we will be able
to deduce from the one statement AB = I that A and B must be inverses of one
another.
Theorem 7.4 If A and B are n × n matrices and AB = I, then A and B are each invertible matrices, and A = B−1 and B = A−1.

Read the proof of this theorem in the textbook A-H, where it is labelled as
Theorem 3.12. Before you do so, think about how you might prove it yourself. If
you can show that the homogeneous system of equations Bx = 0 has only the
trivial solution, x = 0, then by Theorem 7.3 this will prove that B is invertible.
Then you can use B −1 to complete the proof. Try it, and then read the textbook.

Overview
In this chapter we defined and used elementary matrices in order to establish the main
theorem concerning the invertibility of a matrix. The main theorem and its proof are of
fundamental importance, linking the concepts we have studied so far, and we shall be
using and adding to it in later chapters. Indeed, we have already used it to establish the
fact that if A and B are square matrices then it is sufficient to just show that AB = I
in order to conclude that B = A−1 . We no longer have to show that also BA = I.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:


say what is meant by an elementary matrix, and understand how elementary matrices are used to perform row operations

state the results contained in the main theorem (page 109) and understand the proof of this theorem

find the inverse of a matrix using row operations

understand why, if A and B are square matrices and AB = I, then A = B−1 and B = A−1.

Test your knowledge and understanding


Work Exercises 3.1 and 3.2 in the text A-H. The solutions to all exercises in the text
can be found at the end of the textbook.
In addition, work the exercise below. The solution is at the end of this chapter.
Exercise
Exercise 7.1
Prove the following statement using Theorem 7.4:

If A and B are n × n matrices and (AB)−1 exists, then A and B are invertible.

Comments on selected activities


Feedback to activity 7.1
Only the last matrix is an elementary matrix, representing the operation R3 − R1 on I.
The others each represent two row operations. For example,

[ 2 1 0 ]   [ 1 1 0 ] [ 2 0 0 ]
[ 0 1 0 ] = [ 0 1 0 ] [ 0 1 0 ] = E2 E1,
[ 0 0 1 ]   [ 0 0 1 ] [ 0 0 1 ]
where E1 represents 2R1 and E2 represents R1 + R2 . You should multiply the matrices in
the opposite order, E1 E2 , and notice the effect, thinking about the row operations on I.
Feedback to activity 7.2
The matrix E is the identity matrix after the row operation R2 − 4R1 , so the inverse
matrix is the identity matrix after R2 + 4R1,

E−1 = [ 1 0 0 ]
      [ 4 1 0 ]
      [ 0 0 1 ].
Multiply out EE −1 and E −1 E as instructed.
Feedback to activity 7.5
For the matrix A,

(A|I) = [ −2  1  3 | 1 0 0 ]
        [  0 −1  1 | 0 1 0 ]
        [  1  2  0 | 0 0 1 ]

R1 ↔ R3   [  1  2  0 | 0 0 1 ]        R3 + 2R1   [ 1  2  0 | 0 0 1 ]
  −→      [  0 −1  1 | 0 1 0 ]          −→       [ 0 −1  1 | 0 1 0 ]
          [ −2  1  3 | 1 0 0 ]                   [ 0  5  3 | 1 0 2 ]

(−1)R2    [ 1 2  0 | 0  0 1 ]         R3 − 5R2   [ 1 2  0 | 0  0 1 ]
  −→      [ 0 1 −1 | 0 −1 0 ]           −→       [ 0 1 −1 | 0 −1 0 ]
          [ 0 5  3 | 1  0 2 ]                    [ 0 0  8 | 1  5 2 ]

(1/8)R3   [ 1 2  0 |  0    0    1  ]  R2 + R3    [ 1 2 0 |  0    0    1  ]
  −→      [ 0 1 −1 |  0   −1    0  ]    −→       [ 0 1 0 | 1/8 −3/8  1/4 ]
          [ 0 0  1 | 1/8  5/8  1/4 ]             [ 0 0 1 | 1/8  5/8  1/4 ]

R1 − 2R2  [ 1 0 0 | −2/8  6/8  1/2 ]
  −→      [ 0 1 0 |  1/8 −3/8  1/4 ]
          [ 0 0 1 |  1/8  5/8  1/4 ]

So

A−1 = (1/8) [ −2  6  4 ]
            [  1 −3  2 ]
            [  1  5  2 ].

Now check that AA−1 = I.
When you carry out the row reduction, it is not necessary to always indicate the
separation of the two matrices by a line as we have done so far. You just need to keep
track of what you are doing.
In the calculation for the inverse of B, we have omitted the line but added a bit of
space to make it easier for you to read.
   
(B|I) = [ 2  1  3 | 1 0 0 ]           R1 ↔ R3   [ 1  2  0 | 0 0 1 ]
        [ 0 −1  1 | 0 1 0 ]             −→      [ 0 −1  1 | 0 1 0 ]
        [ 1  2  0 | 0 0 1 ]                     [ 2  1  3 | 1 0 0 ]

R3 − 2R1  [ 1  2  0 | 0 0  1 ]        (−1)R2    [ 1  2  0 | 0  0  1 ]
  −→      [ 0 −1  1 | 0 1  0 ]          −→      [ 0  1 −1 | 0 −1  0 ]
          [ 0 −3  3 | 1 0 −2 ]                  [ 0 −3  3 | 1  0 −2 ]

R3 + 3R2  [ 1 2  0 | 0  0  1 ]
  −→      [ 0 1 −1 | 0 −1  0 ]
          [ 0 0  0 | 1 −3 −2 ]
which indicates that the matrix B is not invertible; it is not row equivalent to the
identity matrix.

Comment on exercise
Solution to exercise 7.1
To prove this using Theorem 7.4, write (AB)(AB)−1 = I. By the associativity of matrix
multiplication, this says that A(B(AB)−1 ) = I, and by the theorem this implies that
A−1 exists. Multiplying in the opposite order, (AB)−1 (AB) = I shows in the same way
that B is invertible.

Chapter 8
Determinants

Introduction
The determinant provides an alternative and efficient way to answer the question of
invertibility of a square matrix. In this chapter we will establish another, often more
practical, way to find the inverse of a matrix. This will lead to another method, known
as Cramer’s rule, to find the solution of a system of n linear equations in n unknowns
with a unique solution.
The determinant is only defined for a square matrix, so in this chapter all matrices are
square unless indicated otherwise.

Aims
The aims of this chapter are to:

Define the determinant of a square matrix.

Establish results to facilitate the calculation of the determinant, including the effects of row operations.

Establish how to use the determinant to determine when a matrix A is invertible and to find the inverse matrix.

Demonstrate Cramer’s rule.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3,
Sections 3.2–3.4

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.


Synopsis
We define the terms minor and cofactor for a square matrix in order to define the
determinant of an n × n matrix as a cofactor expansion. We find that the same real
number, the determinant, can be obtained as a cofactor expansion by any row or any
column of the matrix. We then use this fact to discover properties of the determinant,
in particular we find how the determinant of a matrix is affected by changing the
matrix using row operations in order to facilitate its calculation. We next prove the
result that a matrix is invertible if and only if the determinant is non-zero, and see how
to find the inverse of a matrix using cofactors and the adjoint matrix. Finally we apply
this to finding a solution of a system of n linear equations in n unknowns if the
coefficient matrix has non-zero determinant, a method known as Cramer’s Rule.

8.1 Determinants
The determinant of a square matrix A is a particular real number intrinsically
associated with A, written |A| or det A. (You can think of it as a function from the set
of all square matrices to the real numbers.) How do we find this number and what is its
purpose?
The determinant will provide a quick way to determine whether or not a matrix A is invertible. In view of this, suppose A is a 2 × 2 matrix, and that we wish to find A−1 using row operations. Then we form the matrix (A | I) and attempt to row reduce A to I. We assume a ≠ 0, otherwise we would begin by switching rows:

(A | I) = [ a b | 1 0 ]  (1/a)R1  [ 1 b/a | 1/a 0 ]
          [ c d | 0 1 ]    −→     [ c  d  |  0  1 ]

 R2 − cR1  [ 1    b/a     | 1/a   0 ]   aR2   [ 1    b/a     | 1/a  0 ]
   −→      [ 0 d − cb/a   | −c/a  1 ]   −→    [ 0 (ad − bc)  | −c   a ],

which shows that A−1 exists if and only if ad − bc ≠ 0.
For a 2 × 2 matrix, the determinant is given by the formula

det [ a b ]  =  | a b |  =  ad − bc.
    [ c d ]     | c d |

For example,

| 1 2 |
| 3 4 |  =  (1)(4) − (2)(3)  =  −2.

To extend this definition to n × n matrices, we define the determinant of an n × n matrix recursively, in terms of (n − 1) × (n − 1) determinants. So the determinant of a 3 × 3 matrix is given in terms of 2 × 2 determinants, and so on. To do this we will need the following two definitions.
Definition 8.1 (minor) Suppose A is an n × n matrix. The (i, j) minor of A, denoted
by Mij , is the determinant of the (n − 1) × (n − 1) matrix obtained by removing the ith
row and jth column of A.


Definition 8.2 (cofactor) The (i, j) cofactor of a matrix A is

Cij = (−1)i+j Mij .

So the cofactor is equal to the minor if i + j is even, and it is equal to −1 times the
minor if i + j is odd.

Example 8.1 Let

A = [  1 2 3 ]
    [  4 1 1 ]
    [ −1 3 0 ].

Then the minor M23 and the cofactor C23 are

M23 = |  1 2 |  = 5,     C23 = (−1)^(2+3) M23 = −5.
      | −1 3 |

There is a simple way to associate the cofactor Cij with the entry aij of the matrix. Locate the entry aij and cross out the row and the column containing aij. Then evaluate the determinant of the (n − 1) × (n − 1) matrix which remains. This is the minor, Mij. Then give it a ‘+’ or ‘−’ sign according to the position of aij on the following pattern:

[ + − + − ··· ]
[ − + − + ··· ]
[ + − + − ··· ]
[ ⋮  ⋮  ⋮  ⋮  ⋱ ]

Activity 8.1 Write down the cofactor C13 for the matrix A above using this
method.

If A is an n × n matrix, the determinant of A is given by

      | a11 a12 . . . a1n |
      | a21 a22 . . . a2n |
|A| = |  ..  ..  ..   ..  |  =  a11 C11 + a12 C12 + · · · + a1n C1n.
      | an1 an2 . . . ann |

This is called the cofactor expansion of |A| by row one. It is a recursive definition, meaning that the determinant of an n × n matrix is given in terms of (n − 1) × (n − 1) determinants.
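
To see the recursion concretely, here is a short illustrative Python sketch (our own, not from the text) of the cofactor expansion by row one. It mirrors the definition directly, so it is only sensible for small matrices:

def det(A):
    """Determinant by cofactor expansion along row one (A: list of lists)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # the minor M_1j: delete row one and column j+1
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # the cofactor sign (-1)^(1 + (j+1)) equals (-1)^j
        total += A[0][j] * (-1) ** j * det(minor)
    return total

print(det([[1, 2], [3, 4]]))                     # -2
print(det([[1, 2, 3], [4, 1, 1], [-1, 3, 0]]))   # 34, as in Example 8.2 below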

Example 8.2 We calculate the determinant of the matrix A in Example 8.1:

|A| = 1C11 + 2C12 + 3C13

    = 1 | 1 1 |  − 2 |  4 1 |  + 3 |  4 1 |
        | 3 0 |      | −1 0 |      | −1 3 |

    = 1(−3) − 2(1) + 3(13) = 34.


Note that if A is a 2 × 2 matrix, then the determinant as defined earlier is just the cofactor expansion:

| a11 a12 |
| a21 a22 |  =  a11 C11 + a12 C12  =  a11 a22 − a12 a21.

Activity 8.2 Calculate the determinant of the matrix

M = [ −1 2 1 ]
    [  0 2 3 ]
    [  1 1 4 ].

You might ask, ‘Why is the cofactor expansion given by row 1, rather than any other row?’ In fact it turns out that using a cofactor expansion by any row or column of A will give the same number |A|, as the following theorem states.

Theorem 8.1 If A is an n × n matrix, then the determinant of A can be computed by multiplying the entries of any row (or column) by their cofactors and summing the resulting products:

|A| = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin     (cofactor expansion by row i)

|A| = a1j C1j + a2j C2j + · · · + anj Cnj     (cofactor expansion by column j).

Before we look into a proof, note that this result allows you to choose any row or any
column of a matrix to find its determinant using a cofactor expansion. So you should
choose a row or column which gives the simplest calculations.
Obtaining the correct value for |A| is important, so it is a good idea to check your result
by calculating the determinant by another row or column.

Example 8.3 In the example we have been using (see page 117), instead of using the cofactor expansion by row 1 as shown above, we can choose to evaluate the determinant of the matrix A by row 3 or column 3, which will involve fewer calculations since a33 = 0. To check the result |A| = 34, we will evaluate the determinant again using column three. Remember the correct cofactor signs.

      |  1 2 3 |
|A| = |  4 1 1 |  =  3 |  4 1 |  − 1 |  1 2 |  + 0  =  3(13) − (5)  =  34.
      | −1 3 0 |       | −1 3 |      | −1 3 |

Activity 8.3 Check your calculation of the determinant of the matrix

M = [ −1 2 1 ]
    [  0 2 3 ]
    [  1 1 4 ]

in the previous activity by expanding by a different row or column. Choose one with fewer calculations.


Read Section 3.2.2 of the text A-H, which gives an informal proof of
Theorem 8.1. The purpose of this section is to show that the determinant is a
number intrinsically defined by the matrix as a sum of signed elementary products
and that this very same number can be obtained as a cofactor expansion by any
row or column of the matrix.

8.2 Results on determinants


Theorem 8.2 If A is an n × n matrix then

|AT | = |A|.

Proof
This theorem follows immediately from Theorem 8.1. The cofactor expansion by row i of |AT| is precisely the same, number for number, as the cofactor expansion by column i of |A|.
Each of the following three statements follows from Theorem 8.1. By Theorem 8.2, it
follows that each is true if the word row is replaced by column. We will need these
results in the next section. In all of them we assume that A is an n × n matrix.

Corollary 1 If a row of A consists entirely of zeros, then |A| = 0.

Corollary 2 If A contains two rows which are equal, then |A| = 0.

Corollary 3 If the cofactors of one row are multiplied by the entries of a different row, then the result is 0.

Read the proofs of these Corollaries in Section 3.3 of the text A-H where they
are labelled Corollary 3.28, Corollary 3.29 and Corollary 3.30.

Activity 8.4 Write down any 3 × 3 matrix and use it to illustrate Corollary 3:
multiply the cofactors of one row by the entries of a different row. Then find the
determinant of your matrix using the correct row entries and verify it using a
cofactor expansion by a different row or column.

8.2.1 Determinant using row operations


Definition 8.3 An n × n matrix A is upper triangular if all entries below the main
diagonal are zero. It is lower triangular if all entries above the main diagonal are zero.

upper triangular matrix:
[ a11 a12 . . . a1n ]
[  0  a22 . . . a2n ]
[  ..  ..  ..   ..  ]
[  0   0  . . . ann ]

lower triangular matrix:
[ a11  0  . . .  0  ]
[ a21 a22 . . .  0  ]
[  ..  ..  ..   ..  ]
[ an1 an2 . . . ann ]

diagonal matrix:
[ a11  0  . . .  0  ]
[  0  a22 . . .  0  ]
[  ..  ..  ..   ..  ]
[  0   0  . . . ann ]

Suppose we wish to evaluate the determinant of an upper triangular matrix, such as

| a11 a12 . . . a1n |
|  0  a22 . . . a2n |
|  ..  ..  ..   ..  |
|  0   0  . . . ann |

Which row or column should we use for the cofactor expansion? Clearly the calculations are simplest if we expand by column 1 or row n. Expansion by column 1 gives us

          | a22 . . . a2n |
|A| = a11 |  ..  ..   ..  |
          |  0  . . . ann |

where the (n − 1) × (n − 1) matrix on the right is again upper triangular. Continuing in this way, we see that |A| is just the product of the diagonal entries. The same argument holds true for a matrix which is diagonal or lower triangular, so we have established one more corollary of Theorem 8.1:

Corollary 4 If A is upper triangular, lower triangular, or diagonal, then

|A| = a11 a22 · · · ann.
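
A quick numerical check of this corollary (our own sketch; the matrix is an arbitrary choice):

import numpy as np

# an upper triangular matrix: the determinant should be 2 * 5 * (-1) = -10
U = np.array([[2.0, 3.0, 1.0],
              [0.0, 5.0, 4.0],
              [0.0, 0.0, -1.0]])
print(np.prod(np.diag(U)))    # -10.0, the product of the diagonal entries
print(np.linalg.det(U))       # -10.0 (up to floating-point rounding)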

A square matrix in row echelon form is upper triangular. If we know how a determinant is affected by a row operation, then this observation will give us an easier way to calculate large determinants. We can use row operations to put the matrix into row echelon form, keep track of any changes, and then easily calculate the determinant of the reduced matrix. So how does each row operation affect the value of the determinant?

RO1 multiply a row by a non-zero constant

Suppose the matrix B is obtained from a matrix A by multiplying row i by a non-zero constant α. For example,

      | a11 a12 . . . a1n |         | a11  a12  . . . a1n  |
|A| = | a21 a22 . . . a2n |   |B| = | αa21 αa22 . . . αa2n |
      |  ..  ..  ..   ..  |         |  ..   ..   ..    ..  |
      | an1 an2 . . . ann |         | an1  an2  . . . ann  |

If we evaluate |B| by the cofactor expansion by row i, we obtain

|B| = αai1 Ci1 + αai2 Ci2 + · · · + αain Cin = α(ai1 Ci1 + ai2 Ci2 + · · · + ain Cin ) = α|A|


The effect of multiplying a row of A by α is to multiply |A| by α, |B| = α|A|.

When we actually need this, we will use it to factor out a constant α from the determinant, as

| a11  a12  . . . a1n  |       | a11 a12 . . . a1n |
| αa21 αa22 . . . αa2n |  = α  | a21 a22 . . . a2n |
|  ..   ..   ..    ..  |       |  ..  ..  ..   ..  |
| an1  an2  . . . ann  |       | an1 an2 . . . ann |.

RO2 interchange two rows

This time we will use an inductive proof on the cofactor expansion. If A is a 2 × 2 matrix and B is the matrix obtained from A by interchanging the two rows, then

|A| = | a b | = ad − bc,     |B| = | c d | = bc − ad,
      | c d |                      | a b |

so |B| = −|A|.
Now let A be a 3 × 3 matrix and let B be a matrix obtained from A by interchanging two rows. Then if we expand |B| using a different row, each cofactor contains the determinant of a 2 × 2 matrix which is a cofactor of A with two rows interchanged, so each will be multiplied by −1, and |B| = −|A|. To visualise this, consider for example

      | a b c |          | g h i |
|A| = | d e f |,   |B| = | d e f |.
      | g h i |          | a b c |

Expanding |A| and |B| by row 2, we have

|A| = −d | b c | + e | a c | − f | a b |
         | h i |     | g i |     | g h |

|B| = −d | h i | + e | g i | − f | g h |  =  −|A|
         | b c |     | a c |     | a b |

since all the 2 × 2 determinants change sign. In the same way, if this holds for (n − 1) × (n − 1) matrices, then it holds for n × n matrices.

The effect of interchanging two rows of a matrix is to multiply the determinant by −1, |B| = −|A|.

RO3 add a multiple of one row to another.

Suppose the matrix B is obtained from the matrix A by replacing row j of A by row j plus k times row i of A, j ≠ i. For example, consider the case in which B is obtained from A by adding 4 times row 1 of A to row 2. Then

      | a11 a12 . . . a1n |
|A| = | a21 a22 . . . a2n |
      |  ..  ..  ..   ..  |
      | an1 an2 . . . ann |,

      |    a11         a12      . . .    a1n     |
|B| = | a21 + 4a11  a22 + 4a12  . . . a2n + 4a1n |
      |     ..          ..       ..      ..      |
      |    an1         an2      . . .    ann     |.
In general, in a situation like this, we can expand |B| by row j:

|B| = (aj1 + kai1)Cj1 + (aj2 + kai2)Cj2 + · · · + (ajn + kain)Cjn
    = aj1 Cj1 + aj2 Cj2 + · · · + ajn Cjn + k(ai1 Cj1 + ai2 Cj2 + · · · + ain Cjn)
    = |A| + 0.

The last expression in brackets is 0 because it consists of the cofactors of one row multiplied by the entries of another row (Corollary 3). So this row operation does not change the value of |A|.

There is no change in the value of the determinant if a multiple of one row is added
to another.

We collect these results in the following theorem.


Theorem 8.3 (Effect of a row (column) operation on |A|) All statements are true if row is replaced by column.

(RO1) If a row is multiplied by a non-zero constant α, |A| changes to α|A|.
(RO2) If two rows are interchanged, |A| changes to −|A|.
(RO3) If a multiple of one row is added to another, there is NO change in |A|.
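
These three effects are easy to confirm numerically. The following sketch (our own, using numpy on a randomly generated matrix) checks each one:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
d = np.linalg.det(A)

B1 = A.copy(); B1[1] *= 5                 # RO1: multiply row 2 by 5
B2 = A.copy(); B2[[0, 2]] = B2[[2, 0]]    # RO2: interchange rows 1 and 3
B3 = A.copy(); B3[3] += 2 * B3[0]         # RO3: add 2*(row 1) to row 4

print(np.isclose(np.linalg.det(B1), 5 * d))   # True
print(np.isclose(np.linalg.det(B2), -d))      # True
print(np.isclose(np.linalg.det(B3), d))       # True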

Example 8.4 We can now use row operations to evaluate

      | 1 2 −1 4 |
      |−1 3  0 2 |
|A| = | 2 1  1 2 |
      | 1 4  1 3 |

by reducing A to an upper triangular matrix. First we obtain zeros below the leading one by adding multiples of row 1 to the rows below. The new matrix will have the same determinant as A,

      | 1  2 −1  4 |        | 1 2 −1  4 |
|A| = | 0  5 −1  6 |  = −3  | 0 5 −1  6 |
      | 0 −3  3 −6 |        | 0 1 −1  2 |
      | 0  2  2 −1 |        | 0 2  2 −1 |

In the second step we factored −3 from the third row. We would need to multiply the resulting determinant on the right by −3 in order to put the −3 back into the third row, and get back a matrix with the same determinant as A. Next we switch row 2 and row 3, with the effect of changing the sign of the determinant.

        | 1 2 −1  4 |       | 1 2 −1  4 |
|A| = 3 | 0 1 −1  2 |  =  3 | 0 1 −1  2 |
        | 0 5 −1  6 |       | 0 0  4 −4 |
        | 0 2  2 −1 |       | 0 0  4 −5 |

The final steps all use RO3, so there is no change in the value of the determinant. Finally we evaluate the determinant of the upper triangular matrix

        | 1 2 −1  4 |
|A| = 3 | 0 1 −1  2 |  =  3(1)(1)(4)(−1)  =  −12.
        | 0 0  4 −4 |
        | 0 0  0 −1 |

A word of caution with row operations! What is the change in the value of |A|
(1) if R2 is replaced by R2 − 3R3, or
(2) if R2 is replaced by 3R1 − R2?
For (1) there is no change, but for (2) the determinant will change sign. Why? 3R1 − R2 is actually two elementary row operations: first multiply row 2 by −1, and then add three times row 1 to it. When performing row operation RO3, you should always add a multiple of another row to the row you are replacing.
Activity 8.5 You can shorten the writing in the above example by expanding the 4 × 4 determinant using the first column as soon as you have obtained the determinant with zeros under the leading one. You will then be left with a 3 × 3 determinant to evaluate. Do this. Without looking at the example above, work through the calculations in this way to evaluate

      | 1 2 −1 4 |
      |−1 3  0 2 |
|A| = | 2 1  1 2 |
      | 1 4  1 3 |.

8.2.2 The determinant of a product


One very important result concerning determinants can be stated as: ‘The determinant
of the product of two square matrices is the product of their determinants’. This is the
content of the following theorem.
Theorem 8.4 If A and B are n × n matrices then

|AB| = |A||B|.

Read the proof of this theorem in the text A-H where it is labelled Theorem 3.37.

8.3 Matrix inverse using cofactors


Theorem 8.5 If A is an n × n matrix, then A is invertible if and only if |A| ≠ 0.

We will give two proofs of this theorem. The first proof follows easily from Theorem 7.3 and establishes |A| ≠ 0 as another equivalent condition for a matrix A to be invertible.


The second proof follows from our results on determinants and gives us another method
to calculate the inverse of a matrix.
Proof 1: We have already established this theorem indirectly by our arguments in the
previous section; we will repeat and collect them here.
By Theorem 7.3 on page 109, A is invertible if and only if the reduced row echelon form
of A is the identity matrix. Let R be the reduced row echelon form of A. Since A is a
square matrix, R is either the identity matrix, or a matrix with a row of zeros. (Indeed,
if R has a leading one in every row, then it must also have a leading one in every
column, and since it is n × n it must be the identity matrix. Otherwise, there is a row of
R without a leading one, and this must, therefore, be a row of zeros.)
So either R = I, which is the case if and only if A is invertible, with |R| = 1 ≠ 0; or |R| = 0 because it has a row of zeros, which is the case if and only if A is not invertible. As we have seen, row operations cannot alter the fact that a determinant is zero or non-zero. By performing a row operation we might be multiplying the determinant by a non-zero constant, or by −1, or not changing the determinant at all. Therefore we can conclude that |A| = 0 if and only if the determinant of its reduced row echelon form, |R|, is 0, which is if and only if A is not invertible. Or, put the other way, |A| ≠ 0 if and only if |R| = 1, if and only if the matrix A is invertible. 

Proof 2: We will now prove this theorem directly. Since it is an if and only if statement, we must prove both implications.
First we show that if A is invertible, then |A| ≠ 0. We assume A−1 exists, so that AA−1 = I. Then taking the determinant of both sides of this equation, |AA−1| = |I| = 1. Applying Theorem 8.4 to the product,

|AA−1| = |A| |A−1| = 1.

If the product of two real numbers is non-zero, then neither number can be zero, which proves that |A| ≠ 0.
As a consequence of this argument we have the bonus result that

|A−1| = 1/|A|.

We now show the other implication: if |A| ≠ 0 then A is invertible. To do this we will construct A−1, and for that we need some definitions.
Definition 8.4 If A is an n × n matrix, the matrix of cofactors of A is the matrix whose (i, j) entry is Cij, the (i, j) cofactor of A. The adjoint (referred to as the adjugate in some textbooks) of the matrix A is the transpose of the matrix of cofactors. That is, the adjoint of A, adj(A), is the matrix

         [ C11 C21 . . . Cn1 ]
adj(A) = [ C12 C22 . . . Cn2 ]
         [  ..  ..  ..   ..  ]
         [ C1n C2n . . . Cnn ].

Notice that column 1 of this matrix consists of the cofactors of row 1 of A (and row 1
consists of the cofactors of column 1 of A), and similarly for each column and row.


We now multiply the matrix A with its adjoint matrix,

           [ a11 a12 . . . a1n ] [ C11 C21 . . . Cn1 ]
A adj(A) = [ a21 a22 . . . a2n ] [ C12 C22 . . . Cn2 ]
           [  ..  ..  ..   ..  ] [  ..  ..  ..   ..  ]
           [ an1 an2 . . . ann ] [ C1n C2n . . . Cnn ].

Look carefully at what each entry of the product will be.


The (1, 1) entry is: a11 C11 + a12 C12 + · · · + a1n C1n . This is the cofactor expansion of |A|
by row 1.
The (1,2) entry is: a11 C21 + a12 C22 + · · · + a1n C2n . This consists of the cofactors of
row 2 of A multiplied by the entries of row 1, so this is equal to 0 by Corollary 3 in
section 8.2.
Continuing in this way, we see that the entries on the main diagonal of the product are
all equal to |A|, and all entries off the main diagonal are equal to 0. That is,

           [ |A|  0  · · ·  0  ]
A adj(A) = [  0  |A| · · ·  0  ]  =  |A| I,
           [  ..  ..  ..    .. ]
           [  0   0  · · · |A| ]

since |A| is just a real number, a scalar.


We know |A| ≠ 0, so we can divide both sides of the equation by |A| to obtain

A ( (1/|A|) adj(A) ) = I.

This implies that

A−1 = (1/|A|) adj(A).

This gives us a method to calculate the inverse of a matrix using cofactors.
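
As an illustrative sketch (our own), the formula translates directly into code. Like the hand method, it is only practical for small matrices, since it evaluates n² determinants of size (n − 1) × (n − 1):

import numpy as np

def inverse_by_adjoint(A):
    """Compute A^(-1) = (1/|A|) adj(A) via cofactors."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("A is not invertible: |A| = 0")
    C = np.empty((n, n))                  # the matrix of cofactors
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / detA                     # adj(A) is the transpose of C

A = np.array([[1, 2, 3], [-1, 2, 1], [4, 1, 1]])   # the matrix of Example 8.5 below
print(np.allclose(A @ inverse_by_adjoint(A), np.eye(3)))   # True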

Example 8.5 Find A−1 for the matrix

A = [  1 2 3 ]
    [ −1 2 1 ]
    [  4 1 1 ].

First calculate |A| to see if A is invertible. Using the cofactor expansion by row 1,

|A| = 1(2 − 1) − 2(−1 − 4) + 3(−1 − 8) = −16 ≠ 0.

We then calculate the minors, for example

M11 = | 2 1 |  =  1,
      | 1 1 |


and fill in the chart below:

M11 =  1      M12 = −5      M13 = −9
M21 = −1      M22 = −11     M23 = −7
M31 = −4      M32 =  4      M33 =  4.

Change the minors into cofactors, by multiplying by −1 those minors with i + j equal to an odd number. Finally transpose the result to form the adjoint matrix, so that

A−1 = (1/|A|) adj(A) = − (1/16) [  1   1  −4 ]
                                [  5 −11  −4 ]
                                [ −9   7   4 ].

As with all calculations, it is easy to make a mistake. Therefore, having found A−1, the next thing you should do is check your result by showing that AA−1 = I:

− (1/16) [  1 2 3 ] [  1   1  −4 ]      − (1/16) [ −16   0    0 ]
         [ −1 2 1 ] [  5 −11  −4 ]  =            [   0 −16    0 ]  =  I.
         [  4 1 1 ] [ −9   7   4 ]               [   0   0  −16 ]

Activity 8.6 Use this method to find the inverse of the matrix

A = [ 1 2 3 ]
    [ 0 4 0 ]
    [ 5 6 7 ].

Check your result.

Remember: the adjoint matrix contains only the cofactors of A; its (i, j) entry is the cofactor Cji of A. The entries of A multiply the cofactors only when calculating the determinant |A|, not when forming adj(A).

8.4 Cramer’s rule

If A is a square matrix with |A| ≠ 0, then Cramer’s rule gives us an alternative method of solving a system of linear equations Ax = b.

Theorem 8.6 (Cramer’s rule) If A is n × n, |A| ≠ 0, and b ∈ Rn, then the solution x = (x1, x2, . . . , xn)T of the linear system Ax = b is given by

xi = |Ai| / |A|,

where Ai is the matrix obtained from A by replacing the ith column with the vector b.
Before you look at the proof of this theorem, let’s see how it works.


Example 8.6 Use Cramer’s rule to find the solution of the linear system

x + 2y + 3z = 7
−x + 2y + z = −3
4x + y + z = 5.

In matrix form Ax = b, this system is

[  1 2 3 ] [ x ]   [  7 ]
[ −1 2 1 ] [ y ] = [ −3 ].
[  4 1 1 ] [ z ]   [  5 ]

We first check that |A| ≠ 0. This is the same matrix A whose inverse we found in Example 8.5 on page 125; |A| = −16. Then applying Cramer’s rule, we find x by evaluating the determinant of the matrix obtained from A by replacing column 1 with b:

    |  7 2 3 |
    | −3 2 1 |
    |  5 1 1 |      −16
x = ------------ = ----- = 1,
        |A|        −16

and in the same way we obtain y and z:

    |  1  7 3 |                       |  1 2  7 |
    | −1 −3 1 |                       | −1 2 −3 |
    |  4  5 1 |      48               |  4 1  5 |     −64
y = ------------ = ----- = −3,    z = ------------ = ----- = 4,
        |A|        −16                    |A|        −16

which can be easily checked by substitution into the original equations (or

R
multiplying Ax).

Read and work through the proof of this theorem in the text A-H, where it is
labelled Theorem 3.43. Be sure you understand how the proof works.

Summary of Cramer’s rule. To find xi ,


(1) replace column i of A by b,
(2) evaluate the determinant of the resulting matrix,
(3) divide by |A|.
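
The three steps translate directly into a short sketch (our own, using numpy), shown here on the system of Example 8.6:

import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule; requires |A| != 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("Cramer's rule needs |A| != 0")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                       # (1) replace column i of A by b
        x[i] = np.linalg.det(Ai) / detA    # (2) evaluate, (3) divide by |A|
    return x

A = [[1, 2, 3], [-1, 2, 1], [4, 1, 1]]
b = [7, -3, 5]
print(cramer_solve(A, b))                  # [ 1. -3.  4.]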

Activity 8.7 Can you think of any other methods you can use to obtain the
solution to Example 8.6?

Overview
In this chapter we have shown how to obtain the determinant of a square matrix, the
real number intrinsically associated with the matrix. We looked at properties of the
determinant and how its value is affected by changing the matrix using row operations.


We then used this information to obtain the result that a square matrix A is invertible
if and only if |A| ≠ 0. This led in turn to the method of finding the inverse of A using
cofactors (the adjoint matrix) and to Cramer’s rule.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

find the determinant of a square matrix


use and understand the determinant as a means of determining invertibility of a
square matrix
find the inverse of a matrix using cofactors
solve a consistent system of n linear equations in n unknowns using Cramer’s rule.

In addition you should know that:

There are three methods to solve Ax = b if A is n × n and |A| ≠ 0:
(1) Gaussian elimination.
(2) Find A−1, then x = A−1b.
(3) Cramer’s rule.

There is one method to solve Ax = b if A is m × n and m ≠ n, or if |A| = 0:
(1) Gaussian elimination.

There are two methods to find A−1:
(1) Using cofactors for the adjoint matrix.
(2) By row reduction of (A | I) to (I | A−1).

If A is an n × n matrix, then the following statements are equivalent (Theorem 7.3 and Theorem 8.5):
(1) A is invertible.
(2) Ax = b has a unique solution for any b ∈ Rn.
(3) Ax = 0 has only the trivial solution, x = 0.
(4) The reduced row echelon form of A is I.
(5) |A| ≠ 0.

Test your knowledge and understanding


Work Exercises 3.3–3.9 and 3.11 in the text A-H. The solutions can be found at the end
of the textbook.
Work Problem 3.10 in the text A-H. (Carry out the required proof using determinants.)
You will find the solution on the VLE.


Comments on selected activities


Feedback to activity 8.1
C13 = 13.

Feedback to activity 8.2


|M | = −1(8 − 3) − 2(0 − 3) + 1(0 − 2) = −1

Feedback to activity 8.3


You should either expand by column 1 or row 2. For example, using column 1:
|M | = −1(8 − 3) + 1(6 − 2) = −1.

Feedback to activity 8.5

      | 1  2 −1  4 |
      | 0  5 −1  6 |     |  5 −1  6 |
|A| = | 0 −3  3 −6 |  =  | −3  3 −6 |
      | 0  2  2 −1 |     |  2  2 −1 |

At this stage you can expand the 3 × 3 matrix using a cofactor expansion, or continue a bit more with row operations:

        | 1 −1  2 |       | 1 −1  2 |
|A| = 3 | 5 −1  6 |  =  3 | 0  4 −4 |  =  3 | 4 −4 |  =  3(−4)  =  −12.
        | 2  2 −1 |       | 0  4 −5 |       | 4 −5 |

Feedback to activity 8.6

|A| = −32 ≠ 0

A−1 = (1/|A|) adj(A) = − (1/32) [  28  4 −12 ]            [ −7 −1  3 ]
                                [   0 −8   0 ]  =  (1/8)  [  0  2  0 ].
                                [ −20  4   4 ]            [  5 −1 −1 ]

Feedback to activity 8.7


One way is to use Gaussian elimination. Another method is to use the inverse matrix.
You found A−1 in Example 8.5 on page 125. Now use it to calculate the solution,
x = A−1 b.


Chapter 9
Rank, range and linear systems

Introduction
In this short chapter we aim to extend and consolidate what we have learned so far
about systems of equations and matrices, and tie together many of the results of the
previous chapters. We will intersperse an overview of the previous chapters with two
new concepts: the rank of a matrix and the range of a matrix.
This chapter will serve as a synthesis of what we have learned so far in anticipation of a
return to these topics later in the guide.

Aims
The aims of this chapter are to:

synthesise what we have learned so far about linear systems

introduce the concepts of the rank and range of a matrix

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 4.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

Synopsis
We start by introducing the rank of a matrix. We then show how the rank of a matrix is
connected with the set of solutions of linear systems corresponding to the matrix and
relate the number of free parameters in the general solution (when it exists) to the rank
of the corresponding matrix. Finally, we define the range of a matrix.


9.1 The rank of a matrix


Any matrix A can be reduced to a matrix in reduced row echelon form by elementary
row operations. You just have to follow the algorithm and you will obtain first a
row-equivalent matrix which is in row echelon form, and then continuing with the
algorithm, a row-equivalent matrix in reduced row echelon form (see section 7.2).
Another way to say this is

Any matrix A is row-equivalent to a matrix in reduced row echelon form.

There are several ways of defining the rank of a matrix, and we shall meet some other
(more sophisticated) ways later. All are equivalent. We begin with the following
definition.
Definition 9.1 (Rank of a matrix) The rank, rank(A), of a matrix A is the number of
non-zero rows in a row echelon matrix obtained from A by elementary row operations.

Notice that the definition only requires that the matrix A be put into row echelon form, because by then the number of non-zero rows is determined. By a non-zero row, we simply mean one that contains entries other than 0. Since every non-zero row of a matrix in row echelon form begins with a leading one, this is equivalent to the following definition.
Definition 9.2 The rank, rank(A), of a matrix A is the number of leading ones in a
row echelon matrix obtained from A by elementary row operations.

Generally, if A is an m × n matrix, then the number of non-zero rows (the number of


leading ones) in a row echelon form of A can certainly be no more than the total
number of rows, m. Furthermore, since the leading ones must be in different columns,
the number of leading ones in the echelon form can be no more than the total number,
n, of columns. Thus we have:
Theorem 9.1 For an m × n matrix A, rank(A) ≤ min{m, n}, where min{m, n}
denotes the smaller of the two integers m and n.

Example 9.1 Consider the matrix

M = [ 1 2 1 1 ]
    [ 2 3 0 5 ]
    [ 3 5 1 6 ].

Reducing this to row echelon form using elementary row operations, we have:

[ 1 2 1 1 ]     [ 1  2  1  1 ]     [ 1 2 1  1 ]
[ 2 3 0 5 ]  →  [ 0 −1 −2  3 ]  →  [ 0 1 2 −3 ]
[ 3 5 1 6 ]     [ 0 −1 −2  3 ]     [ 0 0 0  0 ]

This last matrix is in row echelon form and has two non-zero rows (and two leading ones), so the matrix M has rank 2.
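
If you want to check a hand computation of rank, numpy offers a ready-made routine (it works via the singular value decomposition rather than row reduction, but it returns the same number):

import numpy as np

M = np.array([[1, 2, 1, 1],
              [2, 3, 0, 5],
              [3, 5, 1, 6]])
print(np.linalg.matrix_rank(M))   # 2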


Activity 9.1 Prove that the matrix

B = [ 1 2 1 1 ]
    [ 2 3 0 5 ]
    [ 3 5 1 4 ]

has rank 3.

If a square matrix A of size n × n has rank n, then its reduced row echelon form has a
leading one in every row and (since the leading ones are in different columns) a leading
one in every column. Since every column with a leading one has zeros elsewhere, it
follows that the reduced echelon form of A must be I, the n × n identity matrix.
Conversely, if the reduced row echelon form of A is I, then by the definition of rank, A
has rank n. We therefore have one more equivalent statement to add to our theorem:
Theorem 9.2 If A is an n × n matrix, then the following statements are equivalent.

A−1 exists.

Ax = b has a unique solution for any b ∈ Rn.

Ax = 0 has only the trivial solution, x = 0.

The reduced row echelon form of A is I.

|A| ≠ 0.

The rank of A is n.

9.2 Rank and systems of linear equations


Recall that to solve a system of linear equations, one forms the augmented matrix and
reduces it to echelon form by using elementary row operations.

Example 9.2 Consider the system of equations

x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 4.

The augmented matrix is the matrix B in the previous activity. When you reduced B to find the rank, after two steps you found

[ 1 2 1 1 ]     [ 1  2  1  1 ]     [ 1 2 1  1 ]
[ 2 3 0 5 ]  →  [ 0 −1 −2  3 ]  →  [ 0 1 2 −3 ].
[ 3 5 1 4 ]     [ 0 −1 −2  1 ]     [ 0 0 0 −2 ]


Thus the original system of equations is equivalent to the system

x1 + 2x2 + x3 = 1
x2 + 2x3 = −3
0x1 + 0x2 + 0x3 = −2.

But this system has no solutions, since there are no values of x1 , x2 , x3 that satisfy
the last equation. It reduces to the false statement ‘0 = −2’, whatever values we give
the unknowns. We deduce, therefore, that the original system has no solutions, and
we say that it is inconsistent. Notice that in this case there is no reason to reduce
the matrix further.

If, as in Example 9.2, the row echelon form of an augmented matrix has a row of the kind (0 0 . . . 0 a), with a ≠ 0, then the original system is equivalent to one in which there is an equation

0x1 + 0x2 + · · · + 0xn = a   (a ≠ 0).

Clearly this equation cannot be satisfied by any values of the xi s, and the system is inconsistent.
Continuing with our example:

Example 9.2 (continued)

Note that the coefficient matrix A consists of the first three columns of the augmented matrix, and the row echelon form of A consists of the first three columns of the row echelon form of the augmented matrix:

A = [ 1 2 1 ]              [ 1 2 1 ]
    [ 2 3 0 ]  →  . . . →  [ 0 1 2 ]
    [ 3 5 1 ]              [ 0 0 0 ]

(A|b) = [ 1 2 1 1 ]              [ 1 2 1  1 ]
        [ 2 3 0 5 ]  →  . . . →  [ 0 1 2 −3 ].
        [ 3 5 1 4 ]              [ 0 0 0  1 ]

The rank of the coefficient matrix A is 2, but the rank of the augmented matrix (A|b) is 3.

If a linear system is consistent then there can be no leading one in the last column of
the reduced augmented matrix, for that would mean there was a row of the form
(0 0 . . . 0 1). Thus, a system Ax = b is consistent if and only if the rank of the
augmented matrix is precisely the same as the rank of the matrix A.
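
This rank test is straightforward to apply in code. A sketch (our own, using numpy, on the systems of Example 9.2 above and Example 9.3 below):

import numpy as np

A = np.array([[1, 2, 1],
              [2, 3, 0],
              [3, 5, 1]])

for b in ([1, 5, 4], [1, 5, 6]):           # Example 9.2, then Example 9.3
    Ab = np.column_stack([A, b])           # the augmented matrix (A|b)
    same_rank = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(Ab)
    print(b, "consistent" if same_rank else "inconsistent")
# [1, 5, 4] inconsistent
# [1, 5, 6] consistent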

Example 9.3 In contrast, consider the system of equations

x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 6.


This system has the same coefficient matrix A as Example 9.2, and the rank of A is
2. The augmented matrix for the system is the matrix M in Example 9.1 on page
132, which also has rank 2, so this system is consistent. Since the rank is 2 and there
are 3 columns in A, there is a free variable and therefore infinitely many solutions.

Activity 9.2 Write down a general solution for this system to verify these remarks.

If an m × n matrix A has rank m, then there will be a leading one in every row of an
echelon form of A, and in this case a system of equations Ax = b will never be
inconsistent; it will be consistent for all b ∈ Rm . Why? There are two ways to see this.
In the first place, if there is a leading one in every row of A, the augmented matrix
(A|b) can never have a row of the form (0 0 . . . 0 1). Second, the augmented matrix also has m rows; its size is m × (n + 1). So the rank of (A|b) can never be more than m, and hence it must equal rank(A) = m, which means the system is consistent.

Example 9.4 Consider again the matrix B from Activity 9.1 on page 133, which we interpreted as the augmented matrix B = (A|b) in Example 9.2, and its row echelon form:

B = [ 1 2 1 1 ]              [ 1 2 1  1 ]
    [ 2 3 0 5 ]  →  . . . →  [ 0 1 2 −3 ].
    [ 3 5 1 4 ]              [ 0 0 0  1 ]

This time interpret B as representing the coefficient matrix of a system of three equations in four unknowns, Bx = d, with d ∈ R3. The coefficient matrix B is 3 × 4 and has rank 3, so as we argued above, this system of equations is always consistent. But let’s look at this more closely. Any augmented matrix (B|d) will be row equivalent to a matrix in echelon form for which the first four columns are the same as the echelon form of B; that is,

(B|d)  →  . . .  →  [ 1 2 1  1 p1 ]
                    [ 0 1 2 −3 p2 ]
                    [ 0 0 0  1 p3 ]

for some constants pi, which could be zero. This system will have infinitely many solutions for any d ∈ R3, because the number of columns is greater than the rank of B. There is one column without a leading one, so there is one non-leading variable.

Activity 9.3 If p1 = 1, p2 = −2 and p3 = 0, and x = (x1, x2, x3, x4)T, write down the solution to a given system Bx = d in vector form. Use this to determine the vector d in this case.

Suppose we have a consistent system, and suppose that the rank r is strictly less than
n, the number of unknowns. Then, as we have just seen in Example 9.4, the system in
reduced row echelon form (and hence the original one) does not provide enough
information to specify the values of x1 , x2 , . . . , xn uniquely. Let’s consider this in more
detail.


Example 9.5 Suppose we are given a system for which the augmented matrix reduces to the row echelon form

[ 1 3 −2 0 2 0 0 ]
[ 0 0  1 2 0 3 1 ]
[ 0 0  0 0 0 1 5 ]
[ 0 0  0 0 0 0 0 ].

Here the rank (number of non-zero rows) is r = 3, which is strictly less than the number of unknowns, n = 6.
Continuing to reduced row echelon form, we obtain the matrix

[ 1 3 0 4 2 0 −28 ]
[ 0 0 1 2 0 0 −14 ]
[ 0 0 0 0 0 1   5 ]
[ 0 0 0 0 0 0   0 ].

Activity 9.4 Verify this. What are the additional two row operations which need to be carried out?

Example 9.5 (continued)

The corresponding system is

x1 + 3x2 + 4x4 + 2x5 = −28
x3 + 2x4 = −14
x6 = 5.

The variables x1, x3 and x6 correspond to the columns with the leading ones and are the leading variables. The other variables are the non-leading variables.
The form of these equations tells us that we can assign any values to x2, x4 and x5, and then the leading variables will be determined. Explicitly, if we give x2, x4, x5 the arbitrary values s, t, u, where s, t, u represent any real numbers, the solution is given by

x1 = −28 − 3s − 4t − 2u,  x2 = s,  x3 = −14 − 2t,  x4 = t,  x5 = u,  x6 = 5.

There are infinitely many solutions because the so-called ‘free variables’ x2, x4, x5 can take any values s, t, u ∈ R.

Generally, for a consistent system, we can describe what happens when the row echelon
form has r < n non-zero rows (0 0 . . . 0 1 ∗ ∗ . . . ∗). If the leading one is in the kth
column, it is the coefficient of the variable xk . So if the rank is r and the leading ones
occur in columns c1 , c2 , . . . , cr then the general solution to the system can be expressed
in a form where the unknowns xc1 , xc2 , . . . , xcr (the leading variables) are given in terms
of the other n − r unknowns (the non-leading variables), and those n − r unknowns are
free to take any values. In Example 9.5, we have n = 6 and r = 3, and the 3 variables
x1 , x3 , x6 can be expressed in terms of the 6 − 3 = 3 free variables x2 , x4 , x5 .


In the case r = n, where the number of leading ones r in the echelon form is equal to
the number of unknowns n, there is only one solution to the system — for there is a
leading one in every column since the leading ones move to the right as we go down the
rows. In this case there is a unique solution obtained from the reduced echelon form. In
fact, this can be thought of as a special case of the more general one discussed above:
since r = n there are n − r = 0 free variables, and the solution is therefore unique.
We can now summarise our conclusions thus far concerning a general linear system of m
equations in n variables, written as Ax = b, where the coefficient matrix A is an m × n
matrix of rank r.

If the echelon form of the augmented matrix has a row (0 0 . . . 0 a), with a ≠ 0, the original system is inconsistent; it has no solutions. In this case rank(A) = r < m and rank(A|b) = r + 1.

If the echelon form of the augmented matrix has no rows of the above type the
system is consistent, and the general solution involves n − r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many solutions,
but when r = n there are no free variables and so there is a unique solution.
A homogeneous system of m equations in n unknowns is always consistent. In this case the last statement still applies.
The general solution of a homogeneous system involves n − r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many
solutions, but when r = n there are no free variables and so there is a unique
solution, namely the trivial solution, x = 0.

9.3 General solution of a linear system in vector notation
Continuing with Example 9.5, we found the general solution of the linear system above in terms of the three free variables, or parameters, s, t, u. Expressing the solution, x, as a column vector, we have

    [ x1 ]   [ −28 − 3s − 4t − 2u ]
    [ x2 ]   [         s          ]
x = [ x3 ] = [      −14 − 2t      ]
    [ x4 ]   [         t          ]
    [ x5 ]   [         u          ]
    [ x6 ]   [         5          ]

or

    [ −28 ]   [ −3s ]   [ −4t ]   [ −2u ]
    [   0 ]   [   s ]   [   0 ]   [   0 ]
x = [ −14 ] + [   0 ] + [ −2t ] + [   0 ].
    [   0 ]   [   0 ]   [   t ]   [   0 ]
    [   0 ]   [   0 ]   [   0 ]   [   u ]
    [   5 ]   [   0 ]   [   0 ]   [   0 ]

That is, the general solution is

x = p + sv1 + tv2 + uv3,   s, t, u ∈ R,

where

    [ −28 ]        [ −3 ]        [ −4 ]        [ −2 ]
    [   0 ]        [  1 ]        [  0 ]        [  0 ]
p = [ −14 ],  v1 = [  0 ],  v2 = [ −2 ],  v3 = [  0 ].
    [   0 ]        [  0 ]        [  1 ]        [  0 ]
    [   0 ]        [  0 ]        [  0 ]        [  1 ]
    [   5 ]        [  0 ]        [  0 ]        [  0 ]

Applying the same method generally to a consistent system of rank r with n unknowns, we can express the general solution of a consistent system Ax = b in the form

x = p + a1v1 + a2v2 + · · · + an−r vn−r.

Note that, if we put all the ai s equal to 0, we get a solution x = p, which means that Ap = b, so p is a particular solution of the system. Putting a1 = 1 and the remaining ai s equal to zero, we get a solution x = p + v1, which means that A(p + v1) = b. Thus

b = A(p + v1) = Ap + Av1 = b + Av1.

Comparing the first and last expressions, we see that Av1 = 0. Clearly, the same equation holds for v2, . . . , vn−r. So we have proved the following.
If A is an m × n matrix of rank r, the general solution of Ax = b is the sum of:

a particular solution p of the system Ax = b, and

a linear combination a1v1 + a2v2 + · · · + an−r vn−r of solutions v1, v2, . . . , vn−r of the homogeneous system Ax = 0.

If A has rank n, then Ax = 0 only has the solution x = 0, and so Ax = b has a unique solution: p + 0 = p.
This is a more precise form of the result of Theorem 6.2, which states that all solutions
of a consistent system Ax = b are of the form x = p + z where p is any solution of
Ax = b and z ∈ N (A), the null space of A (the set of all solutions of Ax = 0).
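
This structure is easy to verify numerically. The sketch below (our own) takes the reduced row echelon system of Example 9.5 as A and b, and checks that p is a particular solution while v1, v2, v3 solve the homogeneous system:

import numpy as np

# rows of the reduced row echelon form from Example 9.5
A = np.array([[1, 3, 0, 4, 2, 0],
              [0, 0, 1, 2, 0, 0],
              [0, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0, 0]], dtype=float)
b = np.array([-28, -14, 5, 0], dtype=float)

p  = np.array([-28, 0, -14, 0, 0, 5], dtype=float)
v1 = np.array([-3, 1, 0, 0, 0, 0], dtype=float)
v2 = np.array([-4, 0, -2, 1, 0, 0], dtype=float)
v3 = np.array([-2, 0, 0, 0, 1, 0], dtype=float)

print(np.allclose(A @ p, b))                                # True
print([bool(np.allclose(A @ v, 0)) for v in (v1, v2, v3)])  # [True, True, True]

s, t, u = 2.0, -1.0, 3.0                                    # any values will do
print(np.allclose(A @ (p + s*v1 + t*v2 + u*v3), b))         # True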

Activity 9.5 Solve the following system of equations Ax = b by reducing the augmented matrix to reduced row echelon form:

x1 − x2 + x3 + x4 + 2x5 = 4
−x1 + x2 + x4 − x5 = −3
x1 − x2 + 2x3 + 3x4 + 4x5 = 7.

Show that your solution can be written in the form p + su1 + tu2 where Ap = b, Au1 = 0 and Au2 = 0.


9.4 Range
The range of a matrix A is defined as follows.
Definition 9.3 (Range of a matrix) Suppose that A is an m × n matrix. Then the
range of A, denoted by R(A), is the subset

R(A) = {Ax | x ∈ Rn }

of Rm . That is, the range is the set of all vectors y ∈ Rm of the form y = Ax for some
x ∈ Rn .

What is the connection between the range of a matrix A and a system of linear
equations Ax = b? If A is m × n, then x ∈ Rn and b ∈ Rm . If the system Ax = b is
consistent, then this means that there is a vector x ∈ Rn such that Ax = b, so b is in
the range of A. Conversely, if b is in the range of A, then the system Ax = b must have
a solution. Therefore, we have shown that for an m × n matrix A:

The range of A, R(A), consists of all vectors b ∈ Rm for which the system of
equations Ax = b is consistent.
Let’s look at R(A) from a different point of view. Suppose that the columns of A are
c1 , c2 , . . . , cn . Then we may write A = (c1 c2 . . . cn ). If x = (α1 , α2 , . . . , αn )T ∈ Rn ,
then the product Ax is equal to

Ax = α1 c1 + α2 c2 + · · · + αn cn .

Activity 9.6 You proved this result earlier; it is Theorem 4.2 in the subject guide.
Prove it again now to make sure you understand how and why it works.
(This is where we start to make good use of this equality.)

The equality says that R(A), the set of all matrix products Ax, is also the set of all
linear combinations of the columns of A. For this reason R(A) is also called the
column space of A. (More on this in Chapter 13.)
If A = (c1 c2 . . . cn ), where ci denotes column i of A, then we can write

R(A) = {a1 c1 + a2 c2 + . . . + an cn | a1 , a2 , . . . , an ∈ R}.

Example 9.6 Suppose that

A = [  1 2 ]
    [ −1 3 ]
    [  2 1 ].

Then for x = (α1, α2)T,

     [  1 2 ] [ α1 ]   [  α1 + 2α2 ]      [  1 ]      [ 2 ]
Ax = [ −1 3 ] [ α2 ] = [ −α1 + 3α2 ] = α1 [ −1 ] + α2 [ 3 ],
     [  2 1 ]          [ 2α1 + α2  ]      [  2 ]      [ 1 ]


so

R(A) = { (α1 + 2α2, −α1 + 3α2, 2α1 + α2)T : α1, α2 ∈ R };

or

R(A) = {α1c1 + α2c2 | α1, α2 ∈ R},

where c1 = (1, −1, 2)T and c2 = (2, 3, 1)T are the columns of A.

Again, thinking of the connection with the system of equations Ax = b, we have already shown that Ax = b is consistent if and only if b is in the range of A, and we have now shown that R(A) is equal to the set of all linear combinations of the columns of A. Therefore we can now assert that

The system of equations Ax = b is consistent if and only if b is a linear combination of the columns of A.

Example 9.7 Consider the following systems of three equations in two unknowns.

x + 2y = 0          x + 2y = 1
−x + 3y = −5        −x + 3y = 5
2x + y = 3          2x + y = 2

Solving these by Gaussian elimination (or any other method) you will find that the first system is consistent and the second system has no solution. The first system has the unique solution (x, y)T = (2, −1)T.

Activity 9.7 Do this. Solve each of the above systems.

The coefficient matrix of each of the systems is the same, and is equal to the matrix A in Example 9.6. For the first system,

A = [  1 2 ]       [ x ]        [  0 ]
    [ −1 3 ],  x = [ y ],   b = [ −5 ].
    [  2 1 ]                    [  3 ]

Checking this solution, you will find that

     [  1 2 ] [  2 ]   [  0 ]        [  0 ]     [  1 ]   [ 2 ]
Ax = [ −1 3 ] [ −1 ] = [ −5 ],   or  [ −5 ] = 2 [ −1 ] − [ 3 ] = 2c1 − c2.
     [  2 1 ]          [  3 ]        [  3 ]     [  2 ]   [ 1 ]

On the other hand, it is not possible to express the vector (1, 5, 2)T as a linear
combination of the column vectors of A. Trying to do so would lead to precisely the
same set of inconsistent equations.


Notice, also, that the homogeneous system Ax = 0 has only the trivial solution, and
that the only way to express 0 as a linear combination of the columns of A is by
0c1 + 0c2 = 0.

Activity 9.8 Verify all of the above statements.

Activity 9.9 Look at your solution to Activity 9.5 on page 138, and express the vector b = (4, −3, 7)T as a linear combination of the columns of the coefficient matrix

A = [  1 −1 1 1  2 ]
    [ −1  1 0 1 −1 ]
    [  1 −1 2 3  4 ].

Do the same for the vector 0.

Overview
This chapter has drawn together what we have already learned about linear systems, and added important new ingredients: rank and range. We now should have a better understanding of the nature of the solution set to a linear system.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

find a general solution to a linear system, Ax = b, expressed in vector notation as the sum of a particular solution plus a general solution to the associated homogeneous system Ax = 0

explain why a general solution x to Ax = b, where A is an m × n matrix of rank r, is of the form x = p + a1v1 + a2v2 + · · · + an−r vn−r, ai ∈ R; specifically why there are n − r arbitrary constants

explain what is meant by the rank of a matrix and by the range of a matrix, and be able to find the rank of a matrix

show that if A = (c1 c2 . . . cn), and if x = (α1, α2, . . . , αn)T ∈ Rn, then Ax = α1c1 + α2c2 + · · · + αncn

write b as a linear combination of the columns of A if Ax = b is consistent

write 0 as a linear combination of the columns of A, and explain when it is possible to do this in some way other than using the trivial solution, x = 0 (with all the coefficients in the linear combination equal to zero).

Test your knowledge and understanding


Work the Exercises in Chapter 4 of the text A-H (Exercises 4.1–4.6). The solutions can
be found at the end of the textbook.


Work Problems 4.3 and 4.5 in Chapter 4 of the text A-H. You will find the solutions on
the VLE.

Comments on selected activities

Feedback to activity 9.2
One more row operation on the row echelon form will obtain a matrix in reduced row echelon form which is row equivalent to the matrix M, from which the solution is found to be

    [  7 ]     [  3 ]
x = [ −3 ] + t [ −2 ],   t ∈ R.
    [  0 ]     [  1 ]

Feedback to activity 9.3
Substitute for p1, p2, p3 in the row echelon form of the augmented matrix and then continue to reduce it to reduced row echelon form. The non-leading variable is x3. Letting x3 = t, the general solution is

    [ x1 ]   [  5 ]     [  3 ]
x = [ x2 ] = [ −2 ] + t [ −2 ]  =  p + tv,   t ∈ R.
    [ x3 ]   [  0 ]     [  1 ]
    [ x4 ]   [  0 ]     [  0 ]

Since Bp = d, multiplying Bp you will find that d = (1, 4, 5)T. (You can check all this by row reducing (B|d).)
Feedback to activity 9.5
Put the augmented matrix into RREF:

(A|b) = [  1 −1 1 1  2 |  4 ]
        [ −1  1 0 1 −1 | −3 ]
        [  1 −1 2 3  4 |  7 ]

R2 + R1   [ 1 −1 1 1 2 | 4 ]        R3 − R2   [ 1 −1 1 1 2 | 4 ]
R3 − R1   [ 0  0 1 2 1 | 1 ]          −→      [ 0  0 1 2 1 | 1 ]
  −→      [ 0  0 1 2 2 | 3 ]                  [ 0  0 0 0 1 | 2 ]

R2 − R3   [ 1 −1 1 1 0 |  0 ]       R1 − R2   [ 1 −1 0 −1 0 |  1 ]
R1 − 2R3  [ 0  0 1 2 0 | −1 ]         −→      [ 0  0 1  2 0 | −1 ]
  −→      [ 0  0 0 0 1 |  2 ]                 [ 0  0 0  0 1 |  2 ]

Set the non-leading variables to arbitrary constants: x2 = s, x4 = t. Then solve for the leading variables in terms of these parameters, starting with the bottom row.
For s, t ∈ R,

x5 = 2,  x4 = t,  x3 = −1 − 2t,  x2 = s,  x1 = 1 + s + t,

so

    [ x1 ]   [ 1 + s + t ]   [  1 ]     [ 1 ]     [  1 ]
    [ x2 ]   [     s     ]   [  0 ]     [ 1 ]     [  0 ]
x = [ x3 ] = [  −1 − 2t  ] = [ −1 ] + s [ 0 ] + t [ −2 ]  =  p + su1 + tu2.
    [ x4 ]   [     t     ]   [  0 ]     [ 0 ]     [  1 ]
    [ x5 ]   [     2     ]   [  2 ]     [ 0 ]     [  0 ]


Verify:

     [  1 −1 1 1  2 ] [  1 ]   [  4 ]
Ap = [ −1  1 0 1 −1 ] [  0 ] = [ −3 ]
     [  1 −1 2 3  4 ] [ −1 ]   [  7 ]
                      [  0 ]
                      [  2 ]

      [  1 −1 1 1  2 ] [ 1 ]   [ 0 ]
Au1 = [ −1  1 0 1 −1 ] [ 1 ] = [ 0 ]
      [  1 −1 2 3  4 ] [ 0 ]   [ 0 ]
                       [ 0 ]
                       [ 0 ]

      [  1 −1 1 1  2 ] [  1 ]   [ 0 ]
Au2 = [ −1  1 0 1 −1 ] [  0 ] = [ 0 ]
      [  1 −1 2 3  4 ] [ −2 ]   [ 0 ]
                       [  1 ]
                       [  0 ]

Feedback to activity 9.9
You can use any solution x (so any values of s, t ∈ R) to write b as a linear combination of the columns of A, so this can be done in infinitely many ways. In particular, taking x = p, and letting ci indicate column i of the coefficient matrix A,

Ap = c1 − c3 + 2c5 = b.

You should write this out in detail and check that the sum of the vectors does add to the vector b. Notice that this combination uses only the columns corresponding to the leading variables:

[  1 ]   [ 1 ]     [  2 ]   [  4 ]
[ −1 ] − [ 0 ] + 2 [ −1 ] = [ −3 ].
[  1 ]   [ 2 ]     [  4 ]   [  7 ]

Similarly, since Au1 = 0 and Au2 = 0, any linear combination of these two vectors will give a vector v = su1 + tu2 for which Av = 0, and you can rewrite Av as a linear combination of the columns of A. For example, taking u1,

c1 + c2 = [  1 ] + [ −1 ] = [ 0 ]
          [ −1 ]   [  1 ]   [ 0 ].
          [  1 ]   [ −1 ]   [ 0 ]


Chapter 10
Sequences and series

Introduction
In this chapter and the next, we make a slight detour into the topic of sequences, series
and difference equations (also known as recurrence equations). Many problems in
economics and finance involve sequences, particularly those involving quantities which
change with time, but not continuously (such as the balance of a deposit account where
interest is paid once a year, at the end of the year). This chapter and the next one are
independent of the other chapters so far, but the material is important in its own right
and, moreover, we will see later that matrices and linear algebra can be used to solve
systems of difference equations.

Aims
The aims of this chapter are to:

introduce sequences and series, especially of arithmetic and geometric type

see how sequences can be used in financial modelling

begin to develop methods for finding formulae for sequences

Reading
This chapter and the next concern a topic that, although algebraic, is not linear
algebra, and therefore is not discussed in the A-H text, or in linear algebra books
generally. No significant additional reading is required, but we recommend the following supplementary reading (which will also be useful for the next chapter of the guide).
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 3 and 4.

Synopsis
We start by introducing sequences, and two very important types: arithmetic and
geometric. We explain how geometric sequences arise naturally when modelling
compound interest. We then look at series, which involves finding the sum of members


of a sequence. We look at some aspects of the limiting behaviour of sequences and we illustrate how this material can be applied to solve some problems in finance.

10.1 Sequences

10.1.1 Sequences in general


A sequence1 of numbers y0 , y1 , y2 , . . . is an infinite and ordered list of numbers with
one term, yt , corresponding to each non-negative integer, t. We call yt−1 the tth term of
the sequence. Notice that, in our notation, the first term is y0 and yt is actually the
(t + 1)st term of the sequence. (Be careful not to be confused by this, as some texts
differ. It’s quite legitimate to, instead, denote the first term by y1 .) For example, yt
could represent the price of a commodity t years from now, or the balance in a bank
account t years from now. Often, a sequence is defined explicitly by a formula. For
instance, the formula yt = t2 generates the sequence

y0 = 0, y1 = 1, y2 = 4, y3 = 9, y4 = 16, . . .

and the sequence 3, 5, 7, 9, . . . may be described by the formula

yt = 2t + 3 (t ≥ 0).
10.1.2 Arithmetic progressions
The arithmetic progression with first term a and common difference d has its terms
given by the formula yt = a + dt. For example, the arithmetic progression with first
term 5 and common difference 3 is 5, 8, 11, 14, . . .. Note that yt is obtained from yt−1 by
adding the common difference d. In symbols, yt = yt−1 + d.

10.1.3 Geometric progressions


Another very important type of sequence is the geometric progression. The geometric
progression with first term a and common ratio x is given by the formula yt = axt .
Notice that successive terms are related through the relationship yt = xyt−1 . For
example, the geometric progression with first term 3 and common ratio 1/2 is given by
yt = 3(1/2)t ; that is, the sequence is 3, 3/2, 3/4, 3/8, . . ..

10.1.4 Compound interest


Perhaps the simplest occurrence of geometric progressions in economics is in the study
of compound interest.2 Suppose that we have a savings account for which the annual
percentage interest rate is constant at 8%. What this means is that if we have $P in the
account at the beginning of a year then, at the end of that year, the account balance is
increased by 8% of $P . In other words, the balance increases to $(P + 0.08P ).
1 See Anthony and Biggs, Section 3.1.
2 See Anthony and Biggs, Sections 4.3 and 7.3.


Generally, if the annual percentage rate of interest is R%, then the interest rate is
r = R/100 and in the course of one year, a balance of $P becomes $P + rP =
$(1 + r)P . One year after that, the balance in dollars becomes $(1 + r)((1 + r)P ), which
is $(1 + r)2 P . Continuing in this way, we can see that if P dollars are deposited in an
account where interest is paid annually at rate r, and if no money is taken from or
added to the account, then after t years we have a balance of P (1 + r)t dollars. This
process is known as compounding (or compound interest), because interest is paid on
interest previously added to the account.
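If you have access to a computer, calculations like these are easy to check. Here is a minimal Python sketch (not part of the course; the function name is our own choice):

    def balance(P, r, t):
        """Balance after t years when P is invested at annual rate r,
        compounded once a year: P(1 + r)^t."""
        return P * (1 + r) ** t

    print(balance(100, 0.08, 1))    # 108.0, as in the 8% example above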

Activity 10.1 Suppose that $1,000 is invested in an account that pays interest at a
fixed rate of 7%, paid annually. How much is there in the account after four years?

10.1.5 Frequent compounding


What happens if interest is added more frequently than once a year? Suppose, for
example, that instead of 8% interest paid at the end of the year, we have 4% interest
added twice-yearly, once at the middle of the year and once at the end. If $100 is
invested, the amount after one year will be

100(1 + 0.04)2 = 108.16

dollars, which is slightly more than the $108 which results from the single annual
addition. If the interest is added quarterly (so that 2% is added four times a year), the
amount after one year will be

100(1 + 0.02)4 = 108.24

dollars (approximately). In general, when the year is divided into m equal periods, the
rate is r/m over each period, and the balance after one year is
P(1 + r/m)^m,
where P is the initial deposit.
Taking m larger and larger — formally, letting m tend to infinity — we find ourselves in
the situation of continuous compounding. Now, it is a standard fact (that we won’t
verify here) that, as m gets larger and larger, tending to infinity,
(1 + r/m)^m
approaches er , where e is the base of the natural logarithm. (See the subject guide for
MT1174 Calculus.) Formally,
lim_{m→∞} (1 + r/m)^m = e^r.
So the balance after one year should be Pe^r. If invested for a further year, we would
have Pe^r e^r = P(e^r)^2 = Pe^(2r). After t years of continuous compounding, the balance of the
account would be Pe^(rt).
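The following Python sketch (ours, and purely illustrative) compares compounding m times a year with the continuous limit Pe^r:

    from math import exp

    P, r = 100, 0.08
    for m in (1, 2, 4, 12, 365):
        # balance after one year with m compounding periods: P(1 + r/m)^m
        print(m, round(P * (1 + r / m) ** m, 4))
    print(round(P * exp(r), 4))    # the continuous limit P e^r, about 108.3287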


10.2 Series
Let us continue with the story of our investor. It is natural to investigate how the
balance varies if the investor adds a certain amount to the account each year. Suppose
that they add $P to the account at the beginning of each year, so that at the beginning
of the first year the balance is $P . At the beginning of the second year the balance in
dollars will be $P (1 + r) + P ; this represents the money from the first year with interest
added, and the new, further, deposit of $P . Convince yourself that, continuing in this
way, the balance at the beginning of year t is, in dollars,

P + P (1 + r) + · · · + P (1 + r)t−2 + P (1 + r)t−1 .

How can we calculate this expression? Note that it is the sum of the first t terms (that
is, term 0 to term t − 1) of the geometric progression with first term P and common
ratio 1 + r. Before coming back to this, we shall discuss such things in a more general
setting.
Given a sequence y0 , y1 , y2 , y3 , . . ., a finite series is a sum of the form

y0 + y1 + · · · + yt−1,

the first t terms added together, for some number t. There are two important results
about series, concerning the cases where the corresponding sequence is an arithmetic
progression (in which case the series is called an arithmetic series) and where it is a
geometric progression (in which case the series is called a geometric series).

10.2.1 Arithmetic series


The main result here is that if yt = a + dt describes an arithmetic progression and St is
the sum

St = y0 + y1 + y2 + · · · + yt−1,

then

St = t(2a + (t − 1)d)/2.
There is a useful way of remembering this result. Notice that St may be rewritten as

St = t (a + (a + (t − 1)d))/2 = t (y0 + yt−1)/2,

so that we have the following easily remembered result: an arithmetic series has value
equal to the number of terms, t, times the value of the average of the first and last
terms (y0 + yt−1)/2. Equivalently, the average value St/t of the t terms is the average,
(y0 + yt−1)/2, of the first and last terms.
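A quick numerical check of this formula, with illustrative values of our own choosing, takes only a few lines of Python:

    a, d, t = 5, 3, 10
    terms = [a + d * k for k in range(t)]       # y0, ..., y(t-1)
    direct = sum(terms)
    formula = t * (2 * a + (t - 1) * d) / 2     # t(2a + (t-1)d)/2
    print(direct, formula)                      # both equal 185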

Activity 10.2 Find the sum of the first n terms of an arithmetic series whose first
term is 1 and whose common difference is 5.


10.2.2 Geometric series


We now look at geometric series. It is easily checked (by multiplying out the expression)
that, for any x,

(1 − x)(1 + x + x^2 + · · · + x^(t−1)) = 1 − x^t.

So, if x ≠ 1 and yt = ax^t, then the geometric series

St = y0 + y1 + · · · + yt−1 = a + ax + ax^2 + · · · + ax^(t−1)

is therefore given by

St = a(1 − x^t)/(1 − x).

Example 10.1 In our earlier discussion on savings accounts, we came across the
expression
P + P (1 + r) + . . . + P (1 + r)t−2 + P (1 + r)t−1 .
We now see that this is a geometric series with t terms, first term P and common
ratio 1 + r. Therefore it equals

P (1 − (1 + r)^t)/(1 − (1 + r)) = (P/r)((1 + r)^t − 1).
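Again, the formula is easy to check numerically; here is a Python sketch using the savings expression, with P = 100, r = 0.05 and t = 10 as illustrative values of our own choosing:

    P, r, t = 100, 0.05, 10
    direct = sum(P * (1 + r) ** k for k in range(t))
    formula = (P / r) * ((1 + r) ** t - 1)
    print(round(direct, 6), round(formula, 6))   # both about 1257.789254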

Activity 10.3 Find an expression for

2 + 2(3) + 2(3^2) + 2(3^3) + · · · + 2(3^n).

10.3 Finding a formula for a sequence


Often we can use results on series to determine an exact formula for the members of a
sequence of numbers. The following example illustrates this.

Example 10.2 Suppose a sequence of numbers is constructed as follows. The first


number, y0 , is 1, and each other number in the sequence is obtained from the
previous number by multiplying by 2 and adding 1 (so that yt = 2yt−1 + 1, for
t ≥ 1). What’s the general expression for yt in terms of t?
We can see that

y1 = 2y0 + 1 = 2(1) + 1 = 2 + 1
y2 = 2y1 + 1 = 2(2 + 1) + 1 = 2^2 + 2 + 1
y3 = 2y2 + 1 = 2(2^2 + 2 + 1) + 1 = 2^3 + 2^2 + 2 + 1
y4 = 2y3 + 1 = 2(2^3 + 2^2 + 2 + 1) + 1 = 2^4 + 2^3 + 2^2 + 2 + 1.

In general, it would appear that

yt = 2^t + 2^(t−1) + · · · + 2^2 + 2 + 1.


But this is just a geometric series: perhaps this is clearer if we write it as

yt = 1 + 2 + 2^2 + · · · + 2^(t−1) + 2^t,

from which it is clear that this is the sum of the first t + 1 terms of the geometric
progression with first term 1 and common ratio 2. By the formula for the sum of a
geometric series, we have

yt = (1 − 2^(t+1))/(1 − 2) = 2^(t+1) − 1.
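A conjectured formula like this is easy to test against the recursive definition; a small Python sketch:

    y = 1                                  # y0 = 1
    for t in range(1, 11):
        y = 2 * y + 1                      # yt = 2y(t-1) + 1
        assert y == 2 ** (t + 1) - 1       # the closed form 2^(t+1) - 1
    print("the formula agrees for t = 1, ..., 10")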

10.4 Limiting behaviour


When x is greater than 1, as t increases, x^t will eventually become greater than any
given number, and we say that x^t tends to infinity as t tends to infinity.3 We write this
in symbols as

x^t → ∞ as t → ∞ or lim_{t→∞} x^t = ∞.

On the other hand, when x < 1 and x > −1, we have

x^t → 0 as t → ∞ or lim_{t→∞} x^t = 0.

We notice that, while x^t gets closer and closer to 0 for all values of x in the range
−1 < x < 1, its behaviour depends to some extent on whether x is positive or negative.
When x is negative, the terms are alternately positive and negative, and we say that the
approach to zero is oscillatory. For example, when x = −0.2, the sequence xt is

−0.2, 0.04, −0.008, 0.0016, −0.00032, 0.000064, −0.0000128, 0.00000256, . . .

When x is less than −1, the sequence is again oscillatory, but it does not approach any
limit, the terms being alternately large-positive and large-negative. In this case, we say
that x^t oscillates increasingly.
As an application of this, let us consider again the geometric series

St = a + ax + ax2 + · · · + axt−1 .

We have

St = a(1 − x^t)/(1 − x).

If −1 < x < 1 then x^t → 0 as t → ∞. This means that St approaches the number
a(1 − 0)/(1 − x) = a/(1 − x), as t increases. In other words,

St → a/(1 − x) as t → ∞.
We call this limit the sum to infinity of the sequence given by yt = axt . Note that a
geometric sequence has a finite sum to infinity only if the common ratio is strictly
between −1 and 1.

3 See Anthony and Biggs, Section 3.3.


Example 10.3 Consider the sequence with yi = 1/2i for i ≥ 0. The sum of the first
t terms of this sequence is
St = 1 + 1/2 + 1/2^2 + · · · + 1/2^(t−1).

By the formula for the sum of a geometric series,

St = 2(1 − (1/2)^t),

and we see that St → 2 as t → ∞.

Activity 10.4 Find an expression for


St = 2/3 + (2/3)^2 + (2/3)^3 + · · · + (2/3)^t,

and determine the limit of St as t tends to infinity.

10.5 Financial applications


A number of problems in financial mathematics can be solved just using what we know
about arithmetic and geometric series. Here is an example.

Example 10.4 John has opened a savings account with a bank, and they pay a
fixed interest rate of 5% per annum, with the interest paid once a year, at the end of
the year. He opened the savings account with a payment of $100 on 1 January 2003,
and will be making deposits of $200 yearly, on the same date. What will his savings
be after he has made N of these additional deposits? (Your answer will be an
expression involving N .)
If yN is the required amount, then we have

y1 = (1.05)100 + 200,
y2 = (1.05)y1 + 200
= 100(1.05)2 + 200(1.05) + 200,

and, in general, we can spot the pattern and observe that

yN = 100(1.05)^N + 200(1.05)^(N−1) + 200(1.05)^(N−2) + · · · + 200(1.05) + 200
   = 100(1.05)^N + 200(1 + (1.05) + (1.05)^2 + · · · + (1.05)^(N−2) + (1.05)^(N−1))
   = 100(1.05)^N + 200 · (1 − (1.05)^N)/(1 − (1.05))
   = 100(1.05)^N + 4000((1.05)^N − 1),


where we have used the formula for a geometric series.
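The closed form can be checked by simulating the account year by year; the Python sketch below (the variable names are ours) takes N = 10 deposits as an illustration:

    balance = 100.0                       # the opening deposit
    N = 10                                # an illustrative number of $200 deposits
    for _ in range(N):
        balance = 1.05 * balance + 200    # a year's interest, then the next deposit
    formula = 100 * 1.05 ** N + 4000 * (1.05 ** N - 1)
    print(round(balance, 2), round(formula, 2))   # the two agree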


Overview
We have explored arithmetic and geometric sequences and series and shown how these
can be used to find general formulae for other types of sequence. We have also seen how
to apply this material to financial modelling. In the next chapter, we continue the study
of sequences, by looking at how to find formulae for sequences defined by a difference
equation (that is, recursively).

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

explain what is meant by arithmetic and geometric progressions, and calculate the
sum of finite arithmetic and geometric series
explain compound interest and calculate balances under compound interest
apply sequences and series in finance
analyse the long-term behaviour of series and sequences

Test your knowledge and understanding


Work all the exercises below. The solutions are at the end of this chapter.

Exercises
Exercise 10.1
A geometric progression has a sum to infinity of 3 and has second term y1 equal to 2/3.
Show that there are two possible values of the common ratio x and find the
corresponding values of the first term a.

Exercise 10.2
Suppose we have an initial amount, A0 , to invest and we add an additional investment
F at the end of each subsequent year. All investments earn an interest of i% per annum,
paid at the end of each year.
(a) Use the formula for the sum of a geometric series to derive a formula for the value of
the investment, An , after n years.
(b) An investor puts $10,000 into an investment account that yields interest of 10% per
annum. The investor adds an additional $5,000 at the end of each year. How much will
there be in the account at the end of five years? Show that if the investor has to wait N
years until the balance is at least $80,000, then

N ≥ ln(13/6)/ln(1.1).


Exercise 10.3
An amount of $1,000 is invested and attracts interest at a rate equivalent to 10% per
annum. Find expressions for the total after one year if the interest is compounded:

(a) annually

(b) quarterly

(c) monthly

(d) daily. (Assume the year is not a leap year.)


What would be the total after one year if the interest is 10% compounded continuously?

Exercise 10.4
Suppose yi = 1/2^(2i). Find the limit, as t → ∞, of

St = y0 + y1 + · · · + yt−1.

Comments on selected activities


Feedback to activity 10.1
The required amount is 1000(1 + 0.07)4 = 1310.80 dollars.

Feedback to activity 10.2


We have

Sn = (n/2)(2(1) + (n − 1)5) = (n/2)(5n − 3) = (5/2)n^2 − (3/2)n.

Feedback to activity 10.3


Noting that there are n + 1 terms in the sum (not n: be careful!), and that it is the sum
of a geometric progression with first term 2 and common ratio 3, the expression is

2(1 − 3^(n+1))/(1 − 3) = 3^(n+1) − 1.

Feedback to activity 10.4


St is the sum of the first t terms of a geometric progression with first term 2/3 and
common ratio 2/3, so

St = (2/3) · (1 − (2/3)^t)/(1 − (2/3)) = 2(1 − (2/3)^t).

As t → ∞, (2/3)t → 0 and so St → 2.


Comments on exercises
Solution to exercise 10.1
We know that the sum to infinity is given by the formula a/(1 − x) and that y1 = ax.
Therefore, the given information is
a/(1 − x) = 3,   ax = 2/3.
From the first equation, a = 3(1 − x) and the second equation then gives
3(1 − x)x = 2/3, from which we obtain the quadratic equation 9x2 − 9x + 2 = 0. This
has the two solutions x = 2/3 and x = 1/3. The corresponding values of the first term a
(given by a = 3(1 − x)) are 1 and 2, respectively. So, as suggested by the question, there
are two geometric progressions that have the required sum to infinity and second term.

Solution to exercise 10.2


(a) After 1 year, at the beginning of the second, the amount A1 in the account is
A0 (1 + i/100) + F , because the initial amount A0 has attracted interest at rate i/100
and F has been added. Similar considerations show that

A2 = (1 + i/100)A1 + F
   = (1 + i/100)(A0(1 + i/100) + F) + F
   = A0(1 + i/100)^2 + F(1 + i/100) + F,

A3 = (1 + i/100)A2 + F
   = (1 + i/100)(A0(1 + i/100)^2 + F(1 + i/100) + F) + F
   = A0(1 + i/100)^3 + F(1 + i/100)^2 + F(1 + i/100) + F.

In general, if we continued, we could see that

An = A0(1 + i/100)^n + F(1 + i/100)^(n−1) + F(1 + i/100)^(n−2) + · · · + F(1 + i/100) + F.

Now,

F(1 + i/100)^(n−1) + · · · + F(1 + i/100) + F = F + F(1 + i/100) + · · · + F(1 + i/100)^(n−1)
   = F(1 − (1 + i/100)^n)/(1 − (1 + i/100))
   = (100F/i)((1 + i/100)^n − 1),


where we have used the formula for the sum of a geometric progression. Therefore

An = A0(1 + i/100)^n + (100F/i)((1 + i/100)^n − 1).

For (b), we use the formula just obtained, with A0 = 10000, i = 10, F = 5000 and
n = 5, and we see that

A5 = 10000(1 + 10/100)^5 + (100(5000)/10)((1 + 10/100)^5 − 1)
   = 10000(1.1)^5 + 50000((1.1)^5 − 1)
   = 46630.60.

Now, for the balance to be at least $80,000 after N years, we need AN ≥ 80000, which
means

10000(1.1)^N + 50000((1.1)^N − 1) ≥ 80000.

This is equivalent, after a little manipulation, to

60000(1.1)^N ≥ 130000,

or (1.1)^N ≥ 13/6. To solve this, we can take logarithms and see that we need

N ln(1.1) ≥ ln(13/6),

so

N ≥ ln(13/6)/ln(1.1),

as required.
as required.

Solution to exercise 10.3


We use the fact that if the interest is paid in m equally spaced instalments, then the
total after one year is 1000(1 + r/m)^m, where r = 0.1 and m = 1, 4, 12, 365 in the four
cases. Therefore the answers to the first four parts of the problem are as follows:

(a) 1000(1 + 0.1) = 1100.

(b) 1000(1 + 0.1/4)^4 = 1000(1.025)^4.

(c) 1000(1 + 0.1/12)^12.

(d) 1000(1 + 0.1/365)^365.

For the last part, we use the fact that under continuous compounding at rate r, an
amount P grows to Pe^r after one year, so the answer here is 1000e^0.1.


Solution to exercise 10.4


Note that 1/2^(2i) = 1/4^i = (1/4)^i, so this is a geometric series where the common ratio is
1/4. The first term is 1, and there are t terms, so

St = (1 − (1/4)^t)/(1 − (1/4)) = (4/3)(1 − (1/4)^t).

As t → ∞, (1/4)t → 0 and so St → 4/3.

Chapter 11
Difference equations

Introduction
We now turn our attention to sequences that are defined recursively or (equivalently) by
a difference equation. Difference equations occur naturally in mathematical modelling,
where we know how one member of a sequence is obtained from previous members and
what we would like to do is find a general formula for the sequence.

Aims
The aims of this chapter are to:

explain what is meant by a first-order difference equation and show how to solve
them

look at the cobweb model, an economic application of first-order difference
equations
explain how difference equations can be used in financial modelling

explain what is meant by a second-order difference equation and show how to solve
them

discuss some economic applications of second-order difference equations

Reading

We recommend the following supplementary reading.
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 3, 4, 5 and 23. (Note that this text uses the phrase
‘recurrence equation’ rather than ‘difference equation’, but they mean the same
thing.)

Synopsis
We start by explaining what is meant by a first-order difference equation. We then
present a method for solving them, discussing the behaviour of the solution, and


applications to the cobweb model in economics, and to financial modelling. We then


explain what is meant by linear second-order difference equations, explain how to solve
them and investigate an economic application.

11.1 First-order difference equations


We have noted that if
y0 , y1 , y2 , y3 , . . .
is an arithmetic progression with common difference d then yt can be related to yt−1
through the equation yt = yt−1 + d. In fact, this equation, together with the value of y0 ,
tells us precisely what yt is.

Example 11.1 Suppose we have yt = yt−1 + 3 and we know y0 . Then we can


calculate y1 as y1 = y0 + 3, and, similarly, y2 = y1 + 3 = y0 + 3 + 3 = y0 + 2(3), and
so on. We can deduce that in general, yt will be yt = y0 + 3t. This expression enables
us to find yt for any t without going through the iteration process.
If, in addition we know that y0 = 5, then this gives us the expression yt = 5 + 3t. So
if we want to know what y321 is without repeating the iteration process, we can
simply substitute t = 321 into this expression to find that y321 = 5 + 3(321) = 968.

The equation
yt = ayt−1 + b, t ≥ 1,
where a and b are numbers is called a first-order linear difference equation with
constant coefficients and the value of y0 is known as an initial condition. It is said
to be first-order because the equation only involves the previous value of the sequence.
Once the value of y0 is known, all the numbers yt of the sequence are determined.
Important point: It should, of course, be understood that the difference equation
yt = ayt−1 + b (t ≥ 1)
is entirely equivalent to the difference equation
yt+1 = ayt + b (t ≥ 0).
They say precisely the same thing about the sequence.
By a solution of a first-order difference equation we mean an expression for yt as a
function of the positive integer t, depending on the initial condition y0 . The difference
equation yt = yt−1 + 3, with y0 = 5, has, as we have seen, the solution yt = 5 + 3t.
The question we want to consider here is how to determine such explicit solutions to
any linear first-order difference equation.
It is easy to see (and you should convince yourself of this) that if a = 1 then the
sequence of numbers yt given by the general first-order difference equation
yt = ayt−1 + b
is an arithmetic progression with common difference b and first term y0 . Therefore we
shall discuss the solution to such a general difference equation when a ≠ 1.


Activity 11.1 Write down the solution of the difference equation yt = yt−1 + b for a
fixed constant b and initial condition y0 .

If b = 0, then the equation yt = ayt−1 + b can be written as


yt − ayt−1 = 0.
This is a homogeneous first order linear difference equation with constant coefficients.
We have yt = ayt−1 and so y1 = ay0 , y2 = ay1 = a2 y0 , and, in general, yt = at y0 . In this
case, then, finding the solution is easy. The solution is yt = at y0 and the sequence of
numbers yt is a geometric progression with first term y0 and common ratio a. (In a
specific application, y0 would be given also.)
Difference equations in which a ≠ 1 and b ≠ 0 occur often in economics. Suppose, for
example, that yt represents the balance in a savings account at the end of t years and
that the interest rate is r. Suppose that, from this account, each year, the investor
withdraws an amount I. Then we have the difference equation
yt = (1 + r)yt−1 − I.
This is of the form yt = ayt−1 + b where a = 1 + r and b = −I.
Let’s work out a few terms of a difference equation.

Example 11.2 Suppose that y0 = 1 and for t ≥ 1, yt = 2yt−1 + 1. The sequence of


yt is not an arithmetic or a geometric progression. (We do not get from one term to
the next simply by adding a fixed constant — as we would were it an arithmetic
progression; nor by simply multiplying by a fixed constant — as we would were it a
geometric progression. Instead, we both multiply by a fixed constant and then add a
fixed constant.) Working out successive terms, we have

y1 = 2y0 + 1 = 2(1) + 1 = 3
y2 = 2y1 + 1 = 2(3) + 1 = 7
y3 = 2y2 + 1 = 2(7) + 1 = 15,

and so on.
Now, suppose you wanted to know the value of y312 . Do you really want to have to
carry out 312 calculations of the type we have just seen? Certainly not! Which is
why we want a solution of the difference equation, a formula or expression for yt
involving only t and y0 (and not yt−1 ).

11.2 Solving first-order difference equations


Suppose we want to find the general solution of the equation yt = ayt−1 + b (with
a ≠ 1). First, note that the constant

y* = b/(1 − a) satisfies y* = ay* + b.

159
11. Difference equations

Activity 11.2 Check this!

This means that if y0 happened to be y ∗ , then we’d have y1 = ay ∗ + b = y ∗ , y2 = y ∗ and,


generally, yt = y ∗ for all t. For this reason, the number y ∗ is known as the constant
solution or time-independent solution.
It turns out that the general solution to the equation yt = ayt−1 + b is

yt = y* + (y0 − y*)a^t, where y* = b/(1 − a).

You might wonder where this formula comes from. We’ll briefly give an indication, but
first we state this as a theorem, just to make it clear how important it is.
Theorem 11.1 The general solution of the equation yt = ayt−1 + b (with a ≠ 1) is

yt = y* + (y0 − y*)a^t,

where y* = b/(1 − a).

To see why this is true, first, note that the difference equation can be written as
yt − ayt−1 = b
and that, by the way in which y ∗ is defined, y ∗ − ay ∗ = b. This means that the constant
sequence yt = y ∗ is a particular solution of the difference equation. Now suppose we
considered the homogeneous difference equation yt − ayt−1 = 0. It’s easy to see that, for
any constant k, yt = kat satisfies this (and, indeed, is the general solution, meaning
that all solutions will look like this). So (just as for linear systems of equations, Ax = b,
– and the analogy is a real one, not just a coincidence), if we consider yt = y* + ka^t, the
sum of the particular solution and the general solution to the homogeneous equation,
we will have
yt − ayt−1 = (y ∗ + kat ) − a(y ∗ + kat−1 ) = (y ∗ − ay ∗ ) + k(at − at ) = y ∗ − ay ∗ = b.
So yt = y ∗ + kat is a solution, for any k. To get a solution that has the right value, y0 ,
when t = 0, we must have (on substituting t = 0) y0 = y* + ka^0 = y* + k, which is why
we take k = y0 − y*.

Read Section 3.2 of the Anthony and Biggs book for a slightly different (and
fuller) explanation of why this theorem is true.

Example 11.3 Consider yt = 2yt−1 + 1 with y0 = 1. We have a = 2, b = 1 and


y* = b/(1 − a) = 1/(1 − 2) = −1,
so the solution is

yt = y ∗ + (y0 − y ∗ )at = −1 + (1 − (−1))2t = −1 + 2(2t ).

Note that y3 = −1 + 2(23 ) = −1 + 2(8) = 15 exactly as we found above in


Example 11.2.
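Theorem 11.1 is easy to test numerically; the Python sketch below (ours) iterates the equation of Example 11.3 and compares with the closed form:

    a, b, y0 = 2, 1, 1                  # yt = 2y(t-1) + 1 with y0 = 1
    ystar = b / (1 - a)                 # the constant solution, here -1
    y = y0
    for t in range(1, 11):
        y = a * y + b
        assert y == ystar + (y0 - ystar) * a ** t   # Theorem 11.1
    print("the closed form agrees for t = 1, ..., 10")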


Example 11.4 We find the solution of the equation

yt = 5yt−1 + 6,

given that y0 = 5/2.


If we take a = 5 and b = 6 in the standard form yt = ayt−1 + b, then we have exactly
the equation given. The first thing to do is to find the constant solution. By the
formula, this is y ∗ = b/(1 − a) = 6/(1 − 5) = −3/2. We can now write down the
general solution and insert the given value of y0:

yt = y* + (y0 − y*)a^t = −3/2 + 4(5^t).

Alternatively, you can use the fact that the general solution is the sum of a
particular solution (the constant solution) and the general solution of the
homogeneous equation. The constant solution y ∗ can be found by substituting this
constant into the equation,

y* = 5y* + 6 =⇒ y* = 6/(−4) = −3/2.
The homogeneous equation is yt − 5yt−1 = 0 with general solution yt = k(5t ).
Therefore the general solution of the equation is yt = −3/2 + k5t . Substituting the
initial value y0 = 5/2 into this equation produces the solution:
y0 = 5/2 = −3/2 + k ⇒ k = 4 ⇒ yt = −3/2 + 4(5)^t.
Activity 11.3 Suppose that yt = (2/3)yt−1 + 5 and that y0 = 2. Find yt .

11.3 Long-term behaviour of solutions


The behaviour of the general solution (or time path) for yt , yt = y ∗ + (y0 − y ∗ )at ,
depends simply on the behaviour of at . For example, if at → 0, then the formula tells us
that yt → y*. We can tabulate the results as follows. (For this table, we assume y0 ≠ y*,
because if y0 = y* it is clear that the solution is constant, and equal to y*.)

Value of a      Behaviour of a^t           Behaviour of yt

a > 1           a^t → ∞                    yt → ∞ or yt → −∞
0 ≤ a < 1       a^t → 0 (decreasing)       yt → y*
−1 < a < 0      a^t → 0 (oscillating)      yt → y*
a < −1          oscillates increasingly    oscillates increasingly

In the first of these cases (a > 1), whether yt → ∞ or yt → −∞ will, of course, depend
on the sign of y0 − y ∗ .


Activity 11.4 The case a = −1 is not covered in the table just given. How does the
solution yt behave in this case?

11.4 The cobweb model


In economics, markets for a single good are often described using supply and demand
functions which indicate relationships between the price per unit of the good (usually
denoted by p) and the quantity of it on the market (usually denoted by q). (If you are
unfamiliar with supply and demand functions, please see the subject guide for MT1174
Calculus or Chapters 1 and 5 of Anthony and Biggs.)
Consider an agricultural product for which there is a yearly ‘crop’. If there are no
disturbances, the equilibrium price p∗ and quantity q ∗ will prevail. Suppose that one
year, however, for some external reason such as drought, there is a shortage, so that the
quantity falls and the price rises to p0 . During the winter the farmers plan their
production for the next year on the basis of this higher price, and so an increased
quantity appears on the market in the next year: specifically q1 = q S (p0 ). Because the
quantity is greater, the price consumers pay falls, to the value p1 = pD (q1 ). Overall, the
effect of the disturbance on the price is that it goes from p0 , which is greater than p∗ , to
p1 , which is less than p∗ . The process is repeated again in the following year: this time
the lower price p1 leads to a decrease in production q2 and that in turn means a higher
price p2 . The next year a similar process takes place, and so on. When the sequences
p0 , p1 , p2 , . . . , and q1 , q2 , . . . , are plotted on the supply and demand diagram, we get a
figure like a ‘cobweb’. This is the reason for the name ‘cobweb model’.
Example 11.5 Generalising the argument above, we see that pt−1 determines qt ,
which in turn determines pt , according to the rules
qt = q S (pt−1 ), pt = pD (qt ),
where q S is the supply function and pD the inverse demand function. Suppose that
the demand and supply equations are, respectively, as follows:
q + p = 24 and 2q + 18 = p.
Then the equilibrium quantity and price are obtained by setting the supply equal to
the demand; that is,
p∗ = 24 − q ∗ = 2q ∗ + 18 =⇒ q ∗ = 2 and p∗ = 22.
The supply and demand functions are
q S (p) = 0.5p − 9, pD (q) = 24 − q.
The equations linking pt−1 , qt and pt are thus
qt = 0.5pt−1 − 9, pt = 24 − qt .
Eliminating qt we obtain a first-order difference equation for pt :
pt = 33 − 0.5pt−1 .


This is in the standard form yt = ayt−1 + b, with pt replacing yt and


a = −0.5, b = 33. The time-independent solution is b/(1 − a) = 33/(3/2) = 22, and
the explicit solution in terms of p0 is

pt = 22 + (p0 − 22)(−0.5)t .

Note that the time-independent solution is the equilibrium price p∗ = 22, and that in
this case (if p0 ≠ p*) the sequence approaches p* in an oscillatory way. We say that
we have a stable cobweb. However, it is possible, for other supply and demand
curves, that the price oscillates around the equilibrium price with ever-increasing
magnitude. In such cases, the price does not approach p* and we say we have an
unstable or exploding cobweb.

Read Chapter 5 of the Anthony and Biggs book for a more extensive discussion
of the cobweb model, including an analysis of the general case and a
characterisation of when it is stable.
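To see the oscillatory approach to the equilibrium price, one can simply iterate the price equation; a brief Python sketch for the example above (the starting price p0 = 25 is our own illustrative choice):

    p = 25.0                        # an illustrative starting price p0
    for t in range(1, 9):
        p = 33 - 0.5 * p            # pt = 33 - 0.5 p(t-1)
        print(t, round(p, 4))       # the values oscillate about, and approach, 22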

11.5 Financial applications of first-order difference equations
First-order difference equations are very useful in the mathematics of finance.
We consider how capital accrues under compound interest. In particular, we consider
the situation in which there is a fixed annual interest rate r available to investors, and
interest is compounded annually. In this case, if we invest P then after t years we have
an amount P (1 + r)t . This same result can be derived very simply via difference
equations. If we let yt be the capital at the end of the tth year we have y0 = P and the
difference equation
yt = (1 + r)yt−1 , (t = 1, 2, 3, . . .).
This is in the standard form, with a = (1 + r) and b = 0. The solution is fairly obvious
(since this is just a geometric progression).
It might seem unnecessary to use difference equations for such a simple investment
scenario, when it is very easy to determine by elementary means the amount of capital
after t years. However, suppose that we withdraw an amount I at the end of each year
for N years. Then what is the balance of the account after t years? This is less obvious,
but difference equations provide an easy means of determining the answer. As we noted
above, in this case, the difference equation is

yt = (1 + r)yt−1 − I, where y0 = P.

This is another case of the first-order linear difference equation, in standard form with
a = 1 + r and b = −I. The time-independent solution is therefore y ∗ = I/r. The general
solution is yt = y* + (y0 − y*)a^t, and since y0 = P we obtain

yt = I/r + (P − I/r)(1 + r)^t.


This formula enables us to answer a number of questions. First, we might want to know
how large the withdrawals I can be given an initial investment of P , if we want to be
able to withdraw I annually for N years. The condition that nothing is left after N
years is yN = 0. This is

I/r + (P − I/r)(1 + r)^N = 0,

and rearranging, we get

(I/r)((1 + r)^N − 1) = P(1 + r)^N,

so that

I(P) = (r(1 + r)^N / ((1 + r)^N − 1)) P.

An ‘inverse’ question is: what principal P is required to provide an annual income I for
the next N years? Rearranging the equation gives the result

P(I) = (I/r)(1 − 1/(1 + r)^N).
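These formulas are simple to evaluate; a Python sketch with illustrative figures of our own choosing:

    P, r, N = 100000, 0.05, 20            # an illustrative principal, rate and horizon
    growth = (1 + r) ** N
    I = P * r * growth / (growth - 1)     # the largest sustainable annual withdrawal
    print(round(I, 2))                    # about 8024.26

    # the 'inverse' question: the principal needed to fund income I for N years
    print(round((I / r) * (1 - 1 / growth), 2))   # recovers 100000.0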

11.6 Homogeneous second-order difference equations
An equation of the form
yt + a1 yt−1 + a2 yt−2 = 0,   t ≥ 2

in which a1 and a2 are constants, is a homogeneous linear second-order difference


equation with constant coefficients (or recurrence equation of the same
description). It is called second-order because the equation involves two previous
terms of the sequence (or equivalently, because the difference between the highest index
and lowest index is two). If you are given y0 and y1 , then the equation determines y2
and all remaining numbers in the sequence. The equation may also be written as

yt+2 + a1 yt+1 + a2 yt = 0, t ≥ 0.

We want to find a general solution of this equation. The general solution will need to
have two arbitrary constants so that a specific solution can be found once y0 and y1 are
given (just as the general solution of a first-order equation contains one arbitrary
constant).
As the equation is linear and homogeneous, two very useful properties apply (compare
this with homogeneous systems of linear equations, Ax = 0):

a constant multiple of a solution is a solution,

the sum of two solutions is a solution.


Activity 11.5 Show this. Given two sequences xt and zt which satisfy the
homogeneous difference equation yt+2 + a1 yt+1 + a2 yt = 0, show that xt + zt and cxt ,
where c is a constant, also satisfy this equation.

Therefore, it follows that if we know two solutions xt and zt of the difference equation,
then
yt = Axt + Bzt
is also a solution for any constants A, B ∈ R.
We next set about finding solutions. Knowing what we do about the general solution of
the homogeneous first-order equation (a geometric progression), let’s try a solution of
the form yt = mt where m is some constant to be determined.
Substituting yt = mt into the difference equation we obtain
yt+2 + a1 yt+1 + a2 yt = mt+2 + a1 mt+1 + a2 mt = mt (m2 + a1 m + a2 ) = 0.
If m = 0, we get yt = 0, so we ignore this possibility. Then yt = mt is a solution of the
difference equation if and only if m satisfies m2 + a1 m + a2 = 0. This equation,
z^2 + a1 z + a2 = 0, is known as the auxiliary equation.
Let’s look at an example.

Example 11.6 We find the general solution of the difference equation

yt+2 − 5yt+1 + 6yt = 0.

The auxiliary equation is

z^2 − 5z + 6 = 0.
This equation factors, z 2 − 5z + 6 = (z − 2)(z − 3) = 0, with solutions z = 2 and
z = 3. Therefore both xt = 2t and zt = 3t are solutions and the general solution is

yt = A(2t ) + B(3t ), A, B ∈ R.

Now suppose we are given initial conditions y0 = 1 and y1 = 4. Then the difference
equation determines all remaining numbers in the sequence. We have
y2 = 5y1 − 6y0 = 5(4) − 6(1) = 14, y3 = 5y2 − 6y1 , and so on.
The specific solution of the difference equation with these initial conditions is found
by substituting t = 0 and t = 1 into the general solution. We have

y0 = A + B = 1 and y1 = 2A + 3B = 4.

Solving these equations for A and B, we find B = 2 and A = −1. Therefore the
solution of yt+2 − 5yt+1 + 6yt = 0 with initial conditions y0 = 1 and y1 = 4 is

yt = −(2t ) + 2(3t ).

We can immediately check that y0 = −(2^0) + 2(3^0) = −1 + 2(1) = 1 and
y1 = −2 + 2(3) = 4 as required. A further check is to find that
y2 = −(2^2) + 2(3^2) = −4 + 18 = 14 as we found earlier.
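As a numerical check of Example 11.6 (a Python sketch, not required for the course):

    y_prev, y = 1, 4                        # y0 = 1, y1 = 4
    for t in range(2, 11):
        y_prev, y = y, 5 * y - 6 * y_prev   # yt = 5y(t-1) - 6y(t-2)
        assert y == -(2 ** t) + 2 * 3 ** t  # the solution found above
    print("the solution agrees for t = 2, ..., 10")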


The auxiliary equation z 2 + a1 z + a2 = 0 of a second-order difference equation is a


quadratic equation and the general solution to the difference equation depends on
whether the auxiliary equation has two distinct solutions, or just one solution, or no
(real) solutions. Thus, the form of general solution depends on the value of the
discriminant, a1^2 − 4a2. We consider each case in turn.

When the auxiliary equation has two distinct solutions, α and β, the general
solution is
yt = Aαt + Bβ t (A, B constants).
In any specific case, A and B are determined by the initial values y0 and y1 , as in
Example 11.6.

When the auxiliary equation has just one solution, α, the general solution is

yt = Ctαt + Dαt = (Ct + D)αt .

As in the previous case, the values of the constants C and D can be determined by
using the initial values y0 and y1 .

The auxiliary equation has no real solutions when the quantity a1^2 − 4a2 is negative.
In that case, 4a2 − a1^2 is positive, and hence so is a2. Thus there is a positive square
root r of a2; that is, we can define r = √a2. In order to write down the general
solution in this case we define the angle θ by

cos θ = −a1/(2r) = −a1/(2√a2).

Then the general solution in this case is

yt = E r^t cos θt + F r^t sin θt,

where E and F are constants.


That these general solutions are as stated can be verified by substitution into the
difference equations, but you are not expected to do so for this subject. If you study
complex numbers in the future, you will be able to see where the last type of general
solution comes from.
Instead, we will look at some examples.

Example 11.7 We find the general solution of the difference equation

yt − 6yt−1 + 5yt−2 = 0.

The auxiliary equation is z 2 − 6z + 5 = 0, that is (z − 5)(z − 1) = 0, with solutions 1


and 5. The general solution is therefore

yt = A(1t ) + B(5t ) = A + B5t ,

for arbitrary constants A and B.


Example 11.8 We find the general solution of the difference equation

yt + 6yt−1 + 9yt−2 = 0.

The auxiliary equation is z 2 + 6z + 9 = 0, that is (z + 3)2 = 0, with solution z = −3.


The general solution is therefore

yt = A(−3)t + Bt(−3)t = (A + Bt)(−3)t ,

for arbitrary constants A and B.

Example 11.9 Let us find yt if

yt − 2yt−1 + 4yt−2 = 0,

and y0 = 1, y1 = 1 − √3. Here, the auxiliary equation, z^2 − 2z + 4 = 0, has no real
solutions, so we are in the third case. In the notation used above, we have
r = √4 = 2. It follows that

cos θ = −(−2)/(2r) = 2/4 = 1/2,

so θ = π/3. The general solution is therefore

yt = 2^t (E cos(πt/3) + F sin(πt/3)).


Putting t = 0, and using the given initial condition y0 = 1, we have E = 1. Similarly,
y1 = 1 − √3 implies that

2(E cos(π/3) + F sin(π/3)) = 2(1/2 + F(√3/2)) = 1 − √3,

so that 1 + √3 F = 1 − √3. Therefore F = −1 and the required solution is

yt = 2^t (cos(πt/3) − sin(πt/3)).



Let’s check this solution. It should satisfy the initial conditions y0 = 1, y1 = 1 − √3.
In addition we can obtain y2: y2 = 2y1 − 4y0 = 2(1 − √3) − 4(1) = −2 − 2√3.
Using the solution,

y0 = 2^0 (cos 0 − sin 0) = 1

and

y1 = 2(cos(π/3) − sin(π/3)) = 2((1/2) − (√3/2)) = 1 − √3.

So far, so good. Now for y2. We have

y2 = 2^2 (cos(2π/3) − sin(2π/3)) = 4(−(1/2) − (√3/2)) = −2 − 2√3,

which is as it should be.
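The same kind of check works in the trigonometric case; a Python sketch for Example 11.9:

    from math import cos, sin, pi, sqrt, isclose

    def closed_form(t):
        # the solution yt = 2^t (cos(pi t/3) - sin(pi t/3)) found above
        return 2 ** t * (cos(pi * t / 3) - sin(pi * t / 3))

    y_prev, y_curr = 1.0, 1 - sqrt(3)        # y0 and y1
    for t in range(2, 11):
        y_prev, y_curr = y_curr, 2 * y_curr - 4 * y_prev   # yt = 2y(t-1) - 4y(t-2)
        assert isclose(y_curr, closed_form(t), rel_tol=1e-9, abs_tol=1e-9)
    print("the trigonometric solution agrees for t = 2, ..., 10")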


11.7 Non-homogeneous second-order equations


We now consider how to solve an equation of the form

yt + a1 yt−1 + a2 yt−2 = k

where k is a constant.
By analogy with the first-order case, we start by looking for a constant solution yt = y ∗
for all t. For this we require

y* + a1 y* + a2 y* = k,   or   y* = k/(1 + a1 + a2)

(provided 1 + a1 + a2 ≠ 0). Then y* is a particular solution of the equation and, as with


first-order equations, we have that
General solution of the non-homogeneous equation =
Particular solution + General solution of the homogeneous equation.

Activity 11.6 Suppose that yt − 5yt−1 − 14yt−2 = 18, and that y0 = −1, y1 = 8.
Find yt .

Example 11.10 We want to find the general solution of the difference equation
yt+2 − 6yt+1 + 5yt = 8.

If we look for a constant solution we find that we would need to solve


y ∗ − 6y ∗ + 5y ∗ = 8. But this has no solution since y ∗ − 6y ∗ + 5y ∗ = 0.
So what do we do? Look at the solution of the homogeneous equation, which we
solved in Example 11.7. The auxiliary equation is

z 2 − 6z + 5 = (z − 1)(z − 5) = 0

so that the general solution of the homogeneous equation is

yt = A(1)t + B(5)t = A + B(5t ), A, B ∈ R.

Constant sequences are solutions of the homogeneous equation, so they cannot


possibly be a solution of the non-homogeneous equation. This kind of situation
occurred with first-order equations when the solution sequence was an arithmetic
progression. Bearing this in mind (and also looking at the case of the double root for
the homogeneous equation) let’s see if there is some constant m for which pt = mt is
a particular solution. Substituting into the equation, you should find that pt = mt is
a solution of the non-homogeneous equation if and only if m = −2. (Do this!)
Then the general solution is

yt = A + B(5t ) − 2t, A, B ∈ R.


11.8 Behaviour of solutions


You should know how to discuss the behaviour of the solution, or time path. For
example, suppose that the solution to a second-order difference equation is
yt = 3t − 2t .
How does this behave as t → ∞? As t → ∞, we know that both 3t and 2t tend to
infinity, but what happens to their difference? It appears that 3t grows much faster than
2t and thus we might expect that yt → ∞. This is indeed the case, as can be seen by
writing

3^t − 2^t = 3^t (1 − (2/3)^t).
Since (2/3)t tends to zero, it follows that 1 − (2/3)t tends to 1. The other factor 3t tends
to infinity, so the product tends to infinity.

11.9 Economic applications of second-order difference equations
Second-order difference equations occur quite naturally in macro-economic modelling.
We shall consider a closed national economy, with no government (for the sake of
simplicity). Three important quantities tell us something about the state of the
economy:

Investment, I
Income, Y
Consumption, C
Suppose we can measure each of the quantities in successive time periods of equal
length (for example, each year). Denote by It , Yt , Ct the values of the key quantities in
time-period t. Then we have a sequence of values I0 , I1 , I2 , . . ., and similarly for the other
quantities. We shall assume that the equilibrium condition Yt = Ct + It holds for each t.
In the multiplier-accelerator model, we assume that the following equations link the
key quantities:

Ct = c + bYt−1 , where c and b are positive constants


It = i + v(Yt−1 − Yt−2 ), where i and v are positive constants.
Using the equilibrium condition Yt = Ct + It , we can obtain a difference equation
involving only the Y s.
Yt = Ct + It
= (c + bYt−1 ) + (i + v (Yt−1 − Yt−2 ))
= (c + i) + (b + v)Yt−1 − vYt−2 .

In other words, we have the second-order difference equation


Yt − (b + v)Yt−1 + vYt−2 = c + i.


Example 11.11 Suppose

Ct = (3/8)Yt−1,   It = 40 + (1/8)(Yt−1 − Yt−2)

and let’s assume the equilibrium condition Yt = Ct + It holds. Let’s suppose that
Y0 = 65 and Y1 = 64.5, and try to determine an expression for Yt.
Arguing as above, we have

Yt = Ct + It = (3/8)Yt−1 + 40 + (1/8)(Yt−1 − Yt−2) = 40 + (1/2)Yt−1 − (1/8)Yt−2,

so

Yt − (1/2)Yt−1 + (1/8)Yt−2 = 40.
The auxiliary equation is

z^2 − (1/2)z + 1/8 = 0,

which has discriminant (1/2)^2 − 4(1/8) = −1/4. This is negative, so there are no
(real) solutions. We are therefore in the third case of a second-order difference
equation. To proceed, we use the method given above. We have
r = √(1/8) = 1/(2√2), and

cos θ = −(−1/2)/(2r) = (2√2)/4 = 1/√2,

so θ = π/4. Thus, the general solution to the homogeneous equation in this case is

(1/(2√2))^t (E cos(πt/4) + F sin(πt/4)).

We need a particular solution of

Yt − (1/2)Yt−1 + (1/8)Yt−2 = 40.

Trying Yt = k, a constant, we see that k − (1/2)k + (1/8)k = 40, so k = 64. It follows
that for some constants E and F,

Yt = 64 + (1/(2√2))^t (E cos(πt/4) + F sin(πt/4)).

To find E and F we use the initial conditions, Y0 = 65 and Y1 = 64.5. Now,

Y0 = 64 + E cos(0) + F sin(0) = 64 + E,


so E = 1. Also,

Y1 = 64 + E(1/(2√2)) cos(π/4) + F(1/(2√2)) sin(π/4)
   = 64 + E(1/(2√2))(1/√2) + F(1/(2√2))(1/√2)
   = 64 + E/4 + F/4,

and since this is 64.5, we have E + F = 2 and hence F = 1. The final answer is therefore

Yt = 64 + (1/(2√2))^t (cos(πt/4) + sin(πt/4)).
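Simulating the model confirms that Yt oscillates about 64 with decreasing magnitude; a Python sketch:

    from math import cos, sin, pi, sqrt

    Y_prev, Y = 65.0, 64.5                     # Y0 and Y1
    for t in range(2, 9):
        # Yt = 40 + (1/2)Y(t-1) - (1/8)Y(t-2)
        Y_prev, Y = Y, 40 + 0.5 * Y - 0.125 * Y_prev
        closed = 64 + (1 / (2 * sqrt(2))) ** t * (cos(pi * t / 4) + sin(pi * t / 4))
        print(t, round(Y, 6), round(closed, 6))   # the two columns agree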

Overview
In this chapter we have considerably expanded our understanding of sequences by
studying methods for determining sequences that are defined by difference equations.
We have seen, too, that these have important applications in economics and finance.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
solve problems involving first-order difference equations
solve second-order difference equations
analyse the behaviour of solutions to difference equations
solve problems involving the application of difference equations.

Test your knowledge and understanding


Work all the exercises below. The solutions are at the end of this chapter.
For additional practice, work the Exercises in chapters 3, 4, 5 and Exercises 23.1–23.8 in
the Anthony and Biggs text.

Exercises
Exercise 11.1
Planners believe that, as a result of a recent government grant scheme, the number of
new high technology businesses starting up each year will be N . There are already 3,000
such businesses in existence in the country, but it is expected that each year 5% of all
those in existence at the beginning of the year will fail (shut down). Let yt denote the


number of businesses at the end of year t. Explain why

yt = 0.95yt−1 + N.

Solve this difference equation for general N . Find a condition on N which will ensure
that the number of businesses will increase from year to year.

Exercise 11.2
The supply and demand functions for a good are

q S (p) = 0.05p − 4, q D (p) = 20 − 0.15p.

Find the equilibrium price. What is the inverse demand function pD (q)? Suppose that
the sequence of prices pt is determined by pt = pD (q S (pt−1 )) (as in the cobweb model).
Find an expression for pt .

Exercise 11.3
A market for a commodity is modelled by taking the demand and supply functions as
follows:
D(p) = 1 − p,

S(p) = p,
so that when the price p prevails the amount of commodity demanded by the market is
D(p) and the amount which producers will supply is S(p). Price adjusts over time t in
response to the excess of the demand over the supply according to the equation:
pt+1 − pt = a(D(pt ) − S(pt )),

where a is a positive constant. Initially the price p is p0 = 3/4. Solve this equation and
show that over time the price adjusts towards the clearing value (i.e. the price at which
supply and demand are equal) if and only if

0 < a < 1.

Under what circumstances does the price tend towards the equilibrium price in an
oscillatory fashion? What happens to the price if a = 1/2?

Exercise 11.4
Find the general solution of the difference equation

yt − yt−1 − 6yt−2 = 0.

Exercise 11.5
(a) Suppose that consumption this year is the average of this year’s income and last
year’s consumption; that is,

Ct = (1/2)(Yt + Ct−1).


Suppose also that the relationship between next year’s income and current investment is
Yt+1 = kIt , for some positive constant k. Show that, if the equilibrium condition
Yt = Ct + It holds, then
 
Yt − ((k + 1)/2)Yt−1 + (k/2)Yt−2 = 0.

(b) In the model set up in part (a), suppose that k = 3 and that the initial value Y0 is
positive. Show that Yt oscillates with increasing magnitude.

(c) Find the values of k for which the model set up in part (a) leads to an oscillating Yt ,
and determine whether or not the oscillations increase in magnitude. (Remember we are
given that k > 0.)

Comments on selected activities


Feedback to activity 11.1
The solution is yt = y0 + bt.

Feedback to activity 11.3


For yt = (2/3)yt−1 + 5, we have a = 2/3 and b = 5. Then,

y* = b/(1 − a) = 5/(1 − (2/3)) = 15,

and the solution is

yt = y* + (y0 − y*)a^t = 15 + (2 − 15)(2/3)^t = 15 − 13(2/3)^t.

Feedback to activity 11.4


When a = −1, we have

yt = y* + (y0 − y*)(−1)^t

and yt alternately takes two values: it ‘flips’ between the value y* + (y0 − y*) = y0
(when t is even) and the value y* − (y0 − y*) = 2y* − y0 (when t is odd).

Feedback to activity 11.5


We will show this for the sum xt + zt and leave the scalar multiple cxt to you.
Given that xt and zt are solutions of the homogeneous difference equation
yt+2 + a1 yt+1 + a2 yt = 0, we know that xt+2 + a1 xt+1 + a2 xt = 0 and
zt+2 + a1 zt+1 + a2 zt = 0.
Then substituting yt = xt + zt into the difference equation and rearranging, we have

(xt+2 +zt+2 )+a1 (xt+1 +zt+1 )+a2 (xt +zt ) = (xt+2 +a1 xt+1 +a2 xt )+(zt+2 +a1 zt+1 +a2 zt ) = 0

since both of the expressions in brackets on the right-hand side of this equation are
equal to zero. Therefore, yt = xt + zt is also a solution.


Feedback to activity 11.6


The auxiliary equation is

z 2 − 5z − 14 = (z + 2)(z − 7) = 0,

with solutions −2 and 7. The homogeneous equation

yt − 5yt−1 − 14yt−2 = 0

therefore has general solution yt = A(−2)t + B(7t ). A particular solution of the


non-homogeneous equation is the constant solution y ∗ = 18/(1 − 5 − 14) = −1, so this
equation has general solution

yt = −1 + A(−2)t + B(7t ).

To find the values of A and B we use the given values of y0 and y1 . Since y0 = −1, we
must have −1 + A + B = −1 and since y1 = 8, −1 − 2A + 7B = 8. Solving these, we
obtain A = −1 and B = 1, and therefore

yt = −1 − (−2)t + 7t .

Comments on exercises
Solution to exercise 11.1
Since 5% of the yt−1 businesses in operation at the start of year t fail during that year,
it follows that 95% of these survive. Additionally, N new businesses are created, so

yt = 0.95yt−1 + N.

This is a first-order difference equation with, in the standard notation, a = 0.95 and
b = N . Also, from the given information, y0 = 3000. The time-independent solution is
y* = b/(1 − a) = N/(1 − 0.95) = N/0.05 = 20N
and the solution is

yt = y ∗ + (y0 − y ∗ )(0.95)t = 20N + (3000 − 20N )(0.95)t .

There are several ways to solve the last part of the question. Perhaps the easiest way is
to notice that since (0.95)t decreases with t, yt will increase with t if and only if the
number, 3000 − 20N multiplying (0.95)t is negative. So we need 3000 − 20N < 0, or
N > 150.

Solution to exercise 11.2


The equilibrium price is given by 0.05p − 4 = 20 − 0.15p, so p = 120. The inverse
demand function is obtained by solving the equation q = 20 − 0.15p for p, so
pD(q) = p = 400/3 − (20/3)q.


Now,

pt = pD(qS(pt−1)) = pD(0.05pt−1 − 4) = 400/3 − (20/3)(0.05pt−1 − 4) = 160 − (1/3)pt−1.
This has time-independent solution

p* = 160/(1 − (−1/3)) = 120,

which is the equilibrium price. The solution for pt is

pt = 120 + (p0 − 120)(−1/3)^t.

Solution to exercise 11.3


We have
pt+1 − pt = a ((1 − pt ) − pt ) = a − 2apt ,
so pt+1 = (1 − 2a)pt + a, which is entirely equivalent to the equation

pt = (1 − 2a)pt−1 + a.

Now, the time-independent solution is

p* = a/(1 − (1 − 2a)) = 1/2,

and so

pt = p* + (p0 − p*)(1 − 2a)^t = 1/2 + (3/4 − 1/2)(1 − 2a)^t = 1/2 + (1/4)(1 − 2a)^t.

The equilibrium price is given by 1 − p = p, and so is 1/2. From our expression for pt ,
we see that pt → 1/2 as t → ∞ if and only if (1 − 2a)t → 0. For this to be true, we need
−1 < 1 − 2a < 1, which is equivalent to 0 < a < 1. The price will oscillate towards 1/2
when, additionally, 1 − 2a is negative. So this happens when 1/2 < a < 1. When
a = 1/2, 1 − 2a = 0 and the price pt equals 1/2 for all t.

Solution to exercise 11.4


The auxiliary equation is

z 2 − z − 6 = (z − 3)(z + 2) = 0,

so for some constants A and B

yt = A3t + B(−2)t .


Solution to exercise 11.5


(a) There are a number of ways of deriving the difference equation. We note first that
substituting It = Yt+1/k in the equation Yt = Ct + It gives

Yt = Ct + (1/k)Yt+1.

Thus,

Ct = Yt − Yt+1/k

and so (replacing t by t − 1),

Ct−1 = Yt−1 − Yt/k.

Substituting in the equation Ct = (Yt + Ct−1)/2, we get

Yt − Yt+1/k = (1/2)(Yt + Yt−1 − Yt/k).

Rearranging and replacing t by t − 1, we obtain the second-order difference equation

Yt − ((k + 1)/2)Yt−1 + (k/2)Yt−2 = 0.

(b) When k = 3, the difference equation is

Yt − 2Yt−1 + (3/2)Yt−2 = 0.

The auxiliary equation z^2 − 2z + (3/2) = 0 has no solutions, so the solution for Yt is

Yt = r^t (E cos θt + F sin θt),

where r = √(3/2) and

cos θ = −(−2)/(2√(3/2)) = √(2/3).

Since E = Y0, and Y0 is positive, we have E > 0. Also r > 1, so Yt oscillates with
increasing magnitude.
(c) In general, the auxiliary equation for the difference equation is

z^2 − ((k + 1)/2)z + k/2 = 0.

This has no solutions if

((k + 1)/2)^2 < 4(k/2),

that is,

(k + 1)^2 < 8k.

In this case the general solution is of the form

Yt = (√(k/2))^t (E cos θt + F sin θt).


This solution is oscillatory.


Suppose that (k + 1)^2 > 8k. Then the auxiliary equation has the roots

α, β = ((k + 1)/2 ± √((k + 1)^2/4 − 2k)) / 2,
and both of these are positive. Then the solution is of the form

Yt = Aαt + Bβ t ,

and since α and β are positive, in this case there can be no oscillatory behaviour. The
same holds true when (k + 1)2 = 8k.
We have shown that oscillations occur when (k + 1)^2 < 8k, in other words when k lies
strictly between the roots of the equation (k + 1)^2 = 8k. Rewriting this as the quadratic
equation k^2 − 6k + 1 = 0, we find that the roots are

3 − 2√2 and 3 + 2√2.

So the model predicts that, when k is between these two numbers, the national income
Yt will oscillate. (In economics language, it will exhibit ‘business cycles’.)
Whether the oscillations increase or decrease in magnitude depends on k. Since the
solution involves the factor (√(k/2))^t, the oscillations decrease if √(k/2) < 1 (that is,
if k < 2) and increase if k > 2.

Chapter 12
Vector spaces and subspaces

Introduction
In this chapter we study the important theoretical concept of a vector space. This, and
the related concepts to be explored in the subsequent chapters, will help us understand
much more deeply and comprehensively what we’ve already learned about matrices and
linear equations. There is, necessarily, a bit of a step upwards in the level of
‘abstraction’, but it is worth the effort in order to help our fundamental understanding.

Aims
The aims of this chapter are to:

define a vector space and give examples


describe what is meant by a subspace
show how to determine whether a subset is a subspace
describe special subspaces associated with matrices

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 5.1
and 5.2.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

Synopsis
We introduce the general concept of a vector space. We describe what is meant by a
subspace of a vector space and show how to determine whether a subset of a vector

179
12. Vector spaces and subspaces

space is a subspace. We then look at two special subspaces associated with any matrix:
its range and its null space.

12.1 Vector spaces

12.1.1 Definition of a vector space


We know that vectors of Rn can be added together and that they can be ‘scaled’ by real
numbers. That is, for every x, y ∈ Rn and every α ∈ R, it makes sense to talk about
x + y and αx. Furthermore, these operations of addition and multiplication by a scalar
(that is, multiplication by a real number) behave and interact ‘sensibly’, in that, for
example,
α(x + y) = αx + αy,
α(βx) = (αβ)x,
x + y = y + x,
and so on.
But it is not only vectors in Rn that can be added and multiplied by scalars. There are
other sets of objects for which this is possible. Consider the set V of all functions from
R to R. Then any two of these functions can be added: given f, g ∈ V we simply define
the function f + g by
(f + g)(x) = f (x) + g(x).
Also, for any α ∈ R, the function αf is given by

(αf )(x) = αf (x).

These operations of addition and scalar multiplication are sometimes said to be


pointwise addition and pointwise scalar multiplication. This might seem a bit
abstract, but think about what the functions x + x2 and 2x represent: the former is the
function x plus the function x2 , and the latter is the function x multiplied by the scalar
2. So this is just a different way of looking at something you are already familiar with.
It turns out that V and its rules for addition and multiplication by a scalar satisfy the
same key properties as the set of vectors in Rn with its addition and scalar
multiplication. We refer to a set with an addition and scalar multiplication which behave
appropriately as a vector space. We now give the formal definition of a vector space.
Definition 12.1 (Vector space) A (real) vector space V is a non-empty set
equipped with an addition operation and a scalar multiplication operation such that for
all α, β ∈ R and all u, v, w ∈ V ,

1. u + v ∈ V (closure under addition)

2. u + v = v + u (the commutative law for addition)

3. u + (v + w) = (u + v) + w (the associative law for addition)

4. there is a single member 0 of V , called the zero vector, such that for all v ∈ V ,
v+0=v

180
12.1. Vector spaces

5. for every v ∈ V there is an element w ∈ V (usually written as −v), called the


negative of v, such that v + w = 0
6. αv ∈ V (closure under scalar multiplication)
7. α(u + v) = αu + αv (distributive law)
8. (α + β)v = αv + βv (distributive law)
9. α(βv) = (αβ)v (associative law for scalar multiplication)
10. 1v = v.

Other properties follow from those listed in the definition. For instance, we can see that
0x = 0 for all x, as follows:

0x = (0 + 0)x = 0x + 0x,

so, adding the negative −0x of 0x to each side,

0 = 0x + (−0x) = (0x + 0x) + (−0x) = 0x + (0x + (−0x)) = 0x + 0 = 0x.

(A bit sneaky, but just remember the result: 0x = 0.)

Activity 12.1 Prove that (−1)x = −x, the negative of the vector x, using a similar
argument with 0 = 1 + (−1).

(Note that this definition says nothing at all about ‘multiplying’ together two vectors:
the only operations with which the definition is concerned are addition and scalar
multiplication.)
A vector space as we have defined it is called a real vector space, to emphasise that 12
the ‘scalars’ α, β and so on are real numbers rather than (say) complex numbers. There
is a notion of complex vector space, where the scalars are complex numbers, which
we shall not cover. In this guide all scalars will be real numbers.

12.1.2 Examples

Example 12.1 The set Rn is a vector space with the usual way of adding and
scalar multiplying vectors.

Example 12.2 The set V = {0} consisting only of the zero vector is a vector
space, with addition defined by 0 + 0 = 0, and scalar multiplication defined by
α0 = 0 for all α ∈ R.

Example 12.3 The set V of functions from R to R with pointwise addition and
scalar multiplication (described earlier in this section) is a vector space. Note that
the zero vector in this space is the function that maps every real number to 0 —
that is, the identically-zero function.

181
12. Vector spaces and subspaces

Activity 12.2 Show that all 10 properties of a vector space are satisfied. In
particular, if the function f is a vector in this space, what is the vector −f ?

Example 12.4 The set of m × n matrices with real entries is a vector space, with
the usual addition and scalar multiplication of matrices. The ‘zero vector’ in this
vector space is the zero m × n matrix which has all entries equal to 0.

Example 12.5 Let V be the set of all vectors in R3 with third entry equal to 0,
that is,   
 x 
V =  y  : x, y ∈ R .
0
 

Then V is a vector space with the usual addition and scalar multiplication. To verify
this, we need only check that V is closed under addition and scalar multiplication.
The associative, commutative and distributive laws (properties 2, 3, 7, 8, 9, 10) will
hold for vectors in V because they hold for all vectors in R3 (and all linear
combinations of vectors in V are in V ). Furthermore, if we can show that V is
closed under scalar multiplication, then for any particular v ∈ V , 0v = 0 ∈ V and
(−1)v = −v ∈ V . So we simply need to check that V 6= ∅ (V is non-empty), that if
u, v ∈ V then u + v ∈ V , and if α ∈ R and v ∈ V then αv ∈ V . Each of these is
easy to check.

Activity 12.3 Verify that V 6= ∅, and that for u, v ∈ V and α ∈ R, u + v ∈ V and


αv ∈ V .

12.2 Subspaces
12
The last example above is informative. Arguing as we did there, if V is a vector space
and W ⊆ V is non-empty and closed under scalar multiplication and addition, then W
too is a vector space (and we do not need to verify that all the other properties hold).
The formal definition of a subspace is as follows.
Definition 12.2 (Subspace) A subspace W of a vector space V is a non-empty
subset of V that is itself a vector space (under the same operations of addition and
scalar multiplication as V ).

The discussion given justifies the following important result.


Theorem 12.1 Suppose V is a vector space. Then a non-empty subset W of V is a
subspace if and only if:

for all u, v ∈ W , u + v ∈ W (W is closed under addition), and

R
for all v ∈ W and α ∈ R, αv ∈ W (W is closed under scalar multiplication).

Read the Comment on Activity 5.17 in the A-H text for an explanation of why
this theorem is true.

182
12.2. Subspaces

Example 12.6 In R2 , the lines y = 2x and y = 2x + 1 can be defined as the sets of


vectors,
     
x x
S= : y = 2x, x ∈ R , U= : y = 2x + 1, x ∈ R .
y y

Each vector in one of the sets is the position vector of a point on that line. We will
show that the set S is a subspace of R2 , and that the set U is not a subspace of R2 .
   
1 0
If v = and p = , these sets can equally well be expressed as,
2 1

S = {x : x = tv, t ∈ R} U = {x : x = p + tv, t ∈ R}.

Activity 12.4 Show that the two descriptions of S describe the same set of vectors.

Example 12.6 (continued)


To show S is a subspace, we need to show that it is non-empty, and we need to show
that it is closed under addition and closed under scalar multiplication using any
vectors in S and any scalar in R. We’ll use the second set of definitions, so our line is
the set of vectors  
1
S = {x : x = tv, t ∈ R} v= .
2
The set S is non-empty, since 0 = 0v ∈ S.
Let u, w be any vectors in S and let α ∈ R. Then
   
1 1
u=s w=t for some s, t ∈ R.
2 2 12
closure under addition:
     
1 1 1
u+w =s +t = (s + t) ∈S (since s + t ∈ R).
2 2 2

closure under scalar multiplication:


    
1 1
αu = α s = (αs) ∈S (since αs ∈ R).
2 2

This shows that S is a subspace of R2 .


To show U is not a subspace, any one of the three following statements
(counterexamples) will suffice.
1. 0 ∈
/ U.
2. U is not closed under addition:
         
0 1 0 1 1
∈U ∈U but + = ∈
/U
1 3 1 3 4

183
12. Vector spaces and subspaces

since 4 6= 2(1) + 1.
3. U is not closed under scalar multiplication:
     
0 0 0
∈ U, 2 ∈ R but 2 = ∈
/U
1 1 2

Activity 12.5 Show that 0 ∈


/ U . Explain why this suffices to show that U is not a
subspace.

Example 12.6 (continued)


The line y = 2x + 1 is an example of an affine set, a translation of a subspace.
It is useful to visualise what is happening here by looking at the graphs of the lines
y = 2x and y = 2x + 1. Sketch y = 2x and sketch the position vector of any point on
the line. You will find that the vector lies along the line, so any scalar multiple of
that position vector will also lie along the line, as will the sum of any two such
position vectors. These position vectors are all still in the set S. Now sketch the line
y = 2x + 1. First notice that it does not contain the origin. Now sketch the position
vector of any point on the line. You will find that the position vector does not lie
along the line, but goes from the origin up to the point on the line. If you scalar
multiply this vector by any constant α 6= 1, it will be the position vector of a point
which is not on the line, so the resulting vector will not be in U . The same is true if
you add together the position vectors of two points on the line. So U is not a
subspace.

Activity 12.6 Let v be any non-zero vector in a vector space V . Show that the set
12
S = {αv : α ∈ R}

is a subspace of V . The set S defines a line through the origin in V .

If V is a vector space, the sets V and {0} are subspaces of V . The set {0} is not empty,
it contains one vector, namely the zero vector. It is a subspace because 0 + 0 = 0 and
α0 = 0 for any α ∈ R.
Given any subset S of a vector space V , how do you decide if it is a subspace? First
check that 0 ∈ S. Then using some vectors in the subset, see if adding them and scalar
multiplying them will give you another vector in S. To prove that S is a subspace, you
will need to verify that it is closed under addition and closed under scalar multiplication
for any vectors in S, so you will need to use letters to represent general vectors, or
components of general vectors, in the set. That is, using letters show that the sum u + v
and the scalar product αu of vectors in S also satisfy the definition of a vector in S.
To prove a set S is not a subspace you only need to find one counterexample, one or two
particular vectors (use numbers) for which the sum or the scalar product does not
satisfy the definition of S. Note that if 0 is not in the set, it cannot be a subspace.

184
12.3. Subspaces connected with matrices

Activity 12.7 Write down a general vector (using letters) and a particular vector
(using numbers) for each of the following subsets. Show that one of the sets is a
subspace of R3 and the other is not.
     
 x   x 
S1 =  x2  : x ∈ R , S2 =  2x  : x ∈ R
0 0
   

12.2.1 An alternative characterisation of a subspace

We have seen that a subspace is a non-empty subset W of a vector space that is closed
under addition and scalar multiplication, meaning that if u, v ∈ W and α ∈ R, then
both u + v and αv are in W . Now, it is fairly easy to see that the following equivalent
property characterises when W will be a subspace:
Theorem 12.2 A non-empty subset W of a vector space is a subspace if and only if
for all u, v ∈ W and all α, β ∈ R, we have αu + βv ∈ W .

That is, W is a subspace if it is non-empty and closed under linear combination.

12.3 Subspaces connected with matrices

12.3.1 Null space

Suppose that A is an m × n matrix. Then the null space N (A), the set of solutions to
the homogeneous linear system Ax = 0, is a subspace of Rn .
12
Theorem 12.3 For any m × n matrix A, N (A) is a subspace of Rn .

R Read the proof of this theorem in the A-H text, where it is numbered as
Theorem 5.28. But before you do so, try to prove it yourself by showing that N (A)
is non-empty, closed under addition and closed under scalar multiplication.
The null space is the set of solutions to the homogeneous linear system. If we instead
consider the set of solutions S to a general system Ax = b, S is not a subspace of Rn if
b 6= 0 (that is, if the system is not homogeneous). This is because 0 does not belong to
S. However, as we saw in Chapter 5 (Theorem 6.2), there is a relationship between S
and N (A): if x0 is any solution of Ax = b then S = {x0 + z : z ∈ N (A)}, which we may
write as x0 + N (A). S is an affine set, a translation of the subspace N (A).
Generally, if W is a subspace of a vector space V and x ∈ V then the set x + W defined
by
x + W = {x + w : w ∈ W }

is called an affine subset of V . An affine subset is not generally a subspace (although


every subspace is an affine subset, as we can see by taking x = 0).

185
12. Vector spaces and subspaces

12.3.2 Range
Recall that the range of an m × n matrix is

R(A) = {Ax : x ∈ Rn }.

R
Theorem 12.4 For any m × n matrix A, R(A) is a subspace of Rm .

Read the proof of this theorem in the A-H text, where it is numbered as
Theorem 5.30. But again before you do so, think about why it is true and see if you
can write out a proof.

Overview
We have defined what is meant by a vector space and a subspace, and have seen how to
determine whether a subset of a vector space is a subspace. Two particular subsets
associated with a matrix have been shown to be subspaces: the range and the null space.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

explain what is meant by a vector space and a subspace


prove that a given set is a vector space, or a subspace of a given vector space
prove that the null space and range of a matrix are subspaces

12
Test your knowledge and understanding
Work Exercises 5.1, 5.2 and 5.6 in the text A-H. The solutions to all exercises in the
text can be found at the end of the textbook.
Work Problem 5.6 in the text A-H. You will find the solution on the VLE.

Comments on selected activities


Feedback to activity 12.1
For any x,
0 = 0x = (1 + (−1))x = 1x + (−1)x = x + (−1)x
so adding the negative −x of x to each side, and using properties 3 and 4 of the
definition of a vector space,

−x = −x + 0 = −x + x + (−1)x = (−1)x

which proves that −x = (−1)x.

186
12.3. Comments on selected activities

Feedback to activity 12.2


The properties are not hard to check: we omit the details here. The negative of a
function f is the function −f given by (−f )(x) = −(f (x)) for all x.

Feedback to activity 12.3


Clearly V 6= ∅. Suppose    0
x x
u =  y  , v =  y 0  ∈ V,
0 0
and that α ∈ R. Then
x + x0
 

u + v =  y + y0  ∈ V
0
and  
αx
αv =  αy  ∈ V.
0

Feedback to activity 12.5


The vector 0 is not in the set U as
     
0 0 1
0= 6= +t for any t ∈ R,
0 1 2

so property 4 of the definition of a vector space is not satisfied (definition 12.1).

Feedback to activity 12.6


Note first that S is non-empty because 0 ∈ S. Suppose that x, y ∈ S. (Why are we not
using the usual symbols u and v? Can you see? It’s because v is already used in the
definition of the set S.) Suppose also that β ∈ R. Now, because x and y belong to S, 12
there are α, α0 ∈ R such that x = αv and y = α0 v. Then,

x + y = αv + α0 v = (α + α0 )v,

which is in S since it is a scalar multiple of v. Also,

βx = β(αv) = (βα)v ∈ S

and it follows that S is a subspace.


 
Feedback to activity 12.7 x
A general vector in S1 is of the form x2 , x ∈ R, and one particular vector, taking

  0
1
x = 1, is  1 . To show S1 is not a subspace you need to find one counterexample, one
0
or two particular vectors in S1 which do not satisfy the closure properties, such as
       
1 1 1 2
 1  ∈ S1 but  1  +  1  =  2  ∈ / S1 .
0 0 0 0

187
12. Vector spaces and subspaces

 
x
A general vector in S2 is of the form  2x , x ∈ R, and one particular vector, taking
  0
1
x = 1, is 2 . S2 is a subspace. We show it is closed under addition and scalar

0
multiplication using general vectors: if u, v ∈ S2 , a, b ∈ R,
     
a b a+b
u + v =  2a  +  2b  =  2(a + b)  ∈ S2 , a + b ∈ R,
0 0 0

and, if α ∈ R,  
αa
αu =  2(αa)  ∈ S2 , αa ∈ R.
0

12

188
Chapter 13
Linear span and linear independence

Introduction
In this chapter, we continue our study of vector spaces by looking at what is meant by
the linear span of a set of vectors, and we meet the concept of linear independence of a
set of vectors. These ideas are central to a deeper understanding of linear algebra.

Aims
The aims of this chapter are to:

define what is meant by the linear span of a set of vectors in a vector space

define what it means to say that a set of vectors is linearly independent or linearly
dependent

show how to determine whether a set of vectors is linearly independent; and, if it is


not, how to express one of the vectors as a linear combination of the others.

R
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 5.3 13
and 6.1.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

Synopsis
We define what is meant by linear combinations and linear span of vectors in a vector
space, We look at the geometric interpretation of linear span in Rn , and we define the
row space and column space of a matrix. We then say what it means for a set of vectors

189
13. Linear span and linear independence

to be linearly independent or linearly dependent, and show how to test whether a set of
vectors is linearly independent.

13.1 Linear span


Recall that by a linear combination of vectors v1 , v2 , . . . , vk we mean a vector of the
form
v = α1 v1 + α2 v2 + · · · + αk vk .

If we add together two vectors of this form, we get another linear combination of the
vectors v1 , v2 , . . . , vk . The same is true of any scalar multiple of v.

Activity 13.1 Show this; show that if v = α1 v1 + α2 v2 + · · · + αk vk and


w = β1 v1 + β2 v2 + · · · + βk vk then v + w and sv, s ∈ R, are also linear
combinations of the vectors v1 , v2 , . . . , vk .

The set of all linear combinations of a given set of vectors of a vector space V forms a
subspace, and we give it a special name.
Definition 13.1 (Linear span) Suppose that V is a vector space and that
v1 , v2 , . . . , vk ∈ V . The linear span of X = {v1 , . . . , vk } is the set of all linear
combinations of the vectors v1 , . . . , vk , denoted by Lin{v1 , v2 , . . . , vk } or Lin(X) .
That is,
Lin{v1 , v2 , . . . , vk } = {α1 v1 + · · · + αk vk : α1 , α2 , . . . , αk ∈ R}.
Theorem 13.1 If X = {v1 , . . . , vk } is a set of vectors of a vector space V , then Lin(X)

R
is a subspace of V . It is the smallest subspace containing the vectors v1 , v2 , . . . , vk .

Read the proof of this theorem in the A-H textbook, where it is numbered
Theorem 5.33.
13 The subspace Lin(X) is also known as the subspace spanned by the set
X = {v1 , . . . , vk }, or, simply, as the span of {v1 , v2 , . . . , vk }.
Different texts may use different notations for the linear span of a set of vectors.
Notation is important, but it is nothing to get anxious about: just always make it clear
what you mean by your notation: use words as well as symbols!

13.1.1 Lines and planes in R3


What is the set Lin{v} of a single non-zero vector v ∈ Rn ? We have already seen that
this defines a line through the origin in any vector space, Rn , as we have
Lin{v} = {αv : α ∈ R},
and in Activity 12.6 on page 184, you proved that this is a subspace for any vector
space, V .
In Section 4.4 of this study guide we saw that a plane in R3 can be defined as the set of
all vectors x = (x, y, z)T whose components satisfy a single Cartesian equation,

190
13.1. Linear span

ax + by + cz = d, or as the set of all vectors x which satisfy a vector equation with two
parameters, x = p + sv + tw, s, t ∈ R, where v and w are non-parallel vectors and p is
the position vector of a point on the plane. These definitions are equivalent as it is
possible to go from one representation of a given plane to the other.
If d = 0, the plane contains the origin, so, taking p = 0, the plane is the set of vectors
{x : x = sv + tw, s, t ∈ R}.
Since this is the linear span, Lin{v, w}, of two vectors in R3 , a plane through the origin
is a subspace of R3 .
Let’s look at a specific example.

Example 13.1 Let S be the set given by


  
 x 
S =  y  : 3x − 2y + z = 0 .
z
 

Then for x ∈ S,
           
x x x 0 1 0
x= y =
   y  =  0  +  y  =x  0  + y 1.

z 2y − 3x −3x 2y −3 2

That is, x = xv1 + yv2 where x, y can be any real numbers and v1 , v2 are the
vectors given above. Since S is the linear span of two vectors, it is a subspace of R3 .
Of course, you can show directly that S is a subspace by showing it is non-empty,
and closed under addition and scalar multiplication.

If d 6= 0 then the plane is not a subspace. It is an affine set, a translation of a linear


space.

Activity 13.2 Show this in general. Show that the set


   13
 x 
S =  y  : ax + by + cz = d
z
 

is a subspace if d = 0 and it is not a subspace if d 6= 0. Do this by showing that S is


non-empty, and that the set S as defined above is closed under addition and scalar
multiplication if d = 0, but not if d 6= 0.

Conversely, if v1 , v2 ∈ R3 are two non-parallel vectors, then the set K = Lin{v1 , v2 } is


a plane, and we can obtain its Cartesian equation. Let us return to Example 13.1.

Example 13.7 (continued)


The plane is the set of all linear combinations of v1 and v2 , that is, all vectors x
such that      
x 1 0
x= y =s
   0  + t 1
 s, t ∈ R.
z −3 2

191
13. Linear span and linear independence

This yields three equations in the two unknowns, s and t. Eliminating s and t from
these equations yields a single Cartesian equation between the variables x, y, z:

x=s 
y=t =⇒ z = −3x + 2y or 3x − 2y + z = 0.
z = −3s + 2t

In the same way as for planes in R3 , any hyperplane in Rn which contains the origin is a
subspace of Rn . You can show this directly, exactly as in the activity above, or you can
show it is the linear span of n − 1 vectors in Rn .

13.1.2 Row space and column space


The linear span of the column vectors of an m × n matrix A is called the column
space of A and denoted by CS(A). It is a subspace of Rm .
In Section 12.3 we observed that the range R(A) of an m × n matrix A is a subspace of
Rm ., and in Section 9.4 we saw that the set R(A) is equal to the set of all linear
combinations of the columns of A. So the range of A and the column space of A are
equal, R(A) = CS(A). Although defined differently, they are the same subspace of Rm .
It is also possible to consider the row space RS(A) of a matrix: this is the linear span
of the rows of A considered as vectors in Rn . If A is an m × n matrix the row space is a
subspace of Rn
As we saw in Section 12.3, the null space of A, N (A) is also a subspace of Rn . Because
the vectors in the null space are the solutions of the homogeneous system Ax = 0, every
vector in the null space is orthogonal to every vector in the row space of A, meaning

R
that the inner product hr, xi = 0 for any r ∈ RS(A) and x ∈ N (A).
Read Section 5.3 of the A-H textbook for a more detailed explanation of what
we have seen so far about linear span.
13
13.2 Linear independence
Linear independence is a central idea in the theory of vector spaces. If {v1 , v2 , . . . , vk }
is a set of vectors in a vector space V , then the vector equation
α1 v1 + α2 v2 + · · · + αr vk = 0
always has the trivial solution, α1 = α2 = · · · = αk = 0.
We say that vectors x1 , x2 , . . . , xk in Rn are linearly dependent if there are numbers
α1 , α2 , . . . , αk , not all zero, such that
α1 x1 + α2 x2 + · · · + αk xk = 0.
In this case the left-hand side is termed a non-trivial linear combination. The
vectors are linearly independent if they are not linearly dependent; that is, if no
non-trivial linear combination of them is the zero vector or, equivalently, whenever
α1 x1 + α2 x2 + · · · + αk xk = 0,

192
13.2. Linear independence

then, necessarily, α1 = α2 = · · · = αk = 0. We have been talking about Rn , but the


same definitions can be used for any vector space V . We state them formally now.
Definition 13.2 (Linear independence) Let V be a vector space and v1 , . . . , vk ∈ V .
Then v1 , v2 , . . . , vk form a linearly independent set or are linearly independent
if and only if

α1 v1 + α2 v2 + · · · + αk vk = 0 =⇒ α1 = α2 = · · · = αk = 0 :

that is, if and only if no non-trivial linear combination of v1 , v2 , . . . , vk equals the zero
vector.

Definition 13.3 (Linear dependence) Let V be a vector space and


v1 , v2 , . . . , vk ∈ V . Then v1 , v2 , . . . , vk form a linearly dependent set or are
linearly dependent if and only if there are real numbers α1 , α2 , . . . , αk , not all zero,
such that
α1 v1 + α2 v2 + · · · + αk vk = 0,
that is, if and only if some non-trivial linear combination of the vectors is the zero
vector.

Example 13.8 In R3 , the following vectors are linearly dependent:


     
1 2 4
v1 =  2  , v2 =  1  , v3 =  5  .
3 5 11

This is because
2v1 + v2 − v3 = 0.
Note that this can also be written as v3 = 2v1 + v2 .

This example illustrates the following general result. Try to prove it yourself before
looking at the proof. 13
Theorem 13.2 The set {v1 , v2 , . . . , vk } ⊆ V is linearly dependent if and only if some

R
vector vi is a linear combination of the other vectors.

Read the proof of this theorem in the A-H textbook, where it is numbered
Theorem 6.6.
It follows from this theorem that a set of two vectors is linearly dependent if and only if
one vector is a scalar multiple of the other.

Example 13.9 The vectors


   
1 2
v1 =  2  , v2 =  1  ,
3 5

in Example 13.8 are linearly independent, since one is not a scalar multiple of the
other.

193
13. Linear span and linear independence

Activity 13.3 Show that, for any vector v in a vector space V , the set of vectors
{v, 0} is linearly dependent.

13.3 Testing for linear independence in Rn


Given k vectors v1 , . . . , vk ∈ Rn , the vector equation

α1 v1 + α2 v2 + · · · + αr vk = 0

is a homogeneous system of n linear equations in k unknowns. This can be written in


matrix form as Ax = 0, where A is the n × k matrix A = (v1 v2 · · · vk ) with the
vectors v1 , v2 , . . . vk as its columns, and x is the vector (or k × 1 matrix),

α1
 
 α2 
x=  ...  .

αk

Recall Theorem 4.2, (page 53) that the matrix product Ax is exactly the linear
combination α1 v1 + αv2 + · · · + αk vk .
Then the question of whether or not a set of vectors in Rn is linearly independent can
be answered by looking at the solutions of the homogeneous system Ax = 0.
Theorem 13.3 The vectors v1 , v2 , . . . , vk are linearly dependent if and only if the
linear system Ax = 0 has a solution other than x = 0, where A is the matrix
A = (v1 v2 · · · vk ). Equivalently, the vectors are linearly independent precisely when
the only solution to the system is x = 0.

If the vectors are linearly dependent, then any solution x 6= 0 of the system Ax = 0 will
13 directly give a non-trivial linear combination of the vectors that equals the zero vector.

Activity 13.4 Show that the vectors


     
1 1 2
v1 = , v2 = , v3 =
2 −1 −5

are linearly dependent by solving Ax = 0. Use your solution to give a non-trivial


linear combination of the vectors that equals the zero vector.

Looking further, we know from our experience of solving linear systems with row
operations that the system Ax = 0 will have precisely the one solution x = 0 if and
only if we obtain from the n × k matrix A an echelon matrix in which there are k
leading ones. That is, if and only if rank(A) = k. (Think about this!) Thus, we have the
following result.
Theorem 13.4 Suppose that v1 , . . . , vk ∈ Rn . Then the set {v1 , . . . , vk } is linearly
independent if and only if the n × k matrix (v1 v2 · · · vk ) has rank k.

194
13.3. Testing for linear independence in Rn

But the rank is always at most the number of rows, so we certainly need to have k ≤ n.
Also, there is a set of n linearly independent vectors in Rn . In fact, there are infinitely
many such sets, but an obvious one is
{e1 , e2 , . . . , en } ,
where ei is the vector with every entry equal to 0 except for the ith entry, which is 1.

Activity 13.5 Show that the set of vectors

{e1 , e2 , . . . , en } ,

in Rn is linearly independent.

Thus, we have the following result.


Theorem 13.5 The maximum size of a linearly independent set of vectors in Rn is n.

So any set of more than n vectors in Rn is linearly dependent. On the other hand, it
should not be imagined that any set of n or fewer is linearly independent: that isn’t true.

Example 13.10 In R4 , which of the following sets of vectors are linearly


independent?          

 1 1 2 0 2 
0 2 1 0 5
         
L1 =   −1  ,  9  ,  3  ,  1  ,  9  ,
        

 
0 2 1 0 1
 
   

 1 1  
0   2 

L2 =    ,   ,

 −1   9  
0 2
 
     
1 1 2 
13

 
0   2   1 

L3 =    ,   ,   ,

 −1   9   3  
0 2 1
 
       

 1 1 2 0  
0 2 1 0
       
L4 =   −1  ,  9  ,  3  ,  1  .
      

 
0 2 1 0
 

Try this yourself before reading the answers.


The set L1 is linearly dependent because it consists of five vectors in R4 . The set L2
is linearly independent because neither vector is a scalar multiple of the other. To
see that the set L3 is linearly dependent, write the vectors as the columns of a
matrix A and reduce A to echelon form to find that the rank of A is 2. This means
that there is a non-trivial linear combination of the vectors which is equal to 0, or
equivalently, that one of the vectors is a linear combination of the other two. The
last set, L4 contains the set L3 and is therefore also linearly dependent, since it is
still true that one of the vectors is a linear combination of the others.

195
13. Linear span and linear independence

Activity 13.6 For the set L3 above, find the solution of the corresponding
homogeneous system Ax = 0 where A is the matrix whose columns are the vectors
of L3 . Use the solution to write down a non-trivial linear combination of the vectors
that is equal to the zero vector. Express one of the vectors as a linear combination of
the other two.

There is an important property of linearly independent sets of vectors which holds for
any vector space V .
Theorem 13.6 If x1 , x2 , . . . , xm are linearly independent in V and

c1 x1 + c2 x2 + · · · + cm xm = c01 x1 + c02 x2 + · · · + c0m xm

then
c1 = c01 , c2 = c02 , ..., cm = c0m .

Activity 13.7 Prove this. Use the fact that

c1 x1 + c2 x2 + · · · + cm xm = c01 x1 + c02 x2 + · · · + c0m xm

if and only if

(c1 − c01 )x1 + (c2 − c02 )x2 + · · · + (cm − c0m )xm = 0.

What does this theorem say about x = c1 x1 + c2 x2 + · · · + cm xm ? (Think about this


before you continue reading.)
It says that if a vector x can be expressed as a linear combination of linearly
independent vectors, then this can be done in only one way. The linear combination is
unique.

13 Overview
We have seen what is meant by linear combinations and linear span of vectors in a
vector space. We looked at lines and planes in R3 as examples. We defined the row
space and column space of a matrix and looked at their relationships with the range
and null space. We have defined linear independence and dependence and shown how to
determine whether a given set of vectors is linearly independent. We have seen that a
linearly independent set of vectors in Rn contains at most n vectors, and that there is at
most one way to express a vector as a linear combination of linearly independent
vectors.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

explain what is meant by the linear span of a set of vectors

196
13.3. Test your knowledge and understanding

define column space, CS(A), and row space, RS(A) of a matrix A and explain why
CS(A) = R(A), where R(A) is the range of A
explain what is meant by the statement that the row space of a matrix A is
orthogonal to the null space of A and why this is true
define what is meant by linearly independence and linearly dependence of a set of
vectors
determine whether a given set of vectors is linearly independent or linearly
dependent, and in the latter case find a non-trivial linear combination of the
vectors which equals the zero vector

Test your knowledge and understanding


Work Exercises 5.3–5.5, 6.1–6.7 in the text A-H. The solutions to all exercises in the
text can be found at the end of the textbook.
Work Problems 5.4 and 6.4 in the text A-H. You will find the solutions on the VLE.

Comments on selected activities


Feedback to activity 13.1
Any two such vectors will be of the form

v = α 1 v 1 + α 2 v2 + · · · + α k vk

and
v0 = α10 v1 + α20 v2 + · · · + αk0 vk
and we will have

v + v0 = (α1 + α10 )v1 + (α2 + α20 )v2 + · · · + (αk + αk0 )vk ,

which is a linear combination of the vectors v1 , v2 , · · · , vk . Also,


13
αv = α(α1 v1 + α2 v2 + · · · + αk vk ) = (αα1 )v1 + (αα2 )v2 + · · · + (ααk )vk ,

is a linear combination of the vectors v1 , v2 , · · · , vk .

Feedback to activity 13.2


It’s easy to see that S 6= ∅. Suppose d = 0. Let u, v ∈ S and α ∈ R. Then
   0
x x
u = y , v = y0  ,
  
z z0

where ax + by + cz = 0 and ax0 + by 0 + cz 0 = 0. Consider u + v. This equals

x + x0
   
X
 Y  =  y + y0 
Z z + z0

197
13. Linear span and linear independence

and we want to show this belongs to S. Now, this is the case, because

aX + bY + cZ = a(x + x0 ) + b(y + y 0 ) + c(z + z 0 )

= (ax + by + cz) + (ax0 + by 0 + cz 0 ) = 0 + 0 = 0,


and similarly it can be shown that, for any α ∈ R, αv ∈ S. So, in this case, S is a
subspace. You can see why this argument fails when d is not 0, because then
aX + bY + cZ will equal 2d, which will not be the same as d. So we will not have
u + v ∈ S. (Similarly, we could see that αv will not be in S if α 6= 1.) Also, if d 6= 0, the
simple statement that 0 does not satisfy the equation means that in this case S is not a
subspace.
Feedback to activity 13.3
The linear combination, 0v + 10 = 0, is a non-trivial linear combination of the vectors
which is equal to the zero vector. Any set of vectors containing the zero vector, 0, is
linearly dependent.
Feedback to activity 13.4
Let  
1 1 2
A= .
2 −1 −5
Then, using row operations, it can be seen that the general solution x to Ax = 0 is
x = (r, −3r, r)T for r ∈ R. In particular, taking r = 1, and multiplying out the equation
Ax = 0, we have that
       
1 1 2 0
−3 + = .
2 −1 −5 0
Feedback to activity 13.5
You can argue this directly. Looking at the components of the vector equation

a1 e 1 + a2 e 2 + . . . a n e n = 0

13 you can see that the positions of the ones and zeros in the vectors lead to the equations
a1 = 0 from the first component, a2 = 0 from the second component, and so on, so that
ai = 0 (1 ≤ i ≤ n), is the only possible solution. Alternatively, the matrix
A = (e1 , e2 , . . . , en ) is the n × n identity matrix, so the only solution to Az = 0 is the
trivial solution, proving that the vectors are linearly independent.
Feedback to activity 13.6
The general solution to the system is
   
x −3/2
x =  y  = t  −1/2  , t ∈ R.
z 1
Taking t = −1 and multiplying out the equation Ax = 0, we see that
     
1 1 2
3 0  1 2 1
 +   −   = 0,
2  −1  2  9   3 
0 2 1

198
13.3. Comments on selected activities

and hence      
2 1 1
1 3  0  1 2
 = 
 3  2  −1  + 2  9  .
  

1 0 2

Feedback to activity 13.7


As noted,
c1 x1 + c2 x2 + · · · + cm xm = c01 x1 + c02 x2 + · · · + c0m xm
if and only if
(c1 − c01 )x1 + (c2 − c02 )x2 + · · · + (cm − c0m )xm = 0.
But since the vectors are linearly independent, this can be true only if c1 − c01 = 0,
c2 − c02 = 0, and so on. That is, for each i, we must have ci = c0i .

13

199
13. Linear span and linear independence

13

200
Chapter 14
Bases and dimension

Introduction
In this chapter we look more deeply into the structure of a vector space, developing the
concept of a basis, which will enable us to know precisely what we mean by the
dimension of a vector space. We also discuss the important rank-nullity theorem.

Aims
The aims of this chapter are to:

introduce the concept of basis, and the dimension of a finite-dimensional vector


space
show how to find bases
explain how rank and nullity are defined, and the relationship between them (the
rank-nullity theorem).

R
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods.
Sections 6.2–6.5.

Further reading 14
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

Synopsis
We define what is meant by a finite basis of a vector space (or subspace) and how this
leads to the concept of dimension. Then we explain what is meant by coordinates with
respect to a basis. We also show how to find bases in some important circumstances. We
define the rank and nullity of a matrix and show how these are related.

201
14. Bases and dimension

14.1 Basis
The following result about Rn is very important in the theory of vector spaces. It says
that a linearly independent set of n vectors in Rn spans Rn .
Theorem 14.1 If v1 , v2 , . . . , vn are linearly independent vectors in Rn , then for any x

R
in Rn , x can be written as a unique linear combination of v1 , . . . , vn .

Read the proof of this theorem from the A-H text, where it is labelled
Theorem 6.22.
It follows from this theorem that if we have a set of n linearly independent vectors in
Rn , then the set of vectors also spans Rn , so any vector in Rn can be expressed in
exactly one way as a linear combination of the n vectors. We say that the n vectors form
a basis of Rn . The formal definition of a (finite) basis for a vector space is as follows.
Definition 14.1 ((Finite) Basis) Let V be a vector space. Then the subset
B = {v1 , v2 , . . . , vn } of V is said to be a basis for (or of) V if:

B is a linearly independent set of vectors, and

V = Lin(B).

An alternative characterisation of a basis can be given: B is a basis of V if every vector


in V can be expressed in exactly one way as a linear combination of the vectors in B.
The set B spans V if and only if a linear combination exists, and B is linearly
independent if and only if any linear combination is unique. We have therefore shown
Theorem 14.2 B = {v1 , v2 , . . . , vn } is a basis of V if and only if any v ∈ V is a
unique linear combination of v1 , v2 , . . . , vn .

Example 14.1 The vector space Rn has the basis {e1 , e2 , . . . , en } where ei is (as
earlier) the vector with every entry equal to 0 except for the ith entry, which is 1.
It’s clear that the vectors are linearly independent (as you showed in Activity 13.5
14 on page 195), and there are n of them, so we know straight away that they form a
basis. In fact, it’s easy to see that they span the whole of Rn , since for any
x = (x1 , x2 , . . . , xn )T ∈ Rn ,

x = x1 e 1 + x 2 e 2 + · · · + xn e n .

The basis {e1 , e2 , . . . , en } is called the standard basis of Rn .

Example 14.2 We will find a basis of the subspace of R3 given by


  
 x 
W =  y  : x + y − 3z = 0 .
z
 

202
14.1. Basis

If x = (x, y, z)T is any vector in W , then its components must satisfy y = −x + 3z,
and we can express x as
       
x x 1 0
x = y = −x + 3z = x −1 + z 3  = xv + zw x, z ∈ R.
      
z z 0 1

This shows that the set {v, w} spans W . The set is linearly independent. Why?
Because of the positions of the zeros and ones, if αv + βw = 0 then necessarily
α = 0 and β = 0.

Example 14.3 The set    


1 1
S= ,
2 −1
is a basis of R2 . To show this we have to show it spans R2 and is linearly
independent, or equivalently, that any vector b ∈ R2 is a unique linear combination
of these two vectors. Writing the vectors as the columns of a matrix A, we find that
|A| =
6 0, so this is true by Theorem 9.2 (page 133).

As in the above example, we can show that n vectors in Rn are a basis of Rn by writing
them as the columns of a matrix A and invoking Theorem 9.2. Turning this around, we
can see that if A = (v1 v2 . . . vn ) is an n × n matrix with rank(A) = n, then the
columns of A are a basis of Rn . Indeed, by Theorem 9.2, the system Az = x will have a
unique solution for any x ∈ Rn , so any vector x ∈ Rn can be written as a unique linear
combination of the column vectors. We therefore have two more equivalent statements
to add to the theorem.
Theorem 14.3 If A is an n × n matrix, then the following statements are equivalent.

A−1 exists.

Ax = b has a unique solution for any b ∈ Rn .

Ax = 0 has only the trivial solution, x = 0.

The reduced echelon form of A is I. 14


|A| 6= 0.

The rank of A is n.

The columns vectors of A are a basis of Rn .

The row vectors of A are a basis of Rn .

The last statement can be seen from the facts that |AT | = |A|, and the rows of A are
the columns of AT . This theorem provides an easy way to determine if a set of n vectors
is a basis of Rn . We simply write the n vectors as the columns of a matrix, and evaluate
its determinant.

203
14. Bases and dimension

Activity 14.1 Which of these sets is a basis of R3 ?


           
 −1 1 −1   −1 1 1 
U=  0  , 2 ,
   2  , W =  0  , 2 , 2 .
  
1 3 5 1 3 5
   

Show that one of these sets is a basis of R3 and that the other one spans a plane in
R3 . Find a basis for this plane. Then find a Cartesian equation for the plane.

14.1.1 Coordinates

What is the importance of a basis? If S = {v1 , v2 , . . . , vn } is a basis of Rn , then any


vector v ∈ Rn can be expressed uniquely as v = α1 v1 + α2 v2 + · · · + αn vn . The real
numbers α1 , α2 , · · · , αn are the coordinates of v with respect to the basis, S.
Definition 14.2 (Coordinates) If S = {v1 , v2 , . . . , vn } is a basis of a vector space V
and v = α1 v1 + α2 v2 + · · · + αn vn , then the real numbers α1 , α2 , · · · , αn are the
coordinates of v with respect to the basis, S. We use the notation

α1
 
 α2 
[v]S = 
 ... 

αn S

to denote the coordinate vector of v in the basis S.

Example 14.4 The sets B = {e1 , e2 } and S = {v1 , v2 } where


       
1 0 1 1
B= , and S= ,
0 1 2 −1

are each a basis of R2 . The coordinates of the vector v = (2, −5)T in each basis are
given by the coordinate vectors,
14    
2 −1
[v]B = and [v]S = .
−5 B 3 S

In the standard basis, the coordinates of v are precisely the components of the
vector v. In the basis S, the components of v arise from the observation that
     
1 1 2
v = −1 +3 = .
2 −1 −5

Activity 14.2 For the example above, sketch the vector v on graph paper and show
it as the sum of the vectors given by each of the linear combinations: v = 2e1 − 5e2
and v = −1v1 + 3v2 .

204
14.1. Basis

14.1.2 Dimension
A fundamental result is that if a vector space V has a finite basis, then all bases of V
are of the same size.
Theorem 14.4 Suppose that the vector space V has a finite basis consisting of d
vectors. Then any basis of V consists of exactly d vectors.

R
This is not an easy result to prove.
Read Section 6.4.1, Theorems 6.36 and 6.37, of the A-H textbook to understand
why this theorem is true.
This enables us, finally, to define exactly what we mean by the dimension of a vector
space V .
Definition 14.3 (Dimension) The number d of vectors in a finite basis of a vector
space V is the dimension of V , and is denoted dim(V ). The vector space V = {0} is
defined to have dimension 0.

A vector space which has a finite basis is said to be finite-dimensional. Not all vector
spaces are finite-dimensional. (For example, the vector space of real functions with
pointwise addition and scalar multiplication has no finite basis. Such a vector space is
said to be infinite-dimensional.)

Example 14.5 We already know Rn has a basis of size n. (For example, the
standard basis consists of n vectors.) So Rn has dimension n (which is reassuring,
since it is often referred to as n-dimensional Euclidean space).

If we know the dimension of a vector space V , then we know how many vectors we need
for a basis. If we have the correct number of vectors for a basis and we know either that
the vectors span V , or that they are linearly independent, then we can conclude that
both must be true and they form a basis, as is shown in the following theorem. That is,
we do not need to show both.
Theorem 14.5 Let V be a finite-dimensional vector space of dimension d. Then: 14
d is the largest size of a linearly independent set of vectors in V . Furthermore, any
set of d linearly independent vectors is necessarily a basis of V

d is the smallest size of a spanning set of vectors for V . Furthermore, any finite set

R
of d vectors that spans V is necessarily a basis.

Read the proof of Theorem 6.42 in the A-H textbook to understand why this is
true.
Thus, d = dim(V ) is the largest possible size of a linearly independent set of vectors in
V , and the smallest possible size of a spanning set of vectors (a set of vectors whose
linear span is V ).

205
14. Bases and dimension

Example 14.6 We know that the plane W in R3 ,


  
 x 
W =  y  : x + y − 3z = 0
z
 

has dimension 2, because we found a basis for it consisting of two vectors. If we


choose any set of 2 linearly independent vectors in W , then that set will be a basis
of W . For example, the vectors v1 = (1, 2, 1)T and v2 = (3, 0, 1)T are linearly
independent (why?), so by the theorem, S = {v1 , v2 } is a basis of W .

14.1.3 Dimension and bases of subspaces


Suppose that W is a subspace of the finite-dimensional vector space V . Any set of
linearly independent vectors in W is also a linearly independent set in V .

Activity 14.3 Prove this last statement.

Now, the dimension of W is the largest size of a linearly independent set of vectors in
W , so there is a set of dim(W ) linearly independent vectors in V . But then this means
that dim(W ) ≤ dim(V ), since the largest possible size of a linearly independent set in V
is dim(V ). There is another important relationship between bases of W and V : this is
that any basis of W can be extended to one of V . The following result states this
precisely.
Theorem 14.6 Suppose that V is a finite-dimensional vector space and that W is a
subspace of V . Then dim(W ) ≤ dim(V ). Furthermore, if {w1 , w2 , . . . , wr } is a basis of
W then there are s = dim(V ) − dim(W ) vectors v1 , v2 , . . . , vs ∈ V such that
{w1 , w2 , . . . , wr , v1 , v2 , . . . , vs } is a basis of V . (In the case W = V , the basis of W is
already a basis of V .) That is, we can obtain a basis of the whole space V by adding

R
certain vectors of V to any basis of W .

Read the proof of this theorem in the text A-H where it is labelled Theorem 6.45.

14 Example 14.7 The plane W in R3 ,

W = {x : x + y − 3z = 0}

has a basis consisting of the vectors v1 = (1, 2, 1)T and v2 = (3, 0, 1)T . If v3 is any
vector which is not in this plane, for example, v3 = (1, 0, 0)T , then the set
S = {v1 , v2 , v3 } is a basis of R3 .

14.2 Finding a basis for a linear span in Rn


Suppose we are given k vectors x1 , x2 . . . , xk in Rn , and we want to find a basis for the
linear span Lin{x1 , . . . , xk }. The point is that the k vectors themselves might not form
a linearly independent set (and hence they are not a basis).

206
14.2. Finding a basis for a linear span in Rn

A useful technique is to form a matrix with the xTi as rows, and to perform row
operations until the resulting matrix is in echelon form. Then a basis of the linear span
is given by the transposed non-zero rows of the echelon matrix (which, it should be
noted, will not generally be among the initial given vectors). The reason this works is
that: (i) row operations are such that at any stage in the resulting procedure, the row
space of the matrix is equal to the row space of the original matrix, which is precisely
the linear span of the original set of vectors, and (ii) the non-zero rows of an echelon
matrix are linearly independent (which is clear, since each has a one in a position where
the vectors below it all have zero).

Example 14.8 We find a basis for the subspace of R5 spanned by the vectors
       
1 2 −1 3
 −1   1   2   0 
       
 2  , x2 =  −2  , x3 =  −4  , x4 =  0  .
x1 =        
 −1   −2   1   −3 
−1 −2 1 −3

Write the vectors as the columns of a matrix A and then take AT (effectively writing
each column vector as a row).
 
1 −1 2 −1 −1
 2 1 −2 −2 −2 
AT = 
 −1
.
2 −4 1 1 
3 0 0 −3 −3

Reducing this to echelon form by elementary row operations,


   
1 −1 2 −1 −1 1 −1 2 −1 −1
 2
 1 −2 −2 −2   →  0 3 −6 0 0 
 
 −1 2 −4 1 1   0 1 −2 0 0 
3 0 0 −3 −3 0 3 −6 0 0
   
1 −1 2 −1 −1 1 −1 2 −1 −1
0 1 −2 0 0   →  0 1 −2 0 0  .
 
→ 14
0 1 −2 0 0  0 0 0 0 0 
0 1 −2 0 0 0 0 0 0 0
The echelon matrix at the end of this tells us that a basis for Lin{x1 , x2 , x3 , x4 } is
formed from the first two rows, transposed, of the echelon matrix, that is,
   

 1 0  
−1   1 


    

 2  ,  −2  .
   


  −1   0  

 
−1 0
 

If we want to find a basis that consists of a subset of the original vectors, then we need
to take those vectors that ‘correspond’ to the final non-zero rows in the echelon matrix.

207
14. Bases and dimension

By this, we mean the rows of the original matrix that have ended up as non-zero rows
in the echelon matrix. For instance, in Example 14.8, the first and second rows of the
original matrix correspond to the non-zero rows of the echelon matrix, so a basis of the
span is {x1 , x2 }. On the other hand, if we interchange rows, the correspondence won’t
be so obvious.
A better method to obtain such a basis is given in the next section, using the matrix A
whose columns are the vectors x1 , x2 . . . , xk . Then, as we have seen,
Lin{x1 , . . . , xk } = R(A). That is, Lin{x1 , . . . , xk } is the range or column space of the
matrix A.

14.3 Basis and dimension of range and null space


We have shown that the range and null space of an m × n matrix are subspaces of Rm
and Rn respectively (section 12.3). Their dimensions are so important that they are
given special names.
Definition 14.4 (Rank and nullity) The rank of a matrix A is

rank(A) = dim(R(A))

and the nullity is


nullity(A) = dim(N (A)).

We have, of course, already used the word ‘rank’, so it had better be the case that the
usage just given coincides with the earlier one. Fortunately it does. In fact, we have the
following connection.
Theorem 14.7 Suppose that A is an m × n matrix with columns c1 , c2 , . . . , cn , and
that an echelon form obtained from A has leading ones in columns i1 , i2 , . . . , ir . Then a
basis for R(A) is
B = {ci1 , ci2 , . . . , cir }.

Note that the basis is formed from columns of A, not columns of the echelon matrix: the
14 basis consists of those columns of A corresponding to the leading ones in the echelon
matrix.
We will outline a proof of this theorem, so you can see how it works. We have already
seen that a solution x = (α1 , α2 , . . . , αn ) of Ax = 0 gives a linear combination of the
columns of A which is equal to the zero vector,

0 = α1 c1 + α2 c2 + . . . + αn cn .

If E denotes the reduced echelon form of A, and if c01 , c02 , . . . , c0n denote the columns of
E, then exactly the same relationship holds:

0 = α1 c01 + α2 c02 + . . . + αn c0n .

In fact, we use E to obtain the solution x = (α1 , α2 , . . . , αn ). So the linear dependence


relations are the same for the columns of both matrices, which means that the linearly

208
14.3. Basis and dimension of range and null space

independent columns of A correspond precisely to the linearly independent columns of


E. Which columns of E are linearly independent? The columns which contain the

R
leading ones.
For more detail, read the proof of Theorem 6.54 in the A-H textbook.
We have already seen that a matrix A and its reduced row echelon form have the same
row space, and that the non-zero rows form a basis of this row space. So the dimension
of the row space of A, RS(A), and the dimension of the column space of A,
CS(A) = R(A), are each equal to the number of leading ones in an echelon form of A,
that is both are equal to rank(A). We restate this important fact:

dim(RS(A)) = dim(R(A)) = rank(A).

Example 14.9 Let A be the matrix,


 
1 1 2 1
A = 2 0 1 1.
9 −1 3 4

The reduced echelon form of the matrix is (verify this!)

0 21 12
 
1
E = 0 1 23 12  .
0 0 0 0

The leading ones in this echelon matrix are in the first and second columns, so a
basis for R(A) can be obtained by taking the first and second columns of A. (Note:
‘columns of A’, not of the echelon matrix!) Therefore a basis for R(A) is
   
 1 1 
2, 0  .
9 −1
 

A basis of the row space of A consists of the two non-zero rows of the reduced
matrix, or the first two rows of the original matrix,

1
   
0  1
   
2 
14
 
0 1 1 0
   
,
1 3 or 2 1 .
,
 21
 2
1

 
 

2 2
1 1

Note that the column space is a two-dimensional subspace of R3 (a plane) and the
row space is a two-dimensional subspace of R4 . The columns of A and E satisfy the
same linear dependence relations, which can be easily read from the reduced echelon
form of the matrix,
1 3 1 1
c3 = c1 + c2 , c4 = c1 + c2 .
2 2 2 2

209
14. Bases and dimension

Activity 14.4 Check that the columns of A satisfy these same linear dependence
relations.

There is a very important relationship between the rank and nullity of a matrix. We
have already seen some indication of it in our considerations of linear systems. Recall
that if an m × n matrix A has rank r then the general solution to the (consistent)
system Ax = 0 involves n − r ‘free parameters’. Specifically (noting that 0 is a
particular solution, and using a characterisation obtained earlier in Chapter 9), the
general solution takes the form

x = s1 u1 + s2 u2 + · · · + sn−r un−r ,

where u1 , u2 , . . . , un−r are themselves solutions of the system Ax = 0. But the set of
solutions of Ax = 0 is precisely the null space N (A). Thus, the null space is spanned by
the n − r vectors u1 , . . . , un−r , and so its dimension is at most n − r. In fact, it turns
out that its dimension is precisely n − r. That is,

nullity(A) = n − rank(A).

To see this, we need to show that the vectors u1 , . . . , un−r are linearly independent.
Because of the way in which these vectors arise (look at Example 14.9), it will be the
case that for each of them, there is some position where that vector will have an entry
equal to 1 and the entry in that same position of all the other vectors will be 0. From
this we can see that no non-trivial linear combination of them can be the zero vector, so
they are linearly independent. We have therefore proved the following central result.
Theorem 14.8 (Rank-nullity theorem) For an m × n matrix A,

rank(A) + nullity(A) = n.

Activity 14.5 Find a basis of the null space of the matrix A from Example 14.9,
 
1 1 2 1
A = 2 0 1 1.
14 9 −1 3 4

Verify the rank-nullity theorem for this matrix.

Overview
We have seen what is meant by a basis of a vector space, and the related concept of
dimension and we have seen how to determine coordinates with respect to a basis. We
have also defined the rank and nullity of a matrix and observed the connection between
these, through the rank-nullity theorem.

210
14.3. Learning outcomes

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

explain what is meant by a basis, and by the dimension of a finite-dimensional


vector space
find a basis for a linear span
find a basis for the null space, range and row space of a matrix from its reduced
row echelon form
explain how rank and nullity are defined, and the relationship between them (the
rank-nullity theorem).

Test your knowledge and understanding


Work Exercises 6.8–6.15 in the text A-H. The solutions to all exercises in the text can
be found at the end of the textbook.
Work Problems 6.14–6.15 in the text A-H. You will find the solutions on the VLE.

Comments on selected activities


Feedback to activity 14.1
Write each set of vectors as the columns of a matrix:
   
−1 1 −1 −1 1 1
B =  0 2 2 , A =  0 2 2.
1 3 5 1 3 5

|A| 6= 0, so W is a basis of R3 . |B| = 0, so U is not. Because the set U is linearly


dependent, one of the vectors is a linear combination of the other two. Also, any two
vectors of U are linearly independent since no one of them is a scalar multiple of any
other. Therefore any two vectors from the set U are a basis of Lin(U ). Since Lin(U ) is a
two-dimensional subspace of R3 , it is plane. 14
Using the first two vectors in U for the basis, if x ∈ U ,
     
x −1 1
x = y  = s 0  + t2, s, t ∈ R.
z 1 3

Equating components, you obtain three equations in the two unknowns s and t.
Eliminating s and t between the three equations you will obtain a single equation
relating x, y and z. Explicitly, we have

x = −s + t, y = 2t, z = s + 3t,

so
y y
t= , s=t−x= −x
2 2

211
14. Bases and dimension

and y 3 
z = s + 3t = − x + y,
2 2
so we have x − 2y + z = 0. This is the Cartesian equation of the plane.
Note that a Cartesian equation could equally well have been obtained by writing the
two basis vectors and the vector x as the columns of a matrix M and using the fact
that |M | = 0 if and only if the columns of M are linearly dependent. That is,

−1 1 x

0 2 y = −2x + 4y − 2z = 0.

1 3 z

Feedback to activity 14.3


If S = {w1 , w2 , . . . , wr } is a linearly independent set of vectors in W , then we can state
that the only linear combination

a1 w1 + a2 w2 + . . . + ar wr = 0

is the trivial one, with all ai = 0. But all the vectors in W are also in V , and this
statement still holds true, so S is a linearly independent set of vectors in V .

Feedback to activity 14.5


A general solution of the system of equations Ax = 0 is

        ( −1/2 )        ( −1/2 )
x = s1  ( −3/2 )  + s2  ( −1/2 )  = s1 u1 + s2 u2 .
        (   1  )        (   0  )
        (   0  )        (   1  )

The set {u1 , u2 } is a basis of the null space of A, so dim(N (A)) = 2. From Example
14.9, rank(A) = 2. The matrix A has n = 4 columns.

rank(A) + nullity(A) = 2 + 2 = 4 = n.

Note that the basis vectors of the null space give precisely the same linear dependence
relations between the column vectors as those given in the example. Since Au1 = 0 and
Au2 = 0,

Au1 = −(1/2)c1 − (3/2)c2 + c3 = 0    and    Au2 = −(1/2)c1 − (1/2)c2 + c4 = 0.
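If you have access to a computer, you can mirror this check numerically. The following
is a minimal Python/NumPy sketch (an optional checking aid of ours, not part of the
course or the A-H text): it computes rank(A), deduces the nullity from the rank-nullity
theorem, and confirms that Au1 = Au2 = 0.

import numpy as np

A = np.array([[1.0, 1, 2, 1],
              [2, 0, 1, 1],
              [9, -1, 3, 4]])
u1 = np.array([-0.5, -1.5, 1, 0])  # the null space basis vectors found above
u2 = np.array([-0.5, -0.5, 0, 1])

rank = np.linalg.matrix_rank(A)
print(rank, A.shape[1] - rank)     # rank 2, nullity n - rank = 4 - 2 = 2
print(A @ u1, A @ u2)              # both print the zero vector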

Chapter 15
Linear transformations

Introduction
In the next few chapters, we turn our attention to special types of functions between
vector spaces known as linear transformations. In this chapter, we will discuss what is
meant by a linear transformation, and will look at the matrix representations of linear
transformations between Euclidean vector spaces. This material, together with that of
the next chapter, provides the fundamental theoretical underpinning for the technique
of diagonalisation, which has many applications, as we shall see later.

Aims
The aims of this chapter are to:

explain what is meant by a linear transformation, and provide examples


explore the connection between linear transformations and matrices
define, for linear transformations (extending what we did for matrices) the range,
null space, rank and nullity.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 7.1
and 7.2.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

Synopsis
We define linear transformations and give examples. We then show an important
connection between linear transformations of Euclidean space and matrices. We discuss


rotations as linear transformations, and compositions and inverse transformations. Then


we look more abstractly at linear transformations between vector spaces, defining the
range, null space, rank and nullity (extending these concepts from our earlier context of
matrices).

15.1 Linear transformations


We now turn attention to linear mappings (or linear transformations as they are
usually called) between vector spaces.
Definition 15.1 (Linear transformation) Suppose that V and W are (real) vector
spaces. A function T : V → W is linear if for all u, v ∈ V and all α ∈ R,

1. T (u + v) = T (u) + T (v) and

2. T (αu) = αT (u).
T is said to be a linear transformation (or linear mapping or linear function).

Equivalently, T is linear if for all u, v ∈ V and α, β ∈ R,

T (αu + βv) = αT (u) + βT (v).

(This single condition implies the two in the definition, and is implied by them.)

Activity 15.1 Prove that this single condition is equivalent to the two of the
definition.

Sometimes you will see T (u) written simply as T u.

15.2 Examples

Example 15.1 Let V = Rn and W = Rm and suppose that A is an m × n matrix.


Let TA be the function given by TA (x) = Ax for x ∈ Rn . That is, TA is simply
multiplication by A. Then TA is a linear transformation. This is easily checked, as
follows: first,

TA (u + v) = A(u + v) = Au + Av = TA (u) + TA (v).

Next,
TA (αu) = A(αu) = αAu = αTA (u).
So the two ‘linearity’ conditions are satisfied. We call TA the linear transformation
corresponding to A.


Example 15.2 (More complicated) Let us take V = Rn and take W to be the


vector space of all functions f : R → R (with pointwise addition and scalar
multiplication). Define a function T : Rn → W as follows:

T (u) = T ( (u1 , u2 , . . . , un )T ) = pu1 ,u2 ,...,un = pu ,

where pu = pu1 ,u2 ,...,un is the polynomial function given by

pu1 ,u2 ,...,un (x) = u1 x + u2 x2 + u3 x3 + · · · + un xn .

Then T is a linear transformation. To check this we need to verify that

T (u + v) = T (u) + T (v) and T (αu) = αT (u).

Now, T (u + v) = pu+v , T (u) = pu , and T (v) = pv , so we need to check that


pu+v = pu + pv . This is in fact true, since, for all x,

pu+v (x) = pu1 +v1 ,...,un +vn (x)


= (u1 + v1 )x + (u2 + v2 )x2 + · · · + (un + vn )xn
= (u1 x + u2 x2 + · · · + un xn ) + (v1 x + v2 x2 + · · · + vn xn )
= pu (x) + pv (x)
= (pu + pv )(x).

The fact that for all x, pu+v (x) = (pu + pv )(x) means that the functions pu+v and
pu + pv are identical. The fact that T (αu) = αT (u) is similarly proved, and you
should try it!

Activity 15.2 Prove that T (αu) = αT (u).

15.3 Linear transformations and matrices


In this section, we consider only linear transformations from Rn to Rm (for some m and
n). But much of what we say can be extended to linear transformations mapping from 15
any finite-dimensional vector space to any other finite-dimensional vector space.
We have seen that any m × n matrix A gives a linear transformation TA : Rn → Rm (the
linear transformation ‘corresponding to A’), given by TA (u) = Au. There is a reverse
connection: for every linear transformation T : Rn → Rm there is a matrix A such that
T = TA .
Theorem 15.1 Suppose that T : Rn → Rm is a linear transformation and let
{e1 , e2 , . . . , en } denote the standard basis of Rn . If A = AT is the matrix whose columns
are the vectors T (e1 ), T (e2 ), . . . , T (en ): that is,

A = (T (e1 ) T (e2 ) . . . T (en )) ,


then T = TA : that is, for every u ∈ Rn , T (u) = Au.

Read the proof of this theorem in the A-H text, where it is labelled Theorem 7.8.
Thus, to each matrix A there corresponds a linear transformation TA , and to each linear
transformation T there corresponds a matrix AT . Note that the matrix AT we found
was determined by using the standard basis in both vector spaces: later in this chapter
we will generalise this to use other bases.

Example 15.3 Let T : R3 → R3 be the linear transformation given by


   
    ( x )   (  x + y + z  )
T   ( y ) = (    x − y    ) .
    ( z )   ( x + 2y − 3z )

In particular, if u = (1, 2, 3)T , then T (u) = (6, −1, −4)T .
To find the matrix of this linear transformation we need the images of the standard
basis vectors. We have that
     
T (e1 ) = (1, 1, 1)T ,   T (e2 ) = (1, −1, 2)T ,   T (e3 ) = (1, 0, −3)T .

The matrix representing T is AT = (T (e1 ) T (e2 ) T (e3 )), which is


 
     ( 1   1   1 )
AT = ( 1  −1   0 ) .
     ( 1   2  −3 )

Notice that the entries of the matrix AT are just the coefficients of x, y, z in the
definition of T .
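This recipe is easy to automate. Here is a short Python/NumPy sketch (an optional
illustration of ours, not the course's method): it applies T to each standard basis
vector, stacks the images as columns, and checks that the resulting matrix reproduces
T (u).

import numpy as np

def T(v):
    # the linear transformation of Example 15.3
    x, y, z = v
    return np.array([x + y + z, x - y, x + 2*y - 3*z])

# the columns of A_T are the images T(e1), T(e2), T(e3)
AT = np.column_stack([T(e) for e in np.eye(3)])
u = np.array([1, 2, 3])
print(AT)            # [[1, 1, 1], [1, -1, 0], [1, 2, -3]]
print(T(u), AT @ u)  # both give (6, -1, -4)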

15.3.1 Rotation in R2
We find the matrix A that represents the linear transformation T : R2 → R2 which is
rotation anticlockwise by an angle θ about the origin. Let the images of the standard
basis vectors e1 and e2 be the vectors,
   
T (e1 ) = (a, c)T ,    T (e2 ) = (b, d)T ,

so that
     ( a  b )
AT = ( c  d ) .

We need to determine the coordinates a, c, b, d. It is helpful to draw a diagram of R2


such as the one shown below, with the images T (e1 ) and T (e2 ) after rotation
anticlockwise by an angle θ, 0 < θ < π2 .


[Figure: the standard basis vectors e1 and e2 of R2 rotated anticlockwise by an angle θ;
the images are T (e1 ) = (a, c) and T (e2 ) = (b, d), each of length 1.]

   
The vectors T (e1 ) = (a, c)T and T (e2 ) = (b, d)T are orthogonal and each has length one
since they are the rotated standard basis vectors. Drop a perpendicular from the point
(a, c) to the x-axis, forming a right triangle with angle θ at the origin. Since the
x-coordinate of the rotated vector is a and the y-coordinate is c, the side opposite the
angle θ has length c and the side adjacent to the angle θ has length a. The hypotenuse
of this triangle (which is the rotated unit vector e1 ) has length equal to one. We
therefore have a = cos θ and c = sin θ. Similarly, drop the perpendicular from the point
(b, d) to the x-axis and observe that the angle opposite the x-axis is equal to θ. Again,
basic trigonometry tells us that the x-coordinate is b = − sin θ (it has length sin θ and is
in the negative x-direction), and the height is d = cos θ. Therefore,
   
    ( a  b )   ( cos θ  − sin θ )
A = ( c  d ) = ( sin θ    cos θ )

is the matrix of rotation anticlockwise by an angle θ. Although we have shown this
using an angle 0 < θ < π/2, the argument can be extended to any angle θ.
In particular, if θ = π/4, then rotation anticlockwise by π/4 radians is given by the
matrix

    ( cos π/4  − sin π/4 )   ( 1/√2  −1/√2 )
B = ( sin π/4    cos π/4 ) = ( 1/√2   1/√2 ) .
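A small NumPy sketch (again only an optional checking aid) confirms the two facts
used in this derivation: the columns of the rotation matrix are the images of the
standard basis vectors, and they remain orthonormal.

import numpy as np

def rotation(theta):
    # matrix of anticlockwise rotation by theta about the origin
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

B = rotation(np.pi / 4)
print(B @ np.array([1, 0]))             # T(e1) = (1/sqrt(2), 1/sqrt(2))
print(B @ np.array([0, 1]))             # T(e2) = (-1/sqrt(2), 1/sqrt(2))
print(np.allclose(B.T @ B, np.eye(2)))  # True: the columns are orthonormal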

Activity 15.3 Confirm this by sketching the vectors e1 and e2 and the image
vectors

T (e1 ) = (1/√2, 1/√2)T    and    T (e2 ) = (−1/√2, 1/√2)T .
What is the matrix of the linear transformation which is a rotation anticlockwise by
π radians? What is the matrix of the linear transformation which is a reflection in
the y-axis? Think about what each of these two transformations does to the
standard basis vectors e1 and e2 .

15.4 Linear transformations of any vector spaces


In this section we explore a few more ideas on linear transformations, in the general
setting.


15.4.1 Identity and zero linear transformations


If V is a vector space, we can define a linear transformation T : V → V by T (v) = v,
called the identity linear transformation. If V = Rn , the matrix of this linear
transformation is I, the n × n identity matrix.
There is also a linear transformation T : V → W defined by T (v) = 0. If V = Rn and
W = Rm , the matrix of this linear transformation is an m × n matrix consisting entirely
of zeros.

15.4.2 Composition and combinations of linear transformations


The composition of linear transformations is again a linear transformation. If
T : V → W and S : W → U , then ST is the linear transformation given by

ST (v) = S(T (v)) = S(w) = u,


where w = T (v). Note that ST means do T and then do S: first T : V → W , then
S : W → U . (For ST , work from the inside out.)
In terms of matrices,

ST (v) = S(T (v)) = S(AT v) = AS AT v.

That is, AST = AS AT . The matrix of the composition is obtained by matrix


multiplication of the matrices of the linear transformations. The order is important.
Composition of linear transformations, like multiplication of matrices, is not
commutative.
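The rule AST = AS AT can also be seen numerically. As an illustrative sketch (the
example is ours): composing rotation by an angle φ with rotation by an angle θ should
be rotation by θ + φ, and the matrix product gives exactly that. (Rotations of the
plane happen to commute; for general linear transformations the order of the product
matters.)

import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta, phi = 0.7, 0.5
A_S, A_T = rotation(theta), rotation(phi)  # ST means: do T, then S
print(np.allclose(A_S @ A_T, rotation(theta + phi)))   # True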
A linear combination of linear transformations is again a linear transformation. If
S, T : V −→ W are linear transformations between the same vector spaces, then S + T
and αS, α ∈ R, are linear transformations, and therefore, so is αS + βT for any
α, β ∈ R.

15.4.3 Inverse linear transformations


If V and W are finite dimensional vector spaces of the same dimension, then the inverse
of a linear transformation T : V → W is the linear transformation T −1 : W → V such
that
T −1 (T (v)) = v.
If T −1 exists, then its matrix satisfies T −1 (T (v)) = AT −1 AT v = Iv. That is, T −1 exists
if and only if (AT )−1 exists, and (AT )−1 = AT −1 .

Example 15.4 In R2 , the inverse of rotation anticlockwise by an angle θ is rotation


clockwise by the same angle. Thinking of clockwise rotation by θ as anticlockwise
rotation by an angle −θ, the matrix of rotation clockwise by θ is given by,
   
        ( cos(−θ)  − sin(−θ) )   (  cos θ   sin θ )
AT −1 = ( sin(−θ)    cos(−θ) ) = ( − sin θ  cos θ ) .


This is easily checked:

           (  cos θ   sin θ ) ( cos θ  − sin θ )   ( 1  0 )
AT −1 AT = ( − sin θ  cos θ ) ( sin θ    cos θ ) = ( 0  1 ) .

Activity 15.4 Check this by multiplying the matrices. (You should note that
sin2 θ + cos2 θ = 1: see the subject guide for MT1174 Calculus.)

Example 15.5 Is there an inverse to the first example we considered,


   
    ( x )   (  x + y + z  )
T   ( y ) = (    x − y    ) ?
    ( z )   ( x + 2y − 3z )

We found
     ( 1   1   1 )
AT = ( 1  −1   0 ) .
     ( 1   2  −3 )

Since |AT | = 9, the matrix is invertible, and T −1 is given by the matrix

             ( 3   5   1 )
AT−1 = (1/9) ( 3  −4   1 ) .
             ( 3  −1  −2 )

That is,
      ( u )   ( (1/3)u + (5/9)v + (1/9)w )
T −1  ( v ) = ( (1/3)u − (4/9)v + (1/9)w ) .
      ( w )   ( (1/3)u − (1/9)v − (2/9)w )

Activity 15.5 Check that T −1 T = I.
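If you would like a numerical version of this check as well (a sketch only; the activity
is intended as a hand calculation):

import numpy as np

AT = np.array([[1.0, 1, 1],
               [1, -1, 0],
               [1, 2, -3]])
AT_inv = np.array([[3.0, 5, 1],
                   [3, -4, 1],
                   [3, -1, -2]]) / 9

print(np.allclose(AT_inv @ AT, np.eye(3)))   # True, so T^(-1) T = identity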

15.5 Linear transformations determined by action on a basis

We have the following result, which shows that if we know what a linear transformation
does to the basis of a vector space, then we know what it does to any vector.
Theorem 15.2 Let V be a finite dimensional vector space and let T be a linear
transformation from V to a vector space W . Then T is completely determined by what

it does to a basis of V .

Read the proof of this theorem in the A-H text, where it is labelled
Theorem 7.20.


15.6 Range and null space


Just as we have the range and null space of a matrix, so we have the range and null
space of a linear transformation, defined as follows.
Definition 15.2 (Range and null space of a linear transformation) Suppose that
T is a linear transformation from vector space V to vector space W . Then the range,
R(T ), of T is
R(T ) = {T (v) : v ∈ V },
and the null space, N (T ), of T is
N (T ) = {v ∈ V : T (v) = 0},
where 0 denotes the zero vector of W .

The null space is also called the kernel, and may be denoted ker(T ) in some texts.
Of course, for any matrix, A, R(TA ) = R(A) and N (TA ) = N (A).

Activity 15.6 Prove this last statement.

The range and null space of a linear transformation T : V → W are subspaces of W and
V , respectively.

Activity 15.7 Prove this!

Example 15.6 We find the null space and range of the linear transformation
S : R2 → R4 given by

S (x, y)T = (x + y, x, x − y, y)T .
The matrix of the linear transformation is
 
     ( 1   1 )
     ( 1   0 )
AS = ( 1  −1 ) .
     ( 0   1 )

Observe that this matrix has rank 2 (by having two linearly independent columns, or
you could alternatively see this by putting it into row echelon form), so that
N (S) = {0}, the subspace of R2 consisting of only the zero vector. This can also be
seen directly from the fact that

( x + y )   ( 0 )
(   x   )   ( 0 )                             ( x )   ( 0 )
( x − y ) = ( 0 )  ⇐⇒  x = 0, y = 0  ⇐⇒  ( y ) = ( 0 ) .
(   y   )   ( 0 )

The range, R(S), is the two-dimensional subspace of R4 with basis given by the
column vectors of AS .


15.7 Rank and nullity


If V and W are both finite dimensional, then so are R(T ) and N (T ). We define the
rank of T , rank(T ) to be dim(R(T )) and the nullity of T , nullity(T ), to be
dim(N (T )). As for matrices, there is a strong link between these two dimensions:
Theorem 15.3 (Dimension theorem, or rank-nullity theorem for linear
transformations)
Suppose that T is a linear transformation from the finite-dimensional vector space V to
the vector space W . Then
rank(T ) + nullity(T ) = dim(V ).

(Note that this result holds even if W is not finite-dimensional.)
Read the proof of this theorem in the A-H text, where it is labelled Theorem
7.25. Note the differences (and similarities) between the statement and proof of this
theorem and the rank-nullity theorem for matrices.
For an m × n matrix A, if T = TA , then T is a linear transformation from V = Rn to
W = Rm , and rank(T ) = rank(A), nullity(T ) = nullity(A), so this theorem states the
earlier result that
rank(A) + nullity(A) = n.

Example 15.7 Is it possible to construct a linear transformation T : R3 → R3 with


   
 1 
N (T ) = t 2   : t∈R , R(T ) = xy-plane?
3
 

A linear transformation T : R3 → R3 must satisfy the dimension theorem with n = 3:


nullity(T ) + rank(T ) = 3.
Since the dimension of the null space of T is 1 and the dimension of R(T ) is 2, the
rank-nullity theorem is satisfied, so at this stage, we certainly can’t rule out the
possibility that such a linear transformation exists. (Of course, if it was not satisfied,
we’d know straight away that we couldn’t have a linear transformation of the type
suggested.)
To find a linear transformation T with N (T ) and R(T ) as above, we construct a
matrix AT , which must be 3 × 3 since T : R3 → R3 . Note that if R(AT ) = R(T ) is
the xy-plane, then the column vectors of AT must be linearly dependent and
include a basis for this plane. You can take any two linearly independent vectors in
the xy-plane to be the first two columns of the matrix, and the third column must
be a linear combination of the first two. The linear dependency condition they must
satisfy is given by the basis of the null space.
For example, we take the first two column vectors to be the standard basis vectors,
c1 = e1 , and c2 = e2 . Then using the null space basis vector v, AT v = 0:
 
AT v = ( c1 c2 c3 ) (1, 2, 3)T = 1c1 + 2c2 + 3c3 = 0.


Therefore, we must have c3 = −(1/3)c1 − (2/3)c2 , so that one possible linear
transformation satisfying these conditions is given by the matrix

     ( 1  0  −1/3 )
AT = ( 0  1  −2/3 ) .
     ( 0  0   0   )
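A quick NumPy sanity check of this construction (an optional sketch of ours): the
matrix has rank 2, so nullity 1, and it kills the given null space vector.

import numpy as np

AT = np.array([[1.0, 0, -1/3],
               [0, 1, -2/3],
               [0, 0, 0]])
v = np.array([1, 2, 3])

print(AT @ v)                      # the zero vector: v spans the null space
print(np.linalg.matrix_rank(AT))   # 2, so rank + nullity = 2 + 1 = 3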

Overview
We have defined linear transformations and seen some examples. An important theme
has been the connection between linear transformations and matrices. This gives us a
very useful way to think about matrices: not simply as arrays of numbers on which one
can perform algebra, but as representations of linear transformations. That conception
of matrices will enable us to understand better the key topic of change of basis (in the
next chapter) and diagonalisation (following that).

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

explain what is meant by a linear transformation and be able to prove a given
mapping is linear
explain what is meant by the range and null space, and rank and nullity of a linear
transformation
know the dimension theorem (the rank-nullity theorem) for linear transformations
and be able to apply it
comprehend the two-way relationship between matrices and linear transformations.

Test your knowledge and understanding


Work Exercises 7.1, 7.2 and 7.6 in the text A-H. The solutions to all exercises in the
text can be found at the end of the textbook.
Work Problems 7.2 and 7.6 in the text A-H. You will find the solutions on the VLE.


Comments on selected activities


Feedback to activity 15.1
To show that the condition is equivalent to the other two, we need to prove two things:
first, that the two conditions imply this one and, second, that this single condition
implies the other two. So suppose the two conditions of the definition hold and suppose
that u, v ∈ V and α, β ∈ R. Then we have T (αu) = αT (u) and T (βv) = βT (v) (by
property 2) and, by property 1, we then have

T (αu + βv) = T (αu) + T (βv) = αT (u) + βT (v),

as required. On the other hand, suppose that for all u, v ∈ V and α, β ∈ R, we have
T (αu + βv) = αT (u) + βT (v). Then property 1 follows on taking α = β = 1, and
property 2 follows on taking β = 0.
Feedback to activity 15.2
T (αu) = pαu , and T (u) = pu , so we need to check that pαu = αpu . Now, for all x,

pαu (x) = pαu1 ,αu2 ,...,αun (x)


= (αu1 )x + (αu2 )x2 + · · · + (αun )xn
= α(u1 x + u2 x2 + · · · + un xn )
= αpu (x),

as required.
Feedback to activity 15.3
Rotation by π radians is given by matrix A, whereas reflection in the y-axis is given by
matrix B:
    ( −1   0 )       ( −1  0 )
A = (  0  −1 ) , B = (  0  1 ) .

Feedback to activity 15.6


This is just ‘definition-chasing’. By definition, TA is the mapping given by TA (x) = Ax
and
R(TA ) = {TA (x) : x ∈ V } = {Ax : x ∈ V } = R(A),
N (TA ) = {x ∈ V : TA (x) = 0} = {x ∈ V : Ax = 0} = N (A).

Feedback to activity 15.7


This is very similar to the proofs in the previous chapter that, for a matrix A, R(A) and
N (A) are subspaces.
First, we show R(T ) is a subspace of W . Note that it is non-empty since T (0) = 0 and
hence it contains 0. (The fact that T (0) = 0 can be seen in a number of ways. For
instance, take any x ∈ V . Then T (0) = T (0x) = 0T (x) = 0.) We need to show that if
u, v ∈ R(T ) then u + v ∈ R(T ) and, for any α ∈ R, αv ∈ R(T ). Suppose u, v ∈ R(T ).
Then for some y1 , y2 ∈ V , u = T (y1 ), v = T (y2 ). Now,

u + v = T (y1 ) + T (y2 ) = T (y1 + y2 ),

and so u + v ∈ R(T ). Next,

αv = α(T (y2 )) = T (αy2 ),


so αv ∈ R(T ).
Now consider N (T ). It is non-empty because the fact that T (0) = 0 shows 0 ∈ N (T ).
Suppose u, v ∈ N (T ) and α ∈ R. Then to show u + v ∈ N (T ) and αu ∈ N (T ), we must
show that T (u + v) = 0 and T (αu) = 0. We have

T (u + v) = T (u) + T (v) = 0 + 0 = 0

and
T (αu) = α(T (u)) = α0 = 0,
so we have shown what we needed.


Chapter 16
Coordinates and change of basis

Introduction
This chapter looks at how vectors can be represented in terms of their coordinates with
respect to a basis, and explores how these representations change if the basis is changed.
Equally, linear transformations can be represented as matrices with respect to different
bases, and this chapter explains how these representations change if the bases are
changed. It is quite technical material, but vital for a proper understanding of the next
topic, diagonalisation.

Aims
The aims of this chapter are to:

define coordinates with respect to a basis

explore how coordinates change with a change of basis

represent linear transformations with respect to particular bases as matrices, and


understand how the matrices change if the bases are changed

introduce the concept of similarity of two matrices.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 7.3
and 7.4.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.


Synopsis
We define the coordinates of a vector with respect to a given basis. Two different bases
will give different coordinates for a given vector and we explain how these two different
coordinates are connected, through a transition matrix. We have seen that linear
transformations can be represented by matrices, but we then look at how we may do
this with respect to any bases, not just the standard ones (which we will see is what we
have done up to now). The way in which these matrices change when we change the
bases in question brings us to the concept of similarity of two matrices.

16.1 Coordinates and coordinate change


Suppose that the vectors v1 , v2 , . . . , vn form a basis B for Rn . Then, as we have seen,
any x ∈ Rn can be written in exactly one way as a linear combination,
x = α1 v1 + α2 v2 + · · · + αn vn ,
of the vectors in the basis. The vector
[x]B = (α1 , α2 , . . . , αn )TB
is called the coordinate vector of x with respect to the basis B = {v1 , v2 , . . . , vn }.
One very straightforward observation is that the coordinate vector of any x ∈ Rn with
respect to the standard basis is just x itself. This is because if x = (x1 , x2 , . . . , xn )T ,
x = x1 e1 + x2 e2 + · · · + xn en .
What is less immediately obvious is how to find the coordinates of a vector x with
respect to a basis other than the standard one.

Example 16.1 Suppose that we let B be the following basis of R3 :


     
 1 2 3 
B =  2  ,  −1  ,  2  .
3 3 −1
 

If x is the vector (5, 7, −2)T , then the coordinate vector of x with respect to B is
[x]B = (1, −1, 2)TB ,

because
      ( 1 )        (  2 )     (  3 )
x = 1 ( 2 ) + (−1) ( −1 ) + 2 (  2 ) .
      ( 3 )        (  3 )     ( −1 )


To find the coordinates of a vector with respect to a basis {v1 , v2 , . . . , vn }, we need to


solve the system of linear equations
a1 v1 + a2 v2 + · · · + an vn = x,
which in matrix form is
(v1 v2 . . . vn )a = x
with a = (a1 , a2 , . . . , an )T . In other words, if we let PB be the matrix whose columns are
the basis vectors (in order),
PB = (v1 v2 . . . vn ),
then for any x ∈ Rn ,
x = PB [x]B .
The matrix PB is invertible (because its columns are linearly independent, and hence its
rank is n). So we can also write
[x]B = PB−1 x.
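Computationally, finding [x]B just means solving the square system PB a = x. A
NumPy sketch (optional), using the basis of Example 16.1 above:

import numpy as np

PB = np.array([[1.0, 2, 3],
               [2, -1, 2],
               [3, 3, -1]])       # columns: the basis vectors of Example 16.1
x = np.array([5, 7, -2])

coords = np.linalg.solve(PB, x)   # solves PB @ a = x, i.e. a = PB^(-1) x
print(coords)                     # [ 1. -1.  2.]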

Definition 16.1 (Transition matrix) If B = {v1 , v2 , . . . , vn } is a basis of Rn , the


matrix
PB = (v1 v2 . . . vn )
whose columns are the B basis vectors is called the transition matrix from B
coordinates to standard coordinates. Then the matrix PB−1 is the transition matrix from
standard coordinates to coordinates in the basis B.

Note that, considered as the matrix of a linear transformation, P (x) = PB x, the


transition matrix from B coordinates to standard coordinates, actually maps the
standard basis vectors, ei , to the new basis vectors, vi . That is, P (ei ) = vi .

Example 16.2 Suppose we wish to change basis in R2 by a rotation of the axes π/4
radians anticlockwise. What are the coordinates of a vector with respect to this new
basis, B = {v1 , v2 }?
The matrix of the linear transformation which performs this rotation is given by

     ( cos π/4  − sin π/4 )   ( 1/√2  −1/√2 )
AT = ( sin π/4    cos π/4 ) = ( 1/√2   1/√2 ) = PB ,

where the column vectors of the matrix are the new basis vectors, v1 , v2 , so the
matrix is also the transition matrix from B coordinates to standard coordinates;
that is, we have v = PB [v]B . Then the coordinates of a vector with respect to the
new basis are given by [v]B = PB−1 v. The inverse of rotation anticlockwise is
rotation clockwise, so we have

        ( cos(−π/4)  − sin(−π/4) )   (  cos π/4   sin π/4 )   (  1/√2   1/√2 )
PB−1 =  ( sin(−π/4)    cos(−π/4) ) = ( − sin π/4  cos π/4 ) = ( −1/√2   1/√2 ) .

From a different viewpoint, consider the vector

    ( 1 )      ( 1/√2 )
x = ( 1 ) = √2 ( 1/√2 ) = √2 v1 .


What are its coordinates in the new basis B? We can find these directly since we
have x = √2 v1 + 0v2 , and in B coordinates

[v1 ]B = (1, 0)TB    and    [v2 ]B = (0, 1)TB ,

so that
[x]B = √2 (1, 0)TB = (√2, 0)TB .

Note that
              ( 1/√2  −1/√2 ) ( √2 )     ( 1 )
x = PB [x]B = ( 1/√2   1/√2 ) (  0 )B  = ( 1 ) .

Given a basis B of Rn with transition matrix PB , and another basis B′ with transition
matrix PB′ , how do we change from coordinates in the basis B to coordinates in the
basis B′?
The answer is quite simple. First we change from B coordinates to standard coordinates
using v = PB [v]B and then change from standard coordinates to B′ coordinates using
[v]B′ = PB′−1 v. That is,

[v]B′ = PB′−1 PB [v]B .

The matrix M = PB′−1 PB is the transition matrix from B coordinates to B′ coordinates.
In practice, this may be the easiest way to obtain the matrix M , as the product of the
two transition matrices, M = PB′−1 PB . But let's look more closely at the matrix M . If
the basis B is the set of vectors B = {v1 , v2 , . . . , vn }, then these are the columns of the
transition matrix, PB = (v1 v2 . . . vn ). Looking closely at the columns of the product
matrix,

M = PB′−1 PB = PB′−1 (v1 v2 . . . vn ) = (PB′−1 v1 PB′−1 v2 . . . PB′−1 vn ),

that is, each column of the matrix M is obtained by multiplying the matrix PB′−1 by the
corresponding column of PB . But PB′−1 vi is just the B′ coordinate vector of the vector vi ,
so the matrix M is given by

M = ([v1 ]B′ [v2 ]B′ . . . [vn ]B′ ).

We have, therefore, established the following result.


Theorem 16.1 If B and B′ are two bases of Rn , with B = {v1 , v2 , . . . , vn }, then the
transition matrix from B coordinates to B′ coordinates is given by

M = ([v1 ]B′ [v2 ]B′ . . . [vn ]B′ ).

Read Section 7.3 in the text A-H and work through the activity labelled
Activity 7.33 there.
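Here is a minimal sketch of the whole procedure, with two bases of R2 chosen purely
for illustration (they are our own, not from the text):

import numpy as np

PB  = np.array([[1.0, 1],
                [0, 1]])         # columns: an illustrative basis B
PBp = np.array([[1.0, 1],
                [1, -1]])        # columns: an illustrative basis B'

M = np.linalg.inv(PBp) @ PB      # transition matrix from B to B' coordinates

xB = np.array([2.0, 3])          # B coordinates of some vector
x  = PB @ xB                     # the same vector in standard coordinates
print(np.allclose(PBp @ (M @ xB), x))   # True: M @ xB are its B' coordinates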


16.2 Change of basis and similarity

16.2.1 Matrices of linear transformations


We have already seen that if T is a linear transformation from Rn to Rm , then there is a
corresponding matrix AT such that T (x) = AT x for all x. The matrix AT is given by

(T (e1 ) T (e2 ) . . . T (en )).

This matrix is obtained using the standard basis in both Rn and in Rm .


Now suppose that B is a basis of Rn and B′ a basis of Rm , and suppose we want to
know the coordinates [T (x)]B′ of T (x) with respect to B′, given the coordinates [x]B of
x with respect to B. Is there a matrix M such that

[T (x)]B′ = M [x]B

for all x? Indeed there is, as the following result shows.


Theorem 16.2 Suppose that B = {v1 , . . . , vn } and B′ = {v1′ , . . . , vm′ } are (ordered)
bases of Rn and Rm and that T : Rn → Rm is a linear transformation. Let M = A[B,B′]
be the m × n matrix with ith column equal to [T (vi )]B′ , the coordinate vector of T (vi )
with respect to the basis B′. Then for all x, [T (x)]B′ = M [x]B .

The matrix A[B,B′] is called the matrix representing T with respect to bases B
and B′. (This could also be denoted AT [B,B′] to show the correspondence with the
linear transformation T .)
Read the proof of this theorem in the text A-H where it is labelled Theorem 7.36.
The theorem states that A[B,B′] = ([T (v1 )]B′ [T (v2 )]B′ · · · [T (vn )]B′ ). According to the
proof of this theorem, this matrix is obtainable as

A[B,B′] = PB′−1 AT PB

where PB and PB′ are, respectively, the transition matrix from B coordinates to
standard coordinates in Rn and the transition matrix from B′ coordinates to standard
coordinates in Rm , and AT is the matrix of the linear transformation in standard
coordinates. Make sure you understand why this works. In fact, this is usually the
easiest way to calculate the matrix M = A[B,B′] , as the matrix product M = PB′−1 AT PB .
Thus, if we change the basis from the standard bases of Rn and Rm , the matrix
representation of the linear transformation changes.

16.2.2 Similarity
A particular case of Theorem 16.2 is so important it is worth stating separately. It
corresponds to the case in which m = n and B′ = B.
Theorem 16.3 Suppose that T : Rn → Rn is a linear transformation and that
B = {x1 , x2 , . . . , xn } is some basis of Rn . Let

P = (x1 x2 . . . xn )


be the matrix whose columns are the vectors of B. Then for all x ∈ Rn ,

[T (x)]B = P −1 AP [x]B ,

where A is the matrix corresponding to T ,

A = (T (e1 ) T (e2 ) . . . T (en )).

In other words,
A[B,B] = P −1 AP.

The relationship between the matrices A[B,B] and A is a central one in the theory of
linear algebra. The matrix A[B,B] performs the same linear transformation as the matrix
A, only A[B,B] describes it in terms of the basis B rather than in standard coordinates.
This likeness of effect inspires the following definition.
Definition 16.2 (Similarity) We say that two square matrices A and M are similar if
there is an invertible (non-singular) matrix P such that M = P −1 AP .

Note that ‘similar’ has a very precise meaning here: it doesn’t mean that the matrices
somehow ‘look like’ each other (as the usual use of the word similar would suggest), but
that they represent the same linear transformation in different bases.
As we shall see in the remaining chapters, this relationship can be used to great

advantage if the new basis B is chosen carefully.
Read Section 7.4 in the text A-H (including Example 7.40 and Example 7.42).

Overview
We defined the coordinates of a vector with respect to a given basis and studied how
these change when the basis changes. We have also investigated the representation of
linear transformations by matrices, with respect to any bases on each of the two vector
spaces (the one it maps from and the one it maps to). The concept of similarity of
matrices then arose naturally, as two matrices are similar if they represent the same
linear transformation of a vector space to itself, but possibly with respect to a different
basis.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

know what is meant by the coordinate vector of a vector with respect to a basis
and be able to determine this
find the matrix representation of a transformation with respect to two given bases
know how to change between different bases of a vector space
know what it means to say that two square matrices are similar.


Test your knowledge and understanding


Work Exercises 7.3–7.5 (these will also review the previous chapter) and Exercises
7.7–7.9 in the text A-H. The solutions to all exercises in the text can be found
at the end of the textbook.
Work Problems 7.10–7.12 in the text A-H. You will find the solutions on the VLE.

Chapter 17
Diagonalisation

Introduction
One of the most useful techniques in applications of matrices and linear algebra is
diagonalisation. This relies on the topic of eigenvalues and eigenvectors, and is related
to change of basis. We will learn how to find eigenvalues and eigenvectors of an n × n
matrix, how to diagonalise a matrix when it is possible to do so, and also how to
recognise when it is not possible. We shall see in the next chapter how useful a
technique diagonalisation is.
All matrices in this chapter are square n × n matrices with real entries, so all vectors
will be in Rn for some n.

Aims
The aims of this chapter are to:

define eigenvalues and eigenvectors and explain how to find these


explain what is meant by diagonalisation of a matrix, and how this can be achieved.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 8,
except Section 8.3.4.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define eigenvalues and eigenvectors of a square matrix and show how to find these,
through the characteristic polynomial of the matrix. We then say what it means to


show that a matrix is diagonalisable, and we relate this to the existence of a basis, the
members of which are eigenvectors of the matrix. We explain this by drawing on the
material of the previous two chapters, through our understanding that a matrix
represents a linear transformation. We explain that diagonalisation is not always
possible but, when it is, we show how to achieve it.

17.1 Eigenvalues and eigenvectors

17.1.1 Definitions
Definition 17.1 Suppose that A is a square matrix. The number λ is said to be an
eigenvalue of A if for some non-zero vector x, Ax = λx. Any non-zero vector x for
which this equation holds is called an eigenvector for eigenvalue λ or an
eigenvector of A corresponding to eigenvalue λ.

17.1.2 Finding eigenvalues and eigenvectors


To determine whether λ is an eigenvalue of A, we need to determine whether there are
any non-zero solutions x to the matrix equation Ax = λx. Note that the matrix
equation Ax = λx is not of the standard form, since the right-hand side is not a fixed
vector b, but depends explicitly on x. However, we can rewrite it in standard form.
Note that λx = λIx, where I is, as usual, the identity matrix. So, the equation is
equivalent to Ax = λIx, or Ax − λIx = 0, which is equivalent to (A − λI)x = 0.
Now, a square linear system Bx = 0 has solutions other than x = 0 precisely when
|B| = 0. Therefore, taking B = A − λI, λ is an eigenvalue if and only if the determinant
of the matrix A − λI is zero. This determinant, p(λ) = |A − λI| is a polynomial of
degree n in the variable λ.
Definition 17.2 (Characteristic polynomial and equation) The polynomial
|A − λI| is known as the characteristic polynomial of A, and the equation
|A − λI| = 0 is called the characteristic equation of A.

To find the eigenvalues, we solve the characteristic equation |A − λI| = 0. Let us


illustrate with a 2 × 2 example.

Example 17.1 Let
    ( 7  −15 )
A = ( 2   −4 ) .
Then
         ( 7  −15 )     ( 1  0 )   ( 7 − λ    −15   )
A − λI = ( 2   −4 ) − λ ( 0  1 ) = (   2     −4 − λ )

and the characteristic polynomial is

           | 7 − λ    −15   |
|A − λI| = |   2     −4 − λ |
         = (7 − λ)(−4 − λ) + 30
         = λ² − 3λ − 28 + 30
         = λ² − 3λ + 2.

So the eigenvalues are the solutions of λ² − 3λ + 2 = 0. To solve this for λ, one could
use either the formula for the solutions to a quadratic equation, or simply observe
that the characteristic polynomial factorises. We have (λ − 1)(λ − 2) = 0 with
solutions λ = 1 and λ = 2. Hence the eigenvalues of A are 1 and 2, and these are the
only eigenvalues of A.

To find an eigenvector for each eigenvalue λ, we have to find a non-trivial solution to


(A − λI)x = 0, meaning a solution other than the zero vector. (We stress the fact that
eigenvectors cannot be the zero vector because this is a mistake many students make.)
This is easy, since for a particular value of λ, all we need to do is solve a simple linear
system. We illustrate by finding the eigenvectors for the matrix of the example just
given.

Example 17.2 We find the eigenvectors of
    ( 7  −15 )
A = ( 2   −4 ) .
We have seen that the eigenvalues are 1 and 2. To find the eigenvectors for
eigenvalue 1 we solve the system (A − I)x = 0. We do this by putting the coefficient
matrix A − I into reduced echelon form:

          ( 6  −15 )               ( 1  −5/2 )
(A − I) = ( 2   −5 ) −→ · · · −→   ( 0    0  ) .

This system has solutions
      ( 5 )
v = t ( 2 ) ,    for any t ∈ R.

There are infinitely many eigenvectors for 1: for each t ≠ 0, v is an eigenvector of A
corresponding to λ = 1. But be careful not to think that you can choose t = 0; for
then v becomes the zero vector, and this is never an eigenvector, simply by
definition. To find the eigenvectors for 2, we solve (A − 2I)x = 0 by reducing the
coefficient matrix:

           ( 5  −15 )               ( 1  −3 )
(A − 2I) = ( 2   −6 ) −→ · · · −→   ( 0   0 ) .

Setting the non-leading variable equal to t, we obtain the solutions
      ( 3 )
v = t ( 1 ) ,    t ∈ R.

Any non-zero scalar multiple of the vector (3, 1)T is an eigenvector of A for
eigenvalue 2.
Note that each system of equations is simple enough to be solved directly. For
example, if x = (x1 , x2 )T , the system (A − 2I)x = 0 consists of the equations

5x1 − 15x2 = 0 ,    2x1 − 6x2 = 0.

Clearly both equations are equivalent to x1 = 3x2 . If we set x2 = t for any real
number t, then we obtain the eigenvectors for λ = 2 as before.
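Hand computations like this can be double-checked with NumPy (a verification sketch
only; the examination expects the method above):

import numpy as np

A = np.array([[7.0, -15],
              [2, -4]])
evals, evecs = np.linalg.eig(A)
print(evals)              # the eigenvalues 2 and 1 (the order may vary)
print(evecs / evecs[0])   # each column rescaled to have first entry 1: the
                          # columns are proportional to (3, 1) and (5, 2)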


So why do we prefer row operations? There are two reasons. The first reason is that the
system of equations may not be as simple as the one just given, particularly for an
n × n matrix where n > 2. The second reason is that putting the matrix A − λI into
echelon form provides a useful check on the eigenvalue. If |A − λI| = 0, the echelon form
of A − λI must have a row of zeros, and the system (A − λI)x = 0 will have a
non-trivial solution. If we have reduced the matrix (A − λ0 I) for some supposed
eigenvalue λ0 and do not obtain a zero row, we know immediately that there is an error,
either in the row reduction or in the choice of λ0 , and we can go back and correct it.

Examples in R3

We now give two examples with 3 × 3 matrices.

Example 17.3 Suppose that

    ( 4  0  4 )
A = ( 0  4  4 ) .
    ( 4  4  8 )

Find the eigenvalues of A and find the corresponding eigenvectors for each
eigenvalue.
To find the eigenvalues we solve |A − λI| = 0. Now,

           | 4 − λ    0      4   |
|A − λI| = |   0    4 − λ    4   |
           |   4      4    8 − λ |

                   | 4 − λ    4   |     | 0  4 − λ |
         = (4 − λ) |   4    8 − λ | + 4 | 4    4   |

         = (4 − λ)((4 − λ)(8 − λ) − 16) + 4(−4(4 − λ))
         = (4 − λ)((4 − λ)(8 − λ) − 16) − 16(4 − λ).

We notice that each of the two terms in this expression has 4 − λ as a factor, so
instead of expanding everything, we take 4 − λ out as a common factor, obtaining

|A − λI| = (4 − λ)((4 − λ)(8 − λ) − 16 − 16)
         = (4 − λ)(32 − 12λ + λ² − 32)
         = (4 − λ)(λ² − 12λ)
         = (4 − λ)λ(λ − 12).

It follows that the eigenvalues are 4, 0, 12. (The characteristic polynomial will not
always factorise so easily. Here it was simple because of the common factor 4 − λ.
The next example is more difficult.)
To find an eigenvector for 4, we have to solve the equation (A − 4I)x = 0 for
x = (x1 , x2 , x3 )T . Using row operations, we have

( 0  0  4 )                ( 1  1  0 )
( 0  0  4 ) −→ . . . −→    ( 0  0  1 ) .
( 4  4  4 )                ( 0  0  0 )


Thus x3 = 0 and setting the free variable x2 = t, the solutions are

      ( −1 )
x = t (  1 ) ,    t ∈ R.
      (  0 )

So the eigenvectors for λ = 4 are the non-zero multiples of

     ( −1 )
v1 = (  1 ) .
     (  0 )

Activity 17.1 Determine the eigenvectors for 0 and 12. Check your answers: verify
that Av = λv for each eigenvalue and one corresponding eigenvector.

Example 17.4 Let
    ( −3  −1  −2 )
A = (  1  −1   1 ) .
    (  1   1   0 )
Given that −1 is an eigenvalue of A, find all the eigenvalues of A.
We calculate the characteristic polynomial of A:

           | −3 − λ    −1    −2 |
|A − λI| = |    1    −1 − λ   1 |
           |    1       1    −λ |

                    | −1 − λ   1 |          | 1   1 |     | 1  −1 − λ |
         = (−3 − λ) |    1    −λ | − (−1)   | 1  −λ | − 2 | 1     1   |

         = (−3 − λ)(λ² + λ − 1) + (−λ − 1) − 2(2 + λ)
         = −λ³ − 4λ² − 5λ − 2
         = −(λ³ + 4λ² + 5λ + 2).

Now, the fact that −1 is an eigenvalue means that −1 is a solution of the equation
|A − λI| = 0. (You should immediately check the characteristic polynomial we
obtained by verifying that λ = −1 does, indeed, satisfy |A − λI| = 0.) This means
that λ − (−1), that is, λ + 1, is a factor of the characteristic polynomial |A − λI|. So
this characteristic polynomial can be written in the form

−(λ + 1)(aλ² + bλ + c).

Clearly we must have a = 1 and c = 2 to obtain the correct λ³ term and the correct
constant. Using this, and comparing the coefficients of either λ² or λ with the cubic
polynomial, we find b = 3. In other words, the characteristic polynomial is

−(λ³ + 4λ² + 5λ + 2) = −(λ + 1)(λ² + 3λ + 2) = −(λ + 1)²(λ + 2).


Activity 17.2 Perform the calculations to check that b = 3 and that the
characteristic polynomial factorises as stated.

We have |A − λI| = −(λ + 1)²(λ + 2). The eigenvalues are the solutions to |A − λI| = 0,
so they are λ = −1 and λ = −2.
Note that in this case, there are only two distinct eigenvalues. We say that the
eigenvalue −1 has occurred twice, or that λ = −1 is an eigenvalue of multiplicity 2. We
will find the eigenvectors when we look at this example again in section 17.3.
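The factorisation can be sanity-checked numerically too, for instance by asking NumPy
for the roots of λ³ + 4λ² + 5λ + 2 (an optional checking aid):

import numpy as np

# coefficients of lambda^3 + 4*lambda^2 + 5*lambda + 2
print(np.roots([1, 4, 5, 2]))   # [-2., -1., -1.]: -1 twice and -2 once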

17.1.3 Eigenspaces
If A is an n × n matrix and λ is an eigenvalue of A, then the set of eigenvectors
corresponding to the eigenvalue λ together with the zero vector, 0, is a subspace of Rn .
Why?
We have already seen that the null space of any m × n matrix is a subspace of Rn . The
null space of the n × n matrix A − λI, consists of all solutions to the matrix equation
(A − λI)x = 0, which is precisely the set of all eigenvectors corresponding to λ together
with the vector 0.
Definition 17.3 (Eigenspace) If A is an n × n matrix and λ is an eigenvalue of A,
then the eigenspace of the eigenvalue λ is the subspace N (A − λI) of Rn .

17.1.4 Eigenvalues and the determinant


There is a straightforward relationship between the eigenvalues of a matrix A and its
determinant. If λ1 , λ2 , . . . , λn are the eigenvalues of A, with multiple roots of the
characteristic polynomial listed each time they occur, then the characteristic polynomial
factors as
|A − λI| = (−1)ⁿ(λ − λ1 )(λ − λ2 ) · · · (λ − λn ).

Letting λ = 0 on both sides of this equation, we have

|A| = (−1)ⁿ(−1)ⁿ λ1 λ2 · · · λn = λ1 λ2 · · · λn .

We state this result as the following theorem.


Theorem 17.1 The determinant of an n × n matrix A is equal to the product of its
eigenvalues.

Read a more detailed discussion of this theorem at the beginning of Section 8.1.4
of the text A-H (where the theorem is labelled Theorem 8.11). Section 8.1.4 also
discusses the trace of a matrix, which is entirely optional for this course.
17.2 Diagonalisation of a square matrix
Recall that square matrices A and M are similar if there is an invertible matrix P such
that P −1 AP = M . We met this idea earlier when we looked at how a matrix


representing a linear transformation changes when the basis is changed. We now begin
to explore why this is such an important and useful concept.
Definition 17.4 The matrix A is diagonalisable if it is similar to a diagonal matrix;
in other words, if there is a diagonal matrix D and an invertible matrix P such that
P −1 AP = D.

Suppose that the n × n matrix A is diagonalisable, and that P −1 AP = D, where D is a


diagonal matrix

                                  ( λ1  0   · · ·  0  )
D = diag(λ1 , λ2 , . . . , λn ) = ( 0   λ2  · · ·  0  ) .
                                  ( ⋮    ⋮    ⋱    ⋮  )
                                  ( 0   0   · · ·  λn )

(Note the useful diag(. . .) notation for describing the diagonal matrix D.) Then we have
AP = P D. If the columns of P are the vectors v1 , v2 , . . . , vn , then

AP = A(v1 . . . vn ) = (Av1 . . . Avn ),

and

P D = (v1 . . . vn ) diag(λ1 , . . . , λn ) = (λ1 v1 . . . λn vn ).

So this means that

Av1 = λ1 v1 , Av2 = λ2 v2 , ..., Avn = λn vn .

The fact that P −1 exists means that none of the vectors vi is the zero vector. So this
means that (for i = 1, 2, . . . , n) λi is an eigenvalue of A and vi is a corresponding
eigenvector. Since P has an inverse, these eigenvectors are linearly independent.
Therefore, A has n linearly independent eigenvectors. Conversely, if A has n linearly
independent eigenvectors, then the matrix P whose columns are these eigenvectors will
be invertible, and we will have P −1 AP = D where D is a diagonal matrix with entries
equal to the eigenvalues of A. We have therefore established the following result.
Theorem 17.2 An n × n matrix A is diagonalisable if and only if it has n linearly
independent eigenvectors.

Since n linearly independent vectors in Rn form a basis of Rn , another way to state this
theorem is:
Theorem 17.3 An n × n matrix A is diagonalisable if and only if there is a basis of
Rn consisting of eigenvectors of A.
Suppose that this is the case, and let v1 , . . . , vn be n linearly independent eigenvectors,
where vi is an eigenvector for eigenvalue λi . Then the vectors form a basis of Rn , and
the matrix P = (v1 . . . vn ) is such that P −1 exists, and P −1 AP = D where
D = diag(λ1 , . . . , λn ).


This gives us a more sophisticated way to think about diagonalisation in terms of


change of basis and matrix representations of linear transformations. Suppose that
T = TA is the linear transformation corresponding to A, so that T (x) = Ax for all x.
Suppose that A has a set of n linearly independent eigenvectors B = {v1 , v2 , . . . , vn },
corresponding (respectively) to the eigenvalues λ1 , . . . , λn . Then B is a basis of Rn .
By Theorem 16.3 the matrix representing the linear transformation T with respect to
the basis B is A[B,B] = P −1 AT P , where the columns of P are the basis vectors,

P = (v1 . . . vn ) .

P is the matrix whose columns are the basis of eigenvectors of A and AT is the matrix
representing T , which in this case is simply A itself, so that

P −1 AP = A[B,B] = D.

In other words, the matrices A and D are similar. They represent the same linear
transformation, but A does so with respect to the standard basis and D represents T in
the basis of eigenvectors of A.
What does this tell us about the linear transformation T = TA ? If x ∈ Rn is any vector,
then its image under the linear transformation T is particularly easy to calculate in B
coordinates, where B is the basis of eigenvectors of A. That is, suppose the B
coordinates of x are [x]B = [b1 , b2 , . . . , bn ]B , then since [T (x)]B = A[B,B] [x]B = D[x]B , we
have

           ( λ1  0   · · ·  0  ) ( b1 )     ( λ1 b1 )
[T (x)]B = ( 0   λ2  · · ·  0  ) ( b2 )   = ( λ2 b2 )
           ( ⋮    ⋮    ⋱    ⋮  ) ( ⋮  )     (   ⋮   )
           ( 0   0   · · ·  λn ) ( bn )B    ( λn bn )B .
You simply multiply each coordinate by the corresponding eigenvalue.
Geometrically, we can describe the linear transformation A as a stretch in the direction
of the eigenvector vi by a factor λi (in the same direction if λi > 0 and in the opposite
direction if λi < 0). Indeed this can be seen directly. Since Avi = λi vi , each vector on
the line tvi , t ∈ R, is mapped to the scalar multiple λi tvi by the linear transformation
A. If λi = 0, the line tvi is mapped to 0.

Example 17.5 Consider the first 3 × 3 matrix for which we found the eigenvalues
and eigenvectors in section 17.1.2; we will diagonalise the matrix

    ( 4  0  4 )
A = ( 0  4  4 ) .
    ( 4  4  8 )

We have seen that it has three distinct eigenvalues 0, 4, 12. From the eigenvectors
we found we take one eigenvector corresponding to each of the eigenvalues
λ1 = 4, λ2 = 0, λ3 = 12, in that order:

     ( −1 )        ( −1 )        ( 1 )
v1 = (  1 ) , v2 = ( −1 ) , v3 = ( 1 ) .
     (  0 )        (  1 )        ( 2 )


We now form the matrix P whose columns are these eigenvectors:

    ( −1  −1  1 )
P = (  1  −1  1 ) .
    (  0   1  2 )

Then we know that D will be the matrix

    ( 4  0   0 )
D = ( 0  0   0 ) .
    ( 0  0  12 )

You can choose any order for listing the eigenvectors as the columns of the matrix
P , as long as you write the corresponding eigenvalues in the corresponding columns
of D, that is, as long as the column orders in P and D match. (If, for example, we
had chosen P̂ = (v2 v1 v3 ), then D̂ = diag(0, 4, 12).)
As soon as you have written down the matrices P and D, you should check that
your eigenvectors are correct. That is, check that

AP = (Av1 Av2 Av3 ) = (λ1 v1 λ2 v2 λ3 v3 ) = P D.

Activity 17.3 Carry out this calculation to check that the eigenvectors are correct,
that is, check that the columns of P are eigenvectors of A corresponding to the
eigenvalues 4, 0, 12.

Then, according to the theory, if P has an inverse, that is, if the eigenvectors are
linearly independent, then P −1 AP = D = diag(4, 0, 12).

Activity 17.4 Check that P is invertible. Then find P −1 (the inverse may be
calculated using either elementary row operations or the cofactor method) and verify
that P −1 AP = D.

Note how important it is to have checked P first. Calculating the inverse of an incorrect
matrix P would have been a huge wasted effort.
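These checks are easy to mirror in NumPy (a sketch; the hand computation is still
what the course expects):

import numpy as np

A = np.array([[4.0, 0, 4],
              [0, 4, 4],
              [4, 4, 8]])
P = np.array([[-1.0, -1, 1],
              [1, -1, 1],
              [0, 1, 2]])
D = np.diag([4.0, 0, 12])

print(np.allclose(A @ P, P @ D))                  # AP = PD: eigenpairs correct
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # P^(-1) A P = D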

Activity 17.5 Geometrically, how would you describe the linear transformation
TA (x) = Ax for this example?

17.3 When is diagonalisation possible?


Not all n × n matrices have n linearly independent eigenvectors, as the following
example shows.
Example 17.6 The 2 × 2 matrix
    (  4  1 )
A = ( −1  2 )

has characteristic polynomial λ² − 6λ + 9 = (λ − 3)², so there is only one eigenvalue,
λ = 3. The eigenvectors are the non-zero solutions to (A − 3I)x = 0: that is,

(  1   1 ) ( x1 )   ( 0 )
( −1  −1 ) ( x2 ) = ( 0 ) .

This is equivalent to the single equation x1 + x2 = 0, with general solution x1 = −x2 .
Setting x2 = r, we see that the solution set of the system consists of all vectors of
the form (−r, r)T as r runs through all real numbers. So the eigenvectors are precisely
the non-zero scalar multiples of the fixed vector (−1, 1)T . Any two eigenvectors are
therefore scalar multiples of each other and hence form a linearly dependent set. In
other words, there are not two linearly independent eigenvectors, and the matrix A
is not diagonalisable.

The following result shows that if a matrix has n different eigenvalues then it is
diagonalisable, because the matrix will have n linearly independent eigenvectors.
Theorem 17.4 Eigenvectors corresponding to different eigenvalues are linearly

independent.

Read the proof of this theorem from the A-H text, where it is labelled
Theorem 8.33.
Using these results we have the important conclusion:

If an n × n matrix has n different eigenvalues, then it has a set of n linearly


independent eigenvectors and is therefore diagonalisable.

It is not, however, necessary for the eigenvalues to be distinct. What is needed for
diagonalisation is a set of n linearly independent eigenvectors, and this can happen even
when there is a ‘repeated’ eigenvalue (that is, when there are fewer than n different
eigenvalues). The following example illustrates this.

Example 17.7 Consider the matrix

    ( 3  −1  1 )
A = ( 0   2  0 ) .
    ( 1  −1  3 )

The eigenvalues are given by the solutions of the characteristic equation
|A − λI| = 0. Expanding the determinant by the second row,

           | 3 − λ   −1     1   |
|A − λI| = |   0    2 − λ   0   |
           |   1     −1   3 − λ |

                   | 3 − λ    1   |
         = (2 − λ) |   1    3 − λ |

         = (2 − λ)(λ² − 6λ + 9 − 1)
         = (2 − λ)(λ² − 6λ + 8)
         = (2 − λ)(λ − 4)(λ − 2)
         = −(λ − 2)²(λ − 4).

The matrix A has only two eigenvalues: λ = 4 and λ = 2, an eigenvalue of


multiplicity 2. If we want to diagonalise it, we need to find three linearly
independent eigenvectors. There will be one (linearly independent) eigenvector
corresponding to λ = 4, so we will need two linearly independent eigenvectors
corresponding to the eigenvalue of multiplicity 2. Therefore we look for these first.
We row reduce the matrix (A − 2I):

           ( 1  −1  1 )        ( 1  −1  1 )
(A − 2I) = ( 0   0  0 )  −→    ( 0   0  0 ) .
           ( 1  −1  1 )        ( 0   0  0 )

We see immediately that this matrix has rank 1, so its null space (the eigenspace for
λ = 2) will have dimension 2, and we can find a basis of this space consisting of two
linearly independent eigenvectors. Setting the non-leading variables equal to
arbitrary parameters s and t, we find that the solutions of (A − 2I)x = 0 are

      ( 1 )     ( −1 )
x = s ( 1 ) + t (  0 ) = sv1 + tv2 ,    s, t ∈ R,
      ( 0 )     (  1 )

where v1 and v2 are two linearly independent eigenvectors for λ = 2.

Activity 17.6 How do you know that v1 and v2 are linearly independent?

Now, knowing that we will be able to diagonalise A, we find the eigenvector for λ = 4
by reducing (A − 4I):

           ( −1  −1   1 )                ( 1  0  −1 )
(A − 4I) = (  0  −2   0 ) −→ . . . −→    ( 0  1   0 )
           (  1  −1  −1 )                ( 0  0   0 )

with solutions
      ( 1 )
x = t ( 0 ) ,    t ∈ R.
      ( 1 )
Let
     ( 1 )
v3 = ( 0 ) .
     ( 1 )
The eigenvectors corresponding to distinct eigenvalues are linearly independent, so the
vectors v1 , v2 , v3 form a linearly independent set. Then we may take

    ( 1  1  −1 )                        ( 4  0  0 )
P = ( 0  1   0 )   and   P⁻¹AP = D =    ( 0  2  0 ) .
    ( 1  0   1 )                        ( 0  0  2 )


Activity 17.7 Check this! Check that AP = P D. Once you have checked that the
columns of P are the eigenvectors corresponding to the eigenvalues in the
corresponding columns of D, the theory will tell you that P −1 AP = D. Why?
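For this example, too, the check can be mirrored numerically (an optional sketch):

import numpy as np

A = np.array([[3.0, -1, 1],
              [0, 2, 0],
              [1, -1, 3]])
P = np.array([[1.0, 1, -1],
              [0, 1, 0],
              [1, 0, 1]])
D = np.diag([4.0, 2, 2])

print(np.allclose(A @ P, P @ D))                  # True
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True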

Example 17.8 Consider again the last 3 × 3 example in section 17.1.2. We found
that the matrix
    ( −3  −1  −2 )
A = (  1  −1   1 )
    (  1   1   0 )
has an eigenvalue λ1 = −1 of multiplicity 2, and a second eigenvalue, λ2 = −2. We
can find one (linearly independent) eigenvector corresponding to λ2 = −2. In order
to diagonalise this matrix we need two linearly independent eigenvectors for λ = −1.
To see if this is possible, we row reduce the matrix (A + I):

          ( −2  −1  −2 )                ( 1  0  1 )
(A + I) = (  1   0   1 ) −→ . . . −→    ( 0  1  0 ) .
          (  1   1   1 )                ( 0  0  0 )

This matrix has rank 2 and the null space (the eigenspace for λ = −1) therefore (by
the rank-nullity theorem) has dimension 1. We can only find one linearly
independent eigenvector for λ = −1. All solutions of (A + I)x = 0 are of the form

      ( −1 )
x = t (  0 ) ,    t ∈ R.
      (  1 )

We conclude that this matrix cannot be diagonalised, as it is not possible to find 3
linearly independent eigenvectors to form the matrix P .

There is another reason why a matrix A may not be diagonalisable over the real
numbers. Consider the following example.

Example 17.9 If A is the matrix
    ( 0  −1 )
A = ( 1   0 )

then the characteristic equation

           | −λ  −1 |
|A − λI| = |  1  −λ | = λ² + 1 = 0

has no real solutions.


This matrix A can be diagonalised over the complex numbers, but not over the real
numbers.


Overview
We defined eigenvalues and eigenvectors of a square matrix and explained the method
of finding these. We then said what it means for a matrix to be diagonalisable and
showed how to diagonalise a matrix when it is possible to do so.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

state what is meant by the characteristic equation of a matrix


state carefully what is meant by eigenvectors and eigenvalues, and by
diagonalisation
find eigenvalues and corresponding eigenvectors for a square matrix
diagonalise a diagonalisable matrix
determine whether or not a matrix can be diagonalised
recognise what diagonalisation says in terms of change of basis and matrix
representation of linear transformations
use diagonalisation to describe the geometric effect of a linear transformation.

Test your knowledge and understanding


Work Exercises 8.1–8.10 in the text A-H. The solutions to all exercises in the text can
be found at the end of the textbook.
Work Problems 8.9, 8.11 and 8.14 in the text A-H. You will find the solutions on the
VLE.

Comments on selected activities


Feedback to activity 17.1
The eigenvectors for λ = 0 are the non-zero solutions of Ax = 0. To find these, row
reduce the coefficient matrix A:

( 4  0  4 )               ( 1  0  1 )
( 0  4  4 ) −→ · · · −→   ( 0  1  1 ) .
( 4  4  8 )               ( 0  0  0 )

The solutions are
      ( −1 )
x = t ( −1 ) ,    t ∈ R,
      (  1 )

so that the eigenvectors are non-zero multiples of v2 = (−1, −1, 1)T . The eigenspace of
λ = 0 is the null space of the matrix A. Note that Av2 = 0v2 = 0.


Similarly, you should find that for λ = 12 the eigenvectors are non-zero multiples of

     ( 1 )
v3 = ( 1 ) .
     ( 2 )

Feedback to activity 17.3


Perform the matrix multiplication to show that

AP = (4v1 0v2 12v3 ) = P D .

Feedback to activity 17.4


Since |P | = 6 ≠ 0, P is invertible. Using the adjoint method (or row reduction), obtain

            ( −3   3  0 )
P⁻¹ = (1/6) ( −2  −2  2 ) .
            (  1   1  2 )

Check that P P −1 = I. You have calculated AP in the previous activity, so now just
multiply P −1 AP to obtain D.

Feedback to activity 17.5


TA is a stretch by a factor 4 in the direction of the vector v1 = (−1, 1, 0)T , a stretch by
a factor of 12 in the direction of v3 = (1, 1, 2)T and it maps the line x = tv2 to 0.

Feedback to activity 17.6


The method of solution ensures this. See the discussion at the end of section 14.3.

Feedback to activity 17.7
The vectors v1 and v2 are linearly independent. Since v3 corresponds to a different
eigenvalue, the set of three eigenvectors is linearly independent. Therefore
P = ( v3 v1 v2 ) is invertible, and multiplying the equation AP = P D on the left by
P^{-1} gives the result.

Chapter 18
Applications of diagonalisation

Introduction
We will now look at some applications of diagonalisation. We apply diagonalisation to
find powers of diagonalisable matrices and solve systems of simultaneous linear
difference equations. We also look at the topic of Markov chains. You should try to
understand why the diagonalisation process makes the solution possible, by essentially
changing basis to one in which the problem is readily solvable, namely the basis of Rn
consisting of eigenvectors of the matrix.

Aims
The aims of this chapter are to:

explain how to use diagonalisation to find powers of matrices


explain how to solve systems of difference equations
apply what we have learned to Markov chains and understand some of their key
aspects.

Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 9.1
and 9.2.

Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.

Synopsis
We start by showing how to determine the general nth power of a matrix if that matrix
can be diagonalised. We then look at systems of linear difference equations. We discuss


two ways of solving these: using a change of variable, and using powers of matrices.
Underlying both these methods is the diagonalisation of the matrix that corresponds to
the coefficients of the system of equations. We then look at Markov chains, where we
use powers of matrices and look at some properties of Markov chains.

18.1 Powers of matrices


For a positive integer n, the nth power of a matrix A is simply

    A^n = A A · · · A    (n times).

It is often useful, as we shall see in this chapter, to determine A^n for a general integer n.
Diagonalisation helps here. If we can write P^{-1}AP = D, then A = P D P^{-1} and so

    A^n = (P D P^{-1})(P D P^{-1})(P D P^{-1}) · · · (P D P^{-1})    (n factors)
        = P D (P^{-1}P) D (P^{-1}P) D (P^{-1}P) · · · D (P^{-1}P) D P^{-1}
        = P D I D I D I · · · D I D P^{-1}
        = P (D D D · · · D) P^{-1}
        = P D^n P^{-1}.

The product P D^n P^{-1} is easy to compute, since D^n is simply the diagonal matrix with
entries equal to the nth powers of those of D.
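If you have a computer to hand, you can check such a calculation numerically. The following short sketch (in Python with the numpy library, neither of which is part of this course) diagonalises the matrix that appears in Example 18.1 below and confirms that P D^n P^{-1} agrees with the directly computed power:

    import numpy as np

    # The matrix used in Example 18.1 below; any diagonalisable matrix would do.
    A = np.array([[1.0, 4.0],
                  [0.5, 0.0]])

    # numpy returns the eigenvalues and a matrix P whose columns are eigenvectors.
    eigenvalues, P = np.linalg.eig(A)

    n = 6
    # A^n via diagonalisation: A^n = P D^n P^{-1}, where D^n = diag(lambda_i^n).
    A_power_diag = P @ np.diag(eigenvalues**n) @ np.linalg.inv(P)
    # A^n by repeated multiplication, for comparison.
    A_power_direct = np.linalg.matrix_power(A, n)

    print(np.allclose(A_power_diag, A_power_direct))  # True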

Activity 18.1 Convince yourself that if D = diag(λ1, λ2, . . . , λk), the diagonal
matrix with the entries λ1, λ2, . . . , λk on the diagonal, then
D^n = diag(λ1^n, λ2^n, . . . , λk^n).

We give an illustrative example using a 2 × 2 matrix A, but you should be able to carry
out the procedure for 3 × 3 matrices as well.

Example 18.1 Suppose that we want a matrix expression for the nth power of the
matrix
        A = [  1   4 ]
            [ 1/2  0 ] .


The characteristic polynomial |A − λI| is (check this!) λ^2 − λ − 2 = (λ − 2)(λ + 1).
So the eigenvalues are −1 and 2. An eigenvector for −1 is a solution of
(A + I)v = 0, found by

    A + I = [  2   4 ]  -->  [ 1  2 ] ,
            [ 1/2  1 ]       [ 0  0 ]

so we may take (2, −1)^T. Eigenvectors for 2 are given by

    A − 2I = [ −1    4 ]  -->  [ 1  −4 ] ,
             [ 1/2  −2 ]       [ 0   0 ]

so we may take (4, 1)^T. Let P be the matrix whose columns are these eigenvectors.
Then
    P = [  2  4 ] .
        [ −1  1 ]
The inverse is
    P^{-1} = 1/6 [ 1  −4 ] .
                 [ 1   2 ]

We have P^{-1}AP = D = diag(−1, 2). The nth power of the matrix A is given by

    A^n = P D^n P^{-1}

        = 1/6 [  2  4 ] [ (−1)^n   0  ] [ 1  −4 ]
              [ −1  1 ] [   0     2^n ] [ 1   2 ]

        = 1/6 [ 2(−1)^n + 4(2^n)   −8(−1)^n + 8(2^n) ]
              [  −(−1)^n + 2^n      4(−1)^n + 2(2^n) ] .

Activity 18.2 Check all of the statements just made.
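A quick numerical check of the closed form is also possible (a sketch in Python/numpy, which is not examinable; the comparison is exactly the content of Activity 18.2):

    import numpy as np

    A = np.array([[1.0, 4.0],
                  [0.5, 0.0]])
    n = 5

    # The entries of A^n from the formula derived in Example 18.1.
    closed_form = (1/6) * np.array([
        [2*(-1)**n + 4*2**n, -8*(-1)**n + 8*2**n],
        [  -(-1)**n + 2**n,   4*(-1)**n + 2*2**n],
    ])

    print(np.allclose(closed_form, np.linalg.matrix_power(A, n)))  # True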

18.2 Systems of difference equations


Recall from Chapter 11 that a difference equation is an equation linking the terms of a
sequence to previous terms. One very simple result we will need is that the solution to
the difference equation
    xt+1 = a xt

is simply xt = a^t x0, where x0 is the first term of the sequence. (We assume that the
sequence is x0 , x1 , x2 , . . . rather than x1 , x2 , . . ..)


18.2.1 Systems of difference equations


Suppose three sequences xt , yt and zt satisfy x0 = 12, y0 = 6, z0 = 6 and are related, for
t ≥ 0, as follows:
xt+1 = 5xt + 4zt (18.1)
yt+1 = 5yt + 4zt (18.2)
zt+1 = 4xt + 4yt + 9zt (18.3)

We cannot directly solve equation (18.1) for xt since we would need to know zt . On the
other hand we can’t work out zt directly from equation (18.2) or equation (18.3)
because to do so we would need to know yt ! It seems impossible, perhaps, but there are
ways to proceed.
Note that this (coupled) system of difference equations can be expressed as

    [ xt+1 ]   [ 5  0  4 ] [ xt ]
    [ yt+1 ] = [ 0  5  4 ] [ yt ] .
    [ zt+1 ]   [ 4  4  9 ] [ zt ]
That is,
xt+1 = Axt ,
where

         [ xt ]            [ 5  0  4 ]
    xt = [ yt ]   and  A = [ 0  5  4 ] .
         [ zt ]            [ 4  4  9 ]
The general system we shall consider will take the form xt+1 = Axt where A is an n × n
square matrix. We shall concentrate on 3 × 3 and 2 × 2 systems, though the method is
applicable to larger values of n.
We shall describe two techniques: one involving a change of variable, and the other
powers of matrices.

18.2.2 Solving by change of variable


We can use diagonalisation as the key to a general method for solving systems of
difference equations. Given a system xt+1 = Axt , in which A is diagonalisable, we
perform a change of variable or change of coordinates, as follows. Suppose that
P −1 AP = D (where D is diagonal) and let
    xt = P zt
or, equivalently, the new variable vector zt is zt = P −1 xt , so that the vector xt is in
standard coordinates and zt is in coordinates in the basis of eigenvectors. Then the
equation xt+1 = Axt becomes
P zt+1 = AP zt ,
which means that
    zt+1 = P^{-1}AP zt = D zt,

which, since D is diagonal, is very easy to solve for zt. To find xt we then use the fact
that xt = P zt .
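The whole procedure is mechanical enough to express in a few lines of code. The sketch below (Python/numpy, not examinable) carries out the change of variable for the system of section 18.2.1, whose solution is worked by hand in Example 18.2:

    import numpy as np

    A = np.array([[5.0, 0.0, 4.0],
                  [0.0, 5.0, 4.0],
                  [4.0, 4.0, 9.0]])
    x0 = np.array([12.0, 6.0, 6.0])

    eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors
    z0 = np.linalg.solve(P, x0)         # z_0 = P^{-1} x_0: coordinates in the eigenvector basis

    def x(t):
        # z_t = D^t z_0 componentwise, then back to standard coordinates: x_t = P z_t.
        return P @ (eigenvalues**t * z0)

    print(np.round(x(1)))  # [ 84.  54. 126.], which is exactly A @ x0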


Example 18.2 We find the sequences xt , yt , zt such that

xt+1 = 5xt + 4zt


yt+1 = 5yt + 4zt
zt+1 = 4xt + 4yt + 9zt

and x0 = 12, y0 = 6, z0 = 6. This is the problem described above. In matrix form, as


we have seen, this system is xt+1 = A xt where

         [ 5  0  4 ]
    A =  [ 0  5  4 ] .
         [ 4  4  9 ]

To use the technique, we need to diagonalise A. You should work through this
diagonalisation yourself. We’ll omit the workings here, but if

         [ −1  −1  1 ]
    P =  [ −1   1  1 ]
         [  1   0  2 ]

then
P −1 AP = D = diag(1, 5, 13).
Now let

         [ ut ]
    zt = [ vt ]
         [ wt ]
be given by xt = P zt . Then the equation xt+1 = Axt gives rise (as explained above)
to zt+1 = D zt. That is,

    [ ut+1 ]   [ 1  0   0 ] [ ut ]
    [ vt+1 ] = [ 0  5   0 ] [ vt ] ,
    [ wt+1 ]   [ 0  0  13 ] [ wt ]

so we have the following system for the new sequences ut , vt , wt :

ut+1 = ut
vt+1 = 5vt
wt+1 = 13wt .

This is very easy to solve: each equation involves only one sequence, so we have
uncoupled the equations. We have, for all t,

    ut = u0,   vt = 5^t v0,   wt = 13^t w0.

We have not yet solved the original problem, however, since we need to find xt , yt , zt .
We have
       
xt −1 −1 1 ut −1 −1 1 u0
xt =  yt  = P zt =  −1 1 1   vt  =  −1 1 1   5t v0  .
18
xt 1 0 2 wt 1 0 2 (13)t w0


But we have also to find out what u0 , v0 , w0 are. These are not given in the problem,
but x0 , y0 , z0 are, and we know that
      
x0 u0 −1 −1 1 u0
 y0  = P  v0  =  −1 1 1   v0  .
z0 w0 1 0 2 w0

To find u0, v0, w0 we can either solve the linear system

      [ u0 ]   [ x0 ]   [ 12 ]
    P [ v0 ] = [ y0 ] = [  6 ]
      [ w0 ]   [ z0 ]   [  6 ]

using row operations, or we can (though it involves more work) find out what P^{-1} is
and use the fact that

    [ u0 ]          [ x0 ]          [ 12 ]
    [ v0 ] = P^{-1} [ y0 ] = P^{-1} [  6 ] .
    [ w0 ]          [ z0 ]          [  6 ]

Either way (and the working is again omitted), we find

    [ u0 ]   [ −4 ]
    [ v0 ] = [ −3 ] .
    [ w0 ]   [  5 ]

Returning then to the general solution to the system, we obtain

    [ xt ]   [ −1  −1  1 ] [    u0    ]
    [ yt ] = [ −1   1  1 ] [  5^t v0  ]
    [ zt ]   [  1   0  2 ] [ 13^t w0  ]

           [ −1  −1  1 ] [   −4    ]
         = [ −1   1  1 ] [ −3(5^t) ]
           [  1   0  2 ] [ 5(13^t) ]

           [ 4 + 3(5^t) + 5(13^t) ]
         = [ 4 − 3(5^t) + 5(13^t) ] .
           [    −4 + 10(13^t)     ]

So the final answer is that the sequences are:

xt = 4 + 3(5t ) + 5(13)t
yt = 4 − 3(5t ) + 5(13)t
zt = −4 + 10(13)t .

Activity 18.3 Perform the omitted diagonalisation calculations required for the
example just given.
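You should carry out the diagonalisation by hand, but the final answer is easy to verify by iterating the system directly; a sketch (Python/numpy, not examinable):

    import numpy as np

    A = np.array([[5, 0, 4],
                  [0, 5, 4],
                  [4, 4, 9]])
    x = np.array([12, 6, 6])

    for t in range(1, 6):
        x = A @ x  # iterate x_{t+1} = A x_t
        closed_form = np.array([4 + 3*5**t + 5*13**t,
                                4 - 3*5**t + 5*13**t,
                                -4 + 10*13**t])
        assert np.array_equal(x, closed_form)

    print("the closed form agrees with direct iteration for t = 1, ..., 5")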

As this example demonstrates, solving a system of difference equations involves a lot of


work, but the good news is that it is just a matter of going through a definite (if
time-consuming) procedure.

18.2.3 Solving using matrix powers


Another way of looking at this problem is to notice that if xt+1 = A xt, then

    xt = A^t x0.

Activity 18.4 Show this.

This solution can be determined explicitly if we can find the tth power A^t of the matrix
A. As described in section 18.1, this can be done using diagonalisation of A.

Example 18.3 We solve the system of the above example using matrix powers.
The system is xt+1 = A xt where

         [ 5  0  4 ]
    A =  [ 0  5  4 ]
         [ 4  4  9 ]

and where x0 = 12, y0 = 6, z0 = 6. So the solution is xt = A^t x0 = A^t (12, 6, 6)^T. We
have seen how A can be diagonalised: with

         [ −1  −1  1 ]
    P =  [ −1   1  1 ] ,
         [  1   0  2 ]

we have
P −1 AP = D = diag(1, 5, 13).
So A = P D P^{-1} and A^t = P D^t P^{-1}. Now, as you can calculate (the details are
omitted here),

             [ −1/3  −1/3  1/3 ]
    P^{-1} = [ −1/2   1/2   0  ] ,
             [  1/6   1/6  1/3 ]
so
    xt = A^t x0 = P D^t P^{-1} x0.

Doing the multiplication (again, details omitted),

           [ 1   0     0  ]        [ 12 ]   [ 4 + 3(5^t) + 5(13^t) ]
    xt = P [ 0  5^t    0  ] P^{-1} [  6 ] = [ 4 − 3(5^t) + 5(13^t) ] ,
           [ 0   0   13^t ]        [  6 ]   [    −4 + 10(13^t)     ]

which is, of course, precisely the same answer as we obtained using the previous
method.

Activity 18.5 Check the calculations omitted in this example.

Note that although this technique is presented as being different from the one using a
change of variable, they are essentially the same. Here, as before, the matrix P −1 x0
represents the coordinates of the vector x0 (the initial conditions) in the basis of
eigenvectors of A (the columns of P ). In both cases, diagonalisation enables us to solve
the system by a change of basis from the standard basis in Rn to a basis consisting of
eigenvectors of the matrix A.

18.2.4 Markov chains


We begin with an example.

Example 18.4 Suppose two supermarkets compete for customers in a region with
20,000 shoppers. Assume that no shopper goes to both supermarkets in any week,
and that the table below gives the probabilities that a shopper will change from one
supermarket (or none) to another (or none) during the week.

                From A   From B   From none
    To A         0.70     0.15      0.30
    To B         0.20     0.80      0.20
    To none      0.10     0.05      0.50

For example, an interpretation of the second column is that during any given week
supermarket B will keep 80% of its customers while losing 15% to supermarket A
and 5% to no supermarket. Suppose that at the end of a certain week (call it week
zero) it is known that the total population of T = 20, 000 shoppers was distributed
as follows: 10,000 (0.5 T ) went to supermarket A; 8,000 (0.4 T ) went to
supermarket B; and 2,000 (0.1 T ) did not go to a supermarket.
Let xt denote the percentage of total shoppers going to supermarket A in week t, yt
the percentage going to supermarket B, and zt the percentage who do not go to any
supermarket. The number of shoppers in week t can be predicted by this model from
the numbers in the previous week; that is,

                              [ 0.70  0.15  0.30 ]        [ xt ]
    xt = A xt−1   where   A = [ 0.20  0.80  0.20 ] , xt = [ yt ]
                              [ 0.10  0.05  0.50 ]        [ zt ]

with x0 = 0.5, y0 = 0.4, z0 = 0.1. The questions we wish to answer are: can we
predict from this information the number of shoppers at each supermarket in any
future week t? And can we predict a long-term distribution of shoppers?
This is an example of a Markov chain.

In general, a Markov chain or a Markov process is a closed system consisting of a
population which is distributed into n different states and which changes with time
from one distribution to another. The system is observed at scheduled times. It is
assumed that the probability that a given member will change from one state into

another, depending on the state it occupied at the previous observation, is known. The
system is then observed at a certain time, and the information is used to predict the
distribution of the system into its different states at a future time t.
The probabilities are listed in an n × n matrix A = (aij ) where the entry aij is the
probability that a member of the population will change from state j into state i. Such
a matrix, called a transition matrix, has the following two properties:

(1) The entries of A are all non-negative.

(2) The sum of the entries in each column of A is equal to 1: a1j + a2j + · · · + anj = 1.
Property (2) follows from the assumption that all members of the population must be
in one of the n states at any given time.
The distribution vector (or state vector) for the time period t is the vector xt ,
whose ith entry is the percentage of the population in state i at time t. The entries of xt
sum to 1, for the reason just given, that all members of the population are in one of the
states at any time. Our first goal is to find the state vector for any t, and to do this we
need to solve the difference equation

xt = Axt−1 .

A solution of the difference equation is an expression for the distribution vector xt in
terms of the original information A and x0, and so, as we have seen in the previous
section, the solution is xt = A^t x0.
Now assume that A can be diagonalised. If A has eigenvalues λ1 , λ2 , . . . , λn with
corresponding eigenvectors v1 , v2 , . . . , vn , then P −1 AP = D where P is the matrix of
eigenvectors of A and D is the corresponding diagonal matrix of eigenvalues.
The solution of the difference equation is

    xt = A^t x0 = (P D^t P^{-1}) x0.

If we set x = P z, so that z0 = P^{-1} x0 = (b1, b2, . . . , bn)^T represents the coordinates of x0
in the basis of eigenvectors, then this solution can be written in vector form as

    xt = P D^t (P^{-1} x0)

                             [ λ1^t   0   · · ·   0   ] [ b1 ]
       = ( v1  v2  · · · vn )[  0   λ2^t  · · ·   0   ] [ b2 ]
                             [  .     .   . .     .   ] [ .. ]
                             [  0     0   · · ·  λn^t ] [ bn ]

       = b1 λ1^t v1 + b2 λ2^t v2 + · · · + bn λn^t vn.

Activity 18.6 Verify the final statement above.
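Activity 18.6 is a hand calculation, but the identity it verifies is also easy to test numerically. A sketch (Python/numpy, not examinable) using the supermarket transition matrix from the example above:

    import numpy as np

    A = np.array([[0.70, 0.15, 0.30],
                  [0.20, 0.80, 0.20],
                  [0.10, 0.05, 0.50]])
    x0 = np.array([0.5, 0.4, 0.1])

    eigenvalues, P = np.linalg.eig(A)
    b = np.linalg.solve(P, x0)  # coordinates b_i of x_0 in the eigenvector basis

    def x(t):
        # x_t = sum_i b_i lambda_i^t v_i, written as one matrix-vector product.
        return P @ (b * eigenvalues**t)

    print(np.allclose(x(1), A @ x0))  # True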

We now return to our example.



Example 18.5 We find the number of shoppers using each of the supermarkets at
the end of week t, and see if we can use this to predict the long-term distribution of
shoppers.
First diagonalise the matrix A. The characteristic equation of A is

               | 0.70 − λ    0.15      0.30    |
    |A − λI| = |   0.20    0.80 − λ    0.20    | = −λ^3 + 2λ^2 − 1.24λ + 0.24 = 0.
               |   0.10      0.05    0.50 − λ  |

This equation is satisfied by λ = 1, hence 1 is an eigenvalue. Using the fact that
(λ − 1) is a factor of the polynomial, we find

    (λ − 1)(λ^2 − λ + 0.24) = (λ − 1)(λ − 0.6)(λ − 0.4) = 0,

so the eigenvalues are λ1 = 1, λ2 = 0.6 and λ3 = 0.4. The corresponding
eigenvectors vi are found by solving the homogeneous systems (A − λi I)v = 0 (we
omit the calculations). Writing them as the columns of a matrix P, we obtain

        [ 3   3  −1 ]        [ 1   0    0  ]
    P = [ 4  −4   0 ] ,  D = [ 0  0.6   0  ] ,  P^{-1}AP = D.
        [ 1   1   1 ]        [ 0   0   0.4 ]

Activity 18.7 Carry out the omitted calculations for the diagonalisation above.

The distribution vector xt at any time t is then given by

    xt = b1 (1)^t v1 + b2 (0.6)^t v2 + b3 (0.4)^t v3,

where it only remains to find the coordinates b1, b2, b3 of x0 in the basis of eigenvectors.
To find the long-term distribution of shoppers we need to consider what happens to
xt for very large values of t, that is, as t → ∞. Note, and this is very important, that
1^t = 1, and that as t → ∞, (0.6)^t → 0 and (0.4)^t → 0, so that the limit of xt is a
multiple of the eigenvector whose eigenvalue is 1.
The coordinates of x0 in the basis of eigenvectors are given by

                    [  1   1  1 ] [ 0.5 ]   [ 0.125 ]   [ b1 ]
    P^{-1} x0 = 1/8 [  1  −1  1 ] [ 0.4 ] = [ 0.025 ] = [ b2 ] .
                    [ −2   0  6 ] [ 0.1 ]   [ −0.05 ]   [ b3 ]
Hence,
               [ 3 ]                 [  3 ]                [ −1 ]
    xt = 0.125 [ 4 ] + 0.025 (0.6)^t [ −4 ] − 0.05 (0.4)^t [  0 ]
               [ 1 ]                 [  1 ]                [  1 ]

and
                       [ 0.375 ]
    q = lim  xt =      [ 0.500 ] .
        t→∞            [ 0.125 ]

As the total number of shoppers is 20,000, the long-term distribution is predicted to be
20,000 q: 7,500 to supermarket A; 10,000 to B; and 2,500 to no supermarket.
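You can confirm the long-term prediction by simply iterating the chain for many weeks; a sketch (Python/numpy, not examinable):

    import numpy as np

    A = np.array([[0.70, 0.15, 0.30],
                  [0.20, 0.80, 0.20],
                  [0.10, 0.05, 0.50]])
    x = np.array([0.5, 0.4, 0.1])

    for _ in range(100):     # 100 weeks is ample: (0.6)^t and (0.4)^t are negligible
        x = A @ x

    print(np.round(x, 6))    # [0.375 0.5   0.125], the vector q
    print(20000 * x)         # approximately [ 7500. 10000.  2500.]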


Activity 18.8 Verify that P −1 is as stated.

You will have noticed that an essential part of the solution of predicting a long-term
distribution for this example is the fact that the transition matrix A has an eigenvalue
λ = 1 (of multiplicity one), and that the other eigenvalues satisfy |λi | < 1. In this case,
as t increases, the distribution vector xt will approach the unique eigenvector q for
λ = 1 which is also a distribution vector, so that Aq = q. (The fact that the entries sum
to 1 makes q unique in this one-dimensional eigenspace.)
We would like to be able to know that this is the case for any Markov chain, but there
are some exceptions to this rule. A Markov chain is said to be regular if some integer
power of the transition matrix A has strictly positive entries, aij > 0 (so no zero
entries). In this case, there will be a long-term distribution as the following theorem
implies.
Theorem 18.1 If A is the transition matrix of a regular Markov chain, then λ = 1 is
an eigenvalue of multiplicity one and all other eigenvalues satisfy |λi | < 1.

A proof of this theorem can be found in texts on Markov chains and is beyond the
scope of this course. However, we can prove a similar, but less strong result, which
makes it clear that the only thing that can go wrong is for the eigenvalue λ = 1 to have
multiplicity greater than 1.
Theorem 18.2 If A is the transition matrix of a Markov chain, then λ = 1 is an
eigenvalue of A and all other eigenvalues satisfy |λi| ≤ 1.

Read the discussion in Section 9.2.6 of the text A-H following Theorem 9.19
there (Theorem 18.1 above) for a proof of Theorem 18.2.
Theorem 18.2 tells us that λ = 1 is an eigenvalue, but it might have multiplicity greater
than one, in which case either there would be more than one (linearly independent)
eigenvector corresponding to λ = 1, or the matrix might not be diagonalisable.
In order to obtain a long-term distribution we need to know that there is only one
(linearly independent) eigenvector for the eigenvalue λ = 1. So if the eigenvalue λ = 1 of
a transition matrix A of a Markov chain does have multiplicity 1, then Theorem 18.2
implies all the others will have |λi | < 1. There will be one corresponding eigenvector
which is also a distribution vector and, provided A can be diagonalised, we will know
that there is a long-term distribution. This is all we will need in practice.
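In a concrete example, regularity can be checked by computing successive powers of the transition matrix and looking for one with no zero entries. A sketch (Python/numpy, not examinable; the cut-off max_power is an arbitrary choice for illustration):

    import numpy as np

    def is_regular(A, max_power=50):
        """Return True if some power A^k, 1 <= k <= max_power, has strictly positive entries."""
        Ak = np.eye(A.shape[0])
        for _ in range(max_power):
            Ak = Ak @ A
            if np.all(Ak > 0):
                return True
        return False

    A = np.array([[0.70, 0.15, 0.30],
                  [0.20, 0.80, 0.20],
                  [0.10, 0.05, 0.50]])
    print(is_regular(A))  # True: here A itself already has strictly positive entries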

Overview
We’ve seen that if a matrix is diagonalisable, then its powers can easily be determined.
We then applied diagonalisation to the solution of systems of difference equations. We
described two methods, the first using a change of variable from the standard basis to
the basis of eigenvectors, and the second using powers of a matrix. Although the
description of the second method seemed different, we observed that it is essentially the
same as the first, giving the same form of solution. We then looked at the special case of


systems of difference equations which are known as Markov chains and noted the special
attributes of such systems.

Learning outcomes
At the end of this chapter and the relevant reading you should be able to:

calculate the general nth power of a diagonalisable matrix using diagonalisation


solve systems of difference equations in which the underlying matrix is
diagonalisable, by using both the matrix powers method and the change of variable
method
know what is meant by a Markov chain and its properties, and be able to find the
long-term distribution.

Test your knowledge and understanding


Work Exercises 9.1–9.6 and 9.10 in the text A-H. The solutions to all exercises in the
text can be found at the end of the textbook.
Work Problems 9.3 and 9.6 in the text A-H. You will find the solutions on the VLE.

Comments on selected activities


Feedback to activity 18.1
Take any 2 × 2 diagonal matrix D. Calculate D2 and D3 , and observe what happens.

Feedback to activity 18.4


We have

    xt = A xt−1 = AA xt−2 = A^2 xt−2 = A^2 A xt−3 = A^3 xt−3 = · · · = A^t x0.


Feedback to activity 18.6
First multiply the two matrices on the right to obtain the expression D^t (P^{-1} x0) as a
vector in R^n,

                      [ λ1^t   0   · · ·   0   ] [ b1 ]   [ b1 λ1^t ]
    D^t (P^{-1} x0) = [  0   λ2^t  · · ·   0   ] [ b2 ] = [ b2 λ2^t ] .
                      [  .     .   . .     .   ] [ .. ]   [   ..    ]
                      [  0     0   · · ·  λn^t ] [ bn ]   [ bn λn^t ]

Then express the product P (D^t (P^{-1} x0)) as a linear combination of the columns of P:

                                            [ b1 λ1^t ]
    P (D^t (P^{-1} x0)) = ( v1  v2  · · · vn ) [ b2 λ2^t ] = b1 λ1^t v1 + b2 λ2^t v2 + · · · + bn λn^t vn.
                                            [   ..    ]
                                            [ bn λn^t ]

Appendix A
Sample examination paper

Important note: This Sample examination paper reflects the intended examination
and assessment arrangements for this course in the academic year 2014/2015. The
intended format and structure of the examination may have changed since the
publication of this subject guide. You can find the most recent examination papers on
the VLE where all changes to the format of the examination are posted.

Time allowed: THREE HOURS


Candidates should answer all FIVE questions. All questions carry equal marks (20
marks each).
Calculators may not be used for this paper.

1(a) Consider the following system of equations, for some constants a and b,

x − y + 2z = 4
3x − y − z = 0
x + y + az = b

Use matrix methods to determine what values a and b must take if this system is
consistent and has infinitely many solutions.

What must the value of a not be if the system has precisely one solution?

What can be said about a and b if the system has no solutions?

(b) If a = 4 and b = 1, find the solution of the above system using any matrix method
(Gaussian elimination, inverse matrix, Cramer’s rule).

(c) What does it mean to say that a set {x1 , x2 , . . . , xk } of vectors in Rn is linearly
dependent? Show that the set {x1 , x2 , x3 , x4 } of vectors in R4 is linearly dependent,
where

    x1 = (1, 2, 1, 4)^T,   x2 = (2, 0, 3, 5)^T,   x3 = (2, 1, 7, 3)^T,   x4 = (2, 5, 6, 6)^T.

Express x4 as a linear combination of the other three vectors.



2(a) A system of linear equations Ax = d is known to have the following solution:

        [  1 ]     [ 2 ]     [  1 ]
        [  2 ]     [ 1 ]     [  1 ]
    x = [  0 ] + s [ 1 ] + t [  0 ] .
        [ −1 ]     [ 0 ]     [ −1 ]
        [  0 ]     [ 0 ]     [  1 ]
Assume that A is an m × n matrix. Let c1 , c2 , . . . , cn denote the columns of A.
Answer the following questions, or, if there is insufficient information to answer the
question, say so.
(1) What number is n?
(2) What number is m?
(3) What (number) is the rank of A?
(4) Write down a basis of the null space of A, N (A).
(5) Which columns ci form a basis of the range, R(A)?
(6) Write down an expression for d as a linear combination of the columns ci .
(7) Write down a non-trivial linear combination of the columns ci which is equal
to the zero vector.
(b) A sequence xt satisfies

    xt+1 = √a xt − (a/4) xt−1,

for all t ≥ 1, where a > 0 is a fixed number. If x0 = −1 and x1 = √a, find a
formula (in terms of t and a) for xt.
(c) An investor saves money in a bank account paying interest at a fixed rate of 5%,
where the interest is paid once per year, at the end of the year. She makes an
initial deposit of $20, 000 and, then, at the end of each of the next N years, just
after the interest has been paid, she withdraws an amount of $500. Find an
expression, in terms of N , for the amount of money in the account at the end of N
years, just after the N th withdrawal has been made.
3(a) Consider the set

    H = { (2t, t, 3t)^T : t ∈ R }.

Prove that the set H is closed under addition and scalar multiplication. Hence, or
otherwise, prove that it is a subspace of R^3.
Show that every vector w ∈ H is a unique linear combination of the vectors

    v1 = (1, 0, −1)^T   and   v2 = (0, 1, 5)^T.

Answer the following questions, justifying any answers.


(1) Is {v1 , v2 } a basis of the subspace H? If yes, state why. If no, write down a
basis of H. State the dimension of H.
(2) Find a Cartesian equation for the subspace G = Lin{v1 , v2 }. Is {v1 , v2 } a
basis of G? Why or why not?
(b) State the dimension (rank-nullity) theorem for a linear transformation,
T : V → W , from a finite dimensional vector space V to a vector space W ,
carefully defining each term.

Let {e1, e2, e3, e4} be the standard basis of R^4, and let v1, v2, v3, x be the
following vectors in R^3 (where x, y, z are constants):

    v1 = (1, 0, −2)^T,   v2 = (2, 3, −1)^T,   v3 = (−1, 5, 7)^T,   x = (x, y, z)^T.

Let T be a linear transformation, T : R4 → R3 , given by

T (e1 ) = v1 , T (e2 ) = v2 , T (e3 ) = v3 , T (e4 ) = x.

(i) Suppose the vector x is such that the linear transformation T has

dimR(T ) = dimN (T ).

Write down a condition that the components of x must satisfy for this to happen.
Find a basis of R(T ) in this case.

(ii) Suppose the vector x is such that the linear transformation T has

dimN (T ) = 1.

Write down a condition that the components of x must satisfy for this to happen.
Find a basis of N (T ) in this case.

4 Suppose
         [  −1  −2  −1 ]
    A =  [   4  −4  −8 ] .
         [ −13  −2  11 ]

Find a basis of the null space of A.


Deduce that λ = 0 is an eigenvalue of A and write down the corresponding
eigenvector. (Justify your answer using the definition of eigenvalue and eigenvector.)
Diagonalise the matrix A: find an invertible matrix P and a diagonal matrix D
such that P −1 AP = D.
Using your answer, or otherwise, determine the sequences (xn ), (yn ), (zn ) which
have the following properties

xn+1 = −xn − 2yn − zn


yn+1 = 4xn − 4yn − 8zn
zn+1 = −13xn − 2yn + 11zn

and which satisfy the initial conditions x0 = y0 = 1 and z0 = 0.



5(a) Let
         [  1  4  5  3  2 ]        [ 11 ]
    A =  [  0  2  4  2  2 ] ,  b = [  2 ] .
         [ −1  1  5  0  1 ]        [  6 ]

Solve the system of equations, Ax = b, using Gaussian elimination. (Put the


augmented matrix into reduced row echelon form.) Express your solution in vector
form (as x = p + a1 v1 + · · · + ak vk , where k is a positive integer).
Let c1 , c2 , . . . , c5 denote the columns of A.
Can you express c3 as a linear combination of c1 and c2 ? Justify your answer.
Write down a linear combination if one exists.
Explain how to deduce from the reduced row echelon form of A that the set of
vectors B = {c1 , c2 , c4 } is linearly independent.
Why can you conclude that B is a basis of R3 ?
Write down the coordinates of the vector b in this basis; that is, write down [b]B .

(b) Show that the set S = {c1 , c3 , c4 } is also a basis of R3 .


Find the transition matrix P from coordinates in the basis B to coordinates in the
basis S.
Hence, or otherwise, find [b]S , the coordinates of the vector b in the basis S.

Appendix B
Commentary on the Sample
examination paper

General remarks
We start by emphasising that candidates should always include their working. This
means two things. First, you should not simply write down the answer in the
examination script, but should explain the method by which it is obtained. Second, you
should include rough working. The Examiners want you to get the right answers, of
course, but it is more important that you demonstrate that you know what you are
doing: that is what is really being examined.
We also stress that if a candidate has not completely solved a problem, they may still
be awarded marks for a partial, incomplete, or slightly wrong, solution; but, if they have
written down a wrong answer and nothing else, no marks can be awarded.

Solutions to questions
Question 1(a) Since you are asked to use matrix methods, begin by thinking of the
system of equations in matrix form, as Ax = b with
     
1 −1 2 x 4
A =  3 −1 −1  , x = y , b = 0.
1 1 a z b
Read through the question to know all that is being asked. There are different
approaches you can take to start.
The most efficient method is to write down the augmented matrix and begin to row
reduce it:

             [ 1  −1   2 |  4 ]             [ 1  −1    2    |    4   ]
    (A|b) =  [ 3  −1  −1 |  0 ]  R2 − 3R1   [ 0   2   −7    |  −12   ]
             [ 1   1   a |  b ]  R3 − R1    [ 0   2  a − 2  |  b − 4 ]

                        [ 1  −1    2    |    4   ]
              R3 − R2   [ 0   2   −7    |  −12   ] .
                        [ 0   0  a + 5  |  b + 8 ]
You are now in a position to answer the questions asked in the order in which they were
asked.


The system will be consistent with infinitely many solutions if and only if the last row
of the row echelon form is a row of zeros, so a = −5 and b = −8. It will have a unique
solution if and only if a + 5 ≠ 0, so a ≠ −5. It will be inconsistent (no solution) if and
only if a + 5 = 0 and b + 8 ≠ 0, that is, a = −5 and b ≠ −8.
Alternatively, you can begin by evaluating the determinant of A, for example, using the
cofactor expansion by row 3:

    | 1  −1   2 |
    | 3  −1  −1 | = 1(1 + 2) − 1(−1 − 6) + a(−1 + 3) = 10 + 2a.
    | 1   1   a |

The system will have a unique solution if and only if |A| ≠ 0, so a ≠ −5. If a = −5
there will either be infinitely many solutions or no solution, depending on the value of b.
To answer the remaining questions, you still need to row reduce the augmented matrix,
but this time you can do it with a = −5:

    [ 1  −1   2 |  4 ]             [ 1  −1   2 |    4   ]
    [ 3  −1  −1 |  0 ]  R2 − 3R1   [ 0   2  −7 |  −12   ] .
    [ 1   1  −5 |  b ]  R3 − R1    [ 0   2  −7 |  b − 4 ]

Comparing the last two rows, you can see that the system will be inconsistent if
b − 4 ≠ −12, that is, if b ≠ −8 and a = −5, and that there will be infinitely many
solutions if b = −8 and a = −5.
(b) If you have successfully solved part (a) of this question, then the easiest way to
solve the system with a = 4 and b = 1 is to substitute these values into the row echelon
form of the augmented matrix and continue reducing:

    [ 1  −1    2    |   4   ]   [ 1  −1   2 |   4  ]       [ 1  −1   2 |   4  ]
    [ 0   2   −7    |  −12  ] = [ 0   2  −7 |  −12 ]  -->  [ 0   2  −7 |  −12 ]
    [ 0   0  a + 5  | b + 8 ]   [ 0   0   9 |   9  ]       [ 0   0   1 |   1  ]

         [ 1  −1  0 |  2  ]       [ 1  −1  0 |   2   ]       [ 1  0  0 | −1/2 ]
    -->  [ 0   2  0 | −5  ]  -->  [ 0   1  0 | −5/2  ]  -->  [ 0  1  0 | −5/2 ] .
         [ 0   0  1 |  1  ]       [ 0   0  1 |   1   ]       [ 0  0  1 |   1  ]

The unique solution is x = (x, y, z)^T = (−1/2, −5/2, 1)^T. (It is easy for you to check that
this is correct by substituting the values into the equations.)
You could also solve this system using the inverse matrix or Cramer’s rule. These are
covered in Chapter 8 of the subject guide. It is a good idea for you to practise these
methods by solving this system to obtain the same answer.
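If you want to confirm your hand calculation after practising, a two-line numerical check is possible (Python/numpy; such software is of course not available in the examination):

    import numpy as np

    # The system of part (b): a = 4, b = 1.
    A = np.array([[1.0, -1.0,  2.0],
                  [3.0, -1.0, -1.0],
                  [1.0,  1.0,  4.0]])
    b = np.array([4.0, 0.0, 1.0])

    print(np.linalg.solve(A, b))  # [-0.5 -2.5  1. ], i.e. (-1/2, -5/2, 1)^T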

(c) Linear independence is covered in Chapter 13 of the subject guide.


A set {x1 , x2 , . . . , xk } of vectors in Rn is linearly dependent if there are real numbers
a1 , a2 , . . . , ak , not all zero, such that

a1 x1 + a2 x2 + · · · + ak xk = 0.

Equivalently, the set {x1 , x2 , . . . , xk } of vectors is linearly dependent if one of the vectors
can be expressed as a linear combination of the others. (Either statement is acceptable).

To show that the set {x1, x2, x3, x4} of vectors in R^4 is linearly dependent, where

    x1 = (1, 2, 1, 4)^T,   x2 = (2, 0, 3, 5)^T,   x3 = (2, 1, 7, 3)^T,   x4 = (2, 5, 6, 6)^T,

you can write the vectors as the columns of a matrix A and row reduce it, thereby
solving the system of equations

Ax = a1 x1 + a2 x2 + a3 x3 + a4 x4 = 0.

The steps are not shown here, but you should show all steps in the examination. The
reduced row echelon form of a matrix is unique, so you should find that

         [ 1  2  2  2 ]                [ 1  0  0   2 ]
    A =  [ 2  0  1  5 ]  --> ... -->   [ 0  1  0  −1 ] .
         [ 1  3  7  6 ]                [ 0  0  1   1 ]
         [ 4  5  3  6 ]                [ 0  0  0   0 ]

From the reduced row echelon form you can deduce that there are infinitely many
solutions, since there is one non-leading variable, and therefore the vectors are linearly
dependent.
To find the linear combination, you can spot the linear dependence relations between
the columns of the reduced row echelon form, and the columns of A will have the same
relationship, namely,
x4 = 2x1 − x2 + x3 .
Or you can find the solution v = (−2, 1, −1, 1)T of Ax = 0 and use it to write down the
relationship between the columns of A, since

Av = −2x1 + x2 − x3 + x4 = 0,

and then solve for x4. Either way, it is easy to check (and you should do this) that your
answer is correct by using the vectors:

    [ 2 ]     [ 1 ]   [ 2 ]   [ 2 ]
    [ 5 ] = 2 [ 2 ] − [ 0 ] + [ 1 ] .
    [ 6 ]     [ 1 ]   [ 3 ]   [ 7 ]
    [ 6 ]     [ 4 ]   [ 5 ]   [ 3 ]

Question 2(a) This question is a good test of your understanding of the material in
Chapters 5, 6 and 9 of the subject guide. If A is an m × n matrix with columns
c1, c2, . . . , cn such that the system of linear equations Ax = d has solution:

        [  1 ]     [ 2 ]     [  1 ]
        [  2 ]     [ 1 ]     [  1 ]
    x = [  0 ] + s [ 1 ] + t [  0 ] = p + s v1 + t v2,
        [ −1 ]     [ 0 ]     [ −1 ]
        [  0 ]     [ 0 ]     [  1 ]


then you should be able to deduce certain properties of the matrix A just by looking at
the solution.
(1) The number of columns, n = 5. Why? Because the solutions, x, are 5 × 1 vectors,
and the multiplication Ax is only defined if A has the same number of columns as x has
rows.
(2) The number m cannot be determined. (But, m ≥ 3 from part (3).)
(3) The rank of A is 3. Essentially, this is deduced from the rank-nullity theorem
which says that rank(A)+nullity(A)=n, where n is the number of columns of A. So the
rank, r is r = n − dim(N (A)). You have also seen that the general solution of Ax = b is
of the form
x = p + a1 v1 + · · · + an−r vn−r
and the given solution is of the form x = p + sv1 + tv2 , so dim(N (A)) = 2 and
r = 5 − 2 = 3.
(4) The two vectors v1 and v2 form a basis of the null space of A, N(A). So {v1, v2} is
a basis, where
    v1 = (2, 1, 1, 0, 0)^T   and   v2 = (1, 1, 0, −1, 1)^T.

(5) To answer this you need a good understanding of how the general solution is
obtained using Gaussian elimination. By looking at the solution, you can tell the
positions of the leading variables and the non-leading variables in the reduced row
echelon form of A. The non-leading variables must be in the third and fifth column
because of the positions of 0 and 1 in the solution vectors, and the leading ones must be
in the first, second and fourth columns. So a basis of the range, R(A), is the set of
vectors {c1 , c2 , c4 }.
(6) From Ap = d, you can deduce that d = c1 + 2c2 − c4 . Any solution x, so any value
of s and t, will also give you a vector such that Ax = d, and so a different linear
combination, but p is the simplest one to use.
(7) In the same way, using Av1 = 0, or Av2 = 0, you obtain the linear combinations
2c1 + c2 + c3 = 0 or c1 + c2 − c4 + c5 = 0.
Again, any linear combination of v1 and v2 can be used.
(b) This is a second-order difference equation, as covered in Chapter 11 of the subject
guide. In standard form, we have xt+1 − √a xt + (a/4) xt−1 = 0, so the auxiliary
equation is z^2 − √a z + (a/4) = 0, which is (z − √a/2)^2 = 0, so there is just one
solution, √a/2. Therefore, for some constants A and B,

    xt = (At + B) (√a/2)^t.

The facts that x0 = −1 and x1 = √a show that A = 3 and B = −1, so

    xt = (3t − 1) (√a/2)^t.
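The closed form is easy to test against the recurrence itself for a particular value of a (a sketch in Python/numpy; a = 2 is an arbitrary choice):

    import numpy as np

    a = 2.0                       # any a > 0 will do for the check
    r = np.sqrt(a) / 2            # the repeated root of the auxiliary equation

    x_prev, x_curr = -1.0, np.sqrt(a)   # x_0 and x_1
    for t in range(1, 10):
        assert np.isclose(x_curr, (3*t - 1) * r**t)
        # advance the recurrence x_{t+1} = sqrt(a) x_t - (a/4) x_{t-1}
        x_prev, x_curr = x_curr, np.sqrt(a)*x_curr - (a/4)*x_prev

    print("the closed form satisfies the recurrence")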

(c) Let yn be the amount of money after the nth withdrawal. Then:
y1 = 20000(1.05) − 500,

y2 = (1.05)y1 − 500 = 20000(1.05)2 − 500(1.05) − 500,
y3 = y2 (1.05) − 500 = 20000(1.05)3 − 500(1.05)2 − 500(1.05) − 500.

Spotting the pattern,

    yN = 20000(1.05)^N − 500(1.05)^{N−1} − 500(1.05)^{N−2} − · · · − 500(1.05) − 500

       = 20000(1.05)^N − 500 · ((1.05)^N − 1) / ((1.05) − 1)

       = 20000(1.05)^N − 10000((1.05)^N − 1)

       = 10000(1.05)^N + 10000.

The question can also be solved using difference equations.
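Simulating the account directly is another good check of the formula (plain Python; N = 10 is an arbitrary choice):

    # Simulate the account of part (c): 5% interest, then a 500 withdrawal, each year.
    balance = 20000.0
    N = 10
    for year in range(N):
        balance = balance * 1.05 - 500

    print(balance)                      # 26288.94...
    print(10000 * 1.05**N + 10000)      # the formula gives the same value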

Question 3(a) This question is covered in Chapters 12, 13 and 14 of the subject guide.
It also relies on understanding lines and planes in R^3 as covered in Chapter 4.

To show that H = { (2t, t, 3t)^T : t ∈ R } is closed under addition, let u, v ∈ H. Then
u = (2t, t, 3t)^T and v = (2s, s, 3s)^T for some s, t ∈ R. Then

            [ 2t ]   [ 2s ]   [ 2t + 2s ]   [ 2(t + s) ]
    u + v = [  t ] + [  s ] = [  t + s  ] = [   t + s  ]  ∈ H
            [ 3t ]   [ 3s ]   [ 3t + 3s ]   [ 3(t + s) ]

since (t + s) ∈ R. Therefore, H is closed under addition.

To show H is closed under scalar multiplication, let u ∈ H and α ∈ R. Then

           [ 2t ]   [ α(2t) ]   [ 2(αt) ]
    αu = α [  t ] = [  αt   ] = [   αt  ]  ∈ H
           [ 3t ]   [ α(3t) ]   [ 3(αt) ]

since (αt) ∈ R. Therefore, H is closed under scalar multiplication.


The set H is non-empty, since the vector 0 ∈ H, as well as the vector v = (2, 1, 3)T .
Since H is also closed under addition and scalar multiplication, it is a subspace.
 
For the next part, let w ∈ H, w = (2s, s, 3s)^T for some constant s ∈ R. You need to find
constants a, b such that

    [ 2s ]     [  1 ]     [ 0 ]
    [  s ] = a [  0 ] + b [ 1 ] ,
    [ 3s ]     [ −1 ]     [ 5 ]

which is equivalent to the simultaneous equations: 2s = a, s = b, 3s = −a + 5b.
Substituting the values of a and b obtained from the first two equations into the third


equation, we find that these values also satisfy the third equation. Therefore the system
has the unique solution a = 2s, b = s, and w = (2s)v1 + (s)v2 .
To answer the remaining questions, it helps for you to see what is going on.
(1) The set {v1, v2} is NOT a basis of the subspace H since v1 ∉ H (and also v2 ∉ H).
A basis of H is {v} where v = (2, 1, 3)^T, and dim(H) = 1.
(2) A Cartesian equation for the subspace G = Lin{v1, v2} is given by

    |  1  0  x |
    |  0  1  y | = x − 5y + z = 0.
    | −1  5  z |

(This can be easily checked by substituting in the components of v1 and v2 , and you
should do this.) The set {v1 , v2 } is a basis of G. It spans as G is, by definition, the set
of all linear combinations of v1 and v2 . It is linearly independent as neither vector is a
scalar multiple of the other.
(Although this is not asked as part of the question, it should be clear to you by now
that, geometrically, H is a line through the origin, that is, H is the set of position
vectors whose endpoints determine a line in R3 .
The set G is a plane through the origin (meaning position vectors of points on the
plane). The line H is contained in the plane G, or algebraically, H is a subspace of G,
which is why every vector in H is a unique linear combination of the basis vectors of G.
Both H and G are subspaces of R3 .)

(b) Linear transformations are covered in Chapter 15 of the subject guide.


If T : V → W is a linear transformation, and dim(V ) = n, then the dimension theorem
states that
dimR(T ) + dimN (T ) = n
or
rank(T ) + nullity(T ) = n
where nullity(T ) is the dimension of N (T ), the kernel, or null space, of T and rank(T )
is the dimension of R(T ), the range of T . Note that you must specifically say what n
represents; that is, n = dimV .

Let {e1, e2, e3, e4} be the standard basis of R^4, and v1, v2, v3, x the vectors

    v1 = (1, 0, −2)^T,   v2 = (2, 3, −1)^T,   v3 = (−1, 5, 7)^T,   x = (x, y, z)^T,

and let T be the linear transformation, T : R^4 → R^3, given by

    T(e1) = v1,   T(e2) = v2,   T(e3) = v3,   T(e4) = x.

Then T is given by T(x) = Ax where A is a 3 × 4 matrix. The simplest way to answer
the questions is to construct this matrix, whose columns are the images of the standard
basis vectors, T(ei):

         [  1   2  −1  x ]
    A =  [  0   3   5  y ] .
         [ −2  −1   7  z ]

In order to consider the two possibilities in parts (i) and (ii), row reduce this matrix,
beginning with R3 + 2R1:

          [ 1  2  −1     x    ]        [ 1  2  −1       x      ]
    A --> [ 0  3   5     y    ]  -->   [ 0  3   5       y      ] .
          [ 0  3   5  z + 2x  ]        [ 0  0   0  z + 2x − y  ]

(i) By the dimension theorem, since T : R4 → R3 , n = 4, so for the dimensions of R(T )


and N (T ) to be equal, the subspaces must both have dimension 2. Looking at the
reduced form of the matrix, we see that this will happen if

2x − y + z = 0.

If the vector x satisfies this condition, then a basis of R(T ) is given by the columns of A
corresponding to the leading ones in the row echelon form, which will be the first two
columns. So a basis of R(T ) is {v1 , v2 }.
You could also approach this question by first deducing from the dimension theorem
that the dimR(T ) = 2 as above, so R(T ) is a plane in R3 . Therefore {v1 , v2 } is a basis,
since these two vectors are linearly independent (because they are not scalar multiples)
and they span a plane whose Cartesian equation is given by

    |  1   2  x |
    |  0   3  y | = 6x − 3y + 3z = 0,
    | −2  −1  z |

or 2x − y + z = 0. The components of the vector v3 satisfy this equation, and this is the
condition that the components of x must satisfy.

(ii) If the linear transformation has dim N(T) = 1, then by the dimension theorem, you
know that dim R(T) = 3 (therefore R(T) = R^3), so the echelon form of the matrix A
needs to have 3 leading ones. Therefore the condition that the components of x must
satisfy is
    2x − y + z ≠ 0.

Now continue with row reducing the matrix A to obtain a basis for N(T). The row
echelon form of A will have a leading one in the last column (first multiply the last row
by 1/(2x − y + z) to get this leading one, then continue to reduced echelon form):

                   [ 1  2  −1  0 ]       [ 1  2   −1   0 ]       [ 1  0  −13/3  0 ]
    A --> ... -->  [ 0  3   5  0 ]  -->  [ 0  1  5/3   0 ]  -->  [ 0  1   5/3   0 ] ,
                   [ 0  0   0  1 ]       [ 0  0    0   1 ]       [ 0  0    0    1 ]


so a basis of N(T) is given by the vector w = (13/3, −5/3, 1, 0)^T, or any non-zero
scalar multiple of this, such as (13, −5, 3, 0)^T.
Question 4 The null space of a matrix is introduced in Chapter 6 of the subject guide.
Diagonalisation and its application to systems of difference equations are found in
Chapters 17 and 18, respectively.
To find a basis of the null space of the matrix A, put it into reduced row echelon form
using the algorithm. The steps are not shown, but you should be able to carry them out
efficiently and accurately, and you should show all the steps in the examination.

         [  −1  −2  −1 ]                [ 1  0  −1 ]
    A =  [   4  −4  −8 ]  --> ... -->   [ 0  1   1 ] .
         [ −13  −2  11 ]                [ 0  0   0 ]

You can read the solution of the homogeneous system Ax = 0 from the reduced echelon
form of the matrix, setting z = t, t ∈ R, to obtain the general solution

    x = t (1, −1, 1)^T = t v1,   t ∈ R.

The vector v1 = (1, −1, 1)T is a basis of the null space.


Since Av1 = 0 = 0v1 , the vector v1 is an eigenvector of A corresponding to the
eigenvalue λ1 = 0. (This statement invokes the definition of eigenvalue and eigenvector,
namely that Av = λv for some v ≠ 0.)
To diagonalise the matrix A, you first need to find the remaining eigenvalues by solving
|A − λI| = 0. The characteristic equation is

    |A − λI| = −λ^3 + 6λ^2 + 72λ = −λ(λ + 6)(λ − 12) = 0.

Again, the steps are not shown here, but you should show them all in an examination.
You need to expand the determinant slowly and carefully to avoid errors. The
eigenvalues are λ1 = 0, λ2 = −6, λ3 = 12.
Next solve (A − λI)v = 0 for each of the other two eigenvalues. In each case the
reduced echelon form of the matrix, (A − λI) should contain a row of zeros, so that
there is a non-trivial solution giving the corresponding eigenvector. This checks that the
eigenvalues are correct. If the reduced echelon form of (A − λI) does not contain a row
of zeros, then you need to find your error. This may be in the row reduction, or it may
be in your characteristic equation or factorising. One quick way to check whether your
eigenvalue is correct is to substitute it into |A − λI| and see if you do get zero when you
evaluate the determinant.
Having solved (A − λI)v = 0 for each of λ2 and λ3 , you should have that the

270
B
corresponding eigenvectors are multiples of

    v2 = (1, 2, 1)^T   and   v3 = (0, −1, 2)^T,

respectively. Again, all work should be shown.
At this stage, you should check that the eigenvectors are correct. Form a matrix P
whose columns are the eigenvectors and the diagonal matrix D with the corresponding
eigenvalues on the diagonal:

         [  1  1   0 ]        [ 0   0   0 ]
    P =  [ −1  2  −1 ] ,  D = [ 0  −6   0 ] .
         [  1  1   2 ]        [ 0   0  12 ]
Now check that AP = P D by multiplying out the matrices AP and P D.
You know that P is invertible since eigenvectors corresponding to distinct eigenvalues
are linearly independent. Therefore, you can conclude that P −1 AP = D. Having
checked the eigenvalues and eigenvectors, you do not need to compute P −1 AP explicitly
to determine D; you can simply state the result because of the underlying theory.
Use this diagonalisation to determine the sequences (xn ), (yn ), (zn ) which have the
following properties:
xn+1 = −xn − 2yn − zn
yn+1 = 4xn − 4yn − 8zn
zn+1 = −13xn − 2yn + 11zn
and which satisfy the initial conditions x0 = y0 = 1 and z0 = 0.
Denoting by xn the vector (xn , yn , zn )T , this can be expressed as xn+1 = Axn , for which
the solution is given by
    xn = A^n x0 = P D^n P^{-1} x0.
Using the adjoint method (cofactors), or any other method, find P −1 , and immediately
check that the inverse is correct by showing P P −1 = I.
Then using the initial conditions,

                     [  5  −2  −1 ] [ 1 ]   [  1/2 ]
    P^{-1} x0 = 1/6  [  1   2   1 ] [ 1 ] = [  1/2 ] ,
                     [ −3   0   3 ] [ 0 ]   [ −1/2 ]

so that, for n ≥ 1,

         [ xn ]   [  1  1   0 ] [ 0     0       0   ] [  1/2 ]
    xn = [ yn ] = [ −1  2  −1 ] [ 0  (−6)^n     0   ] [  1/2 ] .
         [ zn ]   [  1  1   2 ] [ 0     0     12^n  ] [ −1/2 ]
The solution is

    xn = (1/2)(−6)^n
    yn = (−6)^n + (1/2)(12)^n
    zn = (1/2)(−6)^n − (12)^n.


(The answer can be checked by finding x1 both from the original equations and from
the solution. If you have time, you might want to do this.)
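If you do want a fuller check than computing x1 by hand, a short sketch (Python/numpy, not examinable) compares the solution with direct iteration:

    import numpy as np

    A = np.array([[ -1, -2, -1],
                  [  4, -4, -8],
                  [-13, -2, 11]])
    x = np.array([1, 1, 0])

    for n in range(1, 6):
        x = A @ x  # iterate the system
        closed_form = np.array([0.5*(-6)**n,
                                (-6)**n + 0.5*12**n,
                                0.5*(-6)**n - 12**n])
        assert np.allclose(x, closed_form)

    print("solution confirmed for n = 1, ..., 5")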
Question 5(a) Solving a linear system of equations by putting the augmented matrix
into reduced echelon form is an application of the basic material in Chapter 5 of the
subject guide. The vector form is specifically emphasised in Chapter 9.
You should begin this question as instructed, by writing down the augmented matrix
and putting it into reduced row echelon form. Do this carefully to avoid errors:

             [  1  4  5  3  2 | 11 ]        [ 1  4   5  3  2 | 11 ]
    (A|b) =  [  0  2  4  2  2 |  2 ]  -->   [ 0  1   2  1  1 |  1 ]  -->
             [ −1  1  5  0  1 |  6 ]        [ 0  5  10  3  3 | 17 ]

    [ 1  4  5   3   2 | 11 ]        [ 1  4  5  3  2 | 11 ]
    [ 0  1  2   1   1 |  1 ]  -->   [ 0  1  2  1  1 |  1 ]  -->
    [ 0  0  0  −2  −2 | 12 ]        [ 0  0  0  1  1 | −6 ]

    [ 1  4  5  0  −1 | 29 ]        [ 1  0  −3  0  −1 |  1 ]
    [ 0  1  2  0   0 |  7 ]  -->   [ 0  1   2  0   0 |  7 ] .
    [ 0  0  0  1   1 | −6 ]        [ 0  0   0  1   1 | −6 ]
Check that you do have the reduced row echelon form; find the columns with leading
ones and make sure they have zeros elsewhere (above and below). As the question
specifically asks you to put the matrix into reduced row echelon form, if you stop at row
echelon form and use back substitution, you will not earn full marks, and you are also
less likely to obtain the correct answer.
You can ‘read’ the solution from the reduced echelon form. Assign parameters, say s
and t, to the non-leading variables x3 and x5, and write down the other variables in
terms of these using the equations deduced from the matrix. The general solution is

        [ x1 ]   [ 1 + 3s + t ]   [  1 ]     [  3 ]     [  1 ]
        [ x2 ]   [   7 − 2s   ]   [  7 ]     [ −2 ]     [  0 ]
    x = [ x3 ] = [      s     ] = [  0 ] + s [  1 ] + t [  0 ] ,
        [ x4 ]   [   −6 − t   ]   [ −6 ]     [  0 ]     [ −1 ]
        [ x5 ]   [      t     ]   [  0 ]     [  0 ]     [  1 ]

that is, x = p + s v1 + t v2, s, t ∈ R.

To answer the questions concerning the column vectors, you need to understand the
material in Chapters 9, 13 and 14 of the subject guide.
The columns of the reduced row echelon form of a matrix satisfy the same dependency
relations as the columns of the matrix. From the reduced row echelon form of A, you
can see that
    c3 = −3c1 + 2c2.
Indeed, this also follows from Av1 = 0, and you can, and should, check that it is correct:

    [ 5 ]      [  1 ]     [ 4 ]
    [ 4 ] = −3 [  0 ] + 2 [ 2 ] .
    [ 5 ]      [ −1 ]     [ 1 ]

To answer the next part, it is enough to say that in the reduced echelon form of A, the
columns with the leading ones correspond to the vectors c1 , c2 and c4 . Therefore these
vectors are linearly independent. (The reduced row echelon form of a matrix C
consisting of these three column vectors would have a leading one in every column, so
Cx = 0 has only the trivial solution.)

To conclude that B is a basis of R3 , you can state that B is a set of three linearly
independent vectors in a three-dimensional vector space, R3 , therefore B is a basis of
R3 . It is not sufficient to merely say that the vectors are linearly independent and span,
you would need to give a reason why they span R3 . (For example, by stating that there
is a leading one in every row, so Ax = b has a solution for all b ∈ R3 .)

From the solution, Ap = b, you have b = c1 + 7c2 − 6c4 . You should recognise that this
expresses b as a linear combination of the basis vectors, and the coefficients are the
coordinates of b in this basis, B. That is,

    [b]B = (1, 7, −6)^T.

(b) This part of the question continues with the material on the basis of a vector space
contained in Chapter 14. The material on changing basis is in Chapter 16.
To show that S = {c1, c3, c4} is also a basis of R^3, you can calculate the determinant of
the matrix with these vectors as columns:

    |  1  5  3 |
    |  0  4  2 | = 1(−10) − 1(10 − 12) = −8 ≠ 0.
    | −1  5  0 |

Since the determinant is non-zero, this implies that S is a basis of R^3.


(This statement answers the question, and is all that is required here, but you should
understand why it is true. The relationship of these concepts is covered in Chapter 8
and Chapters 14 and 15 of the subject guide. If Q denotes the matrix with column
vectors c1, c3, c4, then |Q| ≠ 0 implies that Q^{-1} exists, so that a system of equations
Qx = b has a unique solution for all b ∈ R^3. This implies both that the column vectors
are linearly independent and that they span R^3. The same argument follows by reducing
the matrix Q to echelon form and showing that there are three leading ones.)
You can find P by using the transition matrix M from B coordinates to standard and
the transition matrix Q from S coordinates to standard:

         [  1  4  3 ]        [  1  5  3 ]
    M =  [  0  2  2 ] ,  Q = [  0  4  2 ] .
         [ −1  1  0 ]        [ −1  5  0 ]

If you recall that

    v = M[v]B   and   w = Q[w]S,

then to change from coordinates in the basis B to coordinates in the basis S, you need

    [v]S = Q^{-1} M [v]B.


So Q^{-1}M is the transition matrix from B coordinates to S coordinates. The easiest way
to find Q^{-1} is using the cofactor method. Then

                    [ −10   15  −2 ] [  1  4  3 ]   [ 1  3/2  0 ]
    Q^{-1}M = −1/8  [  −2    3  −2 ] [  0  2  2 ] = [ 0  1/2  0 ] = P.
                    [   4  −10   4 ] [ −1  1  0 ]   [ 0   0   1 ]

You can find the S coordinates of b using this matrix and [b]B from part (a),

           [ 1  3/2  0 ] [  1 ]   [ 23/2 ]
    [b]S = [ 0  1/2  0 ] [  7 ] = [  7/2 ] ,
           [ 0   0   1 ] [ −6 ]   [  −6  ]

which you can easily check. Or, you can find the S coordinates directly from the basis S
by solving b = ac1 + bc3 + cc4 for a, b, c using Gaussian elimination or by using the
inverse matrix, Q−1 , which you found above.
You can also do this using the results of part (a). You know that b = 1c1 + 7c2 − 6c4
and c3 = −3c1 + 2c2 . If you solve the latter equation for c2 and substitute into the
equation for b, you will obtain the vector b as a linear combination of c1 , c3 , c4 , and
hence the coordinates of b in this basis.
This idea can be used to gain a better understanding of the matrix P. Notice the simple
form of the transition matrix P from B coordinates to S coordinates. If you have a
vector expressed as a linear combination of the basis vectors of B and as a linear
combination of the basis vectors of S, then the coefficients of the first and last vectors
will be the same in either basis, since the first and last basis vectors are the same. Only
the middle vector is different. Therefore, P will be of the form

        [ 1  a  0 ]
    P = [ 0  b  0 ] .
        [ 0  c  1 ]

To change from a linear combination of c1, c2, c4 to a linear combination of c1, c3, c4,
you just need to know how to express c2 as a linear combination of the S basis vectors;
that is, you need the coordinates of the vector c2 in the basis S. Using the result of part
(a), that c3 = −3c1 + 2c2, as you did above, and solving for c2, you will obtain
c2 = (3/2)c1 + (1/2)c3. Therefore,

        [ 1  3/2  0 ]
    P = [ 0  1/2  0 ] .
        [ 0   0   1 ]
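All of part (b) can also be checked numerically in a few lines (a sketch in Python/numpy, not examinable):

    import numpy as np

    # Columns c1, c2, c3, c4 of the matrix A in Question 5.
    c1 = np.array([1.0, 0.0, -1.0])
    c2 = np.array([4.0, 2.0,  1.0])
    c3 = np.array([5.0, 4.0,  5.0])
    c4 = np.array([3.0, 2.0,  0.0])

    M = np.column_stack([c1, c2, c4])   # B coordinates to standard coordinates
    Q = np.column_stack([c1, c3, c4])   # S coordinates to standard coordinates

    P = np.linalg.solve(Q, M)           # Q^{-1} M, the transition matrix from B to S
    print(np.round(P, 6))               # [[1.  1.5 0. ] [0.  0.5 0. ] [0.  0.  1. ]]

    b_B = np.array([1.0, 7.0, -6.0])
    print(P @ b_B)                      # [11.5  3.5 -6. ], i.e. [b]_S = (23/2, 7/2, -6)^T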

Comment form
We welcome any comments you may have on the materials which are sent to you as part of your
study pack. Such feedback from students helps us in our effort to improve the materials produced
for the International Programmes.
If you have any comments about this guide, either general or specific (including corrections,
non-availability of Essential readings, etc.), please take the time to complete and return this form.

Title of this subject guide:

Name
Address

Email
Student number
For which qualification are you studying?

Comments

Please continue on additional sheets if necessary.

Date:

Please send your completed form (or a photocopy of it) to:


Publishing Manager, Publications Office, University of London International Programmes,
Stewart House, 32 Russell Square, London WC1B 5DN, UK.
