
Linear Algebra

Notes for MTHA4006Y and MTHB5008A


Sinéad Lyle

Autumn Semester 2020

Contents
1 Matrices
  1.1 First definitions
  1.2 Addition, scalar multiplication and transpose
  1.3 Matrix multiplication
  1.4 Combining operations

2 Matrix inverses
  2.1 Matrix inverses
  2.2 Determinants
  2.3 Determinants and inverses of matrices
  2.4 Powers of square matrices

3 Simultaneous Linear Equations
  3.1 Lines and Planes
  3.2 Systems of linear equations
  3.3 Matrix form of a system of linear equations
  3.4 Cramer’s Rule

4 Solving systems of linear equations
  4.1 Solving systems of linear equations I
  4.2 Solving systems of equations II
  4.3 Row rank and systems of equations

5 More on matrix inverses
  5.1 Elementary matrices
  5.2 Inverting a matrix using elementary row operations

6 Eigenvalues and eigenvectors
  6.1 Matrices and Rn
  6.2 Eigenvalues and eigenvectors

A Appendix
  A.1 Inverses and Determinants
  A.2 Row reduction

1 Matrices
In this Section we will

• Define matrices.
• Define matrix addition, scalar multiplication, matrix multiplication and transposition.
• Look at different types of matrices: zero, square, diagonal, symmetric, upper / lower-triangular, identity.

In their simplest interpretation, matrices are arrays of numbers used to hold information in an orderly way. They
are a useful way to store data that can be organised into rows and columns. Can you think of everyday
examples?

These arrays can be thought of as ‘multi-dimensional’ real numbers and we can define matrix addition, scalar
multiplication and matrix multiplication in much the same way as for ‘ordinary one-dimensional’ real numbers.
(In fact we can think about real numbers as one-dimensional matrices.) An advantage of using matrices is
precisely this; a multi-dimensional block of data can be handled and manipulated as a single entity. Many
problems that involve large amounts of data are reduced to matrix problems before being solved on a computer.
Think about modelling, computer graphics, medical imaging, economics, electrical networks, data encryption,
etc... These all rely on the mathematics of matrices.

1.1 First definitions


Definition 1.1. Let m, n ∈ N. For 1 ≤ i ≤ m and 1 ≤ j ≤ n, let aij ∈ R. Then an array of numbers in the shape
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} = (a_{ij})$$
is a matrix of m rows and n columns, or for short, an m × n matrix. We write
$$M_{m,n}(\mathbb{R}) = \{ A = (a_{ij}) : a_{ij} \in \mathbb{R},\ 1 \le i \le m,\ 1 \le j \le n \}$$
to represent the set of all m × n matrices A with entries in R.

• The real number aij represents the (i, j)-entry of A, that is, the entry in row i and column j.
• The ith row of A is represented by (ai1 · · · ain) for i = 1, . . . , m.
• The jth column of A is represented by
$$\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}$$
for j = 1, . . . , n.

Examples.
$$\begin{pmatrix} 7 & -1 & 0 & 1 \\ 2 & 0 & 1 & -1 \end{pmatrix} \in M_{2,4}(\mathbb{R}), \qquad \begin{pmatrix} \pi \\ -1 \\ 4 \\ 0 \end{pmatrix} \in M_{4,1}(\mathbb{R}).$$

Definition 1.2. Let A = (aij ), B = (bij ) ∈ Mm,n (R).

• A and B are equal if all their corresponding entries agree, that is aij = bij for all i = 1, . . . , m and
j = 1, . . . , n. We write A = B.
• The negative of matrix A is the matrix −A = (−aij ).

• The zero matrix, 0m,n , is the m × n matrix in which all entries are 0. We sometimes write 0n for 0n,n
or just 0 if we don’t need to know the size.

Definition 1.3. A matrix A is called square if it has the same number of rows and columns. The diagonal of a
square n × n matrix A = (aij ) consists of the entries aii where 1 ≤ i ≤ n, that is, the entries on the
north-west to south-east diagonal.
Definition 1.4. The following definitions only apply to square matrices.

• A diagonal matrix is a square matrix A = (aij ) such that aij = 0 for i ≠ j, that is, each of whose
non-diagonal entries is equal to zero.
• A square matrix with each entry below the main diagonal equal to zero is an upper-triangular matrix.
• A square matrix with each entry above the main diagonal equal to zero is a lower-triangular matrix.
• A square matrix is symmetric if it is unchanged by reflection in the diagonal, that is, if aij = aji for
1 ≤ i, j ≤ n.

Examples. Let
$$A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}, \quad C = \begin{pmatrix} 2 & 0 & 0 \\ -1 & 4 & 0 \\ 5 & -2 & 1 \end{pmatrix}, \quad D = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 8 \end{pmatrix}, \quad E = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 4 & 6 \\ 0 & 6 & 1 \end{pmatrix}.$$
Then A and D are both diagonal, upper-triangular, lower-triangular and symmetric. B is upper-triangular and
C is lower-triangular. E is symmetric.

1.2 Addition, scalar multiplication and transpose


Definition 1.5. Given two matrices A = (aij ), B = (bij ) ∈ Mm,n (R) we can define their sum as follows. If
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} & \cdots & b_{1n} \\ b_{21} & b_{22} & b_{23} & \cdots & b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ b_{m1} & b_{m2} & b_{m3} & \cdots & b_{mn} \end{pmatrix}$$
then
$$A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & a_{m3} + b_{m3} & \cdots & a_{mn} + b_{mn} \end{pmatrix} = (a_{ij} + b_{ij}).$$
Or more succinctly, let A + B = D = (dij ) where dij = aij + bij for all i = 1, 2, . . . , m and j = 1, 2, . . . , n.

Examples.
• Given two matrices A = (aij ), B = (bij ) ∈ M2,3 (R), we define their sum:
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \quad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix}, \quad A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \end{pmatrix}.$$
• We have
$$\begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \\ -2 & 5 \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 5 & -2 \\ 9 & 4 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 5 \\ 8 & -2 \\ 0 & 11 \\ -4 & 10 \end{pmatrix}.$$
• The sum
$$\begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \\ -2 & 5 \end{pmatrix} + \begin{pmatrix} 3 & 1 & 4 \\ -9 & 7 & 0 \end{pmatrix}$$
makes no sense.

Remark. Only matrices of the same size can be added together.
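The entrywise rule dij = aij + bij and the same-size restriction can be checked with a short NumPy sketch (NumPy is used purely for illustration and is not part of these notes; its broadcasting rules actually allow some mixed-shape sums, so the size check below is done by hand):

```python
import numpy as np

# Entrywise sum from Definition 1.5: d_ij = a_ij + b_ij, for equal-sized matrices.
A = np.array([[1, 4], [3, 0], [-9, 7], [-2, 5]])
B = np.array([[0, 1], [5, -2], [9, 4], [-2, 5]])
print(A + B)  # the 4 x 2 sum worked out above

# A 4 x 2 plus a 2 x 3 makes no sense; the shapes are incompatible.
C = np.array([[3, 1, 4], [-9, 7, 0]])
if A.shape != C.shape:
    print("undefined: shapes", A.shape, "and", C.shape, "differ")
```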

Theorem 1.6. For all matrices A, B and C of the same size and the zero matrix 0 of that size, that
is, A, B, C, 0 ∈ Mm,n (R) for some m, n, we have

• A + B ∈ Mm,n (R) (Closure),


• A + B = B + A (Commutative),
• A + (B + C) = (A + B) + C (Associative),
• A + 0 = 0 + A = A (Additive Identity),
• A + (−A) = (−A) + A = 0 (Additive Inverse).

Proof. These properties all depend on corresponding properties in R. For example, suppose r, s, t ∈ R. Then
r + s = s + r and (r + s) + t = r + (s + t). We will give the first two proofs. You should attempt the other
proofs yourself.

• The fact that A + B is an m × n matrix follows from the definition.


• Let D = A + B and E = B + A. We want to show that D = E. They are both m × n matrices, so to
show they are the same, we need to show that the entry in row i and column j of D is the same as the
entry in row i and column j of E (for all 1 ≤ i ≤ m and 1 ≤ j ≤ n). Write D = (dij ) and E = (eij ) so
that dij is the entry in row i and column j of D and eij is the entry in row i and column j of E. Then
looking at the definitions, if 1 ≤ i ≤ m and 1 ≤ j ≤ n then

dij = aij + bij    by definition
    = bij + aij    by the properties of R
    = eij          by definition.

This shows that D is indeed equal to E.

Definition 1.7. Let A ∈ Mm,n (R) and let r ∈ R. If
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} \end{pmatrix}$$
then we define the scalar multiple of A as
$$rA = \begin{pmatrix} ra_{11} & ra_{12} & ra_{13} & \cdots & ra_{1n} \\ ra_{21} & ra_{22} & ra_{23} & \cdots & ra_{2n} \\ \vdots & \vdots & \vdots & & \vdots \\ ra_{m1} & ra_{m2} & ra_{m3} & \cdots & ra_{mn} \end{pmatrix} = (ra_{ij}).$$

Examples.
• Given a matrix A = (aij ) ∈ M2,3 (R) and a scalar r ∈ R we can define the scalar multiple of A as
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \quad r \in \mathbb{R}, \quad rA = \begin{pmatrix} ra_{11} & ra_{12} & ra_{13} \\ ra_{21} & ra_{22} & ra_{23} \end{pmatrix}.$$
• We have
$$3\begin{pmatrix} 1 & 4 \\ 3 & 0 \\ -9 & 7 \\ -2 & 5 \end{pmatrix} = \begin{pmatrix} 3 & 12 \\ 9 & 0 \\ -27 & 21 \\ -6 & 15 \end{pmatrix}.$$

Theorem 1.8. For all matrices A and B of the same size (that is, A, B ∈ Mm,n (R)) and for all scalars r, s ∈ R,
we have
• rA ∈ Mm×n (R) (Closure),
• (rs)A = r(sA) (Associative),
• (r + s)A = rA + sA (Distributive),
• r(A + B) = rA + rB (Distributive).
Proof. Again, these properties follow from the properties of the real numbers. The first property follows from
the definition. We will prove the last one here. You should try and prove the others yourself.
Let A = (aij ), B = (bij ) ∈ Mm,n (R) and r ∈ R. Let D = r(A + B) and E = rA + rB. Now A + B ∈ Mm,n (R)
by Theorem 1.6 and hence r(A + B) ∈ Mm,n (R) by the first part of Theorem 1.8. To show D and E are the
same, we need to show that the entry in row i and column j of D is the same as the entry in row i and column
j of E (for 1 ≤ i ≤ m and 1 ≤ j ≤ n). Write D = (dij ) and E = (eij ) so that dij is the entry in row i
and column j of D and eij is the entry in row i and column j of E. The (i, j)-entry of A + B is aij + bij so
the (i, j)-entry of D is r(aij + bij ). The (i, j)-entry of rA is raij and the (i, j)-entry of rB is rbij , hence the
(i, j)-entry of rA + rB is raij + rbij . But by the properties of real numbers, r(aij + bij ) = raij + rbij . Hence
D = E as required.

Definition 1.9. The transpose of an m × n matrix A = (aij ) is the n × m matrix AT whose (i, j)th entry is
the (j, i)th entry of A. That is AT = (aji ).
Example. The matrix
$$A = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 0 \end{pmatrix}$$
has transpose
$$A^T = \begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 3 & 0 \end{pmatrix}.$$
Theorem 1.10. Let A, B ∈ Mm×n (R). Then the following results hold:
• (AT )T = A,
• (A + B)T = AT + B T .
Proof. The first identity follows from the definition. Let A = (aij ) and B = (bij ) and set D = AT + B T
and E = (A + B)T . Then AT , B T ∈ Mn,m (R) so AT + B T ∈ Mn,m (R). Also, A + B ∈ Mm,n (R) so
(A + B)T ∈ Mn,m (R). Now let 1 ≤ k ≤ n and 1 ≤ l ≤ m. We want to show that the (k, l)-entry of D is equal
to the (k, l)-entry of E. The (k, l)-entry of AT is alk and the (k, l)-entry of B T is blk . So the (k, l)-entry of
D is alk + blk . The (k, l)-entry of E = (A + B)T is the (l, k)-entry of A + B, that is alk + blk . So indeed
D = E.

Remark. Using this definition, we see that a matrix A is symmetric if and only if AT = A. This is often taken
to be the definition of a symmetric matrix.
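Theorem 1.10 and the criterion A^T = A are easy to confirm numerically; this NumPy snippet (an illustration, not part of the notes; B is an arbitrary matrix of the same size as A) reuses the symmetric matrix E from the earlier examples:

```python
import numpy as np

# Theorem 1.10 and the symmetry criterion A^T = A, checked on sample matrices.
A = np.array([[0, 2, 3], [4, 1, 0]])
B = np.array([[1, 1, 1], [0, 0, 0]])
assert (A.T.T == A).all()              # (A^T)^T = A
assert ((A + B).T == A.T + B.T).all()  # (A + B)^T = A^T + B^T

E = np.array([[1, -1, 0], [-1, 4, 6], [0, 6, 1]])  # E from the earlier examples
assert (E.T == E).all()                # E is symmetric
```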

1.3 Matrix multiplication


Example. Given A ∈ M2,3 (R) and B ∈ M3,2 (R) we can define the matrix product AB ∈ M2,2 (R) as follows:
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{pmatrix}, \qquad AB = \begin{pmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{pmatrix},$$
where the dij are calculated as follows:
$$d_{11} = a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31}, \qquad d_{12} = a_{11}b_{12} + a_{12}b_{22} + a_{13}b_{32},$$
$$d_{21} = a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31}, \qquad d_{22} = a_{21}b_{12} + a_{22}b_{22} + a_{23}b_{32}.$$

Definition 1.11. In general, suppose that A = (aij ) and B = (bij ) are matrices with A ∈ Mm,n (R) and
B ∈ Mn,p (R). We define the matrix product AB to be the m × p matrix where the entry in row i and column
j is given by
$$\sum_{k=1}^{n} a_{ik} b_{kj}.$$

Remark. It is absolutely essential that the sizes are compatible. The number of columns of A must be equal
to the number of rows of B. In the first example in this section, A is a 2 × 3 matrix and B is a 3 × 2 matrix,
yielding AB which is a 2 × 2 matrix. If the sizes are not compatible, the product is undefined.

Examples. Let
$$A = \begin{pmatrix} -2 & 2 & 3 \\ -1 & 1 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & -1 \end{pmatrix}.$$
Then A is a 2 × 3 matrix and B is a 1 × 2 matrix.

• The matrix D = AB is not defined. A ∈ M2,3 (R) and B ∈ M1,2 (R) so that A has three columns, but B
has one row.
• The matrix D = BA is defined, as B ∈ M1,2 (R) and A ∈ M2,3 (R) so that the number of columns of B
is equal to the number of rows of A. Since B is a 1 × 2 matrix and A is a 2 × 3 matrix, BA is a 1 × 3
matrix and
$$BA = \begin{pmatrix} 2 & -1 \end{pmatrix}\begin{pmatrix} -2 & 2 & 3 \\ -1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} -3 & 3 & 6 \end{pmatrix}.$$
The working behind this is as follows:
$$d_{11} = b_{11}a_{11} + b_{12}a_{21} = 2 \times (-2) + (-1) \times (-1) = -3,$$
$$d_{12} = b_{11}a_{12} + b_{12}a_{22} = 2 \times 2 + (-1) \times 1 = 3,$$
$$d_{13} = b_{11}a_{13} + b_{12}a_{23} = 2 \times 3 + (-1) \times 0 = 6.$$

A visualisation
The following diagram can help us picture our calculations for matrix multiplication. We can interpret the
coefficient dij as the matrix product of row i of A ∈ Mm,n (R) and column j of B ∈ Mn,p (R):
$$\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{i1} & \cdots & a_{in} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} b_{11} & \cdots & b_{1j} & \cdots & b_{1p} \\ \vdots & & \vdots & & \vdots \\ b_{n1} & \cdots & b_{nj} & \cdots & b_{np} \end{pmatrix} = \begin{pmatrix} * & \cdots & * & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & d_{ij} & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & * & \cdots & * \end{pmatrix}$$
where dij = ai1 b1j + . . . + ain bnj and D = AB = (dij ) ∈ Mm,p (R).

Examples.

• Repeating the previous example, we can use the diagram to read off
$$BA = \begin{pmatrix} 2 & -1 \end{pmatrix}\begin{pmatrix} -2 & 2 & 3 \\ -1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} -3 & 3 & 6 \end{pmatrix}.$$
• Suppose we want to find $\begin{pmatrix} 4 & -1 \\ -8 & 5 \end{pmatrix}\begin{pmatrix} 3 & -2 \\ 4 & 1 \end{pmatrix}$. We use the diagram again:
$$\begin{pmatrix} 4 & -1 \\ -8 & 5 \end{pmatrix}\begin{pmatrix} 3 & -2 \\ 4 & 1 \end{pmatrix} = \begin{pmatrix} 8 & -9 \\ -4 & 21 \end{pmatrix}.$$

Theorem 1.12. The following properties hold for matrix multiplication.

• Let A ∈ Mm×n (R), B ∈ Mn×p (R) and C ∈ Mp×q (R). Then (AB)C = A(BC), an m × q matrix
(Associative),
• Let A ∈ Mm×n (R) and B, C ∈ Mn×p (R). Then A(B + C) = AB + AC, an m × p matrix (Distributive),
• Let A, B ∈ Mm×n (R) and C ∈ Mn×p (R). Then (A + B)C = AC + BC, an m × p matrix (Distributive).

Proof. We prove the first one and leave the rest as an exercise. Suppose that A = (aij ), B = (bij ) and C = (cij ).
We have AB ∈ Mm×p (R) so (AB)C ∈ Mm×q (R), and BC ∈ Mn×q (R) so that A(BC) ∈ Mm×q (R). Let
F = (fij ) = AB and G = (gij ) = BC. We want to find the (i, j)-entry of A(BC) and of (AB)C, for
1 ≤ i ≤ m and 1 ≤ j ≤ q.
The (i, j)-entry of (AB)C = F C is given by
$$\sum_{k=1}^{p} f_{ik} c_{kj} = \sum_{k=1}^{p} \left( \sum_{l=1}^{n} a_{il} b_{lk} \right) c_{kj}.$$
The (i, j)-entry of A(BC) = AG is given by
$$\sum_{l=1}^{n} a_{il} g_{lj} = \sum_{l=1}^{n} a_{il} \left( \sum_{k=1}^{p} b_{lk} c_{kj} \right) = \sum_{k=1}^{p} \sum_{l=1}^{n} (a_{il} b_{lk}) c_{kj}.$$
Hence the matrices are the same.

Remark. Matrix multiplication is not commutative! In general, AB ≠ BA. In fact AB could be a different
size from BA. But even if AB and BA are the same size, it is unlikely that they are equal.
Examples.

• Let
$$A = \begin{pmatrix} 1 \\ 4 \\ 4 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 1 & 7 & 9 \end{pmatrix}.$$
Then
$$AB = \begin{pmatrix} 1 & 7 & 9 \\ 4 & 28 & 36 \\ 4 & 28 & 36 \end{pmatrix} \quad\text{while}\quad BA = (65).$$
• Let
$$A = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 0 & 1 \\ -1 & -1 \end{pmatrix}.$$
Then
$$AB = \begin{pmatrix} -4 & -1 \\ -2 & -1 \end{pmatrix} \quad\text{while}\quad BA = \begin{pmatrix} 1 & 2 \\ -4 & -6 \end{pmatrix}.$$

Remark. Another interesting property is that it is perfectly possible to have AB = 0, the zero matrix, without
having A = 0 or B = 0.

Example. Let $A = \begin{pmatrix} 1 & 2 \\ -1 & -2 \end{pmatrix}$ and $B = \begin{pmatrix} 4 & 6 \\ -2 & -3 \end{pmatrix}$. Then $AB = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$ and $BA = \begin{pmatrix} -2 & -4 \\ 1 & 2 \end{pmatrix}$.

Observe that if A is an m × n matrix then AT is an n × m matrix, so AAT is an m × m matrix and AT A is
an n × n matrix.
Example. The matrix
$$A = \begin{pmatrix} 0 & 2 & 3 \\ 4 & 1 & 0 \end{pmatrix}$$
has transpose
$$A^T = \begin{pmatrix} 0 & 4 \\ 2 & 1 \\ 3 & 0 \end{pmatrix}.$$
We have
$$AA^T = \begin{pmatrix} 13 & 2 \\ 2 & 17 \end{pmatrix}, \qquad A^T A = \begin{pmatrix} 16 & 4 & 0 \\ 4 & 5 & 6 \\ 0 & 6 & 9 \end{pmatrix}.$$
We leave the following theorem as an exercise - try to prove it yourself!
Theorem 1.13. Let A ∈ Mm×n (R) and B ∈ Mn×p (R). Then (AB)T = B T AT .
Theorem 1.14. Let B ∈ Mn,n (R). Then BB T is symmetric.
Proof. Let B = (bij ) and write B T = (cij ). That is, the (i, j)-entry of B T is cij = bji for 1 ≤ i, j ≤ n. The
(x, y)-entry of BB T is given by
$$\sum_{z=1}^{n} b_{xz} c_{zy} = \sum_{z=1}^{n} b_{xz} b_{yz}.$$
The (y, x)-entry of BB T is given by
$$\sum_{z=1}^{n} b_{yz} c_{zx} = \sum_{z=1}^{n} b_{yz} b_{xz} = \sum_{z=1}^{n} b_{xz} b_{yz}.$$
Hence BB T is indeed symmetric.
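The proof above shows the result holds for every B; the snippet below merely illustrates it on one randomly generated sample (the seed and size are arbitrary choices):

```python
import numpy as np

# One random sample illustrating Theorem 1.14: B B^T is symmetric.
rng = np.random.default_rng(0)
B = rng.integers(-5, 6, size=(4, 4))
S = B @ B.T
assert (S == S.T).all()
```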

Earlier we defined the zero matrix 0m,n which is the additive identity when considering matrix addition. We
can also define a matrix (corresponding to the number 1 for the real numbers) as follows:
Definition 1.15. The matrix In ∈ Mn,n (R) given by
$$I_n = (\delta_{ij}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \quad\text{where}\quad \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise,} \end{cases}$$
is the n × n identity matrix.
Theorem 1.16. Let A ∈ Mm,n (R) and Im and In represent the m × m and n × n identity matrices respectively.
Then
Im A = AIn = A.
In particular, if A ∈ Mn,n (R) then In acts as a left and right multiplicative identity.
Proof. Let A = (aij ) and consider Im A. The (i, j)-entry of Im A is given by
$$\sum_{k=1}^{m} \delta_{ik} a_{kj} = a_{ij}$$
since δik = 0 unless k = i. Hence Im A = A. The proof that AIn = A is similar.

1.4 Combining operations
It is clear that we could keep combining the operations above, provided the matrices in question have the
right sizes. For example, if A, B, C ∈ Mn,n (R) then we also have
(A + B T )C − 2B + (A + B T )(C + 5C T ) ∈ Mn,n (R).
The distributive rules that we’ve seen previously show this is well-defined.
Note that we can define powers of square matrices. Suppose A ∈ Mn,n (R). We have
A^0 = In , A^1 = A, and we recursively define A^{l+1} = (A^l)A for l ≥ 1.
So A^2 = AA, A^3 = (AA)A, . . .. We will consider A^{−1}, A^{−2}, . . . later in the module.
Example. Let A ∈ M2,2 (R). Consider
(A + I2 )(A − I2 ) = A^2 − AI2 + I2 A − I2 I2 = A^2 − A + A − I2 = A^2 − I2 .
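The identity (A + I2)(A − I2) = A^2 − I2 can be spot-checked on a concrete matrix; here is a small NumPy sketch (the choice of A is arbitrary):

```python
import numpy as np

# (A + I_2)(A - I_2) = A^2 - I_2, checked on an arbitrary 2 x 2 matrix.
A = np.array([[1, 2], [3, 4]])
I = np.eye(2, dtype=int)
lhs = (A + I) @ (A - I)
rhs = A @ A - I
assert (lhs == rhs).all()
```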

2 Matrix inverses
In this Section we will
• Define what is meant by an invertible matrix.
• Define the determinant of a matrix and look at ways of computing it.
• Learn to determine whether or not a given matrix is invertible in terms of its determinant. Learn how to
compute the inverse of an invertible matrix in terms of its adjugate.
• Consider powers of square matrices, in particular powers of invertible matrices.

2.1 Matrix inverses


In our previous chapter we saw that many of the properties of ordinary arithmetic for the real numbers have
analogues in matrix algebra. We shall now consider the extent to which a further important property of ordinary
arithmetic is mirrored in matrix algebra.
Recall: Given any real number a ∈ R with a ≠ 0, there exists a number b such that ab = ba = 1. This number b is
usually denoted by a−1 or 1/a and is called the multiplicative inverse of a.
Definition 2.1. Suppose that A ∈ Mn×n (R) is a square matrix. We say that A is invertible if there exists
B ∈ Mn×n (R) such that AB = In = BA. We call B the inverse of A.
Remark. B is also invertible, with inverse A.
Remark. The definition of invertible only holds for square matrices.
Examples.

• The matrix $\begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}$ is invertible since
$$\begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & -5 \\ 0 & 1 \end{pmatrix} = I_2 = \begin{pmatrix} 1 & -5 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 5 \\ 0 & 1 \end{pmatrix}.$$
• The matrix $\begin{pmatrix} 1 & 3 & -1 \\ 2 & 1 & 1 \\ 3 & -1 & 2 \end{pmatrix}$ is invertible with inverse $\begin{pmatrix} 3/5 & -1 & 4/5 \\ -1/5 & 1 & -3/5 \\ -1 & 2 & -1 \end{pmatrix}$.
• The matrix $A = \begin{pmatrix} 2 & -6 \\ -3 & 9 \end{pmatrix}$ is not invertible. To see this, suppose that $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is an inverse for A, that
is
$$\begin{pmatrix} 2 & -6 \\ -3 & 9 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
This implies that we have the four simultaneous equations
$$2a - 6c = 1, \quad 2b - 6d = 0, \quad -3a + 9c = 0, \quad -3b + 9d = 1,$$
and looking at the first and third equations we see that 6a − 18c = 3 and 6a − 18c = 0, giving a
contradiction.

Remark. We will return later to the problem of deciding whether or not a matrix is invertible.
Theorem 2.2. If B and B′ are inverses of A then B = B′. In other words, if an inverse of A exists then it is
unique and will be denoted by A−1 .
Proof. Suppose that B and B′ are both inverses of A. Then

B = In B       by Theorem 1.16
  = (B′A)B     by definition of the inverse
  = B′(AB)     by associativity
  = B′In       by definition of the inverse
  = B′         by Theorem 1.16

Theorem 2.3. Suppose that

A ∈ Mn,n (R), X, X′ ∈ Mn,p (R), Y, Y′ ∈ Mm,n (R)

and that A is invertible. Then

AX = AX′ =⇒ X = X′ and Y A = Y′A =⇒ Y = Y′.

Proof. Suppose AX = AX′. By the property of the identity matrix, and the associative property of matrix
multiplication we have

X = In X = (A−1 A)X = A−1 (AX) = A−1 (AX′) = (A−1 A)X′ = In X′ = X′.

The second part is similar.

Theorem 2.4. If A and B are invertible n × n matrices then AB is invertible and

(AB)−1 = B −1 A−1 .

Proof. We have

(AB)(B −1 A−1 ) = A(B(B −1 A−1 )) = A((BB −1 )A−1 ) = A(In A−1 ) = AA−1 = In

by associativity and the property of the identity matrix. Similarly (B −1 A−1 )(AB) = In . Hence AB is invertible
with (AB)−1 = B −1 A−1 .

Remark. This result can be extended. Let A1 , A2 , . . . , Ak be invertible matrices of the same size. Then the
product A1 A2 · · · Ak is invertible with
$$(A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} \cdots A_1^{-1}.$$
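Theorem 2.4, including the reversal of the order of the factors, can be spot-checked with NumPy's built-in inverse (the two matrices below are arbitrary invertible examples):

```python
import numpy as np

# (AB)^{-1} = B^{-1} A^{-1} on two arbitrary invertible matrices.
A = np.array([[1.0, 5.0], [0.0, 1.0]])
B = np.array([[3.0, 4.0], [1.0, 2.0]])
lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
assert np.allclose(lhs, rhs)
# the order matters: inv(A) @ inv(B) is not the inverse of AB here
assert not np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B))
```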

Proposition 2.5. Suppose that A, B ∈ Mn,n (R) and that AB = In . Then BA = In .


Remark. This proposition says that B is a left inverse of A if and only if it is a right inverse. The proof of this
result is by no means obvious and in fact uses some ideas that you will see in Algebra II. We sketch the proof
in the Appendix in Proposition A.1.
Corollary 2.6. Let A ∈ Mn,n (R).
• If there exists B ∈ Mn,n (R) with AB = In then A is invertible (with inverse B).
• If there exists B ∈ Mn,n (R) with BA = In then A is invertible (with inverse B).
We have a converse to Theorem 2.4.
Lemma 2.7. Suppose that A, B ∈ Mn,n (R) and that A is not invertible or B is not invertible. Then AB is
not invertible.
Proof. Suppose for a contradiction that AB is invertible with inverse D. Then In = (AB)D = A(BD) so
A is invertible by Corollary 2.6. Also In = D(AB) = (DA)B so B is invertible, also by Corollary 2.6. This
contradicts our initial assumption.

2.2 Determinants
The determinant of a square matrix is a number which characterises certain properties of the matrix. In
particular, we will prove that a matrix is invertible if and only if its determinant is non-zero. We will
see another characterisation of the determinant in Section 6.1.

Definition 2.8. The determinant of a 1 × 1 matrix A = (a) is

det A = |a| = a.

Definition 2.9. The determinant of a 2 × 2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is
$$\det A = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$
Example. If $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ then
$$\det A = \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = (1 \times 4) - (2 \times 3) = -2.$$

Inverses of 2 × 2 matrices
We shall return to proving this later, but as many of you know, if a 2 × 2 matrix has non-zero determinant then it is
invertible, since we can calculate the inverse using determinants as follows.
To find the inverse of a 2 × 2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with det A ≠ 0:

1. Interchange the diagonal entries.
2. Multiply the non-diagonal entries by −1.
3. Divide by the determinant, giving
$$A^{-1} = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$

To see this works, note that
$$A^{-1}A = \frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \frac{1}{\det A}\begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
$$AA^{-1} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\frac{1}{\det A}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \frac{1}{\det A}\begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Example. For our matrix $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ we can find its inverse:
$$A^{-1} = \frac{1}{-2}\begin{pmatrix} 4 & -2 \\ -3 & 1 \end{pmatrix} = \begin{pmatrix} -2 & 1 \\ 3/2 & -1/2 \end{pmatrix}.$$
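The three-step recipe above is easy to code up directly; this sketch (the function name is our own) reproduces the inverse of the example matrix:

```python
import numpy as np

def inverse_2x2(A):
    """Steps 1-3 above: swap the diagonal, negate the off-diagonal, divide by det."""
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("det A = 0, so A is not invertible")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1, 2], [3, 4]])
Ainv = inverse_2x2(A)
assert np.allclose(Ainv, np.array([[-2, 1], [1.5, -0.5]]))
assert np.allclose(A @ Ainv, np.eye(2))
```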
 
Definition 2.10. The determinant of a 3 × 3 matrix $A = \begin{pmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{pmatrix}$ is
$$\det A = a_1\begin{vmatrix} b_2 & c_2 \\ b_3 & c_3 \end{vmatrix} - b_1\begin{vmatrix} a_2 & c_2 \\ a_3 & c_3 \end{vmatrix} + c_1\begin{vmatrix} a_2 & b_2 \\ a_3 & b_3 \end{vmatrix} = a_1(b_2c_3 - c_2b_3) - b_1(a_2c_3 - c_2a_3) + c_1(a_2b_3 - b_2a_3).$$
Example.
$$\begin{vmatrix} 1 & 2 & 1 \\ 3 & 1 & -1 \\ -2 & 1 & 1 \end{vmatrix} = 1\begin{vmatrix} 1 & -1 \\ 1 & 1 \end{vmatrix} - 2\begin{vmatrix} 3 & -1 \\ -2 & 1 \end{vmatrix} + 1\begin{vmatrix} 3 & 1 \\ -2 & 1 \end{vmatrix} = 1(2) - 2(1) + 1(5) = 5.$$

Determinant of an n × n matrix
In order to formally define the determinant function for any square n × n matrix we need to introduce a couple
more things first. As you may have guessed from the definitions for 2 × 2 and 3 × 3 matrices, the definition can be
thought of as inductive, in that the determinant of an n × n matrix is made up of a combination of (n − 1) × (n − 1)
matrices, which are each in turn made up of combinations of (n − 2) × (n − 2) matrices, and so on!

Definition 2.11. Let A = (aij ) ∈ Mn,n (R). For 1 ≤ i, j ≤ n, we define the ij th minor of A to be the
(n − 1) × (n − 1) matrix obtained by deleting the ith row and the j th column from A. We denote the minor by
Aij . In some texts the minor is called the submatrix.
 
1 2 3 4
2 3 4 1
Example. We find some minors for the matrix A = 
3
. For example,
4 1 2
4 1 2 3
   
3 4 1 1 2 4
A11 = 4 1 2  and A23 = 3 4 2  .
1 2 3 4 1 3

Definition 2.12. Let A = (aij ) ∈ Mn,n (R). The cofactor Cij associated with the entry aij is

Cij = (−1)i+j det Aij ,

where Aij is the ij th minor defined above.


 
Example. We find some cofactors for the matrix $A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 1 & 2 \\ 4 & 1 & 2 & 3 \end{pmatrix}$. For example,
$$C_{11} = (-1)^2\begin{vmatrix} 3 & 4 & 1 \\ 4 & 1 & 2 \\ 1 & 2 & 3 \end{vmatrix} = -36 \quad\text{and}\quad C_{23} = (-1)^5\begin{vmatrix} 1 & 2 & 4 \\ 3 & 4 & 2 \\ 4 & 1 & 3 \end{vmatrix} = 44.$$

We are now set to complete the definition of the determinant.

Definition 2.13. If A = (a) is a 1 × 1 matrix then its determinant is det A = |a| = a. Suppose n ≥ 2. The
determinant of an n × n matrix A = (aij ) is
$$\det A = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = \sum_{j=1}^{n} a_{1j} C_{1j}.$$

Remark. In this definition we are expanding along the top row and using the cofactors of the entries in the top
row. There are alternative expansions for the determinant that collect the terms in different ways which we will
see shortly.

Examples.

• Suppose $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then
$$\det A = aC_{11} + bC_{12} = a|d| - b|c| = ad - bc,$$
since C11 = |d| and C12 = −|c|, which agrees with the formula we gave above.

• Suppose $A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$. Then
$$\det A = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
$$= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}),$$
which again agrees with the formula we had above.


• Let’s look at n = 4:
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \\ a_{41} & a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} + a_{14}C_{14}.$$
Now the cofactor associated to the entry a11 is
$$C_{11} = (-1)^2\begin{vmatrix} a_{22} & a_{23} & a_{24} \\ a_{32} & a_{33} & a_{34} \\ a_{42} & a_{43} & a_{44} \end{vmatrix} = a_{22}(a_{33}a_{44} - a_{34}a_{43}) - a_{23}(a_{32}a_{44} - a_{34}a_{42}) + a_{24}(a_{32}a_{43} - a_{33}a_{42}).$$
Note how the sign changes for each cofactor, so
$$C_{12} = (-1)^3\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix} = -\begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{31} & a_{33} & a_{34} \\ a_{41} & a_{43} & a_{44} \end{vmatrix}.$$

Example.
$$\begin{vmatrix} 1 & -1 & 0 \\ 5 & 1 & 0 \\ 0 & -1 & -2 \end{vmatrix} = 1\begin{vmatrix} 1 & 0 \\ -1 & -2 \end{vmatrix} - (-1)\begin{vmatrix} 5 & 0 \\ 0 & -2 \end{vmatrix} + 0\begin{vmatrix} 5 & 1 \\ 0 & -1 \end{vmatrix} = 1(-2) + 1(-10) + 0 = -12.$$
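Definition 2.13 is itself an algorithm: expand along the top row and recurse on the minors. A direct (and deliberately naive, exponential-time) sketch, with NumPy used only for array handling:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by expansion along the top row (Definition 2.13).
    Exponential time, so only sensible for small matrices."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # the minor A_1j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)     # a_1j * C_1j
    return total

assert det_cofactor(np.array([[1, 2], [3, 4]])) == -2
assert det_cofactor(np.array([[1, -1, 0], [5, 1, 0], [0, -1, -2]])) == -12
```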

We now look at ways of simplifying the calculation of determinants. We are not yet in a position to prove the
second result below, so for the moment we shall simply accept that it is true. We shall give a proof later in the
lecture course when we are familiar with elementary matrices.

Theorem 2.14. Suppose A, B ∈ Mn,n (R) and let In denote the identity matrix. Then the following hold:

1. det In = 1.
2. det AT = det A.
3. det(AB) = (det A)(det B).

Proof. The first statement is clear, using induction on n. The proof of (2) is quite lengthy so we give it as the
proof of Theorem A.7 in the Appendix. To prove (3), we need to consider elementary matrices which we shall
define later. We prove (3) in Theorem 5.13.

The following operations are the elementary row operations. We will be seeing quite a lot more of them when
we look at solving systems of linear equations and performing Gauss-Jordan elimination later in the course. Note
that in this definition, A does not need to be square.

Definition 2.15. Suppose that A ∈ Mm,n (R). There are three elementary row operations that we can perform
on A.

1. Interchange two rows.


2. Multiply a row by a non-zero number.

3. Change one row by adding to it a multiple of another.
Theorem 2.16. Let A be an n×n matrix and let B be the matrix obtained from A after applying an elementary
row operation.
1. If B is obtained from A after interchanging two rows of A, then det B = − det A.
2. If B is obtained from A after multiplying a row of A by k ∈ R with k ≠ 0, then det B = k det A.
3. If B is obtained from A after adding k times one row of A to another, then det B = det A.
Proof. Again, these proofs are quite complicated. You can find them as Theorems A.4, A.5 and A.6 in the
Appendix.

By transposing the matrix, we have the same results for column operations.
Theorem 2.17. Let A be an n × n matrix.
1. If B is obtained from A after interchanging two columns of A, then det B = − det A.
2. If B is obtained from A after multiplying a column of A by k ∈ R with k ≠ 0, then det B = k det A.
3. If B is obtained from A after adding k times one column of A to another, then det B = det A.
Proof. Combine Theorem 2.16 with Theorem 2.14 (2).
 
Examples. Consider the matrix $A = \begin{pmatrix} 1 & -1 & 1 \\ 2 & 0 & -1 \\ 1 & 1 & 2 \end{pmatrix}$. Calculating the determinant we get det A = 8. Consider
what happens when we consider the determinants of matrices that are obtained from A after performing an
elementary row operation.

• Let $B = \begin{pmatrix} 1 & -1 & 1 \\ 4 & 0 & -2 \\ 1 & 1 & 2 \end{pmatrix}$ be the matrix obtained from A after multiplying the second row by 2 (which we
write as 2r2 ). We have det B = 16 = 2 det A.
• Let $C = \begin{pmatrix} 2 & 0 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & 2 \end{pmatrix}$ be the matrix obtained from A after the elementary row operation of swapping
rows 1 and 2 (which we write as r1 ↔ r2 ). We have det C = −8 = − det A.
• Let $D = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 2 & -3 \\ 1 & 1 & 2 \end{pmatrix}$ be the matrix obtained from A after the elementary row operation of subtracting
2 copies of row 1 from row 2 (which we write as r2 − 2r1 ). We have det D = 8 = det A.
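All three effects can be confirmed on this matrix A using NumPy's determinant (an illustration only; `np.linalg.det` returns a floating-point value, hence the approximate comparisons):

```python
import numpy as np

# Effect of each elementary row operation on the determinant (Theorem 2.16),
# checked on the matrix A from the examples, with det A = 8.
A = np.array([[1.0, -1.0, 1.0], [2.0, 0.0, -1.0], [1.0, 1.0, 2.0]])
assert np.isclose(np.linalg.det(A), 8)

B = A.copy(); B[1] *= 2                # 2r2
C = A.copy(); C[[0, 1]] = C[[1, 0]]    # r1 <-> r2
D = A.copy(); D[1] -= 2 * D[0]         # r2 - 2r1

assert np.isclose(np.linalg.det(B), 2 * np.linalg.det(A))
assert np.isclose(np.linalg.det(C), -np.linalg.det(A))
assert np.isclose(np.linalg.det(D), np.linalg.det(A))
```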
There are other rules for computing determinants. Although we don’t need them to compute determinants,
they often make our lives easier. First recall the recursive definition:
$$\det A = \sum_{j=1}^{n} a_{1j} C_{1j}.$$

Using Theorem 2.16 (1), we realise that we don’t need to expand along the first row; any row will do, so long
as we change the sign appropriately. Furthermore, using Theorem 2.14 (2), we realise that we can expand down
columns instead.
Theorem 2.18. Let A ∈ Mn,n (R). Suppose 1 ≤ i ≤ n. Then
$$\det A = \sum_{j=1}^{n} a_{ij} C_{ij}.$$
Suppose that 1 ≤ j ≤ n. Then
$$\det A = \sum_{i=1}^{n} a_{ij} C_{ij}.$$

Example. Expanding along the second row:
$$\begin{vmatrix} 2 & -3 & 5 \\ 0 & 1 & 0 \\ 3 & -1 & -3 \end{vmatrix} = -0\begin{vmatrix} -3 & 5 \\ -1 & -3 \end{vmatrix} + 1\begin{vmatrix} 2 & 5 \\ 3 & -3 \end{vmatrix} - 0\begin{vmatrix} 2 & -3 \\ 3 & -1 \end{vmatrix} = -21.$$
Note the signs (−1)^{2+j} carried by the cofactors because we are going along row 2. Also note that we don’t need to compute the
cofactors C21 and C23 because of the 0 in front of them.

Sometimes it is easy to tell if a matrix has zero determinant, which can save us a lot of work.

Theorem 2.19. Let A be a square matrix. Then det A = 0 if any of the following hold:

1. A has an entire row (or column) of zeros.


2. A has two equal rows (or columns).
3. A has two proportional rows (or columns).

Proof. The first result follows from Theorem 2.18. For the other two, apply Theorem 2.16 (3) (or Theorem 2.17
(3)) to get a matrix with a row (or column) of zeros.

Example. Subtracting the second row from the first row of a matrix does not change its determinant:
$$\begin{vmatrix} 3 & -2 & 3 \\ 2 & -2 & 3 \\ 1 & 0 & 11 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 \\ 2 & -2 & 3 \\ 1 & 0 & 11 \end{vmatrix} = \begin{vmatrix} -2 & 3 \\ 0 & 11 \end{vmatrix} = -22.$$

2.3 Determinants and inverses of matrices


Previously, we stated that the inverse of a 2 × 2 matrix A exists if det A ≠ 0. This result extends to all square
matrices, and indeed is an if and only if result.

Theorem 2.20. Let A be an n × n matrix. Then A is invertible if and only if det A ≠ 0. Furthermore, if A is
invertible and has inverse A−1 then det A−1 = 1/det A.

Remark. We are not yet able to prove this result. The first part will be proved as Theorem 5.12 and the second
part is Corollary 5.14. But we can prove part of the Theorem, that is, if det A 6= 0, we will now show that A is
invertible.

Finding the inverse using determinants


The inverse calculation for finding 2 × 2 inverses using the determinant function can be extended to n × n
matrices.

Definition 2.21. For an n × n matrix A we can define the adjugate of A, denoted by adj A, as follows:
$$\operatorname{adj} A = (C_{ji}) = \begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix},$$
where the Cij are the cofactors defined in Definition 2.12.

Remark. In the definition of adj A, note the transposition, that is, note that the (i, j)-entry of adj A is Cji .

Theorem 2.22. Suppose that A ∈ Mn,n (R) and that det A ≠ 0. Then we can find the inverse of A as follows:
$$A^{-1} = \frac{1}{\det A} \operatorname{adj} A.$$
Proof. We give a self-contained proof of this in Theorem A.2 in the Appendix.

 
Example. Find the inverse of

    [ −2  3   2 ]
A = [  6  0   3 ] .
    [  4  1  −1 ]

We need to find adj A and det A. We have det A = 72.
We calculate the first three cofactors and leave the rest for you to check.

C11 = | 0  3 | = −3,    C12 = − | 6  3 | = 18,    C13 = | 6 0 | = 6,
      | 1 −1 |                  | 4 −1 |                | 4 1 |
and continuing with these calculations we get
 
          [ −3    5    9 ]
adj A =   [  18  −6   18 ] .
          [   6  14  −18 ]

Therefore,

               [ −3    5    9 ]
A−1 = (1/72)   [  18  −6   18 ] .
               [   6  14  −18 ]
Note that the entries −3, 18, 6 which we calculated above, which are the cofactors of the first row of A, appear
in the first column of adj A. It is always a good idea to check your solution: verify that

                  [ −3    5    9 ] [ −2  3   2 ]   [ 1 0 0 ]
A−1 A = (1/72)    [  18  −6   18 ] [  6  0   3 ] = [ 0 1 0 ] .
                  [   6  14  −18 ] [  4  1  −1 ]   [ 0 0 1 ]
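The adjugate computation above can also be checked with a short script. This is an illustrative sketch (the function names are mine, not from the notes), using exact fractions so that no rounding occurs:

```python
from fractions import Fraction

def det(A):
    # Cofactor expansion along the first row.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cofactor(A, i, j):
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
    return (-1) ** (i + j) * det(minor)

def adjugate(A):
    # Note the transposition: the (i, j)-entry of adj A is the cofactor C_ji.
    n = len(A)
    return [[cofactor(A, j, i) for j in range(n)] for i in range(n)]

def inverse(A):
    d = Fraction(det(A))
    return [[c / d for c in row] for row in adjugate(A)]

A = [[-2, 3, 2], [6, 0, 3], [4, 1, -1]]
print(det(A))        # 72
print(adjugate(A))   # [[-3, 5, 9], [18, -6, 18], [6, 14, -18]]
```

Multiplying `inverse(A)` by `A` gives the identity matrix, matching the check above.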

We’ll see another way of computing the inverse of a matrix in Section 5.2.

2.4 Powers of square matrices


Definition 2.23. If A is a square n × n matrix, recall that we defined

A0 := In , A1 := A, A2 := AA, A3 := A2 A

and so on, that is, Ak+1 := Ak A, for any k ≥ 0. If in addition A is invertible and k ≥ 1 then define

A−k := (A−1 )k .
Notice, as we have Ak A−k = (A · · · A)(A−1 · · · A−1 ) = In , with k factors of A and k factors of A−1 , we
conclude that A−k is the inverse of Ak .

Lemma 2.24. Suppose A is a square matrix and r and s are both integers. Then

Ar As = Ar+s , (Ar )s = Ars .
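Lemma 2.24 can be sanity-checked numerically, including for negative exponents. A rough sketch (the helper functions are my own, using exact fractions so that inverses are computed without rounding):

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(A):
    # Inverse of a 2x2 matrix via the formula from Section 2.1.
    (a, b), (c, d) = A
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def power(A, k):
    if k < 0:
        return power(inv2(A), -k)       # A^{-k} := (A^{-1})^k
    result = [[1, 0], [0, 1]]           # A^0 := I_2
    for _ in range(k):
        result = matmul(result, A)
    return result

A = [[1, 2], [3, 4]]
assert power(A, 2) == [[7, 10], [15, 22]]
assert power(A, 3) == matmul(power(A, 5), power(A, -2))   # A^r A^s = A^(r+s)
assert power(power(A, 2), -3) == power(A, -6)             # (A^r)^s = A^(rs)
```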

Examples.

• [ 1 2; 3 4 ]² = [ 7 10; 15 22 ].

• [ 0 1; 0 0 ]² = [ 0 0; 0 0 ].
• Simplify (A + B)² − (A − B)². We have

(A + B)² − (A − B)² = (A + B)(A + B) − (A − B)(A − B)                 (by definition)
                    = A(A + B) + B(A + B) − A(A − B) + B(A − B)       (distributive property)
                    = A² + AB + BA + B² − A² + AB + BA − B²           (distributive property)
                    = 2(AB + BA) .

 
• Let A = [ 1 −1; −1 1 ]. Let us find Ak for all k ≥ 1. (Note that A is not invertible.)

  A = [ 1 −1; −1 1 ],

  A² = [ 1 −1; −1 1 ][ 1 −1; −1 1 ] = [ 2 −2; −2 2 ] = 2 [ 1 −1; −1 1 ],

  A³ = [ 2 −2; −2 2 ][ 1 −1; −1 1 ] = [ 4 −4; −4 4 ] = 4 [ 1 −1; −1 1 ].

  So we conjecture that Ak = 2^(k−1) A. To give a proof, we use induction. For n ≥ 1, set P (n) to be the
  statement:

  [ 1 −1; −1 1 ]^n = 2^(n−1) [ 1 −1; −1 1 ].

  Base Step: Suppose that n = 1. The left hand side is [ 1 −1; −1 1 ]¹ = [ 1 −1; −1 1 ] and the right hand
  side is 2⁰ [ 1 −1; −1 1 ] = [ 1 −1; −1 1 ]. So P (1) is true.

  Inductive Step: Suppose that P (k) holds. Then

  [ 1 −1; −1 1 ]^(k+1) = [ 1 −1; −1 1 ]^k [ 1 −1; −1 1 ]
                       = 2^(k−1) [ 1 −1; −1 1 ][ 1 −1; −1 1 ]
                       = 2^(k−1) [ 2 −2; −2 2 ]
                       = 2^((k+1)−1) [ 1 −1; −1 1 ].

  Hence P (k + 1) holds and so P (n) holds for all n ≥ 1.
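Before writing the induction proof it is easy to test the conjecture Ak = 2^(k−1) A mechanically for small k; a throwaway sketch:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, -1], [-1, 1]]
P = A                                  # P holds A^k
for k in range(1, 11):
    # Conjecture: A^k = 2^(k-1) A.
    assert P == [[2 ** (k - 1) * e for e in row] for row in A]
    P = matmul(P, A)                   # P becomes A^(k+1)
print("conjecture holds for k = 1, ..., 10")
```

Of course a finite check is not a proof; the induction above is still needed.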

If a matrix satisfies certain equations, we can deduce facts about other matrices related to it.

Example. Let A be an n × n matrix and let In denote the n × n identity matrix. Suppose that A³ + 2A² + 3A = In .
Then A is invertible, and

A−1 = A² + 2A + 3In .

To see this, note that

In = A³ + 2A² + 3A = A(A² + 2A + 3In )   and   In = A³ + 2A² + 3A = (A² + 2A + 3In )A.

Hence A² + 2A + 3In is the unique inverse of A.




Example. Let A be an n × n matrix and let In denote the n × n identity matrix and 0n the n × n zero matrix.
Suppose that A² − 2A + In = 0n . Then A − In is not invertible.
To see this, note that

0n = A² − 2A + In = (A − In )².

Now it is not necessarily true that A − In = 0n . (This is a very common misconception! However, recall
from Section 1.3 that we can have CD = 0n even if C ≠ 0n and D ≠ 0n .) Suppose for a contradiction that
A − In is invertible with inverse B. Then

0n = 0n B² = (A − In )² B² = In ,

giving the required contradiction. So A − In is not invertible.
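A concrete instance may help: take A = [ 1 1; 0 1 ] (my choice, not from the notes). Then A ≠ In , yet A satisfies A² − 2A + In = 0n , and A − In really is not invertible:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

I = [[1, 0], [0, 1]]
A = [[1, 1], [0, 1]]                   # A != I, but (A - I)^2 = 0

A2 = matmul(A, A)
Z = [[A2[i][j] - 2 * A[i][j] + I[i][j] for j in range(2)] for i in range(2)]
assert Z == [[0, 0], [0, 0]]           # A^2 - 2A + I = 0

B = [[A[i][j] - I[i][j] for j in range(2)] for i in range(2)]
assert B == [[0, 1], [0, 0]]                        # A - I is non-zero...
assert B[0][0] * B[1][1] - B[0][1] * B[1][0] == 0   # ...but det(A - I) = 0
```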

3 Simultaneous Linear Equations
In this section we shall begin with familiar linear equations in two or three unknowns, that is, linear equations
in R2 and R3 . So for R2 we are thinking about straight lines we can draw on x- and y-axes in two dimensions,
and for R3 we are thinking about planes that we (or a computer package like Maple!) can plot on x-, y- and
z-axes in three dimensions. (The diagrams in this section were drawn using the LaTeX package TikZ.)
In this Section we will

• Use examples to understand the connection between the solutions of systems of simultaneous linear
equations and the intersection of lines and planes in R2 and R3 .
• Define what is meant by a system of simultaneous linear equations in Rn and its solution set.
• See examples of consistent, inconsistent, homogeneous and inhomogenous systems and find the solution
set in some easy cases.
• Define the matrix form of a system of linear equations.
• Learn Cramer’s rule for finding the solution to a consistent set of n linear equations in n unknowns.

3.1 Lines and Planes


Lines: A linear equation of the form
ax + by = c
with a, b, c ∈ R represents a line in R2 . The coefficients of the equation are represented by a, b ∈ R. The
constant of the equation is represented by c ∈ R. The unknowns of the equation are represented by x, y ∈ R.
Two equations
Given two linear equations ax + by = c and dx + ey = f it is possible to solve these simultaneously to find
values for x and y. These values (or coordinates if you prefer) correspond to the points of intersection of these
two lines in R2 . We use the following notation to represent our system of simultaneous linear equations:

ax + by = c
dx + ey = f
Thinking about two lines in R2 , what are the situations that can occur?

Examples.

• The system 
x − y = −1
x + 2y = 5
has the unique solution x = 1, y = 2, corresponding to the unique point of intersection (1, 2) of the two
lines in R2 .

• For the system 


x − y = −1
x − y = 1
it is not possible to find x and y that simultaneously satisfy both equations. There is no solution, and the
two lines are parallel in R2 .


• The system 
−4x − 2y = −8
2x + y = 4
has infinitely many solutions, (k, 4 − 2k) where k ∈ R. The two equations represent the same line in R2 .

Planes: A linear equation of the form


ax + by + cz = d
R3 .
with a, b, c, d ∈ R represents a plane in The coefficients of the equation are represented by a, b, c ∈ R. The
constant of the equation is represented by d ∈ R. The unknowns of the equation are represented by x, y, z ∈ R.
Two equations
Given two linear equations ax + by + cz = d and ex + f y + gz = h it is possible to solve these simultaneously to
find values for x, y and z. These values (or coordinates if you prefer) correspond to the points of intersection of
these two planes in R3 . We use the following notation to represent our system of simultaneous linear equations:

ax + by + cz = d
ex + f y + gz = h
Thinking about two planes in R3 , what are the situations that can occur?
Examples.

• For the system 


x + y + z = 1
x + y + z = 2
it is not possible to find x, y and z that simultaneously satisfy both equations. There is no solution, and
the two planes are parallel in R3 .

• The system

x + y + z = 1
2x + 2y + 2z = 2

has infinitely many solutions, as the two equations represent the same plane in R3 . Each set of values
for x, y and z satisfying x + y + z = 1 is a solution to this system, such as x = 1, y = 0, z = 0 and
x = −2, y = 4, z = −1.

• The system

x + y + z = 1
x + y = 0

has infinitely many solutions - the planes in R3 intersect in a line. The z-coordinate of each point is 1, so
the line lies above the (x, y)-plane. Each set of values for x, y and z satisfying x + y = 0 and z = 1 is a
solution to this system, such as x = 1, y = −1, z = 1 and x = −3, y = −3, z = 1.

Three equations
Given three linear equations ax + by + cz = d, ex + f y + gz = h and ix + jy + kz = l it is possible to solve
these simultaneously to find values for x, y and z. These values (or coordinates if you prefer) correspond to
the points of intersection of these three planes in R3 . We use the following notation to represent our system of
simultaneous linear equations:

 ax + by + cz = d
ex + f y + gz = h
ix + jy + kz = l

Thinking about three planes in R3 , what are the situations that can occur?

Examples.

• The system

 x + y + z = 1
x + y = 0
x − z = 0

has the unique solution x = 1, y = −1, z = 1 corresponding to the unique point of intersection (1, −1, 1)
of the three planes in R3 .

• The system

 x + y + z = 1
2x + 2y + 2z = 2
−x − y − z = −1

has infinitely many solutions, as the three equations represent the same plane in R3 .

• The system

 x + y + z = 1
x + y = 0
x + y − z = −1

has infinitely many solutions - the planes in R3 intersect in a line.

• The system

 x + y + z = 1
x + y + z = 2
x + y − z = −1

has no solutions. The first two equations represent parallel planes in R3 .

• The system 
 x + y = 1
x + z = 1
− y + z = 1

has no solutions. The planes intersect in pairs and so there are no points common to all three planes.

Remark. There are other possibilities, for example, 3 parallel planes.

3.2 Systems of linear equations


We shall now formalise our definition of a linear equation for any Rn . This definition clearly covers our definitions
of lines and planes above. Do ensure careful use of subscripts when writing these out!

Definition 3.1. An equation of the form

a1 x1 + a2 x2 + · · · + an xn = b,

where a1 , a2 , . . . , an , b ∈ R and a1 , a2 , . . . , an are not all zero, is a linear equation in the n unknowns x1 , x2 , . . . , xn .
The coefficients of the equation are represented by ai ∈ R. The constant term of the equation is represented
by b ∈ R. The unknowns of the equation are represented by xi ∈ R.

Examples. The following are linear equations in the unknowns x1 , . . . x5 :

• x1 + 3x2 − x3 − 5x4 − 2x5 = 0


• x1 − x2 + 2x3 + x4 + 3x5 = 4
• 5x2 − x5 = 2

The following are not linear equations in the unknowns x1 , . . . x5 :

• a1 x1 + a2 x2² + · · · + a5 x5⁵ = b
• sin x1 + cos x2 = 1

Definition 3.2. A system of m simultaneous linear equations in n unknowns, x1 , . . . xn , consists of m
linear equations



 a11 x1 + a12 x2 + · · · + a1n xn = b1
 a21 x1 + a22 x2 + · · · + a2n xn = b2
   ..                               ..
 am1 x1 + am2 x2 + · · · + amn xn = bm

where aij , bi ∈ R for i = 1, . . . m and j = 1, . . . n. The coefficients are represented by aij ∈ R. The constant
terms are represented by bi ∈ R. The unknowns are represented by xj ∈ R.

Remark. As we have seen in our examples of systems of linear equations in R2 and R3 the number m of
equations need not be the same as the number n of unknowns.
Examples.
• The system 
 x1 − 3x2 = 2
4x1 − x2 = −4
−2x1 − 2x2 = 1

represents 3 equations in 2 unknowns.


• The system 
 x1 − x2 − x3 + x4 = 3
4x1 + x2 − 3x3 − x4 = 0
−x1 − 2x2 + 2x3 + 3x4 = 1

represents 3 equations in 4 unknowns.


Definition 3.3. The values x1 = c1 , x2 = c2 , . . . xn = cn are a solution of a system of m simultaneous linear
equations in n unknowns if these values simultaneously satisfy all m equations of the system. The solution set
of the system is the set of all solutions.
Examples.
• The solution set of the system 
 x + y + z = 1
x + y = 1
x − z = 0

is the set {(0, 1, 0)}, which has just one member.


• The solution set of the system 
 x + y + z = 1
x + y + z = 2
x + y − z = 0

is the empty set.


• The solutions of the system 
 x + y + z = 1
x + y = 1
x + y − z = 1

are the triples of values for x, y and z satisfying x + y = 1 and z = 0. We can rearrange the equation
to y = 1 − x, so given any value of x we can find the corresponding value for y. We write this general
solution as
x = k, y = 1 − k, z = 0, k ∈ R.
The solution set can be written more formally as
{(k, 1 − k, 0) : k ∈ R}
and has infinitely many members.

Definition 3.4. A system of simultaneous linear equations is consistent when it has at least one solution. The
system is inconsistent when it has no solutions.

So, in our examples above the first and third systems are consistent, the second system is inconsistent.

Definition 3.5. A homogeneous system of simultaneous linear equations is a system in which each constant
term is equal to zero, that is bi = 0 for all i = 1, . . . m. A system containing at least one non-zero constant
term is a non-homogeneous system.

Definition 3.6. The trivial solution to a homogeneous system of simultaneous linear equations is the solution
with each unknown equal to zero, that is, xj = 0 for 1 ≤ j ≤ n. A solution with at least one non-zero unknown
is a non-trivial solution.

Examples.

• The homogeneous system



 x + y + z = 0
2x + 3y + 4z = 0
x + y − z = 0

has exactly one solution x = y = z = 0. (This is the trivial solution.)


• The homogeneous system

 x + y + z = 0
2x + 2y + 2z = 0
x + y − z = 0

has the solution set

{(k, −k, 0) : k ∈ R}.

There are infinitely many solutions including the trivial solution x = y = z = 0 and the non-trivial
solutions x = 1, y = −1, z = 0 and x = −2, y = 2, z = 0.

Remark. Note that if the system is non-homogeneous, setting xj = 0 for all j cannot be a solution. So the
trivial solution is only possible for homogeneous systems. And if the system is homogeneous, the trivial
solution is always a solution.

Remark. Given any system of m linear equations in n unknowns, the solution set of the system will be of one
of three forms: it contains exactly one solution, it contains infinitely many solutions or it is the empty set.

Remark. If our equations are in m ≤ 3 unknowns, we can think about them as representing points (one
unknown), lines (two unknowns) or planes (3 unknowns). Then solving the simultaneous equations corresponds
to finding the intersection of these objects, as in the first examples in this section. However

• We can think of an equation in m unknowns as representing a ‘plane’ in m-dimensional space. This is
just a little hard to picture for m ≥ 4!
• You should have seen complex numbers by now - ignore this line if not! Instead of real coefficients and
constants, we could have used complex coefficients and constants. But now it’s hard to picture the
equations geometrically. For example, consider the complex system

x + iy = 1
ix + 2y = 0

This has the unique solution x = 2/3, y = −i/3.
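Python's built-in complex arithmetic makes this easy to check; a small sketch using the explicit 2 × 2 solution formulas (the same formulas that appear in Theorem 3.8):

```python
# Solve  x + iy = 1,  ix + 2y = 0  over the complex numbers.
a, b, c, d = 1, 1j, 1j, 2      # coefficient matrix [[a, b], [c, d]]
e, f = 1, 0                    # right-hand side
det = a * d - b * c            # 2 - i*i = 3
x = (e * d - b * f) / det      # x = 2/3
y = (a * f - e * c) / det      # y = -i/3
print(x, y)
```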

The methods we will see in the next section for solving systems of linear equations work equally well if we work
over C instead of R. However, in this course we’ll just be working over R.

3.3 Matrix form of a system of linear equations
We shall now consider the matrix form of a system of linear equations.
Given any system of simultaneous linear equations


 a11 x1 + a12 x2 + · · · + a1n xn = b1
 a21 x1 + a22 x2 + · · · + a2n xn = b2
   ..                               ..
 am1 x1 + am2 x2 + · · · + amn xn = bm

we can express these as a matrix product Ax = b where we have the coefficient matrix denoted by

      [ a11  a12  a13  · · ·  a1n ]
A =   [ a21  a22  a23  · · ·  a2n ]  = (aij ),
      [  ..   ..   ..          ..  ]
      [ am1  am2  am3  · · ·  amn ]

the matrix of unknowns denoted by

      [ x1 ]
x =   [ x2 ]
      [ .. ]
      [ xn ]
and the matrix of constant terms denoted by

      [ b1 ]
b =   [ b2 ]
      [ .. ]
      [ bm ]

We can use matrix inverses to give us another method for solving certain systems of linear equations.
Example. Consider the system 
2x + 4y = 10
4x + y = 6.
This system may be expressed in matrix form as

[ 2 4 ] [ x ]   [ 10 ]
[ 4 1 ] [ y ] = [  6 ] .

The coefficient matrix is invertible, with inverse

[ −1/14   2/7 ]
[  2/7   −1/7 ] .

Multiplying both sides of the matrix form of the system on the left by the inverse of the coefficient matrix we
obtain

[ x ]   [ −1/14   2/7 ] [ 2 4 ] [ x ]   [ −1/14   2/7 ] [ 10 ]   [ 1 ]
[ y ] = [  2/7   −1/7 ] [ 4 1 ] [ y ] = [  2/7   −1/7 ] [  6 ] = [ 2 ] ,

which gives us the unique solution x = 1, y = 2.
Theorem 3.7. Let A be an invertible matrix. Then the system of linear equations Ax = b has the unique
solution x = A−1 b.
Proof. Since A is invertible, there exists A−1 such that A−1 A = In . Then

Ax = b =⇒ A−1 (Ax) = A−1 b =⇒ x = A−1 b; and


x = A−1 b =⇒ Ax = A(A−1 b) =⇒ Ax = b.
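Theorem 3.7 translates directly into code for the 2 × 2 case. A hedged sketch (the helper name and the explicit 2 × 2 inverse formula are my own choices, but the method is exactly x = A−1 b):

```python
from fractions import Fraction

def solve_via_inverse(A, b):
    # x = A^{-1} b for an invertible 2x2 coefficient matrix A.
    (p, q), (r, s) = A
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("coefficient matrix is not invertible")
    inv = [[s / det, -q / det], [-r / det, p / det]]
    return [inv[0][0] * b[0] + inv[0][1] * b[1],
            inv[1][0] * b[0] + inv[1][1] * b[1]]

# The example above: 2x + 4y = 10, 4x + y = 6.
assert solve_via_inverse([[2, 4], [4, 1]], [10, 6]) == [1, 2]
```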

3.4 Cramer’s Rule
Cramer’s rule is another method for solving n equations in n unknowns which works when the system has a
unique solution. So if you are solving the system Ax = b, you should check that det A ≠ 0 before trying to
apply it.
We won’t prove that Cramer’s rule works, but if you are interested you can find a proof online.
Cramer’s Rule for systems of two linear equations in two unknowns
We shall state Cramer’s Rule in general as Theorem 3.9, but first let us consider it for simple systems. Let our
system be given by 
ax + by = e
cx + dy = f

Theorem 3.8. If Ax = b is a system of two equations in two unknowns such that det A ≠ 0, then the system
has a unique solution given by:

x = (ed − bf )/(ad − bc) ,   y = (af − ce)/(ad − bc) .
This is easy to verify - we know there is a unique solution and substituting the values of x and y above shows
that these values are indeed a solution. You will observe that the denominators in both cases are det A and the
numerators are the determinants of the two matrices A1 = [ e b; f d ] and A2 = [ a e; c f ] respectively.

Example. Using Cramer’s Rule, let us solve the system



x + 2y = 5
3x + 4y = 9

We have

A = [ 1 2; 3 4 ],   A1 = [ 5 2; 9 4 ],   A2 = [ 1 5; 3 9 ].
Calculating det A = −2, det A1 = 2 and det A2 = −6 we get

x = det A1 / det A = 2/(−2) = −1 ,   y = det A2 / det A = (−6)/(−2) = 3.
We shall now extend Theorem 3.8 which stated Cramer’s Rule for 2 × 2 matrices. As mentioned previously,
Cramer’s Rule allows us to calculate the unique solution of a system of linear equations when the determinant
of the coefficient matrix A is non-zero.

Theorem 3.9. If Ax = b is a system of n linear equations in n unknowns such that det A ≠ 0, then the system
has a unique solution. This solution is

x1 = det A1 / det A ,   x2 = det A2 / det A ,   . . . ,   xn = det An / det A

where Aj is the matrix obtained by replacing the entries in the jth column of A by the entries in the matrix

      [ b1 ]
b =   [ b2 ]
      [ .. ] .
      [ bn ]

Example. We use Cramer’s Rule to solve



 x + 2z = 6
−3x + 4y + 6z = 30
−x − 2y + 3z = 8

We have

A = [ 1 0 2; −3 4 6; −1 −2 3 ],    A1 = [ 6 0 2; 30 4 6; 8 −2 3 ],
A2 = [ 1 6 2; −3 30 6; −1 8 3 ],   A3 = [ 1 0 6; −3 4 30; −1 −2 8 ],

and det A = 44, so

x = det A1 / det A = −40/44 = −10/11 ,   y = det A2 / det A = 72/44 = 18/11 ,
z = det A3 / det A = 152/44 = 38/11 .

So the solution is

x = (1/11) [ −10; 18; 38 ] .
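Theorem 3.9 can be implemented in a few lines; a sketch (the function names are mine) using exact fractions, checked against the system just solved:

```python
from fractions import Fraction

def det(A):
    # Cofactor expansion along the first row.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det A != 0")
    # A_j is A with column j replaced by b.
    return [Fraction(det([row[:j] + [b[i]] + row[j + 1:]
                          for i, row in enumerate(A)]), d)
            for j in range(len(A))]

A = [[1, 0, 2], [-3, 4, 6], [-1, -2, 3]]
b = [6, 30, 8]
sol = cramer(A, b)
assert sol == [Fraction(-10, 11), Fraction(18, 11), Fraction(38, 11)]
```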

4 Solving systems of linear equations


When solving systems in R2 it is quite legitimate and feasible to substitute a few equations into each other to
find solutions. In R2 and R3 it is also quite feasible to plot the lines and planes to find solutions. However,
these methods are not efficient - what happens when you have lots of lines or planes to work with? Extending
to Rn for n = 4, . . . these methods would not work at all (unless you are very lucky or good at guessing)!
Luckily, there is a systematic method for solving systems of simultaneous linear equations. The method is called
Gauss-Jordan Elimination and it works for all systems. It entails successively transforming a system into
simpler systems, in such a way that the solution set remains unchanged. The process ends when the solutions
can be determined easily - we will end up with a system so simple that we can just read off the solution.
In this Section we will

• Look at performing elementary row operations to transform a system of linear equations and hence solve
them.
• Define the augmented matrix corresponding to a system of linear equations.
• Define when a matrix is in row-reduced echelon form.
• Describe an algorithm (Gauss-Jordan elimination) to turn a matrix into a matrix in row-reduced echelon
form using elementary row operations.
• Explain how to find the solution space of a system of linear equations by applying Gauss-Jordan elimination
to the augmented matrix.
• Define the row rank of a matrix and see how this relates to the solution set of a system of equations.

4.1 Solving systems of linear equations I

In this section, we shall explore ways of solving systems. We will look at a systematic method - an algorithm -
for solving systems in the next section.

Lemma 4.1. The following operations do not change the solution set of a system of linear equations.

1. Interchange two equations.


2. Multiply an equation by a non-zero number.
3. Change one equation by adding to it a multiple of another.

Proof. We consider each operation in turn.

1. It is clear that changing the order that we write the equations in does not change the solution set.
2. Let k ∈ R \ {0}. Consider the equations

(1) a1 x1 + a2 x2 + · · · + an xn = b,    (2) ka1 x1 + ka2 x2 + · · · + kan xn = kb.

If x1 , x2 , . . . , xn satisfy

a1 x1 + a2 x2 + · · · + an xn = b, then multiplying by k we have ka1 x1 + ka2 x2 + · · · + kan xn = kb.

Conversely if x1 , x2 , . . . , xn satisfy

ka1 x1 + ka2 x2 + · · · + kan xn = kb, then multiplying by k −1 we have a1 x1 + a2 x2 + · · · + an xn = b.

So x1 , . . . , xn satisfy (1) if and only if they satisfy (2).


3. Consider the systems

(1)  a1 x1 + a2 x2 + · · · + an xn = b
     a′1 x1 + a′2 x2 + · · · + a′n xn = b′

(2)  (a1 + la′1 )x1 + (a2 + la′2 )x2 + · · · + (an + la′n )xn = b + lb′
     a′1 x1 + a′2 x2 + · · · + a′n xn = b′

We can check that x1 , . . . , xn satisfy (1) if and only if they satisfy (2).

Below are some examples. We will explain in the next section an algorithm that will always work to give the
solution set.
Systems with exactly one solution
We shall begin by using examples of systems which yield exactly one solution. These are the best systems to
start practising with as you will see.

Examples.

• Solve the following system of two linear equations in two unknowns:



x − y = −1 r1
2x + y = 4 r2

We have labelled the equations r1 , r2 in order that we can describe what we are doing at each stage.
Firstly, we want to eliminate x from equation r2 :

x − y = −1
3y = 6 r2 − 2r1

Secondly, we want to simplify equation r2 to get a value for y:



x − y = −1
1
y = 2 3 r2

Finally, we want to eliminate y from equation r1 to get a value for x:



x = 1 r1 + r2
y = 2

We have transformed our original system into an equivalent simpler system and we can read off our
unique solution x = 1, y = 2.
• Solve the following system of three linear equations in three unknowns:

 x + y + z = 1 r1
x + y = 1 r2
x − z = 0 r3

We eliminate x from equations r2 and r3 :


 x + y + z = 1
− z = 0 r2 − r1
− y − 2z = −1 r3 − r1

We interchange r2 and r3 :

 x + y + z = 1
− y − 2z = −1 r2 ↔ r3
− z = 0

Multiply r2 and r3 by −1:



 x + y + z = 1
y + 2z = 1 −r2
z = 0 −r3

Eliminate y from equation r1 .



 x − z = 0 r1 − r2
y + 2z = 1
z = 0

Finally, eliminate z from equations r1 and r2 :



 x = 0 r1 + r3
y = 1 r2 − 2r3
z = 0

By using the elementary operations we have transformed our original system into an equivalent simple
system. The unique solution is x = 0, y = 1, z = 0.
Systems with infinitely many solutions
So far, the examples we have done have yielded exactly one solution and it has been easy to read off the solutions
for the unknowns. What happens when we perform the elementary operations on a system with infinitely many
solutions?
Examples.
• Solve the following system of two linear equations in two unknowns:

−4x − 2y = −8 r1
2x + y = 4 r2

We can multiply r1 by −1/2:

  2x + y = 4      −(1/2)r1
  2x + y = 4
Eliminate both x and y by subtracting r1 from r2 :

2x + y = 4
0x + 0y = 0 r2 − r1
We are left with a single equation in terms of x and y. Rearranging we get y = 4 − 2x. So for any value
of x we are able to calculate a corresponding value of y. We can write the general solution as:
x = k, y = 4 − 2k, k ∈ R.
Formally, the solution set to the system is written:
{(k, 4 − 2k) : k ∈ R}

• Solve the following system of three linear equations in three unknowns:

 x + y + z = 1 r1
x + y = 1 r2
x + y − z = 1 r3

We can eliminate x and y from r1 and r3 :



 z = 0 r1 − r2
x + y = 1
− z = 0 r3 − r2

Add r1 to r3 :

 z = 0
x + y = 1
0x + 0y + 0z = 0 r3 + r1

We see that we have z = 0 and an equation in terms of x and y. Rearranging this equation we get
y = 1 − x. So for any value of x we are able to calculate a corresponding value of y. We can write the
general solution as:
x = k, y = 1 − k, z = 0, k ∈ R.
Formally, the solution set to the system is written:

{(k, 1 − k, 0) : k ∈ R}

Systems with no solutions


We now consider the final case for a system of simultaneous linear equations. What happens to an inconsistent
system when we perform Gauss-Jordan elimination?

Examples.

• Solve the following system of two linear equations in two unknowns:



x − y = −1 r1
x − y = 1 r2

Subtract r1 from r2 : 
x − y = −1
0 = 2 r2 − r1
We are left with an impossible equation 0 = 2 in r2 which can never be true, hence there are no solutions
to this system.
• Solve the following system of three linear equations in three unknowns:

 x + y + z = 1 r1
x + y + z = 2 r2
x + y − z = 0 r3

We subtract r1 from r2 and r3 :



 x + y + z = 1
0 = 1 r2 − r1
− 2z = −1 r3 − r1

Again, we are left with an impossible equation 0 = 1 which can never be true, hence there are no solutions
to this system.

4.2 Solving systems of equations II
Using elementary operations on a small system of equations is doable, but as you have probably noticed already,
it is somewhat cumbersome and prone to error. Luckily we can abbreviate a system of linear equations
by writing its coefficients and constants in the form of a matrix, called the augmented matrix of the system.
Using this matrix allows us (and computers) to more efficiently solve systems of simultaneous linear equations.

Definition 4.2. Given a system of m simultaneous linear equations in n unknowns




 a11 x1 + a12 x2 + · · · + a1n xn = b1
 a21 x1 + a22 x2 + · · · + a2n xn = b2
   ..                               ..
 am1 x1 + am2 x2 + · · · + amn xn = bm

we can abbreviate this as the m × (n + 1) matrix

[ a11  a12  . . .  a1n | b1 ]
[ a21  a22  . . .  a2n | b2 ]
[  ..   ..         ..  | .. ]
[ am1  am2  . . .  amn | bm ]

This matrix is called the augmented matrix of the system. (Some texts refer to it as the extended matrix of
the system.) The word augmented reflects the fact that this is made up of a matrix formed by the coefficients
of the system augmented by a matrix formed by the constants of the system.

Remark. We draw a line down the matrix to separate the coefficients aij from the constants bi . This won’t
affect any of the maths that we do.

Examples.

• The system

   −x + 7y − z = 1
    x − y      = −1
      − 3y − z = 2

  has augmented matrix

  [ −1  7 −1 |  1 ]
  [  1 −1  0 | −1 ] .
  [  0 −3 −1 |  2 ]

• The system

   −x1 + 7x2 − x3 + 2x4 = 3
    x1 − x2        − x4 = 0

  has augmented matrix

  [ −1  7 −1  2 | 3 ]
  [  1 −1  0 −1 | 0 ] .

The three elementary operations that we use on a system of linear equations correspond exactly to the three
elementary row operations on the rows of the augmented matrix of the system. Recall Definition 2.15:
Suppose that A ∈ Mm,n (R). The elementary row operations on A are the following operations:

1. Interchange two rows.


2. Multiply a row by a non-zero number.
3. Change one row by adding to it a multiple of another.

We use the notation ri ↔ rj to indicate that we are swapping row i and row j. We use the notation ri ↦ kri
to indicate that we are multiplying row i by k ≠ 0. We use the notation ri ↦ ri + lrj to indicate that we are
adding l copies of row j to row i, for some l ∈ R. Sometimes we can perform multiple operations at once (but
be careful with this and go slowly if you’re not sure).

Definition 4.3. We say that two matrices A and B are row equivalent if matrix B can be obtained from
matrix A by a finite number of elementary row operations.

   
Example. The matrices

[ 2  0 −1 ]       [ 2 −1  1 ]
[ 4 −4  2 ]  and  [ 4 −4  2 ]
[ 6 −1  3 ]       [ 0  2 −2 ]

are row equivalent. We can see this by performing the following elementary row operations:

[ 2  0 −1 ]                       [  0  2 −2 ]                  [ 0  2 −2 ]             [ 2 −1  1 ]
[ 4 −4  2 ]   r1 ↦ r1 − (1/2)r2   [  4 −4  2 ]   r3 ↦ (1/5)r3   [ 4 −4  2 ]   r1 ↔ r3   [ 4 −4  2 ]
[ 6 −1  3 ]   r3 ↦ r3 + r2  −→    [ 10 −5  5 ]       −→         [ 2 −1  1 ]     −→      [ 0  2 −2 ]
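The three elementary row operations are easy to express as functions, and we can replay the sequence of operations above to confirm the row equivalence. A sketch (function names are mine):

```python
from fractions import Fraction

def swap(A, i, j):                 # r_i <-> r_j
    A[i], A[j] = A[j], A[i]

def scale(A, i, k):                # r_i -> k r_i, with k != 0
    A[i] = [k * e for e in A[i]]

def add_multiple(A, i, j, l):      # r_i -> r_i + l r_j
    A[i] = [a + l * b for a, b in zip(A[i], A[j])]

A = [[2, 0, -1], [4, -4, 2], [6, -1, 3]]
add_multiple(A, 0, 1, Fraction(-1, 2))   # r1 -> r1 - (1/2) r2
add_multiple(A, 2, 1, 1)                 # r3 -> r3 + r2
scale(A, 2, Fraction(1, 5))              # r3 -> (1/5) r3
swap(A, 0, 2)                            # r1 <-> r3
assert A == [[2, -1, 1], [4, -4, 2], [0, 2, -2]]
```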

The elementary row-operations will come in useful when we want to solve systems of linear equations. Informally,
this is because

• Performing elementary row operations on an augmented matrix does not change the solution set.
• If an augmented matrix is in row-reduced echelon form then it is easy to read off the solutions.

We will explain these statements properly soon.

Definition 4.4. Let A be any matrix.

• The first non-zero entry in any row of A is called the leading entry of that row. (Also referred to as the
leading coefficient or as the pivot of the row.)
• A is in row echelon form if

1. the non-zero rows appear before any rows of zeros,


2. the leading entry of a nonzero row is always strictly to the right of the leading entry of the row above
it.

• A is in row-reduced echelon form if

1. the non-zero rows appear before any rows of zeros,


2. the leading entry of a nonzero row is always strictly to the right of the leading entry of the row above
it,
3. each leading entry is equal to 1,
4. for each leading entry all entries above are equal to zero.

Examples. The following matrices are all in row-reduced echelon form with the leading entries in red.

[ 1 0 0 ]      [ 0 1 0  2 0 −1 ]      [ 1 0  1 0 ]
[ 0 1 0 ]      [ 0 0 1 −2 0  2 ]      [ 0 1 −5 0 ]
[ 0 0 1 ]      [ 0 0 0  0 1  2 ]      [ 0 0  0 1 ]
               [ 0 0 0  0 0  0 ]      [ 0 0  0 0 ]

Examples. The following matrices are all in row echelon form but NOT in row-reduced echelon form, with the
leading entries in red.

[ 1 0 0 ]      [ 0 1 0  2 −1 −1 ]      [ 1 0  1 −1 ]
[ 0 2 0 ]      [ 0 0 2 −2  0  2 ]      [ 0 1 −5  3 ]
[ 0 0 3 ]      [ 0 0 0  0  1  2 ]      [ 0 0  0  1 ]
               [ 0 0 0  0  0  0 ]      [ 0 0  0  0 ]

Examples. The following matrices are not in row echelon form (and therefore not in row-reduced echelon form
either) with the leading entries in red.

[ 1 0 0 ]      [ 0 1 0  2 −1 −1 ]      [ 1 0  1 −1 ]
[ 1 0 0 ]      [ 0 0 1 −2  0  2 ]      [ 0 0  0  1 ]
[ 0 0 1 ]      [ 0 0 0  0  0  0 ]      [ 0 1 −5  3 ]
               [ 0 0 0  0  1  2 ]      [ 0 0  0  0 ]
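The four conditions of Definition 4.4 can be turned into a checker, which is a useful way to test your own row reductions. A sketch (`is_rref` is my name, not standard):

```python
def is_rref(A):
    last_lead = -1
    seen_zero_row = False
    for i, row in enumerate(A):
        leads = [j for j, e in enumerate(row) if e != 0]
        if not leads:
            seen_zero_row = True              # condition 1: zero rows come last
            continue
        j = leads[0]
        if seen_zero_row or j <= last_lead:   # conditions 1 and 2
            return False
        if row[j] != 1:                       # condition 3: leading entry is 1
            return False
        if any(A[k][j] != 0 for k in range(len(A)) if k != i):
            return False                      # condition 4: rest of the column is 0
        last_lead = j
    return True

assert is_rref([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
assert not is_rref([[1, 0, 0], [0, 2, 0], [0, 0, 3]])   # leading entries not 1
assert not is_rref([[1, 0, 0], [1, 0, 0], [0, 0, 1]])   # not even echelon form
```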

Remark. Any matrix in row-reduced echelon form is also in row echelon form.

Although the next result is simple, we shall use it many times in the next section.

Lemma 4.5. Suppose A ∈ Mn,n (R) is in row-reduced echelon form. Then A = In or A contains a row of zeros.

Proof. If every row of A contains a leading entry then A = In ; otherwise A contains a row of zeros.

Remark. Using elementary row operations to turn a matrix into a matrix in row-reduced echelon form is called
Gauss-Jordan elimination. Using elementary row operations to turn a matrix into a matrix in row echelon form
is called Gauss elimination (or sometimes Gaussian elimination).

The next theorem just restates Lemma 4.1 using our new language.

Theorem 4.6. Suppose that S1 and S2 are two systems of m equations in n unknowns. Let A1 be the
augmented matrix of S1 and let A2 be the augmented matrix of S2 . Then S1 and S2 have the same solution
set if and only if A1 is row equivalent to A2 .

Theorem 4.7. Every matrix is row equivalent to a unique matrix in row-reduced echelon form.

In Theorem 4.8 below, we will see an algorithm that shows that we can use row operations to convert any matrix
into a matrix in row-reduced echelon form. So we only need to consider uniqueness. The proof is quite technical,
so we have put it in the Appendix. You can find it as Theorem A.9.

Remark. By Theorem 4.7, we see that two matrices are row equivalent if and only if they are row-equivalent
to the same row-reduced echelon matrix.

Putting Theorem 4.7 and Theorem 4.6 together, we see that we can take a system of equations, write the
augmented matrix for that system and then use elementary row operations to get the matrix into a matrix in
row-reduced echelon form where the corresponding equations have the same solution set as the original system
of equations. We will soon see

• An algorithm for performing elementary row operations which will turn any matrix into a matrix in row-
reduced echelon form.
• How to read off the solution set from a system of equations where the augmented matrix is in row-reduced
echelon form.

Example. Consider the system of equations



 x + y + z = 1
x + y = 1
x − z = 0

that we solved in the last section. We’re going to solve it again. Compare the steps we went through last
time with the steps below. We begin by writing down the augmented matrix of the system and then we apply
elementary row operations until the matrix is in row-reduced echelon form.

     
[ 1 1  1 | 1 ]   r2 ↦ r2 − r1   [ 1  1  1 |  1 ]   r2 ↔ r3   [ 1  1  1 |  1 ]
[ 1 1  0 | 1 ]   r3 ↦ r3 − r1   [ 0  0 −1 |  0 ]     −→      [ 0 −1 −2 | −1 ]
[ 1 0 −1 | 0 ]        −→        [ 0 −1 −2 | −1 ]             [ 0  0 −1 |  0 ]

  r2 ↦ −r2   [ 1 1 1 | 1 ]   r1 ↦ r1 − r2   [ 1 0 −1 | 0 ]   r1 ↦ r1 + r3    [ 1 0 0 | 0 ]
  r3 ↦ −r3   [ 0 1 2 | 1 ]       −→         [ 0 1  2 | 1 ]   r2 ↦ r2 − 2r3   [ 0 1 0 | 1 ]
     −→      [ 0 0 1 | 0 ]                  [ 0 0  1 | 0 ]        −→         [ 0 0 1 | 0 ]

This augmented matrix clearly corresponds to the system of equations x = 0, y = 1, z = 0. So our original system still has the unique solution {(0, 1, 0)}.

We’ve said that every matrix is row equivalent to a (unique) matrix in row-reduced echelon form. Now we describe a method for finding that matrix.

Theorem 4.8. Carry out the following four steps, first with row 1 as the current row, then with row 2, and so on, until EITHER every row has been the current row, OR step 1 is not possible.

1. Select the first column from the left that has at least one non-zero entry in or below the current row.
2. If the current row has a 0 in the selected column, interchange it with a row below which has a non-zero
entry in that column.
3. If the entry now in the current row and the selected column is c, multiply the current row by 1/c to create
a leading 1.
4. Add suitable multiples of the current row to the other rows to make each entry above and below the
leading 1 into a 0.
 
Example. We perform elementary row operations on the matrix

[ 1  1  2  3 ]
[ 2  2  3  5 ]
[ 1 −1  0  5 ]

until it is in row-reduced echelon form.

     
r2 ↦ r2 − 2r1, r3 ↦ r3 − r1:

[ 1  1  2  3 ]
[ 0  0 −1 −1 ]
[ 0 −2 −2  2 ]

r2 ↔ r3:

[ 1  1  2  3 ]
[ 0 −2 −2  2 ]
[ 0  0 −1 −1 ]

r2 ↦ −(1/2)r2:

[ 1  1  2  3 ]
[ 0  1  1 −1 ]
[ 0  0 −1 −1 ]

r1 ↦ r1 − r2:

[ 1  0  1  4 ]
[ 0  1  1 −1 ]
[ 0  0 −1 −1 ]

r3 ↦ −r3:

[ 1  0  1  4 ]
[ 0  1  1 −1 ]
[ 0  0  1  1 ]

r1 ↦ r1 − r3, r2 ↦ r2 − r3:

[ 1  0  0  3 ]
[ 0  1  0 −2 ]
[ 0  0  1  1 ]

Let’s see how this fits in with the algorithm.

• We start with the current row as row 1. The first column with a non-zero entry in row 1 or below is
column 1. Since the current row (row 1) does not have 0 in column 1, we do not need to do anything for
step 2. Similarly, because the entry is 1, we don’t need to do anything for step 3. We now perform step
4, adding multiples of row 1 to the other rows so that they have zeros in column 1.
• We now go back to step 1 with row 2 as the current row. The leftmost column in row 2 or 3 which has a
non-zero entry is column 2. But column 2 has zero in row 2, so we swap rows 2 and 3 so that there is a
non-zero entry in row 2 and column 2. Now the leading entry in row 2 and column 2 is -2, so we multiply
row 2 by −1/2. We now perform step 4, adding multiples of row 2 to the other rows so that they have
zero in column 2. (Row 3 already has zero in column 2 so we don’t need to do anything.)
• We now go back to step 1 with row 3 as the current row. The leftmost column in row 3 which has a
non-zero entry is column 3. The entry in the current row and column 3 is −1 so we do not need to do
anything for step 2 and for step 3 we multiply row 3 by −1. We now perform step 4, adding multiples of
row 3 to the other rows so that they have zeros in column 3.
• We can’t carry out step 1 again, so we stop. Finally, we check (just in case we made a mistake) that our matrix is indeed in row-reduced echelon form. It is, so we are done.

Remark. The algorithm we describe is not always the quickest or easiest way of row-reducing the matrix. If
you see a quicker way, you should feel free to use it! However, using the algorithm will get you to the answer,
whereas trying random elementary row operations can lead you round in circles.

We would like to know that this process really does give us a matrix in row-reduced echelon form. The proof is
given in Proposition A.2 in the Appendix.
Now suppose that we have a system of simultaneous linear equations where the augmented matrix is in row-
reduced echelon form. We want to know how to read off the solution set from this matrix.

Proposition 4.9. Suppose that we have a system of m simultaneous equations in n unknowns and that the
corresponding augmented matrix A is in row-reduced echelon form.

• If A has a row (0 0 . . . 0 | B) with B ≠ 0 then the system is inconsistent. (That is, there are no solutions.)

• If the first n columns of the matrix all contain a leading entry, the system has a unique solution.
• If neither of the conditions above hold, the system has infinitely many solutions.

Remark.

• The first case corresponds to the equation 0x1 + 0x2 + . . . + 0xn = B, which has no solution if B ≠ 0.
• The second case holds if and only if m ≥ n and the matrix consisting of the top n rows and n columns
of A is In . It is then straightforward to read off the solution.
• In the third case, at least one of the first n columns does not contain a leading entry. This is the most complicated case and we will look at it in more detail below.
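Proposition 4.9 amounts to a short decision procedure. The sketch below is our own illustration (not part of the notes) and assumes its input is already in row-reduced echelon form.

```python
def classify(aug):
    """Classify a system whose augmented matrix `aug` (m rows, n + 1
    columns) is already in row-reduced echelon form."""
    n = len(aug[0]) - 1
    # A row (0 0 ... 0 | B) with B != 0 makes the system inconsistent.
    for row in aug:
        if all(x == 0 for x in row[:n]) and row[n] != 0:
            return "inconsistent"
    # Otherwise each non-zero row contributes one leading entry in the
    # first n columns; a unique solution needs one for every unknown.
    leading = sum(1 for row in aug if any(x != 0 for x in row[:n]))
    return "unique" if leading == n else "infinitely many"

print(classify([[1, 0, -1, 3], [0, 1, -1, 2], [0, 0, 0, 4]]))
# → inconsistent
```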

Examples. Suppose we have got the augmented matrix into one of the following forms.

• The system is inconsistent: no solutions.

[ 1  0 −1 | 3 ]
[ 0  1 −1 | 2 ]
[ 0  0  0 | 4 ]

• Unique solution: x1 = 3, x2 = 2, x3 = −2.

[ 1  0  0 |  3 ]
[ 0  1  0 |  2 ]
[ 0  0  1 | −2 ]
[ 0  0  0 |  0 ]

• Infinitely many solutions.

[ 1  2  0  0 −3  0 |  2 ]
[ 0  0  1  0  1  0 | −2 ]
[ 0  0  0  1  2  0 |  0 ]
[ 0  0  0  0  0  1 |  3 ]
[ 0  0  0  0  0  0 |  0 ]

To each variable corresponding to a column which does not contain a leading entry, we assign a parameter in R. Here, these are columns 2 and 5, so we set x2 = k and x5 = l for some k, l ∈ R. The non-zero rows of the matrix correspond to the equations

x1 + 2x2 − 3x5 = 2
x3 + x5 = −2
x4 + 2x5 = 0
x6 = 3.

We’ve decided to set x2 = k and x5 = l. Rewriting the four equations above, we find

x1 = 2 − 2k + 3l,   x2 = k,   x3 = −2 − l,
x4 = −2l,   x5 = l,   x6 = 3.

The solution set is

{ (2 − 2k + 3l, k, −2 − l, −2l, l, 3) | k, l ∈ R }.
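We can spot-check this solution set by substituting a few values of the parameters into the four equations read off from the matrix (a quick illustrative check, not part of the notes):

```python
# Each member of the claimed solution set should satisfy all four equations.
for k in (0, 1, -2):
    for l in (0, 3, -1):
        x1, x2, x3, x4, x5, x6 = 2 - 2*k + 3*l, k, -2 - l, -2*l, l, 3
        assert x1 + 2*x2 - 3*x5 == 2
        assert x3 + x5 == -2
        assert x4 + 2*x5 == 0
        assert x6 == 3
print("all checks passed")
```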

We are finally ready to put all these results together and describe how to solve systems of linear equations.

Proposition 4.10. Suppose that we want to solve a system of m simultaneous equations in n unknowns:

a11 x1 + a12 x2 + . . . + a1n xn = b1
a21 x1 + a22 x2 + . . . + a2n xn = b2
  ...
am1 x1 + am2 x2 + . . . + amn xn = bm

1. Write down the augmented matrix

[ a11  a12  . . .  a1n | b1 ]
[ a21  a22  . . .  a2n | b2 ]
[  ..               .. | .. ]
[ am1  am2  . . .  amn | bm ]

2. Bring the matrix into row-reduced echelon form using elementary row operations.
3. If at any stage you get a row

(0 0 0 · · · 0 | B)

where B ≠ 0, your system is inconsistent and has no solutions. Stop here: you do not need to finish row-reducing.
4. Read off the solutions, if any exist.
5. If you have a solution or solutions, put it back into the initial equations and check that it works.

• If it works, you know you have got the right answer. Sigh of relief.
• If it doesn’t work, you have gone wrong somewhere. Work backwards until you find a step where your solution satisfies one system of equations but not the one before it. That is where (one of) the errors occurred.

Example. Solve the system

 7x1 +   x2 + 6x3 = 0
−2x1 +  4x2 − 3x3 = 4
31x1 − 17x2 + 33x3 = −2

We write down the augmented matrix and row-reduce it.


     
[  7    1    6 |  0 ]
[ −2    4   −3 |  4 ]
[ 31  −17   33 | −2 ]

r1 ↔ r2:

[ −2    4   −3 |  4 ]
[  7    1    6 |  0 ]
[ 31  −17   33 | −2 ]

r1 ↦ −(1/2)r1:

[  1   −2  3/2 | −2 ]
[  7    1    6 |  0 ]
[ 31  −17   33 | −2 ]

r2 ↦ r2 − 7r1, r3 ↦ r3 − 31r1:

[ 1  −2    3/2 | −2 ]
[ 0  15   −9/2 | 14 ]
[ 0  45  −27/2 | 60 ]

r3 ↦ r3 − 3r2:

[ 1  −2   3/2 | −2 ]
[ 0  15  −9/2 | 14 ]
[ 0   0     0 | 18 ]

The final row corresponds to the impossible equation 0 = 18, so the system is inconsistent: no solutions.

Example. Solve the system

 2x +  3y = 7
−x  +  4y = 2
 5x − 11y = −1

We write down the augmented matrix and row-reduce it.


       
[  2    3 |  7 ]
[ −1    4 |  2 ]
[  5  −11 | −1 ]

r1 ↔ r2:

[ −1    4 |  2 ]
[  2    3 |  7 ]
[  5  −11 | −1 ]

r1 ↦ −r1:

[ 1   −4 | −2 ]
[ 2    3 |  7 ]
[ 5  −11 | −1 ]

r2 ↦ r2 − 2r1, r3 ↦ r3 − 5r1:

[ 1  −4 | −2 ]
[ 0  11 | 11 ]
[ 0   9 |  9 ]

r2 ↦ (1/11)r2:

[ 1  −4 | −2 ]
[ 0   1 |  1 ]
[ 0   9 |  9 ]

r1 ↦ r1 + 4r2, r3 ↦ r3 − 9r2:

[ 1  0 | 2 ]
[ 0  1 | 1 ]
[ 0  0 | 0 ]

Hence we read off the unique solution x = 2, y = 1.

Example. Solve the system

 7x1 +   x2 + 6x3 = 0
−2x1 +  4x2 − 3x3 = 4
31x1 − 17x2 + 33x3 = −20

We write down the augmented matrix and row-reduce it.
     
[  7    1    6 |   0 ]
[ −2    4   −3 |   4 ]
[ 31  −17   33 | −20 ]

r1 ↔ r2:

[ −2    4   −3 |   4 ]
[  7    1    6 |   0 ]
[ 31  −17   33 | −20 ]

r1 ↦ −(1/2)r1:

[  1   −2  3/2 |  −2 ]
[  7    1    6 |   0 ]
[ 31  −17   33 | −20 ]

r2 ↦ r2 − 7r1, r3 ↦ r3 − 31r1:

[ 1  −2    3/2 | −2 ]
[ 0  15   −9/2 | 14 ]
[ 0  45  −27/2 | 42 ]

r3 ↦ r3 − 3r2:

[ 1  −2   3/2 | −2 ]
[ 0  15  −9/2 | 14 ]
[ 0   0     0 |  0 ]

r2 ↦ (1/15)r2:

[ 1  −2    3/2 |    −2 ]
[ 0   1  −3/10 | 14/15 ]
[ 0   0      0 |     0 ]

r1 ↦ r1 + 2r2:

[ 1  0   9/10 | −2/15 ]
[ 0  1  −3/10 | 14/15 ]
[ 0  0      0 |     0 ]

There is no leading entry in column 3, so we set x3 = k. Then

x1 + (9/10)x3 = −2/15  ⟹  x1 = −2/15 − (9/10)k,
x2 − (3/10)x3 = 14/15  ⟹  x2 = 14/15 + (3/10)k,

and the solutions are the elements of the set

{ (−2/15 − (9/10)k, 14/15 + (3/10)k, k) : k ∈ R }.
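Following step 5 of Proposition 4.10, we can substitute the general solution back into the original system. The check below is an illustration using exact fractions:

```python
from fractions import Fraction as F

# The general solution should satisfy the original three equations
# for every value of the parameter k; we try a few.
for k in (F(0), F(1), F(-5, 2)):
    x1 = -F(2, 15) - F(9, 10) * k
    x2 = F(14, 15) + F(3, 10) * k
    x3 = k
    assert 7*x1 + x2 + 6*x3 == 0
    assert -2*x1 + 4*x2 - 3*x3 == 4
    assert 31*x1 - 17*x2 + 33*x3 == -20
print("solution verified")
```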

Remark. If a solution set has infinitely many solutions, there will be more than one way of writing the solution set. For example, the solution set above can be seen as the line through (−2/15, 14/15, 0) with direction vector (−9/10, 3/10, 1). You could also write (for example)

{ (−2/15 − (9/10)k, 14/15 + (3/10)k, k) : k ∈ R } = { (1 + 9l, 5/9 − 3l, −34/27 − 10l) : l ∈ R }.

While this answer is also correct, it is much easier just to write down the solution set using the row-reduced echelon matrix as we have done above.

Example. Determine the values of λ ∈ R for which the following system of linear equations is consistent and, for these values of λ, find all possible solutions:

2x + (1 + 2λ)y + z = 3
x + λy + λ²z = λ + 1
x + λy + z = 2

We perform row reduction on the augmented matrix, being careful not to divide by zero:

[ 2  1 + 2λ   1 |     3 ]
[ 1       λ  λ² | λ + 1 ]
[ 1       λ   1 |     2 ]

r1 ↔ r2:

[ 1       λ  λ² | λ + 1 ]
[ 2  1 + 2λ   1 |     3 ]
[ 1       λ   1 |     2 ]

r2 ↦ r2 − 2r1, r3 ↦ r3 − r1:

[ 1  λ       λ² |  λ + 1 ]
[ 0  1  1 − 2λ² | 1 − 2λ ]
[ 0  0   1 − λ² |  1 − λ ]

r2 ↦ r2 − 2r3:

[ 1  λ      λ² | λ + 1 ]
[ 0  1      −1 |    −1 ]
[ 0  0  1 − λ² | 1 − λ ]

r1 ↦ r1 − λr2:

[ 1  0  λ² + λ | 2λ + 1 ]
[ 0  1      −1 |     −1 ]
[ 0  0  1 − λ² |  1 − λ ]

r1 ↦ r1 + r3:

[ 1  0   1 + λ | 2 + λ ]
[ 0  1      −1 |    −1 ]
[ 0  0  1 − λ² | 1 − λ ]    (‡)

Case 1: 1 − λ² ≠ 0, that is λ ≠ ±1. In this case, we can divide by 1 − λ² = (1 − λ)(1 + λ), so we can continue the row reduction to obtain an equivalent matrix in row-reduced echelon form:

r3 ↦ (1/(1 − λ²))r3:

[ 1  0  1 + λ |     2 + λ ]
[ 0  1     −1 |        −1 ]
[ 0  0      1 | 1/(1 + λ) ]

r1 ↦ r1 − (λ + 1)r3, r2 ↦ r2 + r3:

[ 1  0  0 |         λ + 1 ]
[ 0  1  0 | 1/(1 + λ) − 1 ]
[ 0  0  1 |     1/(1 + λ) ]

From this we can see that the system is consistent, with unique solution

x = λ + 1,   y = 1/(1 + λ) − 1 = −λ/(1 + λ),   z = 1/(1 + λ).

 
Case 2: λ = 1. The final matrix in (‡) is

[ 1  0  2 |  3 ]
[ 0  1 −1 | −1 ]
[ 0  0  0 |  0 ]

so the equations are equivalent to

x + 2z = 3
y − z = −1
0 = 0

The system is consistent, with infinitely many solutions. The general solution is

x = 3 − 2k,   y = k − 1,   z = k,   k ∈ R.

The solution set can be written as {(3 − 2k, k − 1, k) : k ∈ R}.


 
Case 3: λ = −1. The final matrix in (‡) is now

[ 1  0  0 |  1 ]
[ 0  1 −1 | −1 ]
[ 0  0  0 |  2 ]

The final row represents the impossible equation 0 = 2, so the system of equations is inconsistent for λ = −1.
So to summarise:
• λ = 1: Infinitely many solutions: {(3 − 2k, k − 1, k) : k ∈ R}.
• λ = −1: No solutions.
• λ ≠ ±1: Unique solution: { (λ + 1, −λ/(1 + λ), 1/(1 + λ)) }.

4.3 Row rank and systems of equations


In Section 3 we defined systems of m equations in n unknowns, and we have now seen how to find their solutions, or show that none exist, by applying Gauss-Jordan elimination to the augmented m × (n + 1) matrix. We also found some short cuts and results for square coefficient matrices, that is, matrices which correspond to systems with as many equations as unknowns.
In this short section we will define the row rank of any matrix and prove a result about solutions to a system of any shape.
Definition 4.11. Let A be an m×n matrix. Then the row rank of A, denoted rk(A), is the number of non-zero
rows in the equivalent row-reduced echelon form matrix of A.
 
Examples. The matrix

A = [ 1  1  1 ]
    [ 1  2  2 ]
    [ 0  1  1 ]

has rk(A) = 2, as it is row equivalent to the row-reduced matrix

[ 1  0  0 ]
[ 0  1  1 ]
[ 0  0  0 ].

The matrix

B = [ 7  6  5  4 ]
    [ 8  7  6  5 ]
    [ 9  8  7  6 ]

has rk(B) = 2, as it is row equivalent to the row-reduced matrix

[ 1  0  −1  −2 ]
[ 0  1   2   3 ]
[ 0  0   0   0 ].

The matrix

C = [ 7  6  5  4 ]
    [ 8  7  6  5 ]
    [ 9  8  7  0 ]

has rk(C) = 3, as it is row equivalent to the row-reduced matrix

[ 1  0  −1  0 ]
[ 0  1   2  0 ]
[ 0  0   0  1 ].
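The row rank can be computed mechanically: row-reduce and count the non-zero rows. A self-contained sketch (the helper name `row_rank` is our own, not part of the notes):

```python
from fractions import Fraction

def row_rank(rows):
    """Number of non-zero rows after Gauss-Jordan elimination."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n, cur = len(A), len(A[0]), 0
    for col in range(n):
        piv = next((r for r in range(cur, m) if A[r][col] != 0), None)
        if piv is None:
            continue                      # no pivot in this column
        A[cur], A[piv] = A[piv], A[cur]
        c = A[cur][col]
        A[cur] = [x / c for x in A[cur]]  # leading 1
        for r in range(m):
            if r != cur and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[cur])]
        cur += 1
        if cur == m:
            break
    return cur

print(row_rank([[1, 1, 1], [1, 2, 2], [0, 1, 1]]))           # → 2
print(row_rank([[7, 6, 5, 4], [8, 7, 6, 5], [9, 8, 7, 0]]))  # → 3
```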
We mention the following result for interest; the proof is beyond the scope of this course.
Theorem 4.12. If A is any matrix, then rk(A) = rk(AT ).
Theorem 4.13 (Solvability of linear equations). Let (EQ) be any system of m simultaneous linear equations in n unknowns, represented in matrix form by Ax = b. Then (EQ) is consistent if and only if rk(A) = rk(A|b). Furthermore, (EQ) has a unique solution if and only if rk(A) = rk(A|b) = n.

Proof. Let B⁺ be the unique m × (n + 1) matrix in row-reduced echelon form which is row equivalent to (A|b), and let B be the matrix obtained by removing the last column from B⁺. Then B is in row-reduced echelon form, and the elementary row operations that turn (A|b) into B⁺ also turn A into B. Hence rk(A) is the number of non-zero rows of B and rk(A|b) is the number of non-zero rows of B⁺. Let r = rk(A). Then either the last column of B⁺ has 1 in row r + 1 and 0 elsewhere, or it contains only 0 in rows r + 1, . . . , m. In the former case, the system is inconsistent and rk(A) ≠ rk(A|b); in the latter case, the system is consistent and rk(A) = rk(A|b).

Now assume the system is consistent, so that rk(A) = rk(A|b) and rows r + 1, r + 2, . . . , m of B + are all zero
rows. There is a unique solution if and only if every column of B contains a leading entry, that is, if and only
if n ≤ m and the first n rows of B are equal to In . This also occurs if and only if rk(A) = n.

Examples. Calculating the ranks of the coefficient matrix and augmented matrix for the system

7x + 6y + 5z = 4
8x + 7y + 6z = 5
9x + 8y + 7z = 6

gives rk(A) = rk(A|b) = 2, so we know from the theorem that the system is consistent. We can also conclude, as 2 < n = 3, that we have infinitely many solutions.

Calculating the ranks of the coefficient matrix and augmented matrix for the system

7x + 6y + 5z = 4
8x + 7y + 6z = 5
9x + 8y + 7z = 0

gives rk(A) = 2 and rk(A|b) = 3, so we know from the theorem that the system is inconsistent.
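Theorem 4.13's criterion is easy to check by machine. The sketch below (illustrative only; forward elimination is enough to count non-zero rows) recomputes the ranks for the two systems above:

```python
from fractions import Fraction

def rank(rows):
    """Rank via forward elimination: count the pivot rows."""
    A = [[Fraction(x) for x in r] for r in rows]
    m, n, cur = len(A), len(A[0]), 0
    for col in range(n):
        piv = next((r for r in range(cur, m) if A[r][col] != 0), None)
        if piv is None:
            continue
        A[cur], A[piv] = A[piv], A[cur]
        for r in range(cur + 1, m):
            f = A[r][col] / A[cur][col]
            A[r] = [a - f * b for a, b in zip(A[r], A[cur])]
        cur += 1
    return cur

A = [[7, 6, 5], [8, 7, 6], [9, 8, 7]]
aug1 = [row + [c] for row, c in zip(A, [4, 5, 6])]   # consistent system
aug2 = [row + [c] for row, c in zip(A, [4, 5, 0])]   # inconsistent system
print(rank(A), rank(aug1), rank(aug2))  # → 2 2 3
```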

Theorem 4.14. Let A be an n × n matrix. Then the following statements are equivalent.

1. A is invertible.
2. rk(A) = n
3. The system Ax = b has a unique solution for each n × 1 matrix b.
4. The system Ax = v0 has only the trivial solution x = v0.

Proof. These results all follow by noting that A is invertible if and only if A is row equivalent to In .

5 More on matrix inverses


In this Section we will

• Define elementary matrices and look at their properties.


• Describe how to invert a matrix, or show it is not invertible, using elementary row operations.

5.1 Elementary matrices


Definition 5.1. A square matrix E ∈ Mn,n (R) is an elementary matrix if E is obtained from the identity
matrix of the same size, In , by an elementary row operation.

So there will be three different kinds of elementary matrices. Rather than writing out the details, we give examples that illustrate these three types. For instance, if n = 3 and k, l are real numbers with l ≠ 0, then
 
r1 ↔ r2 corresponds to E = [ 0  1  0 ]
                           [ 1  0  0 ]
                           [ 0  0  1 ],

r2 ↦ r2 + kr1 corresponds to E = [ 1  0  0 ]
                                 [ k  1  0 ],
                                 [ 0  0  1 ]

r3 ↦ lr3 corresponds to E = [ 1  0  0 ]
                            [ 0  1  0 ].
                            [ 0  0  l ]

Theorem 5.2. If A is an m × n matrix and E is an m × m elementary matrix corresponding to a particular elementary row operation, then EA is the matrix obtained by applying that elementary row operation to A.

Proof. We leave this as an exercise.

Example. If m = n = 3 and E corresponds to r1 ↔ r2 as above, then for

A = [ a  b  c ]
    [ d  e  f ]
    [ g  h  i ]

we have

EA = [ 0  1  0 ] [ a  b  c ]   [ d  e  f ]
     [ 1  0  0 ] [ d  e  f ] = [ a  b  c ] .
     [ 0  0  1 ] [ g  h  i ]   [ g  h  i ]
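Theorem 5.2 is easy to check numerically: multiplying on the left by the swap matrix E really does swap the rows. A tiny illustration (the helper name `matmul` is our own):

```python
def matmul(X, Y):
    """Naive matrix product, enough for these small examples."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

E = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # elementary matrix for r1 <-> r2
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(matmul(E, A))
# → [[4, 5, 6], [1, 2, 3], [7, 8, 9]]  (rows 1 and 2 swapped)
```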

Theorem 5.3. An elementary matrix is invertible and its inverse is also an elementary matrix.

Proof. Let In denote the identity matrix of the relevant size. Going back to the definition of elementary
operations note that each can be undone by an elementary operation: For ri ↔ rj apply the same again, for
ri ↦ ri + krj apply ri ↦ ri − krj, and for ri ↦ lri (where l ≠ 0) apply ri ↦ l⁻¹ri. Therefore, if F is the elementary matrix
corresponding to the row operation which undoes the one that gives rise to E then

F (EIn ) = In

by the definition. Hence In = F (EIn ) = (F E)In = F E and similarly we get In = EF. Hence E is invertible
with inverse F .

Recall that Theorem 4.7 says that any matrix is row equivalent to a unique matrix in row-reduced echelon form.
We need this result in mind for our following results.

Lemma 5.4. Suppose that A ∈ Mn,n (R) and that the last row of A contains only 0s. Then A is not invertible.

Proof. Suppose for a contradiction that A is invertible and that AB = BA = In. Then the (n, n)-entry in the matrix AB is equal to 1. But this entry is equal to

Σ_{k=1}^{n} a_nk b_kn = 0,

since every entry a_nk of the last row of A is 0, giving a contradiction. Hence A is not invertible.

Theorem 5.5. Suppose that A ∈ Mn,n (R). Then A is invertible if and only if A is row equivalent to In .

Proof. By Theorem 4.7 there exists a unique matrix M in row-reduced echelon form such that A is row equivalent to M. Then by Theorem 5.2 there exist elementary matrices E1, E2, . . . , Et such that M = E1E2 . . . EtA. Since M is in row-reduced echelon form and is square, either M = In or the last row of M consists of 0s, by Lemma 4.5.
Suppose A is invertible. Then M = E1E2 . . . EtA is also invertible, since each elementary matrix Ei is invertible. So M = In, since otherwise M would not be invertible, by Lemma 5.4.
Suppose M = In. Then (E1E2 . . . Et)A = In and A is invertible with inverse E1E2 . . . Et by Proposition 2.6.

Corollary 5.6. Any sequence of elementary row operations that transforms A to In also transforms In to A−1 .
Proof. Suppose that
In = (E1 E2 . . . Et )A
where E1 , E2 , . . . , Et are elementary matrices. Then

A−1 = E1 E2 . . . Et = E1 E2 . . . Et In

Corollary 5.7. Suppose that A is row equivalent to a matrix B. Then we can write A as a product

A = E1′E2′ . . . Et′B

where each Ei′ is an elementary matrix. In particular, suppose that A is row equivalent to In. Then we can write A as a product

A = E1′E2′ . . . Et′

of elementary matrices.
Proof. By Theorem 5.2 there exist elementary matrices E1, E2, . . . , Et such that B = E1E2 . . . EtA, where each elementary matrix is invertible by Theorem 5.3. Set Ei′ = E⁻¹_{t−i+1}, so that Ei′ is also invertible. Then

E1′E2′ . . . Et′B = Et⁻¹E_{t−1}⁻¹ . . . E1⁻¹ E1E2 . . . EtA = A.

The second part follows by setting B = In.

5.2 Inverting a matrix using elementary row operations


Suppose that we want to see if A is invertible and to find A⁻¹ if it is. Elementary row operations give us a method which will work every time.
Proposition 5.8. Let A ∈ Mn,n (R). Suppose we want to decide if A is invertible and to find its inverse if it
is. Write the matrix A next to the matrix In . Perform elementary row operations on both matrices until A has
a row of zeros or A = In . If it has a row of zeros, then A is not invertible by Theorem 5.5. Otherwise the
elementary row operations will have turned the matrix In into the matrix A−1 , by Corollary 5.6.
Examples.
• Find the inverse of the matrix

A = [  1   2  1 ]
    [  0  −1  1 ].
    [ −1   0  1 ]

We perform row operations on the matrix (A | I3):

[  1   2  1 | 1  0  0 ]
[  0  −1  1 | 0  1  0 ]
[ −1   0  1 | 0  0  1 ]

r3 ↦ r3 + r1, r2 ↦ −r2:

[ 1  2   1 | 1   0  0 ]
[ 0  1  −1 | 0  −1  0 ]
[ 0  2   2 | 1   0  1 ]

r3 ↦ r3 − 2r2:

[ 1  2   1 | 1   0  0 ]
[ 0  1  −1 | 0  −1  0 ]
[ 0  0   4 | 1   2  1 ]

r3 ↦ (1/4)r3:

[ 1  2   1 | 1    0    0   ]
[ 0  1  −1 | 0   −1    0   ]
[ 0  0   1 | 1/4  1/2  1/4 ]

r2 ↦ r2 + r3, r1 ↦ r1 − r3:

[ 1  2  0 | 3/4  −1/2  −1/4 ]
[ 0  1  0 | 1/4  −1/2   1/4 ]
[ 0  0  1 | 1/4   1/2   1/4 ]

r1 ↦ r1 − 2r2:

[ 1  0  0 | 1/4   1/2  −3/4 ]
[ 0  1  0 | 1/4  −1/2   1/4 ]
[ 0  0  1 | 1/4   1/2   1/4 ]

So we get

A⁻¹ = (1/4) [ 1   2  −3 ]
            [ 1  −2   1 ],
            [ 1   2   1 ]

which you should check by multiplying by A.

 
• Let

B = [  1   5   9 ]
    [  2  −1   1 ].
    [ −1  17  25 ]

We determine if B is invertible.

[  1   5   9 | 1  0  0 ]
[  2  −1   1 | 0  1  0 ]
[ −1  17  25 | 0  0  1 ]

r2 ↦ r2 − 2r1, r3 ↦ r3 + r1:

[ 1    5    9 |  1  0  0 ]
[ 0  −11  −17 | −2  1  0 ]
[ 0   22   34 |  1  0  1 ]

r3 ↦ r3 + 2r2:

[ 1    5    9 |  1  0  0 ]
[ 0  −11  −17 | −2  1  0 ]
[ 0    0    0 | −3  2  1 ]

The left-hand half now has a row of zeros, so B is not invertible.
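The procedure of Proposition 5.8 can be sketched as follows (illustrative code, not part of the notes; the helper name `inverse` is our own, and exact fractions avoid rounding):

```python
from fractions import Fraction

def inverse(rows):
    """Row-reduce (A | I); return A^-1, or None if A is not invertible."""
    n = len(rows)
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(rows)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] != 0), None)
        if piv is None:
            return None        # A row-reduces to a matrix with a zero row
        A[col], A[piv] = A[piv], A[col]
        c = A[col][col]
        A[col] = [x / c for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]

inv = inverse([[1, 2, 1], [0, -1, 1], [-1, 0, 1]])
print([[str(x) for x in row] for row in inv])   # rows of (1/4)[1 2 -3; 1 -2 1; 1 2 1]
print(inverse([[1, 5, 9], [2, -1, 1], [-1, 17, 25]]))  # → None
```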
Now let us consider how the determinant changes for equivalent matrices. What can we deduce about the
determinant of a matrix A from the elementary operations we apply to find the equivalent matrix in row-reduced
echelon form?
Theorem 5.9. Let E be an elementary matrix and let k, l ∈ R with k ≠ 0.
1. If E results from interchanging two rows of In, then det E = −1.
2. If E results from multiplying a row of In by k, then det E = k.
3. If E results from adding l times one row of In to another row, then det E = 1.
Proof. This follows from Theorem 2.16, noting that det In = 1.

We are almost in a position to prove Theorem 2.14 (3). We begin with a case.
Theorem 5.10. Suppose that A, E ∈ Mn,n (R) with E an elementary matrix. Then
det(EA) = (det E)(det A).
Proof. By Theorem 5.2, EA is the matrix obtained by applying the relevant elementary row operation to A.
Then det(EA) is described in Theorem 2.16 and det E is described in Theorem 5.9. Comparing the determinants
gives the result.

Theorem 5.11. Let A ∈ Mn,n(R). Then A is row equivalent to In if and only if det A ≠ 0.
Proof. Suppose that A is row equivalent to a matrix B which is in row-reduced echelon form. By Corollary 5.7 there exist elementary matrices E1′, E2′, . . . , Et′ such that A = E1′E2′ . . . Et′B. By repeatedly applying Theorem 5.10, we have

det A = det E1′ det E2′ . . . det Et′ det B.

If B = In then det B = 1, and det Ei′ ≠ 0 for every i, since by Theorem 5.9 none of the elementary matrices have determinant zero. Hence det A ≠ 0. If B ≠ In then by Lemma 4.5 the last row of B consists of zeros, and hence det B = 0 by Lemma 5.4. Hence det A = 0.

Combining Theorems 5.5 and 5.11, the following theorem is immediate.


Theorem 5.12. Let A ∈ Mn,n (R). Then A is invertible if and only if det A 6= 0.
Theorem 5.13. Suppose that A, B ∈ Mn,n (R). Then det(AB) = (det A)(det B).
Proof. Suppose that A and B are both invertible. Then by Corollary 5.7, A = E1E2 . . . Es and B = E1′E2′ . . . Et′ for some elementary matrices Ei, Ej′. Since AB = E1E2 . . . EsE1′E2′ . . . Et′, repeatedly applying Theorem 5.10 gives

det(AB) = det(E1E2 . . . EsE1′E2′ . . . Et′) = (det E1 . . . det Es)(det E1′ . . . det Et′) = (det A)(det B).

Now suppose A is not invertible or B is not invertible, so by Theorem 5.12, det A = 0 or det B = 0, and so (det A)(det B) = 0. Now AB is not invertible by Lemma 2.7, hence by Theorem 5.12 again, det(AB) = 0. Hence det(AB) = (det A)(det B).

Corollary 5.14. Suppose that A is invertible. Then det A⁻¹ = 1/ det A.

6 Eigenvalues and eigenvectors
In this Section we will
• See how matrices map n-dimensional space.
• Define the eigenvalues, eigenvectors and eigenspace of a square matrix and learn how to compute them.

6.1 Matrices and Rn


Definition 6.1. Let Rn = Mn,1 (R), the column vectors of length n with entries in R. One special vector is the
zero vector, v0 = 0n,1 , the vector in which every entry is zero.
Observe that when we multiply a matrix with a column vector, we get another column vector. Specifically, if A ∈ Mm,n(R) and v ∈ Rn then Av ∈ Rm.
Example. Let

A = [ 3   5  −1 ]
    [ 6  −2   8 ]       and   v = [ 3 ]
    [ 0  −1   0 ]                 [ 2 ]
    [ 1   1   2 ]                 [ 1 ].

Then

Av = [ 3   5  −1 ] [ 3 ]   [ 18 ]
     [ 6  −2   8 ] [ 2 ] = [ 22 ]
     [ 0  −1   0 ] [ 1 ]   [ −2 ]
     [ 1   1   2 ]         [  7 ].
If A ∈ Mn,n(R) is a square matrix then the map v ↦ Av is a map from Rn to Rn. (In fact, it is a linear transformation, as you will see if you take Algebra II.) It is a bijection if and only if it is an invertible map, that is, if and only if the matrix A is invertible. We can obtain information about the map by looking at properties of the matrix. We give some examples.
Example. Let

A = [ 2  1 ]
    [ 1  3 ]

and consider the map v ↦ Av for v ∈ R². Note that

[ 2  1 ] [ 1 ]   [ 2 ]     [ 2  1 ] [ 0 ]   [ 1 ]     [ 2  1 ] [ 1 ]   [ 3 ]
[ 1  3 ] [ 0 ] = [ 1 ],    [ 1  3 ] [ 1 ] = [ 3 ],    [ 1  3 ] [ 1 ] = [ 4 ].

We can check that the unit square gets mapped to the parallelogram with these corners. The area of the parallelogram is 5, which is | det A|.


Proposition 6.2. Let

A = [ a  b ]
    [ c  d ].

The map v ↦ Av for v ∈ R² maps the unit square with corners (0, 0), (1, 0), (0, 1) and (1, 1) to the parallelogram with corners (0, 0), (a, c), (b, d) and (a + b, c + d), which has area | det A|.

   
a b
Remark. We have that det A = 0 if and only if the vectors and are parallel.
c d

Proposition 6.3. Let A ∈ M3,3(R). The map v ↦ Av for v ∈ R³ maps the unit cube to a parallelepiped with volume | det A|.

Remark. Similar results hold for Rn, if you can imagine n-dimensional space. In fact, we can define the determinant (up to sign) in this way. Let A ∈ Mn,n(R). The map v ↦ Av from Rn to Rn takes the unit cube in Rn to a shape of volume | det A|. If we have A1, A2 ∈ Mn,n(R) then we can compose the maps

Rn −A1→ Rn −A2→ Rn

so that v ↦ A2(A1v) = (A2A1)v. On one hand, this composite sends the unit cube to a shape of volume | det(A2A1)|; but on the other hand, it first sends the unit cube to a shape of volume | det A1|, which is then sent to a shape of volume | det A2|| det A1|.

6.2 Eigenvalues and eigenvectors


Definition 6.4. Let A ∈ Mn,n (R). We say that λ ∈ R is an eigenvalue of A (over R) if there exists a non-zero
vector v ∈ Rn such that
Av = λv.
We call such a vector v an eigenvector with corresponding eigenvalue λ.
     
Example. Show that

v1 = [  3 ]   and   v2 = [  1 ]
     [ −1 ]              [ −1 ]

are eigenvectors of

A = [  2   3 ]
    [ −1  −2 ].

We have

Av1 = [  2   3 ] [  3 ]   [  3 ]
      [ −1  −2 ] [ −1 ] = [ −1 ] = v1;

Av2 = [  2   3 ] [  1 ]   [ −1 ]
      [ −1  −2 ] [ −1 ] = [  1 ] = −v2.

So v1, v2 are eigenvectors of A: v1 is an eigenvector of A with eigenvalue λ1 = 1, and v2 is an eigenvector of A with eigenvalue λ2 = −1.
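The two eigenvector checks above can be replayed in code (an illustration only):

```python
def matvec(A, v):
    # Row-by-row dot product: computes the column vector A v.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[2, 3], [-1, -2]]
v1, v2 = [3, -1], [1, -1]
assert matvec(A, v1) == [1 * x for x in v1]    # eigenvalue 1
assert matvec(A, v2) == [-1 * x for x in v2]   # eigenvalue -1
print("eigenvector checks passed")
```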

Theorem 6.5. Let A be an n × n matrix over R and let λ ∈ R. Then λ is an eigenvalue of A if and only if it
satisfies the equation det(A − λIn ) = 0.

Proof. The equation Av = λv can be rewritten 0n = Av − λInv, or

(A − λIn )v = 0n .

So, saying that λ is an eigenvalue of A is precisely the same as saying that there is a non-trivial solution to the
equation (A − λIn )v = 0n which, by Theorem 4.14 happens if and only if the matrix (A − λIn ) is not invertible,
which by Theorem 5.12 happens if and only if det(A − λIn ) = 0.

Definition 6.6. Let A ∈ Mn,n (R). The polynomial

charA (λ) = det(A − λIn )

is called the characteristic polynomial of the matrix A. It is a polynomial in the variable λ of degree n.

Combining this definition with Theorem 6.5, we have the following result.
Proposition 6.7. Let A ∈ Mn,n (R). Then λ is an eigenvalue of A if and only if
charA (λ) = 0.
 
Example. With

A = [  2   3 ]
    [ −1  −2 ]

as in the example above, we have the characteristic polynomial

det(A − λI2) = det [ 2 − λ      3    ]
                   [  −1     −2 − λ  ]  = (2 − λ)(−2 − λ) + 3 = λ² − 1.

We have that charA(λ) = λ² − 1 = 0 if and only if λ = ±1, so the eigenvalues of A are λ = 1 and λ = −1.
Let us find all eigenvectors corresponding to the eigenvalue 1. This means that we want to find all v = (x, y) such that

[  2   3 ] [ x ]     [ x ]
[ −1  −2 ] [ y ] = 1 [ y ],

or equivalently, such that

[ 2 − 1       3    ] [ x ]   [ 0 ]
[  −1     −2 − 1   ] [ y ] = [ 0 ],

that is, we need to solve the pair of simultaneous equations

 x + 3y = 0
−x − 3y = 0,

which we can all do easily! This has general solution x = −3k, y = k, k ∈ R, BUT we must recall from the definition that eigenvectors are non-zero, so we also need k ≠ 0. So the eigenvectors of A with eigenvalue 1 are all the vectors of the form

v = [ −3k ]   with k ∈ R \ {0}.
    [   k ]

Exercise: Find all the eigenvectors corresponding to −1.
Remark. The zero vector v0 will satisfy Av0 = λv0 for any value of λ. So v0 is never an eigenvector. (It’s kind of similar to the way that we don’t consider 1 to be a prime number.) However, as we saw above, it’s then a bit clumsy to write down the eigenvectors as we have to explicitly exclude v0. We get around this by introducing eigenspaces. You will see more of these if you take Algebra II next year.
Definition 6.8. Suppose that A ∈ Mn,n (R) and that λ is an eigenvalue of A. The eigenspace of λ is defined
to be
Eλ = {v ∈ Rn : Av = λv}.
Note that v0 is always an element of Eλ. So the eigenspace Eλ consists of all the eigenvectors for λ together with the zero vector.
Example. Find the eigenvalues of the matrix A below and for each eigenvalue, find the corresponding eigenspace.

A = [  2  2  −1 ]
    [ −1  3   0 ]
    [ −1  3   1 ]

We first compute the characteristic polynomial:

charA(λ) = det [ 2 − λ     2     −1   ]
               [  −1    3 − λ     0   ]
               [  −1      3    1 − λ  ]

         = (2 − λ)(3 − λ)(1 − λ) − 2(−(1 − λ)) − 1(−3 + (3 − λ))
         = (2 − λ)(3 − λ)(1 − λ) + 2(1 − λ) + λ
         = (2 − λ)(3 − λ)(1 − λ) + (2 − λ)
         = (2 − λ)(3 − 4λ + λ² + 1)
         = (2 − λ)(λ − 2)².

Hence charA(λ) = 0 if and only if λ = 2, so λ = 2 is the only eigenvalue of A. We now compute E2.

E2 = { v ∈ R³ : Av = 2v }
   = { (x, y, z) : 2x + 2y − z = 2x, −x + 3y = 2y, −x + 3y + z = 2z }
   = { (x, y, z) : 2y − z = 0, −x + y = 0, −x + 3y − z = 0 }.

To solve this homogeneous system of equations, we go back to the techniques in Section 4.2. Since the system is homogeneous (that is, all the constants are 0) we don’t bother writing down the last column. We consider the matrix

[  0  2  −1 ]   [ 2 − 2    2      −1   ]
[ −1  1   0 ] = [  −1    3 − 2     0   ]
[ −1  3  −1 ]   [  −1      3    1 − 2  ].

After applying elementary row operations we get

[  0  2  −1 ]        [ 1  0  −1/2 ]
[ −1  1   0 ]   ⇝    [ 0  1  −1/2 ].
[ −1  3  −1 ]        [ 0  0    0  ]

Hence the solution set, that is to say E2, is given by

E2 = { (k/2, k/2, k) : k ∈ R }.
Remark. In the example above, we showed all the working. However, all we really need to solve to find the eigenspace Eλ is the system (A − λIn)v = v0. We didn’t really need to justify the steps that led us to the matrix

[  0  2  −1 ]
[ −1  1   0 ].
[ −1  3  −1 ]
Example. Find the eigenvalues of the matrix A below and for each eigenvalue, find the corresponding eigenspace.

A = [ 2  1  −1 ]
    [ 1  2  −1 ]
    [ 1  1   0 ]

We first compute the characteristic polynomial:

charA(λ) = det [ 2 − λ    1     −1 ]
               [   1    2 − λ   −1 ]  = (2 − λ)(λ − 1)²,
               [   1      1     −λ ]

so the eigenvalues are λ = 2 and λ = 1. To find E2, consider the matrix

[ 0  1  −1 ]         [ 1  0  −1 ]
[ 1  0  −1 ]   ⇝     [ 0  1  −1 ].
[ 1  1  −2 ]         [ 0  0   0 ]

Hence

E2 = { (k, k, k) : k ∈ R }.
To find E1, consider the matrix

[ 1  1  −1 ]         [ 1  1  −1 ]
[ 1  1  −1 ]   ⇝     [ 0  0   0 ].
[ 1  1  −1 ]         [ 0  0   0 ]

Hence

E1 = { (l2 − l1, l1, l2) : l1, l2 ∈ R }.
Let us summarize our methods.

Proposition 6.9. To determine the eigenvalues and eigenspaces of a square matrix A.

1. Compute the characteristic polynomial charA (λ) = det(A − λIn ) and find the values of λ for which
charA (λ) = 0. These are the eigenvalues of A.
2. For each eigenvalue λ of A:

(a) Write down the eigenvector equation (A − λIn )v = 0n .


(b) Convert the equation to a system of simultaneous linear equations.
(c) Solve the system of linear equations to find the corresponding eigenspace.
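For a 2 × 2 matrix, the whole procedure of Proposition 6.9 fits in a few lines, since charA(λ) = λ² − tr(A)λ + det(A) and the quadratic formula gives the eigenvalues. The sketch below is our own illustration, not part of the notes, and assumes (as holds for the running example) that the characteristic polynomial has integer roots.

```python
import math

A = [[2, 3], [-1, -2]]
tr = A[0][0] + A[1][1]                        # trace
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # determinant
disc = tr * tr - 4 * det                      # discriminant of char poly
r = math.isqrt(disc)
assert r * r == disc                          # a perfect square in this example
eigenvalues = sorted((tr + s * r) // 2 for s in (1, -1))
print(eigenvalues)  # → [-1, 1]

# Step 2: from the first row of (A - lam*I)v = 0 we get
# (2 - lam)x + 3y = 0, so v = (3, lam - 2) is an eigenvector.
for lam in eigenvalues:
    v = [3, lam - 2]
    assert [sum(a * x for a, x in zip(row, v)) for row in A] == [lam * x for x in v]
```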

A Appendix
In the appendix, we collect together some of the long proofs of results that we have used. This is not part of
the module - just some extra information. If you want to know why certain things we claimed in the lecture
course are actually true, you may find this section interesting.

A.1 Inverses and Determinants


In the course, we used the fact that a left inverse of an n × n matrix was also a right inverse, and vice versa.
The proof below requires some ideas which you probably won’t see unless you take Algebra II so it most likely
won’t make too much sense now.

Proposition A.1. Suppose that A, B ∈ Mn,n (R) and that AB = In . Then BA = In .

Proof. Suppose that A, B ∈ Mn,n (R) and that AB = In . Then

B = BIn = B(AB) = (BA)B =⇒ (In − BA)B = 0n .

Unfortunately this is NOT enough to conclude that (In − BA) = 0n or B = 0n - recall that two matrices can
multiply together to get the zero matrix even if neither of them are the zero matrix. Instead, consider the linear
transformation θ : Rn → Rn given by θ(x) = Bx for all x ∈ Rn . This is injective, since if Bx = By then
x = (AB)x = (AB)y = y, and any injective linear map from an n-dimensional vector space to itself is also
surjective. So we have

(In − BA)Bx = v0 for all x ∈ Rn =⇒ (In − BA)y = v0 for all y ∈ Rn =⇒ In − BA = 0n =⇒ BA = In .

Recall that if A ∈ Mn,n(R) then for 1 ≤ i ≤ n we have

det A = Σ_{j=1}^{n} a_ij C_ij,

where C_ij = (−1)^{i+j} det A_ij is the (i, j)-cofactor and A_ij is the matrix obtained by deleting row i and column j of A. Below, we write C^A_ij to mean the (i, j)-cofactor of the matrix A (since we will be looking at cofactors of more than one matrix). We write A_{ii′jj′} to denote the matrix obtained from A by deleting rows i and i′ and columns j and j′.

Theorem A.2. Suppose that det A ≠ 0. Then we can find the inverse of A as follows:

    A^{−1} = (1/det A) adj A.

Proof. We want to consider the matrix A(adj A). First, suppose 1 ≤ i ≤ n and consider the (i, i)-entry of
A(adj A). This is given by

    \sum_{k=1}^n a_{ik} C_{ik} = \sum_{k=1}^n (−1)^{i+k} a_{ik} det A_{ik} = det A.

Now suppose that 1 ≤ i, j ≤ n with i ≠ j. The (i, j)-entry of A(adj A) is given by

    \sum_{k=1}^n a_{ik} C_{jk} = \sum_{k=1}^n (−1)^{j+k} a_{ik} det A_{jk}
                               = (−1)^{i+j} \sum_{k=1}^n ( \sum_{l=1}^{k−1} (−1)^{k+l} a_{ik} a_{il} det A_{ijkl} + \sum_{l=k+1}^n (−1)^{k+l+1} a_{ik} a_{il} det A_{ijkl} )
                               = (−1)^{i+j} ( \sum_{1≤l<k≤n} (−1)^{k+l} a_{ik} a_{il} det A_{ijkl} + \sum_{1≤l<k≤n} (−1)^{k+l+1} a_{ik} a_{il} det A_{ijkl} )
                               = 0,

since the two sums in the last bracket contain the same terms with opposite signs and so cancel.

So A(adj A) = (det A)In. Since det A ≠ 0 we have

    A ( (1/det A) adj A ) = In

and by Corollary 2.6 (or by checking directly that (adj A)A = (det A)In ) we have that A is invertible with
inverse (1/det A) adj A.
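Theorem A.2 translates directly into a (very inefficient, but exact) algorithm for inverting a matrix. The sketch below is our own illustration: det is computed by cofactor expansion along the first row, adj A is the transpose of the cofactor matrix, and Fractions avoid rounding error.

```python
# A sketch of Theorem A.2 in code: build adj A from cofactors and check that
# (1/det A) * adj A really is an inverse. Fractions keep the arithmetic exact.
from fractions import Fraction

def minor(A, i, j):
    # delete row i and column j
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    # cofactor expansion along the first row
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    n, d = len(A), det(A)
    assert d != 0
    # adj A is the TRANSPOSE of the cofactor matrix: (adj A)_{ij} = C_{ji}
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i)), d) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 2]]
Ainv = inverse(A)
n = len(A)
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(Ainv)
```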

We now look at how performing elementary row operations on square matrices affects their determinants.

Lemma A.3. Let A ∈ Mn,n (R) where n ≥ 2. Let B be the matrix obtained by swapping rows 1 and 2 of A.
Then det B = − det A.

Proof. Write A_{ii′jj′} (resp. B_{ii′jj′}) to denote the matrix obtained by deleting rows i and i′ and columns j
and j′ of A (resp. B). Then

    det B = \sum_{j=1}^n b_{1j} (−1)^{j+1} det B_{1j}
          = \sum_{j=1}^n b_{1j} (−1)^{j+1} ( \sum_{k=1}^{j−1} b_{2k} (−1)^{k+1} det B_{12jk} + \sum_{k=j+1}^n b_{2k} (−1)^k det B_{12kj} )
          = \sum_{j=1}^n a_{2j} (−1)^{j+1} ( \sum_{k=1}^{j−1} a_{1k} (−1)^{k+1} det A_{12jk} + \sum_{k=j+1}^n a_{1k} (−1)^k det A_{12kj} )
          = \sum_{1≤k<j≤n} (−1)^{j+k} a_{2j} a_{1k} det A_{12kj} + \sum_{1≤j<k≤n} (−1)^{j+k+1} a_{2j} a_{1k} det A_{12kj}

while

    det A = \sum_{k=1}^n a_{1k} (−1)^{k+1} det A_{1k}
          = \sum_{k=1}^n a_{1k} (−1)^{k+1} ( \sum_{j=1}^{k−1} a_{2j} (−1)^{j+1} det A_{12jk} + \sum_{j=k+1}^n a_{2j} (−1)^j det A_{12jk} )
          = \sum_{1≤j<k≤n} (−1)^{j+k} a_{1k} a_{2j} det A_{12jk} + \sum_{1≤k<j≤n} (−1)^{j+k+1} a_{1k} a_{2j} det A_{12kj}

Comparing the two expressions term by term, each term appears in both with opposite signs, so det B = − det A.

Lemma A.4. Let A ∈ Mn,n (R) where n ≥ 2 and suppose that 1 ≤ x < y ≤ n. Let B be the matrix obtained
by swapping rows x and y of A. Then det B = − det A.

Proof. We use induction on n. If n = 2 then

    det [ a11  a12 ]  =  a11 a22 − a12 a21  =  −(a21 a12 − a22 a11)  =  − det [ a21  a22 ]
        [ a21  a22 ]                                                          [ a11  a12 ]

so the Lemma holds for n = 2. So suppose that n > 2 and the Lemma holds for all A′ ∈ Mn−1,n−1 (R). Let
A ∈ Mn,n (R) and let B be the matrix obtained by swapping rows x and y of A. We have three cases to consider.

• If x ≠ 1 then

      det B = \sum_{j=1}^n b_{1j} C^B_{1j} = \sum_{j=1}^n a_{1j} C^B_{1j} = − \sum_{j=1}^n a_{1j} C^A_{1j} = − det A,

  where we have used the inductive hypothesis to deduce that C^B_{1j} = −C^A_{1j}.

• If x = 1 and y = 2 then the result is Lemma A.3.

• Suppose x = 1 and y > 2. Let D be the matrix obtained by swapping rows 2 and y of A, and D′ the
  matrix obtained by swapping rows 2 and y of B. Then using the first two parts of the Lemma,

      det B = − det D′ = det D = − det A.

Lemma A.5. Let A ∈ Mn,n (R) and let k ∈ R. Suppose that 1 ≤ x ≤ n and let B be the matrix obtained by
multiplying row x by k. Then det B = k det A.

Proof. We use induction on n. If n = 1 and A = (a) then det(ka) = ka = k det(a) so the Lemma holds. So
suppose that n > 1 and the Lemma holds for all A′ ∈ Mn−1,n−1 (R). Let A ∈ Mn,n (R) and let B be the matrix
obtained by multiplying row x by k. If x = 1 then

    det B = \sum_{j=1}^n b_{1j} C^B_{1j} = \sum_{j=1}^n k a_{1j} C^A_{1j} = k \sum_{j=1}^n a_{1j} C^A_{1j} = k det A,

and if x > 1 then

    det B = \sum_{j=1}^n b_{1j} C^B_{1j} = \sum_{j=1}^n a_{1j} · k C^A_{1j} = k \sum_{j=1}^n a_{1j} C^A_{1j} = k det A,

where we have used the inductive hypothesis to deduce that C^B_{1j} = k C^A_{1j}. Hence the Lemma holds for all
A ∈ Mn,n (R), and so by induction it holds for all n.

Lemma A.6. Let A ∈ Mn,n (R) with n ≥ 2 and let k ∈ R. Suppose that 1 ≤ x, y ≤ n with x ≠ y and let B
be the matrix obtained by adding k copies of row y to row x. Then det B = det A.

Proof. We use induction on n. If n = 2 then

    det [ a11 + ka21   a12 + ka22 ]  =  (a11 + ka21)a22 − (a12 + ka22)a21  =  a11 a22 − a12 a21  =  det [ a11  a12 ]
        [     a21          a22    ]                                                                     [ a21  a22 ]

and

    det [     a11          a12    ]  =  a11(a22 + ka12) − a12(a21 + ka11)  =  a11 a22 − a12 a21  =  det [ a11  a12 ]
        [ a21 + ka11   a22 + ka12 ]                                                                     [ a21  a22 ]

so the result holds for n = 2. So suppose that n > 2 and the Lemma holds for all A′ ∈ Mn−1,n−1 (R). Let
A ∈ Mn,n (R) and let B be the matrix obtained by adding k copies of row y to row x. If x ≠ 1 then

    det B = \sum_{j=1}^n b_{1j} C^B_{1j} = \sum_{j=1}^n a_{1j} C^A_{1j} = det A,

where we have used the inductive hypothesis to deduce that C^B_{1j} = C^A_{1j}. So suppose that x = 1. Let D be
the matrix obtained by swapping rows 2 and y of A, and D′ the matrix obtained by swapping rows 2 and y of B.
Then by the last result

    det B = − det D′ = − det D = det A

as required.
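Lemmas A.3 to A.6 are easy to check numerically on a small example. In the sketch below (our own illustration, not part of the notes), det is a direct cofactor expansion along the first row, and the matrices swap, scale and add are obtained from A by the three kinds of elementary row operation.

```python
# Checking Lemmas A.3-A.6 on a concrete 3x3 matrix: swapping two rows negates
# the determinant, scaling a row by k multiplies it by k, and adding k copies
# of one row to another leaves it unchanged.
def det(A):
    if len(A) == 1:
        return A[0][0]
    # cofactor expansion along the first row
    return sum((-1) ** j * A[0][j]
               * det([r[:j] + r[j+1:] for r in A[1:]]) for j in range(len(A)))

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
d = det(A)

swap = [A[2], A[1], A[0]]                                    # rows 1 and 3 swapped
scale = [A[0], [3 * x for x in A[1]], A[2]]                  # row 2 scaled by k = 3
add = [A[0], A[1], [x + 2 * y for x, y in zip(A[2], A[0])]]  # row 3 + 2 * row 1

assert det(swap) == -d        # Lemma A.4
assert det(scale) == 3 * d    # Lemma A.5
assert det(add) == d          # Lemma A.6
print(d)
```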

We now give a proof of Theorem 2.14 (2).

Theorem A.7. Let A ∈ Mn,n (R). Then det A = det A^T.

Proof. We use induction on n. If n = 1 then A = A^T so the result is true. So suppose that n > 1 and the
result holds for all A′ ∈ Mn−1,n−1 (R). Let A ∈ Mn,n (R) and let B = A^T. Then

    det B = \sum_{j=1}^n b_{1j} (−1)^{j+1} det B_{1j}
          = b_{11} det B^T_{11} + \sum_{j=2}^n b_{1j} (−1)^{j+1} det B^T_{1j}    (using the inductive hypothesis)
          = a_{11} det A_{11} + \sum_{j=2}^n a_{j1} (−1)^{j+1} det A_{j1}
          = a_{11} det A_{11} + \sum_{j=2}^n \sum_{k=2}^n a_{j1} (−1)^{j+1} (−1)^k a_{1k} det A_{1j1k}
          = a_{11} det A_{11} + \sum_{j=2}^n \sum_{k=2}^n (−1)^{j+k+1} a_{j1} a_{1k} det A_{1j1k}

while

    det A = \sum_{k=1}^n (−1)^{k+1} a_{1k} det A_{1k}
          = a_{11} det A_{11} + \sum_{k=2}^n (−1)^{k+1} a_{1k} det A^T_{1k}    (using the inductive hypothesis)
          = a_{11} det A_{11} + \sum_{k=2}^n (−1)^{k+1} a_{1k} det B_{k1}
          = a_{11} det A_{11} + \sum_{k=2}^n \sum_{j=2}^n (−1)^{k+1} a_{1k} (−1)^j b_{1j} det B_{1k1j}
          = a_{11} det A_{11} + \sum_{j=2}^n \sum_{k=2}^n (−1)^{j+k+1} a_{j1} a_{1k} det A_{1j1k}    (using the inductive hypothesis)

Comparing the two expressions, we conclude that det A^T = det B = det A.

A.2 Row reduction


Theorem A.8. Suppose that M ∈ Mm,n (R). By performing the elementary row operations described in
Theorem 4.8, we can convert M into a matrix which is in row-reduced echelon form.

Proof. Let M be an m × n matrix. If every entry of M is 0 then we are done. So we suppose that M has
some non-zero entries. The proof is by induction on m, the number of rows of M.
Base Step m = 1: Let d be the leading entry of the (only) row of M. Now multiply the row by d^{−1} to get
leading entry 1 and we are done.
Inductive step: Suppose the result holds for all matrices with (m − 1) rows. Carry out the following:

(1) If necessary, interchange rows so that the first column which does not consist entirely of zeros (call it column
j) has a non-zero entry d in the first row. The matrix looks something like:

    [ 0 ··· 0  d  ∗ ··· ∗ ]
    [ :     :  :  :     : ]
    [ 0 ··· 0  ∗  ∗ ··· ∗ ]

where ∗ represents entries that we are not interested in for the moment.
(2) Multiply the first row by d^{−1}. We denote the remaining entries in column j by c2 , . . . , cm , so the matrix
looks like:

    [ 0 ··· 0   1  ∗ ··· ∗ ]
    [ 0 ··· 0  c2  ∗ ··· ∗ ]
    [ :     :   :  :     : ]
    [ 0 ··· 0  cm  ∗ ··· ∗ ]
(3) We clear the j-th column by subtracting

    c2 times row r1 from row r2 ,
    c3 times row r1 from row r3 ,
    ...
    cm times row r1 from row rm .

Now the matrix looks like

    [ 0 ··· 0  1  ∗ ··· ∗ ]
    [ 0 ··· 0  0  ∗ ··· ∗ ]
    [ :     :  :  :     : ]
    [ 0 ··· 0  0  ∗ ··· ∗ ]
Now let B be the (m − 1) × n matrix consisting of the last (m − 1) rows of this matrix. By induction, B is row
equivalent to a matrix in row-reduced echelon form. Moreover, if we perform elementary row operations to B
then the first j columns remain zeros so getting B into row-reduced echelon form also gets M into row echelon
form.
(4) The final step is to clear any non-zero entries in the first row of M which appear above leading 1s in the
rows below. Labelling our entries in row r1 as follows, the matrix now looks something like this:

    [ 0 ··· 0  1  e_{j+1} ··· e_n ]
    [ 0 ··· 0  0     1    ···  0  ]
    [ :     :  :     :         :  ]
    [ 0 ··· 0  0     0    ···  1  ]

For k = j + 1, . . . , n proceed as follows: if ek ≠ 0 and ek appears above a leading 1 in row ri , then perform
the elementary row operation r1 − ek ri . After performing these finitely many row operations, our matrix M
will be in row-reduced echelon form.
Hence, by induction the result holds for all m × n matrices.
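The algorithm in the proof of Theorem A.8 can be written out as a short program. The sketch below is our own iterative rendering of steps (1) to (4): the induction on the number of rows becomes a loop over the columns, and Fractions keep the arithmetic exact.

```python
from fractions import Fraction

def rref(M):
    """Row-reduce M following the steps of Theorem A.8 (iteratively rather
    than recursively)."""
    M = [[Fraction(x) for x in row] for row in M]
    m, n = len(M), len(M[0])
    row = 0                                       # next pivot row to fill
    for col in range(n):
        # step (1): find a row at or below `row` with a non-zero entry here
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue                              # nothing to do in this column
        M[row], M[pivot] = M[pivot], M[row]       # interchange rows
        d = M[row][col]
        M[row] = [x / d for x in M[row]]          # step (2): leading entry 1
        for r in range(m):                        # steps (3) and (4): clear the
            if r != row and M[r][col] != 0:       # column below AND above the pivot
                c = M[r][col]
                M[r] = [x - c * y for x, y in zip(M[r], M[row])]
        row += 1
    return M

print(rref([[0, 2, 4, 2], [1, 1, 1, 1], [2, 4, 6, 4]]))
```

On this example the function returns [[1, 0, −1, 0], [0, 1, 2, 1], [0, 0, 0, 0]], which is in row-reduced echelon form.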

Theorem A.9. Every matrix is row equivalent to a unique matrix in row-reduced echelon form.

Proof. In Theorem A.8 above we saw an algorithm that shows that we can use row operations to convert any
matrix into a matrix in row-reduced echelon form. So we need to consider uniqueness. We use induction on the
number of columns. If there is only one column, there are only two matrices in row-reduced echelon form, the
zero matrix and the matrix with 1 in the top left corner and 0 elsewhere, and these are clearly not row-equivalent.
So suppose that A ∈ Mm,n (R) and that the result holds for A0 ∈ Mm,n−1 (R). Suppose that B and C are
matrices in row-reduced echelon form which are both equivalent to A.
Let Ā (resp. B̄, C̄) be the matrices obtained by removing the last column from A (resp. from B, C). Any
sequence of elementary row operations that turns A into a matrix in row-reduced echelon form also turns Ā
into a matrix in row-reduced echelon form, so B̄ and C̄ are in row-reduced echelon form and therefore by the

induction hypothesis, B̄ = C̄. Hence B and C agree, except possibly in the last column. Let bi (resp. ci ) be
the entry in row i and column n of B (resp. C).
Consider A, B and C as augmented matrices for a system of m equations in n − 1 unknowns. By Theorem 4.6,
the systems all have the same solution space. Suppose that B̄ (and hence C̄) has got r non-zero rows and that
the leading 1s are in columns f1 , f2 , . . . , fr .
Suppose that bi ≠ ci for some 1 ≤ i ≤ r. Then setting xfl = bl for 1 ≤ l ≤ r and xk = 0 for every other k
gives a solution to the system indexed by B but not to the system indexed by C (since row i of C forces
xfi = ci ≠ bi ), giving a contradiction. So bi = ci for all 1 ≤ i ≤ r.
Now the system given by B (resp. by C) is inconsistent if and only if bi ≠ 0 for some i > r (resp. ci ≠ 0
for some i > r) and since B (resp. C) is in row-reduced echelon form, in this case it must be that br+1 = 1,
with bi = 0 for all i ≠ r + 1 (resp. cr+1 = 1, with ci = 0 for all i ≠ r + 1). Thus if the system given by B
is inconsistent then so is the system given by C (by Theorem 4.6) and B = C. Otherwise both systems are
consistent and bi = ci = 0 for i ≥ r + 1 and so again B = C.
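Theorem A.9 predicts that however we scramble a matrix by elementary row operations, row reduction always lands on the same matrix. The sketch below (our own code; rref is a compact Gauss-Jordan elimination over Fractions) checks this for one matrix A and a row-equivalent matrix B.

```python
# Illustrating Theorem A.9: two different-looking matrices obtained from the
# same matrix by elementary row operations reduce to the SAME row-reduced
# echelon form.
from fractions import Fraction

def rref(M):
    M = [[Fraction(x) for x in row] for row in M]
    m, n, row = len(M), len(M[0]), 0
    for col in range(n):
        pivot = next((r for r in range(row, m) if M[r][col] != 0), None)
        if pivot is None:
            continue
        M[row], M[pivot] = M[pivot], M[row]       # swap a non-zero entry up
        d = M[row][col]
        M[row] = [x / d for x in M[row]]          # scale to a leading 1
        for r in range(m):
            if r != row:                          # clear the rest of the column
                c = M[r][col]
                M[r] = [x - c * y for x, y in zip(M[r], M[row])]
        row += 1
    return M

A = [[1, 2, 0], [0, 1, 1], [1, 3, 1]]
# B: swap rows 1 and 2 of A, then add 3 copies of row 3 to row 1
B = [[A[1][j] + 3 * A[2][j] for j in range(3)], [A[0][j] for j in range(3)], A[2]]
assert rref(A) == rref(B)
print(rref(A))
```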
