Linear Algebra Course Pack PDF


MA212 Further Mathematical Methods Lecture LA 1

Introducing Linear Algebra:
Arrangements, Overview, Some Revision

What's on Moodle: Slides, Recordings, Assignments
& their Solutions (eventually), Past Papers

Homework Assignments: class arrangements continue as for Calculus
Course content

Vector spaces and all that

Standard examples of spaces: R³, R⁴, Rⁿ

Exotic examples of spaces, like M_{2×2}: the 2 × 2 matrices
(as these can be added and scaled). But note that we can code

[ a  c ]
[ b  d ]

as (a, b, c, d) ∈ R⁴.

Bases, co-ordinates
Orthogonal bases =⇒ projections
Nearest point map: good approximations
Linear regression



Content cont’d

Linear transformations: their representation by matrices


Square Matrices: when diagonalizable ... their interpretation
via eigenvalues
Complex scalars and vectors ... as real matrices can have
complex eigenvalues
What if not diagonalizable?
How about non-square matrices? Generalized Inverses

Exotic examples:

A(f) := ∫₀¹ f(t) dt



Background Readings (for this week’s material)

Anthony and Harvey (Linear Algebra)


Ch. 2 (§2.2 and §2.3), Ch. 3 (§3.1 and §3.3), Ch. 5

Adam Ostaszewski (Advanced Mathematical Methods)


Ch. 1 (§1.1-§1.3), also §8.1.



Revision: A vector space is

a set V equipped with notions of “addition” and “scalar


multiplication”, so that:

(i) the sum v₁ + v₂ of any two elements (vectors) v₁, v₂ in V
is also in V,
(ii) the scalar multiple av is also in V, whenever v is in V and
a is any scalar.

Usually, we shall take the set of scalars to be the set R of all real
numbers: then we say that V is a real vector space. Sometimes
we shall take the set of scalars to be the set C of all complex
numbers: then V is a complex vector space.



*Formal rules for a vector space

Addition and scalar multiplication of vectors in a vector space


satisfy the following properties, all of which hold for all vectors
u , v , w in the vector space, and all scalars a , b :

Rules of Vector Addition (group rules)


(i) commutativity: u + v = v + u ,
(ii) associativity: u + (v + w) = (u + v) + w ,
(iii) a zero vector, denoted 0 , exists with: 0 + v = v + 0 = v ,
Rules of Vector Scaling (action rules)
(iv) action of the scalar 1 : 1u = u,
(v) associativity: (ab)u = a(bu),
Rules of interconnection (distributivity rules)
(vi) (a + b)u = au + bu, (vii) a(u + v) = au + av.
* means “No need to memorize”
Linear independence

Let V be a real or complex vector space.

A finite subset {v1 , v2 , . . . , vk } of vectors in V is defined to be


linearly dependent if there are scalars a1 , . . . , ak , not all zero,
such that a1 v1 + a2 v2 + · · · + ak vk = 0 .

A set of vectors that is not linearly dependent is called linearly


independent . So the set {v1 , v2 , . . . , vk } is linearly independent
if the only scalars a1 , . . . , ak satisfying
a1 v1 + a2 v2 + · · · + ak vk = 0 are a1 = a2 = · · · = ak = 0 .



Complex 3-space C³

C³ := { (z₁, z₂, z₃)ᵀ : z₁, z₂, z₃ ∈ C }

Example: with i = √(−1),

v = ( 2i, 1 + i, 3 − 5i )ᵀ.

Scaling by a complex scalar? Here goes



Complex scaling

(1 − i)v = ( (1 − i)2i, (1 − i)(1 + i), (1 − i)(3 − 5i) )ᵀ

         = ( 2 + 2i, 1 − i², 3 − 5 − i(3 + 5) )ᵀ

         = ( 2 + 2i, 2, −2 − 8i )ᵀ.



Check for linear independence the set

{ (1, 2, i)ᵀ, (0, 4 + i, −1)ᵀ, (0, 0, 3 − i)ᵀ }.

Suppose for some scalars a₁, a₂, a₃

a₁ (1, 2, i)ᵀ + a₂ (0, 4 + i, −1)ᵀ + a₃ (0, 0, 3 − i)ᵀ = 0 = (0, 0, 0)ᵀ.



Cont'd

a₁ (1, 2, i)ᵀ + a₂ (0, 4 + i, −1)ᵀ + a₃ (0, 0, 3 − i)ᵀ = (0, 0, 0)ᵀ.

Component comparison gives

a1 + 0 + 0 = 0, =⇒ a1 = 0,
2a1 + (4 + i)a2 + 0a3 = 0, =⇒ a2 = 0,
ia1 − a2 + a3 (3 − i) = 0 =⇒ a3 = 0.

So the set is linearly independent.
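A quick numerical cross-check is possible here. The sketch below is my own (not part of the notes) and assumes numpy is available; the rank of the stacked vectors equals 3 exactly when the set is linearly independent, and numpy's matrix_rank accepts complex entries.

```python
# My own sketch: complex rank check for the three vectors above.
import numpy as np

V = np.array([[1, 2, 1j],
              [0, 4 + 1j, -1],
              [0, 0, 3 - 1j]])    # one vector per row
print(np.linalg.matrix_rank(V))   # 3: the set is linearly independent
```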



Check for linear dependence the set

v₁ = (1, 3, 1)ᵀ, v₂ = (2, 0, −1)ᵀ, v₃ = (0, 2, 1)ᵀ.

By observation
(−2)v₁ + 1v₂ + 3v₃ = 0.

Indeed

(−2)(1, 3, 1)ᵀ + (2, 0, −1)ᵀ + 3(0, 2, 1)ᵀ = 0 = (0, 0, 0)ᵀ.



Component comparison gives

(−2) × 1 + 2 + 3 × 0 = 0,
(−2) × 3 + 0 + 3 × 2 = 0,
(−2) × 1 + (−1) + 3 × 1 = 0.

But, how did we find these scalars?



Method for linear independence: Flip-Stack-Reduce

More accurately: Transpose, stack, row-reduce.

v₁ = (1, 3, 1)ᵀ, v₂ = (2, 0, −1)ᵀ, v₃ = (0, 2, 1)ᵀ

      [ v₁ᵀ ]   [ 1  3  1 ]
 =⇒   [ v₂ᵀ ] = [ 2  0 −1 ].
      [ v₃ᵀ ]   [ 0  2  1 ]

This is the transpose and stack part. The next step is the
row-reduce part.



Method for linear independence: Flip-Stack-Reduce

On completion :

If a zero row appears, then there is linear dependence


Otherwise there is linear independence.

Example to follow after next business.
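Meanwhile, here is a minimal sketch (mine, assuming sympy is available) of flip-stack-reduce in code: stack the vectors as rows, row-reduce, and test for a zero row, equivalently for fewer pivots than rows.

```python
# Flip-stack-reduce as code: a zero row in the reduced form
# signals linear dependence.
from sympy import Matrix

v1, v2, v3 = [1, 3, 1], [2, 0, -1], [0, 2, 1]
M = Matrix([v1, v2, v3])       # transpose-and-stack: one vector per row
R, pivots = M.rref()           # row-reduce

print(R)                       # the final row is zero here
print(len(pivots) < M.rows)    # True: {v1, v2, v3} is linearly dependent
```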



Elementary row operations: revision

(i) Multiply a row by a non-zero constant
(ii) Exchange two rows
(iii) Add a scalar multiple of another row to a row

Row ops leave the row space of the matrix unchanged
They also preserve column dependencies



Watch the example

[ 1  3  1 ]   [ v₁ᵀ ]
[ 2  0 −1 ] = [ v₂ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ ]

R₂ → R₂ − 2R₁:

[ 1  3  1 ]   [ v₁ᵀ        ]
[ 0 −6 −3 ] = [ v₂ᵀ − 2v₁ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ        ]

R₂ → R₂ + 3R₃:

[ 1  3  1 ]   [ v₁ᵀ                 ]
[ 0  0  0 ] = [ (v₂ᵀ − 2v₁ᵀ) + 3v₃ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ                 ]



End effect

From

[ 1  3  1 ]   [ v₁ᵀ                 ]
[ 0  0  0 ] = [ (v₂ᵀ − 2v₁ᵀ) + 3v₃ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ                 ]

we have

(v₂ᵀ − 2v₁ᵀ) + 3v₃ᵀ = 0ᵀ

−2v₁ + v₂ + 3v₃ = 0,

(finally).



Recalling some general facts

For a square matrix A the following are equivalent:

A has an inverse
det(A) ≠ 0
Rows of A are lin. independent
Row reduction doesn't yield a zero row



MA212 Further Mathematical Methods Lecture LA 2

Mostly about Gaussian Elimination

Menu for the day:


Recap: vector subspaces
Gaussian elimination (Main dish/ main business)
Bases and dimension
Elementary matrices and Determinants
Wronskian Determinant
Recap: vector subspaces

A subset U of a vector space V is a vector subspace (or just


subspace) of V if U is also a vector space (with the addition and
multiplication inherited from the larger vector space V ).

To check that a subset U of V is a vector subspace, we don’t


need to verify all the properties of addition and multiplication of a
vector space, as these already hold in V .

What we do need is to check U is closed under addition and


scaling ...



Criterion for checking U is a subspace

U is closed under addition and scaling: whenever a is a scalar
and v₁, v₂, v are elements of U, then v₁ + v₂ is in U and av
is in U.

Equivalently:
a subset U of a vector space V is a subspace of V
iff U is closed under linear combinations:

i.e., whenever a1 , a2 are scalars and v1 , v2 are elements of U ,


then the linear combination a1 v1 + a2 v2 is also in U .



Subspace: an example to check

W = { (x, y, z)ᵀ : 2x − y + z = 0 }

Consider a scalar a and

x := (x, y, z)ᵀ, u := (u, v, w)ᵀ, both in W.

Is x + u = (x + u, y + v, z + w)ᵀ in W, and is au = (au, av, aw)ᵀ ∈ W?
Let’s see.

2x − y + z = 0,
2u − v + w = 0.

1. Adding the two equations

2[x + u] − [y + v] + [z + w] = 0,

so the components of x + u satisfy the defining condition.


2. Scaling the first equation by a,

2(ax) − (ay) + (az) = 0, so ax := (ax, ay, az)ᵀ satisfies the defining condition.
Linear combinations

The linear span of a finite set S = {v₁, v₂, ..., vₖ} of vectors in a
real vector space V is the subset

Lin(S) = { Σᵢ₌₁ᵏ aᵢvᵢ : aᵢ ∈ R }

of linear combinations of the vectors in S. If the vectors are in a
complex vector space, then the sum is over all aᵢ in C. The
linear span of an infinite set A of vectors is the set of vectors
that can be written as a linear combination of finitely many
vectors from A.



Spanning and other notation for Lin(A)

A subset A of a vector space V spans the vector space V if


span(A) = V, meaning that every vector in the vector space can
be written as a finite linear combination of vectors from A.
Other notation for Lin(A) is either of:
span(A), ⟨A⟩.



Gaussian elimination and its purpose

In MA100 you learned how to solve simultaneous equations:


Step 1 : Express in matrix form

Ax = b.

Step 2 : Use row reduction on the augmented matrix

(A|b)

to get this into Echelon form.


This is called Gaussian elimination: because the echelon form
eliminates all but one variable from each equation.
If the system is consistent (= has a solution at all), the method
easily finds all the solutions.
A simple example

Consider 3x1 + 2x3 = 6 and −x1 + x2 + x3 = 5 . We get the


augmented matrix

[  3  0  2 | 6 ]
[ −1  1  1 | 5 ].

Gaussian elimination yields the augmented matrix

[ 1  0  2/3 | 2 ]
[ 0  1  5/3 | 7 ].

Looking just at the first two columns, which are the columns
containing leading 1s, we can read off one solution: x1 = 2 ,
x2 = 7 and x3 = 0 .
Simple example cont’d

Additional solutions are provided by allowing x₃ to be any real
number (x₃ is called a free variable) and setting x₁ = 2 − (2/3)x₃
and x₂ = 7 − (5/3)x₃.
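As a check (my own sketch, assuming sympy is available), linsolve reproduces the same one-parameter family of solutions, with x₃ free:

```python
from sympy import symbols, linsolve, Matrix

x1, x2, x3 = symbols('x1 x2 x3')
A = Matrix([[3, 0, 2], [-1, 1, 1]])   # coefficient matrix of the system
b = Matrix([6, 5])
print(linsolve((A, b), x1, x2, x3))
# {(2 - 2*x3/3, 7 - 5*x3/3, x3)}: x3 is the free variable, as above
```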



Finite dimensional spaces

A vector space V is called ‘finite-dimensional’ if V has a (finite!)


basis.

Theorem. If V is finite-dimensional and has a basis with


n members, then any set of n + 1 vectors of V is linearly
dependent.

Proof. Not examined. Available in the Appendix.



Dimension

By the theorem just stated:


In a finite dimensional vector space V any two bases of V have
exactly the same number of vectors.

That number is the dimension of V denoted dim(V ) .



Elementary matrices versus Gaussian Elimination

An elementary matrix arises from applying an elementary row


operation on the identity matrix (of any size m × m ).
We list the types below and check that if E is elementary of size
m × m, then A′ defined by

A′_{m×n} := E_{m×m} A_{m×n}

is A row-reduced by one elementary row operation.



Non-zero scaling of a row

      [ 1              ]
      [   ⋱            ]
Eₐ =  [      a         ]   ← j-th row, scaled by a ≠ 0
      [        ⋱       ]
      [             1  ]



Example

[ 1  0  0 ] [ a  b ]   [  a    b  ]
[ 0 −2  0 ] [ c  d ] = [ −2c  −2d ]
[ 0  0  1 ] [ e  f ]   [  e    f  ]

Another example, with a ≠ 0:

E₁⁄ₐ Eₐ = I,

since E₁⁄ₐ multiplies the j-th row of Eₐ by 1/a. So Eₐ is
invertible.



Exchange rows

           [ 1              ]
           [   ⋱            ]
           [    0 ... 1     ]   ← i-th row
E^exch =   [    ⋮  ⋱  ⋮     ]
           [    1 ... 0     ]   ← j-th row
           [            ⋱   ]
           [              1 ]



Example

[ 0  0  1 ] [ a  b ]   [ e  f ]
[ 0  1  0 ] [ c  d ] = [ c  d ]
[ 1  0  0 ] [ e  f ]   [ a  b ]

Another example:

E^exch E^exch = I,

since E^exch exchanges the same two rows again. So E^exch is
invertible.



Adding a row

Fix i, j.

          [ 1               ]
          [   ⋱             ]
          [     1           ]   ← i-th row
Eₐᵃᵈᵈ =   [       ⋱         ]
          [     a   1       ]   ← j-th row:  Rⱼ → Rⱼ + aRᵢ
          [           ⋱     ]
          [              1  ]



Example

[ 1  0  0 ] [ a  b ]   [   a       b    ]
[ 0  1  2 ] [ c  d ] = [ c + 2e  d + 2f ]
[ 0  0  1 ] [ e  f ]   [   e       f    ]

Another example:

E₋ₐᵃᵈᵈ Eₐᵃᵈᵈ = I,

since E₋ₐᵃᵈᵈ takes away a × (i-th row of I) from the j-th row of Eₐᵃᵈᵈ,
so cancels the earlier addition of that same row to the j-th row.
So Eₐᵃᵈᵈ is invertible.
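A short numpy sketch (my own, not part of the notes) of the three types of elementary matrix acting on a 3 × 2 matrix, with the inverse undoing the row-addition:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
I = np.eye(3)

E_scale = I.copy(); E_scale[1, 1] = -2.0   # scale row 2 by -2
E_exch  = I[[2, 1, 0]]                     # exchange rows 1 and 3
E_add   = I.copy(); E_add[1, 2] = 2.0      # R2 -> R2 + 2*R3

print(E_scale @ A)    # row 2 of A multiplied by -2
print(E_exch @ A)     # rows 1 and 3 of A exchanged
print(E_add @ A)      # 2*(row 3) added to row 2 of A

E_add_inv = I.copy(); E_add_inv[1, 2] = -2.0   # R2 -> R2 - 2*R3
print(np.allclose(E_add_inv @ E_add, I))       # True: the addition is undone
```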



Inversion and elementary row operations

Suppose that An×n can be row reduced to the identity using k


row operations.
Label them 1, 2, 3, ..., k according to the order of operation.
Using the corresponding elementary matrices we get

Eₖ · ... · E₂ · E₁ A = I.

All these elementary matrices are invertible so

A = E₁⁻¹ · E₂⁻¹ · ... · Eₖ⁻¹.

As the inverse of an elementary matrix is an elementary matrix,

A = a product of elementary matrices.



Inverse from elementary matrices

By multiplying out the matrices below we see that

[Eₖ · ... · E₂ · E₁][E₁⁻¹ · E₂⁻¹ · ... · Eₖ⁻¹] = I
  = [E₁⁻¹ · E₂⁻¹ · ... · Eₖ⁻¹][Eₖ · Eₖ₋₁ · ... · E₁].

So A is invertible and

A⁻¹ = Eₖ · ... · E₂ · E₁.



Determinants

Elementary operations on determinants.


We list some properties of the determinant det(A) of a square
matrix A.
They correspond to elementary row operations.

(i) Exchange two rows, then the determinant is multiplied by −1 ;


(ii) Multiply a row by c, then the determinant is multiplied by c ;
(iii) Add a multiple of a row to another row, the determinant is
unchanged;
(iv) det(In×n ) = 1.



Consequences

If E is an elementary matrix

det(EA) = det(E) det(A)

Proof. We consider separately each of the cases (i), (ii), (iii) above.
In each case we work out det(E) and check that our formula for
det(EA) is correct, as follows:
(i) det(E) = −1,
(ii) det(E) = c,
(iii) det(E) = 1, since the operation leaves the determinant
unchanged and E comes from I with det(I) = 1.



Products of determinants: step 1

In particular, if A is invertible, then

A = a product of elementary matrices

so, writing say


A = F1 · ... · Fk ,

det(A) = det(F1 ) · ... · det(Fk ).

Why? Because

det(A) = det(F1 · [F2 · ... · Fk ])


= det(F1 ) · det(F2 · ... · Fk )
= det(F1 ) · det(F2 ) · det(F3 · ... · Fk )
etc.
Products of determinants: step 2

If also B is invertible, then, writing

B = H1 · ... · Hk ,

AB = F1 · ... · Fk · H1 · ... · Hk

det(AB) = det(F1 ) · ... · det(Fk ) · det(H1 ) · ... · det(Hk )


= det(A) det(B).

If A or B is not invertible, this equation continues to hold. Why?

(det B = 0 → ∃ v ≠ 0 with Bv = 0 → ABv = 0 → det AB = 0;
det B ≠ 0 and det A = 0 → det(AB) = 0, since otherwise A =
(AB)B⁻¹ would be invertible.)
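A numerical sanity check (my own sketch, assuming numpy) of the product rule for determinants:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
```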
Linear independence of functions

This is a one-way test for independence.


We illustrate the idea with 3 functions, but it is easy to generalize
to more functions.
We work in the vector space of functions. The zero vector is the
function o(t) ≡ 0 .

Suppose the three functions below are twice differentiable:


f1 : [0, 1] → R, f2 : [0, 1] → R, f3 : [0, 1] → R
Suppose also that they are linearly dependent.
So there are scalars a₁, a₂, a₃, not all zero, with
a₁f₁(t) + a₂f₂(t) + a₃f₃(t) = o(t) = 0 (0 ≤ t ≤ 1).



Use some differentiation

Differentiate each side


a1 f1′ (t) + a2 f2′ (t) + a3 f3′ (t) = o(t) = 0 (0 ≤ t ≤ 1),
because o′ (t) = 0′ = 0 .

Differentiate again
a1 f1′′ (t) + a2 f2′′ (t) + a3 f3′′ (t) = o(t) = 0 (0 ≤ t ≤ 1).

Express the three equations above in matrix form...



In matrix form

Express the three equations above in matrix form...

[ f₁(t)   f₂(t)   f₃(t)  ] [ a₁ ]       [ 0 ]
[ f₁′(t)  f₂′(t)  f₃′(t) ] [ a₂ ] = 0 = [ 0 ].
[ f₁″(t)  f₂″(t)  f₃″(t) ] [ a₃ ]       [ 0 ]



In symbols

W(t)a = 0   (0 ≤ t ≤ 1),

where we define below the Wronskian matrix W(t) to be

        [ f₁(t)   f₂(t)   f₃(t)  ]
W(t) =  [ f₁′(t)  f₂′(t)  f₃′(t) ].
        [ f₁″(t)  f₂″(t)  f₃″(t) ]

So W(t) is not invertible (otherwise a = 0). So the Wronskian
determinant is zero.



In summary:

The Wronskian vanishes if the functions f1 (t), f2 (t), f3 (t) are


linearly dependent

W (t) := det(W(t)) ≡ 0 (for all 0 ≤ t ≤ 1). (WV)

We deduce a test for non-dependence from non-vanishing:

INDEPENDENCE TEST (Somewhere Non-vanishing Wronskian):
The three functions f₁(t), f₂(t), f₃(t) are linearly independent
if for some t₀ with 0 ≤ t₀ ≤ 1

W(t₀) = det(W(t₀)) ≠ 0.



Example:

Check for independence these functions defined for 0 ≤ t ≤ 2 :

f₁(t) ≡ 1, f₂(t) ≡ sin t, f₃(t) ≡ sin 2t (0 ≤ t ≤ 2).

        [ 1    sin t     sin 2t   ]
W(t) =  [ 0    cos t     2 cos 2t ].
        [ 0   −sin t    −4 sin 2t ]

W(t) = −4 sin 2t cos t + 2 sin t cos 2t



Some experimenting needed:

W (0) = 0, W (π/2) = 0 + 2 · 1 · (−1) = −2.

Conclusion: Here the functions f1 (t), f2 (t), f3 (t) are linearly


independent.
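The same computation can be checked in sympy (my own sketch; sympy exposes a wronskian helper, though expanding the 3 × 3 determinant by hand works just as well):

```python
from sympy import symbols, sin, pi, wronskian

t = symbols('t')
W = wronskian([1, sin(t), sin(2*t)], t)   # det of the Wronskian matrix
print(W.subs(t, 0))        # 0: no conclusion at t = 0
print(W.subs(t, pi/2))     # -2, non-zero: the functions are independent
```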



In general ..

if the k functions f₁(t), f₂(t), ..., fₖ(t) are differentiable k − 1
times, we define

        [ f₁(t)        f₂(t)       ···   fₖ(t)       ]
W(t) =  [ f₁′(t)       f₂′(t)      ···   fₖ′(t)      ].
        [   ⋮            ⋮                 ⋮         ]
        [ f₁⁽ᵏ⁻¹⁾(t)   f₂⁽ᵏ⁻¹⁾(t)  ···   fₖ⁽ᵏ⁻¹⁾(t)  ]

Then f₁(t), f₂(t), ..., fₖ(t) are linearly independent if for some t₀

W(t₀) = det(W(t₀)) ≠ 0.



MA212 Further Mathematical Methods Lecture LA 3

Lecture 23: Linear Transformations

Linear versus Matrix Transformations


Co-ordinates from bases
Base changes and representations
Linear transformations e.g. Matrix transformations

A function T : U → V between vector spaces U, V is linear if it


is both additive and homogeneous:

(i) T (u1 + u2 ) = T (u1 ) + T (u2 )


(ii) T (au) = aT (u).

This says T preserves (or ‘respects’) vector addition and


scaling.
This is reminiscent of our definition of subspace (where addition
and scaling preserves membership of the subspace); as there,
so too here we can roll the two conditions (i) and (ii) into one:

T (a1 u1 + a2 u2 ) = a1 T (u1 ) + a2 T (u2 ),

(giving preservation of linear combinations).


Important special case:

For U = Rⁿ, V = Rᵐ and A_{m×n} a matrix,

T x := Ax,

i.e. T is defined by using matrix multiplication to define y = T x,
where
x ∈ Rⁿ and y ∈ Rᵐ

and
y = Ax.



Comment

Whilst this is a special case of a linear transformation, it


describes exactly all linear transformations between vector
spaces U, V provided these are finite dimensional. We will see
why shortly.

We therefore say that matrices offer a representation for the


linear transformations.



Example 1.

Here U = R³ and V = R¹ = R:

T (x, y, z)ᵀ = 2x − y + z.

Consider x := (x, y, z)ᵀ, u := (u, v, w)ᵀ; then

T(x + u) = T (x + u, y + v, z + w)ᵀ = 2[x + u] − [y + v] + [z + w].
Example 1 cont’d

So

T(x + u) = T (x + u, y + v, z + w)ᵀ = 2[x + u] − [y + v] + [z + w]
         = 2x − y + z + [2u − v + w]
         = T(x) + T(u).



Example 1 cont’d

Consider a scalar a and

T(au) = T (au, av, aw)ᵀ = [2au − av + aw] = a[2u − v + w] = aT(u).

Notice that the matrix

A = ( 2  −1  1 )

represents T, as we have

T(x) = 2x − y + z = ( 2  −1  1 ) (x, y, z)ᵀ = Ax.


Example 2.

Here U = R³ and V = R³:

    [ x ]   [ 2x − y ]   [ 2 −1  0 ] [ x ]
T   [ y ] = [ y + z  ] = [ 0  1  1 ] [ y ].
    [ z ]   [ 2x + z ]   [ 2  0  1 ] [ z ]



Example 3.

Here U = the continuous functions f : [0, 1] → R, and V = R:

T(f) = ∫₀¹ f(t) dt.

Integration is a linear transformation:

Additivity:

T(f + g) = ∫₀¹ [f(t) + g(t)] dt = ∫₀¹ f(t) dt + ∫₀¹ g(t) dt = T(f) + T(g).



Example 3 cont’d

Homogeneity:

T(af) = ∫₀¹ a f(t) dt = a ∫₀¹ f(t) dt = aT(f).

Recalling that the integral is defined as a limiting sum, T above
may be compared with this summation:

T (x, y, z)ᵀ = x + y + z.



Example 4. Expectation as linear transformation.

Exotic but significant:


Here U is the space of random variables X on a finite
state space Ω.
T (X) = E[X].

Here

T (X + Y ) = E[X + Y ] = E[X] + E[Y ] = T (X) + T (Y ),


T (aX) = E[aX] = aE[X] = aT (X).



Bases

A base B is an ordered, spanning set of linearly independent


vectors. The primary purpose is to represent a vector by a
unique column of co-ordinates.

In a finite dimensional space U with dimension n a base B is


an ordered set of n vectors, written:

B = (u1 , u2 , ..., un ) or [u1 , u2 , ..., un ]

Note the round or square brackets (but NOT the curly variety
{...} ). The notation emphasizes that the vectors ui in the list
above are presented in a fixed order.



Co-ordinate Column

Then for any u in U there exist unique scalars x₁, ..., xₙ with

u = x₁u₁ + x₂u₂ + ... + xₙuₙ,

i.e. if also
u = x₁′u₁ + x₂′u₂ + ... + xₙ′uₙ,

then
x₁ = x₁′, ..., xₙ = xₙ′.



An expression for the co-ordinates

We write this representation of u using the unique column of
co-ordinates corresponding to B, following MA100, in either of
the two ways:

(u)_B or [u]_B = (x₁, x₂, ..., xₙ)ᵀ_B, or rarely just [u]_B = (x₁, x₂, ..., xₙ)ᵀ.

(Column vectors can be written with round or square brackets!)

We will usually use [u]_B, as more convenient. Whichever choice
one follows, the notation has to "remember" the base B, and a
bit of notational overkill is just 'safety first'.
Co-ordinate column cont’d

We say that the column [u]B represents u relative to the base


B , or briefly that it B -represents u .



Example.

In R³ the natural base is

E = ( (1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (0, 0, 1)ᵀ ),

so for u as below we have:

u := (4, 4, 6)ᵀ = 4(1, 0, 0)ᵀ + 4(0, 1, 0)ᵀ + 6(0, 0, 1)ᵀ.



So ...

so

u = (4, 4, 6)ᵀ = (4, 4, 6)ᵀ_E.

Here, because the base is the natural base,

[u]_E = (4, 4, 6)ᵀ_E = (4, 4, 6)ᵀ = u.



A comparison

Compare this to what happens when using the following

B = ( (1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (1, 2, −1)ᵀ ),

which is a base (check this!); then for u as above we have:

u = (4, 4, 6)ᵀ = 5(1, 0, 1)ᵀ + 6(0, 1, 0)ᵀ − 1(1, 2, −1)ᵀ,



An alternative

so

u = (4, 4, 6)ᵀ = (4, 4, 6)ᵀ_E = (5, 6, −1)ᵀ_B.

Alternatively

[u]_E = (4, 4, 6)ᵀ whereas [u]_B = (5, 6, −1)ᵀ.



Using bases to describe a linear transformation

Let U be n dimensional and V be m dimensional, and let


T : U → V be a linear transformation.
Fix a base B for U and a base C for V :

B = (u1 , u2 , ..., un ) and C = (v1 , v2 , ..., vm ).

So for any u in U there exist unique scalars x₁, ..., xₙ with

u = x₁u₁ + x₂u₂ + ... + xₙuₙ,  i.e.  [u]_B = (x₁, x₂, ..., xₙ)ᵀ_B,

and so u is represented by a column in Rⁿ.

Similarly in the range

Likewise for v any vector in V there exist unique scalars
y₁, ..., yₘ with

v = y₁v₁ + y₂v₂ + ... + yₘvₘ,  i.e.  [v]_C = (y₁, y₂, ..., yₘ)ᵀ_C,

and so v is represented by a column in Rᵐ.


We will see that using these representations allows us to
represent T by a matrix.



Looking for a matrix

For u ∈ U, knowing its B-representation x = (x₁, x₂, ..., xₙ)ᵀ_B, we
need to find y with

v = T(u) = (y₁, y₂, ..., yₘ)ᵀ_C.



Some special cases

To do this first we look at the special cases:

T(u₁), T(u₂), ..., T(uₙ).

Step 1: As these are in V, we work out their respective column
coordinates using C:

T(u₁) = (a₁₁, a₂₁, ..., aₘ₁)ᵀ_C, T(u₂) = (a₁₂, a₂₂, ..., aₘ₂)ᵀ_C, ...,
T(uₙ) = (a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ_C.



Step 2: Set these down side by side like so

T(u₁), T(u₂), ..., T(uₙ) = (a₁₁, ..., aₘ₁)ᵀ_C, (a₁₂, ..., aₘ₂)ᵀ_C, ..., (a₁ₙ, ..., aₘₙ)ᵀ_C,

or better still like so

T(u₁)_C, T(u₂)_C, ..., T(uₙ)_C = (a₁₁, ..., aₘ₁)ᵀ, (a₁₂, ..., aₘ₂)ᵀ, ..., (a₁ₙ, ..., aₘₙ)ᵀ.
Step 3:

Put
          [ a₁₁  a₁₂  ...  a₁ₙ ]
A_{m×n} = [ a₂₁  a₂₂  ...  a₂ₙ ]
          [  ⋮    ⋮         ⋮  ]
          [ aₘ₁  aₘ₂  ...  aₘₙ ]

We claim that for x, y as defined above

y = Ax.



Justification

Indeed

v = T(u) = T(x₁u₁) + T(x₂u₂) + ... + T(xₙuₙ)
         = x₁T(u₁) + x₂T(u₂) + ... + xₙT(uₙ).

So

y = x₁(a₁₁, a₂₁, ..., aₘ₁)ᵀ + x₂(a₁₂, a₂₂, ..., aₘ₂)ᵀ + ... + xₙ(a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ
  = Ax.



Comment

For later use note that, because [v]_C = y (same as [v]_C = y_C)
and [u]_B = x, we have an equivalence:

v = T(u), i.e. y_C = T(x_B), iff y = Ax.



Notation

When we want to remember how A comes about from the
bases B, C and the transformation T, we may write the matrix
equation y = Ax here as

y = A_T^{C,B} x, or even y = {}^C_B A_T x.

The latter form particularly helps to associate B with x, and C
with y.
You may recall from MA100 the helpful notation y = A_T^{B→C} x.



Summary:

Given T : U → V with dim U = n and dim V = m, and


bases B in U and C in V
...
To find the corresponding representing matrix A :
Simply write down side-by-side the C -representations of the
B -base images

A = T(B)_C = [ T(u₁)_C, T(u₂)_C, ..., T(uₙ)_C ].



Special case of natural bases where [u]E = u

For U = Rⁿ with B the natural base B = Eₙ = (e₁, e₂, ..., eₙ),
and V = Rᵐ with C the natural base C = Eₘ = (e₁, e₂, ..., eₘ),

T(Eₙ)_{Eₘ} = [T(e₁), T(e₂), ..., T(eₙ)],

since u_E = u.
We simplify the notation here to

A = A_T,

dropping the bases.



Example 1.

For T : R³ → R² given by

    [ x₁ ]   [ 2x₁ − x₂     ]
T   [ x₂ ] = [ x₁ + x₂ + x₃ ],
    [ x₃ ]

we can write

    [ x₁ ]   [ 2 −1 0 ] [ x₁ ]
T   [ x₂ ] = [ 1  1 1 ] [ x₂ ]
    [ x₃ ]              [ x₃ ]

and so



...and so using natural bases

A_T = [ 2 −1  0 ]
      [ 1  1  1 ].



Example 2.

Q here is the space of quadratic polynomials in t:

p ∈ Q iff p(t) = a₀ + a₁t + a₂t².

Take
B = (u₁, u₂, u₃) = (1, t, t²);

then
p_B = (a₀, a₁, a₂)ᵀ.



Consider shifting t to t + 1

Consider T : Q → Q

q = T (p) defined by q(t) = T (p)(t) = p(t + 1).

Find the representation of T when B = C = (1, t, t2 ).



Finding the representation of T when B = C = (1, t, t2 ).

Here

T (u1 ) = T (1) = 1 = u1
T (u2 ) = T (t) = t + 1 = u2 + u1
T (u3 ) = T (t2 ) = (t + 1)2 = t2 + 2t + 1 = u3 + 2u2 + u1

So
     
1 1 1
     
T (u1 )B =      
 0  , T (u2 )B =  1  , T (u3 )B =  2 
0 0 1



so

...

A = A_T^{B,B} = [ 1 1 1 ]
                [ 0 1 2 ]
                [ 0 0 1 ].



... or more neatly

This can be done more neatly as follows

T(B)_C = T(1, t, t²)_C = (T(1), T(t), T(t²))_C
       = (1, t + 1, (t + 1)²)_C
       = (1_C, (t + 1)_C, (t² + 2t + 1)_C)
       = ( (1, 0, 0)ᵀ, (1, 1, 0)ᵀ, (1, 2, 1)ᵀ ).



i.e.

i.e.

A = A_T^{B,B} = [ 1 1 1 ]
                [ 0 1 2 ]
                [ 0 0 1 ].



Check:

q(t) = T (p)(t) = p(t + 1)


= a0 + a1 (t + 1) + a2 (t + 1)2
= a0 + a1 (t + 1) + a2 (t2 + 2t + 1)
= a0 + a1 t + a1 + a2 t2 + 2a2 t + a2
= (a0 + a1 + a2 ) + (a1 + 2a2 )t + a2 t2

So, as B = C,

            [ a₀ + a₁ + a₂ ]   [ 1 1 1 ] [ a₀ ]
q_C = q_B = [ a₁ + 2a₂     ] = [ 0 1 2 ] [ a₁ ] = A p_B.
            [ a₂           ]   [ 0 0 1 ] [ a₂ ]
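A quick sympy check (my own sketch) that this matrix really carries the coefficients of p(t) to those of p(t + 1) in the base (1, t, t²):

```python
from sympy import symbols, Matrix, expand

t, a0, a1, a2 = symbols('t a0 a1 a2')
p = a0 + a1*t + a2*t**2
q = expand(p.subs(t, t + 1))                     # the shifted polynomial
q_B = Matrix([q.coeff(t, k) for k in range(3)])  # its B-coordinates

A = Matrix([[1, 1, 1], [0, 1, 2], [0, 0, 1]])
print(q_B == A * Matrix([a0, a1, a2]))           # True
```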



Understanding Base changes in Rn

Base changes in Rⁿ ≡ re-representation of the identity
matrix I_{n×n}



Step 1. How to find the B -coordinates of a vector in Rn .

At this point in the exercise we regard a vector u in Rⁿ as being,
of course, a column, so it is its own co-ordinate column for the
natural base E = Eₙ, i.e. u = u_E (as seen earlier).
Now given another base B = (u₁, u₂, ..., uₙ), how do we find the
B-representation of u, i.e. the x with [u]_B = x_B?



The answer is easy:

By definition of representation

u = x₁u₁ + x₂u₂ + ... + xₙuₙ,  i.e.  [u]_B = (x₁, x₂, ..., xₙ)ᵀ_B,

but

u = x₁u₁ + x₂u₂ + ... + xₙuₙ = (u₁, u₂, ..., uₙ)(x₁, x₂, ..., xₙ)ᵀ.

So

u = B x_B  for  B = (u₁, u₂, ..., uₙ).



Lite work:

The matrix taking B -coordinates to E -coordinates is the matrix


made up from B.
But B has rank n so is invertible, and so

x = B⁻¹u.

So the other way from E to B is big beer – harder computation:


The matrix taking E -coordinates to B -coordinates is the matrix
made up from B −1 .
Done and dusted.



Important re-interpretation.

Recall the equivalence between a transformation and its
representing matrix:

v = T(u), i.e. y_C = T(x_B), iff y = Ax.

As u is its own E-representation: u = [u]_E (last slide?), we can
relabel u as a y to remember that it is an E-representation,
giving
y = B x_B with B = (u₁, u₂, ..., uₙ).

So from the equivalence: for some linear transformation T,

B = A_T^{E,B} = T(B), so T[u₁, u₂, ..., uₙ] = B = [u₁, u₂, ..., uₙ].



This means that..

T (ui ) = ui for all base vectors

so by linearity

T (u) = u, for all vectors u ∈ U = Rn , i.e. T = In×n is the identity.



Conclusion:

The matrix taking B-coordinates to E-coordinates represents
the identity:

B = A_I^{E,B}.

So
B⁻¹ = A_I^{B,E}.
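A numeric sketch (mine, not from the notes) of the rule x = B⁻¹u, using the base B and the vector u from the earlier example:

```python
import numpy as np

B = np.array([[1., 0.,  1.],
              [0., 1.,  2.],
              [1., 0., -1.]])   # columns are the base vectors
u = np.array([4., 4., 6.])

x_B = np.linalg.solve(B, u)     # solve B x = u rather than forming B^{-1}
print(x_B)                      # [ 5.  6. -1.], matching [u]_B = (5, 6, -1)^T
```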



Consequences

Suppose that U = Rn has a base B = (b1 , b2 , ..., bn ) and


V = Rm has a base C = (c1 , c2 , ..., cm ). Find the representation
of the matrix transformation

y =Ax.

The context here is natural bases so, as x = xE and y = yE , in


fact
yE = AxE .



Goal:

We are to find a matrix M with

yC = M xB ,

where x_B are the B-coordinates of x and y_C are the
C-coordinates of y.



Solution:

Break this up into small steps (‘Slowly, slowly catchy monkey’).

Step 1. (Lite work) Interpret A as a transformation v =T (u)


relative to the natural base En and Em , i.e. with x = xE and
y = yE .

Step 2. (Lite work) Do a base change from En to B and from


Em to C :
x = xE = BxB and y = yE = CyC

Step 3. Substitute the above into y =Ax and solve for yC in


term of xB :

yE = AxE ⇒ CyC = ABxB =⇒ yC = C −1 ABxB .


Big deal?

Possible as C is m × m and invertible (rank = m). So

M = A_T^{C,B} = C⁻¹AB.



MA212 Further Mathematical Methods Lecture LA 4

Lecture 24: Similar Matrices

Similar square matrices (and their characteristic


polynomials)
Computation of ranges and kernels
From scalar product to inner product
... But first a recap

Story so far:
1. T : U → V with bases B = (b1 , b2 , ..., bn ) in U and C in V
is represented by:

A = T (B) = ((T b1 )C , (T (b2 )C , ..., T (bn )C ).

2. Base change E → B has

[u]E =(b1 , b2 , ..., bn )xB =⇒ xB = B −1 [u]E

3. Representing y =Ax with bases B = (b1 , b2 , ..., bn ) in Rn


and C = (c1 , c2 , ..., cm ) in Rm has

CyC = ABxB =⇒ yC = C −1 ABxB .



Example revisited.

For T : R³ → R² given by

    [ x₁ ]   [ 2x₁ − x₂     ]   [ 2 −1 0 ] [ x₁ ]
T   [ x₂ ] = [ x₁ + x₂ + x₃ ] = [ 1  1 1 ] [ x₂ ]
    [ x₃ ]                                 [ x₃ ]

find the representation in which ...



the representation in which...

B = ( (1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (1, 2, −1)ᵀ )  ↦  [ 1 0  1 ]
                                               [ 0 1  2 ]
                                               [ 1 0 −1 ]

and

C = ( (2, 4)ᵀ, (−1, 3)ᵀ )  ↦  [ 2 −1 ]
                              [ 4  3 ]



Computation

C⁻¹ = (1/(6 + 4)) [ 3  1 ]      (swop / re-sign / divide)
                  [−4  2 ]

So the required matrix is

M = (1/10) [ 3  1 ] [ 2 −1 0 ] [ 1 0  1 ]
           [−4  2 ] [ 1  1 1 ] [ 0 1  2 ]
                               [ 1 0 −1 ]

  = ... = (1/10) [  8 −2  2 ]
                 [ −4  6  4 ]
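The arithmetic is easy to verify numerically (my own sketch, assuming numpy):

```python
import numpy as np

A = np.array([[2., -1., 0.], [1., 1., 1.]])
B = np.array([[1., 0., 1.], [0., 1., 2.], [1., 0., -1.]])
C = np.array([[2., -1.], [4., 3.]])

M = np.linalg.solve(C, A @ B)   # C^{-1}(AB) without forming the inverse
print(M * 10)                   # [[ 8. -2.  2.]
                                #  [-4.  6.  4.]]
```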



Similarity of square matrices:

Given the matrix A_{n×n} which defines in Rⁿ a transformation
T : Rⁿ → Rⁿ via the formula y = Ax, we have

A = A_T^{E,E};

this uses E, the natural base, twice: in the domain and in the
range of T. Now switch from the natural base E to an arbitrary
base C in Rⁿ, to be used both in the domain and the range of
T. Then the transformation T is represented by

A_T^{C,C} = C⁻¹ A_T^{E,E} C = C⁻¹AC.



Similarity

We say that the matrix A_{n×n} is similar to B_{n×n} if for some
non-singular P_{n×n}

A = P⁻¹BP.

(The matrix P is sometimes called 'conjugating'.) Since

B = PAP⁻¹ = Q⁻¹AQ, with Q = P⁻¹,

we see that B is similar to A, so we just say that A and B are
similar.
Thus A_T^{C,C} above and A are similar.



Comment

Think of
P = (p1 , p2 , ..., pn )

as determining a base comprising its columns p1 , p2 , ..., pn .


Now we see that similar matrices A and B represent the same
linear transformation, but relative to different bases.



Similarity is an equivalence:

i) A is similar to A since A = IAI⁻¹
ii) A similar to B implies B similar to A (as 2 slides earlier)
iii) A similar to B and B similar to C implies that for some P, Q
non-singular

B = PAP⁻¹ and C = QBQ⁻¹;

so
C = QPAP⁻¹Q⁻¹ = (QP)A(QP)⁻¹

and so C is similar to A.



Eigenvectors and eigenvalues

For a square matrix A and scalar λ, if

Ax = λx for some x ≠ 0,

then λ is an eigenvalue of A and x is a
corresponding/associated eigenvector.
Note that
Ax = λx iff (A − λI)x = 0,

so since x ≠ 0:
λ is an eigenvalue iff A − λI is singular iff det(A − λI) = 0.
A real matrix A may have complex eigenvalues, in which case
the eigenvector x is in Cⁿ.



Characteristic polynomial

It is convenient to define the characteristic polynomial of A as

pA (x) := det(xI − A),

as this gives a monic polynomial – meaning that the leading
term, which is (+1)xⁿ, has coefficient one (from 'monos', the
Greek for 'single').

Theorem. If A, B are similar, then

pA (x) = pB (x).



Proof

Suppose that A = PBP⁻¹; then

1 = det(PP⁻¹) = det(P) det(P⁻¹)

and so

p_A(x) = det(xI − A) = det(xPP⁻¹ − PBP⁻¹)
       = det(P(xI − B)P⁻¹)
       = det(P) det(xI − B) det(P⁻¹)
       = det(xI − B) = p_B(x).



Example

For
A = [ 1 2 ]
    [ 3 4 ],

p_A(x) = | x − 1    −2   | = (x − 1)(x − 4) − (−2)(−3)
         |  −3    x − 4  |
       = x² − 5x − 2.

Notice that

5 = (1 + 4) = trace(A),
−2 = (4 − 6) = det(A).



General situation:

For A_{n×n}, as p_A is monic of degree n,

p_A(x) = (x − λ₁)...(x − λₙ),

with λ₁, ..., λₙ the n roots (accounting for multiplicity). But

         | x − a₁₁   −a₁₂    ...    −a₁ₙ   |
p_A(x) = |  −a₂₁    x − a₂₂  ...    −a₂ₙ   | = (x − λ₁)...(x − λₙ).
         |   ⋮                 ⋱     ⋮     |
         |  −aₙ₁             ...   x − aₙₙ |



Continued

Expanding and comparing coefficients of xⁿ⁻¹:

−a₁₁ − a₂₂ − ... − aₙₙ = −λ₁ − λ₂ − ... − λₙ.

So switching signs on both sides

λ₁ + λ₂ + ... + λₙ = tr(A) = a₁₁ + a₂₂ + ... + aₙₙ.



Comparing the constant terms we have

(Put x = 0.)

(−1)ⁿ det(A) = (−1)ⁿ λ₁λ₂...λₙ

and so

det(A) = λ₁λ₂...λₙ.
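Both identities are easy to check numerically; the sketch below is my own, for the 2 × 2 example above:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))         # True: sum of roots = trace
print(np.isclose(lam.prod(), np.linalg.det(A)))   # True: product = det
```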



Consequences:

So if A and B are similar:

(i) A and B have the same eigenvalues and to the same


multiplicities;

(ii) A and B have the same trace;

(iii) A and B have equal characteristic polynomials.



Rank and Nullity

Ranges, kernels and their dimensions


How to compute these using Gaussian elimination



Null space

For T : U → V a linear transformation, the null space, also


known as the kernel, of T is

N (T ) = ker(T ) = {u ∈ U : T (u) = 0}.

This is indeed a vector subspace of U.


The range space, or image space, of T is

R(T ) = {v ∈ V : v =T (u) for some u ∈ U }.

This is indeed a vector subspace of V.



Illustration: kernel and range

[Diagram: T maps U into V; the kernel N(T) sits inside U and the range R(T) inside V.]



Distinction without a difference

Using bases B, C in U, V and associated co-ordinates:

T(u) = 0 iff A_T^{C,B} x = 0,

and

v = T(u) iff y = A_T^{C,B} x.

When B, C are the natural bases Eₙ and Eₘ and A = A_T, we
don't fuss over the distinctions between x and x_E nor between
y and y_E, and we write

N(T) = N(A_T) = N(A),

so that in a word: N(T) is the set of solutions to the equation

A_T x = 0.
Likewise

R(T ) = R(AT ) = R(A).



Example

Consider

T (x, y)ᵀ = ( x − y, 2y − 2x )ᵀ = [ 1 −1 ] [ x ]
                                  [−2  2 ] [ y ].

A_T = [ 1 −1 ]
      [−2  2 ]



Computation

We consider the null space:

N(T) = { (x, y)ᵀ : [ 1 −1 ] (x, y)ᵀ = (0, 0)ᵀ },
                   [−2  2 ]

so we must solve

x − y = 0 and 2y − 2x = 0. So

N(T) = { (x, x)ᵀ : x ∈ R } = { x(1, 1)ᵀ : x ∈ R } = Lin{ (1, 1)ᵀ },

a line through the origin in direction (1, 1)ᵀ.



We consider the range:

Notice that any number z can be written as x − y (e.g. take
z = x and y = 0), so

R(T) = { (x − y, 2y − 2x)ᵀ : x, y ∈ R } = { (x − y, −2(x − y))ᵀ : x, y ∈ R }
     = { (x − y)(1, −2)ᵀ : x, y ∈ R } = { z(1, −2)ᵀ : z ∈ R }
     = Lin{ (1, −2)ᵀ },

again a line.
Range and column space

For A an m × n matrix with columns of depth m and with n
columns altogether:

     [ a₁₁  a₁₂  ...  a₁ₙ ] [ x₁ ]
Ax = [ a₂₁  a₂₂  ...  a₂ₙ ] [ x₂ ]
     [  ⋮    ⋮         ⋮  ] [  ⋮ ]
     [ aₘ₁  aₘ₂  ...  aₘₙ ] [ xₙ ]

   = x₁(a₁₁, a₂₁, ..., aₘ₁)ᵀ + x₂(a₁₂, a₂₂, ..., aₘ₂)ᵀ + ... + xₙ(a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ

   = x₁a₁ + x₂a₂ + ... + xₙaₙ.
So...

R(T ) = Lin {a1 , a2 , ..., an }

and is the column space of A.



Row operations preserve column dependencies.

The best way to see this is to consider the simplest example of
dependency: v = 2u.

2 (1, 2, 3)ᵀ = (2, 4, 6)ᵀ

We check what happens under each row operation:



(i) exchange of first and second rows:

2 (2, 1, 3)ᵀ = (4, 2, 6)ᵀ



(ii) Now add/subtract another row (here R₃ → R₃ − R₁):

2 (2, 1, 3)ᵀ = (4, 2, 6)ᵀ

2 (2, 1, 3 − 2)ᵀ = (4, 2, 6 − 4)ᵀ



Computation using Gaussian elimination.

It is again easiest to learn this method by example.


Given a matrix: make the top-left 1 a leading 1 in the echelon
form:

[  1   2   0   1   2 ]
[  2   4   1   3   3 ]  R₂ − 2R₁
[ −1  −2   1   0  −3 ]  R₃ + R₁
[ −3  −6   0  −3  −6 ]  R₄ + 3R₁
[  0   0  −3  −3   3 ]  nowt to do



Cont’d

This gives another column, namely the third, to contribute a 2nd
row with leading 1:

[ 1  2   0   1   2 ]  leader
[ 0  0   1   1  −1 ]  leader
[ 0  0   1   1  −1 ]  R₃ − R₂
[ 0  0   0   0   0 ]  OK
[ 0  0  −3  −3   3 ]  R₅ + 3R₂



Cont'd

    [ 1  2  0  1   2 ]  leader
    [ 0  0  1  1  −1 ]  leader
B = [ 0  0  0  0   0 ]  OK
    [ 0  0  0  0   0 ]  OK
    [ 0  0  0  0   0 ]  OK

      ↑   ×   ↑   ×   ×
      x₁  x₂  x₃  x₄  x₅

Here B is in echelon form.



Step 1 Find N (B)

i.e.

x1 + 2x2 + x4 + 2x5 = 0,
x3 + x4 − x5 = 0.

Take x2 , x4 , x5 arbitrary (this corresponds to the × marked


columns) and use these to define x1 and x3 (these ones
correspond to the up-arrow associated with the leaders)

x1 = −2x2 − x4 − 2x5 ,
x3 = −x4 + x5 .



The solution set is

[ −2x₂ − x₄ − 2x₅ ]       [ −2 ]       [ −1 ]       [ −2 ]
[        x₂       ]       [  1 ]       [  0 ]       [  0 ]
[    −x₄ + x₅     ] = x₂  [  0 ] + x₄  [ −1 ] + x₅  [  1 ]
[        x₄       ]       [  0 ]       [  1 ]       [  0 ]
[        x₅       ]       [  0 ]       [  0 ]       [  1 ]



Step 2. Find N (A) : easy!

N (A) = N (B) i.e. Ax = 0 iff Bx = 0,

Why? Because the 5 row operations performed can be shown


as multiplications by elementary matrices, which are invertible:

B = M5 M4 M3 M2 M1 A

So

N(A) = Lin{ (−2, 1, 0, 0, 0)ᵀ, (−1, 0, −1, 1, 0)ᵀ, (−2, 0, 1, 0, 1)ᵀ }.

This has dimension 3 corresponding to the three crosses ×.


Step 3 Find R(B) Range:

We read off from B the independent columns of B (these are


the leaders): col (B)1 and col (B)3
Now notice the dependencies

col(B)1 = e1
col(B)2 = 2e1 = 2col(B)1
col(B)3 = e2
col(B)4 = e1 + e2 = col(B)1 + col(B)3
col(B)5 = 2e1 − e2 = 2col(B)1 − col(B)3

So
col-space(B) = Lin{col(B)1 , col(B)3 }.



Step 4: Find R(A): easy! Copy-cat:

col-space(A) = Lin{col(A)₁, col(A)₃},

because row operations preserve column dependencies. This


has dimension 2,

Remark: We see here by example that the domain of A is R5 as


there are 5 columns, and that ‘rank + nullity’=5:

r(A) + n(A) = 2 + 3 = 5 = dim-dom(A).
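A sympy cross-check of the worked example (my own sketch, not part of the notes):

```python
from sympy import Matrix

A = Matrix([[ 1,  2,  0,  1,  2],
            [ 2,  4,  1,  3,  3],
            [-1, -2,  1,  0, -3],
            [-3, -6,  0, -3, -6],
            [ 0,  0, -3, -3,  3]])

print(A.rank())                                 # 2
print(len(A.nullspace()))                       # 3 basis vectors for N(A)
print(A.rank() + len(A.nullspace()) == A.cols)  # True: rank + nullity = 5
```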



Summary and general situation:

If A is m × n row reduce it to an echelon matrix Bm×n with n


columns; of these r get up-arrows (leaders) and the remaining
columns are in number k = n − r and get marked ×.
Find N (B) by giving arbitrary values (free choices) to the k
variables marked × ; substitute for the leaders, and decompose
the solution vector as a linear combination of k columns, one
per free choice variable.
So N (A) has dimension k (‘k’ as in kernel).



Summary cont’d

To find R(A) describe R(B) as spanned by the columns of B


with numbers col(B)i corresponding to the r independent
columns which are determined by the leaders.
Then R(A) is described by swopping B for A in the previous
description.



Consequence:

r(A) + n(A) = r + k = r + (n − r) = n = dim-dom(A).

rank(A) + nullity(A) = dim-dom(A).



Another consequence for free:

It is obvious from the echelon form that

row-rank(B) = col-rank(B) = r.

Inverting all the row operations we get

row-rank(A) = r = col-rank(A).



Inner products: Cosine Rule recalled

Consider a triangle in the plane R² with one vertex at 0 and the
other two at a and b. We denote the angle between a and b by
θ. Write
c = b − a,

then
b = a + (b − a) = a + c,

and by the Cosine Rule

||c||2 = ||a||2 + ||b||2 − 2||a|| · ||b|| cos θ.



Cont’d

We can connect this result to the coordinates of vectors a , b as


follows.
Define for u, v in R2

u · v := u1 v1 + u2 v2 ,

then, as in Pythagoras’s Theorem for the triangle with


perpendicular sides of lengths u₁, u₂,

u · u = u₁² + u₂² = ||u||².



...Furthermore

(αu) · v = (αu1 )v1 + (αu2 )v2 = α(u1 v1 + u2 v2 ) = α(u · v)

and

(u + v) · w = (u1 + v1 )w1 + (u2 + v2 )w2


= (u1 w1 + u2 w2 ) + (v1 w1 + v2 w2 )
= u · w + v · w.

and
u·v =v·u



Returning to the cosine rule:

||c||² = c · c
       = (b − a) · (b − a) = b · (b − a) − a · (b − a)
       = b · b + a · a − 2a · b.

So
a · b = ||a|| · ||b|| cos θ,

and in particular a and b are orthogonal/perpendicular when
θ = π/2 and
a · b = 0.



Scalar products in Rn

We now extend the above ideas and define for u, v in Rn

u · v = u1 v1 + u2 v2 + ... + un vn

so that, consistently with Pythagoras’s Theorem,

u · u = ||u||² = u₁² + u₂² + ... + uₙ².

This allows us to define u and v to be orthogonal when

u · v = 0.



Properties of the scalar product in Rn

1. Linearity property:

(αu+βv) · w = αu · w+βv · w.

so that for fixed v the transformation Tv below is linear:

Tv (u) := u · v.

2. Symmetry property

u · v = v · u.

3. Positivity property

u · u > 0 for u ≠ 0.



MA212 Further Mathematical Methods Lecture LA 5

Lecture 25: Towards Pythagoras’s Theorem

Inner products
Orthogonality
Pythagoras’s Theorem
Angles ...leading to
Orthonormal bases (if time allows)
Recapitulation

Last time we met in Rn the scalar (or dot) product:

u · v = u1 v1 + ... + un vn

of two vectors, and noted that

a · b = ||a|| · ||b|| cos θ

u · u = u₁² + ... + uₙ² = ||u||²
Linearity: (αx + βy) · v = αx · v + βy · v
Symmetry: u · v = v · u
Positivity: u · u > 0 for u ≠ 0

We study this and related examples of 'products' with these 3
properties under the name of inner product.
Two important examples of inner products.

Example 1. In R³ first rescale the three axes by 1, 2, 3:

(x, y, z)ᵀ → (x, 2y, 3z)ᵀ,  represented by

[ x  ]   [ 1 0 0 ] [ x ]
[ 2y ] = [ 0 2 0 ] [ y ]
[ 3z ]   [ 0 0 3 ] [ z ]

Now apply the old scalar product to the images. This gives a
new inner product, so we use angular brackets to denote this
new operation:

⟨u, v⟩ = u₁v₁ + (2u₂)(2v₂) + (3u₃)(3v₃)
       = u₁v₁ + 4u₂v₂ + 9u₃v₃.



Interpretation

One can interpret this scaling as placing increasingly greater


significance on the second and third coordinates, for instance if
we attach greater probability to outcomes measured by the three
co-ordinates.
Notice that the matrix

[ 1 0 0 ]
[ 0 2 0 ]
[ 0 0 3 ]

has positive eigenvalues.



Example 2.

Consider the matrix

     [ 1 1 0 ]
A := [ 1 2 0 ].
     [ 0 0 1 ]

This matrix has entries symmetric about the main diagonal:
A = Aᵀ. We call A symmetric if A = Aᵀ. Put

⟨u, v⟩ = uᵀAv
       = u₁v₁ + 2u₂v₂ + u₃v₃ + u₁v₂ + u₂v₁.



Example 2 cont’d

Here, since A = Aᵀ, we get symmetry in the inner product:

⟨u, v⟩ = uᵀAv = uᵀAᵀv = (vᵀAu)ᵀ = ⟨v, u⟩ᵀ = ⟨v, u⟩,

the last step because ⟨v, u⟩ is a number (which we identify with
a 1 × 1 matrix).
Obviously

(αu₁ + βu₂)ᵀAv = α u₁ᵀAv + β u₂ᵀAv,

so for each fixed v the following map is linear:

T(u) := ⟨u, v⟩.



Now we check for positivity.

Consider that

⟨u, u⟩ = u₁² + 2u₂² + u₃² + 2u₁u₂
       = (u₁ + u₂)² + u₂² + u₃² ≥ 0.

Here if ⟨u, u⟩ = 0 then each of the squares (u₁ + u₂)², u₂², u₃² is
0, so u₃ = 0, u₂ = 0 and, as u₁ + u₂ = 0, also u₁ = 0. So we do
have positivity.
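A numerical check (my sketch, assuming numpy) that this A is symmetric with positive eigenvalues, which is what makes ⟨u, v⟩ = uᵀAv an inner product:

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 2., 0.],
              [0., 0., 1.]])
print(np.linalg.eigvalsh(A))   # all three eigenvalues are positive

u = np.array([1., -1., 0.5])   # any non-zero test vector
print(u @ A @ u > 0)           # True
```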



Real inner products

In a real vector space V (i.e. with the scalars taken from R) a
map V × V → R, denoted ⟨v₁, v₂⟩, is an inner product if for all
vectors v₁, v₂, v in V and α, β scalars in R the following
properties hold:
1. Linearity

⟨αv₁ + βv₂, v⟩ = α⟨v₁, v⟩ + β⟨v₂, v⟩,

2. Symmetry
⟨v₁, v₂⟩ = ⟨v₂, v₁⟩,

3. Positivity
⟨v, v⟩ > 0 for v ≠ 0.



Consequences

⟨v, 0⟩ = 0 = ⟨0, v⟩, for all v,

and in particular
⟨0, 0⟩ = 0.

Indeed

⟨0, v⟩ = ⟨0 + 0, v⟩ = ⟨0, v⟩ + ⟨0, v⟩ =⇒ ⟨0, v⟩ = 0.



Bilinearity

⟨v, αv₁ + βv₂⟩ = α⟨v, v₁⟩ + β⟨v, v₂⟩

This follows from symmetry:

⟨v, αv₁ + βv₂⟩ = ⟨αv₁ + βv₂, v⟩ = α⟨v₁, v⟩ + β⟨v₂, v⟩
               = α⟨v, v₁⟩ + β⟨v, v₂⟩.



Matrix representation of an inner product

For V finite dimensional with dimension n, take a basis
B = (v₁, v₂, ..., vₙ). Consider two vectors v and w with
co-ordinate columns x and y:

v = Bx_B, w = By_B;

then

⟨v, w⟩ = ⟨Bx_B, w⟩ = ⟨x₁v₁ + x₂v₂ + ... + xₙvₙ, w⟩
       = ⟨x₁v₁, w⟩ + ⟨x₂v₂, w⟩ + ... + ⟨xₙvₙ, w⟩
       = x₁⟨v₁, w⟩ + x₂⟨v₂, w⟩ + ... + xₙ⟨vₙ, w⟩



Cont’d

But, likewise,

⟨v₁, w⟩ = ⟨v₁, By_B⟩ = ⟨v₁, y₁v₁ + y₂v₂ + ... + yₙvₙ⟩
        = y₁⟨v₁, v₁⟩ + y₂⟨v₁, v₂⟩ + ... + yₙ⟨v₁, vₙ⟩.

Similarly

⟨v₂, w⟩ = y₁⟨v₂, v₁⟩ + y₂⟨v₂, v₂⟩ + ... + yₙ⟨v₂, vₙ⟩,

etc.



Putting all these together gives ...

⟨v, w⟩ = Σᵢ,ⱼ xᵢyⱼ⟨vᵢ, vⱼ⟩ = xᵀWy,

just like in Example 2, and here

    [ ⟨v₁, v₁⟩  ⟨v₁, v₂⟩  ...  ⟨v₁, vₙ⟩ ]
W = [ ⟨v₂, v₁⟩  ⟨v₂, v₂⟩  ...  ⟨v₂, vₙ⟩ ]
    [    ⋮         ⋮             ⋮     ]
    [ ⟨vₙ, v₁⟩  ⟨vₙ, v₂⟩  ...  ⟨vₙ, vₙ⟩ ].



Comment

Note that W = Wᵀ is symmetric, because ⟨vᵢ, vⱼ⟩ = ⟨vⱼ, vᵢ⟩.

But, not every symmetric matrix gives positivity.

We will see that a real inner product is of the form xᵀWy for W
a symmetric matrix such that all its eigenvalues are positive (just
as in Example 2).



Exotic Example: Covariance in L²₀

A random variable X has finite mean square (and so is said to be L²) if

E[X²] < ∞.

Recall that cov(X, Y) = E(XY) − E(X)E(Y); we want to keep
matters simple, so consider the subspace of mean-zero random
variables, denoted L²₀. Put

⟨X, Y⟩ = E(XY).

Clearly this is linear and symmetric on L²₀. Now

⟨X, X⟩ = E[X²] ≥ 0,

and E[X²] = 0 implies that X = 0 (actually: X = 0 almost always),
so we have positivity.
Cont’d

Here, for independent X, Y

E(XY ) = E(X)E(Y ) = 0

so X, Y are orthogonal under the inner product.



Remarks

1. For Gaussian random variables ⟨X, Y⟩ = 0 iff X, Y are
independent.
2. In the next slide we see that any inner product can be used to
introduce a notion of 'length of a vector' called its norm;
consequently, we observe that in L²₀ a notion of length ||X|| can
be introduced with

||X||² := ⟨X, X⟩ = var(X) > 0 iff X ≠ 0.

(Incidentally this explains why we wanted mean zero: the
variance of any constant is zero and its mean is itself; so here
we avoid 'constant' random variables by requiring the mean to
be zero.)



Norms from inner products: motivation

In R² consider the right-angle triangle with the vector
v = (v₁, v₂)ᵀ along the hypotenuse. Pythagoras's Theorem tells
us that
||v||² = v₁² + v₂² = v · v.

Likewise in R³ with the vector v = (v₁, v₂, v₃)ᵀ:

||v||² = v₁² + v₂² + v₃² = v · v.

This motivates the definition in the next slide.



Definition of norm

In an inner product space V we introduce ||v||, the norm of a
vector v, by setting

||v|| := √⟨v, v⟩, so that ||v||² := ⟨v, v⟩.

This is valid as ⟨v, v⟩ ≥ 0.

Two useful features are:

1. ||0||² = ⟨0, 0⟩ = 0, but ⟨v, v⟩ > 0 for v ≠ 0.
2. Positive homogeneity:

||αv||² = ⟨αv, αv⟩ = α²⟨v, v⟩ = α²||v||² = |α|²||v||², so ...

||αv|| = |α| · ||v||.
An application

In L²₀ (the space of random variables X with finite mean square
and with mean zero) we get

||X||₂ := √E[X²], so that ||X||₂² := cov(X, X) = E[X²] = var(X).

This is called the L²-norm; the subscript 2 on the norm symbol
and the superscript on the L symbol remind us of the squaring
involved.
So here the variance, square-rooted, provides a notion of length.



Pythagoras’s Theorem

We go on to see that the norm corresponds to our intuitions


about distance.
Consider a triangle with sides parallel to the vectors u and v,
where these two vectors are perpendicular. The hypotenuse is
u + v.

Pythagoras's Theorem: In a real inner product space V,
if ⟨u, v⟩ = 0, then

||u + v||² = ||u||² + ||v||².



Proof

Expanding and using bilinearity

||u + v||² = ⟨u + v, u + v⟩
           = ⟨u, u⟩ + ⟨v, v⟩ + ⟨u, v⟩ + ⟨v, u⟩
           = ||u||² + ||v||²,

since by assumption 0 = ⟨u, v⟩ = ⟨v, u⟩, the latter by symmetry. □






A corollary to come

The corollary in the next slide (in the form of a Theorem) allows
us to regard an arbitrary inner product as satisfying

⟨u, v⟩ = ||u|| · ||v|| cos θ

for some θ which we define to be the angle between u and v,
just like in the cosine rule.

However, unlike in the cosine rule, here we do not start from a
notion of length:
remember that the lengths on the right-hand side of the equation
arise from the inner product, which we seek to interpret.



Cauchy-Schwarz Inequality

The idea of the argument below is that IF we can validly


introduce a notion of angle, as above, then dropping a
perpendicular from u onto the line spanned by v is represented
by the point along v of length

α := ||u|| cos θ.

If v is of unit length, then the said point is (||u|| cos θ).v.

Theorem (Cauchy-Schwarz Inequality). In a real inner


product space V

|hu, vi| ≤ ||u||.||v||, for u, v ∈ V.

Proof. We do this in two steps.
MA212 – Lecture 25 – Tuesday 16 January 2018 page 24
Step 1.

We consider the special case when

||v|| =1.

So we are to prove that |hu, vi| ≤ ||u||. Put

α := hu, vi

(so αv is the foot of the perpendicular) and let

w := u − αv.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 25


Continued

Then (as expected from dropping perpendiculars) w is


orthogonal to v :

hw, vi = hu − αv, vi = hu, vi − αhv, vi = hu, vi − α = 0.

So also
hw, αvi = αhw, vi = 0.

So applying Pythagoras’s Theorem to w and αv , and noting


that w+αv = (u − αv)+αv = u we have

||u||2 = ||w+αv||2 = ||w||2 + ||αv||2

MA212 – Lecture 25 – Tuesday 16 January 2018 page 26


Step 1 Cont’d

But, by positive homogeneity

||αv||2 = |α|2 ||v||2 = |α|2 .

So, dropping the non-negative term ||w||2 in the Pythagoras


formula
||u||2 ≥ ||αv||2 = |α|2 = |hu, vi|2 .

So
||u|| ≥ |hu, vi|,

as required.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 27


Step 2.

We now deal with a general v.


If v = 0, then both sides of the inequality are zero, since
hu, 0i = 0 , and we already saw that ||0||2 = h0, 0i = 0.
Proceeding to the case ||v||2 = hv, vi > 0, we can do some
rescaling of v to v/||v|| .
By positive homogeneity

1 1

||v|| v = ||v|| ||v|| = 1.

We can apply the special case to the vectors u and v/||v|| , as


follows.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 28


Step 2 Cont’d

Since
⟨u, v/||v||⟩ = (1/||v||) ⟨u, v⟩
we get
(1/||v||) |⟨u, v⟩| = |⟨u, v/||v||⟩| ≤ ||u||,
and so on cross-multiplying

|hu, vi| ≤ ||u||.||v||,

as required. 

MA212 – Lecture 25 – Tuesday 16 January 2018 page 29


Angles

As
−||u||·||v|| ≤ ⟨u, v⟩ ≤ ||u||·||v||,

for u, v ≠ 0 we can write this as

−1 ≤ ⟨u, v⟩/(||u||·||v||) ≤ 1,

and so we can find a unique angle θ with 0 ≤ θ ≤ π such that

cos θ = ⟨u, v⟩/(||u||·||v||).

We define this θ to be the angle between u and v . Then

hu, vi = ||u||.||v|| cos θ.


MA212 – Lecture 25 – Tuesday 16 January 2018 page 30
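A minimal numpy sketch (example vectors assumed, not from the slides) checking Cauchy-Schwarz and recovering the angle just defined:

```python
# Sketch (example vectors assumed): Cauchy-Schwarz and the recovered angle.
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(abs(u @ v) <= rhs)                        # Cauchy-Schwarz: True

theta = np.arccos((u @ v) / rhs)                # the unique theta in [0, pi]
print(np.isclose(u @ v, rhs * np.cos(theta)))   # <u,v> = ||u|| ||v|| cos(theta)
```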
Comment

This will help us to remember how to drop a perpendicular from


u onto a unit vector:

(||u|| cos θ) v = ⟨u, v⟩ v for ||v|| = 1

MA212 – Lecture 25 – Tuesday 16 January 2018 page 31


A Final Result

From the Cauchy-Schwarz inequality we deduce our final result


on norms.
In R² consider the parallelogram determined by the origin, two
vertices determined by vectors u, v, with u + v as the fourth
vertex. The triangle formed by u and v has third side of the
same length as u + v. This motivates a similar result in an
inner product space:

Theorem (The triangle inequality).


In a real inner product space V , for any vectors u, v ∈ V

||u + v|| ≤ ||u|| + ||v||.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 32


Proof.

Expand using bilinearity and use symmetry

||u + v||2 = hu + v, u + vi
= hu, ui + hv, vi + hu, vi + hv, ui
= ||u||2 + ||v||2 + 2hu, vi
≤ ||u||2 + ||v||2 + 2||u||.||v|| by Cauchy-Schwarz
= (||u|| + ||v||)2 .

That is
||u + v||2 ≤ (||u|| + ||v||)2 .

But since both ||u + v|| and ||u|| + ||v|| are non-negative

||u + v|| ≤ ||u|| + ||v||,


as required.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 33
Applications

In C[0, 1] put
⟨f, g⟩ = ∫₀¹ f(t)g(t) dt.

(This is like summing uᵢvᵢ.) This is symmetric and bilinear; also

⟨f, f⟩ = ∫₀¹ f(t)² dt = 0 =⇒ f(t) ≡ 0.

(If f (t)2 is non-zero at some point t it will be non-zero in an


interval contributing positive area.)
So here Pythagoras's Theorem asserts that if ⟨f, g⟩ = 0 then

∫₀¹ [f(t) + g(t)]² dt = ∫₀¹ f(t)² dt + ∫₀¹ g(t)² dt.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 34
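A small sympy sketch (not in the slides) with the assumed pair f(t) = 1, g(t) = t − 1/2, which are orthogonal for this inner product:

```python
# Sketch in sympy: f(t) = 1 and g(t) = t - 1/2 are orthogonal in C[0,1],
# so the integral form of Pythagoras's Theorem holds exactly.
import sympy as sp

t = sp.symbols('t')
f, g = sp.Integer(1), t - sp.Rational(1, 2)
ip = lambda p, q: sp.integrate(p * q, (t, 0, 1))

print(ip(f, g))                                              # 0: f is orthogonal to g
print(sp.simplify(ip(f + g, f + g) - ip(f, f) - ip(g, g)))   # 0: Pythagoras
```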


Furthermore

Also here Cauchy-Schwarz asserts that

( ∫₀¹ f(t)·g(t) dt )² ≤ ( ∫₀¹ f(t)² dt ) ( ∫₀¹ g(t)² dt ).

The triangle inequality asserts that

( ∫₀¹ [f(t) + g(t)]² dt )^{1/2} ≤ ( ∫₀¹ f(t)² dt )^{1/2} + ( ∫₀¹ g(t)² dt )^{1/2}.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 35


In L2

For any X take m_X = E[X] and

X̃ = X − m_X ... so that E[X̃] = 0,

so X̃ is in L²₀. Cauchy-Schwarz applied to L²₀ with

⟨X̃, Ỹ⟩ = E[X̃Ỹ] and ||X̃||₂² = E[X̃²]

gives
|⟨X̃, Ỹ⟩|² ≤ ||X̃||₂² · ||Ỹ||₂²

MA212 – Lecture 25 – Tuesday 16 January 2018 page 36


Cont’d

But notice that

E[X̃Ỹ] = E[(X − m_X)(Y − m_Y)]
= E[XY − m_X Y − X m_Y + m_X m_Y]
= E[XY] − m_X E[Y] − m_Y E[X] + m_X m_Y
= E[XY] − m_X m_Y = cov(X, Y)

and in particular

||X̃||₂² = E[X̃²] = E[X²] − m_X² = var(X).

MA212 – Lecture 25 – Tuesday 16 January 2018 page 37


Cont’d

So
cov(X, Y )2 ≤ var(X)var(Y ).

Here the angle θ between X and Y corresponds to their
correlation:

cov(X, Y) / √(var(X) var(Y))

MA212 – Lecture 25 – Tuesday 16 January 2018 page 38


MA212 Further Mathematical Methods Lecture LA 6

Lecture 26: Orthogonal Matrices

Orthogonality
Orthogonal bases,
Orthogonal matrices: isometry & conformality properties
Rotation and reflection in R2 and R3
Recapitulation: story so far

Real inner products hu, vi yield a notion of orthogonality:

u⊥v means hu, vi = 0.

Pythagoras’s Theorem follows and identifies the foot of the


perpendicular from u onto the line through v as

⟨u, v/||v||⟩ (v/||v||) = ⟨u, v⟩v / ||v||²

In an n -dimensional space

hu, vi = uT W v for some symmetric W with positive eigenvalues.

MA212 – Lecture 26 – Friday 19 January 2018 page 2


Cautionary example: symmetry is not enough

 
W = [ 1  2 ; 2  1 ]

Here W = W T so is symmetric but


  
⟨u, u⟩ = uᵀWu = [u₁ u₂] [ 1  2 ; 2  1 ] [ u₁ ; u₂ ]
= u₁² + 4u₁u₂ + u₂² = (u₁ + u₂)² + 2u₁u₂

MA212 – Lecture 26 – Friday 19 January 2018 page 3


So positivity fails, because ...

 
... if u = (1, −1)ᵀ then ⟨u, u⟩ = −2 < 0.

Underlying cause is that one eigenvalue is negative:




p_W(x) = | x−1  −2 ; −2  x−1 | = x² − 2x − 3 = (x + 1)(x − 3).

Conclusion: the matrix W needs to be symmetric and have


positive eigenvalues.

MA212 – Lecture 26 – Friday 19 January 2018 page 4


Orthonormal bases

Why these? Co-ordinates come much easier... via inner


products.
The following applies to any real inner product space.
The vectors {v₁, ..., vₘ} are mutually orthogonal if for all i
vᵢ ≠ 0 (all are non-zero) and

⟨vᵢ, vⱼ⟩ = 0 for all i, j with i ≠ j.

The vectors {v1 , ..., vm } form an orthonormal set if


(i) they are mutually orthogonal, and
(ii) for all i ||vi || = 1.

MA212 – Lecture 26 – Friday 19 January 2018 page 5


Example:

In Rn the natural base vectors {e1 , ..., en } form an orthonormal


set.

MA212 – Lecture 26 – Friday 19 January 2018 page 6


Linear independence*

Any mutually orthogonal set {v1 , ..., vm } is linearly independent.


Suppose that
a1 v1 +...+am vm = 0.

Then
⟨a₁v₁ + ... + aₘvₘ, v₁⟩ = ⟨0, v₁⟩ = 0.

Using linearity,

a₁⟨v₁, v₁⟩ + ... + aₘ⟨vₘ, v₁⟩ = 0
a₁⟨v₁, v₁⟩ = 0
a₁ = 0, as ⟨v₁, v₁⟩ ≠ 0.

MA212 – Lecture 26 – Friday 19 January 2018 page 7


Etc!

The same argument applies to v2 to give a2 = 0, and so on.

MA212 – Lecture 26 – Friday 19 January 2018 page 8


Gram-Schmidt process.

In any subspace U = Lin{u1 , ..., um } with {u1 , ..., um } linearly


independent we construct an orthonormal set {v₁, ..., vₘ}
spanning U .

Step 1. We take
v₁ := u₁/||u₁||
so that ||v1 || = 1.

MA212 – Lecture 26 – Friday 19 January 2018 page 9


Step 2.

We choose α so that

w2 = u2 − αv1 ⊥v1

This needs

0 = hw2 , v1 i = hu2 − αv1 , v1 i = hu2 , v1 i − αhv1 , v1 i


= hu2 , v1 i − α.1

So
w2 = u2 − hu2 , v1 iv1 , and w2 ⊥v1

MA212 – Lecture 26 – Friday 19 January 2018 page 10


But is w2 non-zero?

Yes, otherwise
u₂ = ⟨u₂, v₁⟩v₁

would be a scalar multiple of v₁, hence of u₁, contradicting linear independence.


Now we may take
v₂ = w₂/||w₂|| ⊥ v₁
and v2 is of length 1.

And so on ... for instance: put

w₃ = u₃ − ⟨u₃, v₁⟩v₁ − ⟨u₃, v₂⟩v₂

so that w₃ ⊥ v₁ and w₃ ⊥ v₂; then take v₃ = w₃/||w₃||.

MA212 – Lecture 26 – Friday 19 January 2018 page 11


Example in the function space C[0, 1].

We will use
⟨u, v⟩ = ∫₀¹ u(t)v(t) dt

which gives the Pythagorean norm

||v|| = ||v||₂ := ( ∫₀¹ v(t)² dt )^{1/2}

Consider the subspace U := Lin{u₁, u₂, u₃} where

u₁(t) ≡ 1,
u₂(t) ≡ t,
u₃(t) ≡ t².

MA212 – Lecture 26 – Friday 19 January 2018 page 12


Apply Gram-Schmidt, starting with ...

||u₁|| = ( ∫₀¹ 1 dt )^{1/2} = 1.

We take v₁ := u₁ = 1. Now
w₂ = u₂ − ⟨u₂, v₁⟩v₁ = t − ⟨u₂, v₁⟩·1 = t − 1/2,
as here
⟨u₂, v₁⟩ = ∫₀¹ t dt = 1/2.

MA212 – Lecture 26 – Friday 19 January 2018 page 13


Now ...

"   3 #1 "  3 #
Z 1 2
1 1 1 1 1 1
||w2 ||2 = t− dt = t− = − −
0 2 3 2 3 8 2
0
1
= ,
3·4
so
1
||w2 || = √ ,
2 3
and so
 
w2 (t) √ 1 √
v2 (t) = =2 3 t− = 3 (2t − 1) .
||w2 || 2

MA212 – Lecture 26 – Friday 19 January 2018 page 14


Computation cont’d

w₃ = u₃ − ⟨u₃, v₁⟩v₁ − ⟨u₃, v₂⟩v₂

⟨u₃, v₁⟩ = ∫₀¹ t²·1 dt = 1/3

⟨u₃, v₂⟩ = ∫₀¹ t²·√3(2t − 1) dt = √3 ∫₀¹ (2t³ − t²) dt
= √3 [ t⁴/2 − t³/3 ]₀¹ = √3 (1/2 − 1/3) = √3/6 = 1/(2√3)

MA212 – Lecture 26 – Friday 19 January 2018 page 15


So

w₃(t) = t² − (1/3)·1 − (1/(2√3))·√3(2t − 1)
= t² − t + 1/6.
In the next slide we use the observation that
1/6 − 1/4 = (2 − 3)/12 = −1/12.

MA212 – Lecture 26 – Friday 19 January 2018 page 16


And so finally

||w₃||² = ∫₀¹ ( t² − t + 1/6 )² dt
= ∫₀¹ ( (t − 1/2)² − 1/12 )² dt
= ∫_{−1/2}^{1/2} ( s² − 1/12 )² ds = 2 ∫₀^{1/2} ( s² − 1/12 )² ds
= 2 ∫₀^{1/2} ( s⁴ − s²/6 + 1/144 ) ds
= 2 [ s⁵/5 − (1/6)(s³/3) + s/144 ]₀^{1/2}

MA212 – Lecture 26 – Friday 19 January 2018 page 17


And so

(noting that 2s = 1 when substituting s = 1/2)

2 [ s⁵/5 − s³/18 + s/144 ]_{s=1/2} = 2 ( 1/160 − 1/144 + 1/288 ) = 1/180 = 1/(5·36)

So
v₃ := 6√5 w₃:
v₃ = √5 (6t² − 6t + 1).

MA212 – Lecture 26 – Friday 19 January 2018 page 18
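A sympy sketch (not part of the slides) running the same Gram-Schmidt computation symbolically and reproducing v₂ and v₃:

```python
# Sketch: Gram-Schmidt on 1, t, t^2 with <u,v> = integral over [0,1],
# reproducing v2 = sqrt(3)(2t - 1) and v3 = sqrt(5)(6t^2 - 6t + 1).
import sympy as sp

t = sp.symbols('t')
ip = lambda u, v: sp.integrate(u * v, (t, 0, 1))
normalise = lambda w: sp.simplify(w / sp.sqrt(ip(w, w)))

v1 = normalise(sp.Integer(1))
w2 = t - ip(t, v1) * v1
v2 = normalise(w2)
w3 = t**2 - ip(t**2, v1) * v1 - ip(t**2, v2) * v2
v3 = normalise(w3)

print(sp.expand(v2))   # 2*sqrt(3)*t - sqrt(3)
print(sp.expand(v3))   # 6*sqrt(5)*t**2 - 6*sqrt(5)*t + sqrt(5)
```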


Orthogonal Matrices

Suppose that B = (v1 , ..., vn ) is an orthonormal basis for Rn .


Then

||vᵢ|| = 1 ∀i
⟨vᵢ, vⱼ⟩ = 0 ∀i ≠ j

Recall that MB = matrix of size n × n with columns v1 , ..., vn


represents a change of basis from En to B .

MB maps a vector in B -co-ordinates into its En -co-ordinates.

Here we take hu, vi := u · v .

MA212 – Lecture 26 – Friday 19 January 2018 page 19


...also

  
M_BᵀM_B = [ rows v₁ᵀ, v₂ᵀ, ... ] [ columns v₁, v₂, ... ]

= [ v₁ᵀv₁  v₁ᵀv₂  ...      [ 1  0  ...
    v₂ᵀv₁  v₂ᵀv₂  ...   =    0  1  ...   = I_{n×n}
    ...    ...    ... ]      ...     ]

So B = (v₁, ..., vₙ) orthonormal ⇔ M_BᵀM_B = I ⇔ M_B⁻¹ = M_Bᵀ.

MA212 – Lecture 26 – Friday 19 January 2018 page 20


In general...

Definition: Matrix M is orthogonal if M −1 = M T .

...important in view of their Geometric features

MA212 – Lecture 26 – Friday 19 January 2018 page 21


Preservation of Geometry

The following are equivalent for An×n

1. A is orthogonal

2. T defined by T (x) := Ax takes En to an orthonormal basis

3. Isometry: ||Av|| = ||v|| for all v in Rn


... so distance is preserved

4. Conformality: angles are preserved

MA212 – Lecture 26 – Friday 19 January 2018 page 22


Let’s check this out ...

(1) ⇔ (2) Write A = [a1 , ...., an ] (columns of A ) then

Ax = x1 a1 + .... + xn an

Since T (e1 ) = a1 etc., T takes E = (e1 , ..., en ) to (a1 , ...., an )


which are orthonormal if A is orthogonal.

MA212 – Lecture 26 – Friday 19 January 2018 page 23


A useful observation

If AT A = I

(Au) · (Av) = (Au)T Av = uT AT Av = uT v = u · v.

So
(1) =⇒ (3): Take u = v , then

||Av||2 =(Av) · (Av) = v · v = ||v||2

MA212 – Lecture 26 – Friday 19 January 2018 page 24


Cont’d

(1) =⇒ (4): Since ||Av|| =||v|| for all v, then with θ the angle
between u and v and ϕ the angle between Au and Av :

(Au) · (Av) = ||Au||.||Av|| cos ϕ


u · v = ||u||.||v|| cos θ

So
ϕ = θ (or 2π − θ, same thing really).

MA212 – Lecture 26 – Friday 19 January 2018 page 25


Cont’d

(4) =⇒ (2):

 1, if i = j,
ai · aj = (Aei ) · (Aej ) = ei · ej =
 0, if i 6= j.

(4) =⇒ (3): Angles are preserved; the angle between Av and Av
is θ = 0, as is the angle between v and v, so

||Av||² = (Av) · (Av) = v · v = ||v||².

MA212 – Lecture 26 – Friday 19 January 2018 page 26


Cont’d

(3) =⇒ (4): By a recent Homework exercise:

(Au) · (Av) = (1/4)||Au + Av||² − (1/4)||Au − Av||²

(To see why: expand the RHS using ||x + y||² = ⟨x + y, x + y⟩ etc.)

= (1/4)||A(u + v)||² − (1/4)||A(u − v)||²
= (1/4)||u + v||² − (1/4)||u − v||²   ... by isometry
= u · v   ... by the same Homework exercise.

MA212 – Lecture 26 – Friday 19 January 2018 page 27


Some useful facts

A orthogonal =⇒ A−1 orthogonal

AT = A−1 =⇒ (A−1 )T = (AT )T = A = (A−1 )−1 .

A, B orthogonal =⇒ AB orthogonal

(AB)(AB)T = ABB T AT = AIAT = AAT = I

So
(AB)−1 = (AB)T .

Of course: if A, B preserve distance then so does AB (which


represents B followed by A)

A orthogonal =⇒ det(A) = ±1.


MA212 – Lecture 26 – Friday 19 January 2018 page 28
Matrix orthogonality for vectors in R2

Here A is 2 × 2. Write

a₁ = Ae₁ = (1st column of A) = (cos θ, sin θ)ᵀ,

being a vector of length 1.

MA212 – Lecture 26 – Friday 19 January 2018 page 29


Where and what is Ae2 ?

As e₁ ⊥ e₂, we must have Ae₁ ⊥ Ae₂. So: either

Ae₂ = (cos(θ + π/2), sin(θ + π/2))ᵀ = (−sin θ, cos θ)ᵀ

or:

Ae₂ = (cos(θ − π/2), sin(θ − π/2))ᵀ = (sin θ, −cos θ)ᵀ

MA212 – Lecture 26 – Friday 19 January 2018 page 30


Either this

[Figure: e₁, e₂ with a₁ = Ae₁ at angle θ and one choice of a₂ = Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 31


or this

[Figure: e₁, e₂ with a₁ = Ae₁ at angle θ and the other choice of a₂ = Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 32


In the first case

 
A = [a₁, a₂] = [ cos θ  −sin θ ; sin θ  cos θ ]  and det A = cos²θ + sin²θ = 1.

This is orientation preserving: the axes have been rotated


through an angle θ.

MA212 – Lecture 26 – Friday 19 January 2018 page 33


Illustration

[Figure: rotation through θ carries e₁, e₂ to Ae₁, Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 34


In the second case

 
A = [a₁, a₂] = [ cos θ  sin θ ; sin θ  −cos θ ]  and det A = −cos²θ − sin²θ = −1

This is orientation reversing: Not a rotation!


This is reflection in the line through the origin with angle θ/2.

MA212 – Lecture 26 – Friday 19 January 2018 page 35
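A minimal numpy sketch (θ = π/3 assumed for illustration; not from the slides) checking both 2 × 2 families:

```python
# Sketch (theta = pi/3 assumed): both families are orthogonal matrices,
# with determinant +1 (rotation) and -1 (reflection).
import numpy as np

theta = np.pi / 3
c, s = np.cos(theta), np.sin(theta)
rotation = np.array([[c, -s], [s, c]])
reflection = np.array([[c, s], [s, -c]])

for A in (rotation, reflection):
    print(np.allclose(A.T @ A, np.eye(2)), np.linalg.det(A))
# True 1.0 (rotation) and True -1.0 (reflection), up to rounding
```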


Illustration

[Figure: reflection in the line at angle θ/2 carries e₁, e₂ to Ae₁, Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 36


Three dimensions: Part I

Theorem. If A is 3 × 3 and orthogonal with det A = +1


then λ = +1 is an eigenvalue of A.

Proof. As det(Aᵀ) = det A = 1,

det(A − I) = det(Aᵀ(A − I))
= det(I − Aᵀ)   ...(as AᵀA = I)
= det((I − A)ᵀ)
= det(I − A)   ...(as det Mᵀ = det M)
= det((−I)(A − I))
= det(−I) det(A − I)
= (−1) det(A − I)   ...(as here −I is 3 × 3)
MA212 – Lecture 26 – Friday 19 January 2018 page 37
Conclusion

So
2 det(A − I) = 0 : det(A − I) = 0.

This says that λ = 1 solves det(A − λI) = 0.

MA212 – Lecture 26 – Friday 19 January 2018 page 38


How about det = −1

Theorem. If A is 3 × 3 and orthogonal with det A = −1


then λ = −1 is an eigenvalue of A.

Proof. As det(Aᵀ) = det A = −1,

det(A + I) = (−det Aᵀ) det(A + I)
= −det(Aᵀ(A + I))
= −det(I + Aᵀ)   ...(as AᵀA = I)
= −det((I + A)ᵀ)
= −det(I + A)   ...(as det Mᵀ = det M)

MA212 – Lecture 26 – Friday 19 January 2018 page 39


Conclusion

So
2 det(A + I) = 0 : det(A + I) = 0.

This says that λ = −1 solves det(A − λI) = 0.

MA212 – Lecture 26 – Friday 19 January 2018 page 40


Summary

Orthogonal means M −1 = M T
Preservation of geometry
Group property: AB and A−1 are orthogonal if A, B are.
det(M ) = ±1 and |λ| = 1 for λ any eigenvalue (could be
complex!)
For vectors in R2 an orthogonal matrix transformation is a
rotation or a reflection
If det = +1, then there is an eigenvector v with Av = v
If det = −1, then there is an eigenvector v with Av = −v

MA212 – Lecture 26 – Friday 19 January 2018 page 41


MA212 Further Mathematical Methods Lecture LA 7

Lecture 27: Orthogonal Matrices – cont’d

Two 3 × 3 examples
Complex scalars and vectors
Complex Orthogonality
Story so far ...

Orthonormal base B has its representing matrix MB an


‘orthogonal matrix’

M_BᵀM_B = I, or
M_B⁻¹ = M_Bᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 2


Recapitulation : 1

M is an ‘orthogonal matrix’ means that

M −1 = M T .

Then: Isometry, and Conformality


The 2 × 2 case has either det = +1 and is a rotation:

[ cos θ  −sin θ ; sin θ  cos θ ]

Or det = −1 and is then a reflection matrix:

[ cos θ  sin θ ; sin θ  −cos θ ]
MA212 – Lecture 27 – Tuesday 23 January 2018 page 3
Axis of rotation of a 3 × 3 matrix

Suppose that det A = +1.


Let v be an eigenvector to value λ = +1, i.e.

Av = v.

Consider P, the plane through the origin orthogonal to the line
Lin{v}.
So u ∈ P iff u ⊥ v.

Consider u ∈ P, so u ⊥ v.
Then by conformality Au ⊥ Av = v,
so Au ⊥ v:
A maps P into P.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 4


A rotation!

So the linear mapping T : P → P defined by T (u) = Au is a


rotation: why?

MA212 – Lecture 27 – Tuesday 23 January 2018 page 5


Why a rotation?

Because, det A = 1 so A is orientation preserving on P.


Conclusion: A is a rotation around the line through v.
Au ∈ P whenever u ⊥ v.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 6


Illustration

[Figure: A rotates the plane P orthogonal to the axis through v.]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 7


Example:

Find the matrix representing rotation T by +π/4 (i.e.


anti-clockwise) around
 
v := (1, 1, 0)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 8


Solution approach

First re-scale the vector v to get a unit vector:

u = (1/√2) (1, 1, 0)ᵀ.

Step 0. Solve a simpler problem: a rotation R about e3 through


an angle θ = π/4 is represented, as in slide 3, by

A_R^{EE} := [ cos θ  −sin θ  0      [ 1/√2  −1/√2  0
              sin θ   cos θ  0   =    1/√2   1/√2  0
              0       0      1 ]      0      0     1 ].

This is relative to the natural basis.


MA212 – Lecture 27 – Tuesday 23 January 2018 page 9
Step 1.

We can get the transformation T from R : just interpret e3 as


representing the axis of rotation, that being the following unit
vector in some new base B = (v1 , v2 , v3 )
 
v₃ = (1/√2) (1, 1, 0)ᵀ.

In other words we want e3 = (u)B , i.e. to be the co-ordinate


column of u relative to some base B.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 10


Step 2.

So construct an orthonormal basis B = (v1 , v2 , v3 ) for R3 which


includes the above unit-length vector as v3 , and preserves
orientation.
Then we think of e3 = (0, 0, 1)T as the co-ordinate column xB of
u, because it encodes that u is ( 0 units of v1 ) + ( 0 units of v2 )
+ ( 1 unit of v3 ):
e3 = (u)B

(in the notation of MA100). Then


 √ √ 
1/ 2 −1/ 2 0
 √ √ 
ABB
T = AEE
R = 
 1/ 2 1/ 2 0  .
0 0 1

MA212 – Lecture 27 – Tuesday 23 January 2018 page 11


Step 3.

But we require A_T^{EE}. So use

(x)_E = M_B (x)_B.

Then

(y)_B = T((x)_B) = A_T^{BB} (x)_B ⇔ M_B⁻¹(y)_E = A_T^{BB} M_B⁻¹(x)_E
(y)_E = M_B A_T^{BB} M_B⁻¹ (x)_E :  A_T^{EE} = M_B A_T^{BB} M_B⁻¹.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 12


Solution

Pick also a with a ⊥ u. How about taking:

a = (1/√2) (1, −1, 0)ᵀ ⊥ u = v₃ = (1/√2) (1, 1, 0)ᵀ.

Now pick a vector b with b ⊥ v₃ and b ⊥ a.

How about the following, which is of unit length:
b = (0, 0, 1)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 13


Problem: which do we take?

Do we take
B := [a, b, v₃] or B := [b, a, v₃]?

They have determinants of opposite sign. A little
experimentation leads to the right choice as

B := [v₁, v₂, v₃] = (1/√2) [ 0   1  1
                             0  −1  1
                             √2  0  0 ],

since this has det B = +1. Otherwise we would switch the


sense of rotation to clockwise.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 14


Getting there

This B is orthogonal (we took an orthonormal basis), so its


inverse is its transpose.
So

AEE
T = MB ABB T EE
T MB = MB AR MB
T
   √ 
0 1 1 1 −1 0 0 0 2
1 


 
 
= √  0 −1 1   1 1 0   1 −1 0 
 

2 2 √ √
2 0 0 0 0 2 1 1 0
= ...
 √ √ √ 
1 + 2 −1 + 2 2
1  √ √ √ 
= √  −1 + 2 1 + 2 − 2 

 .
2 2 √ √
− 2 2 2
MA212 – Lecture 27 – Tuesday 23 January 2018 page 15
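A numpy sketch (not in the slides) verifying this worked example: the assembled matrix is orthogonal with det +1 and fixes the chosen axis.

```python
# Sketch verifying the example: M_B A_R M_B^T is orthogonal, has det +1,
# and fixes the rotation axis v = (1, 1, 0)^T / sqrt(2).
import numpy as np

r = 1 / np.sqrt(2)
M_B = np.array([[0, r, r], [0, -r, r], [1, 0, 0]])   # columns b, a, v3
A_R = np.array([[r, -r, 0], [r, r, 0], [0, 0, 1]])   # rotation by pi/4 about e3
A_T = M_B @ A_R @ M_B.T                  # M_B orthogonal, so M_B^{-1} = M_B^T

v = np.array([r, r, 0])
print(np.allclose(A_T @ v, v))                       # the axis is fixed: True
print(np.allclose(A_T.T @ A_T, np.eye(3)),
      np.isclose(np.linalg.det(A_T), 1.0))           # orthogonal with det +1
```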
Illustration: We want preservation of orientation.

We refer to the fingers of the right hand as indicating the


directions: v1 thumb , v2 index finger, and v3 middle finger.
(This agrees with the usual ordering e1 , e2 and e3 ).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 16


Middle finger to be along v3 , the axis of rotation.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 17


Orientation: thumB must be on b

(It helps to make the right arm point right.)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 18


Recapitulation 2

The case of orientation preservation:

det = +1 =⇒ (∃v) Av = v

The case of orientation reversal:

det = −1 =⇒ (∃v) Av = −v

MA212 – Lecture 27 – Tuesday 23 January 2018 page 19


Example in R3

You are told that


 
A = (1/9) [ 8  −1   4
            1  −8  −4
            4   4  −7 ]

is orthogonal with det A = +1. Find the axis and angle of


rotation.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 20


Solution.

First solve Av = v to get the axis. Equivalently, here
9(A − I)v = 0:

[ 8−9   −1     4        [ x₁     [ 0
  1    −8−9   −4     ·    x₂  =    0
  4     4    −7−9 ]       x₃ ]     0 ]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 21


So

    
[ −1   −1    4      [ x₁     [ 0
   1  −17   −4   ·    x₂  =    0
   4    4  −16 ]      x₃ ]     0 ]

Adding the first two rows gives

−18x₂ = 0 :  x₂ = 0.

Hence the general solution is

v = x₃ (4, 0, 1)ᵀ.
MA212 – Lecture 27 – Tuesday 23 January 2018 page 22
Cont’d

Take
v₃ = (4, 0, 1)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 23


Next ..

Pick also u with u ⊥ v. How about taking:

u = (0, 1, 0)ᵀ = e₂.

Then
Au = Ae₂ = 2nd col = (1/9) (−1, −8, 4)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 24


Cont’d

So if θ is the angle of rotation, then as ||Au|| = ||u|| = 1 we have

u·Au = ||u|| · ||Au|| cos θ = cos θ = e₂ · (1/9)(−1, −8, 4)ᵀ = −8/9.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 25


So

 
θ = cos⁻¹(−8/9).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 26


Comment: an alternative method

As det A = +1, A is similar to

B = [ cos θ  −sin θ  0
      sin θ   cos θ  0
      0       0      1 ]

So

Tr(B) = 1 + 2 cos θ
= Tr(A)   ... (by similarity)
= (1/9)(8 − 8 − 7) = −7/9

MA212 – Lecture 27 – Tuesday 23 January 2018 page 27


So

2 cos θ = −7/9 − 1 = −16/9 :  cos θ = −8/9.

[Figure: unit circle showing the obtuse angle θ with cos θ = −8/9.]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 28
Question: is this clockwise or anticlockwise?

As cos(−θ) = cos θ we still have an unanswered issue:


although we can compute the “acute” angle θ ∈ [0, π] with
cos θ = − 89 , we must yet determine if the rotation is:
+θ (anticlockwise), or −θ (clockwise).

The clockwise answer is of course equivalent to an


anticlockwise obtuse angle of 2π − θ .
(Watch your language!)

The upshot is that we know the angle, but not its direction.
To settle this question we need another Recap from MA100.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 29


Recapitulation 3: On cross-products

For vectors a, b, c

c = a × b = | e₁  e₂  e₃
             a₁  a₂  a₃
             b₁  b₂  b₃ |

has
||c|| = ||a|| · ||b|| · |sin θ|.

Beware this is sin NOT cos . The formula describes numerically


the area of the parallelogram with sides a and b , with θ the
angle between a and b (measured from a to b ). See the
Figures.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 30


Orientation of c = a × b with θ measured from a to b.

The orientation of c (parallel to v3 ) is obtained by applying the


right-hand rule: identify your thumb with a , your index with b ,
then the middle finger points to (is aligned with) c .

c = (||a|| · ||b|| · sin θ) v3

[Figure: right-hand orientation of a, b and c = a × b, with v₃ along c.]

(For us v3 = v with v2 somewhere in the plane of a and b .)


MA212 – Lecture 27 – Tuesday 23 January 2018 page 31
Parallelogram

... for c = a × b with θ measured from a to b

[Figure: parallelogram with sides a and b; height ||b|| sin θ.]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 32


Computation

Using the cross-product formula, with a = u = e₂ and
b = Au = a₂ (the 2nd column of A) as rows:

a × b = | e₁  e₂  e₃        | e₁    e₂    e₃
         a₁  a₂  a₃    =      0     1     0
         b₁  b₂  b₃ |        −1/9  −8/9   4/9 |

= (1/9)(4e₁ + 0e₂ + 1e₃)
= (1/9) (4, 0, 1)ᵀ = (1/9) v = (√17/9) (v/||v||)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 33


So c points in the same direction as v

...because 1/9 > 0. So

c = a × b = sin θ (v/||v||) = (√17/9)(v/||v||)

Recall that above a = u and b = Au, both being of length 1.

Let θ measure the angle from a to b; then

sin θ = √17/9.

But θ is also the angle of rotation around v, so the angle of
rotation is positive, so anti-clockwise.

This is of course numerically consistent with θ = cos⁻¹(−8/9).
MA212 – Lecture 27 – Tuesday 23 January 2018 page 34
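A numpy sketch (not in the slides) recomputing the whole example: axis, angle and sense.

```python
# Sketch recomputing the example numerically: axis from the lambda = 1
# eigenvector, cos(theta) from the trace, and the sense from u x Au.
import numpy as np

A = np.array([[8, -1, 4], [1, -8, -4], [4, 4, -7]]) / 9.0

w, V = np.linalg.eig(A)
axis = np.real(V[:, np.argmin(np.abs(w - 1))])
axis = axis / axis[2]                    # rescale: proportional to (4, 0, 1)
print(axis)

print((np.trace(A) - 1) / 2)             # cos(theta) = -8/9

u = np.array([0.0, 1.0, 0.0])            # u perpendicular to the axis
c = np.cross(u, A @ u)                   # c = a x b
print(c @ axis > 0)                      # c along +axis: anti-clockwise, True
```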
Revision

Getting to grips with C (the complex scalars)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 35


Complex numbers as scalars

C denotes the complex numbers written typically as



z = a + ib a, b ∈ R and i = −1
a = Re(z) = real part of z
b = Im(z) = imaginary part of z

The conjugate is
z̄ = a − ib.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 36


Multiplication:

This follows the usual rules of algebra with i2 being replaced by


−1 :

(a + ib)(c + id) = ac + bdi2 + i(bc + ad)


= (ac − bd) + i(bc + ad)

So, taking c = a, d = −b :

z z̄ = (a + ib)(a − ib) = a2 + b2

MA212 – Lecture 27 – Tuesday 23 January 2018 page 37


Modulus

The modulus is defined by


|z| = √(a² + b²).

This relies on the idea that z may be interpreted as the


point/vector (a, b) in the plane R2 . Then |z| is the Pythagorean
norm (length) of the vector (a, b) :

|z| = r = ||(a, b)||, where r² = a² + b²

So this denotes the radial distance of z from the origin (which


represents 0).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 38


Fractions

Note that computation of fractions/ratios may need


‘real-ification’:

1/z = z̄/(z z̄) = z̄/|z|² = (a − ib)/(a² + b²).

Example

(2 − i)/(3 + i) = (2 − i)(3 − i)/((3 + i)(3 − i)) = ((6 − 1) + i(−5))/10
= (5 − 5i)/10 = (1 − i)/2.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 39


More about conjugates

Interpreting z = a + ib as the vector (a, b) tells us that the real


line is the horizontal axis: the real axis of the plane.
Likewise the imaginary numbers ib (so with real part zero) form
the perpendicular axis of points (0, b) : the imaginary axis.
The complex conjugacy map

z 7→ z̄ (a + ib) 7→ (a − ib)

is then reflection in the real axis. Underlying this is a


transformation which replaces i by −i. Since

(−i)2 = i2 = −1,

we’re substituting one root of −1 by another (its conjugate).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 40


Conjugacy preserves arithmetic

Conjugacy is both additive and multiplicative:

conj(z + w) = z̄ + w̄,
conj(zw) = z̄ · w̄,

so preserves arithmetic operations.


Hence
conj(z/w) = z̄/w̄ for w ≠ 0.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 41


Geometric features

From the exponential series

e^z = 1 + z + (1/2)z² + (1/3!)z³ + ...,
which mathematicians usually take as the definition of ez , a
point we revisit later.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 42


Substitution z = iθ

yields

e^{iθ} = 1 + iθ − (1/2)θ² − (1/3!)iθ³ + ...
= (1 − (1/2)θ² + ...) + i(θ − (1/3!)θ³ + ...)
= cos θ + i sin θ.

So e^{iθ} is interpreted as the vector

(cos θ, sin θ)

at unit distance from 0 and with angle θ to the positive real


half-axis.
MA212 – Lecture 27 – Tuesday 23 January 2018 page 43
Cont’d

Since the vector (a, b) is at distance r from 0 and has
co-ordinates

a = r cos θ, b = r sin θ, where r² = a² + b²,

the corresponding complex number a + ib may also be
written

a + ib = r(cos θ + i sin θ) = re^{iθ}.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 44


Cont’d

Hence also with z = re^{iθ} and w = se^{iϕ}

|zw| = |z| · |w| from |re^{iθ} · se^{iϕ}| = |rs e^{i(θ+ϕ)}| = rs,

and likewise

|z/w| = |re^{iθ}/se^{iϕ}| = |(r/s)e^{i(θ−ϕ)}| = r/s = |z|/|w|, for w ≠ 0.

Using the fact that conjugacy replaces i by −i:

z̄ = re^{−iθ}.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 45


Triangle inequality

Also, in view of |z| = |a + ib| = r = ||(a, b)||, the triangle


inequality for Pythagorean norms gives

|z + w| ≤ |z| + |w|.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 46


Complex Vector Space Cn

We define
Cⁿ = { (z₁, ..., zₙ)ᵀ : z₁, ..., zₙ ∈ C }.

These are called complex vectors: they have complex


entries/components.
Applying conjugation to the components:
   
If z = (z₁, ..., zₙ)ᵀ, then z̄ = (z̄₁, ..., z̄ₙ)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 47


A complex matrix is ...

a matrix with complex entries:

A = [ a₁₁  ...  a₁ₙ
      ...  ...  ...
      aₘ₁  ...  aₘₙ ]  with all aᵢⱼ in C.

Then the conjugate matrix is:

Ā = [ ā₁₁  ...  ā₁ₙ
      ...  ...  ...
      āₘ₁  ...  āₘₙ ].

MA212 – Lecture 27 – Tuesday 23 January 2018 page 48


And as for the determinant

As conjugacy preserves arithmetic, for m = n

det(Ā) = conj(det A)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 49


Easy as she goes...

What can be done with real scalars can be done with complex
scalars. Watch an example.
Solve Az = b where

[ 1   0   i        [ z₁     [ 2+i
  −i  1   1+i   ·    z₂  =    3−2i  .
  0   i   −1 ]       z₃ ]     2i ]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 50


Solution

Form the augmented matrix

[ 1   0   i    2+i
  −i  1   1+i  3−2i     R2 → R2 + iR1
  0   i   −1   2i ]

[ 1   0   i    2+i
  0   1   i    2
  0   i   −1   2i ]     R3 → R3 − iR2

MA212 – Lecture 27 – Tuesday 23 January 2018 page 51


Continuing ...

 
[ 1  0  i  2+i
  0  1  i  2
  0  0  0  0 ]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 52


So

   
z₁ + iz₃ = 2 + i
z₂ + iz₃ = 2

=⇒ (z₁, z₂, z₃)ᵀ = (2 + i − iz₃, 2 − iz₃, z₃)ᵀ
= (2 + i, 2, 0)ᵀ + z₃ (−i, −i, 1)ᵀ  for z₃ ∈ C

MA212 – Lecture 27 – Tuesday 23 January 2018 page 53


Comments

We have found the solutions of Az = b to be of the form
z = z_special + z₃v.
In particular for z₃ = 0 we get b = Az_special. So
A(z₃v) = A(z − z_special) = b − b = 0.
So
N(A) = Lin{v} = Lin{ (−i, −i, 1)ᵀ },

and nullity(A) = 1.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 54
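A numpy sketch (not part of the slides) confirming the solution set of this complex system:

```python
# Sketch checking the solution of the complex system above: z_special
# solves Az = b and v spans the null space N(A).
import numpy as np

A = np.array([[1, 0, 1j], [-1j, 1, 1 + 1j], [0, 1j, -1]])
b = np.array([2 + 1j, 3 - 2j, 2j])
z_special = np.array([2 + 1j, 2, 0])
v = np.array([-1j, -1j, 1])

print(np.allclose(A @ z_special, b))   # True
print(np.allclose(A @ v, 0))           # True: v spans N(A)
print(np.linalg.matrix_rank(A))        # 2, so nullity = 3 - 2 = 1
```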


Comments continued

Referring to

B := [ 1  0  i
       0  1  i
       0  0  0 ] = [b₁, b₂, b₃]

and noting the third column b₃ is a combination of b₁, b₂, we
have

R(B) = Lin{ (1, 0, 0)ᵀ, (0, 1, 0)ᵀ } = Lin{b₁, b₂}

MA212 – Lecture 27 – Tuesday 23 January 2018 page 55


Further consequences

So rk(B) = rk(A) = 2 and in fact:

R(A) = Lin{a₁, a₂} = Lin{ (1, −i, 0)ᵀ, (0, 1, i)ᵀ }.

As rank + nullity = 2 + 1 = dim(domain) = 3, as expected: since C³ is
3-dimensional as a vector space over the complex scalars.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 56


Determinant here?


det(A) = | 1   0   i
          −i   1   1+i
           0   i   −1 |

= 1 · | 1  1+i ; i  −1 | + 0 + i · | −i  1 ; 0  i |

= −1 − i(1 + i) + i(−i²) = −1 − i − i² + i = 0.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 57


Geometry in Cn

Recall that

|z|2 = z z̄ = a2 + b2 .

We follow the Pythagorean norm on C and generalize beyond to


Cn by defining

||z||2 = |z1 |2 + ... + |zn |2 = z1 z̄1 + ... + zn z̄n .

MA212 – Lecture 27 – Tuesday 23 January 2018 page 58


Inner products: complex-valued

The last formula motivates us to define the complex scalar
product by:

z · w = (z₁, ..., zₙ)ᵀ · (w₁, ..., wₙ)ᵀ
= zᵀw̄ = [z₁, ..., zₙ] (w̄₁, ..., w̄ₙ)ᵀ
= z₁w̄₁ + ... + zₙw̄ₙ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 59


Then ...

w · z = w₁z̄₁ + ... + wₙz̄ₙ = conj(z₁w̄₁ + ... + zₙw̄ₙ) = conj(z · w)

We will also write

⟨z, w⟩ = z · w :  ⟨w, z⟩ = w · z = conj(z · w) = conj(⟨z, w⟩).

We lose symmetry but retain left-sided linearity:

hαz + βz′ , wi = (αz1 + βz1′ )w̄1 + ... + (αzn + βzn′ )w̄n


= (αz1 w̄1 + ...) + (βz1′ w̄1 + ...)
= α(z1 w̄1 + ...) + β(z1′ w̄1 + ...)
= αhz, wi + βhz′ , wi

MA212 – Lecture 27 – Tuesday 23 January 2018 page 60


Compare this with ...

hz, αw + βw′ i = hαw + βw′ , zi = αhw, zi + βhw′ , zi


= ᾱhw, zi + β̄hw′ , zi = ᾱhz, wi + β̄hz, w′ i

This property is called ‘sesquilinearity’ (Latin for ‘one and a


half’-linearity).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 61


Complex Inner product Space

For V a complex vector space a map

h·, ·i : V × V → C

is a complex inner product if it satisfies:


Linearity in the first argument:

hαu1 + βu2 , vi = αhu1 , vi + βhu2 , vi all u1 , u2 , v ∈V, α, β ∈ C

Hermitian property:

⟨v, u⟩ = conj(⟨u, v⟩) for all u, v ∈ V.

Positivity:
hu, ui > 0 all u 6= 0 ∈V.
MA212 – Lecture 27 – Tuesday 23 January 2018 page 62
Comment

Note the Hermitian property implies that ⟨u, u⟩ = conj(⟨u, u⟩) for all
u ∈ V; so ⟨u, u⟩ ∈ R; but we want more than just being real.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 63


MA212 Further Mathematical Methods Lecture LA 8

Lecture 28: Complex orthonormality

Unitary matrices from orthogonality


Adjoint operation A∗
Hermitian matrices
(In pursuit of diagonalization)
Background Readings for this Lecture

Anthony and Harvey: Chap. 13 (§13.3-13.6)

Adam Ostaszewski: Chap. 5 (§5.1 and 5.3)

MA212 – Lecture 28 – Friday 26 January 2018 page 2


Recap about: <, > in the complex case

u · v = uᵀv̄
= u₁v̄₁ + u₂v̄₂ + · · · + uₙv̄ₙ

so
u · u = Σᵢ |uᵢ|²,
u · u > 0 for u ≠ 0.

This is the standard example of a complex inner product


hu, vi = u · v

MA212 – Lecture 28 – Friday 26 January 2018 page 3


Complex inner product space

For h·, ·i : V × V −→ C
Linearity:
hαu1 + βu2 , vi = αhu1 , vi + βhu2 , vi , ∀u1 , u2 , v ∈ V ,
∀α, β ∈ C
Hermitian property: ⟨v, u⟩ = conj(⟨u, v⟩), ∀v, u
Positivity:
hu, ui > 0 for all u 6= 0

N.B.
conj(⟨u, u⟩) = ⟨u, u⟩ so ⟨u, u⟩ is real, but we want more (positivity).

MA212 – Lecture 28 – Friday 26 January 2018 page 4


Orthonormal set, an example:

   
v₁ = (1/√2)(1, i, 0)ᵀ,  v₂ = (1/√3)(1, −i, 1)ᵀ — what are their norms?

Since 1·1̄ + i·ī + 0 = 1 + (i)(−i) = 1 + 1 = 2,
||v₁|| = 1.
Since 1·1̄ + (−i)·conj(−i) + 1·1̄ = 1 − (i)(i) + 1 = 1 + 1 + 1 = 3,
||v₂|| = 1.

Moreover, v₁ · v₂ = (1/√2)(1/√3)(1·1̄ + i·conj(−i) + 0·1̄) = (1/√6)(1 + i·i) = 0,
so v₁ ⊥ v₂.

MA212 – Lecture 28 – Friday 26 January 2018 page 5


Constructing an orthonormal base:

Essentially just as in the Rn case, but care must be taken over


inner products here because linearity is only in the first
argument of h·, ·i.

Base given as:

u₁ := (1, i, 0)ᵀ,  u₂ := (1, −i, 1)ᵀ,  u₃ := (0, 1, 0)ᵀ

MA212 – Lecture 28 – Friday 26 January 2018 page 6


Apply the Gram-Schmidt construction ...

... to the above vectors:

u₁ = (1, i, 0)ᵀ,

and ||u₁||² = 1² + |i|² + 0² = 2. So,

v₁ = (1/√2)(1, i, 0)ᵀ.

MA212 – Lecture 28 – Friday 26 January 2018 page 7


In the Figure below: what’s α?

A vector w2 ⊥ v1 is wanted with w2 = u2 − αv1 . So what is


wanted is:

hw2 , v1 i = 0 i.e. hu2 − αv1 , v1 i = 0.

So hu2 , v1 i − αhv1 , v1 i = 0 ,
yielding:
α = hu2 , v1 i.

MA212 – Lecture 28 – Friday 26 January 2018 page 8


Notice that ...

u₂ comes first in the inner product ⟨·,·⟩ calculation:

⟨u₂, v₁⟩ = (1/√2){1·1̄ + (−i)·ī + 1·0̄} = (1/√2){1 + (−i)(−i)} = 0.

So
w₂ = u₂ − ⟨u₂, v₁⟩v₁ = (1, −i, 1)ᵀ − 0 · v₁ = (1, −i, 1)ᵀ.

MA212 – Lecture 28 – Friday 26 January 2018 page 9


Hurray!

So w₂ = u₂ ⊥ v₁, as wanted!
Hence ||w₂||² = 1 + |−i|² + 1 = 3. Rescaling yields

v₂ = (1/√3)(1, −i, 1)ᵀ.

Now work on u₃ = (0, 1, 0)ᵀ:

w₃ = u₃ − ⟨u₃, v₁⟩v₁ − ⟨u₃, v₂⟩v₂

Again u₃ comes first in the two inner products ⟨·,·⟩

MA212 – Lecture 28 – Friday 26 January 2018 page 10


Cont’d

   
⟨u₃, v₁⟩ = (0, 1, 0)ᵀ · (1/√2)(1, i, 0)ᵀ = (1/√2)(0 + 1·ī + 0) = −(1/√2) i

and

⟨u₃, v₂⟩ = (0, 1, 0)ᵀ · (1/√3)(1, −i, 1)ᵀ = (1/√3)(0 + 1·conj(−i) + 0) = i/√3.

MA212 – Lecture 28 – Friday 26 January 2018 page 11


So...

     
w₃ = (0, 1, 0)ᵀ − (−i/√2)(1/√2)(1, i, 0)ᵀ − (i/√3)(1/√3)(1, −i, 1)ᵀ
= (0, 1, 0)ᵀ + (i/2)(1, i, 0)ᵀ − (i/3)(1, −i, 1)ᵀ

= (i/6, 1/6, −i/3)ᵀ,  as 1/2 − 1/3 = (3 − 2)/6 = 1/6.

MA212 – Lecture 28 – Friday 26 January 2018 page 12


Cont’d

   
w₃ = (i/6, 1/6, −i/3)ᵀ = (1/6)(i, 1, −2i)ᵀ.

Now ||w₃||² = (1/36)(|i|² + 1² + |−2i|²) = 6/36 = 1/6, so

v₃ = w₃/||w₃|| = (1/√6)(i, 1, −2i)ᵀ.

MA212 – Lecture 28 – Friday 26 January 2018 page 13
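A numpy sketch (not in the slides) of this complex Gram-Schmidt run; the only subtlety is where the conjugate sits in the inner product.

```python
# Sketch of the complex Gram-Schmidt above. Note np.vdot conjugates its
# FIRST argument, so <u, v> = sum u_k conj(v_k) is np.vdot(v, u).
import numpy as np

def ip(u, v):
    return np.vdot(v, u)        # <u, v>, conjugation on the second slot

u1 = np.array([1, 1j, 0])
u2 = np.array([1, -1j, 1])
u3 = np.array([0, 1, 0])

v1 = u1 / np.sqrt(ip(u1, u1).real)
w2 = u2 - ip(u2, v1) * v1
v2 = w2 / np.sqrt(ip(w2, w2).real)
w3 = u3 - ip(u3, v1) * v1 - ip(u3, v2) * v2
v3 = w3 / np.sqrt(ip(w3, w3).real)

print(np.allclose(v3 * np.sqrt(6), [1j, 1, -2j]))             # matches the slide
print(np.allclose([ip(v2, v1), ip(v3, v1), ip(v3, v2)], 0))   # orthogonal
```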


Caution: More than two units among the scalars

Above v₃ ⊥ v₁, v₂. For any complex number α with |α| = 1, we
have
||αv₃|| = |α| · ||v₃|| = ||v₃||,

so αv₃ is also of unit length. Also

αv₃ ⊥ v₁, v₂.

More units than the two ±1 available in R.

MA212 – Lecture 28 – Friday 26 January 2018 page 14


Example:

The multiple:

iv₃ = (1/√6)(−1, i, 2)ᵀ

can equally well be used in place of v₃.

MA212 – Lecture 28 – Friday 26 January 2018 page 15


Continuous function into C

If f : [0, 1] −→ C is continuous, then f (t) = g(t) + ih(t)


for some functions g, h both [0, 1] −→ R with g and h both
continuous.

We define complex integration by putting:

∫₀¹ f(t) dt := ∫₀¹ g(t) dt + i ∫₀¹ h(t) dt

Moral: apply ordinary integration separately to the real and


imaginary parts.
We denote the continuous functions from [0, 1] −→ C by CC [0, 1].

MA212 – Lecture 28 – Friday 26 January 2018 page 16


Example

f (t) = eit = cos(t) + i · sin(t)

Note that if f(t) = g(t) + ih(t), then

conj( ∫₀¹ f(t) dt ) = conj( ∫₀¹ g(t) dt + i ∫₀¹ h(t) dt )
= ∫₀¹ g(t) dt − i ∫₀¹ h(t) dt
= ∫₀¹ {g(t) − ih(t)} dt
= ∫₀¹ conj(f(t)) dt.

MA212 – Lecture 28 – Friday 26 January 2018 page 17
Continuing in the same vein

Define also:

⟨f₁, f₂⟩ = ∫₀¹ f₁(t) conj(f₂(t)) dt

Check that this has the properties required:

⟨αf₁ + βf₂, g⟩ := ∫₀¹ [αf₁(t) + βf₂(t)] conj(g(t)) dt
= ∫₀¹ ( αf₁(t)conj(g(t)) + βf₂(t)conj(g(t)) ) dt
= α ∫₀¹ f₁(t)conj(g(t)) dt + β ∫₀¹ f₂(t)conj(g(t)) dt
= α⟨f₁, g⟩ + β⟨f₂, g⟩

MA212 – Lecture 28 – Friday 26 January 2018 page 18


Cont’d

conj(⟨f, g⟩) = conj( ∫₀¹ f(t) conj(g(t)) dt )
= ∫₀¹ conj(f(t)) g(t) dt
= ⟨g, f⟩

⟨f, f⟩ = ∫₀¹ f(t) conj(f(t)) dt = ∫₀¹ |f(t)|² dt > 0 unless f(t) ≡ 0

MA212 – Lecture 28 – Friday 26 January 2018 page 19


Unitary Matrices: here hu, vi = u · v

For an orthonormal basis B = (v₁, . . . , vₙ) take
M_B = [v₁, . . . , vₙ], square of size n × n. Consider the n × n matrix
conj(M_B)ᵀ, whose rows are conj(v₁)ᵀ, . . . , conj(vₙ)ᵀ.

Then
conj(M_B)ᵀ M_B = I, so M_B* M_B = I.

See slide 22 for the definition of M*.

MA212 – Lecture 28 – Friday 26 January 2018 page 20


In detail ...

   
conj(M_B)ᵀ M_B = [ rows conj(v₁)ᵀ, . . . ] [v₁, v₂, . . . , vₙ]

= [ ⟨v₁, v₁⟩  ⟨v₂, v₁⟩  ...
    ⟨v₁, v₂⟩  ...       ...
    ...       ...  ⟨vₙ, vₙ⟩ ] = I

... since conj(u)ᵀ v = conj(uᵀv̄) = conj(⟨u, v⟩) = ⟨v, u⟩.

MA212 – Lecture 28 – Friday 26 January 2018 page 21


The conjugate transpose

Definition. For a matrix A we define the conjugate transpose of


A to be the matrix
A* := conj(A)ᵀ.

Definition. A square matrix A is said to be unitary if

A−1 = A∗ .

Remark. So a real square unitary matrix A is just an orthogonal


one!

MA212 – Lecture 28 – Friday 26 January 2018 page 22


Example

   
A = [ 1/√2   1/√3   −1/√6
      i/√2  −i/√3    i/√6
      0      1/√3    2/√6 ],  then  A* = [  1/√2  −i/√2   0
                                            1/√3   i/√3   1/√3
                                           −1/√6  −i/√6   2/√6 ].

It is easily checked that AA* = I. So A*A = I and A⁻¹ = A*.

MA212 – Lecture 28 – Friday 26 January 2018 page 23


Cont’d

 
det(A) = (1/√2)(1/√3)(1/√6) det [ 1   1  −1
                                  i  −i   i
                                  0   1   2 ]
= (1/6)((−2i − i) − i(2 + 1))
= −i.

So |det(A)| = 1.

MA212 – Lecture 28 – Friday 26 January 2018 page 24
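A numpy sketch (not in the slides) checking the example matrix numerically:

```python
# Sketch: the example matrix is unitary -- A A* = I and |det A| = 1.
import numpy as np

A = np.array([
    [1 / np.sqrt(2),   1 / np.sqrt(3),  -1 / np.sqrt(6)],
    [1j / np.sqrt(2), -1j / np.sqrt(3),  1j / np.sqrt(6)],
    [0,                1 / np.sqrt(3),   2 / np.sqrt(6)],
])

A_star = A.conj().T                              # conjugate transpose
print(np.allclose(A @ A_star, np.eye(3)))        # True: A is unitary
print(np.isclose(abs(np.linalg.det(A)), 1.0))    # True: det(A) = -i
```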


Properties of *

1. (A*)* = A
2. (AB)* = B*A*
3. det(A*) = conj(det(A))
4. The Star property: x · (A*y) = (Ax) · y
Indeed
LHS = xᵀ conj(A*y)
= xᵀ conj(conj(A)ᵀ y)
= (xᵀAᵀ) ȳ
= (Ax)ᵀ ȳ
= (Ax) · y = RHS

MA212 – Lecture 28 – Friday 26 January 2018 page 25


Properties cont’d

5. Group property: for A unitary,
(A⁻¹)* = (A*)* = A = (A⁻¹)⁻¹
So A unitary implies A⁻¹ unitary, which itself implies A
unitary.
6. A, B unitary implies AB unitary:
(AB)*(AB) = B*A*AB = B*B = I

7. A unitary implies |det A| = 1. Indeed A*A = I:

conj(det(A)) det(A) = det(I) = 1
|det(A)|² = 1
|det(A)| = 1

MA212 – Lecture 28 – Friday 26 January 2018 page 26


Theorem

Theorem:
Equivalent characterizations for an n × n matrix:
1. A is unitary;
2. columns of A are an orthonormal basis for Cn ;
3. isometry: ||Au|| = ||u||;
4. conformality,
and in fact (Au) · (Av) = u · v .

MA212 – Lecture 28 – Friday 26 January 2018 page 27


Proof: (1) implies (3)

(Au) · (Av) = (Au)ᵀ conj(Av)
= uᵀAᵀ conj(A) v̄
= uᵀv̄ = u · v
Indeed
conj(A)ᵀ A = I, so (as I is real)

I = conj( conj(A)ᵀA ) = Aᵀ conj(A), i.e. Aᵀ conj(A) = I.

Hence (with v = u),

||Au||² = (Au) · (Au) = uᵀū = ||u||².

MA212 – Lecture 28 – Friday 26 January 2018 page 28


A theorem about unitary matrices

Theorem: For A unitary, its eigenvalues have modulus 1.

Proof:
Let Ax = λx, with x ≠ 0. Then

x · x = ||x||² = (Ax) · (Ax) = (λx) · (λx) = λ λ̄ (x · x).

So λ λ̄ = 1, as x · x = ||x||² > 0.

MA212 – Lecture 28 – Friday 26 January 2018 page 29


Normal Matrices

A matrix An×n is normal if

AA∗ = A∗ A

They are ‘normal’ precisely because they are diagonalizable, i.e.


there is an orthonormal basis change that gives them a diagonal
matrix representation.
So
unitary matrices are normal:
U U* = U U⁻¹ = I = U⁻¹ U = U* U,
as U⁻¹ = U*.
Now we meet a very significant member of the normal family:

MA212 – Lecture 28 – Friday 26 January 2018 page 30


Hermitian Matrices

A is Hermitian if A∗ = A

An example:
[ 1    i      −1
  −i   0      3+2i
  −1   3−2i   2 ]

If A = A* then A*A = A² = AA*.
So if A is Hermitian, then A is normal.
Similarly for A skew-Hermitian: i.e. when

A* = −A

then AA* = −A² = A*A


MA212 – Lecture 28 – Friday 26 January 2018 page 31
Example:

   
A = [ 0  2 ; −2  i ],  then  A* = [ 0  −2 ; 2  −i ] = −A

Check: AA* = [ 4  −2i ; 2i  5 ]

MA212 – Lecture 28 – Friday 26 January 2018 page 32


Comment

Note: (AA∗ )∗ = A∗∗ A∗ = AA∗

Theorem (group-property)

Let An×n be invertible. Then:


1. (A∗ )−1 = (A−1 )∗
2. A is normal =⇒ A−1 is normal
3. A is Hermitian =⇒ A−1 is Hermitian

MA212 – Lecture 28 – Friday 26 January 2018 page 33


Proof:

1. I = I* = (AA⁻¹)* = (A⁻¹)*A*, so (A*)⁻¹ = (A⁻¹)*

2. If AA* = A*A, then

(A*)⁻¹A⁻¹ = A⁻¹(A*)⁻¹.

Since (A*)⁻¹ = (A⁻¹)* by part 1,

(A⁻¹)*A⁻¹ = A⁻¹(A⁻¹)*,

which shows that A⁻¹ is normal.

3. If A = A*, then A⁻¹ = (A*)⁻¹ = (A⁻¹)*

MA212 – Lecture 28 – Friday 26 January 2018 page 34


Theorem about eigenvalues

Theorem
Let An×n be Hermitian. Then:
1. All eigenvalues are real
2. Eigenvectors to distinct eigenvalues are orthogonal.

Comment: So this holds for real symmetric A

MA212 – Lecture 28 – Friday 26 January 2018 page 35


Proof for 1.

Let Ax = λx, for x ≠ 0. Then (since A = A*)

x · (Ax) = x · (A*x)

Using the star property

x · (λx) = (Ax) · x

Thus
λ̄ (x · x) = λ (x · x)

and we conclude λ = λ̄ as x · x > 0.

MA212 – Lecture 28 – Friday 26 January 2018 page 36


Proof for 2.

Let Ax = λx and Ay = µy, with λ ≠ µ. Then

(Ax) · y = x · (A*y) = x · (Ay)

Thus (since µ̄ = µ as in the last slide)

λ (x · y) = x · (µy) = µ̄ (x · y) = µ (x · y)

So (λ − µ)(x · y) = 0, and hence x · y = 0.

MA212 – Lecture 28 – Friday 26 January 2018 page 37
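A numpy sketch (not in the slides) illustrating both conclusions on the Hermitian example from earlier in this lecture:

```python
# Sketch on the Hermitian example above: eigenvalues come out real and
# the eigenvectors form an orthonormal set (np.linalg.eigh assumes A = A*).
import numpy as np

A = np.array([[1, 1j, -1], [-1j, 0, 3 + 2j], [-1, 3 - 2j, 2]])
print(np.allclose(A, A.conj().T))                 # Hermitian: True

w, V = np.linalg.eigh(A)
print(np.isrealobj(w))                            # real eigenvalues: True
print(np.allclose(V.conj().T @ V, np.eye(3)))     # orthonormal eigenvectors
```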


Summary. The square matrix M is:

Normal if M M ∗ = M ∗ M
Unitary if M ∗ M = I i.e. M −1 = M ∗
Hermitian if M ∗ = M

For M real: orthogonal is unitary


For M real: symmetric is Hermitian

If M is unitary (e.g. real + orthogonal) its eigenvalues λ


satisfy |λ| = 1.

If M is Hermitian (e.g. real + symmetric) its eigenvalues λ


satisfy λ ∈ R MA212 – Lecture 28 – Friday 26 January 2018 page 38
Postscript: Hermite property again

Suppose A = A*, put ⟨x, y⟩ = xᵀAȳ. Then

conj(⟨y, x⟩) = conj(yᵀAx̄)
= ȳᵀĀx, regarded as a 1 × 1 matrix, (z)₁ₓ₁ say,
= (ȳᵀĀx)ᵀ  since (z)ᵀ₁ₓ₁ = (z)₁ₓ₁
= xᵀĀᵀȳ
= xᵀAȳ  as Āᵀ = A* = A
= ⟨x, y⟩

MA212 – Lecture 28 – Friday 26 January 2018 page 39


MA212 Further Mathematical Methods Lecture LA 9

Lecture 29: Towards diagonalization

Example of diagonalization
Upper triangular form
Background readings

Anthony & Harvey Chapter 8 (§8.2 & §8.3)

Adam Ostaszewski Chapter 6 (§6.2 & §6.3)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 2


Recap

Diagonalization means base change so that

A −→ diag(×, ×, ..., ×)

(a diagonal matrix: zeros off the diagonal).

When can we do this? Answer: See later...


MA212 – Lecture 29 – Tuesday 30 January 2018 page 3
Recall that

If D is a diagonal matrix:

D = diag(d₁, d₂, ..., dₙ)

Then p_D(x) = det(xI − D)

i.e. p_D(x) = (x − d₁)(x − d₂) . . . (x − dₙ)

so, d₁, d₂, . . . , dₙ are eigenvalues of D.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 4


Cont’d

A is diagonalizable iff

P −1 AP = D ‘Similar to D ’

for some diagonal D


Then D and A have the same eigenvalues. The diagonal
entries of D are eigenvalues of A .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 5


Diagonalization

Let A be a square matrix of size n × n.

Paradigm:
Suppose there are n linearly independent eigenvectors. Say
these are v₁, · · · , vₙ and

Avᵢ = λᵢvᵢ

Then A is similar to

D = diag(λ₁, λ₂, ..., λₙ)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 6
Cont’d

Actually: D represents y = A x with respect to the basis

B = (v1 , . . . , vn ).

MA212 – Lecture 29 – Tuesday 30 January 2018 page 7


Cont’d

so
D = MB−1 A MB

Check:

λj ej = D ej = MB−1 A MB ej
= MB−1 Avj (as vj = MB ej )
= MB−1 λj vj
= λj MB−1 vj
= λj ej (again as vj = MB ej ).

MA212 – Lecture 29 – Tuesday 30 January 2018 page 8


Some observations

Example... soon to come (on the next slide)


Even if A is real, nevertheless eigenvalues can be complex.
However, some good news: if λ ∈ C and A is real,

if Av = λv, v ≠ 0,
then Āv̄ = λ̄v̄; but A is real, so
Av̄ = λ̄v̄ and v̄ ≠ 0.

So, for real A, if λ is an eigenvalue and v a corresponding
eigenvector, then λ̄ is also an eigenvalue and v̄ a corre-
sponding eigenvector.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 9


Example

Consider

A = [ −1   0  −2
       0   3   2
       1  −3   0 ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 10


Step 1: Find eigenvalues

p_A(x) = det(xI − A)

= | x+1   0    2
    0    x−3  −2
    −1   3    x |

= (x + 1) | x−3  −2 ; 3  x | + 2 | 0  x−3 ; −1  3 |
= (x + 1)(x² − 3x + 6) + 2(x − 3)
= x³ − 2x² + 5x
= x(x² − 2x + 5)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 11


So the roots are...

Roots: x = 0,

x = (2 ± √(4 − 20))/2 = 1 ± 2i

λ₁ = 0,  λ₂ = 1 + 2i,  λ₃ = λ̄₂ = 1 − 2i.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 12


Step 2: Find eigenvectors

λ1 = 0 solve Av = 0

−x1 − 2x3 = 0 take x3 = −3t


3x2 + 2x3 = 0 then x2 = 2t
x1 − 3x2 = 0 then x1 = 6t.

(The last equation is redundant, but useful here, nevertheless.)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 13


Say t = 1...

 
v₁ = (6t, 2t, −3t)ᵀ = t (6, 2, −3)ᵀ

Say t = 1 ...

v₁ = (6, 2, −3)ᵀ
MA212 – Lecture 29 – Tuesday 30 January 2018 page 14
Getting v2

λ₂ = 1 + 2i: solve Av = λ₂v, i.e.
(λ₂I − A)v = 0

[ 2+2i   0      2         [ x₁     [ 0
  0     −2+2i  −2      ·    x₂  =    0
  −1     3      1+2i ]      x₃ ]     0 ]

(1 + i)x₁ + x₃ = 0   (1)
(−1 + i)x₂ − x₃ = 0   (2)

Ignore third, as redundant.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 15


Ignoring the redundant one

Put x₃ = −t

(2) ⇒ x₂ = −t/(−1 + i) = −t(−1 − i)/2 = t(1 + i)/2,  as (−1 + i)(−1 − i) = 2

x₁ = t/(1 + i) = t(1 − i)/2,  as (1 + i)(1 − i) = 2

yielding

v₂ = (t/2) (1−i, 1+i, −2)ᵀ,  say (t = 2):  (1−i, 1+i, −2)ᵀ
MA212 – Lecture 29 – Tuesday 30 January 2018 page 16
v3 for free from v2

Take v₃ = v̄₂:

v₂ = (t/2) (1−i, 1+i, −2)ᵀ,  say (t = 2):  (1−i, 1+i, −2)ᵀ,

so take

v₃ = v̄₂ = (1+i, 1−i, −2)ᵀ

MA212 – Lecture 29 – Tuesday 30 January 2018 page 17


Step 3: Construct the ‘modular’ matrix.

Take

P = [v₁, v₂, v₃] = [ 6   1−i  1+i
                     2   1+i  1−i
                     −3  −2   −2 ]

Then

AP = (Av₁, Av₂, Av₃) = (λ₁v₁, λ₂v₂, λ₃v₃)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 18


Finally

So that

AP = PD = (v₁, v₂, v₃) diag(λ₁, λ₂, λ₃)

∴ P⁻¹AP = diag(λ₁, λ₂, λ₃) = [ 0  0     0
                                0  1+2i  0
                                0  0     1−2i ].
MA212 – Lecture 29 – Tuesday 30 January 2018 page 19


Here:

 
P⁻¹ = (1/20) [ 4     4      4
               −3+i  −3−9i  −8−4i
               −3−i  −3+9i  −8+4i ]

(Here A is diagonalizable.)
Question: In general, when can this be done?...

MA212 – Lecture 29 – Tuesday 30 January 2018 page 20
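A numpy sketch (not in the slides) verifying this worked diagonalization with its complex eigenvalues:

```python
# Sketch verifying the worked diagonalization: P^{-1} A P = diag(0, 1+2i, 1-2i).
import numpy as np

A = np.array([[-1, 0, -2], [0, 3, 2], [1, -3, 0]], dtype=complex)
P = np.array([[6, 1 - 1j, 1 + 1j],
              [2, 1 + 1j, 1 - 1j],
              [-3, -2, -2]])

D = np.linalg.solve(P, A @ P)             # P^{-1} A P without inverting P
print(np.allclose(D, np.diag([0, 1 + 2j, 1 - 2j])))   # True
```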


Break-downs - I
Where can the
diagonalization
paradigm
break down?

MA212 – Lecture 29 – Tuesday 30 January 2018 page 21


Eigenspaces

For a matrix A sized n × n and λ an eigenvalue; the


Eigenspace to value λ is defined to be:

E(A, λ) = {v : A v = λv}

This includes 0 .
But v ∈ E(A, λ) and v ≠ 0 ⇒ v is an eigenvector.
FACT
If p_A(x) = (x − λ₁)^{m₁} . . . (x − λₖ)^{mₖ}, then

1 ≤ dim E(A, λⱼ) ≤ mⱼ
(geometric multiplicity ≤ algebraic multiplicity)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 22


N.B.

E(A, λ) = {v : (A − λI)v = 0}
= N (A − λI)
= Null space of A − λI.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 23


EXAMPLE

 
A = [ 4   0   6
     −2   1  −5
     −3   0  −5 ]

p_A(x) = | x−4   0   −6
           2    x−1   5
           3    0    x+5 |  = (x − 1) | x−4  −6 ; 3  x+5 |

= (x − 1)(x² + x − 2)
= (x − 1)(x − 1)(x + 2)
= (x − 1)²(x + 2)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 24


So ...

λ = 1 has alg. multiplicity 2


λ = −2 has alg. multiplicity 1 .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 25


So dimE(A, −2) = 1, but is dimE(A, 1) 1 or 2, which?

Solve (A − I)v = 0:

[ 3   0   6
  −2  0  −5
  −3  0  −6 ] v = 0

3x₁ + 6x₃ = 0   (1)
−2x₁ − 5x₃ = 0   (2)
−3x₁ − 6x₃ = 0   (3)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 26


Cont’d

By (1), x₁ = −2x₃. Substitute for x₁ in (2): then −x₃ = 0, so

x₃ = 0,  x₁ = 0,  v = (0, x₂, 0)ᵀ
MA212 – Lecture 29 – Tuesday 30 January 2018 page 27


 
E(A, 1) = Lin{ (0, 1, 0)ᵀ }
so, it is 1 -dimensional: This is a deficiency – we wanted/hoped


for 2 -dimensional.
So, A is not diagonalizable because it doesn’t have 3
independent eigenvectors.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 28
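A numpy sketch (not in the slides) getting the geometric multiplicities of this example from ranks:

```python
# Sketch: geometric multiplicities via rank; dim E(A, 1) = 3 - rank(A - I)
# is only 1 although the algebraic multiplicity of 1 is 2.
import numpy as np

A = np.array([[4, 0, 6], [-2, 1, -5], [-3, 0, -5]])
I = np.eye(3)

print(3 - np.linalg.matrix_rank(A - I))        # 1: deficient eigenvalue 1
print(3 - np.linalg.matrix_rank(A + 2 * I))    # 1: simple eigenvalue -2
```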


Geometric multiplicity is preserved under similarity.

Theorem If A, B are similar and λ is an eigenvalue of


either (= both here) then

dimE(A, λ) = dimE(B, λ).

Observation. Suffices to show

dimE(A, λ) ≥ dimE(B, λ)

... why?
Answer: Write A for B and B for A

dimE(B, λ) ≥ dimE(A, λ)

because A similar to B ⇒ B similar to A .


MA212 – Lecture 29 – Tuesday 30 January 2018 page 29
Proof

Say A = P −1 B P ; λ an eigenvalue of A and B ; {v1 , . . . , vk }


basis of E(B, λ) .
Put ui = P −1 vi

Auᵢ = P⁻¹BP P⁻¹vᵢ
= P⁻¹Bvᵢ
= P⁻¹λvᵢ
= λuᵢ.

So, ui ∈ E(A, λ) .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 30


Proof continued

We now show:
ui are linearly independent
If α₁u₁ + · · · + αₖuₖ = 0, then, as v ↦ Pv is linear,

α₁Pu₁ + · · · + αₖPuₖ = 0
iff α₁v₁ + · · · + αₖvₖ = 0.

So, α1 = α2 = · · · = αk = 0 , because v1 , . . . , vk are linearly


independent (form a basis).
So, dimE(A, λ) ≥ k = dimE(B, λ) .
This completes the proof in view of the earlier observation.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 31


EXAMPLE

 
A = [ 1  1 ; 0  1 ]

p_A(x) = | x−1  −1 ; 0  x−1 | = (x − 1)²

A − I = [ 0  1 ; 0  0 ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 32


Cont’d

Av = v ⇒ x₂ = 0
v = (x₁, 0)ᵀ

E(A, 1) = Lin{ (1, 0)ᵀ }  of dimension 1

so, A is NOT diagonalizable.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 33


Why is ...

 
A = [ 1  1 ; 0  1 ]

not diagonalizable?
If it were, λ₁ = λ₂ = 1:

[ 1  1 ; 0  1 ] = P⁻¹ [ 1  0 ; 0  1 ] P
= P⁻¹ I P = I
= [ 1  0 ; 0  1 ]

Contradiction!
MA212 – Lecture 29 – Tuesday 30 January 2018 page 34


Break-downs - II
Upper triangular
will
fix it!

MA212 – Lecture 29 – Tuesday 30 January 2018 page 35


Although not every square matrix can be diagonalized,

... nevertheless every matrix can be reduced to the very useful:


Upper Triangular Form:

T = [ d₁   *   ...  *
      0    d₂  ...  *
      ...       ·   *
      0    ...  0   dₙ ]

(arbitrary entries U above the diagonal, zeros below)

N.B.
pT (x) = (x − d1 )(x − d2 ) · · · (x − dn )

So, diagonal entries are eigenvalues, each appearing same


number of times as its algebraic multiplicity.
MA212 – Lecture 29 – Tuesday 30 January 2018 page 36
Why useful?
Just as in echelon format, equations

T (x₁, ..., xₙ)ᵀ = (b₁, ..., bₙ)ᵀ

are easy to solve 'backwards' from the last equation upwards.

Last equation:
dₙxₙ = bₙ

(soluble if dₙ ≠ 0 for any bₙ, or soluble if dₙ = bₙ = 0.)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 37


Last but one:

dn−1 xn−1 + tn−1,n xn = bn−1

etc.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 38


Theorem

Theorem. Every square matrix is similar to an upper trian-


gular matrix.

Proof. This is proved by showing how to pass to progressively
smaller matrices ('proof by mathematical induction'), until one
reaches 1 × 1 matrices!
Let A be n × n.
Let λ be an eigenvalue of A.
Let
(u₁, . . . , uₖ) be a basis of E(A, λ)

Extend to a basis of Rⁿ:

B = (u₁, . . . , uₙ)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 39
Cont’d

Put
A′ = MB−1 A MB

Base change, A′ represents A relative to (u1 , . . . , un ) . Now for


j = 1, . . . , k

A′ ej = MB−1 A MB ej
= MB−1 A uj
= MB−1 λuj
= λMB−1 uj
= λ ej

MA212 – Lecture 29 – Tuesday 30 January 2018 page 40


 
A′ = [ λIₖ  P
       0    Q ]

for some P that is k × (n − k),
Q that is (n − k) × (n − k)

p_A(x) = p_{A′}(x) = (x − λ)ᵏ p_Q(x)


MA212 – Lecture 29 – Tuesday 30 January 2018 page 41
So:

1 ≤ k (geometric multiplicity) ≤ m (algebraic multiplicity)

Q is smaller sized, at most (n − 1) × (n − 1) .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 42


Suppose Q can be ‘reduced’ to upper triangular form, i.e.,

C −1 Q C = U

for C non-singular and U upper-triangular.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 43


Cont’d

Put

N = [ Iₖ  0
      0   C ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 44


Cont’d ... N −1 A′ N =

   
   
[ Iₖ  0       [ λIₖ  P      [ Iₖ  0
  0   C⁻¹ ] ·   0    Q ]  ·   0   C ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 45


N⁻¹ M_B⁻¹ A M_B N is upper triangular:

= [ λIₖ  P        [ Iₖ  0
    0    C⁻¹Q ] ·   0   C ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 46


 
 

= [ λIₖ  PC
    0    C⁻¹QC ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 47


Cont’d

 
= [ λIₖ  PC
    0    U ]

is upper triangular.
A is similar to A′;
A′ is similar to upper triangular;
∴ A is similar to upper triangular.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 48


EXAMPLE

The matrix

A = [ 2   2  −1
     −1  −1   1
     −1  −2   2 ]

has
p_A(x) = det(xI − A) = (x − 1)³.

v = (1, 0, 1)ᵀ is an eigenvector
MA212 – Lecture 29 – Tuesday 30 January 2018 page 49


Create a basis ...

using v for R³
say,

v = (1, 0, 1)ᵀ,  e₂ = (0, 1, 0)ᵀ,  e₃ = (0, 0, 1)ᵀ,

obviously linearly independent
MA212 – Lecture 29 – Tuesday 30 January 2018 page 50


 
M_B = [ 1  0  0
        0  1  0
        1  0  1 ]

A M_B = [ 2   2  −1      [ 1  0  0      [ 1   2  −1
         −1  −1   1   ·    0  1  0   =    0  −1   1
         −1  −2   2 ]      1  0  1 ]      1  −2   2 ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 51


Easy inversion:

 
M_B = [ 1  0  0
        0  1  0
        1  0  1 ]

(you get this by adding row 1 to row 3). Then

M_B⁻¹ = [  1  0  0
           0  1  0
          −1  0  1 ]

(So, you get this by the opposite inverse action, i.e., subtract row
1 from row 3.)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 52


End result

  
M_B⁻¹ A M_B = [  1  0  0      [ 1   2  −1
                 0  1  0   ·    0  −1   1
                −1  0  1 ]      1  −2   2 ]

= [ 1   2  −1
    0  −1   1
    0  −4   3 ]   ( = A′ )
MA212 – Lecture 29 – Tuesday 30 January 2018 page 53


Consider
A₁ = [ −1  1 ; −4  3 ]   ( = Q )

p_{A₁}(x) = (x + 1)(x − 3) + 4 = x² − 2x + 1 = (x − 1)²

A₁v = v ⇒ v = (1, 2)ᵀ

MA212 – Lecture 29 – Tuesday 30 January 2018 page 54


Extend to basis of R2

 
[ 1  0 ; 2  1 ]

(like M_B but working in R², not R³)

with inverse (swop/re-sign/divide!)

[ 1  0 ; −2  1 ]

(again like M_B⁻¹ but working in R², not R³)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 55


   
A₁ [ 1  0 ; 2  1 ] = [ 1  1 ; 2  3 ]   (like A M_B but working in R²)

[ 1  0 ; −2  1 ] [ 1  1 ; 2  3 ] = [ 1  1 ; 0  1 ]   ( = M_B⁻¹ A₁ M_B )

[ 1   0  0      [ 1   2  −1      [ 1  0  0
  0   1  0   ·    0  −1   1   ·    0  1  0     ( = N⁻¹A′N )
  0  −2  1 ]      0  −4   3 ]      0  2  1 ]

= [ 1   0  0      [ 1  0  −1
    0   1  0   ·    0  1   1
    0  −2  1 ]      0  2   3 ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 56
So finally

  
1 0 0 1 0 −1
  
=
0 1 0 0 1 1 
 

0 −2 1 0 2 3
 
1 0 −1
 
=
0 1 1
0 0 1

MA212 – Lecture 29 – Tuesday 30 January 2018 page 57
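A numpy sketch (not in the slides) verifying this triangularization by composing the two base changes:

```python
# Sketch verifying the two base changes: with Q = M_B N, the product
# Q^{-1} A Q is the upper triangular matrix reached above.
import numpy as np

A = np.array([[2, 2, -1], [-1, -1, 1], [-1, -2, 2]])
M_B = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])
N = np.array([[1, 0, 0], [0, 1, 0], [0, 2, 1]])

Q = M_B @ N
T = np.linalg.solve(Q, A @ Q)                  # Q^{-1} A Q
print(np.allclose(T, [[1, 0, -1], [0, 1, 1], [0, 0, 1]]))   # True
```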


N.B. (Nota bene!)

For A sized n × n suppose det(A − λI) = 0


Then
(A − λI)x = 0

has a redundant equation


... because A − λI has rank strictly smaller than n and so at
least one row is a linear combination of the others.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 58


MA212 Further Mathematical Methods Lecture LA 10

Lecture 30: Block-diagonal Forms

Shearing and Jordan blocks


Cayley-Hamilton Theorem
Jordan canonical form (see Moodle for a ‘theory’ Appendix)
Examples
Example of a non-diagonalizable matrix

The following matrix is not diagonalizable:


 
1 1
A=  .
0 1

Reference to the characteristic polynomial shows its


eigenvalues to be the diagonal entries. Were it to be
diagonalizable, there would be a non-singular P such that

P⁻¹AP = diag{1, 1} = [ 1  0 ] = I
                     [ 0  1 ]

MA212 – Lecture 30 – Friday 2 February 2018 page 2


Then

   
[ 1  1 ]                        [ 1  0 ]
[ 0  1 ]  = A = P I P⁻¹ = I =   [ 0  1 ] ,

oops, a contradiction.

MA212 – Lecture 30 – Friday 2 February 2018 page 3


Here A = I+ a shearing action:

        
[ 1  1 ] [ x₁ ]   [ x₁ + x₂ ]     [ x₁ ]   [ x₂ ]
[ 0  1 ] [ x₂ ] = [   x₂    ] = I [ x₂ ] + [  0 ].

Here

[ x₂ ]   [ 0  1 ] [ x₁ ]
[  0 ] = [ 0  0 ] [ x₂ ]

and

S = [ 0  1 ]
    [ 0  0 ]

has a shearing action.

MA212 – Lecture 30 – Friday 2 February 2018 page 4


Geometric interpretation of the transformation A.

[Figure: the vector x and its image Ax in the (x₁, x₂)-plane; Ax = Ix + Sx, the shear Sx shifting x horizontally by x₂.]

MA212 – Lecture 30 – Friday 2 February 2018 page 5


Deformation?

[Figure: the deformation under A, indicating the 'most stretched' and 'least stretched' directions.]
MA212 – Lecture 30 – Friday 2 February 2018 page 6
More generally

      
Ax = [ λ  1 ] [ x₁ ]   [ λ  0 ]       [ x₂ ]
     [ 0  λ ] [ x₂ ] = [ 0  λ ] x  +  [  0 ]  = λIx + Sx

J₂ = [ λ  1 ]
     [ 0  λ ]   is called a 2 × 2 Jordan block.

MA212 – Lecture 30 – Friday 2 February 2018 page 7


Illustration in 3-dimensions

[Figure: x decomposed as λIx plus the shear Sx.]

MA212 – Lecture 30 – Friday 2 February 2018 page 8


In 3-dimensions

A 3 × 3 Jordan block is given by

J₃ = [ λ  1  0 ]
     [ 0  λ  1 ]
     [ 0  0  λ ]

and

[ λ  1  0 ]       [ λx₁ + x₂ ]          [ x₂ ]   [ 0  ]
[ 0  λ  1 ] x  =  [ λx₂ + x₃ ]  = λIx + [ 0  ] + [ x₃ ]
[ 0  0  λ ]       [   λx₃    ]          [ 0  ]   [ 0  ]

includes a shear along the x₁-axis plus a shear along the x₂-axis.
MA212 – Lecture 30 – Friday 2 February 2018 page 9
In 3-D

[Figure: in 3-D, x mapped to Ix plus shears along the x₁- and x₂-axes.]

MA212 – Lecture 30 – Friday 2 February 2018 page 10


In the last example one can have fewer shearing actions:

[ λ  1  0 ]
[ 0  λ  0 ]   ←  shearing absent in the second row
[ 0  0  λ ]

[ λ  0  0 ]   ←  shearing absent in the first row
[ 0  λ  1 ]
[ 0  0  λ ]

[ λ  0  0 ]
[ 0  λ  0 ]   ←  shearing absent in both rows
[ 0  0  λ ]
MA212 – Lecture 30 – Friday 2 February 2018 page 11
Decomposition

We can decompose the first matrix as a 2 × 2 block J₂ plus a 1 × 1 block J₁, with zero bordering:

[ λ  1  0 ]   [ J₂  0  ]   [ λ  1  0 ]
[ 0  λ  0 ] = [ 0   J₁ ] = [ 0  λ  0 ]
[ 0  0  λ ]                [ 0  0  λ ]

MA212 – Lecture 30 – Friday 2 February 2018 page 12


Cont’d

We can decompose the second matrix as a 1 × 1 block J₁ plus a 2 × 2 block J₂, with zero bordering:

[ λ  0  0 ]   [ J₁  0  ]   [ λ  0  0 ]
[ 0  λ  1 ] = [ 0   J₂ ] = [ 0  λ  1 ]
[ 0  0  λ ]                [ 0  0  λ ]

MA212 – Lecture 30 – Friday 2 February 2018 page 13


Theorem on the Jordan canonical form

Theorem. Every n×n matrix is similar to a Jordan block form,


with Jk -blocks down the diagonal with zeros elsewhere.
For example:

[ λ₁  1                      ]    [ J₂          ]
[ 0   λ₁                     ]    [    J₃       ]
[        λ₂  1   0           ]  = [       J₁    ]
[        0   λ₂  1           ]    [          ⋱  ]
[        0   0   λ₂          ]
[                    λ₃      ]
[                        ⋱   ]

Each eigenvalue occurs according to its algebraic multiplicity.


*Exotica: Shearing and differentiation

In C∞[0, 1], the space of functions f(t) having derivatives of all orders, take

Af := (d/dt) f,   i.e. f′.

An eigenvector here is a function f such that, for some scalar λ,

f′ = Af = λf :

f(t) = e^{λt} works, as (d/dt) e^{λt} = λ e^{λt}.

* For background reading only


*Cont’d

Now take e₁(t) := e^{λt} and e₂(t) := t e^{λt}; then

(d/dt) t e^{λt} = e^{λt} + λ t e^{λt} :   Ae₂ = λe₂ + e₁.

The second term e₁ here acts as a shear on e₂.

* For background reading only


Recall, something that we need in the next slide, about partitioning matrices:

Ax = [a₁, a₂, ..., aₙ] (x₁, x₂, ..., xₙ)ᵀ = x₁a₁ + x₂a₂ + ... + xₙaₙ

MA212 – Lecture 30 – Friday 3 February 2017 page 17


Likewise, ...

more generally:

A[x, y, ...] = [a₁, a₂, ..., aₙ] [ x₁  y₁  ... ]
                                 [ x₂  y₂  ... ]
                                 [ ⋮   ⋮       ]
                                 [ xₙ  yₙ  ... ]

= [x₁a₁ + x₂a₂ + ... + xₙaₙ ,  y₁a₁ + y₂a₂ + ... + yₙaₙ ,  ...]

MA212 – Lecture 30 – Friday 3 February 2017 page 18


Another view of J = Jm

 
    [ λ  1            ]                 Je₁ = λe₁
    [    λ  1         ]                 Je₂ = e₁ + λe₂
J = [       ⋱  ⋱      ]       =⇒        ...
    [          λ  1   ]                 Je_{m−1} = e_{m−2} + λe_{m−1}
    [             λ   ]  (m×m)          Je_m = e_{m−1} + λe_m

MA212 – Lecture 30 – Friday 3 February 2017 page 19


An important tool: the Cayley-Hamilton Theorem*

Theorem (Cayley-Hamilton Theorem). If A is n × n and


the characteristic polynomial factorizes as:

pA (x) := det(xI − A) = (x − λ1 )(x − λ2 )...(x − λn ),

then
(A − λ1 I)(A − λ2 I)...(A − λn I) = O.

This is remembered as “ pA (A) = O ”.

*=Statement only needed; no need to memorize the proof.


Fake proof

pA (A) = det(xI − A)|substitute x=A = det(AI − A) = det(O) = 0.

However, x ranges over scalars not over matrices, so this won’t


do!!

*=Statement only needed; no need to memorize the proof.


Example

 

A = [ 0  -1 ]              | x    1  |
    [ 1  -2 ]  =⇒ p_A(x) = | -1  x+2 | = x² + 2x + 1.

MA212 – Lecture 30 – Friday 3 February 2017 page 22


Cont’d

      
2A = [ 0  -2 ],    A² = [ 0  -1 ] [ 0  -1 ]   [ -1  2 ]
     [ 2  -4 ]          [ 1  -2 ] [ 1  -2 ] = [ -2  3 ]

A² + 2A = [ -1   0 ] = −I.
          [  0  -1 ]

Yesss! A² + 2A + I = O.

MA212 – Lecture 30 – Friday 3 February 2017 page 23
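
A one-line numerical confirmation of the Cayley-Hamilton identity for this A (a sketch of mine, not from the slides, assuming NumPy):

import numpy as np

A = np.array([[0, -1], [1, -2]])
print(A @ A + 2 * A + np.eye(2))   # p_A(A) = A^2 + 2A + I: expect the zero matrix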


Proper proof: Stage 1

Find P non-singular and U upper triangular such that

U = P −1 AP.

Write (with × denoting arbitrary entry):


 
     [ λ₁  ×   ×   ×  ...  × ]
     [     λ₂  ×   ×  ...  × ]
U := [         λ₃  ×  ...  × ]
     [             ⋱  ...    ]
     [                ...  × ]
     [                    λₙ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 24


Then

 
          [ λ₁−λⱼ   ×       ×     ...   ×     ]
          [        λ₂−λⱼ    ×     ...   ×     ]
U − λⱼI = [               ⋱        ...   ×     ]
          [            0 = λⱼ−λⱼ   ...         ]
          [                         ⋱    ×     ]
          [                           λₙ−λⱼ    ]

In view of the ‘0’ on the diagonal

(U − λj I) maps Lin{e1 , ..., ej } → Lin{e1 , ..., ej−1 }.

MA212 – Lecture 30 – Friday 3 February 2017 page 25


Check:

(U − λj I)e1 = (λ1 − λj )e1 ,


(U − λj I)e2 = ×e1 + (λ2 − λj )e2
...
(U − λj I)ej = ×e1 + ... + ×ej−1

All of these are in Lin{e1 , ..., ej−1 } .

MA212 – Lecture 30 – Friday 3 February 2017 page 26


This implies that...

(U − λn I)Cn = (U − λn I)Lin{e1 , ..., en } ⊆ Lin{e1 , ..., en−1 }

So

(U − λn−1 I)(U − λn I)Cn = (U − λn−1 I)Lin{e1 , ..., en−1 }

⊆ Lin{e1 , ..., en−2 }

and

(U −λn−2 I)(U −λn−1 I)(U −λn I)Cn = (U −λn−2 I)Lin{e1 , ..., en−2 }

⊆ Lin{e1 , ..., en−3 }

MA212 – Lecture 30 – Friday 3 February 2017 page 27


Continuing all the way:

(U − λ₁I)(U − λ₂I)(U − λ₃I)···(U − λₙI)Cⁿ = Lin{0} = {0}.

i.e. LHS maps all vectors to 0 :

(U − λ1 I)(U − λ2 I)...(U − λn−1 I)(U − λn I) = O.

MA212 – Lecture 30 – Friday 3 February 2017 page 28


Proper proof: Stage 2

By applying P on the left and P −1 on the right we deduce from


the above that

(A − λ1 I)(A − λ2 I)...(A − λn I) = O

Substituting U = P −1 AP and I = P −1 P ,

O = (U − λ₁I)(U − λ₂I)...(U − λₙI)

  = (P⁻¹AP − λ₁P⁻¹P)(P⁻¹AP − λ₂P⁻¹P)...(P⁻¹AP − λₙP⁻¹P).

MA212 – Lecture 30 – Friday 3 February 2017 page 29


Cont’d

Using (P −1 AP − λ1 P −1 P ) = P −1 (A − λ1 I)P (i.e. ‘pulling


factors out’ left and right):

O = P −1 (A − λ1 I)P P −1 (A − λ2 I)P...P −1 (A − λn I)P


= P −1 (A − λ1 I)(A − λ2 I)...(A − λn I)P,

after cancellation.
Now pre- and post-multiply resp. by P and P −1

O = P OP −1 = P P −1 (A − λ1 I)(A − λ2 I)...(A − λn I)P P −1

Finally
O = (A − λ1 I)(A − λ2 I)...(A − λn I).

MA212 – Lecture 30 – Friday 3 February 2017 page 30


How to find the Jordan Normal Form

We start by finding the characteristic polynomial, the


eigenvalues and corresponding eigenvectors just as for
diagonalization.
Let’s look at the 2 × 2 and 3 × 3 cases.

MA212 – Lecture 30 – Friday 3 February 2017 page 31


2 × 2 case: We assume A is not diagonalizable.

Suppose A ≠ λI₂ₓ₂. Then:


Regarding A − λI, 3 possibilities follow.
i) dim N (A − λI) = 2.
ii) dim N (A − λI) = 1.
iii) dim N(A − λI) = 0? Nope! Silly: of course ∃v ≠ 0 with Av = λv, because det(A − λI) = 0.

MA212 – Lecture 30 – Friday 3 February 2017 page 32


Working in R², if dim N(A − λI) = 2, then N(A − λI) = R², and then

(A − λI)u = 0  ∀u ∈ R²  ⇒  (A − λI) ≡ O  ⇒  A = λI,

but we assumed A was not diagonal.

MA212 – Lecture 30 – Friday 3 February 2017 page 33


So, after all, dim N(A − λI) = 1.

So with one dimension left:

Pick u ∈ R² \ N(A − λI) and write v₂ = u.
So v₂ witnesses (A − λI) ≠ O.
Put

v := (A − λI)u ≠ 0,  and so note Au = λu + v,

and write v₁ = v.

MA212 – Lecture 30 – Friday 3 February 2017 page 34


By the Cayley-Hamilton Theorem

(A − λI)v = (A − λI)(A − λI)u = Ou = 0.

So Av = λv, and we get a stretch and a stretch+shear:

Av = λv          i.e.  Av₁ = λv₁
Au = λu + v            Av₂ = λv₂ + v₁

It needs proof that {v₁, v₂} is a linearly independent set. And it is so.

MA212 – Lecture 30 – Friday 3 February 2017 page 35


Take

P = [v1 , v2 ]

Then

AP = A[v₁, v₂] = [Av₁, Av₂] = [λv₁, λv₂ + v₁]

   = [v₁, v₂] [ λ  1 ]
              [ 0  λ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 36


So

 
P⁻¹AP = [ λ  1 ]
        [ 0  λ ].

So if A is not diagonalizable, then it is similar to

[ λ  1 ]
[ 0  λ ].

MA212 – Lecture 30 – Friday 3 February 2017 page 37


Example:

Recall that

A = [ 0  -1 ]   =⇒   p_A(x) = det(xI − A) = x² + 2x + 1 = (x + 1)².
    [ 1  -2 ]

So λ = −1 twice.

MA212 – Lecture 30 – Friday 3 February 2017 page 38


   
A − λI = A + I = [ 1  -1 ]        R(A − λI) = Lin{ (1, 1)ᵀ }
                 [ 1  -1 ]

Find u ∉ N(A − λI) = N(A + I), i.e. u with (A + I)u ≠ 0, i.e. u = (u₁, u₂)ᵀ such that

u₁ (1, 1)ᵀ + u₂ (−1, −1)ᵀ ≠ 0.

MA212 – Lecture 30 – Friday 3 February 2017 page 39


Easy:

take v₂ = u = (1, 0)ᵀ = e₁. Now take

v₁ = (A + I)v₂ = (A + I)u = (1, 1)ᵀ.

Take

P = [v₁, v₂] = [ 1  1 ]   and so   P⁻¹ = (1/det P) [  0  -1 ]
               [ 1  0 ]                            [ -1   1 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 40


Then..

... as det P = −1,

J = P⁻¹AP = [ 0   1 ] [ 0  -1 ] [ 1  1 ]
            [ 1  -1 ] [ 1  -2 ] [ 1  0 ]

          = [ 0   1 ] [ -1  0 ]   [ -1   1 ]
            [ 1  -1 ] [ -1  1 ] = [  0  -1 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 41
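
The same computation can be checked numerically; a minimal sketch (my addition, assuming NumPy), using the P = [v₁, v₂] found above:

import numpy as np

A = np.array([[0, -1], [1, -2]])
P = np.array([[1, 1], [1, 0]])          # columns v1 = (1,1)^T, v2 = e1
J = np.linalg.inv(P) @ A @ P
print(np.round(J, 10))                  # expect the Jordan block [[-1, 1], [0, -1]]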


3 × 3 case:

working in R3 ....
Main focus here is on λ repeated three times: p_A(x) = (x − λ)³.
NB The case p_A(x) = (x − λ)²(x − µ) reduces to

[ λ  1  0 ]        [ λ  0  0 ]
[ 0  λ  0 ]   or   [ 0  λ  0 ].
[ 0  0  µ ]        [ 0  0  µ ]

In the former case choose v1 with Av1 = λv1 and find a


solution for v2 to the equation (A − λI)v2 = v1 . In the latter
case choose independent v1 , v2 with Avi = λvi for i = 1, 2 .

In both cases choose v3 with Av3 = µv3 .

MA212 – Lecture 30 – Friday 3 February 2017 page 42


By the Cayley-Hamilton Theorem...

pA (x) = (x − λ)3 =⇒ (A − λI)3 = O.

The situation falls into several cases.

MA212 – Lecture 30 – Friday 3 February 2017 page 43


Case 1. ‘Bottom up’: v3 , then v2 , then v1

Case: (A − λI)² ≠ O

Pick v₃ = u with u ∉ N((A − λI)²), i.e.

(A − λI)²u ≠ 0, so v₃ witnesses the non-zero.

Then

v₂ := (A − λI)v₃ ≠ 0, as otherwise (A − λI)²v₃ = 0.

MA212 – Lecture 30 – Friday 3 February 2017 page 44


Finally take...

v₁ := (A − λI)v₂ = (A − λI)²v₃ ≠ 0.

Then

(A − λI)v₁ = (A − λI)²v₂ = (A − λI)³v₃ = 0,

the latter by the Cayley-Hamilton Theorem.

MA212 – Lecture 30 – Friday 3 February 2017 page 45


Cascade

So we now have the cascade:

Av1 = λv1
Av2 = λv2 + v1
Av3 = λv3 + v2

Needs proof that {v1 , v2 , v3 } is a linearly independent set.


And it is so.

MA212 – Lecture 30 – Friday 3 February 2017 page 46


Again take P = [v1 , v2 , v3 ]

Then

AP = A[v₁, v₂, v₃] = [Av₁, Av₂, Av₃]

   = [λv₁, v₁ + λv₂, v₂ + λv₃]

                  [ λ  1  0 ]
   = [v₁, v₂, v₃] [ 0  λ  1 ]
                  [ 0  0  λ ]

Then

P⁻¹AP = [ λ  1  0 ]
        [ 0  λ  1 ].
        [ 0  0  λ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 47


Case 2. ‘Bottom up’: witness v3 , then v2 ; separately v1

(A − λI)2 = O

Possibilities:
i) dim N(A − λI) = 3 =⇒ A = λI, as before (see below)
ii) dim N(A − λI) = 2... see below
iii) dim N(A − λI) = 1... impossible
iv) dim N(A − λI) = 0... impossible, as ∃v ≠ 0 with Av = λv.

MA212 – Lecture 30 – Friday 3 February 2017 page 48


Cases (i) and (iii)

i) dim N (A − λI) = 3 implies that N (A − λI) = R3 , i.e.

(A − λI)u = 0 ∀u : Au = λu ∀u : A = λI.

iii) dim N (A − λI) = 1... then rk (A − λI) = 3 − 1 = 2. But as


(A − λI)2 = O

(A − λI)[(A − λI)u]= 0 ∀u
(A − λI)v= 0 ∀v ∈ R(A − λI)

So
R(A − λI) ⊆ N (A − λI) =⇒ dim N (A − λI) ≥ 2,

a contradiction.

MA212 – Lecture 30 – Friday 3 February 2017 page 49


Case (ii): dim N (A − λI) = 2

As dim N(A − λI) = 2 ≠ 3, pick u = v₃ with

v₂ := (A − λI)u ≠ 0 :   Av₃ = v₂ + λv₃.

Here

(A − λI)v₂ = (A − λI)²u = 0 :   Av₂ = λv₂,

i.e. v₂ ∈ N(A − λI), which has dimension 2, so then...

pick v₁ ∈ N(A − λI) independent of v₂.

MA212 – Lecture 30 – Friday 3 February 2017 page 50


Then Av₁ = λv₁, so take P = [v₁, v₂, v₃]:

AP = A[v₁, v₂, v₃] = [Av₁, Av₂, Av₃]

   = [λv₁, λv₂, v₂ + λv₃]

                  [ λ  0  0 ]
   = [v₁, v₂, v₃] [ 0  λ  1 ]
                  [ 0  0  λ ]

P⁻¹AP = [ λ   0  ]   [ λ  0  0 ]
        [ 0   J₂ ] = [ 0  λ  1 ]
                     [ 0  0  λ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 51


Example

 
A = [  1  2  1 ]
    [  0  2  0 ]
    [ -1  2  3 ]

             | x−1  −2   −1  |
p_A(x) = det |  0   x−2   0  |        (expand by middle row)
             |  1   −2   x−3 |

= (x − 2){(x − 1)(x − 3) + 1} = (x − 2)(x² − 4x + 4)

= (x − 2)(x − 2)² = (x − 2)³.

MA212 – Lecture 30 – Friday 3 February 2017 page 52


 
A − λI = A − 2I = [ -1  2  1 ]
                  [  0  0  0 ]
                  [ -1  2  1 ]

Here

(A − 2I)² = O.

MA212 – Lecture 30 – Friday 3 February 2017 page 53


Pick u to witness (A − 2I)u ≠ 0

Pick u = v₃ = e₁, with

v₂ := (A − 2I)e₁ ∈ Lin{ (1, 0, 1)ᵀ },  v₂ ≠ 0 :   Av₃ = v₂ + λv₃

MA212 – Lecture 30 – Friday 3 February 2017 page 54


 
v₂ = (A − 2I)e₁ = (−1, 0, −1)ᵀ

Now we cannot take for v₁ the image (A − 2I)v₂, as

(A − 2I)v₂ = [ -1  2  1 ] [ -1 ]   [ 0 ]
             [  0  0  0 ] [  0 ] = [ 0 ]
             [ -1  2  1 ] [ -1 ]   [ 0 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 55


So v2 ∈ N (A − 2I), so we look for an alternative eigenvector
independent of v2 in N (A − λI).
First we find N (A − 2I) by solving

(A − 2I)x = 0

i.e.

[ -1  2  1 ] [ x₁ ]   [ 0 ]
[  0  0  0 ] [ x₂ ] = [ 0 ]
[ -1  2  1 ] [ x₃ ]   [ 0 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 56


The only non-redundant equation is:

−x1 + 2x2 + x3 = 0.

So

x = [ 2x₂ + x₃ ]   [ 2x₂ ]   [ x₃ ]
    [    x₂    ] = [  x₂ ] + [  0 ]
    [    x₃    ]   [  0  ]   [ x₃ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 57


So

N(A − 2I) = Lin{ (2, 1, 0)ᵀ, (1, 0, 1)ᵀ }

Here the obvious choice is the first of the two (not the second):

v₁ := (2, 1, 0)ᵀ

MA212 – Lecture 30 – Friday 3 February 2017 page 58


So take

 
P = [v₁, v₂, v₃] = [ 2  -1  1 ]
                   [ 1   0  0 ]
                   [ 0  -1  0 ]

Then

P⁻¹AP = [ 2  0  0 ]
        [ 0  2  1 ]
        [ 0  0  2 ]

Just one shear here.

MA212 – Lecture 30 – Friday 3 February 2017 page 59
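
Again a quick machine check of this example (my sketch, not part of the slides, assuming NumPy); P is the matrix [v₁, v₂, v₃] just constructed:

import numpy as np

A = np.array([[1, 2, 1], [0, 2, 0], [-1, 2, 3]])
P = np.array([[2, -1, 1], [1, 0, 0], [0, -1, 0]])
print(np.round(np.linalg.inv(P) @ A @ P, 10))   # expect [[2,0,0],[0,2,1],[0,0,2]]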


MA212 Further Mathematical Methods Lecture LA 11

Lecture 31: Solving differential equations


using Jordan blocks

Reduction to first order systems


Diagonalizable case
Exponential Matrix Method (use of etA )
Background Readings

Anthony and Harvey


Chap. 9 (§9.2,9.3)

Briefly also in :
Adam Ostaszewski
§6.5

MA212 – Lecture 31 – Tuesday 6 February 2018 page 2


Recap: ... what we did and saw:

Bottom-up method:
λ eigenvalue: Algebraic multiplicity m = mA .
Geometric multiplicity mG ... with mG ≤ mA .
i.e. we can get only mG lin. indep. eigenvectors to value λ .
For each of these eigenvectors we arrange a stretch-and-shear
scheme:

Av1 = λv1
Av2 = v1 + λv2
...
Avk = vk−1 + λvk

MA212 – Lecture 31 – Tuesday 6 February 2018 page 3


The sequence of operations is this.

Find k with k ≤ m such that

(A − λI)k = O
(A − λI)k−1 6= O

Pick vk witnessing the above:

(A − λI)k−1 vk 6= 0,

continue recursively (backwards!)...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 4


That is

vk−1 = (A − λI)vk
vk−2 = (A − λI)vk−1
...
v1 = (A − λI)v2 = ... = (A − λI)k−1 vk 6= 0

The last equation:

v1 = (A − λI)v2 = ... = (A − λI)k−1 vk

yields:
(A − λI)v1 = (A − λI)k vk = 0.

So Av1 = λv1 and v1 is an eigenvector.


MA212 – Lecture 31 – Tuesday 6 February 2018 page 5
We freely make use of the theorem that:

Every square matrix Aₙₓₙ is similar to a Jordan Normal Form

P⁻¹AP = J

    [ Block A₁                 ]
    [    Block A₂              ]
J = [       Block A₃           ]
    [            ⋱             ]
    [              Block A_m   ]

where each block has the Jordan form, i.e.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 6


The form:

 
      [ λ  1       ]
      [    λ  ⋱    ]
Aᵢ := [       ⋱  1 ]
      [          λ ]

Thus λ is repeated all the way down the diagonal, and 1 is repeated all the way down the (first) super-diagonal.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 7


First order constant coefficient differential equations

These take the form


ẋ = Ax + b

where A is an n × n matrix of constants (the constant


coefficients) and so...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 8


...Here

   
    [ ẋ₁ ]                                         [ b₁(t) ]
ẋ = [ ẋ₂ ]   where ẋᵢ = (d/dt) xᵢ(t),   and   b =  [ b₂(t) ]
    [ ⋮  ]                                         [   ⋮   ]
    [ ẋₙ ]                                         [ bₙ(t) ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 9


Example: Newton’s Equation of Motion:

This equates acceleration and force:

ẍ(t) = f (t).

We transform this into first-order, as follows. Take

x1 = x, and x2 = ẋ

then we get
ẋ1 = x2 and ẍ(t) = ẋ2 = f (t)

MA212 – Lecture 31 – Tuesday 6 February 2018 page 10


... and so

      
[ ẋ₁ ]   [ 0  1 ] [ x₁ ]   [  0   ]
[ ẋ₂ ] = [ 0  0 ] [ x₂ ] + [ f(t) ]

ẋ = Ax + b,

for

A = A_Newton := [ 0  1 ]
                [ 0  0 ]

and

b = b_Newton := (0, f(t))ᵀ

MA212 – Lecture 31 – Tuesday 6 February 2018 page 11


Reduction to an uncoupled system?

Yes, if A is diagonalizable by P, then

P⁻¹AP = D = [ λ₁          ]
            [    ⋱        ]
            [       λₙ    ]

Put x = Py; then ẋ = Pẏ, as P is constant.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 12


Then

ẋ = Ax + b  ⇒  Pẏ = APy + b  ⇒  ẏ = P⁻¹APy + c  with c = P⁻¹b,

i.e.

ẏ = Dy + c,

yielding a sequence of independent equations

ẏᵢ = λᵢyᵢ + cᵢ(t),   i.e.   ẏᵢ − λᵢyᵢ = cᵢ(t).

MA212 – Lecture 31 – Tuesday 6 February 2018 page 13


Solution:

The integrating factor (IF) here is e^{−λᵢt}.

Multiplying each side by the IF gives

ẏᵢe^{−λᵢt} − λᵢyᵢe^{−λᵢt} = cᵢ(t)e^{−λᵢt}.

So

(d/dt){ yᵢe^{−λᵢt} } = ẏᵢe^{−λᵢt} − λᵢyᵢe^{−λᵢt} = cᵢ(t)e^{−λᵢt}

yᵢe^{−λᵢt} = ∫ cᵢ(t)e^{−λᵢt} dt

MA212 – Lecture 31 – Tuesday 6 February 2018 page 14


Example

 
ẋ = [ 1  -1 ] x
    [ 4   1 ]

Here

p_A(x) = | x−1    1  |
         | −4    x−1 |  = (x − 1)² + 4,

so λ = 1 ± 2i.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 15


Finding P (‘the modal matrix’)

For λ = 1 + 2i we solve (A − λI)x = 0:

[ -2i  -1  ] [ x₁ ]   [ 0 ]
[  4   -2i ] [ x₂ ] = [ 0 ]

The second row is a multiple of the first (by 2i).

Solve one of them, e.g.

2ix₁ + x₂ = 0 :   v₁ = (1, −2i)ᵀ.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 16


Freebie

Apply conjugation to get the eigenvector to value λ̄ = 1 − 2i:

v₂ = (1, +2i)ᵀ,

as

v₁ = (1, −2i)ᵀ.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 17


Now for P :

 
P = [v₁, v₂] = [  1    1  ]
               [ -2i  +2i ]

we get

P⁻¹AP = [ 1+2i    0   ]
        [  0    1−2i  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 18


Taking x = P y

we solve

ẏ = Dy = [ 1+2i    0   ] y.
         [  0    1−2i  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 19


Cont’d

Taking α = y1 (0) and β = y2 (0) gives

y1 = αe(1+2i)t = αet (cos 2t + i sin 2t)


y2 = βe(1−2i)t = βet (cos 2t − i sin 2t)

MA212 – Lecture 31 – Tuesday 6 February 2018 page 20


Back to x

    
x = [ x₁ ] = Py = [  1    1  ] [ αe^{(1+2i)t} ]
    [ x₂ ]        [ -2i  +2i ] [ βe^{(1−2i)t} ]

  = [ αeᵗ(cos 2t + i sin 2t) + βeᵗ(cos 2t − i sin 2t)          ]
    [ −2iαeᵗ(cos 2t + i sin 2t) + 2iβeᵗ(cos 2t − i sin 2t)     ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 21


Equivalently:

 
t (α + β) cos 2t + i(α − β) sin 2t
x=e ,
2i(β − α) cos 2t + 2(α + β) sin 2t

and furthermore ...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 22


Its real equivalent:

We obtain

x = eᵗ [ K cos 2t + L sin 2t      ]
       [ −2L cos 2t + 2K sin 2t   ]

by taking

α + β = K ∈ R            2β = K + iL
                  =⇒
i(α − β) = L ∈ R         2α = K − iL

MA212 – Lecture 31 – Tuesday 6 February 2018 page 23
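
The real-form solution can be verified against the matrix exponential (introduced formally later in this lecture); a hedged sketch of mine, assuming NumPy and SciPy are available, with arbitrary sample values for K, L and t:

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, -1.0], [4.0, 1.0]])
K, L, t = 1.0, 0.5, 0.7                  # arbitrary constants and time
x0 = np.array([K, -2 * L])               # the formula above gives x(0) = (K, -2L)

x_expm    = expm(t * A) @ x0             # x(t) = e^{tA} x(0)
x_formula = np.exp(t) * np.array([K*np.cos(2*t) + L*np.sin(2*t),
                                  -2*L*np.cos(2*t) + 2*K*np.sin(2*t)])
print(np.allclose(x_expm, x_formula))    # expect True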


What if A is not diagonalizable?

The example with


 
λ 1
ẋ = Ax + b =  x + b
0 λ

gives

ẋ1 = x2 + λx1 + b1 (t)


ẋ2 = λx2 + b2 (t).

MA212 – Lecture 31 – Tuesday 6 February 2018 page 24


Let's take b = 0.

The bottom equation then gives

x2 = K2 eλt

and so subbing in the upper equation:

ẋ1 = K2 eλt + λx1 ẋ1 − λx1 = K2 eλt

MA212 – Lecture 31 – Tuesday 6 February 2018 page 25


Using the integration factor method,

here the IF is e^{−λt}, so

(d/dt){ x₁e^{−λt} } = K₂,   so   x₁e^{−λt} = K₂t + K₁

x₁ = (K₂t + K₁)e^{λt}.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 26


So

      
[ x₁ ]          [ K₂t + K₁ ]          [ 1  t ] [ K₁ ]
[ x₂ ] = e^{λt} [    K₂    ] = e^{λt} [ 0  1 ] [ K₂ ].

We will soon see whence the family resemblance between

[ 1  t ]        [ λ  1 ]
[ 0  1 ]  and   [ 0  λ ].

MA212 – Lecture 31 – Tuesday 6 February 2018 page 27


In preparation for an important construction we recall

that
(P −1 AP )k = P −1 Ak P.

For instance,

(P −1 AP )2 = (P −1 AP )(P −1 AP ) = P −1 A(P P −1 )AP


= P −1 A2 P,

(P −1 AP )3 = (P −1 AP )2 (P −1 AP ) = P −1 A2 (P P −1 )AP
= P −1 A3 P

MA212 – Lecture 31 – Tuesday 6 February 2018 page 28


The exponential of a matrix

Recall from Taylor’s Theorem that

t 1 2 1 3
e := 1 + t + t + t + ...
2! 3!
and the series converges for every t.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 29


For A a square matrix

we define:

eᴬ = I + A + (1/2!)A² + (1/3!)A³ + ...

We read the right-hand side as a limiting sum of n × n matrices.

More precisely, for each i, j we read this as summing the (i, j)-entry of each summand:

(I)ᵢⱼ + (A)ᵢⱼ + (1/2!)(A²)ᵢⱼ + (1/3!)(A³)ᵢⱼ + ...;

one needs to verify that for a fixed A this sum converges; we explain why so later.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 30
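
One way to see the convergence in practice is to compare partial sums of the series with a library implementation; a small sketch of mine, assuming NumPy and SciPy (scipy.linalg.expm computes the matrix exponential):

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [0.0, 3.0]])
S, term = np.eye(2), np.eye(2)
for k in range(1, 30):          # partial sums of I + A + A^2/2! + ...
    term = term @ A / k         # term is now A^k / k!
    S += term
print(np.allclose(S, expm(A)))  # expect True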


So eA is an n × n matrix.

To get a feel for what this is, consider a diagonalizable A. Say

P⁻¹AP = D = [ λ₁           ]
            [    λ₂        ]
            [       ⋱      ]
            [          λₙ  ].

MA212 – Lecture 31 – Tuesday 6 February 2018 page 31


Then

 
P⁻¹AᵏP = (P⁻¹AP)ᵏ = Dᵏ = [ λ₁ᵏ           ]
                         [    λ₂ᵏ        ]
                         [       ⋱       ]
                         [          λₙᵏ  ]

So

P⁻¹eᴬP = P⁻¹P + P⁻¹AP + (1/2!)P⁻¹A²P + (1/3!)P⁻¹A³P + ...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 32


So P −1 eA P =...

P⁻¹eᴬP = I + D + (1/2!)D² + (1/3!)D³ + ... = e^D

  [ 1          ]   [ λ₁          ]          [ λ₁²          ]
= [   1        ] + [    λ₂       ] + (1/2!) [    λ₂²       ] + etc.
  [     ⋱      ]   [       ⋱     ]          [        ⋱     ]
  [        1   ]   [         λₙ  ]          [          λₙ² ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 33
Summing up:

[ 1 + λ₁ + (1/2!)λ₁² + ...                   ]   [ e^{λ₁}           ]
[      1 + λ₂ + (1/2!)λ₂² + ...              ] = [     e^{λ₂}       ]
[           ⋱                                ]   [        ⋱         ]
[               1 + λₙ + (1/2!)λₙ² + ...     ]   [           e^{λₙ} ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 34


In summary

For a diagonalizable square matrix A:

eᴬ = P [ e^{λ₁}           ] P⁻¹
       [     e^{λ₂}       ]
       [        ⋱         ]
       [           e^{λₙ} ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 35


For a general n × n matrix A...

We refer to its Jordan block form J. Say

P⁻¹AP = J = [ A₁           ]
            [    A₂        ]
            [       ⋱      ]
            [          Aₙ  ].

MA212 – Lecture 31 – Tuesday 6 February 2018 page 36


Powers

Of course, since Jᵏ = P⁻¹AᵏP, then as before, but with J in place of D,

P⁻¹eᴬP = I + J + (1/2!)J² + (1/3!)J³ + ... = e^J
2! 3!

MA212 – Lecture 31 – Tuesday 6 February 2018 page 37


But what is Jᵏ?

This is easy:

     [ A₁ᵏ           ]
Jᵏ = [    A₂ᵏ        ]
     [       ⋱       ]
     [          Aₙᵏ  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 38


Now we need:

The last observation reduces our needs to the computation of

[ λ  1          ] ᵏ
[    λ  1       ]
[       ⋱  1    ]
[          λ    ]   ,

which we delay. We stop to explain the usefulness of the enterprise.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 39


Why bother?

Taking P −1 AP = J, we solve

ẋ = Ax

by substituting x = P y. Then, as before,

ẏ = Jy.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 40


Claim: y(t) = e^{tJ}y(0).

e^{tJ} = I + tJ + (1/2!)t²J² + (1/3!)t³J³ + ...

(d/dt){ e^{tJ} } = J + tJ² + (1/2!)t²J³ + ...

                 = J( I + tJ + (1/2!)t²J² + ... ) = Je^{tJ}

      or         = ( I + tJ + (1/2!)t²J² + ... )J = e^{tJ}J.
1! 2!

MA212 – Lecture 31 – Tuesday 6 February 2018 page 41


So, applying the above to y(t) = etJ y(0)

ẏ(t) = (d/dt)y(t) = (d/dt){ e^{tJ}y(0) } = Je^{tJ}y(0) = Jy(t),

as before.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 42


More generally,...

by substituting x = Py into

ẋ = Ax + b

gives

ẏ = Jy + P⁻¹b = Jy + c,   i.e.   ẏ − Jy = c.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 43


Pre-multiply by e^{−tJ}:

ẏ − Jy = c  ⇒  e^{−tJ}ẏ − e^{−tJ}Jy = e^{−tJ}c

Use the integrating factor method to get

(d/dt){ e^{−tJ}y(t) } = e^{−tJ}ẏ − e^{−tJ}Jy = e^{−tJ}c

e^{−tJ}y(t) = ∫ e^{−tJ}c(t) dt.

We will soon see that this integral also has a nice formula!

MA212 – Lecture 31 – Tuesday 6 February 2018 page 44


Computation of J k for J a block: Example.

  
     [ λ  1       ] [ λ  1       ]
     [    λ  1    ] [    λ  1    ]
J² = [       ⋱  1 ] [       ⋱  1 ]
     [          λ ] [          λ ]

     [ λ²  2λ  1   0  0 ... ]
     [     λ²  2λ  1   ...  ]
   = [         λ²      ...  ]
     [             ⋱        ]
     [                 λ²   ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 45


Cont’d

 
     [ λ²  2λ  1   0 ... ] [ λ  1       ]
     [     λ²  2λ  1 ... ] [    λ  1    ]
J³ = [         λ²    ... ] [       ⋱  1 ]
     [             ⋱     ] [          λ ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 46

Cont’d

     [ λ³  3λ²  3λ   1  0 ... ]
     [     λ³   3λ²  3λ 1 ... ]
   = [          λ³        ... ]
     [               ⋱        ]
     [                    λ³  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 47


Comments

In the above, apart from the entries stipulated below, every other entry is 0:


1. The diagonal entries are all λ3
2. The first super-diagonal entries are all 3λ2
3. The second super-diagonal entries are all 3λ
4. The third super-diagonal entries are all 1

MA212 – Lecture 31 – Tuesday 6 February 2018 page 48


Surprise:

For good reason: these repeated items are exactly as the terms
in: (λ + 1)3 = λ3 + 3λ2 + 3λ + 1
Starting from its diagonal position, each row exhibits the terms:
λ3 , 3λ2 , 3λ, 1 in that order, for as long as there is space in the
row available for all the terms.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 49


Formula:

In Jᵏ the diagonal carries λᵏ, and then the exponent falls, so that the mth super-diagonal carries (binomial coefficient) × λ^{k−m}:

C(k, m) λ^{k−m},   where C(k, m) = k(k−1)...(k−m+1) / (m(m−1)...2·1).

MA212 – Lecture 31 – Tuesday 6 February 2018 page 50


Reason for this:

Ultimately, the same mechanism is at work as in Pascal’s


Triangle:
1 1
1 2 1
1 3 3 1
1 4 6 4 1

In any row each inner element ‘folds in’ (sums) the two elements
above.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 51


Exercise:

Compute

e^{tJ} = I + tJ + (t²/2!)J² + (t³/3!)J³ + ... .

MA212 – Lecture 31 – Tuesday 6 February 2018 page 52


Solution:

Consider any fixed entry (i, j) from the mth super-diagonal, remembering that only Jᵏ with k ≥ m has a non-zero mth super-diagonal.
Summing over all the relevant powers Jᵐ, J^{m+1}, ... we get

Σ_{k≥m} (tᵏ/k!) · [k(k−1)...(k−m+1)/m!] · λ^{k−m}

= (tᵐ/m!) Σ_{k≥m} t^{k−m} λ^{k−m} / (k−m)!,

as k! = k(k−1)...(k−m+1) · (k−m)! .

MA212 – Lecture 31 – Tuesday 6 February 2018 page 53


Continuing

= (tᵐ/m!) Σ_{k≥m} (tλ)^{k−m} / (k−m)!        [put ℓ = k − m]

= (tᵐ/m!) Σ_{ℓ≥0} (tλ)^ℓ / ℓ!

= (tᵐ/m!) e^{λt}

MA212 – Lecture 31 – Tuesday 6 February 2018 page 54


So these look like..

1, t, t²/2!, t³/3!, ...

the terms in the exponential series, × e^{λt}:

e^{λt}, te^{λt}, (t²/2!)e^{λt}, (t³/3!)e^{λt}, ...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 55


Examples: For 2 × 2

     
J = [ λ  1 ],   e^{tJ} = [ e^{λt}  te^{λt} ] = e^{λt} [ 1  t ]
    [ 0  λ ]             [   0     e^{λt}  ]          [ 0  1 ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 56


For 3 × 3

 
J = [ λ  1  0 ]
    [ 0  λ  1 ],
    [ 0  0  λ ]

e^{tJ} = [ e^{λt}  te^{λt}  (t²/2!)e^{λt} ]          [ 1  t  t²/2! ]
         [   0     e^{λt}     te^{λt}     ] = e^{λt} [ 0  1    t   ]
         [   0       0        e^{λt}      ]          [ 0  0    1   ]
MA212 – Lecture 31 – Tuesday 6 February 2018 page 57


In general

  

               [ λ  1          ]
               [    λ  1       ]
e^{tJ} = exp t [       ⋱  1    ]
               [          λ    ]

         [ 1  t  t²/2!  t³/3!              ]
         [    1    t    t²/2!  t³/3!       ]
= e^{λt} [         1      t    t²/2!  ...  ]
         [               ⋱   ⋱             ]
         [                          1      ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 58
Summary

To solve
ẋ = Ax

Take P −1 AP = J and solve by substituting x = P y by reducing


to:
ẏ = Jy.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 59


This has solution

y(t) = e^{tJ}y(0).

Here e^{tJ} is computed block-wise, with the factoring out of e^{λt} and using tᵏ/k! on the kth super-diagonal.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 60
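
The closed form for e^{tJ} on a single Jordan block is easy to test numerically; a minimal sketch of mine, assuming NumPy and SciPy, for a 3 × 3 block with arbitrary λ and t:

import numpy as np
from scipy.linalg import expm

lam, t = -1.0, 0.3
J = np.array([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
closed = np.exp(lam * t) * np.array([[1, t, t**2 / 2],
                                     [0, 1, t],
                                     [0, 0, 1]])
print(np.allclose(expm(t * J), closed))   # expect True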


MA212 Further Mathematical Methods Lecture LA 12

Lecture 32: Solving recurrence equations

using Jordan blocks

Dominant eigenvalue
Long-term forecasts
Recap: differential equations

For

    [ λ  1          ]
    [    λ  1       ]
J = [       ⋱  1    ]
    [          λ    ]

ẏ = Jy  =⇒  y(t) = e^{tJ}y(0),  with y(0) initial data.

ẋ = Ax,  x = Py
y = P⁻¹x
ẏ = P⁻¹APy
MA212 – Lecture 32 – Tuesday 9 February 2018 page 2
Cont’d

 
e^{tJ} = e^{λt} [ 1  t  t²/2!  ⋯ ]
                [    1    t    ⋯ ]
                [         ⋱      ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 3


Example:

 
ẋ = [ -1   1   0 ]
    [  0  -1   1 ] x
    [  0   0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 4


Solution:

x(t) = e^{tJ}x(0)

     = e^{−t} [ 1  t  t²/2 ] [ A₁ ]
              [ 0  1   t   ] [ A₂ ] ,   Aᵢ = xᵢ(0)
              [ 0  0   1   ] [ A₃ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 5


Example:

 
Solve ẋ = [ 0  -1 ] x
          [ 1  -2 ]

We saw in Lecture 30 slide 39 that for

P = M_B = [ 1  1 ]
          [ 1  0 ]

P⁻¹AP = [ -1   1 ] = J
        [  0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 6


So...

Take

x = Py
Pẏ = APy

ẏ = P⁻¹APy = [ -1   1 ] y
             [  0  -1 ]

y(t) = e^{tJ}y(0)

y(t) = e^{−t} [ 1  t ] [ A₁ ]
              [ 0  1 ] [ A₂ ]

y(0) = (A₁, A₂)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 7

so

x(t) = Py(t)

     = e^{−t} [ 1  1 ] [ 1  t ] [ A₁ ]
              [ 1  0 ] [ 0  1 ] [ A₂ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 8


Recurrence Equations: x_{t+1} = Ax_t,  t = 0, 1, ...

Our study is by cases:

Case 1: A diagonalizable
Case 2: Jordan block form
  Subcase 2.1: b ≠ c
  Subcase 2.2: b = c and a ≠ c

where x₀ = (a, b, c, ...)ᵀ.
MA212 – Lecture 32 – Tuesday 9 February 2018 page 9


Aim

We study solutions of equations like

[ x₁(t+1) ]   [  0   7   6 ] [ x₁(t) ]
[ x₂(t+1) ] = [ 1/4  0   0 ] [ x₂(t) ]
[ x₃(t+1) ]   [  0  1/2  0 ] [ x₃(t) ]

An example concerned with population growth will eventually follow.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 10


Recurrence Equations:

   
For x = (x₁, ..., xₙ)ᵀ,  x_k = (x₁(k), ..., xₙ(k))ᵀ,  with k = 0, 1, 2, ⋯, consider

x_{k+1} = Aₙₓₙ x_k,   x₀ given.

Solution:

x_k = Ax_{k−1} = A(Ax_{k−2}) = ⋯ = Aᵏx₀

MA212 – Lecture 32 – Tuesday 9 February 2018 page 11


Case 1: A diagonalizable

 
P⁻¹AP = D = [ λ₁          ]
            [    ⋱        ]
            [       λₙ    ]

Put

Py_k = x_k
Py_k = Ax_{k−1} = APy_{k−1}
y_k = P⁻¹APy_{k−1} = Dy_{k−1}

MA212 – Lecture 32 – Tuesday 9 February 2018 page 12


So

y_k = Dᵏy₀

    = [ λ₁ᵏ          ]
      [     ⋱        ] y₀          (1)
      [         λₙᵏ  ]

But Py_k = x_k,  y₀ = P⁻¹x₀.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 13


So

x_k = Py_k

    = P [ λ₁ᵏ          ]
        [     ⋱        ] P⁻¹x₀          (2)
        [         λₙᵏ  ]

more later on x_k.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 14


Provided there is one dominating eigenvalue

Let us index the eigenvalues according to their modulus. If the


largest in modulus is the only eigenvalue with that modulus, then
we say that it is a dominant or dominating eigenvalue. If this
happens, we have:

|λ1 | > |λ2 | ≥ |λ3 | ≥ · · ·

λ₁ᵏ grows fastest, i.e.

λ₂ᵏ/λ₁ᵏ = (λ₂/λ₁)ᵏ −→ 0  as k −→ ∞,

and similarly for λ₃/λ₁ etc.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 15


More about xk : as promised

x_k = PDᵏy₀

    = P [ λ₁ᵏα ]
        [ λ₂ᵏβ ]
        [  ⋮   ]
        [ λₙᵏγ ]

          [ α            ]
    = λ₁ᵏ P [ (λ₂/λ₁)ᵏ β ]
          [  ⋮           ]
          [ (λₙ/λ₁)ᵏ γ   ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 16


Cont’d

Here

y₀ = P⁻¹x₀ = (α, β, ..., γ)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 17


Some more about x_k: ∗ denotes 'small' relative to α

x_k = PDᵏy₀ = λ₁ᵏ P (α, ∗, ..., ∗)ᵀ

    = αλ₁ᵏv₁, approximately, as Pe₁ = v₁ (nb Av₁ = λ₁v₁) and

(α, ∗, ..., ∗)ᵀ ≈ α(1, 0, ..., 0)ᵀ = αe₁

MA212 – Lecture 32 – Tuesday 9 February 2018 page 18
Subdominant eigenvalue:

In the very special (= not generic) case that α = 0, the above argument breaks down.
Subdominant eigenvalue:

|λ₁| > |λ₂| > |λ₃| ≥ |λ₄| ≥ ⋯

In practical modelling usually α ≠ 0; but models need to be robust & must keep away from the ... case α = 0.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 19


An example for case 1

The following recurrence models the 3 inter-related sections of a population over (discrete) time periods t, with respective sizes x₁, x₂, x₃ of the young, middle-age and old members. The coefficient matrix reflects fertility (i.e. creation of the young) and survival from one period to the next. For example, 1/4 of the young survive, and likewise just 1/2 of the middle-aged survive.

[ x₁(t+1) ]   [  0   7   6 ] [ x₁(t) ]
[ x₂(t+1) ] = [ 1/4  0   0 ] [ x₂(t) ]
[ x₃(t+1) ]   [  0  1/2  0 ] [ x₃(t) ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 20


Analysis


                       |  x    −7   −6 |
p_A(x) = det(xI − A) = | −1/4   x    0 |
                       |  0   −1/2   x |

= x³ − (7/4)x − 3/4 = (x + 1)(x² − x − 3/4)

= (x + 1)(x − 3/2)(x + 1/2)

So λ = +3/2, −1/2, −1, and so there is a dominant eigenvalue here:

3/2 > 1 = |−1| > 1/2 = |−1/2|.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 21
To find the corresponding eigenvector v₁ solve:

[ 3/2   −7   −6  ] [ x₁ ]
[ −1/4  3/2   0  ] [ x₂ ] = 0        (3)
[  0   −1/2  3/2 ] [ x₃ ]

−x₁ + 6x₂ = 0
−x₂ + 3x₃ = 0

... omitting the first row as redundant.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 22


Cont’d

 
v₁ = (18, 3, 1)ᵀ

P = [v₁, ⋯]

P⁻¹ (x₁(0), x₂(0), x₃(0))ᵀ = (α, β, γ)ᵀ = y₀

NB: x_k = Py_k and we assume α ≠ 0.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 23


Provided α ≠ 0:

x_k ≈ (3/2)ᵏ α (18, 3, 1)ᵀ

Age class proportions 18 : 3 : 1.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 24
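
The convergence of the age-class proportions can be watched directly by iterating the recurrence; a sketch of mine, assuming NumPy, from an arbitrary starting population (any start with α ≠ 0 works):

import numpy as np

A = np.array([[0, 7, 6], [0.25, 0, 0], [0, 0.5, 0]])
x = np.array([100.0, 100.0, 100.0])    # arbitrary initial population
for _ in range(50):
    x = A @ x                          # one time period
print(x / x[2])                        # expect proportions close to 18 : 3 : 1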


Case 1 (Diagonalizable) Summary

In the dominant eigenvalue setting

x_t ≈ (λ_dominant)ᵗ × α × (eigenvector to value λ_dominant)

Comments: Perron-Frobenius Theorems


1. When A is real with non-negative ( ≥ 0 ) entries, A has a
non-negative eigenvalue which is largest in modulus:
Put ρ(A) := max{|λ| : λ is eigenvalue of A}
Then ρ(A) ≥ 0 , and ρ(A) is an eigenvalue.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 25


2. If Ak for some k = 1, 2, .. has all entries POSITIVE, then

ρ(A) > 0 and ρ(A) has multiplicity 1,

so ρ(A) is a dominant eigenvalue.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 26


Case 2: Jordan Normal Form (JNF)

Here

P⁻¹AP = J

As before, substitute Py_k = x_k:

y_k = Jᵏy₀

    [ A₁           ]
J = [    A₂        ]
    [       ⋱      ]
    [          Aₙ  ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 27


And ...

 
     [ A₁ᵏ           ]
Jᵏ = [    A₂ᵏ        ]
     [       ⋱       ]
     [          Aₙᵏ  ]

so x_k = PJᵏP⁻¹x₀

MA212 – Lecture 32 – Tuesday 9 February 2018 page 28


Example: How to compute J

 
A = [ 2  1  -2 ]
    [ 1  1  -1 ]
    [ 1  0   0 ]

         | x−2  −1    2 |
p_A(x) = | −1   x−1   1 | = (x − 1)³
         | −1    0    x |

So (A − I)³ = O.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 29


So as λ = 1

 
A − λI = A − I = [ 1  1  -2 ]
                 [ 1  0  -1 ]
                 [ 1  0  -1 ]

(A − λI)² = (A − I)² = [ 0  1  -1 ]
                       [ 0  1  -1 ]  ≠ O
                       [ 0  1  -1 ]

v₃ := e₂ = (0, 1, 0)ᵀ  "picks up" the 2nd column,

to witness (A − I)²v₃ ≠ 0.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 30


Cont’d

 
v₃ = (0, 1, 0)ᵀ  ( = e₂ )

v₂ = (A − I)v₃ = (1, 0, 0)ᵀ  ( = e₁ ) = 2nd col. "pick up" from (A − I),

as A − λI = [ 1  1  -2 ]
            [ 1  0  -1 ].
            [ 1  0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 31


Cont’d

 
v₁ = (A − I)v₂ = (1, 1, 1)ᵀ,

as A − λI = [ 1  1  -2 ]
            [ 1  0  -1 ]   and v₂ = (1, 0, 0)ᵀ ( = e₁ ).
            [ 1  0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 32


Cont’d

 
P = [v₁ v₂ v₃] = [ 1  1  0 ]
                 [ 1  0  1 ]
                 [ 1  0  0 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 33


Cont’d

 
J = P⁻¹AP = [ 1  1  0 ]
            [ 0  1  1 ]
            [ 0  0  1 ]

Recall

       [ λᵏ   kλ^{k−1}   C(k,2)λ^{k−2} ]
J_λᵏ = [ 0      λᵏ         kλ^{k−1}    ]        (4)
       [ 0      0            λᵏ        ]

(λ + 1)ᵏ = λᵏ + C(k,1)λ^{k−1} + C(k,2)λ^{k−2} + ⋯

MA212 – Lecture 32 – Tuesday 9 February 2018 page 34


Here λ = 1

so

     [ 1  k  ½k(k−1) ]
Jᵏ = [ 0  1     k    ]        (5)
     [ 0  0     1    ]

Back to the x_k:

x_k = PJᵏP⁻¹x₀

    [ 1  1  0 ] [ 1  k  ½k(k−1) ] [ 0  0   1 ] [ a ]
  = [ 1  0  1 ] [ 0  1     k    ] [ 1  0  -1 ] [ b ]        (6)
    [ 1  0  0 ] [ 0  0     1    ] [ 0  1  -1 ] [ c ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 35


Note:

    
     [ 0  0   1 ] [ a ]   [  c  ]
y₀ = [ 1  0  -1 ] [ b ] = [ a−c ]        (7)
     [ 0  1  -1 ] [ c ]   [ b−c ]

As λ₁ = λ₂ = λ₃ = 1 here.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 36
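
The formula for Jᵏ with λ = 1 can be checked against a direct matrix power; a minimal sketch of mine, assuming NumPy, using the P and J from this example:

import numpy as np
from math import comb

A = np.array([[2, 1, -2], [1, 1, -1], [1, 0, 0]])
P = np.array([[1, 1, 0], [1, 0, 1], [1, 0, 0]])

k = 6
Jk = np.array([[1, k, comb(k, 2)], [0, 1, k], [0, 0, 1]])   # J^k for the 3x3 block, lambda = 1
lhs = np.linalg.matrix_power(A, k)
rhs = P @ Jk @ np.linalg.inv(P)
print(np.allclose(lhs, rhs))   # expect True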


Let’s study ‘Long-term’ behaviour

here, of course, of

      [ 1  1  0 ]    [  c  ]
x_k = [ 1  0  1 ] Jᵏ [ a−c ]        (8)
      [ 1  0  0 ]    [ b−c ]

for 'large' k,

     [ 1  k  ½k(k−1) ]
Jᵏ = [ 0  1     k    ]        (9)
     [ 0  0     1    ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 37


Subcase 1: Influence of k 2 term

½k(k−1) = ½k²(1 − 1/k)        (10)

influences x_k ... see this by factorizing out ½k²:

          [ ∗  ∗  1 ] [  c  ]
x_k = ½k² P [ 0  ∗  ∗ ] [ a−c ]        (11)
          [ 0  0  ∗ ] [ b−c ]

          [ ½(b−c) ]
    = k² P [   ∗    ]
          [   ∗    ]

Above (and also in later slides) ∗ denotes arbitrary values.


MA212 – Lecture 32 – Tuesday 9 February 2018 page 38
Here

x_k = ½k²(b−c) P [ 1 ]
                 [ 0 ]  + ∗
                 [ 0 ]

    = ½k²(b−c) [ 1 ]
               [ 1 ]  + ∗        (12)
               [ 1 ]

Note that

v₁ = (1, 1, 1)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 39
Subcase 2: b = c and so k² has no influence

If b = c, then

[ ½(b−c) ]   [ 0 ]
[    ∗   ] = [ ∗ ]        (13)
[    ∗   ]   [ ∗ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 40


So start over: this time the k term wields influence

Recall a previous slide:

      [ 1  1  0 ]    [  c  ]
x_k = [ 1  0  1 ] Jᵏ [ a−c ]        (14)
      [ 1  0  0 ]    [ b−c ]

for 'large' k,

     [ 1  k  ½k(k−1) ]
Jᵏ = [ 0  1     k    ]        (15)
     [ 0  0     1    ]

Now factor out only k.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 41


So start over: this time the k term wields influence

Denoting an irrelevant term by □:

         [ ∗  1  □ ] [  c  ]
x_k = kP [ 0  ∗  1 ] [ a−c ]
         [ 0  0  ∗ ] [  0  ]

    = kP [ a−c ]
         [  ∗  ]  + ∗
         [  0  ]

        [ 1  1  0 ] [ 1 ]
    = k [ 1  0  1 ] [ 0 ] (a − c) + ∗
        [ 1  0  0 ] [ 0 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 42

That is

        [ 1  1  0 ] [ 1 ]
x_k = k [ 1  0  1 ] [ 0 ] (a − c) + ∗
        [ 1  0  0 ] [ 0 ]

    = k(a − c) (1, 1, 1)ᵀ + ∗

Here the eigenvector v1 is all important.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 43


Summary of Case 2: Jordan Form

x = Py  =⇒  y_{t+1} = Jy_t,  t = 0, 1, 2, ⋯  =⇒  y_t = Jᵗy₀

     [ λᵏ  kλ^{k−1}  C(k,2)λ^{k−2}  ⋯ ]
Jᵏ = [      λᵏ         kλ^{k−1}     ⋯ ]
     [                   ⋱          ⋯ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 44


Long term forecast for xt

We are given

x₀ = (a, b, c, ...)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 45


Subcase 2.1: b − c ≠ 0

x_t ≅ ½t²(b − c) × (eigenvector for the λ block)

MA212 – Lecture 32 – Tuesday 9 February 2018 page 46


Or Subcase 2.2: b = c and a − c ≠ 0

x_t ≅ t(a − c) × eigenvector

MA212 – Lecture 32 – Tuesday 9 February 2018 page 47


MA212 Further Mathematical Methods Lecture LA 13

Lecture 33: Unitary Diagonalization

& Applications

About normality
Inner products and positive definiteness
Singular values (for non-square matrices)
Background reading

Anthony and Harvey,


Chap. 13 (§13.6,13.7)

Adam Ostaszewski
Chap. 5 (§5.1-5.3, 5.5 5.6,)

MA212 – Lecture 33 – Tuesday 13 February 2018 page 2


Recap: Complex inner product

We recall the properties of a complex inner product ⟨·, ·⟩ on V:

Linearity in the first argument:

⟨αu₁ + βu₂, v⟩ = α⟨u₁, v⟩ + β⟨u₂, v⟩   for all u₁, u₂, v ∈ V, α, β ∈ C.

Hermitian property:

⟨v, u⟩ = the complex conjugate of ⟨u, v⟩,  for all u, v ∈ V.

Positivity:

⟨u, u⟩ > 0  for all u ≠ 0 in V.

In Cⁿ the standard example is u · v := u₁v̄₁ + ... + uₙv̄ₙ = uᵀv̄.
This may be equivalently written as v∗u, i.e. v̄ᵀu.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 3


Definition

An n × n complex matrix A is unitarily diagonalizable if there is a unitary matrix S and a diagonal matrix D such that

S∗AS = D,

i.e. S⁻¹AS = D with S⁻¹ = S∗.
When this happens the columns of S form an orthonormal basis of Cⁿ (as S∗S = I) and are eigenvectors of A, as AS = SD, i.e.

A[s₁, ..., sₙ] = [s₁, ..., sₙ] [ λ₁        ]
                              [    ⋱      ] = [λ₁s₁, ..., λₙsₙ].
                              [       λₙ  ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 4


Conclusion and Question

Theorem. An n × n complex matrix A is unitarily diagonal-


izable iff A has an orthonormal basis consisting of eigen-
vectors.

Question: Just which n × n complex matrices are unitarily


diagonalizable?

Answer: Those which are normal: i.e. satisfy AA∗ = A∗ A.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 5


Theorem

Theorem. An n × n complex matrix A is unitarily diagonal-


izable iff A is normal.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 6


Proof

Proof: (Left to Right) If A is unitarily diagonalisable, then there is


a unitary matrix S and a diagonal matrix D such that
S ∗ AS = D .
∴ A = SDS ∗ and A∗ = (SDS ∗ )∗ = S ∗∗ D∗ S ∗ = SD∗ S ∗
so AA∗ = (SDS ∗ )(SD∗ S ∗ ) = SDD∗ S ∗
and A∗ A = (SD∗ S ∗ )(SDS ∗ ) = SD∗ DS ∗

Our claim will be established if we show that DD∗ = D∗ D .

MA212 – Lecture 33 – Tuesday 13 February 2018 page 7


Cont’d

   

But

    [ λ₁         0 ]             [ λ̄₁         0 ]
D = [    λ₂        ]   and D∗ =  [    λ̄₂        ]
    [       ⋱      ]             [       ⋱      ]
    [ 0         λₙ ]             [ 0         λ̄ₙ ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 8

             [ |λ₁|²          0 ]
and so DD∗ = [      |λ₂|²       ] = D∗D
             [            ⋱     ]
             [ 0         |λₙ|²  ]

∴ AA∗ = A∗A, i.e. A is normal.

So every unitarily diagonalizable matrix is normal.


The proof of the converse direction is much harder and beyond
the scope of this course.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 9


Quick recap: If A is an n × n complex matrix, then

A is normal if A∗ A = AA∗
A is unitary if A∗ = A−1
A is Hermitian if A∗ = A

Consequently if A is unitary or Hermitian it is normal, so may be


unitarily diagonalized.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 10


Real and symmetric A...

Also if An×n is real and symmetric, then it is Hermitian, and so


has real eigenvalues and we can find a real eigenvector for each
(real) eigenvalue (by solving the real equations Ax = λx with λ
real).
As A is normal, the eigenvectors will form an orthonormal basis
of Rn .
So A can be orthogonally diagonalized,
i.e. there is an orthogonal matrix P and a diagonal matrix D
such that P T AP = D (i.e. P −1 = P T ).

MA212 – Lecture 33 – Tuesday 13 February 2018 page 11


Recap on inner products via a Hermitian matrix

Also recall from Lectures 27/28 that

⟨x, y⟩ := y∗Ax

defines an inner product on Cⁿ provided:

A is Hermitian,
x∗Ax > 0 for all x ≠ 0. (For A Hermitian, we later show that x∗Ax ∈ R.)

MA212 – Lecture 33 – Tuesday 13 February 2018 page 12


Inner product from Hermitian A

For Aₙₓₙ Hermitian and x, y ∈ Cⁿ put

⟨x, y⟩ := x · (Ay) = xᵀĀȳ.

Then

⟨x, y⟩ = xᵀĀȳ = (xᵀĀȳ)ᵀ = ȳᵀĀᵀx     (as this is just a number).

Now, as A is Hermitian (i.e. A∗ = A),

ȳᵀĀᵀx = ȳᵀA∗x = ȳᵀAx = y∗Ax.

So ⟨x, y⟩ = y∗Ax, and by the same computation ⟨y, x⟩ = x∗Ay, the complex conjugate of ⟨x, y⟩.

This verifies the Hermitian property of the inner product.
What remains is the positivity property, i.e. that x∗Ax > 0 for all x ≠ 0, a matter we will return to.
MA212 – Lecture 33 – Tuesday 13 February 2018 page 13
Recall

Definition. An n × n complex matrix A is positive definite if A


is Hermitian and x∗ Ax > 0 for all x 6= 0 in Cn .
So we conclude that

Theorem. For A Hermitian, ⟨x, y⟩ := y∗Ax defines an inner product on Cⁿ iff A is positive definite.

The link between positive definiteness and the signs of the


eigenvalues is:

Theorem. A is positive definite iff A is Hermitian and all


its eigenvalues are (strictly!) positive.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 14


Proof: (⇒)

To show that if A is positive definite, i.e. that all of the


eigenvalues are strictly positive, we show that
if A has a e-value λ ≤ 0 , then A is not positive definite.
Suppose that A has a e-value λ ≤ 0 so that

Ax = λx

for some x 6= 0 . Then

x∗Ax = x∗(λx) = λ(x∗x) = λ‖x‖²

and so x∗ Ax ≤ 0 for some x 6= 0 , i.e. A is not positive definite.


Note that as A is Hermitian, λ is real.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 15


Proof: (⇐)

Suppose that A is Hermitian and that all of its e-values


λ1 , λ2 , . . . , λn are strictly positive.
As A is Hermitian, it is normal, and so the corresponding
e-vectors can be x1 , x2 , . . . , xn 6= 0 where

Ax1 = λ1 x1 , Ax2 = λ2 x2 , . . . , Axn = λn xn ,

and they form an orthonormal basis of Cn . So for any x ∈ Cn


with x 6= 0 , we have

MA212 – Lecture 33 – Tuesday 13 February 2018 page 16


say x = α1 x1 + α2 x2 + · · · + αn xn with αi not all zero

Ax = A(α1 x1 + α2 x2 + · · · + αn xn )
= α1 Ax1 + α2 Ax2 + · · · + αn Axn
= α1 λ1 x1 + α2 λ2 x2 + · · · + αn λn xn .

Then x∗Ax
= (α₁x₁ + α₂x₂ + ⋯ + αₙxₙ)∗ (α₁λ₁x₁ + α₂λ₂x₂ + ⋯ + αₙλₙxₙ)
= (ᾱ₁x₁∗ + ᾱ₂x₂∗ + ⋯ + ᾱₙxₙ∗)(α₁λ₁x₁ + α₂λ₂x₂ + ⋯ + αₙλₙxₙ)
= |α₁|²λ₁ + |α₂|²λ₂ + ⋯ + |αₙ|²λₙ      (using xᵢ∗xⱼ = 0 for i ≠ j, xᵢ∗xᵢ = 1)
> 0.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 17
Determinantal test

Or, look at the principal (or leading) minors of A and see whether they are all strictly positive.

                         [ a₁₁  a₁₂  ⋯  a₁ₙ ]
The Hermitian matrix A = [ a₂₁  a₂₂  ⋯  a₂ₙ ]
                         [  ⋮    ⋮   ⋱   ⋮  ]
                         [ aₙ₁  aₙ₂  ⋯  aₙₙ ]

is positive definite if and only if all the principal sub-determinants are positive:
MA212 – Lecture 33 – Tuesday 13 February 2018 page 18


Test

a₁₁ > 0

| a₁₁  a₁₂ |
| a₂₁  a₂₂ | > 0

| a₁₁  a₁₂  a₁₃ |
| a₂₁  a₂₂  a₂₃ | > 0
| a₃₁  a₃₂  a₃₃ |

and all the next ones, ...

MA212 – Lecture 33 – Tuesday 13 February 2018 page 19


ending with



the full determinant of A:

| a₁₁  a₁₂  ⋯  a₁ₙ |
| a₂₁  a₂₂  ⋯  a₂ₙ |
|  ⋮    ⋮   ⋱   ⋮  | = |A| > 0.
| aₙ₁  aₙ₂  ⋯  aₙₙ |

MA212 – Lecture 33 – Tuesday 13 February 2018 page 20


Example

 
                        [ 2  1  1 ]
Consider the matrix A = [ 1  1  0 ]; its characteristic polynomial is
                        [ 1  0  3 ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 21


Cont’d


                       | x−2  −1   −1 |
p_A(x) = det(xI − A) = | −1   x−1   0 |
                       | −1    0   x−3 |

p_A(x) = −(0 + (x − 1)) + (x − 3)((x − 2)(x − 1) − 1)
       = (x − 3)(x² − 3x + 2) − 2x + 4
       = x³ − 3x² + 2x − 3x² + 9x − 6 − 2x + 4
       = x³ − 6x² + 9x − 2
∴ p_A(x) = (x − 2)(x² − 4x + 1)
∴ p_A(x) = 0 if x = 2 or x² − 4x + 1 = 0

So x = ½(4 ± √(16 − 4)) = 2 ± √3 > 0.
MA212 – Lecture 33 – Tuesday 13 February 2018 page 22
The principal determinants of A are

the following three:

     | 2  1 |           | 2  1  1 |
2,   | 1  1 | = 1,  and | 1  1  0 | = 2.
                        | 1  0  3 |

All positive, as expected.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 23
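
The determinantal test is easy to run mechanically; a sketch of mine, assuming NumPy, computing the three leading principal minors of this A:

import numpy as np

A = np.array([[2, 1, 1], [1, 1, 0], [1, 0, 3]])
minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
print(np.round(minors, 10))   # expect [2. 1. 2.]: all positive, so A is positive definite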


The magical properties of the matrix A∗ A

Let A be a complex m × n matrix (i.e. A need not be a square matrix). Observe:

A∗A is square, as it will be an n × n matrix.

A∗A is Hermitian, as (A∗A)∗ = A∗A∗∗ = A∗A.

A∗A is non-negative definite as, for any x ∈ Cⁿ, we have

x∗(A∗Ax) = (x∗A∗)Ax = (Ax)∗Ax = ‖Ax‖² ≥ 0.

So all the eigenvalues will be non-negative.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 24


Cont’d

In particular, if the only solution to Ax = 0 is x = 0, then x∗A∗Ax > 0 for all x ≠ 0 in Cⁿ, and A∗A is actually positive definite.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 25


Example

 
Consider the 2 × 3 complex matrix A = [ 2   i  0 ]
                                      [ 0  -i  2 ]

        [  2  0 ]
so A∗ = [ -i  i ]   and
        [  0  2 ]

      [  2  0 ] [ 2   i  0 ]   [  4   2i  0  ]
A∗A = [ -i  i ] [ 0  -i  2 ] = [ -2i  2   2i ]
      [  0  2 ]                [  0  -2i  4  ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 26


The characteristic polynomial here of A∗ A

is

             | x−4   −2i    0  |
p_{A∗A}(x) = |  2i   x−2  −2i  |
             |  0     2i   x−4 |

So

p_{A∗A}(x) = (x − 4) | x−2  −2i |  + 2i | 2i  −2i |
                     | 2i   x−4 |       | 0   x−4 |

           = x(x − 4)(x − 6),   so λ₀ = 0, λ₁ = 4, λ₂ = 6.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 27


Comment

Indeed, the non-zero eigenvalue of A∗ A are the same as the


non-zero eigenvalue of AA∗ .
Why? Suppose that λ is a non-zero eigenvalue of A∗ A so that

A∗ Av = λv

for some v ≠ 0. We then have

A(A∗ Av) = A(λv)


=⇒ (AA∗ )(Av) = λ(Av)

with Av ≠ 0. (Why?) That is, λ is also a non-zero eigenvalue of AA∗, with eigenvector Av ≠ 0.
(Because Av = 0 ⇒ λv = A∗Av = 0 ⇒ v = 0, as λ ≠ 0.)

MA212 – Lecture 33 – Tuesday 13 February 2018 page 28


Important definition: the singular values of A

Definition. If λ₁, λ₂, ..., λ_k are the non-zero eigenvalues of the n × n matrix A∗A, then √λ₁, √λ₂, ..., √λ_k are called the singular values of A.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 29


Example

Using

A = [ 2   i  0 ]
    [ 0  -i  2 ]

again, we have

      [ 2   i  0 ] [  2  0 ]   [  5  -1 ]
AA∗ = [ 0  -i  2 ] [ -i  i ] = [ -1   5 ]
                   [  0  2 ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 30


Cont’d

Thus AA∗ has eigenvalues given by

det(AA∗ − λI) = 0  =⇒  det [ 5−λ   −1  ] = 0
                           [ −1   5−λ  ]

(5 − λ)² − 1 = 0
(5 − λ)² = 1  =⇒  5 − λ = ±1,
so λ = 5 ± 1 = 6, 4.

Indeed, the singular values of A are √6 and 2.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 31
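
These singular values agree with what a library routine reports; a small sketch of mine, assuming NumPy (np.linalg.svd returns singular values in decreasing order):

import numpy as np

A = np.array([[2, 1j, 0], [0, -1j, 2]])
print(np.linalg.svd(A, compute_uv=False))    # expect [sqrt(6), 2] ~ [2.449, 2]
print(np.linalg.eigvalsh(A.conj().T @ A))    # eigenvalues of A*A: expect [0, 4, 6]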


Furthermore: normality

Since A∗ A is Hermitian it is also normal as


(A∗ A)∗ (A∗ A) = (A∗ A∗∗ )(A∗ A) = A∗ AA∗ A
(A∗ A)(A∗ A)∗ = (A∗ A)(A∗ A∗∗ ) = A∗ AA∗ A

which means that the eigenvectors of the n × n matrix A∗ A


form an orthonormal basis of Cn .
Indeed, suppose {v1 , v2 , . . . , vk } is an orthonormal set of
eigenvectors of A∗ A corresponding to the non-zero eigenvalues
λ1 , λ2 , . . . , λk , then

A∗Avᵢ = λᵢvᵢ for 1 ≤ i ≤ k, with λᵢ > 0

vᵢ · vⱼ = 0 for i ≠ j, and ‖vᵢ‖ = 1 for 1 ≤ i ≤ k,

and then
MA212 – Lecture 33 – Tuesday 13 February 2018 page 32
...we see that each of the vectors

uᵢ = (1/√λᵢ) Avᵢ,   1 ≤ i ≤ k,

is an eigenvector of AA∗ corresponding to the non-zero eigenvalue λᵢ, as

AA∗uᵢ = AA∗((1/√λᵢ)Avᵢ) = (1/√λᵢ)A(A∗Avᵢ) = (1/√λᵢ)A(λᵢvᵢ) = λᵢ((1/√λᵢ)Avᵢ) = λᵢuᵢ

MA212 – Lecture 33 – Tuesday 13 February 2018 page 33


Orthogonal

orthogonal as, for i ≠ j,

uᵢ · uⱼ = ((1/√λᵢ)Avᵢ) · ((1/√λⱼ)Avⱼ) = (1/√(λᵢλⱼ)) (Avᵢ) · (Avⱼ)

        = (1/√(λᵢλⱼ)) (Avⱼ)∗Avᵢ = (1/√(λᵢλⱼ)) vⱼ∗A∗Avᵢ

        = (λᵢ/√(λᵢλⱼ)) vⱼ∗vᵢ = √(λᵢ/λⱼ) (vᵢ · vⱼ)

∴ uᵢ · uⱼ = 0 when i ≠ j

MA212 – Lecture 33 – Tuesday 13 February 2018 page 34


Unit length

unit length as
s r
λi λi
ui · uj = (vi · vj ) =⇒ ui · ui = (vi · vi )
λj λi
=⇒ kui k2 = kvi k2 = 1

That is, the vectors ui = √1 Avi form an orthonormal set of


λi
eigenvector of AA∗ corresponding to its non-zero eigenvalues
λ1 , λ2 , . . . , λk

MA212 – Lecture 33 – Tuesday 13 February 2018 page 35


Example

 
Using A = [ 2   i  0 ] again, we have
          [ 0  -i  2 ]

AA∗ = [  5  -1 ]
      [ -1   5 ]

with eigenvalues 4, 6:

λ₁ = 4  =⇒  e-vector u₁ = (1/√2)(1, 1)ᵀ
λ₂ = 6  =⇒  e-vector u₂ = (1/√2)(−1, 1)ᵀ

MA212 – Lecture 33 – Tuesday 13 February 2018 page 36


Example cont’d

 
      [  4   2i  0  ]
A∗A = [ -2i  2   2i ]
      [  0  -2i  4  ]

with e-values 0, 4, 6.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 37


Cont’d

 
λ₁ = 4  =⇒  e-vector (1/√2)(1, 0, 1)ᵀ

λ₂ = 6  =⇒  e-vector (1/√3)(−1, i, 1)ᵀ

λ₃ = 0  =⇒  e-vector (1/√6)(−i, 2, i)ᵀ

MA212 – Lecture 33 – Tuesday 13 February 2018 page 38


Eigenvectors ui for λi > 0 are:...

 
  1    
1 1 2 i 0 
1    1 2 1 1
u1 = √ Av1 =  √ 0 = √ = √
4 2 0 −i 2 2  2 2 2 2 1
1

MA212 – Lecture 33 – Tuesday 13 February 2018 page 39


and

 
  −1
1 1 2 i 0 1  
u2 = √ Av2 = √ √  i 


6 6 0 −i 2 3
1
  
1 −3 1 −1
= √ =√
3 2 +3 2 +1

MA212 – Lecture 33 – Tuesday 13 February 2018 page 40


Comment

Indeed, using similar reasoning, we see that if u₁, u₂, ..., u_k is


an orthonormal set of eigenvectors of AA∗ corresponding to the
non-zero eigenvalues λ1 , λ2 , . . . , λk , then the vectors

1 ∗
vi = √ A ui for 1 ≤ i ≤ k
λi

form an orthonormal set of eigenvectors of A∗ A corresponding


to its non-zero eigenvalues λ1 , λ2 , . . . , λk .

MA212 – Lecture 33 – Tuesday 13 February 2018 page 41


Example

Using what we saw above we have

                         [  2  0 ]
v₁ = (1/√4) A∗u₁ = (1/2) [ -i  i ] · (1/√2)(1, 1)ᵀ = (1/(2√2))(2, 0, 2)ᵀ = (1/√2)(1, 0, 1)ᵀ
                         [  0  2 ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 42


And

   
2 0   −2
1 ∗ 1 
 1

−1 1  
v2 = √ A u2 = √ −i i  √   = √  2i 


6 6 2 1 2 3
0 2 2

 
−1
1  
=√  i 

3
1

MA212 – Lecture 33 – Tuesday 13 February 2018 page 43


MA212 Further Mathematical Methods Lecture LA 14

Lecture 34: Singular Values Decomposition

How many SVs are there? ... = rank(A) many


The maximal SV versus magnification under A
Spectral Decomposition
SV Decomposition
S1 + S2 – sum of two subspaces
Background reading

Adam Ostaszewski
Chap. 5

MA212 – Lecture 34 – Friday 16 February 2018 page 2


The norm of a matrix

Definition. Let A be any matrix; the norm of A, denoted by ‖A‖, is

‖A‖ = max{ ‖Ax‖ : ‖x‖ = 1 },

i.e., it is the maximum value of ‖Ax‖ over all unit vectors x.

Indeed, as the norm of a vector allows us to write, for x ≠ 0,

‖Ax‖/‖x‖ = ‖ (1/‖x‖) Ax ‖ = ‖ A (x/‖x‖) ‖,

we also have

‖A‖ = max{ ‖Ax‖/‖x‖ : x ≠ 0 },

and this is easier to deal with.


MA212 – Lecture 34 – Friday 16 February 2018 page 3
The norm of a matrix is a measure of 'how large' the linear transformation represented by A can be, i.e., how large can ‖Ax‖ be if we have ‖x‖ = 1?

MA212 – Lecture 34 – Friday 16 February 2018 page 4


Of course, if v is an eigenvector of A with eigenvalue λ, we have

Av = λv   (v ≠ 0)

and so

‖Av‖/‖v‖ = ‖λv‖/‖v‖ = |λ| ‖v‖/‖v‖ = |λ|,

which means that the norm of A, i.e. ‖A‖, must be at least as large as the largest modulus that we can get from its eigenvalues...
... But in general, it is larger! (See, for example, Exercise 8.2: Norm = (1 + √5)/2 > 1 = modulus of largest eigenvalue.)

MA212 – Lecture 34 – Friday 16 February 2018 page 5


A simpler example

 
If A = [ 0  1 ], then the eigenvalues are 0; but Ae₁ = 0 and Ae₂ = e₁, so ‖A‖ = 1 > 0.
       [ 0  0 ]

MA212 – Lecture 34 – Friday 16 February 2018 page 6


How to find the norm of an m × n complex matrix A?

For any such matrix write

‖Ax‖² = (Ax) · (Ax) = (Ax)∗(Ax) = x∗A∗Ax.

Of course, A∗A is Hermitian and non-negative definite, and so we can see that its eigenvalues

λ₁, λ₂, ..., λₙ

are all non-negative. Let's say that λ₁ is the largest one.

A∗A is also normal, and so its eigenvectors v₁, ..., vₙ respectively form an orthonormal set.

MA212 – Lecture 34 – Friday 16 February 2018 page 7


Computation

We start with A∗Avᵢ = λᵢvᵢ, where λᵢ ≥ 0 and λ₁ is the largest, and {v₁, ..., vₙ} an orthonormal set. For any x ∈ Cⁿ, by using orthogonality of the vᵢ, we can write

x = α₁v₁ + α₂v₂ + ⋯ + αₙvₙ

so ‖x‖² = ⟨x, x⟩ = ⋯ = |α₁|² + |α₂|² + ⋯ + |αₙ|²,

since ‖x‖² = ⟨x, x⟩ = Σᵢⱼ αᵢᾱⱼ⟨vᵢ, vⱼ⟩ = Σᵢ αᵢᾱᵢ.

In words: the norm of x is equal to the norm of its coefficient vector in the base B = (v₁, ⋯, vₙ).

MA212 – Lecture 34 – Friday 16 February 2018 page 8


We also have, as λ₁ is the largest eigenvalue,

‖Ax‖² = x∗A∗Ax = |α₁|²λ₁ + |α₂|²λ₂ + ⋯ + |αₙ|²λₙ

      ≤ |α₁|²λ₁ + |α₂|²λ₁ + ⋯ + |αₙ|²λ₁

      = λ₁( |α₁|² + |α₂|² + ⋯ + |αₙ|² )

      = λ₁‖x‖²,

since ‖Ax‖² = ⟨Ax, Ax⟩ = x∗A∗Ax and

A∗Ax = A∗A(α₁v₁ + α₂v₂ + ⋯ + αₙvₙ) = λ₁α₁v₁ + λ₂α₂v₂ + ⋯ + λₙαₙvₙ.

MA212 – Lecture 34 – Friday 16 February 2018 page 9


From here

∴ ‖Ax‖²/‖x‖² ≤ λ₁,  with equality if x = α₁v₁

∴ ‖Ax‖/‖x‖ ≤ √λ₁,  with equality if x is a corresponding eigenvector of A∗A.

So, for any matrix A, its norm equals its largest singular value!

Warning
A square matrix A has both eigenvalues and singular values. Distinguish these carefully!
Denoting eigenvalues of A by λᴬᵢ, we see that

|λᴬ_max| ≤ ‖A‖ = √(λ^{A∗A}_max)
MA212 – Lecture 34 – Friday 16 February 2018 page 10
EXAMPLE

Using what we saw in Lecture 33:

If A = [ 2   i  0 ], its singular values are √6 and 2; therefore the norm of A is
       [ 0  -i  2 ]

‖A‖ = √6 (as this is larger than 2, of course).
Indeed, this means that the maximum value of

‖Ax‖/‖x‖

is √6, and this occurs when x = (−1, i, 1)ᵀ (an eigenvector of A∗A corresponding to its eigenvalue 6. We can, of course, take any non-zero scalar multiple of this!)

MA212 – Lecture 34 – Friday 16 February 2018 page 11


Check:

 
  −1  
2 i 0   −3 √
Ax =       ⇒ kA xk = 18
 i =
0 −i 2 3
1


and kxk = 3

kA xk 18 √
so = √ = 6.
kxk 3

MA212 – Lecture 34 – Friday 16 February 2018 page 12


The singular value decomposition of a matrix

Recall that if A is any m × n complex matrix, the matrices AA∗ and A∗A are normal and have the same non-zero eigenvalues.
Indeed, if these non-zero eigenvalues are λ₁, λ₂, ..., λ_k, then

A∗A (an n × n matrix) has an orthonormal set of eigenvectors v₁, ..., v_k ∈ Cⁿ

AA∗ (an m × m matrix) has an orthonormal set of eigenvectors u₁, ..., u_k ∈ Cᵐ

The two sets of eigenvectors are related by

uᵢ = (1/√λᵢ) Avᵢ   and   vᵢ = (1/√λᵢ) A∗uᵢ
MA212 – Lecture 34 – Friday 16 February 2018 page 13
We now derive a direct link of A to its singular values.

This will be the singular value decomposition of A or SVD for


short.
Let A be any m × n complex matrix.
As A∗ A is normal, its eigenvectors give rise to an orthonormal
basis of Cn . Suppose the orthonormal vectors are:

v1 , v2 , . . . , vk , vk+1 , vk+2 , . . . , vn

where v1 , . . . , vk are eigenvectors for the non-zero eigenvalues


λ1 , . . . , λk of A∗ A , and vk+1 , vk+2 , . . . , vn are eigenvectors for
the remaining eigenvalue of A∗ A , i.e., zero!

MA212 – Lecture 34 – Friday 16 February 2018 page 14


Cont’d

As these eigenvectors are orthonormal, writing V = (v1, v2, . . . , vn),

V∗ V = In = V V∗.

A statement equivalent to the rightmost equation is that

v1 v1∗ + v2 v2∗ + · · · + vn vn∗ = In   (a sum of n × n matrices).

For the above recall that V = (v1 , v2 , . . . , vn ) is unitary means


that V ∗ V = I = V V ∗ and V V ∗ = v1 v1∗ + v2 v2∗ + · · · + vn vn∗ .
MA212 – Lecture 34 – Friday 16 February 2018 page 15
Why?

Consider the case where n = 2. We have

(c1, c2) (α1, α2)⊤ = α1 c1 + α2 c2.

Now,

(c1, c2) [ α1 β1 ; α2 β2 ] = ( α1 c1 + α2 c2 , β1 c1 + β2 c2 )
                           = ( α1 c1 , β1 c1 ) + ( α2 c2 , β2 c2 )
                           = c1 (α1, β1) + c2 (α2, β2).

MA212 – Lecture 34 – Friday 16 February 2018 page 16


So,

(c1, c2) (r1 ; r2) = c1 r1 + c2 r2,

for the rows r1 = (α1, β1), r2 = (α2, β2).

MA212 – Lecture 34 – Friday 16 February 2018 page 17


So...

bearing in mind that A∗A is n × n and

A∗A vᵢ = λᵢ vᵢ, with λᵢ ≠ 0 for 1 ≤ i ≤ k, and λᵢ = 0 for k + 1 ≤ i ≤ n,

with uᵢ = (1/√λᵢ) A vᵢ the corresponding eigenvector for A A∗. This is only for 1 ≤ i ≤ k.

Indeed, for k + 1 ≤ i ≤ n, we have λᵢ = 0, so that

A∗A vᵢ = 0 ⇒ vᵢ∗ A∗A vᵢ = 0 ⇒ (A vᵢ)∗ (A vᵢ) = 0 ⇒ (A vᵢ) · (A vᵢ) = 0 ⇒ ||A vᵢ|| = 0 ⇒ A vᵢ = 0.

MA212 – Lecture 34 – Friday 16 February 2018 page 18
Conclusion: the SVD.

I = v1 v1∗ + v2 v2∗ + · · · + vk vk∗ + vk+1 vk+1∗ + · · · + vn vn∗

∴ A = A I = A v1 v1∗ + A v2 v2∗ + · · · + A vk vk∗ + A vk+1 vk+1∗ + · · · + A vn vn∗

∴ A = √λ1 u1 v1∗ + √λ2 u2 v2∗ + · · · + √λk uk vk∗   (as A vᵢ = √λᵢ uᵢ for i ≤ k, and A vᵢ = 0 for i > k)

This is the SVD of A !

MA212 – Lecture 34 – Friday 16 February 2018 page 19


An alternative SVD.

∴ A = √λ1 u1 v1∗ + √λ2 u2 v2∗ + · · · + √λk uk vk∗ = U D V∗

for U = (u1, · · · , uk)  (m × k),   V = (v1, · · · , vn)  (n × n),

and D the k × n matrix

D =
[ √λ1   0   · · ·   0    0  · · · ]
[  0   √λ2  · · ·   0    0  · · · ]
[            · · ·                ]
[  0    0   · · ·  √λk   0  · · · ]
This is also the SVD of A , but in a fancy format!

MA212 – Lecture 34 – Friday 16 February 2018 page 20


EXAMPLE

 
With A = [ 2 i 0 ; 0 −i 2 ] from the last lecture (Lecture 33) we had

A∗A with non-zero eigenvalues λ1 = 4 and λ2 = 6,

with orthonormal eigenvectors

v1 = (1/√2) (1, 0, 1)⊤,   v2 = (1/√3) (−1, i, 1)⊤,

and

u1 = (1/√2) (1, 1)⊤,   u2 = (1/√2) (−1, 1)⊤.

MA212 – Lecture 34 – Friday 16 February 2018 page 21


Check:

Therefore, the SVD of A is

A = √4 · (1/√2)(1, 1)⊤ · (1/√2)(1, 0, 1)  +  √6 · (1/√2)(−1, 1)⊤ · (1/√3)(−1, −i, 1)

i.e.

A = 2 [ 1/2  0  1/2 ; 1/2  0  1/2 ]  +  √6 [ 1/√6  i/√6  −1/√6 ; −1/√6  −i/√6  1/√6 ].

MA212 – Lecture 34 – Friday 16 February 2018 page 22


Check:

The RHS must add up to give A on the LHS!


     
RHS = [ 1 0 1 ; 1 0 1 ] + [ 1 i −1 ; −1 −i 1 ] = [ 2 i 0 ; 0 −i 2 ] = A = LHS

MA212 – Lecture 34 – Friday 16 February 2018 page 23
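Numerical aside (our sketch): NumPy's SVD returns exactly the pieces above, and summing the rank-one terms √λᵢ uᵢ vᵢ∗ rebuilds A.

import numpy as np

A = np.array([[2, 1j, 0],
              [0, -1j, 2]])

U, s, Vh = np.linalg.svd(A)     # s = singular values (descending); rows of Vh are the vi*
print(s**2)                     # approx [6, 4]: the non-zero eigenvalues of A*A

# Rank-one expansion: A = sum_i s_i u_i v_i*
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))  # True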


The spectral decomposition of a matrix

Actually, if A is a normal matrix, i.e., it is square, and


A∗ A = A A∗ , then we can do better than the SVD!
As A is normal, its eigenvectors v1 , v2 , . . . , vn form an
orthonormal basis of Cⁿ. We suppose that the corresponding
eigenvalues are λi , so that

A vi = λi vi ( and vi 6= 0).

So, as before

I = v1 v1∗ + v2 v2∗ + · · · + vn vn∗


∴ A = A I = A v1 v1∗ + A v2 v2∗ + · · · + A vn vn∗
∴ A = λ1 v1 v1∗ + λ2 v2 v2∗ + · · · + λn vn vn∗

MA212 – Lecture 34 – Friday 16 February 2018 page 24


So

A = λ1 v1 v1∗ + λ2 v2 v2∗ + · · · + λn vn vn∗

which is, in reality, just a fancy way of writing P D P ∗ = A.

In this case the norm of A is equal to the largest modulus of its


eigenvalues.

MA212 – Lecture 34 – Friday 16 February 2018 page 25


EXAMPLE

 
A = [ 3 2 ; 2 0 ]

Eigenvalues: λ1 = 4, λ2 = −1.
Eigenvectors: v1 = (1/√5) (2, 1)⊤,   v2 = (1/√5) (−1, 2)⊤.

∴ A = 4 · (1/√5)(2, 1)⊤ (1/√5)(2, 1)  +  (−1) · (1/√5)(−1, 2)⊤ (1/√5)(−1, 2)

∴ A = 4 [ 4/5  2/5 ; 2/5  1/5 ]  +  (−1) [ 1/5  −2/5 ; −2/5  4/5 ]
5

MA212 – Lecture 34 – Friday 16 February 2018 page 26
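Numerical aside (our sketch): for a symmetric A, np.linalg.eigh gives orthonormal eigenvectors, so the spectral decomposition above can be checked directly.

import numpy as np

A = np.array([[3., 2.],
              [2., 0.]])

lam, V = np.linalg.eigh(A)      # eigenvalues ascending: [-1, 4]; eigenvectors in columns
A_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(2))
print(np.allclose(A, A_rebuilt))                        # True
print(np.isclose(np.linalg.norm(A, 2), max(abs(lam))))  # norm = largest |eigenvalue|: True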


Before we move on ...

...to our next topic, we need to think a little more about vector spaces: specifically, real vector spaces (cont'd).
Sums and intersections
Definition. Suppose that U and W are two subspaces of a real
vector space V . We define the sum of U and W to be

U + W = {u + w : u ∈ U, w ∈ W },

and the intersection of U and W to be

U ∩ W = {x : x ∈ U and x ∈ W }.

Usefully, both of the sets of vectors we have just defined are


subspaces!

MA212 – Lecture 34 – Friday 16 February 2018 page 27


A Theorem

Theorem. If U and W are subspaces of a vector space


V , then both U + W and U ∩ W are subspaces of V .

Proof. For U + W we have

0 ∈ U + W as 0 = 0 + 0 and so U + W 6= ∅ .
If x1 , x2 ∈ U + W we have x1 = u1 + w1 and x2 = u2 + w2
with ui ∈ U, wi ∈ W , so that

x1 + x2 = (u1 + w1 ) + (u2 + w2 ) = (u1 + u2 ) + (w1 + w2 )

⇒ x1 + x2 ∈ U + W.

MA212 – Lecture 34 – Friday 16 February 2018 page 28


Cont’d

For α ∈ R , x ∈ U + W with x = u + w and u ∈ U , w ∈ W :

αx = α(u + w) = αu + αw ⇒ αx ∈ U + W

For U ∩ W we have

0 ∈ U ∩ W (as 0 ∈ U and 0 ∈ W ) and so U ∩ W 6= ∅ .


If x1 , x2 ∈ U ∩ W we have x1 ∈ U and x2 ∈ U so that
x1 + x2 ∈ U . And we have x1 ∈ W and x2 ∈ W so that
x1 + x2 ∈ W . Which means that x1 + x2 ∈ U ∩ W .
If α ∈ R and x ∈ U ∩ W , we have x ∈ U so that αx ∈ U .
And we have x ∈ W so that αx ∈ W . Which means
αx ∈ U ∩ W .

An actual calculation will follow after these illustrations...


MA212 – Lecture 34 – Friday 16 February 2018 page 29
Illustrations

In the middle diagram the line is in the plane of U.

[Figure: three sketches. (1) U and W lines through 0: U ∩ W a point, U + W a plane. (2) U a plane containing the line W: U ∩ W the line W, U + W the plane U. (3) U and W planes: U ∩ W a line.]

MA212 – Lecture 34 – Friday 16 February 2018 page 30
EXAMPLE

Suppose we have the subspaces of R⁴ given by

U = { (x, y, 0, 0)⊤ : x, y ∈ R } = Lin{ (1, 0, 0, 0)⊤, (0, 1, 0, 0)⊤ }

and

W = { (a, a, b, b)⊤ : a, b ∈ R } = Lin{ (1, 1, 0, 0)⊤, (0, 0, 1, 1)⊤ }.

MA212 – Lecture 34 – Friday 16 February 2018 page 31


Each of these has dimension 2.

Here, any vector x ∈ U + W can be written as

x = (x, y, 0, 0)⊤ + (a, a, b, b)⊤ = (x + a)(1, 0, 0, 0)⊤ + (y + a)(0, 1, 0, 0)⊤ + b(0, 0, 1, 1)⊤,

and so

x ∈ Lin{ (1, 0, 0, 0)⊤, (0, 1, 0, 0)⊤, (0, 0, 1, 1)⊤ }.

MA212 – Lecture 34 – Friday 16 February 2018 page 32


Conversely,

any vector in the linear span Lin above can be written as

x = α(1, 0, 0, 0)⊤ + β(0, 1, 0, 0)⊤ + γ(0, 0, 1, 1)⊤ = (α, β, 0, 0)⊤ + (0, 0, γ, γ)⊤ ∈ U + W.

MA212 – Lecture 34 – Friday 16 February 2018 page 33


Consequently,

U + W = Lin{ (1, 0, 0, 0)⊤, (0, 1, 0, 0)⊤, (0, 0, 1, 1)⊤ },

so dim(U + W) = 3.

MA212 – Lecture 34 – Friday 16 February 2018 page 34


Also,

any vector x ∈ U ∩ W must be such that

x ∈ U ⇒ x = (x, y, 0, 0)⊤,
x ∈ W ⇒ x = (a, a, b, b)⊤,

⇒ (x, y, 0, 0)⊤ = (a, a, b, b)⊤ ⇒ x = a = y and b = 0.
MA212 – Lecture 34 – Friday 16 February 2018 page 35


So

x = (a, a, 0, 0)⊤ ∈ Lin{ (1, 1, 0, 0)⊤ }.

MA212 – Lecture 34 – Friday 16 February 2018 page 36


Conversely,

... any vector x ∈ Lin{ (1, 1, 0, 0)⊤ } can be seen as

x = (a, a, 0, 0)⊤ ∈ U   and   x = (a, a, 0, 0)⊤ ∈ W,

and so x ∈ U ∩ W.

MA212 – Lecture 34 – Friday 16 February 2018 page 37


Consequently,

U ∩ W = Lin{ (1, 1, 0, 0)⊤ }.  So dim(U ∩ W) = 1.

Indeed, we observe that

dim(U + W) = dim(U) + dim(W) − dim(U ∩ W),

and this result holds more generally...

MA212 – Lecture 34 – Friday 16 February 2018 page 38
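Numerical aside (our sketch): the dimension formula can be checked with ranks, since dim(U + W) is the rank of the matrix whose columns are the two bases side by side.

import numpy as np

U = np.array([[1, 0, 0, 0], [0, 1, 0, 0]]).T   # basis of U as columns
W = np.array([[1, 1, 0, 0], [0, 0, 1, 1]]).T   # basis of W as columns

dim_U = np.linalg.matrix_rank(U)
dim_W = np.linalg.matrix_rank(W)
dim_sum = np.linalg.matrix_rank(np.hstack([U, W]))  # dim(U + W)
print(dim_U, dim_W, dim_sum)                        # 2 2 3
print(dim_U + dim_W - dim_sum)                      # 1 = dim(U ∩ W)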


MA212 Further Mathematical Methods Lecture LA 15

Lecture 35: Projections = Direct sums

Direct sums
Orthogonal complements
Orthogonal projections
A duality result:
R(A⊤ ) = N (A)⊥
Background readings

Anthony & Harvey


Chapter 12 (§12.1 & §12.5)

Adam Ostaszewski
Chapter 4 (§4.1 & §4.6)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 2


The lecture’s main purpose:

[Figure: a vector x sent to P x, lying in the subspace U.]

1. To recognize projection matrices


2. To write down the orthogonal projection matrix P onto U
given a basis for U .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 3


Recap 1

Concepts needed - I:

U1 + U2 = {u1 + u2 | u1 ∈ U1 , and u2 ∈ U2 }

vector sum of two subspaces, as illustrated:

[Figure: u1 ∈ U1, u2 ∈ U2, and their parallelogram sum u1 + u2.]
MA212 – Lecture 35 – Tuesday 20 February 2018 page 4
Recap 2

Below we shall use a simple idea:


For:
U a subspace of V, and

{u1, . . . , ur} a basis of U

We can extend to a base for V :

{u1 , . . . , ur , ur+1 , . . . , ur+s }

The above assumes V is finite-dimensional.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 5


Direct sum

We write
V = U1 ⊕ U2

for subspaces U1 , U2 of V if each v ∈ V has a unique


representation

v = u1 + u2 with u1 ∈ U1 , u2 ∈ U2

MA212 – Lecture 35 – Tuesday 20 February 2018 page 6


EXAMPLE:

V = R3
Suppose v1 , v2 , v3 are linearly independent
Take

U1 = Lin({v1, v2}),
U2 = Lin({v3}).

For any v, there is a linear combination

v = (α1 v1 + α2 v2) + (α3 v3),   the first bracket in U1, the second in U2,

for unique scalars α1 , α2 , α3 .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 7


A Theorem

Theorem.

V = U1 ⊕ U2 (Direct sum)
⇐⇒
(i) V = U1 + U2 (sum)
(ii) U1 ∩ U2 = {0}

MA212 – Lecture 35 – Tuesday 20 February 2018 page 8


Proof:

( ⇒ ) Suppose V = U1 ⊕ U2 .

(i) For any v ∈ V

v = u1 + u2 ∈ U1 + U2 .

(ii) If u ∈ U1 ∩ U2 , then

u = u + 0   (u ∈ U1, 0 ∈ U2)
  = 0 + u   (0 ∈ U1, u ∈ U2)

Two representations? Must be the same: so u = 0 .


So U1 ∩ U2 = {0}
MA212 – Lecture 35 – Tuesday 20 February 2018 page 9
Cont’d

( ⇐ ) Suppose that (i) and (ii) hold.


Suppose v ∈ V . By (i) we can write
v = u1 + u2 (as V = U1 + U2 ).
Suppose also v = u′1 + u′2

Then u1 + u2 = u′1 + u′2, so

u1 − u′1 = u′2 − u2,

with the left side in U1 and the right side in U2;

so u1 − u′1 = u′2 − u2 ∈ U1 ∩ U2 = {0}.

So u1 − u′1 = u′2 − u2 = 0 , i.e. u1 = u′1 and u2 = u′2 .


MA212 – Lecture 35 – Tuesday 20 February 2018 page 10
A Corollary

Corollary.
If V = U1 ⊕ U2 ,
then
dim(V ) = dim(U1 ) + dim(U2 ).

Proof: Here
dim(U1 ∩ U2 ) = dim({0}) = 0.

So

dim(V) = dim(U1 + U2) = dim(U1) + dim(U2) − dim(U1 ∩ U2) = dim(U1) + dim(U2),

the middle term vanishing since dim(U1 ∩ U2) = 0.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 11


Complements

Definition. Given a vector space V and a subspace U, a subspace W is a complement (in V) of U if

V = U ⊕ W.

Task. Finding a complement.
Solution. First we find a base for U, say it is

{u1, . . . , ur}.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 12


Next

Extend to a base for V

{u1, . . . , ur, ur+1, . . . , ur+s}.


 
Take W = Lin {ur+1 , . . . , ur+s }
Here we assume V is finite dimensional and

dim(V ) = r + s, s > 0.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 13


Comment

V = R2 , U a line through the origin.

[Figure: two distinct lines U and W through the origin 0.]

Any line W through the origin (distinct from U ) is a complement


of U

as U + W = R2
and U ∩ W = {0}

MA212 – Lecture 35 – Tuesday 20 February 2018 page 14


Comment cont’d

SPECIAL CASE U a line as before ....


If W is the line orthogonal to U , then we say that W is the
orthogonal complement.

('the' because it is unique).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 15


Definition in an inner product space

Given a vector subspace U of V the orthogonal complement of U


denoted U ⊥ is

U ⊥ = {v ∈ V : v ⊥ u for all u ∈ U }

where v ⊥ u means hv, ui = 0 .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 16


So

v ∈ U⊥
⟺
⟨u, v⟩ = 0 for all u ∈ U.

U⊥ = ⋂_{u ∈ U} { v ∈ V : ⟨u, v⟩ = 0 },

an intersection of hyperplanes passing through the origin.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 17


A Theorem

Theorem.For V a finite-dimensional inner product space and


U a subspace of V ,
(i) U⊥ is a subspace,
(ii) V = U ⊕ U⊥ (i.e. U⊥ is a complement),
(iii) (U⊥)⊥ = U.

Proof.
(i) Exercise for you!
(Intersection of hyperplanes through the origin?)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 18


(ii)

Take an orthonormal basis

{u1 , . . . , ur }

for U .
Extend to an orthonormal basis of V

{u1, . . . , ur, ur+1, . . . , un}.

For v ∈ V write

v = (α1 u1 + . . . + αr ur) + (αr+1 ur+1 + . . . + αn un),
       (first group ∈ U)        (second group ∈ U⊥)

the second group lying in U⊥ because the basis vectors are mutually orthogonal.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 19
Cont’d

So V = U + U ⊥ .
Also
u ∈ U ∩ U⊥  ⇒  ⟨u, u⟩ = 0 (as u ∈ U and u ∈ U⊥)  ⇒  u = 0,

so U ∩ U ⊥ = {0}

MA212 – Lecture 35 – Tuesday 20 February 2018 page 20



(iii) For this, note that U ⊆ (U⊥)⊥.

(Query: why?... see next slide.) So all we need is to check that

dim(U) = dim((U⊥)⊥).

But, by (ii), both

dim(V) = dim(U) + dim(U⊥)

and

dim(V) = dim(U⊥) + dim((U⊥)⊥).

Subtracting the equations,

0 = dim(U) − dim((U⊥)⊥).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 21
Query answered

U ⊆ (U⊥)⊥

Suppose u ∈ U and w ∈ U ⊥ ; we are to prove that u ⊥ w . But


w ∈ U ⊥ i.e. w ⊥ u , so

hu, wi = hw, ui = 0

MA212 – Lecture 35 – Tuesday 20 February 2018 page 22


Illustration with U a line or plane in R3

[Figure: V = R³. First U a line with W = U⊥ a plane; then U a plane with W = U⊥ a line.]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 23
Application: a Duality result

 
Take A = [a, b]  (1 × 2), so A⊤ = (a, b)⊤  (2 × 1). Notice that, first,

N(A) = { (x, y)⊤ : ax + by = 0 } = { (x, y)⊤ : ⟨(x, y)⊤, (a, b)⊤⟩ = 0 } = Lin{ (a, b)⊤ }⊥

and, second, for t a scalar,

v = t (a, b)⊤ ∈ R(A⊤).
MA212 – Lecture 35 – Tuesday 20 February 2018 page 24
Here’s something important

But

0 = ⟨ (a, b)⊤, (x, y)⊤ ⟩

says that

(a, b)⊤ ⊥ (x, y)⊤.

So

R(A⊤) = N(A)⊥.

True for any matrix! (Not just 1 × 2.)

Equivalently:

R(A)⊥ = N(A⊤)

(since (U⊥)⊥ = U).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 25


Projection

If V is a vector space, a linear transformation T : V → V is a projection if

T(T(v)) = T(v) for v ∈ V,

i.e.

T² = T.

When the second power yields just the same as the first, one says "idempotent"!

MA212 – Lecture 35 – Tuesday 20 February 2018 page 26


Theorem

Theorem. If T : V → V is a projection, then

V = R(T ) ⊕ N (T )

In words: the kernel/nullspace is a complement of the range.

[Figure: v decomposed as T v ∈ R(T) plus (I − T)v ∈ N(T).]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 27


Comment and Proof

Indeed the Theorem says the picture correctly shows v as the


parallelogram sum of T v and v − T v :

v = T v + (v − T v) = T v + (I − T) v,   with T v ∈ R(T) and (I − T) v ∈ N(T).

To see the latter inclusion, notice that

T((I − T)v) = (T − T²)(v) = 0,   as T = T².

MA212 – Lecture 35 – Tuesday 20 February 2018 page 28


Proof Proper

For v ∈ V

v=Iv (identity)
= T v + (I − T )v

Here, T v ∈ R(T) (clear) and (I − T)v ∈ N(T). Why? Well,

T((I − T)v) = T v − T(T(v)) = T v − T² v = T v − T v = 0.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 29


Cont’d

So V = R(T ) + N (T ) . So far only ‘vector sum’ is proved,


that is, (i) .
If v ∈ R(T ) ∩ N (T ) then:

T v = 0,

and

v = T w, say (for some w), so T v = T (T w) = T w = v.

So v = 0 , as T v = 0 , proving (ii) .
So we conclude V = R(T ) ⊕ N (T ) .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 30


Converse: Direct sums give projections

If V =U ⊕W

Then v ∈ V may be written uniquely as

v =u+w with u ∈ U, w ∈ W.

Consider the map taking v to the unique u as above, i.e.


v 7→ u =: T (v). Then w = (I − T )v

As u = u + 0, we get T u = u for u ∈ U, and so T² = T.

Here we say that T maps onto U ‘parallel’ to W


MA212 – Lecture 35 – Tuesday 20 February 2018 page 31
EXAMPLE/QUESTION:

Suppose

V = R³,   U = Lin{ v1, v2 }   with v1 = (1, 0, 1)⊤, v2 = (0, 1, −1)⊤,

W = Lin{ v3 }   with v3 = (2, 1, 0)⊤.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 32


What matrix represents...

... the projection T onto U parallel to W ?


ANSWER
Here V = U ⊕ W ... why?
Answer: v1, v2, v3 are three linearly independent vectors.
Now by definition of projection:
T v1 = v1 , T v2 = v2 , T v3 = 0 so T has these three as
eigenvectors with eigenvalues λ1 = 1 , λ2 = 1 , λ3 = 0
So relative to (v1, v2, v3), T is represented by

D = diag(1, 1, 0).
MA212 – Lecture 35 – Tuesday 20 February 2018 page 33
Change of base now

Change to the basis (v1 , v2 , v3 ) with base change matrix:

S = (v1, v2, v3) =
[ 1  0  2 ]
[ 0  1  1 ]
[ 1  −1  0 ]

Suppose P represents T relative to E = (e1 , e2 , e3 ) ; then


(y)E = P (x)E . Recalling (x)E = S(x)S , we see that
S(y)S = P S(x)S and so

S −1 P S = D, so that P = S D S −1 .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 34


Calculation (check!)

 
S⁻¹ =
[ −1  2  2 ]
[ −1  2  1 ]
[ 1  −1  −1 ]

Now

D = diag(1, 1, 0) = (e1, e2, 0).

So SD = (Se1, Se2, S0).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 35


Cont’d

P = S D S⁻¹

  = [ 1 0 2 ; 0 1 1 ; 1 −1 0 ] D S⁻¹   (recall SD = (Se1, Se2, S0))

  = [ 1 0 0 ; 0 1 0 ; 1 −1 0 ] [ −1 2 2 ; −1 2 1 ; 1 −1 −1 ]   (use row operations!)

  = [ −1 2 2 ; −1 2 1 ; 0 0 1 ]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 36
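Numerical aside (our sketch): the computation P = S D S⁻¹ above, checked in NumPy, together with the defining properties of this projection.

import numpy as np

S = np.array([[1, 0, 2],
              [0, 1, 1],
              [1, -1, 0]], dtype=float)    # columns v1, v2, v3
D = np.diag([1., 1., 0.])

P = S @ D @ np.linalg.inv(S)
print(P)                              # [[-1, 2, 2], [-1, 2, 1], [0, 0, 1]]
print(np.allclose(P @ P, P))          # idempotent: True
print(np.allclose(P @ S[:, 2], 0))    # kills v3, i.e. projects parallel to W: True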
Orthogonal Projection onto a subspace U

This is taken to mean ‘parallel to U ⊥ ’, so V = U ⊕ U ⊥ .


We show 2 methods for finding the corresponding P .
Note a characteristic property

Theorem.
P is an orthogonal projection ⇐⇒ P 2 = P and P ⊤ = P .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 37


Proof (⇒)

As

x = I x = P x + (I − P) x.

For P an orthogonal projection we have, for all x and y, ⟨P x, (I − P) y⟩ = 0 (as P x ∈ U and (I − P) y ∈ U⊥), i.e. (P x)⊤ (I − P) y = 0:

0 = x⊤ P⊤ (I − P) y = x⊤ (P⊤ − P⊤P) y

MA212 – Lecture 35 – Tuesday 20 February 2018 page 38


...so..

P⊤ = P⊤P;

transposing,

P = (P⊤P)⊤ = P⊤P⊤⊤ = P⊤P = P⊤.

So P = P⊤: symmetric.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 39


Conversely: (⇐)

Suppose P² = P and P⊤ = P;

then P⊤ = P = P P = P⊤P.

Then, as before,

⟨P x, (I − P) y⟩ = x⊤ (P⊤ − P⊤P) y = 0.

So P is orthogonal!

MA212 – Lecture 35 – Tuesday 20 February 2018 page 40


Recognizing a projection: AN EXAMPLE

 
1 1 2
A=
5 2 4
  
2 1 1 2 1 2
A =
25 2 4 2 4
 
1  5 10
= =A
25 10 20

and also A⊤ = A.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 41


So...

So... A projects orthogonally onto

R(A) = column space of A = Lin{ (1, 2)⊤ }.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 42


EXAMPLE

 
Let V = R³, v1 = (2, 1, 1)⊤ and U = Lin{v1}.

Then U⊥ is spanned by: v2 = (1, −2, 0)⊤, v3 = (0, 1, −1)⊤.

(By inspection! Here v2 and v3 are lin. indep. and both v2 ⊥ v1


and v3 ⊥ v1 .)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 43


Problem

Find P representing the orthogonal projection onto U .

Let T = orthogonal projection onto U . This is easy to represent


using the basis (v1, v2, v3), so put

R = (v1, v2, v3) =
[ 2  1  0 ]
[ 1  −2  1 ]
[ 1  0  −1 ]

Then
T v1 = v1 , T v2 = 0, T v3 = 0 so...

recalling (x)E = R(x)R , we translate (y)R = T (x)R to


(y)E = R(y)R = RT R−1 (x)E

MA212 – Lecture 35 – Tuesday 20 February 2018 page 44


So...

 
P = R diag(1, 0, 0) R⁻¹

  = [ 2 0 0 ; 1 0 0 ; 1 0 0 ] · (1/6) [ 2 1 1 ; 2 −2 −2 ; 2 1 −5 ]

  = (1/6) [ 4 2 2 ; 2 1 1 ; 2 1 1 ]

P⊤ = P here, of course.
MA212 – Lecture 35 – Tuesday 20 February 2018 page 45
To span U ⊥ ...(if not by inspection)

   
x ∈ U⊥  ⟺  (2, 1, 1)⊤ · (x1, x2, x3)⊤ = 0,  i.e.

2x1 + x2 + x3 = 0;

put x1 = t, x3 = u, so x2 = −u − 2t.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 46


Then

     
t 1 0
     
x = −u − 2 t = t −2 +u 
    −1
 

u+ 0 1
| {z } | {z }
v2 −v3

MA212 – Lecture 35 – Tuesday 20 February 2018 page 47


If not by inspection...cont’d

Alternative approach, knowing U ⊥ .


The two vectors

u1 = (1/√6) (2, 1, 1)⊤   (as given)

u2 = (1/√2) (0, 1, −1)⊤   (got by taking t = 0 above)

are orthogonal (u1 ⊥ u2).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 48


Apply Gram-Schmidt to...

 
to u := (1, −2, 0)⊤ (got by taking u = 0 above):
Noting that u ⊥ u1,

w = (1, −2, 0)⊤ − (1/√2)⟨(1, −2, 0)⊤, (0, 1, −1)⊤⟩ (1/√2)(0, 1, −1)⊤ − 0

  = (1, −2, 0)⊤ + (0, 1, −1)⊤

MA212 – Lecture 35 – Tuesday 20 February 2018 page 49


which is ..

 
= (1, −1, −1)⊤;

say u3 = (1/√3) (−1, 1, 1)⊤ = (1/√3)(−w).

Now continue as before, using the basis (u1 , u2 , u3 ) .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 50


A projection formula to remember...

For A (n × m):

A (A⊤A)⁻¹ A⊤,   where A⊤A is m × m (an m×n times n×m product).
Why such a ‘magical’ (‘out of nowhere’) formula?
We’ll see in a later lecture!

MA212 – Lecture 35 – Tuesday 20 February 2018 page 51


Formula for representing orthogonal projection...

V = Rⁿ (reals!)
U = Lin({v1, . . . , vm}),   the vᵢ linearly independent.

Put
A := [v1 | . . . | vm ]n×m

Consider
P = A(A⊤ A)−1 A⊤

MA212 – Lecture 35 – Tuesday 20 February 2018 page 52


Need to check: for this choice of A and P

A⊤ A invertible
P2 = P
P ⊤ = P (obvious)
R(P ) = U = R(A)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 53


A⊤ A invertible

Find N (A⊤ A) :

A⊤ A x = 0 ⇒ x⊤ A⊤ A x = 0
⇒ hA x, A xi = 0
⇒ Ax = 0

so x = 0 as v1 , ..., vm are linearly independent. So A⊤ A is of


full rank and so invertible.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 54


Idempotency and symmetry

P is idempotent: P² = P.

A(A⊤A)⁻¹A⊤ · A(A⊤A)⁻¹A⊤ = A(A⊤A)⁻¹ I A⊤ = P.

P is symmetric: P⊤ = P.

( A(A⊤A)⁻¹A⊤ )⊤ = A⊤⊤ (A⊤A⊤⊤)⁻¹ A⊤ = A(A⊤A)⁻¹A⊤.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 55


R(P ) = R(A)

P x = A [ (A⊤A)⁻¹ A⊤ x ] ∈ R(A),

i.e. R(P) ⊆ R(A).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 56


Converse? ... i.e. R(A) ⊆ R(P ).

we ask:

is A x = P applied to some vector?

Trick of insertion!

A x = A (A⊤A)⁻¹ (A⊤A) x = A(A⊤A)⁻¹A⊤ (A x) = P (A x) ∈ R(P).

So indeed
R(A) ⊆ R(P )

MA212 – Lecture 35 – Tuesday 20 February 2018 page 57


EXAMPLE

V = R³. Use the formula to find the projection orthogonally onto

Lin{ (1, −2, 0)⊤, (0, 1, −1)⊤ }.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 58


Solution

Put

A =
[ 1  0 ]
[ −2  1 ]
[ 0  −1 ]   (3 × 2).

A⊤A = [ 1 −2 0 ; 0 1 −1 ] A = [ 5 −2 ; −2 2 ]   (2 × 2)

(now swap and sign-switch, dividing by det = 6):

(A⊤A)⁻¹ = (1/6) [ 2 2 ; 2 5 ]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 59
Using P = A(A⊤A)⁻¹A⊤

 
P = [ 1 0 ; −2 1 ; 0 −1 ] · (1/6) [ 2 2 ; 2 5 ] · [ 1 −2 0 ; 0 1 −1 ]

  = (1/6) [ 2 2 ; −2 1 ; −2 −5 ] [ 1 −2 0 ; 0 1 −1 ]

  = (1/6) [ 2 −2 −2 ; −2 5 −1 ; −2 −1 5 ]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 60
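Numerical aside (our sketch): the formula P = A(A⊤A)⁻¹A⊤, applied to the A of this example.

import numpy as np

A = np.array([[1, 0],
              [-2, 1],
              [0, -1]], dtype=float)

P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(6 * P))                              # [[2,-2,-2],[-2,5,-1],[-2,-1,5]]
print(np.allclose(P @ P, P), np.allclose(P.T, P))   # idempotent and symmetric: True True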


MA212 Further Mathematical Methods Lecture LA 16

Lecture 36: Best-fits using Projections

Understanding PA = A(AT A)−1 AT


A more general PB = A(BA)−1 B
Best-fitting line through data points
Background Readings

Anthony and Harvey


Chap. 12 (§12.6-12.7)

Adam Ostaszewski
Chap. 4 (§4.7)

MA212 – Lecture 36 – Friday 23 February 2018 page 2


Understanding the aims...

An×m = [v1 , ..., vm ]

with v1 , ..., vm ∈ Rn
i.e. here A is built from columns in Rn

A(AT A)−1 AT

with AT A being m × m

Warning In this lecture always check the sizing of A .


As above, it may vary from the traditional m × n .
(The latter corresponds to A : Rn → Rm .)

MA212 – Lecture 36 – Friday 23 February 2018 page 3


Understanding Projections

We’ll look carefully at the formula


P = A(AT A)−1 AT
(AT A) is called a Gram matrix
and is very like a covariance matrix.

L = (AT A)−1 AT is best remembered by the formula

LA = I,   since (A⊤A)⁻¹A⊤ · A = I,

so L stands for ‘Left of A ’

MA212 – Lecture 36 – Friday 23 February 2018 page 4


The Gram matrix ... like covariance

If A (n × k) = [v1, ..., vk], then

A⊤A = [ v1⊤ ; . . . ; vk⊤ ] [v1, ..., vk].

The ij element is ⟨vi, vj⟩.

Hence AT A is like a covariance matrix, if we regard v1 , ..., vk as


‘random variables’.

MA212 – Lecture 36 – Friday 23 February 2018 page 5


More importantly:

If An×k = [v1 , ..., vk ] has rank k then AT A has rank k and is


square of size k × k
[why?]
and then L = (AT A)−1 AT exists and gives

LA = I

So L is a left inverse for A.

MA212 – Lecture 36 – Friday 23 February 2018 page 6


Comments

If An×k = [v1 , ..., vk ] has a left inverse L , i.e. with


Lk×n An×k = Ik×k , then the rank of A is k :
to see this check the nullity:

Ax = 0 =⇒ x =Ix = LAx = 0

so N (A) = {0} i.e. nullity (A) = 0 , and so rank (A) = k, as


dim-dom = k.

MA212 – Lecture 36 – Friday 23 February 2018 page 7


Conclusion: a criterion

An×k = [v1 , ..., vk ] has a left inverse ⇔ k = rank(A).

MA212 – Lecture 36 – Friday 23 February 2018 page 8


Comments

1. So P = A(AT A)−1 AT has the property that

P = AL and LA = I.

Of course

(AL)(AL) = A (LA) L = AL.

So P is a projection; it is orthogonal because

P T = [A(AT A)−1 AT ]T = AT T ((AT A)T )−1 AT = A(AT A)−1 AT = P.

MA212 – Lecture 36 – Friday 23 February 2018 page 9


Cont’d

2. If there is a right inverse R , i.e. with

AR = I

then

(RA)(RA) = R (AR) A = RA.

So P = RA is also a projection.
Then
RT AT = I

so RT is a left inverse for AT and so rank ((AT )n×m ) = m


(Here A is m × n .)

MA212 – Lecture 36 – Friday 23 February 2018 page 10


Conclusion

Am×n has a right inverse iff rank(A) = m.

In summary Al×r
has a left inverse if rank = r
has a right inverse if rank = l

MA212 – Lecture 36 – Friday 23 February 2018 page 11


Another Formula

From An×m = [v1 , ..., vm ] yielding

(AT A)m×m and A(AT A)−1 AT

To
PB := A(Bm×n An×m )−1 B

(Here B is the same size as AT .)


Here

PB2 = A(BA)−1 B.A(BA)−1 B = A(BA)−1 B = PB ,

so again we get a projection, though perhaps not an orthogonal


one.

MA212 – Lecture 36 – Friday 23 February 2018 page 12


The generalization: its aim

Suppose that
Rn = U ⊕ W

with U = Lin{v1 , ..., vm } using a basis of lin. indep. vectors for


U. Then dim W = n − m .

[Figure: Rⁿ split as U ⊕ W, with U of dimension m and W of dimension n − m.]

The aim is to have U = R(A) and W = N (B) and for PB to


project parallel to W onto U .

MA212 – Lecture 36 – Friday 23 February 2018 page 13


Want W = N (B) = {x : Bx = 0}

So B provides a bunch of equations via its rows, assumed lin.


indep. (to avoid redundancy). How many rows? We’ll see it must
be m . Denoting the normal vectors n1 , ..., nm we get:

⟨n1, x⟩ = n1⊤ x = 0
  ...
⟨nm, x⟩ = nm⊤ x = 0

[Figure: x lying in the hyperplane through 0 with normal nᵢ.]

MA212 – Lecture 36 – Friday 23 February 2018 page 14


So..

So A, B take the format:

U = R([v1, ..., vm]) = R(A (n×m))   (lin. indep. cols!)

W = N( [ n1⊤ ; . . . ; nm⊤ ] ) = N(B (m×n)).

MA212 – Lecture 36 – Friday 23 February 2018 page 15


How to get B?

W = N (Bm×n ) (Wanted)
W ⊥ = N (Bm×n )⊥ (Equivalently)
= R((Bm×n )T ) (Duality: R(B T ) = N (B)⊥ )
= column space of B T

Task now is: Find columns spanning W ⊥ .


Example to follow in a moment.

MA212 – Lecture 36 – Friday 23 February 2018 page 16


Why m rows in B ? (Recall A has m lin. indep. columns)

Rⁿ = U ⊕ W = R(A) ⊕ N(B)

rank(A) = m ⇔ nullity(B) = n − m ⇔ rank(B) = m.

Put Pn×n = An×m (Bm×n An×m )−1 Bm×n

Then P projects onto U parallel to W. Why?

It projects because P² = A(BA)⁻¹ (BA) (BA)⁻¹ B = A(BA)⁻¹B = P;

onto: R(P) = R(A(BA)⁻¹B) ⊆ R(A),

and R(A) ⊆ R(P) because PA = A(BA)⁻¹BA = A, so A x = P A x ∈ R(P).
MA212 – Lecture 36 – Friday 23 February 2018 page 17
Cont’d: ... and parallel to W

Bx = 0 =⇒ P x = A(BA)−1 .Bx = 0 so N (B) ⊆ N (P )

To improve the inclusion to equality we compute the dimensions


of each side:

n − m = dim N(B) ≤ dim N(P) = n − rank(P) = n − m.

So same dimension, so

W = N (B) = N (P ).

MA212 – Lecture 36 – Friday 23 February 2018 page 18


Example

    
 
  1 
  1
 
U = Lin  0  so A = 
  0
 


 

 
1 1
   
 
 1   0 
 
W = Lin  1 , 1 
   

 
 
0 3

MA212 – Lecture 36 – Friday 23 February 2018 page 19


We find W ⊥ = N (Bm×n )⊥

x ∈ W⊥ ⟺

(1, 1, 0)⊤ · x = 0, i.e. x1 + x2 = 0, and

(0, 1, 3)⊤ · x = 0, i.e. x2 + 3x3 = 0;

x = (3x3, −3x3, x3)⊤ = x3 (3, −3, 1)⊤.

MA212 – Lecture 36 – Friday 23 February 2018 page 20
Example cont’d

   
 
  3 
  
3

W ⊥
= Lin  −3  = R 
 

 −3 



 

 
1 1
 ⊥  T 
3  3
   
∴W = R  −3  = N    
−3   = N ([3, −3, 1])
   
1 1

by duality.

MA212 – Lecture 36 – Friday 23 February 2018 page 21


Example concluded

 
B A = [3, −3, 1] (1, 0, 1)⊤ = (4)   (1 × 1),

(BA)⁻¹ = 1/4,

P = A (BA)⁻¹ B = (1, 0, 1)⊤ (1/4) [3, −3, 1] = ...

MA212 – Lecture 36 – Friday 23 February 2018 page 22


...again

   
P = (1/4) (1, 0, 1)⊤ [3, −3, 1] = (1/4)
[ 3  −3  1 ]
[ 0  0  0 ]
[ 3  −3  1 ]

MA212 – Lecture 36 – Friday 23 February 2018 page 23
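Numerical aside (our sketch): the oblique projection P = A(BA)⁻¹B of this example, with U = R(A) and W = N(B).

import numpy as np

A = np.array([[1.], [0.], [1.]])     # column spanning U
B = np.array([[3., -3., 1.]])        # row whose null space is W

P = A @ np.linalg.inv(B @ A) @ B
print(4 * P)                         # [[3,-3,1],[0,0,0],[3,-3,1]]
print(np.allclose(P @ P, P))         # a projection: True
w = np.array([1., 1., 0.])           # a vector of W
print(np.allclose(P @ w, 0))         # projected parallel to W: True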


Take ... a look at BAx = B(Ax)

U = Lin{u1 , ..., um } = R([u1 , ..., um ]) = R(An×m ) ⊆ Rn


and define T : U → Rm by

T (u) = Bm×n u for u ∈ U ⊆ Rn

(Bear in mind that any u in U is of the form u = Ax .)


[Diagram: T defined on U = R(Ã) = R(A).]

N(T) = N(B) ∩ U = W ∩ U = {0}, because Rⁿ = U ⊕ W.

So rank(T) = rank(T) + nullity(T) = dim(U) = m,   since nullity(T) = 0.
MA212 – Lecture 36 – Friday 23 February 2018 page 24
We inspect...

... the range of T. We see that y ∈R(BA) implies y =BAx for


some x, so y =B(Ax) , and
Ax =[u1 , ..., um ]x ∈ Lin{u1 , ..., um } = U so y ∈R(T ) , i.e.

R(BA) ⊆ R(T ).

Now for the converse inclusion ...

MA212 – Lecture 36 – Friday 23 February 2018 page 25


But, conversely for u ∈ U

Given u, find x with u = A x; then

T (u) = T (Ax) for some x ∈ Rm


= B.Ax
= B(Ax) ∈R(BA)

So
R(BA) = R(T )

rank(BA) + nullity(T ) = m,
rank(BA) + 0 = m.

So (BA)m×m has rank m, so has an inverse.

MA212 – Lecture 36 – Friday 23 February 2018 page 26


Nearest point map =Orthogonal Projection

For P an orthogonal projection onto U :



u ∈ U and P v ∈ U  ⇒  u − P v ∈ U.

v − P v ∈ U ⊥ & u − P v ∈ U =⇒ v − P v ⊥ P v − u

By Pythagoras’s Theorem

||v − u||2 = ||v − P v||2 + ||P v − u||2 ≥ ||v − P v||2

So
||v − u|| ≥ ||v − P v||

i.e. P v is the nearest point in U to v (nearest in the sense of least distance).

MA212 – Lecture 36 – Friday 23 February 2018 page 27
‘Least squares approximation’

P v in U is the nearest point of U to the point v.

[Figure: right triangle with hypotenuse from v to u; P v is the foot of the perpendicular from v onto U.]

MA212 – Lecture 36 – Friday 23 February 2018 page 28


Another way to say this:
P v is the best approximation (as measured by the Pythagorean
norm) available from the choice set U : the choice P v
minimizes, over all available u, the least-squares error:

||v − u||2 = |v1 − u1 |2 + |v2 − u2 |2 + ... + |vn − un |2 .

MA212 – Lecture 36 – Friday 23 February 2018 page 29


Example:

Given the Experimental Data:

True value x 0 3 5 8 10
Observation y 2 5 6 9 11

‘Theory’ suggests this data should be modelled linearly as:

y = ax + b

for a, b constants. The first two readings give:


2 = 0 + b,   5 = 3a + b   ⇒   a = 1, b = 2.

MA212 – Lecture 36 – Friday 23 February 2018 page 30


So a = 1, b = 2, i.e. y = x + 2.

But, this is inconsistent with the other three readings:

True value x 0 3 5 8 10
Observation y 2 5 6 9 11

What should be done?

MA212 – Lecture 36 – Friday 23 February 2018 page 31


Reading errors?

Suppose the y readings have errors and should be replaced by


cleaned up values y ∗ (x); so

y ∗ (x) = y(x) − e(x),

with e(x) subtracting off the ‘measurement error’. Then


y = y∗ + e, and

y∗ = ( y∗(0), y∗(3), y∗(5), y∗(8), y∗(10) )⊤ = a (0, 3, 5, 8, 10)⊤ + b (1, 1, 1, 1, 1)⊤.
MA212 – Lecture 36 – Friday 23 February 2018 page 32
So

   

 0 1 


    


  


 3  
  1



   
y ∈ Lin 

 5  ,
 
 1 


    

  

  8  
 1 



 

 
10 1

and
y∗ = y − e

MA212 – Lecture 36 – Friday 23 February 2018 page 33


Construct the orthogonal projection matrix onto

   

 0 1 


    


  


 3  
  1



   
U = Lin  5  ,
 
 1  ⊆ R5


    

  

  8  
 1 



 

 
10 1

Then the smallest error occurs when

y∗ = P y.

MA212 – Lecture 36 – Friday 23 February 2018 page 34


Put

 
A =
[ 0  1 ]
[ 3  1 ]
[ 5  1 ]
[ 8  1 ]
[ 10  1 ]

wanted: y∗ = A (a∗, b∗)⊤.

MA212 – Lecture 36 – Friday 23 February 2018 page 35


So

P = A (A⊤A)⁻¹ A⊤ = A L.

A⊤A = [ 0 3 5 8 10 ; 1 1 1 1 1 ] A = [ 198  26 ; 26  5 ]

(of course symmetric).


MA212 – Lecture 36 – Friday 23 February 2018 page 36
Cont’d

 
(A⊤A)⁻¹ = (1/314) [ 5  −26 ; −26  198 ]

MA212 – Lecture 36 – Friday 23 February 2018 page 37


Now solve y∗ = P y with y∗ ∈ U , i.e. y∗ = A[α, β]T

 
A (α, β)⊤ = y∗ = P y = A L y.

Pre-multiply by L:   (LA) (α, β)⊤ = (α, β)⊤ = L (A L y) = L y   (using LA = I and AL = P).

MA212 – Lecture 36 – Friday 23 February 2018 page 38


So

 
(α, β)⊤ = (A⊤A)⁻¹ A⊤ y

MA212 – Lecture 36 – Friday 23 February 2018 page 39


Cont’d: (α, β)T = Ly = (AT A)−1 AT y

 
A⊤ y = [ 0 3 5 8 10 ; 1 1 1 1 1 ] (2, 5, 6, 9, 11)⊤ = (227, 33)⊤

(α, β)⊤ = (A⊤A)⁻¹ A⊤ y = (1/314) [ 5 −26 ; −26 198 ] (227, 33)⊤
        = (1/314) (277, 632)⊤ ≈ (0.88, 2.01)⊤   (not quite (1, 2)⊤).

MA212 – Lecture 36 – Friday 23 February 2018 page 40
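Numerical aside (our sketch): the same best-fit line from the normal equations, and the shortcut np.linalg.lstsq, which solves the least-squares problem directly.

import numpy as np

x = np.array([0., 3., 5., 8., 10.])
y = np.array([2., 5., 6., 9., 11.])
A = np.column_stack([x, np.ones_like(x)])    # columns (x, 1)

coef = np.linalg.inv(A.T @ A) @ (A.T @ y)    # (A^T A)^{-1} A^T y
print(coef)                                  # approx [0.882, 2.013]
print(np.linalg.lstsq(A, y, rcond=None)[0])  # same answer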


What was this about?

[Figure: the data points y(x) plotted against the fitted line y∗(x) = a∗x + b∗.]

The sum of squared (vertical) errors is to be minimized.


MA212 – Lecture 36 – Friday 23 February 2018 page 41
MA212 Further Mathematical Methods Lecture LA 17

Lecture 37: Left-, Right-,


and Generalized-Inverses

L -inverses are determined by their N (L)


R-inverses are determined by their R(R)
Distinct features of L and R
G -inverses created via factorization A = BC
Background Readings:

Anthony and Harvey:


‘Orthogonal projection & best fit’: Chap. 12 (§12.5, 12.6, 12.7)

Adam Ostaszewski:
Inverses Chap. 8 (§8.3,8.4)

MA212 – Lecture 37 – Tuesday 27 February 2018 page 2


Left Inverses Recalled

For An×m of rank m (the size on the right of n × m )

L := (AT A)−1 AT

is a left inverse
LA = I.

Others?
If the m × m product B (m×n) A (n×m) is of rank m for some B, then

L := (BA)⁻¹ B   gives   LA = I.

We will use the latter formula whenever it is easy to spot such a


B.
MA212 – Lecture 37 – Tuesday 27 February 2018 page 3
Q: When is a left inverse unique...?

Answer:...only when extra conditions are slapped on

MA212 – Lecture 37 – Tuesday 27 February 2018 page 4


Left inverse is uniquely determined by its kernel

*Theorem. If An×m has rank m and LA = I = L′ A and


further N (L) = N (L′ ), then

Lm×n = L′m×n

Proof.The two projections P = AL and P ′ = AL′ have the


same range, as R(AL) = R(A) = R(AL′ ) . By assumption both
have the same null space: N (AL) = N (L) = N (L′ ) = N (AL′ ) .
So both have the same direct sum, so are the same projection
(Lecture 35). So P = P ′ , and so

AL = AL′ ⇒ L = I L = (LA) L = L (AL) = L (AL′) = (LA) L′ = I L′ = L′.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 5


Example

Find a left inverse L for


   
A =
[ 1  0 ]
[ 1  1 ]
[ 1  1 ]

with N(L) = W = Lin{ (0, 1, −1)⊤ } ⊆ R³.

This condition makes L unique.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 6


Solution

We apply the last lecture's work taking U = R(A). Now

w := (0, 1, −1)⊤ ∉ R(A)

(by inspection, obvious). So if we take

W = Lin {w} , then W ∩ R(A) = {0}.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 7


Next...with w = (0, 1, −1)T

Next we find W⊥: we find all x s.t. ⟨x, w⟩ = 0, i.e. x s.t.

0 + x2 − x3 = 0:

(x1, x2, x3)⊤ = (x1, x2, x2)⊤ = x1 (1, 0, 0)⊤ + x2 (0, 1, 1)⊤.

W⊥ = R( [ 1 0 ; 0 1 ; 0 1 ] ) = R(B⊤), say.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 8


Using R(B T ) = N (B)⊥

 
W = N( [ 1 0 0 ; 0 1 1 ] ) = N(B);

B A = [ 1 0 0 ; 0 1 1 ] A = [ 1 0 ; 2 2 ],   det = 2.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 9


Cont’d

 
−1 1 2 0 
(BA) = (after a ... swap/re-sign/divide)
2 −2 1

    
−1 1  2 0  1 0 0  1  2 0 0 
L = (BA) B= = .
2 −2 1 0 1 1 2 −2 1 1

MA212 – Lecture 37 – Tuesday 27 February 2018 page 10


So: check

LA = (BA)⁻¹ (BA) = I.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 11
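Numerical aside (our sketch): the left inverse L = (BA)⁻¹B of this example, with kernel N(L) = N(B) = Lin{(0, 1, −1)⊤}.

import numpy as np

A = np.array([[1., 0.], [1., 1.], [1., 1.]])
B = np.array([[1., 0., 0.], [0., 1., 1.]])   # rows spanning W-perp

L = np.linalg.inv(B @ A) @ B
print(2 * L)                                        # [[2,0,0],[-2,1,1]]
print(np.allclose(L @ A, np.eye(2)))                # LA = I: True
print(np.allclose(L @ np.array([0., 1., -1.]), 0))  # w lies in N(L): True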


A freebie

If there is R with AR = I we say that R is a right inverse.


Then
RT AT = I,

so RT is a left inverse.
Take AT to be n × m , then Am×n must have rank m.

Now observe that, by ‘duality’:

N (RT ) = N (R′T ) ⇐⇒ R(R)⊥ = R(R′ )⊥ ⇐⇒ R(R) = R(R′ )

So ...

MA212 – Lecture 37 – Tuesday 27 February 2018 page 12


So ...

*Theorem. If Am×n has rank m and AR = I = AR′ and


further R(R) = R(R′ ), then


R (n×m) = R′ (n×m).

In words: right inverses are uniquely determined by their range.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 13


Use of the right inverse

If Am×n has a right inverse: Am×n Rn×m = I, then one can solve

Ax = b ∈Rm

(for any b) by taking

x = R b,   because A x = (A R) b = I b = b.

So R(A) = Rm (= set of all b ’s), and so rank(A) = m.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 14


What's so good about R = A⊤(AA⊤)⁻¹   (AA⊤ being m × m)

General solution of
Ax = b

is “particular solution plus complementary vector”, i.e.

x = Rb + z for some z ∈ N (A),

because if Ax = b

A(z) = A(x − Rb) = A(x)−AR(b) = b − b = 0.

Now by “duality”

Rb =AT (AAT )−1 b ∈R(AT ) = N (A)⊥

and z ∈ N (A) .
MA212 – Lecture 37 – Tuesday 27 February 2018 page 15
So ...

So Rb ⊥ z.
So, by Pythagoras,

||Rb + z||2 = ||Rb||2 + ||z||2 ≥ ||Rb||2

So x = Rb is the solution with least norm, when


R=AT (AAT )−1 .

MA212 – Lecture 37 – Tuesday 27 February 2018 page 16


For comparison:...

if
L = (AT A)−1 AT ,

then P = AL = A(AT A)−1 AT = P T and so P projects


orthogonally onto R(A),
and v∗ =Ax∗ is the best approximation to solving the
inconsistent system Ax = v.

[Figure: v∗ = P v, the orthogonal projection of v onto U = R(A); the dotted line from v to v∗ is ⊥ to U.]

MA212 – Lecture 37 – Tuesday 27 February 2018 page 17


Summary: for P = AL to project onto U parallel to W

An×m = [v1 , ..., vm ] (i.e. with m columns) has a left inverse iff
rank (A) = m
If (BA)m×m has rank m then (BA)−1 B is a left inverse
Choose B with N (B) = a specified subspace = W .

How?
Method:

Write W⊥ = R(M), i.e. as a column space

⇒ W = (W⊥)⊥ = R(M)⊥ = N(M⊤);

then take B = M⊤.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 18


What if k = rank(A (n×m)) < m, n?

No right or left inverse here.


But we can have a matrix G called a generalized inverse with

AGA = A   and   GAG = G.

Motivation:
(A R) A = A,   R (A R) = R;    A (L A) = A,   (L A) L = L.

How? We show later that we can write

A (n×m) = B (n×k) C (k×m),   k = rk(A) = rk(B) = rk(C),

and ...

MA212 – Lecture 37 – Tuesday 27 February 2018 page 19


Thence ... a generalized G = RL

Then we construct LB and RC with

LB B = I and CRC = I.

Take
G = RC LB

and

MA212 – Lecture 37 – Tuesday 27 February 2018 page 20


Then ...

G A G = RC LB (B C) RC LB = RC [LB B] [C RC] LB = RC LB = G

and

A G A = B C RC LB B C = B [C RC] [LB B] C = B C = A.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 21


Pause for thought (to help later)

If T : U → V is linear, then T preserves or reduces dimension!


That is, for any subspace W ⊆ U,

dim T [W ] ≤ dim W ;

indeed, define S : W → V as the restriction of T to W :

S(w) := T (w) for w ∈ W ,

then dim T[W] = dim R(S) = rank(S), and:

rank(S) + nullity(S) = dim-dom(S) = dim(W),   with nullity(S) ≥ 0,   so

dim(T[W]) = rank(S) ≤ dim(W).

MA212 – Lecture 37 – Tuesday 27 February 2018 page 22


How to factorize A as

A = BC.

Watch the following example:


 
A = [a1, a2, a3, a4, a5] =
[ 1  0  2  −1  4 ]
[ 0  1  −1  0  6 ]
[ 1  1  1  −1  10 ]

We select a set of columns that are lin. indep. and span the
remaining columns: a1, a2 do the trick. (These two columns
are obviously lin. indep.: look at their first component.)
So B = (a1 , a2 ) is a basis for R(A) .

MA212 – Lecture 37 – Tuesday 27 February 2018 page 23


Next ...

Next we compute the co-ordinate columns relative to B of each


of the remaining columns.
Relative to B we see a1 = (e1)B and a2 = (e2)B, and further:

a3 = (2, −1, 1)⊤ = 2a1 − a2 = (2, −1)B;   a4 = −a1 = (−1, 0)B.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 24


So ...

 
a5 = (4, 6, 10)⊤ = 4a1 + 6a2 = (4, 6)B.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 25


Ergo

 
A =
[ 1  0 ]
[ 0  1 ]
[ 1  1 ]
 · [ 1 0 2 −1 4 ; 0 1 −1 0 6 ]   (the co-ordinate cols. rel. B).

If not by inspection, then how? Use row reduction to echelon


form. Here’s another less obvious example:

MA212 – Lecture 37 – Tuesday 27 February 2018 page 26


Another less obvious example

 
A =
[ 1  1  2  4 ]
[ −1  2  −1  −1 ]   R2 + R1
[ 3  0  5  9 ]      R3 − 3R1
[ 1  4  3  7 ]      R4 − R1

→
[ 1  1  2  4 ]
[ 0  3  1  3 ]
[ 0  −3  −1  −3 ]   R3 + R2
[ 0  3  1  3 ]      R4 − R2

MA212 – Lecture 37 – Tuesday 27 February 2018 page 27


Cont’d

 
→
[ 1  1  2  4 ]   R1 − 2R2   (to put to work the 1 in R2)
[ 0  3  1  3 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]

→
[ 1  −5  0  −2 ]
[ 0  3  1  3 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]
 = A′,   so (col 1, col 3) form a basis of the column space of A′.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 28


Next step: we use the same columns of A

So in A take B = (a1, a3); but in A′, since a′2 = −5a′1 + 3a′3 and a′4 = −2a′1 + 3a′3, we have

a2 = (−5, 3)B,   a4 = (−2, 3)B.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 29


So

   
A = B C,   where

B = (a1, a3) =
[ 1  2 ]
[ −1  −1 ]
[ 3  5 ]
[ 1  3 ]

and

C = [ 1  −5  0  −2 ; 0  3  1  3 ].

Now we pass to LB , RC . Notice that row 1 and row 2 of B are


independent: we use this to find P in the next slide.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 30


How about

L = (P B)⁻¹ P,   so that   LB = (P B)⁻¹ (P B) = I,

for some P. This works so long as the 2 × 2 product P (2×4) B (4×2) is invertible!.. i.e. of rank 2.
Think of P B as P executing “row operations” on B ...

MA212 – Lecture 37 – Tuesday 27 February 2018 page 31


So take

   
P = [ 1 0 0 0 ; 0 1 0 0 ] = [ e1⊤ ; e2⊤ ].

Regard P as describing the row operations: "pick rows 1 & 2".

P B = [ 1 0 0 0 ; 0 1 0 0 ]
[ 1  2 ]
[ −1  −1 ]
[ 3  5 ]
[ 1  3 ]
 = [ 1  2 ; −1  −1 ]   (rows 1 and 2 of B).

(See Slide 33.)

(P B)⁻¹ = [ −1  −2 ; 1  1 ]   (swap & re-sign; note det(PB) = −1 + 2 = 1)

MA212 – Lecture 37 – Tuesday 27 February 2018 page 32
Continued: so of course

    
(P B)⁻¹ P = [ −1 −2 ; 1 1 ] [ 1 0 0 0 ; 0 1 0 0 ] = [ −1 −2 0 0 ; 1 1 0 0 ]

MA212 – Lecture 37 – Tuesday 27 February 2018 page 33


Right inverse for C

Here again try for Q(CQ)−1 , and think of Q as performing


column operations on C .
Recall that C contains c1 = e1 and c3 = e2; indeed:

C = [ 1  −5  0  −2 ; 0  3  1  3 ].

So take Q = (e1, e3) to "pick cols 1 & 3"; then CQ = (Ce1, Ce3).

N.B. these ei are in R⁴, and so

Q =
[ 1  0 ]
[ 0  0 ]
[ 0  1 ]
[ 0  0 ]
MA212 – Lecture 37 – Tuesday 27 February 2018 page 34
Then...

then obviously

C Q = [ 1 0 ; 0 1 ] = I,

Q (C Q)⁻¹ = Q = RC.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 35


Conclusion

From A = BC to G = RC LB:

G = RC LB =
[ 1  0 ]
[ 0  0 ]   [ −1  −2  0  0 ]
[ 0  1 ] · [ 1   1   0  0 ]  =
[ 0  0 ]

[ −1  −2  0  0 ]
[ 0   0   0  0 ]
[ 1   1   0  0 ]
[ 0   0   0  0 ]

MA212 – Lecture 37 – Tuesday 27 February 2018 page 36
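Numerical aside (our sketch): the whole construction of this example (A = BC, LB = (PB)⁻¹P, RC = Q(CQ)⁻¹, G = RC LB), verified against the defining equations of a generalized inverse.

import numpy as np

A = np.array([[1, 1, 2, 4],
              [-1, 2, -1, -1],
              [3, 0, 5, 9],
              [1, 4, 3, 7]], dtype=float)
B = A[:, [0, 2]]                        # the independent columns a1, a3
C = np.array([[1., -5., 0., -2.],       # coordinates of all columns rel. B
              [0., 3., 1., 3.]])

P = np.eye(4)[:2]                       # "pick rows 1 & 2"
Q = np.eye(4)[:, [0, 2]]                # "pick cols 1 & 3"
L_B = np.linalg.inv(P @ B) @ P
R_C = Q @ np.linalg.inv(C @ Q)
G = R_C @ L_B
print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G))  # True True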


Comment

Suppose that An×m = Bn×k Ck×m and rk(A) = rk(B) = k.


Then:
1. rk(C) ≤ k because C is k × m
2. In fact: equality, as also k ≤ rk(C), because:

k = rk(A) = dim R(A) = dim R(BC)


= dim B[R(C)] (but B preserves or lowers dimension)
≤ dim R(C) = rk(C)

MA212 – Lecture 37 – Tuesday 27 February 2018 page 37


MA212 Further Mathematical Methods Lecture LA 18

Lecture 38: Pseudo-inverses

Infinite dimensional spaces



ADA = A, DAD = D, + orthogonality

Underlying projections: AD, DA


The "special one"
Lin. Algebra connections with analysis
Some general features, when...

D A (n×m) D = D (m×n)
A D (m×n) A = A (n×m)

We get projections:

(DA)(DA) = D [A D A] = D A,   so DA is a projection in Rᵐ;
(AD)(AD) = [A D A] D = A D,   so AD is a projection in Rⁿ.

Using rank+nullity (below) we can show that each of the two


projections DA, AD has the same rank as A , i.e.

rk(DA) = rk(AD) = rk(D) = rk(A) = k, say.

MA212 – Lecture 38 – Friday 2 March 2018 page 2


This is because:

R(AD) = R(A) and N (AD) = N (D) (1)

N (DA) = N (A) (2)

Indeed

ADx = A(Dx) so R(AD) ⊆ R(A)


A x = [AD](A x)   so R(A) ⊆ R(AD)

and

ADx = 0 =⇒ DADx = 0 =⇒ Dx = 0 =⇒ N (AD) ⊆ N (D)

Dx = 0 =⇒ ADx = 0 =⇒ N (D) ⊆ N (AD).


MA212 – Lecture 38 – Friday 2 March 2018 page 3
Similarly...

A z = 0 ⇒ D A z = 0, so N(A) ⊆ N(DA);
D A z = 0 ⇒ (A D A) z = A z = 0, so N(DA) ⊆ N(A).
=A

MA212 – Lecture 38 – Friday 2 March 2018 page 4


Rank considerations for D

1. From the size of D (m×n):

rk(D) + nullity(D) = n

2. As AD is a projection onto R(AD) = R(A) and


N (AD) = N (D) we get the direct sum decomposition:

Rn = R(AD) ⊕ N (AD) = R(A) ⊕ N (D)

n = rk(A) + nullity(D).

So
rk(D) + nullity(D) = n = rk(A) + nullity(D).

MA212 – Lecture 38 – Friday 2 March 2018 page 5


Features continued:

We get direct sums both in Rn and in Rm :

(1 : from (AD)) Rn = R(A) ⊕ N (D)

and

(2 : from (DA)) Rm = N (A) ⊕ R(D)

Let’s see why:


(1) For x ∈ Rn : put

z = x − A D x:   D z = D x − D A D x = 0, so

x = (A D x) + z,   with A D x ∈ R(AD) and z ∈ N(D).

MA212 – Lecture 38 – Friday 2 March 2018 page 6
Also:

z = A y & D z = 0  ⇒  z = A y = A D (A y) = A (D z) = 0.

So R(A) ∩ N(D) = {0}.

(2) For y ∈ Rᵐ, put

z = y − D A y:   A z = A y − A D A y = 0, so

y = (D A y) + z,   with D A y ∈ R(DA) and z ∈ N(A).

Also, w = D x & A w = 0  ⇒  w = D x = D A (D x) = D (A w) = 0.

So R(D) ∩ N(A) = {0}.

MA212 – Lecture 38 – Friday 2 March 2018 page 7
Conversely: (N.B. N (A) ⊆ Rm and R(A) ⊆ Rn )

Theorem: If

Rⁿ = R(A (n×m)) ⊕ W,
Rᵐ = V ⊕ N(A (n×m)),   with dim V = rk(A),

then there exists a matrix G with

GAG = G and AGA = A,   together with   W = N(G) (e.g. = R(A)⊥) and V = R(G) (e.g. = N(A)⊥),

and G is unique.

This uses previous technology, nothing special: if R(A) ∩ N(B) = {0}, then (BA)⁻¹B · A = I.
MA212 – Lecture 38 – Friday 2 March 2018 page 8
Note about dimV for An×m

rank(A) + nullity(A) = m,
dim V + nullity(A) = m

So
rank(A) = dim V

The specific example with

W = N(G) = R(A)⊥   and   V = R(G) = N(A)⊥

can be arranged, with a unique G.

MA212 – Lecture 38 – Friday 2 March 2018 page 9


Comment

We write for k = rk(A)

A (n×m) = Ã (n×k) C (k×m),   with R(Ã) = R(A)

(both of rank k; obvious because rk = k).

−→ LÃ and RC with N (A) = N (C)

Indeed, the last equality holds because

Ax = 0 =⇒ ÃCx = 0 =⇒ LÃ ÃCx = 0 =⇒ Cx = 0,

and t’other way about:

C x = 0 ⇒ Ã C x = 0,   i.e.   A x = 0   (as A = ÃC).

MA212 – Lecture 38 – Friday 2 March 2018 page 10


This is followed by ‘basis bashing’:

Recall that our aim is to get:

Rⁿ = R(Ã (n×k)) ⊕ W,

Rᵐ = R(H) ⊕ N(C)   (R(H) = V for some H; N(C) = N(A)),

where Ã (n×k) is a submatrix with k indep. columns spanning the column space of A (n×m), so that then R(Ã) = R(A).

MA212 – Lecture 38 – Friday 2 March 2018 page 11


With

W = N(B)   (cf. Lecture 36)

LÃ = (B Ã)⁻¹ B   gives   LÃ Ã = I.

MA212 – Lecture 38 – Friday 2 March 2018 page 12


Here

N(A) = N(C) because, let's recall from page 10,

0 = A x = Ã C x  ⇒  LÃ (Ã C x) = C x = 0 (as LÃ Ã = I)  ⇒  Ã C x = A x = 0.

So take H = [v1 , ..., vk ] with k = dim V and with


V = Lin{v1 , ..., vk } = R(H) .
By assumption

V ∩ N(C) = {0},   equivalently   R(H) ∩ N(C) = {0}.

R = RC = H (C H)⁻¹   gives   C R = I.

MA212 – Lecture 38 – Friday 2 March 2018 page 13


Now take

G = RL

These choices do it!

MA212 – Lecture 38 – Friday 2 March 2018 page 14


Puzzle: why B Ã, with B as in slide 12, has rank k?

Consider T : R(Ã) → Rᵏ defined by restriction:

T(Ã x) = B Ã x

for B chosen with N(B) = W. Here, as R(Ã) = R(A),

N(T) = N(B) ∩ R(A) = W ∩ R(A) = {0},

and so nullity(T) = 0; therefore

rk(T) + nullity(T) = dim R(A) = k,   so rk(T) = k.

So dim R(B Ã) = k.


MA212 – Lecture 38 – Friday 2 March 2018 page 15
Illustration ?

To come ... in Slide 26.

MA212 – Lecture 38 – Friday 2 March 2018 page 16


Special Case via Gram matrix

k = rank(A) = rk(B) = rk(C)

A (n×m) = B (n×k) C (k×m),   G = RC LB,
RC = C⊤ (C C⊤)⁻¹,
LB = (B⊤ B)⁻¹ B⊤.

G = C⊤ (C C⊤)⁻¹ (B⊤ B)⁻¹ B⊤   ( = R L )

A G = B C · C⊤(C C⊤)⁻¹ (B⊤ B)⁻¹ B⊤ = B (B⊤ B)⁻¹ B⊤,   which is symmetric!

G A = C⊤(C C⊤)⁻¹ (B⊤ B)⁻¹ B⊤ · B C = C⊤ (C C⊤)⁻¹ C,   which is symmetric!

MA212 – Lecture 38 – Friday 2 March 2018 page 17
Special Case Cont’d. Conclusion 1: in Rn

Conclusion 1: AG projects orthogonally onto R(A) = R(AG)


so parallel to N (AT ) = R(A)⊥ = N (G) .

See Figure:

[Figure: in Rⁿ, A G projects onto R(A) = R(AG) parallel to N(A⊤) = N(G); A maps Rᵐ → Rⁿ, G maps Rⁿ → Rᵐ.]
MA212 – Lecture 38 – Friday 2 March 2018 page 18
Conclusion 2: in Rm
GA projects orthogonally parallel to N(A).

Here N(A) = N(GA),
so N(A)⊥ = N(GA)⊥ = R((GA)⊤) = R(GA)   (as GA is symmetric).

[Figure: in Rᵐ, G A projects onto R(GA) = R(A⊤) parallel to N(A).]
MA212 – Lecture 38 – Friday 2 March 2018 page 19
Comment on N (G) = R(A)⊥

We show below that

N(G) = N(LB) = N(B⊤) = R(B)⊥ = R(A)⊥,

the second-to-last step by duality; of course R(A) = R(B) by choice of B.

G x = 0 ⇒ R L x = 0 ⇒ (C R) L x = L x = 0   (as C R = I);

L x = 0 ⟺ (B⊤B)⁻¹ B⊤ x = 0 ⟺ B⊤ x = 0   (substituting for L);

and B⊤ x = 0 ⇒ L x = 0 ⇒ R L x = G x = 0.

MA212 – Lecture 38 – Friday 2 March 2018 page 20


Comment on R(G) = N (A)⊥

R(G) = R(C⊤) = N(C)⊥ = N(A)⊥

((1) below; then duality; then (2) overleaf: the last equality says that N(C) = N(A)).

(1): G x = C⊤ [ (C C⊤)⁻¹ L x ],   so R(G) ⊆ R(C⊤);

C⊤ y = C⊤ [ (C C⊤)⁻¹ (LB B) C C⊤ ] y   (inserting LB B = I)
     = [ C⊤ (C C⊤)⁻¹ L ] (B C C⊤ y)
     = G (B C C⊤ y),   so R(C⊤) ⊆ R(G).

MA212 – Lecture 38 – Friday 2 March 2018 page 21
Cont’d

... and for (2) argue that

A x = 0 ⇒ (B C) x = 0 ⇒ LB (B C) x = C x = 0 ⇒ (B C) x = 0,

... i.e. Ax = 0 back to where we started, so

N (A) = N (C).

So the implications are all equivalences.

MA212 – Lecture 38 – Friday 2 March 2018 page 22


The Strong Generalized Inverse

The unique G such that

 
GAG = G and AGA = A,   together with   N(G) = R(A)⊥ and R(G) = N(A)⊥   [i.e. 'Orthogonality']

is called the
Strong Generalized Inverse

or the
Moore-Penrose pseudo-inverse

after Roger Penrose the theoretical physicist (=mathematician)


Construction: page 17
(For the *uniqueness proof see AO page 106.)
MA212 – Lecture 38 – Friday 2 March 2018 page 23
How about the right and left inverses?

If An×m has rank m ≤ n, then the left inverse defined by

L = (AT A)−1 AT

is its strong generalized inverse, since

ALA = A(LA) = A and LAL = (LA)L = L

and both of AL = A(A⊤A)⁻¹A⊤ and LA = I are symmetric,

so these projections are orthogonal, being symmetric.

MA212 – Lecture 38 – Friday 2 March 2018 page 24


Similarly, ...

If An×m has rank n ≤ m, then the right inverse defined by

R = A⊤(AA⊤)⁻¹

is its strong generalized inverse, since

ARA = (AR)A = A and RAR = R(AR) = R

and both of RA = A⊤(AA⊤)⁻¹A and AR = I are symmetric.

MA212 – Lecture 38 – Friday 2 March 2018 page 25


Farewell Example: Find the pseudo-inverse of A

Know the drill:

1. Factorize A = BC . 2. Compute LB = (B T B)−1 B T .


3. Compute RC = C⊤(CC⊤)⁻¹.   4. Compute G = R L.

Off you go ...


 
A = [a1, a2, a3] =
[ 1  0  2 ]
[ −1  1  0 ]
[ 0  1  2 ]
 ,   where a3 = 2(a1 + a2).

The dependencies noted by observation. To get this


systematically: reduce to row echelon, as previously in Lecture
37.
MA212 – Lecture 38 – Friday 2 March 2018 page 26
Check: Row reduction

 
[ 1  0  2 ]
[ −1  1  0 ]   R2 → R2 + R1
[ 0  1  2 ]

→
[ 1  0  2 ]
[ 0  1  2 ]
[ 0  1  2 ]   R3 → R3 − R2

→
[ 1  0  2 ]
[ 0  1  2 ]
[ 0  0  0 ]

c3 = 2(c1 + c2)   (as c1 = e1, c2 = e2).

MA212 – Lecture 38 – Friday 2 March 2018 page 27


Cont’d

 
A =
[ 1  0 ]
[ −1  1 ]
[ 0  1 ]
 · [ 1  0  2 ; 0  1  2 ] = B C.

Here B = [a1 , a2 ]; relative to the basis B the matrix C identifies


the co-ordinate columns relative to B of a1 , a2 , a3 . So
C = [e1 , e2 , 2(e1 + e2 )].

MA212 – Lecture 38 – Friday 2 March 2018 page 28


Cont’d

 
  1 0  
1 −1 0   2 −1
B B = 
T   −1 1  =  
 
0 1 1 −1 2
0 1
 
1 2 1 
(B T B)−1 = ... swap & re-sign, and det=3
3 1 2

    
T −1 T 1  2 1   1 −1 0  1  2 −1 1 
LB = (B B) B = =
3 1 2 0 1 1 3 1 1 2

MA212 – Lecture 38 – Friday 2 March 2018 page 29


Cont’d

 
  1 0  
1 0 2  5 4
CC T
=   0 1  =  
 
0 1 2 4 5
2 2
 
1  5 −4 
(CC T )−1 =
9 −4 5

up above ... swap & re-sign, and det=25-16=9

MA212 – Lecture 38 – Friday 2 March 2018 page 30


So

 
RC = C⊤ (C C⊤)⁻¹ =
[ 1  0 ]
[ 0  1 ]  · (1/9) [ 5  −4 ; −4  5 ]
[ 2  2 ]
 = (1/9)
[ 5  −4 ]
[ −4  5 ]
[ 2  2 ]

MA212 – Lecture 38 – Friday 2 March 2018 page 31


G = RL now

 
G = R L = (1/9)
[ 5  −4 ]
[ −4  5 ]  · (1/3) [ 2  −1  1 ; 1  1  2 ]
[ 2  2 ]
 = (1/27)
[ 6  −9  −3 ]
[ −3  9  6 ]
[ 6  0  6 ]

MA212 – Lecture 38 – Friday 2 March 2018 page 32
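Numerical aside (our sketch): the drill of this example in NumPy, ending with a comparison against np.linalg.pinv, which computes the Moore-Penrose pseudo-inverse directly.

import numpy as np

A = np.array([[1., 0., 2.],
              [-1., 1., 0.],
              [0., 1., 2.]])
B = A[:, :2]                                  # the independent columns a1, a2
C = np.array([[1., 0., 2.], [0., 1., 2.]])    # coordinates: a3 = 2(a1 + a2)

L_B = np.linalg.inv(B.T @ B) @ B.T
R_C = C.T @ np.linalg.inv(C @ C.T)
G = R_C @ L_B
print(np.round(27 * G))                       # [[6,-9,-3],[-3,9,6],[6,0,6]]
print(np.allclose(G, np.linalg.pinv(A)))      # agrees with NumPy: True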


Function Spaces

MA212 – Lecture 38 – Friday 2 March 2018 page 33


Examples of Function Spaces

We’ve seen many of these before:

F [0, 1] := {f : f is a function from [0, 1] → R}

C[0, 1] := {f ∈ F [0, 1] : f is continuous}

D[0, 1] or C 1 [0, 1] := {f ∈ F [0, 1] : f is differentiable}

S[0, 1] aka C ∞ [0, 1]

C ∞ [0, 1] := {f ∈ F [0, 1] : f is diff’ntable infinitely many times}

P[0, 1] := {f ∈ F [0, 1] : f is a polynomial}

P[0, 1] ⊆ S[0, 1] ⊆ D[0, 1] ⊆ C[0, 1] ⊆ F [0, 1]

MA212 – Lecture 38 – Friday 2 March 2018 page 34


Inner products

In C[0, 1] we can introduce


⟨f, g⟩ := ∫₀¹ f(t) g(t) dt.

P[0, 1] contains the functions fn(t) ≡ tⁿ; all these are lin. indep. and infinitely differentiable.

MA212 – Lecture 38 – Friday 2 March 2018 page 35


Variations

If C replaces R as co-domain (= range space), then

⟨f, g⟩ := ∫₀¹ f(t) ḡ(t) dt,

using the conjugate.

If R⁺ := [0, ∞) replaces [0, 1] as domain for C(R⁺)... then in the formula for ⟨f, g⟩ the functions f (and likewise g) need to satisfy

∫₀^∞ |f(t)|² dt < ∞,

whereupon we may validly put

⟨f, g⟩ = ∫₀^∞ f(t) ḡ(t) dt.
MA212 – Lecture 38 – Friday 2 March 2018 page 36
If N in place of [0, ∞) , then we obtain the ...
Sequence Spaces:


ℓ² := { a = (a1, a2, ...) : (∀n)(an ∈ R) and Σ_{n=1}^∞ an² < ∞ }.

We saw that ℓ2 is a vector space under pointwise addition (i.e.


co-ordinatewise) and scaling.

MA212 – Lecture 38 – Friday 2 March 2018 page 37


Other Examples

Sequences convergent to zero:

c0 := {a = (a1 , a2 , ...) : (∀n)(an ∈ R and limn an = 0)}

contains the eventually “k”onstant ones

k := {a = (a1 , a2 , ...) : (∀n)(an ∈ R) and eventually an = 0}

k ⊆ c0 , k ⊆ ℓ2 , ℓ2 ⊆ c0 .

In ℓ² we define

⟨a, b⟩ := Σ_{n=1}^∞ an bn.

MA212 – Lecture 38 – Friday 2 March 2018 page 38


Cauchy-Schwarz to infinity

The Cauchy-Schwarz inequality in Rⁿ asserts that

|x · y| ≤ ||x||·||y||,   i.e.   | Σᵢ₌₁ⁿ xᵢ yᵢ | ≤ ( Σᵢ₌₁ⁿ xᵢ² )^{1/2} ( Σᵢ₌₁ⁿ yᵢ² )^{1/2}.

Apply this for any n and xᵢ = |aᵢ|, yᵢ = |bᵢ| to yield

(sₙ :=) Σᵢ₌₁ⁿ |aᵢ|·|bᵢ| ≤ ( Σᵢ₌₁ⁿ aᵢ² )^{1/2} ( Σᵢ₌₁ⁿ bᵢ² )^{1/2} ≤ ( Σᵢ₌₁^∞ aᵢ² )^{1/2} ( Σᵢ₌₁^∞ bᵢ² )^{1/2} < ∞,

for a, b ∈ ℓ².

MA212 – Lecture 38 – Friday 2 March 2018 page 39


Cont’d

Hence {sₙ} is bounded, increasing, and so converges. Hence

limₙ sₙ = Σᵢ₌₁^∞ |aᵢ|·|bᵢ| ≤ ( Σᵢ₌₁^∞ aᵢ² )^{1/2} ( Σᵢ₌₁^∞ bᵢ² )^{1/2} < ∞,

and so, as | Σᵢ₌₁^∞ aᵢ bᵢ | ≤ Σᵢ₌₁^∞ |aᵢ|·|bᵢ|,

|⟨a, b⟩| ≤ ||a||·||b||.

MA212 – Lecture 38 – Friday 2 March 2018 page 40


k and so ℓ2 are infinite dimensional:

Consider

e1 = (1, 0, 0, ...) ∈ k
e2 = (0, 1, 0, ...) ∈ k
etc.

Then ⟨eᵢ, eⱼ⟩ = 0 if i ≠ j, and 1 if i = j.

So {e1 , e2 , ...} is an orthonormal set, hence all are lin. indep.


and indeed span k.

MA212 – Lecture 38 – Friday 2 March 2018 page 41


Some forward planning for Weeks 10 and 11

Exam solutions are on display only for the 3 past years.


(Dept. policy)
Be advised that eventually the 2017 solutions will supplant
the 2014 solutions.
Revision will be centered around the 2017 examination
...solutions and discussion of the context of the questions
or? .. we can re-run some past lectures/ parts of lectures
Please e-mail me your thoughts

MA212 – Lecture 38 – Friday 2 March 2018 page 42


Planning continued

Some general advice as regards old exam papers:

Read them! Don't waste time doing masses of them.
That is: don't waste time on arithmetic.
Instead, ask yourself: do you know how to solve them?
If you don't, ask for help: from me, from Dr Lokka, from Prof
Skokan, from the class teachers...

You have 1 week to e-mail me your thoughts.

MA212 – Lecture 38 – Friday 2 March 2018 page 43


MA212 Further Mathematical Methods Lecture LA 19

Lecture 39: Infinite-dimensional Vector


Spaces

Differences
Similarities
It ain’t necessarily so!

For V finite dimensional, and T : V → V linear:

R(T ) = V ⇔ N (T ) = {0} ⇔ T : V → V invertible

Now in ℓ2 ...
For a = (a1 , a2 , ...) define the shift-map T by putting:

T (a) = (a2 , a3 , ...).

This is linear (Exercise!)


Here R(T ) = ℓ2 but N (T ) = Lin{e1 } where e1 := (1, 0, 0, ..).
Indeed T (a) = (a2 , ...) = 0 iff a2 = 0 = a3 = a4 = ... so
a = (a1 , 0, 0, ...) = a1 e1 .

MA212 – Lecture 39 – Tuesday 6 March 2018 page 2


Furthermore

T(e₁) = 0 = T(0), so T is not injective, hence not invertible.

Now define the reverse shift

T ∗ (a) = (0, a1 , a2 , ...).

This also is linear (Exercise!)


Here R(T*) = {b ∈ ℓ² : b₁ = 0} ≠ ℓ², yet N(T*) = {0}.
Also

T*(a) = e₁ = (1, 0, 0, ...) is not soluble, so T* is not invertible.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 3


More misery

T*T e₁ = T*(T e₁) = T*(0) = 0 ≠ e₁

yet
T T* a = T(0, a₁, a₂, ...) = a for a = (a₁, a₂, ...)

so
T T* = I, but T*T ≠ I.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 4
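Aside (a small sketch of ours, not the lecture's code), modelling a sequence by the finite list of its leading entries:

```python
# Sketch: the shift T and reverse shift T* acting on leading entries.

def T(a):       # T(a) = (a2, a3, ...): drop the first entry
    return a[1:]

def T_star(a):  # T*(a) = (0, a1, a2, ...): prepend a zero
    return [0] + a

e1 = [1, 0, 0, 0]
print(T(T_star(e1)))   # [1, 0, 0, 0]: T T* acts as the identity
print(T_star(T(e1)))   # [0, 0, 0, 0]: T* T e1 = 0, not e1
```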


Eigenvalues?

? T*a = λa for some a ≠ 0, i.e.

(0, a₁, a₂, ...) = (λa₁, λa₂, ...) for some a ≠ 0 ?

implies...

MA212 – Lecture 39 – Tuesday 6 March 2018 page 5


... implies that

T*a = λa:
0 = λa₁
a₁ = λa₂
a₂ = λa₃
...

If λ = 0, then the LHS (T*a = 0, i.e. (0, a₁, a₂, ...) = 0) says a = 0.
If λ ≠ 0, then cancelling λ in the first equation, we first get a₁ = 0,

MA212 – Lecture 39 – Tuesday 6 March 2018 page 6


then in the next ...

we get a₂ = 0, and so on; so a = 0, a contradiction.

T*a = λa (with λ ≠ 0, after cancelling):
0 = a₁
a₁ = λa₂
a₂ = λa₃
...

So no eigenvalues here.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 7


Still more misery:

U ⊆ (U⊥)⊥ ≠ U

Example: In V = ℓ², take U = k = the eventually zero sequences.
e₁ ∈ k, e₂ ∈ k, ...
If a ∈ k⊥, then

0 = ⟨eᵢ, a⟩ = aᵢ for all i, so a = 0, i.e. k⊥ = {0}.

So (k⊥)⊥ = ℓ² ≠ k.
k + k⊥ = k + {0} = k ≠ ℓ²

Why?

MA212 – Lecture 39 – Tuesday 6 March 2018 page 8


Why? Why k ≠ ℓ².

Indeed
$$a = \Big(1, \frac{1}{2}, ..., \frac{1}{n}, ...\Big) \in \ell^2 \quad\text{as } \sum \frac{1}{n^2} < \infty,$$
but $\Big(1, \frac{1}{2}, ..., \frac{1}{n}, ...\Big) \notin k.$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 9


Green shoots of hope

In an inner-product space V :
Suppose that U ⊆ V is a finite-dimensional subspace.
So U = Lin{u1 , u2 , ..., un }, say; suppose indeed these are
orthonormal, so a basis for U. (Can arrange, by applying the
Gram-Schmidt process)
Put
P(v) = ⟨v, u₁⟩u₁ + ⟨v, u₂⟩u₂ + ... + ⟨v, uₙ⟩uₙ

Then P is linear (as e.g. v ↦ ⟨v, u₁⟩ is) and

P(v) ∈ U

⟨v − Pv, uᵢ⟩ = ⟨v, uᵢ⟩ − ⟨Pv, uᵢ⟩

MA212 – Lecture 39 – Tuesday 6 March 2018 page 10


So substituting for P v

$$\langle v - Pv, u_i\rangle = \langle v, u_i\rangle - \big\langle\, \langle v, u_1\rangle u_1 + \langle v, u_2\rangle u_2 + ... + \langle v, u_n\rangle u_n,\ u_i\big\rangle$$

$$= \langle v, u_i\rangle - \big[\langle v, u_1\rangle\langle u_1, u_i\rangle + \langle v, u_2\rangle\langle u_2, u_i\rangle + ... + \langle v, u_n\rangle\langle u_n, u_i\rangle\big]$$

So v − Pv ⊥ U.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 11


Observation

Now inside [...] most terms are zero, by orthogonality. So

$$\langle v - Pv, u_i\rangle = \langle v, u_i\rangle - \langle v, u_i\rangle\langle u_i, u_i\rangle = \langle v, u_i\rangle - \langle v, u_i\rangle\cdot 1 = 0$$

So
$$v = \underbrace{Pv}_{\in U} + \underbrace{(v - Pv)}_{\in U^{\perp}}$$

V = U + U⊥.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 12


In fact for U a finite-dimensional subspace

$$V = \underbrace{U}_{\text{finite dim.}} \oplus\ U^{\perp},$$

because
U ∩ U⊥ = {0};

indeed, if u ∈ U ∩ U⊥ then

⟨u, u⟩ = 0, so u = 0.

So by the above

P(v) = ⟨v, u₁⟩u₁ + ⟨v, u₂⟩u₂ + ... + ⟨v, uₙ⟩uₙ

is the orthogonal projection from V onto U.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 13
Recap

Recall that, as v = u + w uniquely for some u ∈ U and w ∈ U⊥
(from the last page), then P(v) = u.

So for u ∈ U and v ∈ V

$$u - v = \underbrace{(u - Pv)}_{\in U} + \underbrace{(Pv - v)}_{\in U^{\perp}}$$

so by Pythagoras' Theorem

$$\|u - v\|^2 = \|u - Pv\|^2 + \|Pv - v\|^2 \ge \|Pv - v\|^2$$

So Pv is the point of U closest to v.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 14
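Aside (a coordinate sketch of ours for Rᵐ, not the lecture's code), with vectors as plain lists:

```python
# Sketch: P(v) = <v,u1>u1 + ... + <v,un>un for an orthonormal list in R^m,
# plus a check that v - P(v) is orthogonal to the subspace.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(v, orthonormal):
    p = [0.0] * len(v)
    for u in orthonormal:
        c = dot(v, u)                                # coefficient <v, u>
        p = [pi + c * ui for pi, ui in zip(p, u)]
    return p

u1, u2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]            # orthonormal basis of U
v = [3.0, 4.0, 5.0]
Pv = project(v, [u1, u2])
print(Pv)                                            # [3.0, 4.0, 0.0]
print(dot([a - b for a, b in zip(v, Pv)], u1))       # 0.0: v - Pv ⊥ U
```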


Illustration

[Figure: a point v above the plane U, its projection Pv in U, and another point u of U; the segment from u to v is the hypotenuse of the right-angled triangle with vertices u, Pv and v.]

MA212 – Lecture 39 – Tuesday 6 March 2018 page 15
Example 1

Take V = ℓ²

U = U(n) = Lin{e₁, e₂, ..., eₙ} = {(a₁, a₂, ..., aₙ, 0, 0, 0, ...) : a₁, a₂, ..., aₙ ∈ R}

P(a) = ⟨a, e₁⟩e₁ + ⟨a, e₂⟩e₂ + ... + ⟨a, eₙ⟩eₙ

P((a₁, a₂, ...)) = a₁e₁ + a₂e₂ + ... + aₙeₙ = (a₁, a₂, ..., aₙ, 0, 0, 0, ...)

This keeps only the first n entries of a.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 16


So ... with V = ℓ2

$$\|u - a\|^2 = (a_1 - u_1)^2 + ... + (a_n - u_n)^2 + a_{n+1}^2 + a_{n+2}^2 + ...$$
$$\ge a_{n+1}^2 + a_{n+2}^2 + ... \quad\text{(dropping the first } n \text{ terms)}$$
$$= \|a - Pa\|^2 = \|Pa - a\|^2$$

So the point Pa ∈ U is the point u of U closest to a.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 17


Example 2

Here V = C[0, 1]
$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$$

U = Lin{1, t, t²}

The Gram-Schmidt process yields {g₁, g₂, g₃} (see the next slides):

g₁(t) ≡ 1,

g₂(t) ≡ √3 (2t − 1)

g₃(t) ≡ √5 (6t² − 6t + 1)

P(f) = P_U(f) = ⟨f, g₁⟩g₁ + ⟨f, g₂⟩g₂ + ⟨f, g₃⟩g₃

MA212 – Lecture 39 – Tuesday 6 March 2018 page 18
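Aside: the next slides derive g₁, g₂, g₃ by hand; here is a sketch of ours of the same Gram-Schmidt run, assuming the sympy library is available (illustrative, not course code):

```python
# Sketch: Gram-Schmidt on {1, t, t^2} under <f, g> = integral_0^1 f g dt.
import sympy as sp

t = sp.symbols('t')

def ip(f, g):
    return sp.integrate(f * g, (t, 0, 1))

ortho = []
for f in [sp.Integer(1), t, t ** 2]:
    for g in ortho:
        f = f - ip(f, g) * g                          # remove earlier components
    ortho.append(sp.expand(f / sp.sqrt(ip(f, f))))    # normalize

print(ortho)  # equivalent to [1, sqrt(3)(2t - 1), sqrt(5)(6t^2 - 6t + 1)]
```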


Doing it, doing it

$$\|1\|^2 = \int_0^1 1^2\,dt = 1, \quad\text{so } g_1 \equiv 1$$

$$\langle t, g_1\rangle = \int_0^1 t\cdot 1\,dt = \Big[\frac{t^2}{2}\Big]_0^1 = \frac{1}{2}$$

$$t - \langle t, g_1\rangle g_1 = t - \frac{1}{2} \perp g_1$$

$$\Big\|t - \frac{1}{2}\Big\|^2 = \int_0^1 \Big(\underbrace{t - \tfrac{1}{2}}_{=s}\Big)^2 dt = \int_{-1/2}^{1/2} s^2\,ds = 2\int_0^{1/2} s^2\,ds = 2\Big[\frac{s^3}{3}\Big]_{s=1/2} = \frac{1}{12}$$

$$g_2 = \Big(\frac{1}{12}\Big)^{-1/2}\Big(t - \frac{1}{2}\Big) = 2\sqrt{3}\Big(t - \frac{1}{2}\Big) = \sqrt{3}\,(2t - 1).$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 19
We now find g₃:

$$\langle t^2, g_1\rangle = \int_0^1 t^2\cdot 1\,dt = \Big[\frac{t^3}{3}\Big]_0^1 = \frac{1}{3}$$

$$\langle t^2, g_2\rangle = \int_0^1 t^2\cdot\sqrt{3}\,(2t-1)\,dt = \sqrt{3}\int_0^1 (2t^3 - t^2)\,dt = \sqrt{3}\Big[\frac{t^4}{2} - \frac{t^3}{3}\Big]_0^1 = \sqrt{3}\Big(\frac{1}{2} - \frac{1}{3}\Big) = \frac{\sqrt{3}}{6}$$

$$t^2 - \langle t^2, g_1\rangle g_1 - \langle t^2, g_2\rangle g_2 = t^2 - \frac{1}{3}\cdot 1 - \frac{\sqrt{3}}{6}\cdot\sqrt{3}\,(2t-1) = t^2 - t + \frac{1}{6}.$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 20

Thus, after normalizing (as ‖t² − t + 1/6‖² = 1/180),

$$g_3 = 6\sqrt{5}\,\Big(t^2 - t + \frac{1}{6}\Big) = \sqrt{5}\,(6t^2 - 6t + 1)$$

Let's look at the example of P_U(f) for

f(t) = √t

MA212 – Lecture 39 – Tuesday 6 March 2018 page 21



Example of f(t) = √t

$$P(f) = \Big(\int_0^1 t^{1/2}\,dt\Big)\cdot 1 + \sqrt{3}\Big(\int_0^1 t^{1/2}(2t-1)\,dt\Big)\cdot\sqrt{3}\,(2t-1)$$
$$\qquad + \sqrt{5}\Big(\int_0^1 t^{1/2}(6t^2 - 6t + 1)\,dt\Big)\cdot\sqrt{5}\,(6t^2 - 6t + 1)$$

$$P(f) = \frac{2}{3} + 3\Big(\frac{4}{5} - \frac{2}{3}\Big)(2t-1) + 5\Big(\frac{12}{7} - \frac{12}{5} + \frac{2}{3}\Big)(6t^2 - 6t + 1)$$
$$= \frac{6 + 48t - 20t^2}{35}$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 22


Comment

$$\|f - P(f)\|^2 = \min_{a,b,c} \int_0^1 \big(\sqrt{t} - (at^2 + bt + c)\big)^2\,dt$$

since the general element of U has the form at² + bt + c.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 23
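Aside (a numeric check of ours of this minimizing property; the coefficients below are the ones just computed):

```python
# Sketch: E(a,b,c) = integral_0^1 (sqrt(t) - (a t^2 + b t + c))^2 dt,
# checked to be smallest at the projection coefficients (-20/35, 48/35, 6/35).
import math

def E(a, b, c, n=20_000):
    h = 1.0 / n
    s = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        s += (math.sqrt(x) - (a * x * x + b * x + c)) ** 2
    return h * s

best = E(-20 / 35, 48 / 35, 6 / 35)
print(best)                                         # small (about 4e-4)
print(best < E(-20 / 35, 48 / 35, 6 / 35 + 0.01))   # True: perturbing c is worse
```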


Hoping for more

Suppose that {u₁, u₂, ..., uₙ, ...} forms an orthonormal set and
Pₙ projects onto U(n) := Lin{u₁, u₂, ..., uₙ}.
Then
‖P₁(v) − v‖ ≥ ‖P₂(v) − v‖ ≥ ...

Indeed: P₂(v) − v ⊥ U(2) and P₂(v) − P₁(v) ∈ U(2), so

$$\|v - P_1(v)\|^2 = \|v - P_2(v)\|^2 + \|P_2(v) - P_1(v)\|^2 \ge \|v - P_2(v)\|^2.$$

Question:
? ‖Pₙ(v) − v‖ → 0 ?

MA212 – Lecture 39 – Tuesday 6 March 2018 page 24


Sometimes! ...

For V = ℓ² and uᵢ = eᵢ, as then

Pₙ(a) = (a₁, ..., aₙ, 0, 0, 0, ...)

$$\|P_n(a) - a\|^2 = \sum_{j=n+1}^{\infty} a_j^2 \to 0, \quad\text{as } a \in \ell^2.$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 25
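Aside (a concrete check of ours with aⱼ = 1/j): the squared tail Σ_{j>n} aⱼ², which equals ‖Pₙ(a) − a‖², shrinks to 0.

```python
# Sketch: ||P_n(a) - a||^2 = sum_{j>n} 1/j^2 for a_j = 1/j, for growing n.
def tail(n, N=1_000_000):
    return sum(1 / j ** 2 for j in range(n + 1, N + 1))

for n in (10, 100, 1000):
    print(n, tail(n))   # ~0.0952, ~0.00995, ~0.000999: tails shrink like 1/n
```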


But ...Not always!

Example 3 V = ℓ² and u₁ = e₂, u₂ = e₃, ..., uᵢ = e_{i+1}, ...

Pₙ(a) = (0, a₂, ..., a_{n+1}, 0, 0, 0, ...)

$$\|a - P_n(a)\|^2 = \|(a_1, 0, ..., 0, a_{n+2}, a_{n+3}, a_{n+4}, ...)\|^2 = a_1^2 + \sum_{j=n+2}^{\infty} a_j^2 \to a_1^2,$$

i.e.
Pₙ(a) → a − a₁e₁,

which is not equal to a (unless a₁ = 0).

So care is needed!

MA212 – Lecture 39 – Tuesday 6 March 2018 page 26


In the example before this one...

Back there we had

Lin{e₁, e₂, ..., eₙ, ...} = U(1) ∪ U(2) ∪ U(3) ∪ ... = k, and k is dense in ℓ².

But the span in the last Example 3, Lin{e₂, e₃, ...}, is not dense in ℓ².

MA212 – Lecture 39 – Tuesday 6 March 2018 page 27


k is “dense in” ℓ2 :

If a = (a₁, a₂, ...) then Pₙ(a) = (a₁, ..., aₙ, 0, 0, 0, ...) and, as before,

$$\|P_n(a) - a\|^2 = \sum_{j=n+1}^{\infty} a_j^2 \to 0,$$

so ... any point a ∈ ℓ² can be approximated arbitrarily well by
points from k. Thus k is dense in ℓ², but

Lin{e₂, e₃, ..., e_{i+1}, ...} is not dense in ℓ²

MA212 – Lecture 39 – Tuesday 6 March 2018 page 28


However, in C[0, 1]

...in C[0, 1] with

$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt,$$

take an orthonormal sequence of polynomials gₙ built via the
Gram-Schmidt process from 1, t, t², t³, ...; then, under the norm
‖f‖ = √⟨f, f⟩, one has

$$\sum_{i=1}^{n} \langle f, g_i\rangle g_i \to f \quad\text{(convergence in this norm)}.$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 29


MA212 Further Mathematical Methods Lecture LA 20

Lecture 40: Periodic Functions

(& why Ptolemy Rules OK)

The orthonormal sequence e^{int}, n ∈ Z


Eigenvectors in C[0, 1]

Consider
$$T(f) := \frac{d^2}{dt^2}f,$$
then its eigenvalue equation is

T(f) = λf,

i.e.
f″ − λf = 0 (auxiliary equation: m² − λ = 0)

for λ = α² > 0: f(t) = e^{αt}

for λ = −α² < 0: f(t) = e^{iαt}

MA212 – Lecture 40 – Friday 9 March 2018 page 2


Fourier’s Heat Equation

$$\frac{\partial^2}{\partial x^2} f(x, t) = \frac{\partial}{\partial t} f(x, t).$$

How to solve?
Investigate the problem using eigenvectors in the space C[0, 1],
or better, as we see later, in C[−π, +π].

MA212 – Lecture 40 – Friday 9 March 2018 page 3


Trigonometric Polynomials

These are the functions f(t) into C (the complex numbers) that
are of the form

$$\sum_{k=-n}^{n} a_k e^{ikt} \quad\text{with } i = \sqrt{-1} \ \text{(finite sum!)}$$

For example

e^{i5t} + ie^{−3it} + 7 (this last term is 7e^{0it})

MA212 – Lecture 40 – Friday 9 March 2018 page 4


Reason for the name:

Expressible in terms of cos and sin because

e^{it} = cos t + i sin t

e^{ikt} = cos(kt) + i sin(kt), e^{−ikt} = cos(kt) − i sin(kt),

as cos(−θ) = cos θ and sin(−θ) = −sin θ.

So, using 1/i = −i, eliminating between the displayed equations gives

$$\cos(kt) = \tfrac{1}{2}\big(e^{ikt} + e^{-ikt}\big), \qquad \sin(kt) = \tfrac{-i}{2}\big(e^{ikt} - e^{-ikt}\big)$$

MA212 – Lecture 40 – Friday 9 March 2018 page 5


Function space view

Context: the continuous functions

f : [−π, +π] → C

This context exploits that cos and sin are periodic (period 2π);
so too are trigonometric polynomials:
all satisfy
f(t + 2π) = f(t)

so e.g.
f (−π) = f (π),

by taking t = −π in the preceding equation.

MA212 – Lecture 40 – Friday 9 March 2018 page 6


NB: Conjugates

$$\overline{e^{ikt}} = \overline{\cos(kt) + i\sin(kt)} = \cos(kt) - i\sin(kt) = e^{-ikt}.$$

Henceforth we will work in C[−π, +π]... and we use

$$\langle f, g\rangle = \int_{-\pi}^{+\pi} f(t)\,\overline{g(t)}\,dt$$

MA212 – Lecture 40 – Friday 9 March 2018 page 7


What is ...

$$\int e^{ikt}\,dt\ ?$$

As defined before, we integrate the real and imaginary parts; for k ≠ 0:

$$\int [\cos(kt) + i\sin(kt)]\,dt = \int \cos(kt)\,dt + i\int \sin(kt)\,dt = \frac{1}{k}\sin(kt) - \frac{i}{k}\cos(kt) + C$$
$$= \frac{1}{ik}\,[\cos(kt) + i\sin(kt)] + C = \frac{e^{ikt}}{ik} + C, \quad(\text{as } 1/i = -i)$$

as expected.
MA212 – Lecture 40 – Friday 9 March 2018 page 8
Orthogonality

$$\langle e^{int}, e^{imt}\rangle = \int_{-\pi}^{+\pi} e^{int}\,\overline{e^{imt}}\,dt = \int_{-\pi}^{+\pi} e^{int}e^{-imt}\,dt = \int_{-\pi}^{+\pi} e^{i(n-m)t}\,dt$$

Consider k = n − m ≠ 0; then

$$\langle e^{int}, e^{imt}\rangle = \Big[\frac{e^{ikt}}{ik}\Big]_{-\pi}^{+\pi} = 0, \quad\text{as } \cos(kt)\big|_{t=\pi} = \cos(kt)\big|_{t=-\pi}\ \text{(even function),}$$

whereas sin(kt) = 0 for t = ±π.

MA212 – Lecture 40 – Friday 9 March 2018 page 9


Now take n = m

$$\|e^{int}\|^2 = \langle e^{int}, e^{int}\rangle = \int_{-\pi}^{+\pi} e^{int}e^{-int}\,dt = \int_{-\pi}^{+\pi} 1\,dt = 2\pi.$$

So
$$\Big\{\frac{1}{\sqrt{2\pi}}\,e^{int} : n \in \mathbb{Z}\Big\} \text{ is an orthonormal set in } C[-\pi, +\pi].$$

So we can project f ∈ C[−π, +π] as in Lecture 39 onto the finite-dimensional subspace

$$\mathrm{Lin}\Big\{\frac{1}{\sqrt{2\pi}}\,e^{ikt} : k = -n, -(n-1), ..., -1, 0, +1, ..., n\Big\}.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 10
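Aside (a numeric spot-check of ours; the division by √(2π) is omitted, so the diagonal value is 2π):

```python
# Sketch: <e^{int}, e^{imt}> = integral_{-pi}^{pi} e^{i(n-m)t} dt, numerically.
import cmath
import math

def ip(n, m, N=100_000):
    h = 2 * math.pi / N
    s = 0.0 + 0.0j
    for j in range(N):
        t = -math.pi + (j + 0.5) * h
        s += cmath.exp(1j * (n - m) * t)
    return s * h

print(abs(ip(3, 5)))   # ≈ 0
print(ip(4, 4).real)   # ≈ 2π ≈ 6.2832
```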


This yields

$$P_n(f) = \sum_{k=-n}^{n}\Big\langle f, \frac{e^{ikt}}{\sqrt{2\pi}}\Big\rangle\frac{e^{ikt}}{\sqrt{2\pi}} = \sum_{k=-n}^{n}\frac{1}{2\pi}\,\langle f, e^{ikt}\rangle\, e^{ikt}.$$

Here
$$a_k(f) = \frac{1}{2\pi}\,\langle f, e^{ikt}\rangle = \frac{1}{2\pi}\int_{-\pi}^{+\pi} f(t)e^{-ikt}\,dt$$

are the Fourier coefficients of f.

For f : [−π, +π] → R we can also rewrite this as

$$P_n(f) = \alpha_0 + \sum_{k=1}^{n}\big[\beta_k\sin(kt) + \gamma_k\cos(kt)\big];$$

here α₀ and the βₖ, γₖ are real, and we get a real-valued function!


MA212 – Lecture 40 – Friday 9 March 2018 page 11
Example:

Define
f(t) := |t| for t ∈ [−π, +π],

and periodically elsewhere.

For α ≠ 0, we need to compute (integrating by parts with u = t and dv = e^{αt} dt)

$$\int t e^{\alpha t}\,dt = t\,\frac{e^{\alpha t}}{\alpha} - \int \frac{e^{\alpha t}}{\alpha}\,dt = t\,\frac{e^{\alpha t}}{\alpha} - \frac{e^{\alpha t}}{\alpha^2} + C$$

Note that for integer k:

$$e^{ik\pi} = \cos k\pi + i\sin k\pi = \begin{cases} +1, & k \text{ even} \\ -1, & k \text{ odd} \end{cases}$$
MA212 – Lecture 40 – Friday 9 March 2018 page 12
Illustration

[Figure: the 2π-periodic extension of f(t) = |t|, a triangle wave, plotted for t from −2π to 3π.]

MA212 – Lecture 40 – Friday 9 March 2018 page 13


Now compute Fourier coefficients

We use the first formula on slide 11, ignoring the 1/2π.

First, for k = 0,
$$\int_{-\pi}^{+\pi} |t|\,dt = 2\int_0^{+\pi} t\,dt = \big[t^2\big]_0^{+\pi} = \pi^2.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 14


And for k ≠ 0,

$$\int_{-\pi}^{+\pi} |t|e^{-ikt}\,dt = \int_0^{+\pi} te^{-ikt}\,dt + \int_{-\pi}^{0} \underbrace{(-t)}_{=|t|}\,e^{-ikt}\,dt = \int_0^{+\pi} te^{-ikt}\,dt - \int_{-\pi}^{0} te^{-ikt}\,dt$$

$$= \Big[\frac{it}{k}\,e^{-ikt} + \frac{e^{-ikt}}{k^2}\Big]_0^{\pi} - \Big[\frac{it}{k}\,e^{-ikt} + \frac{e^{-ikt}}{k^2}\Big]_{-\pi}^{0} = \{0\} - \frac{2}{k^2} + \frac{1}{k^2}\big\{e^{-ik\pi} + e^{+ik\pi}\big\}$$

$$= \begin{cases} 0, & k \text{ even (and } k \ne 0) \\ -\dfrac{4}{k^2}, & k \text{ odd} \end{cases}$$

(the {0} records that the it/k terms cancel). In the above remember that e^{+ikπ} = e^{−ikπ} for integer k.

MA212 – Lecture 40 – Friday 9 March 2018 page 15
So

$$a_k(f) = \begin{cases} \pi/2, & k = 0 \\ 0, & k \text{ even and } k \ne 0 \\ -\dfrac{2}{\pi k^2}, & k \text{ odd} \end{cases}$$

and so:
$$P_n(f) = \frac{\pi}{2} - \frac{2}{\pi}\sum_{k \text{ odd}} \frac{1}{k^2}\,e^{ikt} \quad(\text{but } (-k)^2 = k^2 \text{, so...})$$
$$= \frac{\pi}{2} - \frac{2}{\pi}\sum_{k=1,\ k \text{ odd}}^{n} \frac{1}{k^2}\,(e^{ikt} + e^{-ikt}) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=1,\ k \text{ odd}}^{n} \frac{\cos(kt)}{k^2},$$

because e^{ikt} + e^{−ikt} = cos(kt) + i sin(kt) + cos(kt) − i sin(kt) = 2 cos(kt).


MA212 – Lecture 40 – Friday 9 March 2018 page 16
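Aside (a numeric sanity check of ours of these closed forms, using a simple midpoint rule for the coefficient integral):

```python
# Sketch: a_k(f) = (1/2pi) * integral_{-pi}^{pi} f(t) e^{-ikt} dt for f(t) = |t|,
# compared against pi/2 (k=0), 0 (k even), -2/(pi k^2) (k odd).
import cmath
import math

def a_k(f, k, n=100_000):
    h = 2 * math.pi / n
    s = 0.0 + 0.0j
    for j in range(n):
        t = -math.pi + (j + 0.5) * h
        s += f(t) * cmath.exp(-1j * k * t)
    return s * h / (2 * math.pi)

print(a_k(abs, 0).real, math.pi / 2)              # both ≈ 1.5708
print(a_k(abs, 3).real, -2 / (math.pi * 3 ** 2))  # both ≈ -0.0707
print(abs(a_k(abs, 2)))                           # ≈ 0
```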
Leading question

$$P_n(f) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=1,\ k \text{ odd}}^{n} \frac{\cos(kt)}{k^2} \to |t| \ \text{ as } n \to \infty \text{ for } t \in [-\pi, +\pi]?$$

Puzzle!

MA212 – Lecture 40 – Friday 9 March 2018 page 17


Fourier’s Theorem

For f ∈ C[−π, +π]:

(i) (cf. last slide of Lecture 39)

$$\Big\|f - \sum_{k=-n}^{n} a_k(f)e^{ikt}\Big\| \to 0$$

(ii)
$$\|f\|^2 = 2\pi\sum_{k=-\infty}^{\infty} |a_k(f)|^2$$

(iii)
$$f(t) = \sum_{k=-\infty}^{\infty} a_k(f)e^{ikt} \quad\text{for } -\pi < t < +\pi$$

Danger at t = ±π.
MA212 – Lecture 40 – Friday 9 March 2018 page 18
To get (ii) from (i)

... note that

$$\Big\|\sum_{k=-n}^{n} a_k(f)e^{ikt}\Big\|^2 = \Big\langle \sum_{k=-n}^{n} a_k(f)e^{ikt},\ \sum_{h=-n}^{n} a_h(f)e^{iht} \Big\rangle$$
$$= \sum_{k=-n}^{n}\sum_{h=-n}^{n} a_k(f)\,\overline{a_h(f)}\,\langle e^{ikt}, e^{iht}\rangle = \sum_{k=-n}^{n} a_k(f)\,\overline{a_k(f)}\,(2\pi) = 2\pi\sum_{k=-n}^{n} |a_k(f)|^2,$$

MA212 – Lecture 40 – Friday 9 March 2018 page 19


In the limit...

Now pass to the limit using (i):

$$2\pi\sum_{k=-n}^{n}|a_k(f)|^2 = \Big\|\underbrace{\Big(\sum_{k=-n}^{n} a_k(f)e^{ikt} - f\Big)}_{\to 0} + f\,\Big\|^2 \to \|0 + f\|^2 = \|f\|^2.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 20


Computing π 2

By the Theorem: for −π < t < +π

$$|t| = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k\ge 1,\ k \text{ odd}} \frac{\cos(kt)}{k^2};$$

take t = 0; then

$$0 = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k>0,\ k \text{ odd}} \frac{1}{k^2}.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 21


So ..

So
$$\sum_{k>0,\ k \text{ odd}} \frac{1}{k^2} = \frac{\pi^2}{8},$$

i.e.
$$1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + ... = \frac{\pi^2}{8}.$$

How about all k? What is this sum:
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + ...$$
2 3 4

MA212 – Lecture 40 – Friday 9 March 2018 page 22


A brainwave:

Every positive integer = 2^ℓ × (an odd number), for some ℓ = 0, 1, 2, ...

For odd part 1:
{2^ℓ × 1} = 1, 2, 4, 8, 16, ..., 2^ℓ, ... (ℓ = 0, 1, 2, ...)

For odd part 3:
{2^ℓ × 3} = 3 × {1, 2, 4, 8, 16, ..., 2^ℓ, ...} (ℓ = 0, 1, 2, ...)

And

$$\frac{1}{(2^{\ell})^2} = \frac{1}{4^{\ell}}, \qquad \sum_{\ell=0}^{\infty}\frac{1}{4^{\ell}} = \frac{1}{1 - \frac{1}{4}} = \frac{4}{3} \quad\Big(\text{common ratio} = \frac{1}{4}\Big)$$

MA212 – Lecture 40 – Friday 9 March 2018 page 23


Claim


$$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$$

i.e.
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + ... = \frac{\pi^2}{6}$$

MA212 – Lecture 40 – Friday 9 March 2018 page 24


Idea...

Re-organize the sum as a sum of sums, each odd reciprocated power factorized out:

$$\sum_{k=1}^{\infty}\frac{1}{k^2} = 1\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) + \frac{1}{3^2}\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) + \frac{1}{5^2}\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) + ...$$

$$= \Big(1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + ...\Big)\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) = \frac{\pi^2}{8}\times\frac{4}{3} = \frac{\pi^2}{6}$$

MA212 – Lecture 40 – Friday 9 March 2018 page 25
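Aside (a quick numeric check of ours of the two sums used above):

```python
# Sketch: partial sums of sum 1/k^2 over odd k and over all k,
# against pi^2/8 and pi^2/6.
import math

odd_sum = sum(1 / k ** 2 for k in range(1, 200_001, 2))
all_sum = sum(1 / k ** 2 for k in range(1, 200_001))

print(odd_sum, math.pi ** 2 / 8)  # both ≈ 1.2337
print(all_sum, math.pi ** 2 / 6)  # both ≈ 1.6449
```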
