Linear Algebra Course Pack PDF


MA212 Further Mathematical Methods Lecture LA 1

Introducing Linear Algebra:
Arrangements, Overview, Some Revision

What's on Moodle: Slides, Recordings, Assignments
& their Solutions (eventually), Past Papers

Homework Assignments: class arrangements continue as for Calculus
Course content

Vector spaces and all that

Standard examples of spaces: R³, R⁴, Rⁿ

Exotic examples of spaces, like M_{2×2}: the 2 × 2 matrices
(as these can be added and scaled). But note that we can code

[ a  c ]
[ b  d ]

as (a, b, c, d) ∈ R⁴.

Bases, co-ordinates
Orthogonal bases =⇒ projections
Nearest point map: good approximations
Linear regression



Content cont’d

Linear transformations: their representation by matrices


Square Matrices: when diagonalizable ... their interpretation
via eigenvalues
Complex scalars and vectors ... as real matrices can have
complex eigenvalues
What if not diagonalizable?
How about non-square matrices? Generalized Inverses

Exotic examples:

A(f) := ∫₀¹ f(t) dt



Background Readings (for this week’s material)

Anthony and Harvey (Linear Algebra)


Ch. 2 (§2.2 and §2.3), Ch. 3 (§3.1 and §3.3), Ch. 5

Adam Ostaszewski (Advanced Mathematical Methods)


Ch. 1 (§1.1-§1.3), also §8.1.



Revision: A vector space is

a set V equipped with notions of “addition” and “scalar


multiplication”, so that:

(i) the sum v₁ + v₂ of any two elements (vectors) v₁, v₂ in V
is also in V,
(ii) the scalar multiple av is also in V, whenever v is in V and
a is any scalar.

Usually, we shall take the set of scalars to be the set R of all real
numbers: then we say that V is a real vector space. Sometimes
we shall take the set of scalars to be the set C of all complex
numbers: then V is a complex vector space.



*Formal rules for a vector space

Addition and scalar multiplication of vectors in a vector space


satisfy the following properties, all of which hold for all vectors
u , v , w in the vector space, and all scalars a , b :

Rules of Vector Addition (group rules)


(i) commutativity: u + v = v + u ,
(ii) associativity: u + (v + w) = (u + v) + w ,
(iii) a zero vector, denoted 0 , exists with: 0 + v = v + 0 = v ,
Rules of Vector Scaling (action rules)
(iv) action of the scalar 1 : 1u = u,
(v) associativity: (ab)u = a(bu),
Rules of interconnection (distributivity rules)
(vi) (a + b)u = au + bu, (vii) a(u + v) = au + av.
* means “No need to memorize”
Linear independence

Let V be a real or complex vector space.

A finite subset {v1 , v2 , . . . , vk } of vectors in V is defined to be


linearly dependent if there are scalars a1 , . . . , ak , not all zero,
such that a1 v1 + a2 v2 + · · · + ak vk = 0 .

A set of vectors that is not linearly dependent is called linearly


independent . So the set {v1 , v2 , . . . , vk } is linearly independent
if the only scalars a1 , . . . , ak satisfying
a1 v1 + a2 v2 + · · · + ak vk = 0 are a1 = a2 = · · · = ak = 0 .



Complex 3-space C³

C³ := { (z₁, z₂, z₃)ᵀ : z₁, z₂, z₃ ∈ C }

Example: with i = √(−1),

v = ( 2i, 1 + i, 3 − 5i )ᵀ.

Scaling by a complex scalar? Here goes



Complex scaling

(1 − i)v = ( (1 − i)2i, (1 − i)(1 + i), (1 − i)(3 − 5i) )ᵀ

         = ( 2 + 2i, 1 − i², 3 − 5 − i(3 + 5) )ᵀ

         = ( 2 + 2i, 2, −2 − 8i )ᵀ.



Check for linear independence the set

{ (1, 2, i)ᵀ, (0, 4 + i, −1)ᵀ, (0, 0, 3 − i)ᵀ }.

Suppose for some scalars a₁, a₂, a₃

a₁ (1, 2, i)ᵀ + a₂ (0, 4 + i, −1)ᵀ + a₃ (0, 0, 3 − i)ᵀ = 0 = (0, 0, 0)ᵀ.



Cont'd

a₁ (1, 2, i)ᵀ + a₂ (0, 4 + i, −1)ᵀ + a₃ (0, 0, 3 − i)ᵀ = (0, 0, 0)ᵀ.

Component comparison gives

a1 + 0 + 0 = 0, =⇒ a1 = 0,
2a1 + (4 + i)a2 + 0a3 = 0, =⇒ a2 = 0,
ia1 − a2 + a3 (3 − i) = 0 =⇒ a3 = 0.

So the set is linearly independent.
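A quick numerical cross-check is possible here. The sketch below is my own (not part of the notes) and assumes numpy is available; the rank of the stacked vectors equals 3 exactly when the set is linearly independent, and numpy's matrix_rank accepts complex entries.

```python
# My own sketch: complex rank check for the three vectors above.
import numpy as np

V = np.array([[1, 2, 1j],
              [0, 4 + 1j, -1],
              [0, 0, 3 - 1j]])    # one vector per row
print(np.linalg.matrix_rank(V))   # 3: the set is linearly independent
```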



Check for linear dependence the set

v₁ = (1, 3, 1)ᵀ, v₂ = (2, 0, −1)ᵀ, v₃ = (0, 2, 1)ᵀ.

By observation
(−2)v₁ + 1v₂ + 3v₃ = 0.

Indeed

(−2)(1, 3, 1)ᵀ + (2, 0, −1)ᵀ + 3(0, 2, 1)ᵀ = 0 = (0, 0, 0)ᵀ.



Component comparison gives

(−2) × 1 + 2 + 3 × 0 = 0,
(−2) × 3 + 0 + 3 × 2 = 0,
(−2) × 1 + (−1) + 3 × 1 = 0.

But, how did we find these scalars?



Method for linear independence: Flip-Stack-Reduce

More accurately: Transpose, stack, row-reduce.

v₁ = (1, 3, 1)ᵀ, v₂ = (2, 0, −1)ᵀ, v₃ = (0, 2, 1)ᵀ

      [ v₁ᵀ ]   [ 1  3  1 ]
 =⇒   [ v₂ᵀ ] = [ 2  0 −1 ].
      [ v₃ᵀ ]   [ 0  2  1 ]

This is the transpose and stack part. The next step is the
row-reduce part.



Method for linear independence: Flip-Stack-Reduce

On completion :

If a zero row appears, then there is linear dependence


Otherwise there is linear independence.

Example to follow after next business.
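Meanwhile, here is a minimal sketch (mine, assuming sympy is available) of flip-stack-reduce in code: stack the vectors as rows, row-reduce, and test for a zero row, equivalently for fewer pivots than rows.

```python
# Flip-stack-reduce as code: a zero row in the reduced form
# signals linear dependence.
from sympy import Matrix

v1, v2, v3 = [1, 3, 1], [2, 0, -1], [0, 2, 1]
M = Matrix([v1, v2, v3])       # transpose-and-stack: one vector per row
R, pivots = M.rref()           # row-reduce

print(R)                       # the final row is zero here
print(len(pivots) < M.rows)    # True: {v1, v2, v3} is linearly dependent
```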



Elementary row operations: revision

(i) Multiply a row by a non-zero constant
(ii) Exchange two rows
(iii) Add a scalar multiple of another row to a row

Row ops leave the row space of the matrix unchanged
They also preserve column dependencies



Watch the example

[ 1  3  1 ]   [ v₁ᵀ ]
[ 2  0 −1 ] = [ v₂ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ ]

R₂ → R₂ − 2R₁:

[ 1  3  1 ]   [ v₁ᵀ        ]
[ 0 −6 −3 ] = [ v₂ᵀ − 2v₁ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ        ]

R₂ → R₂ + 3R₃:

[ 1  3  1 ]   [ v₁ᵀ                 ]
[ 0  0  0 ] = [ (v₂ᵀ − 2v₁ᵀ) + 3v₃ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ                 ]



End effect

From

[ 1  3  1 ]   [ v₁ᵀ                 ]
[ 0  0  0 ] = [ (v₂ᵀ − 2v₁ᵀ) + 3v₃ᵀ ]
[ 0  2  1 ]   [ v₃ᵀ                 ]

we have

(v₂ᵀ − 2v₁ᵀ) + 3v₃ᵀ = 0ᵀ

−2v₁ + v₂ + 3v₃ = 0,

(finally).



Recalling some general facts

For a square matrix A the following are equivalent:

A has an inverse
det(A) ≠ 0
Rows of A are lin. independent
Row reduction doesn't yield a zero row



MA212 Further Mathematical Methods Lecture LA 2

Mostly about Gaussian Elimination

Menu for the day:


Recap: vector subspaces
Gaussian elimination (Main dish/ main business)
Bases and dimension
Elementary matrices and Determinants
Wronskian Determinant
Recap: vector subspaces

A subset U of a vector space V is a vector subspace (or just


subspace) of V if U is also a vector space (with the addition and
multiplication inherited from the larger vector space V ).

To check that a subset U of V is a vector subspace, we don’t


need to verify all the properties of addition and multiplication of a
vector space, as these already hold in V .

What we do need is to check U is closed under addition and


scaling ...



Criterion for checking U is a subspace

U is closed under addition and scaling: whenever a is a scalar
and v₁, v₂, v are elements of U, then v₁ + v₂ is in U and av
is in U.

Equivalently:
a subset U of a vector space V is a subspace of V
iff U is closed under linear combinations:

i.e., whenever a1 , a2 are scalars and v1 , v2 are elements of U ,


then the linear combination a1 v1 + a2 v2 is also in U .



Subspace: an example to check

W = { (x, y, z)ᵀ : 2x − y + z = 0 }

Consider a scalar a and

x := (x, y, z)ᵀ, u := (u, v, w)ᵀ, both in W.

Is x + u = (x + u, y + v, z + w)ᵀ in W, and is au = (au, av, aw)ᵀ ∈ W?
Let’s see.

2x − y + z = 0,
2u − v + w = 0.

1. Adding the two equations

2[x + u] − [y + v] + [z + w] = 0,

so the components of x + u satisfy the defining condition.


2. Scaling the first equation by a,

2(ax) − (ay) + (az) = 0, so ax := (ax, ay, az)ᵀ satisfies the defining condition.
Linear combinations

The linear span of a finite set S = {v₁, v₂, ..., vₖ} of vectors in a
real vector space V is the subset

Lin(S) = { Σᵢ₌₁ᵏ aᵢvᵢ : aᵢ ∈ R }

of linear combinations of the vectors in S. If the vectors are in a
complex vector space, then the sum is over all aᵢ in C. The
linear span of an infinite set A of vectors is the set of vectors
that can be written as a linear combination of finitely many
vectors from A.



Spanning and other notation for Lin(A)

A subset A of a vector space V spans the vector space V if


span(A) = V, meaning that every vector in the vector space can
be written as a finite linear combination of vectors from A.
Other notation for Lin(A) is either of:
span(A), ⟨A⟩.



Gaussian elimination and its purpose

In MA100 you learned how to solve simultaneous equations:


Step 1 : Express in matrix form

Ax = b.

Step 2 : Use row reduction on the augmented matrix

(A|b)

to get this into Echelon form.


This is called Gaussian elimination: because the echelon form
eliminates all but one variable from each equation.
If the system is consistent (= has a solution at all), the method
easily finds all the solutions.
A simple example

Consider 3x1 + 2x3 = 6 and −x1 + x2 + x3 = 5 . We get the


augmented matrix

[  3  0  2 | 6 ]
[ −1  1  1 | 5 ].

Gaussian elimination yields the augmented matrix

[ 1  0  2/3 | 2 ]
[ 0  1  5/3 | 7 ].

Looking just at the first two columns, which are the columns
containing leading 1s, we can read off one solution: x1 = 2 ,
x2 = 7 and x3 = 0 .
Simple example cont’d

Additional solutions are provided by allowing x₃ to be any real
number (x₃ is called a free variable) and setting x₁ = 2 − (2/3)x₃
and x₂ = 7 − (5/3)x₃.
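As a check (my own sketch, assuming sympy is available), linsolve reproduces the same one-parameter family of solutions, with x₃ free:

```python
from sympy import symbols, linsolve, Matrix

x1, x2, x3 = symbols('x1 x2 x3')
A = Matrix([[3, 0, 2], [-1, 1, 1]])   # coefficient matrix of the system
b = Matrix([6, 5])
print(linsolve((A, b), x1, x2, x3))
# {(2 - 2*x3/3, 7 - 5*x3/3, x3)}: x3 is the free variable, as above
```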



Finite dimensional spaces

A vector space V is called ‘finite-dimensional’ if V has a (finite!)


basis.

Theorem. If V is finite-dimensional and has a basis with


n members, then any set of n + 1 vectors of V is linearly
dependent.

Proof. Not examined. Available in the Appendix.



Dimension

By the theorem just stated:


In a finite dimensional vector space V any two bases of V have
exactly the same number of vectors.

That number is the dimension of V denoted dim(V ) .



Elementary matrices versus Gaussian Elimination

An elementary matrix arises from applying an elementary row


operation on the identity matrix (of any size m × m ).
We list the types below and check that if E is elementary of size
m × m, then A′ defined by

A′_{m×n} := E_{m×m} A_{m×n}

is A row-reduced by one elementary row operation.



Non-zero scaling of a row

      [ 1              ]
      [   ⋱            ]
Eₐ =  [      a         ]   ← j-th row, scaled by a ≠ 0
      [        ⋱       ]
      [             1  ]



Example

[ 1  0  0 ] [ a  b ]   [  a    b  ]
[ 0 −2  0 ] [ c  d ] = [ −2c  −2d ]
[ 0  0  1 ] [ e  f ]   [  e    f  ]

Another example, with a ≠ 0:

E₁⁄ₐ Eₐ = I,

since E₁⁄ₐ multiplies the j-th row of Eₐ by 1/a. So Eₐ is
invertible.



Exchange rows

           [ 1              ]
           [   ⋱            ]
           [    0 ... 1     ]   ← i-th row
E^exch =   [    ⋮  ⋱  ⋮     ]
           [    1 ... 0     ]   ← j-th row
           [            ⋱   ]
           [              1 ]



Example

[ 0  0  1 ] [ a  b ]   [ e  f ]
[ 0  1  0 ] [ c  d ] = [ c  d ]
[ 1  0  0 ] [ e  f ]   [ a  b ]

Another example:

E^exch E^exch = I,

since E^exch exchanges the same two rows again. So E^exch is
invertible.



Adding a row

Fix i, j.

          [ 1               ]
          [   ⋱             ]
          [     1           ]   ← i-th row
Eₐᵃᵈᵈ =   [       ⋱         ]
          [     a   1       ]   ← j-th row:  Rⱼ → Rⱼ + aRᵢ
          [           ⋱     ]
          [              1  ]



Example

[ 1  0  0 ] [ a  b ]   [   a       b    ]
[ 0  1  2 ] [ c  d ] = [ c + 2e  d + 2f ]
[ 0  0  1 ] [ e  f ]   [   e       f    ]

Another example:

E₋ₐᵃᵈᵈ Eₐᵃᵈᵈ = I,

since E₋ₐᵃᵈᵈ takes away a × (i-th row of I) from the j-th row of Eₐᵃᵈᵈ,
so cancels the earlier addition of that same row to the j-th row.
So Eₐᵃᵈᵈ is invertible.
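A short numpy sketch (my own, not part of the notes) of the three types of elementary matrix acting on a 3 × 2 matrix, with the inverse undoing the row-addition:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.], [5., 6.]])
I = np.eye(3)

E_scale = I.copy(); E_scale[1, 1] = -2.0   # scale row 2 by -2
E_exch  = I[[2, 1, 0]]                     # exchange rows 1 and 3
E_add   = I.copy(); E_add[1, 2] = 2.0      # R2 -> R2 + 2*R3

print(E_scale @ A)    # row 2 of A multiplied by -2
print(E_exch @ A)     # rows 1 and 3 of A exchanged
print(E_add @ A)      # 2*(row 3) added to row 2 of A

E_add_inv = I.copy(); E_add_inv[1, 2] = -2.0   # R2 -> R2 - 2*R3
print(np.allclose(E_add_inv @ E_add, I))       # True: the addition is undone
```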



Inversion and elementary row operations

Suppose that An×n can be row reduced to the identity using k


row operations.
Label them 1, 2, 3, ..., k according to the order of operation.
Using the corresponding elementary matrices we get

Eₖ · ... · E₂ · E₁ A = I.

All these elementary matrices are invertible so

A = E₁⁻¹ · E₂⁻¹ · ... · Eₖ⁻¹.

As the inverse of an elementary matrix is an elementary matrix,

A = a product of elementary matrices.



Inverse from elementary matrices

By multiplying out the matrices below we see that

[Eₖ · ... · E₂ · E₁][E₁⁻¹ · E₂⁻¹ · ... · Eₖ⁻¹] = I
  = [E₁⁻¹ · E₂⁻¹ · ... · Eₖ⁻¹][Eₖ · Eₖ₋₁ · ... · E₁].

So A is invertible and

A⁻¹ = Eₖ · ... · E₂ · E₁.



Determinants

Elementary operations on determinants.


We list some properties of the determinant det(A) of a square
matrix A.
They correspond to elementary row operations.

(i) Exchange two rows, then the determinant is multiplied by −1 ;


(ii) Multiply a row by c, then the determinant is multiplied by c ;
(iii) Add a multiple of a row to another row, the determinant is
unchanged;
(iv) det(In×n ) = 1.



Consequences

If E is an elementary matrix

det(EA) = det(E) det(A)

Proof. We consider separately each of the cases (i), (ii), (iii) above.
In each case we work out det(E) and check that our formula for
det(EA) is correct, as follows:
(i) det(E) = −1,
(ii) det(E) = c,
(iii) det(E) = 1, since the operation leaves the determinant
unchanged and E comes from I with det(I) = 1.



Products of determinants: step 1

In particular, if A is invertible, then

A = a product of elementary matrices

so, writing say


A = F1 · ... · Fk ,

det(A) = det(F1 ) · ... · det(Fk ).

Why? Because

det(A) = det(F1 · [F2 · ... · Fk ])


= det(F1 ) · det(F2 · ... · Fk )
= det(F1 ) · det(F2 ) · det(F3 · ... · Fk )
etc.
Products of determinants: step 2

If also B is invertible, then, writing

B = H1 · ... · Hk ,

AB = F1 · ... · Fk · H1 · ... · Hk

det(AB) = det(F1 ) · ... · det(Fk ) · det(H1 ) · ... · det(Hk )


= det(A) det(B).

If A or B is not invertible, this equation continues to hold. Why?

(det B = 0 → ∃ v ≠ 0 with Bv = 0 → ABv = 0 → det AB = 0;
det B ≠ 0 and det A = 0 → det(AB) = 0, since otherwise A =
(AB)B⁻¹ would be invertible.)
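A numerical sanity check (my own sketch, assuming numpy) of the product rule for determinants:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # True
```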
Linear independence of functions

This is a one-way test for independence.


We illustrate the idea with 3 functions, but it is easy to generalize
to more functions.
We work in the vector space of functions. The zero vector is the
function o(t) ≡ 0 .

Suppose the three functions below are twice differentiable:


f1 : [0, 1] → R, f2 : [0, 1] → R, f3 : [0, 1] → R
Suppose also that they are linearly dependent.
So there are scalars a₁, a₂, a₃, not all zero, with
a₁f₁(t) + a₂f₂(t) + a₃f₃(t) = o(t) = 0 (0 ≤ t ≤ 1).



Use some differentiation

Differentiate each side


a1 f1′ (t) + a2 f2′ (t) + a3 f3′ (t) = o(t) = 0 (0 ≤ t ≤ 1),
because o′ (t) = 0′ = 0 .

Differentiate again
a1 f1′′ (t) + a2 f2′′ (t) + a3 f3′′ (t) = o(t) = 0 (0 ≤ t ≤ 1).

Express the three equations above in matrix form...



In matrix form

Express the three equations above in matrix form...

[ f₁(t)   f₂(t)   f₃(t)  ] [ a₁ ]       [ 0 ]
[ f₁′(t)  f₂′(t)  f₃′(t) ] [ a₂ ] = 0 = [ 0 ].
[ f₁″(t)  f₂″(t)  f₃″(t) ] [ a₃ ]       [ 0 ]



In symbols

W(t)a = 0   (0 ≤ t ≤ 1),

where we define below the Wronskian matrix W(t) to be

        [ f₁(t)   f₂(t)   f₃(t)  ]
W(t) =  [ f₁′(t)  f₂′(t)  f₃′(t) ].
        [ f₁″(t)  f₂″(t)  f₃″(t) ]

So W(t) is not invertible (otherwise a = 0). So the Wronskian
determinant is zero.



In summary:

The Wronskian vanishes if the functions f1 (t), f2 (t), f3 (t) are


linearly dependent

W (t) := det(W(t)) ≡ 0 (for all 0 ≤ t ≤ 1). (WV)

We deduce a test for non-dependence from non-vanishing:

INDEPENDENCE TEST (Somewhere Non-vanishing Wronskian):
The three functions f₁(t), f₂(t), f₃(t) are linearly independent
if for some t₀ with 0 ≤ t₀ ≤ 1

W(t₀) = det(W(t₀)) ≠ 0.



Example:

Check for independence these functions defined for 0 ≤ t ≤ 2 :

f₁(t) ≡ 1, f₂(t) ≡ sin t, f₃(t) ≡ sin 2t (0 ≤ t ≤ 2).

        [ 1    sin t     sin 2t   ]
W(t) =  [ 0    cos t     2 cos 2t ].
        [ 0   −sin t    −4 sin 2t ]

W(t) = −4 sin 2t cos t + 2 sin t cos 2t



Some experimenting needed:

W (0) = 0, W (π/2) = 0 + 2 · 1 · (−1) = −2.

Conclusion: Here the functions f1 (t), f2 (t), f3 (t) are linearly


independent.
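The same computation can be checked in sympy (my own sketch; sympy exposes a wronskian helper, though expanding the 3 × 3 determinant by hand works just as well):

```python
from sympy import symbols, sin, pi, wronskian

t = symbols('t')
W = wronskian([1, sin(t), sin(2*t)], t)   # det of the Wronskian matrix
print(W.subs(t, 0))        # 0: no conclusion at t = 0
print(W.subs(t, pi/2))     # -2, non-zero: the functions are independent
```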



In general ..

if the k functions f₁(t), f₂(t), ..., fₖ(t) are differentiable k − 1
times, we define

        [ f₁(t)        f₂(t)       ···   fₖ(t)       ]
W(t) =  [ f₁′(t)       f₂′(t)      ···   fₖ′(t)      ].
        [   ⋮            ⋮                 ⋮         ]
        [ f₁⁽ᵏ⁻¹⁾(t)   f₂⁽ᵏ⁻¹⁾(t)  ···   fₖ⁽ᵏ⁻¹⁾(t)  ]

Then f₁(t), f₂(t), ..., fₖ(t) are linearly independent if for some t₀

W(t₀) = det(W(t₀)) ≠ 0.



MA212 Further Mathematical Methods Lecture LA 3

Lecture 23: Linear Transformations

Linear versus Matrix Transformations


Co-ordinates from bases
Base changes and representations
Linear transformations e.g. Matrix transformations

A function T : U → V between vector spaces U, V is linear if it


is both additive and homogeneous:

(i) T (u1 + u2 ) = T (u1 ) + T (u2 )


(ii) T (au) = aT (u).

This says T preserves (or ‘respects’) vector addition and


scaling.
This is reminiscent of our definition of subspace (where addition
and scaling preserves membership of the subspace); as there,
so too here we can roll the two conditions (i) and (ii) into one:

T (a1 u1 + a2 u2 ) = a1 T (u1 ) + a2 T (u2 ),

(giving preservation of linear combinations).


Important special case:

For U = Rⁿ, V = Rᵐ and A_{m×n} a matrix,

T x := Ax,

i.e. T is defined by using matrix multiplication to define y = T x,
where
x ∈ Rⁿ and y ∈ Rᵐ

and
y = Ax.



Comment

Whilst this is a special case of a linear transformation, it


describes exactly all linear transformations between vector
spaces U, V provided these are finite dimensional. We will see
why shortly.

We therefore say that matrices offer a representation for the


linear transformations.



Example 1.

Here U = R³ and V = R¹ = R:

T (x, y, z)ᵀ = 2x − y + z.

Consider x := (x, y, z)ᵀ, u := (u, v, w)ᵀ; then

T(x + u) = T (x + u, y + v, z + w)ᵀ = 2[x + u] − [y + v] + [z + w].
Example 1 cont’d

So

T(x + u) = T (x + u, y + v, z + w)ᵀ = 2[x + u] − [y + v] + [z + w]
         = 2x − y + z + [2u − v + w]
         = T(x) + T(u).



Example 1 cont’d

Consider a scalar a and

T(au) = T (au, av, aw)ᵀ = [2au − av + aw] = a[2u − v + w] = aT(u).

Notice that the matrix

A = ( 2  −1  1 )

represents T, as we have

T(x) = 2x − y + z = ( 2  −1  1 ) (x, y, z)ᵀ = Ax.


Example 2.

Here U = R³ and V = R³:

    [ x ]   [ 2x − y ]   [ 2 −1  0 ] [ x ]
T   [ y ] = [ y + z  ] = [ 0  1  1 ] [ y ].
    [ z ]   [ 2x + z ]   [ 2  0  1 ] [ z ]



Example 3.

Here U = the continuous functions f : [0, 1] → R, and V = R:

T(f) = ∫₀¹ f(t) dt.

Integration is a linear transformation:

Additivity:

T(f + g) = ∫₀¹ [f(t) + g(t)] dt = ∫₀¹ f(t) dt + ∫₀¹ g(t) dt = T(f) + T(g).



Example 3 cont’d

Homogeneity:

T(af) = ∫₀¹ a f(t) dt = a ∫₀¹ f(t) dt = aT(f).

Recalling that the integral is defined as a limiting sum, T above
may be compared with this summation:

T (x, y, z)ᵀ = x + y + z.



Example 4. Expectation as linear transformation.

Exotic but significant:


Here U is the space of random variables X on a finite
state space Ω.
T (X) = E[X].

Here

T (X + Y ) = E[X + Y ] = E[X] + E[Y ] = T (X) + T (Y ),


T (aX) = E[aX] = aE[X] = aT (X).



Bases

A base B is an ordered, spanning set of linearly independent


vectors. The primary purpose is to represent a vector by a
unique column of co-ordinates.

In a finite dimensional space U with dimension n a base B is


an ordered set of n vectors, written:

B = (u1 , u2 , ..., un ) or [u1 , u2 , ..., un ]

Note the round or square brackets (but NOT the curly variety
{...} ). The notation emphasizes that the vectors ui in the list
above are presented in a fixed order.



Co-ordinate Column

Then for any u in U there exist unique scalars x₁, ..., xₙ with

u = x₁u₁ + x₂u₂ + ... + xₙuₙ,

i.e. if also
u = x₁′u₁ + x₂′u₂ + ... + xₙ′uₙ,

then
x₁ = x₁′, ..., xₙ = xₙ′.



An expression for the co-ordinates

We write this representation of u using the unique column of
co-ordinates corresponding to B, following MA100, in either of
the two ways:

(u)_B or [u]_B = (x₁, x₂, ..., xₙ)ᵀ_B, or rarely just [u]_B = (x₁, x₂, ..., xₙ)ᵀ.

(Column vectors can be written with round or square brackets!)

We will usually use [u]_B, as more convenient. Whichever choice
one follows, the notation has to "remember" the base B, and a
bit of notational overkill is just 'safety first'.
Co-ordinate column cont’d

We say that the column [u]B represents u relative to the base


B , or briefly that it B -represents u .



Example.

In R³ the natural base is

E = ( (1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (0, 0, 1)ᵀ ),

so for u as below we have:

u := (4, 4, 6)ᵀ = 4(1, 0, 0)ᵀ + 4(0, 1, 0)ᵀ + 6(0, 0, 1)ᵀ.



So ...

so

u = (4, 4, 6)ᵀ = (4, 4, 6)ᵀ_E.

Here, because the base is the natural base,

[u]_E = (4, 4, 6)ᵀ_E = (4, 4, 6)ᵀ = u.



A comparison

Compare this to what happens when using the following

B = ( (1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (1, 2, −1)ᵀ ),

which is a base (check this!); then for u as above we have:

u = (4, 4, 6)ᵀ = 5(1, 0, 1)ᵀ + 6(0, 1, 0)ᵀ − 1(1, 2, −1)ᵀ,



An alternative

so

u = (4, 4, 6)ᵀ = (4, 4, 6)ᵀ_E = (5, 6, −1)ᵀ_B.

Alternatively

[u]_E = (4, 4, 6)ᵀ whereas [u]_B = (5, 6, −1)ᵀ.



Using bases to describe a linear transformation

Let U be n dimensional and V be m dimensional, and let


T : U → V be a linear transformation.
Fix a base B for U and a base C for V :

B = (u1 , u2 , ..., un ) and C = (v1 , v2 , ..., vm ).

So for any u in U there exist unique scalars x₁, ..., xₙ with

u = x₁u₁ + x₂u₂ + ... + xₙuₙ,  i.e.  [u]_B = (x₁, x₂, ..., xₙ)ᵀ_B,

and so u is represented by a column in Rⁿ.

Similarly in the range

Likewise for v any vector in V there exist unique scalars
y₁, ..., yₘ with

v = y₁v₁ + y₂v₂ + ... + yₘvₘ,  i.e.  [v]_C = (y₁, y₂, ..., yₘ)ᵀ_C,

and so v is represented by a column in Rᵐ.


We will see that using these representations allows us to
represent T by a matrix.



Looking for a matrix

For u ∈ U, knowing its B-representation x = (x₁, x₂, ..., xₙ)ᵀ_B, we
need to find y with

v = T(u) = (y₁, y₂, ..., yₘ)ᵀ_C.



Some special cases

To do this first we look at the special cases:

T(u₁), T(u₂), ..., T(uₙ).

Step 1: As these are in V, we work out their respective column
coordinates using C:

T(u₁) = (a₁₁, a₂₁, ..., aₘ₁)ᵀ_C, T(u₂) = (a₁₂, a₂₂, ..., aₘ₂)ᵀ_C, ...,
T(uₙ) = (a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ_C.



Step 2: Set these down side by side like so

T(u₁), T(u₂), ..., T(uₙ) = (a₁₁, ..., aₘ₁)ᵀ_C, (a₁₂, ..., aₘ₂)ᵀ_C, ..., (a₁ₙ, ..., aₘₙ)ᵀ_C,

or better still like so

T(u₁)_C, T(u₂)_C, ..., T(uₙ)_C = (a₁₁, ..., aₘ₁)ᵀ, (a₁₂, ..., aₘ₂)ᵀ, ..., (a₁ₙ, ..., aₘₙ)ᵀ.
Step 3:

Put
          [ a₁₁  a₁₂  ...  a₁ₙ ]
A_{m×n} = [ a₂₁  a₂₂  ...  a₂ₙ ]
          [  ⋮    ⋮         ⋮  ]
          [ aₘ₁  aₘ₂  ...  aₘₙ ]

We claim that for x, y as defined above

y = Ax.



Justification

Indeed

v = T(u) = T(x₁u₁) + T(x₂u₂) + ... + T(xₙuₙ)
         = x₁T(u₁) + x₂T(u₂) + ... + xₙT(uₙ).

So

y = x₁(a₁₁, a₂₁, ..., aₘ₁)ᵀ + x₂(a₁₂, a₂₂, ..., aₘ₂)ᵀ + ... + xₙ(a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ
  = Ax.



Comment

For later use note that, because [v]_C = y (same as [v]_C = y_C)
and [u]_B = x, we have an equivalence:

v = T(u), i.e. y_C = T(x_B), iff y = Ax.



Notation

When we want to remember how A comes about from the
bases B, C and the transformation T, we may write the matrix
equation y = Ax here as

y = A_T^{C,B} x, or even y = {}^C_B A_T x.

The latter form particularly helps to associate B with x, and C
with y.
You may recall from MA100 the helpful notation y = A_T^{B→C} x.



Summary:

Given T : U → V with dim U = n and dim V = m, and


bases B in U and C in V
...
To find the corresponding representing matrix A :
Simply write down side-by-side the C -representations of the
B -base images

A = T(B)_C = [ T(u₁)_C, T(u₂)_C, ..., T(uₙ)_C ].



Special case of natural bases where [u]E = u

For U = Rⁿ with B the natural base B = Eₙ = (e₁, e₂, ..., eₙ),
and V = Rᵐ with C the natural base C = Eₘ = (e₁, e₂, ..., eₘ),

T(Eₙ)_{Eₘ} = [T(e₁), T(e₂), ..., T(eₙ)],

since u_E = u.
We simplify the notation here to

A = A_T,

dropping the bases.



Example 1.

For T : R³ → R² given by

    [ x₁ ]   [ 2x₁ − x₂     ]
T   [ x₂ ] = [ x₁ + x₂ + x₃ ],
    [ x₃ ]

we can write

    [ x₁ ]   [ 2 −1 0 ] [ x₁ ]
T   [ x₂ ] = [ 1  1 1 ] [ x₂ ]
    [ x₃ ]              [ x₃ ]

and so



...and so using natural bases

A_T = [ 2 −1  0 ]
      [ 1  1  1 ].



Example 2.

Q here is the space of quadratic polynomials in t:

p ∈ Q iff p(t) = a₀ + a₁t + a₂t².

Take
B = (u₁, u₂, u₃) = (1, t, t²);

then
p_B = (a₀, a₁, a₂)ᵀ.



Consider shifting t to t + 1

Consider T : Q → Q

q = T (p) defined by q(t) = T (p)(t) = p(t + 1).

Find the representation of T when B = C = (1, t, t2 ).



Finding the representation of T when B = C = (1, t, t2 ).

Here

T (u1 ) = T (1) = 1 = u1
T (u2 ) = T (t) = t + 1 = u2 + u1
T (u3 ) = T (t2 ) = (t + 1)2 = t2 + 2t + 1 = u3 + 2u2 + u1

So
     
1 1 1
     
T (u1 )B =      
 0  , T (u2 )B =  1  , T (u3 )B =  2 
0 0 1



so

...

A = A_T^{B,B} = [ 1 1 1 ]
                [ 0 1 2 ]
                [ 0 0 1 ].



... or more neatly

This can be done more neatly as follows

T(B)_C = T(1, t, t²)_C = (T(1), T(t), T(t²))_C
       = (1, t + 1, (t + 1)²)_C
       = (1_C, (t + 1)_C, (t² + 2t + 1)_C)
       = ( (1, 0, 0)ᵀ, (1, 1, 0)ᵀ, (1, 2, 1)ᵀ ).



i.e.

i.e.

A = A_T^{B,B} = [ 1 1 1 ]
                [ 0 1 2 ]
                [ 0 0 1 ].



Check:

q(t) = T (p)(t) = p(t + 1)


= a0 + a1 (t + 1) + a2 (t + 1)2
= a0 + a1 (t + 1) + a2 (t2 + 2t + 1)
= a0 + a1 t + a1 + a2 t2 + 2a2 t + a2
= (a0 + a1 + a2 ) + (a1 + 2a2 )t + a2 t2

So, as B = C,

            [ a₀ + a₁ + a₂ ]   [ 1 1 1 ] [ a₀ ]
q_C = q_B = [ a₁ + 2a₂     ] = [ 0 1 2 ] [ a₁ ] = A p_B.
            [ a₂           ]   [ 0 0 1 ] [ a₂ ]
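A quick sympy check (my own sketch) that this matrix really carries the coefficients of p(t) to those of p(t + 1) in the base (1, t, t²):

```python
from sympy import symbols, Matrix, expand

t, a0, a1, a2 = symbols('t a0 a1 a2')
p = a0 + a1*t + a2*t**2
q = expand(p.subs(t, t + 1))                     # the shifted polynomial
q_B = Matrix([q.coeff(t, k) for k in range(3)])  # its B-coordinates

A = Matrix([[1, 1, 1], [0, 1, 2], [0, 0, 1]])
print(q_B == A * Matrix([a0, a1, a2]))           # True
```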



Understanding Base changes in Rn

Base changes in Rⁿ ≡ re-representation of the identity
matrix I_{n×n}



Step 1. How to find the B -coordinates of a vector in Rn .

At this point in the exercise we regard a vector u in Rⁿ as being,
of course, a column, so it is its own co-ordinate column for the
natural base E = Eₙ, i.e. u = u_E (as seen earlier).
Now given another base B = (u₁, u₂, ..., uₙ), how do we find the
B-representation of u, i.e. the x with [u]_B = x_B?



The answer is easy:

By definition of representation

u = x₁u₁ + x₂u₂ + ... + xₙuₙ,  i.e.  [u]_B = (x₁, x₂, ..., xₙ)ᵀ_B,

but

u = x₁u₁ + x₂u₂ + ... + xₙuₙ = (u₁, u₂, ..., uₙ)(x₁, x₂, ..., xₙ)ᵀ.

So

u = B x_B  for  B = (u₁, u₂, ..., uₙ).



Lite work:

The matrix taking B -coordinates to E -coordinates is the matrix


made up from B.
But B has rank n so is invertible, and so

x = B⁻¹u.

So the other way from E to B is big beer – harder computation:


The matrix taking E -coordinates to B -coordinates is the matrix
made up from B −1 .
Done and dusted.



Important re-interpretation.

Recall the equivalence between a transformation and its
representing matrix:

v = T(u), i.e. y_C = T(x_B), iff y = Ax.

As u is its own E-representation: u = [u]_E (last slide?), we can
relabel u as a y to remember that it is an E-representation,
giving
y = B x_B with B = (u₁, u₂, ..., uₙ).

So from the equivalence: for some linear transformation T,

B = A_T^{E,B} = T(B), so T[u₁, u₂, ..., uₙ] = B = [u₁, u₂, ..., uₙ].



This means that..

T (ui ) = ui for all base vectors

so by linearity

T (u) = u, for all vectors u ∈ U = Rn , i.e. T = In×n is the identity.



Conclusion:

The matrix taking B-coordinates to E-coordinates represents
the identity:

B = A_I^{E,B}.

So
B⁻¹ = A_I^{B,E}.
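A numeric sketch (mine, not from the notes) of the rule x = B⁻¹u, using the base B and the vector u from the earlier example:

```python
import numpy as np

B = np.array([[1., 0.,  1.],
              [0., 1.,  2.],
              [1., 0., -1.]])   # columns are the base vectors
u = np.array([4., 4., 6.])

x_B = np.linalg.solve(B, u)     # solve B x = u rather than forming B^{-1}
print(x_B)                      # [ 5.  6. -1.], matching [u]_B = (5, 6, -1)^T
```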



Consequences

Suppose that U = Rn has a base B = (b1 , b2 , ..., bn ) and


V = Rm has a base C = (c1 , c2 , ..., cm ). Find the representation
of the matrix transformation

y =Ax.

The context here is natural bases so, as x = xE and y = yE , in


fact
yE = AxE .



Goal:

We are to find a matrix M with

yC = M xB ,

where x_B are the B-coordinates of x and y_C are the
C-coordinates of y.



Solution:

Break this up into small steps (‘Slowly, slowly catchy monkey’).

Step 1. (Lite work) Interpret A as a transformation v =T (u)


relative to the natural base En and Em , i.e. with x = xE and
y = yE .

Step 2. (Lite work) Do a base change from En to B and from


Em to C :
x = xE = BxB and y = yE = CyC

Step 3. Substitute the above into y =Ax and solve for yC in


term of xB :

yE = AxE ⇒ CyC = ABxB =⇒ yC = C −1 ABxB .


Big deal?

Possible as C is m × m and invertible (rank = m). So

M = A_T^{C,B} = C⁻¹AB.



MA212 Further Mathematical Methods Lecture LA 4

Lecture 24: Similar Matrices

Similar square matrices (and their characteristic


polynomials)
Computation of ranges and kernels
From scalar product to inner product
... But first a recap

Story so far:
1. T : U → V with bases B = (b1 , b2 , ..., bn ) in U and C in V
is represented by:

A = T (B) = ((T b1 )C , (T (b2 )C , ..., T (bn )C ).

2. Base change E → B has

[u]E =(b1 , b2 , ..., bn )xB =⇒ xB = B −1 [u]E

3. Representing y =Ax with bases B = (b1 , b2 , ..., bn ) in Rn


and C = (c1 , c2 , ..., cm ) in Rm has

CyC = ABxB =⇒ yC = C −1 ABxB .



Example revisited.

For T : R³ → R² given by

    [ x₁ ]   [ 2x₁ − x₂     ]   [ 2 −1 0 ] [ x₁ ]
T   [ x₂ ] = [ x₁ + x₂ + x₃ ] = [ 1  1 1 ] [ x₂ ]
    [ x₃ ]                                 [ x₃ ]

find the representation in which ...



the representation in which...

B = ( (1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (1, 2, −1)ᵀ )  ↦  [ 1 0  1 ]
                                               [ 0 1  2 ]
                                               [ 1 0 −1 ]

and

C = ( (2, 4)ᵀ, (−1, 3)ᵀ )  ↦  [ 2 −1 ]
                              [ 4  3 ]



Computation

C⁻¹ = (1/(6 + 4)) [ 3  1 ]      (swop / re-sign / divide)
                  [−4  2 ]

So the required matrix is

M = (1/10) [ 3  1 ] [ 2 −1 0 ] [ 1 0  1 ]
           [−4  2 ] [ 1  1 1 ] [ 0 1  2 ]
                               [ 1 0 −1 ]

  = ... = (1/10) [  8 −2  2 ]
                 [ −4  6  4 ]
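The arithmetic is easy to verify numerically (my own sketch, assuming numpy):

```python
import numpy as np

A = np.array([[2., -1., 0.], [1., 1., 1.]])
B = np.array([[1., 0., 1.], [0., 1., 2.], [1., 0., -1.]])
C = np.array([[2., -1.], [4., 3.]])

M = np.linalg.solve(C, A @ B)   # C^{-1}(AB) without forming the inverse
print(M * 10)                   # [[ 8. -2.  2.]
                                #  [-4.  6.  4.]]
```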



Similarity of square matrices:

Given the matrix A_{n×n} which defines in Rⁿ a transformation
T : Rⁿ → Rⁿ via the formula y = Ax, we have

A = A_T^{E,E};

this uses E, the natural base, twice: in the domain and in the
range of T. Now switch from the natural base E to an arbitrary
base C in Rⁿ, to be used both in the domain and the range of
T. Then the transformation T is represented by

A_T^{C,C} = C⁻¹ A_T^{E,E} C = C⁻¹AC.



Similarity

We say that the matrix A_{n×n} is similar to B_{n×n} if for some
non-singular P_{n×n}

A = P⁻¹BP.

(The matrix P is sometimes called 'conjugating'.) Since

B = PAP⁻¹ = Q⁻¹AQ, with Q = P⁻¹,

we see that B is similar to A, so we just say that A and B are
similar.
Thus A_T^{C,C} above and A are similar.



Comment

Think of
P = (p1 , p2 , ..., pn )

as determining a base comprising its columns p1 , p2 , ..., pn .


Now we see that similar matrices A and B represent the same
linear transformation, but relative to different bases.



Similarity is an equivalence:

i) A is similar to A since A = IAI⁻¹
ii) A similar to B implies B similar to A (as 2 slides earlier)
iii) A similar to B and B similar to C implies that for some P, Q
non-singular

B = PAP⁻¹ and C = QBQ⁻¹;

so
C = QPAP⁻¹Q⁻¹ = (QP)A(QP)⁻¹

and so C is similar to A.



Eigenvectors and eigenvalues

For a square matrix A and scalar λ, if

Ax = λx for some x ≠ 0,

then λ is an eigenvalue of A and x is a
corresponding/associated eigenvector.
Note that
Ax = λx iff (A − λI)x = 0,

so since x ≠ 0:
λ is an eigenvalue iff A − λI is singular iff det(A − λI) = 0.
A real matrix A may have complex eigenvalues, in which case
the eigenvector x is in Cⁿ.



Characteristic polynomial

It is convenient to define the characteristic polynomial of A as

pA (x) := det(xI − A),

as this gives a monic polynomial – meaning that the leading
term, which is (+1)xⁿ, has coefficient one (from 'monos', the
Greek for 'single').

Theorem. If A, B are similar, then

pA (x) = pB (x).



Proof

Suppose that A = PBP⁻¹; then

1 = det(PP⁻¹) = det(P) det(P⁻¹)

and so

p_A(x) = det(xI − A) = det(xPP⁻¹ − PBP⁻¹)
       = det(P(xI − B)P⁻¹)
       = det(P) det(xI − B) det(P⁻¹)
       = det(xI − B) = p_B(x).



Example

For
A = [ 1 2 ]
    [ 3 4 ],

p_A(x) = | x − 1    −2   | = (x − 1)(x − 4) − (−2)(−3)
         |  −3    x − 4  |
       = x² − 5x − 2.

Notice that

5 = (1 + 4) = trace(A),
−2 = (4 − 6) = det(A).



General situation:

For A_{n×n}, as p_A is monic of degree n,

p_A(x) = (x − λ₁)...(x − λₙ),

with λ₁, ..., λₙ the n roots (accounting for multiplicity). But

         | x − a₁₁   −a₁₂    ...    −a₁ₙ   |
p_A(x) = |  −a₂₁    x − a₂₂  ...    −a₂ₙ   | = (x − λ₁)...(x − λₙ).
         |   ⋮                 ⋱     ⋮     |
         |  −aₙ₁             ...   x − aₙₙ |



Continued

Expanding and comparing coefficients of xⁿ⁻¹:

−a₁₁ − a₂₂ − ... − aₙₙ = −λ₁ − λ₂ − ... − λₙ.

So switching signs on both sides

λ₁ + λ₂ + ... + λₙ = tr(A) = a₁₁ + a₂₂ + ... + aₙₙ.



Comparing the constant terms we have

(Put x = 0.)

(−1)ⁿ det(A) = (−1)ⁿ λ₁λ₂...λₙ

and so

det(A) = λ₁λ₂...λₙ.
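Both identities are easy to check numerically; the sketch below is my own, for the 2 × 2 example above:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))         # True: sum of roots = trace
print(np.isclose(lam.prod(), np.linalg.det(A)))   # True: product = det
```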



Consequences:

So if A and B are similar:

(i) A and B have the same eigenvalues and to the same


multiplicities;

(ii) A and B have the same trace;

(iii) A and B have equal characteristic polynomials.



Rank and Nullity

Ranges, kernels and their dimensions


How to compute these using Gaussian elimination



Null space

For T : U → V a linear transformation, the null space, also


known as the kernel, of T is

N (T ) = ker(T ) = {u ∈ U : T (u) = 0}.

This is indeed a vector subspace of U.


The range space, or image space, of T is

R(T ) = {v ∈ V : v =T (u) for some u ∈ U }.

This is indeed a vector subspace of V.



Illustration: kernel and range

[Diagram: T maps U into V; the kernel N(T) sits inside U and the range R(T) inside V.]



Distinction without a difference

Using bases B, C in U, V and associated co-ordinates:

T(u) = 0 iff A_T^{C,B} x = 0,

and

v = T(u) iff y = A_T^{C,B} x.

When B, C are the natural bases Eₙ and Eₘ and A = A_T, we
don't fuss over the distinctions between x and x_E nor between
y and y_E, and we write

N(T) = N(A_T) = N(A),

so that in a word: N(T) is the set of solutions to the equation

A_T x = 0.
Likewise

R(T ) = R(AT ) = R(A).



Example

Consider

T (x, y)ᵀ = ( x − y, 2y − 2x )ᵀ = [ 1 −1 ] [ x ]
                                  [−2  2 ] [ y ].

A_T = [ 1 −1 ]
      [−2  2 ]



Computation

We consider the null space:

N(T) = { (x, y)ᵀ : [ 1 −1 ] (x, y)ᵀ = (0, 0)ᵀ },
                   [−2  2 ]

so we must solve

x − y = 0 and 2y − 2x = 0. So

N(T) = { (x, x)ᵀ : x ∈ R } = { x(1, 1)ᵀ : x ∈ R } = Lin{ (1, 1)ᵀ },

a line through the origin in direction (1, 1)ᵀ.



We consider the range:

Notice that any number z can be written as x − y (e.g. take
z = x and y = 0), so

R(T) = { (x − y, 2y − 2x)ᵀ : x, y ∈ R } = { (x − y, −2(x − y))ᵀ : x, y ∈ R }
     = { (x − y)(1, −2)ᵀ : x, y ∈ R } = { z(1, −2)ᵀ : z ∈ R }
     = Lin{ (1, −2)ᵀ },

again a line.
Range and column space

For A an m × n matrix with columns of depth m and with n
columns altogether:

     [ a₁₁  a₁₂  ...  a₁ₙ ] [ x₁ ]
Ax = [ a₂₁  a₂₂  ...  a₂ₙ ] [ x₂ ]
     [  ⋮    ⋮         ⋮  ] [  ⋮ ]
     [ aₘ₁  aₘ₂  ...  aₘₙ ] [ xₙ ]

   = x₁(a₁₁, a₂₁, ..., aₘ₁)ᵀ + x₂(a₁₂, a₂₂, ..., aₘ₂)ᵀ + ... + xₙ(a₁ₙ, a₂ₙ, ..., aₘₙ)ᵀ

   = x₁a₁ + x₂a₂ + ... + xₙaₙ.
So...

R(T ) = Lin {a1 , a2 , ..., an }

and is the column space of A.



Row operations preserve column dependencies.

The best way to see this is to consider the simplest example of
dependency: v = 2u.

2 (1, 2, 3)ᵀ = (2, 4, 6)ᵀ

We check what happens under each row operation:



(i) exchange of first and second rows:

2 (2, 1, 3)ᵀ = (4, 2, 6)ᵀ



(ii) Now add/subtract another row (here R₃ → R₃ − R₁):

2 (2, 1, 3)ᵀ = (4, 2, 6)ᵀ

2 (2, 1, 3 − 2)ᵀ = (4, 2, 6 − 4)ᵀ



Computation using Gaussian elimination.

It is again easiest to learn this method by example.


Given a matrix: make the top-left 1 a leading 1 in the echelon
form:

[  1   2   0   1   2 ]
[  2   4   1   3   3 ]  R₂ − 2R₁
[ −1  −2   1   0  −3 ]  R₃ + R₁
[ −3  −6   0  −3  −6 ]  R₄ + 3R₁
[  0   0  −3  −3   3 ]  nowt to do



Cont’d

This gives another column, namely the third, to contribute a 2nd
row with leading 1:

[ 1  2   0   1   2 ]  leader
[ 0  0   1   1  −1 ]  leader
[ 0  0   1   1  −1 ]  R₃ − R₂
[ 0  0   0   0   0 ]  OK
[ 0  0  −3  −3   3 ]  R₅ + 3R₂



Cont'd

    [ 1  2  0  1   2 ]  leader
    [ 0  0  1  1  −1 ]  leader
B = [ 0  0  0  0   0 ]  OK
    [ 0  0  0  0   0 ]  OK
    [ 0  0  0  0   0 ]  OK

      ↑   ×   ↑   ×   ×
      x₁  x₂  x₃  x₄  x₅

Here B is in echelon form.



Step 1 Find N (B)

i.e.

x1 + 2x2 + x4 + 2x5 = 0,
x3 + x4 − x5 = 0.

Take x2 , x4 , x5 arbitrary (this corresponds to the × marked


columns) and use these to define x1 and x3 (these ones
correspond to the up-arrow associated with the leaders)

x1 = −2x2 − x4 − 2x5 ,
x3 = −x4 + x5 .



The solution set is

[ −2x₂ − x₄ − 2x₅ ]       [ −2 ]       [ −1 ]       [ −2 ]
[        x₂       ]       [  1 ]       [  0 ]       [  0 ]
[    −x₄ + x₅     ] = x₂  [  0 ] + x₄  [ −1 ] + x₅  [  1 ]
[        x₄       ]       [  0 ]       [  1 ]       [  0 ]
[        x₅       ]       [  0 ]       [  0 ]       [  1 ]



Step 2. Find N (A) : easy!

N (A) = N (B) i.e. Ax = 0 iff Bx = 0,

Why? Because the 5 row operations performed can be shown


as multiplications by elementary matrices, which are invertible:

B = M5 M4 M3 M2 M1 A

So

N(A) = Lin{ (−2, 1, 0, 0, 0)ᵀ, (−1, 0, −1, 1, 0)ᵀ, (−2, 0, 1, 0, 1)ᵀ }.

This has dimension 3 corresponding to the three crosses ×.


Step 3 Find R(B) Range:

We read off from B the independent columns of B (these are


the leaders): col (B)1 and col (B)3
Now notice the dependencies

col(B)1 = e1
col(B)2 = 2e1 = 2col(B)1
col(B)3 = e2
col(B)4 = e1 + e2 = col(B)1 + col(B)3
col(B)5 = 2e1 − e2 = 2col(B)1 − col(B)3

So
col-space(B) = Lin{col(B)1 , col(B)3 }.



Step 4: Find R(A): easy! Copy-cat:

col-space(A) = Lin{col(A)₁, col(A)₃},

because row operations preserve column dependencies. This


has dimension 2,

Remark: We see here by example that the domain of A is R5 as


there are 5 columns, and that ‘rank + nullity’=5:

r(A) + n(A) = 2 + 3 = 5 = dim-dom(A).
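A sympy cross-check of the worked example (my own sketch, not part of the notes):

```python
from sympy import Matrix

A = Matrix([[ 1,  2,  0,  1,  2],
            [ 2,  4,  1,  3,  3],
            [-1, -2,  1,  0, -3],
            [-3, -6,  0, -3, -6],
            [ 0,  0, -3, -3,  3]])

print(A.rank())                                 # 2
print(len(A.nullspace()))                       # 3 basis vectors for N(A)
print(A.rank() + len(A.nullspace()) == A.cols)  # True: rank + nullity = 5
```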



Summary and general situation:

If A is m × n row reduce it to an echelon matrix Bm×n with n


columns; of these r get up-arrows (leaders) and the remaining
columns are in number k = n − r and get marked ×.
Find N (B) by giving arbitrary values (free choices) to the k
variables marked × ; substitute for the leaders, and decompose
the solution vector as a linear combination of k columns, one
per free choice variable.
So N (A) has dimension k (‘k’ as in kernel).



Summary cont’d

To find R(A) describe R(B) as spanned by the columns of B


with numbers col(B)i corresponding to the r independent
columns which are determined by the leaders.
Then R(A) is described by swopping B for A in the previous
description.



Consequence:

r(A) + n(A) = r + k = r + (n − r) = n = dim-dom(A).

rank(A) + nullity(A) = dim-dom(A).



Another consequence for free:

It is obvious from the echelon form that

row-rank(B) = col-rank(B) = r.

Inverting all the row operations we get

row-rank(A) = r = col-rank(A).



Inner products: Cosine Rule recalled

Consider a triangle in the plane R² with one vertex at 0 and the
other two at a and b. We denote the angle between a and b by
θ. Write
c = b − a,

then
b = a + (b − a) = a + c,

and by the Cosine Rule

||c||2 = ||a||2 + ||b||2 − 2||a|| · ||b|| cos θ.



Cont’d

We can connect this result to the coordinates of vectors a , b as


follows.
Define for u, v in R2

u · v := u1 v1 + u2 v2 ,

then, as in Pythagoras’s Theorem for the triangle with


perpendicular sides of lengths u₁, u₂,

u · u = u₁² + u₂² = ||u||².



...Furthermore

(αu) · v = (αu1 )v1 + (αu2 )v2 = α(u1 v1 + u2 v2 ) = α(u · v)

and

(u + v) · w = (u1 + v1 )w1 + (u2 + v2 )w2


= (u1 w1 + u2 w2 ) + (v1 w1 + v2 w2 )
= u · w + v · w.

and
u·v =v·u



Returning to the cosine rule:

||c||² = c · c
       = (b − a) · (b − a) = b · (b − a) − a · (b − a)
       = b · b + a · a − 2a · b.

So
a · b = ||a|| · ||b|| cos θ,

and in particular a and b are orthogonal/perpendicular when
θ = π/2 and
a · b = 0.



Scalar products in Rn

We now extend the above ideas and define for u, v in Rn

u · v = u1 v1 + u2 v2 + ... + un vn

so that, consistently with Pythagoras’s Theorem,

u · u = ||u||² = u₁² + u₂² + ... + uₙ².

This allows us to define u and v to be orthogonal when

u · v = 0.



Properties of the scalar product in Rn

1. Linearity property:

(αu+βv) · w = αu · w+βv · w.

so that for fixed v the transformation Tv below is linear:

Tv (u) := u · v.

2. Symmetry property

u · v = v · u.

3. Positivity property

u · u > 0 for u ≠ 0.



MA212 Further Mathematical Methods Lecture LA 5

Lecture 25: Towards Pythagoras’s Theorem

Inner products
Orthogonality
Pythagoras’s Theorem
Angles ...leading to
Orthonormal bases (if time allows)
Recapitulation

Last time we met in Rn the scalar (or dot) product:

u · v = u1 v1 + ... + un vn

of two vectors, and noted that

a · b = ||a|| · ||b|| cos θ

u · u = u₁² + ... + uₙ² = ||u||²
Linearity: (αx + βy) · v = αx · v + βy · v
Symmetry: u · v = v · u
Positivity: u · u > 0 for u ≠ 0

We study this and related examples of 'products' with these 3
properties under the name of inner product.
Two important examples of inner products.

Example 1. In R³ first rescale the three axes by 1, 2, 3:

(x, y, z)ᵀ → (x, 2y, 3z)ᵀ,  represented by

[ x  ]   [ 1 0 0 ] [ x ]
[ 2y ] = [ 0 2 0 ] [ y ]
[ 3z ]   [ 0 0 3 ] [ z ]

Now apply the old scalar product to the images. This gives a
new inner product, so we use angular brackets to denote this
new operation:

⟨u, v⟩ = u₁v₁ + (2u₂)(2v₂) + (3u₃)(3v₃)
       = u₁v₁ + 4u₂v₂ + 9u₃v₃.



Interpretation

One can interpret this scaling as placing increasingly greater


significance on the second and third coordinates, for instance if
we attach greater probability to outcomes measured by the three
co-ordinates.
Notice that the matrix

[ 1 0 0 ]
[ 0 2 0 ]
[ 0 0 3 ]

has positive eigenvalues.



Example 2.

Consider the matrix

     [ 1 1 0 ]
A := [ 1 2 0 ].
     [ 0 0 1 ]

This matrix has entries symmetric about the main diagonal:
A = Aᵀ. We call A symmetric if A = Aᵀ. Put

⟨u, v⟩ = uᵀAv
       = u₁v₁ + 2u₂v₂ + u₃v₃ + u₁v₂ + u₂v₁.



Example 2 cont’d

Here, since A = Aᵀ, we get symmetry in the inner product:

⟨u, v⟩ = uᵀAv = uᵀAᵀv = (vᵀAu)ᵀ = ⟨v, u⟩ᵀ = ⟨v, u⟩,

the last step because ⟨v, u⟩ is a number (which we identify with
a 1 × 1 matrix).
Obviously

(αu₁ + βu₂)ᵀAv = α u₁ᵀAv + β u₂ᵀAv,

so for each fixed v the following map is linear:

T(u) := ⟨u, v⟩.



Now we check for positivity.

Consider that

⟨u, u⟩ = u₁² + 2u₂² + u₃² + 2u₁u₂
       = (u₁ + u₂)² + u₂² + u₃² ≥ 0.

Here if ⟨u, u⟩ = 0 then each of the squares (u₁ + u₂)², u₂², u₃² is
0, so u₃ = 0, u₂ = 0 and, as u₁ + u₂ = 0, also u₁ = 0. So we do
have positivity.
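A numerical check (my sketch, assuming numpy) that this A is symmetric with positive eigenvalues, which is what makes ⟨u, v⟩ = uᵀAv an inner product:

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 2., 0.],
              [0., 0., 1.]])
print(np.linalg.eigvalsh(A))   # all three eigenvalues are positive

u = np.array([1., -1., 0.5])   # any non-zero test vector
print(u @ A @ u > 0)           # True
```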



Real inner products

In a real vector space V (i.e. with the scalars taken from R) a
map V × V → R, denoted ⟨v₁, v₂⟩, is an inner product if for all
vectors v₁, v₂, v in V and α, β scalars in R the following
properties hold:
1. Linearity

⟨αv₁ + βv₂, v⟩ = α⟨v₁, v⟩ + β⟨v₂, v⟩,

2. Symmetry
⟨v₁, v₂⟩ = ⟨v₂, v₁⟩,

3. Positivity
⟨v, v⟩ > 0 for v ≠ 0.



Consequences

⟨v, 0⟩ = 0 = ⟨0, v⟩, for all v,

and in particular
⟨0, 0⟩ = 0.

Indeed

⟨0, v⟩ = ⟨0 + 0, v⟩ = ⟨0, v⟩ + ⟨0, v⟩ =⇒ ⟨0, v⟩ = 0.



Bilinearity

⟨v, αv₁ + βv₂⟩ = α⟨v, v₁⟩ + β⟨v, v₂⟩

This follows from symmetry:

⟨v, αv₁ + βv₂⟩ = ⟨αv₁ + βv₂, v⟩ = α⟨v₁, v⟩ + β⟨v₂, v⟩
               = α⟨v, v₁⟩ + β⟨v, v₂⟩.



Matrix representation of an inner product

For V finite dimensional with dimension n, take a basis
B = (v₁, v₂, ..., vₙ). Consider two vectors v and w with
co-ordinate columns x and y:

v = Bx_B, w = By_B;

then

⟨v, w⟩ = ⟨Bx_B, w⟩ = ⟨x₁v₁ + x₂v₂ + ... + xₙvₙ, w⟩
       = ⟨x₁v₁, w⟩ + ⟨x₂v₂, w⟩ + ... + ⟨xₙvₙ, w⟩
       = x₁⟨v₁, w⟩ + x₂⟨v₂, w⟩ + ... + xₙ⟨vₙ, w⟩



Cont’d

But, likewise,

⟨v₁, w⟩ = ⟨v₁, By_B⟩ = ⟨v₁, y₁v₁ + y₂v₂ + ... + yₙvₙ⟩
        = y₁⟨v₁, v₁⟩ + y₂⟨v₁, v₂⟩ + ... + yₙ⟨v₁, vₙ⟩.

Similarly

⟨v₂, w⟩ = y₁⟨v₂, v₁⟩ + y₂⟨v₂, v₂⟩ + ... + yₙ⟨v₂, vₙ⟩,

etc.



Putting all these together gives ...

⟨v, w⟩ = Σᵢ,ⱼ xᵢyⱼ⟨vᵢ, vⱼ⟩ = xᵀWy,

just like in Example 2, and here

    [ ⟨v₁, v₁⟩  ⟨v₁, v₂⟩  ...  ⟨v₁, vₙ⟩ ]
W = [ ⟨v₂, v₁⟩  ⟨v₂, v₂⟩  ...  ⟨v₂, vₙ⟩ ]
    [    ⋮         ⋮             ⋮     ]
    [ ⟨vₙ, v₁⟩  ⟨vₙ, v₂⟩  ...  ⟨vₙ, vₙ⟩ ].



Comment

Note that W = Wᵀ is symmetric, because ⟨vᵢ, vⱼ⟩ = ⟨vⱼ, vᵢ⟩.

But, not every symmetric matrix gives positivity.

We will see that a real inner product is of the form xᵀWy for W
a symmetric matrix such that all its eigenvalues are positive (just
as in Example 2).



Exotic Example: Covariance in L²₀

A random variable X has finite mean square (and so is said to be L²) if

E[X²] < ∞.

Recall that cov(X, Y) = E(XY) − E(X)E(Y); we want to keep
matters simple, so consider the subspace of mean-zero random
variables, denoted L²₀. Put

⟨X, Y⟩ = E(XY).

Clearly this is linear and symmetric on L²₀. Now

⟨X, X⟩ = E[X²] ≥ 0,

and E[X²] = 0 implies that X = 0 (actually: X = 0 almost always),
so we have positivity.
Cont’d

Here, for independent X, Y

E(XY ) = E(X)E(Y ) = 0

so X, Y are orthogonal under the inner product.



Remarks

1. For Gaussian random variables ⟨X, Y⟩ = 0 iff X, Y are
independent.
2. In the next slide we see that any inner product can be used to
introduce a notion of 'length of a vector' called its norm;
consequently, we observe that in L²₀ a notion of length ||X|| can
be introduced with

||X||² := ⟨X, X⟩ = var(X) > 0 iff X ≠ 0.

(Incidentally this explains why we wanted mean zero: the
variance of any constant is zero and its mean is itself; so here
we avoid 'constant' random variables by requiring the mean to
be zero.)



Norms from inner products: motivation

In R² consider the right-angle triangle with the vector
v = (v₁, v₂)ᵀ along the hypotenuse. Pythagoras's Theorem tells
us that
||v||² = v₁² + v₂² = v · v.

Likewise in R³ with the vector v = (v₁, v₂, v₃)ᵀ:

||v||² = v₁² + v₂² + v₃² = v · v.

This motivates the definition in the next slide.



Definition of norm

In an inner product space V we introduce ||v||, the norm of a
vector v, by setting

||v|| := √⟨v, v⟩, so that ||v||² := ⟨v, v⟩.

This is valid as ⟨v, v⟩ ≥ 0.

Two useful features are:

1. ||0||² = ⟨0, 0⟩ = 0, but ⟨v, v⟩ > 0 for v ≠ 0.
2. Positive homogeneity:

||αv||² = ⟨αv, αv⟩ = α²⟨v, v⟩ = α²||v||² = |α|²||v||², so ...

||αv|| = |α| · ||v||.
An application

In L²₀ (the space of random variables X with finite mean square
and with mean zero) we get

||X||₂ := √E[X²], so that ||X||₂² := cov(X, X) = E[X²] = var(X).

This is called the L²-norm; the subscript 2 on the norm symbol
and the superscript on the L symbol remind us of the squaring
involved.
So here the variance, square-rooted, provides a notion of length.



Pythagoras’s Theorem

We go on to see that the norm corresponds to our intuitions


about distance.
Consider a triangle with sides parallel to the vectors u and v,
where these two vectors are perpendicular. The hypotenuse is
u + v.

Pythagoras's Theorem: In a real inner product space V,
if ⟨u, v⟩ = 0, then

||u + v||² = ||u||² + ||v||².



Proof

Expanding and using bilinearity

||u + v||² = ⟨u + v, u + v⟩
           = ⟨u, u⟩ + ⟨v, v⟩ + ⟨u, v⟩ + ⟨v, u⟩
           = ||u||² + ||v||²,

since by assumption 0 = ⟨u, v⟩ = ⟨v, u⟩, the latter by symmetry. □






A corollary to come

The corollary in the next slide (in the form of a Theorem) allows
us to regard an arbitrary inner product as satisfying

⟨u, v⟩ = ||u|| · ||v|| cos θ

for some θ which we define to be the angle between u and v,
just like in the cosine rule.

However, unlike in the cosine rule, here we do not start from a
notion of length:
remember that the lengths on the right-hand side of the equation
arise from the inner product, which we seek to interpret.



Cauchy-Schwarz Inequality

The idea of the argument below is that IF we can validly


introduce a notion of angle, as above, then dropping a
perpendicular from u onto the line spanned by v is represented
by the point along v of length

α := ||u|| cos θ.

If v is of unit length, then the said point is (||u|| cos θ).v.

Theorem (Cauchy-Schwarz Inequality). In a real inner


product space V

|hu, vi| ≤ ||u||.||v||, for u, v ∈ V.

Proof. We do this in two steps.
MA212 – Lecture 25 – Tuesday 16 January 2018 page 24
Step 1.

We consider the special case when

||v|| =1.

So we are to prove that |hu, vi| ≤ ||u||. Put

α := hu, vi

(so αv is the foot of the perpendicular) and let

w := u − αv.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 25


Continued

Then (as expected from dropping perpendiculars) w is


orthogonal to v :

hw, vi = hu − αv, vi = hu, vi − αhv, vi = hu, vi − α = 0.

So also
hw, αvi = αhw, vi = 0.

So applying Pythagoras’s Theorem to w and αv , and noting


that w+αv = (u − αv)+αv = u we have

||u||2 = ||w+αv||2 = ||w||2 + ||αv||2

MA212 – Lecture 25 – Tuesday 16 January 2018 page 26


Step 1 Cont’d

But, by positive homogeneity

||αv||2 = |α|2 ||v||2 = |α|2 .

So, dropping the non-negative term ||w||2 in the Pythagoras


formula
||u||2 ≥ ||αv||2 = |α|2 = |hu, vi|2 .

So
||u|| ≥ |hu, vi|,

as required.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 27


Step 2.

We now deal with a general v.


If v = 0, then both sides of the inequality are zero, since
hu, 0i = 0 , and we already saw that ||0||2 = h0, 0i = 0.
Proceeding to the case ||v||2 = hv, vi > 0, we can do some
rescaling of v to v/||v|| .
By positive homogeneity

1 1

||v|| v = ||v|| ||v|| = 1.

We can apply the special case to the vectors u and v/||v|| , as


follows.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 28


Step 2 Cont’d

Since
⟨u, v/||v||⟩ = (1/||v||) ⟨u, v⟩
we get
(1/||v||) |⟨u, v⟩| = |⟨u, v/||v||⟩| ≤ ||u||,
and so on cross-multiplying

|hu, vi| ≤ ||u||.||v||,

as required. 

MA212 – Lecture 25 – Tuesday 16 January 2018 page 29


Angles

As
−||u||·||v|| ≤ ⟨u, v⟩ ≤ ||u||·||v||,

for u, v ≠ 0 we can write this as

−1 ≤ ⟨u, v⟩/(||u||·||v||) ≤ 1,

and so we can find a unique angle θ with 0 ≤ θ ≤ π such that

cos θ = ⟨u, v⟩/(||u||·||v||).

We define this θ to be the angle between u and v . Then

hu, vi = ||u||.||v|| cos θ.


MA212 – Lecture 25 – Tuesday 16 January 2018 page 30
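A minimal numpy sketch (example vectors assumed, not from the slides) checking Cauchy-Schwarz and recovering the angle just defined:

```python
# Sketch (example vectors assumed): Cauchy-Schwarz and the recovered angle.
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

rhs = np.linalg.norm(u) * np.linalg.norm(v)
print(abs(u @ v) <= rhs)                        # Cauchy-Schwarz: True

theta = np.arccos((u @ v) / rhs)                # the unique theta in [0, pi]
print(np.isclose(u @ v, rhs * np.cos(theta)))   # <u,v> = ||u|| ||v|| cos(theta)
```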
Comment

This will help us to remember how to drop a perpendicular from


u onto a unit vector:

(||u|| cos θ) v = ⟨u, v⟩ v for ||v|| = 1

MA212 – Lecture 25 – Tuesday 16 January 2018 page 31


A Final Result

From the Cauchy-Schwarz inequality we deduce our final result


on norms.
In R² consider the parallelogram determined by the origin, two
vertices determined by vectors u, v, with u + v as the fourth
vertex. The triangle formed by u and v has third side of the
same length as u + v. This motivates a similar result in an
inner product space:

Theorem (The triangle inequality).


In a real inner product space V , for any vectors u, v ∈ V

||u + v|| ≤ ||u|| + ||v||.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 32


Proof.

Expand using bilinearity and use symmetry

||u + v||2 = hu + v, u + vi
= hu, ui + hv, vi + hu, vi + hv, ui
= ||u||2 + ||v||2 + 2hu, vi
≤ ||u||2 + ||v||2 + 2||u||.||v|| by Cauchy-Schwarz
= (||u|| + ||v||)2 .

That is
||u + v||2 ≤ (||u|| + ||v||)2 .

But since both ||u + v|| and ||u|| + ||v|| are non-negative

||u + v|| ≤ ||u|| + ||v||,


as required.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 33
Applications

In C[0, 1] put
⟨f, g⟩ = ∫₀¹ f(t)g(t) dt.

(This is like summing uᵢvᵢ.) This is symmetric and bilinear; also

⟨f, f⟩ = ∫₀¹ f(t)² dt = 0 =⇒ f(t) ≡ 0.

(If f (t)2 is non-zero at some point t it will be non-zero in an


interval contributing positive area.)
So here Pythagoras's Theorem asserts that if ⟨f, g⟩ = 0 then

∫₀¹ [f(t) + g(t)]² dt = ∫₀¹ f(t)² dt + ∫₀¹ g(t)² dt.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 34
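A small sympy sketch (not in the slides) with the assumed pair f(t) = 1, g(t) = t − 1/2, which are orthogonal for this inner product:

```python
# Sketch in sympy: f(t) = 1 and g(t) = t - 1/2 are orthogonal in C[0,1],
# so the integral form of Pythagoras's Theorem holds exactly.
import sympy as sp

t = sp.symbols('t')
f, g = sp.Integer(1), t - sp.Rational(1, 2)
ip = lambda p, q: sp.integrate(p * q, (t, 0, 1))

print(ip(f, g))                                              # 0: f is orthogonal to g
print(sp.simplify(ip(f + g, f + g) - ip(f, f) - ip(g, g)))   # 0: Pythagoras
```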


Furthermore

Also here Cauchy-Schwarz asserts that

( ∫₀¹ f(t)·g(t) dt )² ≤ ( ∫₀¹ f(t)² dt ) ( ∫₀¹ g(t)² dt ).

The triangle inequality asserts that

( ∫₀¹ [f(t) + g(t)]² dt )^{1/2} ≤ ( ∫₀¹ f(t)² dt )^{1/2} + ( ∫₀¹ g(t)² dt )^{1/2}.

MA212 – Lecture 25 – Tuesday 16 January 2018 page 35


In L2

For any X take m_X = E[X] and

X̃ = X − m_X ... so that E[X̃] = 0,

so X̃ is in L²₀. Cauchy-Schwarz applied to L²₀ with

⟨X̃, Ỹ⟩ = E[X̃Ỹ] and ||X̃||₂² = E[X̃²]

gives
|⟨X̃, Ỹ⟩|² ≤ ||X̃||₂² · ||Ỹ||₂²

MA212 – Lecture 25 – Tuesday 16 January 2018 page 36


Cont’d

But notice that

E[X̃Ỹ] = E[(X − m_X)(Y − m_Y)]
= E[XY − m_X Y − X m_Y + m_X m_Y]
= E[XY] − m_X E[Y] − m_Y E[X] + m_X m_Y
= E[XY] − m_X m_Y = cov(X, Y)

and in particular

||X̃||₂² = E[X̃²] = E[X²] − m_X² = var(X).

MA212 – Lecture 25 – Tuesday 16 January 2018 page 37


Cont’d

So
cov(X, Y )2 ≤ var(X)var(Y ).

Here the angle θ between X and Y corresponds to their
correlation:

cov(X, Y) / √(var(X) var(Y))

MA212 – Lecture 25 – Tuesday 16 January 2018 page 38


MA212 Further Mathematical Methods Lecture LA 6

Lecture 26: Orthogonal Matrices

Orthogonality
Orthogonal bases,
Orthogonal matrices: isometry & conformality properties
Rotation and reflection in R2 and R3
Recapitulation: story so far

Real inner products hu, vi yield a notion of orthogonality:

u⊥v means hu, vi = 0.

Pythagoras’s Theorem follows and identifies the foot of the


perpendicular from u onto the line through v as

⟨u, v/||v||⟩ (v/||v||) = ⟨u, v⟩v / ||v||²

In an n -dimensional space

hu, vi = uT W v for some symmetric W with positive eigenvalues.

MA212 – Lecture 26 – Friday 19 January 2018 page 2


Cautionary example: symmetry is not enough

 
W = [ 1  2 ; 2  1 ]

Here W = W T so is symmetric but


  
⟨u, u⟩ = uᵀWu = [u₁ u₂] [ 1  2 ; 2  1 ] [ u₁ ; u₂ ]
= u₁² + 4u₁u₂ + u₂² = (u₁ + u₂)² + 2u₁u₂

MA212 – Lecture 26 – Friday 19 January 2018 page 3


So positivity fails, because ...

 
... if u = (1, −1)ᵀ then ⟨u, u⟩ = −2 < 0.

Underlying cause is that one eigenvalue is negative:




p_W(x) = | x−1  −2 ; −2  x−1 | = x² − 2x − 3 = (x + 1)(x − 3).

Conclusion: the matrix W needs to be symmetric and have


positive eigenvalues.

MA212 – Lecture 26 – Friday 19 January 2018 page 4


Orthonormal bases

Why these? Co-ordinates come much easier... via inner


products.
The following applies to any real inner product space.
The vectors {v₁, ..., vₘ} are mutually orthogonal if for all i
vᵢ ≠ 0 (all are non-zero) and

⟨vᵢ, vⱼ⟩ = 0 for all i, j with i ≠ j.

The vectors {v1 , ..., vm } form an orthonormal set if


(i) they are mutually orthogonal, and
(ii) for all i ||vi || = 1.

MA212 – Lecture 26 – Friday 19 January 2018 page 5


Example:

In Rn the natural base vectors {e1 , ..., en } form an orthonormal


set.

MA212 – Lecture 26 – Friday 19 January 2018 page 6


Linear independence*

Any mutually orthogonal set {v1 , ..., vm } is linearly independent.


Suppose that
a1 v1 +...+am vm = 0.

Then
⟨a₁v₁ + ... + aₘvₘ, v₁⟩ = ⟨0, v₁⟩ = 0.

Using linearity,

a₁⟨v₁, v₁⟩ + ... + aₘ⟨vₘ, v₁⟩ = 0
a₁⟨v₁, v₁⟩ = 0
a₁ = 0, as ⟨v₁, v₁⟩ ≠ 0.

MA212 – Lecture 26 – Friday 19 January 2018 page 7


Etc!

The same argument applies to v2 to give a2 = 0, and so on.

MA212 – Lecture 26 – Friday 19 January 2018 page 8


Gram-Schmidt process.

In any subspace U = Lin{u1 , ..., um } with {u1 , ..., um } linearly


independent we construct an orthonormal set {v₁, ..., vₘ}
spanning U .

Step 1. We take
v₁ := u₁/||u₁||
so that ||v1 || = 1.

MA212 – Lecture 26 – Friday 19 January 2018 page 9


Step 2.

We choose α so that

w2 = u2 − αv1 ⊥v1

This needs

0 = hw2 , v1 i = hu2 − αv1 , v1 i = hu2 , v1 i − αhv1 , v1 i


= hu2 , v1 i − α.1

So
w2 = u2 − hu2 , v1 iv1 , and w2 ⊥v1

MA212 – Lecture 26 – Friday 19 January 2018 page 10


But is w2 non-zero?

Yes, otherwise
u₂ = ⟨u₂, v₁⟩v₁

would be a scalar multiple of v₁, hence of u₁, contradicting linear independence.


Now we may take
v₂ = w₂/||w₂|| ⊥ v₁
and v2 is of length 1.

And so on ... for instance: put

w₃ = u₃ − ⟨u₃, v₁⟩v₁ − ⟨u₃, v₂⟩v₂

so that w₃ ⊥ v₁ and w₃ ⊥ v₂; then take v₃ = w₃/||w₃||.

MA212 – Lecture 26 – Friday 19 January 2018 page 11


Example in the function space C[0, 1].

We will use
⟨u, v⟩ = ∫₀¹ u(t)v(t) dt

which gives the Pythagorean norm

||v|| = ||v||₂ := ( ∫₀¹ v(t)² dt )^{1/2}

Consider the subspace U := Lin{u₁, u₂, u₃} where

u₁(t) ≡ 1,
u₂(t) ≡ t,
u₃(t) ≡ t².

MA212 – Lecture 26 – Friday 19 January 2018 page 12


Apply Gram-Schmidt, starting with ...

||u₁|| = ( ∫₀¹ 1 dt )^{1/2} = 1.

We take v₁ := u₁ = 1. Now
w₂ = u₂ − ⟨u₂, v₁⟩v₁ = t − ⟨u₂, v₁⟩·1 = t − 1/2,
as here
⟨u₂, v₁⟩ = ∫₀¹ t dt = 1/2.

MA212 – Lecture 26 – Friday 19 January 2018 page 13


Now ...

"   3 #1 "  3 #
Z 1 2
1 1 1 1 1 1
||w2 ||2 = t− dt = t− = − −
0 2 3 2 3 8 2
0
1
= ,
3·4
so
1
||w2 || = √ ,
2 3
and so
 
w2 (t) √ 1 √
v2 (t) = =2 3 t− = 3 (2t − 1) .
||w2 || 2

MA212 – Lecture 26 – Friday 19 January 2018 page 14


Computation cont’d

w₃ = u₃ − ⟨u₃, v₁⟩v₁ − ⟨u₃, v₂⟩v₂

⟨u₃, v₁⟩ = ∫₀¹ t²·1 dt = 1/3

⟨u₃, v₂⟩ = ∫₀¹ t²·√3(2t − 1) dt = √3 ∫₀¹ (2t³ − t²) dt
= √3 [ t⁴/2 − t³/3 ]₀¹ = √3 (1/2 − 1/3) = √3/6 = 1/(2√3)

MA212 – Lecture 26 – Friday 19 January 2018 page 15


So

w₃(t) = t² − (1/3)·1 − (1/(2√3))·√3(2t − 1)
= t² − t + 1/6.
In the next slide we use the observation that
1/6 − 1/4 = (2 − 3)/12 = −1/12.

MA212 – Lecture 26 – Friday 19 January 2018 page 16


And so finally

||w₃||² = ∫₀¹ ( t² − t + 1/6 )² dt
= ∫₀¹ ( (t − 1/2)² − 1/12 )² dt
= ∫_{−1/2}^{1/2} ( s² − 1/12 )² ds = 2 ∫₀^{1/2} ( s² − 1/12 )² ds
= 2 ∫₀^{1/2} ( s⁴ − s²/6 + 1/144 ) ds
= 2 [ s⁵/5 − (1/6)(s³/3) + s/144 ]₀^{1/2}

MA212 – Lecture 26 – Friday 19 January 2018 page 17


And so

(noting that 2s = 1 when substituting s = 1/2)

2 [ s⁵/5 − s³/18 + s/144 ]_{s=1/2} = 2 ( 1/160 − 1/144 + 1/288 ) = 1/180 = 1/(5·36)

So
v₃ := 6√5 w₃:
v₃ = √5 (6t² − 6t + 1).

MA212 – Lecture 26 – Friday 19 January 2018 page 18
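A sympy sketch (not part of the slides) running the same Gram-Schmidt computation symbolically and reproducing v₂ and v₃:

```python
# Sketch: Gram-Schmidt on 1, t, t^2 with <u,v> = integral over [0,1],
# reproducing v2 = sqrt(3)(2t - 1) and v3 = sqrt(5)(6t^2 - 6t + 1).
import sympy as sp

t = sp.symbols('t')
ip = lambda u, v: sp.integrate(u * v, (t, 0, 1))
normalise = lambda w: sp.simplify(w / sp.sqrt(ip(w, w)))

v1 = normalise(sp.Integer(1))
w2 = t - ip(t, v1) * v1
v2 = normalise(w2)
w3 = t**2 - ip(t**2, v1) * v1 - ip(t**2, v2) * v2
v3 = normalise(w3)

print(sp.expand(v2))   # 2*sqrt(3)*t - sqrt(3)
print(sp.expand(v3))   # 6*sqrt(5)*t**2 - 6*sqrt(5)*t + sqrt(5)
```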


Orthogonal Matrices

Suppose that B = (v1 , ..., vn ) is an orthonormal basis for Rn .


Then

||vᵢ|| = 1 ∀i
⟨vᵢ, vⱼ⟩ = 0 ∀i ≠ j

Recall that MB = matrix of size n × n with columns v1 , ..., vn


represents a change of basis from En to B .

MB maps a vector in B -co-ordinates into its En -co-ordinates.

Here we take hu, vi := u · v .

MA212 – Lecture 26 – Friday 19 January 2018 page 19


...also

  
M_BᵀM_B = [ rows v₁ᵀ, v₂ᵀ, ... ] [ columns v₁, v₂, ... ]

= [ v₁ᵀv₁  v₁ᵀv₂  ...      [ 1  0  ...
    v₂ᵀv₁  v₂ᵀv₂  ...   =    0  1  ...   = I_{n×n}
    ...    ...    ... ]      ...     ]

So B = (v₁, ..., vₙ) orthonormal ⇔ M_BᵀM_B = I ⇔ M_B⁻¹ = M_Bᵀ.

MA212 – Lecture 26 – Friday 19 January 2018 page 20


In general...

Definition: Matrix M is orthogonal if M −1 = M T .

...important in view of their Geometric features

MA212 – Lecture 26 – Friday 19 January 2018 page 21


Preservation of Geometry

The following are equivalent for An×n

1. A is orthogonal

2. T defined by T (x) := Ax takes En to an orthonormal basis

3. Isometry: ||Av|| = ||v|| for all v in Rn


... so distance is preserved

4. Conformality: angles are preserved

MA212 – Lecture 26 – Friday 19 January 2018 page 22


Let’s check this out ...

(1) ⇔ (2) Write A = [a1 , ...., an ] (columns of A ) then

Ax = x1 a1 + .... + xn an

Since T (e1 ) = a1 etc., T takes E = (e1 , ..., en ) to (a1 , ...., an )


which are orthonormal if A is orthogonal.

MA212 – Lecture 26 – Friday 19 January 2018 page 23


A useful observation

If AT A = I

(Au) · (Av) = (Au)T Av = uT AT Av = uT v = u · v.

So
(1) =⇒ (3): Take u = v , then

||Av||2 =(Av) · (Av) = v · v = ||v||2

MA212 – Lecture 26 – Friday 19 January 2018 page 24


Cont’d

(1) =⇒ (4): Since ||Av|| =||v|| for all v, then with θ the angle
between u and v and ϕ the angle between Au and Av :

(Au) · (Av) = ||Au||.||Av|| cos ϕ


u · v = ||u||.||v|| cos θ

So
ϕ = θ (or 2π − θ, same thing really).

MA212 – Lecture 26 – Friday 19 January 2018 page 25


Cont’d

(4) =⇒ (2):

 1, if i = j,
ai · aj = (Aei ) · (Aej ) = ei · ej =
 0, if i 6= j.

(4) =⇒ (3): Angles are preserved; the angle between Av and Av
is θ = 0, as is the angle between v and v, so

||Av||² = (Av) · (Av) = v · v = ||v||².

MA212 – Lecture 26 – Friday 19 January 2018 page 26


Cont’d

(3) =⇒ (4): By a recent Homework exercise:

(Au) · (Av) = (1/4)||Au + Av||² − (1/4)||Au − Av||²

(To see why: expand the RHS using ||x + y||² = ⟨x + y, x + y⟩ etc.)

= (1/4)||A(u + v)||² − (1/4)||A(u − v)||²
= (1/4)||u + v||² − (1/4)||u − v||²   ... by isometry
= u · v   ... by the same Homework exercise.

MA212 – Lecture 26 – Friday 19 January 2018 page 27


Some useful facts

A orthogonal =⇒ A−1 orthogonal

AT = A−1 =⇒ (A−1 )T = (AT )T = A = (A−1 )−1 .

A, B orthogonal =⇒ AB orthogonal

(AB)(AB)T = ABB T AT = AIAT = AAT = I

So
(AB)−1 = (AB)T .

Of course: if A, B preserve distance then so does AB (which


represents B followed by A)

A orthogonal =⇒ det(A) = ±1.


MA212 – Lecture 26 – Friday 19 January 2018 page 28
Matrix orthogonality for vectors in R2

Here A is 2 × 2. Write

a₁ = Ae₁ = (1st column of A) = (cos θ, sin θ)ᵀ,

being a vector of length 1.

MA212 – Lecture 26 – Friday 19 January 2018 page 29


Where and what is Ae2 ?

As e₁ ⊥ e₂, we must have Ae₁ ⊥ Ae₂. So: either

Ae₂ = (cos(θ + π/2), sin(θ + π/2))ᵀ = (−sin θ, cos θ)ᵀ

or:

Ae₂ = (cos(θ − π/2), sin(θ − π/2))ᵀ = (sin θ, −cos θ)ᵀ

MA212 – Lecture 26 – Friday 19 January 2018 page 30


Either this

[Figure: e₁, e₂ with a₁ = Ae₁ at angle θ and one choice of a₂ = Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 31


or this

[Figure: e₁, e₂ with a₁ = Ae₁ at angle θ and the other choice of a₂ = Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 32


In the first case

 
A = [a₁, a₂] = [ cos θ  −sin θ ; sin θ  cos θ ]  and det A = cos²θ + sin²θ = 1.

This is orientation preserving: the axes have been rotated


through an angle θ.

MA212 – Lecture 26 – Friday 19 January 2018 page 33


Illustration

[Figure: rotation through θ carries e₁, e₂ to Ae₁, Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 34


In the second case

 
A = [a₁, a₂] = [ cos θ  sin θ ; sin θ  −cos θ ]  and det A = −cos²θ − sin²θ = −1

This is orientation reversing: Not a rotation!


This is reflection in the line through the origin with angle θ/2.

MA212 – Lecture 26 – Friday 19 January 2018 page 35
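A minimal numpy sketch (θ = π/3 assumed for illustration; not from the slides) checking both 2 × 2 families:

```python
# Sketch (theta = pi/3 assumed): both families are orthogonal matrices,
# with determinant +1 (rotation) and -1 (reflection).
import numpy as np

theta = np.pi / 3
c, s = np.cos(theta), np.sin(theta)
rotation = np.array([[c, -s], [s, c]])
reflection = np.array([[c, s], [s, -c]])

for A in (rotation, reflection):
    print(np.allclose(A.T @ A, np.eye(2)), np.linalg.det(A))
# True 1.0 (rotation) and True -1.0 (reflection), up to rounding
```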


Illustration

[Figure: reflection in the line at angle θ/2 carries e₁, e₂ to Ae₁, Ae₂.]

MA212 – Lecture 26 – Friday 19 January 2018 page 36


Three dimensions: Part I

Theorem. If A is 3 × 3 and orthogonal with det A = +1


then λ = +1 is an eigenvalue of A.

Proof. As det(Aᵀ) = det A = 1,

det(A − I) = det(Aᵀ(A − I))
= det(I − Aᵀ)   ...(as AᵀA = I)
= det((I − A)ᵀ)
= det(I − A)   ...(as det Mᵀ = det M)
= det((−I)(A − I))
= det(−I) det(A − I)
= (−1) det(A − I)   ...(as here −I is 3 × 3)
MA212 – Lecture 26 – Friday 19 January 2018 page 37
Conclusion

So
2 det(A − I) = 0 : det(A − I) = 0.

This says that λ = 1 solves det(A − λI) = 0.

MA212 – Lecture 26 – Friday 19 January 2018 page 38


How about det = −1

Theorem. If A is 3 × 3 and orthogonal with det A = −1


then λ = −1 is an eigenvalue of A.

Proof. As det(Aᵀ) = det A = −1,

det(A + I) = (−det Aᵀ) det(A + I)
= −det(Aᵀ(A + I))
= −det(I + Aᵀ)   ...(as AᵀA = I)
= −det((I + A)ᵀ)
= −det(I + A)   ...(as det Mᵀ = det M)

MA212 – Lecture 26 – Friday 19 January 2018 page 39


Conclusion

So
2 det(A + I) = 0 : det(A + I) = 0.

This says that λ = −1 solves det(A − λI) = 0.

MA212 – Lecture 26 – Friday 19 January 2018 page 40


Summary

Orthogonal means M −1 = M T
Preservation of geometry
Group property: AB and A−1 are orthogonal if A, B are.
det(M ) = ±1 and |λ| = 1 for λ any eigenvalue (could be
complex!)
For vectors in R2 an orthogonal matrix transformation is a
rotation or a reflection
If det = +1, then there is an eigenvector v with Av = v
If det = −1, then there is an eigenvector v with Av = −v

MA212 – Lecture 26 – Friday 19 January 2018 page 41


MA212 Further Mathematical Methods Lecture LA 7

Lecture 27: Orthogonal Matrices – cont’d

Two 3 × 3 examples
Complex scalars and vectors
Complex Orthogonality
Story so far ...

Orthonormal base B has its representing matrix MB an


‘orthogonal matrix’

M_BᵀM_B = I, or
M_B⁻¹ = M_Bᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 2


Recapitulation : 1

M is an ‘orthogonal matrix’ means that

M −1 = M T .

Then: Isometry, and Conformality


The 2 × 2 case has either det = +1 and is a rotation:

[ cos θ  −sin θ ; sin θ  cos θ ]

Or det = −1 and is then a reflection matrix:

[ cos θ  sin θ ; sin θ  −cos θ ]
MA212 – Lecture 27 – Tuesday 23 January 2018 page 3
Axis of rotation of a 3 × 3 matrix

Suppose that det A = +1.


Let v be an eigenvector to value λ = +1, i.e.

Av = v.

Consider P, the plane through the origin orthogonal to the line
Lin{v}.
So u ∈ P iff u ⊥ v.

Consider u ∈ P, so u ⊥ v.
Then by conformality Au ⊥ Av = v,
so Au ⊥ v:
A maps P into P.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 4


A rotation!

So the linear mapping T : P → P defined by T (u) = Au is a


rotation: why?

MA212 – Lecture 27 – Tuesday 23 January 2018 page 5


Why a rotation?

Because, det A = 1 so A is orientation preserving on P.


Conclusion: A is a rotation around the line through v.
Au ∈ P whenever u ⊥ v.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 6


Illustration

[Figure: A rotates the plane P orthogonal to the axis through v.]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 7


Example:

Find the matrix representing rotation T by +π/4 (i.e.


anti-clockwise) around
 
v := (1, 1, 0)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 8


Solution approach

First re-scale the vector v to get a unit vector:

u = (1/√2) (1, 1, 0)ᵀ.

Step 0. Solve a simpler problem: a rotation R about e3 through


an angle θ = π/4 is represented, as in slide 3, by

A_R^{EE} := [ cos θ  −sin θ  0      [ 1/√2  −1/√2  0
              sin θ   cos θ  0   =    1/√2   1/√2  0
              0       0      1 ]      0      0     1 ].

This is relative to the natural basis.


MA212 – Lecture 27 – Tuesday 23 January 2018 page 9
Step 1.

We can get the transformation T from R : just interpret e3 as


representing the axis of rotation, that being the following unit
vector in some new base B = (v1 , v2 , v3 )
 
v₃ = (1/√2) (1, 1, 0)ᵀ.

In other words we want e3 = (u)B , i.e. to be the co-ordinate


column of u relative to some base B.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 10


Step 2.

So construct an orthonormal basis B = (v1 , v2 , v3 ) for R3 which


includes the above unit-length vector as v3 , and preserves
orientation.
Then we think of e3 = (0, 0, 1)T as the co-ordinate column xB of
u, because it encodes that u is ( 0 units of v1 ) + ( 0 units of v2 )
+ ( 1 unit of v3 ):
e3 = (u)B

(in the notation of MA100). Then


 √ √ 
1/ 2 −1/ 2 0
 √ √ 
ABB
T = AEE
R = 
 1/ 2 1/ 2 0  .
0 0 1

MA212 – Lecture 27 – Tuesday 23 January 2018 page 11


Step 3.

But we require A_T^{EE}. So use

(x)_E = M_B (x)_B.

Then

(y)_B = T((x)_B) = A_T^{BB} (x)_B ⇔ M_B⁻¹(y)_E = A_T^{BB} M_B⁻¹(x)_E
(y)_E = M_B A_T^{BB} M_B⁻¹ (x)_E :  A_T^{EE} = M_B A_T^{BB} M_B⁻¹.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 12


Solution

Pick also a with a ⊥ u. How about taking:

a = (1/√2) (1, −1, 0)ᵀ ⊥ u = v₃ = (1/√2) (1, 1, 0)ᵀ.

Now pick a vector b with b ⊥ v₃ and b ⊥ a.

How about the following, which is of unit length:
b = (0, 0, 1)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 13


Problem: which do we take?

Do we take
B := [a, b, v₃] or B := [b, a, v₃]?

They have determinants of opposite sign. A little
experimentation leads to the right choice as

B := [v₁, v₂, v₃] = (1/√2) [ 0   1  1
                             0  −1  1
                             √2  0  0 ],

since this has det B = +1. Otherwise we would switch the


sense of rotation to clockwise.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 14


Getting there

This B is orthogonal (we took an orthonormal basis), so its


inverse is its transpose.
So

AEE
T = MB ABB T EE
T MB = MB AR MB
T
   √ 
0 1 1 1 −1 0 0 0 2
1 


 
 
= √  0 −1 1   1 1 0   1 −1 0 
 

2 2 √ √
2 0 0 0 0 2 1 1 0
= ...
 √ √ √ 
1 + 2 −1 + 2 2
1  √ √ √ 
= √  −1 + 2 1 + 2 − 2 

 .
2 2 √ √
− 2 2 2
MA212 – Lecture 27 – Tuesday 23 January 2018 page 15
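A numpy sketch (not in the slides) verifying this worked example: the assembled matrix is orthogonal with det +1 and fixes the chosen axis.

```python
# Sketch verifying the example: M_B A_R M_B^T is orthogonal, has det +1,
# and fixes the rotation axis v = (1, 1, 0)^T / sqrt(2).
import numpy as np

r = 1 / np.sqrt(2)
M_B = np.array([[0, r, r], [0, -r, r], [1, 0, 0]])   # columns b, a, v3
A_R = np.array([[r, -r, 0], [r, r, 0], [0, 0, 1]])   # rotation by pi/4 about e3
A_T = M_B @ A_R @ M_B.T                  # M_B orthogonal, so M_B^{-1} = M_B^T

v = np.array([r, r, 0])
print(np.allclose(A_T @ v, v))                       # the axis is fixed: True
print(np.allclose(A_T.T @ A_T, np.eye(3)),
      np.isclose(np.linalg.det(A_T), 1.0))           # orthogonal with det +1
```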
Illustration: We want preservation of orientation.

We refer to the fingers of the right hand as indicating the


directions: v1 thumb , v2 index finger, and v3 middle finger.
(This agrees with the usual ordering e1 , e2 and e3 ).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 16


Middle finger to be along v3 , the axis of rotation.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 17


Orientation: thumB must be on b

(It helps to make the right arm point right.)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 18


Recapitulation 2

The case of orientation preservation:

det = +1 =⇒ (∃v) Av = v

The case of orientation reversal:

det = −1 =⇒ (∃v) Av = −v

MA212 – Lecture 27 – Tuesday 23 January 2018 page 19


Example in R3

You are told that


 
A = (1/9) [ 8  −1   4
            1  −8  −4
            4   4  −7 ]

is orthogonal with det A = +1. Find the axis and angle of


rotation.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 20


Solution.

First solve Av = v to get the axis. Equivalently, here
9(A − I)v = 0:

[ 8−9   −1     4        [ x₁     [ 0
  1    −8−9   −4     ·    x₂  =    0
  4     4    −7−9 ]       x₃ ]     0 ]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 21


So

    
[ −1   −1    4      [ x₁     [ 0
   1  −17   −4   ·    x₂  =    0
   4    4  −16 ]      x₃ ]     0 ]

Adding the first two rows gives

−18x₂ = 0 :  x₂ = 0.

Hence the general solution is

v = x₃ (4, 0, 1)ᵀ.
MA212 – Lecture 27 – Tuesday 23 January 2018 page 22
Cont’d

Take
v₃ = (4, 0, 1)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 23


Next ..

Pick also u with u ⊥ v. How about taking:

u = (0, 1, 0)ᵀ = e₂.

Then
Au = Ae₂ = 2nd col = (1/9) (−1, −8, 4)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 24


Cont’d

So if θ is the angle of rotation, then as ||Au|| = ||u|| = 1 we have

u·Au = ||u|| · ||Au|| cos θ = cos θ = e₂ · (1/9)(−1, −8, 4)ᵀ = −8/9.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 25


So

 
θ = cos⁻¹(−8/9).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 26


Comment: an alternative method

As det A = +1, A is similar to

B = [ cos θ  −sin θ  0
      sin θ   cos θ  0
      0       0      1 ]

So

Tr(B) = 1 + 2 cos θ
= Tr(A)   ... (by similarity)
= (1/9)(8 − 8 − 7) = −7/9

MA212 – Lecture 27 – Tuesday 23 January 2018 page 27


So

2 cos θ = −7/9 − 1 = −16/9 :  cos θ = −8/9.

[Figure: unit circle showing the obtuse angle θ with cos θ = −8/9.]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 28
Question: is this clockwise or anticlockwise?

As cos(−θ) = cos θ we still have an unanswered issue:


although we can compute the “acute” angle θ ∈ [0, π] with
cos θ = − 89 , we must yet determine if the rotation is:
+θ (anticlockwise), or −θ (clockwise).

The clockwise answer is of course equivalent to an


anticlockwise obtuse angle of 2π − θ .
(Watch your language!)

The upshot is that we know the angle, but not its direction.
To settle this question we need another Recap from MA100.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 29


Recapitulation 3: On cross-products

For vectors a, b, c

c = a × b = | e₁  e₂  e₃
             a₁  a₂  a₃
             b₁  b₂  b₃ |

has
||c|| = ||a|| · ||b|| · |sin θ|.

Beware this is sin NOT cos . The formula describes numerically


the area of the parallelogram with sides a and b , with θ the
angle between a and b (measured from a to b ). See the
Figures.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 30


Orientation of c = a × b with θ measured from a to b.

The orientation of c (parallel to v3 ) is obtained by applying the


right-hand rule: identify your thumb with a , your index with b ,
then the middle finger points to (is aligned with) c .

c = (||a|| · ||b|| · sin θ) v3

[Figure: right-hand orientation of a, b and c = a × b, with v₃ along c.]

(For us v3 = v with v2 somewhere in the plane of a and b .)


MA212 – Lecture 27 – Tuesday 23 January 2018 page 31
Parallelogram

... for c = a × b with θ measured from a to b

[Figure: parallelogram with sides a and b; height ||b|| sin θ.]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 32


Computation

Using the cross-product formula, with a = u = e₂ and
b = Au = a₂ (the 2nd column of A) as rows:

a × b = | e₁  e₂  e₃        | e₁    e₂    e₃
         a₁  a₂  a₃    =      0     1     0
         b₁  b₂  b₃ |        −1/9  −8/9   4/9 |

= (1/9)(4e₁ + 0e₂ + 1e₃)
= (1/9) (4, 0, 1)ᵀ = (1/9) v = (√17/9) (v/||v||)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 33


So c points in the same direction as v

...because 1/9 > 0. So

c = a × b = sin θ (v/||v||) = (√17/9)(v/||v||)

Recall that above a = u and b = Au, both being of length 1.

Let θ measure the angle from a to b; then

sin θ = √17/9.

But θ is also the angle of rotation around v, so the angle of
rotation is positive, so anti-clockwise.

This is of course numerically consistent with θ = cos⁻¹(−8/9).
MA212 – Lecture 27 – Tuesday 23 January 2018 page 34
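A numpy sketch (not in the slides) recomputing the whole example: axis, angle and sense.

```python
# Sketch recomputing the example numerically: axis from the lambda = 1
# eigenvector, cos(theta) from the trace, and the sense from u x Au.
import numpy as np

A = np.array([[8, -1, 4], [1, -8, -4], [4, 4, -7]]) / 9.0

w, V = np.linalg.eig(A)
axis = np.real(V[:, np.argmin(np.abs(w - 1))])
axis = axis / axis[2]                    # rescale: proportional to (4, 0, 1)
print(axis)

print((np.trace(A) - 1) / 2)             # cos(theta) = -8/9

u = np.array([0.0, 1.0, 0.0])            # u perpendicular to the axis
c = np.cross(u, A @ u)                   # c = a x b
print(c @ axis > 0)                      # c along +axis: anti-clockwise, True
```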
Revision

Getting to grips with C (the complex scalars)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 35


Complex numbers as scalars

C denotes the complex numbers written typically as



z = a + ib a, b ∈ R and i = −1
a = Re(z) = real part of z
b = Im(z) = imaginary part of z

The conjugate is
z̄ = a − ib.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 36


Multiplication:

This follows the usual rules of algebra with i2 being replaced by


−1 :

(a + ib)(c + id) = ac + bdi2 + i(bc + ad)


= (ac − bd) + i(bc + ad)

So, taking c = a, d = −b :

z z̄ = (a + ib)(a − ib) = a2 + b2

MA212 – Lecture 27 – Tuesday 23 January 2018 page 37


Modulus

The modulus is defined by


|z| = √(a² + b²).

This relies on the idea that z may be interpreted as the


point/vector (a, b) in the plane R2 . Then |z| is the Pythagorean
norm (length) of the vector (a, b) :

|z| = r = ||(a, b)||, where r² = a² + b²

So this denotes the radial distance of z from the origin (which


represents 0).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 38


Fractions

Note that computation of fractions/ratios may need


‘real-ification’:

1/z = z̄/(z z̄) = z̄/|z|² = (a − ib)/(a² + b²).

Example

(2 − i)/(3 + i) = (2 − i)(3 − i)/((3 + i)(3 − i)) = ((6 − 1) + i(−5))/10
= (5 − 5i)/10 = (1 − i)/2.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 39


More about conjugates

Interpreting z = a + ib as the vector (a, b) tells us that the real


line is the horizontal axis: the real axis of the plane.
Likewise the imaginary numbers ib (so with real part zero) form
the perpendicular axis of points (0, b) : the imaginary axis.
The complex conjugacy map

z 7→ z̄ (a + ib) 7→ (a − ib)

is then reflection in the real axis. Underlying this is a


transformation which replaces i by −i. Since

(−i)2 = i2 = −1,

we’re substituting one root of −1 by another (its conjugate).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 40


Conjugacy preserves arithmetic

Conjugacy is both additive and multiplicative:

conj(z + w) = z̄ + w̄,
conj(zw) = z̄ · w̄,

so preserves arithmetic operations.


Hence
conj(z/w) = z̄/w̄ for w ≠ 0.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 41


Geometric features

From the exponential series

e^z = 1 + z + (1/2)z² + (1/3!)z³ + ...,
which mathematicians usually take as the definition of ez , a
point we revisit later.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 42


Substitution z = iθ

yields

e^{iθ} = 1 + iθ − (1/2)θ² − (1/3!)iθ³ + ...
= (1 − (1/2)θ² + ...) + i(θ − (1/3!)θ³ + ...)
= cos θ + i sin θ.

So e^{iθ} is interpreted as the vector

(cos θ, sin θ)

at unit distance from 0 and with angle θ to the positive real


half-axis.
MA212 – Lecture 27 – Tuesday 23 January 2018 page 43
Cont’d

Since the vector (a, b) is at distance r from 0 and has
co-ordinates

a = r cos θ, b = r sin θ, where r² = a² + b²,

the corresponding complex number a + ib may also be
written

a + ib = r(cos θ + i sin θ) = re^{iθ}.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 44


Cont’d

Hence also with z = re^{iθ} and w = se^{iϕ}

|zw| = |z| · |w| from |re^{iθ} · se^{iϕ}| = |rs e^{i(θ+ϕ)}| = rs,

and likewise

|z/w| = |re^{iθ}/se^{iϕ}| = |(r/s)e^{i(θ−ϕ)}| = r/s = |z|/|w|, for w ≠ 0.

Using the fact that conjugacy replaces i by −i:

z̄ = re^{−iθ}.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 45


Triangle inequality

Also, in view of |z| = |a + ib| = r = ||(a, b)||, the triangle


inequality for Pythagorean norms gives

|z + w| ≤ |z| + |w|.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 46


Complex Vector Space Cn

We define
Cⁿ = { (z₁, ..., zₙ)ᵀ : z₁, ..., zₙ ∈ C }.

These are called complex vectors: they have complex


entries/components.
Applying conjugation to the components:
   
If z = (z₁, ..., zₙ)ᵀ, then z̄ = (z̄₁, ..., z̄ₙ)ᵀ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 47


A complex matrix is ...

a matrix with complex entries:

A = [ a₁₁  ...  a₁ₙ
      ...  ...  ...
      aₘ₁  ...  aₘₙ ]  with all aᵢⱼ in C.

Then the conjugate matrix is:

Ā = [ ā₁₁  ...  ā₁ₙ
      ...  ...  ...
      āₘ₁  ...  āₘₙ ].

MA212 – Lecture 27 – Tuesday 23 January 2018 page 48


And as for the determinant

As conjugacy preserves arithmetic, for m = n

det(Ā) = conj(det A)

MA212 – Lecture 27 – Tuesday 23 January 2018 page 49


Easy as she goes...

What can be done with real scalars can be done with complex
scalars. Watch an example.
Solve Az = b where

[ 1   0   i        [ z₁     [ 2+i
  −i  1   1+i   ·    z₂  =    3−2i  .
  0   i   −1 ]       z₃ ]     2i ]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 50


Solution

Form the augmented matrix

[ 1   0   i    2+i
  −i  1   1+i  3−2i     R2 → R2 + iR1
  0   i   −1   2i ]

[ 1   0   i    2+i
  0   1   i    2
  0   i   −1   2i ]     R3 → R3 − iR2

MA212 – Lecture 27 – Tuesday 23 January 2018 page 51


Continuing ...

 
[ 1  0  i  2+i
  0  1  i  2
  0  0  0  0 ]

MA212 – Lecture 27 – Tuesday 23 January 2018 page 52


So

   
z₁ + iz₃ = 2 + i
z₂ + iz₃ = 2

=⇒ (z₁, z₂, z₃)ᵀ = (2 + i − iz₃, 2 − iz₃, z₃)ᵀ
= (2 + i, 2, 0)ᵀ + z₃ (−i, −i, 1)ᵀ  for z₃ ∈ C

MA212 – Lecture 27 – Tuesday 23 January 2018 page 53


Comments

We have found the solutions of Az = b to be of the form
z = z_special + z₃v.
In particular for z₃ = 0 we get b = Az_special. So
A(z₃v) = A(z − z_special) = b − b = 0.
So
N(A) = Lin{v} = Lin{ (−i, −i, 1)ᵀ },

and nullity(A) = 1.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 54
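A numpy sketch (not part of the slides) confirming the solution set of this complex system:

```python
# Sketch checking the solution of the complex system above: z_special
# solves Az = b and v spans the null space N(A).
import numpy as np

A = np.array([[1, 0, 1j], [-1j, 1, 1 + 1j], [0, 1j, -1]])
b = np.array([2 + 1j, 3 - 2j, 2j])
z_special = np.array([2 + 1j, 2, 0])
v = np.array([-1j, -1j, 1])

print(np.allclose(A @ z_special, b))   # True
print(np.allclose(A @ v, 0))           # True: v spans N(A)
print(np.linalg.matrix_rank(A))        # 2, so nullity = 3 - 2 = 1
```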


Comments continued

Referring to

B := [ 1  0  i
       0  1  i
       0  0  0 ] = [b₁, b₂, b₃]

and noting the third column b₃ is a combination of b₁, b₂, we
have

R(B) = Lin{ (1, 0, 0)ᵀ, (0, 1, 0)ᵀ } = Lin{b₁, b₂}

MA212 – Lecture 27 – Tuesday 23 January 2018 page 55


Further consequences

So rk(B) = rk(A) = 2 and in fact:

R(A) = Lin{a₁, a₂} = Lin{ (1, −i, 0)ᵀ, (0, 1, i)ᵀ }.

As rank + nullity = 2 + 1 = dim(domain) = 3, as expected: since C³ is
3-dimensional as a vector space over the complex scalars.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 56


Determinant here?


det(A) = | 1   0   i
          −i   1   1+i
           0   i   −1 |

= 1 · | 1  1+i ; i  −1 | + 0 + i · | −i  1 ; 0  i |

= −1 − i(1 + i) + i(−i²) = −1 − i − i² + i = 0.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 57


Geometry in Cn

Recall that

|z|2 = z z̄ = a2 + b2 .

We follow the Pythagorean norm on C and generalize beyond to


Cn by defining

||z||2 = |z1 |2 + ... + |zn |2 = z1 z̄1 + ... + zn z̄n .

MA212 – Lecture 27 – Tuesday 23 January 2018 page 58


Inner products: complex-valued

The last formula motivates us to define the complex scalar
product by:

z · w = (z₁, ..., zₙ)ᵀ · (w₁, ..., wₙ)ᵀ
= zᵀw̄ = [z₁, ..., zₙ] (w̄₁, ..., w̄ₙ)ᵀ
= z₁w̄₁ + ... + zₙw̄ₙ.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 59


Then ...

w · z = w₁z̄₁ + ... + wₙz̄ₙ = conj(z₁w̄₁ + ... + zₙw̄ₙ) = conj(z · w)

We will also write

⟨z, w⟩ = z · w :  ⟨w, z⟩ = w · z = conj(z · w) = conj(⟨z, w⟩).

We lose symmetry but retain left-sided linearity:

hαz + βz′ , wi = (αz1 + βz1′ )w̄1 + ... + (αzn + βzn′ )w̄n


= (αz1 w̄1 + ...) + (βz1′ w̄1 + ...)
= α(z1 w̄1 + ...) + β(z1′ w̄1 + ...)
= αhz, wi + βhz′ , wi

MA212 – Lecture 27 – Tuesday 23 January 2018 page 60


Compare this with ...

hz, αw + βw′ i = hαw + βw′ , zi = αhw, zi + βhw′ , zi


= ᾱhw, zi + β̄hw′ , zi = ᾱhz, wi + β̄hz, w′ i

This property is called ‘sesquilinearity’ (Latin for ‘one and a


half’-linearity).

MA212 – Lecture 27 – Tuesday 23 January 2018 page 61


Complex Inner product Space

For V a complex vector space a map

h·, ·i : V × V → C

is a complex inner product if it satisfies:


Linearity in the first argument:

hαu1 + βu2 , vi = αhu1 , vi + βhu2 , vi all u1 , u2 , v ∈V, α, β ∈ C

Hermitian property:

⟨v, u⟩ = conj(⟨u, v⟩) for all u, v ∈ V.

Positivity:
hu, ui > 0 all u 6= 0 ∈V.
MA212 – Lecture 27 – Tuesday 23 January 2018 page 62
Comment

Note the Hermitian property implies that ⟨u, u⟩ = conj(⟨u, u⟩) for all
u ∈ V; so ⟨u, u⟩ ∈ R; but we want more than just being real.

MA212 – Lecture 27 – Tuesday 23 January 2018 page 63


MA212 Further Mathematical Methods Lecture LA 8

Lecture 28: Complex orthonormality

Unitary matrices from orthogonality


Adjoint operation A∗
Hermitian matrices
(In pursuit of diagonalization)
Background Readings for this Lecture

Anthony and Harvey: Chap. 13 (§13.3-13.6)

Adam Ostaszewski: Chap. 5 (§5.1 and 5.3)

MA212 – Lecture 28 – Friday 26 January 2018 page 2


Recap about: <, > in the complex case

u · v = uᵀv̄
= u₁v̄₁ + u₂v̄₂ + · · · + uₙv̄ₙ

so
u · u = Σᵢ |uᵢ|²,
u · u > 0 for u ≠ 0.

This is the standard example of a complex inner product


hu, vi = u · v

MA212 – Lecture 28 – Friday 26 January 2018 page 3


Complex inner product space

For h·, ·i : V × V −→ C
Linearity:
hαu1 + βu2 , vi = αhu1 , vi + βhu2 , vi , ∀u1 , u2 , v ∈ V ,
∀α, β ∈ C
Hermitian property: ⟨v, u⟩ = conj(⟨u, v⟩), ∀v, u
Positivity:
hu, ui > 0 for all u 6= 0

N.B.
conj(⟨u, u⟩) = ⟨u, u⟩ so ⟨u, u⟩ is real, but we want more (positivity).

MA212 – Lecture 28 – Friday 26 January 2018 page 4


Orthonormal set, an example:

   
v₁ = (1/√2)(1, i, 0)ᵀ,  v₂ = (1/√3)(1, −i, 1)ᵀ — what are their norms?

Since 1·1̄ + i·ī + 0 = 1 + (i)(−i) = 1 + 1 = 2,
||v₁|| = 1.
Since 1·1̄ + (−i)·conj(−i) + 1·1̄ = 1 − (i)(i) + 1 = 1 + 1 + 1 = 3,
||v₂|| = 1.

Moreover, v₁ · v₂ = (1/√2)(1/√3)(1·1̄ + i·conj(−i) + 0·1̄) = (1/√6)(1 + i·i) = 0,
so v₁ ⊥ v₂.

MA212 – Lecture 28 – Friday 26 January 2018 page 5


Constructing an orthonormal base:

Essentially just as in the Rn case, but care must be taken over


inner products here because linearity is only in the first
argument of h·, ·i.

Base given as:

u₁ := (1, i, 0)ᵀ,  u₂ := (1, −i, 1)ᵀ,  u₃ := (0, 1, 0)ᵀ

MA212 – Lecture 28 – Friday 26 January 2018 page 6


Apply the Gram-Schmidt construction ...

... to the above vectors:

u₁ = (1, i, 0)ᵀ,

and ||u₁||² = 1² + |i|² + 0² = 2. So,

v₁ = (1/√2)(1, i, 0)ᵀ.

MA212 – Lecture 28 – Friday 26 January 2018 page 7


In the Figure below: what’s α?

A vector w2 ⊥ v1 is wanted with w2 = u2 − αv1 . So what is


wanted is:

hw2 , v1 i = 0 i.e. hu2 − αv1 , v1 i = 0.

So hu2 , v1 i − αhv1 , v1 i = 0 ,
yielding:
α = hu2 , v1 i.

MA212 – Lecture 28 – Friday 26 January 2018 page 8


Notice that ...

u₂ comes first in the inner product ⟨·,·⟩ calculation:

⟨u₂, v₁⟩ = (1/√2){1·1̄ + (−i)·ī + 1·0̄} = (1/√2){1 + (−i)(−i)} = 0.

So
w₂ = u₂ − ⟨u₂, v₁⟩v₁ = (1, −i, 1)ᵀ − 0 · v₁ = (1, −i, 1)ᵀ.

MA212 – Lecture 28 – Friday 26 January 2018 page 9


Hurray!

So w₂ = u₂ ⊥ v₁, as wanted!
Hence ||w₂||² = 1 + |−i|² + 1 = 3. Rescaling yields

v₂ = (1/√3)(1, −i, 1)ᵀ.

Now work on u₃ = (0, 1, 0)ᵀ:

w₃ = u₃ − ⟨u₃, v₁⟩v₁ − ⟨u₃, v₂⟩v₂

Again u₃ comes first in the two inner products ⟨·,·⟩

MA212 – Lecture 28 – Friday 26 January 2018 page 10


Cont’d

   
⟨u₃, v₁⟩ = (0, 1, 0)ᵀ · (1/√2)(1, i, 0)ᵀ = (1/√2)(0 + 1·ī + 0) = −(1/√2) i

and

⟨u₃, v₂⟩ = (0, 1, 0)ᵀ · (1/√3)(1, −i, 1)ᵀ = (1/√3)(0 + 1·conj(−i) + 0) = i/√3.

MA212 – Lecture 28 – Friday 26 January 2018 page 11


So...

     
w₃ = (0, 1, 0)ᵀ − (−i/√2)(1/√2)(1, i, 0)ᵀ − (i/√3)(1/√3)(1, −i, 1)ᵀ
= (0, 1, 0)ᵀ + (i/2)(1, i, 0)ᵀ − (i/3)(1, −i, 1)ᵀ

= (i/6, 1/6, −i/3)ᵀ,  as 1/2 − 1/3 = (3 − 2)/6 = 1/6.

MA212 – Lecture 28 – Friday 26 January 2018 page 12


Cont’d

   
w₃ = (i/6, 1/6, −i/3)ᵀ = (1/6)(i, 1, −2i)ᵀ.

Now ||w₃||² = (1/36)(|i|² + 1² + |−2i|²) = 6/36 = 1/6, so

v₃ = w₃/||w₃|| = (1/√6)(i, 1, −2i)ᵀ.

MA212 – Lecture 28 – Friday 26 January 2018 page 13
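A numpy sketch (not in the slides) of this complex Gram-Schmidt run; the only subtlety is where the conjugate sits in the inner product.

```python
# Sketch of the complex Gram-Schmidt above. Note np.vdot conjugates its
# FIRST argument, so <u, v> = sum u_k conj(v_k) is np.vdot(v, u).
import numpy as np

def ip(u, v):
    return np.vdot(v, u)        # <u, v>, conjugation on the second slot

u1 = np.array([1, 1j, 0])
u2 = np.array([1, -1j, 1])
u3 = np.array([0, 1, 0])

v1 = u1 / np.sqrt(ip(u1, u1).real)
w2 = u2 - ip(u2, v1) * v1
v2 = w2 / np.sqrt(ip(w2, w2).real)
w3 = u3 - ip(u3, v1) * v1 - ip(u3, v2) * v2
v3 = w3 / np.sqrt(ip(w3, w3).real)

print(np.allclose(v3 * np.sqrt(6), [1j, 1, -2j]))             # matches the slide
print(np.allclose([ip(v2, v1), ip(v3, v1), ip(v3, v2)], 0))   # orthogonal
```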


Caution: More than two units among the scalars

Above v₃ ⊥ v₁, v₂. For any complex number α with |α| = 1, we
have
||αv₃|| = |α| · ||v₃|| = ||v₃||,

so αv₃ is also of unit length. Also

αv₃ ⊥ v₁, v₂.

More units than the two ±1 available in R.

MA212 – Lecture 28 – Friday 26 January 2018 page 14


Example:

The multiple:

iv₃ = (1/√6)(−1, i, 2)ᵀ

can equally well be used in place of v₃.

MA212 – Lecture 28 – Friday 26 January 2018 page 15


Continuous function into C

If f : [0, 1] −→ C is continuous, then f (t) = g(t) + ih(t)


for some functions g, h both [0, 1] −→ R with g and h both
continuous.

We define complex integration by putting:

∫₀¹ f(t) dt := ∫₀¹ g(t) dt + i ∫₀¹ h(t) dt

Moral: apply ordinary integration separately to the real and


imaginary parts.
We denote the continuous functions from [0, 1] −→ C by CC [0, 1].

MA212 – Lecture 28 – Friday 26 January 2018 page 16


Example

f (t) = eit = cos(t) + i · sin(t)

Note that if f(t) = g(t) + ih(t), then

conj( ∫₀¹ f(t) dt ) = conj( ∫₀¹ g(t) dt + i ∫₀¹ h(t) dt )
= ∫₀¹ g(t) dt − i ∫₀¹ h(t) dt
= ∫₀¹ {g(t) − ih(t)} dt
= ∫₀¹ conj(f(t)) dt.

MA212 – Lecture 28 – Friday 26 January 2018 page 17
Continuing in the same vein

Define also:

⟨f₁, f₂⟩ = ∫₀¹ f₁(t) conj(f₂(t)) dt

Check that this has the properties required:

⟨αf₁ + βf₂, g⟩ := ∫₀¹ [αf₁(t) + βf₂(t)] conj(g(t)) dt
= ∫₀¹ ( αf₁(t)conj(g(t)) + βf₂(t)conj(g(t)) ) dt
= α ∫₀¹ f₁(t)conj(g(t)) dt + β ∫₀¹ f₂(t)conj(g(t)) dt
= α⟨f₁, g⟩ + β⟨f₂, g⟩

MA212 – Lecture 28 – Friday 26 January 2018 page 18


Cont’d

conj(⟨f, g⟩) = conj( ∫₀¹ f(t) conj(g(t)) dt )
= ∫₀¹ conj(f(t)) g(t) dt
= ⟨g, f⟩

⟨f, f⟩ = ∫₀¹ f(t) conj(f(t)) dt = ∫₀¹ |f(t)|² dt > 0 unless f(t) ≡ 0

MA212 – Lecture 28 – Friday 26 January 2018 page 19


Unitary Matrices: here hu, vi = u · v

For an orthonormal basis B = (v₁, . . . , vₙ) take
M_B = [v₁, . . . , vₙ], square of size n × n. Consider the n × n matrix
conj(M_B)ᵀ, whose rows are conj(v₁)ᵀ, . . . , conj(vₙ)ᵀ.

Then
conj(M_B)ᵀ M_B = I, so M_B* M_B = I.

See slide 22 for the definition of M*.

MA212 – Lecture 28 – Friday 26 January 2018 page 20


In detail ...

   
conj(M_B)ᵀ M_B = [ rows conj(v₁)ᵀ, . . . ] [v₁, v₂, . . . , vₙ]

= [ ⟨v₁, v₁⟩  ⟨v₂, v₁⟩  ...
    ⟨v₁, v₂⟩  ...       ...
    ...       ...  ⟨vₙ, vₙ⟩ ] = I

... since conj(u)ᵀ v = conj(uᵀv̄) = conj(⟨u, v⟩) = ⟨v, u⟩.

MA212 – Lecture 28 – Friday 26 January 2018 page 21


The conjugate transpose

Definition. For a matrix A we define the conjugate transpose of


A to be the matrix
A* := conj(A)ᵀ.

Definition. A square matrix A is said to be unitary if

A−1 = A∗ .

Remark. So a real square unitary matrix A is just an orthogonal


one!

MA212 – Lecture 28 – Friday 26 January 2018 page 22


Example

   
A = [ 1/√2   1/√3   −1/√6
      i/√2  −i/√3    i/√6
      0      1/√3    2/√6 ],  then  A* = [  1/√2  −i/√2   0
                                            1/√3   i/√3   1/√3
                                           −1/√6  −i/√6   2/√6 ].

It is easily checked that AA* = I. So A*A = I and A⁻¹ = A*.

MA212 – Lecture 28 – Friday 26 January 2018 page 23


Cont’d

 
det(A) = (1/√2)(1/√3)(1/√6) det [ 1   1  −1
                                  i  −i   i
                                  0   1   2 ]
= (1/6)((−2i − i) − i(2 + 1))
= −i.

So |det(A)| = 1.

MA212 – Lecture 28 – Friday 26 January 2018 page 24
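A numpy sketch (not in the slides) checking the example matrix numerically:

```python
# Sketch: the example matrix is unitary -- A A* = I and |det A| = 1.
import numpy as np

A = np.array([
    [1 / np.sqrt(2),   1 / np.sqrt(3),  -1 / np.sqrt(6)],
    [1j / np.sqrt(2), -1j / np.sqrt(3),  1j / np.sqrt(6)],
    [0,                1 / np.sqrt(3),   2 / np.sqrt(6)],
])

A_star = A.conj().T                              # conjugate transpose
print(np.allclose(A @ A_star, np.eye(3)))        # True: A is unitary
print(np.isclose(abs(np.linalg.det(A)), 1.0))    # True: det(A) = -i
```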


Properties of *

1. (A*)* = A
2. (AB)* = B*A*
3. det(A*) = conj(det(A))
4. The Star property: x · (A*y) = (Ax) · y
Indeed
LHS = xᵀ conj(A*y)
= xᵀ conj(conj(A)ᵀ y)
= (xᵀAᵀ) ȳ
= (Ax)ᵀ ȳ
= (Ax) · y = RHS

MA212 – Lecture 28 – Friday 26 January 2018 page 25


Properties cont’d

5. Group property: for A unitary,
(A⁻¹)* = (A*)* = A = (A⁻¹)⁻¹
So A unitary implies A⁻¹ unitary, which itself implies A
unitary.
6. A, B unitary implies AB unitary:
(AB)*(AB) = B*A*AB = B*B = I

7. A unitary implies |det A| = 1. Indeed A*A = I:

conj(det(A)) det(A) = det(I) = 1
|det(A)|² = 1
|det(A)| = 1

MA212 – Lecture 28 – Friday 26 January 2018 page 26


Theorem

Theorem:
Equivalent characterizations for an n × n matrix:
1. A is unitary;
2. columns of A are an orthonormal basis for Cn ;
3. isometry: ||Au|| = ||u||;
4. conformality,
and in fact (Au) · (Av) = u · v .

MA212 – Lecture 28 – Friday 26 January 2018 page 27


Proof: (1) implies (3)

(Au) · (Av) = (Au)ᵀ conj(Av)
= uᵀAᵀ conj(A) v̄
= uᵀv̄ = u · v
Indeed
conj(A)ᵀ A = I, so (as I is real)

I = conj( conj(A)ᵀA ) = Aᵀ conj(A), i.e. Aᵀ conj(A) = I.

Hence (with v = u),

||Au||² = (Au) · (Au) = uᵀū = ||u||².

MA212 – Lecture 28 – Friday 26 January 2018 page 28


A theorem about unitary matrices

Theorem: For A unitary, its eigenvalues have modulus 1.

Proof:
Let Ax = λx, with x ≠ 0. Then

x · x = ||x||² = (Ax) · (Ax) = (λx) · (λx) = λ λ̄ (x · x).

So λ λ̄ = 1, as x · x = ||x||² > 0.

MA212 – Lecture 28 – Friday 26 January 2018 page 29


Normal Matrices

A matrix An×n is normal if

AA∗ = A∗ A

They are ‘normal’ precisely because they are diagonalizable, i.e.


there is an orthonormal basis change that gives them a diagonal
matrix representation.
So
unitary matrices are normal:
U U* = U U⁻¹ = I = U⁻¹ U = U* U,
as U⁻¹ = U*.
Now we meet a very significant member of the normal family:

MA212 – Lecture 28 – Friday 26 January 2018 page 30


Hermitian Matrices

A is Hermitian if A∗ = A

An example:
[ 1    i      −1
  −i   0      3+2i
  −1   3−2i   2 ]

If A = A* then A*A = A² = AA*.
So if A is Hermitian, then A is normal.
Similarly for A skew-Hermitian: i.e. when

A* = −A

then AA* = −A² = A*A


MA212 – Lecture 28 – Friday 26 January 2018 page 31
Example:

   
A = [ 0  2 ; −2  i ],  then  A* = [ 0  −2 ; 2  −i ] = −A

Check: AA* = [ 4  −2i ; 2i  5 ]

MA212 – Lecture 28 – Friday 26 January 2018 page 32


Comment

Note: (AA∗ )∗ = A∗∗ A∗ = AA∗

Theorem (group-property)

Let An×n be invertible. Then:


1. (A∗ )−1 = (A−1 )∗
2. A is normal =⇒ A−1 is normal
3. A is Hermitian =⇒ A−1 is Hermitian

MA212 – Lecture 28 – Friday 26 January 2018 page 33


Proof:

1. I = I* = (AA⁻¹)* = (A⁻¹)*A*, so (A*)⁻¹ = (A⁻¹)*

2. If AA* = A*A, then

(A*)⁻¹A⁻¹ = A⁻¹(A*)⁻¹.

Since (A*)⁻¹ = (A⁻¹)* by part 1,

(A⁻¹)*A⁻¹ = A⁻¹(A⁻¹)*,

which shows that A⁻¹ is normal.

3. If A = A*, then A⁻¹ = (A*)⁻¹ = (A⁻¹)*

MA212 – Lecture 28 – Friday 26 January 2018 page 34


Theorem about eigenvalues

Theorem
Let An×n be Hermitian. Then:
1. All eigenvalues are real
2. Eigenvectors to distinct eigenvalues are orthogonal.

Comment: So this holds for real symmetric A

MA212 – Lecture 28 – Friday 26 January 2018 page 35


Proof for 1.

Let Ax = λx, for x ≠ 0. Then (since A = A*)

x · (Ax) = x · (A*x)

Using the star property

x · (λx) = (Ax) · x

Thus
λ̄ (x · x) = λ (x · x)

and we conclude λ = λ̄ as x · x > 0.

MA212 – Lecture 28 – Friday 26 January 2018 page 36


Proof for 2.

Let Ax = λx and Ay = µy, with λ ≠ µ. Then

(Ax) · y = x · (A*y) = x · (Ay)

Thus (since µ̄ = µ as in the last slide)

λ (x · y) = x · (µy) = µ̄ (x · y) = µ (x · y)

So (λ − µ)(x · y) = 0, and hence x · y = 0.

MA212 – Lecture 28 – Friday 26 January 2018 page 37
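A numpy sketch (not in the slides) illustrating both conclusions on the Hermitian example from earlier in this lecture:

```python
# Sketch on the Hermitian example above: eigenvalues come out real and
# the eigenvectors form an orthonormal set (np.linalg.eigh assumes A = A*).
import numpy as np

A = np.array([[1, 1j, -1], [-1j, 0, 3 + 2j], [-1, 3 - 2j, 2]])
print(np.allclose(A, A.conj().T))                 # Hermitian: True

w, V = np.linalg.eigh(A)
print(np.isrealobj(w))                            # real eigenvalues: True
print(np.allclose(V.conj().T @ V, np.eye(3)))     # orthonormal eigenvectors
```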


Summary. The square matrix M is:

Normal if M M ∗ = M ∗ M
Unitary if M ∗ M = I i.e. M −1 = M ∗
Hermitian if M ∗ = M

For M real: orthogonal is unitary


For M real: symmetric is Hermitian

If M is unitary (e.g. real + orthogonal) its eigenvalues λ


satisfy |λ| = 1.

If M is Hermitian (e.g. real + symmetric) its eigenvalues λ


satisfy λ ∈ R MA212 – Lecture 28 – Friday 26 January 2018 page 38
Postscript: Hermite property again

Suppose A = A*, put ⟨x, y⟩ = xᵀAȳ. Then

conj(⟨y, x⟩) = conj(yᵀAx̄)
= ȳᵀĀx, regarded as a 1 × 1 matrix, (z)₁ₓ₁ say,
= (ȳᵀĀx)ᵀ  since (z)ᵀ₁ₓ₁ = (z)₁ₓ₁
= xᵀĀᵀȳ
= xᵀAȳ  as Āᵀ = A* = A
= ⟨x, y⟩

MA212 – Lecture 28 – Friday 26 January 2018 page 39


MA212 Further Mathematical Methods Lecture LA 9

Lecture 29: Towards diagonalization

Example of diagonalization
Upper triangular form
Background readings

Anthony & Harvey Chapter 8 (§8.2 & §8.3)

Adam Ostaszewski Chapter 6 (§6.2 & §6.3)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 2


Recap

Diagonalization means base change so that

A −→ diag(×, ×, ..., ×)

(a diagonal matrix: zeros off the diagonal).

When can we do this? Answer: See later...


MA212 – Lecture 29 – Tuesday 30 January 2018 page 3
Recall that

If D is a diagonal matrix:

D = diag(d₁, d₂, ..., dₙ)

Then p_D(x) = det(xI − D)

i.e. p_D(x) = (x − d₁)(x − d₂) . . . (x − dₙ)

so, d₁, d₂, . . . , dₙ are eigenvalues of D.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 4


Cont’d

A is diagonalizable iff

P −1 AP = D ‘Similar to D ’

for some diagonal D


Then D and A have the same eigenvalues. The diagonal
entries of D are eigenvalues of A .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 5


Diagonalization

Let A be a square matrix of size n × n.

Paradigm:
Suppose there are n linearly independent eigenvectors. Say
these are v₁, · · · , vₙ and

Avᵢ = λᵢvᵢ

Then A is similar to

D = diag(λ₁, λ₂, ..., λₙ)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 6
Cont’d

Actually: D represents y = A x with respect to the basis

B = (v1 , . . . , vn ).

MA212 – Lecture 29 – Tuesday 30 January 2018 page 7


Cont’d

so
D = MB−1 A MB

Check:

λj ej = D ej = MB−1 A MB ej
= MB−1 Avj (as vj = MB ej )
= MB−1 λj vj
= λj MB−1 vj
= λj ej (again as vj = MB ej ).

MA212 – Lecture 29 – Tuesday 30 January 2018 page 8


Some observations

Example... soon to come (on the next slide)


Even if A is real, nevertheless eigenvalues can be complex.
However, some good news: if λ ∈ C and A is real,

if Av = λv, v ≠ 0,
then Āv̄ = λ̄v̄; but A is real, so
Av̄ = λ̄v̄ and v̄ ≠ 0.

So, for real A, if λ is an eigenvalue and v a corresponding
eigenvector, then λ̄ is also an eigenvalue and v̄ a corre-
sponding eigenvector.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 9


Example

Consider

A = [ −1   0  −2
       0   3   2
       1  −3   0 ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 10


Step 1: Find eigenvalues

p_A(x) = det(xI − A)

= | x+1   0    2
    0    x−3  −2
    −1   3    x |

= (x + 1) | x−3  −2 ; 3  x | + 2 | 0  x−3 ; −1  3 |
= (x + 1)(x² − 3x + 6) + 2(x − 3)
= x³ − 2x² + 5x
= x(x² − 2x + 5)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 11


So the roots are...

Roots: x = 0,

x = (2 ± √(4 − 20))/2 = 1 ± 2i

λ₁ = 0,  λ₂ = 1 + 2i,  λ₃ = λ̄₂ = 1 − 2i.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 12


Step 2: Find eigenvectors

λ1 = 0 solve Av = 0

−x1 − 2x3 = 0 take x3 = −3t


3x2 + 2x3 = 0 then x2 = 2t
x1 − 3x2 = 0 then x1 = 6t.

(The last equation is redundant, but useful here, nevertheless.)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 13


Say t = 1...

 
v₁ = (6t, 2t, −3t)ᵀ = t (6, 2, −3)ᵀ

Say t = 1 ...

v₁ = (6, 2, −3)ᵀ
MA212 – Lecture 29 – Tuesday 30 January 2018 page 14
Getting v2

λ₂ = 1 + 2i: solve Av = λ₂v, i.e.
(λ₂I − A)v = 0

[ 2+2i   0      2         [ x₁     [ 0
  0     −2+2i  −2      ·    x₂  =    0
  −1     3      1+2i ]      x₃ ]     0 ]

(1 + i)x₁ + x₃ = 0   (1)
(−1 + i)x₂ − x₃ = 0   (2)

Ignore third, as redundant.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 15


Ignoring the redundant one

Put x₃ = −t

(2) ⇒ x₂ = −t/(−1 + i) = −t(−1 − i)/2 = t(1 + i)/2,  as (−1 + i)(−1 − i) = 2

x₁ = t/(1 + i) = t(1 − i)/2,  as (1 + i)(1 − i) = 2

yielding

v₂ = (t/2) (1−i, 1+i, −2)ᵀ,  say (t = 2):  (1−i, 1+i, −2)ᵀ
MA212 – Lecture 29 – Tuesday 30 January 2018 page 16
v3 for free from v2

Take v₃ = v̄₂:

v₂ = (t/2) (1−i, 1+i, −2)ᵀ,  say (t = 2):  (1−i, 1+i, −2)ᵀ,

so take

v₃ = v̄₂ = (1+i, 1−i, −2)ᵀ

MA212 – Lecture 29 – Tuesday 30 January 2018 page 17


Step 3: Construct the ‘modular’ matrix.

Take

P = [v₁, v₂, v₃] = [ 6   1−i  1+i
                     2   1+i  1−i
                     −3  −2   −2 ]

Then

AP = (Av₁, Av₂, Av₃) = (λ₁v₁, λ₂v₂, λ₃v₃)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 18


Finally

So that

AP = PD = (v₁, v₂, v₃) diag(λ₁, λ₂, λ₃)

∴ P⁻¹AP = diag(λ₁, λ₂, λ₃) = [ 0  0     0
                                0  1+2i  0
                                0  0     1−2i ].
MA212 – Lecture 29 – Tuesday 30 January 2018 page 19


Here:

 
P⁻¹ = (1/20) [ 4     4      4
               −3+i  −3−9i  −8−4i
               −3−i  −3+9i  −8+4i ]

(Here A is diagonalizable.)
Question: In general, when can this be done?...

MA212 – Lecture 29 – Tuesday 30 January 2018 page 20
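A numpy sketch (not in the slides) verifying this worked diagonalization with its complex eigenvalues:

```python
# Sketch verifying the worked diagonalization: P^{-1} A P = diag(0, 1+2i, 1-2i).
import numpy as np

A = np.array([[-1, 0, -2], [0, 3, 2], [1, -3, 0]], dtype=complex)
P = np.array([[6, 1 - 1j, 1 + 1j],
              [2, 1 + 1j, 1 - 1j],
              [-3, -2, -2]])

D = np.linalg.solve(P, A @ P)             # P^{-1} A P without inverting P
print(np.allclose(D, np.diag([0, 1 + 2j, 1 - 2j])))   # True
```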


Break-downs - I
Where can the
diagonalization
paradigm
break down?

MA212 – Lecture 29 – Tuesday 30 January 2018 page 21


Eigenspaces

For a matrix A sized n × n and λ an eigenvalue; the


Eigenspace to value λ is defined to be:

E(A, λ) = {v : A v = λv}

This includes 0 .
But v ∈ E(A, λ) and v ≠ 0 ⇒ v is an eigenvector.
FACT
If p_A(x) = (x − λ₁)^{m₁} . . . (x − λₖ)^{mₖ}, then

1 ≤ dim E(A, λⱼ) ≤ mⱼ
(geometric multiplicity ≤ algebraic multiplicity)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 22


N.B.

E(A, λ) = {v : (A − λI)v = 0}
= N (A − λI)
= Null space of A − λI.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 23


EXAMPLE

 
A = [ 4   0   6
     −2   1  −5
     −3   0  −5 ]

p_A(x) = | x−4   0   −6
           2    x−1   5
           3    0    x+5 |  = (x − 1) | x−4  −6 ; 3  x+5 |

= (x − 1)(x² + x − 2)
= (x − 1)(x − 1)(x + 2)
= (x − 1)²(x + 2)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 24


So ...

λ = 1 has alg. multiplicity 2


λ = −2 has alg. multiplicity 1 .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 25


So dimE(A, −2) = 1, but is dimE(A, 1) 1 or 2, which?

Solve (A − I)v = 0:

[ 3   0   6
  −2  0  −5
  −3  0  −6 ] v = 0

3x₁ + 6x₃ = 0   (1)
−2x₁ − 5x₃ = 0   (2)
−3x₁ − 6x₃ = 0   (3)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 26


Cont’d

By (1), x₁ = −2x₃. Substitute for x₁ in (2): then −x₃ = 0, so

x₃ = 0,  x₁ = 0,  v = (0, x₂, 0)ᵀ
MA212 – Lecture 29 – Tuesday 30 January 2018 page 27


 
E(A, 1) = Lin{ (0, 1, 0)ᵀ }
so, it is 1 -dimensional: This is a deficiency – we wanted/hoped


for 2 -dimensional.
So, A is not diagonalizable because it doesn’t have 3
independent eigenvectors.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 28
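A numpy sketch (not in the slides) getting the geometric multiplicities of this example from ranks:

```python
# Sketch: geometric multiplicities via rank; dim E(A, 1) = 3 - rank(A - I)
# is only 1 although the algebraic multiplicity of 1 is 2.
import numpy as np

A = np.array([[4, 0, 6], [-2, 1, -5], [-3, 0, -5]])
I = np.eye(3)

print(3 - np.linalg.matrix_rank(A - I))        # 1: deficient eigenvalue 1
print(3 - np.linalg.matrix_rank(A + 2 * I))    # 1: simple eigenvalue -2
```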


Geometric multiplicity is preserved under similarity.

Theorem If A, B are similar and λ is an eigenvalue of


either (= both here) then

dimE(A, λ) = dimE(B, λ).

Observation. Suffices to show

dimE(A, λ) ≥ dimE(B, λ)

... why?
Answer: Write A for B and B for A

dimE(B, λ) ≥ dimE(A, λ)

because A similar to B ⇒ B similar to A .


MA212 – Lecture 29 – Tuesday 30 January 2018 page 29
Proof

Say A = P −1 B P ; λ an eigenvalue of A and B ; {v1 , . . . , vk }


basis of E(B, λ) .
Put ui = P −1 vi

Auᵢ = P⁻¹BP P⁻¹vᵢ
= P⁻¹Bvᵢ
= P⁻¹λvᵢ
= λuᵢ.

So, ui ∈ E(A, λ) .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 30


Proof continued

We now show:
ui are linearly independent
If α₁u₁ + · · · + αₖuₖ = 0, then, as v ↦ Pv is linear,

α₁Pu₁ + · · · + αₖPuₖ = 0
iff α₁v₁ + · · · + αₖvₖ = 0.

So, α1 = α2 = · · · = αk = 0 , because v1 , . . . , vk are linearly


independent (form a basis).
So, dimE(A, λ) ≥ k = dimE(B, λ) .
This completes the proof in view of the earlier observation.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 31


EXAMPLE

 
A = [ 1  1 ; 0  1 ]

p_A(x) = | x−1  −1 ; 0  x−1 | = (x − 1)²

A − I = [ 0  1 ; 0  0 ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 32


Cont’d

Av = v ⇒ x₂ = 0
v = (x₁, 0)ᵀ

E(A, 1) = Lin{ (1, 0)ᵀ }  of dimension 1

so, A is NOT diagonalizable.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 33


Why is ...

 
A = [ 1  1 ; 0  1 ]

not diagonalizable?
If it were, λ₁ = λ₂ = 1:

[ 1  1 ; 0  1 ] = P⁻¹ [ 1  0 ; 0  1 ] P
= P⁻¹ I P = I
= [ 1  0 ; 0  1 ]

Contradiction!
MA212 – Lecture 29 – Tuesday 30 January 2018 page 34


Break-downs - II
Upper triangular
will
fix it!

MA212 – Lecture 29 – Tuesday 30 January 2018 page 35


Although not every square matrix can be diagonalized,

... nevertheless every matrix can be reduced to the very useful:


Upper Triangular Form:

T = [ d₁   *   ...  *
      0    d₂  ...  *
      ...       ·   *
      0    ...  0   dₙ ]

(arbitrary entries U above the diagonal, zeros below)

N.B.
pT (x) = (x − d1 )(x − d2 ) · · · (x − dn )

So, diagonal entries are eigenvalues, each appearing same


number of times as its algebraic multiplicity.
MA212 – Lecture 29 – Tuesday 30 January 2018 page 36
Why useful?
Just as in echelon format, equations

T (x₁, ..., xₙ)ᵀ = (b₁, ..., bₙ)ᵀ

are easy to solve 'backwards' from the last equation upwards.

Last equation:
dₙxₙ = bₙ

(soluble if dₙ ≠ 0 for any bₙ, or soluble if dₙ = bₙ = 0.)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 37


Last but one:

dn−1 xn−1 + tn−1,n xn = bn−1

etc.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 38


Theorem

Theorem. Every square matrix is similar to an upper trian-


gular matrix.

Proof. This is proved by showing how to pass to progressively
smaller matrices ('proof by mathematical induction'), until one
reaches 1 × 1 matrices!
Let A be n × n.
Let λ be an eigenvalue of A.
Let
(u₁, . . . , uₖ) be a basis of E(A, λ)

Extend to a basis of Rⁿ:

B = (u₁, . . . , uₙ)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 39
Cont’d

Put
A′ = MB−1 A MB

Base change, A′ represents A relative to (u1 , . . . , un ) . Now for


j = 1, . . . , k

A′ ej = MB−1 A MB ej
= MB−1 A uj
= MB−1 λuj
= λMB−1 uj
= λ ej

MA212 – Lecture 29 – Tuesday 30 January 2018 page 40


 
A′ = [ λIₖ  P
       0    Q ]

for some P that is k × (n − k),
Q that is (n − k) × (n − k)

p_A(x) = p_{A′}(x) = (x − λ)ᵏ p_Q(x)


MA212 – Lecture 29 – Tuesday 30 January 2018 page 41
So:

1 ≤ k (geometric multiplicity) ≤ m (algebraic multiplicity)

Q is smaller sized, at most (n − 1) × (n − 1) .

MA212 – Lecture 29 – Tuesday 30 January 2018 page 42


Suppose Q can be ‘reduced’ to upper triangular form, i.e.,

C −1 Q C = U

for C non-singular and U upper-triangular.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 43


Cont’d

Put

N = [ Iₖ  0
      0   C ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 44


Cont’d ... N −1 A′ N =

   
   
[ Iₖ  0       [ λIₖ  P      [ Iₖ  0
  0   C⁻¹ ] ·   0    Q ]  ·   0   C ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 45


N⁻¹ M_B⁻¹ A M_B N is upper triangular:

= [ λIₖ  P        [ Iₖ  0
    0    C⁻¹Q ] ·   0   C ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 46


 
 

= [ λIₖ  PC
    0    C⁻¹QC ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 47


Cont’d

 
= [ λIₖ  PC
    0    U ]

is upper triangular.
A is similar to A′;
A′ is similar to upper triangular;
∴ A is similar to upper triangular.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 48


EXAMPLE

The matrix

A = [ 2   2  −1
     −1  −1   1
     −1  −2   2 ]

has
p_A(x) = det(xI − A) = (x − 1)³.

v = (1, 0, 1)ᵀ is an eigenvector
MA212 – Lecture 29 – Tuesday 30 January 2018 page 49


Create a basis ...

using v for R³
say,

v = (1, 0, 1)ᵀ,  e₂ = (0, 1, 0)ᵀ,  e₃ = (0, 0, 1)ᵀ,

obviously linearly independent
MA212 – Lecture 29 – Tuesday 30 January 2018 page 50


 
M_B = [ 1  0  0
        0  1  0
        1  0  1 ]

A M_B = [ 2   2  −1      [ 1  0  0      [ 1   2  −1
         −1  −1   1   ·    0  1  0   =    0  −1   1
         −1  −2   2 ]      1  0  1 ]      1  −2   2 ]
MA212 – Lecture 29 – Tuesday 30 January 2018 page 51


Easy inversion:

 
M_B = [ 1  0  0
        0  1  0
        1  0  1 ]

(you get this by adding row 1 to row 3). Then

M_B⁻¹ = [  1  0  0
           0  1  0
          −1  0  1 ]

(So, you get this by the opposite inverse action, i.e., subtract row
1 from row 3.)

MA212 – Lecture 29 – Tuesday 30 January 2018 page 52


End result

  
M_B⁻¹ A M_B = [  1  0  0      [ 1   2  −1
                 0  1  0   ·    0  −1   1
                −1  0  1 ]      1  −2   2 ]

= [ 1   2  −1
    0  −1   1
    0  −4   3 ]   ( = A′ )
MA212 – Lecture 29 – Tuesday 30 January 2018 page 53


Consider
A₁ = [ −1  1 ; −4  3 ]   ( = Q )

p_{A₁}(x) = (x + 1)(x − 3) + 4 = x² − 2x + 1 = (x − 1)²

A₁v = v ⇒ v = (1, 2)ᵀ

MA212 – Lecture 29 – Tuesday 30 January 2018 page 54


Extend to basis of R2

 
[ 1  0 ; 2  1 ]

(like M_B but working in R², not R³)

with inverse (swop/re-sign/divide!)

[ 1  0 ; −2  1 ]

(again like M_B⁻¹ but working in R², not R³)
MA212 – Lecture 29 – Tuesday 30 January 2018 page 55


   
A₁ [ 1  0 ; 2  1 ] = [ 1  1 ; 2  3 ]   (like A M_B but working in R²)

[ 1  0 ; −2  1 ] [ 1  1 ; 2  3 ] = [ 1  1 ; 0  1 ]   ( = M_B⁻¹ A₁ M_B )

[ 1   0  0      [ 1   2  −1      [ 1  0  0
  0   1  0   ·    0  −1   1   ·    0  1  0     ( = N⁻¹A′N )
  0  −2  1 ]      0  −4   3 ]      0  2  1 ]

= [ 1   0  0      [ 1  0  −1
    0   1  0   ·    0  1   1
    0  −2  1 ]      0  2   3 ]

MA212 – Lecture 29 – Tuesday 30 January 2018 page 56
So finally

  
1 0 0 1 0 −1
  
=
0 1 0 0 1 1 
 

0 −2 1 0 2 3
 
1 0 −1
 
=
0 1 1
0 0 1

MA212 – Lecture 29 – Tuesday 30 January 2018 page 57
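A numpy sketch (not in the slides) verifying this triangularization by composing the two base changes:

```python
# Sketch verifying the two base changes: with Q = M_B N, the product
# Q^{-1} A Q is the upper triangular matrix reached above.
import numpy as np

A = np.array([[2, 2, -1], [-1, -1, 1], [-1, -2, 2]])
M_B = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1]])
N = np.array([[1, 0, 0], [0, 1, 0], [0, 2, 1]])

Q = M_B @ N
T = np.linalg.solve(Q, A @ Q)                  # Q^{-1} A Q
print(np.allclose(T, [[1, 0, -1], [0, 1, 1], [0, 0, 1]]))   # True
```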


N.B. (Nota bene!)

For A sized n × n suppose det(A − λI) = 0


Then
(A − λI)x = 0

has a redundant equation


... because A − λI has rank strictly smaller than n and so at
least one row is a linear combination of the others.

MA212 – Lecture 29 – Tuesday 30 January 2018 page 58


MA212 Further Mathematical Methods Lecture LA 10

Lecture 30: Block-diagonal Forms

Shearing and Jordan blocks


Cayley-Hamilton Theorem
Jordan canonical form (see Moodle for a ‘theory’ Appendix)
Examples
Example of a non-diagonalizable matrix

The following matrix is not diagonalizable:


 
1 1
A=  .
0 1

Reference to the characteristic polynomial shows its


eigenvalues to be the diagonal entries. Were it to be
diagonalizable, there would be a non-singular P such that

P⁻¹AP = diag{1, 1} = [ 1  0 ] = I
                     [ 0  1 ]

MA212 – Lecture 30 – Friday 2 February 2018 page 2


Then

   
[ 1  1 ]                        [ 1  0 ]
[ 0  1 ]  = A = P I P⁻¹ = I =   [ 0  1 ] ,

oops, a contradiction.

MA212 – Lecture 30 – Friday 2 February 2018 page 3


Here A = I+ a shearing action:

        
[ 1  1 ] [ x₁ ]   [ x₁ + x₂ ]     [ x₁ ]   [ x₂ ]
[ 0  1 ] [ x₂ ] = [   x₂    ] = I [ x₂ ] + [  0 ].

Here

[ x₂ ]   [ 0  1 ] [ x₁ ]
[  0 ] = [ 0  0 ] [ x₂ ]

and

S = [ 0  1 ]
    [ 0  0 ]

has a shearing action.

MA212 – Lecture 30 – Friday 2 February 2018 page 4


Geometric interpretation of the transformation A.

[Figure: the vector x and its image Ax in the (x₁, x₂)-plane; Ax = Ix + Sx, the shear Sx shifting x horizontally by x₂.]

MA212 – Lecture 30 – Friday 2 February 2018 page 5


Deformation?

[Figure: the deformation under A, indicating the 'most stretched' and 'least stretched' directions.]
MA212 – Lecture 30 – Friday 2 February 2018 page 6
More generally

      
Ax = [ λ  1 ] [ x₁ ]   [ λ  0 ]       [ x₂ ]
     [ 0  λ ] [ x₂ ] = [ 0  λ ] x  +  [  0 ]  = λIx + Sx

J₂ = [ λ  1 ]
     [ 0  λ ]   is called a 2 × 2 Jordan block.

MA212 – Lecture 30 – Friday 2 February 2018 page 7


Illustration in 3-dimensions

[Figure: x decomposed as λIx plus the shear Sx.]

MA212 – Lecture 30 – Friday 2 February 2018 page 8


In 3-dimensions

A 3 × 3 Jordan block is given by

J₃ = [ λ  1  0 ]
     [ 0  λ  1 ]
     [ 0  0  λ ]

and

[ λ  1  0 ]       [ λx₁ + x₂ ]          [ x₂ ]   [ 0  ]
[ 0  λ  1 ] x  =  [ λx₂ + x₃ ]  = λIx + [ 0  ] + [ x₃ ]
[ 0  0  λ ]       [   λx₃    ]          [ 0  ]   [ 0  ]

includes a shear along the x₁-axis plus a shear along the x₂-axis.
MA212 – Lecture 30 – Friday 2 February 2018 page 9
In 3-D

[Figure: in 3-D, x mapped to Ix plus shears along the x₁- and x₂-axes.]

MA212 – Lecture 30 – Friday 2 February 2018 page 10


In the last example one can have fewer shearing actions:

[ λ  1  0 ]
[ 0  λ  0 ]   ←  shearing absent in the second row
[ 0  0  λ ]

[ λ  0  0 ]   ←  shearing absent in the first row
[ 0  λ  1 ]
[ 0  0  λ ]

[ λ  0  0 ]
[ 0  λ  0 ]   ←  shearing absent in both rows
[ 0  0  λ ]
MA212 – Lecture 30 – Friday 2 February 2018 page 11
Decomposition

We can decompose the first matrix as a 2 × 2 block J₂ plus a 1 × 1 block J₁, with zero bordering:

[ λ  1  0 ]   [ J₂  0  ]   [ λ  1  0 ]
[ 0  λ  0 ] = [ 0   J₁ ] = [ 0  λ  0 ]
[ 0  0  λ ]                [ 0  0  λ ]

MA212 – Lecture 30 – Friday 2 February 2018 page 12


Cont’d

We can decompose the second matrix as a 1 × 1 block J₁ plus a 2 × 2 block J₂, with zero bordering:

[ λ  0  0 ]   [ J₁  0  ]   [ λ  0  0 ]
[ 0  λ  1 ] = [ 0   J₂ ] = [ 0  λ  1 ]
[ 0  0  λ ]                [ 0  0  λ ]

MA212 – Lecture 30 – Friday 2 February 2018 page 13


Theorem on the Jordan canonical form

Theorem. Every n×n matrix is similar to a Jordan block form,


with Jk -blocks down the diagonal with zeros elsewhere.
For example:

[ λ₁  1                      ]    [ J₂          ]
[ 0   λ₁                     ]    [    J₃       ]
[        λ₂  1   0           ]  = [       J₁    ]
[        0   λ₂  1           ]    [          ⋱  ]
[        0   0   λ₂          ]
[                    λ₃      ]
[                        ⋱   ]

Each eigenvalue occurs according to its algebraic multiplicity.


*Exotica: Shearing and differentiation

In C∞[0, 1], the space of functions f(t) having derivatives of all orders, take

Af := (d/dt) f,   i.e. f′.

An eigenvector here is a function f such that, for some scalar λ,

f′ = Af = λf :

f(t) = e^{λt} works, as (d/dt) e^{λt} = λ e^{λt}.

* For background reading only


*Cont’d

Now take e₁(t) := e^{λt} and e₂(t) := t e^{λt}; then

(d/dt) t e^{λt} = e^{λt} + λ t e^{λt} :   Ae₂ = λe₂ + e₁.

The second term e₁ here acts as a shear on e₂.

* For background reading only


Recall, something that we need in the next slide, about partitioning matrices:

Ax = [a₁, a₂, ..., aₙ] (x₁, x₂, ..., xₙ)ᵀ = x₁a₁ + x₂a₂ + ... + xₙaₙ

MA212 – Lecture 30 – Friday 3 February 2017 page 17


Likewise, ...

more generally:

A[x, y, ...] = [a₁, a₂, ..., aₙ] [ x₁  y₁  ... ]
                                 [ x₂  y₂  ... ]
                                 [ ⋮   ⋮       ]
                                 [ xₙ  yₙ  ... ]

= [x₁a₁ + x₂a₂ + ... + xₙaₙ ,  y₁a₁ + y₂a₂ + ... + yₙaₙ ,  ...]

MA212 – Lecture 30 – Friday 3 February 2017 page 18


Another view of J = Jm

 
    [ λ  1            ]                 Je₁ = λe₁
    [    λ  1         ]                 Je₂ = e₁ + λe₂
J = [       ⋱  ⋱      ]       =⇒        ...
    [          λ  1   ]                 Je_{m−1} = e_{m−2} + λe_{m−1}
    [             λ   ]  (m×m)          Je_m = e_{m−1} + λe_m

MA212 – Lecture 30 – Friday 3 February 2017 page 19


An important tool: the Cayley-Hamilton Theorem*

Theorem (Cayley-Hamilton Theorem). If A is n × n and


the characteristic polynomial factorizes as:

pA (x) := det(xI − A) = (x − λ1 )(x − λ2 )...(x − λn ),

then
(A − λ1 I)(A − λ2 I)...(A − λn I) = O.

This is remembered as “ pA (A) = O ”.

*=Statement only needed; no need to memorize the proof.


Fake proof

pA (A) = det(xI − A)|substitute x=A = det(AI − A) = det(O) = 0.

However, x ranges over scalars not over matrices, so this won’t


do!!

*=Statement only needed; no need to memorize the proof.


Example

 

A = [ 0  -1 ]              | x    1  |
    [ 1  -2 ]  =⇒ p_A(x) = | -1  x+2 | = x² + 2x + 1.

MA212 – Lecture 30 – Friday 3 February 2017 page 22


Cont’d

      
2A = [ 0  -2 ],    A² = [ 0  -1 ] [ 0  -1 ]   [ -1  2 ]
     [ 2  -4 ]          [ 1  -2 ] [ 1  -2 ] = [ -2  3 ]

A² + 2A = [ -1   0 ] = −I.
          [  0  -1 ]

Yesss! A² + 2A + I = O.

MA212 – Lecture 30 – Friday 3 February 2017 page 23
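
A one-line numerical confirmation of the Cayley-Hamilton identity for this A (a sketch of mine, not from the slides, assuming NumPy):

import numpy as np

A = np.array([[0, -1], [1, -2]])
print(A @ A + 2 * A + np.eye(2))   # p_A(A) = A^2 + 2A + I: expect the zero matrix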


Proper proof: Stage 1

Find P non-singular and U upper triangular such that

U = P −1 AP.

Write (with × denoting arbitrary entry):


 
     [ λ₁  ×   ×   ×  ...  × ]
     [     λ₂  ×   ×  ...  × ]
U := [         λ₃  ×  ...  × ]
     [             ⋱  ...    ]
     [                ...  × ]
     [                    λₙ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 24


Then

 
          [ λ₁−λⱼ   ×       ×     ...   ×     ]
          [        λ₂−λⱼ    ×     ...   ×     ]
U − λⱼI = [               ⋱        ...   ×     ]
          [            0 = λⱼ−λⱼ   ...         ]
          [                         ⋱    ×     ]
          [                           λₙ−λⱼ    ]

In view of the ‘0’ on the diagonal

(U − λj I) maps Lin{e1 , ..., ej } → Lin{e1 , ..., ej−1 }.

MA212 – Lecture 30 – Friday 3 February 2017 page 25


Check:

(U − λj I)e1 = (λ1 − λj )e1 ,


(U − λj I)e2 = ×e1 + (λ2 − λj )e2
...
(U − λj I)ej = ×e1 + ... + ×ej−1

All of these are in Lin{e1 , ..., ej−1 } .

MA212 – Lecture 30 – Friday 3 February 2017 page 26


This implies that...

(U − λn I)Cn = (U − λn I)Lin{e1 , ..., en } ⊆ Lin{e1 , ..., en−1 }

So

(U − λn−1 I)(U − λn I)Cn = (U − λn−1 I)Lin{e1 , ..., en−1 }

⊆ Lin{e1 , ..., en−2 }

and

(U −λn−2 I)(U −λn−1 I)(U −λn I)Cn = (U −λn−2 I)Lin{e1 , ..., en−2 }

⊆ Lin{e1 , ..., en−3 }

MA212 – Lecture 30 – Friday 3 February 2017 page 27


Continuing all the way:

(U − λ₁I)(U − λ₂I)(U − λ₃I)···(U − λₙI)Cⁿ = Lin{0} = {0}.

i.e. LHS maps all vectors to 0 :

(U − λ1 I)(U − λ2 I)...(U − λn−1 I)(U − λn I) = O.

MA212 – Lecture 30 – Friday 3 February 2017 page 28


Proper proof: Stage 2

By applying P on the left and P −1 on the right we deduce from


the above that

(A − λ1 I)(A − λ2 I)...(A − λn I) = O

Substituting U = P −1 AP and I = P −1 P ,

O = (U − λ₁I)(U − λ₂I)...(U − λₙI)

  = (P⁻¹AP − λ₁P⁻¹P)(P⁻¹AP − λ₂P⁻¹P)...(P⁻¹AP − λₙP⁻¹P).

MA212 – Lecture 30 – Friday 3 February 2017 page 29


Cont’d

Using (P −1 AP − λ1 P −1 P ) = P −1 (A − λ1 I)P (i.e. ‘pulling


factors out’ left and right):

O = P −1 (A − λ1 I)P P −1 (A − λ2 I)P...P −1 (A − λn I)P


= P −1 (A − λ1 I)(A − λ2 I)...(A − λn I)P,

after cancellation.
Now pre- and post-multiply resp. by P and P −1

O = P OP −1 = P P −1 (A − λ1 I)(A − λ2 I)...(A − λn I)P P −1

Finally
O = (A − λ1 I)(A − λ2 I)...(A − λn I).

MA212 – Lecture 30 – Friday 3 February 2017 page 30


How to find the Jordan Normal Form

We start by finding the characteristic polynomial, the


eigenvalues and corresponding eigenvectors just as for
diagonalization.
Let’s look at the 2 × 2 and 3 × 3 cases.

MA212 – Lecture 30 – Friday 3 February 2017 page 31


2 × 2 case: We assume A is not diagonalizable.

Suppose A ≠ λI₂ₓ₂. Then:


Regarding A − λI, 3 possibilities follow.
i) dim N (A − λI) = 2.
ii) dim N (A − λI) = 1.
iii) dim N(A − λI) = 0? Nope! Silly: of course ∃v ≠ 0 with Av = λv, because det(A − λI) = 0.

MA212 – Lecture 30 – Friday 3 February 2017 page 32


Working in R², if dim N(A − λI) = 2, then N(A − λI) = R², and then

(A − λI)u = 0  ∀u ∈ R²  ⇒  (A − λI) ≡ O  ⇒  A = λI,

but we assumed A was not diagonal.

MA212 – Lecture 30 – Friday 3 February 2017 page 33


So, after all, dim N(A − λI) = 1.

So with one dimension left:

Pick u ∈ R² \ N(A − λI) and write v₂ = u.
So v₂ witnesses (A − λI) ≠ O.
Put

v := (A − λI)u ≠ 0,  and so note Au = λu + v,

and write v₁ = v.

MA212 – Lecture 30 – Friday 3 February 2017 page 34


By the Cayley-Hamilton Theorem

(A − λI)v = (A − λI)(A − λI)u = Ou = 0.

So Av = λv, and we get a stretch and a stretch+shear:

Av = λv          i.e.  Av₁ = λv₁
Au = λu + v            Av₂ = λv₂ + v₁

It needs proof that {v₁, v₂} is a linearly independent set. And it is so.

MA212 – Lecture 30 – Friday 3 February 2017 page 35


Take

P = [v1 , v2 ]

Then

AP = A[v₁, v₂] = [Av₁, Av₂] = [λv₁, λv₂ + v₁]

   = [v₁, v₂] [ λ  1 ]
              [ 0  λ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 36


So

 
P⁻¹AP = [ λ  1 ]
        [ 0  λ ].

So if A is not diagonalizable, then it is similar to

[ λ  1 ]
[ 0  λ ].

MA212 – Lecture 30 – Friday 3 February 2017 page 37


Example:

Recall that

A = [ 0  -1 ]   =⇒   p_A(x) = det(xI − A) = x² + 2x + 1 = (x + 1)².
    [ 1  -2 ]

So λ = −1 twice.

MA212 – Lecture 30 – Friday 3 February 2017 page 38


   
A − λI = A + I = [ 1  -1 ]        R(A − λI) = Lin{ (1, 1)ᵀ }
                 [ 1  -1 ]

Find u ∉ N(A − λI) = N(A + I), i.e. u with (A + I)u ≠ 0, i.e. u = (u₁, u₂)ᵀ such that

u₁ (1, 1)ᵀ + u₂ (−1, −1)ᵀ ≠ 0.

MA212 – Lecture 30 – Friday 3 February 2017 page 39


Easy:

take v₂ = u = (1, 0)ᵀ = e₁. Now take

v₁ = (A + I)v₂ = (A + I)u = (1, 1)ᵀ.

Take

P = [v₁, v₂] = [ 1  1 ]   and so   P⁻¹ = (1/det P) [  0  -1 ]
               [ 1  0 ]                            [ -1   1 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 40


Then..

... as det P = −1,

J = P⁻¹AP = [ 0   1 ] [ 0  -1 ] [ 1  1 ]
            [ 1  -1 ] [ 1  -2 ] [ 1  0 ]

          = [ 0   1 ] [ -1  0 ]   [ -1   1 ]
            [ 1  -1 ] [ -1  1 ] = [  0  -1 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 41
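
The same computation can be checked numerically; a minimal sketch (my addition, assuming NumPy), using the P = [v₁, v₂] found above:

import numpy as np

A = np.array([[0, -1], [1, -2]])
P = np.array([[1, 1], [1, 0]])          # columns v1 = (1,1)^T, v2 = e1
J = np.linalg.inv(P) @ A @ P
print(np.round(J, 10))                  # expect the Jordan block [[-1, 1], [0, -1]]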


3 × 3 case:

working in R3 ....
Main focus here is on λ repeated three times: p_A(x) = (x − λ)³.
NB The case p_A(x) = (x − λ)²(x − µ) reduces to

[ λ  1  0 ]        [ λ  0  0 ]
[ 0  λ  0 ]   or   [ 0  λ  0 ].
[ 0  0  µ ]        [ 0  0  µ ]

In the former case choose v1 with Av1 = λv1 and find a


solution for v2 to the equation (A − λI)v2 = v1 . In the latter
case choose independent v1 , v2 with Avi = λvi for i = 1, 2 .

In both cases choose v3 with Av3 = µv3 .

MA212 – Lecture 30 – Friday 3 February 2017 page 42


By the Cayley-Hamilton Theorem...

pA (x) = (x − λ)3 =⇒ (A − λI)3 = O.

The situation falls into several cases.

MA212 – Lecture 30 – Friday 3 February 2017 page 43


Case 1. ‘Bottom up’: v3 , then v2 , then v1

Case: (A − λI)² ≠ O

Pick v₃ = u with u ∉ N((A − λI)²), i.e.

(A − λI)²u ≠ 0, so v₃ witnesses the non-zero.

Then

v₂ := (A − λI)v₃ ≠ 0, as otherwise (A − λI)²v₃ = 0.

MA212 – Lecture 30 – Friday 3 February 2017 page 44


Finally take...

v₁ := (A − λI)v₂ = (A − λI)²v₃ ≠ 0.

Then

(A − λI)v₁ = (A − λI)²v₂ = (A − λI)³v₃ = 0,

the latter by the Cayley-Hamilton Theorem.

MA212 – Lecture 30 – Friday 3 February 2017 page 45


Cascade

So we now have the cascade:

Av1 = λv1
Av2 = λv2 + v1
Av3 = λv3 + v2

Needs proof that {v1 , v2 , v3 } is a linearly independent set.


And it is so.

MA212 – Lecture 30 – Friday 3 February 2017 page 46


Again take P = [v1 , v2 , v3 ]

Then

AP = A[v₁, v₂, v₃] = [Av₁, Av₂, Av₃]

   = [λv₁, v₁ + λv₂, v₂ + λv₃]

                  [ λ  1  0 ]
   = [v₁, v₂, v₃] [ 0  λ  1 ]
                  [ 0  0  λ ]

Then

P⁻¹AP = [ λ  1  0 ]
        [ 0  λ  1 ].
        [ 0  0  λ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 47


Case 2. ‘Bottom up’: witness v3 , then v2 ; separately v1

(A − λI)2 = O

Possibilities:
i) dim N(A − λI) = 3 =⇒ A = λI, as before (see below)
ii) dim N(A − λI) = 2... see below
iii) dim N(A − λI) = 1... impossible
iv) dim N(A − λI) = 0... impossible, as ∃v ≠ 0 with Av = λv.

MA212 – Lecture 30 – Friday 3 February 2017 page 48


Cases (i) and (iii)

i) dim N (A − λI) = 3 implies that N (A − λI) = R3 , i.e.

(A − λI)u = 0 ∀u : Au = λu ∀u : A = λI.

iii) dim N (A − λI) = 1... then rk (A − λI) = 3 − 1 = 2. But as


(A − λI)2 = O

(A − λI)[(A − λI)u]= 0 ∀u
(A − λI)v= 0 ∀v ∈ R(A − λI)

So
R(A − λI) ⊆ N (A − λI) =⇒ dim N (A − λI) ≥ 2,

a contradiction.

MA212 – Lecture 30 – Friday 3 February 2017 page 49


Case (ii): dim N (A − λI) = 2

As dim N(A − λI) = 2 ≠ 3, pick u = v₃ with

v₂ := (A − λI)u ≠ 0 :   Av₃ = v₂ + λv₃.

Here

(A − λI)v₂ = (A − λI)²u = 0 :   Av₂ = λv₂,

i.e. v₂ ∈ N(A − λI), which has dimension 2, so then...

pick v₁ ∈ N(A − λI) independent of v₂.

MA212 – Lecture 30 – Friday 3 February 2017 page 50


Then Av₁ = λv₁, so take P = [v₁, v₂, v₃]:

AP = A[v₁, v₂, v₃] = [Av₁, Av₂, Av₃]

   = [λv₁, λv₂, v₂ + λv₃]

                  [ λ  0  0 ]
   = [v₁, v₂, v₃] [ 0  λ  1 ]
                  [ 0  0  λ ]

P⁻¹AP = [ λ   0  ]   [ λ  0  0 ]
        [ 0   J₂ ] = [ 0  λ  1 ]
                     [ 0  0  λ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 51


Example

 
A = [  1  2  1 ]
    [  0  2  0 ]
    [ -1  2  3 ]

             | x−1  −2   −1  |
p_A(x) = det |  0   x−2   0  |        (expand by middle row)
             |  1   −2   x−3 |

= (x − 2){(x − 1)(x − 3) + 1} = (x − 2)(x² − 4x + 4)

= (x − 2)(x − 2)² = (x − 2)³.

MA212 – Lecture 30 – Friday 3 February 2017 page 52


 
A − λI = A − 2I = [ -1  2  1 ]
                  [  0  0  0 ]
                  [ -1  2  1 ]

Here

(A − 2I)² = O.

MA212 – Lecture 30 – Friday 3 February 2017 page 53


Pick u to witness (A − 2I)u ≠ 0

Pick u = v₃ = e₁, with

v₂ := (A − 2I)e₁ ∈ Lin{ (1, 0, 1)ᵀ },  v₂ ≠ 0 :   Av₃ = v₂ + λv₃

MA212 – Lecture 30 – Friday 3 February 2017 page 54


 
v₂ = (A − 2I)e₁ = (−1, 0, −1)ᵀ

Now we cannot take for v₁ the image (A − 2I)v₂, as

(A − 2I)v₂ = [ -1  2  1 ] [ -1 ]   [ 0 ]
             [  0  0  0 ] [  0 ] = [ 0 ]
             [ -1  2  1 ] [ -1 ]   [ 0 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 55


So v2 ∈ N (A − 2I), so we look for an alternative eigenvector
independent of v2 in N (A − λI).
First we find N (A − 2I) by solving

(A − 2I)x = 0

i.e.

[ -1  2  1 ] [ x₁ ]   [ 0 ]
[  0  0  0 ] [ x₂ ] = [ 0 ]
[ -1  2  1 ] [ x₃ ]   [ 0 ]

MA212 – Lecture 30 – Friday 3 February 2017 page 56


The only non-redundant equation is:

−x1 + 2x2 + x3 = 0.

So

x = [ 2x₂ + x₃ ]   [ 2x₂ ]   [ x₃ ]
    [    x₂    ] = [  x₂ ] + [  0 ]
    [    x₃    ]   [  0  ]   [ x₃ ]

MA212 – Lecture 30 – Friday 3 February 2017 page 57


So

N(A − 2I) = Lin{ (2, 1, 0)ᵀ, (1, 0, 1)ᵀ }

Here the obvious choice is the first of the two (not the second):

v₁ := (2, 1, 0)ᵀ

MA212 – Lecture 30 – Friday 3 February 2017 page 58


So take

 
P = [v₁, v₂, v₃] = [ 2  -1  1 ]
                   [ 1   0  0 ]
                   [ 0  -1  0 ]

Then

P⁻¹AP = [ 2  0  0 ]
        [ 0  2  1 ]
        [ 0  0  2 ]

Just one shear here.

MA212 – Lecture 30 – Friday 3 February 2017 page 59
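
Again a quick machine check of this example (my sketch, not part of the slides, assuming NumPy); P is the matrix [v₁, v₂, v₃] just constructed:

import numpy as np

A = np.array([[1, 2, 1], [0, 2, 0], [-1, 2, 3]])
P = np.array([[2, -1, 1], [1, 0, 0], [0, -1, 0]])
print(np.round(np.linalg.inv(P) @ A @ P, 10))   # expect [[2,0,0],[0,2,1],[0,0,2]]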


MA212 Further Mathematical Methods Lecture LA 11

Lecture 31: Solving differential equations


using Jordan blocks

Reduction to first order systems


Diagonalizable case
Exponential Matrix Method (use of etA )
Background Readings

Anthony and Harvey


Chap. 9 (§9.2,9.3)

Briefly also in :
Adam Ostaszewski
§6.5

MA212 – Lecture 31 – Tuesday 6 February 2018 page 2


Recap: ... what we did and saw:

Bottom-up method:
λ eigenvalue: Algebraic multiplicity m = mA .
Geometric multiplicity mG ... with mG ≤ mA .
i.e. we can get only mG lin. indep. eigenvectors to value λ .
For each of these eigenvectors we arrange a stretch-and-shear
scheme:

Av1 = λv1
Av2 = v1 + λv2
...
Avk = vk−1 + λvk

MA212 – Lecture 31 – Tuesday 6 February 2018 page 3


The sequence of operations is this.

Find k with k ≤ m such that

(A − λI)k = O
(A − λI)k−1 6= O

Pick vk witnessing the above:

(A − λI)k−1 vk 6= 0,

continue recursively (backwards!)...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 4


That is

vk−1 = (A − λI)vk
vk−2 = (A − λI)vk−1
...
v1 = (A − λI)v2 = ... = (A − λI)k−1 vk 6= 0

The last equation:

v1 = (A − λI)v2 = ... = (A − λI)k−1 vk

yields:
(A − λI)v1 = (A − λI)k vk = 0.

So Av1 = λv1 and v1 is an eigenvector.


MA212 – Lecture 31 – Tuesday 6 February 2018 page 5
We freely make use of the theorem that:

Every square matrix Aₙₓₙ is similar to a Jordan Normal Form

P⁻¹AP = J

    [ Block A₁                 ]
    [    Block A₂              ]
J = [       Block A₃           ]
    [            ⋱             ]
    [              Block A_m   ]

where each block has the Jordan form, i.e.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 6


The form:

 
      [ λ  1       ]
      [    λ  ⋱    ]
Aᵢ := [       ⋱  1 ]
      [          λ ]

Thus λ is repeated all the way down the diagonal, and 1 is repeated all the way down the (first) super-diagonal.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 7


First order constant coefficient differential equations

These take the form


ẋ = Ax + b

where A is an n × n matrix of constants (the constant


coefficients) and so...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 8


...Here

   
    [ ẋ₁ ]                                         [ b₁(t) ]
ẋ = [ ẋ₂ ]   where ẋᵢ = (d/dt) xᵢ(t),   and   b =  [ b₂(t) ]
    [ ⋮  ]                                         [   ⋮   ]
    [ ẋₙ ]                                         [ bₙ(t) ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 9


Example: Newton’s Equation of Motion:

This equates acceleration and force:

ẍ(t) = f (t).

We transform this into first-order, as follows. Take

x1 = x, and x2 = ẋ

then we get
ẋ1 = x2 and ẍ(t) = ẋ2 = f (t)

MA212 – Lecture 31 – Tuesday 6 February 2018 page 10


... and so

      
[ ẋ₁ ]   [ 0  1 ] [ x₁ ]   [  0   ]
[ ẋ₂ ] = [ 0  0 ] [ x₂ ] + [ f(t) ]

ẋ = Ax + b,

for

A = A_Newton := [ 0  1 ]
                [ 0  0 ]

and

b = b_Newton := (0, f(t))ᵀ

MA212 – Lecture 31 – Tuesday 6 February 2018 page 11


Reduction to an uncoupled system?

Yes, if A is diagonalizable by P, then

P⁻¹AP = D = [ λ₁          ]
            [    ⋱        ]
            [       λₙ    ]

Put x = Py; then ẋ = Pẏ, as P is constant.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 12


Then

ẋ = Ax + b  ⇒  Pẏ = APy + b  ⇒  ẏ = P⁻¹APy + c  with c = P⁻¹b,

i.e.

ẏ = Dy + c,

yielding a sequence of independent equations

ẏᵢ = λᵢyᵢ + cᵢ(t),   i.e.   ẏᵢ − λᵢyᵢ = cᵢ(t).

MA212 – Lecture 31 – Tuesday 6 February 2018 page 13


Solution:

The integrating factor (IF) here is e^{−λᵢt}.

Multiplying each side by the IF gives

ẏᵢe^{−λᵢt} − λᵢyᵢe^{−λᵢt} = cᵢ(t)e^{−λᵢt}.

So

(d/dt){ yᵢe^{−λᵢt} } = ẏᵢe^{−λᵢt} − λᵢyᵢe^{−λᵢt} = cᵢ(t)e^{−λᵢt}

yᵢe^{−λᵢt} = ∫ cᵢ(t)e^{−λᵢt} dt

MA212 – Lecture 31 – Tuesday 6 February 2018 page 14


Example

 
ẋ = [ 1  -1 ] x
    [ 4   1 ]

Here

p_A(x) = | x−1    1  |
         | −4    x−1 |  = (x − 1)² + 4,

so λ = 1 ± 2i.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 15


Finding P (‘the modal matrix’)

For λ = 1 + 2i we solve (A − λI)x = 0:

[ -2i  -1  ] [ x₁ ]   [ 0 ]
[  4   -2i ] [ x₂ ] = [ 0 ]

The second row is a multiple of the first (by 2i).

Solve one of them, e.g.

2ix₁ + x₂ = 0 :   v₁ = (1, −2i)ᵀ.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 16


Freebie

Apply conjugation to get the eigenvector to value λ̄ = 1 − 2i:

v₂ = (1, +2i)ᵀ,

as

v₁ = (1, −2i)ᵀ.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 17


Now for P :

 
P = [v₁, v₂] = [  1    1  ]
               [ -2i  +2i ]

we get

P⁻¹AP = [ 1+2i    0   ]
        [  0    1−2i  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 18


Taking x = P y

we solve

ẏ = Dy = [ 1+2i    0   ] y.
         [  0    1−2i  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 19


Cont’d

Taking α = y1 (0) and β = y2 (0) gives

y1 = αe(1+2i)t = αet (cos 2t + i sin 2t)


y2 = βe(1−2i)t = βet (cos 2t − i sin 2t)

MA212 – Lecture 31 – Tuesday 6 February 2018 page 20


Back to x

    
x = [ x₁ ] = Py = [  1    1  ] [ αe^{(1+2i)t} ]
    [ x₂ ]        [ -2i  +2i ] [ βe^{(1−2i)t} ]

  = [ αeᵗ(cos 2t + i sin 2t) + βeᵗ(cos 2t − i sin 2t)          ]
    [ −2iαeᵗ(cos 2t + i sin 2t) + 2iβeᵗ(cos 2t − i sin 2t)     ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 21


Equivalently:

 
t (α + β) cos 2t + i(α − β) sin 2t
x=e ,
2i(β − α) cos 2t + 2(α + β) sin 2t

and furthermore ...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 22


Its real equivalent:

We obtain

x = eᵗ [ K cos 2t + L sin 2t      ]
       [ −2L cos 2t + 2K sin 2t   ]

by taking

α + β = K ∈ R            2β = K + iL
                  =⇒
i(α − β) = L ∈ R         2α = K − iL

MA212 – Lecture 31 – Tuesday 6 February 2018 page 23
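
The real-form solution can be verified against the matrix exponential (introduced formally later in this lecture); a hedged sketch of mine, assuming NumPy and SciPy are available, with arbitrary sample values for K, L and t:

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, -1.0], [4.0, 1.0]])
K, L, t = 1.0, 0.5, 0.7                  # arbitrary constants and time
x0 = np.array([K, -2 * L])               # the formula above gives x(0) = (K, -2L)

x_expm    = expm(t * A) @ x0             # x(t) = e^{tA} x(0)
x_formula = np.exp(t) * np.array([K*np.cos(2*t) + L*np.sin(2*t),
                                  -2*L*np.cos(2*t) + 2*K*np.sin(2*t)])
print(np.allclose(x_expm, x_formula))    # expect True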


What if A is not diagonalizable?

The example with


 
λ 1
ẋ = Ax + b =  x + b
0 λ

gives

ẋ1 = x2 + λx1 + b1 (t)


ẋ2 = λx2 + b2 (t).

MA212 – Lecture 31 – Tuesday 6 February 2018 page 24


Let's take b = 0.

The bottom equation then gives

x2 = K2 eλt

and so subbing in the upper equation:

ẋ1 = K2 eλt + λx1 ẋ1 − λx1 = K2 eλt

MA212 – Lecture 31 – Tuesday 6 February 2018 page 25


Using the integration factor method,

here the IF is e^{−λt}, so

(d/dt){ x₁e^{−λt} } = K₂,   so   x₁e^{−λt} = K₂t + K₁

x₁ = (K₂t + K₁)e^{λt}.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 26


So

      
[ x₁ ]          [ K₂t + K₁ ]          [ 1  t ] [ K₁ ]
[ x₂ ] = e^{λt} [    K₂    ] = e^{λt} [ 0  1 ] [ K₂ ].

We will soon see whence the family resemblance between

[ 1  t ]        [ λ  1 ]
[ 0  1 ]  and   [ 0  λ ].

MA212 – Lecture 31 – Tuesday 6 February 2018 page 27


In preparation for an important construction we recall

that
(P −1 AP )k = P −1 Ak P.

For instance,

(P −1 AP )2 = (P −1 AP )(P −1 AP ) = P −1 A(P P −1 )AP


= P −1 A2 P,

(P −1 AP )3 = (P −1 AP )2 (P −1 AP ) = P −1 A2 (P P −1 )AP
= P −1 A3 P

MA212 – Lecture 31 – Tuesday 6 February 2018 page 28


The exponential of a matrix

Recall from Taylor’s Theorem that

t 1 2 1 3
e := 1 + t + t + t + ...
2! 3!
and the series converges for every t.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 29


For A a square matrix

we define:

eᴬ = I + A + (1/2!)A² + (1/3!)A³ + ...

We read the right-hand side as a limiting sum of n × n matrices.

More precisely, for each i, j we read this as summing the (i, j)-entry of each summand:

(I)ᵢⱼ + (A)ᵢⱼ + (1/2!)(A²)ᵢⱼ + (1/3!)(A³)ᵢⱼ + ...;

one needs to verify that for a fixed A this sum converges; we explain why so later.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 30
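
One way to see the convergence in practice is to compare partial sums of the series with a library implementation; a small sketch of mine, assuming NumPy and SciPy (scipy.linalg.expm computes the matrix exponential):

import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 2.0], [0.0, 3.0]])
S, term = np.eye(2), np.eye(2)
for k in range(1, 30):          # partial sums of I + A + A^2/2! + ...
    term = term @ A / k         # term is now A^k / k!
    S += term
print(np.allclose(S, expm(A)))  # expect True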


So eA is an n × n matrix.

To get a feel for what this is, consider a diagonalizable A. Say

P⁻¹AP = D = [ λ₁           ]
            [    λ₂        ]
            [       ⋱      ]
            [          λₙ  ].

MA212 – Lecture 31 – Tuesday 6 February 2018 page 31


Then

 
P⁻¹AᵏP = (P⁻¹AP)ᵏ = Dᵏ = [ λ₁ᵏ           ]
                         [    λ₂ᵏ        ]
                         [       ⋱       ]
                         [          λₙᵏ  ]

So

P⁻¹eᴬP = P⁻¹P + P⁻¹AP + (1/2!)P⁻¹A²P + (1/3!)P⁻¹A³P + ...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 32


So P −1 eA P =...

P⁻¹eᴬP = I + D + (1/2!)D² + (1/3!)D³ + ... = e^D

  [ 1          ]   [ λ₁          ]          [ λ₁²          ]
= [   1        ] + [    λ₂       ] + (1/2!) [    λ₂²       ] + etc.
  [     ⋱      ]   [       ⋱     ]          [        ⋱     ]
  [        1   ]   [         λₙ  ]          [          λₙ² ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 33
Summing up:

[ 1 + λ₁ + (1/2!)λ₁² + ...                   ]   [ e^{λ₁}           ]
[      1 + λ₂ + (1/2!)λ₂² + ...              ] = [     e^{λ₂}       ]
[           ⋱                                ]   [        ⋱         ]
[               1 + λₙ + (1/2!)λₙ² + ...     ]   [           e^{λₙ} ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 34


In summary

For a diagonalizable square matrix A:

eᴬ = P [ e^{λ₁}           ] P⁻¹
       [     e^{λ₂}       ]
       [        ⋱         ]
       [           e^{λₙ} ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 35


For a general n × n matrix A...

We refer to its Jordan block form J. Say

P⁻¹AP = J = [ A₁           ]
            [    A₂        ]
            [       ⋱      ]
            [          Aₙ  ].

MA212 – Lecture 31 – Tuesday 6 February 2018 page 36


Powers

Of course, since Jᵏ = P⁻¹AᵏP, then as before, but with J in place of D,

P⁻¹eᴬP = I + J + (1/2!)J² + (1/3!)J³ + ... = e^J
2! 3!

MA212 – Lecture 31 – Tuesday 6 February 2018 page 37


But what is Jᵏ?

This is easy:

     [ A₁ᵏ           ]
Jᵏ = [    A₂ᵏ        ]
     [       ⋱       ]
     [          Aₙᵏ  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 38


Now we need:

The last observation reduces our needs to the computation of

[ λ  1          ] ᵏ
[    λ  1       ]
[       ⋱  1    ]
[          λ    ]   ,

which we delay. We stop to explain the usefulness of the enterprise.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 39


Why bother?

Taking P −1 AP = J, we solve

ẋ = Ax

by substituting x = P y. Then, as before,

ẏ = Jy.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 40


Claim: y(t) = e^{tJ}y(0).

e^{tJ} = I + tJ + (1/2!)t²J² + (1/3!)t³J³ + ...

(d/dt){ e^{tJ} } = J + tJ² + (1/2!)t²J³ + ...

                 = J( I + tJ + (1/2!)t²J² + ... ) = Je^{tJ}

      or         = ( I + tJ + (1/2!)t²J² + ... )J = e^{tJ}J.
1! 2!

MA212 – Lecture 31 – Tuesday 6 February 2018 page 41


So, applying the above to y(t) = etJ y(0)

ẏ(t) = (d/dt)y(t) = (d/dt){ e^{tJ}y(0) } = Je^{tJ}y(0) = Jy(t),

as before.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 42


More generally,...

by substituting x = Py into

ẋ = Ax + b

gives

ẏ = Jy + P⁻¹b = Jy + c,   i.e.   ẏ − Jy = c.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 43


Pre-multiply by e^{−tJ}:

ẏ − Jy = c  ⇒  e^{−tJ}ẏ − e^{−tJ}Jy = e^{−tJ}c

Use the integrating factor method to get

(d/dt){ e^{−tJ}y(t) } = e^{−tJ}ẏ − e^{−tJ}Jy = e^{−tJ}c

e^{−tJ}y(t) = ∫ e^{−tJ}c(t) dt.

We will soon see that this integral also has a nice formula!

MA212 – Lecture 31 – Tuesday 6 February 2018 page 44


Computation of J k for J a block: Example.

  
     [ λ  1       ] [ λ  1       ]
     [    λ  1    ] [    λ  1    ]
J² = [       ⋱  1 ] [       ⋱  1 ]
     [          λ ] [          λ ]

     [ λ²  2λ  1   0  0 ... ]
     [     λ²  2λ  1   ...  ]
   = [         λ²      ...  ]
     [             ⋱        ]
     [                 λ²   ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 45


Cont’d

 
     [ λ²  2λ  1   0 ... ] [ λ  1       ]
     [     λ²  2λ  1 ... ] [    λ  1    ]
J³ = [         λ²    ... ] [       ⋱  1 ]
     [             ⋱     ] [          λ ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 46

Cont’d

     [ λ³  3λ²  3λ   1  0 ... ]
     [     λ³   3λ²  3λ 1 ... ]
   = [          λ³        ... ]
     [               ⋱        ]
     [                    λ³  ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 47


Comments

In the above, apart from the entries stipulated below, every other entry is 0:


1. The diagonal entries are all λ3
2. The first super-diagonal entries are all 3λ2
3. The second super-diagonal entries are all 3λ
4. The third super-diagonal entries are all 1

MA212 – Lecture 31 – Tuesday 6 February 2018 page 48


Surprise:

For good reason: these repeated items are exactly as the terms
in: (λ + 1)3 = λ3 + 3λ2 + 3λ + 1
Starting from its diagonal position, each row exhibits the terms:
λ3 , 3λ2 , 3λ, 1 in that order, for as long as there is space in the
row available for all the terms.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 49


Formula:

In Jᵏ the diagonal carries λᵏ, and then the exponent falls, so that the mth super-diagonal carries (binomial coefficient) × λ^{k−m}:

C(k, m) λ^{k−m},   where C(k, m) = k(k−1)...(k−m+1) / (m(m−1)...2·1).

MA212 – Lecture 31 – Tuesday 6 February 2018 page 50


Reason for this:

Ultimately, the same mechanism is at work as in Pascal’s


Triangle:
1 1
1 2 1
1 3 3 1
1 4 6 4 1

In any row each inner element ‘folds in’ (sums) the two elements
above.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 51


Exercise:

Compute

e^{tJ} = I + tJ + (t²/2!)J² + (t³/3!)J³ + ... .

MA212 – Lecture 31 – Tuesday 6 February 2018 page 52


Solution:

Consider any fixed entry (i, j) from the mth super-diagonal, remembering that only Jᵏ with k ≥ m has a non-zero mth super-diagonal.
Summing over all the relevant powers Jᵐ, J^{m+1}, ... we get

Σ_{k≥m} (tᵏ/k!) · [k(k−1)...(k−m+1)/m!] · λ^{k−m}

= (tᵐ/m!) Σ_{k≥m} t^{k−m} λ^{k−m} / (k−m)!,

as k! = k(k−1)...(k−m+1) · (k−m)! .

MA212 – Lecture 31 – Tuesday 6 February 2018 page 53


Continuing

= (tᵐ/m!) Σ_{k≥m} (tλ)^{k−m} / (k−m)!        [put ℓ = k − m]

= (tᵐ/m!) Σ_{ℓ≥0} (tλ)^ℓ / ℓ!

= (tᵐ/m!) e^{λt}

MA212 – Lecture 31 – Tuesday 6 February 2018 page 54


So these look like..

1, t, t²/2!, t³/3!, ...

the terms in the exponential series, × e^{λt}:

e^{λt}, te^{λt}, (t²/2!)e^{λt}, (t³/3!)e^{λt}, ...

MA212 – Lecture 31 – Tuesday 6 February 2018 page 55


Examples: For 2 × 2

     
J = [ λ  1 ],   e^{tJ} = [ e^{λt}  te^{λt} ] = e^{λt} [ 1  t ]
    [ 0  λ ]             [   0     e^{λt}  ]          [ 0  1 ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 56


For 3 × 3

 
J = [ λ  1  0 ]
    [ 0  λ  1 ],
    [ 0  0  λ ]

e^{tJ} = [ e^{λt}  te^{λt}  (t²/2!)e^{λt} ]          [ 1  t  t²/2! ]
         [   0     e^{λt}     te^{λt}     ] = e^{λt} [ 0  1    t   ]
         [   0       0        e^{λt}      ]          [ 0  0    1   ]
MA212 – Lecture 31 – Tuesday 6 February 2018 page 57


In general

  

               [ λ  1          ]
               [    λ  1       ]
e^{tJ} = exp t [       ⋱  1    ]
               [          λ    ]

         [ 1  t  t²/2!  t³/3!              ]
         [    1    t    t²/2!  t³/3!       ]
= e^{λt} [         1      t    t²/2!  ...  ]
         [               ⋱   ⋱             ]
         [                          1      ]

MA212 – Lecture 31 – Tuesday 6 February 2018 page 58
Summary

To solve
ẋ = Ax

Take P −1 AP = J and solve by substituting x = P y by reducing


to:
ẏ = Jy.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 59


This has solution

y(t) = e^{tJ}y(0).

Here e^{tJ} is computed block-wise, with the factoring out of e^{λt} and using tᵏ/k! on the kth super-diagonal.

MA212 – Lecture 31 – Tuesday 6 February 2018 page 60
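
The closed form for e^{tJ} on a single Jordan block is easy to test numerically; a minimal sketch of mine, assuming NumPy and SciPy, for a 3 × 3 block with arbitrary λ and t:

import numpy as np
from scipy.linalg import expm

lam, t = -1.0, 0.3
J = np.array([[lam, 1, 0], [0, lam, 1], [0, 0, lam]])
closed = np.exp(lam * t) * np.array([[1, t, t**2 / 2],
                                     [0, 1, t],
                                     [0, 0, 1]])
print(np.allclose(expm(t * J), closed))   # expect True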


MA212 Further Mathematical Methods Lecture LA 12

Lecture 32: Solving recurrence equations

using Jordan blocks

Dominant eigenvalue
Long-term forecasts
Recap: differential equations

For

    [ λ  1          ]
    [    λ  1       ]
J = [       ⋱  1    ]
    [          λ    ]

ẏ = Jy  =⇒  y(t) = e^{tJ}y(0),  with y(0) initial data.

ẋ = Ax,  x = Py
y = P⁻¹x
ẏ = P⁻¹APy
MA212 – Lecture 32 – Tuesday 9 February 2018 page 2
Cont’d

 
e^{tJ} = e^{λt} [ 1  t  t²/2!  ⋯ ]
                [    1    t    ⋯ ]
                [         ⋱      ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 3


Example:

 
ẋ = [ -1   1   0 ]
    [  0  -1   1 ] x
    [  0   0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 4


Solution:

x(t) = e^{tJ}x(0)

     = e^{−t} [ 1  t  t²/2 ] [ A₁ ]
              [ 0  1   t   ] [ A₂ ] ,   Aᵢ = xᵢ(0)
              [ 0  0   1   ] [ A₃ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 5


Example:

 
Solve ẋ = [ 0  -1 ] x
          [ 1  -2 ]

We saw in Lecture 30 slide 39 that for

P = M_B = [ 1  1 ]
          [ 1  0 ]

P⁻¹AP = [ -1   1 ] = J
        [  0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 6


So...

Take

x = Py
Pẏ = APy

ẏ = P⁻¹APy = [ -1   1 ] y
             [  0  -1 ]

y(t) = e^{tJ}y(0)

y(t) = e^{−t} [ 1  t ] [ A₁ ]
              [ 0  1 ] [ A₂ ]

y(0) = (A₁, A₂)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 7

so

x(t) = Py(t)

     = e^{−t} [ 1  1 ] [ 1  t ] [ A₁ ]
              [ 1  0 ] [ 0  1 ] [ A₂ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 8


Recurrence Equations: x_{t+1} = Ax_t,  t = 0, 1, ...

Our study is by cases:

Case 1: A diagonalizable
Case 2: Jordan block form
  Subcase 2.1: b ≠ c
  Subcase 2.2: b = c and a ≠ c

where x₀ = (a, b, c, ...)ᵀ.
MA212 – Lecture 32 – Tuesday 9 February 2018 page 9


Aim

We study solutions of equations like

[ x₁(t+1) ]   [  0   7   6 ] [ x₁(t) ]
[ x₂(t+1) ] = [ 1/4  0   0 ] [ x₂(t) ]
[ x₃(t+1) ]   [  0  1/2  0 ] [ x₃(t) ]

An example concerned with population growth will eventually follow.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 10


Recurrence Equations:

   
For x = (x₁, ..., xₙ)ᵀ,  x_k = (x₁(k), ..., xₙ(k))ᵀ,  with k = 0, 1, 2, ⋯, consider

x_{k+1} = Aₙₓₙ x_k,   x₀ given.

Solution:

x_k = Ax_{k−1} = A(Ax_{k−2}) = ⋯ = Aᵏx₀

MA212 – Lecture 32 – Tuesday 9 February 2018 page 11


Case 1: A diagonalizable

 
P⁻¹AP = D = [ λ₁          ]
            [    ⋱        ]
            [       λₙ    ]

Put

Py_k = x_k
Py_k = Ax_{k−1} = APy_{k−1}
y_k = P⁻¹APy_{k−1} = Dy_{k−1}

MA212 – Lecture 32 – Tuesday 9 February 2018 page 12


So

y_k = Dᵏy₀

    = [ λ₁ᵏ          ]
      [     ⋱        ] y₀          (1)
      [         λₙᵏ  ]

But Py_k = x_k,  y₀ = P⁻¹x₀.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 13


So

x_k = Py_k

    = P [ λ₁ᵏ          ]
        [     ⋱        ] P⁻¹x₀          (2)
        [         λₙᵏ  ]

more later on x_k.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 14


Provided there is one dominating eigenvalue

Let us index the eigenvalues according to their modulus. If the


largest in modulus is the only eigenvalue with that modulus, then
we say that it is a dominant or dominating eigenvalue. If this
happens, we have:

|λ1 | > |λ2 | ≥ |λ3 | ≥ · · ·

λ₁ᵏ grows fastest, i.e.

λ₂ᵏ/λ₁ᵏ = (λ₂/λ₁)ᵏ −→ 0  as k −→ ∞,

and similarly for λ₃/λ₁ etc.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 15


More about xk : as promised

x_k = PDᵏy₀

    = P [ λ₁ᵏα ]
        [ λ₂ᵏβ ]
        [  ⋮   ]
        [ λₙᵏγ ]

          [ α            ]
    = λ₁ᵏ P [ (λ₂/λ₁)ᵏ β ]
          [  ⋮           ]
          [ (λₙ/λ₁)ᵏ γ   ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 16


Cont’d

Here

y₀ = P⁻¹x₀ = (α, β, ..., γ)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 17


Some more about x_k: ∗ denotes 'small' relative to α

x_k = PDᵏy₀ = λ₁ᵏ P (α, ∗, ..., ∗)ᵀ

    = αλ₁ᵏv₁, approximately, as Pe₁ = v₁ (nb Av₁ = λ₁v₁) and

(α, ∗, ..., ∗)ᵀ ≈ α(1, 0, ..., 0)ᵀ = αe₁

MA212 – Lecture 32 – Tuesday 9 February 2018 page 18
Subdominant eigenvalue:

In the very special (= not generic) case that α = 0, the above argument breaks down.
Subdominant eigenvalue:

|λ₁| > |λ₂| > |λ₃| ≥ |λ₄| ≥ ⋯

In practical modelling usually α ≠ 0; but models need to be robust & must keep away from the ... case α = 0.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 19


An example for case 1

The following recurrence models the 3 inter-related sections of a population over (discrete) time periods t, with respective sizes x₁, x₂, x₃ of the young, middle-age and old members. The coefficient matrix reflects fertility (i.e. creation of the young) and survival from one period to the next. For example, 1/4 of the young survive, and likewise just 1/2 of the middle-aged survive.

[ x₁(t+1) ]   [  0   7   6 ] [ x₁(t) ]
[ x₂(t+1) ] = [ 1/4  0   0 ] [ x₂(t) ]
[ x₃(t+1) ]   [  0  1/2  0 ] [ x₃(t) ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 20


Analysis


                       |  x    −7   −6 |
p_A(x) = det(xI − A) = | −1/4   x    0 |
                       |  0   −1/2   x |

= x³ − (7/4)x − 3/4 = (x + 1)(x² − x − 3/4)

= (x + 1)(x − 3/2)(x + 1/2)

So λ = +3/2, −1/2, −1, and so there is a dominant eigenvalue here:

3/2 > 1 = |−1| > 1/2 = |−1/2|.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 21
To find the corresponding eigenvector v₁ solve:

[ 3/2   −7   −6  ] [ x₁ ]
[ −1/4  3/2   0  ] [ x₂ ] = 0        (3)
[  0   −1/2  3/2 ] [ x₃ ]

−x₁ + 6x₂ = 0
−x₂ + 3x₃ = 0

... omitting the first row as redundant.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 22


Cont’d

 
v₁ = (18, 3, 1)ᵀ

P = [v₁, ⋯]

P⁻¹ (x₁(0), x₂(0), x₃(0))ᵀ = (α, β, γ)ᵀ = y₀

NB: x_k = Py_k and we assume α ≠ 0.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 23


Provided α ≠ 0:

x_k ≈ (3/2)ᵏ α (18, 3, 1)ᵀ

Age class proportions 18 : 3 : 1.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 24
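
The convergence of the age-class proportions can be watched directly by iterating the recurrence; a sketch of mine, assuming NumPy, from an arbitrary starting population (any start with α ≠ 0 works):

import numpy as np

A = np.array([[0, 7, 6], [0.25, 0, 0], [0, 0.5, 0]])
x = np.array([100.0, 100.0, 100.0])    # arbitrary initial population
for _ in range(50):
    x = A @ x                          # one time period
print(x / x[2])                        # expect proportions close to 18 : 3 : 1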


Case 1 (Diagonalizable) Summary

In the dominant eigenvalue setting

x_t ≈ (λ_dominant)ᵗ × α × (eigenvector to value λ_dominant)

Comments: Perron-Frobenius Theorems


1. When A is real with non-negative ( ≥ 0 ) entries, A has a
non-negative eigenvalue which is largest in modulus:
Put ρ(A) := max{|λ| : λ is eigenvalue of A}
Then ρ(A) ≥ 0 , and ρ(A) is an eigenvalue.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 25


2. If Ak for some k = 1, 2, .. has all entries POSITIVE, then

ρ(A) > 0 and ρ(A) has multiplicity 1,

so ρ(A) is a dominant eigenvalue.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 26


Case 2: Jordan Normal Form (JNF)

Here

P⁻¹AP = J

As before, substitute Py_k = x_k:

y_k = Jᵏy₀

    [ A₁           ]
J = [    A₂        ]
    [       ⋱      ]
    [          Aₙ  ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 27


And ...

 
     [ A₁ᵏ           ]
Jᵏ = [    A₂ᵏ        ]
     [       ⋱       ]
     [          Aₙᵏ  ]

so x_k = PJᵏP⁻¹x₀

MA212 – Lecture 32 – Tuesday 9 February 2018 page 28


Example: How to compute J

 
A = [ 2  1  -2 ]
    [ 1  1  -1 ]
    [ 1  0   0 ]

         | x−2  −1    2 |
p_A(x) = | −1   x−1   1 | = (x − 1)³
         | −1    0    x |

So (A − I)³ = O.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 29


So as λ = 1

 
A − λI = A − I = [ 1  1  -2 ]
                 [ 1  0  -1 ]
                 [ 1  0  -1 ]

(A − λI)² = (A − I)² = [ 0  1  -1 ]
                       [ 0  1  -1 ]  ≠ O
                       [ 0  1  -1 ]

v₃ := e₂ = (0, 1, 0)ᵀ  "picks up" the 2nd column,

to witness (A − I)²v₃ ≠ 0.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 30


Cont’d

 
v₃ = (0, 1, 0)ᵀ  ( = e₂ )

v₂ = (A − I)v₃ = (1, 0, 0)ᵀ  ( = e₁ ) = 2nd col. "pick up" from (A − I),

as A − λI = [ 1  1  -2 ]
            [ 1  0  -1 ].
            [ 1  0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 31


Cont’d

 
v₁ = (A − I)v₂ = (1, 1, 1)ᵀ,

as A − λI = [ 1  1  -2 ]
            [ 1  0  -1 ]   and v₂ = (1, 0, 0)ᵀ ( = e₁ ).
            [ 1  0  -1 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 32


Cont’d

 
P = [v₁ v₂ v₃] = [ 1  1  0 ]
                 [ 1  0  1 ]
                 [ 1  0  0 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 33


Cont’d

 
J = P⁻¹AP = [ 1  1  0 ]
            [ 0  1  1 ]
            [ 0  0  1 ]

Recall

       [ λᵏ   kλ^{k−1}   C(k,2)λ^{k−2} ]
J_λᵏ = [ 0      λᵏ         kλ^{k−1}    ]        (4)
       [ 0      0            λᵏ        ]

(λ + 1)ᵏ = λᵏ + C(k,1)λ^{k−1} + C(k,2)λ^{k−2} + ⋯

MA212 – Lecture 32 – Tuesday 9 February 2018 page 34


Here λ = 1

so

     [ 1  k  ½k(k−1) ]
Jᵏ = [ 0  1     k    ]        (5)
     [ 0  0     1    ]

Back to the x_k:

x_k = PJᵏP⁻¹x₀

    [ 1  1  0 ] [ 1  k  ½k(k−1) ] [ 0  0   1 ] [ a ]
  = [ 1  0  1 ] [ 0  1     k    ] [ 1  0  -1 ] [ b ]        (6)
    [ 1  0  0 ] [ 0  0     1    ] [ 0  1  -1 ] [ c ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 35


Note:

    
     [ 0  0   1 ] [ a ]   [  c  ]
y₀ = [ 1  0  -1 ] [ b ] = [ a−c ]        (7)
     [ 0  1  -1 ] [ c ]   [ b−c ]

As λ₁ = λ₂ = λ₃ = 1 here.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 36
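
The formula for Jᵏ with λ = 1 can be checked against a direct matrix power; a minimal sketch of mine, assuming NumPy, using the P and J from this example:

import numpy as np
from math import comb

A = np.array([[2, 1, -2], [1, 1, -1], [1, 0, 0]])
P = np.array([[1, 1, 0], [1, 0, 1], [1, 0, 0]])

k = 6
Jk = np.array([[1, k, comb(k, 2)], [0, 1, k], [0, 0, 1]])   # J^k for the 3x3 block, lambda = 1
lhs = np.linalg.matrix_power(A, k)
rhs = P @ Jk @ np.linalg.inv(P)
print(np.allclose(lhs, rhs))   # expect True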


Let’s study ‘Long-term’ behaviour

here, of course, of

      [ 1  1  0 ]    [  c  ]
x_k = [ 1  0  1 ] Jᵏ [ a−c ]        (8)
      [ 1  0  0 ]    [ b−c ]

for 'large' k,

     [ 1  k  ½k(k−1) ]
Jᵏ = [ 0  1     k    ]        (9)
     [ 0  0     1    ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 37


Subcase 1: Influence of k 2 term

½k(k−1) = ½k²(1 − 1/k)        (10)

influences x_k ... see this by factorizing out ½k²:

          [ ∗  ∗  1 ] [  c  ]
x_k = ½k² P [ 0  ∗  ∗ ] [ a−c ]        (11)
          [ 0  0  ∗ ] [ b−c ]

          [ ½(b−c) ]
    = k² P [   ∗    ]
          [   ∗    ]

Above (and also in later slides) ∗ denotes arbitrary values.


MA212 – Lecture 32 – Tuesday 9 February 2018 page 38
Here

x_k = ½k²(b−c) P [ 1 ]
                 [ 0 ]  + ∗
                 [ 0 ]

    = ½k²(b−c) [ 1 ]
               [ 1 ]  + ∗        (12)
               [ 1 ]

Note that

v₁ = (1, 1, 1)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 39
Subcase 2: b = c and so k² has no influence

If b = c, then

[ ½(b−c) ]   [ 0 ]
[    ∗   ] = [ ∗ ]        (13)
[    ∗   ]   [ ∗ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 40


So start over: this time the k term wields influence

Recall a previous slide:

      [ 1  1  0 ]    [  c  ]
x_k = [ 1  0  1 ] Jᵏ [ a−c ]        (14)
      [ 1  0  0 ]    [ b−c ]

for 'large' k,

     [ 1  k  ½k(k−1) ]
Jᵏ = [ 0  1     k    ]        (15)
     [ 0  0     1    ]

Now factor out only k.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 41


So start over: this time the k term wields influence

Denoting an irrelevant term by □:

         [ ∗  1  □ ] [  c  ]
x_k = kP [ 0  ∗  1 ] [ a−c ]
         [ 0  0  ∗ ] [  0  ]

    = kP [ a−c ]
         [  ∗  ]  + ∗
         [  0  ]

        [ 1  1  0 ] [ 1 ]
    = k [ 1  0  1 ] [ 0 ] (a − c) + ∗
        [ 1  0  0 ] [ 0 ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 42

That is

        [ 1  1  0 ] [ 1 ]
x_k = k [ 1  0  1 ] [ 0 ] (a − c) + ∗
        [ 1  0  0 ] [ 0 ]

    = k(a − c) (1, 1, 1)ᵀ + ∗

Here the eigenvector v1 is all important.

MA212 – Lecture 32 – Tuesday 9 February 2018 page 43


Summary of Case 2: Jordan Form

x = Py  =⇒  y_{t+1} = Jy_t,  t = 0, 1, 2, ⋯  =⇒  y_t = Jᵗy₀

     [ λᵏ  kλ^{k−1}  C(k,2)λ^{k−2}  ⋯ ]
Jᵏ = [      λᵏ         kλ^{k−1}     ⋯ ]
     [                   ⋱          ⋯ ]

MA212 – Lecture 32 – Tuesday 9 February 2018 page 44


Long term forecast for xt

We are given

x₀ = (a, b, c, ...)ᵀ

MA212 – Lecture 32 – Tuesday 9 February 2018 page 45


Subcase 2.1: b − c ≠ 0

x_t ≅ ½t²(b − c) × (eigenvector for the λ block)

MA212 – Lecture 32 – Tuesday 9 February 2018 page 46


Or Subcase 2.2: b = c and a − c ≠ 0

x_t ≅ t(a − c) × eigenvector

MA212 – Lecture 32 – Tuesday 9 February 2018 page 47


MA212 Further Mathematical Methods Lecture LA 13

Lecture 33: Unitary Diagonalization

& Applications

About normality
Inner products and positive definiteness
Singular values (for non-square matrices)
Background reading

Anthony and Harvey,


Chap. 13 (§13.6,13.7)

Adam Ostaszewski
Chap. 5 (§5.1-5.3, 5.5 5.6,)

MA212 – Lecture 33 – Tuesday 13 February 2018 page 2


Recap: Complex inner product

We recall the properties of a complex inner product ⟨·, ·⟩ on V:

Linearity in the first argument:

⟨αu₁ + βu₂, v⟩ = α⟨u₁, v⟩ + β⟨u₂, v⟩   for all u₁, u₂, v ∈ V, α, β ∈ C.

Hermitian property:

⟨v, u⟩ = the complex conjugate of ⟨u, v⟩,  for all u, v ∈ V.

Positivity:

⟨u, u⟩ > 0  for all u ≠ 0 in V.

In Cⁿ the standard example is u · v := u₁v̄₁ + ... + uₙv̄ₙ = uᵀv̄.
This may be equivalently written as v∗u, i.e. v̄ᵀu.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 3


Definition

An n × n complex matrix A is unitarily diagonalizable if there is a unitary matrix S and a diagonal matrix D such that

S∗AS = D,

i.e. S⁻¹AS = D with S⁻¹ = S∗.
When this happens the columns of S form an orthonormal basis of Cⁿ (as S∗S = I) and are eigenvectors of A, as AS = SD, i.e.

A[s₁, ..., sₙ] = [s₁, ..., sₙ] [ λ₁        ]
                              [    ⋱      ] = [λ₁s₁, ..., λₙsₙ].
                              [       λₙ  ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 4


Conclusion and Question

Theorem. An n × n complex matrix A is unitarily diagonal-


izable iff A has an orthonormal basis consisting of eigen-
vectors.

Question: Just which n × n complex matrices are unitarily


diagonalizable?

Answer: Those which are normal: i.e. satisfy AA∗ = A∗ A.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 5


Theorem

Theorem. An n × n complex matrix A is unitarily diagonal-


izable iff A is normal.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 6


Proof

Proof: (Left to Right) If A is unitarily diagonalisable, then there is


a unitary matrix S and a diagonal matrix D such that
S ∗ AS = D .
∴ A = SDS ∗ and A∗ = (SDS ∗ )∗ = S ∗∗ D∗ S ∗ = SD∗ S ∗
so AA∗ = (SDS ∗ )(SD∗ S ∗ ) = SDD∗ S ∗
and A∗ A = (SD∗ S ∗ )(SDS ∗ ) = SD∗ DS ∗

Our claim will be established if we show that DD∗ = D∗ D .

MA212 – Lecture 33 – Tuesday 13 February 2018 page 7


Cont’d

   

But

    [ λ₁         0 ]             [ λ̄₁         0 ]
D = [    λ₂        ]   and D∗ =  [    λ̄₂        ]
    [       ⋱      ]             [       ⋱      ]
    [ 0         λₙ ]             [ 0         λ̄ₙ ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 8

             [ |λ₁|²          0 ]
and so DD∗ = [      |λ₂|²       ] = D∗D
             [            ⋱     ]
             [ 0         |λₙ|²  ]

∴ AA∗ = A∗A, i.e. A is normal.

So every unitarily diagonalizable matrix is normal.


The proof of the converse direction is much harder and beyond
the scope of this course.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 9


Quick recap: If A is an n × n complex matrix, then

A is normal if A∗ A = AA∗
A is unitary if A∗ = A−1
A is Hermitian if A∗ = A

Consequently if A is unitary or Hermitian it is normal, so may be


unitarily diagonalized.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 10


Real and symmetric A...

Also if An×n is real and symmetric, then it is Hermitian, and so


has real eigenvalues and we can find a real eigenvector for each
(real) eigenvalue (by solving the real equations Ax = λx with λ
real).
As A is normal, the eigenvectors will form an orthonormal basis
of Rn .
So A can be orthogonally diagonalized,
i.e. there is an orthogonal matrix P and a diagonal matrix D
such that P T AP = D (i.e. P −1 = P T ).

MA212 – Lecture 33 – Tuesday 13 February 2018 page 11


Recap on inner products via a Hermitian matrix

Also recall from Lectures 27/28 that

⟨x, y⟩ := y∗Ax

defines an inner product on Cⁿ provided:

A is Hermitian,
x∗Ax > 0 for all x ≠ 0. (For A Hermitian, we later show that x∗Ax ∈ R.)

MA212 – Lecture 33 – Tuesday 13 February 2018 page 12


Inner product from Hermitian A

For Aₙₓₙ Hermitian and x, y ∈ Cⁿ put

⟨x, y⟩ := x · (Ay) = xᵀĀȳ.

Then

⟨x, y⟩ = xᵀĀȳ = (xᵀĀȳ)ᵀ = ȳᵀĀᵀx     (as this is just a number).

Now, as A is Hermitian (i.e. A∗ = A),

ȳᵀĀᵀx = ȳᵀA∗x = ȳᵀAx = y∗Ax.

So ⟨x, y⟩ = y∗Ax, and by the same computation ⟨y, x⟩ = x∗Ay, the complex conjugate of ⟨x, y⟩.

This verifies the Hermitian property of the inner product.
What remains is the positivity property, i.e. that x∗Ax > 0 for all x ≠ 0, a matter we will return to.
MA212 – Lecture 33 – Tuesday 13 February 2018 page 13
Recall

Definition. An n × n complex matrix A is positive definite if A


is Hermitian and x∗ Ax > 0 for all x 6= 0 in Cn .
So we conclude that

Theorem. For A Hermitian, ⟨x, y⟩ := y∗Ax defines an inner product on Cⁿ iff A is positive definite.

The link between positive definiteness and the signs of the


eigenvalues is:

Theorem. A is positive definite iff A is Hermitian and all


its eigenvalues are (strictly!) positive.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 14


Proof: (⇒)

To show that if A is positive definite, i.e. that all of the


eigenvalues are strictly positive, we show that
if A has a e-value λ ≤ 0 , then A is not positive definite.
Suppose that A has a e-value λ ≤ 0 so that

Ax = λx

for some x 6= 0 . Then

x∗Ax = x∗(λx) = λ(x∗x) = λ‖x‖²

and so x∗ Ax ≤ 0 for some x 6= 0 , i.e. A is not positive definite.


Note that as A is Hermitian, λ is real.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 15


Proof: (⇐)

Suppose that A is Hermitian and that all of its e-values


λ1 , λ2 , . . . , λn are strictly positive.
As A is Hermitian, it is normal, and so the corresponding
e-vectors can be x1 , x2 , . . . , xn 6= 0 where

Ax1 = λ1 x1 , Ax2 = λ2 x2 , . . . , Axn = λn xn ,

and they form an orthonormal basis of Cn . So for any x ∈ Cn


with x 6= 0 , we have

MA212 – Lecture 33 – Tuesday 13 February 2018 page 16


say x = α1 x1 + α2 x2 + · · · + αn xn with αi not all zero

Ax = A(α1 x1 + α2 x2 + · · · + αn xn )
= α1 Ax1 + α2 Ax2 + · · · + αn Axn
= α1 λ1 x1 + α2 λ2 x2 + · · · + αn λn xn .

Then x∗Ax
= (α₁x₁ + α₂x₂ + ⋯ + αₙxₙ)∗ (α₁λ₁x₁ + α₂λ₂x₂ + ⋯ + αₙλₙxₙ)
= (ᾱ₁x₁∗ + ᾱ₂x₂∗ + ⋯ + ᾱₙxₙ∗)(α₁λ₁x₁ + α₂λ₂x₂ + ⋯ + αₙλₙxₙ)
= |α₁|²λ₁ + |α₂|²λ₂ + ⋯ + |αₙ|²λₙ      (using xᵢ∗xⱼ = 0 for i ≠ j, xᵢ∗xᵢ = 1)
> 0.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 17
Determinantal test

Or, look at the principal (or leading) minors of A and see whether they are all strictly positive.

                         [ a₁₁  a₁₂  ⋯  a₁ₙ ]
The Hermitian matrix A = [ a₂₁  a₂₂  ⋯  a₂ₙ ]
                         [  ⋮    ⋮   ⋱   ⋮  ]
                         [ aₙ₁  aₙ₂  ⋯  aₙₙ ]

is positive definite if and only if all the principal sub-determinants are positive:
MA212 – Lecture 33 – Tuesday 13 February 2018 page 18


Test

a₁₁ > 0

| a₁₁  a₁₂ |
| a₂₁  a₂₂ | > 0

| a₁₁  a₁₂  a₁₃ |
| a₂₁  a₂₂  a₂₃ | > 0
| a₃₁  a₃₂  a₃₃ |

and all the next ones, ...

MA212 – Lecture 33 – Tuesday 13 February 2018 page 19


ending with



the full determinant of A:

| a₁₁  a₁₂  ⋯  a₁ₙ |
| a₂₁  a₂₂  ⋯  a₂ₙ |
|  ⋮    ⋮   ⋱   ⋮  | = |A| > 0.
| aₙ₁  aₙ₂  ⋯  aₙₙ |

MA212 – Lecture 33 – Tuesday 13 February 2018 page 20


Example

 
                        [ 2  1  1 ]
Consider the matrix A = [ 1  1  0 ]; its characteristic polynomial is
                        [ 1  0  3 ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 21


Cont’d


                       | x−2  −1   −1 |
p_A(x) = det(xI − A) = | −1   x−1   0 |
                       | −1    0   x−3 |

p_A(x) = −(0 + (x − 1)) + (x − 3)((x − 2)(x − 1) − 1)
       = (x − 3)(x² − 3x + 2) − 2x + 4
       = x³ − 3x² + 2x − 3x² + 9x − 6 − 2x + 4
       = x³ − 6x² + 9x − 2
∴ p_A(x) = (x − 2)(x² − 4x + 1)
∴ p_A(x) = 0 if x = 2 or x² − 4x + 1 = 0

So x = ½(4 ± √(16 − 4)) = 2 ± √3 > 0.
MA212 – Lecture 33 – Tuesday 13 February 2018 page 22
The principal determinants of A are

the following three:

     | 2  1 |           | 2  1  1 |
2,   | 1  1 | = 1,  and | 1  1  0 | = 2.
                        | 1  0  3 |

All positive, as expected.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 23
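
The determinantal test is easy to run mechanically; a sketch of mine, assuming NumPy, computing the three leading principal minors of this A:

import numpy as np

A = np.array([[2, 1, 1], [1, 1, 0], [1, 0, 3]])
minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
print(np.round(minors, 10))   # expect [2. 1. 2.]: all positive, so A is positive definite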


The magical properties of the matrix A∗ A

Let A be a complex m × n matrix (i.e. A need not be a square matrix). Observe:

A∗A is square, as it will be an n × n matrix.

A∗A is Hermitian, as (A∗A)∗ = A∗A∗∗ = A∗A.

A∗A is non-negative definite as, for any x ∈ Cⁿ, we have

x∗(A∗Ax) = (x∗A∗)Ax = (Ax)∗Ax = ‖Ax‖² ≥ 0.

So all the eigenvalues will be non-negative.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 24


Cont’d

In particular, if the only solution to Ax = 0 is x = 0, then x∗A∗Ax > 0 for all x ≠ 0 in Cⁿ, and A∗A is actually positive definite.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 25


Example

 
Consider the 2 × 3 complex matrix A = [ 2   i  0 ]
                                      [ 0  -i  2 ]

        [  2  0 ]
so A∗ = [ -i  i ]   and
        [  0  2 ]

      [  2  0 ] [ 2   i  0 ]   [  4   2i  0  ]
A∗A = [ -i  i ] [ 0  -i  2 ] = [ -2i  2   2i ]
      [  0  2 ]                [  0  -2i  4  ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 26


The characteristic polynomial here of A∗ A

is

             | x−4   −2i    0  |
p_{A∗A}(x) = |  2i   x−2  −2i  |
             |  0     2i   x−4 |

So

p_{A∗A}(x) = (x − 4) | x−2  −2i |  + 2i | 2i  −2i |
                     | 2i   x−4 |       | 0   x−4 |

           = x(x − 4)(x − 6),   so λ₀ = 0, λ₁ = 4, λ₂ = 6.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 27


Comment

Indeed, the non-zero eigenvalue of A∗ A are the same as the


non-zero eigenvalue of AA∗ .
Why? Suppose that λ is a non-zero eigenvalue of A∗ A so that

A∗ Av = λv

for some v ≠ 0. We then have

A(A∗ Av) = A(λv)


=⇒ (AA∗ )(Av) = λ(Av)

with Av ≠ 0. (Why?) That is, λ is also a non-zero eigenvalue of AA∗, with eigenvector Av ≠ 0.
(Because Av = 0 ⇒ λv = A∗Av = 0 ⇒ v = 0, as λ ≠ 0.)

MA212 – Lecture 33 – Tuesday 13 February 2018 page 28


Important definition: the singular values of A

Definition. If λ₁, λ₂, ..., λ_k are the non-zero eigenvalues of the n × n matrix A∗A, then √λ₁, √λ₂, ..., √λ_k are called the singular values of A.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 29


Example

Using

A = [ 2   i  0 ]
    [ 0  -i  2 ]

again, we have

      [ 2   i  0 ] [  2  0 ]   [  5  -1 ]
AA∗ = [ 0  -i  2 ] [ -i  i ] = [ -1   5 ]
                   [  0  2 ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 30


Cont’d

Thus AA∗ has eigenvalues given by

det(AA∗ − λI) = 0  =⇒  det [ 5−λ   −1  ] = 0
                           [ −1   5−λ  ]

(5 − λ)² − 1 = 0
(5 − λ)² = 1  =⇒  5 − λ = ±1,
so λ = 5 ± 1 = 6, 4.

Indeed, the singular values of A are √6 and 2.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 31
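
These singular values agree with what a library routine reports; a small sketch of mine, assuming NumPy (np.linalg.svd returns singular values in decreasing order):

import numpy as np

A = np.array([[2, 1j, 0], [0, -1j, 2]])
print(np.linalg.svd(A, compute_uv=False))    # expect [sqrt(6), 2] ~ [2.449, 2]
print(np.linalg.eigvalsh(A.conj().T @ A))    # eigenvalues of A*A: expect [0, 4, 6]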


Furthermore: normality

Since A∗ A is Hermitian it is also normal as


(A∗ A)∗ (A∗ A) = (A∗ A∗∗ )(A∗ A) = A∗ AA∗ A
(A∗ A)(A∗ A)∗ = (A∗ A)(A∗ A∗∗ ) = A∗ AA∗ A

which means that the eigenvectors of the n × n matrix A∗ A


form an orthonormal basis of Cn .
Indeed, suppose {v1 , v2 , . . . , vk } is an orthonormal set of
eigenvectors of A∗ A corresponding to the non-zero eigenvalues
λ1 , λ2 , . . . , λk , then

A∗Avᵢ = λᵢvᵢ for 1 ≤ i ≤ k, with λᵢ > 0

vᵢ · vⱼ = 0 for i ≠ j, and ‖vᵢ‖ = 1 for 1 ≤ i ≤ k,

and then
MA212 – Lecture 33 – Tuesday 13 February 2018 page 32
...we see that each of the vectors

uᵢ = (1/√λᵢ) Avᵢ,   1 ≤ i ≤ k,

is an eigenvector of AA∗ corresponding to the non-zero eigenvalue λᵢ, as

AA∗uᵢ = AA∗((1/√λᵢ)Avᵢ) = (1/√λᵢ)A(A∗Avᵢ) = (1/√λᵢ)A(λᵢvᵢ) = λᵢ((1/√λᵢ)Avᵢ) = λᵢuᵢ

MA212 – Lecture 33 – Tuesday 13 February 2018 page 33


Orthogonal

orthogonal as, for i ≠ j,

uᵢ · uⱼ = ((1/√λᵢ)Avᵢ) · ((1/√λⱼ)Avⱼ) = (1/√(λᵢλⱼ)) (Avᵢ) · (Avⱼ)

        = (1/√(λᵢλⱼ)) (Avⱼ)∗Avᵢ = (1/√(λᵢλⱼ)) vⱼ∗A∗Avᵢ

        = (λᵢ/√(λᵢλⱼ)) vⱼ∗vᵢ = √(λᵢ/λⱼ) (vᵢ · vⱼ)

∴ uᵢ · uⱼ = 0 when i ≠ j

MA212 – Lecture 33 – Tuesday 13 February 2018 page 34


Unit length

unit length as
s r
λi λi
ui · uj = (vi · vj ) =⇒ ui · ui = (vi · vi )
λj λi
=⇒ kui k2 = kvi k2 = 1

That is, the vectors ui = √1 Avi form an orthonormal set of


λi
eigenvector of AA∗ corresponding to its non-zero eigenvalues
λ1 , λ2 , . . . , λk

MA212 – Lecture 33 – Tuesday 13 February 2018 page 35


Example

 
Using A = [ 2   i  0 ] again, we have
          [ 0  -i  2 ]

AA∗ = [  5  -1 ]
      [ -1   5 ]

with eigenvalues 4, 6:

λ₁ = 4  =⇒  e-vector u₁ = (1/√2)(1, 1)ᵀ
λ₂ = 6  =⇒  e-vector u₂ = (1/√2)(−1, 1)ᵀ

MA212 – Lecture 33 – Tuesday 13 February 2018 page 36


Example cont’d

 
      [  4   2i  0  ]
A∗A = [ -2i  2   2i ]
      [  0  -2i  4  ]

with e-values 0, 4, 6.

MA212 – Lecture 33 – Tuesday 13 February 2018 page 37


Cont’d

 
λ₁ = 4  =⇒  e-vector (1/√2)(1, 0, 1)ᵀ

λ₂ = 6  =⇒  e-vector (1/√3)(−1, i, 1)ᵀ

λ₃ = 0  =⇒  e-vector (1/√6)(−i, 2, i)ᵀ

MA212 – Lecture 33 – Tuesday 13 February 2018 page 38


Eigenvectors ui for λi > 0 are:...

 
  1    
1 1 2 i 0 
1    1 2 1 1
u1 = √ Av1 =  √ 0 = √ = √
4 2 0 −i 2 2  2 2 2 2 1
1

MA212 – Lecture 33 – Tuesday 13 February 2018 page 39


and

 
  −1
1 1 2 i 0 1  
u2 = √ Av2 = √ √  i 


6 6 0 −i 2 3
1
  
1 −3 1 −1
= √ =√
3 2 +3 2 +1

MA212 – Lecture 33 – Tuesday 13 February 2018 page 40


Comment

Indeed, using similar reasoning, we see that if u₁, u₂, ..., u_k is


an orthonormal set of eigenvectors of AA∗ corresponding to the
non-zero eigenvalues λ1 , λ2 , . . . , λk , then the vectors

1 ∗
vi = √ A ui for 1 ≤ i ≤ k
λi

form an orthonormal set of eigenvectors of A∗ A corresponding


to its non-zero eigenvalues λ1 , λ2 , . . . , λk .

MA212 – Lecture 33 – Tuesday 13 February 2018 page 41


Example

Using what we saw above we have

                         [  2  0 ]
v₁ = (1/√4) A∗u₁ = (1/2) [ -i  i ] · (1/√2)(1, 1)ᵀ = (1/(2√2))(2, 0, 2)ᵀ = (1/√2)(1, 0, 1)ᵀ
                         [  0  2 ]

MA212 – Lecture 33 – Tuesday 13 February 2018 page 42


And

   
2 0   −2
1 ∗ 1 
 1

−1 1  
v2 = √ A u2 = √ −i i  √   = √  2i 


6 6 2 1 2 3
0 2 2

 
−1
1  
=√  i 

3
1

MA212 – Lecture 33 – Tuesday 13 February 2018 page 43


MA212 Further Mathematical Methods Lecture LA 14

Lecture 34: Singular Values Decomposition

How many SVs are there? ... = rank(A) many


The maximal SV versus magnification under A
Spectral Decomposition
SV Decomposition
S1 + S2 – sum of two subspaces
Background reading

Adam Ostaszewski
Chap. 5

MA212 – Lecture 34 – Friday 16 February 2018 page 2


The norm of a matrix

Definition. Let A be any matrix; the norm of A, denoted by ‖A‖, is

‖A‖ = max{ ‖Ax‖ : ‖x‖ = 1 },

i.e., it is the maximum value of ‖Ax‖ over all unit vectors x.

Indeed, as the norm of a vector allows us to write, for x ≠ 0,

‖Ax‖/‖x‖ = ‖ (1/‖x‖) Ax ‖ = ‖ A (x/‖x‖) ‖,

we also have

‖A‖ = max{ ‖Ax‖/‖x‖ : x ≠ 0 },

and this is easier to deal with.


MA212 – Lecture 34 – Friday 16 February 2018 page 3
The norm of a matrix is a measure of 'how large' the linear transformation represented by A can be, i.e., how large can ‖Ax‖ be if we have ‖x‖ = 1?

MA212 – Lecture 34 – Friday 16 February 2018 page 4


Of course, if v is an eigenvector of A with eigenvalue λ, we have

Av = λv   (v ≠ 0)

and so

‖Av‖/‖v‖ = ‖λv‖/‖v‖ = |λ| ‖v‖/‖v‖ = |λ|,

which means that the norm of A, i.e. ‖A‖, must be at least as large as the largest modulus that we can get from its eigenvalues...
... But in general, it is larger! (See, for example, Exercise 8.2: Norm = (1 + √5)/2 > 1 = modulus of largest eigenvalue.)

MA212 – Lecture 34 – Friday 16 February 2018 page 5


A simpler example

 
If A = [ 0  1 ], then the eigenvalues are 0; but Ae₁ = 0 and Ae₂ = e₁, so ‖A‖ = 1 > 0.
       [ 0  0 ]

MA212 – Lecture 34 – Friday 16 February 2018 page 6


How to find the norm of an m × n complex matrix A?

For any such matrix write

‖Ax‖² = (Ax) · (Ax) = (Ax)∗(Ax) = x∗A∗Ax.

Of course, A∗A is Hermitian and non-negative definite, and so we can see that its eigenvalues

λ₁, λ₂, ..., λₙ

are all non-negative. Let's say that λ₁ is the largest one.

A∗A is also normal, and so its eigenvectors v₁, ..., vₙ respectively form an orthonormal set.

MA212 – Lecture 34 – Friday 16 February 2018 page 7


Computation

We start with A∗Avᵢ = λᵢvᵢ, where λᵢ ≥ 0 and λ₁ is the largest, and {v₁, ..., vₙ} an orthonormal set. For any x ∈ Cⁿ, by using orthogonality of the vᵢ, we can write

x = α₁v₁ + α₂v₂ + ⋯ + αₙvₙ

so ‖x‖² = ⟨x, x⟩ = ⋯ = |α₁|² + |α₂|² + ⋯ + |αₙ|²,

since ‖x‖² = ⟨x, x⟩ = Σᵢⱼ αᵢᾱⱼ⟨vᵢ, vⱼ⟩ = Σᵢ αᵢᾱᵢ.

In words: the norm of x is equal to the norm of its coefficient vector in the base B = (v₁, ⋯, vₙ).

MA212 – Lecture 34 – Friday 16 February 2018 page 8


We also have, as λ₁ is the largest eigenvalue,

‖Ax‖² = x∗A∗Ax = |α₁|²λ₁ + |α₂|²λ₂ + ⋯ + |αₙ|²λₙ

      ≤ |α₁|²λ₁ + |α₂|²λ₁ + ⋯ + |αₙ|²λ₁

      = λ₁( |α₁|² + |α₂|² + ⋯ + |αₙ|² )

      = λ₁‖x‖²,

since ‖Ax‖² = ⟨Ax, Ax⟩ = x∗A∗Ax and

A∗Ax = A∗A(α₁v₁ + α₂v₂ + ⋯ + αₙvₙ) = λ₁α₁v₁ + λ₂α₂v₂ + ⋯ + λₙαₙvₙ.

MA212 – Lecture 34 – Friday 16 February 2018 page 9


From here

∴ ‖Ax‖²/‖x‖² ≤ λ₁,  with equality if x = α₁v₁

∴ ‖Ax‖/‖x‖ ≤ √λ₁,  with equality if x is a corresponding eigenvector of A∗A.

So, for any matrix A, its norm equals its largest singular value!

Warning
A square matrix A has both eigenvalues and singular values. Distinguish these carefully!
Denoting eigenvalues of A by λᴬᵢ, we see that

|λᴬ_max| ≤ ‖A‖ = √(λ^{A∗A}_max)
MA212 – Lecture 34 – Friday 16 February 2018 page 10
EXAMPLE

Using what we saw in Lecture 33:

If A = [ 2   i  0 ], its singular values are √6 and 2; therefore the norm of A is
       [ 0  -i  2 ]

‖A‖ = √6 (as this is larger than 2, of course).
Indeed, this means that the maximum value of

‖Ax‖/‖x‖

is √6, and this occurs when x = (−1, i, 1)ᵀ (an eigenvector of A∗A corresponding to its eigenvalue 6. We can, of course, take any non-zero scalar multiple of this!)

MA212 – Lecture 34 – Friday 16 February 2018 page 11


Check:

 
  −1  
2 i 0   −3 √
Ax =       ⇒ kA xk = 18
 i =
0 −i 2 3
1


and kxk = 3

kA xk 18 √
so = √ = 6.
kxk 3

MA212 – Lecture 34 – Friday 16 February 2018 page 12


The singular value decomposition of a matrix

Recall that if A is any m × n complex matrix, the matrices AA∗ and A∗A are normal and have the same non-zero eigenvalues.
Indeed, if these non-zero eigenvalues are λ₁, λ₂, ..., λ_k, then

A∗A (an n × n matrix) has an orthonormal set of eigenvectors v₁, ..., v_k ∈ Cⁿ

AA∗ (an m × m matrix) has an orthonormal set of eigenvectors u₁, ..., u_k ∈ Cᵐ

The two sets of eigenvectors are related by

uᵢ = (1/√λᵢ) Avᵢ   and   vᵢ = (1/√λᵢ) A∗uᵢ
MA212 – Lecture 34 – Friday 16 February 2018 page 13
We now derive a direct link of A to its singular values.

This will be the singular value decomposition of A or SVD for


short.
Let A be any m × n complex matrix.
As A∗ A is normal, its eigenvectors give rise to an orthonormal
basis of Cn . Suppose the orthonormal vectors are:

v1 , v2 , . . . , vk , vk+1 , vk+2 , . . . , vn

where v1 , . . . , vk are eigenvectors for the non-zero eigenvalues


λ1 , . . . , λk of A∗ A , and vk+1 , vk+2 , . . . , vn are eigenvectors for
the remaining eigenvalue of A∗ A , i.e., zero!

MA212 – Lecture 34 – Friday 16 February 2018 page 14


Cont’d

As these eigenvectors are orthonormal, writing V = (v1, v2, . . . , vn),

V∗ V = In = V V∗.

A statement equivalent to the rightmost equation is that

v1 v1∗ + v2 v2∗ + · · · + vn vn∗ = In   (a sum of n × n matrices).

For the above recall that V = (v1 , v2 , . . . , vn ) is unitary means


that V ∗ V = I = V V ∗ and V V ∗ = v1 v1∗ + v2 v2∗ + · · · + vn vn∗ .
MA212 – Lecture 34 – Friday 16 February 2018 page 15
Why?

Consider the case where n = 2. We have

(c1, c2) (α1, α2)⊤ = α1 c1 + α2 c2.

Now,

(c1, c2) [ α1 β1 ; α2 β2 ] = ( α1 c1 + α2 c2 , β1 c1 + β2 c2 )
                           = ( α1 c1 , β1 c1 ) + ( α2 c2 , β2 c2 )
                           = c1 (α1, β1) + c2 (α2, β2).

MA212 – Lecture 34 – Friday 16 February 2018 page 16


So,

(c1, c2) (r1 ; r2) = c1 r1 + c2 r2,

for the rows r1 = (α1, β1), r2 = (α2, β2).

MA212 – Lecture 34 – Friday 16 February 2018 page 17


So...

bearing in mind that A∗A is n × n and

A∗A vᵢ = λᵢ vᵢ, with λᵢ ≠ 0 for 1 ≤ i ≤ k, and λᵢ = 0 for k + 1 ≤ i ≤ n,

with uᵢ = (1/√λᵢ) A vᵢ the corresponding eigenvector for A A∗. This is only for 1 ≤ i ≤ k.

Indeed, for k + 1 ≤ i ≤ n, we have λᵢ = 0, so that

A∗A vᵢ = 0 ⇒ vᵢ∗ A∗A vᵢ = 0 ⇒ (A vᵢ)∗ (A vᵢ) = 0 ⇒ (A vᵢ) · (A vᵢ) = 0 ⇒ ||A vᵢ|| = 0 ⇒ A vᵢ = 0.

MA212 – Lecture 34 – Friday 16 February 2018 page 18
Conclusion: the SVD.

I = v1 v1∗ + v2 v2∗ + · · · + vk vk∗ + vk+1 vk+1∗ + · · · + vn vn∗

∴ A = A I = A v1 v1∗ + A v2 v2∗ + · · · + A vk vk∗ + A vk+1 vk+1∗ + · · · + A vn vn∗

∴ A = √λ1 u1 v1∗ + √λ2 u2 v2∗ + · · · + √λk uk vk∗   (as A vᵢ = √λᵢ uᵢ for i ≤ k, and A vᵢ = 0 for i > k)

This is the SVD of A !

MA212 – Lecture 34 – Friday 16 February 2018 page 19


An alternative SVD.

∴ A = √λ1 u1 v1∗ + √λ2 u2 v2∗ + · · · + √λk uk vk∗ = U D V∗

for U = (u1, · · · , uk)  (m × k),   V = (v1, · · · , vn)  (n × n),

and D the k × n matrix

D =
[ √λ1   0   · · ·   0    0  · · · ]
[  0   √λ2  · · ·   0    0  · · · ]
[            · · ·                ]
[  0    0   · · ·  √λk   0  · · · ]
This is also the SVD of A , but in a fancy format!

MA212 – Lecture 34 – Friday 16 February 2018 page 20


EXAMPLE

 
With A = [ 2 i 0 ; 0 −i 2 ] from the last lecture (Lecture 33) we had

A∗A with non-zero eigenvalues λ1 = 4 and λ2 = 6,

with orthonormal eigenvectors

v1 = (1/√2) (1, 0, 1)⊤,   v2 = (1/√3) (−1, i, 1)⊤,

and

u1 = (1/√2) (1, 1)⊤,   u2 = (1/√2) (−1, 1)⊤.

MA212 – Lecture 34 – Friday 16 February 2018 page 21


Check:

Therefore, the SVD of A is

A = √4 · (1/√2)(1, 1)⊤ · (1/√2)(1, 0, 1)  +  √6 · (1/√2)(−1, 1)⊤ · (1/√3)(−1, −i, 1)

i.e.

A = 2 [ 1/2  0  1/2 ; 1/2  0  1/2 ]  +  √6 [ 1/√6  i/√6  −1/√6 ; −1/√6  −i/√6  1/√6 ].

MA212 – Lecture 34 – Friday 16 February 2018 page 22


Check:

The RHS must add up to give A on the LHS!


     
RHS = [ 1 0 1 ; 1 0 1 ] + [ 1 i −1 ; −1 −i 1 ] = [ 2 i 0 ; 0 −i 2 ] = A = LHS

MA212 – Lecture 34 – Friday 16 February 2018 page 23
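Numerical aside (our sketch): NumPy's SVD returns exactly the pieces above, and summing the rank-one terms √λᵢ uᵢ vᵢ∗ rebuilds A.

import numpy as np

A = np.array([[2, 1j, 0],
              [0, -1j, 2]])

U, s, Vh = np.linalg.svd(A)     # s = singular values (descending); rows of Vh are the vi*
print(s**2)                     # approx [6, 4]: the non-zero eigenvalues of A*A

# Rank-one expansion: A = sum_i s_i u_i v_i*
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))  # True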


The spectral decomposition of a matrix

Actually, if A is a normal matrix, i.e., it is square, and


A∗ A = A A∗ , then we can do better than the SVD!
As A is normal, its eigenvectors v1 , v2 , . . . , vn form an
orthonormal basis of Cⁿ. We suppose that the corresponding
eigenvalues are λi , so that

A vi = λi vi ( and vi 6= 0).

So, as before

I = v1 v1∗ + v2 v2∗ + · · · + vn vn∗


∴ A = A I = A v1 v1∗ + A v2 v2∗ + · · · + A vn vn∗
∴ A = λ1 v1 v1∗ + λ2 v2 v2∗ + · · · + λn vn vn∗

MA212 – Lecture 34 – Friday 16 February 2018 page 24


So

A = λ1 v1 v1∗ + λ2 v2 v2∗ + · · · + λn vn vn∗

which is, in reality, just a fancy way of writing P D P ∗ = A.

In this case the norm of A is equal to the largest modulus of its


eigenvalues.

MA212 – Lecture 34 – Friday 16 February 2018 page 25


EXAMPLE

 
A = [ 3 2 ; 2 0 ]

Eigenvalues: λ1 = 4, λ2 = −1.
Eigenvectors: v1 = (1/√5) (2, 1)⊤,   v2 = (1/√5) (−1, 2)⊤.

∴ A = 4 · (1/√5)(2, 1)⊤ (1/√5)(2, 1)  +  (−1) · (1/√5)(−1, 2)⊤ (1/√5)(−1, 2)

∴ A = 4 [ 4/5  2/5 ; 2/5  1/5 ]  +  (−1) [ 1/5  −2/5 ; −2/5  4/5 ]
5

MA212 – Lecture 34 – Friday 16 February 2018 page 26
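Numerical aside (our sketch): for a symmetric A, np.linalg.eigh gives orthonormal eigenvectors, so the spectral decomposition above can be checked directly.

import numpy as np

A = np.array([[3., 2.],
              [2., 0.]])

lam, V = np.linalg.eigh(A)      # eigenvalues ascending: [-1, 4]; eigenvectors in columns
A_rebuilt = sum(lam[i] * np.outer(V[:, i], V[:, i]) for i in range(2))
print(np.allclose(A, A_rebuilt))                        # True
print(np.isclose(np.linalg.norm(A, 2), max(abs(lam))))  # norm = largest |eigenvalue|: True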


Before we move on ...

...to our next topic, we need to think a little more about vector spaces: specifically, real vector spaces (cont'd).
Sums and intersections
Definition. Suppose that U and W are two subspaces of a real
vector space V . We define the sum of U and W to be

U + W = {u + w : u ∈ U, w ∈ W },

and the intersection of U and W to be

U ∩ W = {x : x ∈ U and x ∈ W }.

Usefully, both of the sets of vectors we have just defined are


subspaces!

MA212 – Lecture 34 – Friday 16 February 2018 page 27


A Theorem

Theorem. If U and W are subspaces of a vector space


V , then both U + W and U ∩ W are subspaces of V .

Proof. For U + W we have

0 ∈ U + W as 0 = 0 + 0 and so U + W 6= ∅ .
If x1 , x2 ∈ U + W we have x1 = u1 + w1 and x2 = u2 + w2
with ui ∈ U, wi ∈ W , so that

x1 + x2 = (u1 + w1 ) + (u2 + w2 ) = (u1 + u2 ) + (w1 + w2 )

⇒ x1 + x2 ∈ U + W.

MA212 – Lecture 34 – Friday 16 February 2018 page 28


Cont’d

For α ∈ R , x ∈ U + W with x = u + w and u ∈ U , w ∈ W :

αx = α(u + w) = αu + αw ⇒ αx ∈ U + W

For U ∩ W we have

0 ∈ U ∩ W (as 0 ∈ U and 0 ∈ W ) and so U ∩ W 6= ∅ .


If x1 , x2 ∈ U ∩ W we have x1 ∈ U and x2 ∈ U so that
x1 + x2 ∈ U . And we have x1 ∈ W and x2 ∈ W so that
x1 + x2 ∈ W . Which means that x1 + x2 ∈ U ∩ W .
If α ∈ R and x ∈ U ∩ W , we have x ∈ U so that αx ∈ U .
And we have x ∈ W so that αx ∈ W . Which means
αx ∈ U ∩ W .

An actual calculation will follow after these illustrations...


MA212 – Lecture 34 – Friday 16 February 2018 page 29
Illustrations

In the middle diagram the line is in the plane of U.

[Figure: three sketches. (1) U and W lines through 0: U ∩ W a point, U + W a plane. (2) U a plane containing the line W: U ∩ W the line W, U + W the plane U. (3) U and W planes: U ∩ W a line.]

MA212 – Lecture 34 – Friday 16 February 2018 page 30
EXAMPLE

Suppose we have the subspaces of R⁴ given by

U = { (x, y, 0, 0)⊤ : x, y ∈ R } = Lin{ (1, 0, 0, 0)⊤, (0, 1, 0, 0)⊤ }

and

W = { (a, a, b, b)⊤ : a, b ∈ R } = Lin{ (1, 1, 0, 0)⊤, (0, 0, 1, 1)⊤ }.

MA212 – Lecture 34 – Friday 16 February 2018 page 31


Each of these has dimension 2.

Here, any vector x ∈ U + W can be written as

x = (x, y, 0, 0)⊤ + (a, a, b, b)⊤ = (x + a)(1, 0, 0, 0)⊤ + (y + a)(0, 1, 0, 0)⊤ + b(0, 0, 1, 1)⊤,

and so

x ∈ Lin{ (1, 0, 0, 0)⊤, (0, 1, 0, 0)⊤, (0, 0, 1, 1)⊤ }.

MA212 – Lecture 34 – Friday 16 February 2018 page 32


Conversely,

any vector in the linear span Lin above can be written as

x = α(1, 0, 0, 0)⊤ + β(0, 1, 0, 0)⊤ + γ(0, 0, 1, 1)⊤ = (α, β, 0, 0)⊤ + (0, 0, γ, γ)⊤ ∈ U + W.

MA212 – Lecture 34 – Friday 16 February 2018 page 33


Consequently,

U + W = Lin{ (1, 0, 0, 0)⊤, (0, 1, 0, 0)⊤, (0, 0, 1, 1)⊤ },

so dim(U + W) = 3.

MA212 – Lecture 34 – Friday 16 February 2018 page 34


Also,

any vector x ∈ U ∩ W must be such that

x ∈ U ⇒ x = (x, y, 0, 0)⊤,
x ∈ W ⇒ x = (a, a, b, b)⊤,

⇒ (x, y, 0, 0)⊤ = (a, a, b, b)⊤ ⇒ x = a = y and b = 0.
MA212 – Lecture 34 – Friday 16 February 2018 page 35


So

x = (a, a, 0, 0)⊤ ∈ Lin{ (1, 1, 0, 0)⊤ }.

MA212 – Lecture 34 – Friday 16 February 2018 page 36


Conversely,

... any vector x ∈ Lin{ (1, 1, 0, 0)⊤ } can be seen as

x = (a, a, 0, 0)⊤ ∈ U   and   x = (a, a, 0, 0)⊤ ∈ W,

and so x ∈ U ∩ W.

MA212 – Lecture 34 – Friday 16 February 2018 page 37


Consequently,

U ∩ W = Lin{ (1, 1, 0, 0)⊤ }.  So dim(U ∩ W) = 1.

Indeed, we observe that

dim(U + W) = dim(U) + dim(W) − dim(U ∩ W),

and this result holds more generally...

MA212 – Lecture 34 – Friday 16 February 2018 page 38
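Numerical aside (our sketch): the dimension formula can be checked with ranks, since dim(U + W) is the rank of the matrix whose columns are the two bases side by side.

import numpy as np

U = np.array([[1, 0, 0, 0], [0, 1, 0, 0]]).T   # basis of U as columns
W = np.array([[1, 1, 0, 0], [0, 0, 1, 1]]).T   # basis of W as columns

dim_U = np.linalg.matrix_rank(U)
dim_W = np.linalg.matrix_rank(W)
dim_sum = np.linalg.matrix_rank(np.hstack([U, W]))  # dim(U + W)
print(dim_U, dim_W, dim_sum)                        # 2 2 3
print(dim_U + dim_W - dim_sum)                      # 1 = dim(U ∩ W)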


MA212 Further Mathematical Methods Lecture LA 15

Lecture 35: Projections = Direct sums

Direct sums
Orthogonal complements
Orthogonal projections
A duality result:
R(A⊤ ) = N (A)⊥
Background readings

Anthony & Harvey


Chapter 12 (§12.1 & §12.5)

Adam Ostaszewski
Chapter 4 (§4.1 & §4.6)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 2


The lecture’s main purpose:

[Figure: a vector x sent to P x, lying in the subspace U.]

1. To recognize projection matrices


2. To write down the orthogonal projection matrix P onto U
given a basis for U .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 3


Recap 1

Concepts needed - I:

U1 + U2 = {u1 + u2 | u1 ∈ U1 , and u2 ∈ U2 }

vector sum of two subspaces, as illustrated:

[Figure: u1 ∈ U1, u2 ∈ U2, and their parallelogram sum u1 + u2.]
MA212 – Lecture 35 – Tuesday 20 February 2018 page 4
Recap 2

Below we shall use a simple idea:


For:
U a subspace of V, and

{u1, . . . , ur} a basis of U

We can extend to a base for V :

{u1 , . . . , ur , ur+1 , . . . , ur+s }

The above assumes V is finite-dimensional.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 5


Direct sum

We write
V = U1 ⊕ U2

for subspaces U1 , U2 of V if each v ∈ V has a unique


representation

v = u1 + u2 with u1 ∈ U1 , u2 ∈ U2

MA212 – Lecture 35 – Tuesday 20 February 2018 page 6


EXAMPLE:

V = R3
Suppose v1 , v2 , v3 are linearly independent
Take

U1 = Lin({v1, v2}),
U2 = Lin({v3}).

For any v, there is a linear combination

v = (α1 v1 + α2 v2) + (α3 v3),   the first bracket in U1, the second in U2,

for unique scalars α1 , α2 , α3 .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 7


A Theorem

Theorem.

V = U1 ⊕ U2 (Direct sum)
⇐⇒
(i) V = U1 + U2 (sum)
(ii) U1 ∩ U2 = {0}

MA212 – Lecture 35 – Tuesday 20 February 2018 page 8


Proof:

( ⇒ ) Suppose V = U1 ⊕ U2 .

(i) For any v ∈ V

v = u1 + u2 ∈ U1 + U2 .

(ii) If u ∈ U1 ∩ U2 , then

u = u + 0   (u ∈ U1, 0 ∈ U2)
  = 0 + u   (0 ∈ U1, u ∈ U2)

Two representations? Must be the same: so u = 0 .


So U1 ∩ U2 = {0}
MA212 – Lecture 35 – Tuesday 20 February 2018 page 9
Cont’d

( ⇐ ) Suppose that (i) and (ii) hold.


Suppose v ∈ V . By (i) we can write
v = u1 + u2 (as V = U1 + U2 ).
Suppose also v = u′1 + u′2

Then u1 + u2 = u′1 + u′2, so

u1 − u′1 = u′2 − u2,

with the left side in U1 and the right side in U2;

so u1 − u′1 = u′2 − u2 ∈ U1 ∩ U2 = {0}.

So u1 − u′1 = u′2 − u2 = 0 , i.e. u1 = u′1 and u2 = u′2 .


MA212 – Lecture 35 – Tuesday 20 February 2018 page 10
A Corollary

Corollary.
If V = U1 ⊕ U2 ,
then
dim(V ) = dim(U1 ) + dim(U2 ).

Proof: Here
dim(U1 ∩ U2 ) = dim({0}) = 0.

So

dim(V) = dim(U1 + U2) = dim(U1) + dim(U2) − dim(U1 ∩ U2) = dim(U1) + dim(U2),

the middle term vanishing since dim(U1 ∩ U2) = 0.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 11


Complements

Definition. Given a vector space V and a subspace U, a subspace W is a complement (in V) of U if

V = U ⊕ W.

Task. Finding a complement.
Solution. First we find a base for U, say it is

{u1, . . . , ur}.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 12


Next

Extend to a base for V

{u1, . . . , ur, ur+1, . . . , ur+s}.


 
Take W = Lin {ur+1 , . . . , ur+s }
Here we assume V is finite dimensional and

dim(V ) = r + s, s > 0.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 13


Comment

V = R2 , U a line through the origin.

[Figure: two distinct lines U and W through the origin 0.]

Any line W through the origin (distinct from U ) is a complement


of U

as U + W = R2
and U ∩ W = {0}

MA212 – Lecture 35 – Tuesday 20 February 2018 page 14


Comment cont’d

SPECIAL CASE U a line as before ....


If W is the line orthogonal to U , then we say that W is the
orthogonal complement.

('the' because it is unique).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 15


Definition in an inner product space

Given a vector subspace U of V the orthogonal complement of U


denoted U ⊥ is

U ⊥ = {v ∈ V : v ⊥ u for all u ∈ U }

where v ⊥ u means hv, ui = 0 .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 16


So

v ∈ U⊥
⟺
⟨u, v⟩ = 0 for all u ∈ U.

U⊥ = ⋂_{u ∈ U} { v ∈ V : ⟨u, v⟩ = 0 },

an intersection of hyperplanes passing through the origin.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 17


A Theorem

Theorem.For V a finite-dimensional inner product space and


U a subspace of V ,
(i) U⊥ is a subspace,
(ii) V = U ⊕ U⊥ (i.e. U⊥ is a complement),
(iii) (U⊥)⊥ = U.

Proof.
(i) Exercise for you!
(Intersection of hyperplanes through the origin?)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 18


(ii)

Take an orthonormal basis

{u1 , . . . , ur }

for U .
Extend to an orthonormal basis of V

{u1, . . . , ur, ur+1, . . . , un}.

For v ∈ V write

v = (α1 u1 + . . . + αr ur) + (αr+1 ur+1 + . . . + αn un),
       (first group ∈ U)        (second group ∈ U⊥)

the second group lying in U⊥ because the basis vectors are mutually orthogonal.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 19
Cont’d

So V = U + U ⊥ .
Also
u ∈ U ∩ U⊥  ⇒  ⟨u, u⟩ = 0 (as u ∈ U and u ∈ U⊥)  ⇒  u = 0,

so U ∩ U ⊥ = {0}

MA212 – Lecture 35 – Tuesday 20 February 2018 page 20



(iii) For this, note that U ⊆ (U⊥)⊥.

(Query: why?... see next slide.) So all we need is to check that

dim(U) = dim((U⊥)⊥).

But, by (ii), both

dim(V) = dim(U) + dim(U⊥)

and

dim(V) = dim(U⊥) + dim((U⊥)⊥).

Subtracting the equations,

0 = dim(U) − dim((U⊥)⊥).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 21
Query answered

U ⊆ (U⊥)⊥

Suppose u ∈ U and w ∈ U ⊥ ; we are to prove that u ⊥ w . But


w ∈ U ⊥ i.e. w ⊥ u , so

hu, wi = hw, ui = 0

MA212 – Lecture 35 – Tuesday 20 February 2018 page 22


Illustration with U a line or plane in R3

[Figure: V = R³. First U a line with W = U⊥ a plane; then U a plane with W = U⊥ a line.]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 23
Application: a Duality result

 
Take A = [a, b]  (1 × 2), so A⊤ = (a, b)⊤  (2 × 1). Notice that, first,

N(A) = { (x, y)⊤ : ax + by = 0 } = { (x, y)⊤ : ⟨(x, y)⊤, (a, b)⊤⟩ = 0 } = Lin{ (a, b)⊤ }⊥

and, second, for t a scalar,

v = t (a, b)⊤ ∈ R(A⊤).
MA212 – Lecture 35 – Tuesday 20 February 2018 page 24
Here’s something important

But

0 = ⟨ (a, b)⊤, (x, y)⊤ ⟩

says that

(a, b)⊤ ⊥ (x, y)⊤.

So

R(A⊤) = N(A)⊥.

True for any matrix! (Not just 1 × 2.)

Equivalently:

R(A)⊥ = N(A⊤)

(since (U⊥)⊥ = U).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 25


Projection

If V is a vector space, a linear transformation T : V → V is a projection if

T(T(v)) = T(v) for v ∈ V,

i.e.

T² = T.

When the second power yields just the same as the first, one says "idempotent"!

MA212 – Lecture 35 – Tuesday 20 February 2018 page 26


Theorem

Theorem. If T : V → V is a projection, then

V = R(T ) ⊕ N (T )

In words: the kernel/nullspace is a complement of the range.

[Figure: v decomposed as T v ∈ R(T) plus (I − T)v ∈ N(T).]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 27


Comment and Proof

Indeed the Theorem says the picture correctly shows v as the


parallelogram sum of T v and v − T v :

v = T v + (v − T v) = T v + (I − T) v,   with T v ∈ R(T) and (I − T) v ∈ N(T).

To see the latter inclusion, notice that

T((I − T)v) = (T − T²)(v) = 0,   as T = T².

MA212 – Lecture 35 – Tuesday 20 February 2018 page 28


Proof Proper

For v ∈ V

v=Iv (identity)
= T v + (I − T )v

Here, T v ∈ R(T) (clear) and (I − T)v ∈ N(T). Why? Well,

T((I − T)v) = T v − T(T(v)) = T v − T² v = T v − T v = 0.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 29


Cont’d

So V = R(T ) + N (T ) . So far only ‘vector sum’ is proved,


that is, (i) .
If v ∈ R(T ) ∩ N (T ) then:

T v = 0,

and

v = T w, say (for some w), so T v = T (T w) = T w = v.

So v = 0 , as T v = 0 , proving (ii) .
So we conclude V = R(T ) ⊕ N (T ) .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 30


Converse: Direct sums give projections

If V =U ⊕W

Then v ∈ V may be written uniquely as

v =u+w with u ∈ U, w ∈ W.

Consider the map taking v to the unique u as above, i.e.


v 7→ u =: T (v). Then w = (I − T )v

As u = u + 0, we get T u = u for u ∈ U, and so T² = T.

Here we say that T maps onto U ‘parallel’ to W


MA212 – Lecture 35 – Tuesday 20 February 2018 page 31
EXAMPLE/QUESTION:

Suppose

V = R³,   U = Lin{ v1, v2 }   with v1 = (1, 0, 1)⊤, v2 = (0, 1, −1)⊤,

W = Lin{ v3 }   with v3 = (2, 1, 0)⊤.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 32


What matrix represents...

... the projection T onto U parallel to W ?


ANSWER
Here V = U ⊕ W ... why?
Answer: v1, v2, v3 are three linearly independent vectors.
Now by definition of projection:
T v1 = v1 , T v2 = v2 , T v3 = 0 so T has these three as
eigenvectors with eigenvalues λ1 = 1 , λ2 = 1 , λ3 = 0
So relative to (v1, v2, v3), T is represented by

D = diag(1, 1, 0).
MA212 – Lecture 35 – Tuesday 20 February 2018 page 33
Change of base now

Change to the basis (v1 , v2 , v3 ) with base change matrix:

S = (v1, v2, v3) =
[ 1  0  2 ]
[ 0  1  1 ]
[ 1  −1  0 ]

Suppose P represents T relative to E = (e1 , e2 , e3 ) ; then


(y)E = P (x)E . Recalling (x)E = S(x)S , we see that
S(y)S = P S(x)S and so

S −1 P S = D, so that P = S D S −1 .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 34


Calculation (check!)

 
S⁻¹ =
[ −1  2  2 ]
[ −1  2  1 ]
[ 1  −1  −1 ]

Now

D = diag(1, 1, 0) = (e1, e2, 0).

So SD = (Se1, Se2, S0).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 35


Cont’d

P = S D S⁻¹

  = [ 1 0 2 ; 0 1 1 ; 1 −1 0 ] D S⁻¹   (recall SD = (Se1, Se2, S0))

  = [ 1 0 0 ; 0 1 0 ; 1 −1 0 ] [ −1 2 2 ; −1 2 1 ; 1 −1 −1 ]   (use row operations!)

  = [ −1 2 2 ; −1 2 1 ; 0 0 1 ]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 36
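Numerical aside (our sketch): the computation P = S D S⁻¹ above, checked in NumPy, together with the defining properties of this projection.

import numpy as np

S = np.array([[1, 0, 2],
              [0, 1, 1],
              [1, -1, 0]], dtype=float)    # columns v1, v2, v3
D = np.diag([1., 1., 0.])

P = S @ D @ np.linalg.inv(S)
print(P)                              # [[-1, 2, 2], [-1, 2, 1], [0, 0, 1]]
print(np.allclose(P @ P, P))          # idempotent: True
print(np.allclose(P @ S[:, 2], 0))    # kills v3, i.e. projects parallel to W: True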
Orthogonal Projection onto a subspace U

This is taken to mean ‘parallel to U ⊥ ’, so V = U ⊕ U ⊥ .


We show 2 methods for finding the corresponding P .
Note a characteristic property

Theorem.
P is an orthogonal projection ⇐⇒ P 2 = P and P ⊤ = P .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 37


Proof (⇒)

As

x = I x = P x + (I − P) x.

For P an orthogonal projection we have, for all x and y, ⟨P x, (I − P) y⟩ = 0 (as P x ∈ U and (I − P) y ∈ U⊥), i.e. (P x)⊤ (I − P) y = 0:

0 = x⊤ P⊤ (I − P) y = x⊤ (P⊤ − P⊤P) y

MA212 – Lecture 35 – Tuesday 20 February 2018 page 38


...so..

P⊤ = P⊤P;

transposing,

P = (P⊤P)⊤ = P⊤P⊤⊤ = P⊤P = P⊤.

So P = P⊤: symmetric.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 39


Conversely: (⇐)

Suppose P² = P and P⊤ = P;

then P⊤ = P = P P = P⊤P.

Then, as before,

⟨P x, (I − P) y⟩ = x⊤ (P⊤ − P⊤P) y = 0.

So P is orthogonal!

MA212 – Lecture 35 – Tuesday 20 February 2018 page 40


Recognizing a projection: AN EXAMPLE

 
1 1 2
A=
5 2 4
  
2 1 1 2 1 2
A =
25 2 4 2 4
 
1  5 10
= =A
25 10 20

and also A⊤ = A.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 41


So...

So... A projects orthogonally onto

R(A) = column space of A = Lin{ (1, 2)⊤ }.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 42


EXAMPLE

 
Let V = R³, v1 = (2, 1, 1)⊤ and U = Lin{v1}.

Then U⊥ is spanned by: v2 = (1, −2, 0)⊤, v3 = (0, 1, −1)⊤.

(By inspection! Here v2 and v3 are lin. indep. and both v2 ⊥ v1


and v3 ⊥ v1 .)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 43


Problem

Find P representing the orthogonal projection onto U .

Let T = orthogonal projection onto U . This is easy to represent


using the basis (v1, v2, v3), so put

R = (v1, v2, v3) =
[ 2  1  0 ]
[ 1  −2  1 ]
[ 1  0  −1 ]

Then
T v1 = v1 , T v2 = 0, T v3 = 0 so...

recalling (x)E = R(x)R , we translate (y)R = T (x)R to


(y)E = R(y)R = RT R−1 (x)E

MA212 – Lecture 35 – Tuesday 20 February 2018 page 44


So...

 
P = R diag(1, 0, 0) R⁻¹

  = [ 2 0 0 ; 1 0 0 ; 1 0 0 ] · (1/6) [ 2 1 1 ; 2 −2 −2 ; 2 1 −5 ]

  = (1/6) [ 4 2 2 ; 2 1 1 ; 2 1 1 ]

P⊤ = P here, of course.
MA212 – Lecture 35 – Tuesday 20 February 2018 page 45
To span U ⊥ ...(if not by inspection)

   
x ∈ U⊥  ⟺  (2, 1, 1)⊤ · (x1, x2, x3)⊤ = 0,  i.e.

2x1 + x2 + x3 = 0;

put x1 = t, x3 = u, so x2 = −u − 2t.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 46


Then

     
t 1 0
     
x = −u − 2 t = t −2 +u 
    −1
 

u+ 0 1
| {z } | {z }
v2 −v3

MA212 – Lecture 35 – Tuesday 20 February 2018 page 47


If not by inspection...cont’d

Alternative approach, knowing U ⊥ .


The two vectors

u1 = (1/√6) (2, 1, 1)⊤   (as given)

u2 = (1/√2) (0, 1, −1)⊤   (got by taking t = 0 above)

are orthogonal (u1 ⊥ u2).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 48


Apply Gram-Schmidt to...

 
to u := (1, −2, 0)⊤ (got by taking u = 0 above):
Noting that u ⊥ u1,

w = (1, −2, 0)⊤ − (1/√2)⟨(1, −2, 0)⊤, (0, 1, −1)⊤⟩ (1/√2)(0, 1, −1)⊤ − 0

  = (1, −2, 0)⊤ + (0, 1, −1)⊤

MA212 – Lecture 35 – Tuesday 20 February 2018 page 49


which is ..

 
= (1, −1, −1)⊤;

say u3 = (1/√3) (−1, 1, 1)⊤ = (1/√3)(−w).

Now continue as before, using the basis (u1 , u2 , u3 ) .

MA212 – Lecture 35 – Tuesday 20 February 2018 page 50


A projection formula to remember...

For A (n × m):

A (A⊤A)⁻¹ A⊤,   where A⊤A is m × m (an m×n times n×m product).
Why such a ‘magical’ (‘out of nowhere’) formula?
We’ll see in a later lecture!

MA212 – Lecture 35 – Tuesday 20 February 2018 page 51


Formula for representing orthogonal projection...

V = Rⁿ (reals!)
U = Lin({v1, . . . , vm}),   the vᵢ linearly independent.

Put
A := [v1 | . . . | vm ]n×m

Consider
P = A(A⊤ A)−1 A⊤

MA212 – Lecture 35 – Tuesday 20 February 2018 page 52


Need to check: for this choice of A and P

A⊤ A invertible
P2 = P
P ⊤ = P (obvious)
R(P ) = U = R(A)

MA212 – Lecture 35 – Tuesday 20 February 2018 page 53


A⊤ A invertible

Find N (A⊤ A) :

A⊤ A x = 0 ⇒ x⊤ A⊤ A x = 0
⇒ hA x, A xi = 0
⇒ Ax = 0

so x = 0 as v1 , ..., vm are linearly independent. So A⊤ A is of


full rank and so invertible.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 54


Idempotency and symmetry

P is idempotent: P² = P.

A(A⊤A)⁻¹A⊤ · A(A⊤A)⁻¹A⊤ = A(A⊤A)⁻¹ I A⊤ = P.

P is symmetric: P⊤ = P.

( A(A⊤A)⁻¹A⊤ )⊤ = A⊤⊤ (A⊤A⊤⊤)⁻¹ A⊤ = A(A⊤A)⁻¹A⊤.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 55


R(P ) = R(A)

P x = A [ (A⊤A)⁻¹ A⊤ x ] ∈ R(A),

i.e. R(P) ⊆ R(A).

MA212 – Lecture 35 – Tuesday 20 February 2018 page 56


Converse? ... i.e. R(A) ⊆ R(P ).

we ask:

is A x = P applied to some vector?

Trick of insertion!

A x = A (A⊤A)⁻¹ (A⊤A) x = A(A⊤A)⁻¹A⊤ (A x) = P (A x) ∈ R(P).

So indeed
R(A) ⊆ R(P )

MA212 – Lecture 35 – Tuesday 20 February 2018 page 57


EXAMPLE

V = R³. Use the formula to find the projection orthogonally onto

Lin{ (1, −2, 0)⊤, (0, 1, −1)⊤ }.

MA212 – Lecture 35 – Tuesday 20 February 2018 page 58


Solution

Put

A =
[ 1  0 ]
[ −2  1 ]
[ 0  −1 ]   (3 × 2).

A⊤A = [ 1 −2 0 ; 0 1 −1 ] A = [ 5 −2 ; −2 2 ]   (2 × 2)

(now swap and sign-switch, dividing by det = 6):

(A⊤A)⁻¹ = (1/6) [ 2 2 ; 2 5 ]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 59
Using P = A(A⊤A)⁻¹A⊤

 
P = [ 1 0 ; −2 1 ; 0 −1 ] · (1/6) [ 2 2 ; 2 5 ] · [ 1 −2 0 ; 0 1 −1 ]

  = (1/6) [ 2 2 ; −2 1 ; −2 −5 ] [ 1 −2 0 ; 0 1 −1 ]

  = (1/6) [ 2 −2 −2 ; −2 5 −1 ; −2 −1 5 ]

MA212 – Lecture 35 – Tuesday 20 February 2018 page 60
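Numerical aside (our sketch): the formula P = A(A⊤A)⁻¹A⊤, applied to the A of this example.

import numpy as np

A = np.array([[1, 0],
              [-2, 1],
              [0, -1]], dtype=float)

P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.round(6 * P))                              # [[2,-2,-2],[-2,5,-1],[-2,-1,5]]
print(np.allclose(P @ P, P), np.allclose(P.T, P))   # idempotent and symmetric: True True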


MA212 Further Mathematical Methods Lecture LA 16

Lecture 36: Best-fits using Projections

Understanding PA = A(AT A)−1 AT


A more general PB = A(BA)−1 B
Best-fitting line through data points
Background Readings

Anthony and Harvey


Chap. 12 (§12.6-12.7)

Adam Ostaszewski
Chap. 4 (§4.7)

MA212 – Lecture 36 – Friday 23 February 2018 page 2


Understanding the aims...

An×m = [v1 , ..., vm ]

with v1 , ..., vm ∈ Rn
i.e. here A is built from columns in Rn

A(AT A)−1 AT

with AT A being m × m

Warning In this lecture always check the sizing of A .


As above, it may vary from the traditional m × n .
(The latter corresponds to A : Rn → Rm .)

MA212 – Lecture 36 – Friday 23 February 2018 page 3


Understanding Projections

We’ll look carefully at the formula


P = A(AT A)−1 AT
(AT A) is called a Gram matrix
and is very like a covariance matrix.

L = (AT A)−1 AT is best remembered by the formula

LA = I,   since (A⊤A)⁻¹A⊤ · A = I,

so L stands for ‘Left of A ’

MA212 – Lecture 36 – Friday 23 February 2018 page 4


The Gram matrix ... like covariance

If A (n × k) = [v1, ..., vk], then

A⊤A = [ v1⊤ ; . . . ; vk⊤ ] [v1, ..., vk].

The ij element is ⟨vi, vj⟩.

Hence AT A is like a covariance matrix, if we regard v1 , ..., vk as


‘random variables’.

MA212 – Lecture 36 – Friday 23 February 2018 page 5


More importantly:

If An×k = [v1 , ..., vk ] has rank k then AT A has rank k and is


square of size k × k
[why?]
and then L = (AT A)−1 AT exists and gives

LA = I

So L is a left inverse for A.

MA212 – Lecture 36 – Friday 23 February 2018 page 6


Comments

If An×k = [v1 , ..., vk ] has a left inverse L , i.e. with


Lk×n An×k = Ik×k , then the rank of A is k :
to see this check the nullity:

Ax = 0 =⇒ x =Ix = LAx = 0

so N (A) = {0} i.e. nullity (A) = 0 , and so rank (A) = k, as


dim-dom = k.

MA212 – Lecture 36 – Friday 23 February 2018 page 7


Conclusion: a criterion

An×k = [v1 , ..., vk ] has a left inverse ⇔ k = rank(A).

MA212 – Lecture 36 – Friday 23 February 2018 page 8


Comments

1. So P = A(AT A)−1 AT has the property that

P = AL and LA = I.

Of course

(AL)(AL) = A (LA) L = AL.

So P is a projection; it is orthogonal because

P T = [A(AT A)−1 AT ]T = AT T ((AT A)T )−1 AT = A(AT A)−1 AT = P.

MA212 – Lecture 36 – Friday 23 February 2018 page 9


Cont’d

2. If there is a right inverse R , i.e. with

AR = I

then

(RA)(RA) = R (AR) A = RA.

So P = RA is also a projection.
Then
RT AT = I

so RT is a left inverse for AT and so rank ((AT )n×m ) = m


(Here A is m × n .)

MA212 – Lecture 36 – Friday 23 February 2018 page 10


Conclusion

Am×n has a right inverse iff rank(A) = m.

In summary Al×r
has a left inverse if rank = r
has a right inverse if rank = l

MA212 – Lecture 36 – Friday 23 February 2018 page 11


Another Formula

From An×m = [v1 , ..., vm ] yielding

(AT A)m×m and A(AT A)−1 AT

To
PB := A(Bm×n An×m )−1 B

(Here B is the same size as AT .)


Here

PB2 = A(BA)−1 B.A(BA)−1 B = A(BA)−1 B = PB ,

so again we get a projection, though perhaps not an orthogonal


one.

MA212 – Lecture 36 – Friday 23 February 2018 page 12


The generalization: its aim

Suppose that
Rn = U ⊕ W

with U = Lin{v1 , ..., vm } using a basis of lin. indep. vectors for


U. Then dim W = n − m .

[Figure: Rⁿ split as U ⊕ W, with U of dimension m and W of dimension n − m.]

The aim is to have U = R(A) and W = N (B) and for PB to


project parallel to W onto U .

MA212 – Lecture 36 – Friday 23 February 2018 page 13


Want W = N (B) = {x : Bx = 0}

So B provides a bunch of equations via its rows, assumed lin.


indep. (to avoid redundancy). How many rows? We’ll see it must
be m . Denoting the normal vectors n1 , ..., nm we get:

⟨n1, x⟩ = n1⊤ x = 0
  ...
⟨nm, x⟩ = nm⊤ x = 0

[Figure: x lying in the hyperplane through 0 with normal nᵢ.]

MA212 – Lecture 36 – Friday 23 February 2018 page 14


So..

So A, B take the format:

U = R([v1, ..., vm]) = R(A (n×m))   (lin. indep. cols!)

W = N( [ n1⊤ ; . . . ; nm⊤ ] ) = N(B (m×n)).

MA212 – Lecture 36 – Friday 23 February 2018 page 15


How to get B?

W = N (Bm×n ) (Wanted)
W ⊥ = N (Bm×n )⊥ (Equivalently)
= R((Bm×n )T ) (Duality: R(B T ) = N (B)⊥ )
= column space of B T

Task now is: Find columns spanning W ⊥ .


Example to follow in a moment.

MA212 – Lecture 36 – Friday 23 February 2018 page 16


Why m rows in B ? (Recall A has m lin. indep. columns)

Rⁿ = U ⊕ W = R(A) ⊕ N(B)

rank(A) = m ⇔ nullity(B) = n − m ⇔ rank(B) = m.

Put Pn×n = An×m (Bm×n An×m )−1 Bm×n

Then P projects onto U parallel to W. Why?

It projects because P² = A(BA)⁻¹ (BA) (BA)⁻¹ B = A(BA)⁻¹B = P;

onto: R(P) = R(A(BA)⁻¹B) ⊆ R(A),

and R(A) ⊆ R(P) because PA = A(BA)⁻¹BA = A, so A x = P A x ∈ R(P).
MA212 – Lecture 36 – Friday 23 February 2018 page 17
Cont’d: ... and parallel to W

Bx = 0 =⇒ P x = A(BA)−1 .Bx = 0 so N (B) ⊆ N (P )

To improve the inclusion to equality we compute the dimensions


of each side:

n − m = dim N(B) ≤ dim N(P) = n − rank(P) = n − m.

So same dimension, so

W = N (B) = N (P ).

MA212 – Lecture 36 – Friday 23 February 2018 page 18


Example

    
 
  1 
  1
 
U = Lin  0  so A = 
  0
 


 

 
1 1
   
 
 1   0 
 
W = Lin  1 , 1 
   

 
 
0 3

MA212 – Lecture 36 – Friday 23 February 2018 page 19


We find W ⊥ = N (Bm×n )⊥

x ∈ W⊥ ⟺

(1, 1, 0)⊤ · x = 0, i.e. x1 + x2 = 0, and

(0, 1, 3)⊤ · x = 0, i.e. x2 + 3x3 = 0;

x = (3x3, −3x3, x3)⊤ = x3 (3, −3, 1)⊤.

MA212 – Lecture 36 – Friday 23 February 2018 page 20
Example cont’d

   
 
  3 
  
3

W ⊥
= Lin  −3  = R 
 

 −3 



 

 
1 1
 ⊥  T 
3  3
   
∴W = R  −3  = N    
−3   = N ([3, −3, 1])
   
1 1

by duality.

MA212 – Lecture 36 – Friday 23 February 2018 page 21


Example concluded

 
B A = [3, −3, 1] (1, 0, 1)⊤ = (4)   (1 × 1),

(BA)⁻¹ = 1/4,

P = A (BA)⁻¹ B = (1, 0, 1)⊤ (1/4) [3, −3, 1] = ...

MA212 – Lecture 36 – Friday 23 February 2018 page 22


...again

   
P = (1/4) (1, 0, 1)⊤ [3, −3, 1] = (1/4)
[ 3  −3  1 ]
[ 0  0  0 ]
[ 3  −3  1 ]

MA212 – Lecture 36 – Friday 23 February 2018 page 23
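Numerical aside (our sketch): the oblique projection P = A(BA)⁻¹B of this example, with U = R(A) and W = N(B).

import numpy as np

A = np.array([[1.], [0.], [1.]])     # column spanning U
B = np.array([[3., -3., 1.]])        # row whose null space is W

P = A @ np.linalg.inv(B @ A) @ B
print(4 * P)                         # [[3,-3,1],[0,0,0],[3,-3,1]]
print(np.allclose(P @ P, P))         # a projection: True
w = np.array([1., 1., 0.])           # a vector of W
print(np.allclose(P @ w, 0))         # projected parallel to W: True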


Take ... a look at BAx = B(Ax)

U = Lin{u1 , ..., um } = R([u1 , ..., um ]) = R(An×m ) ⊆ Rn


and define T : U → Rm by

T (u) = Bm×n u for u ∈ U ⊆ Rn

(Bear in mind that any u in U is of the form u = Ax .)


[Diagram: T defined on U = R(Ã) = R(A).]

N(T) = N(B) ∩ U = W ∩ U = {0}, because Rⁿ = U ⊕ W.

So rank(T) = rank(T) + nullity(T) = dim(U) = m,   since nullity(T) = 0.
MA212 – Lecture 36 – Friday 23 February 2018 page 24
We inspect...

... the range of T. We see that y ∈R(BA) implies y =BAx for


some x, so y =B(Ax) , and
Ax =[u1 , ..., um ]x ∈ Lin{u1 , ..., um } = U so y ∈R(T ) , i.e.

R(BA) ⊆ R(T ).

Now for the converse inclusion ...

MA212 – Lecture 36 – Friday 23 February 2018 page 25


But, conversely for u ∈ U

Given u, find x with u = A x; then

T (u) = T (Ax) for some x ∈ Rm


= B.Ax
= B(Ax) ∈R(BA)

So
R(BA) = R(T )

rank(BA) + nullity(T ) = m,
rank(BA) + 0 = m.

So (BA)m×m has rank m, so has an inverse.

MA212 – Lecture 36 – Friday 23 February 2018 page 26


Nearest point map =Orthogonal Projection

For P an orthogonal projection onto U :



u ∈ U and P v ∈ U  ⇒  u − P v ∈ U.

v − P v ∈ U ⊥ & u − P v ∈ U =⇒ v − P v ⊥ P v − u

By Pythagoras’s Theorem

||v − u||2 = ||v − P v||2 + ||P v − u||2 ≥ ||v − P v||2

So
||v − u|| ≥ ||v − P v||

i.e. P v is the nearest point in U to v (nearest in the sense of least distance).

MA212 – Lecture 36 – Friday 23 February 2018 page 27
‘Least squares approximation’

P v in U is the nearest point of U to the point v.

[Figure: right triangle with hypotenuse from v to u; P v is the foot of the perpendicular from v onto U.]

MA212 – Lecture 36 – Friday 23 February 2018 page 28


Another way to say this:
P v is the best approximation (as measured by the Pythagorean
norm) available from the choice set U : the choice P v
minimizes, over all available u, the least-squares error:

||v − u||2 = |v1 − u1 |2 + |v2 − u2 |2 + ... + |vn − un |2 .

MA212 – Lecture 36 – Friday 23 February 2018 page 29


Example:

Given the Experimental Data:

True value x 0 3 5 8 10
Observation y 2 5 6 9 11

‘Theory’ suggests this data should be modelled linearly as:

y = ax + b

for a, b constants. The first two readings give:


2 = 0 + b,   5 = 3a + b   ⇒   a = 1, b = 2.

MA212 – Lecture 36 – Friday 23 February 2018 page 30


So a = 1, b = 2, i.e. y = x + 2.

But, this is inconsistent with the other three readings:

True value x 0 3 5 8 10
Observation y 2 5 6 9 11

What should be done?

MA212 – Lecture 36 – Friday 23 February 2018 page 31


Reading errors?

Suppose the y readings have errors and should be replaced by


cleaned up values y ∗ (x); so

y ∗ (x) = y(x) − e(x),

with e(x) subtracting off the ‘measurement error’. Then


y = y∗ + e, and

y∗ = ( y∗(0), y∗(3), y∗(5), y∗(8), y∗(10) )⊤ = a (0, 3, 5, 8, 10)⊤ + b (1, 1, 1, 1, 1)⊤.
MA212 – Lecture 36 – Friday 23 February 2018 page 32
So

   

 0 1 


    


  


 3  
  1



   
y ∈ Lin 

 5  ,
 
 1 


    

  

  8  
 1 



 

 
10 1

and
y∗ = y − e

MA212 – Lecture 36 – Friday 23 February 2018 page 33


Construct the orthogonal projection matrix onto

   

 0 1 


    


  


 3  
  1



   
U = Lin  5  ,
 
 1  ⊆ R5


    

  

  8  
 1 



 

 
10 1

Then the smallest error occurs when

y∗ = P y.

MA212 – Lecture 36 – Friday 23 February 2018 page 34


Put

 
A =
[ 0  1 ]
[ 3  1 ]
[ 5  1 ]
[ 8  1 ]
[ 10  1 ]

wanted: y∗ = A (a∗, b∗)⊤.

MA212 – Lecture 36 – Friday 23 February 2018 page 35


So

P = A (A⊤A)⁻¹ A⊤ = A L.

A⊤A = [ 0 3 5 8 10 ; 1 1 1 1 1 ] A = [ 198  26 ; 26  5 ]

(of course symmetric).


MA212 – Lecture 36 – Friday 23 February 2018 page 36
Cont’d

 
(A⊤A)⁻¹ = (1/314) [ 5  −26 ; −26  198 ]

MA212 – Lecture 36 – Friday 23 February 2018 page 37


Now solve y∗ = P y with y∗ ∈ U , i.e. y∗ = A[α, β]T

 
A (α, β)⊤ = y∗ = P y = A L y.

Pre-multiply by L:   (LA) (α, β)⊤ = (α, β)⊤ = L (A L y) = L y   (using LA = I and AL = P).

MA212 – Lecture 36 – Friday 23 February 2018 page 38


So

 
(α, β)⊤ = (A⊤A)⁻¹ A⊤ y

MA212 – Lecture 36 – Friday 23 February 2018 page 39


Cont’d: (α, β)T = Ly = (AT A)−1 AT y

 
A⊤ y = [ 0 3 5 8 10 ; 1 1 1 1 1 ] (2, 5, 6, 9, 11)⊤ = (227, 33)⊤

(α, β)⊤ = (A⊤A)⁻¹ A⊤ y = (1/314) [ 5 −26 ; −26 198 ] (227, 33)⊤
        = (1/314) (277, 632)⊤ ≈ (0.88, 2.01)⊤   (not quite (1, 2)⊤).

MA212 – Lecture 36 – Friday 23 February 2018 page 40
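Numerical aside (our sketch): the same best-fit line from the normal equations, and the shortcut np.linalg.lstsq, which solves the least-squares problem directly.

import numpy as np

x = np.array([0., 3., 5., 8., 10.])
y = np.array([2., 5., 6., 9., 11.])
A = np.column_stack([x, np.ones_like(x)])    # columns (x, 1)

coef = np.linalg.inv(A.T @ A) @ (A.T @ y)    # (A^T A)^{-1} A^T y
print(coef)                                  # approx [0.882, 2.013]
print(np.linalg.lstsq(A, y, rcond=None)[0])  # same answer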


What was this about?

[Figure: the data points y(x) plotted against the fitted line y∗(x) = a∗x + b∗.]

The sum of squared (vertical) errors is to be minimized.


MA212 – Lecture 36 – Friday 23 February 2018 page 41
MA212 Further Mathematical Methods Lecture LA 17

Lecture 37: Left-, Right-,


and Generalized-Inverses

L -inverses are determined by their N (L)


R-inverses are determined by their R(R)
Distinct features of L and R
G -inverses created via factorization A = BC
Background Readings:

Anthony and Harvey:


‘Orthogonal projection & best fit’: Chap. 12 (§12.5, 12.6, 12.7)

Adam Ostaszewski:
Inverses Chap. 8 (§8.3,8.4)

MA212 – Lecture 37 – Tuesday 27 February 2018 page 2


Left Inverses Recalled

For An×m of rank m (the size on the right of n × m )

L := (AT A)−1 AT

is a left inverse
LA = I.

Others?
If the m × m product B (m×n) A (n×m) is of rank m for some B, then

L := (BA)⁻¹ B   gives   LA = I.

We will use the latter formula whenever it is easy to spot such a


B.
MA212 – Lecture 37 – Tuesday 27 February 2018 page 3
Q: When is a left inverse unique...?

Answer:...only when extra conditions are slapped on

MA212 – Lecture 37 – Tuesday 27 February 2018 page 4


Left inverse is uniquely determined by its kernel

*Theorem. If An×m has rank m and LA = I = L′ A and


further N (L) = N (L′ ), then

Lm×n = L′m×n

Proof.The two projections P = AL and P ′ = AL′ have the


same range, as R(AL) = R(A) = R(AL′ ) . By assumption both
have the same null space: N (AL) = N (L) = N (L′ ) = N (AL′ ) .
So both have the same direct sum, so are the same projection
(Lecture 35). So P = P ′ , and so

AL = AL′ ⇒ L = I L = (LA) L = L (AL) = L (AL′) = (LA) L′ = I L′ = L′.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 5


Example

Find a left inverse L for


   
A =
[ 1  0 ]
[ 1  1 ]
[ 1  1 ]

with N(L) = W = Lin{ (0, 1, −1)⊤ } ⊆ R³.

This condition makes L unique.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 6


Solution

We apply the last lecture's work taking U = R(A). Now

w := (0, 1, −1)⊤ ∉ R(A)

(by inspection, obvious). So if we take

W = Lin {w} , then W ∩ R(A) = {0}.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 7


Next...with w = (0, 1, −1)T

Next we find W⊥: we find all x s.t. ⟨x, w⟩ = 0, i.e. x s.t.

0 + x2 − x3 = 0:

(x1, x2, x3)⊤ = (x1, x2, x2)⊤ = x1 (1, 0, 0)⊤ + x2 (0, 1, 1)⊤.

W⊥ = R( [ 1 0 ; 0 1 ; 0 1 ] ) = R(B⊤), say.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 8


Using R(B T ) = N (B)⊥

 
W = N( [ 1 0 0 ; 0 1 1 ] ) = N(B);

B A = [ 1 0 0 ; 0 1 1 ] A = [ 1 0 ; 2 2 ],   det = 2.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 9


Cont’d

 
−1 1 2 0 
(BA) = (after a ... swap/re-sign/divide)
2 −2 1

    
−1 1  2 0  1 0 0  1  2 0 0 
L = (BA) B= = .
2 −2 1 0 1 1 2 −2 1 1

MA212 – Lecture 37 – Tuesday 27 February 2018 page 10


So: check

LA = (BA)⁻¹ (BA) = I.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 11
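Numerical aside (our sketch): the left inverse L = (BA)⁻¹B of this example, with kernel N(L) = N(B) = Lin{(0, 1, −1)⊤}.

import numpy as np

A = np.array([[1., 0.], [1., 1.], [1., 1.]])
B = np.array([[1., 0., 0.], [0., 1., 1.]])   # rows spanning W-perp

L = np.linalg.inv(B @ A) @ B
print(2 * L)                                        # [[2,0,0],[-2,1,1]]
print(np.allclose(L @ A, np.eye(2)))                # LA = I: True
print(np.allclose(L @ np.array([0., 1., -1.]), 0))  # w lies in N(L): True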


A freebie

If there is R with AR = I we say that R is a right inverse.


Then
RT AT = I,

so RT is a left inverse.
Take AT to be n × m , then Am×n must have rank m.

Now observe that, by ‘duality’:

N (RT ) = N (R′T ) ⇐⇒ R(R)⊥ = R(R′ )⊥ ⇐⇒ R(R) = R(R′ )

So ...

MA212 – Lecture 37 – Tuesday 27 February 2018 page 12


So ...

*Theorem. If Am×n has rank m and AR = I = AR′ and


further R(R) = R(R′ ), then


R (n×m) = R′ (n×m).

In words: right inverses are uniquely determined by their range.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 13


Use of the right inverse

If Am×n has a right inverse: Am×n Rn×m = I, then one can solve

Ax = b ∈Rm

(for any b) by taking

x = R b,   because A x = (A R) b = I b = b.

So R(A) = Rm (= set of all b ’s), and so rank(A) = m.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 14


What's so good about R = A⊤(AA⊤)⁻¹   (AA⊤ being m × m)

General solution of
Ax = b

is “particular solution plus complementary vector”, i.e.

x = Rb + z for some z ∈ N (A),

because if Ax = b

A(z) = A(x − Rb) = A(x)−AR(b) = b − b = 0.

Now by “duality”

Rb =AT (AAT )−1 b ∈R(AT ) = N (A)⊥

and z ∈ N (A) .
MA212 – Lecture 37 – Tuesday 27 February 2018 page 15
So ...

So Rb ⊥ z.
So, by Pythagoras,

||Rb + z||2 = ||Rb||2 + ||z||2 ≥ ||Rb||2

So x = Rb is the solution with least norm, when


R=AT (AAT )−1 .

MA212 – Lecture 37 – Tuesday 27 February 2018 page 16


For comparison:...

if
L = (AT A)−1 AT ,

then P = AL = A(AT A)−1 AT = P T and so P projects


orthogonally onto R(A),
and v∗ =Ax∗ is the best approximation to solving the
inconsistent system Ax = v.

[Figure: v∗ = P v, the orthogonal projection of v onto U = R(A); the dotted line from v to v∗ is ⊥ to U.]

MA212 – Lecture 37 – Tuesday 27 February 2018 page 17


Summary: for P = AL to project onto U parallel to W

An×m = [v1 , ..., vm ] (i.e. with m columns) has a left inverse iff
rank (A) = m
If (BA)m×m has rank m then (BA)−1 B is a left inverse
Choose B with N (B) = a specified subspace = W .

How?
Method:

Write W⊥ = R(M), i.e. as a column space

⇒ W = (W⊥)⊥ = R(M)⊥ = N(M⊤);

then take B = M⊤.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 18


What if k = rank(A (n×m)) < m, n?

No right or left inverse here.


But we can have a matrix G called a generalized inverse with

AGA = A   and   GAG = G.

Motivation:
(A R) A = A,   R (A R) = R;    A (L A) = A,   (L A) L = L.

How? We show later that we can write

A (n×m) = B (n×k) C (k×m),   k = rk(A) = rk(B) = rk(C),

and ...

MA212 – Lecture 37 – Tuesday 27 February 2018 page 19


Thence ... a generalized G = RL

Then we construct LB and RC with

LB B = I and CRC = I.

Take
G = RC LB

and

MA212 – Lecture 37 – Tuesday 27 February 2018 page 20


Then ...

G A G = RC LB (B C) RC LB = RC [LB B] [C RC] LB = RC LB = G

and

A G A = B C RC LB B C = B [C RC] [LB B] C = B C = A.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 21


Pause for thought (to help later)

If T : U → V is linear, then T preserves or reduces dimension!


That is, for any subspace W ⊆ U,

dim T [W ] ≤ dim W ;

indeed, define S : W → V as the restriction of T to W :

S(w) := T (w) for w ∈ W ,

then dim T[W] = dim R(S) = rank(S), and:

rank(S) + nullity(S) = dim-dom(S) = dim(W),   with nullity(S) ≥ 0,   so

dim(T[W]) = rank(S) ≤ dim(W).

MA212 – Lecture 37 – Tuesday 27 February 2018 page 22


How to factorize A as

A = BC.

Watch the following example:


 
A = [a1, a2, a3, a4, a5] =
[ 1  0  2  −1  4 ]
[ 0  1  −1  0  6 ]
[ 1  1  1  −1  10 ]

We select a set of columns that are lin. indep. and span the
remaining columns: a1, a2 do the trick. (These two columns
are obviously lin. indep.: look at their first component.)
So B = (a1 , a2 ) is a basis for R(A) .

MA212 – Lecture 37 – Tuesday 27 February 2018 page 23


Next ...

Next we compute the co-ordinate columns relative to B of each


of the remaining columns.
Relative to B we see a1 = (e1)B and a2 = (e2)B, and further:

a3 = (2, −1, 1)⊤ = 2a1 − a2 = (2, −1)B;   a4 = −a1 = (−1, 0)B.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 24


So ...

 
a5 = (4, 6, 10)⊤ = 4a1 + 6a2 = (4, 6)B.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 25


Ergo

 
A =
[ 1  0 ]
[ 0  1 ]
[ 1  1 ]
 · [ 1 0 2 −1 4 ; 0 1 −1 0 6 ]   (the co-ordinate cols. rel. B).

If not by inspection, then how? Use row reduction to echelon


form. Here’s another less obvious example:

MA212 – Lecture 37 – Tuesday 27 February 2018 page 26


Another less obvious example

 
A =
[ 1  1  2  4 ]
[ −1  2  −1  −1 ]   R2 + R1
[ 3  0  5  9 ]      R3 − 3R1
[ 1  4  3  7 ]      R4 − R1

→
[ 1  1  2  4 ]
[ 0  3  1  3 ]
[ 0  −3  −1  −3 ]   R3 + R2
[ 0  3  1  3 ]      R4 − R2

MA212 – Lecture 37 – Tuesday 27 February 2018 page 27


Cont’d

 
→
[ 1  1  2  4 ]   R1 − 2R2   (to put to work the 1 in R2)
[ 0  3  1  3 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]

→
[ 1  −5  0  −2 ]
[ 0  3  1  3 ]
[ 0  0  0  0 ]
[ 0  0  0  0 ]
 = A′,   so (col 1, col 3) form a basis of the column space of A′.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 28


Next step: we use the same columns of A

So in A take B = (a1, a3); but in A′, since a′2 = −5a′1 + 3a′3 and a′4 = −2a′1 + 3a′3, we have

a2 = (−5, 3)B,   a4 = (−2, 3)B.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 29


So

   
A = B C,   where

B = (a1, a3) =
[ 1  2 ]
[ −1  −1 ]
[ 3  5 ]
[ 1  3 ]

and

C = [ 1  −5  0  −2 ; 0  3  1  3 ].

Now we pass to LB , RC . Notice that row 1 and row 2 of B are


independent: we use this to find P in the next slide.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 30


How about

L = (P B)⁻¹ P,   so that   LB = (P B)⁻¹ (P B) = I,

for some P. This works so long as the 2 × 2 product P (2×4) B (4×2) is invertible!.. i.e. of rank 2.
Think of P B as P executing “row operations” on B ...

MA212 – Lecture 37 – Tuesday 27 February 2018 page 31


So take

   
P = [ 1 0 0 0 ; 0 1 0 0 ] = [ e1⊤ ; e2⊤ ].

Regard P as describing the row operations: "pick rows 1 & 2".

P B = [ 1 0 0 0 ; 0 1 0 0 ]
[ 1  2 ]
[ −1  −1 ]
[ 3  5 ]
[ 1  3 ]
 = [ 1  2 ; −1  −1 ]   (rows 1 and 2 of B).

(See Slide 33.)

(P B)⁻¹ = [ −1  −2 ; 1  1 ]   (swap & re-sign; note det(PB) = −1 + 2 = 1)

MA212 – Lecture 37 – Tuesday 27 February 2018 page 32
Continued: so of course

    
(P B)⁻¹ P = [ −1 −2 ; 1 1 ] [ 1 0 0 0 ; 0 1 0 0 ] = [ −1 −2 0 0 ; 1 1 0 0 ]

MA212 – Lecture 37 – Tuesday 27 February 2018 page 33


Right inverse for C

Here again try for Q(CQ)−1 , and think of Q as performing


column operations on C .
Recall that C contains c1 = e1 and c3 = e2; indeed:

C = [ 1  −5  0  −2 ; 0  3  1  3 ].

So take Q = (e1, e3) to "pick cols 1 & 3"; then CQ = (Ce1, Ce3).

N.B. these ei are in R⁴, and so

Q =
[ 1  0 ]
[ 0  0 ]
[ 0  1 ]
[ 0  0 ]
MA212 – Lecture 37 – Tuesday 27 February 2018 page 34
Then...

then obviously

C Q = [ 1 0 ; 0 1 ] = I,

Q (C Q)⁻¹ = Q = RC.

MA212 – Lecture 37 – Tuesday 27 February 2018 page 35


Conclusion

From A = BC to G = RC LB:

G = RC LB =
[ 1  0 ]
[ 0  0 ]   [ −1  −2  0  0 ]
[ 0  1 ] · [ 1   1   0  0 ]  =
[ 0  0 ]

[ −1  −2  0  0 ]
[ 0   0   0  0 ]
[ 1   1   0  0 ]
[ 0   0   0  0 ]

MA212 – Lecture 37 – Tuesday 27 February 2018 page 36
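Numerical aside (our sketch): the whole construction of this example (A = BC, LB = (PB)⁻¹P, RC = Q(CQ)⁻¹, G = RC LB), verified against the defining equations of a generalized inverse.

import numpy as np

A = np.array([[1, 1, 2, 4],
              [-1, 2, -1, -1],
              [3, 0, 5, 9],
              [1, 4, 3, 7]], dtype=float)
B = A[:, [0, 2]]                        # the independent columns a1, a3
C = np.array([[1., -5., 0., -2.],       # coordinates of all columns rel. B
              [0., 3., 1., 3.]])

P = np.eye(4)[:2]                       # "pick rows 1 & 2"
Q = np.eye(4)[:, [0, 2]]                # "pick cols 1 & 3"
L_B = np.linalg.inv(P @ B) @ P
R_C = Q @ np.linalg.inv(C @ Q)
G = R_C @ L_B
print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G))  # True True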


Comment

Suppose that An×m = Bn×k Ck×m and rk(A) = rk(B) = k.


Then:
1. rk(C) ≤ k because C is k × m
2. In fact: equality, as also k ≤ rk(C), because:

k = rk(A) = dim R(A) = dim R(BC)


= dim B[R(C)] (but B preserves or lowers dimension)
≤ dim R(C) = rk(C)

MA212 – Lecture 37 – Tuesday 27 February 2018 page 37


MA212 Further Mathematical Methods Lecture LA 18

Lecture 38: Pseudo-inverses

Infinite dimensional spaces



ADA = A, DAD = D, + orthogonality

Underlying projections: AD, DA


The "special one"
Lin. Algebra connections with analysis
Some general features, when...

D A (n×m) D = D (m×n)
A D (m×n) A = A (n×m)

We get projections:

(DA)(DA) = D [A D A] = D A,   so DA is a projection in Rᵐ;
(AD)(AD) = [A D A] D = A D,   so AD is a projection in Rⁿ.

Using rank+nullity (below) we can show that each of the two


projections DA, AD has the same rank as A , i.e.

rk(DA) = rk(AD) = rk(D) = rk(A) = k, say.

MA212 – Lecture 38 – Friday 2 March 2018 page 2


This is because:

R(AD) = R(A) and N (AD) = N (D) (1)

N (DA) = N (A) (2)

Indeed

ADx = A(Dx) so R(AD) ⊆ R(A)


A x = [AD](A x)   so R(A) ⊆ R(AD)

and

ADx = 0 =⇒ DADx = 0 =⇒ Dx = 0 =⇒ N (AD) ⊆ N (D)

Dx = 0 =⇒ ADx = 0 =⇒ N (D) ⊆ N (AD).


MA212 – Lecture 38 – Friday 2 March 2018 page 3
Similarly...

A z = 0 ⇒ D A z = 0, so N(A) ⊆ N(DA);
D A z = 0 ⇒ (A D A) z = A z = 0, so N(DA) ⊆ N(A).
=A

MA212 – Lecture 38 – Friday 2 March 2018 page 4


Rank considerations for D

1. From the size of D (m×n):

rk(D) + nullity(D) = n

2. As AD is a projection onto R(AD) = R(A) and


N (AD) = N (D) we get the direct sum decomposition:

Rn = R(AD) ⊕ N (AD) = R(A) ⊕ N (D)

n = rk(A) + nullity(D).

So
rk(D) + nullity(D) = n = rk(A) + nullity(D).

MA212 – Lecture 38 – Friday 2 March 2018 page 5


Features continued:

We get direct sums both in Rn and in Rm :

(1 : from (AD)) Rn = R(A) ⊕ N (D)

and

(2 : from (DA)) Rm = N (A) ⊕ R(D)

Let’s see why:


(1) For x ∈ Rn : put

z = x − A D x:   D z = D x − D A D x = 0, so

x = (A D x) + z,   with A D x ∈ R(AD) and z ∈ N(D).

MA212 – Lecture 38 – Friday 2 March 2018 page 6
Also:

z = A y & D z = 0  ⇒  z = A y = A D (A y) = A (D z) = 0.

So R(A) ∩ N(D) = {0}.

(2) For y ∈ Rᵐ, put

z = y − D A y:   A z = A y − A D A y = 0, so

y = (D A y) + z,   with D A y ∈ R(DA) and z ∈ N(A).

Also, w = D x & A w = 0  ⇒  w = D x = D A (D x) = D (A w) = 0.

So R(D) ∩ N(A) = {0}.

MA212 – Lecture 38 – Friday 2 March 2018 page 7
Conversely: (N.B. N (A) ⊆ Rm and R(A) ⊆ Rn )

Theorem: If

Rⁿ = R(A (n×m)) ⊕ W,
Rᵐ = V ⊕ N(A (n×m)),   with dim V = rk(A),

then there exists a matrix G with

GAG = G and AGA = A,   together with   W = N(G) (e.g. = R(A)⊥) and V = R(G) (e.g. = N(A)⊥),

and G is unique.

This uses previous technology, nothing special: if R(A) ∩ N(B) = {0}, then (BA)⁻¹B · A = I.
MA212 – Lecture 38 – Friday 2 March 2018 page 8
Note about dimV for An×m

rank(A) + nullity(A) = m,
dim V + nullity(A) = m

So
rank(A) = dim V

The specific example with

W = N(G) = R(A)⊥   and   V = R(G) = N(A)⊥

can be arranged, with a unique G.

MA212 – Lecture 38 – Friday 2 March 2018 page 9


Comment

We write for k = rk(A)

A (n×m) = Ã (n×k) C (k×m),   with R(Ã) = R(A)

(both of rank k; obvious because rk = k).

−→ LÃ and RC with N (A) = N (C)

Indeed, the last equality holds because

Ax = 0 =⇒ ÃCx = 0 =⇒ LÃ ÃCx = 0 =⇒ Cx = 0,

and t’other way about:

C x = 0 ⇒ Ã C x = 0,   i.e.   A x = 0   (as A = ÃC).

MA212 – Lecture 38 – Friday 2 March 2018 page 10


This is followed by ‘basis bashing’:

Recall that our aim is to get:

Rⁿ = R(Ã (n×k)) ⊕ W,

Rᵐ = R(H) ⊕ N(C)   (R(H) = V for some H; N(C) = N(A)),

where Ã (n×k) is a submatrix with k indep. columns spanning the column space of A (n×m), so that then R(Ã) = R(A).

MA212 – Lecture 38 – Friday 2 March 2018 page 11


With

W = N(B)   (cf. Lecture 36)

LÃ = (B Ã)⁻¹ B   gives   LÃ Ã = I.

MA212 – Lecture 38 – Friday 2 March 2018 page 12


Here

N(A) = N(C) because, let's recall from page 10,

0 = A x = Ã C x  ⇒  LÃ (Ã C x) = C x = 0 (as LÃ Ã = I)  ⇒  Ã C x = A x = 0.

So take H = [v1 , ..., vk ] with k = dim V and with


V = Lin{v1 , ..., vk } = R(H) .
By assumption

V ∩ N(C) = {0},   equivalently   R(H) ∩ N(C) = {0}.

R = RC = H (C H)⁻¹   gives   C R = I.

MA212 – Lecture 38 – Friday 2 March 2018 page 13


Now take

G = RL

These choices do it!

MA212 – Lecture 38 – Friday 2 March 2018 page 14


Puzzle: why B Ã, with B as in slide 12, has rank k?

Consider T : R(Ã) → Rᵏ defined by restriction:

T(Ã x) = B Ã x

for B chosen with N(B) = W. Here, as R(Ã) = R(A),

N(T) = N(B) ∩ R(A) = W ∩ R(A) = {0},

and so nullity(T) = 0; therefore

rk(T) + nullity(T) = dim R(A) = k,   so rk(T) = k.

So dim R(B Ã) = k.


MA212 – Lecture 38 – Friday 2 March 2018 page 15
Illustration ?

To come ... in Slide 26.

MA212 – Lecture 38 – Friday 2 March 2018 page 16


Special Case via Gram matrix

k = rank(A) = rk(B) = rk(C)

A (n×m) = B (n×k) C (k×m),   G = RC LB,
RC = C⊤ (C C⊤)⁻¹,
LB = (B⊤ B)⁻¹ B⊤.

G = C⊤ (C C⊤)⁻¹ (B⊤ B)⁻¹ B⊤   ( = R L )

A G = B C · C⊤(C C⊤)⁻¹ (B⊤ B)⁻¹ B⊤ = B (B⊤ B)⁻¹ B⊤,   which is symmetric!

G A = C⊤(C C⊤)⁻¹ (B⊤ B)⁻¹ B⊤ · B C = C⊤ (C C⊤)⁻¹ C,   which is symmetric!

MA212 – Lecture 38 – Friday 2 March 2018 page 17
Special Case Cont’d. Conclusion 1: in Rn

Conclusion 1: AG projects orthogonally onto R(A) = R(AG)


so parallel to N (AT ) = R(A)⊥ = N (G) .

See Figure:

[Figure: in Rⁿ, A G projects onto R(A) = R(AG) parallel to N(A⊤) = N(G); A maps Rᵐ → Rⁿ, G maps Rⁿ → Rᵐ.]
MA212 – Lecture 38 – Friday 2 March 2018 page 18
Conclusion 2: in Rm
GA projects orthogonally parallel to N(A).

Here N(A) = N(GA),
so N(A)⊥ = N(GA)⊥ = R((GA)⊤) = R(GA)   (as GA is symmetric).

[Figure: in Rᵐ, G A projects onto R(GA) = R(A⊤) parallel to N(A).]
MA212 – Lecture 38 – Friday 2 March 2018 page 19
Comment on N (G) = R(A)⊥

We show below that

N(G) = N(LB) = N(B⊤) = R(B)⊥ = R(A)⊥,

the second-to-last step by duality; of course R(A) = R(B) by choice of B.

G x = 0 ⇒ R L x = 0 ⇒ (C R) L x = L x = 0   (as C R = I);

L x = 0 ⟺ (B⊤B)⁻¹ B⊤ x = 0 ⟺ B⊤ x = 0   (substituting for L);

and B⊤ x = 0 ⇒ L x = 0 ⇒ R L x = G x = 0.

MA212 – Lecture 38 – Friday 2 March 2018 page 20


Comment on R(G) = N (A)⊥

R(G) = R(C⊤) = N(C)⊥ = N(A)⊥

((1) below; then duality; then (2) overleaf: the last equality says that N(C) = N(A)).

(1): G x = C⊤ [ (C C⊤)⁻¹ L x ],   so R(G) ⊆ R(C⊤);

C⊤ y = C⊤ [ (C C⊤)⁻¹ (LB B) C C⊤ ] y   (inserting LB B = I)
     = [ C⊤ (C C⊤)⁻¹ L ] (B C C⊤ y)
     = G (B C C⊤ y),   so R(C⊤) ⊆ R(G).

MA212 – Lecture 38 – Friday 2 March 2018 page 21
Cont’d

... and for (2) argue that

A x = 0 ⇒ (B C) x = 0 ⇒ LB (B C) x = C x = 0 ⇒ (B C) x = 0,

... i.e. Ax = 0 back to where we started, so

N (A) = N (C).

So the implications are all equivalences.

MA212 – Lecture 38 – Friday 2 March 2018 page 22


The Strong Generalized Inverse

The unique G such that

 
GAG = G and AGA = A,   together with   N(G) = R(A)⊥ and R(G) = N(A)⊥   [i.e. 'Orthogonality']

is called the
Strong Generalized Inverse

or the
Moore-Penrose pseudo-inverse

after Roger Penrose the theoretical physicist (=mathematician)


Construction: page 17
(For the *uniqueness proof see AO page 106.)
MA212 – Lecture 38 – Friday 2 March 2018 page 23
How about the right and left inverses?

If An×m has rank m ≤ n, then the left inverse defined by

L = (AT A)−1 AT

is its strong generalized inverse, since

ALA = A(LA) = A and LAL = (LA)L = L

and both of AL = A(A⊤A)⁻¹A⊤ and LA = I are symmetric,

so these projections are orthogonal, being symmetric.

MA212 – Lecture 38 – Friday 2 March 2018 page 24


Similarly, ...

If An×m has rank n ≤ m, then the right inverse defined by

R = A⊤(AA⊤)⁻¹

is its strong generalized inverse, since

ARA = (AR)A = A and RAR = R(AR) = R

and both of RA = A⊤(AA⊤)⁻¹A and AR = I are symmetric.

MA212 – Lecture 38 – Friday 2 March 2018 page 25


Farewell Example: Find the pseudo-inverse of A

Know the drill:

1. Factorize A = BC . 2. Compute LB = (B T B)−1 B T .


3. Compute RC = C⊤(CC⊤)⁻¹.   4. Compute G = R L.

Off you go ...


 
A = [a1, a2, a3] =
[ 1  0  2 ]
[ −1  1  0 ]
[ 0  1  2 ]
 ,   where a3 = 2(a1 + a2).

The dependencies noted by observation. To get this


systematically: reduce to row echelon, as previously in Lecture
37.
MA212 – Lecture 38 – Friday 2 March 2018 page 26
Check: Row reduction

 
[ 1  0  2 ]
[ −1  1  0 ]   R2 → R2 + R1
[ 0  1  2 ]

→
[ 1  0  2 ]
[ 0  1  2 ]
[ 0  1  2 ]   R3 → R3 − R2

→
[ 1  0  2 ]
[ 0  1  2 ]
[ 0  0  0 ]

c3 = 2(c1 + c2)   (as c1 = e1, c2 = e2).

MA212 – Lecture 38 – Friday 2 March 2018 page 27


Cont’d

 
A =
[ 1  0 ]
[ −1  1 ]
[ 0  1 ]
 · [ 1  0  2 ; 0  1  2 ] = B C.

Here B = [a1 , a2 ]; relative to the basis B the matrix C identifies


the co-ordinate columns relative to B of a1 , a2 , a3 . So
C = [e1 , e2 , 2(e1 + e2 )].

MA212 – Lecture 38 – Friday 2 March 2018 page 28


Cont’d

 
  1 0  
1 −1 0   2 −1
B B = 
T   −1 1  =  
 
0 1 1 −1 2
0 1
 
1 2 1 
(B T B)−1 = ... swap & re-sign, and det=3
3 1 2

    
T −1 T 1  2 1   1 −1 0  1  2 −1 1 
LB = (B B) B = =
3 1 2 0 1 1 3 1 1 2

MA212 – Lecture 38 – Friday 2 March 2018 page 29


Cont’d

 
  1 0  
1 0 2  5 4
CC T
=   0 1  =  
 
0 1 2 4 5
2 2
 
1  5 −4 
(CC T )−1 =
9 −4 5

up above ... swap & re-sign, and det=25-16=9

MA212 – Lecture 38 – Friday 2 March 2018 page 30


So

 
RC = C⊤ (C C⊤)⁻¹ =
[ 1  0 ]
[ 0  1 ]  · (1/9) [ 5  −4 ; −4  5 ]
[ 2  2 ]
 = (1/9)
[ 5  −4 ]
[ −4  5 ]
[ 2  2 ]

MA212 – Lecture 38 – Friday 2 March 2018 page 31


G = RL now

 
G = R L = (1/9)
[ 5  −4 ]
[ −4  5 ]  · (1/3) [ 2  −1  1 ; 1  1  2 ]
[ 2  2 ]
 = (1/27)
[ 6  −9  −3 ]
[ −3  9  6 ]
[ 6  0  6 ]

MA212 – Lecture 38 – Friday 2 March 2018 page 32
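Numerical aside (our sketch): the drill of this example in NumPy, ending with a comparison against np.linalg.pinv, which computes the Moore-Penrose pseudo-inverse directly.

import numpy as np

A = np.array([[1., 0., 2.],
              [-1., 1., 0.],
              [0., 1., 2.]])
B = A[:, :2]                                  # the independent columns a1, a2
C = np.array([[1., 0., 2.], [0., 1., 2.]])    # coordinates: a3 = 2(a1 + a2)

L_B = np.linalg.inv(B.T @ B) @ B.T
R_C = C.T @ np.linalg.inv(C @ C.T)
G = R_C @ L_B
print(np.round(27 * G))                       # [[6,-9,-3],[-3,9,6],[6,0,6]]
print(np.allclose(G, np.linalg.pinv(A)))      # agrees with NumPy: True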


Function Spaces

MA212 – Lecture 38 – Friday 2 March 2018 page 33


Examples of Function Spaces

We’ve seen many of these before:

F [0, 1] := {f : f is a function from [0, 1] → R}

C[0, 1] := {f ∈ F [0, 1] : f is continuous}

D[0, 1] or C 1 [0, 1] := {f ∈ F [0, 1] : f is differentiable}

S[0, 1] aka C ∞ [0, 1]

C ∞ [0, 1] := {f ∈ F [0, 1] : f is diff’ntable infinitely many times}

P[0, 1] := {f ∈ F [0, 1] : f is a polynomial}

P[0, 1] ⊆ S[0, 1] ⊆ D[0, 1] ⊆ C[0, 1] ⊆ F [0, 1]

MA212 – Lecture 38 – Friday 2 March 2018 page 34


Inner products

In C[0, 1] we can introduce


⟨f, g⟩ := ∫₀¹ f(t) g(t) dt.

P[0, 1] contains the functions fn(t) ≡ tⁿ; all these are lin. indep. and infinitely differentiable.

MA212 – Lecture 38 – Friday 2 March 2018 page 35


Variations

If C replaces R as co-domain (= range space), then

⟨f, g⟩ := ∫₀¹ f(t) ḡ(t) dt,

using the conjugate.

If R⁺ := [0, ∞) replaces [0, 1] as domain for C(R⁺)... then in the formula for ⟨f, g⟩ the functions f (and likewise g) need to satisfy

∫₀^∞ |f(t)|² dt < ∞,

whereupon we may validly put

⟨f, g⟩ = ∫₀^∞ f(t) ḡ(t) dt.
MA212 – Lecture 38 – Friday 2 March 2018 page 36
If N in place of [0, ∞) , then we obtain the ...
Sequence Spaces:


ℓ² := { a = (a1, a2, ...) : (∀n)(an ∈ R) and Σ_{n=1}^∞ an² < ∞ }.

We saw that ℓ2 is a vector space under pointwise addition (i.e.


co-ordinatewise) and scaling.

MA212 – Lecture 38 – Friday 2 March 2018 page 37


Other Examples

Sequences convergent to zero:

c0 := {a = (a1 , a2 , ...) : (∀n)(an ∈ R and limn an = 0)}

contains the eventually “k”onstant ones

k := {a = (a1 , a2 , ...) : (∀n)(an ∈ R) and eventually an = 0}

k ⊆ c0 , k ⊆ ℓ2 , ℓ2 ⊆ c0 .

In ℓ² we define

⟨a, b⟩ := Σ_{n=1}^∞ an bn.

MA212 – Lecture 38 – Friday 2 March 2018 page 38


Cauchy-Schwarz to infinity

The Cauchy-Schwarz inequality in Rⁿ asserts that

|x · y| ≤ ||x||·||y||,   i.e.   | Σᵢ₌₁ⁿ xᵢ yᵢ | ≤ ( Σᵢ₌₁ⁿ xᵢ² )^{1/2} ( Σᵢ₌₁ⁿ yᵢ² )^{1/2}.

Apply this for any n and xᵢ = |aᵢ|, yᵢ = |bᵢ| to yield

(sₙ :=) Σᵢ₌₁ⁿ |aᵢ|·|bᵢ| ≤ ( Σᵢ₌₁ⁿ aᵢ² )^{1/2} ( Σᵢ₌₁ⁿ bᵢ² )^{1/2} ≤ ( Σᵢ₌₁^∞ aᵢ² )^{1/2} ( Σᵢ₌₁^∞ bᵢ² )^{1/2} < ∞,

for a, b ∈ ℓ².

MA212 – Lecture 38 – Friday 2 March 2018 page 39


Cont’d

Hence {sₙ} is bounded, increasing, and so converges. Hence

limₙ sₙ = Σᵢ₌₁^∞ |aᵢ|·|bᵢ| ≤ ( Σᵢ₌₁^∞ aᵢ² )^{1/2} ( Σᵢ₌₁^∞ bᵢ² )^{1/2} < ∞,

and so, as | Σᵢ₌₁^∞ aᵢ bᵢ | ≤ Σᵢ₌₁^∞ |aᵢ|·|bᵢ|,

|⟨a, b⟩| ≤ ||a||·||b||.

MA212 – Lecture 38 – Friday 2 March 2018 page 40


k and so ℓ2 are infinite dimensional:

Consider

e1 = (1, 0, 0, ...) ∈ k
e2 = (0, 1, 0, ...) ∈ k
etc.

Then ⟨eᵢ, eⱼ⟩ = 0 if i ≠ j, and 1 if i = j.

So {e1 , e2 , ...} is an orthonormal set, hence all are lin. indep.


and indeed span k.

MA212 – Lecture 38 – Friday 2 March 2018 page 41


Some forward planning for Weeks 10 and 11

Exam solutions are on display only for the 3 past years.


(Dept. policy)
Be advised that eventually the 2017 solutions will supplant
the 2014 solutions.
Revision will be centered around the 2017 examination
...solutions and discussion of the context of the questions
or? .. we can re-run some past lectures/ parts of lectures
Please e-mail me your thoughts

MA212 – Lecture 38 – Friday 2 March 2018 page 42


Planning continued

Some general advice as regards old exam papers:

Read them! Don't waste time doing masses of them.
That is: don't waste time on arithmetic.
Instead, ask yourself: do you know how to solve them?
If you don't, ask for help: from me, from Dr Lokka, from Prof
Skokan, from the class teachers...

You have 1 week to e-mail me your thoughts.

MA212 – Lecture 38 – Friday 2 March 2018 page 43


MA212 Further Mathematical Methods Lecture LA 19

Lecture 39: Infinite-dimensional Vector


Spaces

Differences
Similarities
It ain’t necessarily so!

For V finite dimensional, and T : V → V linear:

R(T ) = V ⇔ N (T ) = {0} ⇔ T : V → V invertible

Now in ℓ2 ...
For a = (a1 , a2 , ...) define the shift-map T by putting:

T (a) = (a2 , a3 , ...).

This is linear (Exercise!)


Here R(T ) = ℓ2 but N (T ) = Lin{e1 } where e1 := (1, 0, 0, ..).
Indeed T (a) = (a2 , ...) = 0 iff a2 = 0 = a3 = a4 = ... so
a = (a1 , 0, 0, ...) = a1 e1 .

MA212 – Lecture 39 – Tuesday 6 March 2018 page 2


Furthermore

T(e₁) = 0 = T(0), so T is not injective, hence not invertible.

Now define the reverse shift

T ∗ (a) = (0, a1 , a2 , ...).

This also is linear (Exercise!)


Here R(T*) = {b ∈ ℓ² : b₁ = 0} ≠ ℓ², yet N(T*) = {0}.
Also

T*(a) = e₁ = (1, 0, 0, ...) is not soluble, so T* is not invertible.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 3


More misery

T*T e₁ = T*(T e₁) = T*(0) = 0 ≠ e₁

yet
T T* a = T(0, a₁, a₂, ...) = a for a = (a₁, a₂, ...)

so
T T* = I, but T*T ≠ I.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 4
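Aside (a small sketch of ours, not the lecture's code), modelling a sequence by the finite list of its leading entries:

```python
# Sketch: the shift T and reverse shift T* acting on leading entries.

def T(a):       # T(a) = (a2, a3, ...): drop the first entry
    return a[1:]

def T_star(a):  # T*(a) = (0, a1, a2, ...): prepend a zero
    return [0] + a

e1 = [1, 0, 0, 0]
print(T(T_star(e1)))   # [1, 0, 0, 0]: T T* acts as the identity
print(T_star(T(e1)))   # [0, 0, 0, 0]: T* T e1 = 0, not e1
```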


Eigenvalues?

? T*a = λa for some a ≠ 0, i.e.

(0, a₁, a₂, ...) = (λa₁, λa₂, ...) for some a ≠ 0 ?

implies...

MA212 – Lecture 39 – Tuesday 6 March 2018 page 5


... implies that

T*a = λa:
0 = λa₁
a₁ = λa₂
a₂ = λa₃
...

If λ = 0, then the LHS (T*a = 0, i.e. (0, a₁, a₂, ...) = 0) says a = 0.
If λ ≠ 0, then cancelling λ in the first equation, we first get a₁ = 0,

MA212 – Lecture 39 – Tuesday 6 March 2018 page 6


then in the next ...

we get a₂ = 0, and so on; so a = 0, a contradiction.

T*a = λa (with λ ≠ 0, after cancelling):
0 = a₁
a₁ = λa₂
a₂ = λa₃
...

So no eigenvalues here.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 7


Still more misery:

U ⊆ (U⊥)⊥ ≠ U

Example: In V = ℓ², take U = k = the eventually zero sequences.
e₁ ∈ k, e₂ ∈ k, ...
If a ∈ k⊥, then

0 = ⟨eᵢ, a⟩ = aᵢ for all i, so a = 0, i.e. k⊥ = {0}.

So (k⊥)⊥ = ℓ² ≠ k.
k + k⊥ = k + {0} = k ≠ ℓ²

Why?

MA212 – Lecture 39 – Tuesday 6 March 2018 page 8


Why? Why k ≠ ℓ².

Indeed
$$a = \Big(1, \frac{1}{2}, ..., \frac{1}{n}, ...\Big) \in \ell^2 \quad\text{as } \sum \frac{1}{n^2} < \infty,$$
but $\Big(1, \frac{1}{2}, ..., \frac{1}{n}, ...\Big) \notin k.$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 9


Green shoots of hope

In an inner-product space V :
Suppose that U ⊆ V is a finite-dimensional subspace.
So U = Lin{u1 , u2 , ..., un }, say; suppose indeed these are
orthonormal, so a basis for U. (Can arrange, by applying the
Gram-Schmidt process)
Put
P(v) = ⟨v, u₁⟩u₁ + ⟨v, u₂⟩u₂ + ... + ⟨v, uₙ⟩uₙ

Then P is linear (as e.g. v ↦ ⟨v, u₁⟩ is) and

P(v) ∈ U

⟨v − Pv, uᵢ⟩ = ⟨v, uᵢ⟩ − ⟨Pv, uᵢ⟩

MA212 – Lecture 39 – Tuesday 6 March 2018 page 10


So substituting for P v

$$\langle v - Pv, u_i\rangle = \langle v, u_i\rangle - \big\langle\, \langle v, u_1\rangle u_1 + \langle v, u_2\rangle u_2 + ... + \langle v, u_n\rangle u_n,\ u_i\big\rangle$$

$$= \langle v, u_i\rangle - \big[\langle v, u_1\rangle\langle u_1, u_i\rangle + \langle v, u_2\rangle\langle u_2, u_i\rangle + ... + \langle v, u_n\rangle\langle u_n, u_i\rangle\big]$$

So v − Pv ⊥ U.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 11


Observation

Now inside [...] most terms are zero, by orthogonality. So

$$\langle v - Pv, u_i\rangle = \langle v, u_i\rangle - \langle v, u_i\rangle\langle u_i, u_i\rangle = \langle v, u_i\rangle - \langle v, u_i\rangle\cdot 1 = 0$$

So
$$v = \underbrace{Pv}_{\in U} + \underbrace{(v - Pv)}_{\in U^{\perp}}$$

V = U + U⊥.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 12


In fact for U a finite-dimensional subspace

$$V = \underbrace{U}_{\text{finite dim.}} \oplus\ U^{\perp},$$

because
U ∩ U⊥ = {0};

indeed, if u ∈ U ∩ U⊥ then

⟨u, u⟩ = 0, so u = 0.

So by the above

P(v) = ⟨v, u₁⟩u₁ + ⟨v, u₂⟩u₂ + ... + ⟨v, uₙ⟩uₙ

is the orthogonal projection from V onto U.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 13
Recap

Recall that, as v = u + w uniquely for some u ∈ U and w ∈ U⊥
(from the last page), then P(v) = u.

So for u ∈ U and v ∈ V

$$u - v = \underbrace{(u - Pv)}_{\in U} + \underbrace{(Pv - v)}_{\in U^{\perp}}$$

so by Pythagoras' Theorem

$$\|u - v\|^2 = \|u - Pv\|^2 + \|Pv - v\|^2 \ge \|Pv - v\|^2$$

So Pv is the point of U closest to v.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 14
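Aside (a coordinate sketch of ours for Rᵐ, not the lecture's code), with vectors as plain lists:

```python
# Sketch: P(v) = <v,u1>u1 + ... + <v,un>un for an orthonormal list in R^m,
# plus a check that v - P(v) is orthogonal to the subspace.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def project(v, orthonormal):
    p = [0.0] * len(v)
    for u in orthonormal:
        c = dot(v, u)                                # coefficient <v, u>
        p = [pi + c * ui for pi, ui in zip(p, u)]
    return p

u1, u2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]            # orthonormal basis of U
v = [3.0, 4.0, 5.0]
Pv = project(v, [u1, u2])
print(Pv)                                            # [3.0, 4.0, 0.0]
print(dot([a - b for a, b in zip(v, Pv)], u1))       # 0.0: v - Pv ⊥ U
```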


Illustration

[Figure: a point v above the plane U, its projection Pv in U, and another point u of U; the segment from u to v is the hypotenuse of the right-angled triangle with vertices u, Pv and v.]

MA212 – Lecture 39 – Tuesday 6 March 2018 page 15
Example 1

Take V = ℓ²

U = U(n) = Lin{e₁, e₂, ..., eₙ} = {(a₁, a₂, ..., aₙ, 0, 0, 0, ...) : a₁, a₂, ..., aₙ ∈ R}

P(a) = ⟨a, e₁⟩e₁ + ⟨a, e₂⟩e₂ + ... + ⟨a, eₙ⟩eₙ

P((a₁, a₂, ...)) = a₁e₁ + a₂e₂ + ... + aₙeₙ = (a₁, a₂, ..., aₙ, 0, 0, 0, ...)

This keeps only the first n entries of a.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 16


So ... with V = ℓ2

$$\|u - a\|^2 = (a_1 - u_1)^2 + ... + (a_n - u_n)^2 + a_{n+1}^2 + a_{n+2}^2 + ...$$
$$\ge a_{n+1}^2 + a_{n+2}^2 + ... \quad\text{(dropping the first } n \text{ terms)}$$
$$= \|a - Pa\|^2 = \|Pa - a\|^2$$

So the point Pa ∈ U is the point u of U closest to a.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 17


Example 2

Here V = C[0, 1]
$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt$$

U = Lin{1, t, t²}

The Gram-Schmidt process yields {g₁, g₂, g₃} (see the next slides):

g₁(t) ≡ 1,

g₂(t) ≡ √3 (2t − 1)

g₃(t) ≡ √5 (6t² − 6t + 1)

P(f) = P_U(f) = ⟨f, g₁⟩g₁ + ⟨f, g₂⟩g₂ + ⟨f, g₃⟩g₃

MA212 – Lecture 39 – Tuesday 6 March 2018 page 18
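Aside: the next slides derive g₁, g₂, g₃ by hand; here is a sketch of ours of the same Gram-Schmidt run, assuming the sympy library is available (illustrative, not course code):

```python
# Sketch: Gram-Schmidt on {1, t, t^2} under <f, g> = integral_0^1 f g dt.
import sympy as sp

t = sp.symbols('t')

def ip(f, g):
    return sp.integrate(f * g, (t, 0, 1))

ortho = []
for f in [sp.Integer(1), t, t ** 2]:
    for g in ortho:
        f = f - ip(f, g) * g                          # remove earlier components
    ortho.append(sp.expand(f / sp.sqrt(ip(f, f))))    # normalize

print(ortho)  # equivalent to [1, sqrt(3)(2t - 1), sqrt(5)(6t^2 - 6t + 1)]
```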


Doing it, doing it

$$\|1\|^2 = \int_0^1 1^2\,dt = 1, \quad\text{so } g_1 \equiv 1$$

$$\langle t, g_1\rangle = \int_0^1 t\cdot 1\,dt = \Big[\frac{t^2}{2}\Big]_0^1 = \frac{1}{2}$$

$$t - \langle t, g_1\rangle g_1 = t - \frac{1}{2} \perp g_1$$

$$\Big\|t - \frac{1}{2}\Big\|^2 = \int_0^1 \Big(\underbrace{t - \tfrac{1}{2}}_{=s}\Big)^2 dt = \int_{-1/2}^{1/2} s^2\,ds = 2\int_0^{1/2} s^2\,ds = 2\Big[\frac{s^3}{3}\Big]_{s=1/2} = \frac{1}{12}$$

$$g_2 = \Big(\frac{1}{12}\Big)^{-1/2}\Big(t - \frac{1}{2}\Big) = 2\sqrt{3}\Big(t - \frac{1}{2}\Big) = \sqrt{3}\,(2t - 1).$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 19
We now find g₃:

$$\langle t^2, g_1\rangle = \int_0^1 t^2\cdot 1\,dt = \Big[\frac{t^3}{3}\Big]_0^1 = \frac{1}{3}$$

$$\langle t^2, g_2\rangle = \int_0^1 t^2\cdot\sqrt{3}\,(2t-1)\,dt = \sqrt{3}\int_0^1 (2t^3 - t^2)\,dt = \sqrt{3}\Big[\frac{t^4}{2} - \frac{t^3}{3}\Big]_0^1 = \sqrt{3}\Big(\frac{1}{2} - \frac{1}{3}\Big) = \frac{\sqrt{3}}{6}$$

$$t^2 - \langle t^2, g_1\rangle g_1 - \langle t^2, g_2\rangle g_2 = t^2 - \frac{1}{3}\cdot 1 - \frac{\sqrt{3}}{6}\cdot\sqrt{3}\,(2t-1) = t^2 - t + \frac{1}{6}.$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 20

Thus, after normalizing (as ‖t² − t + 1/6‖² = 1/180),

$$g_3 = 6\sqrt{5}\,\Big(t^2 - t + \frac{1}{6}\Big) = \sqrt{5}\,(6t^2 - 6t + 1)$$

Let's look at the example of P_U(f) for

f(t) = √t

MA212 – Lecture 39 – Tuesday 6 March 2018 page 21



Example of f(t) = √t

$$P(f) = \Big(\int_0^1 t^{1/2}\,dt\Big)\cdot 1 + \sqrt{3}\Big(\int_0^1 t^{1/2}(2t-1)\,dt\Big)\cdot\sqrt{3}\,(2t-1)$$
$$\qquad + \sqrt{5}\Big(\int_0^1 t^{1/2}(6t^2 - 6t + 1)\,dt\Big)\cdot\sqrt{5}\,(6t^2 - 6t + 1)$$

$$P(f) = \frac{2}{3} + 3\Big(\frac{4}{5} - \frac{2}{3}\Big)(2t-1) + 5\Big(\frac{12}{7} - \frac{12}{5} + \frac{2}{3}\Big)(6t^2 - 6t + 1)$$
$$= \frac{6 + 48t - 20t^2}{35}$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 22


Comment

$$\|f - P(f)\|^2 = \min_{a,b,c} \int_0^1 \big(\sqrt{t} - (at^2 + bt + c)\big)^2\,dt$$

since the general element of U has the form at² + bt + c.

MA212 – Lecture 39 – Tuesday 6 March 2018 page 23
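Aside (a numeric check of ours of this minimizing property; the coefficients below are the ones just computed):

```python
# Sketch: E(a,b,c) = integral_0^1 (sqrt(t) - (a t^2 + b t + c))^2 dt,
# checked to be smallest at the projection coefficients (-20/35, 48/35, 6/35).
import math

def E(a, b, c, n=20_000):
    h = 1.0 / n
    s = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        s += (math.sqrt(x) - (a * x * x + b * x + c)) ** 2
    return h * s

best = E(-20 / 35, 48 / 35, 6 / 35)
print(best)                                         # small (about 4e-4)
print(best < E(-20 / 35, 48 / 35, 6 / 35 + 0.01))   # True: perturbing c is worse
```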


Hoping for more

Suppose that {u₁, u₂, ..., uₙ, ...} forms an orthonormal set and
Pₙ projects onto U(n) := Lin{u₁, u₂, ..., uₙ}.
Then
‖P₁(v) − v‖ ≥ ‖P₂(v) − v‖ ≥ ...

Indeed: P₂(v) − v ⊥ U(2) and P₂(v) − P₁(v) ∈ U(2), so

$$\|v - P_1(v)\|^2 = \|v - P_2(v)\|^2 + \|P_2(v) - P_1(v)\|^2 \ge \|v - P_2(v)\|^2.$$

Question:
? ‖Pₙ(v) − v‖ → 0 ?

MA212 – Lecture 39 – Tuesday 6 March 2018 page 24


Sometimes! ...

For V = ℓ² and uᵢ = eᵢ, as then

Pₙ(a) = (a₁, ..., aₙ, 0, 0, 0, ...)

$$\|P_n(a) - a\|^2 = \sum_{j=n+1}^{\infty} a_j^2 \to 0, \quad\text{as } a \in \ell^2.$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 25
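Aside (a concrete check of ours with aⱼ = 1/j): the squared tail Σ_{j>n} aⱼ², which equals ‖Pₙ(a) − a‖², shrinks to 0.

```python
# Sketch: ||P_n(a) - a||^2 = sum_{j>n} 1/j^2 for a_j = 1/j, for growing n.
def tail(n, N=1_000_000):
    return sum(1 / j ** 2 for j in range(n + 1, N + 1))

for n in (10, 100, 1000):
    print(n, tail(n))   # ~0.0952, ~0.00995, ~0.000999: tails shrink like 1/n
```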


But ...Not always!

Example 3 V = ℓ² and u₁ = e₂, u₂ = e₃, ..., uᵢ = e_{i+1}, ...

Pₙ(a) = (0, a₂, ..., a_{n+1}, 0, 0, 0, ...)

$$\|a - P_n(a)\|^2 = \|(a_1, 0, ..., 0, a_{n+2}, a_{n+3}, a_{n+4}, ...)\|^2 = a_1^2 + \sum_{j=n+2}^{\infty} a_j^2 \to a_1^2,$$

i.e.
Pₙ(a) → a − a₁e₁,

which is not equal to a (unless a₁ = 0).

So care is needed!

MA212 – Lecture 39 – Tuesday 6 March 2018 page 26


In the example before this one...

Back there we had

Lin{e₁, e₂, ..., eₙ, ...} = U(1) ∪ U(2) ∪ U(3) ∪ ... = k, and k is dense in ℓ².

But the span in the last Example 3, Lin{e₂, e₃, ...}, is not dense in ℓ².

MA212 – Lecture 39 – Tuesday 6 March 2018 page 27


k is “dense in” ℓ2 :

If a = (a₁, a₂, ...) then Pₙ(a) = (a₁, ..., aₙ, 0, 0, 0, ...) and, as before,

$$\|P_n(a) - a\|^2 = \sum_{j=n+1}^{\infty} a_j^2 \to 0,$$

so ... any point a ∈ ℓ² can be approximated arbitrarily well by
points from k. Thus k is dense in ℓ², but

Lin{e₂, e₃, ..., e_{i+1}, ...} is not dense in ℓ²

MA212 – Lecture 39 – Tuesday 6 March 2018 page 28


However, in C[0, 1]

...in C[0, 1] with

$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt,$$

take an orthonormal sequence of polynomials gₙ built via the
Gram-Schmidt process from 1, t, t², t³, ...; then, under the norm
‖f‖ = √⟨f, f⟩, one has

$$\sum_{i=1}^{n} \langle f, g_i\rangle g_i \to f \quad\text{(convergence in this norm)}.$$

MA212 – Lecture 39 – Tuesday 6 March 2018 page 29


MA212 Further Mathematical Methods Lecture LA 20

Lecture 40: Periodic Functions

(& why Ptolemy Rules OK)

The orthonormal sequence e^{int}, n ∈ Z


Eigenvectors in C[0, 1]

Consider
$$T(f) := \frac{d^2}{dt^2}f,$$
then its eigenvalue equation is

T(f) = λf,

i.e.
f″ − λf = 0 (auxiliary equation: m² − λ = 0)

for λ = α² > 0: f(t) = e^{αt}

for λ = −α² < 0: f(t) = e^{iαt}

MA212 – Lecture 40 – Friday 9 March 2018 page 2


Fourier’s Heat Equation

$$\frac{\partial^2}{\partial x^2} f(x, t) = \frac{\partial}{\partial t} f(x, t).$$

How to solve?
Investigate the problem using eigenvectors in the space C[0, 1],
or better, as we see later, in C[−π, +π].

MA212 – Lecture 40 – Friday 9 March 2018 page 3


Trigonometric Polynomials

These are the functions f(t) into C (the complex numbers) that
are of the form

$$\sum_{k=-n}^{n} a_k e^{ikt} \quad\text{with } i = \sqrt{-1} \ \text{(finite sum!)}$$

For example

e^{i5t} + ie^{−3it} + 7 (this last term is 7e^{0it})

MA212 – Lecture 40 – Friday 9 March 2018 page 4


Reason for the name:

Expressible in terms of cos and sin because

e^{it} = cos t + i sin t

e^{ikt} = cos(kt) + i sin(kt), e^{−ikt} = cos(kt) − i sin(kt),

as cos(−θ) = cos θ and sin(−θ) = −sin θ.

So, using 1/i = −i, eliminating between the displayed equations gives

$$\cos(kt) = \tfrac{1}{2}\big(e^{ikt} + e^{-ikt}\big), \qquad \sin(kt) = \tfrac{-i}{2}\big(e^{ikt} - e^{-ikt}\big)$$

MA212 – Lecture 40 – Friday 9 March 2018 page 5


Function space view

Context: the continuous functions

f : [−π, +π] → C

This context exploits that cos and sin are periodic (period 2π);
so too are trigonometric polynomials:
all satisfy
f(t + 2π) = f(t)

so e.g.
f (−π) = f (π),

by taking t = −π in the preceding equation.

MA212 – Lecture 40 – Friday 9 March 2018 page 6


NB: Conjugates

$$\overline{e^{ikt}} = \overline{\cos(kt) + i\sin(kt)} = \cos(kt) - i\sin(kt) = e^{-ikt}.$$

Henceforth we will work in C[−π, +π]... and we use

$$\langle f, g\rangle = \int_{-\pi}^{+\pi} f(t)\,\overline{g(t)}\,dt$$

MA212 – Lecture 40 – Friday 9 March 2018 page 7


What is ...

$$\int e^{ikt}\,dt\ ?$$

As defined before, we integrate the real and imaginary parts; for k ≠ 0:

$$\int [\cos(kt) + i\sin(kt)]\,dt = \int \cos(kt)\,dt + i\int \sin(kt)\,dt = \frac{1}{k}\sin(kt) - \frac{i}{k}\cos(kt) + C$$
$$= \frac{1}{ik}\,[\cos(kt) + i\sin(kt)] + C = \frac{e^{ikt}}{ik} + C, \quad(\text{as } 1/i = -i)$$

as expected.
MA212 – Lecture 40 – Friday 9 March 2018 page 8
Orthogonality

$$\langle e^{int}, e^{imt}\rangle = \int_{-\pi}^{+\pi} e^{int}\,\overline{e^{imt}}\,dt = \int_{-\pi}^{+\pi} e^{int}e^{-imt}\,dt = \int_{-\pi}^{+\pi} e^{i(n-m)t}\,dt$$

Consider k = n − m ≠ 0; then

$$\langle e^{int}, e^{imt}\rangle = \Big[\frac{e^{ikt}}{ik}\Big]_{-\pi}^{+\pi} = 0, \quad\text{as } \cos(kt)\big|_{t=\pi} = \cos(kt)\big|_{t=-\pi}\ \text{(even function),}$$

whereas sin(kt) = 0 for t = ±π.

MA212 – Lecture 40 – Friday 9 March 2018 page 9


Now take n = m

$$\|e^{int}\|^2 = \langle e^{int}, e^{int}\rangle = \int_{-\pi}^{+\pi} e^{int}e^{-int}\,dt = \int_{-\pi}^{+\pi} 1\,dt = 2\pi.$$

So
$$\Big\{\frac{1}{\sqrt{2\pi}}\,e^{int} : n \in \mathbb{Z}\Big\} \text{ is an orthonormal set in } C[-\pi, +\pi].$$

So we can project f ∈ C[−π, +π] as in Lecture 39 onto the finite-dimensional subspace

$$\mathrm{Lin}\Big\{\frac{1}{\sqrt{2\pi}}\,e^{ikt} : k = -n, -(n-1), ..., -1, 0, +1, ..., n\Big\}.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 10
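Aside (a numeric spot-check of ours; the division by √(2π) is omitted, so the diagonal value is 2π):

```python
# Sketch: <e^{int}, e^{imt}> = integral_{-pi}^{pi} e^{i(n-m)t} dt, numerically.
import cmath
import math

def ip(n, m, N=100_000):
    h = 2 * math.pi / N
    s = 0.0 + 0.0j
    for j in range(N):
        t = -math.pi + (j + 0.5) * h
        s += cmath.exp(1j * (n - m) * t)
    return s * h

print(abs(ip(3, 5)))   # ≈ 0
print(ip(4, 4).real)   # ≈ 2π ≈ 6.2832
```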


This yields

$$P_n(f) = \sum_{k=-n}^{n}\Big\langle f, \frac{e^{ikt}}{\sqrt{2\pi}}\Big\rangle\frac{e^{ikt}}{\sqrt{2\pi}} = \sum_{k=-n}^{n}\frac{1}{2\pi}\,\langle f, e^{ikt}\rangle\, e^{ikt}.$$

Here
$$a_k(f) = \frac{1}{2\pi}\,\langle f, e^{ikt}\rangle = \frac{1}{2\pi}\int_{-\pi}^{+\pi} f(t)e^{-ikt}\,dt$$

are the Fourier coefficients of f.

For f : [−π, +π] → R we can also rewrite this as

$$P_n(f) = \alpha_0 + \sum_{k=1}^{n}\big[\beta_k\sin(kt) + \gamma_k\cos(kt)\big];$$

here α₀ and the βₖ, γₖ are real, and we get a real-valued function!


MA212 – Lecture 40 – Friday 9 March 2018 page 11
Example:

Define
f(t) := |t| for t ∈ [−π, +π],

and periodically elsewhere.

For α ≠ 0, we need to compute (integrating by parts with u = t and dv = e^{αt} dt)

$$\int t e^{\alpha t}\,dt = t\,\frac{e^{\alpha t}}{\alpha} - \int \frac{e^{\alpha t}}{\alpha}\,dt = t\,\frac{e^{\alpha t}}{\alpha} - \frac{e^{\alpha t}}{\alpha^2} + C$$

Note that for integer k:

$$e^{ik\pi} = \cos k\pi + i\sin k\pi = \begin{cases} +1, & k \text{ even} \\ -1, & k \text{ odd} \end{cases}$$
MA212 – Lecture 40 – Friday 9 March 2018 page 12
Illustration

[Figure: the 2π-periodic extension of f(t) = |t|, a triangle wave, plotted for t from −2π to 3π.]

MA212 – Lecture 40 – Friday 9 March 2018 page 13


Now compute Fourier coefficients

We use the first formula on slide 11, ignoring the 1/2π.

First, for k = 0,
$$\int_{-\pi}^{+\pi} |t|\,dt = 2\int_0^{+\pi} t\,dt = \big[t^2\big]_0^{+\pi} = \pi^2.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 14


And for k ≠ 0,

$$\int_{-\pi}^{+\pi} |t|e^{-ikt}\,dt = \int_0^{+\pi} te^{-ikt}\,dt + \int_{-\pi}^{0} \underbrace{(-t)}_{=|t|}\,e^{-ikt}\,dt = \int_0^{+\pi} te^{-ikt}\,dt - \int_{-\pi}^{0} te^{-ikt}\,dt$$

$$= \Big[\frac{it}{k}\,e^{-ikt} + \frac{e^{-ikt}}{k^2}\Big]_0^{\pi} - \Big[\frac{it}{k}\,e^{-ikt} + \frac{e^{-ikt}}{k^2}\Big]_{-\pi}^{0} = \{0\} - \frac{2}{k^2} + \frac{1}{k^2}\big\{e^{-ik\pi} + e^{+ik\pi}\big\}$$

$$= \begin{cases} 0, & k \text{ even (and } k \ne 0) \\ -\dfrac{4}{k^2}, & k \text{ odd} \end{cases}$$

(the {0} records that the it/k terms cancel). In the above remember that e^{+ikπ} = e^{−ikπ} for integer k.

MA212 – Lecture 40 – Friday 9 March 2018 page 15
So

$$a_k(f) = \begin{cases} \pi/2, & k = 0 \\ 0, & k \text{ even and } k \ne 0 \\ -\dfrac{2}{\pi k^2}, & k \text{ odd} \end{cases}$$

and so:
$$P_n(f) = \frac{\pi}{2} - \frac{2}{\pi}\sum_{k \text{ odd}} \frac{1}{k^2}\,e^{ikt} \quad(\text{but } (-k)^2 = k^2 \text{, so...})$$
$$= \frac{\pi}{2} - \frac{2}{\pi}\sum_{k=1,\ k \text{ odd}}^{n} \frac{1}{k^2}\,(e^{ikt} + e^{-ikt}) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=1,\ k \text{ odd}}^{n} \frac{\cos(kt)}{k^2},$$

because e^{ikt} + e^{−ikt} = cos(kt) + i sin(kt) + cos(kt) − i sin(kt) = 2 cos(kt).


MA212 – Lecture 40 – Friday 9 March 2018 page 16
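Aside (a numeric sanity check of ours of these closed forms, using a simple midpoint rule for the coefficient integral):

```python
# Sketch: a_k(f) = (1/2pi) * integral_{-pi}^{pi} f(t) e^{-ikt} dt for f(t) = |t|,
# compared against pi/2 (k=0), 0 (k even), -2/(pi k^2) (k odd).
import cmath
import math

def a_k(f, k, n=100_000):
    h = 2 * math.pi / n
    s = 0.0 + 0.0j
    for j in range(n):
        t = -math.pi + (j + 0.5) * h
        s += f(t) * cmath.exp(-1j * k * t)
    return s * h / (2 * math.pi)

print(a_k(abs, 0).real, math.pi / 2)              # both ≈ 1.5708
print(a_k(abs, 3).real, -2 / (math.pi * 3 ** 2))  # both ≈ -0.0707
print(abs(a_k(abs, 2)))                           # ≈ 0
```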
Leading question

$$P_n(f) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k=1,\ k \text{ odd}}^{n} \frac{\cos(kt)}{k^2} \to |t| \ \text{ as } n \to \infty \text{ for } t \in [-\pi, +\pi]?$$

Puzzle!

MA212 – Lecture 40 – Friday 9 March 2018 page 17


Fourier’s Theorem

For f ∈ C[−π, +π]:

(i) (cf. last slide of Lecture 39)

$$\Big\|f - \sum_{k=-n}^{n} a_k(f)e^{ikt}\Big\| \to 0$$

(ii)
$$\|f\|^2 = 2\pi\sum_{k=-\infty}^{\infty} |a_k(f)|^2$$

(iii)
$$f(t) = \sum_{k=-\infty}^{\infty} a_k(f)e^{ikt} \quad\text{for } -\pi < t < +\pi$$

Danger at t = ±π.
MA212 – Lecture 40 – Friday 9 March 2018 page 18
To get (ii) from (i)

... note that

$$\Big\|\sum_{k=-n}^{n} a_k(f)e^{ikt}\Big\|^2 = \Big\langle \sum_{k=-n}^{n} a_k(f)e^{ikt},\ \sum_{h=-n}^{n} a_h(f)e^{iht} \Big\rangle$$
$$= \sum_{k=-n}^{n}\sum_{h=-n}^{n} a_k(f)\,\overline{a_h(f)}\,\langle e^{ikt}, e^{iht}\rangle = \sum_{k=-n}^{n} a_k(f)\,\overline{a_k(f)}\,(2\pi) = 2\pi\sum_{k=-n}^{n} |a_k(f)|^2,$$

MA212 – Lecture 40 – Friday 9 March 2018 page 19


In the limit...

Now pass to the limit using (i):

$$2\pi\sum_{k=-n}^{n}|a_k(f)|^2 = \Big\|\underbrace{\Big(\sum_{k=-n}^{n} a_k(f)e^{ikt} - f\Big)}_{\to 0} + f\,\Big\|^2 \to \|0 + f\|^2 = \|f\|^2.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 20


Computing π 2

By the Theorem: for −π < t < +π

$$|t| = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k\ge 1,\ k \text{ odd}} \frac{\cos(kt)}{k^2};$$

take t = 0; then

$$0 = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k>0,\ k \text{ odd}} \frac{1}{k^2}.$$

MA212 – Lecture 40 – Friday 9 March 2018 page 21


So ..

So
$$\sum_{k>0,\ k \text{ odd}} \frac{1}{k^2} = \frac{\pi^2}{8},$$

i.e.
$$1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + ... = \frac{\pi^2}{8}.$$

How about all k? What is this sum:
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + ...$$
2 3 4

MA212 – Lecture 40 – Friday 9 March 2018 page 22


A brainwave:

Every positive integer = 2^ℓ × (an odd number), for some ℓ = 0, 1, 2, ...

For odd part 1:
{2^ℓ × 1} = 1, 2, 4, 8, 16, ..., 2^ℓ, ... (ℓ = 0, 1, 2, ...)

For odd part 3:
{2^ℓ × 3} = 3 × {1, 2, 4, 8, 16, ..., 2^ℓ, ...} (ℓ = 0, 1, 2, ...)

And

$$\frac{1}{(2^{\ell})^2} = \frac{1}{4^{\ell}}, \qquad \sum_{\ell=0}^{\infty}\frac{1}{4^{\ell}} = \frac{1}{1 - \frac{1}{4}} = \frac{4}{3} \quad\Big(\text{common ratio} = \frac{1}{4}\Big)$$

MA212 – Lecture 40 – Friday 9 March 2018 page 23


Claim


$$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$$

i.e.
$$1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + ... = \frac{\pi^2}{6}$$

MA212 – Lecture 40 – Friday 9 March 2018 page 24


Idea...

Re-organize the sum as a sum of sums, each odd reciprocated power factorized out:

$$\sum_{k=1}^{\infty}\frac{1}{k^2} = 1\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) + \frac{1}{3^2}\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) + \frac{1}{5^2}\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) + ...$$

$$= \Big(1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + ...\Big)\times\Big(1 + \frac{1}{2^2} + \frac{1}{4^2} + ...\Big) = \frac{\pi^2}{8}\times\frac{4}{3} = \frac{\pi^2}{6}$$

MA212 – Lecture 40 – Friday 9 March 2018 page 25
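Aside (a quick numeric check of ours of the two sums used above):

```python
# Sketch: partial sums of sum 1/k^2 over odd k and over all k,
# against pi^2/8 and pi^2/6.
import math

odd_sum = sum(1 / k ** 2 for k in range(1, 200_001, 2))
all_sum = sum(1 / k ** 2 for k in range(1, 200_001))

print(odd_sum, math.pi ** 2 / 8)  # both ≈ 1.2337
print(all_sum, math.pi ** 2 / 6)  # both ≈ 1.6449
```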
