Linear Algebra Notes: Eigenvectors & Diagonalization
Chapter 1
Eigenvectors and Diagonalization
1.1 Prologue
If T : Rn → Rn is a linear mapping, then T may be represented as an n × n
matrix M . But this involves a choice of basis on Rn , and M changes if you
choose a different basis. The big theme of this chapter is that a suitable choice
of basis makes M diagonal, and it is then easy to do calculations.
For example, consider the system of ODEs
ẋ = −x + 4y,
ẏ = −2x + 5y.
Definition. A scalar λ ∈ C is an eigenvalue of the n × n matrix A, with corresponding eigenvector v ≠ 0, if
Av = λv.
(Both the Greek letter, and the hybrid German-English words, are traditional.)
Here we are thinking of v as a column-vector. In particular,
0 = Av − λv = (A − λI)v.
has characteristic polynomial pA (t) = −(t + 2)(t − 3)(t − 1), and so the eigen-
values are λ = −2, 3, 1.
has characteristic polynomial pA (t) = −(t + 2)(t − 3)(t − 1), and so the eigen-
values are λ = −2, 3, 1, as before.
Example 3. Find the eigenvalues of the matrix
A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.
Solution. We calculate pA(t) = det(A − tI) = t² − 1 = (t − 1)(t + 1), so the eigenvalues are λ = ±1.
Example 4. The eigenvectors are v1 = (1, 2, 1), v2 = (1, 1, 1) and v3 = (2, 0, 1).
Multiple eigenvalues. If we have a polynomial p(t) of degree N, the following decomposition holds:
p(t) = c (t − λ1)^{k1} (t − λ2)^{k2} · · · (t − λp)^{kp},
where c ∈ C is non-zero. The roots λi ∈ C are all distinct, i.e. λi ≠ λj when i ≠ j. The multiplicity of the root λi is ki ∈ N, with ki ≥ 1 and \sum_{i=1}^{p} ki = N.
If an eigenvalue λ has multiplicity k ≥ 1, this means that the characteristic polynomial takes the form
pA(t) = (t − λ)^k Q(t),
where the polynomial Q(t) has degree N − k and is non-vanishing at λ, i.e. Q(λ) ≠ 0.
There are p linearly-independent eigenvectors corresponding to λ, where 1 ≤
p ≤ k. The eigenspace Vλ is the p-dimensional vector space spanned by these
eigenvectors.
Since the eigenspace Vλ is generated by all the eigenvectors associated with
the eigenvalue λ we can also write it as
Vλ = Ker (A − λI) ,
where the kernel, Ker, of a linear operator B is a subspace of the full vector
space V given by
Ker B = {v ∈ V s.t. Bv = 0} .
The simplest examples are the following.
• The 2 × 2 matrix A = I has eigenvalue λ = 1 (twice), and two independent
eigenvectors: the eigenspace is 2-dimensional.
• By contrast,
A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
also has eigenvalue λ = 1 (twice), but v = (1, 0) is the only eigenvector (up to
scalar multiplication): the eigenspace is 1-dimensional, namely span{v}.
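These two examples are easy to probe numerically. Below is a short sketch (not part of the original notes; it assumes numpy is available) computing the dimension of the eigenspace as dim Ker(A − I) = 2 − rank(A − I) for both matrices.

```python
import numpy as np

I2 = np.eye(2)                 # eigenvalue 1 twice, eigenspace 2-dimensional
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # eigenvalue 1 twice, eigenspace 1-dimensional

for A in (I2, J):
    # geometric multiplicity of lambda = 1: dim Ker(A - I) = 2 - rank(A - I)
    print(2 - np.linalg.matrix_rank(A - np.eye(2)))   # prints 2, then 1
```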
Theorem. (The Cayley-Hamilton theorem). Let A be a square matrix with
characteristic polynomial p(t). Then p(A) = 0, as a matrix equation.
Partial proof. We show this for 2 × 2 matrices only, by direct calculation. (Or just check it for A = σ1.) Let
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
Then pA(t) = det(A − tI) = t² − (a + d)t + (ad − bc), and a direct computation confirms that
A² − (a + d)A + (ad − bc)I = 0.
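As a quick numerical sanity check (a sketch, not from the notes), the 2 × 2 identity A² − (a + d)A + (ad − bc)I = 0 can be verified for any test matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # an arbitrary 2x2 test matrix
tr, det = np.trace(A), np.linalg.det(A)
# Cayley-Hamilton in the 2x2 case: A^2 - (tr A) A + (det A) I = 0
print(A @ A - tr * A + det * np.eye(2))   # numerically the zero matrix
```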
For completeness, here is the general argument.
Complete proof.
Let us use a flag basis for A, i.e. a basis {v1, . . . , vN} such that A is upper triangular:
A = \begin{pmatrix} a11 & a12 & \cdots & a1N \\ 0 & a22 & \cdots & a2N \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & aNN \end{pmatrix}. (1.1)
We use the result (without proving it) that for any linear transformation T we
can find a flag basis and the matrix representation A of the linear transformation
T in this basis takes precisely the form above.
Because of the upper triangular form, if we apply A to the vector basis vi we
obtain
Avi = aii vi + wi,
with wi ∈ Vi−1 = Span{v1, . . . , vi−1}. So
(A − aii I) vi ∈ Vi−1.
In particular (A − a11 I)v1 = 0. We now argue by induction on i: any v ∈ Vi can be written as
v = αvi + w,
with α ∈ C and w ∈ Vi−1. From the upper triangular property it follows that (A − aii I)w ∈ Vi−1, since Aw ∈ Vi−1, and also (A − aii I)αvi ∈ Vi−1, so we deduce that
(A − aii I)v = w′ ∈ Vi−1 for all v ∈ Vi.
At this point we can use the inductive hypothesis, since we know that for all w′ ∈ Vi−1:
(A − a11 I)(A − a22 I) · · · (A − a_{i−1,i−1} I)w′ = 0,
which concludes the proof.
Roughly speaking, similar matrices are ones which are equivalent to each other
via some change of basis. The square matrix A is said to be diagonalizable if A
is similar to a diagonal matrix (which will then obviously have the eigenvalues
of A as its diagonal entries). Here is a summary of some facts about similarity
and diagonalizability. Note that if we allow complex roots (as we do), then the
n × n matrix A has n eigenvalues, but they may not all be distinct.
(a) For A and B to be similar, they must have the same eigenvalues. This follows easily from the Proposition.
(b) If A and B have the same eigenvalues, and both are diagonalizable,
then A ∼ B. Being diagonalizable means that A = M −1 D1 M and
B = N −1 D2 N for some diagonal matrices D1 , D2 . But since they have
the same eigenvalues it means that D1 ∼ D2 , so by the transitivity of ∼
it follows that A ∼ B.
Below, we prove items (d) and (e); item (f) will crop up in a later chapter.
AM = [λ1 v1 λ2 v2 . . . λn vn ]
= [v1 v2 . . . vn ] diag[λ1 λ2 . . . λn ]
= M D.
• λ = 1:
0 = (A − I)v = \begin{pmatrix} −2 & 4 \\ −2 & 4 \end{pmatrix} \begin{pmatrix} v1 \\ v2 \end{pmatrix},
giving the eigenvector v = (2, 1).
• λ = 3:
0 = (A − 3I)v = \begin{pmatrix} −4 & 4 \\ −2 & 2 \end{pmatrix} \begin{pmatrix} v1 \\ v2 \end{pmatrix},
giving the eigenvector v = (1, 1).
Therefore we set
M = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, M⁻¹ = \begin{pmatrix} 1 & −1 \\ −1 & 2 \end{pmatrix},
and then
M⁻¹AM = \begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}.
Remark. The construction gives you M ; if you want to check your answer
without computing M −1 , just check that AM = M D, where D is the diagonal
matrix of eigenvalues (in the right order).
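Here is that check carried out in numpy for the example above (an illustrative sketch, not part of the notes):

```python
import numpy as np

A = np.array([[-1.0, 4.0],
              [-2.0, 5.0]])
M = np.array([[2.0, 1.0],        # columns: eigenvectors for lambda = 1 and 3
              [1.0, 1.0]])
D = np.diag([1.0, 3.0])
print(np.allclose(A @ M, M @ D))        # True: AM = MD, no inverse needed
print(np.linalg.inv(M) @ A @ M)         # diag(1, 3), as computed above
```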
Example 2. Solve the system of ODEs
ẋ = −x + 4y,
ẏ = −2x + 5y.
Solution. In matrix form this is u̇ = Au, with u = (x, y) and A the matrix diagonalized above. Setting u = Ms decouples the system into ṡ1 = s1 and ṡ2 = 3s2, so s1 = c1 e^t and s2 = c2 e^{3t}, and transforming back gives
x(t) = 2c1 e^t + c2 e^{3t}, y(t) = c1 e^t + c2 e^{3t}.
(At this stage, you should substitute these answers back into the ODEs to check the calculation!)
Example 3. If possible, diagonalize the matrix
A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.
Here λ = 1 is a double eigenvalue, but the eigenspace is only one-dimensional, spanned by (1, 0); so A is not diagonalizable.
Note that the above remarks are valid only when we allow ourselves to work over the complex numbers! In particular, eigenvalues may be complex even when the matrix is real.
Example 4. Let us try to diagonalize
A = \begin{pmatrix} −2 & −1 \\ 5 & 2 \end{pmatrix}.
Its characteristic polynomial is pA(t) = t² + 1, so the eigenvalues λ = ±i are complex even though A is real.
with ωp a k-th root of unity, i.e. ωp = e^{2πip/k} ∈ C with ωp^k = 1, we still have B1^k = A.
Chapter 2
Inner-product spaces
2.1 Definitions and examples
Definition A bilinear form B on a real vector space V is a mapping B :
V × V → R assigning to each pair of vectors u, v ∈ V a real number B(u, v)
satisfying the following for all u, v, w ∈ V and all λ ∈ R:
(i) B(u + v, w) = B(u, w) + B(v, w) ,
(ii) B(w, u + v) = B(w, u) + B(w, v) ,
(iii) B(λu, v) = B(u, λv) = λB(u, v) .
If we pick V = Rn and we use the basis {v1, ..., vn}, we can express a bilinear form B on V as
B(u, w) = \sum_{i=1}^{n} \sum_{j=1}^{n} yi Aij xj,
where y and x are the coordinate vectors of u and w respectively, and Aij = B(vi, vj).
Conditions (i) and (iii) constitute linearity in the first factor, while (ii) and (iii) give linearity in the second; together they make B bilinear, ie. linear in each factor separately.
A bilinear form B, which is also positive and symmetric, defines an inner
product. A real inner-product space is a real vector space equipped with an
inner product.
The bilinearity of the inner product is equivalent to the statement that the transformation
(·, v) : V → R
defined by u ↦ (u, v) is linear for every v ∈ V (together, of course, with the symmetry).
Examples.
(a) V = Rn with the standard inner product: (u, v) = u · v = u1v1 + · · · + unvn.
(h) V = C[a, b], the vector space of continuous functions f : [a, b] → R. Recall
that V is infinite-dimensional, and that the zero-vector is the function f
which is identically zero. Define
(f, g) = \int_a^b f(t) g(t) dt.
(This scalar product is precisely the scalar product inducing the so called
L2 norm over C[a, b].) Note that this is symmetric and linear in each factor.
Also
(f, f) = \int_a^b f(t)² dt ≥ 0,
and if f(t) is not identically zero on [a, b] then (f, f) > 0. For in this case there exists t0 ∈ [a, b] such that f(t0) = c ≠ 0 and hence, using the continuity of f, there is a small neighbourhood (t0 − δ, t0 + δ), with δ > 0, around t0 where f(t)² ≥ c²/4, so integrating gives (f, f) = \int_a^b f(t)² dt > 0.
Conclusion: this is an inner product.
(j) Vector space of Polynomials Let R[x]n denote the space of polynomials
in x, of degree at most n, with real coefficients, i.e. R[x]n = {a0 + ... +
an xn with ai ∈ R}. This is a vector space of dimension n+1. For any interval
[a, b], it is a subspace of the infinite-dimensional space C[a, b] above. The
form (f, g) = \int_a^b f(t) g(t) dt is one choice of inner product. A natural basis is {1, x, . . . , x^{n−1}, x^n}; so if
p(x) = pn x^n + · · · + p0 and q(x) = qn x^n + · · · + q0,
then with a = 0, b = 1 and n = 1 we get
(p, q) = p0 q0 + (1/2) p1 q0 + (1/2) p0 q1 + (1/3) p1 q1.
Matrix version.
Definition We say that an n × n matrix B is symmetric if
B t = B,
anti-symmetric or skew-symmetric if
B t = −B.
A symmetric matrix B is positive-definite if all its eigenvalues are positive. (We will see later on that every symmetric matrix is also diagonalizable.) An al-
ternative, equivalent, criterion for B to be positive-definite is that all n of its
upper-left square submatrices have positive determinant. (These subdetermi-
nants are called the leading principal minors of B.)
Every matrix B can be decomposed into its symmetric and anti-symmetric
parts simply via
B = (B + B^t)/2 + (B − B^t)/2 = B+ + B−;
clearly B+ = (B + B^t)/2 is symmetric, while B− = (B − B^t)/2 is anti-symmetric.
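A minimal numpy sketch of this decomposition (illustrative only):

```python
import numpy as np

B = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 5.0],
              [4.0, 1.0, 2.0]])
B_plus = (B + B.T) / 2        # symmetric part
B_minus = (B - B.T) / 2       # anti-symmetric part
assert np.allclose(B, B_plus + B_minus)
assert np.allclose(B_plus, B_plus.T)
assert np.allclose(B_minus, -B_minus.T)
```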
Proposition If V = Rn, a bilinear form B defines an inner product on V if and only if its matrix representation (which we denote by the same name B) is a symmetric matrix, i.e. its anti-symmetric part B− must vanish, and, to ensure the positivity condition of the inner product, B is a positive-definite n × n matrix.
If we think of v, w ∈ V as column vectors, written in some basis, then an inner product (·, ·) is simply (v, w) = w^t B v, with B a symmetric and positive-definite matrix.
For example (e) above, we have
B = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix};
the two subdeterminants are 1 and 0, so B is not positive-definite.
For example (g) above, we have
B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & −2 \\ 0 & −2 & k \end{pmatrix};
the leading principal minors are 2, 2 and 2(k − 4), so B is positive-definite if and only if k > 4.
(d) V = C² and ⟨z, w⟩ = 3 z1 w̄1 + 2 z2 w̄2 + i z1 w̄2 − i z2 w̄1. Note that (i) is satisfied, and (ii), (iii) are satisfied since the right-hand side is complex-linear in z1, z2. For (iv), we have
⟨z, z⟩ = 3|z1|² + 2|z2|² − 2 Im(z1 z̄2) ≥ 3|z1|² + 2|z2|² − 2|z1||z2| > 0 for z ≠ 0.
Matrix version.
Definition We say that a matrix B is hermitian if B ∗ = B. Here B ∗ denotes
the complex-conjugate transpose of B (take the complex conjugate of all the
entries in B and transpose the matrix)
(B*)ij = \overline{Bji},
where z = Re(z)−i Im(z) denotes the complex conjugate of the complex number
z = Re(z) + i Im(z).
Similarly B is an anti-hermitian matrix if B ∗ = −B. Note that sometimes
B ∗ is denoted with B † .
As for the real case, every matrix can be decomposed as a sum of two matrices,
one hermitian and the other anti-hermitian:
B = (B + B*)/2 + (B − B*)/2 = B+ + B−,
where B+ = (B + B*)/2 is hermitian and B− = (B − B*)/2 is anti-hermitian.
If V = Cⁿ, then an inner product on V corresponds to an hermitian positive-definite n × n matrix B, as follows. Think of v, w ∈ V as column vectors in some basis, and define ⟨v, w⟩ = w*Bv.
The matrix B can be reconstructed, as for the real case seen before, from
the hermitian product evaluated on the basis vectors. In particular if we take
{v1 , ..., vn } as a basis we have
Bij = ⟨vj, vi⟩.
(Beware: in the complex case it is hermiticity of B that is needed, not symmetry; a complex symmetric matrix need not be hermitian.)
this for all a ∈ R (or C) and all v, u ∈ V. Note that from (i) we have ||0|| = 0
and ||v|| = || − v||, so from the triangle inequality it follows that
||v|| ≥ 0 ∀v ∈ V (non-negativity) .
The triangle inequality takes its name from the well known fact that in a
triangle the sum of the lengths of any two sides must be greater than or equal
to the length of the remaining side.
Examples.
(a) The only unit vectors in R with the standard inner product are ±1; but in
C with the standard inner product, eiθ = cos(θ) + i sin(θ) is a unit vector
for any θ ∈ [0, 2π).
(b) In C² with the standard hermitian inner product, find all the unit vectors v which are orthogonal to u = (i, 1). Solution: ⟨v, u⟩ = −iv1 + v2, so v = (c, ic) for some c ∈ C. Then ||v||² = 2|c|², so to get a unit vector we must impose |c| = 1/√2. The solution is v = (1, i)e^{iθ}/√2 for θ ∈ [0, 2π).
In the real case, we would like to define the angle θ between u and v by (u, v) = ||u|| ||v|| cos θ; but is this angle well-defined? The Cauchy-Schwarz inequality guarantees that it is.
Theorem (the real Cauchy-Schwarz inequality). (u, v)² ≤ (u, u)(v, v), with equality if and only if u and v are linearly dependent. Equivalently, |(u, v)| ≤ ||u|| ||v||.
Proof. This is trivial if u = 0, so suppose that u ≠ 0. Then for all real numbers x we have
||xu − v||² = x²||u||² − 2x(u, v) + ||v||² ≥ 0,
so this quadratic in x has non-positive discriminant: 4(u, v)² − 4||u||²||v||² ≤ 0, which is the stated inequality. Equality occurs exactly when some x gives xu = v.
(||u|| + ||v||)² − ||u + v||² = ||u||² + 2||u|| ||v|| + ||v||² − ||u||² − 2(u, v) − ||v||²
= 2[||u|| ||v|| − (u, v)]
≥ 0.
Thanks to the Corollary we see that, whenever we have a scalar product (·, ·) on a real vector space V, the function ||·|| : V → R given by ||v|| = √(v, v) defines a norm on V; we say that this is the norm induced by the inner product. Note however that our definition of norm is more general and does not rely on the presence of an inner product: there are norms which are not induced by any inner product.
Corollary. For a real inner-product space, the angle θ ∈ [0, π] between u and v, given by (u, v) = ||u|| ||v|| cos θ, is well-defined, since from the Cauchy-Schwarz inequality we know that −||u|| ||v|| ≤ (u, v) ≤ ||u|| ||v||. Clearly if u and v are orthogonal vectors then (u, v) = 0 and the angle between them is θ = π/2, as expected. If u and v are parallel to one another, i.e. linearly dependent with v = xu and x > 0, the angle is θ = 0, while if they are anti-parallel, i.e. linearly dependent with v = xu and x < 0, the angle is θ = π.
An orthonormal basis of V is a basis consisting of mutually orthogonal unit
vectors. More on this later on.
Examples.
(b) V = R2 with inner product (u, v) = 4u1 v1 +u1 v2 +u2 v1 +u2 v2 . Let u = (1, 0)
and v = (0, 1). Then ||u|| = 2, ||v|| = 1, and cos θ = 1/2, so θ = π/3.
(c) V = C[−1, 1] with the inner product (f, g) = \int_{−1}^{1} f(t) g(t) dt. Then ||1|| = √2, ||t|| = √(2/3), and 1, t are orthogonal. However, (1, t²) = \int_{−1}^{1} t² dt = 2/3, so 1, t² are not orthogonal.
(d) V = C[0, 1] with the inner product (f, g) = \int_0^1 f(t) g(t) dt. Then ||1|| = 1, ||t|| = √(1/3), and (1, t) = 1/2. If θ is the angle between 1 and t, then cos θ = √3/2, and so θ = π/6.
and at this point we can apply the same argument we used for the real Cauchy-Schwarz inequality: for the above expression to be non-negative for all λ ∈ R, the discriminant of this quadratic expression must be non-positive.
Finally we can show that the norm induced by an hermitian inner product
satisfies the triangle inequality.
Corollary: the Triangle Inequality. For all u, v ∈ V , we have ||u + v|| ≤
||u|| + ||v||.
Proof:
||u + v||² = ||u||² + ⟨u, v⟩ + ⟨v, u⟩ + ||v||² ≤ ||u||² + 2|⟨u, v⟩| + ||v||² ≤ (||u|| + ||v||)²,
where we used the fact that ⟨u, v⟩ + ⟨v, u⟩ = 2 Re⟨u, v⟩, from the hermiticity property of the inner product, then Re⟨u, v⟩ ≤ |⟨u, v⟩|, and finally the Cauchy-Schwarz inequality |⟨u, v⟩| ≤ ||u|| ||v||.
We were thus justified in calling √⟨v, v⟩ the norm induced by the inner product ⟨ , ⟩, since it satisfies the defining properties (i)-(ii)-(iii) of a norm.
Then
(ṽr+1, u1) = · · · = (ṽr+1, ur) = 0,
and ṽr+1 ≠ 0 since vr+1 ∉ span{u1, . . . , ur} = span{v1, . . . , vr}.
Define ur+1 = ṽr+1/||ṽr+1||. Then u1, . . . , ur+1 are mutually-orthogonal unit vectors, and span{u1, . . . , ur+1} = span{v1, . . . , vr+1}.
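The inductive step above is exactly the Gram-Schmidt process. Here is a minimal Python sketch (not from the notes; the function name gram_schmidt and the tolerance are my own choices) that orthonormalizes a list of vectors with respect to a given inner product:

```python
import numpy as np

def gram_schmidt(vectors, inner=np.dot):
    """Orthonormalize `vectors` with respect to the inner product `inner`."""
    basis = []
    for v in vectors:
        # subtract the projections onto the vectors found so far
        w = v - sum(inner(v, u) * u for u in basis)
        norm = np.sqrt(inner(w, w))
        if norm > 1e-12:          # skip linearly dependent vectors
            basis.append(w / norm)
    return basis

# Example (a) below: v1 = (1,1,0), v2 = (1,0,1), v3 = (0,1,1)
vs = [np.array(v, dtype=float) for v in [(1, 1, 0), (1, 0, 1), (0, 1, 1)]]
for u in gram_schmidt(vs):
    print(u)
```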
Examples.
(a) V = R³ with the standard inner product; apply Gram-Schmidt to v1 = (1, 1, 0), v2 = (1, 0, 1), v3 = (0, 1, 1).
Solution:
u1 = (1/√2)(1, 1, 0),
ṽ2 = v2 − (v2, u1) u1 = (1, 0, 1) − (1/2)(1, 1, 0) = (1/2)(1, −1, 2),
u2 = (1/√6)(1, −1, 2),
ṽ3 = v3 − (v3, u1) u1 − (v3, u2) u2 = (0, 1, 1) − (1/2)(1, 1, 0) − (1/6)(1, −1, 2) = (2/3)(−1, 1, 1),
u3 = (1/√3)(−1, 1, 1).
Remember, there is a simple way to check your answer — always do so!
(b) V = R² with ||v||² = 4x1² + 2x1x2 + x2² for v = (x1, x2), v1 = (1, 0) and v2 = (0, 1).
Solution. The first step is to deduce the inner product from the norm (this can always be done when the norm is induced by an inner product). In this case, it is clear that (u, v) = 4u1v1 + u1v2 + u2v1 + u2v2. Applying Gram-Schmidt, we have
u1 = (1/2)(1, 0),
ṽ2 = v2 − (v2, u1) u1 = (0, 1) − (1/4)(1, 0) = (1/4)(−1, 4),
u2 = (1/(2√3))(−1, 4).
(c) V = C[−1, 1] with the inner product (f, g) = \int_{−1}^{1} f(t) g(t) dt, and f1 = 1, f2 = t, f3 = t². Find orthonormal {g1, g2, g3}.
Solution. We previously computed ||f1 ||2 = 2, ||f2 ||2 = 2/3 and (f1 , f2 ) = 0.
So we immediately get g1 = 1/√2 and g2 = √(3/2) t. Then
(f3, g1) = (1/√2) \int_{−1}^{1} t² dt = √2/3,
(f3, g2) = √(3/2) \int_{−1}^{1} t³ dt = 0,
f̃3 = f3 − (f3, g1) g1 − (f3, g2) g2 = t² − (√2/3)(1/√2) = t² − 1/3,
||f̃3||² = \int_{−1}^{1} (t⁴ − (2/3)t² + 1/9) dt = 2/5 − (2/3)(2/3) + 2/9 = 2/5 − 2/9 = 8/45,
g3 = (3√5/(2√2)) (t² − 1/3).
(d) V = C² with the hermitian inner product ⟨z, w⟩ of example (d) above; apply Gram-Schmidt to v1 = (1, 0) and v2 = (0, 1).
Solution:
||v1||² = ⟨v1, v1⟩ = 3,
u1 = (1/√3)(1, 0),
ṽ2 = v2 − ⟨v2, u1⟩u1 = (0, 1) − (1/3)(−i)(1, 0) = (1/3)(i, 3),
||ṽ2||² = 5/3,
u2 = (1/√15)(i, 3).
(e) Take V = R³ with the standard inner product. Find an orthonormal basis for the subspace U given by x1 + x2 + x3 = 0, and extend it to an orthonormal basis for R³.
Solution. A basis for U is v1 = (1, −1, 0) and v2 = (0, 1, −1). Apply Gram-Schmidt to this, giving
u1 = (1/√2)(1, −1, 0),
ṽ2 = (0, 1, −1) − (−1/2)(1, −1, 0) = (1/2)(1, 1, −2),
u2 = (1/√6)(1, 1, −2).
So {u1, u2} is an orthonormal basis for U. To extend it, take v3 = (0, 0, 1), say, and apply Gram-Schmidt one more time:
ṽ3 = (0, 0, 1) − (0, 0, 0) − (−2/6)(1, 1, −2) = (1/3)(1, 1, 1),
u3 = (1/√3)(1, 1, 1).
Now {u1 , u2 , u3 } is an orthonormal basis for V .
u = (v, u1 ) u1 + · · · + (v, uk ) uk , ũ = v − u .
Clearly u ∈ U; and for each ui note that (u, ui) = (v, ui)(ui, ui) = (v, ui). So
for each ui we have
(ũ, ui ) = (v − u, ui )
= (v, ui ) − (u, ui )
= (v, ui ) − (v, ui )
= 0.
Hence ũ ∈ U ⊥ .
Given a finite-dimensional subspace U of the inner-product space (V, ( , )), we can consider the orthogonal projection operator P onto U, defined by
P(v) = (v, u1) u1 + · · · + (v, uk) uk,
where {u1, . . . , uk} is an orthonormal basis of U. This is a linear map P : V → V, and it satisfies the following properties:
• P is a linear operator;
• P is idempotent, i.e. P 2 = P ;
• (I − P )P = P (I − P ) = 0.
The last equation is simply telling us that if we project first a vector v onto
U , and then take the component of this projection orthogonal to U , or if we
take the component of a vector v orthogonal to U and then project it onto U ,
we always find the zero vector.
From the idempotency condition it is easy to read off the eigenvalues of P. Let v be an eigenvector of P, with eigenvalue λ, i.e. Pv = λv; then since P² = P we get
λv = Pv = P²v = λ²v,
so the only possible eigenvalues are λ = 0 and λ = 1, since v ≠ 0.
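A small numpy illustration (a sketch assuming the standard inner product on R³): with an orthonormal basis of U as the columns of a matrix Q, the projection matrix is P = QQ^t, and idempotency and the eigenvalues 0, 1 can be checked directly.

```python
import numpy as np

Q = np.array([[1.0, 0.0],       # orthonormal basis of U (here the xy-plane)
              [0.0, 1.0],
              [0.0, 0.0]])
P = Q @ Q.T                     # orthogonal projection onto U
assert np.allclose(P @ P, P)    # idempotent: P^2 = P
print(np.linalg.eigvalsh(P))    # eigenvalues [0., 1., 1.]
```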
Let us assume that the inner product space V has finite dimension dim V = n,
then the matrix form of the projection onto U takes a very simple form if we
use a smart basis. We know that V = U ⊕ U ⊥ , so we can use Gram-Schmidt to
obtain a basis {u1 , ..., uk } for U , made of orthonormal vectors, with k = dim U ,
and similarly a basis {v1 , ..., vn−k } for U ⊥ , made of orthonormal vectors, with
n − k = dim U⊥. The set {u1, ..., uk, v1, ..., vn−k} forms an orthonormal basis for V, and in this basis P is represented by the diagonal matrix diag(1, ..., 1, 0, ..., 0) with k ones followed by n − k zeros.
Examples.
(a) Consider V = R⁴ equipped with the standard inner product (and the norm induced by it), and find the point in
U = span{v1 = (0, 2, −1, 2)^t, v2 = (2, 3, 1, 2)^t}
nearest to v0 = (1, 1, 1, 1)^t.
(b) In V = C[−1, 1] with the inner product (f, g) = \int_{−1}^{1} f(t) g(t) dt, find the straight line nearest to g(t) = t² + t. Here U = span{f1, f2}, with the orthonormal pair f1 = 1/√2 and f2 = √(3/2) t. We have
(g, f1) = (1/√2) \int_{−1}^{1} (t² + t) dt = (1/√2) [t³/3 + t²/2]_{−1}^{1} = √2/3,
(g, f2) = √(3/2) \int_{−1}^{1} (t³ + t²) dt = √(3/2) [t⁴/4 + t³/3]_{−1}^{1} = √2/√3.
The orthogonal projection of g onto U is
f(t) = (g, f1) f1 + (g, f2) f2 = (√2/3)(1/√2) + (√2/√3) √(3/2) t = t + 1/3.
So this is the “best” (least-squares) straight-line fit to the parabola.
We can check this. Suppose that f = at + b; then we want to find a and b that minimise ||g − f|| = ||t² + t − at − b||. Now
||t² + t − at − b||² = \int_{−1}^{1} (t⁴ + 2t³ + t² − 2at³ − 2at² − 2bt² − 2bt + a²t² + 2abt + b²) dt
= 2/5 + 2/3 − 4a/3 − 4b/3 + 2a²/3 + 2b²
= 8/45 + (2/3)(a − 1)² + 2(b − 1/3)²,
where we completed the square in the last line. This is minimised when a = 1 and b = 1/3; that is, f(t) = t + 1/3 as above.
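The minimisation can also be double-checked numerically. The sketch below (illustrative; it uses numpy's Polynomial class to integrate polynomials exactly) evaluates ||g − f||² for the claimed best line and for a competitor:

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def l2_norm_sq(p, a=-1.0, b=1.0):
    """Exact squared L2 norm of a polynomial over [a, b]."""
    q = (p * p).integ()          # antiderivative of p^2
    return q(b) - q(a)

g = Poly([0, 1, 1])              # g(t) = t^2 + t (ascending coefficients)
best = Poly([1/3, 1])            # the claimed best fit t + 1/3
print(l2_norm_sq(g - best))            # 8/45 = 0.1777...
print(l2_norm_sq(g - Poly([0.3, 1])))  # a slightly different line does worse
```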
As one would expect from the well known 3-dimensional case of R3 , when
we take the projection of a vector, we generically obtain a “shorter” vector.
This holds in the more general context of inner product spaces discussed in this
Section, where the notion of length of a vector is now replaced by the more
general concept of norm of a vector.
Examples.
(a) (1/√2) \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix} is symmetric but not orthogonal.
(b) (1/√2) \begin{pmatrix} 1 & −1 \\ 1 & 1 \end{pmatrix} is orthogonal but not symmetric.
(c) (1/√2) \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix} is both symmetric and orthogonal.
(d) (1/√2) \begin{pmatrix} 1 & i \\ −i & 2 \end{pmatrix} is hermitian but not unitary.
(e) (1/√2) \begin{pmatrix} 1 & −i \\ −i & 1 \end{pmatrix} is unitary but not hermitian.
If the eigenvalues are all different, then there are n orthogonal eigenvectors by the proposition below, and we can diagonalize by an orthogonal P; as two eigenvalues approach each other, the eigenvectors remain orthogonal, so the construction works even in the limit when eigenvalues coincide.
Proposition. If A is orthogonally-diagonalizable, then A is symmetric.
Proof. If P t AP = D with P orthogonal and D diagonal, then A = P DP t which
is clearly symmetric.
Similarly for hermitian matrices and unitary diagonalization.
Proposition. If A is unitary-diagonalizable and the eigenvalues of A are all
real, then A is hermitian.
Proof. If P ∗ AP = D with P unitary and D diagonal, then A = P DP ∗ which is
clearly hermitian because we assumed that the eigenvalues of A were all real,
i.e. D∗ = D.
Proposition. Let A be a complex Hermitian (or real symmetric) n × n matrix.
Then the eigenvalues of A are real, and eigenvectors corresponding to different
eigenvalues are mutually-orthogonal. If A is real, then the eigenvectors are real
as well.
Remark. In what follows, we think of v ∈ Cn as a column vector, and use the
standard inner product hv, wi = w∗ v.
Proof. Let λ be an eigenvalue of A, so Ax = λx for some x ≠ 0. Note that x can be taken as real if and only if λ is real. Then we have
x*Ax = λ x*x,
x*A*x = λ̄ x*x (conjugate transpose),
x*Ax = λ̄ x*x (A = A* as A is hermitian),
λ x*x = λ̄ x*x (using Ax = λx),
λ = λ̄ (x*x > 0).
Next suppose that λ, µ are eigenvalues, with λ ≠ µ. Then for some x, y (which we may take to be real when A is real, since λ, µ are real) we have
Ax = λx, x ≠ 0, Ay = µy, y ≠ 0
and
x*Ay = µ x*y,
y*Ax = λ y*x,
x*Ay = x*A*y = (Ax)*y = λ̄ x*y = λ x*y,
so (λ − µ) x*y = 0, and hence
x*y = 0.
Example (b)
A = \begin{pmatrix} 2 & −2 & 0 \\ −2 & 1 & −2 \\ 0 & −2 & 0 \end{pmatrix}.
Characteristic polynomial:
det \begin{pmatrix} t−2 & 2 & 0 \\ 2 & t−1 & 2 \\ 0 & 2 & t \end{pmatrix} = (t − 2)[(t − 1)t − 4] − 2 × 2t
= (t − 2)[t² − t − 4] − 4t
= t³ − 3t² − 6t + 8
= (t − 1)(t + 2)(t − 4).
λ = 4, 1, −2.
Eigenvectors:
• λ = 4
(4I − A)x = \begin{pmatrix} 2 & 2 & 0 \\ 2 & 3 & 2 \\ 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} 2x1 + 2x2 \\ 2x1 + 3x2 + 2x3 \\ 2x2 + 4x3 \end{pmatrix} = 0.
Unit eigenvector: (1/3)(2, −2, 1)^t.
• λ = 1
(I − A)x = \begin{pmatrix} −1 & 2 & 0 \\ 2 & 0 & 2 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} −x1 + 2x2 \\ 2x1 + 2x3 \\ 2x2 + x3 \end{pmatrix} = 0.
Unit eigenvector: (1/3)(2, 1, −2)^t.
• λ = −2
(−2I − A)x = \begin{pmatrix} −4 & 2 & 0 \\ 2 & −3 & 2 \\ 0 & 2 & −2 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix} = \begin{pmatrix} −4x1 + 2x2 \\ 2x1 − 3x2 + 2x3 \\ 2x2 − 2x3 \end{pmatrix} = 0.
Unit eigenvector: (1/3)(1, 2, 2)^t.
Thus
P = (1/3) \begin{pmatrix} 2 & 2 & 1 \\ −2 & 1 & 2 \\ 1 & −2 & 2 \end{pmatrix}, P^t A P = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & −2 \end{pmatrix}.
Note: All the eigenvalues in the example are distinct, so the corresponding
eigenvectors are automatically orthogonal to each other. If the eigenvalues are
not distinct, then for repeated eigenvalues we must choose mutually orthogonal
eigenvectors.
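For a machine check of example (b), numpy's eigh routine (intended for symmetric or hermitian matrices) returns real eigenvalues in ascending order together with orthonormal eigenvectors; the sketch below is illustrative and not part of the notes.

```python
import numpy as np

A = np.array([[ 2.0, -2.0,  0.0],
              [-2.0,  1.0, -2.0],
              [ 0.0, -2.0,  0.0]])
vals, P = np.linalg.eigh(A)
print(vals)                                  # [-2., 1., 4.]
print(np.allclose(P.T @ P, np.eye(3)))       # True: columns are orthonormal
print(np.round(P.T @ A @ P, 10))             # diag(-2, 1, 4)
```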
Example (c)
A = \begin{pmatrix} 4 & 1−i \\ 1+i & 5 \end{pmatrix}.
Then A is hermitian, and the characteristic polynomial of A is given by
det(λI − A) = det \begin{pmatrix} λ−4 & −1+i \\ −1−i & λ−5 \end{pmatrix}
= λ² − 9λ + 18
= (λ − 3)(λ − 6).
λ = 3: unit eigenvector
(1/√3) \begin{pmatrix} −1+i \\ 1 \end{pmatrix}.
λ = 6: unit eigenvector
(1/√6) \begin{pmatrix} 1−i \\ 2 \end{pmatrix}.
(Note that these are mutually orthogonal.) Thus
P = \begin{pmatrix} (−1+i)/√3 & (1−i)/√6 \\ 1/√3 & 2/√6 \end{pmatrix}
is unitary and
P*AP = \begin{pmatrix} 3 & 0 \\ 0 & 6 \end{pmatrix}.
is positive if and only if the λi are all positive, in other words if and only if the
matrix A is positive-definite.
Similarly, every complex inner product on Cn may be written as hu, vi =
y∗ Ax, where A is an hermitian n × n matrix and x, y are the coordinates of
the two vectors u, v in the chosen basis.
Chapter 3
Special polynomials
Prologue. In the 18th and 19th centuries, mathematicians modelling things
like vibrations encountered ODEs such as
(1 − x²)y″ − 2xy′ − λy = 0,
with λ constant. This particular one is called the Legendre equation. Such equations came up again in the 20th century; for example, this one arises in atomic physics.
Unlike the case of constant coefficients, the general solution of such equations
is messy. But for certain values of λ, there are polynomial solutions, and these
turn out to be the physically-relevant ones. Note that we can write the equation
as Ly = λy, where
L = (1 − x²) d²/dx² − 2x d/dx
is a linear operator on a vector space of functions. If this function space is a space of polynomials, then the special values of λ are the eigenvalues of L. In applications these often correspond to natural frequencies of vibration, to spectral lines, etc.
(f, g) = \int_a^b f(x) g(x) k(x) dx,
where k(x) is some fixed, continuous function such that k(x) > 0 for all x ∈ [a, b], with a, b fixed numbers.
Note that this is indeed an inner product over the space of continuous, real
valued functions C[a, b], over the interval [a, b], and hence also on its vector
subspaces R[x] and R[x]n :
1. It is symmetric and homogeneous thanks to the properties of the integral:
(λf1 + µf2, g) = \int_a^b (λ f1(x) + µ f2(x)) g(x) k(x) dx = λ \int_a^b f1(x) g(x) k(x) dx + µ \int_a^b f2(x) g(x) k(x) dx = λ(f1, g) + µ(f2, g) = λ(g, f1) + µ(g, f2).
2. It is positive: (f, f) = \int_a^b f(x)² k(x) dx ≥ 0, since f(x)² ≥ 0 and k(x) was chosen by hypothesis to be positive over the interval [a, b].
3. It is non-degenerate, i.e. (f, f) = 0 if and only if f = 0, using the properties of continuous functions and the fact that k(x) > 0 is strictly positive. In fact if f ≠ 0, there exists x0 ∈ [a, b] such that f(x0) = c ≠ 0. By continuity, if f is non-vanishing at x0 there must be an interval around x0 on which f is non-zero: for every ε > 0 there exists δ > 0 such that |f(x) − f(x0)| < ε for all x ∈ (x0 − δ, x0 + δ) (for simplicity let us assume that this interval is entirely contained in [a, b]). On the interval (x0 − δ, x0 + δ), the function f²(x), and k(x) as well, are strictly bigger than zero, i.e. f²(x)k(x) > ε̃ for some ε̃ > 0 related to ε, c and the minimum of k(x) in the interval (which is strictly bigger than zero). So
(f, f) = \int_a^b f²(x) k(x) dx ≥ \int_{x0−δ}^{x0+δ} f²(x) k(x) dx > 2ε̃δ > 0.
The crucial points are continuity and the fact that k(x) is strictly positive over the interval [a, b].
Suppose that we want an orthonormal basis {f0 , f1 , f2 , . . . , fn }, where fn ∈
R[x]n . One way of obtaining such a basis is to start with {1, x, x2 , . . . , xn } as a
“first guess”, and apply the Gram-Schmidt process. But in some special cases
there is also a more interesting way, related to the fact that many such examples
arose in the study of differential equations.
We shall see in the next section that, in each of these cases, the orthogonal
polynomials arise as special solutions of differential equations, and in fact as
eigenfunctions of linear differential operators.
(ii) Chebyshev-I: L_I f = (1 − x²)f″ − xf′;
Remarks.
(a) In each case, the corresponding differential equation for f is
Lf = λf.
(LPn , P ) = (Pn , LP ) = 0,
Example. Let V = R[x]3 . Write down the matrix M representing the operator
LL acting on V , using the basis {1, x, x2 , x3 } of V . Use this to construct the
Legendre polynomials {f0 , f1 , f2 , f3 }.
Solution. The operator LL maps 1 to 0, x to −2x, x2 to 2 − 6x2 and x3 to
−12x3 + 6x; so its matrix is
M = \begin{pmatrix} 0 & 0 & 2 & 0 \\ 0 & −2 & 0 & 6 \\ 0 & 0 & −6 & 0 \\ 0 & 0 & 0 & −12 \end{pmatrix}.
The corresponding normalized functions are the first four normalized Legendre
polynomials.
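Numerically, the eigenvectors of M do give the Legendre coefficient vectors. A hedged numpy sketch (not from the notes):

```python
import numpy as np

# Matrix of L = (1 - x^2) d^2/dx^2 - 2x d/dx on R[x]_3, basis {1, x, x^2, x^3}
M = np.array([[0,  0,  2,   0],
              [0, -2,  0,   6],
              [0,  0, -6,   0],
              [0,  0,  0, -12]], dtype=float)
vals, vecs = np.linalg.eig(M)
for lam, v in zip(vals, vecs.T):
    # each column holds the coefficients (constant term first) of a polynomial
    print(lam, np.round(v / np.max(np.abs(v)), 3))
# lambda = -6 gives (up to scale) 3x^2 - 1; lambda = -12 gives 5x^3 - 3x
```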
Nomenclature. A function f satisfying the differential equation
Lf = λf
for some linear differential operator L and some number λ is called an eigenfunction of L corresponding to the eigenvalue λ. If we expand the eigenfunction f in some basis, e.g. the basis of polynomials {1, x, ..., x^n, ...}, we call the vector containing the coordinates of f in that basis the eigenvector of the associated matrix problem.
For example f3 = 5x³ − 3x, computed above, is an eigenfunction of the Legendre operator with eigenvalue λ = −12, while in the standard basis of R[x]3 the vector v3 = [0, −3, 0, 5]^t represents the eigenfunction f3, and it is an eigenvector of the Legendre operator in matrix form.
Example. On Cn with the standard basis and the standard inner product,
a linear operator represented by the matrix M is hermitian if and only if the
matrix is hermitian (M ∗ = M ).
Proposition. If L is hermitian, then hLv, vi is real for all v ∈ V .
Proof:
⟨Lv, v⟩ = ⟨v, Lv⟩ = \overline{⟨Lv, v⟩},
from the hermiticity property of the inner product; a number equal to its own complex conjugate is real.
c = ⟨LMv, v⟩
= ⟨Mv, Lv⟩ (since L is hermitian)
= ⟨v, MLv⟩ (since M is hermitian)
= \overline{⟨MLv, v⟩} (hermiticity property of the inner product).
Chapter 4
Groups II
4.1 Axioms and Examples
Definition. A group is a set G equipped with an operation • : G × G → G such that: (i) the operation is associative, x • (y • z) = (x • y) • z; (ii) there is an identity element e with e • x = x • e = x for all x ∈ G; and (iii) every x ∈ G has an inverse x⁻¹ with x • x⁻¹ = x⁻¹ • x = e.
The identity is unique: if e1 and e2 both satisfy
e1 • x = x • e1 = x,
e2 • x = x • e2 = x,
for all x, then
e2 • e1 = e1 • e2 = e1,
and likewise e2 • e1 = e2, so e1 = e2.
Example (0). Any vector space V with the operation +. In this case, the
zero vector 0 is the identity, and the inverse of v is −v. This group structure
is abelian. If V is a real 1-dimensional vector space, then this is the group R
of real numbers under addition (ie. the operation is +).
And now something a little less trivial.
Example (c). The group SL(n, R) of n × n real matrices M having det(M) = 1 is a subgroup of GL(n, R). Proof: if det(A) = det(B) = 1, then det(AB) = det(A) det(B) = 1 and det(A⁻¹) = 1/det(A) = 1, and det(I) = 1; so SL(n, R) contains the identity and is closed under products and inverses.
Example (d). The group O(n) of n × n real orthogonal matrices is a subgroup of GL(n, R). Proof:
(i) if A and B are orthogonal then (AB)^t(AB) = B^t A^t A B = I, so AB is orthogonal;
(ii) if A is orthogonal then A⁻¹ = A^t, which is again orthogonal;
(iii) I is orthogonal;
so O(n) contains the identity and is closed under products and inverses.
and
P^t = \begin{pmatrix} a & c \\ b & d \end{pmatrix},
with a, b, c, d ∈ R. The orthogonality condition P^t P = P P^t = I, plus the extra condition that det P = 1, is equivalent to the system of equations
a² + c² = 1,
ab + cd = 0,
b² + d² = 1,
ad − bc = 1. (det P = +1)
We can easily solve this system, and the most general P ∈ SO(2) takes the form
P = \begin{pmatrix} cos θ & sin θ \\ −sin θ & cos θ \end{pmatrix},
where the rotation angle θ ∈ [0, 2π). [We say that SO(2) is isomorphic to S¹ = {(a, c) ∈ R² s.t. a² + c² = 1}.]
Solution. Consider first the vector v = (1, 0): its image Pv = (cos θ, −sin θ) corresponds to a clockwise rotation of v through the angle θ. For a generic vector v = (x, y), the image Pv = (x cos θ + y sin θ, −x sin θ + y cos θ) again corresponds to a clockwise rotation of v through the angle θ.
Exercise. Try to parametrize SU(2) in a way similar to the parametrization obtained for SO(2).
We can solve for b = −c̄ and d = ā, so that the most general M ∈ SU(2) takes the form
M = \begin{pmatrix} a & −c̄ \\ c & ā \end{pmatrix},
with |a|² + |c|² = 1.
[Writing a = x1 + ix2 and c = x3 + ix4 with x1, x2, x3, x4 ∈ R, the equation |a|² + |c|² = 1 becomes x1² + x2² + x3² + x4² = 1. We say that SU(2) is isomorphic to S³ = {x ∈ R⁴ s.t. x1² + x2² + x3² + x4² = 1}.]
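A quick numerical check of this parametrization (a sketch; the helper su2 is my own name, and it assumes the caller supplies |a|² + |c|² = 1):

```python
import numpy as np

def su2(a, c):
    """General SU(2) element, assuming |a|^2 + |c|^2 = 1."""
    return np.array([[a, -np.conj(c)],
                     [c,  np.conj(a)]])

M = su2(0.6 * np.exp(0.3j), 0.8)      # |a|^2 + |c|^2 = 0.36 + 0.64 = 1
print(np.allclose(M.conj().T @ M, np.eye(2)))   # True: M is unitary
print(np.round(np.linalg.det(M), 10))           # 1: det M = 1
```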
Example (h). There are exactly two groups of order 4, namely Z4 as above,
and the Klein group V which has the table
• e a b c
e e a b c
a a e c b
b b c e a
c c b a e
Note that the group table of Z4 has a quite different structure from that of
V . One visualisation (or representation) of V is as the group of rotations by
180◦ about each of the x-, y- and z-axes in R3 (ie. the element a corresponds
to rotation by π about the x-axis, etc). Exercise: by using a marked cube such
as a die, convince yourself that a • b = c.
Another representation of V is as the group of reflections in R²: across the x-axis, across the y-axis, and the combined reflection across the x-axis and then the y-axis. Take
e = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, a = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}, b = \begin{pmatrix} −1 & 0 \\ 0 & 1 \end{pmatrix}, c = \begin{pmatrix} −1 & 0 \\ 0 & −1 \end{pmatrix},
and check that these matrices satisfy the group table of the Klein group, using matrix multiplication as the group operation.
Example. Take the direct product of two cyclic groups Z4 × Z3, and consider the element (2, 2). The order of 2 in Z4 is 2, while the order of 2 in Z3 is 3, so the order of (2, 2) in Z4 × Z3 is lcm(2, 3) = 6.
Rotations of Rⁿ preserve lengths and angles, which means that they preserve the inner product: for all v, w we have
w^t v = (Mw)^t (Mv) = w^t M^t M v.
This requires M^t M = I, ie. M has to be an orthogonal matrix. So the rotations of Rⁿ form a group, which is exactly the orthogonal group O(n).
For example, if n = 1, then we get O(1) = {1, −1}. The group operation is
multiplication, the element 1 is the identity, and the element −1 is the reflection
x 7→ −x. This group is finite and abelian, but O(n) for n ≥ 2 is neither finite
nor abelian.
Pure (proper) rotations. Recall that any orthogonal matrix M has det(M ) =
±1. Elements of O(n) with det(M ) = −1 are rotations which incorporate a
relection. The pure rotations, on the other hand, have det(M ) = 1; in other
words, they belong to SO(n). Note that SO(1) is trivial, SO(2) is abelian, and
SO(n) for n ≥ 3 is not abelian.
Plane polygons. Consider a regular plane polygon with n sides. The pure
rotations leaving it unchanged make up a finite abelian group of order n: isomor-
phic to Zn , but with the matrix representation Mj ∈ SO(2) for j = 1, 2, . . . , n,
where
Mj = \begin{pmatrix} cos(2πj/n) & sin(2πj/n) \\ −sin(2πj/n) & cos(2πj/n) \end{pmatrix}.
For example, the pure rotations which preserve a square (n = 4) are represented
by the four orthogonal matrices
M1 = \begin{pmatrix} 0 & 1 \\ −1 & 0 \end{pmatrix}, M2 = \begin{pmatrix} −1 & 0 \\ 0 & −1 \end{pmatrix}, M3 = \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix}, M4 = I.
These form a finite subgroup of SO(2).
The group of all symmetries of the regular n-polygon is the dihedral group
Dn , which is a finite group of order 2n. (Some people call it D2n , so beware
of confusion.) This includes reflections; so Dn is finite subgroup of O(2). The
name comes from “dihedron”, namely a two-sided polyhedron.
For example, the group of symmetries of a square is D4 , the elements of which
are represented by {M1 , M2 , M3 , M4 } plus the four reflections
R1 = \begin{pmatrix} 0 & −1 \\ −1 & 0 \end{pmatrix}, R2 = \begin{pmatrix} −1 & 0 \\ 0 & 1 \end{pmatrix}, R3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, R4 = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}.
You can visualize R1 as a reflection across the line y = −x, R2 as a reflection across the y-axis, R3 as a reflection across the line y = x, and finally R4 as a reflection across the x-axis.
Using the representation written above of the Klein group as the group of reflections across the x- and y-axes in R², you can check that the Klein group is a subgroup of D4.
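One way to convince yourself of such statements is brute force. The following Python sketch (illustrative only) builds the eight matrices of D4 and verifies closure under matrix multiplication:

```python
import numpy as np
from itertools import product

# Four rotations M_j and four reflections R_i, as above
Ms = [np.array([[np.cos(t), np.sin(t)],
                [-np.sin(t), np.cos(t)]]).round() for t in
      (np.pi/2, np.pi, 3*np.pi/2, 2*np.pi)]
Rs = [np.array(m, dtype=float) for m in
      ([[0, -1], [-1, 0]], [[-1, 0], [0, 1]],
       [[0, 1], [1, 0]], [[1, 0], [0, -1]])]
D4 = Ms + Rs
keys = {tuple(map(tuple, g)) for g in D4}
# closure: the product of any two elements of D4 lies in D4
print(all(tuple(map(tuple, (g @ h).round())) in keys
          for g, h in product(D4, repeat=2)))      # True
```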
× 1 2
1 1 2
2 2 1
Note that Z3× ≅ Z2.
What about the integers modulo 4? The set S = {1, 2, 3} is not a group, since it is not closed. But if we omit 2, then we get a group Z4× = {1, 3} with table
× 1 3
1 1 3
3 3 1
Examples. (a) Z5× has order 4, and table
× 1 2 3 4
1 1 2 3 4
2 2 4 1 3
3 3 1 4 2
4 4 3 2 1
(b) Z6× = {1, 5} has order 2, and clearly Z6× ≅ Z2.
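These tables are easy to generate for any n. A small Python sketch (the helper name units_mod is my own):

```python
from math import gcd

def units_mod(n):
    """Z_n^x: the residues coprime to n, under multiplication mod n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]

for n in (3, 4, 5, 6):
    G = units_mod(n)
    print(n, G, [[(a * b) % n for b in G] for a in G])
```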
Chapter 5
Pseudoinverse and Jordan Normal Form
Example. Find the line which best fits the data points (1, 1), (2, 2.4), (3, 3.6),
(4, 4) in R2 .
Solution. Fitting the line y = a + bx to the data points gives Av = k, where
A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}, v = \begin{pmatrix} a \\ b \end{pmatrix}, k = \begin{pmatrix} 1 \\ 2.4 \\ 3.6 \\ 4 \end{pmatrix}.
Then
A^t A = \begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix} ⇒ ψ(A) = (1/10) \begin{pmatrix} 10 & 5 & 0 & −5 \\ −3 & −1 & 1 & 3 \end{pmatrix}.
Thus
\begin{pmatrix} a \\ b \end{pmatrix} = ψ(A)k = \begin{pmatrix} 0.2 \\ 1.02 \end{pmatrix},
and so y = 0.2 + 1.02x is the best fit.
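The same answer can be checked with numpy, either by solving the normal equations directly or with the built-in least-squares routine (an illustrative sketch):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
k = np.array([1.0, 2.4, 3.6, 4.0])
# psi(A) k via the normal equations (A^t A) v = A^t k:
print(np.linalg.solve(A.T @ A, A.T @ k))        # [0.2, 1.02]
print(np.linalg.lstsq(A, k, rcond=None)[0])     # the same, via lstsq
```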
A = \begin{pmatrix} 1 & x1 & \cdots & x1^{m−1} \\ \vdots & \vdots & & \vdots \\ 1 & xn & \cdots & xn^{m−1} \end{pmatrix}, a = \begin{pmatrix} a0 \\ \vdots \\ a_{m−1} \end{pmatrix}, y = \begin{pmatrix} y1 \\ \vdots \\ yn \end{pmatrix}.
A^t(y − y0) = 0
⇒ A^t A a = A^t y0 = A^t y
⇒ a = (A^t A)⁻¹ A^t y = ψ(A)y.
Example. Solve the system of ODEs
ẋ = x + y,
ẏ = −x + 3y.
Solution. The coefficient matrix A = \begin{pmatrix} 1 & 1 \\ −1 & 3 \end{pmatrix} has the double eigenvalue λ = 2 with only one eigenvector w = (1, 1); solving (A − 2I)v = w gives v = (1, 2), so we take M = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} with columns w and v.
If s = M −1 u, then the original system becomes ṡ1 = 2s1 + s2 , ṡ2 = 2s2 , which
is easily solved to give s2 (t) = c2 exp(2t), s1 (t) = (c1 + c2 t) exp(2t). [Exercise
for you: obtain this.] Finally, transform back to u = M s, giving
x(t) = (c1 + c2 + c2 t) exp(2t), y(t) = (c1 + 2c2 + c2 t) exp(2t).
[Exercise for you: check this by substitution.]
For larger matrices, the Jordan normal form has the eigenvalues down the diag-
onal, either ones or zeros immediately above the diagonal, and zeros everywhere
else. It is a block-diagonal matrix, with each block corresponding to a particular
eigenvalue λ. These Jordan blocks can be described as follows:
(i) (λ) — the Jordan block of size 1;
(ii) \begin{pmatrix} λ & 1 \\ 0 & λ \end{pmatrix} — the Jordan block of size 2;
(iii) \begin{pmatrix} λ & 1 & 0 \\ 0 & λ & 1 \\ 0 & 0 & λ \end{pmatrix} — the Jordan block of size 3;
and so on.
Example 2. Find the Jordan normal form of
A = \begin{pmatrix} 2 & 2 & −2 \\ 1 & −1 & 1 \\ 2 & 0 & 0 \end{pmatrix}.
We first calculate the characteristic polynomial:
pA(t) = det(A − tI) = det \begin{pmatrix} 2−t & 2 & −2 \\ 1 & −1−t & 1 \\ 2 & 0 & −t \end{pmatrix} = −t³ + t² = (1 − t)t².
Thus the eigenvalues are λ = 1 and λ = 0 (twice).
• λ = 1
0 = (A − I)x = \begin{pmatrix} 1 & 2 & −2 \\ 1 & −2 & 1 \\ 2 & 0 & −1 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix}.
So x3 = 2x1 and 2x2 = −x1 + 2x3 = 3x1. An eigenvector is
u1 = (2, 3, 4)^t.
• λ = 0
0 = (A − 0I)x = Ax = \begin{pmatrix} 2 & 2 & −2 \\ 1 & −1 & 1 \\ 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} x1 \\ x2 \\ x3 \end{pmatrix}.
So x1 = 0 and x2 = x3. An eigenvector is
u2 = (0, 1, 1)^t.
Since λ = 0 is a double eigenvalue with only one independent eigenvector, we solve Au3 = u2 for a generalized eigenvector, giving u3 = (1/2)(1, 0, 1)^t. Taking M = [u1 u2 u3], we get
M⁻¹AM = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
This is the Jordan normal form of A. It has a 1 × 1 block associated with λ = 1, and a 2 × 2 block associated with λ = 0.
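If sympy is available, this Jordan form can be confirmed symbolically (a sketch, not part of the notes):

```python
import sympy as sp

A = sp.Matrix([[2, 2, -2],
               [1, -1, 1],
               [2, 0, 0]])
P, J = A.jordan_form()     # P and J with A = P J P^{-1}
sp.pprint(J)               # a 1x1 block for eigenvalue 1, 2x2 block for 0
```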
Remark. If λ is a triple eigenvalue, and there is only one eigenvector w,
then we just iterate the process. Solve (A − λI)v = w to get v, then solve
(A − λI)u = v to get u; and take M to have the three columns w, v, u.
Example 3. Find the Jordan normal form of
A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & −1 & 3 \end{pmatrix}.
The characteristic polynomial is pA(t) = −(t − 2)³, so λ = 2 is a triple eigenvalue. The only eigenvector (up to scale) is w = (1, 1, 0); solving (A − 2I)v = w gives v = (0, 1, 1), and then (A − 2I)u = v gives u = (0, 0, 1). With M = [w v u], we get
M⁻¹AM = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.
This is the Jordan normal form of A. It has a single 3×3 Jordan block associated
with λ = 2.
Note that these choices are not unique. We could have chosen v = (1, 2, 1)^t instead, since (A − 2I)(1, 2, 1)^t = (1, 1, 0)^t = w as well.