Linear Algebra Notes: Eigenvectors & Diagonalization

Chapter 1

Eigenvectors and
Diagonalization
1.1 Prologue
If T : Rn → Rn is a linear mapping, then T may be represented as an n × n
matrix M . But this involves a choice of basis on Rn , and M changes if you
choose a different basis. The big theme of this chapter is that a suitable choice
of basis makes M diagonal, and it is then easy to do calculations.

Example A. A predator-prey model. Let xn and yn be the number of owls and


mice, respectively, at the end of year n. Suppose that the change in population
from one year to the next is modelled by vn+1 = M vn , where vn is the column-
vector with entries xn and yn , and

    M = [  0.6  0.4 ]
        [ −0.1  1.2 ].

So given a starting population v0, the population after n years is Mⁿv0. Can
we get a formula for Mⁿ? In this simple case, we could guess one by trial and
error, but for larger matrices that is not feasible. However, if M were diagonal,
say M = diag(0.8, 1.1), then Mⁿ = diag(0.8ⁿ, 1.1ⁿ). So the idea is to solve the
original problem by changing basis so that M is diagonal. This process is called
diagonalization, and almost all square matrices can be diagonalized.
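As a quick numerical illustration of Example A (a minimal Python/numpy sketch, not part of the original notes; the starting populations below are hypothetical values chosen only for the demonstration), Mⁿ can be computed by diagonalizing M, exactly the strategy developed later in this chapter.

    import numpy as np

    # Predator-prey matrix from Example A.
    M = np.array([[0.6, 0.4],
                  [-0.1, 1.2]])

    # Diagonalize: columns of P are eigenvectors, the eigenvalues go on a diagonal.
    eigenvalues, P = np.linalg.eig(M)

    def M_power(n):
        """Return M**n computed as P D^n P^{-1}."""
        return P @ np.diag(eigenvalues**n) @ np.linalg.inv(P)

    v0 = np.array([100.0, 1000.0])    # hypothetical starting populations (owls, mice)
    print(M_power(10) @ v0)           # population after 10 years
    # Sanity check against direct repeated multiplication.
    print(np.allclose(M_power(10), np.linalg.matrix_power(M, 10)))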

Example B. Consider the following system of ordinary differential equations


for x(t) and y(t):

ẋ = −x + 4y,
ẏ = −2x + 5y.

This can be written in matrix form as

    dv/dt = Av,

where

    A = [ −1  4 ]
        [ −2  5 ].
It is not too hard to solve this system, but the calculations in effect amount to
diagonalizing the matrix A. Notice that if A were diagonal, say A = diag(−2, 3),
then the equations are ẋ = −2x, ẏ = 3y, and the general solution is immediate,
namely x(t) = x0 exp(−2t), y(t) = y0 exp(3t).
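For comparison, here is a minimal numerical sketch (Python with numpy; not from the notes) of the diagonalization idea applied to Example B: writing A = M D M⁻¹ gives v(t) = M exp(Dt) M⁻¹ v(0). The initial condition v0 is an arbitrary choice for illustration.

    import numpy as np

    A = np.array([[-1.0, 4.0],
                  [-2.0, 5.0]])

    # A = M D M^{-1}, with D = diag(eigenvalues).
    lam, Mmat = np.linalg.eig(A)
    Minv = np.linalg.inv(Mmat)

    def v(t, v0):
        """Solution of dv/dt = A v with v(0) = v0, via v(t) = M exp(D t) M^{-1} v0."""
        return Mmat @ np.diag(np.exp(lam * t)) @ Minv @ v0

    v0 = np.array([1.0, 1.0])          # arbitrary initial condition
    # Check that the ODE is satisfied by comparing a numerical derivative with A v(t).
    t, h = 0.5, 1e-6
    num_deriv = (v(t + h, v0) - v(t - h, v0)) / (2 * h)
    print(np.allclose(num_deriv, A @ v(t, v0)))   # True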

1.2 Eigenvalues and eigenvectors


Let A be a square matrix (same number n of rows as columns). A non-zero
n-vector v is an eigenvector of A with eigenvalue λ ∈ C, if

Av = λv.

(Both the Greek letter, and the hybrid German-English words, are traditional.)
Here we are thinking of v as a column-vector. In particular,

0 = Av − λv = (A − λI)v.

Since v is non-zero by definition, this means that the matrix B = A − λI is not


invertible (since if B −1 exists, then Bv = 0 implies B −1 Bv = 0 implies v = 0).
Therefore
det(A − λI) = 0.
This is called the characteristic equation of the matrix A. Notice that det(A−tI)
is a polynomial in the variable t, of degree n. We define pA (t) = det(A − tI) to
be the characteristic polynomial of A; the eigenvalues are the roots of pA . Note
that if v is an eigenvector, then so is cv, for any number c ≠ 0.
Example 1. If A is a diagonal matrix, then the eigenvalues of A are precisely
the entries on the diagonal. For example,

    A = [ −2  0  0 ]
        [  0  3  0 ]
        [  0  0  1 ]

has characteristic polynomial pA (t) = −(t + 2)(t − 3)(t − 1), and so the eigen-
values are λ = −2, 3, 1.

Example 2. If A is an upper-triangular matrix, then the eigenvalues of A are


again the entries on the diagonal. For example,

    A = [ −2  5  1 ]
        [  0  3  2 ]
        [  0  0  1 ]

has characteristic polynomial pA (t) = −(t + 2)(t − 3)(t − 1), and so the eigen-
values are λ = −2, 3, 1, as before.
Example 3. Find the eigenvalues of the matrix

    A = [ 0  1 ]
        [ 1  0 ].

Solution. The characteristic polynomial is pA(t) = t² − 1, and so the eigenvalues


are λ = ±1.
Example 4. Find the eigenvalues of the matrix

    A = [ −2  1   2 ]
        [  2  3  −4 ]
        [ −1  1   1 ].

Solution. We calculate

    −pA(t) = det(tI − A)

           = det [ t+2   −1    −2  ]
                 [ −2   t−3     4  ]
                 [  1    −1   t−1  ]

           = (t + 2)(t − 3)(t − 1) − 4 − 4 + 4(t + 2) + 2(t − 3) − 2(t − 1)
           = t³ − 2t² − t + 2
           = (t − 2)(t − 1)(t + 1).

The factorization is obtained, for example, by spotting that t = 1 is a root, and


then doing long-division by the factor t − 1. The eigenvalues are λ = 2, 1, −1.
To find the corresponding eigenvectors, we solve (A − λI)v = 0.
Example 1. The eigenvectors are clearly v1 = (1, 0, 0), v2 = (0, 1, 0) and
v3 = (0, 0, 1).
Example 2. Start with previous, and adjust: the eigenvectors are v1 =
(1, 0, 0), v2 = (1, 1, 0) and v3 = (−4/3, −1, 1).
Example 3. The eigenvectors are v1 = (1, 1) and v2 = (1, −1).

Example 4. The eigenvectors are v1 = (1, 2, 1), v2 = (1, 1, 1) and v3 = (2, 0, 1).
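These claims are easy to check numerically; the following is a short sketch (Python with numpy, not part of the notes) verifying the eigenvalues and eigenvectors quoted for Example 4.

    import numpy as np

    A = np.array([[-2, 1, 2],
                  [ 2, 3, -4],
                  [-1, 1, 1]], dtype=float)

    lam, V = np.linalg.eig(A)
    print(np.sort(lam))          # approximately [-1., 1., 2.]

    # Each quoted eigenvector is only defined up to scale; check A v = lambda v directly.
    for value, vec in [(2, [1, 2, 1]), (1, [1, 1, 1]), (-1, [2, 0, 1])]:
        vec = np.array(vec, dtype=float)
        print(np.allclose(A @ vec, value * vec))   # True for each pair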
Multiple eigenvalues. If we have a polynomial p(t) of degree N, the
following decomposition holds:

    p(t) = c (t − λ1)^{k1} · (t − λ2)^{k2} · · · (t − λp)^{kp} ,

where c ∈ C is non-zero. The roots λi ∈ C are all distinct, i.e. λi ≠ λj when
i ≠ j. The multiplicity of the root λi is ki ∈ N, with ki ≠ 0, and
Σ_{i=1}^{p} ki = N.
If an eigenvalue λ has multiplicity k ≥ 1, it means that the characteristic
polynomial takes the form

    pA(t) = (t − λ)^k Q(t) ,

where the polynomial Q(t) has degree N − k and does not vanish at λ, i.e.
Q(λ) ≠ 0.
There are p linearly-independent eigenvectors corresponding to λ, where 1 ≤
p ≤ k. The eigenspace Vλ is the p-dimensional vector space spanned by these
eigenvectors.
Since the eigenspace Vλ is generated by all the eigenvectors associated with
the eigenvalue λ we can also write it as
Vλ = Ker (A − λI) ,
where the kernel, Ker, of a linear operator B is a subspace of the full vector
space V given by
Ker B = {v ∈ V s.t. Bv = 0} .
The simplest examples are the following.
• The 2 × 2 matrix A = I has eigenvalue λ = 1 (twice), and two independent
eigenvectors: the eigenspace is 2-dimensional.
• By contrast,

    A = [ 1  1 ]
        [ 0  1 ]
also has eigenvalue λ = 1 (twice), but v = (1, 0) is the only eigenvector (up to
scalar multiplication): the eigenspace is 1-dimensional, namely span{v}.
Theorem. (The Cayley-Hamilton theorem). Let A be a square matrix with
characteristic polynomial p(t). Then p(A) = 0, as a matrix equation.
Partial proof. We show this for 2 × 2 matrices only, by direct calculation. (Or
just do it for A = σ1.) Let

    A = [ a  b ]
        [ c  d ].

Then p(t) = t² − (a + d)t + (ad − bc), and a direct calculation gives

    p(A) = A² − (a + d)A + (ad − bc)I = [ 0  0 ]
                                        [ 0  0 ].

For completeness:
Complete proof.
Let us use a flag basis for A, i.e. a basis {v1, ..., vN} such that A is upper
triangular:

    A = [ a11  a12  ...  a1N ]
        [  0   a22  ...  a2N ]
        [  ⋮          ⋱   ⋮  ]
        [  0   ...   0   aNN ].          (1.1)

We use the result (without proving it) that for any linear transformation T we
can find a flag basis, and the matrix representation A of T in this basis takes
precisely the form above.
Because of the upper triangular form, if we apply A to the basis vector vi we
obtain

    Avi = aii vi + wi ,

with wi ∈ Vi−1 = Span{v1, ..., vi−1}. So

(A − aii I) vi ∈ Vi−1 .

The characteristic polynomial takes the form

p(t) = (t − a11 )(t − a22 ) . . . (t − aN N ) ,

so let’s consider the associated matrix equation:

p(A) = (A − a11 I)(A − a22 I) . . . (A − aN N I) .

We want to prove that p(A) = 0 as a matrix equation. To prove it we will show
that p(A)v = 0 for all v ∈ Vi; then, using induction on i, we can conclude
that p(A) vanishes identically.
For i = 1 we have that V1 = Span{v1 } so clearly

(A − a11 I)v1 = 0

from the upper triangular form of A.


Let us assume that the inductive hypothesis is valid up to i − 1; we will show
that it also holds for i. To prove it we note that every v ∈ Vi can be written as

v = αvi + w,

with α ∈ C and w ∈ Vi−1. From the upper triangular property it follows that
(A − aii I)w ∈ Vi−1 since Aw ∈ Vi−1, and also (A − aii I)αvi ∈ Vi−1, so we deduce
that

    (A − aii I)v = w′ ∈ Vi−1    for all v ∈ Vi .

At this point we can use the inductive hypothesis, since we know that, for all
w′ ∈ Vi−1,

    (A − a11 I)(A − a22 I) . . . (A − ai−1 i−1 I)w′ = 0 .

Since all the factors (A − ajj I) commute with one another, the remaining factors
of p(A) do not affect this conclusion, so p(A)v = 0 for every v ∈ Vi. Taking
i = N gives p(A) = 0 on all of V, which concludes the proof.
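A quick numerical illustration of the Cayley-Hamilton theorem (a Python/numpy sketch, not part of the notes): evaluate the characteristic polynomial on the matrix of Example 4 and check that the result is the zero matrix.

    import numpy as np

    A = np.array([[-2, 1, 2],
                  [ 2, 3, -4],
                  [-1, 1, 1]], dtype=float)

    # np.poly(A) returns the monic polynomial det(t I - A), which has the same
    # roots as p_A(t) = det(A - t I), so it vanishes on A exactly when p_A does.
    coeffs = np.poly(A)            # [1, -2, -1, 2], i.e. t^3 - 2 t^2 - t + 2

    # Evaluate the polynomial at the matrix A: sum_k c_k A^(n-k).
    n = len(coeffs) - 1
    pA = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
    print(np.allclose(pA, np.zeros((3, 3))))   # True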

1.3 Interlude: the $400bn eigenvector


There are a huge number n of web pages, and the main task of a search engine is
to rank them. Google, Bing et al. index around 2 × 10¹⁰ pages. Ranking involves
studying the hyperlinks between the pages, and the information about these links
can be expressed by an n × n matrix A. The basic idea is that the ranking is
then given by the magnitudes of the components of the dominant eigenvector
of A, namely the one corresponding to the largest eigenvalue. Google has such
a vector (or at least a good approximation to one), which is why the company
is now worth around $400bn.
To compute eigenvectors as we do in this chapter is totally impractical for such
a large matrix. But we can usually get a good approximation to the dominant
eigenvector, if it exists, via a power sequence. Here is a simple scheme. Start
with a vector v0 which has 1 as its largest component, and all components
between -1 and 1. Then take v1 = cAv0 , where the scalar c is chosen so that
v1 has 1 as its largest component, and continue in this way to get a sequence
{v0 , v1 , v2 , . . .}. Then if a dominant eigenvector v exists at all, and for almost
all choices of the starting vector, we have vk → v as k → ∞ (not obvious, but
a theorem). In fact, Google’s algorithm is rather different, but it amounts to
the same thing. Of course, a search engine does not compute the analytic limit,
but simply takes k to be large enough to give an acceptable approximation vk .
As a trivial example, take n = 3, A = diag(2, 3, 5) and v0 = (1, 1, 1). Then
it is easy to see that vk = ((2/5)^k, (3/5)^k, 1) → (0, 0, 1) as k → ∞. You can
see why it is called a power sequence. Of course, A in practice would not be
diagonal, and you might get a v like v = (0.2, 1, 0.7), which would tell you to
rank the pages in the order 2-3-1.
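Here is a minimal sketch of the power sequence described above (Python with numpy, not part of the notes; the matrix and starting vector are the toy choices from the trivial example, not real link data). At each step the vector is rescaled so that its largest component is 1.

    import numpy as np

    def power_sequence(A, v0, steps=50):
        """Iterate v <- c A v, with c chosen so the largest component becomes 1."""
        v = np.array(v0, dtype=float)
        for _ in range(steps):
            w = A @ v
            v = w / w[np.argmax(np.abs(w))]   # rescale so the largest component is 1
        return v

    A = np.diag([2.0, 3.0, 5.0])                 # the trivial example from the text
    print(power_sequence(A, [1.0, 1.0, 1.0]))    # tends to (0, 0, 1)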

1.4 The similarity relation


We want to be able to identify matrices differing from one another just by a
change of basis, i.e. representing the same linear transformation but in different
bases.
Definition. An equivalence relation, ∼, on a set X, is a binary relation with
the following properties:
• a ∼ a (reflexivity)
• a ∼ b ⇒ b ∼ a (symmetry)
• If a ∼ b and b ∼ c ⇒ a ∼ c (transitivity),
∀a, b, c ∈ X.
Definition. We say that two square matrices A and B (of the same size)
are similar, written A ∼ B, if there exists an invertible matrix M such that
A = M −1 BM .
Proposition. Similarity is an equivalence relation.
Proof. First, the relation is reflexive, namely A ∼ A for any A: just take M = I.
Secondly, the relation is symmetric, namely A ∼ B implies B ∼ A: because
A = M −1 BM implies B = N −1 AN , where N = M −1 . Finally, the relation
is transitive, namely if A ∼ B and B ∼ C, then A ∼ C: this follows easily
from matrix multiplication. These three properties are the definition of an
equivalence relation.
So the set of all n × n matrices gets partitioned into equivalence classes

    [A] = {B s.t. B ∼ A}, equivalently [A] = {M⁻¹AM for M invertible}.

For example, the equivalence class of the identity matrix is simply [I] = {I}.
Proposition. Similar matrices have the same eigenvalues.
Proof. Suppose A = M⁻¹BM. The characteristic polynomial of A is

    pA(t) = det(A − tI) = det[M⁻¹(B − tI)M] = det(M)⁻¹ det(B − tI) det(M) = pB(t),

so A and B have the same eigenvalues.
Note that the converse is not true in general!
Example. Consider the matrices

    A = [ 1  0 ] = I,      B = [ 1  α ]
        [ 0  1 ]               [ 0  1 ]

with α ∈ R and α ≠ 0. Clearly pA(t) = pB(t) = (t − 1)², so A and B have the
same eigenvalues λ = {1, 1}. Despite having the same eigenvalues, A cannot
possibly be similar to B: all the matrices similar to A form the equivalence
class [A] = {I}, so B ∉ [A], which means that B is not similar to A.

Roughly speaking, similar matrices are ones which are equivalent to each other
via some change of basis. The square matrix A is said to be diagonalizable if A
is similar to a diagonal matrix (which will then obviously have the eigenvalues
of A as its diagonal entries). Here is a summary of some facts about similarity
and diagonalizability. Note that if we allow complex roots (as we do), then the
n × n matrix A has n eigenvalues, but they may not all be distinct.

(a) For A and B to be similar, they must have the same eigenvalues. Follows
easily from the Proposition.

(b) If A and B have the same eigenvalues, and both are diagonalizable,
then A ∼ B. Being diagonalizable means that A = M −1 D1 M and
B = N −1 D2 N for some diagonal matrices D1 , D2 . But since they have
the same eigenvalues it means that D1 ∼ D2 , so by the transitivity of ∼
it follows that A ∼ B.

(c) If A is diagonalizable but B is not, then they cannot be similar. Diago-
nalizability of A means A ∼ D for some diagonal matrix D; if A were
similar to B, this would mean, by the transitivity of ∼, that B ∼ D,
contradicting our hypothesis.

(d) Not all square matrices are diagonalizable.

(e) If the eigenvalues of A are distinct, then A is diagonalizable.

(f) If A is a symmetric matrix, then A is diagonalizable.

Below, we prove items (d) and (e); item (f) will crop up in a later chapter.

1.5 Diagonalization by change of basis

Proposition. The n × n matrix A is diagonalizable if and only if it has n


linearly-independent eigenvectors.
Proof. Suppose that the eigenvalues are {λ1 , λ2 , . . . , λn }, and that {v1 , v2 , . . . , vn }
are a set of linearly-independent eigenvectors. Assemble these column-vectors

to make an n × n matrix M = [v1 v2 . . . vn ], and let D denote the diagonal


matrix with λ1 , λ2 , . . . , λn down the main diagonal. Now observe that

AM = [λ1 v1 λ2 v2 . . . λn vn ]
= [v1 v2 . . . vn ] diag[λ1 λ2 . . . λn ]
= M D.

Thus M −1 AM = D, and A is diagonalizable. For the converse, suppose that


M −1 AM = D; the columns of M are eigenvectors of A, and they form a
linearly-independent set since M is invertible.
Remark. As we will see later on, eigenvectors corresponding to different eigenvalues
are automatically linearly independent; this means that each eigenspace
Vλ will be linearly independent from Vµ whenever the eigenvalue λ is different from
the eigenvalue µ.
In particular, we know that 1 ≤ dim Vλ ≤ mλ, where mλ is the multiplicity of
the eigenvalue λ. Since the degree of the characteristic polynomial is precisely
the dimension n of our matrix, summing the multiplicities of all
the eigenvalues gives Σλ mλ = n.
Hence, for the matrix under consideration to be diagonalizable, we must have
dim Vλ = mλ for all the eigenspaces Vλ, since only in this case will we have a
set of n linearly independent eigenvectors.
This theorem also tells one how to construct M . Here are some examples.
Example 1. If possible, diagonalize the matrix

    A = [ −1  4 ]
        [ −2  5 ].

Solution. p(t) = t² − 4t + 3 = (t − 1)(t − 3), so the eigenvalues are λ1 = 1 and


λ2 = 3.
• λ = 1:

    (A − I)v = [ −2  4 ] [ v1 ] = [ 0 ]
               [ −2  4 ] [ v2 ]   [ 0 ]

has solution −2v1 + 4v2 = 0, so taking v2 = 1 gives v1 = 2, and

    v1 = [ 2 ]
         [ 1 ].

• λ = 3:

    (A − 3I)v = [ −4  4 ] [ v1 ] = [ 0 ]
                [ −2  2 ] [ v2 ]   [ 0 ]

has solution −4v1 + 4v2 = 0, so taking v2 = 1 gives v1 = 1, and so

    v2 = [ 1 ]
         [ 1 ].

Therefore we set

    M = [ 2  1 ],      M⁻¹ = [  1  −1 ]
        [ 1  1 ]             [ −1   2 ],

and then

    M⁻¹AM = [ 1  0 ]
            [ 0  3 ].

Remark. The construction gives you M ; if you want to check your answer
without computing M −1 , just check that AM = M D, where D is the diagonal
matrix of eigenvalues (in the right order).
Example 2. Solve the system of ODEs

ẋ = −x + 4y,
ẏ = −2x + 5y.

Solution. The procedure is to diagonalize the system, which amounts to the


same thing as changing basis. If v denotes the column-vector with entries (x, y),
put w = M⁻¹v. Then w(t) satisfies the system ẇ = Dw, where D = diag(1, 3).
The solution is w1 (t) = c1 exp(t), w2 (t) = c2 exp(3t), with c1 and c2 constant;
and then v = M w gives

    x(t) = 2c1 e^t + c2 e^{3t},
    y(t) = c1 e^t + c2 e^{3t}.

(At this stage, you should substitute these answers back into the ODEs to check
the calculation!)
Example 3. If possible, diagonalize the matrix

    A = [ 1  2 ]
        [ 0  1 ].

Solution. Here p(t) = (1 − t)², so there is one eigenvalue, λ = 1, with
multiplicity 2. Now

    (A − I)v = [ 0  2 ] [ v1 ] = [ 0 ]
               [ 0  0 ] [ v2 ]   [ 0 ]

has solution v2 = 0. So the eigenspace V1 is spanned by

    v1 = [ 1 ]
         [ 0 ].
In other words, V1 is 1-dimensional. In view of our theorem, we conclude that
A is not diagonalizable.
Proposition. Eigenvectors that correspond to distinct eigenvalues of A are
linearly independent. In particular, if all the eigenvalues of A are distinct, then
A is diagonalizable.
Proof. Suppose λ1 ≠ λ2 and w = c1 v1 + c2 v2 = 0. Then Aw − λ1 w = 0 gives

    (λ2 − λ1) c2 v2 = 0,

and hence c2 = 0. Similarly c1 = 0. Thus {v1, v2} are linearly independent.
This proof extends directly to k eigenvectors. Suppose w = c1 v1 + . . . + ck vk = 0.
Apply Aw − λk w to get rid of vk:

    Aw − λk w = c1 (λ1 − λk)v1 + ... + ck−1 (λk−1 − λk)vk−1 = w′ .

Then apply Aw′ − λk−1 w′ to get rid of vk−1. Eventually you end up with

    (λ1 − λ2) . . . (λ1 − λk) c1 v1 = 0,

and so c1 = 0. Similarly cj = 0, for all j.
Remark. As mentioned before, from the above proposition we deduce that
eigenspaces corresponding to different eigenvalues must be linearly independent.
Since we know that the dimension of each eigenspace Vλ is bounded
by the multiplicity mλ of that particular eigenvalue, we deduce that the only
obstruction to finding n linearly independent eigenvectors comes from the
eigenspaces corresponding to an eigenvalue with mλ > 1.
Given a matrix A, a necessary and sufficient condition for A to be diago-
nalizable is that dim Vλ = dim Ker(A − λI) = mλ for all the eigenvalues λ.
Only in this case can we decompose the total vector space V as a direct sum of
eigenspaces

    V = Vλ1 ⊕ Vλ2 ⊕ ... ⊕ Vλp ,

so that we have precisely n linearly independent eigenvectors, since

    dim V = dim Vλ1 + dim Vλ2 + ... + dim Vλp = mλ1 + mλ2 + ... + mλp = n ,

where we used that the degree of the characteristic polynomial pA(t) is n and
is also equal to the sum of the multiplicities of all the eigenvalues:

    deg(pA(t)) = mλ1 + mλ2 + ... + mλp = n .

Note that the above remarks are valid only when we allow ourselves to work
over the complex numbers! In particular, eigenvalues might be complex, even
when the matrix is real.
Example 4. Let us try to diagonalize

    A = [ −2  −1 ]
        [  5   2 ].

Solution. p(t) = t² + 1, so the eigenvalues are λ1 = i and λ2 = −i. Then we get

    M = [ −2+i  −2−i ],      M⁻¹ = (1/10) [ −5i  1−2i ]
        [   5     5  ]                    [  5i  1+2i ],

and

    M⁻¹AM = [ i   0 ]
            [ 0  −i ].

Remark. If A is a real matrix, it may be possible to diagonalize A over


C, but not over R, as in this example. In general, we take “diagonalizable”
to mean that complex numbers are allowed. But sometimes we might ask
whether A is “diagonalizable when R is the underlying field of scalars”, or
“diagonalizable over R”, which means that only real numbers are allowed, and
is more restrictive. Note that the eigenvectors belong to C2 , a complex vector
space.
Example 5. If

    A = [ −1  4 ]
        [ −2  5 ],

derive a formula for Aⁿ.
Solution. Note that A = MDM⁻¹, where M and D are as in Example 1. So

    Aⁿ = MDⁿM⁻¹ = [ 2 − 3ⁿ    2·3ⁿ − 2 ]
                  [ 1 − 3ⁿ    2·3ⁿ − 1 ].
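The closed form is easy to test numerically; here is a short sketch (Python/numpy, not part of the notes) comparing it with direct matrix powers.

    import numpy as np

    A = np.array([[-1.0, 4.0],
                  [-2.0, 5.0]])

    def A_power_formula(n):
        """Closed form A^n = M D^n M^{-1} worked out above."""
        return np.array([[2 - 3**n, 2 * 3**n - 2],
                         [1 - 3**n, 2 * 3**n - 1]], dtype=float)

    for n in range(1, 6):
        assert np.allclose(np.linalg.matrix_power(A, n), A_power_formula(n))
    print("formula agrees with direct matrix powers")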

Example 6. Mathematical models often lead to discrete-time evolution sys-


tems of the form vk+1 = Avk . These can be solved by diagonalizing A, as
we did for differential equations. Sometimes we are interested in whether the
solution stabilizes, ie vk → v as k → ∞. Such a v has to have the property
Av = v (eigenvalue 1), and furthermore v has to be a dominant eigenvector.
Exercise for you: show that the dynamical system defined by

    A = [ 0.8  0.1 ]
        [ 0.2  0.9 ]

admits a stable solution, and find it (as a unit vector). [Answer: p(t) = (t −
0.7)(t − 1), v = ±(1, 2)/√5. The other eigenvector, corresponding to λ = 0.7,
is v = ±(1, −1)/√2.]

Question. Can we define the k-th root of a diagonalizable matrix A, i.e. a
matrix B = “A^{1/k}”, with k ∈ N, such that B^k = A? If A is diagonalizable,
it means that M⁻¹AM = D = diag(λ1, ..., λn) for some λi ∈ C. If we define
B = M D̃ M⁻¹, with D̃ = diag(λ1^{1/k}, ..., λn^{1/k}), we can easily see that
B^k = M D̃^k M⁻¹ = M D M⁻¹ = A.
Note that this matrix B is not unique: for example, if we take

    B1 = M diag(ω1 λ1^{1/k}, ..., ωn λn^{1/k}) M⁻¹,

with each ωp a k-th root of unity, i.e. ωp = e^{2πi p/k} ∈ C, ωp^k = 1, we still
have B1^k = A.
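A numerical sketch of this construction (Python/numpy, not from the notes); complex arithmetic is used since the eigenvalues, and hence their k-th roots, may be complex, and the root returned is just one of the possible choices.

    import numpy as np

    def matrix_kth_root(A, k):
        """One k-th root of a diagonalizable A, via B = M diag(lambda_i^(1/k)) M^{-1}."""
        lam, M = np.linalg.eig(A)
        D_tilde = np.diag(lam.astype(complex) ** (1.0 / k))
        return M @ D_tilde @ np.linalg.inv(M)

    A = np.array([[-1.0, 4.0],
                  [-2.0, 5.0]])
    B = matrix_kth_root(A, 3)
    print(np.allclose(np.linalg.matrix_power(B, 3), A))   # True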
Chapter 2

Inner-product spaces
2.1 Definitions and examples
Definition A bilinear form B on a real vector space V is a mapping B :
V × V → R assigning to each pair of vectors u, v ∈ V a real number B(u, v)
satisfying the following for all u, v, w ∈ V and all λ ∈ R:
(i) B(u + v, w) = B(u, w) + B(v, w) ,
(ii) B(w, u + v) = B(w, u) + B(w, v) ,
(iii) B(λu, v) = B(u, λv) = λB(u, v) .
If we pick V = Rⁿ and we use the basis {v1, ..., vn}, we can express a bilinear
form B on V as

    B(u, w) = Σ_{i=1}^{n} Σ_{j=1}^{n} yi Aij xj ,

where xi, yj are the coordinates of the vectors u and w respectively, in the
basis {v1, ..., vn}, while the matrix A is defined by Aij = B(vj, vi); note the
labelling of the indices i and j.
We simply think of the coordinates {y1, ..., yn} of w as a row vector and of the
coordinates {x1, ..., xn} of u as a column vector, hence the bilinear form B(u, w)
is simply

    B(u, w) = Σ_{i=1}^{n} Σ_{j=1}^{n} yi Aij xj = yᵗAx .

Definition of real inner-product space. An inner product on a real vector


space V assigns to each pair of vectors u, v ∈ V a real number (u, v), satisfying
the following for all u, v, w ∈ V and all λ ∈ R:
(i) (u, v) = (v, u) (symmetry);

(ii) (u + v, w) = (u, w) + (v, w);

(iii) (λu, v) = λ(u, v);

(iv) (v, v) ≥ 0 with (v, v) = 0 ⇐⇒ v = 0 (positivity).

(ii) and (iii) constitute linearity in first factor; (i)-(iii) make it symmetric and
bilinear, ie. linear in each factor separately.
A bilinear form B, which is also positive and symmetric, defines an inner
product. A real inner-product space is a real vector space equipped with an
inner product.
The bilinearity of the inner product is equivalent to the fact that the map

    (· , v) : V → R

defined by u ↦ (u, v) is a linear transformation for every v ∈ V (together, of
course, with the symmetry property).

Examples.

(a) V = Rn with the standard Euclidean inner product (dot product)

(u, v) = u · v = u1 v1 + · · · + un vn .

(b) V = Rn and (u, v) = a1 u1 v1 + · · · + an un vn , where a1 , a2 , . . . , an are fixed


positive real numbers. This is called the weighted Euclidean inner product
with weights a1 , . . . , an . Note that properties (i)-(iii) are immediate, and
(iv) follows since a1 , . . . , an > 0.

(c) On V = R4 , the expression (x, y) = x1 y1 + x2 y2 + x3 y3 − x4 y4 is not an


inner product: properties (i)-(iii) hold, but (iv) does not. For example,
x = (1, 0, 0, 1) is a non-zero vector with (x, x) = 0. However, it is of great
importance: from Einstein, we know that this is the pseudo-inner product
on flat space-time, with x4 being the time coordinate. The vector space R4
with this pseudo-inner product is called Minkowski space.

(d) On V = R2 , the expression (x, y) = x1 y1 + x1 y2 + x2 y2 is not an inner


product: it is not symmetric. E.g. x = (1, 1) and y = (1, 0) give (x, y) = 1
but (y, x) = 2.

(e) On V = R2 , the expression (x, y) = x1 y1 + x1 y2 + x2 y1 + x2 y2 is not an


inner product: properties (i)-(iii) hold, but (iv) does not. To see this, note
that (x, x) = (x1 + x2)², so for example x = (1, −1) is a non-zero vector
with (x, x) = 0.

(f) On V = R2 , the expression (x, y) = x1 y1 + x1 y2 + x2 y1 + 2x2 y2 is an inner


product.

(g) On V = R3 , consider the expression (x, y) = 2x1 y1 + x2 y2 − 2x2 y3 − 2x3 y2 +


kx3 y3 , where k ∈ R is fixed. This satisfies properties (i)-(iii), so the question
is whether (iv) holds. To investigate this, observe that (x, x) = 2x1² + (x2 −
2x3)² + (k − 4)x3², so it is an inner product if and only if k > 4. (Note that
k = 4 does not work.)

(h) V = C[a, b], the vector space of continuous functions f : [a, b] → R. Recall
that V is infinite-dimensional, and that the zero-vector is the function f
which is identically zero. Define

    (f, g) = ∫_a^b f(t) g(t) dt.

(This scalar product is precisely the one inducing the so-called L² norm over
C[a, b].) Note that this is symmetric and linear in each factor. Also

    (f, f) = ∫_a^b f(t)² dt ≥ 0,

and if f(t) is not identically zero on [a, b] then (f, f) > 0. For in this
case there exists t0 ∈ [a, b] such that f(t0) = c ≠ 0 and hence, using the
continuity of f, there is a small neighbourhood (t0 − δ, t0 + δ), with δ > 0,
around t0 where f(t)² ≥ c²/4, so integrating gives (f, f) = ∫_a^b f(t)² dt > 0.
Conclusion: this is an inner product.

(i) Weighted version of the previous example. Let k : [a, b] → R be a continuous
function, also called a kernel or weight, with k(t) > 0 for all t ∈ (a, b),
and define (f, g) = ∫_a^b k(t) f(t) g(t) dt. More on this later.

(j) Vector space of polynomials. Let R[x]n denote the space of polynomials
in x, of degree at most n, with real coefficients, i.e. R[x]n = {a0 + ... +
an xⁿ with ai ∈ R}. This is a vector space of dimension n + 1. For any interval
[a, b], it is a subspace of the infinite-dimensional space C[a, b] above. The
form (f, g) = ∫_a^b f(t) g(t) dt is one choice of inner product. A natural basis
is {1, x, . . . , xⁿ⁻¹, xⁿ}; so if

    p(x) = pn xⁿ + . . . + p0

is a polynomial, then its components with respect to this basis are p =
(p0, . . . , pn). The expression for (p, q) depends on a, b and n. For a = 0,
b = 1 and n = 1 we get

    (p, q) = p0 q0 + (1/2) p1 q0 + (1/2) p0 q1 + (1/3) p1 q1 .
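As a check of this last formula, the following sketch (Python/numpy, not part of the notes) builds the matrix of inner products of the basis {1, x} on [0, 1] from the exact monomial integrals ∫_0^1 x^{i+j} dx = 1/(i + j + 1); the coefficient values for p and q are arbitrary choices for illustration.

    import numpy as np

    n = 1                     # degree bound, so the basis is {1, x}
    # Gram matrix G[i, j] = integral_0^1 x^i x^j dx = 1 / (i + j + 1).
    G = np.array([[1.0 / (i + j + 1) for j in range(n + 1)] for i in range(n + 1)])
    print(G)                  # [[1, 1/2], [1/2, 1/3]]

    # (p, q) for p = p0 + p1 x, q = q0 + q1 x, written in coordinates:
    p = np.array([1.0, 2.0])  # example coefficients (p0, p1), chosen arbitrarily
    q = np.array([3.0, 1.0])
    print(q @ G @ p)          # matches p0*q0 + (p1*q0 + p0*q1)/2 + p1*q1/3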

Matrix version.
Definition. We say that an n × n matrix B is symmetric if

    Bᵗ = B,

and anti-symmetric or skew-symmetric if

    Bᵗ = −B.

A symmetric matrix B is positive-definite if all its eigenvalues are positive. (We
will see later on that every symmetric matrix is also diagonalizable.) An al-
ternative, equivalent, criterion for B to be positive-definite is that all n of its
upper-left square submatrices have positive determinant. (These subdetermi-
nants are called the leading principal minors of B.)
Every matrix B can be decomposed into its symmetric and anti-symmetric
parts simply via

    B = (B + Bᵗ)/2 + (B − Bᵗ)/2 = B₊ + B₋ ;

clearly B₊ = (B + Bᵗ)/2 is symmetric, while B₋ = (B − Bᵗ)/2 is anti-symmetric.
Proposition. If V = Rⁿ, a bilinear form B defines an inner product on V if
and only if its matrix representation (which we denote by the same name B) is
symmetric, i.e. its anti-symmetric part B₋ vanishes, and, to ensure
the positivity condition of the inner product, B is a positive-definite n × n
matrix.
We think of v, w ∈ V as column vectors, written in some basis; then an inner
product (·, ·) is simply (v, w) = wᵗBv with B a symmetric and positive-definite
matrix.
For example (e) above, we have

    B = [ 1  1 ]
        [ 1  1 ];

the two subdeterminants are 1 and 0, so B is not positive-definite.
For example (g) above, we have

    B = [ 2   0   0 ]
        [ 0   1  −2 ]
        [ 0  −2   k ];

the three subdeterminants are 2, 2 and 2(k − 4), so B is positive-definite if and


only if k − 4 > 0.
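The two criteria (positive eigenvalues, positive leading principal minors) are easy to test numerically; here is a sketch in Python/numpy (not from the notes) applied to the matrix of example (g), for a couple of sample values of k.

    import numpy as np

    def is_positive_definite(B):
        """Check a symmetric matrix via its leading principal minors."""
        n = B.shape[0]
        return all(np.linalg.det(B[:k, :k]) > 0 for k in range(1, n + 1))

    for k in [3.0, 5.0]:       # sample values on either side of the threshold k = 4
        B = np.array([[2.0, 0.0, 0.0],
                      [0.0, 1.0, -2.0],
                      [0.0, -2.0, k]])
        # Both tests agree: positive-definite exactly when k > 4.
        print(k, is_positive_definite(B), bool(np.all(np.linalg.eigvalsh(B) > 0)))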
Definition of complex inner product. A complex (or hermitian) inner
product on a complex vector space V assigns to each pair of vectors u, v ∈ V
a complex number ⟨u, v⟩, satisfying the following for all u, v, w ∈ V and all
λ ∈ C:

(i) ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩ (hermiticity);

(ii) ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩;

(iii) ⟨λu, v⟩ = λ⟨u, v⟩;

(iv) ⟨v, v⟩ ≥ 0 with ⟨v, v⟩ = 0 ⇐⇒ v = 0 (positivity).

(ii) and (iii) constitute linearity in the first factor; note that the inner product is
not quite linear in the second factor, since from (i) & (iii) we see that ⟨u, λv⟩ =
λ̄⟨u, v⟩; this property is sometimes called sesquilinearity.
Note that ⟨v, v⟩ is always real, by property (i). [NB. We are using ( , ) for
real inner products, and ⟨ , ⟩ for hermitian inner products, but this notation is
not used by all authors.]
Examples.
(a) Simplest case: V = C. The standard inner product on C is ⟨z, w⟩ = z w̄.
    Note that ⟨z, z⟩ = |z|².

(b) The standard hermitian inner product on Cⁿ is given by ⟨z, w⟩ = z1 w̄1 +
    · · · + zn w̄n. Note that ⟨z, z⟩ = |z1|² + · · · + |zn|².

(c) V = C² and ⟨z, w⟩ = 3z1 w̄1 + 2z2 w̄2.

(d) V = C² and ⟨z, w⟩ = 3z1 w̄1 + 2z2 w̄2 + iz1 w̄2 − iz2 w̄1. Note that (i) is
    satisfied, and (ii), (iii) are satisfied since the right-hand side is complex-
    linear in z1, z2. For (iv), we have

        ⟨z, z⟩ = 3z1 z̄1 + 2z2 z̄2 + iz1 z̄2 − iz2 z̄1
               = 2|z1|² + |z2|² + |z1 − iz2|²
               ≥ 0,

    with equality if and only if z1 = z2 = 0. So (iv) is satisfied as well.

(e) V = C² and ⟨z, w⟩ = z1 w̄1 + iz1 w̄2 + iz2 w̄1 + z2 w̄2 is not hermitian, i.e.
    property (i) fails.

(f) V = C² and ⟨z, w⟩ = z1 w̄1 + iz1 w̄2 − iz2 w̄1 + z2 w̄2 satisfies (i)(ii)(iii), but
    (iv) fails, since ⟨z, z⟩ = |z1 − iz2|², which is zero for z = (1, −i).

(g) Let V be the set of continuous complex-valued functions f : [a, b] → C, with

        ⟨f, g⟩ = ∫_a^b f(t) ḡ(t) dt.

    As an exercise, check that this defines an inner-product space.
(h) Let C[x]n denote the space of polynomials in x, of degree at most n, with
    complex coefficients. This is a complex vector space of complex dimension
    n + 1. For any interval [a, b], it is a subspace of the infinite-dimensional
    space V above. A natural basis is {1, x, . . . , xⁿ⁻¹, xⁿ}; so if

        p(x) = pn xⁿ + . . . + p0

    is a polynomial, then its components with respect to this basis are p =
    (p0, . . . , pn). As for the real case, we could use ∫_a^b p(t) q̄(t) dt as an inner
    product. For a = 0, b = 1 and n = 1 we get

        ⟨p, q⟩ = p0 q̄0 + (1/2) p1 q̄0 + (1/2) p0 q̄1 + (1/3) p1 q̄1 ,

    which can also be rewritten as

        ⟨p, q⟩ = (q̄0, q̄1) [  1   1/2 ] [ p0 ]
                          [ 1/2  1/3 ] [ p1 ].

Matrix version.
Definition. We say that a matrix B is hermitian if B* = B. Here B* denotes
the complex-conjugate transpose of B (take the complex conjugate of all the
entries in B and transpose the matrix):

    (B*)ij = B̄ji ,

where z̄ = Re(z) − i Im(z) denotes the complex conjugate of the complex number
z = Re(z) + i Im(z).
Similarly, B is an anti-hermitian matrix if B* = −B. Note that sometimes
B* is denoted by B†.
As for the real case, every matrix can be decomposed as a sum of two matrices,
one hermitian and the other anti-hermitian:

    B = (B + B*)/2 + (B − B*)/2 = B₊ + B₋ ,

where B₊ = (B + B*)/2 is hermitian and B₋ = (B − B*)/2 is anti-hermitian.
If V = Cⁿ, then an inner product on V corresponds to a hermitian positive-
definite n × n matrix B, as follows. Think of v, w ∈ V as column vectors in
some basis, and define ⟨v, w⟩ = w*Bv.
The matrix B can be reconstructed, as for the real case seen before, from
the hermitian product evaluated on the basis vectors. In particular, if we take
{v1, ..., vn} as a basis we have

    Bij = ⟨vj, vi⟩ ;

note the labels of the indices.


For example (d) above, we have

    B = [ 3  −i ]
        [ i   2 ]

which is hermitian; the two subdeterminants are 3 and 5, so B is positive-
definite.
For example (e) above, we have

    B = [ 1  i ]
        [ i  1 ],

which is not hermitian. (It is symmetric, but that is not what is needed in the
complex case.)

2.2 The Cauchy-Schwarz inequality

Definition Given a vector space V over R (or C), a norm on V is a function


usually denoted by || · || : V → R with the following properties

(i) ||a v|| = |a| ||v|| (absolute homogeneity),

(ii) ||v + u|| ≤ ||v|| + ||u|| (triangle inequality or subadditivity),

(iii) If ||v|| = 0 then v = 0 (separation of points),

this for all a ∈ R (or C) and all v, u ∈ V. Note that from (i) we have ||0|| = 0
and ||v|| = || − v||, so from the triangle inequality it follows that

||v|| ≥ 0 ∀v ∈ V (non-negativity) .

The triangle inequality takes its name from the well known fact that in a
triangle the sum of the lengths of any two sides must be greater than or equal
to the length of the remaining side.

Norm, orthogonality and angle.

The real case


If (V, ( , )) is a real inner product space, then we define the norm ||v|| of a
vector v ∈ V by ||v|| = √(v, v). One could also call this the length of the
vector, but using the word norm emphasizes that the inner product may not
be the standard one. A vector v ∈ V is a unit vector if ||v|| = 1. Two vectors
u and v are orthogonal if (u, v) = 0.
Since we just called ||v|| = √(v, v) a norm, it is clear that it should satisfy the
properties outlined in the definition above. It is straightforward to see that
properties (i) and (iii) are satisfied, following from the properties of the inner
product. We are left with proving that the norm just introduced satisfies the
triangle inequality.

Theorem (Pythagoras). If u and v are orthogonal, then ||u + v||² = ||u||² +
||v||². The proof is a direct calculation: expand ||u + v||².

Examples.

(a) The only unit vectors in R with the standard inner product are ±1; but in
    C with the standard inner product, e^{iθ} = cos(θ) + i sin(θ) is a unit vector
    for any θ ∈ [0, 2π).

(b) In C² with the standard hermitian inner product, find all the unit vectors
    v which are orthogonal to u = (i, 1). Solution: ⟨v, u⟩ = −iv1 + v2, so
    v = (c, ic) for some c ∈ C. Then ||v||² = 2|c|², so to get a unit vector we
    must impose |c| = 1/√2. The solution is v = (1, i)e^{iθ}/√2 for θ ∈ [0, 2π).

In the real case, we would like to define the angle θ between u and v by
(u, v) = ||u||·||v||·cos θ; but is this angle well-defined? The Cauchy-Schwarz
inequality guarantees that it is.

Theorem (the real Cauchy-Schwarz inequality). (u, v)² ≤ (u, u)(v, v),
with equality if and only if u and v are linearly dependent. Equivalently,
|(u, v)| ≤ ||u||·||v||.
Proof. This is trivial if u = 0, so suppose that u ≠ 0. Then for all real numbers
x we have

    ||xu − v||² = x²||u||² − 2x(u, v) + ||v||² ≥ 0,

with equality if and only if xu − v = 0. Thinking of this as a quadratic expression
in x, we deduce that its discriminant satisfies (u, v)² − ||u||²||v||² ≤ 0, with
equality if and only if xu − v = 0 for some x.

Corollary: the Triangle Inequality. For all u, v ∈ V, we have ||u + v|| ≤
||u|| + ||v||.
Proof:

    (||u|| + ||v||)² − ||u + v||² = ||u||² + 2||u||·||v|| + ||v||² − ||u||² − 2(u, v) − ||v||²
                                 = 2 [||u||·||v|| − (u, v)]
                                 ≥ 0.

Thanks to the Corollary we see that, whenever we have a scalar product (·, ·)
on a real vector space V, the function || · || : V → R given by ||v|| = √(v, v)
defines a norm on V; we usually say that this is the norm induced by the inner
product. Note however that our definition of norm is more general and does
not rely on the presence of an inner product: there are norms which are not
induced by any inner product.

Corollary. For a real inner-product space, the angle θ ∈ [0, π] between u and
v, given by (u, v) = ||u||·||v||·cos θ, is well-defined, since from the Cauchy-
Schwarz inequality we know that −||u||·||v|| ≤ (u, v) ≤ ||u||·||v||. Clearly if u
and v are orthogonal vectors, (u, v) = 0 and the angle between them is θ = π/2,
as expected. If u and v are parallel to one another, i.e. linearly dependent
with v = xu for x > 0, the angle is θ = 0, while if they are anti-parallel, i.e.
linearly dependent with v = xu for x < 0, the angle is θ = π.
An orthonormal basis of V is a basis consisting of mutually orthogonal unit
vectors. More on this later on.

Examples.

(a) V = R2 with the standard inner product. Take u = (1, 0) and v =


(cos θ, sin θ). Both are unit vectors, and (u, v) = cos θ.

(b) V = R2 with inner product (u, v) = 4u1 v1 +u1 v2 +u2 v1 +u2 v2 . Let u = (1, 0)
and v = (0, 1). Then ||u|| = 2, ||v|| = 1, and cos θ = 1/2, so θ = π/3.
(c) V = C[−1, 1] with the inner product (f, g) = ∫_{−1}^{1} f(t) g(t) dt. Then ||1|| = √2,
    ||t|| = √(2/3), and 1, t are orthogonal. However, (1, t²) = ∫_{−1}^{1} t² dt = 2/3,
    so 1, t² are not orthogonal.

(d) V = C[0, 1] with the inner product (f, g) = ∫_0^1 f(t) g(t) dt. Then ||1|| = 1,
    ||t|| = √(1/3), and (1, t) = 1/2. If θ is the angle between 1 and t, then
    cos θ = √3/2, and so θ = π/6.

The complex case

The same story can be repeated for the norm induced by a hermitian inner
product ⟨ , ⟩ on a complex vector space V.
Let (V, ⟨ , ⟩) be a hermitian inner product space; then we define the norm
||v|| of a vector v ∈ V by ||v|| = √⟨v, v⟩. One could also call this the length
of the vector, but using the word norm emphasizes that the inner product may
not be the standard one. A vector v ∈ V is a unit vector if ||v|| = 1. Two
vectors u and v are orthogonal if ⟨u, v⟩ = 0.
As for the real case seen above, also for the norm induced by a hermitian
product ⟨ , ⟩ it is straightforward to see that properties (i) and (iii) of the
definition of a norm are satisfied, simply following from the properties of the inner
product. We are left with proving that the norm just introduced satisfies the
triangle inequality.

Theorem (the complex Cauchy-Schwarz inequality). Let (V, ⟨ , ⟩) be a
hermitian inner product space; then for all vectors u, v ∈ V we have

    |⟨u, v⟩|² ≤ ||u||² ||v||² ,

with equality if and only if u and v are linearly dependent.

Proof: essentially the same as the real case. Let us assume u ≠ 0 and
⟨u, v⟩ ≠ 0, otherwise the statement is trivial, and let us consider the following
quadratic expression in x ∈ C:

    ||xu − v||² = |x|²||u||² − 2Re(x⟨u, v⟩) + ||v||² ≥ 0,

with equality if and only if xu − v = 0.


Since this expression must be non-negative for all x ∈ C, we can choose in particular

    x = λ |⟨u, v⟩| / ⟨u, v⟩ ,

with λ ∈ R. The above inequality becomes

    λ²||u||² − 2λ|⟨u, v⟩| + ||v||² ≥ 0 ,

and at this point we can apply the same argument we used in the real Cauchy-
Schwarz inequality. For the above expression to be non-negative for all
λ ∈ R, the discriminant of this quadratic expression must be non-positive:

    |⟨u, v⟩|² − ||u||² · ||v||² ≤ 0 ,

which is precisely the complex Cauchy-Schwarz inequality.


The equality holds only when, for some x ∈ C, we have xu − v = 0, i.e. u
and v are linearly dependent.
Note that we cannot define the angle between u, v as in the real case, since
⟨u, v⟩ is complex. However, it still makes sense to say that complex vectors
are orthogonal. An orthonormal basis of V is a basis consisting of mutually
orthogonal unit vectors.
Example. In C3 with the standard hermitian inner product, the following is
an orthonormal basis:

v1 = (1, 0, 0), v2 = (0, −1, 0), v3 = (0, 0, i).

Finally we can show that the norm induced by an hermitian inner product
satisfies the triangle inequality.
Corollary: the Triangle Inequality. For all u, v ∈ V, we have ||u + v|| ≤
||u|| + ||v||.
Proof:

    (||u|| + ||v||)² − ||u + v||² = ||u||² + 2||u||·||v|| + ||v||²
                                   − ||u||² − ⟨u, v⟩ − ⟨v, u⟩ − ||v||²
                                 = 2 [||u||·||v|| − Re⟨u, v⟩] ≥ 2 [||u||·||v|| − |⟨u, v⟩|]
                                 ≥ 0,

where we used the fact that ⟨u, v⟩ + ⟨v, u⟩ = 2Re⟨u, v⟩, from the hermiticity
property of the inner product, and finally Re⟨u, v⟩ ≤ |⟨u, v⟩|.
We were thus justified in calling √⟨v, v⟩ the norm induced by the inner
product ⟨ , ⟩, since it satisfies the defining properties (i)-(ii)-(iii) of a norm.

2.3 The Gram-Schmidt procedure


It is often useful to have an orthonormal set of vectors in an inner-product
space. If there are n such vectors and the vector space is n-dimensional, then
they form an orthonormal basis.

There is a systematic method to produce an orthonormal set of vectors from


any given set of linearly-independent vectors, called the Gram-Schmidt proce-
dure, which works as follows.
Let v1 , . . . , vk be linearly independent vectors in an inner-product space V .
These span a k-dimensional subspace U of V . Our task is to construct an
orthonormal basis u1 , . . . , uk for U . The procedure is as follows.
• Define u1 = v1/||v1||. Then u1 is a unit vector, and span{u1} = span{v1}.

• Define ṽ2 = v2 − (v2, u1) u1. Then (u1, ṽ2) = 0 and ṽ2 ≠ 0, since v2 ∉ span{u1}.

• Define u2 = ṽ2/||ṽ2||. Then u1, u2 are mutually orthogonal unit vectors, and
  span{u1, u2} = span{v1, v2}.

• Suppose now that mutually orthogonal unit vectors u1, . . . , ur have been
  found with span{u1, . . . , ur} = span{v1, . . . , vr} for some r with 1 ≤ r ≤ k.
  If r < k, then define

      ṽr+1 = vr+1 − (vr+1, u1) u1 − · · · − (vr+1, ur) ur .

  Then
      (ṽr+1, u1) = · · · = (ṽr+1, ur) = 0,

  and ṽr+1 ≠ 0 since vr+1 ∉ span{u1, . . . , ur} = span{v1, . . . , vr}.
  Define ur+1 = ṽr+1/||ṽr+1||. Then u1, . . . , ur+1 are mutually orthogonal unit
  vectors, and span{u1, . . . , ur+1} = span{v1, . . . , vr+1}.

By this inductive process, we construct an orthonormal basis for U . In partic-


ular, if U = V with V of finite dimension, then this enables us to construct an
orthonormal basis for V .
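The procedure translates directly into code. Below is a short sketch (Python/numpy, not part of the notes) for the standard inner product on Rⁿ; a different inner product would simply replace np.dot. The vectors used are those of Example (a) below.

    import numpy as np

    def gram_schmidt(vectors):
        """Orthonormalize a list of linearly independent vectors (standard dot product)."""
        basis = []
        for v in vectors:
            v = np.array(v, dtype=float)
            # Subtract the components along the unit vectors found so far.
            for u in basis:
                v = v - np.dot(v, u) * u
            basis.append(v / np.linalg.norm(v))
        return basis

    u1, u2, u3 = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
    print(u1, u2, u3)
    print(np.allclose(np.dot(u1, u2), 0), np.allclose(np.dot(u1, u1), 1))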

Examples.

(a) In R3 with the standard inner product, apply Gram-Schmidt to

v1 = (1, 1, 0), v2 = (1, 0, 1), v3 = (0, 1, 1).


Solution:

        u1 = (1/√2)(1, 1, 0),
        ṽ2 = v2 − (v2, u1) u1 = (1, 0, 1) − (1/2)(1, 1, 0) = (1/2)(1, −1, 2),
        u2 = (1/√6)(1, −1, 2),
        ṽ3 = v3 − (v3, u1) u1 − (v3, u2) u2
           = (0, 1, 1) − (1/2)(1, 1, 0) − (1/6)(1, −1, 2) = (2/3)(−1, 1, 1),
        u3 = (1/√3)(−1, 1, 1).
Remember, there is a simple way to check your answer — always do so!

(b) V = R² with ||v||² = 4x1² + 2x1x2 + x2² for v = (x1, x2), v1 = (1, 0) and
    v2 = (0, 1).
    Solution. The first step is to deduce the inner product from the norm (this
    can always be done when the norm is induced by an inner product). In
    this case, it is clear that (u, v) = 4u1v1 + u1v2 + u2v1 + u2v2. Applying
    Gram-Schmidt, we have

        u1 = (1/2)(1, 0),
        ṽ2 = v2 − (v2, u1) u1 = (0, 1) − (1/4)(1, 0) = (1/4)(−1, 4),
        u2 = (1/(2√3))(−1, 4).
(c) V = C[−1, 1] with the inner product (f, g) = ∫_{−1}^{1} f(t) g(t) dt, and f1 = 1,
    f2 = t, f3 = t². Find orthonormal {g1, g2, g3}.
    Solution. We previously computed ||f1||² = 2, ||f2||² = 2/3 and (f1, f2) = 0.
    So we immediately get g1 = 1/√2 and g2 = √(3/2) t. Then

        (f3, g1) = (1/√2) ∫_{−1}^{1} t² dt = √2/3,
        (f3, g2) = √(3/2) ∫_{−1}^{1} t³ dt = 0,
        f̃3 = f3 − (f3, g1) g1 − (f3, g2) g2
           = t² − (√2/3)(1/√2)
           = t² − 1/3,
        ||f̃3||² = ∫_{−1}^{1} ( t⁴ − (2/3)t² + 1/9 ) dt
                = 2/5 − (2/3)(2/3) + 2/9
                = 2/5 − 2/9
                = 8/45,
        g3 = (3√5 / (2√2)) ( t² − 1/3 ).

(d) Apply Gram-Schmidt to v1 = (1, 0) and v2 = (0, 1) on C² with the inner
    product ⟨z, w⟩ = 3z1 w̄1 + 2z2 w̄2 + iz1 w̄2 − iz2 w̄1.
    Solution:

        ||v1||² = 3,
        u1 = (1/√3)(1, 0),
        ṽ2 = v2 − ⟨v2, u1⟩ u1
           = (0, 1) − (1/3)(−i)(1, 0) = (1/3)(i, 3),
        ||ṽ2||² = 5/3,
        u2 = (1/√15)(i, 3).

(e) Take V = R³ with the standard inner product. Find an orthonormal basis
    for the subspace U given by x1 + x2 + x3 = 0, and extend it to an orthonormal
    basis for R³.
    Solution. A basis for U is v1 = (1, −1, 0) and v2 = (0, 1, −1). Apply
    Gram-Schmidt to this, giving

        u1 = (1/√2)(1, −1, 0),
        ṽ2 = (0, 1, −1) − (1/2)(−1)(1, −1, 0) = (1/2)(1, 1, −2),
        u2 = (1/√6)(1, 1, −2).

    So {u1, u2} is an orthonormal basis for U. To extend it, take v3 = (0, 0, 1),
    say, and apply Gram-Schmidt one more time:

        ṽ3 = (0, 0, 1) − (0, 0, 0) − (1/6)(−2)(1, 1, −2) = (1/3)(1, 1, 1),
        u3 = (1/√3)(1, 1, 1).

    Now {u1, u2, u3} is an orthonormal basis for V.

2.4 Orthogonal projection & orthogonal complement

Definition. Let U be a subspace of an inner product space V. Then the
orthogonal complement of U is the set

    U⊥ = {v ∈ V : (u, v) = 0 for all u ∈ U} ,

the set of vectors in V which are orthogonal to all the vectors in U. Note that U⊥ is a
subspace of V, i.e. U⊥ is a vector space in its own right. Now given any vector
v ∈ V, there is a unique vector u ∈ U called the orthogonal projection of v
onto U (at least if U is finite-dimensional), and one may find this u explicitly
as follows.

Proposition. If U is finite-dimensional, then there always exists a decomposi-


tion v = u + ũ, where u ∈ U and ũ ∈ U ⊥ so we can write V = U ⊕ U ⊥ .
Proof. Let {u1 , . . . , uk } be an orthonormal basis of U , and set

u = (v, u1 ) u1 + · · · + (v, uk ) uk , ũ = v − u .

Clearly u ∈ U; and for each ui note that (u, ui) = (v, ui)(ui, ui) = (v, ui). So
for each ui we have

(ũ, ui ) = (v − u, ui )
= (v, ui ) − (u, ui )
= (v, ui ) − (v, ui )
= 0.

Hence ũ ∈ U ⊥ .
Given a finite-dimensional vector subspace U of the inner product space (V, ( , )),
we can consider the orthogonal projection operator P onto U, defined by P(v) =
(v, u1) u1 + · · · + (v, uk) uk, where {u1, . . . , uk} is an orthonormal basis of U.
This is a linear map P : V → V. This projection operator satisfies the following
properties:

• P is a linear operator;

• the image of P is ImP = U , while its kernel is KerP = U ⊥ ;

• P is idempotent, i.e. P 2 = P ;

• P restricted to U is the identity operator, which we usually write as


P |U = I, i.e. P (u) = u for all u ∈ U;

• (I − P )P = P (I − P ) = 0.

The last equation is simply telling us that if we project first a vector v onto
U , and then take the component of this projection orthogonal to U , or if we
take the component of a vector v orthogonal to U and then project it onto U ,
we always find the zero vector.
From the idempotency condition it is easy to read off the eigenvalues of P.
Let v be an eigenvector of P with eigenvalue λ, i.e. Pv = λv; then, since
P² = P, we get

    λv = Pv = P²v = λ²v ,

so the only possible eigenvalues are λ = 0 and λ = 1, since v ≠ 0.
Let us assume that the inner product space V has finite dimension dim V = n,
then the matrix form of the projection onto U takes a very simple form if we
use a smart basis. We know that V = U ⊕ U ⊥ , so we can use Gram-Schmidt to
obtain a basis {u1 , ..., uk } for U , made of orthonormal vectors, with k = dim U ,
and similarly a basis {v1 , ..., vn−k } for U ⊥ , made of orthonormal vectors, with
n − k = dim U ⊥ . The set {u1 , ..., uk , v1 , ..., vn−k } forms an orthonormal basis

for V , furthermore the orthogonal projection onto U can be easily evaluated on


these basis elements:

P (ui ) = ui , ∀ i = 1, ...k, while P (vj ) = 0 , ∀ j = 1, ..., n − k ,

so P = Diag(1, ..., 1, 0, ..., 0) where the number of 1s on the diagonal is k =


dimU , while the number of 0s on the diagonal is n − k = dimU ⊥ .
In a certain sense, the projection P (v) of a vector v onto the vector subspace
U , is the vector P (v) = u ∈ U which is “nearest” to v. More mathematically
we have the following proposition.

Proposition. Let U be a finite-dimensional subspace of an inner product space


V , and let v0 ∈ V . If u0 is the orthogonal projection of v0 onto U , then for all
u ∈ U we have
||u − v0 || ≥ ||u0 − v0 ||,
with equality if and only if u = u0 . Thus u0 gives the nearest point on U to
v0 . In some contexts, u0 is a least-squares approximation to v0 .
Proof. Write v0 = u0 + ũ0. Then

    ||v0 − u||² = ||v0 − u0 + u0 − u||²
                = ||v0 − u0||² + ||u0 − u||²

by Pythagoras, since v0 − u0 = ũ0 ∈ U⊥ and u0 − u ∈ U. Thus ||v0 − u||² ≥
||v0 − u0||², with equality if and only if ||u0 − u||² = 0.

Examples.

(a) Consider V = R4 equipped the standard inner product (and the norm
induced by it), find the point in
    

 0 2 
 2 3 
U = span v1 = −1 , v2 = 1
  

 
2 2
 

nearest to  
1
1
v0 = 
1 .

Solution. We first need an orthonormal basis for U . Now kv1 k2 = 9, so the


CHAPTER 2. INNER-PRODUCT SPACES 31

first unit vector in the Gram-Schmidt process is

        u1 = (1/3)(0, 2, −1, 2).

    Then

        ṽ2 = v2 − (v2, u1) u1 = (2, 3, 1, 2) − ((6 − 1 + 4)/9)(0, 2, −1, 2) = (2, 1, 2, 0).

    Then ||ṽ2||² = 9, so define

        u2 = (1/3)(2, 1, 2, 0).

    Then {u1, u2} is an orthonormal basis for U. The orthogonal projection of
    v0 onto U is

        u0 = (v0, u1) u1 + (v0, u2) u2 = (3/9)(0, 2, −1, 2) + (5/9)(2, 1, 2, 0) = (1/9)(10, 11, 7, 6).

    Using Pythagoras' theorem gives

        ||v0 − u0||² = ||v0||² − ||u0||²
                     = (v0, v0) − (v0, u1)² − (v0, u2)²
                     = 4 − 1 − 25/9 = 2/9.

    Note that

        v0 − u0 = (1/9)(−1, −2, 2, 3),

    and ||v0 − u0||² = (1 + 4 + 4 + 9)/81 = 2/9. This agrees!
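The same computation can be reproduced numerically; the sketch below (Python/numpy, not part of the notes) builds the orthonormal basis, projects v0, and checks the distance 2/9.

    import numpy as np

    v1 = np.array([0., 2., -1., 2.])
    v2 = np.array([2., 3., 1., 2.])
    v0 = np.array([1., 1., 1., 1.])

    # Orthonormal basis of U = span{v1, v2} (Gram-Schmidt as in Section 2.3).
    u1 = v1 / np.linalg.norm(v1)
    w = v2 - np.dot(v2, u1) * u1
    u2 = w / np.linalg.norm(w)

    # Orthogonal projection of v0 onto U.
    u0 = np.dot(v0, u1) * u1 + np.dot(v0, u2) * u2
    print(u0 * 9)                                    # (10, 11, 7, 6), i.e. u0 = (10,11,7,6)/9
    print(np.isclose(np.sum((v0 - u0)**2), 2 / 9))   # True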
(b) Let V = C[−1, 1] with the inner product

        (f, g) = ∫_{−1}^{1} f(t) g(t) dt.

    Find the linear polynomial f closest to g(t) = t² + t.
    Solution. The space of linear polynomials is U = span(1, t). From a previous
    example, an orthonormal basis for U is

        { f1 = 1/√2 ,  f2 = √(3/2) t } .

    We have

        (g, f1) = (1/√2) ∫_{−1}^{1} (t² + t) dt = (1/√2) [ t³/3 + t²/2 ]_{−1}^{1} = √2/3 ,
        (g, f2) = √(3/2) ∫_{−1}^{1} (t³ + t²) dt = √(3/2) [ t⁴/4 + t³/3 ]_{−1}^{1} = √2/√3 .

    The orthogonal projection of g onto U is

        f(t) = (g, f1) f1 + (g, f2) f2 = (√2/3)(1/√2) + (√2/√3)(√3/√2) t = t + 1/3 .

    So this is the "best" (least-squares) straight-line fit to the parabola.
    We can check this. Suppose that f = at + b; then we want to find a and b
    that minimise ||g − f|| = ||t² + t − at − b||. Now

        ||t² + t − at − b||² = ∫_{−1}^{1} ( t⁴ + 2t³ + t² − 2at³ − 2at² − 2bt² − 2bt + a²t² + 2abt + b² ) dt
                             = 2/5 + 2/3 − 4a/3 − 4b/3 + 2a²/3 + 2b²
                             = 8/45 + (2/3)(a − 1)² + 2(b − 1/3)² ,

    where we completed the square in the last line. This is minimised when
    a = 1 and b = 1/3. That is, f(t) = t + 1/3 as above.
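One more check (a Python/numpy sketch, not part of the notes): solve the normal equations for the best fit b·1 + a·t to g, using the exact monomial integrals over [−1, 1], and compare with a = 1, b = 1/3.

    import numpy as np

    # Inner products of monomials on [-1, 1]: integral of t^k is 2/(k+1) if k is even, else 0.
    def mono(k):
        return 2.0 / (k + 1) if k % 2 == 0 else 0.0

    # Normal equations for the best fit f = b*1 + a*t to g(t) = t^2 + t.
    G = np.array([[mono(0), mono(1)],       # Gram matrix of the basis {1, t}
                  [mono(1), mono(2)]])
    rhs = np.array([mono(2) + mono(1),      # (g, 1) = integral of (t^2 + t)
                    mono(3) + mono(2)])     # (g, t) = integral of (t^3 + t^2)
    b, a = np.linalg.solve(G, rhs)
    print(a, b)                             # 1.0  0.333..., i.e. f(t) = t + 1/3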
As one would expect from the well known 3-dimensional case of R3 , when
we take the projection of a vector, we generically obtain a “shorter” vector.
This holds in the more general context of inner product spaces discussed in this
Section, where the notion of length of a vector is now replaced by the more
general concept of norm of a vector.

Bessel's Inequality. Let V be an inner product space and U a finite-dimensional
subspace. If v ∈ V, and u ∈ U is the orthogonal projection of v onto U, then
||u||² ≤ ||v||². In particular, if {u1, . . . , uk} is an orthonormal basis for U and
u = λ1 u1 + · · · + λk uk, then

    Σ_{i=1}^{k} λi² ≤ ||v||².

Proof. Writing v = u + ũ as usual, we have

    Σ_{i=1}^{k} λi² = ||u||² ≤ ||u||² + ||ũ||² = ||v||².

2.5 Orthogonal and Unitary Diagonalization


We now want to understand whether, via a change of basis, we can put every real
or hermitian inner product into some standard form, in the same spirit as our
discussion of diagonalization of matrices.
First, the important idea of orthogonal and unitary matrices.
Definitions. A real n × n matrix A is called symmetric if A = Aᵗ, and anti-
symmetric or skew-symmetric if Aᵗ = −A. If AAᵗ = I = AᵗA, then A is said to
be orthogonal.
The set (actually a group, as we will see later on) of n × n orthogonal
matrices is usually denoted by O(n) (orthogonal group):

    O(n) = {M ∈ GL(n, R) s.t. MᵗM = MMᵗ = I}.

Note that since det M = det Mᵗ, orthogonal matrices must have det M =
±1. The subset (actually a subgroup) of all the orthogonal matrices with
determinant +1 is denoted by SO(n) (special orthogonal group):

    SO(n) = {M ∈ O(n) s.t. det M = +1}.

A complex n × n matrix A = (aij) is called hermitian if A = A*, that is
ajk = ākj, anti-hermitian if A = −A*, and is called unitary if AA* = I = A*A.
The set (actually a group, as we will see later on) of n × n unitary matrices
is usually denoted by U(n) (unitary group):

    U(n) = {M ∈ GL(n, C) s.t. M*M = MM* = I}.

Note that det A* = conj(det A), and for this reason a unitary matrix has |det A|² = 1.
In particular, notice that U(1) = {z ∈ C s.t. z z* = z* z = z z̄ = |z|² = 1}, so the
determinant of every unitary matrix is an element of U(1).

A complex number with unit modulus, i.e. an element of U(1), is simply a
point on the unit circle in the complex plane, so U(1) = {e^{iφ} with φ ∈ [0, 2π)},
and the determinant of every unitary matrix takes the form det A = e^{iφ} for
some φ ∈ [0, 2π).
The subset (actually a subgroup) of all the unitary matrices with determi-
nant +1 is denoted by SU(n) (special unitary group):

    SU(n) = {M ∈ U(n) s.t. det M = +1}.

A matrix is orthogonal (or unitary) if its columns form an orthonormal set of


vectors, using the standard inner product on Rn (or Cn ).

Examples.

(a) (1/√2) [ 1  2 ]  is symmetric but not orthogonal.
           [ 2  3 ]

(b) (1/√2) [ 1  −1 ]  is orthogonal but not symmetric.
           [ 1   1 ]

(c) (1/√2) [ 1   1 ]  is both symmetric and orthogonal.
           [ 1  −1 ]

(d) (1/√2) [  1  i ]  is hermitian but not unitary.
           [ −i  2 ]

(e) (1/√2) [  1  −i ]  is unitary but not hermitian.
           [ −i   1 ]

Proposition. If A is real and symmetric, then there exists an orthogonal n × n


matrix P such that P t AP = D with D diagonal. If A is a complex hermitian
n × n matrix, then there exists a unitary n × n matrix P such that P ∗ AP = D
with D a real diagonal matrix. The entries of D are the eigenvalues of A. In
each case, the columns of P are the normalized eigenvectors of A.
Remarks. We shall omit the proof of this, and concentrate on using it. However,
we first prove the converse, and partial bit of the proposition itself, both of
which are fairly straightforward. Intuitively, the reason it is true is that if

the eigenvalues are all different, then there are n orthogonal eigenvectors by
the proposition below, and we can diagonalize by an orthogonal P ; as two
eigenvalues approach each other, the eigenvectors remain orthogonal, so it works
even in the limit when eigenvalues coincide.
Proposition. If A is orthogonally-diagonalizable, then A is symmetric.
Proof. If P t AP = D with P orthogonal and D diagonal, then A = P DP t which
is clearly symmetric.
Similarly for hermitian matrices and unitary diagonalization.
Proposition. If A is unitary-diagonalizable and the eigenvalues of A are all
real, then A is hermitian.
Proof. If P ∗ AP = D with P unitary and D diagonal, then A = P DP ∗ which is
clearly hermitian because we assumed that the eigenvalues of A were all real,
i.e. D∗ = D.
Proposition. Let A be a complex Hermitian (or real symmetric) n × n matrix.
Then the eigenvalues of A are real, and eigenvectors corresponding to different
eigenvalues are mutually-orthogonal. If A is real, then the eigenvectors are real
as well.
Remark. In what follows, we think of v ∈ Cn as a column vector, and use the
standard inner product hv, wi = w∗ v.
Proof. Let λ be an eigenvalue of A, so Ax = λx for some x ≠ 0. Note that x
can be taken to be real if and only if λ is real. Then we have

    x*Ax = λ x*x,
    x*A*x = λ̄ x*x      (taking the conjugate transpose),
    x*Ax = λ̄ x*x       (A = A* as A is hermitian),
    λ x*x = λ̄ x*x      (using Ax = λx),
    λ = λ̄              (since x*x > 0).

Next suppose that λ, µ are eigenvalues, with λ ≠ µ. Then for some x, y (which
we may take to be real when A is real, since λ, µ are real) we have

    Ax = λx, x ≠ 0,     Ay = µy, y ≠ 0,

and

    x*Ay = µ x*y,
    y*Ax = λ y*x,
    x*Ay = x*A*y = (Ax)*y = λ̄ x*y = λ x*y,

so (λ − µ) x*y = 0, hence x*y = 0

and thus x ⊥ y.

The main idea is that, for real-symmetric and hermitian matrices, we can
decompose the full vector space into mutually orthogonal eigenspaces

    V = Vλ1 ⊕ Vλ2 ⊕ ... ⊕ Vλk ,

relative to the distinct eigenvalues, i.e. λi ≠ λj for all i ≠ j. From the proposition
above we know that eigenvectors relative to different eigenvalues are mutually
orthogonal, so eigenspaces relative to different eigenvalues are mutually
orthogonal, Vλi ⊥ Vλj when i ≠ j, with respect to the standard real or complex
inner product (real if the matrix under consideration is real-symmetric, complex
if the matrix considered is hermitian).
Eigenspace by eigenspace, we can apply the Gram-Schmidt procedure. Given
the eigenspace Vλ, with Gram-Schmidt we obtain a new set of mλ (the multi-
plicity of the eigenvalue) linearly independent eigenvectors, all with eigenvalue
λ (i.e. Gram-Schmidt does not bring you outside of the eigenspace), all mu-
tually orthonormal with respect to the standard inner product. Let us denote
by {v_1^(λ), ..., v_mλ^(λ)} the orthonormal basis for the eigenspace Vλ.
By collecting together each orthonormal basis from every eigenspace Vλ we
obtain a basis for the full vector space V formed by mutually orthonormal
eigenvectors of A. We can construct the matrix

    P = [ v_1^(λ1), ..., v_mλ1^(λ1), ..., v_1^(λk), ..., v_mλk^(λk) ],

where each column is one of these orthonormalized eigenvectors. Clearly

    AP = [ Av_1^(λ1), ..., Av_mλ1^(λ1), ..., Av_1^(λk), ..., Av_mλk^(λk) ]
       = [ λ1 v_1^(λ1), ..., λ1 v_mλ1^(λ1), ..., λk v_1^(λk), ..., λk v_mλk^(λk) ]
       = [ v_1^(λ1), ..., v_mλ1^(λ1), ..., v_1^(λk), ..., v_mλk^(λk) ] D,

where D is the diagonal matrix formed by the eigenvalues of A (with multiplic-
ities), since each v_i^(λ) is an eigenvector of A.
Finally, if we consider PᵗAP, in the case in which A is real and symmetric,
or P*AP in the case in which A is hermitian, we will have to take all the stan-
dard (real or complex) inner products between the different basis elements. To
simplify the notation, let us relabel the basis {v_1^(λ1), ..., v_mλ1^(λ1), ..., v_1^(λk), ..., v_mλk^(λk)}
simply as {v1, ..., vn}. Thanks to the orthonormalization property of this par-
ticular basis we have

    (PᵗAP)ij = λj viᵗ · vj

in the real and symmetric case, or

    (P*AP)ij = λj vi* · vj
CHAPTER 2. INNER-PRODUCT SPACES 37

in the hermitian case. Thanks to the orthogonality between eigenspaces relative


to different eigenvalues, and after having applied the Gram-Schmidt procedure
to each eigenspace separately, we already know that our basis is orthonormal,
i.e. vit · vj = 0 if i 6= j and 1 otherwise, and similarly vi∗ · vj = 0 when i 6= j
and 1 otherwise.
Hence the matrix P , constructed following the procedure outlined above,
diagonalizes orthogonally the matrix A, i.e. P t AP = D, when A is real and
symmetric, or it diagonalizes unitarily the matrix A, i.e. P ∗ AP = D, when A
is hermitian.
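As a quick numerical sanity check (an illustration, not part of the original notes), here is a minimal NumPy sketch of this procedure for the real symmetric matrix of Example (b) below. The routine numpy.linalg.eigh returns real eigenvalues and an orthonormal set of eigenvectors, so its output plays the role of D and P.

import numpy as np

# Real symmetric matrix (the matrix of Example (b) below).
A = np.array([[ 2.0, -2.0,  0.0],
              [-2.0,  1.0, -2.0],
              [ 0.0, -2.0,  0.0]])

# eigh is designed for symmetric/hermitian matrices: it returns the eigenvalues
# (in ascending order) and orthonormal eigenvectors as the columns of P.
eigenvalues, P = np.linalg.eigh(A)
D = np.diag(eigenvalues)

print(np.allclose(P.T @ P, np.eye(3)))   # P is orthogonal: P^t P = I
print(np.allclose(P.T @ A @ P, D))       # P orthogonally diagonalizes A

The column order (and the overall signs of the columns) may differ from the hand computation in Example (b), but P^tAP = D holds either way.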
Example (a). Let

A = [ 2  −2; −2  5 ].

The eigenvalues are λ = 6, 1, and we get

P = (1/√5) [ 1  2; −2  1 ].

Example (b). Let

A = [ 2  −2  0; −2  1  −2; 0  −2  0 ].
Characteristic polynomial:

det(tI − A) = det [ t−2  2  0; 2  t−1  2; 0  2  t ]
            = (t − 2)[(t − 1)t − 4] − 2(2t)
            = (t − 2)(t² − t − 4) − 4t
            = t³ − 3t² − 6t + 8
            = (t − 1)(t + 2)(t − 4).

Thus the eigenvalues are λ = 4, 1, −2.
Eigenvectors:
• λ = 4: solve

(4I − A)x = [ 2  2  0; 2  3  2; 0  2  4 ] x = 0,

i.e. 2x1 + 2x2 = 0, 2x1 + 3x2 + 2x3 = 0, 2x2 + 4x3 = 0.

Unit eigenvector: (1/3)(2, −2, 1)^t.

• λ = 1: solve

(I − A)x = [ −1  2  0; 2  0  2; 0  2  1 ] x = 0,

i.e. −x1 + 2x2 = 0, 2x1 + 2x3 = 0, 2x2 + x3 = 0.

Unit eigenvector: (1/3)(2, 1, −2)^t.

• λ = −2: solve

(−2I − A)x = [ −4  2  0; 2  −3  2; 0  2  −2 ] x = 0,

i.e. −4x1 + 2x2 = 0, 2x1 − 3x2 + 2x3 = 0, 2x2 − 2x3 = 0.

Unit eigenvector: (1/3)(1, 2, 2)^t.

Thus

P = (1/3) [ 2  2  1; −2  1  2; 1  −2  2 ],    P^tAP = diag(4, 1, −2).
Note: All the eigenvalues in the example are distinct, so the corresponding
eigenvectors are automatically orthogonal to each other. If the eigenvalues are
not distinct, then for repeated eigenvalues we must choose mutually orthogonal
eigenvectors.
Example (c). Let

A = [ 4  1−i; 1+i  5 ].
Then A is hermitian, and the characteristic polynomial of A is given by

det(λI − A) = det [ λ−4  −1+i; −1−i  λ−5 ]
            = λ² − 9λ + 18
            = (λ − 3)(λ − 6).

Thus the eigenvalues are λ = 3, 6.


Corresponding unit eigenvectors:

λ = 3:  (1/√3) (−1 + i, 1)^t ;
λ = 6:  (1/√6) (1 − i, 2)^t .

(Note that these are mutually orthogonal.) Thus

P = [ (−1+i)/√3   (1−i)/√6 ; 1/√3   2/√6 ]

is unitary and P∗AP = diag(3, 6).
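A quick numerical cross-check of this hermitian example (again an illustration, not part of the notes): NumPy handles complex hermitian matrices through the same routines.

import numpy as np

A = np.array([[4.0, 1 - 1j],
              [1 + 1j, 5.0]])

# P as computed above: its columns are the unit eigenvectors for lambda = 3 and 6.
P = np.array([[(-1 + 1j)/np.sqrt(3), (1 - 1j)/np.sqrt(6)],
              [ 1/np.sqrt(3),         2/np.sqrt(6)]])

print(np.allclose(P.conj().T @ P, np.eye(2)))                # P is unitary
print(np.allclose(P.conj().T @ A @ P, np.diag([3.0, 6.0])))  # P*AP = diag(3, 6)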

Remark on inner products.


Every real inner product on Rn may be written as (u, v) = yt Ax, where A is
a symmetric n × n matrix and x, y are the coordinates of the two vectors u, v
in the chosen basis.
If we change basis, the coordinates representing the two vectors u, v will
change to x = M z, y = M w for some invertible matrix M , where z and w are
the coordinates of the vectors u, v in the new basis.
In the new basis the inner product will take the form (u, v) = yt Ax =
wt M t AM z = wt Bz, where the symmetric matrix B = M t AM represents the
same inner product but in a different basis.
As we have mentioned before, every symmetric matrix A can be orthogonally diagonalized: there exists an orthogonal matrix P such that PAP^t = D, with D diagonal, and then we have

(u, v) = y^tAx = y^tP^tDPx = (Py)^tD(Px) = w^tDz.

So x 7→ z = P x, y 7→ w = P y is a change of basis which diagonalizes the inner


product. In particular,

(v, v) = (P x)t D(P x) = λ1 (z1 )2 + . . . + λn (zn )2

is positive if and only if the λi are all positive, in other words if and only if the
matrix A is positive-definite.
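As an illustration (not from the notes), one can test positive-definiteness of a symmetric matrix numerically exactly this way, by checking that all the eigenvalues λi are positive:

import numpy as np

def is_positive_definite(A, tol=1e-12):
    # A symmetric matrix is positive-definite iff all its eigenvalues are > 0.
    eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
    return bool(np.all(eigenvalues > tol))

print(is_positive_definite(np.array([[2.0, -1.0], [-1.0, 2.0]])))   # True  (eigenvalues 1, 3)
print(is_positive_definite(np.array([[1.0,  2.0], [ 2.0, 1.0]])))   # False (eigenvalues -1, 3)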
Similarly, every complex inner product on Cn may be written as hu, vi =
y∗ Ax, where A is an hermitian n × n matrix and x, y are the coordinates of
the two vectors u, v in the chosen basis.

If we change basis, the coordinates representing the two vectors u, v will


change to x = M z, y = M w for some invertible matrix M , where z and w are
the coordinates of the vectors u, v in the new basis.
In the new basis the inner product will take the form (u, v) = y∗ Ax =
w∗ M ∗ AM z = w∗ Bz, where the hermitian matrix B = M ∗ AM represents the
same inner product but in a different basis.
We can use the fact that every hermitian matrix can be unitarily diagonalized
to write P AP ∗ = D, with P unitary and D diagonal, so that we have

hu, vi = y∗ P ∗ DP x = (P y)∗ D(P x).

So x 7→ z = P x, y 7→ w = P y is a change of basis which diagonalizes the inner


product. In particular,

hv, vi = (P x)∗ D(P x) = λ1 |z1 |2 + . . . + λn |zn |2

is positive if and only if the λi are all positive, in other words if and only if the
matrix A is positive-definite.
Chapter 3

Special polynomials
Prologue. In the 18th and 19th centuries, mathematicians modelling things
like vibrations encountered ODEs such as
(1 − x2 )y 00 − 2xy 0 − λy = 0,
with λ constant. This particular one is called the Legendre equation. They
came up again in the 20th century, for example this equation arises in atomic
physics.
Unlike the case of constant coefficients, the general solution of such equations
is messy. But for certain values of λ, there are polynomial solutions, and these
turn out to be the physically-relevant ones. Note that we can write the equation
as Ly = λy, where
L = (1 − x²) d²/dx² − 2x d/dx,
is a linear operator on a vector space of functions. If this function space is
a space of polynomials, then the special values of λ are the eigenvalues of L.
In applications these often correspond to natural frequencies of vibration, to spectral lines, etc.

3.1 Orthogonal polynomials


Introduction. Let R[x] denote the (infinite-dimensional) vector space of all
real polynomials in x, and let R[x]n denote the (n + 1)-dimensional subspace of
all real polynomials of degree at most n. The inner product is taken to be
(f, g) = ∫_a^b f(x) g(x) k(x) dx,

where k(x) is some fixed, continuous function such that k(x) > 0 for all x ∈
[a, b], with a, b fixed numbers.

Note that this is indeed an inner product over the space of continuous, real
valued functions C[a, b], over the interval [a, b], and hence also on its vector
subspaces R[x] and R[x]n :
1. It is symmetric and homogeneous, thanks to the linearity of the integral:

(λf1 + µf2, g) = ∫_a^b (λ f1(x) + µ f2(x)) g(x) k(x) dx
               = λ ∫_a^b f1(x) g(x) k(x) dx + µ ∫_a^b f2(x) g(x) k(x) dx
               = λ(f1, g) + µ(f2, g) = λ(g, f1) + µ(g, f2),

for all constants λ, µ ∈ R, and all f1 , f2 , g ∈ C[a, b].


2. It is positive definite:

(f, f) = ∫_a^b f(x)² k(x) dx ≥ 0

since f (x)2 ≥ 0 and k(x) was chosen by hypothesis to be positive over the
interval [a, b].
3. It is non-degenerate, i.e. (f, f) = 0 if and only if f = 0, using the properties of continuous functions and the fact that k(x) > 0 is strictly positive. Indeed, if f ≠ 0 there exists x0 ∈ [a, b] such that f(x0) = c ≠ 0. By continuity, taking ε = |c|/2, there exists δ > 0 such that |f(x) − f(x0)| < ε for all x ∈ (x0 − δ, x0 + δ), and hence |f(x)| > |c|/2 on that interval (for simplicity let us assume that this interval is entirely contained in [a, b]). On the interval (x0 − δ, x0 + δ), the functions f(x)² and k(x) are therefore bounded below by strictly positive constants, i.e. f(x)² k(x) > ε̃ for some ε̃ > 0 determined by c and the minimum of k(x) on the interval (which is strictly positive). So

(f, f) = ∫_a^b f(x)² k(x) dx ≥ ∫_{x0−δ}^{x0+δ} f(x)² k(x) dx > 2 ε̃ δ > 0.

The crucial points are continuity and the fact that k(x) is strictly positive over the interval [a, b].
Suppose that we want an orthonormal basis {f0 , f1 , f2 , . . . , fn }, where fn ∈
R[x]n . One way of obtaining such a basis is to start with {1, x, x2 , . . . , xn } as a
“first guess”, and apply the Gram-Schmidt process. But in some special cases

there is also a more interesting way, related to the fact that many such examples
arose in the study of differential equations.

Example: Legendre polynomials. If we take k(x) = 1 and (a, b) = (−1, 1),


then we get a basis of polynomials of which the first few are

P0(x) = √(1/2),   P1(x) = √(3/2) x,   P2(x) = √(5/8) (3x² − 1),   . . . .
These are called the normalized Legendre polynomials. They are defined by
the following requirements:

1. Pn has degree n, and is orthogonal to Pj for all j < n;

2. Pn is normalized in some way — either the leading term has coefficient 1,


or the norm of Pn is 1 — we will use the second of these.

Their properties include the following.

• By definition, they are orthonormal:

∫_{−1}^{1} Pj(x) Pk(x) dx = 0 when j ≠ k, and 1 when j = k.

• There is a formula for Pk(x), namely Rodrigues' formula

Pk(x) = (1/(2^k k!)) √((2k + 1)/2) (d^k/dx^k) (x² − 1)^k.

(A short sketch comparing this formula with the Gram-Schmidt construction appears after this list.)

• Pk (x) is a solution of the differential equation

(1 − x2 )y 00 − 2xy 0 + k(k + 1)y = 0,

which is called the Legendre equation of order k. (This is a second-order


linear ordinary differential equation. You know how to solve such equa-
tions if they have constant coefficients, but the Legendre equation does
not have constant coefficients. It arose in various real-world applications,
and Legendre polynomials arose as its solutions. The Legendre equation
has two independent solutions, of which Pk (x) is one.)
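Here is a small symbolic sketch (an illustration, not part of the notes) that carries out the Gram-Schmidt process on {1, x, x², x³} with k(x) = 1 on (−1, 1) and compares the result with Rodrigues' formula; it uses the SymPy library.

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # The inner product (f, g) = integral of f(x) g(x) over [-1, 1]  (weight k(x) = 1).
    return sp.integrate(f * g, (x, -1, 1))

# Gram-Schmidt on the "first guess" basis {1, x, x^2, x^3}.
basis, ortho = [sp.Integer(1), x, x**2, x**3], []
for p in basis:
    for q in ortho:
        p = sp.expand(p - inner(p, q) * q)               # subtract the projections
    ortho.append(sp.simplify(p / sp.sqrt(inner(p, p))))  # normalize

# Rodrigues' formula with the same normalization (norm of P_k equal to 1).
def rodrigues(k):
    return sp.simplify(sp.sqrt(sp.Rational(2*k + 1, 2))
                       / (2**k * sp.factorial(k))
                       * sp.diff((x**2 - 1)**k, x, k))

print(ortho)                                                     # sqrt(2)/2, sqrt(6)*x/2, ...
print([sp.simplify(ortho[k] - rodrigues(k)) for k in range(4)])  # should print [0, 0, 0, 0]

Both constructions give the normalized Legendre polynomials, with the sign convention that the leading coefficient is positive.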

More examples. Here is a list of historically-important examples, named after


various 19th-century mathematicians, for which the same sort of story holds.

(i) Legendre: k(x) = 1 and (a, b) = (−1, 1) ;


[Multipole expansion for the gravitational potential associated to a point
mass or the Coulomb potential associated to a point charge. Schrodinger
equation in spherical polar coordinates.]

(ii) Chebyshev-I: k(x) = 1/√(1 − x²) and (a, b) = (−1, 1) ;

(iii) Chebyshev-II: k(x) = √(1 − x²) and (a, b) = (−1, 1) ;
[Numerical solutions to PDEs and integral equations. Finite elements
methods.]

(iv) Hermite: k(x) = exp(−x2 ) and (a, b) = (−∞, ∞) ;


[Eigenstates of the quantum harmonic oscillator.]

(v) Laguerre: k(x) = exp(−x) and (a, b) = (0, ∞) .


[Radial functions for the hydrogen atom wave functions.]

We shall see in the next section that, in each of these cases, the orthogonal
polynomials arise as special solutions of differential equations, and in fact as
eigenfunctions of linear differential operators.

Remark. Another famous example is that of Fourier series: in this case, we


have the orthonormal functions

1/√2, cos(2πt), sin(2πt), . . . , cos(2πnt), sin(2πnt), . . . ,

with the inner product (f, g) = 2 ∫_0^1 f(t) g(t) dt. Since these are not polynomials,
they do not quite fall into the framework being discussed here, but they are
closely related.

3.2 Linear differential operators


Definition. If V is a real vector space with inner product, and L : V → V is
a linear operator, then L is said to be symmetric if (Lv, w) = (v, Lw) for all
v, w ∈ V .

Example. If V = Rn with the standard Euclidean inner product, and L is


represented by an n × n matrix M , then L is symmetric (according to the
above definition) if and only if M is symmetric (in the sense that M t = M ).
Proof. We have (Lv, w) = w^tMv and (v, Lw) = (Mw)^tv = w^tM^tv; these agree for all v, w if and only if M^t = M.

Symmetric linear differential operators. Here is a list of linear operators


associated with each of the five examples of the previous section, describing its
action on a polynomial f (x).

(i) Legendre: LL f = (1 − x2 )f 00 − 2xf 0 ;

(ii) Chebyshev-I: LI f = (1 − x2 )f 00 − xf 0 ;

(iii) Chebyshev-II: LII f = (1 − x2 )f 00 − 3xf 0 ;

(iv) Hermite: LH f = f 00 − 2xf 0 ;

(v) Laguerre: Ll f = xf 00 + (1 − x)f 0 .

Remarks.
(a) In each case, the corresponding differential equation for f

Lf = λf

says that f is an eigenfunction of the operator L with eigenvalue λ. For example,


the Legendre equation is LL f = −k(k + 1)f .
(b) Each of these operators is symmetric; here is the proof for the Laguerre
case.
Theorem. If (f, g) = ∫_0^∞ f(x) g(x) exp(−x) dx, then the operator Ll is symmetric.
Proof. This is just a calculation:

(Ll f, g) = ∫_0^∞ [x f''(x) + (1 − x) f'(x)] g(x) e^{−x} dx
          = ∫_0^∞ [x e^{−x} f'(x)]' g(x) dx
          = [x e^{−x} f'(x) g(x)]_{x=0}^{x=∞} − ∫_0^∞ x e^{−x} f'(x) g'(x) dx
          = (Ll g, f)   (since the first term is zero, and the second is symmetric in f and g)
          = (f, Ll g)   (by symmetry of the inner product).
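The same cancellation can be checked symbolically for sample polynomials (an illustrative sketch, not part of the notes):

import sympy as sp

x = sp.symbols('x')

def inner(f, g):
    # Laguerre inner product: (f, g) = integral of f(x) g(x) e^{-x} over (0, infinity).
    return sp.integrate(f * g * sp.exp(-x), (x, 0, sp.oo))

def L_laguerre(f):
    # Laguerre operator: Ll f = x f'' + (1 - x) f'.
    return x * sp.diff(f, x, 2) + (1 - x) * sp.diff(f, x)

f, g = x**3 - 2*x, x**2 + 1          # two arbitrary sample polynomials
print(sp.simplify(inner(L_laguerre(f), g) - inner(f, L_laguerre(g))))   # 0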

Theorem. If {P0 , P1 , P2 , . . .} is an orthonormal set of polynomials, with Pj of


degree j, and L is a symmetric linear operator such that the degree of LP is
less than or equal to the degree of P , then each Pj is an eigenfunction of L.
Remark. Note that the extra condition about degrees is satisfied for the five
operators listed above.

Proof. Let n be a positive integer. We need to show that Pn is an eigen-


function of L. Put V = R[x]n and W = R[x]n−1 , so W is a subspace of
V . Now the orthogonal complement W ⊥ has dimension dim(V ) − dim(W ) =
(n + 1) − n = 1, and the single polynomial Pn spans W⊥ (since Pn is orthogonal
to P0 , P1 , . . . , Pn−1 , and so Pn ∈ W ⊥ ). Thus any polynomial in W ⊥ has the
form λPn for some constant λ. Now if P ∈ W , then

(LPn , P ) = (Pn , LP ) = 0,

because L is symmetric and LP ∈ W . This means that LPn ∈ W ⊥ , which in


turn means that LPn = λPn for some λ. In other words, Pn is an eigenfunction
of L.
Remark. So in the above five special cases, the corresponding (Legendre,
Chebyshev etc) polynomials are eigenfunctions of the corresponding (Legen-
dre, Chebyshev etc) operator.

Example. Let V = R[x]3 . Write down the matrix M representing the operator
LL acting on V , using the basis {1, x, x2 , x3 } of V . Use this to construct the
Legendre polynomials {f0 , f1 , f2 , f3 }.
Solution. The operator LL maps 1 to 0, x to −2x, x2 to 2 − 6x2 and x3 to
−12x3 + 6x; so its matrix is
 
M = [ 0  0  2  0; 0  −2  0  6; 0  0  −6  0; 0  0  0  −12 ].

Remark. M had to turn out to be upper-triangular, because of the “ex-


tra condition” referred to above regarding the degrees of LL P and P , i.e.
deg LL P ≤ deg P. So the eigenvalues are easy to read off. The Legendre case has LL x^n = −n(n + 1)x^n + lower-order terms, so λ = −n(n + 1) for n = 0, 1, 2, 3, . . ..
The eigenvalues here are λ = 0, −2, −6, −12.
For λ = 0, the eigenvector is v0 = [1 0 0 0]t , and the corresponding eigenfunc-
tion is f0 = 1.
For λ = −2, the eigenvector is v1 = [0 1 0 0]t , and the corresponding eigen-
function is f1 = x.
For λ = −6, the eigenvector is v2 = [−1 0 3 0]t , and the corresponding eigen-
function is f2 = 3x2 − 1.
For λ = −12, the eigenvector is v3 = [0 − 3 0 5]t , and the corresponding
eigenfunction is f3 = 5x3 − 3x.

The corresponding normalized functions are the first four normalized Legendre
polynomials.
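The matrix computation above is easy to reproduce symbolically; the following sketch (an illustration, not part of the notes) builds the matrix of LL on R[x]3 in the basis {1, x, x², x³} and reads off its eigenvalues and eigenvectors with SymPy.

import sympy as sp

x = sp.symbols('x')
L = lambda f: (1 - x**2) * sp.diff(f, x, 2) - 2*x * sp.diff(f, x)   # Legendre operator

# Column j of M holds the coordinates of L(x^j) in the basis {1, x, x^2, x^3}.
basis = [sp.Integer(1), x, x**2, x**3]
M = sp.Matrix(4, 4, lambda i, j: sp.expand(L(basis[j])).coeff(x, i))

print(M)               # Matrix([[0, 0, 2, 0], [0, -2, 0, 6], [0, 0, -6, 0], [0, 0, 0, -12]])
print(M.eigenvects())  # eigenvalues 0, -2, -6, -12, with eigenvectors as in the text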
Nomenclature A function f , satisfying the differential equation

Lf = λf

for some linear differential operator L and some number λ is called an eigen-
function of L corresponding to the eigenvalue λ. If we rewrite this eigenfunction
f using some basis, i.e. the basis of polynomials {1, x, ..., xn , ...}, we will call
the vector containing the coordinates of f in the basis used, the eigenvector of
the associated matrix problem.
For example, f3 = 5x³ − 3x, computed above, is an eigenfunction of the Legendre operator with eigenvalue λ = −12, while in the standard basis of R[x]3 the vector v3 = [0 −3 0 5]^t represents the eigenfunction f3, and it is an eigenvector of the Legendre operator in matrix form.

3.3 Complex version: hermitian operators


Hermitian operators. Let V be a complex vector space. A linear operator
L : V → V is said to be hermitian if hLu, vi = hu, Lvi. (This property is
also called self-adjoint.) Note that if L is hermitian, then so is Ln for positive
integer n (simple proof).
Similarly, we will say that an operator LA : V → V is anti-hermitian if
hLA u, vi = −hu, LA vi. Note that thanks to the properties of the complex
inner product h , i an anti-hermitian operator LA can always be written as
LA = i L with L hermitian

hLA u, vi = ihLu, vi = ihu, Lvi = −hu, iLvi = −hu, LA vi .

Example. On Cn with the standard basis and the standard inner product,
a linear operator represented by the matrix M is hermitian if and only if the
matrix is hermitian (M ∗ = M ).
Proposition. If L is hermitian, then hLv, vi is real for all v ∈ V .
Proof.:
hLv, vi = hv, Lvi = hLv, vi,
from the hermiticity property of the inner product.

In particular we have the following proposition.


Proposition. The eigenvalues of an hermitian operator are all real.

Proof.: In fact if v is an eigenvector of an hermitian operator L with eigenvalue


λ, from the proposition above we know that hLv, vi ∈ R, but since Lv = λv
we deduce that λhv, vi ∈ R, hence λ ∈ R.

Proposition. If LA is anti-hermitian, then hLA v, vi is purely imaginary for all


v ∈V.
Proof.:
hLA v, vi = −hv, LA vi = −hLA v, vi,
from the hermiticity property of the inner product. We could have also used
the fact that iLA is an hermitian operator if LA is anti-hermitian.

Proposition. The eigenvalues of an anti-hermitian operator are all purely


imaginary, i.e. if LA v = λv then λ = i c with c ∈ R.

Exercise. Let L and M be hermitian linear operators on V . Then

4|hLMv, vi|2 ≥ |h(LM − ML)v, vi|2 .

Proof. Write c = hLMv, vi. Then

c = hLMv, vi
= hMv, Lvi (since L is hermitian)
= hv, MLvi (since M is hermitian)
= hMLv, vi (hermiticity property of inner product).

So hMLv, vi = c̄, and |h(LM − ML)v, vi|2 = |c − c̄|2 = 4[Im(c)]2 ≤ 4|c|2 .

Example. Let V be the space of differentiable complex-valued functions f (x)


on R, such that xn f (x) → 0 as x → ±∞ for every fixed value of n. Define an
hermitian inner product on V by

⟨f, g⟩ = ∫_{−∞}^{∞} f(x) ḡ(x) dx.

Some examples of hermitian linear operators on V are:

• X : f (x) 7→ xf (x), multiplication by x;

• X 2 : f (x) 7→ x2 f (x), multiplication by x2 ;

• P : f (x) 7→ if 0 (x), differentiation plus multiplication by i.



Exercise. Show that P : f (x) 7→ if 0 (x) is hermitian.


Solution. Integrate by parts and use f(x) → 0 as x → ±∞:

⟨Pf, g⟩ = i ∫_{−∞}^{∞} f'(x) ḡ(x) dx
        = −i ∫_{−∞}^{∞} f(x) ḡ'(x) dx
        = ⟨f, Pg⟩,

since −i ḡ'(x) is the complex conjugate of i g'(x).

Note that the i factor in the definition of P is essential to the hermiticity


property of P , without that factor the operator −iP = d/dx would be anti-
hermitian.

3.4 An application: mechanics


Particle on a line. Suppose we have a particle on the x-axis. Its state at a
particular time is described by a unit vector f ∈ V , with V the function space
above. The interpretation of f is that |f (x)|2 is a probability distribution: the
probability of the particle being in the interval [a, b] on the x-axis is ∫_a^b |f(x)|² dx.
So ||f || = 1 just says that the particle has to be somewhere.
Now any observable quantity q corresponds to an hermitian linear operator
Q on V . The expected value of the quantity q in the state f is hQi = hQf, f i.
The significance of Q being hermitian is that it guarantees that hQi is real: this
follows from a proposition in the previous section.
One example is the position operator X : ψ(x) 7→ xψ(x). If we prepare the
same state many times, and measure the position of the particle each time, we
get a distribution of answers, the mean or average of which is hXi.
What about X 2 : ψ(x) 7→ x2 ψ(x)? This is related to the variance ∆2x of x,
which is defined to be the expected value of (X − hXi)2 . The quantity ∆x
(standard deviation) measures the uncertainty in the position, or the spread of
position measurements in the state f .
What about L : f (x) 7→ if 0 (x)? For historical reasons, one multiplies this by a
certain real constant β, and the observable quantity is then called momentum p.
So P : f (x) 7→ iβf 0 (x). [Actually β = −h/(2π), where h is Planck’s constant.]

Variance and uncertainty. The variance in position is hA2 i where A =


X − hXi, and the variance in momentum is hB 2 i where B = P − hP i. Note
that A and B are hermitian, since X and P are. Now

⟨A²⟩⟨B²⟩ = ⟨A²f, f⟩⟨B²f, f⟩
        = ⟨Af, Af⟩⟨Bf, Bf⟩        (since A, B are hermitian)
        ≥ |⟨Af, Bf⟩|²             (Cauchy-Schwarz)
        = |⟨BAf, f⟩|²             (B hermitian)
        ≥ |⟨(BA − AB)f, f⟩|²/4    (Lemma of the previous section).

The next step is to compute (BA − AB)f. Almost all the terms of this expression cancel, and the only one that remains comes from the fact that (xf)' − xf' = f; in fact (check as an exercise), we get (BA − AB)f = iβf. Using this, together with the fact that ⟨f, f⟩ = 1, gives

⟨A²⟩⟨B²⟩ ≥ (β/2)²,

and hence we get the Heisenberg Uncertainty Rule

∆x ∆p ≥ |β|/2 = h/(4π).
So the accuracy with which both position and momentum can be determined is
limited. In standard kg-m-s units, the value of Planck’s constant is very small
(h ≈ 6.6 × 10−34 ), so this is not a limitation that we are familiar with from
everyday life — but it is there.
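The commutator computation can be verified symbolically (an illustrative sketch, not part of the notes): since the constant shifts ⟨X⟩ and ⟨P⟩ cancel, BA − AB reduces to PX − XP, and

import sympy as sp

x, beta = sp.symbols('x beta')
f = sp.Function('f')(x)

X = lambda u: x * u                         # position operator
P = lambda u: sp.I * beta * sp.diff(u, x)   # momentum operator (constant beta)

print(sp.simplify(P(X(f)) - X(P(f))))       # I*beta*f(x): (PX - XP)f = i*beta*f

in agreement with (BA − AB)f = iβf above.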
The harmonic oscillator. Another hermitian operator on V is H : f 7→ −f''(x) + U(x)f(x), where U(x) is some fixed real function. The corresponding
observable (with suitable constants omitted here) is energy. The function U (x)
is the potential energy. The eigenvalues of H are the allowable energy levels of
the system.
The simplest non-trivial case is U (x) = x2 , which is the harmonic oscillator.
To find the eigenfunctions in this case, substitute f (x) = g(x) exp(−x2 /2) into
Hf = λf . This gives
g 00 − 2xg 0 + (λ − 1)g = 0.
Now L : g 7→ g 00 − 2xg 0 is the Hermite operator, and its eigenvalues are
0, −2, −4, −6, . . .. So λ = 2n + 1 for n = 0, 1, 2, 3, . . ., and these are the dis-
crete energy levels of the one-dimensional harmonic oscillator. With constants added, the energy eigenvalues are

En = (2n + 1) (h/4π) √(k/m),

where m is the mass of the particle and k is the “spring constant”, i.e. U(x) = (1/2) k x².
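The substitution step can also be checked symbolically (a sketch for illustration, not part of the notes): inserting f(x) = g(x) exp(−x²/2) into Hf = λf and stripping off the exponential factor should leave exactly the Hermite-type equation quoted above.

import sympy as sp

x, lam = sp.symbols('x lambda')
g = sp.Function('g')(x)

f = g * sp.exp(-x**2 / 2)                           # the substitution f = g(x) exp(-x^2/2)
residual = -sp.diff(f, x, 2) + x**2 * f - lam * f   # H f - lambda f, with U(x) = x^2

# Multiplying by exp(x^2/2) removes the exponential; the result should be
# -(g'' - 2x g' + (lambda - 1) g), i.e. the Hermite-type equation from the text.
print(sp.simplify(residual * sp.exp(x**2 / 2)))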
Chapter 4

Groups II
4.1 Axioms and Examples
Definition. A group is a set G equipped with an operation • : G × G → G
such that

(i) if x, y ∈ G then x • y ∈ G (closure);

(ii) (x • y) • z = x • (y • z) for all x, y, z ∈ G (associativity);

(iii) G contains an identity element e such that x • e = e • x = x for all x ∈ G;

(iv) each x ∈ G has an inverse x−1 ∈ G such that x−1 • x = x • x−1 = e.

A group is abelian (or commutative) if x • y = y • x for all x, y ∈ G.

Proposition If (G, •) is a group, the identity element e is unique.


Proof. Suppose that e1 , e2 ∈ G are both identity elements, this means that

e1 • x = x • e1 = x ,
e2 • x = x • e2 = x ,

for all elements x ∈ G. In particular if we put x = e2 in the first equation, we


get
e1 • e2 = e2 • e1 = e2 ,
while if we substitute x = e1 in the second equation we get

e2 • e1 = e1 • e2 = e1 ,

so e1 = e2 , and the identity element is unique.



Definition We say that H is a subgroup of the group G if H is a subset of G


and if H is a group in its own right, with the group operation inherited from
G.

Example (0). Any vector space V with the operation +. In this case, the
zero vector 0 is the identity, and the inverse of v is −v. This group structure
is abelian. If V is a real 1-dimensional vector space, then this is the group R
of real numbers under addition (ie. the operation is +).
And now something a little less trivial.

Example (a). The group GL(n, R) of n × n real invertible matrices, with


P •Q = P Q (matrix multiplication). Here the identity element is the identity
n × n matrix I, and the inverse of a matrix in the group sense is just its inverse
in the usual matrix sense. The fact that matrix multiplication is associative is
a straightforward (but messy) calculation. This group GL(n, R) is not abelian,
except for the case n = 1 when it is just the group of non-zero real numbers
with multiplication as the operation.

Example (b). The group GL(n, C) of n × n complex invertible matrices. The


details are the same as for the previous example. Note that GL(n, R) is a
subgroup of GL(n, C).

Example (c). The group SL(n, R) of n×n real matrices M having det(M ) = 1,
is a subgroup of GL(n, R). Proof.:

(i) closure follows from det(P Q) = det(P ) det(Q) = 1;

(ii) associativity is inherited from GL(n, R);

(iii) I is in the set, since det(I) = 1;

(iv) if det(M ) = 1, then det(M −1 ) = 1.

Example (d). The group O(n) of n×n real orthogonal matrices, is a subgroup
of GL(n, R). Proof.:

(i) if A and B are orthogonal, then (AB)(AB)t = ABB t At = AAt = I, so


AB is orthogonal;

(ii) associativity is inherited from GL(n, R);

(iii) I is orthogonal;

(iv) if A is orthogonal, then A−1 = At by definition, so (A−1 )(A−1 )t = At A = I,


so A−1 is orthogonal.

Example (e). The group SO(n) of n × n real orthogonal matrices M having


det(M ) = 1. This is a subgroup both of O(n) and of SL(n, R). The group
SO(2) is particularly simple: if P ∈ SO(2) we can write
 
P = [ a  b; c  d ],

and  
P^t = [ a  c; b  d ],
with a, b, c, d ∈ R. The orthogonality condition P t P = P P t = I plus the extra
condition that detP = 1 is equivalent to the system of equations


 a2 + c2 = 1 ,
ab + cd = 0 ,


 b2 + d2 = 1 ,
ad − bc = 1 . (detP = +1)

We can easily solve this system and the most generic P ∈ SO(2) takes the form
 
P = [ cos θ  sin θ; − sin θ  cos θ ],

where the rotation angle θ ∈ [0, 2π]. [ We say that SO(2) is isomorphic to
S 1 = {(a, c) ∈ R2 s.t. a2 + c2 = 1}.]

Exercise. Try to visualize how an SO(2) matrix P transforms a generic vector


v ∈ R2 if we consider it as a linear application P : v → P v. [Hint: Use the
form for P ∈ SO(2) written above]

Solution. Consider first the vector v = (1, 0): its image P v = (cos θ, − sin θ), which corresponds to a clockwise rotation of the initial vector v = (1, 0) by an angle θ. For a generic vector v = (x, y), the image P v = (x cos θ + y sin θ, −x sin θ + y cos θ) still corresponds to a clockwise rotation of the vector v = (x, y) by an angle θ.

Examples (f ). The group SL(n, C) of n × n complex matrices M having


det(M ) = 1; the group U(n) of n × n complex unitary matrices; and the group
SU(n) of n × n complex unitary matrices M having det(M ) = 1.

Exercise. Try to parametrize SU(2) in way similar to the one obtained for
SO(2).

Solution. Start with writing a generic 2 × 2 matrix as


 
M = [ a  b; c  d ],

with a, b, c, d ∈ C. For M to be in SU (2) we must have M ∗ M = M M ∗ = I


and det M = 1, with  
M∗ = [ ā  c̄; b̄  d̄ ].
The unitarity condition M ∗ M = M M ∗ = I plus the extra condition that
detM = 1 is equivalent to the system of equations


 |a|2 + |c|2 = 1 ,
b ā + d c̄ = 0 ,


 |b|2 + |d|2 = 1 ,
ad − bc = 1 , (detM = +1)

We can solve for b = −c̄ and d = ā, so that the most general M ∈ SU(2) takes the form

M = [ a  −c̄; c  ā ],

with |a|² + |c|² = 1.
[Writing a = x1 + ix2 and c = x3 + ix4 with x1 , x2 , x3 , x4 ∈ R, the equation
|a|2 + |c|2 = 1 becomes x21 + x22 + x23 + x24 = 1. We say that SU(2) is isomorphic
to S 3 = {x ∈ R4 s.t. x21 + x22 + x23 + x24 = 1}]
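A quick numerical check of this parametrization (an illustration, not part of the notes):

import numpy as np

# Pick any a, c in C with |a|^2 + |c|^2 = 1, here via a few random angles.
rng = np.random.default_rng(0)
theta, phi1, phi2 = rng.uniform(0, 2*np.pi, 3)
a = np.cos(theta) * np.exp(1j * phi1)
c = np.sin(theta) * np.exp(1j * phi2)

M = np.array([[a, -np.conj(c)],
              [c,  np.conj(a)]])

print(np.allclose(M.conj().T @ M, np.eye(2)))   # M is unitary
print(np.isclose(np.linalg.det(M), 1.0))        # det M = 1, so M is in SU(2)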

Exercise. Show that the set G of matrices of the form


 
Mx = [ 1  x; 0  1 ],

with x ∈ R, forms a group under matrix multiplication. Solution: Mx My =


Mx+y , so G is just the set of real numbers as an additive group, i.e. with group
operation +. Note that if we change the form of the matrix Mx a bit, for
example by putting a 2 in the upper-left-hand corner, then it would no longer
work (the closure property would fail).

Definition. A finite group G is a group having a finite number of elements,


and then its order, denoted by |G|, is the number of elements.

Theorem. (Lagrange). Let G be a finite order group and H a subgroup of G,


then the order of H divides the order of G.

Example (g). The integers Z with operation + forms an abelian group. If n is


an integer with n ≥ 2, then we may define an equivalence relation on Z by p ∼ q
if and only if n divides p − q; this is also written p ≡ q (mod n). The set Zn
of equivalence classes forms a finite abelian group of order n, with addition as
the group operation (note that + is well-defined on equivalence classes). This
group Zn is also called the cyclic group of order n. For example, the group
Z4 has elements {0, 1, 2, 3}, and the group operation of addition modulo 4 is
expressed in the group table
+ 0 1 2 3
0 0 1 2 3
1 1 2 3 0
2 2 3 0 1
3 3 0 1 2

Example (h). There are exactly two groups of order 4, namely Z4 as above,
and the Klein group V which has the table
• e a b c
e e a b c
a a e c b
b b c e a
c c b a e
Note that the group table of Z4 has a quite different structure from that of
V . One visualisation (or representation) of V is as the group of rotations by
180◦ about each of the x-, y- and z-axes in R3 (ie. the element a corresponds
to rotation by π about the x-axis, etc). Exercise: by using a marked cube such
as a die, convince yourself that a • b = c.
Another representation of V is as the group of reflections across the x-axis, the y-axis, and the combined reflection across the x-axis and then the y-axis, acting on R². Take
       
e = [ 1  0; 0  1 ],   a = [ 1  0; 0  −1 ],   b = [ −1  0; 0  1 ],   c = [ −1  0; 0  −1 ],

and see that these matrices satisfy the group table of the Klein group using as
group operation the matrix multiplication.

Definition. Let g ∈ G. The smallest positive integer n such that g n = e, if


such an n exists, is called the order of g, denoted by |g|. The identity element
has order 1.
Proposition For every element g ∈ G, the order of g divides the order of the
group.
Proof. Consider H = {g, g², ..., g^n}, where n = |g|; this is a subgroup of G, called the subgroup generated by g, and its order is precisely equal to the order of g. By Lagrange's theorem the order of H divides the order of the group, hence the order of g divides |G|.
In Z, no element other than the identity has finite order. But in Zn , every
element has finite order. Observe that the elements a, b, c of the Klein group
V all have order 2; whereas in Z4 , 2 has order 2 and 1, 3 have order 4. In a
matrix group like GL(2, C), most elements have infinite order, but some have
finite order: eg A = diag(1, −1) has order 2.
Definition. Two groups G and H are isomorphic if there is a bijection f : G →
H preserving the group operation, ie f (g1 g2 ) = f (g1 )f (g2 ) for all g1 , g2 ∈ G. In
particular f (eG ) = eH where eG is the identity in G and eH is the identity in
H.
The observation above shows that the Klein group V and Z4 are not isomor-
phic, in fact we have the following proposition.
Proposition If the groups G and H are isomorphic, the order of every element
g ∈ G is equal to the order of the corresponding element f (g) ∈ H.
Proof. If the order of g ∈ G is n, then g^n = eG, where eG is the identity in G, and since G and H are isomorphic we have f(g)^n = f(g^n) = f(eG) = eH. Moreover no smaller positive power of f(g) can equal eH: if f(g)^m = eH with m < n, then applying the inverse isomorphism f^{-1} would give g^m = eG, contradicting the minimality of n. So the order of f(g) ∈ H is precisely equal to n, the order of g ∈ G.
At this point we can start “combining” groups together to obtain new groups.
Definition. Given two groups G and H, the direct product G × H is defined
as follows:
1. The elements of G × H are the ordered pairs (g, h) where g ∈ G and
h ∈ H. We can also say that the set of elements of G × H is simply the
Cartesian product of the sets G and H;
2. The operation • is defined component by component, i.e.
(g1 , h1 ) • (g2 , h2 ) = (g1 · g2 , h1 · h2 ) ,
where on the first component we use the operation · defined on G and on
the second component the one on H.

Proposition The direct product G × H of two groups G and H, is a group,


furthermore if G has order n and H has order m, the direct product G × H has
order n · m.

Proof: Associativity descends from the associativity of the operations on G


and H. The identity element on G × H is simply e = (eG , eH ), where eG
is the identity on G and eH the identity on H. The inverse of an element
(g, h) ∈ G × H is simply (g −1 , h−1 ) where g −1 is the inverse of g in G, and h−1
is the inverse of h in H.
The order of G × H is the product of the order of G times the order of H and
it follows from the formula for the cardinality of the cartesian product of sets.

Example The direct product of two cyclic groups of order 2 is Z2 × Z2 . This


group has 4 elements, i.e. {(0, 0), (1, 0), (0, 1), (1, 1)}. The element (0, 0) is the
identity element in Z2 × Z2 and the group table is given by
+ (0, 0) (1, 0) (0, 1) (1, 1)
(0, 0) (0, 0) (1, 0) (0, 1) (1, 1)
(1, 0) (1, 0) (0, 0) (1, 1) (0, 1)
(0, 1) (0, 1) (1, 1) (0, 0) (1, 0)
(1, 1) (1, 1) (0, 1) (1, 0) (0, 0)

Comparing with Example (h), we see that Z2 × Z2 is isomorphic to the Klein


group.

Proposition The order of each element (g, h) ∈ G × H is the least common


multiple of the orders of g and h.

Example. Take the direct product of two cyclic groups Z4 × Z3 , and consider
the element (2, 2). The order of 2 in Z4 is 2 while the order of 2 in Z3 is 3 so
the order of (2, 2) in Z4 × Z3 is 6, in fact we have

(2, 2)1 = (2, 2) (2, 2)2 = (0, 1) (2, 2)3 = (2, 0)


(2, 2)4 = (0, 2) (2, 2)5 = (2, 1) (2, 2)6 = (0, 0).
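This kind of order computation is easy to automate; here is a small illustrative snippet (not part of the notes), using the fact that in Zn the order of g is n / gcd(g, n):

from math import gcd

def order_mod(g, n):
    # Order of g in the additive group Z_n.
    return n // gcd(g, n)

def order_in_product(g, h, n, m):
    # Order of (g, h) in Z_n x Z_m: the least common multiple of the two orders.
    a, b = order_mod(g, n), order_mod(h, m)
    return a * b // gcd(a, b)

print(order_mod(2, 4), order_mod(2, 3))   # 2 3
print(order_in_product(2, 2, 4, 3))       # 6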

4.2 Rotations and their subgroups


Rotations. We work in the vector space Rn with the standard Euclidean inner
product. If vectors v and w are thought of as column vectors, then the inner
product is (v, w) = wt v. Rotations are linear transformations v 7→ M v which

preserve lengths and angles, which means that they preserve the inner product:
for all v, w we have
wt v = (M w)t (M v) = wt M t M v.
This requires M t M = I, ie. M has to be an orthogonal matrix. So the rotations
of Rn form a group, which is exactly the orthogonal group O(n).
For example, if n = 1, then we get O(1) = {1, −1}. The group operation is
multiplication, the element 1 is the identity, and the element −1 is the reflection
x 7→ −x. This group is finite and abelian, but O(n) for n ≥ 2 is neither finite
nor abelian.
Pure (proper) rotations. Recall that any orthogonal matrix M has det(M ) =
±1. Elements of O(n) with det(M ) = −1 are rotations which incorporate a
reflection. The pure rotations, on the other hand, have det(M) = 1; in other
words, they belong to SO(n). Note that SO(1) is trivial, SO(2) is abelian, and
SO(n) for n ≥ 3 is not abelian.
Plane polygons. Consider a regular plane polygon with n sides. The pure
rotations leaving it unchanged make up a finite abelian group of order n: isomor-
phic to Zn , but with the matrix representation Mj ∈ SO(2) for j = 1, 2, . . . , n,
where  
Mj = [ cos(2πj/n)  sin(2πj/n); − sin(2πj/n)  cos(2πj/n) ].
For example, the pure rotations which preserve a square (n = 4) are represented
by the four orthogonal matrices
     
M1 = [ 0  1; −1  0 ],   M2 = [ −1  0; 0  −1 ],   M3 = [ 0  −1; 1  0 ],   M4 = I.
These form a finite subgroup of SO(2).
The group of all symmetries of the regular n-polygon is the dihedral group
Dn , which is a finite group of order 2n. (Some people call it D2n , so beware
of confusion.) This includes reflections; so Dn is finite subgroup of O(2). The
name comes from “dihedron”, namely a two-sided polyhedron.
For example, the group of symmetries of a square is D4 , the elements of which
are represented by {M1 , M2 , M3 , M4 } plus the four reflections
       
R1 = [ 0  −1; −1  0 ],   R2 = [ −1  0; 0  1 ],   R3 = [ 0  1; 1  0 ],   R4 = [ 1  0; 0  −1 ].
You can visualize R1 as a reflection across the line y = −x, R2 as a reflection across the y-axis, R3 as a reflection across the line y = x, and finally R4 as a reflection across the x-axis.

Using the representation written above of the Klein group as group of reflec-
tions across the x− and y− axis in R2 , you can prove that the Klein group is a
subgroup of D4 .

4.3 More on finite order groups


4.3.1 The Symmetric group and the Alternating group
Definition The symmetric group of degree n, denoted by Sn , is the group
whose elements are all the permutations that can be performed on n distinct
symbols, and whose group operation is the composition of permutations.
The order of Sn = n!.
Remember that any permutation σ ∈ Sn can be written as product of trans-
positions. The sign sgn σ of the permutation is defined to be sgn σ = +1, if σ
is written as the product of an even number of transpositions, or sgn σ = −1 if
σ is written as the product of an odd number of transpositions.
We will say that a permutation σ ∈ Sn is even if sgn σ = +1, i.e. if it can
be decomposed as the product of an even number of transpositions, or odd
otherwise.
Proposition Let An = {σ ∈ Sn | sgn σ = +1} be the subset of Sn containing
all the even permutations. An is a subgroup of Sn called the alternating group
of degree n.
The order of An is n!/2. Note that the subset of Sn given by the odd permu-
tations does not define a subgroup.

4.3.2 Multiplicative groups


Consider the set of equivalence classes of integers modulo 3, namely S =
{0, 1, 2}. The operation of multiplication is well-defined on S, and 1 is an
identity. Now S is not a group, since 0 has no inverse. But if we omit 0, then
we get the multiplicative group Z×3 = {1, 2}. This group has order 2 and table

× 1 2
1 1 2
2 2 1

Note that Z×3 ≅ Z2.
What about the integers modulo 4? The set S = {1, 2, 3} is not a group,
since it is not closed. But if we omit 2, then we get a group Z×4 = {1, 3} with table

× 1 3
1 1 3
3 3 1

which is again isomorphic to Z2. It is not hard to see that the set Z×n, consisting
of classes k such that k and n do not have a common factor, forms a finite
multiplicative group. If n is prime, then this group has order n − 1; but if n is
not prime, then the order is less than n − 1.

Examples. (a) Z×5 has order 4, and table

× 1 2 3 4
1 1 2 3 4
2 2 4 1 3
3 3 1 4 2
4 4 3 2 1

Observe, by comparing the tables with elements in different order, that Z×5 ≅ Z4.
(b) Z× 6 has order 2, and is isomorphic to Z2 . First we consider the set S =
{1, 2, 3, 4, 5}, this is not a group, since it is not closed, i.e. 2 · 3 = 0 (mod 6)
and similarly 3 · 4 = 0 (mod 6). If we omit 2, 3, 4 we get the group Z× 6 = {1, 5}
whose table is
× 1 5
1 1 5
5 5 1

clearly Z×6 ≅ Z2.
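These multiplicative groups are easy to generate programmatically; the following snippet (for illustration, not part of the notes) lists the elements of Z×n and builds the multiplication table modulo n:

from math import gcd

def units_mod(n):
    # Elements of Z_n^x: the residues coprime to n.
    return [k for k in range(1, n) if gcd(k, n) == 1]

for n in (3, 4, 5, 6):
    U = units_mod(n)
    table = [[(a * b) % n for b in U] for a in U]
    print(n, U, table)   # e.g. n = 5 gives U = [1, 2, 3, 4], a group of order 4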
Chapter 5

Pseudoinverse and Jordan normal form
5.1 The pseudoinverse
Introduction. Given an n × n matrix A and an n-vector k, consider the
equation Ax = k. If det(A) 6= 0, then this has a unique solution x = A−1 k.
But what if det(A) = 0? Consider the case k = 0, ie. Ax = 0. Suppose that m
rows of A are linearly-dependent with m < n, so in effect we have m equations
in n unknowns. The solutions form a vector space of dimension n − m. In other
words, there are many solutions. We say that the system is underdetermined.

Example. If the matrix M has eigenvalue λ, then A = M − λI has det(A) = 0,


and the solutions of Ax = 0 are exactly the eigenvectors in the eigenspace Vλ .

Pseudoinverse. What if x is an m-vector, and A an n × m matrix, with


n > m? Then the system Ax = k is overdetermined, and in general it has
no solutions. The question is: what is the “best try” for x? Suppose that
the columns of A are linearly-independent. (The rows of A cannot be linearly-
independent, but the columns generally will be.) Then the m × m matrix At A
is invertible (proof omitted). Define the pseudoinverse of A to be the m × n
matrix ψ(A) = (At A)−1 At . Now

Ax = k ⇒ At Ax = At k ⇒ x = (At A)−1 At k = ψ(A)k.

This is the unique “best-fit” solution. We shall see below that it is a least-squares


fit. Note that if m = n, then

ψ(A) = (At A)−1 At = A−1 (At )−1 At = A−1 ,

and so the “best-fit” solution really is a solution.



Example. Find the line which best fits the data points (1, 1), (2, 2.4), (3, 3.6),
(4, 4) in R2 .
Solution. Fitting the line y = a + bx to the data-points gives Av = k, where
   
A = [ 1  1; 1  2; 1  3; 1  4 ],    v = (a, b)^t,    k = (1, 2.4, 3.6, 4)^t.

Then

A^tA = [ 4  10; 10  30 ]   ⇒   ψ(A) = (1/10) [ 10  5  0  −5; −3  −1  1  3 ].

Thus

(a, b)^t = ψ(A)k = (0.2, 1.02)^t,
and so y = 0.2 + 1.02x is the best fit.
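Here is the same computation done numerically (an illustration, not part of the notes); NumPy's built-in pinv computes the same pseudoinverse for a matrix with linearly independent columns:

import numpy as np

# Least-squares fit of the line y = a + b x to the four data points,
# via the pseudoinverse psi(A) = (A^t A)^{-1} A^t.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
k = np.array([1.0, 2.4, 3.6, 4.0])

psi_A = np.linalg.inv(A.T @ A) @ A.T
a, b = psi_A @ k
print(a, b)                        # 0.2  1.02
print(np.linalg.pinv(A) @ k)       # same answer from NumPy's pseudoinverse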

Theorem. Let (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) be a set of n data points. Let


y = a0 + a1 x + . . . + am−1 xm−1 ∈ R[x]m−1 be a polynomial that is to be fitted
to these data points, with m < n. Substitution gives Aa = y, where

A = [ 1  x1  · · ·  x1^{m−1} ; ⋮  ⋮  ⋱  ⋮ ; 1  xn  · · ·  xn^{m−1} ],    a = (a0, . . . , a_{m−1})^t,    y = (y1, . . . , yn)^t.

Then a = ψ(A)y gives the polynomial which is the least-squares fit.


Proof. Think of y ∈ Rn with its standard Euclidean inner product. Now A :
Rm → Rn , and W = Image(A) is an m-dimensional subspace of Rn . Let y0 be
the orthogonal projection of y onto W . Then the points (x1 , y10 ), (x2 , y20 ), . . . , (xn , yn0 )
lie on some polynomial, since y0 ∈ W . And this choice of y0 minimizes

‖y − y0‖² = (y1 − y10)² + . . . + (yn − yn0)²;

so the polynomial with coefficients a, where Aa = y0 , is the least-squares fit.


Now a basis for W is {w1 , . . . , wm }, where the wi are the columns of A. For
example,  
1
0
w1 = A   ...  .

0

So the condition for y − y0 to be orthogonal to W is

At (y − y0 ) = 0
⇒ At Aa = At y0 = At y
⇒ a = (At A)−1 At y = ψ(A)y.

5.2 Jordan normal form


Any square matrix is similar to one in Jordan normal form, which is almost
(but maybe not quite) diagonal. Let us start with the 2 × 2 case.

Proposition. Let the 2 × 2 matrix A have repeated eigenvalue λ. Then either


A = λI diagonal, or A is similar to the Jordan-normal-form matrix
 
J = [ λ  1; 0  λ ].

Proof. The Cayley-Hamilton theorem tells us that (A − λI)2 = 0. If A − λI = 0,


then the first possibility holds; so suppose that A − λI 6= 0. This implies that
there is some vector v such that w = (A − λI)v 6= 0. Clearly (A − λI)w = 0,
because (A − λI)2 = 0, so Aw = λw and Av = λv + w. A similar argument
to one we’ve used before shows that v and w are linearly independent, so
M = [w v] is invertible. And also AM = M J, which proves our result.

Example 1. Solve the system of ODEs

ẋ = x + y,
ẏ = −x + 3y.

Solution. The system has the form u̇ = Au, where


 
A = [ 1  1; −1  3 ].

The characteristic polynomial of A is p(t) = (t − 2)2 , so A has the double


eigenvalue λ = 2. There is only one eigenvector, namely w = (1, 1), so A
cannot be diagonalized. To get v we solve (A − 2I)v = w, and one solution
is v = (1, 2) — this is not the only solution, but that doesn’t matter. Thus
M −1 AM = J, where
   
M = [ 1  1; 1  2 ],    J = [ 2  1; 0  2 ].

If s = M −1 u, then the original system becomes ṡ1 = 2s1 + s2 , ṡ2 = 2s2 , which
is easily solved to give s2 (t) = c2 exp(2t), s1 (t) = (c1 + c2 t) exp(2t). [Exercise
for you: obtain this.] Finally, transform back to u = M s, giving
x(t) = (c1 + c2 + c2 t) exp(2t), y(t) = (c1 + 2c2 + c2 t) exp(2t).
[Exercise for you: check this by substitution.]
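For the record, the substitution check can be done symbolically (an illustrative sketch, not part of the notes):

import sympy as sp

t, c1, c2 = sp.symbols('t c1 c2')
x = (c1 + c2 + c2*t) * sp.exp(2*t)
y = (c1 + 2*c2 + c2*t) * sp.exp(2*t)

# The system was x' = x + y, y' = -x + 3y; both residuals should vanish.
print(sp.simplify(sp.diff(x, t) - (x + y)))      # 0
print(sp.simplify(sp.diff(y, t) - (-x + 3*y)))   # 0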
For larger matrices, the Jordan normal form has the eigenvalues down the diag-
onal, either ones or zeros immediately above the diagonal, and zeros everywhere
else. It is a block-diagonal matrix, with each block corresponding to a particular
eigenvalue λ. These Jordan blocks can be described as follows:

(i) [ λ ] — the Jordan block of size 1;

(ii) [ λ  1; 0  λ ] — the Jordan block of size 2;

(iii) [ λ  1  0; 0  λ  1; 0  0  λ ] — the Jordan block of size 3;

and so on.
and so on.
Example 2. Find the Jordan normal form of
 
A = [ 2  2  −2; 1  −1  1; 2  0  0 ].
We first calculate the characteristic polynomial:
 
pA(t) = det(A − tI) = det [ 2−t  2  −2; 1  −1−t  1; 2  0  −t ] = −t³ + t² = (1 − t)t².
Thus the eigenvalues are λ = 1 and λ = 0 (twice).
• λ = 1: solve

0 = (A − I)x = [ 1  2  −2; 1  −2  1; 2  0  −1 ] x.

So x3 = 2x1 and 2x2 = −x1 + 2x3 = 3x1. An eigenvector is u1 = (2, 3, 4)^t.

• λ = 0: solve

0 = (A − 0I)x = Ax = [ 2  2  −2; 1  −1  1; 2  0  0 ] x.

So x1 = 0 and x2 = x3. An eigenvector is u2 = (0, 1, 1)^t.

The 0-eigenspace is spanned by u2. So to calculate the Jordan normal form, we must solve Ax = u2, i.e.

[ 2  2  −2; 1  −1  1; 2  0  0 ] x = (0, 1, 1)^t.

Hence we can take u3 = (1/2, −1/2, 0)^t.

Let M be the matrix whose columns are u1, u2 and u3. Then

M = [ 2  0  1/2; 3  1  −1/2; 4  1  0 ],    M^{-1} = [ 1  1  −1; −4  −4  5; −2  −4  4 ],

and

M^{-1}AM = [ 1  0  0; 0  0  1; 0  0  0 ].
This is the Jordan normal form of A. It has a 1 × 1 block associated with λ = 1,
and a 2 × 2 block associated with λ = 0.
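SymPy can reproduce this decomposition directly (an illustrative sketch, not part of the notes); its jordan_form method returns a matrix P with A = P J P^{-1}, although the blocks may appear in a different order than in the hand computation:

import sympy as sp

A = sp.Matrix([[2, 2, -2],
               [1, -1, 1],
               [2, 0, 0]])

P, J = A.jordan_form()                   # A = P * J * P**(-1)
print(J)                                 # a 1x1 block for eigenvalue 1 and a 2x2 Jordan block for 0
print(sp.simplify(P.inv() * A * P - J))  # zero matrix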
Remark. If λ is a triple eigenvalue, and there is only one eigenvector w,
then we just iterate the process. Solve (A − λI)v = w to get v, then solve
(A − λI)u = v to get u; and take M to have the three columns w, v, u.
Example 3. Find the Jordan normal form of
 
A = [ 1  1  0; 0  2  1; 1  −1  3 ].

We first calculate the characteristic polynomial of A:

pA(t) = det(A − tI) = det [ 1−t  1  0; 0  2−t  1; 1  −1  3−t ] = (2 − t)³.

Therefore the only eigenvalue is λ = 2, of multiplicity 3. We first solve

0 = (A − 2I)x = [ −1  1  0; 0  0  1; 1  −1  1 ] x.

The solutions satisfy x1 = x2, x3 = 0. So we can take u1 = (1, 1, 0)^t.

Now we look for a solution of (A − 2I)x = u1, which gives u2 = (0, 1, 1)^t. Finally, we solve (A − 2I)x = u2, which gives the third root vector u3 = (0, 0, 1)^t.

Now if M is the matrix whose columns are u1, u2 and u3, we have

M = [ 1  0  0; 1  1  0; 0  1  1 ],    M^{-1} = [ 1  0  0; −1  1  0; 1  −1  1 ],

and

M^{-1}AM = [ 2  1  0; 0  2  1; 0  0  2 ].
This is the Jordan normal form of A. It has a single 3×3 Jordan block associated
with λ = 2.

Note that these choices are not unique. We could have chosen u2 = (1, 2, 1)^t; and then we can choose u3 = (−1, 0, 2)^t or (0, 1, 2)^t.
