Orthogonal Bases and the QR Algorithm

by Peter J. Olver
University of Minnesota
1. Orthogonal Bases.
Throughout, we work in the Euclidean vector space $V = \mathbb{R}^n$, the space of column vectors with n real entries. As inner product, we will only use the dot product $v \cdot w = v^T w$ and corresponding Euclidean norm $\| v \| = \sqrt{v \cdot v}$.
Two vectors $v, w \in V$ are called orthogonal if their inner product vanishes: $v \cdot w = 0$. In the case of vectors in Euclidean space, orthogonality under the dot product means that they meet at a right angle. A particularly important configuration is when V admits a basis consisting of mutually orthogonal elements.
Definition 1.1. A basis $u_1, \ldots, u_n$ of V is called orthogonal if $u_i \cdot u_j = 0$ for all $i \neq j$. The basis is called orthonormal if, in addition, each vector has unit length: $\| u_i \| = 1$, for all $i = 1, \ldots, n$.
The simplest example of an orthonormal basis is the standard basis
$$ e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{pmatrix}, \qquad \ldots, \qquad e_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. $$
Orthogonality follows because $e_i \cdot e_j = 0$ for $i \neq j$, while $\| e_i \| = 1$ implies normality.
Since a basis cannot contain the zero vector, there is an easy way to convert an
orthogonal basis to an orthonormal basis. Namely, we replace each basis vector with a
unit vector pointing in the same direction.
Lemma 1.2. If $v_1, \ldots, v_n$ is an orthogonal basis of a vector space V, then the normalized vectors $u_i = v_i / \| v_i \|$, $i = 1, \ldots, n$, form an orthonormal basis.
Example 1.3. The vectors
$$ v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix}, $$
are easily seen to form a basis of $\mathbb{R}^3$. Moreover, they are mutually perpendicular, $v_1 \cdot v_2 = v_1 \cdot v_3 = v_2 \cdot v_3 = 0$, and so form an orthogonal basis with respect to the standard dot product on $\mathbb{R}^3$. When we divide each orthogonal basis vector by its length, the result is the orthonormal basis
$$ u_1 = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \end{pmatrix}, \qquad u_2 = \frac{1}{\sqrt{5}} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix}, \qquad u_3 = \frac{1}{\sqrt{30}} \begin{pmatrix} 5 \\ -2 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{5}{\sqrt{30}} \\ -\frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{30}} \end{pmatrix}, $$
satisfying $u_1 \cdot u_2 = u_1 \cdot u_3 = u_2 \cdot u_3 = 0$ and $\| u_1 \| = \| u_2 \| = \| u_3 \| = 1$. The appearance of square roots in the elements of an orthonormal basis is fairly typical.
A useful observation is that any orthogonal collection of nonzero vectors is automatically linearly independent.
Proposition 1.4. If $v_1, \ldots, v_k \in V$ are nonzero, mutually orthogonal elements, so $v_i \neq 0$ and $v_i \cdot v_j = 0$ for all $i \neq j$, then they are linearly independent.
Proof: Suppose
$$ c_1 v_1 + \cdots + c_k v_k = 0. $$
Let us take the inner product of this equation with any $v_i$. Using linearity of the inner product and orthogonality, we compute
$$ 0 = (c_1 v_1 + \cdots + c_k v_k) \cdot v_i = c_1\, v_1 \cdot v_i + \cdots + c_k\, v_k \cdot v_i = c_i\, v_i \cdot v_i = c_i\, \| v_i \|^2. $$
Therefore, provided $v_i \neq 0$, we conclude that the coefficient $c_i = 0$. Since this holds for all $i = 1, \ldots, k$, the linear independence of $v_1, \ldots, v_k$ follows. Q.E.D.
As a direct corollary, we infer that any collection of nonzero orthogonal vectors forms a basis for its span.
Theorem 1.5. Suppose $v_1, \ldots, v_n \in V$ are nonzero, mutually orthogonal elements of an inner product space V. Then $v_1, \ldots, v_n$ form an orthogonal basis for their span $W = \operatorname{span}\{ v_1, \ldots, v_n \} \subset V$, which is therefore a subspace of dimension $n = \dim W$. In particular, if $\dim V = n$, then $v_1, \ldots, v_n$ form an orthogonal basis for V.
Computations in Orthogonal Bases
What are the advantages of orthogonal and orthonormal bases? Once one has a basis of a vector space, a key issue is how to express other elements as linear combinations of the basis elements, that is, to find their coordinates in the prescribed basis. In general, this is not so easy, since it requires solving a system of linear equations. In high dimensional situations arising in applications, computing the solution may require a considerable, if not infeasible, amount of time and effort.
However, if the basis is orthogonal, or, even better, orthonormal, then the change of basis computation requires almost no work. This is the crucial insight underlying the efficacy of both discrete and continuous Fourier analysis, least squares approximations and statistical analysis of large data sets, signal, image and video processing, and a multitude of other applications, both classical and modern.
Theorem 1.6. Let $u_1, \ldots, u_n$ be an orthonormal basis for an inner product space V. Then one can write any element $v \in V$ as a linear combination
$$ v = c_1 u_1 + \cdots + c_n u_n, \qquad (1.1) $$
in which its coordinates
$$ c_i = v \cdot u_i, \qquad i = 1, \ldots, n, \qquad (1.2) $$
are explicitly given as inner products. Moreover, its norm
$$ \| v \| = \sqrt{c_1^2 + \cdots + c_n^2} = \sqrt{ \sum_{i=1}^n (v \cdot u_i)^2 } \qquad (1.3) $$
is the square root of the sum of the squares of its orthonormal basis coordinates.
Proof: Let us compute the inner product of (1.1) with one of the basis vectors. Using the orthonormality conditions
$$ u_i \cdot u_j = \begin{cases} 0, & i \neq j, \\ 1, & i = j, \end{cases} \qquad (1.4) $$
and bilinearity of the inner product, we find
$$ v \cdot u_i = \Bigl( \sum_{j=1}^n c_j u_j \Bigr) \cdot u_i = \sum_{j=1}^n c_j\, u_j \cdot u_i = c_i\, \| u_i \|^2 = c_i. $$
To prove formula (1.3), we similarly expand
$$ \| v \|^2 = v \cdot v = \sum_{i,j=1}^n c_i c_j\, u_i \cdot u_j = \sum_{i=1}^n c_i^2, $$
again making use of the orthonormality of the basis elements. Q.E.D.
Example 1.7. Let us rewrite the vector $v = (1, 1, 1)^T$ in terms of the orthonormal basis
$$ u_1 = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \\ -\frac{1}{\sqrt{6}} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} 0 \\ \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac{5}{\sqrt{30}} \\ -\frac{2}{\sqrt{30}} \\ \frac{1}{\sqrt{30}} \end{pmatrix}, $$
constructed in Example 1.3. Computing the dot products
$$ v \cdot u_1 = \frac{2}{\sqrt{6}}, \qquad v \cdot u_2 = \frac{3}{\sqrt{5}}, \qquad v \cdot u_3 = \frac{4}{\sqrt{30}}, $$
we immediately conclude that
$$ v = \frac{2}{\sqrt{6}}\, u_1 + \frac{3}{\sqrt{5}}\, u_2 + \frac{4}{\sqrt{30}}\, u_3. $$
Needless to say, a direct computation based on solving the associated linear system is more tedious.
While passage from an orthogonal basis to its orthonormal version is elementary (one simply divides each basis element by its norm), we shall often find it more convenient to work directly with the unnormalized version. The next result provides the corresponding formula expressing a vector in terms of an orthogonal, but not necessarily orthonormal, basis. The proof proceeds exactly as in the orthonormal case, and details are left to the reader.
Theorem 1.8. If $v_1, \ldots, v_n$ form an orthogonal basis, then the corresponding coordinates of a vector
$$ v = a_1 v_1 + \cdots + a_n v_n \qquad \text{are given by} \qquad a_i = \frac{v \cdot v_i}{\| v_i \|^2}. \qquad (1.5) $$
In this case, its norm can be computed using the formula
$$ \| v \|^2 = \sum_{i=1}^n a_i^2\, \| v_i \|^2 = \sum_{i=1}^n \left( \frac{v \cdot v_i}{\| v_i \|} \right)^2. \qquad (1.6) $$
Equation (1.5), along with its orthonormal simplification (1.2), is one of the most useful formulas we shall establish, and applications will appear repeatedly throughout the sequel.
Example 1.9. The wavelet basis
$$ v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ -1 \\ -1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \qquad v_4 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix}, \qquad (1.7) $$
is an orthogonal basis of $\mathbb{R}^4$. The norms are
$$ \| v_1 \| = 2, \qquad \| v_2 \| = 2, \qquad \| v_3 \| = \sqrt{2}, \qquad \| v_4 \| = \sqrt{2}. $$
Therefore, using (1.5), we can readily express any vector as a linear combination of the wavelet basis vectors. For example,
$$ v = \begin{pmatrix} 4 \\ -2 \\ 1 \\ 5 \end{pmatrix} = 2\, v_1 - v_2 + 3\, v_3 - 2\, v_4, $$
where the wavelet coordinates are computed directly by
$$ \frac{v \cdot v_1}{\| v_1 \|^2} = \frac{8}{4} = 2, \qquad \frac{v \cdot v_2}{\| v_2 \|^2} = \frac{-4}{4} = -1, \qquad \frac{v \cdot v_3}{\| v_3 \|^2} = \frac{6}{2} = 3, \qquad \frac{v \cdot v_4}{\| v_4 \|^2} = \frac{-4}{2} = -2. $$
Finally, we note that
$$ 46 = \| v \|^2 = 2^2\, \| v_1 \|^2 + (-1)^2\, \| v_2 \|^2 + 3^2\, \| v_3 \|^2 + (-2)^2\, \| v_4 \|^2 = 4 \cdot 4 + 1 \cdot 4 + 9 \cdot 2 + 4 \cdot 2,
$$
in conformity with (1.6).
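For readers who wish to experiment, here is a minimal NumPy sketch (not part of the original notes; the function name orthogonal_coordinates is an illustrative choice) that evaluates formula (1.5) and checks it on the wavelet example above:

    import numpy as np

    def orthogonal_coordinates(v, basis):
        # basis: mutually orthogonal (not necessarily unit) vectors
        # returns the coefficients a_i = (v . v_i) / ||v_i||^2 of formula (1.5)
        return np.array([np.dot(v, b) / np.dot(b, b) for b in basis])

    v1 = np.array([1., 1., 1., 1.])
    v2 = np.array([1., 1., -1., -1.])
    v3 = np.array([1., -1., 0., 0.])
    v4 = np.array([0., 0., 1., -1.])
    v = np.array([4., -2., 1., 5.])

    a = orthogonal_coordinates(v, [v1, v2, v3, v4])
    print(a)                                        # [ 2. -1.  3. -2.]
    print(a[0]*v1 + a[1]*v2 + a[2]*v3 + a[3]*v4)    # recovers v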
2. The Gram–Schmidt Process.
Once we become convinced of the utility of orthogonal and orthonormal bases, a natural question arises: How can we construct them? A practical algorithm was first discovered by Pierre-Simon Laplace in the eighteenth century. Today the algorithm is known as the Gram–Schmidt process, after its rediscovery by the nineteenth century mathematicians Jorgen Gram and Erhard Schmidt. The Gram–Schmidt process is one of the premier algorithms of applied and computational linear algebra.
We assume that we already know some basis $w_1, \ldots, w_n$ of V, where $n = \dim V$. Our goal is to use this information to construct an orthogonal basis $v_1, \ldots, v_n$. We will construct the orthogonal basis elements one by one. Since initially we are not worrying about normality, there are no conditions on the first orthogonal basis element $v_1$, and so there is no harm in choosing
$$ v_1 = w_1. $$
Note that $v_1 \neq 0$, since $w_1$ appears in the original basis. The second basis vector must be orthogonal to the first: $v_2 \cdot v_1 = 0$. Let us try to arrange this by subtracting a suitable multiple of $v_1$, and set
$$ v_2 = w_2 - c\, v_1, $$
where c is a scalar to be determined. The orthogonality condition
$$ 0 = v_2 \cdot v_1 = w_2 \cdot v_1 - c\, v_1 \cdot v_1 = w_2 \cdot v_1 - c\, \| v_1 \|^2 $$
requires that $c = w_2 \cdot v_1 / \| v_1 \|^2$, and therefore
$$ v_2 = w_2 - \frac{w_2 \cdot v_1}{\| v_1 \|^2}\, v_1. \qquad (2.1) $$
Linear independence of $v_1 = w_1$ and $w_2$ ensures that $v_2 \neq 0$. (Check!)
Next, we construct
$$ v_3 = w_3 - c_1 v_1 - c_2 v_2 $$
by subtracting suitable multiples of the first two orthogonal basis elements from $w_3$. We want $v_3$ to be orthogonal to both $v_1$ and $v_2$. Since we already arranged that $v_1 \cdot v_2 = 0$, this requires
$$ 0 = v_3 \cdot v_1 = w_3 \cdot v_1 - c_1\, v_1 \cdot v_1, \qquad 0 = v_3 \cdot v_2 = w_3 \cdot v_2 - c_2\, v_2 \cdot v_2, $$
and hence
$$ c_1 = \frac{w_3 \cdot v_1}{\| v_1 \|^2}, \qquad c_2 = \frac{w_3 \cdot v_2}{\| v_2 \|^2}. $$
Therefore, the next orthogonal basis vector is given by the formula
$$ v_3 = w_3 - \frac{w_3 \cdot v_1}{\| v_1 \|^2}\, v_1 - \frac{w_3 \cdot v_2}{\| v_2 \|^2}\, v_2. $$
Since $v_1$ and $v_2$ are linear combinations of $w_1$ and $w_2$, we must have $v_3 \neq 0$, as otherwise this would imply that $w_1, w_2, w_3$ are linearly dependent, and hence could not come from a basis.
Continuing in the same manner, suppose we have already constructed the mutually orthogonal vectors $v_1, \ldots, v_{k-1}$ as linear combinations of $w_1, \ldots, w_{k-1}$. The next orthogonal basis element $v_k$ will be obtained from $w_k$ by subtracting off a suitable linear combination of the previous orthogonal basis elements:
$$ v_k = w_k - c_1 v_1 - \cdots - c_{k-1} v_{k-1}. $$
Since $v_1, \ldots, v_{k-1}$ are already orthogonal, the orthogonality constraint
$$ 0 = v_k \cdot v_j = w_k \cdot v_j - c_j\, v_j \cdot v_j $$
requires
$$ c_j = \frac{w_k \cdot v_j}{\| v_j \|^2}, \qquad j = 1, \ldots, k-1. \qquad (2.2) $$
In this fashion, we establish the general Gram–Schmidt formula
$$ v_k = w_k - \sum_{j=1}^{k-1} \frac{w_k \cdot v_j}{\| v_j \|^2}\, v_j, \qquad k = 1, \ldots, n. \qquad (2.3) $$
The Gram–Schmidt process (2.3) defines an explicit, recursive procedure for constructing the orthogonal basis vectors $v_1, \ldots, v_n$. If we are actually after an orthonormal basis $u_1, \ldots, u_n$, we merely normalize the resulting orthogonal basis vectors, setting $u_k = v_k / \| v_k \|$ for each $k = 1, \ldots, n$.
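The recursion (2.3) translates directly into a few lines of code. The following NumPy sketch (an illustration, not part of the original notes) implements the classical Gram–Schmidt formula, returning the unnormalized orthogonal vectors $v_k$:

    import numpy as np

    def gram_schmidt(w):
        # w: list of linearly independent vectors w_1, ..., w_n
        # returns the orthogonal vectors v_1, ..., v_n of formula (2.3)
        v = []
        for wk in w:
            vk = wk.astype(float)
            for vj in v:
                vk = vk - np.dot(wk, vj) / np.dot(vj, vj) * vj   # subtract the projection onto v_j
            v.append(vk)
        return v

    # the basis of Example 2.1 below
    w1, w2, w3 = np.array([1, 1, -1]), np.array([1, 0, 2]), np.array([2, -2, 3])
    for vk in gram_schmidt([w1, w2, w3]):
        print(vk)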
Example 2.1. The vectors
$$ w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, \qquad (2.4) $$
are readily seen to form a basis† of $\mathbb{R}^3$. To construct an orthogonal basis (with respect to the standard dot product) using the Gram–Schmidt procedure, we begin by setting
$$ v_1 = w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}. $$
The next basis vector is
$$ v_2 = w_2 - \frac{w_2 \cdot v_1}{\| v_1 \|^2}\, v_1 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{-1}{3} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} \frac{4}{3} \\ \frac{1}{3} \\ \frac{5}{3} \end{pmatrix}. $$

† This will, in fact, be a consequence of the successful completion of the Gram–Schmidt process and does not need to be checked in advance. If the given vectors were not linearly independent, then eventually one of the Gram–Schmidt vectors would vanish, $v_k = 0$, and the process will break down.
The last orthogonal basis vector is
$$ v_3 = w_3 - \frac{w_3 \cdot v_1}{\| v_1 \|^2}\, v_1 - \frac{w_3 \cdot v_2}{\| v_2 \|^2}\, v_2 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix} - \frac{-3}{3} \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} - \frac{7}{14/3} \begin{pmatrix} \frac{4}{3} \\ \frac{1}{3} \\ \frac{5}{3} \end{pmatrix} = \begin{pmatrix} 1 \\ -\frac{3}{2} \\ -\frac{1}{2} \end{pmatrix}. $$
The reader can easily validate the orthogonality of $v_1, v_2, v_3$.
An orthonormal basis is obtained by dividing each vector by its length. Since
$$ \| v_1 \| = \sqrt{3}, \qquad \| v_2 \| = \sqrt{\tfrac{14}{3}}, \qquad \| v_3 \| = \sqrt{\tfrac{7}{2}}, $$
we produce the corresponding orthonormal basis vectors
$$ u_1 = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{pmatrix}, \qquad u_2 = \begin{pmatrix} \frac{4}{\sqrt{42}} \\ \frac{1}{\sqrt{42}} \\ \frac{5}{\sqrt{42}} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac{2}{\sqrt{14}} \\ -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix}. \qquad (2.5) $$
Modifications of the Gram–Schmidt Process
With the basic Gram–Schmidt algorithm now in hand, it is worth looking at a couple of reformulations that have both practical and theoretical advantages. The first can be used to directly construct the orthonormal basis vectors $u_1, \ldots, u_n$ from the basis $w_1, \ldots, w_n$.
We begin by replacing each orthogonal basis vector in the basic Gram–Schmidt formula (2.3) by its normalized version $u_j = v_j / \| v_j \|$. The original basis vectors can be expressed in terms of the orthonormal basis via a triangular system
$$ \begin{aligned} w_1 &= r_{11} u_1, \\ w_2 &= r_{12} u_1 + r_{22} u_2, \\ w_3 &= r_{13} u_1 + r_{23} u_2 + r_{33} u_3, \\ &\;\;\vdots \\ w_n &= r_{1n} u_1 + r_{2n} u_2 + \cdots + r_{nn} u_n. \end{aligned} \qquad (2.6) $$
The coefficients $r_{ij}$ can, in fact, be computed directly from these formulae. Indeed, taking the inner product of the equation for $w_j$ with the orthonormal basis vector $u_i$ for $i \le j$, we find, in view of the orthonormality constraints (1.4),
$$ w_j \cdot u_i = \bigl( r_{1j} u_1 + \cdots + r_{jj} u_j \bigr) \cdot u_i = r_{1j}\, u_1 \cdot u_i + \cdots + r_{jj}\, u_j \cdot u_i = r_{ij}, $$
and hence
$$ r_{ij} = w_j \cdot u_i. \qquad (2.7) $$
On the other hand, according to (1.3),
$$ \| w_j \|^2 = \| r_{1j} u_1 + \cdots + r_{jj} u_j \|^2 = r_{1j}^2 + \cdots + r_{j-1,j}^2 + r_{jj}^2. \qquad (2.8) $$
The pair of equations (2.7–8) can be rearranged to devise a recursive procedure to compute the orthonormal basis. At stage j, we assume that we have already constructed $u_1, \ldots, u_{j-1}$. We then compute‡
$$ r_{ij} = w_j \cdot u_i, \qquad i = 1, \ldots, j-1. \qquad (2.9) $$
We obtain the next orthonormal basis vector $u_j$ by computing
$$ r_{jj} = \sqrt{ \| w_j \|^2 - r_{1j}^2 - \cdots - r_{j-1,j}^2 }, \qquad u_j = \frac{w_j - r_{1j} u_1 - \cdots - r_{j-1,j} u_{j-1}}{r_{jj}}. \qquad (2.10) $$
Running through the formulae (2.9–10) for $j = 1, \ldots, n$ leads to the same orthonormal basis $u_1, \ldots, u_n$ as the previous version of the Gram–Schmidt procedure.
Example 2.2. Let us apply the revised algorithm to the vectors
$$ w_1 = \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \qquad w_2 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \qquad w_3 = \begin{pmatrix} 2 \\ -2 \\ 3 \end{pmatrix}, $$
of Example 2.1. To begin, we set
$$ r_{11} = \| w_1 \| = \sqrt{3}, \qquad u_1 = \frac{w_1}{r_{11}} = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{pmatrix}. $$
The next step is to compute
$$ r_{12} = w_2 \cdot u_1 = -\frac{1}{\sqrt{3}}, \qquad r_{22} = \sqrt{ \| w_2 \|^2 - r_{12}^2 } = \sqrt{\tfrac{14}{3}}, \qquad u_2 = \frac{w_2 - r_{12} u_1}{r_{22}} = \begin{pmatrix} \frac{4}{\sqrt{42}} \\ \frac{1}{\sqrt{42}} \\ \frac{5}{\sqrt{42}} \end{pmatrix}. $$
The final step yields
$$ r_{13} = w_3 \cdot u_1 = -\sqrt{3}, \qquad r_{23} = w_3 \cdot u_2 = \sqrt{\tfrac{21}{2}}, $$
$$ r_{33} = \sqrt{ \| w_3 \|^2 - r_{13}^2 - r_{23}^2 } = \sqrt{\tfrac{7}{2}}, \qquad u_3 = \frac{w_3 - r_{13} u_1 - r_{23} u_2}{r_{33}} = \begin{pmatrix} \frac{2}{\sqrt{14}} \\ -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{14}} \end{pmatrix}. $$
As advertised, the result is the same orthonormal basis vectors that we found in Example 2.1.

‡ When j = 1, there is nothing to do.
For hand computations, the original version (2.3) of the Gram–Schmidt process is slightly easier (even if one does ultimately want an orthonormal basis), since it avoids the square roots that are ubiquitous in the orthonormal version (2.9–10). On the other hand, for numerical implementation on a computer, the orthonormal version is a bit faster, as it involves fewer arithmetic operations.
However, in practical, large scale computations, both versions of the Gram–Schmidt process suffer from a serious flaw. They are subject to numerical instabilities, and so accumulating round-off errors may seriously corrupt the computations, leading to inaccurate, non-orthogonal vectors. Fortunately, there is a simple rearrangement of the calculation that obviates this difficulty and leads to the numerically robust algorithm that is most often used in practice. The idea is to treat the vectors simultaneously rather than sequentially, making full use of the orthonormal basis vectors as they arise. More specifically, the algorithm begins as before: we take $u_1 = w_1 / \| w_1 \|$. We then subtract off the appropriate multiples of $u_1$ from all of the remaining basis vectors so as to arrange their orthogonality to $u_1$. This is accomplished by setting
$$ w_k^{(2)} = w_k - (w_k \cdot u_1)\, u_1 \qquad \text{for} \quad k = 2, \ldots, n. $$
The second orthonormal basis vector $u_2 = w_2^{(2)} / \| w_2^{(2)} \|$ is then obtained by normalizing. We next modify the remaining $w_3^{(2)}, \ldots, w_n^{(2)}$ to produce vectors
$$ w_k^{(3)} = w_k^{(2)} - (w_k^{(2)} \cdot u_2)\, u_2, \qquad k = 3, \ldots, n, $$
that are orthogonal to both $u_1$ and $u_2$. Then $u_3 = w_3^{(3)} / \| w_3^{(3)} \|$ is the next orthonormal basis element, and the process continues. The full algorithm starts with the initial basis vectors $w_j = w_j^{(1)}$, $j = 1, \ldots, n$, and then recursively computes
$$ u_j = \frac{w_j^{(j)}}{\| w_j^{(j)} \|}, \qquad w_k^{(j+1)} = w_k^{(j)} - (w_k^{(j)} \cdot u_j)\, u_j, \qquad j = 1, \ldots, n, \quad k = j+1, \ldots, n. \qquad (2.11) $$
(In the final phase, when $j = n$, the second formula is no longer needed.) The result is a numerically stable computation of the same orthonormal basis vectors $u_1, \ldots, u_n$.
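As a sketch of how (2.11) might look in code (an illustration under the stated assumptions, not part of the original notes), the following NumPy routine overwrites working copies of the basis vectors and returns the orthonormal basis:

    import numpy as np

    def stable_gram_schmidt(w):
        # Numerically stable (modified) Gram-Schmidt, following (2.11).
        # w: list of linearly independent vectors; returns orthonormal u_1, ..., u_n.
        work = [wk.astype(float) for wk in w]          # the vectors w_k^(j)
        u = []
        for j in range(len(work)):
            uj = work[j] / np.linalg.norm(work[j])     # u_j = w_j^(j) / ||w_j^(j)||
            u.append(uj)
            for k in range(j + 1, len(work)):          # orthogonalize the remaining vectors
                work[k] = work[k] - np.dot(work[k], uj) * uj
        return u

    # applied to the basis of Example 2.1
    w = [np.array([1, 1, -1]), np.array([1, 0, 2]), np.array([2, -2, 3])]
    for uj in stable_gram_schmidt(w):
        print(uj)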
Example 2.3. Let us apply the stable Gram–Schmidt process (2.11) to the basis vectors
$$ w_1^{(1)} = w_1 = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}, \qquad w_2^{(1)} = w_2 = \begin{pmatrix} 0 \\ 4 \\ -1 \end{pmatrix}, \qquad w_3^{(1)} = w_3 = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}. $$
The first orthonormal basis vector is
$$ u_1 = \frac{w_1^{(1)}}{\| w_1^{(1)} \|} = \begin{pmatrix} \frac{2}{3} \\ \frac{2}{3} \\ -\frac{1}{3} \end{pmatrix}. $$
Next, we compute
$$ w_2^{(2)} = w_2^{(1)} - (w_2^{(1)} \cdot u_1)\, u_1 = \begin{pmatrix} -2 \\ 2 \\ 0 \end{pmatrix}, \qquad w_3^{(2)} = w_3^{(1)} - (w_3^{(1)} \cdot u_1)\, u_1 = \begin{pmatrix} -1 \\ 0 \\ -2 \end{pmatrix}. $$
The second orthonormal basis vector is
$$ u_2 = \frac{w_2^{(2)}}{\| w_2^{(2)} \|} = \begin{pmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}. $$
Finally,
$$ w_3^{(3)} = w_3^{(2)} - (w_3^{(2)} \cdot u_2)\, u_2 = \begin{pmatrix} -\frac{1}{2} \\ -\frac{1}{2} \\ -2 \end{pmatrix}, \qquad u_3 = \frac{w_3^{(3)}}{\| w_3^{(3)} \|} = \begin{pmatrix} -\frac{\sqrt{2}}{6} \\ -\frac{\sqrt{2}}{6} \\ -\frac{2\sqrt{2}}{3} \end{pmatrix}. $$
The resulting vectors $u_1, u_2, u_3$ form the desired orthonormal basis.
3. Orthogonal Matrices.
Matrices whose columns form an orthonormal basis of $\mathbb{R}^n$ relative to the standard Euclidean dot product play a distinguished role. Such orthogonal matrices appear in a wide range of applications in geometry, physics, quantum mechanics, crystallography, partial differential equations, symmetry theory, and special functions. Rotational motions of bodies in three-dimensional space are described by orthogonal matrices, and hence they lie at the foundations of rigid body mechanics, including satellite and underwater vehicle motions, as well as three-dimensional computer graphics and animation. Furthermore, orthogonal matrices are an essential ingredient in one of the most important methods of numerical linear algebra: the QR algorithm for computing eigenvalues of matrices.
Definition 3.1. A square matrix Q is called an orthogonal matrix if it satisfies
$$ Q^T Q = I. \qquad (3.1) $$
The orthogonality condition implies that one can easily invert an orthogonal matrix:
$$ Q^{-1} = Q^T. \qquad (3.2) $$
In fact, the two conditions are equivalent, and hence a matrix is orthogonal if and only if its inverse is equal to its transpose. The second important characterization of orthogonal matrices relates them directly to orthonormal bases.
Proposition 3.2. A matrix Q is orthogonal if and only if its columns form an orthonormal basis with respect to the Euclidean dot product on $\mathbb{R}^n$.
Proof: Let $u_1, \ldots, u_n$ be the columns of Q. Then $u_1^T, \ldots, u_n^T$ are the rows of the transposed matrix $Q^T$. The (i, j) entry of the product $Q^T Q$ is given as the product of the $i$th row of $Q^T$ times the $j$th column of Q. Thus, the orthogonality requirement (3.1) implies
$$ u_i \cdot u_j = u_i^T u_j = \begin{cases} 1, & i = j, \\ 0, & i \neq j, \end{cases} $$
which are precisely the conditions (1.4) for $u_1, \ldots, u_n$ to form an orthonormal basis. Q.E.D.
Warning: Technically, we should be referring to an orthonormal matrix, not an
orthogonal matrix. But the terminology is so standard throughout mathematics that
we have no choice but to adopt it here. There is no commonly accepted name for a matrix
whose columns form an orthogonal but not orthonormal basis.
Figure 1. Orthonormal Bases in $\mathbb{R}^2$.
Example 3.3. A $2 \times 2$ matrix $Q = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is orthogonal if and only if its columns $u_1 = \begin{pmatrix} a \\ c \end{pmatrix}$, $u_2 = \begin{pmatrix} b \\ d \end{pmatrix}$ form an orthonormal basis of $\mathbb{R}^2$. Equivalently, the requirement
$$ Q^T Q = \begin{pmatrix} a & c \\ b & d \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^2 + c^2 & ab + cd \\ ab + cd & b^2 + d^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, $$
implies that its entries must satisfy the algebraic equations
$$ a^2 + c^2 = 1, \qquad ab + cd = 0, \qquad b^2 + d^2 = 1. $$
The first and last equations say that the points $(a, c)^T$ and $(b, d)^T$ lie on the unit circle in $\mathbb{R}^2$, and so
$$ a = \cos\theta, \quad c = \sin\theta, \qquad b = \cos\phi, \quad d = \sin\phi, $$
for some choice of angles $\theta, \phi$. The remaining orthogonality condition is
$$ 0 = ab + cd = \cos\theta \cos\phi + \sin\theta \sin\phi = \cos(\theta - \phi), $$
which implies that $\theta$ and $\phi$ differ by a right angle: $\phi = \theta \pm \tfrac{1}{2}\pi$. The $\pm$ sign leads to two cases:
$$ b = -\sin\theta, \quad d = \cos\theta, \qquad \text{or} \qquad b = \sin\theta, \quad d = -\cos\theta. $$
As a result, every $2 \times 2$ orthogonal matrix has one of two possible forms
$$ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \qquad \text{or} \qquad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}, \qquad \text{where} \quad 0 \le \theta < 2\pi. \qquad (3.3) $$
The corresponding orthonormal bases are illustrated in Figure 1. The former is a right handed basis, and can be obtained from the standard basis $e_1, e_2$ by a rotation through angle $\theta$, while the latter has the opposite, reflected orientation.
Example 3.4. A $3 \times 3$ orthogonal matrix $Q = (u_1\ u_2\ u_3)$ is prescribed by 3 mutually perpendicular vectors of unit length in $\mathbb{R}^3$. For instance, the orthonormal basis constructed in (2.5) corresponds to the orthogonal matrix
$$ Q = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}} \end{pmatrix}. $$
Lemma 3.5. An orthogonal matrix has determinant $\det Q = \pm 1$.
Proof: Taking the determinant of (3.1),
$$ 1 = \det I = \det(Q^T Q) = \det Q^T \det Q = (\det Q)^2, $$
which immediately proves the lemma. Q.E.D.
An orthogonal matrix is called proper or special if it has determinant +1. Geometrically, the columns of a proper orthogonal matrix form a right handed basis of $\mathbb{R}^n$. An improper orthogonal matrix, with determinant $-1$, corresponds to a left handed basis that lives in a mirror image world.
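As a quick numerical illustration (not part of the original notes), one can check the defining property (3.1) and Lemma 3.5 for the orthogonal matrix of Example 3.4:

    import numpy as np

    Q = np.column_stack([
        np.array([1, 1, -1]) / np.sqrt(3),     # u_1 from (2.5)
        np.array([4, 1, 5]) / np.sqrt(42),     # u_2
        np.array([2, -3, -1]) / np.sqrt(14),   # u_3
    ])

    print(np.allclose(Q.T @ Q, np.eye(3)))     # True: Q^T Q = I, so Q is orthogonal
    print(np.linalg.det(Q))                    # equals +1 or -1, in accordance with Lemma 3.5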
Proposition 3.6. The product of two orthogonal matrices is also orthogonal.
Proof: If $Q_1^T Q_1 = I = Q_2^T Q_2$, then
$$ (Q_1 Q_2)^T (Q_1 Q_2) = Q_2^T Q_1^T Q_1 Q_2 = Q_2^T Q_2 = I, $$
and so the product matrix $Q_1 Q_2$ is also orthogonal. Q.E.D.
This property says that the set of all orthogonal matrices forms a group. The orthogonal group lies at the foundation of everyday Euclidean geometry, as well as rigid body mechanics, atomic structure and chemistry, computer graphics and animation, and many other areas.
4. Eigenvalues of Symmetric Matrices.
Symmetric matrices play an important role in a broad range of applications, and enjoy a number of important properties not shared by more general matrices. Not only are the eigenvalues of a symmetric matrix necessarily real, the eigenvectors always form an orthogonal basis. In fact, this is by far the most common way for orthogonal bases to appear, namely as the eigenvector bases of symmetric matrices. Let us state this important result, but defer its proof until the end of the section.
Theorem 4.1. Let $A = A^T$ be a real symmetric $n \times n$ matrix. Then
(a) All the eigenvalues of A are real.
(b) Eigenvectors corresponding to distinct eigenvalues are orthogonal.
(c) There is an orthonormal basis of $\mathbb{R}^n$ consisting of n eigenvectors of A.
In particular, all symmetric matrices are complete.
Example 4.2. Consider the symmetric matrix $A = \begin{pmatrix} 5 & -4 & 2 \\ -4 & 5 & 2 \\ 2 & 2 & -1 \end{pmatrix}$. A straightforward computation produces its eigenvalues and eigenvectors:
$$ \lambda_1 = 9, \qquad \lambda_2 = 3, \qquad \lambda_3 = -3, $$
$$ v_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix}. $$
As the reader can check, the eigenvectors form an orthogonal basis of $\mathbb{R}^3$. An orthonormal basis is provided by the unit eigenvectors
$$ u_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \qquad u_2 = \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix}, \qquad u_3 = \begin{pmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ -\frac{2}{\sqrt{6}} \end{pmatrix}. $$
Proof of Theorem 4.1: First, if $A = A^T$ is real, symmetric, then
$$ (A v) \cdot w = v \cdot (A w) \qquad \text{for all} \quad v, w \in \mathbb{C}^n, \qquad (4.1) $$
where $\cdot$ indicates the Euclidean dot product when the vectors are real and, more generally, the Hermitian dot product $v \cdot w = v^T \overline{w}$ when they are complex.
To prove property (a), suppose $\lambda$ is a complex eigenvalue with complex eigenvector $v \in \mathbb{C}^n$. Consider the Hermitian dot product of the complex vectors $A v$ and $v$:
$$ (A v) \cdot v = (\lambda v) \cdot v = \lambda\, \| v \|^2. $$
On the other hand, by (4.1),
$$ (A v) \cdot v = v \cdot (A v) = v \cdot (\lambda v) = \overline{\lambda}\, v^T \overline{v} = \overline{\lambda}\, \| v \|^2. $$
Equating these two expressions, we deduce
$$ \lambda\, \| v \|^2 = \overline{\lambda}\, \| v \|^2. $$
Since $v \neq 0$, as it is an eigenvector, we conclude that $\lambda = \overline{\lambda}$, proving that the eigenvalue $\lambda$ is real.
To prove (b), suppose
$$ A v = \lambda v, \qquad A w = \mu w, $$
where $\lambda \neq \mu$ are distinct real eigenvalues. Then, again by (4.1),
$$ \lambda\, v \cdot w = (A v) \cdot w = v \cdot (A w) = v \cdot (\mu w) = \mu\, v \cdot w, $$
and hence
$$ (\lambda - \mu)\, v \cdot w = 0. $$
Since $\lambda \neq \mu$, this implies that $v \cdot w = 0$, and hence the eigenvectors v, w are orthogonal.
Finally, the proof of (c) is easy if all the eigenvalues of A are distinct. Part (b) proves that the corresponding eigenvectors are orthogonal, and Proposition 1.4 proves that they form a basis. To obtain an orthonormal basis, we merely divide the eigenvectors by their lengths: $u_k = v_k / \| v_k \|$, as in Lemma 1.2.
To prove (c) in general, we proceed by induction on the size n of the matrix A. To start, the case of a $1 \times 1$ matrix is trivial. (Why?) Next, suppose A has size $n \times n$. We know that A has at least one eigenvalue, $\lambda_1$, which is necessarily real. Let $v_1$ be the associated eigenvector. Let
$$ V^{\perp} = \{\, w \in \mathbb{R}^n \mid v_1 \cdot w = 0 \,\} $$
denote the orthogonal complement to the eigenspace $V_{\lambda_1}$, that is, the set of all vectors orthogonal to the first eigenvector. Since $\dim V^{\perp} = n - 1$, we can choose an orthonormal basis $y_1, \ldots, y_{n-1}$. Now, if w is any vector in $V^{\perp}$, so is $A w$, since, by (4.1),
$$ v_1 \cdot (A w) = (A v_1) \cdot w = \lambda_1\, v_1 \cdot w = 0. $$
Thus, A defines a linear transformation on $V^{\perp}$ represented by an $(n-1) \times (n-1)$ matrix with respect to the chosen orthonormal basis $y_1, \ldots, y_{n-1}$. It is not hard to prove that the representing matrix is symmetric, and so our induction hypothesis then implies that there is an orthonormal basis of $V^{\perp}$ consisting of eigenvectors $u_2, \ldots, u_n$ of A. Appending the unit eigenvector $u_1 = v_1 / \| v_1 \|$ to this collection will complete the orthonormal basis of $\mathbb{R}^n$. Q.E.D.
The orthonormal eigenvector basis serves to diagonalize the symmetric matrix, resulting in the so-called spectral factorization formula.
Theorem 4.3. Let A be a real, symmetric matrix. Then there exists an orthogonal matrix Q such that
$$ A = Q \Lambda Q^{-1} = Q \Lambda Q^T, \qquad (4.2) $$
where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a real diagonal matrix. The eigenvalues of A appear on the diagonal of $\Lambda$, while the columns of Q are the corresponding orthonormal eigenvectors.
Proof: Equation (4.2) can be rewritten as $A Q = Q \Lambda$. The $k$th column of the latter matrix equation reads $A v_k = \lambda_k v_k$, where $v_k$ is the $k$th column of Q. But this is merely the condition that $v_k$ be an eigenvector of A with eigenvalue $\lambda_k$. Theorem 4.1 serves to complete the proof. Q.E.D.
Remark: The term spectrum refers to the eigenvalues of a matrix or, more generally, a linear operator. The terminology is motivated by physics. The spectral energy lines of atoms, molecules and nuclei are characterized as the eigenvalues of the governing quantum mechanical Schrodinger operator. The Spectral Theorem 4.3 is the finite-dimensional version for the decomposition of quantum mechanical linear operators into their spectral eigenstates.
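A sketch of the spectral factorization (4.2) in NumPy (illustrative, not from the original notes); numpy.linalg.eigh returns the eigenvalues of a symmetric matrix in increasing order together with an orthogonal matrix of eigenvectors:

    import numpy as np

    A = np.array([[5., -4., 2.],
                  [-4., 5., 2.],
                  [2., 2., -1.]])              # the symmetric matrix of Example 4.2

    lam, Q = np.linalg.eigh(A)                 # eigenvalues (ascending) and orthonormal eigenvectors
    Lam = np.diag(lam)

    print(lam)                                 # approximately [-3.  3.  9.]
    print(np.allclose(Q @ Lam @ Q.T, A))       # True: A = Q Lambda Q^T, as in (4.2)
    print(np.allclose(Q.T @ Q, np.eye(3)))     # True: Q is orthogonal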
The most important subclass of symmetric matrices are the positive definite matrices.
Definition 4.4. An $n \times n$ symmetric matrix A is called positive definite if it satisfies the positivity condition
$$ x^T A x > 0 \qquad \text{for all} \quad 0 \neq x \in \mathbb{R}^n. \qquad (4.3) $$
Theorem 4.5. A symmetric matrix $A = A^T$ is positive definite if and only if all of its eigenvalues are strictly positive.
Proof: First, if A is positive definite, and $v \neq 0$ is an eigenvector with (necessarily real) eigenvalue $\lambda$, then (4.3) implies
$$ 0 < v^T A v = v^T (\lambda v) = \lambda\, \| v \|^2, \qquad (4.4) $$
which immediately proves that $\lambda > 0$. Conversely, suppose A has all positive eigenvalues. Let $u_1, \ldots, u_n$ be the orthonormal eigenvector basis of $\mathbb{R}^n$ guaranteed by Theorem 4.1, with $A u_j = \lambda_j u_j$. Then, writing
$$ x = c_1 u_1 + \cdots + c_n u_n, \qquad \text{we find} \qquad A x = c_1 \lambda_1 u_1 + \cdots + c_n \lambda_n u_n. $$
Therefore, using the orthonormality of the eigenvectors, for any $x \neq 0$,
$$ x^T A x = (c_1 u_1^T + \cdots + c_n u_n^T)(c_1 \lambda_1 u_1 + \cdots + c_n \lambda_n u_n) = \lambda_1 c_1^2 + \cdots + \lambda_n c_n^2 > 0, $$
since only $x = 0$ has $c_1 = \cdots = c_n = 0$. This proves that A is positive definite. Q.E.D.
5. The QR Factorization.
The Gram–Schmidt procedure for orthonormalizing bases of $\mathbb{R}^n$ can be reinterpreted as a matrix factorization. This is more subtle than the LU factorization that results from Gaussian Elimination, but is of comparable significance, and is used in a broad range of applications in mathematics, statistics, physics, engineering, and numerical analysis.
Let $w_1, \ldots, w_n$ be a basis of $\mathbb{R}^n$, and let $u_1, \ldots, u_n$ be the corresponding orthonormal basis that results from any one of the three implementations of the Gram–Schmidt process. We assemble both sets of column vectors to form nonsingular $n \times n$ matrices
$$ A = (w_1\ w_2\ \ldots\ w_n), \qquad Q = (u_1\ u_2\ \ldots\ u_n). $$
Since the $u_i$ form an orthonormal basis, Q is an orthogonal matrix. Moreover, the Gram–Schmidt equations (2.6) can be recast into an equivalent matrix form:
$$ A = Q R, \qquad \text{where} \qquad R = \begin{pmatrix} r_{11} & r_{12} & \ldots & r_{1n} \\ 0 & r_{22} & \ldots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & r_{nn} \end{pmatrix} \qquad (5.1) $$
is an upper triangular matrix whose entries are the coefficients in (2.9–10). Since the Gram–Schmidt process works on any basis, the only requirement on the matrix A is that its columns form a basis of $\mathbb{R}^n$, and hence A can be any nonsingular matrix. We have therefore established the celebrated QR factorization of nonsingular matrices.
QR Factorization of a Matrix A

start
for j = 1 to n
    set r_jj = sqrt( a_1j^2 + ... + a_nj^2 )
    if r_jj = 0, stop; print "A has linearly dependent columns"
    else for i = 1 to n
        set a_ij = a_ij / r_jj
    next i
    for k = j + 1 to n
        set r_jk = a_1j a_1k + ... + a_nj a_nk
        for i = 1 to n
            set a_ik = a_ik - a_ij r_jk
        next i
    next k
next j
end
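The pseudocode above can be transcribed more or less line for line into NumPy; the following is a sketch of such a transcription (an illustration, not part of the original notes), overwriting a copy of A with Q and storing R separately:

    import numpy as np

    def qr_factorization(A):
        # Stable Gram-Schmidt QR factorization, following the pseudocode above.
        # Returns Q (orthogonal) and R (positive upper triangular) with A = Q R.
        Q = A.astype(float).copy()
        n = Q.shape[1]
        R = np.zeros((n, n))
        for j in range(n):
            R[j, j] = np.linalg.norm(Q[:, j])
            if R[j, j] == 0:
                raise ValueError("A has linearly dependent columns")
            Q[:, j] /= R[j, j]                       # normalize column j
            for k in range(j + 1, n):
                R[j, k] = np.dot(Q[:, j], Q[:, k])   # coefficient r_jk
                Q[:, k] -= R[j, k] * Q[:, j]         # orthogonalize column k against column j
        return Q, R

    A = np.array([[1., 1., 2.], [1., 0., -2.], [-1., 2., 3.]])   # the matrix of Example 5.2
    Q, R = qr_factorization(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True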
Theorem 5.1. Any nonsingular matrix A can be factored, $A = Q R$, into the product of an orthogonal matrix Q and an upper triangular matrix R. The factorization is unique if all the diagonal entries of R are assumed to be positive.
We will use the compacted term positive upper triangular to refer to upper triangular matrices with positive entries along the diagonal.
Example 5.2. The columns of the matrix $A = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & -2 \\ -1 & 2 & 3 \end{pmatrix}$ are the same as the basis vectors considered in Example 2.2. The orthonormal basis (2.5) constructed using the Gram–Schmidt algorithm leads to the orthogonal and positive upper triangular matrices
$$ Q = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{4}{\sqrt{42}} & \frac{2}{\sqrt{14}} \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{42}} & -\frac{3}{\sqrt{14}} \\ -\frac{1}{\sqrt{3}} & \frac{5}{\sqrt{42}} & -\frac{1}{\sqrt{14}} \end{pmatrix}, \qquad R = \begin{pmatrix} \sqrt{3} & -\frac{1}{\sqrt{3}} & -\sqrt{3} \\ 0 & \sqrt{\frac{14}{3}} & \sqrt{\frac{21}{2}} \\ 0 & 0 & \sqrt{\frac{7}{2}} \end{pmatrix}. \qquad (5.2) $$
The reader may wish to verify that, indeed, A = QR.
While any of the three implementations of the Gram–Schmidt algorithm will produce the QR factorization of a given matrix $A = (w_1\ w_2\ \ldots\ w_n)$, the stable version, as encoded in equations (2.11), is the one to use in practical computations, as it is the least likely to fail due to numerical artifacts produced by round-off errors. The accompanying pseudocode program reformulates the algorithm purely in terms of the matrix entries $a_{ij}$ of A. During the course of the algorithm, the entries of the matrix A are successively overwritten; the final result is the orthogonal matrix Q appearing in place of A. The entries $r_{ij}$ of R must be stored separately.
Example 5.3. Let us factor the matrix
$$ A = \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} $$
using the numerically stable QR algorithm. As in the program, we work directly on the matrix A, gradually changing it into orthogonal form. In the first loop, we set $r_{11} = \sqrt{5}$ to be the norm of the first column vector of A. We then normalize the first column by dividing by $r_{11}$; the resulting matrix is
$$ \begin{pmatrix} \frac{2}{\sqrt{5}} & 1 & 0 & 0 \\ \frac{1}{\sqrt{5}} & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. $$
The next entries $r_{12} = \frac{4}{\sqrt{5}}$, $r_{13} = \frac{1}{\sqrt{5}}$, $r_{14} = 0$, are obtained by taking the dot products of the first column with the other three columns. For $j = 2, 3, 4$, we subtract $r_{1j}$ times the first column from the $j$th column; the result
$$ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{5} & -\frac{2}{5} & 0 \\ \frac{1}{\sqrt{5}} & \frac{6}{5} & \frac{4}{5} & 0 \\ 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix} $$
is a matrix whose first column is normalized to have unit length, and whose second, third and fourth columns are orthogonal to it. In the next loop, we normalize the second column by dividing by its norm $r_{22} = \sqrt{\frac{14}{5}}$, and so obtain the matrix
$$ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & -\frac{2}{5} & 0 \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & \frac{4}{5} & 0 \\ 0 & \frac{5}{\sqrt{70}} & 2 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}. $$
We then take dot products of the second column with the remaining two columns to produce $r_{23} = \frac{16}{\sqrt{70}}$, $r_{24} = \frac{5}{\sqrt{70}}$. Subtracting these multiples of the second column from the third and fourth columns, we obtain
$$ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{7} & \frac{3}{14} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{7} & -\frac{3}{7} \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{7} & \frac{9}{14} \\ 0 & 0 & 1 & 2 \end{pmatrix}, $$
which now has its first two columns orthonormalized, and orthogonal to the last two columns. We then normalize the third column by dividing by $r_{33} = \sqrt{\frac{15}{7}}$, and so
$$ \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{\sqrt{105}} & \frac{3}{14} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{\sqrt{105}} & -\frac{3}{7} \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{\sqrt{105}} & \frac{9}{14} \\ 0 & 0 & \frac{7}{\sqrt{105}} & 2 \end{pmatrix}. $$
Finally, we subtract $r_{34} = \frac{20}{\sqrt{105}}$ times the third column from the fourth column. Dividing the resulting fourth column by its norm $r_{44} = \sqrt{\frac{5}{6}}$ results in the final formulas,
$$ Q = \begin{pmatrix} \frac{2}{\sqrt{5}} & -\frac{3}{\sqrt{70}} & \frac{2}{\sqrt{105}} & -\frac{1}{\sqrt{30}} \\ \frac{1}{\sqrt{5}} & \frac{6}{\sqrt{70}} & -\frac{4}{\sqrt{105}} & \frac{2}{\sqrt{30}} \\ 0 & \frac{5}{\sqrt{70}} & \frac{6}{\sqrt{105}} & -\frac{3}{\sqrt{30}} \\ 0 & 0 & \frac{7}{\sqrt{105}} & \frac{4}{\sqrt{30}} \end{pmatrix}, \qquad R = \begin{pmatrix} \sqrt{5} & \frac{4}{\sqrt{5}} & \frac{1}{\sqrt{5}} & 0 \\ 0 & \sqrt{\frac{14}{5}} & \frac{16}{\sqrt{70}} & \frac{5}{\sqrt{70}} \\ 0 & 0 & \sqrt{\frac{15}{7}} & \frac{20}{\sqrt{105}} \\ 0 & 0 & 0 & \sqrt{\frac{5}{6}} \end{pmatrix}, $$
for the A = QR factorization.
6. Numerical Calculation of Eigenvalues.
We are now ready to apply these results to the problem of numerically approximating eigenvalues of matrices. Before explaining the QR algorithm, we first review the elementary power method.
The Power Method
We assume, for simplicity, that A is a complete $n \times n$ matrix, meaning that it has an eigenvector basis† $v_1, \ldots, v_n$, with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. It is easy to see that the solution to the linear iterative system
$$ v^{(k+1)} = A\, v^{(k)}, \qquad v^{(0)} = v, \qquad (6.1) $$
is obtained by multiplying the initial vector v by the successive powers of the coefficient matrix: $v^{(k)} = A^k v$. If we write the initial vector in terms of the eigenvector basis
$$ v = c_1 v_1 + \cdots + c_n v_n, \qquad (6.2) $$
then the solution takes the explicit form
$$ v^{(k)} = A^k v = c_1 \lambda_1^k v_1 + \cdots + c_n \lambda_n^k v_n. \qquad (6.3) $$
Suppose further that A has a single dominant real eigenvalue, $\lambda_1$, that is larger than all others in magnitude, so
$$ |\lambda_1| > |\lambda_j| \qquad \text{for all} \quad j > 1. \qquad (6.4) $$

† This is not a very severe restriction. Theorem 4.1 implies that all symmetric matrices are complete. Moreover, perturbations caused by round-off and/or numerical inaccuracies will almost inevitably make an incomplete matrix complete.
As its name implies, this eigenvalue will eventually dominate the iteration (6.3). Indeed, since
$$ |\lambda_1|^k \gg |\lambda_j|^k \qquad \text{for all} \quad j > 1 \quad \text{and all} \quad k \gg 0, $$
the first term in the iterative formula (6.3) will eventually be much larger than the rest, and so, provided $c_1 \neq 0$,
$$ v^{(k)} \approx c_1 \lambda_1^k v_1 \qquad \text{for} \quad k \gg 0. $$
Therefore, the solution to the iterative system (6.1) will, almost always, end up being a multiple of the dominant eigenvector of the coefficient matrix.
To compute the corresponding eigenvalue, we note that the $i$th entry of the iterate $v^{(k)}$ is approximated by $v_i^{(k)} \approx c_1 \lambda_1^k v_{1,i}$, where $v_{1,i}$ is the $i$th entry of the eigenvector $v_1$. Thus, as long as $v_{1,i} \neq 0$, we can recover the dominant eigenvalue by taking a ratio between selected components of successive iterates:
$$ \lambda_1 \approx \frac{v_i^{(k)}}{v_i^{(k-1)}}, \qquad \text{provided that} \quad v_i^{(k-1)} \neq 0. \qquad (6.5) $$
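A minimal power method sketch in NumPy (illustrative, not part of the original notes; the function name and the choice of component i are arbitrary), using the component ratio (6.5) to estimate the dominant eigenvalue:

    import numpy as np

    def power_method(A, v, steps=20, i=0):
        # Iterates v^(k+1) = A v^(k) and returns the ratio (6.5) of the i-th components.
        for _ in range(steps):
            v_new = A @ v
            ratio = v_new[i] / v[i]            # approximation to the dominant eigenvalue
            v = v_new
        return ratio, v

    A = np.array([[-1., -2., 2.],
                  [1., -4., 2.],
                  [-3., -9., 7.]])              # the matrix of Example 6.1 (as reconstructed below)
    lam, v = power_method(A, np.array([1., 0., 0.]))
    print(lam)                                  # close to 3
    print(v / v[0])                             # close to the dominant eigenvector (1, 1, 3)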
Example 6.1. Consider the matrix $A = \begin{pmatrix} -1 & -2 & 2 \\ 1 & -4 & 2 \\ -3 & -9 & 7 \end{pmatrix}$. As you can check, its eigenvalues and eigenvectors are
$$ \lambda_1 = 3, \quad v_1 = \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix}, \qquad \lambda_2 = -2, \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \qquad \lambda_3 = 1, \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. $$
Repeatedly multiplying an initial vector $v = (1, 0, 0)^T$, say, by A results in the iterates $v^{(k)} = A^k v$ listed in the accompanying table. The last column indicates the ratio $\lambda^{(k)} = v_1^{(k)} / v_1^{(k-1)}$ between the first components of successive iterates. (One could equally well use the second or third components.) The ratios are converging to the dominant eigenvalue $\lambda_1 = 3$, while the vectors $v^{(k)}$ are converging to a very large multiple of the corresponding eigenvector $v_1 = (1, 1, 3)^T$.
Variants of the power method for finding other eigenvalues include the inverse power method, based on iterating the inverse matrix $A^{-1}$, with eigenvalues $\lambda_1^{-1}, \ldots, \lambda_n^{-1}$, and the shifted inverse power method, based on $(A - \mu I)^{-1}$, with eigenvalues $(\lambda_k - \mu)^{-1}$. The power method only produces the dominant (largest in magnitude) eigenvalue of a matrix A. The inverse power method can be used to find the smallest eigenvalue. Additional eigenvalues can be found by using the shifted inverse power method, or deflation. However, if we need to know all the eigenvalues, such piecemeal methods are too time-consuming to be of much practical value.
The QR Algorithm
The most popular scheme for simultaneously approximating all the eigenvalues of a matrix A is the remarkable QR algorithm, first proposed in 1961 independently by Francis
and Kublanovskaya. The underlying idea is simple, but surprising. The first step is to factor the matrix
$$ A = A_1 = Q_1 R_1 $$
into a product of an orthogonal matrix $Q_1$ and a positive (i.e., with all positive entries along the diagonal) upper triangular matrix $R_1$. Next, multiply the two factors together in the wrong order! The result is the new matrix
$$ A_2 = R_1 Q_1. $$
We then repeat these two steps. Thus, we next factor
$$ A_2 = Q_2 R_2 $$
using the Gram–Schmidt process, and then multiply the factors in the reverse order to produce
$$ A_3 = R_2 Q_2. $$
The complete algorithm can be written as
$$ A = Q_1 R_1, \qquad R_k Q_k = A_{k+1} = Q_{k+1} R_{k+1}, \qquad k = 1, 2, 3, \ldots, \qquad (6.6) $$
where $Q_k, R_k$ come from the previous step, and the subsequent orthogonal matrix $Q_{k+1}$ and positive upper triangular matrix $R_{k+1}$ are computed by using the numerically stable form of the Gram–Schmidt algorithm.
The astonishing fact is that, for many matrices A, the iterates $A_k \to V$ converge to an upper triangular matrix V whose diagonal entries are the eigenvalues of A. Thus, after a sufficient number of iterations, say $k_\star$, the matrix $A_{k_\star}$ will have very small entries below the diagonal, and one can read off a complete system of (approximate) eigenvalues along its diagonal. For each eigenvalue, the computation of the corresponding eigenvector can be done by solving the appropriate homogeneous linear system, or by applying the shifted inverse power method.
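The iteration (6.6) is only a few lines when one borrows a QR factorization routine; the following NumPy sketch (illustrative, not part of the original notes) uses numpy.linalg.qr and reports the diagonal of the final iterate:

    import numpy as np

    def qr_algorithm(A, steps=50):
        # Basic (unshifted) QR iteration (6.6): factor A_k = Q_k R_k, then set A_{k+1} = R_k Q_k.
        Ak = A.astype(float).copy()
        for _ in range(steps):
            Q, R = np.linalg.qr(Ak)
            Ak = R @ Q
        return Ak

    A = np.array([[2., 1.], [2., 3.]])          # the matrix of Example 6.2
    Ak = qr_algorithm(A)
    print(np.diag(Ak))                          # approximately [4., 1.], the eigenvalues of A

Note that numpy.linalg.qr does not insist on a positive diagonal for R, but each step is still an orthogonal similarity, so the diagonal of the iterates still approaches the eigenvalues.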
Example 6.2. Consider the matrix $A = \begin{pmatrix} 2 & 1 \\ 2 & 3 \end{pmatrix}$. The initial Gram–Schmidt factorization $A = Q_1 R_1$ yields
$$ Q_1 = \begin{pmatrix} .7071 & -.7071 \\ .7071 & .7071 \end{pmatrix}, \qquad R_1 = \begin{pmatrix} 2.8284 & 2.8284 \\ 0 & 1.4142 \end{pmatrix}. $$
These are multiplied in the reverse order to give
$$ A_2 = R_1 Q_1 = \begin{pmatrix} 4 & 0 \\ 1 & 1 \end{pmatrix}. $$
We refactor $A_2 = Q_2 R_2$ via Gram–Schmidt, and then reverse multiply to produce
$$ Q_2 = \begin{pmatrix} .9701 & -.2425 \\ .2425 & .9701 \end{pmatrix}, \qquad R_2 = \begin{pmatrix} 4.1231 & .2425 \\ 0 & .9701 \end{pmatrix}, \qquad A_3 = R_2 Q_2 = \begin{pmatrix} 4.0588 & -.7647 \\ .2353 & .9412 \end{pmatrix}. $$
The next iteration yields
$$ Q_3 = \begin{pmatrix} .9983 & -.0579 \\ .0579 & .9983 \end{pmatrix}, \qquad R_3 = \begin{pmatrix} 4.0656 & -.7090 \\ 0 & .9839 \end{pmatrix}, \qquad A_4 = R_3 Q_3 = \begin{pmatrix} 4.0178 & -.9431 \\ .0569 & .9822 \end{pmatrix}. $$
Continuing in this manner, after 9 iterations we find, to four decimal places,
$$ Q_{10} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad R_{10} = \begin{pmatrix} 4 & -1 \\ 0 & 1 \end{pmatrix}, \qquad A_{11} = R_{10} Q_{10} = \begin{pmatrix} 4 & -1 \\ 0 & 1 \end{pmatrix}. $$
The eigenvalues of A, namely 4 and 1, appear along the diagonal of $A_{11}$. Additional iterations produce very little further change, although they can be used for increasing the accuracy of the computed eigenvalues.
If the original matrix A is symmetric, positive definite, and with distinct eigenvalues, then, in most situations (see below for the precise requirement), the iterates converge, $A_k \to V = \Lambda$, to a diagonal matrix containing the eigenvalues of A in decreasing order. Moreover, if, in this case, we recursively define
$$ S_1 = Q_1, \qquad S_k = S_{k-1} Q_k = Q_1 Q_2 \cdots Q_{k-1} Q_k, \qquad k > 1, \qquad (6.7) $$
then the $S_k \to S$ have, as their limit, an orthogonal matrix whose columns are the orthonormal eigenvector basis of A.
Example 6.3. Consider the symmetric matrix $A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 3 & -1 \\ 0 & -1 & 6 \end{pmatrix}$. The initial $A = Q_1 R_1$ factorization produces
$$ S_1 = Q_1 = \begin{pmatrix} .8944 & -.4082 & -.1826 \\ .4472 & .8165 & .3651 \\ 0 & -.4082 & .9129 \end{pmatrix}, \qquad R_1 = \begin{pmatrix} 2.2361 & 2.2361 & -.4472 \\ 0 & 2.4495 & -3.2660 \\ 0 & 0 & 5.1121 \end{pmatrix}, $$
and so
$$ A_2 = R_1 Q_1 = \begin{pmatrix} 3.0000 & 1.0954 & 0 \\ 1.0954 & 3.3333 & -2.0870 \\ 0 & -2.0870 & 4.6667 \end{pmatrix}. $$
We refactor $A_2 = Q_2 R_2$ and reverse multiply to produce
$$ Q_2 = \begin{pmatrix} .9393 & -.2734 & -.2071 \\ .3430 & .7488 & .5672 \\ 0 & -.6038 & .7972 \end{pmatrix}, \qquad S_2 = S_1 Q_2 = \begin{pmatrix} .7001 & -.4400 & -.5623 \\ .7001 & .2686 & .6615 \\ -.1400 & -.8569 & .4962 \end{pmatrix}, $$
$$ R_2 = \begin{pmatrix} 3.1937 & 2.1723 & -.7158 \\ 0 & 3.4565 & -4.3804 \\ 0 & 0 & 2.5364 \end{pmatrix}, \qquad A_3 = R_2 Q_2 = \begin{pmatrix} 3.7451 & 1.1856 & 0 \\ 1.1856 & 5.2330 & -1.5314 \\ 0 & -1.5314 & 2.0219 \end{pmatrix}. $$
Continuing in this manner, after 10 iterations we find
$$ Q_{11} = \begin{pmatrix} 1.0000 & .0067 & 0 \\ -.0067 & 1.0000 & -.0001 \\ 0 & .0001 & 1.0000 \end{pmatrix}, \qquad S_{11} = \begin{pmatrix} .0753 & .5667 & -.8205 \\ .3128 & .7679 & .5591 \\ -.9468 & .2987 & .1194 \end{pmatrix}, $$
$$ R_{11} = \begin{pmatrix} 6.3229 & -.0647 & 0 \\ 0 & 3.3582 & .0006 \\ 0 & 0 & 1.3187 \end{pmatrix}, \qquad A_{12} = \begin{pmatrix} 6.3232 & -.0224 & 0 \\ -.0224 & 3.3581 & .0002 \\ 0 & .0002 & 1.3187 \end{pmatrix}. $$
After 20 iterations, the process has completely settled down, and
$$ Q_{21} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad S_{21} = \begin{pmatrix} .0710 & .5672 & -.8205 \\ .3069 & .7702 & .5590 \\ -.9491 & .2915 & .1194 \end{pmatrix}, $$
$$ R_{21} = \begin{pmatrix} 6.3234 & -.0001 & 0 \\ 0 & 3.3579 & 0 \\ 0 & 0 & 1.3187 \end{pmatrix}, \qquad A_{22} = \begin{pmatrix} 6.3234 & 0 & 0 \\ 0 & 3.3579 & 0 \\ 0 & 0 & 1.3187 \end{pmatrix}. $$
The eigenvalues of A appear along the diagonal of $A_{22}$, while the columns of $S_{21}$ are the corresponding orthonormal eigenvector basis, listed in the same order as the eigenvalues, both correct to 4 decimal places.
We will devote the remainder of this section to a justification of the QR algorithm for a general class of symmetric matrices. We begin by assuming that A is an $n \times n$ symmetric, positive definite matrix, with distinct eigenvalues
$$ \lambda_1 > \lambda_2 > \cdots > \lambda_n > 0. \qquad (6.8) $$
According to the Spectral Theorem 4.3, the corresponding unit eigenvectors $u_1, \ldots, u_n$ form an orthonormal basis of $\mathbb{R}^n$. After treating this case, the analysis will be extended to a broader class of symmetric matrices, having both positive and negative eigenvalues.
The secret is that the QR algorithm is, in fact, a well-disguised adaptation of the more primitive power method. If we were to use the power method to capture all the eigenvectors and eigenvalues of A, the first thought might be to try to perform it simultaneously on a complete basis $v_1^{(0)}, \ldots, v_n^{(0)}$ of $\mathbb{R}^n$ instead of just one individual vector. The problem is that, for almost all vectors, the power iterates $v_j^{(k)} = A^k v_j^{(0)}$ all tend to a multiple of the dominant eigenvector $u_1$. Normalizing the vectors at each step is not any better, since then they merely converge to one of the two dominant unit eigenvectors $\pm u_1$. However, if, inspired by the form of the eigenvector basis, we orthonormalize the vectors at each step, then we effectively prevent them from all accumulating at the same dominant unit eigenvector, and so, with some luck, the resulting vectors will converge to the full system of eigenvectors. Since orthonormalizing a basis via the Gram–Schmidt process is equivalent to a QR matrix factorization, the mechanics of the algorithm is not so surprising.
In detail, we start with any orthonormal basis, which, for simplicity, we take to be the standard basis vectors of $\mathbb{R}^n$, and so $u_1^{(0)} = e_1, \ldots, u_n^{(0)} = e_n$. At the $k$th stage of the algorithm, we set $u_1^{(k)}, \ldots, u_n^{(k)}$ to be the orthonormal vectors that result from
applying the Gram–Schmidt algorithm to the power vectors $v_j^{(k)} = A^k e_j$. In matrix language, the vectors $v_1^{(k)}, \ldots, v_n^{(k)}$ are merely the columns of $A^k$, and the orthonormal basis $u_1^{(k)}, \ldots, u_n^{(k)}$ are the columns of the orthogonal matrix $S_k$ in the QR decomposition of the $k$th power of A, which we denote by
$$ A^k = S_k P_k, \qquad (6.9) $$
where $P_k$ is positive upper triangular. Note that, in view of (6.6),
$$ A = Q_1 R_1, \qquad A^2 = Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 R_1, \qquad A^3 = Q_1 R_1 Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 Q_2 R_2 R_1 = Q_1 Q_2 Q_3 R_3 R_2 R_1, $$
and, in general,
$$ A^k = \bigl( Q_1 Q_2 \cdots Q_{k-1} Q_k \bigr) \bigl( R_k R_{k-1} \cdots R_2 R_1 \bigr). \qquad (6.10) $$
The product of orthogonal matrices is also orthogonal. The product of positive upper triangular matrices is also positive upper triangular. Therefore, comparing (6.9, 10) and invoking the uniqueness of the QR factorization, we conclude that
$$ S_k = Q_1 Q_2 \cdots Q_{k-1} Q_k = S_{k-1} Q_k, \qquad P_k = R_k R_{k-1} \cdots R_2 R_1 = R_k P_{k-1}. \qquad (6.11) $$
Let $S = (u_1\ u_2\ \ldots\ u_n)$ denote an orthogonal matrix whose columns are unit eigenvectors of A. The Spectral Theorem 4.3 tells us that
$$ A = S \Lambda S^T, \qquad \text{where} \quad \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) $$
is the diagonal eigenvalue matrix. Substituting the spectral factorization into (6.9), we find
$$ A^k = S \Lambda^k S^T = S_k P_k. $$
We now make one additional assumption on the matrix A by requiring that $S^T$ be a regular matrix, meaning that we can factor $S^T = L U$ into a product of special lower and upper triangular matrices, or, equivalently, that Gaussian Elimination can be performed on the linear system $S^T x = b$ without any row interchanges. Regularity holds generically, and is the analog of the condition that our initial vector in the power method includes a nonzero component of the dominant eigenvector. We can assume that, without loss of generality, the diagonal entries of the upper triangular matrix U, that is, the pivots of $S^T$, are all positive. Indeed, this can be arranged by multiplying each row of $S^T$ by the sign of its pivot; this amounts to possibly changing the signs of some of the unit eigenvectors $u_i$, which is allowed since it does not affect their status as an orthonormal eigenvector basis.
Under these two assumptions,
$$ A^k = S \Lambda^k L U = S_k P_k, \qquad \text{and hence} \qquad S \Lambda^k L = S_k P_k U^{-1}. $$
Multiplying on the right by $\Lambda^{-k}$, we obtain
$$ S \Lambda^k L \Lambda^{-k} = S_k T_k, \qquad \text{where} \qquad T_k = P_k U^{-1} \Lambda^{-k} \qquad (6.12) $$
is also a positive upper triangular matrix, since we are assuming that the eigenvalues, i.e., the diagonal entries of $\Lambda$, are all positive.
Now consider what happens as $k \to \infty$. The entries of the lower triangular matrix $N = \Lambda^k L \Lambda^{-k}$ are
$$ n_{ij} = \begin{cases} l_{ij}\, (\lambda_i / \lambda_j)^k, & i > j, \\ l_{ii} = 1, & i = j, \\ 0, & i < j. \end{cases} $$
Since we are assuming $\lambda_i < \lambda_j$ when $i > j$, we immediately deduce that
$$ \Lambda^k L \Lambda^{-k} \longrightarrow I, $$
and hence
$$ S_k T_k = S \Lambda^k L \Lambda^{-k} \longrightarrow S \qquad \text{as} \quad k \to \infty. \qquad (6.13) $$
We now appeal to the following lemma, whose proof will be given after we finish the justification of the QR algorithm.
Lemma 6.4. Let $S_1, S_2, \ldots$ and S be orthogonal matrices and $T_1, T_2, \ldots$ positive upper triangular matrices. Then $S_k T_k \to S$ as $k \to \infty$ if and only if $S_k \to S$ and $T_k \to I$.
Therefore, as claimed, the orthogonal matrices $S_k$ do converge to the orthogonal eigenvector matrix S. By (6.11),
$$ Q_k = S_{k-1}^{-1} S_k \longrightarrow I \qquad \text{as} \quad k \to \infty. \qquad (6.14) $$
Moreover, by (6.11–12),
$$ R_k = P_k P_{k-1}^{-1} = \bigl( T_k \Lambda^k U \bigr) \bigl( T_{k-1} \Lambda^{k-1} U \bigr)^{-1} = T_k \Lambda T_{k-1}^{-1}. $$
Since both $T_k$ and $T_{k-1}$ converge to the identity matrix, in the limit
$$ R_k \longrightarrow \Lambda \qquad \text{as} \quad k \to \infty. \qquad (6.15) $$
Combining (6.14, 15), we deduce that the matrices appearing in the QR algorithm converge to the diagonal eigenvalue matrix, as claimed:
$$ A_k = Q_k R_k \longrightarrow \Lambda \qquad \text{as} \quad k \to \infty. \qquad (6.16) $$
Note that the eigenvalues appear in decreasing order along the diagonal, which is a consequence of our regularity assumption on the transposed eigenvector matrix $S^T$.
The last remaining item is a proof of Lemma 6.4. We write $S = (u_1\ u_2\ \ldots\ u_n)$, $S_k = (u_1^{(k)}, \ldots, u_n^{(k)})$ in columnar form. Let $t_{ij}^{(k)}$ denote the entries of the positive upper triangular matrix $T_k$. The first column of the limiting equation $S_k T_k \to S$ reads
$$ t_{11}^{(k)}\, u_1^{(k)} \longrightarrow u_1. $$
Since both $u_1^{(k)}$ and $u_1$ are unit vectors, and $t_{11}^{(k)} > 0$,
$$ \| t_{11}^{(k)}\, u_1^{(k)} \| = t_{11}^{(k)} \longrightarrow \| u_1 \| = 1, \qquad \text{and hence} \qquad u_1^{(k)} \longrightarrow u_1. $$
The second column reads
$$ t_{12}^{(k)}\, u_1^{(k)} + t_{22}^{(k)}\, u_2^{(k)} \longrightarrow u_2. $$
Taking the inner product with $u_1^{(k)} \to u_1$ and using orthonormality, we deduce $t_{12}^{(k)} \to 0$, and so $t_{22}^{(k)}\, u_2^{(k)} \to u_2$, which, by the previous reasoning, implies $t_{22}^{(k)} \to 1$ and $u_2^{(k)} \to u_2$. The proof is completed by working in order through the remaining columns, employing a similar argument at each step.
To modify the preceding proof for symmetric, but non-positive definite matrices, we introduce the diagonal orthogonal matrix
$$ \Delta = \Delta^{-1} = \operatorname{diag}(\operatorname{sign}\lambda_1, \ldots, \operatorname{sign}\lambda_n), \qquad \text{so that} \qquad \Delta \Lambda = \Lambda \Delta = \operatorname{diag}(|\lambda_1|, \ldots, |\lambda_n|), $$
whose diagonal entries are the signs of the eigenvalues. (So in the positive definite case $\Delta = I$.) Set
$$ \widehat{S}_k = S_k \Delta^k, \qquad \widehat{T}_k = \Delta^k T_k. $$
Thus, (6.13) can be rewritten as
$$ \widehat{S}_k \widehat{T}_k = S_k T_k = S \Lambda^k L \Lambda^{-k} \longrightarrow S \qquad \text{as} \quad k \to \infty. $$
Recalling (6.12), the diagonal entries of the upper triangular matrix $\widehat{T}_k = \Delta^k T_k$ are seen to be strictly positive, and hence Lemma 6.4 implies
$$ \widehat{S}_k \longrightarrow S, \qquad \widehat{T}_k \longrightarrow I, \qquad \text{as} \quad k \to \infty. $$
Therefore,
$$ Q_k = S_{k-1}^{-1} S_k = \Delta^{k-1} \widehat{S}_{k-1}^{-1} \widehat{S}_k \Delta^k \longrightarrow \Delta, \qquad R_k = T_k \Lambda T_{k-1}^{-1} = \Delta^k \widehat{T}_k \Lambda \widehat{T}_{k-1}^{-1} \Delta^{k-1} \longrightarrow \Lambda \Delta, \qquad \text{as} \quad k \to \infty. \qquad (6.17) $$
The convergence of the QR matrices immediately follows:
$$ A_k = Q_k R_k \longrightarrow \Delta^2 \Lambda = \Lambda \qquad \text{as} \quad k \to \infty. $$
Keep in mind that, in non-positive definite cases, the orthogonal matrices $S_k$ no longer converge, since the nonzero entries in the columns corresponding to negative eigenvalues switch their signs back and forth as k increases; however, by eliminating the sign changes (which, in practice, can be done by inspection), their modifications $\widehat{S}_k = S_k \Delta^k \to S$ will converge to the eigenvector matrix.
This completes the proof of the convergence of the QR algorithm for a broad class of symmetric matrices.
Theorem 6.5. If A is symmetric, satisfies
$$ |\lambda_1| > |\lambda_2| > \cdots > |\lambda_n| > 0, \qquad (6.18) $$
and its transposed eigenvector matrix $S^T$ is regular, then the matrices appearing in the QR algorithm applied to A converge to the diagonal eigenvalue matrix: $A_k \to \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ as $k \to \infty$.
Figure 2. Elementary Reflection Matrix.
Tridiagonalization
In practical implementations, the direct QR algorithm often takes too long to provide reasonable approximations to the eigenvalues of large matrices. Fortunately, the algorithm can be made much more efficient by a simple preprocessing step. The key observation is that the QR algorithm preserves the class of symmetric tridiagonal matrices, and, moreover, like Gaussian Elimination, is much faster when applied to this class of matrices. It turns out that, although diagonalizing a matrix is effectively the same as finding its eigenvectors, one can tridiagonalize the matrix by a sequence of fairly elementary matrix operations, based on the following class of matrices.
Consider the Householder or elementary reflection matrix
$$ H = I - 2\, u\, u^T \qquad (6.19) $$
in which u is a unit vector (in the Euclidean norm). The matrix H represents a reflection of vectors through the subspace $u^{\perp} = \{\, v \mid v \cdot u = 0 \,\}$ of vectors orthogonal to u, as illustrated in Figure 2. The matrix H is symmetric and orthogonal:
$$ H^T = H, \qquad H^2 = I, \qquad H^{-1} = H. \qquad (6.20) $$
The proof is straightforward: symmetry is immediate, while
$$ H H^T = H^2 = (I - 2\, u\, u^T)(I - 2\, u\, u^T) = I - 4\, u\, u^T + 4\, u\, (u^T u)\, u^T = I, $$
since, by assumption, $u^T u = \| u \|^2 = 1$. By suitably forming the unit vector u, we can construct an elementary reflection matrix that interchanges any two vectors of the same length.
Lemma 6.6. Let $v, w \in \mathbb{R}^n$ with $\| v \| = \| w \|$. Set $u = (v - w)/\| v - w \|$ and let $H = I - 2\, u\, u^T$ be the corresponding elementary reflection matrix. Then $H v = w$ and $H w = v$.
Proof: Keeping in mind that v and w have the same Euclidean norm, we compute
$$ H v = (I - 2\, u\, u^T)\, v = v - 2\, \frac{(v - w)(v - w)^T v}{\| v - w \|^2} = v - 2\, \frac{(v - w)\bigl( \| v \|^2 - w \cdot v \bigr)}{2\, \| v \|^2 - 2\, v \cdot w} = v - (v - w) = w. $$
The proof of the second equation is similar. Q.E.D.
In Householder's approach to the QR factorization, we were able to convert the matrix A to upper triangular form R by a sequence of elementary reflection matrices. Unfortunately, this procedure does not preserve the eigenvalues of the matrix (the diagonal entries of R are not the eigenvalues) and so we need to be a bit more clever here. We begin by recalling that similar matrices have the same eigenvalues.
Lemma 6.7. If $H = I - 2\, u\, u^T$ is an elementary reflection matrix, with u a unit vector, then A and $B = H A H$ are similar matrices and hence have the same eigenvalues.
Proof: According to (6.20), $H^{-1} = H$, and hence $B = H^{-1} A H$ is similar to A. If $\lambda$ is any eigenvalue of A, so $A v = \lambda v$ for $v \neq 0$, then $B w = \lambda w$ where $w = H^{-1} v$, which implies that $\lambda$ remains an eigenvalue of B, and conversely. Q.E.D.
Given a symmetric $n \times n$ matrix A, our goal is to devise a similar tridiagonal matrix by applying a sequence of Householder reflections. We begin by setting
$$ x_1 = \begin{pmatrix} 0 \\ a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix}, \qquad y_1 = \begin{pmatrix} 0 \\ \pm r_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \text{where} \quad r_1 = \| x_1 \| = \| y_1 \|, $$
so that $x_1$ contains all the off-diagonal entries of the first column of A. Let
$$ H_1 = I - 2\, u_1 u_1^T, \qquad \text{where} \quad u_1 = \frac{x_1 - y_1}{\| x_1 - y_1 \|}, $$
be the corresponding elementary reflection matrix that maps $x_1$ to $y_1$. Either $\pm$ sign in the formula for $y_1$ works in the algorithm; a good choice is to set it to be the opposite of the sign of the entry $a_{21}$, which helps minimize the possible effects of round-off error when computing the unit vector $u_1$. By direct computation, based on Lemma 6.6 and the fact that the first entry of $u_1$ is zero,
$$ A_2 = H_1 A H_1 = \begin{pmatrix} a_{11} & r_1 & 0 & \ldots & 0 \\ r_1 & \widetilde{a}_{22} & \widetilde{a}_{23} & \ldots & \widetilde{a}_{2n} \\ 0 & \widetilde{a}_{32} & \widetilde{a}_{33} & \ldots & \widetilde{a}_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \widetilde{a}_{n2} & \widetilde{a}_{n3} & \ldots & \widetilde{a}_{nn} \end{pmatrix} \qquad (6.21) $$
for certain $\widetilde{a}_{ij}$; the explicit formulae are not needed. Thus, by a single Householder transformation, we convert A into a similar matrix $A_2$ whose first row and column are in tridiagonal form. We repeat the process on the lower right $(n-1) \times (n-1)$ submatrix of $A_2$. We set
$$ x_2 = \begin{pmatrix} 0 \\ 0 \\ \widetilde{a}_{32} \\ \widetilde{a}_{42} \\ \vdots \\ \widetilde{a}_{n2} \end{pmatrix}, \qquad y_2 = \begin{pmatrix} 0 \\ 0 \\ \pm r_2 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \qquad \text{where} \quad r_2 = \| x_2 \| = \| y_2 \|, $$
and the $\pm$ sign is chosen to be the opposite of that of $\widetilde{a}_{32}$. Setting
$$ H_2 = I - 2\, u_2 u_2^T, \qquad \text{where} \quad u_2 = \frac{x_2 - y_2}{\| x_2 - y_2 \|}, $$
we construct the similar matrix
$$ A_3 = H_2 A_2 H_2 = \begin{pmatrix} a_{11} & r_1 & 0 & 0 & \ldots & 0 \\ r_1 & \widetilde{a}_{22} & r_2 & 0 & \ldots & 0 \\ 0 & r_2 & \widehat{a}_{33} & \widehat{a}_{34} & \ldots & \widehat{a}_{3n} \\ 0 & 0 & \widehat{a}_{43} & \widehat{a}_{44} & \ldots & \widehat{a}_{4n} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \widehat{a}_{n3} & \widehat{a}_{n4} & \ldots & \widehat{a}_{nn} \end{pmatrix}, $$
whose first two rows and columns are now in tridiagonal form. The remaining steps in the algorithm should now be clear. Thus, the final result is a tridiagonal matrix $T = A_{n-1}$ that has the same eigenvalues as the original symmetric matrix A. Let us illustrate the method by an example.
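Before the worked example, here is a sketch of the tridiagonalization procedure in NumPy (an illustration of the steps just described, not part of the original notes; the function name is an arbitrary choice):

    import numpy as np

    def tridiagonalize(A):
        # Reduce a symmetric matrix to a similar tridiagonal matrix by Householder reflections.
        T = A.astype(float).copy()
        n = T.shape[0]
        for j in range(n - 2):
            x = np.zeros(n)
            x[j + 1:] = T[j + 1:, j]                 # off-diagonal part of column j
            s = -1.0 if x[j + 1] >= 0 else 1.0       # either sign works; pick the opposite of the leading entry
            y = np.zeros(n)
            y[j + 1] = s * np.linalg.norm(x)
            u = x - y
            norm_u = np.linalg.norm(u)
            if norm_u == 0:
                continue                             # column already in the desired form
            u /= norm_u
            H = np.eye(n) - 2.0 * np.outer(u, u)     # Householder matrix (6.19)
            T = H @ T @ H                            # similar matrix, by Lemma 6.7
        return T

    A = np.array([[4., 1., -1., 2.],
                  [1., 4., 1., -1.],
                  [-1., 1., 4., 1.],
                  [2., -1., 1., 4.]])                # the matrix of Example 6.8
    print(np.round(tridiagonalize(A), 4))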
Example 6.8. To tridiagonalize $A = \begin{pmatrix} 4 & 1 & -1 & 2 \\ 1 & 4 & 1 & -1 \\ -1 & 1 & 4 & 1 \\ 2 & -1 & 1 & 4 \end{pmatrix}$, we begin with its first column. We set
$$ x_1 = \begin{pmatrix} 0 \\ 1 \\ -1 \\ 2 \end{pmatrix}, \qquad \text{so that} \qquad y_1 = \begin{pmatrix} 0 \\ -\sqrt{6} \\ 0 \\ 0 \end{pmatrix} \approx \begin{pmatrix} 0 \\ -2.4495 \\ 0 \\ 0 \end{pmatrix}. $$
Therefore, the unit vector is $u_1 = \dfrac{x_1 - y_1}{\| x_1 - y_1 \|} = \begin{pmatrix} 0 \\ .8391 \\ -.2433 \\ .4865 \end{pmatrix}$, with corresponding Householder matrix
$$ H_1 = I - 2\, u_1 u_1^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -.4082 & .4082 & -.8165 \\ 0 & .4082 & .8816 & .2367 \\ 0 & -.8165 & .2367 & .5266 \end{pmatrix}. $$
Thus,
$$ A_2 = H_1 A H_1 = \begin{pmatrix} 4.0000 & -2.4495 & 0 & 0 \\ -2.4495 & 2.3333 & -.3865 & -.8599 \\ 0 & -.3865 & 4.9440 & -.1246 \\ 0 & -.8599 & -.1246 & 4.7227 \end{pmatrix}. $$
In the next phase, $x_2 = \begin{pmatrix} 0 \\ 0 \\ -.3865 \\ -.8599 \end{pmatrix}$, $y_2 = \begin{pmatrix} 0 \\ 0 \\ .9428 \\ 0 \end{pmatrix}$, so $u_2 = \begin{pmatrix} 0 \\ 0 \\ -.8396 \\ -.5431 \end{pmatrix}$, and
$$ H_2 = I - 2\, u_2 u_2^T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -.4100 & -.9121 \\ 0 & 0 & -.9121 & .4100 \end{pmatrix}. $$
The resulting matrix
$$ T = A_3 = H_2 A_2 H_2 = \begin{pmatrix} 4.0000 & -2.4495 & 0 & 0 \\ -2.4495 & 2.3333 & .9428 & 0 \\ 0 & .9428 & 4.6667 & 0 \\ 0 & 0 & 0 & 5 \end{pmatrix} $$
is now in tridiagonal form.
Since the final tridiagonal matrix T has the same eigenvalues as A, we can apply the QR algorithm to T to approximate the common eigenvalues. (The eigenvectors must then be computed separately, e.g., by the shifted inverse power method.) It is not hard to show that, if $A = A_1$ is tridiagonal, so are all the iterates $A_2, A_3, \ldots$. Moreover, far fewer arithmetic operations are required. For instance, in the preceding example, after we apply 20 iterations of the QR algorithm directly to T, the upper triangular factor has become
$$ R_{21} = \begin{pmatrix} 6.0000 & .0065 & 0 & 0 \\ 0 & 4.5616 & 0 & 0 \\ 0 & 0 & 5.0000 & 0 \\ 0 & 0 & 0 & .4384 \end{pmatrix}. $$
The eigenvalues of T, and hence also of A, appear along the diagonal, and are correct to 4 decimal places.
Finally, even if A is not symmetric, one can still apply the same sequence of Householder transformations to simplify it. The final result is no longer tridiagonal, but rather a similar upper Hessenberg matrix, which means that all entries below the subdiagonal are zero, but those above the superdiagonal are not necessarily zero. For instance, a $5 \times 5$ upper Hessenberg matrix looks like
$$ \begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \\ 0 & * & * & * & * \\ 0 & 0 & * & * & * \\ 0 & 0 & 0 & * & * \end{pmatrix}, $$
where the starred entries can be anything. It can be proved that the QR algorithm maintains the upper Hessenberg form, and, while not as efficient as in the tridiagonal case, still yields a significant savings in computational effort required to find the common eigenvalues.
Acknowledgments: I would like to thank Raphaele Herbin for comments on an earlier
version of these notes.