
Chapter 13

Orthogonal Transformations
In Chapter 6 we saw how transformations by elementary lower triangular matrices can be used to reduce a matrix to triangular form. Lower triangular matrices are not the only kind of transformations which can be used for such a task. In this chapter we study how transformations by orthogonal matrices can be used to reduce a rectangular matrix to upper triangular (also called upper trapezoidal) form. This leads to a decomposition of the matrix known as a QR-decomposition and a compact form which we refer to as a QR-factorization. Orthogonal transformations have the advantage that they preserve the Euclidean norm of a vector, and the spectral norm and Frobenius norm of a matrix. Indeed, if $Q \in \mathbb{R}^{m,m}$ is an orthogonal matrix then $\|Qv\|_2 = \|v\|_2$, $\|QA\|_2 = \|A\|_2$, and $\|QA\|_F = \|A\|_F$ for any vector $v \in \mathbb{R}^m$ and any matrix $A \in \mathbb{R}^{m,n}$ (cf. Lemma 12.5 and Theorem 12.10). This means that when an orthogonal transformation is applied to an inaccurate vector or matrix the error will not grow. Thus, in general, an orthogonal transformation is numerically stable.
13.1 The QR decomposition.
Definition 13.1. Let $A \in \mathbb{R}^{m,n}$ with $m \ge n \ge 1$. We say that $A = QR$ is a QR-decomposition of $A$ if $Q \in \mathbb{R}^{m,m}$ is square and orthogonal and
$$R = \begin{bmatrix} R_1 \\ 0_{m-n,n} \end{bmatrix},$$
where $R_1 \in \mathbb{R}^{n,n}$ is upper triangular and $0_{m-n,n} \in \mathbb{R}^{m-n,n}$ is the zero matrix. We call $A = QR$ a QR-factorization of $A$ if $Q \in \mathbb{R}^{m,n}$ has orthonormal columns and $R \in \mathbb{R}^{n,n}$ is upper triangular.
A QR-factorization is obtained from a QR-decomposition $A = QR$ by simply using the first $n$ columns of $Q$. Indeed, if we partition $Q$ as $[Q_1, Q_2]$ and $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $Q_1 \in \mathbb{R}^{m,n}$ and $R_1 \in \mathbb{R}^{n,n}$, then $A = Q_1 R_1$ is a QR-factorization of $A$.
On the other hand a QR-factorization $A = Q_1 R_1$ of $A$ can be turned into a QR-decomposition by extending the set of columns $\{q_1, \ldots, q_n\}$ of $Q_1$ into an orthonormal basis $\{q_1, \ldots, q_n, q_{n+1}, \ldots, q_m\}$ for $\mathbb{R}^m$ and adding $m-n$ rows of zeros to $R_1$. We then obtain the QR-decomposition $A = QR$, where $Q = [q_1, \ldots, q_m]$ and $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$.
Example 13.2. An example of a QR-decomposition is
$$A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & -1 & -1 \\ 1 & -1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{bmatrix} = QR,$$
while a QR-factorization $A = Q_1 R_1$ is obtained by dropping the last column of $Q$ and the last row of $R$ so that
$$A = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} = Q_1 R_1.$$
Consider now existence and uniqueness.
Theorem 13.3. Suppose $A \in \mathbb{R}^{m,n}$ with $m \ge n \ge 1$ has linearly independent columns. Then $A$ has a QR-decomposition and a QR-factorization. The QR-factorization is unique if $R_1$ has positive diagonal entries.
Proof. For existence it is enough to show that $A$ has a QR-factorization. By Example 7.13 the matrix $A^T A$ is positive definite, and by Lemma 7.16 it has a Cholesky factorization $A^T A = R_1^T R_1$, where $R_1 \in \mathbb{R}^{n,n}$ is upper triangular and nonsingular. The matrix $Q_1 := A R_1^{-1}$ has orthonormal columns since
$$Q_1^T Q_1 = R_1^{-T} A^T A R_1^{-1} = R_1^{-T} R_1^T R_1 R_1^{-1} = I.$$
But then $A = Q_1 R_1$ is a QR-factorization of $A$. This shows existence. For uniqueness, if $A = Q_1 R_1$ is a QR-factorization of $A$ and $R_1$ has positive diagonal entries then $A^T A = R_1^T Q_1^T Q_1 R_1 = R_1^T R_1$ is the Cholesky factorization of $A^T A$. Since the Cholesky factorization is unique it follows that $R_1$ is unique and hence $Q_1 = A R_1^{-1}$ is unique.
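The constructive proof can be checked directly in MATLAB. The following is a minimal sketch, for illustration only: forming $A^T A$ squares the condition number (see Section 13.3.1), so this is not a recommended way to compute the factorization numerically. It uses the built-in chol, which returns the upper triangular Cholesky factor.

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2];   % the matrix of Example 13.2
R1 = chol(A'*A);        % upper triangular, A'*A = R1'*R1
Q1 = A/R1;              % Q1 = A*inv(R1) by right division
norm(Q1'*Q1 - eye(3))   % ~0: Q1 has orthonormal columns
norm(Q1*R1 - A)         % ~0: A = Q1*R1 is a QR-factorization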
We show in Theorem 13.11 existence without the assumption of linearly independent columns.
Exercise 13.1. Let
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}, \quad Q = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & -1 & -1 & -1 \\ 1 & -1 & 1 & 1 \end{bmatrix}, \quad R = \begin{bmatrix} 2 & 2 \\ 0 & 2 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$
Show that Q is orthogonal and that QR is a QR-decomposition of A. Find a QR-
factorization of A.
13.1.1 QR and Gram-Schmidt
The Gram-Schmidt orthogonalization of the columns of $A$ can be used to find the QR-factorization of $A$. If $A \in \mathbb{R}^{m,n}$ has rank $n$, then the set of columns $\{a_1, \ldots, a_n\}$ forms a basis for $\mathrm{span}(A)$ and the Gram-Schmidt orthogonalization process (2.23) takes the form
$$v_1 = a_1, \qquad v_j = a_j - \sum_{i=1}^{j-1} \frac{a_j^T v_i}{v_i^T v_i} v_i, \quad \text{for } j = 2, \ldots, n. \tag{13.1}$$
By Theorem 2.36, $\{v_1, \ldots, v_n\}$ is an orthogonal basis for $\mathrm{span}(A)$. If we write (13.1) in the form
$$a_1 = v_1, \qquad a_j = \sum_{i=1}^{j-1} \hat r_{ij} v_i + v_j, \quad \text{where } \hat r_{ij} = \frac{a_j^T v_i}{v_i^T v_i} \text{ for } j = 2, \ldots, n,$$
then $A = V \widehat{R}$, where $V := [v_1, \ldots, v_n] \in \mathbb{R}^{m,n}$ and
$$\widehat{R} := \begin{bmatrix} 1 & \hat r_{12} & \hat r_{13} & \hat r_{14} & \cdots & \hat r_{1n} \\ 0 & 1 & \hat r_{23} & \hat r_{24} & \cdots & \hat r_{2n} \\ 0 & 0 & 1 & \hat r_{34} & \cdots & \hat r_{3n} \\ 0 & 0 & 0 & 1 & \cdots & \hat r_{4n} \\ & & & & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$
is upper unit triangular. Since $\{v_1, \ldots, v_n\}$ is a basis for $\mathrm{span}(A)$ the matrix $D := \mathrm{diag}(\|v_1\|_2, \ldots, \|v_n\|_2)$ is nonsingular, and the matrix $Q_1 := V D^{-1} = \left[\frac{v_1}{\|v_1\|_2}, \ldots, \frac{v_n}{\|v_n\|_2}\right]$ has orthonormal columns. Therefore $A = Q_1 R_1$, with $R_1 := D\widehat{R}$, is a QR-factorization of $A$ with positive diagonal entries in $R_1$.
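The construction above can be turned into a short MATLAB function; the following is a sketch of classical Gram-Schmidt exactly as written in (13.1), assuming $A$ has linearly independent columns. As explained in Section 13.2 it should not be used for serious numerical work.

function [Q1,R1] = gramschmidtqr(A)
% Sketch: QR-factorization via classical Gram-Schmidt (13.1).
% Assumes A has linearly independent columns; not numerically robust.
[m,n] = size(A);
V = zeros(m,n); R1 = eye(n);
for j = 1:n
    v = A(:,j);
    for i = 1:j-1
        R1(i,j) = (A(:,j)'*V(:,i))/(V(:,i)'*V(:,i));  % rhat_ij
        v = v - R1(i,j)*V(:,i);
    end
    V(:,j) = v;
end
D = diag(sqrt(sum(V.^2,1)));  % D = diag(||v_1||_2,...,||v_n||_2)
Q1 = V/D;                     % normalize the columns of V
R1 = D*R1;                    % R1 = D*Rhat has positive diagonal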
Exercise 13.2. Construct $Q_1$ and $R_1$ in Example 13.2 using Gram-Schmidt orthogonalization.
13.1.2 QR and Least Squares
Suppose $A \in \mathbb{R}^{m,n}$ has rank $n$ and let $b \in \mathbb{R}^m$. The QR-factorization can be used to solve the least squares problem $\min_{x \in \mathbb{R}^n} \|Ax - b\|_2$. Suppose $A = QR$ is a QR-decomposition of $A$. We partition $Q$ and $R$ as $Q = [Q_1\ Q_2]$ and $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $Q_1 \in \mathbb{R}^{m,n}$ and $R_1 \in \mathbb{R}^{n,n}$. Then
$$\|Ax - b\|_2^2 = \|QRx - b\|_2^2 = \|Rx - Q^T b\|_2^2 = \left\|\begin{bmatrix} R_1 x - Q_1^T b \\ -Q_2^T b \end{bmatrix}\right\|_2^2 = \|R_1 x - Q_1^T b\|_2^2 + \|Q_2^T b\|_2^2.$$
156 Chapter 13. Orthogonal Transformations
Thus $\|Ax - b\|_2 \ge \|Q_2^T b\|_2$ for all $x \in \mathbb{R}^n$, with equality if $R_1 x = Q_1^T b$. In summary we have the following method to solve the least squares problem.

1. Find a QR-factorization $A = Q_1 R_1$ of $A$.

2. Solve $R_1 x = Q_1^T b$ for the least squares solution $x$.
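In MATLAB the two steps can be carried out, for instance, with the built-in economy-size QR factorization; a minimal sketch, assuming $A \in \mathbb{R}^{m,n}$ has rank $n$:

[Q1,R1] = qr(A,0);   % economy-size: Q1 is m-by-n, R1 is n-by-n
x = R1\(Q1'*b);      % solve the triangular system R1*x = Q1'*b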
Example 13.4. Consider the least squares problem with
$$A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.$$
This is the matrix in Example 13.2. The least squares solution $x$ is found by solving the system
$$\begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix},$$
and we find $x = [1, 0, 0]^T$.
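The result is easy to verify numerically; MATLAB's backslash operator also solves rectangular least squares problems, so the following check should reproduce the solution:

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2];
b = [1; 1; 1; 1];
x = A\b     % least squares solution, returns [1; 0; 0]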
13.2 The Householder Transformation
The Gram-Schmidt orthogonalization process should not be used to compute the QR-factorization numerically. The columns of $Q_1$ computed in floating point arithmetic using Gram-Schmidt orthogonalization will often be far from orthogonal. Instead, Householder transformations should be used.
Definition 13.5. A matrix $H \in \mathbb{R}^{n,n}$ of the form
$$H := I - uu^T, \quad \text{where } u \in \mathbb{R}^n \text{ and } u^T u = 2,$$
is called a Householder transformation. The name elementary reflector is also used.
For $n = 2$ we find $H = \begin{bmatrix} 1 - u_1^2 & -u_1 u_2 \\ -u_2 u_1 & 1 - u_2^2 \end{bmatrix}$. A Householder transformation is symmetric and orthogonal. In particular,
$$H^T H = H^2 = (I - uu^T)(I - uu^T) = I - 2uu^T + u(u^T u)u^T = I.$$
There are several ways to represent a Householder transformation. Householder used $I - 2uu^T$, where $u^T u = 1$. For any nonzero $v \in \mathbb{R}^n$ the matrix
$$H := I - 2\frac{vv^T}{v^T v} \tag{13.2}$$
is a Householder transformation. In fact $H = I - uu^T$, where $u := \sqrt{2}\,\frac{v}{\|v\|_2}$.
Figure 13.1. The Householder transformation.
Our use of Householder transformations will be to produce zeros in vectors. We start with the following lemma.
Lemma 13.6. Suppose $x, y \in \mathbb{R}^n$ with $\|x\|_2 = \|y\|_2$ and $v := x - y \ne 0$. Then
$$\left(I - 2\frac{vv^T}{v^T v}\right) x = y.$$
Proof. Since $x^T x = y^T y$ we have
$$v^T v = (x - y)^T (x - y) = 2x^T x - 2y^T x = 2v^T x. \tag{13.3}$$
But then
$$\left(I - 2\frac{vv^T}{v^T v}\right) x = x - \frac{2v^T x}{v^T v} v = x - v = y.$$
A geometric interpretation of this lemma is shown in Figure 13.1. We have
$$H = I - \frac{2vv^T}{v^T v} = P - \frac{vv^T}{v^T v}, \quad \text{where } P := I - \frac{vv^T}{v^T v},$$
and
$$Px = x - \frac{v^T x}{v^T v} v \overset{(13.3)}{=} x - \frac{1}{2} v = \frac{1}{2}(x + y).$$
It follows that $Hx$ is the reflected image of $x$. The mirror contains the vector $x + y$ and has normal $x - y$.
Exercise 13.3. Show that $\|x\|_2 = \|y\|_2$ implies that $x - y$ is orthogonal to $x + y$, and conclude that $Px$ is the orthogonal projection of $x$ into the subspace $\mathrm{span}(x + y)$.
We can introduce zeros in a vector $x$ by picking $\alpha^2 = x^T x$ and $y := \alpha e_1$ in Lemma 13.6. The equation $\alpha^2 = x^T x$ has two solutions $\alpha = +\|x\|_2$ and $\alpha = -\|x\|_2$. We want to develop an algorithm which defines a Householder transformation for any nonzero $x$. We achieve this by choosing $\alpha$ to have the opposite sign of $x_1$. Then $v_1 = x_1 - \alpha \ne 0$, so $v \ne 0$. Another advantage of this choice is that we avoid cancellation in the subtraction in the first component of $v = x - \alpha e_1$. This leads to a numerically stable algorithm.
Lemma 13.7. For a nonzero vector $x \in \mathbb{R}^n$ we define
$$\alpha := \begin{cases} -\|x\|_2 & \text{if } x_1 > 0, \\ +\|x\|_2 & \text{otherwise,} \end{cases} \tag{13.4}$$
and
$$H := I - uu^T \quad \text{with} \quad u = \frac{x/\alpha - e_1}{\sqrt{1 - x_1/\alpha}}. \tag{13.5}$$
Then $H$ is a Householder transformation and $Hx = \alpha e_1$.
Proof. Let $y := \alpha e_1$ and $v := x - y$. If $x_1 > 0$ then $y_1 = \alpha < 0$, while if $x_1 \le 0$ then $y_1 = \alpha > 0$. It follows that $x^T x = y^T y$ and $v \ne 0$. By Lemma 13.6 we have $Hx = \alpha e_1$, where $H = I - 2\frac{vv^T}{v^T v}$ is a Householder transformation. Since
$$0 < v^T v = (x - \alpha e_1)^T (x - \alpha e_1) = x^T x - 2\alpha x_1 + \alpha^2 = 2\alpha(\alpha - x_1),$$
we find
$$H = I - \frac{2(x - \alpha e_1)(x - \alpha e_1)^T}{2\alpha(\alpha - x_1)} = I - \frac{(x/\alpha - e_1)(x/\alpha - e_1)^T}{1 - x_1/\alpha} = I - uu^T.$$
Example 13.8. For $x := [1, 2, 2]^T$ we have $\|x\|_2 = 3$, and since $x_1 > 0$ we choose $\alpha = -3$. We find $u = -[2, 1, 1]^T/\sqrt{3}$ and
$$H = I - \frac{1}{3}\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} -1 & -2 & -2 \\ -2 & 2 & -1 \\ -2 & -1 & 2 \end{bmatrix}.$$
The formulas in Lemma 13.7 are implemented in the following algorithm.
Algorithm 13.9. To a given $x \in \mathbb{R}^n$ the following algorithm computes $a = \alpha$ and the vector $u$ so that $(I - uu^T)x = \alpha e_1$.

function [u,a] = housegen(x)
% Compute a = alpha and u with u'*u = 2 so that (I - u*u')*x = a*e_1.
a = norm(x); u = x;
if a == 0
    u(1) = sqrt(2); return;
end
if u(1) > 0
    a = -a;                 % alpha has the opposite sign of x_1
end
u = u/a; u(1) = u(1) - 1;   % u = x/alpha - e_1
u = u/sqrt(-u(1));          % scale so that u'*u = 2
If $x = 0$ then any $u$ with $\|u\|_2 = \sqrt{2}$ can be used in the Householder transformation. In the algorithm we use $u = \sqrt{2}\, e_1$ in this case.
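As a quick sanity check, the following lines reproduce Example 13.8 (assuming housegen from Algorithm 13.9 is on the path):

x = [1; 2; 2];
[u,a] = housegen(x);   % a = -3, u = -[2; 1; 1]/sqrt(3)
H = eye(3) - u*u';     % the Householder transformation of Example 13.8
H*x                    % returns a*e_1 = [-3; 0; 0]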
Exercise 13.4. Determine $H$ in Algorithm 13.9 when $x = e_1$.
Householder transformations can also be used to zero out only the lower part of a vector. Suppose $y \in \mathbb{R}^k$, $z \in \mathbb{R}^{n-k}$ and $\alpha^2 = z^T z$. Consider finding a Householder transformation $H$ such that $H\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} y \\ \alpha e_1 \end{bmatrix}$. Let $\hat u$ and $\alpha$ be the output of Algorithm 13.9 called with $x = z$, i.e., $[\hat u, \alpha] = \mathrm{housegen}(z)$, and set $u^T = [0^T, \hat u^T]$. Then
$$H = I - uu^T = \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} - \begin{bmatrix} 0 \\ \hat u \end{bmatrix}\begin{bmatrix} 0^T & \hat u^T \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & \widehat H \end{bmatrix},$$
where $\widehat H = I - \hat u \hat u^T$. Since $u^T u = \hat u^T \hat u = 2$ we see that $H$ and $\widehat H$ are Householder transformations.
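A minimal sketch of this embedding (again assuming housegen is available); the padded vector $u$ leaves $y$ untouched and maps $z$ to $\alpha e_1$:

y = [1; 2]; z = [2; 2; 1];      % example data with k = 2, n = 5
[uhat,a] = housegen(z);         % Hhat*z = a*e_1, here a = -3
u = [zeros(size(y)); uhat];     % pad with zeros on top
H = eye(5) - u*u';              % H = [I 0; 0 Hhat]
H*[y; z]                        % returns [y; a*e_1] = [1; 2; -3; 0; 0]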
Exercise 13.5. Construct an elementary reflector $Q$ such that $Qx = y$ in the following cases.

a) $x = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$, $y = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$.

b) $x = \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}$, $y = \begin{bmatrix} 0 \\ 3 \\ 0 \end{bmatrix}$.
Exercise 13.6. Show that a $2 \times 2$ Householder transformation can be written in the form
$$Q = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}.$$
Find $Qx$ if $x = [\cos\phi, \sin\phi]^T$.
Exercise 13.7.

a) Find Householder transformations $Q_1, Q_2 \in \mathbb{R}^{3,3}$ such that
$$Q_2 Q_1 A = Q_2 Q_1 \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 2 & 2 & 1 \end{bmatrix}$$
is upper triangular.

b) Find the QR factorization of $A$ where $R$ has positive diagonal elements.
13.3 Householder Triangulation
Suppose $A \in \mathbb{R}^{m,n}$ and assume first for simplicity that $m > n$. We describe how to find a sequence $H_1, \ldots, H_n$ of orthogonal matrices such that
$$H_n H_{n-1} \cdots H_1 A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix},$$
where $R_1$ is upper triangular. Here each $H_k$ is a Householder transformation. Since the product of orthogonal matrices is orthogonal and each $H_k$ is symmetric, we obtain the QR-decomposition of $A$ in the form
$$A = QR, \quad \text{where } Q := H_1 H_2 \cdots H_n \text{ and } R := \begin{bmatrix} R_1 \\ 0 \end{bmatrix}. \tag{13.6}$$
Define $A_1 = A$ and suppose for $k \ge 1$ that $A_k$ is upper triangular in its first $k-1$ columns, so that $A_k = \begin{bmatrix} B_k & C_k \\ 0 & D_k \end{bmatrix}$, where $B_k \in \mathbb{R}^{k-1,k-1}$ is upper triangular and $D_k \in \mathbb{R}^{m-k+1,n-k+1}$. Let $\widehat H_k = I - \hat u_k \hat u_k^T$ be a Householder transformation which zeros out the first column of $D_k$ under the diagonal, so that $\widehat H_k (D_k e_1) = \alpha_k e_1$. Set $H_k := \begin{bmatrix} I_{k-1} & 0 \\ 0 & \widehat H_k \end{bmatrix}$. Then
$$A_{k+1} := H_k A_k = \begin{bmatrix} B_k & C_k \\ 0 & \widehat H_k D_k \end{bmatrix} = \begin{bmatrix} B_{k+1} & C_{k+1} \\ 0 & D_{k+1} \end{bmatrix},$$
where $B_{k+1} \in \mathbb{R}^{k,k}$ is upper triangular and $D_{k+1} \in \mathbb{R}^{m-k,n-k}$. Thus $A_{k+1}$ is upper triangular in its first $k$ columns and the reduction has been carried one step further. At the end $R := A_{n+1} = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $R_1$ is upper triangular and $R = H_n \cdots H_2 H_1 A$. Thus $A = H_1 \cdots H_n R$ and we obtain (13.6).
The process just described can be illustrated as follows when $m = 4$ and $n = 3$ using so-called Wilkinson diagrams:
$$\begin{bmatrix} x & x & x \\ x & x & x \\ x & x & x \\ x & x & x \end{bmatrix} \overset{H_1}{\longrightarrow} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & x & x \\ 0 & x & x \\ 0 & x & x \end{bmatrix} \overset{H_2}{\longrightarrow} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix} \overset{H_3}{\longrightarrow} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \\ 0 & 0 & 0 \end{bmatrix}.$$
Here $A_1 = D_1$, $A_2 = \begin{bmatrix} B_2 & C_2 \\ 0 & D_2 \end{bmatrix}$, $A_3 = \begin{bmatrix} B_3 & C_3 \\ 0 & D_3 \end{bmatrix}$, and $A_4 = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$.
The transformation is applied to the lower right block.
The process can also be applied to $A \in \mathbb{R}^{m,n}$ if $m \le n$. In this case $m-1$ Householder transformations will suffice and we obtain
$$H_{m-1} \cdots H_1 A = [R_1, S_1] = R, \tag{13.7}$$
where $R_1$ is upper triangular and $S_1 \in \mathbb{R}^{m,n-m}$.
In an algorithm we can store most of the vectors $u_k = [u_{kk}, \ldots, u_{mk}]^T$ in $A$. We can also store the entries of $R_1$ in $A$, but there is no room for the diagonal $(r_{11}, \ldots, r_{nn}) = (\alpha_1, \ldots, \alpha_n)$. We store these entries in a separate vector $r$. Thus for $m = 4$ and $n = 3$ we have $r = [\alpha_1, \alpha_2, \alpha_3]^T$ and
$$A = \begin{bmatrix} u_{11} & r_{12} & r_{13} \\ u_{21} & u_{22} & r_{23} \\ u_{31} & u_{32} & u_{33} \\ u_{41} & u_{42} & u_{43} \end{bmatrix}.$$
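With this storage scheme $Q$ never needs to be formed explicitly. As an illustration, the following hypothetical helper (a sketch under the storage scheme just described, not one of the text's numbered algorithms) applies $Q^T = H_r \cdots H_1$ to a vector using the stored $u$-vectors:

function b = applyQT(A,b)
% Sketch: apply Q' = H_r*...*H_1 to b, where the Householder
% vectors u_k = A(k:m,k) are stored on and below the diagonal of A.
[m,n] = size(A);
for k = 1:min(n,m-1)
    u = A(k:m,k);
    b(k:m) = b(k:m) - u*(u'*b(k:m));   % b <- H_k*b with H_k = I - u*u'
end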
13.3.1 QR and Least Squares
Suppose $A \in \mathbb{R}^{m,n}$ has rank $n$ and that $b \in \mathbb{R}^m$. In the following algorithm we use Householder transformations to solve the least squares problem $\min_x \|Ax - b\|_2$. We first find a QR-decomposition $A = QR$ of $A$ and compute $Q^T b$. We then solve the system $R_1 x = Q_1^T b$, where $A = Q_1 R_1$ is the QR-factorization of $A$. For each Householder transformation we need to compute the update $\widehat H_k D_k$. We do not form the matrix $\widehat H_k$, but compute the product as
$$(I - \hat u_k \hat u_k^T) D_k = D_k - \hat u_k (\hat u_k^T D_k). \tag{13.8}$$
Algorithm 13.10. Suppose $A \in \mathbb{R}^{m,n}$ has rank $n$ and that $b \in \mathbb{R}^m$. This algorithm uses Householder transformations to solve the least squares problem $\min_x \|Ax - b\|_2$ if $m > n$ and the linear system $Ax = b$ if $m = n$. It uses Algorithms 6.12 and 13.9.

function x = houselsq(A,b)
[m,n] = size(A); A = [A b];
for k = 1:min(n,m-1)
    [v,A(k,k)] = housegen(A(k:m,k));
    % update via (13.8): (I - v*v')*D = D - v*(v'*D)
    A(k:m,k+1:n+1) = A(k:m,k+1:n+1) - v*(v'*A(k:m,k+1:n+1));
end
x = backsolve(A(1:n,1:n), A(1:n,n+1));
The function housegen(x) returns a Householder transformation for any $x \in \mathbb{R}^n$. Thus in Algorithm 13.10 we obtain a QR-decomposition $A = QR$, where $Q = H_1 \cdots H_r$ is orthogonal and $r = \min\{n, m-1\}$. Thus a QR-factorization always exists and we have proved
Theorem 13.11. Suppose $m \ge n \ge 1$ and $A \in \mathbb{R}^{m,n}$. Then $A$ has a QR-decomposition and a QR-factorization.
Algorithm 13.10 is a useful alternative to the normal equations for solving full rank least squares problems. Recall that the 2-norm condition number of a matrix $A$ is the square root of the largest eigenvalue of $A^T A$. It follows that the 2-norm condition number of $A^T A$ is the square of the 2-norm condition number of $A$. Thus if $A$ is mildly ill-conditioned the normal equations can be quite ill-conditioned, and solving the normal equations can give inaccurate results. On the other hand, Algorithm 13.10 is quite stable. But using Householder transformations requires more work. The leading term in the number of flops in Algorithm 13.10 can be estimated from (13.8). Since $D_k \in \mathbb{R}^{m-k+1,n-k+1}$ we can estimate this work as
$$\int_0^n 4(m-k)(n-k)\,dk = 2mn^2 - \frac{2}{3}n^3. \tag{13.9}$$
When $m$ is large compared to $n$ the term $2mn^2$ dominates. Now forming the normal equations and taking advantage of the symmetry requires $mn^2$ flops. Thus Algorithm 13.10 requires approximately twice as many flops when $m$ is large compared to $n$.
13.3.2 QR and linear systems
Algorithm 13.10 can be used to solve linear systems. If $A$ is nonsingular and $m = n$ then the output $x$ will be the solution of $Ax = b$. This follows since the QR-decomposition and QR-factorization are the same when $A$ is square. Therefore, if $Rx = Q^T b$ then
$$\|Ax - b\|_2 = \|QRx - b\|_2 = \|Rx - Q^T b\|_2 = 0.$$
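For a square nonsingular $A$ the call is the same as in the least squares case; a minimal usage sketch (assuming houselsq and its helpers are on the path):

A = [2 1; 1 3]; b = [3; 4];
x = houselsq(A,b);    % solves A*x = b, here x = [1; 1]
norm(A*x - b)         % ~0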
So Algorithm 13.10 can be used as an alternative to Gaussian elimination. The two methods are similar since they both reduce $A$ to upper triangular form using certain transformations.

Which method is better? Linear systems can be constructed where Gaussian elimination with partial pivoting will fail numerically. On the other hand, the transformations used in Householder triangulation are orthogonal, so the method is quite stable. So why is Gaussian elimination more popular than Householder triangulation? One reason is that the number of flops in (13.9) when $m = n$ is $4n^3/3$, while the count for Gaussian elimination is half of that. Numerical stability can be a problem with Gaussian elimination, but years and years of experience show that it works well for most practical problems, and pivoting is often not necessary. Tradition might also play a role.
13.4 Givens rotations
In some applications, the matrix we want to triangulate has a special structure. Suppose for example that $A \in \mathbb{R}^{n,n}$ is square and upper Hessenberg, as illustrated by a Wilkinson diagram for $n = 4$:
$$A = \begin{bmatrix} x & x & x & x \\ x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix}.$$
Only one entry in each column needs to be annihilated, and a full Householder transformation would be inefficient. In this case we can use a simpler transformation.
Definition 13.12. A plane rotation (also called a Givens rotation) is a matrix of the form
$$P := \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1.$$

A plane rotation is orthogonal, and there is a unique angle $\theta \in [0, 2\pi)$ such that $c = \cos\theta$ and $s = \sin\theta$. Moreover, the identity matrix is a plane rotation.
Figure 13.2. A plane rotation.
Exercise 13.8. Show that if $x = \begin{bmatrix} r\cos\phi \\ r\sin\phi \end{bmatrix}$ then $Px = \begin{bmatrix} r\cos(\phi - \theta) \\ r\sin(\phi - \theta) \end{bmatrix}$. Thus $P$ rotates a vector $x$ in the plane an angle $\theta$ clockwise. See Figure 13.2.
Suppose
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \ne 0, \quad c := \frac{x_1}{r}, \quad s := \frac{x_2}{r}, \quad r := \|x\|_2.$$
Then
$$Px = \frac{1}{r}\begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{r}\begin{bmatrix} x_1^2 + x_2^2 \\ 0 \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix},$$
and we have introduced a zero in $x$. We can take $P = I$ when $x = 0$.
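In MATLAB the construction reads as follows; a small sketch for a nonzero 2-vector:

x = [3; 4];
r = norm(x); c = x(1)/r; s = x(2)/r;
P = [c s; -s c];   % plane rotation with c^2 + s^2 = 1
P*x                % returns [r; 0] = [5; 0]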
For an $n$-vector $x \in \mathbb{R}^n$ and $1 \le i < j \le n$ we define a rotation in the $i,j$-plane as a matrix $P_{ij} = (p_{kl}) \in \mathbb{R}^{n,n}$ by $p_{kl} = \delta_{kl}$ except for positions $ii$, $jj$, $ij$, $ji$, which are given by
$$\begin{bmatrix} p_{ii} & p_{ij} \\ p_{ji} & p_{jj} \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1.$$
Premultiplying a matrix by a rotation in the $i,j$-plane changes only rows $i$ and $j$ of the matrix, while postmultiplying the matrix by such a rotation only changes columns $i$ and $j$. In particular, if $B = P_{ij}A$ and $C = AP_{ij}$ then $B(k,:) = A(k,:)$ and $C(:,k) = A(:,k)$ for all $k \ne i, j$, and
$$\begin{bmatrix} B(i,:) \\ B(j,:) \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} A(i,:) \\ A(j,:) \end{bmatrix}, \qquad [C(:,i)\ C(:,j)] = [A(:,i)\ A(:,j)]\begin{bmatrix} c & s \\ -s & c \end{bmatrix}. \tag{13.10}$$
An upper Hessenberg matrix $A \in \mathbb{R}^{n,n}$ can be transformed to upper triangular form using rotations $P_{i,i+1}$ for $i = 1, \ldots, n-1$. For $n = 4$ the process can be illustrated as follows:
$$A = \begin{bmatrix} x & x & x & x \\ x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \overset{P_{12}}{\longrightarrow} \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} \\ 0 & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \overset{P_{23}}{\longrightarrow} \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} \\ 0 & r_{22} & r_{23} & r_{24} \\ 0 & 0 & x & x \\ 0 & 0 & x & x \end{bmatrix} \overset{P_{34}}{\longrightarrow} \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} \\ 0 & r_{22} & r_{23} & r_{24} \\ 0 & 0 & r_{33} & r_{34} \\ 0 & 0 & 0 & r_{44} \end{bmatrix}.$$
For an algorithm see Exercise 13.9.
Exercise 13.9. Let $A \in \mathbb{R}^{n,n}$ be upper Hessenberg and nonsingular, and let $b \in \mathbb{R}^n$. The following algorithm solves the linear system $Ax = b$ using rotations $P_{k,k+1}$ for $k = 1, \ldots, n-1$. Determine the number of flops of this algorithm.
Algorithm 13.13. Suppose $A \in \mathbb{R}^{n,n}$ is nonsingular and upper Hessenberg and that $b \in \mathbb{R}^n$. This algorithm uses Givens rotations to solve the linear system $Ax = b$. It uses Algorithm 6.12.

function x = rothesstri(A,b)
n = length(A); A = [A b];
for k = 1:n-1
    r = norm([A(k,k), A(k+1,k)]);
    if r > 0
        c = A(k,k)/r; s = A(k+1,k)/r;
        % rotate rows k and k+1 to zero out A(k+1,k)
        A([k k+1], k+1:n+1) = [c s; -s c]*A([k k+1], k+1:n+1);
    end
    A(k,k) = r; A(k+1,k) = 0;
end
x = backsolve(A(:,1:n), A(:,n+1));
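A usage sketch (assuming rothesstri and backsolve, Algorithm 6.12, are on the path): generate a random upper Hessenberg system and check the residual.

n = 6;
A = triu(randn(n),-1);   % upper Hessenberg: zero below the subdiagonal
b = randn(n,1);
x = rothesstri(A,b);
norm(A*x - b)            % small residual for a well-conditioned A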