ORTHOGONAL PROJECTIONS AND ORTHOGONAL MATRICES

DYLAN RUPEL

July 27, 2013

1. Reminders about orthonormal bases


I want to begin the lecture today by recalling why we care about orthonormal bases for a subspace $V$ of $\mathbb{R}^n$. First let's remember what an orthonormal basis is:
Definition 1.1. A basis $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m$ of a vector space $V$ is called orthonormal if the basis vectors are pairwise orthogonal,
$$\vec{u}_i \cdot \vec{u}_j = 0 \quad \text{whenever } i \neq j,$$
and each basis vector $\vec{u}_i$ has unit length,
$$\|\vec{u}_i\| = 1 \quad \text{for all } i.$$
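As a quick sanity check on this definition, here is a minimal NumPy sketch (the vectors below are an arbitrary illustrative choice, not from the notes) that tests the two conditions directly.

```python
import numpy as np

# Two candidate basis vectors in R^3 (an arbitrary illustrative choice).
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)

# Pairwise orthogonality: u_i . u_j = 0 for i != j.
print(np.isclose(u1 @ u2, 0.0))              # True

# Unit length: ||u_i|| = 1 for every i.
print(np.isclose(np.linalg.norm(u1), 1.0))   # True
print(np.isclose(np.linalg.norm(u2), 1.0))   # True
```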

The main observation resulting from this definition is that one may more easily find the expansion of an arbitrary vector $\vec{x} \in V$ in terms of an orthonormal basis than in terms of any old basis. Consider such an expansion

(1.1)    $\vec{x} = c_1 \vec{u}_1 + c_2 \vec{u}_2 + \cdots + c_m \vec{u}_m$

Using that $\vec{u}_1 \cdot \vec{u}_i = 0$ whenever $2 \le i \le m$, there is a lot of simplification when we apply the linear map $\vec{u}_1 \cdot{}$ to both sides of (1.1):
$$\begin{aligned}
\vec{u}_1 \cdot \vec{x} &= \vec{u}_1 \cdot (c_1 \vec{u}_1 + c_2 \vec{u}_2 + \cdots + c_m \vec{u}_m) \\
&= c_1 (\vec{u}_1 \cdot \vec{u}_1) + c_2 (\vec{u}_1 \cdot \vec{u}_2) + \cdots + c_m (\vec{u}_1 \cdot \vec{u}_m) \\
&= c_1 \cdot 1 + c_2 \cdot 0 + \cdots + c_m \cdot 0 \\
&= c_1
\end{aligned}$$
so that $c_1 = \vec{u}_1 \cdot \vec{x}$. Similarly, we may show that all of the expansion coefficients are easily computed:
$$c_i = \vec{u}_i \cdot \vec{x}$$
Inserting these into (1.1) we obtain a very tidy representation for $\vec{x} \in V = \operatorname{span}\{\vec{u}_1, \ldots, \vec{u}_m\}$:

(1.2)    $\vec{x} = (\vec{u}_1 \cdot \vec{x})\, \vec{u}_1 + (\vec{u}_2 \cdot \vec{x})\, \vec{u}_2 + \cdots + (\vec{u}_m \cdot \vec{x})\, \vec{u}_m$

To help emphasize how nice this situation is, let me remind you how much work would be required if we were to obtain a similar representation of $\vec{x} \in V$ in terms of an arbitrary basis $\vec{v}_1, \ldots, \vec{v}_m$ of $V$ with no distinguishing properties. That is, we are looking for the expansion
$$\vec{x} = c_1 \vec{v}_1 + c_2 \vec{v}_2 + \cdots + c_m \vec{v}_m = A \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}$$
where the columns of $A$ are exactly the basis vectors $\vec{v}_1, \ldots, \vec{v}_m$:
$$A = \begin{pmatrix} \vec{v}_1 & \vec{v}_2 & \cdots & \vec{v}_m \end{pmatrix}$$

To find the expansion coefficients, i.e. the vector $\vec{c} = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_m \end{pmatrix}$, we have to solve the system of $n$ equations $A\vec{c} = \vec{x}$, i.e. use Gauss-Jordan elimination to find the reduced row-echelon form of the augmented matrix $(\,A \mid \vec{x}\,)$. You should be convinced by now that this can be a little bit tedious if the matrices involved are large.
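To make the contrast concrete, here is a small NumPy sketch, my own illustration with an arbitrarily chosen subspace of $\mathbb{R}^3$: for an orthonormal basis the coefficients are just dot products, while for an arbitrary basis we have to solve a linear system.

```python
import numpy as np

# An orthonormal basis of a 2-dimensional subspace V of R^3 (illustrative).
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])
x = 3.0 * u1 - 2.0 * u2          # a vector known to lie in V

# Orthonormal basis: coefficients are just dot products, c_i = u_i . x.
c1, c2 = u1 @ x, u2 @ x
print(c1, c2)                    # 3.0, -2.0

# Arbitrary basis of the same V: we must solve A c = x
# (here via least squares, since A is 3 x 2 rather than square).
v1 = np.array([1.0, 1.0, 1.0])
v2 = np.array([0.0, 0.0, 2.0])
A = np.column_stack([v1, v2])
c, *_ = np.linalg.lstsq(A, x, rcond=None)
print(c)                         # coefficients of x in the basis v1, v2
```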
2. Transpose of a matrix and dot products
The transpose of a matrix is extremely important for understanding, on a more conceptual level, some of the computations we have been doing since the beginning of the class.
Definition 2.1. Let $A$ be an $n \times m$ matrix. The transpose of $A$, denoted $A^T$, is the $m \times n$ matrix obtained by rewriting each column of $A$ as a row of $A^T$.
To be more precise in the definition would require more notation than I care to push on you, so let me
illustrate with some examples.
Example 2.2.
$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \end{pmatrix}$$

$$\begin{pmatrix} 1 & 6 \\ 2 & 7 \\ 3 & 8 \\ 4 & 9 \\ 5 & 10 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 6 & 7 & 8 & 9 & 10 \end{pmatrix}$$
$$\begin{pmatrix} 1 & 0 & 2 & 3 \\ 0 & 4 & 7 & 5 \\ 2 & 1 & 0 & 1 \\ 3 & 6 & 8 & 3 \end{pmatrix}^T = \begin{pmatrix} 1 & 0 & 2 & 3 \\ 0 & 4 & 1 & 6 \\ 2 & 7 & 0 & 8 \\ 3 & 5 & 1 & 3 \end{pmatrix}$$

We will need to know one essential property of the transpose: for an $n \times m$ matrix $A$ and an $m \times p$ matrix $B$ we have

(2.1)    $(AB)^T = B^T A^T$

This should remind you a lot of the similar identity $(AB)^{-1} = B^{-1}A^{-1}$ for the inverse of a product. While the identity for inverses is rather easy to prove, the proof of this identity for the transpose requires a lot of notation and isn't very enlightening. If you feel up to it I encourage you to try to prove it in general (as a first step this will require you to precisely define the transpose operation for an arbitrary matrix); if not, you should at least convince yourself that it is true by way of examples.
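In that spirit, a quick numerical spot check on randomly generated matrices (an illustration, not a proof) might look like this.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # an n x m matrix
B = rng.standard_normal((4, 2))   # an m x p matrix

# (AB)^T should equal B^T A^T.
print(np.allclose((A @ B).T, B.T @ A.T))   # True
```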
Now I want to think a little bit about the dot product that we have been using all term and give it a more natural definition. Let $\vec{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$ and $\vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$ be vectors in $\mathbb{R}^n$. Notice that we have identified vectors in $\mathbb{R}^n$ with $n \times 1$ matrices, and that we have been doing this constantly; this will be an important observation. Recall that the dot product is given by
$$\vec{x} \cdot \vec{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$
From the point of view of vectors this seems perfectly natural; however, from the point of view of matrices this is a strange new operation that we are forcing into the universe: there is no natural way to multiply two $n \times 1$ matrices and get a scalar.
But notice that, analogous to identifying vectors in $\mathbb{R}^n$ with $n \times 1$ matrices, we may identify scalars (elements of $\mathbb{R}$) with $1 \times 1$ matrices. Finally note that there is a natural way to combine a $1 \times n$ matrix with an $n \times 1$ matrix to get a $1 \times 1$ matrix: we simply multiply. The final observation that we need to make is that we may reinterpret the dot product in terms of matrix multiplication:
$$\vec{x} \cdot \vec{y} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \vec{x}^T \vec{y}$$
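A small sketch of the same identity in code, with an arbitrary illustrative pair of vectors: treating vectors as $n \times 1$ column matrices, the dot product is the $1 \times 1$ matrix $\vec{x}^T\vec{y}$.

```python
import numpy as np

x = np.array([[1.0], [2.0], [3.0]])   # an n x 1 column matrix
y = np.array([[4.0], [5.0], [6.0]])

dot = float(x[:, 0] @ y[:, 0])        # the usual dot product x . y
as_matrix = x.T @ y                   # the 1 x 1 matrix x^T y

print(dot, as_matrix)                 # 32.0  [[32.]]
```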


3. Matrix of an orthogonal projection


In (1.2) we gave an easy method for expressing an arbitrary vector $\vec{x} \in V$ in terms of an orthonormal basis $\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_m$ of $V$. But notice that this is exactly the formula that gives the orthogonal projection of an arbitrary vector $\vec{x} \in \mathbb{R}^n$ onto the subspace $V$:

(3.1)    $\operatorname{proj}_V(\vec{x}) = (\vec{u}_1 \cdot \vec{x})\, \vec{u}_1 + (\vec{u}_2 \cdot \vec{x})\, \vec{u}_2 + \cdots + (\vec{u}_m \cdot \vec{x})\, \vec{u}_m$

To complete our understanding of the orthogonal projection we need only realize that we are again performing a very unnatural operation by expressing the projection in this way. We'll begin by looking again at rescaling a vector $\vec{x} \in \mathbb{R}^n$ by a scalar $c \in \mathbb{R}$:

(3.2)    $c\vec{x} = c \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_n \end{pmatrix}$
This seems like a natural enough thing to do to a vector (in fact we can do it by the very definition of a vector space!). But remember that we identify the vector $\vec{x} \in \mathbb{R}^n$ with an $n \times 1$ matrix and we identify the scalar $c \in \mathbb{R}$ with a $1 \times 1$ matrix, and then writing things as in (3.2) doesn't really make sense. Again we have forced a strange new operation into the world of matrices: there is no natural way to multiply a $1 \times 1$ matrix on the left of an $n \times 1$ matrix to get an $n \times 1$ matrix.

How can we remedy this situation? Well, notice that there is a natural way to multiply a $1 \times 1$ matrix on the right of an $n \times 1$ matrix to get an $n \times 1$ matrix. So we conclude that it is more natural to write scalars acting on the right if we are going to identify vectors in $\mathbb{R}^n$ with $n \times 1$ matrices.
Let's take these observations and rewrite (3.1) in the more natural way: first by writing scalar multiplication on the right and then by rewriting the dot product in terms of matrix multiplication:
$$\begin{aligned}
\operatorname{proj}_V(\vec{x}) &= (\vec{u}_1 \cdot \vec{x})\, \vec{u}_1 + (\vec{u}_2 \cdot \vec{x})\, \vec{u}_2 + \cdots + (\vec{u}_m \cdot \vec{x})\, \vec{u}_m \\
&= \vec{u}_1 (\vec{u}_1 \cdot \vec{x}) + \vec{u}_2 (\vec{u}_2 \cdot \vec{x}) + \cdots + \vec{u}_m (\vec{u}_m \cdot \vec{x}) \\
&= \vec{u}_1 (\vec{u}_1^T \vec{x}) + \vec{u}_2 (\vec{u}_2^T \vec{x}) + \cdots + \vec{u}_m (\vec{u}_m^T \vec{x}) \\
&= (\vec{u}_1 \vec{u}_1^T)\, \vec{x} + (\vec{u}_2 \vec{u}_2^T)\, \vec{x} + \cdots + (\vec{u}_m \vec{u}_m^T)\, \vec{x}
\end{aligned}$$
In the last step we used the associativity of matrix multiplication to simply move the parentheses. But notice we have exactly what we want, that is, the matrix of the orthogonal projection. Indeed, let's reevaluate what we have written. Each $\vec{u}_i \vec{u}_i^T$ is the product of an $n \times 1$ matrix and a $1 \times n$ matrix; more precisely, it is an $n \times n$ matrix. So we have a bunch of $n \times n$ matrices each multiplying the vector $\vec{x}$, and we can simply factor $\vec{x}$ out to get the matrix of the projection:

(3.3)    $\operatorname{proj}_V(\vec{x}) = (\vec{u}_1 \vec{u}_1^T + \vec{u}_2 \vec{u}_2^T + \cdots + \vec{u}_m \vec{u}_m^T)\, \vec{x}$

If we introduce a little bit of notation (which should already be familiar) we can write the matrix of the orthogonal projection in a much tidier form. Let $Q = \begin{pmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_m \end{pmatrix}$ denote the matrix whose columns are the orthonormal basis of $V$ (the $Q$ should remind you of the matrix $Q$ that appears when performing the Gram-Schmidt process, i.e. when finding the QR-factorization). After a second's thought it is easy to reinterpret the matrix of $\operatorname{proj}_V$ in terms of the matrix $Q$:
$$\vec{u}_1 \vec{u}_1^T + \vec{u}_2 \vec{u}_2^T + \cdots + \vec{u}_m \vec{u}_m^T = \begin{pmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_m \end{pmatrix} \begin{pmatrix} \vec{u}_1^T \\ \vec{u}_2^T \\ \vdots \\ \vec{u}_m^T \end{pmatrix} = QQ^T$$
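Here is a minimal NumPy sketch of this identity, using an arbitrarily chosen orthonormal pair in $\mathbb{R}^3$: the sum of outer products agrees with $QQ^T$, and the resulting matrix fixes vectors that already lie in $V$.

```python
import numpy as np

# An orthonormal basis of a 2-dimensional subspace V of R^3 (illustrative).
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([0.0, 0.0, 1.0])

# Sum of the outer products u_i u_i^T ...
P_sum = np.outer(u1, u1) + np.outer(u2, u2)

# ... equals Q Q^T, where the columns of Q are u_1, u_2.
Q = np.column_stack([u1, u2])
print(np.allclose(P_sum, Q @ Q.T))          # True

# P is the matrix of proj_V: it leaves vectors in V unchanged.
print(np.allclose(Q @ Q.T @ u1, u1))        # True
```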


Example 3.1. Let's find the matrix of the orthogonal projection $\operatorname{proj}_V : \mathbb{R}^5 \to \mathbb{R}^5$ onto the subspace $V$ of $\mathbb{R}^5$ spanned by the vectors
$$\vec{v}_1 = \begin{pmatrix} 2 \\ 2 \\ 2 \\ 2 \\ 3 \end{pmatrix}, \qquad \vec{v}_2 = \begin{pmatrix} 3 \\ 3 \\ 1 \\ -1 \\ -4 \end{pmatrix}.$$
(Note that this question appeared on the last quiz, where we didn't yet have such a clean method for finding the answer.)

First we notice that the vectors are orthogonal, i.e. $\vec{v}_1 \cdot \vec{v}_2 = 0$. To apply formula (3.3) we need an orthonormal basis of $V$, so we'll divide each vector by its length to get
$$\vec{u}_1 = \begin{pmatrix} 2/5 \\ 2/5 \\ 2/5 \\ 2/5 \\ 3/5 \end{pmatrix}, \qquad \vec{u}_2 = \begin{pmatrix} 1/2 \\ 1/2 \\ 1/6 \\ -1/6 \\ -2/3 \end{pmatrix}.$$
Now applying the formula, the matrix of $\operatorname{proj}_V$ is
$$\begin{aligned}
\vec{u}_1 \vec{u}_1^T + \vec{u}_2 \vec{u}_2^T
&= \begin{pmatrix} 2/5 \\ 2/5 \\ 2/5 \\ 2/5 \\ 3/5 \end{pmatrix}\begin{pmatrix} 2/5 & 2/5 & 2/5 & 2/5 & 3/5 \end{pmatrix} + \begin{pmatrix} 1/2 \\ 1/2 \\ 1/6 \\ -1/6 \\ -2/3 \end{pmatrix}\begin{pmatrix} 1/2 & 1/2 & 1/6 & -1/6 & -2/3 \end{pmatrix} \\
&= \begin{pmatrix} 4/25 & 4/25 & 4/25 & 4/25 & 6/25 \\ 4/25 & 4/25 & 4/25 & 4/25 & 6/25 \\ 4/25 & 4/25 & 4/25 & 4/25 & 6/25 \\ 4/25 & 4/25 & 4/25 & 4/25 & 6/25 \\ 6/25 & 6/25 & 6/25 & 6/25 & 9/25 \end{pmatrix} + \begin{pmatrix} 1/4 & 1/4 & 1/12 & -1/12 & -1/3 \\ 1/4 & 1/4 & 1/12 & -1/12 & -1/3 \\ 1/12 & 1/12 & 1/36 & -1/36 & -1/9 \\ -1/12 & -1/12 & -1/36 & 1/36 & 1/9 \\ -1/3 & -1/3 & -1/9 & 1/9 & 4/9 \end{pmatrix} \\
&= \text{a big matrix with lots of fractions},
\end{aligned}$$
hence the calculator on the quiz.
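For readers who would rather not push the fractions around by hand, here is a short NumPy computation of the same projection matrix (my own addition, not part of the notes) via $QQ^T$.

```python
import numpy as np

# The spanning vectors from Example 3.1 above.
v1 = np.array([2.0, 2.0, 2.0, 2.0, 3.0])
v2 = np.array([3.0, 3.0, 1.0, -1.0, -4.0])
assert np.isclose(v1 @ v2, 0.0)        # orthogonal, as claimed

# Columns of Q are the normalized vectors u_1, u_2.
Q = np.column_stack([v1 / np.linalg.norm(v1), v2 / np.linalg.norm(v2)])

P = Q @ Q.T                            # the 5 x 5 "big matrix of fractions"
print(P.round(4))

# Sanity checks: projecting twice is the same as projecting once,
# and vectors already in V are left unchanged.
print(np.allclose(P @ P, P), np.allclose(P @ v1, v1))
```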
4. Orthogonal Matrices
To define orthogonal matrices, let me begin by recalling an important fact about invertible matrices.
Theorem 4.1. An $n \times n$ matrix $A$ is invertible if and only if the columns of $A$ form a basis of $\mathbb{R}^n$.
We have already seen that among all possible bases of $\mathbb{R}^n$ the orthonormal bases are particularly nice. Thus it stands to reason that an invertible $n \times n$ matrix whose columns form an orthonormal basis of $\mathbb{R}^n$ should be particularly nice as well. Following this line of thought we make the following definition.
Definition 4.2. An $n \times n$ matrix $Q$ is called orthogonal if the columns of $Q$ form an orthonormal basis of $\mathbb{R}^n$.
Let's start trying to understand why such a matrix is worth defining. To begin with, we will see that the inverse of such a matrix is almost trivial to compute. First note that the projection $\operatorname{proj}_V$ makes sense for any subspace $V$ of $\mathbb{R}^n$, including $\mathbb{R}^n$ itself. So if we fix an orthogonal matrix $Q$, we may compute the matrix of the orthogonal projection $\operatorname{proj}_{\mathbb{R}^n} : \mathbb{R}^n \to \mathbb{R}^n$ as $\operatorname{proj}_{\mathbb{R}^n} = QQ^T$. Okay, we already knew this formula. But remember, if we apply the orthogonal projection onto $V$ to a vector that already lies in $V$, nothing happens: the vector already lies in $V$, so its projection onto $V$ is the vector itself! If we think this through for $\mathbb{R}^n$, we see that the orthogonal projection of $\mathbb{R}^n$ onto $\mathbb{R}^n$ is just the identity map, i.e. its matrix is $I_n$. So we see that for any orthogonal matrix we have
$$QQ^T = I_n$$
In other words, we just proved the following nice theorem.
Theorem 4.3. For any orthogonal matrix $Q$, we have $Q^{-1} = Q^T$.


That means we don't have to do any work to find the inverse! Let me emphasize how nice this is by reminding you that we find the inverse of an arbitrary invertible matrix $A$ by forming the augmented matrix $(\,A \mid I_n\,)$ and then using Gauss-Jordan elimination; again, this gets to be very tedious if the matrices involved are large.
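A quick numerical illustration of Theorem 4.3, assuming we grab an orthogonal matrix from a QR factorization of a random square matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# The Q factor of a QR factorization has orthonormal columns,
# so it is an orthogonal matrix.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

# Q^{-1} (computed the hard way) agrees with Q^T.
print(np.allclose(np.linalg.inv(Q), Q.T))   # True
print(np.allclose(Q @ Q.T, np.eye(4)))      # True
```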
Example 4.4. Consider the counter-clockwise rotation of $\mathbb{R}^2$ about the origin by an angle $\theta$. The matrix of this transformation is given by
$$\operatorname{rot}_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
These columns are clearly orthogonal, and after a second's thought it is easy to see that they also have unit length. Thus the rotation is an orthogonal transformation of $\mathbb{R}^2$. Notice in particular that this is a rigid motion of the plane; that is, it doesn't change the lengths of vectors or the angles between vectors.
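A minimal numerical check of this example, for one arbitrarily chosen angle:

```python
import numpy as np

theta = 0.7                                   # an arbitrary angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# The columns are orthonormal, so R is orthogonal: R^T R = I.
print(np.allclose(R.T @ R, np.eye(2)))        # True

# A rigid motion: rotating a vector does not change its length.
x = np.array([3.0, -1.0])
print(np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x)))   # True
```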
It turns out that this last observation always holds: any orthogonal transformation preserves
lengths of vectors and angles between vectors.
Theorem 4.5. For any orthogonal $n \times n$ matrix $Q$ and vector $\vec{x} \in \mathbb{R}^n$, we have $\|Q\vec{x}\| = \|\vec{x}\|$.
To finish, we will prove this result; it will be a homework exercise to check that an orthogonal transformation also preserves the angles between vectors.
Proof. To do this we need to recall the correct definition of $\|\vec{x}\|$. One may write the length of $\vec{x}$ as
$$\|\vec{x}\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$$
but this will not suffice for our purposes. Recall that the quantity under the radical is exactly $\vec{x} \cdot \vec{x}$, and that we have reinterpreted this in terms of matrix multiplication:

(4.1)    $\|\vec{x}\| = \sqrt{\vec{x} \cdot \vec{x}} = \sqrt{\vec{x}^T \vec{x}}$

We then apply (4.1) to the vector $Q\vec{x}$, where we will need to remember that $(Q\vec{x})^T = \vec{x}^T Q^T$ and that $Q^T Q = I_n$ (this follows from Theorem 4.3, since $Q^T = Q^{-1}$):
$$\|Q\vec{x}\| = \sqrt{(Q\vec{x})^T Q\vec{x}} = \sqrt{\vec{x}^T Q^T Q \vec{x}} = \sqrt{\vec{x}^T I_n \vec{x}} = \sqrt{\vec{x}^T \vec{x}} = \|\vec{x}\|. \qquad \square$$
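To close, a small NumPy sketch (an illustration, not a replacement for the proof) of both Theorem 4.5 and the homework claim: since dot products are preserved, angles are too.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # a random orthogonal matrix
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# Theorem 4.5: lengths are preserved.
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # True

# Homework claim: angles are preserved as well, since dot products are.
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                   # True
```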



