
Orthogonality

Orthogonality 1 / 116
Goals

1 Orthogonality and linear independence

2 Orthogonalize a basis of a vector (sub)space with the Gram-Schmidt
orthogonalization algorithm
3 Orthogonal diagonalization and applications

Orthogonality 2 / 116
Orthogonality

Table of contents

1 Orthogonality
Orthogonality
Orthogonalization
Orthogonal Complement
Projection

2 Orthogonal Diagonalization

3 Singular Value Decomposition (SVD)

4 Positive Definite Matrices

5 An Application to Quadratic Forms

Orthogonality 3 / 116
Orthogonality Orthogonality

Dot product, length


   
Two vectors x = [x1 , . . . , xn ]^T and y = [y1 , . . . , yn ]^T in Rn

Dot product
Their dot product is

x · y = x1 y1 + · · · + xn yn

which is the matrix product xT y.

Length of a vector
The length of x is ∥x∥ = √(x1^2 + · · · + xn^2 )

Orthogonality 4 / 116
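A quick numerical illustration of the definitions above, written as a small Python/numpy sketch (an addition of this note, not part of the original slides); the vectors x and y are arbitrary examples:

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 2.0])

dot = x @ y                    # x . y = x1*y1 + ... + xn*yn
length = np.sqrt(x @ x)        # ||x|| = sqrt(x1^2 + ... + xn^2)

print(dot)                          # 8.0, also the matrix product x^T y
print(length, np.linalg.norm(x))    # both give sqrt(14)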
Orthogonality Orthogonality

Properties of dot product and length

Let x, y and z be vectors in Rn . Then


1 x · y = y · x

2 x · (y + z) = x · y + x · z

3 (ax) · y = a(x · y) = x · (ay) for all scalars a ∈ R

4 ∥x∥^2 = x · x

5 ∥x∥ ≥ 0, and ∥x∥ = 0 if and only if x = 0

6 ∥ax∥ = |a| ∥x∥ for all scalars a ∈ R

Orthogonality 5 / 116
Orthogonality Orthogonality

Distance
Definition (Euclidean distance)

The distance between two vectors x and y in Rn is

d(x, y) = ∥x − y∥

Properties
Let x, y, z be three vectors in Rn . Then
1 d(x, y) ≥ 0
2 d(x, y) = 0 if and only if x = y
3 d(x, y) = d(y, x)
4 d(x, z) ≤ d(x, y) + d(y, z)

Orthogonality 6 / 116
Orthogonality Orthogonality

Orthogonal and Orthogonal Sets

Orthogonal
Two vectors x and y in Rn are orthogonal if x · y = 0

Orthogonal sets
A set of vectors x1 , x2 , · · · , xk in Rn is called an orthogonal set if

xi · xj = 0 ∀i ̸= j and xi ̸= 0 ∀i

A set of vectors x1 , x2 , · · · , xk in Rn is called orthonormal if it is
orthogonal and each xi is a unit vector, that is

∥xi ∥ = 1 ∀i

Orthogonality 7 / 116
Orthogonality Orthogonality

Example
The standard basis {e1 , . . . , en } is an orthonormal set in Rn

Example
If {x1 , x2 , · · · , xk } is an orthogonal set then so is {a1 x1 , a2 x2 , · · · , ak xk }
for all nonzero scalars ai

Normalizing an Orthogonal Set


If {x1 , x2 , · · · , xk } is an orthogonal set then { (1/∥x1 ∥) x1 , (1/∥x2 ∥) x2 , · · · , (1/∥xk ∥) xk }
is an orthonormal set.

Orthogonality 8 / 116
Orthogonality Orthogonality

Example

     
If f1 = [1, 1, 1, −1]^T , f2 = [1, 0, 1, 2]^T and f3 = [−1, 0, 1, 0]^T then {f1 , f2 , f3 } is an orthogonal
set in R4 . After normalizing, the orthonormal set is

{ (1/2) f1 , (1/√6) f2 , (1/√2) f3 }

Orthogonality 9 / 116
Orthogonality Orthogonality

Orthogonality implies Linear Independence


Suppose that {x1 , x2 , · · · , xk } is an orthogonal set. Consider

t1 x1 + t2 x2 + · · · + tk xk = 0

Taking the dot product of both sides with x1 , we have

t1 ∥x1 ∥^2 + t2 (x2 · x1 ) + · · · + tk (xk · x1 ) = 0

Since the set is orthogonal, x2 · x1 = · · · = xk · x1 = 0. Hence

t1 ∥x1 ∥^2 = 0 ⇒ t1 = 0 (because ∥x1 ∥^2 ̸= 0)

Similarly t2 = · · · = tk = 0. Hence {x1 , x2 , · · · , xk } is linearly independent

Theorem
Every orthogonal set in Rn is linearly independent

Orthogonality 10 / 116
Orthogonality Orthogonality

Theorem (Expansion Theorem)


If f1 , f2 , . . . , fm is an orthogonal basis of a subspace U of Rn then for any
vector x ∈ U , we have
x = (x · f1 / ∥f1 ∥^2) f1 + (x · f2 / ∥f2 ∥^2) f2 + · · · + (x · fm / ∥fm ∥^2) fm

The expansion of x as a linear combination of the orthogonal basis
{f1 , f2 , . . . , fm } is called the Fourier expansion of x, and the coefficients
ti = (x · fi )/∥fi ∥^2 are called the Fourier coefficients.

Proof.
Suppose that
x = t1 f1 + · · · + tm fm
then
x · f1 = t1 (f1 · f1 ) = t1 ∥f1 ∥^2

So t1 = (x · f1 )/∥f1 ∥^2 . Similarly ti = (x · fi )/∥fi ∥^2 for i = 2, . . . , m.
Orthogonality 11 / 116
Orthogonality Orthogonality

Example
Let U = span{f1 , f2 , f3 } where f1 = [1, 1, 1, −1]^T , f2 = [1, 0, 1, 2]^T and
f3 = [−1, 0, 1, 0]^T . Then {f1 , f2 , f3 } is an orthogonal set and hence a basis of U .
Every vector x = (a, b, c, d) in U can be expanded as a linear combination of
{f1 , f2 , f3 } with Fourier coefficients

t1 = (x · f1 )/∥f1 ∥^2 = (a + b + c − d)/4
t2 = (x · f2 )/∥f2 ∥^2 = (a + c + 2d)/6
t3 = (x · f3 )/∥f3 ∥^2 = (−a + c)/2

That is
x = t1 f1 + t2 f2 + t3 f3 = . . .
Orthogonality 12 / 116
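A minimal numerical check of the Fourier expansion in this example, assuming Python with numpy; the vector x below is an arbitrary element of U built from the basis:

import numpy as np

f1 = np.array([1.0, 1.0, 1.0, -1.0])
f2 = np.array([1.0, 0.0, 1.0, 2.0])
f3 = np.array([-1.0, 0.0, 1.0, 0.0])

x = 2*f1 - f2 + 3*f3          # a vector known to lie in U = span{f1, f2, f3}

# Fourier coefficients t_i = (x . f_i) / ||f_i||^2
t = [(x @ f) / (f @ f) for f in (f1, f2, f3)]

print(t)                                              # [2.0, -1.0, 3.0]
print(np.allclose(x, t[0]*f1 + t[1]*f2 + t[2]*f3))    # True: x is recovered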
Orthogonality Orthogonalization

Recall: if {v1 , ..., vm } is linearly independent, and if vm+1 is not in
span{v1 , ..., vm }, then {v1 , ..., vm , vm+1 } is linearly independent

Orthogonal Lemma
Let {f1 , . . . , fm } be an orthogonal set in Rn . Given x ∈ Rn , write

fm+1 = x − (x · f1 / ∥f1 ∥^2) f1 − · · · − (x · fm / ∥fm ∥^2) fm

then
1 fm+1 · fk = 0 for k = 1, · · · , m
2 If x ∈/ span{f1 , . . . , fm } then fm+1 ̸= 0 and {f1 , . . . , fm , fm+1 } is an
orthogonal set

Orthogonality 13 / 116
Orthogonality Orthogonalization

One important consequence of the orthogonal lemma is an extension, for
orthogonal sets, of the fundamental fact that any independent set is part of
a basis
Theorem
Let U be a subspace in Rn
1 Every orthogonal set in U is a subset of an orthogonal basis of U
2 U has an orthogonal basis

Orthogonality 14 / 116
Orthogonality Orthogonalization

The second consequence of the orthogonal lemma is a procedure by which


any basis of a subspace U of Rn can be systematically modified to yield an
orthogonal basis of U
Gram-Schmidt Orthogonalization Algorithm
If {x1 , · · · , xm } is any basis of a subspace U of Rn , construct
f1 , f2 , · · · , fm in U as follows:

f1 = x1
f2 = x2 − (x2 · f1 / ∥f1 ∥^2) f1
...
fk = xk − (xk · f1 / ∥f1 ∥^2) f1 − (xk · f2 / ∥f2 ∥^2) f2 − · · · − (xk · fk−1 / ∥fk−1 ∥^2) fk−1

for k = 2, . . . , m. Then
1 f1 , f2 , · · · , fm is an orthogonal basis of U
2 span(f1 , · · · , fk ) = span(x1 , · · · , xk ) for all k = 1, . . . , m
Orthogonality 15 / 116
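The algorithm translates directly into code. Below is a sketch in Python/numpy (an assumption of this note, not from the slides); gram_schmidt is a hypothetical helper name, applied here to the basis of the example on the next slide:

import numpy as np

def gram_schmidt(xs):
    # orthogonalize a list of linearly independent vectors
    fs = []
    for x in xs:
        f = x.astype(float)
        for g in fs:
            f = f - (x @ g) / (g @ g) * g   # subtract the projection onto each earlier f
        fs.append(f)
    return fs

x1 = np.array([1, 1, -1, -1]); x2 = np.array([3, 2, 0, 1]); x3 = np.array([1, 0, 1, 0])
f1, f2, f3 = gram_schmidt([x1, x2, x3])
print(f1, f2, f3)    # [1 1 -1 -1], [2 1 1 2], [0.4 -0.3 0.7 -0.6] up to float formatting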
Orthogonality Orthogonalization

Example

Find an orthogonal basis of U = span{x1 , x2 , x3 } with


     
x1 = [1, 1, −1, −1]^T , x2 = [3, 2, 0, 1]^T , x3 = [1, 0, 1, 0]^T

Orthogonality 17 / 116
Orthogonality Orthogonalization

Solution
Observe that x1 , x2 and x3 are independent. The algorithm gives

f1 = x1 = [1, 1, −1, −1]^T

f2 = x2 − (x2 · f1 / ∥f1 ∥^2) f1 = x2 − (4/4) f1 = [2, 1, 1, 2]^T

f3 = x3 − (x3 · f1 / ∥f1 ∥^2) f1 − (x3 · f2 / ∥f2 ∥^2) f2 = x3 − (0/4) f1 − (3/10) f2 = (1/10) [4, −3, 7, −6]^T

Hence {f1 , f2 , f3 } is an orthogonal basis.
Remark
The orthogonality property does not change if a vector in the basis is multiplied by a
nonzero scalar. It may be convenient to eliminate fractions and use {f1 , f2 , 10f3 } as an
orthogonal basis of U
Orthogonality 18 / 116
Orthogonality Orthogonalization

Remark

In order to prove that {x1 , x2 , x3 } is an independent set, we can solve the
system of equations
sx1 + tx2 + ux3 = 0
to obtain the unique solution s = t = u = 0
Another possible procedure is as follows:
1 x1 ̸= 0, so {x1 } is an independent set
2 f2 ̸= 0, so x2 ∈/ span{f1 } = span{x1 }. That is, x1 and x2 are
independent
3 f3 ̸= 0, so x3 ∈/ span{f1 , f2 } = span{x1 , x2 }. Hence x1 , x2 , x3 are
independent.
The second approach is a consequence of the Gram-Schmidt orthogonalization
algorithm. It can be used to find an orthogonal basis of a subspace
spanned by a set of vectors

Orthogonality 19 / 116
Orthogonality Orthogonalization

Example

 
Find an orthogonal basis of U = span{x1 , x2 , x3 , x4 } where x1 = [0, 1, 0]^T ,
x2 = [1, 0, 1]^T , x3 = [1, 1, 1]^T , x4 = [1, 1, 3]^T

Orthogonality 20 / 116
Orthogonality Orthogonalization

Solution
The algorithm gives
f1 = x1
f2 = x2 − (x2 · f1 / ∥f1 ∥^2) f1 = x2 − 0 f1 = [1, 0, 1]^T ̸= 0
So f2 ∈/ span{f1 } = span{x1 }. It implies that f1 and f2 are
independent and span{f1 , f2 } = span{x1 , x2 }
f3 = x3 − (x3 · f1 / ∥f1 ∥^2) f1 − (x3 · f2 / ∥f2 ∥^2) f2 = x3 − (1/1) f1 − (2/2) f2 = [0, 0, 0]^T = 0
So x3 ∈ span{f1 , f2 } = span{x1 , x2 }. Hence
span{x1 , x2 , x3 } = span{x1 , x2 } = span{f1 , f2 }
and we do not need to pay attention to f3
f4 = x4 − (x4 · f1 / ∥f1 ∥^2) f1 − (x4 · f2 / ∥f2 ∥^2) f2 = x4 − (1/1) f1 − (4/2) f2 = [−1, 0, 1]^T ̸= 0
So x4 ∈/ span{f1 , f2 } = span{x1 , x2 , x3 }. Hence
span{x1 , x2 , x3 , x4 } = span{x1 , x2 , x4 } = span{f1 , f2 , f4 }
An orthogonal basis of U = span{x1 , x2 , x3 , x4 } is {f1 , f2 , f4 }
Orthogonality 21 / 116
Orthogonality Orthogonal Complement

Problem motivation
Suppose a point x and a plane U through the origin in R3 are given, and
we want to find the point p in the plane that is closest to x. Our
geometric intuition assures us that such a point p exists. In fact, p must
be chosen in such a way that x − p is perpendicular to the plane.

Orthogonal Complement
If U is a subspace of Rn , define the orthogonal complement U ⊥ of U
(pronounced ”U -perp”) by

U ⊥ = {x ∈ Rn | x · y = 0 ∀y ∈ U }

Orthogonality 22 / 116
Orthogonality Orthogonal Complement

Properties of the orthogonal complement

Lemma
Let U be a subspace of Rn
1 U ⊥ is a subspace of Rn
2 {0}⊥ = Rn and (Rn )⊥ = {0}
3 If U = span{x1 , . . . , xm } then
U ⊥ = {x ∈ Rn | x · xi = 0 for i = 1, . . . , m}

Orthogonality 23 / 116
Orthogonality Orthogonal Complement

Example
Find U ⊥ if U = span{[1, −1, 2, 0]^T , [1, 0, −2, 3]^T } in R4 .

Solution
x = [x, y, z, w]^T is in U ⊥ if and only if it is orthogonal to both
[1, −1, 2, 0]^T and [1, 0, −2, 3]^T , that is

x − y + 2z = 0 (1)
x − 2z + 3w = 0 (2)

Gaussian elimination gives U ⊥ = span{[2, 4, 1, 0]^T , [3, 3, 0, −1]^T }

Orthogonality 24 / 116
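A numerical way to obtain U ⊥ for this example, assuming numpy: U ⊥ is the null space of the matrix whose rows are the spanning vectors, and the null space can be read off from the SVD (a sketch, not the only method):

import numpy as np

M = np.array([[1.0, -1.0,  2.0, 0.0],
              [1.0,  0.0, -2.0, 3.0]])      # rows span U

_, s, Vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-10))
basis_Uperp = Vt[rank:]                     # rows of Vt beyond the rank span the null space

print(np.allclose(M @ basis_Uperp.T, 0))    # True: orthogonal to both spanning vectors
print(basis_Uperp.shape)                    # (2, 4): dim U + dim U-perp = 4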
Orthogonality Projection

Projection onto a Subspace of Rn

Let U be a subspace of Rn with orthogonal basis {f1 , . . . , fm }. For x in Rn ,
the vector

projU x = (x · f1 / ∥f1 ∥^2) f1 + · · · + (x · fm / ∥fm ∥^2) fm

is called the orthogonal projection of x on U .

For the zero subspace U = {0}, we define

proj{0} x = 0

Orthogonality 26 / 116
Orthogonality Projection

Projection onto a line in R3

Consider vectors x and d ̸= 0 in R3 . The projection p = projd x is defined
as

p = projd x = (x · d / ∥d∥^2) d

where the error e = x − p is orthogonal (perpendicular) to d.

Orthogonality 27 / 116
Orthogonality Projection

Theorem (Projection Theorem)


If U is a subspace of Rn and p = projU x, then
1 p ∈ U and x − p ∈ U ⊥
2 p is the vector in U closest to x in the sense that

∥x − p∥ < ∥x − y∥ ∀y ∈ U, y ̸= p

Orthogonality 28 / 116
Orthogonality Projection

Example
Let U = span{x1 , x2 } where x1 = [1, 1, 0, 1]^T and x2 = [0, 1, 1, 2]^T . If
x = [3, −1, 0, 2]^T , find the vector in U closest to x and express x as the
sum of a vector in U and a vector orthogonal to U .

Orthogonality 29 / 116
Orthogonality Projection

Solution
{x1 , x2 } are independent but not orthogonal. The Gram-Schmidt algorithm
gives an orthogonal basis {f1 , f2 } of U where f1 = x1 = [1, 1, 0, 1]^T and
f2 = x2 − (x2 · f1 / ∥f1 ∥^2) f1 = x2 − (3/3) f1 = [−1, 0, 1, 1]^T
Compute the projection using the orthogonal basis {f1 , f2 }:

p = projU x = (x · f1 / ∥f1 ∥^2) f1 + (x · f2 / ∥f2 ∥^2) f2 = (4/3) f1 − (1/3) f2 = (1/3) [5, 4, −1, 3]^T

Thus p is the vector in U closest to x and x − p = (1/3) [4, −7, 1, 3]^T is
orthogonal to every vector in U . The decomposition of x is

x = p + (x − p) = . . .

Orthogonality 30 / 116
Orthogonality Projection

Projection Onto a Subspace


Let {a1 , . . . , an } be a basis of a subspace U in Rm

Problem: find the combination p = x̂1 a1 + · · · + x̂n an closest to a
vector b ∈ Rm .

We need to find p = Ax̂, the vector in the column space of
A = [a1 . . . an ] closest to b. The error vector e = b − Ax̂ is
perpendicular to that space:

a1^T (b − Ax̂) = 0
. . .
an^T (b − Ax̂) = 0

or equivalently AT (b − Ax̂) = 0. Hence

AT Ax̂ = AT b

Orthogonality 31 / 116
Orthogonality Projection

If the ai are linearly independent then AT A is symmetric and invertible, so

x̂ = (AT A)−1 AT b

So
p = Ax̂ = A(AT A)−1 AT b
The matrix
P = A(AT A)−1 AT
is the projection matrix such that p = Pb

Orthogonality 32 / 116
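A short numpy sketch of the normal equations and the projection matrix P = A(AT A)−1 AT , using the A and b that appear in the exercise and example later in this section:

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # solve A^T A x_hat = A^T b
P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix onto col(A)
p = P @ b                                   # projection of b, equal to A x_hat

print(x_hat)                                # [ 5. -3.]
print(np.allclose(p, A @ x_hat))            # True
print(np.allclose(A.T @ (b - p), 0))        # error b - p is perpendicular to col(A)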
Orthogonality Projection

Visualization of projection onto a line and onto S =


column space of A

Let θ be the angle between b and the line through the vector a. Then the
projection p of b on the line has length

∥p∥ = (∥a∥ ∥b∥ cos θ / ∥a∥^2) ∥a∥ = ∥b∥ cos θ

and the length of the error is

∥e∥ = ∥b∥ sin θ
Orthogonality 33 / 116
Orthogonality Projection

Projection in R3

p1 is projection of b on the line Oz - axis = span{e3 } and p2 is the


projection of b onto the plane Oxy = span{e1 , e2 }
Orthogonality 34 / 116
Orthogonality Projection

Projection on Oz
Matrix of the subspace: A = e3 = [0, 0, 1]^T

So AT A = [0 0 1] [0, 0, 1]^T = 1. Hence the projection matrix is

P1 = A(AT A)−1 AT = [0, 0, 1]^T [0 0 1] =
[0 0 0]
[0 0 0]
[0 0 1]

The projection of b = [x, y, z]^T is

p1 = P1 b = [0, 0, z]^T
Orthogonality 35 / 116
Orthogonality Projection

Projection onto Oxy

Matrix of the subspace:
A = [e1 e2 ] =
[1 0]
[0 1]
[0 0]

So AT A =
[1 0]
[0 1]
(the 2 × 2 identity). Hence the projection matrix is

P2 = A(AT A)−1 AT =
[1 0 0]
[0 1 0]
[0 0 0]

The projection of b = [x, y, z]^T is

p2 = P2 b = [x, y, 0]^T
Orthogonality 36 / 116
Orthogonality Projection

Exercise

Let U = span{x1 , x2 } where x1 = [1, 1, 0, 1]^T and x2 = [0, 1, 1, 2]^T . Find
the projection of b = [3, −1, 0, 2]^T on U .

Orthogonality 37 / 116
Orthogonality Projection

Exercise

Let
A =
[1 0]
[1 1]
[1 2]
and b = [6, 0, 0]^T .

Find the projection of b on the column space of A.

Orthogonality 38 / 116
Orthogonality Projection

Application to Least Squares Approximation - Linear Regression

1 Solving AT Ax̂ = AT b gives the projection p = Ax̂ of b onto the
column space of A
2 When Ax = b has no solution then x̂ is the ”least-squares solution”:
∥b − Ax̂∥^2 = minimum
3 Setting the partial derivatives of the squared length of the error
E = ∥b − Ax̂∥^2 to zero (∂E/∂ x̂i = 0) also produces AT Ax̂ = AT b
4 To fit points (t1 , b1 ), . . . , (tm , bm ) by a straight line y = k + at, we need
to solve k + at1 = b1 , . . . , k + atm = bm , that is Ax = b with
A =
[1 t1 ]
[.  . ]
[1 tm ]
and x = [k, a]^T

Orthogonality 39 / 116
Orthogonality Projection

Example
Find the closest line to the points (0, 6), (1, 0) and (2, 0)

Solution
We have
A =
[1 0]
[1 1]
[1 2]
and b = [6, 0, 0]^T .

The coefficients k, a of the fitted line are given by

[k, a]^T = x̂ = (AT A)−1 AT b = [5, −3]^T

So the fitted line is y = 5 − 3t

Orthogonality 40 / 116
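The same fit can be done with numpy's built-in least-squares routine (an assumed illustration, not part of the slides):

import numpy as np

t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])        # columns [1, t] for y = k + a t
(k, a), *_ = np.linalg.lstsq(A, b, rcond=None)   # minimizes ||b - Ax||^2

print(k, a)                                      # 5.0 -3.0, the line y = 5 - 3t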
Orthogonality Projection

Fitting by a Parabola
Galileo dropping a stone from the Leaning Tower of Pisa

Fit heights b1 , . . . , bm at times t1 , . . . , tm by a parabola C + Dt + Et^2

An exact fit would solve the system
C + Dt1 + Et1^2 = b1
. . .
C + Dtm + Etm^2 = bm
that is Ax = b with
A =
[1 t1 t1^2 ]
[.  .   .  ]
[1 tm tm^2 ]
which is generally unsolvable

Least squares: the closest parabola C + Dt + Et^2 chooses x̂ = [C, D, E]^T to
satisfy AT Ax̂ = AT b

Orthogonality 41 / 116
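A sketch of the parabola fit in numpy; the times and heights below are made-up illustrative values, not Galileo's data:

import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])          # hypothetical measurement times
b = np.array([100.0, 95.2, 80.5, 56.1, 21.9])    # hypothetical measured heights

A = np.column_stack([np.ones_like(t), t, t**2])  # columns [1, t, t^2]
C, D, E = np.linalg.lstsq(A, b, rcond=None)[0]   # least-squares solution of A^T A x_hat = A^T b

print(C, D, E)                                   # closest parabola C + D t + E t^2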
Orthogonal Diagonalization

Table of contents

1 Orthogonality
Orthogonality
Orthogonalization
Orthogonal Complement
Projection

2 Orthogonal Diagonalization

3 Singular Value Decomposition (SVD)

4 Positive Definite Matrices

5 An Application to Quadratic Forms

Orthogonality 42 / 116
Orthogonal Diagonalization

Question

An n × n matrix A is diagonalizable if and only if it has n independent
eigenvectors. The matrix P with these eigenvectors as columns is a
diagonalizing matrix for A, that is

P −1 AP is diagonal

The really nice bases of Rn are the orthogonal ones. So which matrices
have an orthogonal basis of eigenvectors?

The answer to this question is the main result of this section: exactly the
symmetric matrices

Orthogonality 43 / 116
Orthogonal Diagonalization

Normalize an orthogonal set

Recall that an orthogonal set of vectors is orthonormal if every vector has
length 1. Any orthogonal set {v1 , . . . , vk } can be normalized, i.e. converted
to an orthonormal set { (1/∥v1 ∥) v1 , . . . , (1/∥vk ∥) vk }

Orthogonality 44 / 116
Orthogonal Diagonalization

Orthogonal matrix

Theorem
The following conditions are equivalent for a square matrix P
1 P is invertible and P −1 = P T
2 The rows of P are orthonormal
3 The columns of P are orthonormal

Definition
A square matrix P is called an orthogonal matrix if it satisfies one of
the above conditions

Orthogonality 45 / 116
Orthogonal Diagonalization

Example
The rotation matrix
[cos θ  − sin θ]
[sin θ    cos θ]
is orthogonal for any angle θ

Orthogonality 46 / 116
Orthogonal Diagonalization

It is not enough that the rows of a matrix A are merely orthogonal for A
to be an orthogonal matrix.
Example
The matrix
[ 2  1  1]
[−1  1  1]
[ 0 −1  1]
has orthogonal rows, but its columns are not orthogonal.

However, if the rows are normalized, then the resulting matrix
[ 2/√6   1/√6   1/√6]
[−1/√3   1/√3   1/√3]
[ 0     −1/√2   1/√2]
is orthogonal

Orthogonality 47 / 116
Orthogonal Diagonalization

Example
if P and Q are orthogonal matrices then P Q and P are also orthogonal

Orthogonality 48 / 116
Orthogonal Diagonalization

Example
If P and Q are orthogonal matrices then P Q and P −1 are also orthogonal

Solution
Prove that P Q is orthogonal
P and Q are invertible and so is P Q with

(P Q)−1 = Q−1 P −1

Because P and Q are orthogonal, we have P −1 = P T and Q−1 = QT .


Hence
(P Q)−1 = QT P T = (P Q)T
Thus P Q is orthogonal

Orthogonality 48 / 116
Orthogonal Diagonalization

Solution (Cont)
Prove that P −1 is orthogonal
It is clear that P −1 is invertible with

(P −1 )−1 = P

Moreover P is orthogonal, so P −1 = P T . Hence

(P −1 )T = (P T )T = P

Thus
(P −1 )−1 = (P −1 )T
So P −1 is orthogonal

Orthogonality 49 / 116
Orthogonal Diagonalization

Definition (Orthogonally Diagonalizable Matrices)


An n × n matrix A is said to be orthogonally diagonalizable if there exists
an orthogonal matrix P such that P −1 AP = P T AP is diagonal

This condition turns out to characterize the symmetric matrices

Theorem (Principal Axes Theorem)

The following conditions are equivalent for a square matrix A
1 A has an orthonormal set of n eigenvectors
2 A is orthogonally diagonalizable
3 A is symmetric

A set of orthonormal eigenvectors of a symmetric matrix A is called a set
of principal axes for A. The name comes from geometry and this is
discussed in the application to quadratic forms later

Orthogonality 50 / 116
Orthogonal Diagonalization

Theorem
If A is symmetric then
(Ax) · y = x · (Ay)
for all column vectors x, y ∈ Rn

Theorem
If A is a symmetric matrix, then eigenvectors of A corresponding to
distinct eigenvalues are orthogonal.

Orthogonality 51 / 116
Orthogonal Diagonalization

Example
Find an orthogonal matrix P such that P −1 AP is diagonal, where
A =
[ 1  0 −1]
[ 0  1  2]
[−1  2  5]

Orthogonality 52 / 116
Orthogonal Diagonalization

Solution
The characteristic polynomial of A is

cA (x) = det(A − xI) = −x(x − 1)(x − 6)

Thus the eigenvalues are λ1 = 0, λ2 = 1 and λ3 = 6. The corresponding
eigenvectors are

x1 = [1, −2, 1]^T , x2 = [2, 1, 0]^T , x3 = [−1, 2, 5]^T

Orthogonality 53 / 116
Orthogonal Diagonalization

These vectors are orthogonal, so they only need to be normalized to create an
orthogonal diagonalizing matrix

P = [x1 /∥x1 ∥  x2 /∥x2 ∥  x3 /∥x3 ∥] = (1/√30)
[ √5   2√6  −1]
[−2√5   √6   2]
[ √5     0   5]

Thus P −1 = P T and

P T AP =
[0 0 0]
[0 1 0]
[0 0 6]

Orthogonality 54 / 116
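A numerical check of this diagonalization, assuming numpy (eigh is designed for symmetric matrices and returns orthonormal eigenvectors as columns):

import numpy as np

A = np.array([[ 1.0, 0.0, -1.0],
              [ 0.0, 1.0,  2.0],
              [-1.0, 2.0,  5.0]])

lam, P = np.linalg.eigh(A)                      # eigenvalues in ascending order

print(lam)                                      # approximately [0, 1, 6]
print(np.allclose(P.T @ P, np.eye(3)))          # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(lam)))   # True: P^T A P is diagonal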
Singular Value Decomposition (SVD)

Table of contents

1 Orthogonality
Orthogonality
Orthogonalization
Orthogonal Complement
Projection

2 Orthogonal Diagonalization

3 Singular Value Decomposition (SVD)

4 Positive Definite Matrices

5 An Application to Quadratic Forms

Orthogonality 55 / 116
Singular Value Decomposition (SVD)

Image processing by linear algebra

1 An image is a large matrix of grayscale values, one for each pixel and
color
2 When nearby pixels are correlated (not random), the image can be
compressed
3 SVD separates any matrix A into rank one pieces (simple pieces)
uvT = (column)(row). This is useful for image compression
4 The rows and columns are eigenvectors of AT A and AAT respectively

Orthogonality 56 / 116
Singular Value Decomposition (SVD)

Example - Low Rank Images


Consider an image represented by the 6 × 6 matrix

A =
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]
[1 1 1 1 1 1]

If you send A entry by entry then you need to use 36 numbers (36
pixels - each pixel requires 8 bits of information).
Orthogonality 57 / 116
Singular Value Decomposition (SVD)

Observe that

A = [1, 1, 1, 1, 1, 1]^T [1 1 1 1 1 1]

Instead of sending all the elements of the matrix A, we can send one column
[1, 1, 1, 1, 1, 1]^T and one row [1 1 1 1 1 1], which requires only 12 numbers

Orthogonality 58 / 116
Singular Value Decomposition (SVD)

With a 300 by 300 image, 90000 numbers become 600

Orthogonality 59 / 116
Singular Value Decomposition (SVD)

Rank 1 pattern A = uvT

Orthogonality 60 / 116
Singular Value Decomposition (SVD)

Rank 2 pattern A = c1 u1 v1^T + c2 u2 v2^T or higher

A =
[1 0]
[1 1]
is equal to A = [1, 1]^T [1 1] − [1, 0]^T [0 1]

If the rank of A is much higher than 2 (as for real images) then A adds up
many rank one pieces

A = Σ_{i=1}^{n} σi ui vi^T for n ≥ 2

We want the small pieces to be discarded with no loss of
visual quality - image compression.

Orthogonality 61 / 116
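A small numpy sketch of this compression idea: keep only the largest rank-one pieces of the SVD (truncated SVD). The matrix B below is just the 2 × 2 pattern above, used as a toy example:

import numpy as np

B = np.array([[1.0, 0.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(B)                 # B = sum_i s_i u_i v_i^T, s sorted descending
B1 = s[0] * np.outer(U[:, 0], Vt[0])        # best rank-1 approximation: keep the largest piece

print(s)                                    # singular values, largest first
print(B1)
print(np.linalg.norm(B - B1))               # error equals the discarded singular value s[1]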
Singular Value Decomposition (SVD)

In other words, we want to decompose an m × n matrix A into

A = U ΣV T

where U, V are orthogonal matrices and the matrix Σ is diagonal

Starting from the fact that AT A is symmetric, there exists an orthogonal
matrix V = [v1 . . . vn ] such that

V T (AT A)V = D

where D is a diagonal matrix whose diagonal entries λ1 , . . . , λn are the
eigenvalues of AT A, that is

AT Avi = λi vi

The decomposition A = U ΣV T leads to AV = U Σ, which requires Avi = σi ui .
Hence AAT ui = (1/σi ) AAT Avi = (1/σi ) A(λi vi ) = (λi /σi ) Avi = λi ui . That is, λi
is also an eigenvalue of AAT and ui is a corresponding eigenvector.
Remark that ∥Avi ∥^2 = λi ∥vi ∥^2 ≥ 0 ⇒ λi ≥ 0, and ∥Avi ∥^2 = ∥σi ui ∥^2 = σi^2 .
Hence σi = √λi
Orthogonality 62 / 116
Singular Value Decomposition (SVD)

Lemma
1 All eigenvalues of AT A and AAT are non-negative
2 AT A and AAT have the same set of positive eigenvalues {λi }

Definition

The real numbers σi = √λi are called the singular values of the matrix A

Theorem (SVD theorem)


Suppose that A is a matrix of rank r and let σ1 ≥ σ2 ≥ · · · ≥ σr > 0 be the
positive singular values of A. Then

A = Σ_{i=1}^{r} σi ui vi^T

where ui and vi are orthonormal eigenvectors corresponding to the eigenvalues
λi = σi^2 of AAT and AT A, called the left singular vectors and right
singular vectors respectively
Orthogonality 63 / 116
Singular Value Decomposition (SVD)

Theorem (SVD theorem (cont))


In other words

A = U ΣV T

where the columns of U and V are orthonormal eigenvectors of AAT and AT A
respectively, and
Σ =
[diag(σ1 , . . . , σr )  0]
[0                      0]
in block form, which is called the singular matrix of A

Geometric meaning: (rotation) × (stretching) × (rotation)


Orthogonality 64 / 116
Singular Value Decomposition (SVD)

SVD algorithm

In order to obtain the SVD of A:

1 Compute AT A and AAT

2 Find the eigenvalues λi of AT A and then the singular values σi = √λi of
A to create the singular matrix Σ
3 Find orthogonal matrices V and U of eigenvectors of AT A and AAT

4 Decompose A = U ΣV T

Orthogonality 65 / 116
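These steps can be checked against numpy's built-in SVD; the matrix below is an arbitrary small example (an assumption, not from the slides):

import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

U, s, Vt = np.linalg.svd(A)                 # A = U diag(s) V^T with s sorted descending

print(s)                                                    # singular values of A
print(np.allclose(U @ np.diag(s) @ Vt, A))                  # True: the factorization reproduces A
print(np.allclose(np.sort(s**2),
                  np.sort(np.linalg.eigvalsh(A.T @ A))))    # True: s_i^2 are eigenvalues of A^T A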
Singular Value Decomposition (SVD)

Example
Find an SVD for A =
[ 0  1]
[−1  0]

Orthogonality 66 / 116
Positive Definite Matrices

Table of contents

1 Orthogonality
Orthogonality
Orthogonalization
Orthogonal Complement
Projection

2 Orthogonal Diagonalization

3 Singular Value Decomposition (SVD)

4 Positive Definite Matrices

5 An Application to Quadratic Forms

Orthogonality 67 / 116
Positive Definite Matrices

Definition (Positive Definite Matrices)


A square matrix is called positive definite if it is symmetric and all its
eigenvalues λ are positive, that is λ > 0.

Because these matrices are symmetric, the principal axes theorem plays a
central role in the theory.
Theorem
If A is positive definite then it is invertible and det(A) > 0

We have the following characterization of positive definite matrices


Theorem
A symmetric matrix A is positive definite if and only if xT Ax > 0 for every
column x ̸= 0 in Rn

The proof of both theorems is based on the orthogonal diagonalization of
symmetric matrices
Orthogonality 68 / 116
Positive Definite Matrices

Example
If U is any invertible n × n matrix then A = U T U is positive definite

Solution
If x ̸= 0 in Rn then

xT Ax = xT U T U x = (U x)T (U x) = ∥U x∥2

Because x ̸= 0 and U is invertible, the vector U x ̸= 0 and then


∥U x∥2 > 0. Thus
xT Ax > 0
Hence A is positive definite.

Orthogonality 69 / 116
Positive Definite Matrices

Principal submatrices

Definition
If A is an n × n matrix, let (r) A denote the r × r submatrix in the upper
left corner of A. The matrices (1) A, (2) A, . . . , (n) A = A are called the
principal submatrices of A

Example
If A =
[10 5 2]
[ 5 3 2]
[ 2 2 3]
then (1) A = [10] , (2) A =
[10 5]
[ 5 3]
and (3) A = A

Orthogonality 70 / 116
Positive Definite Matrices

Theorem
If A is positive definite, so is each principal submatrix (r) A for
r = 1, 2, . . . , n

Proof.
Write
A =
[(r) A  P]
[Q      R]
in block form. For y ̸= 0 in Rr , consider x = [y, 0]^T ∈ Rn . Then x ̸= 0. So

0 < xT Ax = [yT 0T ] [(r) A  P ; Q  R] [y ; 0] = yT ((r) A) y

Hence (r) A is positive definite

Orthogonality 71 / 116
Positive Definite Matrices

Theorem
The following conditions are equivalent for a symmetric n × n matrix A
1 A is positive definite
2 det((r) A) > 0 for each r = 1, 2, . . . , n
3 A = U T U where U is an upper triangular matrix with positive entries
on the main diagonal
Furthermore, the factorization in (3) is unique (called the Cholesky
factorization of A)

Algorithm for the Cholesky Factorization A = U T U for positive


definite matrix A
Step 1 Carry A to an upper triangular matrix U1 with positive diagonal
entries using row operations each of which adds a multiple of a row to
a lower row
Step 2 Obtain U from U1 by dividing each row of U1 by the square root of
the diagonal entry in that row.
Orthogonality 72 / 116
Positive Definite Matrices

Example
 
Find the Cholesky factorization of A =
[10 5 2]
[ 5 3 2]
[ 2 2 3]

Solution
We have (1) A = [10] , (2) A =
[10 5]
[ 5 3]
and (3) A = A. It is easy to verify
that det((1) A) = 10 > 0, det((2) A) = 5 > 0 and det((3) A) = 3 > 0. So A
is positive definite and it has a Cholesky factorization.
Step 1: using the row operations r2 − (1/2)r1 , r3 − (1/5)r1 and then r3 − 2r2 ,

A =
[10 5 2]
[ 5 3 2]
[ 2 2 3]
→
[10  5    2 ]
[ 0 1/2   1 ]
[ 0  1  13/5]
→
[10  5   2 ]
[ 0 1/2  1 ]
[ 0  0  3/5]
= U1

Orthogonality 73 / 116
Positive Definite Matrices

Solution (cont)
Step 2: divide each row of U1 by the square root of its diagonal entry:

U =
[√10  5/√10  2/√10]
[ 0    1/√2    √2 ]
[ 0     0   √(3/5)]

One can verify that A = U T U

Orthogonality 74 / 116
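A numerical check with numpy: np.linalg.cholesky returns the lower triangular factor L with A = L LT , so U = LT is the upper triangular factor found above:

import numpy as np

A = np.array([[10.0, 5.0, 2.0],
              [ 5.0, 3.0, 2.0],
              [ 2.0, 2.0, 3.0]])

L = np.linalg.cholesky(A)        # lower triangular, A = L L^T
U = L.T                          # upper triangular with positive diagonal, A = U^T U

print(U)                         # first row approximately [3.162, 1.581, 0.632] = [sqrt(10), 5/sqrt(10), 2/sqrt(10)]
print(np.allclose(U.T @ U, A))   # True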
An Application to Quadratic Forms

Table of contents

1 Orthogonality
Orthogonality
Orthogonalization
Orthogonal Complement
Projection

2 Orthogonal Diagonalization

3 Singular Value Decomposition (SVD)

4 Positive Definite Matrices

5 An Application to Quadratic Forms

Orthogonality 75 / 116
An Application to Quadratic Forms

Quadratic form
Definition
A quadratic form q in n variables x1 , . . . , xn is a linear combination of
x1^2 , . . . , xn^2 and cross terms x1 x2 , x1 x3 , x2 x3 , . . .

q = Σ_{i=1}^{n} aii xi^2 + Σ_{i<j} (aij + aji ) xi xj = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj

This sum can be written compactly as a matrix product

q(x) = xT Ax

where x = [x1 . . . xn ]^T and A = [aij ].

There is no loss of generality in assuming that xi xj and xj xi have the
same coefficients in the sum for q, so we may assume that A is
symmetric
Orthogonality 76 / 116
An Application to Quadratic Forms

Example
Write q = x1^2 + 3x3^2 + 2x1 x2 − x1 x3 in the form q(x) = xT Ax where A
is a symmetric 3 × 3 matrix.

Solution
The cross terms are 2x1 x2 = x1 x2 + x2 x1 and −x1 x3 = −(1/2) x1 x3 − (1/2) x3 x1 ,
and both x2 x3 and x3 x2 have coefficient zero, as does x2^2 . Hence the required
form is q(x) = xT Ax with

A =
[ 1    1   −1/2]
[ 1    0     0 ]
[−1/2  0     3 ]

Orthogonality 77 / 116
An Application to Quadratic Forms

Problem

Given a symmetric matrix A and the quadratic form

q(x) = xT Ax

The problem is to find new variables y1 , . . . , yn related to x1 , . . . , xn such


that when q is expressed in terms of y1 , . . . , yn , there are no cross terms,
that is
q = b11 y1^2 + b22 y2^2 + · · · + bnn yn^2

If we write y = [y1 . . . yn ]^T then

q = yT Dy where D is a diagonal matrix

Orthogonality 78 / 116
An Application to Quadratic Forms

Solution

The symmetric matrix A can be orthogonally diagonalized: there exists an
orthogonal matrix P (that is, P −1 = P T ) such that

P T AP = D = diag(λ1 , λ2 , . . . , λn )

Define y by
x = P y, equivalently y = P T x
and substitution in q(x) = xT Ax gives

q = (P y)T A(P y) = yT P T AP y = yT Dy = λ1 y1^2 + λ2 y2^2 + · · · + λn yn^2

Orthogonality 79 / 116
An Application to Quadratic Forms

Principal axes
Let λ1 , . . . , λn be the eigenvalues of A (repeated according to their multiplicities)
and let {f1 , . . . , fn } be a corresponding set of orthonormal eigenvectors of A,
called a set of principal axes. Then the orthogonally diagonalizing matrix is

P = [f1 f2 . . . fn ]

and

x = P y = [f1 f2 . . . fn ] [y1 , y2 , . . . , yn ]^T = y1 f1 + y2 f2 + · · · + yn fn

The new variables yi are the coefficients when x is expanded in terms of
the orthonormal basis {f1 , . . . , fn } of Rn . Hence

q = q(x) = λ1 (x · f1 )^2 + · · · + λn (x · fn )^2

Orthogonality 80 / 116
An Application to Quadratic Forms

Example
Find new variables y1 , y2 such that

q = x1^2 + x1 x2 + x2^2

has diagonal form, and find the corresponding principal axes.

Solution
The form can be written as q = xT Ax where

x = [x1 , x2 ]^T and A =
[ 1   1/2]
[1/2   1 ]

The eigenvalues of A are the solutions of

cA (x) = det(xI − A) = x^2 − 2x + 3/4 = 0

which are λ1 = 0.5 and λ2 = 1.5
Orthogonality 81 / 116
An Application to Quadratic Forms

The corresponding orthonormal eigenvectors are the principal axes

f1 = [−1/√2, 1/√2]^T , f2 = [1/√2, 1/√2]^T

The diagonalizing matrix is

P = [f1 f2 ] =
[−1/√2  1/√2]
[ 1/√2  1/√2]

Introduce y = [y1 , y2 ]^T = P T x = (1/√2) [−x1 + x2 , x1 + x2 ]^T ; then

q = (1/2) y1^2 + (3/2) y2^2

Orthogonality 82 / 116
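A quick numerical check of this example, assuming numpy: orthogonally diagonalize A and confirm that q has no cross term in the new variables:

import numpy as np

A = np.array([[1.0, 0.5],
              [0.5, 1.0]])              # matrix of q = x1^2 + x1 x2 + x2^2

lam, P = np.linalg.eigh(A)              # eigenvalues [0.5, 1.5]; columns of P are principal axes
print(lam)

x = np.array([2.0, -1.0])               # an arbitrary test point
y = P.T @ x                             # new variables y = P^T x

print(np.isclose(x @ A @ x, lam[0]*y[0]**2 + lam[1]*y[1]**2))   # True: q = 0.5 y1^2 + 1.5 y2^2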
An Application to Quadratic Forms

Quadratic form of two variables

In the case of two variables x1 and x2 , consider the quadratic form

q = ax1^2 + bx1 x2 + cx2^2 where a, c, and b^2 − 4ac are all nonzero

1 There is a counterclockwise rotation of the x1 - and x2 -axes about the
origin that yields the principal axes
2 The graph of the equation

ax1^2 + bx1 x2 + cx2^2 = 1

is an ellipse if b^2 − 4ac < 0 and a hyperbola if b^2 − 4ac > 0

Orthogonality 83 / 116
An Application to Quadratic Forms

Proof
If b = 0 then q already has no cross term and (1), (2) are clear. So
assume b ̸= 0. Then
A =
[ a   b/2]
[b/2   c ]
and A has characteristic polynomial

cA (x) = x^2 − (a + c)x − (1/4)(b^2 − 4ac)

Denote d = √(b^2 + (a − c)^2 ). Then the eigenvalues of A are

λ1 = (1/2)(a + c − d) and λ2 = (1/2)(a + c + d)

with the corresponding principal axes

f1 = (1/√(b^2 + (a − c − d)^2 )) [a − c − d, b]^T , f2 = (1/√(b^2 + (a − c − d)^2 )) [−b, a − c − d]^T

Orthogonality 84 / 116
An Application to Quadratic Forms

For part 1:
Because ∥f1 ∥ = 1, there exists an angle θ such that

cos θ = (a − c − d)/√(b^2 + (a − c − d)^2 ), sin θ = b/√(b^2 + (a − c − d)^2 )

Then
P = [f1 f2 ] =
[cos θ  − sin θ]
[sin θ    cos θ]
diagonalizes A, and the principal axes

f1 = [cos θ, sin θ]^T = P e1 and f2 = [− sin θ, cos θ]^T = P e2

can always be found by rotating the x1 and x2 axes around the origin
through the angle θ

Orthogonality 85 / 116
An Application to Quadratic Forms

rotating the x1 and x2 axes around the origin through an angle θ to obtain
principal axes
Orthogonality 86 / 116
An Application to Quadratic Forms

For part 2:
We have

det(A) = det
[λ1  0]
[ 0 λ2]
⇒ λ1 λ2 = (1/4)(4ac − b^2 )

In terms of y1 , y2 , the equation becomes

λ1 y1^2 + λ2 y2^2 = 1

whose graph is an ellipse if b^2 < 4ac and a hyperbola if b^2 > 4ac

Orthogonality 87 / 116
An Application to Quadratic Forms

Example
The notation in the previous result for the equation x^2 + xy + y^2 = 1
becomes a = b = c = 1. So the rotation angle θ is found by

cos θ = −1/√2 , sin θ = 1/√2

Hence θ = 3π/4. Thus the principal axes are

f1 = [−1/√2, 1/√2]^T , f2 = [−1/√2, −1/√2]^T

and then

y1 = (1/√2)(−x1 + x2 ), y2 = −(1/√2)(x1 + x2 )

In y1 y2 -coordinates, the equation becomes

(1/2) y1^2 + (3/2) y2^2 = 1
Orthogonality 88 / 116
An Application to Quadratic Forms

The angle θ is chosen such that the new y1 and y2 axes are the axes of
symmetry of the ellipse. The eigenvectors f1 and f2 point along these axes
of symmetry. For this reason, they are called principal axes

Orthogonality 89 / 116
An Application to Constrained Optimization

Table of contents

6 An Application to Constrained Optimization

7 Principle Component Analysis

Orthogonality 90 / 116
An Application to Constrained Optimization

Constrained Optimization

It is a frequent occurrence in applications that a function


q = q(x1 , x2 , ..., xn ) of n variables, called an objective function, is to be
made as large or as small as possible among all vectors x = (x1 , x2 , ..., xn )
lying in a certain region of Rn called the feasible region. A wide variety
of objective functions q arise in practice; our primary concern here is to
examine one important situation where q is a quadratic form.

Orthogonality 91 / 116
An Application to Constrained Optimization

Example
A politician proposes to spend x1 dollars annually on health care and x2
dollars annually on education. She is constrained in her spending by
various budget pressures, and one model of this is that the expenditures x1
and x2 should satisfy a constraint like

5x21 + 3x22 ≤ 15

Since xi ≥ 0 for each i, the feasible region is the shaded area

Orthogonality 92 / 116
An Application to Constrained Optimization

These choices have different effects on voters, and the politician wants to
choose x = (x1 , x2 ) to maximize some measure q = q(x1 , x2 ) of voter
satisfaction. Assume that for any value of c, all points on the graph of
q(x1 , x2 ) = c have the same appeal to voters. Hence the goal is to find the
largest value of c for which the graph of q(x1 , x2 ) = c contains a feasible
point.

Remark that the constraint can be put in the standard form ∥y∥ ≤ 1 with
y1 = x1 /√3 , y2 = x2 /√5 . So we can convert the above problem into finding the
maximum of a quadratic form subject to ∥y∥ ≤ 1 (the unit ball)
Orthogonality 93 / 116
An Application to Constrained Optimization

Theorem
Consider the quadratic form q = xT Ax where A is an n × n symmetric
matrix and let λ1 and λn denote the largest and smallest eigenvalues of
A. Then
1 max{q(x) | ∥x∥ ≤ 1} = λ1 and q(f1 ) = λ1 where f1 is any unit
λ1 -eigenvector
2 min{q(x) | ∥x∥ ≤ 1} = λn and q(fn ) = λn where fn is any unit
λn -eigenvector

Orthogonality 94 / 116
An Application to Constrained Optimization

Proof of (1)
Since A is symmetric, let the real eigenvalues of A be ordered as
λ1 ≥ λ2 ≥ · · · ≥ λn
By the principal axes theorem, let P be an orthogonal matrix such that
P T AP = D = diag(λ1 , λ2 , . . . , λn ) and define y = P T x, equivalently
x = P y; then ∥y∥ = ∥x∥ because ∥y∥2 = yT y = xT P P T x = xT x = ∥x∥2 .
Express q in terms of y, we have
q(x) = q(P y) = (P y)T A(P y) = yT P T AP y = yT Dy = λ1 y12 + · · · + λn yn2
Assume that ∥x∥ ≤ 1, then ∥y∥ = ∥x∥ ≤ 1. Since λi ≤ λ1 for all i, we
have
q(x) = λ1 y12 + · · · + λn yn2 ≤ λ1 y12 + · · · + λ1 yn2 = λ1 (y12 + · · · + yn2 )
= λ1 ∥y∥2 = λ1
Hence λ1 is the maximum value of q(x) when ∥x∥ ≤ 1.

The proof of (2) is analogous


Orthogonality 95 / 116
An Application to Constrained Optimization

Let f1 be a unit eigenvector corresponding to λ1 ; then

q(f1 ) = fT1 Af1 = fT1 (λ1 f1 ) = λ1 fT1 f1 = λ1 ∥f1 ∥2 = λ1

Orthogonality 96 / 116
An Application to Constrained Optimization

Example
Maximize and minimize the form q(x) = 3x21 + 14x1 x2 + 3x22 subject to
∥x∥ ≤ 1

Solution
The matrix of q is A =
[3 7]
[7 3]
with eigenvalues λ1 = 10 , λ2 = −4 and
the corresponding unit eigenvectors

f1 = (1/√2) [1, 1]^T , f2 = (1/√2) [1, −1]^T

Hence q(x) takes its maximum value 10 at x = f1 and its minimum value
−4 at x = f2

Orthogonality 97 / 116
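A numpy check of this example, relying on the theorem above (the extreme values of q on the unit ball are the extreme eigenvalues of A):

import numpy as np

A = np.array([[3.0, 7.0],
              [7.0, 3.0]])

lam, F = np.linalg.eigh(A)       # ascending: lam = [-4, 10]; columns of F are unit eigenvectors
f_min, f_max = F[:, 0], F[:, 1]

print(lam)                       # [-4. 10.]
print(f_max @ A @ f_max)         # 10, the maximum of q on the unit ball
print(f_min @ A @ f_min)         # -4, the minimum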
Principle Component Analysis

Table of contents

6 An Application to Constrained Optimization

7 Principle Component Analysis

Orthogonality 98 / 116
Principle Component Analysis

Sample data - Statistics inference

Suppose the heights h1 , h2 , ..., hn of n men are measured. Such a data set
is called a sample of the heights of all the men in the population under
study, and various questions are often asked about such a sample: What is
the average height in the sample? How much variation is there in the
sample heights, and how can it be measured? What can be inferred from
the sample about the heights of all men in the population? How do these
heights compare to heights of men in neighbouring countries? Does the
prevalence of smoking affect the height of a man?

Orthogonality 99 / 116
Principle Component Analysis

Principal Component Analysis (PCA) by SVD

1 Data often comes in a matrix: n samples and m measurements per
sample
2 Center each row of the matrix by subtracting the mean from each
measurement
3 The SVD finds the combinations of the data that contain the most
information
4 Largest singular value σ1 ↔ greatest variance ↔ most information in
u1

Orthogonality 100 / 116


Principle Component Analysis

Example

For m = 2 variables like age and height, the points lie in the plane R2 .
Subtract the average age and height to center the data. If the n recentered
points cluster along a line, how will linear algebra find that line?

Orthogonality 101 / 116


Principle Component Analysis

Sample mean
Represent a sample {x1 , . . . , xn } as a sample vector
x = [x1 , . . . , xn ]
The most widely known statistic for describing a data set is the
sample mean x̄ defined by x̄ = (1/n)(x1 + · · · + xn )

Figure 1: x = [−1, 0, 1, 4, 6] with sample mean x̄ = 2

The difference xi − x̄ is the deviation of xi from x̄, which can be
negative or positive, but the sum of these deviations is zero:

Σ_{i=1}^{n} (xi − x̄) = (Σ_{i=1}^{n} xi ) − nx̄ = nx̄ − nx̄ = 0
Orthogonality 102 / 116
Principle Component Analysis

Centred sample
If the mean x̄ is subtracted from each data value xi , the resulting data
xi − x̄ is said to be centred. The corresponding data vector

xc = [x1 − x̄, . . . , xn − x̄]

has mean x̄c = 0

Figure 2: Centred sample xc = [−3, −2, −1, 2, 4]

The effect of centring is to shift the data by an amount x̄ so that the
mean moves to 0
Orthogonality 103 / 116
Principle Component Analysis

Sample variance
To answer the question of how much variability is in the sample
x = [x1 , . . . , xn ]
that is, how widely the data are ”spread out” around the sample mean,
use the square (xi − x̄)^2 as a measure of variability
sample variance

s_x^2 = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)^2 = (1/(n − 1)) ∥x − x̄1∥^2

The sample variance will be large if there are many xi at a large
distance from the sample mean x̄ and it will be small if all the xi are
tightly clustered about the mean
The square root of the sample variance is the sample standard deviation
Orthogonality 104 / 116
Principle Component Analysis

Sample Covariance Matrix

Start with the measurements in A0 : the sample data. Find the mean
µ1 , . . . , µm of each row. Subtract each mean µi from row i to
center the data and obtain the centered matrix A.

The ”sample covariance matrix” is defined by S = AAT /(n − 1)

A shows the distance aij − µi from each measurement to its row
average µi
(AAT )11 and (AAT )22 show the sums of squared distances (which give
the sample variances s1^2 , s2^2 after dividing by n − 1)
(AAT )12 , divided by n − 1, gives the sample covariance s12 of
row 1 and row 2 of A

Orthogonality 105 / 116


Principle Component Analysis

Interpretation example

An average exam score of 75 tells you that it was a decent exam
A variance s^2 = 25 (standard deviation s = 5) means that most
grades were in the 70’s: closely packed
A sample variance s^2 = 225 (standard deviation s = 15) means that
grades were widely scattered
The covariance of the scores in two different subjects tells how one score
varies with the other in a linear relationship. A covariance close to
zero means the two scores are essentially unrelated; a high positive
covariance means students tend to be strong in both or weak in both, and a
negative covariance means strong in one tends to go with weak in the other

Orthogonality 106 / 116


Principle Component Analysis

Example - Six math and history scores (notice the zero mean in each row)

A =
[3 −4 7  1 −4 −3]
[7 −6 8 −1 −1 −7]
has sample covariance matrix

S = AAT /5 =
[20 25]
[25 40]

The rows of A are highly correlated: s12 = 25. Above average math
went with above average history.
Notice that S has positive trace and determinant. AAT is positive
definite.

Orthogonality 107 / 116
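A sketch of the covariance computation for this example, assuming numpy (the rows of A are already centered, as noted above):

import numpy as np

A = np.array([[3.0, -4.0, 7.0,  1.0, -4.0, -3.0],    # centered math scores
              [7.0, -6.0, 8.0, -1.0, -1.0, -7.0]])   # centered history scores

n = A.shape[1]                     # n = 6 samples
S = A @ A.T / (n - 1)              # sample covariance matrix

print(S)                           # [[20. 25.] [25. 40.]]
print(np.allclose(S, np.cov(A)))   # True: np.cov uses the same (n - 1) convention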


Principle Component Analysis

The Essentials of Principal Component Analysis (PCA)


PCA gives a way to understand data plot in dimension m - number of
variables. The crucial connection to linear algebra is in the singular values
and singular vecter in the centered data matrix A. Those come from the
eigenvalues and eigenvectors of the sample covariance matrix
S = AAT /(n − 1)
The total variance in the data is the sum of all eigenvalues and of sample
variances
total variance T = σ12 + · · · + σm
2
= s21 + · · · + s2m = trace

The first eigenvector u1 of S points in the most significant direction of the


σ2
data. This direction accounts for (or explain) a fraction T1 of total variance
σ22
the next eigenvector u2 (orthogonal to u1 ) accounts for smaller fraction T
Stop when those fractions are small. You have R dimensions that explain
most of the data. The n data points are very near a R-dimension subspace
with basis u1 to uR . These u’s are the principal components in
m-dimension space
Orthogonality 108 / 116
Principle Component Analysis

PCA procedure

PCA is a tool for dimension reduction in machine learning when the data
consists of many variables (features). It aims to reduce the number of
features (dimensions) while keeping the important information, by discarding
the components that contribute smaller variance
1 Center the data and compute the sample covariance matrix S

2 Compute the eigenvalues and eigenvectors of S

3 Pick the K largest eigenvalues and the corresponding eigenvectors
(principal components), which are orthonormal

4 Project the data onto the subspace spanned by the selected eigenvectors
to obtain projected points in lower dimension (the coordinates of the data
in the basis of selected eigenvectors)

Orthogonality 109 / 116


Principle Component Analysis

Example - six math and history scores


Eigenvalues of S are near 57 and 3. The unit eigenvectors are the principal
components:

                                                          σi^2   σi^2 /T
Principal component u1 = [0.56062881, 0.82806723]^T        57     0.95
Principal component u2 = [−0.82806723, 0.56062881]^T        3     0.05
Total                                                  T = 60     1

The leading vector u1 shows the dominant direction in the scatter plot
Orthogonality 110 / 116
Principle Component Analysis

Remark that {u1 , u2 } is an orthonormal set.

If we choose only the largest eigenvalue and the corresponding principal
component, then we express the raw score data in terms of u1 . For
example, the coordinate of the scores s1 = [3, 7]^T of the first student along
u1 is

(s1 · u1 )/∥u1 ∥^2 = s1 · u1 ≈ 7.5

That is, the 2-dimensional data point [3, 7]^T is reduced to a 1-dimensional
value ≈ 7.5
Orthogonality 111 / 116
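A short numerical check of the last two slides, assuming numpy; it recomputes the eigenvalues of S, the leading principal component, and the reduced 1-dimensional coordinate of the first student:

import numpy as np

A = np.array([[3.0, -4.0, 7.0,  1.0, -4.0, -3.0],
              [7.0, -6.0, 8.0, -1.0, -1.0, -7.0]])
S = A @ A.T / (A.shape[1] - 1)

lam, U = np.linalg.eigh(S)         # eigenvalues in ascending order
u1 = U[:, -1]                      # principal component for the largest eigenvalue (about 57)
if u1[0] < 0:
    u1 = -u1                       # fix the sign so both components are positive

print(lam)                         # approximately [3.07, 56.93]
print(u1)                          # approximately [0.56, 0.83]
print(A[:, 0] @ u1)                # coordinate of the first student's scores [3, 7] along u1, about 7.5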


Principle Component Analysis

Application of PCA

Eigenfaces to recognize faces


Searching the Web
Dynamics of Interest rate in Finance
...

Orthogonality 112 / 116


Principle Component Analysis

Example - Interest rate

Figure 3: U.S. Treasury Yields: 6 Days and 5 Centered Daily Differences

Orthogonality 113 / 116


Principle Component Analysis

The fractions σi^2 /T drop quickly to zero. The first three principal components
contain almost all of the information

Orthogonality 114 / 116


Principle Component Analysis

The principal components ui are orthogonal.


u1 measures a weighted average of the daily changes in the 9 yields
u2 gauges the daily change in the yield spread between long and short
bonds
u3 shows daily changes in the curvature (short and long bond versus
medium)
Orthogonality 115 / 116
Principle Component Analysis

Figure 4: The nine loadings on u1 , u2 , u3 from 3 months to 20 years

Orthogonality 116 / 116
