Orthogonality
Table of contents
1 Orthogonality
Orthogonality
Orthogonalization
Orthogonal Complement
Projection
2 Orthogonal Diagonalization
3 Singular Value Decomposition (SVD)
4 Positive Definite Matrices
5 An Application to Quadratic Forms
6 An Application to Constrained Optimization
7 Principal Component Analysis
Dot product
Given two vectors x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in Rn, their dot product is
x · y = x1 y1 + · · · + xn yn
Length of a vector
The length (norm) of x is
∥x∥ = √(x1² + · · · + xn²)
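As a quick numerical check of these two formulas (a sketch using NumPy; the vectors are arbitrary illustrative values):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, -4.0])

dot = x @ y                       # x · y = x1*y1 + ... + xn*yn
length = np.sqrt(x @ x)           # ∥x∥ = sqrt(x1^2 + ... + xn^2)

print(dot)                        # 1*3 + 2*0 + 2*(-4) = -5
print(length, np.linalg.norm(x))  # both give 3.0
```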
Properties
Let x, y, z be vectors in Rn and a ∈ R. Then
1 x · y = y · x
2 x · (y + z) = x · y + x · z
3 (ax) · y = a(x · y) = x · (ay)
4 ∥x∥² = x · x
Distance
Definition (Euclidean distance)
The distance between vectors x and y in Rn is
d(x, y) = ∥x − y∥
Properties
Let x, y, z be three vectors in Rn . Then
1 d(x, y) ≥ 0
2 d(x, y) = 0 if and only if x = y
3 d(x, y) = d(y, x)
4 d(x, z) ≤ d(x, y) + d(y, z)
Orthogonal
Two vectors x and y in Rn are orthogonal if x · y = 0
Orthogonal sets
A set of vectors x1 , x2 , · · · , xk in Rn is called an orthogonal set if
xi · xj = 0 for all i ̸= j, and xi ̸= 0 for all i
It is called an orthonormal set if, in addition,
∥xi∥ = 1 for all i
Example
The standard basis {e1 , . . . , en } is an orthonormal set in Rn
Example
If {x1 , x2 , · · · , xk } is an orthogonal set then so is {a1 x1 , a2 x2 , · · · , ak xk } for all nonzero scalars ai
Example
If f1 = (1, 1, 1, −1)T , f2 = (1, 0, 1, 2)T and f3 = (−1, 0, 1, 0)T then {f1 , f2 , f3 } is an orthogonal set in R4 . After normalizing, the orthonormal set is
{ (1/2) f1 , (1/√6) f2 , (1/√2) f3 }
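A small sketch verifying this claim numerically, assuming NumPy:

```python
import numpy as np

f1 = np.array([1, 1, 1, -1], dtype=float)
f2 = np.array([1, 0, 1, 2], dtype=float)
f3 = np.array([-1, 0, 1, 0], dtype=float)

# pairwise dot products should all be zero
print(f1 @ f2, f1 @ f3, f2 @ f3)   # 0.0 0.0 0.0

# normalizing each vector yields the orthonormal set
for f in (f1, f2, f3):
    print(f / np.linalg.norm(f))   # norms are 2, sqrt(6), sqrt(2)
```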
Theorem
Every orthogonal set in Rn is linearly independent
Proof.
Suppose that
t1 x1 + t2 x2 + · · · + tk xk = 0
Taking the dot product with x1 gives
t1 ∥x1∥² + t2 (x2 · x1 ) + · · · + tk (xk · x1 ) = 0
Since xi · x1 = 0 for i ̸= 1, this reduces to
t1 ∥x1∥² = 0 ⇒ t1 = 0 (because ∥x1∥² ̸= 0)
Similarly ti = 0 for all i.
Theorem (Expansion theorem)
If {f1 , . . . , fm } is an orthogonal basis of a subspace U of Rn , then each x ∈ U can be written as
x = (x · f1 / ∥f1∥²) f1 + · · · + (x · fm / ∥fm∥²) fm
Proof.
Suppose that
x = t1 f1 + · · · + tm fm
then
x · f1 = t1 (f1 · f1 ) = t1 ∥f1∥²
So t1 = (x · f1 )/∥f1∥² . Similarly we have ti = (x · fi )/∥fi∥² for i = 2, . . . , m.
Example
Let U = span(f1 , f2 , f3 ) where f1 = (1, 1, 1, −1)T , f2 = (1, 0, 1, 2)T and f3 = (−1, 0, 1, 0)T . We have that {f1 , f2 , f3 } is an orthogonal set and hence is a basis of U . Every vector x = (a, b, c, d) in U can be expanded as a linear combination of {f1 , f2 , f3 } with Fourier coefficients
t1 = (x · f1 )/∥f1∥² = (a + b + c − d)/4
t2 = (x · f2 )/∥f2∥² = (a + c + 2d)/6
t3 = (x · f3 )/∥f3∥² = (−a + c)/2
That is
x = t1 f1 + t2 f2 + t3 f3 = . . .
Orthogonalization
Orthogonal Lemma
Let {f1 , . . . , fm } be an orthogonal set in Rn . Given x ∈ Rn , write
fm+1 = x − (x · f1 / ∥f1∥²) f1 − · · · − (x · fm / ∥fm∥²) fm
then
1 fm+1 · fk = 0 for k = 1, · · · , m
2 If x ∉ span{f1 , . . . , fm } then fm+1 ̸= 0 and {f1 , . . . , fm , fm+1 } is an orthogonal set
Theorem (Gram-Schmidt orthogonalization algorithm)
Let {x1 , . . . , xm } be a basis of a subspace U of Rn . Define
f1 = x1
f2 = x2 − (x2 · f1 / ∥f1∥²) f1
...
fk = xk − (xk · f1 / ∥f1∥²) f1 − (xk · f2 / ∥f2∥²) f2 − · · · − (xk · fk−1 / ∥fk−1∥²) fk−1
for k = 2, . . . , m. Then
1 {f1 , f2 , · · · , fm } is an orthogonal basis of U
2 span{f1 , · · · , fk } = span{x1 , · · · , xk } for all k = 1, . . . , m
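A minimal sketch of this algorithm in NumPy (the helper `gram_schmidt` is illustrative, not a library call), run on the basis from the example that follows:

```python
import numpy as np

def gram_schmidt(xs):
    """Return an orthogonal basis f1, ..., fm with
    span(f1..fk) = span(x1..xk), following the formulas above."""
    fs = []
    for x in xs:
        f = x.astype(float)
        for g in fs:
            f = f - (x @ g) / (g @ g) * g   # subtract projection on each earlier fk
        fs.append(f)
    return fs

xs = [np.array([1, 1, -1, -1]), np.array([3, 2, 0, 1]), np.array([1, 0, 1, 0])]
for f in gram_schmidt(xs):
    print(f)   # (1,1,-1,-1), (2,1,1,2), (0.4,-0.3,0.7,-0.6)
```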
Example
Find an orthogonal basis of U = span{x1 , x2 , x3 } where x1 = (1, 1, −1, −1)T , x2 = (3, 2, 0, 1)T and x3 = (1, 0, 1, 0)T
Solution
Observe that x1 , x2 and x3 are independent. The algorithm gives
f1 = x1 = (1, 1, −1, −1)T
f2 = x2 − (x2 · f1 / ∥f1∥²) f1 = x2 − (4/4) f1 = (2, 1, 1, 2)T
f3 = x3 − (x3 · f1 / ∥f1∥²) f1 − (x3 · f2 / ∥f2∥²) f2 = x3 − (0/4) f1 − (3/10) f2 = (1/10)(4, −3, 7, −6)T
Hence {f1 , f2 , f3 } is an orthogonal basis of U .
Remark
The orthogonality property does not change if a vector in the basis is multiplied by a nonzero scalar. It may be convenient to eliminate fractions and use {f1 , f2 , 10f3 } as an orthogonal basis of U
Example
Find an orthogonal basis of U = span{x1 , x2 , x3 , x4 } where x1 = (0, 1, 0)T , x2 = (1, 0, 1)T , x3 = (1, 1, 1)T , x4 = (1, 1, 3)T
Solution
The algorithm gives
f1 = x1
f2 = x2 − (x2 · f1 / ∥f1∥²) f1 = x2 − 0 f1 = (1, 0, 1)T ̸= 0
So f2 ∉ span{f1 } = span{x1 }. It implies that f1 and f2 are independent and span{f1 , f2 } = span{x1 , x2 }
f3 = x3 − (x3 · f1 / ∥f1∥²) f1 − (x3 · f2 / ∥f2∥²) f2 = x3 − (1/1) f1 − (2/2) f2 = (0, 0, 0)T = 0
So x3 ∈ span{f1 , f2 } = span{x1 , x2 }. Hence
span{x1 , x2 , x3 } = span{x1 , x2 } = span{f1 , f2 }
and we do not need to pay attention to f3
f4 = x4 − (x4 · f1 / ∥f1∥²) f1 − (x4 · f2 / ∥f2∥²) f2 = x4 − (1/1) f1 − (4/2) f2 = (−1, 0, 1)T ̸= 0
So x4 ∉ span{f1 , f2 } = span{x1 , x2 , x3 }. Hence
span{x1 , x2 , x3 , x4 } = span{x1 , x2 , x4 } = span{f1 , f2 , f4 }
An orthogonal basis of U = span{x1 , x2 , x3 , x4 } is {f1 , f2 , f4 }
Orthogonal Complement
Problem motivation
Suppose a point x and a plane U through the origin in R3 are given, and
we want to find the point p in the plane that is closest to x. Our
geometric intuition assures us that such a point p exists. In fact, p must
be chosen in such a way that x − p is perpendicular to the plane.
Orthogonal Complement
If U is a subspace of Rn , define the orthogonal complement U ⊥ of U
(pronounced "U-perp") by
U ⊥ = {x ∈ Rn | x · y = 0 ∀y ∈ U }
Lemma
Let U be a subspace of Rn
1 U ⊥ is a subspace of Rn
2 {0}⊥ = Rn and (Rn )⊥ = {0}
3 If U = span{x1 , . . . , xm } then
U ⊥ = {x ∈ Rn | x · xi = 0 for i = 1, . . . , m}
Example
Find U⊥ if U = span{(1, −1, 2, 0)T , (1, 0, −2, 3)T } in R4 .
Solution
x = (x, y, z, w)T is in U⊥ if and only if it is orthogonal to both (1, −1, 2, 0)T and (1, 0, −2, 3)T , that is
x − y + 2z = 0 (1)
x − 2z + 3w = 0 (2)
Gaussian elimination gives U⊥ = span{(2, 4, 1, 0)T , (3, 3, 0, −1)T }
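The same answer can be checked numerically: U⊥ is the null space of the matrix whose rows are the spanning vectors. A sketch using NumPy's SVD (the helper `null_space` is defined here for illustration, not a library call):

```python
import numpy as np

def null_space(M, tol=1e-12):
    """Orthonormal basis of {x : Mx = 0}, read off from the SVD of M."""
    _, s, vt = np.linalg.svd(M)
    rank = int((s > tol).sum())
    return vt[rank:].T            # rows of Vt beyond the rank span the kernel

M = np.array([[1.0, -1.0, 2.0, 0.0],   # rows span U; U-perp is the kernel of M
              [1.0, 0.0, -2.0, 3.0]])
N = null_space(M)
print(M @ N)                      # ~0: each column of N is orthogonal to both rows
```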
Projection
Definition (Projection onto a subspace)
Let U be a subspace of Rn with orthogonal basis {f1 , . . . , fm }. For x ∈ Rn , the projection of x on U is
projU x = (x · f1 / ∥f1∥²) f1 + · · · + (x · fm / ∥fm∥²) fm
and by convention proj{0} x = 0
Theorem (Approximation theorem)
Let U be a subspace of Rn and x ∈ Rn . Then p = projU x is the vector in U closest to x:
∥x − p∥ < ∥x − y∥ ∀y ∈ U, y ̸= p
Example
Let U = span{x1 , x2 } where x1 = (1, 1, 0, 1)T and x2 = (0, 1, 1, 2)T . If x = (3, −1, 0, 2)T , find the vector in U closest to x and express x as the sum of a vector in U and a vector orthogonal to U .
Solution
{x1 , x2 } are independent but not orthogonal. The Gram-Schmidt algorithm gives an orthogonal basis {f1 , f2 } of U where f1 = x1 = (1, 1, 0, 1)T and
f2 = x2 − (x2 · f1 / ∥f1∥²) f1 = x2 − (3/3) f1 = (−1, 0, 1, 1)T
Compute the projection using the orthogonal basis {f1 , f2 }:
p = projU x = (x · f1 / ∥f1∥²) f1 + (x · f2 / ∥f2∥²) f2 = (4/3) f1 + (−1/3) f2 = (1/3)(5, 4, −1, 3)T
Thus p is the vector in U closest to x and x − p = (1/3)(4, −7, 1, 3)T is orthogonal to every vector in U . The decomposition of x is
x = p + (x − p) = . . .
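A sketch of this computation in NumPy, following the formulas above:

```python
import numpy as np

x1 = np.array([1.0, 1.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0, 2.0])
x  = np.array([3.0, -1.0, 0.0, 2.0])

f1 = x1
f2 = x2 - (x2 @ f1) / (f1 @ f1) * f1          # Gram-Schmidt step: (-1, 0, 1, 1)

p = (x @ f1) / (f1 @ f1) * f1 + (x @ f2) / (f2 @ f2) * f2
print(p)                                       # (5, 4, -1, 3)/3
print(x - p, (x - p) @ f1, (x - p) @ f2)       # residual is orthogonal to U
```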
We need to find p = Ax̂, the projection of b on the column space of A = [a1 . . . an ], that is closest to b. The error vector e = b − Ax̂ is perpendicular to that space:
aiT (b − Ax̂) = 0 for i = 1, . . . , n, that is, AT (b − Ax̂) = 0
Hence
AT Ax̂ = AT b
If AT A is invertible then
x̂ = (AT A)−1 AT b
So
p = Ax̂ = A(AT A)−1 AT b
The matrix
P = A(AT A)−1 AT
is the projection matrix such that p = P b
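A small numerical sketch of the normal equations and the projection matrix (the matrix A and vector b here are illustrative values, not from the slides):

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

xhat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: A^T A xhat = A^T b
P = A @ np.linalg.inv(A.T @ A) @ A.T       # projection matrix onto col(A)
p = P @ b

print(np.allclose(p, A @ xhat))   # True: both formulas give the projection
print(A.T @ (b - p))              # ~ (0, 0): error is orthogonal to the columns
print(np.allclose(P @ P, P))      # True: a projection matrix is idempotent
```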
Let θ be the angle between b and the line through the vector a. Then the projection p of b on the line has length
∥p∥ = |a · b| / ∥a∥ = (∥a∥∥b∥ cos θ) / ∥a∥ = ∥b∥ cos θ
and the length of the error is
∥e∥ = ∥b∥ sin θ
Projection in R3
Projection on Oz
The matrix of the subspace is
A = e3 = (0, 0, 1)T
So AT A = [0 0 1] (0, 0, 1)T = 1. Hence the projection matrix is
P1 = A(AT A)−1 AT = (0, 0, 1)T [0 0 1] = [0 0 0; 0 0 0; 0 0 1]
The projection of b = (x, y, z)T is p1 = P1 b = (0, 0, z)T
Exercise
Let U = span{x1 , x2 } where x1 = (1, 1, 0, 1)T and x2 = (0, 1, 1, 2)T . Find the projection of b = (3, −1, 0, 2)T on U .
Exercise
Let
A = [1 0; 1 1; 1 2] and b = (6, 0, 0)T .
Find the projection of b on the column space of A.
Example
Find the closest line to the points (0, 6), (1, 0) and (2, 0)
Solution
We have A = [1 0; 1 1; 1 2] and b = (6, 0, 0)T . The coefficients k, a of the fitted line y = k + ax are given by
(k, a)T = x̂ = (AT A)−1 AT b = (5, −3)T
so the closest line is y = 5 − 3x
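The same fit in NumPy; `np.linalg.lstsq` solves the least squares problem (equivalently the normal equations) directly:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # columns: ones and x-values
b = np.array([6.0, 0.0, 0.0])

# least squares solution of Ax = b, i.e. of A^T A xhat = A^T b
xhat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(xhat)          # (5, -3): the closest line is y = 5 - 3x
```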
Fitting by a Parabola
Galileo's experiment: dropping a stone from the Leaning Tower of Pisa
Orthogonal Diagonalization
Question
The really nice bases of Rn are the orthogonal ones. So which matrices have an orthogonal basis of eigenvectors, that is, for which matrices A is there an orthogonal matrix P such that P −1 AP is diagonal?
Orthogonal matrix
Theorem
The following conditions are equivalent for a square matrix P
1 P is invertible and P −1 = P T
2 The rows of P are orthonormal
3 The columns of P are orthonormal
Definition
A square matrix P is called an orthogonal matrix if it satisfies one of the above conditions
Example
The rotation matrix [cos θ −sin θ; sin θ cos θ] is orthogonal for any angle θ
It is not enough that the rows of a matrix A are merely orthogonal for A to be an orthogonal matrix.
Example
The matrix
[2 1 1; −1 1 1; 0 −1 1]
has orthogonal rows, but its columns are not orthogonal. However, if the rows are normalized, the resulting matrix is orthogonal.
Example
If P and Q are orthogonal matrices then P Q and P −1 are also orthogonal
Solution
Prove that P Q is orthogonal
P and Q are invertible and so is P Q, with
(P Q)−1 = Q−1 P −1 = QT P T = (P Q)T
So P Q is orthogonal
Solution (cont)
Prove that P −1 is orthogonal
It is clear that P −1 is invertible with
(P −1 )−1 = P
Moreover, since P −1 = P T ,
(P −1 )T = (P T )T = P
Thus
(P −1 )−1 = (P −1 )T
So P −1 is orthogonal
Theorem
If A is symmetric then
(Ax) · y = x · (Ay)
for all column vectors x, y ∈ Rn
Theorem
If A is a symmetric matrix, then eigenvectors of A corresponding to
distinct eigenvalues are orthogonal.
Example
Find an orthogonal matrix P such that P −1 AP is diagonal, where
A = [1 0 −1; 0 1 2; −1 2 5]
Solution
The characteristic polynomial of A is
cA (x) = det [1 − x 0 −1; 0 1 − x 2; −1 2 5 − x] = −x(x − 1)(x − 6)
so the eigenvalues are λ = 0, 1, 6.
Taking an orthonormal eigenvector for each of λ = 0, 1, 6 as the columns of P gives an orthogonal P . Thus P −1 = P T and
P T AP = [0 0 0; 0 1 0; 0 0 6]
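A sketch of this diagonalization in NumPy; `np.linalg.eigh` is appropriate here because A is symmetric:

```python
import numpy as np

A = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 2.0],
              [-1.0, 2.0, 5.0]])

lam, P = np.linalg.eigh(A)        # symmetric A: eigh returns orthonormal eigenvectors
print(lam)                        # 0, 1, 6
print(P.T @ P)                    # ~identity: P is orthogonal
print(P.T @ A @ P)                # ~diag(0, 1, 6)
```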
Singular Value Decomposition (SVD)
1 An image is a large matrix of grayscale values, one for each pixel and color
2 When nearby pixels are correlated (not random), the image can be compressed
3 SVD separates any matrix A into rank one pieces (simple pieces) σuvT = (column)(row), which is useful for image compression
4 The columns u and the rows vT are eigenvectors of AAT and AT A respectively
Observe that the 6 × 6 all-ones matrix satisfies
A = (1, 1, 1, 1, 1, 1)T [1 1 1 1 1 1]
Instead of sending all 36 elements of the matrix A, we can send the column (1, 1, 1, 1, 1, 1)T and the row [1 1 1 1 1 1], which require only 12 numbers
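A sketch of this rank one compression idea in NumPy (`np.outer` forms the (column)(row) product):

```python
import numpy as np

u = np.ones(6)                   # the column: 6 numbers
v = np.ones(6)                   # the row: 6 numbers
A = np.outer(u, v)               # the 6 x 6 all-ones matrix: 36 numbers

# the receiver rebuilds A exactly from the 12 transmitted numbers
print(np.array_equal(A, np.outer(u, v)))   # True
```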
If the rank of A is much higher than 2 (as it is for real images) then A will add up many rank one pieces
A = σ1 u1 v1T + σ2 u2 v2T + · · · + σn un vnT (σ1 ≥ σ2 ≥ · · · ≥ σn )
We want many of the σi to be small, so that those pieces can be discarded with no loss of visual quality - image compression.
We look for a factorization A = U ΣV T with U, V orthogonal and Σ diagonal. Since AT A is symmetric, the principal axes theorem gives an orthogonal V such that V T (AT A)V = D, that is
AT Avi = λi vi
Lemma
1 All eigenvalues of AT A and AAT are non-negative
2 AT A and AAT have the same set of positive eigenvalues {λi }
Definition
The real numbers σi = √λi are called the singular values of the matrix A
SVD algorithm
1 Find the eigenvalues λ1 ≥ λ2 ≥ · · · ≥ λn of AT A and orthonormal eigenvectors v1 , . . . , vn ; set V = [v1 . . . vn ]
2 Put the singular values σi = √λi on the diagonal of Σ
3 For each σi > 0 set ui = (1/σi )Avi and extend u1 , u2 , . . . to an orthonormal basis; set U = [u1 . . . um ]
4 Decompose A = U ΣV T
Example
Find an SVD for A = [0 1; −1 0]
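A sketch of the computation with NumPy's built-in SVD:

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
U, s, Vt = np.linalg.svd(A)
print(s)                          # both singular values are 1 (A is orthogonal)
print(U @ np.diag(s) @ Vt)        # reconstructs A
```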
Positive Definite Matrices
Definition
A symmetric n × n matrix A is called positive definite if xT Ax > 0 for every x ̸= 0 in Rn .
Because these matrices are symmetric, the principal axes theorem plays a central role in the theory.
Theorem
If A is positive definite then it is invertible and det(A) > 0
Example
If U is any invertible n × n matrix then A = U T U is positive definite
Solution
If x ̸= 0 in Rn then
xT Ax = xT U T U x = (U x)T (U x) = ∥U x∥² > 0
because U x ̸= 0 (U is invertible). So A is positive definite.
Principal submatrices
Definition
If A is an n × n matrix, let (r) A denote the r × r submatrix in the upper left corner of A. The matrices (1) A, (2) A, . . . , (n) A = A are called the principal submatrices of A
Example
If A = [10 5 2; 5 3 2; 2 2 3] then (1) A = [10], (2) A = [10 5; 5 3] and (3) A = A
Theorem
If A is positive definite, so is each principal submatrix (r) A for r = 1, 2, . . . , n
Proof.
Write
A = [(r) A P; Q R] in block form
For y ̸= 0 in Rr , consider x = (y, 0)T ∈ Rn . Then x ̸= 0, so
0 < xT Ax = [yT 0] [(r) A P; Q R] (y, 0)T = yT ((r) A) y
Hence (r) A is positive definite.
Theorem
The following conditions are equivalent for a symmetric n × n matrix A
1 A is positive definite
2 det((r) A) > 0 for each r = 1, 2, . . . , n
3 A = U T U where U is an upper triangular matrix with positive entries on the main diagonal
Furthermore, the factorization in (3) is unique (called the Cholesky factorization of A)
Example
Find the Cholesky factorization of A = [10 5 2; 5 3 2; 2 2 3]
Solution
We have (1) A = [10], (2) A = [10 5; 5 3] and (3) A = A. It is easy to verify that det((1) A) = 10 > 0, det((2) A) = 5 > 0 and det((3) A) = 3 > 0. So A is positive definite and has a Cholesky factorization.
Step 1: carry A to upper triangular form, keeping the diagonal positive
A = [10 5 2; 5 3 2; 2 2 3] → (r2 − (1/2)r1 , r3 − (1/5)r1 ) → [10 5 2; 0 1/2 1; 0 1 13/5] → (r3 − 2r2 ) → [10 5 2; 0 1/2 1; 0 0 3/5]
Solution (cont)
Step 2: divide each row by the square root of its diagonal entry
U = [√10 5/√10 2/√10; 0 1/√2 √2; 0 0 √(3/5)]
Then A = U T U .
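A numerical check (a sketch; note that NumPy's `cholesky` returns the lower-triangular factor L with A = L Lᵀ, so the upper-triangular U of the theorem is its transpose):

```python
import numpy as np

A = np.array([[10.0, 5.0, 2.0],
              [5.0, 3.0, 2.0],
              [2.0, 2.0, 3.0]])

L = np.linalg.cholesky(A)        # lower-triangular, A = L @ L.T
U = L.T                          # upper-triangular factor, A = U^T U
print(U)                         # matches the matrix computed above
print(np.allclose(U.T @ U, A))   # True
```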
An Application to Quadratic Forms
Quadratic form
Definition
A quadratic form q in n variables x1 , . . . , xn is a linear combination of the squares x1², . . . , xn² and the cross terms x1 x2 , x1 x3 , x2 x3 , . . .
q = Σi aii xi² + Σi<j (aij + aji ) xi xj = Σi Σj aij xi xj
In matrix form
q(x) = xT Ax
where x = (x1 , . . . , xn )T and A = [aij ], which can always be chosen symmetric.
Example
Write q = x1² + 3x3² + 2x1 x2 − x1 x3 in the form q(x) = xT Ax where A is a symmetric 3 × 3 matrix.
Solution
The cross terms are 2x1 x2 = x1 x2 + x2 x1 and −x1 x3 = −(1/2) x1 x3 − (1/2) x3 x1 , and both x2 x3 and x3 x2 have coefficient zero, as does x2². Hence
q(x) = [x1 x2 x3 ] [1 1 −1/2; 1 0 0; −1/2 0 3] (x1 , x2 , x3 )T
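A quick numerical check of this solution (a sketch; the test vector is an arbitrary illustrative value):

```python
import numpy as np

A = np.array([[1.0, 1.0, -0.5],
              [1.0, 0.0, 0.0],
              [-0.5, 0.0, 3.0]])

def q(x):                        # q = x1^2 + 3 x3^2 + 2 x1 x2 - x1 x3
    x1, x2, x3 = x
    return x1**2 + 3 * x3**2 + 2 * x1 * x2 - x1 * x3

x = np.array([1.0, 2.0, -1.0])
print(q(x), x @ A @ x)           # both give the same value: 9.0
```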
Problem
Given a quadratic form q(x) = xT Ax, find new variables y1 , . . . , yn such that q, expressed in terms of the yi , has no cross terms.
Solution
Let P be an orthogonal matrix such that P T AP = D is diagonal. Define y by
x = P y, equivalently y = P T x
Substitution in q(x) = xT Ax gives
q = (P y)T A(P y) = yT (P T AP )y = yT Dy = λ1 y1² + · · · + λn yn²
which has no cross terms.
Principal axes
Let λ1 , . . . , λn be the eigenvalues of A (repeated according to their multiplicities) and let {f1 , . . . , fn } be a corresponding orthonormal set of eigenvectors of A, called a set of principal axes. Then the orthogonally diagonalizing matrix is
P = [f1 f2 . . . fn ]
and
x = P y = y1 f1 + y2 f2 + · · · + yn fn
Since yi = x · fi , this gives
q = q(x) = λ1 (x · f1 )² + · · · + λn (x · fn )²
Example
Find new variables y1 , y2 such that q = x1² + x1 x2 + x2² has no cross terms.
Solution
The form can be written as q = xT Ax where
x = (x1 , x2 )T and A = [1 1/2; 1/2 1]
The eigenvalues of A are λ1 = 1/2 and λ2 = 3/2, with orthonormal eigenvectors
f1 = (1/√2)(−1, 1)T and f2 = (1/√2)(1, 1)T
so
P = [f1 f2 ] = (1/√2)[−1 1; 1 1]
Introduce y = (y1 , y2 )T = P T x = (1/√2)(−x1 + x2 , x1 + x2 )T ; then
q = (1/2) y1² + (3/2) y2²
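A sketch verifying this change of variables numerically with NumPy:

```python
import numpy as np

A = np.array([[1.0, 0.5], [0.5, 1.0]])   # matrix of q = x1^2 + x1 x2 + x2^2
lam, P = np.linalg.eigh(A)
print(lam)                                # 0.5, 1.5

# with y = P^T x the form becomes lam1*y1^2 + lam2*y2^2 (no cross term)
x = np.array([2.0, -1.0])
y = P.T @ x
print(x @ A @ x, lam @ y**2)              # same value: 3.0
```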
Theorem
Consider the equation ax² + bxy + cy² = 1 where a, b, c are not all zero.
1 There is a rotation of the coordinate axes about the origin such that, in the new coordinates, the equation has no cross term
2 The graph is an ellipse if b² < 4ac and a hyperbola if b² > 4ac
Proof
If b = 0 then the equation already has no cross term and (1), (2) are clear. So assume b ̸= 0; then
A = [a b/2; b/2 c]
and A has characteristic polynomial
cA (x) = x² − (a + c)x − (1/4)(b² − 4ac)
Denote d = √(b² + (a − c)²); then the eigenvalues of A are
λ1 = (1/2)(a + c − d) and λ2 = (1/2)(a + c + d)
with the corresponding principal axes
f1 = (1/√(b² + (a − c − d)²)) (a − c − d, b)T
f2 = (1/√(b² + (a − c − d)²)) (−b, a − c − d)T
Proof of part 1
Because ∥f1∥ = 1, there exists an angle θ such that
cos θ = (a − c − d)/√(b² + (a − c − d)²) , sin θ = b/√(b² + (a − c − d)²)
Then
P = [f1 f2 ] = [cos θ −sin θ; sin θ cos θ]
diagonalizes A, and the principal axes
f1 = (cos θ, sin θ)T = P e1 and f2 = (−sin θ, cos θ)T = P e2
can always be found by rotating the x1 and x2 axes about the origin through the angle θ
[Figure: rotating the x1 and x2 axes about the origin through an angle θ to obtain the principal axes]
Proof of part 2
We have
det(A) = det [λ1 0; 0 λ2 ] ⇒ λ1 λ2 = (1/4)(4ac − b²)
In terms of y1 , y2 , the equation becomes
λ1 y1² + λ2 y2² = 1
If b² < 4ac then λ1 λ2 > 0 and the graph is an ellipse; if b² > 4ac then λ1 λ2 < 0 and the graph is a hyperbola.
Example
In the notation of the previous result, the equation x² + xy + y² = 1 has a = b = c = 1, so d = 1. The rotation angle θ is found from
cos θ = −1/√2 , sin θ = 1/√2
Hence θ = 3π/4. Thus the principal axes are
f1 = (1/√2)(−1, 1)T , f2 = −(1/√2)(1, 1)T
and then
y1 = (1/√2)(−x1 + x2 ) , y2 = −(1/√2)(x1 + x2 )
In y1 y2 -coordinates, the equation becomes
(1/2) y1² + (3/2) y2² = 1
The angle θ is chosen such that the new y1 and y2 axes are the axes of symmetry of the ellipse. The eigenvectors f1 and f2 point along these axes of symmetry. For this reason they are called principal axes.
An Application to Constrained Optimization
Example
A politician proposes to spend x1 dollars annually on health care and x2
dollars annually on education. She is constrained in her spending by
various budget pressures, and one model of this is that the expenditures x1
and x2 should satisfy a constraint like
5x1² + 3x2² ≤ 15
These choices have different effects on voters, and the politician wants to choose x = (x1 , x2 ) to maximize some measure q = q(x1 , x2 ) of voter satisfaction. Assume that for any value of c, all points on the graph of q(x1 , x2 ) = c have the same appeal to voters. Hence the goal is to find the largest value of c for which the graph of q(x1 , x2 ) = c contains a feasible point.
Remark that the constraint can be put in the standard form ∥y∥ ≤ 1 with
y1 = x1 /√3 , y2 = x2 /√5
So we can convert the above problem into finding the maximum of a quadratic form subject to ∥y∥ ≤ 1, the unit ball.
Theorem
Consider the quadratic form q = xT Ax where A is an n × n symmetric matrix, and let λ1 and λn denote the largest and smallest eigenvalues of A. Then
1 max{q(x) | ∥x∥ ≤ 1} = λ1 and q(f1 ) = λ1 where f1 is any unit λ1 -eigenvector
2 min{q(x) | ∥x∥ ≤ 1} = λn and q(fn ) = λn where fn is any unit λn -eigenvector
Proof of (1)
Since A is symmetric, let the real eigenvalues of A be ordered as
λ1 ≥ λ2 ≥ · · · ≥ λn
By the principal axes theorem, let P be an orthogonal matrix such that P T AP = D = diag(λ1 , λ2 , . . . , λn ) and define y = P T x, equivalently x = P y. Then ∥y∥ = ∥x∥ because ∥y∥² = yT y = xT P P T x = xT x = ∥x∥². Expressing q in terms of y, we have
q(x) = q(P y) = (P y)T A(P y) = yT P T AP y = yT Dy = λ1 y1² + · · · + λn yn²
Assume that ∥x∥ ≤ 1; then ∥y∥ = ∥x∥ ≤ 1. Since λi ≤ λ1 for all i, we have
q(x) = λ1 y1² + · · · + λn yn² ≤ λ1 y1² + · · · + λ1 yn² = λ1 (y1² + · · · + yn² ) = λ1 ∥y∥² ≤ λ1
Hence λ1 is the maximum value of q(x) when ∥x∥ ≤ 1; it is attained at x = f1 since q(f1 ) = f1T Af1 = λ1 f1T f1 = λ1 .
Example
Maximize and minimize the form q(x) = 3x1² + 14x1 x2 + 3x2² subject to ∥x∥ ≤ 1
Solution
The matrix of q is A = [3 7; 7 3], with eigenvalues λ1 = 10 and λ2 = −4 and corresponding unit eigenvectors
f1 = (1/√2)(1, 1)T , f2 = (1/√2)(1, −1)T
Hence q(x) takes its maximum value 10 at x = f1 , and the minimum value is −4 when x = f2
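A sketch checking these values with NumPy (`eigh` returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[3.0, 7.0], [7.0, 3.0]])
lam, F = np.linalg.eigh(A)       # ascending: lam = (-4, 10)
print(lam)

f_min, f_max = F[:, 0], F[:, 1]  # unit eigenvectors
print(f_max @ A @ f_max)         # 10, the maximum of q on the unit ball
print(f_min @ A @ f_min)         # -4, the minimum
```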
Principal Component Analysis
Suppose the heights h1 , h2 , ..., hn of n men are measured. Such a data set
is called a sample of the heights of all the men in the population under
study, and various questions are often asked about such a sample: What is
the average height in the sample? How much variation is there in the
sample heights, and how can it be measured? What can be inferred from
the sample about the heights of all men in the population? How do these
heights compare to heights of men in neighbouring countries? Does the
prevalence of smoking affect the height of a man?
Example
For m = 2 variables like age and height, the points lie in the plane R2 . Subtract the average age and height to center the data. If the n recentered points cluster along a line, how will linear algebra find that line?
Sample mean
Represent a sample {x1 , . . . , xn } as a sample vector x = (x1 , . . . , xn )
The most widely known statistic for describing a data set is the sample mean x̄, defined by x̄ = (1/n)(x1 + · · · + xn )
Figure 1: x = (−1, 0, 1, 4, 6) with sample mean x̄ = 2
Centred sample
If the mean x̄ is subtracted from each data value xi , the resulting data xi − x̄ is said to be centred. The corresponding data vector is
xc = (x1 − x̄, . . . , xn − x̄)
Figure 2: Centred sample xc = (−3, −2, −1, 2, 4)
Sample variance
To answer the question of how much variability there is in the sample
x = (x1 , . . . , xn )
that is, how widely the data are "spread out" around the sample mean x̄, use the squares (xi − x̄)² as a measure of variability. The sample variance is
sx² = (1/(n − 1)) Σi=1..n (xi − x̄)² = (1/(n − 1)) ∥x − x̄1∥²
Start with the measurements in A0 : the sample data. Find the mean µ1 , . . . , µm of each row. Subtract each mean µi from row i to center the data, obtaining the centered matrix A.
The sample covariance matrix is defined by S = AAT /(n − 1)
Interpretation example
A = [3 −4 7 1 −4 −3; 7 −6 8 −1 −1 −7]
has sample covariance
S = AAT /5 = [20 25; 25 40]
The rows of A are highly correlated: s12 = 25 > 0. Above average math went with above average history.
Notice that S has positive trace and determinant; S is positive definite.
PCA procedure
PCA is a tool for dimension reduction in machine learning when the data has a large number of variables (features). It aims to reduce the number of features (the dimension) while keeping the important information, by discarding the components that contribute smaller variance
1 Center the data and compute the sample covariance matrix S
2 Find the eigenvalues σi² and orthonormal eigenvectors ui of S
3 Keep the components with the largest eigenvalues and project the data on them
The leading eigenvector u1 shows the dominant direction in the scatter plot
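A minimal PCA sketch in NumPy, reusing the math/history covariance example above (rows are variables, columns are samples; the data is already centered):

```python
import numpy as np

A = np.array([[3.0, -4.0, 7.0, 1.0, -4.0, -3.0],
              [7.0, -6.0, 8.0, -1.0, -1.0, -7.0]])
n = A.shape[1]

S = (A @ A.T) / (n - 1)          # sample covariance matrix
lam, U = np.linalg.eigh(S)       # eigenvalues in ascending order
print(S)                         # [[20, 25], [25, 40]]
print(lam / lam.sum())           # fraction of total variance per component
print(U[:, -1])                  # leading direction u1 in the scatter plot
```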
Application of PCA
The fractions σi²/T , where T is the total variance, drop quickly to zero. The first three principal components contain almost all the information.