
Best rank-one approximation

Definition: The first left singular vector of A is defined to be the vector u1 such that
σ1 u1 = A v1, where σ1 and v1 are, respectively, the first singular value and the first
right singular vector.
Theorem: The best rank-one approximation to A is σ1 u1 v1^T, where σ1 is the first singular
value, u1 is the first left singular vector, and v1 is the first right singular vector of A.
Best rank-one approximation: example

Example: For the matrix A = [[1, 4], [5, 2]], the first right singular vector is
v1 ≈ (.78, .63) and the first singular value σ1 is about 6.1. The first left singular
vector is u1 ≈ (.54, .84), meaning σ1 u1 = A v1.
Then

    Ã = σ1 u1 v1^T ≈ 6.1 (.54, .84)^T (.78, .63) ≈ [[2.6, 2.1], [4.0, 3.2]]

We then have

    A − Ã ≈ [[1, 4], [5, 2]] − [[2.6, 2.1], [4.0, 3.2]] ≈ [[−1.56, 1.93], [1.00, −1.23]]

so the squared Frobenius norm of A − Ã is

    1.56² + 1.93² + 1.00² + 1.23² ≈ 8.7

and indeed ||A − Ã||²_F = ||A||²_F − σ1² ≈ 8.7. ✓
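The numbers above can be checked with a short numpy sketch (mine, not the lecture's; numpy may flip the signs of u1 and v1, which does not change the product σ1 u1 v1^T):

    import numpy as np

    A = np.array([[1., 4.],
                  [5., 2.]])

    # numpy returns U, the singular values, and V^T; column signs may differ from the slides.
    U, s, Vt = np.linalg.svd(A)
    sigma1, u1, v1 = s[0], U[:, 0], Vt[0, :]

    A_tilde = sigma1 * np.outer(u1, v1)              # best rank-one approximation
    print(np.round(A_tilde, 1))                      # ~[[2.6, 2.1], [4.0, 3.2]]
    print(np.linalg.norm(A - A_tilde, 'fro')**2)     # ~8.7
    print(np.linalg.norm(A, 'fro')**2 - sigma1**2)   # same value: ||A||_F^2 - sigma_1^2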
The closest one-dimensional affine space

In the trolley-line problem, the line must go through the origin: it is the closest
one-dimensional vector space. Perhaps a line not through the origin is much closer.
An arbitrary line (one not necessarily passing through the origin) is a one-dimensional
affine space.

Given points a1, . . . , am,
• choose a point ā and translate each of the input points by subtracting ā:
      a1 − ā, . . . , am − ā
• find the one-dimensional vector space closest to these translated points, and then
  translate that vector space by adding back ā.
The best choice of ā is the centroid of the input points, the vector ā = (1/m)(a1 + · · · + am).
(The proof is lovely; maybe we'll see it later.)
Translating the points by subtracting off the centroid is called centering the points.
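As a concrete sketch (my own numpy illustration, not the course's code), the recipe above looks like this:

    import numpy as np

    def closest_affine_line(points):
        # points: m x n array, one input point per row.
        # Returns (centroid, v1): the closest line is { centroid + t*v1 : t real }.
        A = np.asarray(points, dtype=float)
        a_bar = A.mean(axis=0)                  # centroid of the input points
        centered = A - a_bar                    # centering: subtract the centroid
        # The first right singular vector of the centered matrix spans the closest
        # one-dimensional vector space through the origin.
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        return a_bar, Vt[0]

    # Noisy points near the line through (1, 1) with direction (1, 2):
    pts = np.array([[1, 1], [2, 3], [3, 5], [4, 7.1], [5, 8.9]])
    centroid, v1 = closest_affine_line(pts)
    print(centroid, v1)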
Politics revisited

We center the voting data and find the closest one-dimensional vector space Span {v1}.
Now projection along v1 gives better spread. Look at the coordinate representation in terms of v1:

Which of the senators to the left of the origin are Republican?

>>> {r for r in senators if is_neg[r] and is_Repub[r]}
{'Collins', 'Snowe', 'Chafee'}

Similarly, only three of the senators to the right of the origin are Democrats.
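The senators' voting data isn't reproduced here, but the projection step might look like the following sketch; voting_record, the party labels, and the vote values are hypothetical stand-ins, and the sign of v1 (hence which side of the origin is "left") is arbitrary:

    import numpy as np

    # Hypothetical stand-ins for the course's voting data (+1 = yea, -1 = nay).
    voting_record = {'Collins': [1, -1, 1, 1], 'Snowe': [1, -1, 1, -1],
                     'Chafee': [-1, -1, 1, 1], 'Kennedy': [-1, 1, -1, 1]}
    is_Repub = {'Collins': True, 'Snowe': True, 'Chafee': True, 'Kennedy': False}

    senators = list(voting_record)
    A = np.array([voting_record[s] for s in senators], dtype=float)
    A = A - A.mean(axis=0)                      # center the voting data

    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    v1 = Vt[0]                                  # first right singular vector

    coord = {s: A[i] @ v1 for i, s in enumerate(senators)}   # coordinate along v1
    is_neg = {s: coord[s] < 0 for s in senators}

    print({r for r in senators if is_neg[r] and is_Repub[r]})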
Visualization revisited

We can now turn a bunch of high-dimensional vectors into a bunch of numbers and plot the
numbers on a number line.
Dimension reduction
What about turning a bunch of high-dimensional vectors into vectors in R^2 or R^3 or R^10?
Closest 1-dimensional vector space (trolley-line-location problem):
• input: vectors a1, . . . , am
• output: orthonormal basis {v1} for the dim-1 vector space V1 that minimizes
      Σ_i (distance from ai to V1)²
We saw: Σ_i (distance from ai to Span {v1})² = ||A||²_F − ||Av1||²
Therefore: the best vector v1 is the unit vector that maximizes ||Av1||.
Closest k-dimensional vector space:
• input: vectors a1, . . . , am, integer k
• output: orthonormal basis {v1, . . . , vk} for the dim-k vector space Vk that minimizes
      Σ_i (distance from ai to Vk)²
Let v1, . . . , vk be an orthonormal basis for a subspace V.
By the Pythagorean Theorem,

    a1^⊥V = a1 − a1^‖V        ||a1^⊥V||² = ||a1||² − ||a1^‖V||²
      ...                        ...
    am^⊥V = am − am^‖V        ||am^⊥V||² = ||am||² − ||am^‖V||²
Thus, for an orthonormal basis v1, . . . , vk of V,

    Σ_i (distance from ai to V)² = ||A||²_F − (||Av1||² + · · · + ||Avk||²)

Therefore choosing a k-dimensional space V minimizing the sum of squared distances to V is
equivalent to choosing k orthonormal vectors v1, . . . , vk to maximize ||Av1||² + · · · + ||Avk||².
How to choose such vectors? A greedy algorithm.
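A quick numerical check of this identity (my own sketch, using numpy's SVD to supply the subspace):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 5))             # rows a1, ..., a8
    k = 2

    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                               # columns v1, ..., vk (orthonormal)

    proj = A @ Vk @ Vk.T                        # projection of each row onto Span{v1,...,vk}
    lhs = np.sum((A - proj)**2)                 # sum of squared distances to the subspace
    rhs = np.linalg.norm(A, 'fro')**2 - sum(np.linalg.norm(A @ Vk[:, i])**2 for i in range(k))
    print(np.isclose(lhs, rhs))                 # True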
Closest dimension-k vector space
Computational Problem: closest low-dimensional subspace:
• input: vectors a1, . . . , am and positive integer k
• output: basis for the dim-k vector space Vk that minimizes Σ_i (distance from ai to Vk)²
Algorithm for one dimension: choose the unit-norm vector v that maximizes ||Av||.
Natural generalization of this algorithm in which an orthonormal basis is sought:

Algorithm: In the i-th iteration, select a unit vector v that maximizes ||Av|| among those
vectors orthogonal to all previously selected vectors:
• v1 = norm-one vector v maximizing ||Av||,
• v2 = norm-one vector v orthogonal to v1 that maximizes ||Av||,
• v3 = norm-one vector v orthogonal to v1 and v2 that maximizes ||Av||, and so on.

In pseudocode:

    def find_right_singular_vectors(A):
        for i = 1, 2, . . . , min{m, n}:
            vi = arg max{ ||Av|| : ||v|| = 1, v orthogonal to v1, v2, . . . , v_{i−1} }
            define σi = ||Avi||
        until Av = 0 for every vector v orthogonal to v1, . . . , vi
        return [v1, v2, . . . , vr]          (r = number of iterations)
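A runnable sketch of this greedy procedure (my own; it solves the inner arg max by power iteration on the rows of A projected away from the v's found so far, which is not the method the course develops later):

    import numpy as np

    def find_right_singular_vectors(A, tol=1e-10, iters=500):
        # Greedy sketch: in iteration i, find a unit vector v, orthogonal to the
        # previously chosen v's, that (approximately) maximizes ||Av||.
        A = np.asarray(A, dtype=float)
        m, n = A.shape
        vs, sigmas = [], []
        rng = np.random.default_rng(0)
        for _ in range(min(m, n)):
            B = A.copy()
            for v in vs:                         # project rows of A away from chosen v's
                B -= np.outer(B @ v, v)
            if np.linalg.norm(B) < tol:          # Av = 0 for all v orthogonal to v1..vi
                break
            v = rng.standard_normal(n)           # power iteration on B^T B
            for _ in range(iters):
                v = B.T @ (B @ v)
                v /= np.linalg.norm(v)
            vs.append(v)
            sigmas.append(np.linalg.norm(A @ v)) # sigma_i = ||A v_i||
        return vs, sigmas

    A = np.array([[1., 4.], [5., 2.]])
    print(find_right_singular_vectors(A)[1])     # ~[6.1, 2.9], the singular values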
Closest dimension-k vector space
Computational Problem: closest low-dimensional subspace:
• input: vectors a1, . . . , am and positive integer k
• output: basis for the dim-k vector space Vk that minimizes Σ_i (distance from ai to Vk)²

Algorithm: In the i-th iteration, select the vector v that maximizes ||Av|| among those
vectors orthogonal to all previously selected vectors. ⇒ v1, . . . , vk
Theorem: For each k ≥ 0, the first k right singular vectors span the k-dimensional space Vk
that minimizes Σ_i (distance from ai to Vk)².
Proof: by induction on k. The case k = 0 is trivial.
Assume the theorem holds for k = q − 1. We prove it for k = q.
Suppose W is a q-dimensional space. Let wq be a unit vector in W that is orthogonal to
v1, . . . , v_{q−1}. (Why is there such a vector?) Let w1, . . . , w_{q−1} be vectors such that
w1, . . . , wq form an orthonormal basis for W. (Why are there such vectors?)
By choice of vq, ||Avq|| ≥ ||Awq||.
By the induction hypothesis, Span {v1, . . . , v_{q−1}} is the (q − 1)-dimensional space
minimizing the sum of squared distances, so ||Av1||² + · · · + ||Av_{q−1}||² ≥
||Aw1||² + · · · + ||Aw_{q−1}||².
Adding these inequalities gives ||Av1||² + · · · + ||Avq||² ≥ ||Aw1||² + · · · + ||Awq||²,
so Span {v1, . . . , vq} is at least as close to the points as W. QED
Proposition: The singular values σ1, . . . , σr are positive and in nonincreasing order.
Proof: σi = ||Avi||, and the norm of a vector is nonnegative. The algorithm stops before it
would choose a vector vi such that ||Avi|| is zero, so the singular values are positive. The
first right singular vector is chosen most freely, followed by the second, etc. QED
Proposition: Right singular vectors are orthonormal.
Proof: In iteration i, vi is chosen from among vectors that have norm one and are orthogonal
to v1, . . . , v_{i−1}. QED
Theorem: Let A be an m × n matrix, and let a1, . . . , am be its rows. Let v1, . . . , vr be its
right singular vectors, and let σ1, . . . , σr be its singular values. For k = 1, 2, . . . , r,
Span {v1, . . . , vk} is the k-dimensional vector space V that minimizes

    (distance from a1 to V)² + · · · + (distance from am to V)²

Proposition: Left singular vectors u1, . . . , ur are orthonormal. (See text for proof.)
Closest k-dimensional affine space

Use the centering technique:

Find the centroid ā of the input points a1, . . . , am, and subtract it from each of the input
points. Then find a basis v1, . . . , vk for the k-dimensional vector space closest to
a1 − ā, . . . , am − ā. The k-dimensional affine space closest to the original points
a1, . . . , am is

    {ā + v : v ∈ Span {v1, . . . , vk}}
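A short sketch of these two steps together (my own numpy illustration):

    import numpy as np

    def closest_affine_space(points, k):
        # Returns (a_bar, basis): the closest k-dimensional affine space is
        # { a_bar + v : v in the row space of basis }.  A sketch, not the course's code.
        A = np.asarray(points, dtype=float)
        a_bar = A.mean(axis=0)                          # centroid
        _, _, Vt = np.linalg.svd(A - a_bar, full_matrices=False)
        return a_bar, Vt[:k]                            # first k right singular vectors

    pts = np.random.default_rng(1).standard_normal((20, 5)) + 3.0
    a_bar, basis = closest_affine_space(pts, k=2)
    print(a_bar.shape, basis.shape)                     # (5,) (2, 5)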


Deriving the Singular Value Decomposition
Let A be an m × n matrix. We have defined a procedure to obtain
    v1, . . . , vr    the right singular vectors    orthonormal by choice
    σ1, . . . , σr    the singular values           positive
    u1, . . . , ur    the left singular vectors     orthonormal by Proposition
such that σi ui = A vi for i = 1, . . . , r.
Express the equations using matrix-matrix multiplication:

    A [ v1 · · · vr ] = [ σ1 u1 · · · σr ur ]

We rewrite the equation as

    A [ v1 · · · vr ] = [ u1 · · · ur ] diag(σ1, . . . , σr)

Assume the number r of singular values is n. Then the matrix [ v1 · · · vr ] is square and its
columns are orthonormal, so it is an orthogonal matrix, and its inverse is its transpose.
Multiplying both sides of the equation on the right by that transpose, we obtain

    A = [ u1 · · · un ] diag(σ1, . . . , σn) [ v1 · · · vn ]^T

That is,

    A = U Σ V^T

where U and V are column-orthogonal and Σ is diagonal with positive diagonal elements.
This factorization is called the (compact) singular value decomposition (SVD) of A.
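A quick numpy check of the factorization (numpy's svd with full_matrices=False gives the thin SVD, which coincides with the compact SVD when all singular values are nonzero):

    import numpy as np

    A = np.array([[1., 4.], [5., 2.]])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    print(np.allclose(A, U @ np.diag(s) @ Vt))          # A = U Sigma V^T
    print(np.allclose(U.T @ U, np.eye(2)))              # columns of U are orthonormal
    print(np.allclose(Vt @ Vt.T, np.eye(2)))            # columns of V are orthonormal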
Existence of SVD
Lemma: Each row of A lies in the span of the right singular vectors.
Proof:
Let V = Span {v1, . . . , vr}. Recall the procedure:

    def find_right_singular_vectors(A):
        for i = 1, 2, . . . , min{m, n}:
            vi = arg max{ ||Av|| : ||v|| = 1, v orthogonal to v1, v2, . . . , v_{i−1} }
            σi = ||Avi||
        until Av = 0 for every vector v orthogonal to v1, . . . , vi
        let r be the final value of the loop variable i
        return [v1, v2, . . . , vr]

By the termination condition, Av = 0 for every vector v orthogonal to V.
For each row ai, write ai = ai^‖V + ai^⊥V. Since ai^⊥V is orthogonal to V, A ai^⊥V = 0;
in particular ⟨ai, ai^⊥V⟩ = 0. Then

    0 = ⟨ai, ai^⊥V⟩ = ⟨ai^‖V + ai^⊥V, ai^⊥V⟩
                    = ⟨ai^‖V, ai^⊥V⟩ + ⟨ai^⊥V, ai^⊥V⟩
                    = 0 + ||ai^⊥V||²

so ai^⊥V = 0. This shows ai = ai^‖V, which shows that ai lies in V. QED
Existence, continued
Since each row ai lies in Span {v1, . . . , vr} and v1, . . . , vr are orthonormal,
ai = ⟨ai, v1⟩ v1 + · · · + ⟨ai, vr⟩ vr for each i. In matrix form,

    A = B [ v1 · · · vr ]^T

where B is the m × r matrix whose (i, j) entry is ⟨ai, vj⟩.
The j-th column of B is

    ( ⟨a1, vj⟩, ⟨a2, vj⟩, . . . , ⟨am, vj⟩ )^T

which is A vj, which is σj uj. So B = [ σ1 u1 · · · σr ur ] = U Σ, and hence

    A = U Σ V^T
The Singular Value Decomposition

The (compact) SVD of a matrix A is the factorization of A as

    A = U Σ V^T

where U and V are column-orthogonal and Σ is diagonal with positive diagonal elements.
In general, Σ is allowed to have zero diagonal elements.
Different flavors of the SVD A = U Σ V^T of an m × n matrix:
• traditional: U is m × m, V^T is n × n, and Σ is m × n
• reduced, or thin:
  • if m ≥ n then U is m × n, V^T is n × n;
  • if m ≤ n then U is m × m, V^T is m × n.
• compact: thin, but omit zero singular values.
We never use the traditional SVD; we mostly use the compact SVD.
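In numpy terms (a sketch): full_matrices=True corresponds to the traditional SVD, full_matrices=False to the thin SVD, and dropping the zero (or numerically negligible) singular values yields the compact SVD.

    import numpy as np

    A = np.random.default_rng(2).standard_normal((6, 3))        # m = 6, n = 3

    U_full, s, Vt_full = np.linalg.svd(A, full_matrices=True)
    print(U_full.shape, Vt_full.shape)          # (6, 6) (3, 3): traditional (Sigma is 6 x 3)

    U_thin, s, Vt_thin = np.linalg.svd(A, full_matrices=False)
    print(U_thin.shape, Vt_thin.shape)          # (6, 3) (3, 3): reduced / thin

    r = int(np.sum(s > 1e-12))                  # number of nonzero singular values
    U_c, s_c, Vt_c = U_thin[:, :r], s[:r], Vt_thin[:r]           # compact
    print(np.allclose(A, U_c @ np.diag(s_c) @ Vt_c))             # True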
Properties of the SVD

• Row space of A = row space of V^T.
• Column space of A = column space of U.
SVD of the transpose

We can go from the SVD of A to the SVD of A^T. Starting from

    A = U Σ V^T

define Ū = V and V̄ = U. Then

    A^T = Ū Σ V̄^T
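A quick check of this relationship (my own sketch), swapping the factors returned by numpy:

    import numpy as np

    A = np.random.default_rng(3).standard_normal((5, 3))
    U, s, Vt = np.linalg.svd(A, full_matrices=False)     # A = U Sigma V^T

    U_bar, V_bar = Vt.T, U                               # U-bar = V,  V-bar = U
    print(np.allclose(A.T, U_bar @ np.diag(s) @ V_bar.T))  # A^T = U-bar Sigma V-bar^T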
Best rank-k approximation in terms of the singular value decomposition
Start by writing the SVD of A:

    A = U diag(σ1, . . . , σr) V^T

Replace σ_{k+1}, . . . , σ_r with zeroes. We obtain

    Ã = U diag(σ1, . . . , σk, 0, . . . , 0) V^T

This gives the same approximation as before.
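A sketch of this construction in numpy; zeroing the trailing singular values is equivalent to keeping only the first k columns of U and V:

    import numpy as np

    def best_rank_k(A, k):
        # Best rank-k approximation of A via the SVD (a sketch).
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        s = s.copy()
        s[k:] = 0.0                             # replace sigma_{k+1}, ..., sigma_r with zeroes
        return U @ np.diag(s) @ Vt

    A = np.array([[1., 4.], [5., 2.]])
    print(np.round(best_rank_k(A, 1), 1))       # ~[[2.6, 2.1], [4.0, 3.2]], as in the earlier example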


Computing SVD

• I derived the SVD assuming a procedure to solve this problem:

      arg max{ ||Av|| : ||v|| = 1, v orthogonal to v1, v2, . . . , v_{i−1} }

• Later we give a procedure to approximately solve this problem.
• The most efficient method for computing the SVD is beyond the scope of the course.
Example: Senators

First center the data. Then find the first two right singular vectors v1 and v2.
Projecting onto these gives two coordinates.
To find the singular vectors,
• make a matrix A whose rows are the centered versions of the vectors,
• find the SVD of A using the svd module:
      >>> U, Sigma, V = svd.factor(A)
• the first two columns of V are the first two right singular vectors.
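The same pipeline in plain numpy (a sketch; svd.factor is the course's own module, so np.linalg.svd stands in for it here, and the data matrix is a random placeholder):

    import numpy as np

    # Placeholder for the matrix whose rows are the senators' voting vectors.
    A = np.random.default_rng(4).standard_normal((10, 7))
    A = A - A.mean(axis=0)                      # center the data

    U, Sigma, Vt = np.linalg.svd(A, full_matrices=False)
    v1, v2 = Vt[0], Vt[1]                       # first two right singular vectors

    coords = A @ np.column_stack([v1, v2])      # two coordinates per senator
    print(coords.shape)                         # (10, 2), ready to scatter-plot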
Example: Senators, two principal components
