
MAT 517: COMPUTATIONAL LINEAR ALGEBRA

Assoc. Prof. Dr. Noor Atinah Ahmad


School of Mathematical Sciences
Universiti Sains Malaysia
nooratinah@usm.my

LECTURE 6: Singular Value Decomposition (SVD)
Matrix factorization at a glance

• You've seen how useful the LU and Cholesky factorizations can be
  for solving linear systems of equations.
• The QR factorization is also a form of triangular factorization,
  and is more stable (ONLY FOR FULL RANK MATRICES).
• Eigenvalue revealing factorizations (remember these?)
  - Matrix diagonalization (diagonalizable matrices):
       A = V D V^{-1}   for a nondefective matrix A
  - Spectral decomposition (symmetric matrices):
       A = U D U^T      for a symmetric matrix A
The spectral decomposition

   A = [ u1 u2 ... un ] diag(λ1, λ2, ..., λn) [ u1^T ; u2^T ; ... ; un^T ]
     =        U                   D                       U^T ,

where U = [ u1 u2 ... un ] has orthonormal columns and D = diag(λ1, ..., λn).
How do we make sense of the spectral decomposition?

One way to make sense of it is to take it apart piece by piece ---
LITERALLY!
Start with this:

   A = U D U^T = [ λ1 u1   λ2 u2   ...   λn un ] [ u1^T ; u2^T ; ... ; un^T ].
Taking the terms apart

The (i,j) entry of A:

   aij = [ λ1 ui1   λ2 ui2   ...   λn uin ] [ uj1 ; uj2 ; ... ; ujn ]
       = λ1 ui1 uj1 + λ2 ui2 uj2 + ... + λn uin ujn.

Note that uik ujk is the (i,j) entry of uk uk^T, k = 1, 2, ..., n.
Which means….

A can be expressed as

   A = λ1 u1 u1^T + λ2 u2 u2^T + ... + λn un un^T.

• A is a sum of 'decreasing' terms (when the eigenvalues are ordered
  so that |λ1| ≥ |λ2| ≥ ... ≥ |λn|).
• The terms are built from the 'outer products' uk uk^T of the
  eigenvectors. (A MATLAB check of this expansion follows.)
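A quick numerical check of this expansion (a sketch; the symmetric test
matrix here is just an arbitrary example):

% Spectral decomposition as a sum of outer products
n = 5;
B = randn(n); A = (B + B')/2;            % an arbitrary symmetric test matrix
[U, D] = eig(A);                         % columns of U: orthonormal eigenvectors
A_sum = zeros(n);
for k = 1:n
    A_sum = A_sum + D(k,k) * U(:,k) * U(:,k)';   % lambda_k * u_k * u_k'
end
norm(A - A_sum)                          % ~0 up to rounding error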
The column space of A

From the perspective of eigenvalues and eigenvectors, the column space
of A contains vectors of the form (for any x ∈ R^n)

   Ax = λ1 (u1^T x) u1 + λ2 (u2^T x) u2 + ... + λn (un^T x) un.

The first term is (typically) the largest component, the second term
the second largest component, and so on.
The principal subspace

Principal subspace of rank k:

   Span{u1, u2, ..., uk}

The spectral decomposition of A reveals the principal subspace, i.e.
the subspace which contains most of the important information in
range(A).
Generalizing spectral decomposition

We would like a matrix decomposition with these properties:

• Reveals eigenvalues and eigenvectors (i.e. latent factors).
• Exists for a wide range of matrices.
• Computable.
Definition: Singular Value Decomposition ( m ≥ n )

The singular value decomposition (SVD) of an m × n matrix A is

   A = U Σ V^T,

where
• U is m × n with orthonormal columns;
• V is n × n with orthonormal columns (hence orthogonal);
• Σ is an n × n diagonal matrix;
• the columns of U ( V ) are the left (right) singular vectors;
• the diagonal entries of Σ are the singular values.

This is called the "economy size" or "reduced" SVD, which you get in
MATLAB using the command

   [U, S, V] = svd(A,'econ');

Written out,

   A = [ u1 u2 ... un ] diag(σ1, σ2, ..., σn) [ v1^T ; v2^T ; ... ; vn^T ]
     =        U                   Σ                       V^T .
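A quick way to see these properties in MATLAB (a sketch; the test matrix
is arbitrary):

% Reduced SVD of a random 6-by-3 matrix
A = randn(6, 3);
[U, S, V] = svd(A, 'econ');
size(U)                  % 6-by-3: orthonormal columns
size(S)                  % 3-by-3: diagonal, singular values in decreasing order
size(V)                  % 3-by-3: orthogonal
norm(U'*U - eye(3))      % ~0: columns of U are orthonormal
norm(A - U*S*V')         % ~0: A is exactly reconstructed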
Taking the SVD apart

Start with this:

   A = U Σ V^T = [ σ1 u1   σ2 u2   ...   σn un ] [ v1^T ; v2^T ; ... ; vn^T ].
Then look at each entry in A

The (i,j) entry of A:

   aij = [ σ1 ui1   σ2 ui2   ...   σn uin ] [ v1j ; v2j ; ... ; vnj ]
       = σ1 ui1 v1j + σ2 ui2 v2j + ... + σn uin vnj.

Note that uik vjk is the (i,j) entry of uk vk^T, k = 1, 2, ..., n.
Which means….

A can be expressed as

   A = σ1 u1 v1^T + σ2 u2 v2^T + ... + σn un vn^T.

• A is a sum of 'decreasing' terms (σ1 ≥ σ2 ≥ ... ≥ σn ≥ 0).
• The terms are built from the 'outer products' uk vk^T of the left
  and right singular vectors.
What can be said about the column space of A

RECALL: column space of A = range(A).

For any x ∈ R^n, Ax ∈ range(A).

Consider Ax in this form:

   Ax = σ1 u1 (v1^T x) + σ2 u2 (v2^T x) + ... + σn un (vn^T x);

then we observe that every Ax lies in the span of the orthonormal
vectors u1, u2, ..., un, and the SVD provides the coordinates of Ax
with respect to u1, u2, ..., un.

If k is the number of nonzero singular values of A, then the rank of
A is k (see the MATLAB sketch below).
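This is also how the numerical rank can be estimated in MATLAB (a
sketch; the tolerance mirrors the convention used by MATLAB's rank
function):

% Numerical rank = number of singular values above a tolerance
A = [1 2 3; 2 4 6; 1 0 1];       % second row = 2 * first row, so rank 2
s = svd(A);
tol = max(size(A)) * eps(s(1));  % tolerance relative to the largest singular value
k = sum(s > tol)                 % numerical rank (here 2)
rank(A)                          % MATLAB's built-in rank gives the same answer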
What can be said about the column space of A

SVD helps provide a useful description of range(A):

• It tells you the size of range(A) (i.e. its rank).
• It works for full rank as well as rank deficient matrices.
• It provides you with the orthonormal factors (i.e. u1, u2, ..., un)
  that describe range(A).
• It provides you with the 'loadings', or the weight of each factor
  (i.e. σi (vi^T x)), as a measure of how much each factor affects the
  information contained in range(A).
How do we prove that SVD exists?

You prove it by showing exactly what U, V and Σ are.

We will start by looking at A^T A.

If A = U Σ V^T, then

   A^T A = (U Σ V^T)^T (U Σ V^T)
         = V Σ^T U^T U Σ V^T
         = V Σ^T Σ V^T.

DO YOU RECOGNIZE THIS?


The spectral decomposition of A^T A

   A^T A = V (Σ^T Σ) V^T,

where A^T A is an SPD (or symmetric positive semidefinite) matrix,
V is an orthogonal matrix, and Σ^T Σ is a diagonal matrix.

• Columns of V are orthonormal eigenvectors of A^T A.
• Diagonal entries of Σ^T Σ are the associated eigenvalues.
A few notes about the eigenvalues of A^T A

• FACT: A^T A is positive definite if A is full rank, and positive
  semidefinite if A is rank deficient.

• FACT: The eigenvalues of A^T A are always ≥ 0.


Eigenvectors of A^T A

A^T A is symmetric, thus we can always find a set of n orthonormal
eigenvectors associated with the eigenvalues λ1, λ2, ..., λn.

Let's call these eigenvectors v1, v2, ..., vn.

Orthonormal means

   vi^T vj = 0 whenever i ≠ j,
   vi^T vj = 1 whenever i = j.
Diagonal entries of Σ^T Σ

It should be clear that

   Σ^T Σ = Σ^2 = diag(σ1^2, σ2^2, ..., σn^2).

Thus, if λ1, λ2, ..., λn are the eigenvalues of A^T A arranged such
that λ1 ≥ λ2 ≥ ... ≥ λn ≥ 0, then we can define

   σi = sqrt(λi) ≥ 0,   i = 1, 2, ..., n.
About V

Recall that V appears in the spectral decomposition of A^T A.

So we put the orthonormal eigenvectors v1, v2, ..., vn into the
columns of V:

   V = [ v1 v2 ... vn ],

.... AND WE NAIL V.
What about U?

To see what U is, we need to go back to the SVD:

   A = U Σ V^T.

We can rewrite this as

   A V = U Σ.

We already know V and Σ, so we can solve for U by comparing the two
sides of this equation.
Still looking for U….

Write V and Σ in terms of their columns:

   A [ v1 v2 ... vn ] = U [ σ1 e1   σ2 e2   ...   σn en ].

The RHS:

   U [ σ1 e1   σ2 e2   ...   σn en ] = [ σ1 U e1   σ2 U e2   ...   σn U en ]
                                     = [ σ1 u1   σ2 u2   ...   σn un ],

where u1, u2, ..., un are the columns of U.

Compare with the LHS, column by column:

   A vi = σi ui,   i = 1, 2, ..., n.
I think we've got U....almost

If σi ≠ 0:   ui = (1/σi) A vi,   i = 1, 2, ..., n.

If only the first k ≤ n of the σi's are nonzero, i.e.

   σ1 ≥ σ2 ≥ ... ≥ σk > 0   and   σ_{k+1} = σ_{k+2} = ... = σn = 0,

then

   ui = (1/σi) A vi,   i = 1, 2, ..., k.

WE'VE ONLY JUST GOT THE FIRST k COLUMNS OF U.....
What about the rest of the columns in U?

Ok...let's go back to the SVD:

   A = U Σ V^T.

We need to look at the details:

   A = [ U1 U2 ] [ Σ1 0 ; 0 0 ] [ V1^T ; V2^T ],

where U1 ∈ R^(m×k), U2 ∈ R^(m×(n−k)),
      Σ1 ∈ R^(k×k),
      V1 ∈ R^(n×k), V2 ∈ R^(n×(n−k)).
Look closer....

What we already know (1):

   Σ1 = diag(σ1, σ2, ..., σk),

the nonzero singular values.
Look closer still....

We also know these (2i):

   V1 = [ v1 v2 ... vk ],

where {v1, v2, ..., vk} are the eigenvectors of A^T A associated with
the nonzero eigenvalues {λ1, λ2, ..., λk}.

And these (2ii):

   V2 = [ v_{k+1} v_{k+2} ... vn ],

where {v_{k+1}, v_{k+2}, ..., vn} are the eigenvectors associated with
the zero eigenvalue (algebraic multiplicity n − k).
Look even closer....

We've figured out how to get these (3):

   U1 = [ u1 u2 ... uk ],

where {u1, u2, ..., uk} are orthonormal vectors by virtue of the
orthonormality of {v1, v2, ..., vk}.

We are yet to get these:

   U2 = [ u_{k+1} u_{k+2} ... un ].

BUT.. DO WE REALLY NEED IT???
Let's see if we really need U2....

   A = [ U1 U2 ] [ Σ1 0 ; 0 0 ] [ V1^T ; V2^T ]
     = [ U1 U2 ] [ Σ1 V1^T ; 0 ]
     = U1 Σ1 V1^T        <-- THE REDUCED SVD

YOU ONLY NEED THE REDUCED SVD TO FULLY REPRESENT A!
If we really need a full U.....

We need some linear algebra to determine the rest of U:

• Orthogonal complements:
  Let S be a subspace of R^m. Then the subspace S^⊥ is defined by

     S^⊥ = { y ∈ R^m : y^T x = 0 for all x ∈ S }.

• Theorem (refer pg. 22 & 23, Datta):

     (I)   null(A)   = range(A^T)^⊥
     (II)  null(A^T) = range(A)^⊥
What do we have in U.....

   U = [ U1 U2 ] = [ u1 u2 ... uk | u_{k+1} u_{k+2} ... un ]

U1 spans S and U2 spans S^⊥.

WHAT IS S ???
Discovering S.....

Notice that

   ui = (1/σi) A vi,   i = 1, 2, ..., k
   ⟹ ui ∈ range(A)
   ⟹ S = span{u1, u2, ..., uk} ⊆ range(A).

(Picture on the original slide: S = span{u1, ..., uk} sitting inside
range(A).)
When is S as big as range(A)?

It should be obvious that range(A) ⊆ R^m.

Let r ≤ n be the rank of A (i.e. r = dimension of range(A)).

When k = r, S becomes equal to range(A). But in fact k = r!

REMEMBER THIS:

   Ax = σ1 u1 (v1^T x) + σ2 u2 (v2^T x) + ... + σn un (vn^T x)

for any x ∈ R^n, and only the first k terms have nonzero σi. Thus
every Ax lies in span{u1, u2, ..., uk}, so

   range(A) ⊆ span{u1, u2, ..., uk} = S ⊆ range(A),

and therefore S = range(A) and k = r.
What about S^⊥ ?

Recall Theorem (II) earlier:

   S^⊥ = range(A)^⊥ = null(A^T).

Therefore, to find {u_{k+1}, u_{k+2}, ..., un}, we need to find the
solution space of the homogeneous system

   A^T u = 0,

and determine an orthonormal basis of that space.
In summary...

The SVD:

   A = U Σ V^T = [ U1 U2 ] [ Σ1 0 ; 0 0 ] [ V1^T ; V2^T ],

where

   U1 ∈ R^(m×k),  U2 ∈ R^(m×(n−k)),
   Σ1 ∈ R^(k×k),
   V1 ∈ R^(n×k),  V2 ∈ R^(n×(n−k)).
In summary: The key points

• Columns of V1: {v1, v2, ..., vk}
  Eigenvectors of A^T A associated with the nonzero eigenvalues
  λ1 ≥ λ2 ≥ ... ≥ λk > 0.
• Columns of V2: {v_{k+1}, v_{k+2}, ..., vn}
  Eigenvectors of A^T A associated with the eigenvalues
  λ_{k+1} = λ_{k+2} = ... = λn = 0.
• Diagonal entries of Σ1:
  σi = sqrt(λi),   i = 1, 2, ..., k.
• Columns of U1: {u1, u2, ..., uk}
  ui = (1/σi) A vi,   i = 1, 2, ..., k.
• Columns of U2: {u_{k+1}, u_{k+2}, ..., um}
  Orthonormal basis of null(A^T).

(A MATLAB sketch of this construction is given below.)
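A minimal MATLAB sketch of this construction, following the steps above
(for illustration only; in practice you should call svd, which uses a
more stable algorithm than forming A'*A; the function name and the
tolerance are illustrative choices):

function [U1, S1, V1] = svd_by_eig(A)
% Construct the reduced (rank-k) SVD of A from the eigendecomposition
% of A'*A, following the steps in the lecture.  Illustrative only.
    [V, D] = eig(A'*A);
    [lambda, idx] = sort(diag(D), 'descend');   % eigenvalues, largest first
    V = V(:, idx);
    tol = max(size(A)) * eps(max(lambda));      % illustrative "nonzero" tolerance
    k = sum(lambda > tol);                      % number of nonzero singular values
    sigma = sqrt(lambda(1:k));                  % sigma_i = sqrt(lambda_i)
    V1 = V(:, 1:k);
    U1 = A * V1 * diag(1./sigma);               % u_i = (1/sigma_i) * A * v_i
    S1 = diag(sigma);
end
% Check: [U1,S1,V1] = svd_by_eig(A); norm(A - U1*S1*V1') should be ~0.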
SVD by hand:

Let's find the SVD of

   A = [ 1 1 ; 1 1 ; 0 0 ]      (3 × 2).

   A^T A = [ 2 2 ; 2 2 ],   with eigenvalues λ1 = 4, λ2 = 0.

I can now write down Σ:

   Σ = [ 2 0 ; 0 0 ].
SVD by hand (cont'd):

We can work out the eigenvectors of A^T A to get

   v̂1 = [ 1 ; 1 ],   v̂2 = [ 1 ; -1 ].

Check for orthogonality:   v̂1^T v̂2 = 0.
Check for orthonormality:  v̂1^T v̂1 = v̂2^T v̂2 = 2 ≠ 1.

Normalize:

   v1 = v̂1 / ||v̂1||_2 = (1/√2) [ 1 ; 1 ],   v2 = v̂2 / ||v̂2||_2 = (1/√2) [ 1 ; -1 ].

I can now write down V:

   V = (1/√2) [ 1 1 ; 1 -1 ].
SVD by hand (cont'd):

Now, A is 3 × 2, so we should expect U to be 3 × 3 for the full SVD
and 3 × 2 for the economy size SVD, i.e.

   U = [ u1 u2 u3 ],   u1, u2, u3 ∈ R^3.

u1 should not be a problem; this is just

   u1 = (1/σ1) A v1 = (1/2) [ 1 1 ; 1 1 ; 0 0 ] (1/√2) [ 1 ; 1 ] = (1/√2) [ 1 ; 1 ; 0 ].
SVD by hand (cont'd):

To get u2 and u3, first we need to solve

   A^T u = 0:

   [ 1 1 0 ; 1 1 0 ] [ u1 ; u2 ; u3 ] = [ 0 ; 0 ],

which gives

   u = [ k ; -k ; l ] = k [ 1 ; -1 ; 0 ] + l [ 0 ; 0 ; 1 ],   for some k, l ∈ R.
SVD by hand (cont'd):

So it looks like null(A^T) is two dimensional, and an obvious basis for
the space is

   { [ 1 ; -1 ; 0 ],  [ 0 ; 0 ; 1 ] }        <-- ORTHOGONAL!!!

We can then set

   u2 = (1/√2) [ 1 ; -1 ; 0 ],   u3 = [ 0 ; 0 ; 1 ].      <-- NORMALIZED FOR ORTHONORMALITY
SVD using MATLAB
>> [U,S,V] = svd(A) % [U,S,V] = svd(A,'econ') for economy size

U =

-0.7071 -0.7071 0
-0.7071 0.7071 0
0 0 1.0000

S =

2 0
0 0
0 0

V =

-0.7071 -0.7071
-0.7071 0.7071
Some image processing

Introducing Lenna

A 1972 image of Lenna


Let’s use MATLAB to process Lenna….

First we need to turn Lenna from a png file into a matrix:

>> Lenna1 = imread('lenna.png');   % Read the image
>> Lenna2 = rgb2gray(Lenna1);      % Convert to gray scale
>> Lenna = im2double(Lenna2);      % Convert to double
>> imshow(Lenna);                  % Show Lenna
>> size(Lenna)                     % Show the dimensions
Gray scale Lenna

This is what you see in MATLAB (figure on the original slide).

The 'rgb2gray' and 'im2double' commands are important: they turn the
image into a 2D matrix of double precision values that we can compute
with. The size of Lenna is 220 × 220.
Compute the SVD of Lenna

Lenna's SVD:

>> [U,S,V] = svd(Lenna);

We'll now experiment with the series

   A = Lenna = σ1 u1 v1^T + σ2 u2 v2^T + ... + σn un vn^T.

The following code computes a truncation of the series after the
first k terms:

A = zeros(size(Lenna));
for i = 1:k
    A = A + S(i,i)*U(:,i)*V(:,i)';
end
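A possible way to run the comparison for several truncation levels (a
sketch; it assumes [U,S,V] = svd(Lenna) has been computed above, and
the values of k follow the slides):

% Display rank-k approximations of Lenna side by side
ks = [1 10 20 40 80 100];
figure;
for j = 1:numel(ks)
    k = ks(j);
    Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % rank-k truncation, written with submatrices
    subplot(2, 3, j);
    imshow(Ak);
    title(sprintf('k = %d', k));
end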
Let's compare Lenna and its truncation

(Figures on the original slides compare Lenna with its rank-k
truncations for k = 1, 10, 20, 40, 80 and 100.)
What have we observed?

• Lenna at k ≈ 80 is almost as good as Lenna at k = 220.
• Why???
• Ok, let's look at the singular values:

(Figure on the original slide: plot of the singular values against k;
only roughly the first 80 singular values are significant.)
Observation 1

SVD has a way of compressing the information in the matrix into the
leading terms (those associated with the larger singular values).
Observation 2

Let's look at it from the perspective of range(A) (important linear
algebra problems are often associated with this vector space):

   Ax = σ1 (v1^T x) u1 + σ2 (v2^T x) u2 + ... + σn (vn^T x) un
      = c1 u1 + c2 u2 + ... + cn un,

with the coefficients ci = σi (vi^T x) typically decreasing in
magnitude, since σ1 ≥ σ2 ≥ ... ≥ σn. If only the first k terms are
significant, then a general consensus on range(A) (and the information
it contains) can be achieved by looking only at Span{u1, u2, ..., uk},
i.e. SVD allows us to reduce the dimension of the information space
from n to k.
Observation 2 (Example: Stock market prediction)

Let column j of A hold the price history sj(t) of stock j, and let
each row correspond to one day:

              s1(t)   s2(t)   ...   sn(t)
   day 1    [ s11     s12     ...    s1n ]
   day 2    [ s21     s22     ...    s2n ]
    ...     [  ...     ...           ... ]
   day m    [ sm1     sm2     ...    smn ]

   klse(t) ≈ Ax = x1 s1(t) + x2 s2(t) + ... + xn sn(t).

Using SVD, we can get a good approximation of the stock market data:

   klse(t) ≈ σ1 (v1^T x) u1 + σ2 (v2^T x) u2 + ... + σk (vk^T x) uk.
The full SVD

The FULL singular value decomposition (SVD) of an m × n matrix A is

   A = U Σ V^T,

where
• U is m × m with orthonormal columns;
• V is n × n with orthonormal columns;
• Σ is an m × n diagonal matrix.

With this definition U and V can be treated as orthogonal matrices.
(We need this definition to prove several important theorems related
to the SVD.)
No difference between FULL and REDUCED SVD

FULL SVD:

   A = U Σ V^T = [ U1 U2 ] [ Σ1 0 ; 0 0 ] [ V1^T ; V2^T ]
     = U1 Σ1 V1^T + 0
     = U1 Σ1 V1^T        <-- the REDUCED SVD
Important results (1): Columns of U and V

Let k = rank(A) and A = U Σ V^T.

• A has k nonzero singular values.
• Sp{u1, u2, ..., uk} = range(A)              (the column space of A)
• Sp{u_{k+1}, u_{k+2}, ..., um} = null(A^T)   (the orthogonal complement of the column space of A)
• Sp{v1, v2, ..., vk} = range(A^T)            (the row space of A)
• Sp{v_{k+1}, v_{k+2}, ..., vn} = null(A)     (the orthogonal complement of the row space of A)
Important results (2): ||A||_2

Theorem 6.1: If A is an m × n matrix with singular value
decomposition U Σ V^T, then

   ||A||_2 = σ1   (the largest singular value).
Important results (2): Proof of Theorem 6.1

U and V are orthogonal, therefore

   ||A||_2 = ||U Σ V^T||_2 = ||Σ||_2.

Then

   ||Σ||_2 = max_{x ≠ 0} ||Σ x||_2 / ||x||_2
           = max_{x ≠ 0} ( σ1^2 x1^2 + ... + σn^2 xn^2 )^{1/2} / ( x1^2 + ... + xn^2 )^{1/2}
           = σ1,

and ||Σ x||_2 / ||x||_2 achieves this maximum when x = e1. That's it!
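Theorem 6.1 is easy to check numerically (a sketch with an arbitrary
test matrix):

% The 2-norm equals the largest singular value
A = randn(5, 3);
s = svd(A);
norm(A, 2) - s(1)         % ~0 up to rounding error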
Important results (3): cond(A)

Corollary 6.2: If A is an m × n full rank matrix with singular value
decomposition U Σ V^T, then

   cond(A) = σ1 / σn.
Important results (3): Proof of Corollary 6.2

Based on the definition: cond(A) = ||A||_2 ||A†||_2.

Recall: A† = (A^T A)^{-1} A^T   (the pseudoinverse of A).

Using the SVD: A† = V Σ† U^T   (this is the SVD of A†),

where

   Σ† = (Σ^T Σ)^{-1} Σ^T = diag(1/σ1, 1/σ2, ..., 1/σn).

Thus

   cond(A) = ||A||_2 ||A†||_2 = σ1 · (1/σn) = σ1 / σn.   That's it!

(σ1 is the largest singular value of A, and 1/σn is the largest
singular value of A†.)
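Corollary 6.2 can be checked the same way (a sketch; MATLAB's cond uses
the 2-norm by default):

% cond(A) equals sigma_1 / sigma_n for a full rank A
A = randn(6, 4);          % full column rank with probability 1
s = svd(A);
cond(A) - s(1)/s(end)     % ~0 up to rounding error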


Important results (4): ||A||_F

Lemma 6.3: If A is an m × n matrix and Q is an orthogonal matrix, then

   ||Q A||_F = ||A||_F.

Theorem 6.4: If A is an m × n matrix with singular value
decomposition U Σ V^T, then

   ||A||_F = ( σ1^2 + σ2^2 + ... + σn^2 )^{1/2}.
Important results (4): Proofs

Proof of Lemma 6.3:

   ||Q A||_F^2 = || [ Q a1  Q a2  ...  Q an ] ||_F^2
               = ||Q a1||_2^2 + ||Q a2||_2^2 + ... + ||Q an||_2^2
               = ||a1||_2^2 + ||a2||_2^2 + ... + ||an||_2^2
               = ||A||_F^2.

Proof of Theorem 6.4:

   ||A||_F = ||U Σ V^T||_F = ||Σ||_F = ( σ1^2 + σ2^2 + ... + σn^2 )^{1/2}.
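And the corresponding numerical check for Theorem 6.4 (a sketch):

% The Frobenius norm equals the 2-norm of the vector of singular values
A = randn(5, 3);
s = svd(A);
norm(A, 'fro') - norm(s)   % ~0 up to rounding error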
Solving least squares problems using SVD (full rank)

Suppose A ∈ R^(m×n) is full rank and has SVD A = U Σ V^T.

The pseudoinverse:

   A† = (A^T A)^{-1} A^T = V Σ^{-1} U^T.

The least squares solution of Ax = b, in terms of the pseudoinverse
and the SVD:

   x = A† b = V Σ† U^T b.

This can be computed very efficiently (see the next slide).
Computing the least squares solution (A full rank)

Computing x = A† b = V Σ^{-1} U^T b.

First notice that

   y = Σ† U^T b = [ (1/σ1) u1^T b ; (1/σ2) u2^T b ; ... ; (1/σn) un^T b ] ∈ R^n

(just a matrix–vector product followed by n scalar divisions, so it is
cheap to compute).

Thus,

   x = V y = y1 v1 + y2 v2 + ... + yn vn
           = (1/σ1)(u1^T b) v1 + (1/σ2)(u2^T b) v2 + ... + (1/σn)(un^T b) vn.

(A MATLAB sketch of this computation follows.)
Solving least squares problems using SVD (rank deficient)

Suppose A is rank deficient, with rank k < n.

A least squares solution is a solution of the minimization problem

   min_x || b - A x ||_2^2.

Let's analyze this problem using the SVD.
Rank deficient LSQ problem: Analyzing....

Proceed as follows:

   || b - A x ||_2^2 = || b - U Σ V^T x ||_2^2 = || U^T b - Σ V^T x ||_2^2

(using the orthogonality of U). Writing the blocks out,

   || U^T b - Σ V^T x ||_2^2 = || [ U1^T b ; U2^T b ] - [ Σ1 0 ; 0 0 ] [ V1^T x ; V2^T x ] ||_2^2

(the subscript '1' denotes the blocks associated with the nonzero
singular values).
Rank deficient LSQ problem: Still analyzing....

   || U^T b - Σ V^T x ||_2^2
       = || [ U1^T b ; U2^T b ] - [ Σ1 0 ; 0 0 ] [ V1^T x ; V2^T x ] ||_2^2
       = || U1^T b - Σ1 V1^T x ||_2^2 + || U2^T b ||_2^2

(the last term on the right is just the squared 2-norm of a fixed
vector; it does not depend on x), which means

   || b - A x ||_2^2 = || U1^T b - Σ1 V1^T x ||_2^2 + || U2^T b ||_2^2.

Therefore

   min_x || b - A x ||_2^2 = min_x || U1^T b - Σ1 V1^T x ||_2^2 + || U2^T b ||_2^2,

and the minimum is achieved when U1^T b - Σ1 V1^T x = 0.   (*)
The minimum-norm solution

Let y1 = Σ1^{-1} U1^T b.

Notice that x = V1 y1 satisfies U1^T b - Σ1 V1^T x = 0; this particular
solution is called the minimum-norm solution.

When A is full rank, the minimum-norm solution is the unique least
squares solution. However, when A is rank deficient it is no longer
unique, i.e. there are other x that minimize || b - A x ||_2^2.

HOW DO I KNOW THIS???
The other solutions

Let x = V1 y1 + V2 w, where w is any vector in R^(n-k). Then

   U1^T b - Σ1 V1^T x = U1^T b - Σ1 V1^T V1 y1 - Σ1 V1^T V2 w = 0,

since U1^T b - Σ1 V1^T V1 y1 = 0 by the definition of y1, and
V1^T V2 = 0 due to the orthogonality of the columns of V.

Therefore, when A is rank deficient there are infinitely many solutions
other than the minimum-norm solution.

NOTE: w is an arbitrary vector in R^(n-k), which means V2 w is just a
linear combination of v_{k+1}, v_{k+2}, ..., vn.
Let's compare the full rank case and the rank deficient case

We revisit the solution given by the pseudoinverse:

   x = A† b = V Σ† U^T b
            = [ V1 V2 ] [ Σ1^{-1} 0 ; 0 Σ2† ] [ U1^T ; U2^T ] b,

where the subscript '1' is associated with the singular values
σ1, σ2, ..., σk and the subscript '2' is associated with
σ_{k+1}, σ_{k+2}, ..., σn.
One solution reveals all....

The least squares solution given by the pseudoinverse can be written as

   x = A† b = V1 Σ1† U1^T b + V2 Σ2† U2^T b.

Let yi = (1/σi) ui^T b, i = 1, 2, ..., k. Then the first term is

   V1 Σ1† U1^T b = y1 v1 + y2 v2 + ... + yk vk,

which is exactly the minimum-norm solution. The second term is zero,
because Σ2 = 0 and hence Σ2† = 0 (and when A is full rank there is no
second term at all). So the pseudoinverse always returns the
minimum-norm solution.
A procedure for computing the least squares solution using SVD

• Compute the nonzero singular values: σ1, σ2, ..., σk.
• Compute the columns of V1: v1, v2, ..., vk.
• Compute the columns of V2 (if needed): v_{k+1}, v_{k+2}, ..., vn.
• Compute the columns of U1: u1, u2, ..., uk.
• Compute yi = (1/σi) ui^T b, i = 1, 2, ..., k.
• Compute the minimum-norm solution: x̂ = y1 v1 + y2 v2 + ... + yk vk.
• Any other solution can be formed as x̂ + ŷ, where ŷ is any linear
  combination of v_{k+1}, v_{k+2}, ..., vn.

(A MATLAB sketch of this procedure is given below.)
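A minimal MATLAB sketch of the procedure for a rank deficient A (the
test problem is an arbitrary illustration; pinv(A)*b returns the same
minimum-norm solution):

% Minimum-norm least squares solution of a rank deficient problem
A = [1 2; 2 4; 3 6];             % rank 1: second column = 2 * first column
b = [1; 2; 2];
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(s(1));
k = sum(s > tol);                % number of nonzero singular values (here 1)
y = (U(:,1:k)'*b) ./ s(1:k);     % y_i = (1/sigma_i) * u_i' * b
x_min = V(:,1:k) * y             % minimum-norm solution
% Any other least squares solution: add a combination of v_{k+1}, ..., v_n
w = randn(size(A,2) - k, 1);
x_other = x_min + V(:,k+1:end) * w;
norm(A*x_other - b) - norm(A*x_min - b)   % ~0: same residual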
Class Example:

Use the SVD method to find all the least squares solutions of Ax = b,
where

   A = [ 1 2 ; 1 2 ],   b = [ 6 ; 4 ],

given that the eigenvalues of A^T A are λ1 = 10, λ2 = 0, with
corresponding eigenvectors

   x1 = [ 1 ; 2 ],   x2 = [ 2 ; -1 ],

respectively.
Do this with MATLAB:

Use the SVD method to find all the least squares solutions of Ax = b,
where

   A = [ 1 1 1 ; 1 1 0 ; 0 1 1 ; 1 0 0 ; 0 0 1 ],   b = [ 89 ; 67 ; 53 ; 35 ; 20 ].
