
L. Vandenberghe ECE133B (Spring 2020)

3. Symmetric eigendecomposition

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation

Symmetric eigendecomposition 3.1

Eigenvalues and eigenvectors

a nonzero vector x is an eigenvector of the n × n matrix A, with eigenvalue λ, if

Ax = λx

• the matrix λI − A is singular and x is a nonzero vector in the nullspace of λI − A


• the eigenvalues of A are the roots of the characteristic polynomial:

det(λI − A) = λ^n + c_{n−1}λ^{n−1} + · · · + c1λ + (−1)^n det(A) = 0

• since a polynomial of degree n ≥ 1 has a root over the complex numbers, this shows that every square matrix has at least one (possibly complex) eigenvalue
• the roots of the polynomial (and corresponding eigenvectors) may be complex
• (algebraic) multiplicity of an eigenvalue is its multiplicity as a root of det(λI − A)
• there are exactly n eigenvalues, counted with their multiplicity
• set of eigenvalues of A is called the spectrum of A
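
a quick numerical check of these definitions (a minimal numpy sketch; the 2 × 2 matrix is an arbitrary example):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    lam, X = np.linalg.eig(A)          # eigenvalues and eigenvectors
    for i in range(2):
        # check Ax = λx for each eigenpair
        print(np.allclose(A @ X[:, i], lam[i] * X[:, i]))
    print(np.poly(A))                  # coefficients of det(λI − A): [1, -5, 6]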

Symmetric eigendecomposition 3.2


Diagonal matrix

A = diag(A11, A22, . . . , Ann)

• eigenvalues of A are the diagonal entries A11, . . . , Ann


• the n unit vectors e1 = (1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) are eigenvectors:

Aei = Aii ei

• linear combinations of ei are eigenvectors if the corresponding Aii are equal

Example: A = αI is a scalar multiple of the identity matrix

• one eigenvalue α with multiplicity n


• every nonzero vector is an eigenvector

Symmetric eigendecomposition 3.3


Similarity transformation

two matrices A and B are similar if

B = T −1 AT

for some nonsingular matrix T

• the mapping that maps A to T −1 AT is called a similarity transformation


• similarity transformations preserve eigenvalues:

det(λI − B) = det(λI − T −1 AT) = det(T −1(λI − A)T) = det(λI − A)

• if x is an eigenvector of A then y = T −1 x is an eigenvector of B:

By = (T −1 AT)(T −1 x) = T −1 Ax = T −1(λx) = λy

of special interest will be orthogonal similarity transformations (T is orthogonal)
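
a numerical illustration (a numpy sketch; the random matrices are assumptions for the example, and a random T is nonsingular with probability one):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    T = rng.standard_normal((4, 4))
    B = np.linalg.solve(T, A @ T)      # B = T^{-1} A T
    eigA = np.sort_complex(np.round(np.linalg.eigvals(A), 8))
    eigB = np.sort_complex(np.round(np.linalg.eigvals(B), 8))
    print(np.allclose(eigA, eigB))     # similar matrices have the same spectrum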


Symmetric eigendecomposition 3.4
Diagonalizable matrices

a matrix is diagonalizable if it is similar to a diagonal matrix:

T −1 AT = Λ

for some nonsingular matrix T

• the diagonal elements of Λ are the eigenvalues of A


• the columns of T are eigenvectors of A:

A(T ei ) = TΛei = Λii (T ei )

• the columns of T give a set of n linearly independent eigenvectors

not all square matrices are diagonalizable: for example, the 2 × 2 matrix with A12 = 1 and all other entries zero has the single eigenvalue 0 but only one linearly independent eigenvector

Symmetric eigendecomposition 3.5


Spectral decomposition

suppose A is diagonalizable, with

A = TΛT−1 = [ v1 v2 · · · vn ] diag(λ1, λ2, . . . , λn) [ w1 w2 · · · wn ]T

  = λ1 v1 w1T + λ2 v2 w2T + · · · + λn vn wnT

(v1, . . . , vn are the columns of T; w1T, . . . , wnT are the rows of T−1)

this is a spectral decomposition of the linear function f (x) = Ax

• elements of T −1 x are coefficients of x in the basis of eigenvectors {v1, . . . , vn }:

x = TT −1 x = α1 v1 + · · · + αn vn where αi = wiT x

• applied to an eigenvector, f (vi ) = Avi = λi vi is a simple scaling


• by superposition, we find Ax as

Ax = α1 λ1 v1 + · · · + αn λn vn = TΛT −1 x
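
the decomposition can be traced step by step in numpy (a sketch; a random real matrix is diagonalizable with probability one, possibly with complex eigenvalues):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    lam, T = np.linalg.eig(A)          # columns of T are the eigenvectors vi
    W = np.linalg.inv(T)               # rows of W are the wiT above
    x = rng.standard_normal(4)
    alpha = W @ x                      # coefficients αi = wiT x
    print(np.allclose(T @ (lam * alpha), A @ x))   # Ax = Σ αi λi vi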

Symmetric eigendecomposition 3.6


Exercise

recall from 133A the definition of a circulant matrix

A = ⎡ a1    an    an−1  · · ·  a3  a2 ⎤
    ⎢ a2    a1    an    · · ·  a4  a3 ⎥
    ⎢ a3    a2    a1    · · ·  a5  a4 ⎥
    ⎢  ..    ..    ..          ..  .. ⎥
    ⎢ an−1  an−2  an−3  · · ·  a1  an ⎥
    ⎣ an    an−1  an−2  · · ·  a2  a1 ⎦

and its factorization


A = (1/n) W diag(W a) W^H

where W is the discrete Fourier transform matrix (W a is the DFT of a) and

W^{−1} = (1/n) W^H

what is the spectrum of A?
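
a numerical version of this exercise (a sketch; the construction of A from its first column a via np.roll is a helper assumed for illustration):

    import numpy as np

    n = 6
    a = np.random.default_rng(2).standard_normal(n)
    # circulant matrix with first column a: column j is a shifted down cyclically by j
    A = np.column_stack([np.roll(a, j) for j in range(n)])
    eig = np.sort_complex(np.round(np.linalg.eigvals(A), 8))
    dft = np.sort_complex(np.round(np.fft.fft(a), 8))
    print(np.allclose(eig, dft))       # the spectrum of A is the DFT of a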


Symmetric eigendecomposition 3.7
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Symmetric eigendecomposition

eigenvalues/vectors of a symmetric matrix have important special properties

• all the eigenvalues are real

• the eigenvectors corresponding to different eigenvalues are orthogonal

• a symmetric matrix is diagonalizable by an orthogonal similarity transformation:

QT AQ = Λ, QT Q = I

in the remainder of the lecture we assume that A is symmetric (and real)
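
in numpy the symmetric eigendecomposition is computed with eigh (a sketch; note that eigh returns eigenvalues in increasing order, while these slides order them decreasingly):

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2                  # a random symmetric test matrix
    lam, Q = np.linalg.eigh(A)         # real eigenvalues, orthonormal eigenvectors
    print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # A = QΛQT
    print(np.allclose(Q.T @ Q, np.eye(5)))          # QTQ = I
    lam, Q = lam[::-1], Q[:, ::-1]     # reorder so that λ1 ≥ · · · ≥ λn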

Symmetric eigendecomposition 3.8


Eigenvalues of a symmetric matrix are real

consider an eigenvalue λ and eigenvector x (possibly complex):

Ax = λx, x ≠ 0

• inner product with x shows that xH Ax = λ xH x

• xH x = ∑_{i=1}^n |xi|² is real and positive, and xH Ax is real:

xH Ax = ∑_{i=1}^n ∑_{j=1}^n Aij x̄i xj = ∑_{i=1}^n Aii |xi|² + 2 ∑_{j<i} Aij Re(x̄i xj)

• therefore λ = (xH Ax)/(xH x) is real


• if x is complex, its real and imaginary part are real eigenvectors (if nonzero):

A(xre + jxim) = λ(xre + jxim) =⇒ Axre = λxre, Axim = λxim

therefore, eigenvectors can be assumed to be real

Symmetric eigendecomposition 3.9


Orthogonality of eigenvectors

suppose x and y are eigenvectors for different eigenvalues λ, µ:

Ax = λx, Ay = µy, λ ≠ µ

• take the inner product of the first equation with y and of the second with x:

λ yT x = yT Ax = xT Ay = µ xT y

second equality holds because A is symmetric

• since λ ≠ µ, this implies that

xT y = 0

Symmetric eigendecomposition 3.10


Eigendecomposition

every real symmetric n × n matrix A can be factored as

A = QΛQT (1)

• Q is orthogonal

• Λ = diag(λ1, . . . , λn) is diagonal, with real diagonal elements

• A is diagonalizable by an orthogonal similarity transformation: QT AQ = Λ

• the columns of Q are an orthonormal set of n eigenvectors: write AQ = QΛ as

A [ q1 q2 · · · qn ] = [ q1 q2 · · · qn ] diag(λ1, λ2, . . . , λn)

                     = [ λ1q1 λ2q2 · · · λnqn ]
 

Symmetric eigendecomposition 3.11


Proof by induction

• the decomposition (1) obviously exists if n = 1


• suppose it exists for n = m; we show it exists for an (m + 1) × (m + 1) symmetric matrix A
• A has at least one eigenvalue (page 3.2)
• let λ1 be any eigenvalue and q1 a corresponding eigenvector, with ‖q1‖ = 1
• let V be an (m + 1) × m matrix that makes the matrix [ q1 V ] orthogonal; then

[ q1 V ]T A [ q1 V ] = ⎡ q1TAq1  q1TAV ⎤ = ⎡ λ1q1Tq1  λ1q1TV ⎤ = ⎡ λ1  0    ⎤
                       ⎣ VTAq1   VTAV  ⎦   ⎣ λ1VTq1   VTAV   ⎦   ⎣ 0   VTAV ⎦

• V T AV is a symmetric m × m matrix, so by the induction hypothesis,

V T AV = Q̃Λ̃Q̃T for some orthogonal Q̃ and diagonal Λ̃

• the matrix Q = [ q1  V Q̃ ] is orthogonal and defines a similarity that diagonalizes A:

QT AQ = ⎡ q1T   ⎤ A [ q1  V Q̃ ] = ⎡ λ1  0 ⎤
        ⎣ Q̃TVT ⎦                  ⎣ 0   Λ̃ ⎦
Symmetric eigendecomposition 3.12
Spectral decomposition

the decomposition (1) expresses A as a sum of rank-one matrices:

A = QΛQT = [ q1 q2 · · · qn ] diag(λ1, λ2, . . . , λn) [ q1 q2 · · · qn ]T

  = ∑_{i=1}^n λi qi qiT

• the matrix–vector product Ax is decomposed as

Ax = ∑_{i=1}^n λi qi (qiT x)

• (q1T x, . . . , qnT x) are coordinates of x in the orthonormal basis {q1, . . . , qn }


• (λ1 q1T x, . . . , λn qnT x) are coordinates of Ax in the orthonormal basis {q1, . . . , qn }

Symmetric eigendecomposition 3.13


Non-uniqueness

some freedom exists in the choice of Λ and Q in the eigendecomposition

A = QΛQT = [ q1 · · · qn ] diag(λ1, . . . , λn) [ q1 · · · qn ]T

Ordering of eigenvalues
diagonal Λ and columns of Q can be permuted; we will assume that

λ1 ≥ λ2 ≥ · · · ≥ λn

Choice of eigenvectors
suppose λi is an eigenvalue with multiplicity k : λi = λi+1 = · · · = λi+k−1

• nonzero vectors in span{qi, . . . , qi+k−1 } are eigenvectors with eigenvalue λi


• qi , . . . , qi+k−1 can be replaced with any orthonormal basis of this “eigenspace”

Symmetric eigendecomposition 3.14


Inverse

a symmetric matrix is invertible if and only if all its eigenvalues are nonzero:

• inverse of A = QΛQT is

A−1 = (QΛQT)−1 = QΛ−1QT,   Λ−1 = diag(1/λ1, 1/λ2, . . . , 1/λn)

• eigenvectors of A−1 are the eigenvectors of A


• eigenvalues of A−1 are reciprocals of eigenvalues of A

Symmetric eigendecomposition 3.15


Spectral matrix functions

Integer powers

A^k = (QΛQT)^k = QΛ^kQT,   Λ^k = diag(λ1^k, . . . , λn^k)

• negative powers are defined if A is invertible (all eigenvalues are nonzero)

• A^k has the same eigenvectors as A, eigenvalues λi^k

Square root

A^{1/2} = QΛ^{1/2}QT,   Λ^{1/2} = diag(√λ1, . . . , √λn)

• defined if eigenvalues are nonnegative

• A^{1/2} is a symmetric matrix that satisfies A^{1/2}A^{1/2} = A

Other matrix functions: can be defined via power series, for example,

exp(A) = Q exp(Λ)QT,   exp(Λ) = diag(e^{λ1}, . . . , e^{λn})
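
these functions are one line each given the eigendecomposition (a numpy sketch; the positive definite test matrix is an assumption so that all three functions are defined):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.standard_normal((4, 4))
    A = X @ X.T + np.eye(4)            # positive definite
    lam, Q = np.linalg.eigh(A)

    def spectral_fn(f):
        # apply a scalar function to A through its eigenvalues
        return Q @ np.diag(f(lam)) @ Q.T

    sqrtA = spectral_fn(np.sqrt)
    print(np.allclose(sqrtA @ sqrtA, A))                                # A^{1/2}A^{1/2} = A
    print(np.allclose(spectral_fn(lambda t: 1 / t), np.linalg.inv(A)))  # A^{-1}
    expA = spectral_fn(np.exp)                                          # exp(A)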

Symmetric eigendecomposition 3.16


Range, nullspace, rank

eigendecomposition with nonzero eigenvalues placed first in Λ:

A = QΛQT = [ Q1 Q2 ] ⎡ Λ1  0 ⎤ [ Q1 Q2 ]T = Q1Λ1Q1T
                     ⎣ 0   0 ⎦

diagonal entries of Λ1 are the nonzero eigenvalues of A

• columns of Q1 are an orthonormal basis for range(A)


• columns of Q2 are an orthonormal basis for null(A)
• this is an example of a full-rank factorization (page 1.27): A = BC with

B = Q1,   C = Λ1Q1T

• rank of A is the number of nonzero eigenvalues (with their multiplicities)

Symmetric eigendecomposition 3.17


Pseudo-inverse

we use the same notation as on the previous page

A = [ Q1 Q2 ] ⎡ Λ1  0 ⎤ [ Q1 Q2 ]T = Q1Λ1Q1T
              ⎣ 0   0 ⎦

diagonal entries of Λ1 are the nonzero eigenvalues of A

• pseudo-inverse follows from page 1.36 with B = Q1 and C = Λ1Q1T

• the pseudo-inverse is A† = C†B† = Q1Λ1^{−1}Q1T:

A† = Q1Λ1^{−1}Q1T = [ Q1 Q2 ] ⎡ Λ1^{−1}  0 ⎤ [ Q1 Q2 ]T
                              ⎣ 0        0 ⎦

• eigenvectors of A† are the eigenvectors of A


• nonzero eigenvalues of A† are reciprocals of nonzero eigenvalues of A
• range, nullspace, and rank of A† are the same as for A
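
a numerical check (a numpy sketch; the rank-2 test matrix is an arbitrary example, and the threshold for "nonzero" eigenvalues is a floating-point judgment call):

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.standard_normal((5, 2))
    A = X @ X.T                        # symmetric, rank 2
    lam, Q = np.linalg.eigh(A)
    nz = lam > 1e-10 * lam.max()       # select the nonzero eigenvalues
    Q1, lam1 = Q[:, nz], lam[nz]
    A_pinv = Q1 @ np.diag(1 / lam1) @ Q1.T   # A† = Q1 Λ1^{-1} Q1T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))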

Symmetric eigendecomposition 3.18


Trace

the trace of an n × n matrix B is the sum of its diagonal elements

trace(B) = ∑_{i=1}^n Bii

• transpose: trace(BT ) = trace(B)


• product: if B is n × m and C is m × n, then

trace(BC) = trace(CB) = ∑_{i=1}^n ∑_{j=1}^m Bij Cji

• eigenvalues: the trace of a symmetric matrix is the sum of the eigenvalues

trace(QΛQT) = trace(QTQΛ) = trace(Λ) = ∑_{i=1}^n λi

Symmetric eigendecomposition 3.19


Frobenius norm

recall the definition of Frobenius norm of an m × n matrix B:


‖B‖F = ( ∑_{i=1}^m ∑_{j=1}^n Bij² )^{1/2} = ( trace(BTB) )^{1/2} = ( trace(BBT) )^{1/2}

• this is an example of a unitarily invariant norm: if U, V are orthogonal, then

‖UBV‖F = ‖B‖F

Proof:

‖UBV‖F² = trace(VTBTUTUBV) = trace(VVTBTB) = trace(BTB) = ‖B‖F²

• for a symmetric n × n matrix with eigenvalues λ1, . . . , λn,

‖A‖F = ‖QΛQT‖F = ‖Λ‖F = ( ∑_{i=1}^n λi² )^{1/2}

Symmetric eigendecomposition 3.20


2-Norm

recall the definition of 2-norm or spectral norm of an m × n matrix B:

‖B‖2 = max_{x≠0} ‖Bx‖ / ‖x‖

• this norm is also unitarily invariant: if U, V are orthogonal, then

‖UBV‖2 = ‖B‖2

Proof: substituting x = VTy,

‖UBV‖2 = max_{x≠0} ‖UBVx‖/‖x‖ = max_{y≠0} ‖UBy‖/‖VTy‖ = max_{y≠0} ‖By‖/‖y‖ = ‖B‖2

• for a symmetric n × n matrix with eigenvalues λ1, . . . , λn,

‖A‖2 = ‖QΛQT‖2 = ‖Λ‖2 = max_{i=1,...,n} |λi| = max{λ1, −λn}
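
both norm formulas are easy to confirm numerically (a numpy sketch with a random symmetric test matrix):

    import numpy as np

    rng = np.random.default_rng(6)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)        # ascending: lam[0] = λn, lam[-1] = λ1
    print(np.isclose(np.linalg.norm(A, 2), max(lam[-1], -lam[0])))        # ‖A‖2
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(lam**2))))  # ‖A‖F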

Symmetric eigendecomposition 3.21


Exercises

Exercise 1

suppose A has eigendecomposition A = QΛQT ; give an eigendecomposition of

A − αI

Exercise 2

what are the eigenvalues and eigenvectors of an orthogonal projector

A = UU T (where U T U = I )

Exercise 3

the condition number of a nonsingular matrix is defined as

κ(A) = ‖A‖2 ‖A−1‖2

express the condition number of a symmetric matrix in terms of its eigenvalues


Symmetric eigendecomposition 3.22
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Quadratic forms

the eigendecomposition is a useful tool for problems that involve quadratic forms

f (x) = xT Ax

• substitute A = QΛQT and make an orthogonal change of variables y = QTx:

f(Qy) = yTΛy = λ1y1² + · · · + λnyn²

• y1, . . . , yn are coordinates of x in the orthonormal basis of eigenvectors

• the orthogonal change of variables preserves inner products and norms:

‖y‖ = ‖QTx‖ = ‖x‖

Symmetric eigendecomposition 3.23


Maximum and minimum value

consider the optimization problems with variable x

maximize xT Ax minimize xT Ax
subject to xT x = 1 subject to xT x = 1

change coordinates to the spectral basis ( y = QT x and x = Qy ):

maximize λ1y1² + · · · + λnyn²        minimize λ1y1² + · · · + λnyn²
subject to y1² + · · · + yn² = 1      subject to y1² + · · · + yn² = 1

• maximization: y = (1, 0, . . . , 0) and x = q1 are optimal; maximal value is

max_{‖x‖=1} xTAx = max_{‖y‖=1} (λ1y1² + · · · + λnyn²) = λ1 = max_{i=1,...,n} λi

• minimization: y = (0, 0, . . . , 1) and x = qn are optimal; minimal value is

min_{‖x‖=1} xTAx = min_{‖y‖=1} (λ1y1² + · · · + λnyn²) = λn = min_{i=1,...,n} λi
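
a numerical illustration (a numpy sketch; random unit vectors never beat the extreme eigenvalues, and the extreme eigenvectors attain them):

    import numpy as np

    rng = np.random.default_rng(7)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2
    lam, Q = np.linalg.eigh(A)                  # ascending order
    x = rng.standard_normal((5, 10000))
    x /= np.linalg.norm(x, axis=0)              # random points on the unit sphere
    vals = np.einsum('ij,ik,kj->j', x, A, x)    # xTAx for every column of x
    print(lam[0] - 1e-9 <= vals.min(), vals.max() <= lam[-1] + 1e-9)
    print(np.isclose(Q[:, -1] @ A @ Q[:, -1], lam[-1]))   # the top eigenvector attains λ1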

Symmetric eigendecomposition 3.24


Exercises

Exercise 1: find the extreme values of the Rayleigh quotient (xT Ax)/(xT x), i.e.,

max_{x≠0} (xTAx)/(xTx),    min_{x≠0} (xTAx)/(xTx)

Exercise 2: solve the optimization problems

maximize xT Ax minimize xT Ax
subject to xT x ≤ 1 subject to xT x ≤ 1

Exercise 3: show that (for symmetric A)

‖A‖2 = max_{i=1,...,n} |λi| = max_{‖x‖=1} |xTAx|

Symmetric eigendecomposition 3.25


Sign of eigenvalues

matrix property condition on eigenvalues


positive definite λn > 0
positive semidefinite λn ≥ 0
indefinite λn < 0 and λ1 > 0
negative semidefinite λ1 ≤ 0
negative definite λ1 < 0

• λ1 and λn denote the largest and smallest eigenvalues:

λ1 = max_{i=1,...,n} λi,   λn = min_{i=1,...,n} λi

• properties in the table follow from

λ1 = max_{‖x‖=1} xTAx = max_{x≠0} (xTAx)/(xTx),   λn = min_{‖x‖=1} xTAx = min_{x≠0} (xTAx)/(xTx)

Symmetric eigendecomposition 3.26


Ellipsoids
if A is positive definite, the set

E = {x | xT Ax ≤ 1}

is an ellipsoid with center at the origin

(figure: the ellipse E, with principal semi-axes (1/√λ1) q1 and (1/√λn) qn)

after the orthogonal change of coordinates y = QT x the set is described by

λ1y1² + · · · + λnyn² ≤ 1

this shows that:

• eigenvectors of A give the principal axes

• the width along the principal axis determined by qi is 2/√λi

Symmetric eigendecomposition 3.27


Exercise

give an interpretation of trace(A−1) as a measure of the size of the ellipsoid

E = {x | xT Ax ≤ 1}

Symmetric eigendecomposition 3.28


Max–min characterization of eigenvalues

as an extension of the maximization problem on page 3.24, consider

maximize    λmin(XTAX)                (2)
subject to  XTX = I

the variable X is an n × k matrix, for some given value of k between 1 and n

• λmin(X T AX) denotes the smallest eigenvalue of the k × k matrix X T AX


• for k = 1 this is the problem on page 3.24: λmin(xT Ax) = xT Ax

Solution: from the eigendecomposition A = QΛQT = ∑_{i=1}^n λi qi qiT

• the optimal value of (2) is the kth eigenvalue λk of A

• an optimal choice for X is formed from the first k columns of Q:

X = [ q1 q2 · · · qk ]

this is known as the Courant–Fischer min–max theorem
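
a numerical check of the theorem (a numpy sketch; one random feasible X is compared against the optimal choice):

    import numpy as np

    rng = np.random.default_rng(8)
    n, k = 6, 3
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    lam, Q = np.linalg.eigh(A)
    lam, Q = lam[::-1], Q[:, ::-1]              # descending, as in the slides
    Xopt = Q[:, :k]                             # first k eigenvectors
    print(np.isclose(np.linalg.eigvalsh(Xopt.T @ A @ Xopt)[0], lam[k - 1]))  # value λk
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # a random feasible X
    print(np.linalg.eigvalsh(X.T @ A @ X)[0] <= lam[k - 1] + 1e-9)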


Symmetric eigendecomposition 3.29
Proof of the max–min characterization

we make a change of variables Y = QT X :

maximize λmin(Y T ΛY )
subject to Y TY = I

we also partition Λ as

Λ = ⎡ Λ1  0  ⎤ ,   Λ1 = diag(λ1, . . . , λk),   Λ2 = diag(λk+1, . . . , λn)
    ⎣ 0   Λ2 ⎦

we show that the matrix

Ŷ = ⎡ I ⎤
    ⎣ 0 ⎦

is optimal

• for this matrix

ŶTΛŶ = ⎡ I ⎤T ⎡ Λ1  0  ⎤ ⎡ I ⎤ = Λ1,   λmin(ŶTΛŶ) = λmin(Λ1) = λk
       ⎣ 0 ⎦  ⎣ 0   Λ2 ⎦ ⎣ 0 ⎦

• on the next page we show that λmin(YTΛY) ≤ λk if Y is n × k with YTY = I

Symmetric eigendecomposition 3.30


Proof of the max–min characterization
• on page 3.24, we have seen that

λmin(YTΛY) = min_{‖u‖=1} uT(YTΛY)u

• if Y has k columns, there exists v ≠ 0 such that Yv has k − 1 leading zeros:

Yv = (0, . . . , 0, yk, . . . , yn)

(choose v in the nullspace of the (k − 1) × k matrix formed by the first k − 1 rows of Y)

• if YTY = I and we normalize v, then ‖Yv‖ = ‖v‖ = 1 and

(Yv)TΛ(Yv) = λkyk² + · · · + λnyn² ≤ λk(yk² + · · · + yn²) = λk

• this shows that

λmin(YTΛY) = min_{‖u‖=1} uT(YTΛY)u ≤ vT(YTΛY)v ≤ λk
Symmetric eigendecomposition 3.31
Min–max characterization of eigenvalues

the minimization problem on page 3.24 can be extended in a similar way:

minimize    λmax(XTAX)                (3)
subject to  XTX = I

the variable X is an n × k matrix

• λmax(X T AX) denotes the largest eigenvalue of the k × k matrix X T AX


• for k = 1 this is the minimization problem on page 3.24: λmax(xT Ax) = xT Ax

Solution: from the eigenvalue decomposition A = QΛQT = ∑_{i=1}^n λi qi qiT

• the optimal value of (3) is eigenvalue λ_{n−k+1} of A

• an optimal choice of X is formed from the last k columns of Q:

X = [ qn−k+1 · · · qn−1 qn ]

this follows from the max–min characterization on page 3.29 applied to −A


Symmetric eigendecomposition 3.32
Exercises

Exercise 1: suppose B is an m × m principal submatrix of A, for example,

B = ⎡ A11  A12  · · ·  A1m ⎤
    ⎢ A21  A22  · · ·  A2m ⎥
    ⎢  ..   ..          .. ⎥          (4)
    ⎣ Am1  Am2  · · ·  Amm ⎦

and denote the m eigenvalues of B by µ1 ≥ µ2 ≥ · · · ≥ µm

show that
µ1 ≤ λ1, µ2 ≤ λ2, . . ., µm ≤ λm

(λ1, . . . , λm are the first m eigenvalues of A)

Exercise 2: consider the matrix B in (4) with m = n − 1; show that

λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λn−1 ≥ µn−1 ≥ λn

this is known as the eigenvalue interlacing theorem


Symmetric eigendecomposition 3.33
Eigendecomposition of covariance matrix

• suppose x is a random n-vector with mean µ, covariance matrix Σ


• Σ is positive semidefinite with eigendecomposition

Σ = E((x − µ)(x − µ)T ) = QΛQT

define a random n-vector y = QT (x − µ)

• y has zero mean and covariance matrix Λ:

E(yyT ) = QT E((x − µ)(x − µ)T )Q = QT ΣQ = Λ

• components of y are uncorrelated and have variances E(yi²) = λi


• x is decomposed in uncorrelated components with decreasing variance:

E(y1²) ≥ E(y2²) ≥ · · · ≥ E(yn²)

the transformation is known as the Karhunen–Loève or Hotelling transform
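
the transform in numpy (a sketch on synthetic data; the covariance below is an arbitrary example and is estimated from samples):

    import numpy as np

    rng = np.random.default_rng(9)
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
    x = rng.multivariate_normal([1.0, -1.0], C, size=100000).T
    mu = x.mean(axis=1, keepdims=True)
    Sigma = np.cov(x)                  # estimated covariance matrix
    lam, Q = np.linalg.eigh(Sigma)
    lam, Q = lam[::-1], Q[:, ::-1]     # decreasing variances
    y = Q.T @ (x - mu)                 # Karhunen–Loève / Hotelling transform
    print(np.round(np.cov(y), 2))      # ≈ diag(λ1, λ2), off-diagonals ≈ 0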


Symmetric eigendecomposition 3.34
Multivariate normal distribution

multivariate normal (Gaussian) probability density function

p(x) = (2π)^{−n/2} (det Σ)^{−1/2} exp( −(1/2)(x − µ)TΣ−1(x − µ) )

contour lines of density function for

Σ = (1/4) ⎡ 7   √3 ⎤ ,   µ = ⎡ 5 ⎤
          ⎣ √3  5  ⎦         ⎣ 4 ⎦

eigenvalues of Σ are λ1 = 2, λ2 = 1,

q1 = ⎡ √3/2 ⎤ ,   q2 = ⎡  1/2  ⎤
     ⎣ 1/2  ⎦          ⎣ −√3/2 ⎦

(figure: elliptical contour lines in the (x1, x2)-plane, centered at µ, with principal axes along q1 and q2)
Symmetric eigendecomposition 3.35


Multivariate normal distribution

the decorrelated and de-meaned variables y = QT(x − µ) have distribution

p̃(y) = ∏_{i=1}^n (2πλi)^{−1/2} exp( −yi²/(2λi) )

(figure: the same contour lines in the coordinates y1, y2, where the ellipses have semi-axes λ1^{1/2} and λ2^{1/2} along the coordinate axes)

Symmetric eigendecomposition 3.36


Joint diagonalization of two matrices

• a symmetric matrix A is diagonalized by an orthogonal similarity:

QT AQ = Λ

• as an extension, if A, B are symmetric and B is positive definite, then

ST AS = D, ST BS = I

for some nonsingular S and diagonal D

Algorithm: S and D can be computed as follows

• Cholesky factorization B = RT R, with R upper triangular and nonsingular


• eigendecomposition R−T AR−1 = QDQT , with D diagonal, Q orthogonal
• define S = R−1Q:

ST AS = QT R−T AR−1Q = D,   ST BS = QT R−T BR−1Q = QT Q = I
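
a sketch of the algorithm in numpy (the test matrices are random assumptions; B is shifted to be safely positive definite):

    import numpy as np

    rng = np.random.default_rng(10)
    n = 4
    M = rng.standard_normal((n, n)); A = (M + M.T) / 2             # symmetric
    N = rng.standard_normal((n, n)); B = N @ N.T + n * np.eye(n)   # positive definite

    R = np.linalg.cholesky(B).T        # B = RT R, with R upper triangular
    C = np.linalg.solve(R.T, np.linalg.solve(R.T, A).T)   # C = R^{-T} A R^{-1}
    d, Q = np.linalg.eigh(C)
    S = np.linalg.solve(R, Q)          # S = R^{-1} Q
    print(np.allclose(S.T @ A @ S, np.diag(d)))   # ST A S = D
    print(np.allclose(S.T @ B @ S, np.eye(n)))    # ST B S = I

(the maximizer on the next page is then the column of S for the largest Dii)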

Symmetric eigendecomposition 3.37


Optimization problems with two quadratic forms

as an extension of the maximization problem on page 3.24, consider

maximize xT Ax
subject to xT Bx = 1

where A, B are symmetric and B is positive definite

• compute nonsingular S that diagonalizes A, B:

ST AS = D, ST BS = I

• make change of variables x = Sy :

maximize yT Dy
subject to yT y = 1

• if diagonal elements of D are sorted as D11 ≥ · · · ≥ Dnn, solution is

y = e1 = (1, 0, . . . , 0), x = Se1, xT Ax = D11


Symmetric eigendecomposition 3.38
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Low-rank matrix approximation

• low rank is a useful matrix property in many applications

• low rank is not a robust property (easily destroyed by noise or estimation error)

• most matrices in practice have full rank

• often the full-rank matrix is close to being low rank

• computing low-rank approximations is an important problem in linear algebra

on the next pages we discuss this for positive semidefinite matrices

Symmetric eigendecomposition 3.39


Rank-r approximation of positive semidefinite matrix

let A be a positive semidefinite matrix with rank(A) > r and eigendecomposition


A = QΛQT = ∑_{i=1}^n λi qi qiT,   λ1 ≥ · · · ≥ λn ≥ 0,   λr+1 > 0

the best rank-r approximation is the sum of the first r terms in the decomposition:

B = ∑_{i=1}^r λi qi qiT

• B is the best approximation for the Frobenius norm: for every C with rank r,

‖A − C‖F ≥ ‖A − B‖F = ( ∑_{i=r+1}^n λi² )^{1/2}

• B is also the best approximation for the 2-norm: for every C with rank r,

‖A − C‖2 ≥ ‖A − B‖2 = λr+1
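
a numerical check of both error formulas (a numpy sketch with a random positive semidefinite test matrix):

    import numpy as np

    rng = np.random.default_rng(11)
    n, r = 8, 3
    X = rng.standard_normal((n, n))
    A = X @ X.T                                   # positive semidefinite
    lam, Q = np.linalg.eigh(A)
    lam, Q = lam[::-1], Q[:, ::-1]                # descending eigenvalues
    B = Q[:, :r] @ np.diag(lam[:r]) @ Q[:, :r].T  # best rank-r approximation
    print(np.isclose(np.linalg.norm(A - B, 2), lam[r]))                        # λr+1
    print(np.isclose(np.linalg.norm(A - B, 'fro'), np.sqrt(np.sum(lam[r:]**2))))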

Symmetric eigendecomposition 3.40


Rank-r approximation in Frobenius norm

the approximation problem in Frobenius norm is a nonlinear least squares problem


minimize ‖A − XXT‖F² = ∑_{i=1}^n ∑_{j=1}^n ( Aij − ∑_{k=1}^r Xik Xjk )²

• we parametrize B as B = X X T with X of size n × r , and optimize over X


• this can be written in the standard nonlinear least squares form

minimize g(x) = ‖f(x)‖²

with vector x containing the elements of X and f (x) the elements of A − X X T


• the first order (necessary but not sufficient) optimality conditions are

∇g(x) = 2D f (x)T f (x) = 0

• the first order optimality conditions will be derived on page 3.44; they are

4(A − XXT)X = 0
Symmetric eigendecomposition 3.41
Solution of first order optimality conditions

AX = X(X T X)

• define eigendecomposition X T X = UDU T (U orthogonal r × r , D diagonal)


• use Y = XU and D as variables:

AY = Y D, Y TY = D

• the r diagonal elements of D must be eigenvalues of A


• the r columns of Y are corresponding orthogonal eigenvectors

• the columns yi of Y are normalized so that ‖yi‖² = Dii

we conclude that the solutions of the first order optimality conditions satisfy

XXT = YYT = ∑_{i∈I} λi qi qiT

where I is a subset of r elements of {1, 2, . . . , n}


Symmetric eigendecomposition 3.42
Optimal solution

among the solutions of the 1st order conditions we choose the one that minimizes

‖A − XXT‖F

• the squared error in the approximation is

‖A − XXT‖F² = ‖A − ∑_{i∈I} λi qi qiT‖F²

            = ‖∑_{i∉I} λi qi qiT‖F²

            = ∑_{i∉I} λi²

• the optimal choice for I is I = {1, 2, . . . , r}:

XXT = ∑_{i=1}^r λi qi qiT,   ‖A − XXT‖F² = ∑_{i=r+1}^n λi²

Symmetric eigendecomposition 3.43


First order optimality
to derive the first order optimality conditions for

minimize ‖A − XXT‖F²

we substitute X + δX , with arbitrary small δX , and linearize:

‖A − (X + δX)(X + δX)T‖F²

  = ‖A − XXT − δXXT − XδXT − δXδXT‖F²

  ≈ ‖A − XXT − δXXT − XδXT‖F²

  = trace( (A − XXT − δXXT − XδXT)(A − XXT − δXXT − XδXT) )

  ≈ trace( (A − XXT)(A − XXT) ) − 2 trace( (δXXT + XδXT)(A − XXT) )

  = ‖A − XXT‖F² − 4 trace( δXT(A − XXT)X )

X is a stationary point if the second term is zero for all δX :

4(A − X X T )X = 0

Symmetric eigendecomposition 3.44


Rank-r approximation in 2-norm

the same matrix B is also the best approximation in 2-norm: if C has rank r, then

‖A − C‖2 ≥ ‖A − B‖2

the right-hand side is

‖A − B‖2 = ‖∑_{i=1}^n λi qi qiT − ∑_{i=1}^r λi qi qiT‖2 = ‖∑_{i=r+1}^n λi qi qiT‖2 = λr+1

on the next page we show that ‖A − C‖2 ≥ λr+1 if C has rank r

Symmetric eigendecomposition 3.45


Proof

• if rank(C) = r , the nullspace of C has dimension n − r


• define an n × (n − r) matrix V with orthonormal columns that span null(C)
• we use the min–max theorem on page 3.32 to bound ‖A − C‖2:

‖A − C‖2 = max_{‖x‖=1} |xT(A − C)x|      (page 3.25)
         ≥ max_{‖x‖=1} xT(A − C)x
         ≥ max_{‖y‖=1} yTVT(A − C)Vy     (‖Vy‖ = ‖y‖)
         = max_{‖y‖=1} yTVTAVy           (VTCV = 0)
         = λmax(VTAV)
         ≥ λr+1                          (page 3.32 with k = n − r)

Symmetric eigendecomposition 3.46
