
L. Vandenberghe ECE133B (Spring 2020)

3. Symmetric eigendecomposition

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation

Symmetric eigendecomposition 3.1

Eigenvalues and eigenvectors

a nonzero vector x is an eigenvector of the n × n matrix A, with eigenvalue λ, if

Ax = λx

• the matrix λI − A is singular and x is a nonzero vector in the nullspace of λI − A


• the eigenvalues of A are the roots of the characteristic polynomial:

det(λI − A) = λ^n + c_{n−1}λ^{n−1} + · · · + c1λ + (−1)^n det(A) = 0

• since a polynomial of degree n ≥ 1 has a root over the complex numbers, this shows that every square matrix has at least one (possibly complex) eigenvalue
• the roots of the polynomial (and corresponding eigenvectors) may be complex
• (algebraic) multiplicity of an eigenvalue is its multiplicity as a root of det(λI − A)
• there are exactly n eigenvalues, counted with their multiplicity
• set of eigenvalues of A is called the spectrum of A
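
a quick numerical check of these definitions (a minimal numpy sketch; the 2 × 2 matrix is an arbitrary example):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])
    lam, X = np.linalg.eig(A)          # eigenvalues and eigenvectors
    for i in range(2):
        # check Ax = λx for each eigenpair
        print(np.allclose(A @ X[:, i], lam[i] * X[:, i]))
    print(np.poly(A))                  # coefficients of det(λI − A): [1, -5, 6]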

Symmetric eigendecomposition 3.2


Diagonal matrix

A = diag(A11, A22, . . . , Ann)

• eigenvalues of A are the diagonal entries A11, . . . , Ann


• the n unit vectors e1 = (1, 0, . . . , 0), . . . , en = (0, . . . , 0, 1) are eigenvectors:

Aei = Aii ei

• linear combinations of ei are eigenvectors if the corresponding Aii are equal

Example: A = αI is a scalar multiple of the identity matrix

• one eigenvalue α with multiplicity n


• every nonzero vector is an eigenvector

Symmetric eigendecomposition 3.3


Similarity transformation

two matrices A and B are similar if

B = T −1 AT

for some nonsingular matrix T

• the mapping that maps A to T −1 AT is called a similarity transformation


• similarity transformations preserve eigenvalues:

det(λI − B) = det(λI − T −1 AT) = det(T −1(λI − A)T) = det(λI − A)

• if x is an eigenvector of A then y = T −1 x is an eigenvector of B:

By = (T −1 AT)(T −1 x) = T −1 Ax = T −1(λx) = λy

of special interest will be orthogonal similarity transformations (T is orthogonal)
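
a numerical illustration (a numpy sketch; the random matrices are assumptions for the example, and a random T is nonsingular with probability one):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    T = rng.standard_normal((4, 4))
    B = np.linalg.solve(T, A @ T)      # B = T^{-1} A T
    eigA = np.sort_complex(np.round(np.linalg.eigvals(A), 8))
    eigB = np.sort_complex(np.round(np.linalg.eigvals(B), 8))
    print(np.allclose(eigA, eigB))     # similar matrices have the same spectrum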


Symmetric eigendecomposition 3.4
Diagonalizable matrices

a matrix is diagonalizable if it is similar to a diagonal matrix:

T −1 AT = Λ

for some nonsingular matrix T

• the diagonal elements of Λ are the eigenvalues of A


• the columns of T are eigenvectors of A:

A(T ei ) = TΛei = Λii (T ei )

• the columns of T give a set of n linearly independent eigenvectors

not all square matrices are diagonalizable: for example, the 2 × 2 matrix with A12 = 1 and all other entries zero has the single eigenvalue 0 but only one linearly independent eigenvector

Symmetric eigendecomposition 3.5


Spectral decomposition

suppose A is diagonalizable, with

A = TΛT−1 = [ v1 v2 · · · vn ] diag(λ1, λ2, . . . , λn) [ w1 w2 · · · wn ]T

  = λ1 v1 w1T + λ2 v2 w2T + · · · + λn vn wnT

(v1, . . . , vn are the columns of T; w1T, . . . , wnT are the rows of T−1)

this is a spectral decomposition of the linear function f (x) = Ax

• elements of T −1 x are coefficients of x in the basis of eigenvectors {v1, . . . , vn }:

x = TT −1 x = α1 v1 + · · · + αn vn where αi = wiT x

• applied to an eigenvector, f (vi ) = Avi = λi vi is a simple scaling


• by superposition, we find Ax as

Ax = α1 λ1 v1 + · · · + αn λn vn = TΛT −1 x
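
the decomposition can be traced step by step in numpy (a sketch; a random real matrix is diagonalizable with probability one, possibly with complex eigenvalues):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    lam, T = np.linalg.eig(A)          # columns of T are the eigenvectors vi
    W = np.linalg.inv(T)               # rows of W are the wiT above
    x = rng.standard_normal(4)
    alpha = W @ x                      # coefficients αi = wiT x
    print(np.allclose(T @ (lam * alpha), A @ x))   # Ax = Σ αi λi vi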

Symmetric eigendecomposition 3.6


Exercise

recall from 133A the definition of a circulant matrix

A = ⎡ a1    an    an−1  · · ·  a3  a2 ⎤
    ⎢ a2    a1    an    · · ·  a4  a3 ⎥
    ⎢ a3    a2    a1    · · ·  a5  a4 ⎥
    ⎢  ..    ..    ..          ..  .. ⎥
    ⎢ an−1  an−2  an−3  · · ·  a1  an ⎥
    ⎣ an    an−1  an−2  · · ·  a2  a1 ⎦

and its factorization


A = (1/n) W diag(W a) W^H

where W is the discrete Fourier transform matrix (W a is the DFT of a) and

W^{−1} = (1/n) W^H

what is the spectrum of A?
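
a numerical version of this exercise (a sketch; the construction of A from its first column a via np.roll is a helper assumed for illustration):

    import numpy as np

    n = 6
    a = np.random.default_rng(2).standard_normal(n)
    # circulant matrix with first column a: column j is a shifted down cyclically by j
    A = np.column_stack([np.roll(a, j) for j in range(n)])
    eig = np.sort_complex(np.round(np.linalg.eigvals(A), 8))
    dft = np.sort_complex(np.round(np.fft.fft(a), 8))
    print(np.allclose(eig, dft))       # the spectrum of A is the DFT of a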


Symmetric eigendecomposition 3.7
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Symmetric eigendecomposition

eigenvalues/vectors of a symmetric matrix have important special properties

• all the eigenvalues are real

• the eigenvectors corresponding to different eigenvalues are orthogonal

• a symmetric matrix is diagonalizable by an orthogonal similarity transformation:

QT AQ = Λ, QT Q = I

in the remainder of the lecture we assume that A is symmetric (and real)
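
in numpy the symmetric eigendecomposition is computed with eigh (a sketch; note that eigh returns eigenvalues in increasing order, while these slides order them decreasingly):

    import numpy as np

    rng = np.random.default_rng(3)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2                  # a random symmetric test matrix
    lam, Q = np.linalg.eigh(A)         # real eigenvalues, orthonormal eigenvectors
    print(np.allclose(Q @ np.diag(lam) @ Q.T, A))   # A = QΛQT
    print(np.allclose(Q.T @ Q, np.eye(5)))          # QTQ = I
    lam, Q = lam[::-1], Q[:, ::-1]     # reorder so that λ1 ≥ · · · ≥ λn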

Symmetric eigendecomposition 3.8


Eigenvalues of a symmetric matrix are real

consider an eigenvalue λ and eigenvector x (possibly complex):

Ax = λx, x ≠ 0

• inner product with x shows that xH Ax = λ xH x

• xH x = ∑_{i=1}^n |xi|² is real and positive, and xH Ax is real:

xH Ax = ∑_{i=1}^n ∑_{j=1}^n Aij x̄i xj = ∑_{i=1}^n Aii |xi|² + 2 ∑_{j<i} Aij Re(x̄i xj)

• therefore λ = (xH Ax)/(xH x) is real


• if x is complex, its real and imaginary part are real eigenvectors (if nonzero):

A(xre + jxim) = λ(xre + jxim) =⇒ Axre = λxre, Axim = λxim

therefore, eigenvectors can be assumed to be real

Symmetric eigendecomposition 3.9


Orthogonality of eigenvectors

suppose x and y are eigenvectors for different eigenvalues λ, µ:

Ax = λx, Ay = µy, λ ≠ µ

• take the inner product of the first equation with y and of the second with x:

λ yT x = yT Ax = xT Ay = µ xT y

second equality holds because A is symmetric

• since λ ≠ µ, this implies that

xT y = 0

Symmetric eigendecomposition 3.10


Eigendecomposition

every real symmetric n × n matrix A can be factored as

A = QΛQT (1)

• Q is orthogonal

• Λ = diag(λ1, . . . , λn) is diagonal, with real diagonal elements

• A is diagonalizable by an orthogonal similarity transformation: QT AQ = Λ

• the columns of Q are an orthonormal set of n eigenvectors: write AQ = QΛ as

A [ q1 q2 · · · qn ] = [ q1 q2 · · · qn ] diag(λ1, λ2, . . . , λn)

                     = [ λ1q1 λ2q2 · · · λnqn ]
 

Symmetric eigendecomposition 3.11


Proof by induction

• the decomposition (1) obviously exists if n = 1


• suppose it exists for n = m; we show it exists for an (m + 1) × (m + 1) symmetric matrix A
• A has at least one eigenvalue (page 3.2)
• let λ1 be any eigenvalue and q1 a corresponding eigenvector, with ‖q1‖ = 1
• let V be an (m + 1) × m matrix that makes the matrix [ q1 V ] orthogonal; then

[ q1 V ]T A [ q1 V ] = ⎡ q1TAq1  q1TAV ⎤ = ⎡ λ1q1Tq1  λ1q1TV ⎤ = ⎡ λ1  0    ⎤
                       ⎣ VTAq1   VTAV  ⎦   ⎣ λ1VTq1   VTAV   ⎦   ⎣ 0   VTAV ⎦

• V T AV is a symmetric m × m matrix, so by the induction hypothesis,

V T AV = Q̃Λ̃Q̃T for some orthogonal Q̃ and diagonal Λ̃

• the matrix Q = [ q1  V Q̃ ] is orthogonal and defines a similarity that diagonalizes A:

QT AQ = ⎡ q1T   ⎤ A [ q1  V Q̃ ] = ⎡ λ1  0 ⎤
        ⎣ Q̃TVT ⎦                  ⎣ 0   Λ̃ ⎦
Symmetric eigendecomposition 3.12
Spectral decomposition

the decomposition (1) expresses A as a sum of rank-one matrices:

A = QΛQT = [ q1 q2 · · · qn ] diag(λ1, λ2, . . . , λn) [ q1 q2 · · · qn ]T

  = ∑_{i=1}^n λi qi qiT

• the matrix–vector product Ax is decomposed as

Ax = ∑_{i=1}^n λi qi (qiT x)

• (q1T x, . . . , qnT x) are coordinates of x in the orthonormal basis {q1, . . . , qn }


• (λ1 q1T x, . . . , λn qnT x) are coordinates of Ax in the orthonormal basis {q1, . . . , qn }

Symmetric eigendecomposition 3.13


Non-uniqueness

some freedom exists in the choice of Λ and Q in the eigendecomposition

A = QΛQT = [ q1 · · · qn ] diag(λ1, . . . , λn) [ q1 · · · qn ]T

Ordering of eigenvalues
diagonal Λ and columns of Q can be permuted; we will assume that

λ1 ≥ λ2 ≥ · · · ≥ λn

Choice of eigenvectors
suppose λi is an eigenvalue with multiplicity k : λi = λi+1 = · · · = λi+k−1

• nonzero vectors in span{qi, . . . , qi+k−1 } are eigenvectors with eigenvalue λi


• qi , . . . , qi+k−1 can be replaced with any orthonormal basis of this “eigenspace”

Symmetric eigendecomposition 3.14


Inverse

a symmetric matrix is invertible if and only if all its eigenvalues are nonzero:

• inverse of A = QΛQT is

A−1 = (QΛQT)−1 = QΛ−1QT,   Λ−1 = diag(1/λ1, 1/λ2, . . . , 1/λn)

• eigenvectors of A−1 are the eigenvectors of A


• eigenvalues of A−1 are reciprocals of eigenvalues of A

Symmetric eigendecomposition 3.15


Spectral matrix functions

Integer powers

A^k = (QΛQT)^k = QΛ^kQT,   Λ^k = diag(λ1^k, . . . , λn^k)

• negative powers are defined if A is invertible (all eigenvalues are nonzero)

• A^k has the same eigenvectors as A, eigenvalues λi^k

Square root

A^{1/2} = QΛ^{1/2}QT,   Λ^{1/2} = diag(√λ1, . . . , √λn)

• defined if eigenvalues are nonnegative

• A^{1/2} is a symmetric matrix that satisfies A^{1/2}A^{1/2} = A

Other matrix functions: can be defined via power series, for example,

exp(A) = Q exp(Λ)QT,   exp(Λ) = diag(e^{λ1}, . . . , e^{λn})
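
these functions are one line each given the eigendecomposition (a numpy sketch; the positive definite test matrix is an assumption so that all three functions are defined):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.standard_normal((4, 4))
    A = X @ X.T + np.eye(4)            # positive definite
    lam, Q = np.linalg.eigh(A)

    def spectral_fn(f):
        # apply a scalar function to A through its eigenvalues
        return Q @ np.diag(f(lam)) @ Q.T

    sqrtA = spectral_fn(np.sqrt)
    print(np.allclose(sqrtA @ sqrtA, A))                                # A^{1/2}A^{1/2} = A
    print(np.allclose(spectral_fn(lambda t: 1 / t), np.linalg.inv(A)))  # A^{-1}
    expA = spectral_fn(np.exp)                                          # exp(A)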

Symmetric eigendecomposition 3.16


Range, nullspace, rank

eigendecomposition with nonzero eigenvalues placed first in Λ:

A = QΛQT = [ Q1 Q2 ] ⎡ Λ1  0 ⎤ [ Q1 Q2 ]T = Q1Λ1Q1T
                     ⎣ 0   0 ⎦

diagonal entries of Λ1 are the nonzero eigenvalues of A

• columns of Q1 are an orthonormal basis for range(A)


• columns of Q2 are an orthonormal basis for null(A)
• this is an example of a full-rank factorization (page 1.27): A = BC with

B = Q1,   C = Λ1Q1T

• rank of A is the number of nonzero eigenvalues (with their multiplicities)

Symmetric eigendecomposition 3.17


Pseudo-inverse

we use the same notation as on the previous page

A = [ Q1 Q2 ] ⎡ Λ1  0 ⎤ [ Q1 Q2 ]T = Q1Λ1Q1T
              ⎣ 0   0 ⎦

diagonal entries of Λ1 are the nonzero eigenvalues of A

• pseudo-inverse follows from page 1.36 with B = Q1 and C = Λ1Q1T

• the pseudo-inverse is A† = C†B† = Q1Λ1^{−1}Q1T:

A† = Q1Λ1^{−1}Q1T = [ Q1 Q2 ] ⎡ Λ1^{−1}  0 ⎤ [ Q1 Q2 ]T
                              ⎣ 0        0 ⎦

• eigenvectors of A† are the eigenvectors of A


• nonzero eigenvalues of A† are reciprocals of nonzero eigenvalues of A
• range, nullspace, and rank of A† are the same as for A
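
a numerical check (a numpy sketch; the rank-2 test matrix is an arbitrary example, and the threshold for "nonzero" eigenvalues is a floating-point judgment call):

    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.standard_normal((5, 2))
    A = X @ X.T                        # symmetric, rank 2
    lam, Q = np.linalg.eigh(A)
    nz = lam > 1e-10 * lam.max()       # select the nonzero eigenvalues
    Q1, lam1 = Q[:, nz], lam[nz]
    A_pinv = Q1 @ np.diag(1 / lam1) @ Q1.T   # A† = Q1 Λ1^{-1} Q1T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))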

Symmetric eigendecomposition 3.18


Trace

the trace of an n × n matrix B is the sum of its diagonal elements

trace(B) = ∑_{i=1}^n Bii

• transpose: trace(BT ) = trace(B)


• product: if B is n × m and C is m × n, then

trace(BC) = trace(CB) = ∑_{i=1}^n ∑_{j=1}^m Bij Cji

• eigenvalues: the trace of a symmetric matrix is the sum of the eigenvalues

trace(QΛQT) = trace(QTQΛ) = trace(Λ) = ∑_{i=1}^n λi

Symmetric eigendecomposition 3.19


Frobenius norm

recall the definition of Frobenius norm of an m × n matrix B:


‖B‖F = ( ∑_{i=1}^m ∑_{j=1}^n Bij² )^{1/2} = ( trace(BTB) )^{1/2} = ( trace(BBT) )^{1/2}

• this is an example of a unitarily invariant norm: if U, V are orthogonal, then

‖UBV‖F = ‖B‖F

Proof:

‖UBV‖F² = trace(VTBTUTUBV) = trace(VVTBTB) = trace(BTB) = ‖B‖F²

• for a symmetric n × n matrix with eigenvalues λ1, . . . , λn,

‖A‖F = ‖QΛQT‖F = ‖Λ‖F = ( ∑_{i=1}^n λi² )^{1/2}

Symmetric eigendecomposition 3.20


2-Norm

recall the definition of 2-norm or spectral norm of an m × n matrix B:

‖B‖2 = max_{x≠0} ‖Bx‖ / ‖x‖

• this norm is also unitarily invariant: if U, V are orthogonal, then

‖UBV‖2 = ‖B‖2

Proof: substituting x = VTy,

‖UBV‖2 = max_{x≠0} ‖UBVx‖/‖x‖ = max_{y≠0} ‖UBy‖/‖VTy‖ = max_{y≠0} ‖By‖/‖y‖ = ‖B‖2

• for a symmetric n × n matrix with eigenvalues λ1, . . . , λn,

‖A‖2 = ‖QΛQT‖2 = ‖Λ‖2 = max_{i=1,...,n} |λi| = max{λ1, −λn}
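
both norm formulas are easy to confirm numerically (a numpy sketch with a random symmetric test matrix):

    import numpy as np

    rng = np.random.default_rng(6)
    B = rng.standard_normal((6, 6))
    A = (B + B.T) / 2
    lam = np.linalg.eigvalsh(A)        # ascending: lam[0] = λn, lam[-1] = λ1
    print(np.isclose(np.linalg.norm(A, 2), max(lam[-1], -lam[0])))        # ‖A‖2
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(lam**2))))  # ‖A‖F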

Symmetric eigendecomposition 3.21


Exercises

Exercise 1

suppose A has eigendecomposition A = QΛQT ; give an eigendecomposition of

A − αI

Exercise 2

what are the eigenvalues and eigenvectors of an orthogonal projector

A = UU T (where U T U = I )

Exercise 3

the condition number of a nonsingular matrix is defined as

κ(A) = ‖A‖2 ‖A−1‖2

express the condition number of a symmetric matrix in terms of its eigenvalues


Symmetric eigendecomposition 3.22
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Quadratic forms

the eigendecomposition is a useful tool for problems that involve quadratic forms

f (x) = xT Ax

• substitute A = QΛQT and make an orthogonal change of variables y = QTx:

f(Qy) = yTΛy = λ1y1² + · · · + λnyn²

• y1, . . . , yn are coordinates of x in the orthonormal basis of eigenvectors

• the orthogonal change of variables preserves inner products and norms:

‖y‖ = ‖QTx‖ = ‖x‖

Symmetric eigendecomposition 3.23


Maximum and minimum value

consider the optimization problems with variable x

maximize xT Ax minimize xT Ax
subject to xT x = 1 subject to xT x = 1

change coordinates to the spectral basis ( y = QT x and x = Qy ):

maximize λ1y1² + · · · + λnyn²        minimize λ1y1² + · · · + λnyn²
subject to y1² + · · · + yn² = 1      subject to y1² + · · · + yn² = 1

• maximization: y = (1, 0, . . . , 0) and x = q1 are optimal; maximal value is

max_{‖x‖=1} xTAx = max_{‖y‖=1} (λ1y1² + · · · + λnyn²) = λ1 = max_{i=1,...,n} λi

• minimization: y = (0, 0, . . . , 1) and x = qn are optimal; minimal value is

min_{‖x‖=1} xTAx = min_{‖y‖=1} (λ1y1² + · · · + λnyn²) = λn = min_{i=1,...,n} λi
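
a numerical illustration (a numpy sketch; random unit vectors never beat the extreme eigenvalues, and the extreme eigenvectors attain them):

    import numpy as np

    rng = np.random.default_rng(7)
    B = rng.standard_normal((5, 5))
    A = (B + B.T) / 2
    lam, Q = np.linalg.eigh(A)                  # ascending order
    x = rng.standard_normal((5, 10000))
    x /= np.linalg.norm(x, axis=0)              # random points on the unit sphere
    vals = np.einsum('ij,ik,kj->j', x, A, x)    # xTAx for every column of x
    print(lam[0] - 1e-9 <= vals.min(), vals.max() <= lam[-1] + 1e-9)
    print(np.isclose(Q[:, -1] @ A @ Q[:, -1], lam[-1]))   # the top eigenvector attains λ1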

Symmetric eigendecomposition 3.24


Exercises

Exercise 1: find the extreme values of the Rayleigh quotient (xT Ax)/(xT x), i.e.,

max_{x≠0} (xTAx)/(xTx),    min_{x≠0} (xTAx)/(xTx)

Exercise 2: solve the optimization problems

maximize xT Ax minimize xT Ax
subject to xT x ≤ 1 subject to xT x ≤ 1

Exercise 3: show that (for symmetric A)

‖A‖2 = max_{i=1,...,n} |λi| = max_{‖x‖=1} |xTAx|

Symmetric eigendecomposition 3.25


Sign of eigenvalues

matrix property condition on eigenvalues


positive definite λn > 0
positive semidefinite λn ≥ 0
indefinite λn < 0 and λ1 > 0
negative semidefinite λ1 ≤ 0
negative definite λ1 < 0

• λ1 and λn denote the largest and smallest eigenvalues:

λ1 = max_{i=1,...,n} λi,   λn = min_{i=1,...,n} λi

• properties in the table follow from

λ1 = max_{‖x‖=1} xTAx = max_{x≠0} (xTAx)/(xTx),   λn = min_{‖x‖=1} xTAx = min_{x≠0} (xTAx)/(xTx)

Symmetric eigendecomposition 3.26


Ellipsoids
if A is positive definite, the set

E = {x | xT Ax ≤ 1}

is an ellipsoid with center at the origin

(figure: the ellipse E, with principal semi-axes (1/√λ1) q1 and (1/√λn) qn)

after the orthogonal change of coordinates y = QT x the set is described by

λ1y1² + · · · + λnyn² ≤ 1

this shows that:

• eigenvectors of A give the principal axes

• the width along the principal axis determined by qi is 2/√λi

Symmetric eigendecomposition 3.27


Exercise

give an interpretation of trace(A−1) as a measure of the size of the ellipsoid

E = {x | xT Ax ≤ 1}

Symmetric eigendecomposition 3.28


Max–min characterization of eigenvalues

as an extension of the maximization problem on page 3.24, consider

maximize    λmin(XTAX)                (2)
subject to  XTX = I

the variable X is an n × k matrix, for some given value of k between 1 and n

• λmin(X T AX) denotes the smallest eigenvalue of the k × k matrix X T AX


• for k = 1 this is the problem on page 3.24: λmin(xT Ax) = xT Ax

Solution: from the eigendecomposition A = QΛQT = ∑_{i=1}^n λi qi qiT

• the optimal value of (2) is the kth eigenvalue λk of A

• an optimal choice for X is formed from the first k columns of Q:

X = [ q1 q2 · · · qk ]

this is known as the Courant–Fischer min–max theorem
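
a numerical check of the theorem (a numpy sketch; one random feasible X is compared against the optimal choice):

    import numpy as np

    rng = np.random.default_rng(8)
    n, k = 6, 3
    B = rng.standard_normal((n, n))
    A = (B + B.T) / 2
    lam, Q = np.linalg.eigh(A)
    lam, Q = lam[::-1], Q[:, ::-1]              # descending, as in the slides
    Xopt = Q[:, :k]                             # first k eigenvectors
    print(np.isclose(np.linalg.eigvalsh(Xopt.T @ A @ Xopt)[0], lam[k - 1]))  # value λk
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # a random feasible X
    print(np.linalg.eigvalsh(X.T @ A @ X)[0] <= lam[k - 1] + 1e-9)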


Symmetric eigendecomposition 3.29
Proof of the max–min characterization

we make a change of variables Y = QT X :

maximize λmin(Y T ΛY )
subject to Y TY = I

we also partition Λ as

Λ = ⎡ Λ1  0  ⎤ ,   Λ1 = diag(λ1, . . . , λk),   Λ2 = diag(λk+1, . . . , λn)
    ⎣ 0   Λ2 ⎦

we show that the matrix

Ŷ = ⎡ I ⎤
    ⎣ 0 ⎦

is optimal

• for this matrix

ŶTΛŶ = ⎡ I ⎤T ⎡ Λ1  0  ⎤ ⎡ I ⎤ = Λ1,   λmin(ŶTΛŶ) = λmin(Λ1) = λk
       ⎣ 0 ⎦  ⎣ 0   Λ2 ⎦ ⎣ 0 ⎦

• on the next page we show that λmin(YTΛY) ≤ λk if Y is n × k with YTY = I

Symmetric eigendecomposition 3.30


Proof of the max–min characterization
• on page 3.24, we have seen that

λmin(YTΛY) = min_{‖u‖=1} uT(YTΛY)u

• if Y has k columns, there exists v ≠ 0 such that Yv has k − 1 leading zeros:

Yv = (0, . . . , 0, yk, . . . , yn)

(choose v in the nullspace of the (k − 1) × k matrix formed by the first k − 1 rows of Y)

• if YTY = I and we normalize v, then ‖Yv‖ = ‖v‖ = 1 and

(Yv)TΛ(Yv) = λkyk² + · · · + λnyn² ≤ λk(yk² + · · · + yn²) = λk

• this shows that

λmin(YTΛY) = min_{‖u‖=1} uT(YTΛY)u ≤ vT(YTΛY)v ≤ λk
Symmetric eigendecomposition 3.31
Min–max characterization of eigenvalues

the minimization problem on page 3.24 can be extended in a similar way:

minimize    λmax(XTAX)                (3)
subject to  XTX = I

the variable X is an n × k matrix

• λmax(X T AX) denotes the largest eigenvalue of the k × k matrix X T AX


• for k = 1 this is the minimization problem on page 3.24: λmax(xT Ax) = xT Ax

Solution: from the eigenvalue decomposition A = QΛQT = ∑_{i=1}^n λi qi qiT

• the optimal value of (3) is eigenvalue λ_{n−k+1} of A

• an optimal choice of X is formed from the last k columns of Q:

X = [ qn−k+1 · · · qn−1 qn ]

this follows from the max–min characterization on page 3.29 applied to −A


Symmetric eigendecomposition 3.32
Exercises

Exercise 1: suppose B is an m × m principal submatrix of A, for example,

B = ⎡ A11  A12  · · ·  A1m ⎤
    ⎢ A21  A22  · · ·  A2m ⎥
    ⎢  ..   ..          .. ⎥          (4)
    ⎣ Am1  Am2  · · ·  Amm ⎦

and denote the m eigenvalues of B by µ1 ≥ µ2 ≥ · · · ≥ µm

show that
µ1 ≤ λ1, µ2 ≤ λ2, . . ., µm ≤ λm

(λ1, . . . , λm are the first m eigenvalues of A)

Exercise 2: consider the matrix B in (4) with m = n − 1; show that

λ1 ≥ µ1 ≥ λ2 ≥ µ2 ≥ · · · ≥ λn−1 ≥ µn−1 ≥ λn

this is known as the eigenvalue interlacing theorem


Symmetric eigendecomposition 3.33
Eigendecomposition of covariance matrix

• suppose x is a random n-vector with mean µ, covariance matrix Σ


• Σ is positive semidefinite with eigendecomposition

Σ = E((x − µ)(x − µ)T ) = QΛQT

define a random n-vector y = QT (x − µ)

• y has zero mean and covariance matrix Λ:

E(yyT ) = QT E((x − µ)(x − µ)T )Q = QT ΣQ = Λ

• components of y are uncorrelated and have variances E(yi²) = λi


• x is decomposed in uncorrelated components with decreasing variance:

E(y1²) ≥ E(y2²) ≥ · · · ≥ E(yn²)

the transformation is known as the Karhunen–Loève or Hotelling transform
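
the transform in numpy (a sketch on synthetic data; the covariance below is an arbitrary example and is estimated from samples):

    import numpy as np

    rng = np.random.default_rng(9)
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
    x = rng.multivariate_normal([1.0, -1.0], C, size=100000).T
    mu = x.mean(axis=1, keepdims=True)
    Sigma = np.cov(x)                  # estimated covariance matrix
    lam, Q = np.linalg.eigh(Sigma)
    lam, Q = lam[::-1], Q[:, ::-1]     # decreasing variances
    y = Q.T @ (x - mu)                 # Karhunen–Loève / Hotelling transform
    print(np.round(np.cov(y), 2))      # ≈ diag(λ1, λ2), off-diagonals ≈ 0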


Symmetric eigendecomposition 3.34
Multivariate normal distribution

multivariate normal (Gaussian) probability density function

p(x) = (2π)^{−n/2} (det Σ)^{−1/2} exp( −(1/2)(x − µ)TΣ−1(x − µ) )

contour lines of density function for

Σ = (1/4) ⎡ 7   √3 ⎤ ,   µ = ⎡ 5 ⎤
          ⎣ √3  5  ⎦         ⎣ 4 ⎦

eigenvalues of Σ are λ1 = 2, λ2 = 1,

q1 = ⎡ √3/2 ⎤ ,   q2 = ⎡  1/2  ⎤
     ⎣ 1/2  ⎦          ⎣ −√3/2 ⎦

(figure: elliptical contour lines in the (x1, x2)-plane, centered at µ, with principal axes along q1 and q2)
Symmetric eigendecomposition 3.35


Multivariate normal distribution

the decorrelated and de-meaned variables y = QT(x − µ) have distribution

p̃(y) = ∏_{i=1}^n (2πλi)^{−1/2} exp( −yi²/(2λi) )

(figure: the same contour lines in the coordinates y1, y2, where the ellipses have semi-axes λ1^{1/2} and λ2^{1/2} along the coordinate axes)

Symmetric eigendecomposition 3.36


Joint diagonalization of two matrices

• a symmetric matrix A is diagonalized by an orthogonal similarity:

QT AQ = Λ

• as an extension, if A, B are symmetric and B is positive definite, then

ST AS = D, ST BS = I

for some nonsingular S and diagonal D

Algorithm: S and D can be computed as follows

• Cholesky factorization B = RT R, with R upper triangular and nonsingular


• eigendecomposition R−T AR−1 = QDQT , with D diagonal, Q orthogonal
• define S = R−1Q:

ST AS = QT R−T AR−1Q = D,   ST BS = QT R−T BR−1Q = QT Q = I
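
a sketch of the algorithm in numpy (the test matrices are random assumptions; B is shifted to be safely positive definite):

    import numpy as np

    rng = np.random.default_rng(10)
    n = 4
    M = rng.standard_normal((n, n)); A = (M + M.T) / 2             # symmetric
    N = rng.standard_normal((n, n)); B = N @ N.T + n * np.eye(n)   # positive definite

    R = np.linalg.cholesky(B).T        # B = RT R, with R upper triangular
    C = np.linalg.solve(R.T, np.linalg.solve(R.T, A).T)   # C = R^{-T} A R^{-1}
    d, Q = np.linalg.eigh(C)
    S = np.linalg.solve(R, Q)          # S = R^{-1} Q
    print(np.allclose(S.T @ A @ S, np.diag(d)))   # ST A S = D
    print(np.allclose(S.T @ B @ S, np.eye(n)))    # ST B S = I

(the maximizer on the next page is then the column of S for the largest Dii)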

Symmetric eigendecomposition 3.37


Optimization problems with two quadratic forms

as an extension of the maximization problem on page 3.24, consider

maximize xT Ax
subject to xT Bx = 1

where A, B are symmetric and B is positive definite

• compute nonsingular S that diagonalizes A, B:

ST AS = D, ST BS = I

• make change of variables x = Sy :

maximize yT Dy
subject to yT y = 1

• if diagonal elements of D are sorted as D11 ≥ · · · ≥ Dnn, solution is

y = e1 = (1, 0, . . . , 0), x = Se1, xT Ax = D11


Symmetric eigendecomposition 3.38
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Low-rank matrix approximation

• low rank is a useful matrix property in many applications

• low rank is not a robust property (easily destroyed by noise or estimation error)

• most matrices in practice have full rank

• often the full-rank matrix is close to being low rank

• computing low-rank approximations is an important problem in linear algebra

on the next pages we discuss this for positive semidefinite matrices

Symmetric eigendecomposition 3.39


Rank-r approximation of positive semidefinite matrix

let A be a positive semidefinite matrix with rank(A) > r and eigendecomposition


A = QΛQT = ∑_{i=1}^n λi qi qiT,   λ1 ≥ · · · ≥ λn ≥ 0,   λr+1 > 0

the best rank-r approximation is the sum of the first r terms in the decomposition:

B = ∑_{i=1}^r λi qi qiT

• B is the best approximation for the Frobenius norm: for every C with rank r,

‖A − C‖F ≥ ‖A − B‖F = ( ∑_{i=r+1}^n λi² )^{1/2}

• B is also the best approximation for the 2-norm: for every C with rank r,

‖A − C‖2 ≥ ‖A − B‖2 = λr+1
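
a numerical check of both error formulas (a numpy sketch with a random positive semidefinite test matrix):

    import numpy as np

    rng = np.random.default_rng(11)
    n, r = 8, 3
    X = rng.standard_normal((n, n))
    A = X @ X.T                                   # positive semidefinite
    lam, Q = np.linalg.eigh(A)
    lam, Q = lam[::-1], Q[:, ::-1]                # descending eigenvalues
    B = Q[:, :r] @ np.diag(lam[:r]) @ Q[:, :r].T  # best rank-r approximation
    print(np.isclose(np.linalg.norm(A - B, 2), lam[r]))                        # λr+1
    print(np.isclose(np.linalg.norm(A - B, 'fro'), np.sqrt(np.sum(lam[r:]**2))))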

Symmetric eigendecomposition 3.40


Rank-r approximation in Frobenius norm

the approximation problem in Frobenius norm is a nonlinear least squares problem


minimize ‖A − XXT‖F² = ∑_{i=1}^n ∑_{j=1}^n ( Aij − ∑_{k=1}^r Xik Xjk )²

• we parametrize B as B = X X T with X of size n × r , and optimize over X


• this can be written in the standard nonlinear least squares form

minimize g(x) = ‖f(x)‖²

with vector x containing the elements of X and f (x) the elements of A − X X T


• the first order (necessary but not sufficient) optimality conditions are

∇g(x) = 2D f (x)T f (x) = 0

• the first order optimality conditions will be derived on page 3.44; they are

4(A − XXT)X = 0
Symmetric eigendecomposition 3.41
Solution of first order optimality conditions

AX = X(X T X)

• define eigendecomposition X T X = UDU T (U orthogonal r × r , D diagonal)


• use Y = XU and D as variables:

AY = Y D, Y TY = D

• the r diagonal elements of D must be eigenvalues of A


• the r columns of Y are corresponding orthogonal eigenvectors

• the columns yi of Y are normalized so that ‖yi‖² = Dii

we conclude that the solutions of the first order optimality conditions satisfy

XXT = YYT = ∑_{i∈I} λi qi qiT

where I is a subset of r elements of {1, 2, . . . , n}


Symmetric eigendecomposition 3.42
Optimal solution

among the solutions of the 1st order conditions we choose the one that minimizes

‖A − XXT‖F

• the squared error in the approximation is

‖A − XXT‖F² = ‖A − ∑_{i∈I} λi qi qiT‖F²

            = ‖∑_{i∉I} λi qi qiT‖F²

            = ∑_{i∉I} λi²

• the optimal choice for I is I = {1, 2, . . . , r}:

XXT = ∑_{i=1}^r λi qi qiT,   ‖A − XXT‖F² = ∑_{i=r+1}^n λi²

Symmetric eigendecomposition 3.43


First order optimality
to derive the first order optimality conditions for

minimize ‖A − XXT‖F²

we substitute X + δX , with arbitrary small δX , and linearize:

‖A − (X + δX)(X + δX)T‖F²

  = ‖A − XXT − δXXT − XδXT − δXδXT‖F²

  ≈ ‖A − XXT − δXXT − XδXT‖F²

  = trace( (A − XXT − δXXT − XδXT)(A − XXT − δXXT − XδXT) )

  ≈ trace( (A − XXT)(A − XXT) ) − 2 trace( (δXXT + XδXT)(A − XXT) )

  = ‖A − XXT‖F² − 4 trace( δXT(A − XXT)X )

X is a stationary point if the second term is zero for all δX :

4(A − X X T )X = 0

Symmetric eigendecomposition 3.44


Rank-r approximation in 2-norm

the same matrix B is also the best approximation in 2-norm: if C has rank r, then

‖A − C‖2 ≥ ‖A − B‖2

the right-hand side is

‖A − B‖2 = ‖∑_{i=1}^n λi qi qiT − ∑_{i=1}^r λi qi qiT‖2 = ‖∑_{i=r+1}^n λi qi qiT‖2 = λr+1

on the next page we show that ‖A − C‖2 ≥ λr+1 if C has rank r

Symmetric eigendecomposition 3.45


Proof

• if rank(C) = r , the nullspace of C has dimension n − r


• define an n × (n − r) matrix V with orthonormal columns that span null(C)
• we use the min–max theorem on page 3.32 to bound ‖A − C‖2:

‖A − C‖2 = max_{‖x‖=1} |xT(A − C)x|      (page 3.25)
         ≥ max_{‖x‖=1} xT(A − C)x
         ≥ max_{‖y‖=1} yTVT(A − C)Vy     (‖Vy‖ = ‖y‖)
         = max_{‖y‖=1} yTVTAVy           (VTCV = 0)
         = λmax(VTAV)
         ≥ λr+1                          (page 3.32 with k = n − r)

Symmetric eigendecomposition 3.46
