Solutions To The Exercises On Principal Component Analysis
Laurenz Wiskott
Institut für Neuroinformatik
Ruhr-Universität Bochum, Germany, EU
4 February 2017
Contents
1 Intuition
1.2 Projection and reconstruction error
1.4.3 Exercise: From data distribution to second-moment matrix
1.5 Covariance matrix and higher order structure
2 Formalism
2.2 Matrix V^T: Mapping from high-dimensional old coordinate system to low-dimensional new coordinate system
2.3 Matrix V: Mapping from low-dimensional new coordinate system to subspace in old coordinate system
2.4 Matrix (V^T V): Identity mapping within new coordinate system
2.5 Matrix (V V^T): Projection from high- to low-dimensional (sub)space within old coordinate system
2.6 Variance
2.9 Eigenvalue equation of the covariance matrix
2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors
2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors
2.16 PCA Algorithm
3 Application
4 Acknowledgment
1 Intuition
How are mean m, variance v, and 2nd moment s related to each other? In other words, if the mean and variance of a one-dimensional distribution were given, how could you compute the corresponding 2nd moment?
Hint: Assume x to be the data values and x̄ their mean. Then play around with the corresponding expressions for the mean x̄ = ⟨x⟩, the variance ⟨(x − x̄)²⟩, and the second moment ⟨x²⟩.
Solution: Let x be the data values and x̄ their mean. For the second moment we then get
s = ⟨x²⟩ (1)
  = ⟨((x − x̄) + x̄)²⟩ (2)
  = ⟨(x − x̄)² + 2(x − x̄)x̄ + x̄²⟩ (3)
  = ⟨(x − x̄)²⟩ + ⟨2(x − x̄)x̄⟩ + ⟨x̄²⟩ (4)
  = ⟨(x − x̄)²⟩ + 2(⟨x⟩ − x̄)x̄ + x̄²  (and ⟨x⟩ = x̄, so the middle term vanishes) (5)
  = ⟨(x − x̄)²⟩ + x̄² (6)
  = v + m² . (7)
Thus, the 2nd moment is the sum of the variance and the square of the mean.
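This relation is easy to confirm numerically; here is a minimal sketch (my own, not part of the original solutions), using NumPy:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=1.5, size=100_000)  # arbitrary test data

    s = np.mean(x**2)   # second moment <x^2>
    v = np.var(x)       # variance <(x - mean)^2>
    m = np.mean(x)      # mean <x>
    print(s, v + m**2)  # both values agree up to floating-point rounding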
Calculate the second moment of a uniform, i.e. flat, distribution in [−1, +1]. This is a distribution where
every value between −1 and +1 is equally likely and other values are impossible.
Solution: The density of the uniform distribution on [−1, +1] is 1/2, so the second moment (which here equals the variance, since the mean is zero) is

⟨x²⟩ = ∫_{−1}^{+1} x²/2 dx = [x³/6]_{−1}^{+1} = 1/3 .

This might be a bit surprising, since one might think that such a distribution has a standard deviation of 0.5 and therefore a variance of 0.5² = 0.25. However, due to the square in the second moment, larger values are weighted more than smaller values. Thus, the variance of this distribution is 1/3 and its standard deviation 1/√3 ≈ 0.577.
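A one-line Monte Carlo check of the value 1/3 (again my own sketch, not from the original text):

    import numpy as np

    x = np.random.default_rng(0).uniform(-1.0, 1.0, size=1_000_000)
    print(np.mean(x**2))  # ≈ 0.333, i.e. 1/3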
1.2 Projection and reconstruction error
x_∥ = v v^T x , (1)

where x is the data point and v is the unit vector along the principal axis of the projection. Show that the difference vector between the data point and the projected data point,

x_⊥ = x − x_∥ , (2)

is orthogonal to v.
Solution: Not available!
2. Give a reason why the orthogonality of the two vectors is useful.
Solution: Not available!
Why should the reconstruction error, E, be defined as the mean of the squared difference of the original and
reconstructed data vectors, and not simply the mean of the difference or the mean of the absolute difference?
Solution: In the mean of the difference, positive errors can cancel out with negative errors, and a poor
solution might have a low error value, which would render the error function useless.
The mean of the absolute difference does not have this flaw and might actually be a reasonable error function.
However, the square is mathematically more convenient than the absolute value in many ways, for instance,
the derivative is well defined everywhere. Thus, the square is more practical. (It also has a close relationship
to Gaussian noise, which would be a bit more involved to explain.)
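To illustrate the cancellation problem, consider a small sketch (my own): a reconstruction that is clearly poor still scores zero under the plain mean difference, while the other two error functions expose it.

    import numpy as np

    x     = np.array([1.0, -1.0, 2.0, -2.0])  # original data
    x_rec = -x                                # poor "reconstruction": all signs flipped

    diff = x - x_rec
    print(np.mean(diff))          # 0.0  -> plain mean difference misses the error
    print(np.mean(np.abs(diff)))  # 3.0  -> absolute error detects it
    print(np.mean(diff**2))       # 10.0 -> squared error detects it and is differentiable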
For a set of data vectors x^µ, µ = 1, ..., M, the second-moment matrix C is defined as C_ij := ⟨x_i^µ x_j^µ⟩_µ. What are the upper and lower limits of C_ij if C_ii and C_jj are known?
Solution: Interpret x_i^µ and x_j^µ as two M-dimensional vectors, like x_i := (x_i^1, x_i^2, ..., x_i^M), and let the inner product be defined as

(x_i, x_j) := (1/M) Σ_µ x_i^µ x_j^µ . (1)
Then

C_ii = (1/M) Σ_µ x_i^µ x_i^µ = (x_i, x_i) = ‖x_i‖² , (2)
C_jj = (1/M) Σ_µ x_j^µ x_j^µ = (x_j, x_j) = ‖x_j‖² , (3)
C_ij = (1/M) Σ_µ x_i^µ x_j^µ = (x_i, x_j) = ‖x_i‖ ‖x_j‖ cos(α) , (4)

⟹ |C_ij| ≤ √(C_ii C_jj)  (since −1 ≤ cos(α) ≤ 1) . (5)
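A quick numerical illustration of this Cauchy–Schwarz bound (my own sketch):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))  # M = 1000 data vectors x^mu in two dimensions
    X[:, 1] += 0.7 * X[:, 0]        # correlate the two components

    C = X.T @ X / len(X)            # C_ij = <x_i^mu x_j^mu>_mu
    print(abs(C[0, 1]) <= np.sqrt(C[0, 0] * C[1, 1]))  # True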
Give an estimate of the second moment matrix for the following data distributions.
(Figure: three example data distributions in the (x_1, x_2)-plane; axes marked at 1.)
1. x_1 and x_2 are uncorrelated and the second-moment matrix is therefore diagonal. The first component is
a uniform distribution between −1 and +1, which we know has variance 1/3. The second component
is a uniform distribution between −1/4 and +1/4, the variance of which is therefore scaled by 1/16
compared to that of the first one resulting in a variance of 1/48. Thus,
C ≈ [0.33, 0; 0, 0.02] . (1)
2. x_1 and x_2 are again uncorrelated, plus the distribution is rotation symmetric, so that the variances are identical. The distribution of one component lies between −1 and +1, thus the variance is less than 1, but it is concentrated towards the ends, thus the variance is greater than 1/3. Let's guess 0.5. Thus,
C ≈ [0.5, 0; 0, 0.5] . (2)
3. If we just considered the mean of the distribution, the second-moment matrix would have the values (−1)² = 1, −1 · 0.5 = −0.5, 0.5 · (−1) = −0.5, and 0.5² = 0.25. We also know the variances of the distribution are 1/3 scaled by 1/4² = 1/16, because it has a width of 1/4 + 1/4 = 1/2 in both directions. Adding this to the diagonal elements of the second-moment matrix of the mean yields

C ≈ [1.02, −0.5; −0.5, 0.27] , (3)

assuming that the off-diagonal elements are not affected by the variance of the distribution.
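The decomposition used here, second moment = mean term plus variance term, can be verified numerically; a minimal sketch (my own reconstruction of a matching distribution):

    import numpy as np

    rng = np.random.default_rng(0)
    m = np.array([-1.0, 0.5])                            # mean of the cluster
    X = m + rng.uniform(-0.25, 0.25, size=(100_000, 2))  # uniform spread of width 1/2

    C = X.T @ X / len(X)                                 # empirical second-moment matrix
    C_pred = np.outer(m, m) + np.diag([1/48, 1/48])      # mean term plus per-axis variance
    print(np.round(C, 2))       # ≈ [[1.02, -0.5], [-0.5, 0.27]]
    print(np.round(C_pred, 2))  # same values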
1.4.3 Exercise: From data distribution to second-moment matrix
Give an estimate of the second moment matrix for the following data distributions.
(Figure: three example data distributions in the (x_1, x_2)-plane; axes marked at 1.)
(a) x_2 is apparently uniformly distributed in [−1, +1] and thus has a 2nd moment of C_22 ≈ 1/3. Since x_1 ≈ −x_2/2, its 2nd moment is C_11 = ⟨x_1 x_1⟩ ≈ ⟨(−x_2/2) · (−x_2/2)⟩ ≈ C_22/4 ≈ 1/12 and the mixed 2nd moments are C_12 = C_21 ≈ −C_22/2 ≈ −1/6. Thus,

C ≈ [1/12, −1/6; −1/6, 1/3] . (1)

(A numerical cross-check of this estimate follows after item (c).)
(b) If all points were exactly at x = (0.5, 1)^T, the 2nd-moment matrix would simply be

C = [0.25, 0.5; 0.5, 1] . (2)

Since the points are slightly spread in the x_2-direction, C_22 is slightly increased from 1 to, let's say, 1.1 (the 2nd moment of a variable is the square of its mean plus its variance). The other values are not affected. This is obvious for C_11 but also true for C_12 = C_21 = ⟨x_1 x_2⟩ = ⟨0.5 x_2⟩ = 0.5 ⟨x_2⟩ = 0.5 · 1 = 0.5. Thus,

C ≈ [0.25, 0.5; 0.5, 1.1] . (3)
(c) This distribution has a rotation symmetry of 120°, and since the variance does not change under a rotation of 180°, the directional variance of the distribution has a rotation symmetry of 60°. This effectively means that the directional variance, which in general is an ellipse, must be a circle. Thus the 2nd-moment matrix is diagonal with C_11 = C_22. If one projects the data onto the x_1-axis one might guess that C_11 is slightly larger than for a uniform distribution in [−1, +1]. Thus,

C ≈ [0.4, 0; 0, 0.4] . (4)
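As announced in item (a), a numerical cross-check (my own sketch; it simulates the distribution as read off the figure):

    import numpy as np

    rng = np.random.default_rng(0)
    x2 = rng.uniform(-1.0, 1.0, size=100_000)
    x1 = -x2 / 2                   # the linear relation guessed from the figure
    X = np.stack([x1, x2], axis=1)

    C = X.T @ X / len(X)           # second-moment matrix
    print(np.round(C, 3))          # ≈ [[1/12, -1/6], [-1/6, 1/3]]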
Draw a data distribution qualitatively consistent with the following second-moment matrices C.
(a) C = [1, −0.5; −0.5, 1]   (b) C = [1, 0; 0, 0.5]   (c) C = [1, 1; 1, 1]
Solution:
(Figure: three hand-drawn distributions (a), (b), (c) in the (x_1, x_2)-plane, axes from −1 to 1; © CC BY-SA 4.0)
The fat red squares indicate minimal sets of data points to generate the second-moment matrices.
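For matrix (c), for instance, such a minimal set is easy to verify; a sketch (my own choice of points, not necessarily the ones drawn in the figure):

    import numpy as np

    # A symmetric pair suffices for the rank-1 matrix (c): (1, 1) and its flipped copy.
    X = np.array([[1.0, 1.0], [-1.0, -1.0]])
    C = X.T @ X / len(X)
    print(C)  # [[1. 1.] [1. 1.]] reproduces matrix (c)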
1. Define a procedure by which you can turn any mean-free data distribution into a distribution with
finite (non-zero) mean but identical second-moment matrix. (Are there exceptions?)
Solution: If we flip a data point µ at the origin, i.e. if we replace x^µ by −x^µ, the second-moment matrix does not change, since x_i^µ x_j^µ = (−x_i^µ)(−x_j^µ). Thus, if we flip each point with negative first component, then the second-moment matrix has not changed but the first component of the mean should be positive. If the first component is always negative we can do the flipping with any other suitable component.
Only if all components are always zero are we stuck and cannot produce a non-zero-mean data dis-
tribution with identical second-moment matrix. In this case the second-moment matrix would be
zero.
2. Conversely, define a procedure by which you can turn any data distribution with finite mean into a
distribution with zero mean but identical second-moment matrix. (Are there exceptions?)
Solution: Here one can use the same trick as in the first part. However, one not only flips data points but also copies them. Thus, for each data point x^µ a flipped one x^{µ+M} := −x^µ is added. The second-moment matrix does not change but the mean vanishes.
There is no exception for this method. It always works.
Hint: Think about what happens if you flip a point µ at the origin, i.e. if you replace x^µ by −x^µ in the data set.
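A short sketch (my own) demonstrating the flip-and-copy construction from part 2:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(loc=[2.0, -1.0], size=(500, 2))  # data with clearly non-zero mean

    X_sym = np.vstack([X, -X])                      # add a flipped copy of every point

    C_orig = X.T @ X / len(X)
    C_sym = X_sym.T @ X_sym / len(X_sym)
    print(np.allclose(C_orig, C_sym))  # True: second-moment matrix unchanged
    print(X_sym.mean(axis=0))          # ≈ [0, 0]: the mean vanishes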
1.5 Covariance matrix and higher order structure
2 Formalism
Show that

‖v‖² = Σ_{i=1}^{N} v_i² . (2)
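No solution is given in the text; a one-line sketch: with the Euclidean inner product,

‖v‖² = (v, v) = v^T v = Σ_{i=1}^{N} v_i v_i = Σ_{i=1}^{N} v_i² .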
2.4 Matrix (V^T V): Identity mapping within new coordinate system
2.6 Variance
Show that a second-moment matrix C := ⟨x^µ (x^µ)^T⟩_µ is always positive semi-definite, i.e. for each vector v we find v^T C v ≥ 0. For which vectors v does v^T C v = 0 hold?
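No solution is given here either; a minimal sketch of the standard argument:

v^T C v = v^T ⟨x^µ (x^µ)^T⟩_µ v = ⟨(v^T x^µ)²⟩_µ ≥ 0 ,

with equality exactly for those vectors v that are orthogonal to every data point x^µ.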
2.9 Eigenvalue equation of the covariance matrix
Prove that the eigenvectors of a symmetric matrix are orthogonal, if their eigenvalues are different. Proceed
as follows:
1. Let A be a symmetric N-dimensional matrix, i.e. A = A^T. Show first that (v, Aw) = (Av, w) for any vectors v, w ∈ ℝ^N, with (·, ·) indicating the Euclidean inner product.

Solution:

(v, Aw) = v^T A w = v^T A^T w = (Av)^T w = (Av, w) . (1)

2. Let {a_i} be the eigenvectors of the matrix A with the eigenvalues λ_i. Show with the help of part one that (a_i, a_j) = 0 if λ_i ≠ λ_j.

Hint: λ_i (a_i, a_j) = ...

Solution:

λ_i (a_i, a_j) = (λ_i a_i, a_j) = (A a_i, a_j) = (a_i, A a_j) = (a_i, λ_j a_j) = λ_j (a_i, a_j)  (using (1) for the third step) (2)

⟹ (a_i, a_j) = 0 if λ_i ≠ λ_j . (3)
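A quick numerical illustration (my own sketch): the eigenvectors of a random symmetric matrix, as returned by NumPy, are mutually orthogonal.

    import numpy as np

    rng = np.random.default_rng(0)
    B = rng.normal(size=(4, 4))
    A = B + B.T                               # a symmetric matrix

    _, eigvecs = np.linalg.eigh(A)            # eigh handles symmetric matrices
    print(np.round(eigvecs.T @ eigvecs, 10))  # identity: the a_i are orthonormal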
The second-moment matrix is

C = (1/3) ( x^1 (x^1)^T + x^2 (x^2)^T + x^3 (x^3)^T ) (5)
  = (1/3) ( (−3, 2)^T (−3, 2) + (1, −1)^T (1, −1) + (−2, 3)^T (−2, 3) ) (6)
  = (1/3) ( [9, −6; −6, 4] + [1, −1; −1, 1] + [4, −6; −6, 9] ) (7)
  = (1/3) [14, −13; −13, 14] . (8)
(Figure: the three data points (blue) with the two eigenvectors (red arrows); © CC BY-SA 4.0)

The symmetry of the data points (blue points) indicates that two eigenvectors are c_1 = (1/√2) (−1, 1)^T and c_2 = (1/√2) (1, 1)^T (red arrows). Multiplying with the second-moment matrix verifies the eigenvectors and provides the eigenvalues.
C c_1 = (1/(3√2)) [14, −13; −13, 14] (−1, 1)^T (9)
      = (1/(3√2)) (−27, 27)^T = (1/√2) (−9, 9)^T = 9 c_1 (10)

⟹ λ_1 = 9 , (11)

C c_2 = (1/(3√2)) [14, −13; −13, 14] (1, 1)^T (12)
      = (1/(3√2)) (1, 1)^T = (1/3) c_2 (13)

⟹ λ_2 = 1/3 . (14)
y^µ = c_α^T x^µ , (16)
Hint: You don’t have to compute the projected data. There is a simpler way.
Solution: First consider the general equations. The first moment is

⟨y^µ⟩_µ = c_α^T ⟨x^µ⟩_µ ,

and the second moments, ⟨(y^µ)²⟩_µ = c_α^T C c_α = λ_α, are simply the eigenvalues 9 and 1/3.
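The numbers in this worked example are easy to verify; a minimal sketch (my own):

    import numpy as np

    X = np.array([[-3.0, 2.0], [1.0, -1.0], [-2.0, 3.0]])  # the three data points
    C = X.T @ X / len(X)           # = (1/3) [[14, -13], [-13, 14]]

    c1 = np.array([-1.0, 1.0]) / np.sqrt(2)
    c2 = np.array([1.0, 1.0]) / np.sqrt(2)
    print(C @ c1 / c1)             # [9. 9.]       -> lambda_1 = 9
    print(C @ c2 / c2)             # [0.333 0.333] -> lambda_2 = 1/3

    y1 = X @ c1                    # data projected onto c1
    print(np.mean(y1**2))          # 9.0: the second moment equals lambda_1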
2.15.2 Exercise: From data distribution to second-moment matrix via the eigenvectors
Give an estimate of the second-moment matrix for the following data distributions by first guessing the
eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.
(Figure: three example data distributions in the (x_1, x_2)-plane; axes marked at 1.)
(a) The two coefficients x_1 and x_2 are uncorrelated and therefore valid eigenvectors lie along the axes, i.e. u_1 = (1, 0)^T, u_2 = (0, 1)^T, resulting in U = 1. Since x_1 is uniformly distributed in [−1, +1], its variance is 1/3, thus λ_1 = 1/3. The other coefficient, x_2, is compressed by a factor of about 4, resulting in a variance that is a factor of 4² smaller, thus λ_2 = 1/48. The 2nd-moment matrix is therefore

C = U Λ U^T (1)
  = Λ  (since U = 1) (2)
  = [1/3, 0; 0, 1/48] (3)
  ≈ [0.33, 0; 0, 0.02] . (4)
(b) The two coefficients x_1 and x_2 are uncorrelated and therefore valid eigenvectors lie along the axes, i.e. u_1 = (1, 0)^T, u_2 = (0, 1)^T, resulting in U = 1. However, since the variance is the same in all directions for symmetry reasons, any other set of orthogonal unit vectors would do as well. If one projects the data onto one of the axes one sees that a single coefficient is not uniformly distributed but is heavier near ±1. Thus, the variance might be about 1/2 instead of 1/3 and λ_1 = λ_2 = 1/2. Therefore

C = U Λ U^T (5)
  = Λ  (since U = 1) (6)
  = [1/2, 0; 0, 1/2] . (7)
(c) This distribution clearly has its largest 2nd moment (not variance) in the direction of u_1 = (1/√(5/4)) (−1, 1/2)^T = (1/√5) (−2, 1)^T, and the corresponding value is a bit more than 1, let's say λ_1 = 61/48. The second eigenvector must be orthogonal to the first one, for instance u_2 = (1/√5) (1, 2)^T, and the corresponding 2nd moment (in this case even variance) is much smaller, let's say λ_2 = 1/48. The 2nd-moment matrix is therefore

C = U Λ U^T (8)
  = (u_1, u_2) diag(λ_1, λ_2) (u_1, u_2)^T (9)
  = (1/√5) [−2, 1; 1, 2] [61/48, 0; 0, 1/48] (1/√5) [−2, 1; 1, 2] (10)
  = (1/(5·48)) [−2, 1; 1, 2] [61, 0; 0, 1] [−2, 1; 1, 2] (11)
  = (1/(5·48)) [−2, 1; 1, 2] [−122, 61; 1, 2] (12)
  = (1/(5·48)) [245, −120; −120, 65] (13)
  = (1/48) [49, −24; −24, 13] (14)
  ≈ [1.02, −0.5; −0.5, 0.27] . (15)
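This product is again easy to check numerically; a brief sketch (my own):

    import numpy as np

    U = np.array([[-2.0, 1.0], [1.0, 2.0]]) / np.sqrt(5)  # columns u_1, u_2
    L = np.diag([61/48, 1/48])                            # guessed eigenvalues

    print(np.round(U @ L @ U.T, 2))  # [[ 1.02 -0.5 ] [ -0.5 0.27]]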
2.15.3 Exercise: From data distribution to second-moment matrix via the eigenvectors
Give an estimate of the second-moment matrix for the following data distributions by first guessing the
eigenvalues and normalized eigenvectors from the distribution and then calculating the matrix.
(Figure: three example data distributions in the (x_1, x_2)-plane; © CC BY-SA 4.0)
Solution: Not available!
Given some data in ℝ³ with the corresponding 3 × 3 second-moment matrix C with eigenvectors c_α and eigenvalues λ_α, with λ_1 = 3, λ_2 = 1 and λ_3 = 0.2.
1. Define a matrix A ∈ ℝ^{2×3} that maps the data into a two-dimensional space while preserving as much variance as possible.

Solution: The dimension with least variance is spanned by the eigenvector of λ_3. The two-dimensional subspace with largest variance is spanned by the eigenvectors of λ_1 and λ_2. The corresponding matrix reads

A := [c_1^T; c_2^T] . (1)
2. Define a matrix B ∈ ℝ^{3×2} that places the reduced data back into ℝ³ with minimal reconstruction error. How large is the reconstruction error?

Solution: Embedding the reduced data back into ℝ³ is done again with the eigenvectors:

B := (c_1, c_2) . (2)

The reconstruction error is the sum over the eigenvalues of the neglected eigenvectors, which is λ_3 = 0.2 in this case.
3. Prove that AB is an identity matrix. Why would one expect that intuitively?

Solution: Intuitively the matrix AB corresponds to a mapping from ℝ² into ℝ³ and back again. No information is lost in this process, which means that AB should be the identity matrix. We can also show formally that

AB = [c_1^T; c_2^T] (c_1, c_2) = [c_1^T c_1, c_1^T c_2; c_2^T c_1, c_2^T c_2] = [1, 0; 0, 1] . (3)
To show that BA is not the identity matrix we multiply it with the third eigenvector.
BA c_3 = (c_1, c_2) [c_1^T; c_2^T] c_3 (5)
       = (c_1, c_2) (0, 0)^T  (since c_3 is orthogonal to c_1 and c_2) (6)
       = 0 (7)
       ≠ c_3 . (8)
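A small numerical sketch (my own; it builds A and B from an arbitrary orthonormal basis, standing in for the eigenvectors):

    import numpy as np

    rng = np.random.default_rng(0)
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # columns play the role of c_1, c_2, c_3

    A = Q[:, :2].T     # 2x3, rows c_1^T, c_2^T
    B = Q[:, :2]       # 3x2, columns c_1, c_2

    print(np.allclose(A @ B, np.eye(2)))  # True:  AB is the 2x2 identity
    print(np.allclose(B @ A, np.eye(3)))  # False: BA is only a projection onto a subspace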
2.16 PCA Algorithm
Prove that sphered zero-mean data x̂ projected onto two orthogonal vectors n_1 and n_2 is uncorrelated.

Hint: The correlation coefficient for two scalar data sets y_1 and y_2 with means ȳ_i := ⟨y_i⟩ is defined as

r := ⟨(y_1 − ȳ_1)(y_2 − ȳ_2)⟩ / ( √⟨(y_1 − ȳ_1)²⟩ √⟨(y_2 − ȳ_2)²⟩ ) . (1)
Solution: Projecting the data x̂ onto the vectors n_1 and n_2, which we assume are normalized without loss of generality, yields

y_i = n_i^T x̂ , (2)

which is zero-mean because x̂ is zero-mean. For the numerator of the correlation coefficient we get

⟨(y_1 − ȳ_1)(y_2 − ȳ_2)⟩ = ⟨y_1 y_2⟩  (since the y_i are zero-mean) (3)
  = ⟨(n_1^T x̂)(n_2^T x̂)⟩  (by (2)) (4)
  = ⟨(n_1^T x̂)(x̂^T n_2)⟩ (5)
  = n_1^T ⟨x̂ x̂^T⟩ n_2 (6)
  = n_1^T 1 n_2  (since x̂ is sphered) (7)
  = n_1^T n_2 (8)
  = 0  (since n_1 and n_2 are orthogonal) . (9)
This proves the assertion for data with finite variance. If the variance of the data is zero then the denominator
is zero and the correlation is not defined.
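The statement can be checked end to end with a small sphering (whitening) sketch (my own; any two orthogonal directions work):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 2)) @ np.array([[2.0, 1.0], [0.0, 0.5]])  # correlated data
    X -= X.mean(axis=0)                     # make the data zero-mean

    # Sphere the data so that <x x^T> becomes (approximately) the identity.
    eigval, eigvec = np.linalg.eigh(X.T @ X / len(X))
    X_hat = X @ eigvec / np.sqrt(eigval)

    n1 = np.array([1.0, 1.0]) / np.sqrt(2)  # two orthogonal unit vectors
    n2 = np.array([1.0, -1.0]) / np.sqrt(2)
    print(np.corrcoef(X_hat @ n1, X_hat @ n2)[0, 1])  # ≈ 0: uncorrelated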
3 Application
4 Acknowledgment