9.40 Introduction to Neural Computation
Lecture 17: Principal Components Analysis
Learning Objectives for Lecture 17
Matrix transformations
y = Ax
• In general, A maps the set of vectors in R² onto another set of vectors in R².
• The inverse mapping A⁻¹ exists provided det(A) ≠ 0.
Eigenvectors and eigenvalues
• Matrix transformations have special directions
    A = [1+δ 0; 0 1]        A = [1−δ 0; 0 1+δ]
These are all diagonal matrices:
    Λ = [λ1 0; 0 λ2]
Eigenvectors and eigenvalues
• Some vectors are rotated, some are not.
• For a diagonal matrix, vectors along the axes are scaled, but not
rotated.
    Λ ê1 = [λ1 0; 0 λ2][1; 0] = λ1 [1; 0] = λ1 ê1
    Λ ê2 = [λ1 0; 0 λ2][0; 1] = λ2 [0; 1] = λ2 ê2
Eigenvectors and eigenvalues
• Diagonal matrices have the property that they map any vector parallel
to a standard basis vector into another vector along that standard basis
vector.
The eigenvalue equation:
    Λ = [λ1 0 0; 0 λ2 0; 0 0 λ3]        Λ êi = λi êi ,   i = 1, 2, ..., n
• Any vector v that is mapped by matrix A onto a parallel vector λv is called an eigenvector of A. The scale factor λ is called the eigenvalue of vector v.
Example:
    A = ΦΛΦᵀ = [1.5 0.5; 0.5 1.5]        Λ = [2 0; 0 1]        Φ(45°) = (1/√2)[1 −1; 1 1]
Eigenvectors and eigenvalues
• What are the eigenvectors and eigenvalues of our rotated transformation matrix A = ΦΛΦᵀ?
Remember:
    Λ êi = λi êi ,   Λ = [λ1 0; 0 λ2]
Write the eigenvalue equation for A and unwrap it:
    A xi = ai xi
    ΦΛΦᵀ xi = ai xi
    ΦᵀΦΛΦᵀ xi = Φᵀ (ai xi)        so we know the solution if Φᵀ xi = êi :
    Λ (Φᵀ xi) = ai (Φᵀ xi)   ⇒   Λ êi = ai êi   ⇒   ai = λi
The eigenvectors are therefore xi = Φ êi , the columns f̂i of Φ(45°) = (1/√2)[1 −1; 1 1]:
    x1 = Φ ê1 = (1/√2)[1 −1; 1 1][1; 0] = (1/√2)[1; 1] = f̂1
    x2 = Φ ê2 = (1/√2)[1 −1; 1 1][0; 1] = (1/√2)[−1; 1] = f̂2
    A = ΦΛΦᵀ ,   where Φ = ( f̂1  f̂2 ) is a rotation matrix and Λ is a diagonal matrix
• The eigenvalue associated with each eigenvector f̂i is the diagonal element λi of Λ = [λ1 0; 0 λ2].
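As a quick numerical check (a sketch, not from the slides), MATLAB confirms that the columns of Φ are eigenvectors of A with the eigenvalues from Λ:
Phi = [1 -1; 1 1]/sqrt(2);   % 45-degree rotation matrix
Lam = [2 0; 0 1];            % diagonal matrix of eigenvalues
A = Phi*Lam*Phi';            % = [1.5 0.5; 0.5 1.5], the example above
f1 = Phi(:,1);               % f_hat_1
A*f1 ./ f1                   % both components equal lambda_1 = 2
f2 = Phi(:,2);               % f_hat_2
A*f2 ./ f2                   % both components equal lambda_2 = 1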
Eigenvectors and eigenvalues
• Note that eigenvectors are not unique: if xi is an eigenvector of A, then so is a xi for any nonzero scalar a.
    A xi = λi xi   ⇒   A (a xi) = λi (a xi)
Eigenvectors and eigenvalues
• What are the eigenvalues of a general 2-dim matrix A?
    A x = λ x        (we only want solutions where x ≠ 0)
    A x = λ I x
    (A − λI) x = 0   ⇒   det(A − λI) = 0
    A = [a b; c d]        A − λI = [a−λ b; c d−λ]
    det(A − λI) = (a − λ)(d − λ) − bc = 0
    ad − λ(a + d) + λ² − bc = 0
    λ² − (a + d)λ + (ad − bc) = 0
Eigenvectors and eigenvalues
• What are the eigenvalues of a general 2-dim matrix?
    λ² − (a + d)λ + (ad − bc) = 0
    λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4bc )
Eigenvectors and eigenvalues
    λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4bc )
For a symmetric matrix, c = b, so the term under the square root becomes (a − d)² + 4b² ≥ 0 and the eigenvalues are always real. For example:
    A = [a b; b d] = [3/2 1/2; 1/2 1/2]
    λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4b² )
    λ± = (1/2)(3/2 + 1/2) ± (1/2)√( (3/2 − 1/2)² + 4(1/2)² ) = 1 ± 1/√2 ≈ 1.71, 0.29
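A one-line check of this arithmetic in MATLAB (a sketch, not part of the slides):
a = 3/2; b = 1/2; d = 1/2;
(a+d)/2 + [1 -1]*sqrt((a-d)^2 + 4*b^2)/2   % 1.7071  0.2929
eig([a b; b d])                            % 0.2929  1.7071 (ascending order)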
Eigenvectors and eigenvalues
• Let's consider a special case: a symmetric matrix with equal diagonal elements.
    A = [a b; b a]        λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4b² )
With d = a the square root reduces to b, so
    λ+ = a + b        λ− = a − b
To find the eigenvector x+ , solve A x+ = λ+ x+ :
    [a b; b a] x+ = (a + b) x+
    [a b; b a] x+ = [a+b 0; 0 a+b] x+
    [−b b; b −b] x+ = 0
    [−1 1; 1 −1][x1; x2] = 0   ⇒   x1 = x2 = s
    x+ = (1/√2)[1; 1]        x− = (1/√2)[−1; 1]
Eigen-decomposition
• The process of writing a matrix A as A = ΦΛΦᵀ is called eigen-decomposition. It works for any symmetric matrix.
• Let's rewrite:
    A = ΦΛΦᵀ ,   Λ = [λ1 0; 0 λ2]
    AΦ = ΦΛΦᵀΦ = ΦΛ        (since ΦᵀΦ = I for a rotation matrix)
• The eigenvalue equation AΦ = ΦΛ is equivalent to the set of equations A f̂i = λi f̂i .
Eigenvectors and eigenvalues
• MATLAB® has a function 'eig' to calculate eigenvectors and eigenvalues:
    A = F V Fᵀ
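A minimal usage sketch, on the example matrix from the earlier slides:
A = [1.5 0.5; 0.5 1.5];
[F, V] = eig(A);   % columns of F are eigenvectors; V is the diagonal eigenvalue matrix
F*V*F'             % reconstructs A, i.e. A = F V F'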
Learning Objectives for Lecture 17
Variance
• Let's say we have m observations of a variable x:
    x(j) ,   j = 1, 2, 3, ..., m        (the jth observation of x)
[Figure: histogram N(x) of the observations, centered on the mean µ with spread σ²]
• The mean of these observations is
    µ = x̄ = (1/m) Σ x(i) ,   summed over i = 1, ..., m
• The variance is
    σ² = (1/m) Σ ( x(i) − µ )²
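In MATLAB each estimate is one line (a sketch; note the slide's 1/m convention differs from MATLAB's default 1/(m−1)):
x = randn(1, 1000);           % m = 1000 hypothetical observations
mu = mean(x);
sigma2 = mean((x - mu).^2);   % 1/m convention; identical to var(x, 1)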
Variance
• Now let's say we have m simultaneous observations of variables x1 and x2:
    ( x1(j) ; x2(j) ) ,   j = 1, 2, 3, ..., m
[Figure: two scatter plots of x2 against x1 with the same marginal variances σ1², σ2² but different shapes]
• x1 has the same variance in both of these cases, and so does x2. What differs is the covariance σ12 and the correlation ρ:
    σ12 = (1/m) Σj ( x1(j) − µ1 )( x2(j) − µ2 )        ρ = σ12 / √( σ1² σ2² )
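A sketch of the same computation on hypothetical data:
m = 1000;
x1 = randn(1, m);
x2 = 0.5*x1 + randn(1, m);                % correlated with x1 by construction
mu1 = mean(x1);  mu2 = mean(x2);
sigma12 = mean((x1 - mu1).*(x2 - mu2));   % covariance, 1/m convention
rho = sigma12/(std(x1,1)*std(x2,1))       % correlation, about 0.45 for this example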
Gaussian distribution
• Many kinds of data can be fit by a Gaussian distribution
    P(x) = ( 1 / (σ√(2π)) ) exp( −(x − µ)² / (2σ²) )
[Figure: Gaussian curve over x, with mean µ and standard deviation σ]
Multivariate Gaussian distribution
• We can create a Gaussian distribution in two dimensions as the product of two 1-dimensional Gaussians:
    P(x1, x2) = P(x1) P(x2)
              = β exp(−x1²/2) exp(−x2²/2)        (β is a normalization constant that makes the total probability 1)
              = β exp( −(x1² + x2²)/2 )
• Writing x = (x1; x2):
    P(x) = β exp( −d²/2 ) ,   where d² = |x|² = xᵀx
d is called the Mahalanobis distance. This is the isotropic multivariate Gaussian distribution.
[Figure: circular contours of P(x1, x2), with Gaussian marginals P(x1) and P(x2)]
Multivariate Gaussian distribution
[Figure: three Gaussian clouds in the (y1, y2) plane — isotropic, axis-aligned, and rotated/correlated]
    P(y) = β exp( −yᵀy / (2σ²) )        P(y) = β exp( −yᵀΛ⁻¹y / 2 )        P(y) = β exp( −yᵀΣ⁻¹y / 2 )
    y = σ x                             Λ = [σ1² 0; 0 σ2²]                  Σ = ΦΛΦᵀ = [σ1² σ12; σ12 σ2²]
Multivariate Gaussian distribution
• What is P(y) when y = σx (isotropic, variance σ²)?
    P(x) = β exp( −xᵀx / 2 ) ,   x = σ⁻¹ y
What is the Mahalanobis distance?
    d² = xᵀx = (σ⁻¹y)ᵀ(σ⁻¹y) = yᵀ σ⁻¹σ⁻¹ y = yᵀ σ⁻² y
so
    P(y) = β exp( −(yᵀy/σ²) / 2 ) = β exp( −yᵀy / (2σ²) )
[Figure: circular Gaussian cloud of radius σ with marginals P(y1) and P(y2)]
Multivariate Gaussian distribution
Isotropic → non-isotropic: stretch each axis by its own standard deviation.
[Figure: isotropic cloud x with marginals P(x1), P(x2); stretched cloud y with widths σ1 and σ2]
    y = S x ,   where S = [σ1 0; 0 σ2] is the stretch matrix
Multivariate Gaussian distribution
    y = S x ,   x = S⁻¹ y
What is the Mahalanobis distance?
    d² = xᵀx = (S⁻¹y)ᵀ(S⁻¹y) = yᵀ S⁻¹S⁻¹ y = yᵀ S⁻² y = yᵀ Λ⁻¹ y
where
    Λ⁻¹ = S⁻² = [σ1⁻² 0; 0 σ2⁻²]        Λ = [σ1² 0; 0 σ2²]   (the matrix of variances)
so
    P(y) = β exp( −yᵀΛ⁻¹y / 2 ) = β exp( −(1/2)( y1²/σ1² + y2²/σ2² ) )
[Figure: axis-aligned elliptical cloud with widths σ1, σ2 and marginals P(y1), P(y2)]
Multivariate Gaussian distribution
Isotropic → non-isotropic with correlation: rotate into the stretch axes, stretch, and rotate back.
[Figure: isotropic cloud x with marginals P(x1), P(x2); tilted elliptical cloud y with marginals P(y1), P(y2)]
    y = Φ S Φᵀ x ,   where Φ is a rotation matrix and S = [σ1 0; 0 σ2] is the stretch matrix
Covariance matrix
    y = Φ S Φᵀ x ,   x = Φ S⁻¹ Φᵀ y
What is the Mahalanobis distance?
    d² = xᵀx = (ΦS⁻¹Φᵀ y)ᵀ(ΦS⁻¹Φᵀ y)
       = yᵀ ΦS⁻¹Φᵀ ΦS⁻¹Φᵀ y
       = yᵀ ΦS⁻²Φᵀ y
       = yᵀ ΦΛ⁻¹Φᵀ y
    d² = yᵀ Σ⁻¹ y ,   where Σ⁻¹ = ΦΛ⁻¹Φᵀ
    P(y) = β exp( −yᵀΣ⁻¹y / 2 ) ,   Σ = ΦΛΦᵀ = [σ1² σ12; σ12 σ2²]   (the covariance matrix)
[Figure: tilted elliptical cloud with principal widths σ1, σ2 and marginals P(y1), P(y2)]
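A sketch of the distance computation in MATLAB, using the backslash operator instead of an explicit matrix inverse:
Sigma = [1.5 0.5; 0.5 1.5];   % example covariance matrix
y = [1; 2];
d2 = y' * (Sigma \ y)         % Sigma\y solves Sigma*u = y, so this is y' * inv(Sigma) * y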
Multivariate Gaussian distribution
[Figure: three Gaussian clouds in the (y1, y2) plane — isotropic, axis-aligned, and rotated/correlated]
    P(y) = β exp( −yᵀy / (2σ²) )        P(y) = β exp( −yᵀΛ⁻¹y / 2 )        P(y) = β exp( −yᵀΣ⁻¹y / 2 )
                                        Λ = [σ1² 0; 0 σ2²]                  Σ = ΦΛΦᵀ = [σ1² σ12; σ12 σ2²]
• Thus, our covariance matrix is just a transformation matrix that turns an isotropic Gaussian distribution (of variance 1) into a non-isotropic multivariate Gaussian.
• The eigenvectors of the covariance matrix are just the basis vectors of the rotated transformation.
• And the eigenvalues of the covariance matrix are the variances of the Gaussian in the directions of these basis vectors.
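All three bullets can be seen at work in a short sketch (the stretch values here are assumptions, not from the slides): transform isotropic samples, then recover the stretch and rotation from the sample covariance.
m = 100000;
Phi = [1 -1; 1 1]/sqrt(2);    % rotation by 45 degrees
S = [2 0; 0 0.5];             % assumed stretch matrix
X = randn(2, m);              % isotropic Gaussian, variance 1
Y = Phi*S*Phi' * X;           % non-isotropic, correlated Gaussian
SigmaEst = Y*Y'/m;            % sample covariance (X is zero-mean by construction)
[F, V] = eig(SigmaEst)        % V is approximately diag(0.25, 4) = S.^2 (ascending),
                              % and the columns of F match those of Phi up to sign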
Learning Objectives for Lecture 17
How do we fit a Gaussian to multivariate data?
• In 1 dimension: to find the Gaussian that best fits our data, just measure the mean and variance!
[Figure: 1-D data histogram with fitted Gaussian, mean µ and variance σ²]
• In 2 dimensions, with mean-subtracted data z(j) = x(j) − µ, measure all four entries of the covariance matrix:
    Σ = [σ1² σ12; σ21 σ2²] ,   where
    σ1² = (1/m) Σj z1(j) z1(j)        σ12 = (1/m) Σj z1(j) z2(j)
    σ21 = (1/m) Σj z2(j) z1(j)        σ2² = (1/m) Σj z2(j) z2(j)
Outer product
• We are going to implement a useful trick called the vector 'outer product'. For z = (z1; z2):
Inner product (1×2 times 2×1 gives 1×1):
    zᵀz = ( z1  z2 )[ z1; z2 ] = z1² + z2² = |z|²
Outer product (2×1 times 1×2 gives 2×2):
    zzᵀ = [ z1; z2 ]( z1  z2 ) = [ z1z1  z1z2 ; z2z1  z2z2 ]
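In MATLAB the two products differ only in where the transpose sits (a minimal sketch):
z = [2; 3];
z' * z    % inner product, 1x1: 13 = |z|^2
z * z'    % outer product, 2x2: [4 6; 6 9]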
Computing the covariance matrix
• The covariance matrix has a simpler form using the outer product. With z(j) = ( z1(j); z2(j) ):
    (1/m) Σj z(j) (z(j))ᵀ = (1/m) Σj [ z1(j)z1(j)  z1(j)z2(j) ; z2(j)z1(j)  z2(j)z2(j) ] = [ σ1² σ12 ; σ21 σ2² ] = Σ
Computing the covariance matrix
• Represent the data as a matrix Z whose jth column is the observation z(j); Z is then n × m (n variables, m observations).
[Figure: the n × m data matrix]
Computing the covariance matrix
• Now finding the covariance matrix is trivial!
    Σ = (1/m) Z Zᵀ
      = (1/m) [ z11 z12 z13 … z1m ; z21 z22 z23 … z2m ] [ z11 z21 ; z12 z22 ; z13 z23 ; … ; z1m z2m ] = [ σ1² σ12 ; σ21 σ2² ]
    (2×m times m×2 gives 2×2)
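A sketch checking this against MATLAB's built-in cov, which expects observations in rows and whose second argument selects the 1/m normalization:
m = 5000;
Z = randn(2, m);
Z = Z - repmat(mean(Z, 2), 1, m);    % mean-subtract each row
Sigma = Z*Z'/m;
max(max(abs(Sigma - cov(Z', 1))))    % ~0: cov(Z',1) uses the same 1/m convention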
Subtracting the mean
• The covariance calculation we just did assumes the data were mean-subtracted. How do we subtract the mean?
• We have m observations of the vector x:
    X = [ x11 x12 x13 … x1m ; x21 x22 x23 … x2m ]   (n × m) ,   with mean µ = ( µ1; µ2 )
• First compute the mean in matrix notation and make a matrix with m copies of this column vector:
Mu=mean(X,2);
MU=repmat(Mu,1,m);
    M = [ µ1 µ1 µ1 … µ1 ; µ2 µ2 µ2 … µ2 ]   (n × m)
• Now subtract this from X to get Z:
Z=X-MU;
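(An aside, not on the slide: since R2016b, implicit expansion lets MATLAB subtract a column vector from each column directly, so the repmat step can be skipped.)
Z = X - mean(X, 2);   % equivalent to the repmat version above on R2016b or later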
Learning Objectives for Lecture 17
Principal Components Analysis
• A method for finding the directions in high-dimensional data that contain information (a classic example: 'eigenfaces' for face images).
• Demo: generate isotropic data X, then stretch and rotate it, z = ΦSΦᵀx, with
    Φ = (1/√2)[1 −1; 1 1]        S = [√3 0; 0 1/√3]
m=500;
X=randn(2,m);
• Compute the covariance of the transformed data Z and eigen-decompose it, Σ = FVFᵀ:
    Σ = (1/m) Z Zᵀ        F = [0.72 −0.70; 0.70 0.72]        V = [3.28 0; 0 0.31]
Q=Z*Z'/m;
[F,V]=eig(Q);
(the slide displays F and V with the largest eigenvalue first; eig itself returns eigenvalues in ascending order)
• The columns of F are the principal components PC1 and PC2. Project the data onto them:
    zf = Fᵀ z
Zf=F'*Z;
[Figure: scatter of X, of Z with the PC1 and PC2 axes drawn, and of Zf in the principal-component basis]
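The pieces above assemble into a runnable script (a sketch; the exact numbers in F and V vary from run to run):
m = 500;
X = randn(2, m);                   % isotropic data
Phi = [1 -1; 1 1]/sqrt(2);
S = [sqrt(3) 0; 0 1/sqrt(3)];
Z = Phi*S*Phi' * X;                % stretched, rotated data (zero mean by construction)
Q = Z*Z'/m;                        % sample covariance
[F, V] = eig(Q);                   % eigenvalues near 1/3 and 3, ascending
F = fliplr(F);  V = rot90(V, 2);   % reorder largest-first, as displayed on the slide
Zf = F'*Z;                         % data in the principal-component basis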
PCA on time-domain signals
• Let's look at a problem in the time domain. Each observation is a time series sampled at n = 100 points:
    x(j) = ( x1; x2; …; xn )
• There are m = 200 different vectors, collected as the columns of an n × m data matrix:
    X = ( x(1) x(2) x(3) … x(m) )        (100 × 200)
[Figure: the data matrix X displayed as an image, 100 rows × 200 columns]
Covariance matrix
• Do PCA:
Mu=mean(X,2);
MU=repmat(Mu,1,m);
Z=X-MU;
Q=Z*Z'/m;
[F,V]=eig(Q);
Eigenvalues
[F,V]=eig(Q);
var=flip(sum(V));    % eigenvalues, sorted largest-first (eig returns ascending order)
[Figure: plot of the sorted eigenvalues — the first two ('signal') stand far above the rest ('noise')]
• The first two eigenvalues are much larger than all the rest.
Eigenvectors
• Since there were only two large eigenvalues, we look at the eigenvectors associated with these eigenvalues.
• These are just the first two columns of the F matrix (after F has been reordered so the largest-eigenvalue columns come first; the raw eig output puts them last):
plot(F(:,1),'r')
plot(F(:,2),'g')
[Figure: the two eigenvector waveforms, plotted in red and green]
Principal components
• Principal components are just the projections of each of the original
data vectors onto the two principal eigenvectors.
Filtering using PCA
• Only the first two entries in the column vectors of Zf (the data in the rotated basis) have signal. So keep only the first two and set the rest to zero:
Zf=F'*Z;
Zffilt=Zf;
Zffilt(3:end,:)=0;
• Then rotate back to the original basis set and restore the mean:
Zflt=F*Zffilt;
Xflt=Zflt+MU;
[Figure: example traces before and after filtering — the filtered traces keep the signal with much less noise]
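Here is the whole time-domain pipeline as one runnable sketch, on synthetic data with a two-dimensional signal subspace (the waveforms and noise level are assumptions, not the lecture's dataset):
n = 100;  m = 200;  t = (1:n)';
signals = [sin(2*pi*t/n), cos(4*pi*t/n)];     % two assumed signal waveforms (n x 2)
X = signals*randn(2, m) + 0.3*randn(n, m);    % data = 2-D signal subspace + noise
Mu = mean(X, 2);  MU = repmat(Mu, 1, m);
Z = X - MU;
Q = Z*Z'/m;
[F, V] = eig(Q);
F = fliplr(F);                % largest-eigenvalue eigenvectors first
Zf = F'*Z;
Zffilt = Zf;
Zffilt(3:end, :) = 0;         % keep only the two signal components
Xflt = F*Zffilt + MU;         % rotate back and restore the mean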
Learning Objectives for Lecture 17
MIT OpenCourseWare
https://ocw.mit.edu/
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.