
Introduction to Neural Computation

Prof. Michale Fee
MIT BCS 9.40 — 2017

Lecture 17
Principal Components Analysis
Learning Objectives for Lecture 17

• Eigenvectors and eigenvalues

• Variance and multivariate Gaussian distributions

• Computing a covariance matrix from data

• Principal Components Analysis (PCA)

2
Learning Objectives for Lecture 17

• Eigenvectors and eigenvalues

• Variance and multivariate Gaussian distributions

• Computing a covariance matrix from data

• Principal Components Analysis (PCA)

3
Matrix transformations
 
y = A x

• In general, A maps the set of vectors in R² onto another set of vectors in R².

• The inverse transformation A⁻¹ exists when det(A) ≠ 0.

4
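As a quick sanity check (a minimal MATLAB sketch, not from the slides; the matrix A below is a hypothetical example), we can map a set of vectors through A and then invert the map, which works precisely because det(A) ≠ 0:

% Map points on the unit circle through A and recover them with the inverse.
A = [1.5 0.5; 0.5 1.5];            % example 2x2 matrix with det(A) ~= 0
theta = linspace(0, 2*pi, 100);
X = [cos(theta); sin(theta)];      % unit circle, one point per column
Y = A*X;                           % transformed points (an ellipse)
Xback = A\Y;                       % apply A^-1; valid since det(A) ~= 0
fprintf('det(A) = %.2f, recovery error = %.2g\n', det(A), max(abs(Xback(:)-X(:))));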
Eigenvectors and eigenvalues
• Matrix transformations have special directions
A = [ 1+δ   0 ;  0   1 ]          A = [ 1−δ   0 ;  0   1+δ ]

These are all diagonal matrices:    Λ = [ λ1  0 ;  0  λ2 ]

5
Eigenvectors and eigenvalues
• Some vectors are rotated, some are not.

• For a diagonal matrix, vectors along the axes are scaled, but not rotated.

[ λ1  0 ;  0  λ2 ] [ 1 ; 0 ] = λ1 [ 1 ; 0 ]          Λ ê1 = λ1 ê1

[ λ1  0 ;  0  λ2 ] [ 0 ; 1 ] = λ2 [ 0 ; 1 ]          Λ ê2 = λ2 ê2

6
Eigenvectors and eigenvalues
• Diagonal matrices have the property that they map any vector parallel to a standard basis vector into another vector along that standard basis vector.

Eigenvalue equation:

Λ = [ λ1  0  0 ;  0  λ2  0 ;  0  0  λ3 ]          Λ êi = λi êi ,   i = 1, 2, …, n

• Any vector v that is mapped by matrix A onto a parallel vector λ v is called an eigenvector of A. The scale factor λ is called the eigenvalue of vector v.

• A matrix in Rⁿ has n eigenvectors and n eigenvalues.

7
Eigenvectors and eigenvalues
• What are the special directions of our rotated transformations?

A = ΦΛΦᵀ = [ 1.5  0.5 ;  0.5  1.5 ]          Λ = [ 2  0 ;  0  1 ]          Φ(45°) = (1/√2) [ 1  −1 ;  1  1 ]

8
Eigenvectors and eigenvalues
• What are the eigenvectors and eigenvalues of our rotated transformation matrix A = ΦΛΦᵀ ?

Remember:   Λ êi = λi êi ,    Λ = [ λ1  0 ;  0  λ2 ]

A xi = ai xi
ΦΛΦᵀ xi = ai xi
Φᵀ ΦΛΦᵀ xi = Φᵀ (ai xi)          so we know the solution if  Φᵀ xi = êi
Λ (Φᵀ xi) = ai (Φᵀ xi)    →    Λ êi = ai êi    ⇒    ai = λi

So the solution to the eigenvalue equation  ΦΛΦᵀ xi = ai xi  is:

eigenvalues:  ai = λi          eigenvectors:  xi = Φ êi          (what is this?)

9
Eigenvectors and eigenvalues
• The eigenvectors are just the standard basis vectors rotated by the matrix Φ!

xi = Φ êi

Φ(45°) = (1/√2) [ 1  −1 ;  1  1 ]          f̂1 = (1/√2) [ 1 ; 1 ]          f̂2 = (1/√2) [ −1 ; 1 ]

x1 = Φ ê1 = (1/√2) [ 1  −1 ;  1  1 ] [ 1 ; 0 ] = (1/√2) [ 1 ; 1 ]

x2 = Φ ê2 = (1/√2) [ 1  −1 ;  1  1 ] [ 0 ; 1 ] = (1/√2) [ −1 ; 1 ]

The eigenvectors are just the columns of Φ!

10
Eigenvectors and eigenvalues
• In summary, a symmetric matrix A can always be written as

A = ΦΛΦᵀ

where Φ is a rotation matrix and Λ is a diagonal matrix.

• The eigenvectors of A are the columns of Φ (the basis vectors f̂i):    Φ = ( f̂1  f̂2 )

• The eigenvalue associated with each eigenvector f̂i is the diagonal element λi of Λ:    Λ = [ λ1  0 ;  0  λ2 ]

11
Eigenvectors and eigenvalues
• Note that eigenvectors are not unique…

If xi is an eigenvector of A, then so is a xi:    A xi = λi xi    ⇒    A (a xi) = λi (a xi)

… so we usually write eigenvectors as unit vectors.

• For a matrix in n dimensions, there are n different unit eigenvectors.

• For a symmetric matrix, the eigenvectors of A are orthogonal (and we write them as unit vectors)…

… so the eigenvectors of A form a complete orthonormal basis set!

12
Eigenvectors and eigenvalues
• What are the eigenvalues of a general 2-dim matrix A ?

 
A x = λ x          (we only want solutions where x ≠ 0)
A x = λ I x
(A − λI) x = 0     ⇒     det(A − λI) = 0

A = [ a  b ;  c  d ]          A − λI = [ a−λ   b ;  c   d−λ ]

det(A − λI) = (a − λ)(d − λ) − bc = 0
ad − λ(a + d) + λ² − bc = 0

Characteristic equation of matrix A:

λ² − (a + d) λ + (ad − bc) = 0

13
Eigenvectors and eigenvalues
• What are the eigenvalues of a general 2-dim matrix?

Characteristic equation of matrix A:

λ² − (a + d) λ + (ad − bc) = 0

• Solutions are given by the quadratic formula:

λ± = ½ (a + d) ± ½ √( (a − d)² + 4bc )

• Eigenvalues can be real, complex, or imaginary.

14
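As a quick check (a minimal MATLAB sketch, not from the slides; the entries of A are a hypothetical example), the roots of the characteristic polynomial should match the output of eig:

% Compare roots of the characteristic polynomial with eig().
a = 1.5; b = 0.5; c = 0.5; d = 1.5;            % example matrix entries
A = [a b; c d];
lam_char = roots([1, -(a+d), (a*d - b*c)]);    % lambda^2 - (a+d)*lambda + (ad-bc) = 0
lam_eig  = eig(A);
disp(sort(lam_char));                          % both should give 1 and 2
disp(sort(lam_eig));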
Eigenvectors and eigenvalues
λ± = ½ (a + d) ± ½ √( (a − d)² + 4bc )

• For a symmetric matrix (c = b):

A = [ a  b ;  b  d ]          example:   A = [ 3/2   1/2 ;  1/2   1/2 ]

λ± = ½ (a + d) ± ½ √( (a − d)² + 4b² ) ,     and  (a − d)² + 4b² ≥ 0

λ± = ½ (3/2 + 1/2) ± ½ √( (3/2 − 1/2)² + 4 (1/2)² ) = 1 ± √2/2

The eigenvalues of a symmetric matrix are always real.

15
Eigenvectors and eigenvalues
• Let's consider a special case of a symmetric matrix:

A = [ a  b ;  b  a ]          λ± = ½ (a + d) ± ½ √( (a − d)² + 4b² )  with  d = a   gives   λ+ = a + b ,   λ− = a − b

To find the eigenvector for λ+ :

A x+ = λ+ x+
[ a  b ;  b  a ] x+ = (a + b) x+
[ a  b ;  b  a ] x+ = [ a+b   0 ;  0   a+b ] x+
[ −b  b ;  b  −b ] x+ = 0
[ −1  1 ;  1  −1 ] [ x1 ; x2 ] = 0     ⇒     x1 = s ,  x2 = s

x+ = (1/√2) [ 1 ; 1 ]          x− = (1/√2) [ −1 ; 1 ]

16
Eigen-decomposition
• The process of writing a matrix A as A = ΦΛΦᵀ is called eigen-decomposition. It works for any symmetric matrix:

o the eigenvalues λi are real numbers
o the eigenvectors f̂i form an orthogonal basis set

• Let's rewrite A = ΦΛΦᵀ   ( with Λ = [ λ1  0 ;  0  λ2 ] ):

A Φ = ΦΛΦᵀ Φ
A Φ = ΦΛ

• This eigenvalue equation is equivalent to the set of equations  A f̂i = λi f̂i .

17
Eigenvectors and eigenvalues
• MATLAB® has a function 'eig' to calculate eigenvectors and eigenvalues:

A = F V Fᵀ

18
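For example (a minimal sketch, using the symmetric matrix from earlier in the lecture):

% Eigen-decomposition of a symmetric matrix with eig().
A = [1.5 0.5; 0.5 1.5];
[F, V] = eig(A);        % columns of F are eigenvectors; V is the diagonal matrix of eigenvalues
disp(diag(V)');         % expect eigenvalues 1 and 2
disp(A - F*V*F');       % reconstruction error is ~0, i.e. A = F V F'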
Learning Objectives for Lecture 17

• Eigenvectors and eigenvalues

• Variance and multivariate Gaussian distributions

• Computing a covariance matrix from data

• Principal Components Analysis (PCA)

19
Variance
• Let's say we have m observations of a variable x:

x(j) ,   j = 1, 2, 3, …, m          ( the jth observation of x )

(Figure: distribution N(x) over x, with mean µ and variance σ².)

• The mean of these observations is    µ = (1/m) ∑ⱼ x(j)

• The variance is    σ² = (1/m) ∑ⱼ ( x(j) − µ )²

20
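A minimal MATLAB sketch of these two formulas (the data here are hypothetical):

% Sample mean and variance with 1/m normalization, as in the slide.
m = 1000;
x = 2 + 0.5*randn(1, m);          % hypothetical observations
mu = sum(x)/m;                    % same as mean(x)
sigma2 = sum((x - mu).^2)/m;      % same as var(x, 1), which normalizes by m
fprintf('mu = %.3f, sigma^2 = %.3f\n', mu, sigma2);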
Variance
• Now let's say we have m simultaneous observations of variables x1 and x2:

( x1(j) , x2(j) ) ,    j = 1, 2, 3, …, m

(Figure: scatter of the data in the x1–x2 plane, with means µ1, µ2 and variances σ1², σ2².)

• The mean and variance of x1 and x2 are:

µ1 = (1/m) ∑ⱼ x1(j)          µ2 = (1/m) ∑ⱼ x2(j)

σ1² = (1/m) ∑ⱼ ( x1(j) − µ1 )²          σ2² = (1/m) ∑ⱼ ( x2(j) − µ2 )²

21
Covariance
(Figure: two scatter plots in the x1–x2 plane. Left, "uncorrelated": σ12 = 0. Right, "correlated": σ12 > 0.)

• x1 has the same variance in both of these cases… and so does x2.

• So we need another measure to describe the relation between x1 and x2:

covariance:     σ12 = (1/m) ∑ⱼ ( x1(j) − µ1 ) ( x2(j) − µ2 )

correlation:     ρ = σ12 / √( σ1² σ2² )

22
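A minimal MATLAB sketch of covariance and correlation (hypothetical data, 1/m normalization as above):

% Covariance and correlation coefficient of two variables.
m = 1000;
x1 = randn(1, m);
x2 = 0.8*x1 + 0.6*randn(1, m);            % hypothetical correlated data
mu1 = mean(x1);   mu2 = mean(x2);
s1sq = mean((x1 - mu1).^2);               % sigma_1^2
s2sq = mean((x2 - mu2).^2);               % sigma_2^2
s12  = mean((x1 - mu1).*(x2 - mu2));      % covariance sigma_12
rho  = s12 / sqrt(s1sq*s2sq);             % correlation coefficient
fprintf('sigma_12 = %.3f, rho = %.3f\n', s12, rho);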
Gaussian distribution
• Many kinds of data can be fit by a Gaussian distribution.

If x is a Gaussian random variable, its probability 'density' is

P(x) = ( 1 / (σ √(2π)) ) e^( −(x−µ)² / 2σ² )

• A Gaussian is defined by only its mean and variance.

To find the Gaussian that best fits our data, just measure the mean and variance!

23
Gaussian distribution
• We are going to develop a description of Gaussian distributions in higher dimensions

• Develop deep insights into high-dimensional data

• We will develop this description using vector and matrix notation

o vectors and matrices are the natural format to manipulate data sets

o very compact notation

o manipulations are trivial in MATLAB®

24
Multivariate Gaussian distribution
• We can create a Gaussian distribution in two dimensions.

Two independent Gaussian random variables x1 and x2:

P(x1, x2) = P(x1) P(x2)
          = β e^( −½ x1² ) e^( −½ x2² )          ( β is a normalization that makes the total probability = 1 )
          = β e^( −½ (x1² + x2²) )

P(x) = β e^( −½ d² )          where   d² = |x|² = xᵀ x

d = Mahalanobis distance

(Figure: circular contours of P(x1, x2) in the x1–x2 plane, with marginals P(x1) ∝ e^(−½ x1²) and P(x2).)

Isotropic multivariate Gaussian distribution

25
Multivariate Gaussian distribution
(Figure: three Gaussian distributions in the y1–y2 plane: circular contours, axis-aligned elliptical contours, and tilted elliptical contours.)

Isotropic:                              P(y) = β e^( −yᵀy / 2σ² )              variance  σ²
Non-isotropic, no correlation:          P(y) = β e^( −½ yᵀ Λ⁻¹ y )             variance matrix   Λ = [ σ1²  0 ;  0  σ2² ]
Non-isotropic, with correlation:        P(y) = β e^( −½ yᵀ Σ⁻¹ y )             covariance matrix  Σ = [ σ1²  σ12 ;  σ12  σ2² ]

Σ = ΦΛΦᵀ

26


Multivariate Gaussian distribution
(Figure: left, an isotropic Gaussian in the x1–x2 plane with variance 1; right, an isotropic Gaussian in the y1–y2 plane with variance σ², obtained by the scaling below.)

y = σ x

27
Multivariate Gaussian distribution
  
P(y) = ?   given   y = σ x          ( isotropic, variance = σ² )

P(x) = β e^( −½ xᵀx ) ,        x = σ⁻¹ y

What is the Mahalanobis distance?

d² = xᵀ x = (σ⁻¹ y)ᵀ (σ⁻¹ y)
   = yᵀ σ⁻¹ σ⁻¹ y
   = yᵀ σ⁻² y

P(y) = β e^( −½ (yᵀy / σ²) ) = β e^( −yᵀy / 2σ² )

28
Multivariate Gaussian distribution
(Figure: an isotropic Gaussian in the x1–x2 plane is stretched into an axis-aligned, non-isotropic Gaussian in the y1–y2 plane with standard deviations σ1 and σ2.)

y = S x ,          S = [ σ1  0 ;  0  σ2 ]

29
Multivariate Gaussian distribution
   −1 
y2
y = Sx x=S y

P(y2 ) y
y1
What is the Mahalanobis distance? σ2

! ! !T !
d 2 = xT x = ( S−1 y ) ( S−1 y ) σ1

P(y1 )
 
= yT S−1S−1 y
 
= yT S−2 y ⎛ σ −2 ⎞ ⎛ σ2 0 ⎞
0
Λ=⎜ ⎟
−1 −2 1
Λ =S =⎜ ⎟
1
!T −1 ! ⎜⎝ 0 σ 2−2 ⎟⎠
d = y Λ y
2
⎜⎝ 0 σ 22 ⎟⎠
matrix of variances
  ⎛ 2 2⎞
− 1 ⎜ y12 + y2 2 ⎟
 − 1 yT Λ−1y
P( y) = βe 2 = βe 2 ⎝ σ1 σ 2 ⎠

30
Multivariate Gaussian distribution
(Figure: an isotropic Gaussian in the x1–x2 plane is mapped onto a non-isotropic, correlated Gaussian in the y1–y2 plane.)

Isotropic  →  Non-isotropic, with correlation:

y = ΦSΦᵀ x

rotation matrix  Φ ,          stretch matrix  S = [ σ1  0 ;  0  σ2 ]

31
Covariance matrix

y = ΦSΦᵀ x ,          x = ΦS⁻¹Φᵀ y

What is the Mahalanobis distance?

d² = xᵀ x = (ΦS⁻¹Φᵀ y)ᵀ (ΦS⁻¹Φᵀ y)
   = yᵀ ΦS⁻¹Φᵀ ΦS⁻¹Φᵀ y
   = yᵀ ΦS⁻²Φᵀ y
   = yᵀ ΦΛ⁻¹Φᵀ y
   = yᵀ Σ⁻¹ y          where   Σ⁻¹ = ΦΛ⁻¹Φᵀ   and   Σ = ΦΛΦᵀ = [ σ1²  σ12 ;  σ12  σ2² ]   ( covariance matrix )

P(y) = β e^( −½ yᵀ Σ⁻¹ y )

32
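A minimal MATLAB sketch of this construction (the rotation angle, variances, and test point below are hypothetical examples):

% Build Sigma = Phi*Lambda*Phi' and evaluate the Mahalanobis distance d^2 = y'*inv(Sigma)*y.
Phi = [1 -1; 1 1]/sqrt(2);          % rotation by 45 degrees
Lambda = diag([3, 1/3]);            % variances along the rotated axes
Sigma = Phi*Lambda*Phi';            % covariance matrix
y = [1; 2];                         % hypothetical (mean-subtracted) data point
d2 = y' * (Sigma \ y);              % equivalent to y'*inv(Sigma)*y
fprintf('d^2 = %.3f\n', d2);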
Multivariate Gaussian distribution
(Figure: the same three cases in the y1–y2 plane: circular, axis-aligned elliptical, and tilted elliptical contours.)

Isotropic:                              P(y) = β e^( −yᵀy / 2σ² )              variance  σ²
Non-isotropic, no correlation:          P(y) = β e^( −½ yᵀ Λ⁻¹ y )             variance matrix   Λ = [ σ1²  0 ;  0  σ2² ]
Non-isotropic, with correlation:        P(y) = β e^( −½ yᵀ Σ⁻¹ y )             covariance matrix  Σ = [ σ1²  σ12 ;  σ12  σ2² ]

Σ = ΦΛΦᵀ

33


Eigen-decomposition of the covariance matrix

P(y) = β e^( −½ yᵀ Σ⁻¹ y ) ,          Σ⁻¹ = ΦΛ⁻¹Φᵀ ,          Σ = ΦΛΦᵀ

• Thus, our covariance matrix is just a transformation matrix that turns an isotropic Gaussian distribution (of variance 1) into a non-isotropic multivariate Gaussian.

• The eigenvectors of the covariance matrix are just the basis vectors of the rotated transformation.

• And the eigenvalues of the covariance matrix are the variances of the Gaussian in the directions of these basis vectors.

34
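To see this numerically (a minimal sketch; the rotation and variances below are hypothetical), one can generate correlated samples with y = ΦSΦᵀ x and check that the empirical covariance approaches ΦΛΦᵀ:

% Generate correlated Gaussian samples and compare the empirical covariance to Phi*Lambda*Phi'.
m = 5000;
X = randn(2, m);                      % isotropic samples, variance 1
Phi = [1 -1; 1 1]/sqrt(2);            % rotation matrix
S = diag([sqrt(3), 1/sqrt(3)]);       % stretch matrix (standard deviations)
Y = Phi*S*Phi' * X;                   % correlated samples
Sigma_emp = Y*Y'/m;                   % empirical covariance
disp(Sigma_emp);                      % should approximate Phi*(S^2)*Phi' = Phi*Lambda*Phi'
disp(Phi*S^2*Phi');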
Learning Objectives for Lecture 17

• Eigenvectors and eigenvalues

• Variance and multivariate Gaussian distributions

• Computing a covariance matrix from data

• Principal Components Analysis (PCA)

35
How do we fit a Gaussian to multivariate data?

• In 1 dimension: to find the Gaussian that best fits our data, just measure the mean and variance!

• In higher dimensions: compute the covariance matrix!

We have m observations:    x(j) ,   j = 1, 2, 3, …, m

• First we subtract the mean:

z(j) = x(j) − µ ,          µ = (1/m) ∑ⱼ x(j)

(Figure: scatter of the data x(j) in the x1–x2 plane, with mean µ; z(j) are the mean-subtracted points.)

36
Computing the covariance matrix from data

• Compute the covariance matrix of a set of multivariate observations:

x(j) ,   j = 1, 2, 3, …, m

• First we subtract the mean:

z(j) = x(j) − µ ,          µ = (1/m) ∑ⱼ x(j)

Σ = [ σ1²  σ12 ;  σ21  σ2² ]     where

σ1² = (1/m) ∑ⱼ z1(j) z1(j)          σ12 = (1/m) ∑ⱼ z1(j) z2(j)
σ21 = (1/m) ∑ⱼ z2(j) z1(j)          σ2² = (1/m) ∑ⱼ z2(j) z2(j)

37
Outer product
• We are going to implement a useful trick called the vector 'outer product'.

Let  z = [ z1 ; z2 ].

Inner product:     zᵀ z = ( z1  z2 ) [ z1 ; z2 ] = |z|²          ( 1×2 times 2×1 gives 1×1 )

Outer product:     z zᵀ = [ z1 ; z2 ] ( z1  z2 ) = [ z1z1   z1z2 ;  z2z1   z2z2 ]          ( 2×1 times 1×2 gives 2×2 )

38
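In MATLAB these are just the two orderings of the same product (a minimal sketch with a hypothetical vector):

% Inner vs. outer product of a 2-D column vector.
z = [2; 3];
inner = z' * z;        % 1x1: squared length |z|^2 = 13
outer = z * z';        % 2x2: [z1*z1 z1*z2; z2*z1 z2*z2]
disp(inner);
disp(outer);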
Computing the covariance matrix
• The covariance matrix has a simpler form using the outer product. With  z(j) = [ z1(j) ; z2(j) ] :

(1/m) ∑ⱼ z(j) ( z(j) )ᵀ = (1/m) ∑ⱼ [ z1(j)z1(j)   z1(j)z2(j) ;  z2(j)z1(j)   z2(j)z2(j) ] = Σ = [ σ1²  σ12 ;  σ21  σ2² ]

39
Computing the covariance matrix
• Representing data as a matrix

• We have m observations of the vector z:     z(j) = [ z1(j) ; z2(j) ] ,    j = 1 … m

• Put them in matrix form as follows:

Z = ( z(1)  z(2)  z(3)  …  z(m) )

Z = [ z11  z12  z13  …  z1m ;  z21  z22  z23  …  z2m ]          ( n × m :  n = dimension of data vector = 2,  m = number of samples )

40
Computing the covariance matrix
• Now finding the covariance matrix is trivial!

Σ = (1/m) Z Zᵀ

(1/m) [ z11  z12  z13  …  z1m ;  z21  z22  z23  …  z2m ] [ z11  z21 ;  z12  z22 ;  z13  z23 ;  … ;  z1m  z2m ] = [ σ1²  σ12 ;  σ21  σ2² ]

( 2×m times m×2 gives 2×2 )

41
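A minimal MATLAB sketch showing that Z*Z'/m matches the explicit sum of outer products (the data are hypothetical and assumed mean-subtracted):

% Covariance from a 2 x m data matrix, two equivalent ways.
m = 500;
Z = randn(2, m);                           % hypothetical mean-subtracted data
Sigma = Z*Z'/m;                            % Sigma = (1/m) Z Z'
Sigma_loop = zeros(2);
for j = 1:m
    Sigma_loop = Sigma_loop + Z(:,j)*Z(:,j)';   % sum of outer products
end
Sigma_loop = Sigma_loop/m;
disp(max(abs(Sigma(:) - Sigma_loop(:))));  % ~0 (the two agree)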
Subtracting the mean
• The covariance calculation we just did assumes the data were mean-subtracted. How do we subtract the mean?

• We have m observations of the vector x:

X = [ x11  x12  x13  …  x1m ;  x21  x22  x23  …  x2m ]     ( n × m ) ,          µ = [ µ1 ; µ2 ]

• First compute the mean in matrix notation and make a matrix with m copies of this column vector:

mu=mean(X,2);
MU=repmat(mu,1,m);

M = [ µ1  µ1  µ1  …  µ1 ;  µ2  µ2  µ2  …  µ2 ]     ( n × m )

• Now subtract this from X to get Z:

Z=X-MU;

42
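As a side note (an assumption about newer MATLAB releases with implicit expansion, not something the slides rely on), the repmat step can be skipped:

Z = X - mean(X, 2);    % implicit expansion subtracts the column mean from every column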
Learning Objectives for Lecture 17

• Eigenvectors and eigenvalues

• Variance and multivariate Gaussian distributions

• Computing a covariance matrix from data

• Principal Components Analysis (PCA)

43
Principal Components Analysis
• A method for finding the directions in high-dimensional data that contain information.

Examples: eigenfaces, genetic profiling, single-cell transcriptional profiling, eigenworms, spike sorting.

Screen shot ©mikedusenberry.com. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Screen shot ©JoVE. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/.

Figure 12 by Hwang, Wen-Jyi, et al., "Efficient Architecture for Spike Sorting in Reconfigurable Hardware." Sensors 13, no. 11 (2013): 14860–14887. MDPI Open Access. License: CC BY.

44
PCA demo on Gaussian points

Φ = (1/√2) [ 1  −1 ;  1  1 ]          S = [ √3   0 ;  0   1/√3 ]

z = ΦSΦᵀ x

(Figure: left, the isotropic cloud X; right, the transformed, correlated cloud Z.)

m=500;
X=randn(2,m);
R=[1 -1;1 1]/sqrt(2);
S=[1.73 0; 0 0.577];
Z=R*S*R'*X;

45
PCA demo on Gaussian points

(Figure: the data cloud Z with the principal component directions PC1 and PC2 overlaid.)

Σ = (1/m) Z Zᵀ          Σ = F V Fᵀ          zf = Fᵀ z

F = [ 0.72  −0.70 ;  0.70  0.72 ]          V = [ 3.28   0 ;  0   0.31 ]

Q=Z*Z'/m;
[F,V]=eig(Q);
F=-fliplr(F);          % reorder (and flip the sign of) the eigenvectors so the largest eigenvalue comes first
V=flip(sum(V));        % eigenvalues as a row vector, largest first
Zf=F'*Z;

46
Clustering

(Figure: the data projected onto PC1 and PC2; in the PC space, clusters in the data become apparent.)

Q=Z*Z'/m;
[F,V]=eig(Q);
Zf=F'*Z;

47
PCA on time-domain signals
• Let’s look at a problem in the time domain.

• Here we have many examples of a noisy signal in time.

Each example has 100 time points:

x(j) = [ x1 ; x2 ; … ; xn ] ,     n = 100

There are 200 different vectors:

X = ( x(1)  x(2)  x(3)  …  x(m) )     ( n × m ,   m = 200 )

48
Covariance matrix
• Do PCA:

o Subtract the mean
o Compute the covariance matrix     Σ = (1/m) Z Zᵀ
o Find the eigenvectors and eigenvalues

mu=mean(X,2);
MU=repmat(mu,1,m);
Z=X-MU;
Q=Z*Z'/m;
[F,V]=eig(Q);

49
Eigenvalues
[F,V]=eig(Q);
var=flip(sum(V));          % eigenvalues sorted largest first

(Figure: plot of the sorted eigenvalues; the first two stand out as "signal", the rest form a "noise" floor.)

• The first two eigenvalues are much larger than all the rest.

• The first two eigenvalues explain over 60% of the total variance.

50
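One way to check the variance-explained claim (a minimal sketch; Q is the covariance matrix computed on the previous slide):

% Cumulative fraction of variance explained by the leading eigenvalues.
[F, V] = eig(Q);
vals = sort(diag(V), 'descend');     % eigenvalues, largest first
frac = cumsum(vals) / sum(vals);     % cumulative fraction of total variance
fprintf('first two PCs explain %.0f%% of the variance\n', 100*frac(2));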
Eigenvectors
• Since there were only two large eigenvalues, we look at the eigenvectors associated with these eigenvalues.

• These are just the first two columns of the F matrix.

(Figure: two panels of plotted eigenvectors, generated with:)

plot(F(:,1), 'r')
plot(F(:,2), 'g')

plot(F(:,3), 'r')
plot(F(:,4), 'g')

51
Principal components
• Principal components are just the projections of each of the original data vectors onto the two principal eigenvectors.

• Remember, this is just a change of basis using the matrix F:

zf = Fᵀ z          Zf=F'*Z;

(Figure: two scatter plots of the projected data, generated with:)

plot(Zf(1,:),Zf(2,:),'o')
plot(Zf(2,:),Zf(3,:),'o')

52
Filtering using PCA
• Only the first two entries in the column vectors Zf (in the rotated basis) have signal. So keep only the first two and set the rest to zero.

• Then rotate back to the original basis set:

Zf=F'*Z;
Zffilt=Zf;
Zffilt(3:end,:)=0;     % zero out all but the first two principal components
Zflt=F*Zffilt;         % rotate back to the original basis
Xflt=Zflt+MU;          % add the mean back in

(Figure: example traces before and after filtering.)

53
Learning Objectives for Lecture 17

• Eigenvectors and eigenvalues

• Variance and multivariate Gaussian distributions

• Computing a covariance matrix from data

• Principal Components Analysis (PCA)

54
MIT OpenCourseWare
https://ocw.mit.edu/

9.40 Introduction to Neural Computation


Spring 2018

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.
