9.40 Introduction to Neural Computation
Lecture 17: Principal Components Analysis
Learning Objectives for Lecture 17
Matrix transformations
y = Ax
• In general, A maps the set of vectors in R² onto another set of vectors in R².
• The inverse mapping A⁻¹ exists provided det(A) ≠ 0.
Eigenvectors and eigenvalues
• Matrix transformations have special directions
    A = [1+δ 0; 0 1]        A = [1−δ 0; 0 1+δ]
These are all diagonal matrices:
    Λ = [λ1 0; 0 λ2]
Eigenvectors and eigenvalues
• Some vectors are rotated, some are not.
• For a diagonal matrix, vectors along the axes are scaled, but not
rotated.
    Λ ê1 = [λ1 0; 0 λ2][1; 0] = λ1 [1; 0] = λ1 ê1
    Λ ê2 = [λ1 0; 0 λ2][0; 1] = λ2 [0; 1] = λ2 ê2
Eigenvectors and eigenvalues
• Diagonal matrices have the property that they map any vector parallel
to a standard basis vector into another vector along that standard basis
vector.
The eigenvalue equation:
    Λ = [λ1 0 0; 0 λ2 0; 0 0 λ3]        Λ êi = λi êi ,   i = 1, 2, ..., n
• Any vector v that is mapped by matrix A onto a parallel vector λv is called an eigenvector of A. The scale factor λ is called the eigenvalue of vector v.
Example:
    A = ΦΛΦᵀ = [1.5 0.5; 0.5 1.5]        Λ = [2 0; 0 1]        Φ(45°) = (1/√2)[1 −1; 1 1]
Eigenvectors and eigenvalues
• What are the eigenvectors and eigenvalues of our rotated transformation matrix A = ΦΛΦᵀ?
Remember:
    Λ êi = λi êi ,   Λ = [λ1 0; 0 λ2]
Write the eigenvalue equation for A and unwrap it:
    A xi = ai xi
    ΦΛΦᵀ xi = ai xi
    ΦᵀΦΛΦᵀ xi = Φᵀ (ai xi)        so we know the solution if Φᵀ xi = êi :
    Λ (Φᵀ xi) = ai (Φᵀ xi)   ⇒   Λ êi = ai êi   ⇒   ai = λi
The eigenvectors are therefore xi = Φ êi , the columns f̂i of Φ(45°) = (1/√2)[1 −1; 1 1]:
    x1 = Φ ê1 = (1/√2)[1 −1; 1 1][1; 0] = (1/√2)[1; 1] = f̂1
    x2 = Φ ê2 = (1/√2)[1 −1; 1 1][0; 1] = (1/√2)[−1; 1] = f̂2
    A = ΦΛΦᵀ ,   where Φ = ( f̂1  f̂2 ) is a rotation matrix and Λ is a diagonal matrix
• The eigenvalue associated with each eigenvector f̂i is the diagonal element λi of Λ = [λ1 0; 0 λ2].
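As a quick numerical check (a sketch, not from the slides), MATLAB confirms that the columns of Φ are eigenvectors of A with the eigenvalues from Λ:
Phi = [1 -1; 1 1]/sqrt(2);   % 45-degree rotation matrix
Lam = [2 0; 0 1];            % diagonal matrix of eigenvalues
A = Phi*Lam*Phi';            % = [1.5 0.5; 0.5 1.5], the example above
f1 = Phi(:,1);               % f_hat_1
A*f1 ./ f1                   % both components equal lambda_1 = 2
f2 = Phi(:,2);               % f_hat_2
A*f2 ./ f2                   % both components equal lambda_2 = 1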
Eigenvectors and eigenvalues
• Note that eigenvectors are not unique: if xi is an eigenvector of A, then so is a xi for any nonzero scalar a.
    A xi = λi xi   ⇒   A (a xi) = λi (a xi)
Eigenvectors and eigenvalues
• What are the eigenvalues of a general 2-dim matrix A?
    A x = λ x        (we only want solutions where x ≠ 0)
    A x = λ I x
    (A − λI) x = 0   ⇒   det(A − λI) = 0
    A = [a b; c d]        A − λI = [a−λ b; c d−λ]
    det(A − λI) = (a − λ)(d − λ) − bc = 0
    ad − λ(a + d) + λ² − bc = 0
    λ² − (a + d)λ + (ad − bc) = 0
Eigenvectors and eigenvalues
• What are the eigenvalues of a general 2-dim matrix?
    λ² − (a + d)λ + (ad − bc) = 0
    λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4bc )
Eigenvectors and eigenvalues
    λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4bc )
For a symmetric matrix, c = b, so the term under the square root becomes (a − d)² + 4b² ≥ 0 and the eigenvalues are always real. For example:
    A = [a b; b d] = [3/2 1/2; 1/2 1/2]
    λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4b² )
    λ± = (1/2)(3/2 + 1/2) ± (1/2)√( (3/2 − 1/2)² + 4(1/2)² ) = 1 ± 1/√2 ≈ 1.71, 0.29
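A one-line check of this arithmetic in MATLAB (a sketch, not part of the slides):
a = 3/2; b = 1/2; d = 1/2;
(a+d)/2 + [1 -1]*sqrt((a-d)^2 + 4*b^2)/2   % 1.7071  0.2929
eig([a b; b d])                            % 0.2929  1.7071 (ascending order)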
Eigenvectors and eigenvalues
• Let's consider a special case: a symmetric matrix with equal diagonal elements.
    A = [a b; b a]        λ± = (1/2)(a + d) ± (1/2)√( (a − d)² + 4b² )
With d = a the square root reduces to b, so
    λ+ = a + b        λ− = a − b
To find the eigenvector x+ , solve A x+ = λ+ x+ :
    [a b; b a] x+ = (a + b) x+
    [a b; b a] x+ = [a+b 0; 0 a+b] x+
    [−b b; b −b] x+ = 0
    [−1 1; 1 −1][x1; x2] = 0   ⇒   x1 = x2 = s
    x+ = (1/√2)[1; 1]        x− = (1/√2)[−1; 1]
Eigen-decomposition
• The process of writing a matrix A as A = ΦΛΦᵀ is called eigen-decomposition. It works for any symmetric matrix.
• Let's rewrite:
    A = ΦΛΦᵀ ,   Λ = [λ1 0; 0 λ2]
    AΦ = ΦΛΦᵀΦ = ΦΛ        (since ΦᵀΦ = I for a rotation matrix)
• The eigenvalue equation AΦ = ΦΛ is equivalent to the set of equations A f̂i = λi f̂i .
Eigenvectors and eigenvalues
• MATLAB® has a function 'eig' to calculate eigenvectors and eigenvalues:
    A = F V Fᵀ
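A minimal usage sketch, on the example matrix from the earlier slides:
A = [1.5 0.5; 0.5 1.5];
[F, V] = eig(A);   % columns of F are eigenvectors; V is the diagonal eigenvalue matrix
F*V*F'             % reconstructs A, i.e. A = F V F'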
Learning Objectives for Lecture 17
Variance
• Let's say we have m observations of a variable x:
    x(j) ,   j = 1, 2, 3, ..., m        (the jth observation of x)
[Figure: histogram N(x) of the observations, centered on the mean µ with spread σ²]
• The mean of these observations is
    µ = x̄ = (1/m) Σ x(i) ,   summed over i = 1, ..., m
• The variance is
    σ² = (1/m) Σ ( x(i) − µ )²
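In MATLAB each estimate is one line (a sketch; note the slide's 1/m convention differs from MATLAB's default 1/(m−1)):
x = randn(1, 1000);           % m = 1000 hypothetical observations
mu = mean(x);
sigma2 = mean((x - mu).^2);   % 1/m convention; identical to var(x, 1)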
Variance
• Now let's say we have m simultaneous observations of variables x1 and x2:
    ( x1(j) ; x2(j) ) ,   j = 1, 2, 3, ..., m
[Figure: two scatter plots of x2 against x1 with the same marginal variances σ1², σ2² but different shapes]
• x1 has the same variance in both of these cases, and so does x2. What differs is the covariance σ12 and the correlation ρ:
    σ12 = (1/m) Σj ( x1(j) − µ1 )( x2(j) − µ2 )        ρ = σ12 / √( σ1² σ2² )
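A sketch of the same computation on hypothetical data:
m = 1000;
x1 = randn(1, m);
x2 = 0.5*x1 + randn(1, m);                % correlated with x1 by construction
mu1 = mean(x1);  mu2 = mean(x2);
sigma12 = mean((x1 - mu1).*(x2 - mu2));   % covariance, 1/m convention
rho = sigma12/(std(x1,1)*std(x2,1))       % correlation, about 0.45 for this example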
Gaussian distribution
• Many kinds of data can be fit by a Gaussian distribution
    P(x) = ( 1 / (σ√(2π)) ) exp( −(x − µ)² / (2σ²) )
[Figure: Gaussian curve over x, with mean µ and standard deviation σ]
Multivariate Gaussian distribution
• We can create a Gaussian distribution in two dimensions as the product of two 1-dimensional Gaussians:
    P(x1, x2) = P(x1) P(x2)
              = β exp(−x1²/2) exp(−x2²/2)        (β is a normalization constant that makes the total probability 1)
              = β exp( −(x1² + x2²)/2 )
• Writing x = (x1; x2):
    P(x) = β exp( −d²/2 ) ,   where d² = |x|² = xᵀx
d is called the Mahalanobis distance. This is the isotropic multivariate Gaussian distribution.
[Figure: circular contours of P(x1, x2), with Gaussian marginals P(x1) and P(x2)]
Multivariate Gaussian distribution
[Figure: three Gaussian clouds in the (y1, y2) plane — isotropic, axis-aligned, and rotated/correlated]
    P(y) = β exp( −yᵀy / (2σ²) )        P(y) = β exp( −yᵀΛ⁻¹y / 2 )        P(y) = β exp( −yᵀΣ⁻¹y / 2 )
    y = σ x                             Λ = [σ1² 0; 0 σ2²]                  Σ = ΦΛΦᵀ = [σ1² σ12; σ12 σ2²]
Multivariate Gaussian distribution
• What is P(y) when y = σx (isotropic, variance σ²)?
    P(x) = β exp( −xᵀx / 2 ) ,   x = σ⁻¹ y
What is the Mahalanobis distance?
    d² = xᵀx = (σ⁻¹y)ᵀ(σ⁻¹y) = yᵀ σ⁻¹σ⁻¹ y = yᵀ σ⁻² y
so
    P(y) = β exp( −(yᵀy/σ²) / 2 ) = β exp( −yᵀy / (2σ²) )
[Figure: circular Gaussian cloud of radius σ with marginals P(y1) and P(y2)]
Multivariate Gaussian distribution
Isotropic → non-isotropic: stretch each axis by its own standard deviation.
[Figure: isotropic cloud x with marginals P(x1), P(x2); stretched cloud y with widths σ1 and σ2]
    y = S x ,   where S = [σ1 0; 0 σ2] is the stretch matrix
Multivariate Gaussian distribution
    y = S x ,   x = S⁻¹ y
What is the Mahalanobis distance?
    d² = xᵀx = (S⁻¹y)ᵀ(S⁻¹y) = yᵀ S⁻¹S⁻¹ y = yᵀ S⁻² y = yᵀ Λ⁻¹ y
where
    Λ⁻¹ = S⁻² = [σ1⁻² 0; 0 σ2⁻²]        Λ = [σ1² 0; 0 σ2²]   (the matrix of variances)
so
    P(y) = β exp( −yᵀΛ⁻¹y / 2 ) = β exp( −(1/2)( y1²/σ1² + y2²/σ2² ) )
[Figure: axis-aligned elliptical cloud with widths σ1, σ2 and marginals P(y1), P(y2)]
Multivariate Gaussian distribution
Isotropic → non-isotropic with correlation: rotate into the stretch axes, stretch, and rotate back.
[Figure: isotropic cloud x with marginals P(x1), P(x2); tilted elliptical cloud y with marginals P(y1), P(y2)]
    y = Φ S Φᵀ x ,   where Φ is a rotation matrix and S = [σ1 0; 0 σ2] is the stretch matrix
Covariance matrix
    y = Φ S Φᵀ x ,   x = Φ S⁻¹ Φᵀ y
What is the Mahalanobis distance?
    d² = xᵀx = (ΦS⁻¹Φᵀ y)ᵀ(ΦS⁻¹Φᵀ y)
       = yᵀ ΦS⁻¹Φᵀ ΦS⁻¹Φᵀ y
       = yᵀ ΦS⁻²Φᵀ y
       = yᵀ ΦΛ⁻¹Φᵀ y
    d² = yᵀ Σ⁻¹ y ,   where Σ⁻¹ = ΦΛ⁻¹Φᵀ
    P(y) = β exp( −yᵀΣ⁻¹y / 2 ) ,   Σ = ΦΛΦᵀ = [σ1² σ12; σ12 σ2²]   (the covariance matrix)
[Figure: tilted elliptical cloud with principal widths σ1, σ2 and marginals P(y1), P(y2)]
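A sketch of the distance computation in MATLAB, using the backslash operator instead of an explicit matrix inverse:
Sigma = [1.5 0.5; 0.5 1.5];   % example covariance matrix
y = [1; 2];
d2 = y' * (Sigma \ y)         % Sigma\y solves Sigma*u = y, so this is y' * inv(Sigma) * y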
Multivariate Gaussian distribution
[Figure: three Gaussian clouds in the (y1, y2) plane — isotropic, axis-aligned, and rotated/correlated]
    P(y) = β exp( −yᵀy / (2σ²) )        P(y) = β exp( −yᵀΛ⁻¹y / 2 )        P(y) = β exp( −yᵀΣ⁻¹y / 2 )
                                        Λ = [σ1² 0; 0 σ2²]                  Σ = ΦΛΦᵀ = [σ1² σ12; σ12 σ2²]
• Thus, our covariance matrix is just a transformation matrix that turns an isotropic Gaussian distribution (of variance 1) into a non-isotropic multivariate Gaussian.
• The eigenvectors of the covariance matrix are just the basis vectors of the rotated transformation.
• And the eigenvalues of the covariance matrix are the variances of the Gaussian in the directions of these basis vectors.
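All three bullets can be seen at work in a short sketch (the stretch values here are assumptions, not from the slides): transform isotropic samples, then recover the stretch and rotation from the sample covariance.
m = 100000;
Phi = [1 -1; 1 1]/sqrt(2);    % rotation by 45 degrees
S = [2 0; 0 0.5];             % assumed stretch matrix
X = randn(2, m);              % isotropic Gaussian, variance 1
Y = Phi*S*Phi' * X;           % non-isotropic, correlated Gaussian
SigmaEst = Y*Y'/m;            % sample covariance (X is zero-mean by construction)
[F, V] = eig(SigmaEst)        % V is approximately diag(0.25, 4) = S.^2 (ascending),
                              % and the columns of F match those of Phi up to sign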
Learning Objectives for Lecture 17
How do we fit a Gaussian to multivariate data?
• In 1 dimension: to find the Gaussian that best fits our data, just measure the mean and variance!
[Figure: 1-D data histogram with fitted Gaussian, mean µ and variance σ²]
• In 2 dimensions, with mean-subtracted data z(j) = x(j) − µ, measure all four entries of the covariance matrix:
    Σ = [σ1² σ12; σ21 σ2²] ,   where
    σ1² = (1/m) Σj z1(j) z1(j)        σ12 = (1/m) Σj z1(j) z2(j)
    σ21 = (1/m) Σj z2(j) z1(j)        σ2² = (1/m) Σj z2(j) z2(j)
Outer product
• We are going to implement a useful trick called the vector 'outer product'. For z = (z1; z2):
Inner product (1×2 times 2×1 gives 1×1):
    zᵀz = ( z1  z2 )[ z1; z2 ] = z1² + z2² = |z|²
Outer product (2×1 times 1×2 gives 2×2):
    zzᵀ = [ z1; z2 ]( z1  z2 ) = [ z1z1  z1z2 ; z2z1  z2z2 ]
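In MATLAB the two products differ only in where the transpose sits (a minimal sketch):
z = [2; 3];
z' * z    % inner product, 1x1: 13 = |z|^2
z * z'    % outer product, 2x2: [4 6; 6 9]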
Computing the covariance matrix
• The covariance matrix has a simpler form using the outer product. With z(j) = ( z1(j); z2(j) ):
    (1/m) Σj z(j) (z(j))ᵀ = (1/m) Σj [ z1(j)z1(j)  z1(j)z2(j) ; z2(j)z1(j)  z2(j)z2(j) ] = [ σ1² σ12 ; σ21 σ2² ] = Σ
Computing the covariance matrix
• Represent the data as a matrix Z whose jth column is the observation z(j); Z is then n × m (n variables, m observations).
[Figure: the n × m data matrix]
Computing the covariance matrix
• Now finding the covariance matrix is trivial!
    Σ = (1/m) Z Zᵀ
      = (1/m) [ z11 z12 z13 … z1m ; z21 z22 z23 … z2m ] [ z11 z21 ; z12 z22 ; z13 z23 ; … ; z1m z2m ] = [ σ1² σ12 ; σ21 σ2² ]
    (2×m times m×2 gives 2×2)
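A sketch checking this against MATLAB's built-in cov, which expects observations in rows and whose second argument selects the 1/m normalization:
m = 5000;
Z = randn(2, m);
Z = Z - repmat(mean(Z, 2), 1, m);    % mean-subtract each row
Sigma = Z*Z'/m;
max(max(abs(Sigma - cov(Z', 1))))    % ~0: cov(Z',1) uses the same 1/m convention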
Subtracting the mean
• The covariance calculation we just did assumes the data were mean-subtracted. How do we subtract the mean?
• We have m observations of the vector x:
    X = [ x11 x12 x13 … x1m ; x21 x22 x23 … x2m ]   (n × m) ,   with mean µ = ( µ1; µ2 )
• First compute the mean in matrix notation and make a matrix with m copies of this column vector:
Mu=mean(X,2);
MU=repmat(Mu,1,m);
    M = [ µ1 µ1 µ1 … µ1 ; µ2 µ2 µ2 … µ2 ]   (n × m)
• Now subtract this from X to get Z:
Z=X-MU;
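(An aside, not on the slide: since R2016b, implicit expansion lets MATLAB subtract a column vector from each column directly, so the repmat step can be skipped.)
Z = X - mean(X, 2);   % equivalent to the repmat version above on R2016b or later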
Learning Objectives for Lecture 17
Principal Components Analysis
• A method for finding the directions in high-dimensional data that contain information (a classic example: 'eigenfaces' for face images).
• Demo: generate isotropic data X, then stretch and rotate it, z = ΦSΦᵀx, with
    Φ = (1/√2)[1 −1; 1 1]        S = [√3 0; 0 1/√3]
m=500;
X=randn(2,m);
• Compute the covariance of the transformed data Z and eigen-decompose it, Σ = FVFᵀ:
    Σ = (1/m) Z Zᵀ        F = [0.72 −0.70; 0.70 0.72]        V = [3.28 0; 0 0.31]
Q=Z*Z'/m;
[F,V]=eig(Q);
(the slide displays F and V with the largest eigenvalue first; eig itself returns eigenvalues in ascending order)
• The columns of F are the principal components PC1 and PC2. Project the data onto them:
    zf = Fᵀ z
Zf=F'*Z;
[Figure: scatter of X, of Z with the PC1 and PC2 axes drawn, and of Zf in the principal-component basis]
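The pieces above assemble into a runnable script (a sketch; the exact numbers in F and V vary from run to run):
m = 500;
X = randn(2, m);                   % isotropic data
Phi = [1 -1; 1 1]/sqrt(2);
S = [sqrt(3) 0; 0 1/sqrt(3)];
Z = Phi*S*Phi' * X;                % stretched, rotated data (zero mean by construction)
Q = Z*Z'/m;                        % sample covariance
[F, V] = eig(Q);                   % eigenvalues near 1/3 and 3, ascending
F = fliplr(F);  V = rot90(V, 2);   % reorder largest-first, as displayed on the slide
Zf = F'*Z;                         % data in the principal-component basis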
PCA on time-domain signals
• Let's look at a problem in the time domain. Each observation is a time series sampled at n = 100 points:
    x(j) = ( x1; x2; …; xn )
• There are m = 200 different vectors, collected as the columns of an n × m data matrix:
    X = ( x(1) x(2) x(3) … x(m) )        (100 × 200)
[Figure: the data matrix X displayed as an image, 100 rows × 200 columns]
Covariance matrix
• Do PCA:
Mu=mean(X,2);
MU=repmat(Mu,1,m);
Z=X-MU;
Q=Z*Z'/m;
[F,V]=eig(Q);
Eigenvalues
[F,V]=eig(Q);
var=flip(sum(V));    % eigenvalues, sorted largest-first (eig returns ascending order)
[Figure: plot of the sorted eigenvalues — the first two ('signal') stand far above the rest ('noise')]
• The first two eigenvalues are much larger than all the rest.
Eigenvectors
• Since there were only two large eigenvalues, we look at the eigenvectors associated with these eigenvalues.
• These are just the first two columns of the F matrix (after F has been reordered so the largest-eigenvalue columns come first; the raw eig output puts them last):
plot(F(:,1),'r')
plot(F(:,2),'g')
[Figure: the two eigenvector waveforms, plotted in red and green]
Principal components
• Principal components are just the projections of each of the original
data vectors onto the two principal eigenvectors.
Filtering using PCA
• Only the first two entries in the column vectors of Zf (the data in the rotated basis) have signal. So keep only the first two and set the rest to zero:
Zf=F'*Z;
Zffilt=Zf;
Zffilt(3:end,:)=0;
• Then rotate back to the original basis set and restore the mean:
Zflt=F*Zffilt;
Xflt=Zflt+MU;
[Figure: example traces before and after filtering — the filtered traces keep the signal with much less noise]
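Here is the whole time-domain pipeline as one runnable sketch, on synthetic data with a two-dimensional signal subspace (the waveforms and noise level are assumptions, not the lecture's dataset):
n = 100;  m = 200;  t = (1:n)';
signals = [sin(2*pi*t/n), cos(4*pi*t/n)];     % two assumed signal waveforms (n x 2)
X = signals*randn(2, m) + 0.3*randn(n, m);    % data = 2-D signal subspace + noise
Mu = mean(X, 2);  MU = repmat(Mu, 1, m);
Z = X - MU;
Q = Z*Z'/m;
[F, V] = eig(Q);
F = fliplr(F);                % largest-eigenvalue eigenvectors first
Zf = F'*Z;
Zffilt = Zf;
Zffilt(3:end, :) = 0;         % keep only the two signal components
Xflt = F*Zffilt + MU;         % rotate back and restore the mean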
Learning Objectives for Lecture 17
MIT OpenCourseWare
https://ocw.mit.edu/
For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.