
Principal Component Analysis (PCA)

J.-S. Roger Jang (張智星)


jang@mirlab.org
http://mirlab.org/jang
MIR Lab, CSIE Dept
National Taiwan University
Introduction to PCA

- PCA (Principal Component Analysis)
  - An effective method for reducing a dataset's dimensionality while keeping spatial characteristics as much as possible
- Characteristics:
  - For unlabeled data
  - A linear transform with solid mathematical foundation
- Applications
  - Line/plane fitting
  - Face recognition
  - Machine learning
  - ...
Comparison: PCA & K-Means Clustering

- Common goal: reduction of unlabeled data
  $X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_n \,]$  (columns are the data points)
- PCA: dimensionality reduction (Quiz!)
  - Objective function: Variance ↑
- K-means clustering: data count reduction
  - Objective function: Distortion ↓

Examples of PCA Projections

- PCA projections
  - 2D → 1D
  - 3D → 2D

Problem Definition (Quiz!)

- Input
  - A dataset X of n d-dim points which are zero justified (zero mean):
    Dataset: $X = \{x_1, x_2, \ldots, x_n\}$
    Zero justified: $\sum_{i=1}^{n} x_i = 0$
- Output
  - A unit vector u such that the square sum of the dataset's projections onto u is maximized.

Projection

- Angle θ between vectors x and u (Quiz!):
  $\cos\theta = \dfrac{x^T u}{\|x\|\,\|u\|}$
- Projection of x onto u:
  $\|x\|\cos\theta = x^T u$  if  $\|u\| = 1$
- Extension: What is the projection of x onto the subspace spanned by u1, u2, ..., um?

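To make the formula concrete, here is a minimal NumPy sketch of the projection computation (the vectors x and u below are made-up examples, not taken from the slides):

    import numpy as np

    x = np.array([3.0, 4.0])          # vector to be projected (example values)
    u = np.array([1.0, 1.0])
    u = u / np.linalg.norm(u)         # normalize so that ||u|| = 1

    scalar_proj = x @ u               # x^T u: signed length of the projection
    vector_proj = scalar_proj * u     # projection of x onto the direction of u
    print(scalar_proj, vector_proj)
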
Eigenvalue & Eigenvector

- Definition of an eigenvector x and eigenvalue λ of a square matrix A (Quiz!):
  $Ax = \lambda x$,  or  $(A - \lambda I)\,x = 0$
- x is non-zero ⇒ $A - \lambda I$ is singular ⇒ $|A - \lambda I| = 0$

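A quick numerical check of this definition, using NumPy's eigen-solver (the matrix A is an arbitrary example of mine):

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])              # arbitrary square matrix
    vals, vecs = np.linalg.eig(A)           # columns of vecs are eigenvectors

    for lam, v in zip(vals, vecs.T):
        print(np.allclose(A @ v, lam * v),                          # A x = lambda x
              np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0))  # |A - lambda I| = 0
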
Demo of Eigenvectors and Eigenvalues

- Try "eigshow" in MATLAB to plot the trajectories of a linear transform in 2D
- Cleve's comments

Mathematical Formulation

- Dataset representation: X is d by n, with n > d:
  $X = [\, x_1 \;\; x_2 \;\; \cdots \;\; x_n \,]$
- Projection of each column of X onto u:
  $p = [\, x_1^T u, \; x_2^T u, \; \ldots, \; x_n^T u \,]^T = X^T u$
- Square sum:
  $J(u) = \|p\|^2 = p^T p = (X^T u)^T (X^T u) = u^T X X^T u$
- Objective function with a constraint on u:
  $\max_u \; J(u) = u^T X X^T u, \quad \text{s.t. } u^T u = 1$
- Equivalent unconstrained form with a Lagrange multiplier λ:
  $\max_{u,\lambda} \; \tilde{J}(u, \lambda) = u^T X X^T u + \lambda\,(1 - u^T u)$
- Reference: Lagrange Multipliers | Geometric Meaning & Full Example

Optimization of the Obj. Function

- Set the gradient to zero:
  $\dfrac{\partial}{\partial u}\,\tilde{J}(u, \lambda) = \dfrac{\partial}{\partial u}\left( u^T X X^T u + \lambda\,(1 - u^T u) \right) = 0$
  $\Rightarrow 2\,X X^T u - 2\lambda u = 0 \;\Rightarrow\; X X^T u = \lambda u$
- So u is an eigenvector of XX^T (the covariance matrix times n) and λ is the corresponding eigenvalue
- When u is an eigenvector:
  $J(u) = \|p\|^2 = u^T X X^T u = u^T \lambda u = \lambda$
- If we arrange the eigenvalues such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$:
  - Max of J(u) is λ1, which occurs at u = u1
  - Min of J(u) is λd, which occurs at u = ud

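This result is easy to verify numerically: the leading eigenvector of XX^T should attain the maximum of J(u). A small sketch with made-up, zero-justified data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2, 100))           # d = 2, n = 100 (example data)
    X = X - X.mean(axis=1, keepdims=True)   # zero-justify the data

    vals, vecs = np.linalg.eigh(X @ X.T)    # eigh: XX^T is symmetric
    u1 = vecs[:, np.argmax(vals)]           # eigenvector of the largest eigenvalue

    J = lambda u: u @ X @ X.T @ u           # J(u) = u^T X X^T u
    print(J(u1), vals.max())                # the two numbers agree: max J(u) = lambda_1
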
Facts about Symmetric Matrices

- A symmetric matrix has orthogonal eigenvectors corresponding to different eigenvalues
- Proof (Quiz!): Let $A x_1 = \lambda_1 x_1$ and $A x_2 = \lambda_2 x_2$ with $\lambda_1 \ne \lambda_2$. Then
  $\lambda_1 x_1^T x_2 = (A x_1)^T x_2 = x_1^T A^T x_2 = x_1^T A x_2 = \lambda_2 x_1^T x_2$
  $\Rightarrow (\lambda_2 - \lambda_1)\, x_1^T x_2 = 0 \;\Rightarrow\; x_1^T x_2 = 0.$
- Quiz: Does a symmetric matrix always have orthogonal eigenvectors?
- Answer: No! (Can you give an example?)

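The claim about distinct eigenvalues can also be checked numerically; a small sketch with an arbitrary symmetric matrix (whose eigenvalues happen to be distinct):

    import numpy as np

    B = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 2.0],
                  [0.0, 2.0, 5.0]])     # symmetric example with distinct eigenvalues
    vals, vecs = np.linalg.eigh(B)      # columns of vecs are eigenvectors

    # Dot products between eigenvectors of different eigenvalues are ~0,
    # so vecs^T vecs is (numerically) the identity matrix
    print(np.round(vecs.T @ vecs, 10))
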
Conversion

- Conversion between orthonormal bases:
  $u_i \cdot u_j = u_i^T u_j = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{otherwise} \end{cases}$
  $U = [\, u_1 \;\; u_2 \;\; \cdots \;\; u_d \,] \;\Rightarrow\; U^T U = I \;\Rightarrow\; U^{-1} = U^T$
- Expressing x in the basis {u1, u2, ..., ud}:
  $x = y_1 u_1 + y_2 u_2 + \cdots + y_d u_d = [\, u_1 \;\; u_2 \;\; \cdots \;\; u_d \,]\,[\, y_1, y_2, \ldots, y_d \,]^T = U y$
  $\Rightarrow y = U^{-1} x = U^T x$
- y collects the projections of x onto u1, u2, ...

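A short sketch of this change of basis, using an orthonormal basis produced by NumPy's QR factorization (the vectors here are my own examples):

    import numpy as np

    rng = np.random.default_rng(1)
    U, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # columns u1, u2, u3 are orthonormal
    x = np.array([1.0, 2.0, 3.0])

    y = U.T @ x                                    # coordinates of x in the new basis
    print(np.allclose(U @ y, x))                   # x = y1*u1 + y2*u2 + y3*u3 = U y
    print(np.allclose(U.T @ U, np.eye(3)))         # U^T U = I, hence U^{-1} = U^T
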
Steps for PCA

1. Find the sample mean:
   $\mu = \dfrac{1}{n} \sum_{i=1}^{n} x_i$
2. Compute the covariance matrix (with X holding the zero-justified data):
   $C = \dfrac{1}{n} X X^T = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$
3. Find the eigenvalues of nC and arrange them in descending order, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$, with the corresponding eigenvectors $\{u_1, u_2, \ldots, u_d\}$.
4. The transformation is $y = U^T x$, with
   $U = [\, u_1 \;\; u_2 \;\; \cdots \;\; u_d \,]$

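The four steps translate almost line by line into NumPy; a minimal sketch (function and variable names are my own, not from the slides):

    import numpy as np

    def pca(X):
        """X: d-by-n data matrix whose columns are the data points."""
        n = X.shape[1]
        mu = X.mean(axis=1, keepdims=True)       # step 1: sample mean
        Xc = X - mu                              # zero-justify the data
        C = (Xc @ Xc.T) / n                      # step 2: covariance matrix
        vals, vecs = np.linalg.eigh(n * C)       # step 3: eigen-decomposition of nC
        order = np.argsort(vals)[::-1]           #         sort eigenvalues descending
        return vals[order], vecs[:, order], mu   # eigenvalues, U = [u1 ... ud], mean

    # Step 4: the transformation y = U^T x, applied to the zero-justified data
    X = np.array([[2.0, 4.0, 6.0, 8.0],
                  [1.0, 3.0, 2.0, 5.0]])         # made-up 2-by-4 example
    vals, U, mu = pca(X)
    Y = U.T @ (X - mu)
    print(vals)        # lambda_1 >= lambda_2
    print(Y[0])        # coordinates along the first principal component
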
LS vs. TLS

- Quiz: Prove that both LS and TLS lines go through the average of these n points.
- Problem definition of line fitting:
  Given a set of n points (x1, y1), (x2, y2), ..., (xn, yn) in 2D, find a line to minimize the fitting error defined next.
- LS (least squares) (Quiz!)
  Line: $y = ax + b$
  Fitting error: $J(a, b) = \sum_{i=1}^{n} (a x_i + b - y_i)^2$
- TLS (total least squares) (Quiz!)
  Line: $ax + by + c = 0$
  Fitting error: $J(a, b, c) = \sum_{i=1}^{n} \dfrac{(a x_i + b y_i + c)^2}{a^2 + b^2}$
  Hint: the shortest distance between a point (x0, y0) and a line ax + by + c = 0 is $\dfrac{|a x_0 + b y_0 + c|}{\sqrt{a^2 + b^2}}$.

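For concreteness, a small sketch that evaluates both fitting errors on made-up points; the LS line comes from NumPy's polyfit, and the TLS error is simply evaluated for that same line (not minimized):

    import numpy as np

    pts = np.array([[0.0, 0.1], [1.0, 1.2], [2.0, 1.9], [3.0, 3.2]])  # example points
    x, y = pts[:, 0], pts[:, 1]

    # LS: minimize sum_i (a*x_i + b - y_i)^2 over a and b
    a, b = np.polyfit(x, y, 1)
    J_ls = np.sum((a * x + b - y) ** 2)

    # TLS error of a line A*x + B*y + C = 0 (sum of squared point-to-line distances);
    # here the candidate line is the LS line rewritten as a*x - y + b = 0
    A, B, C = a, -1.0, b
    J_tls = np.sum((A * x + B * y + C) ** 2) / (A ** 2 + B ** 2)
    print(J_ls, J_tls)
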
PCA for TLS

- Problem for ordinary LS (least squares)
  - Not robust if the fitting line has a large slope
- PCA can be used for TLS (total least squares)
- Concept of PCA for TLS

Three Steps of PCA for TLS (Quiz!)

- 2D (line fitting)
  1. Set the data average to zero.
  2. Find u1 & u2 via PCA. Use u2 as the normal vector of the fitting line.
  3. Use the normal vector and the data average to find the fitting line.
- 3D (plane fitting)
  1. Set the data average to zero.
  2. Find u1, u2, & u3 via PCA. Use u3 as the normal vector of the fitting plane.
  3. Use the normal vector and the data average to find the fitting plane.
- Quiz: Prove that the fitting plane passes through the data average point.

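A sketch of the 2D procedure above (TLS line fitting via PCA), with made-up data points:

    import numpy as np

    pts = np.array([[0.0, 0.2], [1.0, 0.9], [2.0, 2.1], [3.0, 2.8]])  # n-by-2 points
    mu = pts.mean(axis=0)                   # data average
    Xc = (pts - mu).T                       # step 1: set the data average to zero

    vals, vecs = np.linalg.eigh(Xc @ Xc.T)  # step 2: PCA via eigen-decomposition
    u2 = vecs[:, np.argmin(vals)]           # u2 (smallest eigenvalue) = normal vector

    # step 3: the fitting line is u2 . (p - mu) = 0, i.e. a*x + b*y + c = 0
    a, b = u2
    c = -u2 @ mu
    print(a, b, c)
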
Tidbits

- Comparison of methods for dimensionality reduction
  - PCA: for unlabeled data → unsupervised learning
  - LDA (linear discriminant analysis): for classifying labeled data → supervised learning
- If d >> n, then we need a workaround for computing the eigenvectors

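The slide does not spell out the workaround, so the following is only a sketch of one standard trick: if X^T X v = λ v, then X X^T (X v) = λ (X v), so the eigenvectors of the small n-by-n matrix X^T X can be mapped back to eigenvectors of the large d-by-d matrix X X^T.

    import numpy as np

    rng = np.random.default_rng(2)
    d, n = 10000, 50
    X = rng.normal(size=(d, n))
    X = X - X.mean(axis=1, keepdims=True)       # zero-justified data with d >> n

    vals, V = np.linalg.eigh(X.T @ X)           # n-by-n problem instead of d-by-d
    order = np.argsort(vals)[::-1][:n - 1]      # keep the nonzero eigenvalues
    U = X @ V[:, order]                         # X v is an eigenvector of X X^T
    U = U / np.linalg.norm(U, axis=0)           # re-normalize to unit length

    u1, lam1 = U[:, 0], vals[order][0]
    print(np.allclose(X @ (X.T @ u1), lam1 * u1))   # X X^T u1 = lambda_1 u1
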
Example of PCA
IRIS dataset projection

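A sketch that reproduces this kind of projection plot with scikit-learn and matplotlib (assuming both are installed); this is my own code, not the code behind the original figure:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    iris = load_iris()
    Y = PCA(n_components=2).fit_transform(iris.data)    # project 4-dim IRIS data to 2D

    for label in range(3):                               # one marker per iris species
        mask = iris.target == label
        plt.scatter(Y[mask, 0], Y[mask, 1], label=iris.target_names[label])
    plt.xlabel("1st principal component")
    plt.ylabel("2nd principal component")
    plt.legend()
    plt.show()
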
Weakness of PCA for Classification

- Not designed for classification problems (with labeled training data)
- Figure: ideal situation vs. adversary situation

Linear Discriminant Analysis

- LDA projects onto directions that can best separate data of different classes.
- Figure: adversary situation for PCA vs. ideal situation for LDA

Exercise 1

- Given two vectors x = [3, 4] and y = [5, 5], find the projection of y onto x.

Exercise 2

- Given a point [1, 2, 3] and a plane x + 2y + 2z = 14, find the distance between the point and the plane.

Exercise 3

- Find the eigenvalues and eigenvectors of A = [0 3; 1 2].

Exercise 4

- Given a data matrix X = [2 0 3 -1; 0 -2 -3 1], compute the variance after projecting the dataset onto its first principal component.

