
Math 104

Agenda: Principal Component Analysis (PCA)

Definition of principal components
Procedure for PCA
Computing principal components
Examples
Introduction

PCA: a vector-space transform used to reduce the dimensionality of multivariate datasets for data analysis

Used as a tool in exploratory data analysis and for making predictive models
History

Introduced by K. Pearson in 1901
Pearson believed it was the correct solution to problems of interest to biometricians
He did not propose a method for computing principal components
Practical computing methods were first discussed by Hotelling (1933)

[Photographs: Karl Pearson and Harold Hotelling]


Objectives

Given $p$ variables $X_1, \ldots, X_p$

Find new variables $Z_1, \ldots, Z_p$ that

1 are uncorrelated and arranged in order of importance
2 describe the variation in the data

Lack of correlation means that the components measure different 'dimensions' of the data

The new variables $Z_1, \ldots, Z_p$ are the principal components (PCs)


Dimensionality reduction

Typically, dimensionality reduction is possible when the variables are highly correlated

[Scatterplots: a dataset that is approximately 1D and a dataset that is approximately 2D]

Example: $X_1, X_2, \ldots, X_{10}$ = students' grades in various classes at Stanford
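A minimal sketch of this idea in NumPy (the grades are synthetic stand-ins, driven by a single hypothetical "aptitude" factor, so the ten columns are highly correlated):

import numpy as np

rng = np.random.default_rng(0)
n = 200
aptitude = rng.normal(size=n)                       # one underlying factor
X = np.outer(aptitude, np.ones(10)) + 0.3 * rng.normal(size=(n, 10))

Xc = X - X.mean(axis=0)                             # de-mean (center)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
print(s**2 / np.sum(s**2))                          # first entry is ~0.9: the data are close to 1D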


Procedure

De-mean (center) the data: $X \leftarrow X - \bar{X}$

X = X - X.mean(axis=0)

or de-mean and scale the data (standardize): $X \leftarrow (X - \bar{X}) / \mathrm{std}(X)$

X = (X - X.mean(axis=0)) / X.std(axis=0)

Take the SVD of $X$ (centered or standardized):

$$X = U \Sigma V^\top$$
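In code, a minimal sketch of this step (assuming NumPy and the centered or standardized array X from above):

import numpy as np

U, s, Vt = np.linalg.svd(X, full_matrices=False)    # thin SVD: X = U @ np.diag(s) @ Vt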

New scores

$$Z = XV = U\Sigma$$

$$[Z_{\bullet 1}, Z_{\bullet 2}, \ldots, Z_{\bullet p}] = [X_{\bullet 1}, X_{\bullet 2}, \ldots, X_{\bullet p}]\,V = [\sigma_1 U_{\bullet 1}, \sigma_2 U_{\bullet 2}, \ldots, \sigma_p U_{\bullet p}]$$

$$\bar{X} = 0 \implies \bar{Z} = 0$$

Variances

$$\mathrm{Var}\{Z_{\bullet j}\} \propto \sum_{i=1}^{n} (Z_{ij} - \bar{Z}_{\bullet j})^2 = \|Z_{\bullet j}\|^2 = \sigma_j^2 \|U_{\bullet j}\|^2 = \sigma_j^2,$$

since the columns of $U$ are unit vectors ($\|U_{\bullet j}\| = 1$)

Total variance

$$\sum_{j=1}^{p} \mathrm{Var}\{Z_{\bullet j}\} = \sum_{j=1}^{p} \mathrm{Var}\{X_{\bullet j}\}$$

Correlations: for $j \neq \ell$,

$$\mathrm{Cov}\{Z_{\bullet j}, Z_{\bullet \ell}\} \propto \sum_{i=1}^{n} (Z_{ij} - \bar{Z}_{\bullet j})(Z_{i\ell} - \bar{Z}_{\bullet \ell}) = Z_{\bullet j}^\top Z_{\bullet \ell} = \sigma_j \sigma_\ell \, U_{\bullet j}^\top U_{\bullet \ell} = 0$$

New variables are uncorrelated!
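A minimal numerical check of these identities (NumPy assumed; any centered data matrix works):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))      # columns are correlated
X = X - X.mean(axis=0)                                       # center

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                                                 # scores: Z = XV

print(np.allclose(Z, U * s))                                 # Z = U Sigma
print(np.allclose(Z.mean(axis=0), 0))                        # scores are centered
print(np.allclose((Z**2).sum(axis=0), s**2))                 # ||Z_j||^2 = sigma_j^2
print(np.allclose(Z.var(axis=0).sum(), X.var(axis=0).sum())) # total variance preserved
print(np.allclose(Z.T @ Z, np.diag(s**2)))                   # off-diagonals 0: uncorrelated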


Illustration

[Figure: ESL, Figure 14.21 — the best rank-two linear approximation to the half-sphere data. Left panel: the two-dimensional principal component surface fit to the half-sphere data, minimizing the sum of squared distances of the points from their projections onto the first two PCs. Right panel: the projected points, with coordinates given by $U_2 D_2$, the first two principal components of the data; axes are the first and second principal components.]
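A minimal sketch of the computation behind this figure (NumPy assumed; the half-sphere data are imitated with synthetic points, not taken from the book):

import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 150)              # synthetic points near a half-sphere
phi = rng.uniform(0, np.pi / 2, 150)
X = np.column_stack([np.cos(theta) * np.sin(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(phi)]) + 0.05 * rng.normal(size=(150, 3))

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z2 = U[:, :2] * s[:2]                               # projected coordinates (U2 D2 in the figure)
X_rank2 = Z2 @ Vt[:2, :]                            # best rank-two approximation of Xc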
Example

[Figure: ESL, §3.4 (Shrinkage Methods) — scatterplot of correlated two-dimensional data, $X_1$ vs. $X_2$, with arrows marking the directions of the largest and smallest principal components.]
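A minimal sketch reproducing the computation behind this picture (NumPy assumed; the data are synthetic):

import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
largest, smallest = Vt[0], Vt[-1]                   # principal directions
print(s**2 / (len(X) - 1))                          # variance along each direction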
