Agenda: Principal Component Analysis (PCA)
Used as a tool in exploratory data analysis and for making predictive models
History
Introduced by K. Pearson in 1901
Pearson believed this was the correct solution to problems of interest to biometricians
He did not propose a method for computing principal components
$p$ variables $X_1, \dots, X_p$
[Figure: two example datasets, one approximately one-dimensional and one approximately two-dimensional]
Singular value decomposition (SVD) of the $n \times p$ data matrix $X$:
$$X = U\Sigma V^\top$$
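A minimal NumPy sketch of this decomposition, assuming a synthetic $n \times p$ data matrix whose columns are centered first (the random data and the names `X`, `U`, `s`, `Vt` are illustrative, not from the slides). With `full_matrices=False`, `np.linalg.svd` returns the thin factors; `s` holds the singular values in descending order and `Vt` is $V^\top$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)  # center each column (assumed preprocessing)

# Thin SVD: U is n x p, s holds sigma_1 >= ... >= sigma_p, Vt is V transposed (p x p)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Sigma = np.diag(s)

# The factors reproduce X, and U, V have orthonormal columns
reconstruction_ok = np.allclose(U @ Sigma @ Vt, X)
print(reconstruction_ok)                  # True
print(np.allclose(U.T @ U, np.eye(p)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(p)))  # True
print(np.all(s[:-1] >= s[1:]))            # True: singular values sorted descending
```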
Procedure
New scores
$$Z = XV = U\Sigma$$
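$$[Z_{\bullet 1}, Z_{\bullet 2}, \dots, Z_{\bullet p}] = [X_{\bullet 1}, X_{\bullet 2}, \dots, X_{\bullet p}]\,V = [\sigma_1 U_{\bullet 1}, \sigma_2 U_{\bullet 2}, \dots, \sigma_p U_{\bullet p}]$$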
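The scores can be computed either by projecting the data onto $V$ or by scaling the left singular vectors, because $XV = U\Sigma V^\top V = U\Sigma$. A hedged NumPy sketch on synthetic correlated data (names and data are illustrative assumptions) checks that both routes agree column by column:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4))  # correlated columns
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# Scores two ways: project the data onto V, or scale the left singular vectors
Z_proj = X @ V
Z_svd = U * s  # broadcasting multiplies column j of U by sigma_j, i.e. U @ diag(s)

print(np.allclose(Z_proj, Z_svd))  # True
# Column j of Z equals sigma_j times column j of U
j = 0
print(np.allclose(Z_proj[:, j], s[j] * U[:, j]))  # True
```

Using `U * s` avoids building the diagonal matrix $\Sigma$ explicitly.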
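If the columns of $X$ are centered, the scores are centered too, because the vector of column means satisfies $\bar Z = \bar X V$:
$$\bar X = 0 \implies \bar Z = 0$$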
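A small NumPy check of this implication, assuming synthetic data with deliberately nonzero column means (all names are illustrative). Since taking column means is linear, $\bar Z = \bar X V$ holds in general, so centering $X$ forces $\bar Z = 0$:

```python
import numpy as np

rng = np.random.default_rng(2)
X_raw = rng.normal(loc=5.0, size=(30, 3))  # columns with nonzero means
X = X_raw - X_raw.mean(axis=0)             # enforce X-bar = 0

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T

# Column means of Z equal X-bar V, which is zero when X-bar = 0
z_bar = Z.mean(axis=0)
print(np.allclose(z_bar, 0.0))                    # True
print(np.allclose(z_bar, X.mean(axis=0) @ Vt.T))  # True

# Without centering, the score means are X_raw-bar V rather than zero
Z_raw = X_raw @ Vt.T
print(np.allclose(Z_raw.mean(axis=0), X_raw.mean(axis=0) @ Vt.T))  # True
```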
Variances
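$$\mathrm{Var}\{Z_{\bullet j}\} \propto \sum_{i=1}^{n} (Z_{ij} - \bar Z_{\bullet j})^2 = \|Z_{\bullet j}\|^2 = \sigma_j^2 \|U_{\bullet j}\|^2 = \sigma_j^2,$$
since $\bar Z_{\bullet j} = 0$ and $\|U_{\bullet j}\|^2 = 1$.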
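A hedged NumPy sketch of the variance identity, assuming synthetic centered data with unequal column spreads (names are illustrative). The sum of squared deviations of each score column equals $\sigma_j^2$, and the usual sample variance differs only by the constant $1/(n-1)$, which is why the slide writes $\propto$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.normal(size=(n, p)) * np.array([3.0, 1.0, 0.3])  # unequal spreads
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T

# Sum of squared deviations of each score column equals sigma_j^2
ss = ((Z - Z.mean(axis=0)) ** 2).sum(axis=0)
print(np.allclose(ss, s ** 2))  # True

# The sample variance differs only by the constant factor 1/(n - 1)
var_z = Z.var(axis=0, ddof=1)
print(np.allclose(var_z, s ** 2 / (n - 1)))  # True

# Hence the score variances are ordered like the singular values
print(np.all(var_z[:-1] >= var_z[1:]))  # True
```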
Total variance
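$$\sum_{j=1}^{p} \mathrm{Var}\{Z_{\bullet j}\} = \sum_{j=1}^{p} \mathrm{Var}\{X_{\bullet j}\}$$
because $V$ is orthogonal, so $\|Z\|_F^2 = \|XV\|_F^2 = \|X\|_F^2 = \sum_{j=1}^{p} \sigma_j^2$.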
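A minimal NumPy check that the rotation to principal components preserves total variance, assuming synthetic correlated data (names are illustrative). Both totals also equal $\sum_j \sigma_j^2 / (n-1)$:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 5)) @ rng.normal(size=(5, 5))
X = X - X.mean(axis=0)
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T

# Total variance before and after projecting onto the principal axes
total_var_X = X.var(axis=0, ddof=1).sum()
total_var_Z = Z.var(axis=0, ddof=1).sum()
print(np.isclose(total_var_X, total_var_Z))  # True

# Both totals equal the sum of squared singular values over (n - 1)
print(np.isclose(total_var_Z, (s ** 2).sum() / (n - 1)))  # True
```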
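Correlations ($j \neq \ell$)
$$\mathrm{Cov}\{Z_{\bullet j}, Z_{\bullet \ell}\} \propto \sum_{i=1}^{n} (Z_{ij} - \bar Z_{\bullet j})(Z_{i\ell} - \bar Z_{\bullet \ell}) = Z_{\bullet j}^\top Z_{\bullet \ell} = \sigma_j \sigma_\ell\, U_{\bullet j}^\top U_{\bullet \ell} = 0$$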
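A hedged NumPy sketch showing that distinct score columns are uncorrelated, assuming synthetic correlated input data (names are illustrative). The Gram matrix $Z^\top Z = \Sigma U^\top U \Sigma = \Sigma^2$ is diagonal, and the sample covariance matrix of the scores has vanishing off-diagonal entries:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))
X = rng.normal(size=(150, 3)) @ A  # correlated columns
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T

# Gram matrix of the scores: Z^T Z = Sigma U^T U Sigma = Sigma^2
G = Z.T @ Z
print(np.allclose(G, np.diag(s ** 2)))  # True: off-diagonal entries vanish

# Same conclusion with the sample covariance matrix (columns as variables)
C = np.cov(Z, rowvar=False)
off_diag = C - np.diag(np.diag(C))
print(np.allclose(off_diag, 0.0, atol=1e-10))  # True
```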
[Figure: two scatter plots of observations in principal component coordinates; vertical axis: second principal component; both axes range from -1.0 to 1.0]
[Figure: scatter plot of data in the $(X_1, X_2)$ plane, axes from -4 to 4, with the directions of the largest and smallest principal components marked]
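The figure's two directions can be recovered from the rows of $V^\top$. A minimal NumPy sketch, assuming a synthetic elongated 2-D cloud (the directions `d_major`, `d_minor` and the spreads are illustrative choices, not the data in the figure): the first row of `Vt` aligns with the long axis, the second with the short one, and a small ratio $\sigma_2/\sigma_1$ signals a nearly one-dimensional dataset.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
# Synthetic 2-D cloud: large spread along (1, 1), small spread along (1, -1)
t = rng.normal(scale=2.0, size=n)
u = rng.normal(scale=0.3, size=n)
d_major = np.array([1.0, 1.0]) / np.sqrt(2)
d_minor = np.array([1.0, -1.0]) / np.sqrt(2)
X = np.outer(t, d_major) + np.outer(u, d_minor)
X = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
largest_pc = Vt[0]   # direction of maximal variance
smallest_pc = Vt[1]  # orthogonal direction of minimal variance

# Directions are defined only up to sign, so compare absolute cosines
print(abs(largest_pc @ d_major) > 0.99)  # True
print(abs(smallest_pc @ d_major) < 0.1)  # True
print(s[1] / s[0])  # small ratio: the cloud is nearly one-dimensional
```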