WEEK 8 PCA
Victor Lavrenko and Nigel Goddard
School of Informatics
Semester 1
Overview
• Curse of dimensionality
• Different ways to reduce dimensionality
• Principal Components Analysis (PCA)
• Example: Eigen Faces
• PCA for classification
• Witten & Frank, section 7.3
  – only the PCA section is required
True vs. observed dimensionality
• Get a population, predict some property
  – instances represented as {urefu, height} pairs ("urefu" is Swahili for "height")
  – what is the dimensionality of this data?
• Data points from different geographic areas over time:
  – X1: # of skidding accidents
  – X2: # of burst water pipes
  – X3: snow-plow expenditures
  – X4: # of school closures
  – X5: # of patients with heat stroke
  – observed dimensionality is 5, but all five track one underlying variable: temperature?
Curse of dimensionality
• Datasets are typically high-dimensional
  – vision: 10^4 pixels, text: 10^6 words
    • that is simply the way we observe / record them
  – true dimensionality is often much lower
    • a manifold (sheet) in a high-d space
• Example: handwritten digits
  – 20 x 20 bitmap: {0,1}^400 possible events
    • will never see most of these events
    • actual digits: a tiny fraction of the events
  – true dimensionality:
    • possible variations of the pen-stroke
Curse of dimensionality (2)
• Machine learning methods are statistical by nature
  – count observations in various regions of some space
  – use the counts to construct the predictor f(x)
  – e.g. decision trees: p+/p− in {o=rain, w=strong, T>28°}
  – text: # documents in {"hp" and "3d" and not "$" and …}
• As dimensionality grows: fewer observations per region
  – 1d: 3 regions, 2d: 3^2 regions, 1000d: hopeless (see the sketch below)
  – statistics need repetition
    • flip a coin once → heads
    • P(heads) = 100%?
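A minimal sketch of why this happens (the sample size and bin count are made-up numbers, not from the lecture): with 3 bins per axis the space splits into 3^d regions, and for even moderate d almost all regions contain no observations at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10_000            # hypothetical dataset size
for d in [1, 2, 5, 10]:       # number of dimensions
    X = rng.random((n_samples, d))          # points in the unit cube
    bins = np.floor(X * 3).astype(int)      # 3 bins per axis -> 3**d regions
    occupied = len({tuple(row) for row in bins})
    print(f"d={d:2d}  regions={3**d:8d}  occupied={occupied}")
# as d grows, occupied << regions: most regions contain no observations at all
```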
Dealing with high dimensionality
• Use domain knowledge
  – feature engineering: SIFT, MFCC
• Make assumptions about the dimensions
  – independence: count along each dimension separately
  – smoothness: propagate class counts to neighboring regions
  – symmetry: e.g. invariance to the order of dimensions: x1 ↔ x2
• Reduce the dimensionality of the data
  – create a new set of dimensions (variables)
Dimensionality reduction
• Goal: represent instances with fewer variables
  – try to preserve as much structure in the data as possible
  – discriminative: keep only the structure that affects class separability
• Feature selection
  – pick a subset of the original dimensions X1, X2, X3, …, Xd−1, Xd
  – discriminative: pick good class "predictors" (e.g. by information gain)
• Feature extraction (see the sketch below)
  – construct a new set of dimensions: Ei = f(X1 … Xd)
  – typically (linear) combinations of the original dimensions
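A toy sketch of the two options (the column indices and weights are arbitrary, purely for illustration): feature selection keeps a subset of the original columns, while feature extraction builds new columns E_i as combinations of all of them.

```python
import numpy as np

X = np.random.default_rng(1).random((5, 4))   # 5 instances, d=4 original dimensions

# feature selection: keep a subset of the original dimensions (here columns 0 and 2)
X_selected = X[:, [0, 2]]

# feature extraction: each new dimension E_i is a linear combination of X_1..X_d
W = np.array([[0.5, 0.5, 0.0, 0.0],           # weights defining E_1
              [0.0, 0.0, 0.7, 0.3]])          # weights defining E_2
X_extracted = X @ W.T                          # shape (5, 2)
```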
Principal Components Analysis
• Defines a set of principal components
  – 1st: the direction of greatest variability in the data
  – 2nd: perpendicular to the 1st, greatest variability of what's left
  – … and so on until d (the original dimensionality)
• The first m << d components become the m new dimensions
  – change the coordinates of every data point to these dimensions
Why greatest variability?
• Example: reduce 2-dimensional data to 1-d
  – {x1, x2} → e' (a single coordinate along the new axis e)
• Pick e to maximize variability
  – reduces cases where two points are close in e-space but very far apart in (x1, x2)-space
  – equivalently, minimizes the distances between the original points and their projections (see the sketch below)
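A minimal numerical check of this claim (random correlated data, values are illustrative): among many candidate unit directions e, the one with the largest variance of the projections is also the one with the smallest mean squared distance between the points and their projections.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 0.6]], size=500)  # 2-d data
X -= X.mean(axis=0)                        # center the data

angles = np.linspace(0, np.pi, 180, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # candidate unit vectors e

proj = X @ dirs.T                          # coordinate of each point along each candidate e
variance = proj.var(axis=0)                # variance of the projections
residual = (X ** 2).sum(axis=1)[:, None] - proj ** 2   # squared distance point -> its projection
mean_residual = residual.mean(axis=0)

# the max-variance direction is also the min-reconstruction-error direction
print(np.argmax(variance) == np.argmin(mean_residual))   # True
```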
Principal components
• "Center" the data at zero: x_ia ← x_ia − μ_a
  – subtract the mean from each attribute
• Compute the covariance matrix Σ
  – variance of dimension a:        $\mathrm{var}(a) = \frac{1}{n}\sum_{i=1}^{n} x_{ia}^2$
  – covariance of dimensions b, a:  $\mathrm{cov}(b,a) = \frac{1}{n}\sum_{i=1}^{n} x_{ib}\, x_{ia}$
    • do x1 and x2 tend to increase together?
    • or does x2 decrease as x1 increases?
• Multiply a vector by Σ, again and again:

  $\begin{pmatrix} 2.0 & 0.8 \\ 0.8 & 0.6 \end{pmatrix}\begin{pmatrix} -1 \\ +1 \end{pmatrix} \to \begin{pmatrix} -1.2 \\ -0.2 \end{pmatrix} \to \begin{pmatrix} -2.5 \\ -1.0 \end{pmatrix} \to \begin{pmatrix} -6.0 \\ -2.7 \end{pmatrix} \to \begin{pmatrix} -14.1 \\ -6.4 \end{pmatrix} \to \begin{pmatrix} -33.3 \\ -15.1 \end{pmatrix}$

  slope of the last four vectors: 0.400, 0.450, 0.454, 0.454

  – the vector turns towards the direction of greatest variance (see the sketch below)
• Want vectors e which are not turned: Σ e = λ e
  – e … eigenvectors of Σ, λ … the corresponding eigenvalues
  – principal components = the eigenvectors with the largest eigenvalues
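A minimal numpy sketch of the 2-d example above: repeated multiplication by Σ turns a vector toward the leading eigenvector (the slope approaches 0.454), and np.linalg.eigh finds the eigenvectors directly.

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 0.6]])            # covariance matrix from the slide

v = np.array([-1.0, 1.0])
for _ in range(5):
    v = Sigma @ v                         # multiply by Sigma again and again
    print(np.round(v, 1), "slope:", round(v[1] / v[0], 3))
# the slope converges to ~0.454: the direction of the first principal component

# vectors that are not "turned" by Sigma are its eigenvectors
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)    # columns are the eigenvectors e_i
print(np.round(eigenvalues, 2))                      # ~[0.23 2.36]
print(eigenvectors[:, np.argmax(eigenvalues)])       # first PC, slope ~0.454 (sign may flip)
```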
Finding Principal Components
1. Find the eigenvalues by solving det(Σ − λI) = 0:

   $\det\begin{pmatrix} 2.0-\lambda & 0.8 \\ 0.8 & 0.6-\lambda \end{pmatrix} = (2-\lambda)(0.6-\lambda) - (0.8)(0.8) = \lambda^2 - 2.6\lambda + 0.56 = 0$

   $\{\lambda_1, \lambda_2\} = \tfrac{1}{2}\left(2.6 \pm \sqrt{2.6^2 - 4 \cdot 0.56}\right) = \{2.36,\ 0.23\}$

2. Find the i-th eigenvector by solving Σ e_i = λ_i e_i:

   $\begin{pmatrix} 2.0 & 0.8 \\ 0.8 & 0.6 \end{pmatrix}\begin{pmatrix} e_{1,1} \\ e_{1,2} \end{pmatrix} = 2.36\begin{pmatrix} e_{1,1} \\ e_{1,2} \end{pmatrix} \;\Rightarrow\; \begin{cases} 2.0\,e_{1,1} + 0.8\,e_{1,2} = 2.36\,e_{1,1} \\ 0.8\,e_{1,1} + 0.6\,e_{1,2} = 2.36\,e_{1,2} \end{cases} \;\Rightarrow\; e_{1,1} = 2.2\,e_{1,2},\ \text{so}\ e_1 \sim \begin{pmatrix} 2.2 \\ 1 \end{pmatrix}$

   $\begin{pmatrix} 2.0 & 0.8 \\ 0.8 & 0.6 \end{pmatrix}\begin{pmatrix} e_{2,1} \\ e_{2,2} \end{pmatrix} = 0.23\begin{pmatrix} e_{2,1} \\ e_{2,2} \end{pmatrix} \;\Rightarrow\; e_2 = \begin{pmatrix} -0.41 \\ 0.91 \end{pmatrix}$

   We also want unit length: ||e_1|| = 1.

3. 1st PC: $e_1 = \begin{pmatrix} 0.91 \\ 0.41 \end{pmatrix}$ (slope 0.454),  2nd PC: $e_2 = \begin{pmatrix} -0.41 \\ 0.91 \end{pmatrix}$
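A quick numerical check of the worked example (same numbers as above): the roots of the characteristic polynomial are the eigenvalues, and normalizing (2.2, 1) to unit length gives the first principal component.

```python
import numpy as np

# characteristic polynomial: lambda^2 - 2.6*lambda + 0.56 = 0
eigenvalues = np.roots([1.0, -2.6, 0.56])
print(np.round(np.sort(eigenvalues)[::-1], 2))   # ~[2.36 0.24] (the slides quote 0.23)

# first eigenvector: any solution of Sigma e = 2.36 e, e.g. (2.2, 1), scaled to unit length
e1 = np.array([2.2, 1.0])
e1 = e1 / np.linalg.norm(e1)
print(np.round(e1, 2))                           # [0.91 0.41], slope 0.41/0.91 ~ 0.454
```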
Projecting to new dimensions
• e_1 … e_m are the new dimension vectors
• Have an instance x = {x_1 … x_d} (original coordinates)
• Want the new coordinates x' = {x'_1 … x'_m}:
  1. "center" the instance (subtract the mean): x − μ
  2. "project" onto each dimension: (x − μ)^T e_j for j = 1…m

  $(x - \mu) = \big[(x_1-\mu_1)\;\;(x_2-\mu_2)\;\;\cdots\;\;(x_d-\mu_d)\big]$

  $\begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_m \end{pmatrix} = \begin{pmatrix} (x-\mu)^T e_1 \\ (x-\mu)^T e_2 \\ \vdots \\ (x-\mu)^T e_m \end{pmatrix} = \begin{pmatrix} (x_1-\mu_1)e_{1,1} + (x_2-\mu_2)e_{1,2} + \cdots + (x_d-\mu_d)e_{1,d} \\ (x_1-\mu_1)e_{2,1} + (x_2-\mu_2)e_{2,2} + \cdots + (x_d-\mu_d)e_{2,d} \\ \vdots \\ (x_1-\mu_1)e_{m,1} + (x_2-\mu_2)e_{m,2} + \cdots + (x_d-\mu_d)e_{m,d} \end{pmatrix}$
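A minimal sketch of the projection step (the function and argument names are my own):

```python
import numpy as np

def project(x, mu, E_m):
    """Project an instance x onto m principal components.

    x   : (d,)   original coordinates
    mu  : (d,)   mean of the training data
    E_m : (d, m) the m eigenvectors as columns
    returns (m,) the new coordinates x'_1 ... x'_m
    """
    return (x - mu) @ E_m

# e.g. for the 2-d example above, projecting onto the first PC:
# project(x, mu, np.array([[0.91], [0.41]]))
```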
Direction of greatest variability
• Select the dimension e which maximizes the variance
• Points x_i "projected" onto the vector e: $x_i^T e = \sum_{j=1}^{d} x_{ij} e_j$
• Variance of the projections: $V = \frac{1}{n}\sum_{i=1}^{n} \Big(\sum_{j=1}^{d} x_{ij} e_j\Big)^2$
• Maximize the variance
  – want unit length: ||e|| = 1
  – add a Lagrange multiplier and set the derivative to zero, which must hold for every a = 1…d:

  $\frac{\partial V}{\partial e_a} = \frac{2}{n}\sum_{i=1}^{n} \Big(\sum_{j=1}^{d} x_{ij} e_j\Big) x_{ia} - 2\lambda e_a = 0 \;\Rightarrow\; \sum_{j=1}^{d} \Big(\tfrac{1}{n}\sum_{i=1}^{n} x_{ia} x_{ij}\Big) e_j = \lambda e_a \;\Rightarrow\; \sum_{j=1}^{d} \mathrm{cov}(a,j)\, e_j = \lambda e_a$

  – stacking these d equations gives Σ e = λ e, so e must be an eigenvector of Σ
Variance along an eigenvector
• Variance of the projected points x_i^T e (the data is centered, so the projections have mean 0):

  $V = \frac{1}{n}\sum_{i=1}^{n} \Big(\sum_{j=1}^{d} x_{ij} e_j\Big)\Big(\sum_{a=1}^{d} x_{ia} e_a\Big) = \sum_{j=1}^{d} e_j \sum_{a=1}^{d} \Big(\tfrac{1}{n}\sum_{i=1}^{n} x_{ij} x_{ia}\Big) e_a = e^T \Sigma\, e$

• For an eigenvector, Σ e = λ e and ||e|| = 1, so V = e^T (λ e) = λ
  – the eigenvalue is the variance of the data along its eigenvector
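A minimal numerical check (random data, values illustrative): projecting the centered data onto each eigenvector of its covariance matrix gives coordinates whose variance equals the corresponding eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0, 0], np.diag([3.0, 1.0, 0.2]), size=2000)
X -= X.mean(axis=0)                               # center the data

Sigma = (X.T @ X) / len(X)                        # covariance matrix (1/n convention)
eigenvalues, eigenvectors = np.linalg.eigh(Sigma)

for lam, e in zip(eigenvalues, eigenvectors.T):
    projected = X @ e                             # coordinates along this eigenvector
    print(round(projected.var(), 3), "~", round(lam, 3))   # variance equals the eigenvalue
```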
PCA in a nutshell
1. take the data (n instances, d attributes)
2. subtract the mean from each attribute (center the data)
3. compute the covariance matrix
4. compute its eigenvectors + eigenvalues: eig(cov(data))
5. keep the m eigenvectors with the largest eigenvalues
6. project every instance onto those eigenvectors (see the sketch below)
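A minimal end-to-end sketch of this recipe in numpy (function and variable names are my own, using the 1/n covariance convention from the earlier slides):

```python
import numpy as np

def pca(data, m):
    """Reduce `data` (n instances x d attributes) to m dimensions with PCA."""
    mu = data.mean(axis=0)
    centered = data - mu                              # 1-2. center the data
    cov = (centered.T @ centered) / len(data)         # 3.   covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)   # 4.   eig(cov(data))
    order = np.argsort(eigenvalues)[::-1]             # 5.   largest eigenvalues first
    components = eigenvectors[:, order[:m]]           #      keep m principal components
    return centered @ components, components, mu      # 6.   project the instances

# usage: X_reduced, components, mu = pca(X, m=2)
```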
PCA example: Eigen Faces
• Input: a dataset of N face images
  – each face: a K x K bitmap of pixels
  – "unfold" each bitmap into a K²-dimensional vector
  – arrange the vectors in a K² x N matrix, each face = one column
• Output: a set of m eigenvectors (the "eigen faces")
  – each eigenvector is itself K²-dimensional
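A minimal sketch of this construction (the column convention follows the slide; array and function names are my own):

```python
import numpy as np

def eigenfaces(faces, m):
    """faces: array of N face images, each K x K.  Returns m eigenfaces and the mean face."""
    N, K, _ = faces.shape
    X = faces.reshape(N, K * K).T             # "unfold": K^2 x N matrix, one face per column
    mean_face = X.mean(axis=1, keepdims=True)
    Xc = X - mean_face                        # center
    cov = (Xc @ Xc.T) / N                     # K^2 x K^2 covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    top = np.argsort(eigenvalues)[::-1][:m]
    return eigenvectors[:, top], mean_face    # each eigenface is K^2-dimensional
```

For realistic image sizes one would not form the K² x K² covariance matrix directly; the usual trick is to take eigenvectors of the much smaller N x N matrix XᵀX (or use an SVD), which yields the same eigenfaces. The direct form above simply mirrors the slide.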
Eigen Faces: Projection
• A face ≈ mean + a weighted sum of eigen-faces
• Project a new face into the space of eigen-faces
  – represent its vector as a linear combination of the principal components
• How many components do we need?
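A minimal sketch of the projection and reconstruction (it reuses the hypothetical eigenfaces() helper from the previous sketch; the face arrays are random stand-ins, not real images):

```python
import numpy as np

rng = np.random.default_rng(0)
faces = rng.random((100, 32, 32))           # hypothetical stand-in for N=100 face bitmaps
new_face = rng.random((32, 32))             # hypothetical new face

E, mean_face = eigenfaces(faces, m=50)      # eigenfaces() from the sketch above
x = new_face.reshape(-1, 1)                 # unfold into a K^2 x 1 column
weights = E.T @ (x - mean_face)             # coordinates in eigenface space (m x 1)

reconstruction = mean_face + E @ weights    # face ~ mean + linear combination of eigenfaces
```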
(Eigen) Face Recognition
• Face similarity
  – computed in the reduced space (see the sketch below)
  – insensitive to lighting, expression, orientation
• Projecting new "faces"
  – everything is a face: any image projected onto the eigenfaces comes out face-like
  – [figure: a new face and its projection onto the eigenfaces]
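A minimal sketch of recognition in the reduced space (names are my own): compare the eigenface coordinates of a new face to those of known faces and pick the nearest.

```python
import numpy as np

# gallery_weights: (m, n_people) eigenface coordinates of known faces
# probe_weights:   (m, 1)        eigenface coordinates of the new face
def recognize(probe_weights, gallery_weights):
    """Return the index of the closest known face in eigenface space."""
    distances = np.linalg.norm(gallery_weights - probe_weights, axis=0)
    return int(np.argmin(distances))
```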
PCA: practical issues
• Covariance is extremely sensitive to large values
  – multiply some dimension by 1000
    • it dominates the covariance
    • it becomes a principal component
  – fix: normalize each dimension to zero mean and unit variance (see the sketch below):
        x' = (x − mean) / st.dev
• PCA assumes the underlying subspace is linear
  – 1d: a straight line, 2d: a flat sheet
  – transform the data to handle non-linear spaces (manifolds)
  – [figure: data on a non-linear manifold vs. the direction of greatest variability]
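A minimal sketch of the normalization step (rows are instances, columns are dimensions):

```python
import numpy as np

def standardize(X):
    """Scale each dimension to zero mean and unit variance before PCA."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# without this, a rescaled dimension dominates the covariance:
# X[:, 0] *= 1000  ->  the first PC aligns almost exactly with dimension 0
```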
PCA and classification
• PCA is unsupervised
  – maximizes the overall variance of the data along a small set of directions
  – does not know anything about the class labels
  – can pick a direction that makes it hard to separate the classes
• Discriminative approach
  – look for a dimension that makes it easy to separate the classes
Linear Discriminant Analysis
• LDA: pick a new dimension that gives:
  – maximum separation between the means of the projected classes
  – minimum variance within each projected class
• Solution: eigenvectors computed from the between-class and within-class covariance matrices (see the sketch below)
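A minimal sketch of the two-class case (Fisher's linear discriminant in its standard closed form, not code from the lecture; names are my own):

```python
import numpy as np

def lda_direction(X, y):
    """Direction maximizing between-class separation relative to within-class variance (2 classes)."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter: sum of the per-class scatter matrices
    Sw = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    w = np.linalg.solve(Sw, mu1 - mu0)      # w ~ Sw^{-1} (mu1 - mu0)
    return w / np.linalg.norm(w)
```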
PCA vs. LDA
• LDA is not guaranteed to be better for classification
  – it assumes the classes are unimodal Gaussians
  – it fails when the discriminatory information is not in the means
    but in the variances of the classes
• Example where PCA gives a better projection:
Dimensionality reduction
• Pros
  – reflects our intuitions about the data
  – allows estimating probabilities in high-dimensional data
    • no need to assume independence etc.
  – dramatic reduction in the size of the data
    • faster processing (as long as the reduction itself is fast), smaller storage
• Cons
  – too expensive for many applications (Twitter, web)
  – disastrous for tasks with fine-grained classes
  – you must understand the assumptions behind the methods (linearity etc.)
    • there may be better ways to deal with sparseness
Summary
• True dimensionality << observed dimensionality
• High dimensionality ⇒ sparse, unstable estimates
• Dealing with high dimensionality:
  – use domain knowledge
  – make an assumption: independence / smoothness / symmetry
  – dimensionality reduction: feature selection / feature extraction
• Principal Components Analysis (PCA)
  – picks the dimensions that maximize variability
    • eigenvectors of the covariance matrix
  – example: Eigen Faces
  – variant for classification: Linear Discriminant Analysis