
IAML: Dimensionality Reduction

Victor Lavrenko and Nigel Goddard
School of Informatics

Semester 1
Overview
• Curse of dimensionality
• Different ways to reduce dimensionality
• Principal Components Analysis (PCA)
• Example: Eigen Faces
• PCA for classification
• Witten & Frank, section 7.3
  – only the PCA section is required

Copyright © 2014 Victor Lavrenko
True vs. observed dimensionality
• Get a population, predict some property
  – instances represented as {urefu, height} pairs ("urefu" means "height" in Swahili)
  – what is the dimensionality of this data?
• Data points from different geographic areas over time:
  – X1: # of skidding accidents
  – X2: # of burst water pipes
  – X3: snow-plow expenditures
  – X4: # of school closures
  – X5: # of patients with heat stroke
  – what single underlying variable drives all five? Temperature?
Curse of dimensionality
• Datasets are typically high-dimensional
  – vision: 10^4 pixels, text: 10^6 words
  – that is the way we observe / record them
• The true dimensionality is often much lower
  – a manifold (sheet) in a high-d space
• Example: handwritten digits
  – 20 x 20 bitmap: {0,1}^400 possible events
    • we will never see most of these events
    • actual digits: a tiny fraction of the events
  – true dimensionality: the possible variations of the pen-stroke

Curse of dimensionality (2)
• Machine learning methods are statistical by nature
  – count observations in various regions of some space
  – use the counts to construct the predictor f(x)
  – e.g. decision trees: p+/p- in {o=rain, w=strong, T>28°}
  – text: # documents in {"hp" and "3d" and not "$" and …}
• As dimensionality grows, there are fewer observations per region
  – 1d: 3 regions, 2d: 3^2 regions, 1000d: hopeless (sketch below)
  – statistics need repetition
    • flip a coin once → heads
    • is P(head) = 100%?
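A minimal numeric sketch of this blow-up, assuming each dimension is simply cut into 3 bins (the dataset size n = 1000 and the dimensionalities below are made up for illustration):

```python
import math

n = 1000                          # observations in the dataset
bins = 3                          # regions per dimension
for d in (1, 2, 10, 100, 1000):   # dimensionality
    log10_regions = d * math.log10(bins)            # number of regions = 3^d
    log10_per_region = math.log10(n) - log10_regions
    print(f"d={d:5d}  regions ~ 10^{log10_regions:.0f}  "
          f"observations per region ~ 10^{log10_per_region:.0f}")
```

Already at d = 10 there are far more regions than observations, so most regions are empty and counts in them carry no statistical weight.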

Dealing with high dimensionality
• Use domain knowledge
  – feature engineering: SIFT, MFCC
• Make assumptions about the dimensions
  – independence: count along each dimension separately
  – smoothness: propagate class counts to neighboring regions
  – symmetry: e.g. invariance to the order of dimensions: x1 ↔ x2
• Reduce the dimensionality of the data
  – create a new set of dimensions (variables)

Dimensionality reduction
• Goal: represent instances with fewer variables
  – try to preserve as much structure in the data as possible
  – discriminative: keep only the structure that affects class separability
• Feature selection
  – pick a subset of the original dimensions X_1, X_2, X_3, …, X_{d-1}, X_d
  – discriminative: pick good class "predictors" (e.g. by information gain)
• Feature extraction
  – construct a new set of dimensions E_i = f(X_1 … X_d)
  – (linear) combinations of the original dimensions

Principal Components Analysis
• Defines a set of principal components
  – 1st: the direction of greatest variability in the data
  – 2nd: perpendicular to the 1st, the greatest variability of what's left
  – … and so on, up to d (the original dimensionality)
• The first m << d components become the m new dimensions
  – change the coordinates of every data point to these dimensions

Why greatest variability?
• Example: reduce 2-dimensional data to 1-d
  – {x1, x2} → e' (the coordinate along the new axis e)
• Pick e to maximize variability
  – reduces the cases where two points are close in e-space but very far apart in (x,y)-space
  – minimizes the distances between the original points and their projections
Principal components
• "Center" the data at zero: x_{i,a} = x_{i,a} − μ_a
  – subtract the mean from each attribute
• Compute the covariance matrix Σ
  – variance of dimension a:  \mathrm{var}(a) = \frac{1}{n}\sum_{i=1}^{n} x_{ia}^2
  – covariance of dimensions b and a:  \mathrm{cov}(b,a) = \frac{1}{n}\sum_{i=1}^{n} x_{ib}\, x_{ia}
    • do x1 and x2 tend to increase together?
    • or does x2 decrease as x1 increases?
• Multiply a vector by Σ again and again (sketch below):
  \begin{pmatrix} 2.0 & 0.8 \\ 0.8 & 0.6 \end{pmatrix}\begin{pmatrix} -1 \\ +1 \end{pmatrix} \to \begin{pmatrix} -1.2 \\ -0.2 \end{pmatrix} \to \begin{pmatrix} -2.5 \\ -1.0 \end{pmatrix} \to \begin{pmatrix} -6.0 \\ -2.7 \end{pmatrix} \to \begin{pmatrix} -14.1 \\ -6.4 \end{pmatrix} \to \begin{pmatrix} -33.3 \\ -15.1 \end{pmatrix}
  slope: 0.400, 0.450, 0.454, 0.454
  – the vector turns towards the direction of greatest variance
• Want vectors e which aren't turned: Σ e = λ e
  – e … eigenvectors of Σ, λ … the corresponding eigenvalues
  – principal components = the eigenvectors with the largest eigenvalues
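A small NumPy sketch of this repeated multiplication, using the 2×2 covariance matrix from the slide; the starting vector is arbitrary:

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 0.6]])    # covariance matrix from the slide
v = np.array([-1.0, 1.0])         # arbitrary starting vector

for step in range(1, 7):
    v = Sigma @ v                 # multiply by the covariance matrix again
    print(f"step {step}: v = {np.round(v, 1)}, slope = {v[1] / v[0]:.3f}")

# the slope settles at ~0.454: the slope of the eigenvector with the largest eigenvalue
```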
Finding Principal Components
1. Find the eigenvalues by solving det(Σ − λI) = 0:
   \det\begin{pmatrix} 2.0-\lambda & 0.8 \\ 0.8 & 0.6-\lambda \end{pmatrix} = (2-\lambda)(0.6-\lambda) - (0.8)(0.8) = \lambda^2 - 2.6\lambda + 0.56 = 0
   \{\lambda_1, \lambda_2\} = \tfrac{1}{2}\bigl(2.6 \pm \sqrt{2.6^2 - 4 \cdot 0.56}\bigr) = \{2.36, 0.23\}
2. Find the i-th eigenvector by solving Σ e_i = λ_i e_i:
   \begin{pmatrix} 2.0 & 0.8 \\ 0.8 & 0.6 \end{pmatrix}\begin{pmatrix} e_{1,1} \\ e_{1,2} \end{pmatrix} = 2.36\begin{pmatrix} e_{1,1} \\ e_{1,2} \end{pmatrix}
   \;\Rightarrow\; 2.0\,e_{1,1} + 0.8\,e_{1,2} = 2.36\,e_{1,1} \text{ and } 0.8\,e_{1,1} + 0.6\,e_{1,2} = 2.36\,e_{1,2} \;\Rightarrow\; e_{1,1} = 2.2\,e_{1,2}, \text{ i.e. } e_1 \sim \begin{pmatrix} 2.2 \\ 1 \end{pmatrix}
   \begin{pmatrix} 2.0 & 0.8 \\ 0.8 & 0.6 \end{pmatrix}\begin{pmatrix} e_{2,1} \\ e_{2,2} \end{pmatrix} = 0.23\begin{pmatrix} e_{2,1} \\ e_{2,2} \end{pmatrix} \;\Rightarrow\; e_2 = \begin{pmatrix} -0.41 \\ 0.91 \end{pmatrix}
3. Rescale to unit length (want ||e_1|| = 1):
   1st PC: e_1 = \begin{pmatrix} 0.91 \\ 0.41 \end{pmatrix} (slope: 0.454),  2nd PC: e_2 = \begin{pmatrix} -0.41 \\ 0.91 \end{pmatrix}
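The same numbers can be checked numerically; a sketch using numpy.linalg.eigh (appropriate for a symmetric covariance matrix):

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 0.6]])

eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]            # sort by eigenvalue, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals.round(2))        # ~[2.36 0.24]
print(eigvecs[:, 0].round(2))  # 1st PC, ~±[0.91 0.41] (the sign is arbitrary)
print(eigvecs[:, 1].round(2))  # 2nd PC, ~±[-0.41 0.91]
```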

Projecting to new dimensions
• e_1 … e_m are the new dimension vectors
• Have an instance x = {x_1 … x_d} (original coordinates)
• Want its new coordinates x' = {x'_1 … x'_m}:
  1. "center" the instance (subtract the mean): x − μ
  2. "project" onto each new dimension: x'_j = (x − μ)^T e_j for j = 1 … m (sketch below)
  (x - \mu) = \bigl[\,(x_1 - \mu_1)\;\;(x_2 - \mu_2)\;\cdots\;(x_d - \mu_d)\,\bigr]
  \begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_m \end{pmatrix} = \begin{pmatrix} (x-\mu)^T e_1 \\ (x-\mu)^T e_2 \\ \vdots \\ (x-\mu)^T e_m \end{pmatrix} = \begin{pmatrix} (x_1-\mu_1)e_{1,1} + (x_2-\mu_2)e_{1,2} + \cdots + (x_d-\mu_d)e_{1,d} \\ (x_1-\mu_1)e_{2,1} + (x_2-\mu_2)e_{2,2} + \cdots + (x_d-\mu_d)e_{2,d} \\ \vdots \\ (x_1-\mu_1)e_{m,1} + (x_2-\mu_2)e_{m,2} + \cdots + (x_d-\mu_d)e_{m,d} \end{pmatrix}
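A minimal sketch of these two steps for one instance; the names project, mu and E are illustrative (E holds the m eigenvectors as columns):

```python
import numpy as np

def project(x, mu, E):
    """New coordinates of instance x: x'_j = (x - mu)^T e_j for each column e_j of E."""
    return E.T @ (x - mu)        # 1. center, 2. project

# tiny example: reduce a 2-d point to 1-d using the 1st PC found above
mu = np.array([0.0, 0.0])                    # the example data is already centered
E = np.array([[0.91],
              [0.41]])                       # one eigenvector -> m = 1
print(project(np.array([2.0, 1.0]), mu, E))  # ~[2.23]
```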
Direction of greatest variability
• Select the dimension e which maximizes the variance of the projections
• Point x_i "projected" onto vector e: x_i^T e = \sum_{j=1}^{d} x_{ij} e_j
• Variance of the projections (data already centered):
  V = \frac{1}{n}\sum_{i=1}^{n}\Bigl(\sum_{j=1}^{d} x_{ij} e_j\Bigr)^2
• Maximize the variance V
  – want unit length: ||e|| = 1
  – add a Lagrange multiplier λ for the constraint and set the derivative to zero:
  \frac{\partial V}{\partial e_a} = \frac{2}{n}\sum_{i=1}^{n}\Bigl(\sum_{j=1}^{d} x_{ij} e_j\Bigr) x_{ia} - 2\lambda e_a = 0
  \;\Rightarrow\; \sum_{j=1}^{d} e_j \Bigl(\frac{1}{n}\sum_{i=1}^{n} x_{ia} x_{ij}\Bigr) = \lambda e_a
  \;\Rightarrow\; \sum_{j=1}^{d} \mathrm{cov}(a,j)\, e_j = \lambda e_a
  – this must hold for every a = 1…d, i.e. Σ e = λ e: e must be an eigenvector of the covariance matrix



Variance along an eigenvector
• Mean of the projected points x^T e (data already centered):
  \mu' = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{d} x_{ij} e_j = \sum_{j=1}^{d}\Bigl(\frac{1}{n}\sum_{i=1}^{n} x_{ij}\Bigr) e_j = 0
• Variance of the projected points:
  \frac{1}{n}\sum_{i=1}^{n}\Bigl(\sum_{j=1}^{d} x_{ij} e_j\Bigr)\Bigl(\sum_{a=1}^{d} x_{ia} e_a\Bigr)
  = \sum_{a=1}^{d}\sum_{j=1}^{d}\Bigl(\frac{1}{n}\sum_{i=1}^{n} x_{ia} x_{ij}\Bigr) e_j e_a
  = \sum_{a=1}^{d}\Bigl(\sum_{j=1}^{d}\mathrm{cov}(a,j)\, e_j\Bigr) e_a
  = \sum_{a=1}^{d} (\lambda e_a)\, e_a
  = \lambda \|e\|^2 = \lambda
  – using \mathrm{cov}(a,j) = \frac{1}{n}\sum_{i=1}^{n} x_{ia} x_{ij} and \sum_{j}\mathrm{cov}(a,j)\, e_j = \lambda e_a (e is an eigenvector of the covariance matrix)
  – a quick numeric check follows below
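A quick numeric check of this result on synthetic zero-mean Gaussian data with the covariance matrix used earlier (the 1/n convention matches the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 0.6]], size=100_000)
X -= X.mean(axis=0)                    # center the data

Sigma = X.T @ X / len(X)               # covariance with the 1/n convention
lam, E = np.linalg.eigh(Sigma)
e1 = E[:, np.argmax(lam)]              # eigenvector with the largest eigenvalue

proj = X @ e1                          # project every point onto e1
print(proj.var(), lam.max())           # the two numbers agree: variance along e1 = lambda_1
```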


How many dimensions?
• Have: eigenvectors e_1 … e_d;  want: m << d
• Proved: eigenvalue λ_i = variance along e_i
• Pick the e_i that "explain" the most variance
  – sort the eigenvectors s.t. λ_1 ≥ λ_2 ≥ … ≥ λ_d
  – pick the first m eigenvectors that explain 90% of the total variance (sketch below)
    • typical threshold values: 0.9 or 0.95
• Or use a scree plot (eigenvalue vs. dimension): look for the "elbow" and keep the components before it, e.g. pick 3 PCs (visually)
  – like choosing k in K-means
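A sketch of the threshold rule; the eigenvalues below are made up and already sorted in decreasing order:

```python
import numpy as np

eigvals = np.array([4.2, 2.9, 1.7, 0.4, 0.3, 0.2, 0.2, 0.1])   # sorted, descending
explained = np.cumsum(eigvals) / eigvals.sum()   # cumulative fraction of total variance

m = int(np.searchsorted(explained, 0.90)) + 1    # smallest m reaching the threshold
print(explained.round(2))                        # [0.42 0.71 0.88 0.92 0.95 0.97 0.99 1.  ]
print("keep m =", m, "components")               # m = 4 for these values
```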
PCA in a nutshell
1. correlated hi-d data ("urefu" means "height" in Swahili)
2. center the points
3. compute the covariance matrix
4. eigenvectors + eigenvalues: eig(cov(data))
5. pick m < d eigenvectors w. the highest eigenvalues
6. project the data points onto those eigenvectors
7. uncorrelated low-d data
(code sketch below)
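The whole recipe as a short NumPy sketch; the random matrix simply stands in for "correlated hi-d data":

```python
import numpy as np

def pca(data, m):
    """Steps 2-6: center, covariance, eigendecompose, keep m components, project."""
    mu = data.mean(axis=0)
    centered = data - mu                          # 2. center the points
    Sigma = centered.T @ centered / len(data)     # 3. covariance matrix (1/n convention)
    eigvals, eigvecs = np.linalg.eigh(Sigma)      # 4. eigenvectors + eigenvalues
    top = np.argsort(eigvals)[::-1][:m]           # 5. m eigenvectors w. highest eigenvalues
    E = eigvecs[:, top]
    return centered @ E, mu, E                    # 6. project the data points

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 10))                 # 1. stand-in for correlated hi-d data
low_d, mu, E = pca(data, m=3)                     # 7. uncorrelated low-d data
print(low_d.shape)                                # (500, 3)
print(np.cov(low_d, rowvar=False).round(2))       # ~diagonal: new dimensions are uncorrelated
```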

PCA example: Eigen Faces
• Input: a dataset of N face images, each face a K x K bitmap of pixels
• "Unfold" each bitmap into a K^2-dimensional vector
• Arrange the vectors in a K^2 x N matrix, each face = one column
• Apply PCA: the output is a set of m eigenvectors (a K^2 x m matrix), each K^2-dimensional
• "Fold" each eigenvector back into a K x K bitmap to visualize it:
  the eigenvectors show m "aspects" of prototypical facial features
• Matlab demo on the course webpage (Python sketch below)
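A sketch of the eigen-faces pipeline on synthetic stand-in "images" (the real dataset and the Matlab demo are on the course webpage); here the eigenvectors come from an SVD of the centered data, which gives the same result as eig(cov(data)):

```python
import numpy as np

K, N, m = 32, 200, 16
rng = np.random.default_rng(0)
faces = rng.random((N, K, K))               # stand-in for N face bitmaps

X = faces.reshape(N, K * K)                 # "unfold" each bitmap to a K^2-vector
mu = X.mean(axis=0)                         # the mean face
Xc = X - mu

# right singular vectors of the centered data = eigenvectors of its covariance matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
eigenfaces = Vt[:m]                         # m eigen-faces, each K^2-dimensional

coeffs = Xc @ eigenfaces.T                  # project every face onto the eigen-faces
recon = mu + coeffs @ eigenfaces            # face ~ mean + linear combination of eigen-faces
print(eigenfaces.reshape(m, K, K).shape)    # "fold" back to K x K bitmaps to visualize
print(np.abs(recon - X).mean())             # reconstruction error with m components
```

The last two lines are exactly the "mean + linear combination of principal components" view used on the next slide.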
Eigen Faces: Projection
• Project a new face onto the space of eigen-faces
• Represent the face vector as a linear combination of the principal components: face = mean + weighted sum of eigen-faces
• How many components do we need?

(Eigen) Face Recognition
• Face similarity
  – computed in the reduced space
  – insensitive to lighting, expression, orientation
• Projecting new "faces"
  – everything is a face: any image projected onto the eigen-faces comes out face-like
  (figure: a new "face" and its projection onto the eigen-faces)

PCA: practical issues
• Covariance is extremely sensitive to large values
  – multiply some dimension by 1000
    • it dominates the covariance
    • it becomes a principal component
  – normalize each dimension to zero mean and unit variance (sketch below):
    x' = (x − mean) / st.dev
• PCA assumes the underlying subspace is linear
  – 1d: a straight line, 2d: a flat sheet
  – transform the data to handle non-linear spaces (manifolds)
  (figure: direction of greatest variability along a curved manifold)
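A sketch of the normalization step; standardize is an illustrative helper, not something from the lecture code:

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Rescale each dimension so that no single attribute dominates the covariance."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + eps)   # x' = (x - mean) / st.dev

X = np.array([[1.0, 2000.0],        # 2nd dimension measured in much larger units
              [2.0, 1000.0],
              [3.0, 3000.0]])
print(standardize(X).std(axis=0))   # ~[1. 1.]: the dimensions are now comparable
```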
PCA and classification
• PCA is unsupervised
  – maximizes the overall variance of the data along a small set of directions
  – does not know anything about the class labels
  – can pick a direction that makes it hard to separate the classes
• Discriminative approach
  – look for a dimension that makes it easy to separate the classes
Linear Discriminant Analysis
• LDA: pick a new dimension that gives
  – maximum separation between the means of the projected classes
  – minimum variance within each projected class
• Solution: eigenvectors based on the between-class and within-class covariance matrices (sketch below)
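A minimal two-class sketch of this idea using Fisher's criterion (the multi-class generalization takes eigenvectors of S_W^{-1} S_B); the two Gaussian classes here are synthetic:

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Direction maximizing between-class separation relative to within-class variance."""
    X0, X1 = X[y == 0], X[y == 1]
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))      # within-class scatter
    w = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0, 0], [[3.0, 0.0], [0.0, 0.2]], size=200)
X1 = rng.multivariate_normal([0, 2], [[3.0, 0.0], [0.0, 0.2]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

print(fisher_lda_direction(X, y).round(2))   # ~[0. 1.]: the axis that separates the class means
```

On this data plain PCA would pick the horizontal axis (it has the largest overall variance) even though it mixes the two classes, which is exactly the failure mode illustrated on the previous slide.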

PCA vs. LDA
• LDA is not guaranteed to be better for classification
  – assumes the classes are unimodal Gaussians
  – fails when the discriminatory information is not in the means but in the variances of the data
• Example where PCA gives a better projection than LDA:

Dimensionality reduction
• Pros
  – reflects our intuitions about the data
  – allows estimating probabilities in high-dimensional data
    • no need to assume independence etc.
  – dramatic reduction in the size of the data
    • faster processing (as long as the reduction itself is fast), smaller storage
• Cons
  – too expensive for many applications (Twitter, web)
  – disastrous for tasks with fine-grained classes
  – must understand the assumptions behind the methods (linearity etc.)
    • there may be better ways to deal with sparseness

Summary
• True dimensionality << observed dimensionality
• High dimensionality ⇒ sparse, unstable estimates
• Dealing with high dimensionality:
  – use domain knowledge
  – make an assumption: independence / smoothness / symmetry
  – dimensionality reduction: feature selection / feature extraction
• Principal Components Analysis (PCA)
  – picks the dimensions that maximize variability
    • eigenvectors of the covariance matrix
  – example: Eigen Faces
  – variant for classification: Linear Discriminant Analysis
Copyright © 2014 Victor Lavrenko
