16 Dimensionality Reduction


Machine Learning

Dimensionality Reduction

slides thanks to Xiaoli Fern (CS534, Oregon State Univ., 2011)



Dimensionality reduction

- Many modern data domains involve huge numbers of features / dimensions:
  - Documents: thousands of words, millions of bigrams
  - Images: thousands to millions of pixels
  - Genomics: thousands of genes, millions of DNA polymorphisms



Why reduce dimensions?

- High dimensionality has many costs:
  - Redundant and irrelevant features degrade performance of some ML algorithms
  - Difficulty in interpretation and visualization
  - Computation may become infeasible: what if your algorithm scales as O(n^3)?
  - Curse of dimensionality



(Figure slides illustrating construction of projection directions: repeat until m lines.)


Steps in principal component analysis

- Mean center the data
- Compute the covariance matrix
- Calculate the eigenvalues and eigenvectors of the covariance matrix
  - Eigenvector with the largest eigenvalue λ_1 is the 1st principal component (PC)
  - Eigenvector with the kth largest eigenvalue λ_k is the kth PC
  - λ_k / Σ_i λ_i = proportion of variance captured by the kth PC

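A minimal NumPy sketch of these steps, assuming a data matrix X with one sample per row (the function and variable names are illustrative, not from the slides):

import numpy as np

def pca(X, k):
    # Mean center the data
    X_centered = X - X.mean(axis=0)

    # Compute the covariance matrix (features x features)
    cov = np.cov(X_centered, rowvar=False)

    # Eigenvalues / eigenvectors; eigh applies because the covariance matrix is symmetric
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Sort by decreasing eigenvalue so column j of eigvecs is the (j+1)th PC
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # lambda_k / sum_i lambda_i = proportion of variance captured by the kth PC
    explained = eigvals / eigvals.sum()

    # Project the centered data onto the first k PCs
    Z = X_centered @ eigvecs[:, :k]
    return Z, eigvecs[:, :k], explained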


Applying a principal component analysis

- The full set of PCs comprises a new orthogonal basis for feature space, whose axes are aligned with the maximum variances of the original data.
- Projection of the original data onto the first k PCs gives a reduced-dimensionality representation of the data.
- Transforming the reduced-dimensionality projection back into the original space gives a reduced-dimensionality reconstruction of the original data.
- The reconstruction will have some error, but it can be small and often is acceptable given the other benefits of dimensionality reduction.

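A small NumPy example of the projection and reconstruction described above, on toy data (the random data and the choice k = 2 are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy data: 100 samples, 5 features
mean = X.mean(axis=0)
X_centered = X - mean

# Eigenvectors of the covariance matrix, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
W = eigvecs[:, np.argsort(eigvals)[::-1]]

k = 2
Z = X_centered @ W[:, :k]                # reduced-dimensionality representation
X_rec = Z @ W[:, :k].T + mean            # reconstruction in the original space

# Reconstruction error shrinks as k grows and is zero when k equals the full dimension
print("mean squared reconstruction error:", np.mean((X - X_rec) ** 2))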


PCA example

(Figure: original data; mean-centered data with PCs overlaid.)



PCA example

(Figure: original data projected into full PC space; original data reconstructed using only a single PC.)



Choosing the dimension k

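The slides for this step are figures; a common heuristic (an assumption here, not necessarily the rule used in the original slides) is to keep the smallest k whose PCs capture a chosen fraction of the total variance:

import numpy as np

def choose_k(eigvals, threshold=0.95):
    # eigvals must be sorted in decreasing order
    explained = eigvals / eigvals.sum()
    cumulative = np.cumsum(explained)
    # Smallest k with cumulative variance >= threshold (+1 because argmax is 0-based)
    return int(np.argmax(cumulative >= threshold)) + 1

# Example: eigenvalues 5, 3, 1, 0.5, 0.5 -> the first 4 PCs capture 95% of the variance
print(choose_k(np.array([5.0, 3.0, 1.0, 0.5, 0.5])))   # 4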
PCA: a useful preprocessing step

- Helps reduce computational complexity.
- Can help supervised learning:
  - Reduced dimension → simpler hypothesis space.
  - Smaller VC dimension → less risk of overfitting.
- PCA can also be seen as noise reduction.
- Caveats:
  - Fails when data consists of multiple separate clusters.
  - Directions of greatest variance may not be most informative (i.e. greatest classification power).

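A sketch of PCA as a preprocessing step for a supervised learner, using scikit-learn (not part of the slides; the dataset, classifier, and number of components are arbitrary choices for illustration):

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)      # 64-dimensional pixel features
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Project onto the first 20 PCs, then fit the classifier on the reduced features
model = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))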


Off-the-shelf classifiers

Per Tom Dietterich:

"Methods that can be applied directly to data without requiring a great deal of time-consuming data preprocessing or careful tuning of the learning procedure."



Off-the-shelf criteria

slide thanks to Tom Dietterich (CS534, Oregon State Univ., 2005)



Practical advice on machine learning

from Andrew Ng at Stanford

slides:
http://cs229.stanford.edu/materials/ML-advice.pdf

video:
http://www.youtube.com/v/sQ8T9b-uGVE
(starting at 24:56)

