Dimensionality Reduction
Jayanta Mukhopadhyay
Dept. of Computer Science and Engg.
Books
- Chapter 6 of "Introduction to Machine Learning" by Ethem Alpaydin.
Why reduce dimension?
- Reduces the complexity of inference, memory, and computation.
- In most learning algorithms, the complexity depends on the number of input dimensions, d.
PCA: Algorithm
- Input: a set of data points S = {x_j = (x_1j, x_2j, ..., x_dj) | x_j in R^d}.
- Output: a set of k eigenvectors giving the transformation matrix W = [w_1, w_2, ..., w_k].
1. Compute the mean of the data points.
2. Translate all data points by subtracting the mean.
3. Compute the covariance matrix of the set.
4. Compute the eigenvectors and eigenvalues, sorted by eigenvalue in decreasing order.
5. Choose k such that the fraction of variance accounted for exceeds a threshold.
6. Use those k components for representing any data point.
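A minimal NumPy sketch of these six steps (the function name pca_fit and the 0.95 variance threshold are illustrative choices, not from the slides):

```python
import numpy as np

def pca_fit(X, var_threshold=0.95):
    """Return the mean and the transformation matrix W = [w1, ..., wk]."""
    mu = X.mean(axis=0)                   # step 1: mean of the data points
    Xc = X - mu                           # step 2: translate to the mean
    cov = np.cov(Xc, rowvar=False)        # step 3: covariance matrix
    vals, vecs = np.linalg.eigh(cov)      # step 4: eigenvalues/eigenvectors
    order = np.argsort(vals)[::-1]        # sort in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    frac = np.cumsum(vals) / vals.sum()   # step 5: fraction of variance
    k = int(np.searchsorted(frac, var_threshold)) + 1
    return mu, vecs[:, :k]                # step 6: W with k components

# A data point x is then represented as z = W.T @ (x - mu).
```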
Example
- Data: {(5, 3, 2), (4, 6, 0), (3, -7, 14), (2, 5, 3), (3, 13, -6)}.
- Perform PCA and, if applicable, reduce the dimension of the data.
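A quick numeric check of this example (a sketch; note that every point satisfies x1 + x2 + x3 = 10, so the data lies on a plane and one eigenvalue of the covariance matrix is zero):

```python
import numpy as np

X = np.array([[5, 3, 2], [4, 6, 0], [3, -7, 14], [2, 5, 3], [3, 13, -6]],
             dtype=float)
cov = np.cov(X - X.mean(axis=0), rowvar=False)
vals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues in decreasing order
print(vals)   # the smallest eigenvalue is (numerically) zero:
              # the data lies on the plane x1 + x2 + x3 = 10,
              # so the dimension can be reduced from 3 to 2
```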
Example (contd.)
[Figures: redundant dimension and coordinate transformation; the original axes X, Y are rotated to the eigenvector directions e1, e2.]
PCA properties
- PCA diagonalizes the data covariance matrix Σ:
  - Σ = C D C^T,
  - D: diagonal matrix;
  - C: columns are unit eigenvectors of Σ, so C C^T = C^T C = I.
- Components are uncorrelated:
  - the covariance between any two components is zero.
- By normalizing components with their variances (eigenvalues), Euclidean distance can be used for classification.
- The reconstruction error from the lower dimensional space is minimum among all linear transforms of the data.
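A small NumPy check of these properties on synthetic data (the random data and the linear mixing matrix are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[2., 0, 0], [1, 1, 0], [0, 0, .5]])
S = np.cov(X, rowvar=False)                # data covariance Σ
vals, C = np.linalg.eigh(S)                # Σ = C D C^T
D = np.diag(vals)
print(np.allclose(C @ D @ C.T, S))         # True: Σ is diagonalized
print(np.allclose(C.T @ C, np.eye(3)))     # True: C^T C = I (orthonormal)
Z = (X - X.mean(axis=0)) @ C               # projected components
print(np.allclose(np.cov(Z, rowvar=False), D))  # True: uncorrelated
```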
Application of PCA
- Data compression:
  - provides the optimal set of orthonormal basis vectors for a set of data points;
  - data dependent.
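A compression sketch under these assumptions: keep the k leading eigenvectors as the basis and store only the k coefficients per point (the function name is illustrative):

```python
import numpy as np

def pca_compress(X, k):
    mu = X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(X - mu, rowvar=False))
    W = vecs[:, ::-1][:, :k]      # k leading eigenvectors (basis vectors)
    Z = (X - mu) @ W              # compressed representation (n x k)
    X_hat = Z @ W.T + mu          # reconstruction from the k coefficients
    return Z, X_hat

# The basis W is computed from the data itself ("data dependent"):
# a different data set yields a different optimal basis.
```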
[Figure: PCA components (bands) 6 through 20 of a hyperspectral image. After component 20, not much detail is available, showing the removal of data redundancy.]
Courtesy: Li et al., "A New Subspace Approach for Supervised Hyperspectral Image Classification", 2011 IEEE International Geoscience and Remote Sensing Symposium.
Application of PCA
- Factor analysis:
  - highlights decorrelated factors;
  - useful for classification.
- For example, eigenfaces for representing human faces:
  - perform PCA on a large set of human face images cropped to the same size;
  - any arbitrary face is expressed as a linear combination of the eigenfaces;
  - the coefficients of the linear combination represent the face.
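A hedged eigenface sketch; `faces` is a hypothetical (N, h, w) array of same-size cropped grayscale face images, and the SVD of the centered stack is used in place of an explicit covariance eigendecomposition:

```python
import numpy as np

def eigenfaces(faces, k):
    X = faces.reshape(len(faces), -1).astype(float)  # flatten each image
    mu = X.mean(axis=0)                              # the "mean face"
    # Right singular vectors of the centered stack are the eigenvectors
    # of the covariance matrix of the flattened images.
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                                # k eigenfaces (rows)

def face_coefficients(face, mu, E):
    # Coefficients of the linear combination that represents the face.
    return E @ (face.ravel() - mu)
```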
PCA: Eigenfaces
http://en.wikipedia.org/wiki/Image:Eigenfaces.png
Application of PCA
- Classification / high-level processing:
  - using the representation derived by principal component analysis.
[Diagram: input data → PCA basis vectors → principal components → classification → output.]
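One possible realization of this pipeline with scikit-learn (an assumed dependency; the digits data set and the logistic-regression classifier are illustrative stand-ins):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
clf = make_pipeline(PCA(n_components=0.95),      # keep 95% of the variance
                    LogisticRegression(max_iter=1000))
clf.fit(X, y)                                    # classify in PCA space
print(clf.score(X, y))
```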
Linear discriminant analysis
- For the purpose of classification, dimension reduction using PCA may not work:
  - it captures the direction of maximum variance of a data set;
  - for labelled data sets, it does not capture the direction of maximum separation between the groups of data points with differing labels.
[Figure: two classes that are well separated, but not along the direction of the principal component.]
Fisher linear discriminant
- Consider a set of data points S = {x_i | x_i in R^d}:
  - N1 points in class w1;
  - N2 points in class w2;
  - say N1 + N2 = N (total data points).
- Consider a line with direction u.
- Projection of data point x_i on u: y_i = x_i^T u.
  - A one-dimensional subspace representing the data.
Separation between projected data of different classes
- m1 = mean of the data points in w1.
- m2 = mean of the data points in w2.
- Projections of the means:
  - m_y1 = m1^T u,
  - m_y2 = m2^T u.
- A measure of separation: D = |m_y1 - m_y2|.
- Does not consider the variance of the data.
A better measure of separation
- Normalize by a factor proportional to the class variances.
- Scatter of the data belonging to class C (class variance × number of samples):
  - s_C^2 = Σ_{y_i in C} (y_i - m_yC)^2.
- To maximize: J(u) = (m_y1 - m_y2)^2 / (s_1^2 + s_2^2).
- Solution: in terms of the within-class scatter matrix S_W = S_1 + S_2, where S_i = Σ_{x in w_i} (x - m_i)(x - m_i)^T, the maximizing direction is
  - u ∝ S_W^{-1} (m_1 - m_2).
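A minimal sketch of this solution for two labelled classes (the function name fisher_direction is illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Fisher discriminant direction for class samples X1 (n1 x d), X2 (n2 x d)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter S_W = S_1 + S_2 (class variance x samples)
    S1 = (X1 - m1).T @ (X1 - m1)
    S2 = (X2 - m2).T @ (X2 - m2)
    SW = S1 + S2
    u = np.linalg.solve(SW, m1 - m2)   # u ∝ S_W^{-1} (m1 - m2)
    return u / np.linalg.norm(u)

# Projection of a data point x onto the discriminant: y = x @ u
```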
Example (contd.)
- LDA: [Figure: projection of the example data onto the LDA direction; the classes are well separated.]
Example (contd.)
- PCA: [Figure: projection onto the principal component; reduced margin of separation between the classes.]
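The contrast in these two figures can be reproduced on synthetic data (an illustrative construction, reusing fisher_direction from the sketch above): two elongated clusters whose maximum-variance direction differs from the direction of maximum class separation.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3., 0.], [0., .3]])              # elongated cluster shape
X1 = rng.normal(size=(200, 2)) @ A + [0, 1.5]   # class w1
X2 = rng.normal(size=(200, 2)) @ A + [0, -1.5]  # class w2
X = np.vstack([X1, X2])

# PCA direction: leading eigenvector of the pooled covariance
vals, vecs = np.linalg.eigh(np.cov(X - X.mean(axis=0), rowvar=False))
u_pca = vecs[:, -1]
u_lda = fisher_direction(X1, X2)

for name, u in [("PCA", u_pca), ("LDA", u_lda)]:
    y1, y2 = X1 @ u, X2 @ u
    # Normalized separation of the projected class means
    print(name, abs(y1.mean() - y2.mean()) / np.sqrt(y1.var() + y2.var()))
```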
Summary
- Feature selection (subset selection):
  - forward and backward sequential selection methods.
- Unsupervised dimension reduction method:
  - PCA.
- Supervised dimension reduction method:
  - LDA.