Subspace Methods
April 2004
Seong-Wook Joo
Dimensionality reduction
Achieved by extracting important features from the dataset
Learning: desirable to avoid the curse of dimensionality in pattern recognition
Classification: with a fixed sample size, classification performance decreases as the number of features increases
Linear Subspaces
Xdxn ≈ Udxk Qkxn
xi = Σb=1..k qbi ub (each sample as a linear combination of the k basis vectors)
Definitions/Notations
Xdxn: sample data set (n d-vectors)
Udxk: basis vector set (k d-vectors)
Qkxn: coefficient (component) set (n k-vectors)
Getting coefficients for orthonormal basis vectors: Qkxn = (Udxk)T Xdxn
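
Below is a minimal numpy sketch of this decomposition; the sizes (d=10, n=50, k=3) and the choice of an SVD-derived orthonormal basis are illustrative assumptions, not part of the slides.

    import numpy as np

    # Sketch of Xdxn ~= Udxk Qkxn with an orthonormal basis U
    # (illustrative sizes; any orthonormal rank-k U would do)
    rng = np.random.default_rng(0)
    d, n, k = 10, 50, 3
    X = rng.standard_normal((d, n))      # Xdxn: n sample d-vectors as columns

    # Orthonormal basis Udxk: here, the top-k left singular vectors of X
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, :k]

    # Coefficients for an orthonormal basis: Qkxn = (Udxk)T Xdxn
    Q = U.T @ X

    X_hat = U @ Q                        # rank-k reconstruction of X
    print(np.linalg.norm(X - X_hat))     # residual lying outside the subspace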
ICA
Assumption, Notation
Measured data is a linear combination of some set of independent signals (random variables x representing (x(1), ..., x(d)), or row d-vectors)
xi = ai1 s1 + ... + ain sn = ai S (ai: row n-vector)
zero-mean xi, si assumed
X = AS (Xnxd: measured data, i.e., n different mixtures, Anxn: mixing
matrix, Snxd: n independent signals)
Algorithm
Goal: given X, find A and S (or find W = A⁻¹ s.t. S = WX)
Key idea
By the Central Limit Theorem, a sum of independent random variables is more Gaussian than the individual r.v.s
A linear combination vX is maximally non-Gaussian when vX = si, i.e., when v = wi (naturally, this doesn't work when s is Gaussian)
Non-Gaussianity measures
Kurtosis (a 4th-order statistic), negentropy
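
A minimal FastICA-style sketch of this idea: whiten the mixtures, then find projections that maximize kurtosis via the standard fixed-point update with deflation. The two synthetic sources, the mixing matrix, and the iteration count are illustrative assumptions, not from the slides.

    import numpy as np

    rng = np.random.default_rng(1)
    t = np.linspace(0, 8, 2000)
    S = np.vstack([np.sign(np.sin(3 * t)),      # square wave (sub-Gaussian)
                   rng.laplace(size=t.size)])   # Laplacian noise (super-Gaussian)
    A = np.array([[1.0, 0.6], [0.5, 1.0]])      # mixing matrix, treated as unknown
    X = A @ S                                   # observed mixtures

    # Whiten: zero mean, identity covariance
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    Z = E @ np.diag(d ** -0.5) @ E.T @ X

    W = np.zeros((2, 2))
    for i in range(2):
        w = rng.standard_normal(2)
        for _ in range(200):
            w = (Z * (w @ Z) ** 3).mean(axis=1) - 3 * w  # kurtosis fixed point
            w -= W[:i].T @ (W[:i] @ w)                   # deflate found components
            w /= np.linalg.norm(w)
        W[i] = w

    S_hat = W @ Z   # recovered sources, up to permutation/sign/scale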
ICA Examples
Natural images
CCA
Assumption, Notation
Two sets of vectors X = [x1 ... xm], Y = [y1 ... yn]
X, Y: measured from the same semantic object (physical phenomenon)
Projection for each of the sets: x' = wxT x, y' = wyT y
Algorithm
Goal: given X, Y, find wx, wy that maximize the correlation ρ between x' and y':
ρ = E[x'y'] / sqrt(E[x'2] E[y'2])
  = E[wxT x yT wy] / sqrt(E[wxT x xT wx] E[wyT y yT wy])
  = (wxT X YT wy) / sqrt((wxT X XT wx)(wyT Y YT wy))
XXT = Cxx, YYT = Cyy: within-set covariances; XYT = Cxy: between-set covariance
Solutions for wx, wy by generalized eigenvalue problem or SVD
Taking the top k vector pairs Wx = (wx1 ... wxk), Wy = (wy1 ... wyk), the kxk correlation matrix of the projected k-vectors x', y' is diagonal with maximized diagonal entries
k ≤ min(m, n)
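
A minimal numpy sketch of this procedure, taking the SVD route: the singular vectors of the whitened between-set covariance Cxx^(-1/2) Cxy Cyy^(-1/2) give wx, wy, and the singular values are the canonical correlations. The two synthetic views sharing one latent signal (and the helper inv_sqrt) are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 500
    z = rng.standard_normal(n)                      # shared latent signal
    X = np.vstack([z + 0.1 * rng.standard_normal(n),
                   rng.standard_normal(n)])         # 2-dimensional view
    Y = np.vstack([-z + 0.1 * rng.standard_normal(n),
                   rng.standard_normal(n),
                   rng.standard_normal(n)])         # 3-dimensional view
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)

    Cxx, Cyy, Cxy = X @ X.T / n, Y @ Y.T / n, X @ Y.T / n

    def inv_sqrt(C):
        lam, E = np.linalg.eigh(C)
        return E @ np.diag(lam ** -0.5) @ E.T

    U, s, Vt = np.linalg.svd(inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy),
                             full_matrices=False)
    Wx = inv_sqrt(Cxx) @ U        # columns wx1 ... wxk
    Wy = inv_sqrt(Cyy) @ Vt.T     # columns wy1 ... wyk
    print(s)                      # canonical correlations (first one near 1)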
CCA Example
Comparisons
PCA: unsupervised
LDA: supervised
ICA: unsupervised; transforms into variables that are not only uncorrelated (2nd order) but also as independent as possible (higher order)
CCA: supervised
Kernel Methods
Kernels
Φ(·): nonlinear mapping to a high-dimensional space
Mercer kernels can be decomposed into a dot product:
K(x,y) = Φ(x)·Φ(y)
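
A quick numerical check of this decomposition, using the polynomial kernel K(x,y) = (x·y)² on R², whose feature map Φ is known in closed form; the test vectors are arbitrary.

    import numpy as np

    def phi(v):
        # Explicit feature map for K(x,y) = (x.y)^2 with x, y in R^2
        return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])
    print((x @ y) ** 2)       # kernel evaluated directly -> 1.0
    print(phi(x) @ phi(y))    # same value as a dot product in feature space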
Kernel PCA
Xdxn (columns of d-vectors) → Φ(X) (high-dimensional vectors)
Inner-product matrix = Φ(X)T Φ(X) = [K(xi,xj)] ≡ Knxn(X,X)
First k eigenvectors e: transform matrix Enxk = [e1 ... ek]
The actual feature-space eigenvectors are Φ(X)E
A new pattern y is mapped (into principal components) by
(Φ(X)E)T Φ(y) = ET Φ(X)T Φ(y) = ET Knx1(X,y)
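
A minimal numpy sketch of this recipe with an RBF kernel; gamma, the data sizes, and the random data are illustrative assumptions, and feature-space centering (not shown on the slide) is likewise omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(3)
    d, n, k = 2, 100, 2
    X = rng.standard_normal((d, n))              # Xdxn, columns are samples

    def kernel(A, B, gamma=0.5):
        # RBF kernel matrix [K(ai, bj)] between column sets A and B
        sq = (A * A).sum(0)[:, None] + (B * B).sum(0)[None, :] - 2.0 * A.T @ B
        return np.exp(-gamma * sq)

    K = kernel(X, X)                             # Knxn(X,X) = [K(xi, xj)]
    lam, V = np.linalg.eigh(K)                   # eigenvalues in ascending order
    E = V[:, ::-1][:, :k]                        # Enxk = [e1 ... ek]
    E = E / np.sqrt(lam[::-1][:k])               # normalize so Phi(X)E has
                                                 # unit-norm columns (standard KPCA)

    y = rng.standard_normal((d, 1))              # new pattern
    components = E.T @ kernel(X, y)              # ET Knx1(X, y), a kx1 vector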