EE769-11 Dimension Reduction


EE769 Intro to ML

Dimension Reduction
Amit Sethi
Faculty member, IIT Bombay
Learning objectives

• Write the advantages of dimension reduction

• Write the steps of principal component analysis

• Extend PCA to nonlinear methods using the kernel trick

2
Dimensionality reduction: what & why?
• Objective:
• For x ∈ R^D, find a mapping f : x → y, y ∈ R^d, d < D
• Such that there exists an f⁻¹ that can (almost) reconstruct x

• Advantages:
• Less redundancy, easier classification
• Smaller storage, faster search
• Unravel meaningful latent variables
3
Example: Only two latent variables

4
Example 2: Many latent variables, but still fewer than “pixels”

• What defines the shape of a face?


• Add pose and expression…
• Still a lot less variables than the pixels of a face image
5
Image source: Pixabay.com
Visualizing manifolds

“An Introduction to Locally Linear Embedding,” Lawrence K. Saul


6
Inherent linear dimension of data
Data along a line on a 2-D plane
Data along a line or hyperplane in 3-D

7
Deviations from the underlying hyperplane

8
Principal component analysis
Objective:
• Find a way to select top d < D orthogonal directions that explain the
maximum possible variance of the data

Outline:
• Mean-center the data
• Loop
• Find the direction of maximum variance that is orthogonal to all previously found directions
• Until the desired percentage of variance is captured or the desired number of dimensions is reached (see the sketch below)
9
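
The loop above can be sketched in numpy as power iteration with deflation; this particular implementation is an illustrative assumption, and in practice the eigendecomposition on the next slide is used.

import numpy as np

def top_directions(X, d, n_iter=200):
    # Greedy sketch of the PCA loop: repeatedly find the direction of
    # maximum variance orthogonal to the directions already found.
    Z = X - X.mean(axis=0)                      # mean-center the data
    C = Z.T @ Z / len(Z)                        # covariance matrix
    directions = []
    for _ in range(d):
        u = np.random.randn(C.shape[0])
        for _ in range(n_iter):                 # power iteration -> leading eigenvector
            u = C @ u
            u /= np.linalg.norm(u)
        directions.append(u)
        C -= (u @ C @ u) * np.outer(u, u)       # deflate: remove the captured variance
    return np.stack(directions, axis=1)         # D x d matrix of directions
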
Complete algorithm
• Mean centering: z = x − μ_x
• Covariance: N C = Z^T Z
• Eigendecomposition: C = U Λ U^T ; C u_j = λ_j u_j
• Selection: C_d = U Λ_d U^T = U_d Λ_d U_d^T, d < D
• Drop dimensions: set the lower D − d eigenvalues to zero
• Projection: Y = Z U_d
• Reconstruction: Z_d = Y U_d^T = Z U_d U_d^T
• Reconstruction error: ||Z − Z_d||_2^2 ∝ tr(Λ − Λ_d)
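
A minimal numpy sketch of the complete algorithm above; the function and variable names are illustrative.

import numpy as np

def pca(X, d):
    mu = X.mean(axis=0)
    Z = X - mu                                  # mean centering: z = x - mu_x
    C = Z.T @ Z / len(Z)                        # covariance: N C = Z^T Z
    lam, U = np.linalg.eigh(C)                  # eigendecomposition: C = U Lam U^T
    order = np.argsort(lam)[::-1]               # sort eigenvalues, largest first
    lam, U = lam[order], U[:, order]
    U_d = U[:, :d]                              # keep the top-d eigenvectors
    Y = Z @ U_d                                 # projection: Y = Z U_d
    Z_d = Y @ U_d.T                             # reconstruction: Z_d = Z U_d U_d^T
    err = np.sum((Z - Z_d) ** 2)                # error grows with the dropped eigenvalues
    return Y, Z_d + mu, err
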
Example: Eigenfaces

Turk, Matthew, and Alex Pentland. "Eigenfaces for recognition." Journal of Cognitive Neuroscience 3.1 (1991): 71–86.

11
Data often lies around a nonlinear manifold

Shan, Caifeng, Shaogang Gong, and Peter W. McOwan. "Appearance manifold of facial expression." International Workshop on Human-Computer Interaction. Springer, Berlin, Heidelberg, 2005.

12
Kernel PCA introduces nonlinearity by
mapping data to a feature space

13
Kernel PCA avoids calculating features directly
by using the kernel trick
• Recall PCA: C u_i = λ_i u_i ; N C = Z^T Z
• Substituting features: Σ_n φ(z_n) φ(z_n)^T u_i = N λ_i u_i
• Substitute: u_i = Σ_n a_{in} φ(z_n)
• This means: Σ_n k_{l,n} Σ_m a_{im} k_{n,m} = N λ_i Σ_n a_{in} k_{l,n}
• In matrix form: K² a_i = λ_i N K a_i
• Solve the eigenvalue problem: K a_i = λ_i N a_i

• There is also a mean-centering step in feature space: K̃ = K − 1_N K − K 1_N + 1_N K 1_N, where 1_N is the N×N matrix with every entry equal to 1/N
• Ref: https://www.ics.uci.edu/~welling/classnotes/papers_class/Kernel-PCA.pdf
14
Schölkopf, Bernhard (1998). "Nonlinear Component Analysis as a Kernel Eigenvalue Problem". Neural Computation. 10 (5): 1299–1319.
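
A minimal numpy sketch of kernel PCA as derived above; the RBF kernel and its parameter are assumptions chosen for illustration.

import numpy as np

def kernel_pca(X, d, gamma=1.0):
    # Gram matrix with an RBF kernel (the kernel choice is an assumption)
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    N = len(X)
    one = np.full((N, N), 1.0 / N)
    K_c = K - one @ K - K @ one + one @ K @ one  # mean-centering in feature space
    lam, A = np.linalg.eigh(K_c)                 # solves K_c a_i = (N lambda_i) a_i
    order = np.argsort(lam)[::-1]
    lam, A = lam[order][:d], A[:, order][:, :d]
    A = A / np.sqrt(np.maximum(lam, 1e-12))      # scale so each u_i has unit norm
    return K_c @ A                               # projections of the training points
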
Stochastic Neighbor Embedding
• p_{j|i} = softmax(− ||x_i − x_j||² / 2σ_i²)
• q_{j|i} = softmax(− ||y_i − y_j||²)

• Objective: Minimize the KL divergence between P and Q

“Stochastic Neighbor Embedding,” Geoffrey Hinton and Sam Roweis
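
A small numpy sketch of these conditional probabilities and the KL objective; using a single global σ (rather than a per-point σ_i) is a simplifying assumption.

import numpy as np

def sne_objective(X, Y, sigma=1.0):
    def conditional(D, scale):
        P = np.exp(-D / scale)
        np.fill_diagonal(P, 0.0)                 # exclude j == i
        return P / P.sum(axis=1, keepdims=True)  # softmax over j for each i
    Dx = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # ||x_i - x_j||^2
    Dy = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)   # ||y_i - y_j||^2
    P = conditional(Dx, 2 * sigma ** 2)          # p_{j|i}
    Q = conditional(Dy, 1.0)                     # q_{j|i}
    return np.sum(P * np.log((P + 1e-12) / (Q + 1e-12)))   # KL(P || Q), to be minimized
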


Auto-encoder bottlenecks for dimension
reduction
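
A minimal PyTorch sketch of an auto-encoder with a d-dimensional bottleneck; the layer sizes and hyperparameters are illustrative assumptions, not taken from the slides.

import torch
from torch import nn

D, d = 784, 32                                   # input and bottleneck dimensions (assumed)
encoder = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, d))
decoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, D))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(x):                               # x: batch of shape (B, D)
    y = encoder(x)                               # low-dimensional code (the bottleneck)
    x_hat = decoder(y)                           # reconstruction of the input
    loss = loss_fn(x_hat, x)                     # reconstruction error, as in PCA
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
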
