
Principal Components Analysis
ISB / MLSL / Machine Learning
Dr. Shailesh Kumar
Orthogonal Basis Functions

[Figure: a vector x in 3-D space with orthonormal basis vectors i, j, k and coordinates x1, x2, x3]

$x = x_1\, i + x_2\, j + x_3\, k = (i^T x)\, i + (j^T x)\, j + (k^T x)\, k$

UNORDERED – all bases are equally important!

$3528 = 3 \times 10^3 + 5 \times 10^2 + 2 \times 10^1 + 8 \times 10^0$

ORDERED – bases have decreasing importance!
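The decomposition can be verified numerically. A minimal MATLAB sketch (e1, e2, e3 are illustrative stand-ins for i, j, k, since i is MATLAB's built-in imaginary unit):

e1 = [1; 0; 0];  e2 = [0; 1; 0];  e3 = [0; 0; 1];   % orthonormal basis
x  = [3; 5; 2];                                     % an arbitrary vector

c = [e1' * x; e2' * x; e3' * x];      % coefficients = projections onto the basis
x_rec = c(1) * e1 + c(2) * e2 + c(3) * e3;          % basis expansion
disp(norm(x - x_rec));                % 0 (up to floating point): exact recovery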
What is a Projection?

[Figure: RAW data in 3-D space, projected along a DIRECTION of projection, giving PROJECTED data in 2-D space]
Is there a “Best” Projection?

[Figure: four candidate projection directions labeled A, B, C, D]

How do we “measure” the “goodness” of a projection?

The one that “preserves” the maximum “information”.
Projecting a Cloud of Data

[Figure: RAW data in 2-D space projected onto 1-D space along two different directions]

Which is the “better” projection here? Which is the “best” projection?

How do we “measure” the “goodness” of a projection?

The one that “preserves” the maximum “information”.

Which is the “best” and “worst” projection?

[Figure: candidate projection directions B, C, D through the same cloud]
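One concrete way to “measure” goodness is to score each candidate direction by the variance of the projected data. A minimal MATLAB sketch on a synthetic 2-D cloud (the data and the four candidate angles are assumptions for illustration):

rng(0);
data = randn(200, 2) * [2 0; 0 0.5];         % elongated 2-D cloud

angles = [0, 45, 90, 135] * pi / 180;        % four candidate directions
for a = angles
    w = [cos(a); sin(a)];                    % unit projection direction
    y = data * w;                            % projected 1-D data
    fprintf('angle %5.1f deg: variance = %.3f\n', a * 180 / pi, var(y));
end
% the "best" direction preserves the most variance; the "worst" the least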
Two Complete and Orthogonal Projections

[Figure: a 2-D cloud with its FIRST and SECOND Principal Components as two orthogonal projection directions]
PCA | Finding the Best Dimensions to Project Data

Principal Components Analysis in Nature

[Figure: five panels showing the 1st, 2nd, 3rd, 4th, and 5th Principal Components]
Principal Components Analysis

Project the data to preserve maximum variance.

Projection: $x^{(n)} \rightarrow w^T x^{(n)} = y^{(n)}$

Mean in projected space: $m_Y = \frac{1}{N} \sum_{n=1}^{N} y^{(n)} = w^T m_X$

Variance in projected space: $\sigma_Y^2 = \frac{1}{N} \sum_{n=1}^{N} \left( y^{(n)} - m_Y \right)^2 = w^T C w$

[Figure: a cloud of points x(n) in 2-D with candidate directions w1 and w2; the projected values y(n) scatter around their mean mY]
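A quick numeric check of the identity $\sigma_Y^2 = w^T C w$ on synthetic data (all names illustrative):

rng(1);
X = randn(1000, 3) * [1 0.5 0; 0 1 0.3; 0 0 2];   % correlated 3-D data
w = [1; 2; -1];  w = w / norm(w);                 % a unit projection direction

y = X * w;                                        % projected data
C = cov(X);
fprintf('var(y) = %.4f,  w''*C*w = %.4f\n', var(y), w' * C * w);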
Principal Components Analysis

Project the data onto the direction in which its variance is maximum.

Projection: $x^{(n)} \rightarrow w^T x^{(n)} = y^{(n)}$

Mean in projected space: $m_Y = w^T m_X$

Variance in projected space: $\sigma_Y^2 = w^T C w$

$w^* = \underset{\|w\| = 1}{\arg\max}\; w^T C w = \text{EigenVector}(C)$ (the eigenvector of C with the largest eigenvalue)
% d = number of principal components to keep (chosen by the user)
data0 = data - mean(data);                   % mean-center the data
cv = cov(data0);                             % covariance matrix C
[eig_vectors, eig_values] = eig(cv);         % eigen-decomposition of C
[~, q] = sort(diag(eig_values), 'descend');  % order components by variance
pca_proj = data0 * eig_vectors(:, q(1:d));   % project onto the top d components
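As a usage sketch, the same steps can be run on the IRIS data of the next slide (this assumes MATLAB's Statistics and Machine Learning Toolbox, which ships the fisheriris dataset):

load fisheriris                              % provides meas (150x4) and species
data = meas;                                 % four numeric IRIS features
d = 2;                                       % keep the top 2 components
data0 = data - mean(data);
[eig_vectors, eig_values] = eig(cov(data0));
[~, q] = sort(diag(eig_values), 'descend');
pca_proj = data0 * eig_vectors(:, q(1:d));   % 150x2 projection
scatter(pca_proj(:, 1), pca_proj(:, 2));     % view the data in the PC plane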
PCA for IRIS data

cov(data) =
    0.6811   -0.0390    1.2652    0.5135
   -0.0390    0.1868   -0.3196   -0.1172
    1.2652   -0.3196    3.0924    1.2877
    0.5135   -0.1172    1.2877    0.5785

eig_vectors =
   -0.3173    0.5810    0.6565    0.3616
    0.3241   -0.5964    0.7297   -0.0823
    0.4797   -0.0725   -0.1758    0.8566
   -0.7511   -0.5491   -0.0747    0.3588

eig_values =
    0.0235    0.0780    0.2406    4.1967
Data = Signal (Structure) + Noise (Background)

$x = (w_1^T x)\, w_1 + (w_2^T x)\, w_2 + (w_3^T x)\, w_3 + (w_4^T x)\, w_4$

The leading (large-eigenvalue) terms carry the SIGNAL; the trailing (small-eigenvalue) terms carry the NOISE.

Fraction of variance captured by the top d of D components:

$S(d \mid D) = \dfrac{\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_d^2}{\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_d^2 + \cdots + \lambda_D^2}$

[Figure: eigenvalues 4.1967, 0.2406, 0.0780, 0.0235 and the % variance captured, plotted against the number of Principal Components (d)]
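The variance-captured curve is a one-liner once the eigenvalues are sorted. A short MATLAB sketch, following the slide's $\lambda^2$ convention and using the IRIS eigenvalues shown above:

lambda = [4.1967, 0.2406, 0.0780, 0.0235];   % eigenvalues, descending
S = cumsum(lambda.^2) ./ sum(lambda.^2);     % S(d|D) for d = 1, ..., D
disp(S);                                     % the first PC dominates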


PCA for MNIST Data

Data = Signal (Structure) + Noise (Background)

$x = (v_1^T x)\, v_1 + \cdots + (v_{30}^T x)\, v_{30} + \cdots + (v_{784}^T x)\, v_{784}$

The first ~30 components carry the SIGNAL; the remaining components carry the NOISE.

$S(d \mid D) = \dfrac{\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_d^2}{\lambda_1^2 + \lambda_2^2 + \cdots + \lambda_d^2 + \cdots + \lambda_D^2}$

[Figure: eigenvalue spectrum and % variance captured vs. the number of Principal Components (d)]


Loss of information = Degree of Reconstruction

[Figure: an MNIST digit reconstructed from its top 1, 5, 10, 15, 20, 25, 30, 35, and 40 principal components, shown next to the original (ORIG)]
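A sketch of such a reconstruction in MATLAB (the data matrix here is a synthetic stand-in; for MNIST, X would be the N x 784 matrix of flattened images):

X = randn(500, 100);                          % stand-in for an N x D data matrix
mu = mean(X);
X0 = X - mu;                                  % mean-center
[V, L] = eig(cov(X0));
[~, q] = sort(diag(L), 'descend');
V = V(:, q);                                  % components ordered by variance

d = 30;                                       % keep the top 30 components
X_rec = (X0 * V(:, 1:d)) * V(:, 1:d)' + mu;   % project down, then map back
fprintf('mean squared reconstruction error: %.4f\n', ...
    mean(sum((X - X_rec).^2, 2)));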
When to use PCA?

▪ When the DATA is MULTI-VARIATE and NUMERIC

▪ When the Number of FEATURES is LARGE

▪ When the Data is Unimodal

▪ When CLASS labels are NOT present / are ignored

▪ To VISUALIZE the data – top 2 or top 3 PCs

▪ To REDUCE the #Dimensions/Features for later stages

▪ To REMOVE Noise in features and Outliers in data
