Principal Component Analysis (PCA) Final
3 Dimension to 2 Dimension
Sad or Happy?
[Figure labels: SAD, HAPPY; axes: Weight, Age]
Analysing Data
The BLUE line passing through the data captures the DIRECTION of maximum variation AND the MAGNITUDE of maximum variation.
The RED line is perpendicular to the BLUE line and captures the DIRECTION and MAGNITUDE of the second-highest variation.
The GREEN line captures the DIRECTION and MAGNITUDE of the third-highest variation.
[Figure axes: Income, Experience, Age]
Dimension Reduction
3D: 100% of the variation is captured along 3 dimensions.
2D: 80%-90% of the variation is captured along 2 dimensions, PC1 and PC2.
The 10%-20% of variation along the GREEN line is lost because the 3rd dimension has been removed.
[Figure: 3D plot (axis labels include Income and Experience) reduced to a 2D plot with axes PC1 and PC2]
Do we need so many data variables to extract the relevant information for our
business problem?
Data can be highly correlated, for example Experience Level and Age.
An important relationship
Example
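$$
A\mathbf{v} = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \end{bmatrix} = 2 \times \begin{bmatrix} 1 \\ 2 \end{bmatrix}
$$

$$
A\mathbf{v} = \lambda\mathbf{v}
$$

The direction (Ф) of the vector did not change: 𝒗 and A𝒗 point the same way.
Only the vector became longer.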
When a matrix is multiplied by one of its eigen vectors, the result is that eigen vector multiplied by a scalar. This scalar is called the eigen value.
Every eigen value has an eigen vector associated with it.
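The relationship can be checked numerically. The sketch below assumes the example on the slide reads A = [[0, 1], [2, 1]] and v = (1, 2); the helper name `mat_vec` is made up for this illustration.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Assumed reading of the slide example.
A = [[0, 1],
     [2, 1]]
v = [1, 2]

Av = mat_vec(A, v)                       # [0*1 + 1*2, 2*1 + 1*2]
eigen_value = Av[0] / v[0]               # ratio of the first components
scaled_v = [eigen_value * x for x in v]  # lambda * v

print(Av, eigen_value, scaled_v)
```

If `Av` and `scaled_v` agree, v is an eigen vector of A and the ratio is its eigen value.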
Observation   X1    X2
5             3.1   3
6             2.3   2.7
7             2     1.6
8             1     1.1

Highly correlated data: r = 92.5%
[Figure: scatter plot of the data with the Center Point of Data marked]
Numerical Example: Compute the Variance and
Covariance of the Centered Data
Step 4: The variance and covariance can be computed with the Variance.S and Covariance.S functions (the sample versions, which divide by n - 1). This gives the variance-covariance matrix.
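As a rough sketch of what this step computes, the function below implements the sample covariance with the n - 1 divisor. The function name is hypothetical. Only observations 5 to 8 are legible in the data table, and reading the first two numbers in each of those rows as X1 and X2 is an assumption, so the result will not reproduce the matrix used in the rest of the example.

```python
def sample_covariance(xs, ys):
    """Sample covariance (n - 1 divisor). With xs == ys this is the sample variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centering happens inside the sum: subtract each mean before multiplying.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Assumption: the four legible rows only (observations 5 to 8), not the full data set.
x1 = [3.1, 2.3, 2.0, 1.0]
x2 = [3.0, 2.7, 1.6, 1.1]

cov_matrix = [
    [sample_covariance(x1, x1), sample_covariance(x1, x2)],
    [sample_covariance(x2, x1), sample_covariance(x2, x2)],
]

for row in cov_matrix:
    print(row)
```

The diagonal holds the two variances and the off-diagonal entries hold the covariance, which is why the matrix is symmetric.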
Numerical Example: Compute the Eigen Value
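$$
A\mathbf{v} = \lambda\mathbf{v}
$$

𝝀 is a scalar. Multiplying it by the Identity Matrix 𝑰 converts it into matrix form so that operations on matrices can be performed:

$$
A\mathbf{v} = \lambda I\mathbf{v} \quad\Longrightarrow\quad (A - \lambda I)\mathbf{v} = \mathbf{0}
$$

𝒗, the eigen vector, cannot be the zero vector, so the matrix 𝑨 − 𝝀𝑰 must be singular:

$$
\det(A - \lambda I) = 0
$$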
$$
\det(A - \lambda I) = 0
$$

$$
\det\left( \begin{bmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) = 0
$$

$$
\det\left( \begin{bmatrix} 0.616556 & 0.615444 \\ 0.615444 & 0.716556 \end{bmatrix} - \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} \right) = 0
$$

$$
\begin{vmatrix} 0.616556 - \lambda & 0.615444 \\ 0.615444 & 0.716556 - \lambda \end{vmatrix} = 0
$$
Expanding the determinant gives a quadratic equation in λ of the form aλ² + bλ + c = 0, which has two roots. The roots can be obtained from the standard formula

$$
\lambda_1, \lambda_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = 1.284028,\ 0.049083
$$
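The roots can be reproduced with a few lines of code. This is a minimal sketch that relies on the 2x2 identity det(A − λI) = λ² − trace(A)·λ + det(A); the variable names are chosen here for illustration.

```python
import math

# Variance-covariance matrix from the numerical example.
A = [[0.616556, 0.615444],
     [0.615444, 0.716556]]

# Coefficients of a*lambda^2 + b*lambda + c = 0 for a 2x2 matrix.
a = 1.0
b = -(A[0][0] + A[1][1])                      # minus the trace
c = A[0][0] * A[1][1] - A[0][1] * A[1][0]     # the determinant

root = math.sqrt(b * b - 4 * a * c)
lambda1 = (-b + root) / (2 * a)   # larger eigen value
lambda2 = (-b - root) / (2 * a)   # smaller eigen value

print(lambda1, lambda2)
```

Because the matrix entries are rounded to six decimals, the printed roots may differ from the quoted eigen values in the last decimal place.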
For each eigen value there is a corresponding eigen vector, which is also known as a Principal Component.
Numerical Example: Compute Eigen Vector
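For λ1 = 1.284028, solve (𝑨 − 𝝀𝟏 𝑰)𝒗 = 𝟎:

$$
\begin{aligned}
-0.66747\,V_{11} + 0.615444\,V_{12} &= 0 \\
0.615444\,V_{11} - 0.56747\,V_{12} &= 0
\end{aligned}
$$

Setting V12 = 1 gives V11 = 0.92205. The length of this vector is √(0.92205² + 1²) = √1.850176 = 1.360214, and dividing each component by the length gives the unit eigen vector V1:

      Component / length        Unit vector
V11   0.92205 / 1.360214        0.677874
V12   1 / 1.360214              0.735179

The same steps for λ2 = 0.049083, with V21 set to 1, give the unit eigen vector V2:

      Component / length        Unit vector
V21   1 / 1.360214              0.735179
V22   -0.922053 / 1.360214      -0.677874

Check the magnitude of each vector and it will be approx. 1.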
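The two unit eigen vectors form the columns of V:

$$
V = \begin{bmatrix} 0.677874 & 0.735179 \\ 0.735179 & -0.677874 \end{bmatrix}
$$

[Figure: plot with both axes running from -1.5 to 1.5, labelled λ1 = 1.284028 and λ2 = 0.049083]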
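The eigen vector calculation can be sketched in code as follows. The function name `unit_eigen_vector` is hypothetical. It solves only the first row of (A − λI)v = 0 with the second component fixed at 1, which is one of several equivalent ways to do it. The sign of an eigen vector is arbitrary, so the sketch flips it to make the first component positive, matching the values quoted in this example.

```python
import math

# Variance-covariance matrix and eigen values from the numerical example.
A = [[0.616556, 0.615444],
     [0.615444, 0.716556]]

def unit_eigen_vector(matrix, eigen_value):
    """Solve the first row of (A - lambda*I) v = 0 with v2 = 1, then scale to length 1."""
    a11 = matrix[0][0] - eigen_value
    a12 = matrix[0][1]
    # a11*v1 + a12*v2 = 0 with v2 = 1 gives v1 = -a12 / a11.
    v1, v2 = -a12 / a11, 1.0
    length = math.hypot(v1, v2)
    v1, v2 = v1 / length, v2 / length
    # Sign convention only: make the first component positive.
    if v1 < 0:
        v1, v2 = -v1, -v2
    return [v1, v2]

V1 = unit_eigen_vector(A, 1.284028)
V2 = unit_eigen_vector(A, 0.049083)

print(V1, V2)
```

Both vectors should have length 1 and be perpendicular to each other, up to the rounding in the inputs.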
Numerical Example: Transformation of Data
What does Transformation of Data Imply?
[Figure: Plot of X1 versus X2 (both axes from 0 to 3.5) next to the Transformed data (horizontal axis from -2.000 to 2.000, vertical axis from -0.500 to 0.400)]
Numerical Example: Analysis of Eigen Vector and
Eigen Value
The total variance is λ1 + λ2 = 1.333111, which represents 100% of the variation.
The first eigen vector, or First Principal Component, captures about 96.3% of the variance. This is computed from λ1 / (λ1 + λ2).
The second eigen vector, or Second Principal Component, captures about 3.7% of the variance. This is computed from λ2 / (λ1 + λ2).
Because the first principal component explains about 96.3% of the variance, the second principal component can be dropped without losing much information.
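The shares of variance follow directly from the two eigen values. A minimal sketch, with variable names chosen for illustration:

```python
# Eigen values from the numerical example.
eigen_values = [1.284028, 0.049083]

total_variance = sum(eigen_values)
explained = [value / total_variance for value in eigen_values]

# Shares as percentages, rounded to one decimal place.
percentages = [round(100 * share, 1) for share in explained]

print(total_variance, percentages)
```

The same ratio works for any number of components: each eigen value divided by the sum of all eigen values.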
[Figure: Plot of X1 versus X2 (both axes from 0 to 3.5)]
Values listed under V1: 0.828, -1.778, 0.992, 0.274, 1.676, 0.913, -0.099, -1.145, -0.438, -1.224
Reduced from 2 dimensions to one dimension
[Figure: the V1 values on a single axis running from -1.778 through 0 to 1.676, labelled Group 1 and Group 2]
PCA Model
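The columns of X are X1 and X2, the columns of V are V1 and V2, and the columns of Y are Y1 and Y2.

$$
X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \\ x_{31} & x_{32} \end{bmatrix}, \qquad
V = \begin{bmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{bmatrix}
$$

$$
Y = XV = \begin{bmatrix}
x_{11}v_{11} + x_{12}v_{21} & x_{11}v_{12} + x_{12}v_{22} \\
x_{21}v_{11} + x_{22}v_{21} & x_{21}v_{12} + x_{22}v_{22} \\
x_{31}v_{11} + x_{32}v_{21} & x_{31}v_{12} + x_{32}v_{22}
\end{bmatrix}
$$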
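The model is a plain matrix product, which the sketch below spells out. The function name `transform` is hypothetical. Because the full data table is not available here, the rows of X are simply the two eigen vectors themselves, an input chosen purely for illustration. Since the eigen vectors are orthonormal, the result should be close to the identity matrix.

```python
def transform(X, V):
    """Return Y = X V. Rows of X are observations; columns of V are eigen vectors."""
    n_components = len(V[0])
    return [
        [sum(row[k] * V[k][j] for k in range(len(V))) for j in range(n_components)]
        for row in X
    ]

# Columns of V are the unit eigen vectors V1 and V2 from the numerical example.
V = [[0.677874, 0.735179],
     [0.735179, -0.677874]]

# Illustration only: use the eigen vectors themselves as the rows of X.
X = [[0.677874, 0.735179],
     [0.735179, -0.677874]]

Y = transform(X, V)

# Dimension reduction: keep only the first column of V (V1), giving Y1 alone.
V1_only = [[V[0][0]], [V[1][0]]]
Y_reduced = transform(X, V1_only)

print(Y)
print(Y_reduced)
```

With real data, X would hold the centered observations, and dropping the second column of V is what reduces the data from two dimensions to one.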