Topic01 - Principal Component Analysis

Artificial Intelligence for Mechanical Engineers

Tegoeh Tjahjowidodo

Principal Component Analysis

…recall
Principal component analysis

  type       sedan   sedan    truck   truck    bus
  color      red     red      blue    blue     yellow
  attribute  plain   sticker  plain   sticker  plain

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       sedan   sedan    truck   truck    bus
  color      326     326      630     630      782
  attribute  plain   sticker  plain   sticker  plain

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       11      11       13      13       14
  color      326     326      630     630      782
  attribute  plain   sticker  plain   sticker  plain

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       11      11       13      13       14
  color      326     326      630     630      782
  attribute  0       1        0       1        0

[Scatter plot: Color (0–900) vs. Type (8–15)]

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       11      11       13      13       14
  color      328.97  366.27   672.89  630.56   786.66
  attribute  0       1        0       1        0

[Scatter plot: Color (0–900) vs. Type (8–15)]

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

Data Redundancy
Let us consider two sets of data that are redundant.

In the absence of noise, the dependency of data2 on data1 is
very obvious.

Data Redundancy
In real practice, the data will most likely be contaminated
with random noise.

The noise might smear the dependency of the data.

How do we determine if the two datasets are redundant?

Variance and Covariance

Variance is a measure of how data is spread in a set:

    s² = Σᵢ (xᵢ − x̄)² / (n − 1)

where x̄ is the mean of variable x: x̄ = (Σᵢ xᵢ) / n, and n is the
total number of data points.

Covariance is similar to variance, but it measures the joint
variability between two random variables x and y:

    cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

where ȳ is the mean of variable y: ȳ = (Σᵢ yᵢ) / n.
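As a sketch (NumPy assumed, reusing the x–y dataset from the worked example later in the slides), the two formulas can be checked numerically:

```python
import numpy as np

# The x-y dataset from the slides' worked example.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

n = len(x)
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)               # s² of x
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # cov(x, y)

# NumPy's built-ins use the same (n - 1) denominator:
assert np.isclose(var_x, np.var(x, ddof=1))
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
```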

Variance and Covariance
Covariance tells us how y is correlated with x.

Covariance Matrix
The covariance matrix (auto-covariance matrix) is a square
matrix containing the (co-)variances between each pair of
elements.

For n variables, the covariance matrix is:

    K = [ s²₁       cov(1,2)  ⋯  cov(1,n) ]
        [ cov(2,1)  s²₂       ⋯  cov(2,n) ]
        [ ⋮         ⋮         ⋱  ⋮        ]
        [ cov(n,1)  cov(n,2)  ⋯  s²ₙ      ]

For two variables x and y, the covariance matrix is therefore:

    K = [ s²ₓ       cov(x,y) ]
        [ cov(y,x)  s²ᵧ      ]
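A minimal sketch (NumPy assumed) of building K for two variables, again using the slides' example dataset:

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# np.cov returns the full covariance matrix:
# K[0, 0] = s²_x, K[1, 1] = s²_y, K[0, 1] = K[1, 0] = cov(x, y).
K = np.cov(x, y)
assert np.allclose(K, K.T)   # a covariance matrix is always symmetric
print(K)
```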

Covariance Matrix
The covariance matrix shows how the variables are
distributed and correlated.

Data Redundancy (…recall)

Let us consider two sets of data that are redundant.

In the absence of noise, the dependency of data2 on data1 is
very obvious.

Data Redundancy
In real practice, the data will most likely be contaminated
with random noise.

The noise might smear the dependency of the data.

How do we determine if the two datasets are redundant?

Principal Component Analysis

The working concept is to find the transformation angle that
results in a principal component with the highest variance and
the lowest covariance.

[Scatter plots: the data rotated in steps, both axes from −10 to 10]

As the data are rotated, the covariance matrix evolves from

    K = [ 9.6573  9.2621 ]
        [ 9.2621  9.7297 ]

through

    K = [ 13.2045  8.5709 ]      K = [ 16.2172  6.5749 ]
        [  8.5709  6.1825 ]          [  6.5749  3.1699 ]

    K = [ 18.2367  3.5779 ]
        [  3.5779  1.1503 ]

to

    K = [ 18.9556  0.0362 ]
        [  0.0362  0.4314 ]

where nearly all of the variance sits in the first component
and the covariance is almost zero.
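Under a rotation R(θ), the covariance matrix transforms as R K Rᵀ. A small sketch (NumPy assumed; the sign convention of θ is an assumption) showing that rotating the first matrix above by 45° reproduces the last one:

```python
import numpy as np

K = np.array([[9.6573, 9.2621],
              [9.2621, 9.7297]])   # covariance before rotation

def rotated(K, theta):
    """Covariance matrix after rotating the data by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ K @ R.T

# A -45 degree rotation (sign depends on convention) concentrates the
# variance in the first component and drives the covariance to ~0:
K_rot = rotated(K, -np.pi / 4)
assert np.allclose(K_rot, [[18.9556, 0.0362],
                           [0.0362,  0.4314]], atol=1e-3)
```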

Principal Component Analysis
Analytical steps:

  1. Collect data
  2. Normalize
  3. Calculate the covariance matrix
  4. Rotate the data
  5. Recalculate the covariance matrix
  6. Is the covariance minimum? If no, go back to step 4;
     if yes, the principal components (PCA) are found.
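The loop above can be sketched as a brute-force search over rotation angles (NumPy assumed; a grid search stands in for whatever stopping rule the slides intend):

```python
import numpy as np

def pca_by_rotation(data):
    """data: 2xN array, already normalized to zero mean.
    Scan rotation angles and keep the one that minimizes the
    off-diagonal (covariance) entry of the rotated covariance matrix."""
    best_theta, best_cov = 0.0, np.inf
    for theta in np.linspace(0.0, np.pi / 2, 1000):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        off_diag = abs(np.cov(R @ data)[0, 1])
        if off_diag < best_cov:
            best_theta, best_cov = theta, off_diag
    return best_theta, best_cov
```

Scanning [0, π/2) is sufficient because the off-diagonal term of the rotated covariance matrix has a zero crossing in every quarter turn.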

Example
Two sets of input data were collected.
a) Analyse the principal components
b) Assess if the data can be reduced to a single set

    x     y
    2.5   2.4
    0.5   0.7
    2.2   2.9
    1.9   2.2
    3.1   3.0
    2.3   2.7
    2.0   1.6
    1.0   1.1
    1.5   1.6
    1.1   0.9

Matrix Revisited (…recall)
Eigen problem

Example. Find the eigenvalues and eigenvectors of:

    A = [ 5  4 ]
        [ 1  2 ]

We need to come up to the form A·Q = Q·Λ, where Λ is the
eigenvalue matrix:

    λ₁ = 1;  q₁ = [  1 ]      λ₂ = 6;  q₂ = [ 4 ]
                  [ −1 ]                    [ 1 ]

    [ 5  4 ][  1  4 ]   [  1  4 ][ 1  0 ]
    [ 1  2 ][ −1  1 ] = [ −1  1 ][ 0  6 ]
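A quick numerical check of the example above (NumPy assumed):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the q_i

# The eigenvalues are 1 and 6 (NumPy does not sort them):
assert np.allclose(sorted(eigvals), [1.0, 6.0])

# and A·Q = Q·Λ holds:
assert np.allclose(A @ eigvecs, eigvecs @ np.diag(eigvals))
```

Note that `np.linalg.eig` returns unit-norm eigenvectors, so q₂ comes out as [4, 1]ᵀ/√17 rather than [4, 1]ᵀ; both are valid eigenvectors.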

The eigenvectors tell the orientation.
Why do Eigenvalues tell the PC(s)?

Covariance:

    cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

or, in expectation form:

    cov(x, y) = 𝔼[(X − 𝔼[X]) · (Y − 𝔼[Y])]
              = 𝔼[XY − X·𝔼[Y] − Y·𝔼[X] + 𝔼[X]·𝔼[Y]]
              = 𝔼[XY] − 𝔼[X]·𝔼[Y] − 𝔼[Y]·𝔼[X] + 𝔼[X]·𝔼[Y]
              = 𝔼[XY] − 𝔼[X]·𝔼[Y]
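The identity can be checked numerically (NumPy assumed; the expectation form uses the 1/n population convention, so np.cov is called with bias=True):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

lhs = np.cov(x, y, bias=True)[0, 1]          # E[(X - EX)(Y - EY)]
rhs = np.mean(x * y) - x.mean() * y.mean()   # E[XY] - E[X]·E[Y]
assert np.isclose(lhs, rhs)
```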

Why do Eigenvalues tell the PC(s)?

    K = cov(x, y) = 𝔼[XY] − 𝔼[X]·𝔼[Y]

Let us take m as a column vector with mᵀm = 1:

    mᵀKm = mᵀ(𝔼[XY] − 𝔼[X]·𝔼[Y])m
         = 𝔼[(mᵀX)(mᵀY)] − 𝔼[mᵀX]·𝔼[mᵀY]
         = cov(mᵀx, mᵀy) = λ

Therefore mᵀKm = λ, the variance of the data projected onto m.
Maximizing this variance subject to the constraint mᵀm = 1
leads directly to the eigenproblem:

    Km = λm

with m the eigenvector and λ the eigenvalue.
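A sketch verifying that mᵀKm equals the eigenvalue when m is a unit-norm eigenvector (NumPy assumed; K here is the covariance matrix of the worked example, computed separately):

```python
import numpy as np

K = np.array([[0.61655556, 0.61544444],
              [0.61544444, 0.71655556]])

# eigh is the appropriate solver for a symmetric covariance matrix;
# it returns unit-norm eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(K)
for lam, m in zip(eigvals, eigvecs.T):
    assert np.isclose(m @ K @ m, lam)   # variance along m = eigenvalue
```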

Example
Two sets of input data were collected.
a) Analyse the principal components using the eigen-solution
b) Assess if the data can be reduced to a single set

    x     y
    2.5   2.4
    0.5   0.7
    2.2   2.9
    1.9   2.2
    3.1   3.0
    2.3   2.7
    2.0   1.6
    1.0   1.1
    1.5   1.6
    1.1   0.9

Note: the eigen approach will work only on data with zero mean.
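A sketch of the full solution (NumPy assumed): mean-centre the data as the note requires, build K, solve the eigenproblem, and use the eigenvalue ratio to judge part (b):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# a) Eigen-solution of the covariance matrix of the zero-mean data.
data = np.vstack([x - x.mean(), y - y.mean()])
K = np.cov(data)
eigvals, eigvecs = np.linalg.eigh(K)     # ascending eigenvalues

# b) Share of the total variance captured by the largest component.
explained = eigvals[-1] / eigvals.sum()
pc1 = eigvecs[:, -1] @ data              # data projected onto the first PC

print(f"eigenvalues = {eigvals}, explained = {explained:.3f}")
# The first component carries ~96% of the variance, so reducing the
# data to a single set loses little information.
```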
