Topic01 - Principal Component Analysis

Artificial Intelligence for Mechanical Engineers

Tegoeh Tjahjowidodo

Principal Component Analysis

…recall
Principal component analysis

  type       sedan   sedan    truck   truck    bus
  color      red     red      blue    blue     yellow
  attribute  plain   sticker  plain   sticker  plain

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       sedan   sedan    truck   truck    bus
  color      326     326      630     630      782
  attribute  plain   sticker  plain   sticker  plain

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       11      11       13      13       14
  color      326     326      630     630      782
  attribute  plain   sticker  plain   sticker  plain

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       11      11       13      13       14
  color      326     326      630     630      782
  attribute  0       1        0       1        0

[Scatter plot: Color (0–900) vs. Type (8–15)]

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

…recall
Principal component analysis

  type       11      11       13      13       14
  color      328.97  366.27   672.89  630.56   786.66
  attribute  0       1        0       1        0

[Scatter plot: Color (0–900) vs. Type (8–15)]

The vehicles are characterized by three features, i.e. type,
color and attribute.
But in this case, only two of them are needed to identify the
vehicle: type and color are redundant.

Data Redundancy
Let us consider two sets of data that are redundant.

In the absence of noise, the dependency of data2 on data1 is
very obvious.

Data Redundancy
In real practice, the data will most likely be contaminated
with random noise.

The noise might smear the dependency of the data.

How do we determine if the two datasets are redundant?

Variance and Covariance

Variance is a measure of how data is spread in a set:

    s² = Σᵢ (xᵢ − x̄)² / (n − 1)

where x̄ is the mean of variable x: x̄ = (Σᵢ xᵢ) / n, and n is the
total number of data points.

Covariance is similar to variance, but it measures the joint
variability between two random variables x and y:

    cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

where ȳ is the mean of variable y: ȳ = (Σᵢ yᵢ) / n.
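As a sketch (NumPy assumed, reusing the x–y dataset from the worked example later in the slides), the two formulas can be checked numerically:

```python
import numpy as np

# The x-y dataset from the slides' worked example.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

n = len(x)
var_x = np.sum((x - x.mean()) ** 2) / (n - 1)               # s² of x
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # cov(x, y)

# NumPy's built-ins use the same (n - 1) denominator:
assert np.isclose(var_x, np.var(x, ddof=1))
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
```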

Variance and Covariance
Covariance tells us how y is correlated with x.

Covariance Matrix
The covariance matrix (auto-covariance matrix) is a square
matrix containing the (co-)variances between each pair of
elements.

For n variables, the covariance matrix is:

    K = [ s²₁       cov(1,2)  ⋯  cov(1,n) ]
        [ cov(2,1)  s²₂       ⋯  cov(2,n) ]
        [ ⋮         ⋮         ⋱  ⋮        ]
        [ cov(n,1)  cov(n,2)  ⋯  s²ₙ      ]

For two variables x and y, the covariance matrix is therefore:

    K = [ s²ₓ       cov(x,y) ]
        [ cov(y,x)  s²ᵧ      ]
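A minimal sketch (NumPy assumed) of building K for two variables, again using the slides' example dataset:

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# np.cov returns the full covariance matrix:
# K[0, 0] = s²_x, K[1, 1] = s²_y, K[0, 1] = K[1, 0] = cov(x, y).
K = np.cov(x, y)
assert np.allclose(K, K.T)   # a covariance matrix is always symmetric
print(K)
```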

Covariance Matrix
The covariance matrix shows how the variables are
distributed and correlated.

Data Redundancy (…recall)

Let us consider two sets of data that are redundant.

In the absence of noise, the dependency of data2 on data1 is
very obvious.

Data Redundancy
In real practice, the data will most likely be contaminated
with random noise.

The noise might smear the dependency of the data.

How do we determine if the two datasets are redundant?

Principal Component Analysis

The working concept is to find the transformation angle that
results in a principal component with the highest variance and
the lowest covariance.

[Scatter plots: the data rotated in steps, both axes from −10 to 10]

As the data are rotated, the covariance matrix evolves from

    K = [ 9.6573  9.2621 ]
        [ 9.2621  9.7297 ]

through

    K = [ 13.2045  8.5709 ]      K = [ 16.2172  6.5749 ]
        [  8.5709  6.1825 ]          [  6.5749  3.1699 ]

    K = [ 18.2367  3.5779 ]
        [  3.5779  1.1503 ]

to

    K = [ 18.9556  0.0362 ]
        [  0.0362  0.4314 ]

where nearly all of the variance sits in the first component
and the covariance is almost zero.
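Under a rotation R(θ), the covariance matrix transforms as R K Rᵀ. A small sketch (NumPy assumed; the sign convention of θ is an assumption) showing that rotating the first matrix above by 45° reproduces the last one:

```python
import numpy as np

K = np.array([[9.6573, 9.2621],
              [9.2621, 9.7297]])   # covariance before rotation

def rotated(K, theta):
    """Covariance matrix after rotating the data by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ K @ R.T

# A -45 degree rotation (sign depends on convention) concentrates the
# variance in the first component and drives the covariance to ~0:
K_rot = rotated(K, -np.pi / 4)
assert np.allclose(K_rot, [[18.9556, 0.0362],
                           [0.0362,  0.4314]], atol=1e-3)
```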

Principal Component Analysis
Analytical steps:

  1. Collect data
  2. Normalize
  3. Calculate the covariance matrix
  4. Rotate the data
  5. Recalculate the covariance matrix
  6. Is the covariance minimum? If no, go back to step 4;
     if yes, the principal components (PCA) are found.
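The loop above can be sketched as a brute-force search over rotation angles (NumPy assumed; a grid search stands in for whatever stopping rule the slides intend):

```python
import numpy as np

def pca_by_rotation(data):
    """data: 2xN array, already normalized to zero mean.
    Scan rotation angles and keep the one that minimizes the
    off-diagonal (covariance) entry of the rotated covariance matrix."""
    best_theta, best_cov = 0.0, np.inf
    for theta in np.linspace(0.0, np.pi / 2, 1000):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        off_diag = abs(np.cov(R @ data)[0, 1])
        if off_diag < best_cov:
            best_theta, best_cov = theta, off_diag
    return best_theta, best_cov
```

Scanning [0, π/2) is sufficient because the off-diagonal term of the rotated covariance matrix has a zero crossing in every quarter turn.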

Example
Two sets of input data were collected.
a) Analyse the principal components
b) Assess if the data can be reduced to a single set

    x     y
    2.5   2.4
    0.5   0.7
    2.2   2.9
    1.9   2.2
    3.1   3.0
    2.3   2.7
    2.0   1.6
    1.0   1.1
    1.5   1.6
    1.1   0.9

Matrix Revisited (…recall)
Eigen problem

Example. Find the eigenvalues and eigenvectors of:

    A = [ 5  4 ]
        [ 1  2 ]

We need to come up to the form A·Q = Q·Λ, where Λ is the
eigenvalue matrix:

    λ₁ = 1;  q₁ = [  1 ]      λ₂ = 6;  q₂ = [ 4 ]
                  [ −1 ]                    [ 1 ]

    [ 5  4 ][  1  4 ]   [  1  4 ][ 1  0 ]
    [ 1  2 ][ −1  1 ] = [ −1  1 ][ 0  6 ]
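A quick numerical check of the example above (NumPy assumed):

```python
import numpy as np

A = np.array([[5.0, 4.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are the q_i

# The eigenvalues are 1 and 6 (NumPy does not sort them):
assert np.allclose(sorted(eigvals), [1.0, 6.0])

# and A·Q = Q·Λ holds:
assert np.allclose(A @ eigvecs, eigvecs @ np.diag(eigvals))
```

Note that `np.linalg.eig` returns unit-norm eigenvectors, so q₂ comes out as [4, 1]ᵀ/√17 rather than [4, 1]ᵀ; both are valid eigenvectors.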

The eigenvectors tell the orientation.
Why do Eigenvalues tell the PC(s)?

Covariance:

    cov(x, y) = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)

or, in expectation form:

    cov(x, y) = 𝔼[(X − 𝔼[X]) · (Y − 𝔼[Y])]
              = 𝔼[XY − X·𝔼[Y] − Y·𝔼[X] + 𝔼[X]·𝔼[Y]]
              = 𝔼[XY] − 𝔼[X]·𝔼[Y] − 𝔼[Y]·𝔼[X] + 𝔼[X]·𝔼[Y]
              = 𝔼[XY] − 𝔼[X]·𝔼[Y]
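The identity can be checked numerically (NumPy assumed; the expectation form uses the 1/n population convention, so np.cov is called with bias=True):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

lhs = np.cov(x, y, bias=True)[0, 1]          # E[(X - EX)(Y - EY)]
rhs = np.mean(x * y) - x.mean() * y.mean()   # E[XY] - E[X]·E[Y]
assert np.isclose(lhs, rhs)
```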

Why do Eigenvalues tell the PC(s)?

    K = cov(x, y) = 𝔼[XY] − 𝔼[X]·𝔼[Y]

Let us take m as a column vector with mᵀm = 1:

    mᵀKm = mᵀ(𝔼[XY] − 𝔼[X]·𝔼[Y])m
         = 𝔼[(mᵀX)(mᵀY)] − 𝔼[mᵀX]·𝔼[mᵀY]
         = cov(mᵀx, mᵀy) = λ

Therefore mᵀKm = λ, the variance of the data projected onto m.
Maximizing this variance subject to the constraint mᵀm = 1
leads directly to the eigenproblem:

    Km = λm

with m the eigenvector and λ the eigenvalue.
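A sketch verifying that mᵀKm equals the eigenvalue when m is a unit-norm eigenvector (NumPy assumed; K here is the covariance matrix of the worked example, computed separately):

```python
import numpy as np

K = np.array([[0.61655556, 0.61544444],
              [0.61544444, 0.71655556]])

# eigh is the appropriate solver for a symmetric covariance matrix;
# it returns unit-norm eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(K)
for lam, m in zip(eigvals, eigvecs.T):
    assert np.isclose(m @ K @ m, lam)   # variance along m = eigenvalue
```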

Example
Two sets of input data were collected.
a) Analyse the principal components using the eigen-solution
b) Assess if the data can be reduced to a single set

    x     y
    2.5   2.4
    0.5   0.7
    2.2   2.9
    1.9   2.2
    3.1   3.0
    2.3   2.7
    2.0   1.6
    1.0   1.1
    1.5   1.6
    1.1   0.9

Note: the eigen approach will work only on data with zero mean.
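A sketch of the full solution (NumPy assumed): mean-centre the data as the note requires, build K, solve the eigenproblem, and use the eigenvalue ratio to judge part (b):

```python
import numpy as np

x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

# a) Eigen-solution of the covariance matrix of the zero-mean data.
data = np.vstack([x - x.mean(), y - y.mean()])
K = np.cov(data)
eigvals, eigvecs = np.linalg.eigh(K)     # ascending eigenvalues

# b) Share of the total variance captured by the largest component.
explained = eigvals[-1] / eigvals.sum()
pc1 = eigvecs[:, -1] @ data              # data projected onto the first PC

print(f"eigenvalues = {eigvals}, explained = {explained:.3f}")
# The first component carries ~96% of the variance, so reducing the
# data to a single set loses little information.
```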
