
Principal Component Analysis (PCA)

Shusen Wang
Some Statistical Concepts
Mean and Variance (Scalar)
• Let $X$ be a random scalar variable and $p(\cdot)$ be the probability density function (PDF).
• Mean: $\mu = \mathbb{E}[X] = \int x \, p(x) \, dx$.
• Variance: $\sigma^2 = \mathbb{E}\big[(X - \mu)^2\big] = \int (x - \mu)^2 \, p(x) \, dx$.

• Let $x_1, \cdots, x_n$ be independently drawn observations of $X$.
• Sample mean: $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
• Sample variance: $\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$.
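The $\frac{1}{n-1}$ denominator is the unbiased (Bessel-corrected) sample variance. A minimal NumPy sketch of the two estimators, on made-up observations:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical observations

mu_hat = np.mean(x)              # sample mean: (1/n) * sum of x_i
sigma2_hat = np.var(x, ddof=1)   # sample variance with 1/(n-1) (ddof=1 applies Bessel's correction)

print(mu_hat, sigma2_hat)        # 5.0 4.571...
```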
Mean and Covariance (Vector)
• Let $X$ be a random vector variable and $p(\cdot)$ be the probability density function (PDF).
• Mean: $\boldsymbol{\mu} = \mathbb{E}[X] = \int \mathbf{x} \, p(\mathbf{x}) \, d\mathbf{x} \in \mathbb{R}^d$.
• Covariance: $\mathbf{C} = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\big] \in \mathbb{R}^{d \times d}$.

• Let $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ be independently drawn observations of $X$.
• Sample mean: $\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \in \mathbb{R}^d$.
• Sample covariance: $\hat{\mathbf{C}} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T \in \mathbb{R}^{d \times d}$.
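The vector case is the same computation with outer products. A minimal NumPy sketch, assuming the $n$ samples are stacked as the rows of an $n \times d$ array (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # n = 100 hypothetical samples in R^3, one per row

mu_hat = X.mean(axis=0)                  # sample mean, shape (3,)
Xc = X - mu_hat                          # subtract the mean from every row
C_hat = (Xc.T @ Xc) / (len(X) - 1)       # sum of outer products / (n-1), shape (3, 3)

# Matches NumPy's built-in estimator (np.cov also divides by n-1 by default).
assert np.allclose(C_hat, np.cov(X, rowvar=False))
```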
Principal Component Analysis (PCA)
PCA Explained
• Transforms the data to a new coordinate system.
• The greatest variance lies along the first coordinate (called the 1st principal component).
• The 2nd greatest variance lies along the second coordinate, and so on.
PCA Explained

[Figure: the original data.]
PCA Explained
Step 1: subtract the mean
• Sample mean: $\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$.
• Translation: $\mathbf{x}_i' = \mathbf{x}_i - \hat{\boldsymbol{\mu}}$.

[Figure: the data after translation.]
PCA Explained
Step 2: rotation
• Find a rotation matrix $\mathbf{V} \in \mathbb{R}^{d \times d}$.
• Rotation: $\mathbf{x}_i'' = \mathbf{V}^T \mathbf{x}_i' \in \mathbb{R}^d$.

[Figure: the greatest and the 2nd greatest variance directions, before and after the rotation.]
PCA Explained
The result
• Find a rotation matrix $\mathbf{V} \in \mathbb{R}^{d \times d}$.
• Rotation: $\mathbf{x}_i'' = \mathbf{V}^T \mathbf{x}_i' \in \mathbb{R}^d$.

[Figure: after the rotation, the greatest variance direction lies along the first axis and the 2nd greatest along the second.]
Question: How to Perform the Rotation?

• $\mathbf{X}' \in \mathbb{R}^{n \times d}$: the stack of $\mathbf{x}_1', \cdots, \mathbf{x}_n'$.
• Singular value decomposition (SVD): $\mathbf{X}' = \sum_{j=1}^{d} \sigma_j \mathbf{u}_j \mathbf{v}_j^T$.
• $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_d] \in \mathbb{R}^{d \times d}$.
• Rotation: $\mathbf{x}_i'' = \mathbf{V}^T \mathbf{x}_i' \in \mathbb{R}^d$.
• $\mathbf{X}'' \in \mathbb{R}^{n \times d}$: the stack of $\mathbf{x}_1'', \cdots, \mathbf{x}_n''$.
• Its right singular vectors are the standard basis: $\mathbf{e}_1, \cdots, \mathbf{e}_d$.

[Figure: the greatest variance direction is $\mathbf{v}_1$ and the 2nd greatest is $\mathbf{v}_2$; after the rotation they become $\mathbf{e}_1$ and $\mathbf{e}_2$.]
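A minimal NumPy sketch of this rotation (variable names are mine, the data is synthetic): `np.linalg.svd` returns $\mathbf{V}^T$ as its third output, and rotating all rows at once is the matrix product $\mathbf{X}'\mathbf{V}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # hypothetical correlated data

Xp = X - X.mean(axis=0)                             # Step 1: subtract the mean -> X'
U, s, Vt = np.linalg.svd(Xp, full_matrices=False)   # X' = U diag(s) Vt; rows of Vt are v_j^T
Xpp = Xp @ Vt.T                                     # Step 2: x_i'' = V^T x_i' for every row at once

# After the rotation, the right singular vectors of X'' are the standard basis,
# so its columns are uncorrelated and sorted by decreasing variance.
print(np.round(np.cov(Xpp, rowvar=False), 6))       # (approximately) diagonal
```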
The Procedure of PCA (Recap)

1. Input: data $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and target rank $k$ ($\le d$).
2. Subtract the mean: $\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$ and $\mathbf{x}_i' = \mathbf{x}_i - \hat{\boldsymbol{\mu}}$.
   • Obtain the matrix $\mathbf{X}' \in \mathbb{R}^{n \times d}$: the stack of $\mathbf{x}_1', \cdots, \mathbf{x}_n'$.
3. Find the rotation matrix $\mathbf{V}_k = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_k] \in \mathbb{R}^{d \times k}$.
   • Truncated SVD: $\mathbf{X}_k' = \sum_{j=1}^{k} \sigma_j \mathbf{u}_j \mathbf{v}_j^T$.
4. Rotation (and dimensionality reduction):
   • $\mathbf{x}_i'' = \mathbf{V}_k^T \mathbf{x}_i' \in \mathbb{R}^k$, for $i = 1$ to $n$, are the outputs.

Dimensionality reduction: $\mathbf{x}_i \in \mathbb{R}^d \to \mathbf{x}_i'' \in \mathbb{R}^k$.

Question: Why set a target rank $k \le d$? Why not use $k = d$?
• $\mathbf{v}_1$ is the greatest variance direction, $\mathbf{v}_2$ is the 2nd greatest, and so on.
• Only the top variance directions are interesting.
   • The top ones are features.
   • The bottom ones are noise.
• For visualization, we set $k = 2$ or $3$.
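Putting the steps together, here is a minimal NumPy sketch of the whole procedure (a sketch in the slides' notation, not a production implementation; on large data one would use a truncated SVD routine rather than the full `np.linalg.svd`):

```python
import numpy as np

def pca(X, k):
    """PCA as described above. X is n x d (one sample per row); k <= d is the target rank.

    Returns the n x k matrix of rotated, reduced samples and the d x k rotation matrix V_k.
    """
    mu_hat = X.mean(axis=0)                            # Step 1: sample mean
    Xp = X - mu_hat                                    #         translation: x_i' = x_i - mu_hat
    _, _, Vt = np.linalg.svd(Xp, full_matrices=False)  # Step 2: SVD of X'
    Vk = Vt[:k].T                                      # top-k right singular vectors, d x k
    return Xp @ Vk, Vk                                 # Step 3: x_i'' = V_k^T x_i' (all rows at once)

# Usage on hypothetical data: reduce 5-dimensional samples to k = 2.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Xpp, Vk = pca(X, k=2)
print(Xpp.shape, Vk.shape)                             # (100, 2) (5, 2)
```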


Why is $\mathbf{v}_1$ the Greatest Variance Direction?

• The rows of $\mathbf{X}' \in \mathbb{R}^{n \times d}$ have zero mean.
   • $\mathbf{x}_i' = \mathbf{x}_i - \hat{\boldsymbol{\mu}}$ is the result of the translation (Step 1 of PCA).
• The sample covariance matrix of $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ is
   $\hat{\mathbf{C}} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T \in \mathbb{R}^{d \times d}$.
• $\Rightarrow \hat{\mathbf{C}} = \frac{1}{n-1} \mathbf{X}'^T \mathbf{X}'$.
• $\mathbf{X}' = \sum_{j=1}^{d} \sigma_j \mathbf{u}_j \mathbf{v}_j^T \;\Rightarrow\; \hat{\mathbf{C}} = \frac{1}{n-1} \sum_{j=1}^{d} \sigma_j^2 \mathbf{v}_j \mathbf{v}_j^T$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d$.
• $\mathbf{v}_1 = \operatorname{argmax}_{\|\mathbf{v}\|_2 = 1} \, \mathbf{v}^T \hat{\mathbf{C}} \mathbf{v}$, i.e., the unit direction of maximum sample variance.
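A quick numerical check of this claim on random data (a sketch; the identity $\hat{\mathbf{C}} = \mathbf{X}'^T\mathbf{X}'/(n-1)$ and the variance comparison are computed directly):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated samples

Xp = X - X.mean(axis=0)
C_hat = Xp.T @ Xp / (len(X) - 1)             # sample covariance: X'^T X' / (n-1)
_, s, Vt = np.linalg.svd(Xp, full_matrices=False)
v1 = Vt[0]                                   # top right singular vector v_1

# v_1^T C_hat v_1 equals sigma_1^2 / (n-1), and no random unit direction beats it.
var_v1 = v1 @ C_hat @ v1
assert np.isclose(var_v1, s[0] ** 2 / (len(X) - 1))
for _ in range(1000):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)                   # random unit direction
    assert v @ C_hat @ v <= var_v1 + 1e-9
```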
PCA: A Real-World Example
Ocean Temperature Data

[Figure: ocean temperatures taken 1979–2011 at 6-hour intervals.]
Ocean Temperature Data
• Just keep the surface temperature.
• Shape: $n \times d_1 \times d_2$ (order-3 tensor).
   • $n = 4 \times 365 \times 32 = 46{,}720$ (4 measurements per day).
   • $d_1 = 2 \times 360 = 720$ (2 measurements per degree of longitude).
   • $d_2 = 2 \times 180 = 360$ (2 measurements per degree of latitude).

• Reshape the tensor to a matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$.
   • Temporal dimension: $n = 46{,}720$.
   • Spatial dimension: $d = d_1 d_2 = 259{,}200$.
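The reshape itself is one line in NumPy. A sketch with the same structure (the real tensor is about 48 GB in float32, so a small random stand-in is used here):

```python
import numpy as np

# Real dimensions: n = 46_720, d1 = 720, d2 = 360; shrunk here so the sketch runs anywhere.
n, d1, d2 = 8, 6, 4
T = np.random.default_rng(4).normal(size=(n, d1, d2))  # stand-in for the temperature tensor

X = T.reshape(n, d1 * d2)   # flatten each snapshot into a row of length d = d1 * d2
print(X.shape)              # (8, 24); with the real data this would be (46720, 259200)
```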
Compute the Top $k = 4$ Principal Components

• Ocean temperature data: $\mathbf{X} \in \mathbb{R}^{n \times d}$.
   • Temporal dimension: $n = 46{,}720$.
   • Spatial dimension: $d = d_1 d_2 = 259{,}200$.

• Treat each row as a sample.
   • The result of PCA is an $n \times 4$ matrix.
   • [Figure: the first principal component is $n \times 1$; each entry corresponds to one of the $n$ time points.]

• Treat each column as a sample.
   • The result of PCA is a $d \times 4$ matrix.
   • [Figure: the first principal component is $d \times 1$; each entry corresponds to one of the $d$ locations (color indicates value).]
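At this scale a full SVD would be wasteful; a truncated or randomized solver computes just the top $k = 4$ components. A sketch using scikit-learn (one of several libraries that would work; the array below is a small stand-in for the real $46{,}720 \times 259{,}200$ matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(5).normal(size=(1000, 500)).astype(np.float32)  # stand-in data

# Rows as samples: mean subtraction is built in; the output is n x 4.
scores_rows = PCA(n_components=4, svd_solver="randomized").fit_transform(X)
print(scores_rows.shape)    # (1000, 4)

# Columns as samples: transpose first; the output is d x 4.
scores_cols = PCA(n_components=4, svd_solver="randomized").fit_transform(X.T)
print(scores_cols.shape)    # (500, 4)
```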
The 1st Principal Component

[Figure: annual cycle.]

The 2nd Principal Component

[Figure: annual cycle.]

The 3rd Principal Component

[Figure: El Niño.]

The 4th Principal Component

[Figure: La Niña.]
Summary
Dimensionality Reduction

• Applications
   • Visualization and analysis
   • Data processing (to make downstream ML more efficient or accurate)

• Methods
   • Principal component analysis (PCA)
   • Kernel PCA
   • Manifold learning
   • Autoencoder
   • Linear discriminant analysis (LDA)
3 Steps of PCA

1. Subtract mean (translation/shift).


2. Find the rotation matrix by (truncated) SVD.
• The top right singular vectors are the maximum variance directions.

3. Rotate the zero-mean vectors.
