
Principal Component Analysis (PCA)

Shusen Wang
Some Statistical Concepts
Mean and Variance (Scalar)
• Let $X$ be a random scalar variable and $p(\cdot)$ be the probability density function (PDF).
• Mean: $\mu = \mathbb{E}[X] = \int x \, p(x) \, dx$.
• Variance: $\sigma^2 = \mathbb{E}\big[(X - \mu)^2\big] = \int (x - \mu)^2 \, p(x) \, dx$.

• Let $x_1, \cdots, x_n$ be independently drawn observations of $X$.
• Sample mean: $\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
• Sample variance: $\hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$.
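The $\frac{1}{n-1}$ denominator is the unbiased (Bessel-corrected) sample variance. A minimal NumPy sketch of the two estimators, on made-up observations:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical observations

mu_hat = np.mean(x)              # sample mean: (1/n) * sum of x_i
sigma2_hat = np.var(x, ddof=1)   # sample variance with 1/(n-1) (ddof=1 applies Bessel's correction)

print(mu_hat, sigma2_hat)        # 5.0 4.571...
```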
Mean and Covariance (Vector)
• Let $X$ be a random vector variable and $p(\cdot)$ be the probability density function (PDF).
• Mean: $\boldsymbol{\mu} = \mathbb{E}[X] = \int \mathbf{x} \, p(\mathbf{x}) \, d\mathbf{x} \in \mathbb{R}^d$.
• Covariance: $\mathbf{C} = \mathbb{E}\big[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\big] \in \mathbb{R}^{d \times d}$.

• Let $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ be independently drawn observations of $X$.
• Sample mean: $\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i \in \mathbb{R}^d$.
• Sample covariance: $\hat{\mathbf{C}} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T \in \mathbb{R}^{d \times d}$.
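The vector case is the same computation with outer products. A minimal NumPy sketch, assuming the $n$ samples are stacked as the rows of an $n \times d$ array (the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # n = 100 hypothetical samples in R^3, one per row

mu_hat = X.mean(axis=0)                  # sample mean, shape (3,)
Xc = X - mu_hat                          # subtract the mean from every row
C_hat = (Xc.T @ Xc) / (len(X) - 1)       # sum of outer products / (n-1), shape (3, 3)

# Matches NumPy's built-in estimator (np.cov also divides by n-1 by default).
assert np.allclose(C_hat, np.cov(X, rowvar=False))
```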
Principal Component Analysis (PCA)
PCA Explained
• Transforms the data to a new coordinate system.
• The greatest variance lies along the first coordinate (called the 1st principal component).
• The 2nd greatest variance lies along the second coordinate, and so on.
PCA Explained

[Figure: the original data.]
PCA Explained
Step 1: subtract the mean
• Sample mean: $\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$.
• Translation: $\mathbf{x}_i' = \mathbf{x}_i - \hat{\boldsymbol{\mu}}$.

[Figure: the data after translation.]
PCA Explained
Step 2: rotation
• Find a rotation matrix $\mathbf{V} \in \mathbb{R}^{d \times d}$.
• Rotation: $\mathbf{x}_i'' = \mathbf{V}^T \mathbf{x}_i' \in \mathbb{R}^d$.

[Figure: the greatest and the 2nd greatest variance directions, before and after the rotation.]
PCA Explained
The result
• Find a rotation matrix $\mathbf{V} \in \mathbb{R}^{d \times d}$.
• Rotation: $\mathbf{x}_i'' = \mathbf{V}^T \mathbf{x}_i' \in \mathbb{R}^d$.

[Figure: after the rotation, the greatest variance direction lies along the first axis and the 2nd greatest along the second.]
Question: How to Perform the Rotation?

• $\mathbf{X}' \in \mathbb{R}^{n \times d}$: the stack of $\mathbf{x}_1', \cdots, \mathbf{x}_n'$.
• Singular value decomposition (SVD): $\mathbf{X}' = \sum_{j=1}^{d} \sigma_j \mathbf{u}_j \mathbf{v}_j^T$.
• $\mathbf{V} = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_d] \in \mathbb{R}^{d \times d}$.
• Rotation: $\mathbf{x}_i'' = \mathbf{V}^T \mathbf{x}_i' \in \mathbb{R}^d$.
• $\mathbf{X}'' \in \mathbb{R}^{n \times d}$: the stack of $\mathbf{x}_1'', \cdots, \mathbf{x}_n''$.
• Its right singular vectors are the standard basis: $\mathbf{e}_1, \cdots, \mathbf{e}_d$.

[Figure: the greatest variance direction is $\mathbf{v}_1$ and the 2nd greatest is $\mathbf{v}_2$; after the rotation they become $\mathbf{e}_1$ and $\mathbf{e}_2$.]
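A minimal NumPy sketch of this rotation (variable names are mine, the data is synthetic): `np.linalg.svd` returns $\mathbf{V}^T$ as its third output, and rotating all rows at once is the matrix product $\mathbf{X}'\mathbf{V}$.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])  # hypothetical correlated data

Xp = X - X.mean(axis=0)                             # Step 1: subtract the mean -> X'
U, s, Vt = np.linalg.svd(Xp, full_matrices=False)   # X' = U diag(s) Vt; rows of Vt are v_j^T
Xpp = Xp @ Vt.T                                     # Step 2: x_i'' = V^T x_i' for every row at once

# After the rotation, the right singular vectors of X'' are the standard basis,
# so its columns are uncorrelated and sorted by decreasing variance.
print(np.round(np.cov(Xpp, rowvar=False), 6))       # (approximately) diagonal
```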
The Procedure of PCA (Recap)

1. Input: data $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ and target rank $k$ ($\le d$).
2. Subtract the mean: $\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$ and $\mathbf{x}_i' = \mathbf{x}_i - \hat{\boldsymbol{\mu}}$.
   • Obtain the matrix $\mathbf{X}' \in \mathbb{R}^{n \times d}$: the stack of $\mathbf{x}_1', \cdots, \mathbf{x}_n'$.
3. Find the rotation matrix $\mathbf{V}_k = [\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_k] \in \mathbb{R}^{d \times k}$.
   • Truncated SVD: $\mathbf{X}_k' = \sum_{j=1}^{k} \sigma_j \mathbf{u}_j \mathbf{v}_j^T$.
4. Rotation (and dimensionality reduction):
   • $\mathbf{x}_i'' = \mathbf{V}_k^T \mathbf{x}_i' \in \mathbb{R}^k$, for $i = 1$ to $n$, are the outputs.

Dimensionality reduction: $\mathbf{x}_i \in \mathbb{R}^d \to \mathbf{x}_i'' \in \mathbb{R}^k$.

Question: Why set a target rank $k \le d$? Why not use $k = d$?
• $\mathbf{v}_1$ is the greatest variance direction, $\mathbf{v}_2$ is the 2nd greatest, and so on.
• Only the top variance directions are interesting.
   • The top ones are features.
   • The bottom ones are noise.
• For visualization, we set $k = 2$ or $3$.
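Putting the steps together, here is a minimal NumPy sketch of the whole procedure (a sketch in the slides' notation, not a production implementation; on large data one would use a truncated SVD routine rather than the full `np.linalg.svd`):

```python
import numpy as np

def pca(X, k):
    """PCA as described above. X is n x d (one sample per row); k <= d is the target rank.

    Returns the n x k matrix of rotated, reduced samples and the d x k rotation matrix V_k.
    """
    mu_hat = X.mean(axis=0)                            # Step 1: sample mean
    Xp = X - mu_hat                                    #         translation: x_i' = x_i - mu_hat
    _, _, Vt = np.linalg.svd(Xp, full_matrices=False)  # Step 2: SVD of X'
    Vk = Vt[:k].T                                      # top-k right singular vectors, d x k
    return Xp @ Vk, Vk                                 # Step 3: x_i'' = V_k^T x_i' (all rows at once)

# Usage on hypothetical data: reduce 5-dimensional samples to k = 2.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
Xpp, Vk = pca(X, k=2)
print(Xpp.shape, Vk.shape)                             # (100, 2) (5, 2)
```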


Why is $\mathbf{v}_1$ the Greatest Variance Direction?

• The rows of $\mathbf{X}' \in \mathbb{R}^{n \times d}$ have zero mean.
   • $\mathbf{x}_i' = \mathbf{x}_i - \hat{\boldsymbol{\mu}}$ is the result of the translation (Step 1 of PCA).
• The sample covariance matrix of $\mathbf{x}_1, \cdots, \mathbf{x}_n \in \mathbb{R}^d$ is
   $\hat{\mathbf{C}} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T \in \mathbb{R}^{d \times d}$.
• $\Rightarrow \hat{\mathbf{C}} = \frac{1}{n-1} \mathbf{X}'^T \mathbf{X}'$.
• $\mathbf{X}' = \sum_{j=1}^{d} \sigma_j \mathbf{u}_j \mathbf{v}_j^T \;\Rightarrow\; \hat{\mathbf{C}} = \frac{1}{n-1} \sum_{j=1}^{d} \sigma_j^2 \mathbf{v}_j \mathbf{v}_j^T$, with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_d$.
• $\mathbf{v}_1 = \operatorname{argmax}_{\|\mathbf{v}\|_2 = 1} \, \mathbf{v}^T \hat{\mathbf{C}} \mathbf{v}$, i.e., the unit direction of maximum sample variance.
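A quick numerical check of this claim on random data (a sketch; the identity $\hat{\mathbf{C}} = \mathbf{X}'^T\mathbf{X}'/(n-1)$ and the variance comparison are computed directly):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))   # hypothetical correlated samples

Xp = X - X.mean(axis=0)
C_hat = Xp.T @ Xp / (len(X) - 1)             # sample covariance: X'^T X' / (n-1)
_, s, Vt = np.linalg.svd(Xp, full_matrices=False)
v1 = Vt[0]                                   # top right singular vector v_1

# v_1^T C_hat v_1 equals sigma_1^2 / (n-1), and no random unit direction beats it.
var_v1 = v1 @ C_hat @ v1
assert np.isclose(var_v1, s[0] ** 2 / (len(X) - 1))
for _ in range(1000):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)                   # random unit direction
    assert v @ C_hat @ v <= var_v1 + 1e-9
```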
PCA: A Real-World Example
Ocean Temperature Data

[Figure: ocean temperatures taken 1979–2011 at 6-hour intervals.]
Ocean Temperature Data
• Just keep the surface temperature.
• Shape: $n \times d_1 \times d_2$ (order-3 tensor).
   • $n = 4 \times 365 \times 32 = 46{,}720$ (4 measurements per day).
   • $d_1 = 2 \times 360 = 720$ (2 measurements per degree of longitude).
   • $d_2 = 2 \times 180 = 360$ (2 measurements per degree of latitude).

• Reshape the tensor to a matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$.
   • Temporal dimension: $n = 46{,}720$.
   • Spatial dimension: $d = d_1 d_2 = 259{,}200$.
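The reshape itself is one line in NumPy. A sketch with the same structure (the real tensor is about 48 GB in float32, so a small random stand-in is used here):

```python
import numpy as np

# Real dimensions: n = 46_720, d1 = 720, d2 = 360; shrunk here so the sketch runs anywhere.
n, d1, d2 = 8, 6, 4
T = np.random.default_rng(4).normal(size=(n, d1, d2))  # stand-in for the temperature tensor

X = T.reshape(n, d1 * d2)   # flatten each snapshot into a row of length d = d1 * d2
print(X.shape)              # (8, 24); with the real data this would be (46720, 259200)
```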
Compute the Top $k = 4$ Principal Components

• Ocean temperature data: $\mathbf{X} \in \mathbb{R}^{n \times d}$.
   • Temporal dimension: $n = 46{,}720$.
   • Spatial dimension: $d = d_1 d_2 = 259{,}200$.

• Treat each row as a sample.
   • The result of PCA is an $n \times 4$ matrix.
   • [Figure: the first principal component is $n \times 1$; each entry corresponds to one of the $n$ time points.]

• Treat each column as a sample.
   • The result of PCA is a $d \times 4$ matrix.
   • [Figure: the first principal component is $d \times 1$; each entry corresponds to one of the $d$ locations (color indicates value).]
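At this scale a full SVD would be wasteful; a truncated or randomized solver computes just the top $k = 4$ components. A sketch using scikit-learn (one of several libraries that would work; the array below is a small stand-in for the real $46{,}720 \times 259{,}200$ matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(5).normal(size=(1000, 500)).astype(np.float32)  # stand-in data

# Rows as samples: mean subtraction is built in; the output is n x 4.
scores_rows = PCA(n_components=4, svd_solver="randomized").fit_transform(X)
print(scores_rows.shape)    # (1000, 4)

# Columns as samples: transpose first; the output is d x 4.
scores_cols = PCA(n_components=4, svd_solver="randomized").fit_transform(X.T)
print(scores_cols.shape)    # (500, 4)
```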
The 1st Principal Component

[Figure: annual cycle.]

The 2nd Principal Component

[Figure: annual cycle.]

The 3rd Principal Component

[Figure: El Niño.]

The 4th Principal Component

[Figure: La Niña.]
Summary
Dimensionality Reduction

• Applications
   • Visualization and analysis
   • Data processing (to make downstream ML more efficient or accurate)

• Methods
   • Principal component analysis (PCA)
   • Kernel PCA
   • Manifold learning
   • Autoencoder
   • Linear discriminant analysis (LDA)
3 Steps of PCA

1. Subtract mean (translation/shift).


2. Find the rotation matrix by (truncated) SVD.
• The top right singular vectors are the maximum variance directions.

3. Rotate the zero-mean vectors.
