Principal Component Analysis Part 1: The Different Formulations
by Aadhithya Sankar
Fig. 1: Principal components of a multivariate Gaussian centred at (1,3). Image source: [3].
Eq. 1: Eigenvalues and Eigenvectors
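For reference, the eigenvalue relation can be stated as follows (a generic statement: A is a square matrix, w_i an eigenvector and λ_i the corresponding eigenvalue):

```latex
% Eigenvalue relation: a non-zero vector w_i is an eigenvector of A
% with eigenvalue \lambda_i if
A w_i = \lambda_i w_i, \qquad w_i \neq 0
```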
Now let us consider the simplest case where M=1. We define a vector w1 ∈ R^D as the direction of the lower dimensional space. Since we are only interested in the direction of the space, we set w1 to be of unit length, i.e., w1^T w1 = 1.
Then the data observations x_n can be projected onto this new space as w1^T x_n.
If x̄ is the mean of the data observations in the original space, then the mean of the samples in the projected space is given by w1^T x̄.
The variance of the projected data is then (1/N) Σ_n (w1^T x_n - w1^T x̄)^2 = w1^T S w1, where S is the covariance matrix of the observed data in the original high dimensional space.
Eq. 6: Covariance of x, S = (1/N) Σ_n (x_n - x̄)(x_n - x̄)^T.
Eq. 9: The maximum variance in the lower dimensional space is equal to the eigenvalue corresponding to
eigenvector w1.
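To make the chain of equations above easier to follow, here is a compact sketch of the M=1 derivation in the notation of the text (the intermediate steps follow Bishop [1]):

```latex
% Variance of the data projected onto the unit vector w_1:
\frac{1}{N}\sum_{n=1}^{N}\left(w_1^\top x_n - w_1^\top \bar{x}\right)^2 = w_1^\top S\, w_1
% Maximise subject to the unit-length constraint via a Lagrange multiplier \lambda_1:
L(w_1, \lambda_1) = w_1^\top S\, w_1 + \lambda_1\left(1 - w_1^\top w_1\right)
% Setting the derivative with respect to w_1 to zero gives the eigenvalue equation:
S\, w_1 = \lambda_1 w_1 \quad\Rightarrow\quad w_1^\top S\, w_1 = \lambda_1
```

So the variance is maximised by choosing w1 to be the eigenvector of S with the largest eigenvalue λ1, which is exactly the statement of Eq. 9.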
Each additional principal direction is chosen to maximise variance while being orthogonal to the existing ones. For the general case of a lower dimensional space with M dimensions, where M < D, the principal components are the eigenvectors w1, w2, …, w_M corresponding to the M largest eigenvalues λ1, λ2, …, λ_M.
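As an illustration (not the author's original code), the top-M eigenvectors of the covariance matrix can be obtained with NumPy along these lines:

```python
import numpy as np

def top_m_eigenvectors(S: np.ndarray, M: int) -> np.ndarray:
    """Return the M eigenvectors of the symmetric matrix S with the largest eigenvalues."""
    # eigh is the right routine here because a covariance matrix is symmetric;
    # it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:M]   # indices of the M largest eigenvalues
    return eigvecs[:, order]                # columns are the principal directions
```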
The basis vectors are chosen to be orthonormal, i.e., w_i^T w_j = δᵢⱼ, where δᵢⱼ is the Kronecker delta. Since the basis is complete, we can represent any vector as a linear combination of the basis vectors
Eq. 12
Since the basis is orthonormal, the coefficient α_ni is simply the dot product of x_n and w_i, i.e., α_ni = x_n^T w_i. Then, we can write Eq. 12 as
Eq. 14
In Eq. 14, the coefficients z_ni of the first M basis vectors depend on the individual data point, while the remaining (D-M) coefficients b_i are constants shared by all the data points (shared offsets).
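The expansion and the M-dimensional approximation referred to above presumably take the standard form (a reconstruction following Bishop [1], in the notation of the text):

```latex
% Eq. 12: any x_n expanded in the complete orthonormal basis {w_i}
x_n = \sum_{i=1}^{D} \alpha_{ni}\, w_i
% Eq. 13: using orthonormality, \alpha_{ni} = x_n^\top w_i
x_n = \sum_{i=1}^{D} \left(x_n^\top w_i\right) w_i
% Eq. 14: approximation in which only the first M coefficients vary with the data point
\tilde{x}_n = \sum_{i=1}^{M} z_{ni}\, w_i + \sum_{i=M+1}^{D} b_i\, w_i
```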
From Eq. 13 and Eq. 14 we can compute the difference between x_n and x̃_n
as
Now, we can substitute Eq. 15 in Eq. 10 to get the objective function as:
Eq. 16
Eq. 17
Eq. 18
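Following Bishop [1], these presumably take the form below: Eq. 16 is the distortion after substituting the difference (Eq. 15) into J and using the orthonormality of the basis, and Eqs. 17 and 18 are the optimal coefficients obtained by setting the corresponding derivatives to zero (a reconstruction, not a verbatim copy of the article's equations):

```latex
% Eq. 16: objective after substituting Eq. 15 into Eq. 10
J = \frac{1}{N}\sum_{n=1}^{N}\left[\sum_{i=1}^{M}\left(x_n^\top w_i - z_{ni}\right)^2
    + \sum_{i=M+1}^{D}\left(x_n^\top w_i - b_i\right)^2\right]
% Eq. 17: \partial J / \partial z_{nj} = 0 gives
z_{nj} = x_n^\top w_j, \qquad j = 1,\dots,M
% Eq. 18: \partial J / \partial b_j = 0 gives
b_j = \bar{x}^\top w_j, \qquad j = M+1,\dots,D
```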
Now, we can substitute Eq. 17 and Eq. 18 in Eq. 16 and get
Our goal is to minimise J(w), but we observe that a trivial solution exists to this problem when w = 0. To overcome this, we again make use of the orthonormality property of the basis vectors, i.e., the constraint w_i^T w_i = 1.
Now, we look at a simple example where D=2 and M=1. We have to choose a
direction w2 such that we can minimise the following objective
Eq. 19
Eq. 20
Eq. 21
Eq. 22
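A sketch of how this example typically unfolds, using the same Lagrange-multiplier argument as in the maximum variance formulation (a reconstruction; the article's own equations are not reproduced verbatim):

```latex
% Objective: reconstruction error when only w_1 is kept, i.e. the variance
% along the discarded direction w_2
J = w_2^\top S\, w_2
% Minimise subject to the unit-length constraint with multiplier \lambda_2:
L(w_2, \lambda_2) = w_2^\top S\, w_2 + \lambda_2\left(1 - w_2^\top w_2\right)
% Stationarity gives the eigenvalue equation and the value of the objective:
S\, w_2 = \lambda_2 w_2 \quad\Rightarrow\quad J = \lambda_2
```

So J is minimised by taking w2 to be the eigenvector with the smallest eigenvalue, and the retained direction w1 is the eigenvector with the largest eigenvalue.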
In the general case, the distortion measure J is given by the sum of the (D-M) eigenvalues associated with the discarded directions. Therefore, our goal is to retain the M largest eigenvalues so that the distortion measure J, now consisting of the (D-M) smallest eigenvalues, is minimised. Thus, we can conclude that minimising the reconstruction error maximises the variance of the projection.
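In symbols, using the eigenvalue notation from above:

```latex
% Distortion in the general case: the sum of the (D - M) smallest eigenvalues
J = \sum_{i=M+1}^{D} \lambda_i
```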
The first step is to compute the mean of the data points and subtract it, so that the data is centred at the origin and has zero mean.
The goal of PCA is to transform the coordinate system such that the covariance between the new axes is 0.
Fig. 2: Goal of PCA is to find space W such that the covariance between new axes is 0.
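Equivalently, if the columns of W are the new axes (the eigenvectors of S), the covariance matrix expressed in the new coordinates is diagonal, so all cross-covariances vanish:

```latex
% Covariance of the transformed data: off-diagonal entries are zero
W^\top S\, W = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_D)
```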
4. Dimensionality Reduction
For dimensionality reduction, we keep only the eigenvectors corresponding to the M largest eigenvalues.
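A minimal NumPy sketch of the whole pipeline (centring, covariance, eigendecomposition, projection) — the function name pca_reduce and its arguments are illustrative, not taken from the original gist:

```python
import numpy as np

def pca_reduce(X: np.ndarray, M: int) -> np.ndarray:
    """Project the (N, D) data matrix X onto its M principal components."""
    # 1. Centre the data so that it has zero mean.
    x_bar = X.mean(axis=0)
    X_centred = X - x_bar

    # 2. Covariance matrix of the centred data (D x D).
    S = X_centred.T @ X_centred / X.shape[0]

    # 3. Eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:M]
    W = eigvecs[:, order]          # (D, M): the M principal directions

    # 4. Dimensionality reduction: project onto the principal directions.
    return X_centred @ W           # (N, M) reduced representation

# Illustrative usage on random data:
X = np.random.randn(100, 5)
Z = pca_reduce(X, M=2)
print(Z.shape)                     # (100, 2)
```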
Fig. 3: PCA can get misled by unstandardised data. (a) The principal component is skewed because PCA is misled by the unstandardised data. (b) PCA when the scales are standardised. Image generated from code at [5].
The principal directions found by PCA are the ones along which the variance is largest. So, PCA can be misled by directions along which the variance appears high simply because of the measurement scale. We can see this in Fig. 3(a), where the principal component is not aligned properly because it is misled by the unstandardised scale. Fig. 3(b) shows the correct principal component when the scales are standardised. Therefore, care needs to be taken to standardise the data to avoid such issues.
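As an illustration of this point (not the author's original code), each feature can be standardised to zero mean and unit variance before running PCA; the helper name standardise below is illustrative:

```python
import numpy as np

def standardise(X: np.ndarray) -> np.ndarray:
    """Scale each feature of the (N, D) data matrix X to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0            # avoid division by zero for constant features
    return (X - mean) / std

# PCA is then run on standardise(X) instead of the raw X, so that no direction
# dominates purely because of its measurement scale.
```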
Conclusion
In this rather long post, we dove deep and looked at the two formulations of PCA: the Maximum Variance and the Minimum Error formulations. We saw that both formulations lead to the same solution/algorithm — select as the new basis the eigenvectors corresponding to the M largest eigenvalues of the covariance matrix of the data. We saw how PCA can be used for dimensionality reduction and how it can be implemented in Python. Finally, we briefly looked at the importance of standardising the data and how it affects the algorithm.
This brings us to the end of this post, which is merely Part 1 of this series on PCA. The next parts will cover Probabilistic PCA, Singular Value Decomposition, Autoencoders and the relationship between Autoencoders, PCA and SVD.
Follow Aadhithya Sankar to get notified when the next parts are made
available!
If you find any mistakes, please leave a comment, I will fix them!🙏🏽 ✌🏽
References
[1] Bishop, Christopher M. Pattern Recognition and Machine Learning. New York: Springer, 2006.
[3]https://commons.wikimedia.org/wiki/File:GaussianScatterPCA.svg#/media/
File:GaussianScatterPCA.svg
[4] https://www.math.ucdavis.edu/~linear/old/notes21.pdf
[5] Murphy, K., Soliman, M., Duran-Martin, G., Kara, A., Liang Ang, M., Reddy, S., & Patel, D. (2021). PyProbML library for Probabilistic Machine Learning [Computer software].
Resources
Here are some resources that can help you understand the topic better.