
https://www.linkedin.com/posts/danny-butvinik_machinelearning-datascience-statistics-activity-7018487302729445376-inSV?utm_source=share&utm_medium=member_desktop

How to manually compute PCA

In data science, machine learning, and statistics, Principal Component Analysis (PCA) is a
dimensionality reduction method: it transforms a large set of variables into a smaller one that
still contains most of the information in the original set.

Reducing the number of variables in a data set naturally comes at the expense of accuracy, but
the trick to dimensionality reduction is to trade a little accuracy for simplicity. Smaller data sets
are easier to explore and visualize, and machine learning algorithms can analyze them faster
because they have fewer variables to deal with.
 
PCA finds directions of maximal variance in the data, and those directions are mutually
orthogonal. This orthogonality is what makes PCA a global algorithm: every new direction (and
the feature it defines) must be orthogonal to all the others, which is a significant global constraint.
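 
As a quick sketch of that constraint (the data below is synthetic, generated purely for illustration), the eigenvectors of the data's covariance matrix, which are exactly the directions PCA finds, come out mutually orthogonal:

```python
import numpy as np

# Toy data: 100 samples, 3 correlated features (values are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

# The principal directions are the eigenvectors of the covariance matrix.
cov = np.cov(X, rowvar=False)
_, eigvecs = np.linalg.eigh(cov)

# Mutual orthogonality: V^T V should be the identity matrix.
print(np.allclose(eigvecs.T @ eigvecs, np.eye(3)))  # True
```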
 
Let’s see how we can manually compute PCA, given a small random table of values (see the
illustration in the linked post); the code sketch after the steps below walks through each one.
 
Step 1: Standardize the dataset.
Step 2: Calculate the covariance matrix for the features in the dataset.
Step 3: Calculate the eigenvalues of the covariance matrix from its characteristic polynomial.
Step 4: Sort the eigenvalues in descending order.
Step 5: Calculate the eigenvector for each eigenvalue, e.g. using Cramer’s rule.
Step 6: Build the eigenvector matrix.
Step 7: Pick the top k eigenvalues and form a matrix of their eigenvectors.
Step 8: Transform the original matrix.
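
Below is a minimal NumPy sketch of the eight steps, assuming a small made-up two-feature table (the values are illustrative, not the table from the post's illustration). For a 2x2 covariance matrix the eigenvalues come straight from the quadratic characteristic polynomial, and each eigenvector can be read off the homogeneous system (C - λI)v = 0, which here stands in for a full Cramer's-rule computation:

```python
import numpy as np

# Made-up table of values: 5 samples, 2 features (purely illustrative).
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Step 1: Standardize the dataset (zero mean, unit variance per feature).
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: Covariance matrix of the standardized features.
C = (Z.T @ Z) / (Z.shape[0] - 1)          # 2x2 symmetric matrix

# Step 3: Eigenvalues from the characteristic polynomial:
# det(C - lam*I) = lam^2 - trace(C)*lam + det(C) = 0 (a quadratic for 2x2).
tr, det = np.trace(C), np.linalg.det(C)
disc = np.sqrt(tr**2 - 4 * det)
eigvals = np.array([(tr + disc) / 2, (tr - disc) / 2])

# Step 4: Sort the eigenvalues in descending order
# (already sorted above; shown explicitly for the step).
eigvals = eigvals[np.argsort(eigvals)[::-1]]

# Step 5: Eigenvector for each eigenvalue by solving (C - lam*I)v = 0.
# For a 2x2 system one row gives the solution directly: v = (b, lam - a).
eigvecs = []
for lam in eigvals:
    a, b = C[0, 0], C[0, 1]
    v = np.array([b, lam - a])
    eigvecs.append(v / np.linalg.norm(v))  # normalize to unit length

# Step 6: Build the eigenvector matrix (one eigenvector per column).
V = np.column_stack(eigvecs)

# Step 7: Pick the top k eigenvalues and keep their eigenvectors.
k = 1
W = V[:, :k]

# Step 8: Transform the original (standardized) matrix.
X_pca = Z @ W
print(X_pca)
```

With k = 1 the five samples are projected onto the single direction of maximal variance; picking a larger k keeps more of the original variance at the cost of less reduction.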
