Principal Component Analysis


Dimension reduction tool
A Multivariate Analysis problem could start out with a substantial number of correlated variables. Principal Component Analysis is a dimension-reduction tool that can be used advantageously in such situations. Principal component analysis aims at reducing a large set of variables to a small set that still contains most of the information in the large set.

Principal factors
The technique of principal component analysis enables us to create and use a reduced set of variables, which are called principal factors. A reduced set is much easier to analyze and interpret. To study a data set that results in the estimation of roughly 500 parameters may be difficult, but if we could reduce these to 5 it would certainly make our day. We will show in what follows how to achieve substantial dimension reduction.

Inverse transformation not possible
While these principal factors represent or replace one or more of the original variables, it should be noted that they are not just a one-to-one transformation, so inverse transformations are not possible.

Original data matrix
To shed light on the structure of principal components analysis, let us consider a multivariate data matrix X, with n rows and p columns. The p elements of each row are scores or measurements on a subject, such as height, weight and age.

Linear function that maximizes variance
Next, standardize the X matrix so that each column mean is 0 and each column variance is 1. Call this matrix Z. Each column is a vector variable, z_i, i = 1, …, p. The main idea behind principal component analysis is to derive a linear function y for each of the vector variables z_i. This linear function possesses an extremely important property: namely, its variance is maximized.
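
To make this step concrete, here is a minimal NumPy sketch of the standardization, assuming a small randomly generated X (the toy data and the names X and Z are ours, chosen to mirror the notation above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))             # toy data matrix: n = 100 rows, p = 3 columns

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # subtract column means, divide by column std devs
print(Z.mean(axis=0).round(12))           # ~[0. 0. 0.]: every column mean is 0
print(Z.var(axis=0).round(12))            # [1. 1. 1.]: every column variance is 1
```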

Linear function is component of z
This linear function is referred to as a component of z. To illustrate the computation of a single element for the jth y vector, consider the product y = zv′, where v′ is a column vector of V and V is a p×p coefficient matrix that carries the p-element variable z into the derived n-element variable y. V is known as the eigenvector matrix. The dimension of z is 1×p, the dimension of v′ is p×1. The scalar algebra for the component score for the ith individual of y_j, j = 1, …, p, is:

y_ij = v′_1 z_1i + v′_2 z_2i + ⋯ + v′_p z_pi

In matrix notation, for all of the y at once:

Y = ZV
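
As a hedged illustration of Y = ZV, the sketch below continues the toy X and Z from the previous snippet. It takes V from np.linalg.eigh, which returns the eigenvectors of a symmetric matrix in ascending order of eigenvalue, so this V is one valid eigenvector matrix; other conventions may flip signs or reorder columns:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardized matrix, as above

R = (Z.T @ Z) / len(Z)                    # correlation matrix of the standardized data
eigvals, V = np.linalg.eigh(R)            # columns of V are eigenvectors of R
Y = Z @ V                                 # all component scores at once: Y = ZV
print(Y.shape)                            # (100, 3): one score per individual per component
```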

Mean and dispersion matrix of y
The mean of y is m_y = V′m_z = 0, because m_z = 0.

The dispersion matrix of y is

D_y = V′D_z V = V′RV
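
A quick numerical check of this identity, under the same toy setup as above: because the columns of V are orthonormal eigenvectors of the symmetric matrix R, the product V′RV comes out diagonal, with the eigenvalues of R on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Z.T @ Z) / len(Z)
eigvals, V = np.linalg.eigh(R)

Dy = V.T @ R @ V                          # dispersion matrix of y
print(np.allclose(Dy, np.diag(eigvals)))  # True: D_y is diagonal, eigenvalues of R on it
```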

R is correlation matrix
Now, it can be shown that the dispersion matrix D_z of a standardized variable is a correlation matrix. Thus R is the correlation matrix for z.

Number of parameters to estimate increases rapidly as p increases
At this juncture you may be tempted to say: "so what?". To answer this, let us look at the intercorrelations among the elements of a vector variable. The number of parameters to be estimated for a p-element variable is

- p means
- p variances
- (p² − p)/2 covariances

for a total of 2p + (p² − p)/2 parameters. So:

- If p = 2, there are 5 parameters
- If p = 10, there are 65 parameters
- If p = 30, there are 495 parameters
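
The total 2p + (p² − p)/2 is easy to tabulate; this one-off helper (ours, not part of any library) reproduces the figures above:

```python
def n_parameters(p):
    """p means + p variances + (p*p - p)/2 covariances."""
    return 2 * p + (p * p - p) // 2

for p in (2, 10, 30):
    print(p, n_parameters(p))  # 2 -> 5, 10 -> 65, 30 -> 495
```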

Uncorrelated variables require no covariance estimation
All these parameters must be estimated and interpreted. That is a herculean task, to say the least. Now, if we could transform the data so that we obtain a vector of uncorrelated variables, life becomes much more bearable, since there are no covariances.
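
To close the loop, here is a small sketch of that payoff on deliberately correlated toy data (five noisy copies of one underlying signal; the setup is invented purely for illustration). After the transformation, the components have no covariances left to estimate, and the eigenvalues show one component already carrying most of the total variance:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))             # one underlying signal
X = t + 0.3 * rng.normal(size=(200, 5))   # five noisy, highly correlated copies of it
Z = (X - X.mean(axis=0)) / X.std(axis=0)

R = (Z.T @ Z) / len(Z)
eigvals, V = np.linalg.eigh(R)
Y = Z @ V

C = (Y.T @ Y) / len(Y)                    # dispersion matrix of the components
print(np.allclose(C, np.diag(eigvals)))   # True: all covariances are (numerically) zero
print((eigvals[::-1] / eigvals.sum()).round(3))  # variance share per component, largest first
```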

https://www.youtube.com/watch?v=F3YoC5A6Avg
