Multivariate Analysis: Case Study
Nuwan Senevirathne
Index No: 10534
Question 1
I. There are 7 variables and 23 observations (the observations are countries).
II. Covariance Matrix (MATLAB code: cov(X))
Descriptive Statistics
Variable   Mean       Std. Deviation   Variance
X1         11.4117    0.39080          0.159669565
X2         23.1217    0.90449          0.855287747
X3         52.0913    2.16821          4.914802767
X4         2.0263     0.07988          0.006670356
X5         4.1829     0.20452          0.043731225
X6         9.0921     0.51907          0.281681423
X7         158.3775   11.86974         147.2948292
III. Correlation Matrix (MATLAB code: corrcoef(X))
IV. The original data set can be standardized using the MATLAB code zscore(X) (please find the attached Excel data sheet). Applying cov to the standardized data then gives the covariance matrix of the standardized data, which is the same as the correlation matrix of the original (non-standardized) data.
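The equality stated in IV can be checked numerically. The following is a minimal sketch in plain Python rather than MATLAB; the two columns are made-up illustrative values, not the case-study data, and the helper names (sample_cov, zscore, correlation) are hypothetical. It assumes the n - 1 denominator, which is the default in MATLAB's cov and zscore.

```python
import math

def mean(col):
    return sum(col) / len(col)

def sample_cov(a, b):
    # Sample covariance with the n - 1 denominator.
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def zscore(col):
    # Centre each value and scale by the sample standard deviation.
    m = mean(col)
    s = math.sqrt(sample_cov(col, col))
    return [(x - m) / s for x in col]

def correlation(a, b):
    # Pearson correlation computed from the raw (non-standardized) columns.
    return sample_cov(a, b) / math.sqrt(sample_cov(a, a) * sample_cov(b, b))

# Made-up illustrative columns, NOT the case-study data.
x1 = [11.2, 11.5, 11.0, 11.9, 11.4]
x2 = [22.5, 23.4, 22.9, 24.1, 23.0]

cov_of_z = sample_cov(zscore(x1), zscore(x2))   # covariance of standardized data
corr_of_raw = correlation(x1, x2)               # correlation of original data

print(math.isclose(cov_of_z, corr_of_raw))
```

The same reasoning applies to every pair of the 7 variables, which is why the whole covariance matrix of the standardized data matches the correlation matrix.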
V. The covariance matrix was reproduced using spectral decomposition (MATLAB code: svd(X) for the decomposition), keeping only 2 eigenvalues because together they explain 99.89% of the total variance. The MATLAB code used to calculate the eigenvalues and eigenvectors was [v,d] = eigs(cov(X),7). Excel functions (MMULT) were used for the rest of the calculations. The trace of the original matrix was 153.556 and the trace of the reproduced matrix is 153.398. Hence, the approximate rank is 2, because the 2 largest eigenvalues together represent more than 99% of the total variance. (Please find the eigenvector calculations in the Excel data sheet.)
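The trace figures in V can be cross-checked with simple arithmetic, since the trace of a covariance matrix is the sum of the variances on its diagonal. The sketch below is in plain Python; it uses the variances reported in the Descriptive Statistics of Question 1 and the reported trace of the reproduced matrix. The variable names are illustrative only.

```python
# Variances of X1..X7 as reported in the Descriptive Statistics (Question 1, II);
# these are the diagonal entries of the covariance matrix.
variances = [0.159669565, 0.855287747, 4.914802767, 0.006670356,
             0.043731225, 0.281681423, 147.2948292]

trace_original = sum(variances)   # trace of the full covariance matrix
trace_rank2 = 153.398             # trace of the 2-eigenvalue reproduction, as reported

# Share of total variance retained by the 2 largest eigenvalues.
share = trace_rank2 / trace_original
print(f"trace = {trace_original:.3f}, share retained = {share:.2%}")
```

The sum of the variances reproduces the reported trace of 153.556, and the retained share comes out at roughly 99.9%, in line with the figure quoted above.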
Question 2
The scree plot
According to the slopes of the above plot, it is clear that 3 principal components (PCs) are enough for the analysis.
In addition, one may consider the residual matrices to determine the number of PCs to keep. They also suggest that 3 PCs would be ideal (please find the 3 calculated residual matrices in the attached Excel sheet; the 3 separate calculations also show what happens when only 1 or 2 factors are considered).
SPSS Output:
Total Variance Explained
Component   Total   % of Variance   Cumulative %
1           2.831   40.450          40.450
2           2.216   31.657          72.106
3           1.760   25.148          97.254
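The percentages in the SPSS output can be recovered from the first numeric value in each row. The sketch below is in plain Python and rests on two assumptions not stated in the output itself: that this first value is the component's eigenvalue, and that the analysis was run on the correlation matrix, so the total variance equals the number of variables (7).

```python
# First numeric value of each row in the SPSS table above, assumed to be eigenvalues.
eigenvalues = [2.831, 2.216, 1.760]
n_variables = 7  # assumption: standardized variables, so total variance = 7

# Percentage of total variance carried by each retained component.
percentages = [100 * ev / n_variables for ev in eigenvalues]

cumulative = 0.0
for k, pct in enumerate(percentages, start=1):
    cumulative += pct
    print(f"PC{k}: {pct:.2f}% of variance, cumulative {cumulative:.2f}%")
```

Small differences in the last digits relative to the SPSS table are expected, because the eigenvalues are shown rounded to three decimals.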
The rotated solution shows that X1, X2 and X3 are highly correlated. X4 also shows a fairly strong relation to that group, but it is correlated even more strongly with X5 and X6.
X7 is not related to any of the other variables and stands completely isolated.
The component plot in rotated space above gives a very clear picture of the variables. X1, X2 and X3 are on one side and X5 and X6 are on the other, with X4 in the middle and a little closer to X5 and X6 than to X1, X2 and X3. X7 lies away from the rest of the variables.
If these variables are seen as belonging to 3 categories, say short, medium and long, then X1, X2 and X3 clearly belong to the short category and X5 and X6 clearly belong to the medium category. X4 lies in between but is much closer to the medium category. X7 then belongs to the long category.