
STA3005

MULTIVARIATE DATA ANALYSIS

MEAN VECTOR INFERENCE


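Hotelling's T-squared test of whether a population mean vector $\mu$ equals a specified vector $\mu_0$
$H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$

Reject $H_0$ in favour of $H_1$ at significance level $\alpha$ if

$$T^2 = n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0) > \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha)$$

where the data matrix has n items each with p variables, $\bar{x}$ is the sample mean vector and $S$ is the sample covariance matrix.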


Example
Suppose we have data on n=3 parishes with just p=2 variables, say the
number of present Covid-19 cases and the number of recoveries. It is hypothesized
that among all parishes the mean number of cases is 9 and the mean number of
recoveries is 8, represented by $\mu_0 = (9, 8)'$.
Are the data consistent with this claim? The data for the three parishes are:

            Present cases   Recoveries
Parish 1          8              7
Parish 2         12              4
Parish 3         10             10
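
$\bar{x} = (10, 7)'$

$s_{11} = \frac{(8-10)^2 + (12-10)^2 + (10-10)^2}{3-1} = \frac{8}{2} = 4$
$s_{22} = \frac{(7-7)^2 + (4-7)^2 + (10-7)^2}{3-1} = \frac{18}{2} = 9$
$s_{12} = \frac{(8-10)(7-7) + (12-10)(4-7) + (10-10)(10-7)}{3-1} = \frac{-6}{2} = -3$

$S = \begin{pmatrix} 4 & -3 \\ -3 & 9 \end{pmatrix}$, det(S) = (4)(9) - (-3)(-3) = 27, $S^{-1} = \frac{1}{27}\begin{pmatrix} 9 & 3 \\ 3 & 4 \end{pmatrix}$
Therefore, with $\mu_0 = (9, 8)'$,
$T^2 = n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)$
$= 3 \begin{pmatrix} 1 & -1 \end{pmatrix} \frac{1}{27}\begin{pmatrix} 9 & 3 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{3(7)}{27} = \frac{7}{9} = 0.778$

$\frac{(n-1)p}{n-p} F_{p,\,n-p}(0.05) = \frac{(2)(2)}{1} F_{2,1}(0.05) = 4(199.5) = 798$
since from tables $F_{2,1}(0.05) = 199.5$.
It is false that $T^2 > 798$.
Therefore we do not reject $H_0$;
the data are consistent with $\mu = (9, 8)'$.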
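
The arithmetic of this example can be checked with a short script. This is a minimal sketch in plain Python that assumes the test statistic $T^2 = n(\bar{x} - \mu_0)' S^{-1} (\bar{x} - \mu_0)$ and takes the table value $F_{2,1}(0.05) = 199.5$ as given rather than computing it. The variable names are illustrative only.

```python
# Hotelling's T^2 for the parish example (n = 3, p = 2), in plain Python.
data = [(8, 7), (12, 4), (10, 10)]   # (present cases, recoveries)
mu0 = (9, 8)                          # hypothesized mean vector
n, p = len(data), 2

# Sample mean vector
xbar = [sum(row[k] for row in data) / n for k in range(p)]

# Sample covariance matrix S (divisor n - 1)
S = [[sum((row[i] - xbar[i]) * (row[k] - xbar[k]) for row in data) / (n - 1)
      for k in range(p)] for i in range(p)]

# Inverse of the 2x2 matrix S via its determinant
det_S = S[0][0] * S[1][1] - S[0][1] * S[1][0]
S_inv = [[S[1][1] / det_S, -S[0][1] / det_S],
         [-S[1][0] / det_S, S[0][0] / det_S]]

# T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0)
d = [xbar[k] - mu0[k] for k in range(p)]
t2 = n * sum(d[i] * S_inv[i][k] * d[k] for i in range(p) for k in range(p))

# Critical value; F_{2,1}(0.05) = 199.5 is read from tables, not computed
f_crit = 199.5
critical = (n - 1) * p / (n - p) * f_crit

reject = t2 > critical
print(round(t2, 3), critical, reject)
```

The script should reproduce det(S) = 27, a statistic of about 0.778 and a critical value of 798, so the null hypothesis is not rejected.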
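Confidence regions and simultaneous confidence intervals for
the means
If the region computed from the data matrix X is R(X), then R(X) is a $100(1-\alpha)\%$ confidence region when
P(R(X) will cover the population mean vector $\mu$) = $1 - \alpha$
For a random sample from a multivariate normal population,
$$P\left[ n(\bar{X} - \mu)' S^{-1} (\bar{X} - \mu) \le \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha) \right] = 1 - \alpha$$
The confidence region for $\mu$ is the ellipsoid of all $\mu$ such that
$$n(\bar{x} - \mu)' S^{-1} (\bar{x} - \mu) \le \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha)$$
If $\lambda_i$ and $e_i$ are the eigenvalues and eigenvectors of S, then the ellipsoid is centred at $\bar{x}$ and its axes are
$$\bar{x} \pm \sqrt{\lambda_i} \sqrt{\frac{p(n-1)}{n(n-p)} F_{p,\,n-p}(\alpha)} \; e_i, \quad i = 1, 2, \ldots, p$$
Simultaneous confidence intervals for the components $\mu_i$ are
$$\bar{x}_i \pm \sqrt{\frac{p(n-1)}{n-p} F_{p,\,n-p}(\alpha)} \sqrt{\frac{s_{ii}}{n}}, \quad i = 1, 2, \ldots, p$$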
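
As an illustration, the intervals and axis lengths can be evaluated for the parish example. This is a sketch that assumes the standard $T^2$-based forms: simultaneous intervals $\bar{x}_i \pm \sqrt{c^2 s_{ii}/n}$ and axis half-lengths $\sqrt{\lambda_i c^2 / n}$, with $c^2 = \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha)$. The values $\bar{x} = (10, 7)'$, S and $F_{2,1}(0.05) = 199.5$ come from the example; the variable names are illustrative.

```python
import math

# Values from the parish example: n = 3, p = 2
n, p = 3, 2
xbar = [10.0, 7.0]
S = [[4.0, -3.0], [-3.0, 9.0]]
f_crit = 199.5                        # F_{2,1}(0.05), read from tables

c2 = (n - 1) * p / (n - p) * f_crit   # scaled F critical value (798 here)

# Simultaneous intervals: xbar_i +/- sqrt(c2 * s_ii / n)
intervals = []
for i in range(p):
    half = math.sqrt(c2 * S[i][i] / n)
    intervals.append((xbar[i] - half, xbar[i] + half))

# Eigenvalues of a symmetric 2x2 matrix from its trace and determinant
tr = S[0][0] + S[1][1]
det = S[0][0] * S[1][1] - S[0][1] ** 2
disc = math.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]

# Half-lengths of the ellipse axes: sqrt(lambda_i * c2 / n)
half_lengths = [math.sqrt(lam * c2 / n) for lam in eigenvalues]

for i, (lo, hi) in enumerate(intervals, start=1):
    print(f"mu_{i}: ({lo:.2f}, {hi:.2f})")
print("axis half-lengths:", [round(h, 2) for h in half_lengths])
```

With only n = 3 items the critical value is very large, so the intervals are wide enough to include negative counts. That reflects the tiny sample rather than a defect in the method.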
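Test for multivariate mean of paired differences

Each of n units is measured on p variables under two treatments, giving $X_{1j}$ and $X_{2j}$.
The paired differences are
$D_j = X_{1j} - X_{2j}$, j=1,2,3,…,n
with
$\bar{D} = \frac{1}{n} \sum_{j=1}^{n} D_j$ and $S_d = \frac{1}{n-1} \sum_{j=1}^{n} (D_j - \bar{D})(D_j - \bar{D})'$

If the $D_j$ are a random sample from a multivariate normal population with mean $\delta$, then
$T^2 = n(\bar{D} - \delta)' S_d^{-1} (\bar{D} - \delta)$ is distributed as $\frac{(n-1)p}{n-p} F_{p,\,n-p}$.
Therefore, to test $H_0: \delta = 0$ versus $H_1: \delta \neq 0$:
If $T^2 = n \bar{d}' S_d^{-1} \bar{d} > \frac{(n-1)p}{n-p} F_{p,\,n-p}(\alpha)$ then reject $H_0$.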
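
The paired test can be sketched in a few lines of plain Python. The sketch assumes the usual statistic $T^2 = n \bar{d}' S_d^{-1} \bar{d}$ for $H_0: \delta = 0$ and is restricted to p = 2 so that the matrix inverse can be written out by hand. The data are hypothetical and not from these notes; they are chosen only to show the mechanics. The value $F_{2,2}(0.05) = 19.00$ is read from standard F tables.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [sum(r[k] for r in rows) / len(rows) for k in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[i] - m[i]) * (r[k] - m[k]) for r in rows) / (n - 1)
             for k in range(p)] for i in range(p)]

def paired_t2(x1, x2):
    """T^2 = n * dbar' Sd^{-1} dbar for p = 2 variables (H0: delta = 0)."""
    diffs = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(x1, x2)]
    n = len(diffs)
    dbar = mean_vector(diffs)
    Sd = covariance(diffs)
    det = Sd[0][0] * Sd[1][1] - Sd[0][1] * Sd[1][0]
    inv = [[Sd[1][1] / det, -Sd[0][1] / det],
           [-Sd[1][0] / det, Sd[0][0] / det]]
    return n * sum(dbar[i] * inv[i][k] * dbar[k]
                   for i in range(2) for k in range(2))

# HYPOTHETICAL data (not from the source): 4 units, 2 variables, 2 treatments
treatment1 = [[10, 6], [12, 9], [9, 7], [13, 8]]
treatment2 = [[8, 5], [11, 6], [9, 6], [10, 9]]

n, p = 4, 2
t2 = paired_t2(treatment1, treatment2)
f_crit = 19.00                                # F_{2,2}(0.05) from tables
critical = (n - 1) * p / (n - p) * f_crit
reject = t2 > critical
print(round(t2, 2), critical, reject)
```

For these made-up numbers the statistic is 17.5 against a critical value of 57, so $H_0$ would not be rejected.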
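Tests for two independent population means

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$

$X_{11}, X_{12}, \ldots, X_{1n_1}$ is a random sample of size $n_1$ from population 1 with p variables


$X_{21}, X_{22}, \ldots, X_{2n_2}$ is a random sample of size $n_2$ from population 2 with p variables and the two
populations are independent.
For small samples we also assume
1. Both populations are multivariate normal
2. They have the same covariance matrix $\Sigma$
The pooled estimate of Σ is
$$S_{pooled} = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$

Reject $H_0$ if
$$T^2 = (\bar{x}_1 - \bar{x}_2)' \left[ \left( \frac{1}{n_1} + \frac{1}{n_2} \right) S_{pooled} \right]^{-1} (\bar{x}_1 - \bar{x}_2) > \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1} F_{p,\,n_1+n_2-p-1}(\alpha)$$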
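
A small sketch of the two-sample calculation follows. It assumes the pooled-covariance statistic $T^2 = (\bar{x}_1 - \bar{x}_2)' [(1/n_1 + 1/n_2) S_{pooled}]^{-1} (\bar{x}_1 - \bar{x}_2)$ and, to avoid inventing data, borrows Group 1 and Group 2 from the MANOVA example later in these notes. Treating those two groups as a two-sample problem is an illustration only, and $F_{2,3}(0.05) = 9.55$ is read from standard F tables.

```python
def mean_vector(rows):
    """Column means of a list of equal-length rows."""
    return [sum(r[k] for r in rows) / len(rows) for k in range(len(rows[0]))]

def covariance(rows):
    """Sample covariance matrix with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = mean_vector(rows)
    return [[sum((r[i] - m[i]) * (r[k] - m[k]) for r in rows) / (n - 1)
             for k in range(p)] for i in range(p)]

sample1 = [(8, 4), (5, 3), (8, 5)]     # Group 1 of the MANOVA example
sample2 = [(10, 4), (8, 5), (6, 6)]    # Group 2 of the MANOVA example
n1, n2, p = len(sample1), len(sample2), 2

# Pooled covariance: ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2)
S1, S2 = covariance(sample1), covariance(sample2)
S_pooled = [[((n1 - 1) * S1[i][k] + (n2 - 1) * S2[i][k]) / (n1 + n2 - 2)
             for k in range(p)] for i in range(p)]

# M = (1/n1 + 1/n2) * S_pooled, inverted by the 2x2 determinant formula
scale = 1 / n1 + 1 / n2
M = [[scale * S_pooled[i][k] for k in range(p)] for i in range(p)]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det],
         [-M[1][0] / det, M[0][0] / det]]

# T^2 = d' M^{-1} d with d the difference of sample mean vectors
d = [a - b for a, b in zip(mean_vector(sample1), mean_vector(sample2))]
t2 = sum(d[i] * M_inv[i][k] * d[k] for i in range(p) for k in range(p))

f_crit = 9.55                          # F_{2,3}(0.05) from tables
critical = (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * f_crit
reject = t2 > critical
print(round(t2, 3), round(critical, 2), reject)
```

Here the statistic is 24/11, about 2.18, well below the critical value of about 25.47, so the two mean vectors would not be declared different.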
Comparing Several Means with Multivariate Analysis of
Variance (MANOVA)
In this case there are g populations each with p variables, each
with an equal number of items n.
Assumptions
For $l = 1, 2, \ldots, g$
1. $X_{l1}, X_{l2}, \ldots, X_{ln_l}$ is a random sample of size $n_l$ from a population with mean
$\mu_l$, and the samples from different populations are independent.
2. All populations have a common covariance matrix $\Sigma$
3. Each population is multivariate normal
Similar to univariate ANOVA, the sums of squares components add up
to the total sum of squares for MANOVA:
Total = Between groups (B) + Within groups (W)

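For MANOVA the sums of squares and cross-products matrices are as follows, where $\bar{x}_l$ is the mean vector of group l and $\bar{x}$ is the overall mean vector:

Between groups: $B = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})'$
Within groups: $W = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)'$
Total: $B + W = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x})(x_{lj} - \bar{x})'$

An alternative approach to get the sums of squares is to use
covariance matrices:
$W = (n_1 - 1) S_1 + (n_2 - 1) S_2 + \cdots + (n_g - 1) S_g$
where $S_l$ is the sample covariance matrix for group l.
B is the between-groups sums of squares and cross-products matrix, computed
from the group means and the overall means of all groups combined:
$b_{ik} = \sum_{l=1}^{g} n_l (\bar{x}_{li} - \bar{x}_i)(\bar{x}_{lk} - \bar{x}_k)$ for each matrix element of B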
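One option for a test statistic involves Wilks' lambda
$$\Lambda = \frac{|W|}{|B + W|}$$
Small values of $\Lambda$ are evidence against $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$.
The distribution of Wilks' lambda for multivariate normal data is given by:

Option   Number of      Number of    Sampling distribution for
         variables p    groups g     multivariate normal data
1        p = 1          g >= 2       $\left(\frac{\sum n_l - g}{g - 1}\right)\left(\frac{1 - \Lambda}{\Lambda}\right) \sim F_{g-1,\ \sum n_l - g}$
2        p = 2          g >= 2       $\left(\frac{\sum n_l - g - 1}{g - 1}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) \sim F_{2(g-1),\ 2(\sum n_l - g - 1)}$
3        p >= 1         g = 2        $\left(\frac{\sum n_l - p - 1}{p}\right)\left(\frac{1 - \Lambda}{\Lambda}\right) \sim F_{p,\ \sum n_l - p - 1}$
4        p >= 1         g = 3        $\left(\frac{\sum n_l - p - 2}{p}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) \sim F_{2p,\ 2(\sum n_l - p - 2)}$

For other cases and large sample sizes use Bartlett's test, with $n = \sum n_l$:
$$-\left(n - 1 - \frac{p + g}{2}\right) \ln \Lambda > \chi^2_{p(g-1)}(\alpha)$$
In every case, reject $H_0$ if the test statistic is larger than the table value.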
Example
Consider the data set with g = 3 groups, p = 2 variables and n = 3
replicates in each group
               Group 1        Group 2        Group 3
               Var1  Var2     Var1  Var2     Var1  Var2
Replicate 1      8     4       10     4        5     6
Replicate 2      5     3        8     5        6     4
Replicate 3      8     5        6     6        7     8

Group means:

               Group 1        Group 2        Group 3
               Var1  Var2     Var1  Var2     Var1  Var2
Mean             7     4        8     5        6     6
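For group 1, $S_1 = \begin{pmatrix} 3 & 1.5 \\ 1.5 & 1 \end{pmatrix}$ and $(n-1) S_1 = \begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix}$
For group 2, $S_2 = \begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}$ and $(n-1) S_2 = \begin{pmatrix} 8 & -4 \\ -4 & 2 \end{pmatrix}$
For group 3, $S_3 = \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix}$ and $(n-1) S_3 = \begin{pmatrix} 2 & 2 \\ 2 & 8 \end{pmatrix}$
$W = 2 S_1 + 2 S_2 + 2 S_3 = \begin{pmatrix} 6 & 3 \\ 3 & 2 \end{pmatrix} + \begin{pmatrix} 8 & -4 \\ -4 & 2 \end{pmatrix} + \begin{pmatrix} 2 & 2 \\ 2 & 8 \end{pmatrix}$
$= \begin{pmatrix} 16 & 1 \\ 1 & 12 \end{pmatrix}$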
Overall mean for variable1=(8+5+8+10+8+6+5+6+7)/9=63/9=7
Overall mean for variable2=(4+3+5+4+5+6+6+4+8)/9=45/9=5

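Using $b_{ik} = \sum_{l=1}^{g} n_l (\bar{x}_{li} - \bar{x}_i)(\bar{x}_{lk} - \bar{x}_k)$ for each matrix element of B:

$b_{11} = 3[(7-7)^2 + (8-7)^2 + (6-7)^2] = 6$
$b_{22} = 3[(4-5)^2 + (5-5)^2 + (6-5)^2] = 6$
$b_{12} = 3[(7-7)(4-5) + (8-7)(5-5) + (6-7)(6-5)] = -3$
So $B = \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix}$
So $B + W = \begin{pmatrix} 6 & -3 \\ -3 & 6 \end{pmatrix} + \begin{pmatrix} 16 & 1 \\ 1 & 12 \end{pmatrix} = \begin{pmatrix} 22 & -2 \\ -2 & 18 \end{pmatrix}$

Wilks' Lambda $= \frac{|W|}{|B + W|} = \frac{(16)(12) - (1)(1)}{(22)(18) - (-2)(-2)} = \frac{191}{392} = 0.4872$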


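For this test (p = 2, g = 3) we have a choice of options 2 or 4 in the table of sampling distributions.
Using option 4, the test statistic is
$\left(\frac{\sum n_l - p - 2}{p}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right) = \left(\frac{9 - 2 - 2}{2}\right)\left(\frac{1 - \sqrt{0.4872}}{\sqrt{0.4872}}\right) = 2.5(0.4326) = 1.08$

The table value is $F_{2p,\ 2(\sum n_l - p - 2)}(0.05) = F_{4,10}(0.05) = 3.48$

Since the test statistic 1.08 is less than the table value 3.48, we do not reject
$H_0$: there is no evidence at the 5% level that the mean vectors are different.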
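
The whole example can be reproduced with a short script. This is a sketch in plain Python, assuming Wilks' lambda is $|W| / |B + W|$ and using the option 4 transformation $\left(\frac{\sum n_l - p - 2}{p}\right)\left(\frac{1 - \sqrt{\Lambda}}{\sqrt{\Lambda}}\right)$. The table value $F_{4,10}(0.05) = 3.48$ is taken as given and the variable names are illustrative.

```python
import math

# Data from the example: 3 groups, 2 variables, 3 replicates per group
groups = [
    [(8, 4), (5, 3), (8, 5)],     # Group 1
    [(10, 4), (8, 5), (6, 6)],    # Group 2
    [(5, 6), (6, 4), (7, 8)],     # Group 3
]
p = 2
all_rows = [row for grp in groups for row in grp]
N = len(all_rows)                 # total number of items (sum of n_l)

def mean_vector(rows):
    """Column means of a list of rows with p entries each."""
    return [sum(r[k] for r in rows) / len(rows) for k in range(p)]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

grand = mean_vector(all_rows)     # overall mean vector

# Accumulate within-group (W) and between-group (B) matrices
W = [[0.0] * p for _ in range(p)]
B = [[0.0] * p for _ in range(p)]
for grp in groups:
    m = mean_vector(grp)
    for i in range(p):
        for k in range(p):
            W[i][k] += sum((r[i] - m[i]) * (r[k] - m[k]) for r in grp)
            B[i][k] += len(grp) * (m[i] - grand[i]) * (m[k] - grand[k])

total = [[B[i][k] + W[i][k] for k in range(p)] for i in range(p)]
wilks = det2(W) / det2(total)

# Option 4 (g = 3): F-type statistic on 2p and 2(N - p - 2) degrees of freedom
root = math.sqrt(wilks)
f_stat = (N - p - 2) / p * (1 - root) / root
f_crit = 3.48                     # F_{4,10}(0.05), read from tables
reject = f_stat > f_crit

print("W =", W)
print("B =", B)
print("Wilks lambda =", round(wilks, 4))
print("F statistic =", round(f_stat, 2), "reject:", reject)
```

The script should give W with entries 16, 1, 1, 12 and B with entries 6, -3, -3, 6, a lambda of about 0.4872 and a statistic of about 1.08, matching the hand calculation.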
