Multivariate Analysis of Variance (MANOVA)
9.1 Introduction
The multivariate analysis of variance (MANOVA) is the multivariate analog of the analysis of variance
(ANOVA) procedure used for univariate data.
02/05/2022
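                      Treatment
Subject    1            2            ....    g
1          $Y_{11}$     $Y_{21}$     ....    $Y_{g1}$
2          $Y_{12}$     $Y_{22}$     ....    $Y_{g2}$
..         ..           ..           ....    ..
$n_i$      $Y_{1n_1}$   $Y_{2n_2}$   ....    $Y_{gn_g}$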
The columns correspond to the responses to different treatments or from different populations and the
rows correspond to the subjects in each of these treatments or populations.
Notation:
$Y_{ij}$ = Observation from subject $j$ in group $i$
$n_i$ = Number of subjects in group $i$
$N = n_1 + n_2 + \dots + n_g$ = Total sample size.
The assumptions for the Analysis of Variance are the same as for a two-sample t-test, except that the number of
groups is more than two:
(a) The data from group $i$ have common mean $\mu_i$, i.e. $E(Y_{ij}) = \mu_i$. This means that there are no
sub-populations with different means.
(b) Homoskedasticity: The data from all groups have common variance $\sigma^2$, i.e. $\mathrm{var}(Y_{ij}) = \sigma^2$.
That is, the variability in the data does not depend on group membership.
(c) Independence: The subjects are independently sampled.
(d) Normality: The data are normally distributed.
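The hypothesis of interest is that all of the means are equal to one another. Mathematically we write this
as:
$H_0: \mu_1 = \mu_2 = \dots = \mu_g$
against the alternative
$H_a: \mu_i \neq \mu_j$ for at least one pair $i \neq j$,
i.e., there is a difference between at least one pair of group population means. The following notation should
be considered:
$\bar{y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij}$ = sample mean for group $i$.
This involves taking the average of all the observations for $j = 1$ to $n_i$ belonging to the $i$th group. The dot
in the second subscript means that the average involves summing over the second subscript of $y$.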
$\bar{y}_{..} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} Y_{ij}$ = grand mean.
This involves summing all the observations within each group and over the groups and dividing by
the total sample size. The double dots indicate that we are summing over both subscripts of $y$.
The Analysis of Variance involves a partitioning of the total sum of squares, which is defined by the
expression below:
$SS_{total} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{y}_{..})^2$
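Source        d.f.     Sum of squares
Treatments    g - 1    $SS_{treat} = \sum_{i=1}^{g} n_i (\bar{y}_{i.} - \bar{y}_{..})^2$
Residual      N - g    $SS_{error} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{y}_{i.})^2$
Total         N - 1    $SS_{total} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ij} - \bar{y}_{..})^2$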
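where $MS_{treat} = SS_{treat}/(g-1)$ and $MS_{error} = SS_{error}/(N-g)$ are the treatment and residual (error)
mean squares, obtained by dividing each sum of squares by its degrees of freedom.
Under the null hypothesis that the treatment effect is equal across group means, that is
$H_0: \mu_1 = \mu_2 = \dots = \mu_g$, the ratio $F = MS_{treat}/MS_{error}$ has an $F$ distribution with $g-1$ and
$N-g$ degrees of freedom.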
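The partition of the total sum of squares can be checked numerically. The following is a minimal sketch in plain Python; the function name `one_way_anova` and the toy data are hypothetical and are not taken from these slides. It assumes the usual one-way layout in which `groups[i][j]` plays the role of $Y_{ij}$.

```python
def one_way_anova(groups):
    """Partition the total sum of squares for a one-way layout.

    groups[i][j] plays the role of Y_ij: subject j in group i.
    """
    g = len(groups)                      # number of groups
    N = sum(len(y) for y in groups)      # total sample size
    grand_mean = sum(sum(y) for y in groups) / N
    group_means = [sum(y) / len(y) for y in groups]

    # Between-group (treatment) sum of squares
    ss_treat = sum(len(y) * (m - grand_mean) ** 2
                   for y, m in zip(groups, group_means))
    # Within-group (residual) sum of squares
    ss_error = sum((v - m) ** 2
                   for y, m in zip(groups, group_means) for v in y)
    # Total (corrected) sum of squares
    ss_total = sum((v - grand_mean) ** 2 for y in groups for v in y)

    # F ratio of the treatment and residual mean squares
    f_stat = (ss_treat / (g - 1)) / (ss_error / (N - g))
    return {"ss_treat": ss_treat, "ss_error": ss_error,
            "ss_total": ss_total, "df": (g - 1, N - g), "F": f_stat}


# Hypothetical toy data: three groups of sizes 3, 2 and 3.
toy_groups = [[9, 6, 9], [0, 2], [3, 1, 2]]
anova = one_way_anova(toy_groups)
print(anova)
```

For this toy data the treatment and residual sums of squares are 78 and 10, which add up to the total of 88, and the F ratio is 19.5 on (2, 5) degrees of freedom.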
9.3 The Multivariate Approach: One-way Multivariate Analysis of Variance (One-way MANOVA)
Now we will consider the multivariate analog, the Multivariate Analysis of Variance, often abbreviated as
MANOVA. Suppose that we have data on p variables which we can arrange in a table such as the one below:
In this multivariate case the scalar quantities, $Y_{ij}$, of the corresponding table in ANOVA, are replaced by
vectors $\mathbf{Y}_{ij}$ having $p$ observations.
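                      Treatment
Subject    1                     2                     ....    g
1          $\mathbf{Y}_{11}$     $\mathbf{Y}_{21}$     ....    $\mathbf{Y}_{g1}$
2          $\mathbf{Y}_{12}$     $\mathbf{Y}_{22}$     ....    $\mathbf{Y}_{g2}$
..         ..                    ..                    ....    ..
$n_i$      $\mathbf{Y}_{1n_1}$   $\mathbf{Y}_{2n_2}$   ....    $\mathbf{Y}_{gn_g}$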
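Notation
$Y_{ijk}$ = Observation for variable $k$ from subject $j$ in group $i$. These are collected into vectors:
$\mathbf{Y}_{ij} = (Y_{ij1}, Y_{ij2}, \dots, Y_{ijp})'$ = vector of variables for subject $j$ in group $i$.
$n_i$ = The number of subjects in the $i$th treatment.
$N = n_1 + n_2 + \dots + n_g$ = Total sample size.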
Assumptions
The assumptions here are essentially the same as the assumptions in a Hotelling's $T^2$ test, only here they
apply to $g$ groups:
(a) The data from group $i$ have common mean vector $\boldsymbol{\mu}_i$.
(b) The data from all groups have common variance-covariance matrix $\Sigma$.
(c) Independence: The subjects are independently sampled.
(d) Normality: The data are multivariate normally distributed.
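Here we are interested in testing the null hypothesis that the group mean vectors are all equal to one another.
Mathematically this is expressed as:
$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \dots = \boldsymbol{\mu}_g$
vs.
$H_a: \mu_{ik} \neq \mu_{jk}$ for at least one pair $i \neq j$ and at least one variable $k$.
This says that the null hypothesis is false if at least one pair of treatments is different on at least one
variable.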
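Notation
The scalar quantities used in the univariate setting are replaced by vectors in the multivariate setting:
Sample Mean Vector
$\bar{\mathbf{y}}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} \mathbf{Y}_{ij} = (\bar{y}_{i.1}, \bar{y}_{i.2}, \dots, \bar{y}_{i.p})'$ = sample mean vector for group $i$.
This sample mean vector is comprised of the group means for each of the $p$ variables.
$\bar{y}_{i.k} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ijk}$ = sample mean for variable $k$ in group $i$.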
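Grand Mean Vector
$\bar{\mathbf{y}}_{..} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} \mathbf{Y}_{ij} = (\bar{y}_{..1}, \bar{y}_{..2}, \dots, \bar{y}_{..p})'$ = grand mean vector,
where
$\bar{y}_{..k} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} Y_{ijk}$ = grand mean for variable $k$.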
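Consider the total sum of squares and cross products matrix (with $\bar{\mathbf{y}}_{..}$ the grand mean vector over
all $N$ observations):
$T = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{..})(\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{..})'$
$\;\; = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})(\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})' + \sum_{i=1}^{g} n_i (\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})(\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})' = E + H$
where $E$ is the Error Sum of Squares and Cross Products, and $H$ is the Hypothesis Sum of Squares and Cross
Products.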
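The $(k, l)$th element of the error sum of squares and cross products matrix $E$ is:
$\sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ijk} - \bar{y}_{i.k})(Y_{ijl} - \bar{y}_{i.l})$
The $(k, l)$th element of the hypothesis sum of squares and cross products matrix $H$ is:
$\sum_{i=1}^{g} n_i (\bar{y}_{i.k} - \bar{y}_{..k})(\bar{y}_{i.l} - \bar{y}_{..l})$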
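The error and hypothesis matrices can be computed directly from their definitions. The sketch below uses NumPy; the function name `sscp_partition` and the small bivariate data set are hypothetical illustrations, not data from these slides. Each group is assumed to be an array with one row per subject and one column per variable.

```python
import numpy as np


def sscp_partition(groups):
    """Return (E, H, T) for a list of (n_i, p) arrays, one per group."""
    groups = [np.asarray(y, dtype=float) for y in groups]
    all_obs = np.vstack(groups)
    grand_mean = all_obs.mean(axis=0)          # grand mean vector
    p = all_obs.shape[1]

    E = np.zeros((p, p))                       # error SSCP
    H = np.zeros((p, p))                       # hypothesis SSCP
    for y in groups:
        mean_i = y.mean(axis=0)                # sample mean vector of group i
        resid = y - mean_i
        E += resid.T @ resid
        d = (mean_i - grand_mean).reshape(-1, 1)
        H += y.shape[0] * (d @ d.T)

    centered = all_obs - grand_mean
    T = centered.T @ centered                  # total SSCP
    return E, H, T


# Hypothetical toy data: p = 2 variables, g = 3 groups of sizes 3, 2, 3.
toy = [
    [[9, 3], [6, 2], [9, 7]],
    [[0, 4], [2, 0]],
    [[3, 8], [1, 9], [2, 7]],
]
E, H, T = sscp_partition(toy)
print(E)
print(H)
print(np.allclose(T, E + H))  # the partition T = E + H holds
```

For this toy data $E$ has rows (10, 1) and (1, 24), and $H$ has rows (78, -12) and (-12, 48). The diagonal elements are the univariate sums of squares for each variable.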
The partitioning of the total sum of squares and cross products matrix may be summarized in the
MANOVA table:
Source        d.f.     SSCP matrix
Treatments    g - 1    H
Error         N - g    E
Total         N - 1    T
We wish to reject
$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \dots = \boldsymbol{\mu}_g$
if the hypothesis sum of squares and cross products matrix $H$ is large relative to the error sum of squares and
cross products matrix $E$.
We reject $H_0$ if the ratio of generalized variances
$\Lambda^* = \frac{|E|}{|E + H|}$
is too small. The quantity $\Lambda^*$, proposed originally by Wilks, is known as Wilks' lambda. The exact
distribution of $\Lambda^*$ can be derived for the special cases listed in the table below.
Distribution of Wilks' Lambda
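No. of variables   No. of groups   Sampling distribution for multivariate normal data
$p = 1$            $g \ge 2$       $\left(\frac{N-g}{g-1}\right)\left(\frac{1-\Lambda^*}{\Lambda^*}\right) \sim F_{g-1,\;N-g}$
$p = 2$            $g \ge 2$       $\left(\frac{N-g-1}{g-1}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2(g-1),\;2(N-g-1)}$
$p \ge 1$          $g = 2$         $\left(\frac{N-p-1}{p}\right)\left(\frac{1-\Lambda^*}{\Lambda^*}\right) \sim F_{p,\;N-p-1}$
$p \ge 1$          $g = 3$         $\left(\frac{N-p-2}{p}\right)\left(\frac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2p,\;2(N-p-2)}$
Here $N = \sum_{i=1}^{g} n_i$ is the total sample size.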
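As an illustration of Wilks' lambda and one of its exact special cases, the sketch below computes $\Lambda^* = |E| / |E + H|$ and the F statistic commonly quoted for $p = 2$ variables and $g \ge 2$ groups. The function names and the matrices are hypothetical toy values chosen for illustration; the F formula is the standard textbook form and should be checked against the reference in use.

```python
import numpy as np


def wilks_lambda(E, H):
    """Ratio of generalized variances |E| / |E + H|."""
    E = np.asarray(E, dtype=float)
    H = np.asarray(H, dtype=float)
    return np.linalg.det(E) / np.linalg.det(E + H)


def exact_f_two_variables(lam, N, g):
    """F statistic and degrees of freedom for the p = 2, g >= 2 case."""
    root = np.sqrt(lam)
    f_stat = ((N - g - 1) / (g - 1)) * ((1 - root) / root)
    return f_stat, (2 * (g - 1), 2 * (N - g - 1))


# Hypothetical toy matrices for p = 2 variables, g = 3 groups, N = 8.
E_toy = [[10, 1], [1, 24]]
H_toy = [[78, -12], [-12, 48]]

lam = wilks_lambda(E_toy, H_toy)
f_stat, f_df = exact_f_two_variables(lam, N=8, g=3)
print(lam, f_stat, f_df)
```

Here $\Lambda^* = 239/6215 \approx 0.0385$ and the F statistic is about 8.2 on (4, 8) degrees of freedom. The value would then be compared with a tabulated F percentage point.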
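For other cases and large sample sizes, a modification of $\Lambda^*$ due to Bartlett can be used to test $H_0$.
Bartlett has shown that if $H_0$ is true and $N = \sum n_i$ is large,
$-\left(N - 1 - \frac{p+g}{2}\right) \ln \Lambda^* = -\left(N - 1 - \frac{p+g}{2}\right) \ln\left(\frac{|E|}{|E + H|}\right)$
has approximately a chi-square distribution with $p(g-1)$ degrees of freedom. Consequently, for $N$ large, we
reject $H_0$ at significance level $\alpha$ if
$-\left(N - 1 - \frac{p+g}{2}\right) \ln \Lambda^* > \chi^2_{p(g-1)}(\alpha)$,
where $\chi^2_{p(g-1)}(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p(g-1)$
degrees of freedom.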
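Bartlett's large-sample approximation is a one-line computation once $\Lambda^*$ is known. The following sketch assumes the usual form of the statistic, $-(N - 1 - (p+g)/2)\ln\Lambda^*$ with $p(g-1)$ degrees of freedom. The function name and the input values are hypothetical, and the critical value is entered by hand from a chi-square table rather than computed.

```python
import math


def bartlett_chi2(lam, N, p, g):
    """Bartlett's chi-square approximation for Wilks' lambda.

    Returns the test statistic and its degrees of freedom p(g - 1).
    """
    stat = -(N - 1 - (p + g) / 2) * math.log(lam)
    df = p * (g - 1)
    return stat, df


# Hypothetical inputs: Wilks' lambda of 0.5 with N = 100, p = 4, g = 3.
stat, df = bartlett_chi2(0.5, N=100, p=4, g=3)

# Tabulated upper 5% point of chi-square with 8 d.f. (entered by hand).
crit_5pct_df8 = 15.507
reject_h0 = stat > crit_5pct_df8
print(stat, df, reject_h0)
```

With these made-up inputs the statistic is about 66.2 on 8 degrees of freedom, well above the 5% critical value, so equality of the mean vectors would be rejected.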
Example 9.2 (MANOVA table and Wilks' lambda for testing the equality of three mean vectors)
Suppose an additional variable is observed along with the variable introduced in Example 9.1. The sample
sizes are
Example 9.3: (A multivariate analysis of Wisconsin nursing home data)
The Wisconsin Department of Health and Social Services reimburses nursing homes in the state for the
services provided. The department develops a set of formulas for rates for each facility, based on factors
such as level of care, mean wage rate, and average wage rate in the state. Nursing homes can be classified
on the basis of ownership (private party, nonprofit organization, and government) and certification
(skilled nursing facility, intermediate care facility, or a combination of the two). One purpose of a recent
study was to investigate the effects of ownership or certification (or both) on costs. Four costs, computed
on a per-patient-day basis and measured in hours per patient day, were selected for analysis: $X_1$ = cost of
nursing labor, $X_2$ = cost of dietary labor, $X_3$ = cost of plant operation and maintenance labor, and $X_4$ = cost
of housekeeping and laundry labor. A total of $n$ observations on each of the $p = 4$ cost variables were initially
separated according to ownership. Summary statistics for each of the $g = 3$ groups are given in the following
table.
Group             Number of observations   Sample mean vector
1 (private)       $n_1$                    $\bar{\mathbf{x}}_1$
2 (nonprofit)     $n_2$                    $\bar{\mathbf{x}}_2$
3 (government)    $n_3$                    $\bar{\mathbf{x}}_3$
02/05/2022
28 Sample covariance matrices
02/05/2022
29
02/05/2022
30
Assumption 1: This assumption says that there are no subpopulations with different mean vectors. Here,
this assumption might be violated if data collected from the same site had inconsistencies.
Assumption 3: This assumption is satisfied if the assayed data are obtained by randomly sampling data
collected from each site. This assumption would be violated if, for example, data samples were collected in
clusters. In other applications, this assumption may be violated if the data were collected over time or space.
Assumption 4:
For large samples, the Central Limit Theorem says that the sample mean vectors are approximately
multivariate normally distributed, even if the individual observations are not.
For small samples, we cannot rely on the Central Limit Theorem.
Diagnostic procedures are based on the residuals, computed by taking the differences between the individual
observations and the group means for each variable:
$\hat{\epsilon}_{ijk} = Y_{ijk} - \bar{y}_{i.k}$
Thus, for each subject, residuals are defined for each of the $p$ variables. Then, to assess normality, we apply
the following graphical procedures:
Plot the histograms of the residuals for each variable. Look for a symmetric distribution.
Plot a matrix of scatter plots. Look for elliptical distributions and outliers.
Plot three-dimensional scatter plots. Look for elliptical distributions and outliers.
If the histograms are not symmetric or the scatter plots are not elliptical, this would be evidence that the data
are not sampled from a multivariate normal distribution in violation of Assumption 4. In this case, a
normalizing transformation should be considered.
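The residuals used in these diagnostics are simply the observations centered at their own group means. A minimal NumPy sketch follows; the function name `group_residuals` and the toy data are hypothetical. The resulting columns can be passed to any plotting routine for histograms or scatter plots.

```python
import numpy as np


def group_residuals(groups):
    """Subtract each group's mean vector from its observations.

    groups: list of (n_i, p) arrays; returns one (N, p) residual array.
    """
    residuals = []
    for y in groups:
        y = np.asarray(y, dtype=float)
        residuals.append(y - y.mean(axis=0))   # Y_ijk minus group mean of variable k
    return np.vstack(residuals)


# Hypothetical toy data: p = 2 variables, g = 3 groups.
toy_data = [
    [[9, 3], [6, 2], [9, 7]],
    [[0, 4], [2, 0]],
    [[3, 8], [1, 9], [2, 7]],
]
resid = group_residuals(toy_data)
print(resid)
print(resid.sum(axis=0))  # residuals sum to zero for each variable
```

Each column of `resid` holds the residuals for one variable, pooled over groups, which is the form needed for the per-variable histograms and the scatter plot matrix.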
Assumption 2: This assumption can be checked using Bartlett's test for homogeneity of variance-covariance
matrices. To obtain Bartlett's test, let $\Sigma_i$ denote the population variance-covariance matrix for group $i$.
Consider testing:
$H_0: \Sigma_1 = \Sigma_2 = \dots = \Sigma_g$
against
$H_a: \Sigma_i \neq \Sigma_j$ for at least one pair $i \neq j$
Under the alternative hypothesis, at least two of the variance-covariance matrices differ on at least one of their
elements. Let:
$S_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})(\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})'$
denote the sample variance-covariance matrix for group $i$. Compute the pooled variance-covariance matrix
$S_p = \frac{\sum_{i=1}^{g} (n_i - 1) S_i}{\sum_{i=1}^{g} (n_i - 1)} = \frac{E}{N - g}$
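To test the null hypothesis, compute
$L' = c \left\{ (N - g) \ln |S_p| - \sum_{i=1}^{g} (n_i - 1) \ln |S_i| \right\}$
where
$c = 1 - \frac{2p^2 + 3p - 1}{6(p+1)(g-1)} \left\{ \sum_{i=1}^{g} \frac{1}{n_i - 1} - \frac{1}{N - g} \right\}$
The version of Bartlett's test considered in the lesson on the two-sample Hotelling's T-square is a special case
where g = 2. Under the null hypothesis of homogeneous variance-covariance matrices, L' is approximately
chi-square distributed with $\frac{1}{2} p(p+1)(g-1)$ degrees of freedom.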
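The homogeneity test can be sketched in NumPy as follows. This assumes the commonly used form of the statistic, $L' = c\{(N-g)\ln|S_p| - \sum (n_i - 1)\ln|S_i|\}$ with the correction factor $c$ shown in the code comments and $p(p+1)(g-1)/2$ degrees of freedom. The function name and the simulated data are hypothetical, and every group is assumed to have more subjects than variables so that each $S_i$ is nonsingular.

```python
import numpy as np


def log_det(A):
    """Natural log of |A| for a positive definite matrix."""
    return np.linalg.slogdet(A)[1]


def bartlett_covariance_test(groups):
    """Return (L', degrees of freedom, pooled covariance S_p)."""
    groups = [np.asarray(y, dtype=float) for y in groups]
    g = len(groups)
    p = groups[0].shape[1]
    n = np.array([y.shape[0] for y in groups])
    N = n.sum()

    # Sample covariance matrices with divisor n_i - 1
    S = [np.cov(y, rowvar=False) for y in groups]
    # Pooled covariance matrix S_p = sum (n_i - 1) S_i / (N - g)
    S_p = sum((ni - 1) * Si for ni, Si in zip(n, S)) / (N - g)

    # Correction factor c = 1 - (2p^2 + 3p - 1) / (6(p+1)(g-1)) * {sum 1/(n_i-1) - 1/(N-g)}
    c = 1 - (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (
        np.sum(1 / (n - 1)) - 1 / (N - g))

    stat = c * ((N - g) * log_det(S_p)
                - sum((ni - 1) * log_det(Si) for ni, Si in zip(n, S)))
    df = p * (p + 1) * (g - 1) // 2
    return stat, df, S_p


# Hypothetical check: three groups that differ only in location share the
# same sample covariance matrix, so the statistic should be zero.
rng = np.random.default_rng(0)
base = rng.normal(size=(10, 2))
L_stat, L_df, S_pooled = bartlett_covariance_test([base, base + 5, base - 3])
print(L_stat, L_df)
```

The statistic would be compared with the upper percentage point of a chi-square distribution with `L_df` degrees of freedom. Shifting a group changes its mean vector but not its covariance matrix, which is why the check above gives a statistic of zero up to rounding error.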
Example 9.4: (Testing equality of covariance matrices-nursing homes)
We introduced the Wisconsin nursing home data in Example 9.3. In that example the sample covariance
matrices for the $p = 4$ cost variables associated with the $g = 3$ groups of nursing homes are displayed. Test the
assumption of equality of covariance matrices.