
CHAPTER 9

MULTIVARIATE ANALYSIS OF VARIANCE (MANOVA)

9.1 Introduction
The multivariate analysis of variance (MANOVA) is the multivariate analog of the analysis of variance
(ANOVA) procedure used for univariate data.

9.2 The Univariate Approach: Analysis of Variance (ANOVA)


In the univariate case, the data can often be arranged in a table as shown below:

02/05/2022

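$$
\begin{array}{c|cccc}
\text{Subject} & \text{Treatment } 1 & \text{Treatment } 2 & \cdots & \text{Treatment } g \\ \hline
1 & y_{11} & y_{21} & \cdots & y_{g1} \\
2 & y_{12} & y_{22} & \cdots & y_{g2} \\
\vdots & \vdots & \vdots & & \vdots \\
 & y_{1n_1} & y_{2n_2} & \cdots & y_{gn_g}
\end{array}
$$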

The columns correspond to the responses to different treatments or from different populations and the
rows correspond to the subjects in each of these treatments or populations.

Notation:
$y_{ij}$ = observation from subject $j$ in group $i$
$n_i$ = number of subjects in group $i$
$N = n_1 + n_2 + \cdots + n_g$ = total sample size.

The assumptions for the analysis of variance are the same as for a two-sample t-test, except that there are more than two groups:
(a) The data from group $i$ have common mean $\mu_i$, i.e., $E(y_{ij}) = \mu_i$. This means that there are no subpopulations with different means.
(b) Homoskedasticity: The data from all groups have common variance $\sigma^2$, i.e., $\operatorname{var}(y_{ij}) = \sigma^2$. That is, the variability in the data does not depend on group membership.
(c) Independence: The subjects are independently sampled.
(d) Normality: The data are normally distributed.

The hypothesis of interest is that all of the means are equal to one another. Mathematically we write this as:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_g$$

The alternative is expressed as:

$$H_a: \mu_i \neq \mu_j \quad \text{for at least one pair } i \neq j,$$

i.e., at least one pair of group population means differ. The following notation will be used:

$$\bar{y}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}$$

This is the average of the observations $j = 1, \ldots, n_i$ belonging to the $i$th group. The dot in the second subscript means that the average involves summing over the second subscript of $y$.

$$\bar{y}_{..} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} y_{ij}$$

This is the grand mean: the observations are summed within each group and over all groups, and the total is divided by the total sample size $N$. The double dots indicate that we are summing over both subscripts of $y$.
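The analysis of variance partitions the total sum of squares into a treatment (between-group) part and an error (within-group) part:

$$SS_{total} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{..})^2 = \underbrace{\sum_{i=1}^{g} n_i (\bar{y}_{i.} - \bar{y}_{..})^2}_{SS_{treat}} + \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2}_{SS_{error}}$$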
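As a quick numerical check of this partition, here is a minimal Python sketch using only the standard library. The helper name `ss_partition` and the three samples are hypothetical, chosen only to illustrate the arithmetic; they are not data from the text.

```python
def ss_partition(groups):
    """Return (SS_treat, SS_error, SS_total) for a list of samples, one per group."""
    all_obs = [y for sample in groups for y in sample]
    grand_mean = sum(all_obs) / len(all_obs)  # y-bar_{..}
    ss_treat = 0.0
    ss_error = 0.0
    for sample in groups:
        group_mean = sum(sample) / len(sample)  # y-bar_{i.}
        ss_treat += len(sample) * (group_mean - grand_mean) ** 2
        ss_error += sum((y - group_mean) ** 2 for y in sample)
    ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
    return ss_treat, ss_error, ss_total


# Hypothetical samples from g = 3 populations (illustration only).
groups = [[9, 6, 9], [0, 2], [3, 1, 2]]
ss_treat, ss_error, ss_total = ss_partition(groups)
print(ss_treat, ss_error, ss_total)  # 78.0 10.0 88.0
```

The treatment and error parts add up to the total, as the partition requires.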


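Analysis of Variance (ANOVA) Table

$$
\begin{array}{lcccc}
\text{Source of variation} & \text{d.f.} & \text{Sum of squares (SS)} & \text{Mean sum of squares (MSS)} & F \\ \hline
\text{Treatments} & g - 1 & SS_{treat} & MS_{treat} = SS_{treat}/(g-1) & MS_{treat}/MS_{error} \\
\text{Error (residual)} & N - g & SS_{error} & MS_{error} = SS_{error}/(N-g) & \\
\text{Total} & N - 1 & SS_{total} & &
\end{array}
$$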
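where

$$SS_{treat} = \sum_{i=1}^{g} n_i (\bar{y}_{i.} - \bar{y}_{..})^2, \qquad SS_{error} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2, \qquad SS_{total} = SS_{treat} + SS_{error}.$$

Under the null hypothesis that all group means are equal, that is

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_g,$$

the statistic $F = MS_{treat}/MS_{error}$ is $F$-distributed with $g - 1$ and $N - g$ degrees of freedom:

$$F = \frac{SS_{treat}/(g-1)}{SS_{error}/(N-g)} \sim F_{g-1,\,N-g}.$$

We reject $H_0$ at level $\alpha$ if

$$F > F_{g-1,\,N-g}(\alpha),$$

where $F_{g-1,\,N-g}(\alpha)$ is the upper $(100\alpha)$th percentile of the $F_{g-1,\,N-g}$ distribution.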
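The F test can be sketched the same way. In this minimal Python sketch, `one_way_anova_f` is a hypothetical helper and the samples are illustrative. The standard library has no F quantile function, so the critical value `F_CRIT` is an assumed input read from an F table (with SciPy installed, `scipy.stats.f.ppf(0.95, 2, 5)` would supply it).

```python
def one_way_anova_f(groups):
    """Return (F, df_treat, df_error) for the one-way ANOVA F test."""
    all_obs = [y for sample in groups for y in sample]
    n_total, g = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(sample) / len(sample) for sample in groups]
    ss_treat = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(groups, means))
    ss_error = sum((y - m) ** 2 for s, m in zip(groups, means) for y in s)
    df_treat, df_error = g - 1, n_total - g
    ms_treat = ss_treat / df_treat  # MS_treat
    ms_error = ss_error / df_error  # MS_error
    return ms_treat / ms_error, df_treat, df_error


groups = [[9, 6, 9], [0, 2], [3, 1, 2]]  # hypothetical samples
f_stat, df1, df2 = one_way_anova_f(groups)
print(f_stat, df1, df2)  # 19.5 2 5

F_CRIT = 5.79  # assumed: upper 5% point of F(2, 5), read from an F table
print("reject H0" if f_stat > F_CRIT else "do not reject H0")  # reject H0
```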


Example 9.1: (The sum of squares decomposition for univariate ANOVA)


Consider the following independent samples.
Population 1:
Population 2:
Population 3:


9.3 The Multivariate Approach: One-way Multivariate Analysis of Variance (One-way MANOVA)
Now we consider the multivariate analog, the multivariate analysis of variance, often abbreviated as MANOVA. Suppose that we have data on $p$ variables, which we can arrange in a table such as the one below. In this multivariate case, the scalar quantities $y_{ij}$ of the corresponding ANOVA table are replaced by vectors $\mathbf{Y}_{ij}$, each containing $p$ observations.

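$$
\begin{array}{c|cccc}
\text{Subject} & \text{Treatment } 1 & \text{Treatment } 2 & \cdots & \text{Treatment } g \\ \hline
1 & \mathbf{Y}_{11} & \mathbf{Y}_{21} & \cdots & \mathbf{Y}_{g1} \\
2 & \mathbf{Y}_{12} & \mathbf{Y}_{22} & \cdots & \mathbf{Y}_{g2} \\
\vdots & \vdots & \vdots & & \vdots \\
 & \mathbf{Y}_{1n_1} & \mathbf{Y}_{2n_2} & \cdots & \mathbf{Y}_{gn_g}
\end{array}
$$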

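Notation
$Y_{ijk}$ = observation for variable $k$ from subject $j$ in group $i$. These are collected into vectors:
$\mathbf{Y}_{ij} = (Y_{ij1}, Y_{ij2}, \ldots, Y_{ijp})'$ = vector of the $p$ variables for subject $j$ in group $i$.
$n_i$ = the number of subjects in the $i$th treatment.
$N = n_1 + n_2 + \cdots + n_g$ = total sample size.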

Assumptions
The assumptions here are essentially the same as the assumptions of the two-sample Hotelling's $T^2$ test, except that here they apply to $g$ groups:
(a) The data from group $i$ have common mean vector $\boldsymbol{\mu}_i$.
(b) The data from all groups have a common variance-covariance matrix $\boldsymbol{\Sigma}$.
(c) Independence: The subjects are independently sampled.
(d) Normality: The data are multivariate normally distributed.

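Here we are interested in testing the null hypothesis that the group mean vectors are all equal to one another. Mathematically this is expressed as:

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g$$

vs.

$$H_a: \mu_{ik} \neq \mu_{jk} \quad \text{for at least one } i \neq j \text{ and at least one variable } k.$$

That is, the null hypothesis is false if at least one pair of treatments differs on at least one variable.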

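Notation
The scalar quantities used in the univariate setting are replaced by vectors in the multivariate setting:
Sample Mean Vector

$$\bar{\mathbf{y}}_{i.} = \frac{1}{n_i} \sum_{j=1}^{n_i} \mathbf{Y}_{ij} = \begin{pmatrix} \bar{y}_{i.1} \\ \bar{y}_{i.2} \\ \vdots \\ \bar{y}_{i.p} \end{pmatrix}$$

= sample mean vector for group $i$. This sample mean vector consists of the group means for each of the $p$ variables, where

$$\bar{y}_{i.k} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ijk}$$

= sample mean for variable $k$ in group $i$.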

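Grand Mean Vector

$$\bar{\mathbf{y}}_{..} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} \mathbf{Y}_{ij} = \begin{pmatrix} \bar{y}_{..1} \\ \bar{y}_{..2} \\ \vdots \\ \bar{y}_{..p} \end{pmatrix}$$

where

$$\bar{y}_{..k} = \frac{1}{N} \sum_{i=1}^{g} \sum_{j=1}^{n_i} Y_{ijk}$$

is the grand mean for variable $k$.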
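A minimal Python sketch of the sample mean vectors and the grand mean vector, assuming bivariate observations ($p = 2$) stored as tuples. The data are hypothetical and `mean_vector` is our own helper name.

```python
def mean_vector(rows):
    """Component-wise mean of a list of p-dimensional observations."""
    p = len(rows[0])
    return [sum(r[k] for r in rows) / len(rows) for k in range(p)]


# Hypothetical bivariate data (p = 2) for g = 3 groups.
groups = [
    [(9, 3), (6, 2), (9, 7)],
    [(0, 4), (2, 0)],
    [(3, 8), (1, 9), (2, 7)],
]
group_means = [mean_vector(sample) for sample in groups]            # y-bar_{i.}
grand_mean = mean_vector([y for sample in groups for y in sample])  # y-bar_{..}
print(group_means)  # [[8.0, 4.0], [1.0, 2.0], [2.0, 8.0]]
print(grand_mean)   # [4.0, 5.0]

# The grand mean vector is also the n_i-weighted average of the group mean vectors.
n_total = sum(len(sample) for sample in groups)
weighted = [sum(len(s) * m[k] for s, m in zip(groups, group_means)) / n_total
            for k in range(2)]
print(weighted)     # [4.0, 5.0]
```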


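Total Sum of Squares and Cross Products

In the univariate analysis of variance, the total sum of squares is a scalar quantity. The multivariate analog is the total sum of squares and cross products matrix $\mathbf{T}$, a $p \times p$ matrix of numbers, defined by the expression below:

$$\mathbf{T} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{..})(\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{..})'$$

Here, the $(k, l)$th element of $\mathbf{T}$ is

$$T_{kl} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ijk} - \bar{y}_{..k})(Y_{ijl} - \bar{y}_{..l}).$$

The diagonal elements $T_{kk}$ are the univariate total sums of squares for each variable; the off-diagonal elements are the corresponding cross products.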

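Consider writing each deviation from the grand mean as $\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{..} = (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.}) + (\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})$. Because $\sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.}) = \mathbf{0}$ for every group, the cross-product terms vanish, and

$$\mathbf{T} = \underbrace{\sum_{i=1}^{g} \sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})(\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})'}_{\mathbf{E}} + \underbrace{\sum_{i=1}^{g} n_i (\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})(\bar{\mathbf{y}}_{i.} - \bar{\mathbf{y}}_{..})'}_{\mathbf{H}} = \mathbf{E} + \mathbf{H},$$

where $\mathbf{E}$ is the Error Sum of Squares and Cross Products matrix, and $\mathbf{H}$ is the Hypothesis Sum of Squares and Cross Products matrix.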

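The $(k, l)$th element of the error sum of squares and cross products matrix $\mathbf{E}$ is:

$$E_{kl} = \sum_{i=1}^{g} \sum_{j=1}^{n_i} (Y_{ijk} - \bar{y}_{i.k})(Y_{ijl} - \bar{y}_{i.l})$$

The $(k, l)$th element of the hypothesis sum of squares and cross products matrix $\mathbf{H}$ is:

$$H_{kl} = \sum_{i=1}^{g} n_i (\bar{y}_{i.k} - \bar{y}_{..k})(\bar{y}_{i.l} - \bar{y}_{..l})$$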
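The following minimal Python sketch builds $\mathbf{H}$, $\mathbf{E}$, and $\mathbf{T}$ from the definitions and checks that $\mathbf{T} = \mathbf{E} + \mathbf{H}$. It uses plain nested lists rather than a matrix library; the helper names and the bivariate data are hypothetical.

```python
def outer(u, v):
    """Outer product u v' of two vectors, as a nested list."""
    return [[a * b for b in v] for a in u]


def mat_add(a, b):
    """Element-wise sum of two matrices stored as nested lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sscp_matrices(groups):
    """Return (H, E, T) for one-way MANOVA data given as a list of groups."""
    p = len(groups[0][0])
    all_obs = [y for sample in groups for y in sample]
    grand = [sum(y[k] for y in all_obs) / len(all_obs) for k in range(p)]
    H = [[0.0] * p for _ in range(p)]
    E = [[0.0] * p for _ in range(p)]
    T = [[0.0] * p for _ in range(p)]
    for sample in groups:
        n_i = len(sample)
        mean_i = [sum(y[k] for y in sample) / n_i for k in range(p)]
        d = [m - c for m, c in zip(mean_i, grand)]     # y-bar_{i.} - y-bar_{..}
        H = mat_add(H, [[n_i * x for x in row] for row in outer(d, d)])
        for y in sample:
            r = [a - m for a, m in zip(y, mean_i)]      # within-group deviation
            E = mat_add(E, outer(r, r))
            t = [a - c for a, c in zip(y, grand)]       # deviation from grand mean
            T = mat_add(T, outer(t, t))
    return H, E, T


groups = [  # hypothetical bivariate data
    [(9, 3), (6, 2), (9, 7)],
    [(0, 4), (2, 0)],
    [(3, 8), (1, 9), (2, 7)],
]
H, E, T = sscp_matrices(groups)
print(H)  # [[78.0, -12.0], [-12.0, 48.0]]
print(E)  # [[10.0, 1.0], [1.0, 24.0]]
print(T)  # [[88.0, -11.0], [-11.0, 72.0]]
print(T == mat_add(H, E))  # True
```

The diagonal of $\mathbf{H}$ and $\mathbf{E}$ holds the univariate treatment and error sums of squares for each variable separately.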

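The partitioning of the total sum of squares and cross products matrix may be summarized in the multivariate analysis of variance table:

MANOVA

$$
\begin{array}{lcc}
\text{Source} & \text{d.f.} & \text{SSP} \\ \hline
\text{Treatments} & g - 1 & \mathbf{H} \\
\text{Error} & N - g & \mathbf{E} \\
\text{Total} & N - 1 & \mathbf{T}
\end{array}
$$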

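We wish to reject

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g$$

if the hypothesis sum of squares and cross products matrix $\mathbf{H}$ is large relative to the error sum of squares and cross products matrix $\mathbf{E}$. We reject $H_0$ if the ratio of generalized variances

$$\Lambda^* = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{|\mathbf{E}|}{|\mathbf{T}|}$$

is too small. The quantity $\Lambda^*$, proposed originally by Wilks, is known as Wilks' lambda. The exact distribution of $\Lambda^*$ can be derived for the special cases listed in the table below.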
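Distribution of Wilks' Lambda, $\Lambda^* = |\mathbf{E}| / |\mathbf{E} + \mathbf{H}|$ (with $N = \sum_{i=1}^{g} n_i$)

$$
\begin{array}{c|c|l}
\text{No. of variables} & \text{No. of groups} & \text{Sampling distribution for multivariate normal data} \\ \hline
p = 1 & g \ge 2 & \left(\dfrac{N-g}{g-1}\right)\left(\dfrac{1-\Lambda^*}{\Lambda^*}\right) \sim F_{g-1,\,N-g} \\
p = 2 & g \ge 2 & \left(\dfrac{N-g-1}{g-1}\right)\left(\dfrac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2(g-1),\,2(N-g-1)} \\
p \ge 1 & g = 2 & \left(\dfrac{N-p-1}{p}\right)\left(\dfrac{1-\Lambda^*}{\Lambda^*}\right) \sim F_{p,\,N-p-1} \\
p \ge 1 & g = 3 & \left(\dfrac{N-p-2}{p}\right)\left(\dfrac{1-\sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F_{2p,\,2(N-p-2)}
\end{array}
$$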
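A minimal Python sketch of Wilks' lambda and the exact $F$ transformations in the table. It assumes $\mathbf{E}$ and $\mathbf{H}$ have already been computed (the matrices below are hypothetical, for $p = 2$ variables, $g = 3$ groups, $N = 8$). The helpers `det`, `wilks_lambda`, and `wilks_exact_f` are our own names, and the determinant uses Gaussian elimination rather than a library routine.

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def wilks_lambda(E, H):
    """Wilks' lambda |E| / |E + H|."""
    EH = [[e + h for e, h in zip(row_e, row_h)] for row_e, row_h in zip(E, H)]
    return det(E) / det(EH)


def wilks_exact_f(lam, p, g, n):
    """Exact F transformation of Wilks' lambda for the tabulated cases.

    n is the total sample size N. Returns (F, df1, df2).
    """
    if p == 1:
        return (n - g) / (g - 1) * (1 - lam) / lam, g - 1, n - g
    if p == 2:
        r = math.sqrt(lam)
        return (n - g - 1) / (g - 1) * (1 - r) / r, 2 * (g - 1), 2 * (n - g - 1)
    if g == 2:
        return (n - p - 1) / p * (1 - lam) / lam, p, n - p - 1
    if g == 3:
        r = math.sqrt(lam)
        return (n - p - 2) / p * (1 - r) / r, 2 * p, 2 * (n - p - 2)
    raise ValueError("no exact result in the table; use Bartlett's approximation")


# Hypothetical SSCP matrices for p = 2 variables, g = 3 groups, N = 8.
E = [[10.0, 1.0], [1.0, 24.0]]
H = [[78.0, -12.0], [-12.0, 48.0]]
wilks = wilks_lambda(E, H)
f_stat, df1, df2 = wilks_exact_f(wilks, p=2, g=3, n=8)
print(round(wilks, 4))             # 0.0385
print(round(f_stat, 2), df1, df2)  # 8.2 4 8
```

With $p = 1$ the first row of the table recovers the univariate ANOVA $F$ statistic, since then $(1 - \Lambda^*)/\Lambda^* = SS_{treat}/SS_{error}$. Here $p = 2$ and $g = 3$ both apply and give the same $F_{4,\,8}$ reference distribution.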

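For other cases and large sample sizes, a modification of $\Lambda^*$ due to Bartlett can be used to test $H_0$. Bartlett has shown that if $H_0$ is true and $N = \sum_{i=1}^{g} n_i$ is large,

$$-\left(N - 1 - \frac{p + g}{2}\right) \ln \Lambda^* = -\left(N - 1 - \frac{p + g}{2}\right) \ln\left(\frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}\right)$$

has approximately a chi-square distribution with $p(g - 1)$ degrees of freedom. Consequently, for $N$ large, we reject $H_0$ at significance level $\alpha$ if

$$-\left(N - 1 - \frac{p + g}{2}\right) \ln\left(\frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}\right) > \chi^2_{p(g-1)}(\alpha),$$

where $\chi^2_{p(g-1)}(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p(g - 1)$ d.f.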
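Bartlett's statistic is a one-line computation once $\Lambda^*$ is known. In this minimal Python sketch, `bartlett_wilks` is a hypothetical helper, the input is an illustrative value of $\Lambda^*$, and the critical value is an assumed table value. Note that $N = 8$ is far too small for the large-sample approximation to be trusted, so the example only shows the arithmetic.

```python
import math


def bartlett_wilks(lam, p, g, n):
    """Bartlett's statistic -(N - 1 - (p + g)/2) ln(Lambda*) and its d.f. p(g - 1)."""
    stat = -(n - 1 - (p + g) / 2) * math.log(lam)
    return stat, p * (g - 1)


lam = 239 / 6215  # hypothetical Wilks' lambda (p = 2, g = 3, N = 8)
stat, df = bartlett_wilks(lam, p=2, g=3, n=8)
CHI2_CRIT = 9.488  # assumed: upper 5% point of chi-square with 4 d.f., from a table
print(round(stat, 2), df)  # 14.66 4
print("reject H0" if stat > CHI2_CRIT else "do not reject H0")  # reject H0
```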

Example 9.2 (MANOVA table and Wilks' lambda for testing the equality of three mean vectors)
Suppose an additional variable is observed along with the variable introduced in Example 9.1. The sample sizes are

Example 9.3: (A multivariate analysis of Wisconsin nursing home data)
The Wisconsin Department of Health and Social Services reimburses nursing homes in the state for the services provided. The department develops a set of formulas for rates for each facility, based on factors such as level of care, mean wage rate, and average wage rate in the state. Nursing homes can be classified on the basis of ownership (private party, nonprofit organization, and government) and certification (skilled nursing facility, intermediate care facility, or a combination of the two). One purpose of a recent study was to investigate the effects of ownership or certification (or both) on costs. Four costs, computed on a per-patient-day basis and measured in hours per patient day, were selected for analysis: $X_1$ = cost of nursing labor, $X_2$ = cost of dietary labor, $X_3$ = cost of plant operation and maintenance labor, and $X_4$ = cost of housekeeping and laundry labor. A total of $N$ observations on each of the cost variables were initially separated according to ownership. Summary statistics for each of the $g = 3$ groups are given in the following table.

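$$
\begin{array}{lcc}
\text{Group} & \text{Number of observations} & \text{Sample mean vector} \\ \hline
i = 1 \text{ (private)} & n_1 & \bar{\mathbf{x}}_1 \\
i = 2 \text{ (nonprofit)} & n_2 & \bar{\mathbf{x}}_2 \\
i = 3 \text{ (government)} & n_3 & \bar{\mathbf{x}}_3
\end{array}
$$

Sample covariance matrices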

9.4 Checking Model Assumptions

Before carrying out a MANOVA, first check the model assumptions:
1. The data from group $i$ have common mean vector $\boldsymbol{\mu}_i$.
2. The data from all groups have a common variance-covariance matrix $\boldsymbol{\Sigma}$.
3. Independence: The subjects are independently sampled.
4. Normality: The data are multivariate normally distributed.

Assumption 1: This assumption says that there are no subpopulations with different mean vectors. In a study where samples are collected at several sites, for instance, this assumption might be violated if data collected from the same site were inconsistent.
Assumption 3: In the same setting, this assumption is satisfied if the assayed data are obtained by randomly sampling the data collected from each site. It would be violated if, for example, samples were collected in clusters. In other applications, this assumption may be violated if the data were collected over time or space, since observations close together in time or space tend to be correlated.
Assumption 4: For large samples, the Central Limit Theorem says that the sample mean vectors are approximately multivariate normally distributed, even if the individual observations are not. For small samples, we cannot rely on the Central Limit Theorem.

Diagnostic procedures are based on the residuals, computed by taking the differences between the individual observations and the group means for each variable:

$$\hat{\varepsilon}_{ijk} = Y_{ijk} - \bar{y}_{i.k}$$

Thus, for each subject, residuals are defined for each of the $p$ variables. Then, to assess normality, we apply the following graphical procedures:
- Plot histograms of the residuals for each variable. Look for a symmetric distribution.
- Plot a matrix of scatter plots. Look for elliptical distributions and outliers.
- Plot three-dimensional scatter plots. Look for elliptical distributions and outliers.

If the histograms are not symmetric or the scatter plots are not elliptical, this is evidence that the data are not sampled from a multivariate normal distribution, in violation of Assumption 4. In this case, a normalizing transformation should be considered.
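The residuals can be computed group by group. This minimal Python sketch (hypothetical data and helper name `residuals`) returns them in the same layout as the data, then collects the residuals of each variable into a list that could be passed to any plotting routine, e.g. `matplotlib.pyplot.hist` (not used here, to keep the sketch dependency-free).

```python
def residuals(groups):
    """Residuals Y_ijk - ybar_{i.k}, returned per group in the same layout as the data."""
    out = []
    for sample in groups:
        p = len(sample[0])
        mean_i = [sum(y[k] for y in sample) / len(sample) for k in range(p)]
        out.append([[y[k] - mean_i[k] for k in range(p)] for y in sample])
    return out


groups = [  # hypothetical bivariate data
    [(9, 3), (6, 2), (9, 7)],
    [(0, 4), (2, 0)],
    [(3, 8), (1, 9), (2, 7)],
]
res = residuals(groups)
# Residuals of each variable pooled over all groups, e.g. for a histogram.
var1 = [r[0] for grp in res for r in grp]
var2 = [r[1] for grp in res for r in grp]
print(var1)  # [1.0, -2.0, 1.0, -1.0, 1.0, 1.0, -1.0, 0.0]
print(var2)  # [-1.0, -2.0, 3.0, 2.0, -2.0, 0.0, 1.0, -1.0]
# Pairs (var1[j], var2[j]) are the points of the residual scatter plot.
```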
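Assumption 2: This assumption can be checked using Bartlett's test for homogeneity of variance-covariance matrices. To obtain Bartlett's test, let $\boldsymbol{\Sigma}_i$ denote the population variance-covariance matrix for group $i$. Consider testing:

$$H_0: \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \cdots = \boldsymbol{\Sigma}_g$$

against

$$H_a: \boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j \quad \text{for at least one } i \neq j.$$

Under the alternative hypothesis, at least two of the variance-covariance matrices differ on at least one of their elements. Let

$$\mathbf{S}_i = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})(\mathbf{Y}_{ij} - \bar{\mathbf{y}}_{i.})'$$

denote the sample variance-covariance matrix for group $i$. Compute the pooled variance-covariance matrix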

$$\mathbf{S}_p = \frac{\sum_{i=1}^{g} (n_i - 1) \mathbf{S}_i}{\sum_{i=1}^{g} n_i - g} = \frac{\mathbf{E}}{N - g}$$
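A minimal Python sketch of the group covariance matrices $\mathbf{S}_i$ (divisor $n_i - 1$) and the pooled matrix $\mathbf{S}_p$, using hypothetical bivariate data and our own helper names. With these data $\mathbf{S}_p$ equals $\mathbf{E}/(N - g)$, as the formula states.

```python
def covariance_matrix(rows):
    """Sample covariance matrix S_i with divisor n_i - 1."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(p)]
    return [[sum((r[k] - mean[k]) * (r[l] - mean[l]) for r in rows) / (n - 1)
             for l in range(p)] for k in range(p)]


def pooled_covariance(groups):
    """Pooled matrix S_p = sum_i (n_i - 1) S_i / (N - g)."""
    n_total, g = sum(len(s) for s in groups), len(groups)
    p = len(groups[0][0])
    S_p = [[0.0] * p for _ in range(p)]
    for sample in groups:
        S_i = covariance_matrix(sample)
        for k in range(p):
            for l in range(p):
                S_p[k][l] += (len(sample) - 1) * S_i[k][l] / (n_total - g)
    return S_p


groups = [  # hypothetical bivariate data
    [(9, 3), (6, 2), (9, 7)],
    [(0, 4), (2, 0)],
    [(3, 8), (1, 9), (2, 7)],
]
print(covariance_matrix(groups[0]))  # [[3.0, 3.0], [3.0, 7.0]]
S_p = pooled_covariance(groups)
print([[round(x, 6) for x in row] for row in S_p])  # [[2.0, 0.2], [0.2, 4.8]]
```

Group 2 here has only two observations, so its $\mathbf{S}_2$ is singular; Bartlett's test, which uses $\ln|\mathbf{S}_i|$, needs $n_i > p$ in every group.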

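Bartlett's test is based on the following test statistic:

$$L' = c \left\{ (N - g) \ln |\mathbf{S}_p| - \sum_{i=1}^{g} (n_i - 1) \ln |\mathbf{S}_i| \right\}$$

where the correction factor is

$$c = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)(g - 1)} \left\{ \sum_{i=1}^{g} \frac{1}{n_i - 1} - \frac{1}{N - g} \right\}$$

The version of Bartlett's test used with the two-sample Hotelling's $T^2$ test is the special case $g = 2$. Under the null hypothesis of homogeneous variance-covariance matrices, $L'$ is approximately chi-square distributed with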

$$\frac{1}{2} p (p + 1)(g - 1)$$

degrees of freedom. Reject $H_0$ at level $\alpha$ if

$$L' > \chi^2_{\frac{1}{2}p(p+1)(g-1)}(\alpha).$$
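A minimal Python sketch of Bartlett's statistic $L'$, its correction factor $c$, and its degrees of freedom, taking each group as a pair $(n_i, \mathbf{S}_i)$. The covariance matrices below are hypothetical, `bartlett_box_test` is our own name, and the chi-square critical value is an assumed table input (SciPy users could obtain it from `scipy.stats.chi2.ppf(0.95, df)`).

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def bartlett_box_test(samples):
    """Bartlett's test of equal covariance matrices.

    samples: list of (n_i, S_i) pairs, S_i a p x p nested list with divisor n_i - 1.
    Returns (L_prime, c, df).
    """
    g = len(samples)
    p = len(samples[0][1])
    n_total = sum(n for n, _ in samples)
    # Pooled matrix S_p = sum_i (n_i - 1) S_i / (N - g).
    S_p = [[sum((n - 1) * S[k][l] for n, S in samples) / (n_total - g)
            for l in range(p)] for k in range(p)]
    m_stat = (n_total - g) * math.log(det(S_p)) - sum(
        (n - 1) * math.log(det(S)) for n, S in samples)
    c = 1 - (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (
        sum(1 / (n - 1) for n, _ in samples) - 1 / (n_total - g))
    df = p * (p + 1) * (g - 1) // 2
    return c * m_stat, c, df


# Hypothetical summary statistics: two groups of 10, p = 2 variables.
samples = [
    (10, [[2.0, 1.0], [1.0, 2.0]]),
    (10, [[4.0, 0.0], [0.0, 1.0]]),
]
L_prime, c, df = bartlett_box_test(samples)
CHI2_CRIT = 7.815  # assumed: upper 5% point of chi-square with 3 d.f., from a table
print(round(L_prime, 3), df)  # 3.237 3
print("reject H0" if L_prime > CHI2_CRIT else "do not reject H0")  # do not reject H0
```

If any group has $n_i \le p$, its $\mathbf{S}_i$ is singular and `math.log` raises an error; this reflects a real limitation of the test rather than a bug in the sketch. When all $\mathbf{S}_i$ are identical, $\mathbf{S}_p$ equals them and $L' = 0$.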

Example 9.4: (Testing equality of covariance matrices: nursing homes)
We introduced the Wisconsin nursing home data in Example 9.3. In that example, the sample covariance matrices for the $p = 4$ cost variables associated with the $g = 3$ groups of nursing homes are displayed. Test the assumption of equality of covariance matrices.
