ANOVA701
• Motivating Example
• Analysis of Variance
• Model & Assumptions
• Data Estimates of the Model
• Analysis of Variance
• Multiple Comparisons
• Checking Assumptions
• One-way ANOVA Transformations
One-way ANOVA
In an experiment, units are randomly allocated to the k "treatment" groups; in an observational study, independent samples are taken from the k populations.
[Figure: independently drawn random samples, one from each of Populations 1, 2, …, k; for sample i we record the sample mean x̄_i, standard deviation s_i, and size n_i.]
When the sample sizes are equal we say the design is balanced; when the sample sizes are unequal we say the design is unbalanced.
Questions:
1) Do the k population means differ somehow, i.e. do at least two treatment
means differ?
2) If so, which pairs of means differ?
Motivating Example:
Treating Anorexia Nervosa
Anorexia patients were randomly assigned to receive one
of three different therapies:
• Standard – current accepted therapy
• Family – family therapy
• Behavior – behavioral cognitive therapy
For each patient, weight change over the course of treatment was recorded. The duration of treatment was the same for all patients, regardless of the therapy received.
[Figure: weight change by therapy group, with group means shown as diamonds, illustrating between-group variation versus within-group variation. Residuals are e_{ij} = y_{ij} − ȳ_i.]
Analysis of Variance
[Figure: an idealized case where the between-group variation is large and the within-group variation is small, i.e. strong evidence against H0: μ1 = μ2 = μ3. Residuals e_{2j} = y_{2j} − ȳ_2 are marked.]
Analysis of Variance
• If we consider all of the data together, regardless of
which sample the observation belongs to, we can
measure the overall total variability in the data by:
\[
\mathrm{SSTotal} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
\]
• This total variability partitions into a between-group piece and a within-group piece:
\[
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2
+ \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 ,
\]
i.e. SSTotal = SSTreat + SSError.
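As a concrete check of the sums-of-squares partition, here is a minimal Python sketch; the group names match the anorexia example, but the weight-change values are made up purely for illustration:

```python
# Sketch: verify SSTotal = SSTreat + SSError on made-up data.
groups = {
    "Standard": [1.2, -0.5, 0.3, 2.0],   # hypothetical weight changes
    "Family":   [3.1, 4.0, 2.5, 3.6],
    "Behavior": [1.8, 2.2, 0.9, 2.7],
}

all_obs = [x for xs in groups.values() for x in xs]
grand_mean = sum(all_obs) / len(all_obs)

# Total variability of all observations about the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_obs)

# Between-group piece: group means about the grand mean, weighted by n_i.
ss_treat = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
               for xs in groups.values())

# Within-group piece: observations about their own group mean.
ss_error = sum((x - sum(xs) / len(xs)) ** 2
               for xs in groups.values() for x in xs)

print(ss_total)
print(ss_treat + ss_error)  # matches ss_total up to rounding
```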
• Therefore,
\[
\mathrm{MSTreat} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{k - 1}
\]
is an estimate of the common variance σ² if and only if the null hypothesis is true.
• However, if the alternative hypothesis is true and there is
substantial between-group variation, this estimate of σ²
will be too BIG. This measure of between-group variation
is called the Mean Square for Treatments and is denoted
MSTreat.
TWO ESTIMATES OF THE COMMON VARIANCE
• Thus, if the null hypothesis is true, we have two
estimates of the common variance σ², namely the
Mean Square for Treatments (MSTreat) and the Mean
Square Error (MSError).
• If MSTreat >> MSError, i.e. the between-group variation
is large relative to the within-group variation, we
reject H0.
• If MSTreat ≈ MSError, we fail to reject H0, i.e. the
between-group variation is NOT large relative to the
within-group variation.
Analysis of Variance: F-Test Statistic
• Our test statistic is the F-ratio (or F-statistic)
which compares these two mean squares:
\[
F_0 = \frac{\mathrm{MSTreat}}{\mathrm{MSError}}
\]
Analysis of Variance
• A large F-statistic provides evidence against H0
while a small F-statistic indicates that the data and
H0 are compatible.
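The F-ratio can be computed directly from the sums of squares; a minimal sketch on made-up data (the weight changes below are hypothetical, not from the anorexia study):

```python
# Sketch: the F-ratio F0 = MSTreat / MSError on made-up data.
groups = {
    "Standard": [1.2, -0.5, 0.3, 2.0],   # hypothetical weight changes
    "Family":   [3.1, 4.0, 2.5, 3.6],
    "Behavior": [1.8, 2.2, 0.9, 2.7],
}
k = len(groups)                                 # number of treatment groups
N = sum(len(xs) for xs in groups.values())      # total number of observations
grand_mean = sum(x for xs in groups.values() for x in xs) / N

ss_treat = sum(len(xs) * (sum(xs) / len(xs) - grand_mean) ** 2
               for xs in groups.values())
ss_error = sum((x - sum(xs) / len(xs)) ** 2
               for xs in groups.values() for x in xs)

ms_treat = ss_treat / (k - 1)   # between-group mean square, df = k - 1
ms_error = ss_error / (N - k)   # within-group mean square, df = N - k
f0 = ms_treat / ms_error
print(f0)  # large values are evidence against H0
```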
[Figure: probability of at least one false positive, 1 − (0.95)^m, as a function of the number of independent comparisons m; for m = 18, 1 − (0.95)^18 ≈ 0.60.]
Multiple Comparisons
With 18 independent
comparisons, we have
60% chance of at least 1
false positive. With 60
comparisons it’s over a
95% chance.
Multiple Comparisons
With 18 independent
comparisons, we expect
about 1 false positive.
Similarly with 60
comparisons we expect to
find 3 false positives and
with 100 comparisons we
expect 5.
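The arithmetic behind these false-positive counts is straightforward; a quick sketch with α = 0.05, as in the slides:

```python
# Family-wise error rate for m independent tests at level alpha,
# and the expected number of false positives.
alpha = 0.05

def fwer(m, alpha=0.05):
    """P(at least one false positive) = 1 - (1 - alpha)^m."""
    return 1 - (1 - alpha) ** m

for m in (18, 60, 100):
    # chance of >= 1 false positive, and expected count m * alpha
    print(m, round(fwer(m), 2), m * alpha)
```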
Multiple Comparisons
• If we estimate each comparison separately with 95%
confidence, the overall error rate will be greater
than 5%.
• So, using ordinary pair-wise comparisons (i.e. lots of
individual pooled t-tests), we tend to find too many
significant differences between our sample means.
• We need to modify our intervals so that they
simultaneously contain the true differences with 95%
confidence across the entire set of comparisons.
• These modified intervals and tests are known as:
simultaneous confidence methods OR
multiple comparison procedures
Multiple Comparisons
• First we consider the Bonferroni correction.
• Instead of using α to determine our confidence
intervals, we use α/m, where m
is the total number of possible pair-wise
comparisons (i.e. m = k(k − 1)/2).
• For testing, we compare our two-tailed p-values
from the pair-wise comparisons to α/m instead.
• This assumes all pair-wise comparisons are
independent, which is not the case, so this
adjustment is too conservative (intervals will be too
wide; i.e. finds too few significant differences).
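A minimal sketch of the Bonferroni rule in Python; the three pair-wise p-values below are hypothetical, chosen only to show the mechanics:

```python
k = 3                    # number of treatment groups
alpha = 0.05
m = k * (k - 1) // 2     # number of pair-wise comparisons: k(k-1)/2

# Hypothetical two-tailed p-values from pooled pair-wise t-tests.
pvals = {
    ("Standard", "Family"):   0.004,
    ("Standard", "Behavior"): 0.030,
    ("Family", "Behavior"):   0.200,
}

for pair, p in pvals.items():
    # Bonferroni: compare each p-value to alpha/m, not alpha.
    verdict = "significant" if p < alpha / m else "not significant"
    print(pair, p, verdict)
```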
Multiple Comparisons
• As a better alternative we have Tukey Intervals.
• The calculation of Tukey Intervals is quite
complicated, but it overcomes the problem of
unadjusted pair-wise comparisons finding too many
significant differences
(i.e. confidence intervals that are too narrow).
[Figure: Tukey simultaneous confidence intervals for the pair-wise differences; intervals that contain 0 indicate pairs of means that do not differ significantly.]
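In software the procedure is readily available; a sketch assuming a SciPy version that provides `scipy.stats.tukey_hsd` (the weight-change data are hypothetical):

```python
from scipy.stats import tukey_hsd

# Hypothetical weight changes for the three therapy groups.
standard = [1.2, -0.5, 0.3, 2.0]
family   = [3.1, 4.0, 2.5, 3.6]
behavior = [1.8, 2.2, 0.9, 2.7]

res = tukey_hsd(standard, family, behavior)
print(res)                      # pair-wise p-values, adjusted for multiplicity

ci = res.confidence_interval()  # simultaneous 95% CIs for the differences
print(ci.low)                   # a pair differs significantly exactly when
print(ci.high)                  # its interval does not contain 0
```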
Assumptions:
1) Independent samples were drawn from the 5 populations of interest.
2) Normality looks OK, with the exception of a few large outliers.
3) Variation looks fairly equal across groups; however, the healthy
group appears to have less variation in their white blood cell counts.
Example 2: Leucocyte Counts
• Formally checking the variance equality assumption.
[Figure: output of the equal-variance tests in JMP and in SPSS.]
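Outside JMP and SPSS, the same check can be run in Python; a sketch using SciPy's Levene test (`scipy.stats.levene`), with hypothetical white blood cell counts for three of the groups:

```python
from scipy.stats import levene

# Hypothetical white blood cell counts for three of the groups.
healthy = [6.1, 5.8, 6.4, 6.0, 5.9, 6.2]
group_a = [7.5, 4.9, 8.8, 6.2, 9.4, 5.1]
group_b = [8.1, 5.5, 9.9, 4.8, 7.2, 6.6]

# center='median' gives the Brown-Forsythe variant, which is
# more robust to non-normality than the mean-centered version.
stat, p = levene(healthy, group_a, group_b, center="median")
print(stat, p)
# A small p-value is evidence that the group variances are unequal.
```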