BN2102 3. Anova
Dr Alberto Corrias
† Remember that the variance is simply the square of the standard deviation.
The key idea of the analysis of variance is that if the samples were drawn from the same population, that is, if diet has no effect on cardiac output, then the two estimates of the population variance should be about the same. In other words, their ratio, which I call F, should be about 1. CLICK. With this in mind, we will tend to reject the NULL hypothesis when F is big, and we will be unable to do so when F is closer to 1.

ANOVA strategy: the key idea
If the NULL hypothesis that all the samples were drawn from the same population (i.e., diet has no effect on cardiac output) is true, then $s^2_{wit}$ and $s^2_{bet}$ should be about the same, or $F = s^2_{bet} / s^2_{wit} \approx 1$.
[Figure: F distribution, with $f_{crit}$ marked on the horizontal axis]
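The claim that F should be about 1 when all samples come from the same population can be checked with a small simulation. The sketch below is only an illustration: the population mean and standard deviation (5.0 and 1.0), the number of trials, and the helper names `f_ratio` and `simulate_null` are all assumptions, and it uses the usual equal-size estimates ($s^2_{wit}$ as the average of the group variances, $s^2_{bet}$ as n times the variance of the group means).

```python
import random
import statistics


def f_ratio(groups):
    """Return F = s2_bet / s2_wit for groups of equal size."""
    n = len(groups[0])
    means = [statistics.mean(g) for g in groups]
    # Within-groups estimate: average of the sample variances.
    s2_wit = statistics.mean([statistics.variance(g) for g in groups])
    # Between-groups estimate: n times the variance of the group means.
    s2_bet = n * statistics.variance(means)
    return s2_bet / s2_wit


def simulate_null(m=4, n=10, trials=2000, seed=0):
    """Draw m groups of n from ONE population and collect the F values."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        # Hypothetical population: normal with mean 5.0 and sd 1.0.
        groups = [[rng.gauss(5.0, 1.0) for _ in range(n)] for _ in range(m)]
        values.append(f_ratio(groups))
    return values


f_values = simulate_null()
print("mean F under the NULL hypothesis:", statistics.mean(f_values))
print("largest F seen:", max(f_values))
```

Most simulated values cluster around 1, but large values do occasionally appear by chance, which is why a critical value is needed rather than a fixed rule such as "F above 1".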
$$s_{\bar{X}}^2 = \frac{(\bar{X}_{control} - \bar{X})^2 + (\bar{X}_{fr} - \bar{X})^2 + (\bar{X}_{pa} - \bar{X})^2 + (\bar{X}_{st} - \bar{X})^2}{m - 1}$$
Once we have the two variances, we take their ratio and look at the value we obtain. Based on n and m, we choose the appropriate F distribution. In this case $\nu_1 = 4 - 1 = 3$ and $\nu_2 = 4(10 - 1) = 36$. Following the same philosophy as hypothesis testing, we choose a value of $\alpha$ as significance level. This identifies the value $f_{crit}$ here, which separates the rejection region from the acceptance region. Then it is all a matter of checking whether my computed ratio is smaller or bigger than $f_{crit}$.

Steps for ANOVA (m = 4 groups, n = 10 sample size)
We compute
$$F = \frac{s^2_{bet}}{s^2_{wit}}$$
and choose a significance level $\alpha$.
If $F > f_{crit}$, we reject the NULL hypothesis.
If $F < f_{crit}$, we fail to reject the NULL hypothesis.
[Figure: F distribution, F(3, 36). If we select a significance level $\alpha = 0.05$, then $f_{crit}$ is the value that leaves an area of 0.05 to its right.]
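The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `anova_decision` and the data are hypothetical, the critical value is passed in by the caller, and the two variance estimates use the usual equal-size forms ($s^2_{wit}$ as the average of the group variances, $s^2_{bet}$ as n times $s_{\bar{X}}^2$).

```python
import statistics


def anova_decision(groups, f_crit):
    """Equal-size one-way ANOVA. Returns (F, nu1, nu2, reject)."""
    m = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = [statistics.mean(g) for g in groups]
    # Within-groups variance: average of the m sample variances.
    s2_wit = statistics.mean([statistics.variance(g) for g in groups])
    # Between-groups variance: n times the variance of the group means.
    s2_bet = n * statistics.variance(means)
    f = s2_bet / s2_wit
    nu1 = m - 1        # numerator degrees of freedom
    nu2 = m * (n - 1)  # denominator degrees of freedom
    return f, nu1, nu2, f > f_crit


# Hypothetical data: 4 groups of 10 values, the last group shifted upwards.
offsets = [0.0, 0.2, 0.1, 1.5]
groups = [[off + 0.1 * j for j in range(10)] for off in offsets]

# 2.87 is an approximate table value for F(3, 36) at alpha = 0.05.
f, nu1, nu2, reject = anova_decision(groups, f_crit=2.87)
print(f"F = {f:.2f} with ({nu1}, {nu2}) degrees of freedom")
print("reject the NULL hypothesis" if reject else "fail to reject the NULL hypothesis")
```

If SciPy is available, the critical value can be obtained with `scipy.stats.f.ppf(1 - alpha, nu1, nu2)` instead of a printed table.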
areas in this right tail here are halved. Now, what is the value of this $\alpha_T$, and therefore of this new $t_{crit}$? CLICK. Statisticians have come up with a whole family of techniques, each characterized by a different way to compute $\alpha_T$. Here, we will only mention two. The first is the Bonferroni t test, where $\alpha_T$ is simply $\alpha$ divided by k, and k is the number of possible multiple comparisons. Note that the higher the value of k, the higher the compounding of type I errors and the smaller $\alpha_T$ will be. Experience has shown that for high values of k, the Bonferroni t test is a bit too conservative, that is, the value of $\alpha_T$ becomes really small (but sometimes being conservative is good, and the Bonferroni test is still very much used!). One of the many popular alternatives is the Holm-Sidak t test. Here $\alpha_T$ is given by one minus one minus $\alpha$ to the power of 1 over k, that is, $\alpha_T = 1 - (1 - \alpha)^{1/k}$.

where k is the number of multiple comparisons to be performed, e.g., k = 3 for 3 groups, k = 6 for 4 groups, etc.
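To get a feel for how the two corrections compare, the short sketch below computes $\alpha_T$ both ways. The function names are hypothetical, and counting k as the number of pairs of groups, m(m − 1)/2, is an assumption inferred from the examples in the text (k = 3 for 3 groups, k = 6 for 4 groups).

```python
from math import comb


def n_comparisons(m):
    """Number of pairwise comparisons among m groups (assumed: all pairs)."""
    return comb(m, 2)


def bonferroni_alpha(alpha, k):
    """Bonferroni: alpha_T = alpha / k."""
    return alpha / k


def sidak_alpha(alpha, k):
    """Formula given in the text for Holm-Sidak: 1 - (1 - alpha)^(1/k)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


alpha = 0.05
for m in (3, 4, 6):
    k = n_comparisons(m)
    print(m, k, bonferroni_alpha(alpha, k), sidak_alpha(alpha, k))
```

For every k greater than 1, the second formula gives a slightly larger $\alpha_T$ than $\alpha / k$, which is the sense in which it is less conservative than Bonferroni.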
Before we finish, let’s look at a very common case: the one where my samples are not all of the same size. Our scenario had 4 groups of 10 individuals each. If, instead, you have m groups each with a different size $n_i$, the principle is exactly the same, only the formulas get a bit more complicated. Here you have the variances within and between the samples expressed as the ratio between a sum of squares and a number of degrees of freedom. The meaning of these quantities is the same as before. There is no need to memorize these formulas, but it’s useful to mention them because the case of unequal sample size is quite common.

ANOVA with unequal sample sizes
The principle is the same; however, the formulas get a bit more complicated. Assume m groups, each with a different sample size $n_i$.
$$s^2_{wit} = \frac{SS_{wit}}{\eta_{wit}}$$
$$s^2_{bet} = \frac{SS_{bet}}{\eta_{bet}}$$
where $N = \sum_{i=1}^{m} n_i$, $\eta_{bet} = m - 1$ represents the numerator degrees of freedom, $\eta_{wit} = N - m$ represents the denominator degrees of freedom, and
$$SS_{wit} = \sum_{i=1}^{m} (n_i - 1) s_i^2$$
$$SS_{bet} = \left( \sum_{i=1}^{m} n_i \bar{X}_i^2 \right) - \frac{\left( \sum_{i=1}^{m} n_i \bar{X}_i \right)^2}{N}$$
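The formulas for unequal sample sizes translate almost line by line into code. The sketch below is an illustration only: the function name `anova_unequal` and the three small groups are made up, and the variable names simply mirror the symbols on the slide.

```python
import statistics


def anova_unequal(groups):
    """One-way ANOVA for groups of different sizes.

    Returns (F, eta_bet, eta_wit), following the sums of squares above.
    """
    m = len(groups)
    sizes = [len(g) for g in groups]
    big_n = sum(sizes)                              # N = sum of n_i
    means = [statistics.mean(g) for g in groups]    # group means
    variances = [statistics.variance(g) for g in groups]  # s_i^2

    # SS_wit = sum of (n_i - 1) * s_i^2
    ss_wit = sum((n_i - 1) * v for n_i, v in zip(sizes, variances))
    # SS_bet = sum(n_i * mean_i^2) - (sum(n_i * mean_i))^2 / N
    ss_bet = (sum(n_i * xb ** 2 for n_i, xb in zip(sizes, means))
              - sum(n_i * xb for n_i, xb in zip(sizes, means)) ** 2 / big_n)

    eta_bet = m - 1       # numerator degrees of freedom
    eta_wit = big_n - m   # denominator degrees of freedom
    s2_bet = ss_bet / eta_bet
    s2_wit = ss_wit / eta_wit
    return s2_bet / s2_wit, eta_bet, eta_wit


# Hypothetical groups of sizes 3, 4 and 2.
example = [[1, 2, 3], [2, 3, 4, 5], [6, 7]]
f, eta_bet, eta_wit = anova_unequal(example)
print(f, eta_bet, eta_wit)
```

When all groups have the same size n, these expressions reduce to the equal-size case: $\eta_{wit}$ becomes m(n − 1) and $s^2_{bet}$ becomes n times the variance of the group means.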
To summarize, in this lesson we saw the general principles of ANOVA. This led us to introduce the F distribution. We then saw the calculation steps and briefly looked at the formulas for unequal sample size.

Lecture Summary
2 The F distribution