
Hello everyone and welcome to this BN2102 lecture. Today we will be looking at the analysis of variance, also known by the acronym ANOVA.

Analysis of Variance (ANOVA)

Dr Alberto Corrias
Department of Biomedical Engineering, National University of Singapore

We are going to look at an example scenario where we select 4 groups of people. Each group is told to eat only a certain type of food: 10 will eat only fruits, 10 will eat only pasta, 10 only steaks, and the last group is told to eat normally. This last group will act as the control. After a pre-determined amount of time, we go and measure cardiac output in these 4 groups of people.

The main question is: does diet affect cardiac output? We therefore have two hypotheses. The NULL hypothesis is that diet does not affect cardiac output, while the alternate hypothesis is that it does. The key difference from the t test, which tests the difference between two groups, is that here we have 4 groups to compare.

General ideas

Scenario
We randomly select 4 groups of 10 persons from a population of a small village of 200 healthy individuals. Ten are told to continue eating normally. Ten are told to eat fruits only, ten are told to eat pasta only and ten are told to eat steaks only. After some time on this diet we measure their cardiac output.

The main question is: does diet affect cardiac output? Note that we have 4 groups to compare.
H0: Diet has no effect on cardiac output
H1: Diet has an effect on cardiac output
Let's look at an example of obtaining 4 samples of 10 people. Here we have the 4 groups of 10 individuals with their measurements of cardiac output. The idea is that there will be variability among the samples. If H0 is true, however, this variability is simply due to random sampling and does not reflect a real effect of the diet on cardiac output. In this case, all my samples would come from the same population, as shown in this slide. However, if H0 is false, it would mean that one or more samples were drawn from a different population with a different mean cardiac output. Down at the bottom I recorded each sample mean - in this case 4 means - the black dot is the mean of the means, while the bar is the standard deviation of all the means.

Sampling multiple groups
If H0 is true, the observed differences are simply due to random sampling.
With that idea in mind, we first observe that the variance of the underlying population is unknown to us. We only have the samples. We will try to estimate this unknown variance in two ways. The first is taking each sample, computing the variance within that sample and then averaging all the variances obtained. The second way is by looking at the means of the samples. Remember I recorded all the sample means at the bottom of the previous slide. Well, the standard deviation of the means, also known as the standard error of the mean, can also be used to estimate the population variance. So we have two estimates of the same unknown quantity.

ANOVA strategy
The variance† of the underlying population is unknown. We are going to estimate it in two ways:

1. The average of the 4 sample variances we obtained within each group. In our scenario

   s²_wit = (1/4)(s²_con + s²_fru + s²_pas + s²_ste)

2. We look at the means of the samples. Recall that the SEM is σ_X̄ = σ/√n, hence σ² = n σ²_X̄. We estimate the population variance from the variability between the samples:

   s²_bet = n s²_X̄

† remember variance is simply the square of the standard deviation
The key idea of the analysis of variance is that if the samples were drawn from the same population, that is, if diet has no effect on cardiac output, then the two estimates of the population variance should be about the same. In other words, their ratio, which I call F, should be about 1. With this in mind, we will tend to reject the NULL hypothesis when F is big and we will be unable to do so when F is closer to 1. This, of course, raises the obvious question: what is a big value of F? Statisticians found that this ratio actually follows a known distribution, called the Fisher distribution, or F distribution.

ANOVA strategy: the key idea

Key idea
If the NULL hypothesis that all the samples were drawn from the same population (i.e., diet has no effect on cardiac output) is true, then s²_wit and s²_bet should be about the same, or

   F = s²_bet / s²_wit ∼ 1

As a general idea, we will reject the NULL hypothesis when F is big and we will be unable to reject it when F ∼ 1.

What is a big value of F?


The F distribution looks like this. We have the values of F on the x axis. You see that the highest probability is around 1 and probabilities get lower as we move towards the tail. Thanks to this, we can quantify what a "big" value of F is. The procedure is similar to hypothesis testing: we select a small value α - often 0.05. This identifies a threshold value fcrit. If my F is greater than fcrit, then I will reject the NULL hypothesis. Otherwise, I will be unable to reject it.

s²_bet/s²_wit follows the F distribution
[Plot of the F distribution: the shaded area beyond fcrit is the probability of picking a value F ≥ fcrit. Values below fcrit form the no-rejection region; values above it form the rejection region.]


Let's look at the F distribution in a little more detail. It is actually characterized by 2 parameters, referred to as degrees of freedom: the numerator degrees of freedom ν1 and the denominator degrees of freedom ν2. Here m is the number of groups, 4 in our diet and cardiac output scenario, and n is the size of our samples. With these two pieces of information we can compute ν1 and ν2. Now, there is no need to remember these formulas, but I will point out one feature: as ν2 increases, the area under the tail becomes smaller. This makes sense because as n increases, our sampling becomes more and more accurate and the chances of calculating a big value of F get smaller.

The F(ν1, ν2): numerator and denominator DOF
[Plot of F(ν1, ν2) densities for ν1 = 6, ν2 = 3; ν1 = 6, ν2 = 50; ν1 = 10, ν2 = 6; ν1 = 10, ν2 = 50.]
If m = number of groups and n = size of each sample, then
   ν1 = m − 1
   ν2 = m(n − 1)
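The degrees-of-freedom formulas above can be sketched as a tiny helper; the function name is my own, not from the lecture.

```python
# A minimal sketch of the ANOVA degrees-of-freedom formulas from the slide,
# for m groups of equal size n.
def anova_dof(m, n):
    """Numerator and denominator degrees of freedom for one-way ANOVA."""
    nu1 = m - 1        # numerator DOF (between groups)
    nu2 = m * (n - 1)  # denominator DOF (within groups)
    return nu1, nu2

print(anova_dof(4, 10))  # (3, 36) for the diet scenario
```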
Now let's look at the steps we need to perform in ANOVA. In this example, we have n = 10, our sample size, and m = 4 groups. First of all we compute the average in each of our groups. Then we compute, for each group, the variance.

Steps for ANOVA (m = 4 groups, n = 10 sample size)

1. Compute the mean value of each sample

   X̄_control = (1/n) Σᵢ Xᵢ_control,  X̄_fruit = (1/n) Σᵢ Xᵢ_fruit,
   X̄_pasta = (1/n) Σᵢ Xᵢ_pasta,  X̄_steak = (1/n) Σᵢ Xᵢ_steak

2. Compute the variance of each sample

   s²_control = Σᵢ (Xᵢ_control − X̄_control)² / (n − 1),  s²_fruit = Σᵢ (Xᵢ_fruit − X̄_fruit)² / (n − 1),
   s²_pasta = Σᵢ (Xᵢ_pasta − X̄_pasta)² / (n − 1),  s²_steak = Σᵢ (Xᵢ_steak − X̄_steak)² / (n − 1)
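Steps 1 and 2 can be sketched on made-up data; the cardiac output numbers below are purely illustrative, not from the lecture.

```python
import statistics

# Hypothetical cardiac output measurements (L/min), n = 10 per group.
samples = {
    "control": [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7, 5.0, 5.1],
    "fruit":   [4.9, 5.2, 5.0, 5.3, 4.8, 5.1, 5.0, 4.9, 5.2, 5.1],
    "pasta":   [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.3, 5.1, 4.9, 5.0],
    "steak":   [5.2, 5.0, 5.1, 4.9, 5.3, 5.0, 5.2, 5.1, 4.8, 5.0],
}

# Step 1: mean of each sample
means = {name: statistics.mean(x) for name, x in samples.items()}
# Step 2: variance of each sample (statistics.variance divides by n - 1,
# matching the slide's formula)
variances = {name: statistics.variance(x) for name, x in samples.items()}

for name in samples:
    print(f"{name}: mean = {means[name]:.3f}, variance = {variances[name]:.4f}")
```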
Once we have all the variances, I take their average and obtain one of my two estimates of the population variance. Next, I average all the sample means and obtain the mean of the means X̿. With this, I can compute my other estimate, the one I call the variance between groups, as n times the variance of the means s²_X̄.

Steps for ANOVA (m = 4 groups, n = 10 sample size)

3. Compute s²_wit

   s²_wit = (1/4)(s²_control + s²_fruit + s²_pasta + s²_steak)

4. Compute the mean of the means

   X̿ = (1/4)(X̄_control + X̄_fruit + X̄_pasta + X̄_steak)

5. Compute s²_X̄ and obtain s²_bet = n s²_X̄

   s²_X̄ = [(X̄_control − X̿)² + (X̄_fruit − X̿)² + (X̄_pasta − X̿)² + (X̄_steak − X̿)²] / (m − 1)
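Steps 3 to 5 can be sketched the same way; the group means and variances below are illustrative placeholders, not values from the lecture.

```python
# Hypothetical group means and variances carried over from steps 1-2.
n = 10  # sample size per group
means = {"control": 5.05, "fruit": 5.05, "pasta": 5.03, "steak": 5.06}
variances = {"control": 0.045, "fruit": 0.027, "pasta": 0.022, "steak": 0.024}
m = len(means)

# Step 3: within-group estimate = average of the 4 sample variances
s2_wit = sum(variances.values()) / m

# Step 4: mean of the means
grand_mean = sum(means.values()) / m

# Step 5: variance of the means (divide by m - 1), then s2_bet = n * s2_Xbar
s2_Xbar = sum((x - grand_mean) ** 2 for x in means.values()) / (m - 1)
s2_bet = n * s2_Xbar

print(s2_wit, s2_bet)
```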
Once we have the two variances, we take the ratio and look at the value we obtain. Based on n and m, we choose the appropriate F distribution: in this case ν1 = 4 − 1 = 3 and ν2 = 4(10 − 1) = 36. Following the same philosophy as hypothesis testing, we choose a value of α as the significance level. This identifies the value fcrit, which separates the rejection region from the acceptance region. Then it is all a matter of checking whether my computed ratio is smaller or bigger than fcrit.

Steps for ANOVA (m = 4 groups, n = 10 sample size)

We compute F = s²_bet/s²_wit and choose a significance level α. If we select a significance level α = 0.05, then fcrit is the value that leaves an area of 0.05 to its right.
If F > fcrit, we reject the NULL hypothesis.
If F < fcrit, we fail to reject the NULL hypothesis.
[Plot of the F(3, 36) distribution with fcrit separating the no-rejection region from the rejection region.]
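The decision step above can be sketched end to end; this assumes SciPy is available (scipy.stats.f) for the critical value and p-value, and the two variance estimates are illustrative numbers, not the lecture's data.

```python
from scipy import stats

# Illustrative variance estimates (not from the lecture)
s2_bet, s2_wit = 0.065, 0.0295
m, n = 4, 10
nu1, nu2 = m - 1, m * (n - 1)  # 3 and 36 in the diet scenario

F = s2_bet / s2_wit

alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, nu1, nu2)  # leaves area alpha to its right
p_value = stats.f.sf(F, nu1, nu2)          # tail area beyond our F

if F > f_crit:
    print(f"F = {F:.2f} > f_crit = {f_crit:.2f}: reject H0 (p = {p_value:.3f})")
else:
    print(f"F = {F:.2f} <= f_crit = {f_crit:.2f}: fail to reject H0 (p = {p_value:.3f})")
```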


Let's visualize these two cases. If F is big enough, that is, bigger than fcrit, then I reject the NULL hypothesis. I can compute the exact p-value as the area under the tail beyond F; of course p < α. This p-value represents a probability - which probability? If, in fact, all samples were drawn from the same population, then the probability of obtaining a value of F like the one we calculated, or greater, just due to random sampling is p. The other case is when F < fcrit. Here we fail to reject the NULL hypothesis, and the p-value will be bigger than α.

Steps for ANOVA (m = 4 groups, n = 10 sample size)
[Two plots of the F(3, 36) distribution. Top: case of F > fcrit; the shaded area beyond F is the p-value, p < α, and we reject the NULL hypothesis. Bottom: case of F < fcrit; the p-value is bigger than α and we are unable to reject the NULL hypothesis.]


Let's look at what we can conclude based on ANOVA. When we reject the NULL hypothesis, we are essentially saying that the probability that the observed differences among our samples are due only to random sampling is small. In our example, we would say that diet has an effect on cardiac output. A VERY important note here: we are not saying anything about WHICH of the diets had an effect, nor whether one or more had any effect. We are only saying that the chances that those 4 samples were drawn from the same population are small - and when I say the same population, I mean all with the same cardiac output. Therefore we reject this notion and say that they were drawn from different populations, i.e., with different cardiac outputs. If instead we conclude that we can't reject the NULL hypothesis, we can say that the data are unable to confirm an effect of diet on cardiac output. However, I can't claim that diet has NO effect on cardiac output.

ANOVA: interpretation

When we reject the NULL hypothesis
we are saying that the probability that the observed differences are due to random sampling is small (i.e., smaller than α). In our example, we will say that diet has an effect on cardiac output. We can't say which specific diet (fruits, pasta, etc.) nor whether only one or more is the origin of the observed differences.

When we are unable to reject the NULL hypothesis
we are saying that the available data are unable to confirm that diet has an effect on cardiac output. We can't say that diet has no effect on cardiac output, i.e., we can't say that H0 is true.
Let's look at the case when we reject the NULL hypothesis. Remember that ANOVA will not tell us which one of the groups is responsible for the observed differences. In our example, we do not know which of the diets has an effect on cardiac output. But if you think about it, it would be really useful to know which one is the culprit, right? Or even if more than one is. To try to answer this question, we could perform many pairwise t tests: control versus steak, control versus pasta and so on. This approach appears reasonable and will yield p-values as well as hypothesis-testing results. However, there is something wrong with it.

A common mistake

Scenario
Imagine that after ANOVA, we were able to reject the NULL hypothesis. We concluded that our data supported an effect of diet on cardiac output. It is natural to wonder which one of the diets had the effect, or if more than one did. Therefore we could perform pairwise Student t tests - for example control versus steak, control versus pasta and control versus fruit - and see which one rejects the NULL hypothesis.

This approach appears reasonable but it is wrong. Why?
To understand why it is wrong, it is important to remember what hypothesis testing means: when we reject the NULL hypothesis of a Student t test, we are ready to conclude in favour of an effect of the diet because the probability of having drawn such a sample from a population where the effect was actually not there is very small. We quantify this "small" with the significance level α. In other words, we are ready to accept a type I error 100α percent of the time. Now, in the scenario of this lecture, we are performing 3 t tests. Every time, we are ready to accept such an error. Since we are making 3 t tests, we will be accepting an error that is 3 times that of an individual test. In the very common case of α = 0.05, this means accepting an error 15% of the time! So, how shall we proceed in this case?

Compounding of errors

Control vs Fruit: we are ready to accept 100α% of error, i.e., state an effect of a fruit diet when it was only due to random sampling.
Control vs Pasta: we are ready to accept 100α% of error, i.e., state an effect of a pasta diet when it was only due to random sampling.
Control vs Steak: we are ready to accept 100α% of error, i.e., state an effect of a steak diet when it was only due to random sampling.

Taken together, we accept to erroneously conclude that at least one pair of groups differs 100α% + 100α% + 100α% of the time. If α = 0.05, this equates to 15% of errors!
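As a rough check of the arithmetic, a short sketch comparing the slide's additive figure (3α, an upper bound) with the familywise error rate 1 − (1 − α)^k that holds for k independent tests; the exact figure is slightly below the quoted 15%.

```python
alpha = 0.05
k = 3  # pairwise t tests: control vs fruit, pasta, steak

additive = k * alpha            # the slide's additive figure (upper bound)
exact = 1 - (1 - alpha) ** k    # familywise rate for k independent tests

print(f"additive: {additive:.4f}, exact: {exact:.4f}")
```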


The correct procedure is to perform ANOVA first and test the NULL hypothesis that all samples are drawn from the same population (that is, a population with the same cardiac output). However, we are still very much interested in which of the diets has an effect on cardiac output, and whether only one or more do. So the idea is to compute the pairwise Student t tests, but taking into account the problem of compounding of errors mentioned in the previous slide. There is a family of techniques to do so. They are collectively referred to as multiple comparison techniques or post-hoc tests (from the Latin, "after the fact").

The correct approach

STEP 1: Perform ANOVA and test whether the diet has an effect on cardiac output. If we reject the NULL hypothesis, go to STEP 2.
STEP 2: Perform pairwise t tests - control versus pasta, control versus fruit, control versus steak, etc. - but taking into account the problem of compounding errors.

The techniques that "take into account the problem of compounding errors" are known as multiple comparison procedures or post-hoc tests.
What is the idea of these multiple comparison procedures? The underlying idea is to compensate for the compounding of type I errors by making it more difficult to reject the NULL hypothesis. In practice, this means that with an original value of α, I would have needed the value of t computed from my sample to be greater than tcrit in order to reject the NULL hypothesis. If I am doing multiple comparisons, this is not enough any more. In order to compensate for the compounding of errors, I need the t computed from my sample to be greater than a NEW tcrit, which is defined by a smaller value that I call αT. To avoid confusion, I should mention that in this slide I am assuming a two-tailed test, so the areas in the right tail are halved. Now, what is the value of this αT, and therefore of this new tcrit? Statisticians have come up with a whole family of techniques, each characterized by a different way to compute αT. Here we will only mention 2: the Bonferroni t test, where αT is simply α divided by k, with k the number of possible multiple comparisons, and the Holm-Sidak t test.

Multiple comparison procedures: the idea

The idea is to compensate for the risk of Type I error by coming up with a smaller value αT, making it more difficult to reject the NULL hypothesis.
[Plot: the original significance value α/2 beyond tcrit, and the new significance value αT/2 beyond a larger NEW tcrit.]

Procedure name     | Choice of αT
Bonferroni t test  | αT = α/k
Holm-Sidak t test  | αT = 1 − (1 − α)^(1/k)

where k is the number of multiple comparisons to be performed, e.g., k = 3 for 3 groups, k = 6 for 4 groups, etc.

Note that the higher the value of k, the higher the compounding of type I errors and the smaller αT will be. Experience has shown that for high values of k, the Bonferroni t test is a bit too conservative, that is, the value of αT becomes really small (but sometimes being conservative is good and the Bonferroni test is still very much used!). One of the many popular alternatives is the Holm-Sidak t test, where αT is given by one minus (one minus α) to the power of 1 over k.
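The two αT choices from the table can be sketched as small helpers; the function names are my own, not from the lecture.

```python
# alpha_T per the slide's table, for k pairwise comparisons.
def bonferroni_alpha(alpha, k):
    return alpha / k

def sidak_alpha(alpha, k):
    return 1 - (1 - alpha) ** (1 / k)

alpha, k = 0.05, 3  # e.g., 3 comparisons against control
print(bonferroni_alpha(alpha, k))  # ~0.0167
print(sidak_alpha(alpha, k))       # ~0.0170, slightly less conservative
```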
Before we finish, let's look at a very common case: the one where my samples are not all of the same size. Our scenario had 4 groups of 10 individuals each. If, instead, you have m groups each with its own size nᵢ, the principle is exactly the same; only the formulas get a bit more complicated. Here you have the variances within and between the samples expressed as a ratio between a sum of squares and a number of degrees of freedom. The meaning of these quantities is the same as before. There is no need to memorize these formulas, but it is useful to mention them because the case of unequal sample sizes is quite common.

ANOVA with unequal sample sizes
The principle is the same; however, the formulas get a bit more complicated. Assuming m groups, each with its own sample size nᵢ:

   s²_wit = SS_wit / η_wit
   s²_bet = SS_bet / η_bet

where N = Σᵢ₌₁ᵐ nᵢ, η_bet = m − 1 represents the numerator degrees of freedom, η_wit = N − m represents the denominator degrees of freedom, and

   SS_wit = Σᵢ₌₁ᵐ (nᵢ − 1) sᵢ²

   SS_bet = Σᵢ₌₁ᵐ nᵢ X̄ᵢ² − (Σᵢ₌₁ᵐ nᵢ X̄ᵢ)² / N
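The unequal-size formulas can be sketched on made-up groups of sizes 8, 10 and 12 (the lecture's scenario had equal sizes; these numbers are purely illustrative).

```python
import statistics

# Hypothetical groups with unequal sample sizes
groups = [
    [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 5.4, 4.7],
    [4.9, 5.2, 5.0, 5.3, 4.8, 5.1, 5.0, 4.9, 5.2, 5.1],
    [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.3, 5.1, 4.9, 5.0, 5.2, 5.5],
]
m = len(groups)
sizes = [len(g) for g in groups]
N = sum(sizes)

means = [statistics.mean(g) for g in groups]
variances = [statistics.variance(g) for g in groups]

# Sums of squares from the slide
SS_wit = sum((n_i - 1) * s2 for n_i, s2 in zip(sizes, variances))
SS_bet = sum(n_i * xb ** 2 for n_i, xb in zip(sizes, means)) \
         - sum(n_i * xb for n_i, xb in zip(sizes, means)) ** 2 / N

# Degrees of freedom and the F ratio
eta_bet, eta_wit = m - 1, N - m
F = (SS_bet / eta_bet) / (SS_wit / eta_wit)
print(f"F = {F:.3f} on ({eta_bet}, {eta_wit}) degrees of freedom")
```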
To summarize, in this lesson we saw the general principles of ANOVA. This led us to introduce the F distribution. We then saw the calculation steps and briefly looked at the formulas for unequal sample sizes.

Lecture Summary

1. The general ideas of ANOVA
2. The F distribution
3. ANOVA: calculation steps
4. Interpretation of ANOVA results
5. Pairwise comparisons after ANOVA
6. Unequal sample size
