
Chapter 11

Comparing the Means of Many Independent Samples

11.1 Introduction
• Chapter 7: Compare the means of 2 populations using 2 independent samples.
    $H_0: \mu_1 = \mu_2$
• Chapter 11: Compare 2 or more (in general $I$) population means using $I$ independent samples.
    $H_0: \mu_1 = \mu_2 = \dots = \mu_I$
Example 11.1 (p463)

Treatment      1      2      3      4      5
            16.5     11    8.5     16     13
              15     15     13   14.5   10.5
             ...    ...    ...    ...    ...
            10.5      5      9    8.5     11
Mean        11.5    9.6   10.3   11.1   12.3
SD           3.5    2.4    2.0    3.1    2.9
n             12     12     12     12     12

• Note the following:
  - The sample means differ from each other.
  - There is considerable variation within each group.
  - Also look at Figure 11.1 (p464).

• Notation to be used:
  - Population means: $\mu_1, \mu_2, \dots, \mu_I$
  - Population standard deviations: $\sigma_1, \sigma_2, \dots, \sigma_I$

• Q: How do we test
    $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
    $H_A$: the $\mu_i$'s are not all equal?
  A: Use the method called Analysis of Variance (ANOVA) to test this hypothesis.

• Q: Why not repeated t tests?
  A: There are three reasons:
  1. $P(\text{type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true})$.
     The probability of making at least one type I error increases as the
     number of repeated t tests increases.
  2. The ANOVA technique combines the information on variability from all
     the samples simultaneously, and therefore increases the precision of
     the analysis.
  3. The structure of the treatment groups makes the t tests ineffective.
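
The first reason can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the pairwise t tests are independent of each other (in practice they share samples, so this is only an approximation) and computes the chance of at least one type I error when every pair among the I = 5 treatments of Example 11.1 is compared at α = 0.05.

```python
from math import comb

alpha = 0.05      # significance level of each individual t test
I = 5             # number of treatment groups (Example 11.1)
k = comb(I, 2)    # number of pairwise comparisons among I groups

# ASSUMPTION: the k tests are independent, so
# P(no type I error in any test) = (1 - alpha) ** k.
familywise_error = 1 - (1 - alpha) ** k

print(k, round(familywise_error, 3))
```

Under this simplifying assumption the overall error probability is far above the nominal 0.05, which is the motivation for a single global test.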

A Graphical Perspective on ANOVA

• Graphing the data is one of the first steps in any analysis. (Use a graph to see what is going on.)
• We will consider two kinds of variability:
  - between the group sample means
  - within the groups

11.2 The Basic Analysis of Variance

• Before we can use the ANOVA procedure to test the hypothesis
    $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
    $H_A$: the $\mu_i$'s are not all equal
  we first need to do a few calculations that describe the variability
  between as well as within the different groups.

• Notation:
  - $y_{ij}$ denotes observation $j$ in group $i$, with $i = 1, \dots, I$ and $j = 1, \dots, n_i$
  - $n_i$ is the sample size of group $i$
  - $I$ is the number of groups
  - $\bar{y}_{i\cdot} = \dfrac{\sum_{j=1}^{n_i} y_{ij}}{n_i}$ is the mean for group $i$
  - $n_\cdot = n_1 + \dots + n_I$ is the total sample size
  - $\bar{y}_{\cdot\cdot} = \dfrac{\sum_{i=1}^{I}\sum_{j=1}^{n_i} y_{ij}}{n_\cdot}$ is the grand mean

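Example 11.2: (p11.2)

         Diet 1             Diet 2             Diet 3
         $y_{11} = 8$       $y_{21} = 9$       $y_{31} = 15$
         $y_{12} = 16$      $y_{22} = 16$      $y_{32} = 10$
         $y_{13} = 9$       $y_{23} = 21$      $y_{33} = 17$
                            $y_{24} = 11$      $y_{34} = 6$
                            $y_{25} = 18$
$n_i$    $n_1 = 3$          $n_2 = 5$          $n_3 = 4$
Sum      $\sum_{j=1}^{3} y_{1j} = 33$    $\sum_{j=1}^{5} y_{2j} = 75$    $\sum_{j=1}^{4} y_{3j} = 48$
Mean     $\bar{y}_{1\cdot} = 33/3 = 11$  $\bar{y}_{2\cdot} = 75/5 = 15$  $\bar{y}_{3\cdot} = 48/4 = 12$

The overall mean:

\[
\bar{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{I}\sum_{j=1}^{n_i} y_{ij}}{n_\cdot}
= \frac{33 + 75 + 48}{3 + 5 + 4} = 13
\]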
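
As a quick check of the notation, the group means and the grand mean of Example 11.2 can be computed directly from the raw observations. This is a minimal sketch; the variable names are illustrative and not part of the textbook.

```python
# Observations from Example 11.2, keyed by diet.
diets = {
    "Diet 1": [8, 16, 9],
    "Diet 2": [9, 16, 21, 11, 18],
    "Diet 3": [15, 10, 17, 6],
}

# Group means: sum of the observations in group i divided by n_i.
group_means = {name: sum(ys) / len(ys) for name, ys in diets.items()}

# Grand mean: sum of all observations divided by the total sample size.
n_total = sum(len(ys) for ys in diets.values())
grand_mean = sum(sum(ys) for ys in diets.values()) / n_total

print(group_means, n_total, grand_mean)
```

Note that the grand mean is the mean of all 12 observations, not the unweighted average of the three group means.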

Variation Within Groups

• Notation:
  - $SS(\text{within}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$,
    the sum of squares within groups
  - $df(\text{within}) = n_\cdot - I = (n_1 - 1) + \dots + (n_I - 1)$,
    the within-groups degrees of freedom
  - $MS(\text{within}) = \dfrac{SS(\text{within})}{df(\text{within})}$,
    the mean square within groups
  - $s_{\text{pooled}} = \sqrt{MS(\text{within})}$,
    the pooled standard deviation

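Example 11.3 & 11.4: (p469 & 471)

                                                     Diet 1   Diet 2   Diet 3
                                                        8        9       15
                                                       16       16       10
                                                        9       21       17
                                                                11        6
                                                                18
$n_i$                                                   3        5        4
Mean                                                   11       15       12
$\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$       38       98       74

\[ \sum_{j=1}^{3} (y_{1j} - \bar{y}_{1\cdot})^2 = (8 - 11)^2 + \dots + (9 - 11)^2 = 38 \]
\[ \sum_{j=1}^{5} (y_{2j} - \bar{y}_{2\cdot})^2 = (9 - 15)^2 + \dots + (18 - 15)^2 = 98 \]
\[ \sum_{j=1}^{4} (y_{3j} - \bar{y}_{3\cdot})^2 = (15 - 12)^2 + \dots + (6 - 12)^2 = 74 \]

• $SS(\text{within}) = 38 + 98 + 74 = 210$
• $df(\text{within}) = 12 - 3 = 9$
• $MS(\text{within}) = \dfrac{210}{9} = 23.333$
• $s_{\text{pooled}} = \sqrt{\dfrac{210}{9}} = 4.83$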
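
The within-group calculations of this example can be reproduced with a few lines of code. The sketch below uses only the diet data given in the example; the helper `mean` is defined here for illustration.

```python
from math import sqrt

# Diet data from the example: one list per group.
diets = [[8, 16, 9], [9, 16, 21, 11, 18], [15, 10, 17, 6]]

def mean(ys):
    return sum(ys) / len(ys)

# Sum of squared deviations from the group mean, one value per group.
ss_per_group = [sum((y - mean(ys)) ** 2 for y in ys) for ys in diets]

ss_within = sum(ss_per_group)
df_within = sum(len(ys) for ys in diets) - len(diets)   # n. - I
ms_within = ss_within / df_within
s_pooled = sqrt(ms_within)

print(ss_per_group, ss_within, df_within, round(ms_within, 3), round(s_pooled, 2))
```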

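More on SS(within) and MS(within)

\[
\begin{aligned}
SS(\text{within})
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 \\
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})^2 + \dots + \sum_{j=1}^{n_I} (y_{Ij} - \bar{y}_{I\cdot})^2 \\
&= (n_1 - 1)\,\frac{\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_{1\cdot})^2}{n_1 - 1} + \dots
 + (n_I - 1)\,\frac{\sum_{j=1}^{n_I} (y_{Ij} - \bar{y}_{I\cdot})^2}{n_I - 1} \\
&= (n_1 - 1)s_1^2 + \dots + (n_I - 1)s_I^2
\end{aligned}
\]

\[
\begin{aligned}
MS(\text{within})
&= \frac{SS(\text{within})}{df(\text{within})} \\
&= \frac{(n_1 - 1)s_1^2 + \dots + (n_I - 1)s_I^2}{n_\cdot - I} \\
&= \frac{(n_1 - 1)s_1^2 + \dots + (n_I - 1)s_I^2}{(n_1 - 1) + \dots + (n_I - 1)}
\end{aligned}
\]

• MS(within) can therefore also be calculated from the individual sample
  standard deviations, i.e. using $s_1^2, s_2^2, \dots, s_I^2$.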
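
To illustrate this result, the sketch below computes MS(within) from summary statistics only, using the standard deviations and sample sizes reported for the five treatments of Example 11.1. Because those standard deviations are rounded to one decimal, the result is approximate.

```python
from math import sqrt

# Summary statistics from the Example 11.1 table.
sds = [3.5, 2.4, 2.0, 3.1, 2.9]
ns = [12, 12, 12, 12, 12]

# SS(within) = sum of (n_i - 1) * s_i^2; df(within) = sum of (n_i - 1).
ss_within = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))
df_within = sum(n - 1 for n in ns)

ms_within = ss_within / df_within
s_pooled = sqrt(ms_within)

print(df_within, round(ms_within, 3), round(s_pooled, 2))
```

With equal sample sizes, MS(within) is simply the average of the group variances.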

Variation Between Groups

• Notation:
  - $SS(\text{between}) = \sum_{i=1}^{I} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$,
    the sum of squares between groups
  - $df(\text{between}) = I - 1$,
    the degrees of freedom between groups
  - $MS(\text{between}) = \dfrac{SS(\text{between})}{df(\text{between})}$,
    the mean square between groups

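Example 11.5 (p11.5)

              Diet 1                    Diet 2                    Diet 3
$n_i$         $n_1 = 3$                 $n_2 = 5$                 $n_3 = 4$
Mean          $\bar{y}_{1\cdot} = 11$   $\bar{y}_{2\cdot} = 15$   $\bar{y}_{3\cdot} = 12$
Grand mean    $\bar{y}_{\cdot\cdot} = 13$

\[
\begin{aligned}
SS(\text{between})
&= \sum_{i=1}^{I} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\
&= 3(11 - 13)^2 + 5(15 - 13)^2 + 4(12 - 13)^2 \\
&= 36
\end{aligned}
\]

\[ df(\text{between}) = I - 1 = 3 - 1 = 2 \]

\[ MS(\text{between}) = \frac{SS(\text{between})}{df(\text{between})} = \frac{36}{2} = 18 \]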
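
The between-group quantities need only the group sizes and group means. A minimal sketch using the values of this example:

```python
# Group sizes and group means for the three diets.
ns = [3, 5, 4]
means = [11, 15, 12]

# The grand mean is the sample-size-weighted average of the group means.
grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)

# SS(between) weights each squared deviation by its group size.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
df_between = len(ns) - 1
ms_between = ss_between / df_between

print(grand_mean, ss_between, df_between, ms_between)
```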

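Fundamental Relationship of ANOVA

Total Sum of Squares

\[ y_{ij} - \bar{y}_{\cdot\cdot} = (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) \]

which leads to (the cross-product terms sum to zero)

\[
\begin{aligned}
\sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2
&= \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_i \sum_j (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\
&= \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_i n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\
&= SS(\text{within}) + SS(\text{between})
\end{aligned}
\]

\[ SS(\text{total}) = SS(\text{within}) + SS(\text{between}) \]

Total degrees of freedom

\[
\begin{aligned}
df(\text{total}) &= n_\cdot - 1 \\
&= (n_\cdot - I) + (I - 1) \\
&= df(\text{within}) + df(\text{between})
\end{aligned}
\]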

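Example 11.6: (p474)

\[
\begin{aligned}
SS(\text{total})
&= \sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2 \\
&= (8 - 13)^2 + \dots + (9 - 13)^2 \\
&\quad + (9 - 13)^2 + \dots + (18 - 13)^2 \\
&\quad + (15 - 13)^2 + \dots + (6 - 13)^2 \\
&= 246
\end{aligned}
\]

$SS(\text{within}) = 210$
$SS(\text{between}) = 36$

\[ df(\text{total}) = n_\cdot - 1 = 12 - 1 = 11 \]

$df(\text{within}) = 9$
$df(\text{between}) = 2$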
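
The decomposition of the total sum of squares can be verified numerically on the diet data. The sketch below computes each sum of squares separately from the raw observations, so the equality is a check rather than a definition.

```python
# Diet data: one list of observations per group.
diets = [[8, 16, 9], [9, 16, 21, 11, 18], [15, 10, 17, 6]]

all_obs = [y for ys in diets for y in ys]
grand_mean = sum(all_obs) / len(all_obs)
group_means = [sum(ys) / len(ys) for ys in diets]

ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
ss_within = sum((y - m) ** 2 for ys, m in zip(diets, group_means) for y in ys)
ss_between = sum(len(ys) * (m - grand_mean) ** 2 for ys, m in zip(diets, group_means))

print(ss_total, ss_within, ss_between)
```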

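The ANOVA table

• This is a summary of all the formulae and calculations.

Source     df               SS                                                                  MS
Between    $I - 1$          $\sum_{i=1}^{I} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$    SS(between)/df(between)
Within     $n_\cdot - I$    $\sum_{i=1}^{I}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$      SS(within)/df(within)
Total      $n_\cdot - 1$    $\sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2$

For the diet data:

Source     df    SS     MS
Between     2    36     18
Within      9    210    23.333
Total      11    246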

11.3 The Analysis of Variance Model
Think of the ANOVA in terms of the following statistical model:

\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij} \]

$\mu$: the grand population mean

$\tau_i$: the effect of group $i$, i.e. the difference between the population
mean for group $i$, $\mu_i$, and the grand mean, $\mu$. Therefore, $\tau_i = \mu_i - \mu$.
• If $\tau_i > 0$ (positive): the observations from group $i$ tend
  to be greater than the overall average.
• If $\tau_i < 0$ (negative): the observations from group $i$ tend
  to be smaller than the overall average.

$\varepsilon_{ij}$: random error

The following two null hypotheses are equivalent:
    $H_0: \mu_1 = \dots = \mu_I$
and
    $H_0: \tau_1 = \dots = \tau_I = 0$

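Thus, the statistical model can be stated in words as:

\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij} \]
observation = overall average + group effect + random error

Parameter estimates:

\[
\begin{aligned}
\hat{\mu} &= \bar{y}_{\cdot\cdot} \\
\hat{\mu}_i &= \bar{y}_{i\cdot} \\
\hat{\tau}_i &= \hat{\mu}_i - \hat{\mu} = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot} \\
\hat{\varepsilon}_{ij} &= y_{ij} - \bar{y}_{i\cdot}
\end{aligned}
\]

Thus,

\[ y_{ij} = \bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot}) \]

so that

\[ y_{ij} = \hat{\mu} + \hat{\tau}_i + \hat{\varepsilon}_{ij} \]

\[ SS(\text{between}) = \sum_{i=1}^{I} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{I} n_i \hat{\tau}_i^2 \]

\[ SS(\text{within}) = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 = \sum_{i=1}^{I}\sum_{j=1}^{n_i} \hat{\varepsilon}_{ij}^2 \]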

11.4 The Global F Test
In this section we use all the calculations done in the previous sections to
test the hypothesis

    $H_0: \mu_1 = \dots = \mu_I$
    $H_A$: the $\mu_i$'s are not all equal
• This is a compound hypothesis.
• Upon rejection of $H_0$ we do not know which means differ from which.
• Further analysis is needed.

The F Distribution ($F_{v_1, v_2}$)
Q: What does this distribution look like?

[Figure: density of the F distribution with 9 and 5 degrees of freedom, plotted for x from 0 to 10]

• The distribution is heavily skewed to the right.
• It depends on two parameters:
  - Numerator degrees of freedom
  - Denominator degrees of freedom
• Critical values are found in Table 10.

The F Test
1. $H_0: \mu_1 = \dots = \mu_I$
   $H_A$: the $\mu_i$'s are not all equal
2. Choose the significance level $\alpha$.
3. Calculate the value of the F test statistic
   \[ F_s = \frac{MS(\text{between})}{MS(\text{within})} \]
   with
   Numerator df = df(between)
   Denominator df = df(within)
4. Find the p-value and compare it to the chosen $\alpha$-value, or find the
   critical value from Table 9 and compare it to the calculated value of the
   test statistic.
5. Conclusion: Reject or do not reject $H_0$.

Example 11.9: (p479)

1. $H_0: \mu_1 = \mu_2 = \mu_3$
   $H_A$: the $\mu_i$'s are not all equal

2. Choose $\alpha = 0.05$

3. Calculate the value of the F test statistic
   \[ F_s = \frac{MS(\text{between})}{MS(\text{within})} = \frac{18}{23.333} = 0.77 \]
   with
   Numerator df = df(between) = 2
   Denominator df = df(within) = 9

4. The p-value > 0.2 and the critical value is $F_{2,9,0.05} = 4.26$.

5. Conclusion: Since the p-value > 0.2 > $\alpha = 0.05$, or since
   $F_s = 0.77 < F_{2,9,0.05} = 4.26$, we do not reject $H_0$.
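
The test in this example can be sketched in code without statistical tables. The helper `f_sf_num_df2` below is an illustrative name, not a library function; it relies on the fact that for an F distribution with numerator df equal to 2 the upper-tail probability has the closed form (1 + 2f/d2)^(-d2/2). This shortcut applies only when the numerator df is 2; for other cases a table or statistical software is needed.

```python
# Sums of squares and degrees of freedom for the diet data.
ss_between, df_between = 36, 2
ss_within, df_within = 210, 9

f_s = (ss_between / df_between) / (ss_within / df_within)

def f_sf_num_df2(f, d2):
    """Upper-tail probability P(F > f) for an F(2, d2) distribution.

    Valid ONLY for numerator df = 2 (closed-form special case).
    """
    return (1 + 2 * f / d2) ** (-d2 / 2)

p_value = f_sf_num_df2(f_s, df_within)

f_crit = 4.26          # critical value F(2, 9, 0.05) quoted in the example
reject_h0 = f_s > f_crit

print(round(f_s, 2), round(p_value, 2), reject_h0)
```

The p-value computed this way is well above 0.2, in line with the conclusion of the example, and evaluating the same helper at the tabulated critical value 4.26 gives a tail probability close to 0.05.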

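More ways to calculate $F_s$

\[
F_s = \frac{MS(\text{between})}{MS(\text{within})}
= \frac{SS(\text{between}) / df(\text{between})}{SS(\text{within}) / df(\text{within})}
= \frac{df(\text{within}) \cdot SS(\text{between})}{df(\text{between}) \cdot SS(\text{within})}
\]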

11.5 Applicability of Methods

The calculations and the interpretations of the ANOVA are based on certain
conditions:
1. Design conditions:
   • It should be reasonable to regard the groups of observations as random
     samples from their respective populations. The observations within each
     sample must be independent of each other.
   • The $I$ samples must be independent of each other.
2. Population conditions:
   • The $I$ population distributions must be (approximately) normal with
     equal standard deviations, i.e. $\sigma_1 = \sigma_2 = \dots = \sigma_I$.

11.6 Two-way ANOVA

Analysis of Variance for:
1. Randomized complete block design
2. Two factors

Randomized complete block design

Example 11.12 (p487)

• The effect of different amounts of acid on the growth rate of plants,
  while at the same time taking into account the differing amounts of
  sunlight.

                        Low     High    Control   Block Mean
Block 1                 1.58    1.10    2.47      1.717
Block 2                 1.15    1.05    2.15      1.450
Block 3                 1.27    0.50    1.46      1.077
Block 4                 1.25    1.00    2.36      1.537
Block 5                 1.00    1.50    1.00      1.167
$n_i$                   5       5       5
$\bar{y}_{i\cdot}$      1.25    1.03    1.888

[Figure: dotplots of height (cm) for the 3 treatment groups: low, high, control]

• Hypothesis:
  $H_0: \mu_1 = \mu_2 = \mu_3$, but we need to take into account
  the differences between blocks.
• Statistical model:
  \[ y_{ijk} = \mu + \tau_i + \beta_j + \varepsilon_{ijk} \]
• Notation:
  $y_{ijk}$: the $k$'th observation when treatment $i$ is applied in block $j$
  $\mu$: grand population mean
  $\tau_i$: effect of group (treatment) $i$
  $\beta_j$: effect of block $j$

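\[ SS(\text{total}) = SS(\text{within}) + SS(\text{treatments}) + SS(\text{blocks}) \]

Formulae:
Treatments:
\[ SS(\text{treatments}) = \sum_{i=1}^{I} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \]
\[ df(\text{treatments}) = I - 1 \]
\[ MS(\text{treatments}) = \frac{SS(\text{treatments})}{df(\text{treatments})} \]

Blocks (with $m_j$ the number of observations in block $j$ and $B$ the number of blocks):
\[ SS(\text{blocks}) = \sum_{j=1}^{B} m_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \]
\[ df(\text{blocks}) = B - 1 \]
\[ MS(\text{blocks}) = \frac{SS(\text{blocks})}{df(\text{blocks})} \]

Total:
\[ SS(\text{total}) = \sum_{i=1}^{I}\sum_{j=1}^{B}\sum_{k} (y_{ijk} - \bar{y}_{\cdot\cdot})^2 \]
\[ df(\text{total}) = n_\cdot - 1 \]

Within:
\[ SS(\text{within}) = \sum_{i=1}^{I}\sum_{j=1}^{B} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \]
\[ df(\text{within}) = n_\cdot - I - B + 1 \]
\[ MS(\text{within}) = \frac{SS(\text{within})}{df(\text{within})} \]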

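Example 11.13

                        Low     High    Control   $m_j$   $\bar{y}_{\cdot j}$
Block 1                 1.58    1.10    2.47      3       1.717
Block 2                 1.15    1.05    2.15      3       1.450
Block 3                 1.27    0.50    1.46      3       1.077
Block 4                 1.25    1.00    2.36      3       1.537
Block 5                 1.00    1.50    1.00      3       1.167
$n_i$                   5       5       5         $n_\cdot = 15$
$\bar{y}_{i\cdot}$      1.25    1.03    1.888     $\bar{y}_{\cdot\cdot} = 1.389$

\[
\begin{aligned}
SS(\text{treatments})
&= \sum_{i=1}^{I} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \\
&= 5(1.25 - 1.389)^2 + 5(1.03 - 1.389)^2 + 5(1.888 - 1.389)^2 \\
&= 1.986
\end{aligned}
\]

\[ df(\text{treatments}) = I - 1 = 3 - 1 = 2 \]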

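\[ MS(\text{treatments}) = \frac{SS(\text{treatments})}{df(\text{treatments})} = \frac{1.986}{2} = 0.993 \]

\[
\begin{aligned}
SS(\text{blocks})
&= \sum_{j=1}^{B} m_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2 \\
&= 3(1.717 - 1.389)^2 + \dots + 3(1.167 - 1.389)^2 \\
&= 0.840
\end{aligned}
\]

\[ df(\text{blocks}) = B - 1 = 5 - 1 = 4 \]

\[ MS(\text{blocks}) = \frac{SS(\text{blocks})}{df(\text{blocks})} = \frac{0.840}{4} = 0.210 \]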

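\[
\begin{aligned}
SS(\text{total})
&= \sum_{i=1}^{I}\sum_{j=1}^{B}\sum_{k} (y_{ijk} - \bar{y}_{\cdot\cdot})^2 \\
&= (1.58 - 1.389)^2 + (1.15 - 1.389)^2 + \dots + (1.00 - 1.389)^2 \\
&= 4.278
\end{aligned}
\]

\[ df(\text{total}) = n_\cdot - 1 = 15 - 1 = 14 \]

\[
\begin{aligned}
SS(\text{within})
&= SS(\text{total}) - SS(\text{treatments}) - SS(\text{blocks}) \\
&= 4.278 - 1.986 - 0.840 \\
&= 1.452
\end{aligned}
\]
OR

\[
\begin{aligned}
SS(\text{within})
&= \sum_{i=1}^{I}\sum_{j=1}^{B} (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 \\
&= (1.58 - 1.25 - 1.717 + 1.389)^2 + \dots + (1.00 - 1.888 - 1.167 + 1.389)^2 \\
&= 1.452
\end{aligned}
\]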

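\[
\begin{aligned}
df(\text{within})
&= df(\text{total}) - df(\text{treatments}) - df(\text{blocks}) \\
&= 14 - 2 - 4 \\
&= 8
\end{aligned}
\]
OR

\[ df(\text{within}) = n_\cdot - I - B + 1 = 15 - 3 - 5 + 1 = 8 \]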

Summarize all the calculations in an ANOVA table:

Source           df    SS       MS       F-ratio
Treatments        2    1.986    0.993    5.47
Blocks            4    0.840    0.210
Within groups     8    1.452    0.1815
Total            14    4.278

Thus,
1. $H_0: \mu_1 = \mu_2 = \mu_3$
   $H_A$: the $\mu_i$'s are not all equal

2. Choose $\alpha = 0.05$

3. Calculate the value of the F test statistic
   \[ F_s = \frac{MS(\text{treatments})}{MS(\text{within})} = \frac{0.993}{0.1815} = 5.47 \]
   with
   Numerator df = df(treatments) = 2
   Denominator df = df(within) = 8

4. The p-value < 0.05 and the critical value is $F_{2,8,0.05} = 4.46$.

5. Conclusion: Since the p-value < $\alpha = 0.05$, or since
   $F_s = 5.47 > F_{2,8,0.05} = 4.46$, we reject $H_0$.
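
The whole randomized-block calculation can be reproduced from the raw data. The sketch below is one possible arrangement, with rows as blocks and columns as treatments (Low, High, Control); the variable names are illustrative. Small differences in the last digit relative to the hand calculation come from rounding of the means.

```python
# Rows = blocks 1..5, columns = treatments (Low, High, Control).
data = [
    [1.58, 1.10, 2.47],
    [1.15, 1.05, 2.15],
    [1.27, 0.50, 1.46],
    [1.25, 1.00, 2.36],
    [1.00, 1.50, 1.00],
]
B, I = len(data), len(data[0])      # number of blocks, number of treatments
n = B * I

grand = sum(sum(row) for row in data) / n
treat_means = [sum(row[i] for row in data) / B for i in range(I)]
block_means = [sum(row) / I for row in data]

ss_treat = B * sum((m - grand) ** 2 for m in treat_means)
ss_blocks = I * sum((m - grand) ** 2 for m in block_means)
ss_total = sum((y - grand) ** 2 for row in data for y in row)
ss_within = ss_total - ss_treat - ss_blocks

df_treat, df_blocks = I - 1, B - 1
df_within = n - I - B + 1

f_s = (ss_treat / df_treat) / (ss_within / df_within)

print(round(ss_treat, 3), round(ss_blocks, 3), round(ss_within, 3), round(f_s, 2))
```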

Factorial ANOVA
• Two or more factors influence the response variable simultaneously, i.e.
  there is more than one explanatory variable.
Example 11.15: (p490)
• The effect of stress (control & stress) and light (low & moderate) on the
  growth of soybean plants
• 2 × 2 factorial experiment
Representation of a 2 × 2 factorial experiment:

                  B1: Low               B2: Moderate
A1: Control       (1, 1) Treatment 1    (1, 2) Treatment 3
A2: Stress        (2, 1) Treatment 2    (2, 2) Treatment 4

• Factor A: Mechanical stress
  - Level 1: Control, i.e. no stress
  - Level 2: Stress
• Factor B: Light
  - Level 1: Low
  - Level 2: Moderate
• Treatment 1: Control & Low light
• Treatment 2: Stress & Low light
• Treatment 3: Control & Moderate light
• Treatment 4: Stress & Moderate light

The data of Example 11.15:

        Treatment 1   Treatment 2   Treatment 3   Treatment 4
        264           235           314           283
        200           188           320           312
        225           195           310           291
        ...           ...           ...           ...
        288           255           282           282
        230           202           273           257
Mean    $\bar{y}_{11} = 245.3$   $\bar{y}_{21} = 212.9$   $\bar{y}_{12} = 304.1$   $\bar{y}_{22} = 268.8$
SD      $SD_{11} = 27.0$   $SD_{21} = 27.0$   $SD_{12} = 27.0$   $SD_{22} = 27.0$
n       13            13            13            13

Statistical Model for no interaction

\[ y_{ijk} = \mu + \tau_i + \beta_j + \varepsilon_{ijk} \]

Notation

$y_{ijk}$: the $k$'th observation of level $i$ of the first factor and level $j$ of the second factor

$\mu$: grand population mean

$\tau_i$: the effect of level $i$ of the first factor

$\beta_j$: the effect of level $j$ of the second factor

where

\[ \sum_{i=1}^{I} \tau_i = 0 \]

Arrange the sample means in a table:

              B1                        B2                        Difference
A1            $\bar{y}_{11} = 245.3$    $\bar{y}_{12} = 304.1$    58.8
A2            $\bar{y}_{21} = 212.9$    $\bar{y}_{22} = 268.8$    55.9
Difference    -32.4                     -35.3

[Figure: the sample means plotted against A1 and A2, with one line for B1 and one for B2]

• Additive factors: the combined influence of the two factors is equal to
  the sum of their separate influences, i.e. there is no interaction
  between the two factors.
• Take note of simple effects and main effects (see page 492).
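
The differences in the table of sample means can be computed as follows. This is a minimal sketch using the four cell means of the soybean example; the names `interaction_contrast` and `main_effect_b` are illustrative labels, with the main effect taken here as the average of the two simple effects.

```python
# Cell means, keyed by (level of factor A, level of factor B).
cell_means = {
    ("A1", "B1"): 245.3, ("A1", "B2"): 304.1,
    ("A2", "B1"): 212.9, ("A2", "B2"): 268.8,
}

# Simple effects of B (B2 minus B1) at each level of A.
effect_b_at_a1 = cell_means[("A1", "B2")] - cell_means[("A1", "B1")]
effect_b_at_a2 = cell_means[("A2", "B2")] - cell_means[("A2", "B1")]

# If the factors were exactly additive, the two simple effects would be equal.
interaction_contrast = effect_b_at_a2 - effect_b_at_a1

# Main effect of B: the average of its simple effects.
main_effect_b = (effect_b_at_a1 + effect_b_at_a2) / 2

print(round(effect_b_at_a1, 1), round(effect_b_at_a2, 1), round(interaction_contrast, 1))
```

The two simple effects are close (58.8 and 55.9), which is what nearly parallel lines in an interaction plot look like numerically.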

Interaction graphs (each plots the response at B1 and B2, with one line for A1 and one for A2):

• [Figure] No interaction in a factorial experiment (parallel lines)
• [Figure] Interaction as a difference in magnitude of response
• [Figure] Interaction as a difference in direction of response
• [Figure] Interaction due to a difference in magnitude and direction of response

Example 11.17 (p493)

              Unfertilized              Fertilized                Difference
Ambient       $\bar{y}_{11} = 0.289$    $\bar{y}_{12} = 0.347$    0.058
Elevated      $\bar{y}_{21} = 0.227$    $\bar{y}_{22} = 0.496$    0.269
Difference    -0.062                    0.149

[Figure: carbon absorption values plotted against CO2 concentration (ambient, elevated), with one line for fertilized and one for unfertilized soil]

Suppose that
• Factor 1: CO2 concentration
  - Level 1: Ambient
  - Level 2: Elevated
• Factor 2: Soil type
  - Level 1: Unfertilized
  - Level 2: Fertilized

Statistical Model for Interaction

\[ y_{ijk} = \mu + \tau_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk} \]

Notation

$y_{ijk}$: the $k$'th observation of level $i$ of the first factor and level $j$ of the second factor

$\mu$: grand population mean

$\tau_i$: the effect of level $i$ of the first factor

$\beta_j$: the effect of level $j$ of the second factor

$\gamma_{ij}$: the interaction effect between level $i$ of factor 1 and level $j$ of factor 2

where

\[ \sum_{i=1}^{I} \tau_i = 0, \qquad \sum_{j=1}^{J} \beta_j = 0, \qquad \sum_{i=1}^{I} \gamma_{ij} = \sum_{j=1}^{J} \gamma_{ij} = 0 \]

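Assume that we have a balanced design, i.e. we have an equal number of
observations (or measurements) in each of the treatments. Furthermore,
suppose that we have $r$ observations per treatment. The cell means and
marginal means can be arranged as follows (rows are the levels of factor 1,
columns the levels of factor 2):

\[
\begin{array}{c|cccc|c}
 & 1 & 2 & \cdots & J & \text{Marginal} \\ \hline
1 & \bar{y}_{11\cdot} & \bar{y}_{12\cdot} & \cdots & \bar{y}_{1J\cdot} & \bar{y}_{1\cdot\cdot} \\
2 & \bar{y}_{21\cdot} & \bar{y}_{22\cdot} & \cdots & \bar{y}_{2J\cdot} & \bar{y}_{2\cdot\cdot} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
I & \bar{y}_{I1\cdot} & \bar{y}_{I2\cdot} & \cdots & \bar{y}_{IJ\cdot} & \bar{y}_{I\cdot\cdot} \\ \hline
\text{Marginal} & \bar{y}_{\cdot 1\cdot} & \bar{y}_{\cdot 2\cdot} & \cdots & \bar{y}_{\cdot J\cdot} & \bar{y}_{\cdot\cdot\cdot}
\end{array}
\]

\[ SS(\text{total}) = SS(\text{factor 1}) + SS(\text{factor 2}) + SS(\text{interaction}) + SS(\text{within}) \]

Source         df                  SS
Factor 1       $I - 1$             $rJ \sum_{i=1}^{I} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$
Factor 2       $J - 1$             $rI \sum_{j=1}^{J} (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$
Interaction    $(I - 1)(J - 1)$    $r \sum_{i=1}^{I}\sum_{j=1}^{J} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2$
Within         $n_\cdot - IJ$      $\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} (y_{ijk} - \bar{y}_{ij\cdot})^2$
Total          $n_\cdot - 1$       $\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{r} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$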

Source         df    SS          MS          F
Factor 1        1    0.005678    0.005678    1.19
Factor 2        1    0.080197    0.080197    16.79
Interaction     1    0.033391    0.033391    6.99
Within          8    0.038202    0.004775
Total          11    0.157468
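
The factor and interaction sums of squares can be reproduced from the four cell means of Example 11.17 using the balanced-design formulae. The sketch below assumes r = 3 observations per treatment (inferred from the total df of 11, i.e. n = 12 over IJ = 4 cells) and takes SS(within) from the ANOVA table, since it cannot be computed without the raw data.

```python
# Cell means from Example 11.17: rows = CO2 level, columns = soil type.
cell_means = [
    [0.289, 0.347],   # Ambient:  Unfertilized, Fertilized
    [0.227, 0.496],   # Elevated: Unfertilized, Fertilized
]
r = 3                  # ASSUMPTION: observations per cell (n = 12, IJ = 4)
ss_within = 0.038202   # taken from the ANOVA table (needs raw data otherwise)

I, J = len(cell_means), len(cell_means[0])
row_means = [sum(row) / J for row in cell_means]
col_means = [sum(cell_means[i][j] for i in range(I)) / I for j in range(J)]
grand = sum(row_means) / I    # balanced design: mean of the row means

ss_factor1 = r * J * sum((m - grand) ** 2 for m in row_means)
ss_factor2 = r * I * sum((m - grand) ** 2 for m in col_means)
ss_inter = r * sum(
    (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(I) for j in range(J)
)

ms_within = ss_within / (r * I * J - I * J)    # df(within) = n - IJ
f_factor1 = (ss_factor1 / (I - 1)) / ms_within
f_factor2 = (ss_factor2 / (J - 1)) / ms_within
f_inter = (ss_inter / ((I - 1) * (J - 1))) / ms_within

print(round(f_factor1, 2), round(f_factor2, 2), round(f_inter, 2))
```

Because the cell means are rounded to three decimals, the results agree with the tabulated values only up to rounding.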

The three null hypotheses that we need to test:

1. $H_0: \tau_1 = \tau_2 = \dots = \tau_I = 0$

2. $H_0: \beta_1 = \beta_2 = \dots = \beta_J = 0$

3. $H_0: \gamma_{11} = \dots = \gamma_{IJ} = 0$

Take note:
Inferences about individual factor effects depend upon the presence or
absence of interaction. The significance of the interaction is determined
before the significance of the main effects of the factors, since a
significant interaction can modify any inferences based on significant
differences among the marginal means of the factors.

Testing for Interaction effects
1. $H_0: \gamma_{11} = \dots = \gamma_{IJ} = 0$
   $H_A$: the $\gamma_{ij}$'s are not all equal to zero
2. Choose $\alpha = 0.05$
3. Calculate the value of the F test statistic
   \[ F_s = \frac{MS(\text{interaction})}{MS(\text{within})} = 6.99 \]
   with
   Numerator df = df(interaction) = $(I - 1)(J - 1) = 1$
   Denominator df = df(within) = $n_\cdot - IJ = 8$
4. Find the p-value and compare it to the chosen $\alpha$-value, or find the
   critical value from Table 9 and compare it to the calculated value of the
   test statistic. In this case $F_s = 6.99 > F_{1,8,0.05} = 5.32$.

5. Conclusion: Reject or do not reject $H_0$. In this case we reject $H_0$.

Take note:
• If $H_0$ is rejected we should be careful in interpreting main effects.

Testing for main effects of factor 1
1. $H_0: \tau_1 = \tau_2 = \dots = \tau_I = 0$
   $H_A$: the $\tau_i$'s are not all equal to zero
2. Choose $\alpha = 0.05$
3. Calculate the value of the F test statistic
   \[ F_s = \frac{MS(\text{factor 1})}{MS(\text{within})} = 1.19 \]
   with
   Numerator df = df(factor 1) = $I - 1 = 1$
   Denominator df = df(within) = $n_\cdot - IJ = 8$
4. Find the p-value and compare it to the chosen $\alpha$-value, or find the
   critical value from Table 9 and compare it to the calculated value of the
   test statistic. In this case $F_s = 1.19 < F_{1,8,0.05} = 5.32$.

5. Conclusion: Reject or do not reject $H_0$. In this case we do not reject $H_0$.

Testing for main effects of factor 2
1. $H_0: \beta_1 = \beta_2 = \dots = \beta_J = 0$
   $H_A$: the $\beta_j$'s are not all equal to zero
2. Choose $\alpha = 0.05$
3. Calculate the value of the F test statistic
   \[ F_s = \frac{MS(\text{factor 2})}{MS(\text{within})} = 16.79 \]
   with
   Numerator df = df(factor 2) = $J - 1 = 1$
   Denominator df = df(within) = $n_\cdot - IJ = 8$
4. Find the p-value and compare it to the chosen $\alpha$-value, or find the
   critical value from Table 9 and compare it to the calculated value of the
   test statistic. In this case $F_s = 16.79 > F_{1,8,0.05} = 5.32$.

5. Conclusion: Reject or do not reject $H_0$. In this case we reject $H_0$.

Chapter 11: Exercises
11.1 11.2 11.3 11.4 11.5 11.6 p476

11.8 11.9 11.10 p481

11.17 11.18 11.19 11.20 11.21 11.22 p497

Take note:
• These exercises are part of the textbook and can be included in any class
  test, semester test or exam!