Selvanathan 6e - 16 - PPT
Analysis of variance
Chapter outline
Introduction
In Chapter 13, we tested hypotheses about the mean of
a single population, and in Chapter 14, we tested the
equality of means of two populations.
Analysis of variance (ANOVA), presented in this chapter,
helps compare two or more population means of
numerical data.
Analysis of variance is a procedure that tests to
determine whether differences exist between two or
more population means.
To do this, the technique analyses the sample variance
of the data.
16.1 Single-factor analysis of variance
Data layout for k independent samples:

Sample:               1          2         …    k
First observation     x11        x12       …    x1k
Second observation    x21        x22       …    x2k
  ⋮                    ⋮          ⋮              ⋮
Last observation      xn1,1      xn2,2     …    xnk,k
Sample size           n1         n2        …    nk
Sample mean           x̄1         x̄2        …    x̄k
Terminology
xij refers to the ith observation in the jth sample.
e.g. x35 is the 3rd observation of the 5th sample.
Mean of sample j: x̄j = (1/nj) Σ(i=1..nj) xij
Grand mean of all the observations: x̄ = (1/n) Σ(j=1..k) Σ(i=1..nj) xij,
where n = n1 + n2 + … + nk.
Example 1
(Example 16.1, p599)
An apple juice manufacturer is planning to develop a new
product, a liquid concentrate. The marketing manager has
to decide how to market the new product. Three
strategies are considered:
• emphasise convenience of using the product
• emphasise quality of the product
• emphasise product’s low price.
Example 1…
An experiment was conducted as follows:
• In three cities an advertising campaign was launched.
• In each city only one of the three characteristics (convenience, quality and price) was emphasised.
• The weekly sales were recorded for twenty weeks following the beginning of the campaigns.

Weekly sales:
Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532
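As a quick check of the analysis that follows, the one-way ANOVA for these data can be reproduced in Python. This is a sketch using SciPy's standard one-way ANOVA routine, `f_oneway`:

```python
from statistics import mean
from scipy.stats import f_oneway

# Weekly sales for the three advertising strategies (Example 1)
convenience = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
               498, 663, 604, 495, 485, 557, 353, 557, 542, 614]
quality = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
           492, 719, 787, 699, 572, 523, 584, 634, 580, 624]
price = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]

# Sample means (match the Excel SUMMARY table: 577.55, 653.00, 608.65)
print(mean(convenience), mean(quality), mean(price))

# One-way ANOVA: F is about 3.23 with p about 0.047, as in the slides
F, p = f_oneway(convenience, quality, price)
print(F, p)
```

The same F and p-value appear in the Excel ANOVA output later in the example.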
H0: µ1 = µ2 = µ3
HA: At least two means differ
n1 = 20, n2 = 20, n3 = 20
Graphical demonstration:
[Two dot plots of three treatments with the same sample means (x̄1 = 10, x̄2 = 15, x̄3 = 20). When the variability within the samples is small, it is easy to draw a conclusion that the means differ; greater variability within the samples makes it harder to draw a conclusion about the population means.]
Variability between the sample means is measured as the sum of squared distances between the sample means and the grand mean, weighted by the sample sizes. This is called the sum of squares for treatments, SST:

SST = Σ(j=1..k) nj (x̄j − x̄)²

Note: When the sample means are close to one another, their distance from the grand mean is small, leading to a small SST, which supports H0. Thus, a large SST indicates large variation between the sample means, which supports HA. The question is: how large is “large enough”?
In our example, the grand mean is x̄ = 613.07 and

SST = 20(577.55 − 613.07)² + 20(653.00 − 613.07)² + 20(608.65 − 613.07)² = 57512.23
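The SST arithmetic can be sketched directly from the sample sizes and means of Example 1 (the figures are the ones used in the worked solution):

```python
# Sample sizes and means for the three strategies (Example 1)
n = [20, 20, 20]
xbar = [577.55, 653.00, 608.65]

# Grand mean: weighted average of the sample means
grand = sum(nj * xj for nj, xj in zip(n, xbar)) / sum(n)
print(grand)  # 613.0666...

# SST = sum of nj * (xbar_j - grand)^2, about 57512.23
SST = sum(nj * (xj - grand) ** 2 for nj, xj in zip(n, xbar))
print(SST)
```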
Example 1 – Solution…
The variability within samples is measured by the sum of squares for error, SSE. In our example, this is the sum of all squared differences between sales in city j and the sample mean of city j:

SSE = Σ(j=1..k) Σ(i=1..nj) (xij − x̄j)² = (n1 − 1)s1² + (n2 − 1)s2² + (n3 − 1)s3²
    = 19(10775.00) + 19(7238.11) + 19(8670.24) ≈ 506983.50
MST = SST/(k − 1) = 57512.23/(3 − 1) = 28756.12
MSE = SSE/(n − k) = 506983.50/(60 − 3) = 8894.45
Example 1 – Solution…
Hypotheses:
H0: µ1 = µ2 = µ3
HA: At least two means differ
Test statistic: F = MST/MSE ~ F(k−1, n−k)
F = 28756.12/8894.45 = 3.23
Decision rule: Reject H0 if F > Fα,k−1,n−k = F0.05,2,57 ≈ 3.15.
Conclusion: Since 3.23 > 3.15, there is sufficient evidence to reject H0 in favour of HA and argue that at least one of the mean sales is different from the others.
Note:
F = MST/MSE = [SST/(k − 1)] / [SSE/(n − k)]
Alternative method:
p-value = P(F > Fcalculated)
Decision rule: Reject H0 if p-value < α.
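The p-value approach can be sketched in Python; `f.sf` is SciPy's survival function for the F distribution, i.e. P(F > x):

```python
from scipy.stats import f

# Observed test statistic and degrees of freedom from Example 1
F_stat, df1, df2 = 3.23, 2, 57  # df1 = k - 1, df2 = n - k

# p-value = P(F > 3.23) with 2 and 57 degrees of freedom;
# this is below alpha = 0.05, so H0 is rejected
p_value = f.sf(F_stat, df1, df2)
print(p_value)
```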
Using Excel (Data Analysis): In the Data Analysis dialogue box, enter the input, and the output is presented below.
Example 1 – Solution…
SUMMARY
Groups Count Sum Average Variance
Convenience 20 11551 577.55 10775.00
Quality 20 13060 653.00 7238.11
Price 20 12173 608.65 8670.24
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 57512 2 28756 3.23 0.0468 3.16
Within Groups 506984 57 8894
Total 564496 59
In the output, the Between Groups sum of squares is SST, the Within Groups sum of squares is SSE, and the Total is SS(Total). Note that SS(Total) = SST + SSE.
1. Hypotheses:
H0: μ1 = μ2 = μ3
HA: At least two means differ.
2. Test statistic:
F = MST/MSE ~ F(k−1, n−k), where n = n1 + n2 + n3 = 60 and k = 3.
3. Level of significance: α = 0.05.
4. Decision rule: Reject H0 if F > F0.05,2,57 ≈ 3.15, or if p-value < α.
5. Value of the test statistic: F = 3.23, p-value = 0.0468.
6. Conclusion:
Since F = 3.23 > 3.15 (or p-value = P(F > 3.23) = 0.0468 < α = 0.05), reject H0.
Example 1 – Solution…
Using Excel (Data Analysis): Output
Example 1 – Solution…
Checking the required conditions
• The F-test requires that the populations are normally
distributed with equal variance.
• The equality of variances is examined by printing the
sample standard deviations or variances. The similarity
of sample variances allows us to assume that the
population variances are equal.
• From the Excel printout we compare the sample variances: 10775, 7238 and 8670. The sample variances are similar, so it is reasonable to assume that the population variances are equal.
• To check normality, observe the histogram of each sample (see next slide).
Example 1 – Solution…
Checking the required conditions…
[Histograms of the three samples: each appears approximately normal.]
ANOVA table
The results of analysis of variance are usually reported in an ANOVA table:

Source of variation   d.f.    Sum of squares   Mean square         F statistic
Treatments            k − 1   SST              MST = SST/(k − 1)   F = MST/MSE
Error                 n − k   SSE              MSE = SSE/(n − k)
Total                 n − 1   SS(Total)
Identifying factors
Factors that identify the one-way analysis of variance: the problem objective is to compare two or more populations, and the data are numerical.
Multiple comparisons
The confidence interval for the difference between two means is

(x̄1 − x̄2) ± tα/2 Sp √(1/n1 + 1/n2)

If the interval excludes 0, we can conclude that the population means differ. So another way to conduct a two-tail test is to determine whether |x̄1 − x̄2| is greater than

tα/2 Sp √(1/n1 + 1/n2)

The least significant difference (LSD) method uses MSE as the pooled variance estimate:

LSD = tα/2,n−k √(MSE (1/ni + 1/nj))
Example 2
(Example 16.3, page 621)
Example 2 – Solution
(LSD method)
The problem objective is to compare four populations,
the data are interval, and the samples are independent.
The correct statistical method is the one-way analysis of
variance.
A B C D E F G
11 ANOVA
12 Source of Variation SS df MS F P-value F crit
13 Between Groups 150,884 3 50,295 4.06 0.0139 2.8663
14 Within Groups 446,368 36 12,399
15
16 Total 597,252 39
Example 2 – Solution…
(LSD method)
The sample means are
x̄1 = 380.0
x̄2 = 485.9
x̄3 = 483.8
x̄4 = 348.2
and MSE = 12,399. Thus

LSD = tα/2,n−k √(MSE (1/ni + 1/nj)) = 2.030 × √(12399 × (1/10 + 1/10)) = 101.09
Example 2 – Solution…
(LSD method)
We calculate the absolute value of the differences between
means and compare them to LSD = 101.09.
|x̄1 − x̄2| = |380.0 − 485.9| = 105.9
|x̄1 − x̄3| = |380.0 − 483.8| = 103.8
|x̄1 − x̄4| = |380.0 − 348.2| = 31.8
|x̄2 − x̄3| = |485.9 − 483.8| = 2.1
|x̄2 − x̄4| = |485.9 − 348.2| = 137.7
|x̄3 − x̄4| = |483.8 − 348.2| = 135.6
Hence, µ1 and µ2, µ1 and µ3, µ2 and µ4, and µ3 and µ4 differ.
The other two pairs µ1 and µ4, and µ2 and µ3 do not differ.
Example 2 – Solution…
(LSD method)
Using Excel (Data Analysis Plus)
In the Data Analysis Plus dialogue box (shown below), enter the input and the output is presented in the
next slide.
Example 2 – Solution…
(LSD method)
Using Excel (Data Analysis Plus)
A B C D E
1 Multiple Comparisons
2
3 LSD Omega
4 Treatment Treatment Difference Alpha = 0.05 Alpha = 0.05
5 Bumper 1 Bumper 2 -105.9 100.99 133.45
6 Bumper 3 -103.8 100.99 133.45
7 Bumper 4 31.8 100.99 133.45
8 Bumper 2 Bumper 3 2.1 100.99 133.45
9 Bumper 4 137.7 100.99 133.45
10 Bumper 3 Bumper 4 135.6 100.99 133.45
Hence, µ1 and µ2, µ1 and µ3, µ2 and µ4, and µ3 and µ4 differ.
The other two pairs µ1 and µ4, and µ2 and µ3 do not differ.
Example 2 – Solution
(Bonferroni adjustment to LSD)
With C = k(k − 1)/2 = 6 pairwise comparisons, the significance level of each comparison is adjusted to α/C = 0.05/6 ≈ 0.0083. Thus

LSD = tα/2C,n−k √(MSE (1/ni + 1/nj)) = 2.79 × √(12399 × (1/10 + 1/10)) = 139.13
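The same calculation with the exact t value (a sketch; the Excel output below reports 139.11):

```python
from math import sqrt
from scipy.stats import t

# Bonferroni adjustment: C = 6 pairwise comparisons among k = 4 means
C = 4 * (4 - 1) // 2
alpha_adj = 0.05 / C                    # about 0.0083
t_crit = t.ppf(1 - alpha_adj / 2, 36)   # df = n - k = 36
LSD_adj = t_crit * sqrt(12399 * (1 / 10 + 1 / 10))
print(LSD_adj)

# The largest absolute difference between sample means is 137.7
# (bumpers 2 and 4), which no longer exceeds the adjusted LSD,
# so no pair is declared different under the Bonferroni adjustment.
print(137.7 > LSD_adj)  # False
```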
Example 2 – Solution…
(Bonferroni adjustment to LSD)
Using Excel
Click Add-Ins > Data Analysis Plus > Multiple Comparisons
Example 2 – Solution…
(Bonferroni adjustment to LSD)
Using Excel
A B C D E
1 Multiple Comparisons
2
3 LSD Omega
4 Treatment Treatment Difference Alpha = 0.0083 Alpha = 0.05
5 Bumper 1 Bumper 2 -105.9 139.11 133.45
6 Bumper 3 -103.8 139.11 133.45
7 Bumper 4 31.8 139.11 133.45
8 Bumper 2 Bumper 3 2.1 139.11 133.45
9 Bumper 4 137.7 139.11 133.45
10 Bumper 3 Bumper 4 135.6 139.11 133.45
Example 2 – Solution…
(Tukey’s Multiple Comparison)
The test statistic: ω = qα(k, ν) √(MSE/ng)
where
k = number of treatments
n = number of observations (n = n1 + n2 + … + nk)
ν = degrees of freedom associated with MSE
ng = number of observations in each of the k samples (if the sample sizes are unequal, use the harmonic mean of the sample sizes)
α = significance level
qα(k, ν) = critical value of the Studentized range (Table 7, Appendix B)
Example 2 – Solution…
(Tukey’s Multiple Comparison)
k = 4
n1 = n2 = n3 = n4 = ng = 10
ν = n − k = 40 − 4 = 36
MSE = 12,399
q0.05(4,36) ≈ q0.05(4,40) = 3.79
Thus,

ω = qα(k, ν) √(MSE/ng) = 3.79 × √(12399/10) = 133.45
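A sketch of the ω calculation; the Studentized-range critical value q = 3.79 is taken from the table, as above, rather than computed:

```python
from math import sqrt

# Tukey's method for Example 2: q from Table 7 (k = 4 treatments)
q, MSE, ng = 3.79, 12399, 10
omega = q * sqrt(MSE / ng)
print(omega)  # about 133.45

# Differences exceeding omega: |485.9 - 348.2| = 137.7 and
# |483.8 - 348.2| = 135.6, so under Tukey's method only bumpers
# 2 and 4, and bumpers 3 and 4, are declared different.
print(137.7 > omega, 135.6 > omega)  # True True
```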
Example 2 – Solution…
(Tukey’s Multiple Comparison)
A B C D E
1 Multiple Comparisons
2
3 LSD Omega
4 Treatment Treatment Difference Alpha = 0.05 Alpha = 0.05
5 Bumper 1 Bumper 2 -105.9 100.99 133.45
6 Bumper 3 -103.8 100.99 133.45
7 Bumper 4 31.8 100.99 133.45
8 Bumper 2 Bumper 3 2.1 100.99 133.45
9 Bumper 4 137.7 100.99 133.45
10 Bumper 3 Bumper 4 135.6 100.99 133.45
One-way ANOVA vs two-way ANOVA
[Diagram: in one-way ANOVA the responses are classified by a single factor (treatments 1, 2, 3); in two-way ANOVA the responses are classified by two factors (Factor A with levels 1, 2, 3 and Factor B with levels 1, 2).]
Partitioning the total variability in the randomised block design:

SS(Total) = SST + SSB + SSE

where SST is the sum of squares for treatments, SSB is the sum of squares for blocks, and SSE is the sum of squares for error.
Randomised blocks…
Mean of the observations of the 2nd treatment: x̄[T]2. More generally, x̄[T]j is the mean of treatment j and x̄[B]i is the mean of block i. The sum of squares for blocks is

SSB = k(x̄[B]1 − x̄)² + k(x̄[B]2 − x̄)² + … + k(x̄[B]b − x̄)²
Mean squares
To perform hypothesis tests for treatments and blocks we need:
• mean square for treatments: MST = SST/(k − 1)
• mean square for blocks: MSB = SSB/(b − 1)
• mean square for error: MSE = SSE/(n − k − b + 1)
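These three ratios are easy to wrap in a helper; the function name and the sample figures below are made up for illustration:

```python
def block_design_mean_squares(sst, ssb, sse, k, b):
    """Return (MST, MSB, MSE) for a randomised block design
    with k treatments and b blocks (n = k * b observations)."""
    n = k * b
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / (n - k - b + 1)
    return mst, msb, mse

# Hypothetical figures: 4 treatments, 10 blocks
mst, msb, mse = block_design_mean_squares(sst=300.0, ssb=180.0,
                                          sse=270.0, k=4, b=10)
print(mst, msb, mse)  # 100.0 20.0 10.0
```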
ANOVA table…
We can summarise this new information in an analysis of variance (ANOVA) table for the randomised block design:

Source of variation   d.f.            Sum of squares   Mean square                 F statistic
Treatments            k − 1           SST              MST = SST/(k − 1)           F = MST/MSE
Blocks                b − 1           SSB              MSB = SSB/(b − 1)           F = MSB/MSE
Error                 n − k − b + 1   SSE              MSE = SSE/(n − k − b + 1)
Total                 n − 1           SS(Total)
Identifying factors…
Example 3 – Solution…
a) The parameters of interest are the treatment means, μj
(j = 1, 2, …,7).
The complete test is as follows:
1. Hypotheses:
H0: μ1 = μ2 = … = μ7
HA: At least two means differ.
2. Test statistic:
F = MST/MSE ~ F(k−1, n−k−b+1),
where n = 1400, k = 7 and b = 200.
3. Level of significance: Assume α = 0.05.
Example 3 – Solution…
b) The parameters of interest are the block means, μi (i =
1, 2, …,200 - teenagers).
The complete test is as follows:
1. Hypotheses:
H0: μ1 = μ2 = … = μ200
HA: At least two block means differ.
2. Test statistic:
F = MSB/MSE ~ F(b−1, n−k−b+1),
where n = 1400, k = 7 and b = 200.
3. Level of significance: Assume α = 0.05.
Example 3 – Solution…
4. Decision rule:
Reject H0 if F > Fα,b−1,n−k−b+1 = F0.05,199,1194 = 1.19.
Alternatively, reject H0 if p-value < α = 0.05.
5. Value of the test statistic:
From the complete output below, F = 2.63 and p-value ≈ 0.
6. Conclusion:
Since F = 2.63 > 1.19 (alternatively, as p-value ≈ 0 < α = 0.05), reject H0.
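The critical value quoted above can be checked with SciPy's F inverse CDF (`f.ppf`); the degrees of freedom follow from n = 1400, k = 7, b = 200:

```python
from scipy.stats import f

# Randomised block design, Example 3: test on blocks
n, k, b = 1400, 7, 200
df1, df2 = b - 1, n - k - b + 1   # 199 and 1194

# Critical value F(0.05, 199, 1194), about 1.19 as in the slides
crit = f.ppf(0.95, df1, df2)
print(crit)
```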
Example 3 – Solution…
In the ANOVA table output, treatments have k − 1 degrees of freedom with F = MST/MSE, and blocks have b − 1 degrees of freedom with F = MSB/MSE.
Conclusion: at the 5% significance level there is sufficient evidence to reject the null hypothesis, and infer that (a) mean radio listening time is different on at least one of the week days, and (b) mean listening time differs among the teenagers (blocks).
Example 3 – Solution…
Checking the required conditions
The F-test of the randomised block design of the analysis of variance has the same requirements as the independent samples design: the random variable must be normally distributed and the population variances must be equal.
The histograms (see below) appear to support the validity of our results; the listening times appear to be normal.
The population variances seem to be equal. See the sample variances:
Sunday 462.9742
Monday 502.1718
Tuesday 506.2758
Wednesday 540.7065
Thursday 483.7455
Friday 484.6227
Saturday 481.6128
In the matched pairs experiment, we simply remove the effect of the variation caused by differences between the experimental units.
The effect of this removal is seen in the decrease in the value of the standard error (compared to the
standard error in the test statistic produced from independent samples) and the increase in the value
of the t-statistic.
The sum of squares for error is reduced by SSB, making it easier to detect differences between the
treatments.
Additionally, we can test to determine whether the blocks differ – a procedure we were unable to perform in the independent samples design.
Identifying factors
Factors that identify the randomised block design of the analysis of variance: the problem objective is to compare two or more populations, the data are numerical, and the samples are blocked.
Identifying factors…
Example 4…
Example 4 – Solution
5. Value of the test statistic and p-value: From the output, F = 2.17 and p-value = 0.0467.
6. Conclusion:
Since F = 2.17 > Fcritical = 2.14 and p-value = P(F > 2.17) = 0.0467 < α = 0.05, reject H0.
A B C D E F G
1 Anova: Single Factor
2
3 SUMMARY
4 Groups Count Sum Average Variance
5 Male E1 10 126 12.60 8.27
6 Male E2 10 110 11.00 8.67
7 Male E3 10 106 10.60 11.60
8 Male E4 10 90 9.00 5.33
9 Female E1 10 115 11.50 8.28
10 Female E2 10 112 11.20 9.73
11 Female E3 10 94 9.40 16.49
12 Female E4 10 81 8.10 12.32
13
14
15 ANOVA
16 Source of Variation SS df MS F P-value F crit
17 Between Groups 153.35 7 21.91 2.17 0.0467 2.1397
18 Within Groups 726.20 72 10.09
19
20 Total 879.55 79
Example 4 – Solution…
Terminology
A complete factorial experiment is an experiment in
which the data for all possible combinations of the
levels of the factors are gathered. This is also known as
a two-way classification.
The two factors are usually labelled A and B, with the
number of levels of each factor denoted by a and b
respectively.
The number of observations for each combination is
called a replicate, and is denoted by r. For our
purposes, the number of replicates will be the same for
each treatment, that is, they are balanced.
Terminology…
Thus, we use a complete factorial experiment where
the number of treatments is ab with r replicates per
treatment.
Example 4 – Solution…
If you examine the ANOVA table, you can see that the
Total variation is SS(Total) = 879.55
Sum of squares for treatments is SST = 153.35
Sum of squares for error is SSE = 726.20.
ANOVA Table

Source of variation   d.f.             Sum of squares   Mean square                        F statistic
Factor A              a − 1            SS(A)            MS(A) = SS(A)/(a − 1)              F = MS(A)/MSE
Factor B              b − 1            SS(B)            MS(B) = SS(B)/(b − 1)              F = MS(B)/MSE
Interaction           (a − 1)(b − 1)   SS(AB)           MS(AB) = SS(AB)/[(a − 1)(b − 1)]   F = MS(AB)/MSE
Error                 n − ab           SSE              MSE = SSE/(n − ab)
Total                 n − 1            SS(Total)
Example 4 - Solution…
Test statistic (factor A): F = MS(A)/MSE
Value of the test statistic: From the computer output, we find MS(A) = 11.25 and MSE = 10.09. Thus F = 11.25/10.09 = 1.12. Also, p-value = 0.2944.
Test statistic (factor B): F = MS(B)/MSE
Value of the test statistic: From the computer output, we find MS(B) = 45.28 and MSE = 10.09. Thus F = 45.28/10.09 = 4.49. Also, p-value = 0.0060.
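Both F ratios and their p-values can be reproduced from the mean squares. This is a sketch; the degrees of freedom follow from a = 2 levels of factor A, b = 4 levels of factor B, and n = 80 observations, and small differences from the slide values come from the rounded mean squares:

```python
from scipy.stats import f

# Example 4: mean squares from the output, and the design dimensions
MS_A, MS_B, MSE = 11.25, 45.28, 10.09
df_A, df_B, df_err = 2 - 1, 4 - 1, 80 - 2 * 4   # 1, 3, 72

# F ratios, approximately 1.12 and 4.49 as in the slides
F_A = MS_A / MSE
F_B = MS_B / MSE
print(F_A, F_B)

# p-values: factor A is not significant, factor B is
p_A = f.sf(F_A, df_A, df_err)
p_B = f.sf(F_B, df_B, df_err)
print(p_A, p_B)
```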
Conclusion (factor B):
As p-value = 0.0060 < α = 0.05, reject H0.
Conclusion (interaction):
As p-value = 0.8915 > α = 0.05, do not reject H0.
Identifying factors…
Factors that identify the two-factor analysis of variance: the problem objective is to compare populations classified according to two factors, and the data are numerical.
Summary of ANOVA…
Two-factor analysis of variance