Analysis of Variance Handouts
• Using Statistics
• The Hypothesis Test of Analysis of Variance
• The Theory and Computations of ANOVA
• The ANOVA Table and Examples
• Further Analysis
• Models, Factors, and Designs
• Two-Way Analysis of Variance
• Blocking Designs
ANOVA is a technique whereby the total variation present in a set of data is partitioned into several components.
Associated with each of these components is a specific source of variation, so that in the analysis, it is possible to
ascertain the magnitude of the contribution of each of these sources to the total variation.
• In an analysis of variance:
We have k independent random samples, each one corresponding to a population subject to a
different treatment.
We have:
• n = n1 + n2 + n3 + ... + nk total observations.
• k sample means: x̄1, x̄2, x̄3, ..., x̄k
– These k sample means can be used to calculate an estimator of the population
variance. If the population means are equal, we expect the variance among the
sample means to be small.
• k sample variances: s1², s2², s3², ..., sk²
– These sample variances can be used to find a pooled estimator of the population
variance.
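The two estimators described above can be sketched in a few lines of Python. This is a minimal illustration that assumes Python 3.8+ and uses only the standard `statistics` module. The function name `variance_estimators` is hypothetical. The data are the teaching-method scores from the example later in these handouts. The first estimator comes from the spread of the k sample means. The second pools the k sample variances. Their ratio is the F statistic.

```python
from statistics import mean, variance


def variance_estimators(samples):
    """Return (between, pooled) estimates of the common population variance.

    between: sum of n_i * (xbar_i - grand mean)^2 over the groups, divided by k - 1
    pooled:  sum of (n_i - 1) * s_i^2 over the groups, divided by n - k
    (hypothetical helper written for this illustration)
    """
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n
    between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples) / (k - 1)
    # statistics.variance is the sample variance (divisor n_i - 1)
    pooled = sum((len(s) - 1) * variance(s) for s in samples) / (n - k)
    return between, pooled


# Scores for Methods I, II and III (teaching-method example)
methods = [
    [48, 73, 51, 65, 87],
    [55, 85, 70, 69, 90],
    [85, 68, 95, 74, 67],
]
between, pooled = variance_estimators(methods)
print(round(between, 2), round(pooled, 2))  # 221.67 198.87
```

If the population means are equal, both numbers estimate the same variance, and their ratio (here about 1.11) should be close to 1.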
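The Hypothesis Test of Analysis of Variance: Assumptions
• The k populations are normally distributed.
• The k populations have equal variances.
• The k samples are independent random samples.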
A Statistical Test to Determine if k Population Means are Equal: The One Way Analysis of Variance
The analysis of variance is used to test the hypothesis that the means of three or more populations are the
same against the alternative hypothesis that at least one population mean differs.
It is called the analysis of variance because the test is based on the analysis of variation in the data obtained
from different samples.
Example
Suppose the teachers at a school have devised three different methods of teaching arithmetic. They want to find out whether these three methods produce different mean scores. Let μ1, μ2, and μ3 be the mean scores of all students who are taught by Methods I, II, and III, respectively.
To test whether the three teaching methods produce different means, we test the null hypothesis
Ho: μ1 = μ2 = μ3 (all three population means are equal)
Ha: at least one population mean is different from the other two
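Here Σx² denotes the sum of the squares of the values in all the samples. The mean squares and the test statistic are
$$MSC = \frac{SSC}{k-1}, \qquad MSE = \frac{SSE}{n-k}, \qquad F = \frac{MSC}{MSE}$$
where k − 1 and n − k are, respectively, the df for the numerator and the df for the denominator of the F distribution.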
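The mean-square formulas can be sketched as a small Python function. This is a minimal example assuming only plain Python. The name `mean_squares_and_f` is hypothetical. As a check, it is applied to the sums of squares from the headache-tablet ANOVA table that appears later in these handouts (SSC = 79.44, SSE = 57.6, k = 5, n = 25).

```python
def mean_squares_and_f(ssc, sse, k, n):
    """Return (MSC, MSE, F) for a one-way ANOVA with k groups and n observations.

    Hypothetical helper for illustration.
    """
    msc = ssc / (k - 1)  # numerator df = k - 1
    mse = sse / (n - k)  # denominator df = n - k
    return msc, mse, msc / mse


# Sums of squares from the headache-tablet ANOVA table
msc, mse, f = mean_squares_and_f(ssc=79.44, sse=57.6, k=5, n=25)
print(round(msc, 2), round(mse, 2), round(f, 4))  # 19.86 2.88 6.8958
```

The three values match the Mean Square and F ratio columns of that table.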
Example
Fifteen fourth-grade students were randomly assigned to three groups to experiment with three different methods
of teaching arithmetic. At the end of the semester, the same test was given to all 15 students. The table gives the
scores of students in the three groups.
Method I Method II Method III
48 55 85
73 85 68
51 70 95
65 69 74
87 90 67
Test whether the mean scores of the three groups of fourth graders taught by the three different methods are not all equal.
Assume that all the required assumptions hold. Use the 0.01 level of significance.
Solution to Example
1. Ho: μ1 = μ2 = μ3 (the mean scores of the three groups are equal)
Ha: at least one population mean is different from the other two
where μ1, μ2, and μ3 are the mean arithmetic scores of all fourth-grade students who are taught, respectively, by Methods I, II, and III.
$$SSC = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \cdots + \frac{(\sum X_k)^2}{n_k}\right] - \frac{T^2}{n}$$
where T is the sum of all n observations.
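The computational formulas above (T²/n, Σx², and the SSC formula, with SSE = SST − SSC) can be sketched in Python as follows. This is a minimal example assuming plain Python only. The name `one_way_anova` and the dictionary keys are hypothetical choices for this illustration. The function accepts groups of unequal size.

```python
def one_way_anova(samples):
    """One-way ANOVA using the computational (shortcut) formulas.

    Hypothetical helper: returns a dict with SSC, SSE, SST, MSC, MSE and F.
    """
    k = len(samples)
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)        # T
    correction = grand_total ** 2 / n                  # T^2 / n
    sum_sq = sum(x * x for s in samples for x in s)    # sum of x^2 over all samples
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - correction
    sst = sum_sq - correction
    sse = sst - ssc                                    # SST = SSC + SSE
    msc = ssc / (k - 1)
    mse = sse / (n - k)
    return {"SSC": ssc, "SSE": sse, "SST": sst, "MSC": msc, "MSE": mse, "F": msc / mse}


methods = [
    [48, 73, 51, 65, 87],  # Method I
    [55, 85, 70, 69, 90],  # Method II
    [85, 68, 95, 74, 67],  # Method III
]
table = one_way_anova(methods)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```

For the teaching-method data this gives SSC = 443.33, SSE = 2386.40, SST = 2829.73, MSC = 221.67, MSE = 198.87 and F = 1.11. That F is below the critical value, which is consistent with not rejecting Ho.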
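ANOVA Table:
Source of Variation        df   Sum of squares   Mean Square   F ratio
Between methods             2        443.33         221.67       1.11
Within methods (error)     12       2386.40         198.87
Total                      14       2829.73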
Critical Region:
F > Fα(k − 1, n − k) = F0.01(2, 12)
F > 6.93
Conclusion: The computed F (about 1.11) does not fall in the critical region, so Ho is not rejected. There is not sufficient evidence that the mean scores of the three groups of fourth graders taught by the three different methods differ.
Exercises:
1. The following data are test scores of three groups of respondents randomly selected from the takers of a standardized test whose scores are normally distributed. The respondents were grouped according to the type of high school they graduated from. Test whether there is a significant difference in the mean scores of these three groups of respondents. Use alpha level 0.05. Is there sufficient evidence to say that the type of high school affects performance on the test?
2. The data below represent the number of hours of pain relief provided by 5 different brands of headache tablets administered to 25 subjects. The 25 subjects were randomly selected and divided into 5 groups, and each group was treated with a different brand.
POST HOC TEST
A post hoc test is performed when Ho is rejected in ANOVA, to determine which of the means are not equal.
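$$HSD = q_{\alpha,\,k,\,n-k}\sqrt{\frac{MSE}{n_j}}$$
where q is read from the studentized range table (Table H) and nj is the smaller sample size of the two groups being compared. (If the two groups being compared have the same sample size, nj is that common sample size.)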
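The HSD formula can be sketched in Python as below. This is a minimal example that follows this handout's convention of using the smaller of the two sample sizes. Other texts handle unequal sizes differently, for example with the Tukey-Kramer adjustment. The name `tukey_hsd` is hypothetical. The q value is not computed here. It must be read from a studentized range table for (α, k, n − k) and passed in.

```python
from math import sqrt


def tukey_hsd(q, mse, n_i, n_j):
    """Honestly significant difference for comparing groups i and j.

    Hypothetical helper following the handout's rule: the n in the formula is the
    smaller of the two sample sizes (equal sizes give the common size).
    q must come from a studentized range table for (alpha, k, n - k).
    """
    return q * sqrt(mse / min(n_i, n_j))


# Headache-tablet example: q(.05, 5, 20) = 4.23, MSE = 2.88, n = 5 per group
print(round(tukey_hsd(4.23, 2.88, 5, 5), 2))  # 3.21
```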
Consider the ANOVA table below, taken from 5 groups (A, B, C, D, E) with a common sample size of 5. The means of the five groups are 5.2, 7.8, 4, 2.8, and 6.6, respectively.
Source of Variation          df   Sum of squares   Mean Square   F ratio
Columns (between tablets)     4        79.44           19.86      6.895833
Error (within tablets)       20        57.60            2.88 = MSE
Total                        24       137.04
The critical region is F > F.05(4, 20), where F.05(4, 20) = 2.87; thus the critical region is F > 2.87.
Since F = 6.90 > 2.87, Ho is rejected, so a post hoc test is needed to find which group means differ significantly.
Solution:
Mean Differences:
Mean difference   A (5.2)   B (7.8)   C (4)   D (2.8)   E (6.6)
A (5.2)                       2.6      1.2      2.4       1.4
B (7.8)                                3.8      5.0       1.2
C (4)                                           1.2       2.6
D (2.8)                                                   3.8
E (6.6)
From Table H with α = 0.05, k = 5, and n − k = 25 − 5 = 20, q.05,5,20 = 4.23, so
$$HSD = 4.23\sqrt{\frac{2.88}{5}} = 3.21$$
Any mean difference greater than 3.21 is significant. We conclude that there are significant differences between the means of B and C, B and D, and D and E.
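The pairwise comparison above can be reproduced with a short Python sketch. It assumes plain Python and uses the group means, MSE, common sample size and q value given in this example. It computes every absolute mean difference and flags those that exceed the HSD.

```python
from itertools import combinations
from math import sqrt

means = {"A": 5.2, "B": 7.8, "C": 4.0, "D": 2.8, "E": 6.6}
q, mse, n = 4.23, 2.88, 5  # q(.05, 5, 20), MSE from the ANOVA table, common sample size
hsd = q * sqrt(mse / n)

significant = []
for g1, g2 in combinations(means, 2):  # every pair of groups, once
    diff = abs(means[g1] - means[g2])
    label = "significant" if diff > hsd else "not significant"
    print(f"{g1} vs {g2}: {diff:.1f} ({label})")
    if diff > hsd:
        significant.append((g1, g2))

print(f"HSD = {hsd:.2f}; significant pairs: {significant}")
```

Only the pairs (B, C), (B, D) and (D, E) exceed the HSD of 3.21. This agrees with the conclusion above.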
Exercises:
Given below are the summary statistics and the ANOVA table for 3 groups A, B, and C, with sample sizes 4, 6, and 5, respectively. Perform a post hoc test using Tukey's HSD.
Group   Mean    n   Std. Dev.
A       5.8     4   1.26
B       3.5     6   1.52
C       7.2     5   1.64
Total   5.3    15   2.16
ANOVA table
Source      SS     df   MS       F      p-value
Treatment   38.28   2   19.142   8.49   .0050
Error       27.05  12    2.254
Total       65.33  14
When the sample sizes are not all equal, use the smaller of the two sample sizes of the groups being compared.
Example: Test whether the means of the populations from which the following samples were taken are significantly different, at alpha level 0.01.
A      B      C
5      8      10
6      9      10
5      8      9
8      7      8
6      9      8
7      9      9
6      10     10
5      8      9
6             8
7             9
              10
              8
n1 = 10   n2 = 8   n3 = 12
Means:  6.1   8.5   9.0
Perform Tukey's HSD to determine which pairs of means differ significantly.
ANOVA table
Source   SS      df   MS       F       p-value
Column   49.80    2   24.900   29.36   1.69E-07
Error    22.90   27    0.848
Total    72.70   29
MSE = 0.848
q.01,3,27 = 4.517
Mean differences:
A vs. B = 2.40, A vs. C = 2.90, B vs. C = 0.50
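The remaining step can be sketched in Python under the handout's rule for unequal sample sizes. Each pair uses the smaller of its two sample sizes in the HSD formula. The q value (4.517) and MSE (0.848) are taken as given above. The comparison results come from running this sketch and are not stated in the original handout.

```python
from itertools import combinations
from math import sqrt

means = {"A": 6.1, "B": 8.5, "C": 9.0}
sizes = {"A": 10, "B": 8, "C": 12}
q, mse = 4.517, 0.848  # q(.01, 3, 27) and MSE as given in the handout

results = {}
for g1, g2 in combinations(means, 2):
    n_j = min(sizes[g1], sizes[g2])  # smaller sample size of the two groups
    hsd = q * sqrt(mse / n_j)
    diff = abs(means[g1] - means[g2])
    results[(g1, g2)] = diff > hsd
    print(f"{g1} vs {g2}: diff = {diff:.2f}, HSD = {hsd:.3f}, significant = {diff > hsd}")
```

Under these assumptions, the A and B difference (2.40) and the A and C difference (2.90) exceed their HSD values (about 1.47 and 1.32). The B and C difference (0.50) does not.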
Prepared by SIOTE WY