Module 8 ANOVA or F Test
Overview
In this module, learners will examine the most commonly used statistical method for testing hypotheses about three or
more means: analysis of variance, usually shortened to ANOVA. The module presents the simple
analysis of variance for data in a one-way classification and discusses several post-ANOVA tests.
Objectives
Learning Focus
To test whether any particular two of five means are significantly different, one could use either the z-test or the
t-test, but covering every pair requires ten distinct comparisons. This section considers a more convenient technique
that tests the equality of several means simultaneously. This technique is referred to as
the analysis of variance (ANOVA), or the F-test.
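The count of ten comes from choosing 2 means out of 5. A minimal sketch of that arithmetic, where the variable name `pairs` is only illustrative:

```python
from math import comb

# Number of distinct pairs among k group means is "k choose 2".
pairs = {k: comb(k, 2) for k in (3, 4, 5)}

for k, count in pairs.items():
    print(f"{k} means -> {count} pairwise comparisons")
```

The number of pairwise tests grows quickly with the number of groups, which is the practical reason for testing all means at once.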
ANOVA
ANOVA is used to test the significance of the differences among the means of three or more
groups simultaneously. It subdivides the total variation in the outcome measurements into two parts:
the part attributable to differences among the groups, and the part due to chance, that is, the inherent
variation within the groups.
ANOVA was developed by Fisher, the famous statistician after whom the F-test is named. This section
presents the simple analysis of variance for data in a one-way classification, or One-Way ANOVA, for
equal and unequal sample sizes. Always remember that in this method of testing the difference between means
simultaneously, the search stops when the decision arrived at is not to reject the null hypothesis. If, however, the
decision arrived at is to reject the null hypothesis, then the search continues to find out which pair or pairs account for the
difference. Either a t-test or any available post-ANOVA test is then used to find out which groups differ
significantly from one another.
Simple analysis of variance, or One-Way ANOVA, is based on two sources of variation, namely:
1. Actual difference of the means due to TREATMENT. This is represented by the sum of squares between
columns (𝑆𝑆𝑏).
2. Chance or experimental ERROR. This is represented by the sum of squares within columns (𝑆𝑆𝑤).
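In symbols, these two sources add up to the total variation. This is the standard one-way ANOVA partition, written here with TSS for the total sum of squares:

```latex
TSS = SS_b + SS_w
```

In practice this means that once any two of the three quantities are known, the third follows by subtraction.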
For easy usage, the formulas minus the complicated mathematical notations are as follows:
1. Total Sum of Squares: $TSS = \sum x^2 - \frac{(\sum x)^2}{N}$
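Only the TSS formula is listed in this section. The sketch below fills in the rest using the standard textbook definitions, which are assumptions here rather than formulas quoted from the module: the within sum of squares applies the same shortcut to each column, the between sum of squares is obtained by subtraction, and F is the ratio of the two mean squares. The function name `one_way_anova` and the small data set are purely illustrative.

```python
def shortcut_ss(xs):
    """Sum of squares via the shortcut: sum(x^2) - (sum(x))^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


def one_way_anova(groups):
    """Return the one-way ANOVA quantities for a list of groups (columns)."""
    all_scores = [x for g in groups for x in g]
    tss = shortcut_ss(all_scores)              # total variation
    ss_w = sum(shortcut_ss(g) for g in groups) # within columns (error)
    ss_b = tss - ss_w                          # between columns (treatment)
    df_b = len(groups) - 1                     # number of columns minus 1
    df_w = len(all_scores) - len(groups)       # N minus number of columns
    ms_b, ms_w = ss_b / df_b, ss_w / df_w      # mean squares
    return {"TSS": tss, "SSb": ss_b, "SSw": ss_w,
            "dfb": df_b, "dfw": df_w, "F": ms_b / ms_w}


# Illustrative data only (not from the module): three groups of three scores.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

Because nothing in the function assumes equal column lengths, the same sketch covers both the equal and unequal sample-size cases.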
Under the assumption that the groups or levels of the factor being studied represent populations whose
outcome measurements are randomly and independently drawn, follow the normal distribution, and have equal
variances, the null hypothesis of no differences in the population means,
Ho: 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4 = … = 𝜇𝑛 (there is no significant difference among the means of the groups being
compared),
is tested against the alternative hypothesis that the means are not all the same, thus:
Ha: Not all 𝜇’s are equal.
The box to the left shows what a true null hypothesis looks like when three groups are compared and the
assumptions of normality and equality of variances hold. The three populations representing the different levels of
the factor are identical and, therefore, superimposed on one another.
On the other hand, the box on the right shows a null hypothesis that is false: 𝜇1 = 𝜇2, but 𝜇3 is not
equal to them; specifically, 𝜇3 > 𝜇1 and therefore also 𝜇3 > 𝜇2.
Applications
Example 1.
Problem 1. The following table indicates the number of bottles of four popular brands of vinegar sold by
MS Supermarket on six randomly selected days. Test, at the 0.01 level of significance, the hypothesis that there is
no significant difference in the average number of bottles sold for the four brands of vinegar.
A B C D
29 23 45 23
36 19 60 40
22 41 33 42
34 27 36 29
29 12 31 53
45 35 40 32
Solution:
1. Determine n, then compute ∑x and ∑x². A table like the one below can facilitate the solution.
                                                          A         B         C         D     Total
                                                         29        23        45        23
                                                         36        19        60        40
                                                         22        41        33        42
                                                         34        27        36        29
                                                         29        12        31        53
                                                         45        35        40        32
n (number of samples) →                                   6         6         6         6        24
∑x (sum of scores in a column) →                        195       157       245       219       816
∑x² (sum of the squares of each score in a column) →   6643      4669     10571      8567     30450
SS (sum of squares within column) →                  305.50   560.833   566.833    573.50      2706
Note that the Total entry in the SS row, 2706, is the same formula applied to the Total column (30450 − 816²/24), so it is
the total sum of squares TSS; the four column SS values themselves add up to 2006.667.
a. $n_A = 6$
b. $\sum x_A = 29 + 36 + 22 + 34 + 29 + 45 = 195$
c. $\sum x_A^2 = 29^2 + 36^2 + 22^2 + 34^2 + 29^2 + 45^2 = 6643$
d. $SS_A = \sum x_A^2 - \frac{(\sum x_A)^2}{n_A} = 6643 - \frac{(195)^2}{6} = 305.5$
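Steps a to d cover a single column; the remaining quantities of the solution are not shown in this section. The sketch below carries the computation through under stated assumptions: the standard definitions SS_b = TSS − SS_w and F = MS_b / MS_w, and a tabular critical value of 4.94 for F with 3 and 20 degrees of freedom at the 0.01 level, taken from a standard F table rather than from the module. The names `sales`, `sum_of_squares`, and `F_CRITICAL` are illustrative.

```python
# Bottles sold on six days, one list per vinegar brand (data from Problem 1).
sales = {
    "A": [29, 36, 22, 34, 29, 45],
    "B": [23, 19, 41, 27, 12, 35],
    "C": [45, 60, 33, 36, 31, 40],
    "D": [23, 40, 42, 29, 53, 32],
}


def sum_of_squares(xs):
    """SS = sum(x^2) - (sum(x))^2 / n, the shortcut used in step d."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


all_scores = [x for xs in sales.values() for x in xs]
tss = sum_of_squares(all_scores)                          # total
ss_w = sum(sum_of_squares(xs) for xs in sales.values())   # within columns
ss_b = tss - ss_w                                         # between columns

df_b = len(sales) - 1                  # 4 brands - 1 = 3
df_w = len(all_scores) - len(sales)    # 24 scores - 4 brands = 20
f_computed = (ss_b / df_b) / (ss_w / df_w)

F_CRITICAL = 4.94  # ASSUMPTION: tabular F(3, 20) at alpha = 0.01
reject_h0 = f_computed > F_CRITICAL

print(f"TSS={tss:.3f}  SSw={ss_w:.3f}  SSb={ss_b:.3f}")
print(f"F computed={f_computed:.4f}  reject Ho: {reject_h0}")
```

Under these assumptions the computed F is about 2.32, which is below the assumed critical value, so the sketch agrees with the non-rejection of Ho.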
The critical value, or tabular F-value (with dfb=3 and dfw=20), shown below can be obtained online
through this link: https://www.socscistatistics.com/tests/criticalvalues/default.aspx.
You can see that the result of the test of the significance of the differences among the means is the same for
both the critical-value and the p-value approaches. Both result in the non-rejection of the null hypothesis.
Since the decision arrived at is not to reject Ho, your job ends here.
Example 2.
Problem 2. The table shows the amount of dirt, in milligrams, that four brands of laundry detergent have
removed. Test at 𝜶 = 𝟎.𝟎𝟓 whether there is a significant difference among the mean amounts of dirt
removed by the four brands of laundry detergent.
Brand W Brand X Brand Y Brand Z
10 11 10 17
12 13 11 15
16 16 15 17
16 18 14 19
14 20 13 21
Solution:
1. Determine n, then compute ∑x and ∑x². A table can facilitate the solution, as in Problem 1.
a. $n_W = 5$
b. $\sum x_W = 10 + 12 + 16 + 16 + 14 = 68$
c. $\sum x_W^2 = 10^2 + 12^2 + 16^2 + 16^2 + 14^2 = 952$
d. $SS_W = \sum x_W^2 - \frac{(\sum x_W)^2}{n_W} = 952 - \frac{(68)^2}{5} = 27.2$
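As a cross-check on the shortcut in step d, the sums of squares can also be computed directly from deviations about the means, which is algebraically equivalent. The sketch below does this for all four brands. It assumes the standard definition F = MS_b / MS_w and a tabular critical value of 3.24 for F with 3 and 16 degrees of freedom at the 0.05 level, taken from a standard F table rather than from the module. The names `dirt` and `F_CRITICAL` are illustrative.

```python
from statistics import mean

# Milligrams of dirt removed, one list per detergent brand (data from Problem 2).
dirt = {
    "W": [10, 12, 16, 16, 14],
    "X": [11, 13, 16, 18, 20],
    "Y": [10, 11, 15, 14, 13],
    "Z": [17, 15, 17, 19, 21],
}

scores = [x for xs in dirt.values() for x in xs]
grand_mean = mean(scores)

# Between columns: squared distance of each brand mean from the grand mean,
# weighted by the number of scores in that column.
ss_b = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in dirt.values())

# Within columns: squared distance of each score from its own brand mean.
ss_w = sum((x - mean(xs)) ** 2 for xs in dirt.values() for x in xs)

df_b = len(dirt) - 1                # 4 brands - 1 = 3
df_w = len(scores) - len(dirt)      # 20 scores - 4 brands = 16
f_computed = (ss_b / df_b) / (ss_w / df_w)

F_CRITICAL = 3.24  # ASSUMPTION: tabular F(3, 16) at alpha = 0.05
reject_h0 = f_computed > F_CRITICAL

print(f"SSb={ss_b:.2f}  SSw={ss_w:.2f}  F computed={f_computed:.4f}")
print(f"reject Ho: {reject_h0}")
```

Under these assumptions the computed F is about 3.58, which exceeds the assumed critical value, so the sketch points to rejecting Ho.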
The critical value, or tabular F-value (with dfb=3 and dfw=16), shown below can be obtained online
through this link: https://www.socscistatistics.com/tests/criticalvalues/default.aspx.
Below is the Megastat output for the amount of dirt by the four types of laundry detergent.
Now, since the decision leads to the rejection of Ho, you have to continue searching for the pair or
pairs that differ significantly from each other. Remember that if the decision is not to reject Ho, then you are done; your
job is finished.
Since you are already familiar with the t-test at this point, you will employ it to look for the pair or pairs of
laundry detergents that are significantly different from each other. Be guided by the following steps:
Step 1. Arrange the means from lowest to highest, writing them from left to right.
𝑌̅       𝑊̅       𝑋̅       𝑍̅
12.6    13.6    15.6    17.8
Step 2. 𝛼 = 0.05 (this is the 𝛼 in the ANOVA problem); one-tailed test (since you know that the means on
the right are greater than the means on the left).
Step 3. Perform the “t-test for Sample Means Assuming Equal Variances” on each pair, starting from the right
to the left, that is: 𝑍̅ and 𝑋̅; 𝑍̅ and 𝑊̅; 𝑍̅ and 𝑌̅; 𝑋̅ and 𝑊̅; 𝑋̅ and 𝑌̅; 𝑊̅ and 𝑌̅.
Printouts follow:
Step 4. Analysis
a) 𝑍̅ and 𝑌̅: p-value (0.002723) < 𝛼 (0.05), Ho is rejected. There is a significant difference.
b) 𝑍̅ and 𝑊̅: p-value (0.013307) < 𝛼 (0.05), Ho is rejected. There is a significant difference.
c) 𝑍̅ and 𝑋̅: p-value (0.142902) > 𝛼 (0.05), Ho is not rejected. There is no significant difference.
d) 𝑋̅ and 𝑊̅: p-value (0.173864) > 𝛼 (0.05), Ho is not rejected. There is no significant difference.
e) 𝑋̅ and 𝑌̅: p-value (0.074243) > 𝛼 (0.05), Ho is not rejected. There is no significant difference.
f) 𝑊̅ and 𝑌̅: p-value (0.260512) > 𝛼 (0.05), Ho is not rejected. There is no significant difference.
Step 6. Conclusion: Since Z and X removed the largest amounts of dirt and are not significantly
different from each other, they are found to be the best among the four brands of laundry
detergent.
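The pairwise steps above can be cross-checked with the critical-value approach instead of p-values. The sketch below orders the brand means, computes the pooled-variance (equal variances) t statistic for each pair, and compares it with an assumed one-tailed critical value of 1.860 for 8 degrees of freedom at 𝛼 = 0.05, taken from a standard t table rather than from the module. The names `dirt`, `pooled_t`, and `T_CRITICAL` are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Milligrams of dirt removed, one list per detergent brand (data from Problem 2).
dirt = {
    "W": [10, 12, 16, 16, 14],
    "X": [11, 13, 16, 18, 20],
    "Y": [10, 11, 15, 14, 13],
    "Z": [17, 15, 17, 19, 21],
}

T_CRITICAL = 1.860  # ASSUMPTION: one-tailed t table value, df = 5 + 5 - 2 = 8


def pooled_t(higher, lower):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(higher), len(lower)
    pooled_var = ((n1 - 1) * variance(higher)
                  + (n2 - 1) * variance(lower)) / (n1 + n2 - 2)
    return (mean(higher) - mean(lower)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


# Step 1: arrange brands from lowest to highest mean.
ordered = sorted(dirt, key=lambda brand: mean(dirt[brand]))

# Steps 3 and 4: test every pair, larger mean minus smaller mean.
significant = []
for low, high in combinations(ordered, 2):
    t = pooled_t(dirt[high], dirt[low])
    decision = "reject Ho" if t > T_CRITICAL else "do not reject Ho"
    if t > T_CRITICAL:
        significant.append((high, low))
    print(f"{high} vs {low}: t = {t:.3f} -> {decision}")
```

Under these assumptions only the pairs Z and Y, and Z and W, exceed the critical value, which matches the p-value decisions listed in Step 4.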