
Analysis of Variance

• Using Statistics
• The Hypothesis Test of Analysis of Variance
• The Theory and Computations of ANOVA
• The ANOVA Table and Examples
• Further Analysis
• Models, Factors, and Designs
• Two-Way Analysis of Variance
• Blocking Designs

ANOVA: Using Statistics


• ANOVA (Analysis of Variance) is a statistical method for determining whether differences exist among several population means.
  – ANOVA is designed to detect differences among the means of populations subject to different treatments.
  – ANOVA is a joint test: the equality of several population means is tested simultaneously, or jointly.
• ANOVA tests for the equality of several population means by comparing two estimators of the population variance (hence, analysis of variance).

ANOVA is a technique whereby the total variation present in a set of data is partitioned into several components.
Associated with each of these components is a specific source of variation, so that in the analysis, it is possible to
ascertain the magnitude of the contribution of each of these sources to the total variation.
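The partition into components can be checked numerically. The following is a minimal Python sketch, standard library only, that splits the total sum of squares of the teaching-method scores (the worked example later in this handout) into a between-groups part and a within-groups part, using deviations from the means. The function name `partition_ss` is a hypothetical helper, not part of any library.

```python
def partition_ss(groups):
    """Split total variation into between-group and within-group parts.

    groups: list of lists of observations, one list per treatment.
    Returns (sst, ss_between, ss_within).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)

    # Total: squared deviations of every observation from the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    # Between: each group mean's deviation from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within: squared deviations of observations from their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return sst, ss_between, ss_within


methods = [
    [48, 73, 51, 65, 87],  # Method I
    [55, 85, 70, 69, 90],  # Method II
    [85, 68, 95, 74, 67],  # Method III
]
sst, ssb, ssw = partition_ss(methods)
print(round(sst, 2), round(ssb, 2), round(ssw, 2))  # 2829.73 443.33 2386.4
print(abs(sst - (ssb + ssw)) < 1e-9)  # True
```

The two components add back up to the total, which is the decomposition SST = SSC + SSE used in the ANOVA table.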

The Hypothesis Test of Analysis of Variance

• In an analysis of variance:
  – We have k independent random samples, each one corresponding to a population subject to a different treatment.
  – We have:
    • $n = n_1 + n_2 + n_3 + \dots + n_k$ total observations.
    • k sample means: $\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots, \bar{x}_k$
      – These k sample means can be used to calculate an estimator of the population variance. If the population means are equal, we expect the variance among the sample means to be small.
    • k sample variances: $s_1^2, s_2^2, s_3^2, \dots, s_k^2$
      – These sample variances can be used to find a pooled estimator of the population variance.
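As a sketch of the two estimators just described, assuming equal sample sizes as in the teaching-method example later in this handout, the code below uses Python's `statistics` module. Because the variance of a sample mean is σ²/n, n times the variance of the sample means estimates σ² when H0 is true; with equal sizes, the pooled estimator is simply the average of the sample variances. The variable names are illustrative only.

```python
from statistics import mean, variance

methods = [
    [48, 73, 51, 65, 87],  # Method I
    [55, 85, 70, 69, 90],  # Method II
    [85, 68, 95, 74, 67],  # Method III
]
n_per_group = 5  # this sketch assumes every sample has the same size

sample_means = [mean(g) for g in methods]          # x̄1, x̄2, x̄3
sample_variances = [variance(g) for g in methods]  # s1², s2², s3² (divisor n_i - 1)

# Estimator 1: built from the spread of the k sample means.
# Var(x̄) = σ²/n when H0 holds, so n × (variance of the sample means) estimates σ².
between_estimate = n_per_group * variance(sample_means)

# Estimator 2: pooled sample variances (a plain average when sizes are equal).
within_estimate = mean(sample_variances)

print(round(between_estimate, 2), round(within_estimate, 2))  # 221.67 198.87
```

These two numbers are the MSC and MSE of the worked example, so their ratio is its F statistic.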
The Hypothesis Test of Analysis of Variance (continued): Assumptions

• We assume independent random sampling from each of the k populations.

• We assume that the k populations under study:
  – are normally distributed,
  – with means $\mu_i$ that may or may not be equal,
  – but with equal variances, $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2 = \sigma^2$.

A Statistical Test to Determine if k Population Means Are Equal: The One-Way Analysis of Variance
• The analysis of variance is used to test the null hypothesis that the means of three or more populations are equal against the alternative hypothesis that at least one population mean differs.
• It is called the analysis of variance because the test is based on an analysis of the variation in the data obtained from different samples.
Example
• Suppose teachers at a school have devised three different methods to teach arithmetic. They want to find out whether these three methods produce different mean scores. Let $\mu_1$, $\mu_2$, and $\mu_3$ be the mean scores of all students who are taught by Methods I, II, and III, respectively.
• To test whether the three teaching methods produce different means, we test the null hypothesis

$$H_0: \mu_1 = \mu_2 = \mu_3 \quad \text{(all three population means are equal)}$$
$$H_a: \text{at least one population mean is different from the other two}$$

• Using a one-way ANOVA test, we analyze only one factor or variable.
• For instance, in the example of testing for the equality of mean arithmetic scores of students taught by each of the three different methods, we are considering only one factor: the effect of the teaching method on students' scores.
• Sometimes we may analyze the effects of two factors. For example, if different teachers teach arithmetic using these three methods, we can analyze the effects of both teachers and teaching methods on students' scores. This is done by using a two-way ANOVA.

Assumptions of One-Way ANOVA


• The populations from which the samples are drawn are (approximately) normally distributed.
• The populations from which the samples are drawn have the same variance (or standard deviation).
• The samples drawn from different populations are random and independent.
• In the example about three methods of teaching arithmetic, we first assume that the scores of all students taught by each method are (approximately) normally distributed.
• Second, the means of the distributions of scores for the three teaching methods may or may not be the same, but all three distributions have the same variance, $\sigma^2$.
• Third, when we take samples to make an ANOVA test, these samples are drawn independently and randomly from three different populations.

One-Way ANOVA Test
• The ANOVA test is applied by calculating two estimates of the variance, $\sigma^2$, of the population distributions: the variance between samples (columns) and the variance within samples (error).
• The variance between samples is also called the mean square between samples, MSB or MSC. The variance within samples is also called the mean square within samples (error), MSW or MSE.
• The variance between samples, MSB or MSC, gives an estimate of $\sigma^2$ based on the variation among the means of samples taken from different populations.
• For the example of three teaching methods, MSB or MSC will be based on the mean scores of the three samples of students taught by the three different methods.
• If the means of all populations under consideration are equal, the means of the respective samples will still differ, but the variation among them is expected to be small; consequently, the value of MSB or MSC is expected to be small.
• However, if the means of the populations under consideration are not all equal, the variation among the means of the respective samples is expected to be large; consequently, the value of MSB or MSC is expected to be large.
• The variance within samples, MSW or MSE, gives an estimate of $\sigma^2$ based on the variation within the data of the different samples.
• For the example of three teaching methods, MSW or MSE will be based on the scores of the individual students included in the three samples taken from the three populations.
• The one-way ANOVA test is always right-tailed, with the rejection region in the right tail of the F distribution curve: only a large ratio of MSB to MSW is evidence against H0.
• The hypothesis-testing procedure using ANOVA involves the same five steps that were used in the previous sessions.
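The claim that MSB reacts to differences among the population means while MSW does not can be illustrated with a deterministic experiment. The sketch below, using a hypothetical helper `mean_squares`, adds a constant to every score in Method III. This moves one group mean but leaves the spread inside each group untouched, so MSW stays the same while MSB grows. The 30-point shift is an invented scenario for illustration, not data from the handout.

```python
def mean_squares(groups):
    """Return (MSB, MSW) for a one-way layout given as a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between samples: spread of the group means around the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within samples: spread of each observation around its own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return ssb / (k - 1), ssw / (n - k)


methods = [
    [48, 73, 51, 65, 87],  # Method I
    [55, 85, 70, 69, 90],  # Method II
    [85, 68, 95, 74, 67],  # Method III
]
# Hypothetical scenario: every Method III student scores 30 points higher.
shifted = methods[:2] + [[x + 30 for x in methods[2]]]

for label, data in [("original", methods), ("shifted", shifted)]:
    msb, msw = mean_squares(data)
    print(label, round(msb, 2), round(msw, 2), round(msb / msw, 2))
```

With these scores MSW stays at about 198.87 in both cases, while MSB rises from about 221.67 to about 2571.67, and the F ratio rises from about 1.11 to about 12.93.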

Test Statistic F for a One-Way ANOVA Test


• The value of the test statistic F for a test of hypothesis using ANOVA is given by the ratio of two variances, the variance between samples (MSB) and the variance within samples (MSW).

$$F = \frac{\text{Variance between samples}}{\text{Variance within samples}} = \frac{MSB}{MSW} \quad\text{or}\quad F = \frac{MSC}{MSE}$$
Notations
Let x = the variable of interest
k = the number of different samples (or treatments)
$n_i$ = the size of sample i
$T_i$ = the sum of the values in sample i
n = the number of values in all samples $= n_1 + n_2 + n_3 + \dots + n_k$
$\sum x$ = the sum of the values in all samples $= T_1 + T_2 + T_3 + \dots + T_k$
$\sum x^2$ = the sum of the squares of the values in all the samples

Calculating the Values of MSC and MSE

$$MSC = \frac{SSC}{k - 1} \qquad MSE = \frac{SSE}{n - k}$$

where k − 1 and n − k are, respectively, the df for the numerator and the df for the denominator of the F distribution. (When all k samples have the same size m, n − k = k(m − 1).)
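The shortcut formulas built from the notation above ($SST = \sum x^2 - (\sum x)^2/n$, $SSC = \sum T_i^2/n_i - (\sum x)^2/n$, $SSE = SST - SSC$) translate directly into code. This is a minimal sketch, assuming the data arrive as one Python list per sample; `one_way_anova` is a hypothetical name, not a library function. It uses MSE = SSE/(n − k) and reproduces the numbers of the teaching-method example below.

```python
def one_way_anova(groups):
    """One-way ANOVA using the computational (shortcut) formulas.

    Returns a dict with the sums of squares, degrees of freedom,
    mean squares, and the F statistic.
    """
    k = len(groups)                                  # number of samples (treatments)
    sizes = [len(g) for g in groups]                 # n_i
    totals = [sum(g) for g in groups]                # T_i
    n = sum(sizes)                                   # n = n1 + ... + nk
    grand_total = sum(totals)                        # Σx = T1 + ... + Tk
    sum_sq = sum(x * x for g in groups for x in g)   # Σx²

    correction = grand_total ** 2 / n                # (Σx)² / n
    sst = sum_sq - correction
    ssc = sum(t * t / m for t, m in zip(totals, sizes)) - correction
    sse = sst - ssc

    msc = ssc / (k - 1)   # numerator df = k - 1
    mse = sse / (n - k)   # denominator df = n - k
    return {"SSC": ssc, "SSE": sse, "SST": sst,
            "df": (k - 1, n - k), "MSC": msc, "MSE": mse, "F": msc / mse}


methods = [
    [48, 73, 51, 65, 87],  # Method I
    [55, 85, 70, 69, 90],  # Method II
    [85, 68, 95, 74, 67],  # Method III
]
result = one_way_anova(methods)
print({key: (round(v, 2) if isinstance(v, float) else v) for key, v in result.items()})
```

Rounded, this gives SSC = 443.33, SSE = 2386.4, SST = 2829.73, df = (2, 12), MSC = 221.67, MSE = 198.87, and F = 1.11.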

Example
Fifteen fourth-grade students were randomly assigned to three groups to experiment with three different methods
of teaching arithmetic. At the end of the semester, the same test was given to all 15 students. The table gives the
scores of students in the three groups.
Method I   Method II   Method III
   48          55          85
   73          85          68
   51          70          95
   65          69          74
   87          90          67

Test whether the mean scores of the three groups of fourth graders taught by the three different methods differ.
Assume that all the required assumptions hold true. Use the 0.01 level of significance.

Solution to Example
1. $H_0: \mu_1 = \mu_2 = \mu_3$ (the mean scores of the three groups are equal)
   $H_a$: at least one population mean is different from the other two
   where $\mu_1$, $\mu_2$, and $\mu_3$ are the mean arithmetic scores of all fourth-grade students who are taught, respectively, by Methods I, II, and III.

2. Level of significance: $\alpha = 0.01$

3. Test statistic: Because we are comparing the means of three normally distributed populations, we use the F test statistic.
   Critical region: reject $H_0$ if $F > F_{\alpha,(k-1,\,n-k)}$, that is,

$$F > F_{0.01,(3-1,\,15-3)} = F_{0.01,(2,12)} = 6.93$$

4. Calculate the value of the test statistic.

                        Method I   Method II   Method III
                           48          55          85
                           73          85          68
                           51          70          95
                           65          69          74
                           87          90          67
Column sum, ∑X_i          324         369         389
Sample size, n_i            5           5           5
Sum of squares, ∑X_i²   22028       28011       30839

With k = 3 and n = 15:

$$\sum X^2 = \sum X_1^2 + \sum X_2^2 + \dots + \sum X_k^2 = 22028 + 28011 + 30839 = 80878$$
$$T = \sum X_1 + \sum X_2 + \dots + \sum X_k = 324 + 369 + 389 = 1082$$
$$SST = \sum X^2 - \frac{T^2}{n}$$
$$SSC = \left[\frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \dots + \frac{(\sum X_k)^2}{n_k}\right] - \frac{T^2}{n}$$
$$SSE = SST - SSC$$

ANOVA Table:

Source of variation   Sum of squares   Degrees of freedom            Mean squares             F
Column means          SSC              k - 1                         s1² = SSC / (k - 1)      s1² / s2²
Error                 SSE              n - k = (n1 + ... + nk) - k   s2² = SSE / (n - k)
Total                 SST              n - 1 = (n1 + ... + nk) - 1

$$SST = 80878 - \frac{1082^2}{15} = 80878 - 78048.27 = 2829.73$$

$$SSC = \left[\frac{324^2}{5} + \frac{369^2}{5} + \frac{389^2}{5}\right] - 78048.27 = 78491.60 - 78048.27 = 443.33$$

$$SSE = 2829.73 - 443.33 = 2386.40$$

Source of variation   Sum of squares   Degrees of freedom   Mean squares   Computed F
Column                443.33           3 - 1 = 2            221.67         1.11
Error                 2386.40          15 - 3 = 12          198.87
Total                 2829.73          15 - 1 = 14

Critical region: $F > F_{\alpha,(k-1,\,n-k)}$, that is, $F > F_{0.01,(2,12)} = 6.93$.

Decision: Since the computed F = 1.11 is less than the critical value 6.93, fail to reject H0.

Conclusion: There is not sufficient evidence, at the 0.01 level of significance, to conclude that the mean scores of the three groups of fourth graders taught by the three different methods differ.
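The table value $F_{0.01,(2,12)} = 6.93$ can be cross-checked without a table in this particular example. With k = 3 groups the numerator has k − 1 = 2 degrees of freedom, and the F(2, ν) distribution has the closed-form right-tail probability $P(F > x) = (1 + 2x/\nu)^{-\nu/2}$, which can be inverted for the critical value. The sketch below relies on that special case only; the helper names `f2_tail` and `f2_critical` are hypothetical, and for other numerator df a table or a library routine (for example SciPy's `scipy.stats.f.ppf`) would be needed.

```python
def f2_tail(x, nu):
    """P(F > x) for an F distribution with (2, nu) degrees of freedom."""
    return (1 + 2 * x / nu) ** (-nu / 2)


def f2_critical(alpha, nu):
    """Critical value c with P(F > c) = alpha for F(2, nu), by inverting f2_tail."""
    return (nu / 2) * (alpha ** (-2 / nu) - 1)


nu = 15 - 3  # denominator df = n - k for the teaching-method example
print(round(f2_critical(0.01, nu), 2))  # 6.93
print(round(f2_critical(0.05, nu), 2))  # 3.89
print(round(f2_tail(1.11, nu), 2))      # 0.36, the p-value for the computed F
```

The p-value of about 0.36 is far above 0.01, which agrees with the decision to fail to reject H0.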

Exercises:

1. The following data are the test scores of three groups of respondents randomly selected to take a standardized test whose scores are normally distributed. The respondents were grouped according to the type of high school from which they graduated. Test whether there is a significant difference among the mean scores of these three groups of respondents, using α = 0.05. Is there sufficient evidence to say that the type of high school affects performance on the test?

Private   Public   Science High
4 2 8
7 1 6
6 3 8
6 3 9
3 5
4

2. The data below represent the number of hours of pain relief provided by 5 different brands of headache tablets administered to 25 subjects. The 25 subjects were randomly selected and divided into 5 groups, and each group was treated with a different brand.

Hours of Relief from Headache


Biogesic Alaxan Advil Tylenol Placebo
5 9 3 2 7
4 7 5 3 6
8 8 2 4 9
6 6 3 1 4
3 9 7 4 7
Test the hypothesis, at the 0.05 level of significance, that the mean number of hours of relief provided by the tablets is the same for all five brands.

POST HOC TEST
A post hoc test is performed when H0 is rejected in the ANOVA. It determines which of the means are not equal.

Tukey’s HSD Test (one of the Post Hoc Tests)


In 1953 J. W. Tukey proposed a procedure for making all pairwise comparisons among means. This method, now widely used, is called the HSD (Honestly Significant Difference) test or the w procedure.

$$HSD = q_{\alpha,\,k,\,n-k}\sqrt{\frac{MSE}{n_j}}$$

where $n_j$ is the smaller sample size of the two groups being compared. (If the two groups have the same sample size, $n_j$ is the common sample size.)

$q_{\alpha,k,n-k}$ is obtained from Table H (the studentized range table). MSE is the mean square for error in the ANOVA table.
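The HSD formula is a one-liner once q has been read from Table H. Below is a minimal sketch with a hypothetical helper `tukey_hsd`; it does not compute q (that still comes from the table) and, following this handout, it uses the smaller of the two sample sizes. The widely used Tukey–Kramer variant instead replaces $MSE/n_j$ with $\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)$; that variant is not what is coded here.

```python
import math


def tukey_hsd(q, mse, n_i, n_j):
    """Tukey's HSD for comparing two group means.

    q        : studentized range value q(alpha, k, n - k), read from Table H
    mse      : mean square error from the ANOVA table
    n_i, n_j : sizes of the two groups being compared; following this handout,
               the smaller of the two is used (equal sizes give the common size)
    """
    n = min(n_i, n_j)
    return q * math.sqrt(mse / n)


print(round(tukey_hsd(4.23, 2.88, 5, 5), 2))  # 3.21
```

With the values from the example that follows (q = 4.23, MSE = 2.88, common size 5) this reproduces HSD = 3.21.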
Example.

Consider the ANOVA table below, computed from 5 groups (A, B, C, D, E) with a common sample size of 5 (the headache-tablet data of Exercise 2). The means of the five groups are, respectively, 5.2, 7.8, 4.0, 2.8, and 6.6.

Source of variation          df   Sum of squares   Mean square   F ratio
Columns (between tablets)     4   79.44            19.86         6.90
Error (within tablets)       20   57.60            2.88 = MSE
Total                        24   137.04

The critical region is $F > F_{0.05,(4,20)}$, where $F_{0.05,(4,20)} = 2.87$; thus the critical region is F > 2.87. Since the computed F = 6.90 > 2.87, H0 is rejected, so a post hoc test is needed to find which of the group means differ significantly.

Perform a Post Hoc Test Using Tukey’s HSD.

Solution:
Mean differences (absolute values):

           A (5.2)   B (7.8)   C (4.0)   D (2.8)   E (6.6)
A (5.2)              2.6       1.2       2.4       1.4
B (7.8)                        3.8       5.0       1.2
C (4.0)                                  1.2       2.6
D (2.8)                                            3.8
E (6.6)

From Table H with α = 0.05, k = 5, and n − k = 25 − 5 = 20, $q_{0.05,5,20} = 4.23$. Then

$$HSD = 4.23\sqrt{\frac{2.88}{5}} = 3.21$$
Any mean difference greater than 3.21 is significant. It can be concluded that there are significant differences between the means of the following pairs: B and C, B and D, and D and E.
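Comparing every pair against the HSD can be automated. The sketch below uses the group means, MSE, and q from the example above (values copied from the handout, not recomputed); it loops over all pairs with `itertools.combinations` and flags differences larger than the HSD. Variable names are illustrative.

```python
from itertools import combinations
import math

means = {"A": 5.2, "B": 7.8, "C": 4.0, "D": 2.8, "E": 6.6}  # group means from the example
q = 4.23    # q(0.05, 5, 20) from Table H
mse = 2.88  # MSE from the ANOVA table
n = 5       # common sample size

hsd = q * math.sqrt(mse / n)

significant = []
for g1, g2 in combinations(means, 2):
    diff = abs(means[g1] - means[g2])
    if diff > hsd:
        significant.append((g1, g2))
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{g1} vs {g2}: |difference| = {diff:.1f} -> {verdict}")

print(f"HSD = {hsd:.2f}; significant pairs: {significant}")
```

The output lists the same three significant pairs as the hand calculation: B and C, B and D, and D and E.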

Exercises:
Given below is an ANOVA table obtained from 3 groups, A, B, and C, with sample sizes 4, 6, and 5, respectively. Perform a post hoc test using Tukey's HSD.

One-factor ANOVA

Group   Mean   n    Std. dev.
A       5.8    4    1.26
B       3.5    6    1.52
C       7.2    5    1.64
Total   5.3    15   2.16

ANOVA table

Source      SS      df   MS       F      p-value
Treatment   38.28   2    19.142   8.49   .0050
Error       27.05   12   2.254
Total       65.33   14

When the sample sizes are not all equal, use the smaller of the two sample sizes of the groups being compared.

Example: Test whether the means of the populations from which the following samples were taken differ significantly, at α = 0.01.

A    B    C
5    8    10
6    9    10
5    8    9
8    7    8
6    9    8
7    9    9
6    10   10
5    8    9
6         8
7         9
          10
          8

Sample size:   n1 = 10   n2 = 8   n3 = 12
Sample mean:   6.1       8.5      9.0

Perform Tukey's HSD to determine which pairs of means differ significantly.

ANOVA table

Source   SS      df   MS       F       p-value
Column   49.80   2    24.900   29.36   1.69E-07
Error    22.90   27   0.848
Total    72.70   29

MSE = 0.848
$q_{0.01,3,27} = 4.517$
Mean difference:
A vs. B = 2.4, A vs. C = 2.90, B vs. C = 0.5

Comparing A and B, use n = 8 (the smaller of 10 and 8):

$$HSD = 4.517\sqrt{\frac{0.848}{8}} = 1.47$$

Since 2.4 > 1.47, the difference between the means of A and B is significant.

Comparing A and C, use n = 10 (the smaller of 10 and 12):

$$HSD = 4.517\sqrt{\frac{0.848}{10}} = 1.32$$

Since 2.9 > 1.32, the difference between the means of A and C is significant.

Comparing B and C, use n = 8 (the smaller of 8 and 12):

$$HSD = 4.517\sqrt{\frac{0.848}{8}} = 1.47$$

Since 0.5 < 1.47, there is no significant difference between the means of B and C.
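The same procedure can be run from the raw data. The sketch below computes the group means and MSE = SSE/(n − k) from the samples in the table, then applies the handout's smaller-sample rule to each pair. The studentized range value q = 4.517 is taken from the example, not computed; the variable names are illustrative.

```python
from itertools import combinations
import math

samples = {
    "A": [5, 6, 5, 8, 6, 7, 6, 5, 6, 7],
    "B": [8, 9, 8, 7, 9, 9, 10, 8],
    "C": [10, 10, 9, 8, 8, 9, 10, 9, 8, 9, 10, 8],
}
q = 4.517  # q(0.01, 3, 27) as given in the example; not computed here

k = len(samples)
n = sum(len(s) for s in samples.values())
means = {g: sum(s) / len(s) for g, s in samples.items()}

# MSE = SSE / (n - k): pooled squared deviations from each group's own mean.
sse = sum((x - means[g]) ** 2 for g, s in samples.items() for x in s)
mse = sse / (n - k)

results = {}
for g1, g2 in combinations(samples, 2):
    n_small = min(len(samples[g1]), len(samples[g2]))  # smaller of the two sizes
    hsd = q * math.sqrt(mse / n_small)
    diff = abs(means[g1] - means[g2])
    results[(g1, g2)] = diff > hsd
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{g1} vs {g2}: diff = {diff:.2f}, n = {n_small}, HSD = {hsd:.2f}, {verdict}")
```

This reproduces MSE ≈ 0.848, HSD values of about 1.47, 1.32, and 1.47, and the same three conclusions as the hand calculation.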

prepared by

SIOTE WY
