Analysis of Variance
ANALYSIS OF VARIANCE
We have studied the principles and procedures for testing hypotheses about the equality of
two population means under the assumption that the two random samples are drawn
independently from two normal populations that have equal variances.
However, in many situations we are required to test a hypothesis about the equality of
several population means simultaneously. In such cases we might be tempted to apply the
two-sample t-test to all possible pairwise comparisons of means. For example, if we want to
compare 6 population means, we could perform $\binom{6}{2} = 15$ two-sample t-tests. Running
multiple two-sample t-tests in this way has two disadvantages. First, the procedure is tedious
and time consuming. Second, the overall significance level increases greatly as the number
of t-tests increases, i.e. the more t-tests one runs on a given set of data, the larger the overall
risk of committing a type-I error in at least one of the comparisons. For example, if we wish to
test the hypothesis about the equality of 10 population means, we could perform
$\binom{10}{2} = 45$ two-sample t-tests. If the tests are independent and each one is performed
using $\alpha = 0.05$, we could expect $45(0.05) = 2.25$ such errors even if the null hypothesis
$H_0: \mu_i = \mu_j$ is true for each test. Thus a series of two-sample t-tests is not an
appropriate procedure for testing the equality of several means simultaneously.
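The arithmetic in the preceding paragraph can be checked with a short Python sketch. It counts the pairwise comparisons among k means and the expected number of type-I errors. Under the stated assumption that the tests are independent, it also computes the probability of at least one type-I error, $1 - (1 - \alpha)^m$, a quantity the text does not quote. The function name `pairwise_error_summary` is illustrative only.

```python
from math import comb

def pairwise_error_summary(k, alpha=0.05):
    """Summarise the cost of running every two-sample t-test among k means.

    Assumes the m = C(k, 2) tests are independent, as in the text.
    """
    m = comb(k, 2)                      # number of pairwise comparisons
    expected_errors = m * alpha         # expected number of type-I errors
    familywise = 1 - (1 - alpha) ** m   # P(at least one type-I error)
    return m, expected_errors, familywise

for k in (6, 10):
    m, expected, fw = pairwise_error_summary(k)
    print(f"k={k}: {m} tests, expected errors {expected:.2f}, "
          f"P(at least one error) {fw:.3f}")
```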
Therefore, we require a procedure for carrying out a test of hypothesis about the
equality of several population means simultaneously. For this purpose, Sir Ronald A. Fisher
in 1923 developed a technique called the Analysis of Variance (ANOVA).
The Analysis of Variance is a technique that partitions the total variation present in
the data into meaningful components, each of which is associated with a different source
of variation. These components of variance are then analyzed in such a manner that
certain hypotheses can be tested. The technique is based on the facts that
(i) The more the sample means differ, the larger the variance between samples becomes.
(ii) When the population means are equal, the separate components provide independent
and unbiased estimates of the common population variance.
Therefore the ANOVA procedure compares two different estimates of variance by
using the F-distribution to test the equality of the population means. The Analysis of Variance
is a powerful and useful technique whenever the statistical data can be categorized in
groups.
When each observation is classified according to a single criterion, we have a one-way
classification; when each observation is classified according to two criteria
simultaneously, we have a two-way classification, and so on.
                       Samples (Treatments)
Observation    1      2      ...    j      ...    k
1              X11    X12    ...    X1j    ...    X1k
2              X21    X22    ...    X2j    ...    X2k
...            ...    ...    ...    ...    ...    ...
i              Xi1    Xi2    ...    Xij    ...    Xik
...            ...    ...    ...    ...    ...    ...
(i) $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: At least one pair of means differ
(ii) Level of significance: $\alpha$
(iii) Test Statistic:
$$F = \frac{s_b^2}{s_w^2}$$
(iv) Calculation:
$$\text{C.F. (Correction factor)} = \frac{T_{..}^2}{n}$$
$$\text{Between samples S.S. (Between samples sum of squares)} = \sum_j \frac{T_{.j}^2}{r_j} - \text{C.F.}$$
$$\text{Within (error) samples S.S. (Within samples sum of squares)} = \text{Total S.S.} - \text{Between S.S.}$$
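ANOVA table
S.O.V.                    d.f.     S.S.               M.S.                        F-ratio
(Source of variation)              (Sum of Squares)   (Mean Square)
Between Samples           k - 1    SSB                $s_b^2 = SSB/(k-1)$         $F = s_b^2/s_w^2$
Within (Error) Samples    n - k    SSE                $s_w^2 = SSE/(n-k)$         ......
Total                     n - 1    SST                -------                     ......
where
$$SSB = \sum_{j=1}^{k} r\,(\bar{X}_{.j} - \bar{X}_{..})^2, \qquad SSE = \sum_{j=1}^{k}\sum_{i=1}^{r} (X_{ij} - \bar{X}_{.j})^2, \qquad SST = \sum_{j=1}^{k}\sum_{i=1}^{r} (X_{ij} - \bar{X}_{..})^2$$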
…………………………………………………………………………………………………..
Example# 1 Given the data below, test the hypothesis that the means of the three populations
are equal. Let $\alpha = 0.05$.
(i) $H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one pair of means differ
(ii) Level of significance:
$\alpha = 0.05$
(iii) Test Statistic:
$$F = \frac{s_b^2}{s_w^2}$$
(iv) Calculation:
          Sample 1    Sample 2    Sample 3
          40          70          45
          50          65          38
          60          66          60
          65          50          42
T.j       215         251         185         T.. = 651
$$\text{C.F.} = \frac{T_{..}^2}{n} = \frac{(651)^2}{12} = 35316.75$$
$$\text{Total S.S.} = \sum_i \sum_j X_{ij}^2 - \text{C.F.}$$
$$\text{Between Samples S.S.} = \frac{\sum_j T_{.j}^2}{r} - \text{C.F.} = \frac{(215)^2 + (251)^2 + (185)^2}{4} - 35316.75$$
$$\text{Between Samples S.S.} = \frac{143451}{4} - 35316.75 = 35862.75 - 35316.75 = 546.00$$
$$\text{Within Samples S.S.} = \text{Total S.S.} - \text{Between S.S.}$$
ANOVA table
S.O.V.    d.f.    S.S.    M.S.    F-ratio
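The remaining quantities of Example 1 follow mechanically from the correction-factor formulas above. As a minimal sketch (the function name `one_way_anova` and the dictionary keys are illustrative, not part of the original notes), the calculation can be carried out as follows:

```python
def one_way_anova(samples):
    """One-way ANOVA by the correction-factor method used in the text.

    `samples` is a list of lists, one inner list per sample (treatment).
    Sample sizes may differ.
    """
    n = sum(len(s) for s in samples)             # total number of observations
    k = len(samples)                             # number of samples
    grand_total = sum(sum(s) for s in samples)   # T..
    cf = grand_total ** 2 / n                    # correction factor
    total_ss = sum(x * x for s in samples for x in s) - cf
    between_ss = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    within_ss = total_ss - between_ss
    ms_between = between_ss / (k - 1)            # s_b^2
    ms_within = within_ss / (n - k)              # s_w^2
    return {
        "CF": cf,
        "total_SS": total_ss,
        "between_SS": between_ss,
        "within_SS": within_ss,
        "df": (k - 1, n - k),
        "F": ms_between / ms_within,
    }

# Data of Example 1: one inner list per sample
example1 = [[40, 50, 60, 65], [70, 65, 66, 50], [45, 38, 60, 42]]
result = one_way_anova(example1)
for name, value in result.items():
    print(name, value)
```

The computed F-ratio is then compared with the tabulated value of F for (2, 9) degrees of freedom at the chosen level of significance.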
Example# 2 Given the data below, test the hypothesis that the means of all four
populations are equal. Let $\alpha = 0.05$.
Sample Number
1     2     3     4
11    13    21    10
4     9     18    4
6     14    15    19
Example# 3 Twenty men are used in an experiment, five being assigned at random to each of
the four machines. The observations are the amounts produced by the machines in one day.
Test the hypothesis, at $\alpha = 0.05$, that the machines do not differ with respect to the
number of items produced.
Machine Number
1     2     3     4
64    41    65    45
39    48    57    51
65    41    76    55
46    49    72    47
63    57    64    47
Example# 4 The following are the earnings over three consecutive weeks of three salesmen
employed by a given firm.
Salesmen
A      B      C
152    181    160
175    171    130
180    203    124
Calculate F and, assuming that the necessary assumptions can be met, test at the 5% level of
significance whether the differences between salesmen are significant.
Example# 5 Determinations of the yields of a process with four treatments are given. Test the
hypothesis that no differences exist among the four treatments at $\alpha = 0.05$.
Treatments
1     2     3     4
11    6     8     14
4     4     6     27
4     3     4     8
5     6     11    18
D.Y.S.
………………………………………………………………………………………………......
Example# 6 Given the data below, test the hypothesis that the means of the three populations
are equal. Let $\alpha = 0.05$.
Groups    Observations
A         4    9    10    11    17    19
B         6    8    10    11    12    12    15
C         9    13   15    20    23
Sol:
(i) $H_0: \mu_A = \mu_B = \mu_C$
$H_1$: At least one pair of means differ
(ii) Level of significance:
$\alpha = 0.05$
(iii) Test Statistic:
$$F = \frac{s_b^2}{s_w^2}$$
(iv) Calculation:
          Groups
          A       B       C
          4       6       9
          9       8       13
          10      10      15
          11      11      20
          17      12      23
          19      12      ----
          ---     15      ----
T.j       70      74      80      T.. = 224
$$\text{C.F.} = \frac{T_{..}^2}{n} = \frac{(224)^2}{18} = 2787.56$$
$$\text{Total S.S.} = \sum_i \sum_j X_{ij}^2 - \text{C.F.}$$
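Because the three groups in Example 6 have different sizes, each squared group total must be divided by its own sample size $r_j$ when forming the between-samples sum of squares. The following sketch continues the calculation under that formula; the variable names are illustrative only.

```python
groups = {
    "A": [4, 9, 10, 11, 17, 19],
    "B": [6, 8, 10, 11, 12, 12, 15],
    "C": [9, 13, 15, 20, 23],
}

n = sum(len(obs) for obs in groups.values())         # 18 observations in all
k = len(groups)                                      # 3 groups
grand_total = sum(sum(obs) for obs in groups.values())
cf = grand_total ** 2 / n                            # correction factor

total_ss = sum(x * x for obs in groups.values() for x in obs) - cf
# Each squared group total is divided by that group's own size r_j.
between_ss = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
within_ss = total_ss - between_ss

f_ratio = (between_ss / (k - 1)) / (within_ss / (n - k))
print(f"C.F. = {cf:.2f}, Total S.S. = {total_ss:.2f}")
print(f"Between S.S. = {between_ss:.2f}, Within S.S. = {within_ss:.2f}")
print(f"F = {f_ratio:.3f} with ({k - 1}, {n - k}) d.f.")
```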
Example# 7 Determinations are made on the yield using three methods of catalyzing a
chemical process.
Method    Measurements
1         47.2    49.8    48.5
2         50.1    49.3    51.5    50.9
3         49.1    53.2    51.2    52.8    52.3
D.Y.S.
………………………………………………………………………………………………......
ASSUMPTIONS OF ONE WAY ANALYSIS OF VARIANCE
(i) The k samples are selected randomly and independently from the k populations.
(ii) All the k populations from which the samples are drawn are normally distributed with
means $\mu_1, \mu_2, \ldots, \mu_k$.
(iii) The normal populations all have equal variances, i.e. $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$.
(iv) The effects are additive. This means that $X_{ij}$, the ith observation in the jth sample,
is made up of three component quantities as follows:
$$X_{ij} = \mu + \alpha_j + \varepsilon_{ij}$$
where $\mu$ is the overall mean, $\alpha_j$ is the sample or treatment effect for the jth
population and $\varepsilon_{ij}$ is the random error, usually considered a normally and
independently distributed variable with zero mean and common variance $\sigma^2$.
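The additive model in assumption (iv) can be illustrated numerically. The sketch below uses the data of Example 1 and the usual sample estimates, which are an assumption of this illustration and are not derived in the notes: the grand mean estimates the overall mean, each sample mean minus the grand mean estimates the treatment effect, and each observation minus its sample mean estimates the random error.

```python
samples = [[40, 50, 60, 65], [70, 65, 66, 50], [45, 38, 60, 42]]  # Example 1 data

n = sum(len(s) for s in samples)
overall_mean = sum(sum(s) for s in samples) / n        # estimate of the overall mean

effects = []
for j, sample in enumerate(samples, start=1):
    sample_mean = sum(sample) / len(sample)
    effect = sample_mean - overall_mean                 # estimated treatment effect
    residuals = [x - sample_mean for x in sample]       # estimated random errors
    # Each observation is recovered as overall mean + effect + residual.
    rebuilt = [overall_mean + effect + e for e in residuals]
    assert all(abs(a - b) < 1e-9 for a, b in zip(rebuilt, sample))
    effects.append(effect)
    print(f"sample {j}: effect {effect:+.2f}, residuals {residuals}")
```

With equal sample sizes, as here, the estimated treatment effects sum to zero.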
…………………………………………………………………………………………………..
TWO WAY ANALYSIS OF VARIANCE (TWO WAY ANOVA)
When each observation is classified according to two criteria (or variables) of
classification simultaneously, we use the two way Analysis of Variance technique. The
classified data are recorded in a table, in which the columns represent one criterion (or
variable) of the classification and the rows represent the other criterion. If there are c columns
and r rows in the table, then there will be altogether rc cells. Each cell may contain a single
observation or several observations.
There are two basic forms of two way analysis of variance, depending upon whether
the two variables of classification are independent or whether they interact.
TWO WAY ANALYSIS OF VARIANCE (Without Interaction)
Let $X_{ij}$ denote an observation in the ith row and jth column in a table consisting of r
rows and c columns and containing sample data from normal populations with means $\mu_{ij}$ and
the common variance $\sigma^2$, classified according to two criteria of classification simultaneously.
                         Columns
Rows    1      2      ...    j      ...    c      Total    Means
1       X11    X12    ...    X1j    ...    X1c    T1.      $\bar{X}_{1.}$
...     ...    ...    ...    ...    ...    ...    ...      ...
i       Xi1    Xi2    ...    Xij    ...    Xic    Ti.      $\bar{X}_{i.}$
...     ...    ...    ...    ...    ...    ...    ...      ...
r       Xr1    Xr2    ...    Xrj    ...    Xrc    Tr.      $\bar{X}_{r.}$
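$$F_1 = \frac{s_r^2}{s_e^2}, \qquad F_2 = \frac{s_c^2}{s_e^2}$$
where $s_r^2$ is the Rows Mean Square, $s_c^2$ is the Columns Mean Square and $s_e^2$ is the
Error Mean Square.
$$\text{Between Rows S.S.} = \frac{\sum_{i=1}^{r} T_{i.}^2}{c} - \text{C.F.}$$
$$\text{Between Columns S.S.} = \frac{\sum_{j=1}^{c} T_{.j}^2}{r} - \text{C.F.}$$
$$\text{Error S.S.} = \text{Total S.S.} - \text{Between Rows S.S.} - \text{Between Columns S.S.}$$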
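ANOVA table
S.O.V.             d.f.              S.S.    M.S.                             F-ratio
Between Rows       r - 1             SSR     $s_r^2 = SSR/(r-1)$              $F_1 = s_r^2/s_e^2$
Between Columns    c - 1             SSC     $s_c^2 = SSC/(c-1)$              $F_2 = s_c^2/s_e^2$
Error              (r - 1)(c - 1)    SSE     $s_e^2 = SSE/[(r-1)(c-1)]$       ......
Total              n - 1             SST     -------                          ......
where
$$SSR = c\sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X}_{..})^2, \qquad SSC = r\sum_{j=1}^{c} (\bar{X}_{.j} - \bar{X}_{..})^2, \qquad SST = \sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{..})^2$$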
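Reject the hypothesis of equal row means if $F_1 \ge F_{\alpha}(\nu_1, \nu_2)$ with
$\nu_1 = r - 1$ and $\nu_2 = (r-1)(c-1)$,
and reject the hypothesis of equal column means if $F_2 \ge F_{\alpha}(\nu_1, \nu_2)$ with
$\nu_1 = c - 1$ and $\nu_2 = (r-1)(c-1)$.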
(vi) Conclusion:
…………………………………………………………………………………………………..
(iii) Test Statistic:
$$F_1 = \frac{s_r^2}{s_e^2}, \qquad F_2 = \frac{s_c^2}{s_e^2}$$
(iv) Calculation:
                      Consignments
Observers    1     2     3     4     5     6     Ti.    Ti.^2
1            9     10    9     10    11    11    60     3600
2            12    11    9     11    10    10    63     3969
3            11    10    10    12    11    10    64     4096
4            12    13    11    14    12    10    72     5184
$$\text{C.F.} = \frac{T_{..}^2}{n} = \frac{(259)^2}{24} = 2795.04$$
$$\text{Total S.S.} = \sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij}^2 - \text{C.F.}$$
$$\text{Error S.S.} = 13.12$$
ANOVA table
S.O.V.    d.f.    S.S.    M.S.    F-ratio
(vi) Conclusion:
Since the computed value of $F_1 = 5.03$ falls in the rejection region, the null
hypothesis $H_0$ is rejected. It is therefore concluded that at least one pair of observers'
means differ significantly.
Since the computed value of $F_2 = 2.23$ does not fall in the rejection region,
the null hypothesis $H_0$ cannot be rejected. It is therefore concluded that there is no
significant difference between the means of consignments.
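The figures quoted in this example can be reproduced from the observers-by-consignments table with the correction-factor formulas for the two-way layout. The sketch below is illustrative (the function name `two_way_anova` and its return keys are not from the notes). Carried at full precision it gives F-ratios of about 5.00 and 2.22, which differ slightly from the quoted 5.03 and 2.23, presumably because the hand calculation rounds intermediate values.

```python
def two_way_anova(table):
    """Two-way ANOVA without interaction, one observation per cell.

    `table` is a list of r rows, each a list of c observations.
    Uses the correction-factor formulas given in the text.
    """
    r, c = len(table), len(table[0])
    n = r * c
    row_totals = [sum(row) for row in table]           # T_i.
    col_totals = [sum(col) for col in zip(*table)]     # T_.j
    cf = sum(row_totals) ** 2 / n                      # correction factor
    total_ss = sum(x * x for row in table for x in row) - cf
    rows_ss = sum(t * t for t in row_totals) / c - cf
    cols_ss = sum(t * t for t in col_totals) / r - cf
    error_ss = total_ss - rows_ss - cols_ss
    ms_error = error_ss / ((r - 1) * (c - 1))          # s_e^2
    f1 = (rows_ss / (r - 1)) / ms_error                # F-ratio for rows
    f2 = (cols_ss / (c - 1)) / ms_error                # F-ratio for columns
    return {"rows_SS": rows_ss, "cols_SS": cols_ss, "error_SS": error_ss,
            "total_SS": total_ss, "F1": f1, "F2": f2}

# Rows are observers 1-4, columns are consignments 1-6
observers = [
    [9, 10, 9, 10, 11, 11],
    [12, 11, 9, 11, 10, 10],
    [11, 10, 10, 12, 11, 10],
    [12, 13, 11, 14, 12, 10],
]
anova = two_way_anova(observers)
for name, value in anova.items():
    print(f"{name}: {value:.3f}")
```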
…………………………………………………………………………………………………..
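$$\text{Total S.S.} = \sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{..})^2$$
$$\text{Total S.S.} = \sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{i.} + \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{.j} - \bar{X}_{..} + \bar{X}_{..} - \bar{X}_{..})^2$$
$$\text{Total S.S.} = \sum_{i=1}^{r}\sum_{j=1}^{c} (\bar{X}_{i.} - \bar{X}_{..})^2 + \sum_{i=1}^{r}\sum_{j=1}^{c} (\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$$
$$\text{Total S.S.} = c\sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X}_{..})^2 + r\sum_{j=1}^{c} (\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2$$
With d.f.
$$n - 1 = (r - 1) + (c - 1) + (r - 1)(c - 1)$$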
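The partition of the total sum of squares, and the matching partition of the degrees of freedom, can be verified numerically. The following sketch evaluates each component directly from its definition as a sum of squared deviations, using the observers-by-consignments data from the earlier example. It is a check on the identity, not a replacement for the shortcut formulas.

```python
table = [
    [9, 10, 9, 10, 11, 11],
    [12, 11, 9, 11, 10, 10],
    [11, 10, 10, 12, 11, 10],
    [12, 13, 11, 14, 12, 10],
]
r, c = len(table), len(table[0])
grand_mean = sum(map(sum, table)) / (r * c)
row_means = [sum(row) / c for row in table]
col_means = [sum(col) / r for col in zip(*table)]

total_ss = sum((x - grand_mean) ** 2 for row in table for x in row)
rows_ss = c * sum((m - grand_mean) ** 2 for m in row_means)
cols_ss = r * sum((m - grand_mean) ** 2 for m in col_means)
error_ss = sum(
    (table[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
    for i in range(r) for j in range(c)
)

# The three components add up to the total, up to floating-point error.
assert abs(total_ss - (rows_ss + cols_ss + error_ss)) < 1e-9
# The degrees of freedom partition in the same way.
assert r * c - 1 == (r - 1) + (c - 1) + (r - 1) * (c - 1)
print(f"{total_ss:.3f} = {rows_ss:.3f} + {cols_ss:.3f} + {error_ss:.3f}")
```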
………………………………………………………………………………………………
(iii) Test Statistic:
$$F_1 = \frac{s_r^2}{s_e^2}, \qquad F_2 = \frac{s_c^2}{s_e^2}$$
(iv) Calculation:
                     Treatments
Blocks    1       2       3       4       5       6       Ti.         Ti.^2
1         1       3       6       4       3       2       19          361
2         1       4       4       8       5       1       23          529
3         3       6       7       8       4       3       31          961
4         2       3       2       3       2       1       13          169
T.j       7       16      19      23      14      7       T.. = 86    $\sum_i T_{i.}^2 = 2020$
$\bar{X}_{.j} = T_{.j}/4$
          1.75    4.00    4.75    5.75    3.50    1.75
$$\text{C.F.} = \frac{T_{..}^2}{n} = \frac{(86)^2}{24} = 308.17$$
$$\text{Total S.S.} = \sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij}^2 - \text{C.F.}$$
ANOVA table
S.O.V.            d.f.                 S.S.     M.S.                       F-ratio
Between Blocks    r - 1 = 4 - 1 = 3    28.50    $s_r^2 = 28.50/3 = 9.50$   ......
(vi) Conclusion:
Since the computed value of $F_2 = 6.61$ falls in the rejection region, the null
hypothesis is rejected. It is therefore concluded that at least one pair of treatment means
differ significantly.
Because $H_0$ is rejected by the F-test, we apply the LSD test to find which
pairs of means differ significantly:
$$LSD = t_{(\alpha/2,\ \text{error d.f.})} \sqrt{\frac{2(MSE)}{r}}$$
$$LSD = t_{(0.025,\ 15)} \sqrt{\frac{2(1.57)}{4}}$$
$$LSD = (2.13) \sqrt{\frac{2(1.57)}{4}} = 1.89$$
$\bar{x}_1$    $\bar{x}_6$    $\bar{x}_5$    $\bar{x}_2$    $\bar{x}_3$    $\bar{x}_4$
1.75    1.75    3.50    4.00    4.75    5.75
Alternative Method
Arranging the treatment means in ascending order of magnitude and drawing a line
under the pair of adjacent means that are not significantly different, we have;
$\bar{x}_1$    $\bar{x}_6$    $\bar{x}_5$    $\bar{x}_2$    $\bar{x}_3$    $\bar{x}_4$
1.75    1.75    3.50    4.00    4.75    5.75
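The pairwise LSD comparisons can be enumerated in a few lines. The sketch below uses the quantities quoted in the text (MSE = 1.57, r = 4, and the tabulated t value 2.13) together with the six treatment means. It assumes the usual decision rule that two means differ significantly when their absolute difference exceeds the LSD.

```python
from itertools import combinations
from math import sqrt

mse, r = 1.57, 4       # error mean square and observations per treatment mean (from the text)
t_crit = 2.13          # t(0.025, 15) as quoted in the text
lsd = t_crit * sqrt(2 * mse / r)

# Treatment means, keyed by treatment number
means = {1: 1.75, 2: 4.00, 3: 4.75, 4: 5.75, 5: 3.50, 6: 1.75}

# Assumed rule: a pair differs significantly when |difference| > LSD.
significant = [
    (a, b) for a, b in combinations(sorted(means), 2)
    if abs(means[a] - means[b]) > lsd
]
print(f"LSD = {lsd:.2f}")
print("significantly different pairs:", significant)
```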
…………………………………………………………………………………………………..
Q# 20.31 (b) D.Y.S.
…………………………………………………………………………………………………..
(ii) THE DUNCAN’S MULTIPLE RANGE TEST
$$R_p = q_{\alpha}(p,\ \text{error d.f.}) \sqrt{\frac{MSE}{r}}\ ; \qquad p = 2, 3, \ldots, k-1, k$$
…………………………………………………………………………………………………..
Example# 20.8 (page# 328)
(i) $H_0$: All treatment means are equal
$H_1$: At least one pair of treatment means differ
(ii) Level of significance:
$\alpha = 0.05$
(iii) Test Statistic:
$$F_1 = \frac{s_r^2}{s_e^2}, \qquad F_2 = \frac{s_c^2}{s_e^2}$$
(iv) Calculation:
                     Treatments
Blocks    1       2       3       4       5       6       Ti.         Ti.^2
1         1       3       6       4       3       2       19          361
2         1       4       4       8       5       1       23          529
3         3       6       7       8       4       3       31          961
4         2       3       2       3       2       1       13          169
T.j       7       16      19      23      14      7       T.. = 86    $\sum_i T_{i.}^2 = 2020$
$\bar{X}_{.j} = T_{.j}/4$
          1.75    4.00    4.75    5.75    3.50    1.75
$$\text{C.F.} = \frac{T_{..}^2}{n} = \frac{(86)^2}{24} = 308.17$$
$$\text{Total S.S.} = \sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij}^2 - \text{C.F.}$$
ANOVA table
S.O.V.            d.f.                 S.S.     M.S.                       F-ratio
Between Blocks    r - 1 = 4 - 1 = 3    28.50    $s_r^2 = 28.50/3 = 9.50$
(vi) Conclusion:
Since the computed value of $F_2 = 6.61$ falls in the rejection region, the null
hypothesis is rejected. It is therefore concluded that at least one pair of treatment means
differ significantly.
Because $H_0$ is rejected by the F-test, we apply Duncan's Multiple Range
test to find which pairs of means differ significantly:
$$R_p = q_{\alpha}(p,\ \text{error d.f.}) \sqrt{\frac{MSE}{r}}\ ; \qquad p = 2, 3, 4, 5, 6$$
$$R_p = q_{0.05}(p,\ 15) \sqrt{\frac{1.57}{4}}$$
$$R_p = q_{0.05}(p,\ 15)\,(0.6265)$$
Arranging the treatment means in ascending order of magnitude, we get
$\bar{x}_1$    $\bar{x}_6$    $\bar{x}_5$    $\bar{x}_2$    $\bar{x}_3$    $\bar{x}_4$
1.75    1.75    3.50    4.00    4.75    5.75
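The least significant ranges $R_p$ follow once the tabulated q values are known. The sketch below takes them as an input. The values used in the demonstration are an assumption: they are quoted from memory of a standard table of Duncan's significant ranges for $\alpha = 0.05$ and 15 error d.f., do not appear in these notes, and should be checked against a printed table. The function name `duncan_ranges` is illustrative.

```python
from math import sqrt

def duncan_ranges(q_values, mse, r):
    """Least significant ranges R_p = q(p, error d.f.) * sqrt(MSE / r).

    `q_values` maps p (the number of ordered means spanned) to the tabulated q value.
    """
    se = sqrt(mse / r)                       # standard error of a treatment mean
    return {p: q * se for p, q in q_values.items()}

# ASSUMPTION: q_0.05(p, 15) values recalled from a standard Duncan table,
# not taken from the text. Verify them before relying on the result.
q_05_15 = {2: 3.01, 3: 3.16, 4: 3.25, 5: 3.31, 6: 3.36}

ranges = duncan_ranges(q_05_15, mse=1.57, r=4)   # MSE and r as quoted in the text
for p, rp in ranges.items():
    print(f"R_{p} = {rp:.2f}")
```

The difference between the largest and smallest of any p adjacent ordered means is then compared with $R_p$; the pair differs significantly when the difference exceeds $R_p$.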