ONE-WAY ANALYSIS OF VARIANCE
Structure
6.1 Introduction
Objectives
6.2 One-way Analysis of Variance Model
6.3 Basic Assumptions of One-way Analysis of Variance
6.4 Estimation of Parameters
6.5 Test of Hypothesis
6.6 Degrees of Freedom of Various Sum of Squares
6.7 Expectations of Various Sum of Squares
Expectation of Treatment Sum of Squares
Expectation of Sum of Squares due to Error
6.8 ANOVA Table for One-way Classification
6.9 Summary
6.10 Solutions/Answers
6.1 INTRODUCTION
The analysis of variance is one of the most powerful techniques of statistical analysis. It is used for testing the equality of the means of several populations, by comparing the variability among and within those populations. In the previous unit, we discussed the fundamental terms used in the analysis of variance, along with its basic assumptions and models.
As stated there, the analysis of variance technique can be divided into two categories: (i) parametric ANOVA and (ii) non-parametric ANOVA. The parametric analysis is termed ANOVA when only one response variable is considered and MANOVA when two or more response variables are considered.
In this unit, we shall discuss the one-way analysis of variance, a technique in which a single independent variable (factor), observed at different levels, affects the response variable.
The one-way analysis of variance model is discussed in Section 6.2. The basic assumptions underlying one-way analysis of variance are described in Section 6.3, and the estimates of the mean of each level of a factor are derived in Section 6.4. The method of testing the hypothesis is explained in Section 6.5, and the degrees of freedom of the various sums of squares are determined in Section 6.6. The expectations of the various sums of squares are derived in Section 6.7, and the analysis of variance table for one-way classification is described in Section 6.8.
Objectives
After studying this unit, you would be able to:
- describe the one-way analysis of variance model;
- describe the basic assumptions under one-way analysis of variance;
- estimate the mean of each level of a factor;
- determine the degrees of freedom of various sums of squares;
- derive the expectations of various sums of squares;
- construct the ANOVA table;
- test the hypothesis under one-way analysis of variance;
- identify differences in the population means of the k levels of a factor; and
- determine which population means are different using multiple comparison methods.
6.2 ONE-WAY ANALYSIS OF VARIANCE MODEL
The linear mathematical model for one-way classified data can be written as
yij = µi + eij ;  i = 1, 2, …, k;  j = 1, 2, …, ni
The total number of observations is
n1 + n2 + … + nk = Σi ni = N
where
αi = µi − µ
and
µ = (1/N) Σi ni µi
This model decomposes the response into an overall (grand) mean µ, the effect αi of the ith factor level, and an error term eij. The analysis of variance provides estimates of the grand mean µ and of each factor-level effect αi. In terms of these, the model for the responses is
yij = µ + αi + eij … (1)
so that the residuals are
eij = yij − µ − αi
Here αi is the effect of the ith level of the factor, given by
αi = µi − µ ;  i = 1, 2, …, k. … (1a)
Note that the effects sum to zero when weighted by the level sizes:
Σi ni αi = Σi ni (µi − µ) = Σi ni µi − µ Σi ni = Nµ − Nµ = 0
6.4 ESTIMATION OF PARAMETERS
The parameters µ and α1, α2, …, αk are estimated by the principle of least squares, i.e. by minimising the error (residual) sum of squares. The residuals are
eij = yij − µ − αi ;  i = 1, 2, …, k;  j = 1, 2, …, ni
so each squared residual is eij² = (yij − µ − αi)².
E = Σi Σj eij² = Σi Σj (yij − µ − αi)²
Differentiating E with respect to µ and equating to zero,
Σi Σj yij − Nµ − Σi ni αi = 0,  because Σi ni = N
or Σi Σj yij − Nµ = 0,  because Σi ni αi = 0
or Nµ = Σi Σj yij
Therefore, µ̂ = (1/N) Σi Σj yij = ȳ..
Similarly, differentiating E with respect to α1 and equating to zero,
−2 Σj (y1j − µ − α1) = 0 ;  j = 1, 2, …, n1
or Σj y1j − n1µ − n1α1 = 0
or n1α1 = Σj y1j − n1µ
or α̂1 = (1/n1) Σj y1j − µ̂
Therefore, α̂1 = ȳ1. − ȳ..
Similarly,
α̂2 = ȳ2. − ȳ..
or, in general, α̂i = ȳi. − ȳ.. ;  i = 1, 2, …, k.
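These estimators are simply group and grand means, so they are easy to verify numerically. The following sketch uses made-up data for k = 3 levels (the values are illustrative, not from this unit):

```python
import numpy as np

# Observations for k = 3 hypothetical factor levels (unequal sizes n_i)
groups = [np.array([12.0, 14.0, 11.0]),
          np.array([15.0, 17.0, 16.0, 18.0]),
          np.array([10.0, 9.0, 11.0])]

y_all = np.concatenate(groups)
mu_hat = y_all.mean()                            # grand mean: mu^ = y..
alpha_hat = [g.mean() - mu_hat for g in groups]  # effects: alpha_i^ = yi. - y..

# The fitted effects satisfy the constraint sum_i n_i * alpha_i = 0
constraint = sum(len(g) * a for g, a in zip(groups, alpha_hat))
print(mu_hat, alpha_hat, round(constraint, 10))
```

Note that the fitted effects automatically satisfy the side condition Σi ni α̂i = 0 used in the derivation above.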
6.5 TEST OF HYPOTHESIS
Small differences between sample means are usually present. The objective is to determine whether these differences are significant: in other words, are the differences more than what might be expected to occur by chance? If the differences are more than what might be expected to occur by chance, we have sufficient evidence to conclude that the population means of the different levels of the factor are not all equal.
The hypotheses are:
H0 : The population means of the k levels of the factor are equal.
H1 : At least one population mean of a level of the factor is different from the population means of the other levels of the factor.
or H0 : µ1 = µ2 = … = µk
H1 : µi ≠ µj for at least one pair (i, j)
Now, substituting the estimated values in the model given in equation (1), the model becomes
yij = ȳ.. + (ȳi. − ȳ..) + eij
and then eij = (yij − ȳ..) − (ȳi. − ȳ..) = yij − ȳi.
so that
yij = ȳ.. + (ȳi. − ȳ..) + (yij − ȳi.)
Transposing ȳ.. to the left, squaring both sides and taking the sum over i and j,
Σi Σj (yij − ȳ..)² = Σi Σj [(ȳi. − ȳ..) + (yij − ȳi.)]²
= Σi Σj (ȳi. − ȳ..)² + Σi Σj (yij − ȳi.)² + 2 Σi Σj (ȳi. − ȳ..)(yij − ȳi.)
= Σi Σj (ȳi. − ȳ..)² + Σi Σj (yij − ȳi.)² + 2 Σi (ȳi. − ȳ..) Σj (yij − ȳi.)
But Σj (yij − ȳi.) = 0, since the sum of the deviations of the observations from their own mean is zero. Therefore,
Σi Σj (yij − ȳ..)² = Σi ni (ȳi. − ȳ..)² + Σi Σj (yij − ȳi.)²
where Σi Σj (yij − ȳ..)² is known as the total sum of squares (TSS), Σi ni (ȳi. − ȳ..)² is called the between sum of squares, treatment sum of squares or sum of squares due to the different levels of the factor (SST), and Σi Σj (yij − ȳi.)² is called the within sum of squares or sum of squares due to error (SSE).
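This identity, TSS = SST + SSE, can be checked numerically. A minimal sketch with made-up observations (illustrative only):

```python
import numpy as np

# Hypothetical one-way data: k = 3 levels with unequal sizes n_i
groups = [np.array([5.0, 7.0, 6.0]),
          np.array([8.0, 9.0, 7.0, 8.0]),
          np.array([4.0, 5.0, 6.0])]

y_all = np.concatenate(groups)
grand = y_all.mean()

tss = ((y_all - grand) ** 2).sum()                           # total SS
sst = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # between-levels SS
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)       # within-levels SS

print(tss, sst + sse)  # the two agree: TSS = SST + SSE
```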
6.6 DEGREES OF FREEDOM OF VARIOUS SUM OF SQUARES
The Total Sum of Squares (TSS), which is computed from the N quantities of the form (yij − ȳ..), carries (N − 1) degrees of freedom. One degree of freedom is lost because of the linear constraint
Σi Σj (yij − ȳ..) = 0
Similarly, the Treatment Sum of Squares, SST = Σi ni (ȳi. − ȳ..)², computed from the k quantities (ȳi. − ȳ..), will have (k − 1) degrees of freedom, one being lost because of the constraint Σi ni (ȳi. − ȳ..) = 0.
The Sum of Squares due to Error,
SSE = Σi Σj (yij − ȳi.)²
carries (N − k) degrees of freedom, since the k constraints
Σj (yij − ȳi.) = 0 for i = 1, 2, …, k
are imposed.
6.7 EXPECTATIONS OF VARIOUS SUM OF SQUARES
Expectation of Treatment Sum of Squares
From the model, the mean of the ith level is
ȳi. = (1/ni) Σj yij = µ + αi + ēi. … (2)
where ēi. = (1/ni) Σj eij. Similarly, the grand mean is
ȳ.. = (1/N) Σi Σj yij = µ + (1/N) Σi ni αi + ē.. = µ + ē..,  because Σi ni αi = 0 … (3)
Substituting the values of ȳi. and ȳ.. from equations (2) & (3), we get
E(SST) = E[Σi ni (ȳi. − ȳ..)²]
= E[Σi ni (αi + ēi. − ē..)²]
= E[Σi ni {αi² + (ēi. − ē..)² + 2 αi (ēi. − ē..)}]
= Σi ni αi² + E[Σi ni (ēi. − ē..)²] + 2 Σi ni αi E(ēi. − ē..)
= Σi ni αi² + E[Σi ni (ēi.² + ē..² − 2 ēi. ē..)],  because E(ēi. − ē..) = 0
= Σi ni αi² + Σi ni E(ēi.²) + N E(ē..²) − 2 E(ē.. Σi ni ēi.)
= Σi ni αi² + Σi ni E(ēi.²) + N E(ē..²) − 2 N E(ē..²),  because ē.. = (1/N) Σi ni ēi.
= Σi ni αi² + Σi ni E(ēi.²) − N E(ē..²) … (5)
Since eij ~ iid N(0, σe²), E(eij) = 0 and
V(eij) = E(eij²) − (E(eij))²
so σe² = E(eij²) … (6)
Also E(ēi.) = 0 and E(ē..) = 0, so
V(ēi.) = E(ēi.²) − (E(ēi.))² = σe²/ni,  or E(ēi.²) = σe²/ni … (7)
V(ē..) = E(ē..²) − (E(ē..))² = σe²/N,  or E(ē..²) = σe²/N … (8)
Substituting (7) and (8) in (5),
E(SST) = Σi ni αi² + Σi ni (σe²/ni) − N (σe²/N) = Σi ni αi² + (k − 1) σe²
so that E(MSST) = E[SST/(k − 1)] = σe² + (1/(k − 1)) Σi ni αi².
Expectation of Sum of Squares due to Error
Proceeding in the same way for the error sum of squares,
E(SSE) = E[Σi Σj (yij − ȳi.)²] = E[Σi Σj (eij − ēi.)²]
= Σi Σj E(eij²) − Σi ni E(ēi.²)
= Σi Σj σe² − Σi ni (σe²/ni)
= N σe² − k σe²
= (N − k) σe²
Therefore, E[SSE/(N − k)] = σe²
or E(MSSE) = σe²
Therefore, the error mean square always gives an unbiased estimate of σe².
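The unbiasedness of MSSE can be illustrated by simulation under the model assumptions. A sketch, with an arbitrary choice of k, ni, level means and σe² (all assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma2 = 4.0                 # true error variance sigma_e^2 (assumed)
sizes = [5, 7, 6]            # n_i for k = 3 levels (assumed)
N, k = sum(sizes), len(sizes)

msse_values = []
for _ in range(20000):
    # Generate data under the model with arbitrary level means
    groups = [mu + rng.normal(0.0, np.sqrt(sigma2), n)
              for mu, n in zip([10.0, 12.0, 9.0], sizes)]
    sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
    msse_values.append(sse / (N - k))

print(np.mean(msse_values))  # close to sigma2 = 4.0
```

Averaged over many simulated data sets, MSSE settles near the true σe², whatever the level means are.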
Under H0, E (MSST) = E (MSSE)
Otherwise, E (MSST) > E (MSSE)
Hence, the test statistic for testing H0 is provided by the variance ratio (Snedecor's F):
F = MSST/MSSE with [(k − 1), (N − k)] df.
Thus, if the observed value of F is greater than the tabulated value of F for [(k − 1), (N − k)] df at the specified level of significance (usually 5% or 1%), then H0 is rejected; otherwise, it may be accepted.
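The complete test can be carried out in a few lines. A sketch on illustrative data, using scipy's f_oneway routine as a cross-check on the hand computation:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from k = 3 levels of a factor
groups = [np.array([23.0, 25.0, 21.0, 24.0]),
          np.array([30.0, 28.0, 31.0, 29.0]),
          np.array([22.0, 20.0, 23.0, 21.0])]

y_all = np.concatenate(groups)
N, k = len(y_all), len(groups)
grand = y_all.mean()

sst = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
msst, msse = sst / (k - 1), sse / (N - k)
F = msst / msse

# Compare with the library routine and the 5% critical value
F_lib, p = stats.f_oneway(*groups)
F_crit = stats.f.ppf(0.95, k - 1, N - k)
print(F, F_lib, F_crit)  # reject H0 when F > F_crit
```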
6.8 ANOVA TABLE FOR ONE-WAY CLASSIFICATION
Source of Variation       df       SS                          MSS                 F
Between levels            k − 1    SST = Σi ni (ȳi. − ȳ..)²    MSST = SST/(k−1)    MSST/MSSE
(Treatments)
Within levels (Error)     N − k    SSE = Σi Σj (yij − ȳi.)²    MSSE = SSE/(N−k)
Total                     N − 1    TSS = Σi Σj (yij − ȳ..)²
If the treatments show a significant effect at the α level of significance, i.e. the null hypothesis is rejected, then we are interested in finding out which pair or pairs of treatments differ significantly, say by testing H01 : µi = µj against H11 : µi ≠ µj.
This is done with the help of the statistic
t = (ȳi. − ȳj.) / √[MSSE (1/ni + 1/nj)]  with (N − k) df
which, when every level has the same number n of observations so that N − k = k(n − 1), reduces to
t = (ȳi. − ȳj.) / √(2 MSSE/n)  with k(n − 1) df
If |t| > tα/2 [k(n − 1)], then H01 is rejected at the level α. That is to say, H01 is rejected at the level of significance α if
|ȳi. − ȳj.| > tα/2 [k(n − 1)] × √(2 MSSE/n)
Thus, to compare the factor level effects/group means two at a time, we have to calculate
CD = tα/2 [k(n − 1)] × √(2 MSSE/n)
which is called the critical difference (CD) or the least significant difference (lsd). If the difference between the observed class/group/factor level means, i.e. |ȳi. − ȳj.|, is greater than the CD, then H01 : µi = µj is rejected at the α level of significance; otherwise it is accepted. Here, tα/2 [k(n − 1)] is the tabulated value of the t-distribution with k(n − 1) df at the upper α/2 point.
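The critical difference is straightforward to compute once MSSE is known. A sketch with assumed values of k, n and MSSE (not taken from any example in this unit):

```python
from scipy import stats

k, n = 3, 5            # assumed: 3 levels with n = 5 observations each
msse = 5.0             # assumed error mean square from the ANOVA
alpha = 0.05

df = k * (n - 1)                          # error degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df)   # upper alpha/2 point of t(df)
cd = t_crit * (2 * msse / n) ** 0.5       # critical difference (lsd)

# Any pair of level means differing by more than cd is declared significant
print(round(cd, 3))
```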
School I (S1):    8 6 7 5 9
School II (S2):   6 4 6 5 6 7
School III (S3):  6 5 5 6 7 8 5
School IV (S4):   5 6 6 7 6 7
Solution: Let µ1, µ2, µ3, µ4 denote the average scores of the 8th class students of schools I, II, III and IV respectively. Then
Null hypothesis H0 : µ1 = µ2 = µ3 = µ4
Alternative hypothesis H1 : the differences among µ1, µ2, µ3, µ4 are significant.
S.No.    S1   S2   S3   S4    S1²   S2²   S3²   S4²
1         8    6    6    5     64    36    36    25
2         6    4    5    6     36    16    25    36
3         7    6    5    6     49    36    25    36
4         5    5    6    7     25    25    36    49
5         9    6    7    6     81    36    49    36
6              7    8    7           49    64    49
7                   5                      25
Total    35   34   42   37    255   198   260   231
Grand Total G = 35 + 34 + 42 + 37 = 148 and N = 24, so
Correction Factor (CF) = G²/N = (148)²/24 = 912.6667
Raw Sum of Squares (RSS) = Σi Σj yij² = 255 + 198 + 260 + 231 = 944
Sources of Variation    DF    SS    MSS    F
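The remaining entries of this ANOVA table follow directly from the data. As a check on the figures above (CF = 912.6667, RSS = 944), the F ratio can be computed with scipy:

```python
import numpy as np
from scipy import stats

# Scores of the 8th class students in the four schools (table above)
s1 = np.array([8, 6, 7, 5, 9], dtype=float)
s2 = np.array([6, 4, 6, 5, 6, 7], dtype=float)
s3 = np.array([6, 5, 5, 6, 7, 8, 5], dtype=float)
s4 = np.array([5, 6, 6, 7, 6, 7], dtype=float)

y_all = np.concatenate([s1, s2, s3, s4])
G, N = y_all.sum(), len(y_all)
cf = G ** 2 / N               # correction factor = 912.6667
rss = (y_all ** 2).sum()      # raw sum of squares = 944

F, p = stats.f_oneway(s1, s2, s3, s4)
print(round(cf, 4), rss, F, p)
```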
Fertilizer    Yields (in tonnes) from the 10 plots allocated to that fertilizer
1             6.27  5.36  6.39  4.85  5.99  7.14  5.08  4.07  4.35  4.95
2             3.07  3.29  4.04  4.19  0.41  0.75  4.87  3.94  6.49  3.15
3             4.04  3.79  4.56  4.55  4.53  3.53  3.71  7.00  4.61  4.55
Solution:
H0 : Mean effect of the 1st fertilizer = Mean effect of the 2nd fertilizer = Mean effect of the 3rd fertilizer, i.e.
H0 : µ1 = µ2 = µ3
H1 : At least one pair of fertilizer means differs.
Steps for calculating the different sums of squares:
Grand Total = total of all observations = Σ yij = G = 139.20
Correction Factor (CF) = G²/N = 139.20 × 139.20/30 = 645.89
Raw Sum of Squares (RSS) = Σ yij² = 682.3349
Total Sum of Squares (TSS) = RSS − CF = 36.4449
Sum of Squares due to Treatments/Fertilizers (SST)
= y1.²/10 + y2.²/10 + y3.²/10 − CF
= (54.5)²/10 + (40)²/10 + (44.9)²/10 − CF
= 10.8227
Sum of Squares due to Error (SSE) = TSS − SST = 36.4449 − 10.8227 = 25.6222
Mean Sum of Squares due to Treatments/Fertilizers (MSST) = SST/df = 10.8227/2 = 5.4114
Mean Sum of Squares due to Error (MSSE) = SSE/df = 25.6222/27 = 0.9490
Variance ratio F2,27 = MSST/MSSE = 5.4114/0.9490 = 5.70
Tabulated F2,27 = 3.35
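The tabulated value can be read from the F distribution itself rather than a printed table. A quick check of the decision:

```python
from scipy import stats

F_calc = 5.70                      # variance ratio computed above
F_tab = stats.f.ppf(0.95, 2, 27)   # upper 5% point of F with (2, 27) df

print(round(F_tab, 2))  # about 3.35, as used in the text
print(F_calc > F_tab)   # True: reject H0
```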
Since the calculated value of F2,27 is greater than the tabulated value of F2,27 at the 5% level of significance, we reject H0. This means there is a significant difference among the effects of the three fertilizers.
Since H0 is rejected, we now want to find out which of the fertilizers differ from one another, so a pairwise comparison test is applied.
To test H0 : µi = µj
against H1 : µi ≠ µj
We reject H0 : µi = µj at level α if
|ȳi. − ȳj.| > tα/2 (N − k) × √(2 MSSE/10)
For the first pair of fertilizers,
|ȳ1. − ȳ2.| = 5.45 − 4.00 = 1.45 … (9)
ν1 2000    ν3 2270    ν2 2230    ν4 2270    ν4 2180
ν2 2160    ν1 2100    ν2 2050    ν3 2300    ν2 2280
ν1 2200    ν1 2300    ν4 2040    ν3 2420    ν1 2240
ν4 2370    ν1 2250    ν2 2040    ν2 2360    ν1 2460
ν3 2210    ν1 2340    ν2 2190    ν1 2150    ν3 2020
Perform the ANOVA to test whether there is any significant difference between the varieties of wheat.
6.9 SUMMARY
In this unit, we have discussed:
1. The one-way analysis of variance model;
2. The basic assumptions in one-way analysis of variance;
3. Estimation of parameters of one-way analysis of variance model;
4. Test of hypothesis for one-way classified ANOVA;
5. How to obtain the expectation of various sum of squares in one-way
ANOVA; and
6. The construction of one-way ANOVA table.
6.10 SOLUTIONS/ANSWERS
E1) We have
G = 8+10+7+14+11+7+5+10+9+9+12+9+13+12+14 = 150
N = total number of observations = 15
Correction Factor (CF) = G²/N = (150 × 150)/15 = 1500
Raw Sum of Squares (RSS) = Σ yij²
The critical difference at the 5% level is
CD = tα/2 [df] × √(2 MSSE/5) = 2.571 × 1.41 = 3.625
The differences between the treatment means are
|ȳT1 − ȳT2| = |10 − 8| = 2
|ȳT1 − ȳT3| = |10 − 12| = 2
|ȳT2 − ȳT3| = |8 − 12| = 4
Only |ȳT2 − ȳT3| = 4 exceeds CD = 3.625, so treatments T2 and T3 differ significantly.
E2) We have
G = Σ yij = grand total = 14+16+18+14+13+15+22+18+16+19+19+20 = 204
N = total number of observations = 12
Correction Factor (CF) = G²/N = (204 × 204)/12 = 3468
Raw Sum of Squares (RSS) = Σ yij²
Error
Total    14    84
E3) We have
ν1      ν2      ν3      ν4
2000    2160    2210    2370
2200    2230    2270    2040
2100    2050    2300    2270
2300    2040    2420    2180
2250    2190    2020
2340    2360
2150    2280
2240
2460
To simplify the calculations, we subtract a suitable number, 2200 (say), from all the observations and then divide by 10.
Then
CF = G²/N = (43)²/25 = 73.96
RSS = 1538 + 827 + 958 + 598 = 3921
TSS = RSS − CF = 3921 − 73.96 = 3847.04
SST = (24)²/9 + (9)²/7 + (22)²/5 + (6)²/4 − 73.96
= 64 + 11.5714 + 96.8 + 9 − 73.96
= 107.4114
SSE = TSS − SST
= 3847.04 − 107.4114
= 3739.6286
MSST = SST/df = 107.4114/3 = 35.8038
MSSE = SSE/df = 3739.6286/21 = 178.0776
F = MSST/MSSE = 35.8038/178.0776 = 0.2011
ANOVA Table
Source of variation    SS           df    MSS         F
Between varieties      107.4114      3    35.8038     0.2011
Within (Error)         3739.6286    21    178.0776
Total                  3847.04      24
Calculated F = 0.2011
The tabulated value of F at the 5% level of significance with (3, 21) degrees of freedom is 3.07.
Conclusion: Since calculated F < tabulated F, we may accept H0 and conclude that the varieties ν1, ν2, ν3, ν4 of wheat are homogeneous.
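Since the F ratio is unchanged by subtracting a constant from all observations and dividing by a common scale, the coded computation above can be cross-checked against the raw yields:

```python
import numpy as np
from scipy import stats

# Raw observations for the four wheat varieties (table above)
v1 = [2000, 2200, 2100, 2300, 2250, 2340, 2150, 2240, 2460]
v2 = [2160, 2230, 2050, 2040, 2190, 2360, 2280]
v3 = [2210, 2270, 2300, 2420, 2020]
v4 = [2370, 2040, 2270, 2180]

# F on the raw data equals F on the coded data u = (y - 2200)/10
F_raw, p = stats.f_oneway(v1, v2, v3, v4)
coded = [np.array(g, dtype=float) for g in (v1, v2, v3, v4)]
F_coded, _ = stats.f_oneway(*[(g - 2200.0) / 10.0 for g in coded])

print(round(F_raw, 4), round(F_coded, 4))  # both about 0.2011
```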