Lecture 4: Randomised Complete Block Designs and Latin Squares
EXAMPLE 1.
Suppose that we want to compare the yields of three varieties of tomatoes. Two
farms are available for the experiment and on each farm there are three plots
available to us. We grow each of the varieties on one plot on each farm.
Suppose that we get the following results.
Variety     A    B    C
Farm 1     14   22   23
Farm 2     40   45   46
If we ignore the farms we get the following ANOVA table:
The GLM Procedure
Dependent Variable: yield

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2       66.3333333    33.1666667      0.11   0.8953
Error              3      867.0000000   289.0000000
Corrected Total    5      933.3333333

R-Square   Coeff Var   Root MSE   yield Mean
0.071071    53.68421   17.00000     31.66667

Source     DF     Type I SS   Mean Square   F Value   Pr > F
variety     2   66.33333333   33.16666667      0.11   0.8953

Source     DF   Type III SS   Mean Square   F Value   Pr > F
variety     2   66.33333333   33.16666667      0.11   0.8953
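and we would conclude that there is no evidence of a difference in yield between
the varieties. This output was produced using
libname lect '/courses/da9372e5ba27fe300/35356';
proc glm data=lect.tomato;
class variety;
model yield=variety;
run;
If we also include the farms in the model we get the following ANOVA table: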
The GLM Procedure
Dependent Variable: yield

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      930.3333333   310.1111111    206.74   0.0048
Error              2        3.0000000     1.5000000
Corrected Total    5      933.3333333

R-Square   Coeff Var   Root MSE   yield Mean
0.996786    3.867615   1.224745     31.66667

Source     DF     Type I SS   Mean Square   F Value   Pr > F
variety     2    66.3333333    33.1666667     22.11   0.0433
farm        1   864.0000000   864.0000000    576.00   0.0017

Source     DF   Type III SS   Mean Square   F Value   Pr > F
variety     2    66.3333333    33.1666667     22.11   0.0433
farm        1   864.0000000   864.0000000    576.00   0.0017
and we see that the farms were a large source of variation in this experiment. The
correct estimate of the unobserved variability is 1.5 rather than 289. This output
was produced by
proc glm data=lect.tomato;
class farm variety;
model yield=farm variety;
run;
This example illustrates the reason that we need to remove all the known sources of
variation before we draw any conclusions from an experiment.
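The arithmetic behind the two ANOVA tables can be reproduced by hand. The following is a minimal Python sketch, not the SAS code used in these notes; the function name rcbd_sums_of_squares and the variable names are hypothetical, and the data are the six yields from Example 1.

```python
# Yields from Example 1: rows are farms (blocks), columns are varieties A, B, C.
yields = [
    [14, 22, 23],  # Farm 1
    [40, 45, 46],  # Farm 2
]


def rcbd_sums_of_squares(table):
    """Return (treatment SS, block SS, error SS, total SS).

    Rows of `table` are blocks, columns are treatments, one value per cell.
    """
    b = len(table)       # number of blocks
    a = len(table[0])    # number of treatments
    grand_mean = sum(sum(row) for row in table) / (a * b)
    treatment_means = [sum(row[i] for row in table) / b for i in range(a)]
    block_means = [sum(row) / a for row in table]

    ss_total = sum((y - grand_mean) ** 2 for row in table for y in row)
    ss_treatment = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_block = a * sum((m - grand_mean) ** 2 for m in block_means)
    ss_error = ss_total - ss_treatment - ss_block
    return ss_treatment, ss_block, ss_error, ss_total


ss_variety, ss_farm, ss_error, ss_total = rcbd_sums_of_squares(yields)
a, b = 3, 2  # three varieties, two farms

# Ignoring farms: the farm SS is absorbed into the error term (3 error df).
mse_ignoring_farms = (ss_farm + ss_error) / (a * b - a)
f_ignoring_farms = (ss_variety / (a - 1)) / mse_ignoring_farms

# Allowing for farms: the farm SS is removed from the error term (2 error df).
mse_with_farms = ss_error / ((a - 1) * (b - 1))
f_with_farms = (ss_variety / (a - 1)) / mse_with_farms

print(f"variety SS = {ss_variety:.4f}, farm SS = {ss_farm:.4f}, error SS = {ss_error:.4f}")
print(f"F ignoring farms     = {f_ignoring_farms:.2f}")
print(f"F allowing for farms = {f_with_farms:.2f}")
```

The variety sum of squares is the same in both analyses; only the error mean square in the denominator of the F ratio changes, from 289 to 1.5.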
Analysis of Variance
As before we will assume that there are a treatments but now we will also assume
that the experimental units are grouped into b sets, called blocks, of a homogeneous
units. The treatments are allocated to the units at random within each block. Thus
we say that the randomisation is restricted by the block factor.
We say that the blocks are complete because there are as many units as treatments
in each block. If there are more treatments than units in a block then we say the blocks are
incomplete. Balanced incomplete block designs are one example of designs with
incomplete blocks.
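The effects model for the randomised complete block design is
\[ y_{ij} = \mu + \tau_i + \beta_j + e_{ij}, \quad i = 1, \dots, a, \; j = 1, \dots, b, \]
where $\mu$ is the overall mean, $\tau_i$ is the effect of the $i$th treatment,
$\beta_j$ is the effect of the $j$th block and $e_{ij}$ is the random error term.
As before we assume that the random error terms are independent and identically
distributed normal random variables with mean 0 and constant variance.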
The terms in this model cannot be uniquely determined and so we assume that
the treatment and block effects are deviations from the overall mean. Thus we have
$\sum_i \tau_i = 0$ and $\sum_j \beta_j = 0$. We will come back to this assumption shortly.
We wish to test
\[ H_0 : \tau_1 = \tau_2 = \dots = \tau_a = 0 \]
against
\[ H_1 : \text{at least one treatment effect is non-zero.} \]
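The analysis of variance is based on the identity
\[ y_{ij} - \bar{y}_{..} = \bar{y}_{i.} - \bar{y}_{..} + \bar{y}_{.j} - \bar{y}_{..} + y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..} \]
\[ \phantom{y_{ij} - \bar{y}_{..}} = (\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}) \]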
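Squaring both sides of this identity and summing over i and j, the cross-product terms vanish. This gives the usual partition of the total sum of squares into Treatment, Block and Error sums of squares. The block below is a sketch of that standard result; the labels SS_Treatments, SS_Blocks and SS_E are notation introduced here for illustration.

```latex
\begin{align*}
\underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{..})^2}_{SS_{T}}
  &= \underbrace{b \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2}_{SS_{\text{Treatments}}}
   + \underbrace{a \sum_{j=1}^{b} (\bar{y}_{.j} - \bar{y}_{..})^2}_{SS_{\text{Blocks}}}
   + \underbrace{\sum_{i=1}^{a}\sum_{j=1}^{b}
       (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2}_{SS_{E}}, \\
ab - 1 &= (a - 1) + (b - 1) + (a - 1)(b - 1)
  \quad \text{(degrees of freedom)}, \\
F &= \frac{SS_{\text{Treatments}} / (a - 1)}{SS_{E} / \{(a - 1)(b - 1)\}}
  \sim F_{a-1,\,(a-1)(b-1)} \quad \text{under } H_0 .
\end{align*}
```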
The Block SS cannot be used for testing that the βj are all 0, since the Block SS
arises as a result of a restricted randomisation and so the test statistic would be
testing both the block terms and the randomisation restriction. A large value of the
Block MS relative to the Error MS does, however, suggest that blocking was helpful
in reducing the unexplained variability.
EXAMPLE 2.
Montgomery (2007) gives the results of an experiment to investigate the effect of
extrusion pressure on the number of defects in artificial veins. The veins are produced
“by extruding billets of polytetrafluoroethylene (PTFE) resin combined with a
lubricant into tubes”. Since the resin comes from an external supplier and the
engineers want to allow for possible batch-to-batch variability, each batch is used to
produce some veins at each of the different pressures. The response variable is the
percentage of veins which have no defects.
           Treatment (pressure)
Block       1      2      3      4
1        90.3   92.5   85.5   82.5
2        89.2   89.5   90.8   89.5
3        98.2   90.6   89.6   85.6
4        93.9   94.7   86.2   87.4
5        87.4   87.0   88.0   78.9
6        97.9   95.8   93.4   90.7
proc glm data=lect.veins plots=diagnostics;
class block pressure;
model prop=block pressure;
means pressure /tukey;
run;
The GLM Procedure
Dependent Variable: prop

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              8      370.4233333    46.3029167      6.32   0.0011
Error             15      109.8862500     7.3257500
Corrected Total   23      480.3095833

R-Square   Coeff Var   Root MSE   prop Mean
0.771218    3.014185   2.706612    89.79583

Source      DF     Type I SS   Mean Square   F Value   Pr > F
pressure     3   178.1712500    59.3904167      8.11   0.0019
block        5   192.2520833    38.4504167      5.25   0.0055

Source      DF   Type III SS   Mean Square   F Value   Pr > F
pressure     3   178.1712500    59.3904167      8.11   0.0019
block        5   192.2520833    38.4504167      5.25   0.0055
[Figure: GLM diagnostic plots for the dependent variable prop]
The GLM Procedure
Tukey's Studentized Range (HSD) Test for prop

Alpha                                     0.05
Error Degrees of Freedom                    15
Error Mean Square                      7.32575
Critical Value of Studentized Range    4.07588
Minimum Significant Difference          4.5037

Comparisons significant at the 0.05 level are indicated by ***.

pressure       Difference      Simultaneous 95%
Comparison    Between Means    Confidence Limits
1 - 2              1.133        -3.370     5.637
1 - 3              3.900        -0.604     8.404
1 - 4              7.050         2.546    11.554   ***
2 - 1             -1.133        -5.637     3.370
2 - 3              2.767        -1.737     7.270
2 - 4              5.917         1.413    10.420   ***
3 - 1             -3.900        -8.404     0.604
3 - 2             -2.767        -7.270     1.737
3 - 4              3.150        -1.354     7.654
4 - 1             -7.050       -11.554    -2.546   ***
4 - 2             -5.917       -10.420    -1.413   ***
4 - 3             -3.150        -7.654     1.354
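The Tukey comparisons can be checked by hand from the data table and two numbers in the output. The following Python sketch is an illustration rather than the SAS procedure itself: it takes the Error Mean Square and the critical value of the studentized range as given (copied from the output, not recomputed), and the variable names are hypothetical.

```python
import math
from itertools import combinations

# Percentage of defect-free veins from Example 2: rows are blocks 1-6,
# columns are pressures (treatments) 1-4.
veins = [
    [90.3, 92.5, 85.5, 82.5],
    [89.2, 89.5, 90.8, 89.5],
    [98.2, 90.6, 89.6, 85.6],
    [93.9, 94.7, 86.2, 87.4],
    [87.4, 87.0, 88.0, 78.9],
    [97.9, 95.8, 93.4, 90.7],
]

b = len(veins)        # 6 blocks, so each treatment mean averages 6 values
error_ms = 7.32575    # Error Mean Square, copied from the SAS output
q_crit = 4.07588      # critical value of the studentized range, copied from the SAS output

pressure_means = [sum(row[i] for row in veins) / b for i in range(4)]

# Minimum significant difference: q * sqrt(MSE / b).
msd = q_crit * math.sqrt(error_ms / b)

significant_pairs = []
for i, j in combinations(range(4), 2):
    diff = pressure_means[i] - pressure_means[j]
    if abs(diff) > msd:
        significant_pairs.append((i + 1, j + 1))
    print(f"{i + 1} - {j + 1}: difference {diff:7.3f}")

print(f"MSD = {msd:.4f}")
print("significant pairs:", significant_pairs)
```

Only the comparisons of pressure 4 with pressures 1 and 2 exceed the minimum significant difference, which agrees with the rows flagged in the output.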
We will estimate the model parameters using least squares. To do this we calculate
the (theoretical) Error sum of squares, which is
\[ \sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \mu - \tau_i - \beta_j)^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} e_{ij}^2 = S, \]
and choose values for µ, τi and βj that minimise this sum of squares. Thus we
must differentiate S with respect to each of the parameters in turn, set the resulting
derivatives to 0 and solve to find the parameter estimates.
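Carrying out this differentiation gives the following equations. This is a sketch of the step rather than a quotation from the notes; here $y_{..}$, $y_{i.}$ and $y_{.j}$ denote totals (not means) and hats denote the values that solve the equations.

```latex
\begin{align*}
\frac{\partial S}{\partial \mu} = 0 :\quad
  & ab\,\hat{\mu} + b \sum_{i=1}^{a} \hat{\tau}_i + a \sum_{j=1}^{b} \hat{\beta}_j = y_{..}, \\
\frac{\partial S}{\partial \tau_i} = 0 :\quad
  & b\,\hat{\mu} + b\,\hat{\tau}_i + \sum_{j=1}^{b} \hat{\beta}_j = y_{i.},
  \qquad i = 1, \dots, a, \\
\frac{\partial S}{\partial \beta_j} = 0 :\quad
  & a\,\hat{\mu} + \sum_{i=1}^{a} \hat{\tau}_i + a\,\hat{\beta}_j = y_{.j},
  \qquad j = 1, \dots, b.
\end{align*}
```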
Thus altogether we have a + b + 1 equations, one for each parameter in the linear
model. We call these the normal equations and we see that we can get the normal
equation corresponding to a particular term by adding over all subscripts that do
not subscript that term. This is a short-cut which avoids the need to differentiate
S.
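We can also see that the sum of the normal equations associated with the τi gives the
normal equation associated with µ, and that the sum of the normal equations associated
with the βj gives the normal equation associated with µ as well. Thus to be able
to solve the normal equations uniquely we need two constraints, and we use
$\sum_i \hat{\tau}_i = 0$ and $\sum_j \hat{\beta}_j = 0$. With these constraints the
normal equations become
\[ ab\,\hat{\mu} = y_{..}, \]
\[ b\,\hat{\mu} + b\,\hat{\tau}_i = y_{i.}, \quad i = 1, \dots, a, \]
\[ a\,\hat{\mu} + a\,\hat{\beta}_j = y_{.j}, \quad j = 1, \dots, b, \]
where $y_{..}$, $y_{i.}$ and $y_{.j}$ are totals, and the solution is
\[ \hat{\mu} = \bar{y}_{..} \]
\[ \hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..} \]
\[ \hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..} \]
for i = 1, . . . , a and j = 1, . . . , b.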
Other constraints could give different estimates for the parameter values in the model
but the estimates for the estimable functions are independent of the constraints
chosen.
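A small numerical check illustrates this point. The Python sketch below uses the tomato yields from Example 1 and compares the sum-to-zero solution with a second solution in which the last treatment effect and the last block effect are set to 0. That alternative constraint is chosen here purely for illustration, and all variable names are hypothetical.

```python
# Yields from Example 1: rows are farms (blocks j), columns are varieties (treatments i).
yields = [
    [14, 22, 23],  # Farm 1
    [40, 45, 46],  # Farm 2
]
b, a = len(yields), len(yields[0])

grand_mean = sum(sum(row) for row in yields) / (a * b)
variety_means = [sum(row[i] for row in yields) / b for i in range(a)]
farm_means = [sum(row) / a for row in yields]

# Solution 1: sum-to-zero constraints on the treatment and block effects.
mu_sum = grand_mean
tau_sum = [m - grand_mean for m in variety_means]
beta_sum = [m - grand_mean for m in farm_means]

# Solution 2: last treatment effect and last block effect set to 0.
mu_ref = variety_means[-1] + farm_means[-1] - grand_mean
tau_ref = [m - variety_means[-1] for m in variety_means]
beta_ref = [m - farm_means[-1] for m in farm_means]


def fitted(mu, tau, beta):
    """Fitted values mu + tau_i + beta_j, laid out like `yields`."""
    return [[mu + tau[i] + beta[j] for i in range(a)] for j in range(b)]


fit_sum = fitted(mu_sum, tau_sum, beta_sum)
fit_ref = fitted(mu_ref, tau_ref, beta_ref)

print("mu under the two constraints:", round(mu_sum, 4), round(mu_ref, 4))
print("tau_A - tau_B:", round(tau_sum[0] - tau_sum[1], 4), round(tau_ref[0] - tau_ref[1], 4))
print("fitted values agree:",
      all(abs(x - y) < 1e-9 for r1, r2 in zip(fit_sum, fit_ref) for x, y in zip(r1, r2)))
```

The individual parameter estimates differ between the two solutions, but the treatment difference and the fitted values, which are estimable, are the same.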
DEFINITION 1.
A Latin square of order n is an n × n array based on a set of n symbols such that
each symbol appears exactly once in each row of the square, and exactly once in each
column of the square.
EXAMPLE 3.
The squares in Table 1 are each of order 4.
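The definition can be checked mechanically. The following Python sketch tests whether an array is a Latin square and builds one square of order 4 by cyclically shifting the first row. The function names is_latin_square and cyclic_latin_square are hypothetical, and the square produced is an illustration only, not necessarily one of the squares in Table 1.

```python
def is_latin_square(square):
    """Return True if `square` is an n x n array in which each of n symbols
    appears exactly once in every row and exactly once in every column."""
    n = len(square)
    if n == 0 or any(len(row) != n for row in square):
        return False
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def cyclic_latin_square(symbols):
    """Build a Latin square whose row r is the symbol list shifted left by r places."""
    n = len(symbols)
    return [[symbols[(r + c) % n] for c in range(n)] for r in range(n)]


square = cyclic_latin_square("ABCD")
for row in square:
    print(" ".join(row))
print("Latin square:", is_latin_square(square))
```

In a designed experiment the rows, columns and symbols of such a square would then be randomised before treatments are assigned to the units.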
In the context of designed experiments, Latin squares are used when there are two
known and controllable nuisance factors and each experimental unit appears in
exactly one block of each of the two nuisance factors. Most often these factors are
the rows and columns of plants, such as trees laid out in an orchard, but they may
be factors like day of the week and time of day.
Analysis of Variance
As before we will assume that there are a treatments but now we will also assume
that the experimental units are grouped into two sets of a blocks, each of a
homogeneous units. The treatments are allocated to the units at random so that each
treatment appears once in each row and once in each column of the Latin square.
Thus we say that the randomisation is restricted by the block factors.
Once again the blocks are complete.
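The effects model for the Latin square design is given by
\[ y_{ijk} = \mu + \tau_i + \rho_j + \kappa_k + e_{ijk}, \quad i = 1, \dots, a, \; j = 1, \dots, a, \; k = 1, \dots, a, \]
where $\tau_i$ is the effect of the $i$th treatment, $\rho_j$ is the effect of the $j$th
row and $\kappa_k$ is the effect of the $k$th column. Only the $a^2$ combinations of
(i, j, k) that appear in the square are observed.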
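As in the randomised complete block design, the total sum of squares can be partitioned into one term for each factor in the model plus an error term. The block below sketches this standard partition and its degrees of freedom; the SS labels are notation introduced here for illustration.

```latex
\begin{align*}
SS_{T} &= SS_{\text{Treatments}} + SS_{\text{Rows}} + SS_{\text{Columns}} + SS_{E}, \\
a^2 - 1 &= (a - 1) + (a - 1) + (a - 1) + (a - 1)(a - 2)
  \quad \text{(degrees of freedom)}, \\
F &= \frac{SS_{\text{Treatments}} / (a - 1)}{SS_{E} / \{(a - 1)(a - 2)\}}
  \sim F_{a-1,\,(a-1)(a-2)} \quad \text{under } H_0 : \tau_1 = \dots = \tau_a = 0 .
\end{align*}
```

For a = 4, for example, the 15 total degrees of freedom split as 3 + 3 + 3 + 6.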
EXAMPLE 4.
(From Mason, Gunst and Hess (1989)) A tyre wholesaler wanted to road test four
brands of tyres intended for use on heavy-duty commercial trucks. The response
was the fuel efficiency, measured in miles per gallon. To be meaningful, the test
runs had to be several hundred miles long. Hence it was decided to use several test
trucks and to test each brand on each truck. Because of the length of the test drive
it was necessary to run the test programme over several days. To allow for possible
variation in the weather each brand of tyre was tested on each day.
The final layout appears in the following table. Is there a difference between the
brands?
The GLM Procedure
Dependent Variable: efficiency

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              9       1.63770000    0.18196667     11.77   0.0036
Error              6       0.09280000    0.01546667
Corrected Total   15       1.73050000

R-Square   Coeff Var   Root MSE   efficiency Mean
0.946374    1.833617   0.124365          6.782500

Source   DF    Type I SS   Mean Square   F Value   Pr > F
day       3   0.26300000    0.08766667      5.67   0.0348
truck     3   0.06765000    0.02255000      1.46   0.3170
brand     3   1.30705000    0.43568333     28.17   0.0006

Source   DF   Type III SS   Mean Square   F Value   Pr > F
day       3    0.26300000    0.08766667      5.67   0.0348
truck     3    0.06765000    0.02255000      1.46   0.3170
brand     3    1.30705000    0.43568333     28.17   0.0006
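The mean squares and F values in this table follow directly from the sums of squares and the degrees of freedom. The Python sketch below redoes that arithmetic from the numbers reported in the output (the raw data are not used); the variable names are hypothetical, and a = 4 levels per factor is read off from the 3 degrees of freedom shown for each factor.

```python
# Sums of squares copied from the SAS output for Example 4.
a = 4  # brands, days and trucks each have 4 levels (3 df each in the output)
sums_of_squares = {"day": 0.26300, "truck": 0.06765, "brand": 1.30705}
error_ss = 0.09280

factor_df = a - 1                # 3 for each factor
error_df = (a - 1) * (a - 2)     # 6
error_ms = error_ss / error_df

f_values = {}
for source, ss in sums_of_squares.items():
    mean_square = ss / factor_df
    f_values[source] = mean_square / error_ms
    print(f"{source:5s}  MS = {mean_square:.8f}  F = {f_values[source]:.2f}")

# The factor and error sums of squares add up to the corrected total.
total_ss = sum(sums_of_squares.values()) + error_ss
print(f"total SS = {total_ss:.4f}")
```

The brand F value is much larger than the others, which is why the output shows strong evidence of a difference between the brands.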
[Figure: GLM diagnostic plots for the dependent variable efficiency]
The test for normality gives no evidence against the assumption of normal errors.
We cannot test the assumption of equal variances for the full model since there is
no replication of results for a particular day and truck combination.
These notes are only intended to provide an overview of the material. They are
supplemented by the discussion that takes place in the classroom, both in lectures
and in labs. The exercise sheets are an integral part of the subject and students
should attempt all of the questions.
Students who would like further reading about this topic have many options as this is
a standard topic that is covered in any book on designed experiments. Kuehl (2000)
and Montgomery (2007 and earlier editions) both cover this material in detail and
are well written. But any book that you find in the library that covers the material