
Lecture 4: Randomised Complete Block Designs and Latin Squares


Reference: Montgomery Chapter 4
Reference: Dean and Voss Chapter 10, and Sections 12.1-2
We have seen how a CRD can be used to answer questions about a set of a treatments assuming that we have a set of homogeneous experimental units available.
We now consider how to design experiments when we do not have enough homogeneous experimental units available, although we do have experimental units that can be grouped into sets of homogeneous units.
We will talk about situations where there is one way of grouping the units (randomised complete blocks) and where there are two ways of grouping the units (Latin squares).
The groupings that we use are a property of the experimental units and are of no interest to the experimenter. Thus we can think of blocks as being the levels of a nuisance factor. The blocks may have an effect on the response, but we are only interested in that effect in so much as we want to be sure that it does not lead us to declare treatments that have different effects to be the same, or vice versa.
Recall that we used randomisation as a way of safeguarding the results of an experiment from unknown and uncontrollable nuisance factors. Blocks can be thought of as known and controllable nuisance factors.

Randomised Complete Block Designs

EXAMPLE 1.
Suppose that we want to compare the yields of three varieties of tomatoes. Suppose
that we have two farms available for the experiment and on each farm there are
three plots available to us. We grow each of the varieties on one plot on each farm.
Suppose that we get the following results.
Variety A B C
Farm 1 14 22 23
Farm 2 40 45 46
If we ignore the farms we get the following ANOVA table:

The GLM Procedure

Dependent Variable: yield

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              2       66.3333333      33.1666667       0.11    0.8953
Error              3      867.0000000     289.0000000
Corrected Total    5      933.3333333

R-Square    Coeff Var    Root MSE    yield Mean
0.071071     53.68421    17.00000      31.66667

Source            DF    Type I SS        Mean Square    F Value    Pr > F
variety            2    66.33333333      33.16666667       0.11    0.8953

Source            DF    Type III SS      Mean Square    F Value    Pr > F
variety            2    66.33333333      33.16666667       0.11    0.8953

and we would conclude that the varieties are all equally productive. This output
was produced using
libname lect '/courses/da9372e5ba27fe300/35356';

proc glm data=lect.tomato;
  class variety;
  model yield=variety;
run;
But we have designed our experiment by randomising tomato varieties so that each variety appears once on each farm, and so our analysis should reflect that restriction on the randomisation. Thus we should include the farms as a source of variation in the analysis, although the farm effect is of no interest to us per se. This gives:

The GLM Procedure

Dependent Variable: yield

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3      930.3333333     310.1111111     206.74    0.0048
Error              2        3.0000000       1.5000000
Corrected Total    5      933.3333333

R-Square    Coeff Var    Root MSE    yield Mean
0.996786     3.867615    1.224745      31.66667

Source            DF    Type I SS       Mean Square     F Value    Pr > F
variety            2     66.3333333      33.1666667       22.11    0.0433
farm               1    864.0000000     864.0000000      576.00    0.0017

Source            DF    Type III SS     Mean Square     F Value    Pr > F
variety            2     66.3333333      33.1666667       22.11    0.0433
farm               1    864.0000000     864.0000000      576.00    0.0017

and we see that the farms were a large source of variation in this experiment. The correct estimate of the error variance is 1.5 rather than 289. This output was produced by
proc glm data=lect.tomato;
class farm variety;
model yield=farm variety;
run;

This example illustrates the reason that we need to remove all the known sources of
variation before we draw any conclusions from an experiment.
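As a worked check using the data above, the farm sum of squares of 864 can be obtained directly. The farm means are 59/3 = 19.67 and 131/3 = 43.67, the grand mean is 31.67, and each farm mean is based on a = 3 plots, so

$$SS_{\text{Farm}} = 3\left[(19.67 - 31.67)^2 + (43.67 - 31.67)^2\right] = 3(144 + 144) = 864.$$

Subtracting this and the variety SS from the total SS gives the error SS, $933.33 - 66.33 - 864 = 3.00$, and dividing by its 2 degrees of freedom gives the error mean square of 1.5.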

Analysis of Variance

As before we will assume that there are a treatments, but now we will also assume that the experimental units are grouped into b sets, called blocks, of a homogeneous units. The treatments are allocated to the units at random within each block. Thus we say that the randomisation is restricted by the block factor.
We say that the blocks are complete because there are as many units as treatments in each block. If there are more treatments than units then we say the blocks are incomplete. Balanced incomplete block designs are one example of designs with incomplete blocks.

The effects model for the randomised complete block design (RCBD) is given by

$$y_{ij} = \mu + \tau_i + \beta_j + e_{ij}, \qquad i = 1, \dots, a, \; j = 1, \dots, b.$$

As before we assume that the random error terms are independently identically
normal with constant variance.
The terms in this model cannot be uniquely determined, and so we assume that the treatment and block effects are deviations from the overall mean. Thus we have $\sum_i \tau_i = 0$ and $\sum_j \beta_j = 0$. We will come back to this assumption shortly.

The null hypothesis is unaltered from the CRD situation,

$$H_0: \tau_1 = \tau_2 = \dots = \tau_a = 0$$
$$H_1: \text{at least one treatment effect is non-zero.}$$

Proceeding as we did in the CRD section, we write

$$y_{ij} - \bar{y}_{..} = (\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})$$

Squaring both sides and adding over all observations gives


$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2$$

which is often written as


Total Sum of Squares = Treatments SS + Block SS + Error SS.
Note that we have to show that the cross-product terms are 0; see the exercises.
As before the expected value of the Error MS is $\sigma^2$, but it now has $(a-1)(b-1)$ degrees of freedom. For the other two SS a similar derivation to our earlier one shows that the Treatments SS has expected value $(a-1)\sigma^2 + b\sum_i \tau_i^2$ and that the Block SS has expected value $(b-1)\sigma^2 + a\sum_j \beta_j^2$.

We can construct the ANOVA table for this model:
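A standard layout for the RCBD ANOVA table, consistent with the expected mean squares above, is:

Source        DF                Sum of Squares    Mean Square                    F
Treatments    a - 1             SS(Treatments)    SS(Treatments)/(a - 1)         MS(Treatments)/MS(Error)
Blocks        b - 1             SS(Blocks)        SS(Blocks)/(b - 1)
Error         (a - 1)(b - 1)    SS(Error)         SS(Error)/((a - 1)(b - 1))
Total         ab - 1            SS(Total)

Under $H_0$ the ratio $F = MS_{\text{Treatments}}/MS_{\text{Error}}$ has an $F$ distribution with $a-1$ and $(a-1)(b-1)$ degrees of freedom, and this is the statistic used to test for treatment effects.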

The Block SS cannot be used for testing that the $\beta_j$ are all 0, since the Block SS arises as a result of a restricted randomisation, and so the test statistic would be testing both the block terms and the randomisation restriction. However, a large value of the Block MS suggests that blocking was helpful in reducing the unexplained variability.

EXAMPLE 2.
Montgomery (2007) gives the results of an experiment to compare the effect of extrusion pressure on the number of defects in artificial veins. The veins are produced "by extruding billets of polytetrafluoroethylene (PTFE) resin combined with a lubricant into tubes". Since the resin comes from an external supplier and the engineers want to allow for possible batch-to-batch variability, each batch is used to produce some veins at each of the different pressures. The response variable is the proportion of veins which have no defects.

                 Treatment (extrusion pressure)
Block (batch)       1       2       3       4
1                90.3    92.5    85.5    82.5
2                89.2    89.5    90.8    89.5
3                98.2    90.6    89.6    85.6
4                93.9    94.7    86.2    87.4
5                87.4    87.0    88.0    78.9
6                97.9    95.8    93.4    90.7
proc glm data=lect.veins plots=diagnostics;
class block pressure;
model prop=block pressure;
means pressure /tukey;
run;
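The analysis above reads the data from the course library (lect.veins). As an illustration of how the two-way table translates into one observation per batch and pressure combination, the data could equally be entered directly with a data step. This is only a sketch: the dataset name veins is an assumption, but the variable names match those used in the PROC GLM call above.

data veins;
  /* one row per (batch, pressure) combination from the table above */
  input block pressure prop;
  datalines;
1 1 90.3
1 2 92.5
1 3 85.5
1 4 82.5
2 1 89.2
2 2 89.5
2 3 90.8
2 4 89.5
3 1 98.2
3 2 90.6
3 3 89.6
3 4 85.6
4 1 93.9
4 2 94.7
4 3 86.2
4 4 87.4
5 1 87.4
5 2 87.0
5 3 88.0
5 4 78.9
6 1 97.9
6 2 95.8
6 3 93.4
6 4 90.7
;
run;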

The GLM Procedure

Dependent Variable: prop

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              8      370.4233333      46.3029167       6.32    0.0011
Error             15      109.8862500       7.3257500
Corrected Total   23      480.3095833

R-Square    Coeff Var    Root MSE    prop Mean
0.771218     3.014185    2.706612     89.79583

Source            DF    Type I SS       Mean Square    F Value    Pr > F
pressure           3    178.1712500     59.3904167        8.11    0.0019
block              5    192.2520833     38.4504167        5.25    0.0055

Source            DF    Type III SS     Mean Square    F Value    Pr > F
pressure           3    178.1712500     59.3904167        8.11    0.0019
block              5    192.2520833     38.4504167        5.25    0.0055
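As a check against the theory above: with $a = 4$ pressures and $b = 6$ batches, the error degrees of freedom are $(a - 1)(b - 1) = 3 \times 5 = 15$, which matches the Error line in the table.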

(The diagnostic plots for the fitted model, requested with plots=diagnostics, are not reproduced in these notes.)

A formal test of the residuals (using Anderson-Darling) gives a p-value of 0.34, so the assumption of normality seems reasonable. Similarly, the assumption of equal variances also seems reasonable.
Finally we explore the significant mean differences by using Tukey's test.

The GLM Procedure

Tukey's Studentized Range (HSD) Test for prop

Note: This test controls the Type I experimentwise error rate.

Alpha                                    0.05
Error Degrees of Freedom                 15
Error Mean Square                        7.32575
Critical Value of Studentized Range      4.07588
Minimum Significant Difference           4.5037

Comparisons significant at the 0.05 level are indicated by ***.

pressure        Difference       Simultaneous 95%
Comparison      Between Means    Confidence Limits
1 - 2               1.133        -3.370     5.637
1 - 3               3.900        -0.604     8.404
1 - 4               7.050         2.546    11.554   ***
2 - 1              -1.133        -5.637     3.370
2 - 3               2.767        -1.737     7.270
2 - 4               5.917         1.413    10.420   ***
3 - 1              -3.900        -8.404     0.604
3 - 2              -2.767        -7.270     1.737
3 - 4               3.150        -1.354     7.654
4 - 1              -7.050       -11.554    -2.546   ***
4 - 2              -5.917       -10.420    -1.413   ***
4 - 3              -3.150        -7.654     1.354

Thus pressure 4 gives a significantly lower mean proportion of defect-free veins than pressures 1 and 2; no other pairs of pressures differ significantly at the 0.05 level.

Estimating Model Parameters

We will estimate the model parameters using least squares. To do this we calculate
the (theoretical) Error sum of squares, which is
$$\sum_{i=1}^{a}\sum_{j=1}^{b}(y_{ij} - \mu - \tau_i - \beta_j)^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} e_{ij}^2 = S,$$

and choose values for $\mu$, $\tau_i$ and $\beta_j$ that minimise this sum of squares. Thus we must differentiate S with respect to each of the parameters in turn, set the resulting derivatives to 0 and solve to find the parameter estimates.
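Writing $y_{..}$ for the grand total, $y_{i.}$ for the total of the observations on treatment i and $y_{.j}$ for the total of the observations in block j, the resulting normal equations take the standard form

$$\begin{aligned}
ab\,\hat{\mu} + b\sum_{i}\hat{\tau}_i + a\sum_{j}\hat{\beta}_j &= y_{..},\\
b\,\hat{\mu} + b\,\hat{\tau}_i + \sum_{j}\hat{\beta}_j &= y_{i.}, \qquad i = 1,\dots,a,\\
a\,\hat{\mu} + \sum_{i}\hat{\tau}_i + a\,\hat{\beta}_j &= y_{.j}, \qquad j = 1,\dots,b.
\end{aligned}$$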

Thus altogether we have a + b + 1 equations, one for each parameter in the linear
model. We call these the normal equations and we see that we can get the normal
equation corresponding to a particular term by adding over all subscripts that do
not subscript that term. This is a short-cut which avoids the need to differentiate
S.
We can also see that the sum of the normal equations associated with the $\tau_i$ gives the normal equation associated with $\mu$, and the sum of the normal equations associated with the $\beta_j$ gives the normal equation associated with $\mu$ as well. Thus, to be able to solve these equations and find the least squares estimates, we must impose two constraints. We will impose the constraints
$$\sum_{i=1}^{a}\hat{\tau}_i = 0 \qquad\text{and}\qquad \sum_{j=1}^{b}\hat{\beta}_j = 0.$$

Then the normal equations become

$$\begin{aligned}
ab\,\hat{\mu} &= y_{..},\\
b\,\hat{\mu} + b\,\hat{\tau}_i &= y_{i.}, \qquad i = 1,\dots,a,\\
a\,\hat{\mu} + a\,\hat{\beta}_j &= y_{.j}, \qquad j = 1,\dots,b,
\end{aligned}$$

from which we get estimates

$$\hat{\mu} = \bar{y}_{..}, \qquad \hat{\tau}_i = \bar{y}_{i.} - \bar{y}_{..}, \qquad \hat{\beta}_j = \bar{y}_{.j} - \bar{y}_{..},$$

for $i = 1, \dots, a$ and $j = 1, \dots, b$.
Other constraints could give different estimates for the parameter values in the model
but the estimates for the estimable functions are independent of the constraints
chosen.
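As a quick illustration with the tomato data from Example 1 (varieties as treatments, farms as blocks): the variety means are 27, 33.5 and 34.5, the farm means are 19.67 and 43.67, and the grand mean is 31.67, so

$$\hat{\mu} = 31.67, \qquad \hat{\tau}_A = -4.67, \quad \hat{\tau}_B = 1.83, \quad \hat{\tau}_C = 2.83, \qquad \hat{\beta}_1 = -12, \quad \hat{\beta}_2 = 12.$$

Up to rounding, the treatment estimates and the block estimates each sum to zero, as required by the constraints.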

Latin Squares

DEFINITION 1.
A Latin square of order n is an n × n array based on a set of n symbols such that
each symbol appears exactly once in each row of the square, and exactly once in each
column of the square.

EXAMPLE 3.
The squares in Table 1 are each of order 4.

Table 1: Two Latin squares of order 4


1 2 3 4 1 2 3 4
2 1 4 3 4 1 2 3
3 4 1 2 3 4 1 2
4 3 2 1 2 3 4 1
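The second square in Table 1 has a cyclic structure: each row is the row above it shifted cyclically one place to the right. As a small illustration (a sketch, not taken from the reference texts), such a square can be generated in a SAS data step:

data latin4;
  /* symbol in row i, column j of the second square in Table 1: ((j - i) mod 4) + 1 */
  do row = 1 to 4;
    do col = 1 to 4;
      symbol = mod(col - row + 4, 4) + 1;
      output;
    end;
  end;
run;

proc print data=latin4 noobs;
run;

One common way to randomise a design based on such a square is to permute its rows, its columns and its treatment labels at random.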

In the context of designed experiments, Latin squares are used when there are two known and controllable nuisance factors and each experimental unit appears in exactly one block for each of the two nuisance factors. Most often these factors are the rows and columns of plants such as trees laid out in an orchard, but they may be factors like day of the week and time of day.

Analysis of Variance

As before we will assume that there are a treatments, but now we will also assume that the experimental units are grouped into two sets of a blocks, each of a homogeneous units. The treatments are allocated to the units at random so that each treatment appears once in each row and once in each column of the Latin square. Thus we say that the randomisation is restricted by the block factors.
Once again the blocks are complete.
The effects model for the Latin square design is given by

$$y_{ijk} = \mu + \tau_i + \rho_j + \kappa_k + e_{ijk}, \qquad i = 1, \dots, a, \; j = 1, \dots, a, \; k = 1, \dots, a.$$

As before we assume that the random error terms are independently identically
normal with constant variance.
The terms in this model cannot be uniquely determined, and so we assume that the treatment, row and column effects are deviations from the overall mean. Thus we have $\sum_i \tau_i = 0$, $\sum_j \rho_j = 0$ and $\sum_k \kappa_k = 0$.
We can derive the 3a+1 normal equations to obtain parameter estimates. The sums
of squares come from the sums of squares identity for this model.
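For reference, the corresponding decomposition is a standard result (quoted here without derivation):

$$\sum (y_{ijk} - \bar{y}_{...})^2 = a\sum_{i}(\bar{y}_{i..} - \bar{y}_{...})^2 + a\sum_{j}(\bar{y}_{.j.} - \bar{y}_{...})^2 + a\sum_{k}(\bar{y}_{..k} - \bar{y}_{...})^2 + SS_{\text{Error}},$$

where the sum on the left is over the $a^2$ observations and the degrees of freedom partition as $a^2 - 1 = (a - 1) + (a - 1) + (a - 1) + (a - 1)(a - 2)$.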

EXAMPLE 4.
(From Mason, Gunst and Hess (1989)) A tyre wholesaler wanted to road test four brands of tyres intended for use on heavy-duty commercial trucks. The response was the fuel efficiency, measured in miles per gallon. To be meaningful, the test runs had to be several hundred miles long. Hence it was decided to use several test trucks and to test each brand on each truck. Because of the length of the test drive it was necessary to run the test programme over several days. To allow for possible variation in the weather each brand of tyre was tested on each day.
The final layout appears in the following table. Is there a difference between the
brands?

Table 2: Results from the tyre experiment (each entry shows the brand tested, followed by the fuel efficiency in miles per gallon)


Day
Truck Mon Tue Wed Thur
A 4, 7.38 2, 7.00 1, 6.31 3, 6.73
B 2, 7.11 1, 6.54 3, 6.55 4, 7.13
C 1, 6.71 3, 6.64 4, 7.06 2, 6.59
D 3, 6.57 4, 7.31 2, 6.63 1, 6.26

proc glm data=lect.tyre plots=diagnostics;
  class day truck brand;
  model efficiency=day truck brand;
run;

The GLM Procedure

Dependent Variable: efficiency

Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              9      1.63770000      0.18196667       11.77    0.0036
Error              6      0.09280000      0.01546667
Corrected Total   15      1.73050000

R-Square    Coeff Var    Root MSE    efficiency Mean
0.946374     1.833617    0.124365           6.782500

Source            DF    Type I SS      Mean Square    F Value    Pr > F
day                3    0.26300000     0.08766667        5.67    0.0348
truck              3    0.06765000     0.02255000        1.46    0.3170
brand              3    1.30705000     0.43568333       28.17    0.0006

Source            DF    Type III SS    Mean Square    F Value    Pr > F
day                3    0.26300000     0.08766667        5.67    0.0348
truck              3    0.06765000     0.02255000        1.46    0.3170
brand              3    1.30705000     0.43568333       28.17    0.0006
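As a check against the degrees of freedom given above for the Latin square: with $a = 4$, the error degrees of freedom are $(a - 1)(a - 2) = 3 \times 2 = 6$, which matches the Error line in the table.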

(The diagnostic plots for the fitted model, requested with plots=diagnostics, are not reproduced in these notes.)

The test for normality confirms that the assumption of normal errors is reasonable. We cannot test the assumption of equal variances for the full model since there is no replication of results for any particular combination of day and truck. Since the brand effect is highly significant (p = 0.0006), we conclude that there is evidence of a difference between the brands.

Further reading and extensions

These notes are only intended to provide an overview of the material. They are
supplemented by the discussion that takes place in the classroom, both in lectures
and in labs. The exercise sheets are an integral part of the subject and students
should attempt all of the questions.
Students who would like further reading about this topic have many options as this is
a standard topic that is covered in any book on designed experiments. Kuehl (2000)
and Montgomery (2007 and earlier editions) both cover this material in detail and
are well written. But any book that you find in the library that covers the material in a way that you find helpful is suitable.
Balanced incomplete block designs are used as experimental designs when there are not as many homogeneous units per block as there are treatments. When fitting a linear model to such data it is important to put the block term in before the treatment term, since the sum of squares depends on the order of fit (so when fitting the model block treatment, the treatment sum of squares is said to be adjusted for blocks). The adjusted sum of squares is the correct sum of squares to use when testing for treatment effects.
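As an illustration of the order of fitting in SAS (a sketch only; the dataset bibd and the variable names y, block and treatment are hypothetical):

proc glm data=bibd;
  class block treatment;
  /* block is fitted first, so the Type I SS for treatment is adjusted for blocks */
  model y = block treatment;
run;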

Written by Debbie Street, 35356: Lecture 4; modified by Steve Bush.
