STA508 Lecture Notes 2021
Rachel Sarguta
PART I:
INTRODUCTION
Introduction

Definitions I

Definitions II

Definitions III

Definitions IV
(a) Replication: This is where more than one observation is taken for each
combination of treatment factor levels used in the experiment.
(b) Blocking: This refers to the use of blocking factors to divide the
experimental units into sets (blocks) such that the units within sets are more
homogeneous (less variable with respect to the response variable) than units
in general.
(c) Randomization: This means that the allocation of experimental units to
combinations of treatment factor levels should be randomly determined.
Randomization is important because:
(i) It provides a solid basis for the statistical analysis of the data produced by an
experiment.
(ii) It helps avoid systematic biasing of results caused by an “unfair” allocation of
treatments to experimental units.
(iii) Randomization can provide a basis for the analysis of experimental data even
when the usual assumptions we make about the observations are violated.
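The random allocation in (c) can be sketched as follows. This is a minimal illustration in Python rather than the course's R; the unit and treatment labels are hypothetical:

```python
import random

def randomize(units, treatments, reps):
    """Randomly allocate `reps` experimental units to each treatment level."""
    assert len(units) == len(treatments) * reps
    shuffled = units[:]
    random.shuffle(shuffled)          # put the units in a random order
    # consecutive groups of `reps` shuffled units go to each treatment
    return {t: shuffled[i*reps:(i+1)*reps] for i, t in enumerate(treatments)}

plan = randomize(units=list(range(1, 13)), treatments=["A", "B", "C"], reps=4)
for t, us in plan.items():
    print(t, sorted(us))
```

Every run produces a different allocation, but each treatment always receives exactly `reps` units.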
Planning Experiments
Planning Experiments
Steps I
- Define Objectives. Define the objectives of the study. First, this statement
  should answer the question of why the experiment is to be performed.
  Second, determine whether the experiment is conducted to classify sources of
  variability or to study cause-and-effect relationships. If it is the latter,
  determine whether it is a screening or an optimization experiment. For
  studies of cause-and-effect relationships, decide how large an effect should
  be in order to be meaningful to detect.
- Identify Experimental Units. Declare the item upon which something will
  be changed. Is it an animal or human subject, raw material for some
  processing operation, or simply the conditions that exist at a point in time or
  trial? Identifying the experimental units will help in understanding the
  experimental error and the variance of the experimental error.
- Define a Meaningful and Measurable Response or Dependent Variable.
  Define what characteristic of the experimental units can be measured and
  recorded after each run. This characteristic should best represent the
  expected differences to be caused by changes in the factors.
Sarguta (SoM) Design and Analysis February - May, 2021 11 / 263
Planning Experiments
Steps II

Steps III

Steps IV
PART II:
EXPERIMENTS WITH A SINGLE FACTOR:
COMPLETELY RANDOMIZED DESIGNS
Effects Model
Yij = µ + τi + eij

This is called the effects model, and the τi 's are called the effects. τi represents
the difference between the long-run average of all possible experiments at the ith
level of the treatment factor and the overall average.
Parameter Estimation
For an equal number of replicates, the sample mean of the data at the ith level of
the treatment factor is

    ȳi. = (1/ri) Σ_{j=1}^{ri} yij

and µ̂i = ȳi.
Example Contd
The data from a CRD design for the bread rise experiment described earlier:
Rise Time Loaf Heights
35 4.5, 5.0, 5.5, 6.75
40 6.5, 6.5, 10.5, 9.5
45 9.75, 8.75, 6.5, 8.25
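As a quick arithmetic check of the estimates below (illustrative Python, not part of the course's R workflow), the level means ȳi. can be computed directly from the table:

```python
# Loaf heights from the bread rise experiment, keyed by rise time
heights = {
    35: [4.5, 5.0, 5.5, 6.75],
    40: [6.5, 6.5, 10.5, 9.5],
    45: [9.75, 8.75, 6.5, 8.25],
}

# ybar_i. = sample mean at each level of the treatment factor
means = {t: sum(y) / len(y) for t, y in heights.items()}
print(means)  # {35: 5.4375, 40: 8.25, 45: 8.3125}
```

Note that 5.4375 is the intercept in the R output below, and the differences 8.25 − 5.4375 = 2.8125 and 8.3125 − 5.4375 = 2.875 are exactly the time40 and time45 coefficients.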
> bread <- read.csv("plan2.csv")
> bread$time<-as.factor(bread$time)
> library(daewr)
> mod0 <- lm( Height ~ time, data = bread )
> summary( mod0 )
Call:
lm(formula = Height ~ time, data = bread)
Residuals:
Min 1Q Median 3Q Max
-1.812 -1.141 0.000 1.266 2.250
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.4375 0.7655 7.104 5.65e-05 ***
time40 2.8125 1.0825 2.598 0.0288 *
time45 2.8750 1.0825 2.656 0.0262 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sums of Squares
Plots
[Figure: residual diagnostic plots for mod0 - Residuals vs Fitted, Residuals vs
Factor Levels (constant leverage) for time levels 35, 40, 45, and a plot of the
residuals; observations 3, 5 and 11 are flagged as extreme.]
Question
Exercise
PART III:
RANDOMIZED BLOCKS, LATIN SQUARES AND
RELATED DESIGNS
Randomized Blocks
Introduction
- Example:
  A hardness testing machine presses a pointed rod (the 'tip') into a metal
  specimen (a 'coupon'), with a known force. The depth of the depression is a
  measure of the hardness of the specimen. It is feared that, depending on the
  kind of tip used, the machine might give different readings. The experimenter
  wants 4 observations on each of the 4 types of tips. Note that the differences
  in readings might also depend on which type of metal specimen is used, i.e.
  on the coupons.
- A Completely Randomized design would use 16 coupons, making 1 depression
  in each. The coupons would be randomly assigned to the tips, hoping that
  this would average out any differences between the coupons. Here 'coupon
  type' is a 'nuisance factor' - it may affect the readings, but we aren't very
  interested in measuring its effect.
- It is also controllable, by blocking: we can use 4 coupons (the 'blocks') and
  apply each of the 4 treatments (the tips) to each coupon. This is preferable
  to hoping that randomization alone will do the job; it also uses fewer coupons.
Introduction Contd
- There may be unknown and uncontrollable factors affecting the readings (the
  eyesight of the operator, ... think of others). Here is where randomization
  might help - within each block, the treatments are applied in random order.
  So each block can be viewed as one CR designed experiment. This is a
  Randomized Complete Block Design (RCBD). 'Complete' means that each
  block contains all of the treatments.
- Common blocking variables: day of week, person, batch of raw material, ... .
  A basic idea is that the responses should be less highly varied within a block
  than between blocks.
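The within-block randomization described above can be sketched as follows (an illustrative Python sketch; in the course this step would be done in R):

```python
import random

tips = ["tip1", "tip2", "tip3", "tip4"]

def rcbd_plan(blocks, treatments):
    """RCBD randomization: every block receives all treatments,
    but in an independently randomized order."""
    plan = {}
    for b in blocks:
        order = treatments[:]
        random.shuffle(order)   # independent random run order within each block
        plan[b] = order
    return plan

plan = rcbd_plan(["coupon1", "coupon2", "coupon3", "coupon4"], tips)
for b, order in plan.items():
    print(b, order)
```

Each coupon (block) still sees every tip, which is what makes the design 'complete'.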
Data Entry in R
> reading<-c(9.3,9.4,9.6,10.0,9.4,9.3,9.8,9.9,
+ 9.2,9.4,9.5,9.7,9.7,9.6,10.0,10.2)
> coupon<-c(rep(1:4,4))
> tip<-c(rep(1,4),rep(2,4),rep(3,4),rep(4,4))
> tip<-factor(tip)
> coupon<-factor(coupon)
> Hardness<-data.frame(reading,tip,coupon)
Visualizing with box plots
> par(mfrow=c(1,2))
> boxplot(reading~coupon, xlab="coupon")
> boxplot(reading~tip, xlab="tip")
Box plots
[Figure: side-by-side box plots of reading by coupon (left) and by tip (right);
the vertical axis runs from 9.2 to 10.2.]
RCBD Model
Effects Model:

    yij = µ + τi + βj + εij

i = 1, . . . , a = number of treatments
j = 1, . . . , b = number of blocks
τi = effect of ith treatment
βj = effect of jth block

    Σi τi = Σj βj = 0

    µij = µ + τi + βj = E[yij]
Sums of Squares
- We consider the effects model, and decompose SST into sums of squares
  attributable to (i) treatment differences (SSTr), (ii) blocks (SSBlocks), (iii)
  experimental error (SSE).
- Least Squares Estimates: (Prove!)

      µ̂ = ȳ.. ;  τ̂i = ȳi. − ȳ.. ;  β̂j = ȳ.j − ȳ..
Decomposition of SST

    SST = Σ_{i=1}^{a} Σ_{j=1}^{b} (yij − ȳ..)²
        = Σ_{i=1}^{a} Σ_{j=1}^{b} (ȳi. − ȳ..)² + Σ_{i=1}^{a} Σ_{j=1}^{b} (ȳ.j − ȳ..)²
          + Σ_{i=1}^{a} Σ_{j=1}^{b} (yij − ȳi. − ȳ.j + ȳ..)²
Degrees of Freedom
- Degrees of freedom:

      df(SSTr) = a − 1
      df(SSBlocks) = b − 1
      df(SSE) = ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1)

- Task: Give the theoretical ANOVA table for the RCBD.
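These sums of squares and degrees of freedom can be verified by hand for the hardness data. This is a cross-check written in Python purely for illustration; the course analysis itself is the R output below:

```python
# Hardness readings: rows = tips 1..4, columns = coupons 1..4
y = [[9.3, 9.4, 9.6, 10.0],
     [9.4, 9.3, 9.8, 9.9],
     [9.2, 9.4, 9.5, 9.7],
     [9.7, 9.6, 10.0, 10.2]]

a, b = 4, 4
grand = sum(sum(row) for row in y) / (a * b)

ss_tr  = b * sum((sum(row)/b - grand)**2 for row in y)                       # tips
ss_blk = a * sum((sum(y[i][j] for i in range(a))/a - grand)**2
                 for j in range(b))                                          # coupons
ss_t   = sum((y[i][j] - grand)**2 for i in range(a) for j in range(b))
ss_e   = ss_t - ss_tr - ss_blk

print(round(ss_tr, 3), round(ss_blk, 3), round(ss_e, 3))  # 0.385 0.825 0.08
print((a - 1) * (b - 1))                                  # error df = 9
```

These agree with the R ANOVA table below; in particular MSE = 0.08/9 = 0.008889.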
R - Output
Response: reading
Df Sum Sq Mean Sq F value Pr(>F)
tip 3 0.385 0.128333 14.438 0.0008713 ***
coupon 3 0.825 0.275000 30.938 4.523e-05 ***
Residuals 9 0.080 0.008889
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation
- Thus at any level α > 0.00087, we would reject the null hypothesis of no
  significant treatment effects (H0 : τ1 = · · · = τa = 0). It also appears that
  the blocks have a significant effect.
- A caution here though - the randomization alone ensures that the F-test for
  treatments is approximately valid even if the errors are not very normal.
  Because of the randomization restriction, the same is not true for testing the
  significance of blocks by looking at MSBlocks/MSE. Thus the p-value of
  4.523e-05 for blocks (coupons) should be used only as a guide, unless one is
  sure of the normality.
Assumptions Verification
- qq-plot of residuals; ε̂ij = yij − ŷij, where the fitted values are ŷij = µ̂ + τ̂i + β̂j.
- Residuals vs. treatment labels, block labels, fitted values.

> fit.hardness <- lm(reading ~ tip + coupon, data = Hardness)
> par(mfrow=c(2,2))
> plot(fit.hardness, which=2)  # qqplot
> plot(fit.hardness, which=1)  # Residuals vs Fitted
> plot(fit.hardness, which=5)  # Residuals vs Factor Levels (tip)
> plot(fit.hardness$residuals, coupon)
Conclusion
- From the pairwise comparisons, we conclude that tips 1, 2 and 3 produce
  identical hardness readings but that tip 4 gives significantly different (and
  higher) readings.
- In making these statements our experiment-wise error rate is
  < 6(0.05) = 0.3, so our overall confidence is > 70%.
Fisher’s LSD in R
> library(agricolae)
> comparison<-LSD.test(reading,tip,9,0.0089)
> comparison
$statistics
MSerror Df Mean CV t.value LSD
0.0089 9 9.625 0.9801539 2.262157 0.1509047
$parameters
test p.ajusted name.t ntr alpha
Fisher-LSD none tip 4 0.05
$means
reading std r LCL UCL Min Max Q25 Q50 Q75
1 9.575 0.3095696 4 9.468294 9.681706 9.3 10.0 9.375 9.50 9.700
2 9.600 0.2943920 4 9.493294 9.706706 9.3 9.9 9.375 9.60 9.825
3 9.450 0.2081666 4 9.343294 9.556706 9.2 9.7 9.350 9.45 9.550
4 9.875 0.2753785 4 9.768294 9.981706 9.6 10.2 9.675 9.85 10.050
$comparison
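The LSD and the confidence limits in this output can be reproduced by hand. This is an illustrative Python check; the t value 2.262157 and the tip 1 mean 9.575 are taken from the output above rather than recomputed, since the t quantile itself needs a statistics library:

```python
import math

mse, df_err, r = 0.0089, 9, 4    # error mean square, its df, replicates per tip
t = 2.262157                     # t_{0.025, 9}, as reported by LSD.test

lsd  = t * math.sqrt(2 * mse / r)   # least significant difference for two means
half = t * math.sqrt(mse / r)       # half-width of each group-mean interval

print(round(lsd, 4))                                    # 0.1509
print(round(9.575 - half, 4), round(9.575 + half, 4))   # 9.4683 9.6817
```

The half-width applied to the tip 1 mean reproduces the LCL and UCL in the $means table.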
Tukey-type simultaneous intervals use

    q_α/√2 = qtukey(0.95, 4, 9)/√2 = 3.1218

to get

    (q_α/√2) · se(ȳi. − ȳi'.) = 3.1218 · √(2(0.00889)/4) = 0.208

- The same conclusions are drawn, with an experiment-wise error rate of only
  0.05.
TASK: Find other mean comparison methods.
Tukey HSD in R
> fit<-aov(reading~tip+coupon)
> TukeyHSD(fit)
Tukey multiple comparisons of means
95% family-wise confidence level
$tip
diff lwr upr p adj
2-1 0.025 -0.18311992 0.23311992 0.9809005
3-1 -0.125 -0.33311992 0.08311992 0.3027563
4-1 0.300 0.09188008 0.50811992 0.0066583
3-2 -0.150 -0.35811992 0.05811992 0.1815907
4-2 0.275 0.06688008 0.48311992 0.0113284
4-3 0.425 0.21688008 0.63311992 0.0006061
$coupon
diff lwr upr p adj
2-1 0.025 -0.18311992 0.2331199 0.9809005
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 0.02500 0.06667 0.375 0.98092
3 - 1 == 0 -0.12500 0.06667 -1.875 0.30293
4 - 1 == 0 0.30000 0.06667 4.500 0.00698 **
3 - 2 == 0 -0.15000 0.06667 -2.250 0.18164
4 - 2 == 0 0.27500 0.06667 4.125 0.01120 *
4 - 3 == 0 0.42500 0.06667 6.375 < 0.001 ***
[Figure: 95% simultaneous confidence intervals for the pairwise differences
2-1, 3-1, 4-1, 3-2, 4-2 and 4-3.]
Exercise
To be given!
Latin Squares
- Same (hardness) example. Suppose that the 'operator' of the testing machine
  was also thought to be a factor.
- We suppose that there are p = 4 operators, p = 4 coupons, and p = 4 tips.
  The first two are nuisance factors, the last is the 'treatment'.
- We can carry out the experiment, and estimate everything we need to, in only
  p² = 16 runs (as before), if we use a Latin Square Design.
- Here each tip is used exactly once on each coupon, and exactly once by each
  operator.
- Represent the treatments by the Latin letters A, B, C, D and consider the
  Latin square:
Operator
Coupon k=1 k=2 k=3 k=4
i=1 A D B C
i=2 B A C D
i=3 C B D A
i=4 D C A B
Data
- Each letter appears exactly once in each row and in each column.
- There are many ways to construct a Latin square, and the randomization
  enters into things by randomly choosing one of them.
- Suppose the data were:
Operator
Coupon k=1 k=2 k=3 k=4
i=1 A=9.3 B=9.3 C=9.5 D=10.2
i=2 B=9.4 A=9.4 D=10.0 C=9.7
i=3 C=9.2 D=9.6 A=9.6 B=9.9
i=4 D=9.7 C=9.4 B=9.8 A=10.0
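The Latin square sums of squares can be checked directly from this table (an illustrative Python cross-check of the R ANOVA shown later; the course itself works in R):

```python
# data[i][k] = (treatment letter, reading) for coupon i, operator k
data = [[("A", 9.3), ("B", 9.3), ("C", 9.5), ("D", 10.2)],
        [("B", 9.4), ("A", 9.4), ("D", 10.0), ("C", 9.7)],
        [("C", 9.2), ("D", 9.6), ("A", 9.6), ("B", 9.9)],
        [("D", 9.7), ("C", 9.4), ("B", 9.8), ("A", 10.0)]]

p = 4
all_y = [y for row in data for (_, y) in row]
grand = sum(all_y) / p**2

row_mean = [sum(y for _, y in row) / p for row in data]                 # coupons
col_mean = [sum(data[i][k][1] for i in range(p)) / p for k in range(p)] # operators
trt_mean = {t: sum(y for row in data for (s, y) in row if s == t) / p
            for t in "ABCD"}                                            # tips

ss_rows = p * sum((m - grand)**2 for m in row_mean)
ss_cols = p * sum((m - grand)**2 for m in col_mean)
ss_trt  = p * sum((m - grand)**2 for m in trt_mean.values())
ss_tot  = sum((y - grand)**2 for y in all_y)
ss_err  = ss_tot - ss_rows - ss_cols - ss_trt

print(round(ss_trt, 3), round(ss_cols, 3), round(ss_rows, 3), round(ss_err, 3))
# 0.385 0.825 0.06 0.02
```

These match the R output below: tips 0.385, operators 0.825, coupons 0.060, residual 0.020 on (p − 1)(p − 2) = 6 d.f.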
Effects Model

    yijk = µ + αi + τj + βk + εijk,   i, j, k = 1, . . . , p

where
(i) yijk is the observation in row i, column k, using treatment j. (So
y243 = 10.0; y223 does not exist - only p² of them do.)
(ii) αi, τj, βk are the row, treatment and column effects, all summing to zero.
(iii) εijk is the random error.
Note that the model is additive in that there is no interaction effect: any
treatment has the same effect regardless of the levels of the other factors.
The sums of squares are computed as usual, with

    SSRows = p Σ_{i=1}^{p} α̂i²
    SSTr   = p Σ_{j=1}^{p} τ̂j²
    SSCol  = p Σ_{k=1}^{p} β̂k²
ANOVA Table
(i) Write down the theoretical ANOVA table for a Latin Square Design.
(ii) Give the rejection criteria for the hypothesis of equal treatment effects.
R - Code
> y <- c(9.3, 9.4, 9.2, 9.7, 9.3, 9.4, 9.6, 9.4,
+ 9.5, 10.0, 9.6, 9.8, 10.2, 9.7, 9.9, 10.0)
> operators <- as.factor(rep(1:4, each=4))
> coupons <- as.factor(rep(1:4, times=4))
> tips <- as.factor(c("A", "B", "C", "D", "B", "A", "D", "C",
+ "C","D", "A", "B", "D", "C", "B","A"))
> data <- data.frame(y, operators, coupons, tips)
> data
y operators coupons tips
1 9.3 1 1 A
2 9.4 1 2 B
3 9.2 1 3 C
4 9.7 1 4 D
5 9.3 2 1 B
6 9.4 2 2 A
7 9.6 2 3 D
8 9.4 2 4 C
9 9.5 3 1 C
10 10.0 3 2 D
Model Fit
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
tips 3 0.385 0.128333 38.5 0.0002585 ***
operators 3 0.825 0.275000 82.5 2.875e-05 ***
coupons 3 0.060 0.020000 6.0 0.0307958 *
Residuals 6 0.020 0.003333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA table the means are significantly different, so again we conclude
that tip 4 gives significantly different readings. Tips 2 and 3 seem significantly
different as well. Now the coupons don't seem to affect the readings, although it
appears that the operators do.
Check Assumptions
The usual model checks should be done - qqplot of residuals (to check normality),
plots of residuals against row labels, column labels, treatment labels (to check for
constant variances). Bartlett’s test can be carried out if normality is assured.
Exercise 1
Exercise 2
A cornflakes company wishes to test the market for a new product that is intended
to be eaten for breakfast. Primarily two factors are of interest, namely an
advertising campaign and the type of packaging used. Four alternative advertising
campaigns were considered:
- A: TV commercials,
- B: ads in the newspapers,
- C: a lottery in the individual packages,
- D: a free package (sent by mail to many families).
Four different kinds of packaging were chosen. They differed in the way the
product was described on the front of the packages:
- I: contains calcium, iron minerals, phosphorus and vitamin B,
- II: easy and fast to prepare,
- III: low cost food,
- IV: gives you energy to last for the whole day.
The investigation was carried out in four cities called 1, 2, 3 and 4. The following
results were obtained:
Exercise 2 - Contd
Graeco-Latin Square
Model
- Effects Model

      yijkl = µ + αi + τj + βk + θl + εijkl

- The additive model has terms for all four factors - coupons, operators, days
  and tips. Each is estimated by the sample average for that level of that
  factor, minus the overall average.
- For example the LSE of the effect of Tuesday is

      θ̂2 = (1/4)(9.2 + 9.3 + 10.0 + 10.0) − ȳ....

  and

      SSDays = p Σ_{l=1}^{p} θ̂l².
Model Fit in R
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
tips 3 0.385 0.128333 25.6667 0.012188 *
operators 3 0.825 0.275000 55.0000 0.004029 **
coupons 3 0.060 0.020000 4.0000 0.142378
days 3 0.005 0.001667 0.3333 0.804499
Residuals 3 0.015 0.005000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Exercise
To be given!
- In the RCBD, C='Complete' means that each block contains each treatment.
  E.g. each coupon is subjected to each of the 4 tips.
- Suppose that a coupon is only large enough that 3 tips can be used. Then
  the blocks would be 'incomplete'.
- One way to run the experiment is to randomly assign 3 tips to each block,
  perhaps requiring that each tip appears 3 times in total.
- There is a more efficient way. An incomplete block design is 'balanced' if any
  two treatments appear in the same block an equal number of times. This is
  then a Balanced Incomplete Block Design.
Model
Model is as for a RCBD:

    yij = µ + τi + βj + εij

As usual, the total sum of squares is

    SST = Σ_{i,j} (yij − ȳ..)²

on N − 1 d.f. The treatment SS depends on the 'adjusted total for the ith
treatment',

    Qi = yi. − (1/k) Σ_{j=1}^{b} nij y.j.

Here Σ_{j=1}^{b} nij y.j is the total of the block totals, counting only those
blocks that contain treatment i:
    Q1 = 28.7 − (1/3)(28.2 + 28.1 + 30.1) = −0.1000
    Q2 = 29.0 − (1/3)(28.1 + 29.3 + 30.1) = −0.1667
    Q3 = 28.1 − (1/3)(28.2 + 28.1 + 29.3) = −0.4333
    Q4 = 29.9 − (1/3)(28.2 + 29.3 + 30.1) = 0.7000

(As a check, it is always the case that Σ Qi = 0.)
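The adjusted totals, and the adjusted treatment sum of squares k Σ Qi²/(λa) they lead to, can be checked numerically. This is an illustrative Python sketch of the arithmetic; the course analysis itself is done in R below:

```python
# Treatment totals, and the totals of the blocks containing each treatment
y_i = [28.7, 29.0, 28.1, 29.9]
blocks_for = [[28.2, 28.1, 30.1],   # treatment 1 appears in blocks 1, 2, 4
              [28.1, 29.3, 30.1],   # treatment 2 in blocks 2, 3, 4
              [28.2, 28.1, 29.3],   # treatment 3 in blocks 1, 2, 3
              [28.2, 29.3, 30.1]]   # treatment 4 in blocks 1, 3, 4

k, lam, a = 3, 2, 4                  # block size, lambda, number of treatments
Q = [y - sum(bs) / k for y, bs in zip(y_i, blocks_for)]
ss_tr_adj = k * sum(q**2 for q in Q) / (lam * a)

print([round(q, 4) for q in Q])      # [-0.1, -0.1667, -0.4333, 0.7]
print(sum(Q))                        # ~ 0: the Qi always sum to zero
print(round(ss_tr_adj, 5))           # 0.26833
```

The value 0.26833 is exactly the adjusted tip sum of squares in the "correct" R analysis later in this section.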
Taking expectations,

    E[y..] = 12µ + r(τ1 + τ2 + τ3 + τ4) + k Σ_{j=1}^{4} βj = 12µ.

Then

    E[ȳ1. − ȳ..] = (3µ + 3τ1 + β1 + β2 + β4)/3 − 12µ/12
                 = τ1 + (β1 + β2 + β4)/3.

The block totals must be brought in, in order to adjust for the bias.
The idea is that we first estimate the block effects and then see how much of the
remaining variation is attributable to treatments. Doing it in the other order
results in something quite different.
Analysis in R
Correct Analysis:
> data=c(9.3,9.4,10,9.3,9.8,9.9,9.2,9.4,9.5,9.7,10,10.2)
> tip_s=as.factor(c(1,1,1,2,2,2,3,3,3,4,4,4))
> coupon_s=as.factor(c(1,2,4,2,3,4,1,2,3,1,3,4))
> g1 <- lm(data ~ coupon_s + tip_s)
> anova(g1)
Analysis of Variance Table
Response: data
Df Sum Sq Mean Sq F value Pr(>F)
coupon_s 3 0.90917 0.303056 29.3280 0.001339 **
tip_s 3 0.26833 0.089444 8.6559 0.020067 *
Residuals 5 0.05167 0.010333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Caution!
Incorrect Analysis:
> h1 <- lm(data ~ tip_s + coupon_s)
> anova(h1)
Analysis of Variance Table
Response: data
Df Sum Sq Mean Sq F value Pr(>F)
tip_s 3 0.56250 0.187500 18.145 0.004054 **
coupon_s 3 0.61500 0.205000 19.839 0.003311 **
Residuals 5 0.05167 0.010333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Inferences
To make inferences, we use the fact that the τ̂i are independent and equally
varied, with

    VAR[τ̂i] = kσ²/(λa),

so that

    se(τ̂i − τ̂j) = √(2k · MSE/(λa)).

In our example this is 0.0880, so single confidence intervals are
τ̂i − τ̂j ± t_{α/2,5} · 0.0880, and simultaneous Tukey-type intervals replace
t_{α/2,5} by qtukey(1 − α, 4, 5)/√2.
Exercise
Suppose that a chemical engineer thinks that the time of reaction of a chemical
process is a function of the type of catalyst employed. Four catalysts are currently
being investigated. The experimental procedure consists of selecting a batch of
raw material, loading the pilot plant, applying each catalyst in a separate run of
the pilot plant and observing the reaction time. Because variations in the batches
of raw material may affect the performance of the catalysts, the engineer decides
to use batches of raw material as blocks. However, each batch is only large
enough to permit three catalysts to be run. The balanced incomplete block design
for this experiment along with the observations recorded is given as follows:
Block (Batch of Raw Material)
Treatment (Catalyst) 1 2 3 4 yi.
1 73 74 - 71 218
2 - 75 67 72 214
3 73 75 68 - 216
4 75 - 72 75 222
y.j 221 224 207 218 870
Analyse the B.I.B.D data above.
A design for which v treatments are laid out in b blocks each of size k, no block
receiving a treatment more than once (nij ≤ 1), is called a PBIBD if the following
conditions are satisfied:
1. Each treatment is replicated the same number of times.
2. Any two treatments are either first associates or second associates, . . . , or
mth associates of each other. Two treatments which are ith associates of
each other occur together λi times (i = 1, 2, . . . , m). The number of ith
associates of any treatment is ni, where ni does not depend on the treatment
considered (i = 1, 2, . . . , m). Thus we have a new set of parameters
n1, n2, . . . , nm and λ1, λ2, . . . , λm. If m = 1, the PBIBD reduces to a BIBD.
3. Given two treatments which are ith associates of each other, the number of
treatments common to the class of jth associates of one and the class of kth
associates of the other is the same whatever pair of treatments we start
with and whatever the order in which we take them. This constant is denoted
p^i_jk. By definition p^i_jk = p^i_kj.
Parameters of PBIBD
Proofs of Restrictions
(a) Given any treatment, all other treatments are either its first associates,
second associates, . . . , or mth associates. Thus Σ_{j=1}^{m} nj = v − 1.
(b) This is obvious because both sides equal the number of observations.
(c) The number of pairs of treatments occurring together that may be formed so
as to include one particular treatment is always r(k − 1). Again, any
treatment occurs λi times with each of the treatments of its ith associates,
and there are, in all, ni treatments which are ith associates of this treatment.
Hence the total number of pairs of treatments occurring together that can be
formed so as to include a particular treatment is always Σ_{j=1}^{m} nj λj.
Hence

    r(k − 1) = Σ_{j=1}^{m} nj λj.
Proofs Contd
(d) (i) In this case we consider two treatments α and β which are ith associates.
Now α has ni − 1 ith associates other than β. These ith associates of α can
occur in all or some of the m subgroups with respect to β. Thus

    p^i_i1 + p^i_i2 + · · · + p^i_im = ni − 1.

(ii) In this case we consider two treatments α and β which are ith associates.
Now α (or β) has nj jth associates. Note α is an ith associate of β and β is an
ith associate of α. Also α is not a jth associate of β nor is β a jth associate
of α. Thus

    p^i_j1 + p^i_j2 + · · · + p^i_jm = nj,   (i ≠ j).

(e) Consider the group Gi of ni treatments which are ith associates of a given
treatment θ and the group Gj of nj treatments which are jth associates of θ.
Every treatment belonging to Gi has exactly p^i_jk kth associates among the
treatments of group Gj. Hence the number of pairs of kth associates which can
be found by taking one treatment from Gi and one treatment from Gj is on one
hand ni p^i_jk and on the other hand nj p^j_ik. Similarly nj p^j_ik = nk p^k_ij.
Thus

    ni p^i_jk = nj p^j_ik = nk p^k_ij.
Construction of a PBIBD
Consider a cube whose eight corners are numbered arbitrarily. Assign blocks to the
numbers (treatments) appearing in each of the six faces of the solid cube. The
resulting design is a P.B.I.B. design with parameters:
v = 8, b = 6, r = 3, k = 4
λ1 = 2, λ2 = 1, λ3 = 0
n1 = 3, n2 = 3, n3 = 1.
The blocks are:
Bl1 : 1, 2, 3, 4
Bl2 : 5, 6, 7, 8
Bl3 : 1, 3, 5, 7
Bl4 : 2, 4, 6, 8
Bl5 : 1, 2, 5, 6
Bl6 : 3, 4, 7, 8
The labels for the corners are treatments in the blocks.
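The stated parameters can be verified by brute force from the six blocks (an illustrative Python check of the construction):

```python
from itertools import combinations

blocks = [{1, 2, 3, 4}, {5, 6, 7, 8}, {1, 3, 5, 7},
          {2, 4, 6, 8}, {1, 2, 5, 6}, {3, 4, 7, 8}]

# co-occurrence count for every pair of treatments
count = {pair: sum(set(pair) <= bl for bl in blocks)
         for pair in combinations(range(1, 9), 2)}

# every treatment is replicated r = 3 times, and every block has size k = 4
assert all(sum(t in bl for bl in blocks) == 3 for t in range(1, 9))
assert all(len(bl) == 4 for bl in blocks)

# each treatment has three first associates (lambda1 = 2), three second
# associates (lambda2 = 1), and one third associate (lambda3 = 0)
for t in range(1, 9):
    cts = [count[tuple(sorted((t, s)))] for s in range(1, 9) if s != t]
    assert sorted(cts) == [0, 1, 1, 1, 2, 2, 2]

print("v=8, b=6, r=3, k=4; lambda=(2,1,0), n=(3,3,1) verified")
```

Geometrically, the third associate of each corner (co-occurring in no block) is the diagonally opposite corner of the cube.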
Association Scheme
Thus

    P^1 = [ 0 2 0
            2 0 1
            0 1 0 ]

Similarly

    P^2 = [ 2 0 1
            0 2 0
            1 0 0 ]

and

    P^3 = [ 0 3 0
            3 0 0
            0 0 0 ]
PART IV-1:
INTRODUCTION TO FACTORIAL DESIGNS
Factorial Designs
Introduction
- In a factorial design the cells consist of all possible combinations of the levels
  of the factors under study.
- The simplest types of factorial designs involve only two factors or sets of
  treatments. There are a levels of factor A and b levels of factor B, and these
  are arranged in a factorial design; that is, each replicate of the experiment
  contains all ab treatment combinations. In general, there are n replicates.
- Factorial designs accentuate the factor effects, allow for estimation of
  inter-dependency of effects (or interactions), and are the first technique in
  the category of what is called treatment design.
- By examining all possible combinations of factor levels, the number of
  replicates of a specific level of one factor is increased by the product of the
  number of levels of all other factors in the design, and thus the same power
  or precision can be obtained with fewer replicates.
- If the effect of one factor changes depending on the level of another factor, it
  will be seen in a factorial plan. This phenomenon will be missed in the
  classical approach where each factor is only varied at constant levels of the
  other factors.
Then the total number of treatment combinations is 2 × 2 × · · · × 2 (n times),
that is, 2^n. Any treatment combination will be denoted by

    a1^X1 a2^X2 · · · an^Xn                                             (1)

For example, if X1 = 1 and X2 = X3 = · · · = Xn = 0, then the treatment defined
by (1) means

    a1^1 a2^0 a3^0 · · · an^0 = a1

The treatment combination in (1) can be written in the form

    (a1^0, a1^1) ⊗ (a2^0, a2^1) ⊗ (a3^0, a3^1) ⊗ · · · ⊗ (an^0, an^1)   (2)

where ⊗ means symbolic direct product.
Treatment Effects
The treatment effects will be denoted by A1^X1 A2^X2 · · · An^Xn where

    Xi = 1 or 0,   i = 1, 2, 3, . . . , n                               (4)

For example, if X1 = 1 and X2 = X3 = · · · = Xn = 0, then (4) means A1, which is
the main effect of factor A1. Similarly if X1 = X2 = 1 and
X3 = X4 = · · · = Xn = 0 then (4) will mean A1A2, the two-factor
interaction between factor A1 and factor A2. Similarly if X1 = X2 = X3 = 1 and
X4 = X5 = · · · = Xn = 0 then (4) will mean A1A2A3, the three-factor
interaction between factors A1, A2 and A3.
We have
1. n main effects A1, A2, . . . , An
2. (n choose 2) two-factor interactions A1A2, A1A3, . . . , An−1An
3. (n choose 3) three-factor interactions A1A2A3, A1A2A4, . . . , An−2An−1An, etc.
Including the grand average of all observations, the total number of treatment
effects is 2^n. All these factorial effects are given by the symbolic direct product

    (I, A1) ⊗ (I, A2) ⊗ · · · ⊗ (I, An)
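For small n the 2^n factorial effects can be enumerated mechanically from this symbolic product (an illustrative Python sketch):

```python
from itertools import product

def factorial_effects(n):
    """Expand (I, A1) x (I, A2) x ... x (I, An) into all 2^n effects."""
    names = []
    for xs in product([0, 1], repeat=n):           # each Xi is 0 or 1
        label = "".join(f"A{i+1}" for i, x in enumerate(xs) if x == 1)
        names.append(label or "I")                 # all Xi = 0 -> grand average
    return names

effects = factorial_effects(3)
print(effects)
# ['I', 'A3', 'A2', 'A2A3', 'A1', 'A1A3', 'A1A2', 'A1A2A3']
print(len(effects))  # 8 = 2^3
```

Counting labels by the number of factors they involve recovers the n main effects, the (n choose 2) two-factor interactions, and so on.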
Definitions
Interaction Plots
[Figure: interaction plots of mean of y against the levels of each of the two
factors, with one line per level of the other factor; the vertical axis runs from
60 to 160.]
Main Effects
- The average lifetimes at the 4 combinations of levels are

                    Temperature level (B)
    Type (A)        1 (LO)     2 (HI)
    1               134.75     57.5
    2               155.75     49.5

- The 'main effect of A' is the change in response caused by changing the level
  of A. Here it is estimated by the difference in the average responses at the
  two levels of Factor A:

      A = (155.75 + 49.5)/2 − (134.75 + 57.5)/2 = 6.5.

- Similarly

      B = (57.5 + 49.5)/2 − (134.75 + 155.75)/2 = −91.75.

- Because of the interactions, these main effects are misleading. At the low
  level of B, the effect of A is 155.75 − 134.75 = 21.00. At the high level, it is
  49.5 − 57.5 = −8.0. The 'interaction effect' is measured by half the
  difference between these two:

      AB = (−8.0 − 21.00)/2 = −14.5.
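These effect calculations are easy to verify (illustrative Python, mirroring the arithmetic above):

```python
# Cell means: m[(type level of A, temperature level of B)]
m = {(1, "LO"): 134.75, (1, "HI"): 57.5,
     (2, "LO"): 155.75, (2, "HI"): 49.5}

A = (m[(2, "LO")] + m[(2, "HI")]) / 2 - (m[(1, "LO")] + m[(1, "HI")]) / 2
B = (m[(1, "HI")] + m[(2, "HI")]) / 2 - (m[(1, "LO")] + m[(2, "LO")]) / 2

# effect of A at each level of B, and half their difference
A_lo = m[(2, "LO")] - m[(1, "LO")]
A_hi = m[(2, "HI")] - m[(1, "HI")]
AB = (A_hi - A_lo) / 2

print(A, B, A_lo, A_hi, AB)  # 6.5 -91.75 21.0 -8.0 -14.5
```

The sign flip between A_lo and A_hi is exactly the interaction that makes the main effect of A misleading here.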
Model
- The effects model, including terms for interaction, is that the kth observation
  at level i of A, j of B is

      yijk = µ + τi + βj + (τβ)ij + εijk

  i = 1, . . . , a = levels of Factor A
  j = 1, . . . , b = levels of Factor B
  k = 1, . . . , n.

- Constraints: Σi τi = 0 (average effect of levels of A is 0), Σj βj = 0 (average
  effect of levels of B is 0), and average interactions Σi (τβ)ij = Σj (τβ)ij = 0.
- Reasonable estimates of these effects, obeying these constraints, are:

      µ̂ = ȳ...
      τ̂i = ȳi.. − ȳ...
      β̂j = ȳ.j. − ȳ...
      (τ̂β)ij = (ȳij. − ȳ...) − τ̂i − β̂j = ȳij. − ȳi.. − ȳ.j. + ȳ...
Sarguta (SoM) Design and Analysis February - May, 2021 99 / 263
Factorial Designs Two-Factor Factorial Design
Sums of Squares
The effect estimates are shown to be the LSEs in the usual way:
- Decompose SST as:

      SST = Σ_{i,j,k} (yijk − ȳ...)²
          = Σ_{i,j,k} (τ̂i + β̂j + (τ̂β)ij + (yijk − ȳij.))²
          = nb Σi τ̂i² + na Σj β̂j² + n Σ_{i,j} (τ̂β)ij² + Σ_{i,j,k} (yijk − ȳij.)²
          = SSA + SSB + SSAB + SSE
- Minimize

      S(µ, τ, β, (τβ)) = Σ_{i,j,k} (yijk − E[yijk])²
                       = Σ_{i,j,k} (yijk − µ − τi − βj − (τβ)ij)²

- To test for interactions, compute

      F0 = MSAB/MSE ∼ F(a−1)(b−1), ab(n−1)

- If the interactions are not significant then it makes sense to ask about the
  significance of the levels of the factors, using MSA/MSE, etc.
The expected values of the mean squares turn out to be what one would expect:

    E[MSE]  = σ²
    E[MSAB] = σ² + n Σ_{i,j} (τβ)ij² / ((a − 1)(b − 1))
    E[MSA]  = σ² + nb Σi τi² / (a − 1)
    E[MSB]  = σ² + na Σj βj² / (b − 1)
and

    SSB = (1/an) Σ_{j=1}^{b} y.j.² − y...²/abn
It is convenient to obtain the SSAB in two stages. First we compute the sum of
squares between the ab cell totals, which is called the sum of squares due to
"subtotals":

    SSSubtotals = (1/n) Σ_{i=1}^{a} Σ_{j=1}^{b} yij.² − y...²/abn

This sum of squares also contains SSA and SSB. Therefore, the second step is to
compute SSAB as

    SSAB = SSSubtotals − SSA − SSB

We may compute SSE by subtraction as

    SSE = SST − SSAB − SSA − SSB

or

    SSE = SST − SSSubtotals
Task
Write down the theoretical ANOVA table for a two factor factorial experiment
with n observations per cell.
Example Contd
Sums of Squares

    SST = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{n} yijk² − y...²/abn
        = (130)² + (155)² + · · · + (60)² − (3799)²/36 = 77,646.97

    SSMaterial = (1/bn) Σ_{i=1}^{a} yi..² − y...²/abn
               = (1/12)[(998)² + (1300)² + (1501)²] − (3799)²/36 = 10,683.72

    SSTemperature = (1/an) Σ_{j=1}^{b} y.j.² − y...²/abn
                  = (1/12)[(1738)² + (1291)² + (770)²] − (3799)²/36 = 39,118.72
    SSInteraction = (1/n) Σ_{i=1}^{a} Σ_{j=1}^{b} yij.² − y...²/abn
                    − SSMaterial − SSTemperature
                  = (1/4)[(539)² + (229)² + · · · + (342)²] − (3799)²/36
                    − 10,683.72 − 39,118.72 = 9,613.78

and

    SSE = SST − SSMaterial − SSTemperature − SSInteraction = 18,230.75
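All of these figures can be recomputed from the raw data vector used in the R code below (an illustrative Python cross-check of the hand calculations):

```python
y = [130, 155, 74, 180, 34, 40, 80, 75, 20, 70, 82, 58,
     150, 188, 159, 126, 136, 122, 106, 115, 25, 70, 58, 45,
     138, 110, 168, 160, 174, 120, 150, 139, 96, 104, 82, 60]
a, b, n = 3, 3, 4   # material types, temperature levels, replicates
# y is type-major: for each type, 4 replicates at each of the 3 temperatures
cell = [[y[12*i + 4*j: 12*i + 4*j + 4] for j in range(b)] for i in range(a)]

mat_tot  = [sum(map(sum, cell[i])) for i in range(a)]                 # 998, 1300, 1501
temp_tot = [sum(sum(cell[i][j]) for i in range(a)) for j in range(b)] # 1738, 1291, 770

cf = sum(y)**2 / (a*b*n)                       # correction factor y...^2 / abn
ss_t   = sum(v*v for v in y) - cf
ss_mat = sum(t*t for t in mat_tot) / (b*n) - cf
ss_tmp = sum(t*t for t in temp_tot) / (a*n) - cf
ss_sub = sum(sum(c)**2 for row in cell for c in row) / n - cf
ss_int = ss_sub - ss_mat - ss_tmp
ss_e   = ss_t - ss_sub

print(round(ss_t, 2), round(ss_mat, 2), round(ss_tmp, 2),
      round(ss_int, 2), round(ss_e, 2))
# 77646.97 10683.72 39118.72 9613.78 18230.75
```

These agree with both the hand calculations above and the R ANOVA table that follows.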
Solution - R
> y<-c(130,155,74,180,34,40,80,75,20,70,82,
+ 58,150,188,159,126,136,122,106,115,25,70,
+ 58,45,138,110,168,160,174,120,150,139,96,104,82,60)
> type <- as.factor(rep(1:3,each=12))
> temp <- as.factor(rep(c(15,70,125),each=4,times=3))
> data <- data.frame(type, temp, y)
> means <- matrix(nrow=3,ncol=3)
> for(i in 1:3) {for (j in 1:3) means[i,j] <-
+ mean(y[type==i & temp == c(15,70,125)[j]])}
> means
[,1] [,2] [,3]
[1,] 134.75 57.25 57.5
[2,] 155.75 119.75 49.5
[3,] 144.00 145.75 85.5
> interaction.plot(type,temp,y)
> interaction.plot(temp,type,y)
Interaction Plots
[Figure: interaction.plot(type, temp, y) and interaction.plot(temp, type, y) -
mean of y (60 to 160) against type for each temp (15, 70, 125), and against
temp for each type (1, 2, 3).]
Interpretation
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
type 2 10684 5341.9 7.9114 0.001976 **
temp 2 39119 19559.4 28.9677 1.909e-07 ***
type:temp 4 9614 2403.4 3.5595 0.018611 *
Residuals 27 18231 675.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I As suspected, the interaction effects are quite significant. There is no battery
type which is ’best’ at all temperatures.
I If interactions were NOT significant one could compare the µ + τᵢ by seeing which of the differences µ̂ + τ̂ᵢ = ȳᵢ.. were significantly different from each other (using se(ȳᵢ.. − ȳₖ..) = √(2MSE/(nb))).
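This standard error is easy to compute from the ANOVA output above (a sketch; the numbers are taken from the table, with n = 4 replicates per cell and b = 3 temperature levels):

```r
# se for comparing two battery-type means, ybar_i.. - ybar_k..
MSE <- 675.2        # residual mean square from the ANOVA table
n <- 4; b <- 3      # replicates per cell; levels of the other factor
se <- sqrt(2 * MSE / (n * b))
se                  # about 10.61
```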
µi2 = µ + τi + β2 + (τ β)i2 ,
I One computes
  F₀ = (SSRed − SSFull)/MSE(Full) ∼ F_{1, (a−1)(b−1)−1}.
I The difference
  SSN = SSRed − SSFull
is called the 'SS for non-additivity', and uses 1 d.f. to estimate the one parameter γ.
I The ANOVA becomes
  Source   SS    df                   MS
  A        SSA   a − 1                MSA = SSA/(a − 1)
  B        SSB   b − 1                MSB = SSB/(b − 1)
  N        SSN   1                    MSN = SSN/1
  Error    SSE   (a − 1)(b − 1) − 1   MSE = SSE/df(Err)
  Total    SST   ab − 1
Example
Sums of Squares
SSB = (1/a) Σ_{j=1}^{b} y_{.j}² − y_{..}²/(ab) = (1/3)[9² + 6² + 13² + 6² + 10²] − 44²/((3)(5)) = 11.60

SST = Σ_{i=1}^{a} Σ_{j=1}^{b} y_{ij}² − y_{..}²/(ab) = 166 − 129.07 = 36.93

and

SSResidual = SST − SSA − SSB = 36.93 − 23.33 − 11.60 = 2.00
SSN = [Σ_{i=1}^{a} Σ_{j=1}^{b} y_{ij} y_{i.} y_{.j} − y_{..}(SSA + SSB + y_{..}²/(ab))]² / (ab · SSA · SSB)
    = [7236 − (44)(23.33 + 11.60 + 129.07)]² / ((3)(5)(23.33)(11.60))
    = [20.00]² / 4059.42
    = 0.0985
and the error sum of squares is
SSE = SSResidual − SSN = 2.00 − 0.0985 = 1.90
ANOVA Table
Conclusion
In concluding this section, we note that the two-factor factorial model with one
observation per cell looks exactly like the randomized complete block model. In
fact, the Tukey single-degree-of-freedom test for nonadditivity can be directly
applied to test for interaction in the randomized block model. However, remember
that the experimental situations that lead to the randomized block and factorial
models are very different. In the factorial model, all ab runs have been made in
random order, whereas in the randomized block model, randomization occurs only
within the block. The blocks are a randomization restriction. Hence, the manner
in which the experiments are run and the interpretation of the two models are
quite different.
Factorial Designs The General Factorial Design
Three-factor Factorial
Model:
  y_{ijkl} = µ + τᵢ + βⱼ + γₖ + (τβ)ᵢⱼ + (τγ)ᵢₖ + (βγ)ⱼₖ + (τβγ)ᵢⱼₖ + ε_{ijkl}
for
i = 1, 2, . . . , a
j = 1, 2, . . . , b
k = 1, 2, . . . , c
l = 1, 2, . . . , n
A three-factor example
A soft drink bottler is interested in obtaining more uniform fill heights in the
bottles produced by his manufacturing process. The filling machine theoretically
fills each bottle to the correct target height, but in practice, there is variation
around this target, and the bottler would like to understand the sources of this
variability better and eventually reduce it. The process engineer can control three
variables during the filling process: the percent carbonation (A), the operating
pressure in the filler (B), and the bottles produced per minute or the line speed
(C). The pressure and speed are easy to control, but the percent carbonation is
more difficult to control during actual manufacturing because it varies with
product temperature. However, for purposes of an experiment, the engineer can
control carbonation at three levels: 10, 12, and 14 percent. She chooses two levels
for pressure (25 and 30 psi) and two levels for line speed (200 and 250 bpm). She
decides to run two replicates of a factorial design in these three factors, with all 24
runs taken in random order. The response variable observed is the average
deviation from the target fill height observed in a production run of bottles at
each set of conditions.
Solve in R
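The data-entry step is not reproduced on this slide. A minimal sketch of how the 24-run layout could be set up in R (the response values below are placeholders to be filled in from the data table, which is not shown here):

```r
# Layout sketch for the 3 x 2 x 2 bottling experiment with n = 2 replicates
carbon <- as.factor(rep(c(10, 12, 14), each = 4, times = 2))
press  <- as.factor(rep(c(25, 30),     each = 2, times = 6))
speed  <- as.factor(rep(c(200, 250),   times = 12))
y <- numeric(24)   # placeholder: fill in the observed fill-height deviations
data <- data.frame(carbon, press, speed, y)
table(carbon, press, speed)   # each of the 12 cells appears twice
```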
Plots
[Figure: main-effects plot of mean y versus the factors carbon, press, and speed, and interaction plots of carbon:press, carbon:speed, and press:speed.]
Analysis
> g<-lm(y ~ carbon + press + speed + carbon*press
+ + carbon*speed + press*speed + carbon*press*speed)
> anova(g)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
carbon 2 252.750 126.375 178.4118 1.186e-09 ***
press 1 45.375 45.375 64.0588 3.742e-06 ***
speed 1 22.042 22.042 31.1176 0.0001202 ***
carbon:press 2 5.250 2.625 3.7059 0.0558081 .
carbon:speed 2 0.583 0.292 0.4118 0.6714939
press:speed 1 1.042 1.042 1.4706 0.2485867
carbon:press:speed 2 1.083 0.542 0.7647 0.4868711
Residuals 12 8.500 0.708
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion
I It seems that interactions are largely absent, and that all three main effects
are significant. In particular, the low level of pressure results in smaller mean
deviations from the target.
I We see that the percentage of carbonation, operating pressure, and line
speed significantly affect the fill volume. The carbonation-pressure interaction
F ratio has a P-value of 0.0558, indicating some interaction between these
factors.
Task: Carry out an analysis of the residuals.
The 2k Factorial Design
PART IV-2:
THE 2^k FACTORIAL DESIGN
The 2k Factorial Design Introduction
Introduction
The 2k Factorial Design 22 Factorials
2^2 Factorials
I Two factors (A and B), each at two levels - low (’-’) and high (’+’). The
number of replicates = n.
I Example - investigate yield (y ) of a chemical process when the concentration
of a reactant (the primary substance producing the yield) - factor A - and
amount of a catalyst (to speed up the reaction) - factor B - are changed.
E.g. nickel is used as a ’catalyst’, or a carrier of hydrogen in the
hydrogenation of oils (the reactants) for use in the manufacture of margarine.
Factor n = 3 replicates
A B I II III Total Label
- - 28 25 27 80 (1)
+ - 36 32 32 100 a
- + 18 19 23 60 b
+ + 31 30 29 90 ab
2^2 Factorial
I Notation
I Effects model
A = A2 − A1 .
B = [b + ab − a − (1)]/(2n),
giving A = 8.33, B = −5.0, AB = 1.67.
Solution in R
> A <- c(-1, 1, -1,1)
> B <- c(-1, -1, 1, 1)
> I <- c(28, 36, 18, 31)
> II <- c(25, 32, 19, 30)
> III <- c(27, 32, 23, 29)
> data <- data.frame(A, B, I, II, III)
> data
A B I II III
1 -1 -1 28 25 27
2 1 -1 36 32 32
3 -1 1 18 19 23
4 1 1 31 30 29
> #Compute sums for each combination
> sums <- apply(data[,3:5], 1, sum)
> names(sums) <- c("(1)", "(a)", "(b)", "(ab)")
> sums
(1) (a) (b) (ab)
80 100 60 90
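Continuing the session, the effect estimates quoted above can be recovered from these treatment totals (a sketch; `sums` is redefined here so the snippet is self-contained):

```r
# Effect estimates from the treatment totals, n = 3 replicates
sums <- c("(1)" = 80, "(a)" = 100, "(b)" = 60, "(ab)" = 90)
n <- 3
A.hat  <- unname((sums["(ab)"] + sums["(a)"] - sums["(b)"] - sums["(1)"]) / (2 * n))
B.hat  <- unname((sums["(ab)"] + sums["(b)"] - sums["(a)"] - sums["(1)"]) / (2 * n))
AB.hat <- unname((sums["(ab)"] - sums["(a)"] - sums["(b)"] + sums["(1)"]) / (2 * n))
round(c(A = A.hat, B = B.hat, AB = AB.hat), 2)  # 8.33, -5.00, 1.67
```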
Interaction Plots
> ybar <- sums/3
> par(mfrow=c(1,2))
> interaction.plot(A, B, ybar)
> interaction.plot(B, A, ybar)
[Figure: interaction plots of mean ybar versus A (lines for B = −1, 1) and versus B (lines for A = −1, 1).]
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
factorA 1 208.333 208.333 53.1915 8.444e-05 ***
factorB 1 75.000 75.000 19.1489 0.002362 **
factorA:factorB 1 8.333 8.333 2.1277 0.182776
Residuals 8 31.333 3.917
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual Analysis
[Figure: residual diagnostics for the fitted model — residuals vs fitted values, Scale-Location, and standardized residuals vs factor levels (constant leverage); observations 11, 2 and 3 are flagged.]
Contrasts
I The estimates of the effects have used only the totals ab, a, b and (1), each of which is the sum of n = 3 independent terms. Then
  A  = [ab + a − b − (1)]/(2n) = CA/(2n),
  B  = [ab − a + b − (1)]/(2n) = CB/(2n),
  AB = [ab − a − b + (1)]/(2n) = CAB/(2n),
where CA, CB, CAB are orthogonal contrasts (why?) in ab, a, b and (1).
I In our previous notation, the SS for Factor A (we might have written it as bn Σᵢ Âᵢ²) is
  SSA = 2n(Â₁² + Â₂²) = 4nÂ₂² = nÂ² = CA²/(4n),
and similarly
  SSB = CB²/(4n), SSAB = CAB²/(4n).
In this way SSA = [90 + 100 − 60 − 80]²/12 = 208.33.
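A quick numeric check of the contrast formula, using the cell totals from the yield example:

```r
# SS for factor A from its contrast: SSA = CA^2 / (4n)
n <- 3
CA <- 90 + 100 - 60 - 80   # contrast in the totals ab, a, b, (1)
SSA <- CA^2 / (4 * n)
round(SSA, 2)              # 208.33, matching the ANOVA table
```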
The 2k Factorial Design 2k Factorials
2^k Factorials
Effect
        I   A   B   C   AB  AC  BC  ABC
  (1)   +   -   -   -   +   +   +   -
  a     +   +   -   -   -   -   +   +
  b     +   -   +   -   -   +   -   +
  ab    +   +   +   -   +   -   -   -
  c     +   -   -   +   +   -   -   +
  ac    +   +   -   +   -   +   -   -
  bc    +   -   +   +   -   -   +   -
  abc   +   +   +   +   +   +   +   +
I Interpretation: Assign the appropriate signs to the combinations (1), . . . , abc.
Effect estimates are
  A = {[a + ab + ac + abc] − [(1) + b + c + bc]}/(4n),
etc.,
  ABC = {[a + b + c + abc] − [(1) + ab + ac + bc]}/(4n),
all with 2^{k−1} n in the denominator.
Sum of Squares
SSABC = {[a + b + c + abc] − [(1) + ab + ac + bc]}² / (8n).
I The sums of squares are all on 1 d.f. (including SSI, which uses the 1 d.f. usually subtracted from N = 2^k n for the estimation of the overall mean µ), so that SSE, obtained by subtraction, is on N − 2^k = 2^k(n − 1) d.f.
I The F-ratio to test the effect of factor A is
  F₀ = MSA/MSE,
where MSA = SSA (on 1 d.f.) and MSE = SSE/df(SSE).
Replicate n = 1
Data
> y<-c(45,71,48,65,68,60,80,65,43,100,45,104,75,86,70,96)
> A <-as.factor(rep(c(-1,1,-1,1),4))
> B <- as.factor(rep(c(-1, -1, 1, 1),4))
> C <- as.factor(rep(c(-1, -1, -1, -1,1,1,1,1),2))
> D <- as.factor(c(-1, -1, -1, -1,-1, -1, -1, -1,1,1,1,1,1,1,1,1))
> data<-data.frame(A,B,C,D,y)
> data
A B C D y
1 -1 -1 -1 -1 45
2 1 -1 -1 -1 71
3 -1 1 -1 -1 48
4 1 1 -1 -1 65
5 -1 -1 1 -1 68
6 1 -1 1 -1 60
7 -1 1 1 -1 80
8 1 1 1 -1 65
9 -1 -1 -1 1 43
10 1 -1 -1 1 100
11 -1  1 -1  1  45
12  1  1 -1  1 104
13 -1 -1  1  1  75
14  1 -1  1  1  86
15 -1  1  1  1  70
16  1  1  1  1  96
ANOVA
> g <- lm(y ~(A+B+C+D)^4)
> anova(g)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56
B 1 39.06 39.06
C 1 390.06 390.06
D 1 855.56 855.56
A:B 1 0.06 0.06
A:C 1 1314.06 1314.06
A:D 1 1105.56 1105.56
B:C 1 22.56 22.56
B:D 1 0.56 0.56
C:D 1 5.06 5.06
A:B:C 1 14.06 14.06
A:B:D 1 68.06 68.06
A:C:D              1   10.56   10.56
B:C:D              1   27.56   27.56
A:B:C:D            1    7.56    7.56
Residuals          0    0.00
Effects - R
> g$effects
(Intercept)          A1          B1          C1          D1       A1:B1
    -280.25      -43.25       -6.25       19.75       29.25        0.25
      A1:C1       A1:D1       B1:C1       B1:D1       C1:D1    A1:B1:C1
     -36.25       33.25       -4.75       -0.75       -2.25        3.75
   A1:B1:D1    A1:C1:D1    B1:C1:D1 A1:B1:C1:D1
      -8.25        3.25        5.25        2.75
Note that these are twice as large in absolute value as those computed, and the
signs sometimes differ. This is because of R’s definition of ’effect’, and makes no
difference for comparing their absolute values.
[Figure: half-normal plot of the absolute effect estimates against theoretical quantiles; A1, A1:C1, A1:D1, D1 and C1 stand well above the cluster of remaining effects near zero.]
The significant terms seem to be A, C, D and the interactions AC, AD. So let's just drop B and fit all terms not involving B.
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56 83.3677 1.667e-05 ***
C 1 390.06 390.06 17.3844 0.0031244 **
D 1 855.56 855.56 38.1309 0.0002666 ***
A:C 1 1314.06 1314.06 58.5655 6.001e-05 ***
A:D 1 1105.56 1105.56 49.2730 0.0001105 ***
C:D 1 5.06 5.06 0.2256 0.6474830
A:C:D 1 10.56 10.56 0.4708 0.5120321
Residuals 8 179.50 22.44
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interaction Plots
[Figure: main-effects plot of mean y versus A, C, and D, and interaction plots of y versus A and versus C, each grouped by D.]
Although the main effects plot indicates that C high is best, the interaction plots
show that the best settings are A high, C low and D high.
The 2k Factorial Design Additional Concepts in Factorial Designs
Example: 3^2 Design
I The simplest 3^k factorial design is the 3^2 design, which has two factors, each at three levels.
I The 3^2 = 9 treatment combinations are: 00, 01, 10, 02, 20, 11, 12, 21, 22.
I There are eight degrees of freedom between these nine treatment combinations: the main effects A and B have 2 degrees of freedom each, and the AB interaction has 4 degrees of freedom.
I When a factor has three levels, it will have two degrees of freedom.
I Therefore, the associated sums of squares can be broken down into two
components: one that represents the linear effect (SSAL ) and the other that
represents the quadratic effect (SSAQ ).
I A linear effect is where the value of the response variable changes at almost a
constant rate over the different levels.
I A quadratic effect is where the value of the response variable changes along
the lines of a quadratic relationship.
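In R, the linear and quadratic components of a three-level factor can be separated with orthogonal polynomial contrasts (a sketch; the factor `dose` and response `y` below are hypothetical, not from any example in these notes):

```r
# Linear (.L) and quadratic (.Q) orthogonal contrasts for a 3-level factor
contr.poly(3)
# With an ordered factor, lm() uses these contrasts automatically:
dose <- ordered(rep(c(1, 2, 3), each = 4))   # hypothetical 3-level factor
set.seed(1)
y <- 2 + 1.5 * as.numeric(dose) + rnorm(12)  # hypothetical response
g <- lm(y ~ dose)
summary(g)$coefficients  # rows dose.L and dose.Q give the two 1-d.f. pieces
```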
Blocking and Confounding in the 2k Factorial Design
PART VI:
BLOCKING AND CONFOUNDING IN THE 2^k
FACTORIAL DESIGN
Blocking and Confounding in the 2k Factorial Design Introduction
Introduction
Blocking and Confounding in the 2k Factorial Design Blocking
Blocking
I Importance of blocking is to control nuisance factors - day of week, batch of
raw material, etc.
I Complete Blocks. This is the easy case. Suppose we run a 2^2 factorial experiment, with all 4 runs made on each of 3 days. So there are 3 replicates (= blocks), 12 observations. There is 1 d.f. for each of I, A, B, AB, leaving 8 d.f. Of these, 2 are used for blocks and the remaining 6 for SSE.
I The LSE's of the block effects are B̂lᵢ = ȳᵢ.. − ȳ... (block average minus grand average), and
  SSBlocks = Σ_{all obs'ns} B̂lᵢ² = 4 Σ_{i=1}^{3} B̂lᵢ².
I Note the randomization used here - it is only within each block. If we could
run the blocks in random order, for instance if they were batches of raw
material, then we would also do so.
Example
I Check on R.
Incomplete Blocks
I Consider again a 2^2 factorial, in which only 2 runs can be made in each of 2 days (the blocks). Which 2 runs?
I Consider
Block 1: (1), ab
Block 2: a, b.
I What is the LSE of the block effect? Think of the blocks as being at a ’high’
level - Block 1 - and a ’low’ level - Block 2.
I Then the estimate is
  B̂l = (average at high level) − (average at low level)
     = [ab + (1) − a − b]/2
     = {[ab − a] − [b − (1)]}/2
     = (effect of B when A is high) − (effect of B when A is low)
     = AB.
I We say that AB is confounded with blocks since the block effect and the AB
interaction are identical.
Blocking and Confounding in the 2k Factorial Design Confounding
Confounded Effect?
I The data have been modified by subtracting 20 from all Block 1 observations,
to simulate a situation where the first batch of formaldehyde is inferior.
Data - R
Output - R
A B C D ABCD y
1 -1 -1 -1 -1 1 25
2 1 -1 -1 -1 -1 71
3 -1 1 -1 -1 -1 48
4 1 1 -1 -1 1 45
5 -1 -1 1 -1 -1 68
6 1 -1 1 -1 1 40
7 -1 1 1 -1 1 60
8 1 1 1 -1 -1 65
9 -1 -1 -1 1 -1 43
10 1 -1 -1 1 1 80
11 -1 1 -1 1 1 25
12 1 1 -1 1 -1 104
13 -1 -1 1 1 1 55
14 1 -1 1 1 -1 86
15 -1 1 1 1 -1 70
16 1 1 1 1 1 76
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56
B 1 39.06 39.06
C 1 390.06 390.06
D 1 855.56 855.56
A:B 1 0.06 0.06
A:C 1 1314.06 1314.06
A:D 1 1105.56 1105.56
B:C 1 22.56 22.56
B:D 1 0.56 0.56
C:D 1 5.06 5.06
A:B:C 1 14.06 14.06
A:B:D 1 68.06 68.06
A:C:D 1 10.56 10.56
B:C:D              1   27.56   27.56
A:B:C:D            1 1387.56 1387.56
Residuals          0    0.00
[Figure: half-normal plot of the absolute effects; A1, A1:B1:C1:D1 (the confounded block effect), A1:C1, A1:D1, D1 and C1 stand out from the near-zero cluster.]
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56 89.757 5.600e-06 ***
C 1 390.06 390.06 18.717 0.0019155 **
D 1 855.56 855.56 41.053 0.0001242 ***
Blocks 1 1387.56 1387.56 66.581 1.889e-05 ***
A:C 1 1314.06 1314.06 63.054 2.349e-05 ***
A:D 1 1105.56 1105.56 53.049 4.646e-05 ***
Residuals 9 187.56 20.84
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
L1 = x1 + x4 + x5
L2 = x2 + x3 + x5
2^k factorials in 2^p blocks
L1 = x1 + x2 + x3
L2 = x1 + x3 + x4
with (L1 , L2 ) = (0, 0) in Block I, (L1 , L2 ) = (1, 0) in Block II, (L1 , L2 ) = (0, 1)
in Block III and (L1 , L2 ) = (1, 1) in Block IV.
Give the treatment combinations in the various blocks.
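The block assignment can be enumerated mechanically by evaluating L1 and L2 (mod 2) over all 16 treatment combinations (a sketch of one way to do this; treatment labels are built from the 0/1 factor levels):

```r
# Split the 2^4 design into 2^2 = 4 blocks via
# L1 = x1 + x2 + x3 (mod 2) and L2 = x1 + x3 + x4 (mod 2)
x <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1, x4 = 0:1)
L1 <- (x$x1 + x$x2 + x$x3) %% 2
L2 <- (x$x1 + x$x3 + x$x4) %% 2
lab <- apply(x, 1, function(r) {
  w <- paste(c("a", "b", "c", "d")[r == 1], collapse = "")
  if (w == "") "(1)" else w
})
blocks <- split(lab, list(L1, L2))
blocks   # blocks I-IV are (L1, L2) = (0,0), (1,0), (0,1), (1,1)
```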
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
blocks 3 3787.7 1262.56
A 1 1105.6 1105.56
B 1 826.6 826.56
C 1 885.1 885.06
D 1 33.1 33.06
A:B 1 95.1 95.06
A:C 1 1.6 1.56
A:D 1 540.6 540.56
B:C 1 217.6 217.56
C:D 1 60.1 60.06
A:B:D 1 3.1 3.06
B:C:D 1 22.6 22.56
A:B:C:D 1 5.1 5.06
Residuals 0 0.0
[Figure: half-normal plot of the absolute effects; blocks4, A, B, C, blocks3, blocks2, A:D and B:C stand out, while D and the remaining interactions sit near zero.]
It looks like we can drop the main effect of ’D’ if we keep some of its interactions.
R will, by default, estimate a main effect if an interaction is in the model. To fit
blocks, A, B, C, AB, AD, BC, CD but not D, we can add the SS and df for D to
those for Error.
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
blocks 3 3787.7 1262.56 156.5969 0.0001333 ***
A 1 1105.6 1105.56 137.1240 0.0003042 ***
B 1 826.6 826.56 102.5194 0.0005356 ***
C 1 885.1 885.06 109.7752 0.0004690 ***
D 1 33.1 33.06 4.1008 0.1128484
B:C 1 217.6 217.56 26.9845 0.0065401 **
A:B 1 95.1 95.06 11.7907 0.0264444 *
A:D 1 540.6 540.56 67.0465 0.0012117 **
C:D 1 60.1 60.06 7.4496 0.0524755 .
Residuals 4 32.2 8.06
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This would change MSE to (32.2 + 33.1)/5 = 13.06 on 5 d.f.
Interaction Plots
[Figure: interaction plots of mean y versus A (grouped by B and by D), versus B (grouped by C), and versus C (grouped by D).]
Partial Confounding
Example
R Code
When the levels of one factor (Blocks) make sense only within the levels of
another factor (Replicates) we say that the first is ’nested’ within the second. A
way to indicate this in R is as:
> h <- lm(y ~ Rep + Block%in%Rep + A + B + C + A*B
+ + A*C + B*C + A*B*C)
> anova(h)
Through the partial confounding we are able to estimate all interactions. It looks
like only A, C, and AC are significant.
R Output
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
Rep 1 3875 3875 1.5191 0.272551
A 1 41311 41311 16.1941 0.010079 *
B 1 218 218 0.0853 0.781987
C 1 374850 374850 146.9446 6.749e-05 ***
Rep:Block 2 458 229 0.0898 0.915560
A:B 1 3528 3528 1.3830 0.292529
A:C 1 94403 94403 37.0066 0.001736 **
B:C 1 18 18 0.0071 0.936205
A:B:C 1 6 6 0.0024 0.962816
Residuals 5 12755 2551
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual Analysis
Computing SSBlocks(Rep)
How is SSBlocks(Rep) computed? One way is to compute SSABC in Replicate I,
where this effect is confounded with blocks, and similarly SSAB in Replicate II, and
add them:
> # Calculate SS.Blocks.in.Rep as SS of effects
> #confounded with blocks:
> SSABC.confounded <- ((sum(y[Rep=="I" &
+ ABC==1])-sum(y[Rep=="I" & ABC==-1]))^2)/8
> SSAB.confounded <- ((sum(y[Rep=="II" &
+ AB==1])-sum(y[Rep=="II" & AB==-1]))^2)/8
> SS.Blocks.in.Rep <- SSABC.confounded + SSAB.confounded
> SSABC.confounded
[1] 338
> SSAB.confounded
[1] 120.125
> SS.Blocks.in.Rep
[1] 458.125
which is in agreement with the ANOVA output.
Equivalently, SSBlocks(Rep) = Σᵢ Σⱼ 4(ȳᵢⱼ. − ȳᵢ..)². Here ȳᵢⱼ. is the average in block j of replicate i and ȳᵢ.. is the overall average of that replicate, which is the only one in which that block makes sense.
Fractional Factorial Designs
PART V:
FRACTIONAL FACTORIAL DESIGNS
Fractional Factorial Designs Introduction
Example
I 2^3 factorial which could be run in two blocks, with ABC confounded with blocks:
Effect
        I   A   B   C   AB  AC  BC  ABC  Blocks
  (1)   +   -   -   -   +   +   +   -      1
  a     +   +   -   -   -   -   +   +      2
  b     +   -   +   -   -   +   -   +      2
  ab    +   +   +   -   +   -   -   -      1
  c     +   -   -   +   +   -   -   +      2
  ac    +   +   -   +   -   +   -   -      1
  bc    +   -   +   +   -   -   +   -      1
  abc   +   +   +   +   +   +   +   +      2
I If we run only block 2, then the design uses a, b, c, abc. These are those for
which ABC = +; since also I = + we say that the defining relation for the
design is I = ABC , and we refer to the ’word’ ABC as the ’generator’ of the
design.
I If we only used those combinations with A = +, then A = I would be the
defining relation and A the generator of the design.
Fractional Factorial Designs One-Half Fraction
One-half Fraction
I By running only block 2, our one-half fraction is
Effect
I A B C AB AC BC ABC
a + + - - - - + +
b + - + - - + - +
c + - - + + - - +
abc + + + + + + + +
I The estimates of the effects are obtained by applying the ± signs appropriately.
I We use [·]'s to distinguish these from the full factorial estimates:
  [A] = [a − b − c + abc]/2 = [BC],
  [B] = [−a + b − c + abc]/2 = [AC],
  [C] = [−a − b + c + abc]/2 = [AB].
I We say that these pairs of effects are aliases.
(1/2)(A + BC) = [a − b − c + abc]/2,
so that [A] and [BC] are each estimating the same thing as A + BC. This is denoted as [A] → A + BC, [B] → B + AC, etc.
I These relations can also be obtained by doing multiplication (mod 2) on the
defining relation:
I = ABC ⇒ A = A2 BC = BC ,
B = AB 2 C = AC ,
etc.
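This mod-2 multiplication is easy to mechanize (a sketch; `mod2_mult` is a hypothetical helper written for these notes, not part of any package):

```r
# Multiply two effect 'words' mod 2: letters appearing an even number
# of times cancel, e.g. A * ABC = A^2 BC = BC
mod2_mult <- function(w1, w2) {
  counts <- table(c(strsplit(w1, "")[[1]], strsplit(w2, "")[[1]]))
  odd <- names(counts)[counts %% 2 == 1]
  paste(sort(odd), collapse = "")
}
mod2_mult("A", "ABC")   # alias of A under I = ABC: "BC"
mod2_mult("B", "ABC")   # "AC"
```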
I The one-half fraction with defining relation ABC = I is called the principal
fraction. The other half, in which ABC = −, is called the alternate or
complementary fraction, and has defining relation:
I = −ABC .
I Complementary fraction
Effect
I A B C AB AC BC ABC
(1) + - - - + + + -
ab + + + - + - - -
ac + + - + - - - -
bc + - + + - + + -
[A]′ = [−(1) + ab + ac − bc]/2 = −[BC]′,
[B]′ = [−(1) + ab − ac + bc]/2 = −[AC]′,
[C]′ = ?? = ??
Moreover
A − BC = [−(1) + ab + ac − bc]/2,
so that
[A]′ → A − BC
[B]′ → B − AC
etc.
I In practice, it does not matter which fraction is actually used. Both fractions belong to the same family; that is, the two one-half fractions form a complete 2^3 design.
I Suppose that after running one of the one-half fractions of the 2^3 design, the other fraction was also run. Thus, all eight runs associated with the full 2^3 are now available.
I We may now obtain de-aliased estimates of all the effects by analyzing the
eight runs as a full 2^3 design in two blocks of four runs each. This could also
be done by adding and subtracting the linear combination of effects from the
two individual fractions.
I For example, consider [A] → A + BC and [A]′ → A − BC. This implies that
  (1/2)([A] + [A]′) = (1/2)(A + BC + A − BC) → A
and that
  (1/2)([A] − [A]′) = (1/2)(A + BC − A + BC) → BC.
Fractional Factorial Designs Design Resolution
Design Resolution
A design is of resolution R if no p-factor effect is aliased with another effect
containing less than R − p factors.
1. Resolution III designs. These are designs in which no main effects are aliased with any other main effect, but main effects are aliased with two-factor interactions and some two-factor interactions may be aliased with each other. A 2^{3−1} design with I = ABC is a resolution III design (2_{III}^{3−1}).
2. Resolution IV designs. These are designs in which no main effect is aliased with any other main effect or with any two-factor interaction, but two-factor interactions are aliased with each other. A 2^{4−1} design with I = ABCD is a resolution IV design (2_{IV}^{4−1}).
3. Resolution V designs. These are designs in which no main effect or two-factor interaction is aliased with any other main effect or two-factor interaction, but two-factor interactions are aliased with three-factor interactions. A 2^{5−1} design with I = ABCDE is a resolution V design (2_{V}^{5−1}).
In general, the resolution of a two-level fractional factorial design is equal to the
number of letters in the shortest word in the defining relation. Consequently, we
could call the preceding design types three-, four-, and five-letter designs,
respectively.
I = ABCD.
I Note that
The defining relationship implies that D = ABC.
The principal (or complementary) half of a 2^k factorial is a full 2^{k−1} factorial for k − 1 of the factors.
I Thus we can get the design by writing down a full 2^3 factorial for A, B and C, and computing the signs for D from D = ABC.
Resulting 2_{IV}^{4−1} Design
Effect
I A B C D=ABC y
(1) + - - - - 45
a + + - - + 100
b + - + - + 45
ab + + + - - 65
c + - - + + 75
ac + + - + - 60
bc + - + + - 80
abc + + + + + 96
The alias relationships are:
A = BCD, B = ACD, C = ABD, D = ABC , AB = CD, AC = BD, AD = BC .
Thus
[A] → A + BCD,
[AB] → AB + CD,
etc.
Analysis - R
For the analysis, first (try to) fit the full 2^4 model:
> A <- rep(c(-1,1), times=4)
> B <- rep(c(-1,1), each = 2, times=2)
> C <- rep(c(-1,1), each = 4)
> D <- A*B*C
> A <- as.factor(A)
> B <- as.factor(B)
> C <- as.factor(C)
> D <- as.factor(D)
> y <- c(45, 100, 45, 65, 75, 60, 80, 96)
> data <- data.frame(A, B, C, D, y)
Model
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 722.0 722.0
B 1 4.5 4.5
C 1 392.0 392.0
D 1 544.5 544.5
A:B 1 2.0 2.0
A:C 1 684.5 684.5
A:D 1 722.0 722.0
Residuals 0 0.0
Only one member of each aliased pair is exhibited; by default it is the shortest
word in the pair. From the ANOVA it looks like only A, C and D have significant
main effects.
> g$effects
> effects <- abs(g$effects[-1])
> qq <- qqnorm(effects, type="n") # "n" means no plotting
> text(qq$x, qq$y, labels = names(effects))
[Normal probability plot of the absolute effects: A1, A1:D1, A1:C1, D1 and C1 fall well above the line, while B1 and A1:B1 lie near zero.]
I The half normal plot also points to AD (= BC) and AC (= BD) as significant.
I Since B is not significant we wouldn’t expect BC or BD to be significant
either.
I We conclude that the factors of interest are A, C, D and the interactions AC,
AD.
Analysis
> h <- lm(y ~(A+C+D)^2)
> anova(h)
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 722.0 722.0 160.4444 0.05016 .
C 1 392.0 392.0 87.1111 0.06795 .
D 1 544.5 544.5 121.0000 0.05772 .
A:C 1 684.5 684.5 152.1111 0.05151 .
A:D 1 722.0 722.0 160.4444 0.05016 .
C:D 1 2.0 2.0 0.4444 0.62567
Residuals 1 4.5 4.5
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If the insignificant SSCD were combined with SSE , we would have MSE = 3.25 on
2 d.f., and all F-values would be 4.5/3.25 = 1.38 times as large.
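This pooling can be checked by refitting without the C:D term; a minimal sketch, reusing the data from the slides above:

```r
# 2^(4-1) data as before, with D = ABC
A <- rep(c(-1, 1), times = 4)
B <- rep(c(-1, 1), each = 2, times = 2)
C <- rep(c(-1, 1), each = 4)
D <- A * B * C
y <- c(45, 100, 45, 65, 75, 60, 80, 96)
# Dropping C:D pools its sum of squares into the error term
h2 <- lm(y ~ A + C + D + A:C + A:D)
anova(h2)["Residuals", ]   # 2 d.f., MSE = 3.25
```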
PART VI:
RESPONSE SURFACE METHODS AND DESIGNS
Response Surface Methods and Designs Introduction
Introduction
I The purpose of response surface methods (RSM) is to optimize a process or
system. RSM is a way to explore the effect of operating conditions (the
factors) on the response variable, y .
I As we map out the response surface of y we move our process as close as
possible towards the optimum, taking into account any constraints.
I Initially, when we are far away from the optimum, we will use factorial
experiments. As we approach the optimum then these factorials are replaced
with better designs that more closely approximate conditions at the optimum.
I For example, suppose that a chemical engineer wishes to find the levels of
temperature (x1 ) and pressure (x2 ) that maximize the yield (y ) of a process.
The process yield is a function of the levels of temperature and pressure say
y = f(x1, x2) + ε
Approximating Polynomials
I In most RSM problems, the form of the relationship between the response
and the independent variables is unknown.
I The first step in RSM is to find a suitable approximation for the true
functional relationship between y and the set of independent variables.
I Usually, a low-order polynomial in some region of the independent variables is
employed.
I If the response is well modeled by a linear function of the independent
variables, then the approximating function is the first-order model:
y = β0 + β1 x1 + β2 x2 + · · · + βk xk + ε
Response Surface Methods and Designs Approximation
I Experiments are conducted along the path of steepest ascent until no further
increase in response is observed. Then a new first-order model may be fit, a
new path of steepest ascent determined, and the procedure continued.
I Eventually, the experimenter will arrive in the vicinity of the optimum. This is
usually indicated by lack of fit of a first-order model.
I At that time, additional experiments are conducted to obtain a more precise
estimate of the optimum.
Response Surface Methods and Designs The Method of Steepest Ascent
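The summary output below can be reproduced with a sketch like the following; the nine observations (four factorial corners plus five center points) are assumed from the worked yield example:

```r
# First-order fit over the initial region, in coded units
x1 <- c(-1,  1, -1, 1, 0, 0, 0, 0, 0)
x2 <- c(-1, -1,  1, 1, 0, 0, 0, 0, 0)
y  <- c(39.3, 40.9, 40.0, 41.5, 40.3, 40.5, 40.7, 40.2, 40.6)
fit1 <- lm(y ~ x1 + x2)
coef(fit1)   # intercept 40.444, x1 0.775, x2 0.325
```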
Residuals:
Min 1Q Median 3Q Max
-0.244444 -0.044444 0.005556 0.055556 0.255556
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.44444 0.05729 705.987 5.45e-16 ***
x1 0.77500 0.08593 9.019 0.000104 ***
x2 0.32500 0.08593 3.782 0.009158 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I Employing the methods for two-level designs, we obtain the following model
in the coded variables:
ŷ = 40.44 + 0.775 x1 + 0.325 x2
I Before exploring along the path of steepest ascent, the adequacy of the
first-order model should be investigated.
I The 2^2 design with center points allows the experimenter to
Obtain an estimate of error.
Check for interactions (cross-product terms) in the model.
Check for quadratic effects (curvature).
I The replicates at the center can be used to calculate an estimate of error as
follows:
σ̂² = [(40.3)² + (40.5)² + (40.7)² + (40.2)² + (40.6)² − (202.3)²/5]/4 = 0.0430
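Numerically, this is just the sample variance of the center-point replicates (the fifth replicate, 40.2, is implied by the total 202.3):

```r
center <- c(40.3, 40.5, 40.7, 40.2, 40.6)   # five center-point runs
sigma2_hat <- var(center)                    # = (sum(center^2) - sum(center)^2/5)/4
round(sigma2_hat, 4)                         # 0.043
```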
I The first-order model assumes that the variables x1 and x2 have an additive
effect on the response.
I Interaction between the variables would be represented by the coefficient β12
of a cross-product term x1 x2 added to the model.
I The least squares estimate of this coefficient is just one-half the interaction
effect calculated as in an ordinary 2^2 factorial design, or
β̂12 = (1/4)[(1)(39.3) + (1)(41.5) + (−1)(40.0) + (−1)(40.9)] = (1/4)(−0.1) = −0.025
I The single-degree-of-freedom sum of squares for interaction is
SSInteraction = (−0.1)²/4 = 0.0025
I Comparing SSInteraction to σ̂² gives a lack-of-fit statistic
F = SSInteraction/σ̂² = 0.0025/0.0430 = 0.058
which is small, indicating that interaction is negligible.
Call:
lm(formula = y ~ x1 * x2)
Residuals:
1        2        3        4        5        6        7        8        9
-0.01944 -0.01944 -0.01944 -0.01944 -0.14444  0.05556  0.25556 -0.24444  0.15556
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.44444 0.06231 649.069 1.65e-13 ***
x1 0.77500 0.09347 8.292 0.000417 ***
x2 0.32500 0.09347 3.477 0.017713 *
x1:x2 -0.02500 0.09347 -0.267 0.799787
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 2.40250 2.40250 68.7520 0.0004166 ***
x2 1 0.42250 0.42250 12.0906 0.0177127 *
x1:x2 1 0.00250 0.00250 0.0715 0.7997870
Residuals 5 0.17472 0.03494
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I Since
F = SSPureQuadratic/σ̂² = 0.0027/0.0430 = 0.063
is small, there is no indication of a pure quadratic effect.
I Both the interaction and curvature checks are not significant.
I The standard error of β̂1 and β̂2 is
se(β̂i) = √(MSE/4) = √(σ̂²/4) = √(0.0430/4) = 0.10
for i = 1, 2. Both regression coefficients β̂1 and β̂2 are large relative to their
standard errors.
I At this point we have no reason to question the adequacy of the first order
model.
I To move away from the design center - the point (x1 = 0, x2 = 0) - along the
path of steepest ascent, we would move 0.775 units in the x1 direction for
every 0.325 units in the x2 direction.
I Thus, the path of steepest ascent passes through the point (x1 = 0, x2 = 0)
and has a slope 0.325/0.775.
I The engineer decides to use 5 minutes of reaction time as the basic step size.
Using the relationship between ξ1 and x1, we see that 5 minutes of reaction
time is equivalent to a step of ∆x1 = 1 in the coded variable x1.
I Therefore, the steps along the path of steepest ascent are ∆x1 = 1.0000 and
∆x2 = (0.325/0.775)∆x1 = 0.42.
I The engineer computes points along this path and observes the yields at
these points until a decrease in response is noted.
I Although the coded variables are easier to manipulate mathematically, the
natural variables must be used in running the process.
I Increases in response are observed through the tenth step; however, all steps
beyond this point result in a decrease in yield. Therefore, another first-order
model should be fit in the general vicinity of the point (ξ1 = 85, ξ2 = 175).
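The path itself is easy to tabulate in coded units; a sketch (the number of steps shown is illustrative):

```r
# Direction of steepest ascent is proportional to the fitted coefficients
b1 <- 0.775; b2 <- 0.325
delta_x1 <- 1.0                      # engineer's basic step (5 min reaction time)
delta_x2 <- (b2 / b1) * delta_x1     # about 0.42
steps <- 0:12
path <- data.frame(step = steps,
                   x1   = steps * delta_x1,
                   x2   = round(steps * delta_x2, 2))
head(path)
```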
I A new first-order model is fit around the point (ξ1 = 85, ξ2 = 175).
I The region of exploration for ξ1 is [80,90], and it is [170,180] for ξ2 .
I The coded variables are therefore
x1 = (ξ1 − 85)/5 and x2 = (ξ2 − 175)/5
I A 2^2 design with five center points is used.
> library(rsm)
> y1 <- c(76.5, 77, 78, 79.5, 79.9, 80.3, 80, 79.7, 79.8)
> x1 <- c(-1, -1, 1, 1, 0, 0, 0, 0, 0)
> x2 <- c(-1, 1, -1, 1, 0, 0, 0, 0, 0)
> fit_rsm <- rsm(y1 ~ FO(x1, x2))
> summary(fit_rsm)
Call:
rsm(formula = y1 ~ FO(x1, x2))
Response: y1
Response: y1
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 5.000 2.500 1.150 0.3882680
I The lack-of-fit check implies that the first-order model is not an adequate
approximation.
I This curvature in the true surface may indicate that we are near the optimum.
I At this point, additional analysis must be done to locate the optimum more
precisely.
Response Surface Methods and Designs Analysis of a Second-Order Response Surface
I Once we are close to the optimal solution, the plane will be rather flat.
I We expect that the optimal solution will be somewhere in our experimental
set-up (a peak). Hence, we expect curvature.
I A model that incorporates curvature is usually required to approximate the
response.
I In most cases, the second-order model
y = β0 + Σ_{i=1}^k βi xi + Σ_{i=1}^k βii xi² + ΣΣ_{i<j} βij xi xj + ε
is adequate.
I We now have more parameters, hence we need more observations to fit the
model.
[Contour plot and perspective plot of the fitted Yield surface against x1 and x2.]
ŷ = β̂0 + x′b + x′Bx (9)
I By substituting the stationary point xs (Equation (11)) into Equation (9), we
can find the predicted response at the stationary point as
ŷs = β̂0 + (1/2)xs′b (12)
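Given the vector b of first-order coefficients and the symmetric matrix B of second-order coefficients, the stationary point and Equation (12) can be sketched as follows; the numeric values are illustrative, not taken from the example:

```r
b0 <- 80.0                                 # illustrative intercept
b  <- c(1.0, 2.0)                          # illustrative first-order coefficients
B  <- matrix(c(-2.0, 0.5,
                0.5, -3.0), nrow = 2)      # illustrative negative-definite B
xs <- -0.5 * solve(B) %*% b                # stationary point (Equation (11))
ys <- b0 + 0.5 * t(xs) %*% b               # predicted response there (Equation (12))
```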
Example
> yield<-c(76.5,77,78,79.5,79.9,80.3,80,79.7,
+ 79.8,78.4,75.6,78.5,77)
> Temperature<-c(-1,-1,1,1,0,0,0,0,0,1.414,-1.414,0,0)
> Time<-c(-1,1,-1,1,0,0,0,0,0,0,0,1.414,-1.414)
> library(rsm)   # provides contour()/persp() methods for lm objects
> chem2.lm <- lm(yield ~ poly(Temperature, Time, degree=2))
> par(mfrow=c(1,2))
> contour(chem2.lm, Temperature ~ Time, main="Contour Plot")
> persp(chem2.lm, Temperature ~ Time, zlab = "Yield",
+ main="Response Surface")
[Contour plot and perspective (response surface) plot of Yield against Temperature and Time for the second-order fit.]
Experiments with Random Factors
PART VII:
EXPERIMENTS WITH RANDOM FACTORS
I So far we have considered only fixed factors, that is, the levels of the factors
used by the experimenter were the specific levels of interest - fixed levels of
temperature, pressure, etc.
I The statistical inferences made about these factors are confined to the
specific levels studied.
I Often factor levels are chosen at random from a larger population of potential
levels, and we wish to make inferences about the entire population of levels,
not just those used in the experimental design. Such a factor is said to be a
random factor.
Test Statistic
I From the expected mean squares, we see that the appropriate statistic for
testing the no-interaction hypothesis H0 : στβ² = 0 is
F0 = MSAB / MSE
because under H0 both numerator and denominator of F0 have expectation σ²,
and only if H0 is false is E(MSAB) greater than E(MSE). F0 is distributed as
F(a−1)(b−1), ab(n−1).
I Similarly, for testing H0 : στ² = 0 we would use
F0 = MSA / MSAB
which is distributed as F(a−1), (a−1)(b−1).
I For testing H0 : σβ² = 0 the test statistic is
F0 = MSB / MSAB
which is distributed as F(b−1), (a−1)(b−1).
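A numeric sketch of these F-ratios, with mean squares assumed from the gauge-study ANOVA shown later in these notes (a = 20 parts, b = 3 operators, n = 2 replicates):

```r
MSA <- 62.391; MSB <- 1.308; MSAB <- 0.712; MSE <- 0.992
a <- 20; b <- 3; n <- 2
F_AB <- MSAB / MSE     # test of H0: sigma_tau_beta^2 = 0
F_A  <- MSA / MSAB     # test of H0: sigma_tau^2 = 0
F_B  <- MSB / MSAB     # test of H0: sigma_beta^2 = 0
p_AB <- pf(F_AB, (a - 1) * (b - 1), a * b * (n - 1), lower.tail = FALSE)
```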
I Equating the observed mean squares to their expected values gives
σ̂² = MSE,
σ̂τβ² = (MSAB − MSE)/n,
σ̂β² = (MSB − MSAB)/(an),
σ̂τ² = (MSA − MSAB)/(bn)
as the point estimates of the variance components in the two-factor random
effects model.
I These are moment estimators.
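A numeric sketch of these moment estimators, with mean squares assumed from the gauge-study ANOVA below:

```r
MSA <- 62.391; MSB <- 1.308; MSAB <- 0.712; MSE <- 0.992
a <- 20; b <- 3; n <- 2
sigma2      <- MSE                      # error variance
sigma2_tb   <- (MSAB - MSE) / n         # negative here: -0.14
sigma2_beta <- (MSB - MSAB) / (a * n)
sigma2_tau  <- (MSA - MSAB) / (b * n)
c(sigma2, sigma2_tb, sigma2_beta, sigma2_tau)
```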
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
part 19 1185.43 62.391 62.9151 <2e-16 ***
operator 2 2.62 1.308 1.3193 0.2750
part:operator 38 27.05 0.712 0.7178 0.8614
Residuals 60 59.50 0.992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction effect is not significant (p-value = 0.862).
Test Statistics
I There is no change in the ANOVA, the formation of the mean squares, or the
d.f.
I However, the relevant F-ratios are not necessarily what they were in the fixed
factor case. One must start by determining the expected values of the mean
squares.
F-value for A is 87.87324
p-value for A is 0
F-value for B is 1.84507
p-value for B is 0.1718915
F-value for AB is 0.7171717
p-value for AB is 0.8620944
I The variance components can be estimated by equating the mean squares to
their expected values and solving the resulting equations.
> var.tau.beta <- (MSAB-MSE)/n
> cat("Estimate of sigma.sqd(tau.beta) =", var.tau.beta,"\n")
Estimate of sigma.sqd(tau.beta) = -0.14
Interpretation
I Notice that the estimate of one of the variance components, στ2 β is negative.
This is certainly not reasonable because by definition variances are
non-negative.
I We can deal with this negative result in a variety of ways. One possibility is
to assume that the negative estimate means that the variance component is
really zero and just set it to zero, leaving the other non-negative estimates
unchanged. Another approach is to estimate the variance components with a
method that assures non-negative estimates (this can be done with the
maximum likelihood approach).
I The p-value for the interaction term in the ANOVA table is very large. We
take this as evidence that στβ² really is zero and that there is no interaction
effect, and we then fit a reduced model.
Reduced Model
I Fit a reduced model of the form
yijk = µ + τi + βj + εijk
that does not include the interaction term.
I Here
E[MSA] = σ² + bnστ²,
E[MSB] = σ² + anσβ²,
E[MSE] = σ².
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
part 19 1185.43 62.391 70.6447 <2e-16 ***
operator 2 2.62 1.308 1.4814 0.2324
Residuals 98 86.55 0.883
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I Since there is no interaction term in the model, both main effects are tested
against the error term, and the estimates of the variance components are
σ̂τ² = (62.39 − 0.88)/((3)(2)) = 10.25
σ̂β² = (1.31 − 0.88)/((20)(2)) = 0.0108
σ̂² = 0.88
I The variability of the gauge (arising from operator variability and random
error) is estimated by
σ̂²gauge = σ̂² + σ̂β² = 0.88 + 0.0108 = 0.8908
I The variability in the gauge appears small relative to the variability in the
product (σ̂τ2 ). This is generally a desirable situation, implying that the gauge
is capable of distinguishing among different grades of product.
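These estimates can be reproduced with a small sketch (mean squares assumed from the reduced-model ANOVA above):

```r
MSA <- 62.391; MSB <- 1.308; MSE <- 0.883
a <- 20; b <- 3; n <- 2
sigma2_tau   <- (MSA - MSE) / (b * n)   # product variability, about 10.25
sigma2_beta  <- (MSB - MSE) / (a * n)   # operator variability, about 0.011
sigma2_gauge <- MSE + sigma2_beta       # gauge variability, about 0.89
```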
I The appropriate test statistic for testing that the means of the fixed factor
effects are equal, or H0 : τi = 0, is F0 = MSA/MSAB, for which the reference
distribution is F(a−1), (a−1)(b−1).
I For testing H0 : σβ² = 0, the test statistic is F0 = MSB/MSE, with reference
distribution F(b−1), ab(n−1).
I For testing the interaction hypothesis H0 : στβ² = 0, we would use
F0 = MSAB/MSE, which has reference distribution F(a−1)(b−1), ab(n−1).
Estimates
I Fixed effects
µ̂ = ȳ...
τ̂i = ȳi.. − ȳ... .
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
operator 2 2.62 1.308 1.3193 0.2750
part 19 1185.43 62.391 62.9151 <2e-16 ***
operator:part 38 27.05 0.712 0.7178 0.8614
Residuals 60 59.50 0.992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
operator 2 2.62 1.308 1.4814 0.2324
part 19 1185.43 62.391 70.6447 <2e-16 ***
Residuals 98 86.55 0.883
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1