
STA 508

Design and Analysis of Experiments


Lecture Notes

Rachel Sarguta

School of Mathematics, University of Nairobi

February - May, 2021

Sarguta (SoM) Design and Analysis February - May, 2021 1 / 263


Introduction

PART I:
INTRODUCTION

Introduction
Definitions I

- Experiment (also called a Run): an action in which the experimenter changes at least one of the variables being studied and then observes the effect of his or her action(s). Note that the passive collection of observational data is not experimentation.
- Experimental Unit: the item under study upon which something is changed. This could be raw materials, human subjects, or just a point in time.
- Sub-Sample, Sub-Unit, or Observational Unit: when the experimental unit is split after the action has been taken upon it, the parts are called sub-samples or sub-units. Sometimes it is only possible to measure a characteristic separately for each sub-unit; for that reason sub-units are often called observational units. Measurements on sub-samples, or sub-units of the same experimental unit, are usually correlated and should be averaged before analysis of the data rather than being treated as independent outcomes.
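As a minimal sketch of the last point, assume a hypothetical data set in which each of four loaves (the experimental units) is split into three slices (the sub-units); the data and variable names are invented for illustration:

```r
# Hypothetical illustration: four loaves (experimental units) are each
# split into three slices (sub-units) and moisture is measured per slice.
sub_data <- data.frame(
  loaf     = rep(1:4, each = 3),   # experimental unit
  slice    = rep(1:3, times = 4),  # observational (sub-) unit
  moisture = c(38, 40, 42,  35, 36, 37,  41, 43, 45,  39, 40, 41)
)

# Average the sub-unit measurements within each experimental unit before
# analysis, rather than treating the 12 slices as independent responses.
unit_means <- aggregate(moisture ~ loaf, data = sub_data, FUN = mean)
unit_means$moisture   # 40 36 43 40
```

The four loaf means, not the twelve slice readings, are what enter the analysis.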


Introduction
Definitions II

- Independent Variable (Factor or Treatment Factor): one of the variables under study that is being controlled at or near some target value, or level, during any given experiment. The level is changed in some systematic way from run to run in order to determine what effect it has on the response(s).
- Background Variable (also called a Lurking Variable): a variable that the experimenter is unaware of or cannot control, and which could have an effect on the outcome of the experiment. In a well-planned experimental design, the effects of these lurking variables should balance out so as not to alter the conclusion of a study.
- Dependent Variable (or the Response, denoted by Y): the characteristic of the experimental unit that is measured after each experiment or run. The magnitude of the response depends upon the settings of the independent variables or factors and the lurking variables.


Introduction
Definitions III

- Effect: the change in the response that is caused by a change in a factor or independent variable. After the runs in an experimental design are conducted, the effect can be estimated by calculating it from the observed response data; this estimate is called the calculated effect. Before the experiments are ever conducted, the researcher may know how large the effect should be to have practical importance; this is called a practical effect, or the size of a practical effect.
- Replicate runs: two or more experiments conducted with the same settings of the factors or independent variables, but using different experimental units. The measured dependent variable may differ among replicate runs due to changes in lurking variables and inherent differences in experimental units.
- Duplicates: duplicate measurements of the same experimental unit from one run or experiment. The measured dependent variable may vary among duplicates due to measurement error, but in the analysis of data these duplicate measurements should be averaged and not treated as separate responses.

Introduction
Definitions IV

- Experimental Design: a collection of experiments or runs that is planned in advance of the actual execution. The particular runs selected in an experimental design will depend upon the purpose of the design.
- Confounded Factors: these arise when each change an experimenter makes for one factor, between runs, is coupled with an identical change to another factor. In this situation it is impossible to determine which factor causes any observed changes in the response or dependent variable.
- Biased Factor: results when an experimenter makes changes to an independent variable at the precise time when changes in background or lurking variables occur. When a factor is biased it is impossible to determine whether the resulting changes to the response were caused by changes in the factor or by changes in other background or lurking variables.
- Experimental Error: the difference between the observed response for a particular experiment and the long-run average of all experiments conducted at the same settings of the independent variables or factors. The fact that it is called "error" should not lead one to assume that it is a mistake or blunder.

Observational Studies vs Experiments

- In an observational study, variables (both independent and dependent) are observed without any attempt to change or control the value of the independent factors. Therefore any observed changes in the response, or dependent variable, cannot necessarily be attributed to observed changes in the independent variables, because background or lurking variables might be the cause.
- In an experiment, however, the independent variables are purposely varied and the runs are conducted in a way that balances out the effect of any background variables that change. In this way the average change in the response can be attributed to the changes made in the independent variables.


Purposes of Experimental Design

- Statistical experimental designs provide a plan for collecting data in such a way that the data can be analyzed statistically to corroborate the conjecture in question. When an experimental design is used, the conjecture must be stated clearly and a list of experiments proposed in advance to provide the data to test the hypothesis. This organized approach helps to avoid false starts and incomplete answers to research questions.
- Another advantage of the experimental design approach is the ability to avoid confounding factor effects. When the research hypothesis is not clearly stated and a plan is not constructed to investigate it, researchers tend toward a trial-and-error approach wherein many variables are changed simultaneously in an attempt to achieve some goal. With this approach the goal may sometimes be achieved, but the result cannot be repeated, because it is not known which changes actually caused the improvement.
- One of the main purposes of experimental designs is to minimize the effect of experimental error. Aspects of designs that do this, such as randomization, replication, and blocking, are called methods of error control.


Basic Principles of Experimental Designs

(a) Replication: This is where more than one observation is taken for each
combination of treatment factor levels used in the experiment.
(b) Blocking: This refers to the use of blocking factors to divide the
experimental units into sets (blocks) such that the units within sets are more
homogeneous (less variable with respect to the response variable) than units
in general.
(c) Randomization: This means that the allocation of experimental units to
combinations of treatment factor levels should be randomly determined.
Randomization is important because:
(i) It provides a solid basis for the statistical analysis of the data produced by an
experiment.
(ii) It helps avoid systematic biasing of results caused by an “unfair” allocation of
treatments to experimental units.
(iii) Randomization can provide a basis for the analysis of experimental data even
when the usual assumptions we make about the observations are violated.


Planning Experiments

An effective experimental design plan should include the following items:


- a clear description of the objectives,
- an appropriate design plan that guarantees unconfounded factor effects and factor effects that are free of bias,
- a provision for collecting data that will allow estimation of the variance of the experimental error, and
- a stipulation to collect enough data to satisfy the objectives.


Planning Experiments
Steps I

- Define Objectives. Define the objectives of the study. First, this statement should answer the question of why the experiment is to be performed. Second, determine whether the experiment is conducted to classify sources of variability or whether its purpose is to study cause-and-effect relationships. If it is the latter, determine whether it is a screening or an optimization experiment. For studies of cause-and-effect relationships, decide how large an effect should be in order to be meaningful to detect.
- Identify Experimental Units. Declare the item upon which something will be changed. Is it an animal or human subject, raw material for some processing operation, or simply the conditions that exist at a point in time or trial? Identifying the experimental units will help in understanding the experimental error and the variance of the experimental error.
- Define a Meaningful and Measurable Response or Dependent Variable. Define what characteristic of the experimental units can be measured and recorded after each run. This characteristic should best represent the expected differences to be caused by changes in the factors.

Planning Experiments
Steps II

- List the Independent and Lurking Variables. Declare which independent variables you wish to study. Be sure that the independent variables chosen for study can be controlled during a single run, and varied from run to run. If there is interest in a variable, but it cannot be controlled or varied, it cannot be included as a factor. Variables that are hypothesized to affect the response, but cannot be controlled, are lurking variables. A proper experimental design plan should prevent uncontrollable changes in these variables from biasing the factor effects under study.
- Run Pilot Tests. Make some pilot tests to be sure you can control and vary the factors that have been selected, that the response can be measured, and that replicate measurements of the same or similar experimental units are consistent. Inability to measure the response accurately or to control the factor levels are the main reasons that experiments fail to produce the desired results. If the pilot tests fail, go back to steps 2, 3, and 4. If these tests are successful, measurements of the response for a few replicate tests with the same levels of the factors under study will produce data that can be used to get a preliminary estimate of the variance of the experimental error.

Planning Experiments
Steps III

- Make a Flow Diagram of the Experimental Procedure for Each Run. This will make sure the procedure to be followed is understood and will be standardized for all runs in the design.
- Choose the Experimental Design. Choose an experimental design that is suited to the objectives of your particular experiment. This will include a description of which factor levels will be studied, and will determine how the experimental units are to be assigned to the factor levels, or to combinations of factor levels if there is more than one factor. The choice of the experimental design will also determine what model should be used for analysis of the data.
- Determine the Number of Replicates Required. Based on the expected variance of the experimental error and the size of a practical difference, the researcher should determine the number of replicate runs that will give a high probability of detecting an effect of practical importance.


Planning Experiments
Steps IV

- Randomize the Experimental Conditions to Experimental Units. According to the particular experimental design being used, there is a prescribed method of randomly assigning experimental conditions to experimental units. In some designs, factor levels or combinations of factor levels are assigned to experimental units completely at random. In other designs, randomization of factor levels is performed separately within groups of experimental units and may be done differently for different factors. The way the randomization is done affects the way the data should be analyzed, and it is important to describe and record exactly what has been done.
- Describe a Method for Data Analysis. This should be an outline of the steps of the analysis. An actual analysis of simulated data is often useful to verify that the proposed outline will work.
- Timetable and Budget for Resources Needed to Complete the Experiments. Experimentation takes time, and having a schedule to adhere to will improve the chances of completing the research on time.


PART II:
EXPERIMENTS WITH A SINGLE FACTOR:
COMPLETELY RANDOMIZED DESIGNS


Completely Randomized Designs


Introduction

- In a completely randomized design, abbreviated CRD, with one treatment factor, n experimental units are divided randomly into t groups. Each group is then subjected to one of the unique levels or values of the treatment factor.
- If n = tr is a multiple of t, then each level of the factor will be applied to r unique experimental units, and there will be r replicates of each run with the same level of the treatment factor.
- If n is not a multiple of t, then there will be an unequal number of replicates of each factor level. All other known independent variables are held constant so that they will not bias the effects.
- This design should be used when there is only one factor under study and the experimental units are homogeneous.


Linear Model for CRD

Cell Means Model

Yij = µi + eij

where Yij is the response for the jth experimental unit subjected to the ith level of the treatment factor, i = 1, …, t, j = 1, …, ri, and ri is the number of experimental units, or replications, at the ith level of the treatment factor. This is the cell means model, with a different mean µi for each level of the treatment factor.

Effects Model

Yij = µ + τi + eij

is the effects model, and the τi's are called the effects: τi represents the difference between the long-run average of all possible experiments at the ith level of the treatment factor and the overall average.
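One detail worth making explicit here: the two models are linked by

```latex
\mu_i = \mu + \tau_i , \qquad i = 1, \dots, t ,
```

so the effects model uses t + 1 parameters to describe only t cell means, and a side condition is needed for identifiability, e.g. Σᵢ riτi = 0, or the treatment coding τ1 = 0 that R's lm() uses by default. Under the latter coding the intercept estimates µ1 and each remaining coefficient estimates µi − µ1, which is how the lm() output later in these notes should be read.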


Parameter Estimation
For an equal number of replicates, the sample mean of the data at the ith level of the treatment factor is

ȳi. = (1/ri) Σ_{j=1}^{ri} yij

and the grand mean is given by

ȳ.. = (1/t) Σ_{i=1}^{t} ȳi. = (1/n) Σ_{i=1}^{t} Σ_{j=1}^{ri} yij

Using the method of maximum likelihood, which is equivalent to the method of least squares under these assumptions, the estimates of the cell means are found by choosing them to minimize the error sum of squares. This results in the estimates

µ̂i = ȳi.

Example: L.S. Calculations with R Function lm

In a bread rise experiment, if the experimenter wants to examine three different rise times (35 minutes, 40 minutes, and 45 minutes) and test four replicate loaves of bread at each rise time, the following code will create the randomized plan.
> set.seed(7638)
> f <- factor( rep( c(35, 40, 45 ), each = 4))
> fac <- sample( f, 12 )
> eu <- 1:12
> plan <- data.frame( loaf=eu, time=fac )
> write.csv( plan, file = "Plan.csv", row.names = FALSE)


Example Contd

The data from a CRD design for the bread rise experiment described earlier:
Rise Time Loaf Heights
35 4.5, 5.0, 5.5, 6.75
40 6.5, 6.5, 10.5, 9.5
45 9.75, 8.75, 6.5, 8.25
> bread <- read.csv("plan2.csv")
> bread$time<-as.factor(bread$time)
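The file plan2.csv itself is not reproduced in these notes. Assuming the table above, an equivalent data frame can be built directly (the loaf numbering and within-level order here follow the table, not the original randomized run order):

```r
# Build the bread-rise data directly from the table above, as an
# alternative to reading plan2.csv (the file is not shown in the notes).
bread <- data.frame(
  loaf   = 1:12,
  time   = factor(rep(c(35, 40, 45), each = 4)),
  Height = c(4.50, 5.00, 5.50, 6.75,   # 35 minutes
             6.50, 6.50, 10.50, 9.50,  # 40 minutes
             9.75, 8.75, 6.50, 8.25)   # 45 minutes
)
tapply(bread$Height, bread$time, mean)  # 5.4375  8.2500  8.3125
```

The cell means printed here are exactly the µ̂i = ȳi. of the previous slide, and they reappear in the lm() coefficients below (intercept 5.4375; time40 = 8.25 − 5.4375 = 2.8125; time45 = 8.3125 − 5.4375 = 2.875).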


> library(daewr)
> mod0 <- lm( Height ~ time, data = bread )
> summary( mod0 )
Call:
lm(formula = Height ~ time, data = bread)

Residuals:
Min 1Q Median 3Q Max
-1.812 -1.141 0.000 1.266 2.250

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.4375 0.7655 7.104 5.65e-05 ***
time40 2.8125 1.0825 2.598 0.0288 *
time45 2.8750 1.0825 2.656 0.0262 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.531 on 9 degrees of freedom


Multiple R-squared: 0.5056, Adjusted R-squared: 0.3958
F-statistic: 4.602 on 2 and 9 DF, p-value: 0.042

Sums of Squares

> mod1 <- aov( Height ~ time, data = bread )


> summary(mod1)
Df Sum Sq Mean Sq F value Pr(>F)
time 2 21.57 10.786 4.602 0.042 *
Residuals 9 21.09 2.344
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
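These table entries can be reproduced by hand from the group means; a sketch, using the heights from the data table above:

```r
# Reproduce the aov() sums of squares for the bread data by hand.
height <- c(4.50, 5.00, 5.50, 6.75,  6.50, 6.50, 10.50, 9.50,
            9.75, 8.75, 6.50, 8.25)
time <- factor(rep(c(35, 40, 45), each = 4))

ybar_i <- tapply(height, time, mean)  # cell means 5.4375, 8.2500, 8.3125
ybar   <- mean(height)                # grand mean 7.3333

ss_trt <- 4 * sum((ybar_i - ybar)^2)      # treatment SS: about 21.57
ss_err <- sum((height - ybar_i[time])^2)  # error SS:     about 21.09
f_stat <- (ss_trt / 2) / (ss_err / 9)     # F on 2 and 9 df: about 4.60
round(c(SStime = ss_trt, SSE = ss_err, F = f_stat), 3)
```

The treatment sum of squares multiplies each squared deviation of a cell mean from the grand mean by r = 4 replicates; the error sum of squares pools the squared within-group deviations.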


Verifying Assumptions of the Linear Model

- Constant variance of the experimental error, σ², across all levels of the treatment factor: a scatter plot of the model residuals versus the factor levels can show whether the variability seen at each level of the factor is approximately equal.
- Normality of the experimental error: this can be checked by making a normal probability plot of the model residuals.


Plots

> par(mfrow = c(2,2))
> plot(mod1, which=5)
> plot(mod1, which=1)
> plot(mod1, which=2)
> plot(residuals(mod1) ~ loaf, main="Residuals vs Exp. Unit",
+ font.main=1, data=bread)
> abline(h = 0, lty = 2)


Plots

[Figure: the four diagnostic plots for mod1: "Constant Leverage: Residuals vs Factor Levels", "Residuals vs Fitted", "Normal Q-Q", and "Residuals vs Exp. Unit".]


Question

What are the analysis strategies when assumptions are violated?


Exercise

In an experiment to study the effect of the amount of baking powder in a biscuit dough upon the rise heights of the biscuits, four levels of baking powder were tested and four replicate biscuits were made with each level, in a random order. The results are shown in the table below.
.25 tsp .5 tsp .75 tsp 1 tsp
11.4 27.8 47.6 61.6
11.0 29.2 47.0 62.4
11.3 26.8 47.3 63.0
9.5 26.0 45.5 63.9
(a) What is the experimental unit?
(b) Perform the analysis of variance to test the hypothesis of no treatment effect.
(c) Estimate the variance of the experimental error σ 2 .
(d) Make a plot of residuals versus predicted values and normal plot of residuals
and comment on whether the assumptions of the linear model are justified.


PART III:
RANDOMIZED BLOCKS, LATIN SQUARES AND
RELATED DESIGNS


Randomized Blocks
Introduction

- Example: A hardness testing machine presses a pointed rod (the 'tip') into a metal specimen (a 'coupon') with a known force. The depth of the depression is a measure of the hardness of the specimen. It is feared that, depending on the kind of tip used, the machine might give different readings. The experimenter wants 4 observations on each of the 4 types of tips. Note that the differences in readings might also depend on which type of metal specimen is used, i.e. on the coupons.
- A completely randomized design would use 16 coupons, making 1 depression in each. The coupons would be randomly assigned to the tips, hoping that this would average out any differences between the coupons. Here 'coupon type' is a 'nuisance factor': it may affect the readings, but we aren't very interested in measuring its effect.
- It is also controllable, by blocking: we can use 4 coupons (the 'blocks') and apply each of the 4 treatments (the tips) to each coupon. This is preferable to hoping that randomization alone will do the job; it also uses fewer coupons.

Introduction Contd

- There may be unknown and uncontrollable factors affecting the readings (the eyesight of the operator; think of others). Here is where randomization might help: within each block, the treatments are applied in random order. So each block can be viewed as one CR designed experiment. This is a Randomized Complete Block Design (RCBD). 'Complete' means that each block contains all of the treatments.
- Common blocking variables: day of week, person, batch of raw material, and so on. A basic idea is that the responses should be less variable within a block than between blocks.


Hardness testing design and data

yij = machine reading for tip i, coupon j; run order in parentheses


          Coupon
Tip      1       2       3        4      Mean ȳi.
1      9.3(3)  9.4(3)  9.6(2)  10.0(1)   9.575
2      9.4(1)  9.3(4)  9.8(1)   9.9(4)   9.600
3      9.2(4)  9.4(2)  9.5(3)   9.7(3)   9.450
4      9.7(2)  9.6(1) 10.0(4)  10.2(2)   9.875
ȳ.j    9.400   9.425   9.725    9.950    ȳ.. = 9.625
Note that the layout of the data is the same as for a CR design, where the columns would be labelled 'replicates'. But the design is different: if this were a CRD, the 'times' would be (1), …, (16) in random order over all 16 runs. Here the randomization is restricted: it is done separately within each block. This will allow us to attribute some of the variation to the blocks (SSBlocks), and thus remove it from the experimental error (SSE).
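This restricted randomization can be sketched in R; the tip labels T1 to T4 are hypothetical and the seed is arbitrary:

```r
# RCBD randomization: the four tips are randomized separately within each
# coupon (block), not over all 16 runs at once as in a CRD.
set.seed(42)  # arbitrary seed
plan <- data.frame(
  coupon = rep(1:4, each = 4),
  tip    = unlist(lapply(1:4, function(b) sample(c("T1", "T2", "T3", "T4"))))
)
table(plan$coupon, plan$tip)  # every coupon receives each tip exactly once
```

Each call to `sample()` produces one independent permutation per block, which is exactly the restriction described above.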


Data Entry in R

> reading<-c(9.3,9.4,9.6,10.0,9.4,9.3,9.8,9.9,
+ 9.2,9.4,9.5,9.7,9.7,9.6,10.0,10.2)
> coupon<-c(rep(1:4,4))
> tip<-c(rep(1,4),rep(2,4),rep(3,4),rep(4,4))
> tip<-factor(tip)
> coupon<-factor(coupon)
> Hardness<-data.frame(reading,tip,coupon)
Visualizing with box plots
> par(mfrow=c(1,2))
> boxplot(reading~coupon, xlab="coupon")
> boxplot(reading~tip, xlab="tip")


Box plots

[Figure: side-by-side box plots of reading by coupon and by tip; readings range from about 9.2 to 10.2.]


RCBD Model
Effects Model:

yij = µ + τi + βj + εij,

i = 1, …, a = number of treatments,
j = 1, …, b = number of blocks,
τi = effect of the ith treatment,
βj = effect of the jth block,
Σi τi = Σj βj = 0.

Assume εij ∼ N(0, σ²). Putting

µij = µ + τi + βj = E[yij]

gives the means model

yij = µij + εij.

Sums of Squares
- We consider the effects model, and decompose SST into sums of squares attributable to (i) treatment differences (SSTr), (ii) blocks (SSBlocks), and (iii) experimental error (SSE).
- Least squares estimates (Prove!):

µ̂ = ȳ.. ,  τ̂i = ȳi. − ȳ.. ,  β̂j = ȳ.j − ȳ..

Decomposition of SST:

SST = Σ_{i=1}^{a} Σ_{j=1}^{b} (yij − ȳ..)²
    = Σ_{i=1}^{a} Σ_{j=1}^{b} (ȳi. − ȳ..)² + Σ_{i=1}^{a} Σ_{j=1}^{b} (ȳ.j − ȳ..)²
      + Σ_{i=1}^{a} Σ_{j=1}^{b} (yij − ȳi. − ȳ.j + ȳ..)²

i.e. SST = SSTr + SSBlocks + SSE.


Degrees of Freedom

- Degrees of freedom:

df(SSTr) = a − 1
df(SSBlocks) = b − 1
df(SSE) = ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1)

- Task: Give the theoretical ANOVA table for the RCBD.


R - Output

For the hardness design, we have the following output:


> fit.hardness<-lm(reading~tip+coupon)
> anova(fit.hardness)
Analysis of Variance Table

Response: reading
Df Sum Sq Mean Sq F value Pr(>F)
tip 3 0.385 0.128333 14.438 0.0008713 ***
coupon 3 0.825 0.275000 30.938 4.523e-05 ***
Residuals 9 0.080 0.008889
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
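These sums of squares can be checked by hand from the row (tip), column (coupon), and grand means; a sketch:

```r
# Reproduce the RCBD sums of squares for the hardness data by hand.
reading <- c(9.3, 9.4, 9.6, 10.0,  9.4, 9.3, 9.8, 9.9,
             9.2, 9.4, 9.5, 9.7,   9.7, 9.6, 10.0, 10.2)
tip    <- rep(1:4, each = 4)
coupon <- rep(1:4, times = 4)

ybar_i <- tapply(reading, tip, mean)     # tip means:    9.575 9.600 9.450 9.875
ybar_j <- tapply(reading, coupon, mean)  # coupon means: 9.400 9.425 9.725 9.950
ybar   <- mean(reading)                  # grand mean:   9.625

ss_trt    <- 4 * sum((ybar_i - ybar)^2)                    # 0.385
ss_blocks <- 4 * sum((ybar_j - ybar)^2)                    # 0.825
ss_err    <- sum((reading - ybar)^2) - ss_trt - ss_blocks  # 0.080
round(c(SSTr = ss_trt, SSBlocks = ss_blocks, SSE = ss_err), 3)
```

Each treatment and block sum of squares multiplies the squared deviations of the four marginal means by b = 4 (respectively a = 4), and the error sum of squares is recovered by subtraction from SST, exactly as in the decomposition above.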


Interpretation

- Thus at any level α > 0.00087, we would reject the null hypothesis of no significant treatment effects (H0 : τ1 = · · · = τa = 0). It also appears that the blocks have a significant effect.
- A caution here, though: the randomization alone ensures that the F-test for treatments is approximately valid even if the errors are not very normal. Because of the randomization restriction, the same is not true for testing the significance of blocks by looking at MSBlocks/MSE. Thus the p-value of 4.523e-05 for blocks (coupons) should be used only as a guide, unless one is sure of the normality.


Assumptions Verification

- qq-plot of residuals: ε̂ij = yij − ŷij, where the fitted values are ŷij = µ̂ + τ̂i + β̂j.
- Residuals vs. treatment labels, block labels, and fitted values.
> par(mfrow=c(2,2))
> plot(fit.hardness,which=2)#qqplot
> plot(fit.hardness,which=1)#Residuals vs Fitted
> plot(fit.hardness,which=5)#Residuals vs Factor Levels(Tip)
> plot(fit.hardness$residuals,coupon)


Test for Homogeneity of Variances

Does the error variance depend on the treatment, or on the block? Apparently not:
> bartlett.test(reading,tip)
Bartlett test of homogeneity of variances

data: reading and tip


Bartlett's K-squared = 0.44773, df = 3, p-value = 0.9302
> bartlett.test(reading,coupon)
Bartlett test of homogeneity of variances

data: reading and coupon


Bartlett's K-squared = 0.94628, df = 3, p-value = 0.8142
The normality-based tests can be justified here since we have little evidence of non-normality. It's a good idea to run non-parametric tests too, to reassure ourselves that we reach the same conclusions without assuming normality.
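One such non-parametric check is Friedman's rank test, the standard distribution-free analogue of the RCBD F-test for treatments; the data are re-entered here so the sketch is self-contained:

```r
# Friedman's test ranks the tips within each coupon (block), so it needs
# no normality assumption when testing the treatment (tip) effect.
reading <- c(9.3, 9.4, 9.6, 10.0,  9.4, 9.3, 9.8, 9.9,
             9.2, 9.4, 9.5, 9.7,   9.7, 9.6, 10.0, 10.2)
tip    <- factor(rep(1:4, each = 4))
coupon <- factor(rep(1:4, times = 4))

res <- friedman.test(reading, groups = tip, blocks = coupon)
res$p.value  # well below 0.05: the tip effect holds up without normality
```

Reaching the same conclusion as the normal-theory F-test reassures us that the significance of the tip effect is not an artifact of the normality assumption.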


Conclusion

- So the assumptions seem to be met, and at least some of the differences in the treatment means, i.e. in the mean readings µi. = µ + τi, are significant: the readings of the hardness testing device depend on which tip is being used.
- This is bad news for the engineers.
- Is there any one tip responsible for the differences?
- We should look at all of the differences µ̂i. − µ̂i′. = ȳi. − ȳi′. to see which are significant.


Fisher's LSD ("Least Significant Difference")

- A 100(1 − α)% confidence interval on one difference µi. − µi′. is

ȳi. − ȳi′. ± t_{α/2, df(MSE)} √( MSE (1/b + 1/b) )
= ȳi. − ȳi′. ± t_{α/2, 9} √( 0.00889 (2/4) )

- With α = 0.05, the 95% interval is ȳi. − ȳi′. ± 0.151.
- Converting this to a hypothesis test, we see that the hypothesis of equality is rejected if

|ȳi. − ȳi′.| > LSD = 0.151.
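The cutoff can be reproduced directly in R, plugging in MSE = 0.00889 on 9 df and b = 4 from the slides above:

```r
# Fisher's LSD cutoff for the hardness data.
mse <- 0.008889  # MSE from the RCBD ANOVA table
b   <- 4         # number of blocks (coupons)
lsd <- qt(0.975, df = 9) * sqrt(mse * (1 / b + 1 / b))
round(lsd, 3)    # 0.151
```

Note that the error degrees of freedom (9) and the mean square come from the RCBD ANOVA, so the LSD automatically reflects the blocking.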


Fisher’s LSD - Example

- Since

|ȳ1. − ȳ2.| = 0.025 < 0.151,
|ȳ1. − ȳ3.| = 0.125 < 0.151,
|ȳ1. − ȳ4.| = 0.300 > 0.151, *
|ȳ2. − ȳ3.| = 0.150 < 0.151,
|ȳ2. − ȳ4.| = 0.275 > 0.151, *
|ȳ3. − ȳ4.| = 0.425 > 0.151, *

- we conclude that tips 1, 2, and 3 do not differ significantly in their hardness readings, but that tip 4 gives significantly different (and higher) readings.
- In making these statements our experiment-wise error rate is < 6(0.05) = 0.3, so our overall confidence is > 70%.


Fisher’s LSD in R
> library(agricolae)
> comparison<-LSD.test(reading,tip,9,0.0089)
> comparison
$statistics
MSerror Df Mean CV t.value LSD
0.0089 9 9.625 0.9801539 2.262157 0.1509047

$parameters
test p.ajusted name.t ntr alpha
Fisher-LSD none tip 4 0.05

$means
reading std r LCL UCL Min Max Q25 Q50 Q75
1 9.575 0.3095696 4 9.468294 9.681706 9.3 10.0 9.375 9.50 9.700
2 9.600 0.2943920 4 9.493294 9.706706 9.3 9.9 9.375 9.60 9.825
3 9.450 0.2081666 4 9.343294 9.556706 9.2 9.7 9.350 9.45 9.550
4 9.875 0.2753785 4 9.768294 9.981706 9.6 10.2 9.675 9.85 10.050

$comparison

Tukey Honest Significant Difference (HSD)

- Tukey's procedure replaces t_{α/2, 9} in Fisher's LSD with

q_α/√2 = qtukey(0.95, 4, 9)/√2 = 3.1218

to get

(q_α/√2) · se(ȳi. − ȳi′.) = 3.1218 √( 0.00889 (2/4) ) = 0.208

- The same conclusions are drawn, with an experiment-wise error rate of only 0.05.

TASK: Find other mean comparison methods.
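The Tukey cutoff can likewise be reproduced in R:

```r
# Tukey HSD cutoff: studentized range quantile for 4 means and 9 error df.
mse <- 0.008889
b   <- 4
q   <- qtukey(0.95, nmeans = 4, df = 9)
hsd <- (q / sqrt(2)) * sqrt(mse * (2 / b))  # equivalently q * sqrt(mse / b)
round(c(q_over_sqrt2 = q / sqrt(2), HSD = hsd), 3)  # 3.122 and 0.208
```

Because 0.208 > 0.151, Tukey's cutoff is more conservative than the LSD, which is the price of controlling the family-wise error rate at 0.05.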


Tukey HSD in R
> fit<-aov(reading~tip+coupon)
> TukeyHSD(fit)
Tukey multiple comparisons of means
95% family-wise confidence level

Fit: aov(formula = reading ~ tip + coupon)

$tip
diff lwr upr p adj
2-1 0.025 -0.18311992 0.23311992 0.9809005
3-1 -0.125 -0.33311992 0.08311992 0.3027563
4-1 0.300 0.09188008 0.50811992 0.0066583
3-2 -0.150 -0.35811992 0.05811992 0.1815907
4-2 0.275 0.06688008 0.48311992 0.0113284
4-3 0.425 0.21688008 0.63311992 0.0006061

$coupon
diff lwr upr p adj
2-1   0.025 -0.18311992 0.2331199 0.9809005

[Figure: TukeyHSD plots of the 95% family-wise confidence intervals for the differences in mean levels of tip and of coupon.]


Randomized Blocks, Latin Squares and Related Designs Randomized Blocks

Tukey HSD with package multcomp


> library(multcomp)
> fit<-aov(reading~tip+coupon)
> fit.tukey <- glht(fit, linfct = mcp(tip = "Tukey"))
> summary(fit.tukey)
Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = reading ~ tip + coupon)

Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
2 - 1 == 0 0.02500 0.06667 0.375 0.98092
3 - 1 == 0 -0.12500 0.06667 -1.875 0.30293
4 - 1 == 0 0.30000 0.06667 4.500 0.00698 **
3 - 2 == 0 -0.15000 0.06667 -2.250 0.18164
4 - 2 == 0 0.27500 0.06667 4.125 0.01120 *
4 - 3 == 0 0.42500 0.06667 6.375 < 0.001 ***

[Figure: glht confidence-interval plot — simultaneous 95% family-wise intervals for the Tukey contrasts 2−1, 3−1, 4−1, 3−2, 4−2, 4−3.]

Exercise

To be given!


Latin Squares
I Same (hardness) example. Suppose that the ’operator’ of the testing machine
was also thought to be a factor.
I We suppose that there are p = 4 operators, p = 4 coupons, and p = 4 tips.
The first two are nuisance factors, the last is the ’treatment’.
I We can carry out the experiment, and estimate everything we need to, in only
  p² = 16 runs (as before), if we use a Latin Square Design.
I Here each tip is used exactly once on each coupon, and exactly once by each
operator.
I Represent the treatments by the Latin letters A, B, C , D and consider the
Latin square:
Operator
Coupon k=1 k=2 k=3 k=4
i=1 A D B C
i=2 B A C D
i=3 C B D A
i=4 D C A B
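One way to carry out the randomization — a sketch in R, not taken from the notes — is to start from a cyclic square and randomly permute its rows, columns and treatment letters:

```r
# Sketch (assumption: any row/column/letter permutation of a Latin square
# is again a Latin square, so this samples a randomized design).
p <- 4
cyclic <- outer(1:p, 1:p, function(i, k) ((i + k - 2) %% p) + 1)
square <- cyclic[sample(p), sample(p)]    # permute rows and columns at random
square[] <- sample(LETTERS[1:p])[square]  # relabel treatments at random
square  # every letter appears exactly once in each row and each column
```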


Data

I Each letter appears exactly once in each row and in each column.
I There are many ways to construct a Latin square, and the randomization
enters into things by randomly choosing one of them.
I Suppose the data were:
Operator
Coupon k=1 k=2 k=3 k=4
i=1 A=9.3 B=9.3 C=9.5 D=10.2
i=2 B=9.4 A=9.4 D=10.0 C=9.7
i=3 C=9.2 D=9.6 A=9.6 B=9.9
i=4 D=9.7 C=9.4 B=9.8 A=10.0


Effects Model

We use an effects model

yijk = µ + αi + τj + βk + εijk ,   i, j, k = 1, . . . , p

where
(i) yijk is the observation in row i, column k, using treatment j. (So
y243 = 10.0; y223 does not exist - only p² of them do.)
(ii) αi , τj , βk are the row, treatment and column effects, all summing to zero.
(iii) εijk is the random error.
Note that the model is additive in that there is no interaction effect: any
treatment has the same effect regardless of the levels of the other factors.


Least Square Estimates and Sums of Squares

For the Latin square design the LSEs are

α̂i = ȳi.. − ȳ... , τ̂j = ȳ.j. − ȳ... , β̂k = ȳ..k − ȳ...

as usual, with

    SSRows = p Σ_{i=1}^p α̂i²
    SSTr   = p Σ_{j=1}^p τ̂j²
    SSCol  = p Σ_{k=1}^p β̂k²
    SSE    = SST − SSRows − SSTr − SSCol

The d.f. of these are p − 1, p − 1, p − 1 and p² − 1 − 3(p − 1) = (p − 2)(p − 1).
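These formulas can be verified numerically for the hardness data; a sketch in R (the data layout follows the R code given later in this section):

```r
# Sketch: compute the Latin square sums of squares by hand and compare
# with the anova() output for the same data.
y <- c(9.3, 9.4, 9.2, 9.7, 9.3, 9.4, 9.6, 9.4,
       9.5, 10.0, 9.6, 9.8, 10.2, 9.7, 9.9, 10.0)
operators <- rep(1:4, each = 4)   # columns
coupons   <- rep(1:4, times = 4)  # rows
tips <- c("A","B","C","D", "B","A","D","C", "C","D","A","B", "D","C","B","A")
p <- 4
ybar <- mean(y)
alpha.hat <- tapply(y, coupons, mean)   - ybar  # row effects
tau.hat   <- tapply(y, tips, mean)      - ybar  # treatment effects
beta.hat  <- tapply(y, operators, mean) - ybar  # column effects
SSRows <- p * sum(alpha.hat^2)  # 0.060
SSTr   <- p * sum(tau.hat^2)    # 0.385
SSCol  <- p * sum(beta.hat^2)   # 0.825
SSE    <- sum((y - ybar)^2) - SSRows - SSTr - SSCol  # 0.020
```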


ANOVA Table

(i) Write down the theoretical ANOVA table for a Latin Square Design.
(ii) Give the rejection criteria for the hypothesis of equal treatment effects.


R - Code
> y <- c(9.3, 9.4, 9.2, 9.7, 9.3, 9.4, 9.6, 9.4,
+ 9.5, 10.0, 9.6, 9.8, 10.2, 9.7, 9.9, 10.0)
> operators <- as.factor(rep(1:4, each=4))
> coupons <- as.factor(rep(1:4, times=4))
> tips <- as.factor(c("A", "B", "C", "D", "B", "A", "D", "C",
+ "C","D", "A", "B", "D", "C", "B","A"))
> data <- data.frame(y, operators, coupons, tips)
> data
y operators coupons tips
1 9.3 1 1 A
2 9.4 1 2 B
3 9.2 1 3 C
4 9.7 1 4 D
5 9.3 2 1 B
6 9.4 2 2 A
7 9.6 2 3 D
8 9.4 2 4 C
9 9.5 3 1 C
10 10.0 3 2 D
11 9.6 3 3 A
12 9.8 3 4 B
13 10.2 4 1 D
14 9.7 4 2 C
15 9.9 4 3 B
16 10.0 4 4 A

Model Fit

> g <- lm(y~tips + operators + coupons)


> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
tips 3 0.385 0.128333 38.5 0.0002585 ***
operators 3 0.825 0.275000 82.5 2.875e-05 ***
coupons 3 0.060 0.020000 6.0 0.0307958 *
Residuals 6 0.020 0.003333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA table, the tip (treatment) means are significantly different; the operator and coupon effects are significant as well.


Multiple Comparison Tests

> g1=aov(y~tips + operators + coupons)


> model.tables( g1, type = "means" )$tables$tips
tips
A B C D
9.575 9.600 9.450 9.875
> #These are the averages for tips A, B, C, D
> qtukey(.95,4,6)*sqrt(.0033/4)
[1] 0.1406154
Tukey's procedure says that µ.k. and µ.l. are significantly different (α = 0.05) if

    |ȳ.k. − ȳ.l. | > qtukey(0.95, 4, 6) · √(MSE/p) = 0.141,

so again we conclude that tip 4 gives significantly different readings. Tips 2 and 3
seem significantly different as well. Now the coupons don’t seem to affect the
readings, although it appears that the operators do.
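The pairwise comparisons can be checked directly; a sketch using the tip averages and MSE ≈ 0.0033 shown above:

```r
# Sketch: all six pairwise |mean differences| against the Tukey critical value.
means <- c(A = 9.575, B = 9.600, C = 9.450, D = 9.875)  # tip averages from above
hsd <- qtukey(0.95, 4, 6) * sqrt(0.0033 / 4)            # ~0.141
d <- abs(outer(means, means, "-"))
d > hsd  # TRUE for every pair involving D, and for the B-C pair
```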

Check Assumptions

The usual model checks should be done - qqplot of residuals (to check normality),
plots of residuals against row labels, column labels, treatment labels (to check for
constant variances). Bartlett’s test can be carried out if normality is assured.


Exercise 1

The effect of s = 4 sleeping pills is tested on s² = 16 persons, who are stratified
according to a Latin square design based on the ordinally classified factors body
weight and blood pressure. The response to be measured is the prolongation of
sleep (in minutes) compared to an average value (without sleeping pills).
Weight
A (43) B (57) C (61) D (74)
B (59) C (63) D (75) A (46)
Blood Pressure
C (65) D (79) A (48) B (64)
D (83) A (55) B (67) C (72)
Is there a significant effect of the sleeping pills on the prolongation of sleep?


Exercise 2
A cornflakes company wishes to test the market for a new product that is intended
to be eaten for breakfast. Primarily two factors are of interest, namely an
advertising campaign and the type of packaging used. Four alternative advertising
campaigns were considered:
I A; TV commercials,
I B; ads in the newspapers,
I C; lottery in the individual packages,
I D; free package (sent by mail to many families).
Four different kinds of packaging were chosen. They differed in the way the
product was described on the front of the packages:
I I; contains calcium, ferro minerals, phosphorus and B vitamin,
I II; easy and fast to prepare,
I III; low cost food,
I IV; gives you energy to last for the whole day.
The investigation was carried out in four cities called 1, 2, 3 and 4. The following
results were obtained:

Exercise 2 - Contd

Sales figures in Multiples of 1000 Shillings


City
Packaging 1 2 3 4
I A52 B51 C55 D56
II B50 C45 D49 A51
III C39 D41 A37 B39
IV D43 A41 B42 C42
I Formulate and analyze a mathematical model for this experiment. Discuss
whether the model and the results found seem reasonable.
I If any of the sources of variation are (statistically) significant try to see if any
of the alternatives considered are better or worse than the other or, in
general, if there seems to be a grouping of alternatives with respect to effect
on the sale.


Graeco-Latin Square

I The Latin square notion extends to Graeco-Latin squares.


I Suppose that we had one more factor - day of the week, at four levels α
(Monday), β (Tuesday), γ (Wednesday), δ (Thursday), of importance if the
whole experiment took 4 days to complete.
I Superimpose a 4 × 4 Latin square consisting of these Greek letters, in such a
  way that each (Latin, Greek) combination of letters occurs exactly once:
Operator
Coupon k=1 k=2 k=3 k=4
i=1 Aα=9.3 Bβ=9.3 Cγ=9.5 Dδ=10.2
i=2 Bδ=9.4 Aγ=9.4 Dβ=10.0 Cα=9.7
i=3 Cβ=9.2 Dα=9.6 Aδ=9.6 Bγ=9.9
i=4 Dγ=9.7 Cδ=9.4 Bα=9.8 Aβ=10.0


Model

I Effects Model
      yijkl = µ + αi + τj + βk + θl + εijkl
I The additive model has terms for all four factors - coupons, operators, days
  and tips. Each is estimated by the sample average for that level of that
  factor, minus the overall average.
I For example the LSE of the effect of Tuesday is

      θ̂2 = (1/4)(9.2 + 9.3 + 10.0 + 10.0) − ȳ....

  and

      SSDays = p Σ_{l=1}^p θ̂l² .

I Each factor uses (p − 1) d.f., so that SSE is on only
  p² − 1 − 4(p − 1) = (p − 3)(p − 1) d.f.
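As a check, the day effects and SSDays can be computed directly in R, reusing the data vector y and the days coding from the code in these notes:

```r
# Sketch: estimate the day effects and their sum of squares by hand.
y <- c(9.3, 9.4, 9.2, 9.7, 9.3, 9.4, 9.6, 9.4,
       9.5, 10.0, 9.6, 9.8, 10.2, 9.7, 9.9, 10.0)
days <- c(1, 4, 2, 3, 2, 3, 1, 4, 3, 2, 4, 1, 4, 1, 3, 2)
theta.hat <- tapply(y, days, mean) - mean(y)  # day effects
theta.hat["2"]   # Tuesday: (1/4)(9.2 + 9.3 + 10.0 + 10.0) - 9.625 = 0
SSDays <- 4 * sum(theta.hat^2)  # 0.005, matching the anova table
```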


Model Fit in R

> days <- as.factor(c(1,4,2,3, 2,3,1,4,3,2,4,1, 4,1,3,2))


> h <- lm(y~tips + operators + coupons + days)
> anova(h)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
tips 3 0.385 0.128333 25.6667 0.012188 *
operators 3 0.825 0.275000 55.0000 0.004029 **
coupons 3 0.060 0.020000 4.0000 0.142378
days 3 0.005 0.001667 0.3333 0.804499
Residuals 3 0.015 0.005000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Exercise

To be given!


Balanced Incomplete Block Design

I In the RCBD, C=’Complete’ means that each block contains each treatment.
E.g. each coupon is subjected to each of the 4 tips.
I Suppose that a coupon is only large enough that 3 tips can be used. Then
the blocks would be ’incomplete’.
I One way to run the experiment is to randomly assign 3 tips to each block,
perhaps requiring that each tip appears 3 times in total.
I There is a more efficient way. An incomplete block design is ’balanced’ if any
two treatments appear in the same block an equal number of times. This is
then a Balanced Incomplete Block Design.


Hardness testing: BIBD design and data.


Coupon
Tip 1 2 3 4 yi. Qi
1 9.3 9.4 - 10.0 28.7 -0.1000
2 - 9.3 9.8 9.9 29.0 -0.1667
3 9.2 9.4 9.5 - 28.1 -0.4333
4 9.7 - 10.0 10.2 29.9 0.7000
y.j 28.2 28.1 29.3 30.1
Notation:
I a = number of treatments, b = number of blocks; a = 4, b = 4
I k = number of treatments per block; k = 3
I r = number of times each treatment appears in the entire experiment; r = 3
I N = ar = bk = number of observations; N = 12
I λ = number of times each pair of treatments appears together.
Task: Show that

    λ = r(k − 1)/(a − 1).
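A quick numerical check of this relation for the hardness BIBD above, with the blocks read off the design table (a sketch):

```r
# Sketch: verify lambda = r(k-1)/(a-1) by counting shared blocks.
blocks <- list(c(1, 3, 4), c(1, 2, 3), c(2, 3, 4), c(1, 2, 4))  # tips on coupons 1-4
a <- 4; k <- 3; r <- 3
lambda <- r * (k - 1) / (a - 1)  # 2
together <- combn(a, 2, function(pr)
  sum(sapply(blocks, function(b) all(pr %in% b))))
all(together == lambda)  # TRUE: every pair of tips shares exactly 2 coupons
```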


Model
Model is as for a RCBD:
yij = µ + τi + βj + ij
As usual, the total sum of squares is

    SST = Σ_{i,j} (yij − ȳ.. )²

on N − 1 d.f. and the SS for Blocks is

    SSBlocks = k Σ_{j=1}^b (ȳ.j − ȳ.. )²

on b − 1 d.f. The treatment SS depends on the 'adjusted total for the ith
treatment'

    Qi = yi. − (1/k) Σ_{j=1}^b nij y.j

where nij = 1 if treatment i appears in block j and is 0 otherwise.



So Σ_{j=1}^b nij y.j is the total of the block totals, counting only those blocks that
contain treatment i:

    Q1 = 28.7 − (1/3)(28.2 + 28.1 + 30.1) = −0.1000
    Q2 = 29.0 − (1/3)(28.1 + 29.3 + 30.1) = −0.1667
    Q3 = 28.1 − (1/3)(28.2 + 28.1 + 29.3) = −0.4333
    Q4 = 29.9 − (1/3)(28.2 + 29.3 + 30.1) = 0.7000

(As a check, it is always the case that Σi Qi = 0.)
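The same computation as a sketch in R, using the tip totals y_i. and the coupon (block) totals y_.j from the table:

```r
# Sketch: adjusted treatment totals Q_i for the hardness BIBD.
ti <- c(28.7, 29.0, 28.1, 29.9)  # tip totals y_i.
bj <- c(28.2, 28.1, 29.3, 30.1)  # coupon totals y_.j
n <- rbind(c(1, 1, 0, 1),        # n[i, j] = 1 if tip i appears on coupon j
           c(0, 1, 1, 1),
           c(1, 1, 1, 0),
           c(1, 0, 1, 1))
k <- 3
Q <- ti - as.vector(n %*% bj) / k
round(Q, 4)  # -0.1000 -0.1667 -0.4333  0.7000
sum(Q)       # ~0, as it must be
```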


Where is the Difference?


Due to the incompleteness of the blocks, ȳi. − ȳ.. is no longer an unbiased
estimate of τi . For instance in our example

    E[y1. ] = E[y11 + y12 + y14 ] = 3µ + 3τ1 + β1 + β2 + β4

    E[y.. ] = 12µ + r(τ1 + τ2 + τ3 + τ4 ) + k Σ_{j=1}^4 βj = 12µ

Then

    E[ȳ1. − ȳ.. ] = (3µ + 3τ1 + β1 + β2 + β4 )/3 − 12µ/12 = τ1 + (β1 + β2 + β4 )/3

The block totals must be brought in, in order to adjust for the bias.

Sum of Squares of Treatments

It turns out that the LSE's of the treatment effects are

    τ̂i = kQi /(λa),

and that these are unbiased. The 'Sum of Squares of Treatments, adjusted for
Blocks' is

    SSTr(Bl) = (k/(λa)) Σ_{i=1}^a Qi² ,

on a − 1 d.f. In our case

    SSTr(Bl) = (3/((2)(4))) · 0.7155 = 0.2683.

The idea is that we first estimate the block effects and then see how much of the
remaining variation is attributable to treatments. Doing it in the other order
results in something quite different.
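A short R sketch of these two formulas, with the Qi from the previous computation:

```r
# Sketch: adjusted treatment effects and SSTr(Bl) for the hardness BIBD.
Q <- c(-0.1000, -0.1667, -0.4333, 0.7000)
k <- 3; lambda <- 2; a <- 4
tau.hat <- k * Q / (lambda * a)          # unbiased treatment effect estimates
SSTrBl <- (k / (lambda * a)) * sum(Q^2)  # ~0.2683, on a - 1 = 3 d.f.
```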


Analysis in R

Correct Analysis:
> data=c(9.3,9.4,10,9.3,9.8,9.9,9.2,9.4,9.5,9.7,10,10.2)
> tip_s=as.factor(c(1,1,1,2,2,2,3,3,3,4,4,4))
> coupon_s=as.factor(c(1,2,4,2,3,4,1,2,3,1,3,4))
> g1 <- lm(data ~ coupon_s + tip_s)
> anova(g1)
Analysis of Variance Table

Response: data
Df Sum Sq Mean Sq F value Pr(>F)
coupon_s 3 0.90917 0.303056 29.3280 0.001339 **
tip_s 3 0.26833 0.089444 8.6559 0.020067 *
Residuals 5 0.05167 0.010333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Caution!

Incorrect Analysis:
> h1 <- lm(data ~ tip_s + coupon_s)
> anova(h1)
Analysis of Variance Table

Response: data
Df Sum Sq Mean Sq F value Pr(>F)
tip_s 3 0.56250 0.187500 18.145 0.004054 **
coupon_s 3 0.61500 0.205000 19.839 0.003311 **
Residuals 5 0.05167 0.010333
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Inferences

To make inferences, we use the fact that the τ̂i are independent and equally
varied, with

    VAR[τ̂i ] = kσ²/(λa),

so that

    se(τ̂i − τ̂j ) = √(2k · MSE/(λa)).

In our example this is 0.0880, so that single confidence intervals are

    τ̂i − τ̂j ± t_{α/2,5} · 0.0880

and simultaneous Tukey-type intervals replace t_{α/2,5} by qtukey(1 − α, 4, 5)/√2.
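A sketch of these quantities in R, taking MSE = 0.010333 from the BIBD anova output above:

```r
# Sketch: standard error and interval half-widths for treatment differences.
MSE <- 0.010333; k <- 3; lambda <- 2; a <- 4
se <- sqrt(2 * k * MSE / (lambda * a))  # ~0.0880
qt(0.975, 5) * se                       # half-width of a single 95% interval
(qtukey(0.95, 4, 5) / sqrt(2)) * se     # half-width of simultaneous Tukey intervals
```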


Exercise
Suppose that a chemical engineer thinks that the time of reaction of a chemical
process is a function of the type of catalyst employed. Four catalysts are currently
being investigated. The experimental procedure consists of selecting a batch of
raw material, loading the pilot plant, applying each catalyst in a separate run of
the pilot plant and observing the reaction time. Because variations in the batches
of raw material may affect the performance of the catalysts, the engineer decides
to use batches of raw material as blocks. However, each batch is only large
enough to permit three catalysts to be run. The balanced incomplete block design
for this experiment along with the observations recorded is given as follows:
Block (Batch of Raw Material)
Treatment (Catalyst) 1 2 3 4 yi.
1 73 74 - 71 218
2 - 75 67 72 214
3 73 75 68 - 216
4 75 - 72 75 222
y.j 221 224 207 218 870
Analyse the B.I.B.D data above.


Partially Balanced Incomplete Block Design (PBIBD)

A design for which v treatments are laid out in b blocks each of size k, no block
receiving a treatment more than once (nij ≤ 1) is called a PBIBD if the following
conditions are satisfied:
1. Each treatment is replicated the same number of times
2. Any two treatments are either first associates or second associates, . . . , or
mth associates of each other. Two treatments which are ith associates of
each other occur together λi times (i = 1, 2, . . . , m). The number of ith
associates of any treatment is ni , where ni does not depend on the treatments
considered (i = 1, 2, . . . , m). Thus we have a new set of parameters
n1 , n2 , . . . , nm and λ1 , λ2 , . . . , λm . If m = 1, the PBIBD reduces to a BIBD.
3. Given two treatments which are ith associates of each other, the number of
treatments common to the class of jth associate of one and the class of kth
associate of the other is the same whatever the pair of treatment we start
with and whatever the order in which we take them. This constant is denoted
as p^i_{jk}. By definition p^i_{jk} = p^i_{kj}.


Parameters of PBIBD

The parameters associated with the PBIBD are:


(i) b, v , r , k
(ii) λ1 , λ2 , . . . , λm
(iii) n1 , n2 , . . . , nm
(iv) P^i = (p^i_{jj'}) ; j, j' = 1, 2, . . . , m ; i = 1, 2, . . . , m
The set of parameters in (i), (ii) and (iii) are called primary parameters and those
in (iv) are called secondary parameters. The total number of parameters involved
in (i), (ii), (iii) and (iv) is 4 + 2m + m²(m + 1)/2, all of which are not however
independent.


The parameters are subject to the restrictions:


1. Σ_{j=1}^m nj = v − 1
2. bk = vr
3. r(k − 1) = Σ_{j=1}^m nj λj
4. (i) p^i_{i1} + p^i_{i2} + · · · + p^i_{im} = ni − 1
   (ii) p^i_{j1} + p^i_{j2} + · · · + p^i_{jm} = nj , (i ≠ j)
5. ni p^i_{jk} = nj p^j_{ik} = nk p^k_{ij}
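Restrictions 1-3 are easy to verify numerically; a sketch using the cube PBIBD given later in these notes (v = 8 corners, b = 6 faces):

```r
# Sketch: check the PBIBD parameter restrictions for the cube design.
v <- 8; b <- 6; r <- 3; k <- 4
n <- c(3, 3, 1)       # n1, n2, n3
lambda <- c(2, 1, 0)  # lambda1, lambda2, lambda3
sum(n) == v - 1                  # restriction 1: TRUE
b * k == v * r                   # restriction 2: TRUE
r * (k - 1) == sum(n * lambda)   # restriction 3: TRUE
```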


Proofs of Restrictions

(a) Given any treatment, all other treatments are either its first associates,
second associates, or, . . . , or mth associates. Thus Σ_{j=1}^m nj = v − 1.
(b) This is obvious because both sides equal the number of observations.
(c) The number of pairs of treatments occurring together that may be formed so
as to include one particular treatment always is r(k − 1). Again any
treatment occurs λi times with each of the treatments of its ith associates
and there are, in all, ni treatments which are ith associates of this treatment.
Hence the total number of pairs of treatments occurring together that can be
formed so as to include a particular treatment always is Σ_{j=1}^m nj λj . Hence
r(k − 1) = Σ_{j=1}^m nj λj .


Proofs Contd

(d) (i) In this case we consider two treatments α and β which are ith associates. Now
α has ni − 1 ith associates other than β. These ith associates of α can occur
in all or some of the m subgroups with respect to β. Thus
p^i_{i1} + p^i_{i2} + · · · + p^i_{im} = ni − 1.
(ii) In this case we consider two treatments α and β which are ith associates. Now
α or β has nj jth associates. Note α is an ith associate of β and β is an ith
associate of α. Also α is not a jth associate of β nor is β a jth associate of α.
Thus p^i_{j1} + p^i_{j2} + · · · + p^i_{jm} = nj , (i ≠ j).
(e) Consider the group Gi of ni treatments which are ith associates of a given
treatment θ and the group Gj of nj treatments which are jth associates of θ.
Every treatment belonging to Gi has exactly p^i_{jk} kth associates among the
treatments of group Gj . Hence the number of pairs of kth associates which
can be found by taking one treatment from Gi and one treatment from Gj is
on one hand ni p^i_{jk} and on the other hand nj p^j_{ik} . Similarly
nj p^j_{ik} = nk p^k_{ij} . Thus ni p^i_{jk} = nj p^j_{ik} = nk p^k_{ij} .


Construction of a PBIBD

Consider a cube whose eight corners are numbered arbitrarily. Assign blocks to the
numbers (treatments) appearing in each of the six faces of the solid cube. The
resulting design is a P.B.I.B. design with parameters:
v = 8, b = 6, r = 3, k = 4
λ1 = 2, λ2 = 1, λ3 = 0
n1 = 3, n2 = 3, n3 = 1.
The blocks are:
Bl1 : 1, 2, 3, 4
Bl2 : 5, 6, 7, 8
Bl3 : 1, 3, 5, 7
Bl4 : 2, 4, 6, 8
Bl5 : 1, 2, 5, 6
Bl6 : 3, 4, 7, 8
The labels for the corners are treatments in the blocks.
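The stated λ values can be recovered by counting, for each pair of treatments, the faces (blocks) containing both; a sketch:

```r
# Sketch: pairwise concurrence counts for the cube PBIBD blocks.
blocks <- list(c(1, 2, 3, 4), c(5, 6, 7, 8), c(1, 3, 5, 7),
               c(2, 4, 6, 8), c(1, 2, 5, 6), c(3, 4, 7, 8))
together <- combn(8, 2, function(pr)
  sum(sapply(blocks, function(b) all(pr %in% b))))
table(together)  # 4 pairs occur together 0 times, 12 pairs once, 12 pairs twice
```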


Association Scheme

Treatment 1st Associates 2nd Associates 3rd Associates


1 2,3,5 4,6,7 8
2 1,4,6 5,3,8 7
3 1,4,7 2,5,8 6
4 2,3,8 1,6,7 5
5 1,6,7 2,3,8 4
6 2,5,8 1,4,7 3
7 3,5,8 1,4,6 2
8 4,6,7 2,3,5 1


Here P^i is a 3 × 3 matrix since i, j, k = 1, 2, 3.

          ( p^1_{11}  p^1_{12}  p^1_{13} )
    P^1 = ( p^1_{21}  p^1_{22}  p^1_{23} )
          ( p^1_{31}  p^1_{32}  p^1_{33} )

P^2 and P^3 are also 3 × 3 matrices. (Write them down!)

Now, p^1_{11} denotes the number of treatments common to the first associates of
one and the first associates of the other when the two treatments are themselves
first associates (e.g. 1 and 2). In this case it is zero.
p^1_{12} denotes the number of treatments common to the first associates of one
and the second associates of the other, the two treatments under consideration
being first associates (e.g. 1 and 2). In this case p^1_{12} = 2, etc.


Thus
          ( 0  2  0 )
    P^1 = ( 2  0  1 )
          ( 0  1  0 )
Similarly
          ( 2  0  1 )
    P^2 = ( 0  2  0 )
          ( 1  0  0 )
and
          ( 0  3  0 )
    P^3 = ( 3  0  0 )
          ( 0  0  0 )


Another Method of obtaining P.B.I.B. design

Consider pq treatments arranged into p rows and q columns as follows:


    t11  t12  · · ·  t1q
    t21  t22  · · ·  t2q
     ..   ..          ..
    tp1  tp2  · · ·  tpq
Associate with each treatment a block containing all treatments in the vertical
and horizontal line through it.
The resulting arrangement is a PBIB design with parameters; v = b = pq,
k = p + q − 1, r = p + q − 1, λ1 = p, λ2 = q, λ3 = 2, n1 = p − 1, n2 = q − 1,
n3 = (p − 1) (q − 1) and
 
p−2 0 0
P1 =  0 0 q−1 
0 q − 1 (p − 2) (q − 1)


PART IV-1:
INTRODUCTION TO FACTORIAL DESIGNS


Factorial Designs
Introduction

I In a factorial design the cells consist of all possible combinations of the levels
of the factors under study.
I The simplest types of factorial designs involve only two factors or sets of
treatments. There are a levels of factor A and b levels of factor B, and these
are arranged in a factorial design; that is, each replicate of the experiment
contains all ab treatment combinations. In general, there are n replicates.
I Factorial designs accentuate the factor effects, allow for estimation of
inter-dependency of effects (or interactions), and are the first technique in
the category of what is called treatment design.
I By examining all possible combinations of factor levels, the number of
replicates of a specific level of one factor is increased by the product of the
number of levels of all other factors in the design, and thus the same power
or precision can be obtained with fewer replicates.
I If the effect of one factor changes depending on the level of another factor, it
will be seen in a factorial plan. This phenomenon will be missed in the
classical approach where each factor is only varied at constant levels of the
other factors.

Notation and Terminology


The following are notations and terminologies used in factorial experiments:
I Each basic variable in the experiment will be called a factor and will be
designated by a capital letter (A, B, C , D, . . .). The total number of factors
will be designated by n.
I The possible forms that a factor can take are known as levels of the factor.
The levels will be denoted by superscripts and will be numbered beginning
with 0.
I A particular combination involving one level from each factor is a treatment.
Treatments will be designated by lower case letters with appropriate
superscripts. In using superscripts, the following conventions will apply
If the factor level is zero, the corresponding letter will be omitted from the
treatment designation.
Unless otherwise specified, the factor level is 1.
If all factors are at zero level, the treatment will be designated by the symbol
(I )
I Factorial effects will be designated by capital letters with appropriate
superscripts, the conventions given above applying.

General description of 2n factorial designs


Let there be an experiment with n factors, say A1 , A2 , A3 , . . . , An . Suppose each
factor is tried at two levels

    Xi = 1 (upper level) or Xi = 0 (lower level),   i = 1, 2, . . . , n

Then the total number of treatment combinations is 2 ∗ 2 ∗ · · · ∗ 2 (n times), that
is, 2^n . Any treatment combination will be denoted by

    a1^X1 a2^X2 · · · an^Xn     (1)

For example, if X1 = 1 and X2 = X3 = · · · = Xn = 0, then the treatment defined
by (1) means

    a1^1 a2^0 a3^0 · · · an^0 = a1

The treatment combination in (1) can be written in the form

    (a1^0 ; a1^1) ⊗ (a2^0 ; a2^1) ⊗ (a3^0 ; a3^1) ⊗ · · · ⊗ (an^0 ; an^1)     (2)

where ⊗ means symbolic direct product and (ai^0 ; ai^1) denotes the column with
entries ai^0 and ai^1.

General Description Contd


For example

    (a1^0 ; a1^1) ⊗ (a2^0 ; a2^1) = (a1^0 a2^0 ; a1^1 a2^0 ; a1^0 a2^1 ; a1^1 a2^1)     (3)

Here we have 2² treatments. Similarly

    (a1^0 ; a1^1) ⊗ (a2^0 ; a2^1) ⊗ (a3^0 ; a3^1)
      = (a1^0 a2^0 ; a1^1 a2^0 ; a1^0 a2^1 ; a1^1 a2^1) ⊗ (a3^0 ; a3^1)
      = (a1^0 a2^0 a3^0 ; a1^1 a2^0 a3^0 ; a1^0 a2^1 a3^0 ; a1^1 a2^1 a3^0 ;
         a1^0 a2^0 a3^1 ; a1^1 a2^0 a3^1 ; a1^0 a2^1 a3^1 ; a1^1 a2^1 a3^1)

Treatment Effects
The treatment effects will be denoted by A1^X1 A2^X2 · · · An^Xn where

    Xi = 1 or 0,   i = 1, 2, 3, . . . , n     (4)

For example, if X1 = 1 and X2 = X3 = · · · = Xn = 0 then (4) means A1 , which is
the main effect of factor A1 . Similarly if X1 = X2 = 1 and
X3 = X4 = · · · = Xn = 0 then (4) will mean A1 A2 , which is the two factor
interaction between factor A1 and factor A2 . Similarly if X1 = X2 = X3 = 1 and
X4 = X5 = · · · = Xn = 0 then (4) will mean A1 A2 A3 . This is the three factor
interaction between factors A1 , A2 and A3 .
We have
1. n main effects A1 , A2 , . . . , An
2. C(n, 2) two factor interactions A1 A2 , A1 A3 , . . . , An−1 An
3. C(n, 3) three factor interactions A1 A2 A3 , A1 A2 A4 , . . . , An−2 An−1 An , e.t.c.

The total number of treatment effects, where C(n, r) denotes the binomial
coefficient, is

    C(n, 1) + C(n, 2) + C(n, 3) + · · · + C(n, n) = 2^n − 1
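This count is just the binomial identity; a one-line check in R for, say, n = 5 factors:

```r
# Sketch: count the factorial effects of a 2^n design for n = 5.
n <- 5
c(choose(n, 1), choose(n, 2), choose(n, 3))  # 5 main effects, 10 two-factor, 10 three-factor
sum(choose(n, 1:n)) == 2^n - 1               # TRUE
```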

Total Number of Treatment Effects

Including the grand average of all observations, the total number of treatment
effects is 2^n . All these factors (treatment effects or factorial effects) are given by
the matrix symbolic product

    (I ; A1 ) ⊗ (I ; A2 ) ⊗ · · · ⊗ (I ; An )

where I is the average of all treatment effects.

All treatments are generated by the symbolic direct products

    (1 ; a1 ) ⊗ (1 ; a2 ) ⊗ · · · ⊗ (1 ; an )


Definitions

I Simple effect of a factor: The average change in response produced by a
  change in the level of the factor, all the other factors being held constant.
  When all factors are at two levels, each factor of a 2^n factorial experiment
  has 2^{n−1} simple effects.
I Main effect of a factor: The average of the simple effects. Interpretation of
this quantity is conditioned on the assumption of an additive model.
I Interaction between two factors: The average difference between the
simple effects of one factor determined at the levels of the second factor.
I Design Matrix: An array of t rows and n columns which specifies the
treatments to be included in the experiment. In presenting this array of 2n
factorials, zeros will be replaced by minus signs and ones by plus signs.
I X-matrix: The augmented design matrix which specifies the linear
combinations of the treatments to be used in estimating all main effects and
interactions. Interaction columns are obtained by taking the products of the
corresponding main effect columns.
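The last two definitions can be illustrated by building the X-matrix of a 2³ factorial in R, with zeros replaced by minus ones and ones by plus ones, and interaction columns formed as products of the main-effect columns (a sketch):

```r
# Sketch: design matrix and augmented X-matrix for a 2^3 factorial.
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))  # design matrix
X <- cbind(I = 1, A = d$A, B = d$B, C = d$C,
           AB = d$A * d$B, AC = d$A * d$C, BC = d$B * d$C,
           ABC = d$A * d$B * d$C)
X
all(crossprod(X) == 8 * diag(8))  # TRUE: the 8 columns are mutually orthogonal
```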


Two-Factor Factorial Design


I We study the effects of two or more factors, each at several levels. A
Factorial Design has observations at all combinations of these levels.
I The simplest types of factorial designs involve only two factors or sets of
treatments. There are a levels of factor A and b levels of factor B, and these
are arranged in a factorial design; that is, each replicate of the experiment
contains all ab treatment combinations. In general, there are n replicates.
I Example: Batteries are of two types ('1' and '2'; a = 2 levels of Factor A)
  and their lifetimes may depend on the temperature at which they are used
  (LO='1', HI='2'; b = 2 levels of factor B). n = 4 observations are made at
  each of the 2² (= levels^factors) combinations of levels. The nab = 16
  observations are made in a random order, so this is also a CRD.
Temperature level (B)
Type (A) 1 (LO) 2 (HI)
1 130, 155, 74, 180 20, 70, 82, 58
2 150, 188, 159, 126 25, 70, 58, 45
I We see the effect of changing one factor, while leaving the other fixed, by
plotting the means ȳij. at the 4 combinations.

Interaction Plots - R Code

> y <- c(130,155,74,180,20,70,82,58,


+ 150,188,159,126,25,70,58,45)
> type <- as.factor(rep(1:2,each=8))
> temp <- as.factor(rep(1:2, each=4, times=2))
> data <- data.frame(y,type,temp)
> interaction.plot(type,temp,y)
> interaction.plot(temp,type,y)
Interpretation: If there is an interaction or joint effect between two factors, the
effect of one factor upon the response will differ depending on the level of the
other factor.


Interaction Plots
[Figure: interaction plots of the cell means ȳij. — mean of y against type with one line per temp level, and mean of y against temp with one line per type.]

Main Effects
I The average lifetimes at the 4 combinations of levels are
Temperature level (B)
Type (A) 1 (LO) 2 (HI)
1 134.75 57.5
2 155.75 49.5
I The ’main effect of A’ is the change in response caused by changing the level
of A. Here it is estimated by the difference in the average responses at the two
levels of Factor A:

A = (155.75 + 49.5)/2 − (134.75 + 57.5)/2 = 6.5.

I Similarly

B = (57.5 + 49.5)/2 − (134.75 + 155.75)/2 = −91.75.

I Because of the interactions, these main effects are misleading. At the low
level of B, the effect of A is 155.75 − 134.75 = 21.00. At the high level, it is
49.5 − 57.5 = −8.0. The ’interaction effect’ is measured by the average
difference between these two: AB = (−8.0 − 21.00)/2 = −14.5.
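These cell-mean calculations are easy to check by hand or outside R; a quick sketch in Python, using the table of averages above:

```python
# Main effects and interaction from the four cell means of the 2x2 battery data.
m = {(1, 'LO'): 134.75, (1, 'HI'): 57.50,
     (2, 'LO'): 155.75, (2, 'HI'): 49.50}

A = (m[(2, 'LO')] + m[(2, 'HI')]) / 2 - (m[(1, 'LO')] + m[(1, 'HI')]) / 2
B = (m[(1, 'HI')] + m[(2, 'HI')]) / 2 - (m[(1, 'LO')] + m[(2, 'LO')]) / 2
# interaction: average difference between the simple effects of A at the
# two temperature levels
AB = ((m[(2, 'HI')] - m[(1, 'HI')]) - (m[(2, 'LO')] - m[(1, 'LO')])) / 2
print(A, B, AB)   # 6.5 -91.75 -14.5
```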

Model
I The effects model, including terms for interaction, is that the kth observation
at level i of A, j of B is

yijk = µ + τi + βj + (τβ)ij + εijk

i = 1, . . . , a = levels of Factor A
j = 1, . . . , b = levels of Factor B
k = 1, . . . , n.

I Constraints: Σi τi = 0 (average effect of levels of A is 0), Σj βj = 0 (average
effect of levels of B is 0), and average interactions Σi (τβ)ij = Σj (τβ)ij = 0.
I Reasonable estimates of these effects, obeying these constraints, are:

µ̂ = ȳ... ,
τ̂i = ȳi.. − ȳ... ,
β̂j = ȳ.j. − ȳ... ,
(τβ)ˆij = (ȳij. − ȳ...) − τ̂i − β̂j
        = ȳij. − ȳi.. − ȳ.j. + ȳ... .
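A numerical sketch (illustrative helper code, using the 2x2 battery data from earlier) showing that these estimates do satisfy the constraints:

```python
# Moment estimates for the two-factor effects model, 2x2 battery data, n = 4.
cells = {(1, 1): [130, 155, 74, 180], (1, 2): [20, 70, 82, 58],
         (2, 1): [150, 188, 159, 126], (2, 2): [25, 70, 58, 45]}

def mean(xs):
    return sum(xs) / len(xs)

ybar = mean([v for obs in cells.values() for v in obs])         # grand mean
yA = {i: mean(cells[(i, 1)] + cells[(i, 2)]) for i in (1, 2)}   # row means
yB = {j: mean(cells[(1, j)] + cells[(2, j)]) for j in (1, 2)}   # column means

tau = {i: yA[i] - ybar for i in (1, 2)}
beta = {j: yB[j] - ybar for j in (1, 2)}
tb = {(i, j): mean(cells[(i, j)]) - yA[i] - yB[j] + ybar
      for i in (1, 2) for j in (1, 2)}

print(round(sum(tau.values()), 10), round(sum(beta.values()), 10))  # both 0.0
print({k: round(v, 3) for k, v in tb.items()})
```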

Sums of Squares

The effect estimates are shown to be the LSEs in the usual way:
I Decompose SST as:

SST = Σijk (yijk − ȳ...)²
    = Σijk [τ̂i + β̂j + (τβ)ˆij + (yijk − ȳij.)]²
    = nb Σi τ̂i² + na Σj β̂j² + n Σij (τβ)ˆij² + Σijk (yijk − ȳij.)²
    = SSA + SSB + SSAB + SSE ,

on a − 1, b − 1, (a − 1)(b − 1) and ab(n − 1) degrees of freedom.


Least Squares Method

I Minimize

S(µ, τ, β, (τβ)) = Σijk (yijk − E[yijk])²
                 = Σijk (yijk − µ − τi − βj − (τβ)ij)²

to obtain the parameter estimators.

I Under the hypothesis of no interactions, the appropriate F-ratio is

F0 = MSAB / MSE ∼ F(a−1)(b−1), ab(n−1)

I If the interactions are not significant then it makes sense to ask about the
significance of the levels of the factors, using MSA/MSE, etc.


Expected Values of Mean Squares

The expected values of the mean squares turn out to be what one would expect:

E[MSE]  = σ²,
E[MSAB] = σ² + n Σij (τβ)ij² / [(a − 1)(b − 1)],
E[MSA]  = σ² + nb Σi τi² / (a − 1),
E[MSB]  = σ² + na Σj βj² / (b − 1).


Computational Formulas for Sums of Squares

The total sum of squares is computed as

SST = Σi Σj Σk yijk² − y...²/(abn).

The sums of squares for the main effects are

SSA = (1/bn) Σi yi..² − y...²/(abn)

and

SSB = (1/an) Σj y.j.² − y...²/(abn).


It is convenient to obtain SSAB in two stages. First we compute the sum of
squares between the ab cell totals, which is called the sum of squares due to
”subtotals”:

SSSubtotals = (1/n) Σi Σj yij.² − y...²/(abn).

This sum of squares also contains SSA and SSB . Therefore, the second step is to
compute SSAB as

SSAB = SSSubtotals − SSA − SSB

We may compute SSE by subtraction as

SSE = SST − SSAB − SSA − SSB

or

SSE = SST − SSSubtotals


Task

Write down the theoretical ANOVA table for a two factor factorial experiment
with n observations per cell.


Example- Two Factor Factorial Design


I As an example of a factorial design involving two factors, an engineer is
designing a battery for use in a device that will be subjected to some extreme
variations in temperature. The only design parameter that he can select at
this point is the plate material for the battery, and he has three possible
choices.
I When the device is manufactured and is shipped to the field, the engineer has
no control over the temperature extremes that the device will encounter, and
he knows from experience that temperature will probably affect the effective
battery life. However, temperature can be controlled in the product
development laboratory for the purposes of a test. The engineer decides to
test all three plate materials at three temperature levels - 15, 70, and 125 F -
because these temperature levels are consistent with the product end-use
environment.
I Because there are two factors at three levels, this design is sometimes called a
3^2 factorial design. Four batteries are tested at each combination of plate
material and temperature, and all 36 tests are run in random order. The
experiment and the resulting observed battery life data are given in the next
Table.

Example Contd

I In this problem the engineer wants to answer the following questions:


What effects do material type and temperature have on the life of the battery?
Is there a choice of material that would give uniformly long life regardless of
temperature?
Material              Temperature (F)
Type      15                75      70                  125               yi..
1         130 155 74 180            34 40 80 75         20 70 82 58
          (total 539)               (total 229)         (total 230)        998
2         150 188 159 126          136 122 106 115      25 70 58 45
          (total 623)               (total 479)         (total 198)       1300
3         138 110 168 160          174 120 150 139      96 104 82 60
          (total 576)               (total 583)         (total 342)       1501
y.j.      1738                      1291                 770              3799 = y...


Sums of Squares

SST = Σi Σj Σk yijk² − y...²/(abn)
    = 130² + 155² + · · · + 60² − 3799²/36 = 77,646.97

SSMaterial = (1/bn) Σi yi..² − y...²/(abn)
           = [998² + 1300² + 1501²]/[(3)(4)] − 3799²/36 = 10,683.72

SSTemperature = (1/an) Σj y.j.² − y...²/(abn)
              = [1738² + 1291² + 770²]/[(3)(4)] − 3799²/36 = 39,118.72

SSInteraction = (1/n) Σi Σj yij.² − y...²/(abn) − SSMaterial − SSTemperature
              = [539² + 229² + · · · + 342²]/4 − 3799²/36
                − 10,683.72 − 39,118.72 = 9613.78

and

SSE = SST − SSMaterial − SSTemperature − SSInteraction
    = 77,646.97 − 10,683.72 − 39,118.72 − 9613.78 = 18,230.75
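The arithmetic above can be checked outside R. A sketch in Python, using the cell data from the table:

```python
# Computational-formula sums of squares for the 3x3 battery-life factorial,
# a = 3 materials, b = 3 temperatures, n = 4 replicates per cell.
cells = {(1, 1): [130, 155, 74, 180], (1, 2): [34, 40, 80, 75],
         (1, 3): [20, 70, 82, 58],
         (2, 1): [150, 188, 159, 126], (2, 2): [136, 122, 106, 115],
         (2, 3): [25, 70, 58, 45],
         (3, 1): [138, 110, 168, 160], (3, 2): [174, 120, 150, 139],
         (3, 3): [96, 104, 82, 60]}
a = b = 3
n = 4
grand = sum(sum(v) for v in cells.values())                    # y... = 3799
CF = grand ** 2 / (a * b * n)                                  # correction term
SST = sum(x * x for v in cells.values() for x in v) - CF
SSA = sum(sum(sum(cells[(i, j)]) for j in range(1, 4)) ** 2
          for i in range(1, 4)) / (b * n) - CF
SSB = sum(sum(sum(cells[(i, j)]) for i in range(1, 4)) ** 2
          for j in range(1, 4)) / (a * n) - CF
SSsub = sum(sum(v) ** 2 for v in cells.values()) / n - CF      # "subtotals"
SSAB = SSsub - SSA - SSB
SSE = SST - SSsub
print([round(s, 2) for s in (SST, SSA, SSB, SSAB, SSE)])
# [77646.97, 10683.72, 39118.72, 9613.78, 18230.75]
```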

Task: Summarize the information in an ANOVA Table. Is the interaction between


material types and temperature significant?


Solution - R

> y<-c(130,155,74,180,34,40,80,75,20,70,82,
+ 58,150,188,159,126,136,122,106,115,25,70,
+ 58,45,138,110,168,160,174,120,150,139,96,104,82,60)
> type <- as.factor(rep(1:3,each=12))
> temp <- as.factor(rep(c(15,70,125),each=4,times=3))
> data <- data.frame(type, temp, y)
> means <- matrix(nrow=3,ncol=3)
> for(i in 1:3) {for (j in 1:3) means[i,j] <-
+ mean(y[type==i & temp == c(15,70,125)[j]])}
> means
[,1] [,2] [,3]
[1,] 134.75 57.25 57.5
[2,] 155.75 119.75 49.5
[3,] 144.00 145.75 85.5
> interaction.plot(type,temp,y)
> interaction.plot(temp,type,y)


Interaction Plots
[Figure: interaction plots of mean of y against type (one line per temp level: 15, 70, 125) and against temp (one line per type: 1, 2, 3). The lines are not parallel.]


Interpretation

I To assist in interpreting the results of this experiment, it is helpful to


construct a graph of the average responses at each treatment combination.
I The significant interaction is indicated by the lack of parallelism of the lines.
I In general, longer life is attained at low temperature, regardless of material
type.
I Changing from low to intermediate temperature, battery life with material
type 3 may actually increase, whereas it decreases for types 1 and 2.
I From intermediate to high temperature, battery life decreases for material
types 2 and 3 and is essentially unchanged for type 1.
I Material type 3 seems to give the best results if we want less loss of effective
life as the temperature changes.


Test for Interaction Effect


> g <- lm(y~type+temp+type*temp)
> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
type 2 10684 5341.9 7.9114 0.001976 **
temp 2 39119 19559.4 28.9677 1.909e-07 ***
type:temp 4 9614 2403.4 3.5595 0.018611 *
Residuals 27 18231 675.2
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I As suspected, the interaction effects are quite significant. There is no battery
type which is ’best’ at all temperatures.
I If interactions were NOT significant one could compare the µ + τi by seeing
which of the estimates µ̂ + τ̂i = ȳi.. were significantly different from each
other (using se(ȳi.. − ȳk..) = √(2MSE/(nb))).

Comparisons at fixed Levels of other Factor

I Note that in this experiment, interaction is significant. When interaction is


significant, comparisons between the means of one factor (e.g., A) may be
obscured by the AB interaction.
I As it is, we can only make comparisons at fixed levels of the other factor. For
instance when temp=70 (j=2) we can compare the means

µi2 = µ + τi + β2 + (τβ)i2 ,

with estimates ȳi2. (each an average of n observations).
I The 95% Tukey CIs on µi2 − µk2 are

ȳi2. − ȳk2. ± qtukey(.95, 3, 27) √(MSE/n) = ȳi2. − ȳk2. ± 45.55.

I Since ȳ12. = 57.25, ȳ22. = 119.75 and ȳ32. = 145.75 we conclude that µ12 is
significantly less than µ22 and µ32 , but that these two are not significantly
different from each other.
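A sketch of this comparison in Python; the studentized-range quantile q(.95; 3, 27) ≈ 3.506 is taken as given here (in R it is qtukey(.95, 3, 27)):

```python
# Tukey comparisons of the three material means at temp = 70 F (j = 2).
import math

MSE, n = 675.2, 4            # from the ANOVA table
q = 3.506                    # approx. qtukey(.95, 3, 27)
margin = q * math.sqrt(MSE / n)                  # about 45.55
ybar = {1: 57.25, 2: 119.75, 3: 145.75}          # cell means at temp = 70
for i, k in [(1, 2), (1, 3), (2, 3)]:
    diff = ybar[i] - ybar[k]
    print(f"mu_{i}2 - mu_{k}2 = {diff:7.2f}, significant: {abs(diff) > margin}")
```

Only the comparison of types 2 and 3 (difference −26.00) falls inside the margin, matching the conclusion above.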


Model Adequacy Checking

Task: Carry out Model diagnostics


Test for non-additivity; 3 factor designs


I Sometimes we can make only a single replicate, that is, one observation per
cell (n=1). Then all yij1 − ȳij. = 0, so SSE = 0 on ab(n − 1) = 0 d.f.
I The interaction SS, which for n = 1 is

Σij (yij − ȳi. − ȳ.j + ȳ..)²     (5)

is what we should be using to estimate experimental error. There is still
however a way to test for interactions, if we assume that they take a simple
form:

(τβ)ij = γτi βj .

I We carry out ’Tukey’s one d.f. test for interaction’, which is an application of
the usual ’reduction in SS’ hypothesis testing principle.
I The ’full’ model is

yij = µ + τi + βj + γτi βj + εij .

I Under the null hypothesis H0 : γ = 0 of no interactions, the ’reduced’ model
is

yij = µ + τi + βj + εij ,

in which the minimum SS (i.e. SSRed ) is (5) above.

I One computes

F0 = [(SSRed − SSFull)/1] / MSE(Full) ∼ F1, (a−1)(b−1)−1 .

I The difference

SSN = SSRed − SSFull

is called the ’SS for non-additivity’, and uses 1 d.f. to estimate the one
parameter γ.
I The ANOVA becomes

Source   SS     df                   MS
A        SSA    a − 1                MSA = SSA/(a − 1)
B        SSB    b − 1                MSB = SSB/(b − 1)
N        SSN    1                    MSN = SSN/1
Error    SSE    (a − 1)(b − 1) − 1   MSE = SSE/df(Err)
Total    SST    ab − 1


Error Sum of Squares

The error SS is SSFull . To obtain it one has to minimize

Σij (yij − [µ + τi + βj + γτi βj ])².

After calculation it turns out that

SSN = {Σij yij yi. y.j − y.. [SSA + SSB + y..²/(ab)]}² / (ab · SSA · SSB),

with one degree of freedom, and SSE is obtained by subtraction:
SSE = SSRed − SSN , with (a − 1)(b − 1) − 1 degrees of freedom. To test for the
presence of interaction, we compute

F0 = SSN / {SSE /[(a − 1)(b − 1) − 1]}.


Example

The impurity present in a chemical product is affected by two factors - pressure and
temperature. The data from a single replicate of a factorial experiment are as
shown:

                         Pressure
Temperature (F)   25   30   35   40   45    yi.
100                5    4    6    3    5     23
125                3    1    4    2    3     13
150                1    1    3    1    2      8
y.j                9    6   13    6   10     44 = y..

Analyse the data.


Sums of Squares

The sums of squares are

SSA = (1/b) Σi yi.² − y..²/(ab) = (1/5)[23² + 13² + 8²] − 44²/[(3)(5)] = 23.33

SSB = (1/a) Σj y.j² − y..²/(ab)
    = (1/3)[9² + 6² + 13² + 6² + 10²] − 44²/[(3)(5)] = 11.60

SST = Σi Σj yij² − y..²/(ab) = 166 − 129.07 = 36.93

and

SSResidual = SST − SSA − SSB = 36.93 − 23.33 − 11.60 = 2.00


The sum of squares for nonadditivity is computed as follows:

Σi Σj yij yi. y.j = (5)(23)(9) + (4)(23)(6) + · · · + (2)(8)(10) = 7236

SSN = {Σi Σj yij yi. y.j − y.. [SSA + SSB + y..²/(ab)]}² / (ab SSA SSB)
    = [7236 − (44)(23.33 + 11.60 + 129.07)]² / [(3)(5)(23.33)(11.60)]
    = [20.00]² / 4059.42 = 0.0985

and the error sum of squares is

SSE = SSResidual − SSN = 2.00 − 0.0985 = 1.9015
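These steps are easy to reproduce outside R; a sketch in Python of the whole one-d.f. computation for the impurity data:

```python
# Tukey's one-degree-of-freedom test for nonadditivity, impurity data
# (rows = temperature, columns = pressure, single replicate).
y = [[5, 4, 6, 3, 5],
     [3, 1, 4, 2, 3],
     [1, 1, 3, 1, 2]]
a, b = 3, 5
yi = [sum(row) for row in y]                          # row totals 23, 13, 8
yj = [sum(row[j] for row in y) for j in range(b)]     # column totals
ydd = sum(yi)                                         # grand total 44
CF = ydd ** 2 / (a * b)
SSA = sum(t * t for t in yi) / b - CF
SSB = sum(t * t for t in yj) / a - CF
SST = sum(x * x for row in y for x in row) - CF
SSres = SST - SSA - SSB
cross = sum(y[i][j] * yi[i] * yj[j] for i in range(a) for j in range(b))
SSN = (cross - ydd * (SSA + SSB + CF)) ** 2 / (a * b * SSA * SSB)
SSE = SSres - SSN
F0 = SSN / (SSE / ((a - 1) * (b - 1) - 1))
print(round(SSN, 4), round(SSE, 4), round(F0, 2))   # 0.0985 1.9015 0.36
```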


ANOVA Table

The complete ANOVA is summarized as follows:


Source of      Sum of    Degrees of   Mean     F0
Variation      Squares   Freedom      Square
Temperature    23.33     2            11.67    42.97
Pressure       11.60     4             2.90    10.68
Nonadditivity   0.0985   1             0.0985   0.36
Error           1.9015   7             0.2716
Total          36.93     14
The test statistic for nonadditivity is F0 = 0.36, so we conclude that there is no
evidence of interaction in these data. The main effects of temperature and
pressure are significant.


Conclusion

In concluding this section, we note that the two-factor factorial model with one
observation per cell looks exactly like the randomized complete block model. In
fact, the Tukey single-degree-of-freedom test for nonadditivity can be directly
applied to test for interaction in the randomized block model. However, remember
that the experimental situations that lead to the randomized block and factorial
models are very different. In the factorial model, all ab runs have been made in
random order, whereas in the randomized block model, randomization occurs only
within the block. The blocks are a randomization restriction. Hence, the manner
in which the experiments are run and the interpretation of the two models are
quite different.

Factorial Designs The General Factorial Design

The General Factorial Design


I The results for the two-factor factorial design may be extended to the general
case where there are a levels of factor A, b levels of factor B, c levels of
factor C , and so on, arranged in a factorial experiment.
I In general, there will be abc · · · n total observations if there are n replicates of
the complete experiment.
I Once again, note that we must have at least two replicates (n ≥ 2) to
determine a sum of squares due to error if all possible interactions are
included in the model.
I If all factors in the experiment are fixed, we may easily formulate and test
hypotheses about the main effects and interactions using the ANOVA.
I For a fixed effects model, test statistics for each main effect and interaction
may be constructed by dividing the corresponding mean square for the effect
or interaction by the mean square error. All of these F tests will be upper-tail,
one-tail tests.
I The number of degrees of freedom for any main effect is the number of levels
of the factor minus one, and the number of degrees of freedom for an
interaction is the product of the number of degrees of freedom associated
with the individual components of the interaction.

Three-factor Factorial

Model:

yijkl = µ + τi + βj + γk + (τβ)ij + (τγ)ik + (βγ)jk + (τβγ)ijk + εijkl

for

i = 1, 2, . . . , a
j = 1, 2, . . . , b
k = 1, 2, . . . , c
l = 1, 2, . . . , n


A three-factor example

A soft drink bottler is interested in obtaining more uniform fill heights in the
bottles produced by his manufacturing process. The filling machine theoretically
fills each bottle to the correct target height, but in practice, there is variation
around this target, and the bottler would like to understand the sources of this
variability better and eventually reduce it. The process engineer can control three
variables during the filling process: the percent carbonation (A), the operating
pressure in the filler (B), and the bottles produced per minute or the line speed
(C). The pressure and speed are easy to control, but the percent carbonation is
more difficult to control during actual manufacturing because it varies with
product temperature. However, for purposes of an experiment, the engineer can
control carbonation at three levels: 10, 12, and 14 percent. She chooses two levels
for pressure (25 and 30 psi) and two levels for line speed (200 and 250 bpm). She
decides to run two replicates of a factorial design in these three factors, with all 24
runs taken in random order. The response variable observed is the average
deviation from the target fill height observed in a production run of bottles at
each set of conditions.


Fill Height Deviation Data

                          Operating Pressure (B)
                          25 psi                 30 psi
                          Line Speed (C)         Line Speed (C)
Percent Carbonation (A)   200       250          200       250
10                        -3, -1    -1, 0        -1, 0     1, 1
12                         0, 1      2, 1         2, 3     6, 5
14                         5, 4      7, 6         7, 9     10, 11


Solve in R

> y <- c(-3,-1,0,1,5,4, -1,0,2,1,7,6,


+ -1,0,2,3,7,9, 1,1,6,5,10,11)
> carbon <- as.factor(rep(c(10,12,14),each=2, times=4))
> press <- as.factor(rep(c(25,30), each=12))
> speed <- as.factor(rep(c(200,250), each=6, times=2))
> data <- data.frame(y, carbon, press, speed)
> par(mfrow=c(2,2))
> plot.design(data)
> interaction.plot(carbon,press,y)
> interaction.plot(carbon,speed,y)
> interaction.plot(press,speed,y)


Plots

[Figure: design plot of mean of y against the factors carbon, press, speed, and interaction plots of mean of y for carbon × press, carbon × speed, and press × speed. The lines are roughly parallel, suggesting little interaction.]


Analysis
> g<-lm(y ~ carbon + press + speed + carbon*press
+ + carbon*speed + press*speed + carbon*press*speed)
> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
carbon 2 252.750 126.375 178.4118 1.186e-09 ***
press 1 45.375 45.375 64.0588 3.742e-06 ***
speed 1 22.042 22.042 31.1176 0.0001202 ***
carbon:press 2 5.250 2.625 3.7059 0.0558081 .
carbon:speed 2 0.583 0.292 0.4118 0.6714939
press:speed 1 1.042 1.042 1.4706 0.2485867
carbon:press:speed 2 1.083 0.542 0.7647 0.4868711
Residuals 12 8.500 0.708
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion

I It seems that interactions are largely absent, and that all three main effects
are significant. In particular, the low level of pressure results in smaller mean
deviations from the target.
I We see that the percentage of carbonation, operating pressure, and line
speed significantly affect the fill volume. The carbonation-pressure interaction
F ratio has a P-value of 0.0558, indicating some interaction between these
factors.
Task: Carry out an analysis of the residuals.

The 2k Factorial Design

PART IV-2:
THE 2^k FACTORIAL DESIGN

The 2k Factorial Design Introduction

Introduction

I A 2^k design includes k main effects, (k choose 2) two-factor interactions,
(k choose 3) three-factor interactions, . . . , and one k-factor interaction.
I The same notation is used for treatment combinations. For example: in a 2^5
design abd denotes A, B, D at the high level; and C, E at the low level.
I Treatment combinations may be written in standard order.
I To estimate an effect, we can use a table of plus and minus signs.
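The effect counts in the first bullet are binomial coefficients; a quick sketch of the count for k = 5:

```python
# Number of m-factor effects in a 2^k design is C(k, m); they total 2^k - 1.
from math import comb

k = 5
counts = {m: comb(k, m) for m in range(1, k + 1)}
print(counts)                           # {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(sum(counts.values()), 2**k - 1)   # 31 31
```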


A Single Replicate of the 2^k Design

I Even for a moderate number of factors, the total number of treatment
combinations in a 2^k factorial design is large. Example: the 2^6 design has
64 treatment combinations.
I Frequently, available resources only allow a single replicate of the design to be
run, called an unreplicated factorial.
I With only one replicate, there is no estimate for error.
I To analyze an unreplicated factorial, we assume that certain higher order
interactions are negligible and combine their mean squares to estimate the
error.
I Sparsity of effects principle: most systems are dominated by some of the
main effects and low order interactions, and most high order interactions are
negligible.

The 2k Factorial Design 22 Factorials

2^2 Factorials

I Two factors (A and B), each at two levels - low (’-’) and high (’+’). The
number of replicates = n.
I Example - investigate yield (y ) of a chemical process when the concentration
of a reactant (the primary substance producing the yield) - factor A - and
amount of a catalyst (to speed up the reaction) - factor B - are changed.
E.g. nickel is used as a ’catalyst’, or a carrier of hydrogen in the
hydrogenation of oils (the reactants) for use in the manufacture of margarine.
Factor n = 3 replicates
A B I II III Total Label
- - 28 25 27 80 (1)
+ - 36 32 32 100 a
- + 18 19 23 60 b
+ + 31 30 29 90 ab


2^2 Factorial
I Notation

(1) = sum of observations at low levels of both factors,


a = sum of observations with A high and B low,
b = sum of observations with B high and A low,
ab = sum of observations with both high.

I Effects model

yijk = µ + Ai + Bj + (AB)ij + εijk ,  (i, j = 1, 2; k = 1, . . . , n)

I E.g. A1 = main effect of low level of A, A2 = main effect of high level of A.
But since A1 + A2 = 0, we have A1 = −A2 .
I We define the ’main effect of Factor A’ to be

A = A2 − A1 .


Least Squares Estimate

I What is the LSE of A? Since A is the effect of changing factor A from low
to high, we expect

Â = average y at high A − average y at low A
  = (a + ab)/(2n) − ((1) + b)/(2n)
  = [a + ab − (1) − b]/(2n).

This is the LSE.
I Reason: We know that the LSE of A2 is

Â2 = average y at high A − overall average y,

and that of A1 is

Â1 = average y at low A − overall average y,

so that

Â = Â2 − Â1
  = average y at high A − average y at low A.

I Often the ’hats’ are omitted. Similarly,

B = [b + ab − a − (1)]/(2n)

AB = difference between effect of A at high B, and effect of A at low B
   = (ab − b)/(2n) − (a − (1))/(2n)
   = [ab − b − a + (1)]/(2n).

With (1) = 80, a = 100, b = 60, ab = 90 we find

A = 8.33,
B = −5.0,
AB = 1.67.

I It appears that increasing the level of A results in an increase in yield; that
the opposite is true of B, and that there isn’t much interaction effect. To
confirm this we would do an ANOVA.
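The same numbers from the treatment totals, sketched in Python:

```python
# Effect estimates of a 2^2 factorial from the treatment totals, n = 3.
one, a, b, ab = 80, 100, 60, 90   # totals (1), a, b, ab
n = 3
A = (a + ab - one - b) / (2 * n)
B = (b + ab - a - one) / (2 * n)
AB = (ab - b - a + one) / (2 * n)
print(round(A, 2), round(B, 2), round(AB, 2))   # 8.33 -5.0 1.67
```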

Solution in R
> A <- c(-1, 1, -1,1)
> B <- c(-1, -1, 1, 1)
> I <- c(28, 36, 18, 31)
> II <- c(25, 32, 19, 30)
> III <- c(27, 32, 23, 29)
> data <- data.frame(A, B, I, II, III)
> data
A B I II III
1 -1 -1 28 25 27
2 1 -1 36 32 32
3 -1 1 18 19 23
4 1 1 31 30 29
> #Compute sums for each combination
> sums <- apply(data[,3:5], 1, sum)
> names(sums) <- c("(1)", "(a)", "(b)", "(ab)")
> sums
(1) (a) (b) (ab)
80 100 60 90

Interaction Plots
> ybar <- sums/3
> par(mfrow=c(1,2))
> interaction.plot(A, B, ybar)
> interaction.plot(B, A, ybar)

[Figure: interaction plots of mean of ybar against A (one line per level of B) and against B (one line per level of A).]


Build ANOVA Table

> y <- c(I, II, III)


> factorA <- as.factor(rep(A,3))
> factorB <- as.factor(rep(B,3))
> g <- lm(y ~factorA + factorB + factorA*factorB)
> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
factorA 1 208.333 208.333 53.1915 8.444e-05 ***
factorB 1 75.000 75.000 19.1489 0.002362 **
factorA:factorB 1 8.333 8.333 2.1277 0.182776
Residuals 8 31.333 3.917
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual Analysis

[Figure: standard lm diagnostic plots — Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Factor Levels (constant leverage).]


Contrasts
I The estimates of the effects have used only the totals ab, a, b and (1), each
of which is the sum of n = 3 independent terms. Then

A = [ab + a − b − (1)]/(2n) = CA/(2n),
B = [ab − a + b − (1)]/(2n) = CB/(2n),
AB = [ab − a − b + (1)]/(2n) = CAB/(2n),

where CA , CB , CAB are orthogonal contrasts (why?) in ab, a, b and (1).
I In our previous notation, the SS for Factor A (we might have written it as
bn Σi Âi²) is

SSA = 2n(Â1² + Â2²) = 4nÂ2² = nÂ² = CA²/(4n),

and similarly

SSB = CB²/(4n), SSAB = CAB²/(4n).

In this way SSA = [90 + 100 − 60 − 80]²/12 = 208.33.
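And the contrast-based sums of squares, sketched in Python:

```python
# SS from orthogonal contrasts in the totals: SS = C^2/(4n) for the 2^2 design.
one, a, b, ab = 80, 100, 60, 90   # totals (1), a, b, ab
n = 3
CA = ab + a - b - one             # 50
CB = ab - a + b - one             # -30
CAB = ab - a - b + one            # 10
SSA, SSB, SSAB = (C * C / (4 * n) for C in (CA, CB, CAB))
print(round(SSA, 2), round(SSB, 2), round(SSAB, 2))   # 208.33 75.0 8.33
```

These agree with the factorA, factorB and factorA:factorB rows of the ANOVA table produced earlier.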
The 2k Factorial Design 2k Factorials

2^k Factorials

I All of this generalizes to the 2^k factorial, in which k factors are investigated,


each at two levels.
I To easily write down the estimates of the effects, and the contrasts, we start
with a table of ± signs, done here for k = 3.
I Label the rows (1), then the product of a with (1). Then all products of b
with the terms which are already there: b ∗ (1) = b, b ∗ a = ab. Then all the
products of c with the terms which are already there. (This is the ’standard’
order.)
I Now put in the signs. Start with 2^k = 8 +’s under the I, then alternate −’s
and +’s, then in groups of 2, finally (under C) in groups of 4 (= 2^(k−1)). Then
and +0 s, then in groups of 2, finally (under C) in groups of 4 (= 2k−1 ). Then
write in the products under the interaction terms.


         Effect
      I   A   B   C   AB  AC  BC  ABC
(1)   +   -   -   -   +   +   +   -
a     +   +   -   -   -   -   +   +
b     +   -   +   -   -   +   -   +
ab    +   +   +   -   +   -   -   -
c     +   -   -   +   +   -   -   +
ac    +   +   -   +   -   +   -   -
bc    +   -   +   +   -   -   +   -
abc   +   +   +   +   +   +   +   +

I Interpretation: Assign the appropriate signs to the combinations (1), . . . , abc.
Effect estimates are

A = −[(1) + b + c + bc]/(4n) + [a + ab + ac + abc]/(4n),

etc.

ABC = {[a + b + c + abc] − [(1) + ab + ac + bc]}/(4n),

all with 2^(k−1) n in the denominator.
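The full table of signs can be generated rather than written out by hand; a sketch:

```python
# Generate the 2^3 table of signs in standard order; each interaction column
# is the elementwise product of the corresponding main-effect columns.
labels = ['(1)', 'a', 'b', 'ab', 'c', 'ac', 'bc', 'abc']
# A alternates fastest, then B in pairs, then C in blocks of four
rows = [(A, B, C) for C in (-1, 1) for B in (-1, 1) for A in (-1, 1)]
print('      I  A  B  C AB AC BC ABC')
for lab, (A, B, C) in zip(labels, rows):
    cols = [1, A, B, C, A * B, A * C, B * C, A * B * C]
    print(f'{lab:>4}', '  '.join('+' if s > 0 else '-' for s in cols))
```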


Sum of Squares

I These are all of the form C/(2^(k−1) n) for a contrast C in the totals
(1), . . . , abc; the corresponding SS is C²/(2^k n). For example

SSABC = {[a + b + c + abc] − [(1) + ab + ac + bc]}²/(8n).

I The sums of squares are all on 1 d.f. (including SSI , which uses the 1 d.f.
usually subtracted from N = 2^k n for the estimation of the overall mean µ),
so that SSE , obtained by subtraction, is on N − 2^k = 2^k (n − 1) d.f.
I The F-ratio to test the effect of factor A is

F0 = MSA / MSE ,

where MSA = SSA and MSE = SSE /df(SSE ).


Replicate n = 1

I Suppose n = 1, so that no d.f. are available for the estimation of σ². In the
two-factor case there was Tukey’s test for non-additivity, which relied on the
assumption that the interactions were of a certain mathematically simple but
statistically dubious form (even more so for k > 2). A more common remedy
is to not even try to estimate certain effects - usually higher order
interactions - and use the d.f. released in this way to estimate error.
I A graphical way of identifying the important effects which must be in the
model, and those which can be dropped to facilitate error estimation, is a
normal probability plot of the absolute values of the effect estimates - a
’half-normal’ plot. Those effects which deviate significantly from the qqline
tend to be the important ones.
I Example. Data in Table 6-10 (Page 257). A chemical product is produced
using two levels each of temperature (A), pressure (B), concentration of
formaldehyde (C) and rate (D) at which the product is stirred. Response (Y)
is the ’filtration rate’.
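The effect estimates behind such a half-normal plot can be computed directly from the 16 responses in standard order; a sketch of the classical effects, contrast/(n·2^(k−1)):

```python
# Effect estimates for the unreplicated 2^4 filtration-rate data.
from itertools import combinations

y = [45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96]
k = 4
effects = {}
for m in range(1, k + 1):
    for subset in combinations(range(k), m):
        name = ''.join('ABCD'[f] for f in subset)
        contrast = 0
        for run, yv in enumerate(y):   # bit f of the run index = level of factor f
            s = 1
            for f in subset:
                s *= 1 if (run >> f) & 1 else -1
            contrast += s * yv
        effects[name] = contrast / 2 ** (k - 1)    # n = 1 replicate
top = sorted(effects, key=lambda e: abs(effects[e]), reverse=True)[:5]
print({e: effects[e] for e in top})
# {'A': 21.625, 'AC': -18.125, 'AD': 16.625, 'D': 14.625, 'C': 9.875}
```

The five largest effects are exactly the ones that stand off the qqline in the half-normal plot below.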


Data
> y<-c(45,71,48,65,68,60,80,65,43,100,45,104,75,86,70,96)
> A <-as.factor(rep(c(-1,1,-1,1),4))
> B <- as.factor(rep(c(-1, -1, 1, 1),4))
> C <- as.factor(rep(c(-1, -1, -1, -1,1,1,1,1),2))
> D <- as.factor(c(-1, -1, -1, -1,-1, -1, -1, -1,1,1,1,1,1,1,1,1))
> data<-data.frame(A,B,C,D,y)
> data
A B C D y
1 -1 -1 -1 -1 45
2 1 -1 -1 -1 71
3 -1 1 -1 -1 48
4 1 1 -1 -1 65
5 -1 -1 1 -1 68
6 1 -1 1 -1 60
7 -1 1 1 -1 80
8 1 1 1 -1 65
9 -1 -1 -1 1 43
10 1 -1 -1 1 100
11 -1  1 -1  1  45
12  1  1 -1  1 104
13 -1 -1  1  1  75
14  1 -1  1  1  86
15 -1  1  1  1  70
16  1  1  1  1  96

ANOVA
> g <- lm(y ~(A+B+C+D)^4)
> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56
B 1 39.06 39.06
C 1 390.06 390.06
D 1 855.56 855.56
A:B 1 0.06 0.06
A:C 1 1314.06 1314.06
A:D 1 1105.56 1105.56
B:C 1 22.56 22.56
B:D 1 0.56 0.56
C:D 1 5.06 5.06
A:B:C 1 14.06 14.06
A:B:D 1 68.06 68.06
A:C:D 1 10.56 10.56
B:C:D 1 27.56 27.56
A:B:C:D 1 7.56 7.56
Residuals 0 0.0

Effects - R

> g$effects
(Intercept)          A1          B1          C1          D1       A1:B1
    -280.25      -43.25       -6.25       19.75       29.25        0.25
      A1:C1       A1:D1       B1:C1       B1:D1       C1:D1    A1:B1:C1
     -36.25       33.25       -4.75       -0.75       -2.25        3.75
   A1:B1:D1    A1:C1:D1    B1:C1:D1 A1:B1:C1:D1
      -8.25        3.25        5.25        2.75
Note that these are twice as large in absolute value as those computed, and the
signs sometimes differ. This is because of R’s definition of ’effect’, and makes no
difference for comparing their absolute values.
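For comparison, the classical effect estimates (contrast divided by 8) can be computed directly from the data; a minimal sketch using the same responses and ±1 coding as above:

```r
# Sketch: classical effect estimates (contrast/8) for the unreplicated
# 2^4 filtration-rate experiment, factors coded as -1/+1.
y <- c(45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96)
A <- rep(c(-1, 1), times = 8)
B <- rep(c(-1, -1, 1, 1), times = 4)
C <- rep(c(rep(-1, 4), rep(1, 4)), times = 2)
D <- rep(c(-1, 1), each = 8)
effect <- function(x) sum(x * y) / 8   # contrast / (n * 2^(k-1)), n = 1
c(A = effect(A), AC = effect(A * C), AD = effect(A * D))
#       A      AC      AD
#  21.625 -18.125  16.625
```

These are half of R's g$effects values in absolute value, as remarked above.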


> effects <- abs(g$effects[-1])


> qq <- qqnorm(effects, type="n")
> text(qq$x, qq$y, labels = names(effects))
[Normal Q-Q plot of the absolute effect estimates: A1, A1:C1, A1:D1, D1 and C1 stand out above the line; the remaining effects lie close to zero.]

The significant terms seem to be A, C, D and the interactions AC, AD. So let's just drop B and fit all terms not involving B.


Fit without Factor B


Because B (pressure) is not significant and all interactions involving B are
negligible, we may discard B from the experiment so that the design becomes a 23
factorial in A, C, and D with two replicates.
> h <- lm(y ~(A+C+D)^3)
> anova(h)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56 83.3677 1.667e-05 ***
C 1 390.06 390.06 17.3844 0.0031244 **
D 1 855.56 855.56 38.1309 0.0002666 ***
A:C 1 1314.06 1314.06 58.5655 6.001e-05 ***
A:D 1 1105.56 1105.56 49.2730 0.0001105 ***
C:D 1 5.06 5.06 0.2256 0.6474830
A:C:D 1 10.56 10.56 0.4708 0.5120321
Residuals 8 179.50 22.44
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interaction Plots

[Main effects plots for A, C and D, and two-factor interaction plots involving A, C and D.]

Although the main effects plot indicates that C high is best, the interaction plots
show that the best settings are A high, C low and D high.


The 3k Factorial Design

I The 3^k factorial design is a factorial arrangement with k factors, each at three levels.
I We refer to the three levels of the factors as low (0), intermediate (1), and
high (2).
I For example, in a 32 design, the nine treatment combinations are denoted by
00, 01, 10, 02, 20, 11, 12, 21, 22.
I The 3k factorial design is considered by experimenters who are concerned
about curvature in the response.
I The addition of a third level allows the relationship between the response and
each factor to be modeled with a quadratic relationship.
I Other alternatives:
response surface designs
2k design augmented with center points


Example: 32 Design

I The simplest 3k factorial design is the 32 design, which has two factors, each
at three levels.
I The 32 = 9 treatment combinations are: 00, 01, 10, 02, 20, 11, 12, 21, 22.
I There are eight degrees of freedom between these nine treatment
combinations: the main effects A and B have 2 degrees of freedom each, and
the AB interaction has 4 degrees of freedom.
I When a factor has three levels, it will have two degrees of freedom.
I Therefore, the associated sums of squares can be broken down into two
components: one that represents the linear effect (SSAL ) and the other that
represents the quadratic effect (SSAQ ).
I A linear effect is one in which the response changes at a roughly constant rate across the levels of the factor.
I A quadratic effect is one in which the response follows a curved, quadratic relationship across the levels.
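As a sketch (the data below are made up for illustration: one factor at three levels, three replicates per level), the linear and quadratic components of a three-level factor can be separated in R using orthogonal polynomial contrasts:

```r
# Hypothetical data: factor A at levels 0, 1, 2, n = 3 replicates each;
# the response curves upward at the high level.
y <- c(10, 11, 9, 14, 15, 13, 24, 25, 23)
A <- factor(rep(c(0, 1, 2), each = 3))
contrasts(A) <- contr.poly(3)   # orthogonal linear and quadratic contrasts
g <- lm(y ~ A)
summary(g)$coefficients         # rows A.L and A.Q: linear and quadratic parts
anova(g)                        # SS_A on 2 d.f. = SS_A_Linear + SS_A_Quadratic
```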


PART VI:
BLOCKING AND CONFOUNDING IN THE 2k
FACTORIAL DESIGN


Introduction

I In many situations it is impossible to perform all of the runs in a 2^k factorial experiment under homogeneous conditions.
I For example, a single batch of raw material might not be large enough to
make all of the required runs.
I In other cases, it might be desirable to deliberately vary the experimental
conditions to ensure that the treatments are equally effective (i.e., robust)
across many situations that are likely to be encountered in practice.
I For example, a chemical engineer may run a pilot plant experiment with
several batches of raw material because he knows that different raw material
batches of different quality grades are likely to be used in the actual full-scale
process.
I The design technique used in these situations is blocking.


Blocking
I Blocking is used to control nuisance factors - day of the week, batch of raw material, etc.
I Complete Blocks. This is the easy case. Suppose we run a 22 factorial
experiment, with all 4 runs made on each of 3 days. So there are 3 replicates
(= blocks), 12 observations. There is 1 d.f. for each of I, A, B, AB, leaving 8
d.f. Of these, 2 are used for blocks and the remaining 6 for SSE .
I The LSE’s of the block effects are

b i = average of 4 observations in block i − overall average


Bl

and
X  2 3 
X 2
SSBlocks = Bl
bi =4 Bl
bi .
all obs’ns i=1
I Note the randomization used here - it is only within each block. If we could
run the blocks in random order, for instance if they were batches of raw
material, then we would also do so.


Example

I Example - chemical experiment from the previous example, with 'Replicates' re-labelled as 'Blocks'.

    Factor            Block
    A  B        I     II     III   Total  Label
    -  -       28     25      27      80    (1)
    +  -       36     32      32     100      a
    -  +       18     19      23      60      b
    +  +       31     30      29      90     ab
    averages  28.25  26.5   27.75    27.5
    Bl-hat     0.75  -1      0.25

    SS_Blocks = 4[(0.75)^2 + (−1)^2 + (0.25)^2] = 6.5.


I Check on R.
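A sketch of that check: enter the data from the table above (block by block, in the order (1), a, b, ab) and let lm() reproduce SS_Blocks:

```r
# The 2^2 chemical experiment run in 3 complete blocks.
y <- c(28, 36, 18, 31,  25, 32, 19, 30,  27, 32, 23, 29)
A <- factor(rep(c(-1, 1, -1, 1), 3))
B <- factor(rep(c(-1, -1, 1, 1), 3))
Block <- factor(rep(1:3, each = 4))
g <- lm(y ~ Block + A * B)
anova(g)   # the Block line shows SS = 6.5 on 2 d.f.
```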


Incomplete Blocks
I Consider again a 22 factorial, in which only 2 runs can be made in each of 2
days (the blocks). Which 2 runs?
I Consider
Block 1: (1), ab
Block 2: a, b.
I What is the LSE of the block effect? Think of the blocks as being at a ’high’
level - Block 1 - and a ’low’ level - Block 2.
I Then the estimate is

    Bl = average at high level − average at low level
       = [ab + (1) − a − b] / 2
       = ([ab − a] − [b − (1)]) / 2
       = (effect of B when A is high − effect of B when A is low) / 2
       = AB.    (6)
I We say that AB is confounded with blocks since the block effect and the AB
interaction are identical.

Confounding in the 2k Factorial Design


I In many problems it is impossible to perform a complete replicate of a
factorial design in one block.
I Unless experimenters have a prior estimate of error or are willing to assume
certain interactions to be negligible, they must replicate the design to obtain
an estimate of error.
I Confounding is a design technique for arranging a complete factorial
experiment in blocks, where the block size is smaller than the number of
treatment combinations in one replicate.
I The technique causes information about certain treatment effects (usually
high-order interactions) to be indistinguishable from, or confounded with,
blocks.
I Note that even though the designs presented are incomplete block designs
because each block does not contain all the treatments or treatment
combinations, the special structure of the 2k factorial system allows a
simplified method of analysis.
I We consider the construction and analysis of the 2k factorial design in 2p
incomplete blocks, where p < k. Consequently, these designs can be run in
two blocks (p = 1), four blocks (p = 2), eight blocks (p = 3), and so on.

Confounding the 2k Factorial Design in Two Blocks

I Suppose that we wish to run a single replicate of the 22 design.


I Each of the 22 = 4 treatment combinations requires a quantity of raw
material, for example, and each batch of raw material is only large enough for
two treatment combinations to be tested. Thus, two batches of raw material
are required.
I If batches of raw material are considered as blocks, then we must assign two
of the four treatment combinations to each block.
I One possible design is
Block 1: (1), ab
Block 2: a, b.
I The order in which the treatment combinations are run within a block is
randomly determined. We would also randomly decide which block to run
first.


I Suppose we estimate the main effects of A and B just as if no blocking had occurred. Then

    A = [ab + a − b − (1)] / 2,
    B = [ab + b − a − (1)] / 2.
I Note that both A and B are unaffected by blocking because in each estimate
there is one plus and one minus treatment combination from each block.
That is, any difference between block 1 and block 2 will cancel out.
I The confounding can be seen from the table of ± signs. All runs in which
AB = + are in block 1, and all with AB = − are in block 2.
Effect
I A B AB Block
(1) + − − + 1
a + + − − 2
b + − + − 2
ab + + + + 1


Confounded Effect?

What effect is confounded with blocks in this design?


Effect
I A B AB Block
(1) + − − + 1
a + + − − 2
b + − + − 1
ab + + + + 2
I The usual practice is to confound the highest order interaction with blocks.
I This scheme can be used to confound any 2k design in two blocks.


23 Design in two Blocks

I As in the last example, we can choose which effect is to be confounded with the two blocks. For instance in a 2^3 design, with 4 runs in each of 2 blocks, we confound ABC with blocks in the following way:

                       Effect
          I   A   B   C   AB  AC  BC  ABC  Blocks
    (1)   +   -   -   -   +   +   +   -      1
    a     +   +   -   -   -   -   +   +      2
    b     +   -   +   -   -   +   -   +      2
    ab    +   +   +   -   +   -   -   -      1
    c     +   -   -   +   +   -   -   +      2
    ac    +   +   -   +   -   +   -   -      1
    bc    +   -   +   +   -   -   +   -      1
    abc   +   +   +   +   +   +   +   +      2


Other Methods for Constructing the Blocks

I For running a 2^k factorial in 2, 4, 8, 16, . . . blocks, there are useful algebraic methods to decide on a confounding scheme. For 2 blocks, start with a 'defining contrast'

    L = α1 x1 + · · · + αk xk,

where xi = 1 if factor i is high, xi = 0 if factor i is low, and αi is the exponent of factor i in the effect to be confounded.
I For example, a 2^3 factorial with ABC = A^1 B^1 C^1 confounded with blocks has α1 = α2 = α3 = 1 and L = x1 + x2 + x3. If AB = A^1 B^1 C^0 is to be confounded, then α1 = α2 = 1, α3 = 0, and L = x1 + x2.
I Now evaluate L at all treatment combinations, using ’arithmetic mod 2’:
x(mod 2) = remainder when x is divided by 2.
All those with L = 0 (mod 2) go in one block, those with L = 1 (mod 2) in
another.


23 design with ABC confounded with blocks


I Consider a 23 design with ABC confounded with blocks. Here x1 corresponds
to A, x2 to B, x3 to C , and α1 = α2 = α3 = 1. Thus, the defining contrast
corresponding to ABC is
L = x1 + x2 + x3
I Treatment combinations
(1) : L = 1(0) + 1(0) + 1(0) = 0 = 0 (mod 2)
a   : L = 1(1) + 1(0) + 1(0) = 1 = 1 (mod 2)
b   : L = 1(0) + 1(1) + 1(0) = 1 = 1 (mod 2)
ab  : L = 1(1) + 1(1) + 1(0) = 2 = 0 (mod 2)
c   : L = 1(0) + 1(0) + 1(1) = 1 = 1 (mod 2)
ac  : L = 1(1) + 1(0) + 1(1) = 2 = 0 (mod 2)
bc  : L = 1(0) + 1(1) + 1(1) = 2 = 0 (mod 2)
abc : L = 1(1) + 1(1) + 1(1) = 3 = 1 (mod 2)
I Thus (1), ab, ac, bc are run in block 1 and a, b, c, abc are run in block 2.
This is in agreement with what we got using the ± signs.
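The same assignment can be generated mechanically in R; a sketch using the defining contrast above:

```r
# Sketch: assign the 2^3 treatment combinations to blocks from the
# defining contrast L = x1 + x2 + x3 (mod 2), for I = ABC.
runs <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)
L <- (runs$x1 + runs$x2 + runs$x3) %% 2
trt <- c("(1)", "a", "b", "ab", "c", "ac", "bc", "abc")
split(trt, L)
# $`0`: (1), ab, ac, bc  (block 1);  $`1`: a, b, c, abc  (block 2)
```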

Example - Filtration rate experiment

I Run the 2^4 factorial for the filtration rate experiment in 2 blocks, corresponding to different batches of formaldehyde. Confound the ABCD interaction with blocks, so that L = x1 + x2 + x3 + x4 (mod 2) will be 0 if (x1, x2, x3, x4) contains 0, 2 or 4 ones, and 1 otherwise:

    Block 1 : (1), ab, ac, ad, bc, bd, cd, abcd,
    Block 2 : a, b, c, d, abc, abd, acd, bcd.

I The data have been modified by subtracting 20 from all Block 1 observations,
to simulate a situation where the first batch of formaldehyde is inferior.


Data - R

> A <- rep(c(-1,1), times=8)


> B <- rep(c(-1,1), each = 2, times=4)
> C <- rep(c(-1,1), each = 4, times=2)
> D <- rep(c(-1,1), each = 8)
> y <- c(25, 71, 48, 45, 68, 40, 60, 65,
+ 43, 80, 25, 104, 55, 86, 70, 76)
> ABCD <- A*B*C*D
> A <- as.factor(A)
> B <- as.factor(B)
> C <- as.factor(C)
> D <- as.factor(D)
> data <- data.frame(A, B, C, D, ABCD, y)
> data


Output - R

A B C D ABCD y
1 -1 -1 -1 -1 1 25
2 1 -1 -1 -1 -1 71
3 -1 1 -1 -1 -1 48
4 1 1 -1 -1 1 45
5 -1 -1 1 -1 -1 68
6 1 -1 1 -1 1 40
7 -1 1 1 -1 1 60
8 1 1 1 -1 -1 65
9 -1 -1 -1 1 -1 43
10 1 -1 -1 1 1 80
11 -1 1 -1 1 1 25
12 1 1 -1 1 -1 104
13 -1 -1 1 1 1 55
14 1 -1 1 1 -1 86
15 -1 1 1 1 -1 70
16 1 1 1 1 1 76

> # ABCD = Blocks


> g <- lm(y ~ (A+B+C+D)^4)
> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56
B 1 39.06 39.06
C 1 390.06 390.06
D 1 855.56 855.56
A:B 1 0.06 0.06
A:C 1 1314.06 1314.06
A:D 1 1105.56 1105.56
B:C 1 22.56 22.56
B:D 1 0.56 0.56
C:D 1 5.06 5.06
A:B:C 1 14.06 14.06
A:B:D 1 68.06 68.06
A:C:D 1 10.56 10.56
B:C:D 1 27.56 27.56
A:B:C:D 1 1387.56 1387.56
Residuals 0 0.0

Half normal plot


[Normal Q-Q plot of the absolute effect estimates: A1, A1:B1:C1:D1, A1:C1, A1:D1, D1 and C1 stand out; the remaining effects lie close to zero.]

Significant effects are A, C, D, AC, AD and Blocks (= ABCD). We can run the ANOVA again, estimating only these effects.

> Blocks <- ABCD


> h <- lm(y ~ A + C + D + A*C + A*D + Blocks)
> anova(h)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56 89.757 5.600e-06 ***
C 1 390.06 390.06 18.717 0.0019155 **
D 1 855.56 855.56 41.053 0.0001242 ***
Blocks 1 1387.56 1387.56 66.581 1.889e-05 ***
A:C 1 1314.06 1314.06 63.054 2.349e-05 ***
A:D 1 1105.56 1105.56 53.049 4.646e-05 ***
Residuals 9 187.56 20.84
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Confounding the 2k Factorial Design in Four Blocks

I It is possible to construct 2^k factorial designs confounded in four blocks of 2^(k−2) observations each.
I These designs are particularly useful in situations where the number of factors
is moderately large, say k ≥ 4, and block sizes are relatively small.
I As an example, consider the 25 design. If each block will hold only eight runs,
then four blocks must be used. The construction of this design is relatively
straightforward. Select two effects to be confounded with blocks, say ADE
and BCE . These effects have the two defining contrasts

L1 = x1 + x4 + x5
L2 = x2 + x3 + x5

associated with them.


I Now every treatment combination will yield a particular pair of values of
L1 (mod2) and L2 (mod2), that is, either
(L1, L2) = (0, 0), (0, 1), (1, 0), or (1, 1).


I Treatment combinations yielding the same values of (L1, L2) are assigned to the same block.
I In our example we find:

L1 = 0, L2 = 0 for (1), ad, bc, abcd, abe, ace, cde, bde


L1 = 1, L2 = 0 for a, d, abc, bcd, be, abde, ce, acde
L1 = 0, L2 = 1 for b, abd, c, acd, ae, de, abce, bcde
L1 = 1, L2 = 1 for e, ade, bce, abcde, ab, bd, ac, cd

I These treatment combinations would be assigned to different blocks.


I With a little reflection we realize that another effect in addition to ADE and
BCE must be confounded with blocks. Because there are four blocks with
three degrees of freedom between them, and because ADE and BCE have
only one degree of freedom each, clearly an additional effect with one degree
of freedom must be confounded.
I This effect is the generalized interaction of ADE and BCE, which is defined as the product of ADE and BCE modulo 2.
I Thus, in our example the generalized interaction (ADE)(BCE) = ABCDE^2 = ABCD is also confounded with blocks.
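A sketch generating the four blocks in R from the two defining contrasts; this reproduces the block lists above:

```r
# Sketch: block assignment for the 2^5 design with ADE and BCE confounded,
# using L1 = x1 + x4 + x5 and L2 = x2 + x3 + x5 (mod 2).
runs <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1, x4 = 0:1, x5 = 0:1)
L1 <- (runs$x1 + runs$x4 + runs$x5) %% 2
L2 <- (runs$x2 + runs$x3 + runs$x5) %% 2
label <- apply(runs, 1, function(x) {
  s <- paste(c("a", "b", "c", "d", "e")[x == 1], collapse = "")
  if (s == "") "(1)" else s
})
split(label, paste(L1, L2))  # four blocks, keyed by the values of (L1, L2)
# the "0 0" block contains (1), ad, bc, abcd, abe, ace, cde, bde
```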

2k factorials in 2p blocks

I Suppose we needed four batches of formaldehyde, and could do only 4 runs per batch. This is then a 2^4 factorial in 2^2 blocks.
I If two effects are confounded with blocks, then so is their product, which is
defined by ’multiplication mod 2’.
I Pick two effects to be confounded with blocks: ABC and ACD. Then also
(ABC )(ACD) = BD is confounded. We would not pick ABC and ABCD,
since (ABC )(ABCD) = D.
I For the choices ABC and ACD we have

L1 = x1 + x2 + x3
L2 = x1 + x3 + x4

with (L1 , L2 ) = (0, 0) in Block I, (L1 , L2 ) = (1, 0) in Block II, (L1 , L2 ) = (0, 1)
in Block III and (L1 , L2 ) = (1, 1) in Block IV.
Give the treatment combinations in the various blocks.


Confounded 4-Factor Factorial Design - Example in R


> A <- rep(c(-1,1), times=8)
> B <- rep(c(-1,1), each = 2, times=4)
> C <- rep(c(-1,1), each = 4, times=2)
> D <- rep(c(-1,1), each = 8)
> ABC <- A*B*C
> ACD <- A*C*D
> BD <- B*D
> y <- c(25,71,48,45,68,40,60,65,43,80,25,14,55,86,20,76)
> blocks <- vector(length=16)
> for(i in 1:16) {
+ if(ABC[i]==-1 & ACD[i]==-1) blocks[i]=1
+ if(ABC[i]==1 & ACD[i]==-1) blocks[i]=2
+ if(ABC[i]==-1 & ACD[i]==1) blocks[i]=3
+ if(ABC[i]==1 & ACD[i]==1) blocks[i]=4
+ }
> blocks <- as.factor(blocks)
> data <- data.frame(A, B, C, D, blocks, ABC, ACD,BD, y)
> data

Confounded 4-Factor Factorial Design - R Output

A B C D blocks ABC ACD BD y


1 -1 -1 -1 -1 1 -1 -1 1 25
2 1 -1 -1 -1 4 1 1 1 71
3 -1 1 -1 -1 2 1 -1 -1 48
4 1 1 -1 -1 3 -1 1 -1 45
5 -1 -1 1 -1 4 1 1 1 68
6 1 -1 1 -1 1 -1 -1 1 40
7 -1 1 1 -1 3 -1 1 -1 60
8 1 1 1 -1 2 1 -1 -1 65
9 -1 -1 -1 1 3 -1 1 -1 43
10 1 -1 -1 1 2 1 -1 -1 80
11 -1 1 -1 1 4 1 1 1 25
12 1 1 -1 1 1 -1 -1 1 14
13 -1 -1 1 1 2 1 -1 -1 55
14 1 -1 1 1 3 -1 1 -1 86
15 -1 1 1 1 1 -1 -1 1 20
16 1 1 1 1 4 1 1 1 76

> g <- lm(y ~ blocks + A + B + C + D +A*B + A*C + A*D


+ + B*C + C*D + A*B*D + B*C*D + A*B*C*D)
> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
blocks 3 3787.7 1262.56
A 1 1105.6 1105.56
B 1 826.6 826.56
C 1 885.1 885.06
D 1 33.1 33.06
A:B 1 95.1 95.06
A:C 1 1.6 1.56
A:D 1 540.6 540.56
B:C 1 217.6 217.56
C:D 1 60.1 60.06
A:B:D 1 3.1 3.06
B:C:D 1 22.6 22.56
A:B:C:D 1 5.1 5.06
Residuals 0 0.0

Half Normal Plot for 24 factorial in 22 blocks

[Normal Q-Q plot of the absolute effect estimates: blocks4, A, B, C, blocks3, blocks2, A:D and B:C stand out; D and the remaining interactions lie close to zero.]

It looks like we can drop the main effect of ’D’ if we keep some of its interactions.
R will, by default, estimate a main effect if an interaction is in the model. To fit
blocks, A, B, C, AB, AD, BC, CD but not D, we can add the SS and df for D to
those for Error.


> h <- lm(y ~ blocks + A + B + C + B*C + A*B + A*D + C*D)


> anova(h)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
blocks 3 3787.7 1262.56 156.5969 0.0001333 ***
A 1 1105.6 1105.56 137.1240 0.0003042 ***
B 1 826.6 826.56 102.5194 0.0005356 ***
C 1 885.1 885.06 109.7752 0.0004690 ***
D 1 33.1 33.06 4.1008 0.1128484
B:C 1 217.6 217.56 26.9845 0.0065401 **
A:B 1 95.1 95.06 11.7907 0.0264444 *
A:D 1 540.6 540.56 67.0465 0.0012117 **
C:D 1 60.1 60.06 7.4496 0.0524755 .
Residuals 4 32.2 8.06
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This would change MSE to (32.2 + 33.1)/5 = 13.06 on 5 d.f.
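The pooling can be sketched numerically, with the SS and d.f. taken from the ANOVA table above:

```r
# Sketch: pool the SS and d.f. of the dropped main effect D into error.
SS_E <- 32.2;  df_E <- 4     # residual line from the ANOVA above
SS_D <- 33.1;  df_D <- 1     # main effect of D, to be pooled
MSE_pooled <- (SS_E + SS_D) / (df_E + df_D)
MSE_pooled                   # 13.06 on 5 d.f.
# revised F-ratio for, e.g., the A:D interaction:
540.6 / MSE_pooled
```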

Interaction Plots

[Interaction plots: A with B, A with D, B with C, and C with D.]

The best combination seems to be A, C, D high, B low.



Partial Confounding

I When the number of variables is small, say k = 2 or 3, it is usually necessary to replicate the experiment to obtain an estimate of error.
I For example, suppose that a 23 factorial must be run in two blocks with ABC
confounded, and the experimenter decides to replicate the design four times.
In this case, information on the ABC interaction cannot be retrieved because
ABC is confounded with blocks in each replicate. This design is said to be
completely confounded.
I If resources are sufficient to allow the replication of confounded designs, it is
generally better to use a slightly different method of designing the blocks in
each replicate. This approach consists of confounding a different effect in
each replicate so that some information on all effects is obtained. Such a
procedure is called partial confounding.
I Partial confounding is often better, since we then get estimates of effects
from the replications in which they are not confounded.


Example - Partial Confounding

I Example 7-3 from text. Two replicates of a 2^3 factorial are to be run, in 2 blocks each.
Replicate 1: Confound ABC with blocks. So L = x1 + x2 + x3 = 0 for
(1), ab, ac, bc and L = x1 + x2 + x3 = 1 for a, b, c, abc.
Replicate 2: Confound AB with blocks. So L = x1 + x2 = 0 for (1), ab, abc, c
and L = x1 + x2 = 1 for a, b, ac, bc.
Replicate 1 Replicate 2
Block 1 Block 2 Block 3 Block 4
(1)=550 a=669 (1)=604 a=650
ab=642 b=633 c=1052 b=601
ac=749 c=1037 ab= 635 ac=868
bc=1075 abc=729 abc=860 bc=1063


Example

> # Example 7-3 from text


> y <- c(550, 642, 749, 1075, 669, 633, 1037,729,
+ 604, 1052, 635, 860, 650, 601, 868, 1063)
> A <- c(-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1)
> B <- c(-1, 1, -1, 1, -1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1)
> C <- c(-1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1)
> ABC <- A*B*C
> AB <- A*B
> Rep <- as.factor(rep(c("I", "II"), each = 8))
> Block <- as.factor(rep(1:4, each=4))
> A <- as.factor(A)
> B <- as.factor(B)
> C <- as.factor(C)
> data <- data.frame(A, B, C, Rep, Block, ABC, AB, y)
> data


A B C Rep Block ABC AB y


1 -1 -1 -1 I 1 -1 1 550
2 1 1 -1 I 1 -1 1 642
3 1 -1 1 I 1 -1 -1 749
4 -1 1 1 I 1 -1 -1 1075
5 1 -1 -1 I 2 1 -1 669
6 -1 1 -1 I 2 1 -1 633
7 -1 -1 1 I 2 1 1 1037
8 1 1 1 I 2 1 1 729
9 -1 -1 -1 II 3 -1 1 604
10 -1 -1 1 II 3 1 1 1052
11 1 1 -1 II 3 -1 1 635
12 1 1 1 II 3 1 1 860
13 1 -1 -1 II 4 1 -1 650
14 -1 1 -1 II 4 1 -1 601
15 1 -1 1 II 4 -1 -1 868
16 -1 1 1 II 4 -1 -1 1063


R Code

When the levels of one factor (Blocks) make sense only within the levels of
another factor (Replicates) we say that the first is ’nested’ within the second. A
way to indicate this in R is as:
> h <- lm(y ~ Rep + Block%in%Rep + A + B + C + A*B
+ + A*C + B*C + A*B*C)
> anova(h)
Through the partial confounding we are able to estimate all interactions. It looks
like only A, C, and AC are significant.


R Output

Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
Rep 1 3875 3875 1.5191 0.272551
A 1 41311 41311 16.1941 0.010079 *
B 1 218 218 0.0853 0.781987
C 1 374850 374850 146.9446 6.749e-05 ***
Rep:Block 2 458 229 0.0898 0.915560
A:B 1 3528 3528 1.3830 0.292529
A:C 1 94403 94403 37.0066 0.001736 **
B:C 1 18 18 0.0071 0.936205
A:B:C 1 6 6 0.0024 0.962816
Residuals 5 12755 2551
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Residual Analysis

Carry out residual analysis for the example above.
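A minimal sketch of such an analysis, refitting the model from the preceding slides and examining the standardized residuals:

```r
# Sketch of a residual analysis for the partial-confounding example above
# (data and model as on the preceding slides).
y <- c(550, 642, 749, 1075, 669, 633, 1037, 729,
       604, 1052, 635, 860, 650, 601, 868, 1063)
A <- factor(c(-1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, -1, 1, -1))
B <- factor(c(-1, 1, -1, 1, -1, 1, -1, 1, -1, -1, 1, 1, -1, 1, -1, 1))
C <- factor(c(-1, -1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1, -1, -1, 1, 1))
Rep <- factor(rep(c("I", "II"), each = 8))
Block <- factor(rep(1:4, each = 4))
h <- lm(y ~ Rep + Block %in% Rep + A*B + A*C + B*C + A*B*C)
res <- rstandard(h)                  # standardized residuals
qqnorm(res); qqline(res)             # check normality
plot(fitted(h), res); abline(h = 0)  # check constant variance
```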


Computing SSBlocks(Rep)
How is SSBlocks(Rep) computed? One way is to compute SSABC in Replicate I,
where this effect is confounded with blocks, and similarly SSAB in Replicate II, and
add them:
> # Calculate SS.Blocks.in.Rep as SS of effects
> #confounded with blocks:
> SSABC.confounded <- ((sum(y[Rep=="I" &
+ ABC==1])-sum(y[Rep=="I" & ABC==-1]))^2)/8
> SSAB.confounded <- ((sum(y[Rep=="II" &
+ AB==1])-sum(y[Rep=="II" & AB==-1]))^2)/8
> SS.Blocks.in.Rep <- SSABC.confounded + SSAB.confounded
> SSABC.confounded
[1] 338
> SSAB.confounded
[1] 120.125
> SS.Blocks.in.Rep
[1] 458.125
which is in agreement with the ANOVA output.

Computing SSBlocks(Rep)

I Another method goes back to general principles. We calculate a SS for blocks within each replicate (since blocks make sense only within the replicates):

    SS_Blocks(Rep) = 4 * sum_{i=1}^{2} sum_{j=1}^{2} (ȳ_ij. − ȳ_i..)^2 = 458.125.

Here ȳ_ij. is the average in block j of replicate i and ȳ_i.. is the overall average of that replicate, which is the only one in which that block makes sense.


> # Another way to calculate SS.Blocks:


> block.means <- vector(length=4)
> for(j in 1:4) block.means[j] <- mean(y[Block ==j])
> rep.means <- rep(c(mean(y[Rep == "I"]),
+ mean(y[Rep == "II"])), each=2)
> cbind(block.means,rep.means)
block.means rep.means
[1,] 754.00 760.500
[2,] 767.00 760.500
[3,] 787.75 791.625
[4,] 795.50 791.625
> SS.Blocks.in.Rep <- 4*sum((block.means-rep.means)^2)
> SS.Blocks.in.Rep
[1] 458.125


PART VII:
FRACTIONAL FACTORIAL DESIGNS


Introduction - Fractional Factorials

I Consider a 2^5 factorial. Even without replicates, there are 2^5 = 32 observations required to estimate the effects - 5 main effects, 10 two-factor interactions, 10 three-factor interactions, 5 four-factor interactions and 1 five-factor interaction. If three- (or more) factor interactions are not of interest then only 15 effects are left, so that (including 1 d.f. for µ) perhaps only half as many observations are needed.
I A 2k−1 design, or ’one-half fraction of the 2k design’, is one in which only
half of the 2k treatment combinations are observed.


Example
I 2^3 factorial which could be run in two blocks, with ABC confounded with blocks:

                       Effect
          I   A   B   C   AB  AC  BC  ABC  Blocks
    (1)   +   -   -   -   +   +   +   -      1
    a     +   +   -   -   -   -   +   +      2
    b     +   -   +   -   -   +   -   +      2
    ab    +   +   +   -   +   -   -   -      1
    c     +   -   -   +   +   -   -   +      2
    ac    +   +   -   +   -   +   -   -      1
    bc    +   -   +   +   -   -   +   -      1
    abc   +   +   +   +   +   +   +   +      2
I If we run only block 2, then the design uses a, b, c, abc. These are those for
which ABC = +; since also I = + we say that the defining relation for the
design is I = ABC , and we refer to the ’word’ ABC as the ’generator’ of the
design.
I If we only used those combinations with A = +, then A = I would be the
defining relation and A the generator of the design.

One-half Fraction
I By running only block 2, our one-half fraction is
Effect
I A B C AB AC BC ABC
a + + - - - - + +
b + - + - - + - +
c + - - + + - - +
abc + + + + + + + +
I The estimates of the effects are obtained by applying the ± signs
appropriately.
I We use [·]’s to distinguish these from the full factorial estimates.
a − b − c + abc
[A] = = [BC ] ,
2
−a + b − c + abc
[B] = = [AC ] ,
2
−a − b + c + abc
[C ] = = [AB] .
2
I We say that these pairs of effects are aliases.
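A small numerical sketch (the responses are made up for illustration) confirming that [A] and [BC] are the same linear combination of the four runs:

```r
# Hypothetical responses for the four runs a, b, c, abc of the
# principal half-fraction (I = ABC) of a 2^3 design.
y <- c(a = 52, b = 41, c = 47, abc = 60)
A <- c( 1, -1, -1, 1)
B <- c(-1,  1, -1, 1)
C <- c(-1, -1,  1, 1)
estA  <- sum(A * y) / 2        # [A]
estBC <- sum(B * C * y) / 2    # [BC]
c(estA, estBC)                 # both equal 12: the estimates coincide
```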

I Note that in the full factorial,

    A + BC = (a − b − c + abc)/2,

so that [A] and [BC] are each estimating the same thing as A + BC. This is denoted as [A] → A + BC, [B] → B + AC, etc.
I These relations can also be obtained by doing multiplication (mod 2) on the defining relation:

    I = ABC ⇒ A = A^2 BC = BC,
              B = AB^2 C = AC, etc.


I The one-half fraction with defining relation ABC = I is called the principal
fraction. The other half, in which ABC = −, is called the alternate or
complementary fraction, and has defining relation:

I = −ABC .

I Complementary fraction
            Effect
      I   A   B   C   AB  AC  BC  ABC
(1)   +   -   -   -   +   +   +   -
ab    +   +   +   -   +   -   -   -
ac    +   +   -   +   -   +   -   -
bc    +   -   +   +   -   -   +   -


Estimates from the complementary fraction


I The estimates from the complementary fraction are:

  [A]′ = (−(1) + ab + ac − bc)/2 = −[BC]′,
  [B]′ = (−(1) + ab − ac + bc)/2 = −[AC]′,
  [C]′ = ?? = ??.                                        (7)

I In the full factorial,

  A − BC = (−(1) + ab + ac − bc)/2,

  so that

  [A]′ → A − BC,
  [B]′ → B − AC,
  etc.

I In practice, it does not matter which fraction is actually used. Both fractions
  belong to the same family; that is, the two one-half fractions form a complete
  2^3 design.
I Suppose that after running one of the one-half fractions of the 2^3 design, the
  other fraction was also run. Thus, all eight runs associated with the full 2^3
  are now available.
I We may now obtain de-aliased estimates of all the effects by analyzing the
  eight runs as a full 2^3 design in two blocks of four runs each. This could also
  be done by adding and subtracting the linear combination of effects from the
  two individual fractions.
I For example, consider [A] → A + BC and [A]′ → A − BC. This implies that

  (1/2)([A] + [A]′) = (1/2)(A + BC + A − BC) → A

  and that

  (1/2)([A] − [A]′) = (1/2)(A + BC − A + BC) → BC
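This de-aliasing arithmetic can be checked numerically; a sketch with made-up responses for the eight runs (not data from the notes):

```r
# Full 2^3 in standard order: (1), a, b, ab, c, ac, bc, abc
A <- rep(c(-1, 1), times = 4)
B <- rep(c(-1, 1), each = 2, times = 2)
C <- rep(c(-1, 1), each = 4)
y <- c(20, 26, 23, 30, 22, 29, 24, 36)     # made-up responses

eff  <- function(x) sum(x * y) / 4         # full-factorial effect estimate
half <- A * B * C == 1                     # principal fraction (I = ABC)

estA  <- sum(A[half]  * y[half])  / 2      # [A]  from the principal half
estAc <- sum(A[!half] * y[!half]) / 2      # [A]' from the complementary half

(estA + estAc) / 2   # equals the full-factorial A effect
(estA - estAc) / 2   # equals the full-factorial BC effect
```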

Fractional Factorial Designs Design Resolution

Design Resolution
A design is of resolution R if no p-factor effect is aliased with another effect
containing less than R − p factors.
1. Resolution III designs. These are designs in which no main effect is
aliased with any other main effect, but main effects are aliased with
two-factor interactions and some two-factor interactions may be aliased with
each other. A 2^(3-1) design with I = ABC is a resolution III design (2^(3-1)_III).
2. Resolution IV designs. These are designs in which no main effect is aliased
with any other main effect or with any two-factor interaction, but two-factor
interactions are aliased with each other. A 2^(4-1) design with I = ABCD is a
resolution IV design (2^(4-1)_IV).
3. Resolution V designs. These are designs in which no main effect or
two-factor interaction is aliased with any other main effect or two-factor
interaction, but two-factor interactions are aliased with three-factor
interactions. A 2^(5-1) design with I = ABCDE is a resolution V design (2^(5-1)_V).
In general, the resolution of a two-level fractional factorial design is equal to the
number of letters in the shortest word in the defining relation. Consequently, we
could call the preceding design types three-, four-, and five-letter designs,
respectively.
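The shortest-word rule can be encoded in one line; a small helper sketch (the function name is ours, not standard):

```r
# Resolution of a two-level fractional factorial: length of the shortest
# word in the complete defining relation (one letter per factor)
resolution <- function(words) min(nchar(words))

resolution("ABC")                       # 3: resolution III
resolution("ABCD")                      # 4: resolution IV
resolution(c("ABCE", "BCDF", "ADEF"))   # 4: the 2^(6-2) design treated later
```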

Example - Filtration Rate

I A chemical product is produced using two levels each of temperature (A),


pressure (B), concentration of formaldehyde (C) and rate (D) at which the
product is stirred. Response (Y) is the ’filtration rate’. Suppose we decide to
investigate the effects of these factors by running half of a 2^4 factorial.
I We attain Resolution IV (the best possible) with the defining relation:

I = ABCD.
I Note that
The defining relation implies that D = ABC.
The principal (or complementary) half of a 2^k factorial is a full 2^(k-1) factorial
for k − 1 of the factors.
I Thus we can get the design by writing down a full 2^3 factorial for A, B and
  C, and computing the signs for D from D = ABC.


Resulting 2^(4-1)_IV Design
Effect
I A B C D=ABC y
(1) + - - - - 45
a + + - - + 100
b + - + - + 45
ab + + + - - 65
c + - - + + 75
ac + + - + - 60
bc + - + + - 80
abc + + + + + 96
The alias relationships are:
A = BCD, B = ACD, C = ABD, D = ABC , AB = CD, AC = BD, AD = BC .
Thus

[A] → A + BCD,
[AB] → AB + CD,
etc.


Analysis - R

For the analysis, first (try to) fit the full 2^4 model:
> A <- rep(c(-1,1), times=4)
> B <- rep(c(-1,1), each = 2, times=2)
> C <- rep(c(-1,1), each = 4)
> D <- A*B*C
> A <- as.factor(A)
> B <- as.factor(B)
> C <- as.factor(C)
> D <- as.factor(D)
> y <- c(45, 100, 45, 65, 75, 60, 80, 96)
> data <- data.frame(A, B, C, D, y)


Model

> g <- lm(y ~ (A+B+C+D)^4)


> anova(g)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 722.0 722.0
B 1 4.5 4.5
C 1 392.0 392.0
D 1 544.5 544.5
A:B 1 2.0 2.0
A:C 1 684.5 684.5
A:D 1 722.0 722.0
Residuals 0 0.0


Only one member of each aliased pair is exhibited; by default it is the shortest
word in the pair. From the ANOVA it looks like only A, C and D have significant
main effects.
> g$effects
> effects <- abs(g$effects[-1])
> qq <- qqnorm(effects, type="n") # "n" means no plotting
> text(qq$x, qq$y, labels = names(effects))


Half Normal Plot

[Half-normal plot of the absolute effect estimates (Sample Quantiles against
Theoretical Quantiles): A1, A1:D1 and A1:C1 lie furthest from the line,
followed by D1 and C1, while B1 and A1:B1 fall near zero.]

I The half normal plot also points to AD (= BC) and AC (= BD) as significant.
I Since B is not significant we wouldn’t expect BC or BD to be significant
either.
I We conclude that the factors of interest are A, C, D and the interactions AC,
AD.


I If we drop B from the model then the table of ± signs becomes:


Effect
I A C D AC AD CD y
(1) + - - - + + + 45
a + + - + - + - 100
b + - - + + - - 45
ab + + - - - - + 65
c + - + + - - + 75
ac + + + - + - - 60
bc + - + - - + - 80
abc + + + + + + + 96
and we can estimate all main effects and two factor interactions without any
being aliased.


Analysis
> h <- lm(y ~(A+C+D)^2)
> anova(h)
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
A 1 722.0 722.0 160.4444 0.05016 .
C 1 392.0 392.0 87.1111 0.06795 .
D 1 544.5 544.5 121.0000 0.05772 .
A:C 1 684.5 684.5 152.1111 0.05151 .
A:D 1 722.0 722.0 160.4444 0.05016 .
C:D 1 2.0 2.0 0.4444 0.62567
Residuals 1 4.5 4.5
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If the insignificant SSCD were combined with SSE , we would have MSE = 3.25 on
2 d.f., and all F-values would be 4.5/3.25 = 1.38 times as large.
Fractional Factorial Designs Generalization

Generalization - One-quarter fraction of a 2^6 factorial

I To run a one-quarter fraction of a 2^6 factorial we would choose two defining
  relations, say
  I = ABCE, I = BCDF; implying
  I = ABCE · BCDF = ADEF and
  E = ABC, F = BCD.                                      (8)
I The complete defining relation for this design is

I = ABCE = BCDF = ADEF .


I We construct the design by writing down all 16 rows of ± signs for a full 2^4
  factorial in factors A, B, C and D. Then the signs for E and F are computed
  from (8).
I This is a Resolution IV design: 2^(6-2)_IV, and all two-factor interactions are
  aliased with other two- (or more) factor interactions. For instance
  AB = CE = ACDF = BDEF
  EF = ABCF = BCDE = AD.
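The construction just described can be sketched in R, together with a spot check of one alias pair:

```r
# Full 2^4 in A, B, C, D; generators E = ABC and F = BCD
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
d$E <- d$A * d$B * d$C
d$F <- d$B * d$C * d$D

nrow(d)                       # 16 runs: one quarter of the 64-run 2^6
all(d$A * d$D == d$E * d$F)   # TRUE: the AD and EF columns are identical
```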
Response Surface Methods and Designs

PART VI:
RESPONSE SURFACE METHODS AND DESIGNS

Response Surface Methods and Designs Introduction

Introduction
I The purpose of response surface methods (RSM) is to optimize a process or
system. RSM is a way to explore the effect of operating conditions (the
factors) on the response variable, y .
I As we map out the response surface of y we move our process as close as
possible towards the optimum, taking into account any constraints.
I Initially, when we are far away from the optimum, we will use factorial
experiments. As we approach the optimum then these factorials are replaced
with better designs that more closely approximate conditions at the optimum.
I For example, suppose that a chemical engineer wishes to find the levels of
temperature (x1 ) and pressure (x2 ) that maximize the yield (y ) of a process.
The process yield is a function of the levels of temperature and pressure say

y = f(x1, x2) + ε

where ε represents the noise or error observed in the response y. If we denote
the expected response by E(y) = f(x1, x2) = η, then the surface represented
by η = f(x1, x2) is called a response surface.
Response Surface Methods and Designs Approximation

Approximating Polynomials
I In most RSM problems, the form of the relationship between the response
and the independent variables is unknown.
I The first step in RSM is to find a suitable approximation for the true
functional relationship between y and the set of independent variables.
I Usually, a low-order polynomial in some region of the independent variables is
employed.
I If the response is well modeled by a linear function of the independent
variables, then the approximating function is the first-order model:

y = β0 + β1x1 + β2x2 + · · · + βkxk + ε

I If there is curvature in the system, then a polynomial of higher degree must
  be used, such as the second-order model:

  y = β0 + Σ_{i=1}^{k} βi xi + Σ_{i=1}^{k} βii xi² + ΣΣ_{i<j} βij xi xj + ε


Estimation of Parameters in Polynomials


I The method of least squares is used to estimate the parameters in the
approximating polynomials.
I The response surface analysis is then performed using the fitted surface. If
the fitted surface is an adequate approximation of the true response function,
then analysis of the fitted surface will be approximately equivalent to analysis
of the actual system.
I The model parameters can be estimated most effectively if proper
experimental designs are used to collect the data.
I Designs for fitting response surfaces are called response surface designs.
I RSM is a sequential procedure. Often, when we are at a point on the
response surface that is remote from the optimum there is little curvature in
the system and the first-order model will be appropriate. Our objective here
is to lead the experimenter rapidly and efficiently along a path of
improvement toward the general vicinity of the optimum.
I Once the region of the optimum has been found, a more elaborate model,
such as the second-order model, may be employed, and an analysis may be
performed to locate the optimum.
Response Surface Methods and Designs The Method of Steepest Ascent

The Method of Steepest Ascent


I The method of steepest ascent is a procedure for moving sequentially in the
direction of the maximum increase in the response. If minimization is desired,
then we call this technique the method of steepest descent.
I If we ignore the cross-product terms, which give an indication of the curvature
  of the response surface being fitted, and just look at the first-order model,
  then this is called the steepest ascent model.
I The fitted first-order model is

  ŷ = β̂0 + Σ_{i=1}^{k} β̂i xi

I Experiments are conducted along the path of steepest ascent until no further
increase in response is observed. Then a new first-order model may be fit, a
new path of steepest ascent determined, and the procedure continued.
I Eventually, the experimenter will arrive in the vicinity of the optimum. This is
usually indicated by lack of fit of a first-order model.
I At that time, additional experiments are conducted to obtain a more precise
estimate of the optimum.

Example (Example 11.1 in Text)


I A chemical engineer is interested in determining the operating conditions that
maximize the yield of a process. Two controllable variables influence process
yield: reaction time and reaction temperature. The engineer is currently
operating the process with a reaction time of 35 minutes and a temperature
of 155 Fahrenheit, which result in yields of around 40 percent. Because it is
unlikely that this region contains the optimum, she fits a first-order model
and applies the method of steepest ascent.
I The engineer decides that the region of exploration for fitting the first-order
model should be (30, 40) minutes of reaction time and (150, 160) Fahrenheit.
I To simplify the calculations, the independent variables will be coded to the
  usual (−1, 1) interval. Thus if ξ1 denotes the natural variable time and ξ2
  denotes the natural variable temperature, then the coded variables are

  x1 = (ξ1 − 35)/5  and  x2 = (ξ2 − 155)/5.

Process Data for Fitting the First-Order Model


I The experimental design is as shown
Natural Variables Coded Variables Response
ξ1 ξ2 x1 x2 y
30 150 -1 -1 39.3
30 160 -1 1 40.0
40 150 1 -1 40.9
40 160 1 1 41.5
35 155 0 0 40.3
35 155 0 0 40.5
35 155 0 0 40.7
35 155 0 0 40.2
35 155 0 0 40.6
I Note that the design used to collect these data is a 22 factorial augmented by
five center points.
I Replicates at the center are used to estimate the experimental error and to
allow for checking the adequacy of the first-order model. Also, the design is
centered about the current operating conditions for the process.

I A first-order model may be fit to these data by least squares.


> y<-c(39.3,40,40.9,41.5,40.3,40.5,40.7,40.2,40.6)
> x1<-c(-1,-1,1,1,0,0,0,0,0)
> x2<-c(-1,1,-1,1,0,0,0,0,0)
> fit<-lm(y~x1+x2)
> summary(fit)
Call:
lm(formula = y ~ x1 + x2)

Residuals:
Min 1Q Median 3Q Max
-0.244444 -0.044444 0.005556 0.055556 0.255556

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.44444 0.05729 705.987 5.45e-16 ***
x1 0.77500 0.08593 9.019 0.000104 ***
x2 0.32500 0.08593 3.782 0.009158 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


I Employing the methods for two-level designs, we obtain the following model
in the coded variables:

ŷ = 40.44 + 0.775x1 + 0.325x2

I Before exploring along the path of steepest ascent, the adequacy of the
first-order model should be investigated.
I The 22 design with center points allows the experimenter to
Obtain an estimate of error.
Check for interactions (cross-product terms) in the model.
Check for quadratic effects (curvature).
I The replicates at the center can be used to calculate an estimate of error as
  follows:

  σ̂² = [(40.3)² + (40.5)² + (40.7)² + (40.2)² + (40.6)² − (202.3)²/5] / 4 = 0.0430
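This is simply the sample variance of the five centre-point yields, which is easy to confirm:

```r
center <- c(40.3, 40.5, 40.7, 40.2, 40.6)   # the five centre-point yields

# Pure-error estimate of sigma^2 on 4 degrees of freedom
var(center)                                  # 0.043
```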


I The first-order model assumes that the variables x1 and x2 have an additive
effect on the response.
I Interaction between the variables would be represented by the coefficient β12
of a cross-product term x1 x2 added to the model.
I The least squares estimate of this coefficient is just one-half the interaction
  effect calculated as in an ordinary 2^2 factorial design, or

  β̂12 = (1/4)[(39.3) + (41.5) − (40.0) − (40.9)] = (1/4)(−0.1) = −0.025

I The single-degree-of-freedom sum of squares for interaction is

  SSInteraction = (−0.1)²/4 = 0.0025
I Comparing SSInteraction to σ̂² gives a lack-of-fit statistic

  F = SSInteraction/σ̂² = 0.0025/0.0430 = 0.058
which is small, indicating that interaction is negligible.
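The interaction check can be reproduced directly from the four factorial points:

```r
# Factorial-point responses and coded levels from the design table
y  <- c(39.3, 40.0, 40.9, 41.5)
x1 <- c(-1, -1, 1, 1)
x2 <- c(-1, 1, -1, 1)

b12   <- sum(x1 * x2 * y) / 4     # -0.025
SSint <- sum(x1 * x2 * y)^2 / 4   # 0.0025
F0    <- SSint / 0.043            # about 0.058: interaction negligible
```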


Call:
lm(formula = y ~ x1 * x2)

Residuals:
       1        2        3        4        5        6        7        8        9
-0.01944 -0.01944 -0.01944 -0.01944 -0.14444  0.05556  0.25556 -0.24444  0.15556

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.44444 0.06231 649.069 1.65e-13 ***
x1 0.77500 0.09347 8.292 0.000417 ***
x2 0.32500 0.09347 3.477 0.017713 *
x1:x2 -0.02500 0.09347 -0.267 0.799787
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1869 on 5 degrees of freedom


Multiple R-squared: 0.9418, Adjusted R-squared: 0.9069
F-statistic: 26.97 on 3 and 5 DF, p-value: 0.00163

Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
x1 1 2.40250 2.40250 68.7520 0.0004166 ***
x2 1 0.42250 0.42250 12.0906 0.0177127 *
x1:x2 1 0.00250 0.00250 0.0715 0.7997870
Residuals 5 0.17472 0.03494
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Check for a Pure Quadratic Curvature Effect


I Compare the average response at the four points in the factorial portion of
the design, say ȳF = (39.3 + 40 + 40.9 + 41.5)/4 = 40.425, with the average
response at the design center, say
ȳC = (40.3 + 40.5 + 40.7 + 40.2 + 40.6)/5 = 40.46.
I If there is quadratic curvature in the true response function, then ȳF − ȳC is a
measure of this curvature.
I If β11 and β22 are the coefficients of the ”pure quadratic” terms x1² and x2²,
  then ȳF − ȳC is an estimate of β11 + β22.
I In the example, an estimate of the pure quadratic term is
β̂11 + β̂22 = ȳF − ȳC = 40.425 − 40.46 = −0.035.
I The single-degree-of-freedom sum of squares associated with the null
  hypothesis, H0 : β11 + β22 = 0, is

  SSPureQuadratic = nF nC (ȳF − ȳC)²/(nF + nC) = (4)(5)(−0.035)²/(4 + 5) = 0.0027

  where nF and nC are the number of points in the factorial portion and the
  number of center points, respectively.
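The curvature check can be reproduced in a few lines:

```r
yF <- mean(c(39.3, 40.0, 40.9, 41.5))         # factorial points: 40.425
yC <- mean(c(40.3, 40.5, 40.7, 40.2, 40.6))   # centre points:    40.46

nF <- 4; nC <- 5
SSpq <- nF * nC * (yF - yC)^2 / (nF + nC)     # about 0.0027
```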

I Since

  F = SSPureQuadratic/σ̂² = 0.0027/0.0430 = 0.063

  is small, there is no indication of a pure quadratic effect.
I Both the interaction and curvature checks are not significant.
I The standard error of β̂1 and β̂2 is

  se(β̂i) = √(MSE/4) = √(σ̂²/4) = √(0.0430/4) = 0.10

for i = 1, 2. Both regression coefficients β̂1 and β̂2 are large relative to their
standard errors.
I At this point we have no reason to question the adequacy of the first order
model.


Analysis of Variance for the Model


> library(rsm)
> Chemical.rsm <- rsm(y ~ FO(x1, x2))
> summary(Chemical.rsm)
Call:
rsm(formula = y ~ FO(x1, x2))

Estimate Std. Error t value Pr(>|t|)


(Intercept) 40.444444 0.057288 705.9869 5.451e-16 ***
x1 0.775000 0.085932 9.0188 0.000104 ***
x2 0.325000 0.085932 3.7821 0.009158 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.941, Adjusted R-squared: 0.9213


F-statistic: 47.82 on 2 and 6 DF, p-value: 0.0002057

Analysis of Variance Table

Response: y

I To move away from the design center - the point (x1 = 0, x2 = 0) - along the
path of steepest ascent, we would move 0.775 units in the x1 direction for
every 0.325 units in the x2 direction.
I Thus, the path of steepest ascent passes through the point (x1 = 0, x2 = 0)
and has a slope 0.325/0.775.
I The engineer decides to use 5 minutes of reaction time as the basic step size.
  Using the relationship between ξ1 and x1, we see that 5 minutes of reaction
  time is equivalent to a step in the coded variable x1 of ∆x1 = 1.
I Therefore, the steps along the path of steepest ascent are ∆x1 = 1.0000 and
  ∆x2 = (0.325/0.775)∆x1 = 0.42.
I The engineer computes points along this path and observes the yields at
these points until a decrease in response is noted.
I Although the coded variables are easier to manipulate mathematically, the
natural variables must be used in running the process.
I Increases in response are observed through the tenth step; however, all steps
beyond this point result in a decrease in yield. Therefore, another first-order
model should be fit in the general vicinity of the point (ξ1 = 85, ξ2 = 175).
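The coordinates along the path can be generated programmatically; a sketch using the rounded natural step of 2 degrees per move (as in the table that follows):

```r
steps <- 0:12
dx2   <- 0.325 / 0.775   # about 0.42 coded units of x2 per unit step in x1

path <- data.frame(
  step = steps,
  x1   = steps * 1.0,            # Delta x1 = 1 (i.e. 5 minutes of reaction time)
  x2   = round(steps * dx2, 2),
  time = 35 + 5 * steps,         # natural variable xi_1 (minutes)
  temp = 155 + 2 * steps         # natural variable xi_2 (deg F, rounded step)
)

path[path$step == 10, ]          # step 10: time 85, temperature 175
```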


Steepest Ascent Experiment

Coded Variables Natural Variables Response


Steps x1 x2 ξ1 ξ2 y
Origin 0 0 35 155
∆ 1.00 0.42 5 2
Origin + ∆ 1.00 0.42 40 157 41.0
Origin + 2∆ 2.00 0.84 45 159 42.9
Origin + 3∆ 3.00 1.26 50 161 47.1
Origin + 4∆ 4.00 1.68 55 163 49.7
Origin + 5∆ 5.00 2.10 60 165 53.8
Origin + 6∆ 6.00 2.52 65 167 59.9
Origin + 7∆ 7.00 2.94 70 169 65.0
Origin + 8∆ 8.00 3.36 75 171 70.4
Origin + 9∆ 9.00 3.78 80 173 77.6
Origin + 10∆ 10.00 4.20 85 175 80.3
Origin + 11∆ 11.00 4.62 90 179 76.2
Origin + 12∆ 12.00 5.04 95 181 75.1


Fitting a new First-Order Model

I A new first-order model is fit around the point (ξ1 = 85, ξ2 = 175).
I The region of exploration for ξ1 is [80,90], and it is [170,180] for ξ2 .
I The coded variables are therefore

  x1 = (ξ1 − 85)/5  and  x2 = (ξ2 − 175)/5
I A 22 design with five center points is used.


Data for Second First-Order Model

I The experimental design is


Natural Variables Coded Variables Response
ξ1 ξ2 x1 x2 y
80 170 -1 -1 76.5
80 180 -1 1 77.0
90 170 1 -1 78.0
90 180 1 1 79.5
85 175 0 0 79.9
85 175 0 0 80.3
85 175 0 0 80.0
85 175 0 0 79.7
85 175 0 0 79.8
I The first-order model fit to the coded variables is

ŷ = 78.97 + 1.00x1 + 0.50x2


> y1<-c(76.5,77,78,79.5,79.9,80.3,80,79.7,79.8)
> x1<-c(-1,-1,1,1,0,0,0,0,0)
> x2<-c(-1,1,-1,1,0,0,0,0,0)
> fit_rsm<-rsm(y1~FO(x1,x2))
> summary(fit_rsm)
Call:
rsm(formula = y1 ~ FO(x1, x2))

Estimate Std. Error t value Pr(>|t|)


(Intercept) 78.96667 0.45379 174.0156 2.43e-12 ***
x1 1.00000 0.68069 1.4691 0.1922
x2 0.50000 0.68069 0.7346 0.4903
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.3102, Adjusted R-squared: 0.08023


F-statistic: 1.349 on 2 and 6 DF, p-value: 0.3283

Analysis of Variance Table

Response: y1

> fit.rsmi <- update(fit_rsm, . ~ . + TWI(x1, x2))


> summary(fit.rsmi)
Call:
rsm(formula = y1 ~ FO(x1, x2) + TWI(x1, x2))

Estimate Std. Error t value Pr(>|t|)


(Intercept) 78.96667 0.49148 160.6702 1.772e-10 ***
x1 1.00000 0.73722 1.3564 0.2330
x2 0.50000 0.73722 0.6782 0.5277
x1:x2 0.25000 0.73722 0.3391 0.7483
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.3257, Adjusted R-squared: -0.07891


F-statistic: 0.805 on 3 and 5 DF, p-value: 0.5426

Analysis of Variance Table

Response: y1
Df Sum Sq Mean Sq F value Pr(>F)
FO(x1, x2) 2 5.000 2.500 1.150 0.3882680

I The lack-of-fit checks imply that the first-order model is not an adequate
  approximation.
I This curvature in the true surface may indicate that we are near the optimum.
I At this point, additional analysis must be done to locate the optimum more
precisely.

Response Surface Methods and Designs Analysis of a Second-Order Response Surface

Analysis of a Second-Order Response Surface

I Once we are close to the optimal solution, the plane will be rather flat.
I We expect that the optimal solution will be somewhere in our experimental
set-up (a peak). Hence, we expect curvature.
I A model that incorporates curvature is usually required to approximate the
response.
I In most cases, the second-order model

  y = β0 + Σ_{i=1}^{k} βi xi + Σ_{i=1}^{k} βii xi² + ΣΣ_{i<j} βij xi xj + ε

is adequate.
I We now have more parameters, hence we need more observations to fit the
model.


Location of the Stationary Point

I Suppose we wish to find the levels of x1, x2, . . . , xk that optimize the
  predicted response.
I This point, if it exists, will be the set of x1, x2, . . . , xk for which the partial
  derivatives satisfy ∂ŷ/∂x1 = ∂ŷ/∂x2 = · · · = ∂ŷ/∂xk = 0.
I This point is called the stationary point.
I The stationary point could represent a point of maximum response, a point
of minimum response, or a saddle point.
I Contour plots play a very important role in the study of the response surface.
By generating contour plots using computer software for response surface
analysis, the experimenter can usually characterize the shape of the surface
and locate the optimum with reasonable precision.


Response Surface and Contour Plot

[Contour plot and 3-D response surface of the predicted yield against x1 and
x2; the contours (77–80) close around a maximum near the center of the design.]


Mathematical Solution for the Location of the Stationary Point

I Writing the fitted second-order model in matrix notation we have

  ŷ = β̂0 + x′b + x′Bx                                  (9)

  where x is the (k × 1) vector of the independent variables, b is a (k × 1) vector
  of the first-order regression coefficients and B is a (k × k) symmetric matrix
  whose main diagonal elements are the pure quadratic coefficients β̂ii and whose
  off-diagonal elements are one-half the mixed quadratic coefficients (β̂ij, i ≠ j).
I The derivative of ŷ with respect to the elements of the vector x equated to 0
  is

  ∂ŷ/∂x = b + 2Bx = 0                                  (10)

I The stationary point is the solution to Equation (10), or

  xs = −(1/2)B⁻¹b                                       (11)

I By substituting Equation (11) into Equation (9), we can find the predicted
  response at the stationary point as

  ŷs = β̂0 + (1/2)x′s b                                  (12)

Example

Example 11.2 in the text - (Page 489).


Contour and response surface plots of the yield response,


Example 11.2 (text)

> yield<-c(76.5,77,78,79.5,79.9,80.3,80,79.7,
+ 79.8,78.4,75.6,78.5,77)
> Temperature<-c(-1,-1,1,1,0,0,0,0,0,1.414,-1.414,0,0)
> Time<-c(-1,1,-1,1,0,0,0,0,0,0,0,1.414,-1.414)
> chem2.lm <- lm(yield ~ poly(Temperature, Time, degree=2))
> par(mfrow=c(1,2))
> contour(chem2.lm, Temperature ~ Time, main="Contour Plot")
> persp(chem2.lm, Temperature ~ Time, zlab = "Yield",
+ main="Response Surface")


[Contour plot and response surface of yield versus Time and Temperature from
the fitted second-order model chem2.lm; the contours (74.5–80) enclose a
maximum near the design center.]
Experiments with Random Factors

PART VII:
EXPERIMENTS WITH RANDOM FACTORS


Random Effects Model

I So far we have considered only fixed factors, that is, the levels of the factors
used by the experimenter were the specific levels of interest - fixed levels of
temperature, pressure, etc.
I The statistical inferences made about these factors are confined to the
specific levels studied.
I Often factor levels are chosen at random from a larger population of potential
levels, and we wish to make inferences about the entire population of levels,
not just those that were used in the experimental design. The factor here, is
said to be a random factor.


Examples

I A drug company has its products manufactured in a large number of


locations, and suspects that the purity of the product might vary from one
location to another. Three locations are randomly chosen, and several
samples of product from each are selected and tested for purity.
I When pulp is made into paper, it is bleached to enhance the brightness. The
type of bleaching chemical is of interest. Four chemicals are chosen from a
large population of potential bleaching agents, and each is applied to five
batches of pulp. One wants to know, initially, if there is a difference in the
brightness resulting from the chemical types.


The Two-Factor Factorial with Random Factors


I Consider two factors, A and B, both with a large number of levels that are of
  interest.
I Choose at random a levels of factor A and b levels of factor B and arrange
these factor levels in a factorial experimental design.
I Replicating the experiment n times, the observations may be represented by
  the linear model

  yijk = µ + τi + βj + (τβ)ij + εijk,  i = 1, 2, . . . , a; j = 1, 2, . . . , b; k = 1, 2, . . . , n,

  where the model parameters τi, βj, (τβ)ij and εijk are all independent random
  variables.
I Assume that the random variables τi, βj, (τβ)ij and εijk are normally
  distributed with mean zero and variances given by

  V(τi) = στ², V(βj) = σβ², V[(τβ)ij] = στβ² and V(εijk) = σ²
I The variance of any observation is therefore

  V(yijk) = στ² + σβ² + στβ² + σ²

  where στ², σβ², στβ² and σ² are the variance components.



Hypothesis and Sums of Squares

I The hypotheses that we are interested in testing are H0 : στ² = 0, H0 : σβ² = 0,
  and H0 : στβ² = 0.
I The numerical calculations in the analysis of variance remain unchanged; that
is, SSA , SSB , SSAB , SSE and SST are all calculated as in the fixed effects case.
I To form the test statistics, the expected mean squares must be examined.
I It can be shown that

E (MSA ) = σ 2 + nστ2 β + bnστ2
E (MSB ) = σ 2 + nστ2 β + anσβ2
E (MSAB ) = σ 2 + nστ2 β
E (MSE ) = σ 2


Test Statistic
I From the expected mean squares, we see that the appropriate statistic for
testing the no-interaction hypothesis H0 : στ2 β = 0 is

F0 = MSAB /MSE

because under H0 both the numerator and denominator of F0 have expectation
σ 2 , and only if H0 is false is E (MSAB ) greater than E (MSE ). F0 is
distributed as F(a−1)(b−1),ab(n−1) .
I Similarly, for testing H0 : στ2 = 0 we would use

F0 = MSA /MSAB

which is distributed as F(a−1),(a−1)(b−1) .
I For testing H0 : σβ2 = 0 the test statistic is

F0 = MSB /MSAB

which is distributed as F(b−1),(a−1)(b−1) .

Estimation of Variance Components


I The variance components may be estimated by the analysis of variance
method, that is, by equating the observed mean squares in the lines of the
analysis of variance table to their expected values and solving for the variance
components.
I This yields

σ̂ 2 = MSE
σ̂τ2 β = (MSAB − MSE )/n
σ̂β2 = (MSB − MSAB )/(an)
σ̂τ2 = (MSA − MSAB )/(bn)

as the point estimates of the variance components in the two-factor random
effects model.
I These are moment estimators.
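The estimators above are straightforward to code. A minimal Python sketch (the helper name is assumed, not from the text; the plugged-in mean squares come from the gauge study analysed later in these notes):

```python
def variance_components(ms_a, ms_b, ms_ab, ms_e, a, b, n):
    """ANOVA (moment) estimators for the two-factor random effects model."""
    return {
        "sigma2": ms_e,                          # sigma-hat^2
        "sigma2_taubeta": (ms_ab - ms_e) / n,    # interaction component
        "sigma2_beta": (ms_b - ms_ab) / (a * n), # factor B component
        "sigma2_tau": (ms_a - ms_ab) / (b * n),  # factor A component
    }

# Mean squares from the gauge study analysed later in the notes
# (a = 20 parts, b = 3 operators, n = 2 replicates).
est = variance_components(62.391, 1.308, 0.712, 0.992, a=20, b=3, n=2)
print(round(est["sigma2_taubeta"], 2))  # -0.14, the negative estimate discussed later
```

Note that nothing in the formulas prevents a negative estimate, which is exactly the situation discussed in the example that follows.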


Example - A Measurement Systems Capability Study

Parts used in a manufacturing process are measured with a certain gauge.


Variability in the readings can arise from the parts being measured (στ2 ), from the
operators doing the measuring (σβ2 ), from the interaction between the two (στ2 β ),
and from the gauge itself. Twenty parts are randomly chosen, and each is
measured twice by each of 3 operators chosen from a large population of
operators. All 120 measurements are made in random order, so this is an
a × b = 20 × 3 factorial with n = 2 replicates. See data in Table 13.1 of the text.


Analysis of Variance (R - Output)

Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
part 19 1185.43 62.391 62.9151 <2e-16 ***
operator 2 2.62 1.308 1.3193 0.2750
part:operator 38 27.05 0.712 0.7178 0.8614
Residuals 60 59.50 0.992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction effect is not significant (p-value = 0.862).
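The random-model F-ratios can be recomputed from this table; a short Python check (not part of the notes), using mean squares formed as SS/df to avoid rounding:

```python
# Sums of squares and degrees of freedom transcribed from the ANOVA table above.
ss = {"part": 1185.43, "operator": 2.62, "part:operator": 27.05, "resid": 59.50}
df = {"part": 19, "operator": 2, "part:operator": 38, "resid": 60}

ms = {k: ss[k] / df[k] for k in ss}  # mean squares

# Random effects model: main effects are tested against the interaction,
# the interaction against the error.
f_part = ms["part"] / ms["part:operator"]
f_operator = ms["operator"] / ms["part:operator"]
f_interaction = ms["part:operator"] / ms["resid"]

print(round(f_part, 2), round(f_operator, 2), round(f_interaction, 4))
# -> 87.65 1.84 0.7178
```

The interaction ratio matches the 0.7178 in the table; the main-effect ratios can differ in the second decimal from values printed later in the notes, which were computed from rounded mean squares.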


Test Statistics
I There is no change in the ANOVA computations, the formation of the mean
squares, or the degrees of freedom.
I However, the relevant F-ratios are not necessarily what they were in the fixed
effects case. One must start by determining the expected values of the mean
squares.
F-value for A is 87.87324
p-value for A is 0
F-value for B is 1.84507
p-value for B is 0.1718915
F-value for AB is 0.7171717
p-value for AB is 0.8620944
I The variance components can be estimated by equating the mean squares to
their expected values and solving the resulting equations.
> var.tau.beta <- (MSAB-MSE)/n
> cat("Estimate of sigma.sqd(tau.beta) =", var.tau.beta,"\n")
Estimate of sigma.sqd(tau.beta) = -0.14

Interpretation

I Notice that the estimate of one of the variance components, στ2 β , is negative.
This is certainly not reasonable because, by definition, variances are
non-negative.
I We can deal with this negative result in a variety of ways. One possibility is
to assume that the negative estimate means that the variance component is
really zero and just set it to zero, leaving the other non-negative estimates
unchanged. Another approach is to estimate the variance components with a
method that assures non-negative estimates (this can be done with the
maximum likelihood approach).
I The p-value for the interaction term in the ANOVA table is very large; we
take this as evidence that στ2 β really is zero and that there is no interaction
effect, and therefore fit a reduced model.


Reduced Model
I Fit a reduced model of the form
yijk = µ + τi + βj + εijk
that does not include the interaction term.
I Here
E [MSA ] = σ 2 + bnστ2 ,
E [MSB ] = σ 2 + anσβ2 ,
E [MSE ] = σ 2 .
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
part 19 1185.43 62.391 70.6447 <2e-16 ***
operator 2 2.62 1.308 1.4814 0.2324
Residuals 98 86.55 0.883
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
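A quick Python check (added, not from the notes) that both F-ratios in the reduced model are formed against the error mean square:

```python
# SS and df from the reduced-model ANOVA table above; mean squares as SS/df.
ms_part = 1185.43 / 19       # about 62.391
ms_operator = 2.62 / 2       # about 1.31
ms_error = 86.55 / 98        # about 0.883

# Reduced model: no interaction term, so both main effects use MSE.
f_part = ms_part / ms_error
f_operator = ms_operator / ms_error
print(round(f_part, 3), round(f_operator, 3))
```

Both values agree with the table to the rounding used there (70.6447 and 1.4814).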

Variance Components Estimates

I Since there is no interaction term in the model, both main effects are tested
against the error term, and the estimates of the variance components are

σ̂τ2 = (62.39 − 0.88)/[(3)(2)] = 10.25
σ̂β2 = (1.31 − 0.88)/[(20)(2)] = 0.0108
σ̂ 2 = 0.88

I The variability of the gauge (arising from operator variability and random
error) is estimated by

σ̂gauge2 = σ̂ 2 + σ̂β2 = 0.88 + 0.0108 = 0.8908

I The variability in the gauge appears small relative to the variability in the
product (σ̂τ2 ). This is generally a desirable situation, implying that the gauge
is capable of distinguishing among different grades of product.


R - Output

> MSA <- 62.39


> MSB <- 1.31
> MSE <- .88
> var.beta <- (MSB-MSE)/(a*n)
> var.tau <- (MSA-MSE)/(b*n)
> var.gauge <- MSE + var.beta
> cat("Estimate of sigma.sqd(gauge) =", var.gauge,"\n")
Estimate of sigma.sqd(gauge) = 0.89075
> cat("Estimate of sigma.sqd(parts) =", var.tau,"\n")
Estimate of sigma.sqd(parts) = 10.25167


The Two-Factor Mixed Model


I If one of the factors A is fixed and the other factor B is random, we have a
mixed model.
I The linear statistical model is

yijk = µ + τi + βj + (τ β)ij + εijk

where τi is a fixed effect, βj is a random effect, the interaction (τ β)ij is
assumed to be a random effect, and εijk is a random error.
I Assume that the τi are fixed effects such that Σai=1 τi = 0, and that βj is a
NID(0, σβ2 ) random variable.
I The interaction effect, (τ β)ij , is a normal random variable with mean 0 and
variance [(a − 1)/a]στ2 β .
I For each j, Σai=1 (τ β)ij = (τ β).j = 0. This restriction implies that certain
interaction elements at different levels of the fixed factor are not independent
(the restricted model).
I For each j, cov [(τ β)ij , (τ β)i ′ j ] = −(1/a)στ2 β and corr [(τ β)ij , (τ β)i ′ j ] = −1/(a − 1).
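The variance, covariance, and correlation above follow from viewing the restricted interaction effects as a independent N(0, στ2 β ) draws centered over i, so their covariance matrix is στ2 β (I − J/a). A small Python check of this linear algebra, with illustrative values for a and στ2 β (not from the notes):

```python
a = 4          # illustrative number of fixed-factor levels
var_tb = 2.0   # illustrative sigma_taubeta^2

# Centering matrix C = I - J/a applied to a iid N(0, var_tb) effects.
C = [[(1.0 if i == j else 0.0) - 1.0 / a for j in range(a)] for i in range(a)]

# Cov[(tb)_ij, (tb)_i'j] = var_tb * (C C^T)_{ii'} since the draws are iid.
cov = [[var_tb * sum(C[i][k] * C[j][k] for k in range(a)) for j in range(a)]
       for i in range(a)]

variance = cov[0][0]                  # expect var_tb * (a-1)/a = 1.5
covariance = cov[0][1]                # expect -var_tb / a      = -0.5
correlation = covariance / variance   # expect -1 / (a-1)       = -1/3
print(variance, covariance, correlation)
```

The diagonal reproduces the [(a − 1)/a]στ2 β variance, the off-diagonal the −(1/a)στ2 β covariance, and their ratio the −1/(a − 1) correlation.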

Expected Mean Squares

I The mean squares are computed in the usual way.

E (MSA ) = σ 2 + nστ2 β + bn(Σai=1 τi2 )/(a − 1)
E (MSB ) = σ 2 + anσβ2
E (MSAB ) = σ 2 + nστ2 β
E (MSE ) = σ 2

I The appropriate test statistic for testing that the means of the fixed factor
effects are equal, or H0 : τi = 0, is F0 = MSA /MSAB , for which the reference
distribution is Fa−1,(a−1)(b−1) .
I For testing H0 : σβ2 = 0, the test statistic is F0 = MSB /MSE , with reference
distribution Fb−1,ab(n−1) .
I For testing the interaction hypothesis H0 : στ2 β = 0, we would use
F0 = MSAB /MSE , which has reference distribution F(a−1)(b−1),ab(n−1) .
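Applied to the gauge study with A = operators (fixed) and B = parts (random), these rules change the denominators relative to the all-random case; a short Python sketch (not from the notes), with mean squares formed as SS/df:

```python
# Mixed model: A (operators, fixed) is tested against the interaction,
# B (parts, random) and the interaction are tested against the error.
ms_a = 2.62 / 2          # operators
ms_b = 1185.43 / 19      # parts
ms_ab = 27.05 / 38       # operator:part interaction
ms_e = 59.50 / 60        # residual

f_a = ms_a / ms_ab       # reference F_{a-1,(a-1)(b-1)}
f_b = ms_b / ms_e        # reference F_{b-1,ab(n-1)}
f_ab = ms_ab / ms_e      # reference F_{(a-1)(b-1),ab(n-1)}
print(round(f_a, 2), round(f_b, 2), round(f_ab, 4))
```

The printed ratios differ slightly in the last decimals from the values shown later in the notes, which were computed from rounded mean squares.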


Estimates

I Fixed effects

µ̂ = ȳ...
τ̂i = ȳi.. − ȳ... .

I Estimate the variance components. Using the usual analysis of variance
method, we equate the mean squares to their expectations and solve for the
components to get

σ̂β2 = (MSB − MSE )/(an)
σ̂τ2 β = (MSAB − MSE )/n
σ̂ 2 = MSE .
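For the gauge study (operators fixed, a = 3; parts random, b = 20; n = 2), these formulas reproduce the estimates printed in the output that follows; a quick Python check (not from the notes), with mean squares formed as SS/df:

```python
# Mixed model: A (operators) fixed, B (parts) random.
a, b, n = 3, 20, 2
ms_b = 1185.43 / 19      # parts (random factor B)
ms_ab = 27.05 / 38       # operator:part interaction
ms_e = 59.50 / 60        # error

var_beta = (ms_b - ms_e) / (a * n)     # sigma-hat_beta^2
var_taubeta = (ms_ab - ms_e) / n       # sigma-hat_taubeta^2 (negative here)
print(round(var_beta, 3), round(var_taubeta, 2))
# -> 10.233 -0.14
```

As in the all-random analysis, the interaction component estimate is negative, motivating the reduced-model fit.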


Example - Measurements Systems

Consider the previous measurement systems capability experiment.


Suppose now that only three operators use this gauge, so the operators are a fixed
factor. However, because the parts are chosen at random, the experiment now
involves a mixed model.
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
operator 2 2.62 1.308 1.3193 0.2750
part 19 1185.43 62.391 62.9151 <2e-16 ***
operator:part 38 27.05 0.712 0.7178 0.8614
Residuals 60 59.50 0.992
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mean Squares

F-value for A is 1.84507


p-value for A is 0.1718915
F-value for B is 63.0202
p-value for B is 0
F-value for AB is 0.7171717
p-value for AB is 0.8620944
Estimate of sigma.sqd(beta) = 10.23333
Estimate of sigma.sqd(tau.beta) = -0.14


Fit the Reduced Model

Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
operator 2 2.62 1.308 1.4814 0.2324
part 19 1185.43 62.391 70.6447 <2e-16 ***
Residuals 98 86.55 0.883
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Mean Squares

> MSB <- 62.39


> MSA <- 1.31
> MSE <- .88
> var.beta <- (MSB-MSE)/(a*n)
> cat("Estimate of sigma.sqd(beta) =", var.beta,"\n")
Estimate of sigma.sqd(beta) = 10.25167


References

Lawson, J. (2014). Design and Analysis of Experiments with R. Chapman and
Hall/CRC Press.
Oehlert, G. W. (2010). A First Course in Design and Analysis of Experiments.
Montgomery, D. C. (2012). Design and Analysis of Experiments, 8th Edition.
John Wiley & Sons.
Box, G. E. P. and Draper, N. R. (2007). Response Surfaces, Mixtures, and
Ridge Analyses, 2nd Edition. John Wiley & Sons.
