Stat 336-Design of Experiments - Dr. Eric Nyarko
Instructor
Eric Nyarko (PhD)
Department of Statistics and Actuarial Science
University of Ghana
ericnyarko@ug.edu.gh / nyarkoeric5@gmail.com
Introduction to DOE
• Learn about the objectives of experimental design and the role it plays
in the knowledge discovery process
• Learn about different strategies of experimentation
• Understand the role that statistical methods play in designing and
analyzing experiments
• Understand the concepts of main effects of factors and interaction
between factors
• Know about factorial experiments
• Know the practical guidelines for designing and conducting
experiments
• We may want to determine which input factors are responsible for the
observed changes in the response.
• Develop a model relating the response to the important input variables, and
use this model for process or system improvement or other decision
making
• “Best-guess” experiments
– Used a lot
– More successful than you might suspect, but there are
disadvantages…
• One-factor-at-a-time (OFAT) experiments
– Sometimes associated with the “scientific” or “engineering”
method
– Devastated by interaction, also very inefficient
• Statistically designed experiments
– Based on Fisher’s factorial concept
It serves as a checklist to improve experimentation and ensures that results are not corrupted for lack of careful planning.
• The t-test does not directly apply when there are more than
two levels of a factor
• There are lots of practical situations where there are either
more than two levels of interest, or there are several factors
of simultaneous interest
• The analysis of variance (ANOVA) is the appropriate
analysis “engine” for these types of experiments
• The ANOVA was developed by Fisher in the early 1920s,
and initially applied to agricultural experiments
• Used extensively today for industrial experiments
Dr. Eric Nyarko (UG) STAT 336: Design of Experiments 29
Experiments with a Single Factor
(The Analysis of Variance)
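\[ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, n \]
\[ \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{n} \left[ (\bar{y}_{i.} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.}) \right]^2 \]
\[ = n \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 \]
\[ SS_T = SS_{\text{Treatments}} + SS_E \]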
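The partition of the total sum of squares can be checked numerically. The sketch below assumes a balanced layout (the same n in every treatment); the function name `one_way_ss` and the data are hypothetical and chosen only for illustration.

```python
def one_way_ss(groups):
    """Return (SS_T, SS_Treatments, SS_E) for a balanced single-factor layout.

    groups[i] holds the n observations taken under treatment i.
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")

    grand_mean = sum(sum(g) for g in groups) / (a * n)
    treatment_means = [sum(g) / n for g in groups]

    # Total corrected sum of squares: every observation against the grand mean.
    ss_t = sum((y - grand_mean) ** 2 for g in groups for y in g)
    # Between-treatment part: n times the squared deviations of treatment means.
    ss_treatments = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    # Within-treatment part: observations against their own treatment mean.
    ss_e = sum((y - m) ** 2 for g, m in zip(groups, treatment_means) for y in g)
    return ss_t, ss_treatments, ss_e


# Hypothetical data: a = 3 treatments, n = 3 replicates each.
data = [[10, 12, 11], [14, 15, 16], [9, 8, 10]]
ss_t, ss_treatments, ss_e = one_way_ss(data)
print(ss_t, ss_treatments, ss_e)
```

The three sums are computed independently of one another, so agreement between `ss_t` and `ss_treatments + ss_e` is a genuine check of the identity rather than a consequence of how the code is written.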
The Analysis of Variance
\[ SS_T = SS_{\text{Treatments}} + SS_E \]
• A large value of SSTreatments reflects large differences in
treatment means
• A small value of SSTreatments likely indicates no differences in
treatment means
• Formal statistical hypotheses are:
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_a \]
\[ H_1: \text{at least one mean is different} \]
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3 = 𝜇4
against
𝐻1 : Some means are different
• The standardized residuals are given by
\[ d_{ij} = \frac{e_{ij}}{\sqrt{MS_E}} \]
• If the standardized residuals 𝑑𝑖𝑗 are approximately normal with mean zero
and unit variance, then about
– 68% of the 𝑑𝑖𝑗 should fall within the limits ±1, about 95% of them
should fall within ±2, and virtually all of them should fall within ±3.
– A residual bigger than 3 or 4 standard deviations from zero is a
potential outlier.
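The outlier rule above can be sketched in a few lines. This assumes the usual definition of a standardized residual, the raw residual divided by the square root of the error mean square, for a single-factor experiment; the function name, the threshold of 3, and the data are hypothetical.

```python
import math


def standardized_residuals(groups):
    """Return d_ij = e_ij / sqrt(MS_E) for a single-factor experiment.

    Assumed definition: e_ij is the observation minus its treatment mean,
    and MS_E = SS_E / (N - a).
    """
    a = len(groups)
    total_n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    residuals = [[y - m for y in g] for g, m in zip(groups, means)]
    ss_e = sum(e * e for row in residuals for e in row)
    ms_e = ss_e / (total_n - a)
    scale = math.sqrt(ms_e)
    return [[e / scale for e in row] for row in residuals]


# Hypothetical data: a = 3 treatments, n = 3 replicates each.
data = [[10, 12, 11], [14, 15, 16], [9, 8, 10]]
d = standardized_residuals(data)

# Flag potential outliers: more than 3 standard deviations from zero.
outliers = [(i, j) for i, row in enumerate(d)
            for j, value in enumerate(row) if abs(value) > 3]
print(d, outliers)
```

With so few observations the 68%/95% guideline is only a rough check; a normal probability plot of the residuals is usually more informative.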
where both the treatment effects 𝜏𝑖 and 𝜀𝑖𝑗 are random variables.
• We will assume that 𝜏𝑖 and 𝜀𝑖𝑗 are independent. Because 𝜏𝑖 is
independent of 𝜀𝑖𝑗 , the variance of any observation is
\[ V(y_{ij}) = \sigma_\tau^2 + \sigma^2 \]
• The main diagonal elements of this matrix are the variances of the individual
observations, and every off-diagonal element is the covariance of a pair
of observations.
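The structure of this covariance matrix can be made concrete with a small sketch. It assumes the random effects model with Var(τ_i) = σ_τ² and Var(ε_ij) = σ², so that two observations from the same treatment share the same τ_i and have covariance σ_τ², while observations from different treatments are uncorrelated. The function name and the numerical values are hypothetical.

```python
def random_model_covariance(a, n, sigma_tau2, sigma2):
    """Covariance matrix of the N = a*n observations, ordered by treatment.

    Diagonal: sigma_tau2 + sigma2 (variance of one observation).
    Same treatment, different observation: sigma_tau2.
    Different treatments: 0.
    """
    total_n = a * n
    cov = [[0.0] * total_n for _ in range(total_n)]
    for r in range(total_n):
        for c in range(total_n):
            if r == c:
                cov[r][c] = sigma_tau2 + sigma2
            elif r // n == c // n:  # both observations share the same tau_i
                cov[r][c] = sigma_tau2
    return cov


# Hypothetical values: a = 2 treatments, n = 2 observations each.
cov = random_model_covariance(a=2, n=2, sigma_tau2=4.0, sigma2=1.0)
for row in cov:
    print(row)
```

The matrix is block diagonal, one block per treatment, which is why observations in the random model are correlated only within a treatment.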
Analysis of Variance for the Random Model
• The basic ANOVA sum of squares identity $SS_T = SS_{\text{Treatments}} + SS_E$ is still valid
• Testing hypotheses about individual treatment effects is not very meaningful
because they were selected randomly
• We are more interested in the population of treatments, so we test hypotheses
about the variance component $\sigma_\tau^2$.
• Under the null hypothesis $H_0: \sigma_\tau^2 = 0$, the ratio $F_0 = MS_{\text{Treatments}} / MS_E$ is distributed as F with
a − 1 and N − a degrees of freedom
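• A 100(1 − α)% CI for $\sigma^2$ is
\[ \frac{(N-a)\,MS_E}{\chi^2_{\alpha/2,\,N-a}} \le \sigma^2 \le \frac{(N-a)\,MS_E}{\chi^2_{1-\alpha/2,\,N-a}} \]
• Since $MS_E = 1.90$, $N = 16$, $a = 4$, $\chi^2_{0.025,12} = 23.3364$ and $\chi^2_{0.975,12} = 4.4038$,
the 95% CI on $\sigma^2$ is $0.9770 \le \sigma^2 \le 5.1775$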
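The interval can be reproduced from the quoted numbers. The sketch assumes the usual chi-square interval for the error variance, with limits (N − a)·MS_E divided by the upper and lower chi-square percentage points; the percentage points are entered as constants taken from the slide rather than computed, and the variable names are hypothetical.

```python
# Values quoted on the slide.
ms_e = 1.90
total_n = 16
a = 4

# Chi-square percentage points with N - a = 12 degrees of freedom,
# copied from the slide (upper-tail notation).
chi2_upper_tail_025 = 23.3364  # chi-square_{0.025, 12}
chi2_upper_tail_975 = 4.4038   # chi-square_{0.975, 12}

df_error = total_n - a

# Assumed form of the interval: (N - a) MS_E / chi2 on each side.
lower = df_error * ms_e / chi2_upper_tail_025
upper = df_error * ms_e / chi2_upper_tail_975

print(round(lower, 4), round(upper, 4))
```

The limits agree with the slide to three decimal places; the small difference in the fourth decimal of the upper limit presumably comes from the slide using an unrounded value of MS_E.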
• Know how the blocking principle can be effective in reducing the variability
arising from controllable nuisance factors.
• Learn about the randomized complete block design.
• Understand how the analysis of variance can be extended to the randomized
complete block design.
• Know how to do model adequacy checking for the randomized complete block
design.
• Understand how a Latin square design can be used to control two sources of
nuisance variability in an experiment.
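\[ \sum_{i=1}^{a}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} \left[ (\bar{y}_{i.} - \bar{y}_{..}) + (\bar{y}_{.j} - \bar{y}_{..}) + (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}) \right]^2 \]
\[ = b \sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + a \sum_{j=1}^{b} (\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..})^2 \]
• The corresponding degrees of freedom partition is
\[ ab - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1) \]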
• Therefore
– The ratios of sums of squares to their degrees of freedom result in
mean squares
– The ratio of the mean square for treatments to the error mean
square is an F statistic that can be used to test the hypothesis of
equal treatment means
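The steps above can be sketched as a small computation for the randomized complete block design. The function name `rcbd_anova`, the dictionary keys, and the data are hypothetical; the sketch assumes one observation per treatment in each block.

```python
def rcbd_anova(y):
    """ANOVA quantities for an RCBD; y[i][j] is treatment i in block j."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)
    trt_means = [sum(row) / b for row in y]
    blk_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

    ss_t = sum((y[i][j] - grand) ** 2 for i in range(a) for j in range(b))
    ss_tr = b * sum((m - grand) ** 2 for m in trt_means)
    ss_bl = a * sum((m - grand) ** 2 for m in blk_means)
    # Error sum of squares computed directly from the residuals.
    ss_e = sum((y[i][j] - trt_means[i] - blk_means[j] + grand) ** 2
               for i in range(a) for j in range(b))

    df_tr, df_bl, df_e = a - 1, b - 1, (a - 1) * (b - 1)
    ms_tr = ss_tr / df_tr  # mean square = sum of squares / degrees of freedom
    ms_e = ss_e / df_e
    return {"ss_t": ss_t, "ss_tr": ss_tr, "ss_bl": ss_bl, "ss_e": ss_e,
            "df_tr": df_tr, "df_bl": df_bl, "df_e": df_e,
            "ms_tr": ms_tr, "ms_e": ms_e, "f0": ms_tr / ms_e}


# Hypothetical data: a = 3 treatments (rows), b = 4 blocks (columns).
data = [[10, 12, 14, 11],
        [13, 14, 17, 13],
        [9, 12, 13, 9]]
result = rcbd_anova(data)
print(result["f0"], result["df_tr"], result["df_e"])
```

The statistic `f0` would be compared with the upper percentage point of the F distribution with a − 1 and (a − 1)(b − 1) degrees of freedom.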
To compute LSD for RCBD, replace n and 𝑑𝑓𝐸 = 𝑁 − 𝑎 in the LSD formula
for CRD by b and 𝑑𝑓𝐸 = (𝑎 − 1)(𝑏 − 1), respectively.
Random Blocks and/or Treatments
• Suppose that there are a treatments (fixed) and b blocks (random)
• A statistical model for the RCBD is
\[ y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, b \]
• This is a special case of a mixed model (because it contains both fixed and
random factors).
• For the vascular graft experiment (Example 4.1), the estimate of $\sigma_\beta^2$ is
• From the vascular graft experiment with one missing value, we find
y′2. = 455.4, y′.4 = 267.5, and y′.. = 2060.4.
• Therefore,
NOTE:
• We may use the above “missing observation equation” iteratively to estimate several missing
values.
• To illustrate the iterative approach, suppose that two values are missing. Arbitrarily estimate the
first missing value, and then use this value along with the real data and the “missing
observation equation” to estimate the second. Now the “missing observation equation” can be
used to reestimate the first missing value, and following this, the second can be reestimated.
This process is continued until convergence is obtained. In any missing value problem, the error
degrees of freedom are reduced by one for each missing observation.
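The single-value estimate and the iterative procedure described in the note can be sketched as follows. The slide's "missing observation equation" did not survive extraction, so the sketch assumes the standard RCBD estimate x = (a·y′_i. + b·y′_.j − y′_..) / ((a − 1)(b − 1)). The dimensions a = 4 treatments and b = 6 blocks for the vascular graft experiment are also an assumption, as are the function names and the small 3 × 3 table used for the iterative case.

```python
def rcbd_missing_value(a, b, trt_total, blk_total, grand_total):
    """Assumed standard estimate of one missing RCBD observation.

    The totals are computed with the missing value left out.
    """
    return (a * trt_total + b * blk_total - grand_total) / ((a - 1) * (b - 1))


# Totals quoted on the slide; a = 4 and b = 6 are assumed dimensions.
x = rcbd_missing_value(4, 6, 455.4, 267.5, 2060.4)
print(round(x, 2))


def estimate_missing_iteratively(y, tol=1e-10, max_iter=200):
    """Fill several missing cells (None) by cycling the single-value estimate."""
    a, b = len(y), len(y[0])
    missing = [(i, j) for i in range(a) for j in range(b) if y[i][j] is None]
    observed = [v for row in y for v in row if v is not None]
    start = sum(observed) / len(observed)  # arbitrary starting guess
    filled = [[start if v is None else float(v) for v in row] for row in y]

    for _ in range(max_iter):
        largest_change = 0.0
        for i, j in missing:
            current = filled[i][j]
            # Totals exclude the cell being re-estimated, but include the
            # current estimates of the other missing cells.
            trt_total = sum(filled[i]) - current
            blk_total = sum(filled[r][j] for r in range(a)) - current
            grand_total = sum(map(sum, filled)) - current
            updated = rcbd_missing_value(a, b, trt_total, blk_total, grand_total)
            largest_change = max(largest_change, abs(updated - current))
            filled[i][j] = updated
        if largest_change < tol:  # convergence: estimates stopped moving
            break
    return filled


# Hypothetical 3 x 3 table with two missing observations.
table = [[10, None, 14],
         [13, 14, 17],
         [9, 12, None]]
completed = estimate_missing_iteratively(table)
print(completed)
```

After the estimates are inserted, the analysis proceeds as usual, with the error degrees of freedom reduced by one for each estimated value.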
The Latin Square Design
• These designs are used to simultaneously control (or eliminate) two
sources of nuisance variability
• A significant assumption is that the three factors (treatments, nuisance
factors) do not interact
• If this assumption is violated, the Latin square design will not produce
valid results
• Latin squares are not used as much as the RCBD in industrial
experimentation
• However, they can be useful in situations where the rows and columns
represent factors the experimenter actually wishes to study and where
there are no randomization restrictions.
• Thus, three factors (rows, columns, and letters), each at p levels, can be
investigated in only $p^2$ runs.
A Latin square in which the first row and first column consist of the letters
written in alphabetical order is called a standard Latin square.
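A standard Latin square is easy to generate and check. The sketch below uses a cyclic shift of the alphabet, which is one convenient construction and not necessarily the one used in the course; the function names are hypothetical and p is assumed to be at most 26.

```python
def standard_latin_square(p):
    """Cyclic p x p Latin square whose first row and column are A, B, C, ..."""
    letters = [chr(ord("A") + k) for k in range(p)]
    return [[letters[(i + j) % p] for j in range(p)] for i in range(p)]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    p = len(square)
    symbols = set(square[0])
    if len(symbols) != p:
        return False
    rows_ok = all(len(row) == p and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(p)} == symbols for j in range(p))
    return rows_ok and cols_ok


square = standard_latin_square(4)
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```

In practice the rows, columns, and letters of a chosen square would be randomized before the treatments are assigned.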
Assignment:
Find the residuals for the rocket propellant problem and construct
appropriate plots.
• where y′𝑖.. , y′.𝑗. and y′..𝑘 indicate totals for the row, column,
and treatment with the missing value, respectively, and y′...
is the grand total with the missing value.
2. Use the same batches but different operators in each replicate (or,
equivalently, use the same operators but different batches).
• In the first period, half of the subjects (chosen at random) are given fluid A and
the other half fluid B.
• At the end of the period, the response is measured and a period of time is
allowed to pass in which any physiological effect of the fluids is eliminated.
• Then the experimenter has the subjects who took fluid A take fluid B and those
who took fluid B take fluid A.
• This design is called a crossover design.
The Graeco-Latin Square Design
• Consider a p × p Latin square, and superimpose on it a second p × p
Latin square in which the treatments are denoted by Greek letters.
• If the two squares when superimposed have the property that each
Greek letter appears once and only once with each Latin letter, the two
Latin squares are said to be orthogonal, and the design obtained is
called a Graeco-Latin square.
• An example of a 4 × 4 Graeco-Latin square is shown below:
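• The statistical model for the Graeco-Latin square design is
\[ y_{ijkl} = \mu + \theta_i + \tau_j + \omega_k + \Psi_l + \epsilon_{ijkl} \]
where $y_{ijkl}$ is the observation in row i and column l for Latin letter j and Greek
letter k, $\theta_i$ is the effect of the ith row, $\tau_j$ is the effect of Latin letter treatment j, $\omega_k$
is the effect of Greek letter treatment k, $\Psi_l$ is the effect of column l, and $\epsilon_{ijkl}$ is an
$NID(0, \sigma^2)$ random error component.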
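The orthogonality condition that defines a Graeco-Latin square can be checked mechanically. The sketch below builds two squares from the rules (i + j) mod p and (2i + j) mod p, an assumed construction that yields orthogonal Latin squares only when p is odd, so it does not produce the 4 × 4 case. Symbols are coded 0 to p − 1 instead of Latin and Greek letters, and all names are hypothetical.

```python
def is_latin(square):
    """True if each of the p symbols appears once per row and once per column."""
    p = len(square)
    symbols = set(range(p))
    return (all(set(row) == symbols for row in square)
            and all({square[i][j] for i in range(p)} == symbols
                    for j in range(p)))


def graeco_latin_square(p):
    """Return (latin, greek) squares for odd p; an assumed construction."""
    if p % 2 == 0:
        raise ValueError("this construction needs odd p")
    latin = [[(i + j) % p for j in range(p)] for i in range(p)]
    greek = [[(2 * i + j) % p for j in range(p)] for i in range(p)]
    return latin, greek


def are_orthogonal(latin, greek):
    """Each (Latin, Greek) pair must appear once and only once."""
    p = len(latin)
    pairs = {(latin[i][j], greek[i][j]) for i in range(p) for j in range(p)}
    return is_latin(latin) and is_latin(greek) and len(pairs) == p * p


latin, greek = graeco_latin_square(5)
print(are_orthogonal(latin, greek))
# A square superimposed on itself repeats pairs, so it is not orthogonal.
print(are_orthogonal(latin, latin))
```

Superimposing the two squares cell by cell gives the design: each cell carries one Latin-letter treatment and one Greek-letter treatment.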
ANOVA for Graeco-Latin Square Design
• Suppose that there are a treatments and that each block can hold
exactly k (k < a) treatments.
• A balanced incomplete block design may be constructed by taking $\binom{a}{k}$
blocks and assigning a different combination of treatments to each block.
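One reading of the construction above is to use every possible set of k treatments as a block, which gives C(a, k) blocks. The sketch below builds such a design and checks the balance property. The symbols r (replicates per treatment) and λ (number of blocks in which a given pair of treatments occurs together) are standard BIBD notation that does not appear in this excerpt, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb


def bibd_all_combinations(a, k):
    """Blocks formed from every k-subset of treatments 1..a (assumes k < a)."""
    return [set(c) for c in combinations(range(1, a + 1), k)]


a, k = 4, 3
blocks = bibd_all_combinations(a, k)
b = len(blocks)  # number of blocks, equal to C(a, k)

# r: how many blocks contain each treatment.
replicates = {t: sum(t in blk for blk in blocks) for t in range(1, a + 1)}
# lambda: how many blocks contain each pair of treatments together.
pair_counts = {pair: sum(set(pair) <= blk for blk in blocks)
               for pair in combinations(range(1, a + 1), 2)}

print(b, comb(a, k))
print(replicates)
print(pair_counts)
```

Balance means that every pair count is the same. Taking all combinations always achieves this, but it can require many blocks; smaller balanced designs exist for some values of a and k.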
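• The statistical model for the BIBD is
\[ y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij} \]
where $y_{ij}$ is the ith observation in the jth block, $\mu$ is the overall mean, $\tau_i$ is the
effect of the ith treatment, $\beta_j$ is the effect of the jth block, and $\epsilon_{ij}$ is the
$NID(0, \sigma^2)$ random error component.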
• The total variability in the data is expressed by the total corrected sum of
squares:
where 𝑄𝑖 is the adjusted total for the ith treatment, which is computed as
experiment.