
by Mark J. Anderson and Shari L. Kraber

Quality managers who understand how to apply statistical tools for design of experiments (DOE) are better able to support use of DOE in their organizations. Ultimately, this can lead to breakthrough improvements in product quality and process efficiency.

DOE provides a cost-effective means for solving problems and developing new processes. The simplest, but most powerful, DOE tool is two-level factorial design, where each input variable is varied at high (+) and low (-) levels and the output observed for resultant changes. Statistics can then help determine which inputs have the greatest effect on outputs. For example, Figure 1 shows the results for a full two-level design on three factors affecting bearing life. Note the large increase at the rear upper right corner of the cube.

In this example, two factors, heat and cage, interact to produce an unexpected breakthrough in product quality. One-factor-at-a-time (OFAT) experimentation will never reveal such interactions. Two-level factorials, such as the one used in Figure 1, are much more efficient than OFAT because they make use of multivariate design. It's simply a matter of parallel processing (factorial design) vs. serial processing (OFAT). Furthermore, two-level factorials don't require you to run the full number of two-level combinations (2^k, where k is the number of factors), particularly when you get to five or more factors. By making use of fractional designs, the two-level approach can be extended to many factors without the cost of hundreds of runs. Therefore, these DOEs are ideal for screening many factors to identify the vital few that significantly affect your response.
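To make the factorial layout concrete, here is a minimal sketch (in Python; not part of the original article) that enumerates the eight runs of a two-level full factorial on three generically named factors, coded -1 for low and +1 for high:

```python
from itertools import product

# Enumerate all 2^3 = 8 runs of a two-level full factorial
# on three hypothetical factors (coded -1 = low, +1 = high).
factors = ["A", "B", "C"]
design = list(product([-1, +1], repeat=len(factors)))

for run, levels in enumerate(design, start=1):
    print(run, dict(zip(factors, levels)))
```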

Such improvements will obviously lead to increased market share and profit. So why don't more manufacturers use DOE? In some cases, it's simple ignorance, but even when companies provide proper training, experimenters resist DOE because it requires planning, discipline and the use of statistics. Fear of statistics is widespread, even among highly educated scientists and managers. Quality professionals can play a big role in helping their colleagues overcome their reluctance.

Using DOE successfully depends on understanding eight fundamental concepts. To illustrate these keys to success, we'll look at a typical example: reducing shrinkage of plastic parts from an injection molding process. The molding case will demonstrate the use of fractional two-level design.

1. Set good objectives

Before you can design an experiment, you must define its objective. The
focus of the study may be to screen out the factors that aren't critical to the
process, or it may be to optimize a few critical factors. A well-defined
objective leads the experimenter to the correct DOE.

In the initial stage of process development or troubleshooting, the appropriate design choice is a fractional two-level factorial. This DOE screens a large number of factors in a minimal number of runs. However, if the process is already close to optimum conditions, then a response surface design may be most appropriate. It will explore a few factors over many levels.

If you don't identify the objectives of a study, you may pay the
consequences--trying to study too many or too few factors, not measuring
the correct responses, or arriving at conclusions that are already known.

Vague objectives lead to lost time and money, as well as frustration, for all
involved. Identifying the objective upfront builds a common understanding
of the project and expectations for the outcome.

In our case study of the injection molder, management wants to reduce variation in parts shrinkage. If the shrinkage can be stabilized, then mold dimensions can be adjusted so the parts can be made consistently.

The factors and levels to be studied are shown in Table 1. The experimenters have chosen a two-level factorial design with 32 runs. A full set of combinations would require 128 runs (2^7), so this represents a one-quarter fraction.

2. Measure responses quantitatively

Many DOEs fail because their responses can't be measured quantitatively. A classic example is found with visual inspections for quality.
Traditionally, process operators or inspectors use a qualitative system to
determine whether a product passes or fails. At best, they may have
boundary samples of minimally acceptable product. Although this system
may be OK for production, it isn't precise enough for a good DOE. Pass/fail
data can be used in DOE, but to do so is very inefficient. For example, if
your process typically produces a 0.1 percent defect rate, you would expect
to find five out of 5,000 parts defective. In order to execute a simple
designed experiment that investigated three factors in eight experimental
runs on such a process, you would need to utilize a minimum of 40,000
parts (8 x 5,000). This would assure getting enough defects to judge
improvement, but at an exorbitant cost.

For the purposes of experimentation, a rating scale works well. Even crude
scaling, from 1 to 5, is far better than using the simple pass/fail method.
Define the scale by providing benchmarks in the form of defective units or
pictures. Train three to five people to use the scale. During the experiment,
each trained inspector should rate each unit. Some inspectors may tend to
rate somewhat high or low, but this bias can be removed in the analysis via
blocking (see Key 5, "Block out known sources of variation"). For a good
DOE, the testing method must consistently produce reliable results.

In the case study on injection molding, the experimenters will measure percent shrinkage at a critical dimension on the part. This is a quantitative measurement requiring a great deal of precision. The short-term variation in parts can be dampened by measuring several parts and entering the average. Other responses could be included, such as counts of blemishes and ratings of other imperfections in surface quality.

3. Replicate to dampen uncontrollable variation (noise)

The more times you replicate a given set of conditions, the more precisely
you can estimate the response. Replication improves the chance of
detecting a statistically significant effect (the signal) in the midst of natural
process variation (the noise). In some processes, the noise drowns out the
signal. Before you do a DOE, it helps to assess the signal-to-noise ratio.
Then you can determine how many runs will be required for the DOE. You
first must decide how much of a signal you want to be able to detect. Then
you must estimate the noise. This can be determined from control charts,
process capability studies, analysis of variance (ANOVA) from prior DOEs
or a best guess based on experience.

The statisticians who developed two-level factorial designs incorporated "hidden" replication within the test matrices. The level of replication is a direct function of the size of the DOE. You can use the data in Table 2 to determine how many two-level factorial runs you need in order to provide a 90-percent probability of detecting the desired signal. If you can't afford to do the necessary runs, then you must see what can be done to decrease noise. For example, Table 2 shows a minimum of 64 runs for a signal-to-noise ratio of 1. However, if you could cut the noise in half, the signal-to-noise ratio would double (to 2), thus reducing your runs from 64 to 16. If you can't reduce noise, then you must accept an increase in the detectable signal (the minimum effect that will be revealed by the DOE).

You can improve the power of the DOE by adding actual replicates where
conditions are duplicated. You can't just get by with repeat samples or
measurements. The entire process must be repeated from start to finish. If
you do submit several samples from a given experimental run, enter the
response as an average.

For our case study on injection molding, control charts reveal a standard
deviation of 0.60. Management wants to detect an effect of magnitude 0.85.
Therefore, the signal-to-noise ratio is approximately 1.4. The appropriate
number of runs for this two-level factorial experiment is 32. We decide not
to add further replicates due to time constraints, but several parts will be
made from each run. The response for each run becomes the average
shrinkage per part, thus dampening out variability in parts and the
measurement itself.
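To illustrate the arithmetic behind this choice, here is a hedged sketch of a power calculation using a normal approximation. The function name and the 0.05 significance level are our assumptions; the t-based calculations behind Table 2 are more conservative, so treat this only as a rough check on the case-study numbers:

```python
from math import sqrt
from scipy.stats import norm

def approx_power(n_runs, signal_to_noise, alpha=0.05):
    """Rough normal-approximation power for one effect in an
    unreplicated two-level factorial. An effect is the difference
    of two means of n_runs/2 points each, so its standard error
    is 2*sigma/sqrt(n_runs)."""
    se_units = signal_to_noise * sqrt(n_runs) / 2  # effect / std. error
    return norm.cdf(se_units - norm.ppf(1 - alpha / 2))

# Case-study values: sigma = 0.60, desired effect = 0.85 -> S/N ~ 1.4.
# On this optimistic approximation, 32 runs comfortably exceed 90% power.
print(round(approx_power(32, 0.85 / 0.60), 3))
```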

4. Randomize the run order

The order in which you run the experiments should be randomized to avoid
influence by uncontrolled variables such as tool wear, ambient temperature
and changes in raw material. These changes, which often are time-related,
can significantly influence the response. If you don't randomize the run
order, the DOE may indicate factor effects that are really due to
uncontrolled variables that just happened to change at the same time. For
example, let's assume that you run an experiment to keep your copier from
jamming so often during summer months. During the day-long DOE, you
first run all the low levels of a setting (factor "A"), and then you run the
high levels. Meanwhile, the humidity increases by 50 percent, creating a
significant change in the response. (The physical properties of paper are
very dependent on humidity.) In the analysis stage, factor A then appears to
be significant, but it's actually the change in humidity that caused the effect.
Randomization would have prevented this confusion.

The injection molders randomized the run order within each of several
machines used for their DOE. The between-machine differences were
blocked.
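A hypothetical sketch of that scheme follows. The machine names and the way the design is sliced into blocks are invented for illustration (a real blocked design assigns runs so that block differences alias only high-order interactions), but the randomize-within-block step looks like this:

```python
import random
from itertools import product

# Split a coded 32-run design into four blocks (one per machine)
# and randomize the run order within each block, mirroring the
# molding study. Slicing is a stand-in for a proper block generator.
design = list(product([-1, +1], repeat=5))  # 32 coded runs
random.seed(1)  # seeded only to make the example reproducible

for i, machine in enumerate(["M1", "M2", "M3", "M4"]):
    block = design[i * 8:(i + 1) * 8]
    random.shuffle(block)  # randomize within the block
    print(machine, block[:2], "...")
```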

5. Block out known sources of variation

Blocking screens out noise caused by known sources of variation, such as raw material batch, shift changes or machine differences. By dividing your experimental runs into homogeneous blocks, and then arithmetically removing the difference, you increase the sensitivity of your DOE.

Don't block anything that you want to study. For example, if you want to measure the difference between two raw material suppliers, include them as factors to study in your DOE.

In the injection molding case study, management would like the experimenters to include all the machines in the DOE. There are four lines in the factory, which may differ slightly. The experimenters divide the DOE into four blocks of eight runs per production line. By running all lines simultaneously, the experiment will get done four times faster. However, in this case, where the DOE already is fractionated, there is a cost associated with breaking it up into blocks: The interaction of mold temperature and holding pressure can't be estimated due to aliasing. Aliasing is an unfortunate side effect of fractional or blocked factorials.

6. Know which effects (if any) will be aliased

An alias indicates that you've changed two or more things at the same time
in the same way. Even unsophisticated experimenters know better, but
aliasing is nevertheless a critical and often overlooked feature of Plackett-
Burman, Taguchi designs or standard fractional factorials.

For example, if you try to study three factors in only four runs--a half-
fraction--the main effects become aliased with the two-factor interactions.
If you're lucky, only the main effects will be active, but more likely there
will be at least one interaction. The bearings case (Figure 1) can be
manipulated to show how dangerous it can be to run such a low-resolution
fraction. Table 3 shows the full-factorial test matrix.
In this case, the interaction AB is very significant, so it's included in the matrix. (Note that this column is the product of columns A and B.) The half-fraction is represented by the shaded rows--the responses for the other runs have been struck out. Observe that in the highlighted area the pattern of the minuses (lows) and pluses (highs) for AB is identical to that of factor C. Put another way, the calculated effect labeled C is really C + AB--the two effects are aliased. By going to the half-fraction, we don't know whether the effect is the result of C, AB or both.
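The aliasing pattern is easy to verify numerically. This short sketch (ours, not the article's) builds the full 2^3 matrix, keeps the half-fraction defined by C = AB, and shows that the AB and C columns are identical:

```python
import numpy as np

# In the half-fraction of a 2^3 design defined by C = A*B
# (I = ABC), the AB column matches C exactly, so the two
# effects cannot be separated.
full = np.array([(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)])
A, B, C = full.T
half = full[A * B == C]        # keep only the half-fraction rows

ab = half[:, 0] * half[:, 1]   # the AB interaction column
print(half[:, 2])              # column C
print(ab)                      # identical pattern: C is aliased with AB
```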

Aliasing can be avoided by doing only full two-level factorials or high-resolution fractionals, but this isn't always practical. Plackett-Burman and Taguchi designs are often very low in resolution and therefore can give very misleading results on specific effects. If you must deal with these nonstandard designs, always do a design evaluation to see what's aliased. Good DOE software will give you the necessary details, even if runs are deleted or levels changed. Then, if any effects are significant, you will know whether to rely on the results or do further verification.

The injection molding study is a fractional factorial design with mediocre resolution: Several two-factor interactions are aliased. A design evaluation provides the specifics: the calculated effect of CE is really CE + FG, CF is really CF + EG, and CG is really CG + EF. If you evaluate the effects matrix for CE vs. FG, you will see a perfect correlation. The plus symbol in the alias relationship tells you that the calculated effect could be due to CE plus FG. If these or any of the other aliased interactions are significant, further work will be needed.

7. Do a sequential series of experiments

Designed experiments should be executed in an iterative manner so that information learned in one experiment can be applied to the next. For example, rather than running a very large experiment with many factors and using up the majority of your resources, consider starting with a smaller experiment and then building upon the results. A typical series of experiments consists of a screening design (fractional factorial) to identify the significant factors, a full factorial or response surface design to fully characterize or model the effects, followed up with confirmation runs to verify your results. If you make a mistake in the selection of your factor ranges or responses in a very large experiment, it can be very costly. Plan for a series of sequential experiments so you can remain flexible. A good guideline is not to invest more than 25 percent of your budget in the first DOE.

8. Always confirm critical findings

After all the effort that goes into planning, running and analyzing a
designed experiment, it's very exciting to get the results of your work.
There is a tendency to eagerly grab the results, rush out to production and
say, "We have the answer!" Before doing that, you need to take the time to
do a confirmation run and verify the outcome. Good software packages will
provide you with a prediction interval to compare the results within some
degree of confidence. Remember, in statistics you never deal with
absolutes--there is always uncertainty in your recommendations. Be sure to
double-check your results.

In the injection molding case, the results of the experiment revealed a significant interaction between booster pressure and moisture (CD). The interaction is not aliased with any other two-factor interactions, so it's a clear result. Shrinkage will be stabilized by keeping moisture low (D). This is known as a robust operating condition.

Contour graphs and 3-D projections help you visualize your response
surface. To achieve robust operating conditions, look for flat areas in the
response surface.

The 3-D surfaces are very impressive, but they're only as good as the data
generated to create the predictive model. The results still must be
confirmed. If you want to generate more sophisticated surfaces, you should
follow up with response surface methods for optimization. These designs
require at least three levels of each factor, so you should restrict your study
to the vital few factors that survive the screening phase.

Promoting DOE

Design of experiments is a very powerful tool that can be utilized in all manufacturing industries. Quality managers who encourage DOE use will greatly increase their chances for making breakthrough improvements in product quality and process efficiency.

About the authors

Mark J. Anderson and Shari L. Kraber are consultants at Stat-Ease Inc. in Minneapolis. E-mail Anderson at manderson@qualitydigest.com.

OPTIMIZE ATTRIBUTE RESPONSES USING DESIGN OF EXPERIMENTS
By Manikandan Jayakumar
The objective of a design of experiments (DOE) is to optimize the value of a response variable by changing the values,
referred to as levels, of the factors that affect the response. Although often a response is continuous in nature, there are
some cases in which the response falls in the attribute category (e.g., pass/fail, accept/reject, etc.). This article explains
how to analyze an attribute type of response, using an example from the manufacturing sector.
Example Experiment
In this example there are three factors with two levels. The factors are renamed as Factor1, Factor2 and Factor3, and the
levels are coded as -1, +1.

The experiment was conducted with 10 replicates, or repetitions of the experiment at each combination of level settings,
to provide greater accuracy for attribute data.

In this example, the total number of runs was L^F x replicates = 2^3 x 10 = 80, where L = number of levels and F = number of factors.

Table 1: Results of 10 Replicates with 1 Combination of Levels

Factor1   Factor2   Factor3   Response
  -1        +1        -1      OK
  -1        +1        -1      NOK
  -1        +1        -1      OK
  -1        +1        -1      OK
  -1        +1        -1      OK
  -1        +1        -1      OK
  -1        +1        -1      OK
  -1        +1        -1      OK
  -1        +1        -1      OK
  -1        +1        -1      OK
A single set of experiments with 10 replicates is shown in Table 1.

Next, the responses for the replicates of this experiment were converted into proportions, p. In the results shown in Table
1, the proportion of OK (okay) to NOK (not okay) was 9 out of 10. Thus, p = 0.90.

Once the experiment was conducted for all combinations of levels, the captured OK/NOK attribute responses were
converted into proportions, displayed in Table 2.
Table 2: Summary of Experiment Data for All Combinations of Levels

Factor1   Factor2   Factor3   p
  -1        -1         1      0.70
   1        -1        -1      0.00
  -1         1        -1      0.90
   1         1         1      0.60
   1         1        -1      0.40
  -1         1         1      1.00
   1        -1         1      0.00
  -1        -1        -1      0.10


Arcsine Function
The arcsine square-root transform is the variance-stabilizing transform for the binomial distribution and is the most commonly used transformation. It assists when working with proportions and percentages. The proportion, p, can be made nearly normal if the square root of p is used with the arcsine (or inverse sine: sin^-1) transform. The transform is computed as p -> arcsin(sqrt(p)).
In the example, the calculated p was converted into sqrt(p) and then into arcsin(sqrt(p)) by using the arcsine function. The results are shown in Table 3.
Table 3: Updated Results with Arcsine Function

Factor1   Factor2   Factor3   p      Sqrt(p)   Arcsin(sqrt(p))
  -1        -1         1      0.70   0.83666   0.99116
   1        -1        -1      0.00   0.00000   0.00000
  -1         1        -1      0.90   0.94868   1.24905
   1         1         1      0.60   0.77460   0.88608
   1         1        -1      0.40   0.63246   0.68472
  -1         1         1      1.00   1.00000   1.57080
   1        -1         1      0.00   0.00000   0.00000
  -1        -1        -1      0.10   0.31623   0.32175
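The transform is a one-liner in most environments. This sketch (using NumPy, which the article does not name) reproduces the Table 3 values from the Table 2 proportions:

```python
import numpy as np

# Reproduce the transform in Table 3: p -> arcsin(sqrt(p)), in radians.
p = np.array([0.70, 0.00, 0.90, 0.60, 0.40, 1.00, 0.00, 0.10])
arcsine = np.arcsin(np.sqrt(p))

for pi, ai in zip(p, arcsine):
    print(f"p={pi:.2f}  sqrt(p)={np.sqrt(pi):.5f}  arcsin={ai:.5f}")
```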



Analysis of Results
The general linear model (GLM) in analysis of variance (ANOVA) was used to identify the significant factor(s) among the
three factors. The arcsine value was specified as the response, and Factor1, Factor2, Factor3 were provided as the
factors. The output of GLM is shown in Figure 1.

Figure 1: General Linear Model: Arcsin (Sqrt(p)) Versus Factor1, Factor2, Factor3
The p-values calculated within the GLM results indicate that Factor1 and Factor2 are statistically significant. That is, those factors have a significant bearing on the arcsine value.
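The article does not name its statistics package. As an illustrative stand-in, this sketch fits the same main-effects model with statsmodels; the column names f1, f2, f3 and y are our own, and the data are the Table 2 combinations with their Table 3 arcsine responses:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Coded factor levels and arcsin(sqrt(p)) responses from Table 3.
df = pd.DataFrame({
    "f1": [-1, 1, -1, 1, 1, -1, 1, -1],
    "f2": [-1, -1, 1, 1, 1, 1, -1, -1],
    "f3": [1, -1, -1, 1, -1, 1, 1, -1],
    "y":  [0.99116, 0.0, 1.24905, 0.88608, 0.68472, 1.57080, 0.0, 0.32175],
})
model = ols("y ~ f1 + f2 + f3", data=df).fit()  # main-effects model
print(sm.stats.anova_lm(model, typ=2))          # p-values per factor
```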
Interaction Plot
An interaction plot displays the mean of the response variable for each level of a factor with the level of a second factor
held constant. Interaction plots are useful for judging the presence of interactions – when the response at one factor level
depends upon the level(s) of other factors.

Parallel lines in an interaction plot indicate no interaction; non-parallel lines indicate an interaction between the factors.
The greater the departure of the lines from the parallel state, the higher the degree of interaction. To produce an
interaction plot, data must be available from all combinations of levels.

An interaction plot for the sample data is shown in Figure 2.

Figure 2: Interaction Plot for Arcsin (sqrt(p))


In all cases in Figure 2, the interaction lines are parallel, which shows there are no significant interactions between the
factors.
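A rough way to draw such a plot, reusing the data frame `df` from the GLM sketch above (matplotlib is assumed; only the Factor1 x Factor2 panel is shown):

```python
import matplotlib.pyplot as plt

# Mean response at each f1 level, one line per f2 level;
# roughly parallel lines indicate little f1 x f2 interaction.
means = df.groupby(["f1", "f2"])["y"].mean().unstack()

for f2_level in means.columns:
    plt.plot(means.index, means[f2_level], marker="o", label=f"f2={f2_level}")
plt.xlabel("f1 level")
plt.ylabel("mean arcsin(sqrt(p))")
plt.legend()
plt.title("Interaction plot sketch: f1 x f2")
plt.show()
```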

Main Effects Plot


A main effects plot is used in conjunction with ANOVA and DOE to examine differences in the response variable at varying
levels for one or more factors. When different levels of a factor affect the response differently, a main effect is present. A
main effects plot graphs the response mean for each factor level connected by a line.

The general patterns to look for are:

- When the line is horizontal (parallel to the x-axis), there is no main effect present. Each level of the factor affects the response in the same way, and the response mean is the same across all factor levels.
- When the line is not horizontal, there is a main effect present. Different levels of the factor affect the response differently. The steeper the slope of the line, the greater the magnitude of the main effect.

The sample data is shown in a main effects plot in Figure 3 and suggests that a main effect is present in Factor1 and Factor2, as the lines are not horizontal.

In the example, a larger value of the response arcsin (sqrt(p)) represents a better outcome. With that in mind, the analysis
suggests that the levels -1 for Factor1, +1 for Factor2 and +1 for Factor3 will provide the desired results (i.e., high
proportion of acceptance, or low proportion of rejections).

Figure 3: Main Effects Plot for Arcsin (sqrt(p))


Response Optimization
Response optimization is a method of analysis to identify the combination of input variable settings that jointly optimize a
single response or a set of responses. Within a statistical analysis program, the response optimizer function provides an
optimal solution for the input variable combinations, and sometimes an optimization plot. The optimization plot may be
interactive, in which case the input variable settings can be adjusted on the plot to search for more desirable solutions.
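With only three two-level factors, a brute-force stand-in for a response optimizer is to predict over all eight coded combinations and take the best. This sketch reuses `model` and the imports from the GLM sketch above:

```python
from itertools import product

# Grid-search the coded levels and pick the combination whose
# predicted arcsine response is largest (main-effects model only).
grid = pd.DataFrame(list(product([-1, 1], repeat=3)),
                    columns=["f1", "f2", "f3"])
grid["pred"] = model.predict(grid)

best = grid.loc[grid["pred"].idxmax()]
print(best)  # expected: f1 = -1, f2 = +1, f3 = +1
```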


Figure 4 shows the results of the response optimizer analysis for the example data. It indicates that the maximum arcsine
is 1.5708 with a desirability of 1.

Figure 4: Response Optimization


Solution and Verification
Based on the results of the analysis, the process parameters are set at the following levels to reduce the number of rejections:

- Factor1: -1
- Factor2: +1
- Factor3: +1

To verify these settings, 30 experiments were conducted, resulting in 0 rejections. With that verification, the process settings were approved and updated in all appropriate process documents.



Comments

1. John Noguera
Thank you for the interesting article. However, this approach does not take into account the sample size of your
replicates. There is a statistical difference between 7/10 and 700/1000!

The way to do that (and thus also avoid the need to transform) is to use binary logistic regression.


2. Mike86
Please check the factor signs in Table 1. I’m showing -1 for Factor 1, +1 for Factor 2, and -1 for Factor 3 in all
experiments.


3. Michael Ohler
John, you are spot on.

We recommend using the FTP transformation for binary outputs of a DOE before analyzing them. The FTP transform computes (arcsin(sqrt(x/(n+1))) + arcsin(sqrt((x+1)/(n+1))))/2, where x is the number of events and n is the sample size. As you can see, it does take sample size into account, but its impact rapidly goes down as n goes up.

Recently I was preparing a data set for training and decided to compare the results of analysis via DOE and binary logistic regression (BLR). The results were nothing short of shocking.

The data set was 5 factors, half factorial, with a sample size per treatment of 600. The output was the number of incomplete credit card applications. DOE on the FTP-transformed data was able to find only 2 terms as significant. BLR found 9 -- with 5 interactions! That was obviously due to the fact that BLR takes sample size into its significance calculations.

Then I took the significant terms from BLR into the DOE and ran the response optimizer. And I got a second big surprise. Even the direction of a factor's impact, calculated from main effects and interactions, was different. DOE would say, "Increase A to decrease the number of defectives." BLR would say, "Decrease A!" This is obviously due to the differences between the purely linear model of term impacts in DOE and the log transform of BLR.

Again, it should be no surprise that the results differ, if one looks at the actual math behind these two tools. However, I did not expect the difference to be so practically significant!

My takeaway: use BLR if you know how to run it and read the results. However, for some LSS students, playing with interactions in BLR and manually building the final transfer function of probability with a backward LN transform might be a tough call.

So we are investigating with Minitab the possible limits of using FTP-transformed data within the DOE functionality (I guess it will be something like a minimum sample size rule). If you wish, I can keep you posted on the development.

Regards,
Maxim Korenyugin and Michael Ohler


4. Jeff Larsen
Enjoyed reading, really studying, Manikandan Jayakumar's write-up. The only question I would have is: Could you not reach the same conclusion by using an attribute agreement analysis? The purpose of a DOE, after all, is to provide the most efficient and economical way to reach a valid conclusion. That said, the article's flow provides a simple interpretation of the results found, which speaks well to the design.

Thanks, well done and worth the read!


Scientific Method: Rethink Experiment Design
The traditional approach to experimentation, often referred to as the
“scientific method,” requires changing only one factor at a time
(OFAT), but this method only allows one to see things one dimension
at a time. By varying factors only at two levels each, but
simultaneously rather than one at a time, experimenters can uncover
important interactions.
By Mark J. Anderson and Patrick J. Whitcomb, Stat-Ease
Nov 01, 2006
The traditional approach to experimentation, often referred to as the “scientific method,” requires changing only one factor at a time (OFAT).
This should not be confused with the full factorial method (Table 1).

Table 1. An example of a full-factorial test matrix with 2 factors.


1. Average of minimum of two replicates at certain conditions.
2. Factors are variables chosen for degree of influence on response.
The OFAT approach not only suffers from being extremely inefficient, but more importantly, it cannot detect interactions of factors – much
less map a complex response surface. For example, consider the effects of two factors “A” and “B”, such as temperature and pressure, on
a response — for example, chemical yield. Let’s say these factors affect the response as shown by the surface in Figure 1.
Figure 1. The real contour of the response to factors A and B: interactions are important.
Unfortunately, an experimenter who adheres to the old-fashioned scientific method of OFAT can see things only one dimension at a time.
The points on Figure 1 outline the path taken by OFAT done first on factor A from left to right. The results can be seen in Figure 2. They
look good.
Figure 2. This is how OFAT sees the relationship between the response variable and factor A.
Next the OFAT experimenter sets A at the value where response is maximized and then varies factor B. Figure 3 shows the result. The experimenter cries "Eureka — I have found it!" However, it is clear from our perspective in Figure 1 that the outcome falls far short of the potential increase in response.
Figure 3. This is how OFAT sees the relationship between the response variable and factor B.

Minimum-run designs
By simply varying factors only at two levels each, but simultaneously rather than one at a time, experimenters can uncover important
interactions such as the one depicted in Figure 4.
Figure 4. Response surface shows a pure interaction of two factors.
Furthermore, this parallel testing scheme is much more efficient (cost-effective) than the serial approach of OFAT. For up to four factors, the number of runs required to do all combinations, 16 (= 2^4), may not be prohibitive. These full factorials provide resolution of all effects. However, as a practical matter all you really need to know are the main effects (MEs) and two-factor interactions (2FIs) — higher-order effects are so unlikely that they can be assumed to be negligible. Making this assumption (ignoring effects of 3FI or more) opens the door to fractional two-level combinations that suffice for estimating MEs and 2FIs.

Statisticians spelled these out more than half a century ago. Our previous article on DOE in Chemical Processing [1] details the classical high-resolution fractional design options for up to 11 factors. They can be constructed with the aid of a textbook on DOE [2].

If you dig into traditional options for experiment designs, you will find designs available with as few as k+1 runs, where k equals the number of factors you want to test. One popular class of designs, called Plackett-Burman after the two statisticians who invented them during World War II, offers experiments that come in multiples of four, for example, 11 factors in 12 runs or 19 factors in 20 runs. However, these "saturated" designs provide very poor resolution — main effects will be confused with 2FIs. We advise that you avoid running such low-resolution designs. They seemingly offer something for nothing, but the end result may be similar to banging a pipe wrench against a noisy pump — the problem goes away only temporarily.
Pulling the pump apart and determining the real cause is a much more effective approach. Similarly, you will be better off running a bigger
design that at least resolves main effects clear of any 2FIs. Better yet would be an experimental layout that provides solid estimates of the
2FIs as well. Taking advantage of 21st century computer technology, statisticians developed a very attractive class of designs that require
minimum runs, or nearly so (some have an extra run to maintain equal replication of the lows versus the highs of each factor) [3]. The inventors dubbed these designs "MR5," where MR stands for minimum run and the number 5 refers to the relatively high resolution —
sufficient to clearly estimate not only main effects, but also the 2FIs. Table 2 shows the number of runs these MR5s require for 6 to 11
factors — far fewer than full factorials or even the standard fractions.

Table 2. New methods require far fewer tests than the old full-factorial approach.
Designs for up to 50 factors can be readily laid out via commercially available statistical software developed specifically for experimenters
[4]. A sample layout for six factors is provided in Table 3.
Table 3. The MR5 layout for six factors in case study: + means the factor is increased; - means it is decreased.
It codes the lows as – (minus) and highs as + (plus). If you perform this experiment, we recommend that the rows be randomized for your actual run order. If the number of runs far exceeds your budget for time and materials, choose the option for MR4 (minimum run resolution four) — good enough to screen main effects. These lesser-resolution designs require a number of runs equal to about two times the number of factors, for example: 20 runs for 10 factors (versus 56 for the MR5 and 1,024 for all combinations in the full factorial).
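The quoted run counts follow from simple counting: a resolution V design must estimate the intercept, k main effects and k(k-1)/2 two-factor interactions, while an MR4 screening design needs roughly 2k runs. Here is a small sketch (ours, not from the article) that reproduces the numbers cited for 10 factors:

```python
# Minimum number of runs implied by the model terms each design
# must estimate. Actual MR5 designs may add one run to keep the
# lows and highs of each factor equally replicated.
for k in range(6, 12):
    full = 2 ** k                    # all two-level combinations
    mr5 = 1 + k + k * (k - 1) // 2   # intercept + MEs + 2FIs; 56 for k=10
    mr4 = 2 * k                      # MR4 screening; 20 for k=10
    print(f"{k} factors: full={full}, MR5>={mr5}, MR4~{mr4}")
```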

The power of the approach


Researchers at a client of ours applied DOE to successfully complete pilot studies on a product made via fermentation. Using a predictive model derived from their published results [5], we reproduced the study using the new minimum-run design technology. The experiments screened six factors, thought to affect the yield, whose descriptions we have simplified. As an acid test, we picked the lower-resolution option — the MR4 with 12 runs. Table 4 shows the design layout and simulated results.

Table 4. MR4 design to screen a fermentation pilot process shows A’s power to affect yield. (Click to enlarge.)
We will greatly simplify the statistical analysis of the data to provide a graphic focus on the bottom-line results. Figure 5 shows an ordered bar chart of the absolute effects scaled by t-values, essentially the number of standard deviations beyond the mean response.

Figure 5. Pareto plot shows the strength of each factor and interactions on yield.

The line at a t-value of 2.31 represents the threshold for achieving 95% confidence that a given effect did not just occur by chance. In the field of quality, displays like these are known as "Pareto" plots — named after an Italian economist who observed that about 20% of the people owned 80% of a country's land. Similarly, a rule-of-thumb for DOE says that about 20% of the likely effects, main and 2FIs, will emerge as statistically significant for any given process. In this case, the main effect D towers above the remainder of the 11 effects that can be estimated (other than the overall average) from this 12-run design. However, the effect of factor D apparently depends on the level of A, as evidenced by the bar for interaction AD exceeding the threshold level for significance.

The main effect of A is the next largest bar, which would normally go into what statisticians call the “residual” pool for estimation of error.
However, in this case, the main effect of A goes into the predictive model to support the significant AD interaction. The practical reason why
factor A (temperature) should be modeled is illustrated by Figure 6, which shows that its effect depends on the level of factor D — the load
density.
Figure 6. Interaction of factors A and D is strong because the lines are far from parallel.
When D is at the high (+) level, increasing temperature reduces the predicted yield significantly, whereas at low D (–) the temperature has
little effect — perhaps somewhat positive — on yield. (The key to inferring significance in comparing one end to the other is whether the
least-significant-difference bars overlap or not. Notice that the bars at either end of the upper line in Figure 6 do not overlap; thus their difference is statistically significant.)

This interaction effect proved to be the key for success in the actual fermentation study.

However, for some MR4 cases like this, sophisticated software that offers forward stepwise regression for modeling, such as the package we used, will reveal a truly active interaction such as AD — but only if it is isolated and large relative to experimental error. If you suspect beforehand that 2FIs are present in your system, upgrade your design to the higher-resolution MR5 or an equivalent standard fractional two-level factorial.

Response surface methodology


After a series of screening studies and experiments that reveal two-factor interactions, you may approach a peak of performance that
exhibits significant curvature. Then, as we discussed in our previous article, you should augment your two-level factorial into a central
composite design (CCD). This is the standard design for response surface methods (RSM), which was invented in 1951. For details on
RSM as it is currently practiced, see the reference guidebook [6].

For experiments on six or more factors, the MR5s provide a more efficient core for CCDs than the classical fractions of two-level factorials available half a century ago. For the six-factor case, the standard CCD with a half-fraction for its core requires 52 runs in total. By substituting the six-factor MR5 laid out in Table 3 as the core, the runs are reduced to 40 — a substantial savings with little loss for purposes of RSM. Although someone well-versed in these methods with the MR5 templates in hand would have no trouble laying out CCDs based on these, it becomes far easier with software developed for this purpose [7].

New and improved


By letting go of OFAT in favor of multivariable testing via DOE/RSM, engineers and chemists can become more productive and effective in
their experimentation, thus gaining greater knowledge more quickly – an imperative for the 21st century. Minimum-run designs, developed
in the years since our first article on this subject, make it easier than ever before to optimize larger numbers of factors that affect your
chemical process.

References
1. Anderson, M.J., Whitcomb, P.J., “Design of Experiments Strategies,” Project Engineering Annual, pp 46-48, 1998.
2. Anderson, M.J., Whitcomb, P.J., DOE Simplified, Productivity, Inc, New York, 2000.
3. Oehlert, G.W., Whitcomb, P.J., "Small, Efficient, Equireplicated Resolution V Fractions of 2^k Designs and Their Application to Central Composite Designs," Proceedings of the 46th Fall Technical Conference, American Society for Quality, Chemical and Process Industries Division and Statistics Division; American Statistical Association, Section on Physical and Engineering Sciences, 2002.
4. Helseth, T.J., et al, Design-Ease, Version 7 for Windows, Stat-Ease, Inc, Minneapolis, 2006 ($495), website: www.statease.com.
5. Tholudur, A., et al, “Using Design of Experiments to Assess Escherichia coli Fermentation Robustness,” Bioprocess International, October
2005.
6. Anderson, M.J., Whitcomb, P.J., RSM Simplified, Productivity, Inc, New York, 2005.
7. Helseth, T.J., et al, Design-Expert, Version 7 for Windows, Stat-Ease, Inc, Minneapolis, 2006 ($995), website: www.statease.com.

Mark J. Anderson and Patrick J. Whitcomb are principals at Stat-Ease in Minneapolis, Minn.; E-mail them
at mark@statease.com and Pat@StatEase.com.
