By Mark J. Anderson and Shari L. Kraber
Before you can design an experiment, you must define its objective. The
focus of the study may be to screen out the factors that aren't critical to the
process, or it may be to optimize a few critical factors. A well-defined
objective leads the experimenter to the correct DOE.
If you don't identify the objectives of a study, you may pay the
consequences--trying to study too many or too few factors, not measuring
the correct responses, or arriving at conclusions that are already known.
Vague objectives lead to lost time and money, as well as frustration, for all
involved. Identifying the objective upfront builds a common understanding
of the project and expectations for the outcome.
The experimenters have chosen a two-level factorial design with 32 runs. A full
set of combinations would require 128 runs (2^7), so this represents a
one-quarter fraction.
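The arithmetic behind that fraction is worth making explicit; a minimal sketch (Python used purely for illustration):

```python
# A full two-level factorial in 7 factors runs every combination of levels.
factors = 7
full_runs = 2 ** factors        # 2^7 = 128 combinations
chosen_runs = 32                # the fractional design actually run

fraction = chosen_runs / full_runs   # 32/128 = 0.25, a one-quarter fraction
```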
For the purposes of experimentation, a rating scale works well. Even crude
scaling, from 1 to 5, is far better than using the simple pass/fail method.
Define the scale by providing benchmarks in the form of defective units or
pictures. Train three to five people to use the scale. During the experiment,
each trained inspector should rate each unit. Some inspectors may tend to
rate somewhat high or low, but this bias can be removed in the analysis via
blocking (see Key 5, "Block out known sources of variation"). For a good
DOE, the testing method must consistently produce reliable results.
The more times you replicate a given set of conditions, the more precisely
you can estimate the response. Replication improves the chance of
detecting a statistically significant effect (the signal) in the midst of natural
process variation (the noise). In some processes, the noise drowns out the
signal. Before you do a DOE, it helps to assess the signal-to-noise ratio.
Then you can determine how many runs will be required for the DOE. You
first must decide how much of a signal you want to be able to detect. Then
you must estimate the noise. This can be determined from control charts,
process capability studies, analysis of variance (ANOVA) from prior DOEs
or a best guess based on experience.
You can improve the power of the DOE by adding actual replicates where
conditions are duplicated. You can't just get by with repeat samples or
measurements. The entire process must be repeated from start to finish. If
you do submit several samples from a given experimental run, enter the
response as an average.
For our case study on injection molding, control charts reveal a standard
deviation of 0.60. Management wants to detect an effect of magnitude 0.85.
Therefore, the signal-to-noise ratio is approximately 1.4. The appropriate
number of runs for this two-level factorial experiment is 32. We decide not
to add further replicates due to time constraints, but several parts will be
made from each run. The response for each run becomes the average
shrinkage per part, thus dampening out variability in parts and the
measurement itself.
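The signal-to-noise arithmetic for this case study, plus a rough power check, can be sketched as follows. The normal-approximation power formula is our illustration, not part of the case study; it assumes a two-sided 5 percent significance level and uses the fact that an effect estimate in a balanced two-level factorial has standard error 2*sigma/sqrt(N):

```python
import math

sigma = 0.60    # noise: standard deviation from the control charts
delta = 0.85    # signal: smallest shrinkage effect management wants to detect
n_runs = 32

ratio = delta / sigma    # signal-to-noise ratio, approximately 1.4

# Rough power check via a normal approximation (assumed alpha = 0.05, two-sided).
se_effect = 2 * sigma / math.sqrt(n_runs)   # std. error of an effect estimate
z = delta / se_effect
power = 0.5 * (1 + math.erf((z - 1.96) / math.sqrt(2)))  # approx. P(detect effect)
```

With these numbers the approximate power comes out well above 0.9, which is why no further replication was deemed necessary.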
The order in which you run the experiments should be randomized to avoid
influence by uncontrolled variables such as tool wear, ambient temperature
and changes in raw material. These changes, which often are time-related,
can significantly influence the response. If you don't randomize the run
order, the DOE may indicate factor effects that are really due to
uncontrolled variables that just happened to change at the same time. For
example, let's assume that you run an experiment to keep your copier from
jamming so often during summer months. During the day-long DOE, you
first run all the low levels of a setting (factor "A"), and then you run the
high levels. Meanwhile, the humidity increases by 50 percent, creating a
significant change in the response. (The physical properties of paper are
very dependent on humidity.) In the analysis stage, factor A then appears to
be significant, but it's actually the change in humidity that caused the effect.
Randomization would have prevented this confusion.
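Randomizing a run order is straightforward in practice; a minimal sketch (the coded levels and the fixed seed are illustrative only):

```python
import random

# Standard order for a two-level, three-factor design in coded units.
standard_order = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]

run_order = standard_order[:]
random.seed(7)             # seed fixed only so the printed plan is reproducible
random.shuffle(run_order)  # randomization breaks the link to time-related lurking variables
```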
The injection molders randomized the run order within each of several
machines used for their DOE. The between-machine differences were
blocked.
An alias indicates that you've changed two or more things at the same time
in the same way. Even unsophisticated experimenters know better, but
aliasing is nevertheless a critical and often overlooked feature of Plackett-
Burman, Taguchi designs or standard fractional factorials.
For example, if you try to study three factors in only four runs--a half-
fraction--the main effects become aliased with the two-factor interactions.
If you're lucky, only the main effects will be active, but more likely there
will be at least one interaction. The bearings case (Figure 1) can be
manipulated to show how dangerous it can be to run such a low-resolution
fraction. Table 3 shows the full-factorial test matrix.
In this case, the interaction AB is very significant, so it's included in the
matrix. (Note that this column is the product of columns A and B.) The
half-fraction is represented by the shaded rows--the responses for the other
runs have been struck out. Observe that in the highlighted area the pattern
of the minuses (lows) and pluses (highs) for AB is identical to that of factor
C. Put another way, C = AB, where the equal sign indicates aliasing.
By going to the half-fraction, we don't know whether the effect is the result
of C, AB or both.
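The aliasing can be verified directly by constructing the half-fraction; a sketch assuming the defining relation I = ABC, which matches the shaded rows:

```python
from itertools import product

# Full 2^3 factorial: all 8 combinations of coded levels.
full = list(product((-1, 1), repeat=3))

# Half-fraction defined by I = ABC, i.e. keep only the runs where C = A*B.
half = [(a, b, c) for (a, b, c) in full if c == a * b]

# Within the fraction, the AB product column is identical to the C column,
# so the two effects cannot be separated.
aliased = all(a * b == c for (a, b, c) in half)
```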
After all the effort that goes into planning, running and analyzing a
designed experiment, it's very exciting to get the results of your work.
There is a tendency to eagerly grab the results, rush out to production and
say, "We have the answer!" Before doing that, you need to take the time to
do a confirmation run and verify the outcome. Good software packages will
provide you with a prediction interval to compare the results within some
degree of confidence. Remember, in statistics you never deal with
absolutes--there is always uncertainty in your recommendations. Be sure to
double-check your results.
Contour graphs and 3-D projections help you visualize your response
surface. To achieve robust operating conditions, look for flat areas in the
response surface.
The 3-D surfaces are very impressive, but they're only as good as the data
generated to create the predictive model. The results still must be
confirmed. If you want to generate more sophisticated surfaces, you should
follow up with response surface methods for optimization. These designs
require at least three levels of each factor, so you should restrict your study
to the vital few factors that survive the screening phase.
The experiment was conducted with 10 replicates, or repetitions of the experiment at each combination of level settings,
to provide greater accuracy for attribute data.
In this example, the total number of runs was L^F x replicates = 2^3 x 10 = 80, where L = number of levels and F = number of
factors.
A single set of experiments with 10 replicates is shown in Table 1.

Table 1: Replicate Results for a Single Combination of Levels
Factor1  Factor2  Factor3  Result
-1       +1       -1       OK
-1       +1       -1       NOK
-1       +1       -1       OK
-1       +1       -1       OK
-1       +1       -1       OK
-1       +1       -1       OK
-1       +1       -1       OK
-1       +1       -1       OK
-1       +1       -1       OK
-1       +1       -1       OK
Next, the responses for the replicates of this experiment were converted into proportions, p. In the results shown in Table
1, the proportion of OK (okay) to NOK (not okay) was 9 out of 10. Thus, p = 0.90.
Once the experiment was conducted for all combinations of levels, the captured OK/NOK attribute responses were
converted into proportions, displayed in Table 2.
Table 2: Summary of Experiment Data for All Combinations of Levels
Factor1  Factor2  Factor3  p
-1       -1       +1       0.70
+1       -1       -1       0.00
-1       +1       -1       0.90
+1       +1       +1       0.60
+1       +1       -1       0.40
-1       +1       +1       1.00
+1       -1       +1       0.00
-1       -1       -1       0.10
Arcsine Function
The arcsine square-root transform is the variance-stabilizing transform for the binomial distribution and is the most
commonly used transformation when working with proportions and percentages. The proportion, p, can be made nearly
normal by taking the square root of p and applying the arcsine (inverse sine, sin^-1) function: p -> arcsin(sqrt(p)).
In the example, the calculated p was converted into sqrt(p) and then again converted into arcsin (sqrt(p)) by using the
arcsine function. The results are shown in Table 3.
Table 3: Updated Results with Arcsine Function
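The Table 3 values can be reproduced from the Table 2 proportions with a few lines; a sketch for illustration:

```python
import math

# Proportions p from Table 2, in the order listed.
p_values = [0.70, 0.00, 0.90, 0.60, 0.40, 1.00, 0.00, 0.10]

# Variance-stabilizing transform for binomial proportions: arcsin(sqrt(p)).
arcsine = [math.asin(math.sqrt(p)) for p in p_values]
# p = 1.00 maps to pi/2 (about 1.5708), the maximum later reported by the optimizer.
```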
Analysis of Results
The general linear model (GLM) in analysis of variance (ANOVA) was used to identify the significant factor(s) among the
three factors. The arcsine value was specified as the response, and Factor1, Factor2, Factor3 were provided as the
factors. The output of GLM is shown in Figure 1.
Figure 1: General Linear Model: Arcsin (Sqrt(p)) Versus Factor1, Factor2, Factor3
The p-value calculated within the GLM results indicates that Factor1 and Factor2 are statistically significant. That is, those
factors have a significant bearing on the arcsine value.
Interaction Plot
An interaction plot displays the mean of the response variable for each level of a factor with the level of a second factor
held constant. Interaction plots are useful for judging the presence of interactions – when the response at one factor level
depends upon the level(s) of other factors.
Parallel lines in an interaction plot indicate no interaction; non-parallel lines indicate an interaction between the factors.
The greater the departure of the lines from the parallel state, the higher the degree of interaction. To produce an
interaction plot, data must be available from all combinations of levels.
When the line is horizontal (parallel to the x-axis), there is no main effect present. Each level of the factor affects the
response in the same way, and the response mean is the same across all factor levels.
When the line is not horizontal, then there is a main effect present. Different levels of the factor affect the response
differently. The steeper the slope of the line, the greater the magnitude of the main effect.
The sample data is shown in a main effects plot in Figure 3 and suggests that a main effect is present in Factor1 and
Factor2 as the lines are not horizontal.
In the example, a larger value of the response arcsin (sqrt(p)) represents a better outcome. With that in mind, the analysis
suggests that the levels -1 for Factor1, +1 for Factor2 and +1 for Factor3 will provide the desired results (i.e., high
proportion of acceptance, or low proportion of rejections).
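That recommendation can be checked with a quick main-effects calculation on the Table 2 data; a sketch in which the sign of each effect picks the preferred level, since a larger arcsine value is better:

```python
import math

# Table 2 runs as (Factor1, Factor2, Factor3, p).
runs = [(-1, -1,  1, 0.70), ( 1, -1, -1, 0.00), (-1,  1, -1, 0.90), ( 1,  1,  1, 0.60),
        ( 1,  1, -1, 0.40), (-1,  1,  1, 1.00), ( 1, -1,  1, 0.00), (-1, -1, -1, 0.10)]
y = [math.asin(math.sqrt(p)) for (_, _, _, p) in runs]

def main_effect(i):
    """Mean response at the +1 level minus mean response at the -1 level of factor i."""
    hi = sum(yi for r, yi in zip(runs, y) if r[i] == 1) / 4
    lo = sum(yi for r, yi in zip(runs, y) if r[i] == -1) / 4
    return hi - lo

# Choose the level with the higher mean arcsine for each factor.
best_levels = [1 if main_effect(i) > 0 else -1 for i in range(3)]
```

The result agrees with the analysis: -1 for Factor1, +1 for Factor2 and +1 for Factor3.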
Figure 4 shows the results of the response optimizer analysis for the example data. It indicates that the maximum arcsine
is 1.5708 with a desirability of 1.
Factor 1: -1
Factor 2: +1
Factor 3: +1
To verify the above settings, 30 experiments were conducted and resulted in 0 rejections. With that verification, the
process settings were approved and updated in all appropriate process documents.
Manikandan Jayakumar
Comments
1. John Noguera
Thank you for the interesting article. However, this approach does not take into account the sample size of your
replicates. There is a statistical difference between 7/10 and 700/1000!
The way to do that (and thus also avoid the need to transform) is to use binary logistic regression.
2. Mike86
Please check the factor signs in Table 1. I’m showing -1 for Factor 1, +1 for Factor 2, and -1 for Factor 3 in all
experiments.
3. Michael Ohler
John, you are spot on.
We recommend using the FTP (Freeman-Tukey) transformation for binary outputs of a DOE before analyzing it. The FTP
transform computes (arcsin(sqrt(x/(n+1))) + arcsin(sqrt((x+1)/(n+1))))/2, with x being the number of events and n the
sample size. As you can see, it does take sample size into account, but its impact rapidly goes down as n goes
up.
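A sketch of the transform the comment describes, with the brackets balanced (the function name is ours, for illustration):

```python
import math

def freeman_tukey(x, n):
    """Double-arcsine (Freeman-Tukey) transform for x events in n trials."""
    return (math.asin(math.sqrt(x / (n + 1)))
            + math.asin(math.sqrt((x + 1) / (n + 1)))) / 2
```

Unlike the plain arcsin(sqrt(p)), this stays strictly above zero at x = 0, and the sample-size adjustment shrinks as n grows, which is the behavior the comment notes.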
Recently I was preparing a data set for training and decided to compare results of analysis via DOE and Binary Logistic
Regression. Results were nothing short of shocking.
The data set was 5 factors, half factorial, with a sample size of 600 per treatment. The output was the number of incomplete
credit card applications. DOE on the FTPed data found only 2 terms significant. BLR found 9 – with 5
interactions! That is obviously because BLR takes sample size into its significance calculations.
Then I took the significant terms from BLR into DOE and ran the Response Optimizer. And I got a second big surprise. Even
the direction of a factor's impact, calculated from main effects and interactions, was different. DOE would say: increase A to
decrease the number of defectives. BLR would say: decrease A! This is obviously due to the differences between the purely
linear model of term impacts in DOE and the log transform of BLR.
Again, it should be no surprise that results differ if one looks at the actual math behind these two tools. However, I
did not expect the difference to be so practically significant!
My takeaway: use BLR if you know how to run it and read the results. However, for some LSS students, playing with
interactions in BLR and manually building the final transfer function of probability with the backward LN transform might be a
tough call.
So we are investigating with Minitab the possible limits of using FTPed data within the DOE functionality (I guess it will be
something like sample size 5 or something). If you wish, I can keep you posted on the development.
Regards,
Maxim Korenyugin and Michael Ohler
4. Jeff Larsen
Enjoyed reading, and really studying, Manikandan Jayakumar's write-up. The only question I would have is: Could you not reach
the same conclusion by using an Attribute Agreement Analysis? The purpose of a DOE is to provide the most efficient and
economical way to reach a valid conclusion. However, the article's flow does provide a simple interpretation of the results,
which speaks well to the design.
Minimum-run designs
By simply varying factors only at two levels each, but simultaneously rather than one at a time, experimenters can uncover important
interactions such as the one depicted in Figure 4.
Figure 4. Response surface shows a pure interaction of two factors.
Furthermore, this parallel testing scheme is much more efficient (cost-effective) than the serial one-factor-at-a-time (OFAT)
approach. For up to four factors, the number of runs required to do all combinations, 16 (= 2^4), may not be prohibitive.
These full factorials provide resolution of all effects.
However, as a practical matter all you really need to know are the main effects (MEs) and two-factor interactions (2FIs) — higher-order
effects are so unlikely that they can be assumed to be negligible. Making this assumption (ignoring effects of 3FI or more) opens the door to
fractional two-level combinations that suffice for estimating MEs and 2FIs.
Statisticians spelled these designs out more than half a century ago. Our previous article on DOE in Chemical Processing [1] details the
classical high-resolution fractional design options for up to 11 factors. They can be constructed with the aid of a textbook on DOE [2].
If you dig into traditional options for experiment designs, you will find designs available with as few as k + 1 runs, where k equals the number
of factors you want to test. One popular class of designs, called Plackett-Burman after the two statisticians who invented them during
World War II, offers experiments that come in multiples of four, for example, 11 factors in 12 runs or 19 factors in 20 runs. However, these
“saturated” designs provide very poor resolution — main effects will be confused with 2FIs. We advise that you avoid running such low
resolution designs. They seemingly offer something for nothing, but the end result may be similar to banging a pipe wrench against a noisy
pump — the problem goes away only temporarily.
Pulling the pump apart and determining the real cause is a much more effective approach. Similarly, you will be better off running a bigger
design that at least resolves main effects clear of any 2FIs. Better yet would be an experimental layout that provides solid estimates of the
2FIs as well. Taking advantage of 21st century computer technology, statisticians developed a very attractive class of designs that require
minimum runs, or nearly so (some have an extra one to maintain an equal replication of the lows versus the highs of each factor). [3]. The
inventors deemed these designs “MR5,” where MR stands for minimum run and the number 5 refers to the relatively high resolution —
sufficient to clearly estimate not only main effects, but also the 2FIs. Table 2 shows the number of runs these MR5s require for 6 to 11
factors — far fewer than full factorials or even the standard fractions.
Table 2. New methods require far fewer tests than the old full-factorial approach.
Designs for up to 50 factors can be readily laid out via commercially available statistical software developed specifically for experimenters
[4]. A sample layout for six factors is provided in Table 3.
Table 3. The MR5 layout for six factors in case study: + means the factor is increased; - means it is decreased.
It codes the lows as – (minus) and highs as + (plus). If you perform this experiment, we recommend that the rows be randomized for your
actual run order. If the number of runs far exceeds your budget for time and materials, choose the option for MR4 (minimum-run resolution
four) — good enough to screen main effects. As shown in Table 3, these lesser-resolution designs require a number of runs equal to twice
the number of factors, for example: 20 runs for 10 factors (versus 56 for the MR5 and 1,024 for all combinations in the full factorial).
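The run counts quoted here follow simple formulas; a sketch, assuming the MR5 minimum equals the parameter count of a model with an intercept, all main effects and all 2FIs (consistent with the 56-run figure for 10 factors, though as noted above some actual MR5 designs add one extra run to balance the levels):

```python
from math import comb

def mr5_min_runs(k):
    # One intercept + k main effects + C(k, 2) two-factor interactions.
    return 1 + k + comb(k, 2)

def mr4_runs(k):
    # MR4 screening designs use roughly two runs per factor.
    return 2 * k

# Ten factors: 20 runs (MR4) vs. 56 (MR5) vs. 1,024 (full 2^10 factorial).
```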
Table 4. MR4 design to screen a fermentation pilot process shows A’s power to affect yield. (Click to enlarge.)
We will greatly simplify the statistical analysis of the data to provide a graphic focus on the bottom-line results. Figure 5 shows an ordered
bar chart of the absolute effects scaled by t-values, essentially the number of standard deviations beyond the mean response.
Figure 5. Pareto plot shows the strength of each factor and interactions on yield.
The line at a t-value of 2.31 represents the threshold for achieving 95% confidence that a given effect did not occur just by chance. In the
field of quality, displays like these are known as "Pareto" plots — named after an Italian economist who observed that about 20% of the
people owned 80% of a country's land. Similarly, a rule-of-thumb for DOE says that about 20% of the likely effects, main and 2FIs, will
emerge as statistically significant for any given process. In this case, the main effect D towers above the remainder of the 11 effects that can be
estimated, other than the overall average, from this 12-run design. However, the effect of factor D apparently depends on the level of A, as
evidenced by the bar for interaction AD exceeding the threshold level for significance.
The main effect of A is the next largest bar, which would normally go into what statisticians call the “residual” pool for estimation of error.
However, in this case, the main effect of A goes into the predictive model to support the significant AD interaction. The practical reason why
factor A (temperature) should be modeled is illustrated by Figure 6, which shows that its effect depends on the level of factor D — the load
density.
Figure 6. Interaction of factors A and D is strong because the lines are far from parallel.
When D is at the high (+) level, increasing temperature significantly reduces the predicted yield, whereas at low D (–) the temperature has
little effect — perhaps somewhat positive — on yield. (The key to inferring significance when comparing one end to the other is whether the
least significant difference bars overlap. Notice that the bars at either end of the upper line in Figure 6 do not overlap, thus their
difference is statistically significant.)
This interaction effect proved to be the key for success in the actual fermentation study.
However, for some MR4 cases like this, sophisticated software that offers forward stepwise regression for modeling, such as the one we
used, will reveal a truly active interaction such as AD — but only if it is isolated and large relative to experimental error. If you suspect
beforehand that 2FIs are present in your system, upgrade your design to the higher-resolution MR5 or an equivalent standard fractional
two-level factorial.
For experiments on six or more factors, the MR5s provide a more efficient core for central composite designs (CCDs) than the classical
fractions of two-level factorials available half a century ago. For the six-factor case, the standard CCD with a half-fraction for its core
requires 52 runs in total. By substituting the six-factor MR5 laid out in Table 3 as the core, the runs are reduced to 40 — a substantial
savings with little loss for purposes of response surface methodology (RSM). Although someone well-versed in these methods with the
MR5 templates in hand would have no trouble laying out CCDs based on these, it becomes far easier with software developed for this
purpose [7].
References
1. Anderson, M.J., Whitcomb, P.J., “Design of Experiments Strategies,” Project Engineering Annual, pp 46-48, 1998.
2. Anderson, M.J., Whitcomb, P.J., DOE Simplified, Productivity, Inc, New York, 2000.
3. Oehlert, G.W., Whitcomb, P.J., "Small, Efficient, Equireplicated Resolution V Fractions of 2^k Designs and Their Application to Central
Composite Designs," Proceedings of the 46th Fall Technical Conference, American Society for Quality, Chemical and Process Industries
Division and Statistics Division; American Statistical Association, Section on Physical and Engineering Sciences, 2002.
4. Helseth, T.J., et al, Design-Ease, Version 7 for Windows, Stat-Ease, Inc, Minneapolis, 2006 ($495), website: www.statease.com.
5. Tholudur, A., et al, “Using Design of Experiments to Assess Escherichia coli Fermentation Robustness,” Bioprocess International, October
2005.
6. Anderson, M.J., Whitcomb, P.J., RSM Simplified, Productivity, Inc, New York, 2005.
7. Helseth, T.J., et al, Design-Expert, Version 7 for Windows, Stat-Ease, Inc, Minneapolis, 2006 ($995), website: www.statease.com.
Mark J. Anderson and Patrick J. Whitcomb are principals at Stat-Ease in Minneapolis, Minn.; E-mail them
at mark@statease.com and Pat@StatEase.com.