Experimental Design I Screening Araujo


26 trends in analytical chemistry, vol. 15, no. 1, 1996

Experimental design
I. Screening

Pedro W. Araujo, Richard G. Brereton *
Bristol, UK

This series of three articles discusses the uses of experimental design in analytical chemistry. The three parts, entitled screening, optimization and quantification, respectively, are illustrated by examples taken from the literature. Screening is the first step in the efficient assessment of the factors involved in an analytical system under study. This article discusses full factorial designs, fractional factorial designs, Plackett and Burman designs and interpretation of numerical results.

* Corresponding author.
0165-9936/96/$15.00 © 1996 Elsevier Science B.V. All rights reserved

1. Introduction

The analytical chemist is constantly performing experiments for a very wide variety of reasons. Over the past few years, the need to design experiments systematically has been defined, and analysts need to understand the wide variety of statistical and chemometric methods to help perform experiments successfully.

There have been many groups of investigators employing experimental design methodology, for a variety of different reasons. Some approach the subject through statistical response surface methodology. Others are very empirical, using approaches such as simplex to optimize their methods. Still others find the need for experimental design within multivariate analysis.

The aim of this article is to take an overview of experimental design methodology throughout analytical chemistry, describing the various different strands.

A major role of experimental design is in screening. Most analytical processes are influenced by a wide variety of factors. For example, there may be 20 steps in an extraction. It is important for the experimenter to understand which steps are crucial to the efficiency of extraction. Often, when monitoring processes, unexplained variations in extraction efficiency occur on a daily basis. These might be due to column variability, pH and temperature fluctuations, change in quality of a reagent, problems with sample homogeneity, etc. Which factors are most significant is not always obvious from first inspection. Similarly, optimization of analytical methods such as in high-performance liquid chromatography (HPLC) and atomic absorption spectrometry (AAS) is influenced by a variety of factors. Initially, it is important to understand which factors are significant and then narrow down the final optimization for three or four significant factors.

2. Screening approaches

2.1. Full factorial designs

These are most useful where the number of factors is relatively limited. For a typical factorial design, it is necessary to determine how many factors are of interest, and how many levels each factor is to be studied at. An N factor, K level design involves performing $K^N$ experiments.

These designs are organized as follows. First, it is necessary to establish a region over which each factor is to be studied. This region is chosen as a sensible experimental region. As an example, we look at an experiment for the determination of Ti in glass ceramics by flame AAS [1]. The aim is to determine the influence of the elements Al, Mg, Na and Si on the Ti signal. In this first step, sensible low and high levels of each element in µg ml⁻¹ are proposed. For example, the concentration of Al varies between 100 and 320 µg ml⁻¹. This is called the experimental domain; there is no guarantee that the results will be valid outside this region.

The second step is to choose a design. A 4 factor, 2 level design is given in Table 1, consisting of 16 experiments. Each experiment involves recording the signal in the presence of 4 compounds as given in the design. For example, the experiment denoted by +1 -1 +1 +1 involves recording the signal at



a high level of the first compound, low level of the second compound and high levels of the third and fourth compounds.

Table 1
Full factorial design $2^4$ (16 experiments) to determine the influence of the elements Al, Mg, Na and Si on the Ti signal

No.   x1   x2   x3   x4
 1     1    1    1    1
 2     1    1   -1    1
 3     1   -1    1    1
 4     1   -1   -1    1
 5    -1    1    1    1
 6    -1    1   -1    1
 7    -1   -1    1    1
 8    -1   -1   -1    1
 9     1    1    1   -1
10     1    1   -1   -1
11     1   -1    1   -1
12     1   -1   -1   -1
13    -1    1    1   -1
14    -1    1   -1   -1
15    -1   -1    1   -1
16    -1   -1   -1   -1

More elaborate designs are 3 level designs. A 2 factor, 3 level design is given in Table 2, where a 0 indicates an intermediate level. For several factors the number of experiments becomes impracticable, but the added advantage is that a good mathematical model can be obtained.

Table 2
More elaborate design: a 3 level, 2 factor design ($3^2 = 9$ experiments)

No.   x1   x2
 1     1    1
 2     1   -1
 3    -1    1
 4    -1   -1
 5     0    1
 6     0   -1
 7     1    0
 8    -1    0
 9     0    0

The third step is to choose a response. The aim of the screening experiment discussed above is to see whether the signal height (absorbance) in AAS is influenced by these factors. It is important to choose a response that can be easily quantified, e.g., extraction efficiency, peak area, signal height. Selecting an inappropriate measure of response may lead to inaccurate conclusions, e.g., selecting the wrong CRF (chromatography response function) in the case of overlapping peaks.

Fourth, a mathematical model can then be produced relating the response to the factors. Often the factors are coded, that is, the raw data are not used, but the data are transformed mathematically to a number, e.g., a low level is transformed to the number -1 and a high level to the number +1. Sometimes, the response is also transformed [2,3] (logarithmic transformation is common), but this is not mandatory. Often a linear model of the following form is used:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_{N-1} x_{N-1} + b_N x_N + b_{12} x_1 x_2 + b_{13} x_1 x_3 + b_{23} x_2 x_3 + \dots + b_{(N-1)N}\, x_{N-1} x_N \qquad (1)$$

where $\hat{y}$ represents the estimated response, $b_0$ is the average experimental response, the coefficients $b_1$ to $b_N$ are the estimated effects of the factors considered, and the extent to which these terms affect the performance of the method is called the main effect. The coefficients $b_{12}$ to $b_{(N-1)N}$ are called the interaction terms. We can see from Eq. 1 that the factorial design provides information about the importance of interactions between the factors. This means that sometimes the level at which some factors must be set is influenced by their interaction with others, so that we can ensure a better expected experimental response. The application of this kind of design for screening tends to be impractical when a high number of factors need to be tested and when other considerations, such as analysis time and expense, must be taken into account.

2.2. Reduced factorial designs

Table 3
Example of some possible factors which can affect a general chromatography methodology

Sample preparation factors    Chromatography factors
  sample weight                 pH
  shake time                    temperature
  sonication time               solvent composition
  heating temperature           flow-rate
  wash volume                   buffer concentration
  extraction volume             additive concentration
  centrifugation time           analysis time
  pore size                   Detector factors
  extraction                    wavelength
  purification                  RI-range
  dilution                      filter
Column factors                  time constant
  manufacturer                  lamp lifetime
  batch                       Data-handling factors
  lifetime                      user selector factor

The performance of chromatography can be affected by approximately 50 factors [4], some of

which are listed in Table 3. For HPLC up to 15 factors have been reported [5]. In order to study such a large number of factors, reduced factorial designs are employed. With these kinds of designs the factors can be efficiently evaluated using a small fraction of the experiments of the full factorial design.

There are several types of reduced factorial designs, but in this paper we will be concerned with the fractional factorial design and the Plackett-Burman design.

The description of a fractional factorial design at K levels and N factors is given by $K^{N-P}$ experiments, where P is always less than N. The following rules must be employed for this kind of design.

• The design is usually applied at 2 levels (K = 2).
• The number of experiments ($2^{N-P}$) must be higher than N (number of factors), where P represents the number of columns generated from a full factorial design $2^p$ ($p = N - P$). As an example, in the identification of significant effects in the processing conditions of cheese [6], six factors were studied and 16 experiments performed ($2^{6-2}$). The experimental matrix shown in Table 4, which is formed by 6 columns (one column per factor), was generated with a $2^4$ ($p = 6 - 2$) full factorial design in order to obtain the first four columns (factors x1, x2, x3 and x4). The two additional columns (P = 2), which represent the factors x5 and x6, were produced by combining x2 x3 x4 and x1 x3 x4, respectively.
• All the experiments proposed by the experimental matrix (each row in Table 4) must be different in order to avoid replicate information.

Table 4
Fractional factorial design $2^{6-2}$ (16 experiments) used in the identification of significant effects in the processing conditions of cheese

No.   x1   x2   x3   x4   x5 (x2x3x4)   x6 (x1x3x4)
 1    -1   -1   -1   -1       -1            -1
 2    -1   -1   -1    1        1             1
 3    -1   -1    1   -1        1             1
 4    -1   -1    1    1       -1            -1
 5    -1    1   -1   -1        1            -1
 6    -1    1   -1    1       -1             1
 7    -1    1    1   -1       -1             1
 8    -1    1    1    1        1            -1
 9     1   -1   -1   -1       -1             1
10     1   -1   -1    1        1            -1
11     1   -1    1   -1        1            -1
12     1   -1    1    1       -1             1
13     1    1   -1   -1        1             1
14     1    1   -1    1       -1            -1
15     1    1    1   -1       -1            -1
16     1    1    1    1        1             1

x1 = maturity, x2 = dry matter, x3 = pH, x4 = addition of dry matter, x5 = after-creaming, x6 = cooling.

The main limitation concerned with the fractional factorial design is that the main effects ($b_N$) are confounded with interaction terms ($b_{(N-1)N}$). This observation can be exemplified by considering a 2 level, 3 factor design. Suppose that 8 experiments are too expensive to evaluate the influence of the three factors through a full factorial design. We can obtain information on a fraction of experiments, for instance 4, and P = 1, and just one column in the experimental matrix will be the result of the combination of the other two independent columns given in Table 5. Because column 3 is the result of combining columns 1 and 2 (3 = 12), it is called the generator of the design. This is multiplied (3 × 3 = 12 × 3) and the result is called the defining relation I = 123. From this relation we can obtain:

$$I \times 1 = 1^2 23 \;\Rightarrow\; \text{column } 1 = \text{column } 23 \;\Rightarrow\; b_1 = b_{23}$$
$$I \times 2 = 1 2^2 3 \;\Rightarrow\; \text{column } 2 = \text{column } 13 \;\Rightarrow\; b_2 = b_{13}$$
$$I \times 3 = 1 2 3^2 \;\Rightarrow\; \text{column } 3 = \text{column } 12 \;\Rightarrow\; b_3 = b_{12}$$

Table 5
Fractional factorial design $2^{3-1}$ (4 experiments)

No.   x1   x2   x3 (x1x2)
 1    -1   -1       1
 2     1   -1      -1
 3    -1    1      -1
 4     1    1       1

The column x3 corresponds to a product of the columns x1 x2. Thus the estimation of the effect $b_1$ of factor x1 as (-1 +1 -1 +1)/4 will produce the same result as the estimated interaction term $b_{23}$ of the factors x2 and x3. This last interaction column is obtained by multiplying columns x2 and x3. The literature has reported some strategies to overcome this limitation [7-9] and some of them will be explained in Section 2.3.

A particular type of fractional factorial design is the Plackett and Burman design [10], which assumes that the interactions can be completely

Table 6
Number of experiments to be performed according to the number of factors to be assessed when a Plackett-Burman design is used

No. of experiments   No. of real factors   No. of columns
        4               from 2 to 3               4
        8               from 4 to 7               7
       12               from 8 to 11             11
       20               from 12 to 19            19
       24               from 20 to 23            23

ignored and so the main effects are calculated with a reduced number of experiments. The minimum number of experiments to be performed in order to assess the factors under study is given in Table 6. As a rule for the construction of a Plackett-Burman experimental matrix, we can establish the following steps:

• Select the number of factors (N) to be studied, e.g., N = 3.
• It is evident from Table 6 that in some instances (as in our example for N = 3) the number of columns does not correspond to the number of factors under study. In this case, set the remaining columns to correspond to dummy factors, e.g., time of the day.
• Set the number of columns dictated by Table 6 and build in a random way a first row with almost the same number of levels +1 and -1, e.g., for N = 3 we can have +1 -1 -1 +1 (note that the fourth factor is a dummy factor).
• In order to generate the experimental matrix, all the cyclical permutations of the first row must be obtained until the number of experiments established in Table 6 is complete, e.g., for N = 3 the number of experiments has to be 4. The experimental matrix is constructed according to Fig. 1.
• The last row or experiment in this kind of matrix when N > 3 has to contain only the levels (+1 or -1) that balance each column to ensure that there are an equal number of +1 and -1 levels studied for each factor. This means that at the end each column must have an equal quantity of levels +1 and -1.

Fig. 1. Plackett-Burman matrix obtained by cyclical permutation of the first row for N = 3.

The model used for the Plackett-Burman design is first order in each factor as follows:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_{N-1} x_{N-1} + b_N x_N \qquad (2)$$

The Plackett-Burman design for 7 factors at 2 levels as described in Table 7 has been used in the screening step in order to scout the importance of different solvents and solvent combinations used in lipid separation by HPLC at different column temperatures [11]. Observe how the last row is filled only with levels -1 in order to obtain a balance in the levels of each column and how, with 1/16 of the full design at 2 levels, it is possible to carry out this task.

Table 7
Plackett-Burman design for the screening of 7 factors

No.   x1   x2   x3   x4   x5   x6   x7
 1    -1    1   -1    1    1    1   -1
 2    -1   -1    1   -1    1    1    1
 3     1   -1   -1    1   -1    1    1
 4     1    1   -1   -1    1   -1    1
 5     1    1    1   -1   -1    1   -1
 6    -1    1    1    1   -1   -1    1
 7     1   -1    1    1    1   -1   -1
 8    -1   -1   -1   -1   -1   -1   -1

This kind of design is useful for a rapid screening of the factors but runs the risk of overevaluating some factors when the real effect lies in the interaction terms, which have been considered negligible.

2.3. Interpreting results

In the section above we found that fractional factorial designs are subject to some restraints because the main effects can be confounded with the interaction terms. Some authors have used complementary designs to separate the main effects and the interaction terms [7,8]. The other technique used has been the interaction diagrams [8,9] as described below for a fractional factorial design

used in the determination of Cd by AAS using electrothermal atomization without platform and with modifier [9]. Seven factors were evaluated and 16 experiments carried out. The experimental matrix and response obtained (sensitivity in ng ml⁻¹) are presented in Table 8.

Table 8
Fractional factorial design $2^{7-3}$ used in the investigation of the variables affecting the cadmium sensitivity in a graphite furnace without a platform and with modifier

No.   x1   x2   x3   x4   x5 (x1x2x3)   x6 (x2x3x4)   x7 (x1x3x4)   ng ml⁻¹
 1    -1   -1   -1   -1       -1            -1            -1         7.607
 2     1   -1   -1   -1        1            -1             1         8.006
 3    -1    1   -1   -1        1             1            -1         6.774
 4     1    1   -1   -1       -1             1             1         6.247
 5    -1   -1    1   -1        1             1             1         8.226
 6     1   -1    1   -1       -1             1            -1         7.838
 7    -1    1    1   -1       -1            -1             1         5.345
 8     1    1    1   -1        1            -1            -1         7.463
 9    -1   -1   -1    1       -1             1             1         8.055
10     1   -1   -1    1        1             1            -1         8.104
11    -1    1   -1    1        1            -1             1         7.707
12     1    1   -1    1       -1            -1            -1         7.710
13    -1   -1    1    1        1            -1            -1         7.905
14     1   -1    1    1       -1            -1             1         7.643
15    -1    1    1    1       -1             1            -1         6.409
16     1    1    1    1        1             1             1         6.726

x1 = ashing ramp, x2 = ashing temperature, x3 = ashing time, x4 = calibration curve, x5 = drying time, x6 = atomization ramp, x7 = atomization temperature.

The statistical analysis revealed that some confounded terms formed by the two-factor interactions were significant and, among them, we have selected the term formed by $b_{23} + b_{15} + b_{46}$ to show how the interaction diagrams were built. Fig. 2 shows the combined effect of the confounded terms. Each upper right quadrant corresponds to the average experimental response when the individual factors involved in the interaction term are both at the level +1. Each upper left quadrant is the average when the first factor involved in the interaction term is at the level -1 and the second at the level +1, and so on. As a result, all the values are the averages over all experiments presented in Table 8. As an example of calculation, the value 7.360 in the interaction $b_{15}$ (lower right quadrant) is obtained by taking the sensitivity values when the factors x1 and x5 are held at the levels +1 and -1, respectively, as follows: (6.247 + 7.838 + 7.710 + 7.643)/4 (they correspond to the experiments 4, 6, 12 and 14 in Table 8). The diagrams provide preliminary suggestions about which interactions may be most significant in determining the sensitivity. For example, for factors 2 and 3 there is a fairly large difference according to whether factor 3 is at a low (favourable) or high level, and a slightly smaller difference relating to factor 2. It would appear that factors 2 and 3 should preferably be at a low level to obtain a good response, and certainly more sophisticated quantitative modelling should take this particular interaction into account.
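The quadrant averages behind the interaction diagrams described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the design matrix is rebuilt from the generators shown in the header of Table 8 (x5 = x1x2x3, x6 = x2x3x4, x7 = x1x3x4), the responses are copied from that table, and the function names `build_design` and `quadrant_means` are hypothetical helpers, not part of the original work.

```python
from itertools import product

# Responses (sensitivity, ng/ml) copied from Table 8, experiments 1 to 16.
responses = [
    7.607, 8.006, 6.774, 6.247, 8.226, 7.838, 5.345, 7.463,
    8.055, 8.104, 7.707, 7.710, 7.905, 7.643, 6.409, 6.726,
]


def build_design():
    """Rebuild the 16-run matrix: x1..x4 form a full 2^4 factorial and
    x5, x6, x7 are generated as x1*x2*x3, x2*x3*x4 and x1*x3*x4."""
    rows = []
    # product() varies its last element fastest, so iterate (x4, x3, x2, x1)
    # to make x1 the fastest-changing column, as in Table 8.
    for x4, x3, x2, x1 in product((-1, 1), repeat=4):
        rows.append((x1, x2, x3, x4, x1 * x2 * x3, x2 * x3 * x4, x1 * x3 * x4))
    return rows


def quadrant_means(design, y, i, j):
    """Mean response for each (level of factor i, level of factor j) pair.
    Factors are numbered from 1, as in the text."""
    means = {}
    for level_i, level_j in product((-1, 1), repeat=2):
        values = [
            resp
            for row, resp in zip(design, y)
            if row[i - 1] == level_i and row[j - 1] == level_j
        ]
        means[(level_i, level_j)] = sum(values) / len(values)
    return means


design = build_design()

# The three two-factor interactions confounded in the selected term.
for i, j in [(2, 3), (1, 5), (4, 6)]:
    means = quadrant_means(design, responses, i, j)
    print(f"x{i} and x{j}:")
    for (level_i, level_j), mean in sorted(means.items()):
        print(f"  x{i}={level_i:+d}, x{j}={level_j:+d}: {mean:.4f}")
```

For x1 = +1 and x5 = -1 the sketch averages experiments 4, 6, 12 and 14, giving approximately 7.3595, which corresponds to the value 7.360 quoted in the text.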
Fig. 2. Diagrams of interactions for the confounded term $b_{23} + b_{15} + b_{46}$.

More conventionally, the significance of each term can be estimated according to its contribution to reducing the overall error in the model. For example, if the total sum of squares error for 20 experiments is 1 (this corresponds to the sum of squares of the difference between the predicted and observed response at each of the 20 points), then each term in the model has an influence over this. If, by removing a given term, the error stays virtually the same, the term has very little significance in the model, and so can probably be neglected.
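As a rough illustration of this idea, the sketch below fits a first-order model (main effects only) to the 16 responses of Table 8 and reports how much the residual sum of squares grows when each term is removed. The choice of a main-effects model, the reuse of the Table 8 data instead of the hypothetical 20-experiment case, and the names `fit_main_effects` and `residual_ss` are all assumptions made for the example. The coefficient shortcut used here holds only because the coded columns are balanced and mutually orthogonal.

```python
from itertools import product

# Responses (sensitivity, ng/ml) copied from Table 8, experiments 1 to 16.
responses = [
    7.607, 8.006, 6.774, 6.247, 8.226, 7.838, 5.345, 7.463,
    8.055, 8.104, 7.707, 7.710, 7.905, 7.643, 6.409, 6.726,
]

# Coded design of Table 8: x1..x4 full factorial (x1 changing fastest),
# x5 = x1*x2*x3, x6 = x2*x3*x4, x7 = x1*x3*x4.
design = [
    (x1, x2, x3, x4, x1 * x2 * x3, x2 * x3 * x4, x1 * x3 * x4)
    for x4, x3, x2, x1 in product((-1, 1), repeat=4)
]


def fit_main_effects(design, y):
    """Least-squares coefficients of y = b0 + sum(b_j * x_j).
    For balanced, orthogonal +/-1 columns each b_j is sum(x_j * y) / n."""
    n = len(y)
    b0 = sum(y) / n
    b = [
        sum(row[j] * resp for row, resp in zip(design, y)) / n
        for j in range(len(design[0]))
    ]
    return b0, b


def residual_ss(design, y, b0, b, dropped=None):
    """Sum of squared differences between observed and predicted responses.
    If `dropped` is a column index, that term is left out of the prediction;
    no refit is needed because the other coefficients do not change in an
    orthogonal design."""
    total = 0.0
    for row, resp in zip(design, y):
        predicted = b0 + sum(
            bj * xj for j, (bj, xj) in enumerate(zip(b, row)) if j != dropped
        )
        total += (resp - predicted) ** 2
    return total


b0, b = fit_main_effects(design, responses)
rss_full = residual_ss(design, responses, b0, b)
print(f"residual sum of squares, full model: {rss_full:.4f}")
for j, bj in enumerate(b):
    rss_without = residual_ss(design, responses, b0, b, dropped=j)
    print(f"x{j + 1}: b = {bj:+.4f}, increase without term = {rss_without - rss_full:.4f}")
```

A term whose removal barely changes the residual sum of squares contributes little to the model. In this orthogonal case the increase equals the number of experiments times the squared coefficient, so it carries the same information as the coded coefficient itself.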

Sometimes, it is possible to use analysis of variance to determine the statistical significance of each term: the contribution to the model of a given term is compared with an analytical or replicate error. This ratio of sums of squares can be assessed for significance using an F-test, but it is essential to recognize that this is only possible if there is replication.

It is vitally important to realize that the absolute size of a coefficient does not necessarily indicate significance. Size depends on scaling, so, for example, if a parameter is measured in centimetres rather than metres, linear coefficients will be multiplied by 0.01 and quadratic coefficients by 0.0001. The ratio of quadratic to linear coefficients will also be multiplied by 0.01. This has nothing to do with the significance or otherwise of these terms.

References

[1] O. Grossmann, Anal. Chim. Acta, 203 (1987) 55.
[2] R.A. Stone and A. Veevers, J. Chemom., 8 (1994) 103.
[3] M.A. Allus, R.G. Brereton and G. Nickless, Chemom. Intell. Lab. Syst., 3 (1988) 215.
[4] J.A. Van Leeuwen, L.M.C. Buydens, B.G.M. Vandeginste, G. Kateman, P.J. Schoenmakers and M. Mulholland, Chemom. Intell. Lab. Syst., 10 (1991) 337.
[5] G. Wernimont, ASTM Standardization News, March (1977) 13.
[6] Ø. Langsrud, M.R. Ellekjaer and T. Naes, J. Chemom., 8 (1994) 205.
[7] P.W. Araujo, M.J. Gomez, Z.A. Benzo and C. Castillo, Chemom. Intell. Lab. Syst., 16 (1992) 203.
[8] J. Scuotto, D. Mathieu, R. Gallo, R. Phan-Tan-Luu, J. Metzger and M. Desbois, Bull. Soc. Chim. Belg., 94 (1985) 897.
[9] P.W. Araujo, C.V. Gomez, E. Marcano and Z. Benzo, Fresenius J. Anal. Chem., 351 (1995) 204.
[10] R.L. Plackett and J.P. Burman, Biometrika, 33 (1946) 305.
[11] P. Kaufmann, Chemom. Intell. Lab. Syst., 27 (1995) 105.

Richard Brereton was educated at Cambridge University and since 1983 has been employed by the University of Bristol (School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK). His research interests are in chemometrics and the analysis of chlorophylls. He has published around 50 research papers, 15 book and proceedings articles, and has edited/written 4 books. He currently supervises 7 Ph.D. students. He is a member of the Editorial Board of Chemometrics and Intelligent Laboratory Systems and the Advisory Board of Analytical Proceedings.

Pedro W. Araujo is studying for a Ph.D. in the University of Bristol. He has published around 10 papers, mainly in experimental design and analytical method development. Before coming to Bristol, he worked at the Instituto Venezolano de Investigaciones Científicas, Caracas, Venezuela. His current research involves developing novel chemometric and analytical methodology for the study of chlorophyll and its degradation products.

