

MARCH, 1973 Vol. 98, No. 1164


THE ANALYST
The Rapid Estimation and Control of Precision by
Duplicate Determinations
BY MICHAEL THOMPSON AND RICHARD J. HOWARTH
(Applied Geochemistry Research Group, Imperial College, London, SW7 2BP)
Studies on computer-simulated models have provided several new methods
of estimating, studying or controlling analytical precision in real systems.
The methods are based upon precision estimators derived from the difference
between duplicate analyses, and take into account variations in the precision
of a determination with the concentration of the substance being determined.
The methods have been checked by applying them to simulated samples of
many duplicate analyses drawn by Monte Carlo techniques from specified
populations, that is, in effect, from analytical systems with known precision
characteristics. Some examples show the application of the methods in
practice.
IT is rare to encounter analytical laboratories in which precision is regularly measured and
controlled. Usually the precision of a method is established during the development stage
by the replicate analysis of a few samples judged to be typical of the material that is to
be analysed. This is a necessary stage in development, but does not account for any
additional sources of variation that occur in day-to-day work, such as those due to small
modifications in technique, which all analysts evolve, the varying skill of different analysts,
an unexpected change in the nature of the samples or some unobserved operational or
instrumental factor. This situation is tolerated at one extreme, when the precision of a
method is considerably better than that required for the application, and at the other
extreme, when it is not recognised that bad analytical results are stemming from failure
to achieve the necessary precision.
In applied geochemical surveys very large numbers of samples of stream sediment, soil,
rock, etc., are analysed at a high rate to determine elements of interest in studies of mineral
exploration or agricultural trace element problems. To ensure the economic viability of such
an approach the analytical methods are reduced to the bare essentials by reducing the materials
and labour required to a minimum. The variability of the analytical results produced is
therefore usually the highest that can be accepted for the application. In this situation the
measurement and control of precision is of great importance because any additional source
of variability is liable to render the analytical results meaningless.
METHODS OF ESTIMATING PRECISION-
Methods that have been used to estimate the precision within a batch of analytical results
suffer from certain drawbacks. The most straightforward consists in the replicate analysis
of a few selected samples and the subsequent calculation of the standard deviation. Unless
random sampling is used the samples selected may not be representative of the batch. Indeed,
there is a risk that the replicates may be especially chosen because they are not representative,
i.e., if they are selected from the extremes of the concentration range to demonstrate the
variability in precision over the range. If this method is used the analyst becomes involved
in time-consuming calculations.
The “statistical series” method has been much used in mineral exploration work.1 This
method is based on the analysis of a series of mixtures of two bulk samples of material selected
so that their matrices are similar to the samples analysed in the batch, and the concentration
of the element sought varies over the whole range found in the samples of the batch. The
precision is obtained in terms of a single value for standard deviation. The statistical basis
of this method has been criticised on the grounds that no allowance is made for variation
in the absolute or relative precision over the concentration range.2 Again, the bulk samples
chosen for the “statistical series” may not represent the samples that were analysed in the
batch, and the analyst is involved in carrying out calculations. There is the additional risk
that the mixing of the two bulk samples may not be completely effective.
© SAC and the authors.

Analysts have always used duplicate analyses to gauge the reproducibility of their
results, but comparatively rarely use the methods that are available to make these duplicate
results quantitative in terms of precision. These methods again assume that the standard
deviation is invariable, and therefore only apply to samples that fall within a very limited
concentration range. In one method use is made of the formula s = √(Σd²/2n), where s is
the estimate of standard deviation, d is the difference between the duplicates and n is the
number of pairs of determinations.3 In another method the difference between the duplicates
is regarded as the range of two values, and use is made of the range as an estimator for standard
deviation by multiplying the range by a factor.4,5 However, the range of two values has
a peculiar sampling frequency distribution and single values are apt to be misleading.
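As a minimal illustration, the duplicate formula above can be written out directly. The sketch below is ours, not the paper's; the function name and the example data are invented.

```python
# Estimate the standard deviation from duplicate pairs using s = sqrt(sum(d^2)/2n),
# where d = x1 - x2 within each pair and n is the number of pairs.
import math

def sd_from_duplicates(pairs):
    n = len(pairs)
    sum_d2 = sum((x1 - x2) ** 2 for x1, x2 in pairs)
    return math.sqrt(sum_d2 / (2 * n))

# Invented duplicates of a sample whose true standard deviation is about 0.1.
pairs = [(10.05, 9.93), (9.88, 10.02), (10.11, 10.01), (9.97, 10.08)]
print(sd_from_duplicates(pairs))
```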
In the present work we describe studies carried out by computer simulation techniques
concerning the sampling frequency distribution of duplicate analyses in analytical systems
when the standard deviation varies with concentration. Simple graphical and computational
methods for estimating precision have been evolved, tested and applied to actual cases arising
in geochemical practice.
DEFINITIONS-
Definitions of precision and detection limit vary from laboratory to laboratory. The
methods of estimation described in this paper can be adapted to suit definitions other than
ours. By precision we mean twice the coefficient of variation expressed as a percentage,
or P_c = 200 σ/μ, where μ is the mean and σ the standard deviation of a normal (gaussian)
distribution N(μ, σ). By detection limit we mean the concentration at which P_c = 100,
or μ = 2σ.
SAMPLING DISTRIBUTION OF PRECISION ESTIMATORS BASED ON DUPLICATE MEASUREMENTS-
If a pair of duplicate samples, x_1 and x_2, are taken from a parent population N(μ, σ_p),
the sampling distribution of the difference between these values (x_1 - x_2) is also normal,
N(0, σ_d), and it can be shown that σ_d = √2 σ_p. The distribution of the absolute difference
between duplicate values |x_1 - x_2| is, in effect, the positive half of N(0, σ_d) (see Fig. 1). The
value of σ_p can then be obtained from a population of duplicate values in three different
ways: by calculation of σ_d/√2; by obtaining the mean value of |x_1 - x_2|, or d̄ (it can be
shown6 that d̄ = 0.798 σ_d, or 1.128 σ_p); and by obtaining the median value of |x_1 - x_2|, or M_d.
From tables of the area under the normal curve we find that M_d = 0.675 σ_d, or 0.955 σ_p.
From each of these methods one can obtain an estimate (S_p) of σ_p based on a sample
of n duplicate pairs. For large values of n the sampling variance can be derived analytically7
and var S_p is 0.50 σ_p²/n, 0.58 σ_p²/n and 1.35 σ_p²/n for the three methods, respectively. Hence,
the use of the median (M_d) is less precise than the use of the other two methods, although it will
be influenced to a lesser extent by gross errors and can be found graphically. The fortuitous
close approximation of M_d and σ_p means that σ_p can be estimated from the median without
any calculation.
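A short simulation makes the comparison concrete. This sketch is ours, not the authors'; it draws duplicate pairs from a single normal population and applies the three estimators, using the constants 1.128 and 0.955 quoted above.

```python
# Compare three estimators of sigma_p from duplicate pairs:
# (i)   sqrt(sum(d^2)/2n), i.e., the sigma_d/sqrt(2) route;
# (ii)  mean|d| / 1.128, since mean|d| = 1.128 sigma_p;
# (iii) median|d| / 0.955, noting median|d| already approximates sigma_p closely.
import math, random, statistics

def three_estimators(pairs):
    d = [x1 - x2 for x1, x2 in pairs]
    abs_d = [abs(v) for v in d]
    s1 = math.sqrt(sum(v * v for v in d) / (2 * len(d)))
    s2 = statistics.mean(abs_d) / 1.128
    s3 = statistics.median(abs_d) / 0.955
    return s1, s2, s3

random.seed(1)
sigma_p = 2.0
pairs = [(random.gauss(50, sigma_p), random.gauss(50, sigma_p)) for _ in range(500)]
print(three_estimators(pairs))   # all three should lie close to sigma_p = 2.0
```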

Fig. 1. Position of the mean (d̄) and median (M_d)
of the half-normal curve relative to its standard
deviation σ_d. M_d = 0.675 σ_d; d̄ = 0.798 σ_d
The foregoing arguments are based on many pairs of duplicate values selected from a
single normal population. In practical analytical terms this would correspond to values
drawn at random in pairs from many analytical results on the same sample, which would have
no advantage over simply determining the standard deviation on a set of replicates. The
information that is required in practice is the precision of the method over the whole
concentration range of the samples, including any variation caused by matrix variations in
the samples. This information can be obtained from duplicate results on a large number of
samples. In statistical terms it is equivalent to drawing two values at random from each
of R different populations N(μ_i, σ_i), i = 1, R. Even if all the means were equal, there would
still be variations in σ_i from sample to sample because of variations in the matrix of the
substance being determined. (In the analysis of stream sediments, soils, etc., this variation
is likely to be considerable.) However, it does not invalidate the conclusions drawn previously
from consideration of statistics on a single sample, because the measured analytical variance
will simply include variance due to variation of the sample matrix.
VARIATION OF PRECISION WITH CONCENTRATION-
In any analytical system the precision of determination will vary over the concentration
range of the samples. This fact must be taken into account if the concentration range is
wide. In a system in which the substance determined occurs in the same matrix in all the
samples, the standard deviation of measurement (σ_c) will increase with concentration, usually
as a linear function.* This can be expressed as

σ_c = σ_0 + kc      ..      (1)

where σ_0 is the standard deviation at zero concentration, c is the true concentration (equivalent
to μ above) and k is a constant for the system. In terms of precision this relationship becomes

P_c = 200 σ_0/c + k_p      ..      (2)

where k_p = 200 k. The detection limit, obtained by substituting P_c = 100 per cent., is
made quantitative as

c = 2σ_0/(1 - k_p/100)      ..      (3)

In almost all analytical methods k_p is much less than 100, so that the detection limit becomes
c = 2σ_0. Thus, the precision falls steadily from 100 per cent. at the detection limit to an
asymptotic value of k_p at high concentrations.
This model of precision variation has been used in all the computer simulations when
samples with a range of concentrations have been considered, and has been confirmed in
the actual analytical systems to which we have applied the method.
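Equations (1) to (3) translate directly into code. The sketch below is ours; the function names are invented and the parameter values are those of the simulated system of Fig. 2 (σ_c = 0.01 + 0.04c).

```python
# Precision model of equations (1) to (3).
def sigma_c(c, sigma0=0.01, k=0.04):
    return sigma0 + k * c                 # equation (1)

def precision(c, sigma0=0.01, k=0.04):
    kp = 200 * k
    return 200 * sigma0 / c + kp          # equation (2), P_c in per cent.

def detection_limit(sigma0=0.01, k=0.04):
    kp = 200 * k
    return 2 * sigma0 / (1 - kp / 100)    # equation (3)

print(detection_limit())   # ~0.0217, close to 2*sigma0 because k_p << 100
print(precision(10.0))     # 8.2; tends to the asymptote k_p = 8 at high c
```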
TABLE I

PRECISION ESTIMATES BASED ON STANDARD DEVIATIONS OF 500 VALUES OF
400 (x_1 - x_2)/(x_1 + x_2) IN A SIMULATED SYSTEM N(10, σ)

σ        P_c = 200 σ/μ        Estimated precision
0.01          0.200                 0.202
0.1           2.00                  2.04
1.0          20.0                  20.2
2.5          50.0                  53.2
5.0         100†                 116
10.0         200                  262
20.0         400                  622

† Detection limit.
A DIRECT PRECISION ESTIMATOR-
The standard deviation estimators previously referred to can be converted into direct
precision estimators by dividing by an estimate of the concentration. For duplicate
determinations this estimate would be the mean value (x_1 + x_2)/2, so the precision estimator
is 400 (x_1 - x_2)/(x_1 + x_2). The deviation of the distribution of this function from the normal
curve is not great when the precision is less than 100, i.e., above the detection limit. This
can be seen in Table I, where the precision of the parent population is compared with estimates
based on values of 400 (x_1 - x_2)/(x_1 + x_2) selected at random.

* Deviations from this rule can occur if measurements are made beyond the normal analytical range,
e.g., on non-linear parts of calibration graphs.
This estimator requires more calculation than the graphical plotting of differences, but can
be of use when direct precision values are required and the duplicate data are handled by
computer, as in some larger projects.
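Where the data are computer-handled, the estimator can simply be evaluated per pair. A minimal sketch of ours follows; the scaling constant is taken as printed above and the duplicate results are invented.

```python
# Evaluate the direct precision estimator for each duplicate pair in a batch.
def direct_precision(pairs):
    return [400 * (x1 - x2) / (x1 + x2) for x1, x2 in pairs]

# Invented duplicate results for illustration.
pairs = [(10.2, 9.9), (25.0, 24.1), (8.8, 9.4)]
print(direct_precision(pairs))
```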
SIMULATION TECHNIQUES-
In order to obtain a general impression of the basic form of the frequency distributions
of |x_1 - x_2| and 2|x_1 - x_2|/(x_1 + x_2), a large-sample Monte Carlo method was used. Values
of c were taken at fixed intervals over the concentration range of interest and used to generate
pairs of values selected at random from populations of the form N(c, σ_c), where σ_c is defined
by substituting appropriate values of k and σ_0 in equation (1). Fig. 2 shows a typical form of
the frequency distribution of the relative difference 2|x_1 - x_2|/(x_1 + x_2) with concentration.

Fig. 2. Isometric view of the frequency distribution of relative difference (RD),
2|x_1 - x_2|/(x_1 + x_2), as a function of concentration c (p.p.m.) for a simulated analytical
system (σ_c = 0.01 + 0.04c). The vertical axis corresponds to relative frequency

In a typical set of analytical samples, however, the concentrations will not be uniformly
distributed but will be less frequent towards the extremes of the range. In geochemical
samples the distribution of trace-element concentrations often tends towards the log-normal
[where y = log (c) is normally distributed], and such a model has been used in producing
a simulated population of duplicate values for testing the estimation methods. A number
of values (R) were picked at random from a suitable population F(y) = N(μ, σ), and were
transformed as

c_i = 10^y_i, i = 1, R

to create a set of parent concentrations (c_i). A pair of values (x_1, x_2) were then generated
from the R populations N(c_i, σ_ci), i = 1, R, to give a set of R duplicates very similar to those
which would be obtained from an actual set of samples, with the exception that there would
be no gross errors, as would occasionally be encountered in practice. These duplicate values
were then used to test the various methods of deriving the concentration - precision
relationship, which was, of course, pre-determined.
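The generation scheme just described is straightforward to reproduce. A hedged sketch of ours follows; μ, σ, σ_0, k and the function name are illustrative, with σ_c following equation (1).

```python
# Simulate R duplicate pairs: parent concentrations are log-normal
# (y ~ N(mu, sigma), c = 10**y); each pair is drawn from N(c_i, sigma_ci)
# with sigma_c = sigma0 + k*c as in equation (1). No gross errors are added.
import random

def simulate_duplicates(R, mu=1.0, sigma=0.5, sigma0=0.01, k=0.04, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(R):
        c = 10 ** rng.gauss(mu, sigma)
        s = sigma0 + k * c
        pairs.append((rng.gauss(c, s), rng.gauss(c, s)))
    return pairs

print(simulate_duplicates(3))
```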
RESULTS OBTAINED FROM SIMULATED VALUES-
Computational method-This method is based on the standard deviation estimator S_d/√2.
Sets of duplicate values were simulated as described above, and sorted into increasing order
of concentration, as estimated by the value of (x_1 + x_2)/2. These values were then taken
in successive groups of twelve and within each group the mean value of (x_1 + x_2)/2 and
the corresponding value of S_d/√2 were calculated. The relationship between the estimated
means and the estimated standard deviations was then found by linear regression. The
results obtained corresponded closely with the original generating function, as shown in
Table II. In practice, this method would be most appropriate in circumstances when the
analytical data are processed by use of a computer, i.e., when long runs of data are produced
automatically at a high rate.
TABLE II

THE ACTUAL AND ESTIMATED RELATIONSHIP BETWEEN c AND σ_c
IN TWO SIMULATED SYSTEMS

            System 1                    System 2
        Actual    Estimated        Actual    Estimated
σ_0      1.00       0.92            2.50       2.03
k        0.0100     0.0095          0.100      0.121
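A sketch of the computational method under the same assumptions (ours: groups of twelve, ordinary least-squares regression; the data generator repeats the earlier illustrative settings).

```python
# Sort pairs by (x1+x2)/2, take successive groups of twelve, compute the group
# mean concentration and S_d/sqrt(2) = sqrt(sum(d^2)/(2*group)), then fit
# sigma_c = sigma0 + k*c by least squares.
import math, random

def fit_precision(pairs, group=12):
    pairs = sorted(pairs, key=lambda p: (p[0] + p[1]) / 2)
    xs, ys = [], []
    for i in range(0, len(pairs) - group + 1, group):
        chunk = pairs[i:i + group]
        xs.append(sum((a + b) / 2 for a, b in chunk) / group)
        ys.append(math.sqrt(sum((a - b) ** 2 for a, b in chunk) / (2 * group)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - k * mx, k   # estimates of sigma0 and k

rng = random.Random(0)
pairs = []
for _ in range(600):
    c = 10 ** rng.gauss(1.0, 0.5)
    pairs.append((rng.gauss(c, 0.01 + 0.04 * c), rng.gauss(c, 0.01 + 0.04 * c)))
print(fit_precision(pairs))   # slope near k = 0.04; the small intercept is noisier
```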

A more simple method requires the analysis of the regression of |x_1 - x_2| on (x_1 + x_2)/2.
The regression coefficients are divided by 1.1284 to give the relationship between σ_c and c
directly. However, this method would be more sensitive to the effects of gross errors than
would the median technique. Its use is illustrated in Fig. 3, where it was used to derive
the relationship shown.

Fig. 3. Values of |x_1 - x_2| versus (x_1 + x_2)/2 per cent., obtained
by spectrographic determinations of potassium, also showing the
medians (- - -) and functional relationship (-) S_c = 0.001 + 0.103c,
estimated by linear regression
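The simpler variant is a one-step regression. In this sketch of ours the 1.1284 divisor is the paper's (mean |x_1 - x_2| = 1.128 σ_c); the data settings are the same illustrative ones as before.

```python
# Regress |x1 - x2| on (x1 + x2)/2 and divide both coefficients by 1.1284
# to obtain sigma_c = sigma0 + k*c directly.
import random

def regress_abs_diff(pairs):
    xs = [(a + b) / 2 for a, b in pairs]
    ys = [abs(a - b) for a, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return intercept / 1.1284, slope / 1.1284   # sigma0, k

rng = random.Random(3)
pairs = []
for _ in range(500):
    c = 10 ** rng.gauss(1.0, 0.5)
    pairs.append((rng.gauss(c, 0.01 + 0.04 * c), rng.gauss(c, 0.01 + 0.04 * c)))
print(regress_abs_diff(pairs))   # slope near 0.04; intercept near 0.01 but noisier
```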

Graphical method-In this method a graph of the mean estimator (x_1 + x_2)/2 versus the
standard deviation estimator |x_1 - x_2| is plotted on log - log graph paper* for each (x_1, x_2) pair.
The points are divided into successive concentration ranges, each containing 10 to 20 points,
and the median value of |x_1 - x_2| in each range is marked. A curve fitted (by eye) through
the medians would be taken as showing the relationship between c and σ_c. Fig. 4 shows
such a graph with the medians drawn in; the line representing the generating relationship
between c and σ_c is shown for comparison purposes.
* Linear graphs can be used if the concentration range is small.
The speed and success of this method render it particularly suitable for routine work
when, as is normally the practice, the analyst writes down and checks the results. Another
advantage of this method in practice stems from the fact that the median value is only
slightly affected by extreme values such as would be caused by a gross error.

Fig. 4. [Log - log plot of |x_1 - x_2| versus (x_1 + x_2)/2 for simulated duplicate
values, showing the range medians and the line representing the generating
relationship between c and σ_c]
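A computational analogue of the graphical procedure is easy to state. The sketch below is ours; since the median of |x_1 - x_2| is 0.955 σ_c, the tabulated medians can be read almost directly as σ_c, which is what makes the plotted curve so convenient.

```python
# Sort pairs by mean concentration, split into ranges of 10 to 20 points, and
# take the median |x1 - x2| within each range; the medians track sigma_c.
import random, statistics

def range_medians(pairs, per_range=15):
    pairs = sorted(pairs, key=lambda p: (p[0] + p[1]) / 2)
    out = []
    for i in range(0, len(pairs) - per_range + 1, per_range):
        chunk = pairs[i:i + per_range]
        c_mid = statistics.median((a + b) / 2 for a, b in chunk)
        med_d = statistics.median(abs(a - b) for a, b in chunk)
        out.append((c_mid, med_d))
    return out

rng = random.Random(4)
pairs = []
for _ in range(150):
    c = 10 ** rng.gauss(1.0, 0.5)
    pairs.append((rng.gauss(c, 0.01 + 0.04 * c), rng.gauss(c, 0.01 + 0.04 * c)))
for c_mid, med in range_medians(pairs):
    print(f"{c_mid:8.3f}  {med:8.4f}")   # med ~ 0.955 * (0.01 + 0.04 * c_mid)
```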

SOME PRACTICAL EXAMPLES-
In Fig. 5, the application of the graphical method given above to a rapid atomic-absorption
method for determining zinc in stream sediments is illustrated, in a project in which every
tenth sample is determined in duplicate. This chart shows a feature not encountered in
the computer simulations, owing to the limited resolution of the digital read-out of the
instrument, viz., that the observed differences between duplicates consist mainly of small
multiples of the last readable figure. This does not affect the median if differences of zero
are plotted as half of the last readable figure.
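The zero-difference rule is simple to apply when plotting. A tiny sketch of ours; the resolution value is illustrative.

```python
# Plot zero differences as half of the last readable figure so that range
# medians are not dragged to zero by the limited read-out resolution.
def plotted_difference(x1, x2, resolution=1.0):
    d = abs(x1 - x2)
    return d if d > 0 else resolution / 2

print(plotted_difference(120.0, 120.0))   # 0.5 rather than 0
```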
This artifact does not appear in Fig. 3, in which the results derived from the deter-
mination of potassium by use of a direct-reading spectrograph are plotted. In this instance
the resolution of the read-out is much lower than the detection limit for the method.
CONTROL OF PRECISION-
When large numbers of duplicate pairs are available it is easy to obtain the relationship
between σ_c and c over the concentration range. In batches containing less than 200 samples,
when only 10 per cent. of the samples are analysed in duplicate, a good estimate of this
relationship cannot be made from the relatively few points scattered over the concentration
range. However, it is much easier to ascertain whether the points conform to some previously
decided standard of precision, either selected by some external requirement, or by previous
experience with the method. This conformity can be determined by drawing various
percentiles calculated from the arbitrary precision standard on the chart before plotting the
points. The method is illustrated in Fig. 6, which shows a set of data which have better
precision than the arbitrary standard of P_c = 20 per cent., as delineated by the 90th percentile
value as a function of concentration.
The combination most useful for precision control is that of the 90th and 99th percentiles,
which enables the analyst to establish immediately whether his set of data is conforming
to the arbitrary standard, and also whether any points are present that probably belong to
a different population from the remainder (i.e., gross errors).
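A hedged sketch of such a chart in code (ours, not the authors' charts): for a target precision P_c, the difference at concentration c is normal with σ_d = √2 (P_c/200)c, so the 90th and 99th percentiles of |x_1 - x_2| are 1.645 σ_d and 2.576 σ_d, and each plotted point can be checked against them.

```python
# Control limits for |x1 - x2| under an arbitrary precision standard P_c (per
# cent.): sigma_c = (P_c/200)*c and sigma_d = sqrt(2)*sigma_c, so the 90th and
# 99th percentiles of the half-normal |d| are 1.645 and 2.576 times sigma_d.
def control_limits(c, pc=20.0):
    sigma_d = 2 ** 0.5 * (pc / 200.0) * c
    return 1.645 * sigma_d, 2.576 * sigma_d

def check_pair(x1, x2, pc=20.0):
    c = (x1 + x2) / 2
    p90, p99 = control_limits(c, pc)
    d = abs(x1 - x2)
    return "ok" if d <= p90 else ("check" if d <= p99 else "gross error?")

print(check_pair(102.0, 98.0))   # well within a 20 per cent. standard
print(check_pair(140.0, 60.0))   # flagged as a probable gross error
```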
Fig. 5. Values of |x_1 - x_2| versus (x_1 + x_2)/2
p.p.m., obtained for atomic-absorption determinations
of zinc, also showing the medians (- - -) and graphically
estimated functional relationship (-) S_c = 3.0 + 0.033c

CONCLUSION
An assumption inherent in this approach has been that the characteristics of the system
are invariant within the time period of the observations. Thus, the method would be invalid
if applied to sets of duplicate pairs in which significant systematic bias was present between
the corresponding results. However, random variations on a time scale that was short in
relation to the sampling interval would merely contribute to the over-all variability.

Fig. 6. Values of |x_1 - x_2| versus (x_1 + x_2)/2
parts per 10⁹, obtained by atomic-absorption determinations
of mercury, also showing the 90th and 99th
percentiles as a function of concentration for P_c =
20 per cent.

For example, if all the duplicate determinations were made within a batch of analyses
the method would give an estimate only of the within-batch precision rather than of the
over-all characteristics of the analytical system. If, however, some samples were re-analysed
in a subsequent batch, perhaps after re-setting the instrument, by introduction of a new batch
of reagent, or by a different analyst, then some systematic difference between the corresponding
results might be detectable (by a significance test) superimposed upon the random variations.
In such an instance the methods of estimation outlined would not give a reliable value
for precision. Finally, if the same procedure were used on many successive batches a valid
estimate of the over-all variance would be obtained, consisting of the sum of the within-batch
variances and the variance due to any systematic differences between batches.

The project of which this work forms a part is supported by a grant from the Natural
Environment Research Council for an investigation under the direction of Professor J. S.
Webb into the applicability of statistical techniques to the interpretation of regional
geochemical data. Computer time has been provided by the Imperial College Computer Centre.
The authors thank the referees and Dr. P. J. Brown of the Department of Mathematics,
Imperial College, for their helpful comments on the statistics.

REFERENCES
1. Craven, C. A. U., Trans. Instn Min. Metall., 1953-4, 63, 651.
2. Stern, J. E., “A Statistical Problem in Geochemical Prospecting,” M.Sc. Thesis, University of
   London, 1959.
3. Youden, W. J., “Statistical Methods for Chemists,” Chapman and Hall, London, 1951, p. 16.
4. Dean, R. B., and Dixon, W. J., Analyt. Chem., 1951, 23, 636.
5. Eckschlager, K., translated by Chalmers, R. A., “Errors, Measurements and Results in Chemical
   Analysis,” Van Nostrand Reinhold Co. Ltd., London, 1969, p. 89.
6. Elandt, R. C., Technometrics, 1961, 3, 551.
7. Kendall, M. G., and Stuart, A., “The Advanced Theory of Statistics,” Volume 1, Griffin, London,
   1969, pp. 235 to 243.

Received September 11th, 1972
Accepted November 27th, 1972
