Lecture Notes in Statistics

Article · January 1996


DOI: 10.1007/978-1-4612-2380-1_17


Robust Statistical Methods in
Interlaboratory Analytical Studies∗
Peter Lischer
ConStat Consulting, CH-3095-Spiegel b. Bern, Switzerland

Abstract

An interlaboratory analytical study is the general term for an experiment
organised by a committee and involving several laboratories to achieve a
common goal. Two important types are method-performance studies and
laboratory-performance studies. The purpose of a method-performance study
is to determine the precision and bias characteristics of an analytical test
method. A laboratory-performance study ascertains whether the laboratories
conform to stated standards in their testing activities. An iterative and a
non-iterative method for calculating the estimates in a method-performance
study are presented, and a new method based on a score function allows the
performance of laboratories to be characterised both as groups and
individually. This score is a squared Mahalanobis distance with robust
estimates of means and covariances. In determining the latter, the specific
structure of the interlaboratory-test data is taken into account. Instructive
graphical displays support the classification of the laboratories.

Key Words and phrases: Interlaboratory studies, robust distance, robust es-
timation of components of variance, multivariate outlier.
AMS 1991 subject classifications: Primary 62F35; secondary 62J10.

1 Introduction
In order for the results of analytical chemical measurements to be meaningful, pro-
cedures must be well developed enough that a reanalysis does not drastically change
the conclusions, and well enough specified that different laboratories will achieve
similar conclusions for the same sample. This means that there has to be a
standard, i.e. a written document that lays down in full detail how the test
should be carried out. A standardised method has to be robust; that is, small
variations in the procedure should not produce unexpectedly large changes in
the results (ISO, 1987). Analytical methods and laboratory competence have to
be tested in an interlaboratory study. In such a study, several samples to be
analysed are divided, and a part of each sample is sent to each of a number of
laboratories. The resulting data are analysed by the referee to yield not only
estimates for replication error and for laboratory bias but also the necessary
information about the laboratory performance of all participating laboratories.

∗ This paper won the 1995 W.J. Youden Award in Interlaboratory Testing from the
American Statistical Association.

2 Method-performance studies

2.1 The statistical model


Most method-performance studies consider trials involving $n$ laboratories, each
of which analyses a specimen $p$ times (uniform-level experiment) or performs a
split-level test (see below). The procedure is repeated for a number of
specimens. Let us first consider uniform-level experiments. Then for one
particular specimen, every single test result $y_{ij}$, $i = 1, 2, \dots, n$;
$j = 1, 2, \dots, p$, is the sum of three components:

$$y_{ij} = m + b_i + e_{ij},$$

where $m$ is the true (or consensus) value, $b_i$ is the laboratory bias with
variance $\sigma_L^2$ and $e_{ij}$ is the replication error with variance
$\sigma_r^2$. The $b_i$ and $e_{ij}$ are assumed to be uncorrelated and centred.
The parameter $\sigma_r$ is called the repeatability standard deviation and
$\sigma_R = (\sigma_L^2 + \sigma_r^2)^{1/2}$ the reproducibility standard
deviation. Repeatability and reproducibility are the traditional precision
parameters in chemistry (ISO, 1987).
The repeatability (r) of the method is the value below which the absolute difference
between two single analytical results obtained with the same method on identical
sample material and under constant conditions as regards laboratory, analyst, appa-
ratus, chemicals and interval of time, is expected (with 95% confidence) to lie. The
reproducibility (R) is the value below which the absolute difference between two
single analytical results obtained with the same method on identical sample mate-
rial and under different conditions of laboratory, analyst, apparatus, chemicals and
interval of time, is expected (with 95% confidence) to lie. For normally distributed
errors we have
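$$r = \sqrt{2}\,\Phi^{-1}(0.975)\,\sigma_r$$
$$R = \sqrt{2}\,\Phi^{-1}(0.975)\,\sqrt{\sigma_L^2 + \sigma_r^2},$$

where $\Phi(z)$ is the cumulative standard normal distribution function. The
measurements $y_{ij}$ are not all uncorrelated. We have

$$\operatorname{Cov}(y_{ij}, y_{kl}) =
\begin{cases}
\sigma_L^2 + \sigma_r^2, & \text{if } (i, j) = (k, l) \\
\sigma_L^2, & \text{if } i = k \text{ and } j \neq l \\
0, & \text{if } i \neq k
\end{cases}$$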
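
As an illustration of the two limits, the following minimal Python sketch evaluates r and R for given standard deviations. The function name `precision_limits` and its `level` parameter are hypothetical and not part of the paper; the input values are the SLB estimates for Sample 2 reported in Table 2.

```python
from math import sqrt
from statistics import NormalDist


def precision_limits(sigma_r, sigma_L, level=0.95):
    """Return (r, R) for normally distributed errors.

    Hypothetical helper: r and R are the limits below which the absolute
    difference of two single results is expected to lie with the given level.
    """
    # Two-sided standard normal quantile, i.e. Phi^{-1}(0.975) for level = 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sigma_R = sqrt(sigma_L**2 + sigma_r**2)
    # The difference of two results has standard deviation sqrt(2) * sigma.
    return sqrt(2) * z * sigma_r, sqrt(2) * z * sigma_R


# SLB estimates for Sample 2 in Table 2 (mg/l).
r, R = precision_limits(sigma_r=0.21, sigma_L=1.45)
print(f"r = {r:.2f}, R = {R:.2f}")
```

The factor $\sqrt{2}\,\Phi^{-1}(0.975)$ is about 2.77, so both limits are simply scaled versions of $\sigma_r$ and $\sigma_R$.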

A drawback of the uniform-level design is that the operator, when testing
identical samples successively, may be influenced by the result of the first
test. To prevent this, an alternative split-level design may be used. In this
procedure, instead of using two samples that the operator has been told are
identical, or performing two tests on the same specimen of material, two series
of $n$ samples are prepared at slightly different levels $m + \Delta$ and
$m - \Delta$ (where $\Delta$ is small), and each of the $n$ laboratories
receives one sample of series 1 and one sample of series 2 for testing. The
values of $\sigma_r$ and $\sigma_R$ derived from a split-level experiment are
valid for the mean level $m$.
The aim of a method-performance study consists of finding estimates for the pre-
cision parameters σr and σR which are characteristic of the particular method and
not only of the specific study. To achieve this aim, the following conditions must be
fulfilled:

a) the participating laboratories and the samples used must be representative of
the planned application;
b) the determination of the precision data σr and σR must be unambiguous and
individual extreme results must be taken into account appropriately.

The conditions under a) are relatively easy to meet, although interlaboratory studies
often have to be conducted with volunteers instead of randomly selected participants.
On the other hand, the conditions in b) raise problems which often have not been
satisfactorily solved up to now. The classical analysis of variance supposes normally
distributed errors. However, every analytical chemist knows that deviant or suspect
results occur much more frequently than the normal distribution would predict.
There are many reasons for this; it is enough if just one parameter of the analytical
process is not completely under control. Since very few suspect values deviate by
an order of magnitude, it is often difficult to decide whether the suspect value
should be regarded as valid. In evaluating method-performance studies, extreme
results or all results obtained from a suspect laboratory are often eliminated before
the analysis of variance is conducted, in order to avoid excessively high values for
repeatability and reproducibility and, hence, a bad evaluation of the precision of
the method. But any such elimination inevitably entails the risk of underestimating
laboratory bias and replication error. An international convention about outlier
tests to be used was adopted (Horwitz, 1988). It does not, however, change the
unsatisfactory ’either-or’ situation, which is typical for all outlier tests: as soon as
the conditions for elimination are fulfilled, the value of the desired quantity changes
abruptly. Moreover, the proposed Cochran and Grubbs tests are far from the best
possible ones; e.g. the Grubbs test cannot even safely reject two distant outliers
out of 20 (Hampel, 1985). On the other hand, 30% outliers in method-performance
studies are rather the rule than the exception (Horwitz & Albert, 1986). These and
other unsatisfactory features of outlier tests led the Swiss Federal Committee for
Official Methods in Food Analysis (Lischer, 1987; SLB, 1989) and the Analytical
Methods Committee of the Royal Society of Chemistry (AMC, 1989) to propose
solutions which are similar. Instead of the hitherto usual outlier tests they use robust
statistical methods to calculate σr and σR . The underlying principle is Huber’s
proposal 2 (Huber, 1964). In the following we present two methods to estimate the
scale parameters σr and σR , the official SLB-method (SLB, 1989) and an alternative
method inspired by Rousseeuw’s scale estimator Qn (Rousseeuw & Croux, 1993).

2.2 The SLB-method for a uniform-level experiment


This method consists of three steps.

1) Estimation of the laboratory means $m + b_i$ by $m_i$. The $m_i$ are the
solutions of the following system of equations:

$$\sum_{j=1}^{p} \psi_c\!\left(\frac{y_{ij} - m_i}{S^*}\right) = 0, \qquad i = 1, 2, \dots, n,$$

where $\psi_c(t) = \max(-c, \min(t, c))$ and
$S^* = 1.4826\,\operatorname{med}_{ij}\{|y_{ij} - \operatorname{med}_j y_{ij}|\}$.

2) Estimation of $\sigma_r$ by $S_r$. $S_r$ is the solution of

$$\sum_{i=1}^{n} \sum_{j=1}^{p} \psi_c^2\!\left(\frac{y_{ij} - m_i}{S_r}\right) = n(p - 1)\beta,$$

where $\beta = \int \psi_c^2(z)\, d\Phi(z)$.

3) Estimation of $m$ and $\sigma_L^2$. The solution $T$ of

$$\sum_{i=1}^{n} \psi_c\!\left(\frac{m_i - T}{\mathrm{MAD}_n}\right) = 0,$$

where $\mathrm{MAD}_n = 1.4826\,\operatorname{med}_i\{|m_i - \operatorname{med}_i m_i|\}$,
is a consistent estimate of $m$. If $T$ is put in

$$\sum_{i=1}^{n} \psi_c^2\!\left(\frac{m_i - T}{S}\right) = (n - 1)\beta$$

and solved for $S$, we get a consistent estimate of
$(\sigma_L^2 + \sigma_r^2/p)^{1/2}$. Furthermore, $S_L^2 = S^2 - S_r^2/p$ is a
consistent estimate of the interlaboratory variance $\sigma_L^2$. If
$S^2 - S_r^2/p < 0$, then we put $S_L = 0$.
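
The three steps can be sketched in Python as follows. This is only an illustrative sketch, not the official SLB implementation: the tuning constant `C = 1.5`, the simple fixed-point iterations with a fixed number of passes, the starting values and all function names are assumptions, since the text does not specify them. The closed form used for $\beta$ is the standard evaluation of $\int \psi_c^2\, d\Phi$.

```python
from math import sqrt
from statistics import NormalDist, median

C = 1.5  # assumed tuning constant; the text does not state the value of c


def psi(t, c=C):
    """Huber's psi function: t clipped to the interval [-c, c]."""
    return max(-c, min(t, c))


def beta(c=C):
    """E[psi_c(Z)^2] for a standard normal Z (closed form of the integral)."""
    nd = NormalDist()
    return (2 * nd.cdf(c) - 1) - 2 * c * nd.pdf(c) + 2 * c * c * (1 - nd.cdf(c))


def m_location(values, scale, iters=100):
    """Solve sum(psi((v - T) / scale)) = 0 for T by fixed-point iteration."""
    t = median(values)
    for _ in range(iters):
        t += scale * sum(psi((v - t) / scale) for v in values) / len(values)
    return t


def m_scale(residuals, dof, start, iters=100):
    """Solve sum(psi(r / S)^2) = dof * beta for S by fixed-point iteration."""
    s, b = start, beta()
    for _ in range(iters):
        s *= sqrt(sum(psi(r / s) ** 2 for r in residuals) / (dof * b))
    return s


def slb(data):
    """data: list of n laboratories, each a list of p replicate results."""
    n, p = len(data), len(data[0])
    # Step 1: robust laboratory means m_i with the common auxiliary scale S*.
    s_star = 1.4826 * median(abs(y - median(lab)) for lab in data for y in lab)
    if s_star == 0:
        raise ValueError("degenerate data: auxiliary scale S* is zero")
    m = [m_location(lab, s_star) for lab in data]
    # Step 2: repeatability S_r from the within-laboratory residuals.
    resid = [y - mi for lab, mi in zip(data, m) for y in lab]
    s_r = m_scale(resid, n * (p - 1), s_star)
    # Step 3: location T and scale S of the laboratory means, found separately.
    med_m = median(m)
    mad_n = 1.4826 * median(abs(mi - med_m) for mi in m)
    if mad_n == 0:
        raise ValueError("degenerate data: MAD of laboratory means is zero")
    t = m_location(m, mad_n)
    s = m_scale([mi - t for mi in m], n - 1, mad_n)
    s_L = sqrt(max(s * s - s_r * s_r / p, 0.0))  # S_L = 0 if the difference is negative
    return {"T": t, "S_r": s_r, "S_L": s_L, "S_R": sqrt(s_L**2 + s_r**2)}


# Hypothetical data: five laboratories with two replicates each.
example = [[9.9, 10.1], [10.9, 11.1], [11.9, 12.1], [12.9, 13.1], [13.9, 14.1]]
print(slb(example))
```

Because the value of c and the iteration details are assumed, the sketch should not be expected to reproduce published SLB results digit for digit.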

The two quantities $h_i = (\bar{x}_{i\cdot} - T)/S$ and $k_i = s_i/S_r$, where
$\bar{x}_{i\cdot}$ and $s_i$ are the mean and within-laboratory standard
deviation of laboratory $i$, will later be used to detect laboratories which
have produced unreliable results.
The third step is inspired by Huber’s proposal 2, but location T and scale S are
calculated separately whereas the proposed algorithm of the Analytical Methods
Committee of the Royal Society of Chemistry (AMC, 1989) calculates T and S
simultaneously. Reichenbach (1989) compared the two algorithms in a simulation
study. He showed that the AMC-algorithm converges slowly and that the breakdown
point is lower than 25% for moderate sample sizes. This is too small as 30% outliers
are not uncommon in interlaboratory trials. A procedure for evaluating such trials
should allow two bad laboratories out of eight. The SLB-procedure with separate
determination of location and scale has better convergence properties, a comparable
relative efficiency and a breakdown point of ≈ 30%.

2.3 An alternative non-iterative method


The scale estimator used in SLB also has some drawbacks. It takes a symmetric
view of dispersion: first a central value is determined, and then equal
importance is attached to positive and negative deviations from it, which does
not seem to be a natural approach for asymmetric distributions. Further, the
finite-sample breakdown points are rather low and convergence near the
breakdown point can be slow (Reichenbach, 1989). Finally, it is an iterative
procedure, whereas practical chemists prefer explicit formulas. A non-iterative
procedure to estimate the precision parameters $\sigma_r$ and $\sigma_R$ with a
50% breakdown point and a high relative efficiency is shown below. It is based
on the $Q_n$ estimator proposed by Rousseeuw & Croux (1993). This estimator is
given by the 0.25-quantile of the interpoint distances. Let
$\mathbf{x} = (x_1, x_2, \dots, x_n)$ be a set of $n$ observations. Vectors and
matrices will be denoted by boldface throughout. Then

$$Q_n(\mathbf{x}) = f_n \cdot 2.2191\,\{|x_i - x_j|;\ i < j\}_{(k)},$$

where $k = \binom{h}{2} \approx \binom{n}{2}/4$ and $h = \lfloor n/2 \rfloor + 1$
is roughly half the number of observations.
The constant $f_n$ is a small-sample correction factor. The scale estimator
$Q_n$ does not need any location estimate. Instead of measuring how far away the
observations are from a central value, $Q_n$ looks at a typical distance between
observations, which is still valid for asymmetric distributions. The Gaussian
efficiency of $Q_n$ is 82%.
In the case of a split-level design, $Q_n$ immediately yields estimates for
$\sigma_r$ and $\sigma_R$. Let $\{(y_{i1}, y_{i2}),\ i = 1, 2, \dots, n\}$ be the
results of the experiment, $\mathbf{v} = \{(y_{i1} + y_{i2})/2\}$ and
$\mathbf{w} = \{(y_{i1} - y_{i2})/2\}$, so that
$\operatorname{Var}[v_i] = \sigma_L^2 + \sigma_r^2/2$ and
$\operatorname{Var}[w_i] = \sigma_r^2/2$. Then

$$\hat\sigma_r = \sqrt{2}\,Q_n(\mathbf{w}) \tag{1}$$
$$\hat\sigma_R = \sqrt{\hat\sigma_L^2 + \hat\sigma_r^2} = \sqrt{Q_n^2(\mathbf{v}) + Q_n^2(\mathbf{w})} \tag{2}$$
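
A minimal Python sketch of the $Q_n$ estimator and of equations (1) and (2) is given below. The function names are hypothetical, the correction factor `f_n` defaults to 1.0 as a placeholder because its values are not given in the text, and the split-level pairs are invented for illustration. The direct enumeration of all interpoint distances is quadratic in n, which is unproblematic for the small n of interlaboratory studies.

```python
from math import comb, sqrt


def qn(x, f_n=1.0):
    """Qn scale estimate: 2.2191 times the k-th smallest interpoint distance.

    f_n is the small-sample correction factor; 1.0 is a placeholder
    assumption because the text does not list its values.
    """
    n = len(x)
    h = n // 2 + 1
    k = comb(h, 2)
    dists = sorted(abs(x[i] - x[j]) for i in range(n) for j in range(i + 1, n))
    return f_n * 2.2191 * dists[k - 1]


def split_level(pairs):
    """Estimate (sigma_r, sigma_R) from pairs (y_i1, y_i2), one per laboratory."""
    v = [(a + b) / 2 for a, b in pairs]  # half sums: Var = sigma_L^2 + sigma_r^2 / 2
    w = [(a - b) / 2 for a, b in pairs]  # half differences: Var = sigma_r^2 / 2
    sigma_r = sqrt(2) * qn(w)
    sigma_R = sqrt(qn(v) ** 2 + qn(w) ** 2)
    return sigma_r, sigma_R


# Hypothetical split-level results of five laboratories.
pairs = [(10.5, 9.5), (11.5, 10.7), (12.5, 11.3), (13.5, 12.9), (14.5, 13.1)]
print(split_level(pairs))
```

Taking half differences removes both the level $m$ and the laboratory bias $b_i$, so `qn(w)` depends on the replication error only; the constant offset $\Delta$ does not matter because $Q_n$ uses differences between observations.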

For a uniform-level experiment with $p \ge 2$ replicates the procedure is
similar. Let $d_R$ be the 0.25-quantile of the absolute differences

$$\{|y_{ij} - y_{kl}|;\ i = 1, 2, \dots, n-1;\ j = 1, 2, \dots, p;\ k = i+1, i+2, \dots, n;\ l = 1, 2, \dots, p\}$$

and $d_r$ the 0.25-quantile of the absolute differences

$$\{|(y_{ij} - y_{ik}) - (y_{hl} - y_{hm})|;\ i = 1, 2, \dots, n-1;\ j < k;\ h = i+1, i+2, \dots, n;\ l < m\}.$$

Then

$$\hat\sigma_r = 2.2191\, d_r / \sqrt{2}$$

and

$$\hat\sigma_R = 2.2191\, d_R.$$
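
The uniform-level variant can be sketched in the same way. In this hedged Python example the 0.25-quantile is taken as the ceil(N/4)-th order statistic, which is an assumed convention because the text does not fix one; the function names and the data are hypothetical.

```python
from itertools import combinations
from math import ceil, sqrt


def lower_quartile(values):
    """0.25-quantile taken as the ceil(N/4)-th order statistic (assumed convention)."""
    ordered = sorted(values)
    return ordered[ceil(len(ordered) / 4) - 1]


def uniform_level(data):
    """data: list of laboratories, each a list of p >= 2 replicate results."""
    # Between-laboratory differences |y_ij - y_kl| for laboratories i < k.
    between = [abs(a - b)
               for lab_i, lab_k in combinations(data, 2)
               for a in lab_i for b in lab_k]
    # Within-laboratory differences y_ij - y_ik (j < k), one list per laboratory.
    within = [[a - b for a, b in combinations(lab, 2)] for lab in data]
    # Differences of within-laboratory differences across laboratories i < h.
    cross = [abs(d - e)
             for diffs_i, diffs_h in combinations(within, 2)
             for d in diffs_i for e in diffs_h]
    sigma_r = 2.2191 * lower_quartile(cross) / sqrt(2)
    sigma_R = 2.2191 * lower_quartile(between)
    return sigma_r, sigma_R


# Hypothetical data: four laboratories with two replicates each.
data = [[1.0, 1.2], [2.0, 2.1], [3.0, 3.4], [4.0, 4.3]]
print(uniform_level(data))
```

The division by $\sqrt{2}$ reflects that a difference of two within-laboratory differences has variance $4\sigma_r^2$, twice that of a difference of two single results with variance $\sigma_r^2$ each.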

Sample                      Laboratories
             1      2      3      4      5      6      7      8
(1)        4.2    3.1    3.2    3.2    3.2    3.2    3.2    3.2
(1)        4.4    3.1    3.2    3.1    3.1    3.3    3.2    2.7
(2)       26.2   26.5   27.0   26.8   26.4   28.8   28.2   26.0
(2)       26.0   26.6   27.2   26.5   26.2   28.0   28.2   25.9
(3)       48.5   44.4   46.4   45.7   44.1   48.8   45.1   45.5
(3)       48.3   44.5   46.6   46.0   45.0   48.5   45.6   49.3

Sample                      Laboratories
             9     10     11     12     13     14     15
(1)        3.0    2.9    3.1    2.6    3.6    3.0    3.1
(1)        3.0    3.1    3.1    2.7    3.5    3.1    3.1
(2)       25.9   26.2   29.6   24.7   29.2   25.1   25.9
(2)       26.0   26.4   30.0   24.1   29.6   25.2   26.0
(3)       43.8   45.0   50.7   45.8   49.0   42.9   44.9
(3)       44.2   45.2   50.6   46.1   50.0   42.9   44.7

Table 1: A trial of determination of nitrate in drinking water [mg/l] for three samples at
different concentration levels performed by fifteen laboratories. Each sample has two
rows, one per replicate.

2.4 Graphical display as an aid to analysis

Mandel (1989) presented a procedure for flagging outliers, based on two
statistics, called h and k. The h-values are calculated independently for each
concentration level: the overall average at that level is subtracted from each
cell average and the difference is divided by the standard deviation. The
h-value measures where a particular laboratory's average lies with respect to
the consensus value. The k-values are also calculated independently at each
level; each is simply the ratio of the within-cell standard deviation to the
pooled value over all laboratories at that level. It is evident that this
non-robust procedure suffers from the masking effect. There is, however, a
simple remedy: instead of Mandel's h and k we use the two quantities
$h_i = (\bar{x}_{i\cdot} - T)/S$ and $k_i = s_i/S_r$ introduced earlier.
We will illustrate this procedure in terms of an interlaboratory study published
partially in the SLB (1989). It deals with the photometric determination of
nitrate in drinking water at three concentration levels. Every laboratory
determined two replicates for each sample (Table 1). The statistical analysis of
the data was done with the SLB- and the $Q_n$-method (Table 2). The estimates do
not differ much. The h-values are displayed in Figure 1 and the k-values in
Figure 2. h- and k-values with absolute values ≤ 2 are traditionally considered
"satisfactory", those between 2 and 3 "questionable" and those ≥ 3
"unsatisfactory".
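
The robust h- and k-values and their classification can be illustrated with a short Python sketch. The function names are hypothetical. The inputs are the two Sample 3 replicates of laboratory 8 from Table 1 and the SLB estimates from Table 2; since S itself is not tabulated, it is reconstructed here from the rounded Table 2 values as $(\hat\sigma_L^2 + \hat\sigma_r^2/p)^{1/2}$, which is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def classify(score):
    """Traditional three-way classification of an h- or k-value."""
    a = abs(score)
    if a <= 2:
        return "satisfactory"
    if a < 3:
        return "questionable"
    return "unsatisfactory"


def robust_h_k(replicates, T, S, S_r):
    """h = (laboratory mean - T) / S and k = within-laboratory sd / S_r."""
    h = (mean(replicates) - T) / S
    k = stdev(replicates) / S_r
    return h, k


# Laboratory 8, Sample 3 (Table 1), with SLB estimates from Table 2.
T, sigma_L, sigma_r, p = 46.18, 2.34, 0.30, 2
S = sqrt(sigma_L**2 + sigma_r**2 / p)  # assumed reconstruction of S
h, k = robust_h_k([45.5, 49.3], T, S, sigma_r)
print(f"h = {h:.2f} ({classify(h)}), k = {k:.2f} ({classify(k)})")
```

For this laboratory the mean is unremarkable while the spread between the two replicates is large relative to $S_r$, so only the k-value is flagged.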
At a glance, we see what is going on: laboratories 1, 11, 12 and 13 have at
least one deviant mean value with |h| > 2, and laboratory 8 has a high
within-laboratory variation for samples 1 and 3 (|k| > 3). The organiser of the
study now has to find out whether there are any shortcomings in the analytical
method.

SLB-method
               µ̂      σ̂L     σ̂r     σ̂R
Sample 1      3.12    0.13    0.08    0.16
Sample 2     26.49    1.45    0.21    1.47
Sample 3     46.18    2.34    0.30    2.36

Qn-method
               µ̂      σ̂L     σ̂r     σ̂R
Sample 1      3.12    0.18    0.13    0.22
Sample 2     26.49    1.31    0.26    1.33
Sample 3     46.18    1.98    0.26    2.00

Table 2: Statistical analysis for nitrate.

3 Laboratory-performance studies

3.1 Robust distances


This type of interlaboratory analytical study is known as proficiency testing. Its aim
is to offer the participating laboratories the opportunity to compare their analytical
results with those of other laboratories. Two distinct aims can be formulated (AMC,
1992):

a) to encourage good performance generally, and especially to encourage the use
of proper routine quality control measures within individual laboratories; to
provide feedback to the laboratories and encourage remedial action where
shortcomings in performance are detected;

b) to provide a rational basis for the selection or licensing of laboratories for a
special task and likewise to disqualify laboratories for a specific task should
their performance fall below a certain standard.

These two aims are somewhat divergent, but the motivation is the same: the iden-
tification of laboratories that produce data of unacceptable quality.
Figure 1: Graph of h-values by laboratories.

Most proficiency testing schemes proceed by comparing the bias estimate
$(x - x_{\mathrm{true}})$ with a target value for the standard deviation that
forms the criterion of performance. An obvious approach is to form z-scores
given by

$$z = \frac{x - x_{\mathrm{true}}}{\sigma},$$

where $\sigma$ is the target value for the standard deviation. If $\hat{x}$ and
$\hat\sigma$ are good estimates of $x_{\mathrm{true}}$ and the standard
deviation $\sigma$, respectively, and if the underlying distribution were
normal, then z would be approximately normally distributed with mean zero and
unit standard deviation. Because z is standardised, it can be usefully compared
between all analytes, test materials and analytical methods. Values of z
obtained from diverse materials and concentration ranges can, therefore, with
due caution, be combined to give a composite score for a laboratory in one round
of a proficiency test. AMC (1992) proposes the sum of squared scores as a
performance criterion of a laboratory. But this composite score does not take
into account the possible correlation structure of the data set.
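
For concreteness, the z-score and the sum of squared scores can be sketched in a few lines of Python. The function names are hypothetical; as stand-ins for the true values and target standard deviations, the example uses the robust estimates $\hat m_j$ and $\hat\sigma_j$ of the first three specimens in Table 3, together with the results of laboratory 1.

```python
def z_score(x, x_true, sigma):
    """Standardised deviation of a result from the assigned value."""
    return (x - x_true) / sigma


def sum_of_squared_scores(results, assigned, targets):
    """Composite score: sum of squared z-scores over all specimens."""
    return sum(z_score(x, m, s) ** 2 for x, m, s in zip(results, assigned, targets))


# Laboratory 1, specimens s1 to s3 of Table 3 (Zn, mg/kg).
results = [1531, 2945, 1264]
assigned = [1527, 2882, 1238]   # robust location estimates from Table 3
targets = [60.0, 271.3, 62.8]   # robust scale estimates from Table 3
print(sum_of_squared_scores(results, assigned, targets))
```

The composite score treats the z-scores of one laboratory as independent, which is exactly the limitation noted above.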
Suppose we have data from $n$ laboratories, each of which analysed $p$
specimens: $\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\} =
\{(x_{11}, x_{12}, \dots, x_{1p}), \dots, (x_{n1}, x_{n2}, \dots, x_{np})\}$,
and we want to diagnose outlying laboratories. The word outlier is applied here
to any $\mathbf{x}_i \in \mathbb{R}^p$ that is markedly different from most of
the others.
The squared Mahalanobis distance for observation $\mathbf{x}_i$ is

$$d_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})\,\mathbf{V}^{-1}\,(\mathbf{x}_i - \bar{\mathbf{x}})^T,$$

where $\bar{\mathbf{x}}$ is the mean of the group and $\mathbf{V}$ is the sample
covariance matrix. Asymptotically the $d_i^2$ follow a chi-squared distribution
with $p$ degrees of freedom. If $\bar{\mathbf{x}}$ and $\mathbf{V}$ were not
estimated but were known population parameters, outlying values of
$\mathbf{x}_i$ would yield large values of the squared distance $d_i^2$.
However, the effect of such values on the estimation of $\bar{\mathbf{x}}$ and
$\mathbf{V}$ leads to the rapid breakdown of the Mahalanobis distance for the
detection of outliers, particularly if several outliers are present. Rousseeuw &
Leroy (1987) suggested calculating robust distances by using the Minimum Volume
Ellipsoid estimator. This estimator has a maximal breakdown point but is not
very efficient. Further, a rule of thumb of Rousseeuw states that there should
be at least five observations per dimension ($n/p \ge 5$). This requirement
seems to be unrealistic for laboratory-performance studies, as ratios
$n/p \approx 2$ are not uncommon. In this case reliable estimates of distances
can be obtained only if the specific structure of the laboratory data is taken
into account. This will be discussed below.

Figure 2: Graph of k-values by laboratories.

3.2 An interlaboratory example

We will illustrate the procedure in terms of a proficiency test carried out at the Swiss
Federal Research Station for Agricultural Chemistry and Hygiene of the Environ-
ment (FAC) with 23 participating laboratories. The primary goal was a laboratory
comparison. A number of dried sewage sludges was analysed for different heavy
metals and P and K. In this numerical example the results of the Zn-analyses are
presented (Table 3).
Each laboratory $L_i$ ($i = 1, \dots, 23$) had to analyse 10 specimens
$S_1, S_2, \dots, S_{10}$. $S_1, S_2, \dots, S_5$ were taken from 5 different
homogenised samples of dried sludges, $S_6, S_7, \dots, S_9$ were mixtures
($S_6$: 50% $S_1$ + 50% $S_2$; $S_7$: 50% $S_3$ + 50% $S_2$; $S_8$: 50% $S_4$ +
50% $S_2$; $S_9$: 50% $S_5$ + 50% $S_2$) and $S_{10}$ was an extracted solution
of $S_1$, produced in a laboratory of the FAC. This specimen was analysed to
obtain an estimate of the effect of the sample preparation for different
laboratories. The mixtures $S_6$, $S_7$, $S_8$ and $S_9$, together with $S_1$,
$S_3$, $S_4$ and $S_5$, allowed four alternative determinations of the true
content of $S_2$ (Table 4). Inconsistencies between the direct and the indirect
determinations of the true concentration of sample $S_2$ indicated a bad
performance of a laboratory. With the four supplementary results for the true
concentration of specimen $S_2$ it was possible to get a more reliable value of
the true content of $S_2$ and to estimate the scale parameters $\sigma_r$ and
$\sigma_R$ for this concentration level.
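
Since each mixture contains 50% of $S_2$, an indirect value for $S_2$ follows by solving the mixing relation, for example $S_2 = 2 S_6 - S_1$. The Python sketch below applies this to one laboratory's row of results; the function name and the use of `None` for missing values are illustrative assumptions, and the numbers are the results of laboratory 1 from Table 3.

```python
def indirect_s2(row):
    """Four indirect determinations of S2 from one laboratory's ten results.

    row: results s1..s10 in order, with None for a missing value.
    """
    s = dict(zip(range(1, 11), row))
    pairs = [(6, 1), (7, 3), (8, 4), (9, 5)]  # (mixture, unmixed specimen)
    return [None if s[mix] is None or s[pure] is None else 2 * s[mix] - s[pure]
            for mix, pure in pairs]


# Laboratory 1 in Table 3 (Zn, mg/kg).
lab1 = [1531, 2945, 1264, 2068, 1150, 2252, 2109, 2526, 2096, 1499]
print(indirect_s2(lab1))  # [2973, 2954, 2984, 3042]
```

These four values agree with the entries $w_1, \dots, w_4$ listed for laboratory 1 in Table 4 and can be compared with its direct result of 2945 for $S_2$.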

                                     Specimens
Labs      s1     s2     s3     s4     s5     s6     s7     s8     s9    s10
  1     1531   2945   1264   2068   1150   2252   2109   2526   2096   1499
  2     1527   2440   1250   2110   1160   2080   2140   2430   2010   1500
  3     1500   2900   1200   1900   1100   2200   1900   2400   2000   1746
  4     1490   2830   1210   2060   1090   2220   2060   2440   2030   1350
  5     1547   2900   1238   2060   1120   2211   2094   2463   1961   1500
  6     1432     --     --     --   1076   2029     --          1813   1368
  7     1393   2466   1131   1886   1011   2040   1985   2321   1133   1430
  8     1579   2954   1188   2240   1229   2313   2172   2642   2160   1524
  9     1520   2950   1240   2080   1080   2250   2070   2520   2010   1514
 10     1653   3199   1369   2489   1182   2537   2565   2729   2371   1648
 11     1577   3030   1264   2113   1129   2296   2191   2676   2129   1600
 12     1536   3014   1260   2089   1143   2304   2113   2513   2089   1616
 13     1500   3150   1220   2030   1070   1940   1780   2370   1940   1520
 14     1512   2779   1252   2089   1120   2245   2042   2517   2000   1485
 15     1458   2688   1214   1896   1122   1618   1836   2160   1849   1655
 16     1500   3160   1160   2150   1065   2180   2058   2482   2003   1428
 17     1525   2865   1245   2125   1120   2215   2180   2600   2065   1495
 18     1488   2720   1713   1968   1044   2143   2016   2344   1992   1448
 19     1464   2857   1187   2024   1073   2153   2024   2413   1932   1576
 20     1600   2400   1300   1900    440   2400   1900   2300   1900   1500
 21     1558   3077   1279   2082   1116   2271   2152   2558   2133   1481
 22     1595   3271   1344   2241   1202   2420   2280   2670   2134     --
 23     1803   2408   1084   2309   1473   2860   2063   2429   1369   1250

 m̂j    1527   2882   1238   2075   1114   2222   2071   2479   2003   1507
 σ̂j    60.0  271.3   62.8  132.5   60.6  153.2  126.3  139.2  128.5  102.5

Table 3: Zn-concentrations [mg/kg] in ten samples of sewage sludges. We note a relatively
high value for $\hat\sigma_2$.

We write the model as

$$x_{ij} = m_j + b_{ij} + e_{ij}, \qquad i = 1, 2, \dots, 23;\ j = 1, 2, \dots, 10,$$

where the $m_j$ are the true (or consensus) values, the $b_{ij}$ are the random
laboratory effects assumed to have mean 0 and variance $\sigma_{Lj}^2$ and the
$e_{ij}$ are the replication errors assumed to have mean 0 and variance
$\sigma_{rj}^2$. In general $\sigma_{Lj}^2$ and $\sigma_{rj}^2$ are
concentration-dependent, but if the same analytical method is used and the
concentration levels are not too different the quotient

$$q = \frac{\sigma_{Lj}^2}{\sigma_{Lj}^2 + \sigma_{rj}^2}$$

can be assumed constant ($0 \le q < 1$).



Labs      s2     w1     w2     w3     w4    rd_i^2
  1     2945   2973   2954   2984   3042       2.1
  2     2440   2633   3030   2750   2860      14.9
  3     2900   2900   2600   2900   2900      31.3
  4     2830   2950   2910   2820   2970       7.0
  5     2900   2875   2950   2866   2802       1.0
  6       --   2626     --     --   2550      15.5
  7     2466   2687   2839   2756   1255      98.0
  8     2954   3047   3156   3044   3091      17.3
  9     2950   2980   2900   2960   2940       1.7
 10     3199   3421   3761   2969   3560      30.7
 11     3030   3015   3118   3239   3129       4.8
 12     3014   3072   2966   2937   3035       2.6
 13     3150   2380   2340   2710   2810      26.0
 14     2779   2978   2832   2945   2880       1.5
 15     2688   1778   2458   2424   2576      63.9
 16     3160   2860   2956   2814   2941      13.4
 17     2865   2905   3115   3075   3010       4.3
 18     2720   2798   2319   2720   2940     203.0
 19     2857   2842   2861   2802   2791       6.6
 20     2400   3200   2500   2700   3360     385.4
 21     3077   2984   3025   3034   3150       4.6
 22     3271   3245   3216   3099   3066       8.9
 23     2408   3917   3042   2549   1265     379.2

 m̂j    2882   2924   2925   2874   2928
 σ̂j   271.3  241.5  259.1  175.8  231.0

Table 4: Direct and indirect determination of the Zn-concentration of specimen $S_2$
($w_1 = 2 s_6 - s_1$, $w_2 = 2 s_7 - s_3$, $w_3 = 2 s_8 - s_4$, $w_4 = 2 s_9 - s_5$)
and squared robust distances $rd_i^2$. $\hat m_j$ and $\hat\sigma_j$ are robust estimates
of location $m_j$ and scale $\sigma_j^2 = \sigma_{Lj}^2 + \sigma_{rj}^2$ determined
with the SLB-algorithm.

Let $z_{ij} = (x_{ij} - \hat m_j)/\hat\sigma_j$, $i = 1, 2, \dots, 23$;
$j = 1, 2, \dots, 10$. We have

$$\operatorname{Cov}(z_{ij}, z_{kl}) =
\begin{cases}
1, & \text{if } (i, j) = (k, l) \\
q, & \text{if } i = k \text{ and } j \neq l \\
0, & \text{if } i \neq k
\end{cases}$$

The data structure of the $z_{ij}$ is the same as the structure of the results
of a uniform-level experiment. Therefore, the covariance matrix $\mathbf{S}$ has
a very simple form: 1 on the diagonal and $q$ elsewhere. With the SLB- or the
$Q_n$-method we immediately get a robust estimate $\hat q$ of $q$, and hence
$\hat{\mathbf{S}}$ of $\mathbf{S}$, and

$$rd_i^2 = \mathbf{z}_i\,\hat{\mathbf{S}}^{-1}\,\mathbf{z}_i^T$$

is the squared robust distance of laboratory $i$.
The critical value is $\chi^2_{10,\,0.975} = 20.48$. Laboratories with values of
$rd_i^2 > 20.48$ must be considered unreliable.
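
Because the covariance matrix has 1 on the diagonal and $q$ elsewhere, its inverse is available in closed form and the robust distance needs no numerical matrix inversion. The Python sketch below uses this; the function names, the value q = 0.5 and the two score vectors are hypothetical and only illustrate the effect of the correlation structure, and missing values are not handled.

```python
def squared_robust_distance(z, q):
    """z S^-1 z^T for S with 1 on the diagonal and q elsewhere (0 <= q < 1)."""
    p = len(z)
    total = sum(z)
    total_sq = sum(v * v for v in z)
    # Closed-form inverse of S = (1 - q) I + q J, where J is the all-ones matrix.
    return (total_sq - q * total**2 / (1 + (p - 1) * q)) / (1 - q)


CRITICAL = 20.48  # chi-squared 0.975 quantile with 10 degrees of freedom (from the text)


def is_unreliable(z, q):
    """Flag a laboratory whose squared robust distance exceeds the critical value."""
    return squared_robust_distance(z, q) > CRITICAL


q_hat = 0.5                   # hypothetical estimate of q
erratic = [5.0] + [0.0] * 9   # one grossly deviant result among ten
biased = [1.5] * 10           # the same moderate deviation on every specimen
print(squared_robust_distance(erratic, q_hat), is_unreliable(erratic, q_hat))
print(squared_robust_distance(biased, q_hat), is_unreliable(biased, q_hat))
```

Under these assumptions a constant moderate bias across all specimens is largely absorbed by the common laboratory effect, although its plain sum of squared z-scores (22.5) would exceed the critical value, whereas a single erratic result is flagged clearly.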

In Fig. 3 the z-scores are presented. The first line below the graphic gives the
laboratory codes, the second the number of analysed specimens, the third the
squared robust distance and the fourth the corresponding p-values. We used the
SLB-method to get $\hat q$.

Figure 3: z-scores of Zn-concentrations in sewage sludges and corresponding p-values.

3.3 Choice of materials and distribution of samples


A proficiency test must enable the organiser to see whether there is a general
improvement in performance over time. But if the same test material is
distributed several times, the participants would become aware of the consensus
value after the first round and the credibility of the results in successive
rounds would be compromised. Therefore the organiser should also distribute
mixtures of samples. As we have seen in the above example, the true value of
sample $S_2$ could also be determined in an indirect way from samples $S_6$ and
$S_1$, from $S_7$ and $S_3$, from $S_8$ and $S_4$ and from $S_9$ and $S_5$. We
could organise a future proficiency test in the following way: at each round
four specimens $P_1$, $P_2$, $P_3$ and $P_4$ are distributed. $P_1$ and $P_2$
are new samples, $P_3$ and $P_4$ are mixtures of $P_1$ and $P_2$ with $P_0$,
where $P_0$ is a specimen from an earlier round with a true value known only to
the organiser. Then the organiser can evaluate any improvement in performance
and even estimate the precision parameters $\sigma_r$ and $\sigma_R$ at the
concentration level of $P_0$.
At a glance, we see the laboratories which have analytical problems in the
determination of Zn. Similar graphics can be produced for the other elements. As
the confidentiality of the results is extremely important in this type of
laboratory-performance study, the organiser distributes to the laboratories only
graphics which contain exclusively their own results. Instead of the results of
one element for all laboratories (Fig. 3), he distributes graphics which show
the results of all elements for a particular laboratory.

4 Conclusions
Monitoring the amount of pollutants in soil, water, air, plants, food, etc. is
important nowadays. Analytical tests are required to judge contamination. The
fascinating thing about analytical chemical measurements is that they can
quantify chemical contents objectively. A drawback is that they suffer from a
lack of comparability.
It is common knowledge amongst those who practise analysis for trade and com-
merce that analysts can obtain different results on the same material. Obviously it
may not be in their interest to expose this fact. It is in the field of public health
and environmental monitoring, where determinand concentrations are often small
and where slight differences may be significant, that interlaboratory variation has
received most attention. The disturbing thing is the suggestion of unreliability and
its possible diffusion to the general public as well as to the governments responsible
in cases where important decisions must be made on the basis of chemical measure-
ments. However, the situation is not as bad as it seems. If chemists and statisticians
collaborate, try to understand each other’s problems and use realistic models, repre-
sentative samples, standardised and robust analytical methods, reference materials,
interlaboratory tests and good robust statistics, errors can be controlled.

References
[1] AMC (1989): Analytical Methods Committee. Robust Statistics Part 2:
Inter-laboratory Trials. Analyst 114 1699-1702.

[2] AMC (1992): Analytical Methods Committee. Proficiency Testing of Analytical
Laboratories: Organisation and Statistical Assessment. Analyst 117 97-117.

[3] Hampel, F.R. (1985): The Breakdown Points of the Mean Combined With
Some Rejection Rules. Technometrics 27 95-107.

[4] Horwitz, W. & Albert, R. (1986): Performance Characteristics of Methods of
Analysis Used for Regulatory Purposes. Paper delivered at the Pittsburgh
Conference and Exposition, Atlantic City, NJ USA.

[5] Horwitz, W. (1988): Protocol for the Design, Conduct and Interpretation of
Collaborative Studies. Pure & Appl. Chem. 60(6) 855-864.

[6] Huber, P.J. (1964): Robust estimation of a location parameter. Ann. Math.
Statist. 35 73-101.

[7] ISO-5725 (1987): Accuracy (Trueness and Precision) of Test Measurements
Part 1: General Principles and Definitions. International Organisation for
Standardisation, Geneva, Switzerland.

[8] Lischer, P. (1987): Robuste Ringversuchsauswertung. Lebensmittel-Technologie
20 167-172.

[9] Mandel, J. (1989): Interlaboratory Testing and Rejection of Observations.
Proceedings ISO/REMCO 184. International Organisation for Standardisation,
Geneva, Switzerland.

[10] Reichenbach, A. (1989): Robuste Methoden für die Auswertung von
Ringversuchen. Diplomarbeit ETH-Zürich.

[11] Rousseeuw, P.J. & Leroy, A.M. (1987): Robust Regression and Outlier
Detection. Wiley, New York.

[12] Rousseeuw, P.J. & van Zomeren, B.C. (1990): Unmasking multivariate outliers
and leverage points. J. Amer. Statist. Assoc. 85 633-639.

[13] Rousseeuw, P.J. & Croux, C. (1993): Alternatives to the Median Absolute
Deviation. J. Amer. Statist. Assoc. 88 1273-1283.

[14] SLB (1989): Schweizerisches Lebensmittelbuch, Kapitel 60. Statistik und
Ringversuche. Eidg. Drucksachen- und Materialzentrale, Bern.
