Consensus Values and Weighting Factors: Robert C. Paule and John Mandel
A method is presented for the statistical analysis of sets of data which are assembled from multiple
experiments. The analysis recognizes the existence of both within group and between group variabilities, and
calculates appropriate weighting factors based on the observed variability for each group. The weighting
factors are used to calculate a "best" consensus value from the overall experiment. The technique for
obtaining the consensus value is applicable to either the determination of the weighted average value, or to
the parameters associated with a weighted least squares regression problem. The calculations are made by
using an iterative technique with a truncated Taylor series expansion. The calculations are
straightforward, and are easily programmed on a desktop computer.
An examination of the observed variabilities, both within groups and between groups, leads to
considerable insight into the overall experiment and greatly aids in the design of future experiments.
Key words: ANOVA (within-between), components of variance, consensus values, design of experiments,
pooling of variance, weighted average, weighted least squares regression.
values which are obtained by subtracting 200 from the measured values.

Method             A                      B
Measured Values    201.1  201.9  201.5    216  225  203
Coded Values         1.1    1.9    1.5     16   25    3

For ease of presentation, our evaluations will be made using the coded values. Giving the same weight to all six coded values results in a straight average of 8.1. Note that the addition or loss of a single method B measurement

the inverse of the variance of the individual $\bar{Y}_i$, that is, $\omega_i = 1/\mathrm{Var}(\bar{Y}_i)$. Low weights are given to values with high variance.

Next consider the weighted average of m average values, $\bar{Y}_i$:

$$\tilde{Y} = \frac{\sum_{i=1}^{m} \omega_i \bar{Y}_i}{\sum_{i=1}^{m} \omega_i} \qquad (1)$$
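The weighted average of eq (1) can be sketched in a few lines of Python using the coded values of methods A and B. The function name `weighted_average` is hypothetical. As an assumption made only for this illustration, each weight is the inverse of the within-set variance of the set average ($s^2/n$); the between set component of variance introduced later in the paper is not included here.

```python
from statistics import mean, variance

def weighted_average(values, weights):
    """Eq (1): sum(w_i * y_i) / sum(w_i)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)

# Coded values (measured value minus 200) for the two methods.
coded = {"A": [1.1, 1.9, 1.5], "B": [16.0, 25.0, 3.0]}

set_means = [mean(v) for v in coded.values()]
# Assumption for illustration: Var(set average) estimated by s^2 / n,
# i.e. within-set variability only, with no between set component.
var_of_means = [variance(v) / len(v) for v in coded.values()]
weights = [1.0 / v for v in var_of_means]

straight = mean(coded["A"] + coded["B"])        # equal weight for all six values
consensus = weighted_average(set_means, weights)
print(f"straight average = {straight:.1f}")
print(f"weighted average = {consensus:.3f}")
```

With these within-set weights the tightly replicated method A dominates the result, which is the behavior that the between set component of variance is meant to moderate.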
increases to the point that a lower level of systematic error can now be detected. There are practical limitations to the pursuit of this process, and frequently one must live with a certain detectable level of between set systematic error. Effects such as interferences due to minor sample components will vary in different laboratory environments and these effects are extremely difficult to eliminate.

An essential point in our analysis is the assumption that no information on systematic errors is available that would allow us to place more confidence in any one set of measurements as compared to the others. Thus, in this analysis all sets have equal standing with regard to their possible systematic errors. Our technique can, however, be extended to cover situations involving different assumptions.

The calculation of the between set component of variance is readily accomplished by an iterative procedure, described in section 4. The sample estimate of $\mathrm{Var}(\bar{Y}_i)$ for method i is obtained by combining the within set component of variance, $s_{wi}^2$, and the between set component of variance, $s_b^2$. For the second example:

$$s^2(\bar{Y}_A) = \frac{s_{wA}^2}{6} + s_b^2$$

$$s^2(\bar{Y}_B) = \frac{s_{wB}^2}{2} + s_b^2$$

The within set component of variance for method A is:

$$s_{wA}^2 = \frac{\sum_{i=1}^{6} (Y_{Ai} - \bar{Y}_A)^2}{(6-1)} = 0.1427$$

Similarly, $s_{wB}^2 = 0.1250$, and $s_{wA}^2/6$ and $s_{wB}^2/2$ are equal to 0.0238 and 0.0625, the quantities that we had previously (and incorrectly) called $s^2(\bar{Y}_A)$ and $s^2(\bar{Y}_B)$. For the proper $s^2(\bar{Y}_i)$ one needs to add in $s_b^2$. With an available $s_b^2$ one calculates estimates for $s^2(\bar{Y}_A)$ and $s^2(\bar{Y}_B)$ and the corresponding weights, $\omega_A$ and $\omega_B$, and then proceeds by eq (1) to obtain a valid estimate of the consensus value, $\tilde{Y}$.

If the $s_{wi}^2$ are quite similar, as they are in the above example, one can make an improvement by using a more stable pooled $s_w^2$. There should, of course, be a

$$s_w^2 = \frac{\sum_{i=1}^{6} (Y_{Ai} - \bar{Y}_A)^2 + \sum_{i=1}^{2} (Y_{Bi} - \bar{Y}_B)^2}{(6-1) + (2-1)} = 0.1398 \qquad (2)$$

so that

$$s^2(\bar{Y}_A) = \frac{0.1398}{6} + s_b^2$$

and

$$s^2(\bar{Y}_B) = \frac{0.1398}{2} + s_b^2$$

To summarize: The weighting constants used to calculate the consensus value are obtained by taking the inverse of the variances of the various set $\bar{Y}_i$. The proper variances are a combination of the within and the between set components of variance. Under certain circumstances, a more stable pooled within set component of variance may be used.

4. Calculation of the Between Set Component of Variance

The proper weight for $\bar{Y}_i$ is $\omega_i = 1/\mathrm{Var}(\bar{Y}_i)$ and the estimate of this quantity is:

$$\omega_i = \left[\frac{s_{wi}^2}{n_i} + s_b^2\right]^{-1} \qquad (3)$$

Depending on the nature of the data, the within set variance may or may not be pooled. For either case, however, $s_b^2$ must be evaluated. This is accomplished in the following way.

From the definition of $\omega_i$ we obtain the relation:

$$\omega_i \mathrm{Var}(\bar{Y}_i) = 1$$

or equivalently

$$\mathrm{Var}(\sqrt{\omega_i}\,\bar{Y}_i) = 1 \qquad (4)$$

For any given set of $\omega_i$, this variance can be estimated from the sample by the formula

$$\frac{\sum_{i=1}^{m} \omega_i (\bar{Y}_i - \tilde{Y})^2}{m-1}$$
Equating this estimate to its expected value (unity, see eq (4)), we obtain

$$\frac{\sum_{i=1}^{m} \omega_i (\bar{Y}_i - \tilde{Y})^2}{m-1} = 1 \qquad (5)$$

where $\tilde{Y}$ is the estimate of the consensus value as given by eq (1). The estimate of $\tilde{Y}$ depends on knowing the $\omega_i$. These can be calculated from eq (3), once $s_b^2$ is known. Thus, the only problem is to estimate $s_b^2$. Equation (5) provides the means for calculating $s_b^2$ through an iterative process.

Define the function:

$$F(s_b^2) = \sum_{i=1}^{m} \omega_i (\bar{Y}_i - \tilde{Y})^2 - (m-1) \qquad (6)$$

By eq (5), the $s_b^2$ we seek makes $F = 0$. Writing $v$ for $s_b^2$ and $F_0$ for the value of $F$ at a starting estimate $v_0$, a Taylor series expansion of $F$ about $v_0$, truncated after the linear term, gives

$$F_0 + \left(\frac{\partial F}{\partial v}\right)_0 dv = 0$$

and

$$dv = -F_0 \Big/ \left(\frac{\partial F}{\partial v}\right)_0$$

Evaluating the partial derivative in this equation, one obtains:

$$dv = \frac{F_0}{\sum_{i=1}^{m} \omega_i^2 (\bar{Y}_i - \tilde{Y})^2} \qquad (7)$$

The adjusted (new) value for v is:

New $v_0$ = Old $v_0$ + dv

This new value is now introduced in eq (1), (3), (6), and (7) and the procedure is iterated until dv is satisfactorily close to zero. If at any point in the iteration process a negative value is obtained for v, this value should be replaced by zero and the iteration continued. The last v is the $s_b^2$ we seek. The $\omega_i$ and $\tilde{Y}$ are also obtained from this last iteration.

The small data set of our second example will now be used to illustrate the iterative procedure. Let the first estimate for $s_b^2$ be 100. In calculating the $\omega_i$ from eq (3), it is seen that the first term of the right-hand side is a fixed quantity and that the values for the A and B sets have been previously calculated to be .0238 and .0625, respectively. Thus, $\omega_A$ = 1/(.0238 + 100.) = .0099976 and $\omega_B$ = .0099938. The $\bar{Y}_A$ and $\bar{Y}_B$ are 1.533 and 16.550, respectively.

From eq (1), $\tilde{Y}$ = 9.0400
From eq (6), $F_0$ = .1270
From eq (7), dv = 11.28

The next iteration would start with a value of 111.28 for $s_b^2$, and would repeat the above set of calculations with

5. Discussion

The above iterative calculations for the weights and the weighted average are recommended. The calculations are based on the recognition of both within and between group variability. The calculated consensus value is, in general, neither the grand average of all measurements, nor the average of measurement set averages. These overall averages merely describe two opposite weighting situations from our more general weighting eq (3). To illustrate this point consider the case where a pooled $s_w^2$ is used in eq (3). When the $s_b^2$ term of this equation is zero, the weights for the $\bar{Y}_i$ are all proportional to $n_i$. All individual measurements are therefore weighted equally. When, however, $s_b^2$ is relatively large, the $s_w^2/n_i$ term of eq (3) is essentially without effect, and all the measurement set averages are weighted equally. Equation (3) also gives proper weighting for all intermediate cases. In addition, it describes the situation where the within set components of variance are different for different sets of measurements, and takes account of any differences in the number of replicates ($n_i$) in the various groups.
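The iterative procedure of section 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names `step` and `iterate`, the tolerance, and the iteration cap are hypothetical choices. The inputs are the set averages (1.533 and 16.550) and the fixed within-set terms $s_{wi}^2/n_i$ (.0238 and .0625) quoted for the second example, with the same starting value of 100 for $s_b^2$.

```python
def step(means, var_within, v):
    """One pass through eq (3), (1), (6) and (7) for a trial value v of s_b^2."""
    m = len(means)
    w = [1.0 / (vw + v) for vw in var_within]               # eq (3)
    y = sum(wi * yi for wi, yi in zip(w, means)) / sum(w)   # eq (1)
    dev2 = [(yi - y) ** 2 for yi in means]
    f0 = sum(wi * d for wi, d in zip(w, dev2)) - (m - 1)    # eq (6)
    dv = f0 / sum(wi * wi * d for wi, d in zip(w, dev2))    # eq (7)
    return y, f0, dv

def iterate(means, var_within, v0, tol=1e-10, max_iter=100):
    """Repeat the correction until dv is close to zero; returns (consensus, s_b^2)."""
    v = v0
    for _ in range(max_iter):
        _, _, dv = step(means, var_within, v)
        v = max(v + dv, 0.0)        # a negative v is replaced by zero
        if abs(dv) < tol:
            break
    y, _, _ = step(means, var_within, v)
    return y, v

means = [1.533, 16.550]             # set averages for methods A and B
var_within = [0.0238, 0.0625]       # s_wA^2 / 6 and s_wB^2 / 2

y1, f0, dv = step(means, var_within, 100.0)
print(f"first pass: Y = {y1:.4f}, F0 = {f0:.4f}, dv = {dv:.2f}")

y_final, v_final = iterate(means, var_within, 100.0)
print(f"converged:  Y = {y_final:.4f}, s_b^2 = {v_final:.3f}")
```

The first pass reproduces the quantities listed in the worked example to within rounding. The sketch assumes the set averages are not all identical; if they were, the denominator of eq (7) would be zero.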
The ready availability of programmable desktop computers strongly encourages the use of the iterative approach. Since one can easily do the calculations, there is little reason not to use proper weighting.

The examples to this point have been chosen to be easily worked by hand. They describe situations where the intuitive answers are obvious. The examples' use of only two measurement sets, however, is not recommended in practice since there is a very limited sampling of measured differences between sets. Such a limited sampling results in an $s_b^2$ estimate that is quite uncertain. The use of many sets of measurements is recommended since this results in greater stability of the estimates.

6. Calculation of the Standard Error of the Weighted Average

All practical applications of the weighted average will require some estimate of its uncertainty. Accordingly, the standard error (standard deviation) of the weighted average should be calculated. The derivation of the stan-

This value is seen to be quite reasonable when one remembers that the uncoded group averages for methods A and B were 201.53 and 216.55. Notice in this example that the between set component of variance is the predominant factor in the standard error.

7. Example of an Interlaboratory Experiment Using the Weighted Average

Five laboratories have made a number of determinations for the heat of vaporization of cadmium [5]. In this experiment, each laboratory had a noticeably different replication precision, and each performed a different number of determinations to obtain its average value. We now wish to determine the consensus value (weighted average) from this interlaboratory experiment. The information from the experiment is listed below, along with the $s_b^2$ calculated by the iterative procedure.

Lab i    Avg. Value    $n_i$    $s_{wi}^2$    $s_{wi}^2/n_i$
Note that the second and third columns contain relative weights while the fourth column contains absolute weights. The relative weights cause no problem for the calculation of the weighted average since inspection of eq (1) shows that any constant multiplier for the relative weights will cancel out. An inspection of the three columns of weights, as well as the ordered laboratory heat values, shows that the weights for the iterative procedure most strongly favor the higher laboratory heat values. Column five of the table, in turn, shows why the iterative weights most strongly favor the higher laboratory heat values; the observed within group variability is smaller for the laboratories that have the higher heat values. This causes the $\mathrm{Var}(\bar{Y}_i)$ for these laboratories to be relatively small and the weights to be relatively large.

This example with actual laboratory data shows that one cannot automatically assume that the average of averages and the average of measurements will bracket the consensus value (weighted average). The weighted average should be calculated. It is more sensitive to the overall experiment and it responds to both the within- and the between group variability.

It will next be shown that the iterative treatment of weighting factors can be easily extended to the problem of fitting lines by weighted least squares (regression).

8. Fitting Lines by Weighted Least Squares

According to statistical theory, the above defined estimate of the weighted average is the value that minimizes the sum of the weighted squared deviations of the observed data (from the weighted average value). It is a least squares estimate. A similar treatment is used in weighted linear regression. Here, a pair of parameters, namely the intercept and the slope of the line, are estimated, rather than a single average. The procedure, however, is again the minimization of the weighted sum of squares of deviations. Here, too, both within set and between set components of variance should be evaluated.

Consider the situation where a laboratory calibrates an instrument using a series of standards. The laboratory may not always make the same number of replicate measurements with the different standards. Thus there are different sets of replicate instrument measurements (Y) corresponding to a series of accurately determined standard values (X). An example of a linear calibration process is given in figure 1. Let us assume that the linearity of the calibration curve has previously been established. An examination of the figure shows that the variability in the Y direction among replicates obtained at the same X value is relatively small when compared with the scatter of the clusters of points about the straight line. Thus, two sources of variability are suggested by the data. The Y replication variability associated with a given X value is analogous to the previously described within set component of variance, $s_w^2$, and the variation shown by the scatter of the clusters of points about the fitted line is analogous to the between set component of variance, $s_b^2$.

Figure 1

The observed variance for the j-th replicate $Y_{ij}$ measurement made at a given $X_i$ value will consist of the sum of the within- and the between set components of variance.

$$s^2(Y_{ij}) = s_{wi}^2 + s_b^2$$

For convenience of calculation, it is desirable to deal with the averages of the replicate measurements. The average of $n_i$ replicate measurements is denoted as $\bar{Y}_i$. The observed variances for the averages are given by:

$$s^2(\bar{Y}_i) = \frac{s_{wi}^2}{n_i} + s_b^2$$

The within set variances of the above equation can be evaluated for each distinct $X_i$ value. It is possible, if there is a consistent measurement process over the full range of values, to obtain a pooled estimate of the within set component of variance. This pooled estimate is obtained in the same manner as described by eq (2), above. In the current application, the different $X_i$ values correspond to the previously described different measurement sets and there are now as many summations in the numerator and denominator of eq (2) as there are distinct $X_i$ values.

Let us now assume that an appropriate between set component of variance is available. The weights, $\omega_i$ =
$1/\mathrm{Var}(\bar{Y}_i)$ can be evaluated, and a standard weighted linear regression of $\bar{Y}_i$ on $X_i$ can be carried out (see either the Appendix, or Ref. [6]). Thus, the regression problem using weighted least squares centers on the determination of $s_b^2$.

The $s_b^2$ value, for the regression case with an intercept and slope, can be determined by the general iterative approach given above. Equation (3) now refers to the within- and between set random errors in the Y measurements. It is now used along with the following modified iteration equations:

$$F(s_b^2) = \sum_{i=1}^{m} \omega_i (\bar{Y}_i - \hat{Y}_i)^2 - (m-2) \qquad (8)$$

$$dv = \frac{F_0}{\sum_{i=1}^{m} \omega_i^2 (\bar{Y}_i - \hat{Y}_i)^2} \qquad (9)$$

where

$\hat{Y}_i$ = weighted least squares fitted value, i.e.,

$$\hat{Y}_i = a + bX_i$$

The major modification is that instead of using $\tilde{Y}$, we use a weighted least squares fitted value $\hat{Y}_i$. Equation (8) uses (m-2) rather than (m-1) degrees of freedom since we are now estimating two parameters, i.e., the intercept and the slope.

The procedure for iteration is little changed. An arbitrary initial estimate for $s_b^2$ is taken and used with eq (3) to obtain the weights. Next, a weighted linear regression is made of $\bar{Y}_i$ on $X_i$ to obtain estimates a, b, and $\hat{Y}_i$. This is followed by the use of eq (8) and (9) to calculate a correction for $s_b^2$. The whole procedure is then repeated until the correction for $s_b^2$ is negligible. The final $s_b^2$, a, b, and $\hat{Y}_i$ are then saved for further interpretation and use.

The above procedure for performing a weighted linear least squares fit can be easily extended to a weighted quadratic, or higher order, regression of $\bar{Y}_i$ on $X_i$. For example, to fit the equation $\hat{Y}_i = a + bX_i + cX_i^2$, change, in equation (8), the (m-2) to (m-3) to account for the addition of coefficient c, and use a quadratic fitted $\hat{Y}_i$ in eq (8) and (9).

9. An Example of a Weighted Least Squares Fit

Let us examine the effect of different weighting factors on the determination of the intercept and slope of a calibration line. A greatly simplified example is shown in figure 2. The true line has both unit intercept and slope. For this example, let us assume that "interferences" for the X = 1 and the X = 5 standard samples are such that the measured values will be about 0.2 units high. Similarly, the X = 2 and X = 4 standards yield results that are about 0.2 units low. Duplicate measurements are made, and for simplicity assume that these measurements have a fixed $s_w^2$ value of 0.0008 (as shown in fig. 2). With equal numbers of replicate measurements, both the unweighted and the iterative weighted regression calculations give the correct values for the intercept and the slope.

Figure 2 (plot of Y against X, with the plotted data tabulated)

X      Y
1.0    2.18
1.0    2.22
2.0    2.78
2.0    2.82
3.0    3.98
3.0    4.02
4.0    4.78
4.0    4.82
5.0    6.18
5.0    6.22

Let us now, however, say that the experimenter is particularly interested in determining the intercept and that he/she therefore makes six rather than two replicate measurements using the X = 1 standard. For the sake of simplicity, assume that the six Y measurements again center at 2.2 and that $s_w^2$ = .0008. Even though everything looks nominally the same, the unweighted regression calculation gives an intercept of 1.145 and a slope of 0.9636. Obviously, the six points at X = 1 have pulled the left side of the line upward. If we carried out the regression calculation using only the average Y value for each X value we would obtain the correct intercept and slope values. The average Y values are not affected by the number of measurements used in each average.

In this example, in which appreciably more measurements were made for one standard than for the others, and the replication error was relatively small, the
unweighted regression leads to erroneous results. A proper weighting procedure must prevent the measurements at one standard from unduly influencing the fit. Equation (3) of our iterative weighted regression calculations will properly control the weighting. In this example, the $s_b^2$ term in eq (3) dominates the weighting. Use of the iterative weighted linear regression gives a = 1.0008 and b = 0.9998. If the data from this example were real laboratory data, then our calculated a and b would be the appropriate sample estimates.

10. Design of Experiments

The interferences associated with the five standards of the above illustrative example have been ideally and artificially balanced. In real life situations the order in which the interferences will occur will tend to be more random. When the replication error is small, i.e., $s_w^2$ is small relative to $s_b^2$, the positions of the ($X_i$, $\bar{Y}_i$) points will be mainly affected by these random sample interferences. In that case, the use of a larger number of standards over the range of measurement interest is recommended since this favors a more even distribution of these interferences, and a more accurate determination of the line. Furthermore, when $s_w^2$ is small relative to $s_b^2$ the use of large numbers of replicate measurements is not recommended since these measurements are very inefficient in determining the position of the ($X_i$, $\bar{Y}_i$) points.

Consider next the situation shown in figure 3, where $s_w^2$ is large relative to $s_b^2$. Here all of the average points ($X_i$, $\bar{Y}_i$) are very uncertain. The interferences of each standard sample are now completely overshadowed by the variability in the replicate measurements. For this situation one should make many replicate measurements with all of the standard samples so as to minimize the replication uncertainty.

Figure 3

11. Summary and Conclusions

Calculation of consensus values, both in the form of the weighted average or the weighted least squares regression, requires a knowledge of the within- and the between set components of variance. The individual or the pooled within set components of variance can be directly calculated from the experimental data. The between set component of variance can conveniently be calculated from the experimental data using an iterative technique which is based on a truncated Taylor series expansion. Consensus value(s) are also obtained by this iterative technique.

A simple intuitive understanding of the within- and between set components of variance allows one to more efficiently design experiments for obtaining consensus values.

The logical arguments for use of the within- and between set components of variance can be extended to other areas of statistical analysis. Work is in progress for extending the current techniques to nested analyses of variance.

12. References

[1] Birge, R. T., The Calculation of Errors by the Method of Least Squares, Phys. Rev., 40, 207-227 (1932).
[2] Mandel, J. and Paule, R. C., Interlaboratory Evaluation of a Material with Unequal Numbers of Replicates, Anal. Chem., 42, 1194-7 (1970), and Correction, Anal. Chem., 43, 1287 (1971).
[3] Cochran, W. G., The Combination of Estimates from Different Experiments, Biometrics, 10, 101-29 (1954).
[4] Youden, W. J., Enduring Values, Technometrics, 14, 1-11 (1972).
[5] Paule, R. C. and Mandel, J., Analysis of Interlaboratory Measurements on the Vapor Pressure of Cadmium and Silver, NBS Special Publication 260-21 (1971).
[6] Draper, N. R. and Smith, H., Applied Regression Analysis, 2nd ed., Section 2.11 (John Wiley and Sons, N.Y., 1981).

Appendix

The formulas for estimating the slope and intercept by weighted least squares are straightforward. The slope is calculated from the observed m sets of ($X_i$, $\bar{Y}_i$) points.
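$$b = \frac{\sum_{i=1}^{m} \omega_i (X_i - \bar{X})(\bar{Y}_i - \tilde{Y})}{\sum_{i=1}^{m} \omega_i (X_i - \bar{X})^2}$$

where

$$\bar{X} = \frac{\sum_{i=1}^{m} \omega_i X_i}{\sum_{i=1}^{m} \omega_i}$$

and

$$\tilde{Y} = \frac{\sum_{i=1}^{m} \omega_i \bar{Y}_i}{\sum_{i=1}^{m} \omega_i}$$

The intercept is obtained by the following formula.

$$a = \tilde{Y} - b\bar{X}$$

The interested reader may also wish to calculate the standard errors of the above estimates of the slope and the intercept. The formulas are:

$$s_b = \left[\frac{1}{\sum_{i=1}^{m} \omega_i (X_i - \bar{X})^2}\right]^{1/2}$$

$$s_a = \left[\frac{\sum_{i=1}^{m} \omega_i X_i^2}{\sum_{i=1}^{m} \omega_i \, \sum_{i=1}^{m} \omega_i (X_i - \bar{X})^2}\right]^{1/2}$$

A more detailed explanation of weighted least squares fitting processes is contained in Ref. [6].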
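As an illustration, the Appendix formulas can be combined with the iteration of eq (3), (8), and (9) in a short Python sketch applied to the calibration example of section 9 (six replicates at X = 1, duplicates elsewhere, pooled $s_w^2$ = 0.0008). The function names `weighted_line` and `iterate_line`, the zero starting value, the tolerance, and the iteration cap are hypothetical choices made for this sketch, not part of the paper.

```python
def weighted_line(x, y, w):
    """Weighted least squares intercept a and slope b (Appendix formulas)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def iterate_line(x, y_avg, var_within, v0=0.0, tol=1e-12, max_iter=200):
    """Iterate eq (3), the weighted fit, and eq (8)-(9); returns (a, b, s_b^2)."""
    m = len(x)
    v = v0
    for _ in range(max_iter):
        w = [1.0 / (vw + v) for vw in var_within]                 # eq (3)
        a, b = weighted_line(x, y_avg, w)
        res2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y_avg)]
        f0 = sum(wi * r for wi, r in zip(w, res2)) - (m - 2)      # eq (8)
        dv = f0 / sum(wi * wi * r for wi, r in zip(w, res2))      # eq (9)
        v = max(v + dv, 0.0)        # a negative v is replaced by zero
        if abs(dv) < tol:
            break
    w = [1.0 / (vw + v) for vw in var_within]
    a, b = weighted_line(x, y_avg, w)
    return a, b, v

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_avg = [2.2, 2.8, 4.0, 4.8, 6.2]       # averages of the replicates at each X
n = [6, 2, 2, 2, 2]                     # six replicates at X = 1
var_within = [0.0008 / ni for ni in n]  # pooled s_w^2 / n_i

# Weights proportional to n_i reproduce the unweighted fit to all 14 points.
a_unw, b_unw = weighted_line(x, y_avg, n)
a_it, b_it, s_b2 = iterate_line(x, y_avg, var_within)
print(f"unweighted: a = {a_unw:.3f}, b = {b_unw:.4f}")
print(f"iterative:  a = {a_it:.4f}, b = {b_it:.4f}, s_b^2 = {s_b2:.4f}")
```

The sketch relies on the fact that an unweighted regression on all individual points gives the same line as a regression on the set averages with weights proportional to $n_i$, so the raw replicates are not needed. It assumes the averages do not fall exactly on a line; otherwise the denominator of eq (9) would be zero.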