Accred Qual Assur

DOI 10.1007/s00769-017-1292-6

GENERAL PAPER

Bayesian analysis of homogeneity studies in the production of reference materials

Adriaan M. H. van der Veen¹

Received: 18 March 2017 / Accepted: 30 September 2017

© Springer-Verlag GmbH Germany 2017

Abstract For almost two decades, the batch homogeneity in the production of reference materials has been evaluated using analysis of variance (ANOVA) to determine the between-bottle standard deviation. At that time, this approach replaced the use of the F-test in ANOVA to determine whether the ratio of the mean squares $MS_\mathrm{between}/MS_\mathrm{within}$ is statistically significant. Problems arise when $MS_\mathrm{between} < MS_\mathrm{within}$, because classical ANOVA then yields a negative between-bottle variance, which is often set to zero. Using a Bayesian hierarchical model based on the same assumptions as traditional ANOVA, we show that even if $MS_\mathrm{between} < MS_\mathrm{within}$, there can be a relevant level of between-bottle inhomogeneity to account for. The Bayesian analysis produces a nonzero value for the between-bottle standard deviation, which dismisses the practice of setting this standard deviation to 0. At the same time, it dismisses the current guidance given in ISO Guide 35 under these circumstances. Finally, it is shown that traditional ANOVA, meta-analysis methods and Bayesian analysis give very similar answers as long as $MS_\mathrm{between} > MS_\mathrm{within}$, so there is no need to discourage the use of these methods in favour of a Bayesian analysis, provided that the repeatability of the measurement method used to conduct the between-bottle homogeneity study is sufficient to characterise the dispersion across the bottles (items) in a batch of a reference material.

Keywords Measurement uncertainty · Bayesian analysis · Hierarchical model · Meta-analysis · Analysis of variance · Reference material · Homogeneity study

✉ Adriaan M. H. van der Veen
avdveen@vsl.nl

¹ VSL, Thijsseweg 11, 2629 JA Delft, The Netherlands

Introduction

The current practice of viewing the batch (“between-bottle”) homogeneity as an uncertainty component to be included in the uncertainty budget of the certified value of a reference material started with a publication by Pauwels et al. [1] in the late 1990s. In the following years, the idea was further developed and extended to stability studies [2], and eventually, an evaluation method based on analysis of variance (ANOVA) was proposed [3, 4]. The evaluation of the between-bottle homogeneity has become an integral part of reference material production since the publication of the third edition of ISO Guide 35 [5], implementing relevant requirements of ISO Guide 34 [6] (recently superseded by ISO 17034 [7]). In proficiency testing [8], this type of assessment is also essential to characterise the dispersion of the property across the produced items (bottles), which are used with a single assigned value valid for the entire batch [9].

The classical approach to using ANOVA in homogeneity studies of reference materials relies on the relationships between the mean squares between ($MS_\mathrm{between}$) and within groups ($MS_\mathrm{within}$) on the one hand and the expectations of the repeatability and between-bottle variances on the other [3, 4]. As long as $MS_\mathrm{between} \geq MS_\mathrm{within}$, this way of evaluating homogeneity study data works well. When this condition is not fulfilled, the between-bottle standard deviation cannot be computed, because it would involve taking the square root of a negative number. Situations where $MS_\mathrm{between} < MS_\mathrm{within}$ arise when the differences due to between-bottle inhomogeneity are small and the repeatability of the measurement method is not sufficient to quantify such differences, or in the presence of outliers or stragglers in the dataset. To date, there are still experimenters who consider an F-test telling
that there is “no significant inhomogeneity” as the desired outcome (see, e.g. [10]), but such an assessment is inappropriate, because it assesses the batch homogeneity against the wrong metric, namely the repeatability standard deviation of the method, whereas the appropriate metric is the standard uncertainty from the characterisation of the reference material, possibly combined with an uncertainty contribution due to the instability of the property value.

Insufficient repeatability of the method can result from choosing a poorly repeatable method for conducting the homogeneity study or from too low a number of replicates, but such situations should be avoided [1, 4]. Nevertheless, after material processing, homogenising and bottling, the level of between-bottle heterogeneity can be so small that it is hard to characterise, even if a suitable measurement and experimental set-up is chosen. The problem arising from insufficient repeatability, and consequently $MS_\mathrm{between} < MS_\mathrm{within}$, has been addressed by two proposals: setting the between-bottle variance to zero, or setting it to a nonzero value based on the uncertainty associated with the mean squares “within groups” [11]. Gelman et al. [12] comment that setting the between-group standard deviation in ANOVA to zero when $MS_\mathrm{between} < MS_\mathrm{within}$ is not necessarily what the data tell. ISO Guide 35 [5] follows this view by providing an alternative. The uncertainty contribution is computed as [4, 5]

$$u_\mathrm{bb} = \sqrt{\frac{MS_\mathrm{within}}{n}}\,\sqrt[4]{\frac{2}{a(n-1)}} \tag{1}$$

where $a$ denotes the number of bottles and $n$ the number of replicate measurements per bottle in the homogeneity study. It reflects the uncertainty associated with the repeatability of the method [11]. This alternative has been criticised because it does not properly address the dispersion due to between-bottle heterogeneity [13]. In this paper, we present an alternative that is sound from a statistical point of view and that characterises the between-bottle homogeneity even for datasets where $MS_\mathrm{between} < MS_\mathrm{within}$.

The alternative to using traditional ANOVA [4], meta-analysis or restricted maximum-likelihood estimation [13] is to use a Bayesian approach to ANOVA [12, 14], which enables computing the within- and between-bottle standard deviations even if $MS_\mathrm{between} < MS_\mathrm{within}$, at the expense of being computationally more involved. The Bayesian analysis provides (samples of) the probability distributions for the within- and between-bottle standard deviations. From these probability distributions, values for the corresponding standard deviations can be obtained. The model presented builds on a model developed for reworking the ANOVA example in the GUM (Guide to the expression of uncertainty in measurement) [15, example H.5].

To assess the performance of a Bayesian analysis for evaluating batch homogeneity, a dataset has been chosen from a homogeneity study of ten gas mixtures. These gas mixtures were obtained by decanting a gravimetrically prepared synthetic natural gas mixture [16]. The set chosen was one that confirmed the successful operation of the decanting procedure, and thus a dataset for which small values of the between-bottle standard deviation are expected. For one component (nitrogen), $MS_\mathrm{between} < MS_\mathrm{within}$. The outcome of the Bayesian analysis is compared with two classical methods based on the same random-effects model: traditional ANOVA and a meta-analysis method using the DerSimonian–Laird (DL) model [17]. Neither of these methods deals satisfactorily with the situation where $MS_\mathrm{between} < MS_\mathrm{within}$, but otherwise both would be considered simpler alternatives to the Bayesian analysis.

Experimental

Ten gas mixtures were prepared by connecting ten evacuated gas cylinders with a water volume of 1 L to a manifold equipped with 10 positions. A gravimetrically prepared synthetic natural gas mixture was obtained from a commercial speciality gas supplier. This mixture was decanted into the 10 cylinders by connecting it to a mass flow controller, which in turn was connected to the manifold. The gas flow rate was controlled during decanting to mitigate the gas temperature drop due to Joule–Thomson cooling [16]. The selected dataset originates from the experiments performed after optimising the settings of the decanting method.

The analyses were performed on an Agilent 7980A gas chromatograph, configured for the analysis of the composition of natural gas and biogas. The back channel consists of a HayeSep Q pre-column, a HayeSep T column and a thermal conductivity detector (TCD), and is used for the separation of methane and carbon dioxide from the other components. Nitrogen, hydrogen and oxygen are separated and detected on the auxiliary channel, which is equipped with a Hayesep/Molsieve column and a TCD. The front channel is equipped with a PDMS column and a flame ionisation detector (FID); this channel is used for the separation of the hydrocarbons except methane. The carrier gas is helium for the front and back channels and argon for the auxiliary channel. Data collection is performed using ChemStation software. The temperature programme is the same for all channels: 35 °C for 6 min, followed by a ramp of 10 °C min⁻¹ to 100 °C, which is held for 0.5 min.

The GC was calibrated with a suite of Primary Standard Gas Mixtures (PSMs), using a multipoint calibration approach with at least 7 PSMs for each component [18]. The data processing was done in accordance with ISO 6974 [19, 20]. For all components, a
quadratic polynomial was used. The peak areas observed during calibration and the homogeneity study were corrected for instrument drift. The validity of the calibration was confirmed using a quality control mixture. The curvature of the calibration functions is irrelevant for the calculations involved in the homogeneity study: the curvature is small, and the amount-of-substance fractions of the ten gas mixtures subjected to the homogeneity study are close enough that nonlinearity of the analyser does not influence the outcome. The measurements on the 10 gas mixtures were taken in a single sequence under repeatability conditions. For each gas mixture, six replicate injections were performed, of which the first was discarded to eliminate carry-over effects.

Modelling and data analysis

Basic model

The methods for evaluating between-bottle homogeneity in this paper serve the same purpose as those proposed previously for evaluating between-bottle homogeneity studies [2, 3, 4] and are based on largely the same assumptions. The main objective is still to compute the between-bottle standard deviation $s_\mathrm{bb}$ from a dataset of a fully nested one-way design. As before, the measurement data (e.g. group means and replicates within the groups) are assumed to be normally distributed, but this is not a requirement for applying Bayesian methods; on the contrary, Bayesian methods are easier to adapt when other assumptions concerning the distribution of the measurement data are needed [12, 21]. All methods can of course be adapted accordingly when another experimental set-up is chosen.

All data analysis methods in this paper are based on the same random-effects model, which has been the basis for the development of the traditional ANOVA method for assessing the batch homogeneity of reference materials [4]. This random-effects model can be written as [3, 4]

$$Y_{ij} = \mu + A_i + \varepsilon_{ij} \tag{2}$$

where $Y_{ij}$ denotes the value of the $j$th measurement of the $i$th bottle (item), $\mu$ the expected value, $A_i$ the bias of the $i$th bottle (item), and $\varepsilon_{ij}$ the random measurement error in observation $j$ on item $i$. In most cases, it is assumed that $\varepsilon_{ij} \sim N(0, \sigma^2)$, where $\sigma^2$ denotes the repeatability variance. Furthermore, it is assumed that $A_i \sim N(0, \tau^2)$, where $\tau^2$ denotes the between-group variance. In a between-bottle homogeneity study, $\tau$ is the between-bottle standard deviation, often denoted by $\sigma_\mathrm{bb}$ or $s_\mathrm{bb}$ [4].

Traditional ANOVA

Traditional ANOVA is well covered in the literature [3–5] and is the most widely used statistical method for this purpose. The expression for the between-bottle variance is obtained from the relationships between the mean squares between and within groups and the expectations of the corresponding variances [3, 12]. The between-bottle standard deviation, here denoted by $\tau$, follows from

$$\tau^2 = \frac{MS_\mathrm{between} - MS_\mathrm{within}}{n} \tag{3}$$

where $n$ denotes the number of replicates in one group. $\tau$ is identical to $s_\mathrm{bb}$ as used elsewhere in the literature (e.g. [4, 5, 11]). The pooled repeatability standard deviation of the method equals $\sqrt{MS_\mathrm{within}}$ [3].

Meta-analysis

Meta-analysis is widely used to combine results from different studies [17, 22]. It is also used to combine results from different laboratories in an interlaboratory comparison [23] and in reference material characterisation [24]. Meta-analysis can also be used to determine the between-group standard deviation $\tau$ in a between-bottle homogeneity study, as an alternative to traditional ANOVA (see section Traditional ANOVA).

Based on the same random-effects model (Eq. (2)), a meta-analysis can be performed that, among other things, provides a value for $\tau^2$. In fact, traditional ANOVA can be viewed as a member of the larger family of models used for meta-analysis [22]. Several studies have examined the performance of the various models used for meta-analysis. From these studies, it can be inferred that the DerSimonian–Laird (DL) model combines computational ease with good characterisation of the dispersion of the property value across groups. The expression for the between-bottle variance reads [17]

$$\tau^2 = \frac{\sum_i w_i \left(Y_i - \bar{Y}\right)^2 - (a-1)}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i} \tag{4}$$

where $Y_i$ denotes a group mean, $w_i = 1/s_i^2$, $s_i$ the standard deviation of the group mean $Y_i$ (the within-group standard deviation divided by $\sqrt{n}$), and $\bar{Y} = \sum_i w_i Y_i / \sum_i w_i$. As in traditional ANOVA, in meta-analysis $\tau^2 = 0$ when the result of Eq. (4) is negative. The meta-analysis using the DL method does not pool the within-group variances.

For well-designed between-bottle homogeneity studies, the weights $w_i$ in Eq. (4) will usually be very similar. Even if the within-group standard deviations are assumed to be samples of the same probability distribution,
these will be different. It is then still appropriate to use weighting, albeit that the influence of the weighting will generally be small.

Bayesian analysis

Bayesian methods are in some respects fundamentally different from classical statistical methods. Bayesian methods use the measurement data to update the probability density function of one or more parameters. Hence, the probability density functions of the parameters before the updating (the priors) need to be specified [12].

Two variants of the Bayesian model are proposed, one with pooling of the within-group standard deviations and another without pooling. With pooling, there are three parameters in the Bayesian model: the grand mean $\mu$, the within-group standard deviation $\sigma$ and the between-group standard deviation $\tau$ [12]. The Bayesian analysis provides posterior probability density functions for these three parameters, from which, among other things, the between-bottle standard deviation can be obtained. For all parameters, weakly informative prior distributions are selected to (1) allow the data to dominate in the Bayesian analysis and (2) improve the performance of the Markov Chain Monte Carlo (MCMC) method used for calculating a sample of the posterior distributions [12, 25].

In most real-world situations, Bayesian computations are much more involved than their classical counterparts. Only in a very limited number of cases do analytical expressions exist for, e.g., the mean and standard deviation of a parameter after a Bayesian inference. Such cases typically involve the use of non-informative, conjugate or semi-conjugate probability density functions to model prior information [26]. Whereas such priors can be convenient as a starting point when developing Bayesian models [12], priors should be elicited from the knowledge at hand [27].

Weakly informative priors are often appropriate in metrology, because (1) they allow the data to dominate and (2) they ensure that the posterior is “proper”, i.e. the posterior fulfils the requirements of a probability density function. Using such priors often implies that a Monte Carlo method is required to generate a sample from the posterior. From this sample, the mean, standard deviation and coverage interval, among others, can be computed. This kind of Monte Carlo method is known as Markov Chain Monte Carlo (MCMC) [12]. These methods can be implemented in various ways, but require resources well beyond those usually employed for data processing in metrology (e.g. mainstream spreadsheet software). Furthermore, coding models involving MCMC requires a great deal of knowledge about the properties and performance of these computational methods, about sampling and simulation techniques, and about ways to evaluate the performance of the calculation.

All calculations involving MCMC were implemented in R [28] using the package RStan [29], which enables writing the models in a specific programming language called Stan [12] and running the calculations from R. To monitor the performance of the MCMC methods, the statistic $\hat{R}$ was used; it is defined as [12]

$$\hat{R} = \sqrt{\frac{n-1}{n} + \frac{1}{n}\,\frac{B}{W}}$$

where $B$ and $W$ are the between-chain and within-chain variance, respectively, and $n$ is the length of the chain. At convergence, $\hat{R} \approx 1$ for each scalar parameter in the model. Furthermore, trace plots [12] were used for inspecting the behaviour of the chains in the MCMC calculation. The sampler in RStan has some advantages for non-experts in Bayesian computation in that it attempts to configure itself during a warm-up phase, after which it generates the samples for an MCMC chain. Multiple chains can be used and combined, which is not a property of all samplers [12, 30].

As prior for the mean $\mu$, a normal distribution is chosen with the nominal value of the amount-of-substance fraction of the parent mixture as mean and a suitably large standard deviation. This standard deviation is chosen to be wider than the dispersion of the group means and is set to 5 times the repeatability standard deviation. For all standard deviations, the folded Cauchy distribution is used as prior, which performs well for both between-group and within-group standard deviations [21]. Whereas choosing a non-informative probability distribution as prior for $\sigma$ would lead to acceptable results, this is not the case for the between-group standard deviation $\tau$ [21]. The scale parameter $A$ of the folded Cauchy distribution is inferred from prior knowledge about the analytical system (for the within-group standard deviation) and the envisaged batch homogeneity (for the between-group standard deviation), respectively. The location parameter of this half-Cauchy distribution is set to 0. Its density is then largest on the interval $[0, A]$, which holds half of the probability mass [21].

The hierarchical model with pooling is shown in Table 1. The model without pooling is obtained by declaring sig as an array (an unknown within-group standard deviation for each group) and adapting the code accordingly.
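Readers without a Stan installation may want to see how the pooled model behaves on a small example. The sketch below is not the Stan model of Table 1 and does not use Stan's Hamiltonian Monte Carlo sampler: it swaps in a plain random-walk Metropolis algorithm, and it assumes a balanced design so that the bottle effects $A_i$ of Eq. (2) can be integrated out analytically (each group mean is then normal with variance $\sigma^2/n + \tau^2$). The priors follow the description above (normal on $\mu$ with a standard deviation of 5 times the repeatability standard deviation, half-Cauchy with location 0 on $\sigma$ and $\tau$); the function names, proposal widths, scale parameters and simulated data are hypothetical.

```python
import math
import random
import statistics


def log_posterior(mu, log_sig, log_tau, ybar, ssw, n, prior):
    """Unnormalised log posterior of (mu, log sigma, log tau) for a balanced design.

    The bottle effects A_i are integrated out: each group mean ybar_i is
    N(mu, sigma^2/n + tau^2), and the pooled within-group sum of squares SSW
    contributes sigma^(-a(n-1)) * exp(-SSW / (2 sigma^2)).
    """
    sig, tau = math.exp(log_sig), math.exp(log_tau)
    a = len(ybar)
    v = sig ** 2 / n + tau ** 2                      # variance of a group mean
    lp = -a * (n - 1) * log_sig - ssw / (2.0 * sig ** 2)
    lp -= 0.5 * a * math.log(v) + sum((y - mu) ** 2 for y in ybar) / (2.0 * v)
    # Priors (additive constants dropped): normal on mu, half-Cauchy(0, A) on sigma, tau
    lp -= (mu - prior["m0"]) ** 2 / (2.0 * prior["s0"] ** 2)
    lp -= math.log1p((sig / prior["A_sig"]) ** 2)
    lp -= math.log1p((tau / prior["A_tau"]) ** 2)
    # Jacobian of the log transforms, because the chain moves on log sigma and log tau
    return lp + log_sig + log_tau


def metropolis(groups, prior, iters=20000, warmup=5000, seed=1):
    """Random-walk Metropolis on (mu, log sigma, log tau); returns post-warm-up draws."""
    rng = random.Random(seed)
    a, n = len(groups), len(groups[0])               # balanced design assumed
    ybar = [statistics.fmean(g) for g in groups]
    ssw = sum((y - m) ** 2 for g, m in zip(groups, ybar) for y in g)
    s_rep = math.sqrt(ssw / (a * (n - 1)))           # pooled repeatability SD
    state = [statistics.fmean(ybar), math.log(s_rep), math.log(s_rep)]
    steps = [s_rep / math.sqrt(a * n), 0.2, 0.8]     # ad hoc, non-adaptive proposal widths
    lp = log_posterior(*state, ybar, ssw, n, prior)
    draws = {"mu": [], "sigma": [], "tau": []}
    for it in range(iters):
        for k in range(3):                           # update one parameter at a time
            prop = list(state)
            prop[k] += rng.gauss(0.0, steps[k])
            lp_prop = log_posterior(*prop, ybar, ssw, n, prior)
            if math.log(1.0 - rng.random()) < lp_prop - lp:   # Metropolis acceptance
                state, lp = prop, lp_prop
        if it >= warmup:                             # discard the warm-up draws
            draws["mu"].append(state[0])
            draws["sigma"].append(math.exp(state[1]))
            draws["tau"].append(math.exp(state[2]))
    return draws


# Usage with data simulated from Eq. (2); all numbers are hypothetical
sim = random.Random(42)
mu_true, tau_true, sig_true = 1.0, 0.0005, 0.001
groups = []
for _ in range(10):                                  # a = 10 bottles
    bottle_effect = sim.gauss(0.0, tau_true)         # A_i
    groups.append([mu_true + bottle_effect + sim.gauss(0.0, sig_true)
                   for _ in range(5)])               # n = 5 replicates
prior = {"m0": 1.0, "s0": 5 * sig_true, "A_sig": 0.01, "A_tau": 0.01}
draws = metropolis(groups, prior)
for name, xs in draws.items():
    print(f"{name}: posterior mean {statistics.fmean(xs):.6f}")
```

Working on $\log\sigma$ and $\log\tau$ keeps both standard deviations positive without rejecting proposals, at the cost of the Jacobian term; in contrast with this fixed-step sketch, the warm-up in RStan also tunes the sampler.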

Table 1 Hierarchical model for a between-bottle homogeneity study with pooling

When performing the MCMC calculations, four chains of sufficient length were used. The effort spent on optimising the chain length has been kept to a minimum, as the computation is fast and the only concern has been that the values of the parameters could be reproduced to all meaningful digits. It should be emphasised that these calculations, like all Monte Carlo calculations, give slightly different answers each time the calculation is run. As long as these differences are small enough to be meaningless, the calculation was deemed sufficiently accurate. Four chains with 25 000 iterations each have been used. The warm-up phase was set to 5000 samples, leaving 20 000 samples per chain for the posterior.
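The agreement between the four chains is what $\hat{R}$ quantifies. The following minimal sketch implements the formula given in the Bayesian analysis section, assuming the usual definitions $B = \frac{n}{m-1}\sum_j(\bar{\theta}_j - \bar{\theta})^2$ over $m$ chains of length $n$ and $W$ equal to the average within-chain sample variance. Stan itself reports refined variants (e.g. based on split chains), so values from this sketch need not match RStan output exactly. The chains below are synthetic.

```python
import random
import statistics


def r_hat(chains):
    """Potential scale reduction factor for m chains of equal length n.

    B = n/(m-1) * sum_j (chain mean_j - grand mean)^2   (between-chain variance)
    W = average of the within-chain sample variances     (within-chain variance)
    R-hat = sqrt((n-1)/n + B/(n W))
    """
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    b = n / (m - 1) * sum((mj - grand) ** 2 for mj in means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    return ((n - 1) / n + b / (n * w)) ** 0.5


# Four synthetic chains of 20 000 draws: well mixed vs. one chain offset by 1
rng = random.Random(7)
good = [[rng.gauss(0.0, 1.0) for _ in range(20000)] for _ in range(4)]
bad = good[:3] + [[x + 1.0 for x in good[3]]]
print(f"R-hat, mixed chains: {r_hat(good):.4f}")
print(f"R-hat, one offset chain: {r_hat(bad):.3f}")
```

For well-mixed chains $B/(nW)$ is of order $1/n$, so $\hat{R}$ stays within a small fraction of a percent of 1; a single chain stuck one standard deviation away pushes it to about 1.12.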

To improve the numerical stability of the model, the data are rescaled by dividing the group means by their mean and adapting the within-group standard deviations accordingly. In the parameter block (see Table 1), the scaled parameters are declared. The unscaled parameters are computed in the block generated quantities. The rescaling also helps when using the model for other datasets with a similar structure.

Results

Measurement data

The results of the homogeneity study are shown in Fig. 1. The datasets for the various components are quite different in terms of the dispersion of the group means and how this dispersion relates to the expanded uncertainty, computed as $2s_i$, where $s_i$ denotes the standard deviation of the group mean. For all components except ethane and nitrogen, the dispersion of the group means is larger than would be expected from the computed expanded uncertainty.

The variability of the within-group standard deviations differs appreciably among components. The datasets for n-hexane and carbon dioxide show the largest heteroscedasticity in this respect. The datasets for, e.g., methane, ethane and the pentanes look much more homoscedastic, i.e. they have similar within-group standard deviations.

Pooling of the within-group standard deviation is often appropriate in between-bottle homogeneity studies, because the data are usually obtained under repeatability conditions [4]. In most cases, it is unlikely that the repeatability standard deviation fluctuates appreciably from bottle to bottle. There is of course variability in the calculated standard deviations, which is to a large extent due to the limited number of repeat observations.

Traditional ANOVA

The results of the evaluation of the between-bottle homogeneity from Fig. 1 using one-way ANOVA [4] are given in Table 2. As the within- and between-group standard deviations are small, they are given in µmol mol⁻¹ (parts per million). Except for nitrogen, all datasets have a nonzero between-bottle standard deviation. For nitrogen, $MS_\mathrm{between}$ is smaller than $MS_\mathrm{within}$. This dataset alone would be sufficient motivation for using Bayesian inference instead of the traditional method for evaluating this homogeneity study.

For nitrogen, Eq. (1) gives $u_\mathrm{bb} = 1.48$ µmol mol⁻¹, whereas $\sqrt{MS_\mathrm{within}/n} = 3.13$ µmol mol⁻¹. The latter has been proposed as a “safe” value for $s_\mathrm{bb}$ in case $MS_\mathrm{between} < MS_\mathrm{within}$ [13]. For this dataset, these two alternatives differ appreciably.

The results in Table 2 have been used in this work as a benchmark in the validation of the Bayesian models with and without pooling, except for nitrogen, for which traditional ANOVA is unable to provide a nonzero between-bottle standard deviation.

Meta-analysis

The results of the meta-analysis using the DL model are shown in Table 3. The mean values $\hat{\mu}$ from the meta-analysis are very close to those from traditional ANOVA (see Table 2). The between-bottle standard deviation $\hat{\tau}$ is larger in the meta-analysis than in the traditional analysis of variance for ethane, iso-butane, n-butane, carbon dioxide and methane. The between-bottle standard deviation is smaller for propane and n-hexane. For the other components, $\hat{\tau}$ from the meta-analysis is very similar to that from traditional ANOVA. Also, meta-analysis provides nonzero values of $\tau$ for all components except nitrogen.

Bayesian analysis, pooled within-group standard deviation

In a Bayesian analysis involving MCMC, it is important to assess the convergence of the iterative calculation [12]. Trace plots show the value of a parameter as a function of the iteration number. For both standard deviations, the trace plots are shown for the scaled parameters tau_ and sig_, which are defined as $\tau/\bar{y}$ and $\sigma/\bar{y}$, respectively. These scaled standard deviations are good approximations of the corresponding coefficients of variation and can be interpreted as such. This presentation makes it easier to compare the posterior distributions (posteriors) of the standard deviations across components.

The trace plots for ethane are shown in Fig. 2. These plots are typical for parameters for which the MCMC calculation converged. The results for the four chains are very similar, and there is no meaningful between-chain variability. This is also reflected in the value of $\hat{R}$, which is indeed very close to one for the three parameters (see Table 4). The results are given with the number of meaningful digits derived from the standard error of the MCMC (SE).

In Table 4, the column labelled Mean gives the mean of the parameter, SE the standard deviation of the mean due to the random error in the MCMC, and SD the standard deviation of the posterior of the parameter. The next two columns give the limits of the symmetric 95 % credible interval. The column labelled $n_\mathrm{eff}$ gives the approximate number of independent samples and the last column the value of $\hat{R}$. R also returns these characteristics for all other model

Fig. 1 Amount-of-substance fractions of the components (in cmol mol⁻¹, or, equivalently, %) in the 10 synthetic natural gas mixtures. The half-widths of the uncertainty bars represent the expanded uncertainty, computed as $2s_i$, where $s_i$ denotes the standard deviation of the group mean. [Figure: ten panels (ethane, propane, iso-butane, n-butane, iso-pentane, n-pentane, n-hexane, nitrogen, carbon dioxide, methane), each showing fraction (cmol/mol) versus mixture 1–10]

Table 2 Results of the evaluation of the between-bottle homogeneity data using traditional analysis of variance

Component         μ             MS_between       MS_within         τ̂             σ̂
                  cmol mol⁻¹    cmol² mol⁻²      cmol² mol⁻²       µmol mol⁻¹    µmol mol⁻¹
Ethane            3.49779       1.694 × 10⁻⁶     1.184 × 10⁻⁶      3.19          10.88
Propane           4.70073       4.278 × 10⁻⁶     2.213 × 10⁻⁷      9.01          4.70
iso-Butane        0.123405      2.139 × 10⁻⁹     1.796 × 10⁻¹⁰     0.20          0.13
n-Butane          0.122493      2.783 × 10⁻⁹     1.993 × 10⁻¹⁰     0.23          0.14
iso-Pentane       0.030568      2.667 × 10⁻¹⁰    9.220 × 10⁻¹¹     0.06          0.10
n-Pentane         0.031213      4.436 × 10⁻¹⁰    4.385 × 10⁻¹¹     0.09          0.07
n-Hexane          0.075964      4.408 × 10⁻⁹     1.193 × 10⁻¹⁰     0.29          0.11
Nitrogen          0.425144      3.249 × 10⁻⁷     4.903 × 10⁻⁷      –             7.00
Carbon dioxide    5.03265       9.014 × 10⁻⁶     1.320 × 10⁻⁶      12.4          11.5
Methane           85.9601       6.472 × 10⁻⁴     8.288 × 10⁻⁵      106           91

Each dataset comprises 10 gas mixtures and 5 replicates per mixture [16]
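The entries of Table 2 follow directly from Eq. (3), and the nitrogen values quoted in the text follow from Eq. (1). The sketch below recomputes them from the tabulated mean squares with $a = 10$ and $n = 5$; the helper `mean_squares` shows how the mean squares would be obtained from raw replicate data of a balanced design. Function names are illustrative.

```python
import math
import statistics


def mean_squares(groups):
    """MS_between and MS_within of a balanced one-way layout (a groups of n replicates)."""
    a, n = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    return ms_between, ms_within


def anova_homogeneity(ms_between, ms_within, n, a):
    """Return (tau, sigma, u_bb): Eq. (3), sqrt(MS_within) and Eq. (1).

    tau is None when MS_between < MS_within, i.e. when Eq. (3) has no real root.
    """
    diff = ms_between - ms_within
    tau = math.sqrt(diff / n) if diff >= 0 else None
    sigma = math.sqrt(ms_within)
    u_bb = math.sqrt(ms_within / n) * (2.0 / (a * (n - 1))) ** 0.25
    return tau, sigma, u_bb


TO_UMOL = 1e4  # 1 cmol/mol = 10^4 umol/mol

# Mean squares (cmol^2 mol^-2) for ethane and nitrogen, copied from Table 2
table2 = {"ethane": (1.694e-6, 1.184e-6), "nitrogen": (3.249e-7, 4.903e-7)}
results = {}
for name, (msb, msw) in table2.items():
    tau, sigma, u_bb = anova_homogeneity(msb, msw, n=5, a=10)
    results[name] = (tau, sigma, u_bb)
    tau_txt = "undefined" if tau is None else f"{tau * TO_UMOL:.2f}"
    print(f"{name}: tau = {tau_txt}, sigma = {sigma * TO_UMOL:.2f}, "
          f"u_bb = {u_bb * TO_UMOL:.2f} umol/mol")
```

For ethane this reproduces $\hat{\tau} = 3.19$ µmol mol⁻¹ and $\hat{\sigma} = 10.88$ µmol mol⁻¹; for nitrogen $\hat{\tau}$ is undefined, while Eq. (1) gives $u_\mathrm{bb} = 1.48$ µmol mol⁻¹, as quoted in the text.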

Table 3 Results of the evaluation of the between-bottle homogeneity data using meta-analysis and the DerSimonian–Laird model

Component         μ̂             u(μ̂)          τ̂
                  cmol mol⁻¹    cmol mol⁻¹    µmol mol⁻¹
Ethane            3.497798      0.000201      4.49
Propane           4.700733      0.000257      7.86
iso-Butane        0.123404      0.000005      0.15
n-Butane          0.122493      0.000006      0.18
iso-Pentane       0.030568      0.000002      0.07
n-Pentane         0.031212      0.000003      0.09
n-Hexane          0.075965      0.000006      0.19
Nitrogen          0.425069      0.000073      –
Carbon dioxide    5.03267       0.00040       11.7
Methane           85.9602       0.0038        114
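Eq. (4) is short enough to code directly. The sketch below takes $s_i$ as the standard deviation of the group mean, as in the definition following Eq. (4). How the $\hat{\mu}$ and $u(\hat{\mu})$ columns of Table 3 were computed is not stated; the sketch assumes the usual DL random-effects weights $1/(s_i^2 + \hat{\tau}^2)$ for that part. The group data are made up for illustration.

```python
import math
import statistics


def dersimonian_laird(means, sds):
    """DerSimonian-Laird estimate following Eq. (4).

    means: group means Y_i; sds: standard deviations s_i of those means.
    Returns (mu_hat, u(mu_hat), tau^2), with tau^2 truncated at zero.
    """
    w = [1.0 / s ** 2 for s in sds]
    sw = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, means)) / sw
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, means))
    tau2 = (q - (len(means) - 1)) / (sw - sum(wi ** 2 for wi in w) / sw)
    tau2 = max(tau2, 0.0)
    # Random-effects weights 1/(s_i^2 + tau^2) for the weighted mean (assumed convention)
    w_re = [1.0 / (s ** 2 + tau2) for s in sds]
    mu_hat = sum(wi * yi for wi, yi in zip(w_re, means)) / sum(w_re)
    return mu_hat, math.sqrt(1.0 / sum(w_re)), tau2


# Hypothetical balanced data: 3 groups of n = 3 replicates
groups = [[1.0, 1.2, 0.9], [1.5, 1.6, 1.4], [0.7, 0.8, 0.9]]
n = len(groups[0])
means = [statistics.fmean(g) for g in groups]

# Unpooled s_i, as in the DL analysis: within-group SD divided by sqrt(n)
s_unpooled = [statistics.stdev(g) / math.sqrt(n) for g in groups]
print("DL, unpooled:", dersimonian_laird(means, s_unpooled))

# With the same pooled s_i for every group, Eq. (4) reduces to Eq. (3)
ms_within = statistics.fmean([statistics.variance(g) for g in groups])
ms_between = n * statistics.variance(means)
s_pooled = [math.sqrt(ms_within / n)] * len(groups)
tau2_dl = dersimonian_laird(means, s_pooled)[2]
tau2_anova = (ms_between - ms_within) / n
print(f"tau^2: DL (pooled) = {tau2_dl:.6f}, ANOVA Eq. (3) = {tau2_anova:.6f}")
```

With identical $s_i = \sqrt{MS_\mathrm{within}/n}$, the numerator of Eq. (4) becomes $(a-1)(MS_\mathrm{between}/MS_\mathrm{within} - 1)$ and the denominator $(a-1)\,n/MS_\mathrm{within}$, which gives Eq. (3). This is one way to see why the two classical methods agree closely when the within-group standard deviations are similar.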

Fig. 2 Trace plots for $\mu$, $\tau/\bar{y}$ and $\sigma/\bar{y}$ for ethane [Figure: three panels, μ (cmol/mol), τ/ȳ and σ/ȳ versus iteration number (5000 to 25 000), chains 1 to 4]

Table 4 Results of the MCMC calculation for ethane

     Mean         SE           SD           q0.025       q0.975       n_eff     R̂
μ    3.4977911    0.0000008    0.0003141    3.4971687    3.4984109    145081    1.0000
τ    0.0003508    0.0000009    0.0002891    0.0000139    0.0010754    109659    1.0001
σ    0.0008777    0.0000003    0.0000990    0.0006905    0.0010781    152000    1.0000

Given are the mean, the standard error from the MCMC (SE), the standard deviation (SD), the 2.5 % and 97.5 % quantiles (q0.025 and q0.975, respectively), all expressed in cmol mol⁻¹, the effective number of samples (n_eff) and R̂
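The SE column of Table 4 is consistent with $\mathrm{SE} = \mathrm{SD}/\sqrt{n_\mathrm{eff}}$ (for $\tau$, $0.0002891/\sqrt{109659} \approx 0.0000009$), and the reported decimals stop at the first significant digit of SE. The helpers below sketch that bookkeeping for a list of posterior draws; the function names and the digit rule in `meaningful_decimals` are an illustrative reading of "meaningful digits", not code from this work, and $n_\mathrm{eff}$ is taken as given rather than estimated.

```python
import math
import statistics


def summarise(draws, n_eff):
    """Posterior summary in the layout of Tables 4 and 5 (a sketch).

    n_eff is supplied by the caller (e.g. from the sampler's diagnostics);
    the Monte Carlo standard error is approximated as SD / sqrt(n_eff).
    """
    q = statistics.quantiles(draws, n=40, method="inclusive")  # cut points at k/40
    sd = statistics.stdev(draws)
    return {"Mean": statistics.fmean(draws), "SE": sd / math.sqrt(n_eff),
            "SD": sd, "q0.025": q[0], "q0.975": q[-1]}


def meaningful_decimals(se):
    """Number of decimal places down to the first significant digit of SE."""
    return -math.floor(math.log10(se))


# Check against the tau row of Table 4 (cmol/mol): SD and n_eff as reported
se_tau = 0.0002891 / math.sqrt(109659)
print(f"SE(tau) = {se_tau:.1e}; report to {meaningful_decimals(se_tau)} decimals")
```

With 40 equal-probability slices, the first and last cut points of `statistics.quantiles` are exactly the 2.5 % and 97.5 % quantiles used for the credible interval.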

parameters, but these are omitted for brevity. It is worth noting that $\hat{\tau}$ from ANOVA lies in the 95 % coverage interval of the same parameter from the Bayesian analysis. The repeatability standard deviation $\hat{\sigma}$ from ANOVA lies just above the 97.5 % quantile of the same parameter from the Bayesian analysis.

The posterior distributions of $\mu$, $\tau$ and $\sigma$ for ethane are shown in Fig. 3. The posterior of $\tau$ is very asymmetric. It is worth noting that the mode of the posterior of $\sigma/\bar{y}$ is much larger than that of $\tau/\bar{y}$. In the context of the GUM [15], only the location of the posteriors of $\tau$ and $\sigma$ is of relevance; the “uncertainty of the uncertainty” is not further relevant to the evaluation of measurement uncertainty. The means of the posteriors of $\tau$ and $\sigma$ are taken as estimates of the corresponding standard deviations.

The fit of the model shown in Table 1 converged well for the datasets of the other components. Although the Bayesian model uses the scaled parameters for the computations, the results for propane and n-hexane indicated that convergence was slower than for the other datasets.

Computationally, the nitrogen dataset was not more challenging than the others. The trace plots of the Bayesian analysis (see Fig. 4) show good convergence. The posteriors are shown in Fig. 5. The posterior of $\tau$ for nitrogen looks as skewed as that for ethane (see Fig. 3).

The output for nitrogen for $\mu$, $\tau$ and $\sigma$ is shown in Table 5. The between-bottle standard deviation is estimated to be $\hat{\tau} = 1.72$ µmol mol⁻¹, which lies between $u_\mathrm{bb} = 1.48$ µmol mol⁻¹ and $\sqrt{MS_\mathrm{within}/n} = 3.13$ µmol mol⁻¹. Both $u_\mathrm{bb}$ and $\sqrt{MS_\mathrm{within}/n}$ lie in the 95 % coverage interval of $\tau$.

Comparing the results for nitrogen with those for ethane (see Table 4), it is worth noting that the standard deviation of $\tau$ is quite large for both datasets. With the Bayesian model, these datasets give very similar estimates of $\tau$ and $\sigma$ (when expressed as coefficients of variation), yet the classical methods (ANOVA and meta-analysis) treat the two datasets very differently.

The value of $\hat{\tau}$ for nitrogen agrees neither with the practice of setting it to zero nor with the value of $u_\mathrm{bb}$ from Eq. (1). Expressed as a coefficient of variation (0.040 %), it is substantially larger than that of carbon dioxide (0.003 %) and methane (0.010 %). $\sqrt{MS_\mathrm{within}/n}$ is substantially larger than $\hat{\tau}$, thus providing a “safe value” for this uncertainty component.

The results of the Bayesian analysis using the model with pooling are summarised in Table 6. The between-bottle standard deviation is larger than that from traditional ANOVA (see Table 2) for ethane, propane and carbon dioxide, and smaller for iso-pentane and methane. For the other components, the between-bottle standard deviations from both approaches are very similar.

Although a weakly informative prior is used for $\tau$, it is useful to perform an analysis of the sensitivity of the estimated between-bottle homogeneity standard
Fig. 3 Posteriors of $\mu$, $\tau/y$, and $\sigma/y$ for ethane (panels: density against $\mu$ (cmol/mol), $\tau/y$, and $\sigma/y$)


Fig. 4 Trace plots for $\mu$, $\tau/y$, and $\sigma/y$ for nitrogen (four chains; x-axis: iteration number; y-axes: $\mu$ (cmol/mol), $\tau/y$, and $\sigma/y$)


Fig. 5 Posteriors of $\mu$, $\tau/y$, and $\sigma/y$ for nitrogen (panels: density against $\mu$ (cmol/mol), $\tau/y$, and $\sigma/y$)

Table 5 Results of the MCMC calculation for nitrogen

Parameter   Mean        SE          SD          $q_{0.025}$   $q_{0.975}$   $n_\text{eff}$   $\hat{R}$
$\mu$       0.4251443   0.0000004   0.0001680   0.4248122     0.4254757     152000           1.0000
$\tau$      0.0001720   0.0000004   0.0001439   0.0000061     0.0005320     152000           1.0000
$\sigma$    0.0004808   0.0000001   0.0000557   0.0003752     0.0005935     152000           1.0000

Given are the mean, the standard error from the MCMC (SE), the standard deviation (SD), the 2.5 % and 97.5 % quantiles ($q_{0.025}$ and $q_{0.975}$, respectively), all expressed in cmol mol⁻¹, the effective number of samples ($n_\text{eff}$) and the potential scale reduction statistic $\hat{R}$

deviation $\hat{\tau}$ as a function of the scale parameter $A$ of the Cauchy distribution. For ease of comparison between components, the results and scale parameters are given relative to the corresponding $\hat{\mu}$ for the component. The superiority of this distribution over reference priors for a between-group standard deviation is well covered in the literature [21].

The results of the sensitivity analysis for selected components are shown in Fig. 6. Only the computed between-bottle homogeneity standard deviation for nitrogen shows a noticeable influence of the scale parameter $A$ of the Cauchy distribution. It should, however, be kept in mind that this parameter has been evaluated over a very wide range, in this case from 0.05 % to 1 %. Usually, inspecting the group means (= means of the bottles) should make it possible to obtain a realistic value for the scale parameter $A$, in accordance with the recommendations of Gelman et al. [12].

Bayesian analysis, no pooling of the within-group standard deviation

There can be circumstances where, even in a between-bottle homogeneity study, it is unclear whether pooling of


Table 6 Values and standard deviations of the fitted parameters $\mu$, $\tau$, and $\sigma$ for the homogeneity data, expressed as amount-of-substance fractions, using the model with pooling of the within-group standard deviations

Component        $\hat{\mu}$   $s(\hat{\mu})$   $\hat{\tau}$   $s(\hat{\tau})$   $\hat{\sigma}$   $s(\hat{\sigma})$
                 cmol mol⁻¹    cmol mol⁻¹       µmol mol⁻¹     µmol mol⁻¹        µmol mol⁻¹       µmol mol⁻¹
Ethane           3.497790      0.000314         3.51           2.89              8.78             0.99
Propane          4.700730      0.000335         9.35           2.97              3.98             0.44
iso-Butane       0.123405      0.000008         0.20           0.08              0.11             0.01
n-Butane         0.122493      0.000009         0.24           0.08              0.11             0.01
iso-Pentane      0.030568      0.000003         0.04           0.03              0.07             0.01
n-Pentane        0.031213      0.000003         0.08           0.04              0.06             0.01
n-Hexane         0.075964      0.000010         0.30           0.08              0.06             0.01
Nitrogen         0.425144      0.000168         1.72           1.44              4.81             0.56
Carbon dioxide   5.03265       0.00050          13.6           4.9               6.7              0.7
Methane          85.9601       0.0041           90             46                79               9

the within-group standard deviations is justified or appropriate. The underlying assumption for pooling is that all within-group standard deviations can be viewed as draws from the same probability distribution. Some experimenters would prefer not to rely on such an assumption at all. Hence, the same dataset (see Fig. 1) has also been evaluated with a model without pooling.

The model without pooling is somewhat more difficult to handle computationally than the model with pooling. This is largely because each within-group standard deviation is now estimated from only 5 degrees of freedom, which is harder for the sampler than the large number of degrees of freedom available with pooling. Eventually, the problem was satisfactorily solved by rescaling the data and lengthening the warm-up phase of the MCMC. The trace plots look very similar to those obtained for the model with pooling (see Figs. 2 and 4) and are therefore not presented here. The same applies to the posteriors, which are also very similar in shape to their counterparts in section "Bayesian analysis, pooled within-group standard deviation" (see Figs. 3 and 5).

The results of the Bayesian analysis using a hierarchical model without pooling are summarised in Table 7. They compare well to those for the model with pooling (see Table 6). This is not surprising, because the measurements were taken under repeatability conditions in the same experiment. The between-bottle standard deviation for ethane is larger than for the model with pooling, and that for nitrogen is smaller. That these standard deviations differ most (in a relative sense) is not surprising either, because for these components the posteriors of $\tau$ are relatively broad, indicating that the derived estimates $\hat{\tau}$ are more uncertain than those for the other components.

Fig. 6 Sensitivity analysis of $\hat{\tau}$ for different values of the scale parameter $A$ of the Cauchy distribution ($\tau/\mu$ (%) against $A/\mu$ for ethane, propane, iso-butane, n-butane, nitrogen, carbon dioxide and methane). According to Gelman [21], this scale parameter should not be smaller than the between-group standard deviation

Discussion and conclusions

A Bayesian model is presented for the evaluation of between-bottle homogeneity studies in the production of reference and proficiency test materials. For all components in a challenging dataset with small values of the between-bottle standard deviation, the model provides a nonzero value for this standard deviation. The models with and without pooling give very similar answers for most components. The larger differences in the between-bottle standard deviation ($\tau$) for ethane and nitrogen are partly due to the posterior distributions of $\tau$ being wider for these components, which indicates more uncertainty about the estimates $\hat{\tau}$.
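The hierarchical model discussed above can be illustrated with a minimal, self-contained sketch in Python (standard library only). It is not the model of Table 1 as fitted in this work with Stan [25, 29]: a hand-tuned random-walk Metropolis sampler is swapped in for Stan, and the flat prior on $\mu$, the $1/\sigma$ prior on $\sigma$, the half-Cauchy scale `A` and the function names `log_posterior`, `sample_posterior` and `summarise` are assumptions made for illustration only. The sketch covers the pooled case (one $\sigma$ for all bottles), integrates the bottle effects out analytically, and reports the posterior mean as the estimate, as is done for $\hat{\tau}$ and $\hat{\sigma}$ in this paper.

```python
import math
import random
import statistics


def log_posterior(mu, log_tau, log_sigma, groups, A):
    """Unnormalised log posterior of y_ij = mu + b_i + e_ij with
    b_i ~ N(0, tau^2) and e_ij ~ N(0, sigma^2) (pooled sigma).

    The bottle effects b_i are integrated out analytically, bottle by bottle.
    Assumed priors (illustrative only): flat on mu, half-Cauchy(0, A) on tau,
    p(sigma) proportional to 1/sigma. Because log(tau) and log(sigma) are
    sampled, the Jacobian adds log(tau) and cancels the 1/sigma prior.
    """
    tau, sigma = math.exp(log_tau), math.exp(log_sigma)
    lp = -math.log1p((tau / A) ** 2) + log_tau  # half-Cauchy prior + Jacobian
    for g in groups:
        n = len(g)
        ybar = sum(g) / n
        ss_within = sum((y - ybar) ** 2 for y in g)
        v = sigma ** 2 + n * tau ** 2  # n * Var(ybar_i)
        lp += (-(n - 1) * log_sigma - 0.5 * math.log(v)
               - ss_within / (2 * sigma ** 2)
               - n * (ybar - mu) ** 2 / (2 * v))
    return lp


def sample_posterior(groups, A, n_iter=20000, warmup=5000, seed=1):
    """Random-walk Metropolis on (mu, log tau, log sigma).

    The data are standardised first (a crude analogue of working with scaled
    parameters) and the draws are transformed back to the original units.
    """
    values = [y for g in groups for y in g]
    m, s = statistics.fmean(values), statistics.stdev(values)
    z = [[(y - m) / s for y in g] for g in groups]
    rng = random.Random(seed)
    state = [0.0, math.log(0.5), math.log(0.5)]
    step = [0.15, 0.6, 0.1]  # proposal SDs, tuned by hand for this sketch
    lp = log_posterior(*state, z, A / s)
    draws = {"mu": [], "tau": [], "sigma": []}
    for it in range(n_iter):
        prop = [x + rng.gauss(0.0, h) for x, h in zip(state, step)]
        lp_prop = log_posterior(*prop, z, A / s)
        if math.log(1.0 - rng.random()) < lp_prop - lp:  # Metropolis acceptance
            state, lp = prop, lp_prop
        if it >= warmup:
            draws["mu"].append(m + s * state[0])
            draws["tau"].append(s * math.exp(state[1]))
            draws["sigma"].append(s * math.exp(state[2]))
    return draws


def summarise(x):
    """Posterior mean (used as the estimate), SD and central 95 % interval."""
    q = statistics.quantiles(x, n=40)  # cut points at 2.5 %, 5 %, ..., 97.5 %
    return {"mean": statistics.fmean(x), "sd": statistics.stdev(x),
            "q0.025": q[0], "q0.975": q[-1]}


if __name__ == "__main__":
    # Hypothetical data: 10 bottles x 6 replicates simulated with
    # mu = 100, tau = 0.5, sigma = 1 (not the data of this paper).
    rng = random.Random(42)
    bottles = []
    for _ in range(10):
        b = rng.gauss(0.0, 0.5)
        bottles.append([100.0 + b + rng.gauss(0.0, 1.0) for _ in range(6)])
    draws = sample_posterior(bottles, A=1.0)
    for name in ("mu", "tau", "sigma"):
        print(name, summarise(draws[name]))
```

Because the bottle effects are integrated out bottle by bottle, `log_posterior` also accepts unequal numbers of replicates. Re-running `sample_posterior` over a grid of values of `A` (for example 0.05 % to 1 % of the grand mean) and summarising the `tau` draws gives a crude analogue of the sensitivity analysis of Fig. 6; the multi-chain convergence diagnostics reported in Table 5 are not reproduced here.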


Table 7 Values and standard deviations of the fitted parameters $\mu$ and $\tau$ for the homogeneity data, expressed as amount-of-substance fractions, using the model without pooling of the within-group standard deviations

Component        $\hat{\mu}$   $s(\hat{\mu})$   $\hat{\tau}$   $s(\hat{\tau})$
                 cmol mol⁻¹    cmol mol⁻¹       µmol mol⁻¹     µmol mol⁻¹
Ethane           3.497790      0.000212         4.23           2.23
Propane          4.700731      0.000321         9.65           2.65
iso-Butane       0.123404      0.000007         0.20           0.06
n-Butane         0.122493      0.000008         0.24           0.07
iso-Pentane      0.030568      0.000003         0.07           0.02
n-Pentane        0.031213      0.000003         0.10           0.03
n-Hexane         0.075964      0.000010         0.31           0.08
Nitrogen         0.425086      0.000096         1.21           0.92
Carbon dioxide   5.03267       0.00048          13.5           3.9
Methane          85.9601       0.0041           115            33
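The comparisons with classical methods in this section rest on a few one-way ANOVA quantities. As a minimal sketch for a balanced design (the function name `homogeneity_anova` is hypothetical), the following computes $MS_\text{between}$, $MS_\text{within}$, the ANOVA between-bottle standard deviation $\sqrt{(MS_\text{between} - MS_\text{within})/n}$ (no value when $MS_\text{between} < MS_\text{within}$), the "safe value" $\sqrt{MS_\text{within}/n}$ [13], and $u^*_\text{bb} = \sqrt{MS_\text{within}/n}\,\sqrt[4]{2/\nu_\text{within}}$ as we understand it from ISO Guide 35:2006; that this is the same expression as Eq. (1) is an assumption.

```python
import math


def homogeneity_anova(groups):
    """Classical one-way ANOVA for a balanced between-bottle homogeneity study.

    groups: one list of n replicate results per bottle (k bottles).
    """
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes k >= 2 bottles with n >= 2 replicates each")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    nu_within = k * (n - 1)
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / nu_within
    diff = ms_between - ms_within
    safe = math.sqrt(ms_within / n)
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        # s_bb = sqrt((MS_between - MS_within) / n); the variance estimate is
        # negative when MS_between < MS_within, so no value is returned then
        "s_bb": math.sqrt(diff / n) if diff > 0 else None,
        # "safe value" sqrt(MS_within / n)
        "safe_value": safe,
        # u*_bb = sqrt(MS_within / n) * (2 / nu_within)^(1/4) (ISO Guide 35:2006 form)
        "u_bb_star": safe * (2 / nu_within) ** 0.25,
    }


# Hypothetical toy data with MS_between < MS_within
print(homogeneity_anova([[1.0, 3.0], [2.0, 4.0]]))
# prints {'ms_between': 1.0, 'ms_within': 2.0, 's_bb': None, 'safe_value': 1.0, 'u_bb_star': 1.0}
```

In this toy example $MS_\text{between} = 1.0 < MS_\text{within} = 2.0$, so ANOVA yields no between-bottle standard deviation, while the safe value is 1.0.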

The estimated between-bottle standard deviation $\hat{\tau}$ for nitrogen is more sensitive to the choice of the scale parameter of the prior (the Cauchy distribution) than for the other components. The estimates of the between-bottle standard deviation of the other components depend only very weakly on the choice of this scale parameter. Whether this difference in behaviour is related to the fact that the classical statistical methods are unable to provide a value for the between-bottle standard deviation for nitrogen cannot be concluded from this dataset alone.

Notwithstanding the desire to make the GUM "Bayesian" [31], classical statistical methods can also be employed in most circumstances, as they generally provide very similar values for the parameters of interest (i.e. $\mu$, $\tau$, and $\sigma$). This conclusion is supported by evaluating the homogeneity data with traditional analysis of variance (ANOVA) and with meta-analysis using the DerSimonian–Laird model. Save for nitrogen, the results from these classical methods are very similar to the results from the Bayesian models.

The Bayesian analysis gives a value for the between-bottle standard deviation that is not even close to $u_\text{bb}$ as proposed [4] and described in ISO Guide 35 [5] for the case $MS_\text{between} < MS_\text{within}$. When classical statistical methods are used, it would be more appropriate to take $\sqrt{MS_\text{within}/n}$ [13] as a "safe value" under these circumstances.

Acknowledgements The work presented in this paper was funded by the Ministry of Economic Affairs of the Netherlands.

References

1. Pauwels J, Lamberty A, Schimmel H (1998) Homogeneity testing of reference materials. Accred Qual Assur 3(2):51–55. doi:10.1007/s007690050186
2. Pauwels J, van der Veen AMH, Lamberty A, Schimmel H (2000) Evaluation of uncertainty of reference materials. Accred Qual Assur 5(3):95–99. doi:10.1007/s007690050020
3. van der Veen AMH, Pauwels J (2000) Uncertainty calculations in the certification of reference materials. 1. Principles of analysis of variance. Accred Qual Assur 5(12):464–469. doi:10.1007/s007690000237
4. van der Veen AMH, Linsinger TP, Pauwels J (2001) Uncertainty calculations in the certification of reference materials. 2. Homogeneity study. Accred Qual Assur 6(1):26–30. doi:10.1007/s007690000238
5. ISO Guide 35 (2006) Reference materials—general and statistical principles for certification. International Organization for Standardization, Geneva
6. ISO Guide 34 (2009) General requirements for the competence of reference material producers. International Organization for Standardization, Geneva
7. ISO 17034 (2016) General requirements for the competence of reference material producers. International Organization for Standardization, Geneva
8. ISO 13528 (2015) Statistical methods for use in proficiency testing by interlaboratory comparison. International Organization for Standardization, Geneva
9. ISO/IEC 17043 (2010) Conformity assessment—general requirements for proficiency testing. International Organization for Standardization, Geneva
10. Ulrich JC, Sarkis JES, Hortellani MA (2015) Homogeneity study of candidate reference material in fish matrix. J Phys Conf Ser 575:012040. doi:10.1088/1742-6596/575/1/012040
11. Linsinger TPJ, Pauwels J, van der Veen AMH, Schimmel H, Lamberty A (2001) Homogeneity and stability of reference materials. Accred Qual Assur 6(1):20–25. doi:10.1007/s007690000261
12. Gelman A, Carlin J, Stern H, Dunson D, Vehtari A, Rubin D (2013) Bayesian data analysis, 3rd edn. Chapman and Hall/CRC, Boca Raton
13. Ellison SLR (2015) Homogeneity studies and ISO Guide 35:2006. Accred Qual Assur 20(6):519–528. doi:10.1007/s00769-015-1162-z
14. Hoff PD (2009) A first course in Bayesian statistical methods. Springer, New York. doi:10.1007/978-0-387-92407-6
15. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, OIML (2008) Guide to the expression of uncertainty in measurement, JCGM 100:2008, GUM 1995 with minor corrections. BIPM, Sèvres
16. Beelen R (2016) Preparation of a homogeneous set of PT materials. Technical report, VSL, Delft


17. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Controlled Clin Trials 7(3):177–188. doi:10.1016/0197-2456(86)90046-2. http://www.sciencedirect.com/science/article/pii/0197245686900462
18. ISO 6143 (2001) Gas analysis—comparison methods for determining and checking the composition of calibration gas mixtures, 2nd edn. International Organization for Standardization, Geneva
19. ISO 6974-1 (2012) Natural gas—determination of composition with defined uncertainty by gas chromatography—part 1: guidelines for tailored analysis. International Organization for Standardization, Geneva
20. ISO 6974-2 (2012) Natural gas—determination of composition with defined uncertainty by gas chromatography—part 2: measuring-system characteristics and statistics for processing of data. International Organization for Standardization, Geneva
21. Gelman A (2006) Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal 1:515–534. doi:10.1214/06-ba117a
22. DerSimonian R, Kacker R (2007) Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials 28(2):105–114. doi:10.1016/j.cct.2006.04.004
23. Kacker RN (2004) Combining information from interlaboratory evaluations using a random effects model. Metrologia 41(3):132–136. doi:10.1088/0026-1394/41/3/004
24. Rivier C, Désenfant M, Crozet M, Rigaux C, Roudil D, Tufféry B, Ruas A (2014) Use of an excess variance approach for the certification of reference materials by interlaboratory comparison. Accred Qual Assur 19(4):269–274. doi:10.1007/s00769-014-1066-3
25. Stan Developers Team (2016) Stan modeling language. User’s guide and reference manual. http://mc-stan.org/documentation/
26. Klauenberg K, Wübbeler G, Mickan B, Harris P, Elster C (2015) A tutorial on bayesian normal linear regression. Metrologia 52(6):878–892. doi:10.1088/0026-1394/52/6/878
27. O’Hagan A (2014) Eliciting and using expert knowledge in metrology. Metrologia 51(4):S237–S244. doi:10.1088/0026-1394/51/4/s237
28. R Core Team (2016) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
29. Carpenter B, Gelman A, Hoffman M, Lee D, Goodrich B, Betancourt M, Brubaker MA, Guo J, Li P, Riddell A (2016) Stan: a probabilistic programming language. J Stat Softw (in press)
30. Klauenberg K, Elster C (2016) Markov chain monte carlo methods: an introductory example. Metrologia 53(1):S32–S39. doi:10.1088/0026-1394/53/1/s32
31. Bich W, Cox MG, Dybkaer R, Elster C, Estler WT, Hibbert B, Imai H, Kool W, Michotte C, Nielsen L, Pendrill L, Sidney S, van der Veen AMH, Wöger W (2012) Revision of the "Guide to the expression of Uncertainty in Measurement". Metrologia 49(6):702–705. http://stacks.iop.org/0026-1394/49/i=6/a=702
