Professional Documents
Culture Documents
Cordeiro, F., Emons, H., & Robouch, P. (2022) - Is The Z Score Sufficient To Assess Participants..
Cordeiro, F., Emons, H., & Robouch, P. (2022) - Is The Z Score Sufficient To Assess Participants..
https://doi.org/10.1007/s00769-022-01496-w
PRACTITIONER'S REPORT
Received: 6 May 2021 / Accepted: 3 February 2022 / Published online: 18 March 2022
© The Author(s) 2022
Abstract
Proficiency testing providers, accreditation bodies and testing laboratories should be aware that a laboratory participating
in a proficiency testing round might have reported a biased result despite a satisfactory performance indicated by an assess-
ment using uniquely the z score. A complementary performance evaluation, based on the ζ score and the assessment of the
measurement uncertainty, is therefore highly recommended. This work presents an intuitive graphical tool (the Naji2 plot)
that combines z and ζ scores together with the reported measurement uncertainties. This tool allows a comprehensive assess-
ment of the laboratory performance and enables to identify the need for corrective actions. The concerned laboratory should
then perform a root cause analysis and investigate their bias and/or their measurement uncertainty evaluation.
Keywords Proficiency testing · Performance assessment · Zeta score · Measurement uncertainty · Naji plot
13
Vol.:(0123456789)
146 Accreditation and Quality Assurance (2022) 27:145–153
together with the respective measurement uncertainties, in Table 1 The 27 theoretical possibilities related to z score, ζ score and
order to deliver a comprehensive assessment of the labora- the measurement uncertainty evaluation (MU)
tory performance. z score ζ score MU Possibility
or Naji2 plot
areas
13
Accreditation and Quality Assurance (2022) 27:145–153 147
[ ] ( )
( )2 ( )2 2 measurement uncertainties (case “a”), when urel xi is equal to
(5)
2
PL u xi + u xpt ≥ 𝜎pt 2 z ( )
urel (xpt) or to ) . While
( 𝜎pt,rel ( )each line crosses the y-axis (z = 0)
at (u x)i = u xpt or at u xi = 𝜎pt , they both cross the x-axis
Further transformed into: (u xi = 0) at z = −xpt ∕𝜎pt .
( ) 2 ( ) 2
(u xi ∕𝜎pt ) ≥ z2 ∕PL 2 − (u xpt ∕𝜎pt ) (6) The bias boundaries
Equation 6 describes the original Naji plot parabolas
( ) 2 No bias can be identified (null hypothesis H0) when the distri-
presented by Robouch et al. [4], when plotting (u xi ∕𝜎pt )
bution of a measurement result and the interval of the assigned
(y-axis) as a function of the z score (x-axis). These graphs
value and its expanded uncertainty (both assumed to follow a
were used in several publications related to proficiency test-
normal or approximately normal distributions) overlap with a
ing rounds organised in a broad variety of fields [10–17].
probability (α) of rejecting a true H
0 of 5%, and a probability
Furthermore, this graphical representation is currently
(β) of accepting a false H0 of 5% [19]. Consequently, a value
implemented in the commercial software “PROLab™”
(xi) lower than the assigned value (xpt) would be negatively
developed by Quodata GmbH (Dresden, Germany) for the
biased (z < 0) if:
interpretation of results reported in the frame of interlabora-
( )
tory comparisons [18]. xi + 1.64u xi < xpt − 1.64u(xpt ) (10)
An alternative transformation of Eq. 5 is further
investigated: where 1.64 is the inverse of the standard normal cumula-
( )2 ( )2 tive distribution for a probability of 95% (using the built-in
u xi ≥ 𝜎pt ⋅ z∕PL − u(xpt )2 (7) spreadsheet function “NORM.S.INV(0.95)”).
Combining Eqs. 2 and 10, one derives the following linear
Equivalent to: relation between u(xi) and z:
√[ ][ ] ( ) ( )
𝜎pt 𝜎pt u xi < (−𝜎pt z∕1.64) − u xpt (11)
u(xi ) ≥ z − u(xpt ) z + u(xpt ) (8)
PL PL
A similar linear relation is obtained in the case of “positive
Equation 8 describes two sets of hyperbolas obtained bias” (z > 0):
when plotting u(xi) versus the z score, for the two acceptance ( ) ( )
criteria ( PL = 2 or 3). This graphical representation consti-
u xi < (𝜎pt z∕1.64) − u xpt (12)
tutes the “Naji2 plot”. Each hyperbola presents the following
The two lines defined by Eqs. 11 and 12 delimit the “upper
characteristics: (i) it is symmetric around the y-axis; (ii) it
range” of the bias boundaries. The ranges delimited by these
is always positive or equal to zero when z = ±PL u(xpt )∕𝜎pt ;
lines are similar to those defined by the two green hyperbolas
and (iii) it increases steadily with increasing |z| to reach
(|ζ|> 2) and comply with the criteria for bias set by ISO Guide
the linear asymptote (y = ±𝜎pt z∕PL ). However, Eq. 8 is not
33 [20]. Such large ζ scores may be caused either by a too large
valid for z values ranging between “−PL u(xpt )∕𝜎pt ” and
numerator, indicating a significant deviation (bias) of the
“+PL u(xpt )∕𝜎pt ”, because they would lead to a negative value | |
reported value from the assigned value (|xi − xpt |); or by a too
under the square root. √ | |
small denominator ( u(xi )2 + u(xpt )2 ) caused by an underes-
timated reported MU, assuming that the uncertainty associated
with the assigned value is realistic. The concerned laboratory
Assessment of measurement uncertainties should then perform a root cause analysis and investigate their
bias and/or their measurement uncertainty evaluation, in order
In order to include the assessment of measurement uncer- to resolve the identified poor performance expressed as a ζ
tainties (cases “a”, “b”, “c”) in the Naji2 plots, the relation score.
between the relative standard deviation (urel(xi)) and the z Despite the fact that many PT results are indeed correspond-
score is described hereafter: ing to a satisfactory performance when expressed uniquely by
( ) ( ) ( )( ) the z score (|z| ≤ 2), some of them are located below the bias
u xi = urel xi xi = urel xi 𝜎pt z + xpt (9) boundaries (with |ζ| > 2) and may require corrective actions
by the laboratory.
Equation 9 is a linear relation of u(xi) as a function of the
z score. Two specific lines delimit the range of “realistic”
13
148 Accreditation and Quality Assurance (2022) 27:145–153
A hypothetical case study (v) two bias boundaries (Eqs. 11 and 12, see double blue
lines in Fig. 1).
In order to demonstrate the Naji2 plot concept, a hypo-
thetical proficiency test round attended by 40 participants The four vertical lines (|z| = 2 and 3), the four hyperbolas
was constructed. A PT provider processed a commer- (|ζ| = 2 and 3) and the two straight lines (urel(xi) = urel(xpt)
cially available food commodity to produce a test mate- or urel(xi) = σpt,rel) delimit all the possible Naji2 plot areas
rial with adequate homogeneity and stability, according identified by the letters “A” to “AA” in Table 1 and Fig. 1.
to the recommendations of ISO/IEC 17043 [3]. The PT The areas denoted by the letters with an asterisk in Table 1
provider also determined the total mass fraction of a could not be represented in Fig. 1, since they represent unre-
specific analyte ( xpt ) of 100 mg kg−1 and the associated alistic possibilities.
standard uncertainty (u(x pt)) of 3 mg kg −1. In addition, The spreadsheet function “NORMINV(RAND(),m,u)”
a 𝜎pt of 10 mg kg −1 was set to assess the measurement was used to generate two sets of 40 values, where m is the
capabilities of the laboratories to comply with some legal mean value and u the standard uncertainty. The first set of
requirements. Since u(xpt) ≤ 0.3 𝜎pt , the test item was con- data simulated the reported results (xi) normally distrib-
sidered fit-for-purpose and the z score could be applied for uted around the assigned value (m = 100 mg kg−1) with
performance assessment. u = 25 mg kg−1 (see Table 2, 2nd column). The second set
Figure 1 presents the Naji2 plot generated based on the simulated a broad range of reported standard measurement
above-mentioned predefined criteria ( xpt , u(xpt), and 𝜎pt ). uncertainties (u(xi)), with m = 6 mg kg−1 and u = 4 mg kg−1
This plot consists of: (see Table 2, 4th column). The standard uncertainties were
multiplied by a coverage factor (k = 2) to derive the expanded
(i) u(xi) (y-axis) as a function of the z score (x-axis) hav- uncertainties shown in Fig. 2.
ing; Table 2 summarises the synthetic data set, the relative
(ii) four vertical lines at |z| = 2 and 3; standard uncertainty urel(xi) (to be compared to urel(xpt) and
(iii) two sets of hyperbolas for |ζ| = 2 and 3 (Eq. 8); to σpt,rel) and the outcome of the four assessments (D%,i,
(iv) two lines representing u rel (x i ) = u rel (x pt ) and z score, ζ score, and MU). Similarly, Fig. 2 presents the
urel(xi) = σpt,rel (Eq. 9, see dotted and dashed lines in data set in a particular order, sorted by the reported result
Fig. 1); and in increasing order first, then by the performance catego-
ries (SP, QP and UP), expressed as z scores and ζ scores.
This allows the identification of 11 laboratories (from L25
to L38, see x-axis of Fig. 2) having “satisfactory” perfor-
18
U
|ζ| = 3 U mances expressed as z scores but “questionable” or even
|ζ| = 2
urel(xi ) = urel(xpt )
“unsatisfactory” performances when expressed as ζ scores.
16 L
urel(xi ) = σpt,rel
L
X
Figure 3 presents the Naji2 plot applied to the synthetic
14
X Bias boundary
O data set (Table 2). This graphical presentation allows an
intuitive and direct assessment of every data point, related
u(xi ), mg kg-1
12 W
C K
10
to all the assessment criteria under investigation, namely the
O F N
AA
z score, the ζ score or the bias boundaries, and the appro-
8 R
N
Z priateness of the reported MU. However, this graphical
6 B E presentation is only accessible to PT providers collecting
E Q
4 Z Q H
H measurand values including measurement uncertainties from
their participants.
2
Y P G D A D G P Y One can easily notice that: (i) most of the results are in
0 the central part of the plot, representing satisfactory z and ζ
-4 -3 -2 -1 0 1 2 3 4
scores; (ii) seven “overestimated” (case “c”) and (iii) seven
z score
seemingly “underestimated” (case “b”) MU were reported,
while four laboratories did not report their MU (u(xi) = 0
Fig. 1 The Naji2 plot, presenting the reported uncertainty (u(xi)) ver- for L02, L04, L13 and L27). Seventeen results are below
sus z score. Four vertical lines represent |z| = ± 2 or ± 3. The green
hyperbolas delimit |ζ|= 2, while the red ones delimit |ζ| = 3. The the “bias boundaries”, of which six are significantly biased
dotted and dashed lines delimit the “realistic” uncertainty range with |z|> 2 and |ζ|> 3 (L03, L04, L09, L14, L27, L33). The
(between urel(xi) = urel(xpt) and urel(xi) = σpt,rel, respectively), while remaining eleven laboratories have “satisfactory” z scores,
the two blue lines define the bias boundaries. The letters “A” to and “questionable” or “unsatisfactory” ζ scores (see empty
“AA” denote various Naji2 plot “areas” described in Table 1. This
graph was constructed using the following values: xpt = 100 mg kg−1; circles in Fig. 3). These laboratories may consider two types
u(xpt) = 3 mg kg−1; and σpt = 10 mg kg−1 of improvement actions: either an increase in their MU
13
Accreditation and Quality Assurance (2022) 27:145–153 149
Table 2 Synthetic data set of 40 laboratories having reported measurement results (xi) and u(xpt)=3 mg kg−1 and 𝜎pt = 10 mg kg−1). The grey and black cells represent “questionable” and
expanded uncertainties (U(xi)). The standard MU (u(xi)) and the relative standard uncertainty “unsatisfactory” performance scores, respectively. The last column refers to the MU assessment
(urel(xi)) are calculated accordingly. The performance scores (D%,i,, z and ζ scores) and the (cf. Case “a” realistic, “b” underestimated or “c” overestimated)
uncertainty assessment are based on the criteria set by the PT provider (xpt =100 mg kg−1;
13
150 Accreditation and Quality Assurance (2022) 27:145–153
Fig. 2 Results (xi) and expanded uncertainties (U(xi)) reported by (xpt ± 2 σpt), respectively. The green, yellow and red bars in the lower
40 laboratories (see Table 2). The solid line represents the assigned part of the figure indicate the “satisfactory”, “questionable” or “unsat-
value (xpt = 100 mg kg−1); while the blue and red dashed lines repre- isfactory” performance, expressed as z and ζ scores
sent the “assigned range” (xpt ± 2 u(xpt)) and the “acceptance range”
(vertical translation in the Naji2 plot) or a bias correction uncertainties. It was assumed that realistic uncertainty esti-
(horizontal translation in the Naji2 plot). mations should not be larger than the standard deviation for
performance assessment (u(xi)max = σpt) or smaller than the
uncertainty of the assigned value (u(x i)min = u(xpt)). This
concept was later adopted in the standard ISO 13528:2015
Discussion [7], where the example of the IMEP-111 is specifically
mentioned.
Measurement uncertainty evaluation Let us consider, for example, the results submit-
ted by L14 and L19 (62.2 ± 9.0 mg k g −1 /0.14/; and
Since 2000, the PT providers of the Joint Research Centre in 127.6 ± 11.5 mg kg−1 /0.09/, expressed as xi ± u(xi)/ure
Geel are assessing the participants’ performance using the z l(x i)/, see Table 2), in the frame of the hypothetical PT
and ζ scores for laboratories participating in various IMEP, round (where x pt = 100 mg k g −1; u(x pt) = 3 mg k g −1;
REIMEP, NUSIMEP PTs or for PT schemes organised by σpt = 10 mg kg−1; urel(xpt) = 0.03; and σpt,rel = 0.10). Accord-
JRC’s European Union Reference Laboratories for contami- ing to the above-mentioned criteria, the absolute MU
nants in food. The organisers have noticed that extremely reported by L14 would be considered as “realistic” (9 < 10,
large reported MU lead to |ζ|≤ 2 (satisfactory performance), in mg k g−1), while the one of L19 would be flagged as poten-
even when |z|≥ 3 (unsatisfactory performance). Hence, tially “overestimated” (11.5 > 10, in mg kg−1).
an additional evaluation criterion was introduced related However, according to Ellison and Williams [21, 22],
to the expected maximum and minimum measurement the relative uncertainty of chemical measurements can be
13
Accreditation and Quality Assurance (2022) 27:145–153 151
taken as constant for a range of measurand values. Based (acetic acid, 3% w/v). The test items were prepared gravi-
on this, the alternative approach described in this paper had metrically, which resulted in an accurate assigned value
been implemented by several EURLs of the JRC. Instead (xpt = 5.024 mg kg−1) with a small associated standard meas-
of comparing u(xi) to u(xpt) and to σpt, the reported rela- urement uncertainty (u(xpt) = 0.033 mg kg−1). Based on the
tive measurement uncertainty (urel(xi) = u(xi)/xi) is compared expert opinion, σpt,rel was set to 0.12. A total of 46 National
to the relative standard uncertainty of the assigned value Reference Laboratories (NRLs) and Official Control Labo-
(urel(xpt) = u(xpt)/xpt) and to the relative standard deviation for ratories (OCLs) reported results. The outcome of this PT is
proficiency assessment (σpt,rel = σpt/xpt) over the whole range presented in Fig. 4a, where the Naji2 plot includes the “bias
of reported mass fractions (e.g. from 60 to 140 mg kg−1 or boundaries”. Most of the laboratories reported accurate val-
|z| ≤ 4 in Figs. 1 and 3). ues and realistic measurement uncertainty estimations (case
Knowing that a constant relative uncertainty u rel(xi) “a”), while some of them reported overestimated MUs (case
implies a proportional increase of u(xi) with xi, one may “c”). However, according to ISO Guide 33 [20] six labora-
have for z > 0 (xi > xpt) a realistic u(xi) larger than u(xpt), tories have to investigate their significant bias (with |ζ|> 2),
or inversely, a realistic u(xi) smaller than u(xpt) for z < 0 of which three have a |z|≤ 2 (see empty circles in Fig. 4a).
(xi < xpt). Hence, according to the new criteria and unlike In 2019, the European Union Reference Laboratory for
to the assessment presented in ISO 13528, the MU state- Genetically Modified Food and Feed (EURL GMFF) organ-
ment of L14 is flagged as potentially “overestimated” ised a PT round (GMFF-19/02, [24]) for the determination
(0.14 > 0.10), while the one of L19 is considered as “realis- of GMOs in food and feed materials to support Regula-
tic” (0.09 < 0.10). tion (EU) 2017/625 on official controls [25]. One of the
These new criteria for the evaluation of reported meas- test items consisted of a pig feed spiked with a genetically
urement uncertainties may be taken into consideration to modified “40–3-2 soybean”. The EURL characterised the
replace those prescribed in the current ISO 13528 standard mass fraction of the GM event applying the EURL vali-
[7]. dated method and used the following values for the perfor-
mance assessment of the participants: xpt = 1.014 m/m %;
u(xpt) = 0.061 m/m %, and σpt,rel = 0.25. A total of 67 NRLs
Naji2 plot applied to real cases and OCLs reported results. Figure 4b shows the outcome of
this PT round, where the Naji2 plot presents the set of hyper-
In 2018, the European Union Reference Laboratory for bolas. Most of the laboratories reported accurate values and
Food Contact Materials (EURL-FCM) organised a PT realistic measurement uncertainty estimations, while some
round (FCM-18–02, [23]) for the determination of the mass of them reported overestimated MUs (case “c”). However,
fractions of total zinc and other metals in food simulant B according to ISO Guide 33 [20] sixteen laboratories have
to investigate their significant bias (with |ζ| > 2), of which
twelve have a |z| ≤ 2 (see empty circles in Fig. 4b). It is worth
noting that, in this particular case, the realistic range of MU
(defined by Eq. 9) goes to zero at z = − 4 (= 1/σpt,rel).
|ζ| = 3
18 |ζ| = 2
urel(xi ) = urel(xpt )
16 urel(xi ) = σpt,rel Conclusions
Bias limit
14
L32
L19
10
L17
L26
L14
L10
L15
L40
L28
L24
8
L01
L21
L23
6
corrective action that, otherwise, would have been hidden
L30
L18
L03
L05
L16 L25
L35
L20
4
L11
L08
L33
2 score. Similarly to the original Naji plot, the Naji2 plot can
L34
L13
L02
L27
0
-4 -3 -2 -1 0 1 2 3 4
levels and proficiency testing schemes. It can be used to
z score summarise the outcome of a PT round presenting the results
of all laboratories for a given measurand, or to present a
Fig. 3 The Naji2 plot including the synthetic data set presented in historical overview of a laboratory participation in different
Table 2 rounds of a specific PT scheme.
13
152 Accreditation and Quality Assurance (2022) 27:145–153
(a) Naji2 plot and bias boundaries (b) Naji2 plot and ζ score hyperbolas
0.6
1.4
urel(xi ) = urel(xpt )
urel(xi ) = σpt,rel |ζ| = 3
1.2
Bias boundaries |ζ| = 2
urel(xi ) = urel(xpt )
1.0 0.4 urel(xi ) = σpt,rel
u(xi ), mg kg-1
u(xi ), m/m %
0.8
0.6
0.2
0.4
0.2
0.0 0.0
-4 -3 -2 -1 0 1 2 3 4 -4 -3 -2 -1 0 1 2 3 4
z score z score
FCM-18-02 – mass fracon of GMFF-19-02: mass fracon of
total Zn in food simulant B [23] “40-3-2 soybean” in animal feed [24]
Fig. 4 Two Naji2 plots presenting the outcome of two PT rounds the |ζ| ≤ 2 and the |ζ| ≤ 3 ranges, respectively. Empty circles indicate
(FCM-18–02 and GMFF-19–02). a includes the two blue double lines the results having |z| ≤ 2 and |ζ| > 2
defining the bias boundaries. b includes the two hyperbolas defining
13
Accreditation and Quality Assurance (2022) 27:145–153 153
13