Analytical

Methods
PAPER
Published on 18 December 2012. Downloaded by The University of Melbourne Libraries on 14/10/2014 15:03:28.

Notes on the use of Mandel's test to check for nonlinearity in laboratory calibrations
Cite this: Anal. Methods, 2013, 5, 1145

J. M. Andrade* and M. P. Gómez-Carracedo

The assessment of the straight line behaviour of a calibration function is of paramount importance to
finally decide on the linear range of an analytical procedure and to proceed with the correct calculation
of the confidence interval associated with the magnitude of the property being studied. Mandel's test is
a straightforward and simple test that, despite being suggested by IUPAC in 1998, has not been applied
broadly. The present work studies the validity of the IUPAC approach where the degrees of freedom
become simplified, compares it to the original definition given by Mandel, and reviews its correct
interpretation. Simulations were made varying the number of data points considered in a calibration
(from 4 to 500), as well as the magnitude of the variances of the linear and non-linear models. It was
found that the IUPAC simplification is not valid in general although it can be used safely when the
variances of the linear and alternative models are very similar, typically when they differ by less than
10%. It is also noticed that application of Mandel's test to routine calibrations (e.g. 6 concentration
levels times 4 replicates each) would not be suitable for differentiating between linear and nonlinear
models even when their variances differ by 25%.

Received 14th November 2012
Accepted 15th December 2012
DOI: 10.1039/c2ay26400e
www.rsc.org/methods

Dept. Analytical Chemistry, University of A Coruña, Campus da Zapateira, s/n, E-15008, A Coruña, Spain. E-mail: andrade@udc.es; Fax: +34 981167065

Introduction

Whenever a calibration is carried out in the laboratory, two main objectives are to be kept in mind: first, to quantify a given substance (or level of a property) in a set of unknowns and, second, to evaluate the confidence interval linked to such a prediction. Both depend critically on the correctness of the model and, therefore, major efforts must be made by the analyst to establish beyond doubt that the model selected (currently, the ‘linear’ or straight line one) is unbiased.

This is far from simple because very few calibration solutions are ordinarily prepared, due to time and economic constraints. This complicates the assessment of the many aspects that should be validated before accepting a calibration model (ref. 1 presents an updated review), for instance: the absence of outliers (randomness of the residuals and the absence of defined patterns in them), variance homogeneity along the different levels of the calibrators, negligible errors in the predictors with respect to the errors in the signals, and checking that the linear model is unbiased.

A somewhat subjective, traditional approach (ref. 2) relies on restricting the linear calibration range to the maximum concentration level for which the difference between the experimental signal and that predicted by the regression (straight line) model is less than 3%; i.e., the concentration at which $(y_{\mathrm{predicted}} - y_{\mathrm{real}})/y_{\mathrm{predicted}} = 0.03$ (or 3%).

To objectively prove that the model is correct (or at least, that it cannot be proved that it is wrong) the variance unexplained by the linear fit must be pure (random) error. Following IUPAC (ref. 3), this can be done just by considering the linear model itself (a priori) or by comparing it against an alternative nonlinear model (a posteriori). The latter is usually a quadratic one (though this is not mandatory) because it is a quite reasonable alternative, for instance because the physico-chemical behaviour of the molecules gives rise to departures from the Lambert–Bouguer–Beer law. The first sentence in this paragraph reveals how difficult it is to prove that a calibration fit is correct, as we cannot prove that the null hypotheses of common statistical tests (Student's t-test, the Fisher–Snedecor F-test, etc.) are true. Instead, all we can do is to demonstrate that there is no evidence of their falsehood. Another complication stems from the method of least squares as it is too limited to act as a diagnostic tool, and ‘it is much less suited to a diagnosis as to which functional form is most appropriate’ (ref. 4).

The a priori option, the so-called lack-of-fit test, is applied most commonly and it requires measurement of true replicates for the calibrators. This is not always feasible, and an alternative that compares the linear fit with a nonlinear one was proposed in 1964 by the chemist and statistician John Mandel, while working for the US National Bureau of Standards (ref. 4). Despite its simplicity, its use has only recently increased (refs. 5–7) and it was recommended by ISO (ref. 8). Its lack of popularity might be due to the reluctance of analysts to fit a set of data to a polynomial model. Nevertheless, this is simple nowadays, as a quadratic least squares fit can be implemented easily with a popular

This journal is ª The Royal Society of Chemistry 2013 Anal. Methods, 2013, 5, 1145–1149 | 1145

spreadsheet (ref. 6). This makes Mandel's test highly appealing for analytical chemists as the most they have to do is to fit the same dataset to two models.

The brief introduction to Mandel's test given by IUPAC (ref. 3), however simple and easy to understand, did not state the degrees of freedom correctly, and this might have raised concern about its adequacy for most common applications, where the number of calibrators is typically low.

The aim of this paper is to test the validity of the simplified IUPAC approach and compare it to Mandel's original definition (ref. 4). The correct interpretation of Mandel's test is reviewed and a practical example is presented to evaluate how inclusion of the degrees of freedom in the calculations can affect the decisions. Ten scenarios are to be shown depending on whether the variance of the residuals of the straight line fit or that of the nonlinear fit is greater.

Experimental

Mandel's test: definition and interpretation

Mandel's test was summarized as (ref. 3) ‘a comparison of the residual standard deviation of the linear model with that of the nonlinear model’. Such a definition results in the well-known conceptual Fisher–Snedecor F-test ($F_{\mathrm{experimental}} = S_{y/x,\mathrm{lin}}^{2}/S_{y/x,\mathrm{non}}^{2}$), where S stands for standard error of the regression, ‘lin’ for the straight line model and ‘non’ for the nonlinear model (here a quadratic one, although this is not mandatory). Note that although the term variance should be used sensu stricto instead of standard error, this is not relevant for the discussions here. The definition was then resolved to eqn (1) (numbered 51 in ref. 3), which indicates that a direct comparison between $S_{y/x,\mathrm{lin}}^{2}$ and $S_{y/x,\mathrm{non}}^{2}$ is indeed not carried out. Instead, the difference between the residual variances of both models is compared with an estimation of the pure random error.

$$F_{\mathrm{exp}} = \frac{S_{y/x,\mathrm{lin}}^{2} - S_{y/x,\mathrm{non}}^{2}}{S_{y/x,\mathrm{non}}^{2}} \qquad (1)$$

According to the original formulation from Mandel, the numerator of the F-test should not be the difference between the residual variances. The IUPAC Gold Book and other on-line resources (see http://www.iupac.org) were searched but, to the best of the authors' knowledge, there were no updates of that equation.

Despite eqn (1) not being strictly correct, it does refer to the conceptual framework proposed by Mandel to check whether the variance explained by the additional factor(s) added to a linear model to get the quadratic (in general, polynomial) model is statistically significant; in other words, whether the variance explained by the additional term does not correspond to random error (pure residuals) but to structured information. This implies studying whether the variance explained by the additional term is larger than the variance of the experimental error, whose unbiased estimate is given by the denominator of eqn (1). In this way, the null hypothesis would be formulated as ‘the variance explained by the additional term is not different from the residual variance’, and we would conclude that the alternative model is not significant. The alternative hypothesis would be ‘the variance explained by the quadratic term is larger than the residual variance’ and we should decide that a higher-than-first-order polynomial fits the data better than the first order one (the straight line). The question is to derive a useful equation for the calculations.

First, recall that the residual variance of a least squares fit is estimated as the sum of the squared residuals (SS) divided by the corresponding degrees of freedom (dof), eqn (2):

$$S_{y/x}^{2} = \frac{\sum_{i=1}^{n} \left(y_{\mathrm{predicted}} - y_{\mathrm{true}}\right)^{2}}{n - k} \qquad (2)$$

where n is the overall number of standards and k is set to 2 and 3 for the straight line and the quadratic models, respectively.

Mandel did not define the variance explained by the additional term(s) as a mere subtraction of the variances of the two models (i.e., $S_{y/x,\mathrm{lin}}^{2} - S_{y/x,\mathrm{non}}^{2}$) but as a subtraction of the sums of squares of the linear and quadratic fits, divided by the difference of their degrees of freedom. Eqn (3) presents Mandel's formal definition for the F-test, which can be developed further to give eqn (4) and (5) (a combination of eqn (4) and (2)), which are those used currently (refs. 5–7) (‘SS’, ‘non’ and ‘lin’ have the same meaning as above). Accordingly, the definitions given by IUPAC and Mandel differ in the use of the dof to weight the residual variances of the fits.

$$F_{\mathrm{exp}} = \frac{(SS_{\mathrm{lin}} - SS_{\mathrm{non}})/(\mathrm{dof}_{\mathrm{lin}} - \mathrm{dof}_{\mathrm{non}})}{S_{y/x,\mathrm{non}}^{2}} \qquad (3)$$

$$F_{\mathrm{exp}} = \frac{(SS_{\mathrm{lin}} - SS_{\mathrm{non}})/[(n-2)-(n-3)]}{S_{y/x,\mathrm{non}}^{2}} = \frac{SS_{\mathrm{lin}} - SS_{\mathrm{non}}}{S_{y/x,\mathrm{non}}^{2}} \qquad (4)$$

$$F_{\mathrm{exp}} = \frac{(n-2)\,S_{y/x,\mathrm{lin}}^{2} - (n-3)\,S_{y/x,\mathrm{non}}^{2}}{S_{y/x,\mathrm{non}}^{2}} \qquad (5)$$

If the null hypothesis cannot be rejected, i.e. the alternative does not improve the fit, both the numerator and denominator estimate the true pure residual variance (ref. 4). Thus the experimental F value will be lower than the critical value for a given probability level and 1 and n − 3 dof for the numerator and denominator, respectively (if a polynomial other than the quadratic one is considered, n − k dof should be used instead of n − 3, with k being the order of the polynomial plus one). Otherwise, the alternative hypothesis must be accepted; i.e. the numerator contains a structured variance which is larger than the pure residual variance. Eqn (1) (ref. 3) avoids weighting the variances. However, as the number of calibrators used in laboratories is usually very low, a concern arises about its adequacy.

Finally, it is worth noting that Mandel's test can be used to decide on the linear calibration range. This consists of deleting the highest concentration levels of the calibration and applying Mandel's test repeatedly (ref. 6). This will yield a concentration range where quadratic terms are not required. This can also be done by the successive application of the lack-of-fit test (ref. 9).
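Returning to the two definitions of the test statistic, the link between eqn (1) and eqn (5) can be made explicit with a short worked derivation. The labels $F_{\mathrm{IUPAC}}$ and $F_{\mathrm{Mandel}}$ follow the subscripts used in Tables 1 and 2; the derivation assumes a quadratic alternative model, so that the degrees of freedom are n − 2 and n − 3.

```latex
\begin{align*}
F_{\mathrm{IUPAC}} &= \frac{S_{y/x,\mathrm{lin}}^{2} - S_{y/x,\mathrm{non}}^{2}}{S_{y/x,\mathrm{non}}^{2}}
  = \frac{S_{y/x,\mathrm{lin}}^{2}}{S_{y/x,\mathrm{non}}^{2}} - 1 \\[4pt]
F_{\mathrm{Mandel}} &= \frac{(n-2)\,S_{y/x,\mathrm{lin}}^{2} - (n-3)\,S_{y/x,\mathrm{non}}^{2}}{S_{y/x,\mathrm{non}}^{2}}
  = (n-2)\,\frac{S_{y/x,\mathrm{lin}}^{2}}{S_{y/x,\mathrm{non}}^{2}} - (n-3) \\[4pt]
  &= (n-2)\left(F_{\mathrm{IUPAC}} + 1\right) - (n-3)
  = (n-2)\,F_{\mathrm{IUPAC}} + 1
\end{align*}
```

Read this way, eqn (1) is fixed by the variance ratio alone, whereas eqn (5) grows linearly with n for the same ratio. As a check against the rounded entries of Table 2: for Example 2 (n = 11), 9 × 3.79 + 1 = 35.11, close to the tabulated 35.13; for Example 1 (n = 6, with the linear variance the lower one so that the signed value is −0.079), 4 × (−0.079) + 1 = 0.684, close to the tabulated 0.68. The small gaps come from rounding $F_{\mathrm{IUPAC}}$.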

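The calculations behind eqn (1), (2) and (4), and the repeated application of the test to find the linear range, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation (they report using an Excel spreadsheet): the function names `residual_ss`, `mandel_statistics` and `linear_range` are hypothetical, the data are the two examples of Table 2, `F_TAB_95` holds standard tabulated one-sided critical values F(95%; 1, v), and `linear_range` assumes single measurements sorted by increasing concentration and drops one level at a time.

```python
def residual_ss(x, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of the given degree."""
    m = degree + 1
    x_mean = sum(x) / len(x)
    xc = [xi - x_mean for xi in x]  # centring x keeps the normal equations well conditioned
    # Augmented normal equations [X'X | X'y]
    a = [[sum(xi ** (r + c) for xi in xc) for c in range(m)]
         + [sum(yi * xi ** r for xi, yi in zip(xc, y))]
         for r in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for i in range(m):
        p = max(range(i, m), key=lambda r: abs(a[r][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(m):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [u - f * v for u, v in zip(a[r], a[i])]
    coef = [a[i][m] / a[i][i] for i in range(m)]
    return sum((yi - sum(c * xi ** k for k, c in enumerate(coef))) ** 2
               for xi, yi in zip(xc, y))


def mandel_statistics(x, y):
    """Return (F_Mandel, F_IUPAC) for a straight line versus a quadratic model."""
    n = len(x)
    ss_lin, ss_non = residual_ss(x, y, 1), residual_ss(x, y, 2)
    s2_lin, s2_non = ss_lin / (n - 2), ss_non / (n - 3)  # eqn (2) with k = 2 and k = 3
    f_mandel = (ss_lin - ss_non) / s2_non                # eqn (4), equivalent to eqn (5)
    f_iupac = (s2_lin - s2_non) / s2_non                 # eqn (1), signed
    return f_mandel, f_iupac


# Standard tabulated critical values F(95%; 1, v) for v = n - 3 = 1..8
F_TAB_95 = {1: 161.45, 2: 18.51, 3: 10.13, 4: 7.71,
            5: 6.61, 6: 5.99, 7: 5.59, 8: 5.32}


def linear_range(x, y, f_tab=F_TAB_95):
    """Drop the highest level until F_Mandel no longer exceeds the critical value.

    Returns the number of lowest levels retained, or None if no subset passes.
    """
    n = len(x)
    while n >= 4:  # at least one dof is needed for the quadratic residual variance
        f_mandel, _ = mandel_statistics(x[:n], y[:n])
        if f_mandel <= f_tab[n - 3]:
            return n
        n -= 1
    return None


# Data of Table 2 (Example 1: ion specific electrode; Example 2: fluorescence)
EX1_X = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
EX1_Y = [0.065, 0.175, 0.325, 0.445, 0.545, 0.675]
EX2_X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
EX2_Y = [0.10, 3.80, 7.50, 10.0, 14.4, 17.0, 20.7, 22.7, 25.9, 27.5, 30.0]

for name, x, y in (("Example 1", EX1_X, EX1_Y), ("Example 2", EX2_X, EX2_Y)):
    f_mandel, f_iupac = mandel_statistics(x, y)
    print(name, "F_Mandel =", f_mandel, "F_IUPAC =", f_iupac,
          "levels kept by linear_range:", linear_range(x, y))
```

With the Table 2 data the sketch should give F_Mandel of about 0.68 and 35.1 and F_IUPAC of about −0.079 and 3.79, in line with the tabulated values (Table 2 lists the absolute value for Example 1). Solving the normal equations directly is adequate for a quadratic on a handful of points; for higher orders or badly scaled data a dedicated least-squares routine would be a safer choice.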

Defining the simulations

To study whether the experimental statistic ($F_{\mathrm{exp}}$) values calculated by the two definitions (eqn (1) and (5)) lead to different conclusions, simulations were made varying the number of data points considered in a calibration (n, from 4 to 500), as well as the magnitude of the variances of the straight line and the alternative non-linear (here, quadratic) model. In general, we expect the variance of the quadratic fit to be lower than the linear one (explanations will focus on this case) but the opposite situation was included in the calculations for completeness. Plots present whether the two $F_{\mathrm{exp}}$ values become higher than $F_{\mathrm{tab}}$ at the same time. The two common confidence levels, 95% and 99%, were tested. All simulations were made using the popular Excel spreadsheet.

Ten different scenarios were considered:
(i) The residual variance of the linear fit is much larger than that of the alternative model (e.g., by 900%), and vice versa.
(ii) The residual variance of the linear fit doubles that of the alternative model (i.e., 100% larger), and vice versa.
(iii) The residual variance of the linear fit is higher than that of the alternative model (e.g., by 50%), and vice versa.
(iv) The residual variance of the linear fit is slightly higher than that of the alternative model (e.g., 10% larger), and vice versa.
(v) The residual variance of the linear fit is slightly higher than that of the alternative model (i.e., 2% larger), and vice versa.

Results and discussion

Table 1 summarizes the values of n from which $F_{\mathrm{exp,Mandel}}$ and $F_{\mathrm{exp,IUPAC}}$ exceed $F_{\mathrm{tab}}(cc\%,1,n-3)$ (when the variance of the alternative model is higher than the linear one, the absolute value of $F_{\mathrm{exp}}$ was considered). It is seen first that the overall behaviour is the same regardless of which residual variance is larger and that the number of calibrators after which the critical value is exceeded approximately doubles for Mandel's test (eqn (5)) when the quadratic residual variance is higher than that of the linear model (i.e. the quadratic fit is looser). Eqn (1) leads in almost all scenarios to wrong decisions because $F_{\mathrm{exp,IUPAC}}$ is independent of the dof. For instance, $F_{\mathrm{exp,IUPAC}} = 1$ whenever the residual variance of the linear fit is 100% higher (i.e., a much worse model) than the alternative one.

A second finding was expected: the higher the confidence level, the more standards are required to exceed the critical value. Increasing the confidence level from 95% to 99% raises the number of standards much less than considering the residual variance of the quadratic fit higher than the linear one (Table 1).

A third conclusion is that the relative magnitude of the residual variances of the models is a major issue in detecting differences between them. When a model is much worse

Table 1 Number of standards from where $F_{\mathrm{exp,Mandel}}$ exceeds $F_{\mathrm{tab}}(cc\%,1,n-3)$. Shown between brackets is the number of standards from where $F_{\mathrm{exp,IUPAC}}$ exceeds $F_{\mathrm{tab}}$

                                                       Confidence level (cc%)
Scenario                               Difference      95%            99%
Variance of the linear model >         900%            5 (9)          6 (17)
variance of the quadratic model        100%            8 (none)       12 (none)
                                       50%             11 (none)      18 (none)
                                       25%             17 (none)      29 (none)
                                       10%             35 (none)      63 (none)
                                       2%              145 (none)     283 (none)
Variance of the linear model <         900%            10 (none)      14 (none)
variance of the quadratic model        100%            15 (none)      23 (none)
                                       50%             19 (none)      29 (none)
                                       25%             29 (none)      51 (none)
                                       10%             58 (none)      90 (none)
                                       2%              247 (none)     388 (none)

Fig. 1 Summary of the simulations for the three scenarios and a particular setting: variance of the linear model (var_lin) > variance of the quadratic model (var_quad). Horizontal axis: number of standards in the calibration; vertical axis: values of the F-statistic. White, black and grey bars correspond to the absolute values of $F_{\mathrm{tab}}(95\%,1,n-3)$, $F_{\mathrm{exp,Mandel}}$ and $F_{\mathrm{exp,IUPAC}}$, respectively. Note: the scales of the ordinates were adjusted to simplify visualization and grey bars are not always visible because of their low values.
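The way the Mandel entries of Table 1 depend on n can be illustrated with a small sketch. It assumes that a scenario amounts to fixing the ratio of the two residual variances (for example 1.25 when the linear variance is 25% larger) and scanning n until eqn (5) exceeds the critical value; this is a reading of the simulation description, since the original Excel sheets are not part of the paper. The names `f_mandel`, `f_iupac` and `first_n_exceeding` are hypothetical, and `F_TAB_95` holds standard tabulated one-sided values F(95%; 1, v) only up to v = 14, so only the cases with a difference of 25% or more at the 95% level are covered.

```python
# Standard tabulated critical values F(95%; 1, v), with v = n - 3
F_TAB_95 = {1: 161.45, 2: 18.51, 3: 10.13, 4: 7.71, 5: 6.61, 6: 5.99, 7: 5.59,
            8: 5.32, 9: 5.12, 10: 4.96, 11: 4.84, 12: 4.75, 13: 4.67, 14: 4.60}


def f_mandel(ratio, n):
    """Eqn (5) divided through by the quadratic variance; ratio = s2_lin / s2_non."""
    return (n - 2) * ratio - (n - 3)


def f_iupac(ratio):
    """Eqn (1) divided through by the quadratic variance; it does not depend on n."""
    return ratio - 1.0


def first_n_exceeding(ratio, f_tab=F_TAB_95):
    """Smallest n (within the tabulated range) for which F_Mandel exceeds F_tab."""
    for n in sorted(v + 3 for v in f_tab):
        if f_mandel(ratio, n) > f_tab[n - 3]:
            return n
    return None  # not reached within the tabulated degrees of freedom


# Linear variance larger than the quadratic one by 900%, 100%, 50% and 25%
for pct in (900, 100, 50, 25):
    ratio = 1 + pct / 100
    print(f"{pct}% larger: n = {first_n_exceeding(ratio)}, F_IUPAC = {f_iupac(ratio)}")
```

Under these assumptions the scan returns n = 5, 8, 11 and 17, which are the 95% Mandel entries of the upper half of Table 1. The bracketed IUPAC entries and the 99% column are not reproduced here; extending the sketch to them would require critical values for more degrees of freedom and the other confidence level.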


(residual variance around 900%, 100% or 50% higher) Mandel's test will detect that provided we have a reasonable number of calibrators; fewer than 30 calibration solutions (99% confidence level) will be enough (Table 1). Recall that the simplified test is independent of the dof, and so it will always yield the wrong conclusions (except in one scenario).

The trend is that the closer the residual variances of the two models, the more calibrators are required by Mandel's test to exceed the critical F-values (to reject the null hypothesis due to the similarity between the models). Indeed, when they become highly similar most calibrations carried out routinely in laboratories (let us say, 6 concentration levels times 4 replicates) will not have enough calibrators to differentiate between the two models. This is especially true when their residual variances differ by less than 10% (residual variance of the linear model higher than the alternative one), at the 95% confidence level. At the 99% confidence level (assuming a usual ‘6 levels × 4 replicates’ calibration), it would not be possible to differentiate the two models even when their residual variances differ by 25% unless more than 30 standards are prepared (if the fit of the linear model is the worst), which is an unusual practice.

If the variances of the two models are very similar (e.g., they differ by 2%) the number of calibrators required to recognize differences between the models would be so high (n = 145 or 283, see Table 1 and Fig. 1) that we will rarely perform the calibrations. Hence, we will not be able to reject the null hypothesis (i.e., we would conclude wrongly that the alternative model is not significant). At the limit, if the variances tend to be almost the same, $F_{\mathrm{exp,IUPAC}} = 0$ and $F_{\mathrm{exp,Mandel}} = 1$, which obviously will never exceed $F_{\mathrm{tab}}(cc\%,1,n-3)$.

The positive point here is that eqn (1) and (5) lead to the same conclusions when the variances are very similar, typically when they differ by less than 10% (Table 1). For instance, $F_{\mathrm{exp,IUPAC}} = 0.02$ when there is only a 2% difference between them. This will never be significant and, although $F_{\mathrm{exp,Mandel}}$ becomes larger than the tabulated value when n > 145 (95% probability), such a calibration will not be practical in most circumstances.

Following this, some preliminary screening by the analyst is mandatory to assess the relative magnitude of the variances before indiscriminate use of the IUPAC approach, as it would easily yield wrong conclusions if they differ even by 25%, depending on the particular scenario. Table 2 contains two opposite (very simple) calibrations which might help potential readers and which are self-explanatory. In example 1, both approaches conclude that the quadratic model does not outperform the linear fit, whereas in example 2 they yield opposite conclusions and the IUPAC equation is wrong (by construction of the experimental data, the quadratic model is correct).

Table 2 Two opposite cases exemplifying application of Mandel's test and its difference from the IUPAC approach. Example 1 corresponds to a calibration using an ion specific electrode and Example 2 corresponds to fluorescence measurements. To simplify, no replicates are shown

Example 1 (variance of the linear model is ca. 8% lower than variance of the quadratic model)
[NH4] (%)           0.0     0.1     0.2     0.3     0.4     0.5
Corrected signal    0.065   0.175   0.325   0.445   0.545   0.675

Example 2 (a) (variance of the linear model is ca. 379% larger than variance of the quadratic model)
[Analyte] (mM)      0      1      2      3      4      5      6      7      8      9      10
Fluorescence        0.10   3.80   7.50   10.0   14.4   17.0   20.7   22.7   25.9   27.5   30.0

                    Example 1                        Example 2
                    Linear model   Quadratic model   Linear model   Quadratic model
S_y/x^2             0.00016048     0.00017429        0.923192       0.192664
n − k (b)           4              3                 9              8
F_Mandel            0.68                             35.13
F_IUPAC             0.079                            3.79
F_tab(95%)          10.13                            5.32
F_tab(99%)          34.12                            11.26

(a) Modified from J. N. Miller, Spectroscopy International, 1991, 3(4), 41–43.
(b) Linear models, k = 2; quadratic models, k = 3.

Conclusions

The simplified equation given by IUPAC in 1998 to test the significance of a nonlinear model (Mandel's test) with respect to a linear one does not adhere strictly to the original definition from the 1964 book by Mandel. It does not consider subtracting the sums of squares of the linear and quadratic fits, divided by the difference of their degrees of freedom, and it does not take into account the corresponding degrees of freedom for the residual variances considered in the calculations.

Simulations showed that the IUPAC approach is valid only when the variances of the linear and nonlinear model are slightly different (typically less than 10%). If one of them is clearly higher (even by 50%) the conclusions would not match those derived from the original Mandel's test.

Therefore, we cannot recommend indiscriminate use of that approach unless the analyst makes a preliminary screening to ascertain the degree of departure of the two variances. As current pressure and large workloads in laboratories might lead analysts to overlook the importance of this step, we strongly encourage application of the original Mandel's equation.

However, one must be careful because even when the quadratic model fits the data best, Mandel's test would require a high number of calibrators to detect this correctly. If the residual variance of the linear fit is 25% larger than the quadratic one, at a 95% confidence level we would require at least 17 calibration solutions to detect that, or 35 when its residual variance is 10% larger. At the 99% confidence level, these values increase to 29 and 63 calibrators, respectively.

References

1 M. C. Ortiz, S. Sánchez and L. Sarabia, Quality of analytical measurements: univariate regression, in Comprehensive Chemometrics: Chemical and Biochemical Data Analysis, ed. S. D. Brown, R. Tauler and B. Walczak, Elsevier, Amsterdam, 2009.
2 D. G. Mitchell and J. S. Garden, Talanta, 1982, 29(11), 921–929.


3 K. Danzer and L. A. Currie, Pure Appl. Chem., 1998, 70(4), 993–1014.
4 J. Mandel, The Statistical Analysis of Experimental Data, Dover Publications, New York, 1964.
5 J. V. Loco, M. Elskens, C. Croux and H. Beernaert, Accredit. Qual. Assur., 2002, 7, 281–285.
6 J. W. Einax and M. Reichenbacher, Anal. Bioanal. Chem., 2006, 384, 14–18.
7 L. Brüggemann, W. Quapp and R. Wennrich, Accredit. Qual. Assur., 2006, 11, 625–631.
8 ISO 8466-1, Water Quality – Calibration and Evaluation of Analytical Methods and Estimation of Performance Characteristics – Part I, ISO, Genève, 2001.
9 L. Cuadros Rodríguez, A. M. García Campaña and J. M. Bosque Sendra, Anal. Lett., 1996, 29(7), 1231–1239.
