
DOI 10.1515/labmed-2013-0003   J Lab Med 2013; 37(3): 147–163

Laboratory Management   Editor: E. Wieland

Rainer Haeckel*, Werner Wosniok and Rainer Klauke

Comparison of ordinary linear regression, orthogonal regression, standardized principal component analysis, Deming and Passing-Bablok approach for method validation in laboratory medicine


Abstract: A well-accepted tool for method validation is a method comparison study. Results are usually assessed on a scatter plot whose fitting line is calculated by one of several approaches, for example, ordinary (vertical) linear regression (OLR), orthogonal regression (OR), Deming regression (DR), Passing-Bablok method (PBR) or standardized principal component regression (SPCR). DR was applied in its general form (gDR), requiring information on the imprecision of at least two different quantities, and as simple DR (sDR) with imprecision information of only one quantity. The equation of the regression line calculated by these concepts varies depending on the range of measurement, the analytical variation and the imprecision ratio (sAY/sAX). There is still a global debate about which statistical concept is the most adequate for validation purposes. Various paired random samples with a size of 100 were simulated in 5000 replicates and evaluated with different regression models. The behavior of the slope and intercept of the regression lines was compared under various conditions. Two extreme ranges of measurement and several variance ratios in the absence and presence of bias were studied. The results clearly demonstrated that DR is the only model which can be applied without any precautions under conditions which usually occur in method comparison studies, and therefore should be preferred in laboratory medicine. Other models require restrictions with regard to range of measurement and/or imprecision profile. Differences of the concentrations at different positions of the measurement interval calculated with regression coefficients of both DRs did not deviate more than the permissible bias. Therefore, the small advantage of gDR does not justify its greater effort in comparison with sDR.

Keywords: Deming regression; Passing-Bablok regression; regression models.

*Correspondence: Rainer Haeckel, Katrepeler Landstrasse 45e, 28357 Bremen, Germany, Tel.: +49-421-273446, Fax: +49-421-21863799, E-Mail: rainer.haeckel@t-online.de
Werner Wosniok: Institut für Statistik, Universität Bremen, Bremen, Germany
Rainer Klauke: Institut für Klinische Chemie, Medizinische Hochschule Hannover, Hannover, Germany

Abbreviations: xH/xL ratio, ratio of measurement interval limits, quantity of the upper limit of the measurement interval (xH) divided by the lower limit of the measurement interval (xL); sX (or CVX), the standard deviation (or coefficient of variation) of the x-values (x-method, method for comparison); sY (or CVY), standard deviation (or coefficient of variation) of the y-values (y-method, method to be validated); sAX, analytical standard deviation of the x-method; sAY, analytical standard deviation of the y-method; sAY/sAX or CVAY/CVAX, imprecision ratio, the analytical standard deviation (or coefficient of variation) of the y-method divided by the standard deviation (or coefficient of variation) of the x-method; OLR, ordinary linear regression; DR, Deming regression; sDR, simple Deming regression; gDR, general Deming regression; OR, orthogonal regression; PBR, Passing-Bablok regression; SPCR, standardized principal component regression; DPB, detectable proportional bias.

Introduction

Method comparison studies play an important role in the validation of analytical procedures in laboratory medicine according to the well-accepted international standard ISO 15189 [1]. Comparison studies of measurement procedures are widely used to assess agreement or to detect disagreement (bias) between two procedures which measure the same quantity. One procedure is usually considered as the comparative procedure (x-method), the other one as the test procedure (y-method). The comparative method should preferably be a reference procedure, although it may also be a standardized procedure with which the laboratory is well acquainted and which is subjected to internal and external quality assurance. For experimental details on performing method comparison studies, see earlier recommendations [2] and reviews [3–7]. Both compared procedures measure with a certain degree of uncertainty. Usually, they disagree to some extent and seldom agree completely. The question is whether the disagreement can be tolerated [8].

Before a possible disagreement is assessed, it must be correctly ascertained. This usually starts with the graphical presentation of the paired data obtained from several subjects. Two types of presentations are applied: the scatter plot (x/y plot) and the difference plot according to Bland and Altman [9] or one of its modifications such as the normalized difference plot [4]. The advantage of difference plots is the better visibility of differences, especially for (x, y) in the lower range of quantities. The advantage of scatter plots is that the methods for their assessment are well developed. Also, scatter plots allow the detection of effects that cannot be seen in difference plots, for example, the presence of a non-zero intercept (a ≠ 0). Several characteristics are usually examined from x/y plots that can be determined by individual statistical tests: linearity over the entire measurement interval, determined visually or as described by Bablok [10] and more recently by Geistanger et al. [11]; slope and intercept of the fitting line; detection of outliers. If linearity can be assumed, the data can be characterized by the equation of the fitting line, which is defined by slope and intercept, and by the spread of the data pairs around the fitting line.

There is still debate about which mathematical procedure should be applied for calculating the fitting line [5, 12]. The proposed procedures differ in their assumptions about characteristics of the analytical errors involved and require, for that reason, more or less numerical effort. From a theoretical point of view, it is obvious that a method comparison should involve characteristics of analytical errors

present in the data as closely as possible. However, analytical errors in laboratory medicine may be relatively small, as can be seen from sometimes high correlations between comparative and test procedure results, so it might be suspected that different mathematical procedures lead to similar comparison results. In particular, it is not clear whether results from an elaborate mathematical procedure are superior enough to those of simple methods to justify the high effort of generating the detailed data on analytical imprecision which the elaborate methods require. By contrast, it is known that an inappropriate way of method comparison produces misleading results [8].

In a recent study, ordinary linear regression (OLR) has been found to be sensitive to range of measurements and to imprecision ratio (sAY/sAX), whereas standardized principal component regression (SPCR) and Passing-Bablok approach (PBR) were only sensitive to imprecision ratio [8]. In the former study, imprecision ratios were kept constant over the data range. The present study extends the previous results by also considering non-constant imprecision ratios and includes orthogonal regression (OR) and two versions of the Deming approach (general and simple DR, gDR and sDR, respectively). These two DRs are specifically designed to deal with different and variable imprecision ratios. The performance of the various methods is compared with regard to the bias and the variance of the estimated coefficients. Additionally, the size of the detectable bias was estimated. Linnet [13] had already performed a similar simulation study, but compared only OLR with DR and did not consider the most realistic case that the ratio of sAX and sAY is non-constant in the range of measurements.

Linear regression models

The result of all kinds of regression analyses is the equation of the fitting line. In the ideal case (concordance of both procedures), the fitting line has a slope of 1, an intercept of 0 and the data points scatter very closely around the regression line. An empirical result can deviate from the ideal result in various ways. The regression coefficients may indicate the presence of bias. If the comparative method is a reference method, the bias of the test method can be characterized by the difference between the regression line (fitting line) and the line of equality (y = x) at a given xi; %bias is this difference expressed in percent of a specified xi (e.g., medical decision limits or mean values). The statistical significance of slope and intercept deviations from 1 or 0, respectively, can be tested using confidence limits [7]. A proportional bias between the methods leads to a slope > 1 or < 1 (negative bias), and a constant bias leads to a deviation of the intercept from zero. If the comparative procedure is not a reference method, the term “bias” is not quite correct and should be stated more precisely as “bias between procedures” or even more appropriately as “difference between procedures”. As indicated in the introduction, an artificial bias can also be erroneously identified by using an inappropriate regression procedure.

Several least square techniques for the estimation of the regression coefficients have been applied in laboratory medicine, which can be classified into two categories:
1. regressions assuming imprecision in the values of only one variable: OLR, and
2. regressions assuming imprecision in the values of both variables
2.1. SPCR
2.2. OR, also called principal component regression
2.3. DR (two versions).

OLR minimizes the distance of the data points in a scatter plot (comparison method on the abscissa, test method on the ordinate) towards the fitting line either in the vertical direction (parallel to the ordinate), denoted regression of y on x (regression of first kind, ordinary least squares regression), or in the horizontal direction (parallel to the abscissa), denoted regression of x on y (regression of second kind, inverse least squares regression). The regression coefficients slope (b) and intercept (a) of both regression functions depend on the measurement interval of the variable to be studied, the variation of the x- and y-values, and the covariance of x, y. Both regression functions are almost identical if sAY/sAX = 1 and both mean values are identical. The smaller the measurement range, the larger sAX, and the more sAY/sAX deviates from 1, the more the regression coefficients of the two regression functions differ from each other [7].

Regressions of category 2 permit a more realistic description of the experimental situation than OLR. In most cases (but not always), the slope of the regressions with errors in both variables lies between the two slopes of the OLR of the first and second kind. Averdunk and Borner [14] proposed to use the geometric mean of the two slopes as slope of the regression equation, which corresponds to the SPCR developed by Feldmann et al. [15, 16]. The common assumption here is that sX = sY, which implies that sAX = sAY. Then, the SPCR regression line is the bisecting line between the OLR of the first and second kind passing through their intersection [7, 15]. The (simple) Deming model [17–20] can deal with the case sAX ≠ sAY, but requires that the ratio δ = sAY²/sAX² is known. This concept

was first proposed by Kummel [21] and later supported by others [22, 23]. The original DR assumed that the imprecision ratio δ is constant over the entire measurement interval (simple DR, DR with constant δ). Linnet [13] and Martin [22] have pointed out that this is often not the case. If δ varies over the data range, because the methods have different imprecision profiles, then the assumptions of the simple Deming procedure are violated and the generalized Deming procedure is, at least theoretically, more appropriate [22]. Of course, the imprecision profiles must be known to recognize if the conditions of either the simple or the generalized Deming procedure hold.

OR is a special case of DR if δ = 1 [23]. OR was first described by Adcock in 1878 [24]. It minimizes the perpendicular distance between each data point and the fitting line.

Two steps are required to perform gDR. In the first step, the imprecision profile must be determined. The most reliable way may be to derive the profile function from duplicate examinations of human samples (instead of artificial control samples). As an alternative, the sA values of two control samples with different quantities are taken, assuming that the imprecision profile is usually a linear relation between sA and the quantity measured. Another way may be to take the sA value of one control sample and one-third of the detection limit (in the case that the detection limit was determined by multiplying the sA of a blank sample by 3). The regression coefficients of the imprecision profile are termed α (intercept) and β (slope). A β of sA = 0.04 means CVA = 4% (if α = 0).

In Figure 1, several types of imprecision profiles are presented. In case A, both procedures have the same sA (δ = 1 and δ is constant in the measurement interval); in case B, sA is different for both procedures (sAY > sAX) and increasing with xi (sA is non-constant, CVA is constant, δ ≠ 1 and δ is constant); in case C, sA and CVA differ for both procedures and are non-constant (δ ≠ 1 and δ is non-constant); in case D, sA is constant (sAY > sAX), CVA is non-constant, δ is constant and > 1. In case B, no intercept exists, which means that the detection limit of both procedures is zero. Probably, case C occurs most often in practice (most realistic case).

PBR uses an approach which is completely different from the least square techniques: it estimates the median of the slopes between all data pairs [4, 25].

Figure 1 Graphical presentation of the theoretical relation of the analytical imprecision and the quantity chosen for various imprecision profiles (x-axis: true value; y-axis: analytical standard deviation).
Solid lines represent the x-method and dashed lines the y-method: (A) black, δ = 1 and constant, (B) blue, δ > 1 and constant, (C) green, δ > 1 and non-constant, (D) red, δ > 1 and constant.

Methods

Two artificial data sets (comparative procedure X and testing procedure Y) were created, one with a ratio of the measurement interval limits of 1:1.35 (lower limit = 85 arbitrary units, upper limit = 115, sX ≈ 5) and one with a ratio of 1:19 (lower limit = 10, upper limit = 190, sX ≈ 30). The two intervals probably cover a representative span of intervals which usually occur in clinical chemistry. Mean values of xi and yi were identical (100 arbitrary units); the analytical standard deviations of both procedures (sAX and sAY) were varied as indicated in Tables 1–3. The data of the two sets were selected to be normally distributed over the measurement intervals.

Both data sets were subjected to different analytical standard deviations in the absence and presence of proportional and constant bias. The imprecision ratio was either kept constant or was varied by increasing sAX and sAY linearly in the measurement interval chosen, as shown in Tables 1–3. The imprecision was either identical for both analytical procedures (δ = sAY²/sAX² = 1) or different (δ ≠ 1). In the theoretical case that the imprecision profiles (plots of sAX vs. xi and sAY vs. xi) of both methods are linear and start at the origin (detection limit = 0), δ is constant. In the presence of an intercept α > 0 (detection limit > 0), δ is non-constant. Monte Carlo simulations were repeated 5000 times and the mean values of the regression coefficients, the confidence intervals and the detectable proportional
Table 1 Influence of analytical standard deviation on intercept (a) and slope (b) of several fitting lines between comparative method (suffix X) and test method (suffix Y).

α of sAX   β of sAX   α of sAY   β of sAY   rᵃ   Statistic   OLR (a, b)   SPCR (a, b)   OR (a, b)   sDR (a, b)   gDR (a, b)   PBR (a, b)

Example A: without “true” bias


2.000 0.040 1.000 0.020 0.545 Mean 59.2 0.408 25.2 0.748 40.2 0.598 –1.9 1.020 –1.9 1.019 31.0 0.689
sa; sbb 6.5 0.065 6.3 0.063 9.5 0.095 16.4 0.165 16.4 0.164 8.1 0.081
DPB 0.118 0.113 0.175 0.280 0.279 0.158
1.000 0.020 2.000 0.040 0.546 Mean 26.7 0.733 –34.3 1.343 –70.5 1.705 –0.2 1.002 –0.2 1.002 –46.7 1.467
sa; sbb 11.4 0.114 11.3 0.113 27.3 0.274 15.8 0.158 15.7 0.157 17.0 0.171
DPB 0.193 0.193 0.457 0.271 0.270 0.287
2.000 0.001 1.000 0.001 0.899 Mean 15.1 0.850 5.6 0.945 6.1 0.939 –0.2 1.002 –0.2 1.002 5.8 0.942
sa; sbb 4.1 0.041 4.1 0.041 4.6 0.046 4.9 0.049 4.9 0.049 4.6 0.046
DPB 0.068 0.068 0.076 0.081 0.081 0.075
0.500 0.010 0.500 0.001 0.950 Mean 8.3 0.917 3.5 0.965 3.7 0.964 –0.1 1.001 –0.1 1.001 3.5 0.964
sa; sbb 3.0 0.030 3.0 0.030 3.2 0.032 3.3 0.033 3.3 0.033 3.3 0.033
DPB 0.050 0.050 0.053 0.055 0.055 0.054
0.100 0.010 0.100 0.001 0.975 Mean 4.7 0.954 2.3 0.977 2.3 0.977 –0.0 1.000 –0.0 1.000 2.3 0.977
sa; sbb 2.1 0.021 2.1 0.022 2.2 0.022 2.3 0.023 2.2 0.022 2.3 0.023
DPB 0.036 0.036 0.037 0.038 0.038 0.038
1.000 0.010 0.100 0.001 0.976 Mean 4.6 0.953 2.2 0.978 2.3 0.977 –0.06 1.000 –0.06 1.000 2.2 0.978
sa; sbb 2.1 0.021 2.1 0.021 2.2 0.022 2.26 0.023 2.26 0.023 2.3 0.023
DPB 0.036 0.036 0.037 0.038 0.038 0.038
0.100 0.001 1.000 0.001 0.976 Mean 0.2 0.998 –2.3 1.023 –2.4 1.024 2.29 1.000 0.02 1.000 –2.26 1.024
sa; sbb 2.3 0.023 2.3 0.023 2.3 0.023 2.3 0.023 2.3 0.023 2.5 0.025
DPB 0.038 0.038 0.039 0.038 0.038 0.041
0.500 0.001 0.500 0.001 0.986 Mean 1.5 0.985 0.0 1.000 0.0 1.000 0.0 1.000 0.0 1.000 0.0 1.000
sa; sbb 1.7 0.017 1.7 0.017 1.7 0.017 1.7 0.017 1.7 0.017 1.8 0.018
DPB 0.028 0.028 0.029 0.029 0.029 0.030
0.000 0.02 0.000 0.02 0.860 Mean 13.9 0.861 –0.1 1.001 –0.14 1.001 –0.14 1.001 –0.13 1.001 –0.12 1.001
sa; sbb 35.2 0.052 5.19 0.052 6.07 0.061 6.07 0.061 6.01 0.060 5.9 0.059
DPB 0.086 0.086 0.100 1.00 0.099 0.097
0.000 0.05 0.000 0.05 0.500 Mean 50.1 0.499 –0.33 1.003 –1.63 1.016 –1.63 1.016 1.70 1.017 –0.81 1.007
sa; sbb 8.77 0.0875 8.78 0.088 18.9 0.189 18.9 0.189 18.9 0.189 12.48 0.125
DPB 0.166 0.167 0.316 0.316 0.317 0.210
Example B: with 10% proportional bias
1.000 0.020 2.000 0.040 0.577 Mean 29.089 0.809 –30.27 1.403 –67.92 1.779 –0.703 1.107 3.081 1.069 –43.81 1.538
sa; sbb 11.712 0.117 11.56 0.116 26.51 0.265 16.10 0.161 15.52 0.155 17.30 0.173
DPB 0.196 0.195 0.444 0.276 0.268 0.290
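The OLR and SPCR means in Table 1 can be cross-checked against large-sample expectations. The sketch below is not taken from the paper's Appendix 1; it uses the standard attenuation result for OLR and the sY/sX form of the SPCR slope (the geometric mean of the two OLR slopes). The function names are hypothetical, and it assumes that the "true values" have a standard deviation of 5 units and that the analytical standard deviation is evaluated at the mean quantity of 100 as sA = α + β·100.

```python
import math


def analytical_sd(alpha, beta, quantity=100.0):
    """Linear imprecision profile sA = alpha + beta * quantity (assumed form)."""
    return alpha + beta * quantity


def expected_olr_slope(sd_true, sd_ax, true_slope=1.0):
    """Large-sample OLR slope (y on x) when x carries analytical error."""
    return true_slope * sd_true**2 / (sd_true**2 + sd_ax**2)


def expected_spcr_slope(sd_true, sd_ax, sd_ay, true_slope=1.0):
    """Large-sample SPCR slope sY/sX, the geometric mean of both OLR slopes."""
    var_x = sd_true**2 + sd_ax**2
    var_y = (true_slope * sd_true) ** 2 + sd_ay**2
    return math.sqrt(var_y / var_x)


# First row of Table 1: alpha/beta of sAX = 2.000/0.040, of sAY = 1.000/0.020.
s_ax = analytical_sd(2.0, 0.040)  # 6 units at a quantity of 100
s_ay = analytical_sd(1.0, 0.020)  # 3 units at a quantity of 100

print(round(expected_olr_slope(5.0, s_ax), 3))         # 0.41  (Table 1: 0.408)
print(round(expected_spcr_slope(5.0, s_ax, s_ay), 3))  # 0.747 (Table 1: 0.748)
```

Under these assumptions the closed-form values agree with the simulated means to within rounding, which suggests that the strong slope underestimation by OLR in this row is the expected attenuation for a narrow measurement interval rather than a simulation artifact.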

bias were calculated. The slopes and intercepts of the regression lines were calculated as described elsewhere [4, 8, 14] and for sDR and gDR as indicated in Appendix 1. Confidence limits were determined as empirical quantiles. Although confidence limits were not always symmetrical, only the standard deviations of slopes and intercepts are given in Tables 1–3 to provide better readability. In Figure 3 the correct confidence limits are presented. The detectable size of a proportional bias is used as a measure for the power of detection as described in Appendix 1. One practical example for plasma glucose was taken from the literature [8]. In this case, the empirical data were log-normally distributed.

Stöckl et al. [5] have pointed out that the effect of analytical imprecision of the x-method on OLR can be neglected if, as a general rule [2, 5], the correlation coefficient r ≥ 0.975 (“small” data interval, within 1 decade) or ≥ 0.99 (“wide” data interval, > 1 decade). This rule has also been mentioned by other authors [6, 22] and appears very practical because the r-value is well known and is easily available in many statistical platforms. Therefore, we have tested the rule for regression coefficients of various regression models.

(Table 1 Continued)

α of sAX   β of sAX   α of sAY   β of sAY   rᵃ   Statistic   OLR (a, b)   SPCR (a, b)   OR (a, b)   sDR (a, b)   gDR (a, b)   PBR (a, b)

2.000 0.040 1.000 0.020 0.559 Mean 65.017 0.450 29.61 0.804 41.71 0.683 –2.018 1.120 1.751 1.083 34.15 0.757
sa; sbᵇ 6.797 0.068 6.728 0.067 10.37 0.104 17.81 0.178 17.10 0.171 8.591 0.086
DPB 0.129 0.128 0.184 0.296 0.287 0.164

ᵃ Coefficient of correlation. ᵇ Standard deviation of intercept (sa) and slope (sb) taken from the simulations. (X, Y) data pairs (n = 100, mean = 100 arbitrary units) were simulated by adding independent normally distributed errors (mean = 0, intercept α and slope β of sAX and of sAY) to normally distributed “true values”. Five approaches were used to obtain coefficients of the regression line (OLR, ordinary linear regression; SPCR, standardized principal component regression; OR, orthogonal regression; sDR, simple Deming regression; gDR, generalized Deming regression; PBR, Passing-Bablok regression). The table gives means of estimated slope and intercept with standard deviation and the detectable proportional bias (DPB; power = 90%), obtained from 5000 simulations. Standard deviation of the measurement range = 5 units (example A, without “true” bias; example B, with 10% proportional bias). Significant under- or overestimations of slopes are marked in italics (according to DPB).

Results

In the absence of bias, OLR underestimated the slope of the regression line with increasing analytical variation (Tables 1 and 2) and with decreasing r-value (Figure 2). If sAX > sAY, false estimations were more pronounced than if sAY > sAX. Other methods estimated the slope correctly and led to almost the same regression function if the imprecision ratio was close to δ = 1 and constant over the entire measurement interval. If the δ ratio was above 1, the slope was falsely increased with SPCR, OR and PBR. If the ratio was below 1, the slope was falsely decreased with OLR, PBR and SPCR. With both DRs, the slope was not influenced by various imprecision values or imprecision ratios. These effects were identical at s ≈ 5 (Figure 3A, Table 1) and s ≈ 30 (Figure 3B, Table 2).

Some examples represented in Tables 1 and 2 are shown graphically in Figure 3, which demonstrates the slope differences between the regression models. The slopes calculated by both DRs lie on the b = 1 line (parallel to the ordinate). All other models either underestimated b (e.g., Figure 3D) or overestimated b (e.g., Figure 3A, C). A 10% proportional bias (slope = 1.1) shifted all results parallel to the right of the b = 1 line more or less correctly. With examples CVAX = 2% and CVAY = 4% (Table 1, example B),
Table 2 Influence of analytical standard deviation on intercept (a) and slope (b) of several fitting lines between comparative method (suffix X) and test method (suffix Y).

α of sAX   β of sAX   α of sAY   β of sAY   rᵃ   Statistic   OLR (a, b)   SPCR (a, b)   OR (a, b)   sDR (a, b)   gDR (a, b)   PBR (a, b)

Example A: without “true” bias


0.100 0.030 4.000 0.050 0.950 Mean 1.2 0.988 –3.9 1.039 –4.1 1.041 0.1 0.999 0.1 0.999 –3.8 1.039
sa; sbb 3.2 0.034 3.2 0.034 3.4 0.036 3.2 0.034 2.9 0.030 3.4 0.035
DPB 0.056 0.056 0.059 0.057 0.050 0.058
1.000 0.020 2.000 0.040 0.974 Mean 1.0 0.990 –1.6 1.016 –1.6 1.016 –0.0 1.000 –0.0 1.000 –1.5 1.015
sa; sbb 2.3 0.025 2.3 0.025 2.4 0.025 2.3 0.025 2.0 0.021 2.3 0.025
DPB 0.041 0.041 0.042 0.041 0.036 0.042
2.000 0.040 1.000 0.020 0.975 Mean 4.0 0.960 1.5 0.985 1.5 0.985 –0.0 1.000 0.0 1.000 1.3 0.985
sa; sbb 2.2 0.023 2.1 0.023 2.2 0.023 2.2 0.024 1.9 0.020 2.2 0.023
DPB 0.038 0.038 0.039 0.040 0.034 0.039
0.100 0.001 4.000 0.001 0.991 Mean 0.0 1.000 –0.9 1.009 –0.9 1.009 –0.0 1.000 –0.0 1.000 –1.0 1.009
sa; sbb 1.5 0.014 1.5 0.014 1.5 0.014 1.5 0.014 1.5 0.014 1.6 0.015
DPB 0.025 0.025 0.025 0.025 0.025 0.026
0.000 0.1000 0.0000 0.1000 0.891 Mean 10.87 0.892 –0.088 1.001 –0.117 1.001 –0.117 1.001 –0.07 1.001 –0.241 1.001
sa; sbb 4.524 0.049 4.399 0.049 5.01 0.056 5.007 0.056 3.11 0.037 4.197 0.049
DPB 0.081 0.082 0.092 0.092 0.062 0.081
0.000 0.2000 0.0000 0.2000 0.671 Mean 32.65 0.674 –0.425 1.005 –0.893 1.010 –0.893 1.010 –0.45 1.005 –0.715 1.005
sa; sbb 8.019 0.082 7.334 0.081 11.42 0.124 11.42 0.124 7.66 0.086 8.00 0.093
DPB 0.159 0.158 0.207 0.207 0.164 0.172
0.000 0.0500 0.0000 0.1500 0.873 Mean 2.865 0.971 –11.29 1.113 –13.06 1.130 –0.099 1.000 –0.20 1.000 –10.26 1.110
sa; sbb 5.372 0.060 5.489 0.0603 6.514 0.070 5.562 0.062 3.48 0.042 5.55 0.062
DPB 0.099 0.0992 0.137 0.106 0.069 0.105
0.000 0.0500 0.0000 0.2000 0.807 Mean 2.918 0.971 –20.36 1.2035 –25.86 1.258 –0.070 1.001 –0.09 1.001 –19.53 1.211
sa; sbb 6.887 0.077 7.219 0.0774 9.684 0.101 7.126 0.080 4.50 0.054 7.85 0.085
DPB 0.151 0.1517 0.181 0.155 0.0893 0.162
Example B: with 10% proportional bias
2.000 0.0400 1.0000 0.0200 0.975 Mean 4.451 1.056 1.779 1.082 1.562 1.085 –0.004 1.100 0.073 1.099 1.435 1.084
sa; sbb 2.378 0.025 2.353 0.025 2.421 0.026 2.455 0.026 2.063 0.023 2.368 0.026
DPB 0.042 0.042 0.044 0.044 0.038 0.043
1.000 0.0200 2.0000 0.0400 0.978 Mean 1.184 1.088 –1.274 1.113 –1.54 1.115 0.039 1.100 0.108 1.099 –1.37 1.114
sa; sbb 2.240 0.024 2.250 0.024 2.311 0.025 2.267 0.024 1.916 0.021 2.296 0.025
DPB 0.040 0.040 0.041 0.040 0.035 0.041

ᵃ Coefficient of correlation. ᵇ Standard deviation of intercept (sa) and slope (sb) taken from the simulations. Standard deviation of the measurement range = 30 units (example A, without “true” bias; example B, with 10% proportional bias). Significant under- or overestimations of slopes are marked in italics (according to DPB). For further details, see Table 1.
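The simulation design behind Tables 1 and 2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (their sDR and gDR formulas are in Appendix 1, which is not reproduced here): it uses the textbook closed-form Deming slope for a known error variance ratio δ = sAY²/sAX², evaluates δ at the mean quantity of 100, and uses 2000 instead of 5000 replicates. The function name `simulate_slopes` and all parameter names are hypothetical.

```python
import numpy as np


def simulate_slopes(alpha_x, beta_x, alpha_y, beta_y,
                    sd_true=5.0, n=100, reps=2000, seed=1):
    """Return the mean OLR and mean simple-Deming slope over `reps` samples."""
    rng = np.random.default_rng(seed)
    # Normally distributed "true values" around 100 arbitrary units.
    t = rng.normal(100.0, sd_true, size=(reps, n))
    # Independent analytical errors with a linear imprecision profile.
    x = t + rng.standard_normal((reps, n)) * (alpha_x + beta_x * t)
    y = t + rng.standard_normal((reps, n)) * (alpha_y + beta_y * t)

    xc = x - x.mean(axis=1, keepdims=True)
    yc = y - y.mean(axis=1, keepdims=True)
    sxx = (xc**2).sum(axis=1)
    syy = (yc**2).sum(axis=1)
    sxy = (xc * yc).sum(axis=1)

    # OLR of y on x.
    olr = sxy / sxx
    # Simple Deming slope with delta assumed known and taken at quantity 100.
    delta = ((alpha_y + beta_y * 100.0) / (alpha_x + beta_x * 100.0)) ** 2
    d = syy - delta * sxx
    deming = (d + np.sqrt(d**2 + 4.0 * delta * sxy**2)) / (2.0 * sxy)
    return olr.mean(), deming.mean()


# Imprecision profile of the first row of Table 1 (no true bias, s = 5).
olr_mean, deming_mean = simulate_slopes(2.0, 0.040, 1.0, 0.020)
print(f"OLR: {olr_mean:.3f}  simple Deming: {deming_mean:.3f}")
```

With this profile the OLR mean should fall well below 1 while the Deming mean stays close to 1, which is the qualitative pattern reported in the tables; exact figures depend on the seed and the number of replicates.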

OLR underestimated the slope by 29%, SPCR, OR and PBR overestimated the slope (40%–78%), whereas sDR overestimated the slope by only 0.7% and gDR underestimated the slope by 3.1%. The same imprecision example at s ≈ 30 (Table 2, example B) led to correct estimation with DRs and to less severe deviations with other models than at s ≈ 5. In cases with a poor correlation (r < 0.6) and in the presence of a 10% bias, sDR only slightly overestimated b (+5%), gDR underestimated b (–35%) and other models led to more severe deviations (Table 1, example B).

Figure 2 Coefficient of correlation r vs. the slope of the regression line (x-axis: coefficient of correlation; y-axis: slope of regression line).
Blue rhombs, OLR; brown rectangles, SPCR; green triangles, OR; blue crosses, sDR; blue circles, gDR; brown points, PBR. (A) Standard deviation of measurement interval = 5; (B) standard deviation of measurement interval = 30. Part of the data was taken from Tables 1 and 2.

Table 3 Influence of imprecision profile on intercept (a) and slope (b) of several fitting lines for capillary glucose concentrations determined with comparative (suffix X) and test methods (suffix Y).

α of sAX   β of sAX   α of sAY   β of sAY   rᵃ   Statistic   OLR (a, b)   SPCR (a, b)   OR (a, b)   sDR (a, b)   gDR (a, b)   PBR (a, b)

Example A
0.053 0.015 0.15 0.018 0.974 Mean –0.339 1.158 –0.565 1.189 –0.604 1.194 –0.481 1.177 –0.461 1.174 –0.492 1.175
Example B
0.053 0.015 0.15 0.018 0.973 Mean –0.342 1.158 –0.576 1.190 –0.618 1.196 –0.480 1.177 –0.433 1.170 –0.540 1.184
sa; sbᵇ 0.247 0.036 0.248 0.036 0.257 0.037 0.252 0.037 0.194 0.029 0.204 0.031
DPB 0.059 0.060 0.062 0.060 0.048 0.051
Example C
0.060 0.0250 0.2000 0.0750 0.960 Mean 0.905 1.162 0.472 1.211 0.386 1.220 0.825 1.171 0.824 1.171 0.420 1.218
sa; sbᵇ 0.350 0.042 0.351 0.042 0.370 0.044 0.353 0.043 0.295 0.036 0.327 0.040
DPB 0.070 0.070 0.073 0.070 0.060 0.066

ᵃ Coefficient of correlation. ᵇ Standard deviation of intercept (sa) and slope (sb) taken from the simulations. Example A: regression coefficients were calculated from earlier experimental data [8, 26]. Imprecision profile was derived from duplicate measurements. Example B: regression coefficients, standard deviations and DPB values were obtained from a simulation study using experimental regression coefficients of A (log-normal distribution). Example C: results obtained by simulations using the arbitrarily preselected regression coefficients (α, β) and a log-normal distribution. For further details, see Table 1. Intercept values, mmol/L.

In Table 3, a real example of comparing two analytical procedures for the determination of capillary blood glucose concentrations is shown (a hexokinase method vs. a point-of-care testing procedure). The experimental data were taken from earlier publications [8, 26]. Different imprecision profiles of both analytical procedures with non-constant δ occurred and were calculated from duplicate determinations obtained in the earlier study (Table 3, example A). The regression coefficients of OLR were lower, and those of SPCR and OR were higher, than those of sDR and gDR. The regression coefficients of PBR were similar to DR

values in this example. In a modified example (Table 3C), larger differences of the analytical variances were arbitrarily applied (CVAX = 2.5%, CVAY = 7.5% at 6 mmol/L). A similar constellation as in examples A and B was observed.

Diagnostic relevance of differences between analytical procedures

Confidence intervals

The regression coefficients of all models showed considerable variation as indicated by standard deviations (Tables 1 and 2). Confidence intervals were calculated from a non-parametric density estimate (see Appendix 1). Confidence intervals of b are presented as bars on the bottom of Figure 3. They are slightly narrower for gDR only in the examples with the larger measurement interval and approximately identical for all other models.

Increasing the number of observations would narrow the confidence intervals of all models without affecting the coefficients of regression (doubling the number reduces the interval size by a factor of 1/√2). The choice of the number of samples (e.g., 100) is always a compromise between economy, detection of possible interferences and the size of the confidence interval.
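
The 1/√2 statement can be made explicit with a short worked step. This sketch assumes that the half-width of the confidence interval is proportional to the standard deviation of the slope, sb, as in the confidence limits of Appendix 1, and that sb scales with 1/√n; the small change of the t quantile with n is ignored. The symbols w(n) for the half-width and c for the proportionality constant are introduced here for illustration only.

```latex
w(n) \approx t_{1-\alpha/2,\,n-2}\; s_b \approx \frac{c}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{w(2n)}{w(n)} \approx \frac{\sqrt{n}}{\sqrt{2n}} = \frac{1}{\sqrt{2}} \approx 0.71
```

Under these assumptions, going from 100 to 200 samples shortens the interval by about 29%, and halving the interval requires about four times as many samples.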

Figure 3, panels A–H (plots not reproduced). Axes: x, estimated slope; y, probability density function of slope estimate.

Figure 3 Probability density functions of the estimated slope determined by different regression models.
OLR, black lines; SPCR, green; OR, red; sDR, blue; gDR, magenta; PBR, brown. Bars at the bottom indicate 95% confidence intervals. (A–D)
Mean = 100 arbitrary units; s = 5; (E–H) mean = 100, s = 30; in the absence of bias (A, B, E, F) and in the presence of 10% proportional bias (C,
D, G, H). Standard deviations of analytical variation: sAX = 1+0.02 x, sAY = 2+0.04 y (A, C, E, G) and sAX = 2+0.04 x, sAY = 1+0.02 y (B, D, F, H). The
examples were taken from Tables 1 and 2.

Detectable proportional bias

The considerable variation of the regression coefficients of all models is characterized even more precisely by the detectable proportional bias (DPB; Tables 1 and 2). DPB is the deviation of the slope b from 1 that can be detected with a probability (power) of 90%. If DPB is 5%, a true bias is detected with a probability of at least 90% if the slope is < 0.95 or > 1.05. Larger deviations will be detected with higher probability (power). Significant DPBs are marked in italics in Tables 1 and 2.

With the larger measurement interval (sX≈30), the confidence interval of slopes (and intercepts) and consequently the power of detection for a bias between analytical procedures were similar for all models and only slightly better for gDR under some experimental conditions (Figure 3E–F, Table 2). With sX≈5, confidence limits (Figure 3A–D, Table 1) and DPBs (Figure 4, Table 1) of b were slightly larger for DRs than for other models and identical for sDR and gDR (because the imprecision is almost constant in the measurement interval).

The confidence limits and DPB values of the practical example (Table 3) behaved similarly to those in the simulation study. They were slightly lower with gDR than with other models. In example B, PBR led to values close to gDR, but not in example C.
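
To make the DPB definition more concrete, the following Python sketch computes an approximate DPB from the standard deviation of the slope. It is not the procedure of the study, which obtains DPB numerically from the non-central t-distribution (Appendix 1). Here a normal approximation is swapped in, a two-sided significance level of 5% is assumed, and the function name and the value of `s_b` are hypothetical.

```python
from statistics import NormalDist


def detectable_proportional_bias(s_b, alpha=0.05, power=0.90):
    """Approximate DPB: smallest |slope - 1| detected with the given power.

    Normal approximation (an assumption of this sketch): a two-sided test of
    slope = 1 at level alpha rejects when |b - 1| / s_b exceeds z_(1-alpha/2);
    the negligible rejection probability in the opposite tail is ignored.
    """
    if s_b <= 0:
        raise ValueError("s_b must be positive")
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = z.inv_cdf(power)          # quantile matching the wanted power
    return (z_alpha + z_power) * s_b


# Hypothetical slope standard deviation, not taken from the paper's tables.
dpb = detectable_proportional_bias(s_b=0.015)
print(f"DPB = {dpb:.3f} ({100 * dpb:.1f}% of slope 1)")
```

With this hypothetical sb of 0.015 the approximate DPB is close to 5%, the situation used as an example in the text. For about 100 samples the normal and t-based results should differ only slightly, but the t-based calculation remains the reference.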

Figure 4 The slope of the regression function ± detectable proportional bias (DPB), calculated with gDR (green triangles and crosses) and sDR (blue rhombs and brown rectangles) in the absence of “true” bias (slope = 1).
(A) s = 5, (B) s = 30. Part of the data was taken from Tables 1 and 2. Axes: x, coefficient of correlation; y, slope ± DPB.

The ratio of DPB for sDR and gDR (DPBgDR/DPBsDR) varied between 0.6 and 1.0 for two extreme cases, a constant SDA and a constant CVA (Figure 5). In more realistic cases occurring in practice, the ratio varied between 0.8 and 1.0 (crosses in Figure 5). DPB was almost linearly and inversely related to the coefficient of correlation in the range studied. It increased with decreasing r, which means that the power of detection decreased with r. Whereas PBR appeared less reliable than both DRs in most cases, PBR unexpectedly led to the same regression coefficients and the same DPB as gDR if the imprecision of both procedures was 20% (Table 2, example A).

In summary, DPB (in percent of b = 1) of sDR was 0%–20% higher than that of gDR. It was 20% if s≈30 and δ≠1.0. In many cases, the detectable bias differed by less than 20% and the difference may even be zero. In extreme cases, gDR may even have a larger DPB than sDR (at r = 0.5, Figure 4). We suggest that the advantage of using gDR does not justify its greater disadvantages in comparison with sDR. However, if the highest possible detection power is required, gDR may be applied.

Permissible differences between methods

The acceptable difference between methods should not be greater than the permissible bias. The permissible difference (bias) between methods was chosen from the concept of desirable allowed bias based on biological variation [28]. Instead of allowable bias, the term permissible is used according to the international vocabulary of metrology [29]. Differences of the values calculated at the lower end, at the middle and at the upper end of the measurement interval were calculated by various models and compared with the permissible bias.

With the narrower measurement interval (sX≈5), a permissible bias of 1% was chosen because this example is close to the situation of plasma sodium, for which Fraser had proposed a desirable allowable bias of 0.9% [28]. At the lower limit of the measurement interval (x = 85 arbitrary units), only both DRs led to regression coefficients within the permissible limits under all conditions (Figure 6C). Similar constellations were obtained with x = 100 and x = 115 (not shown).

With the larger measurement interval (sX≈30), a permissible difference of 5%–10% was chosen because this example represents the situation of many quantities with a relatively large biological variation (e.g., enzymes, triglycerides, thyrotropin, etc.). Again, only both DRs led to regression coefficients ( < 1.2%) within the permissible limits under all conditions at the upper limit of the measurement interval (Figure 7C). Similar constellations were obtained with x = 10 and x = 100 (not shown).

The simulation studies have demonstrated that the DR model determined the regression coefficients more reliably than other models and its estimates were closer to the “true” values. Therefore, the regression coefficients of gDR were taken as the “real” values in the experimental glucose example (Table 3). The differences of the various glucose concentrations calculated by the models are marked in bold in Table 4 if they exceeded the permissible bias. This was the case at all three concentrations tested with OLR, SPCR, OR and PBR in example C. The differences with example B (not shown) are very close to those calculated for example A. The permissible bias for blood glucose concentrations (2.3%) was taken from Fraser [28].

Permissible equivalence and spread limits

Although the regression coefficients can be reliably detected by DR either in its simple or in the generalized form, it does not detect sample specific bias which may be caused by interferences from substances other than the

Figure 5, panels A and B (plots not reproduced). Axes: x, coefficient of correlation; y-axis labels, s(slope, gDR)/s(slope, sDR) and s(intercept, gDR)/s(intercept, sDR).

Figure 5 The ratio of the standard deviation (SD) of slope and intercept estimated by gDR and sDR in relation to the correlation coefficient r
(x, y) with different imprecision profiles.
Filled circles, SDA was constant (case D in Figure 3); filled rectangles, CVA was constant (case B in Figure 2); crosses, mixed cases
(case C in Figure 3). (A) s = 5, (B) s = 30.

quantity to be measured (sample-related effects). Interfering factors may already be known (e.g., endogenous chromogens such as hemoglobin or exogenous factors such as pharmaceuticals) or still be unknown. Interferences can be suspected if the spread around the regression line cannot solely be explained by the imprecision of both procedures [8]. Spread limits are drawn parallel to the fitting line. We now recommend that the fitting line should be calculated by sDR.

A free Excel program which automatically determines equivalence and spread limits (based on sDR) can be obtained from [30]. A software program for gDR designed by Martin is available under information for authors of “Clinical Chemistry” [31].

The relation between Pearson’s coefficient of correlation and various regression models

Regarding the relation between r-value and estimated regression coefficients, the following conclusions can be drawn from the presented results:
– r > 0.99: all models estimate the regression coefficients almost correctly and can be used.
– r = 0.99: OLR already underestimated the regression coefficients slightly, independently of the width of the data interval. This effect increased with decreasing r.
– 0.80 ≤ r < 0.99: OR, SPCR and PBR are equally suited if δ = 1.0.
– 0.60 ≤ r < 0.80: sDR and gDR are equally suited, even if δ≠1.0 and δ is constant within the measurement interval.
– If δ is non-constant, which occurs for linear imprecision profiles in the presence of an intercept of the imprecision profile (that is, in the presence of a detection limit, the most realistic case), and r ≥ 0.6, both sDR and gDR still estimate the slope correctly, but the standard deviations of slope and intercept are sometimes lower with gDR than with sDR (indicating that the power of detecting bias is higher with gDR).
– Calculated slopes and intercepts of the regression line were linearly and inversely (intercept decreasing with increasing slope) related with each other for all models (as follows from the definition of the intercept, see Appendix 1).
– The standard deviation of slope and intercept increased with decreasing r with all models.

The correlation coefficient r is often misused to indicate agreement between the two methods [22, 23]. It may be one even in the presence of considerable bias. It can only be applied as an indicator for agreement in the absence of bias (b = 1, a = 0). In the present study, r was found to be related to the size of the detectable proportional bias (minimum significant deviation of the slope b from 1). It was also related to the detectable constant bias (minimum significant deviation of the intercept from 0). But r was not suited to select a particular regression model, and the rule of thumb mentioned above can lead to misinterpretations.

The correlation coefficient strongly depends on analytical imprecision and imprecision profile and is related to the measurement range. For a given true imprecision of the values and imprecision profiles of both methods, the resulting correlation coefficient can be calculated by Eq. (A5) in Appendix 1.

Figure 6 Bias at x = 85 arbitrary units (lower limit of the measurement interval, sX = 5) calculated (A) by OLR, (B) SPCR (blue rhombs), OR (brown rectangles) and PBR (green triangles) and (C) by sDR (blue rhombs) and gDR (brown rectangles).
Bias: 85 – (85 · slope + intercept). Most data were taken from Table 1. Axes: x, coefficient of correlation; y, bias at x = 85.

Figure 7 Bias at x = 190 arbitrary units (upper limit of the measurement interval, sX = 30) calculated (A) by OLR, (B) SPCR (blue rhombs), OR (brown rectangles) and PBR (green triangles) and (C) by sDR (blue rhombs) and gDR (brown rectangles).
Bias: 190 – (190 · slope + intercept). Most data were taken from Table 2. Axes: x, coefficient of correlation; y, bias at x = 190.
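
The bias definition used in the captions of Figures 6 and 7, and the comparison with a permissible bias described in the footnote of Table 4, can be sketched in Python as follows. The function and variable names are hypothetical. The numbers are the example A values of Table 4, with sDR taken as the reference model as stated in that footnote. This is a minimal illustration of the rule, not the authors' software.

```python
def bias_at(x, slope, intercept):
    """Bias at x as defined in the captions of Figures 6 and 7."""
    return x - (x * slope + intercept)


def exceeds_permissible(value, reference, concentration, permissible_fraction):
    """True if |value - reference| is larger than the permissible bias."""
    permissible = permissible_fraction * concentration
    return abs(value - reference) > permissible


# Example A of Table 4: concentrations (mmol/L) calculated by each model
# at the chosen glucose concentrations of 2, 7 and 15 mmol/L.
concentrations = (2.0, 7.0, 15.0)
example_a = {
    "OLR": (1.98, 7.77, 17.03),
    "SPCR": (1.81, 7.76, 17.27),
    "OR": (1.78, 7.75, 17.31),
    "sDR": (1.87, 7.76, 17.17),
    "gDR": (1.89, 7.76, 17.15),
    "PBR": (1.86, 7.73, 17.13),
}
reference = example_a["sDR"]  # reference model per the Table 4 footnote
PERMISSIBLE = 0.023           # 2.3% permissible bias for blood glucose [28]

flags = {
    model: [
        exceeds_permissible(value, ref, conc, PERMISSIBLE)
        for value, ref, conc in zip(values, reference, concentrations)
    ]
    for model, values in example_a.items()
    if model != "sDR"
}
for model, row in flags.items():
    print(model, row)
```

Applied to example A, this rule flags OLR, SPCR and OR at 2 mmol/L only, while gDR and PBR stay within the permissible bias at all three concentrations.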

Table 4 Differences between concentrations calculated by various regression models for examples A and C presented in Table 3.

Glucose concentration, mmol/L        2            7            15
Example                              A/C          A/C          A/C
Concentrations calculated, mmol/L
  OLR                                1.98/3.27    7.77/9.03    17.03/18.245
  SPCR                               1.81/2.80    7.76/9.00    17.27/18.92
  OR                                 1.78/2.69    7.75/9.00    17.31/19.09
  sDR                                1.87/3.15    7.76/9.02    17.17/18.42
  gDR                                1.89/3.17    7.76/9.02    17.15/18.37
  PBR                                1.86/2.77    7.73/9.01    17.13/18.99
Permissible bias (2.3%), mmol/L      0.046        0.161        0.345

The concentrations chosen were 2 mmol/L (36 mg/dL), 7 mmol/L (126 mg/dL) and 15 mmol/L plasma glucose (270 mg/dL). The estimated concentrations in mmol/L were calculated as chosen concentration · slope + intercept. The concentrations for sDR were considered to be closest to the (unknown) true values. Then, the difference between the sDR values and the values of the other regression models should be smaller than the permissible bias [26]. Concentrations marked in bold deviate from the sDR value by more than the permissible bias (permissible difference).

Discussion

Linnet [20] had already performed a similar simulation study, but compared only OLR with DR and did not consider the most realistic case that the ratio of sAX and sAY is non-constant in the range of measurements.

One of the major interests for scatter plots is detecting constant and/or proportional bias. Because of the above-mentioned effects of the imprecision profile, two questions arise: (i) is the observed bias really a true bias (or an artifact) and (ii) is a true bias detectable (or masked) by an imprecision effect. All regression models discussed have limitations for answering the two questions. The behavior of the regression models varies between two extreme cases, a constant SDA and a constant CVA. A general hierarchy of all regression models covering all possible situations probably does not exist. However, DR is the only approach which probably provides correct answers under all conditions that usually appear with method comparison studies in laboratory medicine.

If the imprecision ratio δ is 1, the slope b is identical for DR and OR, and approximately identical for DR, OR, SPCR and PBR, at least up to a CVA = 5.0. The slope of the SPCR is the square root of the ratio of the slopes of the two OLRs, or, geometrically, the bisecting line between the two OLR lines. SPCR and PBR are sensitive to an increase of the imprecision ratio, which can be tolerated in the interval of 0.8–1.2 [8]. DR is not influenced by the imprecision ratio if the correlation coefficient r is > 0.6.

Using weights in regression models has often been proposed to deal with different analytical variances in the measurement interval. This requires knowledge of the imprecision profile of both analytical procedures as mentioned above for gDR. gDR weighs errors in both directions and allows errors that are non-constant over the measurement interval. This is the most general form of using different weights. OLR allows errors in only one direction and therefore weighted errors can be used in only one direction. SPCR and PBR implicitly apply identical weights to all data points; sDR uses weights that are constant for each method but possibly different between methods.

In conclusion, all methods operated well only within certain limitations. Whereas OLR had a relatively narrow applicability, DR required the least restrictions. Therefore, DR must be considered to be superior to other models. Both DR approaches are more or less equally suited in estimating the coefficients of regression correctly and are more robust than the other models studied. The only difference between gDR and sDR is that gDR has a slightly higher power of detection in some cases. In extreme cases which, anyhow, should be considered as not comparable in laboratory medicine (r < 0.6), both DRs can also lead to erroneous estimations of regression coefficients, and gDR may even be less reliable than sDR. DR was developed more than five decades ago. Now, it is time that it replaces linear regression of the first kind for method comparisons. It is also superior to other regression models which consider errors in both variables, but do not account for different error sizes in both variables, as already pointed out by Linnet [20].

Limitations

– The errors of the two variables are assumed to be independent and normally distributed. The analytical standard deviation depends linearly on the quantity in the measurement interval [32, 33].
– All models are only valid for continuous variables. In the discrete case, other models must be chosen.
– A problem may arise from limitations of the interchangeability of estimates of day-to-day imprecision between commercial control materials and native materials [34]. The imprecision profile determined with artificial samples has the benefit that it can provide imprecision data from day to day, with the disadvantage that matrix properties of control samples may differ from those of patients’ samples. The imprecision profile estimated from duplicate measurements has the benefit that the samples are identical to the samples from which the regression results are derived, with the disadvantage that only within-run imprecision data may be obtained. However, in both cases, δ should be comparable. Control materials should be as commutable with native materials as possible.

Conflict of interest statement

Authors’ conflict of interest disclosure: The authors stated that there are no conflicts of interest regarding the publication of this article.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.

Received January 22, 2013; accepted April 19, 2013

Appendix 1
The basic assumptions in method comparison are that (i) the observations x and y are related to the true unknown value z by x = aX + bX·z + εX and y = aY + bY·z + εY, where (ii) aX, bX, aY, bY quantify constant and proportional deviations (bias) between measured and true value, (iii) εX and εY are random errors with mean zero and standard deviations sAX and sAY which are (iv) uncorrelated, cov(εX, εY) = 0. The random errors representing the analytical variation are (v) also assumed to be uncorrelated with z. The standard deviation of the random errors may depend on the true value z, and it is assumed that these are related to z by linear functions sAX(z) = aAX + bAX·z and sAY(z) = aAY + bAY·z, the imprecision profiles. From these profiles, and assuming that the true values are normally distributed, the overall analytical standard deviations are obtained by

$$s_{AX}^2 = \int_0^\infty (z - Ez_X)^2 \cdot f\left(z; (a_{AX} + b_{AX}\cdot z)^2\right)\,dz, \tag{A1}$$

$$s_{AY}^2 = \int_0^\infty (z - Ez_Y)^2 \cdot f\left(z; (a_{AY} + b_{AY}\cdot z)^2\right)\,dz \tag{A2}$$

where f(z; (aA + bA·z)²) is the normal probability density function and Ez is the mean value of the analytical error. If bAX = bAY = 0 the integrals simplify to

$$s_{AX} = a_{AX}, \quad s_{AY} = a_{AY}$$

As an approximation to Eq. (A1), the value of sAX(z) at the position of the x mean may be used:

$$s_{AX} \approx a_{AX} + b_{AX}\cdot\bar{x}, \quad s_{AY} \approx a_{AY} + b_{AY}\cdot\bar{y}.$$

In this setting, the only observable quantities are the x and y values, the expected values of which are related by:

$$E_y = a_Y - \frac{a_X\cdot b_Y}{b_X} + \frac{b_Y}{b_X}\cdot E_x = a + b\cdot E_x. \tag{A3}$$

In a method comparison, only a and b can be derived from observed x and y data pairs (see below). Eq. (A3) shows that the bias components aX, bX, aY, bY cannot be identified from the relation between observed x and y only. An estimate of a = 0 only allows the conclusion that aY = aX·bY/bX, but not that the true constant bias terms are zero (aY = aX = 0). Similarly, from an estimate of b = 1 it only follows that both proportional bias terms are identical (bX = bY), but not that both are equal to one (bX = bY = 1). Estimates of b≠1 and of a≠0 can be caused by different analytical errors (sAX≠sAY) even in the absence of a systematic difference between methods (see below).

Observed values have mean values x̄, ȳ and variances sX², sY². Observed variances are related to the (unobservable) variance sZ² of the true values and the analytical variances sAX², sAY² via

$$s_X^2 = s_Z^2 + s_{AX}^2 \quad\text{and}\quad s_Y^2 = s_Z^2 + s_{AY}^2. \tag{A4}$$

The correlation r(x, y) between observed values is related to the imprecision profiles and the standard deviation of the true values z by:

$$r = \frac{s_Z^2}{\sqrt{s_Z^2 + s_{AX}^2}\cdot\sqrt{s_Z^2 + s_{AY}^2}} \tag{A5}$$

which simplifies to

$$r = \frac{s_Z^2}{s_Z^2 + s_A^2}$$

if both analytical variances are identical (sAX² = sAY² = sA²).
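
As an illustration of Eq. (A5), the following Python sketch computes the expected correlation coefficient from the standard deviation of the true values and the two analytical standard deviations. The function names are hypothetical. The numbers follow one of the settings named in the caption of Figure 3 (mean 100, sAX = 1 + 0.02 x, sAY = 2 + 0.04 y), under the assumption that s in that caption is the standard deviation of the true values and that the imprecision profiles are evaluated at the mean, as in the approximation to Eq. (A1).

```python
import math


def expected_correlation(s_z, s_ax, s_ay):
    """Correlation of observed x and y according to Eq. (A5)."""
    var_z = s_z ** 2
    return var_z / (math.sqrt(var_z + s_ax ** 2) * math.sqrt(var_z + s_ay ** 2))


def analytical_sd(a, b, level):
    """Linear imprecision profile s_A = a + b * level."""
    return a + b * level


mean_level = 100.0                           # arbitrary units
s_ax = analytical_sd(1.0, 0.02, mean_level)  # analytical SD of x at the mean
s_ay = analytical_sd(2.0, 0.04, mean_level)  # analytical SD of y at the mean
for s_z in (5.0, 30.0):
    r = expected_correlation(s_z, s_ax, s_ay)
    print(f"s_Z = {s_z:g}: r = {r:.3f}")
```

Under these assumptions the same imprecision yields r of about 0.55 for the narrow interval (s = 5) and about 0.98 for the wide interval (s = 30), which shows how strongly r depends on the measurement range.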

The imprecision profiles must be determined outside the actual method comparison either by using duplicate measurements or from extra measurements of at least two samples with different quantities (in the case of a linear imprecision profile).

The equations for OLR can be taken from any textbook on basic statistics; the equations for regressions with errors in both variables are less well known, but have also been described elsewhere [4, 15, 16, 25].

Simple Deming regression, sDR [17]:

$$b = \frac{s_Y^2 - \delta s_X^2 + \sqrt{\left(s_Y^2 - \delta s_X^2\right)^2 + 4\,\delta\, s_{XY}^2}}{2\, s_{XY}}$$

$$s_{XY} = \frac{1}{n-1}\sum_i \left(x_i - \bar{x}\right)\cdot\left(y_i - \bar{y}\right)$$

$$\delta = s_{AY}^2 / s_{AX}^2$$

$$a = \bar{y} - b\cdot\bar{x}$$

Confidence limits of b for sDR [22]:

$$CI = b \pm t_{(1-\alpha/2,\;n-2)}\cdot s_b \tag{A6}$$

$$\sigma_\tau^2 = \frac{s_{XY}}{b}$$

$$\sigma_\omega^2 = \frac{s_Y^2 - 2\,b\,s_{XY} + b^2 s_X^2}{\delta + b^2}$$

$$s_b = \frac{\sigma_\delta^2\sigma_\omega^2 + b^2\sigma_\tau^2\sigma_\delta^2 + \sigma_\tau^2\sigma_\omega^2}{n\,\sigma_\tau^2}.$$

OR corresponds to sDR with δ = 1.

The equations for gDR parameters and their standard deviations are given in [17–21].

Detectable proportional bias (deviation of slope b from 1.0) is the solution of P(DPB/sb > t(n–2, ncp)) = β, where P is the cumulative distribution function of the non-central t-distribution with non-centrality parameter ncp = |DPB|/sb, and β is the probability of not detecting a deviation of at least size DPB. Then 1–β is the power to detect a deviation of at least size DPB, which is obtained numerically [27].
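
The sDR slope and intercept equations of this appendix can be sketched in Python as follows. The function name and the data are hypothetical and serve only as an illustration. The sketch covers the point estimates only and deliberately leaves out the standard deviation of the slope and the confidence limits.

```python
import math
from statistics import mean


def simple_deming(x, y, delta=1.0):
    """Slope b and intercept a of simple Deming regression (sDR).

    delta is the ratio of the analytical variances s_AY^2 / s_AX^2;
    delta = 1 corresponds to orthogonal regression (OR).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y need the same length (at least 3 pairs)")
    x_bar, y_bar = mean(x), mean(y)
    # Sample variances and covariance with the n - 1 denominator.
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("covariance is zero; the slope is undefined")
    d = s_yy - delta * s_xx
    b = (d + math.sqrt(d ** 2 + 4 * delta * s_xy ** 2)) / (2 * s_xy)
    a = y_bar - b * x_bar
    return b, a


# Made-up data pairs for illustration only (not taken from the paper).
x = [2.1, 3.9, 6.2, 8.0, 10.1, 12.2]
y = [2.4, 4.1, 6.9, 8.7, 11.3, 13.1]
b, a = simple_deming(x, y, delta=4.0)
print(f"slope = {b:.3f}, intercept = {a:.3f}")
```

A quick plausibility check is that data lying exactly on a straight line return that line for any value of δ; the choice of δ only matters once the points scatter.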

References
1. International Standard Organisation. Medical laboratories – particular requirements for quality and competence, ISO 15189, 2nd ed. Geneva: International Standard Organisation, 2007:1–40.
2. Clinical and Laboratory Standards Institute. Method comparison and bias estimation using patient samples: approved guideline – second edition (interim revision). CLSI document EP9-A2-IR. Wayne, PA: Clinical Laboratory Standards Institute, 2010;30:1–36.
3. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:49–57.
4. Bablok W, Haeckel R, Meyers W, Wosniok W. Biometrical methods. In: Haeckel R, editor. Evaluation methods in laboratory medicine. Weinheim: VCH, 1993:203–41.
5. Stöckl D, Dewitte K, Thienpont LM. Validity of linear regression in method comparison studies: is it limited by the statistical model or the quality of the analytical input data? Clin Chem 1998;44:2340–6.
6. Westgard JO. Basic method validation, 3rd ed. Madison, WI: Westgard QC Inc., 2008.
7. Haeckel R, Sonntag O. Validation of quantitative analytical procedures in laboratory medicine. J Lab Med 2012;36:111–8.
8. Haeckel R, Wosniok W, Al-Shareef N. Permissible performance limits of regression analyses in method comparisons. Clin Chem Lab Med 2011;49:1805–16.
9. Bland JM, Altman DG. Comparing two methods of clinical measurement: a personal history. Int J Epidemiol 1995;24(Suppl 1):S7–14.
10. Bablok W. Range of linearity. In: Haeckel R, editor. Evaluation methods in laboratory medicine. Weinheim: VCH, 1993:251–8.
11. Geistanger A, Berding C, Vorberg E, Herlan M. Local regression: a new approach for measurement system comparison analysis. Clin Chem Lab Med 2008;46:1211–9.
12. Petersen PH, Stoeckl D, Blaabjerg O, Pedersen B, Birkemose E, Thienpont L, et al. Graphical interpretation of analytical data from comparison of a field method with reference method by use of difference plots. Clin Chem 1997;43:2039–46.
13. Linnet K. Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies. Clin Chem 1998;44:1024–31.
14. Averdunk R, Borner K. Korrelation der Thromboplastinzeiten bei Dicumarol-behandelten Patienten unter Verwendung verschiedener Thrombokinase-Präparate. Z Klin Chem Klin Biochem 1970;8:263–8.
15. Feldmann U, Schneider B, Klinkers H, Haeckel R. A multivariate approach for the biometric comparison of analytical methods in clinical chemistry. J Clin Chem Clin Biochem 1981;19:121–37.
16. Feldmann U. Robust bivariate errors-in-variables regression and outlier detection. Eur J Clin Chem Clin Biochem 1992;30:405–14.
17. Deming WE, editor. Statistical adjustment of data. New York: Wiley, 1943 (Dover Publications edition 1985).
18. Cornbleet PJ, Gochman N. Incorrect least-squares regression coefficients in method comparison analysis. Clin Chem 1979;25:432–8.
19. Linnet K. Estimation of the linear relationship between the measurement of two methods with proportional errors. Stat Meth 1990;9:1463–73.
20. Linnet K. Evaluation of regression procedures for methods comparison studies. Clin Chem 1993;39:424–32.
21. Kummel CH. Reduction of observation equations which contain more than one observed quantity. The Analyst 1879;6:97–105.
22. Martin RF. General Deming regression for estimating systematic bias and its confidence interval in method-comparison studies. Clin Chem 2000;46:100–4.
23. Dunn G. Regression models for method comparison data. J Biopharmaceut Stat 2007;17:739–56.
24. Adcock RJ. A problem in least squares. Ann Math 1878;5:53–4.
25. Passing H, Bablok W. A new biometrical procedure for testing the equality of measurements from two different analytical methods. Application of linear regression procedures for method comparison studies in clinical chemistry, Part I. J Clin Chem Clin Biochem 1983;21:709–20.
26. Haeckel R, Wosniok W, Puentmann I. Discordance rate, a new concept for combining diagnostic decisions with analytical performance characteristics. 1. Application in method or sample system comparisons and in defining decision limits. Clin Chem Lab Med 2003;41:347–55.
27. Neter J, Kutner MH, Nachtsheim CJ, Wasserman W, editors. Applied linear statistical models, 4th ed. Boston, MA: WCB McGraw Hill, 1996:56.
28. Fraser CG. Biological variation: from principles to practice. Washington, DC: American Association for Clinical Chemistry, 2001:1–151.
29. International vocabulary of metrology – basic and general concepts and associated terms (VIM). ISO guide 99, 3rd ed. JCGM 2007:1–104.
30. Keller Th. Excel-tool: permissible performance limits in method comparisons. Available at: http://www.acomed-statistik.de/performance_limits_method_comparison.html. Accessed 10 November, 2011.
31. Clinical Chemistry, information for authors. Tools for diagnostic accuracy. Available at: http://www.clinchem.org/site/info_ar/info_authors.xhtml#tools. Accessed November 2012.
32. Haeckel R, Haeckel H. The determination of glucose concentration in 20 microliter capillary blood, liquor and urine by the hexokinase method with the endpoint analyzer 5030 (Eppendorf). Z Klin Chem Klin Biochem 1972;10:453–61.
33. Haeckel R, Mathias D. A two point method for the determination of urea with a Gemsaec analyzer. Z Klin Chem Klin Biochem 1974;12:515–20.
34. Fuentes-Arderiu X, de-la-Presa G. Interchangeability of estimates of day-to-day imprecision between commercial control materials and serum pools. Clin Chem 2002;48:573–4.
