Comparison of Ordinary Linear Regression Orthogona
Abstract: A well-accepted tool for method validation is a method comparison study. Results are usually assessed on a scatter plot of which the fitting line is calculated by several approaches, for example, ordinary (vertical) linear regression (OLR), orthogonal regression (OR), Deming regression (DR), Passing-Bablok method (PBR) or standardized principal component regression (SPCR). DR was applied in its general form (gDR), requiring information on the imprecision of at least two different quantities, and as simple DR (sDR) with imprecision information of only one quantity. The equation of the regression line calculated by these concepts varies depending on range of measurement, analytical variation and on imprecision ratio (sAY/sAX). There is still a global debate about which statistical concept is the most adequate for validating purposes. Various paired random samples with a size of 100 were simulated in 5000 replicates and evaluated with different regression models. The behavior of the slope and intercept of the regression lines was compared under various conditions. Two extreme ranges of measurement and several variance ratios in the absence and presence of bias were studied. The results clearly demonstrated that DR is the only model which can be applied without any precautions under conditions which usually occur in method comparison studies, and therefore should be preferred in laboratory medicine. Other models require restrictions with regard to range of measurement and/or imprecision profile. Differences of the concentrations at different positions of the measurement interval calculated with regression coefficients of both DRs did not deviate more than the permissible bias. Therefore, the advantage of using gDR does not justify its greater disadvantages in comparison with sDR.

Keywords: Deming regression; Passing-Bablok regression; regression models.
148 Haeckel et al.: Regressions for method comparisons
present in the data as closely as possible. However, analytical errors in laboratory medicine may be relatively small, as can be seen from sometimes high correlations between comparative and test procedure results, so it might be suspected that different mathematical procedures might lead to similar comparison results. In particular, it is not clear whether results from an elaborate mathematical procedure are superior enough compared to simple methods that they justify the high effort of generating detailed data on analytical imprecision, which the methods require. By contrast, it is known that an inappropriate way of method comparison produces misleading results [8].
In a recent study, ordinary linear regression (OLR) has been found to be sensitive to range of measurements and to imprecision ratio (sAY/sAX), whereas standardized principal component regression (SPCR) and Passing-Bablok approach (PBR) were only sensitive to imprecision ratio [8]. In the former study, imprecision ratios were kept constant over the data range. The present study extends the previous results by also considering non-constant imprecision ratios and includes orthogonal regression (OR) and two versions of the Deming approach (general and simple DR, gDR and sDR, respectively). These two DRs are specifically designed to deal with different and variable imprecision ratios. The performance of the various methods is compared with regard to the bias and the variance of the estimated coefficients. Additionally, the size of the detectable bias was estimated. Linnet [13] had already performed a similar simulation study, but compared only OLR with DR and did not consider the most realistic case that the ratio of sAX and sAY is non-constant in the range of measurements.

Linear regression models

The result of all kinds of regression analyses is the equation of the fitting line. In the ideal case (concordance of both procedures), the fitting line has a slope of 1, an intercept of 0 and the data points scatter very closely around the regression line. An empirical result can deviate from the ideal result in various ways. The regression coefficients may indicate the presence of bias. If the comparative method is a reference method, the bias of the test method can be characterized by the difference between the regression line (fitting line) and the line of equality (y = x) at a given xi; %bias is this difference in percent of a specified xi (e.g., medical decision limits or mean values). The statistical significance of slope and intercept deviations from 1 or 0, respectively, can be tested using confidence limits [7]. A proportional bias between the methods leads to a slope > 1 or < 1 (negative bias), and a constant bias leads to a deviation of the intercept from zero. If the comparative procedure is not a reference method, the term “bias” is not quite correct and should be stated more precisely as “bias between procedures” or even more appropriately as “difference between procedures”. As indicated in the introduction, an artificial bias can also be erroneously identified by using an inappropriate regression procedure.
Several least square techniques for the estimation of the regression coefficients have been applied in laboratory medicine, which can be classified into two categories:
1. regressions assuming imprecision in the values of only one variable: OLR, and
2. regressions assuming imprecision in the values of both variables
2.1. SPCR
2.2. OR, also called principal component regression
2.3. DR (two versions).

OLR minimizes the distance of the data points in a scatter plot (comparison method on the abscissa, test method on the ordinate) towards the fitting line either in the vertical direction (parallel to the ordinate), denoted regression of y on x (regression of first kind, ordinary least squares regression), or in the horizontal direction (parallel to the abscissa), denoted regression of x on y (regression of second kind, inverse least squares regression). The regression coefficients slope (b) and intercept (a) of both regression functions depend on the measurement interval of the variable to be studied, the variation of the x- and y-values, and the covariance of x, y. Both regression functions are almost identical if sAY/sAX = 1 and both mean values are identical. The smaller the measurement range, the larger sAX, and the more sAY/sAX deviates from 1, the more the regression coefficients of the two regression functions differ from each other [7].
Regressions of category 2 permit a more realistic description of the experimental situation than OLR. In most cases (but not always), the slope of the regressions with errors in both variables is in-between the two slopes of the OLR of the first and second kind. Averdunk and Borner [14] proposed to use the geometric mean of the two slopes as slope of the regression equation, which corresponds to the SPCR developed by Feldmann et al. [15, 16]. The common assumption here is that sX = sY, which implies that sAX = sAY. Then, the SPCR regression line is the bisecting line between the OLR of the first and second kind passing through their intersection [7, 15]. The (simple) Deming model [17–20] can deal with the case sAX≠sAY, but requires that the ratio δ = sAY²/sAX² is known. This concept
was first proposed by Kummel [21] and later supported by others [22, 23]. The original DR assumed that the imprecision ratio δ is constant over the entire measurement interval (simple DR, DR with constant δ). Linnet [13] and Martin [22] have pointed out that this is often not the case. If δ varies over the data range, because the methods have different imprecision profiles, then the assumptions of the simple Deming procedure are violated and the generalized Deming procedure is, at least theoretically, more appropriate [22]. Of course, the imprecision profiles must be known to recognize if the conditions of either the simple or the generalized Deming procedure hold.
OR is a special case of DR if δ = 1 [23]. OR was first described by Adcock in 1878 [24]. It minimizes the perpendicular distance between each data point and the fitting line.
Two steps are required to perform gDR. In the first step, the imprecision profile must be determined. The most reliable way may be to derive the profile function from duplicate examinations of human samples (instead of artificial control samples). As an alternative, the sA values of two control samples with different quantities are taken, assuming that the imprecision profile is usually a linear relation between sA and the quantity measured. Another way may be to take the sA value of one control sample and one-third of the detection limit (in the case that the detection limit was determined by multiplying the sA of a blank sample with 3). The regression coefficients of the imprecision profile are termed α (intercept) and β (slope). A β of sA = 0.04 means CVA = 4% (if α = 0).
In Figure 1, several types of imprecision profiles are presented. In case A, both procedures have the same sA (δ = 1 and δ is constant in the measurement interval); in case B, sA is different for both procedures (sAY > sAX) and increasing with xi (sA is non-constant, CVA is constant, δ≠1 and δ is constant); in case C, sA and CVA differ for both procedures and are non-constant (δ≠1 and δ is non-constant); in case D, sA is constant (sAY > sAX), CVA is non-constant, δ is constant and > 1. In case B, no intercept exists, that means that the detection limit of both procedures is zero. Probably, case C most often occurs in practice (most realistic case).
PBR uses an approach which is completely different from the least square techniques by estimating the median of the slopes between all data pairs [4, 25].

[Figure 1. Imprecision profiles (cases A–D). Ordinate: analytical standard deviation.]

Methods

Two artificial data sets (comparative procedure X and testing procedure Y) were created, one with a ratio of the measurement interval limits of 1:1.35 (lower limit = 85 arbitrary units, upper limit = 115, sX≈5) and one with a ratio of 1:19 (lower limit = 10, upper limit = 190, sX≈30). The two intervals probably cover a representative span of intervals which usually occur in clinical chemistry. Mean values of xi and yi were identical (100 arbitrary units), the analytical
[...] bias were calculated. The slopes and intercepts of the regression lines were calculated as described elsewhere [4, 8, 14] and for sDR and gDR as indicated in Appendix 1. Confidence limits were determined as empirical quantiles. Although confidence limits were not always symmetrical, only the standard deviation of slopes and intercepts are given in Tables 1–3 to provide better readability. In Figure 2 the correct confidence limits are presented. The detectable size of a proportional bias is used as a measure for the [...] distributed.
Stöckl et al. [5] have pointed out that the effect of analytical imprecision of the x-method on OLR can be [...]

[Table 1]
a Coefficient of correlation. b Standard deviation of intercept (sa) and slope (sb) taken from the simulations. (X, Y) data pairs (n = 100, mean = 100 arbitrary units) were simulated by adding independent normally distributed errors (mean = 0, intercept α and slope β of sAX and of sAY) to normally distributed “true values”. Five approaches were used to obtain coefficients of the regression line (OLR, ordinary linear regression; SPCR, standardized principal component regression; OR, orthogonal regression; sDR, simple Deming regression; gDR, generalized Deming regression; PBR, Passing-Bablok regression). The table gives means of estimated slope and intercept with standard deviation and the detectable proportional bias (DPB; power = 90%), obtained from 5000 simulations. Standard deviation of the measurement range = 5 units (example A, without “true” bias; example B, with 10% proportional bias). Significant under- or overestimations of [...]

Results

[...] PBR and SPCR. With both DRs, the slope was not influ[...]

[Table 2]
a Coefficient of correlation. b Standard deviation of intercept (sa) and slope (sb) taken from the simulations. Standard deviation of the measurement range = 30 units (example A, without “true” bias; example B, with 10% proportional bias). Significant under- or overestimations of slopes are marked in italics (according to DPB). For further details, see Table 1.
Table 3 Influence of imprecision profile on intercept (a) and slope (b) of several fitting lines for capillary glucose concentrations determined with comparative (suffix X) and test methods [...]

Example  Row      OLR a    OLR b    SPCR a   SPCR b   OR a     OR b     sDR a    sDR b    gDR a    gDR b    PBR a    PBR b
A        Mean     –0.339   1.158    –0.565   1.189    –0.604   1.194    –0.481   1.177    –0.461   1.174    –0.492   1.175
B        Mean     –0.342   1.158    –0.576   1.190    –0.618   1.196    –0.480   1.177    –0.433   1.170    –0.540   1.184
B        sa; sb   0.247    0.036    0.248    0.036    0.257    0.037    0.252    0.037    0.194    0.029    0.204    0.031
B        DPB               0.059             0.060             0.062             0.060             0.048             0.051
C        Mean     0.905    1.162    0.472    1.211    0.386    1.220    0.825    1.171    0.824    1.171    0.420    1.218
C        sa; sb   0.350    0.042    0.351    0.042    0.370    0.044    0.353    0.043    0.295    0.036    0.327    0.040
C        DPB               0.070             0.070             0.073             0.070             0.060             0.066

Coefficient of correlation (r): example A, 0.974; example B, 0.973; example C, 0.960.
Imprecision profile parameters (examples A, B, C): α of sAX: 0.053, 0.053, 0.060; further profile parameters [...]: 0.015, 0.015, 0.0250; 0.15, 0.15, 0.2000; 0.018, 0.018, 0.0750.

a Coefficient of correlation. b Standard deviation of intercept (sa) and slope (sb) taken from the simulations. Example A: regression coefficients were calculated from earlier experimental data [8, 26]. Imprecision profile was derived from duplicate measurements. Example B: regression coefficients, standard deviations and DPB values were obtained from a simulation study using experimental regression coefficients of A (log-normal distribution). Example C: results obtained by simulations using the arbitrarily preselected regression coefficients (α, β) and a log-normal [...]

[Figure, panels A and B. Abscissa: coefficient of correlation. [...] blue crosses, sDR; blue circles, gDR; brown points, PBR. (A) Standard deviation of measurement interval = 5; (B) standard deviation of measurement interval = 30. Part of the data was taken from Tables 1 and 2.]
values in this example. In a modified example (Table 3C), larger differences of the analytical variances were arbitrarily applied (CVAX = 2.5%, CVAY = 7.5% at 6 mmol/L). A similar constellation as in examples A and B was observed.

Diagnostic relevance of differences between analytical procedures

Confidence intervals

The regression coefficients of all models showed considerable variation as indicated by standard deviations (Tables 1 and 2). Confidence intervals were calculated from a non-parametric density estimate (see Appendix 1). Confidence intervals of b are presented as bars on the bottom of Figure 3. They are slightly narrower for gDR only in the examples with the larger measurement interval and approximately identical for all other models.
Increasing the number of observations would narrow the confidence intervals from all models without affecting the coefficients of regression (doubling reduces the interval size by a factor of 1/√2). The choice of the number of samples (e.g., 100) is always a compromise between economy, detection of possible interferences and the size of the confidence interval.
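The two ideas in this subsection, confidence limits taken as empirical quantiles of simulated estimates and the 1/√2 narrowing when the number of observations is doubled, can be illustrated with a minimal sketch. The function names (`empirical_quantile`, `empirical_ci`, `relative_ci_width`) and the linear-interpolation quantile rule are assumptions made for illustration; the authors' non-parametric density estimate is not reproduced here.

```python
import math


def empirical_quantile(values, p):
    """Return the p-quantile (0 <= p <= 1) of a sample, using linear
    interpolation between neighbouring order statistics (an assumed rule)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def empirical_ci(estimates, level=0.95):
    """Two-sided confidence limits as empirical quantiles of simulated
    estimates (e.g., slopes from repeated simulations)."""
    tail = (1.0 - level) / 2.0
    return (empirical_quantile(estimates, tail),
            empirical_quantile(estimates, 1.0 - tail))


def relative_ci_width(n_ref, n_new):
    """Approximate interval width at n_new observations relative to n_ref,
    assuming the width scales with 1/sqrt(n)."""
    return math.sqrt(n_ref / n_new)


if __name__ == "__main__":
    # Doubling the number of samples from 100 to 200.
    print(relative_ci_width(100, 200))
```

Because the limits are read directly from the sorted estimates, asymmetric intervals are handled without any distributional assumption, which is why a standard deviation alone can understate or overstate one side of the interval.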
[Figure 3, panels A–H. Abscissa: estimated slope; ordinate: probability density function of slope estimate.]
Figure 3 Probability density functions of the estimated slope determined by different regression models.
OLR, black lines; SPCR, green; OR, red; sDR, blue; gDR, magenta; PBR, brown. Bars at the bottom indicate 95% confidence intervals. (A–D) Mean = 100 arbitrary units; s = 5; (E–H) mean = 100, s = 30; in the absence of bias (A, B, E, F) and in the presence of 10% proportional bias (C, D, G, H). Standard deviations of analytical variation: sAX = 1+0.02 x, sAY = 2+0.04 y (A, C, E, G) and sAX = 2+0.04 x, sAY = 1+0.02 y (B, D, F, H). The examples were taken from Tables 1 and 2.
Detectable proportional bias

The variation of the regression coefficients can be characterized even more precisely by the power of proportional bias detection (DPB; Tables 1 and 2). DPB is the deviation of the slope b from 1 that can be detected with a probability (power) of 90%. If DPB is 5%, a true bias is detected with a probability of at least 90% if the slope is < 0.95 or > 1.05. Larger deviations will be detected with higher probability (power). Significant DPBs are marked in italics in Tables 1 and 2.
With the larger measurement interval (sX≈30), the confidence interval of slopes (and intercepts) and consequently the power of detection for a bias between analytical procedures were similar for all models and only slightly better for gDR under some experimental conditions (Figure 3E–F, Table 2). With sX≈5, confidence limits (Figure 3A–D, Table 1) and DPBs (Figure 4, Table 1) of b were slightly larger for DRs than for other models and identical for sDR and gDR (because the imprecision is almost constant in the measurement interval).
The confidence limits and DPB values of the practical example (Table 3) behaved similarly to those in the simulation study. They were slightly lower with gDR than with other models. In example B, PBR led to values close to gDR, but not in example C.
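The notion of a power of detection can be made concrete with a small simulation-style sketch. It assumes that a bias counts as detected when an estimated slope falls outside the two-sided 95% acceptance limits obtained from slopes simulated without bias; this decision rule and the names `empirical_quantile` and `empirical_power` are illustrative assumptions, not necessarily the procedure behind the DPB values in the tables.

```python
import math


def empirical_quantile(values, p):
    """p-quantile by linear interpolation between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    pos = p * (len(xs) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def empirical_power(null_slopes, alt_slopes, alpha=0.05):
    """Fraction of slopes simulated under a true bias (alt_slopes) that fall
    outside the acceptance limits derived from slopes simulated without
    bias (null_slopes). Assumed rule: two-sided test at level alpha."""
    lower = empirical_quantile(null_slopes, alpha / 2.0)
    upper = empirical_quantile(null_slopes, 1.0 - alpha / 2.0)
    detected = sum(1 for b in alt_slopes if b < lower or b > upper)
    return detected / len(alt_slopes)
```

Under this rule, a detectable proportional bias could be read off as the smallest deviation of the true slope from 1 for which `empirical_power` reaches 0.90, found by repeating the simulation over a grid of true slopes.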
[Figure 5, panels A and B. Abscissa: coefficient of correlation; ordinate: s(intercept, gDR)/s(intercept, sDR).]
Figure 5 The ratio of the standard deviation (SD) of slope and intercept estimated by gDR and sDR in relation to the correlation coefficient r (x, y) with different imprecision profiles.
Filled circles, SDA was constant (case D in Figure 3); filled rectangles, CVA was constant (case B in Figure 2); crosses, mixed cases (case C in Figure 3). (A) s = 5, (B) s = 30.
quantity to be measured (sample-related effects). Interfering factors may already be known (e.g., endogenous chromogens such as hemoglobin or exogenous factors such as pharmaceuticals) or still be unknown. Interferences can be suspected if the spread around the regression line cannot solely be explained by the imprecision of both procedures [8]. Spread limits are drawn parallel to the fitting line. We now recommend that the fitting line should be calculated by sDR.
A free Excel program which automatically determines equivalence and spread limits (based on sDR) can be obtained from [30]. A software program for gDR designed by Martin is available under information for authors of “Clinical Chemistry” [31].

The relation between Pearson’s coefficient of correlation and various regression models

Regarding the relation between r-value and estimated regression coefficients, the following conclusions can be drawn from the presented results:
– r > 0.99: all models estimate the regression coefficients almost correctly and can be used.
– r = 0.99: OLR already underestimated the regression coefficients slightly, independent of the width of the data interval. This effect increased with decreasing r.
– 0.80 ≤ r < 0.99: OR, SPCR, PBR are equally suited if δ = 1.0.
– 0.60 ≤ r < 0.80: sDR and gDR are equally suited, even if δ≠1.0 and δ is constant within the measurement interval.
– If δ is non-constant, which occurs for linear imprecision profiles in the presence of an intercept of the imprecision profile (that means in the presence of a detection limit, the most realistic case), and r ≥ 0.6, both sDR and gDR still estimate the slope correctly, but the standard deviations of slope and intercept are sometimes lower with gDR than with sDR (indicating the power of detecting bias is higher with gDR).
– Calculated slopes and intercepts of the regression line were linearly and inversely (intercept decreasing with increasing slope) related with each other for all models (as follows from the definition of the intercept, see Appendix 1).
– The standard deviation of slope and intercept increased with decreasing r with all models.
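The linear, inverse relation between slope and intercept noted in the list above can be sketched as follows, assuming the fitted line passes through the point of mean values (x̄, ȳ), so that a = ȳ − b·x̄. This holds for the least-squares-type models; for PBR, which estimates the intercept from medians, it is only an approximation. The function name `intercept_through_centroid` is hypothetical.

```python
def intercept_through_centroid(slope, mean_x, mean_y):
    """Intercept of a line with the given slope that passes through the
    point of mean values (mean_x, mean_y): a = mean_y - slope * mean_x."""
    return mean_y - slope * mean_x


if __name__ == "__main__":
    # With both means at 100 arbitrary units (as in the simulated data sets),
    # the intercept falls linearly as the slope rises.
    for b in (0.9, 1.0, 1.1):
        print(b, intercept_through_centroid(b, 100.0, 100.0))
```

With x̄ = ȳ = 100, every increase of the slope by 0.01 lowers the intercept by one unit, so an overestimated slope and an underestimated intercept tend to appear together.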
[Figure 6, panels A–C. Abscissa: coefficient of correlation; ordinate: bias at x = 85.]
Figure 6 Bias at x = 85 arbitrary units (lower limit of the measurement interval, sX = 5) calculated (A) by OLR, (B) SPCR (blue rhombs), OR (brown rectangles) and PBR (green triangles) and (C) by sDR (blue rhombs) and gDR (brown rectangles).
Bias: 85 – (85 · slope + intercept). Most data were taken from Table 1.

[Figure 7, panels A–C. Abscissa: coefficient of correlation; ordinate: bias at x = 190.]
Figure 7 Bias at x = 190 arbitrary units (upper limit of the measurement interval, sX = 30) calculated (A) by OLR, (B) SPCR (blue rhombs), OR (brown rectangles) and PBR (green triangles) and (C) by sDR (blue rhombs) and gDR (brown rectangles).
Bias: 190 – (190 · slope + intercept). Most data were taken from Table 2.
Table 4 Differences between concentrations calculated by various regression models for examples A and C presented in Table 3.

Glucose concentration, mmol/L     2            7            15
Example                           A/C          A/C          A/C
Concentrations calculated, mmol/L
OLR                               1.98/3.27    7.77/9.03    17.03/18.245
SPCR                              1.81/2.80    7.76/9.00    17.27/18.92
OR                                1.78/2.69    7.75/9.00    17.31/19.09
sDR                               1.87/3.15    7.76/9.02    17.17/18.42
gDR                               1.89/3.17    7.76/9.02    17.15/18.37
PBR                               1.86/2.77    7.73/9.01    17.13/18.99
Permissible bias (2.3%), mmol/L   0.046        0.161        0.345

The concentrations chosen were 2 mmol/L (36 mg/dL), 7 mmol/L (126 mg/dL) and 15 mmol/L plasma glucose (270 mg/dL). The estimated concentrations in mmol/L were calculated as: chosen concentration × slope + intercept. The concentrations for sDR were considered to be closest to the (unknown) true values. Then, the difference between the sDR values and the values of the other regression models should be smaller than the permissible bias [26]. Concentrations marked in bold deviate from the sDR value by more than the permissible bias (permissible difference).

Discussion

Linnet [20] had already performed a similar simulation study, but compared only OLR with DR and did not consider the most realistic case that the ratio of sAX and sAY is non-constant in the range of measurements.
One of the major interests for scatter plots is detecting constant and/or proportional bias. Because of the above-mentioned effects of the imprecision profile, two questions arise: (i) is the observed bias really a true bias (or an artifact) and (ii) is a true bias detectable (or masked) by an imprecision effect. All regression models discussed have limitations for answering the two questions. The behavior of the regression models varies between two extreme cases, a constant SDA and a constant CVA. A general hierarchy of all regression models covering all possible situations probably does not exist. However, DR is the only approach which provides correct answers probably under all conditions which usually appear with method comparison studies in laboratory medicine.
If the imprecision ratio δ is 1, the slope b is identical for DR and OR, approximately identical for DR, OR, SPCR and PBR, at least up to a CVA = 5.0 and the slope of the SPCR is the geometric mean of the slopes of the two OLRs, or, geometrically, the bisecting line between the two OLR lines. SPCR and PBR are sensitive to an increase of the imprecision ratio, which can be tolerated in the interval of 0.8–1.2 [8]. DR is not influenced by imprecision ratio if the correlation coefficient r is > 0.6.
Using weights in regression models has often been proposed to deal with different analytical variances in the measurement interval. This requires knowledge of the imprecision profile of both analytical procedures as mentioned above for gDR. gDR weights errors in both directions and allows errors that are non-constant over the measurement interval. This is the most general form of using different weights. OLR allows errors in only one direction and therefore weighted errors can be used in only one direction. SPCR and PBR implicitly apply identical weights to all data points; sDR uses weights constant for each method but possibly different between methods.
In conclusion, all methods operated well only within certain limitations. Whereas OLR had a relatively narrow applicability, DR required the least restrictions. Therefore, DR must be considered to be superior to other models. Both DR approaches are more or less equally suited in estimating the coefficients of regression correctly and are more robust than other models studied. The only difference between gDR and sDR is that gDR has a slightly higher power of detection in some cases. In extreme cases which, anyhow, should be considered as not comparable in laboratory medicine (r<0.6), both DRs can also lead to erroneous estimations of regression coefficients, and gDR may even be less reliable than sDR. DR was developed more than five decades ago. Now, it is time that it replaces linear regression of the first kind for method comparisons. It is also superior to other regression models which consider errors in both variables, but do not account for different error sizes in both variables as already pointed out by Linnet [20].

Limitations

– The errors of the two variables are assumed to be independent and normally distributed. The analytical standard deviation depends linearly on the quantity in the measurement interval [32, 33].
– All models are only valid for continuous variables. In the discrete case, other models must be chosen.
– A problem may arise from limitations of the interchangeability of estimates of day-to-day imprecision between commercial control materials and native materials [34]. The imprecision profile determined with artificial samples has the benefit that it can provide imprecision data from day to day, with the disadvantage that matrix properties of control samples may differ from those of patients’ samples. The imprecision profile estimated from duplicate measurements has the benefit that the samples are identical to the samples from which the regression results are derived, with the disadvantage that only within-run imprecision data may be obtained. However, in both cases, δ should be comparable. Control materials should be as commutable with native materials as possible.

Conflict of interest statement

Authors’ conflict of interest disclosure: The authors stated that there are no conflicts of interest regarding the publication of this article.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.

Received January 22, 2013; accepted April 19, 2013
Appendix 1

The basic assumptions in method comparison are that (i) the observations x and y are related to the true unknown value z by x = aX + bX·z + εX and y = aY + bY·z + εY, where (ii) aX, bX, aY, bY quantify constant and proportional deviations (bias) between measured and true value, (iii) εX and εY are random errors with mean zero and standard deviations sAX and sAY which are (iv) uncorrelated, cov(εX, εY) = 0. The random errors representing the analytical variation are (v) also assumed to be uncorrelated with z. The standard deviation of the random errors may depend on the true value z, and it is assumed that these are related to z by linear functions sAX(z) = aAX + bAX·z and sAY(z) = aAY + bAY·z, the imprecision profiles. From these profiles, and assuming that the true values are normally distributed, the overall analytical standard deviations are obtained by

$$ s_{AX}^2 = \int_0^\infty (z - Ez_X)^2 \cdot f\left(z; (a_{AX} + b_{AX} \cdot z)^2\right) dz, \qquad \text{(A1)} $$

$$ s_{AY}^2 = \int_0^\infty (z - Ez_Y)^2 \cdot f\left(z; (a_{AY} + b_{AY} \cdot z)^2\right) dz, \qquad \text{(A2)} $$

where f(z; (aA + bA·z)²) is the normal probability density function and Ez is the mean value of the analytical error. If bAX = bAY = 0 the integrals simplify to

sAX = aAX, sAY = aAY

As an approximation to Eq. (A1), the value of sAX(z) at the [...]

Eliminating z from the two relations in (i) gives the relation between the expected values Ex and Ey:

$$ Ey = a_Y - \frac{a_X \cdot b_Y}{b_X} + \frac{b_Y}{b_X} \cdot Ex = a + b \cdot Ex. \qquad \text{(A3)} $$

In a method comparison, only a and b can be derived from observed x and y data pairs (see below). Eq. (A3) shows that the bias components aX, bX, aY, bY cannot be identified from the relation between observed x and y only. An estimate of a = 0 only allows the conclusion that aY = aX·bY/bX, but not that the true constant bias terms are zero (aY = aX = 0). Similarly, from an estimate of b = 1 only follows that both proportional bias terms are identical (bX = bY), but not that both are equal to one (bX = bY = 1). Estimates of b≠1 and of a≠0 can be caused by different analytical errors (sAX≠sAY) even in the absence of a systematic difference between methods (see below).
Observed values have mean values x̄, ȳ and variances sX², sY². Observed variances are related to the (unobservable) variance sZ² of the true values and the analytical variances sAX², sAY² via

$$ s_X^2 = s_Z^2 + s_{AX}^2 \quad \text{and} \quad s_Y^2 = s_Z^2 + s_{AY}^2. \qquad \text{(A4)} $$

The correlation r (x, y) between observed values is related to the imprecision profiles and the standard deviation of the true values z by:

$$ r = \frac{s_Z^2}{\sqrt{s_Z^2 + s_{AX}^2} \cdot \sqrt{s_Z^2 + s_{AY}^2}} \qquad \text{(A5)} $$
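The model assumptions above can be turned into a short simulation sketch: true values z are drawn from a normal distribution, analytical errors follow linear imprecision profiles, and a line is fitted by simple Deming regression. Everything here is illustrative. The names (`expected_r`, `simulate_pairs`, `simple_deming`) are hypothetical, the bias terms are set to aX = aY = 0 and bX = bY = 1, and the closed-form slope is the commonly used simple Deming estimator with δ = sAY²/sAX²; whether it coincides term by term with the expression the authors use for sDR is an assumption.

```python
import math
import random


def expected_r(s_z, s_ax, s_ay):
    """Correlation expected from Eq. (A5) for constant analytical SDs."""
    var_z = s_z ** 2
    return var_z / (math.sqrt(var_z + s_ax ** 2) * math.sqrt(var_z + s_ay ** 2))


def simulate_pairs(n, mean_z, s_z, profile_x, profile_y, rng):
    """Simulate n (x, y) pairs without bias. Each profile is a tuple
    (alpha, beta) so that sA(z) = alpha + beta * z (clipped at zero)."""
    pairs = []
    for _ in range(n):
        z = rng.gauss(mean_z, s_z)
        s_ax = max(profile_x[0] + profile_x[1] * z, 0.0)
        s_ay = max(profile_y[0] + profile_y[1] * z, 0.0)
        pairs.append((z + rng.gauss(0.0, s_ax), z + rng.gauss(0.0, s_ay)))
    return pairs


def simple_deming(pairs, delta=1.0):
    """Return (intercept, slope) of simple Deming regression, where
    delta = sAY^2 / sAX^2 is assumed constant; delta = 1 gives OR."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)
    if sxy == 0.0:
        raise ValueError("covariance is zero; slope is undefined")
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only for reproducibility
    # Profiles sAX = 1 + 0.02 x and sAY = 2 + 0.04 y give sAY/sAX = 2, delta = 4.
    data = simulate_pairs(100, 100.0, 30.0, (1.0, 0.02), (2.0, 0.04), rng)
    print(simple_deming(data, delta=4.0))
```

Repeating the simulation many times and collecting the slopes yields the kind of empirical distribution from which standard deviations and confidence limits of the coefficients can be derived.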
References
1. International Standard Organisation. Medical laboratories – particular requirements for quality and competence, ISO 15189, 2nd ed. Geneva: International Standard Organisation, 2007:1–40.
2. Clinical and Laboratory Standards Institute. Method comparison and bias estimation using patient samples: approved guideline – second edition (interim revision). CLSI document EP9-A2-IR. Wayne, PA: Clinical Laboratory Standards Institute, 2010;30:1–36.
3. Westgard JO, Hunt MR. Use and interpretation of common statistical tests in method-comparison studies. Clin Chem 1973;19:49–57.
4. Bablok W, Haeckel R, Meyers W, Wosniok W. Biometrical methods. In: Haeckel R, editor. Evaluation methods in laboratory medicine. Weinheim: VCH, 1993:203–41.
5. Stöckl D, Dewitte K, Thienpont LM. Validity of linear regression in method comparison studies: is it limited by the statistical model or the quality of the analytical input data? Clin Chem 1998;44:2340–6.
6. Westgard JO. Basic method validation, 3rd ed. Madison, WI: Westgard QC Inc., 2008.
7. Haeckel R, Sonntag O. Validation of quantitative analytical procedures in laboratory medicine. J Lab Med 2012;36:111–8.
10. Bablok W. Range of linearity. In: Haeckel R, editor. Evaluation methods in laboratory medicine. Weinheim: VCH, 1993:251–8.
11. Geistanger A, Berding C, Vorberg E, Herlan M. Local regression: a new approach for measurement system comparison analysis. Clin Chem Lab Med 2008;46:1211–9.
12. Petersen PH, Stoeckl D, Blaabjerg O, Pedersen B, Birkemose E, Thienpont L, et al. Graphical interpretation of analytical data from comparison of a field method with reference method by use of difference plots. Clin Chem 1997;43:2039–46.
13. Linnet K. Performance of Deming regression analysis in case of misspecified analytical error ratio in method comparison studies. Clin Chem 1998;44:1024–31.
14. Averdunk R, Borner K. Korrelation der Thromboplastinzeiten bei Dicumarol-behandelten Patienten unter Verwendung verschiedener Thrombokinase-Präparate. Z Klin Chem Klin Biochem 1970;8:263–8.
15. Feldmann U, Schneider B, Klinkers H, Haeckel R. A multivariate approach for the biometric comparison of analytical methods in clinical chemistry. J Clin Chem Clin Biochem 1981;19:121–37.
16. Feldmann U. Robust bivariate errors-in-variables regression and outlier detection. Eur J Clin Chem Clin Biochem 1992;30:405–14.
17. Deming WE, editor. Statistical adjustment of data. New York: Wiley, 1943 (Dover Publications edition 1985).
8. Haeckel R, Wosniok W, Al-Shareef N. Permissible performance 18. Cornbleet PJ, Gochman N. Incorrect least-squares regression
limits of regression analyses in method comparisons. Clin Chem coefficients in method comparison analysis. Clin Chem
Lab Med 2011;49:1805–16. 1979;25:432–8.
9. Bland JM, Altman DG. Comparing two methods of clinical 19. Linnet K. Estimation of the linear relationship between the
measurement: a personal history. Int J Epidemiol 1995;24 measurement of two methods with proportional errors. Stat
(Suppl 1):S7–14. Meth 1990;9:1463–73.
Haeckel et al.: Regressions for method comparisons 163
20. Linnet K. Evaluation of regression procedures for methods 28. Fraser CG. Biological variation: from principles to practice.
comparison studies. Clin Chem 1993;39:424–32. Washington, DC: American Association for Clinical Chemistry,
21. Kummel CH. Reduction of observation equations which contain 2001:1–151.
more than one observed quantity. The Analyst 1879;6:97–105. 29. International vocabulary of metrology – basic and general
22. Martin RF. General Deming regression for estimating systematic concepts and associated terms (VIM). ISO guide 99, 3rd ed.
bias and its confidence interval in method-comparison studies. JCGM 2007:1–104.
Clin Chem 2000;46:100–4. 30. Keller Th. Excel-tool: permissible performance limits in method
23. Dunn G. Regression models for method comparison data. comparisons. Available at: http://www.acomed-statistik.de/
J Biopharmaceut Stat 2007;17:739–56. performance_limits_method_comparison.html. Accessed 10
24. Adcock RJ. A problem in least squares. Ann Math 1878;5:53–4. November, 2011.
25. Passing H, Bablok W. A new biometrical procedure for testing 31. Clinical Chemistry, information for authors. Tools for diagnostic
the equality of measurements from two different analytical accuracy. Available at: http://www.clinchem.org/site/info_ar/
methods. Application of linear regression procedures for info_authors.xhtml#tools. Accessed November 2012.
method comparison studies in clinical chemistry, Part I. J Clin 32. Haeckel R, Haeckel H. The determination of glucose concen-
Chem Clin Biochem 1983;21:709–20. tration in 20 microliter capillary blood, liquor and urine by
26. Haeckel R, Wosniok W, Puentmann I. Discordance rate, a new the hexokinase method with the endpoint analyzer 5030
concept for combining diagnostic decisions with analytical (Eppendorf). Z Klin Chem Klin Biochem 1972;10:453–61.
performance characteristics.1. Application in method or sample 33. Haeckel R, Mathias D. A two point method for the determination
system comparisons and in defining decision limits. Clin Chem of urea with a Gemsaec analyzer. Z Klin Chem Klin Biochem
Lab Med 2003;41:347–55. 1974;12:515–20.
27. Neter J, Kutner MH, Nachtsheim CJ, Wasserman W, editors. 34. Fuentes-Arderiu X, de-la-Presa G. Interchangeability of
Applied linear statistical models, 4th ed. Boston, MA: WCB estimates of day-to-day imprecision between commercial
McGraw Hill, 1996:56. control materials and serum pools. Clin Chem 2002;48:573–4.