Method Comparison in the Clinical Laboratory
Abstract: Studies comparing a new method with an established method, to assess whether the new measurements are comparable
with existing ones, are frequently conducted in clinical pathology laboratories. Assessment usually involves statistical analysis of
paired results from the 2 methods to objectively investigate sources of analytical error (total, random, and systematic). In this
review article, the types of errors that can be assessed in performing this task are described, and a general protocol for comparison
of quantitative methods is recommended. The typical protocol has 9 steps: 1) state the purpose of the experiment, 2) establish
a theoretical basis for the method comparison experiment, 3) become familiar with the new method, 4) obtain estimates of random
error for both methods, 5) estimate the number of samples to be included in the method comparison experiment, 6) define
acceptable difference between the 2 methods, 7) measure the patient samples, 8) analyze the data, and 9) judge acceptability. The
protocol includes the essential investigations and decisions needed to objectively assess the overall analytical performance of
a new method compared to a reference or established method. The choice of statistical methods and recommendations of decision
criteria within the stages are discussed. Use of the protocol for decision-making is exemplified by the comparison of 2 methods for
measuring alanine aminotransferase activity in serum from dogs. Finally, a protocol for comparing simpler semiquantitative
methods with established methods that measure on a continuous scale is suggested. (Vet Clin Pathol. 2006;35:276–286)
©2006 American Society for Veterinary Clinical Pathology
From the Department of Small Animal Clinical Science, The Royal Veterinary and Agricultural University, Groennegaardsvej 3, DK-1870, Frederiksberg C, Denmark. Corresponding author: Asger Lundorff Jensen (alj@kvl.dk). This article has been peer-reviewed.
Types of Errors
[…] obtained by a perfect measurement. This is also impossible to achieve, and the best estimate of a true value is a value produced by a reference method, which can be described as a thoroughly investigated test method, clearly and exactly describing the necessary conditions and procedures for the evaluation of a specific biological endpoint, which has been shown to have accuracy and precision commensurate with its intended use and which can, therefore, be used to assess the accuracy of other methods for the same measurement. When a reference method is not available, certified reference material (which is not identical to calibration material) with values measured by a reference method may be used to assess systematic error. In veterinary clinical pathology, certified species-specific reference materials or reference methods are seldom available, and existing, routinely applied methods are frequently used as the method to which a new method is compared.
In other words, systematic error is the new method’s difference from what is held to be a true value as determined by a reference method or an existing method in the laboratory. Systematic error can be subdivided into constant and proportional systematic error (Figure 1). Constant systematic errors are systematic deviations estimated as the average difference between the 2 methods. The presence of a constant systematic error indicates that one method measures consistently higher or lower than the other method. Proportional systematic error means that the differences between the 2 methods are proportionally related to the level of measurement.
Method Comparison
Suggested protocol
A method comparison study is a research experiment, and as with all other research experiments, a research protocol outlining the scope and procedures is essential. Local traditions may influence the structure and content of the protocol. In the following, we suggest a protocol based primarily on previous publications16,18 that includes items which in our experience are useful.
1. State the purpose of the experiment
The reason for performing a method comparison experiment is to estimate the type and magnitude of systematic error between 2 methods and to judge if the 2 methods are identical within the inherent imprecision of both methods or within preset analytical quality specifications.
2. Establish a theoretical basis for the method comparison experiment
It usually is very helpful to collect and write down information relating to both the new method and the comparative method. Information on sample requirements, analytical process, reaction principles, calibration procedure, calculations, known interferences, and anticipated analytical performance (eg, anticipated imprecision, inaccuracy, and reportable range obtained in other species) is essential if unexpected or aberrant results occur. If antigen-antibody reactions are involved in the new method, knowledge or hypotheses concerning the specificity, affinity, and avidity of the applied antibodies are also valuable.
3. Become familiar with the new method
In this phase, a working procedure is established. In practical terms, this means establishing sufficient working competence with the method to correctly prepare reagents, set up the analyzer, calibrate the method, and obtain test results. If not done earlier, one also assesses whether the new method can actually measure the analyte in question, eg, by measuring samples with presumed different levels of analyte and mixtures thereof.
4. Obtain estimates of random error for both methods
Estimates of random error (ie, data on imprecision) serve at least 2 purposes. First, estimates of random error are used in the method comparison experiment to judge acceptability of the new method (see point 6). Second, if duplicate or replicate measurements are used in the method comparison experiment, estimates of random error may help in assessing the validity of the measurements by the individual methods and help identify unexpected test results arising from sample mix-ups, transposition errors, and other mistakes.
If estimates of random error for both methods are not already available, imprecision studies should be conducted. For quantitative assays, it is useful to report imprecision as the CV either at 2 or more specified mean values near clinical decision points or at values in the low, middle, and high parts of the analytical range, as obtained by repeating the test over a specified number of days. Within-run CVs are appropriate if all patient samples are analyzed in a single run.
5. Estimate the number of samples to be included in the method comparison experiment
Most authors recommend including at least 40 patient samples in the method comparison experiment.16,22 The samples should cover the working range of the methods and should represent the spectrum of diseases expected in routine application of the methods. The number of samples is another significant factor that determines the statistical power of a method comparison experiment. Based on simulations, it has been shown that an important factor in deciding the number of samples to include is the range ratio, which is the maximum value divided by the minimum value.8 When the range ratio is low, eg, 2, the number of samples should be high, eg, 500, whereas when the range ratio is high, eg, 10, the number of samples to include may be lower, eg, 100.
6. Define acceptable difference between the 2 methods
Before the measurements are conducted, the amount of analytical error that is allowable without compromising test interpretation, patient care, or consumer care is defined. The basis for a method comparison study is the hypothesis that the 2 methods are identical either within the inherent imprecision of both methods or within preset analytical quality specifications.6
Acceptance limits based on inherent imprecision of both methods. The inherent imprecision of both methods is calculated as $\sqrt{CV_{Method1}^2 + CV_{Method2}^2}$. When means of duplicates are used, the formula is $\sqrt{\frac{CV_{Method1}^2}{2} + \frac{CV_{Method2}^2}{2}}$. If single measurements are used and the imprecision (CV) is 5% and 3%, then the inherent imprecision of both methods is $\sqrt{5^2 + 3^2} = 5.8\%$. If means of duplicates are used, the CV is $\sqrt{\frac{5^2}{2} + \frac{3^2}{2}} = 4.1\%$. This means that if the mean value of the 2 methods is 100 and they are expected to measure identically, then the difference between the 2 methods is expected to be within the interval $0 \pm 1.96 \times CV \times \text{mean}$ in 95% of the measurements, ie, $0 \pm 1.96 \times (0.041 \times 100) = 0 \pm 8.04$.
Acceptance limits based on analytical quality specifications. Analytical quality specifications can be established in a num- […] rarely derived objectively from an analysis of medical needs in specific clinical situations, one exception being the use of error grid analysis in the evaluation of portable blood glucose meters in dogs and cats.25,26
Meanwhile, data on biological variation for many blood components in dogs, cows, and rabbits have been available for many years. A list of data on biological variation of some common analytes in dogs is presented in Table 2. Data on biological variation also are available for numerous blood components in humans,27 and these values can be used as starting points until veterinary data are available. Data on biological variation make it possible to objectively calculate maximum allowable values for imprecision28–31 ($I_{max}$), inaccuracy31 ($B_{max}$), and total error32 ($TE_{max}$) from the within-animal ($CV_{within}$) and between-animal ($CV_{between}$) variations using the following formulas:
$I_{max} = 0.5 \times CV_{within}$; […]
Analytical performance data from the Clinical Laboratory Improvement Amendments (CLIA) proficiency testing criteria for medical laboratories also can be a starting point for setting analytical quality requirements (for more details on CLIA, see www.fda.gov/cdrh/CLIA/index.html). Some CLIA total al- […]
Table 1. Proposed hierarchy of models to be applied to set analytical quality specifications.
No.  Model and sources of information
1    Evaluation of the effect of analytical performance on clinical outcomes in specific clinical settings
2    Evaluation of the effect of analytical performance on clinical decisions in general
     2.a. Data on biological variation
     2.b. Analysis of clinicians’ opinions
3    Published professional recommendations
     3.a. National and international expert bodies
     3.b. Expert local groups or individuals
4    Performance goals
     4.a. Regulatory bodies
     4.b. Organizers of External Quality Assessment (EQA) schemes
5    Goals based on current state of the art
     5.a. Data from EQA or proficiency testing schemes
     5.b. Current publications on methodology
Table 2. Data on biological variation for some canine blood and serum components.*
Analyte                CVG (%) CVI (%) CVA (%) CVmax (%) Bmax (%) TEmax (%) Ref. No.
RBC                    4.4     5.4     2.8     2.7       1.8      6.3       39
HCT                    5.2     6.4     1.1     3.2       2.1      7.4       39
Hgb                    4.7     5.9     2.9     3.0       1.9      6.9       39
WBC                    12.3    12.1    3.7     6.1       4.3      14.4      39
ALT                    23.7    9.7     3.2     4.8       6.4      14.3      40
AST                    10.9    11.4    3.3     5.7       4.0      13.4      40
ALP                    34.2    8.6     1.7     4.3       8.8      15.9      40
Albumin                3.0     2.4     1.6     1.2       1.0      3.0       40
Total protein          3.1     2.6     1.1     1.3       1.0      3.2       40
Urea                   35.1    16.1    3.8     8.0       9.7      22.9      40
Creatinine             12.9    14.6    2.9     7.3       4.9      17.0      40
Cholesterol            15.1    7.3     3.0     3.7       4.2      10.3      40
Glucose                3.8     9.5     3.7     4.8       2.6      10.5      41
Fructosamine           4.2     11.1    2.8     5.6       3.0      12.2      41
Potassium              3.6     3.3     0.1     1.7       1.2      4.0       42
Total thyroxine (TT4)  17.2    17.0    4.0     8.4       6.0      19.9      43
Thyrotropin (TSH)      43.6    13.6    8.8     6.8       11.4     22.6      44
Iron                   17.2    17.8    0.7     8.9       6.2      20.9      45
Fibrinogen             19.0    17.1    2.8     8.5       6.4      20.4      45
C-reactive protein     29.3    24.3    7.2     12.2      9.5      29.6      45
α1-acid glycoprotein   67.0    9.6     8.1     4.8       16.9     24.8      45
Haptoglobin            20.2    17.0    4.9     8.5       6.6      20.6      45
*CVG indicates between-dog coefficient of variation; CVI, within-dog coefficient of variation; CVA, analytical coefficient of variation; CVmax, maximum allowable imprecision; Bmax, maximum allowable inaccuracy; and TEmax, maximum allowable total error.
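The acceptance-limit arithmetic of step 6 can be checked with a short Python sketch. `inherent_imprecision` and `difference_limits` reproduce the worked example in the text (CVs of 5% and 3%, mean value 100). `biological_specs` uses the $I_{max}$ formula given in the text; the $B_{max}$ and $TE_{max}$ formulas are not shown in this excerpt, so the sketch assumes the widely used forms $B_{max} = 0.25\sqrt{CV_{within}^2 + CV_{between}^2}$ and $TE_{max} = 1.65\,I_{max} + B_{max}$. For the Table 2 rows spot-checked here (ALT, AST, urea), these assumed formulas reproduce the tabulated values to within about 0.1 percentage point. All function names are hypothetical.

```python
import math


def inherent_imprecision(cv1: float, cv2: float, duplicates: bool = False) -> float:
    """Combined CV (%) of two methods.

    With means of duplicates, each method's variance (CV squared) is halved.
    """
    if duplicates:
        return math.sqrt(cv1**2 / 2 + cv2**2 / 2)
    return math.sqrt(cv1**2 + cv2**2)


def difference_limits(cv_combined: float, mean_value: float, z: float = 1.96) -> tuple[float, float]:
    """Interval 0 +/- z * CV * mean; with z = 1.96 it should hold ~95% of differences."""
    half_width = z * (cv_combined / 100) * mean_value
    return -half_width, half_width


def biological_specs(cv_within: float, cv_between: float) -> dict[str, float]:
    """Maximum allowable imprecision, inaccuracy, and total error (all in %)."""
    i_max = 0.5 * cv_within                                  # formula given in the text
    b_max = 0.25 * math.sqrt(cv_within**2 + cv_between**2)  # ASSUMED formula (not in excerpt)
    te_max = 1.65 * i_max + b_max                            # ASSUMED formula (not in excerpt)
    return {"Imax": i_max, "Bmax": b_max, "TEmax": te_max}


if __name__ == "__main__":
    print(f"single measurements: {inherent_imprecision(5, 3):.1f}%")
    print(f"means of duplicates: {inherent_imprecision(5, 3, duplicates=True):.1f}%")
    low, high = difference_limits(4.1, 100)  # CV rounded to 4.1% as in the text
    print(f"expected differences: {low:.2f} to {high:.2f}")

    # (CV within, CV between) for a few canine analytes, taken from Table 2.
    for name, cvi, cvg in [("ALT", 9.7, 23.7), ("AST", 11.4, 10.9), ("Urea", 16.1, 35.1)]:
        spec = biological_specs(cvi, cvg)
        print(name, {key: round(value, 1) for key, value in spec.items()})
```

The rounded CV of 4.1% is passed to `difference_limits` so that the interval matches the text's ±8.04; with the unrounded CV the half-width is slightly larger.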
9. Judge acceptability
[…]
Line B (for $TE_{max} = B_{max} + 2 I_{max}$): from $TE_{max}$ on the y-axis to $TE_{max}/2$ on the x-axis
Line C (for $TE_{max} = B_{max} + 3 I_{max}$): from $TE_{max}$ on the y-axis to $TE_{max}/3$ on the x-axis
Line D (for $TE_{max} = B_{max} + 4 I_{max}$): from $TE_{max}$ on the y-axis to $TE_{max}/4$ on the x-axis.
Imprecision and inaccuracy from the replication study and the method comparison experiment, respectively, are then plotted into the MEDx chart, and it is then easy to judge whether the new method is just acceptable (ie, within control) or of poor, marginal, good, or excellent performance (the designations “poor,” “marginal,” “good,” and “excellent” correspond directly to sigma performance criteria 1, 2, 3, and 4 in “Six Sigma Quality Management”) (Figure 7).38
What if the 2 methods do not produce identical results?
If the method comparison experiment has revealed that the 2 methods are not identical either within their combined inherent imprecision or within predefined limits, the methods cannot be used interchangeably. In some cases, this can be very frustrating, for example, when a manufacturer has stopped producing a certain reagent and the method comparison experiment has revealed that the new method does not produce identical results. In this case, it may be worthwhile to perform a new method comparison experiment using another new method. If this is not an option and for some reason one is forced to use the new method, new reference intervals must be prepared for each animal species. Obtaining new reference intervals is often a very cumbersome and expensive process, but it is, in our opinion, much preferable to simply including the regression equation in the new method, since this may be a significant source of undetectable and unexplainable error at a later stage when everyone has forgotten that a regression equation was included.
Example: Alanine Aminotransferase
Purpose of the experiment
A new method for measuring alanine aminotransferase (EC 2.6.1.2) (ALAT) activity in serum from dogs is being considered in the laboratory. The laboratory already has a method for measuring ALAT activity in serum from dogs. The purpose of the method comparison experiment is to judge if the 2 methods are identical either within the inherent imprecision of both methods or within preset analytical quality specifications.
Theoretical basis for the method comparison experiment
The sample material analyzed is fresh unhemolyzed serum, handled according to the laboratory’s standard operating procedure for sample material for clinical chemical analysis. Both methods use the modified International Federation for Clinical Chemistry (IFCC) method, in which the reaction is initiated by the addition of α-ketoglutarate as a second reagent. The concentration of NADH is measured by its absorbance at 340 nm, and the rate of absorbance decrease is proportional to the ALAT activity. The routine method has an imprecision of 2%. The new method has an imprecision of 5% when applied to feline serum samples.
Familiarization with the new method
The new method has been applied to fresh unhemolyzed canine serum samples for 1 week to obtain working competence with the method. Samples with different ALAT activities have been mixed, and it has been observed that ALAT activity in the mixtures is comparable to what would be expected from the combined ALAT activities in the original samples. Thus, it is assumed that the new method actually can measure ALAT activity in canine serum samples.
Estimates of random error for both methods
The routine method has an imprecision of 2% (single samples). An experiment on 5 canine serum samples revealed that the imprecision of the new method was 4%.
Number of samples to be included in the method comparison experiment
The laboratory reference interval for ALAT activity in canine serum is 0–80 U/L. Forty patient samples are assumed to be required. Since increased values are of clinical interest, samples with ALAT activities around and above the upper limit of the reference interval are preferred.
Table 3. Example data from an experiment comparing 2 methods (new method and routine method) for measurement of alanine aminotransferase activity (U/L) in fresh unhemolyzed canine serum.
Routine Method   New Method
104              116
102              115
113              125
101              111
106              115
96               106
102              112
108              117
79               86
85               90
116              125
94               104
101              111
110              121
115              126
99               105
95               108
110              122
97               106
93               107
100              108
101              110
94               106
92               106
89               97
115              127
102              116
120              133
111              123
85               90
75               86
70               79
72               79
76               78
79               86
81               90
82               89
88               86
89               91
65               75
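The range ratio defined in step 5 (maximum value divided by minimum value) can be computed directly for the Table 3 data. The Python sketch below is illustrative only; the list names `ROUTINE` and `NEW` and the helper `range_ratio` are hypothetical, and the ratio is computed separately for each method because the text does not say which method's values the ratio should be based on.

```python
# Table 3: ALAT activity (U/L); position i in each list is the same serum sample.
ROUTINE = [104, 102, 113, 101, 106, 96, 102, 108, 79, 85,
           116, 94, 101, 110, 115, 99, 95, 110, 97, 93,
           100, 101, 94, 92, 89, 115, 102, 120, 111, 85,
           75, 70, 72, 76, 79, 81, 82, 88, 89, 65]
NEW = [116, 115, 125, 111, 115, 106, 112, 117, 86, 90,
       125, 104, 111, 121, 126, 105, 108, 122, 106, 107,
       108, 110, 106, 106, 97, 127, 116, 133, 123, 90,
       86, 79, 79, 78, 86, 90, 89, 86, 91, 75]


def range_ratio(values: list[float]) -> float:
    """Maximum value divided by minimum value, as defined in step 5."""
    low = min(values)
    if low <= 0:
        raise ValueError("range ratio requires strictly positive values")
    return max(values) / low


if __name__ == "__main__":
    assert len(ROUTINE) == len(NEW) == 40
    print(f"routine method: {range_ratio(ROUTINE):.2f}")
    print(f"new method:     {range_ratio(NEW):.2f}")
```

Both ratios come out just under 2 (about 1.85 and 1.77), which corresponds to the low-range-ratio case described in step 5.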
The 40 samples were measured by both methods no more than 1 hour apart. Eight different patient samples were analyzed each day for 5 days. The results are presented in Table 3.
Judging acceptability
Figure 9. An example of an experiment comparing 2 methods for the measurement of ALAT activity in fresh unhemolyzed canine serum samples by means of a difference plot. The dotted lines represent $0 \pm 1.96 \times$ the inherent imprecision of both methods (4.5%). Only 13 values of […]
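To make the difference-plot criterion and the MEDx judgment concrete, the Python sketch below applies them to the Table 3 data. It rests on several assumptions that are not stated in this text: differences are new minus routine, expressed as a percentage of each pair's mean; the acceptance band is $0 \pm 1.96 \times \sqrt{2^2 + 4^2}\%$ (about 4.5%, from the example's CVs); the inaccuracy plotted on the MEDx chart is the mean relative difference; the quality specification is the ALT $TE_{max}$ of 14.3% from Table 2; and the sigma labels follow the correspondence stated in step 9, read as thresholds. Line A and the “just acceptable” region are not modeled because their definitions are not part of this text. The count of values inside the band depends on these choices and need not match the truncated figure caption. All names are hypothetical.

```python
import math
from statistics import mean

# Table 3: ALAT activity (U/L); position i in each list is the same serum sample.
ROUTINE = [104, 102, 113, 101, 106, 96, 102, 108, 79, 85,
           116, 94, 101, 110, 115, 99, 95, 110, 97, 93,
           100, 101, 94, 92, 89, 115, 102, 120, 111, 85,
           75, 70, 72, 76, 79, 81, 82, 88, 89, 65]
NEW = [116, 115, 125, 111, 115, 106, 112, 117, 86, 90,
       125, 104, 111, 121, 126, 105, 108, 122, 106, 107,
       108, 110, 106, 106, 97, 127, 116, 133, 123, 90,
       86, 79, 79, 78, 86, 90, 89, 86, 91, 75]


def relative_differences(reference: list[float], new: list[float]) -> list[float]:
    """Difference (new - reference) as a percentage of each pair's mean."""
    return [100 * (n - r) / ((n + r) / 2) for r, n in zip(reference, new)]


def count_within(diffs: list[float], limit: float) -> int:
    """Number of relative differences inside the band 0 +/- limit (%)."""
    return sum(1 for d in diffs if abs(d) <= limit)


def sigma_metric(te_max: float, bias: float, cv: float) -> float:
    """(TEmax - |bias|) / CV, all in %.

    A point (CV, |bias|) lies on or below the MEDx line for
    TEmax = Bmax + m * Imax exactly when this value is >= m.
    """
    return (te_max - abs(bias)) / cv


def medx_label(sigma: float) -> str:
    """Assumed reading of step 9: poor/marginal/good/excellent = sigma 1/2/3/4."""
    for threshold, label in ((4, "excellent"), (3, "good"), (2, "marginal"), (1, "poor")):
        if sigma >= threshold:
            return label
    return "below sigma 1"


if __name__ == "__main__":
    # Inherent imprecision from the example: routine CV 2%, new CV 4%.
    limit = 1.96 * math.sqrt(2**2 + 4**2)
    diffs = relative_differences(ROUTINE, NEW)
    mean_abs = mean(n - r for r, n in zip(ROUTINE, NEW))  # constant systematic error estimate
    print(f"mean difference: {mean_abs:.2f} U/L")
    print(f"within 0 +/- {limit:.2f}%: {count_within(diffs, limit)} of {len(diffs)}")

    bias = mean(diffs)                      # mean relative difference, %
    sigma = sigma_metric(14.3, bias, 4.0)   # ALT TEmax from Table 2 (assumption)
    print(f"bias {bias:.1f}%, sigma {sigma:.2f}: {medx_label(sigma)}")
```

Under these assumptions the sketch finds 15 of 40 relative differences inside the band, a mean difference of about 9.3 U/L, and a sigma value of about 1.3, which falls in the “poor” band; a different bias estimate or quality specification would change the outcome.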
32. Petersen PH, Ricos C, Stockl D, et al. Proposed guidelines for the internal quality control of analytical results in the medical laboratory. Eur J Clin Chem Clin Biochem. 1996;34:983–999.
33. Payne RB. Method comparison: evaluation of least squares, Deming and Passing/Bablok regression procedures using computer simulation. Ann Clin Biochem. 1997;34:319–320.
34. Lin’s concordance. Available at: http://www.niwa.co.nz/services/statistical/concordance. Accessed August 16, 2005.
35. Jensen AL, Bantz M. Comparing laboratory tests using the difference plot method. Vet Clin Pathol. 1993;22:46–48.
36. Bland JM, Altman DG. Comparing methods of measurements: why plotting difference against standard method is misleading. Lancet. 1995;346:1085–1087.
37. Jensen AL, Iversen L, Hoier R. Evaluation of analytical performance assisted by total error criteria of a commercial enzyme immunometric assay for canine serum thyrotropin. Vet Clin Pathol. 1999;28:53–56.
38. Westgard JO. Six Sigma Quality Design and Control. Madison, WI: Westgard QC Inc; 2001.
39. Jensen AL, Iversen L, Petersen TK. Study on biological variability of haematological components in dogs. Comp Haemat Internat. 1998;8:202–204.
40. Jensen AL, Aaes H. Critical differences of clinical chemical parameters in blood from dogs. Res Vet Sci. 1993;54:10–14.
41. Jensen AL, Aaes H, Iversen L, Petersen TK. The long-term biological variability of fasting plasma glucose and serum fructosamine in healthy Beagle dogs. Vet Res Commun. 1999;23:73–80.
42. Jensen AL, Pedersen HD, Koch J, Aaes H, Flagstad A. Applicability of the critical difference. Zentralbl Veterinarmed A. 1993;40:624–630.
43. Jensen AL, Hoier R. Evaluation of thyroid function in dogs by hormone analysis: effects of data on biological variation. Vet Clin Pathol. 1996;25:130–134.
44. Iversen L, Jensen AL, Hoier R, Aaes H. Biological variation of canine serum thyrotropin (TSH) concentration. Vet Clin Pathol. 1999;28:16–19.
45. Kjelgaard-Hansen M, Mikkelsen LM, Kristensen AT, Jensen AL. Study on biological variability of five acute-phase reactants in dogs. Comp Clin Path. 2003;12:69–74.