
William Horwitz

Bureau of Foods HFF-7 Regulations


Food and Drug Administration
Washington, D.C. 20204

Evaluation of
Analytical Methods Used for
Regulation of Foods and Drugs
Presented at the 95th Annual Meeting of the Association of Official Analytical Chemists, Washington, D.C., Oct. 19, 1981.

This article not subject to U.S. Copyright. Published 1981 American Chemical Society. ANALYTICAL CHEMISTRY, VOL. 54, NO. 1, JANUARY 1982 · 67 A

Although the Association of Official Analytical Chemists (AOAC) has been evaluating and approving methods of analysis for almost 100 years, there is practically no discussion in the Journal of the Association of Official Analytical Chemists of the criteria for determining which methods should be approved for regulatory use. These decisions are usually made on the basis of a method's performance in interlaboratory collaborative studies.

John Mandel of the National Bureau of Standards, in his 1981 Shewhart medal address (1), pointed out that the basic objective of conducting interlaboratory tests is not to detect the known statistically significant differences among laboratories: "The real aim is to achieve the practical interchangeability of test results." Interlaboratory tests are conducted to determine how much allowance must be made for variability among laboratories in order to make the values interchangeable.

An irreducible difference exists between supposedly identical measurements made in different laboratories. This point was recently demonstrated by a group of New Zealand government laboratories in attempting to minimize the discrepancies in values for blood alcohol between laboratories. The laboratories went to great pains to discover every source of error, even to the extreme of moving analysts from one laboratory to another. They found that an analyst increases his or her intra-analyst variability when moved to a different laboratory environment. They concluded that the only way to eliminate interlaboratory variability was to conduct all analyses in a single laboratory and presumably by the same analyst. Under our legal system this solution is impossible, since defendants accused by laboratory evidence have a constitutional right to produce rebuttal evidence from any laboratory of their choice. Therefore, the important question to be answered in the evaluation of methods of analysis is how much allowance must be made for between-laboratory variability in interpreting the values produced by different laboratories. If the variability or error produced by the method is excessive (that is, it does not permit effective regulation as required by the statute), the method must be judged unacceptable for the intended purpose.

The purpose of this paper is to suggest some practical limits of acceptable variability in methods of analysis required by AOAC's customers: the regulatory agencies and the regulated industries. The collaborative study procedure has provided the essential data for developing this information.

Method Characteristics

Methods are usually evaluated on the basis of three characteristics: reliability, applicability, and practicability. For our present purpose, reliability is the overriding consideration. In general, when a need exists for a method, we have to accept any reasonable degree of reliability. Applicability to a wide range of sample types and practicability with respect to cost, time, and training constraints both assume greater importance when there are several competing methods.

The important aspects of reliability, listed in their approximate order of importance for most purposes, are:
• Reproducibility, or total between-laboratory precision. This is the measure of the ability of different laboratories to check each other. It is the overall measure of variability, including the within-laboratory component.
• Repeatability, or within-laboratory precision. This is the measure of the ability of a laboratory (or analyst) to check itself.
• Systematic error or bias (sometimes also called "accuracy or inaccuracy"). This is the difference of the value(s) obtained from the true, assigned, or consensus value(s).
• Specificity (when required). This is the ability of the method to measure what it is intended to measure.
• Limit of reliable measurement (when required). This is the smallest amount (or concentration) of a material that can be measured with a stated degree of confidence.

Which of these factors is most important depends upon the purpose for which the data will be used. In regulatory analysis, analytical values are used for three major purposes: to survey a field to determine the extent of a problem; to monitor trends to determine if any corrective action has to be taken; and to determine compliance with an economic or legal specification. A different emphasis on the various method characteristics is required to accomplish each purpose. In surveying a field, the normal variability of the measurement of a commodity or the environment is usually so large that a high degree of accuracy and precision is not an important requirement. The averaging of numerous imprecise determinations often provides a surprisingly good mean. Sometimes all that is needed is to differentiate samples with "none" of the analyte from those that contain "significant" amounts. In monitoring trends, the systematic error, as long as it is constant, is not important. The precision must be good enough to detect when a "significant" difference occurs. In compliance activities a high degree of accuracy (as lack of bias) and precision are required at the specification level, unless the specification is based upon the method itself, in which case only precision is pertinent. The precision requirement may decrease as the distance from the specification value increases. When the "no residue" requirements of the Federal Food, Drug, and Cosmetic Act are involved, specificity and limit of reliable measurement are the most important considerations. In practical work, other requirements come into play. In surveys, the need to analyze many samples makes a rapid method a necessity; in monitoring, repeated sampling of the same population is important; in compliance, practicality, although important, is secondary to reliability.

Therefore, it is apparent that there is no such thing as a "best" method since the definition of "best" will vary with the purpose for which the method will be used. Since this is not usually known beforehand, we must usually assume that our primary interest will be in the achievement of a suitable degree of precision and bias; requirements for specificity and limit of reliable measurement will usually be self-evident.

In regulatory work, or even in analyzing for adherence to commercial specifications, between-laboratory variability is the most important factor. Bias can be tolerated. If it is constant, a correction factor can be used. If it is variable, it becomes a component of reproducibility (between-laboratory variability). In fact, Youden (2) equates systematic error to the "true between-laboratory" variability, which in our terminology is the reproducibility adjusted for the within-laboratory variability (repeatability). Bias, as a recovery factor, particularly in modern trace analysis, is generally permitted to seek its own level, provided it is above 60-80% (3).

Interlaboratory Precision

It would appear that any systematic approach to estimating what constitutes a reasonable precision would be an almost impossible task. Methods are composed of almost infinite combinations of dissolution, cleanup, and measurement procedures. These innumerable combinations are applied to pure substances and complex mixtures as solids, liquids, and gases by analysts with various degrees of competency. Yet, despite this complexity, we have found that analytical variability can be summarized (in an oversimplified fashion to be sure) by plotting the determined mean coefficient of variation (CV), expressed as powers of two, against the analyte level measured, expressed as powers of 10, as shown in Figure 1, taken from our recent paper on quality control (4).

Figure 1. The general curve relating interlaboratory coefficients of variation (expressed as powers of two on the right) with concentration (expressed as powers of 10) along the horizontal center axis. [Plot not reproduced; labels on the plot: Drugs in Feeds, Pharmaceuticals, Pesticide Residues, Aflatoxins, Trace Elements, Major Nutrients, Minor Nutrients; axis label: Concentration.]

The sources of these data are an examination of over 150 independent AOAC interlaboratory collaborative studies covering numerous AOAC topics, from drug preparations and pesticide formulations on the high end of the concentration scale to aflatoxin contaminants at the low end, with important stops in between at pesticide residue and trace element concentrations. At least five analytical methods (chromatography, atomic absorption spectrometry, spectrophotometry, polarography, and bioassay) are involved. A convenient, easily remembered reference point is that at 1 ppm ($10^{-6}$), the CV is $2^4 = 16\%$. Other points are given in Figure 1.

The most important and startling point is that this idealized smoothed curve is independent of the nature of
the analyte or of the analytical technique that was used to make the measurement. This curve is merely a summary of available interlaboratory data, independent of such external influences as sampling and contamination. The significant data points are averages of a number of studies of similar analytes whose CVs may cover a factor of two in either direction; similarly the concentration range may also extend in both directions by an order of magnitude or so. But in general, the values taken from this curve are indicative of achievable and acceptable performance of an analytical method by different laboratories.

Independent evidence also supports this general precision curve. Quality-control studies of pesticide residue determinations in fat and blood by EPA contractors, summarized in Figure 2 (5), show that the between-laboratory CVs improved with analytical experience, but only to a minimum value approximating the 16% found in the collaborative studies for pesticide residues in food. Similarly, the quality control monitoring of laboratories determining aflatoxin in peanuts for certification purposes (4) gives a value that corresponds to the 32% CV on the general curve for aflatoxin at the 10 ppb concentration level.

Figure 2. The performance of laboratories analyzing EPA's quality-control samples for pesticide residues in fat and blood over a 13-year period (4). Fat ●, blood ○. [Plot not reproduced; recoverable axis labels: Coefficient of Variation, Year.]

Other evidence suggests that this curve may represent a floor for the precision of analytical methods in the interlaboratory environment. In fact, it appears that all methods more or less follow such a curve up to a point where the precision begins to deteriorate at an even faster rate. This phenomenon is shown by methods in the macro scale such as fat in meat (gravimetric) (6) and methyl esters of fatty acids (gas chromatographic) (7), as well as methods in the micro scale such as pesticide residues (gas chromatographic) (8, 9), and trace elements (predominantly atomic absorption) (10). These examples are given in Figures 3-6, with the general precision curve labeled "AOAC" drawn in for reference in Figures 5 and 6. When the specific method curve deviates markedly from the general precision curve, we have obviously gone beyond the limit of reliable measurement for that method.

Figure 3. The interlaboratory coefficient of variation and standard deviation (absolute) of the gravimetric ether extraction method for the determination of fat in meat as a function of the concentration (%) of fat (6). [Plot not reproduced; recoverable axis label: Sample Fat Content (%).]

Figure 4. The interlaboratory coefficient of variation (center curve) and the 95% confidence limits (two outer curves) for the gas chromatographic determination of methyl esters of fatty acids (7). [Plot not reproduced; horizontal axis: Concentration (%), from 100 down to 0.1; vertical scale from 10 to 50.]

There is one other useful piece of information that has been extracted from the data used to construct the general precision curve. The precision component due essentially to analysts (within-laboratory error or repeatability) in AOAC studies is usually one-half to two-thirds of the total variability (the combined effect of the within- and between-laboratory variability), as given in Figure 1. Ratios of repeatability to reproducibility considerably less than 0.5 indicate a very personal method: Analysts can check themselves very well but they cannot check other analysts in other laboratories. This situation suggests that the directions require reworking or that the reference standards may differ from laboratory to laboratory. However, this low ratio is also typical of methods requiring considerable personal skill, such as counting filth elements or mold. A very high ratio can indicate that individual analyst replications are so poor that they swamp out the between-laboratory component.

Outliers

Another index that may prove useful for evaluating methods is the percentage of outliers reported in a collaborative study. Every analyst has
implicit faith that if a method is followed exactly, the correct result will automatically be produced. However, our review of several hundred collaborative studies in which the samples were examined as true unknowns reveals that often 5 to 15% of the reported values are statistical outliers, values that are far outside the region where most of the other values reside. Outliers are produced by experienced chemists as well as by novices, at macro as well as at trace levels. Schuller et al. (11), in their review of aflatoxin methods, noted that they had to tolerate a 10% outlier rate in recommending methods for international referee status. In the analysis of moon rocks from the Lunar Analysis Program of the U.S. National Aeronautics and Space Administration, Morrison reported (12) that almost 7% of the values had to be discarded as outliers.

Figure 5. The interlaboratory coefficients of variation for the determination of pesticide residues in butterfat (8) and in wildlife (9) by gas chromatographic methods. Wildlife ○, butterfat ●. [Plot not reproduced; horizontal axis: Concentration (ppm), from 100 down to 0.01.]

Figure 6. The interlaboratory coefficients of variation of trace elements in blood by various methods as calculated from the data reviewed by Versieck and Cornelis (10). [Plot not reproduced; horizontal axis: Concentration (ppb), from 1000 down to 1.0.]

Outliers produced by a single analyst or within a laboratory are usually inconspicuous, since they are either unrecognized from analysis of single samples where there is nothing to compare the result with, or they are eliminated by repetition. In an interlaboratory situation with blind samples, where there is no opportunity to censor the data, outliers are more obvious. In current AOAC collaborative studies, outliers are usually eliminated by the techniques suggested by Youden (2): a ranking test to remove consistently high or low laboratories, followed by the elimination of outlying individual values by a Dixon test involving the deviations of extreme values.

We have only recently realized the importance of outliers in the evaluation of methods of analysis for approval by the AOAC. Outliers are a fact of laboratory life and allowance must be made for them. By definition, they lie at the extreme points of the statistical frequency distributions of a series of analytical values. Therefore, they have a large influence on the magnitude of the indices used to measure the performance of methods.

Figures 7 and 8, using the data from the collaborative study of aflatoxin in green cacao beans (13), illustrate this effect. The repeatability (within-laboratory variability) changes from an unacceptable CV of 50% for the 20 values from 10 laboratories to a marginally acceptable 36% by the omission of one value classified as an outlier by the Dixon test. In this case, as in many others, there is no question about the classification as an outlier since 18 ppb of aflatoxin had been added to each sample. In this study, the mean changed little by the elimination of the outlier: from 16.0 to 14.6 ppb, or expressed in terms of recovery, from 89 to 81%.

This particular example shows only a 5% outlier rate. We have seen methods approved by the AOAC with an outlier rate as large as 50%. There is only one legitimate excuse for elimination of laboratories without a statistical test: intentional or unintentional failure to follow the method. An intentional failure occurs when the specified equipment or reagent is unavailable and failure to substitute will mean dropping out of the study. But
substitution on the basis of "it cannot possibly affect the results" is inexcusable. Collaborative studies are very expensive in terms of time and manpower. Jeopardizing their success with untested changes undermines the entire collaborative study.

Figure 7. Original data from the interlaboratory study of the determination of aflatoxin in cacao beans (13). The two values from each laboratory are plotted horizontally. The circled value is an outlier by the Dixon test. [Plot not reproduced; horizontal axis: Total Aflatoxin (ppb), from 0 to 40; vertical scale from 1 to 11; the added level is marked "Added".]

Figure 8. The data from Figure 7 plotted as a normal frequency distribution with the outlier included (20 points, broken line) and the outlier excluded (18 points, solid line). [Plot not reproduced; horizontal axis: Concentration (ppb), from 0 to 40; vertical scale from 0.01 to 0.08; curves labeled CV = 36% and CV = 50%.]

Although many AOAC studies show no outliers, a 5 to 15% outlier rate, particularly at the ppm and ppb levels, is not at all unusual. If but one of five or six laboratories required in a minimum statistical pattern turns out to be an outlier by the Youden ranking test, we have reached a 20% rejection point. It appears that we may have to tolerate a 20% outlier rate, because that is the penalty to be paid if we use a minimum number of participating laboratories. This is one of the reasons why having at least 10 laboratories in a study will improve the chances of acquiring adequate data for statistical evaluation.

There is one important statistical problem with outliers: What outlier test should be used? This is a complex statistical problem whose solution depends on the true distribution of values. Chemists seem to have little difficulty in applying intuition and experience to this problem, but many statisticians are appalled at this approach. We hope to apply a number of outlier tests to a number of AOAC collaborative studies to determine if any of the several dozen procedures described in the statistical literature is best suited for application to interlaboratory work.

False Positives and False Negatives

Another potentially useful suitability index for evaluation of methods may be the percentage of false positives and false negatives. False positives (excessively high blanks) may appear when working at any concentration level, but the appearance of both false positives and false negatives is characteristic of trace analysis. These values are not necessarily outliers, but they can be. We have noted in our examination of the available aflatoxin studies that the percentage of false negatives increases much like our CV/concentration curve as the concentration approaches zero. This phenomenon may be more useful for delineating a limit of reliable measurement, i.e., the concentration at which the proportion of false negatives is more than 20% (or some other number), than for evaluating the performance of methods in general.

Bias or Systematic Error

Up to now I have paid little attention to the matter of bias or systematic error because in most cases it takes care of itself. Very few methods are rejected because of low or high recoveries. In the case of macro methods, recovery is frequently very close to theoretical, because these methods are usually based upon stoichiometry or basic physical principles of extraction or separations, and the amount of analyte available for measurement is not limited. Furthermore, many methods in food chemistry are empirical: they are based upon the faith that other participants will adhere to the specified directions to produce equivalent results. Empirical methods by definition have no bias. In trace analysis, the precision characteristic usually takes care of recovery, because the random error is often as large or larger than the systematic error. If the recovery is low but repeatable, as in isotope dilution methods, any recovery is acceptable since correction can be made back to the 100% level. If it is variable, even though within acceptable limits, the correction factor procedure will not do. The method must then be accompanied by sufficient recovery data to indicate the boundaries of performance. In the proposed "SOM" document of the FDA, recovery limits were given as more than 80% for concentrations of 0.1 ppm and above, and more than 60% for lower concentrations (3). Naturally, we would prefer higher recoveries. But these figures do appear to be reasonable in light of actual recoveries under ordinary, and not collaborative, conditions.

Summary

The primary objective of interlaboratory studies is to determine if we have achieved interchangeability of test results among laboratories. But interchangeability is a function of the purpose for which the results will be used: to survey a field; to monitor trends; or to determine compliance
with a specification. Each of these purposes places different emphasis upon the characteristics or attributes of methods: their reliability, applicability, and practicability. For regulatory purposes reliability is paramount and the between-laboratory precision is the critical component. In general, this precision can be represented by the following equation:

$$\mathrm{CV}\,(\%) = 2^{(1 - 0.5 \log_{10} C)}$$

where C is the concentration expressed as powers of 10 (e.g., 1 ppm = $10^{-6}$). The coefficient of variation doubles for each decrease of concentration of two orders of magnitude. The between-laboratory coefficient of variation at 1 ppm is 16% ($2^4$). The within-laboratory CV should ordinarily be one-half to two-thirds the between-laboratory CV. Other potential evaluation criteria include an outlier rate of 20% or less and an acceptable level of false positive values and false negative values. Acceptable specificity and limit of reliable measurement may also have to be based on the level of false positives and false negatives. Recovery values ordinarily take care of themselves at macro levels, but at trace levels, 60% at the ppb level and 80% at the 0.1 ppm level may be the lowest acceptable recoveries.
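As a rough illustration of the criteria collected in the Summary, the following Python sketch evaluates the general precision curve, the expected range of the within-laboratory CV, and the recovery limits quoted from the proposed FDA document (3). It assumes that concentration is supplied as a mass fraction (1 ppm = 1e-6) and that the logarithm is to base 10, which is what the 16% value at 1 ppm implies. The function names are hypothetical and chosen only for this example; they are not part of any AOAC or FDA procedure.

```python
import math


def between_lab_cv_percent(concentration: float) -> float:
    """Between-laboratory CV (%) from the general precision curve.

    Implements CV(%) = 2 ** (1 - 0.5 * log10(C)), where C is the
    concentration as a mass fraction (assumed units, e.g. 1 ppm = 1e-6).
    """
    if concentration <= 0:
        raise ValueError("concentration must be a positive mass fraction")
    return 2.0 ** (1.0 - 0.5 * math.log10(concentration))


def expected_within_lab_cv_range(concentration: float) -> tuple[float, float]:
    """Range (low, high) of the within-laboratory CV (%): one-half to
    two-thirds of the between-laboratory CV at the same concentration."""
    cv = between_lab_cv_percent(concentration)
    return cv / 2.0, cv * 2.0 / 3.0


def minimum_recovery_percent(concentration: float) -> float:
    """Lowest acceptable recovery (%) as quoted in the text: 80% at
    0.1 ppm (1e-7) and above, 60% at lower concentrations."""
    return 80.0 if concentration >= 1e-7 else 60.0


if __name__ == "__main__":
    # Illustrative levels only: 1%, 1 ppm, 10 ppb, 1 ppb as mass fractions.
    for c in (1e-2, 1e-6, 1e-8, 1e-9):
        low, high = expected_within_lab_cv_range(c)
        print(
            f"C = {c:g}: between-lab CV = {between_lab_cv_percent(c):.1f}%, "
            f"within-lab CV about {low:.1f}-{high:.1f}%, "
            f"minimum recovery = {minimum_recovery_percent(c):.0f}%"
        )
```

The sketch reproduces the two reference points given in the text: a CV of 16% at 1 ppm and 32% at 10 ppb. The outlier and false-negative criteria are not modeled, since they depend on the data of a particular collaborative study.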

References

(1) Mandel, J. Quality Progress August 1981, 34-36.
(2) Youden, W. J.; Steiner, E. H. Statistical Manual of the AOAC (1975). Association of Official Analytical Chemists: Washington, D.C.
(3) Fed. Regist. March 20, 1979, 44 (55), 17070-114.
(4) Horwitz, W.; Kamps, L. R.; Boyer, K. W. J. Assoc. Off. Anal. Chem. 1980, 63, 1344-54.
(5) Watts, Randall R. "Proficiency Testing and Other Aspects of a Comprehensive Quality Assurance Program." In "Optimizing Chemical Laboratory Performance through the Application of Quality Assurance Principles"; Garfield, Frederick M. et al., Eds.; Association of Official Analytical Chemists: Arlington, Va., 1980, pp 87-115.
(6) Pettinati, Julio D.; Swift, Clifton E. J. Assoc. Off. Anal. Chem. 1977, 60, 600-608.
(7) Firestone, D.; Horwitz, W. J. Assoc. Off. Anal. Chem. 1979, 62, 709-21.
(8) Snelson, J. T. "Analysis of Organochlorine Residues in Butter Fat." Document CX/PR 77/9 (1976); Food and Agriculture Organization: Rome, Italy.
(9) Holden, A. V. "The OECD International Co-operative Studies of Organochlorine Residues in Wildlife." In "Environmental Quality and Safety"; Coulston, F.; Korte, F., Eds.; George Thieme Publishers: Stuttgart; Supplement Vol. III, pp 40-46.
(10) Versieck, Jacques; Cornelis, Rita. Anal. Chim. Acta 1980, 118, 217-54.
(11) Schuller, P. L.; Horwitz, W.; Stoloff, L. J. Assoc. Off. Anal. Chem. 1980, 59, 1315-43.
(12) Morrison, G. H. Anal. Chem. 1971, 43 (7), 22-31A.
(13) Scott, P. M.; Pryzybylski, W. J. Assoc. Off. Anal. Chem. 1971, 54, 540-44.