
Statistical methods for comparing dental diagnostic procedures

Catherine S. Berkey¹, Chester W. Douglass¹, Richard W. Valachovic¹, Howard H. Chauncey², and Barbara J. McNeil³

¹Harvard School of Dental Medicine, ²Veterans Administration Outpatient Clinic and Harvard School of Dental Medicine, ³Harvard Medical School and Brigham and Women's Hospital, Boston, Massachusetts, USA

Berkey CS, Douglass CW, Valachovic RW, Chauncey HH, McNeil BJ: Statistical methods for comparing dental diagnostic procedures. Community Dent Oral Epidemiol 1990; 18: 169-76.

Abstract - In dental diagnosis, there are typically two or more clinical diagnostic procedures which may be used either independently or jointly to reach a conclusion regarding the presence of a particular disease in a patient. To determine which of these diagnostic procedures are more accurate, statistical methods may be applied to research data in which the true health status as well as the diagnosis provided by each clinical procedure are available on each observation. Results arising from this type of analysis can be of great interest to clinicians when the diagnostic procedures themselves are costly, painful, or even potentially harmful to the patient. Considered here is the special situation encountered in dental research in which each patient can have multiple concurrent cases of a certain disease such as caries, which further complicates the statistical evaluation of diagnostic procedures. This report describes several statistical approaches for comparing the efficacy of diagnostic tests and illustrates their application on data from a study of diagnostic radiographs for dental caries.

Key words: clinical diagnostic procedures; sensitivity; specificity; McNemar's test; logistic regression; repeated measures logistic regression; regressive logistic models

Dr. Catherine S. Berkey, HSDM-DCA, Harvard University, 188 Longwood Avenue, Boston, MA 02115, USA

Accepted for publication 5 October 1989

Various statistical methods have been applied for the evaluation of clinical diagnostic procedures (1-7). Some of these focus on evaluating a single type of diagnostic test while others compare the relative efficacy of two or more different diagnostic procedures. One method, known as receiver operating characteristic (ROC) curve analysis (1), for comparing diagnostic procedures against the true disease state is appropriate when there is a series of decision points that can be used for choosing between the diagnoses of health and disease. The approach of GREENHOUSE & MANTEL (4) is also appropriate when a continuous range of diagnostic test scores is produced by one or more diagnostic tests. However, for some diseases and their diagnostic tests there is a single threshold separating health and disease so that it is not feasible to apply either of the latter two approaches. Furthermore, current statistical methods have generally been applied to situations in which each patient can have only a single case of the disease; i.e., is this patient diseased and will a certain clinical test provide the correct diagnosis? However, in situations such as dental caries diagnosis, the above question can be repeated many times for each patient, i.e., for every tooth or for each surface of every tooth. The statistical evaluation of the clinical diagnostic procedures is then complicated by the repeated measures nature of the data.

This paper discusses some statistical approaches for comparing two or more diagnostic tests of disease versus health when there is a single disease threshold and when an individual may have concurrent cases of disease. The statistical approaches include McNemar's (8) analysis of diagnostic test sensitivity and specificity, and logistic regression analysis (the ordinary model and two repeated measures models). The statistical methods themselves are not new, but their application to this type of problem is not immediately obvious.

The McNemar analysis might be favored by clinicians familiar with diagnostic test sensitivities and specificities. For certain diseases, the sensitivity of a test may be of great importance while the specificity has less serious clinical implications. In these instances, the McNemar analysis of sensitivity can provide a suitable comparison of clinical diagnostic procedures. Analysis by the logistic model may be favored by some statisticians for its non-reliance upon normal distribution assumptions and because it emphasizes the fundamental structure of medical diagnostic data (9). In addition to the diagnostic variables, both continuous and categorical covariates can be included in the logistic analysis, and versions of the model are available which permit non-independent observations.

Empirical comparisons of these statistical methods are performed on a subset of data from a study of three types of diagnostic radiographs for dental caries in which each patient has up to 32 separate diagnoses (one for each tooth) from each clinical test. Here we assume that there is one clinically relevant decision point (carious lesion into the dentin) between disease and health. Similar statistical analyses can be used to investigate the efficient use of dental radiographs in detecting periodontal disease as well as for evaluating diagnostic tools in other
170 BERKEY ET AL.

medical settings in which there are multiple testing sites per patient.

Methods

Source of data

A sample was selected from a larger study for the presentation of these statistical methods. The Veterans Administration Dental Longitudinal Study (DLS) (10) is being conducted at the Veterans Administration Outpatient Clinic in Boston as part of an ongoing investigation by the Normative Aging Study (NAS) (11). Healthy adult white males, ranging in age from 27 to 64 yr, entered the study between 1968 and 1974. Because oral health status was not a criterion for the selection of participants into the Normative Aging Study, there was substantial variability in their oral conditions, but each participant was subsequently required to have at least 10 teeth on each side of the mouth in order to enter the Harvard dental radiograph study. The data reported here consist of interpretations of the baseline radiographic surveys and oral examinations of 283 adult males who were the first of 602 men to enter the radiographic component of the DLS. Three different types of dental radiographs were taken on each patient, as well as an independent oral examination, allowing the comparison of findings of disease from each of these four diagnostic procedures.

For each participant, the status of each of his 32 teeth was recorded as present, impacted or missing. Also recorded for each tooth were arch (maxillary/mandibular), side of mouth (right/left), and tooth type (eight types in all). Information on the caries status of each tooth that was present was obtained as follows:

1) independent clinical (oral) examination;
2) interpretation of periapical radiographs alone;
3) interpretation of panoramic radiographs alone; and
4) interpretation of posterior bitewing radiographs alone.

These are the four diagnostic procedures which are being compared. A tooth is considered to be diseased if it has a carious lesion into the dentin. For this analysis, small lesions into the enamel were coded as healthy.

In addition to the above four competing diagnoses, the panoramic, bitewing and periapical radiographs were also interpreted simultaneously by a study dentist, at least 2 weeks removed from the interpretation of any of the three radiographs individually so as to minimize recall of any earlier diagnosis on the same patient. The order in which the dentist saw a patient's radiographs, either individually or simultaneously, was randomized. The simultaneous interpretation of the three radiographs was the basis of the consensus radiographic reference standard, which provides the best diagnosis of dental disease that can be detected radiographically, short of extracting and performing a biopsy on the tooth in a laboratory. Therefore, this empirical reference standard is assumed to provide the true radiographic diagnosis and it is against this standard that the four independent diagnostic methods were evaluated. Note that there is an inherent bias against the oral examination because we use a radiographic standard. However, it is clinically relevant to evaluate the ability of the oral exam to detect those lesions which are diagnosable by radiographs.

Examiner reliability in the oral examination (12) and in the interpretation of dental radiographs (13) is reported elsewhere. Findings from the entire dataset of 602 participants, for both caries and periodontal diseases, are reported elsewhere (14) using more familiar methods of analysis.

McNemar analysis of sensitivity and specificity

The first method described is an application of McNemar's test (8) to clinical diagnostic data in such a way as to evaluate diagnostic sensitivity and specificity separately. The sensitivity of a clinical test is the proportion of diseased individuals who are correctly identified while the specificity is the proportion of healthy individuals who are correctly diagnosed as being free of disease (1). For some diseases it may be important for clinicians to know whether the sensitivities of certain diagnostic tests are comparable or dissimilar, while for other diseases the specificities may be of greater interest to them. Alternative measures with which clinicians are familiar, such as predictive value negative and predictive value positive (1), could be analyzed similarly.

The simple, nonparametric analysis of sensitivity begins by setting up a 2 x 2 table of the diagnoses (diseased or healthy) from the first diagnostic technique versus the second diagnostic technique, on only those teeth that are diseased (according to the reference standard):

Diseased units ($n = n_d$)

                                Diagnostic technique B
                                diseased    healthy
Diagnostic       diseased       d11         d12
technique A      healthy        d21         d22

A separate table is created and analyzed for each of the eight tooth types. Whether the two diagnostic techniques have significantly different sensitivities on a tooth type is evaluated by doing the McNemar $\chi^2$ test for paired data (8) on this 2 x 2 table. The McNemar test with continuity correction (15) is used for the analysis of sensitivity due to the small numbers of diseased units:

$$\chi^2 = \frac{\left(|d_{21} - d_{12}| - 1\right)^2}{d_{12} + d_{21}}$$

If this computed chi-square statistic with one degree of freedom is larger than 3.84 (the 5% critical value), then the two diagnostic techniques have significantly different sensitivities.

To test for differences in specificity between two diagnostic techniques, the 2 x 2 table would contain their diagnoses on only those teeth that are healthy on the reference standard:

Healthy units ($n = n_h$)

                                Diagnostic technique B
                                diseased    healthy
Diagnostic       diseased       h11         h12
technique A      healthy        h21         h22

Since the number of healthy units $n_h$ is relatively large, the correction for continuity is dropped ($\chi^2 = (h_{21} - h_{12})^2/(h_{12} + h_{21})$) when testing for significant differences in specificity between A and B on a single tooth type.

The advantages of this approach are that the paired nature of these diagnostic data is taken into account and no special computer software is required, as the $\chi^2$ can be computed by hand from the 2 x 2 table. However, not all of the observations which go into the creation of each
Comparing diagnostic procedures 171

2 x 2 table are independent. Because all four teeth of the same type are combined into one analysis, one patient may, for example, have four diseased canines included in the table for analyzing sensitivity on canines, although due to the rarity of the disease (caries prevalence is about 5%) it is highly unlikely that more than one diseased canine per patient would be present at the exam. However, the healthy teeth in the 2 x 2 table for the analysis of specificity are much more likely to come from the same patient. Alternatively, one might create separate tables for each of 32 teeth and tabulate the results of nearly 400 McNemar tests, using the Bonferroni correction (16) to take multiple comparisons into account in assessing significance, but many of the tables of diseased teeth would have too few numbers due to the rarity of the disease. Furthermore, the ability of any radiographic procedure to diagnose caries on a single tooth type should not vary by arch or side of mouth, so there are technical reasons for pooling the four teeth of identical morphology into a single analysis. Other disadvantages of using McNemar's test are that continuous covariates of interest cannot easily be included in such an analysis, and only two diagnostic techniques are compared at a time. However, overall significance tests, that all (three or more) diagnostic techniques are similar in sensitivity, can be performed by SAS PROC FREQ ((26) see p. 430) and the Cochran-Mantel-Haenszel test (CMH option) for PATIENT x RADIOGRAPHIC METHOD x OUTCOME, while specifying NOPRINT (17). The row mean scores statistic provides the P-value. This can be done separately by tooth type, and multiple teeth of the same type can be taken into account by letting OUTCOME represent the number (0, 1, 2, 3, or 4) of positive teeth for that radiographic method, among those teeth that are positive on the consensus standard. Overall significance tests on diagnostic specificities can be accomplished similarly, except that OUTCOME then represents the number of diagnostic negative teeth among those that are negative on the consensus.

Ordinary logistic model

This statistical method does not analyze sensitivity and specificity independently; rather it deals with them simultaneously and indirectly. This approach provides a model which predicts the log odds of disease for a tooth (or surface) having a specified set of diagnostic test results. The linear logistic regression model (15, 18) for a binary dependent variable is

$$\log\left[\frac{P_i}{1 - P_i}\right] = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_k X_{ik}$$

where $P_i = \Pr(y_i = 1)$ is the probability that the ith unit or observation has the disease. The units, $i = 1, 2, \ldots, n$, are assumed independent in this ordinary logistic model.

For these dental data, the ith observation in this model corresponds to a single tooth having $y_i = 1$ if it has a carious lesion into the dentin on the radiographic standard and $y_i = 0$ if healthy. Each model will encompass teeth of one of the eight types. Because the unit of analysis is the tooth, the observations are incorrectly assumed by this model to be independent within individuals as well as between individuals. The binary predictor variables $X_{i1}, \ldots, X_{ik}$ include the individual diagnoses (1 = disease, 0 = healthy) from the oral examination and from each of the three individual radiographs, and other covariates (arch and side of mouth) of interest which can be continuous or categorical. Although the ability of the diagnostic methods to detect disease should not differ by arch and side, disease prevalence may differ by arch while differences associated with side of mouth might be attributable to oral care by a predominantly right-handed population. The estimates of the coefficients $\beta$ and their significance levels indicate how well each diagnostic procedure agrees with the reference standard, after adjusting for the effects of arch, side of mouth, and the remaining diagnostic procedures. Larger positive coefficients, relative to those of the other diagnostic procedures, imply better agreement with the referent diagnosis. Each coefficient $\beta$ can be converted to an odds ratio by $OR = e^{\beta}$, interpretable as the odds of disease if that variable, say the bitewing radiograph, provides the diagnosis of disease relative to the odds of disease if the bitewing says no disease, controlling for the diagnoses by the other methods.

The ordinary logistic model can be estimated using SAS LOGIST, SAS CATMOD, or BMDP-LR (25, 26). We used the logistic regression subroutine LOGIST, at the Dana-Farber Cancer Institute Computing Center, which finds maximum likelihood estimates by the Newton-Raphson method.

Repeated measures logistic model

The ordinary logistic model described above is not theoretically appropriate when more than one observation (such as teeth or surfaces) per individual is involved in the estimation of a single model. A better approach is to perform a repeated measures logistic analysis on the J correlated observations of the individuals, as described by STIRATELLI et al. (19), in which observations of the ith individual are assumed to be statistically dependent. Unfortunately, easily implemented methods of parameter estimation are not yet available. A similar repeated measures model is

$$\log\left[\frac{P_{ij}}{1 - P_{ij}}\right] = X_{ij}\beta + b_{ij} \qquad \text{or} \qquad \lambda_i = X_i\beta + b_i$$

where $P_{ij} = P(y_{ij} = 1 \mid X_{ij})$, $\lambda_i$ is the vector of logits of $P(y_{ij} \mid X_{ij})$, and the $b_{ij}$ of the ith patient are arbitrarily correlated. This model was estimated using a special SAS MATRIX program provided to us by Professor Laird at the Harvard School of Public Health. The program fits weighted least squares linear logistic models to the marginal logits using an arbitrary covariance matrix. Similar software available from other authors (20, 21) may provide mechanisms for avoiding computational difficulties and for dealing with outcomes having low prevalence. PROC CATMOD (26) might also be used to fit this model. In dental data, the i subscript could indicate the patient while the j could indicate the tooth or surface within the ith patient's mouth.

Unfortunately, due to technical computing problems described in the Results, our software was unable to routinely fit a model having the six predictor variables as used elsewhere in this paper. We were, however, able to fit a model with five predictor variables, for example, the model

$$\log\left[\frac{P_{ij}}{1 - P_{ij}}\right] = \beta_0 + \beta_1\,\text{arch}_{ij} + \beta_2\,\text{periapical}_{ij} + \beta_3\,\text{panoramic}_{ij} + \beta_4\,\text{oral}_{ij} + b_{ij}$$

where teeth of one type (canines) on only

one side of the mouth are included, for cern is that a natural ordering of the Since the purpose is to compare the
the ith patient and jth arch. On the other repeated dependent variables (reference efficacy of the clinical diagnostic proce-
hand, correlations of tooth disease within diagnoses on four canines) is assumed. dures, of particular concern is that 1)
individuals in this particular subset of Because the suitability of this assumption statistical tests comparing two or more
data were quite small. The median of in this setting is not obvious, we refer the diagnostic procedures reach the appro-
these correlations, 9's, between reader to BONNEY (22) for further details priate conclusion, and 2) the magnitude
quadrants of the mouth was only .09. on the model. We are able to apply this of the estimated coefficients (and their
Correlation among the eight teeth within model by arbitrarily designating orders significance levels) of the clinical proce-
a quadrant is not relevant here since the to the four teeth of one type. dures are in the correct order relative
four teeth from eaeh patient in any model to each other. The comparisons between
were of the same tooth type (i.e., central statistical methods below are intended to
incisor) but from the four different quad- Illustration of statistical methods provide empirical evidence as to whether
rants of the mouth. We applied these statistical methods to these methods provide similar results
data from the oral examination and ra- based upon these two criteria, assuming
diographic surveys of 283 asymptomatic correlations within individuals are not
Regressive logistic model substantial.
dental patients, McNemar and ordinary
Another approach which might be valu- logistic analyses are performed on each
able is the use of regressive logistic mod- of eight tooth types separately because
els described by BONNEY (22). Their ad- every tooth type is not visible on all types
vantage is that existing computer pro- of radiographs (four tooth types are not Results
grams for the standard logistic model for visible on bitewings) and also because we Repeated measures and regressive logistic
independent observations can be used versus ordinary logistic modei
expect the relative abilities of the four
(SAS PROC LOGIST, PROC CAT- diagnostic procedures to vary substan- Because the smallest nuinber of missing
MOD, BMDPLR). The theoretical con- tially between, but not within, tooth type. teeth were canines, and since the repeated
measures logistic software did not allow
Tahle 1. Comparison of ordinary logistic, repeated measures logistic and regressive logistic
missing data, this analysis was performed
models on 272 patients each having four canines. Each model predicts log-odds of caries only on canines. We fit the ordinary, the
according to the reference standard (arch: 1 = maxiffary, O = mandihufar; periapicaf, panoramic repeated measures and regressive logistic
and oral exam: f = caries, O = heafthy). Shown are eoeffieients and their standard errors models to all four canines from those
Ordinary fogistic modef: All four canines from a patient are assumed to be independent units
272 patients having complete data. This
Separately by side of mouth effort was hampered due to computing
limitations. This particular problem (272
(/!=1088)' Right {n = 544) Left' ()! = 544)
patients, three diagnostic variables and
intercept -4.929 (0.463) -6.129 (1.042) -4.259 (0.509) two repeated measures variables, arch
arch 1.009(0.526) 2.340 (f.063) 0.048 (0.703) and side) required the manipulation of
periapicaf 3.177 (0.565) 3.754 (0.752) 2.282 (f.l76) matrices larger than the version of SAS
panoramic - f . 9 2 3 (f.33f) -2.3f8 (f.662) -6.001 (f6.792)
oral exam f.83O (0.519) 2.344 (0.662) 0.799 (f.l22) MATRIX at our computing facility
could accommodate. The variable side
Repeated measures logistic model: canines on opposite sides of mouth are assumed to be (of mouth) was therefore eliminated from
independent units, but canines on the same side (from maxillary and mandibular arch) are
treated as repeated measures from the same patient'
this single analysis to reduce the dimen-
Separately by side of mouth sions of the matrices, with the unfortu-
nate result that teeth from the opposite
(n = 544) Right (« = 272) Left (n = 272)
side of the mouth were assumed indepen-
intercept -4.810 (0.462) -4.560(1.057) -4,380 (0.576) dent although teeth from the maxillary
arch 0.772 (0.462) 0.545 (1.057) 0.3 f2 (0.576) and mandibular arch of the same side
periapical 3.084 (O.f9f) 3.686 (0.384) 0.67f (7.716) were recognized as pairs from the same
panoramic -f.77O (0.454) -2.009 (0.807) -2.923 (7.7 f 6)
oral exam 2.025 (0.061) 2,335 (0,098) f .430 (0.076)
patient. Models were subsequently also
fit separately to canines on the right and
Regressive fogistic modef: four eanines are recognized as units from same patient. Re-ordering on the left sides of the mouth. Only for
the canines will result in different estimates. See BONNEY (1987) model 6.1 (« = 272 patients, 1088 these repeated measures logistic models
canines)
Assumed ordering of canines within mouth: which are restricted to one side of the
mouth are assumptions regarding inde-
LL,LR,UL,UR UR,LR,LL,UL UL,LR,UR,LL pendence of observations satisfied for
periapical 3,347 (0.600) 3.376 (0.622) 3.043 (0.588) these data.
panoramic -2.082 (1.392) -2.254 (1.404) -1.840 (f.386)
oral exam 1.993 (0.543) f.92f (0.53f) t.793 (0.526) The regressive logistic model (BoN-
NEY'S model 6.1 (22)) was fit to all four
'Sampfe size given is number of units assumed to be independent in anafysis. There are 17 canines with the independent variables
carious lesions on the right side of mouth, 9 on the left.
'This model only fitted by SAS LOGIST because the software used elsewhere did not converge
(diagnoses by periapical, panoramic and
to a solution (due to an empty cell in the 2 x 2 table of panoramie diagnosis by disease status). oral exam) added as linear terms. We
'Zero cells in the s x 2'' data matrix n (notation of KOCH et al, (20) were replaced by O.OOf. assumed the order mandibular left, man-
Comparing diagnostic procedures 173

dibular right, maxillary left and maxil- appear in the first column of Table 1, using two alternative orderings appear in
lary right, which was the observed order The ordinary logistic model (upper third the seeond and third column at the bot-
of increasing disease prevalence in these of table) assumes that all 1088 canines tom of Table 1,
canines. Since arch and side of mouth are statistically independent, while the re- Differences between the ordinary lo-
determine the order, they do not appear peated measures model (middle) recog- gistic and repeated measures models for
as explanatory variables in the model. nizes that canines in the maxillary and the right side of the mouth (center col-
Two alternative orderings of the canities mandibular arch on the same side (but umn of Table 1) are also relatively small,
were also tried, to see how much the not opposite sides) are related. The re- aside from the coefficient of arch. Only
coefficients of interest are affected by gressive logistic model (bottom of Table 17 of these 544 canines were diseased.
choosing different orders. 1) assumes that all four canines within a Models for the left side of the mouth
Model coefficients estimated from all patient are correlated, but the order are in less agreement. The software used
1088 canines (4 canines x 272 patients) shown is assumed. Models obtained elsewhere in this paper for ordinary logis-
tic regression (Dana Farber's LOGIST)
would not converge for those 544 left-
sided canines, so an alternative program
was used (SAS LOGIST), The instability
SPECIFICITIES of the estimate for panoramic diagnosis
is apparently due to an etnpty cell, Ort
this side of the mouth, notte of the diag-
noses of disease by the panoramic radio-
graph were disease according to the refer-
ent, explaining the negative coefficient.
In both the ordinary and the repeated
measures model, the eoefficient of pano-
ramic radiograph was very sensitive to
the method used for dealing with the
empty cell. Given that there are only 26
0,89
carious teeth among 1088 teeth, and only
nine on the left side, there is reasonable
0,87 agreement in the coefficients provided by
the models in Table 1, However, in gen-
0,85
Central Laterat Cuspid 1st Bi 2nd Bi 1st Mol 2nd Mol 3rd Mol eral for outcomes with low prevalence,
TOOTH TYPE there will be insufficient information to
estimate parameters well in complex
" Oral Examination —!— Periapical —^^ Panoramic Bitewing models such as these.
Given the technical problems encoun-
SENSITIVITIES tered in implementing the repeated mea-
sures logistic regression software and the
0,7 ordering assumption in the regressive lo-
gistic model. Table 1 indicates that use
0.6 of ordinary logistic regression software
on weakly correlated data might be an
0.5
acceptable alternative. Actually, the stan-
0.4 dard errors of the regressive model were
quite close to those of the ordinary mod-
0.3 el. For data having stronger correlation
structure, these conclusions cannot be as-
0.2 sumed.
0.1

Central Lateral Cuspid 1st Bi 2nd Bi 1st Mol 2nd Mol 3rd Mol Ordinary logistic model versus McNemar
TOOTH TYPE analysis of sensitivity and specificity
' Oral Examination Periapical -*-Panoramic -9-Bitewing Summary sensitivities and specificities
(Fig, 1) show substantial variability be-
p-value: .273 .006 ,008 .001 ,001 ,003 ,001 ,472 tween diagnostic procedures and tooth
types. The dental investigators suspected
Fig, 1, Tooth-specific sensitivities and specificities of fbur clinical procedures for diagnosing
dental caries. Also shown are significance levels for the null hypotheses that all diagnostic a priori that the efficacy of the four pro-
methods are equally sensitive for detecting caries. Numbers of diseased and healthy for each cedures would be affected by the mor-
tooth type appear in Table 2. phology and relative positions of the dif-
174 BERKEY ET AL.

ferent teeth within the tnouth, as eaeh disease than periapicals {P= .097) on first /"=.579; sensitivity P=.O91: specificity
proeedure has a different perspective of premolars. Differences in specificity of P=,149). This tnay be due to significant
the oral cavity. Overall /"-values, for dif- pairs of radiographs shown (lower por- side of mouth effects for ftrst premolars
ferences among the sensitivities of all the tion of Table 2) were not significant ex- (see Table 4) which are ignored by Mc-
diagnostie methods, appear along the eept on second molars. Nemar's test. In general, although isolat-
bottom of Fig. 1 for each tooth type. The corresponding logistic model tests ed differenees ean be noted, no consistent
Significance levels from tests of of hypotheses regarding the similarity of patterns of differenees in the findings of
hypotheses, that speeified pairs of clinical pairs of diagnostic procedures (adjusting Tables 2 and 3 emerge.
procedures are similar in their diagnostie for the effeets of areh, side, and other The relative magnitudes of the coeffi-
abilities on each tooth type, are com- diagnostie procedures) are shown in cients within each tnodel in Table 4 indi-
pared from the McNemar and logistic Table 3, where one /"-value corresponds cate relative importanee of the diagnostie
analyses. Each pair was chosen a priori as to two P-values in Table 2. For example, procedures within each tooth type. For
the two radiographic types which would the logistic model (Table 3) reports that instance, on first premolars the model
perfortn best on the speeifie tooth type. on lateral incisors the periapieals and indieates that bitewings (odds ratio =
Beeause separate MeNemar tests are panoratnics are significantly different e-''™ = 40.4), periapicals and finally pano-
done for sensitivity and specificity, we will in essence be comparing two P-values with a single P-value from the corresponding logistic test of hypothesis.

McNemar chi-square tests on 2 x 2 tables of diseased teeth found that periapicals had significantly higher sensitivities than panoramic radiographs on lateral incisors and canines (P<.05), as shown in the top row of Table 2. Bitewings were marginally more sensitive to

(P=.001) in their diagnostic abilities. McNemar's test reports that they are different (P=.003) in their sensitivities to caries but not in their specificities (P=.176). In some instances (see second molars), the logistic P-value (.18) appears to be a blending of the separate sensitivity (.999) and specificity (.046) P-values, but this does not occur consistently. The most obvious differences between Tables 2 and 3 are for the first premolar (logistic

ramics (in that order) provide the best diagnoses and that the oral examination is not a significant predictor of radiographically detectable caries, controlling for arch, side, and other diagnostic procedures.

Table 2. McNemar's test on sensitivity and specificity of periapical, panoramic, and posterior bitewing radiographs.† Shown are P-values resulting from the indicated hypothesis test on each tooth type. (The test with continuity correction is done on sensitivities.)

                          Type of tooth
                          Central   Lateral             First     Second    First     Second    Third
                          incisor   incisor   Canine    premolar  premolar  molar     molar     molar
Sensitivity
  H0: Per = Pan           0.249     0.003     0.008     0.466
  H0: Per = BW                                          0.097     0.855     0.861     0.999
  No. of carious teeth    15        32        28        59        76        75        68        25
Specificity
  H0: Per = Pan           0.645     0.176     0.113     0.743
  H0: Per = BW                                          0.149     0.999     0.168     0.046
  No. of sound teeth      1084      1067      1093      1004      891       714       893       420

†Incisors, canines and third molars are not visible on bitewing radiographs.

Table 3. Ordinary logistic model analysis of caries for each tooth type among 283 study participants. Arch and side are in each model (coefficients given in Table 4). Shown here are significance levels (P-values) from performing the indicated test of model coefficients. (Per = periapical, Pan = panoramic, BW = bitewing)†

                          Type of tooth
                          Central   Lateral             First     Second    First     Second    Third
                          incisor   incisor   Canine    premolar  premolar  molar     molar     molar
  H0: βPer = βPan         0.180     0.001     0.000     0.252
  H0: βPer = βBW                                        0.579     0.844     0.988     0.180
  Total no. of teeth      1099      1099      1121      1063      967       789       961       445
  No. of carious teeth    15        32        28        59        76        75        68        25

†Incisors, canines, and third molars are not visible on bitewing radiographs.

Discussion

Data such as those frequently encountered in studies of oral health, in which a single individual produces multiple observations, should clearly be analyzed by methods which recognize the lack of independence among observations within an individual. However, methods are still evolving for the analysis of repeated categorical outcomes (27), especially with missing observations (23). Although theoretical models often exist for particular situations, computer software is not available for widespread application when data are incomplete. Where software is available, technical problems frequently arise during implementation. We even had problems with two subroutines for ordinary logistic regression, one of which is widely available (26), due to the low prevalence of our outcome. The repeated measures logistic regression model experienced computational difficulties on complete data in a SAS MATRIX subroutine on our IBM 4341. The arbitrary choice of a different ordering of the four responses (cavities) in the BONNEY model (22) can produce different coefficients and standard errors, with a potentially different interpretation of findings. The approach of KORN & WHITTEMORE (24) would likely also not be suitable for our data, in which low response rates and missing values (teeth) are common. The algorithm for another, more flexible model (19) reportedly converges very slowly, and the costs of implementation are high.
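The McNemar comparisons in Table 2 use only the discordant pairs, meaning teeth on which one radiograph is correct and the other is not. They can therefore be computed without specialized software. Below is a minimal Python sketch that uses only the standard library. It assumes each procedure's per-tooth readings have already been reduced to correct/incorrect flags, either on the diseased teeth (for sensitivity) or on the sound teeth (for specificity). The function names and the twelve example readings are hypothetical illustrations, not the study's data. Following the note to Table 2, the continuity correction is applied for the sensitivity comparison.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(a_correct: Sequence[bool], b_correct: Sequence[bool]) -> Tuple[int, int]:
    """Count discordant pairs: (A correct & B wrong, A wrong & B correct)."""
    if len(a_correct) != len(b_correct):
        raise ValueError("paired diagnoses must have equal length")
    b = sum(1 for x, y in zip(a_correct, b_correct) if x and not y)
    c = sum(1 for x, y in zip(a_correct, b_correct) if y and not x)
    return b, c


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar chi-square (1 df) on the discordant counts; returns (statistic, P-value)."""
    n = b + c
    if n == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # continuity correction, floored at zero
    stat = diff ** 2 / n
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical readings on 12 carious teeth (True = caries correctly detected).
per = [True] * 10 + [False] * 2   # periapical
pan = [False] * 10 + [True] * 2   # panoramic
b, c = discordant_counts(per, pan)
stat, p = mcnemar(b, c, continuity=True)
print(f"b={b} c={c} chi2={stat:.3f} P={p:.4f}")
```

Like the analyses in Table 2, this sketch treats the teeth of one tooth type as independent observations. It does nothing about the within-patient dependence raised in the Discussion.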

Table 4. For each tooth type, estimated coefficients for logistic models predicting carious lesions among 283 participants. The linear combination of the coefficients shown predicts the log odds of a carious tooth for specified arch, side, and diagnoses by individual radiographs and oral exam.

                            Type of tooth
                            Central  Lateral           First    Second   First    Second   Third
Predictor variable          incisor  incisor  Canine   premolar premolar molar    molar    molar

Constant                    -5.736*  -5.586*  -4.621*  -4.980*  -3.986*  -4.255*  -3.656*  -5.274*
Arch:
  Max. = 1, Mand. = 0       1.237    2.001*   0.857    0.454    0.135    0.206    -0.645   1.012
Side:
  Left = 1, Right = 0       0.149    -0.531   -0.538   0.845*   -0.041   0.606    -0.043   0.696
Periapical radiograph:
  caries = 1, healthy = 0   3.267*   3.587*   3.433*   3.331*   2.858*   2.678*   3.327*   4.492*
Panoramic radiograph:
  caries = 1, healthy = 0   1.228    -0.317   -2.168   2.044*   0.685    1.872*   1.849*   3.210*
Bitewing radiograph:†
  caries = 1, healthy = 0   —        —        —        3.698*   2.969*   2.686*   2.510*   —
Oral examination:
  caries = 1, healthy = 0   1.675*   1.544*   1.872*   0.651    0.891*   1.310*   0.305    0.699

Total no. of teeth          1099     1099     1121     1063     967      789      961      445
No. of carious teeth        15       32       28       59       76       75       68       25

*P<0.05; coefficient significantly non-zero. †Incisors, canines, and third molars are not visible on bitewing radiographs.
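The caption of Table 4 notes that each fitted model is linear on the log-odds scale. A predicted probability of caries therefore follows from summing the relevant coefficients and applying the inverse logit. The Python sketch below does this for the first-molar column only. The dictionary keys and function names are hypothetical labels, and the coefficients are the point estimates transcribed from Table 4. The sketch ignores the sampling uncertainty of those estimates and is illustrative only, not a basis for clinical interpretation.

```python
import math
from typing import Mapping

# First-molar column of Table 4 (point estimates; 0/1 coding as in the table).
FIRST_MOLAR: Mapping[str, float] = {
    "constant": -4.255,
    "arch_maxillary": 0.206,     # Max. = 1, Mand. = 0
    "side_left": 0.606,          # Left = 1, Right = 0
    "periapical_caries": 2.678,  # caries = 1, healthy = 0
    "panoramic_caries": 1.872,
    "bitewing_caries": 2.686,
    "oral_exam_caries": 1.310,
}


def log_odds(coefs: Mapping[str, float], **indicators: int) -> float:
    """Constant plus each coefficient times its 0/1 indicator (omitted terms = 0)."""
    unknown = set(indicators) - (set(coefs) - {"constant"})
    if unknown:
        raise KeyError(f"no coefficient for {sorted(unknown)}")
    return coefs["constant"] + sum(coefs[k] * x for k, x in indicators.items())


def probability(eta: float) -> float:
    """Inverse logit: convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Maxillary left first molar; periapical and bitewing read caries, the others healthy.
eta = log_odds(FIRST_MOLAR, arch_maxillary=1, side_left=1,
               periapical_caries=1, panoramic_caries=0,
               bitewing_caries=1, oral_exam_caries=0)
print(f"log odds = {eta:.3f}, P(carious) = {probability(eta):.3f}")
```

For this profile the log odds is -4.255 + 0.206 + 0.606 + 2.678 + 2.686 = 1.921, a predicted probability of roughly 0.87.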

If one were analyzing surface rather than tooth, the problems presented here would be even more severe. Additional technical problems may arise if a high degree of collinearity is present among predictor variables, as would occur if the diagnostic abilities of the radiographs were more similar than they were here. Due to the lack of correlation between quadrants within subjects in this small subsample, the ordinary logistic model could probably be applied with little risk in a study comparing the efficacy of four dental diagnostic procedures. As the purpose of this paper is to present statistical methodologies on a subset of our data, no conclusive clinical interpretation of the results should be drawn.

On the other hand, the McNemar analysis (8) is very simple to implement, and it works well for outcomes with low prevalence. McNemar's test may be the best approach if any particular feature of a diagnostic test (sensitivity, specificity, negative predictive value, or positive predictive value) is of major importance and should therefore be singled out for investigation. The primary disadvantage of the McNemar analyses is that controlling for a continuous confounding variable would be difficult. Due to the low prevalence of the disease we studied, each test of sensitivity is performed on a dataset of mostly independent observations, although each test of specificity is not. A similar overall test for the sensitivity of three or more diagnostic procedures, which deals better with nonindependent observations of a single tooth type, has also been described (17). The multicollinearity problem mentioned for logistic regression is not an issue for these analyses.

The issue of independence between separate analyses has not been discussed. The eight logistic models in Table 4 are not independent analyses. The presence of disease in one type of tooth (e.g., first molar) is associated with the likelihood of disease in an adjacent tooth (e.g., second molar; notice the similarities of their coefficients in Table 4), owing to their similar morphology and common oral environment. The same is true for the McNemar analyses (Table 2). In addition, the sensitivity and specificity analyses on a single tooth type in Table 2 are not independent, since the sensitivity and specificity of a diagnostic test are negatively correlated. Therefore, none of the analyses demonstrated is perfect. It would indeed be preferable to obtain a single model on all 32 teeth combined, but the corresponding increase in the complexity of such a repeated measures analysis could not be dealt with at present. Also, imbalance in the data due to tooth extractions, and the fact that four tooth types were not visible on bitewings, would further complicate estimation of a single model on all 32 teeth simultaneously. We have attempted to find a compromise between two extremes. The ideal solution is a single statistical model that explicitly takes into account the full multivariate nature of the problem (up to 32 observations per patient and three or four diagnostic tests). The simpler alternative is to perform 32 separate analyses, one for each tooth (or one for each surface), taking multiple comparisons into account. However, the further subdivision of the small number of diseased teeth into separate analyses would be questionable even for the full dataset of 602 participants.

The availability of statistical software may, for some investigators, be the determining factor in which analysis is performed. Logistic analysis is not universally available, particularly in the PC environment. We transferred these data between mainframe computers for two reasons. We needed SAS MATRIX for the repeated measures logistic analyses, and SAS's ordinary logistic regression software for one model in which the logistic software used elsewhere did not converge. There is clearly a need for further development of statistical software for the analysis of categorical repeated measures with missing data. We have seen here that dental studies present many challenges to the correct statistical analysis of their data. As pointed out by others (28), the final choice of a method also depends upon non-statistical criteria specific to the practical problem under investigation.

Acknowledgments - This research was supported by Grant Nos. HS-04852 and HS-05708 from the National Center for Health Services Research and the Veterans Administration Medical Research Service. We acknowledge the contributions of Drs. ANILA WIJESINHA, JANE WEINTRAUB and ALEXIA ANTCZAK, Ms. KATE GILLOOLY, Mr. JONATHAN DIRECTOR, and Ms. JOY STEWART to this project. Finally, we are very grateful to the referees for their many valuable suggestions.

References

1. DOUGLASS CW, McNEIL BJ. Clinical decision analysis methods applied to diagnostic tests in dentistry. J Dent Educ 1983; 47: 708-12.
2. WEINSTEIN MC, FINEBERG HV. Clinical Decision Analysis. Philadelphia: W. B. Saunders, 1980.
3. GRINER PF, MAJEWSKI RJ, MUSHLIN AI, GREENLAND P. Selection and interpretation of diagnostic tests and procedures: principles and applications. Ann Int Med 1981; 94: 553-92.
4. GREENHOUSE SW, MANTEL N. The evaluation of diagnostic tests. Biometrics 1950; 6: 399-412.
5. BUCK AA, GART JJ. Comparison of a screening test and a reference test in epidemiologic studies. I. Am J Epidemiol 1966; 83: 586-92.
6. HUI SL, WALTER SD. Estimating the error rates of diagnostic tests. Biometrics 1980; 36: 167-71.
7. NAGELKERKE N, FIDLER V, BUWALDA M. Instrumental variables in the evaluation of diagnostic test procedures when the true disease state is unknown. Stat Med 1988; 7: 738-44.
8. McNEMAR Q. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika 1947; 12: 153-7.
9. DAWID AP. Properties of diagnostic data distributions. Biometrics 1976; 32: 647-58.
10. KAPUR KK, GLASS RG, LOFTUS ER, ALMAN JE, FELLER RP. The Veterans Administration longitudinal study of oral health and disease. Aging Hum Develop 1972; 3: 125-37.
11. BELL B, ROSE CL, DAMON A. The Veterans Administration longitudinal study of healthy aging. J Gerontol 1966; 6: 179-84.
12. FELDMAN RS, DOUGLASS CW, LOFTUS ER, KAPUR KK, CHAUNCEY HH. Interexaminer agreement in the measurement of periodontal disease. J Periodont Res 1982; 17: 80-9.
13. VALACHOVIC RW, DOUGLASS CW, BERKEY CS, CHAUNCEY HH, McNEIL BJ. Examiner reliability in the interpretation of dental radiographs. J Dent Res 1986; 65: 432-6.
14. DOUGLASS CW, VALACHOVIC RW, WIJESINHA A, CHAUNCEY HH, KAPUR KK, McNEIL BJ. Clinical efficacy of dental radiography in the detection of dental caries and periodontal diseases. Oral Surg 1986; 62: 330-9.
15. BISHOP YMM, FIENBERG SE, HOLLAND PW. Discrete Multivariate Analysis. Cambridge, MA: MIT Press, 1975; 258, 357.
16. KLEINBAUM DG, KUPPER LL, MULLER KE. Applied Regression Analysis and Other Multivariable Methods. Boston: PWS-KENT, 1988; 32.
17. KURITZ SJ, LANDIS JR, KOCH GG. A general overview of Mantel-Haenszel methods: applications and recent developments. Ann Rev Public Health 1988; 9: 123-60.
18. COX DR. The Analysis of Binary Data. London: Methuen, 1970; 18.
19. STIRATELLI R, LAIRD N, WARE J. Random-effects models for serial observations with binary response. Biometrics 1984; 40: 961-71.
20. KOCH GG, LANDIS JR, FREEMAN JL, FREEMAN DH JR, LEHNEN RG. A general methodology for the analysis of experiments with repeated measurement of categorical data. Biometrics 1977; 33: 133-58.
21. LANDIS JR, KOCH GG. Categorical data analysis in longitudinal studies. In: NESSELROADE JR, BALTES PB, eds. Longitudinal Research in the Study of Behavioral Development. New York: Academic Press, 1979.
22. BONNEY GE. Logistic regression for dependent binary observations. Biometrics 1987; 43: 951-73.
23. STRAM DO, WEI LJ, WARE JH. Analysis of repeated ordered categorical outcomes with possibly missing observations and time-dependent covariates. JASA 1988; 83: 631-7.
24. KORN EL, WHITTEMORE AS. Methods for analyzing panel studies of acute health effects of air pollution. Biometrics 1979; 35: 795-804.
25. DIXON WJ, ed. Biomedical Computer Programs. Los Angeles: UCLA Press, 1975.
26. SAS INSTITUTE INC. SAS User's Guide: Statistics, Version 5 edn. Cary, NC: SAS Institute Inc., 1985.
27. CONNOLLY MA, LIANG KY. Conditional logistic regression models for correlated binary data. Biometrika 1988; 75: 501-6.
28. TITTERINGTON DM, MURRAY GD, MURRAY LS, SPIEGELHALTER DJ, SKENE AJ, HABBEMA JDF, GELPKE GJ. Comparison of discrimination techniques applied to a complex data set of head injured patients. J Roy Stat Soc A 1981; 144: 145-75.
