

Assessing the Quality of Randomized Controlled Trials: An Annotated Bibliography of Scales and Checklists

David Moher, MSc, Alejandro R. Jadad, MD, Graham Nichol, MD, Marie Penman, BScN, Peter Tugwell, MD, and Sharon Walsh, MD

Clinical Epidemiology Unit, Loeb Medical Research Institute (D.M., M.P., P.T., S.W.), Department of Medicine, University of Ottawa (D.M., P.T., S.W.), Oxford Regional Pain Relief Unit, University of Oxford (A.R.J.), and Division of Clinical Epidemiology, Department of Medicine, Brigham and Women's Hospital (G.N.)

ABSTRACT: Assessing the quality of randomized controlled trials (RCTs) is important and relatively new. Quality gives us an estimate of the likelihood that the results are a valid estimate of the truth. We present an annotated bibliography of scales and checklists developed to assess quality. Twenty-five scales and nine checklists have been developed to assess quality. The checklists are most useful in providing investigators with guidelines as to what information should be included in reporting RCTs. The scales give readers a quantitative index of the likelihood that the reported methodology and results are free of bias. There are several shortcomings with these scales. Future scale development is likely to be most beneficial if questions common to all trials are assessed, if the scale is easy to use, and if it is developed with sufficient rigor.

KEY WORDS: Clinimetrics, assessment, quality, scales, checklists

INTRODUCTION
The randomized controlled trial (RCT) is considered the most reliable method for assessing the efficacy of treatments [1]. Regardless of whether or not the result of an RCT reaches statistical significance, the design,
conduct, and published report should be of high quality. High-quality trials
and their reports should lead to better and more realistic estimates of treat-
ment effects, more accurate and reproducible estimates of treatment efficacy,
and hopefully greater acceptance of these results within the health care
community.

Address reprint requests to: David Moher, MSc, Clinical Epidemiology Unit, Loeb Medical Research Institute, Ottawa Civic Hospital, 1053 Carling Avenue, Ottawa, Ontario K1Y 4E9, Canada.
Received October 22, 1993; revised July 12, 1994.
Controlled Clinical Trials 16:62-73 (1995)
© Elsevier Science Inc. 1995, 655 Avenue of the Americas, New York, New York 10010
0197-2456/95/$9.50  SSDI 0197-2456(94)00031-W

It is important to distinguish between assessing the quality of a trial and the quality of its report. We define the quality of a trial, our primary interest,
as “the confidence that the trial design, conduct, and analysis has minimized
or avoided biases in its treatment comparisons.” This definition focuses on
methodological quality. The quality of a trial report can be defined as “pro-
viding information about the design, conduct, and analysis of the trial.” A
trial designed with several biases that is well reported can receive a high-quality score. Conversely, a well-designed and conducted trial that is poorly reported would receive a low-quality score.
Developing scales to assess quality is a relatively new phenomenon. The
first scale was published in 1981 [2]. By 1993 an additional 24 scales had been developed [3-24]. Checklists have a longer history. The first checklist was published in 1961 [25] and nine had been published by 1993 [26-33]. This
paper presents an annotated bibliography of these scales and checklists with a
discussion of some factors related to the assessment of quality in RCTs. We
begin with a rationale for the need to measure trial quality, propose a
definition of methodological quality, present the results of our search for
scales and checklists, discuss our results, and comment on the future direction
of this research.

METHODOLOGY
To be considered a scale, the construct under consideration should be a continuum, with quantitative units that reflect varying levels of a trait or characteristic [34]. There had to be evidence that the scale was developed to measure quality, and each item had to have a numeric score attached to it, with an overall summary score. To be included as a checklist, the authors' intention had to be not to attach a quantitative score to each question or to produce an overall numeric score.
To capture all the published scales and checklists we carried out a MEDLINE search of research reports published between January 1966 and December 1992. We included the following key words in our search strategy: quality, clinical trials, scale, checklist, and human. We used wild cards for RANDOM* and MeSH terms for CLINICAL TRIALS and RANDOMIZED CONTROLLED TRIALS. Our search was not limited to the English language.
We reviewed all of the scales and checklists so identified, checking their references for further scales and checklists. We also wrote to various authors of scales
and checklists asking them whether they knew of other published or unpub-
lished scales and/or checklists. We rejected any scale or checklist that was a
minor modification of an existing one.
From each scale the following items were recorded: the name of the scale
or its principal author, whether the scale was developed to assess the quality
of any trial or specific trials (e.g., contrast media, pain), whether quality was
defined, the type of quality assessed (i.e., methodological quality or the
quality of reporting), how the items were selected, whether the scale included
items on four content areas bearing on the internal validity of a trial (patient
assignment J351, masking 1361, patient follow-up 1371, and statistical analysis
1381, the number of items, whether the scale had undergone rigorous devel-
D. Moher et al.

opment [39,40,41], interrater reliability, the approximate time to complete the


scale (if this was not reported, we estimated the time based on our experience
using the scale), range of possible scores, guidelines as to how a trial should
be scored, and scores reported during scale development and/or in a meta-
analysis).
From each checklist the following items were recorded: principal author,
number of items included in the checklist, whether quality was defined, type
of quality being assessed, how the items were selected for inclusion, whether
the checklist included items on patient assignment, masking, patient follow-up,
and statistical analysis, and the approximate time to complete the checklist.

RESULTS
Scales
Twenty-five scales were identified (see Table 1). Twenty-three of these scales have been published. The remaining two scales are still under development or unpublished.
Fifteen (60%) of the scales were designed to assess the quality of any trial [2,4,7-11,13-15,19,20,23]. The remaining 10 (40%) scales assess the quality of specific trials (e.g., contrast media, pain) [3,5,6,12,16-18,21,22,24]. Differences
in the scope of scales can lead to discrepancies in how trials should be scored.
Scales that yield higher scores for double-masked trials automatically dis-
criminate against surgical trials in which masking may be inappropriate or
impossible. Six (24%) of the scales defined the construct quality used in their scale development [4,5,7,8,14]. Three (12%) of the scales were designed to assess the quality of the trial report [3,4,14], 8 (32%) to assess methodological quality [2,5,6,13,16,21,22], and the remaining 14 (56%) to assess both methodological quality and the quality of the report [7-12,15,17-20,23,24].
Twenty-four (96%) used "accepted criteria" to select the items for inclusion in their scale [2-13,15-24]. By accepted criteria the authors reported using items selected from textbooks of clinical trials [42,43]. The remaining scale [14]
used a pool of items that were narrowed down, to the final version of the
scale, using standard scale development techniques. Twenty-two (88%) of the scales had at least one item about patient assignment [2-17,20-24], 20 (80%) had at least one item about masking [2-5,7,8,10-12,14-17,19,20,22-24], 11 (44%) had at least one item about patient follow-up [2,5,9,16,19-21,23,24], and 21 (84%) had at least one item about statistical analysis [2-5,7-12,14-16,18-20,22-24]. The number of items in a scale ranged from 3 to 34.
Only one (4%) scale satisfied our criteria of rigorous development [14]. This
report documented how the items were initially selected, how and why the
final items were included, how the scale discriminated between trials of differing quality, and the range of quality scores obtained during its development. Twelve (48%) of the scales reported interrater reliability
[3,4,6,8,13-16,19,20,24]: five (41.7%) reported percent agreement, six (50%) reported intraclass correlation (ICC) or kappa, its equivalent, and one (8.3%) reported Pearson correlations.
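To illustrate the two most common reliability measures just mentioned, the sketch below (ours, not from the paper) computes percent agreement and Cohen's kappa, a chance-corrected equivalent, for two raters scoring the same set of trials:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Fraction of trials on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two raters classifying 8 trials as adequate (1) or inadequate (0):
a = [1, 1, 0, 1, 0, 0, 1, 1]
b = [1, 0, 0, 1, 0, 1, 1, 1]
# Percent agreement is 0.75; kappa is lower because part of that
# agreement is expected by chance.
```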
Eighteen (72%) of the scales described how the items should be scored when assessing quality [2-4,6-8,14-20,23,24]. All of the items in each of 12 (48%) of the scales [3,5-7,9,10,13,14,17,19,21] can be scored in 10 min or less (range <10-45 min). How to score each of the scales varies considerably, as does the possible range of scores. Seventeen (68%) of the scales provided detailed instructions as to how scores should be assigned to each item as well as how to compute the overall summary score [2-4,6-8,14-20,23,24].
each of the scales ranged from 1 to 170 points. Eight (32%) of the scales used a
weighting system to score quality. For example, Koes and colleagues [16] allocate 4% of the scale's points to how the trial reports randomization and 10% to how a trial reports whether the outcome(s) were assessed in a masked manner.
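A weighting system of this kind can be sketched as follows (illustrative only; the 4% and 10% figures echo the Koes example in the text, the third weight is invented):

```python
# Each item's weight is its share of the scale's total points (percent).
WEIGHTS = {
    "randomization": 4,    # echoes the Koes example in the text
    "masked_outcome": 10,  # echoes the Koes example in the text
    "follow_up": 6,        # invented for illustration
}

def weighted_quality(ratings, weights=WEIGHTS):
    """ratings maps each item to the fraction of its points earned (0..1);
    the result is the weighted sum of points awarded."""
    return sum(weights[item] * earned for item, earned in ratings.items())

score = weighted_quality({"randomization": 1.0,
                          "masked_outcome": 0.5,
                          "follow_up": 0.0})   # 4 + 5 + 0 points
```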
Four (16%) of the scales recommended steps to minimize bias for those completing quality assessments [2,8,14,16]. For example, the quality assessor should be masked to the identity of the trial's author(s), journal, and outcome.
Although all of the scales can be used to assess the quality of individual trials, they are most often cited in the context of assessing the quality of trials used in meta-analyses. We reviewed the quality of trials used in a meta-analysis in which each scale was used [4,5,8,9,11-17,44-49]. Where reported, we used the mean quality score or calculated it (Σ individual trial scores/total number of trials) ourselves. These scores have been converted to a percent [(score points/maximum score points) × 100] and rounded to the nearest integer. With the exception of the high scores (mean = 82%) reported by Imperiale and McCullough and the Annals scale (mean = 75%) [13,4], the overall quality across scales is approximately 50%, indicating room for improvement in either the quality of the trials, how the trial is reported, or both. Alternatively, these scores may simply indicate that the construct is inadequate.
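The conversion used above can be written out directly; this is a sketch of the arithmetic described in the text, not code from the paper:

```python
def mean_quality(trial_scores):
    """Sum of individual trial scores divided by the number of trials."""
    return sum(trial_scores) / len(trial_scores)

def as_percent(score, max_score):
    """(score points / maximum score points) x 100, rounded to the
    nearest integer."""
    return round(score / max_score * 100)

# Hypothetical example: four trials scored on a 0-8 scale.
mean = mean_quality([3, 4, 4, 5])   # 4.0
percent = as_percent(mean, 8)       # 50
```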

Checklists
Nine checklists assessing the quality of RCTs have been published (see
Table 2). The checklists vary from 4 to 57 items. Quality was partially defined
in two (22.2%) checklists [25,26]. Four (44.4%) of the checklists were designed to assess the methodological quality of the trials [25,26,32,33], three (33.3%) to assess the quality of reporting [27,29,31], and the remaining two (22.2%) to assess methodological quality as well as the quality of reporting [28,30]. Accepted criteria were used in the selection of items for all the checklists. Seven (77.8%) of the checklists included at least one item about patient assignment [26-29,31-33], eight (88.9%) had at least one item about masking [26-33], five (55.6%) had at least one item about patient follow-up [26-29,32], and eight (88.9%) had at least one item about statistical analysis [25,27-33]. A
trial could be assessed by all the checklists in 30 min or less.

DISCUSSION
Despite the enormous time and energy required to develop a scale, all of the 25 scales reviewed here, with one exception, have major weaknesses: they have evolved with little or no standard scale
Table 1 Descriptive characteristics of published and unpublished scales used to assess the quality of randomized controlled trials (RCTs)
Columns: scale name (a); type of scale (b); quality defined (c); type of quality assessed (d); items selected (e); patient assignment (f); masking (g); patient follow-up (h); statistical analysis (i); number of items; scale development (j); inter-rater reliability (k); time to complete (l); scoring range (m); detailed instructions for scoring items (n); meta-analysis scores (o)

Andrew3+4 S n r ac y Y n Y 11 nr 0.95” 10 o-22 56


Annals4 a r ac Y Y n Y ;; nr 0.12” 15 34-170 ; 75
Beckermans S :: m ac Y Y Y Y nr 10 O-25 n 35
Brown6,45 n m ac n n nr ;fs9n 10 o-21 y 55
Y
Chalmers, 17,‘j6 ;: Y m&r ac Y ; n Y : nr nr <lo o-9 58
Chalmers TC1,47 g n m ac Y Y Y Y 27 nr nr 45 O-100 ; 45
Choa ’ Y m&r ac Y Y n Y 2; nr 0.89S 30 o-1 y 60
Colditzg : n m&r ac Y Y Y nr nr 10 O-8 n 56
Detsky’O g n m&r ac Y ; n Y 5 nr nr 10 o-15
Evansl’ g n m&r ac Y Y n Y 33 nr nr 15 O-100 :: :;
Gotzsche12 S n m&r ac Y Y n Y 8.8 nr nr 15 O-8 n 25,38
Imperiale13 g n m ac Y n n 5 nr 0.79” <lo O-5 n 82
Jadad14 Y r pool Y ; n Y 6 Y 0.65,0.755 <lo O-8 y 56
: Y r Y Y n n 3 Y 0.66,0.7? <lo o-5 54
JonasP g m Y Y y Y 20 nr 0.6> 20 O-100 ;
Kleijnen15 g ;: m&r ac Y n Y 7 nr 0.87” 15 O-100 y :;
Koe@ S P m ac Y ; y Y ;I nr 0.8” 15 O-100 y 37
Lindeq !S Y m&r ac Y Y y Y mr nr 30 O-100 y nr
NurmohamedI7 S P m&r ac n 8 nr nr 10 O-8 y
Y Y n
Ongheniala S m&r ac n n n Y 10 nr nr 15 O-10 Z
Poynard’g+a i? ;: m&r ac n Y Y Y 14 nr >.665 10 -2-26 ; 13
ReischZOAy g n m&r ac Y Y Y Y 34 nr 0.99,0.71’ 30 o-34 y 45
Smithzl S n m ac Y n Y n 8 nr nr 10 O-40 n 62 >
SpitzerZ2 s n m ac n 5 nr nr 25 O-5 n nr ls
Y Y Y a
Tapsz3 g n m&r ac Y Y Y Y 29 nr nr 30 O-100 Y nr e.
Ter RietZ4 S P m&r ac Y Y Y Y 18 nr 0.93” 15 l-100 Y 47 Gz

a name of scale or principal author
b g = generic scale; s = specific scale (e.g., contrast media, pain)
c n = no; y = yes; p = partially defined
d m = methodological quality; r = quality of report
e ac = accepted criteria (see text for details); pool = pool of items (see text for details)
f was there an item on patient assignment: n = no; y = yes
g was there an item on masking: n = no; y = yes
h was there an item on patient follow-up: n = no; y = yes
i was there an item on statistical analysis: n = no; y = yes
j was the scale rigorously developed (see text for details); nr = not reported; y = yes
k nr = not reported
l approximate time (in minutes) to complete scoring a trial. If it was not stated by the authors we estimated the time by scoring trials
m this is the range of potential scores using the scale. Higher scores indicate superior quality.
n n = no; y = yes
o mean scores reported or calculated by us (Σ individual trial scores/total number of trials). These scores have been converted to a percent [(score points/maximum score points) × 100] and rounded to the nearest integer; n = no; y = yes
p Jonas WB. The likelihood of validity evaluation method. Unpublished manuscript, 1993
q Linde K, Clausius N, Melchart D, Brandmaier R, Jonas WB, Eitel F. Controlled clinical trials on the efficacy of treatment strategies using homeopathic preparations: a systematic review. Unpublished manuscript, 1993.
r percent agreement
s intraclass correlation or kappa
t Pearson correlation

Table 2 Descriptive characteristics of published checklists used to assess the quality of randomized controlled trials
Columns: checklist name (a); no. of items; quality defined (b); type of quality assessed (c); items selected (d); patient assignment (e); masking (f); patient follow-up (g); statistical analysis (h); time to complete (i)
Badgley25 5 p m ac n n n y 15
Bland26 18 p m ac y y y n 20
DerSimonian27 11 n r ac y y y y 15
Gardner28 26 n m&r ac y y y y 20
Grant29 28 n r ac y y y y 20
Lionel30 45 n m&r ac n y n y 30
Mahon31 n r ac y y n y 10
Thomson32 1; n m ac y y y y 15
Weintraub33 57 n m ac y y n y 25

a name of principal author
b n = no; y = yes; p = partially defined
c m = methodological quality; r = quality of report; m&r = both
d ac = accepted criteria (see text for details)
e was there an item on patient assignment: n = no; y = yes
f was there an item on masking: n = no; y = yes
g was there an item on patient follow-up: n = no; y = yes
h was there an item on statistical analysis: n = no; y = yes
i approximate time (in minutes) to complete scoring a trial. If it was not stated by the authors we estimated the time by scoring trials

development techniques [39,40,41]. Because of these results we recommend


caution in assessing quality using any scale that has been inadequately
developed.
With one exception, all of the items chosen for use by scale developers
were based on what the authors called “accepted criteria” from standard
clinical trial textbooks. Although these criteria may be useful, some of them
are based on conviction whereas others are based on empirical evidence. It is
hard to imagine how informed consent, an item in two scales [3,20], could
systematically influence the methodological quality of a trial. However, there is evidence that how patients are assigned to treatment groups, an item asked about in the majority of scales, can systematically alter treatment effects [50].
The majority of scales included at least one item about patient assignment,
masking, patient follow-up, and statistical analysis. In our view the develop-
ment of a scale in which the items are generic to all trials has several
advantages. For example, the essential set of items would be common across
all trials allowing for useful comparisons. This approach has been successfully
used in other areas of health care [51].
Less than half of the scales reported any measure of interrater reliability.
Of those that did report reliability, there was variability in the measure used. Only 50% of those reported intraclass correlations (or kappa, its equivalent), perhaps the most appropriate measure. Surprisingly, seven scales did not provide details to readers as to how to score the individual items or the overall summary. Because of this lack of information we experienced difficulty in assigning a summary quality score to at least one scale [10].

There are several consequences resulting from inadequately developed


scales. There may be a lack of agreement between scales about what construct
is being measured. In our review less than 25% of the scales defined what is
meant by the construct trial quality, a fundamental aspect of any scale
development. Without such a definition there is a risk that a scale purporting
to measure trial quality is actually measuring a different construct. Alterna-
tively, some scales may be assessing the methodological quality of a trial,
whereas others may be assessing the quality of a trial report, or a combination
of both. Kleijnen and colleagues' scale [15] appears to assess both the methodological quality of a trial, by asking about the number of patients analyzed, and the quality of the trial report, by asking whether patient characteristics were adequately described.
Even if the scales available vary in their size, complexity, and level of
development, it would be useful to ascertain whether different scales, when
applied to the same trial, provide similar results. This information could
guide quality assessors in their choice of scale. There would be little advan-
tage in using a 20-item scale to assess trial quality if similar results could be
obtained by using a 6-item scale.
Detsky and colleagues [10] assessed the quality of 18 trials used in a systematic overview of parenteral nutritional support. Using scales developed by Chalmers and Detsky, these authors reported that although there were minor differences in the raw scores, the rankings of the trials by quality score remained similar across the two scales. In contrast, Moher and colleagues [52] used six
scales to assess the quality of 12 trials used in a meta-analysis to assess the
effects of antithrombotic therapy for patients with acute ischemic stroke.
These authors found that overall quality scores for each trial varied considerably across scales. Differences in trial quality scores across scales ranged from 23% to 74%. Similar results were obtained using rank scores of individual trials.
These results may be explained in part by differences in how the scales were
developed.
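Comparisons of this sort reduce to asking whether two scales rank the same trials in the same order. A minimal sketch (ours, assuming no tied scores; not the method of either study) computes trial rankings under two scales and their Spearman rank correlation:

```python
def ranks(scores):
    """Rank trials from 1 (highest score) downward."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(scores_a, scores_b):
    """Spearman correlation of the two rankings (no ties assumed)."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    n = len(ra)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical scores for four trials under two different scales:
scale_a = [7, 5, 3, 8]
scale_b = [60, 40, 20, 90]
rho = spearman(scale_a, scale_b)  # 1.0: identical rankings despite
                                  # very different raw scores
```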
Future efforts in assessing quality may be best spent in developing scales
with appropriate rigor. Developing a scale to assess quality should be consid-
ered similar to developing any other instrument. A description of the general
principles for scale development is beyond the scope of this paper, but there
are nevertheless some specific issues that must be addressed when develop-
ing a scale to assess the quality of a trial (see Table 3).
We also need to address whether, as part of a meta-analysis, efficacy and safety analyses should be conducted with and without quality scores. This can have an impact on how results are interpreted and is likely to become more important for large collaborative groups conducting meta-analyses [53,54].
Nurmohamed and colleagues [17] recently published a meta-analysis comparing low molecular weight heparin (LMWH) with standard heparin in proximal deep-vein thrombosis (DVT). These authors reported a statistically significant beneficial effect of LMWH in reducing DVT when all the trials were used in the analysis. However, when the analysis was limited to trials described as having strong methodological quality, the analysis was less favorable with respect to the prevention of DVT and was not statistically significant.
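A sensitivity analysis of this kind can be sketched as below (a generic fixed-effect, inverse-variance pooling with invented data; not the Nurmohamed analysis itself):

```python
def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate."""
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def with_and_without_quality(trials, threshold):
    """Pool every trial, then only those at or above a quality threshold."""
    every = pooled_effect([t["effect"] for t in trials],
                          [t["variance"] for t in trials])
    strong = [t for t in trials if t["quality"] >= threshold]
    restricted = pooled_effect([t["effect"] for t in strong],
                               [t["variance"] for t in strong])
    return every, restricted

# Invented trials: effect is a log risk ratio (negative favors treatment),
# quality is a 0-100 score.
trials = [
    {"effect": -0.5, "variance": 1.0, "quality": 80},
    {"effect": -0.6, "variance": 1.0, "quality": 40},
    {"effect": -0.1, "variance": 1.0, "quality": 90},
]
every, restricted = with_and_without_quality(trials, threshold=70)
# every is about -0.4; restricted is about -0.3, i.e., a smaller apparent
# benefit when the analysis is limited to higher-quality trials.
```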

Table 3 Specific issues to address in the development of a scale to assess trial quality (a)
• Definition of the quality construct
• Definition of the scope of the scale
  Any trial or only randomized controlled trials?
  Studies in a specific medical field or in all medical disciplines?
  Published and/or unpublished data?
• Definition of population of end-users
  Same or different backgrounds?
• Selection of targets
  Stratified or random sample of trials?
• Selection of raters
  Instruction sheet or formal training?
• Trial scoring
  Open or blind assessments?

a A complete description of general clinimetric principles can be found elsewhere [39-41].

Imperiale and McCullough [13] conducted a meta-analysis of whether corticosteroids reduce mortality from alcoholic hepatitis. These authors concluded that the protective effect of corticosteroids was higher among trials of
higher quality and among trials that excluded patients with active gastroin-
testinal bleeding. The latter result would suggest confounding of quality with
other aspects of study design. Other investigations have found no relationship between treatment effects and quality [47].
Checklists suffer from many of the problems noted with the development
of scales. Checklist developers do not provide details on how and why their
items were selected for inclusion. The number of items also varies between
checklists although the majority of them include items on patient assignment,
masking, patient follow-up, and statistical analysis. Perhaps checklists can be
most useful in helping authors report their trials. Checklists provide items
authors should include when reporting their trials. Some journals have also
started using checklists in assessing the statistical quality of submissions [55].
There have been several papers outlining the poor quality of reporting of
RCTs and the need to improve this situation [29,56,57]. Other approaches, such as more informative abstracts [58], have been useful. Other methods of
improving the reporting of RCTs, such as structured reporting of the text of
trials, have also been proposed. Checklists offer a useful ancillary tool for all
of these suggestions.
Quality assessment is important and relatively new. We reviewed all of the
scales and checklists developed to assess quality. Our results indicate that
there is room for improvement in how the scales were developed. The scales
differ from one another in almost every respect: how and why the items were
selected for inclusion, the number of items, reliability, approximate time to
complete, and scoring range. Little attention has been given to the construct
that the scales are assessing. With one exception, the scales are weak in how they were developed. Checklists are also weak in their development. Checklists may be most useful in helping authors report their trials. We
are likely to see continued interest in quality assessment. Future efforts may
be most beneficial if scales are developed in which items common to all trials
are assessed, the scale is easy to use, and it has been rigorously developed.

Assessing the quality of trials used in meta-analyses is also important. If safety and efficacy results of a meta-analysis are significantly affected by the quality of the original trials, then its results may be less meaningful if quality is not assessed formally.

REFERENCES
1. Cook DJ, Guyatt GH, Laupacis A, Sackett DL: Rules of evidence and clinical
recommendations on the use of antithrombotic agents. Chest 102:305S-311S, 1992
2. Chalmers TC, Smith H, Blackburn B, Silverman B, Schroeder B, Reitman D,
Ambroz A: A method for assessing the quality of a randomized control trial.
Controlled Clin Trials 2:31-49, 1981
3. Andrew E: Method for assessment of the reporting standard of clinical trials with
roentgen contrast media. Acta Radiol Diagn 25:55-58, 1984
4. Goodman SN, Berlin J, Fletcher RH, Fletcher SW. Manuscript quality before and
after peer review and editing at Annals of Internal Medicine. Ann Intern Med
121:11-21, 1994
5. Beckerman H, de Bie RA, Bouter LM, De Cuyper HJ, Oostendorp RAB: The
efficacy of laser therapy for musculoskeletal and skin disorders. In: Effectiviteit
Van Fysiotherapie: Een Literatuuronderzoek. Beckerman H, Bouter L, eds. Maastricht, Rijksuniversiteit Limburg, 1990
6. Brown SA: Measurement of quality of primary studies for meta-analysis. Nursing
Res 40:352-355, 1991
7. Chalmers I, Adams M, Dickersin K, Hetherington J, Tarnow-Mordi W, Meinert C,
Tonascia S, Chalmers TC: A cohort study of summary reports of controlled trials.
JAMA 263:1401-1405, 1990
8. Cho MK, Bero LA. Instruments for assessing the quality of drug studies published
in the medical literature. JAMA 272:101-104, 1994.
9. Colditz GA, Miller JN, Mosteller F: How study design affects outcomes in comparison of therapy: I. Medical. Stat Med 8:441-454, 1989
10. Detsky AS, Naylor CD, O’Rourke K, McGeer AJ, L’Abbe KA: Incorporating
variations in the quality of individual randomized trials into meta-analysis. J Clin
Epidemiol 45:255-265, 1992
11. Evans M, Pollock AV: A score system for evaluating random control clinical trials
of prophylaxis of abdominal surgical wound infection. Br J Surg 72:256-260, 1985
12. Gdtzsche I’: Methodology and overt and hidden bias in reports of 196 double-
blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Con-
trolled Clin Trials 10:31-56, (erratum:356), 1989
13. Imperiale TF, McCullough AJ: Do corticosteroids reduce mortality from alcoholic
hepatitis? A meta-analysis of the randomized trials. Ann Intern Med 113:299-307,
1990
14. Jadad-Bechara AR. Meta-Analysis of randomised clinical trials in pain relief. DPhil
thesis, University of Oxford, 1994. (Paper in preparation by Jadad AR, Moore RA,
Carroll D, Jenkinson C, Reynolds DJM, Gavaghan DJ, McQuay HJ.)
15. Kleijnen J, Knipschild P, ter Riet G: Clinical trials of homeopathy. Br Med J
302:316-323, 1991
16. Koes BW, Assendelft WJJ, van der Heijden GJMG, Bouter LM, Knipschild PG:
Spinal manipulation and mobilization for back and neck pain: a blinded review. Br
Med J 303:1298-1303, 1991
17. Nurmohamed MT, Rosendaal FR, Buller HR, Dekker E, Hommes DW, Vandenbroucke JP, Briet E: Low-molecular-weight heparin versus standard heparin in general and orthopaedic surgery: a meta-analysis. Lancet 340:152-156, 1992

18. Ongenhenia I’, Van Houdenhove B: Antidepressants-induced analgesia in chronic


non-malignant pain: a meta-analysis of 39 placebo-controlled studies. Pain 49:205-
219, 1992
19. Poynard T: Evaluation de la qualite methodologique des essais therapeutiques
randomises. La Presse Medicale 17:315-318, 1988
20. Reisch JS, Tyson JE, Mize SG: Aid to the evaluation of therapeutic studies.
Pediatrics 84:815-827, 1989
21. Smith K, Cook D, Guyatt GH, Madhavan J, Oxman AD: Respiratory muscle
training in chronic airflow limitation: a meta-analysis. Am Rev Resp Dis 145:533-
539, 1992
22. Spitzer WO, Lawrence V, Dales R, Hill G, Archer MC, Clark P, Abenhaim L, Hardy J, Sampalis J, Pinfold P, Morgan PP: Links between passive smoking and disease: a best evidence synthesis. Clin Invest Med 13:17-42, 1990
23. Levine J: Trial Assessment Procedure Scale (TAPS). Printed by Department of
Health and Human Services, Public Health Service, Alcohol, Drug Abuse and
Mental Health Administration, National Institute of Mental Health, Bethesda, MD,
1980
24. Ter Riet G, Kleijnen J, Knipschild P: Acupuncture and chronic pain: a criteria-based meta-analysis. J Clin Epidemiol 43:1191-1199, 1990
25. Badgley RF: An assessment of research methods reported in 103 scientific articles
from two Canadian medical journals. Can Med Assoc J 85:246-250, 1961
26. Bland JM, Jones DR, Bennett S, Cook DG, Haines AP, MacFarlane AJ: Is the clinical trial evidence about new drugs statistically adequate? Br J Clin Pharmacol 19:155-160, 1985
27. DerSimonian R, Charette LJ, McPeek B, Mosteller F: Reporting on methods in
clinical trials. New Engl J Med 306:1332-1337, 1982
28. Gardner MJ, Machin D, Campbell MJ: Use of check lists in assessing the statistical
content of medical studies. In: Statistics with Confidence-Confidence Intervals and
Statistical Guidelines. London, BMJ, 1989
29. Grant A: Reporting clinical trials. Br J Obstet Gynaecol 96:397-400, 1989
30. Lionel NDW, Herxheimer A: Assessing reports of therapeutic trials. Br Med J
3:637-640, 1970
31. Mahon WA, Daniel EE: A method for the assessment of reports of drug trials. Can
Med Assoc J 90:565-569, 1964
32. Thomson ME, Kramer MS: Methodologic standards for controlled clinical trials of
early contact and maternal-infant behavior. Pediatrics 73:294-300, 1984
33. Weintraub M: How to critically assess clinical drug trials. Drug Ther 12:131-148,
1982
34. Brown FG: Principles of Educational and Psychological Testing, 3rd Ed. New
York, Holt, Rinehart and Winston, 1983
35. Altman DG: Randomization. Essential for reducing bias. Br Med J 302:1481-1482,
1991
36. Karlowski TR, Chalmers TC, Frenkel LD, Kapikian AZ, Lewis TL, Lynch JM:
Ascorbic acid for the common cold: a prophylactic and therapeutic trial. JAMA
231:1038-1042, 1975
37. Schulz KF: Methodological quality and bias in randomized controlled trials. PhD
thesis, University of London, 1994
38. Pocock SJ, Hughes MD, Lee RJ: Statistical problems in the reporting of clinical
trials: a survey of three journals. New Engl J Med 317:426-432, 1987
39. McDowell I, Newell C: Measuring Health: A Guide to Rating Scales and Question-
naires. New York, Oxford University Press, 1987
40. Feinstein AR: Clinimetrics. New Haven, Yale University Press, 1987

41. Streiner DL, Norman GR: Health Measurement Scales: A Practical Guide to Their
Development and Use. Oxford, Oxford University Press, 1989
42. Pocock SJ: Clinical Trials: A Practical Approach. New York, John Wiley and Sons,
1983
43. Meinert CL: Clinical Trials: Design, Conduct and Analysis. New York, Oxford
University Press, 1986
44. Andrew E, Eide H, Fuglerud P, Hagen EK, Kristoffersen DT, Lambrechts M,
Waaler A, Weibye M: Publications on clinical trials with x-ray contrast media:
differences in quality between journals and decades. Eur J Radiol 10:92-97, 1990
45. Brown SA: Studies of educational interventions and outcomes in diabetic adults: a
meta-analysis revisited. Patient Educ Couns 16:189-215, 1990
46. Prendiville W, Elbourne D, Chalmers I: The effects of routine oxytocic administra-
tion in the management of the third stage of labour: an overview of the evidence from controlled trials. Br J Obstet Gynaecol 95:3-16, 1988
47. Emerson JD, Burdick E, Hoaglin DC, Mosteller F, Chalmers TC: An empirical
study of the possible relation of treatment differences to quality scores in con-
trolled randomized clinical trials. Controlled Clin Trials 11:339-352, 1990
48. Poynard T, Naveau S, Chaput JC: Methodological quality of randomized clinical
trials in the treatment of portal hypertension. In: Methodology and Reviews of
Clinical Trials in Portal Hypertension. Burroughs AK, ed. Amsterdam, Excerpta
Medica, 1987
49. Tyson JE, Reisch JS, Jimenez J: Clinical trials in perinatal medicine. In: Reproduc-
tive and Perinatal Epidemiology. Kiely M, ed. Boca Raton, CRC Press, 1990
50. Schulz KF, Chalmers I, Hayes RJ, Altman DG: Failure to conceal treatment alloca-
tion schedules in trials influenced estimates of treatment effects. Controlled Clin
Trials 15:63S, 1994
51. Aaronson NK: Quality of life assessment in clinical trials: methodological issues.
Controlled Clin Trials 10 (Suppl):195-208, 1989
52. Moher D, Jadad AR, Tugwell P: Assessing the quality of randomized controlled
trials: current issues and future directions. Int J Technology Assess Health Care,
forthcoming
53. Chalmers I, Dickersin K, Chalmers TC: Getting to grips with Archie Cochrane’s
agenda. All randomized controlled trials should be registered and reported. Br
Med J 305:786-788, 1992
54. Outcomes and PORTS. Lancet 340:1439, 1992
55. Gore SM, Jones G, Thompson SG: The Lancet's statistical review process: areas for improvement by authors. Lancet 340:100-102, 1992
56. Simon R, Wittes RE: Methodological guidelines for reports of clinical trials. Cancer
Treat Rep 69:1-3, 1985
57. Mosteller F, Gilbert JP, McPeek B: Reporting standards and research strategies for
controlled trials. Controlled Clin Trials 1:37-58, 1980
58. Haynes RB, Mulrow C, Huth EJ, Altman DG, Gardner MJ: More informative
abstracts revisited. Ann Intern Med 113:69-76, 1990
