Assessing The Quality of Randomized Controlled Trials: An Annotated Bibliography of Scales and Checklists
ABSTRACT: Assessing the quality of randomized controlled trials (RCTs) is important and relatively new. Quality gives us an estimate of the likelihood that the results are a valid estimate of the truth. We present an annotated bibliography of the scales and checklists developed to assess quality: twenty-five scales and nine checklists. The checklists are most useful in providing investigators with guidelines as to what information should be included when reporting RCTs. The scales give readers a quantitative index of the likelihood that the reported methodology and results are free of bias. These scales have several shortcomings. Future scale development is likely to be most beneficial if questions common to all trials are assessed, if the scale is easy to use, and if it is developed with sufficient rigor.
INTRODUCTION
The randomized controlled trial (RCT) is considered the most reliable method by which to assess the efficacy of treatments [1]. Regardless of whether the result of an RCT reaches statistical significance, the design, conduct, and published report should be of high quality. High-quality trials and their reports should lead to better and more realistic estimates of treatment effects, more accurate and reproducible estimates of treatment efficacy, and, it is hoped, greater acceptance of these results within the health care community.
Address reprint requests to: David Moher, MSc, Clinical Epidemiology Unit, Loeb Medical Research Institute, Ottawa Civic Hospital, 1053 Carling Avenue, Ottawa, Ontario K1Y 4E9, Canada.
Received October 22, 1993; revised July 12, 1994.
Controlled Clinical Trials 16:62-73
© Elsevier Science Inc. 1995, 655 Avenue of the Americas, New York, New York 10010
METHODOLOGY
To be considered a scale, the construct under consideration should be a continuum, with quantitative units that reflect varying levels of a trait or characteristic [34]. There had to be evidence that the scale was developed to measure quality, each item had to have a numeric score attached to it, and there had to be an overall summary score. For a checklist to be included, the author(s) must not have intended to attach a quantitative score to each of the questions or to derive an overall numeric score.
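The scale/checklist distinction above can be made concrete: a scale attaches a numeric score to each item and sums them into an overall score, whereas a checklist records the same judgments without scoring. A minimal sketch in Python, with hypothetical item names and point weights (illustrative only, not drawn from any of the instruments reviewed here):

```python
# Hypothetical quality-scale items and point weights -- illustrative only,
# not taken from any published instrument.
SCALE_ITEMS = {
    "patient_assignment_described": 2,  # method of randomization reported
    "double_masked": 2,                 # trial reported as double-masked
    "withdrawals_accounted_for": 1,     # follow-up and dropouts reported
}

def summary_score(trial_report):
    """Sum the points for each item the trial report satisfies (a scale)."""
    return sum(points for item, points in SCALE_ITEMS.items()
               if trial_report.get(item, False))

def checklist(trial_report):
    """Record the same judgments without a numeric score (a checklist)."""
    return {item: bool(trial_report.get(item, False)) for item in SCALE_ITEMS}

report = {"patient_assignment_described": True,
          "double_masked": False,
          "withdrawals_accounted_for": True}
print(summary_score(report))  # 3 out of a maximum of 5
```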
To capture all the published scales and checklists we carried out a MEDLINE search of research reports published between January 1966 and December 1992. We included the following key words in our search strategy: quality, clinical trials, scale, checklist, and human. We used wild cards for RANDOM* and MeSH terms for CLINICAL TRIALS and RANDOMIZED CONTROLLED TRIALS. Our search was not limited to the English language. We reviewed all of the scales and checklists so identified, including their reference lists, for additional scales and checklists. We also wrote to various authors of scales and checklists asking whether they knew of other published or unpublished scales and/or checklists. We rejected any scale or checklist that was a minor modification of an existing one.
From each scale the following items were recorded: the name of the scale or its principal author; whether the scale was developed to assess the quality of any trial or of specific trials (e.g., contrast media, pain); whether quality was defined; the type of quality assessed (i.e., methodological quality or the quality of reporting); how the items were selected; whether the scale included items on four content areas bearing on the internal validity of a trial (patient assignment [35], masking [36], patient follow-up [37], and statistical analysis [38]); the number of items; and whether the scale had undergone rigorous development.
RESULTS
Scales
Twenty-five scales were identified (see Table 1). Twenty-three of these scales have been published. The remaining two scales are still under development or unpublished.
Fifteen (60%) of the scales were designed to assess the quality of any trial [2,4,7-11,13-15,19,20,23]. The remaining 10 (40%) scales assess the quality of specific trials (e.g., contrast media, pain) [3,5,6,12,16-18,21,22,24]. Differences in the scope of scales can lead to discrepancies in how trials should be scored. Scales that yield higher scores for double-masked trials automatically discriminate against surgical trials in which masking may be inappropriate or impossible. Six (24%) of the scales defined the construct quality used in their scale development [4,5,7,8,14]. Three (12%) of the scales were designed to assess the quality of the trial report [3,4,14], 8 (32%) to assess methodological quality [2,5,6,13,16,21,22], and the remaining 14 (56%) to assess both methodological quality and the quality of the report [7,8-12,15,17-20,23,24].
Twenty-four (96%) used "accepted criteria" to select the items for inclusion in their scale [2-13,15-24]. By accepted criteria the authors reported using items selected from textbooks of clinical trials [42,43]. The remaining scale [14] used a pool of items that was narrowed down to the final version of the scale using standard scale development techniques. Twenty-two (88%) of the scales had at least one item about patient assignment [2-17,20-24], 20 (80%) had at least one item about masking [2-5,7,8,10-12,14-17,19,20,22-24], 11 (44%) had at least one item about patient follow-up [2,5,9,16,19-21,23,24], and 21 (84%) had at least one item about statistical analysis [2-5,7-12,14-16,18-20,22-24]. The number of items in a scale ranged from 3 to 34.
Only one (4%) scale satisfied our criteria of rigorous development [14]. This report documented how the items were initially selected, how and why the final items were included, how the scale discriminated between trials of differing quality, and the range of quality scores obtained during its development. Twelve (48%) of the scales reported interrater reliability [3,4,6,8,13-16,19,20,24]: five (41.7%) reported percent agreement, six (50%) reported intraclass correlation (ICC) or its equivalent, κ, and one (8.3%) reported Pearson correlations.
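For reference, κ compares the observed agreement between two raters with the agreement expected by chance alone. A minimal sketch of Cohen's κ for two raters assigning categorical quality ratings (the data are illustrative, not taken from the reviewed scales):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of reports on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal rates.
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Two raters classifying six trial reports as adequate (1) or not (0):
print(cohens_kappa([1, 1, 0, 1, 0, 1], [1, 0, 0, 1, 0, 1]))  # ≈ 0.67
```

Percent agreement is simply p_o; κ discounts the agreement two raters would reach by chance, which is why it is generally preferred as a reliability statistic.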
Eighteen (72%) of the scales described how the items should be scored when assessing quality [2-4,6-8,14-20,23,24]. All of the items in each of the 12
Checklists
Nine checklists assessing the quality of RCTs have been published (see Table 2). The checklists range from 4 to 57 items. Quality was partially defined in two (22.2%) checklists [25,26]. Four (44.4%) of the checklists were designed to assess the methodological quality of the trials [25,26,32,33], three (33.3%) to assess the quality of reporting [27,29,31], and the remaining two (22.2%) to assess methodological quality as well as the quality of reporting [28,30].
Accepted criteria were used in the selection of items for all the checklists. Seven (77.8%) of the checklists included at least one item about patient assignment [26-29,31-33], eight (88.9%) had at least one item about masking [26-33], five (55.6%) had at least one item about patient follow-up [26-29,32], and eight (88.9%) had at least one item about statistical analysis [25,27-33]. A trial could be assessed by all the checklists in 30 min or less.
DISCUSSION
Despite the enormous time and energy required to develop a scale, all of the 25 scales reviewed here, with one exception, have major weaknesses. With one exception, all of the scales have evolved with little or no standard scale development.
Table 1. Descriptive characteristics of published and unpublished scales used to assess the quality of randomized controlled trials (RCTs). (Column headings: scale name; type of scale; quality defined; type of quality assessed; how items were selected; patient assignment; masking; patient follow-up; statistical analysis; number of items; scale development; inter-rater reliability; time to complete; scoring range; detailed instructions for scoring items; meta-analysis scores. Table body not reproduced.)
REFERENCES
1. Cook DJ, Guyatt GH, Laupacis A, Sackett DL: Rules of evidence and clinical
recommendations on the use of antithrombotic agents. Chest 102:305S-311S, 1992
2. Chalmers TC, Smith H, Blackburn B, Silverman B, Schroeder B, Reitman D,
Ambroz A: A method for assessing the quality of a randomized control trial.
Controlled Clin Trials 2:31-49, 1981
3. Andrew E: Method for assessment of the reporting standard of clinical trials with
roentgen contrast media. Acta Radiol Diagn 25:55-58, 1984
4. Goodman SN, Berlin J, Fletcher RH, Fletcher SW. Manuscript quality before and
after peer review and editing at Annals of Internal Medicine. Ann Intern Med
121:11-21, 1994
5. Beckerman H, de Bie RA, Bouter LM, De Cuyper HJ, Oostendorp RAB: The
efficacy of laser therapy for musculoskeletal and skin disorders. In: Effectiviteit
Van Fysiotherapie: Een Literatuuronderzoek. Beckerman H, Bouter L, eds. Maas-
tricht, Rijksuniversiteit Limburg, 1990
6. Brown SA: Measurement of quality of primary studies for meta-analysis. Nursing
Res 40:352-355, 1991
7. Chalmers I, Adams M, Dickersin K, Hetherington J, Tarnow-Mordi W, Meinert C,
Tonascia S, Chalmers TC: A cohort study of summary reports of controlled trials.
JAMA 263:1401-1405, 1990
8. Cho MK, Bero LA. Instruments for assessing the quality of drug studies published
in the medical literature. JAMA 272:101-104, 1994.
9. Colditz GA, Miller JN, Mosteller F: How study design affects outcomes in comparisons of therapy: I. Medical. Stat Med 8:441-454, 1989
10. Detsky AS, Naylor CD, O’Rourke K, McGeer AJ, L’Abbe KA: Incorporating
variations in the quality of individual randomized trials into meta-analysis. J Clin
Epidemiol 45:255-265, 1992
11. Evans M, Pollock AV: A score system for evaluating random control clinical trials
of prophylaxis of abdominal surgical wound infection. Br J Surg 72:256-260, 1985
12. Gdtzsche I’: Methodology and overt and hidden bias in reports of 196 double-
blind trials of nonsteroidal antiinflammatory drugs in rheumatoid arthritis. Con-
trolled Clin Trials 10:31-56, (erratum:356), 1989
13. Imperiale TF, McCullough AJ: Do corticosteroids reduce mortality from alcoholic
hepatitis? A meta-analysis of the randomized trials. Ann Intern Med 113:299-307,
1990
14. Jadad-Bechara AR. Meta-Analysis of randomised clinical trials in pain relief. DPhil
thesis, University of Oxford, 1994. (Paper in preparation by Jadad AR, Moore RA,
Carroll D, Jenkinson C, Reynolds DJM, Gavaghan DJ, McQuay HJ.)
15. Kleijnen J, Knipschild P, ter Riet G: Clinical trials of homeopathy. Br Med J
302:316-323, 1991
16. Koes BW, Assendelft WJJ, van der Heijden GJMG, Bouter LM, Knipschild PG:
Spinal manipulation and mobilization for back and neck pain: a blinded review. Br
Med J 303:1298-1303, 1991
17. Nurmohamed MT, Rosendaal FR, Buller HR, Dekker E, Hommes DW, Venden-
broucke JP, Briet E: Low-molecular-weight heparin versus standard heparin in
general and orthopaedic surgery: a meta-analysis. Lancet 340:152-156, 1992
41. Streiner DL, Norman GR: Health Measurement Scales: A Practical Guide to Their
Development and Use. Oxford, Oxford University Press, 1989
42. Pocock SJ: Clinical Trials: A Practical Approach. New York, John Wiley and Sons,
1983
43. Meinert CL: Clinical Trials: Design, Conduct and Analysis. New York, Oxford
University Press, 1986
44. Andrew E, Eide H, Fuglerud P, Hagen EK, Kristoffersen DT, Lambrechts M,
Waaler A, Weibye M: Publications on clinical trials with x-ray contrast media:
differences in quality between journals and decades. Eur J Radiol 10:92-97, 1990
45. Brown SA: Studies of educational interventions and outcomes in diabetic adults: a
meta-analysis revisited. Patient Educ Couns 16:189-215, 1990
46. Prendiville W, Elbourne D, Chalmers I: The effects of routine oxytocic administra-
tion in the management of the third stage of labour: an overview of the evidence
from controlled trials. Br J Obstet Gynaecol 95:3-16, 1988
47. Emerson JD, Burdick E, Hoaglin DC, Mosteller F, Chalmers TC: An empirical
study of the possible relation of treatment differences to quality scores in con-
trolled randomized clinical trials. Controlled Clin Trials 11:339-352, 1990
48. Poynard T, Naveau S, Chaput JC: Methodological quality of randomized clinical
trials in the treatment of portal hypertension. In: Methodology and Reviews of
Clinical Trials in Portal Hypertension. Burroughs AK, ed. Amsterdam, Excerpta
Medica, 1987
49. Tyson JE, Reisch JS, Jimenez J: Clinical trials in perinatal medicine. In: Reproduc-
tive and Perinatal Epidemiology. Kiely M, ed. Boca Raton, CRC Press, 1990
50. Schulz KF, Chalmers I, Hayes RJ, Altman DG: Failure to conceal treatment alloca-
tion schedules in trials influenced estimates of treatment effects. Controlled Clin
Trials 15:63S, 1994
51. Aaronson NK: Quality of life assessment in clinical trials: methodological issues.
Controlled Clin Trials 10 (Suppl):195-208, 1989
52. Moher D, Jadad AR, Tugwell I’: Assessing the quality of randomized controlled
trials: current issues and future directions. Int J Technology Assess Health Care,
forthcoming
53. Chalmers I, Dickersin K, Chalmers TC: Getting to grips with Archie Cochrane’s
agenda. All randomized controlled trials should be registered and reported. Br
Med J 305:786-788, 1992
54. Outcomes and PORTS. Lancet 340:1439, 1992
55. Gore SM, Jones G, Thompson SG: The Lancet's statistical review process: areas for improvement by authors. Lancet 340:100-102, 1992
56. Simon R, Wittes RE: Methodological guidelines for reports of clinical trials. Cancer
Treat Rep 69:1-3, 1985
57. Mosteller F, Gilbert JP, McPeek B: Reporting standards and research strategies for
controlled trials. Controlled Clin Trials 1:37-58, 1980
58. Haynes RB, Mulrow C, Huth EJ, Altman DG, Gardner MJ: More informative
abstracts revisited. Ann Intern Med 113:69-76, 1990