Validation of a New Tool for the Assessment of Study Quality and Reporting in Exercise Training Studies: TESTEX
ABSTRACT
Introduction: Several established tools are available to assess study quality and reporting of randomized controlled
trials; however, these tools were designed with clinical intervention trials in mind. In exercise training intervention
trials some of the traditional study quality criteria, such as participant or researcher blinding, are extremely difficult to
implement.
Methods: We developed the Tool for the assEssment of Study qualiTy and reporting in EXercise (TESTEX) – a study
quality and reporting assessment tool, designed specifically for use in exercise training studies. Our tool is a 15-point
scale (5 points for study quality and 10 points for reporting) and addresses previously unmentioned quality
assessment criteria specific to exercise training studies.
Results: There were no systematic differences between the summated TESTEX scores of each observer [H(2) = 0.392,
P = 0.822]. There was a significant association between the summated TESTEX scores of the three observers, with
almost perfect agreement between observers 1 and 2 [intra-class correlation coefficient (ICC) = 0.93, 95% confidence
interval (CI) 0.82–0.97, P < 0.001], observers 1 and 3 (ICC = 0.96, 95% CI 0.89–0.98, P < 0.001) and observers 2 and
3 (ICC = 0.91, 95% CI 0.75–0.96, P < 0.001).
Conclusions: The TESTEX scale is a new, reliable tool, specific to exercise scientists, that facilitates a comprehensive
review of exercise training trials.
Key words: assessment tool, exercise training, study quality, study reporting
Int J Evid Based Healthc 2015; 13:9–18.
International Journal of Evidence-Based Healthcare © 2015 University of Adelaide, Joanna Briggs Institute
©2015 University of Adelaide, Joanna Briggs Institute. Unauthorized reproduction of this article is prohibited.
NA Smart et al.
employ this ranking strategy is to identify sources of bias. One common source of bias is measurement error where researchers who deliver the intervention are also responsible for taking outcome measurements – which may be biased in favour of the exercise over control group; blinding of outcome assessors may therefore eliminate this form of bias. Outcome reporting from trials may be deficient in a number of ways and this is easily rectified. Selective outcome reporting is common, where post-intervention change in initially stipulated outcome measures are withheld from publication because results are undesirable. Reliability, or the consistency of a measurement or the absence of measurement error [1], and degree of observer error or validity are two other sources of error. In exercise training studies health practitioners are almost always concerned with safety, yet adverse events are not always reported. Related to this, one may report a certain type of exercise intervention to be more beneficial to a group than another intervention; yet this is of secondary importance if adverse events rates, withdrawal or adherence rates are worse. For these reasons, we have focussed on the specifics of methodology and reporting in exercise training studies with respect to these shortcomings.

Various tools are available to assess the quality of methodology and reporting in randomized controlled trials. The Consolidated Standards of Reporting Trials statement [2] is a general tool to guide study reporting, and while the JADAD score [3] was for some time the preferred quality and reporting assessment tool favoured by the Cochrane Collaboration [4], a more recent risk of bias (ROB) scale has been developed by Cochrane, although the reliability of ROB appears low [5]. Adopting a general tool, such as ROB, for exercise training studies, is extremely likely to lack the specificity and certain criteria (such as participant blinding) will be redundant. Within the field of physical therapy, the Physiotherapy Evidence Database (PEDro) tool has become widely used by physiotherapists and is currently also the tool of choice for exercise physiologists. Some of the criteria included in the PEDro scale, many of which were adapted from the Delphi list [6], are often redundant for exercise training studies. Examples of this are the blinding criteria utilized by the PEDro scale; blinding of exercise training participants is not feasible, as is blinding of the investigators directly supervising the training. Therefore, these criteria are redundant for exercise training studies. On the contrary, other important methodological and reporting criteria determining the effectiveness and the risks of an exercise intervention are not adequately addressed. Examples of these are reporting of withdrawals/adverse events, session attendance, exercise adherence, and exercise programme characteristics, which are important in exercise training studies and are not included in the PEDro scale [7]. The authors of this work have experience assessing study quality for numerous published meta-analyses [8–15]. The PEDro scale is perhaps the tool that comes closest to meeting the methodological and reporting requirements of exercise studies; however, several shortcomings remain.

The primary objective of this study was to develop an exercise science-specific scale, designed for use by exercise specialists, to assess the quality and reporting of exercise training trials. The secondary objective was to assess the validity and reliability of this scale. It is intended that the scale will be used by researchers conducting systematic review and meta-analyses, so they can quantify the strength of individual study designs and reporting; and by exercise science practitioners seeking to establish whether a particular intervention is beneficial or safe in the face of conflicting evidence. While developing the Tool for the assEssment of Study qualiTy and reporting in EXercise (TESTEX) scale, we aimed to avoid using redundant criteria, which we feel are not applicable for exercise training studies, and to include new criteria, which we think are most relevant to study design, quality and reporting in exercise sciences.

Methods
The authorship group consisted of members of an existing collaborative with previous experience in producing meta-analyses and therefore assessing study quality. All members were asked to list the difficulties they had encountered in using PEDro in assessing exercise training studies. Members were also asked which items they thought were redundant and which should be included in an exercise training-specific assessment tool. The group used the PEDro scale as a template for the development of a new scale. A series of meetings were organized, during which newly proposed and existing (PEDro) criteria were assessed for inclusion in the new scale. Newly proposed criteria were based upon difficulties experienced when conducting study quality assessment for meta-analyses, which is crucial for information accuracy and translation into clinical practice. Most items were unanimously included in the protocol, but two items were debated at three meetings. At the third meeting, a consensus was reached and a draft protocol was circulated for comment. Three drafts were edited before a final version was reached. Once the draft had been finalized, a reliability study was conducted.
METHODOLOGY PAPER
Table 1. ‘Detailed TESTEX scale’ (maximum score 15)
Each entry lists the criterion, its explanation and its scoring.

Study quality

1 – Eligibility criteria specified
Explanation: Eligibility criteria should be specified and fulfilled and specific diagnostic test values should be provided for all participants.
Scoring: 1 point – if eligibility criteria are clearly stated and fulfilled

2 – Randomization specified
Explanation: A description of the method used to allocate patients into treatment groups should be provided.
Scoring: 1 point – if methods are described and they are truly random, e.g. coin-tossing, sequence of randomly generated numbers

3 – Allocation concealment
Explanation: It should be stated if group allocation was concealed, meaning that a patient who was eligible for inclusion in the trial was unaware (when this decision was made) of which group the patient would be allocated to.
Scoring: 1 point – if group allocation was concealed from patients eligible for inclusion in the trial (e.g. consent should be given before randomization)

4 – Groups similar at baseline
Explanation: Baseline data of all participants who were randomized should be presented. There should be no significant difference in the measure of the severity of the treated condition between treatment groups.
Scoring: 1 point – if baseline data are separated by group allocation, presented and no differences are apparent

Blinding of all participants
Explanation: This item is not scored.
Scoring: No point

Blinding of all therapists
Explanation: This item is not scored.
Scoring: No point

5 – Blinding of assessor (for at least one key outcome)
Explanation: It is not always possible to blind patients and/or therapists; however, blinding of assessors is reasonable. If assessors of primary outcome measures are blinded to the intervention allocation of the patients, this should be stated clearly.
Scoring: 1 point – if it is stated unambiguously that the assessor of at least 1 primary outcome measure was blinded to group allocation

Study reporting

6 – Outcome measures assessed in 85% of patients
Explanation: The percentage of patients completing the study in both groups should be reported. Any adverse events (serious medical events, deaths, hospitalizations etc.) should be reported for each intervention group. The percentage of exercise sessions completed by the exercise patients who did not withdraw from the study should be reported.
Scoring: No point – if withdrawals are >15%
1 point – if adherence >85%
1 point – if adverse events are reported
1 point – if exercise attendance is reported
Total possible – 3 points

7 – Intention-to-treat analysis
Explanation: When a patient withdraws, this analysis is conducted by using either the last value obtained for each of the outcome measures as a post-intervention value, or by using the baseline value as a post value. This analysis should be added to the data of those that did complete the study and an analysis conducted.
Scoring: 1 point – if intention to treat analysis was performed on outcomes of interest

8 – Between-group statistical comparisons reported
Explanation: Comparison of exercise vs. comparator (control) group for the primary and at least one secondary outcome should be performed.
Scoring: 1 point – if between-group statistical comparisons are reported for the primary outcome measure of interest
1 point – if between-group statistical comparisons are reported for at least one secondary outcome measure
Total possible – 2 points

9 – Point measures and measures of variability for all reported outcome measures
Explanation: Point estimates should be provided for all outcomes, otherwise this could be deemed selective outcome reporting.
Scoring: 1 point – if all outcomes are reported with point estimates

10 – Activity monitoring in control groups
Explanation: Between-group differences may be diluted if control patients crossover to intervention. As many as one third of patients do this, so some measure, e.g. exercise diary or activity monitoring, should be supplied so this effect can be measured and quantified.
Scoring: 1 point – if control patients are asked to report their levels of physical activity and data are presented

11 – Relative exercise intensity remained constant
Explanation: Exercise intensity is considered by many to be the best stimulus for adaptation. Once patients begin an exercise programme at a set intensity they will begin to adapt. Throughout the study duration the relative intensity will fall in those that do adapt. Therefore, periodic assessment of exercise capacity should be conducted and the intensity titrated up (or in those that lose fitness, titrated down) so that exercise intensity remains constant.
Scoring: 1 point – if exercise load is titrated to keep relative intensity constant

12 – Exercise volume and energy expenditure
Explanation: Exercise parameters; session and programme duration, session frequency, exercise training intensity and modality should be clearly reported.
Scoring: 1 point – if exercise volume and energy expenditure can be calculated

Total out of a possible 15 points
training trials. For these reasons, we award 1 point for studies that report and fulfil included eligibility criteria.

Randomization specified – TESTEX Criterion 2
Consistent with the second criterion of the PEDro scale, we feel it is insufficient to consider that a study has utilized a random allocation if the published manuscript merely states that allocation was random. The precise method of randomization (e.g. computer-generated random numbers, coin-tossing and dice-rolling) should be specified. Quasi-randomized allocation procedures do not satisfy this criterion. Random allocation ensures that (within the constraints provided by chance) treatment and control groups are comparable; this is especially important in exercise training studies as sample size is often less than 50 patients and study withdrawal is more than 15%. We award 1 point for studies that stipulate the method of randomization.

Allocation concealment – TESTEX Criterion 3
In addition to eligibility criteria, the TESTEX scale awards 1 point for the concealment of allocation. A study is considered to have provided allocation concealment if the potential patients were unaware of which group they would be allocated to, at the time the patients give their consent.

Groups similar at baseline – TESTEX Criterion 4
Studies of exercise training interventions should report at least one measure of the severity of the condition being treated and at least one (different) key outcome measure at baseline. The rater must be satisfied that the groups’ outcomes would not be expected to differ (on the basis of baseline differences in prognostic variables alone) by a statistically significant amount. Discrepancies at baseline between groups may be indicative of inadequate randomization procedures. One point is awarded if baseline data are presented by group
allocation, and there is no significant difference between study groups in the key outcome(s) of interest and the measure of the severity of the condition being treated.

Blinding of all participants
Blinding involves ensuring that participants were unable to discriminate whether they had or had not received the treatment. It is acknowledged that in some physiotherapy studies it is possible to provide ‘sham’ interventions that could be perceived by participants to mimic actual interventions. In exercise training studies participant blinding is difficult to achieve, with just a few notable exceptions. For example, there have been studies that have compared exercise training to: cycling at zero load; ‘functional electrical stimulation (FES)’ or ‘sham’ and inspiratory muscle training (IMT), whereby participants can be subjected to interventions that are likely to be below the stimulus threshold to elicit physiological adaptation. Despite these attempts at ‘sham’ training, participants are usually aware of the groups to which they have been randomized (based on information that they receive when giving their consent to participate), so true blinding is almost impossible. Studies that have allocated participants to either FES or IMT intervention groups could be considered to have employed shams or placebos. We acknowledge that the reason for using sham treatments is that they can be manipulated to fall above or below the therapeutic threshold expected to elicit a physical adaptation, but feel that this is in general not feasible in exercise training studies. For this reason, unlike the PEDro scale, we do not award a point for this criterion.

Blinding of all therapists
Like in criterion 3, it is our contention that exercise training interventions do not lend themselves to blinding the administering therapist. When therapists have been blinded, the reader can be satisfied that the apparent effect (or lack of effect) of treatment was not due to the therapists’ enthusiasm or lack of enthusiasm for the treatment or control conditions. In exercise studies performed on patients with chronic disease, it is of fundamental importance that the administering therapist is fully aware of the possible effects of a given treatment. Indeed, in studies administering exercise interventions to moderate-high risk patients, it would be considered negligent on behalf of the therapist not to have obtained an a priori record of the patient’s medical history and the scope of effects that might occur in that individual as a result of the treatment. Furthermore, part of the therapist’s role is to provide motivation for patients to continue to exercise, inherently attaching the practitioner to the treatment. Collectively, we feel that the above reasons render this criterion as an inappropriate scoring criterion. For this reason, unlike the PEDro scale we do not award a point for this criterion.

Blinding of assessor (for at least one key outcome) – TESTEX Criterion 5
Whereas blinding of patients and therapists is very difficult to implement in exercise training studies, it is reasonable to expect to blind assessors (those people that conduct outcome data measurements) to the intervention allocation of the participants. When assessors have been blinded, the reader can be satisfied that the apparent effect (or lack of effect) of treatment was not due to the assessors’ biases impinging on their measures of outcomes. One point is awarded if it is stated, unambiguously, that the assessor of the primary outcome measure was blinded to group allocation. An exception to this is where studies state that measurements are completely automated, for example, measurements of blood analyses in which case the potential for human bias has been removed, 1 point can be awarded.

Common study reporting criteria
Outcome measures assessed in 85% of patients – TESTEX Criterion 6
The volume and duration of exercise training required to elicit adaptations varies with different outcome measures, exercise prescriptions and patient characteristics. However, it is generally accepted that significant changes in some measures, such as cardio-respiratory fitness, cardiac function, lipids or glycaemic control, are not immediate and that at least 1 month is required to detect changes. Due to the extended intervention periods typical of exercise training, the proportion of patients who complete the study is often less than 85% and not all of those who complete the study attend all exercise sessions (we deal with high withdrawal rates in criterion 7 on ITT). It is therefore important in exercise training studies to distinguish between exercise adherence and exercise attendance as both are relevant. For the purposes of this document we will define exercise adherence as the number of withdrawals and completions in both the study’s intervention and control groups. Exercise attendance is defined here as the percentage of target sessions completed by each individual who completes the study. Quite often more than 15% of people will withdraw from an exercise training study during the stipulated study period. Moreover, exercise attendance is less than 85% in some of the people who do not withdraw from the study. It is therefore desirable
that both the number of withdrawals in each allocation group, and also the mean and SD of the percentage exercise session attendance are reported for intervention groups. The one caveat to this would be where an alternative target, for example, kilocalories per week of exercise energy expenditure was successfully achieved. In this case a point would be awarded regardless of the number of exercise sessions completed.

We also feel that physical activity in control group patients should be monitored but we address this, and attribute a score, in criterion 10. We award 1 point if studies report exercise training adherence of at least 85%; no point will be awarded if adherence is less than 85%. We award 1 point for reporting adverse events (deaths, hospitalizations, etc. are reported). We award a point for this as the uptake of exercise therapy is almost always evaluated in terms of a balance between expected benefits and the risk of adverse events. We also award 1 point for reporting session attendance for the exercise group(s).

‘Intention-to-treat’ analysis – TESTEX Criterion 7
In this criterion, we assess if all patients for whom outcome measures were available received the treatment or control condition as allocated or, where this was not the case, data for at least one key outcome in which one is interested was analysed by ITT analysis. We actually propose that, when possible, an ITT analysis is conducted so either the last value obtained for each of the outcome measures is used as the post-intervention value, or the baseline value is used as the post-value, when a patient withdraws. The inclusion of this criterion aims to establish if certain patient demographics, clinical status, medications and so on predispose patients to withdraw. The overall aim here is to help future research and clinical services to improve exercise adherence by identifying predisposing factors that lead to study withdrawal. One point is awarded if an ITT is conducted. ITT analysis is performed by substituting the last measurement (which may be the baseline measurement) in those that did not complete the study and these data are included in analyses of those that did complete.

Between-group statistical comparisons reported – TESTEX Criterion 8
To score in this criterion, between-group statistical comparisons should be reported for the primary and at least one secondary outcome measure. If all outcomes are not reported, this would deem to be selective outcome reporting. We are primarily interested in whether intervention groups improve, but a comparison with a control group provides two things. Firstly, this will indicate whether there is regression to the mean, that is patients improve without exercise, perhaps because their medications are optimized or other reasons unrelated to the exercise intervention. Secondly, it may be that some patients who should have exercised did not and some who should not have received the intervention did. Although we attribute a score to these issues in other criteria, the between-group comparison alerts us to the fact that regression to the mean or inappropriate treatment allocations may have occurred and also whether the difference between groups is greater than can plausibly be attributed to chance. One point is awarded if the primary outcome of interest is reported, with another point awarded for the reporting of at least one secondary outcome for a total of 2 points.

Point measures and measures of variability for all reported outcome measures – TESTEX Criterion 9
Point estimates (often P values) of treatment effect only provide limited information about the outcomes of treatment and control groups. A more comprehensive approach is to also provide measures of variability. We suggest, however, that this is extended to all reported outcome measures to avoid selective outcome reporting. We award 1 point for this criterion.

Activity monitoring in control groups – TESTEX Criterion 10
The largest trial of exercise training in heart failure to date (HF-ACTION) [18] suggested that one of the reasons for a lower than expected post-intervention difference between groups occurred because approximately 30% of patients allocated to sedentary control undertook exercise training privately. It is therefore recommended that a robust study design quantifies this by making some provision for measuring activity levels in control patients to avoid crossover to exercise. This may be a simple method, such as providing patients with an activity diary or, more advanced, by providing accelerometry or heart rate monitoring devices to assess potential contamination of control group by monitoring physical activity behaviour. We award 1 point for any reporting on results of activity monitoring in sedentary control participants.

Relative exercise intensity remained constant – TESTEX Criterion 11
As patients adapt to exercise training, if the workload is kept constant, then the relative exercise intensity will continually fall as patients improve their physical work
Table 3. Inter-observer reliability, kappa (SE), between the three expert reviewers using the 15-point TESTEX criteria
TESTEX criteria | Observer 1 vs. observer 2 | Observer 2 vs. observer 3 | Observer 1 vs. observer 3
Eligibility criteria included | Constant | Constant | Constant
Randomization method stated | 1.00 (0.00) f | 1.00 (0.00) f | 1.00 (0.00) f
Allocation concealment | Constant | Constant | Constant
Groups similar at baseline | 0.48 (0.24) d | 0.61 (0.23) e | 0.32 (0.30) d
Assessor blinded | 0.69 (0.16) e | 0.69 (0.16) e | 1.00 (0.00) f
Study withdrawals <15% | 0.44 (0.33) d | 0.41 (0.21) d | 0.41 (0.21) d
Adverse events reported | 0.65 (0.18) e | 0.77 (0.15) e | 0.88 (0.11) f
Session attendance reported | 0.63 (0.18) e | 0.52 (0.20) d | 0.86 (0.14) f
Intention-to-treat analysis | 1.00 (0.00) f | 0.61 (0.24) e | 0.61 (0.24) e
Between-group primary analysis | Constant | Constant | Constant
Between-group secondary analysis | Constant | Constant | Constant
Point measures for all outcomes | Constant | Constant | Constant
Activity monitoring controls | 0.77 (0.14) e | 0.51 (0.21) d | 0.77 (0.14) e
Relative exercise intensity adjusted | 0.73 (0.17) e | 0.73 (0.17) e | 1.00 (0.00) f
Exercise energy expenditure information reported | 0.83 (0.17) f | 0.83 (0.17) f | 1.00 (0.00) f
Letters after each value denote the following level of agreement between observers [17]: a, poor; b, slight; c, fair; d, moderate; e, substantial; f, almost perfect.
Constant = 100% agreement between observers, preventing kappa analysis.
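
As a rough illustration of how an entry in Table 3 could be obtained, the sketch below computes an unweighted Cohen's kappa for two observers rating one item and maps the value to the agreement bands commonly attributed to Landis and Koch [17]. The function names and the ratings are made up for illustration. The source does not state which kappa variant or software was used, so the unweighted form is an assumption, and the standard error is not computed here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical ratings.

    Returns None when both raters give one identical constant rating, because
    expected agreement is then 1 and kappa is undefined (the 'Constant' rows).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of studies on which the two raters agree
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return None
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Map a kappa value to an agreement band (assumed cut-points)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical ratings (1 = point awarded, 0 = not awarded) for ten studies
observer_1 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
kappa = cohen_kappa(observer_1, observer_2)
print(round(kappa, 2), landis_koch_label(kappa))
```

In this made-up example the observers agree on 8 of 10 studies, chance agreement is 0.58, and kappa is therefore 0.22/0.42, which falls in the moderate band.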
exercise training in clinical populations. The TESTEX scale uses 12 criteria, with some criteria scoring more than one possible point, for a maximum score of 15 points, as compared to the PEDro scale, which uses 11 criteria for a maximum score of 10 points.

In contrast to the PEDro scale, the TESTEX scale takes eligibility criteria into account. Both scales award 1 point for the concealment of allocation. Subsequent blinding of patients and therapists is nearly always unachievable in exercise training studies and does not attract any points. The TESTEX scale expands on outcome measures including reports on study withdrawals, session attendance, intention-to-treat (ITT) analyses, reporting sedentary control crossover to exercise, periodic adjustment of exercise load so that intensity remains constant, adverse events and measurement errors, as well as description of exercise characteristics which allows a calculation of exercise volume and energy expenditure. We acknowledge that exercise training studies do not lend themselves to participant blinding and this may introduce measurement error.

The inter-observer reliability of the 15 different TESTEX items ranged between moderate and almost perfect agreement, suggesting that observers, varying in experience, can reach an acceptable level of reliability without specific training or familiarization. These findings reflect the clarity of our descriptions for each item of the TESTEX and are consistent with those reported for the PEDro scale [7]. The moderate agreements occurred consistently for the criteria of ‘Study withdrawals less than 15%’. Follow-up interviews revealed some minor oversights among the observers, whereby points were awarded to studies reporting withdrawals of only the treatment groups, and not the control groups. Given that this is clearly stated in the TESTEX criteria, and only minor disagreements were observed, there appears to be limited threat to the reliability of each of the TESTEX items.

The observers also agreed, almost perfectly, on the summated TESTEX score, with ICCs ranging from 0.91 to 0.96. In a sample of 19 studies, the typical difference between observers’ summated TESTEX scores ranged from 1 to 2 points and was not systematically different. We applied the TESTEX criteria to a previous meta-analysis performed by some of the current authors in the area of exercise intervention, accounting for a worst-case error of 2 points. An error of 2 points would not have resulted in the exclusion of research papers from the meta-analysis and would not have altered the conclusions of the study. On this basis, the worst case error of the summated TESTEX can be tolerated.

We feel that the TESTEX scale with the newly introduced criteria addresses common shortcomings in study design, quality and reporting in the exercise sciences. We are therefore confident the TESTEX scale will improve study design and reporting, thus qualifying inferences and conclusions in the exercise sciences.

Conclusion
The TESTEX scale is a new reliable tool, specific to exercise scientists, that facilitates a comprehensive review of exercise training trials.

Acknowledgements
The authors would like to acknowledge the researchers who assisted with the review procedures.
The authors report no conflicts of interest.

References
1. Batterham A, George KP. Reliability in evidence-based clinical practice: a primer for allied health professionals. Phys Ther Sport 2003; 4: 122–8.
2. Antes G. The new CONSORT statement. Br Med J 2010; 340: c1432.
3. Jadad AR, Moore RA, Carroll D, et al. Assessing the quality of reports of randomized clinical trials: is blinding necessary? Control Clin Trials 1996; 17: 1–12.
4. Higgins JPT, Green S (Eds): Cochrane handbook for systematic reviews of interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011. www.cochrane-handbook.org. [Accessed 31 July 2014]
5. Hartling L, Hamm MP, Milne A, et al. Testing the risk of bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. J Clin Epidemiol 2012; 66: 973–81.
6. Verhagen AP, de Vet HC, de Bie RA, et al. The Delphi list: a criteria list for quality assessment of randomized clinical trials for conducting systematic reviews developed by Delphi consensus. J Clin Epidemiol 1995; 51: 1235–41.
7. Maher CG, Sherrington C, Herbert RD, et al. Reliability of the PEDro scale for rating quality of randomized controlled trials. Phys Ther 2003; 83: 713–21.
8. Cornelissen VA, Buys R, Smart NA. Endurance exercise beneficially affects ambulatory blood pressure: a systematic review and meta-analysis. J Hypertens 2013; 31: 639–48.
9. Smart N, Meyer T, Butterfield J, et al. Individual patient meta-analysis of exercise training effects on systemic brain natriuretic peptide expression in heart failure. Eur J Prev Cardiol 2012; 19: 428–35.
10. Smart NA, Dieberg G, Giallauria F. Functional electrical stimulation for chronic heart failure: a meta-analysis. Int J Cardiol 2013; 167: 80–6.
11. Cornelissen VA, Smart NA. Exercise training for blood pressure: a systematic review and meta-analysis. J Am Heart Assoc 2013; 2: e004473.
12. Ismail H, McFarlane JR, Nojoumian AH, et al. Clinical outcomes and cardiovascular responses to different exercise training intensities in patients with heart failure: a systematic review and meta-analysis. J Am Coll Cardiol Heart Fail 2013; 1: 514–22.
13. Ismail H, McFarlane J, Smart NA. Is exercise training beneficial for heart failure patients taking beta-adrenergic blockers? A systematic review and meta-analysis. Congest Heart Fail 2013; 19: 61–9.
14. Smart NA, Dieberg G, Giallauria F. Intermittent versus continuous exercise training in chronic heart failure: a meta-analysis. Int J Cardiol 2013; 166: 352–8.
15. Smart NA, Giallauria F, Dieberg G. Efficacy of inspiratory muscle training in chronic heart failure patients: a systematic review and meta-analysis. Int J Cardiol 2013; 167: 1502–7.
16. Fleiss J, Cohen J. The equivalence of weighted kappa and the interclass correlation coefficient as measures of reliability. Educ Psychol Measurement 1973; 33: 613–9.
17. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159–74.
18. O’Connor CM, Whellan DJ, Lee KL, et al. Efficacy and safety of exercise training in patients with chronic heart failure: HF-ACTION randomized controlled trial. J Am Med Assoc 2009; 301: 1439–50.