
Received: 13 November 2018 Revised: 30 March 2019 Accepted: 2 July 2019

DOI: 10.1002/jrsm.1369

RESEARCH ARTICLE

The value of a second reviewer for study selection in systematic reviews

Carolyn R.T. Stoll1 | Sonya Izadi1 | Susan Fowler2 | Paige Green3 | Jerry Suls3 | Graham A. Colditz1

1 Division of Public Health Sciences, Department of Surgery, Washington University School of Medicine, Saint Louis, Missouri, USA
2 Brown School, Washington University School of Medicine, Saint Louis, Missouri, USA
3 Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, Bethesda, Maryland, USA

Correspondence
Carolyn R.T. Stoll, MPH, MSW, Division of Public Health Sciences, Department of Surgery, 660 South Euclid Ave, Campus Box 8100, Saint Louis, MO 63110, USA.
Email: carolyn.stoll@wustl.edu

Funding information
Alvin J. Siteman Cancer Center Biostatistics Shared Resource, Grant/Award Number: P30 CA091842; Foundation for Barnes-Jewish Hospital, St Louis; National Cancer Institute, National Institutes of Health, Grant/Award Number: HHSN261200800001E

Background: Although dual independent review of search results by two reviewers is generally recommended for systematic reviews, there are not consistent recommendations regarding the timing of the use of the second reviewer. This study compared the use of a complete dual review approach, with two reviewers in both the title/abstract screening stage and the full-text screening stage, as compared with a limited dual review approach, with two reviewers only in the full-text stage.

Methods: This study was performed within the context of a large systematic review. Two reviewers performed a complete dual review of 15 000 search results and a limited dual review of 15 000 search results. The number of relevant studies mistakenly excluded by highly experienced reviewers in the complete dual review was compared with the number mistakenly excluded during the full-text stage of the limited dual review.

Results: In the complete dual review approach, an additional 6.6% to 9.1% of eligible studies were identified during the title/abstract stage by using two reviewers, and an additional 6.6% to 11.9% of eligible studies were identified during the full-text stage by using two reviewers. In the limited dual review approach, an additional 4.4% to 5.3% of eligible studies were identified with the use of two reviewers.

Conclusions: Using a second reviewer throughout the entire study screening process can increase the number of relevant studies identified for use in a systematic review. Systematic review performers should consider using a complete dual review process to ensure all relevant studies are included in their review.

KEYWORDS
eligibility screening, review methods, search strategy, study selection

1 | INTRODUCTION

When performing a systematic review, the importance of study selection cannot be overstated. Decisions about which studies to include are considered among the most significant decisions made during the review process.1,2 The quality of the study selection process is dependent on two factors, the formation of specific and clear eligibility criteria and the systematic implementation of these criteria against each record found in the search process.

As a comprehensive search strategy can result in thousands of results that must be screened, the process of screening search results against eligibility criteria can require significant time and resources. The best method for study

Res Syn Meth. 2019;1–7. wileyonlinelibrary.com/journal/jrsm © 2019 John Wiley & Sons, Ltd. 1
2 STOLL ET AL.

screening is one that allows for a high level of accuracy, ensuring that no relevant studies are mistakenly excluded, with as much efficiency as possible. Studies can be mistakenly excluded during the screening process due to a misapplication or misunderstanding of eligibility criteria or due to random error of the screener. In order to reduce this potential for missed studies, it is commonly recommended that two (or more) screeners undertake the screening process. The Agency for Healthcare Research and Quality (AHRQ), the Center for Reviews and Dissemination (CRD), the Institute of Medicine (IOM), and the Cochrane Collaboration all recommend using two or more members of the review team, working independently, to screen studies.1-4 The IOM specifically notes that doubling the number of screeners requires significant time and resources but that the additional expense is justified in order to reduce bias and errors.2 A previous study explored the impact of using dual reviewers as compared with a single reviewer and found that the average increase in eligible studies identified using two reviewers was 9%, ranging from 0% to 32%, suggesting a notable impact by the second reviewer.5

Additionally, these groups are consistent in recommending that study screening be performed in a two-stage process, in which titles and abstracts are screened first, followed by full-text study reports. However, these groups do not explicitly recommend if the additional reviewer should be involved in both stages. Cochrane and AHRQ address this briefly. Cochrane suggests that adding a second reviewer at the full-text stage may be sufficient, saying that “Authors must first decide if more than one of them will assess the titles and abstracts of records retrieved from the search … It is most important that the final selection of studies into the review is undertaken by more than one author.”1 AHRQ states that “Some form of dual review should be done at each stage”; however, they suggest alternatives to dual review in the title/abstract stage such as having the second reviewer only review the first reviewer's exclusions, or only conducting dual review on a small percentage of the records in a pilot phase in order to resolve any confusion, and then going on to single review only for the remainder of the title/abstract phase.3 However, while performing a pilot phase may help to reduce error due to unclear or misunderstood eligibility criteria, it is unlikely to prevent all error, and it will not prevent random error by the reviewer, so is unlikely to be sufficient6 to improve accuracy and prevent mistakenly excluded studies.

Using a second reviewer in the screening process represents a significant amount of resources. Using a second reviewer only in the full-text stage of screening may be a way to reduce resources necessary, while still maintaining a lower level of bias. A previous study explored various methods of study selection using a cost-effectiveness analysis and found that using two reviewers in the title/abstract stage and moving all records marked by at least one reviewer as eligible, rather than resolving disagreements, was equally effective and less costly than traditional double screening in a systematic review of effects of undergraduate medical education in UK general practice settings.6 However, they conclude that effectiveness of different screening methods is likely to vary between systematic reviews.

We set out to compare two methods of study selection, using dual independent reviewers throughout the title/abstract and full-text stages of the screening process versus using dual independent reviewers only in the full-text stage in the context of a systematic review exploring representation of multimorbidity in behavioral intervention randomized controlled trials.7 The objective of this study is to identify if using dual reviewers throughout the entire study screening process produces a clear benefit over using dual reviewers only at the full-text screening stage in a large systematic review.

2 | METHODS

2.1 | Study setting

This study took place in the context of a large systematic review evaluating the inclusion of participants with multiple chronic conditions in randomized trials of behavioral health interventions.7 The methods and results of this systematic review are reported separately,7 but summarized briefly here. The eligibility criteria of the systematic review were (a) the primary report of an RCT testing the efficacy or effectiveness of behavioral interventions, (b) the study reports original data (protocols, post-trial follow-up studies, and secondary or separate subgroup analyses were excluded), (c) the RCT targets chronic illness, (d) the RCT applied eligibility criteria at the individual level, (e) the trial was published in English, and (f) the RCT enrolled only adult subjects (18 y or older).

The search strategy of the systematic review was designed to be broad in order to identify all published RCTs in adults that test behavioral health interventions and target chronic illness. Due to this broad search strategy, this systematic review involved a large number of search results that provided the ideal setting for the current study. This search produced 343 123 records of potentially relevant reports. After removing duplicate records, 190 555 records remained.

After the search was performed, a sampling strategy was used to produce a representative sample of literature of behavioral intervention RCTs targeting participants with chronic conditions published from 2000 to 2014. This was done by randomly ordering search results (within three time

periods, 2000-2004, 2005-2009, and 2010-2014) using the RAND function in Microsoft Excel and performing study selection on the randomly ordered results within each time period until the target sample size (200 studies per time period, 600 studies total) was reached.

For purposes of the current study, the first 15 000 records from the randomly ordered search results (5000 per time period) were used for the complete dual review approach, and the next 15 000 records of the randomly ordered search results (5000 per time period) were used for the limited dual review approach. Two experienced reviewers (C.S. and S.I.) took part in this study. Both reviewers were involved in the study design of the systematic review and the definition of the eligibility criteria. The reviewers went through a pilot process with the eligibility criteria prior to starting the study to ensure they had a similar understanding of the eligibility criteria.

2.2 | Complete dual review approach

During the first approach of the study (Figure 1), reviewers fully performed a dual independent review of all records. Reviewers first independently screened studies by title/abstract and compared results. Records were excluded if both reviewers had excluded them. Records were moved to full-text screening if both reviewers indicated they should be kept. Records for which reviewers had opposing decisions were reviewed again together, and a consensus was made to exclude them or move them to full-text screening.

Reviewers then independently screened identical lists of studies by reading the full-text of each study report and applying eligibility criteria. After the screenings were complete, results were compared. Records were excluded if both reviewers had excluded them. Records were included in the review if both reviewers had included them. Records for which reviewers had opposing decisions were reviewed again together and a consensus was formed.

This review process was completed in three sets of 5000 records (total of 15 000 records), with comparison between the results of the two reviewers at the end of each set.

2.3 | Limited dual review approach

In the second approach of the study (Figure 1), reviewers performed a limited dual review of records. Records were assigned to each reviewer in alternating groups of 2500 (total of 15 000 records) such that each title/abstract was reviewed by only one reviewer. Decisions made by the sole reviewer regarding exclusion or moving of the record to full-text screening were considered final.

Studies indicated for full-text review by solo review were then independently dually reviewed, following the same full-text review process as in the first approach of the study.

2.4 | Analysis

2.4.1 | Complete dual review

In order to quantify the benefit of the use of a second reviewer in each stage, analysis was performed to identify records in each stage that were mistakenly excluded by a reviewer. Screening results from the title/abstract stage were appraised to identify records that were originally selected for inclusion by one reviewer and exclusion by the other reviewer and then after discussion between reviewers were moved to full-text screening. Only records that were excluded by one reviewer but eventually included in the systematic review were considered to have been mistakenly excluded. Records that were originally disagreed on but were not eventually included in the review were not counted. The full-text screening stage was appraised in a similar way to identify which studies that were eventually included in the review had been mistakenly excluded by each reviewer. The records mistakenly excluded in both the title/abstract stage and full-text stage were identified and subtracted from the total number of records mistakenly excluded to determine the total number of unique records mistakenly excluded by a reviewer during the complete dual review approach.

2.4.2 | Limited dual review

The full-text screening results of the limited dual review approach were analyzed in the same way as the full-text

FIGURE 1 Comparison of the complete dual review approach and the limited dual review approach

screening results of the complete dual review to identify which studies that were eventually included in the review had been mistakenly excluded by each reviewer.

2.4.3 | Comparison of approaches

Total number of mistakenly excluded unique records by reviewers (mistakenly excluded vs not mistakenly excluded) in each approach (complete dual vs limited dual) were compared using a chi-square test of independence. Only unique records mistakenly excluded in the complete dual review approach were counted so that a record that was mistakenly excluded in both the title/abstract screening stage and the full-text screening stage was only counted once.

2.4.4 | Comparison of results within complete dual screening approach

Number of unique records mistakenly excluded in each set of 5000 records screened during the complete dual review approach was calculated to explore bias over time, i.e., if reviewers improved their accuracy of screening throughout the approach. Total number of mistakenly excluded unique records by reviewers (mistakenly excluded vs not mistakenly excluded) in each set of 5000 results screened using the complete dual review approach (first set of 5000 vs second set of 5000 vs third set of 5000) were compared using a chi-square test of independence.

3 | RESULTS

Figure 2 summarizes the results of the analysis.

3.1 | Complete dual review approach

Using the complete dual review approach, a total of 15 000 title/abstract records were screened. Of these records, 242 study reports were ultimately included in the systematic review. During the title/abstract screening, a total of 810 records were identified by at least one reviewer as potentially meeting eligibility criteria. After comparison of results and discussion of disagreements, 507 records were moved to full-text review stage. During this stage, the reviewers mistakenly excluded 22 (9.1%) and 16 (6.6%) records of the 242 study reports that eventually were included in the systematic review.

During the full-text review stage, 507 records were screened. Of these, 242 were ultimately included in the systematic review. During this stage, the reviewers mistakenly excluded 29 (11.9%) and 16 (6.6%) records of the 242 study reports that eventually were included in the systematic review. Overall, 51 (0.3%) and 32 (0.2%) of 15 000 records were identified as being mistakenly excluded in at least one stage by a reviewer, for a total of 83 across both stages. However, 17 of these were mistakenly excluded in both the title/abstract stage and the full-text stage, leaving 66 (0.4%) unique records mistakenly excluded of 15 000 screened using the complete dual review approach.

3.2 | Limited dual review approach

During the limited dual review approach, a total of 15 000 title/abstract records were screened. Of these 15 000 records, 515 records were moved to the full-text review stage by a single reviewer. These 515 records were then dually reviewed by two reviewers, and ultimately, 226 full-text study reports were included in the review. During the full-

FIGURE 2 Results using the complete dual review approach and the limited dual review approach

text review stage, reviewers mistakenly excluded 10 (4.4%) and 12 (5.3%) records of the 226 study reports that eventually were included in the systematic review. Overall, 22 (0.2%) of 15 000 records were mistakenly excluded in the full-text stage by a reviewer using the limited dual review approach.

3.3 | Comparison of selection approaches

A chi-square test of independence was performed to examine the relation between review process (complete dual vs limited dual) and number of mistakenly excluded study reports (mistakenly excluded vs not mistakenly excluded). The relation between screening approach and number of mistakenly excluded study reports was significant ($\chi^2_1 = 22.065$, $P < .05$). Reviewers were more likely to identify mistakenly excluded relevant study reports through the complete dual review process (0.4%) than the limited dual review process (0.2%).

3.4 | Comparison of results within complete dual review approach

When considering each set of 5000 results used in the complete dual screening approach, reviewers mistakenly excluded 16 (0.3%) of 5000, 22 (0.4%) of 5000, and 26 (0.5%) of 5000 results. A chi-square test of independence was performed to examine the relation between set of 5000 results screened (first set of 5000 vs second set of 5000 vs third set of 5000) and number of mistakenly excluded unique records by reviewers (mistakenly excluded vs not mistakenly excluded) using the complete dual review approach. The relation between set of 5000 and number of mistakenly excluded unique study reports was not statistically significant ($\chi^2_2 = 2.385$, $P = .30$).

4 | DISCUSSION

This study compared the use of a complete dual review screening process, with independent dual review at both the title/abstract stage and the full-text stage, to a limited dual review screening process, with single review at the title/abstract stage and dual review at the full-text stage, within the context of a large systematic review of randomized controlled trials of behavioral interventions in participants with chronic conditions.7 Results show that a complete dual review screening process can result in identifying a larger number of eligible studies than a limited dual screening process. Our results confirm the conclusions of a previous study that including a second reviewer in the screening process can result in significant impact5 and contribute to the evidence from previous studies6 of the effects of the timing of the use of the second reviewer.

It is reasonable that a complete dual review process would result in increased thoroughness in selection of studies for a systematic review. Random error, particularly when screening such a large number of search results, can occur at either stage in the screening process, so it is logical that a complete dual review process would help to prevent this error from affecting the inclusion of relevant study reports in a systematic review. Given the potential for random error, other strategies recommended to improve completeness of study selection in a systematic review, such as conducting dual review on a small percentage of search results only in a pilot phase before proceeding to single review of title/abstract,3 are unlikely to result in the level of precision that can be reached with a complete dual review process.

Our results differ from those of Shemilt et al,6 which reported similar rates of effectiveness (number of inappropriate exclusions avoided) across each method of study screening, including what they refer to as double screening and single screening, which are similar to our two approaches. There are several possible explanations for this different result. Potentially, the two systematic reviews that provided the context for these studies differed in complexity of eligibility criteria, or the different research questions resulted in different types of title/abstract records or full-text reports that may be more or less difficult to screen properly. The systematic review that provided the context for this study had a broad study question and sought to include a diverse set of studies in type of both behavioral intervention being tested and participant population targeted.7 Other systematic reviews on more focused questions aim to include a more specific set of study reports that will greatly impact how the eligibility criteria are defined, what type and variety of records are found through the search strategy, and therefore the entire study selection process. Continued investigation of these screening methods in other systematic reviews differing in type or context is necessary to provide additional evidence guiding systematic reviewers on which screening methods should be used.

This study represents an addition to the limited evidence regarding the effectiveness of different study screening approaches in systematic reviews. The systematic review during which this study was performed provided a unique opportunity to perform this study given the large number of search results that we needed to screen.

However, there are several limitations of the study. A major limitation is that our study only focuses on effectiveness of the two approaches in reducing the number of records mistakenly excluded and does not directly address the cost-effectiveness of each method.
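As a check on the arithmetic in Section 3.1 and the chi-square tests in Sections 3.3 and 3.4, the statistics can be recomputed from the counts reported in the Results. The following is a minimal sketch, not the authors' analysis code (the article does not state which software was used). The helper names `chi_square_independence` and `survival_probability` are hypothetical, the test is assumed to be a Pearson chi-square without continuity correction, and the closed-form P values used here hold only for 1 and 2 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic (no continuity correction) and its
    degrees of freedom for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def survival_probability(statistic, dof):
    """Upper-tail P value of the chi-square distribution.
    Closed forms exist for dof 1 and 2, which is all this sketch needs."""
    if dof == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if dof == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


# Section 3.1: mistaken exclusions per reviewer, summed over both stages,
# minus the 17 records mistakenly excluded in both stages.
total_mistaken = (22 + 16) + (29 + 16)
unique_complete = total_mistaken - 17

# Section 3.3: rows are approaches, columns are
# (mistakenly excluded, not mistakenly excluded) out of 15 000 records.
approach_table = [
    [unique_complete, 15000 - unique_complete],  # complete dual review
    [22, 15000 - 22],                            # limited dual review
]
stat_approach, dof_approach = chi_square_independence(approach_table)
p_approach = survival_probability(stat_approach, dof_approach)

# Section 3.4: per-set counts taken exactly as reported (16, 22, 26 of 5000).
set_table = [[n, 5000 - n] for n in (16, 22, 26)]
stat_sets, dof_sets = chi_square_independence(set_table)
p_sets = survival_probability(stat_sets, dof_sets)

print(f"unique mistaken exclusions (complete dual): {unique_complete}")
print(f"approaches: chi2({dof_approach}) = {stat_approach:.3f}, P = {p_approach:.2g}")
print(f"sets:       chi2({dof_sets}) = {stat_sets:.3f}, P = {p_sets:.2f}")
```

Under these assumptions the statistics come out at about 22.065 and 2.385, matching the reported values, which suggests that no continuity correction was applied in the 2 × 2 comparison.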

There are also limitations of our methods. If studies were misclassified by both reviewers, it is possible that the number of studies mistakenly excluded by reviewers is underestimated. Additionally, the complete dual review process approach was performed before the limited dual review process approach, creating the possibility of bias over time as reviewers became more experienced in the screening process and improved their precision in identifying relevant studies. This is one possible explanation for the higher number of studies mistakenly excluded by reviewers in the full-text screening of the complete dual review process approach as compared with the limited dual review process approach. However, in order to reduce this potential bias, the complete dual review process approach was performed in three sets of 5000 search records. After each set of 5000 records, reviewers compared titles/abstracts chosen, agreed on the records for full-text screening, performed the full-text screening stage, and agreed on the studies to be included in the systematic review. This allowed the two reviewers to improve their process and potentially improve their precision within the complete dual review process approach, which would not have been possible had we performed the complete dual review process on the entire 15 000 records before comparing between reviewers. Our results comparing across sets of 5000 records used in the complete dual review approach show no statistically significant difference in number of mistakenly excluded study reports, suggesting that although reviewers may improve at study screening over time, it is not at a level that would impact our results. This adds to our confidence in our overall conclusion, that the complete dual review approach identifies more mistakenly excluded study reports than the limited dual review approach, even while acknowledging the potential bias by performing one approach before the other.

It is also possible that dual screening in the title/abstract stage could affect the number of studies mistakenly excluded by dual review in the full-text review stage. Potentially, some of the records that were prevented from being mistakenly excluded by the use of a second reviewer in the title/abstract stage may have been study reports that were less straightforward in meeting the eligibility criteria, creating a more complicated process for full-text review if records were more difficult to assess using the eligibility criteria. A reviewer may be more likely to have rejected these mistakenly during the full-text stage, providing more opportunity for the second reviewer to identify mistakenly excluded records as compared with the limited dual review process that may have started with a group of records that more obviously fit eligibility criteria.

However, the study design also allows us to compare independent dual review at the title/abstract stage to the full-text stage within the 15 000 records used in the complete dual review process, which is less likely to be impacted by these limitations. As additional studies were identified in the title/abstract stage beyond those identified in the full-text stage alone, this confirms that dual review at the title/abstract stage results in finding additional relevant studies.

Further studies should be performed to confirm the findings of this study in the context of other systematic reviews on different topics and with varying complexity of eligibility criteria. Exploration of whether there are certain characteristics of study reports that made them more likely to be mistakenly excluded, such as publication in non-English journals, could provide helpful context for how to improve the study selection process. Additionally, studies should evaluate other strategies recommended to improve precision of study selection in a systematic review, such as having a second reviewer confirm the decisions of the first reviewer in the title/abstract stage, as opposed to performing fully independent reviews.3 Novel methods for performing study selection in systematic reviews, such as automating the process,8-12 have been proposed and have shown promise in providing a reduction in workload but also a reduction in precision.6,13 These methods need further comparison in precision to the complete dual review process.

Although it may be tempting for systematic review performers to use a limited dual review process to save time and resources, this study provides evidence that a complete dual review process will increase the precision of study selection in systematic reviews. Given the importance of a precise study selection process for the quality and results of a systematic review, review performers should carefully consider the potential impact of performing anything other than a complete dual review process for study selection.

FUNDING

This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract no. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. Stoll, Izadi, and Colditz are supported, in part, by the Foundation for Barnes-Jewish Hospital, St Louis. Colditz is also supported by the Alvin J. Siteman Cancer Center Biostatistics Shared Resource, P30 CA091842.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

DATA SHARING

The data that support the findings of this study are available from the corresponding author upon reasonable request.

ORCID

Carolyn R.T. Stoll https://orcid.org/0000-0001-6951-4219
Paige Green https://orcid.org/0000-0001-7886-8924
Jerry Suls https://orcid.org/0000-0002-8436-7488
Graham A. Colditz https://orcid.org/0000-0002-7307-0291

REFERENCES

1. Higgins JP, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011.
2. Morton S, Berg A, Levit L, Eden J. Finding What Works in Health Care: Standards for Systematic Reviews. Washington, DC: National Academies Press; 2011.
3. McDonagh M, Peterson K, Raina P, Chang S, Shekelle P. Avoiding Bias in Selecting Studies. Rockville (MD): Agency for Health Care Research and Quality; 2013.
4. University of York Centre for Reviews and Dissemination. Systematic Reviews: CRD's Guidance for Undertaking Reviews in Health Care. York: University of York, Centre for Reviews & Dissemination; 2009.
5. Edwards P, Clarke M, DiGuiseppi C, Pratap S, Roberts I, Wentz R. Identification of randomized controlled trials in systematic reviews: accuracy and reliability of screening records. Stat Med. 2002;21(11):1635-1640.
6. Shemilt I, Khan N, Park S, Thomas J. Use of cost-effectiveness analysis to compare the efficiency of study identification methods in systematic reviews. Syst Rev. 2016;5(1):140.
7. Stoll CR, Izadi S, Fowler S, et al. Multimorbidity in randomized controlled trials of behavioral interventions: a systematic review. Health Psychol. In press;2019. http://dx.doi.org/10.1037/hea0000726
8. Olofsson H, Brolund A, Hellberg C, et al. Can abstract screening workload be reduced using text mining? User experiences of the tool Rayyan. Res Synth Methods. 2017;8(3):275-280.
9. Paynter R, Bañez LL, Berliner E, et al. EPC Methods: An Exploration of the Use of Text-Mining Software in Systematic Reviews. Rockville (MD): Agency for Healthcare Research and Quality; 2016 16-EHC023-EF.
10. Przybyła P, Brockmeier AJ, Kontonatsios G, et al. Prioritising references for systematic reviews with RobotAnalyst: a user study. Res Synth Methods. 2018;9(3):470-488.
11. Thomas J, McNaught J, Ananiadou S. Applications of text mining within systematic reviews. Res Synth Methods. 2011;2(1):1-14.
12. Wallace BC, Small K, Brodley CE, et al. Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining. Genet Med. 2012;14(7):663-669.
13. O'Mara-Eves A, Thomas J, McNaught J, Miwa M, Ananiadou S. Using text mining for study identification in systematic reviews: a systematic review of current approaches. Syst Rev. 2015;4(1):5.

How to cite this article: Stoll C, Izadi S, Fowler S, Green P, Suls J, Colditz GA. The value of a second reviewer for study selection in systematic reviews. Res Syn Meth. 2019;1–7. https://doi.org/10.1002/jrsm.1369
