

Perspective
https://doi.org/10.1038/s41559-020-01295-x

Eight problems with literature reviews and how to fix them

Neal R. Haddaway1,2,3 ✉, Alison Bethel4, Lynn V. Dicks5,6, Julia Koricheva7, Biljana Macura1, Gillian Petrokofsky8, Andrew S. Pullin9, Sini Savilaakso10,11 and Gavin B. Stewart12

1Mercator Research Institute on Climate Change and Global Commons, Berlin, Germany. 2Stockholm Environment Institute, Stockholm, Sweden. 3Africa Centre for Evidence, University of Johannesburg, Johannesburg, South Africa. 4College of Medicine and Health, Exeter University, Exeter, UK. 5Department of Zoology, University of Cambridge, Cambridge, UK. 6School of Biological Sciences, University of East Anglia, Norwich, UK. 7Department of Biological Sciences, Royal Holloway University of London, Egham, UK. 8Department of Zoology, University of Oxford, Oxford, UK. 9Collaboration for Environmental Evidence, UK Centre, School of Natural Sciences, Bangor University, Bangor, UK. 10Liljus ltd, London, UK. 11Department of Forest Sciences, University of Helsinki, Helsinki, Finland. 12Evidence Synthesis Lab, School of Natural and Environmental Sciences, University of Newcastle, Newcastle-upon-Tyne, UK. ✉e-mail: neal_haddaway@hotmail.com

Traditional approaches to reviewing literature may be susceptible to bias and result in incorrect decisions. This is of particular concern when reviews address policy- and practice-relevant questions. Systematic reviews have been introduced as a more rigorous approach to synthesizing evidence across studies; they rely on a suite of evidence-based methods aimed at maximizing rigour and minimizing susceptibility to bias. Despite the increasing popularity of systematic reviews in the environmental field, evidence synthesis methods continue to be poorly applied in practice, resulting in the publication of syntheses that are highly susceptible to bias. Recognizing the constraints that researchers can sometimes feel when attempting to plan, conduct and publish rigorous and comprehensive evidence syntheses, we aim here to identify major pitfalls in the conduct and reporting of systematic reviews, making use of recent examples from across the field. Adopting a 'critical friend' role in supporting would-be systematic reviewers, and avoiding individual responses to police use of the 'systematic review' label, we go on to identify methodological solutions to mitigate these pitfalls. We then highlight existing support available to avoid these issues and call on the entire community, including systematic review specialists, to work towards better evidence syntheses for better evidence and better decisions.

The aims of literature reviews range from providing a primer for the uninitiated to summarizing the evidence for decision making1. Traditional approaches to literature reviews are susceptible to bias and may result in incorrect decisions2,3. This can be particularly problematic when reviews address applied, policy-relevant questions, such as human impact on the environment or the effectiveness of interventions, where there is a need for review results to provide a high level of credibility, accountability, transparency and objectivity, or where there is a large or disparate evidence base, or controversy and disagreement amongst existing studies. Instead, rigorous approaches to synthesizing evidence across studies may be needed, that is, systematic reviews.

Systematic review is a type of research synthesis that relies on a suite of evidence-based methods aimed at maximizing rigour and minimizing susceptibility to bias. This is achieved by attempting to increase the comprehensiveness, transparency and procedural objectivity of the review process4. The methods involved are outlined in Fig. 1 (see also refs. 2,5).

Systematic reviews were originally developed in the fields of social science and healthcare and have had a transformative effect, particularly in health, where they underpin evidence-based medicine6. The introduction of systematic reviews into medicine was facilitated by Cochrane, the review coordinating body that sets standards and guidance for systematic reviews of healthcare interventions (https://www.cochrane.org/). Systematic reviews are now increasingly published in other fields, with the Collaboration for Environmental Evidence (CEE) established in 2008 to act as the coordinating body supporting efforts in the field of conservation and environmental management (see http://www.environmentalevidence.org).

Towards a better understanding of rigour in evidence synthesis
Despite the increasing popularity of systematic reviews in the environmental field, evidence synthesis methods continue to be poorly applied in practice, resulting in the publication of syntheses that are highly susceptible to bias. In one assessment by O'Leary et al.7, a set of 92 environmental reviews published in 2015 was judged to be poorly conducted and reported (a median score of 2.5 out of a possible 39 using the Collaboration for Environmental Evidence Synthesis Appraisal Tool (CEESAT)8). Substandard reviews could provide misleading findings, potentially causing harm and wasting valuable resources in research, policy and practice. Furthermore, these reviews could erode trust in evidence synthesis as an academic endeavour.

Substantial support exists to help raise the rigour of evidence synthesis toward the recognized standards of systematic reviews: a range of Open Access methodological guidance and standards exists both across subjects9,10 and in the field of conservation and environment5. Methods for peer-reviewing and critically appraising the rigour of systematic reviews are also freely available8,11. Open Educational resources in evidence synthesis methodology exist online (for example, ref. 1 and https://synthesistraining.github.io/). There are free-to-use, online platforms designed to support the methodology, such as SysRev (https://sysrev.com). Finally, an open and willing community of practice consisting of hundreds of methodologists exists in the field of conservation and environment (CEE, www.environmentalevidence.org), as it does in social policy (the Campbell Collaboration, www.campbellcollaboration.org) and healthcare (Cochrane, www.cochrane.org).


[Fig. 1 | Schematic showing the main stages necessary for the conduct of a systematic review as defined by the Collaboration for Environmental Evidence (see www.environmentalevidence.org). Stages: question formulation (objectives); protocol (ideally, peer-reviewed and published); searching; article screening; data extraction; critical appraisal; synthesis; final review (peer-reviewed and published); communication.]

That said, the lack of awareness of, and adherence to, internationally accepted minimum standards and best practices in evidence synthesis in the field of conservation and environment demonstrates that more must be done to support the publication of reliable syntheses. Despite all these clear international standards and freely accessible and abundant guidance for systematic reviews, review articles are frequently published that claim to be 'systematic reviews' because they have employed some elements of the method, but that fall substantially short of the standard12. In sum, we see two related issues when considering the rigour of evidence syntheses. Firstly, most published evidence reviews are poorly conducted. Secondly, those that describe themselves as 'systematic reviews' imply an increased level of rigour, and where this is not true (that is, the authors have failed to adequately follow accepted standards), confusion occurs over what the term 'systematic review' really means.

Here, we describe issues affecting all evidence reviews and encourage review authors to transparently report their methods so that the reader can judge how systematic they have been. We do not believe that all reviews should be 'systematic reviews'; for example, 'primers' or overviews of a novel topic, or reviews that combine concepts, do not seek to be comprehensive, rigorous or definitive in influencing policy. However, we do believe that all reviews can benefit from applying some of these best practices in systematic approaches, with transparency perhaps being the least costly to operationalize.

We understand the resource and time constraints faced by review authors, and we appreciate the costs involved in attempting to produce and publish rigorous evidence syntheses. However, we do believe that the reliability of reviews intended to inform policy is a serious scientific and social issue that could be substantially improved if the research community were to fully embrace rigorous evidence synthesis methods, committing to raise awareness across the board. We also know that this can be achieved incrementally, progressively increasing the standard of reviews produced over time, and without necessarily breaking the bank when it comes to resources and funding.

Recognizing the constraints that researchers can sometimes face when attempting to plan, conduct and publish rigorous and comprehensive evidence syntheses, we aim here to identify major pitfalls in the conduct and reporting of systematic reviews, making use of recent examples from across the field. Adopting a 'critical friend' role of supporting potential systematic reviewers, we go on to identify methodological solutions to mitigate these pitfalls. We then highlight existing support available to avoid these issues. Finally, we describe key intervention points where the conservation and environmental management communities, including funders, review authors, editors, peer-reviewers, educators, and us as methodologists, can act to avoid problems associated with unreliable and substandard reviews.

Eight problems, eight solutions
In the following section, we use recent examples of literature reviews published in the field of conservation and environmental science to highlight eight major limitations and sources of bias in evidence synthesis that undermine reliability. We describe each problem and provide potential mitigation solutions in turn. The problems, examples and solutions for different actors are outlined in the Supplementary Information.

1. Lack of relevance (limited stakeholder engagement). Description. Taking a broad definition of stakeholders (including any individual or group who is affected by or may affect the review and its findings13), all reviews whose results will be used either to shape an academic field or to inform policy or practice decision making should involve some degree of stakeholder engagement. Doing so can improve review effectiveness, efficiency and impact14,15. In some 'public goods' reviews (that is, those published and intended for a wide audience16), however, authors do not adequately engage with relevant stakeholders. This may result in the scope of the review being of limited practical relevance to researchers and decision-makers. It may also result in the review using definitions of key concepts and search terms that are not broadly accepted or appropriate, limiting acceptance and uptake of the review's findings, or producing an inaccurate or biased selection of literature. This may result from a lack of coherence within the stakeholder communities themselves. Stakeholder engagement in evidence synthesis is an opportunity to attempt to resolve these issues, however, providing broad benefits to the wider science-policy and -practice community.

Example. In conducting a systematic review on the impacts of palm oil production on biodiversity, Savilaakso et al.17 contacted recognized experts and key stakeholders as outlined in the protocol18. Although the authors contacted company representatives, in retrospect the stakeholder engagement was not broad enough. After publication of the review, the Malaysian palm oil industry criticized the review for its narrow focus on biodiversity and its failure to include poverty impacts. Broader stakeholder engagement could have alleviated the problem by explaining the purpose of the review (that is, a review of existing knowledge as a starting point for research proposals related to land use) and/or it could have led to a broader review inclusive of social impacts.

Mitigation strategies. Stakeholder engagement can require substantial resources if reviewers aim for it to be comprehensive and to include physical meetings, particularly on contentious topics. However, stakeholders can readily be identified, mapped and contacted for feedback and inclusion without the need for extensive budgets. Reviewers could, as a minimum, attempt to identify important minority or marginalized groups and then engage with key groups remotely, asking for feedback on a brief summary of the planned review by e-mail14,19. This should be described in the review report.


2. Mission creep and lack of a protocol. Description. Mission creep occurs when the review deviates from its initial objectives. Key definitions, search strategies and inclusion or appraisal criteria may alter over time or differ between reviewers. The resultant set of articles will then not be representative of the relevant evidence base and important studies may have been omitted. As a result, the review may be highly inaccurate and misleading, and will be unrepeatable. A priori protocols minimize bias, allow constructive feedback before mistakes in review methodology are made, allow readers to verify methods and reporting, and act as a within-group roadmap of methods during conduct of the review. Reviews that lack protocols preclude this clarity and verifiability. This is similar to the 'pre-registering' of primary research in some fields, where methodological plans are published, date-stamped, versioned and unalterable.

Example. In their review of insect declines, Sánchez-Bayo and Wyckhuys20 failed to provide a protocol and succumbed to mission creep. They did so by initially focusing on drivers of insect decline, as described in the objectives, but shifting to generalize about insect populations across all species, not just those declining. Their searches focused exclusively on studies identifying declining populations, but their conclusions purportedly relate to all insect populations. Similarly, Agarwala and Ginsberg21 reviewed the tragedy of the commons and common-property resources but failed to provide a protocol that would justify the choice of search terms and clarify the criteria for selecting studies for the review.

Mitigation strategies. Review authors should carefully design an a priori protocol that outlines planned methods for searching, screening, data extraction, critical appraisal and synthesis in detail. This should ideally be peer-reviewed and published (journals such as Environmental Evidence, Ecological Solutions and Evidence and Conservation Biology now accept registered reports/protocols, and protocols can be stored publicly on preprint servers such as Open Science Framework Preprints (https://osf.io/preprints)), and may benefit substantially from stakeholder feedback (see point 1 above). Occasionally, deviations from the protocol are necessary as evidence emerges, and these must be detailed and justified in the final report.

3. Lack of transparency/replicability (inability to repeat the study). Description. An ability to repeat a review's methods exactly (also referred to as 'replicability') is a central tenet of the scientific method22, and the methods used to produce reviews should be reported transparently in sufficient detail to allow the review to be replicated or verified23. If the reader can understand neither how studies were identified, selected and synthesized, nor which were excluded, the risk of bias cannot be assessed, and unclear subjective decisions may affect reliability. Unreplicable reviews cannot truly be trusted, since mistakes may have been made during their conduct. In addition, unreplicable reviews have limited legacy, since they cannot be upgraded or updated, and differences in outcomes between several reviews on the same topic cannot be reconciled. Ultimately, unreplicable reviews erode trust in evidence synthesis as a discipline, creating a barrier to evidence-informed policy. Similarly, a lack of transparency in reporting what was found (that is, raw study data, summary statistics and analytical code) prevents analytical replication and verification.

Example. Lwasa et al.24, in their review of the mediating impacts of urban agriculture and forestry on climate change, failed to describe their methods in sufficient detail; for example, which grey literature sources and which databases/indexes within Web of Science were searched. In addition, the authors reported only some of the terms that were included in the bibliographic searches. In their review of the impact of species traits on responses to climate change, Pacifici et al.25 did not describe how their inclusion criteria were applied in practice, so it is impossible to know whether or how they dealt with subjectivity and inconsistency between reviewers. More problematically, Owen-Smith26 and Prugh et al.27 failed to include a methods section of any kind in their reviews. Also problematic, and perhaps more common than a failure to describe methods, is a failure to include the extracted data. For example, Li et al.28 did not present their data, which prevents replication of their analyses or later updating of their synthesis.

Mitigation strategies. Making use of high-standard evidence syntheses and guidance (such as those published by Cochrane, the Campbell Collaboration and CEE) as examples can help improve reporting. Similarly, review authors should attempt to conform to internationally accepted review reporting standards, such as the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)29 and the RepOrting standards for Systematic Evidence Syntheses (ROSES)23, to ensure all relevant methodological information has been included in protocols and review reports. Additionally, review authors can choose to include methodology experts in their review teams or advisory groups. Finally, review authors can choose to publish their syntheses through leading organizations and journals working with systematic reviews and maps, such as CEE.

Review authors should provide metadata (descriptive information), data (individual study findings) and analytical code (for example, R scripts used for meta-analysis) in full alongside their review as far as is legally permitted, and summary data where not. Guidelines (https://data.research.cornell.edu/content/writing-metadata) and example systematic reviews (for example, ref. 30) can highlight best practices in metadata creation. Where authors' decisions are known to be somewhat subjective, for example on issues relating to study validity, review authors should first trial assessments and then discuss among co-authors all inconsistencies in detail before continuing. In addition, reviewers should report in detail all decisions; for example, which studies are eligible, what data should be extracted and how valid studies are viewed to be, along with justifications for these decisions. This then allows actions to be fully understood and replicated.
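
As a minimal sketch of what sharing such materials could look like in practice (all file names, variables and values below are hypothetical illustrations, not taken from any review discussed here), extracted data and a simple data dictionary can be written out alongside the analytical code:

```r
# Minimal sketch (hypothetical data): archiving extracted study data and a
# data dictionary alongside a review so analyses can be replicated and updated.
extracted <- data.frame(
  study_id = c("smith_2018", "lee_2020"),  # unique study identifiers
  effect   = c(0.42, -0.10),               # illustrative effect sizes
  variance = c(0.05, 0.08),                # illustrative sampling variances
  validity = c("high", "low")              # outcome of critical appraisal
)
dictionary <- data.frame(
  variable    = names(extracted),
  description = c("Unique study identifier",
                  "Extracted effect size (for example, log response ratio)",
                  "Sampling variance of the effect size",
                  "Critical appraisal category")
)
write.csv(extracted,  "extracted_data.csv",  row.names = FALSE)
write.csv(dictionary, "data_dictionary.csv", row.names = FALSE)
```

Depositing files such as these with the published review (for example, as supplementary information or in a public repository) is what allows later replication, verification and updating.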

4. Selection bias and a lack of comprehensiveness (inappropriate search methods and strategy). Description. Selection bias occurs where the articles included in a review are not representative of the evidence base as a whole31. Any resultant synthesis and conclusions based on this evidence are then highly likely to be biased or inaccurate. Broadly speaking, selection bias may occur in reviews as a result of failing to account for bias in what research is published (publication bias) and in what data are reported in published studies (reporting bias), and through substandard review methods that affect which studies are included in the review. Specifically in relation to searching, selection bias enters syntheses through inappropriate search strategies; for example, as a result of 'cherry picking' studies for inclusion, choosing biased or unrepresentative bibliographic databases, or using search strategies that are inappropriate for the subject at hand.

Example. By including 'decline' as a search term, Sánchez-Bayo and Wyckhuys20 targeted only studies showing a reduction in insect populations, contradicting their goal to collate "all long-term insect surveys conducted over the past 40 years". Thus, the authors synthesized a subset of evidence based on the direction of observed results, potentially missing studies showing a neutral or positive change, and exaggerating the declining status of insect populations. Furthermore, the authors' search was not comprehensive, including no synonyms, which are vital to account for differences in how researchers describe a concept. Their search string will have missed any research using other terms that may be important synonyms; for example, 'reduction' as well as 'decline'. Adding the term 'increas*' would retrieve a substantial additional body of evidence. Secondly, the review authors searched only one resource, Web of Science (they probably mean Web of Science Core Collection, but the exact indexes involved remain unclear). The authors also excluded or ignored grey literature (see point 5, below).

In a review of tropical forest management impacts32 and in a review of forest conservation policies33, searches for evidence were performed only within Google Scholar, relying on Google's relevance-based sorting algorithm, which displays only the first 1,000 records. This likely provides a biased subset of the literature and has been widely shown to be inappropriate as a main source of studies for literature review34–36.

Mitigation strategies. Search methods should include more than bibliographic database searching; supplementary methods should also be employed, for example forwards and backwards citation searching, web searching and calls for submission of evidence. Search strategies should be carefully planned and should include a comprehensive set of synonyms relevant to the review scope. Specifically, the strategy should: (1) be based on thorough scoping of the literature; (2) be trialled in a sample database and tested to ensure it recovers studies of known relevance (benchmarking37); (3) ideally be constructed by or with input/support from an information specialist/librarian; (4) involve searches of multiple bibliographic databases (ranging in subject/geographic/temporal scope; for example, Scopus, CAB Abstracts and MEDLINE) to maximize comprehensiveness and mitigate bias; and (5) be outlined in an a priori protocol that is published and open for scrutiny.
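
To make the synonym requirement concrete, the hedged sketch below (in R; the terms are hypothetical, loosely echoing the insect-decline example above) shows one way a Boolean search string can be assembled from synonym sets and inspected before being trialled and benchmarked in a bibliographic database:

```r
# Illustrative sketch only: assembling a Boolean search string from synonym
# sets (hypothetical terms; wildcards and phrases follow common database syntax).
population <- c("insect*", "entomofauna", "arthropod*")
change     <- c("declin*", "reduc*", "increas*", "trend*", "\"population change\"")
search_string <- paste0(
  "(", paste(population, collapse = " OR "), ") AND (",
  paste(change, collapse = " OR "), ")"
)
cat(search_string)
# (insect* OR entomofauna OR arthropod*) AND
#   (declin* OR reduc* OR increas* OR trend* OR "population change")
```

Keeping the synonym sets in a script like this makes the search string easy to report verbatim in the protocol and to re-run when the review is updated.
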
results. Their datasets on stream predation experiments vary from
5. Publication bias (exclusion of grey literature and failure to test all-inclusive criteria to minimal subset of studies. The study shows
for evidence of publication bias). Description. This issue is closely how meta-analytic patterns can appear and disappear based on the
related to and perhaps a subset of point 4 above, but nevertheless selection criteria applied.
requires a separate discussion due to the nature of the mitigation strat-
egies necessary. Positive and statistically significant research findings Example. Burivalova et al.32 included in their review a variety of
are more likely to be published than negative and non-significant studies from meta-analysis to case studies. Their stated goal was “to
results38. The findings of syntheses based only on traditional, com- compare forest variables under two different management regimes,
mercially published academic research will be as biased as the under- or before and after management implementation” in tropical forests.
lying research. Research that is not published in traditional academic They did not conduct critical appraisal of the studies and ended up
journals controlled by commercial publishers is called ‘grey litera- including studies that lacked either internal or external validity. For
ture’, and consists of two main groups—the ‘file-drawer’ research that example, they included an earlier study by Burivalova et al.44 that
was intended to be published in an academic outlet but for some rea- looked at the importance of logging intensity as a driver of biodiver-
son was not; where this reason was a lack of statistical or perceived sity decline in timber estates. However, conclusions about logging
biological significance, publication bias has occurred. A second type intensity were hampered by a failure to consider log extraction tech-
of grey literature consists of organizational reports and other stud- niques, and this failure had already been noted by Bicknell et al.45
ies that were not intended for an academic audience. Where relevant who sought to account for the influence of extraction techniques
studies of this type are omitted from a review, the evidence base will with meta-analysis. Burivalova et al.32 also included a study by
lack comprehensiveness (see point 4 above). Tests that lead one to Damette and Delacote46 that used global country-level data to study
strongly suspect the presence of publication bias and/or quantify its deforestation and assess sustainability of forest harvesting. Although
potential impact are an important element of a high-quality quantita- some of the results were given separately for developing countries,
tive synthesis (Egger test, Vivea and Hedges tests39). the dataset used to assess certification impacts included countries
globally and thus lacked external validity in a review focused on
Example. In their recent review, Agarwala and Ginsberg21 ignored tropical forests only. Similarly, they included a study by Blomley
grey (that is, not commercially published) literature, excluding et al.47 that compared participatory forest management to govern-
organizational reports and theses shown to be valuable sources of ment managed forests in Tanzania without reporting any baseline
evidence30. When the authors then critically appraised studies, there differences or matching criteria for the different forest areas.
was no justification for avoiding grey literature on the grounds of
validity, and including it could have reduced the probability of pub- Mitigation strategies. Systematic reviews should include a critical
lication bias. Pacifici et al.25 also failed to include grey literature. As a appraisal of every included study’s internal and external validity5.
result, the included evidence is likely to be unreliable (although their This assessment should be carefully planned a priori and trialled to
summaries are arguably more dangerous because of vote-counting ensure that it is fit-for-purpose and that review authors can conduct
(see point 7, below). the appraisal consistently10. Existing critical appraisal tools used in

Nature Ecology & Evolution | www.nature.com/natecolevol
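
As a brief sketch of what such a check might look like using the metafor package41 (assumed inputs: a data frame `dat` holding per-study effect sizes `yi` and sampling variances `vi`; the object names are illustrative):

```r
# Sketch: checking for small-study effects that may indicate publication bias.
# Assumes dat contains effect sizes (yi) and sampling variances (vi).
library(metafor)                 # Viechtbauer (2010), ref. 41
res <- rma(yi, vi, data = dat)   # random-effects meta-analysis
funnel(res)                      # funnel plot: asymmetry can suggest bias
regtest(res)                     # Egger-type regression test for asymmetry
```

Asymmetry in the funnel plot or a significant regression test does not prove publication bias, but it flags heterogeneity between large and small studies that warrants investigation.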


6. Lack of appropriate critical appraisal (treating all evidence as equally valid). Description. Some primary research is less reliable than other research because of problems with the methods used, potentially resulting in inaccurate or biased findings42. Reviews that fail to appropriately assess and account for the reliability of included studies are susceptible to perpetuating these problems through the synthesis, resulting in inaccurate and biased findings. Primary research may have issues relating to 'internal validity' (that is, the accuracy of methods) caused, for example, by a lack of blinding, a failure to account for the presence of confounding variables, and a lack of randomization. Reviews may also suffer from problems with external validity, whereby primary studies vary in their relevance to the review question (for example, being conducted across different spatial scales) but this is not accounted for in the synthesis. Finally, review conclusions may be misleading if studies are selected for meta-analysis based on criteria that do not properly relate to the study question.

Englund et al.43 provide an illustrative example of how criteria influence study selection and subsequent meta-analysis results. Their datasets on stream predation experiments vary from all-inclusive criteria to a minimal subset of studies. The study shows how meta-analytic patterns can appear and disappear based on the selection criteria applied.

Example. Burivalova et al.32 included in their review a variety of studies, from meta-analyses to case studies. Their stated goal was "to compare forest variables under two different management regimes, or before and after management implementation" in tropical forests. They did not conduct critical appraisal of the studies and ended up including studies that lacked either internal or external validity. For example, they included an earlier study by Burivalova et al.44 that looked at the importance of logging intensity as a driver of biodiversity decline in timber estates. However, conclusions about logging intensity were hampered by a failure to consider log extraction techniques, and this failure had already been noted by Bicknell et al.45, who sought to account for the influence of extraction techniques with meta-analysis. Burivalova et al.32 also included a study by Damette and Delacote46 that used global country-level data to study deforestation and assess the sustainability of forest harvesting. Although some of the results were given separately for developing countries, the dataset used to assess certification impacts included countries globally and thus lacked external validity in a review focused on tropical forests only. Similarly, they included a study by Blomley et al.47 that compared participatory forest management to government-managed forests in Tanzania without reporting any baseline differences or matching criteria for the different forest areas.

Mitigation strategies. Systematic reviews should include a critical appraisal of every included study's internal and external validity5. This assessment should be carefully planned a priori and trialled to ensure that it is fit-for-purpose and that review authors can conduct the appraisal consistently10. Existing critical appraisal tools used in other reviews may prove a useful starting point from which to develop a suitable tool42. Critical appraisal can be used as a basis to exclude or down-weight flawed studies, and its outputs should be used in the synthesis in some way5: for example, by including study validity as a moderator or as a basis for sensitivity analysis in quantitative synthesis (for example, ref. 48), or in order to prioritize the presentation and discussion of the evidence base. Complex scoring systems should be avoided to minimize the risk of introducing errors and to ensure repeatability. Instead, studies should be given categorical coding, for example low, high and unclear validity49. In addition, meta-analysis can be used to compare the magnitude of effects in studies of different validity (for example, observational and experimental studies). These analyses should not be used to adjust meta-analytical weighting but should inform judgements about the overall strength of evidence and uncertainty in effect estimates.
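
A hedged sketch of how validity codes might enter a quantitative synthesis with metafor41 (assuming a data frame `dat` with effect sizes `yi`, variances `vi` and a categorical `validity` column from the critical appraisal; all names are illustrative):

```r
# Sketch: using critical appraisal outcomes in the synthesis rather than
# adjusting weights. Assumes dat has yi, vi and a categorical validity code.
library(metafor)
res_mod  <- rma(yi, vi, mods = ~ validity, data = dat)   # validity as moderator
res_sens <- rma(yi, vi, data = subset(dat, validity == "high"))  # sensitivity
summary(res_mod)   # do effects differ systematically by study validity?
summary(res_sens)  # does the conclusion hold in high-validity studies only?
```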

7. Inappropriate synthesis (using vote-counting and inappropriate statistics). Description. All literature reviews attempt to create new knowledge by summarizing a body of evidence. For quantitative reviews this may take the form of a meta-analysis; that is, the combining of effect sizes and variances across all studies to generate one or more summary effect estimates with confidence intervals (or slopes and intercepts in the case of meta-regressions)50. Not all systematic reviews use meta-analysis as a synthesis method, but all reviews that are identified as 'meta-analyses' must fulfil a number of standard requirements, such as calculation of the effect sizes for individual studies, calculation of the combined effects and confidence intervals, and so on51,52. Meta-analyses and systematic reviews are therefore overlapping, with some arguing that all meta-analyses in the environmental field should be based on systematic methods to identify, collate, extract information from and appraise studies, as they are in other domains53.
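
As an illustration of those standard requirements, a typical workflow of calculating per-study effect sizes and then a weighted summary effect with confidence intervals can be sketched with metafor41 (the input data frame `raw_studies` and its mean/SD/sample-size columns are hypothetical):

```r
# Sketch of the standard requirements named above: per-study effect sizes,
# then a weighted combined estimate with a confidence interval.
library(metafor)
dat <- escalc(measure = "ROM",  # log response ratio, widely used in ecology
              m1i = m_trt, sd1i = sd_trt, n1i = n_trt,  # treatment groups
              m2i = m_ctl, sd2i = sd_ctl, n2i = n_ctl,  # control groups
              data = raw_studies)
res <- rma(yi, vi, data = dat)  # random-effects model, inverse-variance weights
summary(res)                    # summary effect, confidence interval, heterogeneity
forest(res)                     # study-level estimates alongside the pooled effect
```
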
For reviews of qualitative evidence, summarizing the body of evidence takes the form of a formal drawing together of qualitative study findings to generate hypotheses, create new theories or build conceptual models54. The choice and design of the synthesis methods are just as critical to the rigour of a review as the question formulation, searching, screening, critical appraisal and data extraction: inappropriate synthesis invalidates all preceding steps. Where full synthesis is performed, authors should be careful to ensure they use established and appropriate synthesis methods.

One common problem with evidence syntheses occurs when authors fall foul of 'vote-counting' (reviewed in ref. 55). Vote-counting is the tallying-up of studies based on the statistical significance and direction of their findings. This approach is problematic for several reasons. Firstly, it ignores statistical power and study precision: many studies might report a non-significant effect not because the effect does not exist, but because the statistical power of these studies is too low to detect it. Secondly, vote-counting ignores the magnitude of the effect in each study: those showing a positive effect may have a much larger effect size than those showing a negative effect. Finally, vote-counting ignores study validity: the positive studies may have a much higher validity than the negative ones, for example due to better study designs.

Example. Sánchez-Bayo and Wyckhuys20 claimed to have conducted a meta-analysis of studies on insect decline, but no standard meta-analysis methods were used and the review fails most criteria for meta-analyses51,52. It is also unclear how annual decline rates were calculated, and such measures were not standard effect sizes. There is no mention of weighting, and analysis of variance (ANOVA) is inappropriate for combining estimates from different studies. Britt et al.56 similarly did not use established meta-analysis methods in their quantitative synthesis.

Graham et al.57 chose to use a vote-counting approach in their review on hedgerows as farmland habitats because "the data are too heterogeneous to allow any meaningful synthesis or meta-analysis… We follow a standard vote counting procedure where significant positive effects, significant negative effects, and no significant effects are assigned a 'vote' in order to integrate information and generalise the effect direction for each structural component on each taxonomic group". Delaquis et al.58 similarly stated that they deliberately chose a vote-counting approach, despite calculating effect sizes in some cases. Pacifici et al.25 also synthesized by vote-counting to estimate the percentage of species in major groups that demonstrated responses to climate change. In their review of conservation intervention effectiveness, Burivalova et al.32 visualized their mapping of evidence by displaying the number of studies for each intervention type and colour coding studies according to their direction of effect (positive, neutral, negative), thereby promoting so-called 'visual vote-counting'.

Mitigation strategies. Vote-counting should never be used instead of meta-analysis. If the data in primary studies are deemed to be too heterogeneous to be combined by means of meta-analysis (for example, because reported measures of outcome are too diverse), using a flawed approach such as vote-counting is unlikely to help. Instead, the scope of the review might need to be reassessed and narrowed down to a subset of studies that could be meaningfully combined. Alternatively, formal methods for narrative synthesis should be used to summarize and describe the evidence base59. It is perfectly acceptable (and encouraged) to tabulate the results of all studies in a narrative synthesis that includes quantitative results and statistical significance, but this should also include the results of critical appraisal of study validity. Doing so ensures that no studies are 'excluded' from the review because data are not reported in a way that allows inclusion in a meta-analysis. Indeed, important conclusions can be made from narrative synthesis without meta-analyses (for example, ref. 60).

A common justification for vote-counting is the lack of reporting of variance measures in the ecological literature. Studies lacking variance measures should be included using the narrative synthesis methods described above. Where quantitative synthesis is desired, meta-analysis of a reduced dataset is preferable to vote-counting a larger dataset while ignoring precision, effect magnitude and heterogeneity. The increasing provision of data as Open Science permeates ecological research practice should make this problem less pervasive in the future.

Maps of evidence (for example, systematic maps) that aim to catalogue an evidence base typically do not extract study findings: this should primarily be done only in the context of a robust systematic review that also involves critical appraisal of study validity and, ideally, appropriate quantitative or qualitative synthesis. Only established qualitative and quantitative synthesis methods should be used, making the most of the plethora of methodological support available in the literature.

8. A lack of consistency and error checking (working individually). Description. An individual researcher performing the various tasks of a systematic review may interpret definitions, concepts and system boundaries differently from someone else. This variability is an inherent part of being human, but in a literature review it may result in the inclusion or exclusion of a different set of studies depending on individual interpretation. By working alone and unchallenged, a reviewer cannot be sure they are correctly interpreting the protocol. Similarly, working alone can lead to a higher rate of errors (and, importantly for reviews, an unacceptable false negative error rate: the erroneous exclusion of relevant studies) than working in concert with another researcher61.


Example. In their review of the water chemistry habitat associations of the white-clawed crayfish (Austropotamobius pallipes), Rallo and García-Arberas62 tabulated minima, maxima and means for a range of water chemistry variables (their table 4). Their review methods are not described, but there are several transcription errors in the table that should have been corrected by error checking or dual data extraction.

Mitigation strategies. It is for the reasons of alternative interpretation and false negative errors that the major coordinating bodies require at least a subset of the evidence base to be processed (that is, screening, data extraction and appraisal) by more than one reviewer, typically followed by an initial trial of the task to ensure reviewers interpret and apply the instructions consistently (refining instructions where necessary to improve consistency)5,10. Additionally, few individuals have the requisite skill set to acquire, appraise and synthesize studies alone. High-quality evidence synthesis is likely to involve collaboration with information specialists and evidence synthesis methodologists/statisticians as well as domain specialists.
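
One way to quantify consistency between two screeners on such a trial subset is chance-corrected agreement; the minimal base-R sketch below (Cohen's kappa; the screening decisions are hypothetical) illustrates the idea:

```r
# Minimal sketch: Cohen's kappa, a chance-corrected measure of agreement
# between two screeners on a trial subset of records (decisions hypothetical).
cohens_kappa <- function(a, b) {
  lv  <- union(a, b)
  tab <- table(factor(a, levels = lv), factor(b, levels = lv))
  po  <- sum(diag(tab)) / sum(tab)                      # observed agreement
  pe  <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}
screener1 <- c("include", "exclude", "include", "include", "exclude")
screener2 <- c("include", "exclude", "exclude", "include", "exclude")
cohens_kappa(screener1, screener2)  # ~0.62 for these illustrative decisions
```

Low agreement on the trial subset signals that eligibility criteria need clarification before full screening proceeds.
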
Advice for more rigorous reviews
In Box 1, we provide general advice for those involved in funding, commissioning, conducting or editing/peer-reviewing/appraising a review. We give a number of specific recommendations to the research community to support rigorous evidence synthesis.

Box 1 | Recommended actions for authors wishing to conduct more rigorous literature reviews

• Familiarize yourself with best practice in evidence synthesis methods and appreciate that systematic reviewing is a flexible methodology that can be applied to any research topic, provided the question is suitably formulated.
• Make use of freely accessible guidance, minimum standards and educational resources provided by CEE and others (for example, the Campbell Collaboration and Cochrane).
• Seek training in evidence synthesis to produce a reliable review with a lasting legacy and the potential to impact decision-making.
• Connect with existing communities of practice—individual methodologists, information specialists/librarians, working groups, specialist organizations, conferences—and make use of the plethora of online resources related to evidence synthesis.
• Engage with stakeholders (including experts) when planning your review: consult with a broad range of stakeholders when setting the scope; with librarians and information specialists when developing the search strategy; with statisticians and synthesis methodologists when designing quantitative or qualitative synthesis; and with communications experts when translating review findings.
• Ensure that a review is clear in its purpose and objectives.
• Ensure the intended level of rigour (including transparency, procedural objectivity and comprehensiveness) of a review is achieved.
• Follow Open Science principles when conducting and publishing reviews (Open Synthesis64) to ensure transparency; that is, make your data, methods and paper freely accessible and reusable.
• Check the author guidance of specific journals for advice on what is requested to be included with systematic reviews; for example, Environmental Evidence aims to publish high-quality systematic reviews.
• Demonstrate and assess the rigour of a review and how it is reported using existing tools such as the ROSES reporting standards23, CEESAT (www.environmentalevidence.org/ceeder) and the CEE standards of conduct (http://www.environmentalevidence.org/information-for-authors).
• Editors and publishers should ensure that instructions for authors include sufficient detail and minimum standards regarding the conduct and reporting of evidence syntheses, and they should ensure that authors follow them: for example, the guidance for reviews in Biological Conservation states "Review articles… must include a methods section explaining how the literature for review was selected", yet several recent reviews published in this journal lack a methods section altogether (for example, refs. 26,27). Journals should endorse or enforce reporting and conduct standards, such as PRISMA (https://www.prisma-statement.org), ROSES (https://www.roses-reporting.com) or the Methodological Expectations of Cochrane Intervention Reviews (MECIR) (https://methods.cochrane.org/methodological-expectations-cochrane-intervention-reviews).
• Methodology experts should support review authors and editors by: raising awareness of rigorous evidence synthesis methodology; developing and advertising Open Educational resources to support those wishing to conduct or appraise systematic reviews; acting as methodology editors and peer-reviewers for community journals (for example, Environment International, which has a dedicated systematic review editor); and increasing the efficiency of reporting and appraisal tools to make them easier to use in editorial triage and peer-review.

Conclusions
Systematic reviews are increasingly seen as viable and important means of reliably summarizing rapidly expanding bodies of scientific evidence to support decision making in policy and practice across disciplines. At the same time, however, there is a lack of awareness and appreciation of the methods needed to ensure systematic reviews are as free from bias and as reliable as possible, as demonstrated by recent, flawed, high-profile reviews.

No one group is responsible for this failure and no one group produces perfect systematic reviews. We call for the entire research community to work together to raise the standard of systematic reviews published in conservation and environmental management. Whilst systematic reviews are major undertakings that require careful planning and the involvement of a range of experts, these are not reasons to abandon rigour in favour of an unregulated free-for-all in evidence synthesis methods. We call on review authors to conduct more rigorous reviews, on editors and peer-reviewers to gatekeep more strictly, and on the community of methodologists to better support the broader research community. We cannot afford to fund or generate second-order research waste (that is, poor-quality reviews): many primary studies are already a waste of resources63, and we must not waste resources on methodologically poor or biased syntheses. Only by working together can we build and maintain a strong system of rigorous, evidence-informed decision-making in conservation and environmental management.

Received: 24 March 2020; Accepted: 31 July 2020; Published: xx xx xxxx

References
1. Grant, M. J. & Booth, A. A typology of reviews: an analysis of 14 review types and associated methodologies. Health Info Libr. J. 26, 91–108 (2009).
2. Haddaway, N. R. & Macura, B. The role of reporting standards in producing robust literature reviews. Nat. Clim. Change 8, 444–447 (2018).
3. Pullin, A. S. & Knight, T. M. Science informing policy–a health warning for the environment. Environ. Evid. 1, 15 (2012).


4. Haddaway, N., Woodcock, P., Macura, B. & Collins, A. Making literature reviews more reliable through application of lessons from systematic reviews. Conserv. Biol. 29, 1596–1605 (2015).
5. Pullin, A., Frampton, G., Livoreil, B. & Petrokofsky, G. Guidelines and Standards for Evidence Synthesis in Environmental Management (Collaboration for Environmental Evidence, 2018).
6. White, H. The twenty-first century experimenting society: the four waves of the evidence revolution. Palgrave Commun. 5, 47 (2019).
7. O'Leary, B. C. et al. The reliability of evidence review methodology in environmental science and conservation. Environ. Sci. Policy 64, 75–82 (2016).
8. Woodcock, P., Pullin, A. S. & Kaiser, M. J. Evaluating and improving the reliability of evidence syntheses in conservation and environmental science: a methodology. Biol. Conserv. 176, 54–62 (2014).
9. Campbell Systematic Reviews: Policies and Guidelines (Campbell Collaboration, 2014).
10. Higgins, J. P. et al. Cochrane Handbook for Systematic Reviews of Interventions (John Wiley & Sons, 2019).
11. Shea, B. J. et al. AMSTAR 2: a critical appraisal tool for systematic reviews that include randomised or non-randomised studies of healthcare interventions, or both. BMJ 358, j4008 (2017).
12. Haddaway, N. R., Land, M. & Macura, B. "A little learning is a dangerous thing": a call for better understanding of the term 'systematic review'. Environ. Int. 99, 356–360 (2017).
13. Freeman, R. E. Strategic Management: A Stakeholder Approach (Cambridge Univ. Press, 2010).
14. Haddaway, N. R. et al. A framework for stakeholder engagement during systematic reviews and maps in environmental management. Environ. Evid. 6, 11 (2017).
15. Land, M., Macura, B., Bernes, C. & Johansson, S. A five-step approach for stakeholder engagement in prioritisation and planning of environmental evidence syntheses. Environ. Evid. 6, 25 (2017).
16. Oliver, S. & Dickson, K. Policy-relevant systematic reviews to strengthen health systems: models and mechanisms to support their production. Evid. Policy 12, 235–259 (2016).
17. Savilaakso, S. et al. Systematic review of effects on biodiversity from oil palm production. Environ. Evid. 3, 4 (2014).
18. Savilaakso, S., Laumonier, Y., Guariguata, M. R. & Nasi, R. Does production of oil palm, soybean, or jatropha change biodiversity and ecosystem functions in tropical forests. Environ. Evid. 2, 17 (2013).
19. Haddaway, N. R. & Crowe, S. Experiences and lessons in stakeholder engagement in environmental evidence synthesis: a truly special series. Environ. Evid. 7, 11 (2018).
20. Sánchez-Bayo, F. & Wyckhuys, K. A. Worldwide decline of the entomofauna: a review of its drivers. Biol. Conserv. 232, 8–27 (2019).
21. Agarwala, M. & Ginsberg, J. R. Untangling outcomes of de jure and de facto community-based management of natural resources. Conserv. Biol. 31, 1232–1246 (2017).
22. Gurevitch, J., Curtis, P. S. & Jones, M. H. Meta-analysis in ecology. Adv. Ecol. Res. 32, 199–247 (2001).
23. Haddaway, N. R., Macura, B., Whaley, P. & Pullin, A. S. ROSES RepOrting standards for Systematic Evidence Syntheses: pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps. Environ. Evid. 7, 7 (2018).
24. Lwasa, S. et al. A meta-analysis of urban and peri-urban agriculture and forestry in mediating climate change. Curr. Opin. Environ. Sustain. 13, 68–73 (2015).
25. Pacifici, M. et al. Species' traits influenced their response to recent climate change. Nat. Clim. Change 7, 205–208 (2017).
26. Owen-Smith, N. Ramifying effects of the risk of predation on African multi-predator, multi-prey large-mammal assemblages and the conservation implications. Biol. Conserv. 232, 51–58 (2019).
27. Prugh, L. R. et al. Designing studies of predation risk for improved inference in carnivore-ungulate systems. Biol. Conserv. 232, 194–207 (2019).
28. Li, Y. et al. Effects of biochar application in forest ecosystems on soil properties and greenhouse gas emissions: a review. J. Soil Sediment. 18, 546–563 (2018).
29. Moher, D., Liberati, A., Tetzlaff, J. & Altman, D. G., The PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. 6, e1000097 (2009).
30. Bernes, C. et al. What is the influence of a reduction of planktivorous and benthivorous fish on water quality in temperate eutrophic lakes? A systematic review. Environ. Evid. 4, 7 (2015).
31. McDonagh, M., Peterson, K., Raina, P., Chang, S. & Shekelle, P. Avoiding bias in selecting studies. In Methods Guide for Effectiveness and Comparative Effectiveness Reviews (Agency for Healthcare Research and Quality, 2013).
32. Burivalova, Z., Hua, F., Koh, L. P., Garcia, C. & Putz, F. A critical comparison of conventional, certified, and community management of tropical forests for timber in terms of environmental, economic, and social variables. Conserv. Lett. 10, 4–14 (2017).
33. Min-Venditti, A. A., Moore, G. W. & Fleischman, F. What policies improve forest cover? A systematic review of research from Mesoamerica. Glob. Environ. Change 47, 21–27 (2017).
34. Bramer, W. M., Giustini, D. & Kramer, B. M. R. Comparing the coverage, recall, and precision of searches for 120 systematic reviews in Embase, MEDLINE, and Google Scholar: a prospective study. Syst. Rev. 5, 39 (2016).
35. Bramer, W. M., Giustini, D., Kramer, B. M. R. & Anderson, P. F. The comparative recall of Google Scholar versus PubMed in identical searches for biomedical systematic reviews: a review of searches used in systematic reviews. Syst. Rev. 2, 115 (2013).
36. Gusenbauer, M. & Haddaway, N. R. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res. Synth. Methods 11, 181–217 (2020).
37. Livoreil, B. et al. Systematic searching for environmental evidence using multiple tools and sources. Environ. Evid. 6, 23 (2017).
38. Mlinarić, A., Horvat, M. & Šupak Smolčić, V. Dealing with the positive publication bias: why you should really publish your negative results. Biochem. Med. 27, 447–452 (2017).
39. Lin, L. & Chu, H. Quantifying publication bias in meta-analysis. Biometrics 74, 785–794 (2018).
40. Haddaway, N. R. & Bayliss, H. R. Shades of grey: two forms of grey literature important for reviews in conservation. Biol. Conserv. 191, 827–829 (2015).
41. Viechtbauer, W. Conducting meta-analyses in R with the metafor package. J. Stat. Softw. 36, 1–48 (2010).
42. Bilotta, G. S., Milner, A. M. & Boyd, I. On the use of systematic reviews to inform environmental policies. Environ. Sci. Policy 42, 67–77 (2014).
43. Englund, G., Sarnelle, O. & Cooper, S. D. The importance of data-selection criteria: meta-analyses of stream predation experiments. Ecology 80, 1132–1141 (1999).
44. Burivalova, Z., Şekercioğlu, Ç. H. & Koh, L. P. Thresholds of logging intensity to maintain tropical forest biodiversity. Curr. Biol. 24, 1893–1898 (2014).
45. Bicknell, J. E., Struebig, M. J., Edwards, D. P. & Davies, Z. G. Improved timber harvest techniques maintain biodiversity in tropical forests. Curr. Biol. 24, R1119–R1120 (2014).
46. Damette, O. & Delacote, P. Unsustainable timber harvesting, deforestation and the role of certification. Ecol. Econ. 70, 1211–1219 (2011).
47. Blomley, T. et al. Seeing the wood for the trees: an assessment of the impact of participatory forest management on forest condition in Tanzania. Oryx 42, 380–391 (2008).
48. Haddaway, N. R. et al. How does tillage intensity affect soil organic carbon? A systematic review. Environ. Evid. 6, 30 (2017).
49. Higgins, J. P. et al. The Cochrane Collaboration's tool for assessing risk of bias in randomised trials. BMJ 343, d5928 (2011).
50. Stewart, G. Meta-analysis in applied ecology. Biol. Lett. 6, 78–81 (2010).
51. Koricheva, J. & Gurevitch, J. Uses and misuses of meta-analysis in plant ecology. J. Ecol. 102, 828–844 (2014).
52. Vetter, D., Ruecker, G. & Storch, I. Meta-analysis: a need for well-defined usage in ecology and conservation biology. Ecosphere 4, 1–24 (2013).
53. Stewart, G. B. & Schmid, C. H. Lessons from meta-analysis in ecology and evolution: the need for trans-disciplinary evidence synthesis methodologies. Res. Synth. Methods 6, 109–110 (2015).
54. Macura, B. et al. Systematic reviews of qualitative evidence for environmental policy and management: an overview of different methodological options. Environ. Evid. 8, 24 (2019).
55. Koricheva, J. & Gurevitch, J. in Handbook of Meta-analysis in Ecology and Evolution (eds Koricheva, J. et al.) Ch. 1 (Princeton Scholarship Online, 2013).
56. Britt, M., Haworth, S. E., Johnson, J. B., Martchenko, D. & Shafer, A. B. The importance of non-academic coauthors in bridging the conservation genetics gap. Biol. Conserv. 218, 118–123 (2018).
57. Graham, L., Gaulton, R., Gerard, F. & Staley, J. T. The influence of hedgerow structural condition on wildlife habitat provision in farmed landscapes. Biol. Conserv. 220, 122–131 (2018).
58. Delaquis, E., de Haan, S. & Wyckhuys, K. A. On-farm diversity offsets environmental pressures in tropical agro-ecosystems: a synthetic review for cassava-based systems. Agric. Ecosyst. Environ. 251, 226–235 (2018).
59. Popay, J. et al. Guidance on the Conduct of Narrative Synthesis in Systematic Reviews: A Product from the ESRC Methods Programme Version 1 (Lancaster Univ., 2006).
60. Pullin, A. S. et al. Human well-being impacts of terrestrial protected areas. Environ. Evid. 2, 19 (2013).
61. Waffenschmidt, S., Knelangen, M., Sieben, W., Bühn, S. & Pieper, D. Single screening versus conventional double screening for study selection in systematic reviews: a methodological systematic review. BMC Med. Res. Methodol. 19, 132 (2019).


62. Rallo, A. & García-Arberas, L. Differences in abiotic water conditions between fluvial reaches and crayfish fauna in some northern rivers of the Iberian Peninsula. Aquat. Living Resour. 15, 119–128 (2002).
63. Glasziou, P. & Chalmers, I. Research waste is still a scandal—an essay by Paul Glasziou and Iain Chalmers. BMJ 363, k4645 (2018).
64. Haddaway, N. R. Open Synthesis: on the need for evidence synthesis to embrace Open Science. Environ. Evid. 7, 26 (2018).

Acknowledgements
We thank C. Shortall from Rothamsted Research for useful discussions on the topic.

Author contributions
N.R.H. developed the manuscript idea and a first draft. All authors contributed to examples and edited the text. All authors have read and approve of the final submission.

Competing interests
S.S. is a co-founder of Liljus ltd, a firm that provides research services in sustainable finance as well as forest conservation and management. The other authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41559-020-01295-x.
Correspondence should be addressed to N.R.H.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© Springer Nature Limited 2020
