DORA Indicators Guidance

Contents

Introduction
Journal Impact Factor (and other journal metrics)
Citations
h-index
Field-normalized citation indicators
Altmetrics
Concluding remarks

Produced by the DORA Research Assessment Metrics Task Force:

Ginny Barbour
Rachel Bruce
Stephen Curry
Bodo Stern
Stuart King
Rebecca Lawrence

This content is available under a Creative Commons Attribution Share Alike License
(CC BY-SA 4.0).

Please cite this document as:

DORA. 2024. Guidance on the responsible use of quantitative indicators in research assessment. http://doi.org/10.5281/zenodo.10979644

For more information, contact Zen Faulkes, DORA Program Director.


Introduction

Research assessment is an important and challenging task, and many institutions work hard to grapple with its complexities. Nevertheless, the tendency to fall back on quantitative indicators (or metrics¹) that are often assumed to provide a measure of objectivity remains widespread. While indicators have great utility in the fields of bibliometrics and scientometrics (e.g., tracking the growth or decline of different subfields), they are inherently reductive, so their use in the assessment of individual researchers or research projects requires careful contextualization.

The Declaration on Research Assessment (DORA) is best known for being critical of the misuse of the Journal Impact Factor (JIF) in research evaluation. As a result, DORA is often asked for its views on other indicators. In this briefing note we therefore aim to explain how the principles underlying DORA apply to other quantitative indicators that are sometimes used in the evaluation of research and researchers.

¹ While the term “metric” suggests a quantity that is being measured directly, “indicator” better reflects the fact that quantities being used in research assessment are more often indirect proxies for quality. We therefore use “indicator” throughout this briefing document.

A close reading of the Declaration on Research Assessment reveals an approach to the use of quantitative information that is based on five simple principles:

1. What is your rationale for using particular quantitative indicators in your research or researcher assessments? Is it grounded in good evidence?

2. Ideally, rules for the use of quantitative indicators in research assessment should be developed in dialogue with your research community.² They should be published so that those being evaluated understand your criteria. Make sure also that reviewers are fully aware of your approach to using quantitative information in assessment.

3. How well does the indicator refer to the qualities of the person or the piece of work being assessed? Be mindful of aggregate metrics (e.g., JIF, h-index), which conceal large variations in performance, and of composite indicators (e.g., scores in university league tables, altmetrics), which are made up of arbitrarily weighted scores for very different attributes and activities and are therefore difficult to interpret meaningfully.

4. How will you take account of the proxy and reductive nature inherent in any indicator? (e.g., citations are not a direct measure of quality; the h-index takes no account of age, discipline, or career breaks).

5. How will you avoid biases inherent in quantitative indicators? Though it is often assumed that bibliometric indicators are “objective,” decisions to publish a paper or to cite it are choices that can reflect structural and personal biases. Decision makers need to be proactive and transparent in efforts to mitigate the impact of these biases in research assessment, and the same obviously applies to the qualitative aspects of assessment.

² Ideally also, any indicator used should be based on open data and algorithms so that anyone being evaluated can verify how it is calculated, but many commonly used “off the shelf” indicators still rely on closed data.

Below we explain how these principles apply to some of the more commonly used indicators. The list cannot be exhaustive, but we hope these examples will show how the principles could be applied in practice to any quantitative indicator. Giving undue weight to just one or two indicators is unlikely to provide a properly informed or balanced evaluation. Best practice is to co-create research assessment processes with your organizational community and to start by agreeing on the values, outcomes, and behaviors that will set the benchmarks for your assessment. The INORMS SCOPE framework or DORA’s SPACE rubric are useful tools for this purpose.

The SCOPE framework was created by INORMS. The SPACE rubric is available in the DORA Resource Library.

Journal Impact Factor (and other journal metrics)

The Journal Impact Factor (JIF) is an indicator that can essentially be defined as the annual average number of citations to papers in any given journal in the two preceding years. (The actual calculation is more opaque than this.)

Criticisms of the JIF are laid out in some detail in the DORA declaration and elsewhere, but the critical issue for research assessment is that claims that the JIF is a signifier of the value or quality of an individual paper are not supported by a close examination of the evidence. The JIF is a measure of what might be termed the “average citation performance” of papers in a particular journal but, aside from its many technical shortcomings, it gives no indication of the variation in the distribution of citations from which it is calculated, a distribution that typically ranges over 2-3 orders of magnitude. Although it is tempting to rely on the law of averages and conclude that a paper from a high JIF journal is likely to be better than one from a low JIF journal, the evidence shows that JIFs are poor predictors of the citation performance of individual articles. Further, it is often stated that the quality of peer review is higher at high JIF journals, but we know of no good evidence to support this.

So, when judging individual publications or their authors, one has to look more closely. The individual citation performance of the paper can provide some insight but, as discussed below, needs to be considered in context. Assessment of the content is also critical, as is knowing the particular contribution of any author listing it in their CV. Narrative CVs are emerging as a useful tool for capturing this more qualitative information in concise and comparable forms.

The reservations noted here regarding the use of the Journal Impact Factor apply equally to other journal-based indicators, e.g., CiteScore, the Eigenfactor Score, and the Source Normalized Impact per Paper (SNIP).
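
As a concrete point of reference for the definition above, the commonly quoted (simplified) formula for a journal's JIF in year Y is shown below; the worked numbers that follow are invented purely for illustration.

```latex
\mathrm{JIF}_{Y} =
  \frac{\text{citations received in year } Y \text{ to items published in years } Y-1 \text{ and } Y-2}
       {\text{number of citable items published in years } Y-1 \text{ and } Y-2}
```

On this definition, a hypothetical journal whose 200 citable items from 2022 and 2023 attracted 1,000 citations in 2024 would have a 2024 JIF of 5.0, even if a handful of highly cited papers accounted for most of those citations; this is the averaging problem described above. (In practice, what counts as a “citable item” adds further opacity to the calculation.)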

Citations

The citation count of an article is defined as the number of times it is included in the reference list of other articles or books. At first glance, using article citations in researcher assessment is an improvement over journal-based indicators like the journal impact factor, because citations offer information at the relevant level of granularity, the individual research article. However, as with any quantitative indicator, citations provide a limited view of researcher performance.

Citation performance is a lagging indicator that takes time, often years, to turn into a robust signal. It is therefore not well suited to evaluate recent scholarship or to compare researchers at different career stages, or in different disciplines.

Any use of citations in research assessment should also bear in mind other limitations. Bibliometricians acknowledge that citations reflect the influence of a research article, but this can differ in important ways from what evaluators may really want to determine: the quality and significance of research findings. Citation patterns can be skewed by author and journal reputations; e.g., author status can lead to citation bias, with prominent researchers attracting more citations for similar work than less well-known researchers, a phenomenon long known as the Matthew effect. Likewise, citations of identical editorials published in multiple journals correlate with the Journal Impact Factor. Numbers of citations are also impacted by the variable publication volumes of different disciplines; citations should therefore not be used to compare researchers in dissimilar fields. Differences in citation patterns that disfavor women are also well documented and should be taken into account when considering citations for researcher assessment. Moreover, since citation data do not indicate whether articles are cited for positive or negative reasons, they cannot be used to indicate research quality without additional supporting information; work to develop the Citation Typing Ontology may help to resolve this issue in the future.

For all these reasons, citation data cannot replace the critical judgment of experts and should be used with caution in researcher assessment. An indicator that reflects to what extent subsequent research builds upon a reported discovery would be a significant improvement on current citation-based metrics, since re-use of research findings signifies rigor and significance, two key features of high-quality research.

h-index

The h-index for individual authors is defined as the number of their papers that have been cited at least h times; an author with an h-index of 10, for example, has ten papers, each with at least ten citations.

The h-index is commonly used by institutions and individuals to compare researchers or to monitor their “performance” over time. However, it is difficult to interpret meaningfully, not least because it can give inconsistent and counterintuitive readings of researcher impact. Moreover, the value of the h-index depends on the database used to derive it (e.g., Web of Science, Scopus, Google Scholar) and can be manipulated by gaming.

As a reductive aggregate indicator, the h-index also lacks crucial contextual information that should be included in responsible research assessment. For example, the h-index will usually be higher for researchers at later career stages, or who have not taken career breaks, or who work in disciplines that attract higher citation rates (e.g., medical sciences as compared with mathematics or humanities); nor does it take account of the nature of the author’s contributions to each of their papers. In disciplines that rely increasingly on interdisciplinary, collaborative approaches, the h-index may thus reflect participation in large teams rather than individual contributions.

Any organization making use of the h-index in research assessment should be able to explain how it provides a meaningful insight into individual research performance, and how account is taken of individual circumstances (e.g., academic age, career breaks, scholarly discipline).
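
For readers who want to see the mechanics, the following is a minimal sketch of the standard h-index calculation from a list of per-paper citation counts; the function name and the citation counts in the example are illustrative, not taken from the guidance above.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Two hypothetical authors with very different citation profiles:
print(h_index([100, 50, 10, 4, 1]))     # 4
print(h_index([5, 5, 4, 4, 3, 2, 1]))   # 4
```

Both hypothetical authors score an h-index of 4 despite markedly different citation profiles, which illustrates why the indicator is difficult to interpret without the contextual information discussed above.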

Field-normalized citation indicators

Commonly used field-normalized citation indicators such as the Field Weighted Citation Impact (FWCI) or the Relative Citation Ratio (RCR) represent attempts to correct for the citation variability arising from differences between fields, types, and ages of publications. The FWCI is typically calculated for a collection of publications as the average ratio obtained by dividing the average number of citations accrued per paper in the collection by the average expected for papers of the same type (e.g., primary research articles) and year of publication that are in the same field. It is therefore an indicator of the relative citation performance of a body of research work. For instance, an FWCI of 2 means the research has twice its expected number of citations for papers in a given subject area.

Caution is necessary when using indicators such as the FWCI, not only because of the difficulty in defining which papers belong in which fields (which affects the denominator in the calculation), but also because of the variation inherent in the numbers of citations attracted by the papers making up any given body of work (similar to the skewed citation distributions characteristic of any given journal). Analysis shows that in datasets comprising only a few tens or hundreds of papers, the average FWCI is less reliable because of the impact of highly cited outliers. The FWCI should therefore only be applied to large datasets, typically comprising thousands of papers, e.g., the aggregate output of a large department. Even then, the variability associated with differences in sample size means it should not be reported beyond a single decimal place. It is not suitable for evaluating individual researchers because it is unreliable at the scale of a typical bibliography and can fluctuate significantly over time.

The RCR is an article-level indicator that correlates strongly with the FWCI across numerous subject areas, and has elicited similar concerns about its reliability and suitability for researcher assessment.
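
The following is a deliberately simplified sketch of the idea behind field normalization, not the actual FWCI algorithm: real implementations (e.g., in Scopus) depend on proprietary field classifications and expected-citation baselines, and the expected values and citation counts below are entirely hypothetical.

```python
def simple_field_weighted_impact(papers):
    """Average of per-paper ratios of actual to expected citations
    (a simplified stand-in for field-normalized indicators such as FWCI)."""
    ratios = [p["citations"] / p["expected_citations"] for p in papers]
    return sum(ratios) / len(ratios)

# Hypothetical small collection: one highly cited outlier dominates the average.
papers = [
    {"citations": 2,   "expected_citations": 10},   # ratio 0.2
    {"citations": 5,   "expected_citations": 10},   # ratio 0.5
    {"citations": 300, "expected_citations": 10},   # ratio 30.0
]
print(round(simple_field_weighted_impact(papers), 1))  # 10.2
```

Removing or adding a single outlier changes the result dramatically, which is why, as noted above, such indicators are only informative for large collections of papers.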

Altmetrics

Altmetrics, a generalization of the term “alternative metrics,” attempt to capture the amount of attention a research output has received in non-academic outlets (e.g., organizational reports, social media). Types of activities captured within the metric score vary enormously, from those more focused on public engagement (e.g., tweets and reposts, Facebook mentions, newspaper or YouTube coverage), through to researcher engagement (e.g., patents, numbers of post-publication peer reviews, inclusion in research highlight platforms), and even inclusion in policy documents. Different types of altmetric scores, which can be calculated for articles, books, data sets, presentations, and more, can be obtained from a range of commercial providers, including Altmetric, ImpactStory, Plum Analytics, and Overton.

Altmetric information is often presented as a composite score, which represents a weighted measure of all the attention picked up for a research output (i.e., not a raw total of the number of mentions). The weightings used vary for different types of attention and can change as the organizations that produce these scores reassess periodically how best to create the composite figure and as different contributing sources are added or removed over time. It is also important to note that some of the activities included in altmetric scores, especially those associated with social media, are prone to being gamed.

Because of the relatively opaque ways they are calculated, altmetric scores provide little context for the type and purposes of engagement with particular research outputs and are difficult to interpret in terms of broader research impact. They are not in any meaningful sense a measure of research quality.

However, when details of the original mentions and references that contribute to these altmetric scores are provided, these might provide useful information in a more specific context about the levels of attention and reach of a research output (e.g., interest generated among patient advocacy groups). In such circumstances, they may be a useful component of a broader examination of research contributions.
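
To illustrate why composite altmetric scores are hard to interpret, here is a toy sketch of a weighted composite; the weights and mention counts are purely hypothetical and are not those used by Altmetric, Plum Analytics, or any other provider.

```python
# Purely hypothetical weights, chosen only to illustrate the mechanics of a
# weighted composite score; real providers use different (and changing) weights.
WEIGHTS = {"news": 8, "policy": 5, "blog": 3, "social_media": 0.25}

def composite_attention_score(mentions):
    """Weighted sum of mention counts by source type (illustrative only)."""
    return sum(WEIGHTS.get(source, 1) * count for source, count in mentions.items())

output_a = {"news": 1, "social_media": 40}  # one news story plus heavy social media activity
output_b = {"policy": 3, "blog": 1}         # cited in three policy documents and one blog
print(composite_attention_score(output_a))  # 18.0
print(composite_attention_score(output_b))  # 18
```

The two hypothetical outputs receive essentially the same score despite attracting very different kinds of attention, which is why the underlying mentions, rather than the composite figure, are usually the more informative part of altmetric data.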

Concluding remarks

The guidance given in this briefing note is neither exhaustive nor comprehensive, but it illustrates how the principles laid out in the DORA declaration can be applied when other metrics are considered for use in assessment of research or researchers. The examples included here refer only to publication-based metrics, but other indicators should be treated in the same way (e.g., see the Metrics Toolkit and the challenges associated with making targets of metrics). For example, grant funding income is often assessed during researcher evaluation since the ability to win competitive funding for ideas is a desirable attribute, but this information should always be contextualized: it is important to recognize that the requirement for funds differs markedly between fields (even within STEM disciplines), that biases still disfavor women and other under-represented groups, and that even the most rigorous funding decisions are attended by uncertainty and are poorly predictive of research productivity.

DORA

6120 Executive Blvd., Rockville, MD, USA

sfdora.org
