
Integrative and Comparative Biology

Integrative and Comparative Biology, pp. 1–11


doi:10.1093/icb/icy076 Society for Integrative and Comparative Biology

SYMPOSIUM

Biodiversity Assessment, DNA Barcoding, and the Minority Majority
Julia D. Sigwart1 and Amy Garbett
Queen’s University Marine Laboratory, 12-13 The Strand, Portaferry BT22 1PF, Northern Ireland
From the symposium “Measuring Biodiversity and Extinction: Present and Past” presented at the annual meeting of the
Society for Integrative and Comparative Biology, January 3–7, 2018 at San Francisco, California.

1 E-mail: j.sigwart@qub.ac.uk

Synopsis The majority of species on Earth are in “under-studied” groups, and indeed probably the majority of species remain undiscovered and undescribed. Species are natural units of evolution, and they are formed from branching phylogenetic processes that have a mathematical structure. So it follows that we should be able to develop a set of general principles that describe global patterns of species groups, like genera. Understanding such patterns would lend considerable power to the approach of “taxonomic surrogacy.” In environmental assessments, ecology, and paleontology, it is common to substitute genus-level or family-level identification where definitive species identification is impractical. Clarity and confidence in fundamental patterns, based on a robust null model for species- and genus-level diversity, can accelerate species discovery: there are more species in the tropics, species-poor genera are very common, large genera are rare. Much hope has been placed in DNA barcoding as an effective tool to increase the pace of species discovery, but it is abundantly clear that certain mitochondrial DNA (mtDNA) markers are more or less variable in different clades and universal threshold values are impractical for delimiting species. This study further examines the patterns of divergence in one common mtDNA barcode fragment, cytochrome c oxidase subunit 1, at the genus level. We compared pairwise divergence in this fragment between two animal clades that have similar species richness but different evolutionary histories: birds and bivalves. We analyzed quality-controlled alignments of over 39,000 published sequences in 1223 genera. Median pairwise differences at the genus level are positively correlated with the species richness of a genus, and this is not dependent on the number of sequences sampled. Unsurprisingly, sequence divergence in vertebrates was far more constrained than in evolutionarily more ancient non-vertebrate clades. Differences among the groups examined highlight the need for DNA barcode approaches to be considered in the context of specific biological groups. Vertebrates are better studied, but not necessarily representative of the majority of biodiversity. A technique that provides powerful insights for vertebrate species may be ineffective for the majority of organisms.

Introduction

Many natural and human systems follow highly skewed distributions. Skewed distributions are so prevalent, it has even been proposed that this may be an emergent property of the universe, attributable to entropy (Grönholm and Annila 2007). The distribution of wealth, frequency of the sizes of cities, or the sizes of corporations all form similar patterns where smaller units are very frequent and larger units are increasingly less common. There are many small towns but a few major cities; there are many poor people and a few mega-rich. The same type of pattern emerges in ecology, such as in the relative abundance of co-occurring species: most species are rare.

Skewed distributions have been observed in taxonomy, where smaller groups are very frequent and larger groups are less common (Fig. 1). Monotypic genera, those with only one species, are the largest fraction of global genera; large genera are globally rare. This was first observed by Yule (1925), but many authors considered the skewed frequency distribution within taxonomic ranks to be an artifact of human preferences in classification (Strand and Panova 2015). Recent work has now shown that the skewed distribution in taxonomy is concordant with evolution, and phylogenetic simulations produce similar taxonomic patterns to real world data (Sigwart et al. 2018).

© The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology.
All rights reserved. For permissions please email: journals.permissions@oup.com.
Downloaded from https://academic.oup.com/icb/advance-article-abstract/doi/10.1093/icb/icy076/5058946
by New Mexico State University Library user
on 28 July 2018

[Figure 1: percent of genera (0–60%) plotted against genus size (species richness, 1–1000). Groups shown: all marine invertebrates (n = 29,269 genera), birds (n = 2,278 genera), Odonata (n = 688 genera), Rhodophyta (n = 1,005 genera), reptiles (n = 1,176 genera), Bryophyta (n = 561 genera), fish (n = 4,915 genera), Pteridophyta (n = 1,309 genera), and mammals (n = 1,242 genera).]

Fig. 1 The distribution of genus size in terms of species richness follows a well-described right skew pattern. Among global genera in
animals and plants, monotypic (one species) and small genera are most frequent, and large genera are a very small proportion of
genera. Figure modified from Sigwart et al. (2018).
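
The right skew shown in Fig. 1 can be illustrated with a toy branching simulation. The sketch below is only an illustration under stated assumptions, not the simulation method of Sigwart et al. (2018): it uses a Simon-style birth process in which each new species either founds a new genus, with a hypothetical probability `p_new_genus`, or joins the genus of a randomly chosen existing species. The function name, parameter values, and seed are all illustrative choices.

```python
import random
from collections import Counter


def simulate_genus_sizes(n_species, p_new_genus, seed=0):
    """Toy Simon-style birth process (an illustrative assumption, not the
    published model). Returns a list with the species count of each genus."""
    rng = random.Random(seed)
    genus_of_species = [0]  # genus index of every species created so far
    genus_sizes = [1]       # species count per genus
    for _ in range(n_species - 1):
        if rng.random() < p_new_genus:
            # The new species founds a new, initially monotypic genus.
            genus_sizes.append(1)
            genus_of_species.append(len(genus_sizes) - 1)
        else:
            # Pick a random existing species as the parent, so genera gain
            # species in proportion to their current size.
            parent_genus = rng.choice(genus_of_species)
            genus_sizes[parent_genus] += 1
            genus_of_species.append(parent_genus)
    return genus_sizes


sizes = simulate_genus_sizes(n_species=20000, p_new_genus=0.1, seed=1)
size_counts = Counter(sizes)
most_common_size, _ = size_counts.most_common(1)[0]
fraction_monotypic = size_counts[1] / len(sizes)

print(f"genera: {len(sizes)}, largest genus: {max(sizes)} species")
print(f"most common genus size: {most_common_size}")
print(f"fraction of monotypic genera: {fraction_monotypic:.2f}")
```

Under these assumptions, monotypic genera form the largest single size class and a few genera become very large, which is qualitatively the pattern described in the caption above. The exact proportions depend on the arbitrary parameter values chosen here.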

The relationship between phylogeny and taxonomy is important because measures of biodiversity frequently depend on the substitution of higher-ranked groups wherein species identifications are not available (Gaston and Williams 1993; Bertrand et al. 2006). This approach of “taxonomic surrogacy” is used in estimations of both present biodiversity and long-term patterns in the fossil record. Major shifts in the diversity of genus-level or family-level groups are the primary evidence for past mass extinction events (Raup and Sepkoski 1986; Hendricks et al. 2014). Identifying some organisms to genus, family, or perhaps even coarser groupings is a technique that is also routinely used in ecology and environmental biology. In biodiversity assessment it is common to identify a specimen to the best available level, based on the expertise of the observer or the quality of information. It is not clear whether morphotaxonomic approaches (using key characters or synapomorphies for a diagnosis to genus or family level, as appropriate) are fully transferable to modern molecular approaches used to delimit species.

Identifying species often involves using mitochondrial DNA (mtDNA) fragments. There are several advantages of mtDNA over nuclear markers. It is estimated that mtDNA has a mutation rate perhaps 10 times higher than that of nuclear DNA, and this makes it appropriate for identifying recent evolutionary divergences, at the population or species level (Brown et al. 1979). There are vast numbers of mitochondria available from each cell, and mtDNA is also known to be more durable than nuclear DNA (e.g., Schwarz et al. 2009), although this varies among different organisms, perhaps with metabolic rate (e.g., Kazakova and Markosian 1966). This makes mtDNA a useful tool for cases where environmental and weather conditions may have degraded the DNA, including in subfossil remains and museum material in chemical preservation (Friedman and DeSalle 2008; Higgins et al. 2015). Work by Hebert et al. (2003) proposed that most animal species can be rapidly and correctly identified by examining the DNA sequence of a portion of the cytochrome c oxidase subunit I (COI) mitochondrial gene, a “DNA barcode.”

In order for DNA sequence data to provide useful species-level identifications, there must be a well-developed reference library to match the genetic sequence of the individual of concern to a species that was previously identified (Gotelli 2004). If there is no reference sequence that matches the species being examined, then comparing a new sequence to the database will either return no answer or, worse, return an incorrect identification based on the closest available match. Such publicly accessible reference databases covering taxa from around the globe are continuously being built, but remain very incomplete (Federhen 2011; Ratnasingham and Hebert 2007). In fungi, this was identified as a significant problem in 2008, when the number of sequences produced from
amplifying environmental samples had already begun to outpace specimen-based voucher sequences with robust identification (Blackwell 2011).

In cases where a species-level match is not possible, molecular operational taxonomic units can be analyzed as interim units for ecological studies (Blaxter et al. 2005). In the most automated approach, often used in microbiology, operational taxonomic units are identified based on a similarity threshold of 1–3% difference in pairwise comparisons among sequence fragments. There are, however, limitations associated with barcoding where recently diverged or hybridizing species may be overlooked, or specimens from one species with naturally high variation in that region are falsely scored as multiple species. Over a large spatial scale the separation between intraspecific versus interspecific divergences can be reduced, limiting the power of the analysis to observe distinct species (Bergsten et al. 2012). It is clear that a generalized threshold to separate species is not appropriate, and may not even be transferable across groups or regions.

Importantly, higher taxonomic ranks are not assumed to represent any fixed level of diversification. Different groups evolve at very different rates (Langley and Fitch 1974). Therefore, a genus, family, or class cannot represent an absolute span of diversity in terms of either species richness or morphological disparity. That would imply some ultimate authority and a rigidly fixed process of macroevolution. Instead, ranks are applied in a relativistic framework, grouping species based on comparisons with their nearest similar or evolutionarily related counterparts. Nonetheless, rates of diversification of lineages that are morphologically recognized as species and clades should be linked to rates of evolution of the genome.

For many reasons, the question of genus-level diversity is crucial to understanding global biodiversity. Binominal taxonomic names include species and genus epithets, and fundamentally similar species are classified in the same genus. This article asked the simple question of whether the genetic “size” of a genus, measured as divergence in the COI barcode region, had any recognizable trends or patterns that correlate with species richness of the sampled genus. For the sake of argument, there are several potential patterns that may link the variation in species richness of genera to genetic divergence. For example, there could be a fixed threshold distance that universally separates species. In this case, higher species richness in a genus would result in a larger genetic size of the genus. Alternatively, a genus may be a fundamental unit with a fixed genetic size. In this scenario, the divergence among species contained in a genus would be constant regardless of species richness, and remain unaffected by rapid radiations. Finally, species may be placed in a genus by taxonomists at random, which would therefore result in entirely random genetic divergence across genus-level groups.

In order to address this question, we sampled all available published COI barcode sequences for two major animal groups with contrasting taxonomic and evolutionary contexts. Birds are the most taxonomically complete major group of animals; although there are ongoing arguments about the partitioning of species and subspecies units, the number of named species of birds closely matches the projected total global species richness (Barrowclough et al. 2016). Birds also present a wealth of natural history data and extensive knowledge of their evolution and their major radiation that began shortly after the end of the Cretaceous, 66 million years ago (Claramunt and Cracraft 2015). The rates of evolution of bird mitochondrial genomes are correlated with species diversification (Eo and DeWoody 2010). By contrast, bivalves are a broadly distributed invertebrate group with a deep and rich fossil record dating to the earliest hard-shelled animals in the Cambrian (Stöger et al. 2013). Yet there is an ongoing and high frequency of discovery of new species of bivalves, and the basic shape of the bivalve phylogenetic tree has only recently been resolved (Gonzalez et al. 2015). Systematic studies of many other groups of hard-shelled marine invertebrates with long fossil records are similarly dominated by discovery of new taxa over phylogenetic revision.

There are clearly different taxonomic traditions in different organismal groups; the technical specifics of what differentiates species- or genus-level diagnoses cannot be directly transferred from one animal group to another. Still, the emergent taxonomic pattern of genus size leads to statistically equivalent frequency distributions in birds and marine invertebrates (Sigwart et al. 2018; Fig. 1). This sets up a comparison of contrasts to examine the role of a DNA barcode marker in distinguishing genus-level groups.

Methods

The analytical workflow for this project was managed in R (R Core Team 2017). This followed a process, described below, of extracting data from the online BOLD Systems (Ratnasingham and Hebert 2007; http://www.boldsystems.org/, accessed March 2018), applying quality control filters to the data, aligning sequence files via the CIPRES Science Gateway (Miller et al. 2010), and applying further quality controls by inspection, before assessing
patterns of genetic distance. In these steps, data were managed at the taxonomic ordinal level.

Taxa were selected for the analysis to represent animal groups with deliberately different taxonomic completeness and taxonomic practice. We extracted COI barcode data for all living species of birds, bivalves, and three other groups of marine invertebrates with hard skeletons and deep fossil records: polyplacophoran molluscs (chitons) in two orders, Lepidopleurida and Chitonida; bryozoans in the order Cheilostomatida; and brittlestars in the order Ophiurida. The taxonomic names of orders were used as the “taxon” search string via the “bold” package in R (Chamberlain 2017). Alternative versions of the names were added whenever available (e.g., combining results for both “Venerida” and “Veneroida” for Venus clams). The extracted data were subset to include only COI fragments, filtered by the database entry for genetic marker being either “COI-5P” or blank. The resulting filtered dataset was exported in fasta format. Each sequence file was then aligned in CLUSTALW (Larkin et al. 2007) using a high gap penalty to prevent internal gaps. Resulting alignment files were then inspected, and any misaligned sequences were discarded (these were assumed to be reverse complement sequences or alternative fragments that were not correctly identified in the source database, but no attempt was made to fit them to the alignment). Where a sequence was fit to the alignment via internal gaps or violated a conserved sequence region, it was removed. The alignment was then manually trimmed to a central region of 400 bp that maximized overlap of the available sequences in that taxonomic order.

All sequence records that were incomplete for genus and/or species identification were discarded. Species names in online genetic data repositories also include unidentified species (often called “sp.”), hybrids and varieties, and mis-spellings (e.g., one species of parrot represented as both Orthopsittaca manilatus and O. manilata). We did not attempt any intervention to correct the syntax, so completeness must be seen as approximate, and may be inflated. The number of species names per genus may exceed the number of valid names, where there are additional variant spellings, varietals, or subspecies. Where species coverage exceeded 100% (more species in our analysis than are taxonomically accepted as valid species in that genus), these genera were included in the analysis but excluded from trends in completeness. We examined the data for ambiguous taxa (those that were identified with “aff.,” “cf.,” “sp.,” or other indication of uncertainty in the species-level identification), and the whole containing genus for any ambiguous taxon was excluded from the analysis. If there was only one sequence fragment for a genus, then that genus was discarded.

Finally, we assessed the taxonomic completeness of genus-level groups with available sequence data. We compared the number of species within each included genus to established lists of genus species richness from authoritative taxonomic databases as used in our previous research compiled in 2015 (Sigwart et al. 2018). These figures for species richness may exclude a small number of new species named in recent years, but this is unlikely to have any systemic impact on results.

Alignment files for each taxonomic order were then split into subsets for each genus (based on the first element of the sequence name, the genus ID inherited from the original record data). We calculated a pairwise-distance matrix comparing all fragments in each genus using the function dist.alignment in the SeqinR package (Charif and Lobry 2007). Gaps (i.e., missing data at either end of these alignments) were not counted in the distance identity measure. Note that this function returns a matrix of square root values for percent difference; these were squared to report percent differences. This returned the following data for each genus-level group: genus name, number of species included, number of sequences in analysis, median percent difference from pairwise comparisons, and maximum pairwise difference.

We compared genetic distance per genus against four metrics: genus size (species richness), taxonomic completeness (proportion of valid species included), species sampled (the number of species included per genus), and sequence coverage (number of available sequence fragments per genus). Comparisons were visualized using the R package GGally (Schloerke et al. 2017). We examined patterns of genetic distance data in four partitions: bivalves, other sampled invertebrates (not a natural group), passerine birds, and non-passerine birds (as the order Passeriformes comprises more than half of living bird species richness). Spearman’s rank correlation coefficient was calculated for each comparison, using log-10 transformed data for species richness, species sampled, and sequences sampled. Additional statistics were not pursued because of low correlation values in almost all cases, inapplicability of normality criteria, and intrinsic interdependencies in the potential predictors (e.g., number of species sampled and taxonomic completeness are capped by the species richness of a sampled genus).
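
The per-genus distance summary described in the Methods can be sketched in plain Python. This is an illustration only, not the authors' R workflow: it computes the proportion of differing sites between aligned fragments while skipping gap positions, groups sequences by the first element of the sequence name, and reports the median and maximum percent difference per genus. The helper names, the `Genus_species_id` naming pattern, and the toy sequences are hypothetical. Because the proportion is computed directly here, there is no square-root value to square, unlike the `dist.alignment` output described in the text.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a, seq_b, missing="-N?"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or missing-data character
    are skipped, so unequal fragment ends do not count as differences.
    Returns None if no sites can be compared.
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else None


def summarize_by_genus(alignment):
    """Summarize pairwise distances per genus.

    `alignment` maps hypothetical names of the form 'Genus_species_id'
    to aligned sequences of equal length.
    """
    by_genus = {}
    for name, seq in alignment.items():
        genus, species = name.split("_")[:2]
        by_genus.setdefault(genus, []).append((species, seq))

    summary = {}
    for genus, records in by_genus.items():
        distances = [
            d
            for (_, a), (_, b) in combinations(records, 2)
            if (d := p_distance(a, b)) is not None
        ]
        if not distances:
            continue  # a single sequence gives no pairwise distance
        summary[genus] = {
            "n_species": len({species for species, _ in records}),
            "n_sequences": len(records),
            "median_pct": 100 * median(distances),
            "max_pct": 100 * max(distances),
        }
    return summary


# Toy alignment (invented sequences, for illustration only).
toy_alignment = {
    "Alpha_one_1": "ACGTACGTAC",
    "Alpha_one_2": "ACGTACGTA-",  # trailing gap is not counted
    "Alpha_two_1": "ACGTTCGTAC",
    "Beta_solo_1": "ACGTACGTAC",  # only one sequence: genus is skipped
}
result = summarize_by_genus(toy_alignment)
for genus, stats in result.items():
    print(genus, stats)
```

The median is used as the main summary because a single misidentified sequence can inflate the maximum, a point the Results return to.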


Table 1 Comparative information for data used in the present analysis in birds (data for Passeriformes are given separately from other orders, as they are illustrated separately in Fig. 2), bivalves, and additional hard-shelled invertebrate taxa (chitons, bryozoans, and ophiuroids, see the “Methods” section)

                                                                                              Approximate global species
                                                                                              richness (nearest 1000)
Animal group          Sequences   Species   Genera    Total valid living   Sequences per
                      sampled     sampled   sampled   genera (approx.)     genus (median)     Described   Estimated
Non-passerine birds   4644        861       276       980                  7                  4000        4000
Passerine birds       8052        1640      517       1300                 7                  6000        6000
Bivalves              20,928      1334      290       500                  12                 9000        10,000–20,000
Chitons               1505        100       32        70                   12.5               1000        –
Bryozoans             530         61        30        400                  3                  4000        –
Ophiuroids            1038        88        41        300                  4                  2000        –

The number of genera sampled is based on genus identification provided by the data repository (BOLD). This number excludes genera removed because they were represented by a single fragment (and therefore no distance metric could be calculated) or because they included ambiguously identified species.
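
The exclusion rules behind the genus counts in Table 1 can be sketched as a simple record filter. This is a minimal illustration under stated assumptions, not the authors' R code: the dictionary keys (`genus`, `species`, `marker`) are hypothetical field names rather than actual BOLD column names, and the regular expression covers only the “aff.,” “cf.,” and “sp.” markers named in the Methods.

```python
import re
from collections import defaultdict

# Tokens signalling an uncertain species-level identification.
AMBIGUOUS = re.compile(r"(^|\s)(aff|cf|sp)\.?(\s|$)", re.IGNORECASE)


def filter_records(records):
    """Apply the exclusion rules to a list of record dictionaries.

    Returns {genus: [records]} for genera that pass every rule.
    """
    by_genus = defaultdict(list)
    for rec in records:
        marker = (rec.get("marker") or "").strip()
        genus = (rec.get("genus") or "").strip()
        species = (rec.get("species") or "").strip()
        if marker not in ("COI-5P", ""):
            continue  # keep only COI-5P or blank marker entries
        if not genus or not species:
            continue  # incomplete genus and/or species identification
        by_genus[genus].append(rec)

    kept = {}
    for genus, recs in by_genus.items():
        if any(AMBIGUOUS.search(r["species"]) for r in recs):
            continue  # one ambiguous taxon excludes the whole genus
        if len(recs) < 2:
            continue  # a single fragment gives no pairwise distance
        kept[genus] = recs
    return kept


# Toy records (invented for illustration).
records = [
    {"genus": "Mytilus", "species": "edulis", "marker": "COI-5P"},
    {"genus": "Mytilus", "species": "trossulus", "marker": ""},
    {"genus": "Mytilus", "species": "edulis", "marker": "16S"},
    {"genus": "Ostrea", "species": "edulis", "marker": "COI-5P"},
    {"genus": "Ostrea", "species": "sp.", "marker": "COI-5P"},
    {"genus": "Pecten", "species": "maximus", "marker": "COI-5P"},
    {"genus": "", "species": "unknown", "marker": "COI-5P"},
]
kept = filter_records(records)
print({genus: len(recs) for genus, recs in kept.items()})
```

Dropping the whole genus when any member is ambiguous is the conservative choice described in the text; a less strict filter would drop only the ambiguous records.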

Results

This analysis used a total of 1223 animal genera, represented by 39,644 DNA sequence fragments (Table 1). As expected, the availability of sequence fragments was very heavily right skewed, with the majority of genera having few sequences available and a few taxa with many sequences. On inspection of selected data, we observed that the distribution of pairwise comparison values is often multimodal, perhaps resulting from a combination of data sources. A dataset at the genus level could include multiple species and/or sites, and local population-level studies that may produce a cluster of lower distance values with many measurements from a single species or locality. An arithmetic mean is not an appropriate measure of central tendency for the genus genetic size, and the median value of the distance matrix was taken as the primary descriptive statistic; although the median does not capture any description of frequency distribution, it is less biased.

Pairwise distances per genus occupied a much broader range of values in bivalves and other invertebrate taxa when compared with those of birds (Figs. 2 and 3). Pairwise distance values were not strongly correlated with the other factors examined. Although these correlations were nominally statistically significant in all cases for birds, the highest correlation coefficient was <70%, which cannot be considered a strong indicator of a potential causative relationship. The data do suggest a strong positive relationship between genus-level genetic distance and both species richness and sampling density; larger genera and larger datasets have greater maximum pairwise distances. This does not account for taxonomic completeness; here, a genus is only present or absent, without considering the fraction of its known species that are present in the dataset. Genera with fewer species are inherently more likely to be completely sampled (e.g., a genus with only one species must have 100% of species sampled, if it is included at all). Thus the non-correlation or even slightly negative correlation of these distance metrics with taxonomic completeness does not refute the larger pattern of larger genera with greater distances. Importantly, the maximum value for pairwise distances among sequences in a genus could easily be skewed by the inclusion of even a single misidentified sample. Median values for these data are probably more informative.

Median values for pairwise distances indicate two important findings. First, median genetic distances are larger in genera that contain more species, and increase with both species richness and the number of species sampled. Second, median distance measurements are not affected by increasing sample saturation; the number of sequences sampled is not significantly correlated with median genetic distance per genus in bivalves (P = 0.70) or other invertebrates (P = 0.64) and only weakly correlated with increasing sampling in birds (Spearman’s ρ = 0.46).

Discussion

Diversity, divergence, and expectations
It was not our expectation that birds and bivalves have very much in common, and these two major groups were deliberately selected as contrasting animal groups with coincidentally similar species and genus richness. The pitfalls of dependency on DNA barcodes have been highlighted in many previous
[Figure 2: scatterplot panels for bird genera. Rows: maximum pairwise distance per genus (COI, 0–50%) and median pairwise distance per genus (COI, 0–50%). Columns: species richness (1–100), taxonomic completeness (0–1), species sampled (1–10), and sequences sampled (1–1000).]

Spearman’s rank correlation coefficients (ρ) shown in Fig. 2:

                            Species richness    Taxonomic completeness   Species sampled     Sequences sampled
Maximum, non-passerines     0.65 (P < 0.001)    -0.22 (P < 0.001)        0.75 (P < 0.001)    0.60 (P < 0.001)
Maximum, passerines         0.61 (P < 0.001)    -0.22 (P < 0.001)        0.78 (P < 0.001)    0.64 (P < 0.001)
Median, non-passerines      0.62 (P < 0.01)     -0.20 (P = 0.0012)       0.72 (P < 0.001)    0.46 (P < 0.001)
Median, passerines          0.55 (P < 0.01)     -0.19 (P < 0.001)        0.72 (P < 0.001)    0.44 (P < 0.001)

Fig. 2 Measures of genetic distance per genus, as correlated to other taxonomic attributes of the genus (i.e., each plotted circle
represents one genus). The metrics are either the maximum (top) or median (lower) pairwise genetic distance for COI fragments
available for all genera of birds. The distance metrics are compared with species richness (the number of taxonomically valid species
per genus), taxonomic completeness (number of species sampled per genus, as a percent of valid species), species sampled (count of
species included per genus in the analysis), and sequences sampled (number of data points as barcode sequences per genus). Genera in
the order Passeriformes are solid, darker circles; other non-passerine genera are lighter, open circles. Spearman’s rank correlation
coefficients for each plot are given in the lower half of the figure.
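
Spearman’s rank correlation, as reported for each panel, can be sketched without external libraries. The example below is a minimal illustration with invented toy values, not the authors' data or code: it assigns average ranks to ties and then computes a Pearson correlation on the ranks. It also shows that a log-10 transform leaves the coefficient unchanged, because a monotonic transform preserves rank order; the transform matters for plotting rather than for ρ itself.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values, one entry per hypothetical genus.
species_richness = [1, 2, 2, 5, 12, 40, 110]
median_distance_pct = [0.4, 1.1, 0.9, 2.5, 2.2, 4.8, 6.0]

log_richness = [math.log10(v) for v in species_richness]
rho_log = spearman_rho(log_richness, median_distance_pct)
rho_raw = spearman_rho(species_richness, median_distance_pct)
print(f"rho (log10 richness): {rho_log:.3f}")
print(f"rho (raw richness):   {rho_raw:.3f}")
```

P values are omitted from this sketch; they would need a permutation test or a statistics library.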

studies, in particular that specific markers perform differently across animal groups, and that is illustrated by the present results (Rubinoff 2006; Kress et al. 2015). The standards developed in the context of one group of organisms may not be transferable to any other group.

There are very different patterns in genetic distance at the genus level between birds and the invertebrate animals sampled, and the difference is sufficiently stark that it is useful to consider similarities as well as differences between the clades and the data analyzed. Genetic distances in bird genera follow striking and well-resolved patterns with correlations between genetic distances and other factors. The measured genetic distances per genus are more constrained in birds, with a maximum pairwise difference of around 20% per genus compared with up to 50% divergence within bivalve genera. In light of the lack of any tidy correlations of our measured factors with genetic distances in bivalves or other invertebrates, it is tempting to consider explanations that would excuse the pattern, based on an assumption that the more constrained pattern observed in birds is the “correct” result. Perhaps the identification or definitions of genera were incorrect in these invertebrates, species were incorrectly identified, sequences were contaminated, too few genera were included, or species richness within genera was insufficiently sampled. We will consider each of these issues below.

On the issue of sampling available species or genus diversity, representation appears to be largely comparable and equivalent between birds and the invertebrate sequences analyzed (Table 1); in both cases, they represent mostly low numbers of sequences per species or per genus and low sampling
[Figure 3: scatterplot panels for bivalve and other invertebrate genera. Rows: maximum pairwise distance per genus (COI, 0–50%) and median pairwise distance per genus (COI, 0–50%). Columns: species richness (1–100), taxonomic completeness (0–1), species sampled (1–10), and sequences sampled (1–1000).]

Spearman’s rank correlation coefficients (ρ) shown in Fig. 3:

                            Species richness    Taxonomic completeness     Species sampled     Sequences sampled
Maximum, bivalves           0.29 (P < 0.01)     0.14 (P = 0.19, N.S.)      0.54 (P < 0.001)    0.4 (P < 0.01)
Maximum, invertebrates      0.4 (P < 0.001)     -0.15 (P = 0.18, N.S.)     0.67 (P < 0.001)    0.36 (P < 0.001)
Median, bivalves            0.42 (P < 0.001)    -0.10 (P = 0.37, N.S.)     0.47 (P < 0.001)    -0.03 (P = 0.70, N.S.)
Median, invertebrates       0.22 (P = 0.046)    -0.03 (P = 0.81, N.S.)     0.53 (P < 0.001)    -0.05 (P = 0.64, N.S.)

Fig. 3 Measures of genetic distance per genus, as correlated to other taxonomic attributes of the genus (i.e., each plotted circle represents one genus); metrics are all as in Fig. 2. Bivalve genera are solid (teal) circles; select other invertebrates are black, open circles. Spearman’s rank correlation coefficients for each plot are given in the lower half of the figure.

density of known species. Higher density sampling is unlikely to change the pattern recovered in bivalves. Other explanations for the divergent patterns in bivalves and birds (misidentification, contamination) would imply substantial, systemic problems across a very large scale of collective research on bivalves and marine invertebrates (which some invertebrate workers may suspect is the case) and also require the unrealistic absence of equivalent errors from anyone working on birds.

The source data for these analyses were gleaned from public databases, and the species names on sequences in these repositories often lead later workers to identify their own samples by comparison. Early errors lead to incorrect sequence-based identification, false confidence, and a cascade of later problems (Morton 2018). It should be noted that this problem is not isolated to sequence data, but voucher specimens are prerequisite for later comparisons and often not available for published sequences (Bortolus 2008). The present analysis excluded genera that contained only a single sequence, or any sequences that had ambiguous species-level identification. Uncertainty at the species level indicates neither uncertainty of genus-level identification nor uncertainty in the identification of other species in the genus, but our conservative approach removed relatively few genera. We did examine data from excluded genera, and there was no evident pattern of genetic distance with respect to the presence of ambiguous taxa in a genus, except that the broadest invertebrate samples, all genera with more than 15 species sampled in a genus, contained uncertain identifications. The same was not true of birds. This could indicate that among the sampled invertebrates, larger samples of species richness are an indicator of research concentration on areas of taxonomic or phylogenetic uncertainty.

Resources like GenBank are anecdotally rife with misidentifications, at the species level, genus level, and above, and data integrity protocols or user motivation may create barriers to correcting
misidentifications discovered after publication (Morton 2018). It is reasonable to assume that these errors occur more frequently for invertebrate taxa than for birds. However, it is our view that misidentification represents a minority (possibly a large minority); certainly not all of the sequences in GenBank are misidentified. It is relevant, then, that there is no pattern in our analysis of bivalve genera that suggests the presence of identifiable outliers (Fig. 3), which would be expected if some genera included misidentified sequences, and thus diverged from an underlying pattern. Although data gaps and uncertainty problems are substantial barriers for understanding non-vertebrate diversity, a larger issue may be unconscious expectations of equivalency among highly divergent groups.

Genus size and the biodiversity minority majority

DNA barcoding techniques are increasingly used in combination with classic taxonomy, adapting to the growing needs for species classification (Purty and Chatterjee 2016). Species identification and delimitation are often the primary focus to tackle the added demand for measuring and monitoring global biodiversity. Although species level identification is undoubtedly a vital measure of diversity, it is often not feasible to refine identifications to species level; for example, in fossil taxa, or due to financial considerations, prioritizing efforts in large-scale studies, or differentiating morphologically similar species in the field (Heino and Soininen 2007; Mazaris et al. 2008; Bertasi et al. 2009). Working toward cataloging all living global diversity may require resorting to higher ranked taxonomic levels, such as genera, as a substitute for species.

Genus level genetic variety, which is relevant to the overarching goals of biodiversity inventories, remains understudied. Species-level genetic thresholds have been widely examined across a wide range of evolutionary groups; for example, a 2% genetic variance is widely acknowledged as an “acceptable” species threshold. Although this may be adequate for specific markers in specific groups, there is no foundation to expect that the same threshold, even for one particular marker, would transfer across organismal groups, or that one threshold would apply to different markers. Methods for assessments are disproportionately focused on taxonomically well-known groups, and later adapted for application to other less well-studied groups. Variation among clades is an important factor: what may work for one may not necessarily work for another. The results presented here demonstrate varying genetic sizes at genus level among the different evolutionary groups as well as different effects of some metrics of interest.

Molecular phylogenetic studies frequently reveal hidden diversity by identifying evolutionarily distinct lineages. Yet the appropriate cut-off between what constitutes distinct populations within a species versus distinct species is not always readily apparent. As a result, the act of defining a population (or group of closely related populations), and what warrants species status, is not clear cut. Divergence in genetic markers may indicate separate thresholds at population or species level; the additional results here suggest that median distances at the genus level in COI are not affected by sample saturation and may be useful in future studies. Traditionally, species delimitation has relied heavily on morphological characterization to distinguish taxa. However, this method can fail to adequately reflect distinct evolutionary lineages and may result in underestimates of biodiversity (Knowlton 2000). Species delimitation is especially challenging for organisms with highly conserved morphology, the so-called cryptic species (sensu Bickford et al. 2007). Consequently, modern methods of species delimitation use objective criteria and incorporate data from multiple sources including ecological, geological, morphological, phylogenetic, and behavioral attributes (Yang and Rannala 2010).

Global species richness for birds is broadly accepted as around 10,000 species. Recent work posited that the number could be much higher based on genetic identification (Barrowclough et al. 2016), but that estimate is based on a much finer-scale analysis than could currently be attempted on any other group of organisms. Global species richness estimates for poorly-studied groups are notoriously inaccurate, because it is difficult to predict the data gap of undiscovered species (Bebber et al. 2007), so it is very difficult to predict the total global living species richness for bivalves with any confidence. The broad estimate of 10,000–20,000 bivalve species (e.g., Gonzalez et al. 2015) is still in line with the broadest interpretation of bird species richness, from 10,000 to 18,000 (Barrowclough et al. 2016).

Birds and bivalves represent similar extant species richness but very different evolutionary histories. Although the sampled invertebrates are all clades that have deep histories and persisted through multiple past mass extinction events and birds radiated more recently, they have similar numbers of living species. Genera of bivalves likely represent a more heterogeneous collection of rates of genotypic and phenotypic evolution, differing levels of extinction,
and more variety than found among a similar number of bird species. Nonetheless certain emergent patterns in species richness are universal (Sigwart et al. 2018; Fig. 1). Previous studies have attempted to impose standard metrics on species and species groups (e.g., Avise and Johns 1999). The results of this study are further evidence as to why it is not viable to extend such standardization across broadly separated groups with different evolutionary histories, and emergent patterns in species richness and clade size may not show straightforward correlation with genetic diversity or divergence times.

The COI mtDNA fragment was championed as an identification “barcode” because it is relatively fast-evolving and thus reveals species-level differences. This also means that divergences may become saturated, especially in comparing individuals within groups above the species level. Saturation could be a factor in the high divergences seen in our sampled invertebrates, but this does not necessarily mean that those species or genera are misclassified.

Species evolution is a continuous, ongoing process, and the process of speciation (one population-lineage splitting into daughter lineages with independent evolutionary trajectories) is not instantaneous. At any given point in time, different lineages are de facto in different points of their ongoing cyclical process of speciation and equilibrium. It is naturally often difficult to tell whether a species complex is a group of incipient species that are reticulate or overlapping, or rather a collection of very distinct species at equilibrium but observed without enough knowledge to differentiate them (Dobzhansky 1935). This is not to say that species are not real, but only that the species we observe at this moment in time are not all at evolutionary equilibrium. Much hope has been placed in species barcoding; however, given this macroevolutionary paradigm there may be fundamental limitations to the utility of barcoding to differentiate difficult or cryptic species. Although thresholds for a percent divergence in a given marker may be perfectly appropriate for certain constrained clades, the same approach may not be equally informative in other organisms (Rubinoff 2006). In this context, it should probably not be expected that an expanded approach, the genetic size of genera, would follow a clear trend. It is rather remarkable that there are trends like this for birds, as we found here, and these should probably not be seen as the “correct” answer or an aspiration for results from research on other organisms.

In addition to the theoretical and practical importance of species status, issues of taxonomy and nomenclature also play a critical role in other areas of biology. Estimates suggest that only about 10% of the world’s biodiversity has been described (Mora et al. 2011). Many taxonomic groups also have undergone revisions in light of molecular phylogenetic studies or additional discoveries, or are in need of revision on the basis of the new information from ongoing systematic research. This has absolutely become routine in most invertebrate animals, whereas in a well-studied clade like birds, the discovery of new species or dramatic phylogenetic revision is increasingly infrequent. However, among under-studied groups, the minority–majority that represents most animal life, the pace of new species descriptions has been described as glacial (Scotland et al. 2003). Many species remain in a taxonomic limbo due to the lack of formal description, and this problem has been termed the “Linnean shortfall” (Brown and Lomolino 1998). Although the seemingly arbitrary designator of a name should not dictate management or conservation of a species, in reality name recognition can have social, political, and legal influence (Beheregaray and Caccone 2007). Ambiguous species status can severely hamper conservation efforts for threatened cryptic taxa (e.g., Niemiller et al. 2013). Taxonomy is essential for conservation, but it is equally important that species names are recognized as scientific hypotheses subject to the same rigorous testing and ongoing refinement as any other branch of science (Thomson et al. 2018).

Morphological diagnoses remain important in the field, as the ultimate non-destructive sampling approach. In terms of the process of taxonomic identification and description, species are identified via a total evidence approach, which reflects the totality of their evolution, and there is no single easy solution. Even if an easy solution such as a barcoding marker is effective in one group, it is unlikely to be a taxonomic silver bullet.

Conclusions

Taxonomically well-studied groups provide aspirational goals for systematics, phylogenetics, and evolutionary biology. The wealth of knowledge on the natural history of birds, for example, contributes to the deep understanding of delimiting species and genera in that clade. However, it may be that the reason there is so much information and such complete taxonomy of birds is that birds are a relatively constrained group that follows unusually predictable evolutionary patterns. These patterns may not be a norm that could form any basis for
expectations in the evolution of other more diverse organisms. Similar patterns may be found in other vertebrates, but that represents a tiny fraction of real animal diversity. Although there are emergent universal patterns in systematics, such as global size-frequency of genera constrained across animals and plant groups, these mathematical phenomena are driven by deeper issues of evolution and not the specific evolutionary trajectory of individual lineages at the species or genus level. There are many evolutionary pathways that lead a clade to a small genus (combinations of extinction and/or a lack of diversification) but only one pathway to a large genus, through rapid radiation. The genetic patterns in species evolution in one group thus do not serve as a robust predictor for other groups, as evidenced here by diversification in a single “barcode” fragment.

The utility of barcode fragments should be considered with caution: divergence in a single fragment is not equally applicable to all animals, and different species evolve at different rates so thresholds should not be expected to be universally consistent. Moreover, genetic distance approaches are evidently even less consistently applicable to species groups, yet groups such as genera or families are frequently the appropriate level of identification in large-scale biodiversity surveys. Any approach to assessing biodiversity should work for most organisms. Most genera are small, and moreover most animal groups are under-studied. These are the minority majority of biodiversity. The most fascinating questions in understanding evolution lie in the source of comparative diversity and diversification.

Acknowledgments

This project would not have been possible without the resources made freely available by the Barcode of Life Data System (http://www.boldsystems.org/), and we are grateful to all the contributors and partners to that project. Four reviewers provided comments that improved an earlier version of this article.

Funding

This work was supported by the European Commission Horizon 2020 research and innovation program under grant agreement No. H2020-MSCA-IF-2014-655661, and the government of Northern Ireland (Department of Employment and Learning).

References

Avise JC, Johns GC. 1999. Proposal for a standardized temporal scheme of biological classification for extant species. Proc Natl Acad Sci U S A 96:7358–63.
Barrowclough GF, Cracraft J, Klicka J, Zink RM. 2016. How many kinds of birds are there and why does it matter? PLoS One 11:e0166307.
Bebber DP, Marriott FH, Gaston KJ, Harris SA, Scotland RW. 2007. Predicting unknown species numbers using discovery curves. Proc R Soc Lond B Biol Sci 274:1651–8.
Beheregaray LB, Caccone A. 2007. Cryptic biodiversity in a changing world. J Biol 6:9.
Bergsten J, Bilton DT, Fujisawa T, Elliott M, Monaghan MT, Balke M, Hendrich L, Geijer J, Herrmann J, Foster GN, et al. 2012. The effect of geographical scale of sampling on DNA barcoding. Syst Biol 61:851–69.
Bertasi F, Colangelo MA, Colosio F, Gregorio G, Abbiati M, Ceccherelli VU. 2009. Comparing efficacy of different taxonomic resolutions and surrogates in detecting changes in soft bottom assemblages due to coastal defence structures. Mar Pollut Bull 58:686–94.
Bertrand Y, Pleijel F, Rouse GW. 2006. Taxonomic surrogacy in biodiversity assessments, and the meaning of Linnaean ranks. Syst Biodivers 4:149–59.
Bickford D, Lohman DJ, Sodhi NS, Ng PK, Meier R, Winker K, Ingram KK, Das I. 2007. Cryptic species as a window on diversity and conservation. Trends Ecol Evol 22:148–55.
Blackwell M. 2011. The fungi: 1, 2, 3 . . . 5.1 million species? Am J Bot 98:426–38.
Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E. 2005. Defining operational taxonomic units using DNA barcode data. Philos Trans R Soc Lond B Biol Sci 360:1935–43.
Bortolus A. 2008. Error cascades in the biological sciences: the unwanted consequences of using bad taxonomy in ecology. Ambio 37:114–8.
Brown JH, Lomolino MV. 1998. Biogeography. Sunderland (MA): Sinauer Associates.
Brown WM, George M, Wilson AC. 1979. Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci U S A 76:1967–71.
Chamberlain S. 2017. Bold: interface to bold systems API. R package version 0.5.0.9213. (https://github.com/ropensci/bold)
Charif D, Lobry JR. 2007. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis. In: Bastolla U, Porto M, Roman HE, Vendruscolo M, editors. Structural approaches to sequence evolution: molecules, networks, populations. Berlin: Springer. p. 207–32.
Claramunt S, Cracraft J. 2015. A new time tree reveals Earth history’s imprint on the evolution of modern birds. Sci Adv 1:e1501005.
Dobzhansky T. 1935. A critique of the species concept in biology. Philos Sci 2:344–55.
Eo SH, DeWoody JA. 2010. Evolutionary rates of mitochondrial genomes correspond to diversification rates and to contemporary species richness in birds and reptiles. Proc R Soc Lond B Biol Sci 277:3587–92.
Federhen S. 2011. The NCBI taxonomy database. Nucleic Acids Res 40:D136–43.
Friedman M, DeSalle R. 2008. Mitochondrial DNA extraction and sequencing of formalin-fixed archival snake tissue. DNA Seq 19:433–7.
Gaston KJ, Williams PH. 1993. Mapping the world’s species—the higher taxon approach. Biodivers Lett 1:2–8.
Gonzalez VL, Andrade SC, Bieler R, Collins TM, Dunn CW, Mikkelsen PM, Taylor JD, Giribet G. 2015. A phylogenetic backbone for Bivalvia: an RNA-seq approach. Proc R Soc Lond B Biol Sci 282:20142332.
Gotelli NJ. 2004. A taxonomic wish-list for community ecology. Philos Trans R Soc Lond B Biol Sci 359:585–97.
Grönholm T, Annila A. 2007. Natural distribution. Math Biosci 210:659–67.
Hebert PD, Cywinska A, Ball SL. 2003. Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci 270:313–21.
Heino J, Soininen J. 2007. Are higher taxa adequate surrogates for species-level assemblage patterns and species richness in stream organisms? Biol Conserv 137:78–89.
Hendricks JR, Saupe EE, Myers CE, Hermsen EJ, Allmon WD. 2014. The generification of the fossil record. Paleobiology 40:511–28.
Higgins D, Rohrlach AB, Kaidonis J, Townsend G, Austin JJ. 2015. Differential nuclear and mitochondrial DNA preservation in post-mortem teeth with implications for forensic and ancient DNA studies. PLoS One 10:e0126935.
Kazakova TB, Markosian KA. 1966. Comparison of physico-chemical properties of mitochondrial and nuclear deoxyribonucleic acid from rat liver cells. Nature 211:79–80.
Knowlton N. 2000. Molecular genetic analyses of species boundaries in the sea. In: Sole-Cava AM, Russo CAM, Thorpe JP, editors. Marine genetics. Berlin: Springer. p. 73–90.
Kress WJ, García-Robledo C, Uriarte M, Erickson DL. 2015. DNA barcodes for ecology, evolution, and conservation. Trends Ecol Evol 30:25–35.
Langley CH, Fitch WM. 1974. An examination of the constancy of the rate of molecular evolution. J Mol Evol 3:161–77.
Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al. 2007. ClustalW and ClustalX version 2. Bioinformatics 23:2947–8.
Mazaris AD, Kallimanis AS, Sgardelis SP, Pantis JD. 2008. Does higher taxon diversity reflect richness of conservation interest species? The case for birds, mammals, amphibians, and reptiles in Greek protected areas. Ecol Indic 8:664–71.
Miller MA, Pfeiffer W, Schwartz T. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In: Proceedings of the Gateway Computing Environments Workshop (GCE), 14 November 2010, New Orleans (LA): IEEE. p. 1–8.
Mora C, Tittensor DP, Adl S, Simpson AG, Worm B. 2011. How many species are there on Earth and in the ocean? PLoS Biol 9:e1001127.
Morton B. 2018. Fake news. Mar Pollut Bull 128:396–7.
Niemiller ML, Graening GO, Fenolio DB, Godwin JC, Cooley JR, Pearson WD, Fitzpatrick BM, Near TJ. 2013. Doomed before they are described? The need for conservation assessments of cryptic species complexes using an amblyopsid cavefish (Amblyopsidae: Typhlichthys) as a case study. Biodivers Conserv 22:1799–820.
Purty RS, Chatterjee S. 2016. DNA Barcoding: an effective technique in molecular taxonomy. Austin J Biotechnol Bioeng 3:1059.
Ratnasingham S, Hebert PD. 2007. BOLD: the barcode of life data system. Mol Ecol Notes 7:355–64.
Raup DM, Sepkoski JJ Jr. 1986. Periodic extinction of families and genera. Science 231:833–6.
R Core Team. 2017. R: a language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria (https://www.R-project.org/).
Rubinoff D. 2006. Utility of mitochondrial DNA barcodes in species conservation. Conserv Biol 20:1026–33.
Schloerke B, Crowley J, Cook D, Briatte F, Marbach M, Thoen E, Elberg A, Larmarange J. 2017. GGally: extension to ‘ggplot2’. R package version 1.3.2 (https://CRAN.R-project.org/package=GGally).
Schwarz C, Debruyne R, Kuch M, McNally E, Schwarcz H, Aubrey AD, Bada J, Poinar H. 2009. New insights from old bones: DNA preservation and degradation in permafrost preserved mammoth remains. Nucleic Acids Res 37:3215–29.
Scotland R, Hughes C, Bailey D, Wortley A. 2003. The Big Machine and the much-maligned taxonomist. Syst Biodivers 1:139–43.
Sigwart JD, Sutton MD, Bennett KD. 2018. How big is a genus? Towards a nomothetic systematics. Zool J Linn Soc 183:237–52.
Stöger I, Sigwart JD, Kano Y, Knebelsberger T, Marshall BA, Schwabe E, Schrödl M. 2013. The continuing debate on deep molluscan phylogeny: evidence for Serialia (Mollusca, Monoplacophora). BioMed Res Int 2013:1.
Strand M, Panova M. 2015. Size of genera—biology or taxonomy? Zool Scr 44:106–16.
Thomson SA, Pyle RL, Ahyong ST, Alonso-Zarazaga M, Ammirati J, Araya JF, Ascher JS, Audisio TL, Azevedo-Santos VM, Bailly N, et al. 2018. Taxonomy based on science is necessary for global conservation. PLoS Biol 16:e2005075.
Yang Z, Rannala B. 2010. Bayesian species delimitation using multilocus sequence data. Proc Natl Acad Sci U S A 107:9264–9.
Yule GU. 1925. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philos Trans R Soc Lond B Biol Sci 213:21–87.
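
Illustrative note. The discussion contrasts a fixed species-level cut-off (the commonly quoted 2%) with a genus-level summary, the median pairwise distance among congeneric species in the COI fragment. The following is a minimal sketch of that comparison in Python, offered only as an illustration and not as the workflow used in this study (the cited tools are R packages). The function names, the 0.02 default, and the 20-base toy sequences are all hypothetical; the distance used here is the uncorrected proportion of differing sites, which is an assumption and may differ from the distance measure used in the analyses.

```python
from itertools import combinations
from statistics import median


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguity code are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguous bases
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def genus_median_distance(aligned_by_species):
    """Median pairwise p-distance among the species of one genus.

    `aligned_by_species` maps a species name to one aligned sequence.
    """
    seqs = list(aligned_by_species.values())
    if len(seqs) < 2:
        raise ValueError("need at least two species to compare")
    return median(p_distance(a, b) for a, b in combinations(seqs, 2))


def exceeds_threshold(distance, threshold=0.02):
    """True if a distance is above a fixed cut-off (hypothetical 2% default)."""
    return distance > threshold


# Invented toy alignment for a hypothetical genus with three species.
toy_genus = {
    "species_1": "ACGTACGTACGTACGTACGT",
    "species_2": "ACGTACGTACGTACGTACGA",
    "species_3": "ACGTACGTACGTACGTTCGA",
}

if __name__ == "__main__":
    d = genus_median_distance(toy_genus)
    print("median pairwise distance:", d)
    print("above fixed cut-off:", exceeds_threshold(d))
```

A single fixed value for `threshold` is exactly what the text argues against carrying across clades or markers; in a sketch like this the cut-off would have to be chosen per group and per marker, if it is used at all.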
