American Journal of Botany 101(11): 1987–2004, 2014.

CHLOROPLAST DNA SEQUENCE UTILITY FOR THE LOWEST PHYLOGENETIC AND PHYLOGEOGRAPHIC INFERENCES IN ANGIOSPERMS: THE TORTOISE AND THE HARE IV¹
JOEY SHAW²,³,⁵, HAYDEN L. SHAFER², O. RAYNE LEONARD⁴, MARGARET J. KOVACH²,
MARK SCHORR², AND ASHLEY B. MORRIS⁴
²Department of Biological and Environmental Sciences, University of Tennessee at Chattanooga, Chattanooga, Tennessee 37403 USA; ³Botanical Research Institute of Texas, Fort Worth, Texas USA; and ⁴Department of Biology, Middle Tennessee State University, Murfreesboro, Tennessee 37132 USA

• Premise of the study: Noncoding chloroplast DNA (NC-cpDNA) sequences are the staple data source of low-level phylogeographic and phylogenetic studies of angiosperms. We followed up on previous papers (tortoise and hare II and III) that sought to identify the most consistently variable regions of NC-cpDNA. We used an exhaustive literature review and newly available whole plastome data to assess whether previous conclusions apply at low taxonomic levels.
• Methods: We aligned complete plastomes of 25 species pairs from across angiosperms, comparing the number of genetic differences found in 107 NC-cpDNA regions and matK. We surveyed Web of Science for the plant phylogeographic literature published between 2007 and 2013 to assess how NC-cpDNA has been used at the intraspecific level.
• Key results: Several regions are consistently the most variable across angiosperm lineages: ndhF-rpl32, rpl32-trnL(UAG), ndhC-trnV(UAC), 5′rps16-trnQ(UUG), psbE-petL, trnT(GGU)-psbD, petA-psbJ, and the rpl16 intron. However, there is no universally best region. On average, ~2.5 regions are applied in low-level studies, which may be too few to access the full discriminating power of this genome.
• Conclusions: Plastome sequences have been used successfully at increasingly low taxonomic levels. Our findings corroborate earlier work, suggesting that certain regions are most likely to be the most variable. However, while NC-cpDNA sequences are commonly used in plant phylogeographic studies, few of the most variable regions are applied in that context. Furthermore, most studies appear to use too few NC-cpDNA regions to access the discriminating power of the cpDNA genome.

Key words: chloroplast; DNA barcode; intergenic spacer; intraspecific; intron; noncoding cpDNA; phylogeography; plastid
region; plastome; tortoise and hare.

¹ Manuscript received 9 September 2014; revision accepted 17 September 2014.
The authors thank Brad Ruhfel, Stephen Downie, an anonymous reviewer, and the Associate Editor for their careful consideration of this article. The authors thank Ed Schilling (University of Tennessee) and James Beck (Wichita State University) for reviewing early drafts of this manuscript. They also extend special thanks to R. Small, E. Schilling, E. Lickey, J. Beck, K. Grubbs, S. Farmer, W. Liu, J. Miller, and C. Winder for their work and important contributions to the earlier chapters of this research, and to the Hesler Foundation at the University of Tennessee for financially supporting the earlier chapters.
⁵ Author for correspondence (e-mail: joey-shaw@utc.edu)

doi:10.3732/ajb.1400398

American Journal of Botany 101(11): 1987–2004, 2014; http://www.amjbot.org/ © 2014 Botanical Society of America

The chloroplast genome has long been recognized as the workhorse for testing relationships between biological and geographical phenomena in angiosperms (Palmer, 1987; Palmer et al., 1988; Olmstead and Palmer, 1994; Soltis et al., 1997; Graham and Olmstead, 2000; Kelchner, 2000). Chloroplast sequences have been used successfully to infer relationships at all taxonomic levels, from the deepest-level relationships of land plants and angiosperms (Chase et al., 1993; Borsch et al., 2003; Hilu et al., 2003; Moore et al., 2010), through intermediate taxonomic levels of orders and families (Downie et al., 2000; Potter et al., 2007; Chin et al., 2014), to relationships among closely related species or populations (Soltis et al., 1997, 2006; Schaal et al., 1998; Shaw and Small, 2004; Morris et al., 2008).

It is at this shallowest taxonomic level (inter- and intraspecific studies) that researchers are often challenged with finding sufficient genetic variability to address their questions (Schaal et al., 1998; Schaal and Olsen, 2000; Holderegger and Abbott, 2003; Petit and Vendramin, 2007). Shaw et al. (2005, 2007) provided guidance in the search for the most consistently variable noncoding chloroplast regions (NC-cpDNA). In “The Tortoise and the Hare II” (TH2) (Shaw et al., 2005), the focus was to compare the relative number of genetic differences typically found in NC-cpDNA regions that were prevalent in the literature at that time. In that study, 21 NC-cpDNA regions were assessed for utility within each of 10 seed plant lineages. Each lineage comprised three congeneric species. The results were surprising in that the most commonly used regions at that time (e.g., trnL intron, trnL-trnF) were among the least informative, while some uncommonly used regions (trnD-trnT, trnS-trnG, and rpoB-trnC) appeared to be much more informative. In “The Tortoise and the Hare III” (TH3) (Shaw et al., 2007), the focus was to expand genomic sampling to all single-copy NC-cpDNA regions. Initially, paired plastomes from three angiosperm lineages (Atropa vs. Nicotiana [Solanaceae]: asterid; Lotus vs. Medicago [Fabaceae]: rosid; and Saccharum vs. Oryza [Poaceae]: monocot) were used to screen all single-copy NC-cpDNA regions for any that might have higher variability than the best regions identified in TH2. The outcome highlighted 13 regions, and these were sequenced across the same three-species groups as the TH2 study. In the end, nine rarely or never before


used NC-cpDNA regions were identified as being, on average across angiosperms, the most informative for low-level studies (rpl32-trnL(UAG), trnQ(UUG)-5′rps16, 3′trnV(UAC)-ndhC, ndhF-rpl32, psbD-trnT(GGU), psbJ-petA, 3′rps16–5′trnK(UUU), atpI-atpH, and petL-psbE).

Before the publication of TH3, very few congeneric species pairs of whole plastomes were available in GenBank, and we were fortunate to find species pairs within the same family (asterids: Solanaceae; rosids: Fabaceae; and monocots: Poaceae). There are now ~340 angiosperm chloroplast genomes in GenBank (January 2014), many of which represent congeneric accessions. This wealth of recent data gives us a new opportunity to screen the utility of noncoding regions of the plastome more thoroughly, across a wider spectrum of the angiosperm phylogeny, in order to evaluate inferences made in TH3. Because the number of whole plastomes available in public databases is increasing rapidly, we can learn from these data for the benefit of the larger community of researchers not yet sequencing chloroplast genomes.

With the rapid advancement of next-generation sequencing (NGS) approaches, we have entered the “Golden Age of Molecular Ecology” (Paliy, 2013), and sequencing whole plastomes, or even hundreds of nuclear loci, for low-level inquiry is certainly the future (Soltis et al., 2013). It is becoming increasingly feasible to sequence whole or nearly whole plastomes using high-throughput methods (Cronn et al., 2012; De Wit et al., 2012; Shi et al., 2012; Straub et al., 2012; Stull et al., 2013; Uribe-Convers et al., 2014), and the cost savings over Sanger sequencing can be significant for large-scale studies involving many taxa (Godden et al., 2013) or for laboratories that already have the needed laboratory and computational infrastructure. Core sequencing facilities now offer opportunities for multiple laboratories to share the cost of an NGS run, further lowering the price per sample. While there are a few notable cases in which researchers have published phylogenies based on whole plastome data, most are targeted at resolving deep relationships (Ruhfel et al., 2014) or tackling questions in model organisms (Eserman et al., 2014) or economically important groups (Nikiforova et al., 2013; Njuguna et al., 2013), but this is rapidly changing. At present, the majority of whole plastome publications are limited to one or a few accessions per study (e.g., Martin et al., 2013; Walker et al., 2014), and these typically come from the larger research laboratories, reflecting the financial and computational costs associated with this kind of data generation (Rocha et al., 2013; Soltis et al., 2013). The start-up costs for hardware and software alone may prevent smaller, less well-funded laboratories around the world from moving from sequencing a few NC-cpDNA regions to generating massive quantities of genomic data. For smaller-scale studies, or studies coming from less well-funded or less well-equipped laboratories, the cost of NGS and the complexity of the bioinformatics analyses required for such large data sets may still be prohibitive (Doyle, 2013; Rocha et al., 2013; Bowen et al., 2014; but see Straub et al., 2012 for an alternative viewpoint). Bowen et al. (2014), in summarizing results from a meeting at the National Evolutionary Synthesis Center (NESCent), suggested that (1) single mtDNA locus studies (for plants, we would modify this phrase to “one to several cpDNA loci studies”) provide powerful first assessments of patterns; (2) genome-wide analyses are warranted if results from standard markers are discordant with other aspects of the organism’s biology; and (3) at this stage of software development and technology, a judicious rather than wholesale application of genomics is prudent. Given all of this, we anticipate that Sanger sequencing or NGS of a smaller number of individual NC-cpDNA regions will continue to be an important approach for many exploratory inquiries, for low-level systematic and phylogeographic studies, and for barcoding studies, at least for the near future. Therefore, there is still a need to be able to select a few NC-cpDNA regions, presumably the most potentially informative ones.

In the present study, we asked: Which, and how many, of the “most informative regions” of the TH series have been useful at the lowest phylogenetic and phylogeographic levels in angiosperms? While we are particularly interested in intraspecific data, there are currently no publicly available whole plastome phylogeography data sets, nor are there many publications for such an assessment (but see Whittall et al., 2010 and Doorduin et al., 2011). Therefore, we addressed our question by analyzing a large set of whole plastomes, reflecting as many low-level taxonomic comparisons as are currently available through the NCBI Organelle Genome Resources (GenBank). We used 25 closely related species pairs—congeners when possible (20/25)—across 10 major lineages of angiosperms (Nymphaeales, magnoliids, monocots, Proteales, eurosid I, eurosid II, Caryophyllales, Ericales, euasterid I, euasterid II) (Fig. 1) to compare all single-copy NC-cpDNA regions and matK, which is among the most variable coding regions and has been a popular region since 1997 (Hilu and Liang, 1997). We were also able to compare the results within the major clades to the overall results.

In addition to this new data set, we searched the literature covering the period since the last TH paper (2007–2013). To determine whether NC-cpDNA sequences are an important tool for low-level studies, we asked: What percentage of intraspecific studies of plants used cpDNA sequence data? Beyond this initial question, we also asked: Which NC-cpDNA regions were chosen and sequenced? How many NC-cpDNAs were needed to generate sufficient data? How many papers explicitly reported screening NC-cpDNAs, and how many were typically screened?

All results are discussed in the context of the trends in the current literature and the findings of our previous TH studies. In the end, we propose a strategy for investigators of low-level taxonomic or phylogeographic studies using NC-cpDNA sequence data.

MATERIALS AND METHODS

Taxon selection for genomic comparisons—Initially, we set out to use only congeneric species pairs in angiosperms; however, doing so left us with phylogenetic gaps in our sampling. To fill in these gaps, we used the most closely related species pairs available in a few lineages, resulting in an initial list of 34 angiosperm species pairs for comparison. After we had compiled all of the data, four species pairs were removed (Acorus americanus and A. calamus, Nicotiana sylvestris and N. tabacum, Oryza nivara and O. sativa, and Phyllostachys edulis and P. nigra var. henonis) because they contained too few genetic differences between them, as we explain in detail later. We also had to remove five other species pairs (Wolffia australiana and Wolffiella lingulata, Colocasia esculenta and Lemna minor, Ranunculus macranthus and Megaleranthis saniculifolia, Cuscuta exaltata and C. obtusiflora, and Erodium carvifolium and E. texanum) because they were too variable to be confidently aligned across the entire genome (the large indels and genomic rearrangements that made these species pairs difficult to analyze could certainly provide workers on these groups with important information in other contexts). The remaining 25 species pairs (including 20 congeneric pairs) used in our analyses are shown in Fig. 1. We made a strong effort to sample across the phylogenetic breadth of angiosperms, using nomenclature and topology from the Angiosperm Phylogeny Group (http://www.mobot.org/MOBOT/research/APweb/), and to sample about the same number of species pairs per major lineage. In the end, we were able to sample one species pair from Nymphaeales, two from
November 2014] SHAW ET AL.—TORTOISE AND HARE IV 1989

Fig. 1. Phylogenetic breadth of sampling for plastome comparisons. Species pairs (mostly congeners) are listed beside family names and next to major
lineages. Tree topology follows Angiosperm Phylogeny Group III (APG III) (http://www.mobot.org/MOBOT/research/APweb/).

magnoliids, five from monocots, one from Proteales, four from eurosid I, four from eurosid II, one from Caryophyllales, one from Ericales, three from euasterid I, and three from euasterid II (Fig. 1).

NC-cpDNA selection—Rather than try (1) to determine exactly where the large single copy (LSC), small single copy (SSC), and inverted repeat (IR) regions occur in each of the 25 lineages and (2) to expand the data set to include uncommon intergenic spacers (i.e., intergenic spacers unique to a given taxon because of genomic rearrangements unique to that taxon), we used the Gossypium hirsutum genome (Lee et al., 2006) as a model. Because it is a typical chloroplast genome, we used it to determine which regions to include or exclude, as well as the start and end points of the LSC, SSC, and IRs.

It is well accepted that the most variable portions of the cpDNA genome are not in the IR regions (Clegg et al., 1984; Wolfe et al., 1987). The IR regions have been shown to contain low levels of variability (Maier et al., 1995), and Presting (2006) confirmed and quantified earlier reports that the IR is significantly more resistant to genetic change than the single-copy regions. We therefore concentrated on the single-copy portions of the genome.

The two single-copy regions consist of the LSC, which is typically about 80 kb long, and the SSC, which is typically about 20 kb. These regions contain a combination of rRNA and tRNA genes, as well as protein-encoding genes. It is common knowledge that gene-encoding regions accumulate genetic differences more slowly than noncoding regions, the exceptions being matK (Steele and Vilgalys, 1994; Johnson and Soltis, 1995; Fazekas et al., 2008; Lahaye et al., 2008) and perhaps ycf1 (Dong et al., 2012), which seem to accumulate mutations at a fast pace for coding regions. Because matK has historically been an important region (Hilu and Liang, 1997) and is commonly recognized as being as variable as many noncoding regions of the plastome, our research effort focused on matK and all single-copy noncoding regions of the plastome. In total, 107 noncoding regions and matK were compared across 25 species pairs spanning the breadth of angiosperms.

Sequence alignment—The GenBank nucleotide BLAST function was used for initial alignment. BLAST aligns local regions of two separate sequences based on nucleotide similarity. In initial BLAST searches, we found that some of the more genetically dissimilar species pairs (e.g., Aethionema) often contained too many gaps to ensure reliable alignments. Thus, we altered the default BLAST parameters to increase the standard linear gap cost, as a measure to reduce the presence of misaligned sequences. BLAST parameters were as follows: Max target sequences = 100, Expect threshold = 10, Max matches in a query range = 0, Match/Mismatch scores = 1, −2, and Gap Costs = Existence: 5, Extension: 2. After the sequences were aligned with BLAST, the alignments were visualized in Portable Document Format, and the gene-encoding regions were unequivocally delimited using the annotations in GenBank. Alignments were manually scored for genetic differences. In a few cases where small sections of the alignments were too variable to be confidently aligned (e.g., poly A/T regions), gaps were opened up, and these were conservatively scored as a single genetic difference (an indel).

Data analysis—All noncoding portions of the genome, including intergenic spacers (IGS) and introns, were analyzed for genetic differences between the

species pairs, and these genetic differences were counted as potentially informative characters (PICs) following Shaw et al. (2005, 2007). (PICs have been shown to be a good predictor of parsimony-informative characters [Fior et al., 2013].) The PICs included indels, nucleotide substitutions, and inversions. Indels and inversions were scored as binary (presence/absence) characters following Simmons and Ochoterena (2000). The process of alignment, annotation, and analysis was repeated for each of the 25 species pairs. The length of each noncoding region was recorded, and the number of PICs was tallied for each noncoding cpDNA region and matK in each of the 25 lineages. All genetic differences were scored as independent, single characters.

Three types of calculations were performed to evaluate NC-cpDNA variability. First, we estimated the proportion of observed genetic differences for each NC-cpDNA using a modified version of the formula used in O’Donnell (1992), Gielly and Taberlet (1994), and Shaw et al. (2005, 2007): proportion of genetic differences (or % variability) = [(NS + ID + IV) / L] × 100, where NS = the number of nucleotide substitutions, ID = the number of indels, IV = the number of inversions, and L = the aligned sequence length. Second, we averaged the number of PICs found within each noncoding chloroplast region across the lineages that contained that region (if the region was missing in a lineage, the total was divided by 24 rather than 25). Third, to ensure that lineages containing a greater number of genetic differences between the species pair (greater evolutionary distance) were not overrepresented (weighted) in global comparisons, we determined the percentage contribution of each noncoding region to the overall PIC total within a lineage (what was called “normalized PICs” in earlier TH papers). In effect, we calculated the percentage contribution of an NC-cpDNA to the number of genetic differences observed in a species pair. These values were then used to generate an average value for each noncoding region so that the regions could be directly compared.

We argue that the percentage contribution values are the most accurate for comparisons of NC-cpDNA variability across lineages because they reduce the overrepresentation of average PIC values from species pairs that evolved at faster rates or are evolutionarily further apart. For example, Silene species accumulated 1483 PICs across all regions, while Olea species had only 321 PICs. Without this “normalization” of the data, the species pair with the greatest overall number of genetic differences (e.g., Silene) would influence the analysis more strongly than other lineages with many fewer genetic differences between species. A downside to the percentage contribution analysis is that species pairs with very few genetic differences between them tend to have overly high scores. For example, the Nicotiana species pair was omitted because there were only two nucleotide substitutions across the entire alignment, so each NC-cpDNA containing one of them would have a very large percentage contribution score of 0.50. For this reason, we omitted species pairs with fewer than 100 PICs in the alignment (discussed earlier in Taxon selection for genomic comparisons).

Adjusted mean PICs—A mixed-model analysis of covariance (ANCOVA) was used to look for statistical separation between the 10 NC-cpDNAs that ranked highest in the overall percentage contribution analysis. We chose to analyze only 10 regions because statistical inference would be weakened with all 108 regions. Furthermore, we argue that if statistical separation among the top 10 is present, then it is sure to exist between these and the rest. The number of PICs/region was treated as the response variable and NC-cpDNA region as the fixed-effects factor; lineage was treated as a block (random-effects factor) and region length (number of base pairs per region) as a covariate. Data were log-transformed [log10(PIC + 1); log10(length + 1)] to satisfy the parametric test (ANCOVA) assumptions of normality, variance homogeneity, and linearity. Region-specific PIC estimates were reported as geometric means ± SE.

A preliminary ANCOVA indicated that there was a direct linear relationship between the number of PICs/region and the length of the region (P < 0.0001) and that the slopes of the PIC–length relationships were similar (P > 0.10), which allowed us to compare slope-adjusted mean PICs/region. Region-specific estimates of adjusted mean PICs (y), which control for the effect of region length (x, covariate), were determined using linear regression models [log10(y + 1) = a + b·log10(x + 1)] with a common slope (b = 0.9427) evaluated at a common length value (x = mean length ≈ 1073 bp). Scheffé’s method was used to make pairwise comparisons (contrasts) between region-specific mean PICs and/or groups of mean PICs, based on patterns exhibited by the data. Scheffé’s procedure, a highly conservative post hoc test that maintains a familywise type I error rate at or below the alpha level, is recommended for multiple contrasts between groups of sample means (Zar, 2010).

Data were analyzed using the UNIVARIATE, REG, GLM, MIXED, and LSMEANS procedures of the Statistical Analysis System (SAS Institute, 2011). Statistical significance was declared at an alpha level of 0.05.

Literature review—To answer the question “What proportion of intraspecific plant studies used cpDNA sequence data?”, we performed a Web of Science search in May 2014, confining the dates to 1987–2013 and searching on the terms “phylogeography or phylogeographic”, refined to articles, then further refined by “*aceae”. Subsequent refinements were performed using either “nuclear or nDNA”, “mitochondria or mitochondrial or mtDNA”, or “chloroplast or plastid or cpDNA”.

To address the other questions posed in the Introduction, we performed a search on Web of Science in January 2014. We first used the topic search terms “phylogeography or phylogeographic” limited to the years 2007–2013. We refined these results using the topic search terms “chloroplast or cpDNA” to focus specifically on plant phylogeographic studies. While this may exclude papers that exclusively use nuclear markers, we determined this to be the most appropriate search strategy for the present survey. The top five source titles for these plant phylogeographic studies during the selected 7 years were Molecular Ecology (99), Journal of Biogeography (79), Molecular Phylogenetics and Evolution (45), American Journal of Botany (32), and Plant Systematics and Evolution (28), representing approximately 41% of total publications under our search criteria. Because our emphasis here is on low taxonomic level and population studies, we excluded papers in Molecular Phylogenetics and Evolution and Plant Systematics and Evolution due to the more traditional systematic nature of publications therein. Of the total of 210 plant phylogeographic studies in the remaining three journals (Molecular Ecology, Journal of Biogeography, and American Journal of Botany), 33 were excluded following our review because they were reviews or focused on an animal, on more than three species, or on a nonvascular plant. This exclusion resulted in 178 publications being included in our review.

We built a database containing the following information for each of these 178 studies: author, year of publication, source title, taxon, family, markers tested (when available), markers used, and length and PICs for markers used (when available). We used these data to answer the questions posed in the Introduction.

We were also interested in comparing the relative utility of regions within studies to determine whether patterns predicted by TH2 and TH3 were supported by the literature. For this comparison, we could only include studies that explicitly stated the number of PICs/region and used at least two regions. Of the 178 studies included in our review, 45 met these criteria. The remaining studies either reported PICs only for all regions combined or did not report PICs at all. For these 45 studies, NC-cpDNAs were ranked by decreasing number of PICs observed, and trends were qualitatively assessed across studies.

RESULTS

Overview of molecular data set—In all, we manually examined over 2.5 million base pairs of aligned data from 25 species pairs, resulting in a data set of 107 single-copy NC-cpDNA regions (15 introns and 92 IGS) plus matK (Appendix S1, see Supplemental Data accompanying the online version of this article).

Taxon selection, omission, and results on omitted taxa—At the outset, we identified 34 congeneric or fairly closely related species pairs; however, five of the pairs were too variable to be confidently aligned, and four were too invariable to yield reliable information. There were only two genetic differences between Nicotiana sylvestris (NC_007500) and Nicotiana tabacum (NC_001879); one difference was in the psbM-trnD(GUC) spacer, and the other was in the petD intron. There were only 22 genetic differences between Acorus americanus (NC_010093) and Acorus calamus (NC_007407). These differences were found in ndhF-rpl32 (1), ccsA-ndhD (1), ndhG-ndhI (1), 5′trnK(UUU)-3′rps16 (2), rps16 intron (1), trnQ(UUG)-psbK (1), trnC(GCA)-petN (2), trnD(GUC)-trnY(GUA) (1), trnT(GGU)-psbD (3), trnT(UGU)-trnL(UAA) (1), trnM(CAU)-atpE (1), atpB-rbcL (1), ycf4-cemA (1), petA-psbJ (1), psbE-petL (1), psaJ-rpl33 (1), rps18-rpl20 (1), and rpl16 intron (1). We observed only 26 genetic differences between Oryza nivara (NC_005973) and Oryza sativa (NC_008155), and these were found in ndhF-rpl32 (1), ccsA-ndhD

(1), ndhA intron (1), psbA-3′trnK(UUU) (1), rps16 intron (2), trnD(GUC)-trnY(GUA) (2), trnE(UUC)-trnT(GGU) (2), psbZ-trnG(GCC) (2), ycf3-trnS(GGA) (2), ndhC-trnV(UAC) (3), trnM(CAU)-atpE (1), accD-psaI (1), petL-petG (1), petD-rpoA (1), rps11-rpl36 (1), rpl14-rpl16 (2), rpl16 intron (1), and rpl22-rps19 (1). Twenty-four genetic differences were observed between Phyllostachys edulis (NC_015817) and Phyllostachys nigra (NC_015826). In Phyllostachys, differences were found in rpl32-trnL(UAG) (5), ccsA-ndhD (2), psaC-ndhE (2), matK-5′trnK(UUU) (1), 5′rps16-trnQ(UUG) (2), rpoB-trnC(GCA) (1), petM-psbM (1), psbM-trnD(GUC) (1), ycf3 intron1 (1), ycf3-trnS(GGA) (1), trnT(UGU)-trnL(UAA) (2), ndhC-trnV(UAC) (1), rpl33-rps18 (1), petB intron (1), and rpl14-rpl16 (2). We also had to remove species pairs because they were too variable to be confidently aligned; that is, between these species pairs, GenBank BLAST opened up large gaps with scattered unaligned nucleotides, and we could not manually improve on these alignments. Species pairs in which this was the case include Erodium carvifolium (NC_015083) and Erodium texanum (NC_014569); Cuscuta exaltata (NC_009963) and Cuscuta gronovii (NC_009765); Wolffia australiana (NC_015899) and Wolffiella lingulata (NC_015894); Colocasia esculenta (NC_016753) and Lemna minor (NC_010109); and Ranunculus macranthus (NC_008796) and Megaleranthis saniculifolia (NC_012615).

In a few cases, one or two NC-cpDNAs had to be omitted from an otherwise neatly aligned genomic comparison because these NC-cpDNAs were too variable to be confidently aligned. This was the case in Nuphar/Nymphaea for petN-psbM, Phoenix/Pseudophoenix for trnC-petN and petN-psbM, Gossypium for psbE-petL, Silene for trnH(GUG)-psbA and clpP-psbB, and Anthriscus/Daucus for ndhF-rpl32, rpl32-trnL(UAG), 5′trnK-3′rps16, 5′rps16-trnQ(UUG), and atpH-atpI (Appendix S1).

The most informative noncoding cpDNA regions—Figure 2A shows the number of genetic differences observed within every single-copy NC-cpDNA and matK, averaged across 25 species pairs. In this analysis, the top 10 regions were 5′rps16-trnQ(UUG), ndhC-trnV(UAC), ndhF-rpl32, trnT(GGU)-psbD, psbE-petL, petA-psbJ, rpl32-trnL(UAG), rpl16 intron, ndhA intron, and rpoB-trnC(GCA); all of these except ndhA intron and rpoB-trnC(GCA) are shared with the analysis of average percentage contribution to total PICs below. The matK gene was ranked 12th. Figure 2B summarizes the PIC data, where the value for each NC-cpDNA represents its average percentage contribution to the total genetic differences observed in pairwise species comparisons, averaged across the 25 lineages. In this analysis, the top 10 regions were ndhF-rpl32, ndhC-trnV(UAC), rpl16 intron, rpl32-trnL(UAG), 5′rps16-trnQ(UUG), 5′trnK(UUU)-3′rps16, psbE-petL, trnT(GGU)-psbD, petA-psbJ, and psbM-trnD(GUC); all of these except 5′trnK(UUU)-3′rps16 and psbM-trnD(GUC) are shared with the analysis of overall average PICs above. The matK gene was again ranked 12th. While the relative positions of the most variable regions may change between the averaged PIC and percentage contribution of averaged PIC analyses (or even lineage to lineage, see be-

though their percentage variability was high (compare Fig. 2C with Fig. 2A and 2B). The rpl32-trnL(UAG) and rps15-ycf1 regions were the only two of the top regions that were over 500 bp, and only rpl32-trnL(UAG) was over 750 bp. Because choosing the most informative NC-cpDNAs requires combining the effects of NC-cpDNA length and percentage variability, Fig. 2D is positioned next to Fig. 2C to highlight the fact that several NC-cpDNAs that are highly variable by percentage are also very short.

Because we had more than one species pair within most major lineages, we compared the trends in the major clades of angiosperms with the overall trends described above (Fig. 3A–F). While the ranking of the top regions is not perfectly consistent across the major lineages, some NC-cpDNAs consistently rank among the top 13 (13 regions are shown so that matK, a popular and informative marker since the beginning of cpDNA sequencing, is included).

Either ndhF-rpl32 or rpl32-trnL(UAG) was the highest-ranked region in four of six major lineages; in one lineage, rpl32-trnL(UAG) was ranked second behind 5′rps16-trnQ(UUG); and both regions were too variable to be included in one species comparison (Anthriscus/Daucus). The 5′rps16-trnQ(UUG) region ranked in the top two in two of six major lineages and in the top 13 in four of six lineages, but was excluded from one lineage for being too variable (Daucus/Anthriscus). It should be noted that eurosid I was the only major lineage in which neither ndhF-rpl32 nor rpl32-trnL(UAG) ranked in the top two; however, these NC-cpDNA regions are missing from Populus and unusually small in Vigna, perhaps skewing the data, because these two lineages account for two of the four species pairs in the eurosid I group. The rpl32-trnL(UAG) region did rank in the top five in another eurosid I pair, Cucumis. Another high-ranking region, ndhC-trnV(UAC), ranked in the top five in four of six major lineages.

Analysis of the eight most variable regions in the 25 individual species pair comparisons (Fig. 4) shows that ndhC-trnV(UAC), rpl32-trnL(UAG), and ndhF-rpl32 each ranked in the top 10 in 16 of 25 lineages, psbE-petL in 15 of 25 lineages (it was too variable to be included in Gossypium), trnT(GGU)-psbD in 14 of 25 lineages, 5′rps16-trnQ(UUG) and petA-psbJ in 11 of 25 lineages, and the rpl16 intron in 10 of 25 lineages.

Adjusted mean PICs—Results of the mixed-model ANCOVA (which accounted for variation across lineages and controlled for the effect of region length) indicate that the mean log-transformed number of PICs/region differed across the 10 NC-cpDNAs that ranked highest in the overall percentage contribution analysis (F(9,194) = 3.09, P = 0.0017). Following the ANCOVA, we performed pairwise contrasts of adjusted means and/or groups of means using Scheffé’s procedure; selected comparisons involving combined DNA regions (combined vs. combined; single vs. combined) revealed significant differences in the mean PICs/region between certain groups of the 10 regions.

Scheffé’s contrasts revealed significant differences in ad-
low), there are eight NC-cpDNAs in the top 10 of both of these justed mean PICs/region among three groups of combined
analyses and these are underlined above and throughout the rest DNA regions (P < 0.05; Fig. 5). Mean PICs/region (compared
of this paper. These eight regions account for roughly 21% of as groups based on visual breaks in the data) was highest for
the PICs in a given lineage (based on percentage contribution of Group 1 (geometric mean = 19.2 PICs; regions ndhF-rpl32,
each region averaged across the 25 species pairs). The top 12 rpl32-trnL(UAG)), intermediate for Group 2 (12.7 PICs; regions
regions account for >33% of the total PICs. ndhC-trnV(UAC), rpl16 intron, petA-psbJ), and lowest for Group
Percentage variability within each region is shown in Fig. 2C. 3 (10.2 PICs; regions 5′rps16-trnQ(UUG), psbE-petL, trnT(GGU)-
Eight of the top 11 most variable regions in this analysis had an psbD, ndhA intron, rpoB-trnC(GCA)). Pairwise contrasts within
average total length of about ≤250 bp and 9/11 were ≤350 bp, the three groups yielded no differences in mean PICs/region
making the total PICs offered by these regions relatively low even (P > 0.05).
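The percentage-contribution summary behind Fig. 2B can be sketched from a table of PIC counts per region and species pair. The same applies to statements such as "these eight regions account for roughly 21% of the PICs." The Python below is a minimal illustration under stated assumptions, not the authors' pipeline. The counts are hypothetical toy values and the function names are ours. We also assume that a region missing from a species pair (e.g., omitted as unalignable) is skipped when averaging rather than counted as zero; the text does not say how such gaps were handled.

```python
from statistics import mean


def percent_contributions(pics_by_pair):
    """Express each region's PICs as a percentage of the total PICs in its species pair."""
    result = {}
    for pair, counts in pics_by_pair.items():
        total = sum(counts.values())
        result[pair] = {region: 100.0 * n / total for region, n in counts.items()}
    return result


def average_contribution(pics_by_pair):
    """Average each region's percentage contribution across species pairs.

    Assumption: a region absent from a pair is skipped for that pair,
    not counted as zero.
    """
    per_pair = percent_contributions(pics_by_pair)
    regions = {region for shares in per_pair.values() for region in shares}
    return {
        region: mean(shares[region] for shares in per_pair.values() if region in shares)
        for region in regions
    }


def rank_regions(avg):
    """Region names sorted from highest to lowest average contribution."""
    return sorted(avg, key=avg.get, reverse=True)


def cumulative_share(avg, k):
    """Summed average contribution (%) of the k top-ranked regions."""
    return sum(avg[region] for region in rank_regions(avg)[:k])


# Hypothetical PIC counts for two species pairs (toy values, not data from this study).
toy_pics = {
    "pair 1": {"ndhF-rpl32": 40, "rpl32-trnL(UAG)": 30, "rpl16 intron": 10, "matK": 20},
    "pair 2": {"ndhF-rpl32": 10, "rpl32-trnL(UAG)": 20, "rpl16 intron": 10, "matK": 10},
}

toy_avg = average_contribution(toy_pics)
print(rank_regions(toy_avg))         # ['rpl32-trnL(UAG)', 'ndhF-rpl32', 'matK', 'rpl16 intron']
print(cumulative_share(toy_avg, 2))  # 65.0
```

Because each pair is normalized to 100% before averaging, a pair with few PICs overall weighs as much as a pair with many. This is one plausible reading of "averaged across the 25 lineages."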

Fig. 2. Expected number of genetic differences within the noncoding single-copy portions of the plastome (NC-cpDNA). Gene order is preserved
beginning at the top with the intersection of inverted repeat b (IRb) and the large single-copy (LSC) region; the small single-copy (SSC) region is at the
bottom and begins with rps15-ycf1. The vertical black lines and the four-point stars highlight the 10 regions that on average contain the greatest number of
potentially informative characters (PICs) in analyses A and B. (A) Number of PICs averaged across the 25 species pairs. (B) Average percentage contribution of each noncoding region to the total PICs observed between species pairs. (C) Percentage variability averaged across 25 species pairs. The top 11 most variable regions by percentage are cut off from the rest by the vertical black line and the four-point stars (there was a tie for 10th place). Arrows from NC-cpDNAs in (C) point to the same regions in (D). They highlight that 8/11 of the most variable regions in terms of percentage variability are too short to be of great value. (D) Length of noncoding regions averaged across 25 species pairs. The black line in (D) marks 250 bp, highlighting regions that may be highly variable in terms of percentage but are less than 250 bp long.
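Panels C and D of Fig. 2 separate two ideas. Percentage variability rewards short regions, whereas the absolute number of PICs is what a study can actually use. The Python sketch below shows how a region can top the percentage ranking while falling below the 250-bp line. It assumes percentage variability is computed as 100 × PICs / aligned length; that is our reading of the caption, not a formula given in the text. The two regions labeled hypothetical, and all counts and lengths, are invented for illustration.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    name: str
    pics: int       # potentially informative characters in the species-pair alignment
    length_bp: int  # aligned length of the region


def percent_variability(region: Region) -> float:
    """Assumed definition: PICs per 100 aligned sites."""
    return 100.0 * region.pics / region.length_bp


def short_but_variable(regions: List[Region], top_n: int, min_length_bp: int = 250) -> List[str]:
    """Names of regions that rank in the top_n by percentage variability
    yet are shorter than min_length_bp (the black line in Fig. 2D)."""
    ranked = sorted(regions, key=percent_variability, reverse=True)
    return [r.name for r in ranked[:top_n] if r.length_bp < min_length_bp]


# Toy values only; "spacer X" and "spacer Y" are hypothetical short regions.
toy_regions = [
    Region("spacer X (hypothetical)", pics=6, length_bp=120),   # 5.0%
    Region("ndhF-rpl32", pics=30, length_bp=1000),              # 3.0%
    Region("rpl16 intron", pics=20, length_bp=900),             # ~2.2%
    Region("spacer Y (hypothetical)", pics=4, length_bp=200),   # 2.0%
    Region("matK", pics=15, length_bp=1500),                    # 1.0%
]

print(short_but_variable(toy_regions, top_n=4))
# ['spacer X (hypothetical)', 'spacer Y (hypothetical)']
```

In this toy set, spacer X ranks first by percentage variability but offers only 6 PICs, against 30 for ndhF-rpl32. This mirrors the point made by the arrows between panels C and D.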

Fig. 3. Clade-specific ranks of single-copy NC-cpDNAs (top 13). Percentage contribution to total potentially informative characters (PICs)
(y-axis) for each NC-cpDNA (x-axis) averaged across the species pairs in the clade. The eight highest-ranking regions in the analysis of all 25 lineages
combined are marked with four-pointed stars (ndhF-rpl32, rpl32-trnL, rps16-trnQ, ndhC-trnV, trnT-psbD, psbE-petL, petA-psbJ, rpl16 intron).
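Counts such as "ranked in the top 10 in 16 of 25 lineages" (Figs. 3, 4) are tallies of per-pair rankings. The Python sketch below shows one way to compute such a tally. The PIC counts are hypothetical. Breaking ties alphabetically is our assumption, since the text does not say how ties were handled. A region omitted from a pair (for instance, as too variable to align) simply cannot score in that pair.

```python
from collections import Counter


def top_k_tally(pics_by_pair, k):
    """For each region, count the species pairs in which it ranks among the k regions
    with the most PICs. Ties are broken alphabetically (an assumption)."""
    tally = Counter()
    for counts in pics_by_pair.values():
        ranked = sorted(counts, key=lambda region: (-counts[region], region))
        tally.update(ranked[:k])
    return tally


# Hypothetical PIC counts for three species pairs (toy values).
toy_pics = {
    "pair 1": {"ndhF-rpl32": 40, "rpl32-trnL(UAG)": 30, "rpl16 intron": 10, "matK": 20},
    "pair 2": {"ndhF-rpl32": 10, "rpl32-trnL(UAG)": 20, "rpl16 intron": 25, "matK": 5},
    "pair 3": {"ndhF-rpl32": 18, "rpl32-trnL(UAG)": 4, "rpl16 intron": 9, "matK": 12},
}

tally = top_k_tally(toy_pics, k=2)
for region, n in tally.most_common():
    print(f"{region}: top 2 in {n} of {len(toy_pics)} pairs")
```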

In another contrast (excluding Group 1), the mean PICs/region for the highest-ranking Group 2 region (geometric mean = 14.7 PICs; region ndhC-trnV(UAC)) was higher than that for the other regions in Groups 2 and 3 (10.6 PICs; regions rpl16 intron, petA-psbJ, 5′rps16-trnQ(UUG), psbE-petL, trnT(GGU)-psbD, ndhA intron, and rpoB-trnC(GCA); P < 0.05). In summary, the ANCOVA and multiple contrasts of adjusted mean PICs/region within the top 10 regions (averaged across 20–25 lineages) indicated that ndhF-rpl32, rpl32-trnL(UAG), and ndhC-trnV(UAC) were the most variable regions, in that order.
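The adjusted means in Fig. 5 are regression-based. log10(PICs + 1) is modeled against log10(length + 1), and each region's estimate is read off at a common length, then reported as a geometric mean. The Python sketch below (NumPy) is a simplified fixed-effects version of that idea, not the authors' model. It fits a separate intercept per region with one shared slope by ordinary least squares, evaluates at the mean covariate, and back-transforms with 10^y − 1. It omits the lineage term of the mixed model and the Scheffé contrasts, and the data are synthetic and noise-free, so it illustrates only the form of the length adjustment.

```python
import numpy as np


def adjusted_geometric_means(region_labels, pics, lengths_bp):
    """Common-slope ANCOVA fit by ordinary least squares.

    Model (fixed effects only): log10(PICs + 1) = a_region + b * log10(length + 1).
    Returns each region's adjusted mean, evaluated at the grand mean of the covariate
    and back-transformed to the PIC scale, together with the shared slope b.
    """
    labels = np.asarray(region_labels)
    y = np.log10(np.asarray(pics, dtype=float) + 1.0)
    x = np.log10(np.asarray(lengths_bp, dtype=float) + 1.0)
    levels = sorted(set(labels.tolist()))
    # One indicator column per region (cell-means coding) plus the shared covariate column.
    design = np.column_stack([(labels == level).astype(float) for level in levels] + [x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    intercepts, slope = coef[:-1], coef[-1]
    adjusted_log = intercepts + slope * x.mean()
    # 10**(mean of log10(PICs + 1)) - 1 is a geometric-type mean on the original scale.
    return dict(zip(levels, 10.0 ** adjusted_log - 1.0)), slope


# Synthetic, noise-free data: three regions sharing a slope of 0.5 on the log scale.
rng = np.random.default_rng(0)
true_intercepts = {"ndhF-rpl32": 0.2, "ndhC-trnV(UAC)": 0.0, "psbE-petL": -0.1}
labels, pics, lengths = [], [], []
for name, a in true_intercepts.items():
    for length in rng.integers(300, 1500, size=8):
        labels.append(name)
        lengths.append(int(length))
        pics.append(10 ** (a + 0.5 * np.log10(length + 1.0)) - 1.0)

means, slope = adjusted_geometric_means(labels, pics, lengths)
for name in sorted(means, key=means.get, reverse=True):
    print(f"{name}: adjusted geometric mean = {means[name]:.1f} PICs")
print(f"shared slope = {slope:.3f}")
```

Evaluating every region at the same covariate value is what lets regions of very different lengths be compared on PICs alone. That is the sense in which the ANCOVA "controlled for the effect of region length."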

Literature review—The total number of “phylogeography or phylogeographic” papers published between 1987 and 2013 was 10 196. We refined this number by limiting the selections to articles only (9286). We then refined by the search term “*aceae” to capture any mention of a plant family in the text, resulting in a total of 1089 papers. Further refining by the search terms “chloroplast or plastid or cpDNA” resulted in 699 papers; refining by “mitochondria or mitochondrial or mtDNA” resulted in 351 papers; refining by “nuclear or nDNA” resulted in 253 papers (Fig. 6).

For the literature review, 178 papers published between 2007 and 2013 were reviewed. Of those 178, 123 (69%) used cpDNA sequence data (Fig. 7). The percentage of papers using cpDNA sequences varied by year, ranging between 50% (2007) and 81% (2012). The average number of cpDNA regions used was two to three, with a minimum of one and a maximum of six. Only 28 of the 178 papers explicitly reported screening additional loci, and these did not always state which loci were tested. When this information was provided, we determined that the mean number of loci screened by year ranged from four to 15, with a few papers indicating that “several” regions were screened.

Comparing the noncoding regions within studies showed that in 11 comparisons, trnH(GUG)-psbA provided more PICs than other regions in the same study. However, in six other studies, trnH(GUG)-psbA was not the best performer. The region with the second highest observed PICs was trnS(GCU)-5′trnG(GCC), based on nine comparisons. It was not the top performer in only two of the studies in which it was included. Of the top regions reported in the present study, two (ndhC-trnV(UAC) and psbM-trnD(GUC)) were not used in any of the studies reviewed for this comparison. The remaining top regions identified here were used infrequently in the surveyed papers:
rpl32-trnL(UAG) was best in three of four studies.
ndhF-rpl32 was best in the one study in which it was used.
psbE-petL was best in the one study in which it was used.
trnT(UGU)-trnL(UAA) was best in one of two studies.
trnT(GGU)-psbD, rpoB-trnC(GCA), and petA-psbJ were not best in any of the studies in which they were used (2, 1, and 2 studies, respectively; Appendix S2, see online Supplemental Data).

DISCUSSION

NC-cpDNAs are important for low-level studies—Today, data from NC-cpDNA are the most commonly used tool for phylogeographic and low-level phylogenetic studies of plants (Fig. 6). Our literature review highlighted the fact that researchers working at inter- and intraspecific levels still rely heavily on NC-cpDNA sequence data from one to a few loci (Figs. 6, 7). Of the more recently published plant phylogeography papers, those published between 2007 and 2013, nearly 70% used cpDNA sequence data (Fig. 7).

There is a body of literature dating back to the 1990s in which botanists have searched for the most appropriate NC-cpDNAs for low-level phylogenetic and phylogeographic studies (Taberlet et al., 1991; Demesure et al., 1995; Dumolin-Lapegue et al., 1997b; Small et al., 1998; Hamilton, 1999; Saltonstall, 2001; Shaw et al., 2005, 2007). In the mid-2000s, a few studies compared numerous NC-cpDNA regions across a taxonomic scope broad enough to report general trends in the relative variability of NC-cpDNA regions (Small et al., 1998; Bastia et al., 2001; Grivet et al., 2001; Aoki et al., 2003; Dhingra and Folta, 2005; Shaw et al., 2005, 2007; Ebert and Peakall, 2009). Reports of these general trends may have allowed workers to begin utilizing the most informative NC-cpDNA regions and push the use of the plastome further toward the shallowest taxonomic levels.

The most informative noncoding cpDNA regions—A few NC-cpDNA regions stand out as the most likely to contain high levels of variability at the inter- and intraspecific levels. This holds even though this study (Figs. 2–4) and a body of literature (Mort et al., 2007; Shaw et al., 2007; Särkinen and George, 2013) reinforce the observation that there is no universally best NC-cpDNA. That said, screening what are known to be, on average, the fastest evolving regions is a good starting point for molecular investigations of new groups (Cires et al., 2012; Fior et al., 2013). In the best-case scenario, researchers would have, or could generate, at least two complete plastomes within their study genus (or study section of large genera). They could then screen for the most informative regions before beginning a phylogeographic or low-level phylogenetic study (Doorduin et al., 2011; Fehrmann et al., 2012; Li et al., 2013; Särkinen and George, 2013). There may be as many as 17 000 genera of angiosperms (http://www.theplantlist.org), and GenBank presently (August 2014) has only 374 angiosperm plastomes representing 205 genera—and a fair number of these are either species poor, endemic, or genera of parasitic plants and not likely to be useful predictors of variability across large numbers of species. It is therefore unlikely that most researchers will have this advantage for several more years.

While it is true that the cost of NGS approaches is rapidly declining, and novel and simpler methods for undertaking such work are being rapidly published, many researchers (particularly at primarily undergraduate institutions) are still limited by funding and computational capacity. For these researchers, as well as others, there still remains a trade-off between collecting “big data” for a small number of taxa or individuals and collecting “small data” for a larger number of taxa or individuals at the same cost. The cost of software (e.g., Geneious) may be minor for some, yet to other researchers this cost alone may be prohibitive. Additionally, for phylogeographic studies, software has not yet been fully developed to handle the computational loads associated with NGS big data. Therefore, the choice between NGS and Sanger approaches, while not necessarily mutually exclusive (Fior et al., 2013), will be an independent one for each laboratory. It will depend on funding (both short-term and long-term potential), laboratory infrastructure, and computational support. As such, there is still a valid need for information such as that presented here, providing insight into which NC-cpDNA regions or portions of the plastome are likely to be most informative (Fig. 8). That is, the results of our work can guide researchers who want to use the most variable portions of the plastome, whether they are amplified independently, in larger clusters, multiplexed, or extracted from whole plastome data sets.

That said, three regions of the plastome stand out to us as hotspots of variability (Fig. 8). First is the area in the SSC that runs from ccsA to ndhF. This portion of the plastome contains two of the most variable single regions (ndhF-rpl32 and rpl32-trnL(UAG)). The second hotspot is a larger area of the genome from matK to 3′trnG(GCC). It contains several fairly variable regions, including matK, 5′trnK(UUU)-3′rps16, 5′rps16-trnQ(UUG), and trnS(GCU)-5′trnG(GCC), among others. The third hotspot is from rpoB to psbD, a larger portion of the genome that contains several high-ranking regions all clustered together. Interestingly, there are a fair number of rearrangements in these regions of the plastome as well (Appendix S1). Based on the extensive plastome survey included in the present study,

Fig. 4. Potentially informative characters (y-axis) for each NC-cpDNA (x-axis) within each species pair. The eight highest-ranking regions in the
analysis of all 25 lineages combined are marked with four-pointed stars (ndhF-rpl32, rpl32-trnL, rps16-trnQ, ndhC-trnV, trnT-psbD, psbE-petL, petA-psbJ,
rpl16 intron). Mag. = Magnolia, Cymb. = Cymbidium, Phalae. = Phalaenopsis, Aeth. = Aethionema, Sol. = Solanum, Chrys. = Chrysanthemum.

we suggest that it is possible to make good predictions based either on the overall trends or on the trends observed within a clade of interest (Figs. 2–4).

We used 25 closely related species pairs (most were congeners) from 10 major lineages of angiosperms, with multiple species pairs from six of these clades, to test all NC-cpDNA regions plus matK. Our data predict that the regions most likely to contain the greatest number of variable sites in angiosperms are 5′rps16-trnQ(UUG), ndhC-trnV(UAC), ndhF-rpl32, trnT(GGU)-psbD, psbE-petL, petA-psbJ, rpl32-trnL(UAG), rpl16 intron, ndhA intron, rpoB-trnC(GCA), trnS(GCU)-5′trnG(GCC), 5′trnK(UUU)-3′rps16, psbM-trnD(GUC), and matK (Fig. 2A, B). Primers have been developed and are “universal” for most of these regions (see Shaw et al., 2007). However, the coding regions flanking ndhF-rpl32 and rpl32-trnL(UAG) (specifically, ndhF and rpl32) are highly variable. This makes “universal” primer design difficult for these two NC-cpDNAs. Appendix S3 contains a brief discussion of the ndhF-rpl32 and rpl32-trnL(UAG) primers and a table that will aid lineage-specific primer design in future projects.

The most variable NC-cpDNA regions identified in this research agree with the previous findings of Shaw et al. (2007) and others (Byrne and Hankinson, 2012; Dong et al., 2012). It is notable that these studies agree so closely on the best NC-cpDNAs for low-level studies, even though they sampled completely different species from many different plant families. That is, none of the 25 species pairs sampled here were the same as any species pair from TH2 or TH3. Both the present study and TH3 converge on ndhF-rpl32, rpl32-trnL(UAG), 5′rps16-trnQ(UUG), ndhC-trnV(UAC), petA-psbJ, trnT(GGU)-psbD, and psbE-petL as being among the most informative regions of the plastome for most groups. Even with a different analysis and different species, Dong et al. (2012) and Byrne and Hankinson (2012) showed rpl32-trnL(UAG), 5′rps16-trnQ(UUG), ndhC-trnV(UAC), trnT(GGU)-psbD, and rpl16 intron to be among the most variable. Additionally, new comparative plastomic data continue to be consistent with our findings at multiple taxonomic levels. These include Illicium (O. R. Leonard and A. B. Morris, unpublished data), Solanaceae (Särkinen and George, 2013), and Apiales (Downie and Jansen, in press). Finally, numerous other studies with narrower taxonomic scopes have also shown these same regions to be the most variable of the angiosperm plastome (see discussion below).

Like TH3, we show that finer-level comparisons yield less predictability among the top NC-cpDNAs. For example, trnS(GCU)-5′trnG(GCC) ranked just outside the top 10 regions in the overall comparison and in the top 12 in 3/6 of the major-lineage comparisons. However, it ranked first in the Phalaenopsis (monocot) and Populus (eurosid I) comparisons and third in Olea (euasterid I) (Fig. 4). Interestingly,

Fig. 5. Geometric mean potentially informative characters (PICs) ± SE per region for the top 10 NC-cpDNA regions (n = 20–25 lineages per region). Regression-based estimates [y = log10(total PICs per region + 1), x = log10(length of region + 1)] were calculated for a common x value. Regions are coded 1 to 10 (highest to lowest) and arranged in decreasing order of mean values. Mean PICs/region are reported for three statistical groups; group means followed by different letters are significantly different (P < 0.05; ANCOVA, Scheffé contrasts).

Fig. 6. Intraspecific studies of plants from 1987 to 2013 showing the contributions of the separate plant genomes. We performed a search in Web of Science (May 2014) to determine which plant genome(s) was most frequently used in the phylogeographic literature. Search terms included “phylogeography or phylogeographic,” refined by articles only and by the term “*aceae” to identify studies including a plant family. Subsequent independent refinements used the terms “chloroplast or plastid or cpDNA,” “mitochondria or mitochondrial or mtDNA,” or “nuclear or nDNA.”

Fig. 7. Summary of cpDNA sequence data used in 178 plant phylogeographic studies published between 2007 and 2013. Of 178 studies surveyed over 7 years, 69% used cpDNA sequence data. This percentage varied by year, ranging between 50% (2007) and 81% (2012), and it has not dropped below 68% since 2009.

Fig. 8. Map of the potentially informative characters (PICs) expected to be found around the plastome. Gene content, order, and mapping were based on the map of Lee et al. (2006) for Gossypium hirsutum. Bars radiating inward from NC-cpDNA regions show the average percentage contribution to total PICs for each NC-cpDNA and matK. The blue, orange, red, and pink circles indicate increasing predicted PIC values for each NC-cpDNA, as shown by the scale bar. The inverted repeats are indicated by bold black lines on the genome and on the blue, orange, red, and pink circles. Note that three areas highlighted in light green stand out as hotspots or clusters of highly variable regions of the plastome: ccsA-ndhF, matK-3′trnG, and rpoB-psbD.

both Phalaenopsis and Populus have plastome rearrangements that result in the partial loss of the ndhF-rpl32-trnL(UAG) region. This loss further supports our finding that this is a highly variable region. In another example, matK ranked 12th overall (Fig. 2A, B) and in the top 12 positions in 3/6 comparisons of major lineages (Fig. 3B–D), yet it was ranked first in Aethionema

(eurosid II) and second in Vigna (eurosid I) and Olea (Fig. 4). In rare cases, a NC-cpDNA that ranked very low overall (e.g., rpl33-rps18) ranked very highly in one species comparison (Nelumbo, Proteales) (Fig. 4) but did not appear in the top eight in the remaining 24 species comparisons.

Low-ranked NC-cpDNAs occasionally turn out to be the most informative in a given lineage, and the top NC-cpDNA regions may change rank among the top positions. Despite these cases, a few regions are, on average, among the most informative across angiosperms: ndhF-rpl32, rpl32-trnL(UAG), ndhC-trnV(UAC), and 5′rps16-trnQ(UUG) (Figs. 2–4) (Byrne and Hankinson, 2012; Dong et al., 2012). In fact, one of these four NC-cpDNAs was the top-ranked region (of 108 total regions) in 11 of the 25 species comparisons and in five of the six major-lineage comparisons. Furthermore, one of these four most variable NC-cpDNAs was among the top three regions in 19 of 25 lineages. Even in the Nelumbo example above, where a region that ranked low overall was the most variable in this lineage, all four of these regions were in the top 10. This finding suggests that some inconsistency exists among the top NC-cpDNAs, and no single NC-cpDNA is universally best for low-level molecular studies. Even so, one of the top NC-cpDNAs is likely to be among the most informative in a given taxonomic group of angiosperms.

How many NC-cpDNA regions should we use and how many should we screen?—Over the last 6 years, the average number of cpDNA regions used in intra- and interspecific studies was two to three, with a minimum of one and a maximum of six. Unfortunately, only 28 of the 178 papers reviewed here reported screening additional loci, and too often the other NC-cpDNA regions screened were not named. In the papers where they were reported, the mean number of loci ranged from four to 15, and the authors rarely reported information on the regions screened but excluded. Therefore, there are insufficient data in the literature to build a quantitative database from which to draw conclusions about how many NC-cpDNAs should be screened prior to study. We hope to address this question in the future.

At this point, we know that it is the exception rather than the rule that a single NC-cpDNA region will provide enough genetic information to discriminate all of the species in a genus or section, or all populations in a phylogeographic study. Our literature review showed that at the lowest levels at which workers used NC-cpDNAs, the average number of NC-cpDNA regions used was two to three, with a minimum of one and a maximum of six. The DNA barcoding community has provided a wealth of information regarding the species-level discriminating power of plastome regions. It has long been understood that a multilocus approach is necessary to capture enough genetic variability to differentiate species (Kress et al., 2005; Newmaster et al., 2006; Chase et al., 2007; Kress and Erickson, 2007; Lahaye et al., 2008). Fazekas et al. (2008) showed that at least four NC-cpDNA regions are necessary to capture the total species-discriminating power of the plastome. They also showed that greatly increasing the number of regions above four, especially above six or seven, will not increase species-level discrimination because of a “performance plateau.”

We draw two conclusions here. First, on average the phylogenetics/phylogeography community is under-sampling the plastome by using only two to three regions per study. It is therefore not using cpDNA sequence data to the full extent of its discriminating power. Second, there are certainly cases where only sequencing the entire plastome will find genomic differences between study taxa (Whittall et al., 2010). Nevertheless, whole plastome sequences may not be necessary to resolve many questions in low-level studies. The top eight regions, ~8500 bp (~5.6% of the plastome), account for roughly one fifth of the NC-cpDNA PICs present in a given lineage (based on the percentage contribution of each region averaged across the 25 species pairs). The top 12 regions, ~12 900 bp (8.6% of the plastome), account for more than one third of the total NC-cpDNA PICs. About one fifth of the genetic differences in the noncoding portions of the plastome can be captured by sequencing the eight most variable regions. In addition, species-level discrimination shows a performance plateau beyond six or seven regions. We therefore posit that whole plastome sequencing is probably unnecessary for many inter- and intraspecific studies. Depending on the lineage of interest, at least four and up to eight of the most variable NC-cpDNA regions will likely access the majority of the low-level discriminating power of the plastome.

Discriminating power of the cpDNA genome—In the late 1980s and early 1990s, the cpDNA genome was believed to be ill-suited for intraspecific studies because of its very slow rate of mutation (Palmer, 1987). This belief was largely based on a few intraspecific studies using cpDNA restriction digests (e.g., Banks and Birky, 1985; Sytsma and Schaal, 1985). Soon after, Palmer et al. (1988) and Olmstead and Palmer (1994) suggested that the plastome might be useful for intraspecific studies in some cases. Nonetheless, our community held firm to the idea that the plastome was only useful for interspecific studies and above. From the late 1990s to the mid-2000s, some researchers successfully inferred intraspecific relationships of plants from NC-cpDNA RFLP (Demesure et al., 1996; Dumolin-Lapegue et al., 1997a; Soltis et al., 1997; King and Ferris, 1998; Taberlet et al., 1998; Newton et al., 1999; Petit et al., 2002). Others did so from sequence data (Ohsako and Ohnishi, 2000; Chiang et al., 2001; Lu et al., 2001; Okaura and Harada, 2002; Holderegger and Abbott, 2003; Yamane et al., 2003). Widmer and Baltisberger (1999) even touted “extensive intraspecific chloroplast variation.” These earliest studies to have used NC-cpDNA sequence data all predated TH2, and most used what are today considered intermediately variable regions, like trnL(UAA)-trnF(GAA) and psbA-trnH(GUG). Some, however, used regions that are, on average, fairly high-ranking: rpoB-trnC(GCA) (Yamane et al., 2003) or trnD(GUC)-trnT(GGU) (Lu et al., 2001). Other fairly high-ranking regions like trnS(GCU)-5′trnG(GCC) and trnD(GUC)-trnT(GGU) were shown to contain small amounts of intraspecific variation in three unrelated species from the Alps (Schönswetter et al., 2006a, b). Wang et al. (2009) even used the rpl20-rps12, trnV intron, and psbA-trnH(GUG) regions, which ranked relatively low here. With these, they revealed nine haplotypes from 23 populations of Aconitum gymnandrum (basal eudicot). To our knowledge, some of the earliest intraspecific studies to use a top region were the following two. Yamane and Kawahara (2005) used ndhF-rpl32 in combination with rpoB-trnC(GCA), trnF(GAA)-ndhJ, and atpI-atpH to identify 2–8 haplotypes per species in 13 Triticum-Aegilops species (monocot). Huang et al. (2004) used petA-psbJ along with petG-trnP(UGG) to reveal nine haplotypes among 24 populations of Trochodendron aralioides (basal eudicot).

More recently, numerous papers have used top-performing NC-cpDNAs to uncover intraspecific variability. Aguirre-Liguori et al. (2014) used ndhF-rpl32, rpl32-trnL(UAG), and psbJ-petA to identify nine haplotypes and geographic structure within Fouquieria shrevei (basal asterid). James and McDougall (2014) used 5′rps16-trnQ(UUG) to reveal eight haplotypes and geographic structure within Grevillea renwickiana (basal eudicot), and Chen et al. (2013) used this same region to reveal 15 haplotypes among

17 populations of Prunus pseudocerasus (eurosid I). Liu et al. (2012) uncovered 12 haplotypes using rpl32-trnL(UAG) in a phylogeographic inquiry of Camellia taliensis (basal asterid). They showed this region to be as informative as the nuclear-encoded PAL. Cires et al. (2012) used rpl32-trnL(UAG) and 5′rps16-trnQ(UUG) to reveal intraspecific variability in Ranunculus parnassifolius (basal eudicot). In fact, Cires et al. showed each of these regions to contain three times more variable characters than the nuclear ribosomal ITS. Dunbar-Co et al. (2008) used ndhF-rpl32 and rpl32-trnL(UAG) in an intra- and interspecific study of Hawaiian Plantago (asterid I). They showed this concatenated region to contain twice the variable characters present in the ITS. Yuan et al. (2008) used 5′rps16-trnQ(UUG) and psbA-trnH(GUG) to highlight 11 haplotypes from 16 populations of Dipentodon (eurosid II). Yuan et al. (2011) used 5′rps16-trnQ(UUG) in addition to petB-petD and psbA-trnH(GUG) in an intraspecific study of Paeonia rockii (basal eurosid) and found 16 haplotypes among 335 individuals. Interestingly, petB-petD outperformed 5′rps16-trnQ(UUG) in that study and was 1458 bp long, which is abnormally large for this region compared with our average of 248 bp (Fig. 2D).

We are not suggesting that NC-cpDNA can always be used to elucidate intraspecific relationships, but rather that we should not assume that it cannot. In a study of 21 closely related species of Kaempferia and related genera (Zingiberales), Techaprasan et al. (2010) showed that about half of the species (11) displayed intraspecific variability using psbA-trnH(GUG) and petA-psbJ sequences. There are also instances where these regions are unlikely to be helpful at intraspecific levels. Consider the four species pairs that we excluded from our plastome analyses due to a lack of variation: Acorus, Nicotiana, Oryza, and Phyllostachys.

In part, the ability of our community to compare NC-cpDNA region utility quantitatively has been hampered by a paucity of detail reported in phylogeographic and phylogenetic studies. Many current studies publish haplotype networks or phylogenies but fail to report the relative contributions of (1) the different kinds of data (e.g., NC-cpDNA vs. microsatellite vs. AFLP) and (2) the different NC-cpDNA regions. This applies both to the regions that were included and, arguably more importantly, to those that were screened and excluded. Even when such data are reported, they are not always presented clearly or in an easily accessible manner. At the outset of this study, we had hoped to mine the phylogeographic literature for data regarding (1) how many gene regions were screened before choosing ones for study and (2) how many haplotypes were uncovered with each type of molecular marker and with each different NC-cpDNA region. However, most studies provide only a summary of results from concatenated data, with no indication of the relative contribution of each sequenced region to the whole. Of the 178 papers surveyed in our literature review, only 45 included such information, and the information was not parallel enough to elucidate patterns.

Conclusions—At this point, several things are clear. First, NC-cpDNA sequences are still a tool of choice among systematists/phylogeographers focused at the shallowest taxonomic levels. Sec-

Data from our literature review were insufficient for us to make recommendations on how many individuals to screen. Additionally, the screening process or strategy will likely vary depending on the taxonomic and geographic scope of the project. Because the top regions of this study are likely to be among the most variable in most angiosperm species, we suggest including all of them in initial screenings. These include ndhF-rpl32, rpl32-trnL(UAG), 5′rps16-trnQ(UUG), ndhC-trnV(UAC), trnT(GGU)-psbD, psbE-petL, petA-psbJ, and rpl16 intron. To increase the likelihood of finding the most variable regions in a specific group of interest, we suggest also screening any of the following:
(1) some of the other top-ranking regions, such as ndhA intron, rpoB-trnC(GCA), psbM-trnD(GUC), trnS(GCU)-5′trnG(GCC), matK, 5′trnK(UUU)-3′rps16, petN-psbM, and perhaps trnT(UGU)-trnL(UAA);
(2) some of the most variable regions from the major lineage closest to the study group of interest (Figs. 3, 4);
(3) some NC-cpDNAs that other studies have shown to be highly informative in taxa closely related to the study group.
Primers and protocols for all of these regions are referenced in Shaw et al. (2005, 2007). Furthermore, the DNA barcoding community has shown that 4–6 NC-cpDNAs will likely capture all of the species-discriminating power of the plastome. For now, we therefore suggest screening up to 10–12 regions to determine which 4–6 to include in low-level phylogenetic or phylogeographic studies.

Lastly, we ask our community to include detailed data tables in their publications to clarify the relative contribution of each sequenced region. These tables should specifically include information on screened but omitted NC-cpDNA regions, even if they were abandoned early because of poly A/T regions near the ends. At this point, we suspect there is a wealth of unpublished information whose absence will likely cause repeated effort and wasted resources in the future. At a minimum, these tables should include NC-cpDNA name, length, nucleotide substitutions, indels, inversions, variable characters, informative characters, and haplotypes uncovered by the different regions, all presented within the context of sample size by population. If each study includes this information for each region employed, within a year’s time there will be a substantial amount of data. Those data may be of use to other researchers who do not have plastome resources for their target groups.

As we have learned how to use it more effectively, the plastome has become an extraordinary tool in our understanding of the evolutionary history of plants. The literature establishes that the plastome has allowed us to investigate the deepest to shallowest relationships of plants. We continue to push the envelope further to better understand recent evolutionary patterns within and among populations. The success we have experienced so far has largely been achieved without the aid of the most variable regions of the plastome. It appears as though we are embarking on new opportunities once again, through this organelle that was once thought to contain so little species-level information.

LITERATURE CITED

AGUIRRE-LIGUORI, J. A., E. SCHEINVAR, AND L. E. EGUIARTE. 2014. Gypsum
soil restriction drives genetic differentiation in Fouquieria shrevei
ond, there is a strong need for us to be able to predict which mark-
(Fouquieriaceae). American Journal of Botany 101: 730–736.
ers will be the most variable within a given lineage because (1) AOKI, K., T. SUZUKI, AND N. MURAKAMI. 2003. Intraspecific sequence varia-
NC-cpDNAs are important for inter- and intraspecific studies, (2) tion of chloroplast DNA among the component species of evergreen
generating whole plastome sequences as a starting point for inter- broad-leaved forests in Japan. Journal of Plant Research 116: 337–344.
and intraspecific studies is still beyond the reach of most research- BANKS, J. A., AND C. W. BIRKY. 1985. Chloroplast DNA diversity is low in
ers and not necessarily needed for all low-level studies, and (3) a wild plant, Lupinus texensis. Proceedings of the National Academy
there is no single most variable NC-cpDNA. of Sciences, USA 82: 6950–6954.
BASTIA, T., N. SCOTTI, AND T. CARDI. 2001. Organelle DNA analysis of Solanum and Brassica somatic hybrids by PCR with ‘universal primers’. Theoretical and Applied Genetics 102: 1265–1272.
BORSCH, T., K. W. HILU, D. QUANDT, V. WILDE, C. NEINHUIS, AND W. BARTHLOTT. 2003. Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. Journal of Evolutionary Biology 16: 558–576.
BOWEN, B. W., K. SHANKER, N. YASUDA, M. CELIA, M. C. M. D. MALAY, S. VON DER HEYDEN, AND G. PAULAY. 2014. Phylogeography unplugged: Comparative surveys in the genomic era. Bulletin of Marine Science 90: 13–46.
BYRNE, M., AND M. HANKINSON. 2012. Testing the variability of chloroplast sequences for plant phylogeography. Australian Journal of Botany 60: 569–574.
CHASE, M. W., R. S. COWAN, P. M. HOLLINGSWORTH, C. VAN DEN BERG, S. MADRINAN, G. PETERSEN, O. SEBERG, ET AL. 2007. A proposal for a standardised protocol to barcode all land plants. Taxon 56: 295–299.
CHASE, M. W., D. E. SOLTIS, R. G. OLMSTEAD, D. MORGAN, D. H. LES, B. D. MISHLER, M. R. DUVALL, ET AL. 1993. Phylogenetics of seed plants—An analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80: 528–580.
CHEN, T., X. R. WANG, H. R. TANG, Q. CHEN, X. J. HUANG, AND J. CHEN. 2013. Genetic diversity and population structure of Chinese cherry revealed by chloroplast DNA trnQ-rps16 intergenic spacers variation. Genetic Resources and Crop Evolution 60: 1859–1871.
CHIANG, T. Y., Y. C. CHIANG, Y. J. CHEN, C. H. CHOU, S. HAVANOND, T. N. HONG, AND S. HUANG. 2001. Phylogeography of Kandelia candel in East Asiatic mangroves based on nucleotide variation of chloroplast and mitochondrial DNAs. Molecular Ecology 10: 2697–2710.
CHIN, S.-W., J. SHAW, R. HABERLE, J. WEN, AND D. POTTER. 2014. Diversity of almonds, peaches, plums and cherries—Molecular systematics and biogeographic history of Prunus (Rosaceae). Molecular Phylogenetics and Evolution 76: 34–48.
CIRES, E., C. CUESTA, P. VARGAS, AND J. A. F. PRIETO. 2012. Unraveling the evolutionary history of the polyploid complex Ranunculus parnassifolius (Ranunculaceae). Biological Journal of the Linnean Society 107: 477–493.
CLEGG, M. T., J. R. RAWSON, AND K. THOMAS. 1984. Chloroplast DNA variation in pearl millet and related species. Genetics 106: 449–461.
CRONN, R., B. J. KNAUS, A. LISTON, P. J. MAUGHAN, M. PARKS, J. V. SYRING, AND J. UDALL. 2012. Targeted enrichment strategies for next-generation plant biology. American Journal of Botany 99: 291–311.
DE WIT, P., M. H. PESPENI, J. T. LADNER, D. J. BARSHIS, F. SENECA, H. JARIS, N. O. THERKILDSEN, ET AL. 2012. The simple fool’s guide to population genomics via RNA-Seq: An introduction to high-throughput sequencing data analysis. Molecular Ecology Resources 12: 1058–1067.
DEMESURE, B., B. COMPS, AND R. J. PETIT. 1996. Chloroplast DNA phylogeography of the common beech (Fagus sylvatica L.) in Europe. Evolution 50: 2515–2520.
DEMESURE, B., N. SODZI, AND R. J. PETIT. 1995. A set of universal primers for amplification of polymorphic noncoding regions of mitochondrial and chloroplast DNA in plants. Molecular Ecology 4: 129–131.
DHINGRA, A., AND K. M. FOLTA. 2005. ASAP: Amplification, sequencing, and annotation of plastomes. BMC Genomics 6: 176.
DONG, W., J. LIU, J. YU, L. WANG, AND S. ZHOU. 2012. Highly variable chloroplast markers for evaluating plant phylogeny at low taxonomic levels and for DNA barcoding. PLoS ONE 7: e35071.
DOORDUIN, L., B. GRAVENDEEL, Y. LAMMERS, Y. ARIYUREK, T. CHIN-A-WOENG, AND K. VRIELING. 2011. The complete chloroplast genome of 17 individuals of pest species Jacobaea vulgaris: SNPs, microsatellites, and barcoding markers for population and phylogenetic studies. DNA Research 18: 93–105.
DOWNIE, S. R., AND R. K. JANSEN. In press. A comparative analysis of whole plastid genomes from the Apiales: Expansion and contraction of the inverted repeat, mitochondrial to plastid transfer of DNA, and identification of highly divergent noncoding regions. Systematic Botany.
DOWNIE, S. R., D. S. KATZ-DOWNIE, AND M. F. WATSON. 2000. A phylogeny of the flowering plant family Apiaceae based on chloroplast DNA rpl16 and rpoC1 intron sequences: Towards a suprageneric classification of subfamily Apioideae. American Journal of Botany 87: 273–292.
DOYLE, J. J. 2013. The promise of genomics for a “next generation” of advances in higher-level legume molecular systematics. South African Journal of Botany 89: 10–18.
DUMOLIN-LAPEGUE, S., B. DEMESURE, S. FINESCHI, V. LE CORRE, AND R. J. PETIT. 1997a. Phylogeographic structure of white oaks throughout the European continent. Genetics 146: 1475–1487.
DUMOLIN-LAPEGUE, S., M. H. PEMONGE, AND R. J. PETIT. 1997b. An enlarged set of consensus primers for the study of organelle DNA in plants. Molecular Ecology 6: 393–397.
DUNBAR-CO, S., A. M. WIECZOREK, AND C. W. MORDEN. 2008. Molecular phylogeny and adaptive radiation of the endemic Hawaiian Plantago species (Plantaginaceae). American Journal of Botany 95: 1177–1188.
EBERT, D., AND R. PEAKALL. 2009. A new set of universal de novo sequencing primers for extensive coverage of noncoding chloroplast DNA: New opportunities for phylogenetic studies and cpSSR discovery. Molecular Ecology Resources 9: 777–783.
ESERMAN, L. A., G. P. TILEY, R. L. JARRET, J. H. LEEBENS-MACK, AND R. E. MILLER. 2014. Phylogenetics and diversification of morning glories (tribe Ipomoeeae, Convolvulaceae) based on whole plastome sequences. American Journal of Botany 101: 92–103.
FAZEKAS, A. J., K. S. BURGESS, P. R. KESANAKURTI, S. W. GRAHAM, S. G. NEWMASTER, B. C. HUSBAND, D. M. PERCY, ET AL. 2008. Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS ONE 3: e2802.
FEHRMANN, S., C. T. PHILBRICK, AND R. HALLIBURTON. 2012. Intraspecific variation in Podostemum ceratophyllum (Podostemaceae): Evidence of refugia and colonization since the last glacial maximum. American Journal of Botany 99: 145–151.
FIOR, S., M. LI, B. OXELMAN, R. VIOLA, S. A. HODGES, L. OMETTO, AND C. VAROTTO. 2013. Spatiotemporal reconstruction of the Aquilegia rapid radiation through next-generation sequencing of rapidly evolving cpDNA regions. New Phytologist 198: 579–592.
GIELLY, L., AND P. TABERLET. 1994. The use of chloroplast DNA to resolve plant phylogenies—Noncoding versus rbcL sequences. Molecular Biology and Evolution 11: 769–777.
GODDEN, G. T., I. E. JORDON-THADEN, S. CHAMALA, A. A. CROWL, N. GARCÍA, C. C. GERMAIN-AUBREY, J. M. HEANEY, ET AL. 2013. Making next-generation sequencing work for you: Approaches and practical considerations for marker development and phylogenetics. Plant Ecology & Diversity 5: 427–450.
GRAHAM, S. W., AND R. G. OLMSTEAD. 2000. Utility of 17 chloroplast genes for inferring the phylogeny of the basal angiosperms. American Journal of Botany 87: 1712–1730.
GRIVET, D., B. HEINZE, G. G. VENDRAMIN, AND R. J. PETIT. 2001. Genome walking with consensus primers: Application to the large single copy region of chloroplast DNA. Molecular Ecology Notes 1: 345–349.
HAMILTON, M. B. 1999. Four primer pairs for the amplification of chloroplast intergenic regions with intraspecific variation. Molecular Ecology 8: 521–523.
HILU, K. W., T. BORSCH, K. MÜLLER, D. E. SOLTIS, P. S. SOLTIS, V. SAVOLAINEN, M. W. CHASE, ET AL. 2003. Angiosperm phylogeny based on matK sequence information. American Journal of Botany 90: 1758–1776.
HILU, K. W., AND H. LIANG. 1997. The matK gene: Sequence variation and application in plant systematics. American Journal of Botany 84: 830–839.
HOLDEREGGER, R., AND R. J. ABBOTT. 2003. Phylogeography of the Arctic-Alpine Saxifraga oppositifolia (Saxifragaceae) and some related taxa based on cpDNA and ITS sequence variation. American Journal of Botany 90: 931–936.
HUANG, S.-F., S. Y. HWANG, J. C. WANG, AND T. P. LIN. 2004. Phylogeography of Trochodendron aralioides (Trochodendraceae) in Taiwan and its adjacent areas. Journal of Biogeography 31: 1251–1259.
JAMES, E. A., AND K. L. MCDOUGALL. 2014. Spatial genetic structure reflects extensive clonality, low genotypic diversity and habitat fragmentation in Grevillea renwickiana (Proteaceae), a rare, sterile shrub from south-eastern Australia. Annals of Botany 114: 413–423.
JOHNSON, L. A., AND D. E. SOLTIS. 1995. Phylogenetic inference in Saxifragaceae sensu stricto and Gilia (Polemoniaceae) using matK sequences. Annals of the Missouri Botanical Garden 82: 149–175.
KELCHNER, S. A. 2000. The evolution of non-coding chloroplast DNA and its application in plant systematics. Annals of the Missouri Botanical Garden 87: 482–498.
KING, R. A., AND C. FERRIS. 1998. Chloroplast DNA phylogeography of Alnus glutinosa (L.) Gaertn. Molecular Ecology 7: 1151–1161.
KRESS, W. J., AND D. L. ERICKSON. 2007. A two-locus global DNA barcode for land plants: The coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE 2: e508.
KRESS, W. J., K. J. WURDACK, E. A. ZIMMER, L. A. WEIGT, AND D. H. JANZEN. 2005. Use of DNA barcodes to identify flowering plants. Proceedings of the National Academy of Sciences, USA 102: 8369–8374.
LAHAYE, R., M. VAN DER BANK, D. BOGARIN, J. WARNER, F. PUPULIN, G. GIGOT, O. MAURIN, ET AL. 2008. DNA barcoding the floras of biodiversity hotspots. Proceedings of the National Academy of Sciences, USA 105: 2923–2928.
LEE, S. B., C. KAITTANIS, R. K. JANSEN, J. B. HOSTETLER, L. J. TALLON, C. D. TOWN, AND H. DANIELL. 2006. The complete chloroplast genome sequence of Gossypium hirsutum: Organization and phylogenetic relationships to other angiosperms. BMC Genomics 7: 61.
LI, R., P. F. MA, J. WEN, AND T. S. YI. 2013. Complete sequencing of five Araliaceae chloroplast genomes and the phylogenetic implications. PLoS ONE 8: e78568.
LIU, Y., S. X. YANG, P. Z. JI, AND L. Z. GAO. 2012. Phylogeography of Camellia taliensis (Theaceae) inferred from chloroplast and nuclear DNA: Insights into evolutionary history and conservation. BMC Evolutionary Biology 12: 92.
LU, S. Y., C. I. PENG, Y. P. CHENG, K. H. HONG, AND T. Y. CHIANG. 2001. Chloroplast DNA phylogeography of Cunninghamia konishii (Cupressaceae), an endemic conifer of Taiwan. Genome 44: 797–807.
MAIER, R. M., K. NECKERMANN, G. L. IGLOI, AND H. KÖSSEL. 1995. Complete sequence of the maize chloroplast genome—Gene content, hotspots of divergence and fine-tuning of genetic information by transcript editing. Journal of Molecular Biology 251: 614–628.
MARTIN, G., F. C. BAURENS, C. CARDI, J. M. AURY, AND A. D'HONT. 2013. The complete chloroplast genome of banana (Musa acuminata, Zingiberales): Insight into plastid monocotyledon evolution. PLoS ONE 8: e67350.
MOORE, M. J., P. S. SOLTIS, C. D. BELL, J. G. BURLEIGH, AND D. E. SOLTIS. 2010. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proceedings of the National Academy of Sciences, USA 107: 4623–4628.
MORRIS, A., S. ICKERT-BOND, D. BRUNSON, D. SOLTIS, AND P. SOLTIS. 2008. Phylogeographical structure and temporal complexity in American sweetgum (Liquidambar styraciflua; Altingiaceae). Molecular Ecology 17: 3889–3900.
MORT, M. E., J. K. ARCHIBALD, C. P. RANDLE, N. D. LEVSEN, T. R. O'LEARY, K. TOPALOV, C. M. WIEGAND, ET AL. 2007. Inferring phylogeny at low taxonomic levels: Utility of rapidly evolving cpDNA and nuclear ITS loci. American Journal of Botany 94: 173–183.
NEWMASTER, S. G., A. J. FAZEKAS, AND S. RAGUPATHY. 2006. DNA barcoding in land plants: Evaluation of rbcL in a multigene tiered approach. Canadian Journal of Botany 84: 335–341.
NEWTON, A. C., T. R. ALLNUTT, A. C. M. GILLIES, A. J. LOWE, AND R. A. ENNOS. 1999. Molecular phylogeography, intraspecific variation and the conservation of tree species. Trends in Ecology & Evolution 14: 140–145.
NIKIFOROVA, S. V., D. CAVALIERI, R. VELASCO, AND V. GOREMYKIN. 2013. Phylogenetic analysis of 47 chloroplast genomes clarifies the contribution of wild species to the domesticated apple maternal line. Molecular Biology and Evolution 30: 1751–1760.
NJUGUNA, W., A. LISTON, R. CRONN, T. L. ASHMAN, AND N. BASSIL. 2013. Insights into phylogeny, sex function and age of Fragaria based on whole chloroplast genome sequencing. Molecular Phylogenetics and Evolution 66: 17–29.
O’DONNELL, K. 1992. Ribosomal DNA internal transcribed spacers are highly divergent in the phytopathogenic ascomycete Fusarium sambucinum (Gibberella pulicaris). Current Genetics 22: 213–220.
OHSAKO, T., AND O. OHNISHI. 2000. Intra- and interspecific phylogeny of wild Fagopyrum (Polygonaceae) species based on nucleotide sequences of noncoding regions in chloroplast DNA. American Journal of Botany 87: 573–582.
OKAURA, T., AND K. HARADA. 2002. Phylogeographical structure revealed by chloroplast DNA variation in Japanese beech (Fagus crenata Blume). Heredity 88: 322–329.
OLMSTEAD, R. G., AND J. D. PALMER. 1994. Chloroplast DNA systematics—A review of methods and data analysis. American Journal of Botany 81: 1205–1224.
PALIY, O. 2013. The golden age of molecular ecology. Journal of Phylogenetics and Evolutionary Biology 1: e105.
PALMER, J. D. 1987. Chloroplast DNA evolution and biosystematic uses of chloroplast DNA variation. American Naturalist 130: S6–S29.
PALMER, J. D., R. K. JANSEN, H. J. MICHAELS, M. W. CHASE, AND J. R. MANHART. 1988. Chloroplast DNA variation and plant phylogeny. Annals of the Missouri Botanical Garden 75: 1180–1206.
PETIT, R. J., U. M. CSAIKL, S. BORDÁCS, K. BURG, E. COART, J. COTTRELL, B. VAN DAM, ET AL. 2002. Range wide distribution of chloroplast DNA diversity and pollen deposits in European white oaks: Inferences about colonisation routes and management of oak genetic resources. Forest Ecology and Management 156: 5–26.
PETIT, R. J., AND G. G. VENDRAMIN. 2007. Plant phylogeography based on organelle genes: An introduction. In S. Weiss and N. Ferrand [eds.], Phylogeography of southern European refugia, 23–97. Springer, Dordrecht, Netherlands.
POTTER, D., T. ERIKSSON, R. C. EVANS, S. OH, J. E. E. SMEDMARK, D. R. MORGAN, M. KERR, ET AL. 2007. Phylogeny and classification of Rosaceae. Plant Systematics and Evolution 266: 5–43.
PRESTING, G. G. 2006. Identification of conserved regions in the plastid genome: Implications for DNA barcoding and biological function. Canadian Journal of Botany 84: 1434–1443.
ROCHA, L. A., M. A. BERNAL, M. R. GAITHER, AND M. E. ALFARO. 2013. Massively parallel DNA sequencing: The new frontier in biogeography. Frontiers of Biogeography 5.
RUHFEL, B. R., M. A. GITZENDANNER, P. S. SOLTIS, D. E. SOLTIS, AND J. G. BURLEIGH. 2014. From algae to angiosperms—Inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evolutionary Biology 14: 23.
SALTONSTALL, K. 2001. A set of primers for amplification of noncoding regions of chloroplast DNA in the grasses. Molecular Ecology Notes 1: 76–78.
SÄRKINEN, T., AND M. GEORGE. 2013. Plastid marker variation: Can complete plastid genomes from closely related species help? PLoS ONE 8: e82266.
SCHAAL, B. A., D. A. HAYWORTH, K. M. OLSEN, J. T. RAUSCHER, AND W. A. SMITH. 1998. Phylogeographic studies in plants: Problems and prospects. Molecular Ecology 7: 465–474.
SCHAAL, B. A., AND K. M. OLSEN. 2000. Gene genealogies and population variation in plants. Proceedings of the National Academy of Sciences, USA 97: 7024–7029.
SCHÖNSWETTER, P., M. POPP, AND C. BROCHMANN. 2006a. Central Asian origin of and strong genetic differentiation among populations of the rare and disjunct Carex atrofusca (Cyperaceae) in the Alps. Journal of Biogeography 33: 948–956.
SCHÖNSWETTER, P., M. POPP, AND C. BROCHMANN. 2006b. Rare arctic-alpine plants of the European Alps have different immigration histories: The snow bed species Minuartia biflora and Ranunculus pygmaeus. Molecular Ecology 15: 709–720.
SHAW, J., E. B. LICKEY, J. T. BECK, S. B. FARMER, W. S. LIU, J. MILLER, K. C. SIRIPUN, ET AL. 2005. The tortoise and the hare II: Relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. American Journal of Botany 92: 142–166.
SHAW, J., E. B. LICKEY, E. E. SCHILLING, AND R. L. SMALL. 2007. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The tortoise and the hare III. American Journal of Botany 94: 275–288.
SHAW, J., AND R. L. SMALL. 2004. Addressing the “hardest puzzle in American pomology”: Phylogeny of Prunus sect. Prunocerasus (Rosaceae) based on seven noncoding chloroplast DNA regions. American Journal of Botany 91: 985–996.
SHI, C., N. HU, H. HUANG, J. GAO, Y. J. ZHAO, AND L. Z. GAO. 2012. An improved chloroplast DNA extraction procedure for whole plastid genome sequencing. PLoS ONE 7: e31468.
SIMMONS, M. P., AND H. OCHOTERENA. 2000. Gaps as characters in sequence-based phylogenetic analyses. Systematic Biology 49: 369–381.
SMALL, R. L., J. A. RYBURN, R. C. CRONN, T. SEELANAN, AND J. F. WENDEL. 1998. The tortoise and the hare: Choosing between noncoding plastome and nuclear Adh sequences for phylogeny reconstruction in a recently diverged plant group. American Journal of Botany 85: 1301–1315.
SOLTIS, D. E., M. A. GITZENDANNER, D. D. STRENGE, AND P. S. SOLTIS. 1997. Chloroplast DNA intraspecific phylogeography of plants from the Pacific Northwest of North America. Plant Systematics and Evolution 206: 353–373.
SOLTIS, D. E., M. A. GITZENDANNER, G. W. STULL, M. CHESTER, A. CHANDERBALI, S. CHAMALA, I. E. JORDON-THADEN, ET AL. 2013. The potential of genomics in plant systematics. Taxon 62: 886–898.
SOLTIS, D. E., A. B. MORRIS, J. S. MCLACHLAN, P. S. MANOS, AND P. S. SOLTIS. 2006. Comparative phylogeography of unglaciated eastern North America. Molecular Ecology 15: 4261–4293.
STEELE, K. P., AND R. VILGALYS. 1994. Phylogenetic analyses of Polemoniaceae using nucleotide sequences of the plastid gene matK. Systematic Botany 19: 126–142.
STRAUB, S. C. K., M. PARKS, K. WEITEMIER, M. FISHBEIN, R. C. CRONN, AND A. LISTON. 2012. Navigating the tip of the genomic iceberg: Next-generation sequencing for plant systematics. American Journal of Botany 99: 349–364.
STULL, G. W., M. J. MOORE, V. S. MANDALA, N. A. DOUGLAS, H. R. KATES, X. QI, S. F. BROCKINGTON, ET AL. 2013. A targeted enrichment strategy for massively parallel sequencing of angiosperm plastid genomes. Applications in Plant Sciences 1: 1200497.
SYTSMA, K. J., AND B. A. SCHAAL. 1985. Phylogenetics of the Lisianthius skinneri (Gentianaceae) species complex in Panama utilizing DNA restriction fragment analysis. Evolution 39: 594–608.
TABERLET, P., L. FUMAGALLI, A.-G. WUST-SAUCY, AND J.-F. COSSON. 1998. Comparative phylogeography and postglacial colonization routes in Europe. Molecular Ecology 7: 453–464.
TABERLET, P., L. GIELLY, G. PAUTOU, AND J. BOUVET. 1991. Universal primers for amplification of 3 noncoding regions of chloroplast DNA. Plant Molecular Biology 17: 1105–1109.
TECHAPRASAN, J., S. KLINBUNGA, C. NGAMRIABSAKUL, AND T. JENJITTIKUL. 2010. Genetic variation of Kaempferia (Zingiberaceae) in Thailand based on chloroplast DNA (psbA-trnH and petA-psbJ) sequences. Genetics and Molecular Research 9: 1957–1973.
URIBE-CONVERS, S., J. R. DUKE, M. J. MOORE, AND D. C. TANK. 2014. A long PCR-based approach for DNA enrichment prior to next-generation sequencing for systematic studies. Applications in Plant Sciences 2(1): 1300063.
WALKER, J. F., M. J. ZANIS, AND N. C. EMERY. 2014. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae). American Journal of Botany 101: 722–729.
WANG, L. Y., R. J. ABBOTT, W. ZHENG, P. CHEN, Y. J. WANG, AND J. Q. LIU. 2009. History and evolution of alpine plants endemic to the Qinghai-Tibetan Plateau: Aconitum gymnandrum (Ranunculaceae). Molecular Ecology 18: 709–721.
WHITTALL, J. B., J. SYRING, M. PARKS, J. BUENROSTRO, C. DICK, A. LISTON, AND R. CRONN. 2010. Finding a (pine) needle in a haystack: Chloroplast genome sequence divergence in rare and widespread pines. Molecular Ecology 19: 100–114.
WIDMER, A., AND M. BALTISBERGER. 1999. Extensive intraspecific chloroplast DNA (cpDNA) variation in the alpine Draba aizoides L. (Brassicaceae): Haplotype relationships and population structure. Molecular Ecology 8: 1405–1415.
WOLFE, K. H., W. H. LI, AND P. M. SHARP. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proceedings of the National Academy of Sciences, USA 84: 9054–9058.
YAMANE, K., AND T. KAWAHARA. 2005. Intra- and interspecific phylogenetic relationships among diploid Triticum-Aegilops species (Poaceae) based on base-pair substitutions, indels, and microsatellites in chloroplast noncoding sequences. American Journal of Botany 92: 1887–1898.
YAMANE, K., Y. YASUI, AND O. OHNISHI. 2003. Intraspecific cpDNA variations of diploid and tetraploid perennial buckwheat, Fagopyrum cymosum (Polygonaceae). American Journal of Botany 90: 339–346.
YUAN, J.-H., F.-Y. CHENG, AND S.-L. ZHOU. 2011. The phylogeographic structure and conservation genetics of the endangered tree peony, Paeonia rockii (Paeoniaceae), inferred from chloroplast gene sequences. Conservation Genetics 12: 1539–1549.
YUAN, Q.-J., Z. Y. ZHANG, H. PENG, AND S. GE. 2008. Chloroplast phylogeography of Dipentodon (Dipentodontaceae) in southwest China and northern Vietnam. Molecular Ecology 17: 1054–1065.
ZAR, J. H. 2010. Biostatistical analysis, 5th ed. Pearson Prentice Hall, Upper Saddle River, New Jersey, USA.
