Plant Genotyping 1
Plant Genotyping
Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life Sciences
University of Hertfordshire
Hatfield, Hertfordshire, AL10 9AB, UK
Edited by
Jacqueline Batley
School of Plant Biology, The University of Western Australia,
Crawley, WA, Australia
Plant genotyping is a rapidly advancing field. The ability to produce vast amounts of DNA
sequence data has enabled the discovery of molecular markers in a vast array of plant species,
meaning that genotyping, rather than marker development, becomes the rate-limiting factor. This
volume is aimed at plant biologists working on plants ranging from model organisms and crops to
orphan species, and it focuses on all the different marker types available. The volume would
also be of interest to researchers who would benefit from an introduction to the different
marker systems available for plant research.
Plant genotyping is required for a variety of end uses including marker-assisted selec-
tion, associating phenotype with polymorphism, DNA barcoding, genetic diversity analysis,
conservation genetics, and improving genome assemblies. The most suitable genotyping
system to use depends on the throughput requirements, facilities available, and questions
to be answered. Chapters within this volume focus on the diverse range of genotyping
methods available, with guidelines as to what methods may be suitable for the different
needs of the researchers. Overviews are provided in the early chapters. Given the issues with
polyploidy in some plant species, information is included describing how to handle these
data. Information is also provided on bioinformatics tools for marker discovery, databases
hosting existing markers, and software for data analysis. Chapters providing details on spe-
cific genotyping methods are then included.
Scientific research progresses rapidly and the technologies for genotyping evolve with
it. In this volume we have covered the different methods available to date, many of which
will continue to increase in throughput as these technologies advance, and researchers are
encouraged to frequently review which method may be the most applicable for their
research.
Abstract
Genetic diversity between individuals can be tracked and monitored using a range of molecular markers.
These markers can detect variation ranging in scale from a single base pair up to duplications and transloca-
tions of entire chromosomal regions. The genotyping of individuals allows the detection of this variation and
it has been successfully applied in plant science for many years. The increasing amounts of sequence data that
can be generated using next-generation sequencing (NGS) technologies have produced a vast expansion in the
rate of discovery of polymorphisms, with single nucleotide polymorphisms (SNPs) predominating as the
marker of choice. This increase in polymorphic marker resources through efficient discovery, coupled with
the utility of SNPs, has enabled the shift to high-throughput genotyping assays and these methods are
reviewed and discussed here, alongside the recent innovations allowing increased throughput.
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_1, © Springer Science+Business Media New York 2015
Dhwani A. Patel et al.
1.3 Randomly Amplified Polymorphic DNAs (RAPDs)
RAPD markers are random DNA segments amplified using short
primers of around 10 bp via PCR [3]. Short primers ensure
complementary sequence matching and subsequent amplification in
the genome. There may be length variation and presence/absence of
priming sites. The final products can be visualized using agarose
gel electrophoresis. Due to the low selectivity of short primers,
using this method increases the chances of nonspecific priming and
therefore artifacts [3]. Another issue with RAPD markers is their
Advances in Plant Genotyping: Where the Future Will Take Us
1.4 Simple Sequence Repeats (SSRs)/Microsatellites
SSRs, also known as microsatellites, are tandemly repeated short
DNA stretches that can occur as mono-, di-, tri-, tetra-, penta-, and
hexanucleotides [5]. The number of repeated units is affected by
mutations which makes SSRs highly polymorphic. These markers
have several beneficial attributes like their abundance in the
genome, high reproducibility, multiallelic variability, and genetic
codominance. SSRs have a wide range of applications, including
marker-assisted selection (MAS), genetic diversity analysis, genetic
mapping, and phenotype mapping [5]. They also
allow for transferability between species because primers designed
in one species often amplify corresponding loci in related species,
with information gained able to be used for comparative analyses.
Some SSR sequences have been implicated in playing a role in gene
function and expression, as transcriptional activating elements, and
those SSRs present in noncoding regions may have a functional
significance [5]. Drawbacks such as varying abundance of markers
in different species, reduced frequency of SSRs in plant genomes
relative to animal genomes, and degree of optimization required in
each new species limit their use [6].
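
Because an SSR is a short unit repeated in tandem, candidate loci can be located with a simple pattern search. The following is a minimal sketch rather than an established SSR-mining tool: the function name `find_ssrs` and the default threshold of four repeats are assumptions made for illustration.

```python
import re


def find_ssrs(seq, min_repeats=4):
    """Return (start, unit, repeat_count) for tandem repeats of 1-6 nt units.

    min_repeats is an illustrative threshold, not a published standard.
    """
    # Lazy unit of 1-6 bases followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence with a dinucleotide (AT) repeat and a second (GA) repeat.
toy = "GGC" + "AT" * 5 + "CC" + "GA" * 6 + "TT"
for start, unit, count in find_ssrs(toy):
    print(start, unit, count)
```

Polymorphism at such a locus would then appear as a difference in the repeat count between individuals, which is what primers flanking the repeat are designed to reveal.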
1.5 Restriction-Site Associated DNA (RAD)
RADs are short DNA stretches adjacent to every restriction enzyme
recognition site [7] and are useful in reducing the complexity of a
genome [1]. The latest advances in using RAD markers include
sequencing RAD tags for Single Nucleotide Polymorphism (SNP)
discovery and genotyping. This has proven effective in discovering
polymorphic markers even in organisms with low polymorphism.
Due to their reduced genome representation, the nucleotides next
to the recognition sites can be sequenced at high depth for SNP
detection. The user can also choose the number of markers to be
used based on the restriction enzymes chosen. This method can be
used for bulk segregant analysis by genotyping pooled populations
and multiplexed samples [1].
1.6 Allele-Specific Associated Primers (ASAPs)
The ASAP method selects at least one PCR primer
that contains a polymorphism (usually at the 3′ end), in contrast to
regular PCR-based reactions, in which nonpolymorphic primers are
used to amplify a polymorphic region in between them [8]. Under
stringent PCR conditions, this results in matched primers amplifying
the required fragment and mismatched primers not allowing ampli-
fication. The appearance of an amplicon on an agarose gel thus
allows for resolution of DNA polymorphism in a presence/absence
relationship [8]. The main benefit of this method over similar
methods of its era is the enhanced throughput achievable, as it
involves fewer steps and was more easily applied to a large number
1.7 Single Nucleotide Polymorphisms (SNPs)
SNPs are the most abundant markers present in a genome [2].
They have become the most popular choice of marker for several
genetic analyses. A SNP can be defined as a nucleotide difference
between two individuals at a particular locus [5]. The three forms
of SNPs are transitions (C/T, G/A), transversions (C/G, A/T,
T/G, C/A), and insertions/deletions (indels). C/T SNPs tend to
be more frequent outside of transcribed regions as a result of
increased cytosine methylation and amplified cytosine deamination
(reviewed in [10]).
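
The three forms listed above can be told apart mechanically: a transition exchanges two purines (A/G) or two pyrimidines (C/T), a transversion exchanges a purine for a pyrimidine, and an indel changes the length. The sketch below illustrates this; the function name `classify_snp` and the use of "-" for a missing base are assumptions made for the example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a pair of alleles as 'transition', 'transversion', or 'indel'."""
    ref, alt = ref.upper(), alt.upper()
    # A gap symbol or a length difference is treated as an insertion/deletion.
    if ref == "-" or alt == "-" or len(ref) != len(alt):
        return "indel"
    if ref == alt:
        raise ValueError("alleles are identical, so there is no polymorphism")
    # Both alleles in the same chemical class means a transition.
    alleles = {ref, alt}
    if alleles <= PURINES or alleles <= PYRIMIDINES:
        return "transition"
    return "transversion"


for pair in [("C", "T"), ("G", "A"), ("C", "G"), ("A", "T"), ("A", "-")]:
    print(pair, classify_snp(*pair))
```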
SNPs have many features that make them the ideal choice as a
molecular marker. They occur abundantly in the genome, are
relatively stable during evolution, and have a low mutation rate.
Such molecular markers are good tools to analyze the various
processes encompassing the population and evolutionary genetics
of an organism. These include mating systems, patterns of specia-
tion and dispersal, and comparative genomics [11]. SNPs are also
excellent genetic markers for high-density genetic map construction
for the genetic and physical mapping of genomes, trait mapping
and association, and linkage disequilibrium (LD) studies. In agri-
culture, these properties enable SNPs to be applied to genetic diag-
nostics, germplasm identification, and marker-assisted selection for
breeding programs.
The usefulness of SNPs for various applications depends on
their genomic location and environment. Genic SNPs are identified
within expressed sequences from available EST databases or next-
generation transcriptome sequencing data [12–17]. These SNPs
can result in either synonymous or nonsynonymous amino acid
changes. Nonsynonymous SNPs may be linked directly to gene
function or be “perfect” markers by altering protein structure or
function. Genic SNPs are often selected against, which can be
observed in the lower frequency of nonsynonymous relative to synonymous
base changes in gene regions, and this can lead to an underestimation
of true SNP number and reduced resolution for genetic
diversity studies (reviewed in [18]). Genic SNPs are also limited to
actively transcribed or gene-rich regions of the genome. The existence
of duplicated loci and highly conserved gene family members,
especially in polyploid species, can compromise the applicability
of genic SNPs to downstream applications such as association
mapping and LD studies [12, 19].
With the recent advances in whole genome sequencing (WGS)
technologies, genomic SNPs are increasing in popularity and acces-
sibility [20]. Genomic SNPs can be identified from any sequenced
region in the genome, minimizing problems from duplicated genic
regions conserved within and between genomes. Furthermore, the
1.8 Reduced-Representation Libraries (RRLs) and Complexity Reduction of Polymorphic Sequences (CRoPS)
Advances in genome sequencing technologies have paved the way
for significant improvements in the rapid detection of genetic variation
as well as the throughput and wealth of the information
obtained. Using reduced-representation sequencing, which involves
sequencing a few targeted genomic regions rather than the entire
genome, individuals can be directly compared for sequence variations.
Partial, but genome-wide, coverage is obtained by digesting
and pooling samples from multiple individuals with a frequently
cutting restriction enzyme [21]. The fragments of desired size are
then selected and sequenced at high depth, at a lower cost than full
genome sequencing. Reads from reduced-representation sequencing
can be mapped to a reference genome for polymorphism detection,
SNP calling, and haplotype analysis (where adjacent SNPs are
inherited as a conserved block of sequence). In the absence of a
reference, paired-end sequencing reads from any second generation
sequencing (SGS) platform, or long reads from the Roche Genome
Sequencer, can be used to assemble the fragments. However, this
method is not suitable to be applied to genomes with high ploidy
levels or large repetitive genome fractions [1].
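
The digestion and size-selection steps described above can be sketched in silico. This is a simplified illustration only: the four-base site "GATC", the size window, and the function names are assumptions, and the cut is placed at the start of the recognition site rather than at an enzyme-specific position.

```python
def digest(genome, site):
    """Split a sequence at every occurrence of a recognition site.

    Simplification: the cut is made immediately before the site.
    """
    fragments, last = [], 0
    pos = genome.find(site)
    while pos != -1:
        if pos > last:
            fragments.append(genome[last:pos])
        last = pos
        pos = genome.find(site, pos + 1)
    fragments.append(genome[last:])
    return fragments


def size_select(fragments, min_len, max_len):
    """Keep only fragments inside the chosen size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy genome with two recognition sites.
genome = "AAAA" + "GATC" + "CC" + "GATC" + "TTTTTTTT"
fragments = digest(genome, "GATC")
print(fragments)
print(size_select(fragments, 5, 8))
```

Only the size-selected fraction would be sequenced, which is what reduces the representation of the genome and allows higher depth per site.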
CRoPS was the first method that used sequence identifiers, or
barcodes, to uniquely tag sequence reads of an individual DNA
sample, enabling multiplexing of samples on one lane of any SGS
platform for polymorphism identification and population studies
[21]. Studies in maize have demonstrated the applicability of
CRoPS [22, 23]. Barcodes can also be applied to RRLs as long as
fragment size is selected individually for each sample before
pooling.
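
The role of the barcodes can be illustrated with a small demultiplexing sketch: each read is assigned to a sample according to the identifier at its start, and the identifier is then removed. The barcode sequences, sample names, and the function name `demultiplex` are hypothetical, and exact matching is assumed (no tolerance for sequencing errors in the barcode).

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by leading barcode and strip the barcode from each read."""
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched exactly.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


barcodes = {"ACGT": "plant_1", "TGCA": "plant_2"}  # hypothetical identifiers
reads = ["ACGTGGGG", "TGCACCCC", "ACGTAAAA", "NNNNTTTT"]
print(demultiplex(reads, barcodes))
```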
2 Sanger Sequencing
3 High-Throughput Genotyping
3.1 TaqMan Assay
Allele-specific hybridization coupled with Taq polymerase activity
during PCR forms the basis of the TaqMan assay [27]. One pair of
PCR primers and two different probes to one SNP site are used.
Fluorescence occurs when one of the probes matches a SNP allele,
which leads to the separation of the quencher and the fluorescent
dye. Life Technologies’ 7900HT Fast Real-Time PCR system can
process eighty-four 384-well plates in up to 4 days. One of the
drawbacks of this assay is the high cost of probes relative to its low level of
SNP multiplexing. Some recent advances include systems such as the
Biomark HD System [28] and OpenArray [29], which have a small
sample requirement, consume fewer reagents, and have a higher
throughput [27].
3.2 iPlex Gold Assay
Multiplex PCR, single-base extension, and matrix-assisted laser
desorption/ionization-time of flight (MALDI-TOF) mass spectrometry
(MS) detection together make up the iPlex Gold
assay (Sequenom, www.sequenom.com). Shrimp alkaline phospha-
tase deactivates the remaining nucleotides after PCR and the single
base primer extension is performed. The SNP site is combined
with one of four terminator nucleotides and the products are trans-
ferred onto 384-matrix spot chips following desalination, to be
analyzed using MALDI-TOF MS [27]. One 384 plate can be pro-
cessed in less than 10 h. This method is very useful for low input
samples because it directly analyses the allele-specific product and
outputs highly accurate data.
3.3 High Resolution Melt (HRM)
HRM uses intercalating fluorescent dyes to monitor the melting
profile (unmelted to melted) of PCR products by genotyping on
5 Bioinformatics Challenges
5.1 Handling Large Volumes of Data
The rapid advances in sequencing technology over the past decade
have led to an explosion of sequence and molecular marker data
[33]. In the early days of sequencing, the growth of sequencing
capabilities and information technology resources went hand in hand
[34]. In the last decade, however, the emergence of NGS technology
has advanced the field so much that the throughput and
output of data from individual sequencing runs have reached the point
where they are outgrowing the capacity to store these data in an efficient
and cost-effective way [20]. The genome informatics ecosystem is
at risk of being swamped with data that current storage capacity
cannot absorb, with sequence data output doubling every
5 months (on average), which in turn is dramatically lowering the
cost per DNA base sequenced [34]. This may pose a challenge
in the near future, but alternative options such as cloud computing are
currently under consideration [34, 35]. The rapid increase in
sequencing data has also created the need for new algorithms that
can process this flood of data in a meaningful and effective way.
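
To give a sense of scale, the 5-month doubling time quoted above can be turned into a growth factor over any period using 2^(t/5), with t in months. The short sketch below assumes steady exponential growth, which is an idealization; the function name is illustrative.

```python
def growth_factor(months, doubling_time_months=5.0):
    """Fold increase in output after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_time_months)


# One doubling period, one year, and five years of growth.
for months in (5, 12, 60):
    print(months, round(growth_factor(months), 2))
```

Under this assumption, output grows roughly fivefold in a year and about four-thousand-fold in five years, which shows why storage capacity growing at a slower rate is soon outpaced.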
5.2 Assembly Software
Some of the greatest advancements have come from genome
assembly software, as this is one of the most important applications
of NGS data [36]. Early assembly software struggled to meet the
needs of researchers in assembling complex genomes such as those of
higher plants and mammals; however, recent advances have allowed
for the completion of several eukaryotic genomes [37, 38]. One
significant challenge in genome assembly is the existence of large
repetitive elements within genomes [39]. This can in part be tackled
5.3 Alignment Software
The overwhelming volume of sequence data has also led to the
development of new alignment algorithms, as existing tools simply
cannot cope [39]. This applies to traditional dynamic program-
ming methods, as well as the BLAST family of alignment heuris-
tics. Current alignment algorithms have addressed this problem by
splitting the alignment problem into two steps: First, candidate
alignment locations are found using a heuristic search; second, the
actual alignment is performed. Examples of this include BLAT,
MAQ, Bowtie, and SOAPaligner/SOAP2 [39].
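
The two-step strategy can be sketched with a toy example: a k-mer index supplies candidate locations (the heuristic search), and each candidate is then checked against the whole read (the alignment step, reduced here to counting mismatches). This is a teaching sketch under stated assumptions and does not reproduce the data structures or scoring used by the tools named above; the function names, the seed length, and the mismatch limit are all illustrative.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align(read, reference, index, k, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped placement, or None."""
    best = None
    # Step 1: candidate locations from the read's first k-mer (the seed).
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        # Step 2: verify the candidate over the full read length.
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "GGGG" + "ACGTTGCA" + "CCCC" + "ACGTAGCA" + "TTTT"
index = build_index(reference, 4)
print(align("ACGTAGCA", reference, index, 4))
print(align("ACGTAGGA", reference, index, 4))
```

The saving comes from step 1: only positions sharing a seed with the read are examined, rather than every position in the reference.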
6 Conclusion
References
1. Mir RR, Varshney RK (2013) Future prospects of molecular markers in plants. In: Henry RJ (ed) Molecular markers in plants. Wiley, New York, pp 169–190
2. Agarwal M, Shrivastava N, Padh H (2008) Advances in molecular marker techniques and their applications in plant sciences. Plant Cell Rep 27:617–631
Molecular analysis and genome discovery. John Wiley & Sons, Ltd, pp 1–23
28. Fluidigm (2012) Biomark HD system. http://www.fluidigm.com/biomark-hd-system.html
29. LifeTechnologies (2012) OpenArray® Real-Time PCR System. http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/real-time-pcr/real-time-pcr-instruments/openarray-real-time-pcr-system.html
30. Biofire (2012) LightScanner® system mutation discovery, gene scanning and genotyping. Biofire Diagnostics. http://www.biofiredx.com/LightScanner/
31. Tindall EA, Petersen DC, Nikolaysen S, Miller W, Schuster SC, Hayes VM (2010) Interpretation of custom designed Illumina genotype cluster plots for targeted association studies and next-generation sequence validation. BMC Res Notes 3:39
32. Durstewitz G, Polley A, Plieske J, Luerssen H, Graner EM, Wieseke R, Ganal MW (2010) SNP discovery by amplicon sequencing and multiplex SNP genotyping in the allopolyploid species Brassica napus. Genome 53:948–956
33. Thudi M, Li YP, Jackson SA, May GD, Varshney RK (2012) Current state-of-art of sequencing technologies for plant genomics research. Brief Funct Genomics 11:3–11
34. Stein LD (2010) The case for cloud computing in genome informatics. Genome Biol 11:207
35. Dai L, Xin G, Yan G, Jingfa X, Zhang Z (2012) Bioinformatics clouds for big data manipulation. Biol Direct 7:43
36. Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 8:2–9
37. Imelfort M, Edwards D (2009) De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 10:609–618
38. Imelfort M, Batley J, Grimmond S, Edwards D (2009) Genome sequencing approaches and successes. In: Somers DJ, Langridge P, Gustafson JP (eds) Plant genomics. Humana, Kentucky, pp 345–358
39. Lee HC, Lai KT, Lorenc MT, Imelfort M, Duran C, Edwards D (2012) Bioinformatics tools and databases for analysis of next-generation sequence data. Brief Funct Genomics 11:12–24
40. Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D (2012) Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J 10:743–749
Chapter 2
Abstract
Individuals within a population of a sexually reproducing species will have some degree of heritable genomic
variation caused by mutations, insertion/deletions (INDELS), inversions, duplications, and translocations.
Such variation can be detected and screened using molecular, or genetic, markers. By definition, molecular
markers are genetic loci that can be easily tracked and quantified in a population and may be associated with
a particular gene or trait of interest. This chapter will review the current major applications of molecular
markers in plants.
Key words Molecular markers, SNPs, Association mapping, Genetic diversity, Genetic mapping,
Marker-assisted selection
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_2, © Springer Science+Business Media New York 2015
Alice C. Hayward et al.
2.1 Genetic Linkage Map Construction and QTL Identification
One of the most important applications of genetic markers has been
the construction of genetic linkage maps [1, 2]. These maps are
created by genotyping a large mapping population of segregating
individuals and studying the resulting recombination frequencies
between genetic markers. This enables establishment of linkage
groups of associated markers with an approximate relative position
along a chromosome based on their likelihood of being coinher-
ited. A linkage group will inherently often represent a large propor-
tion of an individual chromosome with imputed recombination
points. The abundance of SNPs and their ability to be discovered
and genotyped rapidly in a high-throughput manner makes them
particularly valuable markers for genetic mapping [3–5].
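
As a worked illustration of how recombination frequencies translate into map positions, the sketch below estimates the recombination fraction r between two markers and converts it to centimorgans with the standard Haldane and Kosambi mapping functions. It assumes a population in which each individual carries one of two parental alleles, "A" or "B", at every marker (as in a backcross or doubled-haploid design); the genotype strings and function names are made up for the example.

```python
import math


def recombination_fraction(marker_a, marker_b):
    """Fraction of individuals whose parental allele differs between two markers."""
    pairs = [(a, b) for a, b in zip(marker_a, marker_b)
             if a is not None and b is not None]  # skip missing calls
    recombinants = sum(a != b for a, b in pairs)
    return recombinants / len(pairs)


def haldane_cm(r):
    """Haldane mapping function (assumes no crossover interference)."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi mapping function (allows for some interference)."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


m1 = list("AABBABABAB")  # hypothetical calls for ten individuals
m2 = list("AABBABABAA")  # one individual is recombinant
r = recombination_fraction(m1, m2)
print(r, round(haldane_cm(r), 2), round(kosambi_cm(r), 2))
```

Markers with small pairwise distances are grouped into the same linkage group, and the distances are then used to order them along it.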
Importantly, when the same mapping population used to
derive a linkage map is phenotyped for segregating traits of inter-
est, such as seed color or flowering time, the association between
marker patterns and the phenotypic variation can be quantified.
This then enables identification of the genomic regions controlling
traits of interest. Where these traits are quantitative, the associated
genomic region(s) are known as quantitative trait loci (QTL). The
identification of markers closely linked to genetic loci of interest,
including QTL, enables discovery of the underlying, causative
gene(s). Prior to the availability of whole genome sequencing tech-
nologies, this involved map-based cloning, which used the known
sequence of markers directly flanking a locus to amplify and
sequence the intervening region for gene candidate identification.
Depending on the resolution of the genetic map as defined by
marker density and thus distance between flanking markers, this
process was often extremely time and resource intensive.
Nonetheless, it enabled the first identification of developmentally
and agriculturally important genes in many crop and model plant
species. In the crop canola, QTL of importance include those for
oil yield, oil quality, disease resistance, and pod shatter tolerance,
amongst many others [6–9].
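
The association between marker patterns and phenotypic variation described above can be illustrated with a single-marker scan. This is a deliberately simplified sketch: it assumes each individual is scored as parental allele "A" or "B" at every marker, uses the difference in mean phenotype between the two classes as the association measure, and relies on hypothetical marker names and flowering-time values. Formal QTL analyses use more rigorous statistics, which are not shown here.

```python
from statistics import mean


def mean_difference(genotypes, phenotypes):
    """Absolute difference in mean phenotype between the 'A' and 'B' classes."""
    a = [p for g, p in zip(genotypes, phenotypes) if g == "A"]
    b = [p for g, p in zip(genotypes, phenotypes) if g == "B"]
    return abs(mean(a) - mean(b))


def scan_markers(marker_table, phenotypes):
    """Score every marker; larger values suggest closer linkage to the trait."""
    return {name: mean_difference(calls, phenotypes)
            for name, calls in marker_table.items()}


flowering_days = [10, 12, 20, 22]          # hypothetical phenotype values
markers = {"m1": "AABB", "m2": "ABAB"}     # hypothetical marker calls
scores = scan_markers(markers, flowering_days)
print(scores)
print(max(scores, key=scores.get))
```

In this toy data set the first marker separates early and late flowering individuals cleanly, so it would be the candidate for a linked locus.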
2.2 Genome Assembly, Physical Mapping, and Synteny Mapping
Genetic linkage maps are highly valuable in helping to assemble
contigs of next-generation genome sequencing data into chromosomes.
This is achieved by physically mapping genetic marker
sequences on these contigs and comparing this to their known
relative location on the genetic map. The success of this process
depends on the accuracy and robustness of the genetic linkage
Molecular Marker Applications in Plants
2.3 Association Mapping and Linkage Disequilibrium
Genetic markers that are linked to traits under selection are highly
valuable for identifying genetic loci that contribute to phenotypic
variation based on linkage disequilibrium (LD). LD refers to the
coinheritance of specific genetic markers in ancestrally related indi-
viduals at higher frequencies than expected based on recombina-
tion distances. Regions that are in high LD may be under high
selection pressure for particular allelic combinations, implying a
positive relationship between otherwise physically distinct alleles
and quantifiable traits. LD mapping, or association mapping, refers
to the analysis of statistical associations between genetic markers,
usually individual SNPs or SNP haplotypes, and traits (phenotypes)
in a collection of individuals [5, 19–21]. SNP haplotypes,
which comprise SNP alleles always found in particular allelic
3 Genomic-Based Breeding
3.1 Marker-Assisted Selection: Single Marker–Trait Associations
Markers provide the potential to fine map important genetic loci
with high resolution through the use of mapping populations.
Where these populations are phenotyped for traits of agronomic
importance, such as disease resistance, the inheritance of particular
marker loci or haplotypes in the population can be linked to such
phenotypes. Genotyping of markers tightly linked to traits can
then rapidly predict the phenotypes of a large selection of segregat-
ing individuals at an early stage of development, often well before
phenotypic screening would be possible, and at reduced cost.
The application of single marker–trait associations to crop breeding
is known as marker-assisted selection (MAS). MAS enables efficient
selection of breeding lines for the introgression of desirable traits
into commercial crop accessions as well as the high-throughput
screening of the resulting progeny [34, 35].
An effective marker for MAS must generally be located within
1 cM of a desired trait and able to be genotyped at high through-
put and reproducibility [36]. Low polymorphism, poor genomic
distribution, and/or poor reproducibility of marker types includ-
ing RFLPs (Restriction Fragment Length Polymorphisms) and
RAPDs (Randomly Amplified Polymorphic DNA) limit their appli-
cation to MAS. Microsatellites (also known as Simple Sequence
Repeats, SSRs) are highly polymorphic, reproducible alternatives,
but are often poorly linked to genes [36, 37]. Nonetheless, SSRs
and RAPDs have been applied in B. napus (canola) MAS programs
for selection for major gene disease resistance [38], yellow seed
coat color [39], male fertility restorer lines [40], and improvement
of oil quality [41].
SNPs are currently the best markers for MAS due to their high
prevalence and polymorphism in the genome and their potential for
strong, or even perfect, linkage to traits of interest [5, 42–44].
Perfect linkage is possible where the polymorphism is directly
responsible for variation in the desired trait. The development of
high-throughput sequencing technologies in recent years has greatly
assisted association studies that utilize SNP markers [45, 46].
3.2 Genome-Wide Marker-Assisted Selection
The availability of phenotypic data along with genotypic data permits
the association of loci, or haplotypes, at a genome-wide scale, which
may be used to mine an entire genome for genotype–phenotype
correlations.
When there are enough markers, spanning the entire genome
in a dense manner, it is expected that the gene, or genes, of interest
will be in linkage disequilibrium with at least one or some of the
markers, leading to marker-assisted selection on a genomic scale
[47]. Genome-wide marker-assisted selection studies will be an
important way of safeguarding global food supplies into the future.
One study performed by Morris and coworkers investigated
4.1 Crop Breeding
For agriculturally important species, a high level of allelic diversity
provides an essential resource for mining beneficial trait variants
associated with this diversity. In the context of a changing climate,
a diverse germplasm set provides a valuable degree of genetic
plasticity and adaptive potential for breeding-based crop
4.3 Taxonomic Classification
Comparing the genetic similarity of related species is the most
accurate method of resolving taxonomic classifications. Various
molecular marker methods provide a fast, high-throughput, and
effective means to determine evolutionary relationships at differing
resolution. Within the Brassicaceae family, phylogenies remain
somewhat confused as a result of recurrent hybridization and poly-
ploidization events [79]. SNPs for high-throughput evolutionary
analysis are being applied to resolve ancestral karyotypes in the
Brassicaceae and the origin and timing of whole genome duplica-
tion and hybridization events [80–82]. The ability to efficiently
classify large numbers of samples into species groups also has appli-
cations for germplasm banks by facilitating routine verification of
stored lines and control of potential contamination. In the study by
Pradhan and coworkers [66], SSR markers with known genomic
locations from each of the three Brassica “A,” “B,” and “C”
genomes were used to confirm species identity in a collection of
B. nigra accessions found to be contaminated with B. juncea and B.
rapa species. Thus, genetic markers with known genomic origin
can be valuable for species classification where identification based
solely on morphological characters is difficult [66].
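
The logic of using genome-specific markers for species verification can be sketched as a lookup: the set of Brassica genomes ("A," "B," "C") whose markers amplify in a sample points to a species. The sketch is an illustration under stated assumptions and is not the procedure of the cited study: the genome compositions follow the widely used "triangle of U" model for the six cultivated Brassica species, and the marker names and function name are hypothetical.

```python
# Genome composition of the six cultivated Brassica species (triangle of U).
GENOMES_TO_SPECIES = {
    frozenset("A"): "B. rapa",
    frozenset("B"): "B. nigra",
    frozenset("C"): "B. oleracea",
    frozenset("AB"): "B. juncea",
    frozenset("AC"): "B. napus",
    frozenset("BC"): "B. carinata",
}


def classify_accession(marker_results):
    """marker_results maps marker name -> (genome, amplified?)."""
    detected = frozenset(genome for genome, amplified in marker_results.values()
                         if amplified)
    return GENOMES_TO_SPECIES.get(detected, "unresolved")


# An accession labeled B. nigra that also amplifies an A-genome marker.
sample = {"ssr_a1": ("A", True), "ssr_b1": ("B", True), "ssr_c1": ("C", False)}
print(classify_accession(sample))
```

An accession stored as B. nigra that amplifies A-genome markers as well as B-genome markers would be flagged in this way as likely B. juncea.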
6 Conclusion
References
1. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3:e3376
2. Duran C, Edwards D, Batley J (2009) Genetic maps and the use of synteny. In: Somers DJ, Langridge P, Gustafson JP (eds) Plant genomics: methods and protocols. Humana Press, New York, NY, pp 41–56
3. Duran C, Edwards D, Batley J (2009) Molecular marker discovery and genetic map visualisation. In: Edwards D (ed) Applied bioinformatics. Springer, New York, pp 165–189
4. Edwards D, Wilcox S, Barrero RA, Fleury D, Cavanagh CR, Forrest KL, Hayden MJ, Moolhuijzen P, Keeble-Gagnere G, Bellgard MI, Lorenc MT, Shang CA, Baumann U, Taylor JM, Morell MK, Langridge P, Appels R, Fitzgerald A (2012) Bread matters: a national initiative to profile the genetic diversity of Australian wheat. Plant Biotechnol J 10:703–708
5. Hayward A, Dalton-Morgan J, Mason A, Zander M, Edwards D, Batley J (2012) SNP discovery and applications in Brassica napus. J Plant Biotechnol 39:49–61
6. Kaur S, Cogan NO, Ye G, Baillie RC, Hand ML, Ling AE, McGearey AK, Kaur J, Hopkins CJ, Todorovic M, Mountford H, Edwards D, Batley J, Burton W, Salisbury P, Gororo N, Marcroft S, Kearney G, Smith KF, Forster JW, Spangenberg GC (2009) Genetic map construction and QTL mapping of resistance to blackleg (Leptosphaeria maculans) disease in Australian canola (Brassica napus L.) cultivars. Theor Appl Genet 120:71–83
7. Pilet ML, Delourme R, Foisset N, Renard M (1998) Identification of loci contributing to quantitative field resistance to blackleg disease, causal agent Leptosphaeria maculans (Desm.) Ces. et de Not., in Winter rapeseed (Brassica napus L.). Theor Appl Genet 96:23–30
8. Qiu D, Morgan C, Shi J, Long Y, Liu J, Li R, Zhuang X, Wang Y, Tan X, Dietrich E, Weihmann T, Everett C, Vanstraelen S, Beckett P, Fraser F, Trick M, Barnes S, Wilmer J, Schmidt R, Li J, Li D, Meng J, Bancroft I (2006) A comparative linkage map of oilseed rape and its use for QTL analysis of seed oil and erucic acid content. Theor Appl Genet 114:67–80
9. Smooker AM, Wells R, Morgan C, Beaudoin F, Cho K, Fraser F, Bancroft I (2011) The identification and mapping of candidate genes and QTL involved in the fatty acid desaturation pathway in Brassica napus. Theor Appl Genet 122:1075–1090
10. Tollenaere R, Hayward A, Dalton-Morgan J, Campbell E, Lee JRM, Lorenc MT, Manoli S, Stiller J, Raman R, Raman H, Edwards D, Batley J (2012) Identification and characterization of candidate Rlm4 blackleg resistance genes in Brassica napus using next-generation sequencing. Plant Biotechnol J 10:709–715
11. Choi SR, Teakle GR, Plaha P, Kim JH, Allender CJ, Beynon E, Piao ZY, Soengas P, Han TH, King GJ, Barker GC, Hand P, Lydiate DJ, Batley J, Edwards D, Koo DH, Bang JW, Park BS, Lim YP (2007) The reference genetic linkage map for the multinational Brassica rapa genome sequencing project. Theor Appl Genet 115:777–792
12. Edwards D, Batley J, Cogan NOI, Forster JW, Chagné D (2007) Single nucleotide polymorphism discovery. In: Oraguzie N, Rikkerink E, Gardiner S, Silva H (eds) Association mapping in plants. Springer, New York, pp 53–76
13. Love C, Logan E, Erwin T, Kaur J, Lim GAC, Hopkins C, Batley J, James N, May S, Spangenberg G, Edwards D (2006) Integrating and interrogating diverse Brassica data within an EnsEMBL structured database. Proceedings of the joint meeting of the fourteenth crucifer genetics workshop and fourth ISHS symposium on Brassicas. Acta Hort 706:77–82
14. Bevan M, Murphy G (1999) The small, the large and the wild: the value of comparison in plant genomics. Trends Genet 15:211–214
15. Feuillet C, Keller B (2002) Comparative genomics in the grass family: molecular characterization of grass genome structure and evolution. Ann Bot 89:3–10
16. Galvão VC, Nordstrom KJV, Lanz C, Sulz P, Mathieu J, Pose D, Schmid M, Weigel D, Schneeberger K (2012) Synteny-based mapping-by-sequencing enabled by targeted enrichment. Plant J 71:517–526
17. McClean PE, Mamidi S, McConnell M, Chikara S, Lee R (2010) Synteny mapping between common bean and soybean reveals extensive blocks of shared loci. BMC Genomics 11:184
21. Rafalski JA (2010) Association genetics in crop improvement. Curr Opin Plant Biol 13:174–180
22. Cowling WA, Balázs E (2010) Prospects and challenges for genome-wide association and genomic selection in oilseed Brassica species. Genome 53:1024–1028
23. Atwell S, Huang YS, Vilhjalmsson BJ et al (2010) Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 465:627–631
24. Cardon LR, Bell JI (2001) Association study designs for complex diseases. Nat Rev Genet 2:91–99
25. Flint-Garcia SA, Thornsberry JM, Buckler ES (2003) Structure of linkage disequilibrium in plants. Annu Rev Plant Physiol Plant Mol Biol 54:357–374
26. Oraguzie N (2007) An overview of association mapping. In: Oraguzie N, Rikkerink E, Gardiner S, Silva H (eds) Association mapping in plants. Springer, New York, pp 1–9
27. Neale DB, Savolainen O (2004) Association genetics of complex traits in conifers. Trends Plant Sci 9:325–330
28. Waugh R, Jannink JL, Muehlbauer GJ, Ramsay L (2009) The emergence of whole genome association scans in barley. Curr Opin Plant Biol 12:218–222
29. Yu JM, Buckler ES (2006) Genetic association mapping and genome organization of maize. Curr Opin Biotechnol 17:155–160
30. Chagné D, Batley J, Edwards D, Forster JW (2007) Single nucleotide polymorphisms genotyping in plants. In: Oraguzie N, Rikkerink E, Gardiner S, Silva H (eds) Association mapping in plants. Springer, New York, pp 77–94
31. Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, Clark T, McKenzie M, Appleby N, Batley J, Basford K, Edwards D (2010) Future tools for association mapping in crop plants. Genome 53:1017–1023
18. Zhu HY, Kim DJ, Baek JM, Choi HK, Ellis 32. Yan JB, Shah T, Warburton ML, Buckler ES,
LC, Kuester H, McCombie WR, Peng HM, McMullen MD, Crouch J (2009) Genetic
Cook DR (2003) Syntenic relationships characterization and linkage disequilibrium
between Medicago truncatula and Arabidopsis estimation of a global maize collection using
reveal extensive divergence of genome organi- SNP markers. PLoS One 4:e8451
zation. Plant Physiol 131:1018–1026 33. Guerra FP, Wegrzyn JL, Sykes R, Davis MF,
19. Abdurakhmonov IY, Abdukarimov A (2008) Stanton BJ, Neale DB (2013) Association genet-
Application of association mapping to under- ics of chemical wood properties in black poplar
standing the genetic diversity of plant germ- (Populus nigra). New Phytol 197:162–176
plasm resources. Int J Plant Genomics 2008: 34. Appleby N, Edwards D, Batley J (2009) New
574927 technologies for ultra-high throughput geno-
20. Gupta PK, Rustgi S, Kulwal PL (2005) Linkage typing in plants. In: Somers DJ, Langridge P,
disequilibrium and association studies in higher Gustafson JP (eds) Plant genomics: methods
plants: present status and future prospects. and protocols. Humana Press, New York, NY,
Plant Mol Biol 57:461–485 pp 19–39
Molecular Marker Applications in Plants 25
35. Semagn K, Bjornstad A, Ndjiondjop MN Brown PJ, Acharya CB, Mitchell SE, Harriman
(2006) An overview of molecular marker meth- J, Glaubitz JC, Buckler ES, Kresovich S (2013)
ods for plants. Afr J Biotechnol 5:2540–2568 Population genomic and genome-wide associa-
36. Mohan M, Nair S, Bhagwat A, Krishna TG, tion studies of agroclimatic traits in sorghum.
Yano M, Bhatia CR, Sasaki T (1997) Genome Proc Natl Acad Sci U S A 110:453–458
mapping, molecular markers and marker- 49. Yang HA, Tao Y, Zheng ZQ, Li CD,
assisted selection in crop plants. Mol Breed Sweetingham MW, Howieson JG (2012)
3:87–103 Application of next-generation sequencing for
37. Hong CP, Piao ZY, Kang TW, Batley J, Yang rapid marker development in molecular plant
TJ, Hur YK, Bhak J, Park BS, Edwards D, Lim breeding: a case study on anthracnose disease
YP (2007) Genomic distribution of simple resistance in Lupinus angustifolius L. BMC
sequence repeats in Brassica rapa. Mol Cells Genomics 13:318
23:349–356 50. Jiang HC, Feng YT, Bao L, Li X, Gao GJ,
38. Chèvre AM, Barret P, Eber F, Dupuy P, Brun Zhang QL, Xiao JH, Xu CG, He YQ (2012)
H, Tanguy X, Renard M (1997) Selection of Improving blast resistance of Jin 23B and its
stable Brassica napus-B.juncea recombinant hybrid rice by marker-assisted gene pyramiding.
lines resistant to blackleg (Leptosphaeria macu- Mol Breed 30:1679–1688
lans): identification of molecular markers, 51. Zhao K, Tung CW, Eizenga GC, Wright MH,
chromosomal and genomic origin of the intro- Ali ML, Price AH, Norton GJ, Islam MR,
gression. Theor Appl Genet 95:1104–1111 Reynolds A, Mezey J, McClung AM,
39. Somers DJ, Rakow G, Prabhu VK, Friesen Bustamante CD, McCouch SR (2011)
KRD (2001) Identification of a major gene and Genome-wide association mapping reveals a
RAPD markers for yellow seed coat colour in rich genetic architecture of complex traits in
Brassica napus. Genome 1077–1082 Oryza sativa. Nat Commun 2:467
40. Hansen M, Hallden C, Nilsson NO, Sall T 52. Lippman ZB, Semel Y, Zamir D (2007) An
(1997) Marker-assisted selection of restored integrated view of quantitative trait variation
male-fertile Brassica napus plants using a set of using tomato interspecific introgression lines.
dominant RAPD markers. Mol Breed 3: Curr Opin Genet Dev 17:545–552
449–456 53. Schauer N, Semel Y, Balbo I, Steinfath M,
41. Tanhuanpää PK, Vilkki JP, Vilkki HJ (1995) Repsilber D, Selbig J, Pleban T, Zamir D,
Association of a RAPD marker with linolenic Fernie AR (2008) Mode of inheritance of pri-
acid concentration in the seed oil of rapeseed mary metabolic traits in tomato. Plant Cell
(Brassica napus L). Genome 38:414–416 20:509–523
42. Barker GLA, Edwards KJ (2009) A genome- 54. Schauer N, Semel Y, Roessner U, Gur A,
wide analysis of single nucleotide polymor- Balbo I, Carrari F, Pleban T, Perez-Melis A,
phism diversity in the world's major cereal Bruedigam C, Kopka J, Willmitzer L, Zamir D,
crops. Plant Biotechnol J 7:318–325 Fernie AR (2006) Comprehensive metabolic
43. Ching A, Caldwell KS, Jung M, Dolan M, profiling and phenotyping of interspecific
Smith OS, Tingey S, Morgante M, Rafalski AJ introgression lines for tomato improvement.
(2002) SNP frequency, haplotype structure Nat Biotechnol 24:447–454
and linkage disequilibrium in elite maize inbred 55. Liu YS, Gur A, Ronen G, Causse M, Damidaux
lines. BMC Genet 3:19 R, Buret M, Hirschberg J, Zamir D (2003)
44. Snowdon RJ, Friedt W (2004) Molecular There is more to tomato fruit colour than can-
markers in Brassica oilseed breeding: current didate carotenoid genes. Plant Biotechnol J
status and future possibilities. Plant Breed 123: 1:195–207
1–8 56. Tieman DM, Zeigler M, Schmelz EA, Taylor
45. Syvänen AC (2005) Toward genome-wide MG, Bliss P, Kirst M, Klee HJ (2006)
SNP genotyping. Nat Genet 37:S5–S10 Identification of loci affecting flavour volatile
46. Varshney RK, Nayak SN, May GD, Jackson SA emissions in tomato fruits. J Exp Bot 57:
(2009) Next-generation sequencing technolo- 887–896
gies and their implications for crop genetics 57. Eshed Y, Zamir D (1995) An introgression line
and breeding. Trends Biotechnol 27:522–530 population of Lycopersicon pennellii in the cul-
47. Meuwissen T (2007) Genomic selection: tivated tomato enables the identification and
marker assisted selection on a genome wide fine mapping of yield-associated QTL. Genetics
scale. J Anim Breed Genet 124:321–322 141:1147–1162
48. Morris GP, Ramu P, Deshpande SP, Hash CT, 58. Semel Y, Nissenbaum J, Menda N, Zinder M,
Shah T, Upadhyaya HD, Riera-Lizarazu O, Krieger U, Issman N, Pleban T, Lippman Z,
26 Alice C. Hayward et al.
Gur A, Zamir D (2006) Overdominant quanti- 69. Fourmann M, Barret P, Froger N, Baron C,
tative trait loci for yield and fitness in tomato. Charlot F, Delourme R, Brunel D (2002)
Proc Natl Acad Sci U S A 103:12981–12986 From Arabidopsis thaliana to Brassica napus:
59. Kamenetzky L, Asis R, Bassi S, de Godoy F, development of amplified consensus genetic
Bermudez L, Fernie AR, Van Sluys MA, markers (ACGM) for construction of a gene
Vrebalov J, Giovannoni JJ, Rossi M, Carrari F map. Theor Appl Genet 105:1196–1206
(2010) Genomic analysis of wild tomato intro- 70. Ferguson ME, Hearne SJ, Close TJ, Wanamaker
gressions determining metabolism- and yield- S, Moskal WA, Town CD, de Young J, Marri
associated traits. Plant Physiol 152:1772–1786 PR, Rabbi IY, de Villiers EP (2012)
60. Howell PM, Marshall DF, Lydiate DJ (1996) Identification, validation and high-throughput
Towards developing intervarietal substitution genotyping of transcribed gene SNPs in cassava.
lines in Brassica napus using marker-assisted Theor Appl Genet 124:685–695
selection. Genome 39:348–358 71. Cao J, Schneeberger K, Ossowski S, Gunther
61. Zou J, Zhu JL, Huang SM, Tian ET, Xiao Y, T, Bender S, Fitz J, Koenig D, Lanz C, Stegle
Fu DH, Tu JX, Fu TD, Meng JL (2010) O, Lippert C, Wang X, Ott F, Muller J, Alonso-
Broadening the avenue of intersubgenomic Blanco C, Borgwardt K, Schmid KJ, Weigel D
heterosis in oilseed Brassica. Theor Appl Genet (2011) Whole-genome sequencing of multiple
120:283–290 Arabidopsis thaliana populations. Nat Genet
62. Cowling WA (2007) Genetic diversity in 43:956–U960
Australian canola and implications for crop 72. He GH, Prakash C (2001) Evaluation of
breeding for changing future environments. genetic relationships among botanical varieties
Field Crop Res 104:103–111 of cultivated peanut (Arachis hypogaea L.)
63. Foster JT, Allan GJ, Chan AP, Rabinowicz PD, using AFLP markers. Genet Resour Crop Evol
Ravel J, Jackson PJ, Keim P (2010) Single 48:347–352
nucleotide polymorphisms for assessing genetic 73. Hyten DL, Song QJ, Zhu YL, Choi IY, Nelson
diversity in castor bean (Ricinus communis). RL, Costa JM, Specht JE, Shoemaker RC,
BMC Plant Biol 10:13 Cregan PB (2006) Impacts of genetic bottle-
64. Allan G, Williams A, Rabinowicz PD, Chan AP, necks on soybean genome diversity. Proc Natl
Ravel J, Keim P (2008) Worldwide genotyping Acad Sci U S A 103:16666–16671
of castor bean germplasm (Ricinus communis L.) 74. Levi A, Thomas CE, Keinath AP, Wehner TC
using AFLPs and SSRs. Genet Resour Crop (2001) Genetic diversity among watermelon
Evol 55:365–378 (Citrullus lanatus and Citrullus colocynthis)
65. Bagavathiannan MV, Julier B, Barre P, Gulden accessions. Genet Resour Crop Evo 48:
RH, Van Acker RC (2010) Genetic diversity of 559–566
feral alfalfa (Medicago sativa L.) populations 75. Song K, Osborn TC (1992) Polyphyletic ori-
occurring in Manitoba, Canada and compari- gins of Brassica napus – new evidence based on
son with alfalfa cultivars: an analysis using SSR organelle and nuclear RFLP analyses. Genome
markers and phenotypic traits. Euphytica 35:992–1001
173:419–432 76. Chen S, Nelson MN, Chevre AM, Jenczewski
66. Pradhan A, Nelson MN, Plummer JA, Cowling E, Li ZY, Mason AS, Meng JL, Plummer JA,
WA, Yan GJ (2011) Characterization of Pradhan A, Siddique KHM, Snowdon RJ, Yan
Brassica nigra collections using simple GJ, Zhou WJ, Cowling WA (2011) Trigenomic
sequence repeat markers reveals distinct groups bridges for Brassica improvement. Crit Rev
associated with geographical location, and fre- Plant Sci 30:524–547
quent mislabelling of species identity. Genome 77. Yu FQ, Gugel RK, Kutcher HR, Peng G,
54:50–63 Rimmer SR (2013) Identification and mapping
67. Wang J, Kaur S, Cogan NOI, Dobrowolski of a novel blackleg resistance locus LepR4 in
MP, Salisbury PA, Burton WA, Baillie R, Hand the progenies from Brassica napus x B. rapa
M, Hopkins C, Forster JW, Smith KF, subsp. sylvestris. Theor Appl Genet 126:
Spangenberg G (2009) Assessment of genetic 307–315
diversity in Australian canola (Brassica napus 78. Hayward A, McLanders J, Campbell E,
L.) cultivars using SSR markers. Crop Pasture Edwards D, Batley J (2012) Genomic advances
Sci 60:1193–1201 will herald new insights into the Brassica:
68. Edwards D, Forster J, Chagné D, Batley J Leptosphaeria maculans pathosystem. Plant
(2007) What are SNPs? In: Oraguzie N, Biol 14:1–10
Rikkerink E, Gardiner S, Silva H (eds) Association 79. Lysak MA, Koch MA (2011) Phylogeny,
mapping in plants. Springer, New York, genome, and karyotype evolution of crucifers
pp 41–52 (Brassicaceae). In: Schmidt R, Bancroft I (eds)
Molecular Marker Applications in Plants 27
Genetics and genomics of the Brassicaceae. 89. Iorizzo M, Senalik DA, Grzebelus D, Bowman
Springer, New York, pp 1–31 M, Cavagnaro PF, Matvienko M, Ashrafi H,
80. Hu Z, Huang S, Sun M, Wang H, Hua W Van Deynze A, Simon PW (2011) De novo
(2012) Development and application of single assembly and characterization of the carrot
nucleotide polymorphism markers in the poly- transcriptome reveals novel genes, new markers,
ploid Brassica napus by 454 sequencing of and genetic diversity. BMC Genomics 12:389
expressed sequence tags. Plant Breed 131: 90. van Orsouw NJ, Hogers RCJ, Janssen A, Yalcin
293–299 F, Snoeijers S, Verstege E, Schneiders H, van der
81. Schranz ME, Song BH, Windsor AJ, Mitchell- Poel H, van Oeveren J, Verstegen H, van Eijk
Olds T (2007) Comparative genomics in the MJT (2007) Complexity reduction of polymor-
Brassicaceae: a family-wide perspective. Curr phic sequences (CRoPS (TM)): a novel approach
Opin Plant Biol 10:168–175 for large-scale polymorphism discovery in com-
82. Trick M, Long Y, Meng JL, Bancroft I (2009) plex genomes. PLoS One 2:e1172
Single nucleotide polymorphism (SNP) discov- 91. Hendre PS, Kamalakannan R, Varghese M
ery in the polyploid Brassica napus using Solexa (2012) High-throughput and parallel SNP dis-
transcriptome sequencing. Plant Biotechnol J covery in selected candidate genes in Eucalyptus
7:334–346 camaldulensis using Illumina NGS platform.
83. Mayer KFX, Waugh R, Langridge P et al Plant Biotechnol J 10:646–656
(2012) A physical, genetic and functional 92. Kharabian-Masouleh A, Waters DL, Reinke
sequence assembly of the barley genome. RF, Henry RJ (2011) Discovery of polymor-
Nature 491:711–716 phisms in starch-related genes in rice germ-
84. Schnable PS, Ware D, Fulton RS et al (2009) plasm by amplification of pooled DNA and
The B73 maize genome: complexity, diversity, deeply parallel sequencing. Plant Biotechnol J
and dynamics. Science 326:1112–1115 9:1074–1085
85. Chagné D, Crowhurst RN, Troggio M, 93. You FM, Huo N, Deal KR, Gu YQ, Luo MC,
Davey MW, Gilmore B, Lawley C, McGuire PE, Dvorak J, Anderson OD (2011)
Vanderzande S, Hellens RP, Kumar S, Cestaro Annotation-based genome-wide SNP discov-
A, Velasco R, Main D, Rees JD, Iezzoni A, ery in the large and complex Aegilops tauschii
Mockler T, Wilhelm L, Van de Weg E, genome using next-generation sequencing
Gardiner SE, Bassil N, Peace C (2012) without a reference genome sequence. BMC
Genome-wide SNP detection, validation, and Genomics 12:59
development of an 8K array for apple. PLoS 94. Berkman PJ, Lai KT, Lorenc MT, Edwards D
One 7:e31745 (2012) Next-generation sequencing applica-
86. Verde I, Bassil N, Scalabrin S, Gilmore B, tions for wheat crop improvement. Am J Bot
Lawley CT, Gasic K, Micheletti D, Rosyara 99:365–371
UR, Cattonaro F, Vendramin E, Main D, 95. Berkman PJ, Skarshewski A, Manoli S, Lorenc
Aramini V, Blas AL, Mockler TC, Bryant DW, MT, Stiller J, Smits L, Lai KT, Campbell E,
Wilhelm L, Troggio M, Sosinski B, Aranzana Kubalakova M, Simkova H, Batley J, Dolezel J,
MJ, Arus P, Iezzoni A, Morgante M, Peace C Hernandez P, Edwards D (2012) Sequencing
(2012) Development and evaluation of a 9K wheat chromosome arm 7BS delimits the
SNP array for peach by internationally coordi- 7BS/4AL translocation and reveals homoeolo-
nated SNP detection and validation in breed- gous gene conservation. Theor Appl Genet
ing germplasm. PLoS One 7:e35668 124:423–432
87. You FM, Deal KR, Wang J, Britton MT, Fass 96. Hernandez P, Martis M, Dorado G, Pfeifer M,
JN, Lin D, Dandekar A, Leslie CA, Aradhya M, Galvez S, Schaaf S, Jouve N, Simkova H,
Luo MC, Dvorak J (2012) Genome-wide SNP Valarik M, Dolezel J, Mayer KFX (2012) Next-
discovery in walnut with an AGSNP pipeline generation sequencing and syntenic integra-
updated for SNP discovery in allogamous tion of flow-sorted arms of wheat chromosome
organisms. BMC Genomics 13:354 4A exposes the chromosome structure and
88. Bundock PC, Eliott FG, Ablett G, Benson AD, gene content. Plant J 69:377–386
Casu RE, Aitken KS, Henry RJ (2009) 97. Lai K, Berkman PJ, Lorenc MT, Duran C,
Targeted single nucleotide polymorphism Smits L, Manoli S, Stiller J, Edwards D (2012)
(SNP) discovery in a highly polyploid plant WheatGenome.info: an integrated database
species using 454 sequencing. Plant Biotechnol J and portal for wheat genome information.
7:347–354 Plant Cell Physiol 53:e2
Chapter 3
Abstract
Next-generation sequencing (NGS) technology has revolutionized plant genomics. Combined with new
software tools, NGS enables the discovery, validation, and assessment of genetic markers on a large scale.
Among the different marker systems, simple sequence repeats (SSRs) and single nucleotide polymorphisms
(SNPs) are the markers of choice for genetics and plant breeding. SSR markers have been widely used for
large-scale characterization of germplasm collections, construction of genetic maps, and QTL identification.
SNPs are the most abundant form of genetic variation and occur at high frequency throughout plant genomes.
This chapter discusses the tools available for genome assembly and focuses on SSR and SNP marker discovery.
Key words Next-generation sequencing (NGS), Genetic markers, SSRs, Microsatellites, SNPs,
Mapping tools, Assembly tools, SSRPrimerII, SGSautoSNP
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_3, © Springer Science+Business Media New York 2015
Pradeep Ruperao and David Edwards
1.1 What Are SSRs?
SSRs, also known as microsatellites, are repeating DNA sequences of
1–6 nucleotides that occur ubiquitously in all prokaryotic and
eukaryotic genomes. The number of repeat units may vary among
individual genotypes, making SSRs useful for genetic analysis.
Because a single locus can carry many alleles, SSR markers are more
informative per locus than SNPs [24].
The main limitation in the development of SSR markers has been the
discovery of sequences containing SSRs to allow primer design for
polymerase chain reaction (PCR) amplification and genotyping. SSRs
in the coding regions of genes may modify gene function. Because
most such modifications are likely to be detrimental, the number of
SSRs and polymorphisms within coding regions is expected to be lower
than in noncoding sequences. Hence genomic noncoding regions are the
preferred source of sequence for SSR mining. The isolation of SSRs
has traditionally been a labor-intensive and costly process, yielding
a relatively small number of markers. The process involved the
construction of genomic libraries enriched for targeted SSR motifs
and the isolation and sequencing of clones containing the SSR [25].
Additionally, primers designed for an SSR locus should amplify only
the target locus, and the SSR should show clear polymorphism.
Computational approaches overcome many of the limitations of SSR
discovery, and with the rapid expansion of NGS, there is an
increasing abundance of DNA sequence data suitable for SSR discovery.
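The definition above (tandem repeats of a 1–6 nucleotide motif) can be illustrated with a minimal Python sketch that scans a sequence for perfect SSRs using regular expressions. The function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen for illustration; they are not taken from any of the tools discussed in this chapter, and imperfect or compound repeats are not handled.

```python
import re

# Minimum number of repeat units required for each motif length.
# These thresholds are illustrative assumptions, not values from any tool.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return perfect SSRs as (start, end, motif, repeat_count) tuples.

    Coordinates are 0-based and end-exclusive.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # since those runs are reported under the shorter motif length.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeats = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCG" + "AGC" * 5 + "TT"
    for start, end, motif, repeats in find_ssrs(example):
        print(start, end, motif, repeats)
```

Raising or lowering the thresholds in `MIN_REPEATS` changes how many loci are reported, which is one reason results from different SSR tools are not directly comparable.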
1.2 What Are SNPs?
Single nucleotide polymorphisms, frequently called SNPs, are the
most common type of genetic variation among species [26]. A SNP is a
single base change in a DNA sequence that can be classified as one of
two types. Transitions are purine–purine (A⇔G) or
pyrimidine–pyrimidine (C⇔T) changes, while transversions are
purine–pyrimidine or pyrimidine–purine changes (A⇔C, A⇔T, G⇔C,
G⇔T). The development of high-throughput methods for the discovery
and genotyping of SNPs has led to a revolution in their use as
molecular markers [4, 27–29]. In principle, at each
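The transition/transversion classification described in this section can be written as a short Python sketch. The function name `classify_snp` is hypothetical and used only for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref, alt):
    """Classify a single-base change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("alleles must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("alleles are identical, so this is not a SNP")
    # Both alleles in the same chemical class -> transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(pair, classify_snp(*pair))
```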
Bioinformatics: Identification of Markers from Next-Generation Sequence Data 31
Table 1
Sequencing technologies
Table 2
Assembly software
assembly are listed in Table 2. Each approach has its own merits.
For example, gsAssembler is specifically designed for 454 data, with
the option of including Sanger or other FASTA-format sequence data.
Geneious (from Biomatters Ltd.) [35, 36], CLC Genomics Workbench,
and SeqMan NGen (DNASTAR) are commercially available software
packages for analyzing Sanger, 454, Illumina, and other NGS datasets.
Newbler is a de novo sequence assembler developed for use with 454
sequencing data. Velvet is a de Bruijn graph-based assembler for de
novo assembly of short reads [37]. While it is fairly simple to set
up and run these software packages, significant bioinformatics and
genomics knowledge is often required to obtain optimal results.
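To give a feel for what "de Bruijn graph-based" means, the following toy Python sketch builds a de Bruijn graph from a few error-free reads and follows an unbranched path to spell out a contig. It is only an illustration of the data structure: it is not Velvet's implementation, the function names `de_bruijn_graph` and `extend_contig` are hypothetical, and real assemblers additionally handle sequencing errors, reverse complements, coverage, and branching caused by repeats.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def extend_contig(graph, start):
    """Follow the graph from `start` while exactly one successor exists."""
    contig, node, visited = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in visited:  # stop on a cycle
            break
        contig += next_node[-1]
        visited.add(next_node)
        node = next_node
    return contig


if __name__ == "__main__":
    reads = ["ACGTT", "GTTCA", "TTCAG"]
    graph = de_bruijn_graph(reads, k=4)
    print(extend_contig(graph, "ACG"))
```

The choice of k is one of the parameters that typically needs tuning: small values create more ambiguous branches, while large values require longer error-free overlaps between reads.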
2.1 SSR Discovery
With the revolution in sequencing technology, it is now feasible to
screen entire genomes for the presence of SSRs using bioinformatics
tools. The search parameters used for SSR detection affect which SSRs
are found. Several computational tools, such as SSRPrimer [38, 39],
also design PCR primers flanking the SSR sequences, and it is now
possible to computationally predict polymorphic SSRs [40].
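Primer design requires enough unique sequence on both sides of the repeat. The sketch below, in Python, shows one way the flanking regions of a detected SSR might be extracted before being handed to a primer design program. It does not design primers itself; the function name `flanking_regions` and the default flank lengths are assumptions made for illustration.

```python
def flanking_regions(sequence, ssr_start, ssr_end, flank=100, min_flank=20):
    """Return (left, right) flanks around an SSR, or None if either is too short.

    ssr_start and ssr_end are 0-based, end-exclusive coordinates of the repeat.
    """
    left = sequence[max(0, ssr_start - flank):ssr_start]
    right = sequence[ssr_end:ssr_end + flank]
    if len(left) < min_flank or len(right) < min_flank:
        # Too little sequence on one side to place a primer.
        return None
    return left, right


if __name__ == "__main__":
    seq = "A" * 30 + "CT" * 8 + "G" * 25
    print(flanking_regions(seq, 30, 46, flank=20))
```

SSRs that lie too close to the end of a read or contig are discarded here, which is one reason longer assembled sequences tend to yield more usable markers than raw short reads.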
Table 3
SSR tools

Name                                                References
STRING: Java search for tandem repeats in genomes   [50]
SSRPrimerII                                         http://www.appliedbioinformatics.com.au/projects/ssrPrimer
MicroSAtellite (MISA)                               http://pgrc.ipk-gatersleben.de/misa/
Sputnik                                             http://espressosoftware.com/sputnik/index.html
BuildSSR                                            [102]
SSR Identification Tool (SSRIT)                     [103]
Tandem Repeat Finder (TRF)                          [46]
Tandem Repeat Occurrence Locator (TROLL)            [56]
Mreps                                               [42]
SSRSEARCH                                           ftp://ftp.gramene.org/pub/gramene/software/scripts/ssr.pl
Msatfinder                                          http://www.genomics.ceh.ac.uk/msatfinder/
RepeatMasker                                        http://www.mendeley.com/research/repeatmasker-open30/
Imperfect Microsatellite Extractor (IMEx)           [52]
Spectral repeat finder (SRF)                        [104]
CENSOR                                              http://www.girinst.org/censor/
2.2 SNP Discovery
SNPs have emerged as the markers of choice in breeding programs
because of their abundance and capacity for high-throughput
detection [62]. There is considerable potential to apply SNPs in
crop improvement programs, and various methods have been described
to detect and genotype SNPs.
A common way to identify SNPs from NGS data is to first map
variety-specific reads to a reference genome. Algorithms are then
applied either to identify differences between the reads and the
reference or to identify sequence differences among the aligned
reads, usually including measures of accuracy to reduce the
occurrence of false-positive SNP calls. Many SNP discovery software
programs have been developed. Some, such as CASAVA (Consensus
Assessment of Sequence And Variation), are provided together with
next-generation sequencers (Illumina), with GS Amplicon Variant
Analyzer and GS Reference Mapper Software supplied for the Roche 454
GS-FLX. Commercial software such as NextGENe
(http://www.softgenetics.com/), CLC Genomics Workbench
(http://www.clcbio.com/index.php?id=1240), or Biomatters Geneious
[63] and freeware programs such as SNPdetector [64], ACCUSA [65],
AGSNP [66], NGS-SNP [67], AtlasSNP2 [68], PolyScan [69], and
SGSautoSNP [70] are also available.
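The first strategy above (comparing mapped reads with the reference) can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm of any program named in this section: reads are assumed to be already aligned without gaps, base qualities are ignored, and the function name `call_snps_against_reference` and the thresholds `min_depth` and `min_alt_reads` are hypothetical.

```python
from collections import Counter


def call_snps_against_reference(reference, aligned_reads, min_depth=4, min_alt_reads=2):
    """Report positions where mapped reads disagree with the reference.

    aligned_reads is an iterable of (start, sequence) pairs describing
    ungapped alignments, with 0-based start positions on the reference.
    Returns (position, ref_base, alt_base, alt_count, depth) tuples.
    """
    # Count the bases observed at every reference position.
    pileup = [Counter() for _ in reference]
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    snps = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        ref_base = reference[position]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue
        alt_base = max(alt_counts, key=alt_counts.get)
        # Requiring several supporting reads filters isolated sequencing errors.
        if alt_counts[alt_base] >= min_alt_reads:
            snps.append((position, ref_base, alt_base, alt_counts[alt_base], depth))
    return snps


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = [(0, "ACGAACGT")] * 4 + [(0, "ACGAACCT")]
    print(call_snps_against_reference(reference, reads))
```

In this example the single mismatching base at position 6 is ignored because only one read supports it, which mirrors the role of the accuracy measures mentioned above.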
The efficiency of variant detection depends on the accuracy of read
alignment. Burrows–Wheeler transform (BWT)-based aligners (Bowtie
[71], SOAP2 [72], and BWA [73]) are fast, memory-efficient, and
particularly useful for aligning repetitive reads, but are
comparatively less sensitive than hash-based algorithms such as MAQ
[74], Novoalign, and Stampy [75]. MAQ introduced mapping quality, a
Phred-like measure of alignment confidence.
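On the Phred scale, a quality value Q corresponds to an error probability P through Q = -10 log10(P), so a mapping quality of 20 means roughly a 1 in 100 chance that the read is placed incorrectly. A minimal Python sketch of the conversion follows; the function names are hypothetical.

```python
import math


def phred_to_error_probability(quality):
    """Convert a Phred-scaled quality to the probability of an error."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Convert an error probability (0 < p <= 1) to a Phred-scaled quality."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -10 * math.log10(probability)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(q, phred_to_error_probability(q))
```

Filtering alignments on a minimum mapping quality before SNP calling is a common way to reduce false positives caused by reads that map equally well to several locations.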
Table 4
SNP tools

Program       Website/reference
SOAP2         http://soap.genomics.org.cn/index.html
Samtools      http://samtools.sourceforge.net/
GATK          http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit
MaCH          http://genome.sph.umich.edu/wiki/Thunder
Qcall         ftp://ftp.sanger.ac.uk/pub/rd/QCALL (multi-sample LD)
IMPUTE2       http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
GigaBayes     http://bioinformatics.bc.edu/marthlab/GigaBayes
SNPdetector   [64]
Geneious      http://www.geneious.com/
PolyScan      [69]
SGSautoSNP    [70]
QualitySNP    [105]
3 Case Studies
3.1.2 SSRPrimer and SSR Taxonomy Tree: Biome SSR Discovery
Jewell et al. [38] applied an automated web-based SSR discovery
method, SSRPrimer [39], which combines SSR discovery with PCR primer
design for SSR amplification. SSRs are identified using SPUTNIK, and
the results are passed to Primer3 for locus-specific primer design.
This approach was first used for individual species datasets [88–95]
but was later applied to the complete GenBank database, designing
PCR amplification primers for 14 million SSRs and representing the
first biome-scale SSR discovery [38]. The resulting SSR Taxonomy
Tree tool provides web-based searching of these data, together with
downloading and visualization of SSR amplification primers.
3.2 SNP Discovery
3.2.1 Single Nucleotide Polymorphism Discovery in Elite North American Potato Germplasm
To increase the number of SNPs available for basic and applied
potato genetics, Hamilton et al. [96] conducted extensive
transcriptome sequencing from three relevant potato cultivars
(Atlantic, Premier Russet, and Snowden) using the Illumina platform.
Quality-filtered reads were assembled with Velvet [37], and the
assemblies were compared with Sanger EST collections from the
varieties Bintje, Kennebec, and Shepody. The majority of the Sanger
reads were represented within the Illumina GA2 datasets. MAQ was
employed to identify and filter SNPs within the three Illumina
transcriptomes. The Infinium BeadXpress was used to validate the
SNPs and assess allelic diversity in a diverse set of potato
germplasm. This study identified 575,340 SNPs in elite potato
germplasm.
3.2.2 Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP
Lorenc et al. [70] developed an approach called SGSautoSNP for SNP
prediction, demonstrating the method by identifying SNPs between
four wheat cultivars. Variety-specific reads were mapped to the
reference wheat chromosomes 7A, 7B, and 7D [97–99] using SOAP [37].
The resulting BAM files were used in SGSautoSNP for SNP discovery.
SNPs were called between reads in the alignment without considering
the reference allele. More than 800,000 SNPs were predicted across
the wheat group 7 chromosomes with a validated accuracy of >93%.
The approach has since been used for SNP discovery in Brassica with
an accuracy of 96% [100].
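The calling rule described here and in the caption of Fig. 5 (SNPs are called between cultivars supported by at least two reads, and variation within a cultivar is ignored) can be sketched in Python for a single alignment position. This is one reading of that description under stated assumptions, not the SGSautoSNP source code: the function name `call_intervarietal_snp`, the input format, and the decision to reject a position when any cultivar shows more than one base are all illustrative choices.

```python
from collections import Counter, defaultdict


def call_intervarietal_snp(observations, min_reads=2):
    """Decide whether one alignment position is a SNP between cultivars.

    observations is a list of (cultivar, base) pairs, one per read covering
    the position. The reference base is deliberately not used.
    Returns a {cultivar: base} dict if a SNP is called, otherwise None.
    """
    by_cultivar = defaultdict(Counter)
    for cultivar, base in observations:
        by_cultivar[cultivar][base] += 1

    alleles = {}
    for cultivar, counts in by_cultivar.items():
        if len(counts) > 1:
            # Reads from one homozygous cultivar disagree: treated here as
            # likely mis-mapping, so the position is not called.
            return None
        ((base, read_count),) = counts.items()
        if read_count >= min_reads:
            alleles[cultivar] = base

    # A SNP needs at least two well-supported cultivars with different bases.
    if len(set(alleles.values())) >= 2:
        return alleles
    return None


if __name__ == "__main__":
    reads = [("A1", "C"), ("A1", "C"), ("B2", "T"), ("B2", "T"), ("C3", "T")]
    print(call_intervarietal_snp(reads))
```

Because the reference allele never enters the decision, errors in the reference assembly do not by themselves create SNP calls under this rule.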
4 Examples
Fig. 1 SSRPrimerII accepts sequences in FASTA format to find SSR markers with
SPUTNIK and PRIMER3 (diagram: FASTA sequences, SPUTNIK, PRIMER3, SSR primers)
Fig. 2 Retrieval and downloading of wheat drought-related sequences from GenBank
Fig. 5 SGSautoSNP calls SNPs between cultivars that are represented by at least two reads. SNPs within a
cultivar are ignored as they are likely to represent mis-mapping in homozygous species
5 Notes
References
1. Appleby N, Edwards D, Batley J (2009) New technologies for ultra-high throughput genotyping in plants. In: Somers DJ, Langridge P, Gustafson JP (eds) Plant genomics. Humana, Louisville, KY, pp 19–40
2. Edwards D, Batley J, Snowdon R (2013) Accessing complex crop genomes with next-generation sequencing. Theor Appl Genet 126:1–11
3. Berkman PJ, Lai K, Lorenc MT, Edwards D (2012) Next generation sequencing applications for wheat crop improvement. Am J Bot 99:365–371
4. Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, Clark T, McKenzie M, Appleby N, Batley J, Basford K, Edwards D (2010) Future tools for association mapping in crop plants. Genome 53:1017–1023
5. Lorenc MT, Boskovic Z, Stiller J, Duran C, Edwards D (2012) Role of bioinformatics as a tool for oilseed Brassica species. In: Edwards D, Parkin IAP, Batley J (eds) Genetics, genomics and breeding of oilseed Brassicas. Science Publishers Inc., New Hampshire, pp 194–205
6. Duran C, Boskovic Z, Batley J, Edwards D (2011) Role of bioinformatics as a tool for vegetable Brassica species. In: Sadowski J (ed) Vegetable Brassicas. Science Publishers, Inc., New Hampshire, pp 406–418
7. Edwards D (2011) Wheat bioinformatics. In: Bonjean A, Angus W, Van Ginkel M (eds) The world wheat book. Lavoisier, Paris, pp 851–875
8. Batley J, Jewell E, Edwards D (2007) Automated discovery of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) molecular genetic markers. In: Edwards D (ed) Plant bioinformatics. Humana, New York, pp 473–494
9. Duran C, Edwards D, Batley J (2009) Molecular marker discovery and genetic map visualisation. In: Edwards D, Hanson D, Stajich J (eds) Applied bioinformatics. Springer, New York, pp 165–189
10. Edwards D, Batley J (2004) Plant bioinformatics: from genome to phenome. Trends Biotechnol 22:232–237
11. Batley J, Edwards D (2007) SNP applications in plants. In: Oraguzie NC, Rikkerink EHA, Gardiner SE, De Silva HN (eds) Association mapping in plants. Springer, New York, pp 95–102
12. Duran C, Edwards D, Batley J (2009) Genetic maps and the use of synteny. In: Somers DJ, Langridge P, Gustafson JP (eds) Plant genomics. Humana, New York, pp 41–56
13. Edwards D, Wang X (2012) Genome Sequencing Initiatives. In: Edwards D, Parkin IAP, Batley J (eds) Genetics, genomics and breeding of oilseed Brassicas. Science Publishers Inc., New Hampshire, pp 152–157
14. Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 7:1–8
15. Imelfort M, Edwards D (2009) De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 10:609–618
16. Imelfort M, Duran C, Batley J, Edwards D (2009) Discovering genetic polymorphisms in next-generation sequencing data. Plant Biotechnol J 7:312–317
17. Nie X, Li B, Wang L, S B, Liu S, Li T, Dolezel J, Edwards D, Luo MC, Weining S (2012) Development of chromosome-arm-specific microsatellite markers in Triticum aestivum (Poaceae) using NGS technology. Am J Bot 99:e369–e371
18. Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D (2012) Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J 10:743–749
19. Duran C, Appleby N, Edwards D, Batley J (2009) Molecular genetic markers: discovery, applications, data storage and visualisation. Curr Bioinform 4:16–27
20. Lai K, Berkman PJ, Lorenc MT, Duran C, Smits L, Manoli S, Stiller J, Edwards D (2012) WheatGenome.info: an integrated database and portal for wheat genome information. Plant Cell Physiol 53:1–7
21. Lai K, Lorenc MT, Edwards D (2012) Genomic databases for crop improvement. Agronomy 2:62–73
22. Edwards D, Batley J (2008) Bioinformatics: fundamentals and applications in plant genetics, mapping and breeding. In: Kole C, Abbott AG (eds) Principles and practices of plant genomics. Science Publishers Inc, New Hampshire, pp 269–302
23. Edwards D (2007) Bioinformatics and plant genomics for staple crops improvement. In: Kang MS, Priyadarshan M (eds) Breeding major food staples. Blackwell, London, pp 93–106
24. Hamblin MT, Warburton ML, Buckler ES (2007) Empirical comparison of simple
sequence repeats and single nucleotide poly- 35. Meintjes P, Duran C, Kearse M, Moir R,
morphisms in assessment of maize diversity Wilson A, Stones-Havas S, Cheung M,
and relatedness. PLoS One 2:e1367 Sturrock S, Buxton S, Cooper A, Markowitz
25. Edwards KJ, Barker JHA, Daly A, Jones C, S, Thierer T, Ashton B, Heled J (2012)
Karp A (1996) Microsatellite libraries Geneious Basic: an integrated and extendable
enriched for several microsatellite sequences desktop software platform for the organiza-
in plants. Biotechniques 20:758 tion and analysis of sequence data.
26. Edwards D, Forster JW, Chagné D, Batley J Bioinformatics 28:1647–1649
(2007) What are SNPs? In: Oraguzie NC, 36. Drummond AJ, Ashton BSB, Cheung M,
Rikkerink EHA, Gardiner SE, De Silva HN Cooper A, Duran C, Field M, Heled J, Kearse
(eds) Association mapping in plants. Springer, M, Markowitz S, Moir R, Stones-Havas S,
New York, pp 41–52 Sturrock S, Thierer T, Wilson A (2011)
Geneious v5.4. http://www.geneious.com
27. Gupta PK (2008) Single-molecule DNA
sequencing technologies for future genomics 37. Zerbino DR, Birney E (2008) Velvet: algo-
research. Trends Biotechnol 26:602–611 rithms for de novo short read assembly using
de Bruijn graphs. Genome Res 18:821–829
28. Edwards D, Forster JW, Cogan NOI, Batley
J, Chagné D (2007) Single nucleotide poly- 38. Jewell E, Robinson A, Savage D, Erwin T,
morphism discovery. In: Oraguzie NC, Love CG, Lim GA, Li X, Batley J, Spangenberg
Rikkerink EHA, Gardiner SE, De Silva HN GC, Edwards D (2006) SSRPrimer and SSR
(eds) Association mapping in plants. Springer, taxonomy tree: biome SSR discovery. Nucleic
New York, pp 53–76 Acids Res 34:W656–W659
29. Chagné D, Batley J, Edwards D, Forster JW 39. Robinson AJ, Love CG, Batley J, Barker G,
(2007) Single nucleotide polymorphism Edwards D (2004) Simple sequence repeat
genotyping in plants. In: Oraguzie NC, marker loci discovery using SSR primer.
Rikkerink EHA, Gardiner SE, De Silva HN Bioinformatics 20:1475–1476
(eds) Association mapping in plants. Springer, 40. Duran C, Singhania R, Raman H, Batley J,
New York, pp 77–94 Edwards D (2013) Predicting polymorphic
30. Mogg R, Batley J, Hanley S, Edwards D, EST-SSRs in silico. Mol Ecol Resour 13:
O'Sullivan H, Edwards KJ (2002) 538–545
Characterization of the flanking regions of 41. Thiel T, Michalek W, Varshney RK, Graner A
Zea mays microsatellites reveals a large num- (2003) Exploiting EST databases for the
ber of useful sequence polymorphisms. Theor development and characterization of gene-
Appl Genet 105:532–543 derived SSR-markers in barley (Hordeum vul-
31. Rothberg JM, Hinz W, Rearick TM, Schultz gare L.). Theor Appl Genet 106:411–422
J, Mileski W, Davey M, Leamon JH, Johnson 42. Kolpakov R, Bana G, Kucherov G (2003)
K et al (2011) An integrated semiconductor mreps: efficient and flexible detection of
device enabling non-optical genome sequenc- tandem repeats in DNA. Nucleic Acids Res
ing. Nature 475:348–352 31:3672–3678
32. Blanca J, Canizares J, Roig C, Ziarsolo P, Nuez 43. da Maia LC, Palmieri DA, de Souza VQ,
F, Pico B (2011) Transcriptome characteriza- Kopp MM, de Carvalho FI, Costa de Oliveira
tion and high throughput SSRs and SNPs A (2008) SSR locator: tool for simple
discovery in Cucurbita pepo (Cucurbitaceae). sequence repeat discovery integrated with
BMC Genomics 12:104 primer design and PCR simulation. Int J
Plant Genomics 2008:412696
33. Parchman TL, Geist KS, Grahnen JA,
Benkman CW, Buerkle CA (2010) 44. Martins WS, Lucas DC, Neves KF, Bertioli DJ
Transcriptome sequencing in an ecologically (2009) WebSat – a web software for microsat-
important tree species: assembly, annotation, ellite marker development. Bioinformation
and marker discovery. BMC Genomics 11:180 3:282–283
34. Hiremath PJ, Farmer A, Cannon SB, 45. Taneda A (2004) Adplot: detection and visu-
Woodward J, Kudapa H, Tuteja R, Kumar A, alization of repetitive patterns in complete
Bhanuprakash A, Mulaosmanovic B, Gujaria genomes. Bioinformatics 20:701–708
N, Krishnamurthy L, Gaur M, Kavikishor B, 46. Benson G (1999) Tandem repeats finder: a
Shah T, Srinivasan R, Lohse M, Xiao Y, Town program to analyze DNA sequences. Nucleic
CD, Cook DR, May GD, Varshney RK Acids Res 27:573–580
(2011) Large-scale transcriptome analysis in 47. Sobreira TJ, Durham AM, Gruber A (2006)
chickpea (Cicer arietinum L.), an orphan TRAP: automated classification, quantifica-
legume crop of the semi-arid tropics of Asia tion and annotation of tandemly repeated
and Africa. Plant Biotechnol J 9:922–931 sequences. Bioinformatics 22:361–362
Bioinformatics: Identification of Markers from Next-Generation Sequence Data 45
48. Wexler Y, Yakhini Z, Kashi Y, Geiger D 63. Kearse M, Moir R, Wilson A, Stones-Havas S,
(2005) Finding approximate tandem repeats Cheung M, Sturrock S, Buxton S, Cooper A,
in genomic sequences. J Comput Biol 12: Markowitz S, Duran C, Thierer T, Ashton B,
928–942 Meintjes P, Drummond A (2012) Geneious
49. Reneker J, Shyu CR, Zeng P, Polacco JC, basic: an integrated and extendable desktop
Gassmann W (2004) ACMES: fast multiple- software platform for the organization and
genome searches for short repeat sequences analysis of sequence data. Bioinformatics
with concurrent cross-species information 28:1647–1649
retrieval. Nucleic Acids Res 32:W649–W653 64. Zhang J, Wheeler DA, Yakub I, Wei S, Sood
50. Parisi V, De Fonzo V, Aluffi-Pentini F (2003) R, Rowe W, Liu PP, Gibbs RA, Buetow KH
STRING: finding tandem repeats in DNA (2005) SNPdetector: a software tool for
sequences. Bioinformatics 19:1733–1738 sensitive and accurate SNP detection. PLoS
51. Karaca M, Bilgen M, Onus AN, Ince AG, Comput Biol 1:e53
Elmasulu SY (2005) Exact tandem repeats 65. Frohler S, Dieterich C (2010) ACCUSA–
analyzer (E-TRA): a new program for DNA accurate SNP calling on draft genomes.
sequence mining. J Genet 84:49–54 Bioinformatics 26:1364–1365
52. Mudunuri SB, Nagarajaram HA (2007) 66. You FM, Deal KR, Wang J, Britton MT, Fass
IMEx: imperfect microsatellite extractor. JN, Lin D, Dandekar AM, Leslie CA, Aradhya
Bioinformatics 23:1181–1187 M, Luo MC, Dvorak J (2012) Genome-wide
53. Kofler R, Schlotterer C, Lelley T (2007) SNP discovery in walnut with an AGSNP
SciRoKo: a new tool for whole genome mic- pipeline updated for SNP discovery in alloga-
rosatellite search and investigation. mous organisms. BMC Genomics 13:354
Bioinformatics 23:1683–1685 67. Grant JR, Arantes AS, Liao X, Stothard P
54. Bizzaro JW, Marx KA (2003) Poly: a quanti- (2011) In-depth annotation of SNPs arising
tative analysis tool for simple sequence repeat from resequencing projects using NGS-
(SSR) tracts in DNA. BMC Bioinformatics SNP. Bioinformatics 27:2300–2301
4:22 68. Shen Y, Wan Z, Coarfa C, Drabek R, Chen L,
55. Tarailo-Graovac M, Chen N (2009) Using Ostrowski EA, Liu Y, Weinstock GM, Wheeler
RepeatMasker to identify repetitive elements DA, Gibbs RA, Yu F (2010) A SNP discovery
in genomic sequences. Curr Protoc method to assess variant allele probability
Bioinformat Chapter 4:Unit 4 10 from next-generation resequencing data.
56. Castelo AT, Martins W, Gao GR (2002) Genome Res 20:273–280
TROLL–tandem repeat occurrence locator. 69. Chen K, McLellan MD, Ding L, Wendl MC,
Bioinformatics 18:634–636 Kasai Y, Wilson RK, Mardis ER (2007)
57. Kurtz S, Schleiermacher C (1999) REPuter: PolyScan: an automatic indel and SNP detec-
fast computation of maximal repeats in com- tion approach to the analysis of human rese-
plete genomes. Bioinformatics 15:426–427 quencing data. Genome Res 17:659–666
58. Betley JN, Frith MC, Graber JH, Choo S, 70. Lorenc MT, Hayashi S, Stiller J, Lee H,
Deshler JO (2002) A ubiquitous and con- Manoli S, Ruperao P, Visendi P, Berkman PJ,
served signal for RNA localization in chor- Lai K, Batley J, Edwards D (2012) Discovery
dates. Curr Biol 12:1756–1761 of single nucleotide polymorphisms in com-
59. Faircloth BC (2008) msatcommander: detec- plex genomes using SGSautoSNP. Biology
tion of microsatellite repeat arrays and auto- 1:370–382
mated, locus-specific primer design. Mol Ecol 71. Langmead B, Trapnell C, Pop M, Salzberg SL
Resour 8:92–94 (2009) Ultrafast and memory-efficient align-
60. Perry JC, Rowe L (2011) Rapid microsatellite ment of short DNA sequences to the human
development for water striders by next- genome. Genome Biol 10:R25
generation sequencing. J Hered 102:125–129 72. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen
61. Garg R, Patel RK, Tyagi AK, Jain M (2011) K, Wang J (2009) SOAP2: an improved ultra-
De novo assembly of chickpea transcriptome fast tool for short read alignment. Bioinformatics
using short reads for gene discovery and 25:1966–1967
marker identification. DNA Res 18:53–63 73. Li H, Durbin R (2009) Fast and accurate
62. Collins LA, Torrero MN, Franzblau SG short read alignment with Burrows-Wheeler
(1998) Green fluorescent protein reporter transform. Bioinformatics 25:1754–1760
microplate assay for high-throughput screen- 74. Li H, Ruan J, Durbin RM (2008) Mapping
ing of compounds against Mycobacterium short DNA sequencing reads and calling vari-
tuberculosis. Antimicrob Agents Chemother ants using mapping quality scores. Genome
42:344–347 Res 18:1851–1858
46 Pradeep Ruperao and David Edwards
Douches DS, Buell CR (2011) Single nucleo- 101. Azam S, Thakur V, Ruperao P, Shah T, Balaji
tide polymorphism discovery in elite North J, Amindala B, Farmer AD, Studholme DJ,
American potato germplasm. BMC Genomics May GD, Edwards D, Jones JD, Varshney RK
12:302 (2012) Coverage-based consensus calling
97. Berkman PJ, Skarshewski A, Lorenc MT, Lai (CbCC) of short sequence reads and com-
K, Duran C, Ling EYS, Stiller J, Smits L, parison of CbCC results to identify SNPs in
Imelfort M, Manoli S, McKenzie M, chickpea (Cicer arietinum; Fabaceae), a crop
Kubalakova M, Simkova H, Batley J, Fleury species without a reference genome. Am J Bot
D, Dolezel J, Edwards D (2011) Sequencing 99:186–192
and assembly of low copy and genic regions of 102. Rungis D, Berube Y, Zhang J, Ralph S,
isolated Triticum aestivum chromosome arm Ritland CE, Ellis BE, Douglas C, Bohlmann
7DS. Plant Biotechnol J 9:768–775 J, Ritland K (2004) Robust simple sequence
98. Berkman PJ, Skarshewski A, Manoli S, Lorenc repeat markers for spruce (Picea sp) from
MT, Stiller J, Smits L, Lai K, Campbell E, expressed sequence tags. Theor Appl Genet
Kubalakova M, Simkova H, Batley J, Dolezel J, 109:1283–1294
Hernandez P, Edwards D (2012) Sequencing 103. Kantety RV, La Rota M, Matthews DE,
wheat chromosome arm 7BS delimits the Sorrells ME (2002) Data mining for simple
7BS/4AL translocation and reveals homoeol- sequence repeats in expressed sequence tags
ogous gene conservation. Theor Appl Genet from barley, maize, rice, sorghum and wheat.
124:423–432 Plant Mol Biol 48:501–510
99. Berkman PJ, Visendi P, Lee HC, Stiller J, 104. Sharma D, Issac B, Raghava G, Ramaswamy R
Manoli S, Lorenc MT, Lai K, Batley J, Fleury (2004) Spectral repeat finder (SRF): identifica-
D, Šimková H, Kubaláková M, Weining S, tion of repetitive sequences using Fourier trans-
Doležel J, Edwards D (2013) Dispersion and formation. Bioinformatics 20:1405–1412
domestication shaped the genome of bread 105. Tang J, Vosman B, Voorrips RE, van der
wheat. Plant Biotechnol J 11:564–571 Linden CG, Leunissen JA (2006) QualitySNP:
100. Hayward A, Dalton-Morgan J, Mason A, a pipeline for detecting single nucleotide
Zander M, Edwards D, Batley J (2012) SNP polymorphisms and insertions/deletions in
discovery and applications in Brassica napus. EST data from diploid and polyploid species.
J Plant Biotechnol 39:1–12 BMC Bioinformatics 7:438
Chapter 4
Abstract
The detection and analysis of genetic variation plays an important role in plant breeding and this role is
increasing with the continued development of genome sequencing technologies. Molecular genetic markers
are important tools to characterize genetic variation and assist with genomic breeding. Processing and
storing the growing abundance of molecular marker data being produced requires the development of specific
bioinformatics tools and advanced databases. Molecular marker databases range from species-specific
through to organism-wide and often host a variety of additional related genetic, genomic, or phenotypic
information. In this chapter, we will present some of the features of plant molecular genetic marker databases,
highlight the various types of marker resources, and predict the potential future direction of crop marker
databases.
Key words Molecular marker, Genetic marker, Genetic variation, SNP marker, SSR marker
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_4, © Springer Science+Business Media New York 2015
Molecular Marker Databases
Kaitao Lai et al.

Database name Viewer SNPs SSRs RFLPs RAPDs AFLPs ESTs BACs DArTs DNA probes PCR primers
autoSNPdb * +
Brassica.info + + + + +
Brassica rapa genome database * +
Chickpea root EST database +
Cotton Marker Database (CMD) * + + +
GenBank dbSNP * +
GrainGenes * + + + + + + +
Gramene * + + + + + + + + +
ICRISAT +
Legume Information System (LIS) * + + + + +
MaizeGDB + + + + +
MoccaDB + + +
Panzea * + +
Rice Genome Annotation Project * +
SSR Primer +
SOL Genomics Network (SGN) * + + + + +
SoyBase * + + + + + +
Table 2
54
(continued)
References
8. Edwards D, Forster JW, Chagné D, Batley J species using 454 sequencing. Plant Biotechnol
(2007) What are SNPs? In: Oraguzie NC, J 7:347–354
Rikkerink EHA, Gardiner SE, Silva HND 19. Edwards D, Batley J (2010) Plant genome
(eds) Association mapping in plants. Springer, sequencing: applications for crop improvement.
New York, pp 41–52 Plant Biotechnol J 7:1–8
9. Chagné D, Batley J, Edwards D, Forster JW 20. Hong CP, Piao ZY, Kang TW, Batley J, Yang
(2007) Single nucleotide polymorphism geno- TJ, Hur YK, Bhak J, Park BS, Edwards D,
typing in plants. In: Oraguzie N, Rikkerink E, Lim YP (2007) Genomic distribution of simple
Gardiner S, De Silva H (eds) Association map- sequence repeats in Brassica rapa. Mol Cells
ping in plants. Springer, New York, pp 77–94 23:349–356
10. Edwards D, Forster JW, Cogan NOI, Batley J, 21. Burgess B, Mountford H, Hopkins CJ, Love C,
Chagné D (2007) Single nucleotide polymor- Ling AE, Spangenberg GC, Edwards D, Batley J
phism discovery. In: Oraguzie N, Rikkerink E, (2006) Identification and characterization of
Gardiner S, De Silva H (eds) Association map- simple sequence repeat (SSR) markers derived
ping in plants. Springer, New York, pp 53–76 in silico from Brassica oleracea genome shotgun
11. Batley J, Edwards D (2007) SNP applications sequences. Mol Ecol Notes 6:1191–1194
in plants. In: Oraguzie N, Rikkerink E, Gardiner 22. Nie X, Li B, Wang L, Liu P, Biradar SS, Li T,
S, De Silva H (eds) Association mapping in Dolezel J, Edwards D, Luo M, Weining S
plants. Springer, New York, pp 95–102 (2012) Development of chromosome-arm-
12. Allen AM, Barker GL, Berry ST, Coghill JA, specific microsatellite markers in Triticum aes-
Gwilliam R, Kirby S, Robinson P, Brenchley tivum (Poaceae) using NGS technology. Am J
RC, D'Amore R, McKenzie N, Waite D, Hall Bot 99:e369–e371
A, Bevan M, Hall N, Edwards KJ (2011) 23. Keniry A, Hopkins CJ, Jewell E, Morrison B,
Transcript-specific, single-nucleotide polymor- Spangenberg GC, Edwards D, Batley J (2006)
phism discovery and linkage analysis in hexa- Identification and characterization of simple
ploid bread wheat (Triticum aestivum L.). sequence repeat (SSR) markers from Fragaria x
Plant Biotechnol J 9:1086–1099 ananassa expressed sequences. Mol Ecol Notes
13. Winfield MO, Wilkinson PA, Allen AM, Barker 6:319–322
GL, Coghill JA, Burridge A, Hall A, Brenchley 24. Batley J, Barker G, O'Sullivan H, Edwards KJ,
RC, D'Amore R, Hall N, Bevan MW, Richmond Edwards D (2003) Mining for single nucleo-
T, Gerhardt DJ, Jeddeloh JA, Edwards KJ (2012) tide polymorphisms and insertions/deletions
Targeted re-sequencing of the allohexaploid in maize expressed sequence tag data. Plant
wheat exome. Plant Biotechnol J 10:733–742 Physiol 132:84–91
14. Kharabian-Masouleh A, Waters DLE, Reinke 25. Lee H, Lai K, Lorenc MT, Imelfort M, Duran
RF, Henry RJ (2011) Discovery of polymor- C, Edwards D (2012) Bioinformatics tools and
phisms in starch-related genes in rice germ- databases for analysis of next generation
plasm by amplification of pooled DNA and sequence data. Brief Funct Genomics 2:12–24
deeply parallel sequencing†. Plant Biotechnol J
9:1074–1085 26. Imelfort M, Duran C, Batley J, Edwards D
(2009) Discovering genetic polymorphisms
15. Subbaiyan GK, Waters DL, Katiyar SK,
in next-generation sequencing data. Plant
Sadananda AR, Vaddadi S, Henry RJ (2012)
Biotechnol J 7:312–317
Genome-wide DNA polymorphisms in elite
indica rice inbreds discovered by whole-genome 27. Wang X, Wang H, Wang J, Sun R, Wu J, Liu S,
sequencing. Plant Biotechnol J 10:623–634 Bai Y, Mun J-H, Bancroft I, Cheng F, Huang
16. Trick M, Long Y, Meng JL, Bancroft I (2009) S, Li X, Hua W, Wang J, Wang X, Freeling M,
Single nucleotide polymorphism (SNP) discov- Pires JC, Paterson AH, Chalhoub B, Wang B,
ery in the polyploid Brassica napus using Solexa Hayward A, Sharpe AG, Park B-S, Weisshaar B,
transcriptome sequencing. Plant Biotechnol J Liu B, Li B, Liu B, Tong C, Song C, Duran C,
7:334–346 Peng C, Geng C, Koh C, Lin C, Edwards D,
Mu D, Shen D, Soumpourou E, Li F, Fraser F,
17. Barker GLA, Edwards KJ (2009) A genome- Conant G, Lassalle G, King GJ, Bonnema G,
wide analysis of single nucleotide polymor- Tang H, Wang H, Belcram H, Zhou H,
phism diversity in the world's major cereal Hirakawa H, Abe H, Guo H, Wang H, Jin H,
crops. Plant Biotechnol J 7:318–325 Parkin IAP, Batley J, Kim J-S, Just J, Li J, Xu J,
18. Bundock PC, Eliott FG, Ablett G, Benson AD, Deng J, Kim JA, Li J, Yu J, Meng J, Wang J,
Casu RE, Aitken KS, Henry RJ (2009) Min J, Poulain J, Hatakeyama K, Wu K, Wang
Targeted single nucleotide polymorphism L, Fang L, Trick M, Links MG, Zhao M, Jin
(SNP) discovery in a highly polyploid plant M, Ramchiary N, Drou N, Berkman PJ, Cai Q,
60 Kaitao Lai et al.
Huang Q, Li R, Tabata S, Cheng S, Zhang S, 37. Lim G, Jewell E, Li X, Erwin T, Love C, Batley
Zhang S, Huang S, Sato S, Sun S, Kwon S-J, J, Spangenberg G, Edwards D (2007) A com-
Choi S-R, Lee T-H, Fan W, Zhao X, Tan X, Xu parative map viewer integrating genetic maps
X, Wang Y, Qiu Y, Yin Y, Li Y, Du Y, Liao Y, for Brassica and Arabidopsis. BMC Plant Biol
Lim Y, Narusaka Y, Wang Y, Wang Z, Li Z, 7:40
Wang Z, Xiong Z, Zhang Z (2011) The 38. Duran C, Boskovic Z, Imelfort M, Batley J,
genome of the mesopolyploid crop species Hamilton NA, Edwards D (2010) CMap3D: a
Brassica rapa. Nat Genet 43:1035–1040 3D visualisation tool for comparative genetic
28. Hayward A, Dalton-Morgan J, Mason A, maps. Bioinformatics 26:273–274
Zander M, Edwards D, Batley J (2012) SNP 39. Duran C, Eales D, Marshall D, Imelfort M,
discovery and applications in Brassica napus. J Stiller J, Berkman PJ, Clark T, McKenzie M,
Plant Biotechnol 39:49–61 Appleby N, Batley J, Basford K, Edwards D
29. Hayward A, Vighnesh G, Delay C, Samian (2010) Future tools for association mapping in
MR, Manoli S, Stiller J, McKenzie M, Edwards crop plants. Genome 53:1017–1023
D, Batley J (2012) Second-generation sequenc- 40. Edwards D, Batley J (2004) Plant bioinformatics:
ing for gene discovery in the Brassicaceae. from genome to phenome. Trends Biotechnol
Plant Biotechnol J 10:750–759 22:232–237
30. Tollenaere R, Hayward A, Dalton-Morgan J, 41. Lai K, Lorenc MT, Edwards D (2012)
Campbell E, McLanders J, Lorenc M, Manoli Genomic databases for crop improvement.
S, Stiller J, Raman R, Raman H, Edwards D, Agronomy 2:62–73
Batley J (2012) Identification and characterisa- 42. Youens-Clark K, Buckler E, Casstevens T, Chen
tion of candidate Rlm4 blackleg resistance C, DeClerck G, Derwent P, Dharmawardhana
genes in Brassica napus using next generation P, Jaiswal P, Kersey P, Karthikeyan AS, Lu J,
sequencing. Plant Biotechnol J 10:709–715 McCouch SR, Ren L, Spooner W, Stein JC,
31. Berkman BJ, Skarshewski A, Lorenc MT, Lai Thomason J, Wei S, Ware D (2011) Gramene
K, Duran C, Ling EYS, Stiller J, Smits L, database in 2010: updates and extensions.
Imelfort M, Manoli S, McKenzie M, Nucleic Acids Res 39:D1085–D1094
Kubalakova M, Simkova H, Batley J, Fleury D, 43. Youens-Clark K, Faga B, Yap IV, Stein L, Ware
Dolezel J, Edwards D (2011) Sequencing and D (2009) CMap 1.01: a comparative mapping
assembly of low copy and genic regions of iso- application for the Internet. Bioinformatics
lated Triticum aestivum chromosome arm 25:3040–3042
7DS. Plant Biotechnol J 9:768–775
44. Close TJ, Bhat PR, Lonardi S, Wu Y, Rostoks
32. Berkman PJ, Skarshewski A, Manoli S, Lorenc N, Ramsay L, Druka A, Stein N, Svensson JT,
MT, Stiller J, Smits L, Lai K, Campbell E, Wanamaker S, Bozdag S, Roose ML, Moscou
Kubalakova M, Simkova H, Batley J, Dolezel J, MJ, Chao S, Varshney RK, Szucs P, Sato K,
Hernandez P, Edwards D (2012) Sequencing Hayes PM, Matthews DE, Kleinhofs A,
wheat chromosome arm 7BS delimits the Muehlbauer GJ, DeYoung J, Marshall DF,
7BS/4AL translocation and reveals homoeolo- Madishetty K, Fenton RD, Condamine P,
gous gene conservation. Theor Appl Genet Graner A, Waugh R (2009) Development and
124:423–432 implementation of high-throughput SNP
33. Hernandez P, Martis M, Dorado G, Pfeifer M, genotyping in barley. BMC Genomics 10:582
Galvez S, Schaaf S, Jouve N, Simkova H, Valarik 45. O'Sullivan H (2007) GrainGenes – a genomic
M, Dolezel J, Mayer KF (2012) Next-generation database for Triticeae and Avena. In: Edwards
sequencing and syntenic integration of flow- D (ed) Methods in molecular biology. Humana,
sorted arms of wheat chromosome 4A exposes Totowa, NJ, pp 301–314
the chromosome structure and gene content.
Plant J Cell Mol Biol 69:377–386 46. Carollo V, Matthews DE, Lazo GR, Blake TK,
Hummel DD, Lui N, Hane DL, Anderson OD
34. Duran C, Edwards D, Batley J (2009) Genetic
(2005) GrainGenes 2.0. An improved resource
maps and the use of synteny. In: Gustafson JP,
for the small-grains community. Plant Physiol
Langridge P, Somers DJ (eds) Plant genomics.
139:643–651
Humana, New York, pp 41–55
47. Matthews DE, Carollo VL, Lazo GR, Anderson
35. Batley J, Edwards D (2009) Genome sequence
OD (2003) GrainGenes, the genome database
data: management, storage, and visualization.
for small-grain crops. Nucleic Acids Res 31:
Biotechniques 46:333–336
183–186
36. Duran C, Appleby N, Edwards D, Batley J
(2009) Molecular genetic markers: discovery, 48. Szűcs P, Blake VC, Bhat PR, Chao S, Close TJ,
applications, data storage and visualisation. Cuesta-Marcos A, Muehlbauer GJ, Ramsay L,
Curr Bioinform 4:16–27 Waugh R, Hayes PM (2009) An integrated
Molecular Marker Databases 61
resource for Barley linkage map and malting database for functional, comparative and diver-
quality QTL alignment. Plant Gen 2:134–140 sity studies in the Rubiaceae family. BMC Plant
49. Canaran P, Stein L, Ware D (2006) Look- Biol 9:123
Align: an interactive web-based multiple 59. Blenda A, Scheffler J, Scheffler B, Palmer M,
sequence alignment viewer with polymorphism Lacape JM, Yu JZ, Jesudurai C, Jung S,
analysis support. Bioinformatics 22:885–886 Muthukumar S, Yellambalase P, Ficklin S,
50. Mochida K, Saisho D, Yoshida T, Sakurai T, Staton M, Eshelman R, Ulloa M, Saha S, Burr B,
Shinozaki K (2008) TriMEDB: a database to Liu S, Zhang T, Fang D, Pepper A, Kumpatla
integrate transcribed markers and facilitate S, Jacobs J, Tomkins J, Cantrell R, Main D
genetic studies of the tribe Triticeae. BMC (2006) CMD: a cotton microsatellite database
Plant Biol 8:72 resource for Gossypium genomics. BMC
Genomics 7:132
51. Hori K, Takehara S, Nankaku N, Sato K,
Sasakuma T, Takeda K (2007) Barley EST mark- 60. Duran C, Appleby N, Clark T, Wood D,
ers enhance map saturation and QTL mapping Imelfort M, Batley J, Edwards D (2009)
in diploid wheat. Breed Sci 57:39–45 AutoSNPdb: an annotated single nucleotide
polymorphism database for crop plants. Nucleic
52. Ouyang S, Zhu W, Hamilton J, Lin H, Campbell Acids Res 37:D951–D953
M, Childs K, Thibaud-Nissen F, Malek RL, Lee
Y, Zheng L, Orvis J, Haas B, Wortman J, Buell 61. Barker G, Batley J, O'Sullivan H, Edwards KJ,
CR (2007) The TIGR rice genome annotation Edwards D (2003) Redundancy based detec-
resource: improvements and new features. tion of sequence polymorphisms in expressed
Nucleic Acids Res 35:D883–D887 sequence tag data using autoSNP. Bioinformatics
19:421–422
53. Temnykh S, DeClerck G, Lukashova A,
Lipovich L, Cartinhour S, McCouch S (2001) 62. Duran C, Appleby N, Vardy M, Imelfort M,
Computational and experimental analysis of Edwards D, Batley J (2009) Single nucleotide
microsatellites in rice (Oryza sativa L.): fre- polymorphism discovery in barley using autoS-
quency, length variation, transposon associa- NPdb. Plant Biotechnol J 7:326–333
tions, and genetic marker potential. Genome 63. Lai K, Duran C, Berkman PJ, Lorenc MT,
Res 11:1441–1452 Stiller J, Manoli S, Hayden MJ, Forrest KL,
54. Lorenc MT, Boskovic Z, Stiller J, Duran C, Fleury D, Baumann U, Zander M, Mason AS,
Edwards D (2012) Role of Bioinformatics as a Batley J, Edwards D (2012) Single nucleotide
tool for oilseed Brassica species. In: Edwards polymorphism discovery from wheat next-
D, Parkin IAP, Batley J (eds) Genetics, genom- generation sequence data. Plant Biotechnol J
ics and breeding of oilseed Brassicas. Science 10:743–749
Publishers Inc., New Hampshire, pp 194–205 64. Edwards D (2011) Wheat bioinformatics. In:
55. Duran C, Boskovic Z, Batley J, Edwards D Bonjean A, Angus W, Van Ginkel M (eds)
(2011) Role of bioinformatics as a tool for veg- The world wheat book. Lavoisier, Paris,
etable Brassica species. In: Sadowski J (ed) pp 851–875
Vegetable Brassicas. Science Publishers, Inc., 65. Lai K, Berkman PJ, Lorenc MT, Duran C,
New Hampshire, pp 406–418 Smits L, Manoli S, Stiller J, Edwards D (2012)
56. Choi SR, Teakle GR, Plaha P, Kim JH, Allender WheatGenome.info: an integrated database
CJ, Beynon E, Piao ZY, Soengas P, Han TH, and portal for wheat genome information.
King GJ, Barker GC, Hand P, Lydiate DJ, Plant Cell Physiol 53:e2
Batley J, Edwards D, Koo DH, Bang JW, Park 66. Edwards D, Wilcox S, Barrero RA, Fleury D,
BS, Lim YP (2007) The reference genetic link- Cavanagh CR, Forrest KL, Hayden MJ,
age map for the multinational Brassica rapa Moolhuijzen P, Keeble-Gagnère G, Bellgard MI,
genome sequencing project. Theor Appl Genet Lorenc MT, Shang CA, Baumann U, Taylor JM,
115:777–792 Morell MK, Langridge P, Appels R, Fitzgerald A
57. Bombarely A, Menda N, Tecle IY, Buels RM, (2012) Bread matters: a national initiative to
Strickler S, Fischer-York T, Pujar A, Leto J, profile the genetic diversity of Australian wheat.
Gosselin J, Mueller LA (2011) The sol Plant Biotechnol J 10:703–708
genomics network (solgenomics.net): grow- 67. Jewell E, Robinson A, Savage D, Erwin T, Love
ing tomatoes using Perl. Nucleic Acids Res CG, Lim GAC, Li X, Batley J, Spangenberg
39:D1149–D1155 GC, Edwards D (2006) SSRPrimer and SSR
58. Plechakova O, Tranchant-Dubreuil C, Benedet taxonomy tree: biome SSR discovery. Nucleic
F, Couderc M, Tinaut A, Viader V, De Block P, Acids Res 34:W656–W659
Hamon P, Campa C, de Kochko A, Hamon S, 68. Robinson AJ, Love CG, Batley J, Barker G,
Poncet V (2009) MoccaDB – an integrative Edwards D (2004) Simple sequence repeat
62 Kaitao Lai et al.
marker loci discovery using SSR primer. M, Edgar R, Federhen S, Feolo M, Geer LY,
Bioinformatics 20:1475–1476 Helmberg W, Kapustin Y, Khovayko O,
69. Batley J, Hopkins CJ, Cogan NOI, Hand M, Landsman D, Lipman DJ, Madden TL, Maglott
Jewell E, Kaur J, Kaur S, Li X, Ling AE, Love DR, Miller V, Ostell J, Pruitt KD, Schuler GD,
C, Mountford H, Todorovic M, Vardy M, Shumway M, Sequeira E, Sherry ST, Sirotkin K,
Walkiewicz M, Spangenberg GC, Edwards D Souvorov A, Starchenko G, Tatusov RL,
(2007) Identification and characterization of Tatusova TA, Wagner L, Yaschenko E (2008)
simple sequence repeat markers from Brassica Database resources of the national center for
napus expressed sequences. Mol Ecol Notes biotechnology information. Nucleic Acids Res
7:886–889 36:D13–D21
70. Hopkins CJ, Cogan NOI, Hand M, Jewell E, 78. Gonzales MD, Gajendran K, Farmer AD,
Kaur J, Li X, Lim GAC, Ling AE, Love C, Archuleta E, Beavis WD (2007) Leveraging
Mountford H, Todorovic M, Vardy M, model legume information to find candidate
Spangenberg GC, Edwards D, Batley J (2007) genes for soybean sudden death syndrome using
Sixteen new simple sequence repeat markers the legume information system. In: Edwards D
from Brassica juncea expressed sequences and (ed) Methods in molecular biology. Humana,
their cross-species amplification. Mol Ecol Totowa, NJ, pp 245–259
Notes 7:697–700 79. Gonzales MD, Archuleta E, Farmer A,
71. Ling AE, Kaur J, Burgess B, Hand M, Hopkins Gajendran K, Grant D, Shoemaker R, Beavis
CJ, Li X, Love CG, Vardy M, Walkiewicz M, WD, Waugh ME (2005) The legume informa-
Spangenberg G, Edwards D, Batley J (2007) tion system (LIS): an integrated information
Characterization of simple sequence repeat resource for comparative legume biology.
markers derived in silico from Brassica rapa Nucleic Acids Res 33:D660–D665
bacterial artificial chromosome sequences and 80. Schaeffer ML, Harper LC, Gardiner JM,
their application in Brassica napus. Mol Ecol Andorf CM, Campbell DA, Cannon EK, Sen
Notes 7:273–277 TZ, Lawrence CJ (2011) MaizeGDB: curation
72. Jayashree B, Buhariwalla HK, Shinde S, and outreach go hand-in-hand. Database
Crouch JH (2005) A legume genommics (Oxford) 2011, bar022
resource: the chickpea root expressed sequence 81. Lawrence CJ (2007) MaizeGDB – the maize
tag database. Electron J Biotechnol 8: genetics and genomics database. In: Edwards
128–133 D (ed) Methods in molecular biology. Humana,
73. Azam S, Thakur V, Ruperao P, Shah T, Balaji J, Totowa, NJ, pp 331–345
Amindala B, Farmer AD, Studholme DJ, 82. Lawrence CJ, Schaeffer ML, Seigfried TE,
May GD, Edwards D, Jones JD, Varshney RK Campbell DA, Harper LC (2007) MaizeGDB's
(2012) Coverage-based consensus calling new data types, resources and activities. Nucleic
(CbCC) of short sequence reads and compari- Acids Res 35:D895–D900
son of CbCC results to identify SNPs in chick- 83. Canaran P, Buckler ES, Glaubitz JC, Stein L,
pea (Cicer arietinum; Fabaceae), a crop species Sun Q, Zhao W, Ware D (2008) Panzea: an
without a reference genome. Am J Bot 99: update on new content and features. Nucleic
186–192 Acids Res 36:D1041–D1043
74. Cheng F, Liu S, Wu J, Fang L, Sun S, Liu B, Li 84. Grant D, Nelson RT, Cannon SB, Shoemaker
P, Hua W, Wang X, Cheng F, Liu SY, Wu J, Fang RC (2010) SoyBase, the USDA-ARS soybean
L, Sun SL, Liu B, Li PX, Hua W, Wang XW genetics and genomics database. Nucleic Acids
(2011) BRAD, the genetics and genomics data- Res 38:D843–D846
base for Brassica plants. BMC Plant Biol 11:136
85. Wegrzyn J, Main D, Figueroa B, Choi M, Yu J,
75. Karsch-Mizrachi I, Nakamura Y, Cochrane G Neale D, Jung S, Lee T, Stanton M, Zheng P,
(2012) The international nucleotide sequence Ficklin S, Cho I, Peace C, Evans K, Volk G,
database collaboration. Nucleic Acids Res 40: Oraguzie N, Chen C, Olmstead M, Gmitter G,
D33–D37 Abbott A (2012) Uniform standards for genome
76. Benson DA, Karsch-Mizrachi I, Lipman DJ, databases in forest and fruit trees. Tree Genet
Ostell J, Sayers EW (2009) GenBank. Nucleic Genomes 8:1–2
Acids Res 37:26–31 86. Tree fruit Genome Database Resources (tfGDR)
77. Wheeler DL, Barrett T, Benson DA, Bryant SH, (2002) Washington State University, Pullman,
Canese K, Chetvernin V, Church DM, DiCuccio WA. http://www.tfgdr.org
Chapter 5
Abstract
Inter-simple sequence repeat PCR (ISSR-PCR) is a fast, inexpensive genotyping technique based on length
variation in the regions between microsatellites. The method requires no species-specific prior knowledge
of microsatellite location or composition. Very small amounts of DNA are required, making this method
ideal for organisms of conservation concern, or where the quantity of DNA is extremely limited due to
organism size. ISSR-PCR can be highly reproducible but requires careful attention to detail. Optimization
of DNA extraction, fragment amplification, and normalization of fragment peak heights during fluorescent
detection are critical steps for minimizing the downstream time spent verifying and scoring the data.
Key words ABI Genetic Analyzer, Capillary electrophoresis, Conservation, Fragment, Inexpensive,
Normalization, Population genetics, ISSR-PCR
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_5, © Springer Science+Business Media New York 2015
64 Linda M. Prince
Nondegenerate 801 ATA TAT ATA TAT ATA TT 24.5 41.8 34.0 All the way Yes! Tm = 43°
802 ATA TAT ATA TAT ATA TG 24.8 44.2 36.0 All the way Yes! Tm = 43°
803 ATA TAT ATA TAT ATA TC 23.8 44.2 36.0 All the way Yes! Tm = 43°
804 TAT ATA TAT ATA TAT AA 23.3 41.8 34.0 All the way Yes! Tm = 38°
805 TAT ATA TAT ATA TAT AC 21.9 44.2 36.0 All the way Yes! Tm = 38°
806 TAT ATA TAT ATA TAT AG 22.5 44.2 36.0 All the way Yes! Tm = 38°
807 AGA GAG AGA GAG AGA GT 39.6 61.1 50.0 No No
808 AGA GAG AGA GAG AGA GC 44.2 63.5 52.0 1 of 2 bp No
809 AGA GAG AGA GAG AGA GG 44.0 63.5 52.0 No No
810 GAG AGA GAG AGA GAG AT 40.0 61.1 50.0 1 of 2 bp No
811 GAG AGA GAG AGA GAG AC 40.0 63.5 52.0 No No
812 GAG AGA GAG AGA GAG AA 41.3 61.1 50.0 No No
813 CTC TCT CTC TCT CTC TT 40.9 61.1 50.0 No No
814 CTC TCT CTC TCT CTC TA 38.5 61.1 50.0 1 of 2 bp No
815 CTC TCT CTC TCT CTC TG 41.7 63.5 52.0 No No
816 CAC ACA CAC ACA CAC AT 46.1 61.1 50.0 1 of 2 bp No
817 CAC ACA CAC ACA CAC AA 47.6 61.1 50.0 No No
818 CAC ACA CAC ACA CAC AG 46.8 63.5 52.0 No No
819 GTG TGT GTG TGT GTG TA 42.7 61.1 50.0 1 of 2 bp No
820 GTG TGT GTG TGT GTG TC 45.0 63.5 52.0 No No
821 GTG TGT GTG TGT GTG TT 45.3 61.1 50.0 No No
822 TCT CTC TCT CTC TCT CA 42.2 61.1 50.0 No No
823 TCT CTC TCT CTC TCT CC 44.5 63.5 52.0 No No
824 TCT CTC TCT CTC TCT CG 46.0 63.5 52.0 1 of 2 bp No
825 ACA CAC ACA CAC ACA CT 44.4 61.1 50.0 No No
826 ACA CAC ACA CAC ACA CC 48.6 63.5 52.0 No No
827 ACA CAC ACA CAC ACA CG 50.2 63.5 52.0 1 of 2 bp No
Plant Genotyping Using Fluorescently Tagged Inter-Simple Sequence Repeats…
Table 1 (continued)
Degenerate 831 ATA TAT ATA TAT ATA TYA 26.6 44.0 36.0 All the way Yes! Tm = 43°
832 ATA TAT ATA TAT ATA TYC 28.2 46.3 38.0 All the way Yes! Tm = 43°
833 ATA TAT ATA TAT ATA TYG 29.1 46.3 38.0 All the way Yes! Tm = 43°
834 AGA GAG AGA GAG AGA GYT 43.5 62.2 52.0 No No
835 AGA GAG AGA GAG AGA GYC 43.2 64.5 54.0 No No
836 AGA GAG AGA GAG AGA GYA 41.2 62.2 52.0 No No
837 TAT ATA TAT ATA TAT ART 26.6 44.0 36.0 All the way Yes! Tm = 38°
838 TAT ATA TAT ATA TAT ARC 25.9 46.3 38.0 All the way Yes! Tm = 38°
839 TAT ATA TAT ATA TAT ARG 26.9 46.3 38.0 All the way Yes! Tm = 38°
840 GAG AGA GAG AGA GAG AYT 43.9 62.2 52.0 No No
841 GAG AGA GAG AGA GAG AYC 43.6 64.5 54.0 No No
842 GAG AGA GAG AGA GAG AYG 44.7 64.5 54.0 No No
843 CTC TCT CTC TCT CTC TRA 42.5 62.2 52.0 No No
844 CTC TCT CTC TCT CTC TRC 44.4 64.5 54.0 No No
845 CTC TCT CTC TCT CTC TRG 45.5 64.5 54.0 No No
846 CAC ACA CAC ACA CAC ART 49.9 62.2 52.0 No No
847 CAC ACA CAC ACA CAC ARC 49.8 64.5 54.0 No No
848 CAC ACA CAC ACA CAC ARG 51.1 64.5 54.0 No No
849 GTG TGT GTG TGT GTG TYA 46.7 62.2 52.0 No No
850 GTG TGT GTG TGT GTG TYC 48.9 64.5 54.0 No No
851 GTG TGT GTG TGT GTG TYG 50.2 64.5 54.0 No No
852 TCT CTC TCT CTC TCT CRA 42.2 62.2 52.0 No No
853 TCT CTC TCT CTC TCT CRT 44.4 62.2 52.0 No No
854 TCT CTC TCT CTC TCT CRG 45.3 64.5 54.0 No No
855 ACA CAC ACA CAC ACA CYT 48.3 62.2 52.0 No No
856 ACA CAC ACA CAC ACA CYA 46.0 62.2 52.0 No No
857 ACA CAC ACA CAC ACA CYG 49.4 64.5 54.0 No No
858 TGT GTG TGT GTG TGT GRT 50.2 62.2 52.0 No No
859 TGT GTG TGT GTG TGT GRC 50.1 64.5 54.0 No No
860 TGT GTG TGT GTG TGT GRA 47.8 62.2 52.0 No No
Nondegenerate 861 ACC ACC ACC ACC ACC ACC 65.4 71.3 60.0 No No
862 AGC AGC AGC AGC AGC AGC 67.6 71.3 60.0 6 of 2 bp No
863 AGT AGT AGT AGT AGT AGT 30.3 57.7 48.0 3 of 1–2 bp No
864 ATG ATG ATG ATG ATG ATG 48.4 57.7 48.0 6 of 2 bp No
865 CCG CCG CCG CCG CCG CCG 90.8 85.0 72.0 6 of 2 bp No
866 CTC CTC CTC CTC CTC CTC 59.4 71.3 60.0 No No
867 GGC GGC GGC GGC GGC GGC 90.1 85.0 72.0 6 of 2 bp No
868 GAA GAA GAA GAA GAA GAA 46.9 57.7 48.0 No No
869 GTT GTT GTT GTT GTT GTT 49.2 57.7 48.0 No No
870 TGC TGC TGC TGC TGC TGC 69.5 71.3 60.0 6 of 2 bp No
871 TAT TAT TAT TAT TAT TAT 33.1 44.0 36.0 6 of 2 bp No
872 GAT AGA TAG ATA GAT A 26.8 49.6 40.0 4 of 2 bp No
873 GAC AGA CAG ACA GAC A 39.7 59.8 48.0 No No
874 CCC TCC CTC CCT CCC T 62.9 70.1 56.0 No No
875 CTA GCT AGC TAG CTA G 39.9 59.8 48.0 All the way Yes! Tm = 66°
876 GAT AGA TAG ACA GAC A 32.8 54.7 44.0 1 of 2 bp No
877 TGC ATG CAT GCA TGC A 61.2 59.8 48.0 All the way Yes! Tm = 93°
878 GGA TGG ATG GAT GGA T 53.2 59.8 48.0 4 of 2 bp No
879 CTT CAC TTC ACT TCA 37.7 52.9 42.0 No No
880 GGA GAG GAG AGG AGA 45.1 61.1 48.0 No No
881 GGG TGG GGT GGG GTG 63.7 69.3 54.0 No No
Degenerate 882 VBV ATA TAT ATA TAT AT 25.7 41.8 34.0 All the way Yes! Tm = 31°
883 BVB TAT ATA TAT ATA TA 26.9 41.8 34.0 All the way Yes! Tm = 31°
884 HBH AGA GAG AGA GAG AG 39.7 58.7 48.0 No No
885 BHB GAG AGA GAG AGA GA 43.3 58.7 48.0 No No
886 VDV CTC TCT CTC TCT CT 41.7 58.7 48.0 No No
887 DVD TCT CTC TCT CTC TC 42.5 58.7 48.0 No No
888 BDB CAC ACA CAC ACA CA 47.7 58.7 48.0 No No
889 DBD ACA CAC ACA CAC AC 43.1 58.7 48.0 No No
890 VHV GTG TGT GTG TGT GT 46.6 58.7 48.0 No No
891 HVH TGT GTG TGT GTG TG 47.8 58.7 48.0 No No
892 TAG ATC TGA TAT CTG AAT TCC C 57.3 65.7 60.0 2 of 2 bp, 1 of 6 bp Yes! Tm = 28°
893 NNN NNN NNN NNN NNN 41.6 36.5 30.0 All the way Yes! Tm = 63°
894 TGG TAG CTC TTG ATC ANN NNN 58.8 63.0 56.0 4 of 1–2 bp; 1 of 5, 1 of 6 No
895 AGA GTT GGT AGC TCT TGA TC 54.9 66.2 58.0 8 of 1–4 bp Yes! Tm = 37°
896 AGG TCG CGG CCG CNN NNN NAT G 75.8 73.2 68.0 1 of 8 bp No
897 CCG ACT CGA GNN NNN NAT GTG G 65.9 69.5 64.0 10 of 1–2 bp; 1 of 6 bp No
898 GAT CAA GCT TNN NNN NAT GTG G 61.1 63.9 58.0 2 of 1 bp; 1 of 4 bp; 1 of 8 bp No
899 CAT GGT GTT GGT CAT TGT TCC A 68.0 69.5 64.0 2 of 3 bp; 2 of 4 bp Yes! Tm = 49°
900 ACT TCC CCA CAG GTT AAC ACA 63.6 68.9 62.0 2 of 1 bp; 2 of 2 bp; 1 of 6 bp No
Tm/Td calculations: method 1: nearest neighbor, method 2: %GC, method 3: 2° × (A + T) + 4° × (G + C)
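Method 3 in the footnote above is simple enough to check by hand or in a few lines of code. The sketch below is illustrative only: the function name `wallace_tm` is hypothetical, and it handles only unambiguous bases (A, C, G, T), so it deliberately rejects the degenerate primers rather than guessing how their ambiguity codes were weighted in Table 1.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm as 2 degrees per A/T plus 4 degrees per G/C (method 3)."""
    seq = primer.replace(" ", "").upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        # Degenerate codes (Y, R, N, ...) are not handled in this sketch.
        raise ValueError(f"unsupported base codes: {sorted(unexpected)}")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Primer 807 from Table 1 (method 3 column lists 50.0)
print(wallace_tm("AGA GAG AGA GAG AGA GT"))  # → 50
```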
2 Materials
2.2 PCR Amplification
1. High-fidelity DNA polymerase such as Phusion (see Note 2).
2. dNTPs (2.5 mM each, 10 mM total).
3. Oligonucleotide primers (20 μM; Table 1) labeled with fluorescent
dyes specific to your electrophoresis instrumentation.
4. Deionized water.
5. Thin-walled PCR tubes appropriate to your thermal cyclers.
6. Thermal cycler (routine; real-time preferred).
3 Methods
3.2 PCR Amplification
See Notes 1 and 8 before beginning. Optimization must be performed
for every species project and each primer. Negative controls
must be run to identify primer-dimers and to assess reagent
3.3 Fragment Detection
See Notes 8–10 before beginning. Normalization of fragment peak
heights during fluorescent detection is critical. Automated fragment
analysis relies on user-specified thresholds for peak width and peak
height. Samples that amplify weakly and are not improved by cleaning
of the DNA or normalization via addition of more PCR product
will require extensive downstream verification. The procedure below
is specific to an Applied Biosystems, Inc. 3130xl Genetic Analyzer,
but can be adapted to most other capillary platforms.
1. Instrument setup (see Note 9): POP-7 polymer, 50 cm capillary
array, ISSR run module (see Note 2).
2. Sample Preparation: 96-well run plate, 0.5 μL of GeneScan
1200 LIZ size standard, 10 μL of Hi-Di Formamide (see
Note 10), 0.5–5.0 μL of sample (see Note 11), Plate cassette
assembly.
3. Denature prepared samples at 95 °C for 2 min and hold at 4 °C
before loading the plate on the instrument. Store plates at 4 °C
until successful data collection has been confirmed, then discard
in hazardous waste. If necessary, increase the amount of PCR
product and rerun within 24 h, repeating the denaturation step.
There is no need to add more 1200 LIZ size standard.
3.4 Fragment Analysis from 100 to 800 bp
See Note 12. The method below is specific to the ABI 3130xl
instrument, but can be adapted to other platforms. Output (.fsa) files
are imported into GeneMapper v4.0 (or later) software. GeneMapper
can accommodate a large number of samples, but it is easier
to manipulate projects created for each primer separately. New
(Generic) files are analyzed with a modified AFLP Analysis Method.
This method scores each peak above a minimum peak height (50 rfu)
as an allele and applies a label of 1, check, or 0 for the
presence of a peak in a particular bin. The level of background “noise”
is often around 10 rfu, but will vary depending on overall signal
strength. Select appropriate parameters (yours may differ):
1. General Parameters: Description: AFLP tutorial, Instrument:
3130xl.
2. Allele Parameters:
● Analyze dyes = blue (FAM), green (VIC), yellow (NED),
red (HEX), orange (LIZ).
● Analysis range = 50–800 bps (allows you to see primer-
dimers but avoids most dye blobs that will cause your anal-
ysis to fail).
● Normalization scope = Project.
● Normalization method = Sum of Signal.
● Panel = Generate panel using samples. Bin width (bp) = 1.0;
Use all samples.
● Allele calling = Name alleles using labels. Useful labels
include 0 (<5), 1 (≥100), and check (>50 but ≤100).
3. Peak Detector Parameters.
● Peak detection algorithm = Advanced.
● Analysis range = Partial (3,250–1,950).
● Analysis sizing = All sizes.
● Smoothing = None.
● Size calling method = Local Southern Method.
● Peak amplitude thresholds = 50 for all five colors.
● Min. peak half width = 2 pts.
● Polynomial degree = 3.
● Peak window size = 15 pts.
● Slope threshold = 0.0.
4. Peak Quality Parameters—Use factory defaults.
5. Quality Flags Parameters—Use factory defaults.
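The labeling logic behind the allele parameters above can be sketched outside GeneMapper for checking exported peak tables. This is a minimal illustration, not GeneMapper's own algorithm: the function names, the tuple layout of `peaks`, and the choice to treat anything at or below the 50 rfu minimum peak height as 0 are all assumptions made for the example.

```python
RANK = {"0": 0, "check": 1, "1": 2}


def label_peak(height_rfu: float, floor: float = 50.0, confident: float = 100.0) -> str:
    """Return '1' for strong peaks, 'check' for marginal ones, '0' otherwise."""
    if height_rfu >= confident:
        return "1"
    if height_rfu > floor:
        return "check"
    return "0"


def score_sample(peaks, bins):
    """Score one sample against 1-bp-wide bins.

    peaks: iterable of (size_bp, height_rfu) pairs (hypothetical export format).
    bins:  iterable of integer bin centres in bp.
    """
    calls = {b: "0" for b in bins}
    for size_bp, height in peaks:
        b = round(size_bp)  # nearest 1-bp bin
        if b in calls:
            label = label_peak(height)
            # Keep the strongest label seen for this bin.
            if RANK[label] > RANK[calls[b]]:
                calls[b] = label
    return calls


sample = [(231.2, 480.0), (305.7, 72.0), (410.1, 12.0)]
print(score_sample(sample, bins=[231, 306, 410, 500]))
```

Bins labeled check are the ones that need manual verification before they are recoded as present or absent.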
Select the appropriate size standard (after pruning fragments
<50 and >800 bp, and renaming the 1200 Liz standard) from the
pull-down menu. The Tables feature can be used to export any
4 Notes
Acknowledgements
References
Eleusine with DNA markers. Genome 38: 14. Levi A, Thomas CE, Newman M, Reddy OUK,
757–763 Zhang X, Xu Y (2004) ISSR and AFLP markers
4. Kostia S, Varvio S-L, Vakkari P, Pulkkinen P differ among American watermelon cultivars
(1995) Microsatellite sequences in a conifer, with limited genetic diversity. J Am Soc Hort
Pinus sylvestris. Genome 38:1244–1248 Sci 129:553–558
5. Charters YM, Robertson A, Wilkinson MJ, 15. Doyle JJ, Doyle JL (1987) A rapid DNA isola-
Ramsay G (1996) PCR analysis of oilseed rape tion procedure for small quantities of fresh leaf
cultivars (Brassica napus L. ssp. oleifera) using tissue. Phytochem Bull 19:11–15
5'-anchored simple sequence repeat (SSR) 16. Applied Biosystems, Inc. (2010) Application
primers. Theor Appl Genet 92:442–447 note: ISSR plant genotyping. Publication
6. PubMed.gov. US National Library of Medicine/ 106AP31-01. http://tools.invitrogen.com/
National Institutes of Health. http://www. content/sfs/brochures/cms_079244.pdf .
ncbi.nlm.nih.gov/pubmed. Accessed 12 May Accessed 10 Dec 2012
2012 17. Kistler L (2012) Ch 10 Ancient DNA extrac-
7. Albani MC, Battey NH, Wilkinson MJ (2004) tion from plants. In: Shapiro B, Hofreiter M
The development of ISSR-derived SCAR (eds) Ancient DNA: methods and protocols.
markers around the SEASONAL Human, New York, pp 71–79
FLOWERING LOCUS (SFL) in Fragaria 18. Herzer S (2001) DNA purification. In: Gerstein
vesca. Theor Appl Genet 109:571–579 AS (ed) Molecular biology problem solver: a lab-
8. Bornet B, Antoine E, Françoise S, Marcaillou-Le oratory guide. Wiley-Liss, Inc. http://onlineli-
Baut C (2005) Development of sequence char- brary.wiley.com/book/10.1002/0471223905.
acterized amplified region markers from inter- Accessed 10 Dec 2012
simple sequence repeat fingerprints for the 19. John ME (1992) An efficient method for isola-
molecular detection of toxic phytoplankton tion of RNA and DNA from plants containing
Alexandrium catenella (Dinophyceae) and polyphenolics. Nucleic Acids Res 20:2381
pseudo-Nitzchia pseudodelicatissima 20. Fazekas AJ, Steeves R, Newmaster SG (2010)
(Bacillariophyceae) from French coastal waters. Improving sequencing quality from PCR prod-
J Phycol 41:704–711 ucts containing long mononucleotide repeats.
9. Ye Q, Qiu Y-X, Quo Y-Q, Chen J-X, Yang S-Z, Biotechniques 48:277–285
Zhao M-S, Fu C-X (2006) Species-specific
21. Amersham Biosciences (2000) Nucleic acid
SCAR markers for authentication of
purification: nucleon phytopure. Data
Sinocalycanthus chinensis. J Zhejiang Univ Sci
File18-1146-64. https://www.gelifesciences.
B 7:868–872
com. Accessed 10 Dec 2012
10. UBC primer set No. 9, Biotechnology
22. Bitesize Bio. The basics: how phenol extraction
Laboratory, University of British Columbia,
works.http://bitesizebio.com/articles/the-basics-
Vancouver, Canada
how-phenol-extraction-works/. Accessed 10
11. Bornet B, Branchard M (2001) Nonanchored Dec 2012
inter simple sequence repeat (ISSR) markers:
reproducible and specific tools for genome fin- 23. Zumbo P (2012) Ethanol precipitation. Weill
gerprinting. Plant Mol Biol Rep 19:209–215 Cornell Medical College Department of
Physiology and Biophysics, Ithaca, NY, p 12
12. Monte-Corvo L, Goulão L, Oliveira C (2001)
ISSR analysis of cultivars of pear and suitability 24. http://irc.igd.cornell.edu/Protocols/RNase
of molecular markers for clone discrimination. Protocol.html. Accessed 10 Dec 2012
J Am Soc Hort Sci 126:517–522 25. Meudt HM, Clarke AC (2007) Almost forgot-
13. Qian W, Ge S, Hong DY (2001) Genetic varia- ten or latest practice? AFLP applications, analyses
tion within and among populations of a wild and advances. Trends Plant Sci 12:106–117
rice Oryza granulate from China detected by 26. Rychlik W (2002) OLIGO primer analysis
RAPD and ISSR markers. Theor Appl Genet software, version 6. Molecular biology insights.
102:440–449 Cascade, Inc, Cascade-Chipita Park, CO
Chapter 6
SSR Genotyping
Annaliese S. Mason
Abstract
SSR genotyping involves the use of simple sequence repeats (SSRs) as DNA markers. SSRs, also called
microsatellites, are a type of repetitive DNA sequence ubiquitous in most plant genomes. SSRs contain
repeats of a motif sequence 1–6 bp in length. Due to this structure SSRs frequently undergo mutations,
mainly due to DNA polymerase errors, which involve the addition or subtraction of a repeat unit. Hence,
SSR sequences are highly polymorphic and may be readily used for detection of allelic variation within
populations. SSRs are present within both genic and nongenic regions and are occasionally transcribed,
and hence may be identified in expressed sequence tags (ESTs) as well as more commonly in nongenic
DNA sequences. SSR genotyping involves the design of DNA-based primers to amplify SSR sequences
from extracted genomic DNA, followed by amplification of the SSR repeat region using polymerase chain
reaction, and subsequent visualization of the resulting DNA products, usually using gel electrophoresis.
These procedures are described in this chapter. SSRs have been one of the most favored molecular markers
for plant genotyping in the last 20 years due to their high levels of polymorphism, wide distribution across most
plant genomes, and ease of use and will continue to be a useful tool in many species for years to come.
Key words Simple sequence repeats, PCR-based markers, Molecular markers, Plant genotyping,
Polymorphism, Primer design, Agarose gel electrophoresis
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_6, © Springer Science+Business Media New York 2015
Fig. 1 Example of agarose gel electrophoresis of PCR products resulting from SSR locus amplification using PCR,
showing visualization of two alleles of approximately 320 and 230 bp in size (Allele 1 and Allele 2, respectively)
in 23 experimental individuals. Sample 4 and Sample 16 represent probable failed amplification of the SSR
locus, Samples 13, 15, and 18 are homozygous for Allele 2, Sample 23 is homozygous for Allele 1 and the
remaining samples are heterozygous (one copy of Allele 1 and one copy of Allele 2)
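The way gel lanes such as those in Fig. 1 are read can be written down as a small rule set. The sketch below is only an illustration of that reading: the function name, the 15 bp matching tolerance, and the return strings are hypothetical, and the allele sizes are the approximate values from the caption.

```python
ALLELES = {"Allele 1": 320, "Allele 2": 230}  # approximate sizes in bp (Fig. 1)


def call_genotype(band_sizes_bp, alleles=ALLELES, tolerance_bp=15):
    """Assign a genotype from the band sizes observed in one lane."""
    seen = set()
    for size in band_sizes_bp:
        for name, expected in alleles.items():
            # A band counts as an allele if it lies close to the expected size.
            if abs(size - expected) <= tolerance_bp:
                seen.add(name)
    if not seen:
        return "no call (probable failed amplification)"
    if len(seen) == 1:
        return f"homozygous {next(iter(seen))}"
    return "heterozygous " + " / ".join(sorted(seen))


print(call_genotype([318, 232]))
print(call_genotype([229]))
print(call_genotype([]))
```

A single band is reported as homozygous here, but it could also reflect two alleles too close in size to resolve on agarose.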
2 Materials
3 Methods
3.2 Sequence Amplification (See Note 3)
1. Make up 10–50 μl (see Note 4) per reaction PCR mixes in
0.2 ml PCR tubes on ice containing (see Note 5):
● 0.2 mM dNTP mix (e.g., 1.6 μl of 2.5 mM solution in
a 20 μl reaction).
● 1× DNA polymerase buffer (e.g., 2 μl of 10× DNA poly-
merase buffer in a 20 μl reaction).
● 2 mM MgCl2 (e.g., 2 μl of 20 mM MgCl2 solution in a 20 μl
reaction; only add MgCl2 if not present in 10× DNA polymerase
buffer).
● 0.5 μM forward primer (e.g., 1 μl of 10 μM solution in a
20 μl reaction).
● 0.5 μM reverse primer (e.g., 1 μl of 10 μM solution in a
20 μl reaction).
● 10–75 ng of genomic DNA (e.g., 5 μl of 10 ng/μl DNA
solution in a 20 μl reaction).
● 1 U of Taq (DNA polymerase enzyme from Thermus
aquaticus; e.g., 0.2 μl of 5 U/μl solution in a 20 μl
reaction).
● Purified deionized water to the appropriate volume (e.g., up
to 20 μl in a 20 μl reaction).
2. Mix thoroughly by flicking the tube (do not vortex).
3. Thermocycler programming (see Note 6)
Heat cycling (see Note 7): Initial denaturation, then 15–35
cycles of denaturation, annealing, and extension, followed by a
final extension. Typical temperatures and times are given for a
product of 500 bp.
(a) Initial denaturation: 94 °C for 5 min.
Then 35 cycles of steps (b)–(d):
(b) Denaturation: 94 °C for 30 s.
(c) Annealing (see Note 8): 50 °C for 60 s.
(d) Extension: 72 °C for 60 s.
Followed by a single final extension step:
(e) Final extension: 72 °C for 10 min.
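The example volumes in step 1 are all given for a 20 μl reaction, so other reaction sizes in the 10–50 μl range can be obtained by proportional scaling, with water making up the remainder. The following is a rough sketch of that arithmetic; the dictionary and function names are hypothetical, and rounding to 0.01 μl is an arbitrary choice for display.

```python
# Per-reaction example volumes (μl) quoted for a 20 μl reaction in step 1.
EXAMPLE_20UL = {
    "dNTP mix (2.5 mM)": 1.6,
    "10x DNA polymerase buffer": 2.0,
    "MgCl2 (20 mM)": 2.0,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "genomic DNA (10 ng/ul)": 5.0,
    "Taq (5 U/ul)": 0.2,
}


def reaction_volumes(total_ul=20.0, include_mgcl2=True):
    """Scale the 20 μl example to another reaction volume and add water."""
    scale = total_ul / 20.0
    volumes = {
        name: round(vol * scale, 2)
        for name, vol in EXAMPLE_20UL.items()
        # Skip MgCl2 when the 10x buffer already contains it.
        if include_mgcl2 or not name.startswith("MgCl2")
    }
    volumes["deionized water"] = round(total_ul - sum(volumes.values()), 2)
    return volumes


for name, vol in reaction_volumes(20.0).items():
    print(f"{name:28s} {vol:5.2f} ul")
```

Calling `reaction_volumes(50.0)` gives the corresponding volumes for a 50 μl reaction, and `include_mgcl2=False` shifts the MgCl2 volume into water.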
3.3 Visualization of Amplified DNA Product Using Agarose Gel Electrophoresis (See Note 9)
1. Prepare the agarose gel for electrophoresis. Weigh out 1 g of
molecular biology grade agarose powder into a conical flask, add
100 ml of 1× TAE buffer (makes 1 % agarose gel, see Note 10).
Heat the solution to boiling and check to make sure all powder
is dissolved. Cool the outside of the flask under running water
(turning or swirling flask to prevent uneven cooling) until flask
can be held comfortably (~60 °C), and then add 2 μl of 10 mg/ml
ethidium bromide solution (see Note 11).
2. Pour liquid agarose into a gel mold, generally consisting of a
hard plastic tray which has been taped on both ends (to allow
4 Notes
12. PCR products may be stored at this step for later visualization,
although storing before adding loading buffer is preferable. 4 °C
is fine for short-term storage (up to 2 weeks), and −20 °C is
preferable for longer term storage.
13. Gel tanks which run faster and at higher voltages are also
available commercially, and premade agarose gels can also be
bought rather than made from scratch.
14. Microsatellites are codominant molecular markers, simplifying
the analysis of marker data as all alleles produced should be
observable. If only one SSR locus is amplified by the primers
then a maximum of two alleles should be observed for any one
individual. The observation of one allele represents either
homozygosity at that locus or two alleles that are too close in
size to be distinguishable. Absence of alleles in some individuals
can be due to either failed PCRs, polymorphisms in the primer
binding site, or deletion mutations. If multiple loci are ampli-
fied by a single primer (detectable as the presence of three or
more alleles) then scoring alleles can become more complex.
Amplification of multiple loci is common in polyploids, but
may also occur in other species. Increasing the annealing temperature
in the PCR may increase primer specificity to
remove additional bands produced by amplification of second-
ary loci. For simple determination of genetic diversity the allo-
cation of alleles to known SSR loci is not as critical, as in these
analyses the patterns of bands are used to produce a similarity
matrix between individuals for production of phylogenetic trees,
and the location of the SSRs is of secondary importance.
Likewise, creation of linkage maps does not require prior
knowledge of SSR location, and may in fact be used to allocate
SSRs to linkage groups. However, in other uses of SSRs, such as
haplotype determination and analysis of genetic introgressions,
the location of the SSR and hence the correct identification of
alleles as belonging to each locus is crucial.
15. If capillary electrophoresis is used to separate PCR products,
allele copy number can sometimes be determined from the
relative amplification of the fluorescent peaks. This can only be
done if two or more alleles are amplified by the same PCR primers,
because relative peak amplification is required to assess copy
number, and it will not be reliable for all markers.
To assess the reliability of allele copy number analysis, the ratio of
amplification of each allele relative to every other allele amplified
by the same primer should be calculated. If ratios do not fall
neatly into whole number multiples (e.g., 1:1, 1:2), allele copy
number should not be assessed using that marker, but otherwise
this technique may provide some utility for detecting events such
as homoeologous nonreciprocal translocations [29–31].
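The ratio check described in Note 15 can be sketched as follows. This is an illustration under stated assumptions rather than a validated procedure: the function name, the 0.15 tolerance around each whole number, and the upper limit of 4 on the ratio are hypothetical choices, and peak heights are assumed to come from a single marker in a single sample.

```python
from itertools import combinations


def ratios_look_integer(peak_heights, tolerance=0.15, max_ratio=4):
    """Check whether all pairwise peak ratios sit near whole numbers.

    peak_heights: dict mapping allele name -> fluorescent peak height.
    Returns (ok, ratios), where ratios maps allele pairs to larger/smaller.
    """
    if len(peak_heights) < 2:
        raise ValueError("at least two amplified alleles are required")
    if any(h <= 0 for h in peak_heights.values()):
        raise ValueError("peak heights must be positive")
    ratios = {}
    for (a, ha), (b, hb) in combinations(peak_heights.items(), 2):
        ratios[(a, b)] = max(ha, hb) / min(ha, hb)
    ok = all(
        abs(r - round(r)) <= tolerance and 1 <= round(r) <= max_ratio
        for r in ratios.values()
    )
    return ok, ratios


print(ratios_look_integer({"A": 1000, "B": 2050}))  # close to 1:2
print(ratios_look_integer({"A": 1000, "B": 1500}))  # ambiguous 1:1.5
```

Markers for which the check fails across many samples are the ones Note 15 advises against using for copy number.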
References
1. Tautz D, Renz M (1984) Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res 12:4127–4138
2. Morgante M, Rafalski A, Biddle P et al (1994) Genetic mapping and variability of 7 soybean simple sequence repeat loci. Genome 37:763–769
3. Cox R, Mirkin SM (1997) Characteristic enrichment of DNA repeats in different genomes. Proc Natl Acad Sci U S A 94:5237–5242
4. Strand M, Prolla TA, Liskay RM et al (1993) Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365:274–276
5. Tautz D (1989) Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res 17:6463–6471
6. Wolfe KH, Li WH, Sharp PM (1987) Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci U S A 84:9054–9058
7. Varshney RK, Graner A, Sorrells ME (2005) Genic microsatellite markers in plants: features and applications. Trends Biotechnol 23:48–55
8. Zane L, Bargelloni L, Patarnello T (2002) Strategies for microsatellite isolation: a review. Mol Ecol 11:1–16
9. Squirrell J, Hollingsworth PM, Woodhead M et al (2003) How much effort is required to isolate nuclear microsatellites from plants? Mol Ecol 12:1339–1348
10. Mohan M, Nair S, Bhagwat A et al (1997) Genome mapping, molecular markers and marker-assisted selection in crop plants. Mol Breed 3:87–103
11. Weber JL, May PE (1989) Abundant class of human DNA polymorphisms which can be typed using the polymerase chain-reaction. Am J Hum Genet 44:388–396
12. Devos KM, Bryan GJ, Collins AJ et al (1995) Application of 2 microsatellite sequences in wheat storage proteins as molecular markers. Theor Appl Genet 90:247–252
13. Plaschke J, Ganal MW, Roder MS (1995) Detection of genetic diversity in closely-related bread wheat using microsatellite markers. Theor Appl Genet 91:1001–1007
14. Yang GP, Maroof MAS, Xu CG et al (1994) Comparative analysis of microsatellite DNA polymorphism in landraces and cultivars of rice. Mol Gen Genet 245:187–194
15. Akkaya MS, Bhagwat AA, Cregan PB (1992) Length polymorphisms of simple sequence repeat DNA in soybean. Genetics 132:1131–1139
16. Chung SM, Staub JE, Chen JF (2006) Molecular phylogeny of Cucumis species as revealed by consensus chloroplast SSR marker length and sequence variation. Genome 49:219–229
17. Tang S, Yu JK, Slabaugh MB et al (2002) Simple sequence repeat map of the sunflower genome. Theor Appl Genet 105:1124–1136
18. Hopkins MS, Casa AM, Wang T et al (1999) Discovery and characterization of polymorphic simple sequence repeats (SSRs) in peanut. Crop Sci 39:1243–1247
19. Goldstein DB, Roemer GW, Smith DA et al (1999) The use of microsatellite variation to infer population structure and demographic history in a natural model system. Genetics 151:797–801
20. Goldstein DB, Pollock DD (1997) Launching microsatellites: A review of mutation processes and methods of phylogenetic inference. J Hered 88:335–342
21. Barrier M, Friar E, Robichaux R et al (2000) Interspecific evolution in plant microsatellite structure. Gene 241:101–105
22. Zou J, Fu DH, Gong HH et al (2011) De novo genetic variation associated with retrotransposon activation, genomic rearrangements and trait variation in a recombinant inbred line population of Brassica napus derived from interspecific hybridization with Brassica rapa. Plant J 68:212–224
23. Zhou WC, Kolb FL, Bai GH et al (2003) Validation of a major QTL for scab resistance with SSR markers and use of marker-assisted selection in wheat. Plant Breed 122:40–46
24. Young ND (1999) A cautiously optimistic vision for marker-assisted breeding. Mol Breed 5:505–510
25. Collard BCY, Mackill DJ (2008) Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos T R Soc B 363:557–572
26. Varshney RK, Nayak SN, May GD et al (2009) Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 27:522–530
27. Maniatis T, Jeffrey A, Vandesande H (1975) Chain-length determination of small double-stranded and single-stranded DNA molecules by polyacrylamide gel electrophoresis. Biochemistry 14:3787–3794
28. Imelfort M, Edwards D (2009) De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 10:609–618
29. Mason AS, Nelson MN, Castello M-C et al (2011) Genotypic effects on the frequency of homoeologous and homologous recombination in Brassica napus × B. carinata hybrids. Theor Appl Genet 122:543–553
30. Nelson MN, Mason AS, Castello M-C et al (2009) Microspore culture preferentially selects unreduced (2n) gametes from an interspecific hybrid of Brassica napus L. × Brassica carinata Braun. Theor Appl Genet 119:497–505
31. Nicolas SD, Mignon GL, Eber F et al (2007) Homeologous recombination plays a major role in chromosome rearrangements that occur during meiosis of Brassica napus haploids. Genetics 175:487–503
Chapter 7
Abstract
RFLP (Restriction Fragment Length Polymorphism) analysis is a commonly used technique for genotyping
nearly all organisms, including plants, animals, and humans. RFLP is widely used in genetic
and genomic research, such as genome mapping and gene identification. The technique involves DNA
digestion, gel electrophoresis, capillary transfer of DNA, and Southern hybridization. In this chapter, we
aim to give a detailed introduction to performing RFLP analysis for identifying genotypes.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_7, © Springer Science+Business Media New York 2015
92 Shutao Dai and Yan Long
2 Materials
3 Methods
3.1 Digesting DNA for Southern Blotting
1. Measure the concentration of DNA samples with a fluorometer
and adjust them to a concentration of 0.6–1.5 μg/μl.
2. Mix 15 μg of genomic DNA, 3 μl of 10× buffer, and 3 μl of
restriction enzyme (10 U/μl), and add autoclaved ddH2O to a
total of 35 μl for each sample (DNA plus ddH2O make up 29 μl).
Mix thoroughly by tapping the tubes several times, followed by
very brief spinning. Digest at the appropriate temperature for 8–12 h.
3. After completing the digestion, store at 4 °C, and check the
quality of the digestion with 3.5 μl (1.5 μg) of the digested DNA
on an agarose gel. Good digestion should show an even distri-
bution and clear smear (Fig. 1).
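Because the DNA stock concentration varies between samples, the volume of DNA and the volume of water needed in step 2 change together. The sketch below works through that arithmetic under one assumption that should be stated plainly: the 29 μl in step 2 is read as the combined volume of DNA and ddH2O, which is the only reading that adds up to 35 μl. The function names are hypothetical.

```python
def digest_volumes(dna_conc_ug_per_ul, dna_ug=15.0, total_ul=35.0,
                   buffer_ul=3.0, enzyme_ul=3.0):
    """Volumes (μl) for one restriction digest at a given DNA concentration."""
    dna_ul = dna_ug / dna_conc_ug_per_ul
    water_ul = total_ul - buffer_ul - enzyme_ul - dna_ul
    if water_ul < 0:
        raise ValueError("DNA is too dilute to fit in the reaction volume")
    return {
        "DNA": round(dna_ul, 2),
        "10x buffer": buffer_ul,
        "restriction enzyme": enzyme_ul,
        "ddH2O": round(water_ul, 2),
    }


def ug_in_aliquot(aliquot_ul, dna_ug=15.0, total_ul=35.0):
    """Mass of DNA (μg) contained in an aliquot of the finished digest."""
    return dna_ug * aliquot_ul / total_ul


print(digest_volumes(1.0))
print(ug_in_aliquot(3.5))  # the 3.5 μl quality-check aliquot from step 3
```

Concentrations below about 0.52 μg/μl leave no room for the buffer and enzyme, which is consistent with the 0.6 μg/μl lower limit in step 1.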
Fig. 1 Restricted DNA fragments separated by agarose gel electrophoresis and visualized by UV light
Genotyping Analysis Using an RFLP Assay 95
5. Turn the gel upside down, then cut it to the desired size
(usually 19.5 cm × 10 cm), and cut off the upper right-hand
corner of the gel to mark the direction of the electrophoresis.
6. Place a glass plate on a large plastic tray. Saturate two sheets
of filter paper (22 cm × 30 cm) with the transfer solution
(0.4 M NaOH) and place them on the glass plate to create a
wick. Roll out all air bubbles from the wick with a glass rod.
7. Carefully transfer the gel onto the wick and take care to remove
air bubbles beneath it (see Note 7).
8. Wet the Hybond-N+ membrane (19.5 cm × 10 cm) with the
transfer solution and cut off a corner. Pipette a small volume of
the solution onto the gel, and place the membrane on it,
making sure the position of the cut corner is identical to that
of the gel. Then roll out all air bubbles (see Note 8).
9. Wet two pieces of filter paper (20 cm × 10.5 cm) in the transfer
solution and place them on the top of the membrane and roll
out all air bubbles.
10. Surround the gel with plastic wrap or parafilm and place a stack
of paper towels (20 cm × 10.5 cm) on the filter paper above the
gel (see Note 9). Put a glass plate on the top of these towels
and place a 500-g weight on the glass plate.
11. Allow the transfer of DNA in the capillary transfer system to
proceed for 20–30 h (Fig. 2) (see Note 10).
12. Carefully remove the glass plate, paper towels and filter paper,
and peel the gel from the membrane (see Note 11). Mark the
position of the gel slots on the membrane with a soft lead pencil,
and transfer the membrane into 2× SSC to wash briefly so as to
remove bits of gel or particles.
13. Air dry the membrane on a sheet of filter paper, then sandwich
with two pieces of filter papers, after baking at 80–100 °C
3.3 Radioactively Labeling Probe
1. Set up the following reaction for one tube with 1–2 membranes:
2–3 μl (50–150 ng) of probe DNA, 1.5 μl of random primers
d(N)6, 1 μl of marker DNA, 2 μl of 10× dCTP mixture, 0.5 μl
of Klenow fragment enzyme, and 2 μl of Klenow buffer; add
autoclaved ddH2O to a volume of 20 μl.
2. Add probe DNA, random primers, and marker DNA into a
0.5-ml Eppendorf tube, heat in a boiling water bath for 5 min,
then quickly cool in wet ice for 5 min, spin briefly and quickly
place back on ice and go to the next stage.
3. Add the appropriate ddH2O, 10× dCTP buffer, Klenow buffer,
and Klenow fragment enzyme into the same tube in wet ice,
then mix quickly by tapping the tube and spinning briefly.
4. Take the tube into the hybridization room, add 1–2 μl of
[α-32P]dCTP to the tube, and mix thoroughly with a pipette.
5. Leave at room temperature (30 °C) for about 2–4 h.
6. Add 400 μl denaturation solution to each reaction tube, then
put in a boiling water bath for 4 min.
7. Cool in ice until use.
3.4 Hybridization with Radioactively Labeled Probe
1. Prepare the following prehybridization solution for one tube
with 1–2 membranes: 3 g of dextran sulphate, 17.55 ml of
ddH2O, 6 ml of 20× SSPE, 6 ml of 50× Denhardts, 0.3 ml
of 10 % SDS, and 150 μl of salmon sperm DNA (10 mg/ml) to a
total of 30 ml.
2. Gently dissolve the dextran sulphate in ddH2O (prewarmed to
65 °C) at room temperature, then add 20× SSPE, 50× Denhardts,
and 10 % SDS. After mixing well, heat to 65 °C in a water bath.
3. Add 150 μl of freshly boiled 10 mg/ml salmon sperm DNA or
herring testes DNA to the above solution and mix well, keeping
it at 65 °C.
4. Roll the membrane (stored in 2× SSC) into a hybridization
tube, add 30 ml of 2× SSC, and prewarm in the hybridization
oven for about 10–20 min at 65 °C. Then remove the 2× SSC,
add 30 ml of the prehybridization solution to the tube, and
eliminate air bubbles.
5. Place the tube inside a prewarmed hybridization oven to
prehybridize for approximately 8 h at 65 °C.
6. Prepare the hybridization solution in the same way as the
prehybridization solution during prehybridization, and keep it
at 65 °C: 0.6 g of dextran sulphate, 3.5 ml of ddH2O, 1.2 ml
of 20× SSPE, 1.2 ml of 50× Denhardts, 60 μl of 10 % SDS,
30 μl of Salmon sperm DNA (10 mg/ml) to a total of 6 ml.
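The hybridization solution in step 6 is the prehybridization recipe from step 1 scaled from 30 ml to 6 ml. The following is a minimal sketch of that scaling, assuming simple proportional scaling of every component; the function name `scale_recipe` and the dictionary keys are illustrative and not part of the protocol. Note that the protocol rounds the water volume to 3.5 ml, whereas exact scaling gives 3.51 ml.

```python
# Prehybridization recipe for one tube (30 ml total), as listed in step 1.
# The liquid components sum to 30 ml; dextran sulphate is weighed in grams.
PREHYB_30_ML = {
    "dextran sulphate (g)": 3.0,
    "ddH2O (ml)": 17.55,
    "20x SSPE (ml)": 6.0,
    "50x Denhardts (ml)": 6.0,
    "10 % SDS (ml)": 0.3,
    "salmon sperm DNA, 10 mg/ml (ml)": 0.15,
}


def scale_recipe(recipe, base_total_ml, target_total_ml):
    """Scale every component by target/base (illustrative helper)."""
    factor = target_total_ml / base_total_ml
    return {name: round(amount * factor, 3) for name, amount in recipe.items()}


# Scale the 30 ml recipe down to the 6 ml hybridization volume of step 6.
hyb_6_ml = scale_recipe(PREHYB_30_ML, 30.0, 6.0)

for name, amount in hyb_6_ml.items():
    print(f"{name}: {amount}")
```

The same helper can be used to prepare solution for several tubes by passing a larger target volume.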
Genotyping Analysis Using an RFLP Assay 97
Fig. 3 A photograph developed with X-ray film. The blots represent the restriction fragments
3.5 Stripping and Reprobing
Remove the probe and prepare the membrane for hybridization again.
Wash the membrane as follows (see Note 14): Wash A for 10 min, Wash B
for several minutes (see Note 15), Wash C for 20 min. After washing,
put the membrane into 2× SSC until use.
4 Notes
References
3. Helentjaris T, Slocum M, Wright S, Schaefer A, by QTL association with RFLP alleles. Theor
Nienhuis J (1986) Construction of genetic Appl Genet 88:486–489
linkage maps in maize and tomato using restric- 12. Waldron BL, Moreno-Sevilla B, Anderson JA,
tion fragment length polymorphisms. Theor Stack RW, Frohberg RC (1999) RFLP map-
Appl Genet 72:761–769 ping of QTL for Fusarium head blight resis-
4. Weber D, Helentjaris T (1989) Mapping RFLP tance in wheat. Crop Sci 39:805–811
loci in maize using B-A translocations. Genetics 13. Zimnoch-Guzowska E, Marczewski W,
121:583–590 Lebecka R, Flis B, Scha¨fer-Pregl R, Salamini
5. Ali S, Müller CR, Epplen JT (1986) DNA fin- F, Gebhardt C (2000) QTL analysis of new
ger printing by oligonucleotide probes specific sources of resistance to Erwinia carotovora ssp.
for simple repeats. Hum Genet 74:239–243 atroseptica in potato done by AFLP, RFLP,
6. Becker J, Vos P, Kuiper M, Salamini F, Heun M and resistance-gene-like markers. Crop Sci 40:
(1995) Combined mapping of AFLP and 1156–1167
RFLP markers in barley. Mol Gen Genet 14. Chartier-Hariln M-C, Parfitt M, Legrain S,
249:65–73 Pérez-Tur J et al (1994) Apolipoprotein E,
7. Bradshaw HD, Villar M, Watson BD, Otto epsilon 4 allele as a major risk factor for spo-
KG, Stewart S, Stettler RF (1994) Molecular radic early and late-onset forms of Alzheimer’s
genetics of growth and development in disease: analysis of the 19q13.2 chromosomal
Populus. III. A genetic linkage map of a hybrid region. Hum Mol Genet 3:569–574
poplar composed of RFLP, STS, and RAPD 15. Inoue N, Kawashima S, Kanazawa K, Yamada
markers. Theor Appl Genet 89:167–178 S, Akita H, Yokoyama M (1998) Polymorphism
8. Cregan PB, Jarvik T, Bush AL, Shoemaker RC, of the NADH/NADPH oxidase p22 phox
Lark KG, Kahler AL, Kaya N, VanToai TT, gene in patients with coronary artery disease.
Lohnes DG, Chung J, Specht JE (1999) An Circulation 97:135–137
integrated genetic linkage map of the soybean 16. Shindo Y, Inoko H, Yamamoto T, Ohno S
genome. Crop Sci 39:464–1490 (1994) HLA-DRB1 typing of Vogt-Koyanagi-
9. Lespinasse D, Rodier-Goud M, Grivet L, Harada's disease by PCR-RFLP and the strong
Leconte A, Legnate H, Seguin M (2000) A association with DRB1*0405 and DRB1*0410.
saturated genetic linkage map of rubber tree Br J Ophthalmol 78:223–226
(Hevea spp.) based on RFLP, AFLP, microsat- 17. Allen RW, Bliss B, Pearson A (1989)
ellite, and isozyme markers. Theor Appl Genet Characteristics of a DNA probe (pa3′HVR)
100:127–138 when used for paternity testing. Transfusion
10. Amer IMB, Worland AJ, Korzun V, Börner A 29:477–485
(1997) Genetic mapping of QTL controlling 18. Morling N, Hansen HE (1993) Paternity test-
tissue-culture response on chromosome 2B of ing with VNTR DNA systems. Int J Leg Med
wheat (Triticum aestivum L.) in relation to 105:189–196
major genes and RFLP markers. Theor Appl 19. Smith JC, Newton CR, Alves A, Anwar R,
Genet 94:1047–1052 Jenner D, Markham AF (1990) Highly poly-
11. Lark KG, Orf J, Mansur LM (1994) Epistatic morphic minisatellite DNA probes. Further eval-
expression of quantitative trait loci (QTL) in uation for individual identification and paternity
soybean [Glycine max (L.) Merr.] determined testing. J Forensic Sci Soc 30:3–18
Chapter 8
Abstract
DNA barcoding uses specific regions of DNA in order to identify species. Initiatives are taking place around
the world to generate DNA barcodes for all groups of living organisms and to make these data publicly
available in order to help understand, conserve, and utilize the world’s biodiversity. For land plants the core
DNA barcode markers are sections of two coding regions within the chloroplast: parts of the genes rbcL and
matK. In order to create high quality databases, each plant that is DNA barcoded needs to have a herbarium
voucher that accompanies the rbcL and matK DNA sequences. The quality of the DNA sequences,
the primers used, and trace files should also be accessible to users of the data. Multiple individuals
should be DNA barcoded for each species in order to check for errors and allow for intraspecific variation.
The world’s herbaria provide a rich resource of already preserved and identified material and these can be
used for DNA barcoding as well as by collecting fresh samples from the wild. These protocols describe the
whole DNA barcoding process, from the collection of plant material from the wild or from the herbarium,
how to extract and amplify the DNA, and how to check the quality of the data after sequencing.
Key words DNA barcoding, Plants, rbcL, matK, Species identification, Plant collection, Herbarium
specimens, DNA extraction, PCR
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_8, © Springer Science+Business Media New York 2015
102 Natasha de Vere et al.
For instance, barcoding approaches have been used for the verifica-
tion of plant products ranging from medicinal plants [5, 6] to
kitchen spices [7], berries [8], olive oil [9], and tea [10]. Ecological
applications have included the identification of invasive species
[11–13], characterization of below-ground plant diversity using
roots [14], and reconstruction of past vegetation and climate from
plant remains in the soil [15]. Genetic sequences obtained in the con-
text of DNA barcoding have also been used to create phylogenetic
trees for use in phylogenetic community ecology [16, 17].
These applications all depend on using regions of DNA that are
able to discriminate between species without being too variable within a
species. Following the evaluation of several candidate markers, the
Plant Working Group (PWG) of the Consortium for the Barcoding of
Life (CBOL) recommended that regions of two plastid genes, rbcL
and matK, be adopted as the standard plant DNA barcodes, with the
recognition that supplementary markers may be required [18, 19].
The use of DNA barcoding as an identification tool is also dependent
on the creation of high-quality reference databases of sequences [19].
Essential to a database is that every DNA sequence should be associ-
ated with the plant specimen from which it came; along with when,
where, and by whom it was collected and identified. This is best done
through the creation of a herbarium voucher alongside each DNA
sample, although sometimes for rare and threatened species a photo-
graph may provide a substitute [4]. The lab procedures through
which a sample is processed should also be recorded, with the primers
used, trace files and quality statistics for its DNA sequence all available
to end users of the data [4]. All data should be publicly available;
GenBank provides a repository for DNA sequences, but in addition it
is recommended to deposit data in the Barcode of Life Data System
(BOLD) [20]. BOLD provides a means of managing projects and
allows trace files, scans of herbarium specimens, and photographs to
be stored alongside DNA sequences [20].
With estimates suggesting that there may be around 380,000
land plant species in the world, composed of around 352,000
angiosperms, 1,300 gymnosperms, and 13,000 bryophytes, ferns,
and fern allies, DNA barcoding must use existing resources and
expertise efficiently in order to make an effective contribution to
cataloguing such huge diversity [21]. The herbaria of the world
provide a vast and important source of plant material that is already
identified and preserved, capturing years of taxonomic expertise [4].
Extracting DNA from herbarium specimens, however, can be more
problematic than extracting it from newly collected wild samples.
The usability of samples for DNA extraction in different herbaria
varies according to the conditions in which specimens have been
stored and how they were originally preserved. Newer material
works better than older and certain taxonomic groups work less
well than others [4, 22]. For large DNA barcode campaigns we
recommend a combined approach that uses herbarium specimens
to quickly bulk up the number of samples available for DNA
DNA Barcoding for Plants 103
2 Materials
4. Field press.
5. Drying oven that can provide a steady flow of warm air at
35–45 °C.
6. Acid-free herbarium mounting paper.
7. Herbarium labels.
8. Gummed linen strips.
9. Archival quality PVA glue.
10. Freezer.
11. Insect-proof herbarium cupboards.
2.4 DNA Extraction of Herbarium Samples in 96-Well Format
1. Commercial extraction kit, e.g., Qiagen DNeasy 96 Plant Kit.
2. Molecular biology grade 100 % ethanol.
3. Mill for tissue grinding, e.g., Qiagen TissueLyser with 96-well
plate adaptor.
4. 3 mm tungsten carbide milling beads.
5. Centrifuge with deep bucket rotor for 96-well plates, capable
of achieving 6,000 × g.
6. Pipettes; multichannel and single with associated tips.
7. Measuring cylinders and buffer reservoirs.
8. Burner for flaming.
9. Forceps.
10. Water bath set at 65 °C.
11. Proteinase K (see Note 9).
12. DTT (see Note 10).
13. Fridge and freezer.
3 Methods
3.2 Field Collecting of Plant Samples (See Note 14)
1. Before going into the field prepare the target list of species and
areas to be covered.
2. Ensure silica gel is fully dehydrated and place a tablespoon of gel
into each specimen bag. Store these in an airtight bag or con-
tainer to keep them dry; these are best prepared before going
into the field.
3. Locate target specimen. Record the species name (if known)
and assign it a collection number. This collection number will
link the DNA sample to its herbarium voucher and collection
information.
4. As a minimum, record the date, collector’s name(s), and location
(to within 100 m, or as latitude and longitude from a GPS), together
with the locality name following the spelling on the map. In addition
you may want to collect other information, for example,
life stage, habitat, associated species, topography, and aspect.
5. Collect a sample of the plant that includes all of the features
required for its identification. Often this means that flowering
material is required for angiosperms and spore-bearing fronds
for ferns. Root material may also be necessary for some species.
It is important to familiarize yourself with the key taxonomic
features of a species before collecting.
6. Label a jewelry tag with the collection code and attach it to
the plant stem or other part where it will not become easily
detached.
7. It is best to put tissue samples into silica gel immediately in the
field to dry them quickly to maximize DNA quality, especially
in hot or dry climates. Select a green, undamaged leaf from the
specimen for DNA collection and remove with scissors or by
hand. Cut or tear three 0.5 cm × 0.5 cm sections from the leaf
and place into a sample bag containing one tablespoon of dry
silica gel. Give the bag a quick shake to bury the leaf pieces,
fold the bag closed, and label it with the collection code and,
3.3 Preparation of Herbarium Specimens
1. On returning from the field, interleave the flimsies containing
plant samples with drying papers and corrugates. Place within
a press and dry within a lukewarm drying oven or in a
well-ventilated area, taking care not to overheat.
2. After 24 h, the plant should be easier to work with and can be
arranged to illustrate all of its key taxonomic features if this has
not been done when first pressed. For example, manipulate
specimens so that both upper and lower surfaces of leaves are
shown and flowers or spore structures are clearly displayed.
3. Change the drying papers regularly, normally once a day, so that
the sample dries quickly; this maximizes the chances of getting
further DNA out of the samples if subsequently required. Dry
damp papers before reusing.
4. Once the sample is dried, mount it on to a herbarium sheet
using small gummed linen strips. Use the minimum number of
strips to secure the plant safely without obscuring key features
of the specimen. Other methods of attaching specimens such
as sewing or glue can also be used, but gluing is not recom-
mended as this impedes further use of the specimen for DNA
extraction.
5. Glue a herbarium label containing all of the collection infor-
mation on to the bottom right-hand corner of the sheet.
6. Mounted specimens should be frozen at −20 °C for a minimum
of 72 h to kill any pests that might be found within the plants.
Place specimens in cardboard boxes covered with air-tight
3.4 Collecting Samples for DNA Extraction from Herbarium Specimens
Collecting samples directly from herbarium specimens for DNA
extraction is an efficient way to obtain a large number of verified
samples. The age of the specimen, how it has been preserved and
stored, and the taxonomic group all affect the likelihood of obtaining
useable DNA; it is therefore advisable to conduct a trial before
embarking on a large-scale sampling campaign. We found an approximate
10 % loss of DNA recoverability per decade, so preferentially
sample material less than 30 years old [4].
1. Prepare the target list of species for collecting. It is advisable to
arrange this with reference to the layout of the herbarium col-
lection, so that herbarium cabinets are visited in order and the
collection can be sampled as efficiently as possible.
2. Prepare labels with duplicate collection codes; these can be cut
in half, with one half stuck to the herbarium specimen to indi-
cate that it has been sampled and the other placed in the bag
with the leaf sample. Both labels need to show the collection
code as this is the number linking the DNA sample with the
herbarium specimen. Barcoded herbarium labels (i.e., not
DNA barcodes) can be used if duplicated.
3. Choose the herbarium specimen for sampling. The specimen
must be suitable for having a small section of tissue removed
(approximately 2–4 cm2) without decreasing its scientific value
and should preferably have been determined by an expert in addi-
tion to the collector. More recently collected samples work better.
Do not sample type specimens or historically important material
unless there are compelling reasons to do so. Collect multiple
samples per species from throughout the geographic target area.
4. Remove a small piece of material, around 2–4 cm2, and place
this into an airtight ziplock bag using forceps, along with the label
with the species name and collection code. Select samples from
areas of green, thin tissue which will have dried quickly and
retained DNA quality. DNA is more easily extracted from flowers
than leaves for some taxonomic groups such as Orchidaceae
and Hypericaceae. Be careful of mixed species collections on
the same herbarium sheet.
3.5 Laboratory Information Management Systems (LIMS)
Keeping track of the collected samples as they pass through the lab
procedures is a nontrivial task, especially for plants, as each sample
will be amplified multiple times to allow for successful amplification
using the two DNA barcode markers. It is possible to keep
track of samples using spreadsheets, but we recommend that
larger scale DNA barcode campaigns use some form of LIMS.
We use the Biocode PlugIn, a free tool that can be added
into the Geneious Pro bioinformatics software [24].
3.6 DNA Extraction of Herbarium Samples in 96-Well Format
There are many different methods for DNA extraction from plant
material. The method we present here uses a commercial kit
(Qiagen DNeasy 96 Plant Kit) but the protocol has been adapted
for use with herbarium specimens. It uses a 96-well format with
two 96-well plates processed per extraction (see Note 17).
1. Decide on the position of each sample to be extracted for two
96-well plates.
2. Place a 3-mm tungsten carbide bead into each sample tube
within two 96-well sample plates.
3. Add 0.5 cm2 of tissue sample to each tube using forceps. Tissue
thickness will vary between samples so consider this and aim for
an even amount of tissue across samples. When placing the
samples into the sample tubes, only open one strip of 8 lids at a
time as the samples can become statically charged and jump
about. Break or find samples of suitable size within the sample
bags to avoid particles of material moving into the lab.
4. Dip forceps in 70 % ethanol or flame between samples.
5. Extract the DNA as per the manufacturer's instructions but
with the following modifications.
6. To the 400 μl of AP1 buffer add 80 μl of DTT at 0.75 mg/ml
and 20 μl of Proteinase K at 1 mg/ml. Make up enough for all
192 samples being extracted and add 400 μl of the mixture to
each sample tube.
7. Disrupt the sample in a mill for 2 min each side. Turn the
samples around between each 2 min to allow for consistent
sample disruption.
8. After disruption, extend the incubation time in the modified
AP1 buffer to 1 h at 65 °C (see Note 18).
9. At the end of the DNA extraction extend the final incubation
stage with AE buffer to 15 min.
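The bulk volumes for the modified AP1 buffer in step 6 can be worked out in advance. The sketch below reads step 6 literally, as 400 μl AP1 buffer, 80 μl DTT, and 20 μl Proteinase K per sample, and adds a 10 % pipetting overage. The overage figure and the function name `lysis_mix_volumes_ml` are assumptions for illustration and are not specified in the protocol.

```python
import math

# Per-sample volumes in microlitres, read literally from step 6.
PER_SAMPLE_UL = {
    "AP1 buffer": 400,
    "DTT (0.75 mg/ml)": 80,
    "Proteinase K (1 mg/ml)": 20,
}


def lysis_mix_volumes_ml(n_samples, overage=0.10):
    """Return bulk volumes in ml for n_samples plus a pipetting overage.

    The 10 % default overage is an assumption, not a protocol value.
    """
    n_effective = math.ceil(n_samples * (1 + overage))
    return {name: ul * n_effective / 1000 for name, ul in PER_SAMPLE_UL.items()}


# Two 96-well plates are processed per extraction, i.e. 192 samples.
volumes = lysis_mix_volumes_ml(192)

for name, ml in volumes.items():
    print(f"{name}: {ml} ml")
```

Rounding the effective sample count up rather than down means the mix never runs short on the last column of the second plate.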
Table 1
rbcL and matK primers commonly used to amplify plant species
3.7 PCR Amplification
The method described here is for the amplification of the DNA
barcode markers rbcL and matK. It is optimized for use with
herbarium material but also works for freshly collected material
that has been stored in silica gel prior to extraction. Table 1 shows
primers commonly used for rbcL and matK. rbcL primers are gener-
ally universal, working well across a broad taxonomic range; we use
rbcLaF and rbcLr590 for the first PCR. If this fails we then use a
different reverse primer. matK is more problematic and often
requires more primer combinations, especially when using herbar-
ium material. For herbarium material we often use primers specific
to the order of flowering plants to which the sample belongs [4].
matK amplification can also sometimes be problematic for nonseed
plants and further primer development is required for these [21].
1. Table 2 details the components required for PCR. Decide on
the number of samples to be amplified and make up a master
mix with each component (except for the DNA) in the quanti-
ties required plus a little extra for pipetting errors. Include a
water control to test for contaminants.
2. Add 18 μl of master mix into each PCR tube either individually
or using a multichannel pipette if using a 96-well PCR plate.
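Scaling a master mix as described in step 1 can be sketched as follows. The per-reaction component names and volumes below are placeholders only, chosen so that they sum to the 18 μl dispensed in step 2; they are not the values from Table 2 and must be replaced with the real ones. The function name `master_mix` and the choice of two extra reactions for pipetting loss are likewise assumptions.

```python
def master_mix(per_reaction_ul, n_samples, n_controls=1, extra_reactions=2):
    """Return (volume per tube, total volume of each component).

    n_controls covers the water control from step 1; extra_reactions is
    an assumed allowance for pipetting error.
    """
    per_tube = sum(per_reaction_ul.values())
    n = n_samples + n_controls + extra_reactions
    totals = {name: vol * n for name, vol in per_reaction_ul.items()}
    return per_tube, totals


# PLACEHOLDER volumes in microlitres (hypothetical, not from Table 2).
example = {
    "2x PCR premix": 10.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "water": 7.0,
}

# 95 samples plus one water control fill a 96-well PCR plate.
per_tube, totals = master_mix(example, n_samples=95)

print(f"master mix per tube: {per_tube} ul")
for name, vol in totals.items():
    print(f"{name}: {vol} ul")
```

Checking that the per-tube total equals the dispensed volume (18 μl here) before pipetting is a cheap guard against a mistyped component volume.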
Table 2
PCR components required in order to amplify rbcL and matK
3.8 Gel Electrophoresis
There are many shapes and sizes of gel support and combs; this
method can be used to run a 96-well plate of samples at one time
(see Note 20).
1. Plan the order in which samples will be added to the agarose gel.
2. Assemble walls of gel support or cover with two pieces of
masking tape each side.
3. Make a 1 % gel by weighing out 1.3 g of agarose into a conical
flask and adding 130 ml of 1× TAE buffer. These quantities suit
a gel support of 16 × 17 cm.
4. Heat in a microwave on medium power until bubbling
(this should take around 3 min). Check that all of the agarose
has fully dissolved; it should be clear with no thread-like
appearance.
5. Cool by placing the conical flask under a cold running tap,
swirl to allow even cooling but do not shake as this will add in
air bubbles. Cool until comfortable to touch.
6. Add 3 μl of SYBRsafe dye and swirl until incorporated.
7. Pour into gel support, avoiding air bubbles or layers and insert
combs. Four combs with 30 wells each allow a 96-well PCR
plate to be run at one time.
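The gel quantities above follow from simple arithmetic: a 1 % (w/v) gel needs 1 g of agarose per 100 ml of buffer, and four 30-well combs give 120 wells for 96 samples. A minimal sketch for other gel sizes is given below; the function names and the assumption of one ladder lane per comb are illustrative and not taken from the protocol.

```python
import math


def agarose_grams(percent_w_v, buffer_ml):
    """Mass of agarose for a gel of the given % (w/v) and buffer volume."""
    return percent_w_v / 100 * buffer_ml


def combs_needed(n_samples, wells_per_comb, ladder_lanes_per_comb=1):
    """Number of combs required, reserving lanes for a size ladder.

    One ladder lane per comb is an assumption for illustration.
    """
    usable = wells_per_comb - ladder_lanes_per_comb
    return math.ceil(n_samples / usable)


grams = agarose_grams(1, 130)
combs = combs_needed(96, 30)

print(f"agarose: {grams:.2f} g")
print(f"combs: {combs}")
```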
3.9 DNA Sequencing
For DNA barcoding, samples should be Sanger sequenced in both
directions so each PCR plate will result in two sequencing plates.
DNA can be sequenced using the same primers that were used for
PCR. DNA sequencing technology is developing rapidly and many
of the applications of DNA barcoding make use of next generation
sequencing approaches. Sanger sequencing is, at the time of writ-
ing, still an appropriate tool for the creation of reference DNA
barcode libraries due to its accuracy and long read length, but this
may change in the future.
We recommend the use of a commercial provider for DNA
sequencing and purification. Shop around for the best price and do
negotiate for discounts when a large number of samples are going
to be sequenced.
3.10 Manual Editing, Alignment, and Data Checks
There are a number of software packages available that can be used
for manual editing, for example, Sequencher, Geneious, and
CodonCode Aligner; we use Sequencher. Rather than give specific
details of methods for particular software packages, we provide
here a guideline to what is required for the manual editing and
checking process.
1. Download the AB1 files from the DNA sequencer into a folder
containing the forward and reverse reads. Depending on how
DNA sequencing was carried out, the files may need to be
renamed to make it easier to construct the contigs.
2. The sequences must be trimmed to remove low quality bases at
the beginning and end of the sequencing read. A standard set of
trim criteria should be used. We use a 25 bp window and remove
sequence with >2 bp showing a quality score (QV) <20.
3. The forward and reverse reads now need to be assembled into
a contig. Sequencher provides methods for assembly by name
allowing the process to be automated.
4. After you have your assembled contigs, manually edit each
one. First, check that one of the reading frames is free of stop
codons. Then check the amount of overlap in the forward and
reverse read; this should be greater than 50 % for DNA barcode
sequences (see Note 24).
5. View the contig consensus and the traces for each read. Search
for contig disagreements and if any are found examine the for-
ward and reverse sequences to see if the reason is clear. If it is,
then change the base to the correct letter. If the disagreement is
not clear then change the consensus to N for that base. Also
search for ambiguous bases. Check each ambiguous base; leave
if the call is acceptable, change to the correct letter or N if not.
6. Trim the primers from the consensus; sometimes, depending
on the amount of trimming, only part of the primer will be left
or none at all (see Note 25).
7. Provide a summary of the quality of the sequences. Quality
statistics can include the amount of bidirectional read; mean
QV of sequences; the percentage of high (QV >30) and low
quality (QV <20) bases, and the number of internal gaps
and substitutions when aligning the forward and reverse reads
(see Note 26).
8. When manual editing is complete, export the successful con-
sensus sequences as FASTA files.
9. Manual editing of individual sequences may not always spot
extra bases inserted or missed during the base-calling process
and may not always spot sequences in reverse complement. We
align the sequences to highlight these errors and also to prepare
sequences for downstream analysis. For rbcL we use Clustal W
in MEGA [25]; for matK we use transAlign [26].
10. Completed alignments may require some manual editing; rbcL
does not contain indels but matK does and manual editing can
improve the final alignment. We use MEGA, BioEdit, and
Mesquite for manually editing alignments (see Note 27).
11. In the final alignment, scan through and check for any
sequences that appear misplaced, often due to a missed or extra
base being called. If the whole sequence appears incorrect then
sometimes it is the reverse complement. Check multiple copies
of the same species; these should be similar.
12. If all the sequences in the alignment used the same forward and
reverse primers then the primers can be trimmed at this stage
instead of individually during manual editing of the contigs.
13. We use the alignment to make guide trees to look for obvi-
ously misplaced taxa as these can represent mislabeling or con-
taminants. Before making the tree, trim the sequences to a
constant length. We use MEGA to make Neighbor-joining
trees using Kimura-2-parameter and 1,000 bootstrap replicates
(see Note 28).
14. View the neighbor-joining tree for obviously misplaced samples.
Samples of the same species should appear close together but
may not necessarily form a monophyletic group. This approach
can help spot contaminants and errors within very different
taxonomic groups but will not help if contaminants are close
relatives.
15. Once the data is checked it can be exported in FASTA format and
uploaded, along with scans of herbarium vouchers, collection
data, and trace files, to BOLD [20].
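The end-trimming rule in step 2 (a 25 bp window, removing sequence with more than 2 bases of QV <20) can be sketched as below. This is one possible reading of that rule, not the algorithm used by Sequencher or any other package: the read is kept from the start of the first window that passes to the end of the last window that passes. The function name `trim_read` is hypothetical.

```python
def trim_read(quals, window=25, max_low=2, min_qv=20):
    """Return (start, end) of the retained region, end exclusive.

    A window passes if it holds at most max_low bases with QV < min_qv.
    Returns (0, 0) if no window passes or the read is shorter than one
    window.
    """
    n = len(quals)
    low = [q < min_qv for q in quals]
    passing = [
        i for i in range(0, n - window + 1)
        if sum(low[i:i + window]) <= max_low
    ]
    if not passing:
        return (0, 0)
    return (passing[0], passing[-1] + window)


# Toy read: 10 poor bases, 100 good bases, 10 poor bases.
quals = [10] * 10 + [40] * 100 + [10] * 10
start, end = trim_read(quals)
print(start, end)
```

Under this reading up to two low-quality bases can remain at each end of the trimmed read, because a window containing exactly two such bases still passes.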
4 Notes
1. Silica gel is a drying agent; wear gloves when handling and use
in a well-ventilated area. We find it preferable to use small balls
rather than powder as this generates less dust. The silica gel must
be fully dehydrated before use and should be stored in an airtight
box or bag to prevent it absorbing moisture from the air.
2. We use glassine paper negative bags in which to collect specimens
as these can be purchased in suitable sizes so that a tablespoon
of silica gel can be added into each bag. They can be purchased
from photographic suppliers. Alternatively plastic ziplock bags
can be used or 2 ml tubes with screw top lids.
3. Traditionally a metal carrying case called a vasculum is used to
collect herbarium samples until they can be pressed. This keeps
them cool and prevents them from becoming damaged. This,
however, is rather bulky to carry and plastic carrier bags may be
used instead.
4. Field collecting notebooks can be used to record specimens
but if a large collecting campaign is planned it is advisable to use
a Field Information Management System (FIMS) [27].
24. As rbcL and matK are coding regions of DNA, stop codons
should not be found within the sequences. A genuine stop
codon within the sequence can indicate that the gene is no
longer functional and therefore a pseudogene has been
sequenced; this is quite rare for rbcL and matK, but if it does
occur that sequence should be removed. More frequently stop
codons are caused by a base being added or missed during base-
calling which throws the reading frame out; this can often be
repaired during the manual editing process.
25. The consensus will be shown in the 5′–3′ direction so the
reverse primer will need to be reverse complemented before
searching for it.
26. The CBOL Plant Working Group defines high quality sequences
as those in which the forward and reverse reads each have
a minimum length of 100 bp and a mean QV above 30, with
posttrim lengths greater than 50 % of the original
read length. The assembled contig should have >50 % overlap
in the alignment of the forward and reverse reads, with <1 %
low-quality bases (QV <20) and <1 % internal gaps and substitutions
when aligning the forward and reverse reads [18].
27. rbcL typically does not contain indels in the alignment. The
only cases where we have found indels are when the alignment
contains parasitic plants. Plants that are completely parasitic do
not need a functional rbcL gene, so greater sequence variation
is possible due to the relaxation of selection pressure on this
region [30].
28. If using MEGA to make the alignment and Neighbor-joining
trees then note that the final alignment will need to be saved as
a .meg file so that it can be opened in the data explorer module
of MEGA.
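The checks described in Notes 24 and 25 can be sketched in a few lines: finding which reading frames of a consensus are free of stop codons, and reverse complementing a reverse primer before searching for it in the 5′–3′ consensus. This is a minimal illustration using the standard stop codons TAA, TAG, and TGA; the function names and the toy sequences are hypothetical.

```python
# Complement table for unambiguous bases and N; other IUPAC ambiguity
# codes are left unchanged by str.translate.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def frames_without_stops(seq):
    """Return the forward reading frames (0, 1, 2) with no stop codon."""
    seq = seq.upper()
    clean = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            clean.append(frame)
    return clean


# Toy consensus: frame 0 reads ATG GCT TAA GGC and so contains a stop.
consensus = "ATGGCTTAAGGC"
clean_frames = frames_without_stops(consensus)

# Toy reverse primer; search for its reverse complement in the consensus.
primer_site = consensus.find(reverse_complement("GCCTTA"))

print(clean_frames, primer_site)
```

If no frame is free of stop codons, the contig is a candidate for the base-calling repair or removal described in Note 24.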
References
1. Hebert PDN, Cywinska A, Ball SL et al (2003) Biological identifications through DNA barcodes. Proc R Soc Lond B Biol Sci 270:313–321
2. Hebert PDN, Gregory TR (2005) The promise of DNA barcoding for taxonomy. Syst Biol 54:852–859
3. Chase MW, Fay MF (2009) Barcoding of plants and fungi. Science 325:682–683
4. de Vere N, Rich TCG, Ford CR et al (2012) DNA barcoding the native flowering plants and conifers of Wales. PLoS One 7:e37945
5. Asahina H, Shinozaki J, Masuda K et al (2010) Identification of medicinal Dendrobium species by phylogenetic analyses using matK and rbcL sequences. J Nat Med 64:133–138
6. Chen S, Yao H, Han J et al (2010) Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One 5:e8613
7. De Mattia F, Bruni I, Galimberti A et al (2011) A comparative study of different DNA barcoding markers for the identification of some members of Lamiacaea. Food Res Int 44:693–702
8. Jaakola L, Suokas M, Haggman H (2010) Novel approaches based on DNA barcoding and high-resolution melting of amplicons for authenticity analyses of berry species. Food Chem 123:494–500
9. Kumar S, Kahlon T, Chaudhary S (2011) A rapid screening for adulterants in olive oil using DNA barcodes. Food Chem 127:1335–1341
10. Stoeckle MY, Gamble CC, Kirpekar R et al (2011) Commercial teas highlight plant DNA barcode identification successes and obstacles. Sci Rep 1:42
11. Bleeker W, Klausmeyer S, Peintinger M et al (2008) DNA sequences identify invasive alien Cardamine at Lake Constance. Biol Conserv 141:692–698
12. Saunders GW (2009) Routine DNA barcoding of Canadian Gracilariales (Rhodophyta) reveals the invasive species Gracilaria vermiculophylla in British Columbia. Mol Ecol Resour 9:140–150
13. Van De Wiel CCM, Van Der Schoot J, Van Valkenburg JLCH et al (2009) DNA barcoding discriminates the noxious invasive plant species, floating pennywort (Hydrocotyle ranunculoides L.f.), from non-invasive relatives. Mol Ecol Resour 9:1086–1091
14. Kesanakurti PR, Fazekas AJ, Burgess KS et al (2011) Spatial patterns of plant diversity below-ground as revealed by DNA barcoding. Mol Ecol 20:1289–1302
15. Sonstebo JH, Gielly L, Brysting AK et al (2010) Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Mol Ecol Resour 10:1009–1018
16. Kress WJ, Erickson DL, Jones FA et al (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci U S A 106:18621–18626
17. Kress WJ, Erickson DL, Swenson NG et al (2010) Advances in the use of DNA barcodes to build a community phylogeny for tropical trees in a Puerto Rican forest dynamics plot. PLoS One 5:e15409
18. CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci U S A 106:12794–12797
19. Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a plant DNA barcode. PLoS One 6:e19254
20. Ratnasingham S, Hebert PDN (2007) BOLD: the barcode of life data system (www.barcodinglife.org). Mol Ecol Notes 7:355–364
21. Fazekas A, Kuzmina ML, Newmaster SG et al (2012) DNA barcoding methods for land plants. In: Kress WJ, Erickson DL (eds) Springer protocols methods in molecular biology 858 DNA barcodes methods and protocols. Springer, New York, pp 223–252
22. Särkinen T, Staats M, Richardson JE et al (2012) How to open the treasure chest? Optimising DNA extraction from herbarium specimens. PLoS One 7:e43808
23. Bridson D, Forman L (1998) The herbarium handbook, 3rd edn. Royal Botanic Gardens Kew, London
24. Parker M, Stones-Havas S, Starger C et al (2012) Laboratory information management systems for DNA barcoding. In: Kress WJ, Erickson DL (eds) Springer protocols methods in molecular biology 858 DNA barcodes methods and protocols. Humana, New York, pp 269–310
25. Tamura K, Peterson D, Peterson N et al (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731–2739
26. Bininda-Emonds ORP (2005) Transalign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 6:156
27. Deck J, Gross J, Stones-Havas S et al (2012) Field information management systems for DNA barcoding. In: Kress WJ, Erickson DL (eds) Springer protocols methods in molecular biology 858 DNA barcodes methods and protocols. Humana, New York, pp 255–267
28. Kreader CA (1996) Relief of amplification inhibition in PCR with bovine serum albumin or T4 gene 32 protein. Appl Environ Microbiol 62:1102–1106
29. Ivanova NV, Fazekas AJ, Hebert PDN (2008) Semi-automated, membrane-based protocol for DNA isolation from plants. Plant Mol Biol Rep 26:186–198
30. Wolfe AD, dePamphilis CW (1998) The effect of relaxed functional constraints on the photosynthetic gene rbcL in photosynthetic and nonphotosynthetic parasitic plants. Mol Biol Evol 15:1243–1258
31. Dunning LT, Savolainen V (2010) Broad-scale amplification of matK for DNA barcoding plants, a technical note. Bot J Linn Soc 164:1–9
32. Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One 2:e508
33. Fazekas AJ, Burgess KS, Kesanakurti PR et al (2008) Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One 3:e2802
34. Fay MF, Swensen SM, Chase MW (1997) Taxonomic affinities of Medusagyne oppositifolia (Medusagynaceae). Kew Bull 52:111–120
35. Ford CS, Ayres KL, Toomey N et al (2009) Selection of candidate coding DNA barcoding regions for use on land plants. Bot J Linn Soc 159:1–11
36. Cuenoud P, Savolainen V, Chatrou LW et al (2002) Molecular phylogenetics of Caryophyllales based on nuclear 18S rDNA and plastid rbcL, atpB, and matK DNA sequences. Am J Bot 89:132–144
Chapter 9
Abstract
Digital gene expression (DGE) analysis is a cost-effective method for large-scale quantitative transcriptome
analysis using second generation sequencing. Here we describe how adaptation of DGE with barcode
indexing in large segregating plant populations of over 100 genotypes can be applied for successful
expression QTL (eQTL) and gene expression network analysis to develop transcript-based markers for
breeding.
Key words Digital gene expression analysis, SAGE, RNA-Seq, Genetical genomics, eQTL
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_9, © Springer Science+Business Media New York 2015
120 Christian Obermeier et al.
2 Materials
2.2 DNase Digestion
1. RQ1 RNase-free DNase (Promega, Madison, WI, USA).
2. Phenol/Chloroform/Isoamyl Alcohol (25:24:1, v/v, Life
Technologies, Carlsbad, CA, USA).
3. 3 M sodium acetate, pH 5.5, store at 4 °C (Carl Roth,
Karlsruhe, Germany).
4. Ethanol, 95 % (Carl Roth, Karlsruhe, Germany).
5. Ethanol, 70 % (Carl Roth, Karlsruhe, Germany).
6. Autoclaved deionized water (Merck Millipore, Billerica, MA,
USA).
2.5 Second-Strand cDNA Synthesis
1. E. coli DNA Polymerase (10 U/μl), store at −20 °C (Fermentas
International Inc./Thermo Scientific, Burlington, Canada).
2. E. coli RNase H (5 U/μl), store at −20 °C (Fermentas
International Inc./Thermo Scientific, Burlington, Canada).
3. Mussel glycogen (20 ng/μl), store at −20 °C (Roche
Diagnostics Deutschland GmbH, Mannheim, Germany).
4. Bovine Serum Albumin, BSA (10 mg/ml), store at −20 °C
(New England Biolabs Inc., Ipswich, MA, USA).
5. Second-Strand Buffer (10×), store at −20 °C (Fermentas
International Inc./Thermo Scientific, Burlington, Canada).
6. 0.5 M EDTA, pH 8.0, store at room temperature (Carl Roth,
Karlsruhe, Germany).
7. Wash Buffer C: 5 mM Tris–HCl, pH 7.5, 0.5 mM EDTA, 1 M
NaCl, 0.1 % SDS (see Note 6), 10 μg/ml mussel glycogen,
store at 4 °C (Carl Roth, Karlsruhe, Germany).
8. Wash Buffer D: 5 mM Tris–HCl, pH 7.5, 0.5 mM EDTA, 1 M
NaCl, 200 μg/ml BSA, store at 4 °C (Carl Roth, Karlsruhe,
Germany).
9. NEBuffer 3 (10×): 1 M NaCl, 500 mM Tris–HCl, 100 mM
MgCl2, 10 mM Dithiothreitol, pH 7.9, store at −20 °C (New
England Biolabs Inc., Ipswich, MA, USA).
10. NEBuffer 4 (10×): 500 mM potassium acetate, 200 mM Tris-
acetate, 100 mM magnesium acetate, 10 mM Dithiothreitol,
pH 7.9, store at −20 °C (New England Biolabs Inc., Ipswich,
MA, USA).
11. NEBuffer 3 (1×): diluted from 10× stock (1 M NaCl, 500 mM
Tris–HCl, 100 mM MgCl2, 10 mM Dithiothreitol, pH 7.9),
store at −20 °C (New England Biolabs Inc., Ipswich, MA, USA).
12. NEBuffer 4 (1×): diluted from 10× stock (500 mM potassium
acetate, 200 mM Tris-acetate, 100 mM magnesium acetate,
10 mM Dithiothreitol, pH 7.9), store at −20 °C (New England
Biolabs Inc., Ipswich, MA, USA).
GEX1b_GACT: 5′-P-GATCGTCGGAGACTCTGTAGAACTCTGAAC-3′
GEX1a_CGTA: 5′-ACAGGTTCAGAGTTCTACAGCGTATCCGAC-3′
GEX1b_CGTA: 5′-P-GATCGTCGGACGTACTGTAGAACTCTGAAC-3′
GEX1a_TCAG: 5′-ACAGGTTCAGAGTTCTACAGTCAGTCCGAC-3′
GEX1b_TCAG: 5′-P-GATCGTCGGATCAGCTGTAGAACTCTGAAC-3′
2. T4 DNA Ligase (5 U/μl), store at −20 °C (Life Technologies,
Carlsbad, CA, USA).
3. Ligase Buffer (5×), store at −20 °C (Life Technologies,
Carlsbad, CA, USA).
4. LoTE: 3 mM Tris–HCl, pH 7.5, 0.2 mM EDTA, store at 4 °C
(Carl Roth, Karlsruhe, Germany).
5. Autoclaved deionized water (see Note 2) or DEPC-treated
water (Invitrogen, Carlsbad, CA, USA).
6. Wash Buffer D: 5 mM Tris–HCl, pH 7.5, 0.5 mM EDTA, 1 M
NaCl, 200 μg/ml BSA, store at 4 °C (Carl Roth, Karlsruhe,
Germany; New England Biolabs Inc., Ipswich, MA, USA).
3 Methods
3.1 Isolation of Total RNA
1. Grind 200 mg of plant material stored at −80 °C to a fine powder
in a precooled mortar with pestle using liquid nitrogen.
2. Transfer the sample into a precooled 2-ml microcentrifuge
tube, using a precooled spatula. Avoid thawing of plant mate-
rial and transfer tubes to −20 °C until a manageable set of sam-
ples is ground.
3. Add 1 ml of cold (4 °C) TRIzol reagent and vortex for 30 s.
4. Incubate the homogenized samples for 5 min at room
temperature.
5. Add 0.2 ml of chloroform and mix by inverting by hand for 15 s.
6. Incubate samples for 2–3 min at room temperature.
7. Centrifuge samples at 12,000 × g for 15 min at 4 °C (see Note 8).
8. Transfer the supernatants (aqueous phases) very carefully into
a fresh set of 2 ml microcentrifuge tubes.
9. Add 0.5 ml of ice cold (−20 °C) isopropanol to samples, mix
well by inversion.
10. Incubate samples at room temperature for 10 min.
11. Centrifuge samples at 12,000 × g for 10 min at 4 °C.
12. Discard the supernatants and wash the RNA pellets once with
1 ml of 75 % ethanol.
13. Centrifuge samples at 12,000 × g for 5 min at 4 °C.
14. Discard the supernatant and let the pellets dry at room tem-
perature for approx. 10 min under a fume hood or use a
SpeedVac.
15. Dissolve the total RNA from each sample in 80 μl of
RNase-free water.
16. Store RNA samples at −80 °C.
17. Check total RNA quality by agarose gel electrophoresis (1 %)
of an aliquot of the samples (10 μl) (see Note 9).
18. Estimate RNA concentration and check quality by using a
Nanodrop spectrophotometer. Extraction of high quality
Digital Gene Expression Analysis 129
8. Centrifuge briefly to collect any beads that may stick to the cap
of the tube.
9. Return the tubes to the magnetic stand for 2 min and carefully
remove the supernatant.
10. Repeat steps 4–9 once.
11. Resuspend the beads in 50 μl of Lysis/Binding buffer.
12. Adjust a total amount of 50 μg of the total DNase-treated
RNA to 50 μl with water.
13. Heat the RNA solutions to 65 °C for 5 min to disrupt any
secondary structure, place on ice.
14. After removing the Lysis/Binding buffer from the equilibrated
oligo dT beads (step 11 above) add 50 μl of total RNA sample
to the beads.
15. Mix the sample well by flicking the tubes and firmly close and
place into a 50-ml Falcon tube stuffed with a Kimwipe.
16. Mix the beads and RNA by slowly rocking the tube on a rock-
ing platform or vortexing the tubes intermittently on a slow
vortex, for 30 min at room temperature. Check in between
that beads do not sediment.
17. Place the tube on a magnetic stand for 2 min, then carefully
remove and discard the supernatant.
18. Wash the beads on the magnetic stand with 200 μl of Wash
Buffer B.
19. Wash the beads a second time with 200 μl of Wash Buffer.
3.4 First-Strand cDNA Synthesis
1. Wash the beads four times with 100 μl of 1× First-Strand Buffer
(for SuperScript III Reverse Transcriptase). On the fourth
wash do not remove the supernatant.
2. Mix the following reagents for the first-strand cDNA synthesis
with a total volume of 38 μl in a new siliconized (nonstick)
RNase-free 1.5 ml tube on ice: 8 μl First-Strand Buffer (5×),
0.5 μl RNaseOUT (4 U/μl), 4.5 μl DTT (0.1 M), 2 μl dNTP
mix (10 mM), 3 μl betaine (5 M), 20 μl water.
3. Carefully remove the fourth wash from the beads and immedi-
ately add 38 μl of first-strand cDNA synthesis mix to the beads.
Mix gently by flicking the tube without causing the beads to
stick to the upper inner walls or lid of the tubes.
4. Centrifuge briefly to collect any beads that may stick to the
inner cap of the tubes.
5. Place the tubes at 42 °C for 2 min to equilibrate the reagents.
6. Add 2 μl of SuperScript III Reverse Transcriptase (200 U/μl).
7. Incubate on a thermomixer at 42 °C for 1 h, mixing for 30 s,
130 × g at 10 min intervals. Make sure the beads do not
sediment.
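The per-reaction volumes in step 2 can be scaled to a batch of samples. The following is a minimal sketch in Python; the function name and the 10 % pipetting overage are illustrative assumptions and are not part of the protocol.

```python
# Per-reaction volumes (μl) of the first-strand mix, copied from step 2.
FIRST_STRAND_MIX_UL = {
    "First-Strand Buffer (5x)": 8.0,
    "RNaseOUT": 0.5,
    "DTT (0.1 M)": 4.5,
    "dNTP mix (10 mM)": 2.0,
    "betaine (5 M)": 3.0,
    "water": 20.0,
}

def scale_master_mix(per_reaction_ul, n_samples, overage=0.10):
    """Return the total volume of each reagent for n_samples reactions.

    overage is the extra fraction prepared to cover pipetting loss
    (hypothetical default; adjust to local practice).
    """
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction_ul.items()}

# Sanity check against the 38 μl total stated in step 2.
per_reaction_total = sum(FIRST_STRAND_MIX_UL.values())

batch = scale_master_mix(FIRST_STRAND_MIX_UL, n_samples=12)
for reagent, volume in batch.items():
    print(f"{reagent}: {volume} μl")
```

Summing the per-reaction volumes first is a quick way to catch transcription errors before a large batch is prepared.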
3.6 Digestion of cDNA With DpnII
1. Add the following reagents directly to the tube containing
89 μl of cDNA magnetic bead suspension for a total volume of
100 μl: 10 μl NEBuffer 3 (10×), 1 μl DpnII (10 U/μl).
2. Place the bottle containing Wash Buffer C in a water bath set
at 37 °C (see Note 15).
3.8 Cleaving with the Tagging Enzyme MmeI
1. Prepare a fresh dilution of 10× SAM (400 μM) from the NEB
supplied SAM (32 mM). 20 μl of 10× SAM is needed per
sample.
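The SAM dilution in step 1 follows the usual relation C1 × V1 = C2 × V2. A minimal sketch in Python is given below; the function name is hypothetical, and the batch size of eight samples is an arbitrary illustration.

```python
def stock_volume_ul(stock_conc_um, target_conc_um, final_volume_ul):
    """Volume of stock needed so that C1 * V1 = C2 * V2 (concentrations in μM)."""
    if target_conc_um > stock_conc_um:
        raise ValueError("target concentration exceeds stock concentration")
    return target_conc_um * final_volume_ul / stock_conc_um

# 32 mM stock = 32,000 μM; 20 μl of 400 μM working solution per sample.
per_sample_stock = stock_volume_ul(32_000, 400, 20)
per_sample_water = 20 - per_sample_stock

# Sub-microlitre volumes are difficult to pipette accurately, so preparing
# the dilution for several samples at once may be more practical.
batch_stock = stock_volume_ul(32_000, 400, 20 * 8)
batch_water = 20 * 8 - batch_stock

print(per_sample_stock, per_sample_water, batch_stock, batch_water)
```

The 1:80 dilution factor is the same in both cases; only the absolute volumes change.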
18. Incubate the tubes at −20 °C for at least 30 min and microcen-
trifuge for 30 min at maximum speed at 4 °C.
19. Carefully remove the supernatants.
20. Wash the pellets three times with 1 ml of cold 70 % ethanol,
centrifuge for 15 min.
21. Dry the pellets at room temperature for approx. 10 min under
a fume hood or use a SpeedVac. Do not overdry.
22. Resuspend the pellets in 6 μl of water and incubate for 10 min
to aid in solubilization.
3.9 Ligation of GEX Adapter 2
1. Prepare double-stranded DNA adapter GEX2 by mixing two
complementary oligonucleotides in equal concentrations (1:1,
1.5 μM each single-stranded oligonucleotide). Anneal by
heating to 95 °C for 2 min, then cool slowly to 4 °C over an hour
in a thermal cycler, resulting in a 0.75 μM double-stranded
GEX Adapter 2 solution.
2. Set up the following adapter ligation on ice directly into the
tube containing the 6 μl of DNA solution (Subheading 3.8)
for a total volume of 10 μl: 1 μl GEX Adapter 2 (0.75 μM),
2 μl Ligase Buffer (5×), 1 μl T4 DNA ligase (5 U/μl).
3. Seal the lid of the tube with parafilm and incubate overnight
at 16 °C.
3.10 Enrichment of Adapter-Ligated cDNA by PCR and Purification from Gel
1. Prepare a PCR Master Mix and distribute in wells of a 96-well
PCR plate. The master mix for one reaction is 22.5 μl: 16 μl of
water, 5 μl of Phusion HF buffer (5×), 0.25 μl of GEX1_PCR_1
primer (25 μM), 0.25 μl of GEX_PCR_2 primer (25 μM),
0.75 μl of dNTPs (10 mM), 0.25 μl of Phusion Hot
Start DNA Polymerase (2 U/μl).
2. Add 2.5 μl of GEX Adapter 2-ligated cDNA to each well
(25 μl total volume).
3. Amplify in a thermal cycler using the following program: 30 s
at 98 °C; 13 cycles of 10 s at 98 °C, 30 s at 60 °C, and 15 s at
72 °C; 10 min at 72 °C; hold at 4 °C.
4. When preparing libraries for the first time, also include a dilution
series of GEX Adapter 2-ligated cDNA for amplification (1:5,
1:10, 1:20, 1:100, 1:200) and compare amplification patterns
with expected fragment sizes and yield by polyacrylamide gel
electrophoresis. Expected sizes are 93 bp for the targeted
GEX1-tag-GEX2 fragment and smaller sizes for artifacts, including
76 bp for the GEX1-GEX2 adapter ligation, 30 bp for the GEX1
adapter, and 23 bp for the GEX2 adapter fragment, plus PCR primer
dimers. Identify the dilution that gives the highest yield of the
93 bp fragment relative to the nontargeted fragments, and use it
for excision and purification of the 93 bp fragments.
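The comparison in step 4 amounts to choosing the dilution with the largest share of signal in the 93 bp band. The sketch below assumes that band intensities have been quantified by densitometry; the intensity values are invented placeholders, not measured data, and the function name is hypothetical.

```python
TARGET_BP = 93  # size of the GEX1-tag-GEX2 fragment

# Hypothetical band intensities (arbitrary units) keyed by fragment size in bp.
# These numbers are invented for illustration only.
gel_bands = {
    "1:5":   {93: 120, 76: 300, 30: 80},
    "1:10":  {93: 200, 76: 150, 30: 60},
    "1:20":  {93: 180, 76: 60, 30: 40},
    "1:100": {93: 40, 76: 10, 30: 30},
}

def target_fraction(bands, target_bp=TARGET_BP):
    """Fraction of the total lane signal found in the target band."""
    total = sum(bands.values())
    return bands.get(target_bp, 0) / total if total else 0.0

# Pick the dilution whose lane has the highest share of target signal.
best_dilution = max(gel_bands, key=lambda d: target_fraction(gel_bands[d]))
print(best_dilution, round(target_fraction(gel_bands[best_dilution]), 2))
```

A relative measure alone can favor a lane with very little product, so the absolute intensity of the 93 bp band should be checked as well before a dilution is chosen.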
Fig. 1 Amplification products from four different DGE-DpnII libraries (1–4) loaded
on a 12 % polyacrylamide gel. Staining was done using SYBR Green I Nucleic
Acid Gel Stain (Lonza). M = 25 bp size marker. Correctly sized fragments containing
a tag should be 93 bp and can be excised from the gel before sequencing
17. Add 100 μl of 1× gel elution buffer to the gel debris in the
2 ml tubes.
18. Elute the DNA by incubating at 65 °C for 1 h.
19. Transfer the eluate and the gel debris to the top of a Spin-X filter.
20. Centrifuge the filter for 5 min at 14,000 rpm (13,000 × g) at
room temperature.
21. Add 2 μl of glycogen, 20 μl of 3 M NaOAc, and 650 μl of cold
95 % ethanol.
22. Mix by vortexing and store the tubes at −20 °C for 5 min.
23. Centrifuge at 14,000 rpm (13,000 × g) for 20 min.
24. Discard the supernatants, leaving the pellets intact.
25. Wash the pellets two times with 1 ml 70 % ethanol, room
temperature.
26. Discard the supernatants leaving the pellets intact.
27. Dry the pellets at room temperature for approx. 10 min under
a fume hood, or use a SpeedVac. Do not overdry.
28. Resuspend the pellets in 10 μl of Qiagen elution buffer
(QIAGEN PCR purification kit).
29. Let the tubes sit for 10 min at room temperature. Store at −20 °C.
4 Notes
Acknowledgements
References
1. Jansen RC, Nap J-P (2001) Genetical genom- (2002) Using the transcriptome to annotate
ics: the added value from segregation. Trends the genome. Nat Biotechnol 19:508–512
Genet 17:388–391 5. Matsumura H, Reich S, Ito A, Saitoh H,
2. De Koning D-J, Haley CS (2005) Genetical Kamoun S, Winter P, Kahl G, Reuter R, Krüger
genomics in humans and model organisms. DH, Terauchi R (2003) Gene expression anal-
Trends Genet 21:377–381 ysis of plant host–pathogen interactions by
3. Velculescu VE, Zhang L, Vogelstein B, Kinzler SuperSAGE. Proc Natl Acad Sci U S A
KW (1995) Serial analysis of gene expression. 100:15718–15723
Science 270:484–487 6. Torres T, Metta M, Ottenwälder B, Schlötterer
4. Saha S, Sparks AB, Rago C, Akmaev V, Wang C (2008) Gene expression profiling by massively
CJ, Vogelstein B, Kinzler KW, Velculescu VE parallel sequencing. Genome Res 18:172–177
140 Christian Obermeier et al.
7. Kahl G, Molina C, Rotter B, Jüngling R, Frank 12. Morrissy S, Zhao Y, Delaney A, Asano J, Dhalla N,
A, Krezdorn N, Hoffmeier K, Winter P (2012) Li I, McDonald H, Pandoh P, Prabhu A, Tam A,
Reduced representation sequencing of plant Hirst M, Marra M (2010) Digital gene expression by
stress transcriptomes. J Plant Biochem tag sequencing on the Illumina Genome Analyzer.
Biotechnol. doi:10.1007/s13562-012-0129-y Curr Protoc Hum Genet 11(11):1–11.11.36
8. Zheng W, Chung LM, Zhao H (2011) Bias 13. Morrissy AS, Morin RD, Delaney A, Zeng T,
detection and correction in RNA-sequencing McDonald H, Jones S, Zhao Y, Hirst M, Marra
data. BMC Bioinformatics 12:290 MA (2009) Next-generation tag sequencing for
9. Gowda M, Wang GL (2008) Robust- cancer gene expression profiling. Genome Res
LongSAGE (RL-SAGE): an improved 19:1825–1835
LongSAGE method for high-throughput tran- 14. Invitrogen (2010) I-SAGE™ Long Kit. For con-
scriptome analysis. Methods Mol Biol 387: structing Long SAGE™ (serial analysis of gene
25–38 expression) libraries. Version D, 19 October
10. Obermeier C, Hosseini B, Friedt W, Snowdon 2010, 25-0656. http://tools.invitrogen.com/
R (2009) Gene expression profiling via content/sfs/manuals/isagelong_man.pdf .
LongSAGE in a non-model plant species: a Accessed 26 Jun 2012
case study in seeds of Brassica napus. BMC 15. Gatti DM, Shabalin AA, Lam T-C, Wright FA,
Genomics 10:295 Rusyn I, Nobel AB (2009) FastMap: Fast
11. Illumina (2007) Preparing samples for digital eQTL mapping in homozygous populations.
gene expression-tag profiling with DpnII. Bioinformatics 25:482–489
http://illumina.bioinfo.ucr.edu/ht/docu- 16. Shabalin AA (2012) Matrix eQTL: ultra fast
mentation/molbiol-docs/DGE-DpnII- eQTL analysis via large matrix operations.
Sample-Prep.pdf. Accessed 26 Jun 2012 Bioinformatics 28:1353–1358
Chapter 10
Abstract
Heteroduplex-based genotyping methods have proven to be technologically effective and economically
efficient for low- to medium-range throughput single-nucleotide polymorphism (SNP) determination. In this
chapter we describe two protocols that were successfully applied for SNP detection and haplotype analysis
of candidate genes in association studies. The protocols involve (1) enzymatic mismatch cleavage with
endonuclease CEL1 from celery, associated with fragment separation using capillary electrophoresis
(CEL1 cleavage), and (2) differential retention of the homo/heteroduplex DNA molecules under partial
denaturing conditions on ion pair reversed-phase liquid chromatography (dHPLC). Both methods are
complementary since dHPLC is more versatile than CEL1 cleavage for identifying multiple SNPs per target
region, and the latter is easily optimized for sequences with fewer SNPs or small insertion/deletion
polymorphisms. Besides, CEL1 cleavage is a powerful method to localize the position of the mutation when
fragment resolution is done using capillary electrophoresis.
Key words Heteroduplex analysis, SNP, CEL1 cleavage, Capillary electrophoresis, dHPLC,
Genotyping, Candidate genes
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_10, © Springer Science+Business Media New York 2015
142 Norma Paniego et al.
Fig. 1 Outline of SNP detection by heteroduplex analysis. Amplified regions of different alleles (allele A and B)
are mixed in equimolecular proportions and subjected to a heating and cooling process to enable the formation
of homoduplex and heteroduplex molecules. In dHPLC, heteroduplex molecules elute earlier than the homodu-
plex because of their reduced melting temperature. In CEL1, a DNA automatic analyzer enables the detection
of labeled fragments corresponding to cleaved heteroduplex and homoduplex molecules. Reproduced from [9]
with permission from Springer Science + Business Media
2 Materials
2.1 Heteroduplex Formation
1. High quality DNA from plant tissue (see Note 1).
2. A set of individuals of known genotype for each candidate gene:
target amplicons from a small panel of homozygous individuals
of the species of interest are used as reference to make homo-
and heteroduplex samples for the analysis and optimization.
3. Primer pairs for PCR amplification of target regions (10 μM
working dilution) (see Note 2).
4. PCR components: 50 mM MgCl2, 10 mM dNTP, PCR buffer,
high yield high fidelity DNA polymerase or equivalent (see Note 3).
5. A 96/384-well thermocycler with adjustable ramp and
touch-down settings.
6. DNA/PCR amplicon quantification system: Hoechst 33258
or Picogreen®, double-stranded DNA standard, microtiter
plate fluorometer (e.g., Gemini from Molecular Probes or
equivalent) (see Note 4).
7. DNA/PCR amplicon quality visualization system: bromophenol
blue/xylene cyanol gel loading buffer, ethidium bromide
stained agarose gel, agarose gel electrophoresis apparatus, and
gel visualization system (see Note 5).
2.2 CEL1 Cleavage
1. CEL1 juice extract (CJE). Partially purified extract obtained
according to the protocol described by Till et al. [7, 8]
(see Note 6).
2. CEL1 reaction buffer 10×: 5 ml 1 M MgSO4, 5 ml 1 M HEPES,
pH = 7.5, 2.5 ml 2 M KCl, 0.1 ml 10 % Triton® X-100, 5 μl
20 mg/ml bovine serum albumin, 37.5 ml deionized water.
3. Heating block.
4. CEL1 stop solution: 0.15 M EDTA, pH = 8.
5. Absolute ethanol and 75 % ethanol.
6. Deionized water.
7. Centrifuge.
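The component concentrations of the 10× CEL1 reaction buffer follow from the volumes listed in item 2. The sketch below computes them in Python; the variable and function names are illustrative, and the "1×" values are nominal tenfold dilutions rather than concentrations measured in the assay.

```python
# Recipe from item 2: (volume added in ml, stock concentration, unit).
recipe = {
    "MgSO4": (5.0, 1000.0, "mM"),
    "HEPES pH 7.5": (5.0, 1000.0, "mM"),
    "KCl": (2.5, 2000.0, "mM"),
    "Triton X-100": (0.1, 10.0, "%"),
    "BSA": (0.005, 20000.0, "ug/ml"),  # 5 μl of 20 mg/ml
}
WATER_ML = 37.5

# Total volume is the water plus every component added.
total_ml = WATER_ML + sum(vol for vol, _, _ in recipe.values())

def final_concentration(volume_ml, stock_conc, total_volume_ml):
    """Concentration after dilution: stock * (volume added / total volume)."""
    return stock_conc * volume_ml / total_volume_ml

for name, (vol, stock, unit) in recipe.items():
    conc_10x = final_concentration(vol, stock, total_ml)
    print(f"{name}: {conc_10x:.3g} {unit} in 10x buffer, "
          f"{conc_10x / 10:.3g} {unit} at nominal 1x")
```

With these volumes the 10× buffer contains roughly 100 mM each of MgSO4, HEPES, and KCl, about 0.02 % Triton X-100, and about 2 μg/ml BSA.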
2.4 dHPLC
1. Agilent series 1100 HPLC system (Agilent Technologies Inc.,
CA, USA) or equivalent including biocompatibility kit, binary
pump equipped with solvent degasser unit, autosampler with
cooling module, column oven, and variable wavelength
detector.
2. dHPLC Column (Varian Helix™ DNA or Transgenomic
DNASep column).
3. Buffer A: 100 mM TEAA, pH 7.0, 0.1 mM EDTA (see Note 7).
4. Buffer B: 100 mM TEAA, pH 7.0, 0.1 mM EDTA, 25 % (v/v)
acetonitrile (see Note 7).
5. pUC18 Hae III digested (Sigma Chemical Company, MO,
USA, Cat. No.: D6293).
6. dHPLC Melt Program available at the Stanford University
webpage: http://insertion.stanford.edu/melt.html. The sensi-
tivity of the heteroduplex analysis using dHPLC is maximized
by maintaining the HPLC column at a temperature that favors
partial strand denaturation in the presence of base-pair mis-
matches. The “dHPLC Melt program” or similar allows opti-
mal temperature selection for mutation detection from an in
silico reference sequence (see Note 8).
7. 384-well microplates (recommended Greiner Bio-One, 384
well microplate, low volume, HiBase, clear, Cat. No.: 784101
compatible with Agilent HPLC autosampler).
8. Application software (HPCHEM Agilent or equivalent).
3 Methods
3.2 CEL1 Cleavage Assay
1. Add 0.2 μl of CJE and 2 μl of 10× CEL1 reaction buffer to each
tube containing mix 1, mix 2, mix 3 and mix 4 (see Note 10).
2. Incubate samples in a heating block at 45 °C for 15 min.
3. Place samples on ice and stop the reaction with 5 μl of CEL1
stop solution.
4. Add 2.5 volumes of absolute ethanol to precipitate DNA mol-
ecules. Incubate at 20 °C for 30 min and then centrifuge at
3,600 × g for 45 min. Wash the pellet with 200 μl of 75 %
ethanol. Repeat centrifugation step. Discard supernatant. Dry
and resuspend the pellet in 5 μl of deionized water.
3.3 Capillary Electrophoresis of CEL1 Treated Products
1. Dilute the final CEL1 treated product (5 μl) with 10 μl of
Hi-Di formamide and 0.25 μl of GeneScan 500 (-250) ROX size
standard.
2. Heat samples at 95 °C for 5 min.
3. Transfer samples to ice for 5 min.
4. Inject samples into 3130xl Genetic Analyzer.
3.5 Genotype Calling
1. The comparison procedure should be done using only samples
from the same candidate gene. Mix 1 samples will produce
heteroduplex profiles similar to those of mix 2 whenever the
corresponding reference and uncharacterized individuals have
SNP Genotyping by Heteroduplex Analysis 147
Fig. 2 Detection of SNPs from sunflower candidate genes (1-ACCO, MADSB-TF3, LIM, CPSI, and AALP) with
capillary electrophoresis CEL1 cleavage and dHPLC. (a) Electropherograms illustrating heteroduplex (upper
panel) and homoduplex (bottom panel) profiles for 1-ACCO and MADSB-TF3 genes. Gray bars indicate the
cleavage or homoduplex product/s at the corresponding base-pair position. The y axes are in Fluorescence
Units and x axes are in base pairs. (b) dHPLC elution profiles of LIM, CPSI, and AALP genes. Heteroduplex
(upper chromatogram) and homoduplex molecules (bottom chromatogram) obtained at 0.9 ml/min in 5 min of
running (x axes). y-Axes are in milli-Absorbance Units (mAU) that correlate with milli-Volt Units. Reproduced
from [9] with permission from Springer Science + Business Media
mix 4 samples should also show only one peak. Samples made
of a single PCR product which show two or more peaks should
be subjected to further analysis to evaluate heterozygosity or
contamination.
4 Notes
Acknowledgments
References
1. Appleby N, Edwards D, Batley J (2009) New technologies for ultra-high throughput genotyping in plants. In: Somers DJ, Langridge P, Gustafson JP (eds) Plant genomics. Humana, New Hampshire, pp 19–40
2. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379
3. Thomson MJ, Zhao K, Wright M et al (2011) High-throughput single nucleotide polymorphism genotyping for breeding applications in rice using the BeadXpress platform. Mol Breed 29:875–886
4. Fusari CM, Di Rienzo JA, Troglia C et al (2012) Association mapping in sunflower for sclerotinia head rot resistance. BMC Plant Biol 12:93
5. Till BJ, Zerr T, Comai L, Henikoff S (2006) A protocol for TILLING and Ecotilling in plants and animals. Nat Protoc 1:2465–2477
6. Xiao W, Oefner PJ (2001) Denaturing high-performance liquid chromatography: a review. Hum Mutat 17:439–474
7. Till BJ, Colbert T, Tompa R et al (2003) High-throughput TILLING for functional genomics. In: Grotewold E (ed) Plant functional genomics: methods and protocols. Humana, New Hampshire, pp 205–220
8. Till BJ, Burtner C, Comai L, Henikoff S (2004) Mismatch cleavage by single-strand specific nucleases. Nucleic Acids Res 32:2632–2641
9. Fusari CM, Lia VV, Nishinakamasu V, Zubrzycki JE, Puebla AF, Maligne AE, Hopp HE, Heinz RA, Paniego NB (2010) Single nucleotide polymorphism genotyping by heteroduplex analysis in sunflower (Helianthus annuus L.). Mol Breed 28:73–89
Chapter 11
Abstract
Identifying DNA variations associated with important agronomic traits is a major focus for plant biologists
today. Modern crop breeders use molecular markers widely as tools for selecting new varieties more rapidly
and efficiently. High-Resolution Melting (HRM) is frequently selected as the method of choice to rapidly
and cost effectively detect and genotype SNPs. These SNPs can be used for gene mapping studies and
routinely by breeders.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_11, © Springer Science+Business Media New York 2015
152 David Chagné
SNPs rapidly and cost effectively, for gene mapping studies and SNP
validation, as well as for developing markers in a form that plant
breeders can use routinely.
[Figure 1 labels: PCR product; Homozygous; SNP; Heteroduplex]
Fig. 1 Principle of the high-resolution melting technique. Fast melting and reannealing promotes the formation
of heteroduplex PCR products for heterozygous individuals. Such heteroduplexes are less stable than homo-
duplexes and melt at a lower temperature
Application of the High-Resolution Melting Technique… 153
All markers were amplified using the same PCR conditions
(see Note 10). The high-resolution melting analysis was performed
immediately after the PCR amplification, with 25 acquisitions per
degree Celsius.
All three markers were polymorphic (Fig. 3); however, different
segregation types were observed, depending on the genotype of
the parents. In the first example, a backcross type segregation was
observed (Fig. 3a), with two genotypes and melting curves observed
in the progeny. The second type of segregation where both parents
were heterozygous and had the same genotype yielded three geno-
types in the progeny (Fig. 3b). Both homozygous genotypes were
discriminated with this marker. Most interestingly, the HRM tech-
nique detected genotypic differences resulting from a more com-
plex segregation pattern, where two SNPs were located within the
PCR amplicon, yielding four different genotypes in the progeny
(Fig. 3c). As all three types of segregation are common in outcross-
ing and highly heterozygous species, this experiment demonstrates
the usefulness of the HRM technique for detecting and genotyping
SNPs in plants.
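The three segregation types described above can be reproduced by enumerating the haplotype combinations expected from a cross. The following is a minimal sketch in Python; the allele letters are hypothetical placeholders and do not correspond to the apple SNPs, and the function name is illustrative.

```python
from itertools import product

def progeny_classes(parent1, parent2):
    """Distinct unordered haplotype pairs expected among the progeny.

    Each parent is a tuple of its two haplotypes across the amplicon;
    a haplotype is a string with one allele per SNP.
    """
    return {tuple(sorted(pair)) for pair in product(parent1, parent2)}

# Allele letters below are hypothetical placeholders.
backcross_type = progeny_classes(("A", "G"), ("A", "A"))    # one parent heterozygous
intercross_type = progeny_classes(("A", "G"), ("A", "G"))   # both parents heterozygous
two_snp_type = progeny_classes(("AC", "GC"), ("AC", "AT"))  # two SNPs in one amplicon

for label, classes in [("backcross", backcross_type),
                       ("intercross", intercross_type),
                       ("two SNPs", two_snp_type)]:
    print(label, len(classes), sorted(classes))
```

The number of genotype classes is an upper bound on the number of distinguishable melting curves, since two classes can only be called separately if their melting profiles actually differ.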
5 Orthologous Markers
Fig. 3 High-Resolution Melting profile for three SNP markers in a segregating population of apple seedlings.
PCR products were amplified over the two parents (top left-hand wells of the first column for each marker) and
14 individuals of the F1 progeny (“M.9” × “Robusta 5”). Distinctive melting profiles were obtained where the
marker was present in two (a), three (b), or four (c) genotypes in the progeny. The underlying SNP genotypes
of both parents and the progeny are shown alongside each melting profile
6 Conclusion
7 Notes
Acknowledgements
References
1. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, 3. Liew M, Pryor R, Palais R, Meadows C, Erali M,
Kawamoto K, Buckler ES, Mitchell SE (2011) Lyon E, Wittwer C (2004) Genotyping of sin-
A robust, simple genotyping-by-sequencing gle-nucleotide polymorphisms by high-resolution
(GBS) approach for high diversity species. PLoS melting of small amplicons. Clin Chem 50:
One 6:e19379 1156–1164
2. Liew M, Nelson L, Margraf R, Mitchell S, Erali M, 4. Montgomery J, Wittwer CT, Kent JO, Zhou LM
Mao R, Lyon E, Wittwer C (2006) Genotyping of (2007) Scanning the cystic fibrosis transmem-
human platelet antigens 1 to 6 and 15 by high- brane conductance regulator gene using high-
resolution amplicon melting and conventional resolution DNA melting analysis. Clin Chem
hybridization probes. J Mol Diagn 8:97–104 53:1891–1898
Application of the High-Resolution Melting Technique… 159
5. Newcomb RD, Crowhurst RN, Gleave AP, 6. Celton J-M, Tustin DS, Chagné D, Gardiner SE
Rikkerink EHA, Allan AC, Beuning LL, Bowen (2009) Construction of a dense genetic linkage
JH, Gera E, Jamieson KR, Janssen BJ et al map for apple rootstocks using SSRs developed
(2006) Analyses of expressed sequence tags from from Malus ESTs and Pyrus genomic sequences.
apple. Plant Physiol 141:147–166 Tree Genet Genomes 5:93–107
Chapter 12
Abstract
Most plant species are known to be either ancient or recent polyploids, containing more than one genome
as a result of past interspecific hybridization events (allopolyploidy) and/or genome doubling (autopoly-
ploidy). Genotyping in polyploid species offers a set of unique challenges. Most molecular marker meth-
odologies are made more complex by polyploidy, as multilocus alleles are generally produced when a single
locus is targeted. Genotyping by sequencing is also more challenging in polyploids, with problematic
assemblies of duplicated regions and difficulties in distinguishing between inter- and intragenomic poly-
morphisms. Strategies for identifying and overcoming the challenges of polyploidy in plant genotyping
are proposed.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_12, © Springer Science+Business Media New York 2015
162 Annaliese S. Mason
[Figure 1 diagram: Ch. 1 homologues carry alleles a B C D and a b c d; Ch. 2 homologues carry alleles A B D and A B d; Ch. 1 and Ch. 2 are homoeologues]
Fig. 1 Cartoon of possible locus configurations in a 2n = 4 species with ancestral polyploidy and hence genomic
duplications between Chromosome 1 (Ch. 1) and Chromosome 2 (Ch. 2). Alleles “A” and “a” represent a false
polymorphism, whereby Ch. 1 and Ch. 2 have homoeologous homozygous loci, but marker assays may detect
a single polymorphic locus. Alleles “B” and “b” may constitute a useful polymorphic locus on Ch. 1, but only in
codominant marker assays. Alleles “C” and “c” represent a polymorphic chromosome-specific locus, the most
desirable type of locus in a species with genome duplication. Alleles “D” and “d” may only be detectable as
being present at multiple loci due to distorted allele segregation ratios or through sequence validation
4.1 Investigating the Origin of the Polyploidy Event Through Phylogenetic Analysis
In some species and marker systems, the effects of polyploidy can
be partially or fully resolved by the addition of different types of
control samples. In some cases, it can be very helpful to include
species related to the species of interest in the marker assays. In
Brassica, the allopolyploid species B. napus (canola, 2n = AACC),
B. juncea (Indian mustard, 2n = AABB), and B. carinata
(2n = BBCC) contain the A, B, and C genomes essentially
unrearranged relative to the genomes of the extant diploid species
B. rapa (2n = AA), B. oleracea (2n = CC), and B. nigra (2n = BB).
Hence, inclusion of the diploid species in assays provides valuable
information about genome location in the allopolyploids. For
example, an allele observed in B. napus (2n = AACC) and B. rapa
(2n = AA) but not B. oleracea (2n = CC) may be presumed to be
located in the Brassica A genome. Progenitor species and hence
genomic controls for bread wheat (2n = AABBDD) also exist, such
as durum wheat (2n = AABB) and D genome grass species.
However, this level of genomic relationship and hence ability to
control for genome location in polyploids is unfortunately uncom-
mon. Regardless, in many species with hitherto unsuspected poly-
ploidy the interrogation of related species will still yield valuable
insights into the nature of the polyploidy event. Many polyploid
species are actually hybrids (or allopolyploids) between related spe-
cies, and tracking these origins will be invaluable for further geno-
typing, for example in designing subgenome specific primers in the
hybrid species. Investigation of more distantly related species may
also yield information as to how recent the polyploidy event was in
the history of the species, and hence what the expected level of
genome duplication is. If a whole genus or family exhibits the same
polyploid phenotype, the polyploidy event may be predicted to be
less recent than if a single species displays genome duplication.
Genotyping using less variable markers, such as RFLPs, may be
particularly useful in elucidating these relationships. More highly
variable markers such as SSRs and SNPs may be of more limited
utility in phylogenic analyses, but provide far more useful infor-
mation within species populations and species complexes.
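The diploid-control logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `infer_subgenome` and the presence/absence input format are hypothetical, and a real assay would also need to handle missing data and alleles shared between subgenomes.

```python
def infer_subgenome(present_in, progenitors):
    """Return the subgenome letters an allele is consistent with.

    present_in  -- set of species names in which the allele was scored present
    progenitors -- mapping of diploid control species -> genome letter
    (Both the function and the input format are hypothetical.)
    """
    return sorted({genome for species, genome in progenitors.items()
                   if species in present_in})


# Diploid Brassica species and their genomes, as listed in the text.
BRASSICA_DIPLOIDS = {"B. rapa": "A", "B. oleracea": "C", "B. nigra": "B"}

# Allele seen in B. napus and B. rapa but not in B. oleracea.
print(infer_subgenome({"B. napus", "B. rapa"}, BRASSICA_DIPLOIDS))
```

An empty result means that none of the diploid controls carried the allele, and more than one letter means that the controls cannot separate the subgenomes for that allele.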
4.2 Mapping
Mapping populations are an extremely useful genotyping tool, and
this is especially true in polyploid species. Production of a mapping
population involves crossing two genetically distinct parent indi-
viduals to produce an F1 individual or population, then detection
of allelic cosegregation in resulting progeny. In plants, the process
can be facilitated in many species by production of homozygous
parent lines and by use of microspore culture on the F1 to
immediately examine gametic segregation in the first generation
using doubled-haploid microspore-derived plants. In the case of
markers that produce multiple alleles at different homeologous
loci, creation of a linkage map can allow each allele to be mapped
to a separate subgenome and genomic location. Segregation of
two alleles in a mapping population also offers definite evidence
that both alleles are present at the same locus. Hence, for species
where little genomic information is present, production of a map-
ping population can be invaluable in resolving complexities caused
by genome duplication.
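As a rough illustration of the segregation argument, the sketch below checks whether two alleles scored in a doubled-haploid population behave as alternatives at one locus: every plant should carry exactly one of the two, in a ratio close to 1:1. The input format, the function names, and the use of a simple chi-square goodness-of-fit test at the 5 % level are assumptions made for this example, not part of the protocol.

```python
CHI2_CRIT_DF1_P05 = 3.841  # chi-square critical value for 1 df, alpha = 0.05


def chi_square_1to1(n_first, n_second):
    """Chi-square statistic for a 1:1 expected ratio of two classes."""
    total = n_first + n_second
    if total == 0:
        raise ValueError("no scored individuals")
    expected = total / 2
    return ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected


def same_locus_evidence(scores):
    """scores: list of (has_allele_1, has_allele_2) booleans, one per DH plant."""
    only_1 = sum(1 for a, b in scores if a and not b)
    only_2 = sum(1 for a, b in scores if b and not a)
    both = sum(1 for a, b in scores if a and b)
    neither = sum(1 for a, b in scores if not a and not b)
    # Alleles at a single locus are mutually exclusive in doubled haploids.
    exclusive = both == 0 and neither == 0
    chi2 = chi_square_1to1(only_1, only_2) if (only_1 + only_2) else float("inf")
    return {"mutually_exclusive": exclusive,
            "chi2": chi2,
            "fits_1_to_1": exclusive and chi2 < CHI2_CRIT_DF1_P05}
```

Plants carrying both alleles, or a ratio that departs strongly from 1:1, would instead suggest that the two "alleles" come from different homeologous loci.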
4.3 Validation and Sequencing
Sequencing alleles used for genotyping analysis is a common process
in the production of some marker types, such as SSRs [24] and
SNPs [21]. In this approach, the actual DNA sequence of the allele
and surrounding region is validated by sequencing. Poor match of
surrounding sequences (e.g., flanking sequences of SSRs) between
alleles is indicative of amplification of two homeologous loci, rather
than amplification of two alleles from a single locus. How easily
sequence mismatches are detected depends on the age of the polyploidy
event: in recent polyploids, high conservation of homeologous sequence
is expected, so mismatches may be few.
Use of markers localized to noncoding DNA sequences may be
more helpful in this instance, as coding regions of DNA sequence
(such as may be provided by RNAseq technology) may be expected
to be more conserved between homeologous regions. However, as
DNA and RNA sequencing becomes more common and easily
accessible, availability of genomic sequence information against
which to compare sequenced marker alleles is improving. In particu-
lar, genotyping by sequencing approaches provide a ready means of
screening for allelic haplotypes in populations, and hence differenti-
ating “normal” homologous variation between alleles at a single
locus from homeologous variation between duplicated loci.
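The flanking-sequence comparison can be illustrated with a small Python helper. It assumes the two flanking sequences have already been aligned to equal length (with "-" for gaps); the 98 % identity cut-off is an arbitrary placeholder chosen for this sketch and would need to be calibrated for each species, especially in recent polyploids.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over two aligned, equal-length sequences ('-' = gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column is a gap in both sequences: ignore it
        compared += 1
        if a == b:
            matches += 1  # a gap opposite a base counts as a mismatch
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def likely_homeologous(flank_a, flank_b, min_identity=98.0):
    """Flag allele pairs whose flanks match poorly (threshold is an assumption)."""
    return percent_identity(flank_a, flank_b) < min_identity
```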
5 Summary
References
1. Jiao YN, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE et al (2011) Ancestral polyploidy in seed plants and angiosperms. Nature 473:97–100
2. Leitch AR, Leitch IJ (2008) Genomic plasticity and the diversity of polyploid plants. Science 320:481–483
3. Soltis DE, Buggs RJA, Doyle JJ, Soltis PS (2010) What we still don't know about polyploidy. Taxon 59:1387–1403
4. Harlan JR, DeWet JMJ (1975) On Ö. Winge and a prayer: the origins of polyploidy. Bot Rev 41:361–390
5. Leitch IJ, Bennett MD (2004) Genome downsizing in polyploid plants. Biol J Linn Soc 82:651–663
6. Schranz ME, Lysak MA, Mitchell-Olds T (2006) The ABC's of comparative genomics in the Brassicaceae: building blocks of crucifer genomes. Trends Plant Sci 11:535–542
7. Nelson MN, Parkin IAP, Lydiate DJ (2011) The mosaic of ancestral karyotype blocks in the Sinapis alba L. genome. Genome 54:33–41
8. Wang XW, Wang HZ, Wang J, Sun RF, Wu J, Liu SY et al (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1040
9. Mason AS, Nelson MN, Castello M-C, Yan G, Cowling WA (2011) Genotypic effects on the frequency of homoeologous and homologous recombination in Brassica napus × B. carinata hybrids. Theor Appl Genet 122:543–553
10. Nelson MN, Mason AS, Castello M-C, Thomson L, Yan G, Cowling WA (2009) Microspore culture preferentially selects unreduced (2n) gametes from an interspecific hybrid of Brassica napus L. × Brassica carinata Braun. Theor Appl Genet 119:497–505
11. Zou J, Fu DH, Gong HH, Qian W, Xia W, Pires JC et al (2011) De novo genetic variation associated with retrotransposon activation, genomic rearrangements and trait variation in a recombinant inbred line population of Brassica napus derived from interspecific hybridization with Brassica rapa. Plant J 68:212–224
12. Imelfort M, Edwards D (2009) De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 10:609–618
13. Chen S, Nelson MN, Chèvre AM, Jenczewski E, Li Z, Mason AS et al (2011) Trigenomic bridges for Brassica improvement. Crit Rev Plant Sci 30:524–547
14. Veilleux RE, Lauer FI (1981) Variation for 2n pollen production in clones of Solanum phureja Juz. and Buk. Theor Appl Genet 59:95–100
15. Mason AS, Yan GJ, Cowling WA, Nelson MN (2012) A new method for producing allohexaploid Brassica through unreduced gametes. Euphytica 186:277–287
16. Zohary D, Nur U (1959) Natural triploids in the orchard grass, Dactylis glomerata L., polyploid complex and their significance for gene flow from diploid to tetraploid levels. Evolution 13:311–317
17. Mallet J (2007) Hybrid speciation. Nature 446:279–283
18. Petit C, Bretagnolle F, Felber F (1999) Evolutionary consequences of diploid-polyploid hybrid zones in wild species. Trends Ecol Evol 14:306–311
19. Imelfort M, Duran C, Batley J, Edwards D (2009) Discovering genetic polymorphisms in next-generation sequencing data. Plant Biotechnol J 7:312–317
20. Duran C, Appleby N, Edwards D, Batley J (2009) Molecular genetic markers: discovery, applications, data storage and visualisation. Curr Bioinform 4:16–27
21. Hayward A, Mason AS, Dalton-Morgan J, Zander M, Edwards D, Batley J (2012) SNP discovery and applications in Brassica napus. J Plant Biotechnol 39:49–61
22. Lai KT, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S et al (2012) Single nucleotide polymorphism discovery from wheat next-generation sequence data. Plant Biotechnol J 10:743–749
23. Oliver RE, Lazo GR, Lutz JD, Rubenfield MJ, Tinker NA, Anderson JM et al (2011) Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology. BMC Genomics 12
24. Squirrell J, Hollingsworth PM, Woodhead M, Russell J, Lowe AJ, Gibby M et al (2003) How much effort is required to isolate nuclear microsatellites from plants? Mol Ecol 12:1339–1348
Chapter 13
Abstract
We report the development of a simple genomic reduction protocol based on 454-pyrosequencing
technology that discovers large numbers of single nucleotide polymorphisms (SNP) from pooled DNA
samples. The method is based on the conservation of restriction endonuclease sites across samples and
biotin separation for genomic reduction and the addition of multiplex identifier (MID) barcodes to each
of the pooled samples to allow for postsequencing deconvolution of the pooled DNA fragments and
SNP discovery.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_13, © Springer Science+Business Media New York 2015
170 Peter J. Maughan et al.
2 Materials
2.1 Restriction Digest
1. 150 ng/μL genomic DNA for each sample. DNA should be
extracted using a standard DNA extraction protocol that produces
high quality DNA (260/280 nm ratio: 1.8–2.0). Care
should be taken to correctly ascertain DNA concentrations
(UV absorbance or fluorometry).
2. Restriction endonucleases. EcoRI (20,000 U/mL); BfaI
(5,000 U/mL) (New England Biolabs, Beverly, MA)
(see Notes 2–3).
3. 10× NEBuffer 4 (500 mM potassium acetate, 200 mM Tris–
acetate, 100 mM Magnesium Acetate, 10 mM Dithiothreitol,
pH 7.9; New England Biolabs, Beverly, MA) (see Note 4).
4. Nuclease free water.
5. Thermal cycler or heat control water bath programmable to
37 °C.
6. PCR strip tubes with caps.
2.3 Ligation
1. T4 DNA Ligase (400,000 cohesive end units/mL; see Note 6)
(New England Biolabs, Beverly, MA).
2. 10× T4 DNA Ligase Buffer (500 mM Tris–HCl, 100 mM
MgCl2, 10 mM ATP, 100 mM Dithiothreitol, pH 7.5; New
England Biolabs, Beverly, MA).
3. EcoRI DS adapter (5 μM) and BfaI DS adapter (50 μM) as
prepared in Subheading 3.2.
4. Thermal cycler programmable to 16 °C.
Table 1
MID barcode primer pairs
3 Methods
3.1 Restriction Digest
1. In a PCR strip tube prepare a double digestion for each DNA
sample as described in Table 2. If more than one DNA
sample is processed, a cocktail of the nuclease free water,
restriction enzymes, and NEBuffer 4 can be prepared (mixed
thoroughly) and subdivided into the separate PCR strip tubes
containing the DNA for each sample.
2. Cap and gently mix each sample thoroughly by finger flicking.
3. Incubate the samples at 37 °C for 1 h (see Note 10).
3.2 Adapter Preparation
Adapters can be prepared ahead of time or during the restriction
digestion period.
1. In separate 1.5 mL microfuge tubes prepare the EcoRI and
BfaI double-stranded adapters by combining the components
in Table 3 for each of the adapters.
Table 2
Double digest preparation
Table 3
Adapter preparation for (a) EcoRI Double-Stranded Adapter and
(b) BfaI double-stranded adapter
Table 4
Ligation mix preparation

Ligation mix               Volume
5 μM EcoRI DS adapter      3.0 μL
50 μM BfaI DS adapter      3.0 μL
10× ligase buffer          1.0 μL
T4 ligase                  0.5 μL
Nuclease free water        2.5 μL
Total                      10.0 μL
2. The total volume per reaction is now 50 μL. Cap and gently
mix each sample thoroughly.
3. Incubate the samples at 16 °C for 3 h (see Note 10).
4. Following the ligation, add 25 μL of nuclease free water to
each sample to bring the final volume to 75 μL (used in step 5
of Subheading 3.4).
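When several samples are processed, the ligation mix in Table 4 is usually prepared as a cocktail. The following Python sketch scales the per-reaction volumes to a given number of samples; the 10 % pipetting overage and the function name `scale_mix` are assumptions made for this example and are not specified in the protocol.

```python
# Per-reaction ligation mix volumes (in microliters) taken from Table 4.
LIGATION_MIX_UL = {
    "5 uM EcoRI DS adapter": 3.0,
    "50 uM BfaI DS adapter": 3.0,
    "10x ligase buffer": 1.0,
    "T4 ligase": 0.5,
    "Nuclease free water": 2.5,
}


def scale_mix(per_reaction, n_samples, overage=0.10):
    """Scale a per-reaction recipe to n_samples plus a pipetting overage.

    The default 10 % overage is an assumed value, not part of the protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {name: round(volume * factor, 2)
            for name, volume in per_reaction.items()}


if __name__ == "__main__":
    for name, volume in scale_mix(LIGATION_MIX_UL, n_samples=8).items():
        print(f"{name}: {volume} uL")
```

Each sample still receives 10 μL of the cocktail, so the volumes stated in the steps above (50 μL after adding the mix, 75 μL after adding 25 μL of water) are unchanged.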
3.4 Size Exclusion via Spin Chromatography
1. Invert a Chroma Spin™-400 column several times to resuspend
the gel matrix completely. One column per sample is
required.
2. Holding the column upright, grasp the breakaway end between
your thumb and index finger and snap it off. Place the end of
the spin column into a 2-mL microcentrifuge tube, and remove
the top cap.
3. Place the column + collection tube in a 15 mL tube and cen-
trifuge at 700 × g in a swing-out rotor for 5 min. After centrifu-
gation, the column matrix will appear semidry. This step purges
the equilibration buffer from the column and reestablishes the
matrix bed.
4. Remove the spin column and collection tube from the centrifuge
rotor, and discard the collection tube.
5. Place the spin column into a clean 2-mL microcentrifuge tube
and carefully apply the 75 μL restriction digestion/ligation
sample to the center of the gel surface.
6. Centrifuge at 700 × g in a swing-out rotor for 5 min.
7. Remove the spin column and collection tube from the rotor.
The purified sample is in the collection tube.
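The spin steps above are specified in relative centrifugal force (700 × g), whereas many centrifuges are set in rpm. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r the rotor radius in cm; the 10 cm radius in the usage example is an assumed value, and the radius of the actual swing-out rotor should be taken from the rotor documentation.

```python
import math

RCF_CONSTANT = 1.118e-5  # relates RCF (x g), rotor radius (cm), and rpm


def rpm_for_rcf(rcf_g, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (RCF_CONSTANT * radius_cm))


def rcf_for_rpm(rpm, radius_cm):
    """RCF (x g) produced by a given rotor speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    # Assumed rotor radius of 10 cm, for illustration only.
    print(round(rpm_for_rcf(700, 10.0)))
```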
3.6 PCR Amplification and MID-Barcode Attachment
1. In a PCR strip tube prepare a PCR reaction for each
DNA sample as outlined in Table 5.
2. Gently mix the PCR reaction and thermal cycle with the
parameters outlined in Table 6.
Table 5
PCR mastermix
Table 6
PCR thermocycling parameters
6. From the gel picture, identify the optimal number of cycles that
produces a bright smear through the target range (450–
600 bp) without overcycling. The goal is to perform enough
cycles to obtain sufficient product while remaining within the
exponential phase of PCR. In our
hands, the optimal cycle number is typically between 20 and
22 cycles (see Note 14).
7. Prepare a new 50 μL PCR reaction for each sample according
to the protocol above. Cycle each sample to the optimum
number of cycles determined above.
4 Notes
Acknowledgments
Chapter 14
Abstract
The unambiguous differentiation of crop genotypes is often laborious or expensive. A rapid, robust,
and cost-efficient marker system is required for routine genotyping in plant breeding and marker-assisted
selection. We describe the Inter-SINE Amplified Polymorphism (ISAP) system that is based on standard
molecular methods resulting in genotype-specific fingerprints at high resolution. These markers are derived
from Short Interspersed Nuclear Elements (SINEs) which are dispersed repetitive sequences present in
most if not all plant genomes and can be efficiently extracted from plant genome sequences. The ISAP
method was developed on potato as a model plant but is also transferable to other plant species.
Key words Inter-SINE amplified polymorphism, ISAP, Retrotransposon, Short interspersed nuclear
elements, SINE
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_14, © Springer Science+Business Media New York 2015
184 Torsten Wenke et al.
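[Fig. 1 schematic: SINE structure from 5′ to 3′ (TSD, tRNA-related region with boxes A and B, tRNA-unrelated region, (A/T)n 3′ tail, TSD); outward-facing reverse (R) and forward (F) primers on neighboring SINEs; electrophoresis of the amplicons.]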
Fig. 1 Short Interspersed Nuclear Elements (SINEs) and principle of the Inter-SINE Amplified Polymorphism
(ISAP) method. SINEs are typically characterized by a tRNA-derived 5′ region containing the RNA polymerase
III promoter motif (box A and box B), a non-tRNA related region of mostly unknown origin, and an A/T-rich 3′
tail or simple sequence repeat. The flanking target site duplication (TSD) is created during the integration
process. For ISAP analyses genomic DNA between neighboring SINEs is amplified by PCR with outward-facing
SINE reverse (R) and forward primers (F). PCR amplicons are separated by electrophoresis
Inter-SINE Amplified Polymorphism (ISAP) for Rapid and Robust Plant Genotyping 185
2 Materials
Prepare all solutions using sterile distilled water. Store all reagents
according to manufacturer’s instructions or at room temperature if
not indicated otherwise.
2.2 Agarose Gel Electrophoresis
1. Standard horizontal agarose gel electrophoresis and gel documentation
equipment (see Note 3).
2. 50× TAE buffer: 2 M Tris base, 2 M glacial acetic acid, 50 mM
EDTA dissolved in H2O, pH 8.5. Dilute to 1× TAE with water
prior to use.
3. Standard molecular biology grade LE (low electroendosmosis)
agarose.
4. 1 % ethidium bromide (see Note 4).
5. DNA electrophoresis size standard for amplicon sizing, e.g.,
GeneRuler™ 100 bp Plus DNA Ladder (Thermo Scientific).
3 Methods
3.1 Primer Design
1. For the design of primers for ISAP, representative copies of a
SINE family have to be aligned using MUSCLE, MAFFT
[16, 17], or similar programs (see Note 6).
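Once the SINE copies are aligned, conserved stretches for primer placement are easier to see in a consensus sequence. The Python sketch below derives a simple majority-rule consensus from an alignment and provides a reverse-complement helper for outward-facing primers. It is an illustration only: the 50 % majority threshold and both function names are assumptions, and the alignment itself is expected to come from MUSCLE, MAFFT, or a similar program.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus(aligned, min_fraction=0.5):
    """Majority-rule consensus of equal-length aligned sequences.

    Columns where a gap ('-') is the most common character are skipped;
    columns without a clear majority (> min_fraction) are written as 'N'.
    The threshold is an assumed value for this sketch.
    """
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*(seq.upper() for seq in aligned)):
        base, count = Counter(column).most_common(1)[0]
        if base == "-":
            continue
        out.append(base if count / len(column) > min_fraction else "N")
    return "".join(out)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]
```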
Fig. 2 Example of an aligned SINE family (SolS-IV) of the potato genome. Gray shaded rectangles indicate
the RNA polymerase III promoter boxes A and B. Poly(A) tails at the 3′ ends are located directly upstream of the
target site duplications which are underlined. Arrows show positions of primers derived for ISAP. Dots indicate
identical nucleotides. Dashes show gaps introduced to optimize the alignment. EMBL accessions are given
3.2 Polymerase Chain Reaction
The isolation of genomic DNA from plant material can be
conducted according to several protocols (e.g., the CTAB protocol
of [18]) or with commercially available kits. The use of young leaves
without signs of degradation yields high quality DNA, which is essential
for reliable and reproducible banding patterns. The isolated DNA
should be RNA free.
1. Prepare template DNA for PCR reactions with a concentration
of 10 ng/μl (see Note 9).
2. Prepare a PCR master mix for all samples on ice. Each reaction
consists of 2 μl of DreamTaq™ Green Buffer, 2 μl of dNTPs,
1 μl of BSA, 1 μl of each primer, 0.1 μl of DreamTaq™ DNA
3.3 Agarose Gel Electrophoresis
For comparable banding patterns prepare and run gels under constant
conditions (gel composition, voltage, separation time, buffer
composition, and size standard). Conditions described here are
optimized for amplicons of 100–2,000 bp length resulting from a
typical ISAP run.
1. Prepare a 2 % agarose gel with 1× TAE buffer (see Note 12).
2. For staining of the PCR products add ethidium bromide
(0.05 μl/ml gel) prior to casting.
3. Put the gel into the electrophoresis chamber and add fresh
1× TAE until the gel is covered with 1 mm of buffer
(see Note 13).
4. Load the complete 20 μl PCR reaction volume and 1–1.5 μg
of the DNA size marker on the gel (see Note 14).
5. Run the separation at approximately 3.5 V/cm until the
individual bands are clearly differentiated.
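The quantities in the steps above depend on the gel volume and the chamber geometry, so a small calculation helper can reduce errors. The Python sketch below assumes that the agarose percentage is weight per volume and that the 3.5 V/cm field strength refers to the distance between the electrodes; the function names and the example values (100 ml gel, 20 cm electrode distance) are hypothetical.

```python
def gel_recipe(gel_volume_ml, agarose_percent=2.0, etbr_ul_per_ml=0.05):
    """Agarose mass (g, w/v) and ethidium bromide volume (ul) for one gel."""
    if gel_volume_ml <= 0:
        raise ValueError("gel_volume_ml must be positive")
    return {
        "agarose_g": gel_volume_ml * agarose_percent / 100.0,
        "ethidium_bromide_ul": gel_volume_ml * etbr_ul_per_ml,
    }


def run_voltage(electrode_distance_cm, field_v_per_cm=3.5):
    """Voltage setting for a target field strength across the chamber."""
    if electrode_distance_cm <= 0:
        raise ValueError("electrode_distance_cm must be positive")
    return electrode_distance_cm * field_v_per_cm


if __name__ == "__main__":
    print(gel_recipe(100))    # assumed 100 ml gel
    print(run_voltage(20.0))  # assumed 20 cm between electrodes
```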
[Fig. 3 gel image: lanes M, 1–26, M; size marker positions at 3,000, 1,000, 500, and 100 bp.]
Fig. 3 ISAP patterns of potato varieties with the primer pair SolS-IIIa-extended-F/SolS-IV-extended-R resolved
on 2 % agarose gel in 1× TAE buffer. Lanes correspond to the varieties “Acapella” (1), “Angela” (2), “Annabelle”
(3), “Arcona” (4), “Arkula” (5), “Arosa” (6), “Atica” (7), “Ballerina” (8), “Bellaprima” (9), “Berber” (10),
“Bonus” (11), “Borwina” (12), “Carlita” (13), “Rita” (14), “Rosara” (15), “Salome” (16), “Solist” (17), “Stefanie”
(18), “Valetta” (19), “Velox” (20), “Verona” (21), “Terrana” (22), “Agave” (23), “Agila” (24), “Aktiva” (25),
“Ampera” (26), and 100 bp Plus DNA Ladder (M)
3.4 Capillary Electrophoresis
1. Prepare the PCR reaction mix according to Subheading 3.2
including at least one 5′ dye-labeled primer (see Note 5).
The PCR conditions are described earlier.
2. Use an appropriate size standard according to the expected
amplicon size range.
3. Separate the PCR products in a sequencing device referring to
the manufacturer’s instructions. Depending on the capillary
sequencer aliquots of 1 μl of the PCR reaction or less might be
sufficient for signal detection (see Note 15).
4. Separation results will be documented as electropherograms
and automatically stored in a file (Fig. 4).
3.5 Data Analysis
Depending on the purpose of the ISAP analysis, different requirements
for management and interpretation of the data exist. In particular,
the analysis of large numbers of samples from several ISAP
experiments requires adequate software. Programs like GelCompar
II or BioNumerics (Applied Maths NV) are suitable for the estab-
lishment of a database. This software allows the storage and nor-
malization of data from different ISAP experiments which enables
the comparison and combination of ISAP runs (Fig. 5). The soft-
ware accepts data generated by both conventional agarose gels and
[Fig. 4 electropherogram: dye signal (rfu, 0–150,000) plotted against fragment size (nt, 0–700); labeled peaks at 210, 291, 451, 458, 509, 533, 596, and 681 nt.]
Fig. 4 Electropherogram of ISAP fragments generated with primers SolS-IIIa-F/SolS-IV-R separated by capillary
electrophoresis for the potato cultivar “Gala”. For fluorescence signal detection, one primer was labeled at
the 5′ end with Cy5. Amplicons (peaks with size information) were separated on the Beckman CEQ™ 8000
capillary sequencer according to the “Frag4” method including an internal size standard (small peaks)
Fig. 5 ISAP banding patterns of potato varieties analyzed using GelCompar II software (Applied Maths NV). Gel
images for each variety from various Experiment data (e.g., different ISAP primer combinations) can be stored
and analyzed. A Comparison allows cluster Analyses based on the electrophoretic separation to determine
pattern Similarities between varieties visualized as color-coded matrix
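For readers without access to dedicated software, the idea of comparing banding patterns can be sketched with a band-matching similarity. The Python example below pairs fragment sizes from two samples within a size tolerance and computes the Dice coefficient, 2 × shared / (bands in A + bands in B). This is an illustrative stand-in, not the algorithm used by GelCompar II or BioNumerics, and the 1 nt tolerance is an assumption.

```python
def shared_bands(sizes_a, sizes_b, tolerance_nt=1.0):
    """Count bands that pair up one-to-one within a size tolerance."""
    a, b = sorted(sizes_a), sorted(sizes_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance_nt:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def dice_similarity(sizes_a, sizes_b, tolerance_nt=1.0):
    """Dice coefficient between two band patterns (1.0 = identical)."""
    total = len(sizes_a) + len(sizes_b)
    if total == 0:
        raise ValueError("both patterns are empty")
    return 2.0 * shared_bands(sizes_a, sizes_b, tolerance_nt) / total


if __name__ == "__main__":
    pattern_1 = [210, 291, 451, 596]  # example fragment sizes in nt
    pattern_2 = [210, 292, 509, 596]  # hypothetical second variety
    print(dice_similarity(pattern_1, pattern_2))
```

A matrix of such pairwise values over all varieties is the kind of input on which cluster analyses like the one in Fig. 5 are based.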
4 Notes
Table 1
ISAP primers used for potato containing the 20 bp-5′extension (underlined)
Acknowledgement
References
1. Finnegan DJ (1989) Eukaryotic transposable elements and genome evolution. Trends Genet 5:103–107
2. Ohshima K, Okada N (2005) SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenet Genome Res 110:475–490
3. Lenoir A, Lavie L, Prieto JL, Goubely C, Cote JC, Pelissier T, Deragon JM (2001) The evolutionary origin and genomic organization of SINEs in Arabidopsis thaliana. Mol Biol Evol 18:2315–2322
4. Ohtsubo H, Cheng C, Ohsawa I et al (2004) Rice retroposon p-SINE1 and origin of cultivated rice. Breed Sci 54:1–11
5. Xu JH, Osawa I, Tsuchimoto S, Ohtsubo E, Ohtsubo H (2005) Two new SINE elements, p-SINE2 and p-SINE3, from rice. Genes Genet Syst 80:161–171
6. Wenke T, Döbel T, Sörensen TR, Junghans H, Weisshaar B, Schmidt T (2011) Targeted identification of short interspersed nuclear element families shows their widespread existence and extreme heterogeneity in plant genomes. Plant Cell 23:3117–3128
7. Zhang X, Wessler SR (2005) BoS: a large and diverse family of short interspersed elements (SINEs) in Brassica oleracea. J Mol Evol 60:677–687
8. Deragon JM, Zhang X (2006) Short interspersed elements (SINEs) in plants: origin, classification, and use as phylogenetic markers. Syst Biol 55:949–956
9. Fawcett JA, Kawahara T, Watanabe H, Yasui Y (2006) A SINE family widely distributed in the plant kingdom and its evolutionary history. Plant Mol Biol 61:505–514
10. Tsuchimoto S, Hirao Y, Ohtsubo E, Ohtsubo H (2008) New SINE families from rice, OsSN, with poly(A) at the 3' ends. Genes Genet Syst 83:227–236
11. Baucom RS, Estill JC, Chaparro C, Upshaw N, Jogi A, Deragon JM, Westerman RP, SanMiguel PJ, Bennetzen JL (2009) Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome. PLoS Genet 5:e1000732
12. Seibt KM, Wenke T, Wollrab C, Junghans H, Muders K, Dehmer KJ, Diekmann K, Schmidt T (2012) Development and application of SINE-based markers for genotyping of potato varieties. Theor Appl Genet 125:185–196
13. Huang SW, Xu X, Pan SK, Cheng SF, Zhang B et al (2011) Genome sequence and analysis of the tuber crop potato. Nature 475:189–195
14. Pieterse L, Hils U (2009) World catalogue of potato varieties 2009/10. Agrimedia GmbH, Clenze
15. Rodriguez F, Ghislain M, Clausen AM, Jansky SH, Spooner DM (2010) Hybrid origins of cultivated potatoes. Theor Appl Genet 121:1187–1198
16. Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5:1–19
17. Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30:3059–3066
18. Saghai-Maroof MA, Soliman KM, Jorgensen RA, Allard RW (1984) Ribosomal DNA spacer-length polymorphisms in barley: mendelian inheritance, chromosomal location, and population dynamics. Proc Natl Acad Sci U S A 81:8014–8018
Chapter 15
Abstract
TILLING (Targeting Induced Local Lesions IN Genomes) is a well-known reverse genetics technique
designed to detect unknown SNPs (single nucleotide polymorphisms) in genes of interest using an enzymatic
digestion and is widely employed in plant and animal genomics. The main advantage of this technique is
that it allows for the high-throughput identification of an allelic series of mutants with a range of modified
functions for a particular gene. In this chapter, we aim to give a detailed introduction of how to establish
a TILLING platform for identifying mutants in plants, including generation of a large mutant population,
DNA and seed library preparation, mutation identification based on a LI-COR4300 DNA analyzer, and
confirmation of functions of the mutated genes.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_15, © Springer Science+Business Media New York 2015
194 Nian Wang and Lei Shi
2 Materials
2.1 Construction of Mutant Population
1. Seeds used for mutagenesis should be homozygous, e.g., seeds
harvested from a doubled haploid (DH) line.
2. Ethyl Methanesulfonate (EMS) solution: EMS (Sigma-Aldrich,
M0880, USA) (see Note 1), Buffer solution (for 1 l): dissolve
Screening of Mutations by TILLING in Plants 195
Fig. 1 Outline of the high-throughput TILLING procedure [33]. Seeds are mutagenized by treatment with alkyl-
ating agents, such as EMS, which primarily introduces G/C to A/T transitions; M1 plants are self-fertilized, and
M2 individuals are used to prepare DNA samples for mutational screening, whilst an inventory of their seeds is
established for future and downstream research. For mutation screening, DNAs are pooled to maximize
the efficiency of mutation detection. PCR is performed using 5′-end-labeled gene-specific primers to target
the desired locus, and hetero-duplexes are formed by heating and cooling the PCR products. CEL I nuclease is
used to cleave at base mismatches, and the products representing induced mutations are visualized with
denaturing polyacrylamide gel electrophoresis
2.2 Mutation Detection
1. Solution for CEL1: CEL1 extracted from celery stalk
(see Note 2).
2. Buffer B: 0.1 M Tris–HCl, pH 7.7, 0.5 M KCl, 100 μl PMSF.
3. 10× CEL1 buffer solution: 100 ml 1 M MgSO4, 100 ml 1 M
HEPES, pH 7.0 (see Note 3), 50 ml 2 M KCl, 100 μl 10 %
Triton X-100, 100 μl 20 mg/ml BSA. Mix all reagents with
ddH2O and set the volume to 1 l.
4. LICOR 4300 DNA analyzer and the corresponding accessories,
such as 25 cm gel glass, 0.25 mm spacers.
5. Gel: KBPlus 6.50 % Gel Matrix (LI-COR Biosciences-GmbH,
Germany) (see Note 4). 10 % APS, TEMED, 25 ml syringe and
a 0.2 μm filter, assembled (clean) glass plates, 1 l 0.8× TBE (1×
TBE for 6.5 % gel), formamide loading dye (MWG, Germany).
6. Sephadex G50 (Sigma-Aldrich, S5897, USA), 96-well spin
plates, 96-well column loader (Millipore).
7. Labeled primers and DNA ladders (see Note 5).
3 Methods
3.1 Construction of Mutant Population
1. Dissolve EMS into buffer solution at the required concentration.
Incubate seeds in the EMS solution for 24–36 h (see Note 6).
2. Wash the treated seeds in fresh water for 3–4 h (see Note 7).
3. Sow the mutagenized seeds in the field. The plants are desig-
nated as the M1 generation (see Note 8).
4. Self-pollinate each M1 plant and harvest the seeds.
5. Sow the seeds from M1 plants in the field; these plants are
designated as the M2 generation. Make sure each M1 plant produces
an M2 plant (see Note 9).
6. Extract DNA from each M2 plant and store at −20 °C after the
DNA concentrations are adjusted to a similar, appropriate
level. The DNA extraction method will depend on the species
being investigated.
7. Harvest the self-pollinated seeds of the M2 plants and store
under dry and low temperature conditions.
3.2 Mutation Detection
3.2.1 Primer Design
Primers are designed according to the sequence of the candidate
genes being targeted. Ensure that primer amplification is specific
to the targeted genes (see Note 10).
3.2.2 PCR Setup
The DNA templates for the PCR reaction are pooled four- to
eightfold (see Note 11). All reactions should be kept out of the
light as much as possible to protect the fluorescent oligos from
degrading. Use between 10 and 100 ng of genomic DNA as tem-
plate for the PCR reaction, which is performed in a 96-well plate.
The components of the PCR mix are listed in Table 1. PCR cycling
conditions are outlined in Table 2.
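The pooling depth determines how dilute a mutant allele becomes in the PCR template. As a simple worked calculation in Python, under the assumptions that exactly one plant in the pool carries the mutation, that all plants contribute equal amounts of DNA, and that the target locus is diploid, the mutant allele fraction is the number of mutant copies divided by pool size × ploidy. The function name is hypothetical.

```python
from fractions import Fraction


def mutant_allele_fraction(pool_size, heterozygous=True, ploidy=2):
    """Fraction of template copies carrying the mutation in a DNA pool.

    Assumes one mutant plant per pool and equal DNA input per plant.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    mutant_copies = 1 if heterozygous else ploidy
    return Fraction(mutant_copies, pool_size * ploidy)


if __name__ == "__main__":
    for pool_size in (4, 8):
        print(pool_size, mutant_allele_fraction(pool_size))
```

For an eightfold pool with one heterozygous mutant, only 1 in 16 template copies carries the mutation, which is why deeper pooling trades throughput against detection sensitivity.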
Table 1
The components of PCR mix
Table 2
PCR cycling conditions, including heteroduplex formation

Temperature                 Time        Step
94 °C                       2 min
Loop 1: 8 cycles                        Touchdown
  94 °C                     30 s
  65 °C                     30 s        Increment −1 °C/cycle
  72 °C                     30 s
Loop 2: 35 cycles                       Amplify
  94 °C                     30 s
  58 °C                     30 s
  72 °C                     30 s
72 °C                       5 min       Final extension
99 °C                       10 min      Denaturation
95 °C                       10 min
Decrease 95–80 °C           3 °C/min    Renaturation
Decrease 80–55 °C           1 °C/min
Hold at 55 °C               20 min
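The touchdown and renaturation steps in Table 2 can be checked with a short calculation. The Python sketch below lists the annealing temperature of each touchdown cycle and adds up the duration of the renaturation phase. It assumes that the first touchdown cycle anneals at 65 °C and that each later cycle is 1 °C lower, which is how the increment is interpreted here; individual thermal cyclers may implement increments differently.

```python
def touchdown_annealing(start_c=65.0, step_c=-1.0, cycles=8):
    """Annealing temperature of each touchdown cycle (first cycle at start_c)."""
    return [start_c + step_c * i for i in range(cycles)]


def ramp_minutes(start_c, end_c, rate_c_per_min):
    """Time needed to ramp between two temperatures at a constant rate."""
    if rate_c_per_min <= 0:
        raise ValueError("rate_c_per_min must be positive")
    return abs(start_c - end_c) / rate_c_per_min


def renaturation_minutes():
    """Duration of the renaturation phase in Table 2, including the hold."""
    return ramp_minutes(95, 80, 3) + ramp_minutes(80, 55, 1) + 20


if __name__ == "__main__":
    print(touchdown_annealing())
    print(renaturation_minutes())
```

Under this reading the last touchdown cycle anneals at 58 °C, which matches the annealing temperature of the 35 amplification cycles, and the renaturation phase (two ramps plus the 55 °C hold) takes 50 min.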
3.2.3 CEL1 Digestion
This step is performed to cleave the heteroduplex mismatches that
arise from mutations. The original crude CEL1 enzyme is extracted from 500 g
celery stalk or can be purchased (see Note 12). CEL1 working
solution is prepared with 4 μl of crude CEL1 and 96 μl of buffer B.
The digestion conditions and procedure are as follows.
1. The 96-well plate with the digestion reaction solution (Table 3)
is incubated for 15 min at 45 °C.
2. Stop reaction by adding 5 μl of 75 mM EDTA to each well in
the plate and thoroughly mix.
3. The products should be frozen or purified immediately as the
CEL1 enzyme is very active and might continue cutting/
degrading the sample.
Table 3
CEL1 reaction mix composition
3.2.4 Sample Purification: Isopropanol Precipitation
This step is performed to remove excess salt and concentrate the
CEL1 digested product. Either Sephadex G50 (cross-linked dextran,
from Pharmacia) or isopropanol precipitation can be used
for sample purification and concentration; both workflows
are described.
1. 15 μl of isopropanol is added to each well in the 96-well plate
which contains the inactive CEL1 digested products.
2. The mixtures of each well are pipetted up and down (manually).
The 96-well plate with samples is spun for 15 min at 4,000 × g
in a microplate centrifuge.
3. The supernatant of each sample is removed and the plate is blotted on a
paper towel.
4. The precipitate of each sample is washed with 20 μl 70 % ethanol
and spun for 15 min, and the supernatant is decanted.
5. The pellet is dissolved in 5 μl of formamide loading buffer.
Residual ethanol is not a concern; it will evaporate in
the next step.
6. The samples are heated at 85 °C for about 0.5 h until the final
volume of the samples is about 3–5 μl.
3.2.5 Sample Purification: Sephadex Purification
1. Sephadex G50 is loaded in a 96-well spin Sephadex plate and
the excess of resin is removed using a column loader.
2. 300 μl sterile ddH2O is added into each well with an 8-channel
pipette.
3. The 96-well plate with resin is incubated at room temperature
for at least 1.5 h or stored at 4 °C for up to 1 week.
4. Put the Sephadex plate on an alignment frame and an empty
96-well plate. Centrifuge them for 2 min and remove plate
with the flow-through (water).
5. Fill a new PCR plate with 4 μl formamide loading dye
(see Note 13).
Screening of Mutations by TILLING in Plants 199
6. The Sephadex plate and alignment frame are put on the PCR
plate with the loading dye.
7. Add all of each sample (the inactivated CEL1 digestion reaction)
directly into a well of the Sephadex plate, over the center of
the column, without touching the column with the
pipette tip.
8. Put a new 96-well plate under the Sephadex plate and centrifuge
them for 2 min; the purified products are now in the flow-
through (see Note 14).
9. Heat the samples at 85 °C for about 0.5 h until the final volume
is about 3–5 μl.
3.2.6 Preparing Gels and Electrophoresis
The 25 cm gels are prepared for electrophoresis to identify the
mutants using the LI-COR 4300 DNA analyzer.
1. Mix 20 ml of KBplus with 150 μl of 10 % APS and then add
15 μl of TEMED.
2. Fill the syringe with the above mixture and place the filter in
the syringe (see Note 15).
3. Immediately dispense the mixture through the filter into the
gap between the glass plates. Tapping on the glass plates at
the liquid edge helps prevent air bubbles from forming. Any
bubbles that do appear can be removed with the bubble catcher
just after the gel is poured
(see Note 16).
4. Insert the comb spacer in the center.
5. Insert the plexiglass pressure plate between the plates and
clamp rails and tighten the screws (see Note 17).
6. Let the gel fully polymerize for at least 1.5 h or store the gel at
4 °C for not more than 1 day (see Note 18).
7. Remove the plexiglass plate and the comb spacer.
8. Remove the excess polyacrylamide in and above the slot with
a pipette.
9. Rinse the outside of the plates with ddH2O and ethanol
(see Note 19).
10. Place the gel on the LI-COR machine and set up all accessories
following the manual of the LI-COR DNA analyzer.
11. Run the gel (see Note 20).
3.3 Scoring and Confirmation of the Mutations
This step is performed to identify which pooled samples
harbor mutations, based on the images from the LI-COR 4300
DNA analyzer.
1. Download the IRD 700 and 800 nm images from the LI-COR
4300 DNA analyzer to a personal computer.
2. Check each lane of the 700 and 800 nm images for new bands
other than the main band of the PCR product of your target
gene and dimers.
3. Calculate the sizes of the new bands found in step 2 for the
same lane on the 700 and 800 nm images, respectively.
4. A lane that shows a new band on both the 700 and 800 nm images,
with the two band sizes adding up to the size of the PCR product
of the target gene, indicates a possible mutation
among the pooled samples in that lane.
5. Confirm the mutation by sequencing (see Note 21).
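The size criterion in step 4 can be written as a small check. This is a minimal sketch rather than part of the published protocol: the function name, the example band sizes, and the 10 bp sizing tolerance are illustrative assumptions.

```python
def is_candidate_mutation(band_700_bp, band_800_bp, amplicon_bp, tolerance_bp=10):
    """Return True when the two cleaved fragments add up to the full amplicon.

    band_700_bp, band_800_bp: sizes of the new bands in the 700 and 800 nm images.
    amplicon_bp: size of the full-length PCR product of the target gene.
    tolerance_bp: allowed gel sizing error (an assumed value, not from the protocol).
    """
    return abs((band_700_bp + band_800_bp) - amplicon_bp) <= tolerance_bp


# Hypothetical lanes for a 1,000 bp amplicon.
print(is_candidate_mutation(380, 615, 1000))  # fragments sum to 995 bp
print(is_candidate_mutation(380, 450, 1000))  # fragments sum to 830 bp
```

A lane that passes this check is only a candidate; confirmation still requires sequencing as in step 5.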
4 Notes
the proper pooling folds. The new pooling samples and the
original one should be distinguished.
12. Crude CEL1 is extracted from celery stalk.
13. To reduce cost, formamide loading dye can be prepared
from the following components: 25 ml
of deionized formamide, 500 μl of 0.5 M EDTA (pH 8.0),
6 mg of bromophenol blue, and 67 ml of ddH2O.
14. The used Sephadex plate can be cleaned and reused after dry-
ing at 36 °C and washing.
15. This step should be performed as quickly as possible, before
the gel starts to polymerize.
16. The glass plate should be placed at a horizontal level. Striking
the plate very gently can prevent the bubbles.
17. Clamp the plates.
18. If the gel is to be kept for a day, its ends should be covered with
a wet (0.8× TBE buffer) tissue and wrapped in a plastic foil.
19. The place where the laser scans the gel should be very clean so
as to prevent strong background from appearing in the final
images.
20. Follow the manual of the LI-COR DNA analyzer to set
the electrophoresis run parameters.
21. Sequence the target PCR products of the samples that
harbor possible mutations. Compare the sequences of the PCR
products from the putative mutants with the wild-type sequence
to confirm the mutation.
References
1. McCallum CM, Comai L, Greene EA, Henikoff 5. Jones MO, Piron-Prunier F, Marcel F, Piednoir-
S (2000) Targeted screening for induced muta- Barbeau E, Alsadon AA, Wahb-Allah MA,
tions. Nat Biotechnol 18:455–457 Al-Doss AA, Bowler C, Bramley PM, Fraser PD,
2. McCallum CM, Comai L, Greene EA, Bendahmane A (2012) Characterisation of
Henikoff S (2000) Targeting induced local alleles of tomato light signalling genes gener-
lesions IN genomes (TILLING) for plant ated by TILLING. Phytochemistry 79:78–86
functional genomics. Plant Physiol 123: 6. Gady AL, Vriezen WH, Van de Wal MH, Huang
439–442 P, Bovy AG, Visser RG, Bachem CW (2012)
3. Till BJ, Colbert T, Tompa R, Enns LC, Induced point mutations in the phytoene syn-
Codomo CA, Johnson JE, Reynolds SH, thase 1 gene cause differences in carotenoid
Henikoff JG, Greene EA, Steine MN, Comai content during tomato fruit ripening. Mol
L, Henikoff S (2003) High-throughput Breed 29:801–812
TILLING for functional genomics. Methods 7. Chen L, Huang L, Min D, Phillips A, Wang S,
Mol Biol 236:205–220 Madgwick PJ, Parry MA, Hu YG (2012)
4. Till BJ, Reynolds SH, Greene EA, Codomo Development and characterization of a new
CA, Enns LC, Johnson JE, Burtner C, Odden TILLING population of common bread wheat
AR, Young K, Taylor NE, Henikoff JG, Comai (Triticum aestivum L.). PLoS One 7:e41570
L, Henikoff S (2003) Large-scale discovery of 8. Sabetta W, Alba V, Blanco A, Montemurro C
induced point mutations with high-throughput (2011) sunTILL: a TILLING resource for gene
TILLING. Genome Res 13:524–530 function analysis in sunflower. Plant Methods
7:20
202 Nian Wang and Lei Shi
9. Slade AJ, McGuire C, Loeffler D, Mullenberg 20. Elias R, Till BJ, Mba C, Al-Safadi B (2009)
J, Skinner W, Fazio G, Holm A, Brandt KM, Optimizing TILLING and Ecotilling tech-
Steine MN, Goodstal JF, Knauf VC (2012) niques for potato (Solanum tuberosum L).
Development of high amylose wheat through BMC Res Notes 2:141
TILLING. BMC Plant Biol 12:69 21. de Lorenzo L, Merchan F, Laporte P, Thompson
10. Sikora P, Chawade A, Larsson M, Olsson J, R, Clarke J, Sousa C, Crespi M (2009) A novel
Olsson O (2012) Mutagenesis as a tool in plant plant leucine-rich repeat receptor kinase regu-
genetics, functional genomics, and breeding. lates the response of Medicago truncatula roots
Int J Plant Genomics 2011:314829 to salt stress. Plant Cell 21:668–680
11. Okabe Y, Asamizu E, Saito T, Matsukura C, 22. Wang N, Wang Y, Tian F, King GJ, Zhang C,
Ariizumi T, Bres C, Rothan C, Mizoguchi T, Long Y, Shi L, Meng J (2008) A functional
Ezura H (2011) Tomato TILLING technol- genomics resource for Brassica napus: develop-
ogy: development of a reverse genetics tool for ment of an EMS mutagenized population and
the efficient isolation of mutants from Micro- discovery of FAE1 point mutations by
Tom mutant libraries. Plant Cell Physiol TILLING. New Phytol 180:751–765
52:1994–2005 23. Suzuki T, Eiguchi M, Kumamaru T, Satoh H,
12. Knoll JE, Ramos ML, Zeng Y, Holbrook CC, Matsusaka H, Moriguchi K, Nagato Y, Kurata
Chow M, Chen S, Maleki S, Bhattacharya A, N (2008) MNU-induced mutant pools and
Ozias-Akins P (2011) TILLING for allergen high performance TILLING enable finding of
reduction and improvement of quality traits in any gene mutation in rice. Mol Genet Genomics
peanut (Arachis hypogaea L.). BMC Plant Biol 279:213–223
11:81 24. Cooper JL, Till BJ, Laport RG, Darlow MC,
13. Stephenson P, Baker D, Girin T, Perez A, Kleffner JM, Jamai A, El-Mellouki T, Liu S,
Amoah S, King GJ, Ostergaard L (2010) A rich Ritchie R, Nielsen N, Bilyeu KD, Meksem K,
TILLING resource for studying gene function Comai L, Henikoff S (2008) TILLING to
in Brassica rapa. BMC Plant Biol 10:62 detect induced mutations in soybean. BMC
14. Minoia S, Petrozza A, D'Onofrio O, Piron F, Plant Biol 8:9
Mosca G, Sozio G, Cellini F, Bendahmane A, 25. Till BJ, Cooper J, Tai TH, Colowit P, Greene
Carriero F (2010) A new mutant genetic EA, Henikoff S, Comai L (2007) Discovery of
resource for tomato crop improvement by chemically induced mutations in rice by
TILLING technology. BMC Res Notes 3:69 TILLING. BMC Plant Biol 7:19
15. Fitzgerald TL, Kazan K, Li Z, Morell MK, 26. Horst I, Welham T, Kelly S, Kaneko T, Sato S,
Manners JM (2010) A high-throughput method Tabata S, Parniske M, Wang TL (2007)
for the detection of homologous gene deletions TILLING mutants of Lotus japonicus reveal
in hexaploid wheat. BMC Plant Biol 10:264 that nitrogen assimilation and fixation can
16. Uauy C, Paraiso F, Colasuonno P, Tran RK, occur in the absence of nodule-enhanced
Tsai H, Berardi S, Comai L, Dubcovsky J sucrose synthase. Plant Physiol 144:806–820
(2009) A modified TILLING approach to 27. Heckmann AB, Lombardo F, Miwa H, Perry
detect induced mutations in tetraploid and JA, Bunnewell S, Parniske M, Wang TL,
hexaploid wheat. BMC Plant Biol 9:115 Downie JA (2006) Lotus japonicus nodulation
17. Perry J, Brachmann A, Welham T, Binder A, requires two GRAS domain regulators, one of
Charpentier M, Groth M, Haage K, Markmann which is functionally conserved in a non-
K, Wang TL, Parniske M (2009) TILLING in legume. Plant Physiol 142:1739–1750
Lotus japonicus identified large allelic series for 28. Till BJ, Reynolds SH, Weil C, Springer N,
symbiosis genes and revealed a bias in func- Burtner C, Young K, Bowers E, Codomo CA,
tionally defective ethyl methanesulfonate alleles Enns LC, Odden AR, Greene EA, Comai L,
toward glycine replacements. Plant Physiol 151: Henikoff S (2004) Discovery of induced point
1281–1291 mutations in maize genes by TILLING. BMC
18. Morita R, Kusaba M, Iida S, Yamaguchi H, Plant Biol 4:12
Nishio T, Nishimura M (2009) Molecular char- 29. Perry JA, Wang TL, Welham TJ, Gardner S,
acterization of mutations induced by gamma irra- Pike JM, Yoshida S, Parniske M (2003) A
diation in rice. Genes Genet Syst 84:361–370 TILLING reverse genetics tool and a web-
19. Le Signor C, Savois V, Aubert G, Verdier J, accessible collection of mutants of the legume
Nicolas M, Pagny G, Moussy F, Sanchez M, Lotus japonicus. Plant Physiol 131:866–871
Baker D, Clarke J, Thompson R (2009) 30. Tsai H, Howell T, Nitcher R, Missirian V,
Optimizing TILLING populations for reverse Watson B, Ngo KJ, Lieberman M, Fass J, Uauy
genetics in Medicago truncatula. Plant C, Tran RK, Khan AA, Filkov V, Tai TH,
Biotechnol J 7:430–441 Dubcovsky J, Comai L (2011) Discovery of
Screening of Mutations by TILLING in Plants 203
rare mutations in populations: TILLING by 33. Colbert T, Till BJ, Tompa R, Reynolds S,
sequencing. Plant Physiol 156:1257–1268 Steine MN, Yeung AT, McCallum CM, Comai
31. Kurowska M, Daszkowska-Golec A, Gruszka L, Henikoff S. (2001) High- throughput
D, Marzec M, Szurman M, Szarejko I, screening for induced point mutations. Plant
Maluszynski M (2011) TILLING: a shortcut in Physiol 126:480–484
functional genomics. J Appl Genet 34. Oleyowski CA, Bronson Mullins CR, Godwin
52:371–390 AK, Yeung AT (1998) Mutation detection
32. Weil CF (2009) TILLING in grass species. using a novel plant endonuclease. Nucleic
Plant Physiol 149:158–164 Acids Res 26:4597–4602
Chapter 16
Abstract
Mass spectrometric cleaved amplified polymorphic sequence (MS-CAPS) is a method for detecting genes
using a combination of short PCR and matrix-assisted laser desorption ionization time-of-flight mass
spectrometry (MALDI-TOF MS). MS-CAPS can identify a single nucleotide polymorphism (SNP) in less
than one hour and is suitable for plants, animals, bacteria, and food.
Key words Mass spectrometric cleaved amplified polymorphic sequence (MS-CAPS), Matrix-assisted
laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS), Polymerase
chain reaction (PCR), Restriction enzyme, Single nucleotide polymorphisms (SNPs), Uracil-DNA
glycosylase (UDG)
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_16, © Springer Science+Business Media New York 2015
206 Hideyuki Kajiwara
2 Materials
2.1 DNA Extraction 1. gDNA extraction solution: pure sterilized water or water
containing 20 mM cysteine. To prepare the cysteine solution, dissolve
351 mg of L-cysteine hydrochloride monohydrate in 100 ml of water (see Note 1).
2. Plastic or glass rod.
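The 351 mg figure for the 20 mM cysteine solution follows from a standard molarity calculation. The sketch below reproduces it; the helper name is illustrative, and the molar mass of 175.63 g/mol for L-cysteine hydrochloride monohydrate is a value supplied here, not one stated in the protocol.

```python
# Molar mass of L-cysteine hydrochloride monohydrate (C3H7NO2S·HCl·H2O), g/mol.
# Supplied for this sketch; the protocol itself only gives the weighed mass.
MOLAR_MASS_CYS_HCL_H2O = 175.63


def required_mass_mg(conc_mM, volume_ml, molar_mass_g_per_mol):
    """mmol/l x l gives mmol; mmol x g/mol gives mg."""
    return conc_mM * (volume_ml / 1000.0) * molar_mass_g_per_mol


mass_mg = required_mass_mg(20, 100, MOLAR_MASS_CYS_HCL_H2O)
print(f"{mass_mg:.1f} mg for 100 ml of 20 mM solution")
```

The same helper can be used to scale the recipe to smaller volumes when only a few extractions are planned.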
2.2 PCR 1. Primer sets to amplify the specific regions (see Note 2).
2. PCR amplification apparatus, such as the 2720 thermal cycler
(Applied Biosystems, Foster City, CA, USA) (see Note 3).
3. DNA polymerase: Tfi DNA polymerase with 10× buffer and
50 mM MgCl2 (Invitrogen, Carlsbad, CA, USA) (see Note 4).
4. dATP, dCTP, dGTP, dTTP, and dUTP: dilute each to 5 mM.
3 Methods
3.1 DNA Extraction and Isolation of Single-Stranded DNA
1. Manually crush pieces of sample (approx. 3–5 mm of plant
leaf) with a rod, then vortex in 0.1 ml water or 20 mM cysteine
(see Note 1) for 1–3 min.
2. Briefly centrifuge the sample, if needed, and use 0.8 μl of the
supernatant for PCR (see Note 8).
3.2 PCR 1. Prepare the solution for asymmetric PCR by mixing 0.8 μl of
extracted crude gDNA, 3.5 μl of 0.1 mM primer, 0.5 μl of
0.1 mM biotinylated primer, 14.8 μl of PCR amplification mix-
ture, and 0.4 μl of Tfi DNA polymerase in a 0.2 ml PCR tube.
The PCR amplification mixture consists of 0.8 μl of 5 mM dATP,
0.8 μl of 5 mM dCTP, 0.8 μl of 5 mM dGTP, 0.8 μl of 5 mM
dTTP, 0.6 μl of 50 mM MgCl2, 4 μl of 10× PCR buffer solution,
and 7 μl of water. If UDG is to be used, replace dTTP
with dUTP (see Note 9). The final volume should be 20 μl.
2. Amplify the DNA using the following reaction conditions for
fast PCR: denaturation for 1 min at 94 °C; then five cycles of
2 s at 94 °C, 2 s at X °C, and 2 s at 72 °C; then 25 cycles of 2 s
at 85 °C, 2 s at X °C, and 2 s at 72 °C. X °C is the annealing
temperature of the particular primers used (see Note 10).
3. Check the amplicon by acrylamide gel electrophoresis, if
required (see Note 11).
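The volumes in step 1 and the cycling program in step 2 can be checked with a short calculation. This is a bookkeeping sketch only: it uses the volumes and stock concentrations stated above, the variable and function names are illustrative, and the time estimate counts programmed hold times while ignoring temperature ramping.

```python
# Volumes in microlitres, copied from step 1.
amplification_mix_ul = {
    "dATP": 0.8, "dCTP": 0.8, "dGTP": 0.8,
    "dTTP": 0.8,          # swapped for dUTP when UDG is used
    "MgCl2": 0.6,
    "10x buffer": 4.0,
    "water": 7.0,
}
mix_ul = sum(amplification_mix_ul.values())

reaction_ul = {
    "crude gDNA": 0.8,
    "primer": 3.5,
    "biotinylated primer": 0.5,
    "amplification mix": mix_ul,
    "Tfi DNA polymerase": 0.4,
}
total_ul = sum(reaction_ul.values())


def final_mM(stock_mM, volume_ul):
    """Concentration of a component after dilution into the complete reaction."""
    return stock_mM * volume_ul / total_ul


each_dntp_mM = final_mM(5, 0.8)
mgcl2_mM = final_mM(50, 0.6)
primer_uM = final_mM(0.1, 3.5) * 1000
biotin_primer_uM = final_mM(0.1, 0.5) * 1000

# Hold times only: 1 min initial denaturation, then 5 + 25 cycles of three 2 s steps.
hold_time_s = 60 + (5 + 25) * 3 * 2

print(f"mix {mix_ul:.1f} ul, total {total_ul:.1f} ul")
print(f"dNTP {each_dntp_mM:.2f} mM each, MgCl2 {mgcl2_mM:.2f} mM")
print(f"primers {primer_uM:.1f} and {biotin_primer_uM:.1f} uM")
print(f"programmed hold time {hold_time_s} s (ramping excluded)")
```

The unequal primer concentrations reflect the asymmetric PCR design: the unlabeled primer is present at seven times the concentration of the biotinylated primer.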
4 Notes
[Figure: workflow schematic. Labels, in order: B, SNP, gDNA; + restriction enzyme or UDG; + MB/SA;
washing by water (N/S, MB/SA=B); alkali denaturation; washing by water (N/S, MB/SA=B);
at 65 °C + NH3 solution (N/S, MB-SA and B).]
Table 1
Nucleotide sequences of rice SNP gene loci and PCR primers
Gene loci were selected using the database http://rapdb.dna.affrc.go.jp/. A vertical arrow shows
the restriction enzyme cleavage site. The position of the SNP is shown as an upper case red letter
[Fig. 2a, b plots: MALDI-TOF mass spectra, intensity (×10^4, a.u.) against m/z (axis shown from 5000 to 8000 in a).
(a) Traces for Hi and Ak; labeled peak at m/z 5687.713 (Hi).
(b) Hi peak at m/z 8735.504; Ak peak at m/z 8694.644; Δ = 40.860.]
Fig. 2 MS-CAPS analysis of rice cultivars. (a) SNP gene locus E20943 was selected for SNP detection and
digested by EaeI. The peak at m/z 5,687 differed significantly between cultivar Akitakomachi (Ak) and cultivar
Himenomochi (Hi). A horizontal arrow shows the peak corresponding to the amplicon and a vertical arrow at
m/z 6570 shows the peak corresponding to unreacted primer. If asymmetric PCR was not used, the peak
derived from the unreacted primer would be higher than that of the PCR products. (b) Analysis of the same
locus using a different approach. Amplicons were digested using Tsp509I. The theoretical mass difference
between Hi and Ak would be 40.03 (Table 2); the observed difference by MS-CAPS analysis was 40.860.
A horizontal arrow shows the observed peak and vertical arrows show postsource decay products during
MALDI-TOF MS analysis. (c) MS-CAPS analysis of amplicons treated with UDG. Biotinylated primer R1744FB
and primer R1744R were used to amplify sequences from cultivars Heiseimochi (He) and Nipponbare (Ni), and
peaks representing the amplicons were detected at m/z 6,976.991 and 6,961.424, respectively. PCR products
after digestion by UDG and alkali treatment are shown. The mass difference detected by MS-CAPS analysis of
He and Ni was 15.576 (theoretical difference between A and G = 16.00 (Table 2)). Vertical arrows show the
degradation products from postsource decay during MALDI-TOF MS analysis
[Fig. 2c plot (continued): MALDI-TOF mass spectra, intensity (×10^4, a.u.).
He peak at m/z 6976.991; Ni peak at m/z 6961.424; Δ = 15.576.]
Table 2
Mass differences between nucleosides
Nucleoside (mass)    G       A       T       C
G (151.10)           –     15.97   24.99   40.03
A (135.13)                   –      9.02   24.03
T (126.11)                           –     16.00
C (111.10)                                   –
The monoisotopic mass of each nucleotide was used for the calculation
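A mass difference observed by MS-CAPS can be compared against Table 2 to suggest which base substitution it represents. The following is a minimal sketch, assuming the pairwise differences exactly as printed in Table 2 and the observed differences reported in the Fig. 2 legend; the dictionary and function names are illustrative.

```python
# Pairwise mass differences as printed in Table 2.
TABLE2_DIFFERENCES = {
    ("G", "A"): 15.97,
    ("G", "T"): 24.99,
    ("G", "C"): 40.03,
    ("A", "T"): 9.02,
    ("A", "C"): 24.03,
    ("T", "C"): 16.00,
}


def closest_pair(observed_delta):
    """Return the base pair whose tabulated difference is nearest to observed_delta."""
    return min(
        TABLE2_DIFFERENCES,
        key=lambda pair: abs(TABLE2_DIFFERENCES[pair] - observed_delta),
    )


# Observed differences reported for Fig. 2b and Fig. 2c.
for delta in (40.860, 15.576):
    pair = closest_pair(delta)
    error = abs(TABLE2_DIFFERENCES[pair] - delta)
    print(f"observed {delta:.3f}: closest to {pair[0]}/{pair[1]} (off by {error:.3f})")
```

Because the tabulated G/A and T/C differences lie within 0.03 of each other, a nearest-value match of this kind cannot separate those two substitutions on its own; the known sequence context of the locus is still needed.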
Acknowledgements
References
Abstract
Accurate genotyping is essential for building genetic maps and performing genome assembly of polyploid
species. Recent high-throughput techniques, such as Illumina GoldenGate™ and Sequenom iPLEX
MassARRAY®, have made it possible to accurately estimate the relative abundances of different alleles even
when the ploidy of the population is unknown. Here we describe the experimental methods for collecting
these relative allele intensities and then demonstrate the practical concerns for inferring genotypes using
Bayesian inference via the software package SuperMASSA.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_17, © Springer Science+Business Media New York 2015
216 Marcelo Mollinari and Oliver Serang
markers, especially for tetraploid species [19]. In this case, since the
number of possible gamete genotypes is relatively small (up to 6), it
is possible to predict the genotype of a pair of parents (for a given
locus) based on the marker information in their progeny. This kind
of information enabled several authors to propose methods for
genetic mapping [20–24] and QTL mapping [25–27], taking into
account the multidose and multiallelic nature of tetraploid species.
More recently, single nucleotide polymorphisms have played
an important role in genetic studies of polyploids. Along with inser-
tions and deletions, they are the most common type of sequence
difference between alleles [28]. The abundance of SNPs makes
them extremely important to the construction of saturated genetic
maps and for QTL analysis and association studies. In polyploids, a
locus may carry multiple doses of a particular nucleotide. The quan-
tification of this dosage is possible when using quantitative SNP
genotyping. Quantitative high-throughput technologies, such as
Illumina GoldenGate™ [29] and Sequenom iPLEX MassARRAY®
[30, 31] provide two signals for each SNP locus. Since SNPs are
mostly biallelic, each one of these signals corresponds to an inten-
sity recorded for one of the two possible alleles. Thus, the expected
value of each signal intensity is proportional to the corresponding
allele dosage [31, 32]. Using these ideas, Serang et al. [5] proposed
a graphical Bayesian model for inferring polyploid SNP genotypes
(i.e., inferring the discrete genotype of each individual for each
locus, identifying the number of copies of each allele and, if neces-
sary, predicting the ploidy level).
In this chapter, we describe the practical aspects of genotyping
polyploids with MassARRAY® and similar platforms. Polyploid
genotyping can be partitioned into two distinct sets of tasks.
The first set involves estimating the relative abundance of
alleles from each individual in the population. Subheading 1.1
describes how the MassARRAY® platform can be used to estimate
the relative intensities of each allele for each individual in the
population. The second set uses a scatter plot of relative intensities
to estimate the genotype for each individual in the population.
Subheading 3.2 describes how to estimate genotypes from these
relative abundances; the methods described in that subheading do
not depend on the platform used to estimate the relative
abundances.
1.1 Current Methods for Estimating Relative Abundance of Alleles
Currently, the two important protocols for polyploid genotyping
are Illumina GoldenGate™ and Sequenom MassARRAY®; these
protocols use different methods for estimating the relative abundance
of the two alleles at a particular locus (Fig. 1). Note that
both procedures only provide relative allele abundances; even with
well-designed controls (e.g., starting with the same amount of tis-
sue from each individual), the amplified allele abundances may not
be comparable between two different individuals, because different
individuals may be processed separately at some point, creating
[Fig. 1 diagram: (a) labels: SNP site, fluorescent label, sequence unique targeting a particular bead,
alleles T and C; intensity readouts for genotypes T/T, T/C, and C/C.
(b) labels: extension primer, mass modified terminators, alleles T and C; intensity against m/z
spectra for genotypes T/T, T/C, and C/C.]
Fig. 1 Alternative methods for measuring relative abundance of alleles. In both methods there are allele-
specific amplifications prior to the quantification step. (a) In fluorescence-based methods such as Illumina
GoldenGate™, each allele hybridizes to its own fluorescent probe. A unique sequence contained in the products
of the specific amplification hybridizes to a specific bead. Thus, the assay products that were in solution
are bound to a solid surface for quantification. The abundance of each allele is determined by quantifying the
fluorescence intensity using a laser scanning confocal microscope [33]. (b) In mass spectrometry-based
methods such as Sequenom MassARRAY®, the DNA fragments at each allele are ionized and time-of-flight
mass spectrometry is used to quantify the relative abundance of each allele. Because the alleles have non-
identical sequences (due to modified-mass ddNTP terminators in Sequenom iPLEX reaction, indicated by
asterisks), they have slightly different masses; measuring the total charge of the ions at the time of flight
expected for the mass-to-charge ratio for each allele gives a robust estimate for the relative abundance of
each allele in the form of an intensity spectrum [30]
2 Materials
3 Methods
3.1 Processing the Biological Sample with MassARRAY®
One of the most suitable methods for quantitative SNP genotyping
in polyploids is MALDI-TOF MS. In MALDI-TOF MS the DNA
is deposited in a buffering matrix containing a crystalline structure
(3-hydroxypicolinic acid, in MassARRAY®). This buffering matrix
is responsible for the absorption of a greater part of the energy
applied to the sample, preventing a significant decomposition and
fragmentation of the DNA [31]. The laser beam is then directed
onto the samples, causing the desorption/ionization of the DNA.
The ionized molecules (mostly positive ions) pass through a flight
tube with a detector at the end. The flight time increases with the
mass-to-charge ratio (m/z) of the ionized molecule, in proportion to
its square root [42]. Because MALDI strongly enriches for singly
charged ions, the flight time in practice reflects the mass. In the same
electric field (and thus the same force field, since
the charges are roughly homogeneous), molecules with higher
masses accelerate more slowly and consequently have longer times of
flight than molecules with low masses. Figure 2 shows a schematic
representation of MALDI-TOF MS.
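The dependence of flight time on mass can be illustrated with the textbook relation for an idealized linear time-of-flight analyzer, in which an ion of charge z gains kinetic energy zeU and then drifts a length L, so that t = L·sqrt(m / (2zeU)). The sketch below assumes a 20 kV accelerating voltage and a 1 m flight tube; both values are illustrative and are not MassARRAY® specifications.

```python
import math

DALTON_KG = 1.66053906660e-27          # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def flight_time_us(mass_da, charge=1, accel_voltage_v=20_000.0, tube_length_m=1.0):
    """Flight time in microseconds for an ideal linear TOF analyzer.

    The ion gains kinetic energy z*e*U in the source and then drifts the tube
    length at constant velocity. Voltage and length defaults are assumptions.
    """
    mass_kg = mass_da * DALTON_KG
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_kg)
    return tube_length_m / velocity * 1e6


# Two singly charged extension products differing by 16 Da (hypothetical masses).
for mass in (6000.0, 6016.0):
    print(f"{mass:.0f} Da: {flight_time_us(mass):.3f} us")
```

Under these assumptions a fourfold increase in mass doubles the flight time, which is why alleles separated by only a few daltons arrive very close together and need good instrument resolution.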
The motivating principle behind SNP genotyping using
MALDI-TOF MS is to detect the abundance of the DNA fragments
from each allele by measuring the intensities of the masses corre-
sponding to the two amplified DNA fragments. These fragments
are amplified using target-specific PCR and extensions conducted
at the SNP site. Each allele at the locus has a specific mass, and the
relative intensities of the amplified product at each mass are indicative
of the relative abundances of the alleles.
One of the advantages of SNP genotyping using MALDI-TOF
MS is its capacity for multiplexing. This multiplexing, which allows
multiple loci to be processed in a single analysis, makes this tech-
nology cost-effective when compared to the other high-throughput
technologies available [37]. One of the most used MALDI-TOF
MS genomics platforms is the Sequenom MassARRAY® system
[31]. The first optimized protocol for multiplexing developed by
Fig. 2 Schematic representation of MALDI-TOF MS. The DNA is embedded in a buffering matrix which forms a
crystalline structure. A laser beam is directed onto the matrix, which absorbs the greater part of the energy
and ionizes the analyte. The ionized molecules pass through a flight tube to a detector at the end. The flight
time is proportional to the square root of the mass-to-charge ratio (m/z) of the ionized molecule. The output
is usually given in the form of a three-peak spectrum. The first peak (m+3) corresponds to an unextended primer.
The second corresponds to the low mass allele (m+2) and the third corresponds to the high mass allele (m+1)
Quantitative SNP Genotyping of Polyploids with MassARRAY and Other Platforms 223
Fig. 3 A schematic representation of MassARRAY® hME and iPLEX assays. The first step is based on a locus-
specific PCR followed by a treatment with shrimp alkaline phosphatase (SAP) for both assays. In the hME
reaction one extra nucleotide is used in order to create larger mass differences between extension products.
In the example, the allele-specific extension is conducted using a normal T nucleotide (which is complemen-
tary to the SNP site A) along with other terminator nucleotides (A, C, and G). Thus, when a nucleotide A is
present as a template, the extension passes the SNP site and stops at the next nucleotide. When the other
allele is present (G in the example), the extension terminates exactly at the SNP site. This produces alleles with
mass differences corresponding to the mass of a nucleotide, which makes the multiplexing
routine feasible. The iPLEX reaction instead uses acyclic mass-modified terminators to perform the extension into
the SNP site. These mass-modified terminators used by iPLEX differentiate the mass of the extension products by
at least 16 Da. Since the mass differences can be detected directly at the SNP site without extra nucleotide
masses, it is possible to establish higher multiplex levels than the ones obtained when using hME [43, 47]
[Fig. 4 plots: panels (a) and (b), scatter plots with intensity of allele 2 on the vertical axis.]
Fig. 4 Examples of scatter plots of raw data presented in Serang et al. [5]. (a) Example of an autotetraploid
potato scatter plot of allele intensities in an association panel obtained using the Illumina GoldenGate™ assay.
The annotated scatter plot for this SNP is shown in Fig. 7. (b) Example of a sugarcane scatter plot of allele
intensities in an F1 biparental sugarcane population with unknown ploidy obtained using the Sequenom iPLEX
MassARRAY® technology. The annotated scatter plot for these two SNPs is shown in Fig. 8
3.2.1 Iterative Approach Iterative approaches utilizing mixture models were the first methods
devised to perform joint inference on the clusters and genotypes.
In particular, Voorrips et al. [49] extended the method of Fujisawa
et al. [50] from diploids to autotetraploids. These methods essentially
alternate between assigning points to a cluster and computing
a linear regression on each cluster, which provides the average
slope of the cluster (i.e., it computes m in the regression y = mx + b).
This average slope is then used to find an integer solution Y/X = m,
where X and Y are integers and X + Y = P, with P the ploidy. Points are
then reassigned to the nearest cluster (using the regression from each cluster),
and the process is continued until convergence is reached. Points
with total intensities that are too small (x² + y² < τ) are excluded from analysis
because small changes can affect their cluster membership, and can
in turn affect the slope of the cluster, resulting in cumulative errors
of nontrivial magnitude.
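The step that converts a fitted slope into integer doses can be sketched as follows. This covers only the snapping and filtering steps, not the full iterative procedure of Voorrips et al. [49]; the function names are illustrative, and comparing candidates by angle rather than by slope is a choice made here so that a vertical cluster (X = 0) can be handled.

```python
import math


def nearest_dosage(slope, ploidy):
    """Return integer doses (X, Y) with X + Y = ploidy whose ratio Y/X best matches slope.

    slope is assumed to be non-negative. Candidates are compared by angle so that
    X = 0, which corresponds to an infinite slope, remains a valid option.
    """
    target = math.atan(slope)
    candidates = [(ploidy - y, y) for y in range(ploidy + 1)]
    return min(candidates, key=lambda xy: abs(math.atan2(xy[1], xy[0]) - target))


def keep_point(x, y, tau):
    """Low-intensity filter: discard points with x^2 + y^2 < tau."""
    return x * x + y * y >= tau


# Hypothetical regression slopes from three clusters of a tetraploid locus.
for m in (0.05, 1.0, 3.1):
    print(m, nearest_dosage(m, 4))
```

Nothing in this snapping step prevents two clusters from mapping to the same dosage or a cluster from sitting midway between two dosages, which is the weakness discussed next.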
This iterative approach is intuitive; however, it does not fully
consider the interdependence of the three goals enumerated earlier.
As a result, the clusters are relatively unconstrained. After conver-
gence is reached, a cluster may be assigned to a nonsensical region
between two genotypes, with a slope m that does not correspond
well to any predicted dosage for that ploidy. Furthermore, the model
requires the ploidy to be known in advance (to determine the number
of components in the mixture model). It is not possible to infer
the true ploidy by simply trying several different possible ploidies,
because the unconstrained nature of the model inherently rewards
higher ploidies (consider a model in which each point forms its own
cluster, giving zero error in the regressions).
Essentially, these problems lead to a lack of “identifiability”;
even with an infinite sample size, it is not possible to distinguish between
ploidies that share common clusters (e.g., the 1:3 cluster from a tetraploid
population and the 2:6 cluster from an octoploid population).
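The shared clusters behind this identifiability problem can be listed directly. The sketch below represents each cluster by the expected share of one allele (dose divided by ploidy) and uses exact fractions so that 1:3 and 2:6 compare as equal; the function name is illustrative.

```python
from fractions import Fraction


def cluster_fractions(ploidy):
    """Expected share of one allele (dose / ploidy) for every possible dosage."""
    return {Fraction(dose, ploidy) for dose in range(ploidy + 1)}


tetraploid = cluster_fractions(4)
octoploid = cluster_fractions(8)

# Every tetraploid cluster position is also an octoploid cluster position.
print(sorted(tetraploid & octoploid))
print(tetraploid <= octoploid)
```

Since every cluster position available to a tetraploid is also available to an octoploid, cluster locations alone cannot favor the lower ploidy.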
3.2.2 Bayesian Approach The Bayesian approach used by SuperMASSA [5] starts with an
assumed ploidy and then computes the predicted location of geno-
type clusters and, by using population-level modeling, the predicted
distribution of genotypes in the population. The population-level
information (e.g., a population in Hardy–Weinberg equilibrium at
the locus or the progeny of an F1 cross) can then be used to compute
a likelihood that rewards solutions that not only offer tight
clusters at the predicted ratios (or angles) but also yield
plausible genotype distributions for the population.
Figure 5 shows the graphical model used by the Bayesian method
SuperMASSA: G denotes all of the genotype assignments and
[Fig. 6 plots: left and right panels, each a scatter plot of intensity of allele 2 against intensity of allele 1.]
Fig. 6 Illustration of a suboptimal genotype configuration presented in Serang et al. [5]. The left panel shows
an inferior genotype configuration, where two individuals are assigned genotypes where the relative intensities
of each individual are closer to the expected relative intensities for the genotype assigned to the other indi-
vidual. The right panel shows a superior genotype configuration, which swaps the genotypes of these two
individuals, achieving the same distribution of genotypes in the population, but with a lower distance between
the scatter plot points and the expected relative intensities for the predicted genotypes
the population, and the search space for the Gaussian width parameter
σ, which is used to model the noise in the intensity measurements.
Specifying a larger Gaussian width parameter σ decreases efficiency
of MAP inference because it permits highly diffuse clusters; thus
configurations that assign a point to a far-away cluster are not sub-
stantially penalized and cannot be aborted as early by the branch
and bound. The Bayesian method is the current method of choice
for genotype inference, particularly if the ploidy is not known or if
information or data concerning parents or population structure are
available.
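The way population-level information and cluster tightness combine can be illustrated with a toy score. This is not the SuperMASSA model or its inference procedure: the binomial dosage frequencies assume a Hardy–Weinberg population with random chromosome segregation, the Gaussian penalty is applied to angles as a simplification, and all function names and example values are hypothetical.

```python
import math


def hw_genotype_frequencies(ploidy, allele_freq):
    """Binomial dosage frequencies: P(dose k) = C(P, k) p^k (1 - p)^(P - k)."""
    return [
        math.comb(ploidy, k) * allele_freq**k * (1 - allele_freq) ** (ploidy - k)
        for k in range(ploidy + 1)
    ]


def log_score(points, doses, ploidy, allele_freq, sigma):
    """Toy log-score for one genotype configuration (requires 0 < allele_freq < 1).

    points: (intensity of allele 1, intensity of allele 2) for each individual.
    doses: assigned number of copies of allele 2 for each individual.
    sigma: Gaussian width; larger values penalize distant assignments less.
    """
    freqs = hw_genotype_frequencies(ploidy, allele_freq)
    total = 0.0
    for (x, y), dose in zip(points, doses):
        expected_angle = math.atan2(dose, ploidy - dose)
        observed_angle = math.atan2(y, x)
        total += -0.5 * ((observed_angle - expected_angle) / sigma) ** 2
        total += math.log(freqs[dose])
    return total


# Two hypothetical tetraploid individuals; swapping their genotypes keeps the
# genotype distribution unchanged but places each point far from its cluster.
points = [(30.0, 10.0), (10.0, 30.0)]
good = log_score(points, [1, 3], ploidy=4, allele_freq=0.5, sigma=0.1)
swapped = log_score(points, [3, 1], ploidy=4, allele_freq=0.5, sigma=0.1)
print(good > swapped)
```

Raising sigma in this toy score shrinks the gap between the two configurations, which mirrors the point above that a wide Gaussian makes poor assignments harder to rule out early.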
3.3 Example of Good Loci
Figures 7 and 8 depict examples of annotated loci using the two
models available in the current version of SuperMASSA, namely F1
and Hardy–Weinberg. The raw data for these figures are shown in
Fig. 4. The locus in Fig. 7 was obtained from a tetraploid potato
association panel and the Hardy–Weinberg model was used. The
scatter plot from Fig. 8 was obtained from a biparental cross of two
precommercial varieties of sugarcane with unknown ploidy. In that
case, the MAP configuration simultaneously found the ploidy level,
a set of two parents with their respective dosages, and the genotype
assignments for the scatter plot. In both figures, the theoretical
genotype distributions and observed genotype distributions are
nearly identical. Also, it is important to note that the genotype
annotations are extremely close to the predicted angles for each
assigned genotype.
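As a rough illustration of what a theoretical genotype distribution under Hardy–Weinberg looks like for a polyploid, the sketch below uses a binomial distribution over allele doses. This assumes random mating and no double reduction; the function name and the allele frequency are hypothetical, and the exact model used by SuperMASSA may differ.

```python
from math import comb


def hardy_weinberg_dosage_freqs(ploidy: int, allele_freq: float) -> list[float]:
    """Expected frequency of each allele dose 0..ploidy under a binomial model."""
    return [
        comb(ploidy, dose) * allele_freq**dose * (1.0 - allele_freq) ** (ploidy - dose)
        for dose in range(ploidy + 1)
    ]


# Hypothetical tetraploid locus with allele frequency 0.5.
expected = hardy_weinberg_dosage_freqs(ploidy=4, allele_freq=0.5)
for dose, freq in enumerate(expected):
    print(f"dose {dose}: {freq:.4f}")
```

Comparing such expected frequencies with the observed proportion of individuals assigned to each dose is the kind of check shown in the right-hand panels of Figs. 7 and 8.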
When the ploidy is unknown (as is the case for commercial
varieties of sugarcane), SuperMASSA searches for a ploidy which
yields a MAP in a specified range (see Note 1). In Fig. 8 the esti-
mated ploidy level was 10 and the posterior probability given by
SuperMASSA was extremely close to 1.00, indicating a good
Fig. 7 The annotated scatter plot of allele intensities for the SNP shown in Fig. 4a (a tetraploid potato association
panel). The first graphic shows the annotated scatter plot and the second shows the theoretical distribution of
genotypes in the population and the distribution of individuals assigned to each genotype with σ = 0.10. Data
extracted from Voorrips et al. [49]. Platform used: Illumina GoldenGate
232 Marcelo Mollinari and Oliver Serang
[Figure graphic: panels "Parents" and "Progeny" (y-axis: intensity of allele 2) and a frequency panel (y-axis: frequencies; legend: expected, observed)]
Fig. 8 The annotated scatter plot of allele intensities for the SNP shown in Fig. 4b (an F1 biparental sugarcane
population). The ploidy level was searched from 2 to 100 (only even numbers). The first graphic shows parental
data, consisting of 12 replicates of each parent. The second shows the annotated scatter plot for the F1 population,
and the third shows the theoretical distribution of genotypes in the population and the distribution of
individuals assigned to each genotype. Extracted from Serang et al. [5]. The posterior probability associated with
the classification was ≈1.00 and the estimated ploidy level was 10 with σ = 0.16. Platform used: Sequenom
MassARRAY with iPLEX chemistry
Fig. 9 Effect of the increment of the naive posterior report threshold on the number
of annotated individuals for the locus analyzed in Fig. 7. All individuals studied
have a posterior of at least 0.55 (dashed line) and more than 80 % of the data
have a posterior equal to or higher than 0.90 (dot-dashed line), which certainly
indicates a good result
4 Notes
[Figure graphic: three scatter plots (y-axis: intensity of allele 2)]
Fig. 10 Example of three annotated sugarcane SNPs with increasing levels of difficulty to estimate the correct
ploidy level (the signal/noise ratio decreases from left to right). The first two SNPs were genotyped in an F1
biparental sugarcane population. The third was genotyped in a sugarcane association panel. In all cases the
range where the ploidy level was searched was from 2 to 100 (only even numbers). The first locus yields
an estimated ploidy of eight with a high posterior probability (≈1.00). The second locus yields an estimated
ploidy of 10 and a lower posterior probability (≈0.51). The third locus yields an estimated ploidy of 76 with
a very low posterior probability (≈0.24). Platform used: Sequenom MassARRAY with iPLEX chemistry. Courtesy
of Dr. Anete P de Souza, Centro de Biologia Molecular e Engenharia Genética, UNICAMP, Campinas, Brazil
[Figure graphic: panels "Parents" and "Progeny" (x-axis: intensity of allele 1; y-axis: intensity of allele 2) and a frequency panel (x-axis: doses of allele 1; y-axis: frequencies; legend: expected, observed)]
Fig. 11 Annotated scatter plot of a SNP genotyped in an F1 biparental sugarcane population. The ploidy level
was searched from 2 to 100 (only even numbers). When there is too much noise (a low signal/noise ratio), it
is difficult to obtain a good classification. In this case, it can be identified by observing the low posterior
probability (≈0.47). The estimated ploidy level was 24. Platform used: Sequenom MassARRAY with iPLEX
chemistry. Courtesy of Dr. Anete P de Souza, Centro de Biologia Molecular e Engenharia Genética, UNICAMP,
Campinas, Brazil
[Figure graphic: panels "Parents" and "Progeny" (x-axis: intensity of allele 1; y-axis: intensity of allele 2) and a frequency panel (x-axis: doses of allele 1; y-axis: frequencies; legend: expected, observed)]
Fig. 12 Annotated scatter plot of a SNP genotyped in an F1 biparental sugarcane population. The ploidy level
was searched from 2 to 100 (only even numbers). It is possible to see a skew in parental and progeny scatter
plots. Although the skew has not been modeled, a high posterior was obtained (≈1.00), indicating a high quality
of inferred genotypes. The estimated ploidy level was 6. Platform used: Sequenom MassARRAY with iPLEX
chemistry. Courtesy of Dr. Anete P de Souza, Centro de Biologia Molecular e Engenharia Genética, UNICAMP,
Campinas, Brazil
Fig. 13 Annotated scatter plot of a SNP genotyped in an association sugarcane panel. The ploidy level was
searched from 2 to 100 (only even numbers). It is possible to see distinct clusters but the posterior probability
indicates a poor result (≈0.67). The estimated ploidy level was 12. Platform used: Sequenom MassARRAY with
iPLEX chemistry. Courtesy of Dr. Anete P de Souza, Centro de Biologia Molecular e Engenharia Genética,
UNICAMP, Campinas, Brazil
Quantitative SNP Genotyping of Polyploids with MassARRAY and Other Platforms 239
Acknowledgements
We are grateful for the help of Dr. Antonio Augusto Franco Garcia
of University of São Paulo ESALQ, Dr. Thiago G Marconi and Dr.
Anete P de Souza, Centro de Biologia Molecular e Engenharia
Genética, UNICAMP for generously sharing their data and expertise.
We would also like to thank Dr. Gary McDowell of Harvard Medical
School and Boston Children’s Hospital and Dr. Ryan Emerson of
Adaptive TCR Corporation for their suggestions.
References
1. Soltis DE, Albert V, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, Sankoff D, Depamphilis CW, Wall PK, Soltis PS (2009) Polyploidy and angiosperm diversification. Am J Bot 96:336–348
2. Darlington CD (1937) Recent advances in cytology. J&A Churchill, Ltd., London
3. Stebbins GL (1950) Variation and evolution in plants. Columbia University Press, New York
4. Hieter P, Griffiths T (1999) Polyploidy – more is more or less. Science 285:210–211
5. Serang O, Mollinari M, Garcia A (2012) Efficient exact maximum a posteriori computation for bayesian SNP genotyping in polyploids. PLoS One 7:e30906
6. Edme S, Comstock J, Miller J, Tai P (2005) Determination of DNA content and genome size in sugarcane. J Am Soc Sugar Cane Technol 25:1–16
7. Zhang J, Nagai C, Yu Q, Pan Y, Ayala-Silva T, Schnell R, Comstock J, Arumuganathan A, Ming R (2012) Genome size variation in three Saccharum species. Euphytica 185:511–519
8. D’Hont A, Grivet L, Feldmann P, Glaszmann J, Rao S, Berding N (1996) Characterisation of the double genome structure of modern sugarcane cultivars (Saccharum spp.) by molecular cytogenetics. Mol Gen Genet 250:405–413
9. D’Hont A (2005) Unraveling the genome structure of polyploids using FISH and GISH; examples of sugarcane and banana. Cytogenet Genome Res 109:27–33
10. Wu KK, Burnquist W, Sorrells ME, Tew TL, Moore PH, Tanksley SD (1992) The detection and estimation of linkage in polyploids using single-dose restriction fragments. Theor Appl Genet 83:294–300
11. Da Silva JAG, Honeycutt RJ, Burnquist W, Al-Janabi SM, Sorrells M, Tanksley SD, Sobral BWS (1995) Saccharum spontaneum L. ‘SES 208’ genetic linkage map combining RFLP- and PCR-based markers. Mol Breed 1:165–179
12. Sorrells ME (1992) Development and application of RFLP in polyploids. Crop Sci 32:1086–1091
13. Ripol MI, Churchill GA, Silva JAGD, Sorrells M (1999) Statistical aspects of genetic mapping in autopolyploids. Gene 235:31–41
14. Baker P, Jackson P, Aitken K (2010) Bayesian estimation of marker dosage in sugarcane and other autopolyploids. Theor Appl Genet 120:1653–72
15. Guo M, Davis D, Birchler JA (1996) Dosage effects on gene expression in a maize ploidy series. Genetics 142:1349–1355
16. Galitski T, Saldanha AJ, Styles CA, Lander ES, Fink GR (1999) Ploidy regulation of gene expression. Science 285:251–254
17. Wang J, Tian L, Lee HS, Wei NE, Jiang H, Watson B, Madlung A, Osborn TC, Doerge RW, Comai L, Chen ZJ (2006) Genomewide non additive gene regulation in Arabidopsis allotetraploids. Genetics 172:507–17
18. Osborn TC, Pires JC, Birchler JA, Auger DL, Chen ZJ, Lee HS, Comai L, Madlung A, Doerge RW, Colot V, Martienssen RA (2003) Understanding mechanisms of novel gene expression in polyploids. Trends Genet 19:141–147
19. Luo ZW, Hackett CA, Bradshaw JE, McNicol JW, Milbourne D (2000) Predicting parental genotypes and gene segregation for tetrasomic inheritance. Theor Appl Genet 100:1067–1073
20. Luo ZW, Hackett CA, Bradshaw JE, McNicol JW, Milbourne D (2001) Construction of a genetic linkage map in tetraploid species using molecular markers. Genetics 157:1369–1385
21. Ma CX, Casella G, Shen ZJ, Osborn TC, Wu R (2002) A unified framework for mapping quantitative trait loci in bivalent tetraploids using single-dose restriction fragments: a case study from Alfalfa. Genome Res 12:1974–1981
22. Luo ZW, Zhang RM, Kearsey MJ (2004) Theoretical basis for genetic linkage analysis in autotetraploid species. Proc Natl Acad Sci U S A 101:7040–7045
23. Luo ZW, Zhang Z, Leach L, Zhang RM, Bradshaw JE, Kearsey MJ (2006) Constructing genetic linkage maps under a tetrasomic model. Genetics 172:2635–2645
24. Leach LJ, Wang L, Kearsey MJ, Luo Z (2010) Multilocus tetrasomic linkage analysis using hidden Markov chain model. Proc Natl Acad Sci U S A 107:4270–4274
25. Xie C, Xu S (2000) Mapping quantitative trait loci in tetraploid populations. Genet Res 76:105–115
26. Hackett CA, Bradshaw JE, McNicol JW (2001) Interval mapping of quantitative trait loci in autotetraploid species. Genetics 159:1819–1832
27. Wu R, Ma CX, Casella G (2004) A mixed polyploid model for linkage analysis in outcrossing tetraploids using a pseudo-test backcross design. J Comput Biol 11:562–580
28. Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol 5:94–100
29. Fan JB, Oliphant A, Shen R et al (2003) Highly parallel SNP genotyping. Cold Spring Harb Symp Quant Biol 68:69–78
30. Oeth P, Beaulieu M, Park C, Kosman D, de Mistro G, van Den Boom D, Jurinke C (2007) iPLEX assay: increased plexing efficiency and flexibility for MassARRAY system through single base primer extension with mass-modified terminators. Sequenom application note. Sequenom, San Diego, CA
31. Oeth P, de Mistro G, Marnellos G, Shi T, van den Boom D (2009) Qualitative and quantitative genotyping using single base primer extension coupled with matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MassARRAY). In: Komar AA (ed) Single nucleotide polymorphisms. Humana, New York, pp 307–343
32. Akhunov E, Nicolet C, Dvorak J (2009) Single nucleotide polymorphism genotyping in polyploid wheat with the Illumina GoldenGate assay. Theor Appl Genet 119:507–517
33. Illumina, Inc. (2006) Goldengate assay workflow. Technical report. Illumina Inc., San Diego, CA
34. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3:e3376
35. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (gbs) approach for high diversity species. PLoS One 6:e19379
36. Bagge M, Lubberstedt T (2008) Functional markers in wheat: technical and economic aspects. Mol Breed 22:319–328
37. Ragoussis J, Elvidge GP, Kaur K, Colella S (2006) Matrix-assisted laser desorption/ionisation, time-of-flight mass spectrometry in genomics research. PLoS Genet 2:e100
38. Griffin TJ, Smith LM (2000) Single-nucleotide polymorphism analysis by MALDI-TOF mass spectrometry. Trends Biotechnol 18:77–84
39. Marziali A, Akeson M (2001) New DNA sequencing methods. Annu Rev Biomed Eng 3:195–223
40. Gabriel S, Ziaugra L, Tabbaa D (2009) SNP genotyping using the Sequenom MassARRAY iPLEX platform. Curr Protoc Hum Genet 60:1–18
41. Bradic M, Costa J, Chelo IM (2011) Genotyping with Sequenom. In: Rockman M, Orgogozo V (eds) Molecular methods for evolutionary genetics. Humana, New York, pp 193–210
42. Irwin D (2008) The MassARRAY system for plant genomics. In: Henry R (ed) Plant genotyping II, SNP technology. CSIRO Publishing, Collingwood, VIC, pp 98–113
43. Sequenom (2003) Multiplexing the homogeneous MassEXTEND assay. Sequenom, San Diego, CA
44. Storm N, Darnhofer-Patel B, van den Boom D, Rodi CP (2003) MALDI-TOF mass spectrometry-based SNP genotyping. In: Kwok P (ed) Single nucleotide polymorphisms – methods and protocols. Humana, New York, pp 241–262
45. Wang J, Roe B, Macmil S et al (2010) Microcollinearity between autopolyploid sugarcane and diploid sorghum genomes. BMC Genomics 11:261
46. Berard A, Le Paslier M, Dardevet M, Exbrayat-Vinson F, Bonnin I, Cenci A, Haudry A, Brunel D, Ravel C (2009) High-throughput single nucleotide polymorphism genotyping in wheat (Triticum spp.). Plant Biotechnol J 7:364–374
47. Sequenom (2007) Typer 40 manual. Sequenom, San Diego, CA
48. Bonk T, Humeny A (2001) MALDI-TOF-MS analysis of protein and DNA. Neuroscientist 7:6–12
49. Voorrips RE, Gort G, Vosman B (2011) Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics 12:172
50. Fujisawa H, Eguchi S, Ushijima M, Miyata S, Miki Y, Muto T, Matsuura M (2004) Genotyping of single nucleotide polymorphism using model-based clustering. Bioinformatics 20:718–726
Chapter 18
Abstract
In a separate chapter we describe a simple method for single nucleotide polymorphism (SNP) discovery
using genomic reduction. Here we describe a scalable and cost-effective SNP genotyping method based
on KBioscience’s competitive allele-specific PCR amplification of target sequences and endpoint fluorescence
genotyping (KASPar™) using a FRET-capable plate reader or Fluidigm’s dynamic array
high-throughput platform.
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_18, © Springer Science+Business Media New York 2015
244 Scott M. Smith and Peter J. Maughan
2 Materials
3 Methods
3.1 KASPar Genotyping via Fluidigm’s Dynamic Array
1. Inject control line fluid into the top and bottom control line
fluid reservoirs of the 96.96 chip (one syringe per reservoir)
(see Fig. 2).
2. Load chip into IFC Controller [HX] with the barcode facing
out and “Prime” the IFC. Priming takes approximately 20 min.
3. Prepare working KASPar primer mix: In a 96-well PCR plate
combine allele-specific primer 1, allele-specific primer 2, common
reverse primer, and nuclease free water as shown in Table 1
(see Note 9).
SNP Genotyping Using KASPar Assays 247
Fig. 2 Diagram of a Fluidigm 96.96 chip layout. Control line fluid is injected into
each of the control line fluid reservoirs and primed using the IFC Controller
(Control line fluid is pressurized causing it to enter the chip allowing control of
various valves). Ninety-six assays and 96 samples are then loaded into their
respective inlets (see Fig. 3). Assays and samples are then forced into the IFC
chip using the IFC controller
Table 1
KASPar primer mix preparation
Fig. 3 Diagram of one set of IFC inlets (assay or sample) and how to load them.
Using an 8-channel pipette, pipette assays and samples into their appropriate
inlets by column. A standard 8-channel pipette will pipette into every other inlet.
Pipette column 1 of the prepared 96-well plate of assays into the first column
(every other inlet as indicated by the blue highlighted inlets) then work right with
columns 2, 3, 4, 5, and 6. Pipette column 7 of the prepared 96-well plate of
assays into the first column of inlets (every other inlet as indicated by the purple
highlighted inlets) just below those pipetted previously. Work your way right with
columns 8, 9, 10, 11, and 12 until all inlets are filled. Repeat this pattern for
prepared samples, pipetting them into the sample inlets
prepared assay plate to chip assay inlets starting on the top left
working right filling every other inlet in each of the six inlet
columns. Pipette the remaining columns from the prepared
assay plate (columns 7–12) just below the previously pipetted
assays starting on the left working right (see Fig. 3).
11. Pipette 5 μL of samples into each sample inlet using an 8-channel
pipette. Pipette by column the first six columns from prepared
sample plate to chip sample inlets starting on the top left
working right filling every other inlet in each of the six inlet
columns. Pipette the remaining columns from the prepared
sample plate (columns 7–12) just below the previously pipetted
samples starting on the left working right (see Fig. 3).
12. Remove any bubbles in the sample and assay inlets (see Note 10).
13. Remove the blue plastic protector from the bottom of the IFC.
14. Place the chip in the FC1 Cycler with the barcode facing out
and thermal cycle using the touchdown conditions described
in Table 2 (see Notes 11 and 12).
Table 2
Touchdown PCR conditions for KASPar genotyping via Fluidigm’s
dynamic array
Table 3
Further cycling conditions for KASPar genotyping via Fluidigm’s
dynamic array
Fig. 4 Example of SNP assays using the KASPar genotyping on the Fluidigm access array. The image was
obtained from Fluidigm’s SNP Genotyping Analysis software and shows a Cartesian graph with three distinct
genotypic clusters. Each dot represents one sample
15. Prepare to read the chip by turning on the EP1 Reader and
opening the EP1 Data Collection software (see Note 13).
16. Remove the chip from the FC1 Cycler and place it in the EP1
Reader with the barcode facing out. Read the chip using EP1
Data Collection software’s on screen directions.
17. Remove the chip from the EP1 Reader and place it back in the
FC1 Cycler and cycle for an additional five cycles using the
conditions outlined in Table 3.
18. Repeat steps 16 and 17 one more time to obtain reads for 36,
41, and 46 cycles (see Note 14).
19. Use the Fluidigm SNP Genotyping Analysis software to analyze
the genotyping results. Genotyping results are plotted by SNP
assay on Cartesian graphs with each dot representing a single
sample genotype. Samples with the same genotype should group
together forming distinct genotype-specific clusters (see Fig. 4).
3.2 KASPar Genotyping via FRET-Capable Plate Reader
1. Prepare KASPar primer mix: In a 96-well PCR plate combine
allele-specific primer 1, allele-specific primer 2, common
reverse primer, and nuclease free water as described in Table 1.
2. Prepare a DNA plate by pipetting 4 μL of DNA into each well
of a 96 or 384 well plate and dry down the DNA sample in a
centrifugal evaporator (speed vac) or by leaving the sample
uncovered for several hours at room temperature in a laminar
flow hood (see Note 15).
3. Prepare individual KASPar primer master mixes for each SNP
assay by combining the components in Table 4 into individu-
ally labeled microfuge tubes. Dispense KASPar primer master
mixes into each well of the prepared DNA plate (see Note 16).
4. Seal the plate with an optically clear seal, vortex briefly and
centrifuge the plate.
5. Thermal cycle the reaction using the touchdown conditions as
described in step 14 of Subheading 3.1.
6. Capture end-point fluorescence signal using a FRET-capable
plate reader. Genotyping results can be plotted by SNP assay
on Cartesian graphs with each dot representing a single sample
genotype using KBiosciences Kluster Caller or other similar
software packages (see Fig. 4).
3.3 Specific Target Preamplification
The specific target amplification (STA) is an optional step. STA
reduces the complexity of the template DNA by targeting and preamplifying
the SNP amplicons. This step is most useful when the
starting template DNA is of low quality or quantity.
1. Prepare 10× STA Primer mix (final solution will contain 500 nM
of each primer): in a single PCR tube, combine 2 μL of each
forward primer and 2 μL of each reverse primer and bring the
final volume up to 400 μL by adding TE Buffer as described in
Table 5.
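The dilution arithmetic behind this step can be checked with a short sketch. The primer stock concentration is not stated in the text; the value computed below is only what the stated volumes and final concentration imply (C1 × V1 = C2 × V2), and the assay count used for the TE volume is a hypothetical example.

```python
def implied_stock_concentration_um(final_nm: float, final_volume_ul: float,
                                   primer_volume_ul: float) -> float:
    """Stock concentration (in uM) implied by C1 * V1 = C2 * V2."""
    return final_nm * final_volume_ul / primer_volume_ul / 1000.0


def te_buffer_volume_ul(n_forward: int, n_reverse: int,
                        primer_volume_ul: float = 2.0,
                        final_volume_ul: float = 400.0) -> float:
    """TE buffer needed to bring the pooled primers up to the final volume."""
    pooled = primer_volume_ul * (n_forward + n_reverse)
    if pooled > final_volume_ul:
        raise ValueError("pooled primers exceed the final volume")
    return final_volume_ul - pooled


# 2 uL of each primer in 400 uL at 500 nM final concentration.
stock_um = implied_stock_concentration_um(final_nm=500.0, final_volume_ul=400.0,
                                          primer_volume_ul=2.0)
# Hypothetical pool of 96 forward and 96 reverse primers.
te_ul = te_buffer_volume_ul(n_forward=96, n_reverse=96)
print(f"implied primer stock: {stock_um} uM; TE buffer to add: {te_ul} uL")
```

If the primer stocks on hand have a different concentration, the per-primer volume would need to be rescaled accordingly.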
Table 4
KASPar master mixes for genotyping via FRET-capable plate reader
Table 5
STA primer mix components
Table 6
STA premix components
Table 7
STA thermal cycling conditions
4 Notes
Columns 1–11: Sample 1 to Sample 11; column 12: Controls

Row  Assay   1    2    3    4    5    6    7    8    9    10    11    12
A    SNP 1   1/1  1/2  1/3  1/4  1/5  1/6  1/7  1/8  1/9  1/10  1/11  1/A1
B    SNP 2   2/1  2/2  2/3  2/4  2/5  2/6  2/7  2/8  2/9  2/10  2/11  2/A1
C    SNP 3   3/1  3/2  3/3  3/4  3/5  3/6  3/7  3/8  3/9  3/10  3/11  3/A2
D    SNP 4   4/1  4/2  4/3  4/4  4/5  4/6  4/7  4/8  4/9  4/10  4/11  4/A2
E    SNP 5   5/1  5/2  5/3  5/4  5/5  5/6  5/7  5/8  5/9  5/10  5/11  5/Het
F    SNP 6   6/1  6/2  6/3  6/4  6/5  6/6  6/7  6/8  6/9  6/10  6/11  6/Het
G    SNP 7   7/1  7/2  7/3  7/4  7/5  7/6  7/7  7/8  7/9  7/10  7/11  7/NTC
H    SNP 8   8/1  8/2  8/3  8/4  8/5  8/6  8/7  8/8  8/9  8/10  8/11  8/NTC
Fig. 5 Genotyping of 11 samples with eight SNP assays in a 96-well plate. Each row will
be filled with a single SNP assay (row A = SNP 1, row B = SNP 2, etc.). Each column (minus the last column) will be filled with a single
DNA sample (column 1 = Sample 1, column 2 = Sample 2, etc.). The last column will be used for controls.
Multiple positive as well as multiple negative controls are included. For this example, prepare eight KASPar
primer-specific master mixes. Each master mix should contain enough master mix for 14–16 samples (11
samples, 1 control, and 2–4 for overage). The table depicts the plate setup (first and second numbers
in well positions represent SNP assay and sample numbers, respectively)
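The plate setup in Fig. 5 can also be generated programmatically, which may help when preparing sample sheets. The sketch below is illustrative only: the function name is hypothetical, and the control labels (A1, A2, Het, NTC) are taken from the last column of the figure.

```python
import string

# Control placed in the last column of each row, as read from Fig. 5.
CONTROLS = ["A1", "A1", "A2", "A2", "Het", "Het", "NTC", "NTC"]


def plate_layout(n_samples: int = 11, controls: list[str] = CONTROLS) -> dict[str, str]:
    """Map well names (e.g. 'A1') to 'assay/sample' or 'assay/control' labels."""
    layout = {}
    for row_index, control in enumerate(controls):
        row = string.ascii_uppercase[row_index]
        assay = row_index + 1
        for column in range(1, n_samples + 1):
            layout[f"{row}{column}"] = f"{assay}/{column}"
        layout[f"{row}{n_samples + 1}"] = f"{assay}/{control}"
    return layout


layout = plate_layout()
for row in "ABCDEFGH":
    print(row, " ".join(layout[f"{row}{column}"] for column in range(1, 13)))
```

Each row shares one primer-specific master mix, so the layout also indicates how many reactions (12 per row before overage) each master mix must cover.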
Acknowledgments
References
1. Dou J, Zhao X, Fu X, Jiao W, Wang N, Zhang L, Hu X, Wang S, Bao Z (2012) Reference-free SNP calling: improved accuracy by preventing incorrect calls from repetitive genomic regions. Biology 7:17
2. Filiault DL, Maloof JN (2012) A genome-wide association study identifies variants underlying the Arabidopsis thaliana shade avoidance response. PLoS Genet 8:e1002589
3. Ogden R, Baird J, Senn H, Ross M (2012) The use of cross-species genome-wide arrays to discover SNP markers for conservation genetics: a case study from Arabian and scimitar-horned oryx. Conser Genet Resour 4:471–473
4. Blair MW, Cortes AS, Penmetsa RV, Farmer A, Carrasquilla-Garcia N, Cook DR (2013) A high-throughput SNP marker system for parental polymorphism screening, and diversity analysis in common bean (Phaseolus vulgaris L.). Theor Appl Genet 126:535–548
5. Foolad MR, Panthee DR (2012) Marker-assisted selection in tomato breeding. Crit Rev Plant Sci 31:93–123
6. Maughan PJ, Smith SM, Fairbanks DJ, Jellen EN (2011) Development, characterization, and linkage mapping of single nucleotide polymorphisms in the grain amaranths (Amaranthus sp.). Plant Genome J 4:92–101
Chapter 19
Abstract
Genotyping by sequencing (GBS) is a relatively new method used to determine the differences in the
genetic makeup of individuals. Its novelty stems from a combination of two already available methods:
genotyping and next-generation sequencing. Depending on the individual study design, GBS protocols
can take multiple forms; however, most share a sequence of core steps that have to be undertaken. These
include: sequencing of the DNA from the individuals of interest (usually two parents of a mapping population
and their progeny), mapping of the sequencing reads to the reference sequence, SNP calling and filtering,
SNP genotyping and imputation, followed by haplotype identification and downstream analysis. GBS has
a range of applications from general marker discovery, haplotype identification, and recombination charac-
terization to quantitative trait locus (QTL) analysis, genome-wide association studies (GWAS), and genomic
selection (GS). It has already been applied to a range of plant species including: rice, maize, artichoke, and
Arabidopsis thaliana. It is a promising approach which is likely to provide new and important insights into
plant biology.
Key words Genotyping, GBS, Markers, SNPs, SNP calling, Imputation, Haplotype identification,
Recombination
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_19, © Springer Science+Business Media New York 2015
258 Agnieszka A. Golicz et al.
3 GBS: Method
3.1 Sequencing
The main consideration when adopting a sequencing strategy for a
GBS project involves deciding between whole genome and reduced
representation sequencing. An advantage of reduced representation
sequencing is a reduction in the amount of sequence data required.
However, this should be balanced by the increased complexity of
DNA sequencing library preparation, downstream bioinformatics
analysis, and potential bias of reduced complexity sequencing.
A variety of reduced representation protocols exist, including exon-capture
[39], RNA-Seq [40], and RAD-Seq [38, 41] (for an exhaustive
review, refer to “Genome-wide genetic marker discovery and
genotyping using next-generation sequencing” [42]). One of the
most popular methods of genome size reduction for GBS is restric-
tion site-associated DNA sequencing (RAD-Seq). However, while
working efficiently for some species, several sources of bias in RAD-
Seq experiments have been identified [43]. These include restric-
tion fragment bias, restriction site heterozygosity, and PCR GC
content bias, and these should be taken into account during experimental
design. Recently, with the falling cost of Illumina DNA
sequence data generation, whole-genome sequencing has emerged as a
viable alternative to reduced representation. Whole-genome sequencing
[35, 44, 45] decreases the number of steps and cost of library
preparation, reduces the complexity of downstream bioinformatics
analysis, and eliminates biases stemming from the use of restriction
enzymes, but may require more sequence data than reduced repre-
sentation methods. A major advantage of whole-genome sequenc-
ing is that genotyping resolution can be adjusted by generating
different quantities of data, balancing resolution with cost.
Sequencing depth is another major consideration during GBS
experimental design. The optimal mean coverage per locus may
vary depending on the species, experimental goals, and strategies
Skim-Based Genotyping by Sequencing 261
3.2 Read Mapping
There are a variety of tools used to map reads to a reference genome
(see Note 3), including but not limited to: SOAP2 [47], Bowtie
[48], and BWA [49]. The specific choice of the mapping software
depends on the application (see Note 4). The characteristics to
be taken into account include speed, sensitivity, ability to discover
indels, availability of computing resources, and personal preference.
Usually only read pairs that map to a single position in the
genome should be considered. Discarding read pairs that map
equally well to more than one position reduces the false positive
variant discovery rate [50].
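The idea of keeping only uniquely mapped read pairs can be sketched as follows. The record layout is a hypothetical in-memory structure, not the output format of SOAP2, Bowtie, or BWA; in practice the number of equally good hits would be derived from whatever the chosen aligner reports.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    pair_id: str      # identifier shared by both mates of a pair
    mate: int         # 1 or 2
    chrom: str
    pos: int
    n_best_hits: int  # number of equally good mapping positions


def uniquely_mapped_pairs(alignments: list[Alignment]) -> list[str]:
    """Return IDs of pairs in which both mates have exactly one best hit."""
    by_pair = defaultdict(list)
    for aln in alignments:
        by_pair[aln.pair_id].append(aln)
    kept = []
    for pair_id, alns in by_pair.items():
        both_mates_present = len(alns) == 2 and {a.mate for a in alns} == {1, 2}
        if both_mates_present and all(a.n_best_hits == 1 for a in alns):
            kept.append(pair_id)
    return kept


example = [
    Alignment("pair1", 1, "Chr1", 1000, 1),
    Alignment("pair1", 2, "Chr1", 1400, 1),
    Alignment("pair2", 1, "Chr1", 5000, 3),  # mate maps equally well to 3 places
    Alignment("pair2", 2, "Chr1", 5400, 1),
    Alignment("pair3", 1, "Chr2", 700, 1),   # mate 2 did not map
]
kept_pairs = uniquely_mapped_pairs(example)
print(kept_pairs)
```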
3.3 SNP Calling and Filtering
GBS methods differ in that some, such as RAD-Seq, discover SNPs
during the genotyping process while others require an initial SNP
discovery process prior to genotyping. Many applications in plants
require prior knowledge of the SNP positions. SNP discovery is
usually performed by resequencing of the parental individuals fol-
lowed by read alignment to the available reference sequence and
SNP calling. Different SNP discovery tools are available and the
most appropriate one to use would depend on the species and data
type. Example SNP discovery tools include SHORE [51],
samtools mpileup [52], and SGSautoSNP [50]. The main conceptual
difference between SNP calling pipelines involves the use of the
reference sequence. Some use the reference sequence in the SNP
identification process, whereas the others use the reference for read
alignment only and then discover polymorphisms between the
aligned reads. During the SNP discovery process, sequence errors
and misaligned reads may result in false positive SNP calls and
these are considerations SNP discovery software needs to address
to ensure accurate prediction. Depending on the software used,
SNPs can be further filtered using criteria including: total coverage
at SNP position, number of reads supporting the SNP, proportion
of the reads supporting the SNP, and the quality score of bases at
the SNP position. The application of an appropriate choice of fil-
ters should remove a large proportion of false positive SNPs. In the
case of RAD-Seq, if a reference genome is available, the raw reads
can be aligned to the reference sequence. Alternatively, when
no reference sequence is available, de novo RAD tag analysis is possible.
Very similar sequences that differ only by a small number of
mismatches, and presumably represent the same locus, are clustered
together; SNPs and indels can then be identified between
alleles.
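The filtering criteria listed above (total coverage, number and proportion of supporting reads, and base quality) can be combined into a simple predicate. This is a minimal sketch: the record fields and all threshold values are hypothetical and would need to be tuned to the species, sequencing depth, and SNP caller used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateSnp:
    chrom: str
    pos: int
    total_coverage: int    # reads covering the SNP position
    supporting_reads: int  # reads carrying the alternative allele
    min_base_quality: int  # lowest base quality among supporting reads


def passes_filters(snp: CandidateSnp, min_coverage: int = 4, min_supporting: int = 2,
                   min_fraction: float = 0.2, min_quality: int = 20) -> bool:
    """Apply coverage, support, proportion, and quality thresholds in turn."""
    if snp.total_coverage == 0 or snp.total_coverage < min_coverage:
        return False
    if snp.supporting_reads < min_supporting:
        return False
    if snp.supporting_reads / snp.total_coverage < min_fraction:
        return False
    return snp.min_base_quality >= min_quality


candidates = [
    CandidateSnp("Chr1", 1200, total_coverage=12, supporting_reads=5, min_base_quality=30),
    CandidateSnp("Chr1", 3400, total_coverage=3, supporting_reads=3, min_base_quality=35),
    CandidateSnp("Chr1", 5600, total_coverage=40, supporting_reads=2, min_base_quality=32),
    CandidateSnp("Chr1", 7800, total_coverage=10, supporting_reads=6, min_base_quality=12),
]
kept_positions = [snp.pos for snp in candidates if passes_filters(snp)]
print(kept_positions)
```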
3.5 SNP Imputation
For certain applications SNP imputation may be desired. Imputation
involves inferring missing genotypes to generate a more complete
picture of the genetic makeup. In the skim sequencing applications,
where often only a portion of markers for each individual will be
genotyped, imputation becomes an important step in the data anal-
ysis pipelines. If an appropriate method is chosen, the accuracy of
imputation can be very high [44]. Imputation methods range from
fairly straightforward “filling in of the missing SNP” based on the
known surrounding genotypes in the recombinant populations, a
k nearest neighbors (KNN) algorithm-based model which was suc-
cessfully applied in the analysis of 500 unrelated rice samples [44],
through to sophisticated statistical methods which rely on linkage
disequilibrium structure and haplotype maps [53]. The choice of
imputation procedure requires careful consideration and needs to
be balanced with the data volume and desired resolution of geno-
typing. If too much information is missing from a sample, imputa-
tion may not be accurate and it may be beneficial to remove the
sample before analysis.
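The simplest approach mentioned above, filling in missing SNPs from the known surrounding genotypes in a recombinant population, can be sketched as follows. This is an illustration only: genotypes are coded as hypothetical parental labels "A" and "B", with None for missing calls, and a gap is filled only when both flanking calls agree, so gaps that may contain a recombination breakpoint are left untouched.

```python
from typing import Optional


def impute_by_flanks(genotypes: list[Optional[str]]) -> list[Optional[str]]:
    """Fill runs of missing calls whose left and right flanking calls agree."""
    imputed = list(genotypes)
    n = len(genotypes)
    i = 0
    while i < n:
        if genotypes[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing calls.
        j = i
        while j < n and genotypes[j] is None:
            j += 1
        left = genotypes[i - 1] if i > 0 else None
        right = genotypes[j] if j < n else None
        if left is not None and left == right:
            for k in range(i, j):
                imputed[k] = left
        i = j
    return imputed


# Markers ordered along one chromosome of a single individual.
calls = ["A", None, None, "A", None, "B", None]
imputed_calls = impute_by_flanks(calls)
print(imputed_calls)
```

Gaps at chromosome ends or between differing flanks stay missing here; more sophisticated methods would assign them probabilistically.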
3.6 Haplotype Identification and Further Analysis
The markers obtained from GBS may be used to construct high
density haplotype maps [44], estimate recombination rates [35, 36],
and perform GWAS [44] and genomic selection. The high number
of markers obtained compared to traditional genotyping provides
more precision for the downstream analysis.
Genotyping data can be used to scan for points in which
recombination rates are unusually high or low, i.e., recombination
hot- or cold-spots. SNP density can be measured along and
between the chromosomes. Analysis of SNP density provides infor-
mation regarding sequence conservation; the regions with lowest
SNP density presumably being the most conserved and the regions
with high SNP density being the least conserved. Sequence con-
servation may in turn provide clues regarding functionality and
evolutionary history. GWAS has emerged as a tool that enables
identification of genomic variation underlying complex traits. GWAS
is extremely convenient because it does not require any prior
knowledge about the location of the gene of interest. However, it
is dependent on the quality and the density of the markers used.
GBS has the potential to provide high density marker maps and will
improve the efficiency of GWAS in plants. Figure 1 depicts the
workflow in a sample GBS pipeline.
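Measuring SNP density along a chromosome, as described above, amounts to counting SNPs in consecutive windows. The sketch below assumes 1-based SNP positions and non-overlapping windows; the positions, chromosome length, and window size are hypothetical.

```python
def snp_density(positions: list[int], chrom_length: int, window: int) -> list[int]:
    """Count SNPs in consecutive non-overlapping windows (1-based positions)."""
    n_windows = (chrom_length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= chrom_length:
            raise ValueError(f"position {pos} lies outside the chromosome")
        counts[(pos - 1) // window] += 1
    return counts


# Hypothetical SNP positions on a 300 bp toy chromosome, 100 bp windows.
density = snp_density([1, 50, 100, 101, 250], chrom_length=300, window=100)
print(density)
```

Windows with unusually low counts would be candidates for conserved regions, subject to checking that low counts are not simply caused by poor read coverage.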
4.1 Experimental Design and Choice of Dataset
The aim of the experiment is to perform de novo SNP discovery
based on NGS data, genotyping of a population by skim sequencing,
and estimation of recombination frequency along Arabidopsis
thaliana chromosome 1.
Two parental individuals (P1 and P2) and 100 offspring dou-
ble haploid (DH) individuals were selected for sequencing (see Note
5). Illumina sequencing libraries, with an insert size of 500 bp,
were constructed for each individual. Then 100 bp paired reads
were generated using an Illumina HiSeq 2000. The parental indi-
viduals were sequenced to 30× coverage. The progeny individuals
were sequenced with average coverage of 1×.
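The sequencing effort implied by these coverage targets can be estimated with the usual relation coverage = (number of reads × read length) / genome size. The sketch below assumes a genome size of 120 Mbp for A. thaliana, which is not stated in the text, and uses the standard Poisson approximation for the fraction of positions left without any read.

```python
import math

GENOME_SIZE_BP = 120_000_000  # assumption for illustration, not from the text


def read_pairs_needed(coverage, genome_size_bp, read_length_bp=100):
    """Read pairs required for a target average coverage (C = N * L / G)."""
    bases_needed = coverage * genome_size_bp
    return math.ceil(bases_needed / (2 * read_length_bp))


def uncovered_fraction(coverage):
    """Expected fraction of positions with zero reads (Poisson model)."""
    return math.exp(-coverage)


parent_pairs = read_pairs_needed(30, GENOME_SIZE_BP)
progeny_pairs = read_pairs_needed(1, GENOME_SIZE_BP)
```

Under this model roughly a third of positions receive no read at 1× coverage, which is why the progeny genotypes need imputation while the parental SNP discovery at 30× does not.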
4.2 Read Mapping
Reads from the parental individuals and offspring were mapped to
the A. thaliana genome downloaded from Phytozome v9.0
(http://www.phytozome.net/) using SOAPaligner/soap2.
Only reads that mapped uniquely to one position in the genome were
considered, and a broad insert size range of 0–1,000 bp was selected.
SOAP parameters: -m 0 -x 1000 -r 0.
4.3 SNP Calling
SNPs were discovered using SGSautoSNP [50]. SGSautoSNP is a
SNP discovery tool designed specifically for complex, polyploid
plant genomes, though it also works well with simpler genomes.
SGSautoSNP represents a novel approach to SNP discovery, since
it does not consider the reference sequence during SNP calling.
Instead, it uses a reference to align reads from multiple samples
and then finds SNPs between samples by comparison of the mapped
reads. Also, because plant populations are often
inbred or doubled haploid and highly homozygous, SGSautoSNP
discards all SNPs that are heterozygous within a single sample,
deeming them most likely to be false calls caused by
mis-mapping of reads.
According to the SGSautoSNP strategy, reads from both paren-
tal individuals (P1 and P2) are aligned to the reference genome and
then polymorphisms between parents are identified. The resulting
SNP file is then used for genotyping of the DH population.
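The principle described above, comparing samples with each other rather than with the reference and discarding positions that look heterozygous within a sample, can be illustrated with a toy example. This is not the SGSautoSNP code; the input layout (a dictionary of base strings per position), the function name, and the `min_reads` cut-off are assumptions made for the sketch.

```python
def call_snps_between_samples(pileups, min_reads=2):
    """Toy between-sample SNP caller.

    pileups -- {sample: {position: "bases observed in mapped reads"}}
    Returns {position: {sample: allele}} for positions where every sample
    is homozygous and at least two samples carry different alleles.
    The reference base is never consulted.
    """
    positions = set.intersection(*(set(p) for p in pileups.values()))
    snps = {}
    for pos in sorted(positions):
        alleles = {}
        for sample, pile in pileups.items():
            bases = set(pile[pos])
            if len(pile[pos]) < min_reads or len(bases) != 1:
                alleles = None  # too shallow, or heterozygous within a sample
                break
            alleles[sample] = bases.pop()
        if alleles and len(set(alleles.values())) > 1:
            snps[pos] = alleles
    return snps
```

With the two parents as the only samples, the output corresponds to the list of polymorphisms between P1 and P2 that is carried forward to genotyping.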
4.4 SNP Genotyping
As outlined earlier, the reads from the 100 offspring doubled haploid
individuals were aligned to the same A. thaliana reference. A custom
parser, GenotypeSNPs.pl, compares the SNPs called by
SGSautoSNP to the mappings generated for the offspring and
checks which nucleotides are present at each SNP position.
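The comparison step can be pictured as a lookup of the bases seen in each offspring at the parental SNP positions. The sketch below is only an illustration of that idea and does not reproduce GenotypeSNPs.pl; the data layout and the function name are hypothetical.

```python
def genotype_offspring(parental_snps, offspring_pileup):
    """Assign each parental SNP position in one offspring to P1 or P2.

    parental_snps    -- {position: (p1_allele, p2_allele)}
    offspring_pileup -- {position: "bases observed in mapped reads"}
    Returns {position: "P1", "P2", or None when no clear call is possible}.
    """
    calls = {}
    for pos, (p1, p2) in parental_snps.items():
        bases = set(offspring_pileup.get(pos, ""))
        if bases == {p1}:
            calls[pos] = "P1"
        elif bases == {p2}:
            calls[pos] = "P2"
        else:
            calls[pos] = None  # no reads, mixed alleles, or unexpected base
    return calls
```

At 1× coverage many positions will return `None`, which is the missing data addressed by imputation.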
4.5 SNP Imputation
Because the DH individuals were sequenced at low coverage,
missing sequence data results in missing genotype calls. SNP
imputation was performed to “fill in” the missing genotypes based on
the haplotype structure of the parents. The principle behind the
imputation method is presented in Fig. 2.
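One simple way to fill gaps from parental haplotype structure is to assign a missing call to a parent only when the nearest called SNPs on both sides agree. The sketch below assumes that rule for illustration; it is not necessarily the method shown in Fig. 2, and the function name is hypothetical.

```python
def impute_from_flanks(calls):
    """Fill missing calls that lie between two calls from the same parent.

    calls -- list of "P1", "P2", or None, ordered by chromosome position
    Gaps flanked by different parents, and gaps at the chromosome ends,
    are left as None because a crossover could lie anywhere inside them.
    """
    imputed = list(calls)
    last_idx = None  # index of the most recent observed call
    for i, call in enumerate(calls):
        if call is None:
            continue
        if last_idx is not None and calls[last_idx] == call:
            for j in range(last_idx + 1, i):
                imputed[j] = call
        last_idx = i
    return imputed
```

Leaving ambiguous gaps unfilled keeps the imputation from inventing the position of a recombination breakpoint.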
4.6 Recombination Frequency Estimation
Based on the haplotypes of the two parents, the inheritance of
blocks of P1 (paternal homozygosity) and P2 (maternal homozygosity)
haplotypes was determined (see Note 6). Recombination events
[Plot for Fig. 3: CO events (y-axis, 0–100) against chromosome position [Mbp] (x-axis, 0.3–28.8)]
Fig. 3 Recombination frequencies. Recombination frequency for each position along A. thaliana chromosome 1.
The frequency was calculated by summing the cross over (CO) events for all the individuals in the window of
width 5,000 bp
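The summation described in the caption can be sketched as follows. The example assumes that a crossover is placed at the midpoint between two consecutive called SNPs of different parental origin; that placement rule and the function names are assumptions for illustration.

```python
def crossover_positions(positions, calls):
    """Approximate CO positions for one individual.

    positions -- SNP coordinates in ascending order
    calls     -- "P1", "P2", or None for each SNP
    A CO is placed midway between consecutive called SNPs that differ.
    """
    called = [(p, c) for p, c in zip(positions, calls) if c is not None]
    return [(p0 + p1) // 2
            for (p0, c0), (p1, c1) in zip(called, called[1:])
            if c0 != c1]


def co_per_window(individuals, positions, chrom_length, window=5_000):
    """Sum CO events over all individuals in fixed windows."""
    counts = [0] * (-(-chrom_length // window))  # ceiling division
    for calls in individuals:
        for co in crossover_positions(positions, calls):
            counts[min(co // window, len(counts) - 1)] += 1
    return counts
```

The resolution of each CO is limited by the distance between the flanking called SNPs, so narrow windows are only meaningful where marker density is high.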
5 Notes
References
1. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379
2. Botstein D, White RL, Skolnick M, Davis RW (1980) Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am J Hum Genet 32:314–331
3. Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M et al (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23:4407–4414
4. Jarne P, Lagoda PJL (1996) Microsatellites, from molecules to populations and back. Trends Ecol Evol 11:424–429
5. Arif IA, Bakir MA, Khan HA, Al Farhan AH, Al Homaidan AA, Bahkali AH, Sadoon MA, Shobrak M (2010) A brief review of molecular techniques to assess plant diversity. Int J Mol Sci 11:2079–2096
6. Edwards D, Forster JW, Chagné D, Batley J (2007) What are SNPs? In: Oraguzie NC, Rikkerink EHA, Gardiner SE, De Silva HN (eds) Association mapping in plants. Springer, New York, pp 41–52
7. Edwards D, Forster JW, Cogan NOI, Batley J, Chagné D (2007) Single nucleotide polymorphism discovery. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 53–76
8. Chagné D, Batley J, Edwards D, Forster JW (2007) Single nucleotide polymorphism genotyping in plants. In: Oraguzie N, Rikkerink E, Gardiner S, De Silva H (eds) Association mapping in plants. Springer, New York, pp 77–94
9. Duran C, Edwards D, Batley J (2009) Molecular marker discovery and genetic map visualisation. In: Edwards D, Hanson D, Stajich J (eds) Applied bioinformatics. Springer, New York, pp 165–189
10. Hayward A, Dalton-Morgan J, Mason A, Zander M, Edwards D, Batley J (2012) SNP discovery and applications in Brassica napus. J Plant Biotechnol 39:1–12
11. Batley J, Edwards D (2009) Mining for single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) molecular genetic markers. In: Posada D (ed) Bioinformatics for DNA sequence analysis. Humana, New York, pp 303–322
12. Appleby N, Edwards D, Batley J (2009) New technologies for ultra-high throughput genotyping in plants. In: Somers D, Langridge P, Gustafson J (eds) Plant genomics. Humana, New York, pp 19–40
13. Edwards D, Batley J, Snowdon R (2013) Accessing complex crop genomes with next-generation sequencing. Theor Appl Genet 126:1–11
14. Edwards D, Wang X (2012) Genome sequencing initiatives. In: Edwards D, Parkin IAP, Batley J (eds) Genetics, genomics and breeding of oilseed Brassicas. Science Publishers Inc., New Hampshire, pp 152–157
15. Imelfort M, Batley J, Grimmond S, Edwards D (2009) Genome sequencing approaches and successes. In: Somers D, Langridge P, Gustafson J (eds) Plant genomics. Humana, New York, pp 345–358
16. Edwards D, Batley J (2010) Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 7:1–8
17. Berkman PJ, Skarshewski A, Lorenc MT, Lai K, Duran C, Ling EYS, Stiller J, Smits L, Imelfort M, Manoli S, McKenzie M, Kubalakova M, Simkova H, Batley J, Fleury D, Dolezel J, Edwards D (2011) Sequencing and assembly of low copy and genic regions of isolated Triticum aestivum chromosome arm 7DS. Plant Biotechnol J 9:768–775
18. Berkman PJ, Skarshewski A, Manoli S, Lorenc MT, Stiller J, Smits L, Lai K, Campbell E, Kubalakova M, Simkova H, Batley J, Dolezel J, Hernandez P, Edwards D (2012) Sequencing wheat chromosome arm 7BS delimits the 7BS/4AL translocation and reveals homoeologous gene conservation. Theor Appl Genet 124:423–432
19. Berkman PJ, Visendi P, Lee HC, Stiller J, Manoli S, Lorenc MT, Lai K, Batley J, Fleury D, Šimková H, Kubaláková M, Weining S, Doležel J, Edwards D (2013) Dispersion and domestication shaped the genome of bread wheat. Plant Biotechnol J 11:564–571
20. Wang X, Wang H, Wang J, Sun R, Wu J, Liu S, Bai Y, Mun J-H, Bancroft I, Cheng F, Huang S, Li X, Hua W, Wang J, Wang X, Freeling M, Pires JC, Paterson AH, Chalhoub B, Wang B, Hayward A, Sharpe AG, Park B-S, Weisshaar B, Liu B, Li B, Liu B, Tong C, Song C, Duran C, Peng C, Geng C, Koh C, Lin C, Edwards D, Mu D, Shen D, Soumpourou E, Li F, Fraser F, Conant G, Lassalle G, King GJ, Bonnema G, Tang H, Wang H, Belcram H, Zhou H, Hirakawa H, Abe H, Guo H, Wang H, Jin H, Parkin IAP, Batley J, Kim J-S, Just J, Li J, Xu J, Deng J, Kim JA, Li J, Yu J, Meng J, Wang J, Min J, Poulain J, Hatakeyama K, Wu K, Wang L, Fang L, Trick M, Links MG, Zhao M, Jin M, Ramchiary N, Drou N, Berkman PJ, Cai Q, Huang Q, Li R, Tabata S, Cheng S, Zhang S, Zhang S, Huang S, Sato S, Sun S, Kwon S-J, Choi S-R, Lee T-H, Fan W, Zhao X, Tan X, Xu X, Wang Y, Qiu Y, Yin Y, Li Y, Du Y, Liao Y, Lim Y, Narusaka Y, Wang Y, Wang Z, Li Z, Wang Z, Xiong Z, Zhang Z (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1040
21. Varshney RK, Song C, Saxena RK, Azam S, Yu S, Sharpe AG, Cannon SB, Baek J, Tar'an B, Millan T, Zhang X, Rosen B, Ramsay LD, Iwata A, Wang Y, Nelson W, Farmer AD, Gaur PM, Soderlund C, Penmetsa RV, Xu C, Bharti AK, He W, Winter P, Zhao S, Hane JK, Carrasquilla-Garcia N, Condie JA, Upadhyaya HD, Luo M, Singh NP, Lichtenzveig J, Gali KK, Rubio J, Nadarajan N, Thudi M, Dolezel J, Bansal KC, Xu X, Edwards D, Zhang G, Kahl G, Gil J, Singh KB, Datta SK, Jackson SA, Wang J, Cook D (2013) Draft genome sequence of kabuli chickpea (Cicer arietinum): genetic structure and breeding constraints for crop improvement. Nat Biotechnol 31:240–246
22. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IM, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DM, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara ECM, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O'Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ (2008) Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456:53–59
23. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, Lohman KL, Lu H, Makhijani VB, McDade KE, McKenna MP, Myers EW, Nickerson E, Nobile JR, Plant R, Puc BP, Ronan MT, Roth GT, Sarkis GJ, Simons JF, Simpson JW, Srinivasan M, Tartaro KR, Tomasz A, Vogt KA, Volkmer GA, Wang SH, Wang Y, Weiner MP, Yu P, Begley RF, Rothberg JM (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380
24. IT (2013) Ion Torrent. http://www.iontorrent.com/
25. Imelfort M, Duran C, Batley J, Edwards D (2009) Discovering genetic polymorphisms in next-generation sequencing data. Plant Biotechnol J 7:312–317
26. Lorenc MT, Boskovic Z, Stiller J, Duran C, Edwards D (2012) Role of bioinformatics as a tool for oilseed Brassica species. In: Edwards D, Parkin IAP, Batley J (eds) Genetics, genomics and breeding of oilseed Brassicas. Science Publishers Inc, New Hampshire, pp 194–205
27. Duran C, Boskovic Z, Batley J, Edwards D (2011) Role of bioinformatics as a tool for vegetable Brassica species. In: Stiller J (ed) Vegetable Brassicas. Science Publishers, Inc., New Hampshire, pp 406–418
28. Edwards D (2011) Wheat bioinformatics. In: Bonjean A, Angus W, Van Ginkel M (eds) The world wheat book. Lavoisier, Paris, pp 851–875
29. Lee H, Lai K, Lorenc MT, Imelfort M, Duran C, Edwards D (2012) Bioinformatics tools and databases for analysis of next generation sequence data. Brief Funct Genomics 2:12–24
30. Berkman PJ, Lai K, Lorenc MT, Edwards D (2012) Next generation sequencing applications for wheat crop improvement. Am J Bot 99:365–371
31. Batley J, Edwards D (2009) Genome sequence data: management, storage, and visualization. Biotechniques 46:333–336
32. Duran C, Eales D, Marshall D, Imelfort M, Stiller J, Berkman PJ, Clark T, McKenzie M, Appleby N, Batley J, Basford K, Edwards D (2010) Future tools for association mapping in crop plants. Genome 53:1017–1023
33. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296:2225–2229
34. Huang X, Feng Q, Qian Q, Zhao Q, Wang L, Wang A, Guan J, Fan D, Weng Q, Huang T, Dong G, Sang T, Han B (2009) High-throughput genotyping by whole-genome resequencing. Genome Res 19:1068–1076
35. Yang S, Yuan Y, Wang L, Li J, Wang W, Liu H, Chen J-Q, Hurst LD, Tian D (2012) Great majority of recombination events in Arabidopsis are gene conversion events. Proc Natl Acad Sci. doi:10.1073/pnas.1211827110
36. Gore MA, Chia J-M, Elshire RJ, Sun Q, Ersoz ES, Hurwitz BL, Peiffer JA, McMullen MD, Grills GS, Ross-Ibarra J, Ware DH, Buckler ES (2009) A first-generation haplotype map of maize. Science 326:1115–1117
37. Scaglione D, Acquadro A, Portis E, Tirone M, Knapp SJ, Lanteri S (2012) RAD tag sequencing as a source of SNP markers in Cynara cardunculus L. BMC Genomics 13:3
38. Miller MR, Dunham JP, Amores A, Cresko WA, Johnson EA (2007) Rapid and cost-effective polymorphism identification and genotyping using restriction site associated DNA (RAD) markers. Genome Res 17:240–248
39. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, Shaffer T, Wong M, Bhattacharjee A, Eichler EE, Bamshad M, Nickerson DA, Shendure J (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461:272–276
40. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320:1344–1349
41. Baird NA, Etter PD, Atwood TS, Currey MC, Shiver AL, Lewis ZA, Selker EU, Cresko WA, Johnson EA (2008) Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 3:e3376
42. Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510
43. Davey JW, Cezard T, Fuentes-Utrilla P, Eland C, Gharbi K, Blaxter ML (2012) Special features of RAD Sequencing data: implications for genotyping. Mol Ecol 22:3151–3164
44. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Li C, Zhu C, Lu T, Zhang Z, Li M, Fan D, Guo Y, Wang A, Wang L, Deng L, Li W, Lu Y, Weng Q, Liu K, Huang T, Zhou T, Jing Y, Li W, Lin Z, Buckler ES, Qian Q, Zhang Q-F, Li J, Han B (2010) Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet 42:961–967
45. Wilkening S, Tekkedil MM, Lin G, Fritsch ES, Wei W, Gagneur J, Lazinski DW, Camilli A, Steinmetz LM (2013) Genotyping 1000 yeast strains by next-generation sequencing. BMC Genomics 14:90
46. Sampson J, Jacobs K, Yeager M, Chanock S, Chatterjee N (2011) Efficient study design for next generation sequencing. Genet Epidemiol 35:269–277
47. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K, Wang J (2009) SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25:1966–1967
48. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
49. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
50. Lorenc MT, Hayashi S, Stiller J, Lee H, Manoli S, Ruperao P, Visendi P, Berkman PJ, Lai K, Batley J, Edwards D (2012) Discovery of single nucleotide polymorphisms in complex genomes using SGSautoSNP. Biology 1:370–382
51. Ossowski S, Schneeberger K, Clark RM, Lanz C, Warthmann N, Weigel D (2008) Sequencing of natural strains of Arabidopsis thaliana with short reads. Genome Res 18:2024–2033
52. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
53. Halperin E, Stephan DA (2009) SNP imputation in association studies. Nat Biotechnol 27:349–351
Chapter 20
Abstract
The modified genotyping by sequencing method described here emphasizes verifying the success of each
library ligation by performing individual PCRs, before preparing the pool of barcoded amplicons to be
sequenced. Although this extra step might seem excessive, it will give peace of mind to the researcher knowing
each individual is represented in the data set and avoid additional data imputation at the analysis stage.
Key words Genome partitioning, GBS, Genotyping by sequencing, Next generation sequencing
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_20, © Springer Science+Business Media New York 2015
272 Elena Hilario
2 Materials
2.2 Enzymes and Kits
1. AccuPrime™ Taq DNA polymerase High Fidelity 5 U/μL and
10× AccuPrime™ High Fidelity Buffer I (Life Technologies,
CA, USA).
The Restriction Enzyme Target Approach to Genotyping by Sequencing (GBS) 273
3 Methods
3.1 DNA Digestion
1. Normalize all your DNA preparations to the same concentration
to speed up the pipetting steps (see Note 1).
2. Prepare a restriction enzyme master mix and aliquot into a
1 mL deep well plate, per reaction: 5 μL of 10× NEB reaction
buffer 3, 0.5 μL of 100× BSA, 1 μL of 20 U/μL BamHI, 1 μg
genomic DNA, deionized sterile water to 50 μL.
3. Seal the plate with aluminum tape and mix gently by vortexing.
4. Incubate the digestions at 37 °C for 3 h.
5. Spin down and store at −20 °C until ready for the next step.
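The master mix in step 2 can be scaled for a full plate with a short calculation. The sketch below assumes that all DNA samples were normalized to one concentration (step 1), so that water can be included in the master mix, and it adds a 10 % pipetting overage; the overage and the function name are assumptions, not part of the protocol.

```python
def digestion_mix(n_samples, dna_conc_ng_per_ul, overage=0.1):
    """Scale the BamHI digestion master mix (volumes in uL).

    Returns (master_mix_volumes, aliquot_per_well_ul, dna_per_well_ul).
    """
    per_rxn = {
        "10x NEB buffer 3": 5.0,
        "100x BSA": 0.5,
        "BamHI 20 U/uL": 1.0,
    }
    dna_ul = 1000.0 / dna_conc_ng_per_ul  # volume containing 1 ug of DNA
    water_ul = 50.0 - sum(per_rxn.values()) - dna_ul
    if water_ul < 0:
        raise ValueError("DNA too dilute: 1 ug does not fit in a 50 uL reaction")
    per_rxn["water"] = water_ul
    scale = n_samples * (1 + overage)
    master = {name: round(vol * scale, 2) for name, vol in per_rxn.items()}
    aliquot_ul = round(sum(per_rxn.values()), 2)
    return master, aliquot_ul, dna_ul
```

For 96 samples at 50 ng/μL this gives a 30 μL aliquot of master mix per well, to which 20 μL of DNA is added.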
3.3 Annealed GBS Adaptor Plate
1. In a new PCR plate add 6 μL of 1× TE, pH 7.5, to each well.
2. Aliquot 2 μL of the annealed common adaptor mix to each
well.
3. Aliquot 2 μL of annealed barcoded adaptor pair mix to their
corresponding well. Seal the plate and mix. This is now the
annealed GBS adaptor plate.
The concentration of the annealed GBS adaptor plate is
1 pmol/μL for each oligonucleotide: annealed GBp_1, GBn_1,
GC, and GCB.
For short-term storage (4–5 days), keep at 4 °C; otherwise,
store at −20 °C.
3.4 Anneal the Adaptor Pairs to the Digested DNA
1. Aliquot 1 μL from each well of the annealed GBS adaptor plate into
the corresponding well of the DNA restriction enzyme digestion
plate. Seal the plate, spin down briefly, mix gently by vortexing,
and spin down again.
2. Incubate the DNA restriction enzyme digestion plate containing
the annealed GBS adaptors at 65 °C for 5 min in a water bath.
3. Remove about 1.5 L of water from the water bath and place it
in the large plastic box.
4. Transfer the plate to the plastic box and let it cool down to
room temperature (~23 °C) for about 2 h.
5. Spin down briefly.
3.5 Ligation
1. Prepare a T4 DNA ligase master mix, per reaction: 3 μL of
deionized sterile water, 14 μL of 5× T4 DNA Ligase buffer, and
2 μL of T4 DNA Ligase 1 U/μL. Together with the 51 μL of digested
DNA + annealed GBS adaptors already in each well, this gives a
70 μL ligation reaction.
2. Aliquot 19 μL of the T4 DNA master mix into each well of the
plate containing the digested DNA and annealed GBS adaptors.
Change tips after each transfer. Seal the plate, mix by gently
vortexing, and spin down briefly.
3. Incubate the ligation reactions at 4 °C (refrigerator) overnight
(see Note 4).
4. Spin down at 500 × g for 5 min at room temperature.
5. Bring the total volume to 100 μL by adding 1 μL of 20 mg/mL
dextran blue, and 29 μL of 1× TE pH 7.5. Spin down briefly
and mix by gently vortexing.
6. Add 10 μL of 3 M sodium acetate pH 5.2, mix by vortexing,
and add 200 μL of absolute ethanol. Seal the plate and
mix gently by vortexing. Incubate at −20 °C for at least 2 h, or
overnight.
7. Spin down at 1,000 × g for 25 min, in the cold room if possible;
otherwise at room temperature.
8. Discard supernatant by inverting the plate over a container.
Add 200 μL of 70 % ethanol. Let it stand at room temperature
for 30–60 min (see Note 5). Spin down as in step 7.
9. Discard the supernatant and blot the plate over two paper tow-
els to remove all liquid.
10. To remove all traces of ethanol solution spin down the plate
inverted over a piece of paper towel for 3–4 s only.
11. Let the plate air dry at room temperature for 10 min.
12. Add 50 μL of 1× TE, pH 7.5. Seal the plate and vortex thor-
oughly. Let the DNA dissolve completely at 4 °C overnight
(see Note 6).
13. For long-term storage, keep at −20 °C.
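The volumes in steps 1, 2, 5, and 6 can be checked with simple bookkeeping. The sketch below only re-adds the volumes given in the protocol and derives the resulting concentrations; the variable names are illustrative.

```python
def final_conc(stock, added_ul, total_ul):
    """Concentration after dilution (C1 * V1 = C2 * V2)."""
    return stock * added_ul / total_ul


# Step 1: 50 uL digest + 1 uL annealed adaptors are already in the well.
digest_ul = 50.0 + 1.0
master_mix_ul = 3.0 + 14.0 + 2.0             # water + 5x buffer + ligase
ligation_total_ul = digest_ul + master_mix_ul
buffer_x = final_conc(5.0, 14.0, ligation_total_ul)

# Step 5: dextran blue and TE bring the volume up before precipitation.
precip_ul = ligation_total_ul + 1.0 + 29.0

# Step 6: sodium acetate, then ethanol.
with_salt_ul = precip_ul + 10.0
sodium_acetate_M = final_conc(3.0, 10.0, with_salt_ul)
with_ethanol_ul = with_salt_ul + 200.0
ethanol_fraction = 200.0 / with_ethanol_ul
```

The 19 μL aliquot in step 2 matches the master mix without DNA, the ligase buffer ends at 1×, and the precipitation runs at about 0.27 M sodium acetate and about 65 % ethanol in a final volume of 310 μL per well.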
3.6 Amplification (See Note 7)
1. Dissolve PCR primers PPA and PPB in 1× TE pH 7.5 to a
stock concentration of 1 nmol/μL. Dilute 1:100 in 1× TE
pH 7.5 to a working solution of 10 pmol/μL.
2. Make a PCR master mix for the total number of libraries that
need to be amplified (see Note 8), per reaction: 40.3 μL of
deionized sterile water, 5 μL of 10× AccuPrime™ High Fidelity
Buffer I (see Note 9), 1 μL of PPA 10 pmol/μL, 1 μL of PPB
10 pmol/μL, and 0.2 μL of AccuPrime™ Taq DNA polymerase
High Fidelity 5 U/μL. The 2.5 μL of template from the
GBS_BamHI library plate (corresponding well) is added
separately in step 4.
3. Aliquot 47.5 μL of the PCR master mix in each well of a new
PCR plate labeled as GBS_BamHI amplified library plate.
4. Add 2.5 μL of each GBS_BamHI library plate well to its cor-
responding location in the GBS_BamHI amplified library plate.
Seal the plate and mix gently by vortexing. Spin down the plate
briefly.
5. Run the following PCR profile: 72 °C, 5 min → 94 °C,
1 min → (94 °C, 30 s → 65 °C, 30 s → 68 °C, 30 s) × 25
cycles → 68 °C, 5 min → stop, leave at room temperature.
6. Run 15 μL of each GBS_BamHI amplified library per lane in a
2 % agarose gel (see Note 10), in 1× TAE buffer, to confirm
that every library was successfully prepared and amplified.
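The per-reaction volumes, primer concentration, and cycling time of this subheading can be checked as follows. The buffer is entered as the 10× stock listed in Subheading 2.2, and the run time counts hold times only, ignoring ramping between temperatures; the names used are illustrative.

```python
# Per-reaction volumes in uL from step 2 (template is added in step 4).
MASTER_MIX_UL = {
    "water": 40.3,
    "10x AccuPrime buffer I": 5.0,
    "primer PPA, 10 pmol/uL": 1.0,
    "primer PPB, 10 pmol/uL": 1.0,
    "AccuPrime Taq HiFi, 5 U/uL": 0.2,
}
TEMPLATE_UL = 2.5

master_mix_total = round(sum(MASTER_MIX_UL.values()), 1)  # aliquot per well
reaction_total = round(master_mix_total + TEMPLATE_UL, 1)

primer_working = 1000.0 / 100   # 1 nmol/uL stock diluted 1:100, in pmol/uL
primer_final_uM = primer_working * 1.0 / reaction_total   # pmol/uL equals uM
polymerase_units = 0.2 * 5.0

# (temperature in C, hold in seconds); CYCLE is repeated N_CYCLES times.
PRE = [(72, 300), (94, 60)]
CYCLE = [(94, 30), (65, 30), (68, 30)]
POST = [(68, 300)]
N_CYCLES = 25


def hold_minutes(pre, cycle, n_cycles, post):
    """Total programmed hold time in minutes, without ramp times."""
    seconds = (sum(s for _, s in pre)
               + n_cycles * sum(s for _, s in cycle)
               + sum(s for _, s in post))
    return seconds / 60
```

The same function can be called with 18, 23, or 30 cycles when planning the cycle optimization described in Note 7.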
3.7 GBS Library Pool and Clean Up
1. To prepare the GBS_BamHI library pool: transfer 20–30 μL of
each amplified library from the GBS_BamHI amplified library
plate into one 15 mL Falcon centrifuge tube. Measure the final
volume.
2. If there is any remaining PCR reaction you may want to keep
as backup, seal the GBS_BamHI amplified library plate and
store at −20 °C.
3. Purify the GBS_BamHI library pool with QIAquick® PCR
Purification Kit with the following modifications:
(a) Add 5 volumes of PB buffer to the library pool in the
15 mL Falcon (step 1), mix thoroughly by vortexing.
Incubate for 5 min at room temperature.
(b) Load 4 QIAquick® PCR columns with the PB/library mixture.
Follow the manufacturer’s instructions, except let the EB
buffer stand on the column for 10 min at room temperature
before centrifugation for a complete elution. Final total volume is
~200 μL. Store at 4 °C, in a screw-capped 1.5 mL tube.
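Before pooling, it can be useful to check that the pool plus 5 volumes of PB buffer will fit in the tube. The sketch below assumes a 96-library plate in its usage, which the protocol does not state; the function name and the returned keys are illustrative.

```python
def pool_plan(n_libraries, ul_per_library, tube_capacity_ml=15.0, pb_volumes=5):
    """Volumes for pooling amplified libraries and adding PB buffer."""
    pool_ul = n_libraries * ul_per_library
    pb_ul = pb_volumes * pool_ul
    total_ml = (pool_ul + pb_ul) / 1000
    return {
        "pool_ul": pool_ul,
        "pb_ul": pb_ul,
        "total_ml": total_ml,
        "fits_in_tube": total_ml <= tube_capacity_ml,
    }


low = pool_plan(96, 20)    # lower end of the 20-30 uL range
high = pool_plan(96, 30)   # upper end of the range
```

Under the 96-library assumption, 20 μL per library fits in a 15 mL tube after PB addition, whereas 30 μL per library does not, so the pooled volume may need to be adjusted or the mixture split.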
4 Notes
this pellet and allow the 70 % ethanol to wash the salts out
of the DNA, let diffusion and time do this job.
6. Do not resuspend the DNA by pipetting. Let it dissolve gently
into the buffer solution overnight.
7. This is the most crucial step in the protocol. A high fidelity
enzyme is recommended for the amplification. This protocol is
based on an end-point PCR approach. An optimization step to
determine the ideal number of cycles should be performed,
testing 18, 23, 25, and 30 cycles. The expected amplicon profile
should be a smooth smear from >200 bp up to 1 kbp. The presence of
prominent bands is not desirable, but sometimes unavoidable.
These bands are due to amplification bias or to repetitive
elements in the genome which contain that particular restriction
enzyme target site. Amplification bias can be reduced, and more
uniform coverage obtained, with a real-time PCR approach;
commercial kits are available (e.g., Kapa Biosystems).
Prominent bands due to repetitive elements containing the
restriction enzyme site could be avoided if some sequence
information is known about these problematic elements. In most
cases this issue is hard to avoid, but these data points can
be removed bioinformatically at the analysis stage.
I strongly recommend you perform an individual amplification
of each library and analyze it before pooling them all
for sequencing (Subheading 3.7). This extra step will ensure
that each individual’s genomic DNA was successfully digested,
ligated, and amplified. For optimal visualization of the amplified
libraries, analyze the amplicons produced in the cycle optimization
step on the Bioanalyzer.
8. Pipette the PCR master mix in exactly the order shown in the
table to avoid any contamination. After pipetting primer PPB,
close the tube, vortex, spin down briefly, and then add the
DNA polymerase directly into the solution, pipetting a few
times to release all the enzyme into the liquid. This practice is
recommended for any molecular biology procedure. It allows
any stabilizing agents in the reaction buffer (e.g., bovine serum
albumin, polyethylene glycol, glycerol, etc.) to coat the inside
of the plastic tube and minimize the adsorption of the enzyme
to the walls.
9. The amplification also works with 10× AccuPrime High
Fidelity Buffer II, included with the enzyme. The difference
between buffers is the DNA template used: Buffer I is opti-
mized for small DNA fragments, and Buffer II is optimal for
genomic DNA.
10. To prepare a sturdy and transparent 2 % agarose gel, use 1 g
of any standard agarose (molecular biology grade) and 1 g of
Acknowledgments
I would like to thank Lena Fraser, Lorna Barron, and Anne Gunson
(The New Zealand Institute for Plant and Food Research) for their
valuable comments and corrections to this protocol.
References
1. Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML (2011) Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 12:499–510
2. Turner EH, Ng SB, Nickerson DA, Shendure J (2009) Methods for genomic partitioning. Annu Rev Genomics Hum Genet 10:263–284
3. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE (2011) A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 6:e19379. doi:10.1371/journal.pone.0019379
4. Ko W-Y, David RM, Akashi H (2003) Molecular phylogeny of the Drosophila melanogaster species subgroup. J Mol Evol 57:562–573
5. van Gurp T. www.deenabio.com/services/gbs-adapters
6. Davey JW (ed) www.wiki.ed.ac.uk/display/RADSequencing/Home
Chapter 21
Abstract
The advent of Next-Generation sequencing-by-synthesis technologies has fuelled SNP discovery, genotyping,
and screening of populations in myriad ways for many species, including various plant species. One tech-
nique widely applied to screening a large number of SNP markers over a large number of samples is the
Illumina Infinium™ assay.
Key words Illumina Infinium™ assay, SNP discovery, SNP selection, SNP genotyping, Consortia
1 Introduction
1.1 What Are SNPs?
Single Nucleotide Polymorphisms (SNPs) are individual nucleotide
base differences between two DNA sequences. SNPs are the
most common type of known DNA variation. In principle each
nucleotide could have four different variants at any particular site;
however, in general SNPs are biallelic and can be categorized
according to the type of nucleotide substitution as either a
transition (C/T or G/A) or a transversion (C/G, A/T, C/A, or T/G).
The disadvantage of biallelic markers, when compared to
multiallelic markers such as SSRs, is compensated by the relative
abundance of SNPs. For example, on average one SNP is found every
29 and 288 bp in the potato and apple genomes, respectively [1, 2].
Consequently, SNPs have replaced microsatellites as the marker of
choice in plant genetics due to their potential for high multiplexing
in one reaction and ease of data analysis and interpretation.
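The transition/transversion distinction follows from whether both alleles belong to the same base class (purine or pyrimidine). A minimal sketch, with an illustrative function name:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(allele_a, allele_b):
    """Return "transition" or "transversion" for a biallelic SNP."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b or not {a, b} <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

Substitutions within a class (A/G, C/T) are transitions; substitutions between classes are transversions.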
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_21, © Springer Science+Business Media New York 2015
David Chagné et al.
1.2 Principle of the Infinium™ Chemistry
The Infinium™ assay (Illumina, Inc) relies upon probes designed to
target a sequence immediately upstream of a target SNP. The probes
are attached to beads and deployed on a fixed glass slide format with
an average of 15× redundancy for each SNP genotype. The assay
can target 3,000–1 million SNPs for each sample in a
single experiment. The Infinium™ assay involves a single workflow,
but a closer look reveals that two different assays are used to target
the maximum possible number of SNPs of interest from a given genome.
The Infinium™ I assay interrogates each SNP using two allele-specific
probes on two separate bead types. The other Infinium™ chemistry
(Infinium™ II) uses a single probe and bead type to query a
SNP, with a base extension providing the discriminating allele
information. The 3′ end of the oligonucleotide probe is extended
by a DNA polymerase using labeled ddNTPs (single base extension).
The terminating fluorescent dye corresponds to one of the two target
alleles, which makes it possible to detect two allelic variants for
a variable site and discriminate heterozygous from homozygous
genotypes. The Infinium™ II assay uses two dyes: one dye for both
adenine and thymine, and another dye for both cytosine and guanine.
Therefore, A/T and C/G transversion SNPs cannot be resolved by
dye color alone and require two bead types to discriminate
between the target alleles. The distinction
between one-bead type and two-bead type assays is most important
in the design phase, as targeting one-bead type SNPs will optimize
the space on any given array format.
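The design rule in this paragraph can be expressed as a small lookup: if both alleles of a SNP share a dye, two bead types are needed; otherwise one suffices. The dye labels below are placeholders and the function names are illustrative, not part of any Illumina design tool.

```python
# Placeholder dye labels: one dye for A and T, another for C and G.
DYE = {"A": "dye_AT", "T": "dye_AT", "C": "dye_CG", "G": "dye_CG"}


def bead_types_required(allele_a, allele_b):
    """Number of bead types needed to genotype one biallelic SNP."""
    a, b = allele_a.upper(), allele_b.upper()
    if a == b:
        raise ValueError("a SNP needs two different alleles")
    return 2 if DYE[a] == DYE[b] else 1


def array_bead_budget(snps):
    """Total bead types consumed by a list of (allele_a, allele_b) SNPs."""
    return sum(bead_types_required(a, b) for a, b in snps)
```

Summing over a candidate list shows how many array positions are spent on A/T and C/G SNPs, which is the trade-off referred to in the design phase.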
1.3 Applications of SNPs for Plant Genetics and Examples of Infinium™ Assays for Plants
SNPs are widely used to understand evolutionary and genetic
relationships between and within species, to identify correlations
to disease status in humans, and to investigate traits of agronomic
interest in high-value livestock and crops. SNPs provide an important
source of molecular markers that are useful in genetic mapping,
map-based positional cloning, detection of marker–trait gene
associations through linkage and linkage disequilibrium mapping,
and the assessment of genetic relationships between individuals.
The low mutation rate of SNPs makes them excellent markers for
studying complex genetic traits as well as genome evolution [3].
The Infinium™ assay has been used for a range of plant species.
Myles et al. [4] have characterized genome-wide patterns of genetic
variation in several hundred cultivars of Vitis vinifera and its wild
relative V. sylvestris using the grape 9000 SNP Infinium™ array [5].
They show that V. vinifera was domesticated from V. sylvestris in
the Near East and have identified parent–offspring and sibling con-
nections, most of them first-degree relationships, between some
well-known varieties. The apple SNP array of 9,000 SNPs [2] was
used for assessing the efficiency of genomic selection for improving
fruit quality in an apple breeding program [6] and to develop a
dense SNP-based linkage map of an apple rootstock progeny [7].
In sunflower, a 10K array was developed and used for diversity
analyses [8] and for the construction of a dense genetic map based
on multiple crosses [9]. In potato (an auto-tetraploid) a 10K array
was designed based on SNPs located in candidate genes, as well as
the potato genome sequence [10]. So far, selected SNPs have
been used for studying the allelic variation in a diversity panel [10],
and for the construction of two diploid genetic linkage maps, each
with the reference potato genome sequence genotype as a parent
[11]. The maps resulted in an improved anchoring of sequence
scaffolds to the potato genome assembly.
1.4 Genotyping Budgets and Advantages of Consortia
Despite the vital importance of plants as a source of food, the use
of the Infinium™ technique for plants has lagged behind its application
for human and major livestock species (e.g., chicken, pig,
cattle, and sheep). The research communities working on plant
species tend to be small and fragmented and not as well resourced
as the human and animal research communities. Nevertheless,
the demand for high-throughput genotyping is high. The major
contributor to recent growth in the ability of plant geneticists to
use the Infinium™ technique has been the development of world-
scale research consortia to enable a concerted design of Infinium™
assays. Consortia offer an opportunity for a research community to
drive the development of an SNP array while sample contributors
drive wide adoption and validation of common SNP content per-
ceived by the community as needed to capture the genome of
interest. Researchers often have overlapping goals that are best
addressed by a combined effort for the development of a single
tool or set of common tools where economies of scale can be leveraged.
Tool development may include a strategy of SNP selection
that targets haplotype blocks (i.e., “tag SNPs” or tSNPs), even
coverage across the genome, specific gene-rich regions, or a
combination of these in a single SNP array tool. Tools that meet
multiple needs often have a marker density of at least several
thousand SNPs [2, 12, 13], although in some cases marker sets as
small as 384 SNPs have been widely adopted for targeted common
purposes (e.g., crosses between Oryza sativa indica and japonica
rice lines) [14], as described in the GoldenGate chapter in this
volume. Where it is undesirable for content to be shared (e.g., by
commercial partners), proprietary marker content can be added to the
base content of a genotyping BeadChip to make a custom version
for that partner alone.
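Of the selection strategies mentioned above, even coverage across the genome is the simplest to sketch. The example below keeps the best-scoring candidate in each fixed-size window; the window size, the candidate tuples, and the notion of a per-SNP "score" are hypothetical and stand in for whatever design or confidence value a project actually tracks.

```python
def select_even_coverage(candidates, window_bp):
    """Keep the highest-scoring SNP in each window of `window_bp` bases.

    `candidates` is an iterable of (chromosome, position, score) tuples.
    The score is a hypothetical quality value (higher is better).
    """
    best = {}
    for chrom, pos, score in candidates:
        key = (chrom, pos // window_bp)  # window index along the chromosome
        if key not in best or score > best[key][2]:
            best[key] = (chrom, pos, score)
    return sorted(best.values())


# Made-up candidates on two chromosomes.
candidates = [
    ("chr1", 1200, 0.90),
    ("chr1", 4800, 0.70),
    ("chr1", 5100, 0.95),
    ("chr1", 9900, 0.60),
    ("chr2", 300, 0.80),
]
for snp in select_even_coverage(candidates, window_bp=5000):
    print(snp)
```

A fixed window is only one possible choice; spacing by genetic distance or by haplotype block would follow the same pattern with a different window key.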
1.5 SNP Discovery and SNP Selection for Infinium™ Assay
SNP detection using next-generation sequencing platforms gives
access to the variation of a species, either for one selected individual
or entire core collections, germplasm sets, breeding lines, or the
diversity set across a species’ natural range. Nevertheless, querying
all SNPs detected in the genome is both prohibitive and unnecessary,
since many of them provide redundant information, so a strategy for
SNP selection is needed for assay design. For example, a genetic map
experiment
would only require a few thousand markers, which is far fewer than
the millions that may be detected originally. Ideally the SNPs to be
selected for a high-throughput assay can be validated to ensure an
optimum conversion rate to polymorphic markers. This is unrealistic
for many species of plants for which no validation dataset is
available. A few tricks can be used to work around this during SNP
discovery. For example, using pedigreed populations for the origi-
nal sequencing for SNP detection enables sorting of true and false
SNPs by looking at their segregation patterns in the population.
This increases the confidence for each SNP converting to a poly-
morphic assay, which is a useful parameter to track and use when
finalizing SNP selection. Other SNP selection criteria can include
focus on specific SNP sites based on their location in the genome
(evenly distributed or in clusters), proximity or affiliation with
gene coding regions, or SNP type (Infinium I or Infinium II).
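The idea of sorting true from false SNPs by their segregation pattern can be sketched as a chi-square goodness-of-fit test against Mendelian expectations. This is a minimal illustration, not the procedure used for any of the arrays cited here; the cross labels, the 5 % significance level, and the strict rejection of any "impossible" genotype are assumptions made for the example.

```python
# Critical values of the chi-square distribution at alpha = 0.05.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991}

# Expected genotype ratios (AA, AB, BB) in the progeny of three cross types.
EXPECTED = {
    "ABxAB": (0.25, 0.50, 0.25),
    "ABxAA": (0.50, 0.50, 0.00),
    "AAxBB": (0.00, 1.00, 0.00),
}


def chi_square(observed, ratios):
    """Return (statistic, degrees of freedom) for counts vs expected ratios."""
    n = sum(observed)
    stat, classes = 0.0, 0
    for obs, ratio in zip(observed, ratios):
        if ratio == 0:
            if obs > 0:  # genotype class that should not occur at all
                return float("inf"), 0
            continue
        expected = n * ratio
        stat += (obs - expected) ** 2 / expected
        classes += 1
    return stat, classes - 1


def segregates_as_expected(observed, cross):
    """True if the (AA, AB, BB) counts fit the ratios expected for `cross`."""
    stat, df = chi_square(observed, EXPECTED[cross])
    if df == 0:  # one expected class only: pass if nothing else was seen
        return stat == 0.0
    return stat <= CHI2_CRIT_05[df]


print(segregates_as_expected((24, 52, 24), "ABxAB"))  # fits 1:2:1
print(segregates_as_expected((60, 40, 0), "ABxAB"))   # distorted, likely false SNP
```

In practice a tolerance for occasional genotyping errors would be needed before a single unexpected genotype is allowed to reject a SNP.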
SNP detection based on whole genome resequencing data can
be done by calling genotypes from pools of different individuals, as
done in the case of the RosBREED apple 9K [2] and RosBREED
peach 9K [15] SNP arrays. Alternatively, genotypes can be obtained
from separate individuals (if high coverage is available for each
individual) followed by merging all the calls at the end, as demon-
strated in the case of the FruitBreedomics apple 20K SNP array
(www.fruitbreedomics.com).
Economical methods of SNP discovery include use of reduced
representation libraries (RRLs) obtained by enzyme digestion of
DNA to increase the local coverage, as in the case of the grapevine
9K array [4], or focusing only on the coding portion of a genome
by sequencing normalized cDNA, as reported in the development
of the SolCAP tomato [16] and potato [10] 8K SNP arrays.
Knowing the specific features of the reference genome is quite
important when identifying the markers to consider for the array.
In particular, SNPs from paralogous regions and repetitive elements
should be avoided as the signal produced from the chip would
most probably be affected by interference produced by those
regions. This is especially an issue for highly heterozygous or poly-
ploid species. For example, for the genome of apple which went
through a whole genome duplication [17], the Infinium™ II 20K
SNP array design included resequencing data of two doubled-
haploid accessions obtained from ‘Golden Delicious’, which is the
cultivar used for the reference genome sequence [17]. The use of
doubled-haploids resulted in the exclusion of SNPs showing a
“heterozygous behavior”, indicating multiple loci within either of
the two doubled-haploids.
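The doubled-haploid filter described above rests on a simple expectation: a doubled haploid should be homozygous at every true single-copy locus, so an apparent heterozygous call suggests that reads from more than one locus are being merged. A minimal sketch follows; the SNP identifiers and the dictionary layout are hypothetical.

```python
def filter_paralog_suspects(dh_calls):
    """Split SNP IDs into (kept, excluded) using doubled-haploid genotypes.

    `dh_calls` maps a SNP ID to the genotype calls ("AA", "AB", "BB", ...)
    observed in the doubled-haploid accessions. A call with two different
    alleles in a doubled haploid is treated as evidence of multiple loci.
    """
    kept, excluded = [], []
    for snp_id, calls in dh_calls.items():
        if any(len(set(call)) > 1 for call in calls):
            excluded.append(snp_id)
        else:
            kept.append(snp_id)
    return kept, excluded


# Hypothetical calls for two doubled-haploid accessions.
dh_calls = {
    "snp_001": ["AA", "AA"],
    "snp_002": ["AA", "BB"],  # homozygous in both, differs between them
    "snp_003": ["AB", "AA"],  # "heterozygous behavior" in one accession
}
kept, excluded = filter_paralog_suspects(dh_calls)
print("kept:", kept)
print("excluded:", excluded)
```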
In general, SNP selection is the step in the assay design that
requires the most intense and thoughtful input from the user.
Once SNP discovery is complete, a typical SNP selection pipeline
ideally includes (depending upon the status of the reference
genome): Chromosome and coordinate map information, genetic
marker position, estimated minor allele frequencies in a discovery
panel, distance from the target SNP to the closest known adjacent
polymorphism on either side of the SNP, 50 bases of flanking
sequence on both sides of the SNP, target SNP alleles with referenced
strand (e.g., TOP/BOT or FOR/REV), estimate of conversion
2 Materials
2.1 Reagents
1. Illumina supplied reagents are provided in the correct amounts for
the ordered assay (Table 1).
2. Genomic DNA (see Notes 1 and 2).
3. 0.1 N NaOH: Dissolve 4 g of NaOH in 1 L water.
4. 100 % 2-propanol.
5. 100 % ethanol.
6. 95 % formamide, 1 mM EDTA. Store at −20 °C.
7. 10 mM Tris–HCl, pH 8.5.
Table 1
Illumina supplied reagents for the Infinium assay
Item                                                         Part#
ATM: Anti Stain Two-Color Master Mix                         11208317
FMS: Fragmentation solution                                  11203428
MA1: Multisample Amplification 1 Mix                         11202880
MA2: Multisample Amplification 2 Mix                         11203401
MSM: Multisample Amplification Master Mix                    11203410
PB1: Reagent used to prepare BeadChips for hybridization     11191922
PB2: Humidifying buffer used during hybridization            11191130
PM1: Precipitation solution                                  11203436
RA1: Resuspension, hybridization, and wash solution          11222442
STM: Superior Two-Color Master Mix                           11288046
TEM: Two-Color Extension Master Mix                          11208309
XC1: Xstain BeadChip solution 1                              11208288
XC2: Xstain BeadChip solution 2                              11208296
XC3: Xstain BeadChip solution 3                              11208421
XC4: Xstain BeadChip solution 4                              11208430
3 Methods
3.1 Infinium™ Assay Protocol
While this protocol is written from the perspective of assaying 96
samples using Infinium™ chips, which hold 24 samples each, other
combinations are possible and can be easily accommodated into
the protocol. The 96-well/24-sample chip format provides the
highest throughput possible and currently allows up to 90,000
SNPs to be queried simultaneously in 12× redundancy for over 99 %
call rates on validated SNP assays.
Unless stated otherwise, all centrifugation and vortexing steps are for
1 min.
1. Quantitate samples using the Qubit dsDNA BR assay. Normalize
all samples in a 96-well PCR plate to 50 ng/μl by adding
10 mM Tris–HCl, pH 8.5 (see Notes 1–3).
2. Dispense 20 μl of MA1, followed by 4 μl of DNA sample, and
then 4 μl of 0.1 N NaOH into each well of the 0.8 ml plate and seal
with a cap mat.
3. Vortex at 1,600 rpm and centrifuge at 280 × g, then incubate at
room temperature for 10 min.
4. Dispense 34 μl of MA2 and 38 μl of MSM into each well,
before resealing for vortexing and centrifuging as in step 3.
5. Incubate resealed plate in a 37 °C oven for 20–24 h.
6. Before opening the plate, centrifuge briefly at 50 × g to ensure
all liquid is in the bottom of the wells.
7. Add 25 μl of FMS to each well, reseal and vortex as before.
Centrifuge briefly again at 50 × g to ensure all liquid is in the
bottom of the wells.
8. Incubate on a heating block at 37 °C for 1 h.
9. Add 50 μl of PM1 to each well, seal and vortex as before.
10. Incubate for a further 5 min on the 37 °C heating block.
Centrifuge briefly at 50 × g to ensure all liquid is in the bottom
of the wells.
11. Add 155 μl of 2-propanol to each well, then seal plate with a
second, fresh cap mat.
12. Mix by inverting the plate at least ten times, then incubate at
4 °C for 30 min.
13. Prepare a balance plate before centrifuging at 2,000 × g and 4 °C
for 20 min. This should produce pale blue pellets in the bottom
of the wells (see Note 4).
14. Immediately decant supernatant by smoothly and rapidly
inverting the plate onto an absorbent pad prepared on the bench.
Remove all liquid by tapping the plate firmly for 1 min on the pad.
33. Remove the staining rack to a tube rack in one smooth, rapid
motion and use self-locking tweezers to slide each BeadChip
from the staining rack to the tube rack.
34. Place the entire tube rack in a vacuum desiccator and start the
vacuum, using at least 508 mm Hg. Dry under vacuum for
50–55 min.
35. Image BeadChips on HiScan system.
36. Import data to GenomeStudio software.
37. Analyze results.
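The normalization in step 1 is a simple dilution, and a small helper can reduce pipetting errors when a full plate is prepared. The sketch below is an illustration only: the 50 ng/μl target and the 4 μl input come from steps 1 and 2, whereas the 20 μl working volume is an assumed value chosen for the example.

```python
TARGET_NG_PER_UL = 50.0  # working concentration from step 1
INPUT_UL = 4.0           # DNA volume added per well in step 2


def dilution(stock_ng_per_ul, final_volume_ul=20.0):
    """Return (stock_ul, buffer_ul) giving `final_volume_ul` at 50 ng/ul.

    `final_volume_ul` is an assumed working volume, not a protocol value.
    """
    if stock_ng_per_ul < TARGET_NG_PER_UL:
        raise ValueError("stock is below 50 ng/ul; concentrate or re-extract")
    stock_ul = TARGET_NG_PER_UL * final_volume_ul / stock_ng_per_ul
    return round(stock_ul, 2), round(final_volume_ul - stock_ul, 2)


# A sample measured at 125 ng/ul with the Qubit assay.
stock_ul, buffer_ul = dilution(125.0)
print(f"mix {stock_ul} ul DNA with {buffer_ul} ul 10 mM Tris-HCl, pH 8.5")

# DNA mass entering the amplification in step 2.
print("DNA per well (ng):", INPUT_UL * TARGET_NG_PER_UL)
```

Samples below the target concentration are rejected rather than silently passed through, since they cannot be brought to 50 ng/μl by dilution.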
3.2 Infinium™ Assay Downstream Analysis
Each Infinium™ bead array is hybridized with one DNA sample.
The raw data from an Infinium™ assay consist of fluorescence
intensity in two colors with an average of 15 beads of each bead
type (for Infinium II SNPs) carrying the information of one SNP
locus. The raw data are filtered within the iScan software so that
aberrant outliers, if present, are removed prior to using the remain-
ing data to identify the correct genotype call for that bead type and
its targeted SNP. The overall dataset contains this information for
every SNP in each of the individual samples in the analysis. Such
data cannot be analyzed manually and
require specialized software such as GenomeStudio to extract and
transform the data into a meaningful and analyzable format. After
a BeadChip is scanned, the data are imported into GenomeStudio
Software for analysis. Input and output files for GenomeStudio are
shown in Fig. 1. Most importantly, as Infinium™ assays have at
least 3,000 markers run simultaneously, the data analysis is a step
change from simplex marker systems, where a lot of emphasis and
attention used to be devoted to troubleshooting every single data
point. A systematic approach must be employed to automate the
analysis as much as possible, which involves using quality metrics to
separate reliable data from ambiguous data. Data points that are
ambiguous but viewed as essential may be analyzed manually. These
can be identified using quality metrics, although such loci are
often so few that they can be excluded to avoid manual work as much
as possible.
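A metric-based filter of this kind can be sketched for per-sample quality control, using the two quantities plotted in Fig. 2 (sample call rate and 10 % GC Score). This is a simplified illustration and not GenomeStudio's own computation; the no-call threshold of 0.15 and both cut-offs are assumptions that would need to be set from the distribution of the actual samples.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def sample_metrics(gc_scores, no_call_threshold=0.15):
    """Return (call_rate, p10_gc) from one sample's per-SNP GC scores."""
    call_rate = sum(s >= no_call_threshold for s in gc_scores) / len(gc_scores)
    return call_rate, percentile(gc_scores, 10)


def flag_poor_samples(samples, min_call_rate=0.95, min_p10_gc=0.40):
    """Names of samples falling below either (assumed) cut-off."""
    flagged = []
    for name, scores in samples.items():
        call_rate, p10 = sample_metrics(scores)
        if call_rate < min_call_rate or p10 < min_p10_gc:
            flagged.append(name)
    return flagged


# Made-up GC scores for two samples across ten SNPs.
samples = {
    "S1": [0.90, 0.85, 0.80, 0.95, 0.70, 0.88, 0.92, 0.75, 0.81, 0.90],
    "S2": [0.10, 0.05, 0.60, 0.12, 0.70, 0.08, 0.30, 0.11, 0.65, 0.02],
}
print(flag_poor_samples(samples))
```

Flagged samples are candidates for reprocessing or removal before any custom clustering is attempted.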
In addition to GenomeStudio’s GenCall, a number of algorithms
have been developed to process the raw signal of the BeadArray
into genotype calls. The three most widely applicable are Illuminus
[24], GenoSNP [25], and CRLMM [26, 27]. The main modeling
differences lie in the normalization method and in whether clustering
occurs within sample (GenCall, Illuminus, GenoSNP) or
both within and between samples (CRLMM). In plants, most
publications use GenomeStudio’s proprietary GenCall method.
Initial steps for data analysis within GenomeStudio involve a
preliminary sample quality evaluation to determine which samples
may require reprocessing or removal. If a custom cluster file (*.egt)
is required, clustering should be done after removal of failed or
suboptimal samples. Because GenomeStudio is a population-based
Fig. 1 Inputs and outputs for GenomeStudio’s Genotyping Module. Two different types of file can be used for
this process: Intensity data files (*.idat) or Genotype Call Files (*.gtc). An optional input into GenomeStudio that
can be generated from the Instrument Control Software is the *.gtc format. The *.gtc format consolidates
information from *.bpm, *.csv, *.idat, *.egt for faster uploading of data into GenomeStudio. During *.gtc file
generation, signal intensity data from *.idat files are combined with information about SNP content on the
array from the bead pool manifest file (*.bpm) and cluster reference information for each locus (*.egt). Outputs
depend upon downstream analysis tool requirements
Fig. 2 Poorly performing samples (encircled) are obvious outliers from the popu-
lation of samples when 10 % GC Score (or 50 % GC Score in the case of more
raw data) is plotted against sample call rate
3.3 Calling Clusters for Polyploid Genomes
A large number of cultivated plant species are polyploid, such as
potato (tetraploid; see Note 5), wheat (hexaploid), and strawberry
(octoploid). The expected segregation for SNPs using the
Infinium™ technique is therefore more complex and will exhibit
more than the three clusters (AA, AB, and BB) typical of diploid
species. Methods adapted for polyploidy in GenomeStudio soft-
ware include an algorithm for automated calling of clusters repre-
sented by polyploid genomes. The automated clustering algorithms
start from an estimated density distribution and are able to detect
meaningful clusters in data with varying density, which is common
in genotyping data. Sensitivity of cluster detection can be adjusted
at the project level by specifying a minimum number of points in a
cluster and cluster distance. The X-Y coordinates for cluster
positions can be exported from GenomeStudio for downstream
data analysis. The automated cluster calling functionality currently
available in GenomeStudio uses both the Density Based Spatial
Clustering of Applications with Noise (DBSCAN) and Ordering
Points to Identify the Clustering Structure (OPTICS; [28])
algorithms.
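To make the density-based approach concrete, the sketch below applies a textbook DBSCAN to one-dimensional Norm Theta values. It is not GenomeStudio's implementation: the parameters `eps` and `min_pts` are only loosely analogous to the cluster distance and minimum number of points mentioned above, and the theta values are invented to mimic the five dosage classes of a tetraploid.

```python
def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    labels = [None] * n

    def neighbors(i):
        # All points within eps of point i (the point itself included).
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Invented Norm Theta values: five dosage classes plus one stray sample.
thetas = [0.02, 0.03, 0.04, 0.26, 0.27, 0.28, 0.49, 0.50, 0.51,
          0.73, 0.74, 0.75, 0.96, 0.97, 0.98, 0.62]
labels = dbscan_1d(thetas, eps=0.03, min_pts=3)
print("clusters found:", len(set(labels) - {-1}))
print("noise points:", [t for t, lab in zip(thetas, labels) if lab == -1])
```

Raising `min_pts` or lowering `eps` makes the detection more conservative, which is the kind of sensitivity adjustment described in the text.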
4 Notes
[Fig. 3 plot area: panels (a) and (b), each showing Norm R (vertical axis) against Norm Theta (horizontal axis, 0 to 1)]
Fig. 3 Comparison between two DNA extraction methods for plant tissue: SNP calling and clustering using
GenomeStudio. Samples from young expanding pear (Pyrus communis) leaves were extracted (a) using the
Macherey Nagel Nucleospin kit and (b) using a CTAB-based technique, and analyzed using the 9 k apple and
pear Infinium™ assay. The individuals from the two experiments belong to the same F1 population grown in
similar conditions. The SNP shown is a pear SNP. The clustering for the CTAB-based extraction is of much lower
quality (i.e., the clusters are more spread out and less separated) than for the column-based extraction kit
References
Abstract
Highly parallel genotyping assays, such as the GoldenGate assay developed by Illumina, capable of inter-
rogating up to 3,072 single nucleotide polymorphisms (SNPs) simultaneously, have greatly facilitated
genome-wide studies, particularly for crops with large and complex genome structures. In this report, we
provide detailed information and guidelines regarding genomic DNA preparation, SNP assay design, SNP
assay protocols, and genotype calling using Illumina’s GenomeStudio software.
Key words DNA marker, High-throughput genotyping, Oligo pool assay, OPA, Single nucleotide
polymorphism, SNP
1 Introduction
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6_22, © Springer Science+Business Media New York 2015
300 Shiaoman Chao and Cindy Lawley
2 Materials
2.4 GoldenGate Assay
1. OPA (oligo pool assay): prepare the final list of SNPs in the panel
and submit it to Illumina for OPA synthesis. The SNPs included in the
final list have previously been processed through the Illumina
assay design tool (ADT) pipeline.
2. GoldenGate assay reagent kits from Illumina: includes the DNA
activation kit, the BeadChip assay kit, and universal-32 BeadChips
(see Note 1).
3. User supplied reagents: Titanium Taq DNA polymerase
(Clontech Laboratories, Inc., Mountain View, CA), 0.1 N
NaOH, 70 % ethanol, and 100 % ethanol.
4. User supplied lab consumables: reagent trough, single and
8-channel manual pipettes, filter tips for 8-channel manual
pipettes, 96-well PCR plates, aluminum heat seal foil, adhesive
plate seal, 96-well 0.45 μm filter plates (EMD Millipore,
Billerica, MA), and 96-well V-bottom plates.
3 Methods
3.1 Sample Tissue Preparation in 96-Well Plate Format
Two methods are described for sample tissue preparation and
either one will suffice.
3.1.1 Freeze Drying
1. Place a 96-deep well plate on ice, cut a 2-in. piece of leaf blade
at the seedling stage, fold and insert in the well.
2. After a plate of samples is collected in full, wrap the plate with
miracloth, fasten the miracloth with a string, then plunge the
plate in liquid nitrogen.
3. Place the frozen plate in a –80 °C freezer, and continue collecting
tissue.
4. Place all frozen plates in the freeze dryer, and dry the tissues
overnight.
5. Remove the string and miracloth, cover the plate with a plate
mat, and store the plates at 4 °C before extraction.
3.1.2 Silica Gel
1. Fill the 96-deep well plates with plain-type silica gel following
the protocol of Bodo Slotta et al. [12]. Fit the plate mat to
ensure the silica gel is not exposed to moisture in the air.
2. Remove the plate mat, cut a 2-in. piece of leaf blade at the
seedling stage, fold and insert in the well (see Note 2).
3. Place the mat back onto the plate after all samples are collected
and flip the plate a few times to ensure the leaf tissues are in
contact with silica gel.
4. Store the plates at room temperature in airtight plastic bags for
a week, allowing tissues to dry, then proceed with DNA
extraction.
5. Store the plates at 4 °C if DNA is not extracted immediately.
3.2 DNA Extraction
This method is adapted from the original protocol reported by
Pallotta et al. [13] for extracting DNA in 96-well plates using a
robot. The same method can be used to manually extract DNA
from dried tissues in individual tubes or in strip tubes (see Note 3).
DNA is stored at 4 °C before use or –20 °C for longer term.
1. Preheat the extraction buffer to 65 °C.
2. To grind freeze-dried leaf tissues into powder, add a ball bearing
to each well. The silica gel-dried leaf tissues can be ground
using the silica gel present in each well. Load the plates onto the
tissue grinder and grind for a length of time appropriate to the
model used.
3. Add 500 μl of extraction buffer to each well (see Note 4). Seal
the plates with adhesive seals and incubate the plates at 65 °C
for 30 min. Vortex the plates every 5 min during incubation.
4. Cool the plates on ice for 15 min before adding 250 μl of cold
6 M ammonium acetate. Seal the plates with adhesive seals,
mix by vortexing, and incubate the plates on ice for 15 min.
5. Centrifuge the plates for 20 min at 4,000 × g at 10 °C.
6. Add 360 μl chilled isopropanol into each well of new 96
deep-well plates.
7. Transfer 600 μl of the supernatant into new 96 deep-well plates
containing isopropanol (see Note 5). Mix thoroughly and
allow DNA to precipitate for 10 min or longer at 4 °C.
GoldenGate SNP Genotyping 303
3.4 SNP Assay Design
To ensure a high success rate of converting candidate SNPs to successful
assays, the SNP panel is first evaluated using Illumina’s
algorithm for GoldenGate scoring, the Assay Design Tool (ADT).
ADT is a bioinformatic pipeline based on a proprietary algorithm
3.5 GoldenGate Genotyping Assay
The genotyping assay generally takes about 3 days. Day 1 involves
DNA activation and hybridizing OPA to biotinylated DNA templates
overnight. Day 2 involves extension, ligation, and PCR
amplification of DNA templates containing the targeted SNPs. It is
recommended that all the assays up to the PCR step be carried out in
a pre-PCR clean room. In the post-PCR room, the PCR products
are cleaned up, denatured, and hybridized to the BeadChips over-
night. Day 3 involves BeadChips washing and imaging to generate
hybridization intensity values.
the plates upside down at 8 × g for 1 min. Air dry DNA at room
temperature for 15 min.
5. Dissolve activated DNA in 10 μl of RS1, and proceed with the
next step.
3.5.4 PCR Amplification
1. Add 64 μl of Titanium Taq polymerase and 50 μl of UDG (see
Note 11) to MMP tubes, mix well. Aliquot 30 μl of MMP
mixture to the PCR plates, and store the plates in the dark.
2. Preheat the heat blocks to 95 °C.
3. Place the ASE plates on magnetic stands, pipette and discard
all liquid from the ASE plates after 15 min incubation, wash
wells once with 50 μl of UB1.
4. Add 35 μl of IP1 to the ASE plates and incubate at 95 °C for
1 min.
5. Transfer 30 μl of supernatant from the ASE plates on magnetic
stands to the PCR plates, heat-seal the PCR plates, and discard
the ASE plates.
6. Place the PCR plates into the thermal cycler, and run the
program set at 37 °C for 10 min, 95 °C for 3 min, followed by
34 cycles of 95 °C for 35 s, 56 °C for 35 s and 72 °C for 2 min,
then a final extension at 72 °C for 10 min, before holding the
program at 4 °C for 5 min.
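When scheduling Day 2, it helps to know how long the thermal cycler will be occupied. The sketch below encodes the program from step 6 as data and sums the programmed hold times; ramping between temperatures is ignored, so the real run will take somewhat longer. The variable names are illustrative.

```python
# Thermal cycler program from step 6, as (temperature in deg C, seconds).
PRE_CYCLE = [(37, 10 * 60), (95, 3 * 60)]
CYCLE = [(95, 35), (56, 35), (72, 2 * 60)]
N_CYCLES = 34
POST_CYCLE = [(72, 10 * 60), (4, 5 * 60)]


def programmed_seconds():
    """Sum of programmed hold times; ramp times are not included."""
    one_cycle = sum(seconds for _, seconds in CYCLE)
    fixed = sum(seconds for _, seconds in PRE_CYCLE + POST_CYCLE)
    return fixed + N_CYCLES * one_cycle


minutes, seconds = divmod(programmed_seconds(), 60)
print(f"programmed time: {minutes} min {seconds} s")  # 135 min 40 s
```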
3.5.5 Clean and Denature PCR Products
1. Add 20 μl of MBP into each well of the PCR plates. Set the
8-channel pipette to 85 μl, pipette all the solution in the PCR
plates up and down several times to mix, then transfer the
mixed solution to the 0.45 μm filter plates. Incubate the filter
plates at room temperature for 1 h in the dark.
3.5.7 BeadChip Wash and Imaging
1. Prepare three wash dishes with two filled with 300 ml of PB1,
and the third filled with 300 ml of XC4 reagent mixed with
100 % ethanol.
2. Remove the seals on the BeadChips. Load up to 12 BeadChips
to a wash rack, and immerse the BeadChips in the first wash
dish containing PB1, move up and down ten times.
3. Transfer the wash rack to the second PB1 wash dish and let it
soak for 5 min.
4. Transfer the wash rack to the XC4 wash dish and move the wash
rack slowly up and down ten times, and let it soak for 5 min.
5. Dry the BeadChips in desiccators under vacuum for 1 h or
until dry.
6. Clean the underside of each BeadChip to remove excess XC4
with Kimwipes wetted with 70 % ethanol.
7. Download the dmap files corresponding to each BeadChip through
a Decode File Client application (see Note 12), and load the
BeadChips into the array reader, such as iScan.
8. The intensity data (.idat) files generated by the reader contain
allele-specific hybridization intensity values.
3.6 Genotype Calling Using GenomeStudio Software
Three files are used to start a new GenomeStudio project:
(1) the intensity data (.idat) files, (2) the OPA manifest (.opa)
containing interrogating probe and bead address sequence information,
and (3) the sample sheet (.csv) containing the OPA name, the
Sentrix barcodes (all BeadChips are barcoded), and the sample
names, their corresponding well positions, and other relevant sam-
ple information (see Note 13) (Fig. 1). A typical GenomeStudio
project contains three major elements, the SNP Graph where gen-
otype calling can be manipulated, the Samples Table containing
sample names and the call rate for each sample over all SNPs
assayed, and the SNP Table containing the names of the SNPs used
in the genotyping assay and the statistics of genotype clustering of
all samples assayed for each SNP (Fig. 2).
The software first normalizes and scales the intensity data to
adjust for the background noise. The software then uses the
GenCall (GC) no-call threshold (0.25 is the recommended lower
threshold for GC score for the GoldenGate assay) to determine if
the genotypes should be assigned within the call region of any
given cluster. If the score is less than 0.25, the genotype is consid-
ered too far from the centroid of the cluster to be reliably assigned
to the cluster and results in a no-call, or missing data. GenomeStudio
software was developed originally using human data assuming dip-
loidy and Hardy–Weinberg equilibrium (HWE), and thus includes
metrics that allow easy screening of loci that deviate from HWE.
After applying automated data clustering, three genotype clusters
[Fig. 1 diagram labels: iScan; GenomeStudio Software; Visualization Tools; Raw Data Report]
Fig. 1 The workflow for generating SNP genotype data using the GenomeStudio software. Sample sheet and
cluster file are optional for starting a new GenomeStudio project
Fig. 3 Three genotypes are expected after automated calling using the algorithm provided by the software
Fig. 5 This SNP shows subclusters and should be eliminated from further analysis
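The two screening ideas described in Subheading 3.6, the GC score no-call threshold and the check for deviation from HWE, can be sketched as follows. This is an illustration under stated assumptions rather than GenomeStudio's own calculation: only the 0.25 threshold comes from the text, while the function names and the use of a plain chi-square statistic are choices made for the example.

```python
NO_CALL_THRESHOLD = 0.25  # GoldenGate lower bound quoted in the text


def apply_no_call(calls):
    """Replace genotypes whose GC score is below the threshold with None."""
    return [gt if gc >= NO_CALL_THRESHOLD else None for gt, gc in calls]


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 df) comparing counts with HWE expectations."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


print(apply_no_call([("AA", 0.90), ("AB", 0.20), ("BB", 0.25)]))

# A randomly mating population versus a panel of inbred lines.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(48, 0, 52))
```

The second example shows why HWE-based metrics need care in crops: a panel of inbred lines has almost no heterozygotes, so valid SNPs can deviate strongly from HWE.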
4 Notes
10. One person can manually process two plates at the same time.
11. UDG, uracil DNA glycosylase, is used to degrade carry-over DNA as
a precaution to minimize cross contamination between different
batches of experiments.
12. The dmap files generated from the decoding process contain
the information of the bead types and their positions on each
BeadChip and are required during array scanning.
13. The sample sheet is optional. However, if the sample sheet is
not used, GenomeStudio will assign each sample with a generic
name.
14. Illumina recently released the GenomeStudio Polyploid
Clustering (PC) Module that uses density-based algorithms to
assign genotypes to clusters. It is suitable for polyploid species
for which the standard diploid clustering algorithm imple-
mented in the Genotyping Module is not appropriate. The PC
Module performs cluster assignment, but does not call geno-
types. Manual editing of cluster assignments is still necessary.
References
1. Fan J-B, Oliphant A, Shen R, Kermani BG, Garcia F, Gunderson KL,
Hansen M, Steemers F, Butler SL, Deloukas P, Galver L, Hunt S,
McBride C, Bibikova M, Rubano T, Chen J, Wickham E, Doucet D,
Chang W, Campbell D, Zhang B, Kruglyak S, Bentley D, Hass J,
Rigault P, Zhou L, Stuelpnagel S, Chee MS (2003) Highly parallel SNP
genotyping. Cold Spring Harbor Symp Quant Biol 68:69–78
2. Shen R, Fan J-B, Campbell D, Chang W, Chen J, Doucet D, Yeakley J,
Bibikova M, Garcia EW, McBride C, Steemers F, Garcia F, Kermani BG,
Gunderson K, Oliphant A (2005) High-throughput SNP genotyping on
universal bead arrays. Mutat Res 573:70–82
3. Yan J, Yang X, Shah T, Sanchez-Villeda H, Li J, Warburton M,
Zhou Y, Crouch JH, Xu Y (2010) High-throughput SNP genotyping with
GoldenGate assay in maize. Mol Breed 25:441–451
4. Hyten DL, Song Q, Choi I-Y, Yoon M-S, Specht JE, Matukumalli LK,
Nelson RL, Shoemaker RC, Young ND, Cregan PB (2008) High-throughput
genotyping with the GoldenGate assay in the complex genome of
soybean. Theor Appl Genet 116:945–952
5. Rostoks N, Ramsay L, MacKenzie K, Cardle L, Bhat PR, Roose ML,
Svensson JT, Stein N, Varshney RK, Marshall DF, Graner A, Close TJ,
Waugh R (2006) Recent history of artificial outcrossing facilitates
whole-genome association mapping in elite inbred crop varieties.
Proc Natl Acad Sci U S A 103:18656–18661
6. Close TJ, Bhat PR, Lonardi S, Wu Y, Rostoks N, Ramsay L, Druka A,
Stein N, Svensson JT, Wanamaker S, Bozdag S, Roose ML, Moscou MJ,
Chao S, Varshney RK, Szucs P, Sato K, Hayes PM, Matthews DE,
Kleinhofs A, Muehlbauer GJ, DeYoung J, Marshall DF, Madishetty K,
Fenton RD, Condamine P, Graner A, Waugh R (2009) Development and
implementation of high-throughput SNP genotyping in barley. BMC
Genomics 10:582
7. Zhao K, Wright M, Kimball J, Eizenga G, McClung A, Kovach M,
Tyagi W, Ali ML, Tung C-W, Reynolds A, Bustamante CD, McCouch SR
(2010) Genomic diversity and introgression in O. sativa reveal the
impact of domestication and breeding on the rice genome. PLoS One
5:e10780
8. Akhunov E, Nicolet N, Dvorak J (2009) Single nucleotide
polymorphism genotyping in polyploid wheat with the Illumina
GoldenGate assay. Theor Appl Genet 119:507–517
9. Chao S, Oliver R, Lazo G, Tinker N, Jellen E, Maughan J, Jackson E
(2012) Development of a high-density SNP genotyping panel as a
community resource for genetic analysis in oat. Abstract. Plant and
Animal Genome XX Conference, 14–18 Jan 2012, San Diego, CA
10. Oliphant A, Barker DL, Stuelpnagel JR, Chee MS (2002) BeadArray™
Technology: enabling an accurate, cost-effective approach to
high-throughput genotyping. Biotechniques 32:S56–S61
11. Gunderson KL, Kruglyak S, Graige MS, Garcia F, Kermani BG, Zhao C,
Che D, Dickinson T, Wickham E, Bierle J, Doucet D, Milewski M,
Yang R, Siegmund C, Hass J, Zhou L, Oliphant A, Fan J-B, Barnard S,
Chee MS (2006) Decoding randomly ordered DNA arrays. Genome Res
14:870–877
12. Bodo Slotta TA, Brady L, Chao S (2008) High throughput tissue
preparation for large-scale genotyping experiments. Mol Ecol Resour
8:83–87
13. Pallotta MA, Warner P, Fox RL, Kuchel H, Jefferies SJ, Langridge P
(2003) Marker assisted wheat breeding in the southern region of
Australia. Proceedings of the tenth international wheat genetics
symposium, Paestum, Italy, pp 789–791
14. Tinker NA, Chao S, Lazo GR, Oliver RE, Huang YF, Poland JA,
Jellen EN, Maughan PJ, Kilian A, Jackson EW (2014) A SNP genotyping
array for hexaploid oat (Avena sativa L.). Plant Genome
doi: 10.3835/plantgenome2014.03.0010
INDEX
Jacqueline Batley (ed.), Plant Genotyping: Methods and Protocols, Methods in Molecular Biology, vol. 1245,
DOI 10.1007/978-1-4939-1966-6, © Springer Science+Business Media New York 2015
313
PLANT GENOTYPING: METHODS AND PROTOCOLS
314 Index
Goldengate ...... 7, 16, 21, 35, 42, 217–219, 225, 226, 231, 266, 283, 288, 299–311
Graingenes ...... 51, 53, 55
Gramene ...... 51, 53, 55
H
Haplotype identification ...... 258, 262–263, 265
Heteroduplex ...... 141–150, 152–154, 197
High resolution melting (HRM)
  haploid samples ...... 155
  heterozygous species ...... 153–155
Homoeologous loci ...... 7
Hybridisation ...... 288
I
ICRISAT ...... 51, 53, 58
Imperfect microsatellite extractor (IMEx) ...... 33, 34
IMPUTE2 ...... 35
Indels ...... 2, 4, 34, 113, 117, 193, 194, 259, 261, 262
Indexing ...... 35
Infinium ...... 7, 37, 225, 266, 281–297, 304
Inter simple sequence repeat (ISSR) ...... 63–74
Inter-sine amplified polymorphism (ISAP) ...... 183–191
iPLEX ...... 6, 217, 218, 220, 222–226, 232, 234, 235, 237, 238
K
Kaspar ...... 218, 243–255
L
Laboratory information management system (LIMS) ...... 109
Legume information system (LIS) ...... 51, 53, 55
Ligation ...... 124–126, 132, 134, 172, 174–175, 179–180, 272, 275–277, 300, 304, 305
Linkage disequilibrium (LD) ...... 4, 15–17, 19, 20, 35, 83, 262, 282
M
MaCH ...... 35
MaizeGDB ...... 51, 53, 55
MALDI-TOF. See Matrix-assisted laser desorption/ionization time of flight (MALDI-TOF)
Mapping
  association ...... 4, 14–16, 83, 165
  genetic ...... 3, 4, 14, 15, 55, 57, 91, 169, 217, 236, 282, 283
  physical ...... 4, 5, 13–15, 20, 56
  synteny ...... 14–15
Marker assisted selection (MAS) ...... 3, 4, 17–19, 29, 78
MassARRAY ...... 215–239
Mass spectrometric cleaved amplified polymorphic sequence (MS-CAPS) ...... 205–213
Mass spectrometry ...... 6, 207, 208, 218–220, 222
Matrix-assisted laser desorption/ionization time of flight (MALDI-TOF) ...... 6, 205–213, 219–222, 224, 226, 237
MicroSAtellite (MISA) ...... 3, 17, 30, 33, 34, 36, 49, 55–57, 63, 64, 77, 78, 85, 87, 157, 216, 266, 281
MID barcodes. See Multiplex identifier (MID) barcodes
MoccaDB ...... 51, 53, 57
Molecular markers ...... 1–5, 8, 9, 13–23, 29, 30, 38, 49–58, 78, 87, 91, 162, 184, 215, 257, 258, 282
Mreps ...... 33
Msatfinder ...... 33, 34
MS-CAPS. See Mass spectrometric cleaved amplified polymorphic sequence (MS-CAPS)
Multiplex identifier (MID) barcodes ...... 170, 172, 173, 176–178, 180
Mutant population ...... 194–196
Mutation detection ...... 144, 195–199
N
Next generation sequencing (NGS) ...... 5, 6, 8, 9, 22, 29–42, 79, 80, 112, 119, 162, 164, 257, 258, 260, 264, 271, 283
O
Oligo pool assay (OPA) ...... 301, 304, 305, 307, 309, 310
  hybridisation ...... 305
Orthologous markers ...... 155–156
P
Panzea ...... 51, 53, 55
Polymerase chain reaction (PCR) ...... 2, 3, 5, 6, 9, 30–32, 36–38, 42, 50–52, 56, 57, 63, 64, 68–70, 72–74, 79–87, 98, 104–105, 110–112, 115, 116, 126–127, 134–136, 143, 145, 148, 152–158, 162–164, 170–174, 176–178, 180, 184–187, 189–191, 194–201, 205–207, 209–213, 219–225, 235, 236, 244–247, 249, 251–253, 271–274, 276, 278, 288, 289, 295, 300, 301, 303–306
Polyploidy ...... 4, 21–23, 35, 161–167, 295
PolyScan ...... 34, 35
Primer design ...... 30, 31, 36, 42, 81, 85, 110, 157, 185–186, 196, 208, 253
Pyrosequencing ...... 31, 35, 170, 181, 187
Q
Qcall ...... 35
QualitySNP ...... 35
Quantitative trait loci (QTL) ...... 14, 15, 18, 49, 50, 78, 83, 119, 137, 215, 217, 258, 259
R
Radioactively labeling probe ...... 92, 96–97
RAD sequencing. See Restriction site associated DNA (RAD) sequencing
Randomly amplified polymorphic DNA (RAPDs) ...... 2–3, 17, 50–52, 55, 56, 63, 78, 162, 216
Read mapping ...... 21, 261, 262, 264
Reduced representation libraries (RRLs) ...... 5, 284
RepeatMasker ...... 33
Restriction digest ...... 2, 63, 98, 171, 173–175, 179, 180, 260, 277
Restriction fragment length polymorphism (RFLP) ...... 2, 17, 50–52, 55–57, 63, 1–98, 162, 163, 165, 216, 258
Restriction site associated DNA (RAD) sequencing ...... 3, 8, 18, 259–262, 272
Rice genome annotation project ...... 51, 53, 56
RNA isolation ...... 121–122, 128–129
RRLs. See Reduced representation libraries (RRLs)
S
Sample collection
  from the field ...... 103, 105–107, 114
  from herbarium specimens ...... 103–109, 114
Sample purification
  isopropanol precipitation ...... 198
  sephadex ...... 198–199
Samtools ...... 35, 36, 41, 42, 261, 285
Sanger sequencing ...... 5, 21, 79, 112, 120
Sequence alignment ...... 35, 56, 102
SGSautoSNP ...... 34, 35, 37–42, 57, 261, 264
Short Oligonucleotide Analysis Package (SOAP)2 ...... 9, 34, 35, 37, 40, 41, 261, 264, 266
Simple sequence repeat (SSR) ...... 3, 17, 19–21, 29–34, 36–38, 42, 49–58, 63–74, 77–88, 162, 163, 165, 166, 184, 243, 258, 266, 271
  discovery ...... 30–34, 36
  taxonomy tree ...... 36, 52, 54
Single nucleotide polymorphism (SNP)
  assay design ...... 21, 22, 243, 250–255, 266, 281–297, 303–304, 310
  calling ...... 5, 35, 37, 38, 40, 41, 260–262, 264, 285, 296
  detector ...... 35
  filtering ...... 36, 261–262, 285–286, 294, 310
  imputation ...... 262, 265
  selection ...... 175, 282–287, 304
Single stranded DNA (ssDNA) ...... 71, 178–180, 206–208
Size exclusion ...... 172, 175
Skim based genotyping by sequencing ...... 257–267
SNP. See Single nucleotide polymorphism (SNP)
SOAP2. See Short Oligonucleotide Analysis Package (SOAP)2
SOL genomics network (SGN) ...... 52, 54, 56
Southern blot ...... 92–96
SoyBase ...... 52, 54, 56
Spectral repeat finder (SRF) ...... 33
Spin chromatography ...... 172, 175, 176
Sputnik ...... 33, 34, 36, 38, 42
SSR. See Simple sequence repeat (SSR)
SSR identification tool (SSRIT) ...... 33
SSRPrimerII ...... 33, 37–40
SSRSEARCH ...... 33
T
Tandem repeat finder (TRF) ...... 33, 34
Tandem repeat occurrence locator (TROLL) ...... 33, 34
Taqman ...... 6
Targeting induced local lesions in genomes (TILLING) ...... 193–201
tfGDR Project Website ...... 52, 54
TRF. See Tandem repeat finder (TRF)
Triticeae Mapped EST Database ver.2.0 (TriMEDB) ...... 52, 54, 56
TROLL. See Tandem repeat occurrence locator (TROLL)
V
Validation ...... 21, 29, 85, 127, 136, 152, 164, 166, 283, 285
VegMarks ...... 52, 54, 57
W
Wheat genome information ...... 52, 54