Crawford and Oleksiak 2016


Briefings in Functional Genomics, 15(5), 2016, 342–351

doi: 10.1093/bfgp/elw008
Advance Access Publication Date: 4 April 2016
Review paper

Downloaded from https://academic.oup.com/bfg/article-abstract/15/5/342/1742085 by Universidade de São Paulo - IO user on 17 December 2019


Ecological population genomics in the marine
environment
Douglas L. Crawford and Marjorie F. Oleksiak
Corresponding author: Douglas L. Crawford, University of Miami, RSMAS, Marine Biology and Ecology, 4600 Rickenbacker Causeway, Miami, FL 33149.
E-mail: dcrawford@rsmas.miami.edu

Abstract
Marine species live in a wide diversity of environments and yet, because of their pelagic life stages, are thought to be well-connected: they have high migration rates that inhibit significant population structure. Recent innovations in sequencing technologies now provide information on nucleotide polymorphisms at thousands to tens of thousands of loci based on whole genomes, reduced representative portions of genomes (0.1–1%) or a majority of expressed mRNAs. Data from these genomic approaches are used to define and quantify single-nucleotide polymorphisms (SNPs). These SNP data tend to agree with data from older technologies (allozymes or microsatellites), which support well-connected populations with few genetic differences among populations. However, these studies also find a small percentage of SNPs (1–5%) that readily distinguish genetic differences among populations on relatively small geographic scales. The magnitudes of the genetic differences (FST values) suggest that hundreds of loci with significant differences are due to positive selective pressures. Thus, these data suggest that natural selection is effectively altering allele frequencies at 100s of loci in marine populations. In this manuscript, we provide examples of these studies, the strengths and weaknesses of different genomic approaches as well as important technical aspects associated with genomic approaches.

Key words: GBS, RAD, RNA-Seq, evolution, population genetics

Importance of genomics

The world’s oceans provide diverse organisms and environmental habitats to better explore ecological genomics: how genomic variation and its expression enhance an organism’s probability of success. Marine habitats range from freezing polar regions to warm tropical waters, from shallow coastal estuaries to deep-sea hydrothermal vents that rely on chemosynthesis rather than photosynthesis, and adaptation to these environments likely requires many different genes in many different pathways. Using genomic approaches, which query the whole genome rather than a targeted gene set, we are more likely to discover the evolved changes responsible for this variation in life than if we explore only a few genes. Additionally, because genomic approaches provide so many loci, many of which are evolving by neutral processes, we can use neutral-demographic variation to define how populations are connected. This is important because many marine organisms have a pelagic larval stage, and the oceans’ currents can transport larvae thousands of kilometers from where they hatched. We also can contrast neutral loci to genetic changes that deviate from the neutral patterns and are most likely evolving by natural selection to better understand local adaptation. Marine environments provide clines along a coast, within an estuary and within a tidal shore, and altered environments at vastly different time and spatial scales to investigate the potential for adaptive change. Further, changing habitats due to factors such as global climate change, anthropogenic pollution and habitat destruction are affecting all the world’s oceans. Genomic approaches allow us to explore the rate of adaptive change on a scale that was not previously possible. Combining genomics with research on marine organisms that inhabit diverse habitats provides opportunities to address important questions concerning how marine organisms

Douglas L. Crawford is a Professor in the Department of Marine Biology and Ecology at the Rosenstiel School of Marine and Atmospheric Science, University of Miami. His research focuses on evolutionary genomics integrating physiology, mRNA expression and nucleotide polymorphisms.
Marjorie F. Oleksiak is an Associate Professor in the Department of Marine Biology and Ecology at the Rosenstiel School of Marine and Atmospheric Science, University of Miami. Her research uses genomic tools with physiological and toxicological phenotypes to enhance our understanding of evolutionary processes.

© The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com


and populations interact with their environments and adapt to environmental change, and genomic approaches provide the tools to address the genetic basis of these ecological interactions with a breadth of coverage that has not previously been available.

Genomic studies interrogate across the genome and include whole-genome analyses, quantifying gene expression and analyses of genetic variation (e.g. SNP, single-nucleotide polymorphism). This essay focuses on nucleotide variation; quantitative differences in gene expression are reviewed elsewhere [1–11]. Common to all genomic studies is the breadth of the data that are interrogated. Whole-genome sequences, all expressed genes and thousands of genetic markers are interrogated versus a few sequences, transcripts or microsatellites that have been used previously. Both the genomic breadth and random sampling of a large number of genetic markers across the genome allow one to establish neutral expectations and provide greater confidence in estimating neutral population parameters, such as effective population size and migration rate [12]. Demography and evolutionary history should affect these neutral loci differently from loci under selection or outlier loci, and consequently, population genomics also provides the ability to identify adaptive or functionally important loci and genes [13]. This is the fundamental difference between recent genomic approaches and many earlier studies: with many 1000s of markers, one can readily define neutral and demographic patterns and contrast these with DNA polymorphism patterns that significantly differ from the neutral expectation. It is the discovery of the rapidity and breadth of non-neutral changes that will greatly enhance our understanding of life and how it accommodates to changes in our biosphere.

Progress in ecological genetics

Genetic markers have been used to study population structure in natural populations since the early 1960s. The first genetic marker used was hemoglobin, which was used to detect inter-species polymorphisms in whiting and cod populations [14]. Hemoglobin studies were soon followed by allozyme studies. Allozymes are amino acid protein variants in enzymes that can be differentiated by non-denaturing gel electrophoresis (starch gels) using enzyme-specific stains, and initial allozyme studies uncovered substantial protein polymorphisms in natural fly populations [15] and humans [16]. These two studies were followed by many others that substantiated these initial findings [17, 18]: natural populations showed too many protein polymorphisms (30% of proteins were polymorphic) based on the concept of genetic load [19, 20]. These discoveries led to the neutral theory of molecular evolution [21]. The neutral theory of molecular evolution suggests that most molecular polymorphisms have no effect on fitness (are not adaptively important) and has since been used extensively as a null hypothesis to test whether the variations within and among species are biologically important because they affect fitness [22, 23].

With the advent of DNA methods, genetic markers shifted from protein-based markers to DNA-based markers. These DNA-based markers include restriction fragment length polymorphisms (RFLPs) and the closely related amplified fragment length polymorphisms (AFLPs), microsatellites and SNPs and are well-summarized in other reviews [24]. Table 1 lists key characteristics of both protein- and DNA-based genetic markers.

SNP discovery with next-generation sequencing

Until recently, most studies investigating environmental impacts on genetic variation examined fewer than 100 loci [26]. Yet, in the past 5 years, several approaches have been developed to survey 1000s to 10 000s of loci by sequencing whole genomes or reduced portions of the genome (transcriptome sequencing or reduced representative libraries [27]). These approaches are discussed below.

While whole-genome sequencing has been widely used to discover SNPs in model organisms such as humans and flies, there are fewer examples in non-model, marine organisms. A few examples of SNP discovery using whole-genome sequencing in marine organisms include 5.9 million SNPs identified in three-spined stickleback (Gasterosteus aculeatus) [28], 3.8 million SNPs identified in Pacific oyster (Crassostrea gigas) [29], 1 million SNPs identified in Atlantic cod (Gadus morhua) [30] and 335 000 SNPs identified in acorn barnacle (Semibalanus balanoides) [31]. These whole-genome approaches require a reference genome. An alternative approach is to assemble a partial genome by aligning genomic sequences to an assembled transcriptome (full-length RNA sequences that represent a majority of expressed genes). This creates a partial genome assembly of both exons and flanking regions. This approach identified 440 817 SNPs in Atlantic herring (Clupea harengus) [32]. A similar approach was taken with European anchovy (Engraulis encrasicolus) [33]. Although feasible for many laboratories, whole-genome sequencing for non-model species is not trivial because genome assembly requires high-performance computing and extensive bioinformatics. Thus, until whole-genome bioinformatics methods (e.g. starting with whole-genome assembly and annotation followed by variant detection) for hundreds of individuals become standard for non-model organisms, SNPs for non-model organisms are most likely to be identified by sequencing a reduced subset of a species’ genome. Even if whole-genome bioinformatics methods were standard, the experimental question may be more appropriately answered using a reduced representative library because it is a less laborious approach for sampling 10 000s of loci in many individuals.

There are two main ways to reduce the complexity of a species’ genome for population genomic analyses across many individuals: (1) focus on expressed genes, and (2) construct a reduced representative genomic DNA library. A big advantage of both these methods is that using the same data, one can simultaneously discover SNPs and genotype individuals at those same SNPs [34] as long as individuals are not pooled. If individuals are pooled, then population allele frequencies can be determined but one forgoes knowing individual genotypes. A final advantage of these two methods is that they can be used on species with few or no genomic resources, which is often the case for marine organisms, especially those of little economic importance. A third method, exon capture [35], provides a gene-targeted approach to identify SNPs. However, this approach requires prior knowledge of the genes to be captured based on the species of interest or a related species and thus is more difficult to implement for many marine species.

Focusing on transcriptomics or expressed genes (mRNA sequences) for SNP discovery significantly reduces the sequence target size (only 1–5% of vertebrate genomes are transcribed [36]) and has been used to identify SNPs in a variety of marine organisms. SNPs have been identified from expressed sequence tag (EST) collections in Atlantic cod (Gadus morhua) [37, 38], shrimp (Litopenaeus vannamei) [39, 40], sole (Solea solea) [41], stickleback (Gasterosteus aculeatus) [28], killifish (Fundulus heteroclitus) [42] and eastern oyster (Crassostrea virginica) [43]. SNPs also have been identified using 454 cDNA sequence data in copepods (Tigriopus californicus) [44], marine gastropods (Littorina saxatilis [45] and Concholepas concholepas [46]), soft clam (Mesodesma donacium) [47], Atlantic herring (Clupea harengus) [48], silver-lipped pearl oyster

Table 1. Common genetic marker characteristics for diploid organisms

Characteristics | Allozymes | RFLPs/AFLPs | Microsatellites | SNPs
Codominance (genotypic data) | Yes | No | Yes | Yes
Neutrality | Some | Yes (mostly) | Yes | Transcriptome-based: uncertain; genomic-based: mostly
Number of variable loci typically assayed | 10–50 | 10–1000 | 5–20 | 100s–10 000s
Number of alleles/locus | 1–5 | 2 | 1–50 | 2

Based on [25].

Table 2. Useful terms and their description for this article

Term | Description
Whole-genome sequencing | Sequences that capture nearly all of the genome and are assembled into large (10s–100 kb) scaffolds. Because whole genomes can be 0.1 to 10 Gbp, whole-genome approaches require much more sequencing than GBS, RNA-seq or exon-capture. Thus, few individuals are typically analyzed.
RRGS: Reduced representative genome sequencing (RAD, RAD-tag, GBS) | Genome sequencing that only sequences a selected ‘reduced’ portion of the genome. Typically this approach sequences 0.1% to 1% of the genome derived from specific restriction sites. The power of this approach is that the same loci in the genome of approximately 50–100 bp are sequenced in many (100s of) individuals.
Transcriptomics and RNA-seq | Transcriptomics is the quantitative and qualitative analysis of RNA expression. RNA-seq is a transcriptomic approach that uses sequences of expressed RNA to provide quantitative measures of expression and to identify nucleotide variation in expressed DNA (i.e. in RNA).
Capture sequencing, exon-capture, exomic sequencing, targeted sequencing | Capture sequencing, as the name implies, is sequencing selected (captured) genomic DNA targets. The most general approach is exon-capture or exomic sequencing that uses PCR, hybridization probes or cDNAs to capture exons or DNA that is expressed as RNA. Yet, the selected targets can be any parts of the genome for which there are probes or primers to specify a portion of the genome.
454 sequencing technology | 454 was one of the initial next-generation sequencing approaches. It captures individual DNA fragments on beads and sequences each of these fragments in separate wells in high-density well plates; 454 sequencing produces 300–500 bp sequences (reads) yielding 0.5 Gbp/run.
Illumina sequencing technology | Illumina sequencing captures DNA fragments on a solid surface and sequences these fragments on the surface. There are many different platforms, and the HiSeq platform, which typically produces 50–200 bp reads, yields 50 to 1000 Gbp per run.
ABI SOLiD sequencing technology | Sequencing by Oligonucleotide Ligation and Detection is a novel approach that does not rely on sequencing by DNA polymerization. Each read is 50 bp, and a run yields 30–50 Gbp.
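
The yields in Table 2 can be turned into a rough depth budget for an RRGS experiment. The Python sketch below is illustrative only: the function names and the run size (200 million reads, 200 individuals, 10 000 loci) are hypothetical, and it assumes reads are spread evenly across loci and individuals, which real data do not satisfy. The second function gives the chance that a heterozygote shows both alleles in a given number of reads, assuming each read samples either allele with equal probability.

```python
def mean_depth_per_locus(total_reads, n_individuals, n_loci):
    """Average reads per locus per individual if reads were spread evenly."""
    return total_reads / (n_individuals * n_loci)


def prob_both_alleles_seen(n_reads):
    """Chance that n reads from a heterozygote include both alleles,
    assuming each read samples either allele with probability 0.5."""
    if n_reads < 2:
        return 0.0
    # Subtract the two outcomes in which every read shows the same allele.
    return 1.0 - 2.0 * 0.5 ** n_reads


# Hypothetical run: 200 million reads, 200 individuals, 10 000 loci.
depth = mean_depth_per_locus(200e6, 200, 10_000)
print(f"mean depth: {depth:.0f} reads per locus per individual")

for n in (2, 5, 10):
    print(n, "reads:", round(prob_both_alleles_seen(n), 4))
```

Under the equal-sampling assumption, five reads detect both alleles of a heterozygote about 94% of the time. Uneven read distributions lower this figure, so a mean depth well above the minimum is the safer target.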

(Pinctada maxima) [49], California red abalone (Haliotis rufescens) [50] and a marine annelid (Streblospio benedicti) [51]. Both EST and 454 sequencing (Table 2) yield relatively long reads compared with the Illumina and ABI platforms (Table 2), making de novo transcriptome assembly easier and potentially more robust. To compare the utility of Roche 454 (which produces 300–500 bp reads but only 0.5 Gbp/run [52]) and Illumina Genome Analyzer (GAII, which only produces 100 bp reads, but >30 Gbp [52]) sequencing for SNP discovery, both platforms were used for SNP discovery in European hake (Merluccius merluccius). Not surprisingly, the average contig length (set of overlapping DNA sequences) was 3-fold greater for 454 than for Illumina, while the average coverage depth was 7.5-fold greater for Illumina. However, the two approaches yielded similar percentages of polymorphic loci, showing that both platforms are suitable for large-scale SNP discovery [53]. Other marine species in which shorter-read sequencing platforms such as Illumina and ABI SOLiD have been used for SNP discovery include mammals, fish, mollusks and arthropods [31, 32, 54–60]. De Wit et al. outline a transcriptome assembly approach followed by SNP marker development and genotyping [61]. The idea is relatively straightforward: one assembles a transcriptome from RNA-seq (RNA sequencing or whole transcriptome sequencing) data, then aligns RNA-seq reads to this assembly and finally identifies polymorphic loci. The RNA-seq reads might come from individual transcripts or from transcripts pooled from many individuals.

Focusing SNP discovery on expressed genes has the significant advantage of targeting many functional portions of the genome (i.e. all SNPs come from mRNAs, most of which encode proteins and do not include non-coding genomic regions) [61]. Knowing the function of a locus containing a SNP of interest gives further insight into the putative SNP phenotype and is useful for downstream analyses, for instance gene functional category enrichment analyses. This knowledge can be critically important when looking for signatures of natural selection and trying to determine the functional consequences of variation that seems to be under selection or adaptive.

One shortcoming of targeting expressed genes results from differences in allele-specific expression. Allele-specific expression, where one allele is preferentially expressed, can result in inaccurately calling an individual a homozygote rather than a heterozygote. Perhaps the biggest disadvantage of using
RNA-seq to identify SNPs results from the widely varying mRNA concentrations within a sample [62]. Figure 1 shows the frequencies of genes and their relative expression level from an RNA-seq experiment. Most mRNAs (4536 or 76%) have few reads each, and the sum of all these reads represents only a fraction (6%) of the total number of reads. A few mRNAs (136 or 2%) are greatly overexpressed and thus have many reads each; the sum of these reads represents >80% of all reads. This means that a gene that is highly expressed will be sequenced many times at the expense of genes that are less highly expressed, without any additional SNP identification. More importantly, because highly expressed genes use up the sequencing capacity, fewer individuals can be sequenced at the same time, losing the advantage of simultaneously discovering SNPs and genotyping hundreds of individuals. For example, using the RNA-seq approach in red abalone, approximately 22 000 SNPs were identified among 39 individuals using seven lanes of an Illumina HiSeq 2000 [55]; in contrast, typically a similar number of random, genome-wide SNPs can be identified in 100s of individuals using a single lane [63].

Figure 1. Frequency of expression. The number of gene-specific sequence tags (as log based 10) versus their relative frequency for 5890 different transcripts. Most genes have few sequence reads and a few genes have most reads.

Identifying SNPs in genomic DNA avoids the problem of biased representation of the most highly expressed mRNAs, as there are only two copies of each locus per cell and each locus occurs at the same frequency (Table 2). However, as suggested above, due to both sequencing and genome assembly and annotation costs, a more practical strategy for population genomics of natural marine populations is to only sequence a select small portion of the genome. The most common strategy is ‘reduced representation genome sequencing’ (RRGS, Table 2, reviewed in [27, 64, 65]). RRGS defines genotypes by sequencing and is also described as RAD-seq (restriction site-associated DNA sequencing) [58, 66] and GBS (genotyping by sequencing) [63]. In RRGS, genomic DNA is cut with one or more restriction enzymes, and only short DNA fragments are sequenced (reviewed in [27, 64, 65]), constituting approximately 0.1% to 1% of the genome. This approach yields many thousandfold sequence coverage for the same fragments. Many hundreds of individuals can be sequenced simultaneously by using barcodes that identify each individual. With 100s of millions of sequence reads of approximately 100 bp, there will be 1000s to 10 000s of the same loci sequenced in 100s of individuals with >10-fold coverage per nucleotide per individual. This sequencing depth for many individuals allows one to define SNPs, genotypes for all individuals and allele frequencies across populations. With barcoding, RRGS allows hundreds of individuals to be genotyped simultaneously and relatively cheaply and has opened the door for a huge range of population genomics studies using any non-model species. Due to the ease of these approaches, their repeatability [67–69] and their potential to produce thousands of genetic markers, RRGS approaches to identify SNPs are fast becoming the approach of choice for population genomic analyses in marine organisms.

Unlike SNPs developed using mRNAs, SNPs identified using RRGS approaches with genomic DNA are most frequently from non-functional, non-coding genomic DNA regions. While many ‘non-functional’ or neutral markers are useful for defining population genetic parameters [12], RRGS may be less effective than RNA-seq at discovering SNPs that are evolving by natural selection because so many of the SNPs should be associated with genomic DNA that has little if any function. Despite this, we and others have used GBS or RAD tag sequencing approaches to define SNPs evolving by natural selection [70, 71]. Thus, both sequencing cDNAs and reduced representative genome approaches have their strengths and weaknesses, and the choice to use one or the other will depend on research goals.

Implications for marine ecology

Genomic approaches with marine organisms provide finer resolution of population structure; for example, several thousand SNPs are able to define coral reef population structure along the Florida coast where previous microsatellite or DNA sequence data from one or a few loci suggested no differences along this coast [72]. Additionally, genomic approaches more readily identify loci with divergence patterns that are unlikely to represent neutral divergence but instead are most likely due to natural selection [32, 54, 55, 59, 60, 71, 73]. These genomic approaches that estimate the shared nucleotide diversity among individuals and populations provide thousands of loci, many of which are neutral. It is the neutral loci that are used to estimate basic population genetic parameters and demography estimates [12], and thus it is possible to more accurately define loci with non-neutral patterns (potentially adaptive loci). An approach to defining adaptive loci is to distinguish SNPs with FST values that are dissimilar to FST values achieved when permuting the data (FST values are too large relative to expected values [74, 75]). FST values [76, 77] provide measures of the genetic distance among populations and range from 0 to 1.0. FST values measure the genetic variation between populations relative to the total variation [76, 77]. A simple but inexact estimate of an FST value is FST = (HT − HS)/HT, where HT represents expected heterozygosity per locus of the total population and HS represents expected heterozygosity of a subpopulation. Thus, a genomic approach that identifies 1000s of SNPs is statistically more powerful in identifying adaptive SNPs because SNPs with too large FST values are more readily defined in comparison with many neutral loci [75, 78, 79].

One example of the strength of a genomic approach is in the estuarine fish sailfin molly (Poecilia latipinna). An RRGS approach was used to define approximately 1320 SNPs found in 80% of 120 individuals. Among three geographically close (within 10 km) populations in the Florida Keys (Figure 2A), a majority of SNPs suggest little genetic distance with FST values <0.0125 (less than 1.25% of the variance among populations relative to the total) [71]. Although this FST value is significant, it is very small.
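
The simple FST estimate described above, FST = (HT − HS)/HT, can be sketched in a few lines of Python for a single biallelic SNP. This is a minimal illustration and not the estimator used in the cited studies: the function names are hypothetical, HS is taken as the unweighted mean of the subpopulation heterozygosities (which assumes equal subpopulation sizes), and the allele frequencies are invented.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2pq for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(subpop_freqs):
    """FST = (HT - HS) / HT for one biallelic SNP.

    subpop_freqs: frequency of the same allele in each subpopulation.
    Assumption of this sketch: subpopulations are equally sized, so
    unweighted means are used for both the pooled frequency and HS.
    """
    n = len(subpop_freqs)
    p_total = sum(subpop_freqs) / n
    h_t = expected_heterozygosity(p_total)
    h_s = sum(expected_heterozygosity(p) for p in subpop_freqs) / n
    if h_t == 0.0:
        return 0.0  # monomorphic overall: FST is undefined, reported as 0 here
    return (h_t - h_s) / h_t


# Invented frequencies: nearly identical populations versus divergent ones.
print(round(fst([0.50, 0.52, 0.48]), 4))
print(round(fst([0.10, 0.50, 0.90]), 4))
```

The first call mimics the typical SNP in a well-connected species, with an FST close to zero. The second mimics a locus whose allele frequencies differ strongly among populations, the kind of SNP that stands out against the neutral background.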

Figure 2. Sailfin molly GBS. A: Three geographically close Florida Keys populations (BP: Big Pine Key, NNK: No Name Key). B: Outlier test for 1.3K SNPs among 120 individuals; red highlights SNPs with extreme FST values unlikely to occur due to neutral processes. C: Structure analysis of Florida Keys populations using SNPs with extreme FST values. Each vertical bar represents a multivariate summary of the ancestral genotype. (A colour version of this figure is available online at: http://bfg.oxfordjournals.org)
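
The logic of asking whether an observed FST is larger than chance would produce can be illustrated with a basic permutation test. The Python sketch below is a swapped-in, simplified technique and not the outlier test behind Figure 2B: it shuffles population labels among individuals at a single SNP, which tests against the null of no differentiation at all rather than against a genome-wide neutral distribution. The function names, genotype coding (0, 1 or 2 copies of one allele) and data are all hypothetical.

```python
import random


def fst_from_genotypes(genotypes, labels):
    """FST = (HT - HS) / HT for one SNP.

    genotypes: allele counts (0, 1 or 2) per individual.
    labels: population label per individual, in the same order.
    Uses unweighted means across populations (an assumption of this sketch).
    """
    pops = {}
    for g, lab in zip(genotypes, labels):
        pops.setdefault(lab, []).append(g)
    freqs = [sum(gs) / (2 * len(gs)) for gs in pops.values()]
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_p_value(genotypes, labels, n_perm=2000, seed=1):
    """Share of label shuffles whose FST is at least the observed FST."""
    rng = random.Random(seed)
    observed = fst_from_genotypes(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fst_from_genotypes(genotypes, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)


# Invented data: three populations of 10 individuals with divergent frequencies.
genotypes = [0] * 8 + [1] * 2 + [1] * 10 + [2] * 8 + [1] * 2
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 10
observed, p_value = permutation_p_value(genotypes, labels)
print(f"FST = {observed:.3f}, permutation p = {p_value:.4f}")
```

A genome-scale outlier analysis would repeat such a comparison across thousands of SNPs and correct for multiple tests, for example with a false discovery rate.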

The RRGS data for the genetic variation within and among these and other populations are very similar to previous studies using allozymes and microsatellites [80, 81], which suggests that there is little if any difference among South Florida sailfin molly populations. Yet, among the salt marsh flats in the Florida Keys populations, there are SNPs with significant genetic diversity among populations (Figure 2B [71]). Specifically, these three similar populations have 18 SNPs with FST values that are unlikely to occur relative to permutation of SNPs with similar heterozygosities (p-value < 0.01, false discovery rate = 0.1; Figure 2B). Using these loci, there is a strong structure among populations (Figure 2C). The meaningful difference between RRGS and previous measures of genetic diversity is the number of loci examined, which allows more precise delineations of population structure and, as well, facilitates identifying loci with excessive FST values that could indicate adaptive divergence. Thus, the power of RRGS studies is that they enhance previous studies and can also distinguish subtle but evolutionarily important genetic differences.

The resolution of genetic differences based on RRGS is found in other studies. In 37 red abalone (Haliotis rufescens), the vast majority of SNPs had little genetic differentiation among populations [55]. Yet, 691 of 22 000 loci had significantly higher FST values that readily distinguish populations along the California Coast [55]. Similarly, among Baltic and Atlantic herring (Clupea harengus), most SNPs support genetic homogeneity, reflecting the lack of obvious dispersal barriers in marine habitats [32, 54]. These SNPs for many or a majority of loci yield similar results to studies using allozymes or microsatellites where population differentiation across large geographic areas is very low (FST values 0.005). Yet, out of 5985 SNPs, there were 117 (2.0%) that showed evidence of substantial divergence, corresponding to FST values = 0.128 [54]. In a similar analysis, when investigating 440 817 SNPs, 3847 SNPs (approximately 1%) showed significantly strong differences among populations, with a few at or near fixation [32]. Similar data are found in temporal isolation of pink salmon (Oncorhynchus gorbuscha) and Atlantic salmon (Salmo salar), where a few out of 1000s of SNPs are associated with migration and reproduction timing [59, 60].

Data from these studies consistently detect a small percentage of SNPs with elevated divergence between groups or populations that exceeds neutral expectations. These studies highlight the power of genomic approaches: discovery and characterization of many thousands of polymorphic loci provide a wide breadth of information where 1–5% of SNPs distinguish important population structure. Many of these informative SNPs have FST values that are indicative of non-neutral processes and thus suggest that adaptive evolution acting on many loci is common.

Technical aspects: alignment, sequence depth, HWE and LD

The big difference between traditional genetic markers and genetic markers from genomic approaches is the number of markers. Genomic approaches can yield hundreds of thousands and even millions of genetic markers. However, except for number, genetic markers generated via genomic approaches are very similar to those generated via less high-throughput approaches and should be treated so [31, 71]. This entails filtering the data so that only high-quality and high-confidence genetic markers are used to address the relevant biological questions. An initial filtering step begins with sequence coverage depth. The general idea is that a polymorphism that occurs in multiple sequences

Figure 3. Distribution of minor allele frequencies (MAFs) and Ho/He. SNPs from 120 individuals among five Florida sailfin molly populations [71]. A: Histogram of MAFs for 3878 SNPs prior to filtering (i.e. before removing SNPs that do not occur in 90% of the 120 individuals from five populations or that do not have >1% MAF). MAFs of 5% and 1% are highlighted in red and black, respectively. B: Ratio of observed heterozygosity (Ho) over expected heterozygosity (He = 2pq). SNPs with Ho > He and significantly different from Hardy–Weinberg expectation (HWE, p < 0.01) are highlighted in red. Notice that many SNPs have Ho < He, because the Wahlund effect among five populations would reduce Ho relative to He [93]. These SNPs with low Ho are not removed. (A colour version of this figure is available online at: http://bfg.oxfordjournals.org)

and in multiple individuals is unlikely to be a technical artifact (e.g. a sequencing error). Yet, there are potential technical problems specific to RRGS approaches [27, 64, 65].

Five technical aspects to be aware of are ascertainment bias, alignment method, sequence depth, mis-alignment of paralogs and linkage disequilibrium (LD). SNP discovery should use individuals from across the population range versus genotyping one population and investigating these population-specific polymorphisms. Examining SNPs discovered in only one population leads to ascertainment bias [82]. This ascertainment bias occurs because polymorphisms are population-specific, and defining SNPs in one population will cause one to miss SNPs in a second population, thus creating a false sense of lower nucleotide diversity in the second population.

An important technical aspect of RRGS approaches starts with alignment. In general, short sequence reads or tags are aligned to a reference genome or, alternatively, are assembled into a set of contiguous sequences, to which individual sequence reads are aligned. The alignment method has a large effect on the resulting SNP pool. For instance, the Tassel pipeline [83, 84] for species with a reference genome and the related UNEAK pipeline for species without a reference genome are used to identify SNPs from RRGS data. Tassel and other bioinformatics approaches that rely on aligning short sequence reads have the option of using Bowtie [85] and BWA [86]. However, using the same raw sequencing data, Bowtie and BWA result in different final SNP sets: in our and others’ experience, only approximately 50% of reads produced shared alignments with these two tools [87–89]. Even after filtering to remove SNPs with too few reads and excessive Ho, this difference in which SNPs are identified is not eliminated, although it does greatly reduce the frequencies of alternate SNP identification [72]. It is difficult to suggest the cause or the consequences of identifying different SNPs from the same sequences, but one can compare these two approaches and inquire about the frequencies of paralogs and sequencing depth to choose an objective approach (see below).
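One way to make the comparison between aligners concrete is to measure how many SNPs two call sets share. The sketch below is a hypothetical illustration: it assumes each SNP is keyed by a (tag or contig, position) tuple and that both sets were called from the same raw reads after alignment with two different tools. The function name and the toy data are assumptions, not part of the paper.

```python
def snp_concordance(calls_a, calls_b):
    """Summarize the overlap between two SNP call sets.

    Each call is a hashable key such as (tag_or_contig, position).
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared SNPs over all SNPs called by either aligner
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical call sets from two aligners run on the same reads.
aligner_1 = [("tag1", 12), ("tag2", 40), ("tag3", 7), ("tag4", 55)]
aligner_2 = [("tag1", 12), ("tag3", 7), ("tag5", 3), ("tag6", 18)]

summary = snp_concordance(aligner_1, aligner_2)
print(summary)
```

Running the same summary before and after depth and heterozygosity filtering would show how much of the disagreement the filters remove.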
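The read-depth reasoning in this section can be checked numerically. The sketch below assumes a heterozygous individual whose two alleles are each sequenced with probability 0.5 per read. With a fixed number of reads n, both alleles are seen with probability 1 - 2(0.5)^n. If depth instead varies as a Poisson variable with mean λ, the same probability becomes (1 - e^(-λ/2))^2. The Poisson model is a simplifying assumption chosen for illustration (the negative binomial case is not covered), and the function names are hypothetical.

```python
import math


def p_both_alleles_fixed(n_reads):
    """P(both alleles of a heterozygote are observed) with exactly n_reads reads."""
    if n_reads < 2:
        return 0.0
    # Detection fails only if every read comes from allele A or every read from allele B.
    return 1.0 - 2.0 * 0.5 ** n_reads


def p_both_alleles_poisson(mean_depth):
    """Same probability when total depth is Poisson with the given mean.

    Reads from each allele are then independent Poisson(mean_depth / 2),
    and each allele must be observed at least once.
    """
    return (1.0 - math.exp(-mean_depth / 2.0)) ** 2


for depth in (5, 10):
    print(depth,
          round(p_both_alleles_fixed(depth), 4),
          round(p_both_alleles_poisson(depth), 4))
```

With exactly five reads the fixed-depth value is 0.9375, consistent with the >90% figure quoted in the text. Under the Poisson assumption, a mean depth of 5 gives roughly 0.84 and a mean depth of 10 gives roughly 0.99, which illustrates why a higher average depth is needed once read counts vary among loci and individuals.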

Sequencing depth, the number of reads per locus or SNP, defines the probability of identifying allelic variation, defining individual genotypes and avoiding sequencing errors. To identify true allelic variants (versus sequencing errors) and genotypes (where the identity of a SNP variant on both chromosomes is estimated) requires sufficient read depth. If there were an equal chance of sequencing both alleles, the probability of identifying the second allele would be likely (>90%) with five reads. Unfortunately, the distribution of reads is Poisson or negative binomial [90], which means that one needs approximately 10 reads per locus per individual to accurately estimate genotypes [91, 92]. Notice that genotype can be used to estimate population allele frequencies from pooled samples of sufficient numbers of individuals. However, identifying both alleles in an individual and thus its genotype is important for association studies (relating genotypes to phenotypes) and for identifying paralogs (different loci for similar genes, e.g. the three genes for lactate dehydrogenase: Ldh-A, Ldh-B and Ldh-C). Identifying genotypes is readily accomplished by ligating on a barcode for each individual: a sequence-specific DNA adaptor that is present in each sequence read and is used to identify every individual.

Sequencing or PCR errors that can contribute to incorrect SNP calls can be avoided by requiring a minimum number of reads in more than one individual. Sequencing errors of 0.01% to 0.1% can produce rare SNPs [92]. Similarly, in many RRGS approaches, there is a PCR step that can introduce errors in many copies of a sequence tag if the error occurs in the first few cycles. These will create many sequence tags in an individual with a novel SNP. Both errors can be avoided by requiring that the rare SNP both has sufficient read depth and occurs in many individuals. Thus, if one requires a minimum of five reads per SNP with 1% minimum allele frequency (MAF) when sequencing >100 individuals, all SNPs will be represented in more than one individual multiple times. Removing SNPs with low MAF has a small effect on the total number of SNPs (Figure 3A), and thus is a simple way to avoid errors. Moreover, removing these SNPs enhances downstream analyses that rely on permutation of the data [94]. The downside of this approach of choosing MAFs is that one will miss rare variations that may contribute to phenotypic variation [95].

RRGS approaches using short reads generated by next-generation sequencers can produce alignments among different paralogs (similar genes that occur at different locations so they do not recombine). A hallmark of SNPs that represent alignments among paralogs (variants between loci instead of true allelic variants at a single locus) is that there will be excessive heterozygosity. For example, imagine two genes that code for similar proteins and thus share a high degree of similarity for parts of their genes; further imagine that each gene has a fixed different nucleotide at the same position. If a short read aligns to both genes at this position, all individuals will appear to be heterozygotes. The first step to avoiding this is removing tag sequences that align to more than one portion of the genome, a default requirement for several alignment tools. This requires a complete genome where all paralogs are identified. However, not all RRGS projects in marine species will have a genome, and not all genomes are assembled well-enough so that all paralogs are identified. An additional approach is to identify all SNPs with excessive observed heterozygosity (Ho) relative to the expected heterozygosity (He) using the Hardy–Weinberg equilibrium test (Figure 3B). Notice, only SNPs with large excessive Ho are rejected because loci with heterozygote deficits (low Ho) relative to He are likely among divergent populations (Wahlund effect [93], Figure 3B), and these informative SNPs can be used to describe the connectivity and differences within a species.

LD is the significant correlation among SNPs. This statistical association among SNPs when physically close is due to limits of recombination to create independence among sites. A potential advantage for genomic approaches that identify 1000s of SNPs is a more accurate estimate of effective population size using the LD method [96]. Many marine organisms are believed to have very large effective population sizes, and so, current methodologies using 10–20 loci are not very informative (i.e. 95% confidence intervals often include infinity). This arises because LD due to drift is low in large populations and difficult to detect. Large numbers of markers are expected to overcome this difficulty, allowing the upper bounds of Ne to be estimated [96].

While LD and the size of linkage groups can be a hallmark of adaptive divergence [97], it also creates a bias in that SNPs in LD are not independent. Thus, for example, many SNPs in LD that have lower divergence among populations could inflate measures of connectivity. If the primary goal of an RRGS project is to identify demographic processes, one can readily remove SNPs in LD. This is best applied for SNPs with a genome, so that one can examine a subset of SNPs that are physically close to one another. It is more difficult with SNP discoveries without a genome because it would require defining the correlation among all possible pairwise combinations, a computationally difficult task with thousands of SNPs. There are other approaches. One is to ‘thin’ the data, so that no SNPs occur on the same tag sequence or are within a fixed distance (e.g. only keeping SNPs that are >100 bp apart). While these approaches are effective, one loses informative SNPs because even though there may be a significant correlation between loci, rarely are the correlations perfect, and thus these sites have some information content. One suggestion is to estimate population parameters by permuting the data, using a subset of unlinked SNPs or defining the LD among the most informative SNPs. The last suggestion is important for defining the number of independent loci that explain the divergence among populations and, because fewer SNPs are involved, is more readily accomplished.

There are other concerns with GBS or RAD-seq approaches [65, 98]; these include PCR-biased amplification that causes allelic bias, allele drop off or loss of specific loci. These biases will reduce the ability to identify genetic variation, making different populations more similar because one allele is missing or under-represented. Other problems are population-specific polymorphisms in restriction sites such that the proper size fragments are only produced in one or a few populations. If SNPs are filtered such that they occur in a vast majority (>75%) of individuals, these problems will only reduce the number of informative SNPs, not create bias in divergence estimates among populations.

In summary, most technical problems with RRGS approaches are addressed by having sufficient read depth among many individuals and eliminating SNPs with too large a Ho/He. These requirements may underestimate variation because rare or low-frequency alleles will not be considered. Linkage is not necessarily a problem because it can be used to estimate non-neutral patterns of divergence, but it will overestimate the number of independent loci.

Conclusion

Marine ecological genomics provides the ability to better understand population structure, genetic divergence among populations and selectively important genes. These data are important

because they inform us about the conservation genetics of isolated populations, the genes affecting important phenotypes (e.g. reproductive schedules) and the effectiveness of adaptive change. Currently, many genomic studies suggest that marine species have greater population structure than previously appreciated. Additionally, many of these studies identify many loci that appear to be evolving by natural selection. An extension of these observations is that natural selection is more effective than currently appreciated, resulting in marine populations adapted to subtle differences in their local environments. If it is true that natural selection is more effectively shaping population-specific genotypes, it suggests that current climate changes will be mitigated by adaptive change in many marine organisms with sufficient genetic variation.

Key Points
• Many marine organisms inhabit diverse environments and have highly connected populations because of pelagic life stages.
• Genomic approaches provide many thousands of polymorphic nucleotide sites.
• Most population analyses using genome-wide polymorphism data support previous population analyses using fewer genetic markers.
• The strength of genomic approaches is that many thousands of neutral loci can be used to define loci that exhibit population-specific or extreme FST values.
• These non-neutral loci have greater power to define population structure or explain genotype to phenotype relationships.

Funding
Funding for M.F.O. and D.L.C. was provided by NSF grants MCB 1434565, IOS 1147042 and DEB-1265282.

References
1. Cossins AR, Crawford DL. Fish as models for environmental genomics. Nat Rev Genet 2005;6:324–33.
2. Grishkevich V, Yanai I. The genomic determinants of genotype × environment interactions in gene expression. Trends Genet 2013;29:479–87.
3. Oleksiak MF, Crawford DL. Functional genomics in fishes, insights into physiological complexity. In: Evan D, Claiborne J (eds). The Physiology of Fishes. Boca Raton: CRC Press, 2006, 523–50.
4. Oleksiak MF. Genomic approaches with natural fish populations. J Fish Biol 2010;76:1067–93.
5. Romero IG, Ruvinsky I, Gilad Y. Comparative studies of gene expression and the evolution of gene regulation. Nat Rev Genet 2012;13:505–16.
6. Stranger BE, Forrest MS, Clark AG, et al. Genome-Wide Associations of Gene Expression Variation in Humans. PLoS Genet 2005;1:e78.
7. Townsend JP, Cavalieri D, Hartl DL. Population genetic variation in genome-wide gene expression. Mol Biol Evol 2003;20:955–63.
8. Whitehead A, Crawford DL. Variation within and among species in gene expression: raw material for evolution. Mol Ecol 2006;15:1197–211.
9. Williams RBH, Chan EKF, Cowley MJ, et al. The influence of genetic variation on gene expression. Genome Res 2007;17:1707–16.
10. Wittkopp PJ. Variable gene expression in eukaryotes: a network perspective. J Exp Biol 2007;210:1567–75.
11. Wray GA, Hahn MW, Abouheif E, et al. The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol 2003;20:1377–419.
12. Allendorf FW, Hohenlohe PA, Luikart G. Genomics and the future of conservation genetics. Nat Rev Genet 2010;11:697–709.
13. Luikart G, England PR, Tallmon D, et al. The power and promise of population genomics: from genotyping to genome typing. Nat Rev Genet 2003;4:981–94.
14. Sick K. Haemoglobin polymorphism in fishes. Nature 1961;192:894–6.
15. Lewontin RC, Hubby JL. A molecular approach to the study of genic heterozygosity in natural populations. II Amount of variation and degree of heterozygosity in natural populations of Drosophila pseudoobscura. Genetics 1966;54:595–609.
16. Harris H. Enzyme polymorphisms in man. Proc R Soc Lond 1966;164:298–310.
17. Nevo E. Genetic variation in natural populations: patterns and theory. Theor Popul Biol 1978;13:121–77.
18. Selander RK, Johnson WE. Genetic variation among vertebrate species. Annu Rev Ecol Syst 1973;4:75–91.
19. Charlesworth B. Why we are not dead one hundred times over. Evolution 2013;67:3354–61.
20. Haldane JBS. The cost of natural selection. J Genet 1957;55:511–24.
21. Kimura M. Evolutionary rate at the molecular level. Nature 1968;217:624–6.
22. Kreitman M. The neutral theory is dead. Long live the neutral theory. Bioessays 1996;18:678–83; discussion 683.
23. Wayne ML, Simonsen KL. Statistical tests of neutrality in the age of weak selection. Trends Ecol Evol 1998;13:236–40.
24. Schlötterer C. The evolution of molecular markers—just a matter of fashion? Nat Rev Genet 2004;5:63–9.
25. Jarne P, Lagoda PJ. Microsatellites, from molecules to populations and back. Trends Ecol Evol 1996;11:424–9.
26. Seeb JE, Carvalho G, Hauser L, et al. Single-nucleotide polymorphism (SNP) discovery and applications of SNP genotyping in nonmodel organisms. Mol Ecol Resour 2011;11 (Suppl 1):1–8.
27. Davey JW, Hohenlohe PA, Etter PD, et al. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet 2011;12:499–510.
28. Jones FC, Grabherr MG, Chan YF, et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 2012;484:55–61.
29. Zhang G, Fang X, Guo X, et al. The oyster genome reveals stress adaptation and complexity of shell formation. Nature 2012;490:49–54.
30. Star B, Nederbragt AJ, Jentoft S, et al. The genome sequence of Atlantic cod reveals a unique immune system. Nature 2011;477:207–10.
31. Flight PA, Rand DM. Genetic variation in the acorn barnacle from allozymes to population genomics. Integr Comp Biol 2012;52:418–29.
32. Lamichhaney S, Martinez Barrio A, Rafati N, et al. Population-scale sequencing reveals genetic differentiation due to local adaptation in Atlantic herring. Proc Natl Acad Sci USA 2012;109:19345–50.
33. Montes I, Conklin D, Albaina A, et al. SNP discovery in European anchovy (Engraulis encrasicolus, L) by high-throughput transcriptome and genome sequencing. PLoS One 2013;8:e70051.

34. Narum SR, Buerkle CA, Davey JW, et al. Genotyping-by-sequencing in ecological and conservation genomics. Mol Ecol 2013;22:2841–7.
35. Hodges E, Xuan Z, Balija V, et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet 2007;39:1522–7.
36. Willette DA, Allendorf FW, Barber PH, et al. So, you want to use next-generation sequencing in marine systems? Insight from the Pan-Pacific advanced studies institute. Bull Mar Sci 2014;90:79–122.
37. Bowman S, Hubert S, Higgins B, et al. An integrated approach to gene discovery and marker development in Atlantic cod (Gadus morhua). Mar Biotechnol 2011;13:242–55.
38. Hubert S, Higgins B, Borza T, et al. Development of a SNP resource and a genetic linkage map for Atlantic cod (Gadus morhua). BMC Genomics 2010;11:191.
39. Ciobanu DC, Bastiaansen JWM, Magrin J, et al. A major SNP resource for dissection of phenotypic and genetic variation in Pacific white shrimp (Litopenaeus vannamei). Anim Genet 2010;41:39–47.
40. Du ZQ, Ciobanu D, Onteru S, et al. A gene-based SNP linkage map for pacific white shrimp, Litopenaeus vannamei. Anim Genet 2010;41:286–94.
41. Diopere E, Hellemans B, Volckaert FA, et al. Identification and validation of single nucleotide polymorphisms in growth- and maturation-related candidate genes in sole (Solea solea L.). Mar Genomics 2013;9:33–8.
42. Williams LM, Ma X, Boyko AR, et al. SNP identification, verification, and utility for population genetics in a non-model genus. BMC Genet 2010;11:32.
43. Quilang J, Wang S, Li P, et al. Generation and analysis of ESTs from the eastern oyster, Crassostrea virginica Gmelin and identification of microsatellite and SNP markers. BMC Genomics 2007;8:157.
44. Foley BR, Rose CG, Rundle DE, et al. A gene-based SNP resource and linkage map for the copepod Tigriopus californicus. BMC Genomics 2011;12:568.
45. Galindo J, Grahame J, Butlin R. An EST-based genome scan using 454 sequencing in the marine snail Littorina saxatilis. J Evol Biol 2010;23:2004–16.
46. Gallardo-Escárate C, Núñez-Acuña G, Valenzuela-Muñoz V. SNP discovery in the marine gastropod Concholepas concholepas by high-throughput transcriptome sequencing. Conserv Genet Resour 2013;5:1053–4.
47. Gallardo-Escárate C, Valenzuela-Muñoz V, Núñez-Acuña G, et al. SNP discovery and gene annotation in the surf clam Mesodesma donacium. Aquac Res 2015;46:1175–87.
48. Helyar SJ, Limborg M, Bekkevold D, et al. SNP discovery using next generation transcriptomic sequencing in Atlantic herring (Clupea harengus). PLoS One 2012;7:e42089.
49. Jones DB, Jerry DR, Forêt S, et al. Genome-wide SNP validation and mantle tissue transcriptome analysis in the silver-lipped pearl oyster, Pinctada maxima. Mar Biotechnol 2013;15:647–58.
50. Valenzuela-Muñoz V, Araya-Garay JM, Gallardo-Escárate C. SNP discovery and High Resolution Melting Analysis from massive transcriptome sequencing in the California red abalone Haliotis rufescens. Mar Genomics 2013;10:11–6.
51. Zakas C, Schult N, McHugh D, et al. Transcriptome analysis and SNP development can resolve population differentiation of Streblospio benedicti, a developmentally dimorphic marine annelid. PLoS One 2012;7:e31613.
52. Metzker ML. Sequencing technologies: the next generation. Nat Rev Genet 2010;11:31–46.
53. Milano I, Babbucci M, Panitz F, et al. Novel tools for conservation genomics: comparing two high-throughput approaches for SNP discovery in the transcriptome of the European hake. PLoS One 2011;6:e28008.
54. Corander J, Majander KK, Cheng L, et al. High degree of cryptic population differentiation in the Baltic Sea herring Clupea harengus. Mol Ecol 2013;22:2931–40.
55. De Wit P, Palumbi SR. Transcriptome-wide polymorphisms of red abalone (Haliotis rufescens) reveal patterns of gene flow and local adaptation. Mol Ecol 2013;22:2884–97.
56. Everett MV, Grau ED, Seeb JE. Short reads and nonmodel species: exploring the complexities of next-generation sequence assembly and SNP discovery in the absence of a reference genome. Mol Ecol Resour 2011;11 (Suppl 1):93–108.
57. Fernandez R, Schubert M, Vargas-Velazquez AM, et al. A genomewide catalogue of single nucleotide polymorphisms in white-beaked and Atlantic white-sided dolphins. Mol Ecol Resour 2016;16:266–76.
58. Hohenlohe PA, Amish SJ, Catchen JM, et al. Next-generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout. Mol Ecol Resour 2011;11:117–22.
59. Johnston SE, Orell P, Pritchard VL, et al. Genome-wide SNP analysis reveals a genetic basis for sea-age variation in a wild population of Atlantic salmon (Salmo salar). Mol Ecol 2014;23:3452–68.
60. Seeb LW, Waples RK, Limborg MT, et al. Parallel signatures of selection in temporally isolated lineages of pink salmon. Mol Ecol 2014;23:2473–85.
61. De Wit P, Pespeni MH, Palumbi SR. SNP genotyping and population genomics from expressed sequences–current advances and future possibilities. Mol Ecol 2015;24:2310–23.
62. Bonaldo MF, Lennon G, Soares MB. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res 1996;6:791–806.
63. Elshire RJ, Glaubitz JC, Sun Q, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS One 2011;6:e19379.
64. Andrews KR, Luikart G. Recent novel approaches for population genomics data analysis. Mol Ecol 2014;23:1661–7.
65. Puritz JB, Matz MV, Toonen RJ, et al. Demystifying the RAD fad. Mol Ecol 2014;23:5937–42.
66. Baird N, Etter P, Atwood T, et al. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS One 2008;3:e3376.
67. DaCosta JM, Sorenson MD. Amplification biases and consistent recovery of Loci in a double-digest RAD-seq protocol. PLoS One 2014;9:e106713.
68. Gonen S, Bishop SC, Houston RD. Exploring the utility of cross-laboratory RAD-sequencing datasets for phylogenetic analysis. BMC Res Notes 2015;8:299.
69. Gorjanc G, Cleveland MA, Houston RD, et al. Potential of genotyping-by-sequencing for genomic selection in livestock populations. Genet Sel Evol 2015;47:12.
70. Hohenlohe PA, Bassham S, Etter PD, et al. Population genomics of parallel adaptation in threespine stickleback using sequenced RAD tags. PLoS Genet 2010;6:e1000862.
71. Nunez JC, Seale TP, Fraser MA, et al. Population genomics of the euryhaline teleost Poecilia latipinna. PLoS One 2015;10:e0137077.
72. Drury C, Dale KE, Panlilio JM, et al. Genomic variation among populations of threatened coral: Acropora cervicornis. BMC Genomics 2016, in press.

73. Milano I, Babbucci M, Cariani A, et al. Outlier SNP markers reveal fine-scale genetic structuring across European hake populations (Merluccius merluccius). Mol Ecol 2014;23:118–35.
74. Beaumont MA, Balding DJ. Identifying adaptive genetic divergence among populations from genome scans. Mol Ecol 2004;13:969–80.
75. Lotterhos KE, Schaal SM. Genome scans for the contemporary response to selection in quantitative traits. Mol Ecol 2014;23:4435–7.
76. Holsinger KE, Weir BS. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat Rev Genet 2009;10:639–50.
77. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population-structure. Evolution 1984;38:1358–70.
78. Lotterhos KE, Whitlock MC. Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Mol Ecol 2014;23:2178–92.
79. Whitlock MC, Lotterhos KE. Editor: Judith LB. Reliable Detection of Loci Responsible for Local Adaptation: Inference of a Null Model through Trimming the Distribution of FST. Am Nat 2015;186:S24–36.
80. Apodaca JJ, Trexler JC, Jue NK, et al. Large-scale natural disturbance alters genetic population structure of the sailfin molly, Poecilia latipinna. Am Nat 2013;181:254–63.
81. Trexler JC, Travis J, Dinep A. Variation among populations of the sailfin molly in the rate of concurrent multiple paternity and its implications for mating-system evolution. Behav Ecol Sociobiol 1997;40:297–305.
82. Clark AG, Hubisz MJ, Bustamante CD, et al. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res 2005;15:1496–502.
83. Bradbury PJ, Zhang Z, Kroon DE, et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 2007;23:2633–5.
84. Glaubitz JC, Casstevens TM, Lu F, et al. TASSEL-GBS: a high capacity genotyping by sequencing analysis pipeline. PLoS One 2014;9:e90346.
85. Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012;9:357–9.
86. Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 2009;25:1754–60.
87. Nielsen R, Mattila DK, Clapham PJ, et al. Statistical approaches to paternity analysis in natural populations and applications to the North Atlantic humpback whale. Genetics 2001;157:1673–82.
88. O’Rawe J, Jiang T, Sun G, et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 2013;5:1–18.
89. Bayer T, Aranda M, Sunagawa S, et al. Symbiodinium transcriptomes: genome insights into the dinoflagellate symbionts of reef-building corals. PLoS One 2012;7:e35269.
90. Gautier M, Foucaud J, Gharbi K, et al. Estimation of population allele frequencies from next-generation sequencing data: pool-versus individual-based genotyping. Mol Ecol 2013;22:3766–79.
91. Lynch M. Estimation of allele frequencies from high-coverage genome-sequencing projects. Genetics 2009;182:295–301.
92. Maruki T, Lynch M. Genotype-frequency estimation from high-throughput sequencing data. Genetics 2015;201:473–86.
93. Hartl DL, Clark AG. Principles of Population Genetics. Sunderland, MA: Sinauer Associates, 2007.
94. Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): an integrated software package for population genetics data analysis. Evol Bioinform 2005;1:47–50.
95. Nelson MR, Wegmann D, Ehm MG, et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 2012;337:100–4.
96. Waples RS, Do C. Linkage disequilibrium estimates of contemporary Ne using highly variable genetic markers: a largely untapped resource for applied conservation and evolution. Evol Appl 2010;3:244–62.
97. Pritchard JK, Pickrell JK, Coop G. The genetics of human adaptation: hard sweeps, soft sweeps, and polygenic adaptation. Curr Biol 2010;20:R208–15.
98. Andrews KR, Hohenlohe PA, Miller MR, et al. Trade-offs and utility of alternative RADseq methods: reply to Puritz et al. Mol Ecol 2014;23:5943–6.
