
REVIEWS

APPLICATIONS OF NEXT-GENERATION SEQUENCING

Reconstructing ancient genomes and epigenomes
Ludovic Orlando1,2, M. Thomas P. Gilbert1,3 and Eske Willerslev1
Research involving ancient DNA (aDNA) has experienced a true technological revolution
in recent years through advances in the recovery of aDNA and, particularly, through
applications of high-throughput sequencing. Formerly restricted to the analysis of only
limited amounts of genetic information, aDNA studies have now progressed to
whole-genome sequencing for an increasing number of ancient individuals and extinct
species, as well as to epigenomic characterization. Such advances have enabled the
sequencing of specimens of up to 1 million years old, which, owing to their extensive DNA
damage and contamination, were previously not amenable to genetic analyses. In this
Review, we discuss these varied technical challenges and solutions for sequencing
ancient genomes and epigenomes.

1 Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5–7, Copenhagen 1350C, Denmark.
2 Université de Toulouse, University Paul Sabatier (UPS), Laboratoire AMIS, CNRS UMR 5288, 37 allées Jules Guesde, 31000 Toulouse, France.
3 Trace and Environmental DNA Laboratory, Department of Environment and Agriculture, Curtin University, Perth, Western Australia 6102, Australia.
Correspondence to L.O.
e-mail: lorlando@snm.ku.dk
doi:10.1038/nrg3935
Published online 9 June 2015

The 1984 publication of a short mitochondrial DNA sequence from the quagga, a zebra-like equid that has been extinct since the 1880s, initiated the field of ancient DNA (aDNA) research1. Following concomitant development of PCR and realization that DNA survived in osseous materials2, the future of aDNA research looked bright. However, the degraded nature of aDNA3 coupled with the sensitivity of PCR to contamination (whether derived from environmental microorganisms or human handling, and thus embedded in the samples, or in the form of laboratory and/or reagent contamination) contributed to a series of publications based on false-positive results. Given that these problems seriously undermined the field's broader scientific interest and reliability until the mid-2000s, few would have expected that, by the field's twenty-fifth birthday, the genome of an ancient human4 and draft genomes of the extinct mammoth5 and Neanderthals6 would have been sequenced. Today, many tens of ancient genomes, ranging from microbial pathogens7–13 to vertebrate genomes14–29 (including the quagga19), have been sequenced.

Osseous materials
Calcified animal tissues, such as bones and teeth.

Paleogenomics is driven by high-throughput sequencing (HTS) platforms, some of which generate data from billions of short DNA fragments per run30. In most paleogenomic studies, DNA libraries are generated by ligating the genomic extract to generic adaptors, amplified using PCR and then subjected to HTS using so-called second-generation sequencing platforms. This contrasts with traditional PCR-based approaches, in which loci are individually targeted and sub-amplicon-sized DNA is unexploitable. In addition to enabling whole-genome sequencing, HTS revealed how a diverse range of fossil specimens that were previously ignored owing to an inability to yield PCR amplicons nevertheless contained ultrashort aDNA fragments (~30–50 bp). Combining HTS with extraction methods tailored to the short, damaged aDNA molecules increased the time window for aDNA sequencing by an order of magnitude to at least 1 million years in permafrozen regions17 and 500,000 years in temperate caves31,32. Beyond genomes, the profiling of the epigenetic landscape (that is, epigenomes) of these ancient samples has recently become feasible33,34, conferring the potential to characterize regulatory changes throughout evolutionary timescales. However, there are also difficulties in paleogenomic studies. Indeed, HTS has enhanced some of the challenges, including data authentication and contaminant identification, as well as accounting for inflated error rates caused by damaged nucleotides.

In this Review, we discuss key technological developments underpinning the paleogenomic revolution (FIG. 1) and describe post-mortem damage types common to aDNA and how they can be accounted for (and even exploited). Furthermore, we discuss how aDNA targets can be enriched relative to other DNA, how the resulting sequences can be analysed, and recent progress in characterizing ancient epigenomes. Throughout, we highlight current limitations and provide perspectives for future developments. As most advances relate to human calcified tissues (bones and teeth), we principally focus on these. Some of the key findings addressing long-standing debates in our own global population history

NATURE REVIEWS | GENETICS VOLUME 16 | JULY 2015 | 395

© 2015 Macmillan Publishers Limited. All rights reserved



[Figure 1: timeline, 2006–2016, placing methodological advances (date range of methods) against aDNA studies (genome and date of publication).
Methods shown: high-throughput DNA sequencing; true single-molecule DNA sequencing (Helicos)45; extraction of ultrashort DNA fragments31; ssDNA library16,66; selective uracil enrichment68; primer extension capture69; in-solution target enrichment, PCR probes70; microarray-based target enrichment73; whole-chromosome target enrichment77; whole-genome in-solution capture78.
Studies shown: 13-Mb-long mammoth DNA (454)52; 0.7× mammoth genome (454)5; 16× Paleo-Eskimo genome (Illumina)4; 1.3× Neanderthal genome (454 and Illumina)6; 30× Denisovan genome (Illumina)16; 6.8× Iceman genome (SOLiD)15; mammoth proteome115; 400,000-year-old mitochondrial genomes31,32; 1.1× 700,000-year-old horse genome (Illumina and Helicos)17; 52× Neanderthal genome (Illumina)22; maize kernel transcriptomes113; methylomes33; methylome and nucleosome maps34.]

Figure 1 | Major advances in ancient genomics. The major methodological advances described in this Review are presented with respect to milestones in paleogenomics, including whole-genome sequencing and the characterization of transcriptomes, epigenomes and proteomes. Average genome fold-coverage (×) and sequencing platforms are indicated where applicable. aDNA, ancient DNA; ssDNA, single-stranded DNA.
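
As a rough illustration of the fold-coverage values quoted in Figure 1, average depth can be approximated as the number of endogenous bases sequenced divided by the genome size. The sketch below is not a method from this Review: the function names, the 50-bp mean fragment length and the 1% endogenous fraction are illustrative assumptions, and duplicates, unmappable reads and quality filtering are ignored.

```python
import math

def fold_coverage(n_reads, mean_len_bp, endogenous_fraction, genome_size_bp):
    """Approximate average depth: endogenous bases sequenced / genome size."""
    return n_reads * mean_len_bp * endogenous_fraction / genome_size_bp

def reads_needed(target_cov, mean_len_bp, endogenous_fraction, genome_size_bp):
    """Reads required to reach a target average depth under the same simplification."""
    return math.ceil(target_cov * genome_size_bp / (mean_len_bp * endogenous_fraction))

# Hypothetical library: 1 billion reads, 50-bp inserts, 1% endogenous, ~3-Gb genome.
print(fold_coverage(1_000_000_000, 50, 0.01, 3_000_000_000))
print(reads_needed(1, 50, 0.01, 3_000_000_000))
```

Under these assumed values a 1× genome would require several billion reads, which gives a sense of why short fragments and low endogenous content make deep shotgun sequencing expensive.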

are summarized in BOX 1 to illustrate the diversity of information that can be gathered, and recent literature describing other key evolutionary insights revealed by ancient genomics have been reviewed elsewhere35–38.

aDNA damage and tailored extractions
aDNA damage. aDNA damage accumulates over time and was originally characterized using enzymatic reactions to reveal the presence of particular types of DNA damage (such as abasic sites and crosslinks3) or gas chromatography experiments coupled with mass spectrometry39. Later approaches inferred damage types on the basis of mutational patterns in sequence data40–43. Specifically, an excess of C→T mutations, and their significant reduction following treatment with uracil DNA glycosylase41, revealed cytosine deamination to uracil (a thymine analogue) as the most prominent base modification. HTS data subsequently refined our understanding of such damage, demonstrating that deamination increases towards read termini44, consistent with expectations of faster rates in the overhanging single strands at the fragment termini16,44,45 (FIG. 2). HTS data also revealed that depurination drives post-mortem DNA fragmentation, as genomic positions preceding read starts (corresponding to breaks or abasic sites in aDNA molecules) often consist of purines44. This bias appears towards adenines for younger samples but guanines for older samples, possibly reflecting differences in fragmentation dynamics46 and/or base-specific resonance structures47. Statistical models exploiting nucleotide misincorporation patterns in HTS data sets revealed single-strand breaks in aDNA44,48, most likely at nicks or abasic sites. Finally, whereas indirect detection methods indicated that polymerase-blocking lesions such as interstrand crosslinks could be prominent in aDNA49, direct experimental assays based on HTS data suggested a more minor contribution50. Therefore, their general importance may be context dependent.

Second-generation sequencing
High-throughput short-read DNA sequencing platforms that require library construction and thus modification of the DNA before sequencing. Most commonly represented by the Illumina, GS-FLX (454), ABI SOLiD and Ion Torrent series.

Resonance structures
Dynamic, alternative forms of molecular groups, such as nucleotide bases, that result from electron delocalization within the molecule.

Targeting ultrashort fragments. Extensive aDNA fragmentation was documented early in the field's history, with later quantitative PCR assays revealing up to 100-fold decreases in the abundance of PCR templates for each doubling of target size51. As HTS generally allows most aDNA molecules to be sequenced over their full length, the resulting distribution represents a size-decay curve52 that enables direct quantitative comparisons of fragmentation across specimens through space, time and environmental conditions53. Although random DNA fragmentation should decrease molecule numbers exponentially as size increases, aDNA templates often peak at 40–80 bp before this decay is observed. The exact median length observed reflects the overall fragmentation levels experienced after death, which generally increase with the depositional temperature53,54. However, the deviation from the expected exponential decay curve for ultrashort sizes suggests that common extraction protocols do not recover, and thus do not optimally exploit, this fraction of molecules.

This challenge was met by introducing improved silica-based extraction protocols that modify volume and composition of the DNA-binding buffer31. These methodological improvements increased recovery rates of 35–50-bp molecules by twofold to fivefold, and greatly contributed towards the sequencing of even very


Box 1 | Human evolution insights: one of the principal achievements of ancient genomics
An area of great interest in the study of human evolution is clarifying the admixture history and the migration routes
followed by our ancestors to create contemporary patterns of genetic variation112. Study of the historical hair of an
Aboriginal Australian revealed the existence of a migration from Africa or the Middle East that reached Australia and that
took place 20,000–30,000 years earlier than the migration that gave rise to present-day Europeans and Asians14. The
36,200‑year-old bone remains from an Upper Paleolithic man from Kostenki, Russia, were also found to be genetically
closer to contemporary Europeans than to contemporary Asians, suggesting an earlier date for the separation between
these populations27. The 24,000‑year-old remains of a child from Mal’ta, south-central Siberia, Russia, showed strong
genetic affinities not only with Europeans but also with Native Americans, indicating a mixed population ancestry for the
first Americans24. The Solutrean theory, which assumed a European origin across the Atlantic for the Paleo-Indian Clovis
culture in North America, could be ruled out because the 12,600‑year-old cranial remains of the Anzick individual
belonging to this culture shows greater genetic affinities to Native Americans than to Europeans25.
The peopling of Europe and the effect of the agricultural revolution have also received great attention15,18,21,27,71,89,103,105,121.
The main genetic components present in modern Europeans seem to have already differentiated by 36,200 years ago27,
and their later dispersal involved several migration waves71,122. The expansion of the first Neolithic farmers resulted in
mixing hunter-gatherer Mesolithic and near-eastern population backgrounds within western Europe ~7,500 years
ago18,21,71,121. A later extensive migration took place ~4,500 years ago from the steppes and was associated with the spread
of Indo-European languages into Europe71. The possibility to gather genome-wide data at population scales from ancient
individuals now provides an opportunity for a fine reconstruction of population migration and admixture patterns from
classical antiquity to modern times.
In some cases, ancient genomes have revealed direct genetic continuity across different archaeological cultures,
questioning theories assuming that culture only changes through the migration of peoples and not simply through the
spread of ideas. The first example is provided by the Paleo-Eskimos from the New World Arctic, who represent distinct
cultural units but were found to represent a single population, first replaced by Inuit <1,000 years ago4,23.
Ancient genomic data have also enhanced our understanding of deeper human evolution, showing that modern
humans admixed with Neanderthals6,22 ~50,000–60,000 years ago27,123 while expanding out of Africa. Current data
suggest that more-recent admixture events might even have taken place122, but this needs further investigation. The
existence of the Denisovans, who may represent a distinct population of Neanderthals from the Altai mountains that
substantially contributed to the genomic diversity of present-day Melanesians, was revealed based on genomic
data14,16,105,123. The sequencing of the mitochondrial genome of ~400,000‑year-old hominins from Atapuerca, Spain, also
revealed genetic affinities between these individuals and the Denisovans, although genome-wide information is needed
to understand the underlying population history32,124.
Finally, when the age of a specimen can be determined using radiocarbon dating, it can be used as a calibration point
to help in the estimation of genome-wide mutation rates. When applied to the remains of a 45,000‑year-old human
from Siberia, this technique confirmed that the autosomal mutation rate was about half of that estimated from the
human–chimpanzee divergence122, which is consistent with recent estimates from human pedigrees125. Similarly, when
the age of a sample falls outside the range of radiocarbon dating, the deficit of mutations observed along the
phylogenetic branch leading to the specimen can be used to provide an estimate for when it lived16,22.
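
The last point in Box 1, dating a specimen from its deficit of mutations, can be sketched numerically. The following is a minimal illustration only, not the procedure used in the cited studies: it assumes that derived mutations have been counted over the same set of sites in a present-day genome and in the ancient genome, that the mutation rate per site per year is known, and that sequencing errors and post-mortem damage have already been accounted for. The function name and all numbers are hypothetical.

```python
def age_from_missing_mutations(derived_modern, derived_ancient,
                               n_sites, mu_per_site_per_year):
    """Estimate specimen age (years) from the mutations it 'lacks'.

    The ancient lineage stopped accumulating mutations when the individual
    died, so its branch is shorter than a present-day branch by roughly
    age * mu * n_sites mutations.
    """
    missing = derived_modern - derived_ancient
    if missing < 0:
        raise ValueError("ancient branch is longer than the modern branch")
    return missing / (n_sites * mu_per_site_per_year)

# Hypothetical values: 22,500 fewer derived mutations over 1e9 sites,
# with an assumed rate of 0.5e-9 mutations per site per year.
print(age_from_missing_mutations(1_000_000, 977_500, 1_000_000_000, 0.5e-9))
```

The estimate scales inversely with the assumed mutation rate, so uncertainty in that rate carries over directly to the inferred age.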

old (for example, ~400,000-year-old) specimens31,32. Furthermore, light pre-digestion of bone or tooth powder before full extraction on the remaining undigested matter significantly increases the relative proportion of endogenous DNA recovered45,55–58, probably by washing away microbial contaminants57 or fully liberating DNA from the matrix59. Finally, the specific tissue sampled (for example, petrosal bone versus other bones18,25 and cementum versus dentine58,60) and sampling procedures (for example, drilling at low versus high speed60) affect the quality of extracted aDNA.

Pre-digestion
Exposure of ancient calcified materials to a short initial digestion aimed at removing substantial fractions of exogenous contaminants.

454
The initial generation of GS-FLX sequencing platforms based on pyrosequencing, before their acquisition and renaming by Roche.

DNA library construction and amplification
General recommendations. Second-generation sequencing requires template molecule modification through adaptor ligation30. Both library construction and subsequent PCR amplification represent sources of error61,62. The parts of a genome sequenced can be affected by adaptor binding biases and/or the relative efficacy of PCR enzymes to amplify the constructs. Which and where nucleotide misincorporations occur during these amplifications also confer errors in resulting sequences16,44,61. For example, the Phusion polymerase, which was originally part of the Illumina library building procedure, preferentially amplifies short and relatively GC-rich templates62. The same is true for related polymerases, such as Phusion Hot Start I and II, even when high-fidelity buffers are used. This bias is reduced, or even disappears, when other polymerases are used, and Accuprime Pfx, Herculase II Fusion and Pfu Turbo Cx Hotstart currently seem to be better alternatives than the most commonly used polymerases, AmpliTaq Gold and Platinum Taq High-Fidelity62. Increasing PCR cycle number often reduces the molecular complexity of DNA libraries63; thus, polymerases should be carefully selected, PCR amplification cycles minimized and/or independent PCR reactions undertaken in parallel to limit such biases. This has important consequences for authenticating aDNA data and quantifying post-mortem DNA damage, as expected misincorporation models require tailoring to the exact experimental procedure followed45,64.

Double-stranded DNA libraries. Different DNA library construction methods also show clear differences in efficiency. Early aDNA libraries were based around 454-compatible blunt-end approaches42–44,52 (FIG. 3a).


However, as adaptor ligation is random, a fraction of the constructs do not contain both of the different adaptors and thus cannot be sequenced using this method. Another possible limitation is adaptor dimer formation during ligation; if amplified and sequenced, these waste sequencing capacity. Illumina introduced T/A ligation to overcome this in their original library construction procedure, in which aDNA fragments have an overhanging adenine added (known as A-tailing) to facilitate ligation to T-tailed adaptors (FIG. 3b). However, this strategy seems to be suboptimal for aDNA, mostly because templates starting with thymines are less efficiently processed during ligation61. Thus the (often substantial) fraction of templates containing deaminated cytosine residues (thymine analogues) at their termini44 fails to incorporate into libraries61. TruSeq libraries, which also rely on T/A ligation, have also been shown to introduce significant amounts of palindromic artefacts, whereby short sequence segments at read starts are copied towards read ends65.

Single-stranded DNA libraries. A subsequent development was library construction directly on single-stranded DNA (ssDNA) templates16,66. In this method, DNA is denatured using heat into single strands and then ligated to a first adaptor, before extension with Bst polymerase generates the complementary strand. A second adaptor is ligated at the 3ʹ end of the complementary strand, and the full construct is then amplified by PCR (FIG. 3c). Inclusion of biotin in the first adaptor allows minimal DNA loss during purification using streptavidin-coated paramagnetic beads. The development of this method enabled characterization of the Denisovan genome at ~30× coverage using DNA extracts generated from 40 mg of bone material16. Although the method is sometimes beneficial on highly degraded osseous materials31,32 (as both strands and every single-strand break of endogenous DNA molecules have 3ʹ termini that are compatible with their incorporation into libraries), its benefit on less-degraded and non-osseous materials remains unverified.

T/A ligation
A common DNA ligation technology that relies on complementary pairing of thymine and adenine overhangs at the 3ʹ ends of the adaptors and inserts to be ligated, respectively.

Shotgun sequencing
The sequencing of fragmented DNA in the absence of any selection strategy.

Enriching for aDNA
aDNA extracts are metagenomic mixtures. The endogenous DNA within most ancient specimens is usually embedded within high levels of environmental microbial DNA. Although there are notable exceptions (including some keratinized materials4,67, particularly dense bones such as the petrosal bone18,25, and intentionally preserved materials from museums or herbaria8), it is unusual for the endogenous DNA content in most calcified remains to account for more than a few percent of the total DNA content. DNA preservation and environmental microbial contamination levels can show extreme variation within a single bone. For example, extracts and libraries constructed from a single 36,000-year-old European human bone yielded 0.1–8.0% of human DNA27, and even greater variation (0.5–27.8%) was seen using the early Native American 'Anzick' cranial bone25.

High microbial contaminant DNA levels render shotgun sequencing of genomes uneconomical. Thus, several methods have been developed that improve accessibility to endogenous aDNA. These enrichment strategies are used either during library construction, by preferentially incorporating damaged aDNA fragments68, or after library construction, by separating

[Figure 2: schematic of fragmented aDNA molecules. Labels: post-mortem DNA decay; single-strand break; overhang; post-mortem base modification; abasic site.]

Figure 2 | Typical ancient DNA molecules. A diverse range of degradation reactions affect DNA post-mortem and result in extensive fragmentation (preferentially at purine nucleotides) and base modifications. The most common base modification identified in high-throughput sequencing data sets is deamination of cytosines into uracils (red), or thymines (blue) when cytosines were methylated (mC). Such deaminations occur much faster at overhanging ends. Other modifications include abasic sites (green) and single-strand breaks (vertical lines). The chemical structures of three damage by-products (uracils, thymines and abasic sites) are shown. R, purine; Y, pyrimidine.
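
The deamination signature summarized in Figure 2 is commonly visualized as the frequency of C→T mismatches at each position from the 5ʹ end of the reads. The sketch below is a minimal illustration under simplifying assumptions: each read is supplied together with the reference segment it aligns to, as two gap-free strings of equal length in read orientation. The function name and the toy sequences are hypothetical; real analyses work from alignment files and must handle indels, strand and base quality.

```python
from collections import Counter

def c_to_t_profile(aligned_pairs, max_pos=10):
    """Per-position C->T mismatch frequency, counted from the read 5' end.

    aligned_pairs: iterable of (reference, read) strings of equal length.
    Returns a list of frequencies; None where no reference C was observed.
    """
    c_total = Counter()   # reference cytosines seen at each position
    c_to_t = Counter()    # of those, how many are read as thymine
    for ref, read in aligned_pairs:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if r == "C":
                c_total[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / c_total[p] if c_total[p] else None
            for p in range(max_pos)]

# Toy data only: (reference segment, read)
pairs = [("CACGT", "TACGT"), ("CCAGT", "TCAGT"),
         ("CACCA", "CACCA"), ("GCCAT", "GCTAT")]
print(c_to_t_profile(pairs, max_pos=5))
```

In authentic aDNA sequenced without uracil removal, such a profile would be expected to be highest at the first positions and to decline towards the read interior, in line with faster deamination in overhangs.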


endogenous and exogenous fractions through annealing to pre-defined sets of probes (in solution69–72 or on microarrays7,73). Intended capture targets range from whole mitochondrial genomes (~16 kb31,32,69,72,74,75) or ancient commensal and pathogenic bacterial genomes (~4 Mb7,10–13) to large sets of single-nucleotide polymorphisms (SNPs) (~400,000 SNPs71), whole exomes (~30 Mb73,76), chromosomes (~30 Mb77) and even whole nuclear genomes (~3 Gb29,78–80). Other approaches that have been demonstrated, although not used in the most recent relevant studies, include targeted digestion of environmental microbial DNA using restriction enzymes6 and primer extension capture (PEC)69. Before discussing enrichment strategies further, we highlight that currently none is able to recover 100% of the target molecules, and thus they come at a cost of reduced

[Figure 3: schematic with three panels: a, dsDNA library; b, A-tailed library; c, ssDNA library.]

Figure 3 | Constructing ancient DNA libraries. The three most common types of ancient DNA (aDNA) libraries are shown. 5ʹ-phosphate groups are indicated with black circles, single-strand DNA breaks are shown as vertical lines, biotinylated adaptor groups are shown in red, and streptavidin-coated beads are shown in grey. a | To construct a double-stranded DNA (dsDNA) library, aDNA is first end-repaired. It is then ligated to double-stranded adaptors (blue), and the resultant nicks are filled in to construct library templates devoid of single-strand breaks. b | To construct an A-tailed DNA library, aDNA is end-repaired and then A-tailed (that is, an adenine is added to the 3ʹ ends of the strands) to facilitate subsequent ligation to T-tailed adaptors while disfavouring ligation between adaptor pairs. The adaptors are typically Y-shaped (that is, they are complementary at the T-tailed end but have non-complementary arms at the other end). The use of such adaptors results in aDNA strands being flanked by distinct non-complementary adaptor sequences at each end to enable subsequent unidirectional sequencing through the aDNA fragment. Nicks resulting from ligation are filled-in through PCR post-ligation. c | To construct a single-stranded DNA (ssDNA) library, aDNA is first denatured into single strands using heat and then ligated to biotinylated single-stranded adaptors. The original DNA strand is then copied using DNA polymerase extension, and a second adaptor is ligated to enable further PCR amplification and sequencing. Purification steps are performed using streptavidin-coated paramagnetic beads. Part c adapted with permission from REF. 16, American Association for the Advancement of Science.


library complexity. Therefore, the upper threshold on the maximum sequencing depth attainable from a given library is reduced, and users must consider the end goal of their analyses before determining whether capture is a sensible strategy over direct shotgun sequencing. If the goal is to sequence to high coverage, highly complex libraries showing relatively high endogenous content can be shotgun-sequenced4,6,18,22,26, but enrichment of multiple libraries is advisable in other cases71,72.

Primer extension capture
(PEC). An enrichment technology based on the ligation of short 5ʹ-biotinylated oligonucleotides (including a 12-nucleotide-long spacer followed by a primer of 18–25 nucleotides that is designed to match a particular region of interest) to single-stranded target molecules. This is followed by a single round of polymerase-based extension so as to increase the length over which the molecules are hybridized.

Tiled probes
Probes that overlap in their positioning on the target so as to ensure that every target position is covered by more than one different probe.

Chimeric DNA libraries
Recombination between libraries containing different template molecules during library PCR amplification, resulting in hybrid (chimeric) sequences that do not represent true biological DNA sequences.

Double-indexed DNA libraries
DNA libraries in which short (for example, 8 bp long) unique nucleotide indexes are incorporated within both adaptors used during library construction. Indexes are bordered by known sequences that serve to prime index sequencing reactions and also enable library attachment to the surface of the sequencing flow cell.

Damaged template enrichment. One approach selectively targets damaged DNA molecules68 during ssDNA library preparation16,66. After the DNA strand complementary to the original template is generated, constructs are 5ʹ-phosphorylated, which enables ligation to a non-phosphorylated adaptor (FIG. 4a). Following extension with Bst polymerase to fill the nick located 5ʹ of this adaptor, treatment with uracil DNA glycosylase and endonuclease VIII (USER mix) is implemented to first replace deaminated cytosines with abasic sites and then to cleave out these abasic sites81. The new 3ʹ end is then dephosphorylated and used for priming a new extension. Thus, all library strands that originally harboured deaminated cytosines are reconstructed over their full length and are available in the reaction supernatant for further amplification and sequencing. The undamaged DNA template fraction remains attached to streptavidin-coated paramagnetic beads and can be retained for other uses. This method has shown great specificity when applied to samples from Late Pleistocene Neanderthals showing extreme levels of deamination68. Importantly, in all extracts tested, the relative contamination from modern human DNA decreased by ~1.6-fold following selective enrichment, suggesting that undamaged templates resulting from recent manipulations of the specimen could readily be filtered. Furthermore, the endogenous content of one sample increased by 3.7–5-fold, which markedly reduced the genome sequencing cost. Future experiments will no doubt explore the wider potential of this method. For now, users should bear in mind that any endogenous undamaged molecules will not be retained and will thus be lost, making the method only appropriate for the most damaged samples. Additionally, any DNA carrying damage will be enriched, potentially providing access to the genomes of associated ancient microorganisms (although these can show reduced DNA damage levels compared to their human hosts9).

Extension-free target enrichment in solution. Target enrichment approaches based on target–probe hybridization are currently widely used. These require heat denaturation of DNA libraries to enable annealing of library inserts to overlapping tiled probes along target regions. Probes can be economically generated using long-range PCR, if fresh DNA material from closely related species can be extracted70, through PCR amplicon shearing and then ligation to a biotinylated adaptor. This probe library can be amplified (with biotinylated primers) and used in an unlimited number of enrichment reactions. Following annealing at stringencies that can be adapted depending on the phylogenetic distance between targets and probes, streptavidin-coated beads are washed to eliminate library constructs with inserts showing no genetic proximity to the targeted regions, and the final fraction is amplified and sequenced.

This strategy has predominantly been used for sequencing mitochondrial genomes72,74,75,82–85, bacterial plasmids86 and short nuclear loci84. Hybridization is even successful when probes diverge from targets by 10–13%82, which is useful if no close living relative and/or reference genome is available. This can also be exploited to detect probe carry-over post-sequencing if the DNA from a distantly related organism was used for preparing probes (for example, if DNA from a European bison was used when enriching for aDNA from aurochs83). Alternatively, potential probe carry-over can be eliminated before sequencing using dedicated molecular tools. For example, replacement of deoxythymidine triphosphate (dTTP) by deoxyuridine triphosphate (dUTP) in probes enables subsequent digestion with uracil DNA glycosylase before amplification and sequencing83.

Biotinylated probes can also be custom designed and synthesized, which enables specific probe tiling and in silico assessment for secondary structures, homogeneous GC content and annealing temperatures. Different manufacturers can now deliver such probes, with related procedures apparently achieving similar efficiency80. Depending on the overall size of the genomic regions targeted, multiple libraries can, in theory, be enriched as pools to achieve faster hands-on times. However, owing to the probable formation of chimeric DNA libraries during post-capture PCRs, pooling of libraries before capture should ideally be avoided, or if pooling is used then the constituent libraries should at least be double-indexed DNA libraries87 to enable chimaera identification and elimination from subsequent analyses. Increasing probe tiling densities (11 bp versus 24 bp) did not consistently improve enrichment for ~670 nuclear loci in archaeological maize, suggesting that even relatively reduced probe densities can be used to efficiently recover the full molecular complexity of DNA libraries88.

In general, custom-synthesized biotinylated probes are most economical when targeting fairly small regions (hundreds of kilobases to a few megabases) owing to probe synthesis costs. However, microarrays can achieve extremely high probe numbers (approximately 1 million each) and, if manufacturers consent, can be chemically treated to cleave the probes from the microarray surface, thus recovering large sets of probes at relatively reasonable costs71,76,77. Synthetic DNA probes are built into biotinylated probe libraries using biotinylated adaptors of minimal size (~20 bp) to limit interference during probe–target annealing. The known adaptor sequence allows further amplification, thereby immortalizing the probe set at low cost. In this way, Fu et al.77 used 8.7 million probes to recover most of the non-repetitive fraction of chromosome 21 from a 40,000-year-old human specimen from Tianyuan cave, China. In addition, they targeted ~3,500 200-bp-long regions around positions

400 | JULY 2015 | VOLUME 16 www.nature.com/reviews/genetics

© 2015 Macmillan Publishers Limited. All rights reserved



Figure 4 | Enriching DNA libraries for ancient inserts. a | Selective uracil enrichment is shown. 5ʹ-phosphate groups are indicated with black circles, single-strand DNA breaks are shown as vertical lines, biotinylated adaptor groups are shown in red, and streptavidin-coated beads are shown in grey. A single-stranded DNA (ssDNA) library is built until the polymerase extension step. DNA is then phosphorylated to enable the ligation of the second adaptor. This contrasts with the ssDNA library procedure, in which the ligation occurs between the 5ʹ end of the second adaptor and the 3ʹ end of the newly synthesized strand (FIG. 3c). DNA is then treated with uracil DNA glycosylase and endonuclease VIII (USER mix) to generate and then cleave out abasic sites at cytosines that were deaminated into uracils post-mortem. The 3ʹ-phosphate groups at these new termini are then removed (not shown). The resulting 3ʹ‑OH ends now serve to prime an extension with a DNA polymerase, which copies throughout the whole length of the strand complementary to where the damage was. As a result, the supernatant now contains double-stranded DNA (dsDNA) library templates corresponding to the original deaminated strands. Other library templates remain unaffected and can be separated, as they remain bound to streptavidin-coated paramagnetic beads. b | In whole-genome in-solution capture (WISC), ssDNA templates from an ancient DNA (aDNA) library are prepared. The target, endogenous aDNA is shown as thin black lines, whereas the exogenous contaminating DNA is shown as thin green lines; adaptors are shown as thick blue lines. In parallel, a probe DNA library is prepared from fresh modern DNA extracts (thin red lines) and used to generate biotinylated RNA probes through in vitro transcription. T7 adaptors to enable in vitro transcription are shown in thick purple lines. The aDNA library is annealed to the RNA probes, low-complexity DNA and adaptor blockers (the latter two are not shown for simplicity). The library fraction of interest is then recovered following elution from streptavidin-coated paramagnetic beads. Part a adapted with permission from REF. 68, Cold Spring Harbor Laboratory Press. Part b adapted with permission from REF. 78, The American Society of Human Genetics.
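The separation step of selective uracil enrichment (Figure 4a) can be illustrated with a toy model. The sketch below is only a simplified illustration, not the published protocol: it assumes that each library template is represented as a string in which "U" marks a cytosine deaminated post-mortem, and that any template carrying at least one uracil is released into the supernatant while all others stay bead-bound. The names partition_library and endogenous_fraction, the source labels and the example sequences are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical record: (template sequence, source label).
Template = Tuple[str, str]


def partition_library(templates: List[Template]) -> Tuple[List[Template], List[Template]]:
    """Split templates into (released, retained).

    Toy assumption: a template carrying at least one uracil ('U') is cleaved
    by the USER mix, extended from the new 3'-OH end and released into the
    supernatant; templates without uracil stay bound to the beads.
    """
    released, retained = [], []
    for seq, source in templates:
        if "U" in seq:
            released.append((seq, source))
        else:
            retained.append((seq, source))
    return released, retained


def endogenous_fraction(templates: List[Template]) -> float:
    """Fraction of templates labelled 'endogenous' (0.0 for an empty list)."""
    if not templates:
        return 0.0
    n_endogenous = sum(1 for _, source in templates if source == "endogenous")
    return n_endogenous / len(templates)


# Made-up example library: two damaged endogenous templates, one undamaged
# endogenous template and one undamaged contaminant.
library = [
    ("ACGUTTGA", "endogenous"),
    ("GGCATTCA", "endogenous"),
    ("TTGACCGA", "contaminant"),
    ("CAUGGUAC", "endogenous"),
]
released, retained = partition_library(library)
```

In this example the undamaged endogenous template is retained on the beads together with the contaminant, which mirrors the caveat that undamaged endogenous molecules are lost by this approach.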


known to carry allelic variants in archaic and modern humans, thereby enabling direct estimates of archaic hominin ancestry within the Tianyuan specimen. The method was also used to obtain the exome sequence of two Neanderthals from Spain and Croatia76 and, more recently, sequence data from ~400,000 loci within a single reaction71. This target enrichment procedure reduced the genotyping costs by at least 45‑fold per ancient specimen71 and enabled genome-wide analyses of ancient individuals at population scales. In this analysis, two 52‑nucleotide-long probes were selected to be located on each side of a polymorphic site, and two were centred on the polymorphic site, each representing one of the two possible alleles.

Solid-phase target enrichment. Direct application of microarrays can also enrich large sets of targets, using approaches originally described for modern DNA89. First used in the aDNA context to characterize exome sequences from a 49,000‑year-old Neanderthal specimen73, microarrays have subsequently enabled whole-genome sequencing from bacterial strains responsible for major historical epidemiological outbreaks7,9–13, including the Black Death7. Microarrays also provide interesting alternatives to real-time PCR and shotgun sequencing for parallel screening of >100 pathogens12,90. This is particularly appropriate for identifying ancient pathogens, which often leave no physical skeletal evidence and are generally found only as trace material. Possible drawbacks are poor detection of the most divergent genomic regions and omission of regions with important genomic rearrangements (such as insertions) or unknown additional plasmids that do not segregate in modern strains.

Whole-genome enrichment. There is a growing interest in characterizing the entire genome sequence of ancient individuals at population scales. However, none of the methods presented above is appropriate for pulling down whole human genomes, as this requires synthesizing gigabases of probes. Whole-genome in‑solution capture (WISC)78 and a commercial alternative with similar performance79,80 fill this niche, enabling economical whole-genome enrichment. WISC starts with the preparation of a genome-wide RNA probe library from a species with a genome that is closely related to the target genome in the aDNA sample (FIG. 4b). These RNA probes are generated from a genomic DNA library flanked by adaptors containing T7 promoters that enable a relatively inexpensive reaction, in vitro transcription. This in vitro transcription step is carried out in the presence of biotin 16–UTP, so that the resultant RNA probes are biotinylated. The biotinylated RNA probes are annealed to the ssDNA of a heat-denatured aDNA library, while low-complexity DNA and adaptor-blocking RNA oligonucleotides improve stringency and reduce enrichment for highly repetitive regions. Non-hybridized DNA is washed away, whereas the bound, enriched library fraction is finally released following RNase treatment (which precludes probe carry-over) and amplified before sequencing.

WISC-like approaches consistently improve the proportion of sequences that can be mapped to the human reference genome compared to shotgun sequencing (6–159‑fold), at least when based on double-stranded DNA libraries29,78–80. As hybridization efficiency increases with target length79, its efficacy may be reduced when analysing libraries built using single-strand methods16,66, which routinely exhibit smaller mean target molecule sizes. The fraction of reads that align to repetitive regions also generally increases with WISC, despite the use of an excess of low-complexity DNA. Unsurprisingly, WISC-enriched libraries show reduced complexity, so that almost every unique insert can be sequenced with minimal sequencing efforts78. As an example, 5–10 million sequencing reads generated using WISC-enriched libraries of a Bronze Age Danish human hair sample and a pre-Columbian Peruvian human bone were found to cover 7,000–21,000 ancestry-informative markers, which proved to be sufficient for inferring the continental groups that are the closest to these ancient individuals78.

Mate pairs
Pairs of sequences derived from both ends of a DNA library.

Edit distance
The number of sequence mismatch counts between reads and targets.

Analysing aDNA
From reads to genome alignments. Most available paleogenomes were generated using Illumina technologies, although there are exceptions5,15,17. Analysis of the underlying sequence data mainly relies on computational approaches developed for handling HTS data from modern DNA material, with some additional particularities. Most procedures are implemented within the open-source PALEOMIX package91, in which reads are trimmed of adaptor sequences using AdapterRemoval92 and collapsed when mate pairs are available and overlap significantly, filtered for a minimal size of 25–30 bp and aligned against reference genomes of interest using Burrows–Wheeler Aligner (BWA)93 or Bowtie 2 (REF. 94). Alignments showing low-quality scores and PCR duplicates are further removed using the MarkDuplicates program from Picard tools, and reads are locally realigned around small insertions and deletions (indels) to improve overall genome quality using the IndelRealigner tool from the Genome Analysis Toolkit (GATK)95. PALEOMIX can also quantify DNA damage levels using mapDamage2 (REF. 48) and perform phylogenomic and metagenomic analyses using modules mostly based on inferences deriving from ExaML (Exascale Maximum Likelihood)96 and MetaPhlAn (Metagenomic Phylogenetic Analysis)97, respectively.

Unlike sequences in other re‑sequencing genome projects, in which mismatches relative to the reference genome generally are derived from sequencing errors and polymorphisms, aDNA sequences exhibit substantial fractions of nucleotide misincorporations that result from sequencing damaged bases. As these misincorporations cluster towards read termini, seeding approaches, whereby only the most upstream part of the sequence is used for speeding up identification of possible alignments along the genome, should be avoided98. Parameters controlling acceptance thresholds for read‑to‑reference edit distance should be adapted to the phylogenetic distance to the reference genome, as overly conservative procedures will under-represent the


most polymorphic regions and under-estimate heterozygosity levels. Conversely, overly permissive procedures will inflate the alignment false-positive rate, resulting in regions with many reads from different organisms, which is a particular challenge for aDNA data, given its complex mixture of endogenous and exogenous reads52,57.

Owing to the accumulation of nucleotide misincorporation towards read ends, probabilistic aligners based on position-scoring matrices have been developed to embed aDNA features from the aligning step. Available aligners include Mapping Iterative Assembler (MIA)69, ANFO Short Read Aligner/Mapper6 and BWA-PSSM (position-specific scoring matrix)99, and these generally show good performance for short reads and/or low-quality data, although some show running times that are compatible only with alignments against relatively small reference genomes (for example, mitochondrial genomes). Importantly, such probabilistic approaches handle platform-specific error profiles in a sound statistical framework.

Authenticating aDNA data. Following read alignment, analyses often focus on authenticating whether sequencing data are ancient. Software such as mapDamage45,100 or pmdtools101 can test the presence of typical nucleotide misincorporation patterns that result from inflated cytosine deamination rates at overhangs. Such patterns can be first obtained by preparing libraries on an aliquot of the DNA extract, while saving the remaining fraction for preparing almost damage-free libraries following USER treatment81. This will limit nucleotide misincorporation effects on downstream analyses. Alternatively, mild USER treatment, which removes most, but not all, of the damage signature, has been proposed to enable sequence authentication and population analyses using the same sample aliquot72.

Nucleotide misincorporation patterns can be exploited to fit statistical models of post-mortem DNA damage and estimate cytosine deamination rates and nick frequencies44,48. Even though deamination rates at overhangs were reported to increase linearly with time across a wide range of archaeological sites and preservation conditions46, this pattern has not been confirmed within archaeological sites in permafrost17 or temperate environments72. Additionally, different remains from the same specimen and/or extracts from the same remain can show variable levels of DNA damage27,32. This suggests complex relationships in which both global conditions, as reflected in the thermal age of a given specimen, and microenvironmental factors (within and between remains) drive the amount of DNA damage ultimately measured. In our opinion, these complex relationships, and the dependency of damage quantification on the library preparation and amplification procedures, preclude the use of strict minimal thresholds of expected DNA damage levels as authentication criteria. Thus, quantitative comparison with the levels observed for samples excavated at the same or similar archaeological sites, and processed with the same experimental tools, is recommended.

Statistical damage models also allow correction of base quality scores depending on their probability of being the result of nucleotide misincorporations at damage sites48, thus limiting their possible effect on downstream analyses. However, we emphasize that for low-coverage data (in which mismatches are observed on a few reads at best and penalized when close to read termini) this procedure can potentially inflate the genetic proximity to the reference genome. SNP calling can also benefit from genotype callers, such as SNPest4,102, that explicitly model post-mortem DNA damage as a possible source of error. Furthermore, nucleotide misincorporation patterns can be used by computational tools to sort the fraction of reads that show evidence of post-mortem damage101, which is useful when there is substantial modern DNA contamination. Although extremely conservative and not cost-effective (as not all aDNA molecules carry post-mortem DNA damage and many true aDNA reads will be discarded), damage-based filtering approaches have shown great success in characterizing whole-mitochondrial sequences from extensively contaminated Neanderthal specimens101 and an ~400,000‑year-old hominin32. Finally, comparing analytical outcomes when considering the full population of reads or only the most damaged fraction (and disregarding mutations, such as transitions, that derive from post-mortem damage40–44) can provide evidence that the results are not driven by damage and contamination artefacts103.

In addition to revealing nucleotide misincorporation patterns, mapDamage also delivers the base composition of the genomic regions directly flanking DNA inserts and therefore tests depurination as the main driver for DNA fragmentation44,48,100, which can also help authentication. This pattern is substantially affected following USER treatment, which mainly cleaves DNA downstream of unmethylated cytosine residues, therefore resulting in an excess of cytosines at genomic positions just preceding read starts16,72.

Probabilistic aligners
Mapping algorithms that can accommodate non-uniform distributions of sequencing errors along reads, generally leading to improved alignments between reads and reference genomes.

Thermal age
The predicted time that it would have taken an archaeological sample to produce the observed degree of DNA degradation were the sample exposed to a constant temperature of 10 °C since deposition. Thermal age has been proposed to adjust the chronological age of a sample to its thermal history and to help in predicting the likelihood of DNA surviving in archaeological remains.

Haplotypes
The DNA sequences of haploid chromosomes.

Derived alleles
Alleles that are evolutionarily derived in a lineage of interest and that are not represented in an ancestral population or species.

Ancestral alleles
Alleles in the ancestral state before a mutation took place in a descending population, species or lineage.

Nearly fixed
Fixed alleles are those that are derived and present in all individuals in a descendent population or species. Nearly fixed alleles therefore represent those that are present in nearly all individuals (thus close to fixation, for example, showing allelic frequencies of 99% in the population).

Estimating contamination levels. Nucleotide misincorporation and base compositional patterns can be detected in even substantially contaminated samples. This can happen when treating the outer sample surface with bleach before DNA extraction, which can help to remove a fraction of fresh DNA contaminants but also introduces signatures of DNA damage within the remaining contaminants104. This can also happen when a mixture of highly degraded aDNA templates and undamaged DNA contaminants is incorporated into libraries. A suite of tools has thus been developed for further authenticating aDNA data (in particular for human aDNA). The current methods available exploit the sequence information at sites and/or haplotypes with known variation across species and/or populations. For example, modern human contamination in Neanderthal HTS data has been estimated using the relative proportion of derived alleles and ancestral alleles observed at mitochondrial sites showing nearly fixed derived alleles in modern humans6. A similar rationale was used to estimate the possible contribution of different human population backgrounds105


or species83 to final mitochondrial consensus sequences. A statistically more powerful contamination estimator for mitochondrial reads that uses linkage information at the read level has been developed74.

As the cellular mitochondrial number is variable across cell types and tissues, contamination estimates based on mitochondrial sequence data do not directly reflect the true contamination levels of the nuclear genome106. Heterozygosity levels observed on male X chromosomes can be used as a nuclear contamination proxy. As males are haploid for most X chromosome loci, base discordance between overlapping reads should result only from sequencing errors and should be distributed randomly along the chromosome. However, if modern human DNA contamination is present, discordance rates should inflate at sites that are polymorphic within contemporary populations14. For archaic hominin specimens, nuclear contamination rates can be calculated from fixed alleles that are derived in modern humans6. For female ancient human samples, the presence of sequences that are known to be unique to the Y chromosome can also reflect the presence of contamination from male-derived sources106. Triallelic sites at autosomes could potentially be used in the future to estimate levels of nuclear contamination with modern human DNA, irrespective of the sample gender.

Genome completion and error rates. Reliable contamination estimates can generally be recovered from the data aligning to the X chromosome using even low-depth information, as long as each single genomic position is covered once on average (that is, ~1× coverage). Ultimately, the exact fraction of the genome that is covered depends on the sequencing effort and the sequence length. For aDNA sequence reads of 60 nucleotides, ~87% of the human genome is non-repetitive, and therefore reads of similar size (or shorter) cannot be uniquely aligned to the remaining ~13% of the genome4. For example, the genome of a Paleo-Eskimo Greenlander of the Saqqaq culture was sequenced to ~16× coverage, with ~20% of the genome missing. This achieved ~20× coverage at positions covered at least once, although some variation was observed along the chromosomes, as half of the positions showed a depth of coverage of ≤7×. Using DNA polymerases that reduce size and base compositional biases during library amplification62 can help to limit such variation, although nucleosome protection can also lead to specific patterns of depth‑of‑coverage variation along the genome (see below).

Overall sequence accuracy of ancient genomes is another parameter that is worth considering, as sequencing errors will have an impact on downstream analyses. Genome-wide error rates are generally estimated using three‑way alignments that include the genome of a closely related outgroup (for example, the chimpanzee) and a high-quality genome from a living conspecific individual107. The excess of derived alleles observed in the genome of the ancient individual provides an estimate for its error rate relative to the high-quality modern genome. Unsurprisingly, this rate is highly dependent on DNA damage levels and the molecular tools used before and during sequencing. The best alternative developed so far involves USER treatment followed by paired-end sequencing81, which can generally reduce error rates to <0.1–0.2% per base16,27, although with the caveat that the error rate seems to be only marginally reduced in CpG dinucleotide contexts owing to the high degree of methylation at such sites16,33.

When authenticated on the basis of low contamination levels and evidence of post-mortem DNA damage, the genome-wide data gathered can be used for various analyses, including phylogenomic reconstruction9–11,16,17,26,91 and inference of population history (BOX 2). In addition, sequence data can be exploited to reconstruct genome-wide epigenetic maps (see below), thus paving the way for evaluating the extent to which epigenetic reprogramming has influenced recent human evolution38 (FIG. 5).

Epialleles
Allelic variants showing identical genetic sequences but different epigenetic marks, such as different methylation patterns.

Reconstructing ancient epigenomes
Genome-wide epigenetic maps. Cytosine methylation, which predominantly occurs in vertebrates at CpG dinucleotides, is the most commonly studied epigenetic modification. Classically, methylomes (that is, genome-wide maps of methylated cytosines) are reconstructed using bisulfite sequencing that converts unmethylated cytosines into uracils, which are sequenced as thymines, leading to CpG→TpG mutations. Methylated cytosines are not converted and are sequenced as regular CpG sites. Although it was successfully used to reveal fine-scale methylation patterns at four nuclear loci on the DNA extracted from a Late Pleistocene bison bone108, this approach generally requires large amounts of DNA, as it inflicts extensive damage to DNA and is therefore generally inappropriate for aDNA. However, similar CpG→UpG conversions naturally occur post-mortem. In contrast to bisulfite treatment, methylated epialleles (mCpGs) are deaminated into TpGs, whereas unmethylated epialleles (CpGs) are deaminated into UpGs. As cytosine deamination is elevated in methylated contexts109, a substantial fraction of mCpGs is expected to be converted into TpGs. Molecular tools that prevent UpG sequencing can therefore reveal methylated epialleles by tracking CpG→TpG mutations33,34.

Briggs et al.81 first exploited this feature of aDNA, using HTS data generated from library inserts in which uracil residues were removed following USER treatment. Gokhman et al.33 subsequently generated high-resolution aDNA methylation maps using high-quality Neanderthal and Denisovan genomes, identifying ~2,000 regions that show differential methylation patterns in the bones of modern versus archaic hominins. Homeobox D10 (HOXD10), which encodes a key regulator of limb development, is found in one such region, and its epigenetic reprogramming is proposed to have participated in the shaping of specific morpho-anatomical features along the human lineage33. Pedersen et al.34 used sequence data from the high-coverage Paleo-Eskimo Saqqaq genome4 to also extract genome-wide methylation information, as DNA libraries were amplified using a polymerase that does not bypass uracils. Furthermore, CpG→TpG substitution rates were


found to correlate with known methylation levels at promoters, exons, introns and CpG islands. This was not observed for other CpN dinucleotides, confirming that methylation drives the signal. The authors also used CpG→TpG substitutions at read starts (where deamination rates are maximal) to infer ancient methylation levels for genomic regions overlapping the loci from the Illumina Infinium HumanMethylation450 BeadChip array. The inferred methylation profile from the Saqqaq sample was found to cluster with hair follicle methylation profiles, which is in agreement with the tissue originally used for DNA extraction.

Genome-wide nucleosome maps. Library inserts derived from endogenous aDNA generally show unimodal size distributions that are typically centred around 40–80 bp. However, several aDNA sequence data sets exhibit striking 10‑bp periodicity in their size distribution4,21,25,26,34,79. Pedersen et al.34 proposed that this results from nucleosome protection, with DNA fragmentation preferentially occurring at nucleotides facing away from nucleosomes. Assuming that nucleosomes are strongly positioned and phased along DNA scaffolds, and recalling that the turn of the DNA helix is 10 bp long, only 1 nucleotide per 10 bp would be fully exposed to hydrolysis. If this is true, then nucleosome protection should also drive additional patterns. For example, DNA fragmentation should occur preferentially within spacers, which are nucleosome-free regions of ~50 bp separating successive ~150-bp DNA blocks covered by nucleosomes. Fewer endogenous reads should therefore map to spacer regions, leading to depth‑of‑coverage periodicities of ~200 bp, with peaks of coverage corresponding to nucleosome centres and correlating with both in silico predicted and experimentally derived nucleosome maps. These predicted periodicities were confirmed in the Saqqaq sample data, even following correction for base compositional effects, which can substantially affect depth‑of‑coverage variation during library amplification62. This finding, together with expected patterns of methylation and depth of coverage within CTCF regions and splicing sites, confirmed the nucleosome protection hypothesis.

Nucleosomes might protect DNA from cleavage that occurs during cellular apoptosis and/or post-mortem34. As similar periodicity patterns have been found not only in ancient hair follicles4,34, which have undergone extensive apoptosis, but also in other ancient tissues that are not particularly affected by apoptosis, such as teeth21 and bones25,26,79, we expect that ancient nucleosome maps could, in the future, be reconstructed across a wide range of samples. Recalling that such patterns are also absent from many of the samples analysed so far, further work is needed to understand which factors drive the preservation of signatures of nucleosome protection after death.

Assessing ancient gene expression levels. Post-mortem DNA damage enables the reconstruction of ancient methylome and nucleosome maps. Given the central role of epigenetic states in regulating chromatin accessibility to transcription factors, this information can be tentatively used to infer ancient gene expression levels. Encouragingly, methylation ratios between gene bodies and promoter regions (a proxy for gene expression) showed strong correlation with hair follicle expression levels measured using high-throughput RNA sequencing (RNA-seq)34. However, further work is needed to develop genuine proxies that accurately measure ancient gene expression levels. The epigenome of each cell type is complex, and ancient samples will necessarily span a range of tissues, with unbalanced contributions from different cell types, which will possibly result in variable validity of expression predictions across samples, age, sex and health conditions. As one example, genome hypermethylation is a known response to viral infection

Ghost population
An unsampled population that exchanges migrants with other sampled populations and that can be identified based on admixture signatures left in descending populations.

Introgressive block lengths
Population admixture introduces a mosaic of ancestry blocks along the genome, the lengths of which decrease with each subsequent generation owing to recombination. Introgressive block lengths can therefore be exploited to determine the date of admixture events.

Box 2 | Reconstructing population histories

One of the most common first steps in the analysis of genome-wide data from ancient humans is the characterization of their closest relatives among modern populations. Such inferences are generally based on principal component analysis (PCA) or statistical clustering, using software such as Admixture126. A benefit of statistical clustering is that it also enables documentation of contamination levels through determining whether the ancient samples exhibit a genetic contribution that could be derived from the research team4. With shotgun sequencing at low depth of coverage (for example, ≤8×), genotypes cannot be reliably determined, and analyses are generally performed using pseudo-haploid data in which sequence reads from many loci consist of a random sampling of only one of the two constituent alleles, and thus individuals are considered to be homozygous for the unique allele sampled at a given locus. The genomic regions covered across multiple individuals are then also limited, which reduces the number of orthologous loci overlapping known genetic variation in modern populations. In such cases, the ancestry of each ancient individual can be determined using multidimensional scaling (MDS), which exploits pairwise measures of genetic distances in a panel of individuals, calculated by normalizing the sum of all instances where two individuals show different alleles by the total number of loci with no missing data in each pair. This procedure is implemented in the bammds package127. Additionally, Procrustes transformation of individual PCA projections based on the particular vector of single-nucleotide polymorphisms covered in each specimen and the same reference panel can help to visualize the population affinities of a group of ancient individuals within a single analysis103.

However, PCA-based approaches reflect not only population ancestries but also the temporal sampling between ancient and modern individuals128. Thus, at best, MDS, PCA and clustering analyses should be viewed as formulating evolutionary hypotheses, which subsequently require testing using approaches such as model-based inference, as well as coalescence simulations14, D‑statistics129 and population f‑statistics130. Population f‑statistics methods, such as the f3‑statistics, have been developed for detecting populations with mixed ancestries and identifying populations that are closest to ancient individuals130. D‑statistics has received particular attention because it originally supported the theory that admixture occurred between Neanderthals and non-African modern humans6. D-statistics is based on four-way alignments that include one outgroup (O) and three populations (H1, H2 and H3), of which two (H1 and H2) are more closely related. For example, in the case of Neanderthals, with the following configuration (O = Chimpanzee, H3 = Neanderthals; H2 = Eurasians, H1 = Africans), positive D‑statistics indicate an excess of shared polymorphisms and possible admixture between Neanderthals and Eurasians6,16,22. However, this observation is also compatible with gene flow into Africans from a currently unsampled and divergent ghost population129, as well as with population subdivision in Africa, with Neanderthal and Eurasian ancestors leaving Africa from related population backgrounds131. Admixture events can be further dated from the distribution of introgressive block lengths in modern and ancient individuals27,122,132, as recombination reduces their size over time. The resulting date seemed to be too recent to be compatible with a scenario involving population subdivision in Africa, which confirmed admixture with Neanderthals outside Africa.

NATURE REVIEWS | GENETICS VOLUME 16 | JULY 2015 | 405

© 2015 Macmillan Publishers Limited. All rights reserved


REVIEWS

Nucleosome Spacer DNA technologies themselves had the greatest impact on the
mCpG field. Although not originally designed for aDNA, their
CpG massive throughput coupled with their ability to sequence
short molecules rendered them ideal for aDNA applica-
tions. Therefore, it is likely that future HTS platforms that
directly sequence DNA bases and their modifications with
minimal (if any) library preparation will drive the future
of aDNA research. The results of the initial application of
Depth of
coverage true single-molecule DNA sequencing are encouraging,
having demonstrated substantial improvement in relative
amounts of accessible endogenous sequences17,45,56.
Although most paleogenomic studies have focused
CpG→TpG on a limited number of individuals, current approaches
allow the characterization of genome-wide SNP vari-
Figure 5 | Tracking ancient nucleosome and methylation maps. DNA wrapped
ation at ancient population scales71,111. Future studies
Nature Reviewsin| Genetics
around nucleosomes can be protected post-mortem and over-represented can be expected to investigate genetic variation in large
high-throughput sequencing (HTS) data. Therefore, depth‑of‑coverage patterns population samples on the high-density SNP or even
along the genome can be exploited to position the location of nucleosomes on whole-genome scale, thus improving our understanding
ancient genomes. Similarly, post-mortem deamination at CpG sites transforms of past demographic, adaptive and admixture trajectories
methylated CpG (mCpG) sites into TpG sites but transforms unmethylated CpG sites with greater detail112.
into UpG sites. With molecular tools disabling the sequencing of the UpGs, CpG→TpG Besides delivering ancient genomes and epig-
mutations in HTS data provides an opportunity to detect ancient mCpGs, enomes, new methodological developments have also
with hypomethylated regions showing low CpG→TpG conversion rates and provided access to ancient transcriptomes113,114 and
hypermethylated regions showing high CpG→TpG conversion rates. Adapted with
proteomes17,115,116. Owing to the biochemical processes
permission from REF. 38, American Association for the Advancement of Science.
inherent in animal cell death, animal tissues are unlikely
to represent good reservoirs for long-term RNA sur-
vival. Materials still exist in other organisms that do not
in plants, and methylation assays for ancient plant mate- undergo autolysis. One example is plant seed, a tissue that
rial can therefore be used to monitor viral exposure in requires RNA survival for germination and that has dem-
ancient populations109. onstrated ancient RNA survival going back hundreds to
thousands of years113,114. Such materials may contribute
Conclusions to our understanding of how gene expression path-
Recent technical developments have enhanced our ways have been remodelled during domestication.
understanding of the properties of aDNA molecules and Additionally, a wide range of ancient proteins have been
how we should best proceed to maximize their retrieval. sequenced from Late115 and Middle17 Pleistocene speci-
In some environments, this enables genomic characteri- mens. With half-lives exceeding that of DNA, ancient
zation throughout much of the past million years17,31,32. peptides might be the only way to retrieve genetic
Ongoing research and the increasing wealth of sequenc- information from the early Pleistocene and even earlier
ing data generated will undoubtedly further improve time periods. Within a much more recent time range,
current approaches in the near future. DNA extraction namely the past few thousand years, studies of proteins
represents an area with great potential for improve- have already delivered information that is not obtainable
ment, especially if tailored to the molecular structures, from DNA, such as whether milk products were already
niches and microenvironmental parameters that best consumed in particular ancient societies117. Molecular
preserve DNA. analyses of dental plaque, which offers a rich reservoir
The discovery that post-mortem cytosine deami- entrapping biomolecules derived not only from the host
nation preferentially occurs at overhangs was impor- but also from its diet and the oral microbiome118,119, may
tant for the development of authentication criteria44. also hold great promises, especially now that computa-
CTCF regions However, other base modifications, including pyrimi- tional approaches have been developed to compare the
Genomic regions targeted by dine derivatives, have been identified 39. Improved diversity of past and present microbiomes57.
CCCTC-binding factor (CTCF)
and involved in regulating the
characterization of the chemical features of aDNA mol- A final question worth considering is whether the
three-dimensional structure ecules, as well as their methylation and nucleosome pro- technological breakthroughs in ancient genomics may
of chromatin and transcription tection patterns, could therefore open new avenues for offer pathways towards de-extinction120. Bringing back
by mediating long-range data authentication. This will also improve our ability lost species is of growing interest, and although it is a
interactions between genomic
to correct sequence analyses from as-yet-unidentified topic fraught with challenges ranging from the ethical to
sequences.
biases and provide opportunities for targeting damaged the technological, for many extinct species a key starting
Admixture templates before sequencing. The development of engi- requisite will be a well-characterized reference genome.
Interbreeding of individuals neered DNA polymerases that can bypass specific DNA As new extraction and computational methods expand
from multiple population lesions introduced post-mortem110 could also facilitate the age range and quality of specimens from which such
origins, resulting in the
introduction of DNA from one
library construction and amplification. data can reliably be obtained, so too will the range of
population into the genomes Importantly, although the approaches outlined species that could be considered as possible targets for
of a second population. here improve aDNA retrieval and analyses, the HTS de-extinction attempts.
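The CpG→TpG reasoning summarized in the Figure 5 caption can be illustrated with a minimal Python sketch. It is not the procedure used in the cited studies. It assumes that read bases overlapping reference CpG cytosines have already been extracted, and that the library was prepared so that deaminated unmethylated cytosines (uracils) are not sequenced. The function name, the input format and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cpg_to_tpg_rates(observations, window=1000):
    """Return the CpG->TpG conversion rate per genomic window.

    observations: iterable of (position, base) pairs, where position is the
    reference coordinate of the cytosine in a reference CpG and base is the
    read base observed there. Only 'C' and 'T' calls are counted.
    """
    counts = defaultdict(lambda: [0, 0])  # window start -> [T calls, C + T calls]
    for position, base in observations:
        if base not in ("C", "T"):
            continue  # skip calls that are neither unconverted nor converted
        start = (position // window) * window
        counts[start][1] += 1
        if base == "T":
            counts[start][0] += 1
    return {start: t / total for start, (t, total) in sorted(counts.items())}


# Toy data (invented): the first window behaves as hypermethylated,
# the second as hypomethylated.
toy = [(10, "T"), (20, "C"), (30, "T"), (40, "T"),
       (110, "C"), (120, "C"), (130, "T"), (140, "C"), (150, "G")]
rates = cpg_to_tpg_rates(toy, window=100)
```

A higher rate in a window is read as a proxy for higher ancient methylation. A real analysis would additionally need to restrict counts to read starts, where deamination is maximal, and to exclude genuine C/T polymorphisms, both of which this sketch leaves out.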


1. Higuchi, R., Bowman, B., Freiberger, M., Ryder, O. A. & Wilson, A. C. DNA sequences from the quagga, an extinct member of the horse family. Nature 312, 282–284 (1984).
2. Hagelberg, E., Sykes, B. & Hedges, R. Ancient bone DNA amplified. Nature 342, 485 (1989).
3. Pääbo, S. Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proc. Natl Acad. Sci. USA 86, 1939–1943 (1989).
4. Rasmussen, M. et al. Ancient human genome sequence of an extinct Paleo-Eskimo. Nature 463, 757–762 (2010).
    This study takes advantage of the relative absence of environmental microorganisms within ancient hairs to characterize the first high-quality ancient human genome.
5. Miller, W. et al. Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456, 387–390 (2008).
6. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
    This paper reports the first draft genome of an archaic hominin and many methodological developments that are still commonly used for characterizing and analysing ancient genomes.
7. Bos, K. I. et al. A draft genome of Yersinia pestis from victims of the Black Death. Nature 478, 506–510 (2011).
    This paper reports the first genome isolated from an ancient pathogenic bacterium, confirming the Black Death as a plague epidemic. It revealed that no derived variant is unique to the medieval strain, suggesting that non-genetic factors enhanced the virulence of the pathogen.
8. Martin, M. D. et al. Reconstructing genome evolution in historic samples of the Irish potato famine pathogen. Nat. Commun. 4, 2172 (2013).
9. Schuenemann, V. J. et al. Genome-wide comparison of medieval and modern Mycobacterium leprae. Science 341, 179–183 (2013).
10. Bos, K. I. et al. Pre-Columbian mycobacterial genomes reveal seals as a source of New World human tuberculosis. Nature 514, 494–497 (2014).
11. Devault, A. M. et al. Second-pandemic strain of Vibrio cholerae from the Philadelphia cholera outbreak. N. Engl. J. Med. 370, 334–340 (2014).
12. Devault, A. M. et al. Ancient pathogen DNA in archaeological samples detected with a microbial detection array. Sci. Rep. 4, 4245 (2014).
13. Wagner, D. M. et al. Yersinia pestis and the Plague of Justinian 541–543 AD: a genomic analysis. Lancet Infect. Dis. 14, 319–326 (2014).
14. Rasmussen, M. et al. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science 334, 94–98 (2011).
15. Keller, A. et al. New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. Nat. Commun. 3, 698 (2012).
16. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
    This paper describes a novel method for constructing aDNA libraries using ssDNA templates, which enabled the characterization of the Denisovan genome at a quality rivalling that of modern genomes, starting from only minute amounts of DNA extracts.
17. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
    This study takes advantage of both direct second-generation (high-throughput, and library- and amplification-dependent) and third-generation (high-throughput, and library- and amplification-independent) sequencing technologies to present the oldest genome sequence hitherto characterized: that of an ~700,000‑year-old horse.
18. Gamba, C. et al. Genome flux and stasis in a five millennium transect of European prehistory. Nat. Commun. 5, 5257 (2014).
19. Jónsson, H. et al. Speciation with gene flow in equids despite extensive chromosomal plasticity. Proc. Natl Acad. Sci. USA 111, 18655–18660 (2014).
20. Malaspinas, A. S. et al. Two ancient human genomes reveal Polynesian ancestry among the indigenous Botocudos of Brazil. Curr. Biol. 24, R1035–R1037 (2014).
21. Olalde, I. et al. Derived immune and ancestral pigmentation alleles in a 7,000‑year-old Mesolithic European. Nature 507, 225–228 (2014).
22. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
23. Raghavan, M. et al. The genetic prehistory of the New World Arctic. Science 345, 1255832 (2014).
24. Raghavan, M. et al. Upper Paleolithic Siberian genome reveals dual ancestry of Native Americans. Nature 505, 87–91 (2014).
25. Rasmussen, M. et al. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature 506, 225–229 (2014).
26. Schubert, M. et al. Prehistoric genomes reveal the genetic foundation and cost of horse domestication. Proc. Natl Acad. Sci. USA 111, E5661–E5669 (2014).
27. Seguin-Orlando, A. et al. Genomic structure in Europeans dating back at least 36,200 years. Science 346, 1113–1118 (2014).
28. Ramirez, O. et al. Genome data from a sixteenth century pig illuminate modern breed relationships. Heredity 114, 175–184 (2015).
29. Schroeder, H. et al. Genome-wide ancestry of 17th century enslaved Africans from the Caribbean. Proc. Natl Acad. Sci. USA 112, 3669–3673 (2015).
30. Metzker, M. L. Sequencing technologies — the next generation. Nat. Rev. Genet. 11, 31–46 (2010).
31. Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).
32. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403–406 (2014).
33. Gokhman, D. et al. Reconstructing the DNA methylation maps of the Neandertal and the Denisovan. Science 344, 523–527 (2014).
34. Pedersen, J. S. et al. Genome-wide nucleosome map and cytosine methylation levels of an ancient human genome. Genome Res. 24, 454–466 (2014).
    This study exploits DNA degradation patterns in HTS data sets to characterize, for the first time, genome-wide nucleosome and methylation maps from an ancient human and infer ancient gene expression levels and the age at death of the individual.
35. Ermini, L., Der Sarkissian, C., Willerslev, E. & Orlando, L. Major transitions in human evolution revisited: a tribute to ancient DNA. J. Hum. Evol. 79, 4–20 (2015).
36. Shapiro, B. & Hofreiter, M. A paleogenomic perspective on evolution and gene function: new insights from ancient DNA. Science 343, 1236573 (2014).
37. Orlando, L. & Cooper, A. Using ancient DNA to understand evolutionary and ecological processes. Ann. Rev. Ecol. Evol. Syst. 45, 573–598 (2014).
38. Orlando, L. & Willerslev, E. An epigenetic window into the past? Science 345, 511–512 (2014).
39. Höss, M., Jaruga, P., Zastawny, T. H., Dizdaroglu, M. & Pääbo, S. DNA damage and DNA sequence retrieval from ancient tissues. Nucleic Acids Res. 24, 1304–1307 (1996).
40. Hansen, A. J., Willerslev, E., Wiuf, C., Mourier, T. & Arctander, P. Statistical evidence for miscoding lesions in ancient DNA templates. Mol. Biol. Evol. 18, 262–265 (2001).
41. Hofreiter, M., Jaenicke, V., Serre, D., von Haeseler, A. & Pääbo, S. DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Res. 29, 4793–4799 (2001).
42. Stiller, M. et al. Patterns of nucleotide misincorporations during enzymatic amplification and large-scale sequencing of ancient DNA. Proc. Natl Acad. Sci. USA 103, 13578–13584 (2006).
43. Gilbert, M. T. et al. Recharacterization of ancient DNA miscoding lesions: insights in the era of sequencing-by‑synthesis. Nucleic Acids Res. 35, 1–10 (2007).
44. Briggs, A. et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc. Natl Acad. Sci. USA 104, 14616–14621 (2007).
    This study characterizes typical nucleotide misincorporation and fragmentation patterns using HTS data from aDNA extracts, which have been subsequently used as essential authentication criteria.
45. Orlando, L. et al. True single-molecule DNA sequencing of a Pleistocene horse bone. Genome Res. 21, 1705–1719 (2011).
46. Sawyer, S. et al. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS ONE 7, e34131 (2012).
47. Overballe-Petersen, S., Orlando, L. & Willerslev, E. Next-generation sequencing offers new insights into DNA degradation. Trends Biotechnol. 30, 364–368 (2012).
48. Jónsson, H. et al. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
49. Hansen, A. J. et al. Crosslinks rather than strand breaks determine access to ancient DNA sequences from frozen sediments. Genetics 173, 1175–1179 (2006).
50. Heyn, P. et al. Road blocks on paleogenomes — polymerase extension profiling reveals the frequency of blocking lesions in ancient DNA. Nucleic Acids Res. 38, e161 (2010).
51. Poinar, H. N., Kuch, M., McDonald, G., Martin, P. & Pääbo, S. Nuclear gene sequences from a late Pleistocene sloth coprolite. Curr. Biol. 13, 1150–1152 (2003).
52. Poinar, H. N. et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 393–394 (2006).
    This study reports the first genetic analysis of ancient specimens based on a HTS technology, paving the way for whole-genome sequencing from ancient specimens.
53. Allentoft, M. E. et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. Biol. Sci. 279, 4724–4733 (2012).
54. Smith, C. I., Chamberlain, A. T., Riley, M. S., Stringer, C. & Collins, M. J. The thermal history of human fossils and the likelihood of successful DNA amplification. J. Hum. Evol. 45, 203–217 (2003).
55. Schwarz, C. et al. New insights from old bones: DNA preservation and degradation in permafrost preserved mammoth remains. Nucleic Acids Res. 37, 3215–3229 (2009).
56. Ginolhac, A. et al. Improving the performance of true single molecule sequencing for ancient DNA. BMC Genomics 13, 177 (2012).
57. Der Sarkissian, C. et al. Shotgun microbial profiling of fossil remains. Mol. Ecol. 23, 1780–1798 (2014).
58. Damgaard, P. et al. Improving access to endogenous DNA in ancient bones and teeth. BioRxiv http://dx.doi.org/10.1101/014985 (2015).
59. Salamon, M., Tuross, N., Arensburg, B. & Weiner, S. Relatively well preserved DNA is present in the crystal aggregates of fossil bones. Proc. Natl Acad. Sci. USA 102, 13783–13788 (2005).
60. Adler, C. J., Haak, W., Donlon, D. & Cooper, A. Survival and recovery of DNA from ancient teeth and bones. J. Archaeol. Sci. 38, 956–964 (2011).
61. Seguin-Orlando, A. et al. Ligation bias in Illumina next-generation DNA libraries: implications for sequencing ancient genomes. PLoS ONE 8, e78575 (2013).
62. Dabney, J. & Meyer, M. Length and GC‑biases during sequencing library amplification: a comparison of various polymerase-buffer systems with ancient and modern DNA sequencing libraries. Biotechniques 52, 87–94 (2012).
63. Young, A. L. et al. A new strategy for genome assembly using short sequence reads and reduced representation libraries. Genome Res. 20, 249–256 (2010).
64. Seguin-Orlando, A. et al. Amplification of TruSeq ancient DNA libraries with AccuPrime Pfx: consequences on nucleotide misincorporation and methylation patterns. STAR 1, STAR2015112054892315Y.0000000005 (2015).
65. Star, B. et al. Palindromic sequence artifacts generated during next generation sequencing library preparation from historic and ancient DNA. PLoS ONE 9, e89676 (2014).
66. Gansauge, M. T. & Meyer, M. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protoc. 8, 737–748 (2013).
67. Gilbert, M. T. et al. Whole-genome shotgun sequencing of mitochondria from ancient hair shafts. Science 317, 1927–1930 (2007).
68. Gansauge, M. T. & Meyer, M. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome Res. 24, 1543–1549 (2014).
69. Briggs, A. et al. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science 325, 318–321 (2009).
70. Maricic, T., Whitten, M. & Pääbo, S. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5, e14004 (2010).
71. Haak, W. et al. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature http://dx.doi.org/10.1038/nature14317 (2015).
72. Rohland, N., Harney, E., Mallick, S., Nordenfelt, S. & Reich, D. Partial uracil–DNA–glycosylase treatment for screening of ancient DNA. Phil. Trans. R. Soc. B 370, 20130624 (2015).


73. Burbano, H. A. et al. Targeted investigation of the Neandertal genome by array-based sequence capture. Science 328, 723–725 (2010).
    This paper reports the first characterization of an ancient exome using target enrichment approaches on microarrays.
74. Fu, Q. et al. A revised timescale for human evolution based on ancient mitochondrial genomes. Curr. Biol. 23, 553–559 (2013).
75. Vilstrup, J. T. et al. Mitochondrial phylogenomics of modern and ancient equids. PLoS ONE 8, e55950 (2013).
76. Castellano, S. et al. Patterns of coding variation in the complete exomes of three Neandertals. Proc. Natl Acad. Sci. USA 111, 6666–6671 (2014).
77. Fu, Q. et al. DNA analysis of an early modern human from Tianyuan Cave, China. Proc. Natl Acad. Sci. USA 110, 2223–2227 (2013).
    This paper describes a target enrichment procedure exploiting millions of DNA probes cleaved from user-designed DNA microarrays to characterize the almost complete sequence of the non-repetitive fraction of chromosome 21 for an ~40,000‑year-old human.
78. Carpenter, M. L. et al. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. Am. J. Hum. Genet. 93, 852–864 (2013).
    This paper reports the first whole-genome target enrichment method, which makes use of self-generated RNA probes. The method substantially reduces the operational cost of target enrichment and allows genetic analyses of specimens with only minute amounts of aDNA templates.
79. Enk, J. M. et al. Ancient whole genome enrichment using baits built from modern DNA. Mol. Biol. Evol. 31, 1292–1295 (2014).
80. Avila-Arcos, M. C. et al. Comparative performance of two whole-genome capture methodologies on ancient DNA Illumina libraries. Methods Ecol. Evol. http://dx.doi.org/10.1111/2041-210X.12353 (2015).
81. Briggs, A. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
    This paper presents an enzymatic procedure based on the treatment of DNA extracts with USER mix, which can considerably reduce the sequencing error rate of ancient genomes by limiting the effect of nucleotide misincorporations at damaged sites.
82. Mason, V. C., Li, G., Helgen, K. M. & Murphy, W. J. Efficient cross-species capture hybridization and next-generation sequencing of mitochondrial genomes from noninvasively sampled museum specimens. Genome Res. 21, 1695–1704 (2011).
83. Zhang, H. et al. Morphological and genetic evidence for early Holocene cattle management in northeastern China. Nat. Commun. 4, 2755 (2013).
84. Fabre, P. H. et al. Rodents of the Caribbean: origin and diversification of hutias unraveled by next-generation museomics. Biol. Lett. http://dx.doi.org/10.1098/rsbl.2014.0266 (2014).
85. Foote, A. D. et al. Tracking niche variation over millennial timescales in sympatric killer whale lineages. Proc. Biol. Sci. 280, 20131481 (2013).
86. Schuenemann, V. J. et al. Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death. Proc. Natl Acad. Sci. USA 108, E746–E752 (2011).
87. Kircher, M., Sawyer, S. & Meyer, M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3 (2012).
88. Avila-Arcos, M. C. et al. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA. Sci. Rep. 1, 73 (2011).
89. Hodges, E. et al. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat. Protoc. 4, 960–974 (2009).
90. Bos, K. I. et al. Parallel detection of ancient pathogens via array-based DNA capture. Phil. Trans. R. Soc. B 370, 20130375 (2015).
91. Schubert, M. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protoc. 9, 1056–1082 (2014).
    This paper presents a fully automated pipeline performing all sequence analyses associated with re‑sequencing genomic projects, phylogenomic inference and metagenomic profiling. It is applicable to both modern and ancient sequence data sets.
92. Lindgreen, S. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res. Notes 5, 337 (2012).
93. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
94. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
95. McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
96. Kozlov, A. M., Aberer, A. J. & Stamatakis, A. ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics http://dx.doi.org/10.1093/bioinformatics/btv184 (2015).
97. Segata, N. et al. Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).
98. Schubert, M. et al. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics 13, 178 (2012).
99. Kerpedjiev, P., Frellsen, J., Lindgreen, S. & Krogh, A. Adaptable probabilistic mapping of short reads using position specific scoring matrices. BMC Bioinformatics 15, 100 (2014).
100. Ginolhac, A., Rasmussen, M., Gilbert, M. T., Willerslev, E. & Orlando, L. mapDamage: testing for damage patterns in ancient DNA sequences. Bioinformatics 27, 2153–2155 (2011).
101. Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014).
102. Lindgreen, S., Krogh, A. & Pedersen, J. S. SNPest: a probabilistic graphical model for estimating genotypes. BMC Res. Notes 7, 698 (2014).
103. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469 (2012).
104. García-Garcerà, M. et al. Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human paleogenomics. PLoS ONE 6, e24161 (2011).
105. Sánchez-Quinto, F. et al. Genomic affinities of two 7,000‑year-old Iberian hunter-gatherers. Curr. Biol. 22, 1494–1499 (2012).
106. Green, R. E. et al. The Neandertal genome and ancient DNA authenticity. EMBO J. 28, 2494–2502 (2009).
107. Reich, D. et al. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468, 1053–1060 (2010).
108. Llamas, B. et al. High-resolution analysis of cytosine methylation in ancient DNA. PLoS ONE 7, e30226 (2012).
109. Smith, O. et al. Genomic methylation patterns in archaeological barley show de‑methylation as a time-dependent diagenetic process. Sci. Rep. 4, 5559 (2014).
110. d’Abbadie, M. et al. Molecular breeding of polymerases for amplification of ancient DNA. Nat. Biotech. 25, 939–943 (2007).
111. da Fonseca, R. R. et al. The origin and evolution of maize in the American Southwest. Nat. Plants 1, 14003 (2015).
112. Pickrell, J. K. & Reich, D. Toward a new history and geography of human genes informed by ancient DNA. Trends Genet. 30, 377–389 (2014).
113. Fordyce, S. L. et al. Deep sequencing of RNA from ancient maize kernels. PLoS ONE 8, e50961 (2013).
114. Allaby, R. G. et al. Using archaeogenomic and computational approaches to unravel the history of local adaptation in crops. Phil. Trans. R. Soc. B 370, 20130377 (2015).
115. Cappellini, E. et al. Proteomic analysis of a Pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012).
116. Cappellini, E., Collins, M. J. & Gilbert, M. T. Unlocking ancient protein palimpsests. Science 343, 1320–1322 (2014).
117. Warinner, C. et al. Direct evidence of milk consumption from ancient human dental calculus. Sci. Rep. 4, 7104 (2014).
118. Adler, C. J. et al. Sequencing ancient calcified dental plaque shows changes in oral microbiota with dietary shifts of the Neolithic and Industrial revolutions. Nat. Genet. 45, 450–455 (2013).
119. Warinner, C. et al. Pathogens and host immunity in the ancient human oral cavity. Nat. Genet. 46, 336–344 (2014).
120. Shapiro, B. How to Clone a Mammoth — the Science of De‑extinction (Princeton University Press, 2014).
121. Lazaridis, I. et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014).
122. Fu, Q. et al. Genome sequence of a 45,000‑year-old modern human from western Siberia. Nature 514, 445–449 (2014).
123. Krause, J. et al. The complete mitochondrial DNA genome of an unknown hominin from southern Siberia. Nature 464, 894–897 (2010).
124. Orlando, L. A 400,000‑year-old mitochondrial genome questions phylogenetic relationships amongst archaic hominins. Bioessays 36, 598–605 (2014).
125. Scally, A. & Durbin, R. Revising the human mutation rate: implications for understanding human evolution. Nat. Rev. Genet. 13, 745–753 (2012).
126. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
127. Malaspinas, A. S. et al. bammds: a tool for assessing the ancestry of low-depth whole-genome data using multidimensional scaling (MDS). Bioinformatics 30, 2962–2964 (2014).
128. Skoglund, P., Sjodin, P., Skoglund, T., Lascoux, M. & Jakobsson, M. Investigating population history using temporal genetic differentiation. Mol. Biol. Evol. 31, 2516–2527 (2014).
129. Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011).
130. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
131. Eriksson, A. & Manica, A. Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins. Proc. Natl Acad. Sci. USA 109, 13956–13960 (2012).
132. Sankararaman, S., Patterson, N., Heng, L., Pääbo, S. & Reich, D. The date of interbreeding between Neandertals and modern humans. PLoS Genet. 8, e1002947 (2012).

Acknowledgements
This work was supported by the Danish Council for Independent Research, Natural Sciences (FNU, 4002‑00152B and 0602‑02383B); the Danish National Research Foundation (DNFR94); the Lundbeck Foundation (R52‑5062); a Marie Curie Career Integration Grant (FP7 CIG‑293845); and the “Chaires d’Attractivité 2014” IDEX, University of Toulouse, France.

Competing interests statement
The authors declare no competing interests.

FURTHER INFORMATION
AdapterRemoval: https://github.com/slindgreen/AdapterRemoval
Admixture: http://www.genetics.ucla.edu/software/admixture/
ANFO Short Read Aligner/Mapper: https://bioinf.eva.mpg.de/anfo/
bammds: http://dna.ku.dk/~sapfo/bammds.html
Bayesian reconstruction of ancient DNA fragments: https://github.com/grenaud/leeHom
Bowtie 2: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Burrows–Wheeler Aligner: http://bio-bwa.sourceforge.net/
BWA-PSSM: http://bwa-pssm.binf.ku.dk/
ExaML: http://sco.h-its.org/exelixis/web/software/examl/index.html
Genome Analysis Toolkit: https://www.broadinstitute.org/gatk/
mapDamage and mapDamage2: http://ginolhac.github.io/mapDamage/
Mapping Iterative Assembler: http://mia-assembler.sourceforge.net/
MetaPhlAn: http://huttenhower.sph.harvard.edu/metaphlan
PALEOMIX: https://github.com/MikkelSchubert/paleomix
Picard: http://broadinstitute.github.io/picard
pmdtools: https://code.google.com/p/pmdtools/
SNPest: https://github.com/slindgreen/SNPest
Thermal Age Web Tool: http://thermal-age.eu/
