Download as pdf or txt
Download as pdf or txt
You are on page 1of 2

Bioinformatics, 37(20), 2021, 3652–3653

doi: 10.1093/bioinformatics/btab190
Advance Access Publication Date: 23 April 2021
Applications Note

Sequence analysis
DamageProfiler: fast damage pattern calculation for
ancient DNA
1,2,3,
Judith Neukamm *, Alexander Peltzer2,4 and Kay Nieselt2,*

Downloaded from https://academic.oup.com/bioinformatics/article/37/20/3652/6247758 by guest on 15 May 2023


1
Institute of Evolutionary Medicine, University of Zurich, 8057 Zurich, Switzerland, 2Institute for Bioinformatics and Medical
Informatics, University of Tübingen, 72076 Tübingen, Germany, 3Institute for Archaeological Sciences, University of Tübingen, 72070
Tübingen, Germany and 4Max Planck Institute for the Science of Human History, 07745 Jena, Germany
*To whom correspondence should be addressed.
Associate Editor: Janet Kelso

Received on October 5, 2020; revised on February 22, 2021; editorial decision on March 11, 2021

Abstract
Motivation: In ancient DNA research, the authentication of ancient samples based on specific features remains a
crucial step in data analysis. Because of this central importance, researchers lacking deeper programming know-
ledge should be able to run a basic damage authentication analysis. Such software should be user-friendly and easy
to integrate into an analysis pipeline.
Results: DamageProfiler is a Java-based, stand-alone software to determine damage patterns in ancient DNA. The
results are provided in various file formats and plots for further processing. DamageProfiler has an intuitive graphic-
al as well as command line interface that allows the tool to be easily embedded into an analysis pipeline.
Availability and implementation: All of the source code is freely available on GitHub (https://github.com/Integrative-
Transcriptomics/DamageProfiler).
Contact: judith.neukamm@uzh.ch or kay.nieselt@uni-tuebingen.de
Supplementary information: Supplementary data are available at Bioinformatics online.

1 Introduction sample sets. For ancient metagenomic samples, HOPS (Hübler et al.,
2019) has recently been published. This tool offers a fully automat-
The field of ancient DNA (aDNA) further develops with every pass- ized bacterial screening pipeline for aDNA sequences that provides
ing year and current research spans diverse topics including the ana- detailed information on species identification and authenticity.
lysis of various taxa, from ancient plant and animal species However, HOPS includes the metagenomic mapping software
(Debruyne et al., 2008; Jaenicke-Després et al., 2003), to human MALT (Vågene et al., 2018) and is limited to its output format.
microbiological communities from oral or gut microbiomes (Tito Here, we present DamageProfiler, a fast and user-friendly soft-
et al., 2012; Warinner, 2016). One central analysis step is the au- ware to investigate damage patterns in high-throughput sequencing
thentication of the ancient origin of the DNA. Several characteristics data. DamageProfiler aims for high usability and fast runtime to be
have been identified that describe post-mortem DNA modifications. applicable to large datasets. The results are provided as intuitive,
These can be used to differentiate between authentic aDNA and well-described figures and text-based representation in various file
modern contamination and are typically summarized as (i) short formats, including JSON format. Moreover, the visualizations can
fragment length (Meyer et al., 2016); (ii) preferential fragmentation be explored interactively in the graphical user interface. The various
of aDNA at purine bases (Briggs et al., 2007); and (iii) increased output formats and runtime improvements make it suitable for inte-
Cytosine (C) to Thymine (T) base misincorporations towards the grated use in analysis pipelines.
five prime (50 ) end and (iv) complementary Guanine (G) to Adenine
(A) base misincorporations towards the three prime (30 ) end of the
fragment due to enhanced cytosine deamination in single-stranded
2 Results
50 -overhanging ends (Briggs et al., 2007). Existing software such as
mapDamage2.0 (Jónsson et al., 2013) and PMDTools (Skoglund DamageProfiler is a fast, interactive and easy-to-use Java-based pro-
et al., 2014) calculate the nucleotide misincorporation and fragmen- gram to investigate the damage patterns of high-throughput
tation patterns from next-generation sequencing (NGS) reads and sequencing data. DamageProfiler is compatible with newer file for-
provide graphical and text-based representations, as well as further mats such as CRAM, and also legacy file formats such as SAM or
statistical estimates. However, their usability is hindered by the ex- BAM. In addition to several statistics, DamageProfiler generates a
clusive use as a command line tool and long runtimes for larger damage plot, which is calculated by determining the frequency of C

C The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
V 3652
DamageProfiler 3653

to T base misincorporations per position of all fragments that map


to the reference genome, the length distribution of all mapped reads,
and the edit distance between the read and the reference both in text
and image formats (Supplementary Fig. S1).
Contrasting to the double-stranded aDNA library preparation
protocol (Briggs and Heyn, 2012), the single-stranded library proto-
col (Gansauge et al., 2017) results in reads having only C to T base
misincorporations at both the 50 and 30 ends and thus renders the
visualization of the G to A base misincorporations superfluous.
Subsequently, for single-stranded protocols, DamageProfiler pro-
vides a specific option to compute and visualize only C to T base
misincorporations.
DamageProfiler is also applicable to metagenomic mapping files,
i.e. mapping files that contain mapping information for multiple ref-
erence files. Based on a user-specified list of reference IDs, a detailed

Downloaded from https://academic.oup.com/bioinformatics/article/37/20/3652/6247758 by guest on 15 May 2023


output for each species and a general summary file is generated.
DamageProfiler runs platform-independent and provides both a
graphical and command-line interface. The graphical user interface
(Supplementary Fig. S2) enables easy configuration of all parameters Fig. 1. Runtime comparison. The runtime of mapDamage2.0, PMDTools and
and offers an interactive, visual exploration of the results. All uti- DamageProfiler was compared based on five different mapping files of increasing size. The
lized parameters and steps performed are listed in a log file, which y-axis is log-scaled and shows the runtime in seconds, the x-axis refers to the five samples.
helps to track each step for reproducible results. In addition to com- For Motala12 and Loschbour, PMDTools did not finish within 48h and was cancelled
prehensive documentation (https://damageprofiler.readthedocs.io),
a help page is available directly within the interface to provide more
In summary, DamageProfiler is a fast and user-friendly tool for
details about all parameters.
coherent calculation of damage patterns for mapped NGS reads
DamageProfiler is developed as stand-alone software, but can
from a variety of mapping tools. Furthermore, DamageProfiler can
also be integrated into analysis pipelines. Currently, DamageProfiler
be applied to single- or multi-reference mapping files.
is already available within the EAGER pipeline (Peltzer et al., 2016),
and is the default damage computation tool in nf-core/eager (Ewels
et al., 2020; Fellows Yates et al., 2021).
DamageProfiler accepts BAM, SAM and CRAM formatted files Acknowledgements
and provides various output formats, such as SVG images, text and The authors thank Abagail M. Breidenstein for proofreading the manuscript.
JSON files, for further processing of the results. The JSON output They also thank James A. Fellows Yates, Alexander Hübner, Maxime Borry
can, for example, be used to summarize results with direct integra- and Simon Heumos for bug reports, feature suggestions and contributions.
tion in MultiQC (Ewels et al., 2016).
Conflict of Interest: none declared.
To evaluate the accuracy of the results, DamageProfiler was tested
on a simulated dataset (Supplementary Note S1, Supplementary Fig.
S3). We could show that the results are identical to the expected dam-
age patterns, which proves the reliability of DamageProfiler. References
While various mapping software is used in the aDNA field, Briggs,A.W. et al. (2007) Patterns of damage in genomic DNA sequences from
DamageProfiler was successfully tested on the most common ones a Neandertal. Proc. Natl. Acad. Sci. USA, 104, 14616–14621.
(Supplementary Note S2, Supplementary Fig. S4). Briggs,A.W. and Heyn,P. (2012) Preparation of next-generation sequencing
Moreover, we demonstrated the application of DamageProfiler libraries from damaged DNA. In: Shapiro,B. and Hofreiter,M. (eds.) Ancient
on a metagenomic mapping file (Supplementary Note S3, DNA: Methods and Protocols. Humana Press, Totowa, NJ, pp. 143–154.
Supplementary Fig. S5). Based on a given set of species of interest, Debruyne,R. et al. (2008) Out of America: ancient DNA evidence for a new world
DamageProfiler calculates the damage patterns for all species indi- origin of late quaternary woolly mammoths. Curr. Biol., 18, 1320–1326.
vidually and also generates a summary file containing all species. Ewels,P. et al. (2016) MultiQC: summarize analysis results for multiple tools
DamageProfiler also improves the runtime of damage pattern and samples in a single report. Bioinformatics, 32, 3047–3048.
calculations compared to existing software (Jónsson et al., 2013; Ewels,P. et al. (2020) The nf-core framework for community-curated bioinfor-
matics pipelines. Nat. Biotechnol., 38, 276–278.
Skoglund et al., 2014). As each software provides different addition-
Fellows Yates,J.A. et al. (2021) Reproducible, portable, and efficient ancient
al features, we only compared the runtime of accessing the base sub-
genome reconstruction with nf-core/eager. PeerJ, 9, e10947.
stitutions and the output generation. The runtime was tested for five
Gansauge,M.-T. et al. (2017) Single-stranded DNA library preparation from
files of different sizes and reference genomes and it could be shown
highly degraded DNA using T4 DNA ligase. Nucleic Acids Res., 45, e79.
that DamageProfiler performs best (Fig. 1). In detail, it runs on aver- Hübler,R. et al. (2019) HOPS: automated detection and authentication of
age 5.5 times faster than mapDamage2.0 and 174.6 times faster pathogen DNA in archaeological remains. Genome Biol., 20, 280.
than PMDTools (Supplementary Note S4, Supplementary Table S1). Jaenicke-Després,V. et al. (2003) Early allelic selection in maize as revealed by
Moreover, an increase in speed can be specifically observed for ancient DNA. Science, 302, 1206–1208.
larger mapping files (Supplementary Table S1). Jónsson,H. et al. (2013) mapDamage2.0: fast approximate Bayesian estimates
of ancient DNA damage parameters. Bioinformatics, 29, 1682–1684.
Meyer,M. et al. (2016) Nuclear DNA sequences from the Middle Pleistocene
3 Discussion Sima de los Huesos hominins. Nature, 531, 504–507.
Peltzer,A. et al. (2016) EAGER: efficient ancient genome reconstruction.
We demonstrated the reliability of the results of DamageProfiler and Genome Biol., 17, 60.
showed that our tool is faster than the existing software, and could Skoglund,P. et al. (2014) Separating endogenous ancient DNA from modern
even be improved by a multithreaded implementation. We could day contamination in a Siberian Neandertal. Proc. Natl. Acad. Sci. USA,
also show that re-implementing the back-end in Java did not alter 111, 2229–2234.
the results. Although DamageProfiler does not provide the same Tito,R.Y. et al. (2012) Insights from characterizing extinct human gut micro-
comprehensive statistics as mapDamage2.0, including the rescaling biomes. PLoS One, 7, e51146.
of the base quality score based on the probability of being damaged, Vågene,Å.J. et al. (2018) Salmonella enterica genomes from victims of a major
it can be used to evaluate damage patterns in DNA sequences, pro- sixteenth-century epidemic in Mexico. Nat. Ecol. Evol., 2, 520–528.
vides a GUI for more convenient exploration of the results and Warinner,C. (2016) Dental calculus and the evolution of the human oral
extends the application to metagenomic mapping files. microbiome. J. Calif. Dent. Assoc., 44, 411–420.

You might also like