
Article type : Original Article
Accepted Article
Comparative Study of Sequence Aligners for Detecting Antibiotic Resistance in Bacterial
Metagenomes
Camille McCall1#, Irene Xagoraraki1
Department of Civil and Environmental Engineering, Michigan State University, East Lansing,
Michigan, USA1
#Address correspondence to Camille McCall, email: mccallca@msu.edu; phone: (248) 346-2359
Running Headline: Antibiotic Resistance in Metagenomes
Significance and Impact of the Study
Antibiotic resistance genes (ARGs) are pollutants known to persist in wastewater
treatment plants, among other environments; thus, methods for detecting these genes have
become increasingly relevant. Next-generation sequencing has brought about a host of sequence
alignment tools that provide a comprehensive look into antimicrobial resistance in
environmental samples. However, standardizing practices in ARG metagenomic studies is
challenging because results produced by alignment tools can vary significantly. Our study
provides sequence alignment results for synthetic and authentic bacterial metagenomes mapped
against an ARG database using multiple alignment tools, and identifies the best practice for
detecting ARGs in environmental samples.

This article has been accepted for publication and undergone full peer review but has not
been through the copyediting, typesetting, pagination and proofreading process, which may
lead to differences between this version and the Version of Record. Please cite this article as
doi: 10.1111/lam.12842
This article is protected by copyright. All rights reserved.

Abstract
We aim to compare the performance of Bowtie2, BWA-MEM, BLASTN, and BLASTX when
aligning bacterial metagenomes against the Comprehensive Antibiotic Resistance Database
(CARD). Simulated reads were used to evaluate the performance of each aligner under the
following four performance criteria: correctly mapped, false positives, multi-reads, and partials.
The optimal alignment approach was applied to samples from two wastewater treatment plants
to detect ARGs using next-generation sequencing. BLASTN mapped with the greatest accuracy
among the four sequence alignment approaches considered, followed by Bowtie2. BLASTX
generated the greatest number of false positives and multi-reads when aligned against the
CARD. The performance of each alignment tool was also investigated using error-free reads.
Although each aligner mapped a greater number of error-free reads than Illumina-error
reads, in general, the introduction of sequencing errors had little effect on alignment
results when aligning against the CARD. Considering all performance criteria, BLASTN was found to
be the most favorable alignment tool and was therefore used to assess resistance genes in
sewage samples. Beta-lactam and aminoglycoside resistance genes were found to be the most abundant classes
of antibiotic resistance genes in each sample.
Keywords: Antibiotic resistance, Metagenomics, Bowtie2, BWA-MEM, BLAST, Alignment

Introduction
It has been reported that within the first few years of introducing a new antibiotic,
pathogens commonly known to persist in hospital settings develop resistance (Zhang et al.
2015a). Resistant pathogens of this nature escape controlled settings, most commonly through
hospital waste streams, and are released into the environment. Additionally, the intake of
antibiotics is considerably greater in livestock operations than in human therapeutics.
Tetracyclines, for example, are administered to livestock at subtherapeutic levels to act as
growth promoters and to prevent disease. This may further facilitate the dissemination of
resistance genes in the environment through the use of manure and wastewater effluent in land
applications (Manaia et al. 2016; Schmieder and Edwards 2012). Consequently, antibiotic
resistance genes (ARGs) have become an evolving environmental pollutant known to occur in
ecosystems such as soils, surface waters, and wastewater treatment plants (WWTPs) (Manaia
et al. 2016). Several techniques, namely culturing, quantitative polymerase chain reaction
(qPCR), and microarrays, have been used for detecting ARGs in the environment (Luby et al.
2016). Although these techniques have made it possible to detect ARGs, they are insufficient for
identifying novel ARGs, or a broad spectrum of these genes within microbial communities, due
to the lack of culturability or the limited availability of primers.
Next-generation sequencing (NGS) is culture-independent and allows for the analysis of
metagenomes, which consist of all genetic material contained in a sample. NGS platforms such
as Illumina, PacBio, and SOLiD sequence libraries of genomic fragments that are later processed
for interpretation. A major stage in the post-sequencing process is the alignment of sequence
fragments (or “reads”) against reference sequences (Langmead and Salzberg 2012; Hatem et al.
2013). The Comprehensive Antibiotic Resistance Database (CARD) is a widely used reference
database in recent metagenomic ARG studies (Elbehery et al. 2016; Garner et al. 2016; Subirats
et al. 2016).

A number of sequence aligners have been developed for aligning reads against reference
databases. The Basic Local Alignment Search Tool (BLAST) is a widely used sequence alignment
tool for detecting antibiotic resistance in metagenomic studies (Yang et al. 2014; Zhang et al.
2015b; Elbehery et al. 2016; Subirats et al. 2016). BLAST takes a heuristic approach, which
allows for rapid alignment of query sequences against massive reference databases like
GenBank (Altschul et al. 1997). In addition to BLAST, two popular sequence aligners, Bowtie2
(Langmead and Salzberg 2012) and BWA-MEM (Li 2013) attempt to find all possible alignments
for each query sequence by way of indexing the reference, finding global, or maximum exact
matches, and re-seeding.
Several studies have evaluated aligners given either synthetic data, or authentic data
extracted from, for example, the human genome (Ruffalo et al. 2011; Langmead and Salzberg
2012; Hatem et al. 2013; Li 2013). Fewer studies have analyzed sequence aligners using
bacterial genomes (Li 2013), and to our knowledge no studies have
assessed the alignment of bacterial metagenomes against ARG reference databases. A
number of factors influence the behavior of sequence aligners, including the parameters set,
the genetic data, and the sequencing platform (Hatem et al. 2013; Elbehery et al. 2016). Sequence
aligners are challenged with the task of assigning multi-reads (reads that align equally well
to more than one position on a reference genome) (Treangen and Salzberg 2012) to the correct
position, and errors can result when only best hits are reported. When dealing with metagenomic
datasets, the situation is aggravated because the volume of data increases dramatically. Errors
generated from the sequencing platform introduce further ambiguity when analyzing
sequencing data.
To assess the performance of sequence aligners in ARG metagenomic studies, we aligned
two simulated ARG bacterial metagenomes (one without error and another with simulated
sequencing errors) against the CARD. Bowtie2, BWA-MEM, BLASTN and BLASTX were chosen
based on their frequent appearance in metagenomic studies, compatibility with Illumina data,
and ability to handle reads of various lengths. Furthermore, real bacterial DNA extracted
from two WWTPs and sequenced on an Illumina platform was aligned against the CARD using
the optimal alignment approach determined by the specified performance criteria. We aim to
(1) assess the performance of BWA-MEM, Bowtie2, BLASTN, and BLASTX when aligning a
simulated bacterial metagenome against an ARG database, and (2) apply the optimal alignment
approach to empirical sewage samples and report the relative abundance of ARG classes in
each sample. The results presented here will assist in determining which tool to use in ARG
metagenomic studies.
Results and Discussion
Alignment of Simulated Bacterial Metagenomes
To validate each alignment approach, simulated reads were initially mapped back to their
whole genomes (Table 1). During alignment of simulated reads against whole genomes, Bowtie2
and BWA-MEM aligned 100% of reads with and without sequencing errors, while BLASTN
aligned 99.71% and 99.82% of reads with error and without error, respectively. Without error,
BWA-MEM mapped with the greatest accuracy, followed by Bowtie2 and BLASTN. Although
BWA-MEM mapped error-free reads with greater accuracy than Bowtie2 and BLASTN, there was a
significant decrease in its alignment accuracy when aligning reads with sequencing errors. Hence,
our findings suggest that BWA-MEM is more sensitive to discrepancies between the reference
and query sequence due to sequencing errors or allelic variation.
Bowtie2 yielded the highest alignment error rate at 3.51%, followed by BWA-MEM at
2.60%, when aligning error-containing reads against whole genomes. Conversely, BWA-MEM
generated an error rate of 5.17% while Bowtie2 obtained an error rate of 3.98% when aligning
error-free reads. All BLAST results were evaluated with an E-value threshold of 1e-5.
Given the absence of a protein reference for whole genomes, BLASTX was not considered
in the preliminary analysis. The high accuracy of aligned reads with and without error among all
methods indicates that the alignment approaches taken generally performed well.
To evaluate the performance of each sequence aligner when mapping bacterial
metagenomes to an ARG reference, simulated reads were aligned against the CARD. The highest
number of mapped reads was obtained during BLASTX alignment. Each alignment tool obtained
a greater number of mapped reads when aligning reads without sequencing errors. BLASTN
obtained the greatest percentage of correctly mapped reads, followed by Bowtie2 (Table 2). While
BLASTX obtained the lowest percentage of false positives, it generated a substantial number of
multi-reads. BWA-MEM produced the fewest multi-reads, and there was no significant
difference in the percentage of multi-reads between BLASTN and Bowtie2 (P = 0.08). Bowtie2
alignment generated 6.08 and 8.17 percentage points more false positives than BLASTN when aligning
reads with and without error, respectively. Bowtie2 yielded the highest error rate when aligning
simulated data against the CARD with and without error at 5.42% and 2.47%, respectively,
followed by BWA-MEM at 4.30% and 2.05%, respectively.
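As a worked check on how the percentages in Table 2 relate to the counts, the following minimal Python sketch recomputes the BLASTN values for reads with Illumina errors. It assumes that the percentage of mapped reads is taken relative to the 128864 simulated reads listed in Table 1, and that each category percentage is taken relative to the number of mapped reads. The function and variable names are illustrative only and are not taken from the study's scripts.

```python
# Total number of simulated reads (assumption: the 100% figure in Table 1).
TOTAL_READS = 128864


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# BLASTN counts for reads with Illumina errors, copied from Table 2.
blastn = {
    "mapped": 509,
    "correct": 103,
    "partial": 11,
    "multi": 100,
    "false_positive": 295,
}

# Mapped reads are expressed relative to all simulated reads.
mapped_pct = percent(blastn["mapped"], TOTAL_READS)

# Each category is expressed relative to the mapped reads.
category_pct = {
    name: percent(count, blastn["mapped"])
    for name, count in blastn.items()
    if name != "mapped"
}

if __name__ == "__main__":
    print("mapped (%):", mapped_pct)
    for name, value in category_pct.items():
        print(f"{name} (%):", value)
```

Under these assumptions the four categories account for all 509 mapped reads, which is why their percentages sum to approximately 100.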
Alignment of simulated bacterial metagenomes against the CARD revealed BLASTN to
have slightly greater accuracy than Bowtie2, BWA-MEM, and BLASTX. Although false
positives are anticipated, each alignment tool generated a relatively large number of false
positives when mapping against the CARD as compared to whole genomes. This could be
attributed to sequencing errors, which can complicate the alignment process and result in
incorrect alignments, specifically in metagenomic data (Elbehery et al. 2016). BWA-MEM and
BLASTN mapped reads without sequencing errors with marginally greater accuracy than reads
containing sequencing errors. In most cases, the percentage of multi-reads decreased when
aligning reads with sequencing errors as opposed to reads without. The number of false
positives was slightly less for error-free reads only during alignment with BLAST. Repetitive
regions can also have a substantial impact on the number of false alignments and multi-reads
(Treangen and Salzberg 2012; Yu et al. 2012). While this may be the case during BLASTX
alignment, further analysis is needed to draw a definite conclusion. Despite the slight variations
in results between error-free and Illumina-error reads, the performance (i.e. no. of mapped
reads, correct, partials, multi-reads, and false positives) of aligners when mapping against the
CARD remained largely unchanged when introducing sequencing errors (P = 0.05 - 0.57).
Alignment and Survey of ARGs detected in Wastewater Metagenomes
Quality analysis of trimmed sequences revealed a mean quality of 36 for both
samples (Table 4). Wastewater samples were aligned against the CARD using BLASTN with an
E-value of 1e-5; the remaining parameters were kept at default settings. A gene similarity
threshold of ≥ 90% over 150 bp was applied to mapped reads.
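The thresholds above can be illustrated with a short Python sketch that filters BLASTN hits. This is not the authors' procedure, which is not described in detail. It assumes the hits were written in the BLAST+ tabular format (-outfmt 6) with the default column order, and it interprets "over 150 bp" as an alignment length of at least 150, which is an assumption. The names Hit and filter_hits are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    read_id: str    # qseqid, column 1
    gene_id: str    # sseqid, column 2
    pident: float   # percent identity, column 3
    length: int     # alignment length, column 4
    evalue: float   # E-value, column 11


def filter_hits(
    lines: Iterable[str],
    min_identity: float = 90.0,
    min_length: int = 150,
    max_evalue: float = 1e-5,
) -> Iterator[Hit]:
    """Yield hits that meet the identity, length, and E-value thresholds."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hit = Hit(
            fields[0],
            fields[1],
            float(fields[2]),
            int(fields[3]),
            float(fields[10]),
        )
        if (
            hit.pident >= min_identity
            and hit.length >= min_length
            and hit.evalue <= max_evalue
        ):
            yield hit
```

A typical use would be to open the tabular output file and pass the file object to filter_hits, collecting the gene identifiers of the hits that pass.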
A total of 256 and 300 different ARGs met threshold conditions, each obtaining an E-value
of less than 1e-50, in the CAS and MBR sewage samples, respectively, using BLASTN. β-lactam
resistance genes were the most abundant in each sample, followed by aminoglycoside resistance
genes. Minor counts of ARGs belonging to the elfamycin, glycopeptide, and polymyxin classes of
antibiotics were detected in the MBR sample but went undetected in the CAS sample (Figure 1).
The Streptomyces cinnamoneus tuf gene (NCBI accession no. X98831), which confers resistance to
elfamycins, was detected in the MBR sample. Elfamycins are a class of naturally occurring
antibiotics that inhibit bacterial growth by binding to elongation factor Tu, a component
required for bacterial protein synthesis (Sottani et al. 1993). To our knowledge, no recent
studies on the prevalence of elfamycin resistance genes in sewage treatment plants have been
documented. Our results suggest the occurrence of this gene in the MBR sample. There was no
significant difference in the abundance of antibiotic classes found between sewage samples
(P = 0.88).
Limitations on Sequence Alignment Results
The results of this analysis reflect only the specific aligner parameters,
references, and samples used in this study. Others may choose to adjust alignment algorithm
parameters depending on the characteristics of the data, biological application, and sensitivity needs.
Here, we offer results from the alignment of simulated and real bacterial metagenomes mapped
against an ARG reference database with different aligners under default conditions.
Since the analysis of ARGs using sequencing is most accurate when identifying known
genes (Schmieder and Edwards 2012), ARGs detected in each WWTP only suggest the presence
of these genes in the samples investigated. When possible, traditional biological detection
methods are recommended for verifying the identification of genes detected using sequence
aligners.
This study evaluates the performance of the Bowtie2, BWA-MEM, and BLAST sequence
alignment tools in metagenomic ARG analyses. It also highlights sequencing errors as a potential
factor that can interfere with accurately detecting ARGs in bacterial metagenomes using
sequence aligners. BLASTN reported the greatest percentage of accurate alignments, followed by
Bowtie2, in the simulated metagenomes. BLASTN aligned a greater number of reads than
Bowtie2 and maintained a lower percentage of false positives than Bowtie2 and
BWA-MEM. Therefore, BLASTN was selected as the aligner of choice in the study. It is clear that
each tool has its tradeoffs between specificity and sensitivity. To gain more
insight into the performance of sequence aligners in ARG metagenomic analyses, further studies
with varying sequencing tools and aligner performance parameters are warranted.
Materials and methods
Wastewater Sample Collection
Sewage samples were collected from the East Lansing conventional activated sludge
(CAS) and Traverse City membrane bioreactor (MBR) WWTPs in Michigan (U.S.A.) in 2013.
The characteristics of these WWTPs are shown in Table 3. Samples presented in this study were
taken directly after the disinfection process of each treatment utility. In short, 2 liters of grab
effluent sample were collected in sterile Nalgene bottles from each WWTP. Samples were mixed
and stored on ice, then transported to the laboratory for further processing.
Wastewater Sample Processing and Filtration
Bacteria were recovered using a standard filtration technique with 0.45 μm HA filters
(Millipore, Billerica, MA). The volume of sample filtered was 1 liter for each sample. The filters
were collected in sterile 50 ml polypropylene tubes and 50 ml of phosphate-buffered saline (1X PBS) was
added to each tube containing a filter. The tubes were vortexed for 5 min to allow the biomass
layer on the filters to mix with the buffer. Both tubes were centrifuged for 20 min at 2309 g to
concentrate the sample down to 2 ml. The supernatant was discarded and the concentrates were
stored at −80°C until DNA extraction was performed.

Nucleic Acid Extraction
DNA extraction was performed using a MagNA Pure Compact DNA extractor (Roche
Applied Science, Indianapolis, IN, USA) following the protocol in the manufacturer’s manual. The
MagNA Pure Compact uses magnetic-bead technology for the isolation process. A sample
volume of 400 μl was loaded into the system and the elution volume was 100 μl. The purified
DNA was stored in a freezer at −20°C. DNA concentration was determined using a NanoDrop
spectrophotometer (NanoDrop® ND-1000, Wilmington, DE).
High-throughput Sequencing and Preprocessing
DNA samples were isolated and approximately 1 μg of DNA (per sample) was sent to the
Research Technology Support Facility (RTSF) at Michigan State University. The NuGEN Ovation
Ultralow Library System, with an input requirement of 1-100 ng of DNA, was used for both
samples to accommodate any sample containing little genetic material. After preparation,
libraries were sequenced on an Illumina platform (Illumina HiSeq 2500),
generating 150 bp paired-end reads.
Sequences were returned as R1.FASTQ and R2.FASTQ files for each sample, where R1 and
R2 constitute a read pair. Each FASTQ file was processed on a Unix/Linux system provided
by the MSU High Performance Computing Center (HPCC). Raw sequences were analyzed
for quality using FastQC, a quality control tool for sequencing data (Andrews 2010). Based on
the quality control check, Illumina adapters and reads with an average quality score below 15
were removed using Trimmomatic (Bolger et al. 2014). Finally, FastQC was run once
more on the quality-trimmed reads to ensure the integrity of the sequence reads and the accuracy of
subsequent sequence alignment processes.
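To make the average-quality criterion concrete, the sketch below computes the mean Phred score of a FASTQ quality string and applies a cutoff of 15. It is an illustration only, not Trimmomatic's implementation. It assumes Phred+33 encoding, and the function names mean_phred and keep_read are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of a FASTQ quality string.

    Assumes each character encodes one score as ASCII code minus `offset`
    (Phred+33 by default).
    """
    return sum(ord(char) - offset for char in quality) / len(quality)


def keep_read(quality: str, min_mean: float = 15.0) -> bool:
    """Return True if the read's mean quality meets the cutoff."""
    # Empty quality strings are rejected before the mean is computed.
    return len(quality) > 0 and mean_phred(quality) >= min_mean


if __name__ == "__main__":
    for qual in ("IIIIIIII", "########", "5555IIII"):
        print(qual, mean_phred(qual), keep_read(qual))
```

Averaging Phred scores directly, as done here, is the simplest reading of "average quality score"; averaging the underlying error probabilities would give a different and stricter result.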

Simulated Bacterial Metagenome
A synthetic bacterial metagenome was constructed from seven complete sequences,
identified as whole genomes or mobile genetic elements (MGEs), extracted from the NCBI
nucleotide database. Sequences associated with the following NCBI accession numbers were
used, each containing ARGs found in the CARD: NC_003197.2, NC_010410.1, X58272.1,
NC_000962.3, NC_003112.2, NC_020088.1, NC_020418. The selected genomes are of different
genera and comprise ARGs with lengths ranging from 439 to 3205 nucleotides (Table S1).
Read length and insert size for paired-end synthetic reads were assigned based on the
characteristics of the sequenced wastewater samples. The read length distribution for both samples
was validated using a custom awk command on trimmed reads. BBMerge (extended to 100
bases with a kmer size of 62) in the BBTools package (BBMap; Bushnell B.;
http://sourceforge.net/projects/bbmap) was used to determine the insert size and standard
deviation of both overlapping and non-overlapping reads in the trimmed CAS and MBR paired-end
FASTQ files. Synthetic reads were then simulated using Grinder, version 0.5.4 (Angly et al.
2012). Grinder generated 150 bp paired-end reads with 1x coverage and a mean insert size and
standard deviation of 218 and 54, respectively. The remaining parameters were run at default
conditions.
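The custom awk command used for the read length check is not shown. As a rough equivalent, the following Python sketch tallies read lengths from FASTQ input. It assumes well-formed, uncompressed FASTQ with exactly four lines per record, and the function name read_length_distribution is hypothetical.

```python
from collections import Counter
from typing import Iterable


def read_length_distribution(fastq_lines: Iterable[str]) -> Counter:
    """Count how many reads have each length.

    Assumes four lines per FASTQ record, with the sequence on the
    second line of each record.
    """
    counts: Counter = Counter()
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1:  # sequence line of the record
            counts[len(line.rstrip("\r\n"))] += 1
    return counts


if __name__ == "__main__":
    example = [
        "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@read2\n", "ACGT\n", "+\n", "IIII\n",
    ]
    for length, count in sorted(read_length_distribution(example).items()):
        print(length, count)
```

Passing an open file object works the same way as the list in the example, because the function only iterates over lines.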
To better assess the effects of sequencing error on performance, synthetic reads were
generated with and without error. Illumina-error profiles were generated as recommended in
the instruction manual. Quality scores between 15 and 36 were assigned to each read in the
resulting FASTQ file according to error profile. The simulated metagenome reads without error
were generated with default quality scores. Synthetic reads were aligned with Bowtie2, BWA-
MEM, BLASTN, and BLASTX under default conditions as described below.

ARG Reference Databases
Reference genes (both nucleotide and protein sequences) from non-mutated CARD
version 1.18 (Jia et al. 2017) were downloaded and used for alignment. The nucleotide CARD is
composed of antibiotic resistance gene sequences in FASTA format consisting of various
antibiotic classes. It is 2,027,840 nucleotide bases in length and contains 2165 ARG sequences
imported from NCBI GenBank and peer-reviewed publications. The protein CARD is 671,057
amino acids in length and consists of 2165 protein sequences in FASTA format. Reference genes
are classified based on the CARD’s Antibiotic Resistance Ontology (ARO) (Jia et al. 2017).
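Summary figures such as the number of sequences and the total length of a reference can be checked with a few lines of Python. The sketch below is an illustration under the assumption of a plain, uncompressed FASTA file; the function name fasta_summary is hypothetical and no particular CARD file name is implied.

```python
from typing import Iterable, Tuple


def fasta_summary(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of sequences, total residues) for FASTA input."""
    n_sequences = 0
    total_residues = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_sequences += 1  # each header line starts a new record
        else:
            total_residues += len(line)  # sequence may span several lines
    return n_sequences, total_residues


if __name__ == "__main__":
    example = [">gene1", "ATGCATGC", "ATGC", ">gene2", "ATGAAA"]
    print(fasta_summary(example))
```

Applied to the nucleotide and protein references, a check of this kind would be expected to reproduce the sequence counts and lengths reported above, provided the same database version is used.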
Sequence Alignment and Performance Evaluation
Simulated metagenomes were analyzed using Bowtie2, BWA-MEM, and BLAST, tools for
aligning reads to reference sequences. Bowtie2 was operated using default settings (i.e. end-to-end
alignment and a minimum alignment score threshold of -90) for each metagenome
(Langmead and Salzberg 2012). BWA-MEM was operated using default settings (i.e. local
alignment) (Li 2013). BLASTN and BLASTX in the BLAST+ package version 2.6 were used for
aligning reads. BLAST tools were run using default settings (i.e. local alignment) with an E-value
of 1e-5.
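The minimum alignment score quoted for Bowtie2 follows from the read length. In end-to-end mode, Bowtie2's default minimum score is a linear function of read length (the --score-min L,-0.6,-0.6 setting), which for 150 bp reads gives approximately -90. The sketch below evaluates that function; the Python function name is illustrative.

```python
def min_score_end_to_end(
    read_length: int,
    intercept: float = -0.6,
    slope: float = -0.6,
) -> float:
    """Evaluate a linear minimum-score function: intercept + slope * length.

    The default coefficients correspond to Bowtie2's documented default
    for end-to-end alignment.
    """
    return intercept + slope * read_length


if __name__ == "__main__":
    # For the 150 bp reads used in this study the threshold is about -90.6.
    print(round(min_score_end_to_end(150), 1))
```

Because the threshold scales with read length, shorter trimmed reads are held to a proportionally smaller penalty budget than full-length 150 bp reads.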
Because a considerable amount of ambiguity is expected when mapping reads to the
CARD, the alignment methods used in this study were verified by first aligning simulated reads
back to their whole genomes. The performance of each alignment tool was then evaluated when
mapping simulated reads against the CARD. Since the true position of each read is known, custom
Python code was used to evaluate the following four performance criteria: correctly mapped
reads (reads aligned to the correct position on the genome or gene), partially mapped reads
(reads offset from their true position while retaining at least 20 true positive bases), false
positives (reads aligned to an incorrect position), and multi-reads. These
criteria were evaluated for Bowtie2, BWA-MEM, and BLASTN alignments. For BLASTX, reads mapped to
their respective nucleotide accession number when translated, with a percent identity of ≥ 90%
over at least 25 amino acids (aa) (Elbehery et al. 2016), were considered correctly mapped.
Reads mapped under the same criteria to an
incorrect target accession number were considered false positives. Multi-reads obtained during
BLASTX alignment follow the same definition as above. Partially mapped reads were not
considered during BLASTX alignment. Since multi-reads introduce uncertainty when analyzing
metagenomic data, in most cases, multi-reads were not favored.
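The four criteria can be sketched in Python as follows. This is not the custom script used in the study, which is not reproduced here. The order of the checks, the treatment of tied best scores as multi-reads, and the use of half-open coordinates are all assumptions made for illustration, and the names Alignment, overlap, and classify are hypothetical.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    ref: str      # reference sequence the read was aligned to
    start: int    # 0-based start of the alignment on the reference
    end: int      # end of the alignment (exclusive)
    score: float  # alignment score; higher is better


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Return the number of positions shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def classify(
    true_ref: str,
    true_start: int,
    true_end: int,
    alignments: List[Alignment],
    min_partial_overlap: int = 20,
) -> str:
    """Assign one simulated read to a performance category."""
    if not alignments:
        return "unmapped"
    best_score = max(a.score for a in alignments)
    best = [a for a in alignments if a.score == best_score]
    if len(best) > 1:
        return "multi-read"  # assumption: ties for the best score
    hit = best[0]
    if hit.ref != true_ref:
        return "false positive"
    if hit.start == true_start and hit.end == true_end:
        return "correct"
    shared = overlap(hit.start, hit.end, true_start, true_end)
    if shared >= min_partial_overlap:
        return "partial"  # offset, but at least 20 true positive bases
    return "false positive"
```

For example, a read simulated from positions 100 to 250 of a gene that is aligned to positions 200 to 350 of the same gene shares 50 bases with its true position and would be counted as partially mapped under these assumptions.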
The optimal alignment approach given the performance criteria was used for aligning the
wastewater samples against the CARD. Genes meeting a threshold value of ≥ 90.00% nucleotide
gene similarity (Kristiansson et al. 2011; Zhang et al. 2015b) or ≥ 90.00% over at least 25 aa
were clustered together in the appropriate ARG class.
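Grouping filtered genes by ARG class and reporting relative abundance can be sketched as below. The mapping from gene to class would come from the CARD's Antibiotic Resistance Ontology; here it is represented by a plain dictionary, and the gene labels in the example are placeholders rather than results from this study. The function name relative_abundance is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(
    gene_hits: Iterable[str],
    gene_to_class: Dict[str, str],
) -> Dict[str, float]:
    """Return the percentage of hits that fall into each ARG class.

    Genes missing from the mapping are grouped as "unclassified".
    """
    counts = Counter(gene_to_class.get(gene, "unclassified") for gene in gene_hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {arg_class: 100.0 * n / total for arg_class, n in counts.items()}


if __name__ == "__main__":
    # Placeholder gene labels and classes, for illustration only.
    mapping = {"geneA": "beta-lactam", "geneB": "aminoglycoside"}
    hits = ["geneA", "geneA", "geneB", "geneX"]
    for arg_class, pct in sorted(relative_abundance(hits, mapping).items()):
        print(arg_class, pct)
```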
Statistical Analysis
Significant differences in performance criteria between aligners and in the abundance of ARG
classes between sewage samples were determined by a one-way analysis of variance (ANOVA)
test in SPSS 24.0, where P < 0.05 was considered statistically significant. Error rates were
retrieved directly from each sample’s Binary Alignment/Map (BAM) output file using
SAMtools. All quality scores reported follow the Phred scale (Ruffalo et al. 2011).
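For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be computed by hand as in the sketch below. The study used SPSS; this pure-Python version is only an illustration of the calculation, it does not compute the P value (which requires the F distribution), and the function name one_way_anova_f is hypothetical.

```python
from typing import List, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F statistic, between-group df, within-group df)."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    means: List[float] = [sum(group) / len(group) for group in groups]
    grand_mean = sum(sum(group) for group in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, means)
        for value in group
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up numbers, used only to show the call.
    print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A large F value indicates that the group means differ by more than would be expected from the spread within groups; the corresponding P value is then compared with the 0.05 threshold stated above.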

Acknowledgments
We would like to thank the managers at the East Lansing and Traverse City WWTPs for
providing samples. Thank you to Bioinformatics Research Specialists Dr. Tracy Teal and
Dharanya Sampath for technical support. Special thanks to Dr. Shin-Han Shiu for his guidance,
and Peipei Wang for providing custom Python scripts.
Conflict of Interest
No conflict of interest declared.
References
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J.H., Zhang, Z., Miller, W. and Lipman, D.J.
(1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Nucleic Acids Res 25, 3389-3402.
Andrews S. (2010) FastQC: A quality control tool for high throughput sequence data.
Angly, F.E., Willner, D., Rohwer, F., Hugenholtz, P. and Tyson, G.W. (2012) Grinder: a versatile
amplicon and shotgun sequence simulator. Nucleic Acids Res 40, e94-e94.
Bolger, A.M., Lohse, M. and Usadel, B. (2014) Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics 30, 2114-2120.
Elbehery, A.H.A., Aziz, R.K. and Siam, R. (2016) Antibiotic Resistome: Improving Detection and
Quantification Accuracy for Comparative Metagenomics. OMICS 20, 229-238.

Garner, E., Wallace, J.S., Argoty, G.A., Wilkinson, C., Fahrenfeld, N., Heath, L.S., Zhang, L.Q., Arabi,
M., Aga, D.S. and Pruden, A. (2016) Metagenomic profiling of historic Colorado Front Range
flood impact on distribution of riverine antibiotic resistance genes. Sci Rep 6.
Gupta, S.K., Padmanabhan, B.R., Diene, S.M., Lopez-Rojas, R., Kempf, M., Landraud, L. and Rolain,
J.M. (2014) ARG-ANNOT, a New Bioinformatic Tool To Discover Antibiotic Resistance Genes in
Bacterial Genomes. Antimicrob Agents and Chemother 58, 212-220.
Hatem, A., Bozdag, D., Toland, A.E. and Catalyurek, U.V. (2013) Benchmarking short sequence
mapping tools. BMC Bioinformatics 14.
Jia, B., Raphenya, A.R., Alcock, B., Waglechner, N., Guo, P., Tsang, K.K., Lago, B.A., Dave, B.M.,
Pereira, S., Sharma, A.N., Doshi, S., Courtot, M., Lo, R., Williams, L.E., Frye, J.G., Elsayegh, T.,
Sardar, D., Westman, E.L., Pawlowski, A.C., Johnson, T.A., Brinkman, F.S.L., Wright, G.D. and
McArthur, A.G. (2017) CARD 2017: expansion and model-centric curation of the comprehensive
antibiotic resistance database. Nucleic Acids Res 45, D566-D573.
Kristiansson, E., Fick, J., Janzon, A., Grabic, R., Rutgersson, C., Weijdegard, B., Soderstrom, H. and
Larsson, D.G.J. (2011) Pyrosequencing of Antibiotic-Contaminated River Sediments Reveals
High Levels of Resistance and Gene Transfer Elements. PLoS One 6.
Langmead, B. and Salzberg, S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods
9, 357-359.
Li H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM.
arXiv:1303.3997v1 [q-bio.GN].
Luby, E., Ibekwe, A.M., Zilles, J. and Pruden, A. (2016) Molecular Methods for Assessment of
Antibiotic Resistance in Agricultural Ecosystems: Prospects and Challenges. J of Environ Qual
45, 441-453.

Manaia, C.M., Macedo, G., Fatta-Kassinos, D. and Nunes, O.C. (2016) Antibiotic resistance in
urban aquatic environments: can it be controlled? Appl Microbiol Biotechnol 100, 1543-1557.
Ruffalo, M., LaFramboise, T. and Koyuturk, M. (2011) Comparative analysis of algorithms for
next-generation sequencing read alignment. Bioinformatics 27, 2790-2796.
Schmieder, R. and Edwards, R. (2012) Insights into antibiotic resistance through metagenomic
approaches. Future Microbiol 7, 73-89.
Sottani, C., Islam, K., Soffientini, A., Zerilli, L.F. and Seraglia, R. (1993) Studies on the Interaction
of Elfamycin Antibiotics with Elongation Factor-Tu by Mass Spectroscopic Techniques. Rapid
Commun Mass Spectrom 7, 680-683.
Subirats, J., Sanchez-Melsio, A., Borrego, C.M., Balcazar, J.L. and Simonet, P. (2016) Metagenomic
analysis reveals that bacteriophages are reservoirs of antibiotic resistance genes. Int J of
Antimicrob Agents 48, 163-167.
Treangen, T.J. and Salzberg, S.L. (2012) Repetitive DNA and next-generation sequencing:
computational challenges and solutions. Nat Rev Genet 13, 36-46.
Yang, Y., Li, B., Zou, S., Fang, H.H.P. and Zhang, T. (2014) Fate of antibiotic resistance genes in
sewage treatment plant revealed by metagenomic approach. Water Res 62, 97-106.
Ye, H., Meehan, J., Tong, W. and Hong, H. (2015) Alignment of Short Reads: A Crucial Step for
Application of Next-Generation Sequencing Data in Precision Medicine. Pharmaceutics 7, 523-
541.
Yu, X., Guda, K., Willis, J., Veigl, M., Wang, Z., Markowitz, S., Adams, M.D. and Sun, S. (2012) How
do alignment programs perform on sequencing data with varying qualities and from repetitive
regions? Biodata Min 5.

Zhang, S.H., Han, B., Gu, J., Wang, C., Wang, P.F., Ma, Y.Y., Cao, J.S. and He, Z.L. (2015a) Fate of
antibiotic resistant cultivable heterotrophic bacteria and antibiotic resistance genes in
wastewater treatment processes. Chemosphere 135, 138-145.
Zhang, T., Yang, Y. and Pruden, A. (2015b) Effect of temperature on removal of antibiotic
resistance genes by anaerobic digestion of activated sludge revealed by metagenomic
approach. Appl Microbiol Biotechnol 99, 7771-7779.
Supporting Information
Table S1 Summary of genomes and ARGs contained in the simulated bacterial metagenome.
Percent identity represents the BLASTN hit against the NCBI nucleotide database (nr/nt) with an
E-value of 0 over 100% of the query sequence. Each ARG sequence was extracted directly from the
CARD. All genome and gene annotations were extracted from NCBI and the CARD’s Antibiotic
Resistance Ontology.
Figure 1 Relative abundance of antibiotic resistance classes in the (a) CAS and (b) MBR
wastewater samples when aligned with BLASTN against the CARD.

Table 1 Alignment statistics of simulated bacterial metagenome reads with and without
Illumina-errors when aligned with Bowtie2, BWA-MEM, and BLASTN against their whole
genomes.

                              Bowtie2           BWA-MEM           BLASTN
No. of Mapped reads (%)
  With error                  128864 (100)      128864 (100)      128484 (99.71)
  No error                    128864 (100)      128864 (100)      128633 (99.82)
No. Correct (%)
  With error                  123946 (96.18)    107837 (83.68)    120788 (94.01)
  No error                    124129 (96.33)    126988 (98.54)    120260 (93.49)
No. of Partials (%)
  With error                  275 (0.21)        18993 (14.74)     0
  No error                    2 (~ 0)           4 (~ 0)           0
No. Multi-reads (%)
  With error                  4637 (3.60)       0                 7696 (5.99)
  No error                    4724 (3.67)       0                 8373 (6.51)
No. of False Positives (%)
  With error                  6 (~ 0)           2034 (1.58)       0
  No error                    9 (0.01)          1872 (1.45)       0

Table 2 Alignment statistics of simulated bacterial metagenome reads with and without
Illumina-errors when aligned with Bowtie2, BWA-MEM, BLASTN, and BLASTX against the
CARD. BLASTX was evaluated based on the following criteria: target genes obtaining a percent
identity of ≥ 90% over at least 25 amino acids with the read mapped to the correct target NCBI
accession number.

                              Bowtie2        BWA-MEM        BLASTN         BLASTX
No. of Mapped reads (%)
  With error                  456 (0.35)     802 (0.62)     509 (0.39)     1090 (0.85)
  No error                    532 (0.41)     1118 (0.87)    619 (0.48)     2340 (1.82)
No. Correct (%)
  With error                  84 (18.42)     75 (9.35)      103 (20.24)    82 (7.52)
  No error                    88 (16.54)     111 (9.93)     137 (22.13)    139 (5.94)
No. of Partials (%)
  With error                  12 (2.63)      39 (4.86)      11 (2.16)      -
  No error                    14 (2.63)      15 (1.34)      6 (0.97)       -
No. Multi-reads (%)
  With error                  68 (14.91)     9 (1.12)       100 (19.65)    681 (62.48)
  No error                    90 (16.92)     9 (0.81)       131 (21.16)    1676 (71.62)
No. of False Positives (%)
  With error                  292 (64.04)    679 (84.66)    295 (57.96)    327 (30)
  No error                    340 (63.91)    983 (87.92)    345 (55.74)    525 (22.44)

Table 3 Process characteristics for East Lansing and Traverse City wastewater treatment
plants.

                                East Lansing WWTP                      Traverse City WWTP
Wastewater treatment process    Conventional Activated Sludge (CAS)    Membrane Biological Reactor (MBR)
  (biological treatment)
Sludge Retention Time (SRT)     14 days                                7.58 days
Capacity                        18.8 MGD                               17.0 MGD
Avg. Flow                       13.4 MGD                               8.5 MGD
Discharge Rate                  14.1 MGD                               4.0 MGD
Disinfection                    Chlorine (Cl)                          Ultraviolet (UV)
Table 4 Quality control analysis results on raw and post quality trimmed sequence reads using
FastQC.

                                               Raw                              Quality Trimmed
Sample                         Sample          Sequences Per   Avg. GC          Sequences Per   Avg. GC
                               Abbreviation    Paired-end      Content (%)      Paired-end      Content (%)
Conventional Activated Sludge  CAS             13115140        52               10747794        52
Membrane Bioreactor            MBR             12478608        58               9779513         58
