Article type : Original Article

Accepted Article
Comparative Study of Sequence Aligners for Detecting Antibiotic Resistance in Bacterial Metagenomes

Camille McCall¹#, Irene Xagoraraki¹

¹Department of Civil and Environmental Engineering, Michigan State University, East Lansing, Michigan, USA

#Address correspondence to Camille McCall, email: mccallca@msu.edu; phone: (248) 346-2359

Running Headline: Antibiotic Resistance in Metagenomes

Significance and Impact of the Study

Antibiotic resistance genes (ARGs) are pollutants known to persist in wastewater treatment plants, among other environments; methods for detecting these genes have therefore become increasingly relevant. Next-generation sequencing has brought about a host of sequence alignment tools that provide a comprehensive look into antimicrobial resistance in environmental samples. However, standardizing practices in ARG metagenomic studies is challenging because the results produced by different alignment tools can vary significantly. Our study provides sequence alignment results for synthetic and authentic bacterial metagenomes mapped against an ARG database using multiple alignment tools, as well as the best practice identified for detecting ARGs in environmental samples.

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/lam.12842

This article is protected by copyright. All rights reserved.


Abstract

We aim to compare the performance of Bowtie2, BWA-MEM, BLASTN, and BLASTX when aligning bacterial metagenomes against the Comprehensive Antibiotic Resistance Database (CARD). Simulated reads were used to evaluate each aligner against four performance criteria: correctly mapped reads, false positives, multi-reads, and partially mapped reads. The optimal alignment approach was then applied to next-generation sequencing data from two wastewater treatment plants to detect antibiotic resistance genes (ARGs). Among the four alignment approaches considered, BLASTN mapped reads with the greatest accuracy, followed by Bowtie2. BLASTX generated the greatest number of false positives and multi-reads when aligned against the CARD. The performance of each alignment tool was also investigated using error-free reads. Although each aligner mapped more error-free reads than reads with Illumina errors, the introduction of sequencing errors generally had little effect on alignment results against the CARD. Across all performance criteria, BLASTN was found to be the most favorable alignment tool and was therefore used to assess resistance genes in sewage samples. Beta-lactam and aminoglycoside resistance genes were the most abundant ARG classes in each sample.

Keywords: Antibiotic resistance, Metagenomics, Bowtie2, BWA-MEM, BLAST, Alignment



Introduction

It has been reported that pathogens commonly known to persist in hospital settings develop resistance within the first few years after a new antibiotic is introduced (Zhang et al. 2015a). Such resistant pathogens escape controlled settings, most commonly through hospital waste streams, and are released into the environment. Additionally, antibiotic use is considerably greater in livestock operations than in human therapeutics. Tetracyclines, for example, are administered to livestock at subtherapeutic levels to promote growth and prevent disease. This may further facilitate the dissemination of resistance genes in the environment through the land application of manure and wastewater effluent (Manaia et al. 2016; Schmieder and Edwards 2012). Consequently, antibiotic resistance genes (ARGs) have become an emerging environmental pollutant known to occur in ecosystems such as soils, surface waters, and wastewater treatment plants (WWTPs) (Manaia et al. 2016). Several techniques, namely culturing, quantitative polymerase chain reaction (qPCR), and microarrays, have been used to detect ARGs in the environment (Luby et al. 2016). Although these techniques have made it possible to detect ARGs, they are insufficient for identifying novel ARGs or a broad spectrum of these genes within microbial communities, owing to the limited culturability of many organisms and the limited availability of primers.

Next-generation sequencing (NGS) is culture-independent and allows for the analysis of metagenomes, which consist of all genetic material contained in a sample. NGS platforms such as Illumina, PacBio, and SOLiD sequence libraries of genomic fragments that are later processed for interpretation. A major stage in the post-sequencing process is the alignment of sequence fragments (or “reads”) against reference sequences (Langmead and Salzberg 2012; Hatem et al. 2013). The Comprehensive Antibiotic Resistance Database (CARD) is a widely used reference database in recent metagenomic ARG studies (Elbehery et al. 2016; Garner et al. 2016; Subirats et al. 2016).



A number of sequence aligners have been developed for aligning reads against reference databases. The Basic Local Alignment Search Tool (BLAST) is widely used for detecting antibiotic resistance in metagenomic studies (Yang et al. 2014; Zhang et al. 2015b; Elbehery et al. 2016; Subirats et al. 2016). BLAST takes a heuristic approach that allows rapid alignment of query sequences against massive reference databases such as GenBank (Altschul et al. 1997). In addition to BLAST, two popular sequence aligners, Bowtie2 (Langmead and Salzberg 2012) and BWA-MEM (Li 2013), search for candidate alignments for each query sequence by indexing the reference, finding exact seed matches (for example, maximal exact matches in BWA-MEM), and re-seeding.
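To make the idea of "index the reference, then look up exact seed matches" concrete, the following minimal Python sketch builds a k-mer hash index of a reference and reports exact seed hits for a read. It is a toy illustration under our own assumptions (fixed k, exact-match seeds only, no extension or re-seeding); Bowtie2 and BWA-MEM actually seed with FM-index structures, and the function names here are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def find_seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (read_offset, candidate_reference_start) pairs for exact k-mer seeds.

    candidate_reference_start is where the whole read would begin on the
    reference if it aligned without gaps through that seed (its 'diagonal').
    """
    hits = []
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            hits.append((offset, ref_pos - offset))
    return hits


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGTTTGACCA", k=4)
    # All three seeds agree on candidate start 5, the most promising place to extend.
    print(find_seed_hits("CGTTTG", idx, k=4))
```

Candidate starts supported by several seeds are the ones a real aligner would go on to extend and score.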

Several studies have evaluated aligners using either synthetic data or authentic data derived from, for example, the human genome (Ruffalo et al. 2011; Langmead and Salzberg 2012; Hatem et al. 2013; Li 2013). Fewer studies have analyzed sequence aligners using bacterial genomes (Li 2013), and to our knowledge no studies have assessed the alignment of bacterial metagenomes against ARG reference databases. A number of factors influence the behavior of sequence aligners, including parameter settings, the genetic data, and the sequencing platform (Hatem et al. 2013; Elbehery et al. 2016). Sequence aligners face the task of assigning multi-reads (reads that align equally well to more than one position on a reference genome; Treangen and Salzberg 2012) to the correct position, which can result in errors when only the best hits are reported. With metagenomic datasets the problem is aggravated because the volume of data increases dramatically. Errors generated by the sequencing platform introduce further ambiguity when analyzing sequencing data.
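A minimal Python sketch of how multi-reads might be flagged from tabular alignment output. It assumes BLAST+ tabular output (-outfmt 6, in which column 1 is the query ID, column 2 the subject ID, and column 12 the bit score) and treats a read as a multi-read when two or more distinct subjects tie for its best bit score; this tie rule is our illustrative assumption, not necessarily the criterion used in this study.

```python
def find_multi_reads(blast_tab_lines):
    """Return IDs of reads whose best bit score is shared by more than one subject.

    Expects BLAST+ -outfmt 6 lines: column 1 = query (read) ID,
    column 2 = subject (reference gene) ID, column 12 = bit score.
    """
    # read ID -> (best bit score seen so far, set of subjects with that score)
    best = {}
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        read_id, subject_id, bitscore = fields[0], fields[1], float(fields[11])
        top_score, subjects = best.get(read_id, (float("-inf"), set()))
        if bitscore > top_score:
            best[read_id] = (bitscore, {subject_id})
        elif bitscore == top_score:
            # A set, so several HSPs on the same subject still count once.
            subjects.add(subject_id)
    return {read for read, (_, subjects) in best.items() if len(subjects) > 1}


if __name__ == "__main__":
    # Invented hits for illustration only (not data from this study).
    hits = [
        "read1\tgeneA\t100.0\t150\t0\t0\t1\t150\t1\t150\t1e-70\t277",
        "read1\tgeneB\t100.0\t150\t0\t0\t1\t150\t1\t150\t1e-70\t277",
        "read2\tgeneA\t98.0\t150\t3\t0\t1\t150\t1\t150\t1e-65\t260",
        "read2\tgeneC\t91.3\t150\t13\t0\t1\t150\t1\t150\t1e-40\t190",
    ]
    print(find_multi_reads(hits))  # {'read1'}
```

Here read2 has two hits but a single best one, so a best-hit-only report would assign it unambiguously; read1 cannot be placed without additional information.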

To assess the performance of sequence aligners in ARG metagenomic studies, we aligned two simulated ARG bacterial metagenomes (one without errors and one with simulated sequencing errors) against the CARD. Bowtie2, BWA-MEM, BLASTN, and BLASTX were chosen because of their frequent appearance in metagenomic studies, their compatibility with Illumina data, and their ability to handle reads of various lengths. Furthermore, real bacterial DNA extracted from two WWTPs and sequenced on an Illumina platform was aligned against the CARD using the optimal alignment approach, as determined by the specified performance criteria. We aim to (1) assess the performance of BWA-MEM, Bowtie2, BLASTN, and BLASTX when aligning a simulated bacterial metagenome against an ARG database, and (2) apply the optimal alignment approach to empirical sewage samples and report the relative abundance of ARG classes in each sample. The results presented here will help researchers decide which tool to use in ARG metagenomic studies.

Results and Discussion

Alignment of Simulated Bacterial Metagenomes

To validate each alignment approach, simulated reads were first mapped back to their whole genomes (Table 1). Against whole genomes, Bowtie2 and BWA-MEM aligned 100% of reads with and without sequencing errors, while BLASTN aligned 99.71% and 99.82% of reads with and without errors, respectively. For error-free reads, BWA-MEM mapped with the greatest accuracy, followed by Bowtie2 and BLASTN. However, although BWA-MEM mapped error-free reads more accurately than Bowtie2 and BLASTN, its accuracy dropped markedly for reads with sequencing errors (from 98.54% to 83.68% correctly mapped reads). Our findings therefore suggest that BWA-MEM is more sensitive to discrepancies between the reference and query sequences caused by sequencing errors or allelic variation.
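The percentages in Table 1 can be checked with simple arithmetic. Recomputing them indicates (our inference from the numbers, not a statement in the text) that the mapped-read percentage is taken over the 128,864 reads that Bowtie2 and BWA-MEM mapped at 100%, presumably all simulated reads, whereas the correct, partial, multi-read, and false-positive percentages are taken over the reads each aligner actually mapped. A short Python check for BLASTN with Illumina errors:

```python
TOTAL_SIMULATED_READS = 128_864  # mapped at 100% by Bowtie2 and BWA-MEM in Table 1


def pct(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the tables."""
    return round(100 * part / whole, 2)


# BLASTN, reads with Illumina errors (Table 1)
blastn_mapped = 128_484
blastn_correct = 120_788
blastn_multi = 7_696

print(pct(blastn_mapped, TOTAL_SIMULATED_READS))   # 99.71: mapped vs. all simulated reads
print(pct(blastn_correct, blastn_mapped))          # 94.01: correct vs. reads BLASTN mapped
print(pct(blastn_correct, TOTAL_SIMULATED_READS))  # 93.73: does not match the reported value
print(pct(blastn_multi, blastn_mapped))            # 5.99
```

Because the denominators differ between aligners, the per-category percentages describe the quality of each aligner's output rather than the share of all simulated reads.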

When aligning error-containing reads against whole genomes, Bowtie2 yielded the highest alignment error rate (3.51%), followed by BWA-MEM (2.60%). Conversely, when aligning error-free reads, BWA-MEM generated an error rate of 5.17% and Bowtie2 an error rate of 3.98%. All BLAST results were evaluated with an E-value threshold of 1e-5. Because no protein reference was available for the whole genomes, BLASTX was not included in this preliminary analysis. The high accuracy of aligned reads with and without errors across all methods indicates that the alignment approaches generally performed well.

To evaluate the performance of each sequence aligner when mapping bacterial metagenomes to an ARG reference, simulated reads were aligned against the CARD. BLASTX mapped the highest number of reads. Each alignment tool mapped more reads when the reads were free of sequencing errors. BLASTN obtained the greatest percentage of correctly mapped reads, followed by Bowtie2 (Table 2). While BLASTX produced the lowest percentage of false positives, it generated a substantial number of multi-reads. BWA-MEM produced the fewest multi-reads, and there was no significant difference in the percentage of multi-reads between BLASTN and Bowtie2 (P = 0.08). Bowtie2 generated 6.08 and 8.17 percentage points more false positives than BLASTN when aligning reads with and without errors, respectively. When aligning simulated data against the CARD, Bowtie2 yielded the highest error rate for reads with and without errors (5.42% and 2.47%, respectively), followed by BWA-MEM (4.30% and 2.05%, respectively).
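The false-positive gaps between Bowtie2 and BLASTN quoted above (6.08 and 8.17) are differences between two percentages, each taken relative to the reads that aligner mapped in Table 2, i.e. percentage points rather than relative increases. A quick Python check of that arithmetic using the Table 2 counts:

```python
def pct(part: int, whole: int) -> float:
    """Unrounded percentage of part in whole."""
    return 100 * part / whole


# (false positives, mapped reads) from Table 2
false_positives = {
    ("Bowtie2", "with error"): (292, 456),
    ("BLASTN", "with error"): (295, 509),
    ("Bowtie2", "no error"): (340, 532),
    ("BLASTN", "no error"): (345, 619),
}

for condition in ("with error", "no error"):
    bowtie2 = pct(*false_positives[("Bowtie2", condition)])
    blastn = pct(*false_positives[("BLASTN", condition)])
    gap = round(bowtie2 - blastn, 2)
    print(f"{condition}: Bowtie2 {bowtie2:.2f}% vs BLASTN {blastn:.2f}% -> {gap} percentage points")
# with error: Bowtie2 64.04% vs BLASTN 57.96% -> 6.08 percentage points
# no error: Bowtie2 63.91% vs BLASTN 55.74% -> 8.17 percentage points
```

In absolute counts the two aligners are nearly identical (292 vs 295 and 340 vs 345 false positives); the gap arises because BLASTN mapped more reads overall.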

Alignment of simulated bacterial metagenomes against the CARD revealed that BLASTN had slightly greater accuracy than Bowtie2, BWA-MEM, and BLASTX. Although some false positives are expected, each alignment tool generated a relatively large proportion of false positives when mapping against the CARD compared with mapping against whole genomes. This could be attributed to sequencing errors, which can complicate the alignment process and result in incorrect alignments, particularly in metagenomic data (Elbehery et al. 2016). BWA-MEM and BLASTN mapped reads without sequencing errors with marginally greater accuracy than reads containing sequencing errors. In most cases, the percentage of multi-reads decreased when aligning reads with sequencing errors rather than reads without. The percentage of false positives was slightly lower for error-free reads only during alignment with BLAST. Repetitive regions can also have a substantial impact on the number of false alignments and multi-reads (Treangen and Salzberg 2012; Yu et al. 2012). While this may be the case for BLASTX alignment, further analysis is needed to draw a definite conclusion. Despite the slight variations between error-free and Illumina-error reads, the performance of the aligners when mapping against the CARD (i.e. the numbers of mapped, correct, partial, multi-, and false-positive reads) remained largely unchanged when sequencing errors were introduced (P = 0.05-0.57).

Alignment and Survey of ARGs detected in Wastewater Metagenomes

Quality analysis of the trimmed sequences revealed a mean quality score of 36 for both samples (Table 4). Wastewater samples were aligned against the CARD using BLASTN with an E-value of 1e-5; the remaining parameters were kept at their default settings. Mapped reads were retained only if they met a gene similarity threshold of ≥ 90% over 150 bp.
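A minimal Python sketch of the hit filter described above, assuming BLAST+ tabular output (-outfmt 6: column 3 = percent identity, column 4 = alignment length, column 11 = E-value). Reading "≥ 90% over 150 bp" as a percent identity of at least 90 over an alignment of at least 150 bp is our interpretation; the function name, defaults, and example hits are illustrative.

```python
def passes_threshold(line: str, min_identity: float = 90.0,
                     min_length: int = 150, max_evalue: float = 1e-5) -> bool:
    """True if a BLAST+ -outfmt 6 hit meets the identity, length, and E-value cut-offs."""
    fields = line.rstrip("\n").split("\t")
    identity = float(fields[2])    # percent identity
    aln_length = int(fields[3])    # alignment length
    evalue = float(fields[10])     # expect value
    return identity >= min_identity and aln_length >= min_length and evalue <= max_evalue


if __name__ == "__main__":
    # Invented hits for illustration only.
    example_hits = [
        "read1\tgeneX\t95.33\t150\t7\t0\t1\t150\t101\t250\t2e-60\t233",   # kept
        "read2\tgeneX\t88.00\t150\t18\t0\t1\t150\t101\t250\t1e-40\t170",  # identity too low
        "read3\tgeneY\t99.17\t120\t1\t0\t1\t120\t5\t124\t1e-55\t217",     # alignment too short
    ]
    kept = [hit.split("\t")[0] for hit in example_hits if passes_threshold(hit)]
    print(kept)  # ['read1']
```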

Using BLASTN, 256 and 300 distinct ARGs met the threshold conditions in the CAS and MBR sewage samples, respectively, each with an E-value below 1e-50. β-lactam resistance genes were the most abundant in each sample, followed by aminoglycoside resistance genes. Low counts of ARGs belonging to the elfamycin, glycopeptide, and polymyxin antibiotic classes were detected in the MBR sample but not in the CAS sample (Figure 1). The Streptomyces cinnamoneus tuf gene (NCBI accession no. X98831), which confers resistance to elfamycins, was detected in the MBR sample. Elfamycins are a class of naturally occurring antibiotics that inhibit bacterial growth by binding to the elongation factor Tu polypeptide, a component of the bacterial protein synthesis machinery (Sottani et al. 1993). To our knowledge, no recent studies on the prevalence of elfamycin resistance genes in sewage treatment plants have been documented; our results suggest the occurrence of this gene in the MBR sample. There was no significant difference in the abundance of antibiotic classes between the sewage samples (P = 0.88).
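Relative abundance, as plotted in Figure 1, can be computed as each class's share of all reads assigned to an ARG class in a sample. The Python sketch below assumes that definition (the normalization behind Figure 1 is not stated in the text) and uses invented class assignments purely for illustration; they are not results from this study.

```python
from __future__ import annotations

from collections import Counter


def relative_abundance(class_per_read: list[str]) -> dict[str, float]:
    """Fraction of ARG-assigned reads falling in each antibiotic class, most abundant first."""
    counts = Counter(class_per_read)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical class assignments for five filtered reads (not data from this study).
    reads = ["beta-lactam", "aminoglycoside", "beta-lactam", "tetracycline", "beta-lactam"]
    print(relative_abundance(reads))
    # {'beta-lactam': 0.6, 'aminoglycoside': 0.2, 'tetracycline': 0.2}
```

Normalizing within each sample makes the two WWTPs comparable even though they yielded different numbers of trimmed reads (Table 4).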
Limitations on Sequence Alignment Results

The results of this analysis reflect only the specific aligner parameters, references, and samples used in this study. Researchers may choose to adjust aligner parameters depending on the characteristics of their data, the biological application, and sensitivity requirements. Here, we report results from aligning simulated and real bacterial metagenomes against an ARG reference database with several aligners under default conditions.

Because sequencing-based ARG analysis is most accurate when identifying known genes (Schmieder and Edwards 2012), the ARGs detected in each WWTP only suggest the presence of these genes in the samples investigated. When possible, traditional biological detection methods are recommended for verifying genes identified using sequence aligners.

This study evaluates the performance of the Bowtie2, BWA-MEM, and BLAST sequence alignment tools in metagenomic ARG analyses. It also highlights sequencing errors as a potential factor that can interfere with accurately detecting ARGs in bacterial metagenomes using sequence aligners. In the simulated metagenomes, BLASTN reported the greatest percentage of accurate alignments, followed by Bowtie2. BLASTN aligned more reads than Bowtie2 and maintained a lower percentage of false positives than Bowtie2 and BWA-MEM. Therefore, BLASTN was selected as the aligner of choice in this study. Each tool involves a trade-off between specificity and sensitivity. To gain more insight into the performance of sequence aligners in ARG metagenomic analyses, further studies with varying sequencing tools and aligner parameters are warranted.
Materials and methods

Wastewater Sample Collection

Sewage samples were collected in 2013 from two WWTPs in Michigan (U.S.A.): the East Lansing conventional activated sludge (CAS) plant and the Traverse City membrane bioreactor (MBR) plant. The characteristics of these WWTPs are shown in Table 3. Samples presented in this study were taken directly after the disinfection process at each treatment utility. Briefly, a 2 L grab sample of effluent was collected in sterile Nalgene bottles from each WWTP. Samples were mixed, stored on ice, and transported to the laboratory for further processing.

Wastewater Sample Processing and Filtration

Bacteria were recovered using a standard filtration technique with 0.45 μm HA filters (Millipore, Billerica, MA). One liter of each sample was filtered. The filters were placed in sterile 50 ml polypropylene tubes, and 50 ml of phosphate-buffered saline (1X PBS) was added to each tube. The tubes were vortexed for 5 min to resuspend the biomass layer from the filters. Both tubes were centrifuged for 20 min at 2309 × g to concentrate each sample down to 2 ml. The supernatant was discarded, and the concentrates were stored at −80°C until DNA extraction.



Nucleic Acid Extraction

DNA extraction was performed using a MagNA Pure Compact DNA extractor (Roche Applied Science, Indianapolis, IN, USA) following the protocol in the manufacturer’s manual. The MagNA Pure Compact uses magnetic-bead technology for the isolation process. A sample volume of 400 μl was loaded into the system, and the elution volume was 100 μl. The purified DNA was stored at −20°C. DNA concentration was determined using a NanoDrop spectrophotometer (NanoDrop® ND-1000, Wilmington, DE).

High-throughput Sequencing and Preprocessing

DNA samples were isolated, and approximately 1 μg of DNA per sample was sent to the Research Technology Support Facility (RTSF) at Michigan State University. The NuGEN Ovation Ultralow Library System, which requires an input of 1-100 ng of DNA, was used for both samples to accommodate any sample containing little genetic material. After preparation, libraries were sequenced on an Illumina HiSeq 2500 platform, generating 150 bp paired-end reads.

Sequences were returned as R1.FASTQ and R2.FASTQ files for each sample, where R1 and R2 constitute a read pair. Each FASTQ file was processed on a Unix/Linux system provided by the MSU High Performance Computing Center (HPCC). Raw sequences were checked for quality using FastQC, a quality control tool for sequencing data (Andrews 2010). Based on the quality control check, Illumina adapters and reads with an average quality score below 15 were removed using Trimmomatic (Bolger et al. 2014). Finally, FastQC was run once more on the quality-trimmed reads to confirm the integrity of the reads before the subsequent sequence alignment steps.
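Trimmomatic performed the filtering in this study. Purely to illustrate the average-quality rule and the Phred scale, the Python sketch below decodes Phred+33 quality strings (the Illumina 1.8+ encoding, which we assume applies to these files) and drops reads whose mean Phred score is below 15. It performs no adapter clipping, is not a substitute for Trimmomatic, and its function names are hypothetical.

```python
from __future__ import annotations


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def error_probability(q: float) -> float:
    """Phred scale: Q = -10 * log10(P_error), so P_error = 10 ** (-Q / 10).

    For example, Q15 corresponds to an error probability of about 3.2%.
    """
    return 10 ** (-q / 10)


def filter_fastq(lines: list[str], min_avg_quality: float = 15.0):
    """Yield (header, sequence, plus, quality) records with mean Phred >= min_avg_quality.

    Assumes complete, unwrapped 4-line FASTQ records.
    """
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, plus, qual = stripped[i:i + 4]
        if qual and mean_phred(qual) >= min_avg_quality:
            yield header, seq, plus, qual


if __name__ == "__main__":
    fastq = ["@read1", "ACGT", "+", "IIII",   # 'I' = Q40
             "@read2", "ACGT", "+", "####"]   # '#' = Q2
    print([record[0] for record in filter_fastq(fastq)])  # ['@read1']
```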



Simulated Bacterial Metagenome

A synthetic bacterial metagenome was constructed from seven complete sequences, identified as whole genomes or mobile genetic elements (MGEs), retrieved from NCBI’s nucleotide database. Sequences with the following NCBI accession numbers, all of which contain ARGs present in the CARD, were used: NC_003197.2, NC_010410.1, X58272.1, NC_000962.3, NC_003112.2, NC_020088.1, NC_020418. The selected genomes belong to different genera and contain ARGs ranging from 439 to 3205 nucleotides in length (Table S1).

Read length and insert size for the paired-end synthetic reads were assigned based on the characteristics of the sequenced wastewater samples. The read length distribution of both samples was validated using a custom awk command on the trimmed reads. BBMerge (extended to 100 bases with a k-mer size of 62) from the BBTools package (BBMap, Bushnell B., http://sourceforge.net/projects/bbmap) was used to determine the insert size and its standard deviation for both overlapping and non-overlapping reads in the trimmed CAS and MBR paired-end FASTQ files. Synthetic reads were then simulated using Grinder version 0.5.4 (Angly et al. 2012). Grinder generated 150 bp paired-end reads at 1x coverage, with a mean insert size of 218 and a standard deviation of 54. All other parameters were left at their default settings.
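The custom awk command and the BBMerge invocation are not reproduced in the text. As an illustrative stand-in (not the authors' commands), the Python sketch below tallies read lengths from a FASTQ file and summarizes a list of insert sizes as a mean and sample standard deviation, the two values passed to Grinder. It assumes unwrapped 4-line FASTQ records, and the example data are invented.

```python
from __future__ import annotations

import statistics
from collections import Counter


def read_length_distribution(fastq_lines: list[str]) -> Counter:
    """Count read lengths from 4-line FASTQ records (the sequence is line 2 of each record)."""
    return Counter(len(fastq_lines[i].strip()) for i in range(1, len(fastq_lines), 4))


def summarize_insert_sizes(insert_sizes: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of insert sizes, e.g. to parameterize a read simulator."""
    return statistics.mean(insert_sizes), statistics.stdev(insert_sizes)


if __name__ == "__main__":
    fastq = ["@r1", "A" * 150, "+", "I" * 150,
             "@r2", "A" * 148, "+", "I" * 148,
             "@r3", "A" * 150, "+", "I" * 150]
    print(read_length_distribution(fastq))           # Counter({150: 2, 148: 1})
    print(summarize_insert_sizes([200, 218, 236]))    # (218, 18.0)
```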

To better assess the effect of sequencing errors on performance, synthetic reads were generated both with and without errors. Illumina error profiles were generated as recommended in the instruction manual. Quality scores between 15 and 36 were assigned to each read in the resulting FASTQ file according to the error profile. The simulated metagenome reads without errors were generated with default quality scores. Synthetic reads were aligned with Bowtie2, BWA-MEM, BLASTN, and BLASTX under default conditions, as described below.



ARG Reference Databases

Reference genes (both nucleotide and protein sequences) from the non-mutated CARD, version 1.18 (Jia et al. 2017), were downloaded and used for alignment. The nucleotide CARD is composed of antibiotic resistance gene sequences in FASTA format spanning various antibiotic classes. It is 2,027,840 nucleotides in length and contains 2165 ARG sequences imported from NCBI GenBank and peer-reviewed publications. The protein CARD is 671,057 amino acids in length and consists of 2165 protein sequences in FASTA format. Reference genes are classified according to the CARD’s Antibiotic Resistance Ontology (ARO) (Jia et al. 2017).
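Database summary figures like those above (number of sequences and total residues) can be recomputed from the downloaded FASTA files. A minimal Python sketch, assuming plain, uncompressed FASTA; the file path in the usage comment is a placeholder. The same arithmetic gives a mean reference length of about 937 nt (2,027,840 / 2165) for the nucleotide CARD and about 310 aa (671,057 / 2165) for the protein CARD.

```python
from __future__ import annotations

from typing import Iterable


def fasta_stats(lines: Iterable[str]) -> tuple[int, int]:
    """Return (number of sequences, total residues) for FASTA-formatted lines."""
    n_seqs = 0
    total_residues = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            n_seqs += 1  # each header starts a new sequence
        else:
            total_residues += len(line)  # sequence lines may be wrapped
    return n_seqs, total_residues


def fasta_file_stats(path: str) -> tuple[int, int]:
    """Convenience wrapper that reads a FASTA file from disk."""
    with open(path) as handle:
        return fasta_stats(handle)


# Usage (placeholder path): fasta_file_stats("path/to/card_nucleotide.fasta")
```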

Sequence Alignment and Performance Evaluation

Simulated metagenomes were analyzed using Bowtie2, BWA-MEM, and BLAST, tools for aligning reads to reference sequences. Bowtie2 was run with default settings (i.e. end-to-end alignment and a minimum alignment score threshold of -90) for each metagenome (Langmead and Salzberg 2012). BWA-MEM was run with default settings (i.e. local alignment) (Li 2013). BLASTN and BLASTX from the BLAST+ package, version 2.6, were used for aligning reads. BLAST tools were run with default settings (i.e. local alignment) and an E-value of 1e-5.
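The Bowtie2 threshold of roughly -90 follows from Bowtie2's documented default minimum-score function in end-to-end mode, --score-min L,-0.6,-0.6, i.e. f(x) = -0.6 + (-0.6)x for read length x. The Python sketch below reproduces that arithmetic for 150 bp reads and, assuming Bowtie2's default maximum mismatch penalty of 6 (--mp 6,2) and no gaps, estimates how many high-quality mismatches a read can carry before falling below the threshold. The helper names are ours, not part of Bowtie2.

```python
def bowtie2_min_score(read_length: int, constant: float = -0.6, coefficient: float = -0.6) -> float:
    """Linear minimum score f(x) = constant + coefficient * x (Bowtie2 --score-min L,-0.6,-0.6)."""
    return constant + coefficient * read_length


def max_high_quality_mismatches(read_length: int, max_mismatch_penalty: int = 6) -> int:
    """Most maximum-penalty mismatches (and no gaps) whose total stays at or above the minimum score."""
    threshold = bowtie2_min_score(read_length)
    n = 0
    while -(n + 1) * max_mismatch_penalty >= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(round(bowtie2_min_score(150), 1))  # -90.6
    print(max_high_quality_mismatches(150))  # 15
```

Because the threshold scales with read length, trimmed reads shorter than 150 bp are held to a proportionally stricter absolute score.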

Because considerable ambiguity is expected when mapping reads to the CARD, the alignment methods used in this study were first verified by aligning simulated reads back to their whole genomes. The performance of each alignment tool was then evaluated when mapping simulated reads against the CARD. Because the position of each simulated read is known, a custom Python script was used to evaluate four performance criteria: correctly mapped reads (reads aligned to the correct position on the genome or gene), partially mapped reads (reads offset from their true position but sharing at least 20 true-positive bases), false positives (reads aligned to an incorrect position), and multi-reads. These criteria were evaluated for the Bowtie2, BWA-MEM, and BLASTN alignments. For BLASTX, reads that mapped to their respective nucleotide accession number with a translated percent identity of ≥ 90% over at least 25 amino acids (aa) (Elbehery et al. 2016) were considered correctly mapped, and reads meeting the same criteria but mapped to an incorrect target accession number were considered false positives. BLASTX multi-reads were defined by the same criteria, and partially mapped reads were not evaluated for BLASTX. Because multi-reads introduce uncertainty when analyzing metagenomic data, a high proportion of multi-reads was generally considered unfavorable.
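The custom script itself is not published here. As a hypothetical sketch only, the Python function below shows one way the criteria above could be applied to a single read with a known true origin. In Tables 1 and 2 the four categories add up to the number of mapped reads (for example, 84 + 12 + 68 + 292 = 456 for Bowtie2 with Illumina errors in Table 2), so the sketch treats them as mutually exclusive and checks for multi-reads first. It further assumes that "correct" means the same reference and the same start coordinate and that "partial" means the same reference with at least 20 overlapping bases; these rules and all names are our assumptions.

```python
from __future__ import annotations

from typing import NamedTuple


class Interval(NamedTuple):
    reference: str  # genome or gene identifier
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive


def overlap(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals on the same reference (0 otherwise)."""
    if a.reference != b.reference:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def classify_read(true_origin: Interval, reported: list[Interval],
                  min_partial_overlap: int = 20) -> str:
    """Assign one read to exactly one category (hypothetical rules, see text)."""
    if not reported:
        return "unmapped"
    if len({(hit.reference, hit.start) for hit in reported}) > 1:
        return "multi-read"  # reported at more than one distinct location
    hit = reported[0]
    if hit.reference == true_origin.reference and hit.start == true_origin.start:
        return "correct"
    if overlap(hit, true_origin) >= min_partial_overlap:
        return "partial"
    return "false positive"


if __name__ == "__main__":
    truth = Interval("geneA", 100, 250)
    print(classify_read(truth, [Interval("geneA", 100, 250)]))  # correct
    print(classify_read(truth, [Interval("geneA", 200, 350)]))  # partial (50 bases overlap)
    print(classify_read(truth, [Interval("geneA", 240, 390)]))  # false positive (10 bases)
```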

The optimal alignment approach, as determined by the performance criteria, was used to align the wastewater samples against the CARD. Genes meeting a threshold of ≥ 90.00% nucleotide gene similarity (Kristiansson et al. 2011; Zhang et al. 2015b), or ≥ 90.00% identity over at least 25 aa, were grouped into the appropriate ARG class.

Statistical Analysis

Significant differences in performance criteria between aligners, and in the abundance of ARG classes between sewage samples, were determined by one-way analysis of variance (ANOVA) in SPSS 24.0, with P < 0.05 considered statistically significant. Error rates were retrieved directly from each sample’s Binary Alignment/Map (BAM) output file using SAMtools. All quality scores reported follow the Phred scale (Ruffalo et al. 2011).
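The summary section of samtools stats reports an error rate defined as mismatches divided by bases mapped (CIGAR). The Python sketch below approximates that ratio from SAM text (for example, the output of samtools view), taking mismatches from the NM tag and mapped bases from the query-consuming CIGAR operations M, I, = and X, and skipping unmapped, secondary, and supplementary records. The exact counting rules inside SAMtools may differ, so this is an approximation for illustration rather than a reimplementation; the example records are invented.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def error_rate(sam_lines) -> float:
    """Approximate mismatches / mapped bases from SAM records (NM tag + CIGAR)."""
    mismatches = 0
    mapped_bases = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & (0x4 | 0x100 | 0x800) or cigar == "*":
            continue  # skip unmapped, secondary, and supplementary records
        mapped_bases += sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MI=X")
        for tag in fields[11:]:  # optional fields start at column 12
            if tag.startswith("NM:i:"):
                mismatches += int(tag[5:])
    return mismatches / mapped_bases if mapped_bases else 0.0


if __name__ == "__main__":
    seq, qual = "A" * 150, "I" * 150
    sam = [
        "@SQ\tSN:geneA\tLN:1000",
        f"r1\t0\tgeneA\t101\t42\t150M\t*\t0\t0\t{seq}\t{qual}\tNM:i:3",
        f"r2\t0\tgeneA\t301\t42\t140M10S\t*\t0\t0\t{seq}\t{qual}\tNM:i:4",
        f"r3\t4\t*\t0\t0\t*\t*\t0\t0\t{seq}\t{qual}",
    ]
    print(f"{error_rate(sam):.4f}")  # 7 mismatches / 290 mapped bases = 0.0241
```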



Acknowledgments

We thank the managers at the East Lansing and Traverse City WWTPs for providing samples. We also thank Bioinformatics Research Specialists Dr. Tracy Teal and Dharanya Sampath for technical support. Special thanks to Dr. Shin-Han Shiu for his guidance and to Peipei Wang for providing custom Python scripts.

Conflict of Interest

No conflict of interest declared.

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J.H., Zhang, Z., Miller, W. and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Andrews, S. (2010) FastQC: A quality control tool for high throughput sequence data.

Angly, F.E., Willner, D., Rohwer, F., Hugenholtz, P. and Tyson, G.W. (2012) Grinder: a versatile amplicon and shotgun sequence simulator. Nucleic Acids Res 40, e94.

Bolger, A.M., Lohse, M. and Usadel, B. (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.

Elbehery, A.H.A., Aziz, R.K. and Siam, R. (2016) Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics. OMICS 20, 229-238.

Garner, E., Wallace, J.S., Argoty, G.A., Wilkinson, C., Fahrenfeld, N., Heath, L.S., Zhang, L.Q., Arabi, M., Aga, D.S. and Pruden, A. (2016) Metagenomic profiling of historic Colorado Front Range flood impact on distribution of riverine antibiotic resistance genes. Sci Rep 6.

Gupta, S.K., Padmanabhan, B.R., Diene, S.M., Lopez-Rojas, R., Kempf, M., Landraud, L. and Rolain, J.M. (2014) ARG-ANNOT, a New Bioinformatic Tool To Discover Antibiotic Resistance Genes in Bacterial Genomes. Antimicrob Agents Chemother 58, 212-220.

Hatem, A., Bozdag, D., Toland, A.E. and Catalyurek, U.V. (2013) Benchmarking short sequence mapping tools. BMC Bioinformatics 14.

Jia, B., Raphenya, A.R., Alcock, B., Waglechner, N., Guo, P., Tsang, K.K., Lago, B.A., Dave, B.M., Pereira, S., Sharma, A.N., Doshi, S., Courtot, M., Lo, R., Williams, L.E., Frye, J.G., Elsayegh, T., Sardar, D., Westman, E.L., Pawlowski, A.C., Johnson, T.A., Brinkman, F.S.L., Wright, G.D. and McArthur, A.G. (2017) CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res 45, D566-D573.

Kristiansson, E., Fick, J., Janzon, A., Grabic, R., Rutgersson, C., Weijdegard, B., Soderstrom, H. and Larsson, D.G.J. (2011) Pyrosequencing of Antibiotic-Contaminated River Sediments Reveals High Levels of Resistance and Gene Transfer Elements. PLoS One 6.

Langmead, B. and Salzberg, S.L. (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359.

Li, H. (2013) Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997v1 [q-bio.GN].

Luby, E., Ibekwe, A.M., Zilles, J. and Pruden, A. (2016) Molecular Methods for Assessment of Antibiotic Resistance in Agricultural Ecosystems: Prospects and Challenges. J Environ Qual 45, 441-453.

Manaia, C.M., Macedo, G., Fatta-Kassinos, D. and Nunes, O.C. (2016) Antibiotic resistance in urban aquatic environments: can it be controlled? Appl Microbiol Biotechnol 100, 1543-1557.

Ruffalo, M., LaFramboise, T. and Koyuturk, M. (2011) Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 27, 2790-2796.

Schmieder, R. and Edwards, R. (2012) Insights into antibiotic resistance through metagenomic approaches. Future Microbiol 7, 73-89.

Sottani, C., Islam, K., Soffientini, A., Zerilli, L.F. and Seraglia, R. (1993) Studies on the Interaction of Elfamycin Antibiotics with Elongation Factor-Tu by Mass Spectroscopic Techniques. Rapid Commun Mass Spectrom 7, 680-683.

Subirats, J., Sanchez-Melsio, A., Borrego, C.M., Balcazar, J.L. and Simonet, P. (2016) Metagenomic analysis reveals that bacteriophages are reservoirs of antibiotic resistance genes. Int J Antimicrob Agents 48, 163-167.

Treangen, T.J. and Salzberg, S.L. (2012) Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nat Rev Genet 13, 36-46.

Yang, Y., Li, B., Zou, S., Fang, H.H.P. and Zhang, T. (2014) Fate of antibiotic resistance genes in sewage treatment plant revealed by metagenomic approach. Water Res 62, 97-106.

Ye, H., Meehan, J., Tong, W. and Hong, H. (2015) Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine. Pharmaceutics 7, 523-541.

Yu, X., Guda, K., Willis, J., Veigl, M., Wang, Z., Markowitz, S., Adams, M.D. and Sun, S. (2012) How do alignment programs perform on sequencing data with varying qualities and from repetitive regions? BioData Min 5.

Zhang, S.H., Han, B., Gu, J., Wang, C., Wang, P.F., Ma, Y.Y., Cao, J.S. and He, Z.L. (2015a) Fate of antibiotic resistant cultivable heterotrophic bacteria and antibiotic resistance genes in wastewater treatment processes. Chemosphere 135, 138-145.

Zhang, T., Yang, Y. and Pruden, A. (2015b) Effect of temperature on removal of antibiotic resistance genes by anaerobic digestion of activated sludge revealed by metagenomic approach. Appl Microbiol Biotechnol 99, 7771-7779.

Supporting Information

Table S1 Summary of genomes and ARGs contained in the simulated bacterial metagenome. Percent identity represents the BLASTN hit against the NCBI nucleotide database (nr/nt) with an E-value of 0 over 100% of the query sequence. Each ARG sequence was extracted directly from the CARD. All genome and gene annotations were extracted from NCBI and the CARD’s Antibiotic Resistance Ontology.

Figure 1 Relative abundance of antibiotic resistance classes in the (a) CAS and (b) MBR wastewater samples when aligned with BLASTN against the CARD.



Table 1 Alignment statistics of simulated bacterial metagenome reads with and without Illumina errors when aligned with Bowtie2, BWA-MEM, and BLASTN against their whole genomes.

                               Bowtie2          BWA-MEM          BLASTN
No. of mapped reads (%)
  Illumina errors              128864 (100)     128864 (100)     128484 (99.71)
  No error                     128864 (100)     128864 (100)     128633 (99.82)
No. correct (%)
  Illumina errors              123946 (96.18)   107837 (83.68)   120788 (94.01)
  No error                     124129 (96.33)   126988 (98.54)   120260 (93.49)
No. of partials (%)
  Illumina errors              275 (0.21)       18993 (14.74)    0
  No error                     2 (~0)           4 (~0)           0
No. of multi-reads (%)
  Illumina errors              4637 (3.60)      0                7696 (5.99)
  No error                     4724 (3.67)      0                8373 (6.51)
No. of false positives (%)
  Illumina errors              6 (~0)           2034 (1.58)      0
  No error                     9 (0.01)         1872 (1.45)      0



Table 2 Alignment statistics of simulated bacterial metagenome reads with and without Illumina errors when aligned with Bowtie2, BWA-MEM, BLASTN, and BLASTX against the CARD. BLASTX was evaluated based on the following criteria: target genes obtaining a percent identity of ≥ 90% over at least 25 amino acids, with the read mapped to the correct target NCBI accession number.

                               Bowtie2        BWA-MEM        BLASTN         BLASTX
No. of mapped reads (%)
  Illumina errors              456 (0.35)     802 (0.62)     509 (0.39)     1090 (0.85)
  No error                     532 (0.41)     1118 (0.87)    619 (0.48)     2340 (1.82)
No. correct (%)
  Illumina errors              84 (18.42)     75 (9.35)      103 (20.24)    82 (7.52)
  No error                     88 (16.54)     111 (9.93)     137 (22.13)    139 (5.94)
No. of partials (%)
  Illumina errors              12 (2.63)      39 (4.86)      11 (2.16)      -
  No error                     14 (2.63)      15 (1.34)      6 (0.97)       -
No. of multi-reads (%)
  Illumina errors              68 (14.91)     9 (1.12)       100 (19.65)    681 (62.48)
  No error                     90 (16.92)     9 (0.81)       131 (21.16)    1676 (71.62)
No. of false positives (%)
  Illumina errors              292 (64.04)    679 (84.66)    295 (57.96)    327 (30)
  No error                     340 (63.91)    983 (87.92)    345 (55.74)    525 (22.44)



Table 3 Process characteristics for East Lansing and Traverse City wastewater treatment plants.

                                                   East Lansing WWTP                      Traverse City WWTP
Wastewater treatment process (biological treatment) Conventional Activated Sludge (CAS)   Membrane Biological Reactor (MBR)
Sludge Retention Time (SRT)                        14 days                                7.58 days
Capacity                                           18.8 MGD                               17.0 MGD
Avg. flow                                          13.4 MGD                               8.5 MGD
Discharge rate                                     14.1 MGD                               4.0 MGD
Disinfection                                       Chlorine (Cl)                          Ultraviolet (UV)

MGD, million gallons per day.

Table 4 Quality control analysis results on raw and quality-trimmed sequence reads using FastQC.

                                                Raw                                   Quality trimmed
Sample                          Abbreviation    Sequences per sample  Avg. GC         Sequences per sample  Avg. GC
                                                (paired-end)          content (%)     (paired-end)          content (%)
Conventional Activated Sludge   CAS             13115140              52              10747794              52
Membrane Bioreactor             MBR             12478608              58              9779513               58
