ASMNGS 2018 Abstracts

Title: 1: Epidemiological Cues: NGS in Clinical and Public Health Microbiology
Time: Monday, September 24, 2018, 9:00 am - 12:30 pm

Abstract Title: Translating Metagenomics into Clinical Reality
Author Block: R. R. Colwell; University of Maryland, College Park, MD.
Abstract:
The translation of metagenomic methods into routine use for diagnosis and the development of microbiome-based therapies requires precise taxonomic classification of all microorganisms (bacteria, archaea, viruses, fungi, protists) at the strain level, along with characterization of their virulence and pathogenicity traits. The capacity for beneficial effects versus pathogenicity and virulence can vary widely among strains of the same species, and mounting evidence shows that the distinction between commensal and pathogenic lifestyles is strongly influenced by strain-level, not species-level, variability. With the emerging consensus that the clinically informative and actionable unit in microbiology is the strain, we face the need for validation and standardization of molecular methods, bioinformatics tools, and qualified databases that can deliver strain-level resolution. This presentation will highlight class-leading bioinformatic approaches that enable quantitative, multi-kingdom microbial classification with strain-level resolution, and present recent studies that assess the accuracy and performance of these approaches. The presentation will further address the importance of "challenges" (controlled, cross-platform benchmark studies) for the development and validation of clinical-grade bioinformatics tools and databases, and share considerations on how study design can affect the unbiased and controlled validation of these tools. Finally, we will provide examples of error modes in the metagenomics workflow beyond data analysis and propose strategies for quality control and standardization.
Abstract Title: Deployable NGS for Influenza Virus Field Surveillance and Outbreak Response
Author Block: M. Keller1, B. Rambo-Martin1, M. Wilson1, J. Nolting2, T. Wong1, X. Lin1, T. Anderson3, E. Neuhaus1, A. Vincent3, T. Davis1, B. Zhou1, A. Bowman2, D. Wentworth1, J. Barnes1; 1Centers for Disease Control and Prevention, Atlanta, GA, 2The Ohio State University, Columbus, OH, 3United States Department of Agriculture, Ames, IA.
Abstract:
Background: Exhibition swine are known to be a source of zoonotic influenza A virus (IAV) transmission to humans [1]. These IAVs can contain genetic/antigenic components to which the human population has not been exposed, and therefore pose a significant pandemic risk. Genomic analyses are critical to understanding this risk and are required for vaccine generation. Generating genomic data on site, as demonstrated in outbreaks of Zika [2] and Ebola [3], provides information on how to stop transmission and increases the timeliness of response. Here we demonstrate the feasibility of real-time, in-field IAV genomic surveillance at a swine expo.
Methods: At a large swine expo, 24 nasal wipes from IAV-positive pigs and their immediate neighbors were used for NGS. RNA was isolated from these samples, converted to cDNA, amplified using a modified M-RTPCR, and sequenced on site using the MinION sequencer (Oxford Nanopore Technologies). Local analysis was performed on a high-performance laptop computer using a custom bioinformatics pipeline, and the results were displayed via an R Shiny application. Consensus sequences were transmitted electronically for candidate vaccine virus (CVV) synthesis. RNA was subsequently analyzed by confirmatory NGS using our standard MiSeq-based (Illumina) pipeline.
Results: Of the 24 samples analyzed, 13 yielded codon-complete genomes: one A(H1N1), 11 A(H1N2), and one A(H3N2) IAV. On-site sequences of these 13 genomes were 99.8% concordant with data obtained from our optimized pipeline. The A(H1N2) viruses were very closely related and represented an outbreak of H1δ2 lineage viruses. The most closely related CVV had 20-24 amino acid substitutions in the HA antigen, including changes in neutralizing epitopes, so that CVV is unlikely to offer protection. As an outbreak-response exercise, the data from this A(H1N2) outbreak were used to synthesize DNA for reverse genetics rescue of an experimental CVV.
Conclusions: We developed a portable strategy to conduct IAV NGS and analysis on site. An outbreak of H1δ2 lineage viruses was identified in the field. Within hours of specimen collection, the genomic information was analyzed and optimal sequences were sent to CDC for generation of a synthetic CVV. Despite the limited read-level accuracy of the MinION, rapidly generated consensus sequences were useful for identification, phylogenetic analysis, and DNA synthesis for CVV development. Electronically delivered consensus data accelerate the development of vaccines in response to emerging threats, particularly in areas of the world where transporting samples may be difficult or impossible. Rapid genomic characterization, like the data obtained here, provides timely information during an outbreak on lineage, reassortment, signatures of human adaptation, and inferred antigenic characteristics. This information helps identify where a virus originated, identify transmission chains, and control emerging IAVs.
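The 99.8% concordance between on-site MinION consensus sequences and the confirmatory MiSeq data can be read as per-position identity over aligned bases. The Python sketch below rests on three assumptions not stated in the abstract. First, the two consensus sequences are already aligned to equal length. Second, ambiguous bases (N) and gaps (-) are excluded from the comparison. Third, `consensus_concordance` is a hypothetical name, not part of the authors' pipeline.

```python
def consensus_concordance(query: str, reference: str, ignore: str = "N-") -> float:
    """Fraction of comparable aligned positions at which two consensus sequences agree.

    Hypothetical helper: assumes both sequences are pre-aligned to the same length.
    Positions where either sequence carries a character in `ignore` (ambiguous base
    or gap) are skipped rather than counted as mismatches.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for q, r in zip(query.upper(), reference.upper()):
        if q in ignore or r in ignore:
            continue  # position cannot be compared reliably
        compared += 1
        matches += q == r
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches / compared
```

For example, `consensus_concordance("ACGTACGTAC", "ACGTACGTAT")` returns 0.9 (nine matches over ten compared positions). Whether masked positions should be skipped or counted as disagreements is a design choice that changes the reported figure.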
Abstract Title: One Week, One Thousand Bacterial Genomes: Microfluidics for Molecular Epidemiology and High-resolution Intra-patient Bacterial Evolution
Author Block: G. K. Lagoudas1, S. Kim1, C. Merakou2, T. Lieberman1, G. P. Priebe2, P. Blainey1; 1MIT, Cambridge, MA, 2Boston Children's Hospital, Boston, MA.
Abstract:
Background: High-throughput, low-cost whole genome sequencing (WGS) of bacterial pathogens has the potential to transform clinical microbiology and infection control through high-volume surveillance of patient bacterial specimens. Furthermore, efficient high-throughput sequencing can power the study of bacterial evolution within and between patients. However, preparing bacterial samples for sequencing is more resource-intensive, costly, and logistically complex than the DNA sequencing itself, prohibitively so for large studies involving thousands of isolates.
Methods: We created a procedure to prepare bacterial isolates for sequencing at a scale of 1000 samples in 7 days. Building on a microfluidic device developed in our laboratory, we implemented an improved operating protocol that increased throughput 10-fold. We also extensively validated a protocol for reusing the device without contamination. The new procedure produces sequencing-ready DNA libraries from an input of whole cells. A single operator can process 144 samples per 8 hours using only four small devices and small quantities of reagents. We applied this method to prepare and sequence over 4000 isolates, including 1100 Burkholderia dolosa isolates collected from cystic fibrosis patients involved in a B. dolosa outbreak. Samples were collected from three patients close in time to a transmission event, and tissue was obtained at autopsy from one index patient. Forty-one different sites in the lung of the index patient were sampled to study bacterial heterogeneity between sites, and additional samples were collected to compare heterogeneity across body sites.
Results: A total of 1100 B. dolosa samples were processed and sequenced. The microfluidic sample preparation produced DNA libraries of comparable quality to libraries from a standard benchtop preparation method, even with 100-fold less cell input. Library generation on the device was highly reproducible. Across samples, the duplication rate (a marker of library conversion efficiency) showed a tight distribution. Evaluation of the genetic diversity among B. dolosa isolates using a previously developed pipeline will be presented.
Conclusions: Microfluidic technology that fully integrates bacterial DNA isolation and library construction enables high-throughput bacterial WGS. Our platform supports large clinical studies and investigational projects, as demonstrated by our rapid sequencing of over one thousand B. dolosa isolates. Our technology can facilitate routine infection control efforts and enable research studies that require large sample sizes.
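The abstract uses duplication rate as a marker of library conversion efficiency. This rate is commonly estimated as the fraction of reads whose alignment coordinates repeat those of an earlier read. The minimal Python sketch below assumes single-end reads already reduced to hypothetical (contig, 5' position, strand) tuples. It does not reproduce any particular duplicate-marking tool.

```python
from typing import Iterable, Tuple

# Simplifying assumption: a read is identified by (contig, 5' start position, strand).
Alignment = Tuple[str, int, str]


def duplication_rate(alignments: Iterable[Alignment]) -> float:
    """Fraction of reads that share coordinates with a previously seen read (illustrative)."""
    seen = set()
    total = duplicates = 0
    for key in alignments:
        total += 1
        if key in seen:
            duplicates += 1  # same start and strand as an earlier read
        else:
            seen.add(key)
    return duplicates / total if total else 0.0
```

At a given sequencing depth, lower values generally indicate that more unique input molecules were converted into library fragments. A tight spread of values across samples is what the abstract reports as evidence of reproducibility.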
Abstract Title: Rapid diagnosis of lower respiratory infection using nanopore based metagenomic sequencing
Author Block: T. Charalampous1, H. Richardson1, G. Kay1, R. Baldan2, C. Jeanes3, D. Rae3, S. Grundy3, D. Turner4, J. Wain1, R. Leggett5, D. Livermore6, J. O'Grady1; 1Quadram Institute Biosciences, Norwich, UNITED KINGDOM, 2King's College London, Norwich, UNITED KINGDOM, 3Norfolk and Norwich University Hospital, Norwich, UNITED KINGDOM, 4Oxford Nanopore Technologies, Oxford, UNITED KINGDOM, 5Earlham Institute, Norwich, UNITED KINGDOM, 6University of East Anglia, Norwich, UNITED KINGDOM.
Abstract:
Lower respiratory infections (LRIs) accounted for three million deaths in 2016, making them the leading infectious cause of mortality worldwide, according to the WHO. LRIs have a complex aetiology, which makes their diagnosis challenging. The current "gold standard" for the microbial investigation of bacterial LRIs is culture, which has poor clinical sensitivity and is too slow to effectively guide early antibiotic therapy. Metagenomic sequencing could potentially replace culture, providing rapid, sensitive and comprehensive results. A major challenge for clinical metagenomics is the high ratio of human to pathogen DNA in respiratory samples. To address this, we developed and optimised a novel metagenomics pipeline for the investigation of bacterial LRIs (e.g. hospital-acquired pneumonia). The pipeline uses a saponin-based host DNA depletion method, which removes up to 99.99% of human DNA, combined with rapid nanopore sequencing and real-time analysis. The first iteration of the pipeline was tested on respiratory samples (including sputum, bronchoalveolar lavage and endotracheal aspirate) from 40 patients with microbiologically confirmed bacterial pathogens and controls whose samples had yielded "normal respiratory flora" or "no growth". Pathogens and antibiotic resistance genes were identified within eight hours. The pipeline was then refined to reduce turnaround time and increase sensitivity, and tested on a further 41 respiratory samples. This refined method was 96.6% concordant with routine microbiology for pathogen detection, with a sample-to-result turnaround time of six hours. In an MRSA-positive sputum sample, S. aureus was identified and the mecA gene detected within five minutes of sequencing (approximately four hours total turnaround). Six samples could be sequenced per MinION flowcell, at an approximate cost of US$130 per sample. This study demonstrates that nanopore-based metagenomic sequencing can rapidly and accurately diagnose LRIs. It also highlights the importance of efficient human DNA depletion for rapid and cost-effective clinical metagenomics.
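A short calculation makes concrete why removing up to 99.99% of human DNA matters so much. The Python sketch below makes three simplifying assumptions. First, depletion removes only human DNA. Second, pathogen DNA is not lost. Third, reads are sampled in proportion to DNA mass. The 1000:1 starting ratio used in the example is hypothetical, not a value reported in the study.

```python
def pathogen_read_fraction(human_to_pathogen: float, human_removed: float = 0.0) -> float:
    """Expected fraction of reads from the pathogen after host depletion (illustrative model).

    human_to_pathogen: human:pathogen DNA ratio before depletion (1000.0 means 1000:1).
    human_removed: fraction of human DNA removed (0.9999 corresponds to 99.99%).
    Assumes pathogen DNA is fully retained and reads are sampled in proportion to DNA mass.
    """
    remaining_human = human_to_pathogen * (1.0 - human_removed)
    return 1.0 / (1.0 + remaining_human)
```

Under these assumptions, a hypothetical 1000:1 sample yields about 0.1% pathogen reads without depletion (`pathogen_read_fraction(1000)`). After 99.99% depletion, the same sample yields roughly 91% (`pathogen_read_fraction(1000, 0.9999)`). Real samples will deviate, for example if depletion also lyses some bacterial cells.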
Title: 2: From Pipelines to Pixels: NGS Data Integration (Reporting, QC/QA, Accreditation, Training) and Visualization
Time: Monday, September 24, 2018, 3:45 pm - 6:00 pm
Abstract Title: A Method for Systematically Surveying Data Visualizations in Infectious Disease Genomic Epidemiology
Author Block: A. Crisan, J. L. Gardy, T. Munzner; University of British Columbia, Vancouver, BC, CANADA.
Abstract:
Background. Stakeholders within public health can use the results of genomic analyses to establish practice guidelines and enact policies. Yet these stakeholders vary in their ability to interpret genomic findings and to contextualize the results with other sources of data. Data visualization is an emerging solution to these interpretability challenges. However, there is no systematic and robust method for identifying the appropriate visualization to use in different contexts.
Methods. We have developed a systematic method for generating an explorable visualization design space, which catalogues visualizations existing within the infectious disease genomic epidemiology literature. Our method uses an automated literature analysis phase to establish why data were visualized. This is followed by a manual visualization analysis phase to establish what data were visualized and how. The literature analysis phase queried PubMed and applied unsupervised cluster analysis to article titles and abstracts to discover topic clusters that suggested why data were visualized. To ensure a variety of data visualizations for further analysis, we sampled articles from across topic clusters and then extracted their figures. We then applied open and axial coding techniques, drawn from qualitative research methods, to the sampled figures. This allowed us to iteratively derive taxonomic codes describing the elements of each data visualization, so that visualizations could be compared.
Results. We applied our method to a document corpus of approximately 18,000 articles, from which we sampled 204 articles for analysis. We added 17 articles manually, for a final set of 221 articles that yielded 801 figures and 49 missed-opportunity tables. These figures served as inputs to the visualization analysis phase. They yielded taxonomic codes along three descriptive axes of visualization design: chart types within the visualization, chart combinations, and chart enhancements. We refer to the collective complement of derived taxonomic codes as GEViT (Genomic Epidemiology Visualization Typology). To operationalize GEViT and the results of the literature analysis, we have created a browsable image gallery (http://gevit.net). The gallery allows users to explore the many complex types of data visualizations (i.e. the visualization design space). Our analysis of the visualization design space through GEViT also revealed a number of data visualization challenges within infectious disease genomic epidemiology that future bioinformatics work should address.
Conclusions. Generating explorable visualization design spaces can help stakeholders and bioinformaticians design and evaluate data visualizations for different contexts. By consistently codifying a visualization's elements, it also becomes possible to meaningfully test visualization efficacy beyond an individual's intuition and preferences.
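The abstract does not name the unsupervised clustering algorithm applied to PubMed titles and abstracts. As a stand-in, the Python sketch below vectorises texts with TF-IDF and groups them by leader (threshold) clustering on cosine similarity. This technique is swapped in for illustration and is not the authors' method. The function names, similarity threshold and example titles are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase alphabetic tokens; a real pipeline would also remove stop words and stem.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """L2-normalised TF-IDF vectors using a smoothed IDF: log((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals the cosine similarity.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def leader_cluster(vectors: List[Dict[str, float]], threshold: float) -> List[int]:
    """Assign each document to its most similar cluster leader, or open a new cluster."""
    leaders: List[int] = []  # index of the first document (leader) of each cluster
    labels: List[int] = []
    for i, vec in enumerate(vectors):
        best, best_sim = -1, threshold
        for cluster_id, leader in enumerate(leaders):
            sim = cosine(vec, vectors[leader])
            if sim >= best_sim:
                best, best_sim = cluster_id, sim
        if best == -1:
            leaders.append(i)
            best = len(leaders) - 1
        labels.append(best)
    return labels
```

Leader clustering makes a single pass and depends on document order. Methods such as k-means or topic models are more typical at the scale of roughly 18,000 articles. The sketch only illustrates how topic clusters can emerge from title and abstract text without labels.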
Abstract Title: Using Machine Learning to Predict Antimicrobial Minimum Inhibitory Concentrations and Associated Genomic Features for Nontyphoidal Salmonella
Author Block: M. Nguyen1, S. Long2, P. F. McDermott3, R. J. Olsen2, R. Olson1, R. L. Stevens1, G. H. Tyson3, S. Zhao3, J. J. Davis1; 1University of Chicago, Chicago, IL, 2Houston Methodist Hospital, Houston, TX, 3Food and Drug Administration, Laurel, MD.
Abstract:
Nontyphoidal Salmonella species are the leading bacterial cause of foodborne disease in the United States. Thanks to surveillance efforts by public health agencies, whole genome sequences and paired antimicrobial susceptibility data are available for these strains. In this study, we used a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States. From these, we generated XGBoost-based machine learning models that predict minimum inhibitory concentrations (MICs) for 15 antibiotics. The MIC prediction models have average accuracies of 95-96% within ±1 two-fold dilution. They predict MICs with no a priori information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for training sets, we show that highly accurate MIC prediction models can be generated with fewer than 500 strains. We also show that our approach for predicting MICs is stable over time, despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the genomic regions that the models identify as important for predicting MICs. Our strategy for developing whole genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
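The accuracy figures above count a prediction as correct when it falls within ±1 two-fold dilution of the laboratory MIC. In antimicrobial susceptibility testing this is often called essential agreement. Because MICs lie on a two-fold dilution series, the comparison is naturally made on the log2 scale. The Python sketch below shows only this metric. It does not reproduce the authors' XGBoost models, and the function names are hypothetical.

```python
import math
from typing import Sequence


def within_one_dilution(predicted: float, observed: float) -> bool:
    # MICs are measured on a two-fold dilution series, so compare them on the log2 scale.
    return abs(math.log2(predicted) - math.log2(observed)) <= 1


def dilution_accuracy(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Fraction of predicted MICs within +/-1 two-fold dilution of the observed MIC."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(within_one_dilution(p, o) for p, o in zip(predicted, observed))
    return hits / len(observed)
```

For example, predictions of 1, 2, 8 and 0.5 mg/L against observed MICs of 2, 2, 2 and 0.25 mg/L give an accuracy of 0.75. Only the prediction of 8 mg/L falls outside the ±1 dilution window.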
Abstract Title: Pathogenwatch: A Global Platform for Genomic Surveillance
Author Block: R. Goater, S. Argimón, K. Abudahab, B. Taylor, C. Yeats, S. Harris, D. M. Aanensen; Wellcome Sanger Institute, Cambridge, UNITED KINGDOM.
Abstract:
Background: Rapid processing of genome data can support decision-making at a number of scales. Locally: does a genome contain signatures of resistance to antibiotics? Nationally: is there a circulating resistant lineage that requires targeted infection control? Internationally: how are lineages transmitted and clonally disseminated across continents? We require a platform to rapidly analyse genomic data, visualise large datasets in aggregate, and provide open access to these results.
Methods and Results: Pathogenwatch is a web application consisting of a Node.js back-end and a React front-end. Its analysis pipeline is based on a novel system we call the "Runner architecture". This architecture uses Docker and standard streams so that analytical programs can be integrated easily and without modification. Users can drag and drop genome assemblies into Pathogenwatch and receive rapid speciation, AMR prediction, and cgMLST-based clustering for major bacterial pathogens. For a selected number of species, core SNP-based trees are provided. Results for single genomes are summarised in a simple report. Collections of multiple genomes are clustered and presented through interactive maps and phylogenies to assess the spread of pathogens at regional and national levels. The Runner architecture allows analyses to be added to the system without knowledge of the underlying architecture and technologies. Adding an analysis can be as simple as wrapping a command-line tool in a Docker image and updating a configuration file. Using Docker allows programs written in any language to be integrated, and the architecture allows the system to evolve without any change to the analytical programs. We will demonstrate the architecture and the application live with genomic epidemiology use cases for major pathogenic species. The demonstration will focus on near real-time, rapid processing of genomic data via the web.
Conclusions: Pathogenwatch offers a scalable, flexible platform for genomic epidemiology and surveillance in public health.
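The Runner architecture is described only at a high level. Each analysis is a command-line tool packaged in Docker, registered in a configuration file, and communicating through standard streams. The Python sketch below imitates that pattern under stated assumptions:

- The `ANALYSES` registry and `run_analysis` function are hypothetical.
- Each tool reads a FASTA assembly on stdin and writes JSON on stdout.
- A small inline Python program stands in for a containerised tool, so the example runs without Docker.

In a Docker deployment the command might instead be `["docker", "run", "-i", "--rm", "<image>"]`, where `-i` keeps stdin attached to the container.

```python
import json
import subprocess
import sys
from typing import Dict, List

# Hypothetical registry standing in for the configuration file described in the abstract.
# Each command reads a FASTA assembly on stdin and writes a JSON result on stdout.
# An inline Python program replaces a Docker image so the sketch runs anywhere.
ANALYSES: Dict[str, List[str]] = {
    "gc-content": [
        sys.executable,
        "-c",
        "import sys, json\n"
        "seq = ''.join(l.strip() for l in sys.stdin if not l.startswith('>'))\n"
        "gc = sum(seq.upper().count(b) for b in 'GC') / len(seq)\n"
        "json.dump({'gc': round(gc, 3)}, sys.stdout)\n",
    ],
}


def run_analysis(name: str, fasta: str) -> dict:
    """Stream an assembly to a registered tool via stdin and parse its JSON stdout."""
    command = ANALYSES[name]
    completed = subprocess.run(
        command, input=fasta, capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)
```

For example, `run_analysis("gc-content", ">contig1\nGGCCAATT\n")` returns `{"gc": 0.5}`. The runner only knows how to pass text in and JSON out. Registering another tool is therefore a new registry entry, with no change to the runner or the tool, which mirrors the property the abstract emphasises.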
Title: 3: Farm-to-Table: NGS in Veterinary, Food, and Environmental Microbiology
Time: Tuesday, September 25, 2018, 9:00 am - 12:30 pm
Abstract Title: Whole-genome Sequencing Analyses to Investigate a Nationwide Outbreak of Listeriosis Caused by Ready-to-Eat Processed Meat Products, South Africa, 2017-2018
Author Block: M. Allam1, M. M. Maury2, N. Tau1, S. L. Smouse1, P. S. Mtshali1, P. Sekwadi1, R. Mathebula1, N. Govender1, A. Leclercq2, M. Lecuit2, J. Thomas1, K. McCarthy1, L. Blumberg1, A. Ismail1, A. M. Smith1; 1National Institute for Communicable Diseases, Johannesburg, SOUTH AFRICA, 2Institut Pasteur, Paris, FRANCE.
Abstract:
Background: In South Africa, between 2017 and 2018, a large, multi-province outbreak of listeriosis occurred. Listeriosis is a serious foodborne disease caused by Listeria monocytogenes. The outbreak was traced to ready-to-eat (RTE) processed meat products using epidemiological investigations and whole-genome sequencing (WGS) analyses.
Methods: In addition to standard laboratory techniques, WGS was performed for further subtyping of the outbreak isolates and to support investigation into the source of the outbreak.
Results: As of 19 June 2018, 1056 cases, including 214 deaths, had been reported to the National Institute for Communicable Diseases, South Africa. Multilocus sequence typing (MLST) of 628 clinical isolates using WGS determined that 571/628 (91%) belonged to L. monocytogenes sequence type 6 (ST6). The remainder (9%) belonged to 17 other sequence types. Furthermore, WGS-based core genome multilocus sequence typing (cgMLST) analysis using 1748 core genes showed that all clinical isolates of ST6, the outbreak sequence type, belonged to cgMLST type CT4148. Isolates of the same cgMLST type (0-4 allelic differences) were found in RTE processed meat products, including a widely consumed product called "polony", and in the manufacturer's processing environment. This strongly suggests that polony and the other RTE products made in this facility were the source of the outbreak.
Conclusions: Popular RTE processed meat products from a single food production facility caused this listeriosis outbreak, which is, to date, the world's largest. High-throughput sequencing combined with epidemiological and traceback investigation was instrumental in swiftly locating the source of contaminated food, preventing further illnesses and deaths.
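The "0-4 allelic differences" figure counts the cgMLST loci (here out of 1748 core genes) at which two isolates carry different allele numbers. The Python sketch below makes the following assumptions:

- Profiles are dictionaries mapping locus name to allele number, with `None` for an uncalled locus.
- Loci missing in either profile are ignored pairwise. This is one common convention and not necessarily the one the authors used.
- Locus names, isolate names and the threshold are hypothetical.
- The code does not define how cgMLST type CT4148 was assigned.

```python
from typing import Dict, List, Optional

# Hypothetical representation: locus name -> allele number (None if the locus was not called).
Profile = Dict[str, Optional[int]]


def allelic_distance(a: Profile, b: Profile) -> int:
    """Number of loci called in both profiles whose allele numbers differ."""
    shared = set(a) & set(b)
    return sum(
        1
        for locus in shared
        if a[locus] is not None and b[locus] is not None and a[locus] != b[locus]
    )


def within_threshold(reference: Profile, isolates: Dict[str, Profile], max_diff: int) -> List[str]:
    """Sorted names of isolates within `max_diff` allelic differences of a reference profile."""
    return sorted(
        name
        for name, profile in isolates.items()
        if allelic_distance(reference, profile) <= max_diff
    )
```

Ignoring uncalled loci avoids inflating distances for assemblies with coverage gaps. The trade-off is that a poor-quality assembly can look closer to the reference than it really is.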
Abstract Title: A Validation Approach of an End-to-End Whole Genome Sequencing Workflow for Source Tracking of Listeria monocytogenes and Salmonella enterica
Author Block: A. Portmann, C. Barretto; Nestlé Research Center, Lausanne, SWITZERLAND.
Abstract:
Because of its high discriminatory power, whole genome sequencing (WGS) is routinely used for source tracking, pathogen surveillance and outbreak investigation. In the food industry, WGS-based source tracking is useful for supporting contamination investigations. Despite its increasing use worldwide, no standards or guidelines are available today for its use in outbreak and/or trace-back investigations. The differences between genomes identified by WGS must be trustworthy, so validation of all steps of the WGS workflow is recommended. Here we present a validation of an end-to-end WGS workflow for Listeria monocytogenes and Salmonella enterica, covering isolate sub-culturing, DNA extraction, sequencing and bioinformatics analysis. The following performance criteria were assessed: stability, repeatability, reproducibility, discriminatory power and epidemiological concordance. Few SNPs were observed for L. monocytogenes and S. enterica, whether comparing isolate sequences derived from the same subculture or isolates after 10 subcultures. This demonstrated the stability of the WGS workflow for both species, despite the few genomic variations that can occur during sub-culturing steps. Repeatability and reproducibility were confirmed. The WGS workflow has high discriminatory power and confirms genetic relatedness. Additionally, the WGS workflow was able to reproduce published outbreak results, illustrating its epidemiological concordance. The current study proposes a validation approach comprising all steps of a WGS workflow and demonstrates that the workflow can be applied to L. monocytogenes or S. enterica. This work is one of the first steps toward the harmonization of WGS methodologies for source tracking.
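The abstract does not say how discriminatory power was quantified. For typing methods, a widely used measure is Simpson's index of diversity as adapted by Hunter and Gaston (1988). It gives the probability that two unrelated isolates drawn at random are assigned to different types. The Python sketch below computes that index under a hypothetical assumption: each isolate has already been assigned a type label, for example a SNP-based cluster.

```python
from collections import Counter
from typing import Hashable, Sequence


def hunter_gaston_index(type_assignments: Sequence[Hashable]) -> float:
    """Simpson's index of diversity as adapted by Hunter and Gaston.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), where n_j is the number of isolates
    assigned to type j and N is the total number of isolates. Labels are assumed inputs.
    """
    n = len(type_assignments)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(type_assignments)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))
```

For example, four isolates typed A, A, B, C give D = 1 - 2/12, or about 0.83. A method that places every isolate in its own type gives D = 1.0.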
Abstract Title: Dynamic Human Environmental Exposome Revealed by Longitudinal Personal Monitoring
Author Block: C. Jiang, X. Wang, X. Li, J. Inlora, T. Wang, Q. Liu, M. Snyder; Stanford University, Palo Alto, CA.
Abstract:
Background: Human health is greatly affected by genetics, environmental exposure, and lifestyle. In recent years, significant effort has been dedicated to understanding how genetics and lifestyle influence our health. However, our understanding of human environmental exposures, especially at the personal level, is quite limited. Information about environmental exposures, both biotic (e.g. pollens and microbes) and abiotic (e.g. chemicals), can be important for understanding and monitoring numerous diseases. These include respiratory diseases, allergy and asthma, chronic inflammatory diseases, and even cancer.
Methods: We have developed a novel, highly sensitive method to monitor personal airborne biological and chemical exposures, collectively referred to as the environmental exposome, over time. The method integrates a wearable device with multi-omics measurements, using non-targeted next-generation sequencing (NGS) and mass spectrometry (mass-spec) approaches. We applied this method to track 15 individuals spatiotemporally, three of whom were tracked for up to 890 days and 201 time points, to provide an extensive personal profile of the environmental exposome.
Results: We demonstrated that individuals are potentially exposed to thousands of pan-domain species and thousands of chemical compounds, including insecticides and carcinogens. In aggregate, over 2500 species were identified, with great intraspecies diversity. We found that personal biological and chemical exposomes are highly dynamic and vary spatiotemporally, even for individuals located in the same general geographical region. We were able to construct a season-predictive model based on the pan-domain genus-level profile. Integrated analysis of biological and chemical exposomes revealed strong location-dependent relationships. Finally, we built an exposome interaction network. It revealed distinct yet interconnected human-centric and environment-centric clouds, reflecting extensive inter-species relationships across interacting ecosystems such as humans, flora, pets and arthropods.
Conclusions: Overall, we describe a method to capture and analyze personal environmental exposures. We demonstrate that human exposomes are diverse, dynamic, spatiotemporally driven interaction networks with the potential to affect human health. Specifically, our exposures can be largely partitioned into environment-centric and human-centric components, with the latter largely reflecting the emerging "human microbial cloud" concept. Finally, both the data and the approach are expected to be of general value to a broad spectrum of scientific fields, such as public health, biotechnology, microbiome research, environmental science, evolution, and ecology.
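The abstract mentions a season-predictive model built on genus-level profiles but does not name the model. As a stand-in, the Python sketch below uses a nearest-centroid classifier on relative abundances. This is a swapped-in technique for illustration, not the authors' method. The function names, genus names and read counts in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # hypothetical input: genus -> read count for one sample


def relative_abundance(counts: Counts) -> Dict[str, float]:
    total = sum(counts.values())
    return {genus: c / total for genus, c in counts.items()}


def season_centroids(samples: List[Tuple[str, Counts]]) -> Dict[str, Dict[str, float]]:
    """Mean relative-abundance profile per season (genera absent from a sample count as 0)."""
    sums: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    n_samples: Dict[str, int] = defaultdict(int)
    for season, counts in samples:
        for genus, fraction in relative_abundance(counts).items():
            sums[season][genus] += fraction
        n_samples[season] += 1
    return {s: {g: v / n_samples[s] for g, v in genera.items()} for s, genera in sums.items()}


def predict_season(centroids: Dict[str, Dict[str, float]], counts: Counts) -> str:
    """Return the season whose centroid is closest in squared Euclidean distance."""
    profile = relative_abundance(counts)

    def distance(centroid: Dict[str, float]) -> float:
        genera = set(centroid) | set(profile)
        return sum((centroid.get(g, 0.0) - profile.get(g, 0.0)) ** 2 for g in genera)

    return min(centroids, key=lambda season: distance(centroids[season]))
```

Usage: build centroids from labelled training samples with `season_centroids`, then call `predict_season` on a new sample's genus counts. With thousands of genera, a real model would typically add feature selection or a regularised classifier.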
Abstract Title: Measuring the Influences of Contamination on Whole-Genome Sequence Analyses of Foodborne Pathogens
Author Block: A. Pightling1, C. Carrillo2, N. Petronella3, J. Baugher1, A. Angers-Loustau4, S. Berthelet2, S. Chandry5, P. Evans6, M. Mistou7, M. Petrillo8, W. Tong9, G. Van Domselaar10, W. Zou11, B. Blais2; 1U.S. Food and Drug Administration, College Park, MD, 2Canadian Food Inspection Agency, Ottawa, ON, CANADA, 3Health Canada, Ottawa, ON, CANADA, 4European Commission, Ispra, ITALY, 5Commonwealth Scientific and Industrial Research Organisation, Melbourne, AUSTRALIA, 6U.S. Department of Agriculture, Washington, DC, 7French National Institute for Agricultural Research, Paris, FRANCE, 8European Commission, Brussels, BELGIUM, 9U.S. Food and Drug Administration, Little Rock, AR, 10Public Health Agency of Canada, Winnipeg, MB, CANADA, 11U.S. Food and Drug Administration, White Oak, MD.
Abstract:
High-quality whole-genome sequence (WGS) data are critical for regulatory and outbreak investigations of foodborne pathogens. Quality assessment of WGS data involves examining coverage, read length, and quality scores to ensure that the data are fit for purpose. The integrity of DNA samples is also an important consideration. Cross-contamination may occur during sample preparation for sequencing (e.g., during colony picking and growth of bacteria, gDNA extraction, and library preparation), as may carryover from previous runs. Genome sequencing in food safety laboratories focuses mainly on Escherichia coli, Listeria monocytogenes, Campylobacter jejuni/coli, and Salmonella enterica. It is currently unknown how contamination of WGS data for one of these pathogens by another (e.g., E. coli contamination of an L. monocytogenes sequencing run) influences downstream bioinformatic analyses and interpretation of results. Even less understood are the consequences of contamination by the same species (e.g., E. coli contamination of an E. coli sequencing run) for analyses and final interpretations. Laboratories need to understand what sample purity and WGS data quality are required for robust analyses and appropriate interpretation. With that understanding, they can implement controls that ensure the high level of confidence needed for WGS analysis in a regulatory context. Beyond data quality and purity, the choice of bioinformatic methods and parameters can affect the reproducibility of WGS analyses for pathogen identification and outbreak detection, and hence the interpretation of results. For example, single-nucleotide polymorphism and multi-locus sequence subtyping methods may yield different results under certain conditions, such as variations in sequence quality, degrees of DNA sample purity, and different dataset compositions. In phylogenetic analyses, these differences may manifest as variations in branching order and branch lengths that could change investigative conclusions. Data quality, purity, and methodology may also alter conclusions about alleles identified in target loci, such as those that confer antimicrobial resistance or contribute to the propensity to cause disease (virulence). This project addresses these issues by measuring our ability to detect contamination. In addition, we measure the accuracy, specificity, and sensitivity of WGS analyses performed with multiple bioinformatic approaches. These analyses use real and simulated datasets with different amounts of contamination by genomic DNA of various genetic distances.
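The study uses simulated datasets with different amounts of contamination. One simple way to build such datasets is to mix reads from a target genome and a contaminant at a chosen fraction. The Python sketch below shows only this mixing step. It assumes reads are plain strings already produced elsewhere (for example by a read simulator) and samples them with replacement. The function name and example reads are hypothetical, and this is not the project's actual simulation protocol.

```python
import random
from typing import List, Sequence


def simulate_contaminated_run(
    target_reads: Sequence[str],
    contaminant_reads: Sequence[str],
    fraction: float,
    n_reads: int,
    seed: int = 0,
) -> List[str]:
    """Build a shuffled read set in which `fraction` of reads come from the contaminant.

    Illustrative sketch: reads are sampled with replacement, and a fixed seed makes the
    mixture reproducible across repeated benchmark runs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_contaminant = round(n_reads * fraction)
    reads = rng.choices(contaminant_reads, k=n_contaminant)
    reads += rng.choices(target_reads, k=n_reads - n_contaminant)
    rng.shuffle(reads)  # avoid any ordering signal between target and contaminant reads
    return reads
```

Sweeping `fraction` (for example 0.01, 0.05, 0.1) and contaminants of increasing genetic distance produces a grid of datasets. Each dataset can then be run through the SNP, MLST and AMR pipelines under comparison.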
Title: 4: Drugs and Thugs: NGS to Combat AMR
Time: Tuesday, September 25, 2018, 3:45 pm - 6:00 pm
Abstract Title: AMRtime: Rapid Accurate Identification of Antimicrobial Resistance Determinants from Metagenomic Data
Author Block: F. Maguire1, B. Alcock2, F. S. Brinkman3, A. G. McArthur2, R. G. Beiko1; 1Dalhousie University, Halifax, NS, CANADA, 2McMaster University, Hamilton, ON, CANADA, 3Simon Fraser University, Burnaby, BC, CANADA.
Metagenomics, the direct sequencing of the mixture of genomes present in a sample, is
an increasingly common workflow within the life sciences. It is frequently used to
investigate previously intractable problems such as the functional characterisation of
entire microbial environments. One such use-case of global and national public-health
importance is analysing the nature and transmission dynamics of antimicrobial resistance
(AMR) determinants in human, agri-food and environmental samples. Recently some
tools have been developed to profile AMR from metagenomes, however, these are
generally limited to profiling at the level of AMR genes clustered by % sequence identity,
which may or may not be biologically meaningful. By exploiting the expertly curated
ontological structure of the Comprehensive Antibiotic Resistance Database (CARD) and
new CARD Prevalence datasets, we have developed an approach using a hierarchical set
of machine learning classifiers. This allows us to produce gene-specific AMR profiles to
2386 determinants as well as profiles for higher order, biologically informed, AMR gene
family groups. Firstly, DIAMOND based heuristically accelerated homology searches are
used to filter out non-AMR related metagenomic reads. This filtering has been optimised
to prioritise minimisation of false negatives over minimising false positives. Features
Abstract:
generated from these homology searches as well as sequence features are then used to
train a random forest classifier to classify filtered reads into one of 227 CARD AMR gene
families (e.g. MCR phosphoethanolamine transferase). For each gene family an additional
random forest classifier is trained to classify reads into one of the specific AMR
determinants belonging to that family (e.g. MCR-1, MCR-2, MCR-3 etc.). This process
adds very little computational overhead beyond the initial homology
search. On a fully held-out test set of MiSeq reads simulated from the CARD canonical
gene sequences, this method achieved an average precision and recall of 0.993 and
0.987, respectively, at the AMR gene family level. Within the 227 AMR families, 70% (158) had an
average F1-score greater than 0.99 for classification to specific AMR determinants. A
further 10% (24) averaged F1-scores between 0.8 and 0.99. In comparative analyses on
the same dataset, this approach outperformed homology searches alone, read mapping and
variation-graph-based methods in terms of average overall accuracy and precision.
Further work will aim to improve classification within certain families and expand
AMRtime to include variant based AMR models as well as meta-models (e.g. multi-
component efflux pump systems).
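The two-level scheme described above (a gene-family classifier, then one determinant classifier per family) can be sketched as follows. This is a minimal illustration, not AMRtime's code: a nearest-centroid rule stands in for the random forests, the DIAMOND filtering step is omitted, and the feature vectors and labels in the example are hypothetical toy values.

```python
from collections import defaultdict
from math import dist


def centroids(rows):
    """Mean feature vector per label; rows are (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for feats, label in rows:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, x in enumerate(feats):
            acc[i] += x
        counts[label] += 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def nearest(model, feats):
    """Label whose centroid is closest (Euclidean distance) to feats."""
    return min(model, key=lambda label: dist(model[label], feats))


class HierarchicalClassifier:
    """Stage 1 assigns a gene family; stage 2 uses only that family's model."""

    def fit(self, rows):
        # rows: (features, family, determinant) triples
        self.family_model = centroids([(f, fam) for f, fam, _ in rows])
        by_family = defaultdict(list)
        for f, fam, det in rows:
            by_family[fam].append((f, det))
        # One determinant-level model per family, trained only on that family.
        self.determinant_models = {fam: centroids(r) for fam, r in by_family.items()}
        return self

    def predict(self, feats):
        family = nearest(self.family_model, feats)
        determinant = nearest(self.determinant_models[family], feats)
        return family, determinant


# Hypothetical toy features for two families with two determinants each.
train = [
    ([0.0, 0.0, 0.1], "MCR", "MCR-1"),
    ([0.0, 0.2, 0.0], "MCR", "MCR-2"),
    ([5.0, 5.0, 5.1], "KPC", "KPC-2"),
    ([5.2, 5.0, 4.9], "KPC", "KPC-3"),
]
clf = HierarchicalClassifier().fit(train)
print(clf.predict([0.0, 0.05, 0.1]))  # ('MCR', 'MCR-1')
```

Because the second stage consults only the model for the predicted family, each read is compared against a few determinants rather than all of them, which keeps the per-read cost of this kind of hierarchy small.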
Abstract Title: Diversity Among blaKPC-containing Plasmids in Escherichia coli and Other Bacterial Species Isolated from the Same Patients
Author Block: T. H. Hazen1, R. Mettus2, C. L. McElheny2, S. L. Bowler2, Y. Doi2, D. A. Rasko1; 1UMB, Baltimore, MD, 2University of Pittsburgh, Pittsburgh, PA.
Abstract: Carbapenem-resistant Enterobacteriaceae are a significant public health concern, and
genes encoding the Klebsiella pneumoniae carbapenemase (KPC) have contributed to the
global spread of carbapenem resistance. In the current study, we used whole-genome
sequencing to investigate the diversity of blaKPC-containing plasmids and antimicrobial
resistance mechanisms among 26 blaKPC-containing Escherichia coli strains and 13
blaKPC-containing Enterobacter asburiae, Enterobacter hormaechei, K. pneumoniae, Klebsiella
variicola, Klebsiella michiganensis, and Serratia marcescens strains, which were isolated
from the same patients as the blaKPC-containing E. coli. A blaKPC-containing IncN and/or
IncFIIK plasmid was identified in 77% (30/39) of the E. coli and other bacterial species
analyzed. Complete genome sequencing and comparative analysis of a blaKPC-containing
IncN plasmid from one of the E. coli strains demonstrated that this plasmid is present in
the K. pneumoniae and S. marcescens strains from this patient, and is conserved among
13 of the E. coli and other bacterial species analyzed. Interestingly, while both IncFIIK and
IncN plasmids were prevalent among the strains analyzed, the IncN plasmids were more
often identified in multiple bacterial species from the same patients, demonstrating a
contribution of this IncN plasmid to the inter-genera dissemination of the blaKPC genes
between the E. coli and other bacterial species analyzed.
Abstract Title: Development and Validation of a Clinical Whole-Genome Sequencing Pipeline for the Detection of Antimicrobial Resistance Genes in Bacterial Isolates
Author Block: E. Snavely, E. Nazarian, K. Mitchell, J. Shea, P. Lapierre, M. Palumbo, W. Haas, J. Bodnar, K. Cummings, S. Morris, J. Stella, C. Wagner, K. Musser; Wadsworth Center, NYS Department of Health, Albany, NY.
Abstract: Background: As the Northeast (NE) Regional Antimicrobial Resistance Laboratory
Network (ARLN) lab, the Wadsworth Center detects, prevents, and responds to infectious
disease threats by characterizing carbapenem-resistant and colistin-resistant bacterial
isolates. Current molecular approaches identify carbapenem resistance conferred by the
KPC, NDM, VIM, IMP, OXA-48-like, and OXA-23-like β-lactamase families. These
approaches are limited in their ability to determine the specific variant responsible for
resistance, detect other antimicrobial resistance (AR) genes, and identify emerging types
of AR. Whole-genome sequencing (WGS) can help to overcome these
limitations. Methods: Our WGS pipeline incorporates multiple bioinformatic analysis
steps to identify AR genes present in the genome of bacterial isolates. Following DNA
sequencing on the Illumina MiSeq platform, read quality is assessed and
low-quality sequence data are removed. Bacterial species identification is confirmed in
silico, and de novo assembly of the genome is performed. Assembly quality is assessed
using quantitative measurements prior to multilocus sequence typing analysis and AR
gene identification. Final analysis of the AR genes present in the genome assembly
compares gene identifications across multiple databases and determines the best
match. Characterization of the plasmid content of these bacteria is also performed. These
pipeline steps have been unified in a Python application that uses Docker for
infrastructure portability and dependency management. Results: Analytical performance
was established using 20 bacterial isolates to demonstrate the accuracy and
reproducibility of this bioinformatic pipeline through specificity, intra-assay and inter-assay
reproducibility studies. Accuracy was verified using both a blinded retrospective
study with 40 isolates and a prospective study of isolates that represented the range of
interpretations from this pipeline. To date, we have analyzed over 200 bacterial isolates,
demonstrating 100% concordance with the expected characterization from both real-time
PCR assays and antimicrobial susceptibility testing results. Additionally, we identified
acquired colistin- and carbapenem-resistance genes that had not previously been
detected. Conclusions: The results presented in this validation study demonstrate that this
WGS analysis pipeline is highly sensitive, specific, and reproducible. In addition to the
identification of specific gene variants responsible for resistance, we identified AR genes
present in bacterial isolates from the ARLN NE region not detected by existing methods.
The ability to rapidly detect these known and emerging AR genes is of public health
importance and will allow for prompt implementation of infection control measures and
contribute to the overall understanding of AR.
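The abstract does not name the quantitative measurements used to assess assembly quality. As an assumed example, N50 is a widely used contiguity metric; the sketch below computes it together with total length and contig count. Any pass/fail thresholds applied to these values would be the laboratory's own choice and are not shown.

```python
def assembly_stats(contig_lengths):
    """Total length, contig count and N50 for a list of contig lengths.

    N50 is the largest length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            n50 = length
            break
    return {"total_length": total, "n_contigs": len(lengths), "n50": n50}


print(assembly_stats([100, 200, 300, 400]))
# {'total_length': 1000, 'n_contigs': 4, 'n50': 300}
```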
Title: 5: Pipe Dreams: Analytical Methods, Bioinformatics Tools, and Pipelines
Time: Wednesday, September 26, 2018, 9:00 am - 12:45 pm
Abstract Title: Plasmid Detection and Assembly in Genomic and Metagenomic Datasets
Author Block: D. Antipov1, M. Raiko1, A. Lapidus1, P. A. Pevzner2; 1St. Petersburg State University, St. Petersburg, RUSSIAN FEDERATION, 2St. Petersburg State University / UCSD, St. Petersburg, RUSSIAN FEDERATION.
Abstract: Plasmids are extrachromosomal and independently replicating DNA molecules that
provide their bacterial hosts with additional genetic material important for survival and
adaptation. Before the sequencing era, plasmids were detected based on various
phenotypic changes they confer, such as antibiotic resistance or the ability to degrade
recalcitrant organic compounds. However, sequencing efforts revealed many cryptic
plasmids that do not contribute to the phenotype of the host cell in an obvious way.
Although there are about 10,000 plasmids listed in the RefSeq database (Pruitt et al.,
2006), many plasmids remain undetected since it is not trivial to assemble plasmids from
genomic and metagenomic datasets (Antipov et al., 2016, Rozov et al., 2017). We thus
conjecture that many classes of plasmids remain unknown, like many previously
unknown classes of viruses that were found in recent studies (Paez-Espino et al., 2016,
Roux et al., 2016). Since plasmids exchange genetic material with the host chromosomes
and vary in structure (circular or linear), size (from a thousand to millions of nucleotides),
and gene content, it is not clear how to computationally define the concept of a plasmid
in such a way that it would be possible to distinguish them from the chromosomes in
draft assemblies. Also, plasmid assembly is complicated by various repeats that are
difficult to resolve using short-read sequencing technologies. Here we present
the metaplasmidSPAdes algorithm, which improves on existing tools (Antipov et al., 2016,
Rozov et al., 2017) by (i) iteratively extracting subgraphs with gradually increasing read
coverage from the metagenome assembly graph, (ii) finding putative plasmids as simple
cycles in these subgraphs, and (iii) verifying the found putative plasmids using a
new plasmidVerify tool. We applied plasmidSPAdes+ (plasmidSPAdes complemented
by plasmidVerify) and metaplasmidSPAdes to diverse genomic and metagenomic samples
and revealed thousands of plasmids missed in previous studies, including many plasmids that
share no significant similarities with known plasmids, and plasmids carrying
antibiotic-resistance genes. This study was funded by the Russian Science Foundation
(grant 14-50-00069).
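Steps (i) and (ii) above can be illustrated with a small graph sketch. It is a simplification under stated assumptions, not metaplasmidSPAdes itself: the assembly graph is reduced to a directed graph whose edges carry a coverage value, each threshold keeps only edges at or above it, and a putative plasmid is any connected component of the filtered graph that forms a single directed cycle. Step (iii), plasmidVerify, is not modeled, and the node names and coverages are hypothetical.

```python
from collections import defaultdict


def isolated_cycles(edges):
    """Return components of a directed graph that form one simple cycle.

    edges: iterable of (u, v) pairs. A weakly connected component in which
    every node has exactly one successor and one predecessor is a single
    cycle; its nodes are returned in traversal order.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    nodes = sorted(set(succ) | set(pred))
    seen, cycles = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # undirected traversal of the component
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(succ[n] + pred[n])
        seen |= component
        if all(len(succ[n]) == 1 and len(pred[n]) == 1 for n in component):
            order, n = [start], succ[start][0]
            while n != start:
                order.append(n)
                n = succ[n][0]
            cycles.append(order)
    return cycles


def putative_plasmids(cov_edges, thresholds):
    """cov_edges: (u, v, coverage) triples. Returns (threshold, cycle) pairs."""
    found, reported = [], set()
    for t in sorted(thresholds):
        kept = [(u, v) for u, v, cov in cov_edges if cov >= t]
        for cycle in isolated_cycles(kept):
            key = frozenset(cycle)
            if key not in reported:  # report each cycle at its lowest threshold
                reported.add(key)
                found.append((t, cycle))
    return found


# Hypothetical graph: a branching chromosome at ~10x and a 2-node cycle at ~50x.
graph = [
    ("A", "B", 10), ("B", "C", 10), ("C", "A", 10), ("B", "D", 10), ("D", "A", 10),
    ("P", "Q", 50), ("Q", "P", 50),
]
print(putative_plasmids(graph, [5, 30]))  # [(5, ['P', 'Q'])]
```

Raising the threshold matters when a high-coverage cycle is attached to lower-coverage sequence: the attachment edges drop out and the cycle becomes an isolated component only at the higher threshold.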
Abstract Title: Dash: Efficient Genomic Set Operations Using HyperLogLogs
Author Block: D. N. Baker, B. Langmead; Johns Hopkins University, Baltimore, MD.
Abstract: Many recent studies have proposed new alignment-free methods for comparing,
clustering and classifying genomes and sequencing datasets. We propose a new method,
Dash, which uses the HyperLogLog (HLL) sketch data structure. This structure specializes
in estimating the cardinalities of sets as well as the cardinalities of their unions and
intersections with other sets. Using the HLL, Dash is substantially faster than previous
methods based on MinHash, while providing greater accuracy and reduced bias. We
show this across large ranges of pairwise genome distances, both in terms of Jaccard
index and in terms of average nucleotide identity, and we show that previous tools
exhibit biases across some portions of this range. Dash also supports approximate
membership queries (for which Bloom filters have generally been used), and we show
that Dash's HyperLogLog-based techniques can outperform Bloom filters for these tasks
as well. Dash is implemented in C++17 and is freely available under GPLv3 at
https://github.com/dnbaker/dash.
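The core HyperLogLog operations that Dash builds on can be sketched briefly. This is a textbook HLL, not Dash's implementation: BLAKE2b stands in for Dash's hash function, the estimator is the original HLL formula with linear counting for small sets (Dash uses more refined estimators), and the Jaccard index is derived by inclusion-exclusion from three cardinality estimates. The precision p=12 (4,096 registers) is an arbitrary choice.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item: str) -> int:
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        h = self._hash64(item)
        width = 64 - self.p
        index = h >> width  # top p bits select a register
        rest = h & ((1 << width) - 1)  # remaining bits
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def cardinality(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)  # linear counting for small sets
        return raw

    def union(self, other: "HyperLogLog") -> "HyperLogLog":
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


def sketch_kmers(seq: str, k: int, p: int = 12) -> HyperLogLog:
    """Insert every k-mer of seq into a fresh sketch."""
    hll = HyperLogLog(p)
    for i in range(len(seq) - k + 1):
        hll.add(seq[i:i + k])
    return hll


def jaccard(a: HyperLogLog, b: HyperLogLog) -> float:
    """Jaccard estimate via |A n B| = |A| + |B| - |A u B|."""
    union = a.union(b).cardinality()
    inter = max(0.0, a.cardinality() + b.cardinality() - union)
    return inter / union if union else 0.0


a, b = HyperLogLog(), HyperLogLog()
for i in range(10000):
    a.add(f"item{i}")
    b.add(f"item{i + 5000}")
# True values: |A| = 10000 and J(A, B) = 5000 / 15000 = 1/3.
print(round(a.cardinality()), round(jaccard(a, b), 3))
```

The union is just an element-wise maximum over registers, which is why merging sketches is cheap; the inclusion-exclusion step is where simple HLL-based Jaccard estimates lose accuracy for small overlaps.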
Abstract Title: AusTrakka: Enabling Data Sharing for Surveillance --- or Why Your Parents Were Right
Author Block: A. Goncalves da Silva1, T. Seemann2, D. A. Williamson1, B. P. Howden1; 1Microbiological Diagnostic Unit Public Health Lab, Parkville, AUSTRALIA, 2Melbourne Bioinformatics, Parkville, AUSTRALIA.
Abstract: If you are a parent or have observed parents, you have uttered (or heard) something
similar to: you need to share with your friend/sibling/cousin. We don’t instinctively share
things, but we are taught that there are positive outcomes in doing so. By sharing our toys
we often gain friendship, which makes things a lot more fun - or, at the very least, it stops
the parents from nagging. Scientists, epidemiologists, and public health officials don’t
instinctively share data either. But we have numerous examples of the benefits that can
come from real-time data sharing in public health microbiology (e.g. GenomeTrakr), and
it can bring the global public health community closer together. However, it is still
difficult to share data, even if there is a will. Uploading sequence data to giant
sequencing archives (i.e., NCBI, ENA, DDBJ) can be onerous, time-consuming, and
sometimes bewildering. There may also be relevant legislation preventing the upload of
such information to public archives. Now, much like sharing toys, sharing data requires
trust: trust that the other person won’t break the toy or misuse the data. When learning
to share, we start small, with a toy we are not particularly interested in, keeping the real
goods out of reach. That is the goal of AusTrakka: to create a safe environment where
data sharing amongst public health laboratories can become routine, and where the impact
on surveillance can be felt directly by all Australian laboratories, irrespective of their
sequencing and bioinformatic capacity. In many ways, it will replicate functionality seen
in other tools and databases (e.g., GenomeTrakr, Enterobase, Innuendo, Microreact,
IRIDA), but will do so in a way that respects yet de-emphasizes local legislation. In doing so,
we hope it will whet people’s appetite for more, allowing for sharing at broader scales. In
other aspects, it will try to improve on how things are currently done, in particular, how
data are uploaded. Here, we present a demo of what the AusTrakka platform will look like. It is
composed of a responsive and dynamic front-end in JavaScript, a backend of REST APIs
and a PostgreSQL database, with analyses conducted using open-source tools versioned
in Singularity/Docker containers run in a cloud environment.
Abstract Title: ORION - One health surRveillance Initiative on harmOnization of Data Collection and interpretatioN
Author Block: K. Lagesen1, F. Dorea2, J. Ellis-Iversen3, I. Boone4, T. Buschhardt4, J. Gethmann5, M. Filter4; 1Norwegian Veterinary Institute, Oslo, NORWAY, 2Swedish National Veterinary Institute, Uppsala, SWEDEN, 3Technical University of Denmark, Kgs. Lyngby, DENMARK, 4German Federal Institute for Risk Assessment, Berlin, GERMANY, 5Friedrich-Loeffler-Institut, Greifswald - Insel Riems, GERMANY.
Abstract: We would like to present the ORION “One Health EJP” EU project, which aims at
establishing and strengthening inter-institutional collaboration and transdisciplinary
knowledge transfer within surveillance data integration and interpretation, in line with the One
Health (OH) objective of improving health and well-being. This will be achieved through
an interdisciplinary collaboration of 12 veterinary, food and/or public health institutes
from 6 European countries. These agencies are committed to adopting best-practice OH
surveillance solutions (guidelines, methods, tools and knowledge). Specifically, this project
will create an “OH Surveillance Codex”, a high-level framework for harmonized,
cross-sectional description and categorization of surveillance data covering all surveillance
phases and all knowledge types; an “OH Knowledge Base”, a cross-domain inventory of
currently available data sources, methods, algorithms and tools that support OH
surveillance data generation, data analysis, modelling and decision support; and a set of
“OH Surveillance Infrastructural Resources”, practical, low-level resources which can
form the basis for successful harmonization and integration of surveillance data and
methods.
One of the work packages within this project has as its focus how new technologies,
amongst them sequencing and bioinformatics, can be used for surveillance. These
methods are challenging for many public, food and veterinary institutions to implement.
This work package aims to create an overview of what infrastructure is needed, as well as
which methods and tools are in current use. Another aim is to examine how sequencing-based
results may be used in epidemiological surveillance models. The ultimate goal for
this part of the project is to create a handbook which can help institutions use such
methods in a way that enables them to get the results they need, while also producing
results that can be integrated with those from other institutions.
Abstract Title: From Contigs to Chromosomes: A Hi-C Based Graph Assembly Tool Significantly Improves Metagenome Contiguity and Culture-free Metagenomic Deconvolution
Author Block: Z. Kronenberg1, S. Sullivan1, A. Wiser1, M. Press1, S. Eacker1, T. Stalder2, M. Shakya3, E. Top2, P. Chain3, C. Lo3, I. Liachko1; 1Phase Genomics, Seattle, WA, 2University of Idaho, Moscow, ID, 3Los Alamos National Labs, Los Alamos, NM.
Abstract: Metagenome assembly using either long- or short-read sequence data remains
challenging due to the loss of long-range genomic contiguity and the mixing of DNA from
different species within a sample. Short-read sequence data yield fragmented
assemblies, largely due to the inability to assemble across non-unique sequences. Long-read
assemblies suffer from ascertainment bias due to differences in the relative abundance
of microbial genomes. All assemblies are hindered by the loss of intracellular sequence
connectivity during DNA extraction. The fragmentation of microbial contigs is caused,
in part, by ambiguity in the assembly graph, where bubbles and bifurcations obscure the
true genomic sequence. To resolve ambiguities in the graph, we leverage the proximity
information from Hi-C data. The in vivo proximity-ligation technology, Hi-C, uses chemical
crosslinking to create chimeric junctions between DNA sequences that were proximal in
physical space prior to DNA extraction. As such, it captures which sequences occupied
the same cells prior to lysis and allows culture-free genome-scale deconvolution of mixed
assemblies. We have developed a new metagenome assembler that works by layering Hi-
C data onto Graphical Fragment Assembly (GFA) files, and then uses an ant colony
optimization algorithm to identify likely unitig paths. These paths are then merged into
contigs. Benchmarks done on synthetic communities show that this method significantly
increases the average contig lengths in metagenome assemblies while decreasing the
rates of contig chimerism. Additionally, we show that applying Hi-C data downstream of
assembly enables the de novo clustering of genomes, the association of plasmids with
hosts, and strain deconvolution without culturing.
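The principle that Hi-C links group sequences from the same cell can be illustrated with a deliberately simple sketch: count read pairs joining each pair of contigs and merge contigs whose link count reaches a threshold, using union-find. This is not the ant colony assembler described above; real Hi-C binning also normalizes link counts (for example by contig length and restriction-site density), and the contig names, counts and min_links value are hypothetical.

```python
from collections import Counter


def cluster_by_hic(contigs, hic_pairs, min_links=3):
    """Group contigs that share at least min_links Hi-C read pairs.

    contigs: all contig names; hic_pairs: (contig_a, contig_b) for each read
    pair, where every name must appear in contigs. Intra-contig pairs are ignored.
    """
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    links = Counter(tuple(sorted(p)) for p in hic_pairs if p[0] != p[1])
    for (a, b), n in links.items():
        if n >= min_links:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


pairs = (
    [("chr1_a", "chr1_b")] * 5
    + [("plasmid_1", "chr1_b")] * 4  # plasmid co-occurs with its host's contigs
    + [("chr2_a", "chr2_b")] * 6
    + [("chr1_a", "chr2_a")]  # a single spurious inter-genome link
)
contigs = ["chr1_a", "chr1_b", "chr2_a", "chr2_b", "plasmid_1"]
print(cluster_by_hic(contigs, pairs))
# [['chr1_a', 'chr1_b', 'plasmid_1'], ['chr2_a', 'chr2_b']]
```

The plasmid ends up in the same group as its host's chromosomal contigs, which is the plasmid-host association the abstract describes, while the one-off link below the threshold does not merge the two genomes.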
Abstract Title: Locally Partitioned Phylogenetic Agglomeration Enables Exhaustive Detection of Outbreak Clusters via Global Genome Trees
Author Block: S. S. Minot; Fred Hutchinson Cancer Research Center, Seattle, WA.
Abstract: As whole-genome sequencing has been deployed to state and local public health
laboratories, the amount of whole-genome data available for outbreak surveillance has
expanded rapidly. While the computational algorithms used to perform taxonomic
identification and whole-genome comparison of individual pairs of isolates have made
great strides (e.g. MinHash-based comparison), the phylogenetic methods used to
construct genome trees have lagged behind considerably. For translating the whole-genome
data generated by next-generation sequencing platforms into a form that is
readily consumable by public health professionals, the phylogenetic tree is still the most
convenient and easily understandable modality. Moreover, recent advancements in
visualization (e.g. Nextstrain) have provided intuitive interactive user interfaces for
navigating trees with hundreds or thousands of tips, enabling public health professionals
to identify potential outbreak clusters. However, the core phylogenetic algorithms used
to transform pairwise genome comparisons into genome trees are inherently unscalable
and cannot practically be applied to the complete set of whole-genome data that is
available (on the order of 10^5 genomes). Here we present a novel computational approach that
uses local partitioning and phylogenetic agglomeration in order to rapidly create genome
trees which closely approximate the topology of traditional methods. This novel
computational approach will help enable public health professionals to use the entirety
of microbiome genome surveillance for outbreak epidemiology, even as genome
databases continue to expand.
References:
Hadfield, J. et al. (2018) ‘Nextstrain: real-time tracking of pathogen evolution’, Bioinformatics (Oxford, England). doi: 10.1093/bioinformatics/bty407.
Ondov, B. D. et al. (2016) ‘Mash: fast genome and metagenome distance estimation using MinHash’, Genome Biology. BioMed Central, 17(1), p. 132. doi: 10.1186/s13059-016-0997-x.
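The abstract does not detail its algorithm, so the following is only an assumed illustration of the general partition-then-agglomerate pattern: genomes are first split into groups linked by distances at or below a threshold, and an ordinary UPGMA tree is then built within each group. The partition step here still compares pairs naively, so it does not by itself deliver the scalability claimed above; a practical version would need a cheaper neighbor search, such as MinHash-based screening. The distances and threshold are hypothetical.

```python
from itertools import combinations


def partition(leaves, dist, threshold):
    """Connected components of the graph joining leaves within threshold."""
    unassigned = list(leaves)
    groups = []
    while unassigned:
        frontier, group = [unassigned.pop(0)], []
        while frontier:
            x = frontier.pop()
            group.append(x)
            near = [y for y in unassigned if dist(x, y) <= threshold]
            for y in near:
                unassigned.remove(y)
            frontier.extend(near)
        groups.append(group)
    return groups


def upgma(leaves, dist):
    """UPGMA over leaves; returns the tree topology as nested tuples."""
    if not leaves:
        return None
    clusters = {i: (name, 1) for i, name in enumerate(leaves)}  # id -> (subtree, size)
    d = {(i, j): dist(leaves[i], leaves[j]) for i, j in combinations(range(len(leaves)), 2)}
    next_id = len(leaves)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters (keys satisfy i < j)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            # size-weighted average distance to the merged cluster
            d[(k, next_id)] = (dik * ni + djk * nj) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]


TOY = {frozenset(("A", "B")): 1.0, frozenset(("C", "D")): 2.0}


def toy_dist(a, b):
    return TOY.get(frozenset((a, b)), 10.0)  # hypothetical distances


groups = partition(["A", "B", "C", "D"], toy_dist, threshold=5.0)
print(groups)  # [['A', 'B'], ['C', 'D']]
print([upgma(g, toy_dist) for g in groups])  # [('A', 'B'), ('C', 'D')]
```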
Abstract Title: Mash Screen: Fast Sequence Containment Estimation Using MinHash
Author Block: B. Ondov1, A. Kostic2, A. Sappington3, S. Koren1, A. Phillippy1; 1National Human Genome Research Institute, Bethesda, MD, 2Princeton University, Princeton, NJ, 3Massachusetts Institute of Technology, Cambridge, MA.
Abstract: Background: The MinHash algorithm for dimensionality reduction has proven effective
for extremely fast, probabilistic estimation of mutational distances between either two
genomes or two metagenomes. However, these estimates have been underpinned by k-mer
set operations for measuring resemblance rather than containment, making existing
tools inappropriate for comparing smaller sequences to larger ones, e.g. finding genomes
within metagenomes. Methods: We implement sequence containment estimation using
a novel, online algorithm that is compatible with existing MinHash approaches and
supports estimation of genome containment within unannotated read sets. Containment
of entire proteomes within a set of raw sequencing reads can also be tested using on-the-fly
six-frame translation. From these containment scores, we propose a formula for
estimation of either whole-genome or whole-proteome identity and demonstrate
meaningful correlation of this statistic with real data from a synthetic
metagenome. Results: We demonstrate the usefulness of our method in two directions.
First, a new read set can be rapidly screened for hits to a reference database. We show
how this can be used for detecting contamination and for narrowing potential
constituents for more precise classification. Second, we demonstrate the construction of
a prescreened library for all SRA metagenomes against all RefSeq genomes. The result of
this computation is a similarity relation between all RefSeq genomes and the
corresponding SRA metagenomes predicted to contain them. This allows retrospective
studies in which existing sequencing data can be mined for newly discovered genomes. As
an example, we identify all SRA records predicted to contain close relatives of a recently
described human polyomavirus. Conclusions: Mash first introduced MinHash for rapid
genome-to-genome comparisons. Mash Screen now extends these ideas to the problem
of genome-to-metagenome comparisons. While many tools exist for short sequence
search, Mash Screen provides a fast and convenient tool for whole-genome search. This
type of search can be used to check new sequencing runs for contamination, build a
catalog of genomes contained within a metagenome, or search past sequencing runs
for genomes of interest.
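The containment idea can be sketched as follows, under stated assumptions rather than as Mash Screen's implementation: BLAKE2b replaces Mash's own hash function, canonical k-mers are used, the query genome is reduced to a bottom-s MinHash sketch, and the reads are streamed once while recording which sketch hashes appear. The identity conversion assumes a simple model in which each k-mer of a genome at identity x is retained with probability x^k, so x is estimated as C^(1/k); the abstract does not state its exact formula.

```python
import hashlib
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """64-bit hash of the canonical (strand-independent) form of a k-mer."""
    canonical = min(kmer, kmer.translate(COMPLEMENT)[::-1])
    return int.from_bytes(hashlib.blake2b(canonical.encode(), digest_size=8).digest(), "big")


def bottom_sketch(seq: str, k: int, s: int) -> set:
    """The s smallest distinct k-mer hashes of seq (a bottom-s MinHash sketch)."""
    hashes = {kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return set(sorted(hashes)[:s])


def screen(query: str, reads, k: int = 21, s: int = 1000):
    """Estimate containment of query in a read set and convert it to identity."""
    sketch = bottom_sketch(query, k, s)
    seen = set()
    for read in reads:  # single streaming pass over the reads
        for i in range(len(read) - k + 1):
            h = kmer_hash(read[i:i + k])
            if h in sketch:
                seen.add(h)
    containment = len(seen) / len(sketch) if sketch else 0.0
    identity = containment ** (1.0 / k)  # assumed model: x ** k of k-mers retained
    return containment, identity


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(3000))
reads = [genome[i:i + 150] for i in range(0, len(genome) - 149, 50)]
print(screen(genome, reads))  # (1.0, 1.0): every sketched k-mer is present
```

Only the query is sketched; the reads are never stored, which is what makes this kind of search practical over large archives.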
Abstract Title: Viral Pathogen Cloud Pipelines Enable Locally-driven NGS Analyses in Limited-resource Labs.
Author Block: D. J. Park1, C. Tomkins-Tinch1, K. J. Siddle1, S. M. Winnicki1, M. F. Lin2, P. C. Sabeti1; 1Broad Institute of Harvard and MIT, Cambridge, MA, 2DNAnexus, Mountain View, CA.
Abstract: Here we present "viral-ngs," a package of tools for metagenomic identification, assembly,
annotation, and intrahost variant characterization of viral sequence data. Viral genomic
data present distinct analysis challenges, including: extremely high diversity within each
species (Lassa virus is 70% conserved at the nucleotide level), poor representation of that
diversity in published databases, low viral signal-to-background ratios in read data, and
cross-contamination in read data from other samples and background metagenomic
noise. Methods and best practices originally developed for Lassa virus analysis were
generalized into viral-ngs, first released in 2014, which has since been successfully applied
to a wide range of uncultured viral samples, including: Lassa, Ebola, Zika, mumps,
enterovirus, HIV, influenza, Powassan, rabies, CMV, VZV, as well as samples from patients
with fevers of unknown origin.
The software is distributed via GitHub, Bioconda, and the Quay.io Docker registry.
Individual atomic analysis steps can be invoked from Python scripts at the command line.
Common viral analysis pipelines and workflows are provided in both Snakemake and WDL
languages; the latter implementation facilitates our continuous integration and
deployment to the DNAnexus platform. DNAnexus is a Platform-as-a-Service (PaaS)
vendor that provides an intuitive web interface for genomic analyses on a cloud compute
backend. This is currently the most widely used implementation of our viral-ngs package
and is used by our West African viral collaborators for all of their genomic analysis needs.
Iterative improvement of this software suite continues based on our research needs and
the evolving needs of our collaborating pathogen sequencing centers in West Africa
(https://acegid.org/), Walter Reed Army Institute of Research affiliated sites in West
Africa, and a number of state public health labs in the Northeast US. The DNAnexus
implementation, in particular, enables reliable, real-time cloud data capture from West
African sequencing centers with unstable power and internet connectivity.
Session: Poster Session A
Poster Board #: 1
Title: Identification of Gram-Negative Non Fermentative Bacteria - How Hard Can It Be?
Author Block: T. Whistler1, P. Sawatwong2, O. Sangwichian2, P. Jorakate2, S. Makprasert2, U. Surin3, L. F. Peruski1; 1Centers for Disease Control and Prevention, Atlanta, GA, 2Thailand MOPH - US CDC Collaboration, Nonthaburi, THAILAND, 3Nakhon Phanom Provincial Health Office, Nakhon Phanom, THAILAND.
Abstract: Background: Gram-negative non-fermentative (GNNF) bacteria are ubiquitous in nature and,
when isolated from clinical specimens, are generally disregarded as possible contaminants.
Recently they have emerged as important opportunistic and nosocomial pathogens. Due to
taxonomic complexity and high phenotypic similarity between these organisms, accurate
identification represents a challenge for conventional microbiology. We used the MiSeq
next-generation sequencing platform to identify isolates missed by traditional
methods. Methods: In a retrospective study, we examined 9 years of bloodstream infection
data gathered from population-based surveillance in two rural Thai provinces for unidentified
GNNF organisms. Isolates not identified by manual biochemical tests (221/14507) were
retested on the BD Phoenix Automated Identification System; organisms not recognized
(15/221) were subjected to whole-genome sequencing on an Illumina MiSeq. Organism
characterization was attempted using several different web-based platforms, including
Taxonomer (https://www.taxonomer.com/), One Codex (https://onecodex.com/), and
Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/). FASTQ files were
uploaded directly to each web-based server. Results: All 15 isolates were successfully
sequenced and met quality control conditions. Organism identification was achieved for all
15 isolates; however, all 3 programs agreed at the genus level for only 10/15 isolates, and at
the species level for only 1/15 (Laribacter hongkongensis). Two of the 3 programs agreed at
the species level for 7/15 isolates: Wohlfahrtiimonas chitiniclastica, Moraxella osloensis (2),
Burkholderia cenocepacia, Roseomonas gilardii (2) and Streptococcus pasteurianus. Other
isolates were Acinetobacter (2), Delftia, Pseudomonas, Staphylococcus and Bacillus species.
Conclusions: One of the largest issues around implementing next-generation sequencing is the
bioinformatics. Public health laboratories in many countries would benefit enormously from
this technology, especially as cheaper, low-throughput systems become available. Free
web-based services are important for implementation of WGS at our partner-run international
sites, where laboratorians do not have access to high-powered computing. Common
pathogens are well represented in databases, but as is evident from this study, lesser-known
opportunistic pathogens with high phenotypic variance are problematic, although NGS does
provide some promising results.
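Tallying agreement among the three web services at a given taxonomic rank, as in the results above, amounts to a simple count. A minimal sketch follows; the per-isolate calls in the example are hypothetical placeholders, not the study's data, and "CGE" abbreviates the Center for Genomic Epidemiology.

```python
from collections import Counter


def rank_agreement(isolates, rank):
    """Count isolates where all programs agree, or where at least two (not all) agree.

    isolates: one dict per isolate mapping program -> {rank: name or None}.
    Missing or None calls never count towards agreement.
    """
    full = partial = 0
    for calls in isolates:
        names = Counter(c.get(rank) for c in calls.values() if c.get(rank))
        top = names.most_common(1)[0][1] if names else 0
        if names and top == len(calls):
            full += 1
        elif top >= 2:
            partial += 1
    return full, partial


isolates = [  # hypothetical calls for two isolates
    {"Taxonomer": {"genus": "X", "species": "X alpha"},
     "One Codex": {"genus": "X", "species": "X alpha"},
     "CGE": {"genus": "X", "species": "X beta"}},
    {"Taxonomer": {"genus": "Y", "species": "Y gamma"},
     "One Codex": {"genus": "Z", "species": None},
     "CGE": {"genus": "W", "species": None}},
]
print(rank_agreement(isolates, "genus"))  # (1, 0)
print(rank_agreement(isolates, "species"))  # (0, 1)
```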
Poster Board #: 2
Title: An Outbreak of Streptococcus pyogenes in a Mental Health Facility in Singapore
Author Block: P. de Sessions; Genome Institute of Singapore, United States, FL.
Abstract: As humans, we share our existence with a complex community of microbial partners: fungal,
viral and bacterial. At the Genome Institute of Singapore (GIS),
the GIS Efficient Rapid Microbial Sequencing (GERMS) platform utilizes both wet-lab and dry-lab
tools to genetically characterize these populations and how they impact our lives. The
principal goal of the platform is to enable hospital, industrial and academic partners to use
microbial genomics for a variety of applications. The 5 pipelines we run are the viral runner
pipeline, 16S, metagenomics, RNA-seq and GERMS whole-genome sequencing (WGS). To
illustrate our GERMS-WGS pipeline, we tracked a Streptococcus pyogenes (S. pyogenes)
outbreak from June 2016. S. pyogenes is a Gram-positive bacterium often found in the throat
and on the skin. It can be spread through direct contact with mucus from the nose or throat of
persons who are infected or through contact with infected wounds or sores on the skin. To
identify cases, we collected oropharyngeal swabs from residents and hospital staff members
on the affected wards and swabs from residents with visible wounds on skin. Four cases of S.
pyogenes in a residential ward in a large mental health facility were observed, and shortly
afterwards an outbreak control team was formed and GERMS was called upon to help use
genomics together with epidemiology to better understand the relationship between isolates.
In addition to WGS, we also performed traditional typing methods such as emm typing and MLST.
We found that WGS had higher resolution than those typing methods in identifying
clusters with recent and ongoing person-to-person transmission, which allowed
implementation of targeted intervention to control the outbreak, avoided hospital-wide
antibiotic prophylaxis, ruled out staff transmission and distinguished between two separate
outbreaks based on their genetic similarity. As the cost of WGS goes down, the time to a full
genomic answer shortens, and the cost of WGS becomes comparable to that of traditional
sequence-based typing methods, we are approaching an era in which WGS can replace traditional
sequence-based typing methods in the clinic.
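Comparing isolates by pairwise SNP distance is one common way to use WGS data in an outbreak investigation, and it can be sketched as below. This is an assumed illustration rather than the GERMS-WGS pipeline: it takes pre-aligned core-genome sequences of equal length, ignores positions where either base is ambiguous (N) or a gap, and lists pairs within a hypothetical SNP cutoff. The sequences and cutoff are toy values; real cutoffs depend on the organism and the study.

```python
from itertools import combinations

BASES = set("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Number of positions where both sequences have a called base and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x in BASES and y in BASES)


def close_pairs(alignment: dict, max_snps: int):
    """(isolate, isolate, distance) for every pair within max_snps."""
    pairs = []
    for p, q in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[p], alignment[q])
        if d <= max_snps:
            pairs.append((p, q, d))
    return pairs


alignment = {  # hypothetical toy alignment
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAA",
    "iso3": "TTTTACGTAA",
    "iso4": "ACGTNCGTAC",
}
print(close_pairs(alignment, max_snps=1))
# [('iso1', 'iso2', 1), ('iso1', 'iso4', 0), ('iso2', 'iso4', 1)]
```

Grouping isolates connected through such close pairs gives candidate transmission clusters at a finer resolution than a single emm type or MLST sequence type can provide.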
Poster Board #: 3
Title: EDGE Bioinformatics Platform for NGS Data - Empowering the Development of Genomics Expertise
Author Block: C. Lo, P. Li, M. Shakya, M. Flynn, G. House, Y. Xu, K. Davenport, P. Chain; Los Alamos National Laboratory, Los Alamos, NM.
Abstract: Continued advancements in sequencing technologies have fueled the development of new
sequencing applications and promise to flood current databases with raw data. A number of
factors prevent the seamless and easy use of these data, including the breadth of project
goals, the wide array of tools that individually perform fractions of any given analysis, the
large number of associated software/hardware dependencies, and the detailed expertise
required to perform these analyses. To address these issues, we have developed an intuitive
web-based environment with a wide assortment of integrated and cutting-edge
bioinformatics tools in preconfigured workflows. These workflows, coupled with the ease of
use of the environment, provide even novice next-generation sequencing users with the
ability to perform many complex analyses with only a few mouse clicks and, within the
context of the same environment, to visualize and further interrogate their results. This
bioinformatics platform is an initial attempt at Empowering the Development
of Genomics Expertise (EDGE) in a wide range of applications for microbial research. EDGE is
under constant development to expand its capabilities.
Workflows:
• Pre-processing (QC & host removal)
• Assembly & Annotation
• Reference-based Analysis
• Taxonomy Classification
• Phylogenetic Analysis
• Specialty Genes
• Primer Analysis
• Identification of AMR and virulence genes
• Comparison of multiple metagenomics samples’ taxonomy results
• 16S/18S/fungal ITS analysis using QIIME
• Integration of sample metadata collection/storage
The recently released EDGE version 2.2 includes several new features:
• Report generation
• Pathogen analysis and characterization
• Presence/absence of targeted amplicons
• Differential gene expression (RNA-Seq)
• Tools for analysis of Nanopore MinION data
EDGE is available for users to upload data and try out the platform’s tools and pipelines
at https://edgebioinformatics.org/
Poster Board #: 4
Title: Application of Fungal Metagenomics for Fixed Tissue Profiling
Author Block: D. Rothenheber1, B. D. Chilson1, D. B. Needle2, R. R. Anderson1, I. Sidor2, N. J. Marra1, B. Stanhope1, A. J. Tachil1, B. Stevens2, J. C. Ellis3, S. Ahlberg4, M. Murray3, B. Thompson1, R. E. Gibson2, L. Goodman1; 1Cornell University, Ithaca, NY, 2University of New Hampshire, Durham, NH, 3Tufts University, Boston, MA, 4Center for Wildlife, Cape Neddick, ME.
Abstract: The application of metagenomics for fungal identification has become increasingly popular in
the clinical veterinary diagnostic field for identification of non-culturable organisms or from
fixed specimens. However, unlike bacterial metagenomics, no true “universal barcode” has
been established, leading researchers to choose different primer sets targeting varying
regions of the SSU, LSU, and ITS. Here, we present initial 28S profiling with the Illumina
MiSeq from two clinical cases and a case series in comparison with histopathology and culture
results as a tool for fungal identification from fixed tissue samples. The first case analyzed was
from a bearded dragon with a history of skin lesions. Nannizziopsis guarroi was cultured from
a fresh biopsy, and a subsequent biopsy was formalin-fixed and tested by 28S metagenomics.
The Nannizziopsis genus represented 76% of fungal reads (others were from Malassezia), and
a consensus alignment to reference sequences confirmed the same species as the pure
culture. The second case was a porcupine with suspected histoplasmosis based on pathology
for which culture was unsuccessful. Approximately 8% of fungal reads came from Histoplasma
capsulatum or its teleomorph Ajellomyces capsulatus, which confirmed the first report of this
agent in the state of NY. We also analyzed a case series of twelve wild North American
porcupines from Maine that were diagnosed with atypical dermatophytosis from 2010 to
2017. Formalin-fixed tissue biopsies and cultures from plucked hairs from six cases were
analyzed. All fungal reads from pure cultures were identified as either Arthroderma
benhamiae or Trichophyton mentagrophytes, which was consistent with pathologic findings.
The FFPE samples revealed varying proportions of the Ascomycota and Basidiomycota phyla.
Due to the general limitations of 28S profiling for clinical identification current work is being
conducted to sequence and analyze alternative targets. Our aim is to identify cost effective
markers(s) that are (1) competent with fixed tissue samples, (2) possess significant taxonomic
resolving power, and (3) can be implemented with simple bioinformatics pipelines.
Poster Board #: 5
Title: Cryptococcal Meningitis Caused by Cryptococcus gattii Might be Underestimated in China: A
Finding from Next-Generation Sequencing of Cerebrospinal Fluid
Author Block: Y. GE1, S. Fan1, X. Ma1, H. Guan1, H. Wu2; 1Chinese Academy of Medical Sciences
Peking Union Medical College Hospital, Beijing, CHINA, 2Tianjin Medical Laboratory, BGI-Tianjin,
BGI-Shenzhen, Tianjin 300308, CHINA.
Abstract: Background: Cryptococcal meningitis causes morbidity and mortality worldwide. It is
usually caused by Cryptococcus neoformans (C. neoformans) and mainly occurs in
immunocompromised patients. The diagnosis of cryptococcal meningitis may be delayed in
immunocompetent patients. In addition, cryptococcal meningitis caused by Cryptococcus gattii
(C. gattii) may be overlooked because routine clinical methods do not differentiate the two
species. Methods: Next-generation sequencing (NGS) of cerebrospinal fluid (CSF) was applied to
detect pathogens in patients with clinically suspected central nervous system infections at a
tertiary referral center in China. Patients diagnosed with cryptococcal meningitis were
reviewed. Results: Seven patients diagnosed with cryptococcal meningitis were studied: five
infected with C. neoformans and two infected with C. gattii, which is extremely rare in
northern China. Cryptococcus-specific sequences were detected by NGS of CSF in all seven
patients. The number of reads assigned to C. neoformans or C. gattii ranged from 95 to 68,261,
with genome coverage ranging from 0.13% to 40%. Conclusions: This is the first case series
highlighting the application of NGS of CSF as a diagnostic method for cryptococcal meningitis.
The findings suggest that cryptococcal meningitis caused by C. gattii is underestimated in
China.
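The abstract reports genome coverage of 0.13% to 40% without defining it. A common reading is breadth of coverage: the percentage of reference bases covered by at least one aligned read. The minimal sketch below assumes reads have already been aligned and reduced to 0-based, half-open (start, end) intervals on a reference of known length; the function name is illustrative and is not taken from the authors' pipeline.

```python
from typing import Iterable, Tuple


def breadth_of_coverage(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Percentage of reference positions covered by at least one read.

    intervals: 0-based, half-open (start, end) alignment coordinates (assumed input).
    """
    covered = 0
    block_start = block_end = None
    for start, end in sorted(intervals):
        if block_end is None or start > block_end:
            # gap before this read: bank the previous merged block, open a new one
            if block_end is not None:
                covered += block_end - block_start
            block_start, block_end = start, end
        else:
            # overlapping or adjacent read: extend the current block
            block_end = max(block_end, end)
    if block_end is not None:
        covered += block_end - block_start
    return 100.0 * covered / genome_length


# the overlap between the first two reads is counted only once
print(breadth_of_coverage([(0, 100), (50, 150), (300, 400)], 1000))  # 25.0
```

Merging overlapping intervals before summing is what keeps deeply stacked reads from inflating the value; depth of coverage would instead sum every read's length.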
Poster Board #: 7
Title: Next-Generation Sequencing based Antimicrobial Resistance Diagnostics
Author Block: M. Fritz1, J. Zhou2, S. Beisken1, V. Galata3, J. Wang2, Y. Chen2, J. Ramoni1,
N. Mahfouz1, Y. Li4, A. Plum5, A. Keller3, H. Jiang2, H. Tan2, A. E. Posch1; 1Ares Genetics GmbH,
Wien, AUSTRIA, 2MGI, BGI-Shenzhen, Shenzhen, CHINA, 3Chair for Clinical Bioinformatics,
Saarbrücken, GERMANY, 4BGI Genomics, BGI-Shenzhen, Shenzhen, CHINA, 5Curetis GmbH,
Holzgerlingen, GERMANY.
Abstract: Antimicrobial resistance (AMR) is a major and rapidly increasing public health threat.
Antibiotic-resistant infections currently claim ~700,000 lives per year and are projected to
cause more than 10 million deaths annually by 2050, with a concomitant loss of 2% to 3.5% of
global gross domestic product. Effectively fighting this threat requires not only appropriate
therapy options and novel antibiotics but also novel diagnostics for timely therapy selection
and antibiotic stewardship. In this context, next-generation sequencing (NGS) is increasingly
applied to sequence clinical isolates and is increasingly considered the bacterial typing
method of choice by national and international centers for disease prevention and control.
High-performance NGS platforms, together with high-quality AMR reference databases comprising
validated AMR markers with diagnostic performance statistics, are key to NGS-based AMR
diagnostics. Here we combine BGISEQ-500, a novel benchtop sequencing device based on
combinatorial probe-anchor synthesis (cPAS), with ARESdb, a comprehensive AMR reference
database that links diagnostic markers with resistance phenotypes for thousands of
representative pathogens collected globally over the last 30 years. Working toward an
any-sample-to-answer solution for NGS-based AMR diagnostics, we provide first results for
combining BGISEQ-500 and ARESdb with the Curetis UNYVERO L4 Lysator as well as the MGIFLP
system, integrating and automating all steps from sample lysis, DNA extraction, library
preparation and sequencing to reporting. We systematically evaluate BGISEQ-500 at the raw-read,
assembly and annotation levels for bacterial typing and AMR marker detection on
multidrug-resistant isolates of Klebsiella pneumoniae and Escherichia coli and compare the
results against standard Illumina (HiSeq 2000/2500) sequencing technology. Using resistance
phenotypes determined via broth microdilution as ground truth, we demonstrate that
BGISEQ-500 in combination with ARESdb accurately identifies pathogens as well as genetic
AMR markers, including resistance genes and point mutations.
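Evaluating genotype-based resistance calls against broth microdilution phenotypes reduces, per antibiotic, to a 2x2 comparison. The sketch below shows that scoring step under simple assumptions: each isolate has one phenotypic call (resistant or not), an isolate with no detected marker is treated as predicted susceptible, and the isolate IDs are hypothetical. It illustrates generic concordance metrics, not how ARESdb itself reports diagnostic performance.

```python
import math
from typing import Dict


def concordance(predicted_resistant: Dict[str, bool],
                phenotype_resistant: Dict[str, bool]) -> Dict[str, float]:
    """Score genotype-based resistance calls for one antibiotic against
    broth-microdilution phenotypes (the ground truth). Keys are isolate IDs."""
    tp = fp = tn = fn = 0
    for isolate, truly_resistant in phenotype_resistant.items():
        # assumption: no detected marker means the isolate is predicted susceptible
        called_resistant = predicted_resistant.get(isolate, False)
        if called_resistant and truly_resistant:
            tp += 1
        elif called_resistant:
            fp += 1
        elif truly_resistant:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # undefined metrics (empty denominator) are reported as NaN
        return num / den if den else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "agreement": ratio(tp + tn, tp + tn + fp + fn),
    }


# hypothetical isolates: K = K. pneumoniae, E = E. coli
phenotype = {"K1": True, "K2": True, "K3": False, "K4": False, "E1": True}
predicted = {"K1": True, "K2": True, "K3": True}
print(concordance(predicted, phenotype))
```

In this toy example, K3 is a false positive and E1 a false negative, giving sensitivity 2/3, specificity 1/2 and overall agreement 3/5.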
Poster Board #: 8
Title: RipTide™ Ultra High-Throughput Rapid DNA Library Preparation for Next Generation
Sequencing
Author Block: A. Siddique1, G. Suckow2, N. Homer3, J. Bajena2, P. Ordoukhanian2, S. Head2,
K. Brown1; 1iGenomX, Carlsbad, CA, 2The Scripps Research Institute, La Jolla, CA, 3Fulcrum
Genomics, Somerville, MA.
Abstract: Whole-genome shotgun sequencing has become the tool of choice for microbial genome
analysis. Rapidly declining costs of sequencing, data analysis, data storage and database
access will continue to drive adoption. Library construction has not kept pace with these
advancements: the cost of preparing a next generation sequencing (NGS) library often exceeds
the cost of sequencing. Popular methods of library construction for NGS include
fragmentation, end-repair and adapter ligation, and transposase-mediated adapter insertion.
The RipTide High Throughput Rapid DNA Library Prep differs in its approach because it relies
on polymerase-mediated primer extension for library preparation. The initial step of the
prep, primer extension with barcoded random primers, is performed in a 96-well plate. Each
well of the plate contains primers with a unique barcode; consequently, the library generated
from each well is uniquely identifiable and can be bioinformatically traced back to the
original sample after sequencing. Following this step, the primer extension products are
combined into one pool, and all subsequent steps, including second strand synthesis and PCR,
are performed on the single pool. The library prep is fast, easily automatable and can be
tuned to genomes of high and low GC content. With automation, 960 samples can be processed in
a single day. The technology will aid genetic research by increasing sample throughput and by
reducing processing steps and operating costs. Here we present RipTide High Throughput Rapid
DNA Library Prep sequencing data generated from multiple microbial genomes.
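Because each well's primers carry a unique barcode, reads from the single pooled library can be assigned back to their wells by reading that barcode. The sketch below illustrates this demultiplexing step under assumptions the abstract does not state: the barcode sits at the start of each read, all barcodes have the same length, and one mismatch is tolerated only when it points to exactly one well. The 4-base barcodes and well names are hypothetical; the actual RipTide barcode design and read structure may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[str]]:
    """Assign each read to a well using the barcode at its 5' end.

    barcodes maps barcode sequence -> well ID (hypothetical layout); all
    barcodes are assumed to be the same length. Assigned reads are returned
    with the barcode trimmed; unassigned reads are kept whole.
    """
    bc_len = len(next(iter(barcodes)))
    bins: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        if len(read) < bc_len:
            bins["unassigned"].append(read)
            continue
        prefix, insert = read[:bc_len], read[bc_len:]
        if prefix in barcodes:  # exact match
            bins[barcodes[prefix]].append(insert)
            continue
        near = [well for bc, well in barcodes.items()
                if hamming(prefix, bc) <= max_mismatches]
        # accept a near match only when it identifies exactly one well
        if len(near) == 1:
            bins[near[0]].append(insert)
        else:
            bins["unassigned"].append(read)
    return dict(bins)


wells = {"ACGT": "A01", "TTGC": "A02"}
reads = ["ACGTGGGG", "TTGCAAAA", "ACGAGGGG", "CCCCTTTT"]
print(demultiplex(reads, wells))
```

Here the third read carries one sequencing error in its barcode but still maps unambiguously to well A01, while the last read matches no barcode and is left unassigned.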
Poster Board #: 9
Title: Metagenomic Characterization of Microbiomes and Resistomes in Animal Food
Author Block: Q. Yang1, D. A. Tadesse1, K. J. Domesle1, K. G. Jarvis2, C. Hsu1, S. H. Sarria1,
X. Li3, B. Ge1; 1Office of Research, Center for Veterinary Medicine, U.S. Food and Drug
Administration, Laurel, MD, 2Office of Applied Research and Safety Assessment, Center for Food
Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, 3Office of
Surveillance and Compliance, Center for Veterinary Medicine, U.S. Food and Drug
Administration, Rockville, MD.
Abstract: Animal food (e.g., pet food, animal feed, and raw materials and ingredients) is a
diverse and complex food matrix. High-throughput sequencing facilitates the characterization
of the microbiota and antimicrobial resistance genes associated with these matrices. In this
study, we used 16S rRNA gene amplicon and shotgun metagenomic sequencing to profile the
microbiomes and resistomes of representative animal foods (cattle feed, poultry feed, and dry
dog food). Additionally, a mock microbial community was included for method evaluation.
Four genomic DNA extraction methods were compared, and 16S rRNA gene amplicon (targeting
three regions) and shotgun metagenomic DNA libraries were sequenced on Illumina’s MiSeq and
NextSeq platforms, respectively. Quality-filtered 16S rRNA gene sequences were analyzed using
multi-step OTU picking in QIIME. Taxonomic and resistome profiles of shotgun metagenomic
sequences were determined by MetaPhlAn2 and ShortBRED, respectively. Both 16S rRNA gene and
shotgun metagenomic sequencing data suggested that the ZymoBIOMICS DNA miniprep kit provided
taxonomic profiles most closely resembling the theoretical composition of the mock community,
and that the V3-V4 region of the 16S rRNA gene was the most accurate target. Shotgun
metagenomic sequencing revealed distinct microbiota among the animal foods tested. At the
genus level, the most abundant taxa across all animal foods were Pantoea (35.4%), followed by
Xanthomonas (21.7%), Pseudomonas (15.0%), and Bacillus (11.4%). Salmonella was observed in one
cattle feed and two poultry feed samples at relative abundances below 5%. A total of 28
resistance genes conferring resistance to 8 antimicrobial drug classes were identified.
Animal feed samples harbored resistance genes from all 8 classes, whereas pet food contained
only genes conferring resistance to β-lactams and trimethoprim. These data enhance our
understanding of the endemic microbiota and resistance genes present in animal food. Such
information may be useful in future microbial risk assessment efforts to enable better
characterization and control of microbiological hazards in animal food.
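Genus-level figures such as Pantoea (35.4%) are relative abundances. The sketch below shows one simple way to derive such values: convert each sample's genus counts to percentages, then average across samples so that a deeply sequenced sample does not dominate. This pooling rule is an assumption for illustration. MetaPhlAn2 estimates abundances from clade-specific marker genes rather than raw read counts, and the abstract does not say how values were combined across foods. The counts in the example are made up.

```python
from collections import defaultdict
from typing import Dict, List


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert one sample's genus counts to percentages."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {genus: 100.0 * n / total for genus, n in counts.items()}


def mean_relative_abundance(samples: List[Dict[str, int]]) -> Dict[str, float]:
    """Average per-sample percentages; a genus absent from a sample counts as 0%."""
    sums: Dict[str, float] = defaultdict(float)
    for sample in samples:
        for genus, pct in relative_abundance(sample).items():
            sums[genus] += pct
    return {genus: s / len(samples) for genus, s in sums.items()}


# made-up counts for two hypothetical feed samples
feeds = [{"Pantoea": 60, "Bacillus": 40}, {"Pantoea": 10, "Pseudomonas": 90}]
print(mean_relative_abundance(feeds))
```

Averaging percentages weights every sample equally; pooling raw counts first would instead let the sample with more reads drive the result.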
Poster Board #: 11
Title: ConFindr: Detection of Intraspecies Contamination in Raw NGS Reads
Author Block: A. J. Low, A. G. Koziol, B. W. Blais, C. D. Carrillo; Canadian Food Inspection
Agency, Ottawa, ON, CANADA.
Abstract: Food microbiology laboratories increasingly apply bacterial whole-genome sequence
(WGS) analyses for pathogen identification, high-resolution typing and risk profiling. The
ability to assess WGS data quality, and in particular to detect DNA contamination arising
during sample preparation or from previous sequencing runs, will greatly contribute to the
reliability of downstream analyses. Existing tools are effective for the identification of
mixed-species contamination but do not readily identify contamination from closely related
organisms. We have developed a bioinformatic tool, ConFindr, for detection of intraspecies
contamination. ConFindr analyzes raw reads for the presence of multiple alleles of
single-copy genes that are conserved across the entire bacterial domain; multiple alleles
indicate that a sample is contaminated. Testing on the major food pathogens Listeria
monocytogenes, Salmonella enterica, and Escherichia coli has shown that ConFindr can detect
contamination by strains as few as 1000 SNPs (~99.99 percent identity) away from the genome of
interest, a sensitivity that greatly exceeds that of existing tools, while maintaining an
extremely low false-positive rate and a runtime of less than one minute per sample. ConFindr
is written in Python and is freely available at https://github.com/lowandrew/ConFindr under
the MIT license.
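The signal ConFindr looks for, more than one allele of a gene that should be present in a single copy, can be illustrated with a per-position check of read pileups. The sketch below is not ConFindr's implementation. The depth, minor-allele fraction and minor-allele count thresholds, and the rule that at least two multi-allelic sites indicate contamination, are hypothetical values chosen so that isolated sequencing errors are not counted.

```python
from collections import Counter
from typing import Dict, List


def multiallelic_sites(pileup: List[str], min_depth: int = 20,
                       min_minor_fraction: float = 0.05,
                       min_minor_count: int = 3) -> List[int]:
    """Positions where the reads show a credible second allele.

    pileup[i] holds the bases observed at position i of one single-copy gene.
    All thresholds are hypothetical and meant to ignore sporadic sequencing errors.
    """
    sites = []
    for pos, bases in enumerate(pileup):
        if len(bases) < min_depth:
            continue  # too shallow to judge
        top_two = Counter(bases).most_common(2)
        if len(top_two) < 2:
            continue  # only one base observed
        minor_count = top_two[1][1]
        if (minor_count >= min_minor_count
                and minor_count / len(bases) >= min_minor_fraction):
            sites.append(pos)
    return sites


def looks_contaminated(genes: Dict[str, List[str]], min_sites: int = 2) -> bool:
    """Flag a sample when enough multi-allelic sites accumulate across genes."""
    return sum(len(multiallelic_sites(p)) for p in genes.values()) >= min_sites


gene = ["A" * 30, "C" * 25 + "T" * 5, "G" * 28 + "A" * 2]
print(multiallelic_sites(gene))                            # [1]
print(looks_contaminated({"gene1": gene}))                 # False
print(looks_contaminated({"gene1": gene, "gene2": gene}))  # True
```

Position 2 is not flagged because only two reads support the second base, which is what a random sequencing error would look like at this depth.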
Poster Board #: 12
Title: GeneSippr: a Rapid Whole-genome Approach for Bacterial Identification and Typing
Author Block: A. G. Koziol, A. Low, C. Carrillo, B. W. Blais; Canadian Food Inspection Agency,
Ottawa, ON, CANADA.
Abstract: The timely identification and characterization of foodborne bacteria for risk
assessment purposes is a key component of outbreak investigations. Current methods require
several days and/or provide low-resolution identification. Here we describe an update to
GeneSippr, a whole-genome sequencing (WGS) approach enabling same-day characterization of
colony isolates. This bioinformatics tool, written in Python, includes the following
functionalities: typing (16S or MASH-based speciation, serotype determination, MLST and
rMLST profiling), marker detection (antimicrobial resistance, virulence risk factors,
pathogenesis-associated genes, or user-provided databases), quality analysis (detection of
inter- and intra-species contamination, and genomically dispersed conserved sequences), and
report creation. Each analysis can be enabled or disabled as desired, on either an
in-progress MiSeq run or raw FASTQ files. By optionally disabling demultiplexing of samples, a
sequencing run consisting of a library from a single bacterial isolate prepared with a MiSeq
Reagent Micro or Nano Kit v2 can yield usable data within hours. Because analysis does not
have to wait until the run completes, and because an optional graphical user interface
provides automated, user-friendly outputs, technicians in front-line laboratories can shave
days off the time from the start of a sequencing run to the creation of a final report.
GeneSippr is open source and is available under the MIT license at
https://github.com/OLC-Bioinformatics/sipprverse, or prepackaged in Bioconda.
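MASH-based speciation compares the k-mer content of a sample with that of reference genomes. Mash estimates the Jaccard index j from MinHash sketches and converts it to a distance D = -(1/k) ln(2j/(1+j)). The simplified sketch below computes j exactly from full k-mer sets instead of sketches, uses forward-strand k-mers only (no canonical k-mers), and picks the closest reference. It is an illustration of the idea, not GeneSippr's code, and the sequences are toy data.

```python
import math
from typing import Dict, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """All forward-strand k-mers of a sequence (no canonicalization, no sketching)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a: str, b: str, k: int = 21) -> float:
    """Mash-style distance D = -(1/k) * ln(2j / (1 + j)) from the exact Jaccard index j."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    shared = ka & kb
    if not shared:
        return 1.0  # no shared k-mers: treat as maximally distant
    j = len(shared) / len(ka | kb)
    return -math.log(2 * j / (1 + j)) / k


def closest_reference(query: str, references: Dict[str, str], k: int = 21) -> str:
    """Name of the reference with the smallest distance to the query."""
    return min(references, key=lambda name: mash_distance(query, references[name], k))


refs = {"refA": "ACGTTGCAAGGCTTACCGTT", "refB": "TTTTTTTTTTTTTTTTTTTT"}
print(closest_reference("ACGTTGCAAGGCTTACCGTA", refs, k=5))  # refA
```

Real genomes use larger k (Mash defaults to 21) so that shared k-mers are unlikely to occur by chance; the small k here only suits the toy sequences.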
Poster Board #: 13
Title: Next Generation Sequencing of the Hepatitis A Virus Outbreak in San Diego
Author Block: S. Steele, T. Basler, B. Austin; San Diego County Public Health Laboratory, San
Diego, CA.
Abstract: In San Diego County, California, a hepatitis A virus (HAV) outbreak developed, with
the first case identified in November 2016; it has affected nearly 600 people to date. Unlike
other HAV outbreaks, this outbreak circulated in the homeless and illicit drug user
population, which made its nature and size unique and led the County of San Diego to declare
a local public health emergency in September 2017. To improve the community response to the
emergency, the San Diego Public Health Laboratory (SDPHL) increased its testing capacity by
implementing both an HAV PCR assay and a sequencing assay. SDPHL created a testing workflow
that first screens suspected patient specimens with a laboratory-developed TaqMan assay to
determine whether HAV RNA is present. If it is detected, the specimen is sequenced for
genotyping and cluster identification using the Centers for Disease Control and Prevention’s
(CDC) recently developed Global Hepatitis Outbreak and Surveillance Tools (GHOST) protocol,
which uses next generation sequencing (NGS). Prior to GHOST, genotyping and cluster
identification were performed on HAV specimens by either the California Department of Public
Health (CDPH) or the CDC using Sanger sequencing. Sanger sequencing revealed that HAV genotype
IB was the cause of the outbreak. Cluster identification revealed one predominant cluster and
16 additional clusters. SDPHL processed 475 HAV-positive specimens using the GHOST protocol
and used a combination of in-house bioinformatics and GHOST analysis tools to visualize
transmission in the community during the outbreak.
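Cluster identification of this kind is often done by linking specimens whose sequences differ by no more than a chosen genetic distance and treating each connected group as a cluster. The abstract does not state the GHOST clustering rule, so the sketch below is a generic single-linkage example under assumed inputs: equal-length aligned sequences, p-distance with gap positions ignored, and a hypothetical distance threshold supplied by the caller.

```python
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)


def transmission_clusters(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Single-linkage clusters: specimens are linked when p-distance <= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # union the two groups

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    # largest cluster first, mirroring "one predominant cluster" reporting
    return sorted(groups.values(), key=len, reverse=True)


specimens = {"S1": "ACGTACGTAC", "S2": "ACGTACGTAT",
             "S3": "ACGTACGTTT", "S4": "TTTTACGGAC"}
print(transmission_clusters(specimens, threshold=0.1))
```

S1 and S3 differ at two sites but still fall into one cluster through S2, which is the chaining behavior of single linkage; a stricter rule (for example, requiring all pairs to be within the threshold) would split them.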
Poster Board #: 14
Title: A Genomic Database for Detection of HAI Outbreaks Using NGS: Outbreak Validation Before
Inclusion of Public Genomes in Our Epidemiological Database
Author Block: M. Chapel1, I. Bizine1, G. Kaneko2, A. Griffon2, B. Muller2, M. Rumigny Pierrot1,
E. Santiago Allexant2, G. Guigon2, C. Mirande1; 1bioMérieux, La Balme les Grottes, FRANCE,
2bioMérieux, Marcy l'étoile, FRANCE.
Abstract: Background: Hospital-acquired infections (HAIs) are a continuously increasing cause
of morbidity and mortality. Surveillance is essential to prevent HAIs and the spread of
pathogens in healthcare institutions. The drastic reduction in the cost of next-generation
sequencing (NGS) technologies and the ability to sequence whole bacterial genomes (WGS) help
address this issue by providing the highest possible resolution for strain typing. To be
effective, this novel whole-genome approach requires the development of a high-quality,
epidemiologically oriented database of genomes. Building on our experience with the
bioMérieux EpiSeq v1 service, which focused on surveillance of infections due to
Staphylococcus aureus, we extended the database to 12 other HAI-related species. Methods:
Well-characterized outbreak-related strains from National Reference Centers and hospital
partners were collected with their epidemiological metadata and sequenced to build the
database. In addition, a standardized process was defined for including WGS data and
metadata of public outbreak-related strains to enrich the epidemiological database. The
validation step is critical to ensure the quality of the database content. All genomes from
potential outbreaks were validated by comparing the provided/published information with WGS
data analysis performed using our bioinformatic pipeline. Examples of validation results
obtained with public data are presented in this work. Results: The database contains more
than 30,000 genomes with curated epidemiological information. This large number of genomes
gives the database broad biological diversity, covering dozens of prevalent sequence types
(STs) of pathogens involved in HAIs worldwide. Geographically, the database covers Europe,
Asia and North America. Validation of all genomes and epidemiological metadata ensures that
the database provides relevant, effective results and references for clinical use.
Conclusions: The database covers broad geographical and biological diversity for 13
bacterial pathogens, corresponding to the most prevalent species involved in HAIs worldwide.
The presence of confirmed and well-characterized outbreaks in the database facilitates the
interpretation of NGS-based strain typing results.
Poster Board #: 15
Title: Estrogen-mediated Gut Microbiome Alterations Influence Sexual Dimorphism in Metabolic
Syndrome in Mice
Author Block: K. Kaliannan; Massachusetts General Hospital Harvard Medical School,
Charlestown, MA.
Abstract: Background: Understanding the mechanism underlying sexual dimorphism in
susceptibility to obesity and metabolic syndrome (MS) is important for the development of
effective interventions for MS. Results: Here we show that the gut microbiome mediates the
preventive effect of estrogen (17β-estradiol) on metabolic endotoxemia (ME) and low-grade
chronic inflammation (LGCI), the underlying causes of MS and chronic diseases. The
characteristic profiles of the gut microbiome (fecal bacterial community composition and
predicted function by 16S rRNA sequencing and phylogenetic reconstruction of unobserved
states analysis) observed in female mice and in 17β-estradiol-treated male and ovariectomized
mice, such as decreased Proteobacteria and lipopolysaccharide biosynthesis, were associated
with a lower susceptibility to ME, LGCI and MS in these animals. Interestingly, fecal
microbiota transplantation from male mice transferred the MS phenotype to female mice, while
antibiotic treatment eliminated the sexual dimorphism in MS, suggesting a causative role of
the gut microbiome in this condition. Moreover, estrogenic compounds such as isoflavones
exerted microbiome-modulating effects similar to those of 17β-estradiol and reversed
symptoms of MS in male mice. Finally, both the expression and activity of intestinal alkaline
phosphatase (IAP), a gut microbiota-modifying non-classical antimicrobial peptide, were
upregulated by 17β-estradiol and isoflavones, whereas inhibition of IAP induced ME and LGCI in
female mice, indicating a critical role of IAP in mediating the effects of estrogen on these
parameters. Conclusions: We have identified a previously uncharacterized microbiome-based
mechanism that sheds light on sexual dimorphism in the incidence of MS and suggests novel
therapeutic targets and strategies for the management of obesity and MS in males and
postmenopausal women.
Poster Board #: 16
Title: RNA Sequencing from HCV Drug Resistance Screening to Co-infection Discovery: A Case Study
Author Block: P. Lallemand1, B. T. Sherman2, G. S. Mendez1, H. C. Highbarger1, J. A. Kovacs3,
C. Hadigan4, R. L. Dewar1; 1Virus Isolation and Serology Laboratory, Leidos Biomedical
Research, Inc., FNLCR, Frederick, MD, 2Laboratory of Human Retrovirology and
Immunoinformatics, Leidos Biomedical Research, Inc., FNLCR, Frederick, MD, 3Critical Care
Medicine Department, NIH Clinical Center, Bethesda, MD, 4Laboratory of Immunoregulation,
NIAID, NIH, Bethesda, MD.
Abstract: Hepatitis C virus (HCV) chronically infects about 180 million people worldwide and is
associated with the development of liver fibrosis, cirrhosis, hepatic failure, and
hepatocellular cancer. Historically, treatment of HCV has been based on interferon alpha
(IFN-α) and ribavirin (RBV), which are associated with high treatment failure rates and
severe side-effects. Combination therapy with all-oral direct-acting antivirals (DAAs)
targeting the NS3, NS5A, and NS5B HCV proteins cures about 90% of patients with very few
side-effects. DAAs are, however, vulnerable to drug resistance. HCV sequencing and analysis
for resistance-associated variants (RAVs) before selecting a specific DAA regimen is not
routine, even though RAVs are commonly present before treatment. A template-independent
sequencing method and bioinformatics pipeline were developed to investigate RAVs in HCV. The
method was applied to a patient who had failed to clear an HCV (genotype 1a) infection
following 2 different DAA regimens. Plasma from time points before and after treatment with
IFN-α/RBV and with each of the two DAA regimens was analyzed. Sequencing revealed that RAVs to
simeprevir were present before the initial DAA treatment. During the first DAA treatment, new
RAVs to daclatasvir, ledipasvir, and ombitasvir were identified. No evidence of
superinfection or reinfection was found. Based on these results, the patient was enrolled in
a new study using a drug regimen for which no RAVs had been detected. An undiagnosed
co-infection with hepatitis G virus (HGV) was also discovered, which had been present before
any of the treatment time points. More than 50% of sequencing reads came from HGV,
demonstrating the power of this method to sequence other, even unknown, RNA viruses present
in patient plasma. Funded by NCI Contract No. HHSN261200800001E.
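Once reads are translated and aligned to the NS3, NS5A and NS5B coding regions, screening for RAVs amounts to measuring how often known resistance-associated residues appear at specific positions. The sketch below assumes per-position amino-acid calls are already available. The two table entries (NS3 Q80K, NS5A Y93H) are well-known examples used only for illustration, and the 15% reporting cutoff is an assumed value, not the one used in this study.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Illustrative RAV table: (gene, amino-acid position) -> resistance-associated residues.
# Two textbook examples only; a real screen needs a curated, genotype-specific list.
RAV_TABLE: Dict[Tuple[str, int], Set[str]] = {
    ("NS3", 80): {"K"},   # Q80K
    ("NS5A", 93): {"H"},  # Y93H
}


def screen_ravs(observed: Dict[Tuple[str, int], List[str]],
                min_frequency: float = 0.15) -> List[Tuple[str, int, str, float]]:
    """Report RAV residues whose frequency among reads meets the (assumed) cutoff.

    observed maps (gene, position) -> amino acids called from individual reads.
    """
    hits = []
    for (gene, pos), residues in observed.items():
        flagged = RAV_TABLE.get((gene, pos))
        if not flagged or not residues:
            continue  # position not in the table, or no coverage
        counts = Counter(residues)
        for aa in sorted(flagged):
            freq = counts[aa] / len(residues)
            if freq >= min_frequency:
                hits.append((gene, pos, aa, freq))
    return hits


sample = {
    ("NS3", 80): ["Q"] * 70 + ["K"] * 30,
    ("NS5A", 93): ["Y"] * 95 + ["H"] * 5,
    ("NS5B", 282): ["S"] * 100,  # not in the illustrative table, so ignored
}
print(screen_ravs(sample))  # [('NS3', 80, 'K', 0.3)]
```

The cutoff matters: deep sequencing detects minority variants (here Y93H at 5%) that population Sanger sequencing would miss, so lowering the threshold reports more RAVs.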
Poster Board #: 17
Title: Characterising the Microbiome by Targeted Sequencing of Bacterial 16S rRNA and Fungal ITS
Genes
Author Block: W. Ridderberg, M. Vinter Lund, R. Friborg, F. Strino, H. Attarizi, P. Ettenhuber,
M. Lappe, S. Cardinale, J. Jacobs; QIAGEN Bioinformatics, Aarhus, DENMARK.
Abstract: Background: Microorganisms play an essential role in the degradation of plant litter.
Because degradation is a dynamic process, the composition of the litter changes continuously,
and the microbial community responds dynamically to the resulting fluctuations in nutrient
availability. Purahong and co-workers (2016) studied the successional changes in the
communities of both bacteria and fungi during decomposition of beech leaf litter over 473
days in the Hainich-Dün Biodiversity Exploratory in Central Germany. Using decomposing leaf
litter as a model of a dynamic environment, we demonstrate the utility of the CLC Microbial
Genomics Module for studying changes in the complex profiles of environmental bacterial and
fungal communities. Methods: The CLC Microbial Genomics Module, an extension of CLC Genomics
Workbench, contains tools essential for profiling microbiomes and other complex microbial
communities, including quality control of input data, clustering of reads into OTUs,
comparative analysis across samples, and data visualization tools with publication-quality
graphics. Furthermore, preconfigured but fully customizable workflows ensure ease of use and
reproducibility of results. Results: Analyses revealed a dynamic microbial community with
successional changes in both bacterial and fungal communities. We found that the observed
changes in fungal and bacterial abundances correlate with the predominant metabolic
capabilities required at the different stages of leaf decomposition in response to changing
nutrient availability. Conclusion: Using data published by Purahong et al. (2016), we
demonstrate how the CLC Microbial Genomics Module can be used for analysing amplicon
sequencing data to profile diverse microbial communities.
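Clustering reads into OTUs groups sequences whose identity exceeds a threshold. The abstract does not describe the CLC module's algorithm, so the sketch below shows a generic greedy, abundance-ordered approach with a 97% threshold. It assumes trimmed reads of equal length, so identity can be computed position by position without alignment; real tools align sequences of differing length.

```python
from collections import Counter
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions; assumes trimmed reads of equal length."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def greedy_otus(reads: List[str], threshold: float = 0.97) -> Dict[str, int]:
    """Greedy, abundance-ordered OTU clustering.

    Unique sequences are visited from most to least abundant; each joins the
    first existing centroid it matches at >= threshold identity, otherwise it
    becomes a new centroid. Returns centroid -> number of reads in its OTU.
    """
    otus: Dict[str, int] = {}
    for seq, n in Counter(reads).most_common():
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n  # no centroid close enough: found a new OTU
    return otus


s1 = "ACGT" * 25   # 100 bp
s2 = "T" + s1[1:]  # one mismatch, 99% identical to s1
s3 = "TTGCA" * 20  # unrelated 100 bp sequence
print(greedy_otus([s1] * 5 + [s2] * 2 + [s3] * 3))
```

Visiting abundant sequences first makes the most common variant the centroid, so rare sequencing-error variants such as s2 are absorbed into it rather than founding spurious OTUs.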
Poster Board #: 18
Title: Infectious Disease Metagenomics: Pre-Sequencing, Sequencing and Post-Sequencing
Considerations for Clinical Application
Author Block: N. A. Hasan, H. Li, A. Materna, M. Dadlani, R. R. Colwell; CosmosID, Rockville, MD.
Abstract: Shotgun metagenomic sequencing is being rapidly adopted by the biomedical community
for clinical infection diagnosis and for surveillance applications. Benefits of metagenomic
sequencing include a highly accurate, unbiased, and culture-independent characterization of
microbial communities. As a consequence, metagenomics is complementing or even replacing
traditional infectious disease research tools, such as culture, antimicrobial susceptibility
testing (AST) and PCR. However, despite the positive impact of metagenomic sequencing on
clinical microbiology, many laboratories are challenged by the method’s disruptive effect on
traditional lab workflows and by the complexities inherent to establishing a robust,
standardized and validated workflow in the clinical lab. Metagenomics is uniquely sensitive
to the introduction of bias at almost every step of the workflow, which can affect accuracy,
precision, and the timeliness and actionability of a diagnosis. Therefore, the optimization
and standardization of pre-sequencing, sequencing, and post-sequencing steps must be
carefully considered. In this presentation we shed light on failure modes and present
mitigation strategies employed at the CosmosID NGS Service Laboratory across the three
workflow phases. Pre-sequencing: The various laboratory methods employed for sample
collection, preservation, nucleic acid isolation and preparation of sequencing libraries need
to avoid laboratory contamination and control the introduction of bias or variability.
Quality control practices and the use of internal standards and controls are an important
part of this phase. Sequencing: Differences in data quality, read length and depth, as well
as distinct error profiles among the various next-generation sequencing platforms, must be
carefully considered, because otherwise they affect consistent and accurate data
interpretation. Post-sequencing: Metagenomics data analysis poses a huge data and informatics
challenge. Numerous published algorithms explore different approaches for separating the
valuable biological signals from the bias and error introduced during the pre-sequencing and
sequencing phases. While the clinically informative and actionable unit in microbiology is a
strain, not a species, most available methods fail at sub-species-level resolution.
Therefore, the choice of algorithms and databases has a significant impact on the fidelity and
actionability of the outcome. CosmosID relies on proprietary algorithms and expert-curated
databases of microbial genome information. Independent validations have found that CosmosID’s
fast and user-friendly platform uniquely classifies microbes with strain-level resolution and
industry-leading sensitivity and precision.
Poster Board #: 19
Title: End-to-End Next-Generation Sequencing for Strain Sub-Typing and Epidemiological Analysis:
An Exploration of Relatedness and Virulence in Streptococcus pyogenes and SDSE
Author Block: N. A. Hasan1, G. Hansen2, A. Sabin3, B. Fanelli1; 1CosmosID, Rockville, MD,
2Hennepin Healthcare System, Minneapolis, MN, 3University of Minnesota, Minneapolis, MN.
Abstract: Streptococcus dysgalactiae subsp. equisimilis (SDSE) is an emerging human pathogen
causing life-threatening invasive infections, including streptococcal toxic shock syndrome.
Laboratory reporting of SDSE is poorly understood, and little is known about the clonal
relationships of SDSE. We hypothesize that virulence genes are shared between Streptococcus
pyogenes (GAS) and SDSE, contributing to an evolving disease spectrum of SDSE infections.
Using whole-genome sequencing of 42 SDSE isolates from clinical infections and asymptomatic
volunteers, we describe the genetic relatedness and overlap between GAS and SDSE, as well as
GAS-associated virulence factors found within SDSE. The study was carried out using the
recently launched service for whole-genome next-generation sequencing (NGS) and in-depth
bioinformatic analysis of microbial isolates by the CosmosID NGS Service Laboratory.
Leveraging CosmosID’s industry-leading databases and bioinformatics, the service includes, in
addition to library preparation and genome sequencing, (i) unambiguous strain-level
taxonomic classification, (ii) genotyping, (iii) comprehensive characterization of
antimicrobial resistance, plasmids and virulence, and (iv) a detailed analysis of
relatedness between isolates and previously reported genomes, e.g., to better delineate
transmission events.
Poster Board #: 20
Title: A Rapid Identification Pipeline for Active Pathogen Detection by Metagenomic Next
Generation Sequencing
Author Block: J. Ma; BGI, Shenzhen, CHINA.
Abstract: At present, metagenomic next-generation sequencing (mNGS) is increasingly applied in
the clinical diagnosis of pathogens. Both DNA and RNA sequencing can quickly produce enormous
amounts of data, and many identification pipelines have been published, but both approaches
share a defect: it is difficult to remove background microbial interference, which makes it
challenging to diagnose the pathogen actually causing a patient’s disease. An identification
pipeline typically lists more than a hundred species. For relatively pure samples such as
cerebrospinal fluid, the most abundant species in the result may be the pathogen that caused
the disease. However, for complex samples such as blood and bronchoalveolar lavage fluid, the
most abundant species usually comes from the background or from environmental contamination,
making it difficult to determine which is the real pathogen. Here we developed a new pipeline
that classifies and identifies pathogenic microorganisms based on both DNA and RNA mNGS,
which not only increases the accuracy of identification but also finds potentially
pathogenic microorganisms that are actively expressed. The number of identified species
dropped from hundreds to fewer than ten, which makes clinical pathogen diagnosis easier.
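The abstract does not give the pipeline's filtering rules, but the idea of combining DNA and RNA evidence to shorten a long species list can be sketched as follows. This hypothetical filter keeps species that pass minimum read counts in both the DNA and the RNA data (present and actively expressed), drops taxa on a background list (for example, taxa seen in negative controls), and ranks the rest by RNA support. All thresholds, names and counts are assumptions for illustration.

```python
from typing import Dict, List, Set


def candidate_pathogens(dna_reads: Dict[str, int], rna_reads: Dict[str, int],
                        background: Set[str], min_dna: int = 3, min_rna: int = 3,
                        top_n: int = 10) -> List[str]:
    """Shortlist species supported by both DNA and RNA reads that are not
    known background or contaminant taxa (hypothetical rules)."""
    shortlist = [
        sp for sp in dna_reads
        if sp not in background
        and dna_reads[sp] >= min_dna
        and rna_reads.get(sp, 0) >= min_rna
    ]
    # rank by RNA support first, as a proxy for activity, then by DNA support
    shortlist.sort(key=lambda sp: (rna_reads[sp], dna_reads[sp]), reverse=True)
    return shortlist[:top_n]


# made-up read counts for one hypothetical BALF sample
dna = {"Klebsiella pneumoniae": 500, "Cutibacterium acnes": 2000,
       "Candida albicans": 40, "Escherichia coli": 10}
rna = {"Klebsiella pneumoniae": 800, "Candida albicans": 1, "Escherichia coli": 5}
print(candidate_pathogens(dna, rna, background={"Cutibacterium acnes"}))
```

In this example the most abundant DNA signal is dropped as background, and a species with DNA reads but almost no transcripts is dropped as inactive, leaving a short list ranked by expression.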
Poster Board #: 21
Title: Supervised Machine Learning Approach for Prediction of Legionella pneumophila Serogroup
Classification from Whole Genome Sequencing
Author Block: S. S. Morrison, J. A. Caravas, N. A. Kozak-Muiznieks, B. H. Raphael, J. M.
Winchell; Centers for Disease Control and Prevention, Atlanta, GA.
Abstract: Background: The majority of Legionnaires' disease cases are due to a single species, L.
pneumophila. This species comprises at least 17 serogroups, with serogroup 1 (sg1) being the
most frequently isolated from clinical and environmental samples. While a number of tools are
available for sequence-based identification of sg1 isolates, only antibody-based methods exist
for discriminating among serogroups 2-17. As bacterial genome sequences become more readily
available, there is an urgent need to develop sequence-based tools for identifying all L.
pneumophila serogroups. Methods: We performed whole genome sequencing on 181 L. pneumophila
isolates representing 14 different serogroups and aligned the sequences to the L. pneumophila
Philadelphia sg1 genome sequence. Isolate serogroups had previously been characterized by direct
fluorescent antibody testing and/or detection of the sg1-specific wzm gene. A 30 kb sequence and
gene presence/absence alignment was created for each isolate based on the lipopolysaccharide
(LPS) biosynthesis region. The alignment matrix was used as input for the randomForest and caret
packages in R. Random forest machine learning with k-fold cross-validation was performed to
identify genetic features with enough discriminatory power for L. pneumophila sg typing. A
custom Python script was also used to calculate intra- and inter-serogroup pairwise nucleotide
identity within the LPS region. Results: After the removal of conserved nucleotide positions,
24,546 features were used as input to the random forest algorithm. The cross-validation
consisted of 10 sample sets, each with ~160 randomly selected isolates. We excluded sg 11, 16,
and 17 because each was represented by a single isolate. The cross-validation analysis yielded
approximately 73.5% model accuracy. Sg 1-7, 9, and 13 had error rates between 0 and 0.30, whereas
sg 8, 10, 12, 14, and 15 had error rates >0.50. A pairwise identity analysis of the LPS regions
revealed that some serogroups are highly conserved (97.99%-100%), while others potentially have
several representative sequences (95.5%-100%). In addition, several isolates had higher inter-sg
than intra-sg similarity. This suggests that not all serogroups have homogeneous LPS regions and
that some may represent independent origins of the same sg phenotype. Conclusion: The random
forest approach revealed the LPS region as a promising target for constructing diagnostic assays
to detect non-Lp1 isolates. With the transition to sequencing as a first approach during L.
pneumophila outbreak investigations, this model may help elucidate traditional epidemiological
metadata. This initial work is the foundation for a refined predictive model for L. pneumophila
serogroups. The findings and conclusions in this report are those of the authors and do not
necessarily represent the official position of the Centers for Disease Control and Prevention.
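Two preprocessing steps named in the Methods and Results, dropping conserved alignment columns and splitting isolates into cross-validation folds, can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the alignment is a dict of equal-length aligned sequences, and `drop_conserved_columns` and `k_fold_indices` are hypothetical names. The authors used the randomForest and caret packages in R; the random forest itself is not reproduced here.

```python
import random


def drop_conserved_columns(alignment):
    """Return indices of variable columns and the sequences reduced to those columns.

    alignment: dict of isolate id -> aligned sequence (all the same length).
    A column is conserved when every isolate carries the same character there.
    """
    seqs = list(alignment.values())
    variable = [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]
    reduced = {iso: "".join(seq[i] for i in variable) for iso, seq in alignment.items()}
    return variable, reduced


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices reproducibly and deal them into k near-equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


aln = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "AGGTA"}
cols, reduced = drop_conserved_columns(aln)
print(cols)     # [1, 4]
print(reduced)  # {'iso1': 'CA', 'iso2': 'CT', 'iso3': 'GA'}

# 181 isolates minus the three singleton serogroups leaves 178.
folds = k_fold_indices(178, k=10)
print(sorted(178 - len(fold) for fold in folds))  # training-set sizes
# [160, 160, 160, 160, 160, 160, 160, 160, 161, 161]
```

With k = 10, each training set holds 160 or 161 isolates, which would be consistent with the ~160 isolates per sample set reported in the abstract, assuming the sample sets were ordinary cross-validation folds.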
Poster Board #: 22
Title: A High-throughput Protocol for the Detection of HIV-1 Drug Resistance Markers Using Multiplexed Primer ID and Next Generation Sequencing
Author Block: S. Zhou, J. Nelson, M. Clark, R. Swanstrom; UNC-Chapel Hill, Chapel Hill, NC.
Abstract: Background: Next generation sequencing (NGS) platforms will soon replace Sanger
sequencing for the analysis of drug resistance mutations. However, NGS platforms are
intrinsically limited in their ability to probe diversity because of misincorporation and
recombination during the preceding PCR, obscured sampling depth, and the error-prone nature of
the sequencing platforms. We have largely solved these problems with a strategy called Primer
ID, in which each template consensus sequence (TCS) represents one original RNA template
sequenced. Here we present a multiplexed Primer ID (MPID) approach on the Illumina MiSeq to
detect HIV-1 drug resistance mutations in protease (PR), reverse transcriptase (RT), integrase
(IN) and the V3 region of env with a single sequencing library reaction per patient. Methods:
Viral RNA was extracted from plasma samples collected from HIV-1-infected subjects. We used
multiplexed cDNA primers targeting the HIV-1 PR, RT, IN and V3 regions, each with a block of
random nucleotides (the Primer ID). After purification, we used two rounds of PCR to amplify the
cDNA and incorporate sequencing adaptors. Pooled libraries were sequenced on the MiSeq.
Sequencing data were processed with the TCS pipeline. Drug resistance mutations were called in
the PR, RT and IN regions with the TCS-SDRM pipeline after filtering out hypermutated sequences
and sequences with stop codons. We performed two experiments to validate the reproducibility of
the assay. In the first experiment, we divided viral RNA extracted from 3 subjects into 4-5 equal
portions for separate cDNA synthesis. In the second experiment, we serially diluted the plasma
samples from 3 subjects and performed viral RNA extraction, library preparation and sequencing
separately to evaluate the reproducibility of TCS numbers and minority mutation abundances at
low template input. Results: In the first validation, we sequenced a total of 14 replicates from
the 3 plasma samples. TCS numbers among replicates were all within 25% of the average TCS number,
and the abundances of minority variants (1% to 30%) were within 10% of the average abundance
across all replicates. In the second validation, we sequenced 3 to 5 titrations of viral RNA
template for each subject, each in triplicate. We found that we could still recover 1-5% of the
input templates even with input viral RNA as low as about 500 copies. However, template
utilization varied between samples and between amplicons, and was somewhat affected by
multiplexing. We also found that the accuracy of minority mutation detection is determined by
the sampling depth. Conclusions: MPID drug resistance testing can detect HIV-1 drug resistance
mutations in multiple regions with one library preparation and sequencing run per sample.
Because the sampling depth is revealed, confidence intervals can be calculated for minor
variants. Our method shows high reproducibility in determining sampling depth and in defining
the frequency of minority variants.
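The Primer ID idea can be illustrated with a simplified sketch. Reads sharing the same random Primer ID are collapsed into one template consensus sequence (TCS). Because each TCS stands for one sampled template, a minority variant's frequency can then be given a binomial confidence interval. The majority-rule consensus, the `min_reads` cut-off, and the choice of the Wilson score interval are assumptions for illustration; the published TCS and TCS-SDRM pipelines apply their own cut-offs and error models.

```python
import math
from collections import Counter, defaultdict


def template_consensus(reads, min_reads=3):
    """Collapse reads that share a Primer ID into one consensus per template.

    reads: list of (primer_id, sequence); sequences under one ID have equal length.
    Simplified majority rule per position; min_reads is a hypothetical cut-off.
    """
    groups = defaultdict(list)
    for primer_id, seq in reads:
        groups[primer_id].append(seq)
    consensus = {}
    for primer_id, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to call a reliable consensus
        consensus[primer_id] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def wilson_interval(variant_tcs, total_tcs, z=1.96):
    """Approximate 95% Wilson score interval for a variant frequency among total_tcs TCS."""
    p = variant_tcs / total_tcs
    denom = 1 + z * z / total_tcs
    centre = (p + z * z / (2 * total_tcs)) / denom
    half = z * math.sqrt(p * (1 - p) / total_tcs + z * z / (4 * total_tcs ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


reads = [("AAAT", "ACGT"), ("AAAT", "ACGT"), ("AAAT", "ACTT"),
         ("CCGG", "ACGA"), ("CCGG", "ACGA")]
print(template_consensus(reads))  # {'AAAT': 'ACGT'}  (CCGG has only 2 reads)
print(wilson_interval(10, 500))
```

For a mutation seen in 10 of 500 TCS (2%), the interval is roughly 1.1% to 3.6%; more TCS narrows it, which is why the sampling depth governs how confidently minority variants can be reported.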
Poster Board #: 23
Title: Evaluation of Whole Genome Sequencing Genome-Based Methods and Bead-Based Molecular Methods for the Serotyping of Salmonella Isolated from Food and Environmental Samples
Author Block: K. A. MacMaster1, M. J. Nucci2, S. M. Madson3, G. S. Wagley4, K. C. Jinneman1, M. M. Moore1;
1Food and Drug Administration, Bothell, WA, 2Food and Drug Administration, Lakewood, CO, 3Food and
Drug Administration, Jefferson, AR, 4Food and Drug Administration, Atlanta, GA.
Abstract: Background: Salmonella serotyping using antisera has been essential to surveillance and
outbreak investigations. There are over 2500 known Salmonella serovars. Keeping a full inventory
of antisera is not practical, since antisera are expensive and those against rare antigens often
expire before use. Recently developed methods target the genes encoding the antigens recognized
by traditional serotyping under the White-Kauffmann-Le Minor scheme. These include bead-based
Salmonella molecular serotyping (SMS) methods and genome-based serotyping using publicly
available internet tools and whole genome sequencing (WGS) data. A recent study evaluated the SMS
method for identifying 572 Salmonella isolates from FDA samples and compared it with traditional
serotyping. WGS data for many isolates from the SMS study are publicly available in the
GenomeTrakr repository maintained at the National Center for Biotechnology Information (NCBI). In
addition, WGS data and traditional serotyping results are collected in real time for all
Salmonella isolated by FDA field laboratories. The goal of this study was to evaluate
genome-based serotyping of Salmonella isolates from WGS data using SeqSero and the Salmonella In
Silico Typing Resource (SISTR), and to compare the results with SMS and traditional serotyping.
Methods: Isolates were whole genome sequenced on the Illumina MiSeq platform, and the WGS data
were uploaded to the GenomeTrakr repository. WGS data for all isolates were analyzed with SeqSero
and SISTR, and the results were compared with SMS and traditional serotyping. Results: To date,
419/572 (73.3%) isolates from the SMS study have WGS data available. The numbers of isolates
identified as expected were 411 (98.1%) by SMS, 417 (99.5%) by SeqSero, and 418 (99.8%) by SISTR.
The numbers narrowed to a single serovar were 200 (48.7%) by SMS, 319 (76.5%) by SeqSero, and 413
(98.8%) by SISTR. To date, 131 real-time isolates have been analyzed. The numbers correctly
identified were 130 (99.2%) by SeqSero and 123 (93.9%) by SISTR, and the numbers narrowed to a
single serovar were 105 (80.8%) by SeqSero and 115 (93.5%) by SISTR. Conclusions: Genome-based
serotyping methods improved results compared with SMS and should be a valuable tool, enabling a
more targeted approach to antiserum testing and confirmation. Genome-based methods should aid in
the identification of rough, non-motile, or weakly agglutinating isolates, reducing
misidentification rates.
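A quick arithmetic check, using only the counts in the abstract, suggests that the percentages for narrowing to a single serovar are computed against the isolates each method identified correctly, not against all 419 isolates (for SMS, 200/411 gives 48.7%, whereas 200/419 would give 47.7%). The helper `pct` is illustrative.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


TOTAL_WITH_WGS = 419  # SMS-study isolates with WGS data
identified = {"SMS": 411, "SeqSero": 417, "SISTR": 418}
single_serovar = {"SMS": 200, "SeqSero": 319, "SISTR": 413}

for method in identified:
    print(
        method,
        pct(identified[method], TOTAL_WITH_WGS),          # share of all isolates with WGS data
        pct(single_serovar[method], identified[method]),  # share of correctly identified isolates
    )
# SMS 98.1 48.7
# SeqSero 99.5 76.5
# SISTR 99.8 98.8
```

The real-time figures follow the same pattern: 105/130 gives 80.8% for SeqSero and 115/123 gives 93.5% for SISTR.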
Poster Board #: 24
Title: VAMPr: Variant Association Mapping and Prediction of Antimicrobial Resistance
Author Block: J. Kim, R. Pifer, A. Koh, Y. Xie, D. Greenberg, X. Zhan; UT Southwestern Medical Center, Dallas, TX.
Abstract: Antimicrobial resistance (AMR) is influenced not only by the presence or absence of
resistance genes but also by mutations. VAMPr is an analysis pipeline that identifies these genes
and mutations and builds association and prediction models linking genotypes to susceptibility
phenotypes. Applying it to whole genome sequencing and antibiogram data from public databases, we
obtained 1,425 wild-type genes and 18,427 non-consensus mutations from 3,666 clinical pathogen
isolates, identified 6,895 genotype-phenotype associations, and achieved an average
cross-validation accuracy of 92.1% for phenotype prediction across 120 pathogen-antibiotic
models. The associations will give insight into novel AMR mechanisms, and the prebuilt models
will be used to predict the antibiotic susceptibilities of new pathogen isolates.
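The abstract does not say which statistical test VAMPr uses for genotype-phenotype associations. One common choice for a binary variant against a binary susceptibility call is Fisher's exact test. The sketch below computes its one-sided p-value directly from the hypergeometric distribution using only the standard library; the function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value that a variant is enriched in resistant isolates.

    2x2 table (counts of isolates):
                         resistant   susceptible
        variant present      a            b
        variant absent       c            d
    """
    row1 = a + b  # isolates carrying the variant
    col1 = a + c  # resistant isolates
    n = a + b + c + d
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of all tables at least as extreme (cell a >= observed).
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p


print(fisher_exact_greater(8, 2, 1, 9))  # about 0.0027
```

With thousands of variants tested across many pathogen-antibiotic pairs, a real pipeline would also need a multiple-testing correction; that step is not shown.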
Poster Board #: 25
Title: Whole Genome Sequencing in the Clinical Microbiology Laboratory: A Pipeline to Answer Diverse Clinical and Epidemiological Questions
Author Block: S. Taffner, A. Malek, H. Mostafa, N. Pecora; University of Rochester, Rochester, NY.
Abstract: Background: Next generation sequencing (NGS) is an emerging technique in clinical
microbiology, with applications ranging from outbreak analysis to genomic surveillance to the
analysis of unusual pathogens. NGS analysis is often a fragmented, step-wise process, or pipelines
are specialized for a single application or species. Herein we describe the URMC clinical
microbiology pipeline (pipeline), a robust, quality-controlled, modular process for diverse
applications and pathogens. Methods: The pipeline was designed to flexibly perform rapid analysis
of a variety of datasets and questions while storing previously analyzed isolates, allowing the
user to build a local database of isolates found in their area. The pipeline consists of two
steps and is implemented with Python, sqlite3, and JavaScript. The first step performs quality
control on the raw reads (trimmomatic, FastQC), followed by genome assembly (SPAdes) and plasmid
assembly (PlasmidSPAdes). Assembly quality is assessed (Quast), and the genus and species
(strainseeker) and MLST type of each sample are identified. Common phenotyping BLAST databases are
included in the pipeline, and custom BLAST databases can be added, making the pipeline relevant to
any species or project. To rapidly identify the best species reference, genome coverage is
calculated for every sample (Quast). Step two consists of a modified CFSAN SNP Pipeline for
reference-based SNP calling and phylogenetic analysis. Modifications include masking SNPs that
occur inside phages, mobile elements, and transposons; including only sites where a consensus
exists in every sample; producing a maximum likelihood tree (FastTree); and producing an
interactive web application that visualizes coverage and SNP locations throughout the genome to
confirm consistent coverage and the absence of SNP clustering. Results: The pipeline was
successfully used for the following diverse projects: genomic investigation of an Enterobacter
aerogenes outbreak in a cardiac intensive care unit, and genomic surveillance of
carbapenem-resistant pathogens, extended-spectrum β-lactamase (ESBL)-producing E. coli, and
emerging Clostridium difficile in Western New York. We have also used the pipeline to
characterize unusually isolated organisms, e.g., Facklamia hominis. Conclusions: Whole genome
sequencing is a powerful tool to complement traditional clinical microbiology techniques. Here,
we described a pipeline that has proven versatile for clinical microbiology needs across diverse
projects. Future plans include automatically generating an editable phylogenetic tree with
metadata overlaid, and adding a cloud-based user interface for initializing the pipeline,
analyzing the data, and producing a standardized report for clinical staff.
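Two of the modifications to the CFSAN SNP Pipeline can be sketched as a simple site filter: masking SNPs inside phages, mobile elements and transposons, and keeping only sites with a consensus call in every sample. The data layout (per-sample dicts of reference position to called base, with `N` or a missing entry meaning no consensus) and the function name are assumptions. This is not the pipeline's actual code.

```python
def filter_snp_sites(calls, masked_regions):
    """Return variable positions outside masked regions with a consensus in every sample.

    calls: dict sample -> dict reference position -> called base ('N' = no consensus).
    masked_regions: list of (start, end) inclusive intervals, e.g. phages or transposons.
    """
    def is_masked(pos):
        return any(start <= pos <= end for start, end in masked_regions)

    samples = list(calls)
    positions = sorted(set().union(*(calls[s].keys() for s in samples)))
    kept = []
    for pos in positions:
        if is_masked(pos):
            continue  # SNP lies in a phage, mobile element, or transposon
        bases = [calls[s].get(pos, "N") for s in samples]
        if "N" in bases:
            continue  # at least one sample lacks a consensus call here
        if len(set(bases)) > 1:
            kept.append(pos)  # informative site retained for the tree
    return kept


calls = {
    "s1": {100: "A", 250: "G", 900: "T"},
    "s2": {100: "C", 250: "G", 900: "N"},
    "s3": {100: "A", 250: "T", 900: "T"},
}
print(filter_snp_sites(calls, masked_regions=[(200, 300)]))  # [100]
```

Position 250 is removed by the mask and position 900 because sample s2 has no consensus, leaving one informative site.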
Poster Board #: 26
Title: A Study on miRNA Signature in the Whole Blood of Individuals with Latent Tuberculosis Infection
Author Block: M. Hijikata1, N. T. Hang2, D. B. Tam3, I. Matsushita1, S. Seto1, V. C. Cuong4, P. H. Thuong3, N.
Keicho1; 1The Research Institute of Tuberculosis, JATA, Tokyo, JAPAN, 2NCGM-BMH Medical
Collaboration Center, Hanoi, VIET NAM, 3Hanoi Lung Hospital, Hanoi, VIET NAM, 4Hanoi
Department of Health, Hanoi, VIET NAM.
Abstract: Objectives: Biomarkers for latent tuberculosis infection (LTBI) may effectively predict
who will develop active TB. As a first step, we hypothesized that microRNA (miRNA) signatures in
the peripheral blood may reflect the host response at the site of LTBI, and attempted to identify
candidate miRNAs possibly associated with LTBI using a next generation sequencer. Design and
Methods: Healthcare workers (HCWs) at Hanoi Lung Hospital and district TB centers in Hanoi,
Vietnam were enrolled, and their LTBI status was assessed using interferon-gamma release assays
(IGRA). Whole blood samples were collected into PAXgene Blood RNA tubes (PreAnalytiX
QIAGEN/BD), and total RNA was extracted. Small RNA libraries prepared from IGRA-positive (n=3) and
IGRA-negative (n=3) samples were sequenced on a NextSeq 500 platform and analyzed with BaseSpace
Onsite Sequence Hub using the DESeq2 variance model (Illumina). The expression levels of the
candidate miRNAs in the whole blood of HCWs (n=109) were also determined by real-time RT-PCR and
analyzed together with clinico-epidemiological data. Results: Approximately 1,000 miRNAs were
detected in the initial screening, and five miRNAs were expressed at significantly higher levels
in samples from IGRA-positive HCWs than in those from IGRA-negative HCWs, exceeding a statistical
threshold. One of the five miRNAs showed significantly higher expression in 41 IGRA-positive
HCWs than in 68 IGRA-negative HCWs by real-time RT-PCR (P = 0.0286). Conclusions: We identified
miRNA candidates associated with LTBI, although their expression levels need to be validated in
another, larger set of samples. The possible functional significance of the candidate miRNAs
should be further investigated.
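The differential expression screen relied on DESeq2 as run within BaseSpace. As background only, the sketch below shows the median-of-ratios size-factor normalization that DESeq2 applies to raw counts before modelling. It is not the full DESeq2 model (there is no dispersion estimation or Wald test here), and the function name and toy counts are illustrative.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, the normalization approach used by DESeq2.

    counts: dict miRNA -> list of raw read counts, one per sample (same sample order).
    miRNAs with a zero count in any sample are skipped, since their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's size factor is the median ratio to the per-miRNA geometric means.
    return [math.exp(median(r)) for r in log_ratios]


# Toy counts: sample 2 was sequenced twice as deeply as sample 1.
counts = {"miR-a": [10, 20], "miR-b": [100, 200], "miR-c": [50, 100], "miR-d": [0, 7]}
print(size_factors(counts))  # approximately [0.707, 1.414]
```

Dividing each sample's counts by its size factor puts libraries of different depth on a common scale, so that a twofold difference in sequencing depth is not mistaken for differential expression.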
Poster Board #: 27
Title: Building a Pan-genome Allele Database for Whole Genome Sequence-based genotyping of Salmonella Isolates of More than 250 Serovars
Author Block: Y. HONG, Y. Tu, Y. Liu, B. Chen, C. Chiou; Centers for Disease Control, Taiwan, Taichung, TAIWAN.
Abstract: Background: Whole genome sequencing (WGS) is a promising method for genotyping bacterial
isolates for epidemiological investigation of disease outbreaks and active disease surveillance.
WGS-based genotypic data can be compared among laboratories when the genetic profiles are
generated using a common pan-genome allele database (PGAdb). A PGAdb is needed for genotyping
Salmonella isolates. Methods: A total of 8,133 genomes from the NCBI database were used to
construct a Salmonella PGAdb using a modified PGAdb-builder pipeline. The usefulness of the PGAdb
in generating cgMLST profiles for identifying epidemiologically related clusters was assessed
using 3 sets of isolates of different serovars. Results: A Salmonella PGAdb was constructed from
8,133 genomes of more than 250 serovars, including S. Typhi (24.8%), S. Typhimurium (13.1%), S.
Enteritidis (9.5%), S. Heidelberg (3.2%), and other serovars (49.4%). The database contained
64,113 loci (genes), of which 3,356 loci were shared by ≥95% of the genomes (core genes) and
60,757 loci by <95% of the genomes (accessory genes). The database was applicable to common
Salmonella serovars. It was evaluated by generating cgMLST profiles for 3 sets of isolates of
different serovars, and clustering these profiles successfully discriminated epidemiologically
related strains from unrelated strains. Conclusions: A Salmonella PGAdb built from a large number
of genomes of diverse serovars can be a useful tool for generating genetic profiles from WGS data
of isolates. Because validation datasets were limited, further assessment using isolate sets of
various serovars is needed.
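The core/accessory split and the cgMLST comparison described above can be sketched as follows. The 95% threshold comes from the abstract; the data layout, the convention of skipping loci that are missing in either profile, and the function names are assumptions for illustration rather than the PGAdb-builder implementation.

```python
def classify_loci(presence_counts, n_genomes, core_threshold=0.95):
    """Split loci into core (present in >= 95% of genomes) and accessory sets.

    presence_counts: dict locus -> number of genomes carrying the locus.
    """
    core = {locus for locus, k in presence_counts.items() if k / n_genomes >= core_threshold}
    accessory = set(presence_counts) - core
    return core, accessory


def allele_distance(profile_a, profile_b):
    """Number of loci whose allele numbers differ between two cgMLST profiles.

    Loci absent from either profile, or marked None (missing), are skipped.
    """
    shared = [
        locus for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


core, accessory = classify_loci({"locA": 8000, "locB": 7800, "locC": 500}, n_genomes=8133)
print(sorted(core), sorted(accessory))  # ['locA', 'locB'] ['locC']
print(allele_distance({"locA": 1, "locB": 4, "locD": 2},
                      {"locA": 1, "locB": 5, "locD": None}))  # 1
```

Clustering isolates on such pairwise allele distances is one way to separate epidemiologically related strains from unrelated ones. As a consistency check on the reported counts, 3,356 core plus 60,757 accessory loci account for all 64,113 loci.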
Poster Board #: 28
Title: Microbiological Diagnosis of Otitis Media by Shotgun Metagenomics
Author Block: P. Woerther1, C. Levy2, G. Gricourt3, V. Demontant3, V. Saint-Rose1, C. Angebault1, E. Sitterle4,
J. Pawlotsky1, J. Decousser1, R. Cohen2, C. Rodriguez1; 1Microbiology Dpt, Henri Mondor Hospital,
APHP/UPEC, Creteil, FRANCE, 2Activ Fundation, St Maure, FRANCE, 3Platform NGS, APHP, IMRB, INSERM
U955, Creteil, FRANCE, 4Mycology Dpt, Necker Enfants Malades, APHP, Paris, FRANCE.
Abstract: Background: Microbiological diagnosis based on metagenomics (NDBM) may offer new
perspectives in clinical microbiology. However, experience with the analysis and interpretation
of NDBM data is scarce, particularly for polymicrobial samples. In this study, we aimed to compare
the information provided by culture and by NDBM in a collection of otitis media sputum samples.
Material/methods: Thirteen samples of otitis media sputum from 12 children were included.
Culture-based analysis was performed by plating the samples according to standard
recommendations. For the NDBM analysis, we used a shotgun metagenomic (SM) protocol covering both
the experimental and analysis steps (MetaMIC®). Briefly, a DNA and RNA extraction protocol adapted
to all types of samples and pathogens was used. DNA and RNA libraries were then prepared in
parallel before sequencing on a NextSeq500 (Illumina). Finally, the software filtered, quantified
and identified sequences, reconstructed the viral, bacterial and fungal genomes when possible,
and generated a final automatic report. To refine the identification of certain species that are
difficult to identify (e.g., Streptococcus sp., Corynebacterium sp.), a specific post-analysis
pipeline was developed. Identified bacteria were considered involved in the infectious process
when (i) the number of sequences obtained was higher than that of Propionibacterium sequences
and (ii) the bacterial identification was compatible with an otopathogen. Results: At least one
bacterium was identified in 6/13 cases by culture and in 8/13 cases by NDBM. When both methods
were positive, the results were concordant in all 5 cases. Among these, infections involving
more than one bacterial species were identified by culture in 2 samples but by NDBM in only one.
Conversely, NDBM detected coinfections involving both a virus and bacteria in 2 cases. In the
remaining cases, only one virus was identified (Table). Conclusions: Using our interpretation
rules, we found that the sensitivity of NDBM was higher than that of culture-based methods.
Although polymicrobial infections were slightly better detected by culture, NDBM enabled the
identification of a viral pathogen in 4 cases. Based on these promising results, we think that
the diagnostic value of NDBM may be increased further with the development of new interpretation
rules.
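The two interpretation rules stated in the Methods translate directly into a small filter. The otopathogen set below is a hypothetical, illustrative list of common middle-ear pathogens (the abstract does not enumerate one), and the function name and example counts are also assumptions.

```python
# Hypothetical otopathogen list for illustration; the abstract does not enumerate one.
OTOPATHOGENS = {
    "Streptococcus pneumoniae",
    "Haemophilus influenzae",
    "Moraxella catarrhalis",
    "Streptococcus pyogenes",
    "Staphylococcus aureus",
    "Pseudomonas aeruginosa",
}


def involved_bacteria(read_counts, reference="Propionibacterium"):
    """Apply the abstract's two rules to per-taxon sequence counts.

    Rule (i): more sequences than the reference taxon (Propionibacterium).
    Rule (ii): the taxon is compatible with an otopathogen.
    """
    threshold = read_counts.get(reference, 0)
    return sorted(
        taxon for taxon, n in read_counts.items()
        if n > threshold and taxon in OTOPATHOGENS
    )


sample = {
    "Propionibacterium": 150,
    "Haemophilus influenzae": 4200,
    "Staphylococcus epidermidis": 900,
    "Moraxella catarrhalis": 90,
}
print(involved_bacteria(sample))  # ['Haemophilus influenzae']
```

Here the skin commensal fails rule (ii) despite many reads, and M. catarrhalis fails rule (i) because it has fewer sequences than the Propionibacterium background.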
Poster Board #: 29
Title: Metagenomic Shotgun Sequencing in the Microbiology Laboratory: Routine Implementation for Complex Cases
Author Block: C. Rodriguez1, R. Lepeule1, E. Sitterle2, G. Gricourt3, V. Demontant3, V. Fihman1, J. Decousser1,
L. Coutte1, C. Angebault1, F. Botterel1, A. Lebouter1, S. Fourati1, J. Pawlotsky1, P. Woerther1;
1Microbiology Dpt, Henri Mondor Hospital, APHP/UPEC, Creteil, FRANCE, 2Mycology Dpt, Necker Enfants
Malades, APHP, Paris, FRANCE, 3Platform NGS, APHP, IMRB, INSERM U955, Creteil, FRANCE.
Abstract: Background: Next generation sequencing (NGS) is changing the field of infectious disease
diagnosis. However, because of the challenges associated with complex data analysis and the
bioinformatic tools it requires, NGS is not yet included in routine processes. Here, we report the
routine implementation of NGS methods for microbiological diagnosis in the Microbiology
Department of our university hospital, which carries out medical analyses for 3000 beds,
including 1000 for acute patients. Methods: From January 2017 to January 2018, we periodically
organized multidisciplinary meetings involving virologists, bacteriologists, parasitologists and
infectious disease physicians. The aim of these meetings was to discuss cases in which an
infectious origin was suspected but not confirmed by standard methods. All specimens likely to
contribute to the diagnosis were then processed according to our unbiased clinical metagenomic
(CMg) protocol. Our department has its own complete metagenomic platform and pipeline (MetaMIC)
dedicated to microbiological diagnosis, involving biologists, physicians, bioinformaticians and
engineers. Results: A total of 29 samples from 20 patients were processed. In 38% (11/29) of the
tested samples, CMg enabled the identification of one or more pathogens. In 7 of these cases, CMg
identified microorganisms that were not found by conventional methods, whereas in the other 4
cases the information provided by NGS did not improve the diagnosis. In two cases, CMg was
negative while conventional methods were positive. In the remaining 16 samples, both CMg and
conventional methods were negative. Discussion: Overall, CMg improved the diagnosis in 35% (7/20)
of the tested patients. The two false negatives of CMg corresponded to respiratory colonization by
Candida sp. However, CMg failed to identify Mycobacterium tuberculosis in one peritoneal sample,
probably because of both its polymicrobial presentation and the poor performance of nucleic acid
extraction for mycobacteria.
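The sample-level outcome counts in the Results add up consistently, which a few lines of arithmetic can confirm. The category labels are paraphrases of the abstract; note that the 35% figure uses patients (20), not samples (29), as its denominator.

```python
# Sample-level outcome counts as reported in the abstract (labels paraphrased).
outcomes = {
    "CMg positive, new pathogen found": 7,
    "CMg positive, no diagnostic gain": 4,
    "CMg negative, conventional positive": 2,
    "both negative": 16,
}
total_samples = sum(outcomes.values())
cmg_positive = (outcomes["CMg positive, new pathogen found"]
                + outcomes["CMg positive, no diagnostic gain"])

print(total_samples)                              # 29
print(round(100 * cmg_positive / total_samples))  # 38
print(round(100 * 7 / 20))                        # 35 (per patient, not per sample)
```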
Poster Board #: 30
Title: Benchmarking Genetic Epidemiology Applications Using Large Scale Public Whole Genome Sequencing Data
Author Block: R. Kolde, H. van Aggelen, H. Chamarthi, P. Kaplan, A. Hoss; Philips Research, Cambridge, MA.
Abstract: Whole genome sequencing (WGS) has shown great promise in disentangling complex
infectious outbreaks and identifying transmissions in clinical settings. A range of methods and
pipelines have been proposed for defining genetic similarity: core genome MLST, reference
alignment- and assembly-based SNP distances, and alignment-free k-mer-based methods. In a small
retrospective study, with freedom to adjust parameters, all of these methods usually provide
acceptable outcomes, and small differences in accuracy do not change the results. In a clinical
environment, however, where thousands of samples will be sequenced routinely, small differences
in performance can have striking consequences for the resulting workload. Unbiased benchmarking
of these methods is complicated, as true infectious clusters are usually not fully known. Here we
propose using the ability to distinguish the relatedness of infectious samples derived from the
same versus different patients, and from within versus between geographic locations, as a proxy
for performance. Applying this methodology to large public datasets, we report clear differences
across methods in power to identify related samples, consistent across species and Gram stains.
We also demonstrate how the benchmark sets can be used to fine-tune the parameters of each method
or pipeline and how a species-specific approach is required to maximize performance. By
establishing cohort-independent benchmarks, we pave the way toward efficient application of
genetic epidemiology tools in clinical settings.
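The abstract does not name the metric used to score how well a method separates same-patient from different-patient sample pairs. One natural assumption is the area under the ROC curve, computed here in its Mann-Whitney form: the probability that a randomly chosen same-patient pair is genetically closer than a randomly chosen different-patient pair. The function name and distances are illustrative.

```python
def separation_auc(same_patient, different_patient):
    """Probability that a same-patient pair is closer than a different-patient pair.

    Inputs are lists of pairwise genetic distances (e.g. SNP or cgMLST allele
    distances). Ties count as half, giving the Mann-Whitney form of the ROC AUC.
    """
    wins = 0.0
    for d_same in same_patient:
        for d_diff in different_patient:
            if d_same < d_diff:
                wins += 1.0
            elif d_same == d_diff:
                wins += 0.5
    return wins / (len(same_patient) * len(different_patient))


# Illustrative distances for one method; a higher AUC means better separation.
print(separation_auc([0, 2, 5], [5, 40, 120]))  # 8.5 / 9, about 0.944
```

Computing the same score for each method (or each parameter setting) on the same benchmark pairs gives a cohort-independent way to compare them, in the spirit of the proposal above.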
Poster Board #: 31
Title: NGS-based Phylogeny of Diphtheria Toxin and Corresponding Pathogenic Features in Corynebacterium spp. Implies Species-specific Pathogenicity Transmission
Author Block: A. Dangel1, A. Berger2, R. Konrad1, A. Sing2; 1Bavarian Health and Food Safety Authority,
Oberschleissheim, GERMANY, 2Bavarian Health and Food Safety Authority, German National Consiliary
Laboratory on Diphtheria, Oberschleissheim, GERMANY.
Abstract: Diphtheria toxin (DT) is produced by toxigenic strains of pathogenic Corynebacterium
diphtheriae as well as of the zoonotic C. ulcerans and C. pseudotuberculosis. Toxigenic strains
may cause severe respiratory diphtheria, myocarditis, neurological damage or cutaneous
diphtheria. In central Europe, infections with toxigenic C. ulcerans currently outnumber those
with toxigenic C. diphtheriae, and C. ulcerans is increasingly recognized as an emerging
pathogen. The DT-encoding tox gene is located in a mobile genomic region, and single cases of tox
variability between C. diphtheriae and C. ulcerans have been shown. In contrast, interspecies
differences in the diphtheria toxin repressor gene (dtxR), which occurs in both toxigenic and
non-toxigenic corynebacteria, have not yet been characterized.
We used whole genome sequencing data from 90 toxigenic and 46 non-toxigenic isolates of
different pathogenic Corynebacterium species of animal or human origin to comparatively
elucidate differences in tox, dtxR and tox-surrounding mobile genetic elements in a large sample
set. We performed de novo assembly, contig ordering and genome annotation, and extracted tox,
dtxR, prophages and other tox-surrounding pathogenicity-related mobile elements. The extracted
regions were used for phylogenetic comparisons between the different Corynebacterium species.
The translated sequences of both tox and dtxR could be classified into four distinct, mainly
species-specific clades, corresponding to C. diphtheriae, C. pseudotuberculosis, C. ulcerans and
atypical C. ulcerans from wild boar. Average amino acid similarities were above 99% within groups,
but only 91-94% between groups for DT and 79-98% for DtxR. However, for DT, different subgroups
could be identified, correlating with the geographic origin of the isolates or with the
different mobile genetic elements carrying the tox gene. In nearly all C. diphtheriae isolates,
tox genes fell within genetic regions of known prophage sequences. In contrast, in C. ulcerans,
diverse mobile elements carrying the tox gene could be identified: either prophage sequences
differing from C. diphtheriae prophages or, in nearly all isolates without tox-overlapping
prophage annotations, an alternative but very homogeneous pathogenicity island (PAI) described
previously. Only one isolate showed a different, shorter tox-comprising putative PAI. Beyond the
tox-overlapping elements, most analyzed Corynebacterium isolates harbored a variety of additional
prophages.
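The within-clade versus between-clade comparison can be sketched as mean pairwise identity over pre-aligned protein sequences. The abstract does not define its "similarity" measure, so plain percent identity (ignoring gap columns) is used here as a stand-in. The function names and the short toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def mean_identity(groups):
    """Mean pairwise identity within each clade and between each pair of clades.

    groups: dict clade name -> list of aligned sequences.
    """
    within = {}
    for clade, seqs in groups.items():
        values = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
        within[clade] = sum(values) / len(values) if values else None
    between = {}
    for (c1, s1), (c2, s2) in combinations(groups.items(), 2):
        values = [percent_identity(a, b) for a in s1 for b in s2]
        between[(c1, c2)] = sum(values) / len(values)
    return within, between


groups = {"C. diphtheriae": ["MKLV", "MKLV"], "C. ulcerans": ["MRLV", "MRLI"]}
within, between = mean_identity(groups)
print(within)   # {'C. diphtheriae': 100.0, 'C. ulcerans': 75.0}
print(between)  # {('C. diphtheriae', 'C. ulcerans'): 62.5}
```

Clades that are species-specific, as reported for DT and DtxR, show high within-clade values and clearly lower between-clade values.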
In conclusion, our NGS data from 136 Corynebacterium isolates indicate the existence of different
genetic backgrounds of DT-mediated pathogenicity in different Corynebacterium species. Additional
pathogenicity-related elements within C. ulcerans imply that DT transmission pathways between
isolates may be more diverse in this zoonotic species and may contribute to its emerging
pathogenic potential.
Poster Board #: 32
Title: Using Bacterial Whole Genome Sequence Data for the Rapid Development of Clone-specific Screening Assays in a Hospital as a Case Study
Author Block: C. S. Heinz, K. Prior, S. Bletz, A. Mellmann, D. Harmsen; University Hospital Münster, Münster, GERMANY.
Abstract: There is ample evidence of the utility of bacterial whole genome sequencing (WGS) for
genotyping. WGS data are also a valuable resource for developing clone-specific molecular
screening assays for active case finding, without prior cultivation, when an outbreak is
suspected. As evidence addressing this topic is scarce, here we aim to demonstrate, as a case
study, the utility of WGS data for designing such assays using, among others, a novel tool. A
probable transmission event of Escherichia coli sequence type (ST) 131 comprising six isolates was
tested against a background of 47 other ST131 isolates collected during the same six-month period
in the same institution. Furthermore, a methicillin-resistant Staphylococcus aureus (MRSA) cluster
involving three samples was evaluated against 63 other MRSA samples of the same spa type (t011)
that were also isolated during the same time period in the same hospital. Three tools that return
results within minutes were used with default settings to extract clone-specific genetic
information from WGS data: (1) the desktop tool ssGeneFinder, which finds specific stretches of
DNA using iterative BLAST searches, (2) the k-mer-search-based RUCS web service, and (3) a new
feature implemented in Ridom SeqSphere+ that finds specific single nucleotide polymorphisms (SNPs)
in core genome genes. For the SNPs found, ABI PrimerExpress was used to design primers and probes
for TaqMan allelic discrimination assays. For the assembled ST131 E. coli genomes, ssGeneFinder,
RUCS, and SeqSphere+ found 1, 10, and 10 clone-specific targets, respectively. PrimerExpress was
able to design 5 TaqMan and 6 TaqMan MGB allelic discrimination assays for the 10 specific SNPs
found. As an example, specificity was experimentally verified for one multiplex TaqMan assay using
a GC-content-changing clone-specific SNP and a species-specific target. Using the clone-specific
primers alone, a high-resolution melting curve (HRMC) analysis was also discriminatory, as
predicted. For the MRSA transmission event, in which the background isolates clustered more
closely, ssGeneFinder, RUCS, and SeqSphere+ found 0, 1, and 12 clone-specific targets,
respectively. The RUCS target could not be confirmed as specific in the laboratory. PrimerExpress
produced 0 TaqMan and 5 TaqMan MGB assays. Again, specificity was confirmed experimentally for one
multiplex TaqMan MGB assay and one clone-specific HRMC assay. The results of this study illustrate
that it is feasible to use WGS data to rapidly develop and implement clone-specific screening
assays. Because primers alone for HRMC are delivered more quickly (1-2 days) than the
fluorescently labeled probes additionally needed for TaqMan tests (4-7 days), one implementation
strategy when using SeqSphere+ could be to first employ clone-specific HRMC assays, which can later
be replaced by potentially more specific (multiplex) TaqMan assays.
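The idea behind a clone-specific SNP can be sketched over aligned core-genome sequences. Such a SNP is a position where every outbreak isolate shares one base that no background isolate carries. The sketch also flags whether the SNP changes GC content (A/T versus G/C), the property of the SNP used in the abstract's example assay. This is a simplified illustration with a hypothetical function name and toy sequences, not the SeqSphere+ algorithm.

```python
def clone_specific_snps(cluster, background):
    """Find positions where all cluster isolates share a base absent from the background.

    cluster, background: lists of equal-length aligned core-genome sequences.
    Returns (position, cluster_base, changes_gc) tuples; gaps or Ns disqualify a position.
    """
    ambiguous = set("-N")
    hits = []
    for pos in range(len(cluster[0])):
        cluster_bases = {seq[pos] for seq in cluster}
        if len(cluster_bases) != 1:
            continue  # the outbreak isolates themselves disagree here
        base = next(iter(cluster_bases))
        background_bases = {seq[pos] for seq in background}
        if base in ambiguous or background_bases & ambiguous:
            continue  # gaps or Ns make the position unreliable
        if base in background_bases:
            continue  # not specific: some background isolate shares the base
        # True when the change swaps an A/T for a G/C (or vice versa) against every background base.
        changes_gc = all((b in "GC") != (base in "GC") for b in background_bases)
        hits.append((pos, base, changes_gc))
    return hits


cluster = ["ACGTA", "ACGTA"]
background = ["ACATA", "ATATA", "ACATG"]
print(clone_specific_snps(cluster, background))  # [(2, 'G', True)]
```

With closely related background isolates, as in the MRSA example, fewer positions pass this filter, which matches the intuition that specific targets get scarcer as the background clusters more tightly.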
Poster Board #: 33
Title: POREgressive: Near Real Time Local Pathogen Identification and Characterization Using Oxford Nanopore MinION and Off the Shelf Informatics Tools
Author Block: C. R. Paden, S. Tong;
Nanopore sequencing is ushering in an era of portable, low-cost, rapid, high-throughput DNA
sequencing. Oxford Nanopore’s MinION detects and sequences single molecules of DNA by
electrophoretically driving them one by one through nanoscale pores. With the right tools,
this technology may prove revolutionary for mobile, resource-poor, or space-limited
laboratories. Many existing bioinformatic tools rely on post hoc analysis of sequencing data,
which runs counter to the MinION’s promise of real-time analysis. In addition, a major
challenge of the technology is that local, standardized data analysis is not yet routinely
available to laboratory technicians; this is especially important in mobile or deployment
settings, where it is not always practical or possible to access cloud analysis resources. To
meet this need for rapid, mobile data analysis, we developed POREgressive, a set of tools
that runs alongside the MinION’s data acquisition software. Some modules batch reads and
send them for basecalling, while others progressively analyze the run for metagenomic
content or for specific sequences as data arrive. POREgressive can use either local or network
resources, depending on the environment. Rather than building new analysis algorithms from
the ground up, we adapted existing open-source tools, such as Kraken and BWA, so that the
displayed analyses would be familiar to end users. Because data are analyzed as they are
generated, results can be obtained in minutes to hours rather than after waiting days for a
complete sequencing run, and the run can be stopped once enough data have been gathered.
Many nanopore sequencing tools are being developed to exploit the unique qualities of
nanopore reads, such as their length, to perform novel analyses. POREgressive is instead meant
to bring the real-time capabilities of the MinION to more traditional tools and techniques,
reducing the learning curve and increasing the immediate usability of this quickly evolving
technology.
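The abstract does not describe how POREgressive batches reads internally. As a hedged illustration of the general idea only, the Python sketch below polls a run directory, queues read files it has not seen before, and hands each full batch to a caller-supplied hook. The file pattern, batch size, poll interval, and the `analyze` hook are all assumptions; in practice the hook might wrap a classifier such as Kraken or an aligner such as BWA.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Set


def collect_new_batches(seen: Set[str], current: Iterable[str],
                        batch_size: int, pending: List[str]) -> List[List[str]]:
    """Queue files not seen before and return every full batch now available.

    `seen` and `pending` are updated in place so state carries over between polls.
    """
    for name in sorted(current):
        if name not in seen:
            seen.add(name)
            pending.append(name)
    batches = []
    while len(pending) >= batch_size:
        batches.append(pending[:batch_size])
        del pending[:batch_size]
    return batches


def watch_run(run_dir: str, analyze: Callable[[List[str]], None],
              batch_size: int = 4, poll_seconds: float = 30.0,
              max_polls: int = 10, pattern: str = "*.fastq") -> None:
    """Poll `run_dir` and pass each full batch of new read files to `analyze`.

    `analyze` is a hypothetical caller-supplied hook; the final partial batch is
    flushed once polling ends.
    """
    seen: Set[str] = set()
    pending: List[str] = []
    for _ in range(max_polls):
        names = [str(p) for p in Path(run_dir).glob(pattern)]
        for batch in collect_new_batches(seen, names, batch_size, pending):
            analyze(batch)
        time.sleep(poll_seconds)
    if pending:
        analyze(list(pending))
```

Keeping the batching logic in a pure function makes it easy to test without a live sequencing run; the polling loop is the only part that touches the file system.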
Poster Board #: 34
Title: Whole genome sequencing and comparative genome analysis of human and bovine Salmonella Dublin isolates
Author Block: S. W. Kim1, D. M. Harhay2, S. Salaheen1, J. S. Van Kessel1, B. J. Haley1; 1ARS USDA, Beltsville, MD, 2ARS USDA, Clay Center, NE.
Abstract:
Background: Salmonella Dublin is a bovine-adapted serovar, and although infrequent, S. Dublin
infections in humans can result in serious complications with relatively high hospitalization
and mortality rates. Human clinical and bovine isolates are often resistant to multiple
antimicrobials. Methods: To investigate the temporal, geographic, and host-specific diversity
of S. Dublin, a total of 115 isolates recovered in the US between 2002 and 2014 were
sequenced on an Illumina NextSeq 500. Reads were trimmed using Trimmomatic and
assembled using SPAdes. In addition, 782 publicly available S. Dublin genomes were included
in this analysis. Single nucleotide polymorphisms were identified among all the genomes using
Parsnp with default parameters and a recombination filter. A maximum likelihood
phylogenetic tree was inferred using RAxML. Antimicrobial resistance (AMR) genes were
identified using DIAMOND (sequence identity ≥ 90%, AMR gene coverage > 60%). Results:
S. Dublin genomes clustered into five major clades, and isolates from the US clustered in one
clade. Four subclades were further identified among the US isolates. Two subclades
comprised only a few isolates from two states, the third comprised isolates from seven states,
and the remaining subclade included isolates from 26 states. Multiple clusters of isolates
originating from the same geographic location were observed, indicating that multiple
sub-lineages of S. Dublin are circulating within some states. Most of the genomes were
sequence type (ST)10; only one isolate was ST3208, two were ST2829, three were ST2037, and
four were ST73. All the non-ST10 S. Dublin isolates differed from ST10 at only one or two
multilocus sequence typing loci and were all members of eBurst Group 53. Aminoglycoside
resistance genes were identified in most isolates. Genes conferring resistance to β-lactams,
tetracycline, phenicols, and sulfonamides were observed in 48%, 31%, 23%, and 11% of
isolates, respectively. More than 33% of genomes encoded genes associated with resistance to
four or more classes of antimicrobials and were thus considered multidrug resistant (MDR).
The odds of being MDR were greater for isolates from bovine sources (OR: 7.87, 95% CI:
3.51-17.69), from the US (OR: 26.60, 95% CI: 15.16-46.67), and from 2010-2017 (OR: 1.42,
95% CI: 1.05-1.92) than for isolates from humans, from outside the US, and from before
2010, respectively. Conclusions: The results of this study indicate that there is a high level of
diversity among globally isolated S. Dublin, with a high percentage of publicly available and
study genomes encoding resistance to multiple antibiotics. This study also demonstrated
phylogenetic and resistance gene content differences between isolates from different regions,
between those isolated in different years, and between those from different sources.
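The AMR and MDR rules above can be expressed compactly. The Python sketch below is a hedged illustration, not the authors' code: it assumes DIAMOND hits have already been parsed into records carrying percent identity, aligned length, and reference gene length (the `AmrHit` type and its fields are hypothetical), applies the stated thresholds (identity ≥ 90%, coverage > 60%), flags genomes with genes for four or more antimicrobial classes as MDR, and computes an odds ratio with a Woolf (log-scale) 95% CI. The abstract does not say how its ORs were estimated, so the Woolf interval is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class AmrHit:
    genome: str
    gene: str
    drug_class: str
    pident: float   # percent identity of the alignment
    aln_len: int    # aligned length on the reference AMR gene
    gene_len: int   # full length of the reference AMR gene


def passing_hits(hits: Iterable[AmrHit], min_ident: float = 90.0,
                 min_cov: float = 60.0) -> List[AmrHit]:
    """Keep hits with identity >= min_ident and reference-gene coverage > min_cov (%)."""
    return [h for h in hits
            if h.pident >= min_ident and 100.0 * h.aln_len / h.gene_len > min_cov]


def mdr_status(hits: Iterable[AmrHit], min_classes: int = 4) -> Dict[str, bool]:
    """Flag genomes with passing genes for >= min_classes antimicrobial classes.

    Genomes without any passing hit do not appear in the result.
    """
    classes: Dict[str, Set[str]] = {}
    for h in passing_hits(hits):
        classes.setdefault(h.genome, set()).add(h.drug_class)
    return {g: len(c) >= min_classes for g, c in classes.items()}


def odds_ratio(a: int, b: int, c: int, d: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf confidence interval for a 2x2 table (all cells non-zero).

    a: exposed & MDR, b: exposed & not MDR, c: unexposed & MDR, d: unexposed & not MDR.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se), math.exp(log_or + z * se)
```

Here "exposed" would be, for example, bovine origin versus human origin; the counts passed to `odds_ratio` are placeholders, not the study's data.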
Poster Board #: 35
Title: Diverse Whole Genome Sequence-based Typing Approaches During a Foodborne Outbreak Investigation: The 2015 Central Italy Listeriosis Case Study
Author Block: M. Orsini1, A. Di Pasquale1, C. Patavino1, A. Rinaldi1, M. Ancora1, M. Di Domenico1, M. Marcacci1, M. Torresi2, V. Acciari2, F. Pomilio2, B. Félix3, J. Mariet3, A. Pietzka4, M. Van der Voort5, F. Guidi6, G. Blasi6, C. Cammà1; 1Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise (IZSAM), National Reference Center for WGS of microbial pathogens: database and bioinformatic analysis, Teramo, ITALY, 2Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise (IZSAM), National Reference Laboratory for Listeria monocytogenes, Teramo, ITALY, 3Maisons-Alfort Laboratory for Food Safety, Salmonella and Listeria Unit, University of Paris-Est, French Agency for Food, Environmental and Occupational Health & Safety (ANSES), Maisons-Alfort, FRANCE, 4Institute of Medical Microbiology and Hygiene, Austrian Agency for Health and Food Safety, Graz, AUSTRIA, 5The Netherlands food and consumer product safety authority (NVWA), Wageningen, NETHERLANDS, 6Istituto Zooprofilattico Sperimentale dell'Umbria e delle Marche, Fermo, ITALY.
Abstract:
An increasing number of severe human listeriosis cases has been recorded in the Marche region,
Italy, since May 2015. The circulating strains were subtyped using different molecular
methods: PFGE, MLST, and NGS. MLST and PFGE identified a large group of strains with ST7
and a previously undetected pulsotype. Following an official inquiry among the EURL and EU
NRL Lm laboratories requesting that collected datasets be shared, the novel Italian pulsotype
was compared with those present in international databases. Austria, France, and the
Netherlands provided 27 strains belonging to the same pulsotype or the same ST. The
NGS-based phylogenetic analysis showed that the Italian isolates formed a cluster, confirming
the previous MLST/PFGE typing. Moreover, the analysis excluded the other European
isolates, specifically those of the same pulsotype from the Netherlands, from the Italian
outbreak. A suspected source of infection was identified in January 2016, when a food strain
was isolated from a locally produced pork headcheese. The contaminated batches of the meat
product were recalled from the market by the Local Health Authority, and the last clinical case
linked to the outbreak strain was notified on March 11, 2016. Concrete cooperation between
the EURL and the NRLs of the Member States is encouraged, given the international
dimension of the food trade and related pathogens. The food strain was matched to the human
strains by molecular typing, including NGS analysis. Several clustering methods were applied
and compared, namely core/whole-genome MLST, supported by core and pan-genome SNP
analysis to improve discriminatory power relative to conventional molecular subtyping
methods. The present study confirmed that genome-based approaches are valuable tools
during outbreak investigations.
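Core/whole-genome MLST comparisons like those described above reduce each isolate to an allele profile and count differing loci between pairs. The Python sketch below is a minimal illustration under one common convention, that a locus missing in either isolate is skipped rather than counted as a difference; tools differ on this point, and the profiles shown in the test are invented, not data from the outbreak.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Profile = List[Optional[int]]  # allele number per core locus; None = missing/not called


def allele_distance(p: Profile, q: Profile) -> int:
    """Count loci with different allele calls, skipping loci missing in either profile."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for x, y in zip(p, q) if x is not None and y is not None and x != y)


def distance_matrix(profiles: Dict[str, Profile]) -> Dict[Tuple[str, str], int]:
    """Pairwise allele distances for every unordered pair of isolates (names sorted)."""
    return {(a, b): allele_distance(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}
```

SNP-based analyses refine these allele counts because a single differing allele can hide one or several nucleotide changes.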
Poster Board #: 36
Title: Application of Joined Shotgun Metagenomics and DNA Cross-linking Technology Allows Tracking of Resistance Plasmid Transfer in vivo
Author Block: C. Deneke, J. Grützke, S. Hadziabdic, B. Malorny; German Federal Institute for Risk Assessment, Berlin, GERMANY.
Abstract:
The increasing prevalence of carbapenemase-producing Enterobacteriaceae is an important
public health concern worldwide. This concern is heightened by the growing number of
reports of carbapenem-resistant/non-susceptible bacteria in livestock, food products, wild
animals, and the environment. In this study, we develop different bioinformatic approaches
to analyze the spread and prevalence of a multidrug-resistant, NDM-producing Salmonella
plasmid (blaNDM-1-fosA3-IncA/C) in chicken gut microbiomes. The analysis is based on a
broiler chicken infection study in which Salmonella Infantis (recipient strain) was first
administered to chickens, followed by a Salmonella Corvallis strain harboring the
NDM-producing plasmid (donor strain). Chicken feces were collected at different time points,
and shotgun metagenomics was performed. For this particular study, traditional
culture-based microbiological analysis was performed in parallel, allowing a comparative
analysis of two independent technologies. Here, we follow the changing prevalence of the
inoculated donor and recipient Salmonella strains, as well as of the native commensal
bacterial community, under experimental conditions without antibiotic stress. In particular,
we compare different taxonomic profiling tools and reference databases for detecting and
quantifying the co-occurring Salmonella strains. This step requires sub-species resolution,
and we present several newly developed approaches for this task. The focus of this work was
to find evidence of plasmid transfer from the donor strain to either the recipient strain or
commensal bacteria in the chicken gut. Therefore, the resistance gene profiles of the
metagenomic samples and the abundance of the NDM-producing plasmid were followed over
time. These were compared with the detection of Escherichia coli and S. Infantis
transconjugants in the independent culture-based assay. Furthermore, we follow plasmid
transfer in more detail for selected samples by applying Hi-C cross-linking technology,
which directly links plasmid and chromosomal DNA. In summary, our contribution gives a
comprehensive overview of bioinformatics approaches for resistance tracking in
metagenomes, shows a novel application of cross-linking experiments to plasmid transfer,
and discusses different approaches for sub-species-level strain quantification.
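The key idea behind using Hi-C for plasmid tracking is that cross-linked read pairs joining a plasmid contig to chromosomal contigs point to the plasmid's current host. The Python sketch below only illustrates that principle under stated assumptions: read pairs are already mapped to contigs, chromosomal contigs are already binned to hosts (`contig_to_host`), and a host is reported only if it has at least `min_links` supporting pairs (an arbitrary placeholder). Real Hi-C analyses normally also normalize raw link counts, for example for contig length and coverage; that step is omitted, and none of these names come from the authors' pipeline.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple


def link_counts(pairs: Iterable[Tuple[str, str]], contig_to_host: Dict[str, str],
                plasmid_contigs: Set[str]) -> Dict[str, Counter]:
    """Count Hi-C read pairs joining each plasmid contig to each host genome bin.

    pairs: (contig_a, contig_b) for read pairs mapped to two different contigs.
    contig_to_host: chromosomal contig -> host bin (hypothetical binning result).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for a, b in pairs:
        for plasmid, other in ((a, b), (b, a)):
            if plasmid in plasmid_contigs and other in contig_to_host:
                counts[plasmid][contig_to_host[other]] += 1
    return counts


def assign_host(counts: Counter, min_links: int = 5) -> Optional[str]:
    """Return the host bin with the most links, or None if support is too weak."""
    if not counts:
        return None
    host, n = counts.most_common(1)[0]
    return host if n >= min_links else None
```

A plasmid linked mainly to S. Infantis contigs in a later sample, but to S. Corvallis contigs earlier, would be consistent with transfer to the recipient strain.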
Poster Board #: 37
Title: Microbiota of the Airways of Young Children with Cystic Fibrosis During Exacerbations and Antimicrobial Treatments
Author Block: N. Nørskov-Lauritsen1, W. Ridderberg2, H. V. Olesen1; 1Aarhus University Hospital, Århus N, DENMARK, 2Qiagen Bioinformatics, Århus C, DENMARK.
Abstract:
The Danish tradition of monthly surveillance of airway secretions by means of laryngeal
suction offers a unique opportunity to monitor the microbiota in the early stages of the
disease. A total of 184 airway secretions from 52 patients aged 0-10 years were analyzed by
NGS during a 6-month period. Patients were treated with antimicrobials at 18% of the visits,
while two-thirds of the patients received antimicrobials within 30 days after a visit. By
culture, Haemophilus influenzae prevailed among the 2- to 5-year-olds, while Staphylococcus
aureus was the most common pathogen among the older children. By deep sequencing of 16S
rDNA, the five most abundant genera (Moraxella, Haemophilus, Streptococcus,
Neisseria, and Veillonella) accounted for 70.6% of reads, while the next two,
Staphylococcus and Pseudomonas, accounted for 4% each. Bordetella was the 12th most
common genus; however, 79% of these reads came from one chronically infected patient.
Using the Bray-Curtis dissimilarity metric, we found that samples clustered by patient and
that the lung microbiota differed significantly between most patients. Although fluctuations
occurred, the airway microbiota was relatively patient-specific within the investigated time
frame. Phylogenetic diversity (estimated as alpha diversity) was lower when patients were
receiving antimicrobial treatment, whereas clinical exacerbations had less impact on the
alpha diversity of microbial communities. Pseudomonas aeruginosa was cultured for the
first time from one patient who had recently been transferred to the clinic. Pseudomonas rRNA
was undetectable in the specimen obtained 5 weeks before, but constituted 86.2% of the
reads in the culture-positive specimen. Oral ciprofloxacin and inhaled tobramycin (2 and 4
weeks of treatment, respectively) completely eliminated Pseudomonas rRNA during and after
therapy, but 3.5 months later Pseudomonas was detectable at a relative abundance of 0.94%.
Finally, P. aeruginosa was diagnosed by culture in the fourth specimen, obtained after seven
months. One patient completed a three-month aggressive antimicrobial regimen directed
against P. aeruginosa. During therapy, Streptococcus, Prevotella, and Fusobacterium
comprised 70-80% of reads; thereafter, Neisseria, Haemophilus, Gemella, and Lactococcus
increased between 570-fold and 6603-fold. Staphylococcus rRNA also increased and S. aureus
was cultured, but the relative abundance of Staphylococcus decreased 49- to 576-fold following
antimicrobial therapy directed against this pathogen. Microbiota analysis with standard 16S
rRNA gene primers was inferior to culture on selective agar plates for detection of
Mycobacterium abscessus, but showed excellent capability for monitoring treatment of
P. aeruginosa and assessing microbiota fluctuations elicited by antimicrobials.
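The two community metrics used above can be illustrated in a few lines. The Python sketch below computes Bray-Curtis dissimilarity between two abundance vectors over the same genera (0 means identical composition, 1 means no shared genera) and the Shannon index as one common alpha-diversity measure. The abstract does not state which alpha-diversity index or input normalization was used, so both the Shannon choice and any normalization are assumptions here.

```python
import math
from typing import Sequence


def bray_curtis(x: Sequence[float], y: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors over the same taxa."""
    if len(x) != len(y):
        raise ValueError("vectors must cover the same taxa")
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def shannon(counts: Sequence[float]) -> float:
    """Shannon diversity H' (natural log) from per-taxon counts; zero counts are skipped."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)
```

Computing `bray_curtis` for every pair of samples yields the distance matrix that clustering or ordination then uses to show samples grouping by patient.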
Poster Board #: 38
Title: Identification of Lineage-specific SNPs Responsible for the Predominance of Carbapenem-resistant Klebsiella pneumoniae ST-11 in Southwest of China from a Long-term Surveillance in the Hospitals
Author Block: Y. Feng, Z. Zong; West China Hospital of Sichuan University, Chengdu, CHINA.
Abstract:
Unlike in Europe and North America, where most carbapenem-resistant Klebsiella pneumoniae
(CRKP) strains are ST-258, the predominant CRKP sequence type in China is ST-11. We
therefore started CRKP surveillance in our hospitals in 2014; it has now run for over 4 years
and is ongoing. During this surveillance, anal swabs and other samples, such as blood, pus,
and urine, from inpatients were routinely screened to track and control both intra- and
inter-hospital transmission. To date, a total of 371 CRKP strains have been isolated and
sequenced on the Illumina HiSeq platform, of which 72% (n=268) were identified as ST-11 by
in silico MLST typing. To investigate the relatedness of these strains on both local and global
scales, all public K. pneumoniae ST-11 strains in the SRA that passed quality criteria by the
end of 2017 (n=367) were included in the analysis. Together with our data, reads of 635
strains were mapped against the complete chromosome of the first ST-11 isolate in our
surveillance, followed by identification of core SNPs. A high-resolution maximum-likelihood
phylogenetic tree inferred from the core SNPs revealed several outbreak events, including
nosocomial transmission between wards, as well as clear transmission among hospitals. Our
strains separated into 4 main lineages. Two small ones (n=3, n=5) clustered with strains from
Southeast Asia and Europe, uncovering a hidden intercontinental transmission route. Most
surprisingly, the rest of our CRKPs, together with 16 strains from the public database, formed
two sister lineages (n=46, n=228) that were evidently distinct from the others. Even though
both lineages are major drivers of CRKP prevalence in China, the predominant one was
thought to have certain advantages in survival or colonization when competing with the
others, so lineage-specific non-synonymous SNPs were identified by contrasting the two
sister lineages. Around 20 high-quality SNPs were found to be specific to the largest lineage,
most of which reside in genes involved in metabolic pathways, such as those encoding
glycosyltransferase and alcohol dehydrogenase. These amino acid substitutions may help the
bacteria utilize alternative carbon sources effectively, given the distinctive dietary structure
of the Chinese population. Notably, two SNPs appear particularly significant: one truncates
the glycine betaine-binding protein YehZ, and the other causes a missense mutation in the
lipid A biosynthesis protein LpxM. Previous studies have shown that loss of yehZ allows cells
to avoid hyperosmolarity, and that modification of lipid A can optimize immune evasion to
facilitate colonization. Although these SNPs need further investigation to verify that they are
adaptively favored, it is becoming evident that bacteria live within humans by becoming less
virulent, to avoid a severe host immune response, and more resistant to various stresses.
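Contrasting two sister lineages in a core-SNP alignment amounts to finding columns where each lineage is fixed for a different allele. The Python sketch below shows that step under assumptions: the alignment is a dict of equal-length sequences, and columns with gaps or ambiguous bases are treated as ineligible. Classifying the resulting sites as synonymous or non-synonymous, as the authors did, additionally requires gene coordinates and codon translation, which this sketch omits; the function name and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def lineage_specific_sites(alignment: Dict[str, str], lineage_a: Sequence[str],
                           lineage_b: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Columns fixed for one allele in lineage A and a different fixed allele in B.

    alignment: isolate name -> aligned core-SNP sequence (all the same length).
    Returns (0-based column, allele in A, allele in B). Columns whose fixed allele
    is not A/C/G/T (e.g. '-' or 'N') are skipped.
    """
    length = len(next(iter(alignment.values())))
    sites = []
    for i in range(length):
        a_alleles = {alignment[s][i] for s in lineage_a}
        b_alleles = {alignment[s][i] for s in lineage_b}
        if len(a_alleles) == 1 and len(b_alleles) == 1 and a_alleles != b_alleles:
            a_base, b_base = next(iter(a_alleles)), next(iter(b_alleles))
            if a_base in "ACGT" and b_base in "ACGT":
                sites.append((i, a_base, b_base))
    return sites
```

Requiring strict fixation in both lineages is conservative; real datasets often tolerate a few missing calls per column.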
Poster Board #: 39
Title: Insights from validation of a Verocytotoxigenic (Shiga Toxigenic) Escherichia coli Whole-Genome Sequencing Pipeline
Author Block: J. Wang1, J. Wright1, B. Gilpin2, D. Duncan1, H. Strydom1, N. Karki1, J. Draper3, X. Ren3, K. Dyet3, P. Carter3; 1ESR, Upper Hutt, NEW ZEALAND, 2ESR, Christchurch, NEW ZEALAND, 3ESR, Porirua, NEW ZEALAND.
Abstract:
New Zealand’s national public health reference laboratories at the Institute of Environmental
Science and Research (ESR) are transitioning from classical typing methods to whole-genome
sequencing (WGS)-based approaches. Here we describe the validation of Verocytotoxigenic
(Shiga Toxigenic) Escherichia coli (VTEC/STEC) typing performed during pipeline
development. Our pipeline is a Python wrapper of Nullarbor, SRST2, and ABRicate, with
reference databases including two MLST schemes, EcOH, SeroFinder, VirulenceFinder, VFDB,
and VFDB compiled for SRST2. The output is a simple table summarising sequencing quality
and typing data, including multilocus sequence type (MLST), serotype (O:H type), and
presence or absence of the virulence genes stx1, stx2, eae, and ehxA. We assessed the
accuracy (by comparison with laboratory results) and consistency of the WGS-based analysis,
as well as computational aspects such as run time, intermediate file generation, and final
output file size. In total, over 400 isolates have been sequenced, analysed, and compared
with laboratory results.
WGS greatly improved serotyping capability: 76 O-nontypable/rough isolates (18% of the
total) and 66 H-nontypable/non-motile (HNM) isolates (15% of the total) could be typed by in
silico methods. However, WGS failed to identify the O type of five O64 isolates that had been
typed by agglutination, and seven HNM isolates also remained H-untypable by WGS.
The overall success rate of virulence gene calling was 94%. Discrepancies in virulence gene
presence/absence calls were greatest for non-O157 isolates. We investigated 20 isolates in
which virulence genes had been identified by conventional PCR but were not detected in
silico. Re-testing of the isolate stock by conventional PCR confirmed the absence of some of
the virulence genes in question, suggesting loss on subculture or the presence of multiple
strains in the stock culture; this is being investigated further.
In this presentation we will discuss the reasons behind the observed discrepancies, why we
think the O64 and HNM isolates remained untypable by WGS, and how loss of virulence genes
during subculture may affect VTEC/STEC surveillance.
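Validating virulence gene calls against PCR comes down to comparing two presence/absence tables. The Python sketch below computes the share of isolate-gene calls on which WGS and PCR agree and lists the discordant calls for follow-up. The abstract does not say whether its 94% figure is per call or per isolate, so the per-call definition here is an assumption, and the input layout is hypothetical.

```python
from typing import Dict, List, Tuple

GENES = ("stx1", "stx2", "eae", "ehxA")


def concordance(wgs: Dict[str, Dict[str, bool]], pcr: Dict[str, Dict[str, bool]],
                genes: Tuple[str, ...] = GENES) -> Tuple[float, List[tuple]]:
    """Share of isolate-gene calls where WGS and PCR agree, plus the discordant calls.

    wgs / pcr: isolate ID -> {gene: detected?}. Only isolates present in both are compared.
    """
    agree, total, discordant = 0, 0, []
    for isolate in sorted(wgs.keys() & pcr.keys()):
        for gene in genes:
            w, p = wgs[isolate][gene], pcr[isolate][gene]
            total += 1
            if w == p:
                agree += 1
            else:
                discordant.append((isolate, gene,
                                   "WGS+" if w else "WGS-", "PCR+" if p else "PCR-"))
    return (agree / total if total else float("nan")), discordant
```

Listing "WGS-/PCR+" calls separately is what identifies candidates for re-testing, such as the 20 isolates investigated above.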
Poster Board #: 40
Title: A Customized Next-Generation Sequencing Pipeline for Investigating the Chronic Wound Microbiome
Author Block: F. Huygens1, S. Rudd2, I. U. Rathnayake1; 1Queensland University of Technology, Brisbane, AUSTRALIA, 2Queensland Facility for Advanced Bioinformatics, The University of Queensland, Brisbane, AUSTRALIA.
Abstract:
Background: Chronic wounds are known to occur most commonly in diabetic, elderly, and
immunocompromised individuals. Bacteria can contribute to the lack of healing and
persistent inflammation of chronic wounds. Several studies have examined bacterial
communities associated with chronic wounds using Next-Generation Sequencing (NGS)
technologies; however, workflows such as QIIME and Mothur have limitations with regard to
processing large volumes of data in a meaningful manner. Methods: Wound swab samples
were collected from patients with chronic wounds attending the Queensland University of
Technology (QUT) wound clinic over a period of 12 weeks. The data set contained 364
samples representing the evolution of 66 wounds from 56 patients. Sequencing was performed
on the Ion PGM System using eleven PGM chips. For this large dataset, we developed a robust
analysis method based on reducing the data dimensions: 16S rRNA sequences were collapsed
based on perfect matches (identical length and sequence). Non-canonical sequences were
excluded from the analysis. Taxonomic assignment was performed against the RDP database,
and the final dataset of operational taxonomic units (OTUs) was transformed into a format
suitable for import into Calypso. Results: A total of 56 million valid DNA sequences (longer
than 3 nucleotides) were obtained from the 11 PGM chips. The dataset was statistically
evaluated using the Mothur 16S rRNA workflow, and clustered sequences were aligned against
the SILVA database. The dataset was reduced to 30.7 million sequences by applying the
sequence length threshold, and further to 15.1 million by applying the polybase threshold.
These sequences passed quality and synthetic sequence filtering steps and were suitable for
downstream analysis. The final OTU dataset comprised 20,987 clusters representing 13.4
million sequences. Calypso outputs identified bacterial genera that were significantly
differentially abundant between healing and non-healing wounds, and showed the distribution
of genera over time for different wound aetiologies. Conclusions: We used a robust 16S rRNA
NGS pipeline that enabled the investigation of 364 wound samples as part of a large temporal
study. Ultimately, we were able to maximise the DNA sequence length to 320 nucleotides,
which significantly improved temporal taxonomic profiling of the chronic wound microbiome.
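Collapsing sequences on perfect matches (dereplication) after simple filters is what reduces a dataset of tens of millions of reads to a manageable set of unique sequences with counts. The Python sketch below illustrates this step under assumptions: the minimum length, the reading of the "polybase threshold" as a maximum homopolymer length, and both default values are placeholders rather than the study's settings, and "non-canonical" is taken to mean any character other than A/C/G/T.

```python
import re
from collections import Counter
from typing import Iterable

CANONICAL = frozenset("ACGT")


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def dereplicate(reads: Iterable[str], min_len: int = 4, max_homop: int = 8) -> Counter:
    """Collapse reads that match perfectly (same length and bases) after filtering.

    Reads are dropped if shorter than `min_len`, if they contain non-ACGT characters,
    or if their longest homopolymer exceeds `max_homop` (placeholder thresholds).
    """
    counts: Counter = Counter()
    for read in reads:
        seq = read.strip().upper()
        if len(seq) < min_len or not set(seq) <= CANONICAL:
            continue
        if max_homopolymer(seq) > max_homop:
            continue
        counts[seq] += 1
    return counts
```

Only the unique sequences need to be aligned and classified; their counts carry the abundance information into the OTU table.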
Poster Board #: 41
Title: Microbiomes of the Five Different Body Habitats from Otherwise Healthy Adults with Periodontal Disease
Author Block: N. E. Yang1, Y. Je2, J. Shin3, T. Kim1, S. Seo2, Y. Hong2, T. Kim1, J. Park2, H. Lee4, J. S. Yang3, H. Lee5, K. Park6; 1Department of Laboratory Medicine, Seoul National University Hospital, Seoul, KOREA, REPUBLIC OF, 2Department of Laboratory Medicine, Seoul National University Bundang Hospital, Seongnam, KOREA, REPUBLIC OF, 3Department of BigData, Macrogen Inc., Seoul, KOREA, REPUBLIC OF, 4Department of Periodontology, Section of Dentistry, Seoul National University Bundang Hospital, Seongnam, KOREA, REPUBLIC OF, 5Department of Pathology, Seoul National University College of Medicine, Seoul, KOREA, REPUBLIC OF, 6Department of Laboratory Medicine, Seoul National University College of Medicine, Seoul, KOREA, REPUBLIC OF.
Abstract:
Alteration of the oral microbiome is a well-studied example of how microbial dysbiosis is
implicated in disease. A body of evidence has demonstrated that periodontal disease is
closely linked to many non-communicable diseases. It is therefore imperative to expand
our knowledge of the breadth of microbial communities in otherwise healthy individuals with
periodontal disease, so that significant alterations in the microbiomes of patients with
periodontal disease can be detected early. Ten otherwise healthy adults were recruited.
Inclusion criteria required that subjects be in good general health. No subject showed any
symptom or sign of infection at the time of sample collection except for localized
periodontal inflammation. Subjects refrained from any dental activities for at least two
hours prior to sample collection. Specimens from the oral cavity (saliva, buccal mucosa, and
dental plaque), stool, and blood were collected in a uniform manner according to a
standardized protocol. Samples were sequenced on the MiSeq™ platform. Sequencing
produced an average of 362,204 reads per sample and a total of 336,370 taxonomic
assignments. Of note, the blood samples from all of the subjects were found to contain
bacteria. The alpha diversity of each habitat was assessed by the Chao1, Shannon, and
Simpson indices. The blood microbiome was the most diverse, with a significant proportion
of unclassified taxa. At the genus level, the microbiota of the five habitats separated into two
distinct clusters based on UniFrac distances, using unweighted pair group method with
arithmetic mean (UPGMA) clustering and principal coordinates analysis (PCoA). Few studies
have yet described comprehensive microbiome profiles at multiple body habitats in patients
with periodontal disease. Moreover, only a handful of scattered studies have addressed the
presence of a blood microbiome in systemically healthy individuals. Additional studies are
warranted to identify microbial signatures of periodontal disease before any systemic
disease develops. Furthermore, intensive investigation of the blood microbiome is needed to
determine its biological and clinical significance.
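Chao1 and Simpson capture different aspects of alpha diversity: Chao1 estimates total richness from how many taxa were seen only once or twice, while Simpson reflects dominance. The Python sketch below uses the bias-corrected Chao1 form, S_obs + F1(F1 - 1) / (2(F2 + 1)), which stays defined when there are no doubletons, and the Gini-Simpson form 1 - sum(p_i^2). The abstract does not state which variants were used, so both choices are assumptions; the classic Chao1 is S_obs + F1^2 / (2 F2), and Simpson is also reported as D or 1/D.

```python
from typing import Sequence


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def simpson(counts: Sequence[int]) -> float:
    """Gini-Simpson index 1 - sum(p_i^2); higher means more even and diverse."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)
```

A sample with many singleton taxa, such as one containing many unclassified low-abundance sequences, gets a Chao1 estimate well above its observed richness.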
Poster Board #: 42
Title: Protective Effect of Probiotics on gut Dysbiosis Minimal Enteritis Induced by High Temperature and Humidity
Author Block: H. Luo, S. Chen; Guangzhou University of Chinese Medicine, Guangzhou, CHINA.
Abstract:
High temperature and humidity (HTH) can cause diarrhea through food and drinking water
contamination. However, their direct effects on the gut microbiota and gastrointestinal
inflammation are unknown. To investigate the effects of HTH and probiotics on the
microbiome, 21 male mice were randomly assigned to normal control (NC), HTH, and
probiotic-treated (PR) groups. The HTH and PR groups were housed at 35 ± 0.15 °C with
a humidity of 90-95% for eight consecutive weeks. A broad-spectrum probiotic was
administered to PR-group mice from day 50 to 56. Clinical signs were observed, and the gut
microbiota was analyzed by 16S rRNA-based metagenomics. Microbial community structures
were defined, and univariate, multivariate, and correlation analyses were performed on
bacterial profiles. Functional metagenome analysis was performed based on the 16S rRNA
data. Intestinal pathology and the expression of defensins and pro-inflammatory cytokines
were also assessed. Mice in the HTH and PR groups gradually developed loose feces, which
peaked at 2 weeks and then gradually subsided. The HTH group developed a distinct
microbiota profile associated with augmented metabolism and human disease pathways and
with suppression of the environmental information processing pathway. Pathological assays
indicated minimal enteritis, increased bacterial translocation, and elevated plasma LPS and
pro-inflammatory cytokine levels. These abnormalities were partially rescued by probiotics.
Thus, ambient HTH directly contributes to gut dysbiosis and minimal enteritis. Broad-spectrum
probiotics improved disease signs, partially normalized the microbiota, and ameliorated gut
inflammation. This study sheds new light on the pathogenesis of climate-associated diseases
and offers a possible therapeutic approach.
Poster Board #: 43
Title: WGS-based Retrospective Analysis for Surveillance of L. monocytogenes Infections in Emilia-Romagna - Italy
Author Block: E. Scaltriti1, L. Bolzoni1, C. Vocale2, M. Morganti1, M. De Flaviis1, G. Casadei1, M. Re2, S. Pongolini1; 1IZSLER, Risk Analysis and Genomic Epidemiology Unit, Parma, ITALY, 2Operating Unit of Clinical Microbiology, Regional Reference Center for Microbiological Emergencies (CRREM), St. Orsola-Malpighi Polyclinic, Bologna, ITALY.
Abstract:
Background: Whole Genome Sequencing (WGS) of clinical, food, and environmental isolates is
often performed to detect and trace L. monocytogenes outbreaks. Nevertheless, due to the
long incubation period of listeriosis, most outbreak sources remain undetected, with the
consequence that clusters of genomically similar isolates remain only presumptive outbreaks.
In the absence of epidemiological confirmation, identifying a cut-off of genomic similarity that
confidently defines outbreak borders is crucial for attributing isolates to outbreaks. In this
retrospective study, we analyzed human isolates collected in the Emilia-Romagna region of
Italy from 2012 to 2017 to look for possible outbreaks. Methods: A total of 119
L. monocytogenes isolates belonging to serogroup IIa were subjected to WGS with 2×250 bp
paired-end runs on a MiSeq platform (Illumina). Genomes were de novo assembled with
SPAdes, and core-genome multi-locus sequence typing (cgMLST) was performed using the
BIGSdb allele scheme (Institut Pasteur). Genomic clusters were identified based on the
number of pairwise allele differences between isolates, according to the cut-off proposed by
ECDC. For long-lasting clusters, SNP-based phylogenetic analysis (CFSAN pipeline, US FDA)
was performed using molecular clock models in a Bayesian framework (BEAST). Statistical
analyses were performed to detect within-cluster evolutionary signal by comparison with
date-randomized datasets. Results: The cgMLST analysis revealed 13 clusters, which
included between 2 and 21 cases and lasted between 3 months and 5 years. Despite the
relatively short time span between isolations, we found that the largest cluster showed
enough evolutionary signal to allow phylogenetic inference. We estimated a very small
effective population size, confirming the clonal origin of the infection. We also found that
including late isolates (excluded from the cluster based on allele differences) did not change
the population size, suggesting a common origin. The presence of isolates with identical
sequences within the cluster prevented a fully resolved tree. However, by comparing the trees
produced by the Bayesian simulations with date-randomized trees, we found a high level of
population drift (represented by tree imbalance), suggesting that the source of infection is
under high selective pressure. Conclusions: Our study suggests that allele thresholds should
not be used as the only parameter to define L. monocytogenes clusters. It is critical to
complement allele counts with clustering generated by phylogenetically meaningful
algorithms. This approach can help retrospectively evaluate cluster membership (in particular
with regard to late isolates of an outbreak) and identify persistent strains in food facilities.
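Threshold-based cluster definitions like the one described above are commonly implemented as single-linkage grouping: two isolates share a cluster if a chain of pairwise distances at or below the cut-off connects them. The Python sketch below is an illustration under that assumption; the abstract does not state the linkage rule, and the threshold is a parameter because the ECDC cut-off value is not given here. Missing loci are skipped when counting differences, which is one convention among several.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence

Profile = Sequence[Optional[int]]  # allele number per cgMLST locus; None = not called


def allele_distance(p: Profile, q: Profile) -> int:
    """Loci with differing allele calls, ignoring loci missing in either isolate."""
    return sum(1 for x, y in zip(p, q) if x is not None and y is not None and x != y)


def single_linkage_clusters(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Group isolates linked by a chain of pairwise distances <= threshold.

    Singletons are not reported. Uses a small union-find structure.
    """
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

With a threshold of 1, isolates A and C that differ at two loci still end up together if an intermediate isolate B is within one allele of each. This chaining is one reason a fixed allele cut-off can blur cluster borders, and why phylogenetic methods are a useful complement.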
Poster Board #: 44
Title: The Predominant GII.4 Noroviruses Explore a Larger Sequence Space at the Intra-host Level as Compared to Other Norovirus Genotypes
Author Block: K. Tohma1, M. Saito2, H. Mayta3, M. Zimic3, C. J. Lepore1, L. A. Ford-Siltz1, R. H. Gilman4, G. I. Parra1; 1U.S. Food and Drug Administration, Silver Spring, MD, 2Tohoku University Graduate School of Medicine, Sendai, JAPAN, 3Universidad Peruana Cayetano Heredia, Lima, PERU, 4Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD.
Abstract:
Background: Norovirus is a major cause of acute gastroenteritis. Although humans can be
infected by more than 30 different genotypes, a single genotype (GII.4) has been shown to be
the most prevalent. The dominance of GII.4 noroviruses has been explained by the constant
accumulation of substitutions on the major capsid protein, which results in the chronological
emergence of novel antigenic variants. This evolutionary pattern is in stark contrast to that of
all other norovirus genotypes, which present limited diversification over time. To determine if
these different evolutionary patterns detected at the inter-host level are reflected at the
intra-host level, we analyzed a set of consecutive samples from children infected with
norovirus. We used next-generation sequencing (NGS) technologies and bioinformatic
analyses to determine the intra-host virus population dynamics of GII.4 and non-GII.4
noroviruses. Methods: Stool samples from children enrolled in two cohort studies (2007-2011
and 2013-2014) in Peru were tested for norovirus; samples positive for noroviruses were
selected for full-length genome PCR amplification and subsequent high-throughput DNA
sequencing. Intra-host dynamics of norovirus were analyzed using samples collected during
the acute and shedding phases of infection. The intra-Single Nucleotide Polymorphisms
(iSNPs) were determined based on the consensus sequences from the earliest samples during
Abstract:
the shedding episodes. All computational analyses were done using HIVE (High-performance
Integrated Virtual Environment) system and R. Results: We obtained 73 near full-length
norovirus sequences (≥7322 bp in length; covering >95% of the complete viral genome) with
average depth of coverage ranging from 4669 to 60292. These data corresponded to 20
shedding episodes (GII.4: 7, non-GII.4: 11, mixed-infections: 2) from 17 children. Total
duration of shedding ranged from 4 to 89 days (GII.4: 8-89, median 22; non-GII.4: 4-37,
median 12). iSNPs were observed throughout the viral genome. Interestingly, GII.4 viruses
presented a higher number of iSNPs that were not detected in the immediately preceding
samples (GII.4: 1-35, median 12; non-GII.4: 1-20, median 7), while the non-GII.4 viruses
presented iSNPs that persisted during the shedding phase (GII.4: 0-15, median 1; non-GII.4: 0-
8, median 2). Conclusions: The GII.4 noroviruses presented a high number of de novo iSNPs
during the shedding phase, while the viral population of non-GII.4 noroviruses persisted for
several days and weeks within the host. Although the sample size is too small for
statistical testing, these observations, together with those at the inter-host level, suggest a higher
genetic robustness of GII.4 noroviruses that allows them to explore a wider sequence space
and hence predominate in the human population.
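The iSNP classification described above (iSNPs absent from versus shared with the preceding sample) can be sketched in Python as follows. This is not the HIVE/R workflow the authors used; the input format, the 5% frequency cutoff and the 100x depth floor are hypothetical choices for illustration.

```python
def call_isnps(allele_counts, consensus, min_freq=0.05, min_depth=100):
    """Return {position: (alt_base, frequency)} for intra-host SNPs.

    allele_counts maps a 0-based genome position to {base: read_count};
    consensus is the episode's reference consensus (earliest sample).
    A site is called when the most common non-consensus base reaches
    min_freq at a depth of at least min_depth (both thresholds hypothetical).
    """
    isnps = {}
    for pos, counts in allele_counts.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        alts = {base: n for base, n in counts.items() if base != consensus[pos]}
        if not alts:
            continue
        alt, n = max(alts.items(), key=lambda kv: kv[1])
        if n / depth >= min_freq:
            isnps[pos] = (alt, n / depth)
    return isnps


def classify_against_previous(isnps_by_sample):
    """For each sample after the first, count iSNPs that are new relative to the
    immediately preceding sample ("de_novo") or shared with it ("persistent")."""
    summary = []
    for prev, cur in zip(isnps_by_sample, isnps_by_sample[1:]):
        prev_keys = {(pos, alt) for pos, (alt, _) in prev.items()}
        cur_keys = {(pos, alt) for pos, (alt, _) in cur.items()}
        summary.append({"de_novo": len(cur_keys - prev_keys),
                        "persistent": len(cur_keys & prev_keys)})
    return summary
```

An iSNP is identified by its position and alternative base, so the same site carrying a different minor allele in the next sample counts as de novo rather than persistent.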
Poster Board #: 45
Title: Genomic Comparisons of Canine and Human S. pseudintermedius
Author Block: B. Duim, A. Wegener, E. Broens, A. Zomer, A. Timmerman, J. A. Wagenaar;
Utrecht University, Faculty of Veterinary Medicine, Utrecht, NETHERLANDS.
Abstract: S. pseudintermedius is a major pathogen in dogs and is occasionally found in human
infections. In dogs, S. pseudintermedius is often multidrug resistant (MDR), including
methicillin resistant (MRSP). In humans, it is an opportunistic pathogen found in elderly
and immunocompromised patients and is primarily methicillin susceptible. Antibiotic
resistance, and in particular methicillin resistance, in S. pseudintermedius is mediated by genes
very similar to those of S. aureus, posing a risk of transmission of resistant bacteria
between humans and dogs and of resistance genes between S. pseudintermedius and S.
aureus. This study compares the genomes of S. pseudintermedius isolates from canine and
human infections, focussing on resistance gene and virulence content and clonal
distribution. Fifty MRSP isolates and 56 methicillin-susceptible S. pseudintermedius (MSSP)
isolates from dogs, from the Veterinary Microbiological Diagnostic Center, and 26 S.
pseudintermedius isolates from humans were sequenced on the MiSeq, including four
isolates (1 human, 3 canine) from the same household. The resistance genes and sequence types were
identified using the batch upload pipeline from the Center for Genomic Epidemiology (CGE).
Virulence genes and mobile elements were identified with BLAST searches. Phylogenetic analysis
was performed using Gubbins. All but one of the 50 canine MRSP isolates were MDR; ST71
was the dominant sequence type (ST), followed by ST45 and ST258. Of the 56 canine MSSP
isolates, 18 were MDR, and no dominant ST was identified. Of the 26 human isolates, 11
were MDR and one was MRSP; ST241 was associated with human isolates, including the
human isolate and two out of three dog isolates from the same household. The number of
human MDR isolates was higher than previously expected. MRSP showed a clonal
distribution, whereas MSSP was highly diverse. Resistance and virulence gene patterns that
may indicate host specificity or genes that could explain strain transmissions will be
presented.
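A minimal sketch of how MDR and MRSP status might be derived from detected resistance genes. It assumes the common convention that MDR means resistance to three or more antimicrobial classes (the abstract does not state its definition), and the gene-to-class table is a small illustrative subset, not a curated database such as ResFinder.

```python
# Illustrative gene-to-class lookup: a tiny, incomplete subset for demonstration.
GENE_TO_CLASS = {
    "mecA": "beta-lactam",
    "blaZ": "beta-lactam",
    "erm(B)": "macrolide-lincosamide",
    "tet(M)": "tetracycline",
    "aac(6')-aph(2'')": "aminoglycoside",
    "dfrG": "trimethoprim",
}


def resistance_classes(genes):
    """Antimicrobial classes covered by the detected genes (unknown genes ignored)."""
    return {GENE_TO_CLASS[g] for g in genes if g in GENE_TO_CLASS}


def is_mdr(genes, min_classes=3):
    """Assumed MDR rule: predicted resistance to >= min_classes antimicrobial classes."""
    return len(resistance_classes(genes)) >= min_classes


def is_mrsp(genes):
    """Methicillin resistance in S. pseudintermedius is typically mecA-mediated."""
    return "mecA" in genes
```

Counting classes rather than genes matters: an isolate carrying both mecA and blaZ has two genes but only one class (beta-lactams).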
Poster Board #: 46
Title: Bioinformatics Solutions Towards the Advancement of Pathogen Detection with Metagenomics
Author Block: P. Li1, J. Russell2, D. Yarmosh2, K. Davenport1, C. Lo1, J. Jacobs2, P. Chain1;
1Los Alamos National Laboratory, Los Alamos, NM, 2MRIGlobal, Gaithersburg, MD.
Abstract: Next-generation sequencing (NGS) has great potential as a tool for detecting and
diagnosing infectious disease. Applied to metagenomics, however, NGS poses several
challenges for general pathogen detection. These include the robust assignment of pathogens
given incomplete databases, short reads, and algorithms that focus only on easy use cases
(e.g. pathogens comprising most of the sample). General metagenomic taxonomic classifiers
have been employed to help identify organisms within clinical samples. However, before NGS
can be used as a routine procedure in a clinical setting, several requirements must be met:
(1) an easy-to-use environment that technicians or other non-bioinformatics experts can use,
including reports and visualizations that clinicians can interpret in a meaningful fashion;
(2) rapid bioinformatics tools that run effectively on commodity hardware; (3) levels of
confidence for reported organisms that are not tied solely to abundance; and (4) a database
of known pathogens that allows meaningful reporting. Here, we provide some examples of the
issues surrounding the use of NGS as a method to robustly identify pathogens in complex
samples. We also present a series of efforts designed to: (1) lower the barrier for non-experts
to use NGS for routine bioinformatics applications by developing a user-friendly web-based
suite of tools; (2) limit the number of organisms misidentified within samples, thereby
improving positive predictive value; (3) provide the ability to fine-tune parameters to better
assess which defaults should be used for specific questions that require different cutoffs
(e.g. pathogen discovery vs. detection of known pathogens). We also present a first attempt at developing confidence
scoring algorithms that are not tied to the abundance of identified organisms.
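One way to score confidence without relying on abundance is to ask whether an organism's reads are spread across its genome as expected, or piled onto a few regions. The sketch below is a hypothetical example of that idea, not the authors' scoring algorithm: it compares observed breadth of coverage with the breadth expected under uniform random placement of reads (Lander-Waterman/Poisson approximation), and includes the positive predictive value used to judge misidentification.

```python
import math


def expected_breadth(n_reads, read_len, genome_len):
    """Expected fraction of a genome covered >= 1x if reads land uniformly at
    random (Poisson / Lander-Waterman approximation)."""
    return 1.0 - math.exp(-n_reads * read_len / genome_len)


def breadth_ratio(covered_bases, n_reads, read_len, genome_len):
    """Observed / expected breadth. Values near 1 suggest reads spread across the
    genome; values near 0 suggest reads piled onto a few (possibly shared or
    contaminant) regions, regardless of how many reads there are."""
    expected = expected_breadth(n_reads, read_len, genome_len)
    if expected == 0:
        return 0.0
    return (covered_bases / genome_len) / expected


def positive_predictive_value(true_pos, false_pos):
    """PPV = TP / (TP + FP): the share of reported organisms that are real."""
    total = true_pos + false_pos
    return true_pos / total if total else float("nan")
```

For example, 1000 reads of 150 bp on a 150 kb genome should cover about 63% of it; if they all fall in one 5 kb region, the ratio drops to about 0.05 even though the read count is unchanged.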
Poster Board #: 47
Title: Implementation of Whole Genome Sequencing for Routine Surveillance of Mycobacterium bovis in Cattle in Great Britain
Author Block: R. J. Ellis1, F. Yang-Turner2, J. Nunez-Garcia1, E. Palkopoulou1, T. Peto2, A. Mitchell1, S. Downs1, S. Hoosdally2, D. Foster2, D. Crook2, G. Hewinson1;
1Animal and Plant Health Agency, Addlestone, UNITED KINGDOM, 2Nuffield Department of Medicine, University of Oxford, Oxford, UNITED KINGDOM.
Abstract: Bovine tuberculosis is endemic in some parts of England and Wales, and as part of the effort
to control this disease, the Animal and Plant Health Agency has implemented routine whole
genome sequencing of M. bovis isolated from cattle. As there are approximately 5000 isolates
from across GB each year, we have developed an efficient yet comprehensive sample-to-result
process. Key considerations have been cost-effectiveness compared with traditional
molecular typing (e.g. spoligotyping and VNTR), compatibility with historical genotyping data,
and ensuring that the results are in a usable format for field veterinarians and epidemiologists.
Illumina libraries are prepared from simple heat-lysed cell suspensions and
multiplexed on a NextSeq instrument to keep costs to a minimum. Data processing is fully
automated after data transfer and includes a process to infer the traditional spoligotype.
Generally, there is good congruence between groups defined by a combination of
spoligotyping and VNTR and those observed when clustering based on core genome SNP
analysis. However, homoplasy in the genotyping markers has been observed, leading to
amalgamation of some groups and partition of others. An approach for linking genetic
relatedness and geographical neighbourhood has been developed to aid visualization of the
data in a readily usable format. The information is being used to interrogate unusual clusters
of cases, to gain further understanding of disease transmission and to determine possible
intervention strategies. This approach has already shown that within-herd diversity of M.
bovis is greater than previously thought, indicating that there may be multiple sources of
infection at any one time. The data collected is also proving useful when tracking the likely
source of human cases of M. bovis.
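The inference of the traditional spoligotype from WGS data can be sketched as in silico spoligotyping: count reads containing each direct-repeat spacer, then convert the 43-spacer presence pattern to the standard 15-digit octal code. This is an assumed approach; the abstract does not describe APHA's implementation, the 43 spacer sequences are not reproduced here, and `min_hits` is a hypothetical threshold.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def spacer_pattern(reads, spacers, min_hits=5):
    """'1'/'0' string: spacer i is called present if it, or its reverse
    complement, occurs in at least min_hits reads."""
    probes = [(s, revcomp(s)) for s in spacers]
    hits = [0] * len(spacers)
    for read in reads:
        for i, (fwd, rev) in enumerate(probes):
            if fwd in read or rev in read:
                hits[i] += 1
    return "".join("1" if h >= min_hits else "0" for h in hits)


def octal_code(pattern43):
    """15-digit octal spoligotype: spacers 1-42 read as 14 binary triplets,
    then spacer 43 appended as the final digit."""
    if len(pattern43) != 43 or set(pattern43) - {"0", "1"}:
        raise ValueError("expected a 43-character 0/1 string")
    digits = [str(int(pattern43[i:i + 3], 2)) for i in range(0, 42, 3)]
    return "".join(digits) + pattern43[42]
```

Because the octal code is derived from the same 43-spacer pattern as laboratory spoligotyping, WGS-derived types can be compared directly with historical records.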
Poster Board #: 48
Title: Development of a Rapid Bioinformatic Analysis Pipeline for Nanopore-based Shotgun Metagenomics in Severe Acute Respiratory Infection
Author Block: N. J. Groves1, M. Chand1, V. Chalker1, A. Ainley2, G. Amos3, A. Logan3, A. Kearns1, B. Pichon1, M. Doumith1, J. O'Grady4, M. Zambon1;
1Public Health England, London, UNITED KINGDOM, 2Imperial College Healthcare NHS Trust, London, UNITED KINGDOM, 3National Institute for Biological Standards and Control (NIBSC), London, UNITED KINGDOM, 4University of East Anglia, Norwich, UNITED KINGDOM.
Abstract: Background: Severe pneumonia remains poorly characterized microbiologically, posing
clinical and public health challenges. Surveillance of severe reversible respiratory failure of
infectious cause through the UK extracorporeal membrane oxygenation (ECMO) network
shows that the aetiology of 30-60% of cases is unknown. We aimed to utilise the capabilities of
nanopore sequencing to improve the diagnosis and characterisation of severe pneumonia.
This will require the development of clinical-grade bioinformatic analysis. Method: Nanopore
shotgun metagenomics was piloted using a bespoke respiratory bacterial community of 10
pathogens and commensals of known genome sequence, prepared by the National Institute
for Biological Standards and Control (NIBSC). Outputs were used to develop and validate a
two-phase pipeline. Phase 1, a commercially available workflow, combines real-time base
calling and index-based taxonomic assignment with an immediate report. In Phase 2, a more
sensitive method of demultiplexing is used, adapters are trimmed and chimeric reads are
removed. Taxonomic assignment is repeated for the corrected reads, allowing characterisation of
typing, virulence and antimicrobial resistance determinants. Results were assessed by
mapping raw reads and assembled contigs to known genome sequences. Selected clinical
samples of known aetiology were used for further validation. Results: At varying levels of
input DNA, 9 of 10 species in the mock community were consistently identified by real-time
analysis; the total megabases reflected an equal input by DNA weight, when reads assigned to
genus level were considered. A commensal Corynebacterium sp. was not detected in real
time, but 90% of its genome was covered at a depth of ≥1X by mapping, likely reflecting an
inadequate reference database. Additional species within highly similar genera, such as
Neisseria, were identified but were filtered out using a threshold of >1% of total reads. S.
pneumoniae in the mock community produced 25X depth across 99% of the genome. A raw
error rate of 15% was reduced to <0.5% after assembly. Coverage of the capsular operon was
sufficient to determine serotype 1, and of quality suitable for SNP calling. Results were
replicated in clinical samples from severe pneumonia cases following human DNA depletion.
In a case of Legionnaires’ disease, L. pneumophila was the only species identified at >1% of
total reads. Similarly, in S. aureus pneumonia, correct ascertainment of MLST, toxin and
antimicrobial resistance determinants was possible directly from the
sample. Discussion: Validation demonstrated the potential of shotgun metagenomics to
improve diagnosis and characterisation of severe pneumonia. The high error rate in nanopore
sequencing does not appear to impair species-level identification of respiratory pathogens. A
two-phase pipeline can provide rapid clinical information in hours, and subsequent pathogen
characterisation suitable for clinical and public health use.
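The two simple filters reported above, a taxon's reads exceeding 1% of total reads and the fraction of a reference genome covered at ≥1X, are sketched below in Python. The counts are hypothetical, and the choice of denominator for "total reads" is an assumption the caller must make explicit.

```python
def filter_by_read_fraction(read_counts, total_reads=None, min_fraction=0.01):
    """Keep taxa whose read count is > min_fraction of total reads.

    If total_reads is omitted, the sum of the supplied counts is used; pass the
    full run total (including unclassified reads) if that is the intended
    denominator."""
    total = sum(read_counts.values()) if total_reads is None else total_reads
    if total == 0:
        return {}
    return {taxon: n for taxon, n in read_counts.items() if n / total > min_fraction}


def breadth_of_coverage(depths, min_depth=1):
    """Fraction of reference positions with depth >= min_depth (e.g. >= 1X)."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)
```

The denominator changes the outcome: with hypothetical counts of 9000, 950 and 50 reads, two taxa pass when only classified reads are counted, but only one passes against a 200,000-read run total.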
Poster Board #: 49
Title: Comprehensive 16S rDNA Analysis of the Saliva Microbiome and Its Association with BMI or Diet
Author Block: M. Fosbrink1, M. Baird1, M. Zais1, F. Strino2, W. Ridderberg2, S. Cardinale2, J. Jacobs2, E. Lader1;
1QIAGEN Sciences, Frederick, MD, 2QIAGEN, Aarhus A/S, DENMARK.
Abstract: Differences in the human oral microbiome (HOMB) have been shown to be driven by both
diet and systemic conditions such as obesity. Previous targeted metagenomics studies of the
HOMB have targeted only one or two variable regions of the 16S ribosomal DNA, making it
difficult to compare data across studies and to detect some organisms. To analyze
the diversity of the HOMB in relation to BMI or diet, targeted metagenomic sequencing of
the 16S rDNA gene was performed using the QIAGEN QIAseq 16S/ITS Library Kit on human
donor saliva samples. This kit addresses critical issues related to 16S rDNA gene sequencing
by: 1) incorporating phased primers to increase base diversity and accuracy; 2) using
low-bioburden reagents that minimize contamination; 3) providing a screening panel to
interrogate all variable regions of the bacterial 16S rDNA gene and the fungal ITS DNA; 4) enabling
researchers to customize which variable regions are targeted using QIAGEN-validated
primers; and 5) utilizing a customizable, purpose-built bioinformatics workflow within CLC bio
Genomics Workbench’s Microbial Genomics Module. In this study, we demonstrate the utility
of the QIAseq 16S/ITS library kit for analysis of human saliva samples. First, the screening
panel was used to determine which variable regions maximized diversity across all the
samples. Data from 54 samples were combined for each 16S variable region and alpha-diversity
was calculated. The V4V5 and V5V7 regions exhibited the highest diversity for these
samples, suggesting these regions may be the most informative for studying the saliva
microbiome. In addition, we observed the known differences among some variable regions in
their ability to classify organisms at the species level. When the samples were split into groups
based on BMI, the normal-weight group had a higher alpha-diversity than the obese group. In
addition, saliva microbiome samples obtained from the obese group had a reduced abundance of
several genera found at higher levels in the healthy group. The impact of (broadly defined)
diet on the saliva microbiome was also assessed, and we show that Eastern-based diets were
associated with higher diversity than Western-based diets. In conclusion, using the QIAGEN QIAseq
16S/ITS screening panel, we were able to identify optimal 16S variable regions for human oral
microbiome samples. Using this information, we were able to specifically target these
optimal 16S rDNA regions and demonstrate that microbial diversity, composition, and
abundance of the human oral microbiome is strongly impacted by BMI and broad food
choices.
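The abstract does not name the alpha-diversity metric; assuming the Shannon index, the per-region comparison might be sketched in Python as below. Region labels and counts are hypothetical, and averaging per-sample values is one reading of how the 54 samples were "combined" (pooling counts first would be another).

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def mean_alpha_by_region(samples_by_region):
    """Average per-sample Shannon index for each 16S variable region.

    samples_by_region: {region: [per-sample lists of taxon counts]}"""
    return {region: sum(shannon_index(s) for s in samples) / len(samples)
            for region, samples in samples_by_region.items()}
```

A region whose amplicons resolve more taxa evenly yields a higher H'; a region that collapses distinct organisms into one taxon yields a lower value, which is how region choice can bias diversity comparisons.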
Poster Board #: 50
Title: In-house cgMLST Analysis of Salmonella enterica subsp. enterica Serovar Senftenberg Isolates from Recurrent Outbreaks
Author Block: J. Haendiges1, T. Blessington2, J. Zheng2, G. Davidson2, J. D. Miller1, M. Hoffmann2;
1NSF International, Ann Arbor, MI, 2US Food and Drug Administration, College Park, MD.
Abstract: Background: Pistachios have been linked to two multistate outbreaks of Salmonella infections
in 2013 and 2016 and to product recalls in 2009 and 2013. Salmonella serovar
Senftenberg has been commonly isolated from pistachios since 2009. Methods: In this study,
whole-genome sequence (WGS) data from 75 Salmonella Senftenberg isolates were analyzed
to provide insight into evolutionary relationships and persistence among strains linked to
events of salmonellosis over the seven-year period. The sources of these isolates comprised
47 pistachio (2009-2016), 18 other food commodities (1941-2013), 3 clinical (2006-2017), 6
environmental (2011-2016) and 1 ATCC reference strain. Forty-two of the 47 pistachio
isolates were obtained from the USA, three from Lebanon, and one each from Canada and
Turkey. The genomes of three strains isolated from pistachios were completely closed with
long-read sequencing technology and used as references to create a cgMLST scheme containing
2696 core genes using Ridom SeqSphere+. Sequence assemblies were imported and typed
using the cgMLST scheme. Results: The phylogeny illustrates that the 2016 outbreak involves
direct descendants of the 2009 and 2013 events rather than an independent
contamination event. These isolates differed from one another by only 0-12 alleles
and were distinct from other food isolates and pistachio products from outside the United
States. Interestingly, pistachio isolates from a separate 2013 outbreak associated with
environmental contamination were genetically distinct from the clonal strain. Notably, the
outbreak strain harbored sodA and clpB genes that were not present in the ATCC
reference strain. Part of a stress-induced multi-chaperone system, clpB is involved in the
recovery of the cell from heat-induced damage, and sodA has been reported to contribute to
biocide and heavy metal tolerance. Conclusion: The data suggest that a prominent clonal strain
of Salmonella Senftenberg is persistent in the pistachio production supply chain. Further
research aims to elucidate the mechanisms of resistance of this clonal strain in order to
institute better preventative controls and good manufacturing practices that will aid in the
reduction or possible elimination of contamination of pistachios by this pathogen.
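Allele differences such as the 0-12 reported above are typically counted as the number of cgMLST loci at which two profiles carry different allele numbers. A minimal Python sketch, assuming loci missing in either isolate are ignored pairwise (one common option); the four-locus profiles and isolate labels are hypothetical stand-ins for the 2696-gene scheme.

```python
from itertools import combinations


def allele_distance(profile_a, profile_b):
    """Number of cgMLST loci with different allele numbers. Loci missing (None)
    in either profile are skipped, i.e. missing data are ignored pairwise."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must list the same loci in the same order")
    return sum(1 for a, b in zip(profile_a, profile_b)
               if a is not None and b is not None and a != b)


def distance_matrix(profiles):
    """Pairwise allele distances for {isolate_id: profile} (upper triangle)."""
    return {(x, y): allele_distance(profiles[x], profiles[y])
            for x, y in combinations(sorted(profiles), 2)}
```

Ignoring missing loci pairwise keeps incompletely assembled isolates comparable, at the cost of distances that are computed over slightly different locus sets for each pair.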
Poster Board #: 51
Title: Molecular Genotyping of Hepatitis A Virus - California
Author Block: W. Probert1, C. Gonzalez2, A. Espinosa1, J. K. Hacker1;
1California Department of Public Health, Richmond, CA, 2Sonoma County Public Health Laboratory, Santa Rosa, CA.
Abstract: Background: The United States has seen a resurgence of hepatitis A virus (HAV) infections in
recent years, particularly among persons experiencing homelessness and users of illicit drugs.
Several states have recently reported outbreaks of hepatitis A caused by HAV subgenotype IB.
In October 2017, California declared a public health emergency in response to a surge in
hepatitis A cases. As part of that response, we implemented molecular genotyping to identify
the HAV strains associated with the CA outbreak, support epidemiologic investigation of
disease transmission, and monitor the effectiveness of public health control and prevention
measures. Methods: Sera from symptomatic, HAV IgM+ case-patients were accepted for HAV
molecular genotyping. Extracted nucleic acids were amplified using nested RT-PCR targeting
the VP1-P2B region, and a 315-nt fragment was sequenced using Sanger sequencing. In addition,
representative specimens were selected for rRNA depletion library preparation and whole
genome sequencing (WGS) on the MiSeq platform. WGS data analysis was facilitated using
the Viral NGS Analysis Pipeline. Results: The HAV VP1-P2B region was successfully amplified
and sequenced for 161 serum specimens collected between August 2017 and May 2018.
Subgenotype classification by VP1-P2B sequence yielded 54 IA-, 104 IB-, and 3 IIIA-positive
HAV specimens. Among the IB specimens, phylogenetic analysis of the VP1-P2B sequences
resulted in a tight cluster of 102 specimens that represented the CA outbreak strain with
several closely related variants. The remaining two IB sequences matched outbreak strains
reported in Michigan. Comparison of nearly complete genome sequences indicated that the
CA HAV outbreak strain shared 95.6% nucleotide and 99.7% amino acid sequence identity
with the HAV IB type strain. Notably, nucleotide substitutions in the 5’-UTR and amino acid
substitutions in VP1 and P2C were noted. Conclusions: In response to a public health
emergency, we rapidly developed the capacity for detecting, genotyping, and WGS of HAV.
Molecular genotyping revealed a large cluster of closely related HAV subgenotype IB strains
as the source of the outbreak. Similar strains have been identified as the source of recent
outbreaks in other states including Utah, Kentucky, Indiana, and Arizona. A nearly complete
genome sequence was determined for the predominant HAV IB outbreak strain and several
closely related variants. Substitutions found in putative HAV virulence determinants may
warrant further investigation. Finally, a notable decline in HAV specimen submissions and in
the detection of the outbreak strain genotype suggests that public health measures have been
successful in controlling and preventing further spread of this strain in CA.
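The gap between nucleotide (95.6%) and amino acid (99.7%) identity reflects synonymous substitutions. A minimal Python percent-identity sketch over pre-aligned sequences, assuming gap columns are excluded (other conventions count them as mismatches):

```python
def percent_identity(aln_a, aln_b):
    """Percent identity between two aligned, equal-length sequences, counting
    only columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0
```

For example, GCTGCA and GCAGCG share only 4 of 6 nucleotides (66.7%), yet both encode Ala-Ala, so comparing the translated sequences gives 100%.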
Poster Board #: 52
Title: Clinical and Research Experience with NGS-based Diagnostics
Author Block: J. E. Ellis;
Fry Laboratories, LLC, Scottsdale, AZ.
Abstract: Next Generation DNA Sequencing (NGS) represents a compelling alternative to single-analyte
tests, syndromic panels, or culture-based diagnostic pipelines. Current research-based
NGS analysis tools and pipelines typically rely on specialized operators, are excessively
time-consuming, or lack the features necessary for routine clinical use. To address the
demands of clinical microbiology and basic science research, we created the Rapid Infectious
Disease Identification (RIDI™) system, an infectious disease analysis system. We will present a
review of instrument selection, the sequencing strategy for “dirty” clinical samples, and
analysis pipelines, contrasted with the RIDI analysis pipeline, including experience drawn from
5 years of routine clinical use. A summary of more than 600 replicates of more than 60
American Type Culture Collection (ATCC) reference standards, individually and in various
combinations, will also be presented. Further considerations regarding CFU/genomic
equivalents, nomenclature, and reporting requirements will be reviewed. Curation of massive
public databases and algorithms targeting ambiguous DNA sequences have yielded significant
advances in the DNA sequence-based identification of pathogens. The RIDI™ system
accurately and rapidly identifies bacteria/archaea, with more than 99% of DNA sequence
reads assigned to the genus level and more than 75% to the species level for both reference
standards and simulated clinical samples. Additionally, fungi and protozoa are identified
accurately to the genus level more than 86% of the time and to the species level more than
67% of the time. These identification rates are sufficient to inform clinical intervention while
also providing significant research value. This strategy has enabled the detection of unusual
microbial DNA signatures in a variety of acute and chronic disease states including
bacteremia, coronary artery debris, uncultivatable eye infections, and chronic inflammatory
illnesses. Ultimately, our experiences support the use of an infectious disease clinical
diagnostics and research pipeline utilizing an NGS-based approach.
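Identification rates such as those above can be expressed as the fraction of reads whose assignment at a given rank agrees with the known source organism. The Python sketch below assumes that definition (the abstract does not spell it out) and uses hypothetical read assignments; it is not part of the RIDI™ system.

```python
def rank_agreement(predictions, truth, rank):
    """Fraction of reads whose predicted taxon at `rank` (e.g. "genus") matches
    the known source organism. Reads without a call at that rank count as misses.

    predictions, truth: {read_id: {rank_name: taxon_or_None}}
    """
    if not predictions:
        return 0.0
    hits = 0
    for read_id, lineage in predictions.items():
        called = lineage.get(rank)
        if called is not None and called == truth[read_id].get(rank):
            hits += 1
    return hits / len(predictions)
```

Evaluating the same reads at several ranks typically shows agreement falling from genus to species, matching the pattern of the rates reported above.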
Poster Board #: 53
Title: Mosaic - a Cloud Platform for Microbiome Analysis
Author Block: S. T. Westreich;
DNAnexus, Mountain View, CA.
Abstract: Exploration of the microbiome is an exciting new area of science, incorporating many “big
data” aspects to fully encompass the diversity and complexity of this environment. There has
been an explosion of new tools designed for the microbiome space; some are modeled after
existing genomic tools, while others are designed specifically for multi-organism “-omics”
analysis. Many of these tools, however, require considerable computational resources and
can prove challenging to install and run. In partnership with the Janssen Human Microbiome
Institute (JHMI), DNAnexus has created Mosaic, a cloud platform designed for microbiome-focused
informatics. Mosaic allows third-party bioinformatics tools to be developed as
applications, which can be run on cloud instances through a graphical user interface (GUI).
Mosaic provides a central platform for tool developers to share their implemented programs,
and for tool users to bring their data for analysis without needing extensive command line
experience. Mosaic also hosts microbiome challenges, similar to CAMI, designed to
encourage the development of new methods and raise the profile of the most effective
microbiome tools. Mosaic offers significant scalability and processing power for handling
computational microbiome analysis tasks and is striving to bring together the microbiome
community to allow for increased communication and collaboration. It offers an avenue for
tool comparison and a method for tool usage without the need for a high-performance
computing cluster or extensive programming experience.
Poster Board #: 54
Title: Metagenomic Sequencing Identification of Rickettsia felis in an American Patient
Author Block: Y. Chen, C. R. Icenhour;
Aperiomics, Inc, Ashburn, VA.
Abstract: Background: Rickettsia felis is an intracellular gram-negative bacterium, first described in
1991 as a human pathogen from a patient in Texas. Although R. felis was first identified in the
USA, only one case has been reported in the literature in the USA over the past two
decades. Cat fleas (Ctenocephalides felis) are considered the primary vector and reservoir
for R. felis, although other vectors are likely. Current methods for R. felis diagnosis include a
real-time PCR assay and an antigen-based test, the latter showing a high failure rate.
Here we report an American male patient who tested positive for R. felis in peripheral blood.
This 34-year-old man lives in Alabama with a domestic cat. After suffering from rashes on his
left arm and trunk, he saw a dermatologist in February 2017. No new personal care
products or medications had been used immediately before the rash appeared. On March 6,
2018, the patient was sent to an emergency room after suffering a stroke. Apart from a
previously diagnosed hypertension, all blood tests came back normal. Methods: One
peripheral blood plasma sample was collected from the patient on May 11, 2018. DNA
was extracted from the sample using the Qiagen QIAamp DNA Microbiome Kit, and the sequencing
library was prepared with KAPA HyperPrep Kits. Sequencing reads were produced on an
Illumina NextSeq500 platform at the Technology Center for Genomics & Bioinformatics (UCLA).
Raw sequencing data were processed and analyzed with Xplore-PATHO℠, a fully automated
metagenomics sequencing data analysis platform developed and operated by Aperiomics,
Inc. Results: Shotgun metagenomic sequencing generated 6.36 million 75-bp paired-end
sequence reads. 656,660 sequence reads were filtered out because of low
sequencing quality. Of the remaining clean sequence reads, 98.6% (5,627,427) were human
sequences and 0.3% (16,393) aligned to microbial reference genomes. 10,184 sequence reads
mapped uniquely to 99 microbial genomes, which belonged to 87 microbial species. R. felis was
identified as the microorganism with the highest relative abundance (14.2%) among all
identified microbial species. In total, 413 paired-end sequence reads (including 256 unique
sequence reads) were aligned to the R. felis genome, covering around 2.0% of the whole R.
felis genome. Based on the sequencing results, the patient’s doctor prescribed doxycycline,
and multiple symptoms the patient had experienced resolved after two weeks of
treatment. Conclusion: This case demonstrates the value of shotgun metagenomic
sequencing for the identification of rare pathogens. Xplore-PATHO℠ can identify over ten
thousand pathogens in one test, which is crucial for identifying pathogens that cannot be
identified through traditional testing. Billions of people worldwide suffer from emergent and
chronic infectious diseases and shotgun metagenomic sequencing is an important weapon in
the battle against infectious disease.
Poster Board #: 55
Title: A Core Genome Approach That Enables Prospective and Near Real-time Monitoring of Infectious Outbreaks
Author Block: H. van Aggelen1, R. Kolde1, H. Chamarthi1, J. T. Fallon2, J. J. Carmona3, M. M. Fortunado-Habib3, B. D. Gross3;
1Philips Research, Cambridge, MA, 2New York Medical College, Valhalla, NY, 3Philips Healthcare, Cambridge, MA.
Abstract: Background: Whole genome sequencing is increasingly being adopted in clinical settings to
confirm or rule out potential transmissions of infectious agents. Clinical infection monitoring
is most actionable when performed prospectively, with samples
continuously added and compared to previous samples. To enable prospective pathogen
comparison, genomic relatedness metrics must be: i) consistent across time, ii) efficient to
compute, and iii) reliable across the large variety of samples typically seen in a clinical setting.
Appropriate selection of genomic regions to compare, i.e. a core genome, is critical to obtain
a consistent metric of pathogen relatedness via single nucleotide differences. Methods: We
propose a method that selects conserved nucleotides in a reference genome based on the
variation seen in publicly available RefSeq genome assemblies. The conserved nucleotides can
be computed efficiently from the k-mer occurrence frequencies in the genome assembly set.
The resulting core genome is independent of the sample set and can be applied universally across
time and location, such that single nucleotide difference metrics remain constant over time.
Because the metrics are constant, previously analyzed samples do not need to be
re-analyzed when samples are added, which significantly reduces the computational
burden. Results: Using this method, we generated core genomes based on all 8274 RefSeq
assemblies for S. aureus, 2876 assemblies for K. pneumoniae and 782 assemblies for E.
faecium and tested them on large clinical data sets. We show that this method discriminates
between samples of the same pathogen better than a core genome consisting of conserved genes, as
measured by ROC curves for same-patient versus different-patient samples in sets of 1362
East England S. aureus and 905 Houston K. pneumoniae samples. For these data sets, the
proposed method achieves a 0.981 area under the curve (AUC) for K. pneumoniae and 0.977
AUC for S. aureus, higher than the 0.935 and 0.955 AUC, respectively, obtained with the conserved
gene approach, which translates into a meaningful difference when monitoring large numbers of
samples. The proposed method is universally applicable, which we demonstrate by comparing
multiple geographically distinct cohorts. We illustrate that this method recovers previously
published, confirmed outbreak samples with high accuracy in a large set of 1457 S.
aureus samples from the U.K.: all 45 outbreak samples were recovered by the
proposed method, and 2 other samples were identified as similar to the outbreak, whereas
the conserved gene method confirmed only 36 of the 45 outbreak samples and identified 3
other samples as similar. Conclusions: The proposed core genome approach not only makes it
possible to perform prospective and near real-time genomic studies, it also provides a
universal framework to quantify pathogen relationships across geographical locations.
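A toy Python sketch of a k-mer-based core genome. The abstract says conserved nucleotides are computed from k-mer occurrence frequencies across RefSeq assemblies but does not give the rule, so this assumes a reference position is core when every reference k-mer covering it occurs in at least `min_fraction` of the assemblies. Canonical (strand-independent) k-mers and efficient counting, which real genomes would need, are omitted for brevity.

```python
from collections import Counter


def kmer_set(seq, k):
    """Distinct k-mers of a sequence (forward strand only, for brevity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def core_positions(reference, assemblies, k=31, min_fraction=0.95):
    """0-based reference positions treated as conserved: every reference k-mer
    covering the position occurs in at least min_fraction of the assemblies."""
    presence = Counter()
    for assembly in assemblies:
        presence.update(kmer_set(assembly, k))  # each assembly counted once per k-mer
    n = len(assemblies)
    n_starts = len(reference) - k + 1
    common = [presence[reference[s:s + k]] / n >= min_fraction for s in range(n_starts)]
    core = []
    for pos in range(len(reference)):
        starts = range(max(0, pos - k + 1), min(pos, n_starts - 1) + 1)
        if len(starts) > 0 and all(common[s] for s in starts):
            core.append(pos)
    return core


def core_snp_distance(seq_a, seq_b, core):
    """SNP differences between two reference-coordinate consensus sequences,
    restricted to core positions and ignoring ambiguous 'N' calls."""
    return sum(1 for p in core
               if seq_a[p] != seq_b[p] and "N" not in (seq_a[p], seq_b[p]))
```

Because the core positions depend only on the reference and the public assembly set, not on the samples being monitored, a distance computed today stays valid when new samples arrive, which is the property the authors exploit for prospective monitoring. Under this conservative rule, a variable site masks itself and up to k-1 flanking positions on each side.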
Poster
56
Board #:
Title: Staphylococcus aureus Viewed from the Perspective of 40,000+ Genomes
Author R. A. Petit III, T. D. Read;
Block: Emory University, Atlanta, GA.
We created Staphopia, an analysis pipeline, database and Application Programming Interface
for batch analysis of thousands of S. aureus Illumina whole genome shotgun projects. Written
in Python, Staphopia’s analysis pipeline consists of submodules running open-source tools. It
accepts raw FASTQ reads as an input, which undergo quality control filtration, error
correction and reduction to a maximum of approximately 100x chromosome coverage. This
reduction substantially shortens total runtime without detrimentally affecting the results. The
pipeline performs both de novo assembly-based and mapping-based analyses. Automated gene
calling and annotation are performed on the assembled contigs. Read mapping is used to call
variants (single nucleotide polymorphisms and insertions/deletions) against a reference S.
aureus chromosome (N315, ST5). We ran the analysis pipeline on more than 43,000 S. aureus
shotgun Illumina genome projects in the public European Nucleotide Archive database in
November 2017. We found that only a quarter of known multi-locus sequence types (STs)
were represented, but the top ten STs made up 70% of all genomes. Methicillin-resistant S.
aureus (MRSA) accounted for 64% of all genomes. Using the Staphopia database, we selected 380
high-quality genomes deposited with good metadata, each from a different multi-locus sequence
type, as a non-redundant diversity set for studying S. aureus evolution. In addition to
answering basic science questions, Staphopia could serve as a platform for rapid
clinical diagnostics of S. aureus isolates in the future. The system could also be adapted as a
template for other organism-specific databases.
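The coverage-reduction step described above can be illustrated with a short sketch. The abstract does not say which tool Staphopia uses for this step, so the function below is hypothetical: it shuffles reads reproducibly and keeps them until their total length reaches about `target_coverage` times the genome size. The roughly 2.8 Mb genome size and the read counts are assumptions for the example.

```python
import random


def subsample_to_coverage(read_lengths, genome_size, target_coverage=100, seed=0):
    """Randomly subsample reads until their total length reaches roughly
    target_coverage * genome_size bases. Returns (kept read indices, kept bases).
    If the input has less than the target coverage, every read is kept."""
    budget = target_coverage * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)  # reproducible random read order
    kept, total = [], 0
    for i in order:
        if total >= budget:
            break
        kept.append(i)
        total += read_lengths[i]
    return sorted(kept), total


# Hypothetical run: ~2.8 Mb S. aureus chromosome, 2,000,000 reads of 150 bp (~107x).
kept, total = subsample_to_coverage([150] * 2_000_000, genome_size=2_800_000)
print(len(kept), round(total / 2_800_000, 1))  # 1866667 100.0
```

Random (rather than first-N) selection avoids biasing the retained reads toward any part of the sequencing run.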
Poster Board #: 57
Title: Examination of Mycobacterium avium Complex Infections Through WGS Analysis
Author Block: D. J. Operario, A. F. Koeppel, S. D. Turner, A. J. Prorock, Y. Bao, K. Sol-Church, S. Pholwat, M. H. Scheurenbrand, E. R. Houpt; University of Virginia, Charlottesville, VA.
Background: Disease attributed to Mycobacterium avium complex (MAC) is caused by a
number of closely related non-tuberculous mycobacteria including M. avium and M.
intracellulare. Neither current probe-based MAC diagnostics nor disease presentation readily
distinguishes between the different MAC species. As a result, MAC disease clinically
diagnosed as stable, relapsed, or reinfected may actually be caused by
distinct organisms. Methods: To elucidate this phenomenon, we identified 36 patients who
each had multiple AFB cultures where MAC was identified, a total of 97 isolates. Bacterial
genomic DNA from each isolate was subjected to whole genome sequencing on an Illumina
NextSeq platform. Data cleanup on the resulting sequences was performed using FastQC. To
rapidly compare sequences between isolates, sequencing controls, and reference sequences,
MinHash was employed, eliminating the need to pre-assemble the genomes. MinHash
distances were then used to construct a phylogenetic tree, rooted on a reference sequence
of M. tuberculosis H37Rv. Kraken/Bracken was used to estimate the relative species
abundance in samples with mixed infections. In addition, we performed both phenotypic and
genotypic drug susceptibility testing on each isolate. Phenotypic testing included
clarithromycin, rifampin, rifampicin, ethambutol, amikacin, moxifloxacin, linezolid, tedizolid,
clofazimine, bedaquiline, tigecycline, and ceftazidime/avibactam. Genotypic drug
susceptibility testing was achieved using a combination of SRST2, abricate, and
ARIBA. Results: Our phylogenetic analysis showed that the isolates formed four distinct
clusters: an M. avium cluster, an M. intracellulare cluster, a
mixed avium/intracellulare cluster, and a “MAC other/M. abscessus” cluster. Species-level
abundance estimates from Bracken appear to correlate well with the genetic distances
estimated by MinHash. Grouping isolates by patient, analyzing by order of isolate collection,
and adding phenotypic susceptibility testing information to the phylogenetic tree revealed
that some subjects were truly stable in their MAC disease while others actually switched
between phylogenetic clusters. Conclusions: Our results suggest that for certain patients,
WGS confirms a very stable disease while for others MAC disease is actually being caused by
different organisms that were possibly acquired as superinfections.
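MinHash compares genomes by hashing their k-mers and keeping only a small "sketch" of the smallest hash values, so the Jaccard similarity of two k-mer sets can be estimated without assembly. The abstract names MinHash but not the exact tool or parameters, so the following is a minimal sketch under stated simplifications: bottom-s sketches, k = 21, no reverse-complement canonicalization, BLAKE2b as the hash, and the distance formula used by the Mash tool, D = -(1/k) ln(2j/(1+j)). The random "genome" is synthetic.

```python
import hashlib
import math
import random


def kmers(seq, k):
    """Set of all k-mers (no reverse-complement canonicalization in this sketch)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash64(kmer):
    """64-bit k-mer hash; BLAKE2b is an arbitrary choice here, not Mash's hash."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hashes."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(sk_a, sk_b, size=1000):
    """Estimate Jaccard similarity from two bottom-s sketches."""
    union = sorted(set(sk_a) | set(sk_b))[:size]
    if not union:
        raise ValueError("sequences shorter than k give empty sketches")
    shared = set(sk_a) & set(sk_b)
    return sum(1 for h in union if h in shared) / len(union)


def mash_distance(j, k=21):
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); infinite when j == 0."""
    return math.inf if j == 0 else -math.log(2 * j / (1 + j)) / k


# Hypothetical example: a random 20-kb "genome" and a copy with ~1% substitutions.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
mutant = list(genome)
for pos in rng.sample(range(len(genome)), 200):
    mutant[pos] = "ACGT"[("ACGT".index(mutant[pos]) + 1) % 4]  # substitute a base
mutant = "".join(mutant)
j = jaccard_estimate(sketch(genome), sketch(mutant))
print(round(j, 3), round(mash_distance(j), 4))
```

With about 1% substitutions the distance should come out near 0.01, since the Mash distance approximates a per-base divergence. A matrix of such distances is what a tree-building step, like the one described above, would consume.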
Poster Board #: 58
Title: Genomic Characterization and Phylogenetic Analysis of Salmonella Newport Clinical Strains from Tennessee, 2017-2018
Author Block: L. K. Hudson1, C. Moore2, L. Constantine-Renna3, J. K. Yackley3, X. Qian2, L. S. Thomas2, K. N. Garman3, J. R. Dunn3, T. G. Denes1; 1Department of Food Science, University of Tennessee, Knoxville, TN, 2Tennessee Department of Health, Division of Laboratory Services, Nashville, TN, 3Tennessee Department of Health, Nashville, TN.
Background: Salmonella Newport is the third most common Salmonella serovar sent to the
Tennessee (TN) State Public Health Laboratory. The objectives of this study were to
retrospectively examine the genomic population structure of Newport clinical isolates from
TN in 2017-18 and describe epidemiological features among clades of case-patients
identified. Methods: BioSample numbers and metadata for S. Newport (n=91) clinical isolates
from TN collected January 2017 through June 2018 were provided by the TN Dept. of Health.
Raw reads were downloaded from NCBI SRA, trimmed with Trimmomatic, and quality
checked using FastQC. A reference assembly was chosen (BioSample SAMN05172397).
hqSNPs were identified using the CFSAN SNP pipeline and the resulting matrix was used to
construct a neighbor-joining tree with MEGA7. Additionally, trimmed reads were assembled
with SPAdes and contigs annotated with Prokka. Assembly statistics were reported by BBMap,
SAMtools, and QUAST. Serotype designations were confirmed with SeqSero. Results: Nine
distinct clades of interest were identified: four major clades (1, 2, 4, and 5) and five minor
clades (3, 6, 7, 8, and 9). Major clades were defined as containing five or more isolates and
minor clades as containing fewer than five. Clades may represent or contain epidemiological
clusters. Clade 1 consisted of 41 isolates, most of which (n=28) were from patients in the western
region of TN (11 from Shelby County); they were collected over a period of 11 months. Clade 2
consisted of nine isolates from counties in all three regions, collected over
13 months. Clade 4 consisted of nine isolates: six from the
western region (three from Obion County), one each from counties in east and middle TN,
and one from an unknown county. Clade 4 isolates were collected over eight months,
and most were from male patients (78%) and adults (>17 yrs of age; 78%). Clade 5 contained
12 isolates, mostly from counties in either middle (n=5) or east (n=5; three from Knox
County) TN. Clade 5 isolates were collected over one year and were mostly from female
patients (67%); all were from adults. The five minor clades all contained isolates collected from
middle or west TN. Conclusions: The clustering patterns of clades 1 and 4 with a majority of
isolates from western TN, together with the timeline of the isolation dates, may indicate that
these illnesses were from common or related exposures. In contrast, other clades (2 and 5)
are characterized by a widespread geographical distribution throughout the state. Further
investigation of epidemiological data and possible environmental sources may identify the
source of illness and possible preventive strategies. In addition, information gained about the
population structure of this serovar provides guidance for selecting SNP distance thresholds
used to identify clusters that may be of epidemiological significance.
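The clades above were read from a neighbor-joining tree, but the conclusion points to SNP distance thresholds for defining clusters. As a hedged illustration of that threshold idea (not the authors' method), the sketch below groups isolates by single-linkage clustering on a pairwise hqSNP matrix and labels each group with the major/minor size rule defined above. The threshold of 5 SNPs and the toy matrix are hypothetical.

```python
def snp_clusters(labels, dist, threshold):
    """Single-linkage clusters from a pairwise SNP-distance matrix: isolates are
    linked when their distance is <= threshold, and clusters are the connected
    components (tracked with a union-find structure)."""
    parent = list(range(len(labels)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, name in enumerate(labels):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values(), key=len, reverse=True)


def clade_size_class(cluster, major_min_size=5):
    """Major = five or more isolates, minor = fewer than five."""
    return "major" if len(cluster) >= major_min_size else "minor"


# Toy matrix: hypothetical isolates placed on a line so distance = |difference|.
positions = [0, 2, 4, 6, 8, 100, 103]
labels = list("ABCDEFG")
dist = [[abs(a - b) for b in positions] for a in positions]
for cluster in snp_clusters(labels, dist, threshold=5):
    print(cluster, clade_size_class(cluster))
```

Single linkage lets a chain of closely related isolates form one cluster even when its ends differ by more than the threshold, which is one reason the choice of threshold benefits from knowledge of the serovar's population structure.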
Poster Board #: 59
Title: ATCC® Site-Specific Mock Community Standards for Human Microbiome Applications
Author Block: M. Hunter, S. Saha, S. King, M. Amselle, J. Lopera, B. Benton, D. Mittar; ATCC, Manassas, VA.
Advancement and accessibility of next-generation sequencing technologies have transformed
microbiome analyses, opening up numerous applications in the areas of
human health and disease. To date, a significant body of work has been performed on the
human gut microbiome to evaluate its species composition and influence on physiology; this
research has led to additional studies on microbiomes localized at other sites on the human
body (e.g., skin, oral, vaginal). However, a predominant limitation in these site-specific
microbiome studies is the lack of appropriate and relevant standards to control the technical
biases introduced throughout the metagenomics workflow. To address this, ATCC has
developed a set of genomic and whole cell mock microbial communities from fully sequenced
and characterized ATCC strains that represent species found in the oral, skin, gut, or vaginal
microbiome. To further enhance the use of these standards and eliminate the bias associated
with data analysis, we have also collaborated with One Codex to develop data analysis
modules that provide simple output in the form of true-positive, relative abundance, and
false-negative scores for 16S rRNA community profiling and shotgun metagenomics
sequencing. In this proof-of-concept study, we tested these mock communities via 16S rRNA
and shotgun metagenomics sequencing methods and analyzed the resulting data using the
One Codex data analysis platform. From this analysis, we found a strong correlation between
the expected and observed microbial compositions (less than a 1-log difference between the
expected and observed relative abundances of individual genomes), indicating that these site-specific microbiome
standards can be used as tools for assay optimization and as daily run quality control
standards for microbiome assays.
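The true-positive, false-negative, and relative-abundance scores described above can be sketched as a simple comparison between a mock community's expected composition and an observed profile. This is a hypothetical illustration, not One Codex's scoring module: the function name, the taxa, and the abundances are made up, and expected abundances are assumed to be positive.

```python
import math


def compare_to_mock(expected, observed, max_log10_diff=1.0):
    """Score observed relative abundances against a mock community's expected
    composition. Returns true positives, false negatives, false positives,
    per-taxon |log10(observed / expected)|, and whether every detected expected
    taxon is within max_log10_diff."""
    tp = sorted(t for t in expected if observed.get(t, 0) > 0)
    fn = sorted(t for t in expected if observed.get(t, 0) <= 0)
    fp = sorted(t for t in observed if t not in expected and observed[t] > 0)
    log_diff = {t: abs(math.log10(observed[t] / expected[t])) for t in tp}
    within = all(v < max_log10_diff for v in log_diff.values())
    return {"tp": tp, "fn": fn, "fp": fp, "log10_diff": log_diff, "within": within}


# Hypothetical four-member even mock community and a made-up observed profile.
expected = {"taxon_A": 0.25, "taxon_B": 0.25, "taxon_C": 0.25, "taxon_D": 0.25}
observed = {"taxon_A": 0.40, "taxon_B": 0.30, "taxon_C": 0.05, "taxon_X": 0.25}
result = compare_to_mock(expected, observed)
print(result["tp"], result["fn"], result["fp"], result["within"])
```

Here taxon_C is fivefold underestimated (about 0.7 log10) but still inside the 1-log window, while taxon_D is missed and taxon_X is a false positive.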
Poster Board #: 60
Title: Assessing Quantitative Performance of Metagenomic Profiling Using Genomic DNA Reference Material Mixtures
Author Block: J. Kralj1, D. Tourlousse2, S. Servetas1, S. Forry1, S. Jackson1; 1NIST, Gaithersburg, MD, 2AIST, Tsukuba, JAPAN.
Background: Shotgun metagenomics is being employed in many microbial-related clinical,
environmental, and biosecurity applications. However, development of materials and
methods to characterize relative performance metrics (e.g., sensitivity, specificity, accuracy)
has lagged. This presents a challenge to translating these new technologies into commercial
use, as regulatory agencies often require performance characterization of the entire sample
analysis pipeline. In response, NIST has undertaken development of a DNA-based reference
material (RM 8376) to enable characterization of sample processing and computational
procedures from sequencing to bioinformatics. RM 8376 consists of unmixed genomic DNA
(gDNA) from 19 different bacteria (16 total species, 3 pairs of related subspecies). We
evaluated the performance of a metagenomic profiling pipeline using mixtures of the RM
components. Experimental: Sample mixtures contained the 19 components combined into 5
pools of 3 or 4 components each, and mixed in roughly equigenomic and log10 dilution
mixtures (latin square) for 6 total samples. Each sample was prepared using the Nextera XT
DNA Library Prep Kit with AMPure XP beads, and sequenced on an Illumina MiSeq (2x300 bp
paired reads). In addition, in silico read data matching the sample pools were generated by
subsampling raw reads from each individual RM component. For both data sets, taxonomic
classification was performed using Centrifuge with the default p+h+v indices. Results &
Discussion: In silico mixtures showed up to approximately 3-fold difference between expected
and observed relative abundances for 14/16 species; 2 species were missed. Pure isolate read
data were correctly identified, suggesting that the presence of other species resulted in
inaccurate abundance estimates. Relative abundances in the physical mixtures differed from
expected values. Of note, GC content was correlated with fold differences, with high-GC
genomes substantially underrepresented. Further study could quantify
biases arising from different sample preparations. Conclusions: As a whole, these data verified that
the 19-component mixtures can sufficiently represent sample complexity to inform on
computational biases. The new RM is a promising tool for deconvolving and quantifying biases
due to sample preparation (e.g., GC content) and computational methods, which
was a major goal of this material. The availability of RMs (such as described here) should be
useful in assessing the performance (accuracy, sensitivity, specificity, etc.) of each component
of the analysis pipeline, allowing the validation or optimization of a metagenomic workflow.
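The GC-content effect reported above can be checked by computing per-component fold differences between observed and expected relative abundances and correlating them with GC content. A minimal sketch, assuming log2 fold differences and a Pearson correlation (the abstract does not state which measure was used); the component names, GC values, and abundances are hypothetical, not RM 8376 data.

```python
import math


def log2_fold_differences(expected, observed):
    """log2(observed / expected) per taxon; negative values mean underrepresented."""
    return {t: math.log2(observed[t] / expected[t]) for t in expected}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical equigenomic pool of four components with made-up GC contents (%)
# and observed relative abundances.
gc = {"c1": 32.0, "c2": 45.0, "c3": 55.0, "c4": 68.0}
expected = {t: 0.25 for t in gc}
observed = {"c1": 0.40, "c2": 0.30, "c3": 0.20, "c4": 0.10}
folds = log2_fold_differences(expected, observed)
taxa = sorted(gc)
r = pearson([gc[t] for t in taxa], [folds[t] for t in taxa])
print({t: round(f, 2) for t, f in folds.items()}, round(r, 2))
```

A strongly negative r, as in this toy pool, is the pattern expected when high-GC genomes are underrepresented; on the log2 scale a 3-fold difference corresponds to about 1.58.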
Poster Board #: 61
Title: Identification of a Novel Genetic Marker Associated with Pseudomonas fluorescens Blue Discoloration Through a Genome-wide Association Strategy
Author Block: F. Chiesa1, S. Lomonaco2, M. Rossi1, S. Gallina3, A. Dalmasso1, L. Decastelli3, T. Civera1; 1University of Turin, Grugliasco, ITALY, 2Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, MD, 3Istituto Zooprofilattico Sperimentale del Piemonte Liguria e Valle d’Aosta, Torino, ITALY.
Background: The blue discoloration defect in different types of fresh cheese has increased
worldwide during the present decade. In June 2010, Italian national health authorities notified
the European Rapid Alert System for Food and Feed about altered organoleptic characteristics
(blue color) and high numbers (5.1 × 10^6 CFU/g) of Pseudomonas fluorescens in mozzarella
cheese imported from Germany. Since then, the problem has persisted and, although infrequently,
has repeatedly affected several producers. The aim of this study was to identify a genetic marker
for the identification of pigmenting strains. Methods: The genomes of 3 pigmenting and 3
non-pigmenting P. fluorescens were sequenced on a MiSeq instrument after library
preparation with Nextera® XT DNA Library Preparation kit (Illumina). Reads were assembled
with SPAdes v3.2 and annotated using RAST (http://rast.nmpdr.org). For the genome-wide
association study (GWAS), 33 additional genomes were selected from GenBank (3 pigmenting
strains and 30 non-pigmenting strains), in addition to the 6 obtained herein. The pan-genome
pipeline Roary was used to calculate the pan-genome (https://sanger-pathogens.github.io/Roary/),
and its output was analyzed using Scoary to calculate the associations between all accessory
genes and pigment production (https://github.com/AdmiralenOla/Scoary). Only genes with an
odds ratio (OD) > 1 and a pairwise p < 0.05 were considered possible markers. Primers for
functionally annotated genes identified as putative markers were designed using Primer3Plus. The selected target
gene (trpB) was then multiplexed with a previously designed primer set for the identification
of the Pseudomonas genus (oprI). Results: WGS revealed a 16-kbp gene cluster,
present only in the pigmenting isolates, comprising a set of accessory genes related to
tryptophan metabolism. This cluster is located on a 100-kbp locus that was highly
conserved (99-100%) in the pigmenting isolates included in our study. The Scoary analysis
confirmed that the accessory genes of this cluster were significantly
associated with pigment production. A total of 17 functionally annotated genes were
identified as putative markers. A primer pair amplifying a fragment of trpB was selected on
the basis of amplification results and an amplicon size compatible with the oprI primer pair.
The newly designed PCR multiplex assay demonstrated a 100% specificity for the pigmenting
isolates, producing the expected bands (trpB, 200 bp and oprI, 250 bp) in all the pigmenting
strains tested. Conclusions: This study highlights i) the suitability of the GWAS strategy for the
identification of genetic markers; ii) the suitability of trpB for the identification of P.
fluorescens pigmenting strains, and iii) the usefulness of a multiplex PCR assay for the
identification of sources of contamination in production facilities.
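The marker filter above (odds ratio > 1 and p < 0.05) can be illustrated on a single gene's 2x2 presence/absence table. This is a hedged sketch, not Scoary itself: Scoary's pairwise-comparisons p-value depends on the phylogeny and is not reproduced here, so the sketch substitutes Fisher's exact test, which corresponds to Scoary's "naive" p-value. The genome totals (6 pigmenting, 33 non-pigmenting) follow the abstract; the gene presence counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """2x2 table: a = trait+/gene+, b = trait+/gene-, c = trait-/gene+, d = trait-/gene-.
    Adds 0.5 to every cell (Haldane-Anscombe correction) if any cell is zero."""
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the same 2x2 table."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables no more probable than the observed one; the small
    # tolerance guards against floating-point ties.
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)))


def is_candidate_marker(a, b, c, d, alpha=0.05):
    return odds_ratio(a, b, c, d) > 1 and fisher_two_sided(a, b, c, d) < alpha


# 6 pigmenting and 33 non-pigmenting genomes; gene presence counts are hypothetical.
print(is_candidate_marker(6, 0, 0, 33))   # gene in all pigmenting genomes only -> True
print(is_candidate_marker(3, 3, 16, 17))  # gene unrelated to the trait -> False
```

Note that the correction only matters when a cell is empty, which is exactly the situation for a cluster found in every pigmenting isolate and in no other.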
Poster Board #: 62
Title: Design, Analysis, and Validation of Error-Correcting Internal Spike-In Controls for Metagenomics
Author Block: N. Greenfield1, R. Bovee1, C. Smith1, S. A. Cunningham2, R. Patel2; 1One Codex, San Francisco, CA, 2Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN.
Robust next-generation sequencing (NGS) metagenomic assays require defined detection
limits and process traceability from sample collection to bioinformatic analysis. DNA
sequence spike-ins can serve as qualitative controls, barcode and track samples, and provide
absolute concentration data to address these challenges. Here we describe the design,
analysis, and validation of error-correcting internal spike-in controls for metagenomics. We
developed software to design synthetic sequences and use a Hamming(3, 1) code to encode a
sequence descriptor, barcode, and manufacturing lot information. Our error-correcting
encoding makes the spike-ins’ detection robust to DNA base substitutions, insertions, and
deletions. These errors can occur either during manufacturing or as part of sequencing.
Designed sequences are not homologous to known reference genomes and contain no
homopolymer runs. Finally, we added support to the One Codex Platform for automatically
detecting and analyzing these spike-in sequences. We tested our software and spike-in
control design by generating synthetic sequences, synthesizing them, and adding them at
multiple concentrations as part of a clinical NGS workflow. Initial results across a range of
sample types and library preparations are presented.
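A Hamming(3, 1) code is the triple repetition code: each data bit is written three times and decoded by majority vote, so any single flipped bit within a block is corrected. The sketch below shows only that bit-level idea plus a check for homopolymer runs; the abstract does not describe how bits are mapped to bases or how insertions and deletions are handled, so those parts of the One Codex design are not shown, and the 4-bit field is hypothetical.

```python
def hamming31_encode(bits):
    """Hamming(3,1), i.e. the triple repetition code: each bit is written three times."""
    return [b for b in bits for _ in range(3)]


def hamming31_decode(codeword):
    """Majority vote per block of three; corrects one flipped bit in each block."""
    if len(codeword) % 3:
        raise ValueError("codeword length must be a multiple of 3")
    return [1 if sum(codeword[i:i + 3]) >= 2 else 0 for i in range(0, len(codeword), 3)]


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (1 means no homopolymer runs)."""
    best = run = 1 if seq else 0
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


data = [1, 0, 1, 1]              # hypothetical 4-bit field, e.g. part of a lot number
sent = hamming31_encode(data)
received = list(sent)
received[4] ^= 1                 # one substitution-induced bit error
print(hamming31_decode(received) == data)       # True
print(longest_homopolymer("ACGTCAGT"))           # 1
```

The repetition code triples the sequence length, a cost that short synthetic spike-ins can usually absorb.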
Poster Board #: 63
Title: The Cluster Core Genome Size as a Metric to Improve Person-to-person Pathogen Transmission Analysis
Author Block: T. de Man, A. Laufer Halpin; US Centers for Disease Control and Prevention, Atlanta, GA.
Whole genome sequencing (WGS) provides a high-resolution view of the microevolution of
bacterial pathogens causing infections. These data are used for tracing person-to-person
transmission events through phylogenetic analysis using single nucleotide polymorphisms
(SNPs) identified by read mapping to an outbreak reference genome. Pairwise distances
between sequenced isolates are usually measured in SNPs passing certain criteria (high
quality SNPs, hqSNPs) from the portion of the reference genome that is considered the core
genome of the isolate set (cluster core genome). Phylogenetically informative hqSNPs are
those that result from neutral point mutations rather than homologous recombination events
or phage DNA. Most phylogenetic tools only output SNPs, limiting inferences that can be
made for pathogen transmission events. However, the size of the cluster core genome
relative to the reference genome can inform a more accurate transmission analysis.
Therefore, we developed a Perl script that estimates the cluster core genome size across a
group of isolates, for a given depth of coverage, by assessing BAM files. An optional
parameter in which the user also provides a BED file of masked regions on the reference
genome will result in disregarding regions that do not harbor phylogenetically informative
hqSNPs (i.e., regions of homologous recombination). Our Perl script is available on GitHub
and can be used with output from any SNP pipeline that generates BAM files. The script will
operate post-SNP calling to support isolate relatedness conclusions. However, challenges
remain in establishing transmission events with certainty; read coverage across a reference
genome is rarely uniform due to differences in genomic content, repetitive sequences, and
G+C content bias. Phylogenetically informative cluster core hqSNPs typically require support
of > 10 reads, meaning genomic areas with less read depth are unable to produce hqSNPs.
Increasing read depth stringency therefore leads to a smaller cluster core genome available
for hqSNP inclusion. If the majority of isolates do not meet the horizontal coverage threshold,
it could indicate a more closely related reference genome is needed or that those isolates are
unlikely to be related. Any sequenced isolate that does not meet the minimum coverage
threshold (e.g., 10X coverage across >80% of the reference genome) warrants further
assessment. In outbreak investigations where WGS is implemented, it is critical that cluster
core genome size and depth of coverage are always communicated with the number of
hqSNPs. A low number of hqSNPs detected within a small cluster core genome can incorrectly
indicate that the set of sequenced isolates is closely related. This might otherwise be missed if
only the number of hqSNPs is reported. Considering the cluster core genome size harboring
hqSNPs informs a more accurate transmission analysis to improve patient outcomes.
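The cluster core genome calculation described above can be sketched in a few lines. The authors' tool is a Perl script that reads BAM files; this Python sketch is not that script and instead assumes per-position depth lists have already been extracted (for example, from `samtools depth -a` output). A position counts toward the cluster core if it is outside the masked BED intervals and every isolate reaches the depth threshold there; isolates not covered at that depth across more than 80% of the reference are flagged. The function names and the 10-bp example are hypothetical.

```python
def cluster_core_genome(depths, ref_len, min_depth=10, masked=()):
    """depths: {isolate: per-position read depth list of length ref_len}.
    masked: BED-style (start, end) intervals, 0-based and half-open.
    Returns (core size, core size / ref_len, per-isolate horizontal coverage)."""
    mask = [False] * ref_len
    for start, end in masked:
        for pos in range(max(0, start), min(end, ref_len)):
            mask[pos] = True
    # A position is in the cluster core if it is unmasked and every isolate
    # reaches the depth threshold there.
    core = sum(
        1
        for pos in range(ref_len)
        if not mask[pos] and all(d[pos] >= min_depth for d in depths.values())
    )
    horizontal = {
        name: sum(1 for x in d if x >= min_depth) / ref_len
        for name, d in depths.items()
    }
    return core, core / ref_len, horizontal


def needs_review(horizontal, min_fraction=0.8):
    """Isolates that do not reach min_depth across more than min_fraction of the reference."""
    return sorted(name for name, f in horizontal.items() if f <= min_fraction)


depths = {  # hypothetical 10-bp reference, three isolates
    "iso1": [12] * 10,
    "iso2": [12] * 7 + [3] * 3,
    "iso3": [15] * 10,
}
core, frac, horiz = cluster_core_genome(depths, ref_len=10, masked=[(0, 2)])
print(core, frac, needs_review(horiz))  # 5 0.5 ['iso2']
```

In this toy case one low-coverage isolate halves the cluster core genome, which is the kind of change that should be reported alongside the hqSNP count.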
Poster Board #: 64
Title: WGS-based Characterisation of Listeria monocytogenes Isolated from the Food-production Chain and Humans in Italy
Author Block: F. Chiesa1, S. Gallina2, L. Decastelli2, V. Filipello3, G. Kastanis4, M. Allard4, E. Brown4, S. Lomonaco4; 1University of Turin, Grugliasco, ITALY, 2Istituto Zooprofilattico Sperimentale del Piemonte Liguria e Valle d’Aosta, Torino, ITALY, 3Istituto Zooprofilattico Sperimentale della Lombardia, Emilia Romagna, Brescia, ITALY, 4Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, College Park, MD.
Background: Listeria monocytogenes is an environmentally ubiquitous organism
contaminating food-processing environments. Consumption of contaminated food is thought
to be the cause of 99% of listeriosis cases. Whole genome sequencing (WGS) is a valuable
typing tool for L. monocytogenes isolates and the identification of virulence islands that may
influence infectivity. Methods: The 510 isolates were collected in Italy over 14 years (2002-
2016) and included 94 human clinical and 416 food/environmental isolates. Whole genome
sequences were obtained at the FDA-CFSAN sequencing facility. Available multi-locus
sequence typing schemes were used to assign sequence types (STs), clonal complexes (CCs)
and virulence types (VTs). The NCBI Pathogen Isolates Browser tool was used to determine
SNP clustering and antibiotic resistance (AMR) genotypes. Isolates were screened for the
presence of i) premature stop codons (PMSCs) in inlA, and ii) two Stress Survival Islets
(SSIs). Results: We observed 72 SNP clusters, 51 STs (4 novel), 38 CCs and 45 VTs. Ten CCs
accounted for more than 80% of the isolates. Most of the food/environmental isolates
belonged to CC9/VT11 (n=176) and CC121/VT94 (n=31), while clinical isolates were mostly
represented by CC1/VT20 (n=18). inlA PMSCs were found in 48% of isolates (n=246), and were
more frequent among environmental than clinical isolates (58% vs 10%, respectively).
Tetracycline-associated resistance gene tet(M) was observed in 5.3% of isolates (n=27). Genes
associated with benzalkonium chloride resistance were found in 16.8% (Tn6188 transposon; n=86)
and 7.1% (bcrABC; n=37) of isolates. Tn6188 was found in 65% (n=21) of
CC121/VT94 isolates, while bcrABC was present in 77% (n=27) of CC31/VT113 isolates. Finally,
less than 1% of isolates harbored the efflux genes emrE, qacA, and qacC. SSI-1 was found in
63% of isolates (n=321) and SSI-2 in 7% (n=36). All CC99/VT11 isolates carried only SSI-1,
while all CC121/VT94 isolates only carried SSI-2. Conclusions: The overrepresentation of
CC9/VT11 could be related to multiple sampling of the same type/source. This could also
explain the high rate of tet(M), as 23 of the 27 tet(M)-positive isolates belonged to
CC9/VT11. Among the diverse L. monocytogenes strains from food production, some CCs
(e.g., CC121 and CC9) have commonly been reported as prevalent in processing environments.
This might be linked to the presence of SSIs or the ability to survive disinfectants. SSI-1
contributes to growth in suboptimal conditions, while SSI-2 is involved in stress response.
These SSIs can help L. monocytogenes adapt to specific niches in food processing
environments. Further studies are needed to characterize the impact of Tn6188 and
bcrABC on environmental persistence. Our findings could help both in monitoring the
food production sectors most at risk for L. monocytogenes and in supporting epidemiological
investigations of outbreaks.
Poster Board #: 65
Title: Imipenem Resistance Mechanism Analysis in Pseudomonas aeruginosa Through Whole Genome Sequencing Analysis
Author Block: W. Chang, A. Saeed, V. Sapiro, T. Walker; OpGen, Gaithersburg, MD.
Background: Antibiotic resistance is accelerated by the misuse and overuse of antibiotics. For
better patient treatment and antibiotic stewardship, it is crucial to rapidly and accurately
determine a pathogen’s antibiotic resistance. P. aeruginosa has developed three
mechanisms of resistance to carbapenems: acquisition of genes encoding carbapenem-degrading or
-modifying enzymes; mutations of porin genes (e.g., oprD) that decrease outer
membrane permeability; and overexpression of efflux pump systems. In a previous study
presented at ASM Microbe 2018, we used whole genome sequencing (WGS) and
meropenem-resistance phenotypic data to show that all three mechanisms are involved in
resistance to meropenem, a carbapenem. In this study, we examined the relationship of the
relevant genes in P. aeruginosa with resistance to another carbapenem,
imipenem. Materials/methods: WGS reads or assemblies and phenotype data
describing imipenem resistance for 208 P. aeruginosa isolates were acquired from public
databases and analyzed with OpGen Acuitas® Whole Genome Sequencing Analysis pipelines.
Of these, 87 isolates were resistant to imipenem and 121 isolates were susceptible. All known
carbapenem degradation/modification enzyme genes and chromosomal gene mutations
conferring resistance to carbapenems were analyzed by comparison to reference genes from
the reference strain P. aeruginosa PAO1 (NCBI: NC_002516.2). Results: All three mechanisms
were analyzed. Thirty-six isolates possess at least one carbapenem degradation/modification
enzyme gene, 69 have loss-of-function (LOF) mutations in oprD, and 13 have LOF
mutations in nalD (a transcriptional repressor of the MexAB-OprM efflux system). In all, 88
isolates harbor at least one enzyme gene or mutation described above; of these, 78 isolates were
resistant to imipenem and 10 were susceptible. Using these as indicators of resistance to
imipenem, the prediction accuracy was 90.9%, sensitivity 89.7%, specificity
91.7%, positive predictive value 88.6%, and negative predictive value 92.5%. As with
resistance to meropenem, isolates with an LOF mutation in oprD or with carbapenem
degradation/modification enzyme genes have a high positive predictive value for resistance to
imipenem (91.7% and 92.8%, respectively). Although an LOF mutation in nalD is
related to imipenem resistance, mutations in mexR, the repressor of the same efflux pump
system, are not. However, unlike resistance to meropenem, OXA-2 and LOF mutations
in opdD do not confer resistance to imipenem. Conclusions: As with resistance to
meropenem, P. aeruginosa can become resistant to imipenem through acquisition of
carbapenem degradation enzymes, mutations in porin genes, and overexpression of efflux pump systems.
However, differences also exist between resistance to meropenem and resistance to
imipenem; for example, OXA-2 and LOF mutations in opdD confer resistance to meropenem but not
to imipenem.
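The reported prediction metrics follow from the counts in the abstract. Of 208 isolates, 87 were resistant and 121 susceptible. The 88 isolates carrying a marker include 78 resistant (true positives) and 10 susceptible (false positives), so there are 87 - 78 = 9 false negatives and 121 - 10 = 111 true negatives. The short sketch below recomputes the metrics from these counts; the function name is ours.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics, returned as fractions."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


resistant, susceptible = 87, 121  # phenotypes of the 208 isolates
tp, fp = 78, 10                   # 88 isolates carry a marker: 78 R, 10 S
fn, tn = resistant - tp, susceptible - fp  # 9 and 111
for name, value in diagnostic_metrics(tp, fp, fn, tn).items():
    print(f"{name}: {100 * value:.1f}%")
```

This prints accuracy 90.9%, sensitivity 89.7%, specificity 91.7%, PPV 88.6%, and NPV 92.5%, matching the values reported above.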
Poster Board #: 66
Title: Whole Genome Sequencing and Bioinformatic Analysis of Two Foodborne Illness Outbreaks: Campylobacter jejuni and Salmonella enterica
Author Block: K. F. Oakeson; Utah Public Health Laboratory, Taylorsville, UT.
Whole genome sequencing (WGS) is rapidly becoming a powerful tool for determining the
relatedness of bacterial isolates in foodborne illness detection and outbreak investigation.
WGS has been applied to large national outbreaks and surveillance; however, it has rarely
been used in smaller local outbreaks. This work describes the retrospective application of
reference-free whole genome sequencing and bioinformatic analysis to a local outbreak of
Campylobacter jejuni associated with raw milk.
The current work demonstrates the superior resolution of genetic relatedness generated by
WGS data analysis when compared to pulsed-field gel electrophoresis (PFGE). WGS is a
powerful alternative to PFGE for determining genetic relatedness between bacterial
isolates. WGS and bioinformatic analysis were applied to a Utah-specific
outbreak of Campylobacter jejuni associated with raw milk and to a national multi-state
outbreak of Salmonella enterica associated with rotisserie chicken to illustrate the flexibility
and scalability of the workflow. Together these two analyses show how a reference-free
WGS workflow is superior to PFGE and to other WGS workflows that are based on single
nucleotide polymorphisms (SNPs).
Poster Board #: 67
Title: Maximal Viral Information Recovery from Sequence Data Using VirMAP
Author Block: N. J. Ajami, M. C. Wong, M. C. Ross, R. E. Lloyd, J. F. Petrosino; Baylor College of Medicine, Houston, TX.
Accurate classification of the human virome is critical to a full understanding of the role
viruses play in health and disease. This implies the need for sensitive, specific, and practical
pipelines that return precise outputs while still enabling case-specific post hoc analysis. Viral
taxonomic characterization from metagenomic data suffers from high background noise and
signal crosstalk that confound current methods. Here we present VirMAP, which overcomes
these limitations using techniques that merge nucleotide and protein information to
taxonomically classify viral reconstructions independent of genome coverage or read overlap.
We validated VirMAP using published datasets and viral mock communities containing RNA
and DNA viruses and bacteriophages. VirMAP offers opportunities to enhance metagenomic
studies seeking to define virome-host interactions, improve biosurveillance capabilities, and
strengthen molecular epidemiology reporting.
Poster Board #: 68
Title: Microbial Water Quality Assessment of Hurricane Harvey Floodwater Remnants Using Shotgun Metagenomic Sequencing
Author Block: J. Narayanan1, S. Chellam2, A. Kelley1, S. Christnacht2, H. M. Lavigne2, S. Das2, J. Murphy1, V. Hill1; 1Centers for Disease Control and Prevention, Atlanta, GA, 2Zachry Department of Civil Engineering, Texas A&M University, College Station, TX.
It is estimated that culturable microbes represent less than one percent of the total microbial
population. With recent advances in next-generation sequencing (NGS), it is possible to obtain
a complete taxonomic and metabolic profile of microbial communities in water samples. In
the present study, we evaluated the two most commonly used NGS-based approaches -
targeted 16S rRNA amplicon sequencing, and shotgun sequencing - to analyze 11 floodwater
samples collected in the greater Houston area following Hurricane Harvey. The shotgun
sequencing data provide taxonomic and metabolic fingerprints from all nucleic acids (DNA
and RNA) extracted, whereas targeted 16S rRNA amplicon sequencing provides only
bacterial community profiles. The value of the shotgun metagenomic approach was
demonstrated by the simultaneous detection of both opportunistic waterborne
bacterial pathogens (Legionella, Mycobacterium, and Pseudomonas) and protozoan hosts
(Acanthamoeba and Naegleria). These groups of microorganisms typically co-exist in biofilms
associated with drinking water distribution systems or premise plumbing. A metagenomic
analysis of water samples also provided information on the presence of Vibrio. In addition,
fecal indicator bacteria, including Escherichia coli and Enterococcus faecalis, were detected in
samples. Several bacteriophages including those from Mycobacterium, Vibrio, Escherichia,
and Pseudomonas, were also identified along with their bacterial hosts. The shotgun
metagenomic sequencing approach developed herein provides reliable information on
microbial ecology and community level physiological profiles, which may aid in controlling
pathogens present in contaminated water through effective management practices.
Poster
Board #: 69
Title: Multiplex NGS Detection of Conserved Regions of Bacterial Toxins in Environmental Water
Author: A. Gonzalez-Revello1, R. Fort2, A. Iriarte3, P. Zunino4, C. Piccini4, J. Sotelo-Silveira1;
Block: 1Genomics Dept., Instituto de Investigaciones Biológicas Clemente Estable, Montevideo,
URUGUAY, 2Sequencing Platform, Instituto de Investigaciones Biológicas Clemente Estable,
Montevideo, URUGUAY, 3Dept. of Biotechnology, Instituto de Higiene, Fac. Medicina,
Montevideo, URUGUAY, 4Dept. Microbiology, Instituto de Investigaciones Biológicas
Clemente Estable, Montevideo, URUGUAY.
Abstract: Controlling and microbiologically monitoring water with tools that identify pathogens
and their pathogenic or toxic potential is essential to ensure adequate quality standards.
Currently, routine detection relies mainly on culture, which has clear limitations. We aimed to
introduce in Uruguay the detection of pathogens and of their pathogenicity or toxicity,
particularly through new genomic techniques. By selecting conserved regions in the sequences of
relevant bacterial toxins (focusing on cyanobacteria and a series of pathogens relevant to human
health), we designed Ampli-seq panels for multiplexed NGS detection of these genes in water
samples. In addition, the community present in each sample was verified by 16S rRNA
metagenomics. A total of 274 primer pairs targeting 548 amplicons (85-340 bp in length) were
designed, covering 40 conserved regions and 32 full-length genes. In control samples, 98% of
sequences mapped to target regions, with few off-target products, and performance depended on
template abundance. For full-length genes, 80 to 90% of the sequence was recovered. Sequencing
of environmental water samples in which cyanobacterial blooms had previously been observed
yielded amplicons that identified the species and their sequence variants. 16S rRNA
metagenomics matched the profiles of bacterial strains in both control and environmental
samples. The system can also be used with different sequencing platforms (Ion Torrent,
Illumina, or Oxford Nanopore). Overall, the system proved sensitive and useful for detecting a
wide variety of species, yielding sequence information that could help detect not only genes
encoding known toxins but also toxin genes from species yet to be characterized.
Poster
Board #: 70
Title: Comparison of Phenotypic and Genotypic Drug Susceptibility Test Results of 108 drug-resistant Mycobacterium tuberculosis Isolates
Author: A. Takaki1, Y. Murase1, A. Aono1, K. Chikamatsu1, Y. Igarashi1, H. Yoshida2, Y. Tamura2, T. Nagai2, H. Yamada1, S. Mitarai1;
Block: 1The Research Institute of Tuberculosis, JATA, Tokyo, JAPAN, 2Osaka Habikino Medical Center,
Osaka Prefectural Hospital Organization, Osaka, JAPAN.
Abstract: Background: Rapid diagnosis of drug-resistant tuberculosis (DR-TB), especially
multidrug- and extensively drug-resistant (M/XDR-) TB, is one of the most important aspects of
TB management worldwide. In recent years, molecular diagnostics, including whole genome
sequencing (WGS), have become popular as a way to shorten the time required for drug
susceptibility testing (DST). However, some discordance between genotypic and phenotypic DST is
known to exist, and the accuracy of genotypic methods depends on reliable phenotypic DST data.
To detect resistance mutations and evaluate genotypic DST, we compared WGS data with our
phenotypic DST results for 108 M. tuberculosis (MTB) isolates, including MDR and XDR strains,
using TGS-TB, a drug resistance prediction tool covering 14 TB drugs. Methods: A total of 108
MTB isolates were collected from DR-TB patients at Osaka Habikino Medical Center (Osaka, Japan)
between 1998 and 2016. WGS libraries were prepared from cloned strains with the QIAseq FX DNA
Library Kit (QIAGEN) and sequenced on a MiSeq (Illumina). TGS-TB (https://gph.niid.go.jp/tgs-tb/)
was used to identify mutations in resistance-associated genes. Phenotypic DST results were
obtained using two proportion methods, covering 10 anti-tuberculosis drugs (Ogawa modified
medium, Welpack-S, Kyokuto) and 12 anti-tuberculosis drugs (Lowenstein-Jensen medium), and by
determining minimum inhibitory concentrations (MICs) for 16 drugs. DST for pyrazinamide (PZA)
was performed using MGIT AST PZA (Becton Dickinson) and a simplified (modified) PZAse test
(unpublished). Results: By phenotypic DST, the DR-TB isolates included 43 MDR and 35 XDR strains,
belonging to lineage 2 (East-Asian, 88%) and lineage 4 (Euro-American, 12%). The AMR prediction
programme of TGS-TB correctly predicted 81% (63/78) of MDR/XDR-TB isolates; however, one
susceptible isolate was predicted as MDR-TB. To date, we have analysed DST for five main drugs:
isoniazid (INH), rifampicin (RIF), kanamycin (KM), levofloxacin (LVFX), and PZA. The
sensitivities of predicted resistance to INH, RIF, and LVFX were 93%, 93%, and 91%,
respectively, but the specificity for INH was low (57%). In contrast, the sensitivity and
specificity for KM were 68% and 100%, respectively. PZA showed both high sensitivity and high
specificity (>95%). Conclusion: To increase the accuracy of molecular DST, a high-quality genome
database of DR-TB strains with reliable phenotypic DST needs to be established. Given the
relatively low sensitivity and specificity of conventional DST for several drugs, MICs, as
continuous variables, should reveal the relationships between mutations and drug susceptibility
in more detail.
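In this comparison, sensitivity and specificity measure the genotypic (TGS-TB) prediction against phenotypic DST, which serves as the reference standard. The minimal Python sketch below shows both ratios and reproduces the 63/78 ≈ 81% figure. The per-drug counts in it are hypothetical, because the abstract reports only percentages. Note that a low specificity, such as the 57% reported for INH, means a substantial share of phenotypically susceptible isolates were predicted resistant.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Genotypic (WGS) prediction vs phenotypic DST, taking phenotypic DST as the reference."""
    tp: int  # predicted resistant, phenotypically resistant
    fn: int  # predicted susceptible, phenotypically resistant
    tn: int  # predicted susceptible, phenotypically susceptible
    fp: int  # predicted resistant, phenotypically susceptible

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def percent(numerator: int, denominator: int) -> int:
    """Proportion expressed as a whole-number percentage."""
    return round(100 * numerator / denominator)


print(percent(63, 78))  # 81 (MDR/XDR-TB isolates correctly predicted)

# Hypothetical counts for a single drug, for illustration only
example = Comparison(tp=45, fn=5, tn=18, fp=2)
print(f"sensitivity={example.sensitivity:.0%} specificity={example.specificity:.0%}")
# sensitivity=90% specificity=90%
```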
Poster
Board #: 71
Title: Mosaic, a Cloud-Based Community Platform for the Acceleration of Translational Microbiome Science
Author: J. Didion;
Abstract: Mosaic provides a collaborative space where researchers can implement and compare
microbiome methods through community challenges. The “Strains” series of challenges encourages
improvement of the strain-level performance of bioinformatic tools. The “Standards” series
addresses experimental and computational sources of variability in metagenomic analyses to
promote accurate and reproducible NGS-based microbiome profiling. Learn more and get started at
http://mosaicbiome.com/.
Poster
Board #: 72
Title: Pathogen Detection Community Challenges on the Precision FDA Platform
Author: J. Didion;
Abstract: Next-generation sequencing (NGS) is revolutionizing microbial pathogen identification
and surveillance, and commercialization of metagenomics technologies is increasing at an
exponential rate. The U.S. Food and Drug Administration (FDA) seeks to promote the development
and improvement of bioinformatics pipelines for detecting pathogens in metagenomically sequenced
samples by providing precisionFDA, a cloud-based community platform for NGS assay evaluation and
regulatory science exploration. Over the past two years, precisionFDA has hosted a range of
challenges, including the Center for Food Safety and Applied Nutrition (CFSAN) Pathogen
Detection Challenge and the Center for Devices and Radiological Health (CDRH) Biothreat
Challenge, and it will continue to host more. These research activities have challenged the
community to test the ability of their pipelines to detect pathogens, while also revealing
specific areas where improvement is needed. Current and future challenges, hosted at
https://precision.fda.gov/challenges, will continue to motivate improvements in metagenomic
pathogen surveillance.
Poster
Board #: 73
Title: Neisseria gonorrhoeae: Genomic Investigation of Azithromycin Resistant Strains, New South Wales, 2017
Author: C. R. George1, R. Rockett2, D. Whiley3, R. Enriquez1, R. Kundu1, J. El-Nasser1, V. Sintchenko4, M. Lahra1;
Block: 1NSW Health Pathology, Randwick, AUSTRALIA, 2CIDM-PH, Westmead, AUSTRALIA, 3University
of Queensland, St Lucia, AUSTRALIA, 4NSW Health Pathology, Westmead, AUSTRALIA.
Abstract: Background: Gonococcal antimicrobial resistance is a global concern, with treatment
failure now reported for every class of clinically assessed antimicrobial agent. Following the
lead of the United Kingdom in 2011, many countries, including the United States, China, and
Australia, and since 2016 the World Health Organization, have recommended dual therapy
(azithromycin and ceftriaxone) for Neisseria gonorrhoeae to forestall resistance to ceftriaxone.
Gonococcal resistance continues to rise, but a new pattern has emerged: azithromycin resistance
is increasing globally, whilst rates of decreased susceptibility to ceftriaxone (MIC 0.125 mg/L)
are falling in Australia and overseas. In Australia, rates of decreased susceptibility to
ceftriaxone (MIC 0.06-0.125 mg/L) fell from 8.8% in 2013 to 1.2% in Quarter 1 of 2017, whilst
azithromycin resistance (MIC ≥ 1 mg/L) increased from 2.1% in 2013 to 10.3% in Quarter 1 of
2017. A major outbreak of azithromycin-resistant N. gonorrhoeae was first detected in South
Australia in 2016, and secondary outbreaks have occurred in several states, including New South
Wales. Aim: To investigate the outbreak of azithromycin-resistant N. gonorrhoeae in Australia by
characterising strains and determining whether the resistance mechanisms were novel or known.
Methods: We used whole genome sequencing and other molecular investigations to rapidly detect
and characterise antimicrobial resistance mechanisms in gonococcal isolates identified in New
South Wales during Quarter 1 of 2017 (n = 82). Results: We demonstrate that molecular
investigations, including whole genome sequencing, provide much-needed solutions for
characterising outbreaks in this era of emerging antimicrobial resistance. We also demonstrate
relationships between azithromycin resistance and epidemiological population groups.
Poster
Board #: 74
Title: A Modular, Versatile WGS Data Analysis Pipeline for Bacterial Outbreak Investigations
Author: W. Haas, P. Lapierre, K. Musser;
Block: NYSDOH - Wadsworth Center, Albany, NY.
Abstract: Background: Legionella pneumophila and Escherichia coli share three features: they can
cause life-threatening infections, give rise to outbreaks that affect large numbers of people,
and are prone to horizontal gene transfer (HGT). These factors make isolate characterization a
priority and a challenge for public health laboratories, especially for species with high
genome plasticity, where a single mobile genetic element can drastically change the genetic
makeup of an isolate. Here, we present a versatile bioinformatic pipeline that takes HGT into
consideration and can be easily adapted to any bacterial pathogen. Methods: Illumina sequencing
reads are trimmed and subjected to quality control. The query isolate's genome sequence is
obtained through de novo assembly and compared to other genomes to find the best possible
reference for mapping and variant calling. A phylogenetic tree and a minimum spanning tree are
generated from all genomes mapped to the same reference to depict how the isolates within a
cluster are related to each other. If the percentage of unmapped reads or the number of variants
is too large, the query's sequence is added to the list of candidate reference genomes to serve
as the nucleus of its own cluster. Several built-in controls ensure that the data are reliable
and alert the user to potential issues. Log, report, and summary files are generated
automatically to simplify data analysis and reporting. Results: Using WGS data from outbreaks of
Legionnaires' disease in New York State as test samples, the pipeline correctly grouped related
isolates into clusters and identified the sources of the outbreaks by variant analysis. The
pipeline readily accommodated other species, such as E. coli, by changing the reference
database. New capabilities, such as predicting the presence of Shiga toxin genes, were easily
added by appending modules to the program. The ability to select a reference from several
candidates and to automatically add new references produced a more detailed genome comparison,
since mapping all isolates to a single reference ignores mobile genetic elements. Conclusions:
While WGS is replacing methods such as Pulsed-Field Gel Electrophoresis as the standard for
isolate characterization, the methods used to analyze these data are far from standardized.
Inaccurate results can have severe consequences in a public health setting, potentially
resulting in increased morbidity and mortality. Here, we present a bioinformatic data analysis
pipeline that is accurate, robust, and versatile.
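The reference-selection rule described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the Wadsworth Center code. A crude k-mer Jaccard similarity stands in for whatever genome comparison the pipeline actually uses. The hypothetical `map_stats` callable stands in for read mapping and variant calling. The cut-offs `MAX_UNMAPPED_PCT` and `MAX_VARIANTS` are also hypothetical, because the abstract says only "too large".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical cut-offs; the abstract does not give the actual values.
MAX_UNMAPPED_PCT = 10.0
MAX_VARIANTS = 500

# (query sequence, reference sequence) -> (% unmapped reads, variant count)
MapStats = Callable[[str, str], Tuple[float, int]]


def kmers(seq: str, k: int = 4) -> Set[str]:
    """All k-mers of a sequence (a crude stand-in for a genome-distance tool)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class ReferencePool:
    references: Dict[str, str] = field(default_factory=dict)   # name -> sequence
    clusters: Dict[str, List[str]] = field(default_factory=dict)  # reference -> members

    def best_reference(self, query_seq: str) -> str:
        """Pick the candidate reference most similar to the query."""
        q = kmers(query_seq)
        return max(self.references, key=lambda r: jaccard(q, kmers(self.references[r])))

    def place(self, name: str, seq: str, map_stats: MapStats) -> str:
        """Join the best-matching cluster, or promote the query to a new reference."""
        if self.references:
            ref = self.best_reference(seq)
            unmapped_pct, n_variants = map_stats(seq, self.references[ref])
            if unmapped_pct <= MAX_UNMAPPED_PCT and n_variants <= MAX_VARIANTS:
                self.clusters[ref].append(name)
                return ref
        # Too divergent (or no references yet): the query seeds its own cluster.
        self.references[name] = seq
        self.clusters[name] = [name]
        return name


def toy_stats(query: str, ref: str) -> Tuple[float, int]:
    """Toy mapping statistics derived from k-mer overlap, for illustration only."""
    return 100 * (1 - jaccard(kmers(query), kmers(ref))), 0


pool = ReferencePool()
print(pool.place("isolate_A", "ACGTACGTTGCAACGTAGCT", toy_stats))  # isolate_A (seeds a cluster)
print(pool.place("isolate_B", "ACGTACGTTGCAACGTAGCT", toy_stats))  # isolate_A (joins it)
print(pool.place("isolate_C", "TTTTGGGGCCCCAAAATTTT", toy_stats))  # isolate_C (new reference)
```

Keeping the candidate references in a growing pool is what lets divergent isolates, such as those that differ by a mobile element, form their own clusters instead of being forced onto a single reference.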
Poster
Board #: 75
Title: Metagenomic and Bioinformatic Evaluation of Clinical Specimens for Culture-Independent Source Attribution of Legionnaires’ Disease
Author: J. W. Mercante, J. A. Caravas, L. Lie, B. H. Raphael, J. M. Winchell;
Abstract: Legionnaires’ disease (LD) is a severe and sometimes fatal pneumonia caused by
Legionella bacteria. Most Legionella species inhabit natural freshwater environments, but they
can also colonize man-made water networks, causing disease when a susceptible host is exposed to
contaminated aerosols. Identifying the source of exposure is important for stopping ongoing
disease transmission. Recently, NGS-based methods have allowed high-resolution genetic matching
of clinical and environmental Legionella isolates during outbreaks. Yet LD investigations often
fail to recover clinical and/or environmental isolates, creating uncertainty about the source of
infection. The aim of this study was to explore metagenomic sequencing and bioinformatic
analysis methods for rapid LD source attribution when clinical isolates are not available.
Previous results from our laboratory suggested that low-burden clinical samples may not yield
sufficient Legionella sequence data for proper analysis. Thus, a PCR-based strategy using
Legionella spp. (ssrA) and human-specific (RNaseP) targets was developed to prioritize clinical
specimens with higher ratios of bacterial to human DNA for sequencing. To evaluate this
strategy, 29 culture-positive respiratory specimens were tested, and 2 high-quality specimens
were chosen for WGS using MiSeq V3 chemistry. The resulting sequencing reads were taxonomically
profiled and binned with Kraken using a modified v1 database. Remaining unclassified reads were
identified from DIAMOND output analyzed in MEGAN6. Between 0.2% (~52,000 reads) and 0.6%
(~76,000 reads) of the paired sequences were classified as Legionella, and approximately 99.4%
of all reads were of human origin. Mapping reads to the strain Paris reference sequence revealed
an average coverage of 6-10X across 85-87% of the reference genome. Sequences were then analyzed
with a previously constructed gene-by-gene Legionella typing scheme implemented in the
Bionumerics platform, and 301 to 994 complete genetic loci were called from the assembled
metagenomes. Importantly, hierarchical clustering showed that both clinically derived
metagenomes clustered tightly (98.6-99.3% identity) within clades that included the Legionella
isolate (>2,500 loci identified) previously recovered from these specimens, as well as
additional clinical isolates associated with the same LD outbreak. This study demonstrates a
metagenomic workflow strategy that may provide sufficient genomic resolution for LD source
attribution in the absence of a clinical isolate. Further studies are underway to optimize the
strategy using human-DNA depletion methods and environmental microbiome enrichment.
The findings and conclusions in this presentation are those of the authors and do not
necessarily represent the official position of the Centers for Disease Control and Prevention.
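The abstract does not state how the ssrA and RNaseP PCR results were combined to rank specimens. One common approach, assumed here, is the 2^-ΔCt approximation. It treats both assays as about 100% efficient, so that each one-cycle difference in Ct corresponds to a twofold difference in template. The sketch below ranks specimens by the implied Legionella:human template ratio. The specimen IDs and Ct values are hypothetical.

```python
from typing import List, NamedTuple


class Specimen(NamedTuple):
    specimen_id: str
    ct_ssra: float    # Ct of the Legionella spp. target (ssrA)
    ct_rnasep: float  # Ct of the human target (RNaseP)


def legionella_to_human_ratio(s: Specimen) -> float:
    """Relative Legionella:human template ratio by the 2^-ΔCt approximation.

    Assumes ~100% amplification efficiency for both assays, so a lower Ct for
    ssrA relative to RNaseP means proportionally more Legionella template.
    """
    delta_ct = s.ct_ssra - s.ct_rnasep
    return 2.0 ** (-delta_ct)


def prioritize(specimens: List[Specimen], top_n: int = 2) -> List[Specimen]:
    """Return the top_n specimens with the highest Legionella:human ratio."""
    return sorted(specimens, key=legionella_to_human_ratio, reverse=True)[:top_n]


# Hypothetical Ct values, for illustration only
specimens = [
    Specimen("S1", ct_ssra=30.0, ct_rnasep=25.0),  # ratio 2^-5 = 0.03125
    Specimen("S2", ct_ssra=24.0, ct_rnasep=26.0),  # ratio 2^2 = 4.0
    Specimen("S3", ct_ssra=27.0, ct_rnasep=27.0),  # ratio 2^0 = 1.0
]
for s in prioritize(specimens):
    print(s.specimen_id, legionella_to_human_ratio(s))  # S2 4.0, then S3 1.0
```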
Poster
Board #: 76
Title: Using Nanopore Sequencing to Understand the Widespread Antibiotic Resistance in the Alaskan Soil Resistome
Author: T. Haan, M. C. McCarthy, A. Ducluzeau, D. M. Drown;
Block: University of Alaska Fairbanks, Fairbanks, AK.
Abstract: Resistant microbes may have a significant negative impact on the health of Alaskans.
Identifying specific antibiotic-resistant microbes is essential for rapid and appropriate
treatment, and increased antibiotic resistance in the environment may limit treatment options
for infections. As the climate changes and permafrost thaws, antibiotic-resistant microbes may
multiply at an increased rate. Multidrug resistance genes are often found on plasmids, enabling
their rapid spread via horizontal gene transfer. Transmission via wildlife and humans across
Alaska and the circumpolar Arctic should also be considered. Here, using culture-based methods,
we found widespread antibiotic resistance, along with heavy metal tolerance, in local boreal
forest soils affected by thawing permafrost. Using a novel metagenomic analysis of long-read
Nanopore DNA sequence data, we identified individual resistance genes present in our samples.
The mcr genes confer resistance to colistin, a drug reserved as a last resort because of its
side effects. We found ORFs homologous to mcr3, a plasmid-borne resistance gene discovered in a
clinical study of isolates from Asia and the United States. mcr3 is likely related to a
phosphoethanolamine transferase gene in Enterobacteriaceae and Aeromonas species, which are
common environmental microbes. The CDC now tracks the mcr gene superfamily. Homologs of mcr3
were found in all of our soil samples and in six different genomes tentatively identified as
environmental microbial reservoirs. Importantly, these methods allow us to better understand
the environmental reservoir of antibiotic resistance in Alaska.
Poster
Board #: 77
Title: Single Chromosomal Genome Assemblies on the Sequel System with Circulomics High Molecular Weight DNA Extraction for Microbes
Author: J. Wong1, C. Heiner1, M. Kim2, H. Ferrao1, V. J. Wallace2, K. Eng1, R. Fedak2, J. Wilson1, D. Kilburn2, M. Ashby1, P. Baybayan1, J. M. Burke2, K. Bjornson1, K. J. Liu2;
Block: 1Pacific Biosciences, Menlo Park, CA, 2Circulomics, Baltimore, MD.
Abstract: Background: Recent developments in Nanobind technology from Circulomics provide an
elegant high molecular weight (HMW) DNA extraction solution for sequencing the genomes of
Gram-positive and Gram-negative microbes. Nanobind is a nanostructured magnetic disk that can be
used for rapid extraction of HMW DNA from diverse sample types, including cultured cells, blood,
plant nuclei, and bacteria. Processing can be either automated using common instruments or
performed manually, and can be completed in <1 hour for most sample types. Methods: We
validated several critical steps of a new, streamlined microbial multiplexing workflow for
generating high-quality microbial genome assemblies in a high-throughput environment. This
workflow enables high-volume, cost-effective sequencing of up to 16 microbes totaling 30 Mb in
genome size on a single SMRT Cell 1M, using a target shear size of 10 kb. We also evaluated this
method on a set of four “class 3” microbes with >7 kb repeats, for which the fragment size was
increased to ~14 kb, with some fragments >30 kb. Results: We share data demonstrating these new
capabilities using isolates relevant to high-throughput sequencing applications, including
Shigella, common foodborne pathogens (Listeria, Salmonella), and species often seen in hospital
settings (Klebsiella, Staphylococcus). For nearly all microbes, including the difficult class 3
microbes, we achieved complete de novo assemblies of ≤5 chromosomal contigs with a minimum
quality score of 40 (99.99% accuracy) using data from multiplexed SMRTbell libraries. Each
library was sequenced on a single SMRT Cell 1M with the PacBio Sequel System and analyzed with
streamlined SMRT Analysis assembly methods. Conclusions: Using a combination of Circulomics
Nanobind extraction and PacBio SMRT Sequencing, we prepared and sequenced a pool of microbes
totaling ~30 Mb on one SMRT Cell 1M. With our streamlined workflow, which includes automated
demultiplexing and push-button assembly, we achieved complete, closed genomes for most microbes,
with quality values typically ranging from 40 to over 50.
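The stated correspondence between a quality score of 40 and 99.99% accuracy follows from the Phred scale, where the per-base error probability is 10^(-QV/10). The short Python sketch below converts between the two. It also estimates the expected number of consensus errors for the ~30 Mb pool described above, under the simplifying assumption (ours, not the authors') that QV is uniform across all genomes in the pool.

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Phred scale: error probability = 10^(-QV/10), so accuracy = 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    """Inverse transform: QV = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1.0 - accuracy)


def expected_errors(total_bases: int, qv: float) -> float:
    """Expected number of consensus errors across total_bases at a uniform QV."""
    return total_bases * 10 ** (-qv / 10)


for qv in (40, 50):
    print(qv, f"{qv_to_accuracy(qv):.5%}", round(expected_errors(30_000_000, qv)))
# 40 99.99000% 3000
# 50 99.99900% 300
```

Each 10-point gain in QV cuts the expected error count tenfold, which is why the upper end of the reported QV range matters for finished genomes.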
Poster
Board #: 78
Title: rRNATagger: an Integrated Pipeline for Marker Gene Amplicon Sequence Data Processing Geared for HPC Environments
Author: J. Tremblay1, E. Yergeau2, C. W. Greer1;
Block: 1National Research Council Canada, Montreal, QC, CANADA, 2INRS - Centre Armand-Frappier,
Laval, QC, CANADA.
Abstract: With the advent of high-throughput sequencing, microbiology is increasingly becoming a
data-intensive field of science. Because of its low cost, robust databases, and established
bioinformatic workflows, sequencing of 16S/18S/ITS rRNA gene amplicons, markers of choice for
phylogenetic studies, has become ubiquitous and has grown into the backbone of modern microbial
ecology. Many established end-to-end bioinformatic pipelines are available to perform short
amplicon data analysis and have proven central to advancing the field of microbial ecology.
These pipelines were written partly for a general audience, which is arguably a main reason for
their widespread adoption. However, few options exist for a more specialized audience
experienced in Linux-based systems and high performance computing (HPC) environments. For such
users, existing pipelines can make it difficult to fully leverage modern HPC capabilities and to
tweak and optimize analyses. Moreover, a wealth of stand-alone software packages that perform
specific bioinformatic tasks is increasingly accessible through code repositories and scientific
publications, and finding a way to easily integrate these applications into a pipeline is
critical given the fast pace at which bioinformatic methodologies evolve. Here we describe
rRNATagger, a short rRNA marker gene amplicon pipeline written in a Python framework that
enables fine-tuning and integration of virtually any rRNA gene amplicon bioinformatic procedure.
It is designed to work within an HPC environment, supporting a complex network of job
dependencies with a smart-restart mechanism in case of job failure or parameter modifications.
As a proof of concept, we present end results obtained with rRNATagger using 16S, 18S, ITS, and
PacBio long-read amplicon data types as input. Using a selection of published algorithms for
generating Operational Taxonomic Units and Single Nucleotide Variants and for computing
downstream taxonomic summaries and diversity metrics, we demonstrate the performance and
versatility of our pipeline for systematic analyses of amplicon sequence data.
Poster
Board #: 79
Title: Tracking the Resistome in One Health Surveillance
Author: H. Tate;
Block: FDA, Laurel, MD.
Abstract: The National Antimicrobial Resistance Monitoring System (NARMS) and others have shown
that the presence of known antimicrobial resistance determinants is highly correlated with
clinical resistance. Studies show that whole genome sequence data can be used to reliably
predict resistance in Salmonella, Campylobacter, E. coli, and other bacteria, making it possible
to infer resistance from genomes without traditional antimicrobial susceptibility data. Drawing
on that conclusion, in November 2017 the NARMS program launched Resistome Tracker, an online
tool that provides easily accessible and visually informative interactive displays of
antibiotic resistance genes. Resistome Tracker harvests resistance gene information from
genomic data deposited at NCBI and allows users to customize visualizations by antibiotic drug
class, compare resistance genes across different sources, identify new resistance genes, and
map selected resistance genes to geographic regions. The tool also issues alerts when new
resistance traits emerge in a region or source, providing early warning of emerging trends.
Because a variety of sources are represented in the NCBI dataset, Resistome Tracker inherently
employs a One Health approach to antimicrobial resistance surveillance. By understanding
potential reservoirs for the dissemination of resistant traits, public health officials,
academics, and researchers can develop interventions to stop or slow the spread of antibiotic
resistance.
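The abstract does not specify how Resistome Tracker decides that a resistance trait is new. The minimal sketch below assumes an alert is raised the first time a resistance gene is observed for a given region and source. The record layout, gene names, and region labels are hypothetical and do not reflect the NCBI data format.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class IsolateRecord(NamedTuple):
    """One genome's resistance-gene calls plus metadata (hypothetical schema)."""
    isolate: str
    source: str       # e.g. "poultry" or "human"
    region: str
    year: int
    genes: frozenset  # resistance genes detected in the genome


def new_trait_alerts(records: Iterable[IsolateRecord]) -> List[Tuple[int, str, str, str]]:
    """Flag (year, region, source, gene) the first time a gene appears for a region/source pair."""
    seen = defaultdict(set)  # (region, source) -> genes already observed
    alerts = []
    for rec in sorted(records, key=lambda r: r.year):  # stable sort: ties keep input order
        key = (rec.region, rec.source)
        for gene in sorted(rec.genes - seen[key]):
            alerts.append((rec.year, rec.region, rec.source, gene))
        seen[key] |= rec.genes
    return alerts


# Hypothetical records, for illustration only
records = [
    IsolateRecord("I1", "poultry", "Region1", 2015, frozenset({"geneA"})),
    IsolateRecord("I2", "poultry", "Region1", 2016, frozenset({"geneA", "geneB"})),
    IsolateRecord("I3", "human", "Region1", 2016, frozenset({"geneA"})),
]
for alert in new_trait_alerts(records):
    print(alert)
# (2015, 'Region1', 'poultry', 'geneA')
# (2016, 'Region1', 'poultry', 'geneB')
# (2016, 'Region1', 'human', 'geneA')
```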
Poster
Board #: 80
Title: Using Whole Genome Sequencing for Detection of Bacillus cereus Toxin Genes in Food
Author: A. T. Nguyen1, S. M. Tallent2;
Block: 1Merieux NutriSciences, Chicago, IL, 2US FDA/CFSAN/ORS, College Park, MD.
Abstract: Introduction: Bacillus cereus is among the top ten pathogens associated with foodborne
illness in the United States. It causes diarrheal disease, through ingestion and production in
the gut of hemolysin BL (Hbl), nonhemolytic enterotoxin (Nhe), or cytotoxin K (CytK), or emetic
disease, through ingestion of preformed cereulide (Ces). Currently, toxin detection is available
only for one component of each tripartite toxin (HblC and NheB) or, for cereulide (Ces), by mass
spectrometry. Purpose: There is presently no primary screening method for toxin genes prior to
testing with a commercial kit or mass spectrometry. A genomic sequencing method and pipeline for
initial screening of toxin genes would provide rapid primary detection of potential toxins and
other virulence factors. Methods: A sensitive whole genome sequencing method and analysis
pipeline were developed using BTyper, a computational tool that classifies suspected strains and
characterizes their virulence potential. DNA was extracted and sequenced from B. cereus cultures
or spiked foods (rice, gravy, whey powder, pancake mix, and infant formula), and BTyper was used
to analyze the sequence data. Results: BTyper can be used to detect toxin-producing strains of
B. cereus in culture and in six different food types. BTyper analysis confirmed the toxin
profiles of inclusivity strains (Hbl in 49/50 and Nhe in 48/50) and of 30/30 exclusivity
strains, all previously studied by PCR and examined for toxin production with commercial protein
assays. Significance: Detection of B. cereus toxins is paramount to ensuring food safety;
however, toxin detection from food is laborious and time-consuming, and some toxins are not
directly detectable in food. This method for detecting toxin genes is a powerful primary
screening tool and allows more in-depth data to be acquired about the suspected strain.
Poster
Board #: 81
Title: Agricultural Origins of a Highly Persistent Clone of Vancomycin-resistant Enterococcus faecalis in New Zealand
Author: R. Rushton-Green1, G. Tairoa1, R. Darnell1, G. P. Carter2, D. Williamson2, G. Cook1, X. Morgan1;
Block: 1University of Otago, Dunedin, NEW ZEALAND, 2Doherty Institute, University of Melbourne,
Melbourne, AUSTRALIA.
Abstract: Background: Enterococcus faecalis and Enterococcus faecium are human and animal gut
commensals. Vancomycin-resistant enterococci (VRE) are important opportunistic pathogens with
limited treatment options. Historically, the glycopeptide antibiotics vancomycin and avoparcin
selected for vancomycin resistance in both human and animal isolates, leading to the global
cessation of avoparcin use between 1997 and 2000. To better understand human- and
animal-associated VRE strains in the post-avoparcin era, we sequenced the genomes of 231 VRE
isolates from New Zealand (NZ) (75 human clinical, 156 poultry) cultured between 1998 and 2009.
Results: A clonal E. faecalis strain (MLST 108) was highly prevalent among both poultry and
human isolates in the three years following the discontinuation of avoparcin in 2000,
consistent with previous molecular typing of NZ VRE strains. Metadata and antimicrobial
susceptibility information suggest an agricultural origin for this clone, and suggest that
historic antimicrobial use led to the evolution of clinically relevant resistances. VRE isolate
resistomes were largely carried on multiple, heterogeneous plasmids containing diverse
resistance determinants and multiple linked selection mechanisms. In contrast to E. faecalis,
E. faecium lineages were distinct between agricultural and human reservoirs. Conclusions:
Historic use of antimicrobials in NZ agriculture has driven the evolution of a clonal VRE strain
carrying a range of clinically relevant antimicrobial resistances. The high genomic conservation
and the near-universal presence of bacitracin resistance genes suggest a common poultry origin
with high bacitracin exposure. The persistence of this clone in NZ for over a decade indicates
that co-selection and other stabilizing mechanisms may be important drivers of its persistence.
Poster
Board #: 82
Title: A Molecular Biology Cloud for Microbial NGS Researchers
Author: V. Nagarajan;
Block: MolBioCloud, Silver Spring, MD.
Abstract: Background: The combination of high-throughput NGS technology and democratized public
cloud computing power has created a favorable environment for biomedical discovery.
Unfortunately, the complexity of the cloud ecosystem and the tangle of software dependencies
built into the modern open-source ecosystem pose a serious obstacle to this opportunity. In
this work, we demonstrate a cloud computing platform that brings the power of the cloud to
researchers' fingertips. Methods: We reviewed the literature for published software, pipelines,
and workflows related to applied microbial NGS data analysis. The identified tools were then
individually installed, configured, and tested in MolBioCloud, a FedRAMP-compliant DaaS product
in the AWS cloud computing marketplace with a user-friendly, drag-and-drop desktop interface.
The tools were then packaged (MicrobialMolBioCloudPackage) and tested as a single installer for
use with MolBioCloud. For reliability and reproducibility, the platform was tested multiple
times on several different instance types within Amazon's cloud computing infrastructure.
Results: The MolBioCloud platform comes prepackaged with thousands of standard molecular biology
tools and the latest versions of NGS analysis tools. The MicrobialMolBioCloudPackage developed
for this work is freely available at
https://molbiocloud.com/help/tiki-index.php?page=MicrobialMolBioCloudPackage. This package
provides tools for NGS-based MLST, taxonomic classification, metagenomics, community
composition, GWAS, and population genomics. Hundreds of popular and standard tools, including
the latest versions of mothur, QIIME2, PathoScope, MashTree, Staphopia, Snippy, Prokka, Roary,
abricate, SalmID, Sixess, MEGAN, Krona, MetaPhlAn, and SNP-sites, are included in this package,
with thousands of dependencies resolved, configured, and tested. Conclusion: The
MicrobialMolBioCloudPackage is a free, one-step installer that makes it easy to set up a complex
computing environment. Combined with the secure, on-demand, pay-per-use public cloud model, this
platform has great potential to help microbial researchers take advantage of the discovery
opportunities offered by big data and the public cloud. With this platform, researchers can
easily set up a secure, powerful, low-cost, scalable, and biologist-friendly desktop system,
based on a virtual private cloud, for their own research.
Poster
83
Board #:
Next-generation Sequencing and Antibiotic Resistance Profiles of Salmonella Strains Isolated
Title:
from Stream Sediments and Poultry Litter in the Shenandoah Valley of Virginia
Author C. Holmes, S. Jurgensen, N. Greenman, J. Herrick;
Block: James Madison University, Harrisonburg, VA.
Abstract: Environmental reservoirs of Salmonella, particularly those associated with agricultural
practices, may contribute significantly to the dissemination of these potential human pathogens.
However, such reservoirs are not well characterized. Furthermore, the use of antibiotics in
animal agriculture has potentially expanded the transfer and recombination of antimicrobial
resistance (AMR) genes among Salmonella and other bacterial populations in environmental
ecosystems. Surveillance of Salmonella in ecosystems such as streams and soils is potentially
an important tool for understanding the overall distribution and epidemiology of this
pathogen. Stream sediments from seven sites and chicken litter from five poultry houses in
the Shenandoah Valley were sampled between October 2016 and January 2018. Modified
FDA Bacteriological Analytical Manual methods of pre-enrichment, enrichment, and isolation
were used to isolate 38 Salmonella strains: 33 from stream sediments and five from litter
from a single commercial chicken house. Putative Salmonella were confirmed by polymerase
chain reaction amplification of the Salmonella-specific invA gene. The genomes of all isolates
were sequenced with Illumina technology and assembled; eight were also sequenced using
Nanopore technology and hybrid-assembled. Fourteen different serotypes were
identified (using SeqSero, SISTR and SMART PCR) among the 38 isolates. AMR profiles were
determined using Sensititre MIC assays as well as surveyed in silico using KmerResistance and
ABRicate utilizing the ARG-Annot database. All isolates possessed the ampicillin resistance
gene ampH and at least one aminoglycoside resistance gene. Thirteen isolates also had
resistance phenotypes and/or genes encoding resistance to other clinically or agriculturally
relevant antibiotics. Populations of Salmonella in poultry litter and especially in stream
sediments impacted by agricultural runoff may constitute important environmental reservoirs
for antibiotic resistance genes.
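The in silico AMR survey above relies on screening assemblies with ABRicate. As a minimal sketch of the downstream bookkeeping, the Python below collapses a tab-separated ABRicate-style report into a per-isolate gene list. It assumes the report's header row includes the columns `#FILE`, `GENE`, `%COVERAGE` and `%IDENTITY`; the 90% cutoffs and the example rows are illustrative assumptions, not the thresholds or data used in this study.

```python
import csv
import io
from collections import defaultdict


def summarize_abricate(tsv_text, min_cov=90.0, min_id=90.0):
    """Return {isolate: sorted gene names} for hits passing both thresholds.

    Assumes an ABRicate-style tab-separated report whose header row includes
    the columns #FILE, GENE, %COVERAGE and %IDENTITY.
    """
    genes_by_isolate = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if float(row["%COVERAGE"]) >= min_cov and float(row["%IDENTITY"]) >= min_id:
            genes_by_isolate[row["#FILE"]].add(row["GENE"])
    return {isolate: sorted(genes) for isolate, genes in genes_by_isolate.items()}


# Illustrative rows only (not data from the study); real reports have more columns.
report = (
    "#FILE\tGENE\t%COVERAGE\t%IDENTITY\n"
    "iso1.fasta\tampH\t100.00\t99.50\n"
    "iso1.fasta\taac(6')-Iaa\t100.00\t98.80\n"
    "iso2.fasta\tampH\t100.00\t99.10\n"
    "iso2.fasta\ttetA\t45.00\t99.00\n"
)
summary = summarize_abricate(report)
print(summary)
```

The partial `tetA` hit (45% coverage) is dropped by the coverage cutoff, which is why the thresholds should be chosen and reported explicitly.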
Poster Board #: 84
Title: Rapid Detection of Antimicrobial Resistance Markers in Bacillus anthracis by Nanopore Whole
Genome Sequencing
Author Block: A. S. Gargis1, B. Cherney1, A. Conley2, H. P. McLaughlin1, D. Sue1; 1Centers for
Disease Control and Prevention, Atlanta, GA, 2IHRC Inc. Georgia Tech Applied Bioinformatics
Laboratory, Atlanta, GA.
Abstract: Background: Bacillus anthracis is a spore-forming bacterium that causes anthrax in
humans. Antibiotics, including tetracyclines, fluoroquinolones, and β-lactams, are typically
effective for anthrax treatment. During an emergency, detecting antimicrobial resistance in
B. anthracis strains will be critical for an effective public health response. High-quality
whole genome sequencing data may be useful for detecting genetic engineering, plasmids
or antimicrobial resistance gene markers in implicated strain(s). Methods: Commercially
available DNA extraction kits, such as the MasterPure Complete DNA and RNA Purification Kit
(Epicentre) and QIAamp DNA Blood Mini Kit (Qiagen), can purify B. anthracis DNA, but have
not been evaluated for nanopore sequencing. Here, both kits were used to isolate gDNA from
agar-cultured cells of a select agent-excluded, attenuated B. anthracis strain (Sterne/pUTE29)
that is not susceptible to tetracycline. Sequencing libraries were prepared using the Rapid
Sequencing Kit (Oxford Nanopore Technologies). De novo assemblies were generated using a
custom pipeline (Albacore, Minimap/Miniasm, Racon, and Nanopolish). We assessed (1) DNA
quantity and quality, (2) whether purified DNA could be used for library preparation, (3)
sequence quality of MinION versus Illumina and PacBio data, (4) whether the virulence plasmid,
pXO1, and an introduced plasmid, pUTE29, could be assembled, and (5) whether the tetracycline
resistance gene (tetL) could be identified. Finally, MinION reads were retrospectively
downsampled to assess the time required for de novo assembly. Results: DNA of sufficient
quality and quantity for MinION sequencing was obtained from both kits, but an optimized
cell lysis with extended incubation (1 h) was required. Nanopore sequencing yielded 3-6 Gb of
data and each de novo assembly covered >99% of the reference sequence (Sterne). Large
(pXO1, 181.7 kb) and small (pUTE29, 7.3 kb) plasmids, as well as tetL were detected in
MinION, Illumina, and PacBio data. Mapping MinION reads to the Sterne reference revealed
numerous (4,000 to 5,000) small indels, especially in homopolymer regions, compared to
Illumina and PacBio assemblies. Downsampling showed only minor improvements in assembly
quality beyond 100,000 reads. DNA extraction, library preparation, sequencing, and assembly
were completed in ~10 h from a
pure B. anthracis isolate. Conclusions: With an optimized lysis step, both extraction methods
produced B. anthracis DNA of sufficient quality and quantity for the Rapid Sequencing Kit.
MinION reads were error-prone compared to Illumina and PacBio data, particularly in
homopolymer regions. However, plasmids and an introduced antimicrobial resistance
gene were assembled from MinION data. The ability to de novo assemble a B.
anthracis genome from a culture isolate in ~10 h using nanopore sequencing could contribute
to an expedited public health response.
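The abstract does not say how reads were chosen for the retrospective downsampling. One plausible way to emulate shorter MinION runs, sketched here under that assumption, is to keep reads in acquisition order and truncate at N reads before re-running the assembly pipeline. The sketch assumes 4-line FASTQ records whose headers carry a `start_time=` field, as Albacore-era FASTQ headers typically do; the assembly step itself is not shown.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from 4-line FASTQ records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def header_fields(header):
    """Parse 'key=value' tokens after the read ID in a FASTQ header line."""
    fields = {}
    for token in header.split()[1:]:
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


def first_n_by_time(records, n):
    """Return the n earliest reads, ordered by their start_time header field.

    ISO 8601 timestamps written in one consistent format sort correctly as strings.
    """
    ordered = sorted(records, key=lambda rec: header_fields(rec[0])["start_time"])
    return ordered[:n]


fastq_lines = [
    "@read_b start_time=2018-05-01T10:05:00Z\n", "ACGT\n", "+\n", "!!!!\n",
    "@read_a start_time=2018-05-01T10:00:00Z\n", "GGCC\n", "+\n", "####\n",
    "@read_c start_time=2018-05-01T10:09:00Z\n", "TTAA\n", "+\n", "$$$$\n",
]
subset = first_n_by_time(read_fastq(fastq_lines), 2)
print([rec[0].split()[0] for rec in subset])  # ['@read_a', '@read_b']
```

Truncating by time, rather than sampling at random, ties each subset to a run duration, which is what a time-to-assembly estimate needs.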
Session: Poster Session B
Poster Board #: 85
Title: Characterization of Microbiota in Cerebrospinal Fluid (CSF) from Patients with CSF Shunt
Infections Using Shotgun Sequencing
Author Block: P. Hodor1, C. Pope2, K. Whitlock1, L. Hoffman2, T. Simon1; 1Seattle Children's
Hospital, Seattle, WA, 2University of Washington, Seattle, WA.
Abstract: Background: Treatment of hydrocephalus consists of surgical placement of a CSF shunt.
Approximately 10% of patients develop a shunt infection within 1 year of CSF shunt
placement, and approximately 20% of patients with a first infection develop reinfection. It is
not known whether reinfections are caused by an organism previously present in the host or are
independent infection events. Identification of microorganisms associated with CSF shunt
infections has traditionally relied on culture methods, but high-throughput sequencing of 16S
ribosomal RNA has been adopted more recently to identify the bacterial species present. Here
we present the results of a pilot study using whole genome shotgun sequencing and evaluate the
additional resolution this method provides to our understanding of CSF shunt
infection. Methods: CSF samples were obtained from 6 patients who each had 2 infections, with
one sample collected near the beginning and another near the end of each infection. The V4
region of 16S ribosomal RNA was amplified and sequenced. In parallel, DNA was processed
in duplicate by whole genome amplification (WGA) followed by shotgun sequencing.
Taxonomic assignments of sequences obtained by 16S and WGA were compared against each
other and with microbiological culture results. Non-human sequences from WGA were
assembled and compared against known genomes from similar species. Results: Taxonomic
classification of bacteria observed by 16S and WGA was consistent with that obtained in the
CSF cultures at the beginning of each infection episode. However, 16S-based assignments
resolved taxa only to the genus level, and in one case (Klebsiella pneumoniae) 16S identified only the
family (Enterobacteriaceae). WGA was able to identify all species detected in culture.
Furthermore, WGA provided additional insights into the composition of the samples, such as
showing that human DNA constituted 76 to 99% of the reads, identifying outlier samples of
questionable quality, and detecting 2 cases of significant viral load. A few CSF samples
produced a sufficiently large number of bacterial reads to allow partial assembly of the
predominant species and comparison to known genomes to identify the closest matching
strain. Conclusion: This proof of concept study showed the value of shotgun sequencing in
studying the microbiota of CSF shunt infections. Not only were the results consistent with
culture-based methods, but additional insights could be gained regarding strain identity of
predominant bacteria and identification of viral loads. This approach opens the door to a
detailed understanding of the progression of infections and reinfections.
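The resolution difference reported above (16S stopping at genus or family, shotgun WGA reaching species) can be expressed as the deepest taxonomic rank at which a method's assignment agrees with the culture result. A minimal sketch, assuming each lineage is a simple rank-to-name dictionary; the rank list is truncated for brevity and the example lineages are illustrative.

```python
# Truncated rank list for brevity; a full lineage would also include higher ranks.
RANKS = ["family", "genus", "species"]


def deepest_rank(lineage):
    """Return the most specific rank with an assigned name, or None."""
    resolved = [rank for rank in RANKS if lineage.get(rank)]
    return resolved[-1] if resolved else None


def concordant_to(lineage_a, lineage_b):
    """Return the deepest rank, walking down from the top, at which both lineages agree."""
    shared = None
    for rank in RANKS:
        a, b = lineage_a.get(rank), lineage_b.get(rank)
        if not (a and b and a == b):
            break
        shared = rank
    return shared


culture = {"family": "Enterobacteriaceae", "genus": "Klebsiella",
           "species": "Klebsiella pneumoniae"}
amplicon_16s = {"family": "Enterobacteriaceae"}  # resolved only to family
shotgun = dict(culture)

print(deepest_rank(amplicon_16s), concordant_to(amplicon_16s, culture))  # family family
print(concordant_to(shotgun, culture))  # species
```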
Poster Board #: 86
Title: GenEpiO and FoodOn: Enabling Data Interoperability for Infectious Disease Surveillance,
Investigation and Control
Author Block: E. Griffiths1, D. Dooley2, G. Gosal2, N. Alikhan3, M. Sanchez4, T. Matthews5,
A. Pertkau5, J. Adam5, R. Timme4, M. Graham5, G. Van Domselaar5, F. Brinkman1, W. Hsiao6;
1Simon Fraser University, Vancouver, BC, CANADA, 2University of British Columbia, Vancouver,
BC, CANADA, 3University of Warwick, Coventry, UNITED KINGDOM, 4US Food and Drug
Administration, College Park, MD, 5Public Health Agency of Canada, Winnipeg, MB, CANADA,
6BC Centre for Disease Control Public Health Laboratory, Vancouver, BC, CANADA.
Abstract: Background: The ability to share data between organizations is crucial for global,
real-time infectious disease surveillance and investigation. Reliable capture and harmonization
of whole genome sequencing (WGS) contextual information (sample source, experimental and
bioinformatics methods, lab, clinical and epidemiological data) is critical for the interpretation
of WGS results used for decision making in health crises. These data are often recorded using
free text and institution-specific data dictionaries, requiring time-consuming and error-prone
transformation before they can be used in investigations. Ontologies provide hierarchies of
well-defined, standardized vocabulary that enable comparisons at different levels of granularity;
universal IDs for disambiguating terms; built-in logic that enhances querying power; and
synonyms that let institutions use preferred terms while linking to a standard,
improving interoperability. We have created two ontologies to better harmonize and
integrate genomics data into food microbiology and public health workflows: the
Genomic Epidemiology Ontology (GenEpiO) and the Food Ontology (FoodOn). Here, we
describe the development of tools that facilitate ontology implementation within the food
safety and public health communities. Methods: User engagement activities identified
vocabulary gaps, user needs, and use cases. Two ontology-driven tools were created to
enable mapping of food microbiology and public health data to standardized terms. LexMapr,
a Python-based, hybrid lexicon and rule-based system, was developed to address the many
challenges in processing short textual data. Test datasets of metadata were mapped to
GenEpiO and FoodOn to establish rules for natural language processing. Also, a Linux-based,
open source, Python-driven web portal called the Genomic Epidemiology Entity Mart (GEEM)
was developed to better enable the exchange of ontology-driven data specifications between
agencies. Results: These tools and resources are currently being tested and evaluated for use
in key databases and platforms for typing and tracking foodborne pathogens - Enterobase,
GenomeTrakr and IRIDA. LexMapr testing indicates that the software has a high level of
sensitivity in data clean-up, text matching and concept mapping. Furthermore, data
specifications were created using GEEM for different applications, including an International
Organization for Standardization (ISO) standard for the implementation of WGS for food
microbiology. The International GenEpiO Consortium (>80 members, from 15 countries) was
also established to create consensus and uptake. Conclusions: The improved inferencing and
computability of harmonized data provided by our resources and tools can enhance
communication and analyses, resulting in faster hypothesis generation during investigations,
and ultimately, better health outcomes.
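LexMapr is described as a hybrid lexicon and rule-based system. The Python below is not LexMapr's code; it only sketches the lexicon-plus-synonym idea, in which free-text metadata is normalized and matched against synonyms that point to a preferred label and a term ID. The lexicon entries and the `EXAMPLE:` identifiers are hypothetical placeholders, not real GenEpiO or FoodOn terms.

```python
import re

# Hypothetical mini-lexicon: synonym -> (preferred label, term ID).
# IDs are placeholders, not real GenEpiO/FoodOn identifiers.
LEXICON = {
    "chicken breast": ("chicken breast", "EXAMPLE:0001"),
    "chkn breast": ("chicken breast", "EXAMPLE:0001"),
    "ground beef": ("ground beef", "EXAMPLE:0002"),
    "minced beef": ("ground beef", "EXAMPLE:0002"),
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def map_to_terms(free_text):
    """Return (label, id) pairs for lexicon phrases found in the text, longest first."""
    cleaned = f" {normalize(free_text)} "
    matches = []
    for phrase in sorted(LEXICON, key=len, reverse=True):
        needle = f" {phrase} "  # pad with spaces so only whole words match
        if needle in cleaned:
            matches.append(LEXICON[phrase])
            cleaned = cleaned.replace(needle, " ")  # consume the matched span
    return matches


print(map_to_terms("Raw chkn. breast (retail)"))  # [('chicken breast', 'EXAMPLE:0001')]
```

Trying longer phrases first keeps a specific multi-word match from being split into weaker partial matches.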
Poster Board #: 87
Title: Data Parsing: Efficiency, Organization, and Analysis of NGS Data for Public Health Labs
Author Block: C. Hanigan; APHL, Silver Spring, MD.
Abstract: The APHL and CDC Influenza Division began using the APHL Informatics Messaging System
(AIMS) for transmission and analysis of next generation sequencing (NGS) data generated by
the National Influenza Surveillance Reference Centers on behalf of CDC. With the success of
this endeavor, laboratories have sought to leverage the data transmission and analysis
capabilities for pathogens beyond influenza. This expansion has created the need for a
“data parser” that can manage and route files, based on data submitter, pathogen, and
project, to the appropriate data bucket and analysis pipeline. Named for its ability to slice
AMD files and runs, the data parser is called Ninja. The Ninja software organizes and directs
incoming NGS data based on pathogens, submitters, and projects to different services, e.g.,
placing the files into directories, rerouting files to another site, or
notifying other processes about the availability of NGS data. The Ninja is able to sort and
direct NGS data from multiple pathogens and projects into the appropriate data bucket and
data analysis pipeline, based on the information users put on the sample sheet that will cue
the Ninja to parse the NGS data into the appropriate places. Public health laboratories have a
variety of hurdles in their use of NGS for disease detection and surveillance. Limited
workforce and bioinformatics capacity are two of the biggest challenges. As public health
laboratories expand their use of NGS, tools like Ninja will be instrumental in allowing them to
use this technology more broadly. Ninja's technical capabilities allow public health labs to increase
their efficiency and expand their use of NGS to a variety of pathogens.
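Ninja's internals are not described beyond routing on submitter, pathogen and project from the sample sheet. A minimal sketch of that kind of table-driven routing, assuming a simplified CSV sample sheet with hypothetical `Sample_ID`, `Submitter`, `Pathogen` and `Project` columns and hypothetical destination names:

```python
import csv
import io

# Hypothetical routing table: (pathogen, project) -> destination bucket/pipeline.
ROUTES = {
    ("influenza", "surveillance"): "flu-surveillance-pipeline",
    ("salmonella", "outbreak"): "enteric-outbreak-bucket",
}
DEFAULT_ROUTE = "manual-review"  # unrecognized combinations are held for a person to check


def route_samples(sample_sheet_csv):
    """Map each sample ID to '<submitter>/<destination>' using sample-sheet columns."""
    destinations = {}
    for row in csv.DictReader(io.StringIO(sample_sheet_csv)):
        key = (row["Pathogen"].strip().lower(), row["Project"].strip().lower())
        destination = ROUTES.get(key, DEFAULT_ROUTE)
        destinations[row["Sample_ID"]] = f"{row['Submitter'].strip()}/{destination}"
    return destinations


sheet = (
    "Sample_ID,Submitter,Pathogen,Project\n"
    "S1,LabA,Influenza,Surveillance\n"
    "S2,LabB,Salmonella,Outbreak\n"
    "S3,LabA,Norovirus,Pilot\n"
)
print(route_samples(sheet))
```

Sending unrecognized combinations to a review queue, rather than guessing a destination, keeps misrouted data visible.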
Poster Board #: 88
Title: Prediction of Antibiotic Minimum Inhibitory Concentration from Bacterial Whole Genome
Sequence Data in Klebsiella pneumoniae
Author Block: S. W. Long1, R. J. Olsen1, J. J. Davis2, M. Nguyen2, F. Xia2, T. Brettin2,
J. M. Musser1; 1Houston Methodist Hospital, Houston, TX, 2University of Chicago, Chicago, IL.
Abstract: Introduction: Antimicrobial resistance testing has been a mainstay of clinical
microbiology since the early 1970s. Phenotypic determination of minimum inhibitory
concentration (MIC) is culture-dependent, requiring hours of growth before rendering an
actionable result. Multiple studies have shown that decreasing the time from initial sample
collection to an actionable, clinically relevant susceptibility result has several patient
benefits, including decreased length of stay, decreased mortality, and decreased costs. Whole
genome sequencing (WGS) has continued to decrease in cost while delivering faster results,
proving useful for molecular microbiology. Recent advances in machine learning make it
possible to develop classifiers that use bacterial WGS data to predict MIC within one dilution
for many antibiotics. Methods: We used the whole genome sequences of 1,668 K. pneumoniae
isolates from patients, all of which had phenotypic antimicrobial susceptibility testing
performed by BD Phoenix. We used these data to build classifiers using an XGBoost-based
machine learning model to predict minimum
inhibitory concentrations (MICs) for 20 antibiotics. These predictions were validated against a
test set of isolates not included in the training set. Results: The overall accuracy of the model,
within ±1 two-fold dilution factor, is 92%. Individual accuracies are ≥90% for 15/20 antibiotics
tested. We show that the MICs predicted by the model correlate with known antimicrobial
resistance genes. Conclusion: Importantly, the genome-wide approach described offers a
method to predict MICs without knowledge of the underlying gene content. This study shows
that machine learning can be used to build a complete in silico MIC prediction panel for K.
pneumoniae and provides a framework for building MIC prediction models for other
pathogenic bacteria. The ability to rapidly sequence bacterial genomes and then predict an
MIC and resulting phenotype hours before culture-based methods have completed represents a
potentially major advance for patient care and for guiding empiric therapy.
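The headline accuracy counts a prediction as correct when it falls within ±1 two-fold dilution of the phenotypic MIC, a criterion commonly called essential agreement. A minimal sketch of that metric; the example MIC pairs are illustrative, not study data, and censored values such as "≤0.25" would need separate handling.

```python
import math


def within_one_dilution(predicted, observed):
    """True if two MICs in the same units differ by at most one two-fold dilution."""
    return abs(math.log2(predicted) - math.log2(observed)) <= 1 + 1e-9


def essential_agreement(pairs):
    """Fraction of (predicted, observed) MIC pairs within ±1 two-fold dilution."""
    hits = sum(within_one_dilution(p, o) for p, o in pairs)
    return hits / len(pairs)


# Illustrative MIC pairs in mcg/mL (not data from the study).
pairs = [(2, 2), (4, 2), (0.5, 2), (16, 8)]
print(essential_agreement(pairs))  # 0.75
```

Working in log2 space turns "one two-fold dilution" into a distance of 1, so the same check applies at any point on the dilution series.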
Poster Board #: 89
Title: Long-range Sequencing to Identify Multispecies blaKPC-harboring IncN Plasmid Carriage at a
New York City Hospital
Author Block: A. Gomez-Simmonds, M. K. Annavajhala, M. J. Giddins, S. L. Stump, A. Uhlemann;
Columbia University Medical Center, New York, NY.
Abstract: Introduction: Infections caused by carbapenem-resistant Enterobacteriaceae (CRE) are
associated with high mortality due to broad-spectrum antibiotic resistance. The
plasmid-encoded Klebsiella pneumoniae carbapenemase (KPC) is the dominant mechanism of
carbapenem resistance in the US. Both clonal expansion and horizontal transfer have been
implicated in the spread of CRE. However, challenges in sequencing plasmids have limited the
ability to assign blaKPC to specific plasmid backbones and thus to assess plasmid-mediated
blaKPC transmission. Focusing on broad-host-range IncN plasmids, which we previously detected
in multiple strains of Enterobacter cloacae complex, we used MinION long-range sequencing to
characterize and compare blaKPC-harboring plasmids in CRE clinical isolates collected at a
tertiary care center where CRE are endemic. Methods: CRE isolates collected between 2010 and
2017 were identified on the basis of phenotypic resistance to meropenem (MIC ≥2 mcg/mL) and
sequenced using Illumina (n=469). blaKPC subtypes, multilocus sequence types, and plasmid
replicon types were detected by SRST2 using the ARG-ANNOT, PubMLST, and PlasmidFinder
databases, respectively. A subset of isolates found to have blaKPC-3 and a plasmid profile
including an IncN replicon (n=15; 11 K. pneumoniae, 4 E. cloacae) underwent plasmid DNA
extraction (Qiagen) followed by long-range sequencing using the MinION (Oxford Nanopore).
Hybrid plasmid assemblies were generated using SPAdes and visually curated and compared
using Geneious (Biomatters). Results: We successfully
localized both a plasmid replicon gene and blaKPC to a single contig for 8/15 isolates, with a
median Illumina housekeeping-gene read depth of 27.7 (IQR 26.7-134.2) and a median of 4,203
curated MinION sequencing reads (IQR 3,512.5-7,013.5). In 2 additional isolates, 2-3 large
contigs mapped closely to a local internal reference plasmid (pNR0276, NCBI accession
number PNXT00000000). blaKPC-3 was found on IncN plasmids in 6 isolates, comprising 3 K.
pneumoniae from 3 different STs and 3 E. cloacae from 2 STs, with plasmids ranging in length
from 48,506 to 76,249 bp. In 4 K. pneumoniae isolates, blaKPC-3 was found on IncFII plasmids. Alignment of
IncN plasmid sequences to pNR0276 indicated that 5/6 plasmids shared at least 90% pairwise
identity over the full length of pNR0276, while one isolate harbored a truncated plasmid
sharing an ~40 kb core region with pNR0276. Conclusions: Long-range sequencing enabled
identification of an established blaKPC-3-harboring IncN plasmid backbone in carbapenem-
resistant K. pneumoniae and E. cloacae at our hospital. Further study is needed to determine
the extent of dissemination of IncN and other blaKPC-harboring plasmids among
Enterobacteriaceae. Long-range sequencing has the potential to greatly facilitate
comprehensive plasmid sequencing and to demonstrate the important contribution of plasmids
to the dissemination of CRE.
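The key result above is whether a plasmid replicon and blaKPC land on the same contig. A minimal sketch of that co-localization check, assuming gene-screening hits have already been reduced to (contig, gene) pairs. The `blaKPC` and `Inc` name prefixes are assumptions, and replicon families whose names do not start with `Inc` (for example Col-type replicons) would need their own prefixes.

```python
from collections import defaultdict


def colocalized_contigs(hits, gene_prefix="blaKPC", replicon_prefix="Inc"):
    """Return {contig: (kpc_genes, replicons)} for contigs carrying both kinds of hit."""
    genes_by_contig = defaultdict(set)
    for contig, gene in hits:
        genes_by_contig[contig].add(gene)
    result = {}
    for contig, genes in genes_by_contig.items():
        kpc = sorted(g for g in genes if g.startswith(gene_prefix))
        replicons = sorted(g for g in genes if g.startswith(replicon_prefix))
        if kpc and replicons:
            result[contig] = (kpc, replicons)
    return result


# Illustrative hits for one hybrid assembly (not study data).
hits = [
    ("contig_1", "blaKPC-3"), ("contig_1", "IncN_1"),
    ("contig_2", "IncFII(K)_1"), ("contig_3", "blaKPC-3"),
]
print(colocalized_contigs(hits))  # {'contig_1': (['blaKPC-3'], ['IncN_1'])}
```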
Poster Board #: 90
Title: Pan-Synteny Graphs: Understanding Rearrangements
Author Block: A. Warren; Biocomplexity Institute at Virginia Tech, Blacksburg, VA.
Abstract: We present pan-synteny graphs, a multiple whole genome alignment model for
understanding genome rearrangements, together with a graphical interface for browsing and
understanding complex genome relationships. The visualization and interaction techniques
demonstrate a powerful new kind of genome browser capable of summarizing information
across hundreds of genomes. This effort touches on several research fronts: graph
representation of genomes and their alignments, synteny block analysis, whole genome
sequence alignment, pan-genome analysis, multiple sequence alignment, and genome
rearrangement analysis. Pan-synteny graphs represent a fundamentally new strategy for
comparing thousands of bacterial genomes in a scalable manner. Graph creation also
identifies relative evolutionary events such as inversion, translocation, deletion, and
insertion. Although this approach was originally developed from a pan-genome perspective for
prokaryotes, we are excited about its applicability to a wide range of topics. Algorithmically
novel elements include the contextualization of synteny analysis both between and within
multi-contig genomes. We also believe the algorithmic approach for discovering collision
points has great value in recognizing evolutionary relationships among a group of
genomes. Pan-synteny graphs harness the information in pre-existing family databases, e.g.,
COGs. We will demonstrate how this information makes model construction more resilient
to distant and complex evolutionary relationships than existing tools such as Mauve and
Harvest. This comparative graphical model also serves as a framework for analyzing
incomplete genomes. We hope to show that the graph abstraction and layout algorithm not
only make the resulting model approachable in terms of human cognition but also represent
a step forward in interactive comparative genomics.
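The abstract does not spell out how pan-synteny graphs are built. As a generic illustration only, not the author's algorithm, the sketch below treats precomputed gene-family IDs (such as COG assignments) as nodes and records which genomes support each adjacency. Edges not shared by every genome mark candidate rearrangement breakpoints, and no adjacency is formed across contig ends, which matters for multi-contig genomes.

```python
from collections import defaultdict


def adjacency_graph(genomes):
    """Build an undirected gene-family adjacency graph across genomes.

    genomes maps a genome name to a list of contigs, each an ordered list of
    gene-family IDs. Each edge is an unordered pair of families and stores the
    set of genomes in which the two families are adjacent on the same contig.
    """
    edges = defaultdict(set)
    for name, contigs in genomes.items():
        for contig in contigs:  # no edges are created across contig boundaries
            for left, right in zip(contig, contig[1:]):
                edges[frozenset((left, right))].add(name)
    return edges


def variable_edges(edges, n_genomes):
    """Return adjacencies not shared by all genomes (candidate rearrangement breakpoints)."""
    return {edge for edge, support in edges.items() if len(support) < n_genomes}


# g2 carries an inversion of the B-C segment relative to g1.
genomes = {"g1": [["A", "B", "C", "D"]], "g2": [["A", "C", "B", "D"]]}
graph = adjacency_graph(genomes)
print(sorted(sorted(e) for e in variable_edges(graph, len(genomes))))
# [['A', 'B'], ['A', 'C'], ['B', 'D'], ['C', 'D']]
```

A single inversion leaves the internal B-C adjacency intact but replaces both flanking adjacencies, which is why it appears as four genome-specific edges.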
Poster Board #: 91
Title: Whose Lab is it, Anyway? --- Teaching Lab-specific Biases to a Metagenomics Taxonomy
Classifier
Author Block: J. A. Russell1, A. Shteyman1, D. Yarmosh1, P. Davis1, P. Li2, K. Davenport2,
P. Chain2, J. Bagnoli1; 1MRIGlobal, Gaithersburg, MD, 2Los Alamos National Laboratory,
Los Alamos, NM.
Abstract: Metagenomics is emerging as an important tool in biosurveillance, public health, and
clinical applications. However, ease of use for execution and data analysis remains a primary
barrier to the full adoption of metagenomics in applied health and forensics settings. Here,
we present PanGIA (Pan-Genomics for Infectious Agents), a novel framework for hosting,
processing, analyzing, and reporting read-mapping data from metagenomics samples that can
be run on commodity computer hardware. PanGIA was developed to address existing gaps
that may preclude clinicians, medical technicians, forensics personnel, or other non-expert
end-users from routinely leveraging metagenomics data for their needs. PanGIA is primarily
meant for the detection and discovery of pathogenic microorganisms from clinical and
environmental metagenomics data. PanGIA provides two forms of confidence scoring: the
first pairs coverage data with ‘uniqueness’ information derived from each reference genome
for a stand-alone determination of confidence for each query sequence at each taxonomy
level, and the second compares a known ‘negative control’ profile with the profile of an
unknown sample to determine whether a taxon is significantly present ‘above background’. Data can be
quickly summarized within the graphical user interface to rapidly detect specific organisms of
interest. PanGIA’s default parameters were optimized using a Receiver Operating Characteristic
(ROC) curve approach on in silico-generated microbial communities.
Recent work, leveraging a machine-learning approach, has explored the capacity of PanGIA to
learn what known false-positives look like (across confidence score, normalized read
abundance, reference genome linear coverage, depth-of-coverage, RPKM, and other metrics)
such that PanGIA can more accurately distinguish potential false-positives in real-world
laboratory sequencing data. In this way, over time and with initial user input, PanGIA can
‘learn’, recognize, and account for the contaminants and biases inherent to whichever
laboratory it is placed in. This feature adds a unique level of confidence in discerning
unambiguous detection events from low-confidence hits and false positives.
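PanGIA's defaults were tuned with an ROC approach on simulated communities; the abstract does not name the optimization criterion. A minimal sketch, assuming Youden's J (TPR minus FPR) as the criterion and a single score cutoff; the scores and labels are illustrative.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, TPR, FPR) for calling 'present' when score >= threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((t, tp / positives, fp / negatives))
    return points


def best_threshold(scores, labels):
    """Pick the cutoff maximizing Youden's J = TPR - FPR over the observed scores."""
    points = roc_points(scores, labels, sorted(set(scores)))
    return max(points, key=lambda p: p[1] - p[2])[0]


# Illustrative scores for taxa in a simulated community; label 1 = truly present.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 0, 1, 0]
print(best_threshold(scores, labels))  # 0.6
```

Because simulated communities have known composition, every call can be labeled as a true or false positive, which is what makes this kind of tuning possible.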
Poster Board #: 92
Title: Development and Application of QuAISAR-H: A Bioinformatics Pipeline for Short Read
Sequences of Healthcare-Associated Pathogens
Author Block: R. A. Stanton, N. Vlachos, T. J. de Man, A. Lawsin, A. Laufer Halpin; Centers for
Disease Control and Prevention, Decatur, GA.
Abstract: The application of whole genome sequencing (WGS) to surveillance projects and outbreak
investigations of pathogens causing healthcare-associated infections (HAI) grants public
health microbiologists an unprecedented level of resolution for understanding the
epidemiology of antimicrobial resistance and transmission dynamics. However, the technical
expertise required for processing and analyzing WGS data is often a major obstacle in public
health laboratories, limiting the feasibility of implementing WGS on a wide scale.
Furthermore, healthcare-associated pathogens are uniquely challenging because of their
diversity; our group alone has sequenced more than 50 different species causing HAIs. Finally,
the lack of established standards for performing sequence analysis has left a gap in public
health practice. To address these shortcomings, we have developed QuAISAR-H: a specialized
pipeline for Quality control, Assembly, species Identification, Sequence typing, Annotation,
and Resistance mechanisms for Healthcare-associated pathogens. QuAISAR-H currently runs
on the CDC’s high performance computing cluster, utilizing open source software and custom
scripts. It accepts, and is optimized for, raw reads generated by Illumina short read sequencers
and initially performs a variety of quality control assessments, including species identification
and contamination checks using Kraken and Gottcha. Genome assemblies are generated
using SPAdes, classified using MLST definitions and functionally annotated by Prokka.
Antimicrobial resistance genes are identified using multiple databases from both the raw
reads (using SRST2) and the assemblies (using c-SSTAR). The output assemblies and high-
quality, cleaned reads generated by QuAISAR-H can be used for downstream phylogenetic
analysis. The implementation of QuAISAR-H has allowed us to move towards a more
standardized approach of analyzing WGS data from HAI pathogens. We have iterated and
streamlined the pipeline through processing more than 3400 isolates sequenced internally
and externally, including those from 45 HAI outbreaks. A graphical user interface to provide
public health laboratories across the country with direct and easy access to QuAISAR-H is
currently under development and will be available through the CDC’s Office of Advanced
Molecular Detection online portal. This will enhance not only local capacity but also national
efficiency in utilizing WGS data for HAI surveillance and investigation.
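Once allele numbers have been called at each locus, MLST typing reduces to looking up the allele profile in a scheme's profile table. A minimal sketch of that lookup, using a hypothetical three-locus scheme rather than any real PubMLST scheme; QuAISAR-H's own typing step is not described in this detail.

```python
# Hypothetical three-locus scheme for illustration only; classic MLST schemes
# typically use seven housekeeping loci and published profile tables.
LOCI = ("locusA", "locusB", "locusC")
PROFILES = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 1, 4): 15,
}


def sequence_type(allele_calls, loci=LOCI, profiles=PROFILES):
    """Return the ST for a complete allele profile.

    Returns None if any locus lacks an allele call, and 'novel' if the full
    profile is not in the table.
    """
    if any(allele_calls.get(locus) is None for locus in loci):
        return None
    profile = tuple(allele_calls[locus] for locus in loci)
    return profiles.get(profile, "novel")


print(sequence_type({"locusA": 1, "locusB": 2, "locusC": 1}))  # 2
print(sequence_type({"locusA": 9, "locusB": 2, "locusC": 1}))  # novel
print(sequence_type({"locusA": 1, "locusB": 2}))               # None
```

Distinguishing "missing call" from "novel profile" matters in practice: the first usually signals an assembly or coverage problem, the second a genuinely new type.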
Poster Board #: 93
Title: GenomeTrakr Proficiency Testing for Foodborne Pathogen Surveillance
Author Block: R. E. Timme, H. Rand, M. Leon, E. Strain, M. Allard, D. Roberson, J. Baugher;
US Food and Drug Administration, College Park, MD.
Abstract: Pathogen monitoring is becoming much more robust as sequencing technologies become
more affordable and accessible worldwide. This transition is especially apparent in the field
of food safety, which has demonstrated how whole genome sequencing (WGS) can be used
on a global scale to protect public health. GenomeTrakr coordinates the WGS performed by
public health agencies and other partners by providing a public database with real-time
cluster analysis for foodborne pathogen surveillance. As growing numbers of public health
labs use WGS technology to support enforcement decisions, it is essential to have confidence
in the quality of the data being used and of the downstream data analyses that guide these
decisions. Routine proficiency tests, such as the one described here, have an important role in
ensuring the validity of both data and procedures. GenomeTrakr ran an annual internal
proficiency test through 2015; it is now harmonized with PulseNet. In 2015 the
GenomeTrakr proficiency test consisted of 8 isolates of common foodborne pathogens;
participating laboratories were required to follow a protocol to culture these and perform
WGS. Resulting sequence data were evaluated for proper annotation, sequence quality, and
applicability to downstream bioinformatics analyses. Overall, this exercise revealed the
degree of variation that should be expected in sequence data produced across a diverse
network of laboratories. Illumina MiSeq sequence data collected for the same set of strains
across 21 different labs exhibited high reproducibility, with only a narrow range of
technical and biological variance. The numbers of SNPs reported for sequencing runs of the
same isolates across multiple labs support the robustness of our cluster analysis pipeline, in
that each isolate, cultured and resequenced multiple times in multiple places, is
easily identifiable as originating from the same source. Subsequent proficiency tests confirm
these results.
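Cross-laboratory agreement of this kind is often summarized as pairwise SNP distances between isolates on a shared alignment. A minimal sketch, assuming sequences are already aligned to equal length and that `N` and gap characters should be ignored; the sequences are toy data, and GenomeTrakr's actual pipeline is not reproduced here.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ, skipping ambiguous/gap sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def distance_matrix(aligned):
    """Return {(name1, name2): SNP distance} for every pair in an alignment dict."""
    return {
        (n1, n2): snp_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Toy alignment: one isolate resequenced in three labs, plus an unrelated isolate.
aligned = {
    "labA_iso1": "ACGTACGTAC",
    "labB_iso1": "ACGTACGTAC",
    "labC_iso1": "ACGTNCGTAC",
    "other":     "ACCTACGAAC",
}
for pair, dist in distance_matrix(aligned).items():
    print(pair, dist)
```

Skipping `N` sites keeps a low-coverage position in one lab's run from being counted as a real difference.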
Poster Board #: 94
Title: Impact of Antibiotic and Innate Immune Pressures on Enterococcal Adaptation in the Human
Bloodstream
Author Block: D. Van Tyne1, A. L. Manson2, M. M. Huycke3, J. Karanicolas4, A. M. Earl2,
M. S. Gilmore5; 1University of Pittsburgh School of Medicine, Pittsburgh, PA, 2Broad Institute,
Cambridge, MA, 3University of Oklahoma Health Sciences Center, Oklahoma City, OK, 4Fox Chase
Cancer Center, Philadelphia, PA, 5Harvard Medical School, Massachusetts Eye and Ear Infirmary,
Boston, MA.
Abstract: Multidrug-resistant enterococci emerged in the early 1980s and are now among the
leading causes of drug-resistant bacterial infection worldwide. We used functional genomics to
study one of the earliest outbreaks of multidrug-resistant Enterococcus faecalis bacteremia, to
determine how a clonal lineage adapted to grow and survive in the human bloodstream.
Genome sequence analysis of 62 closely related strains revealed a progression of increasingly
fixed mutations, as well as repeated independent occurrence of mutations in a relatively
small set of genes. The most frequently encountered independent mutations we observed
occurred in a novel pathway that rendered E. faecalis better able to withstand antibiotic
pressure and innate defenses in the bloodstream, and were associated with changes in cell
surface-associated polysaccharides. A shift in mutation pattern then occurred, which
corresponded to the introduction of carbapenem antibiotics in 1987. This work uncovers new
pathways that allow enterococci to survive the transition from the gut into the bloodstream,
positioning them to cause infections associated with high mortality.
Poster Board #: 95
Title: Characterization of Tissue-associated Metagenomes Using Selective Nanopore Sequencing
Author Block: J. Wang, C. Jones, T. Furey, S. Sheikh, O. Finkel, J. Dangl; University of North
Carolina at Chapel Hill, Chapel Hill, NC.
Abstract: While 16S rDNA profiling has been the standard approach to characterizing
host-associated microbiome communities, it produces taxonomic classifications practically
limited to the genus level and suffers from PCR and other biases. Whole metagenome sequencing
produces more specific taxonomic information and an estimate of the genetic content describing
the functional capacity of a microbial community. However, metagenomic studies are expensive
and require high sequencing depth, especially for tissue-associated microbiomes, where host
DNA makes up the vast majority of the sequenced reads (90-99+%). We describe a sequencing
and analysis approach using Oxford Nanopore sequencers that enables real-time enrichment or
depletion of specific sequences. Using the "read-until" functionality of the MinION sequencer,
we perform basecalling and alignment of partial read sequences in real time on a distributed
cloud computing platform and eject reads belonging to the host genome, thereby increasing the
relative and absolute abundance of microbial sequences. This approach is essentially unbiased
compared to existing methods for preferential cell lysis and
DNA extraction, and produces an actual increase in sequenced microbial DNA unlike post-
sequencing filtering. We demonstrate the power of this approach by depleting host (human)
DNA in a mock host-microbial metagenome, and in a colon biopsy sample to describe the
composition and function of the mucosa-associated microbiome in the colon. We observed a
two to six-fold increase in the relative abundance of microbial sequences relative to host,
depending on the initial proportion. This selection method produces no detectable false-
positive depletion (of microbial sequences) or selection bias in the retained reads. We
additionally propose a simple and effective method for accurately classifying observed long
reads as host, or to their appropriate species/strian-level taxa. These host-depleted
metagenome experiments - with novel methods to efficiently classify long, error-prone reads
- demonstrate the power of tightly coupled sequencing and informatics protocols to enable
efficient investigation of disease-relevant tissue-associated microbiota.
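The eject/keep decision at the heart of read-until can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The abstract describes basecalling and alignment of partial reads, and the sketch swaps in exact k-mer matching against a host k-mer set instead. The function names, the k-mer size of 11, the 200-base minimum prefix and the 50% host-fraction cutoff are all hypothetical choices.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_host_index(host_seqs: Iterable[str], k: int = 11) -> Set[str]:
    """Collect every k-mer of the host reference sequences into one set."""
    index: Set[str] = set()
    for seq in host_seqs:
        index |= kmers(seq, k)
    return index


def read_until_decision(prefix: str, host_index: Set[str], k: int = 11,
                        min_len: int = 200, host_frac: float = 0.5) -> str:
    """Decide what to do with the basecalled prefix of a read still in the pore.

    Returns "continue" while the prefix is too short to judge, "eject" when
    at least host_frac of its k-mers occur in the host index, and "keep"
    otherwise (the read is presumed microbial and sequenced to completion).
    """
    if len(prefix) < min_len:
        return "continue"
    read_kmers = kmers(prefix, k)
    if not read_kmers:
        return "continue"
    hits = sum(1 for km in read_kmers if km in host_index)
    return "eject" if hits / len(read_kmers) >= host_frac else "keep"
```

Exact k-mer matching is fragile with error-prone nanopore reads and ignores the reverse strand. An aligner, as used in the abstract, tolerates both.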
Poster Board #: 96
Title: Clinical Whole-Genome Sequencing of Mycobacterium tuberculosis Complex Isolates: 2½ Years of Experience Analyzing, Reporting and Improving TB Testing in New York State
Author Block: K. Musser1, J. Shea1, P. Lapierre1, T. Halse1, J. Lemon2, J. Rakeman2, V. Escuyer1; 1Wadsworth Center, NYSDOH, Albany, NY, 2Public Health Laboratory, New York City Department of Health and Mental Hygiene, New York City, NY.
Abstract: Background: Mycobacterium tuberculosis (MTB) is an important pathogen, infecting more than a third of the world population. New York State (NYS) has the third highest number of cases of any US state. The cost and time associated with diagnostic testing and treatment of MTB can be considerable. Weeks to months are required to identify isolates, assess drug susceptibility, and generate molecular genotypes. Our laboratory developed and validated a comprehensive whole-genome sequencing (WGS) assay to characterize MTB complex (MTBC) isolates, replacing seven molecular tests. We implemented this testing in March 2016 and have continually monitored its performance, including turnaround time (TAT), success at resistance prediction, and high-resolution genotyping. Methods: The MTBC WGS assay comprises a novel DNA extraction, optimized library preparation, paired-end WGS, and an in-house developed bioinformatics pipeline, with numerous quality control steps incorporated throughout. Following DNA sequencing, the pipeline performs analysis using three principal components: modules for phylogenetic analysis, modules for taxonomic identification of the samples, and modules for SNP calling and resistance profiling. The results from all three components are combined into a final comprehensive report for each sample analyzed, which is reported through our LIMS. Results: To date we have tested 1634 MTBC strains from unique NYS patients, including patients from NYC. Of these, five members of the MTBC have been identified: 1560 M. tuberculosis, 27 M. bovis BCG, 26 M. bovis, 17 M. africanum, and 4 M. orygis. In silico spoligotypes were generated for 96.5% of strains tested, and strains found to be closely related (<20 SNPs genome-wide) were reported for epidemiological investigation. Resistance profiling showed 79.8% of the MTBC strains to be susceptible to eight drugs, 7.8% resistant to at least isoniazid, 2.4% multidrug resistant (MDR), and 0.12% extensively drug resistant (XDR). Compared with conventional phenotypic drug susceptibility testing (DST), our assay had an overall resistance predictive value of 94% and a susceptibility predictive value of 98%, based on >8000 phenotype-genotype comparisons. We have tracked TAT since implementation and reduced our initial 8-day TAT to 5 days from MTBC DNA extraction to report. With this TAT, genotypic resistance predictions are reported an average of 8 days earlier than first-line phenotypic DST. Conclusions: This clinical TB WGS assay provides comprehensive detection of drug resistance, identification to the MTBC member level, and typing for epidemiological investigations. An improved TB WGS pipeline, updated to analyze more samples at one time, is now in use. This assay continues to improve patient management and supports epidemiological investigations in NYS and NYC.
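The "<20 SNPs genome-wide" criterion for flagging closely related strains can be illustrated with a small grouping sketch. The abstract does not say how related strains are grouped, so the use of single linkage is an assumption. Under single linkage, a strain joins a group if it is within the threshold of any member. The function names and toy alignment are also hypothetical. Positions with an N or a gap in either sequence are skipped when counting SNPs.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences, skipping N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b)
               if x != y and x not in "N-" and y not in "N-")


def cluster_by_threshold(seqs: Dict[str, str], threshold: int = 20) -> List[List[str]]:
    """Single-linkage groups: two isolates are linked when distance < threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())
```

Single linkage can chain isolates together. Two strains 30 SNPs apart can share a group if an intermediate strain lies within 20 SNPs of both. This is one reason pairwise distances are usually reviewed alongside the groups.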
Poster Board #: 97
Title: NGS Applied to the Epidemiology of Influenza A Virus Diversity in Brazil
Author Block: A. B. Veiga1, T. Song2, T. G. Baccin1, T. S. Gregianini3, H. V. Bakel4, A. García-Sastre5, E. Ghedin2; 1Universidade Federal de Ciências da Saúde de Porto Alegre, Porto Alegre, BRAZIL, 2New York University, New York City, NY, 3Laboratório Central de Saúde Pública da Secretaria de Saúde do Estado do Rio Grande do Sul – LACEN/SES-RS, Porto Alegre, BRAZIL, 4Mount Sinai Hospital, New York City, NY, 5Icahn School of Medicine at Mount Sinai, New York City, NY.
Abstract: Systematic surveillance of seasonal influenza A viruses using next-generation sequencing (NGS) has the potential to contribute to early detection of novel influenza strains in the human population. In this study, we sequenced and analyzed clinical samples collected in Brazil between 2009 and 2016 from 220 individuals infected with pandemic H1N1 (H1N1pdm09) or H3N2 influenza A virus. Phylogenetic analyses show persistence of strains from one season to the next in Brazil, with introductions of new strains from globally circulating viruses. An analysis of single nucleotide variants (SNVs) in the NGS data reveals mixed infections with minor circulating strains that also appear to seed the next season. Some SNVs are located in antigenic sites of the hemagglutinin (HA), leading to changes in antigenicity in recent strains. For example, the non-synonymous mutation A538C in segment 4 of H1N1pdm09 (a K180Q substitution in the HA antigenic site) appeared in strains during the 2013-2014 influenza season in the Northern Hemisphere. However, the SNV analysis shows that minor variants carrying this mutation had been circulating as early as 2011. In 10 of the 220 infected individuals sampled, we also detected mixed subtype infections, which are considered rare in the human population. NGS combined with minor variant analysis proved to be a powerful surveillance tool for identifying mixed infections and strains with the potential to circulate in upcoming seasons.
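The link between the nucleotide change A538C and the K180Q substitution is plain codon arithmetic, sketched below. The sketch assumes that position 538 is counted from the first nucleotide of the HA coding sequence and that residue 180 is counted from the initiating methionine. The abstract does not state its numbering scheme, and HA numbering conventions differ. Because the reference codon is not given, both lysine codons are checked.

```python
from typing import Tuple

# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def locate(nt_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1


def mutate_codon(codon: str, pos_in_codon: int, alt: str) -> str:
    """Replace the base at a 1-based position within a codon."""
    return codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]


codon_number, pos_in_codon = locate(538)  # first base of codon 180
for ref_codon in ("AAA", "AAG"):  # the two lysine (K) codons
    alt_codon = mutate_codon(ref_codon, pos_in_codon, "C")
    print(f"{ref_codon}->{alt_codon}: "
          f"{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}")
```

Position 538 falls on the first base of codon 180. Either lysine codon therefore becomes a glutamine codon (CAA or CAG), so the printed substitution is K180Q in both cases.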
Poster Board #: 98
Title: NGS analysis methods for Illumina data while the sequencer is running
Author Block: S. H. Tausch1, T. P. Loka2, M. S. Lindner2, P. W. Dabrowski2, B. Strauch2, J. M. Schulze2, A. Andrusch2, A. Radonic2, A. Nitsche2, B. Renard2; 1German Federal Institute for Risk Assessment, Berlin, GERMANY, 2Robert Koch Institute, Berlin, GERMANY.
Abstract: Background: Foodborne infectious diseases remain a major cause of health problems across the globe. NGS is increasingly used in infectious disease outbreak analysis and food safety controls, creating a strong need for fast turnaround from sample arrival to analysis results. The runtime of data analysis software has decreased significantly. Even so, the overall turnaround time from sample arrival to interpretable results has remained nearly the same, because data production and data analysis are still carried out one after the other. To overcome this limitation, we developed a collection of tools for sequence analysis while the sequencer is still running. Methods: The presented methods include software for read mapping (HiLive; Lindner et al., 2017, doi:10.1093/bioinformatics/btw659), taxonomic classification (LiveKraken; Tausch et al., 2018, doi:10.1093/bioinformatics/bty433), privacy preservation (PriLive; Loka et al., 2018, doi:10.1093/bioinformatics/bty128), pathogen identification (PathoLive; Tausch et al.) and a workflow for SNP/variant calling (Loka et al.) while the sequencer is running. Results: We show that each of our tools produces results comparable or superior to those of established tools in the respective fields. At the end of a sequencing run, HiLive's accuracy (F1 = 0.761) is slightly higher than that of the other tested approaches (BWA: 0.760, Bowtie2: 0.742). LiveKraken performs identically to Kraken at the end of a full MiSeq run and reaches comparable accuracy after less than half of the run (F1 = 0.96 at cycle 80 of 216). PriLive filters human reads more accurately (F1 = 99.961%) than BMTagger (99.956%) and DeconSeq (99.941%). It can also mask sensitive data before that data is completely produced. PathoLive combines a live mapping approach with novel background masking techniques. On a real HiSeq run it achieves the highest accuracy (ROC AUC = 0.97 after 36 h turnaround time), compared with Clinical PathoScope (ROC AUC = 0.91 after 95 h) and Bracken (ROC AUC = 0.48 after 95 h). Conclusion: With each of these tools, we demonstrate the ability to generate meaningful results at or even before the end of a sequencing run. This minimizes the turnaround time of a variety of tasks and can thereby increase the efficiency of high-throughput routine analyses. It could also significantly reduce the response time in urgent infectious disease outbreaks. Since more and more institutions have their own sequencers, running wet-lab and dry-lab work in parallel is within reach.
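The accuracy figures above are F1 scores, the harmonic mean of precision and recall. A small sketch of the computation follows. It also shows how a live tool's F1 might be tracked cycle by cycle. The per-cycle counts in the example are invented for illustration and do not come from the abstract. The PriLive values are reported on a percentage scale, so 99.961% corresponds to 0.99961.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with precision P = tp/(tp+fp) and recall R = tp/(tp+fn)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_cycle(counts: Dict[int, Tuple[int, int, int]]) -> Dict[int, float]:
    """F1 at each sequencing cycle, given (tp, fp, fn) counts per cycle, in cycle order."""
    return {cycle: f1_score(*c) for cycle, c in sorted(counts.items())}


# Hypothetical counts: classification improves as more cycles are read.
example = {40: (600, 150, 400), 80: (960, 40, 40), 216: (980, 20, 20)}
for cycle, f1 in f1_by_cycle(example).items():
    print(f"cycle {cycle}: F1 = {f1:.3f}")
```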
Poster Board #: 99
Title: Beaver fever: Whole Genome Characterization of Waterborne Giardia Isolates Revealed Mixed Assemblages and Zoonotic Transmission
Author Block: K. Tsui1, R. Miller2, M. Uyaguari-Diaz2, P. Tang1, C. Chauve3, W. Hsiao2, J. Isaac-Renton2, N. Prystajecky2; 1Sidra Medicine, Doha, QATAR, 2University of British Columbia, Vancouver, BC, CANADA, 3Simon Fraser University, Vancouver, BC, CANADA.
Abstract: Giardia causes the diarrheal disease known as giardiasis, and transmission through contaminated surface water is common. The genetic diversity of this protozoan parasite has major implications for human health and epidemiology. To determine the extent of transmission from wildlife through surface water, we performed whole-genome sequencing (WGS) to characterize 89 Giardia duodenalis isolates from both outbreak and sporadic infections: 29 isolates from raw surface water, 38 from humans, and 22 from veterinary sources. Using single nucleotide variants (SNVs) combined with epidemiological data, we described relationships contributing to zoonotic transmission. Two assemblages, A and B, were identified in surface water, human, and veterinary isolates. Mixes of zoonotic assemblages A and B were seen in all of the community waterborne outbreaks studied in British Columbia (BC), Canada. Assemblage A was further subdivided into assemblages A1 and A2 based on the observed genetic variation. The A1 assemblage was highly clonal. Isolates of surface water, human, and veterinary origin from Canada, the United States, and New Zealand clustered together with minor variation, consistent with a panglobal zoonotic lineage. In contrast, assemblage B isolates were variable and consisted of several clonal lineages relating to waterborne outbreaks and geographic locations. Most human isolates from waterborne outbreaks clustered with isolates from surface water and from beavers that public health investigators had implicated as outbreak sources. In-depth outbreak analysis demonstrated that beavers can act as amplification hosts for human infections and as sources of surface water contamination. Other wild and domesticated animals, as well as humans, are also known to be sources of waterborne giardiasis. This study demonstrates the utility of WGS in furthering our understanding of Giardia transmission dynamics at the water-human-animal interface.
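The contrast between the highly clonal A1 assemblage and the more variable assemblage B isolates can be quantified with nucleotide diversity (π), the mean number of pairwise differences per site. The sketch below is illustrative only. The abstract does not say that π was computed, and the toy sequences are hypothetical. The sketch treats each isolate as a single haploid sequence, ignoring Giardia's within-isolate heterozygosity. It also uses a simplification: sites with an ambiguous base in either sequence are not counted as differences, but they still count toward the alignment length.

```python
from itertools import combinations
from typing import Sequence


def pairwise_diff(a: str, b: str) -> int:
    """Number of sites where both bases are unambiguous (ACGT) and differ."""
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site (pi) across equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total = sum(pairwise_diff(a, b) for a, b in pairs)
    return total / len(pairs) / length
```

A clonal group such as A1 would give a π close to zero, and a group of distinct lineages such as assemblage B would give a larger value.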
Poster Board #: 100
Title: Refactoring the NCBI Prokaryotic Genome Annotation Pipeline into a Stand-alone Tool
Author Block: F. Thibaud-Nissen1, D. Slotta1, A. Badretdin1, B. Kiryutin1, A. Gourianov1, B. Busby1, R. Cohen1, W. Hlavina1, M. Hsieh2, S. Turner2; 1NCBI/NLM/NIH, Bethesda, MD, 2Pacific Biosciences, Menlo Park, CA.
Abstract: The NCBI Prokaryotic Genome Annotation Pipeline (PGAP) has been used to annotate RefSeq prokaryotic genomes since the early 2000s, increasing in quality and consistency over the years. PGAP annotation, also offered as a service to researchers submitting genome assemblies to GenBank, has become a reliable resource for the prokaryotic community. We have refactored PGAP into a stand-alone pipeline that can be executed outside of NCBI on individual computers or in a cloud environment. The pipeline is written in the Common Workflow Language (CWL) and executes programs wrapped in Docker containers, so that it can run on a variety of platforms. To ensure that stand-alone results conform to results generated at NCBI, the manually curated evidence and other datasets used by the pipeline are bundled and distributed with it. The goal is for stand-alone PGAP to produce annotation that is consistent with internal NCBI PGAP and that can be submitted to GenBank. We expect that making PGAP portable will accelerate research by giving scientists high-quality annotation of the genomes they assemble before submission. It will also give users an opportunity to iterate on the assembly process until the assembly is good enough to produce high-quality annotation. We will describe the stand-alone PGAP prototype and the results of annotation tests. These tests were performed on multiple platforms by multiple users across several locations, and they assess both performance and conformance.
Poster Board #: 101
Title: Diagnosis and Characterization of Canine Distemper Virus Through Real-Time Sequencing by MinION Nanopore Technology
Author Block: A. Lorusso, A. Peserico, M. Marcacci, D. Malatesta, M. Di Domenico, I. Mangone, F. Pizzurro, G. Zaccaria, C. Cammà; Istituto Zooprofilattico dell'Abruzzo e del Molise, Teramo, ITALY.
Abstract: Rapid identification of the etiologic agent of an infectious disease is essential for establishing treatment and preventive measures. Pathogen identification is generally performed with direct diagnostic tests, which normally include amplification of target nucleic acid by PCR-based assays. Although these approaches are highly specific and often validated, they suffer from a number of limitations. It is difficult to test for the plethora of rare pathogens that might cause a given pathology, and these assays cannot identify new or unexpected pathogens, such as those arising from cross-species jumps. Therefore, faster, broad-range and sensitive techniques have become increasingly important in the laboratory diagnosis of infectious diseases. With this in mind, we sequenced nucleic acids purified from the brain tissue of a dog that died after severe neurological signs. Sequencing was performed with the MinION (Oxford Nanopore Technologies, Cambridge, UK), and canine distemper virus (CDV) infection was diagnosed. The first sequence reads belonging to CDV were detected within the first 20 minutes of real-time sequencing. A specific real-time RT-PCR assay and immunohistochemistry were subsequently used to confirm the presence of CDV RNA and antigen, respectively, in tissues. This study supports the use of the MinION in veterinary clinical practice, with considerable advantages in the speed and accuracy of molecular diagnosis.
Poster Board #: 102
Title: Biohansel for Rapid Subtyping of Highly Clonal Pathogens Using Canonical SNPs
Author Block: G. Labbe1, P. Kruczkiewicz2, P. Mabon2, M. Rankin1, M. Gopez2, J. Robertson1, N. Knox2, A. R. Reimer2, G. Tong2, H. J. Adam3, R. P. Johnson1, G. Van Domselaar2, J. H. Nash4; 1National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, CANADA, 2National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, CANADA, 3Department of Medical Microbiology, University of Manitoba, Winnipeg, MB, CANADA, 4National Microbiology Laboratory, Public Health Agency of Canada, Toronto, ON, CANADA.
Abstract: Background: Whole genome sequencing (WGS) is rapidly being adopted by public health agencies in many jurisdictions, creating a need for rapid, robust analytical tools. Single nucleotide polymorphism (SNP) genotyping panels have been developed for numerous organisms based on canonical SNPs that are discriminatory for clonal populations. Using canonical SNP panels, new isolates can rapidly be placed within the population structure without the need to rebuild phylogenetic trees for the entire population. A canonical SNP-based nomenclature can facilitate long-term surveillance by allowing comparisons of isolates across time in a broader context than is typically considered for outbreak response. Methods: Biohansel rapidly classifies WGS data into hierarchical subtypes without the need for assembly. Canonical SNP schemas for two prevalent Salmonella serovars (S. Enteritidis and S. Heidelberg) have been incorporated into biohansel, and user-defined schemas can also be supplied at runtime for subtyping other pathogens. Biohansel identifies SNPs by searching for defined k-mers containing the target SNPs with the Aho-Corasick algorithm (Ju et al., 2017, doi.org/10.1101/229708). Results are evaluated by a quality assurance module, which flags problematic samples according to the number of targets found, target coverage, and concordance with the population structure defined by each schema. Possibly mixed samples are identified by the presence of discordant sets of SNPs and of multiple SNPs for the same target. Biohansel is a Python 3 application available on PyPI, Conda and as a Galaxy tool. Source code is available at https://github.com/phac-nml/biohansel. Results: We rapidly analyzed >23,000 S. Enteritidis and >3,000 S. Heidelberg WGS datasets from public repositories using minimal computational resources. The analysis identified subtype associations with commodities and geography, demonstrating the utility of biohansel. Biohansel proved useful for rapidly identifying closely related isolates and excluding poor-quality WGS datasets. This enabled the creation of reference-mapped phylogenetic trees with the high discriminatory power needed for traceback investigations. The tool was also able to detect and subtype Salmonella in shotgun metagenomics datasets obtained from clinical stool samples. Aho-Corasick k-mer searching is as fast as NCBI BLAST+ against assembly contigs (~0.4 s). For typically sized Salmonella read sets (30-100X coverage), it is about 10 times faster than Jellyfish (~33 s vs ~356 s). Conclusions: In a public health context, biohansel enables rapid, high-resolution classification of North American isolates. It provides a robust, stable framework for source attribution and supports identification of possible interventions to reduce contamination of food products.
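The Aho-Corasick approach behind biohansel's k-mer search is sketched below: each read is scanned once while every target k-mer is matched simultaneously. This is a from-scratch illustration, not biohansel's code. The class and function names and the two-target toy schema are assumptions of the sketch. So is the choice to add reverse complements so that reads from either strand are counted.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List, Tuple


class AhoCorasick:
    """Multi-pattern exact matcher: a trie of patterns plus failure links."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # node -> {char: child node}
        self.fail: List[int] = [0]              # node -> longest proper suffix node
        self.out: List[List[str]] = [[]]        # node -> patterns ending here
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        node = 0
        for ch in pattern:
            nxt = self.goto[node].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[node][ch] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            node = nxt
        self.out[node].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first, so a node's failure target is finished before the node.
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                self.out[child] += self.out[self.fail[child]]

    def search(self, text: str) -> Iterator[Tuple[int, str]]:
        """Yield (start offset, pattern) for every occurrence, overlaps included."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pattern in self.out[node]:
                yield i - len(pattern) + 1, pattern


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def count_target_hits(reads: Iterable[str], targets: Dict[str, str]) -> Dict[str, int]:
    """Count hits per target (hypothetical name -> k-mer) on both strands of each read."""
    kmer_to_target: Dict[str, str] = {}
    for name, kmer in targets.items():
        kmer_to_target.setdefault(kmer, name)
        kmer_to_target.setdefault(kmer.translate(_COMPLEMENT)[::-1], name)
    matcher = AhoCorasick(kmer_to_target)
    counts = {name: 0 for name in targets}
    for read in reads:
        for _, kmer in matcher.search(read):
            counts[kmer_to_target[kmer]] += 1
    return counts
```

A classic test case: searching "ushers" for he, she, his and hers reports "she" at offset 1 and both "he" and "hers" at offset 2. In a subtyping scheme, the per-target counts would then feed coverage and quality checks like those the abstract describes.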
Poster Board #: 104
Title: IRIDA: A Platform for Genomic Epidemiology
Author Block: A. Petkau1, T. Matthews1, F. Bristow2, J. Adam1, P. Kruczkiewicz1, J. Cabral1, J. Thiessen1, E. Griffiths3, D. Dooley4, D. Fornika4, G. Winsor3, M. Graham1, E. Taboada1, R. Beiko5, W. Hsiao4, F. Brinkman3, G. Van Domselaar1; 1Public Health Agency of Canada, Winnipeg, MB, CANADA, 2University of Manitoba, Winnipeg, MB, CANADA, 3Simon Fraser University, Burnaby, BC, CANADA, 4BC Public Health Microbiology and Reference Laboratory, Vancouver, BC, CANADA, 5Dalhousie University, Halifax, NS, CANADA.
Abstract: Background: Whole genome sequencing (WGS) is a powerful tool for public health infectious disease investigations owing to its higher resolution, greater efficiency and cost-effectiveness compared with traditional genotyping methods. However, several obstacles impede implementation of WGS in routine public health microbiology labs: the complexity of data management, the limited availability of easy-to-use pipelines, the difficulty of integrating pipeline results with epidemiological metadata, and restrictive jurisdictional data sharing policies. To address these issues, we developed the Integrated Rapid Infectious Disease Analysis (IRIDA) platform. IRIDA is a user-friendly, decentralized, open source bioinformatics and analytical web platform that supports real-time infectious disease outbreak investigations using WGS data. Methods/Results: IRIDA stores and manages WGS data alongside contextual metadata, providing a single system for processing and generating reports on sequenced samples. WGS data are automatically uploaded to IRIDA using a tool installable on a sequencing instrument. The data are then processed to evaluate quality, assemble genomes, and perform in silico sequence typing, and the results are saved into the epidemiological metadata system. Typing of Salmonella genomes uses SISTR, a tool for Salmonella serovar prediction and cgMLST analysis from WGS data. Additional k-mer based typing pipelines include MentaLiST for cg/wgMLST and biohansel for SNP-based typing. SNVPhyl provides whole genome phylogenetic analysis using SNVs/SNPs, and Mash provides rapid distance estimation to existing genomes in RefSeq. Genomes may be sent to IslandViewer for genomic island detection. Pipelines may be configured to trigger automatically on upload of new WGS data, or users may select sets of samples for additional analysis through the IRIDA pipelines. The IRIDA metadata system integrates data generated by a pipeline (such as sequence type) with user-provided metadata in a single table. Users may toggle the display of metadata fields and save specific views of the metadata for later use. These metadata views may also be visualized alongside a phylogenetic tree. The IRIDA REST API enables secure exchange of genomic and epidemiological metadata, making it possible to build a decentralized genome data sharing network. The IRIDA REST API may also be used to extend IRIDA's functionality, for example through additional tools for custom report generation or integration with the phylogeographic software GenGIS. Conclusion: IRIDA has been successfully deployed as the official bioinformatics platform for public health genomics in the pan-Canadian Public Health Laboratory Network (CPHLN). Storing, managing, and analyzing WGS data alongside contextual metadata has helped simplify surveillance and outbreak investigation activities. IRIDA is open source and freely available at https://github.com/phac-nml/irida and http://irida.ca.
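The Mash distance estimate mentioned above can be sketched in a few lines. First, each genome is reduced to a bottom-s MinHash sketch of its canonical k-mers. Next, the Jaccard index j is estimated from the shared hashes among the s smallest hashes of the union. Finally, the distance is computed as D = -(1/k) ln(2j / (1 + j)). This is a minimal re-implementation for illustration, not Mash or IRIDA code. It substitutes 64-bit BLAKE2b for the MurmurHash3 that Mash uses, and the k = 21 and s = 1000 defaults are assumptions of the sketch.

```python
import hashlib
import math
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def _hash64(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq: str, k: int = 21, s: int = 1000) -> List[int]:
    """Bottom-s MinHash sketch: the s smallest hashes of the canonical k-mers."""
    hashes = {_hash64(canonical(seq[i:i + k])) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:s]


def mash_distance(sketch_a: List[int], sketch_b: List[int],
                  k: int = 21, s: int = 1000) -> float:
    """Estimate Jaccard j from the bottom-s of the union, then D = -ln(2j/(1+j))/k."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:s]
    if not union_bottom:
        raise ValueError("both sketches are empty")
    shared = sum(1 for h in union_bottom if h in a and h in b)
    j = shared / len(union_bottom)
    if j == 0:
        return 1.0  # no shared k-mers observed: maximal distance
    return -math.log(2 * j / (1 + j)) / k
```

Because only s hashes per genome are kept, a new genome can be compared against a large collection such as RefSeq quickly. The trade-off is that the Jaccard estimate carries sampling error that shrinks as s grows.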
Poster Board #: 105
Title: Interpreting Whole-Genome Sequence Analyses of Foodborne Bacteria for Regulatory Applications and Outbreak Investigations
Author Block: A. Pightling, J. Pettengill, Y. Luo, J. Baugher, H. Rand, E. Strain; U.S. Food and Drug Administration, College Park, MD.
Abstract: Whole-genome sequence (WGS) analysis has revolutionized the food safety industry by enabling high-resolution typing of foodborne bacteria. Higher resolving power allows investigators to identify origins of contamination quickly and accurately during illness outbreaks and regulatory activities. Government agencies and industry stakeholders worldwide are now analyzing WGS data routinely. Researchers have published many studies that assess the efficacy of WGS data analysis for source attribution, but guidance for interpreting WGS analyses is lacking. Here, we present the framework for interpreting WGS analyses used by the Food and Drug Administration's Center for Food Safety and Applied Nutrition (CFSAN). We based this framework on the experiences of CFSAN investigators, collaborations and interactions with government and industry partners, and evaluation of the published literature. A fundamental question for investigators is whether two or more bacteria arose from the same source of contamination. Analysts often count the number of nucleotide differences (single-nucleotide polymorphisms [SNPs]) between two or more genome sequences to measure genetic distances. However, using SNP thresholds alone to assess whether bacteria originated from the same source can be misleading. Bacteria isolated from food, environmental, or clinical samples are representatives of bacterial populations, and these populations are subject to evolutionary forces that can change genome sequences. Interpreting WGS analyses of foodborne bacteria therefore requires a more sophisticated approach. We present a framework for interpreting WGS analyses that combines SNP counts with phylogenetic tree topologies and bootstrap support. We also elucidate the roles of WGS, epidemiological, traceback, and other evidence in forming the conclusions of investigations, making clear that WGS data alone are insufficient to establish links between bacterial isolates. Finally, we present examples that illustrate the application of this framework to real-world situations.
Poster Board #: 106
Title: GenomeTrakr Database and Network: WGS Network for Real-Time Characterization and Source Tracking of Foodborne Pathogens
Author Block: M. W. Allard, R. Timme, M. Sanchez, E. Stevens, M. Hoffmann, K. Yao, G. Kastanis, D. Miller, T. Muruvanda, S. Lomonaco1, E. Strain, J. Payne, A. Pightling, H. Rand, J. Pettengill, Y. Luo, N. Gonzalez-Escalona, D. Melka, S. Lindley, Y. Chen, S. Tallent, E. Brown; US FDA, College Park, MD.
Abstract: A national network of federal, state, academic and international laboratories has been using WGS data to rapidly characterize pathogens. This mature GenomeTrakr network is part of the NCBI Pathogen Detection website. Public health agencies (FDA, CDC and USDA-FSIS) collect and share data in real time. This high-resolution, rapidly growing database is actively being used in outbreak investigations at state, national, and international levels. The GenomeTrakr database has demonstrated how a distributed network of desktop WGS sequencers can be used in concert with traditional epidemiology and investigation for source tracking of foodborne pathogens. This new “open data” model allows greater transparency between federal and state agencies, industry partners, academia, and international collaborators. The foodborne pathogen database has continued to grow and diversify, doubling in the last year to ~207,000 draft genomes. Two new international surveillance efforts were added to collect food, animal and environmental isolates, including Campylobacter. NCBI has released new data analysis tools that improve rapid interpretation and visualization. NCBI currently produces daily clustering results for 22 pathogens, including Salmonella, Listeria, E. coli and Campylobacter. The high-resolution WGS signal, in concert with epidemiological or inspection evidence, has drastically enhanced our ability to identify the food sources of current foodborne outbreaks, with ~200 regulatory clusters examined in 2017. These results demonstrate the global benefits of an open data model. Understanding the root causes of foodborne contamination helps our academic, public health and industry partners develop preventive controls to make food safer globally.
Poster Board #: 108
Title: Ultra-Rapid Sample-to-Answer for Fieldable Genomic Sequencing-Based Biothreat Detection
Author Block: T. Reed1, M. Karavis2, S. Deshpande3, R. Lewandowski1, C. Anderson2, M. LaFrance1, P. Roth4, A. Liem4, R. C. Bernhards5; 1CBRNE Analytical & Remediation Activity, 20th CBRNE Command, US Army, Aberdeen Proving Ground, MD, 2Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, 3Science & Technology Corp. support to Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, 4DCS Corp. support to Edgewood Chemical Biological Center, Aberdeen Proving Ground, MD, 5Defense Threat Reduction Agency, Ft. Belvoir, VA.
Abstract: Rapid and accurate detection technologies are critically needed in the field, especially for unknown, emerging, and genetically modified biothreats. Next-generation sequencing (NGS) technologies are superior in that the entire genome can be analyzed, which allows unbiased, conclusive identification and the detection of new and synthetically modified threats. However, most NGS technologies have substantial size, power, and sample preparation requirements that severely limit their use in far-forward environments. In addition, current sample-to-answer methodologies take multiple days to complete. The MinION nanopore sequencer developed by Oxford Nanopore Technologies has recently emerged as a portable NGS technology. Nanopore sequencing uses biological proteins as nanopores for the passage and identification of DNA and RNA molecules. Improved error rates combined with high read output have made nanopore sequencing comparable to existing NGS technologies, such as Illumina and PacBio, without the need for large, expensive equipment. The MinION fits in the palm of the hand, offering truly field-deployable sequencing. This can allow rapid identification of unknown threats and disease monitoring in resource-limited settings. This project aims to shorten the time from sample to answer, simplify the procedures, and reduce the equipment and power needed. An optimized workflow was established using simple, fieldable, and rapid sample and library preparation procedures. The workflow includes the portable OmniLyse bead-beating device, which can lyse spores within two minutes, a rapid DNA purification protocol, and an eight-minute library preparation. In addition, the VolTRAX automated sample/library preparation device, the Flongle flow cell adapter, and the MinIT miniature processor are currently being evaluated for inclusion in the workflow. Within 10 minutes of sequencing on the MinION, enough reads are generated to conclusively identify the organisms present in the sample. Automatic offline live basecalling is used during the sequencing run. After sequencing is complete, the data are analyzed immediately using offline software developed at the Edgewood Chemical Biological Center (ECBC). Using this workflow, raw sample-to-answer can be achieved in approximately one hour. Field demonstrations are being conducted with DoD mobile lab operators for assessment. The goal is to allow genomic sequencing identification to be performed rapidly by minimally trained personnel in low-resource environments without the need for high-powered lab equipment. Using this procedure, the MinION could be used by the warfighter to rapidly identify unknown biothreats on the battlefield or in expeditionary analytic scenarios.
Poster Board #: 109
Title: Tracing the Origins of Hospital-onset Clostridioides difficile Infections
Author Block: J. Worley1, C. Cummins1, M. Delaney1, A. DuBois1, S. Men1, M. Klompas2, L. Bry1; 1Massachusetts Host-Microbiome Center, Department of Pathology, Brigham and Women's Hospital, Boston, MA, 2Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA.
Abstract: Background: Clostridioides difficile is a leading cause of health care-associated infection in the
United States and the leading cause of death from a gastrointestinal pathogen. The annual
costs associated with its treatment are frequently estimated to exceed $5 billion. We
designed a study to determine whether patients who develop hospital-onset C. difficile infection
(CDI) were infected by strains commonly found in the hospital, by strains acquired from other
patients, or by strains carried asymptomatically at admission. Although some strains have been more
common, particularly sequence type 1/NAP1/PCR ribotype 027 (ST1), genetic diversity is high
among disease-causing C. difficile. Whole-genome sequencing, which can identify clonally
related bacteria, was used to address this question by sequencing isolates from patients presenting
with CDI and from an admission screen. Methods: The study period was September 2017
through May 2018. Patients admitted to the intensive care units are screened for
vancomycin-resistant enterococci by rectal swab (VRE swab) upon admission and weekly
thereafter. These swabs were also screened for C. difficile to identify strains entering the
hospital. VRE swabs were collected from November through April 15, while stool was
collected hospital-wide over the entire period. Isolates were sequenced
using the Illumina MiSeq platform. Single-nucleotide polymorphisms were identified de novo
using kSNP and through core-gene alignment. Sequence types and genetic features were
assessed using BLAST and MUSCLE, and sequence types were assigned using
PubMLST. Results: A total of 2,418 swabs from over 1,500 patients were screened for C. difficile, of
which 177 (7%) yielded C. difficile isolates. A total of 179 stool samples were collected during this
period, of which over 90% yielded isolates. In this dataset, 5 patients transitioned from
asymptomatic carriage to CDI, in each case without a change in sequence type. Additionally, 7
patients transitioned from non-carriage to CDI. Although sequencing is not complete
(anticipated completion by September), the 242 isolates sequenced so far represent over 50
sequence types (STs), and only 12 of these isolates (5%) belonged to ST1. Strains from ST1 and
related STs were more likely than other strains to be found in CDI, and atoxigenic strains were
less likely. Conclusions: Strains entering the hospital are highly
diverse and represent much of the genetic diversity within C. difficile. ST1 is not
the predominant strain in our samples, even though it remains more strongly linked to disease
than other STs. In all cases where a patient was asymptomatically colonized before
CDI onset, the same strain was isolated before and during CDI. Although the sample size is small,
this raises the possibility of identifying a subpopulation of patients at greater risk
of developing CDI and adjusting their medical care accordingly.
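The carriage-to-CDI comparison in the Results can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code. The `Isolate` record, the `"screen"`/`"cdi"` source labels and the example patients are all hypothetical. The sketch assumes that each isolate already carries an MLST sequence type, as assigned here with PubMLST. For every patient with a positive screen before their first CDI isolate, it reports whether the sequence type stayed the same.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Isolate:
    patient_id: str       # hypothetical patient identifier
    collected: date       # sampling date
    source: str           # hypothetical labels: "screen" (rectal swab) or "cdi" (clinical stool)
    st: Optional[int]     # MLST sequence type; None if no C. difficile was recovered


def carriage_to_cdi(isolates: list[Isolate]) -> dict[str, bool]:
    """Map patient_id -> True if the earliest positive screen taken before the
    first CDI isolate has the same sequence type as that CDI isolate."""
    by_patient: dict[str, list[Isolate]] = {}
    for iso in isolates:
        by_patient.setdefault(iso.patient_id, []).append(iso)

    result: dict[str, bool] = {}
    for pid, records in by_patient.items():
        records = sorted(records, key=lambda r: r.collected)
        cdi = [r for r in records if r.source == "cdi" and r.st is not None]
        if not cdi:
            continue  # no CDI isolate for this patient
        first_cdi = cdi[0]
        carriage = [r for r in records
                    if r.source == "screen" and r.st is not None
                    and r.collected < first_cdi.collected]
        if carriage:  # asymptomatic carriage preceded CDI
            result[pid] = carriage[0].st == first_cdi.st
    return result


if __name__ == "__main__":
    example = [
        Isolate("P1", date(2017, 11, 2), "screen", 11),
        Isolate("P1", date(2017, 11, 20), "cdi", 11),
        Isolate("P2", date(2017, 12, 1), "screen", None),  # screen negative
        Isolate("P2", date(2017, 12, 9), "cdi", 1),
        Isolate("P3", date(2018, 1, 5), "screen", 2),
        Isolate("P3", date(2018, 1, 15), "cdi", 8),
    ]
    print(carriage_to_cdi(example))  # {'P1': True, 'P3': False}
```

A matching sequence type is necessary but not sufficient evidence of the same strain. A stricter version would also require a small SNP distance between the paired isolates.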
Poster Board #: 110
Title: Metagenomic Strain Detection with Rainbow Sketching
Author Block: R. Bovee, C. Smith, N. Greenfield;
One Codex, San Francisco, CA.
Abstract: Identifying specific strains and mixtures of strains in complex metagenomic samples is a key
challenge in both epidemiology and environmental microbiology. Even sensitive k-mer-based
metagenomic classification tools struggle to identify strains because of database quality,
contamination, genetic recombination, and other shared homology. Recent MinHash methods, on the
other hand, either sketch the entire sample (which is inappropriate for complex mixtures) or
require a comparison against each available reference (which limits scalability).
We present Rainbow Sketching, an approach that leverages both k-mer-based taxonomic
classification and MinHash sketching for strain tracking and mixture modeling. Rainbow
Sketching first performs taxonomic classification of each individual k-mer (“coloring” each
k-mer) and uses these colors to build a rainbow of discrete sketches. Each taxon-specific sketch
can then be compared against a subset of relevant reference genomes, identifying the strains
present and determining strain-reference novelty. Count data within these sketches also
provide a foundation for modeling strain mixtures.
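The color-then-sketch idea can be illustrated with a small Python sketch. It is a hedged reconstruction from the description above, not One Codex's implementation. The following choices are assumptions:

- The k-mer classifier is a caller-supplied function (`classify`, hypothetical) that returns a taxon or `None`.
- Sketches are bottom-s MinHash sketches keyed on a BLAKE2b-derived 64-bit hash.
- Canonical (reverse-complement-merged) k-mers are skipped for brevity.
- Similarity to a reference uses the standard bottom-s Jaccard estimator.

```python
from __future__ import annotations

import hashlib
from collections import Counter, defaultdict
from typing import Callable, Iterable, Optional


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every k-mer of seq (forward strand only, for simplicity)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(counts: Counter, size: int) -> dict[int, int]:
    """Keep the `size` smallest hash values together with their k-mer counts."""
    hashed: dict[int, int] = defaultdict(int)
    for kmer, n in counts.items():
        hashed[hash64(kmer)] += n
    return {h: hashed[h] for h in sorted(hashed)[:size]}


def rainbow_sketches(reads: Iterable[str], k: int,
                     classify: Callable[[str], Optional[str]],
                     size: int) -> dict[str, dict[int, int]]:
    """Color each k-mer with a taxon, then build one sketch per color."""
    per_taxon: dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        for kmer in kmers(read, k):
            taxon = classify(kmer)
            if taxon is not None:  # unclassified k-mers are dropped
                per_taxon[taxon][kmer] += 1
    return {taxon: bottom_sketch(c, size) for taxon, c in per_taxon.items()}


def jaccard_estimate(a: dict[int, int], b: dict[int, int], size: int) -> float:
    """Bottom-s MinHash estimate of the Jaccard index of two k-mer sets."""
    union_bottom = sorted(set(a) | set(b))[:size]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)
```

Each taxon's sketch is compared only against references for that taxon, for example `jaccard_estimate(sketches["Salmonella"], bottom_sketch(Counter(kmers(ref_genome, 21)), 1000), 1000)`. This avoids comparing the whole sample against every reference. The retained counts are what a mixture model would consume.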
We employed this method in the recent PrecisionFDA CFSAN Pathogen Detection Challenge,
achieving the highest overall score for detecting Salmonella strains against a metagenomic
background. We present results from this challenge, from several clinical and other real-world
datasets, and from simulated data to demonstrate the sensitivity and specificity of this approach.
Poster Board #: 111
Title: Development of a Serotyping Pipeline Using Whole Genome Sequencing (WGS)
for Shigella Identification
Author Block: Y. Wu, H. Lau, T. Lee, D. K. Lau;
FDA, Alameda, CA.
Abstract: Shigella spp., comprising 4 species and more than 50 serotypes, cause shigellosis, a disease
that leads to significant morbidity, mortality, and economic loss worldwide. An estimated 500,000
shigellosis cases occur annually in the US, and the number of cases has been rising.
Shigellosis is transmitted through the fecal-oral route, and about one-third of these cases are
foodborne. Serotyping (speciation) is an important tool for epidemiological surveillance that
informs future policy making for outbreak control and vaccine development.
Classical Shigella serotyping based on serology is tedious and time-consuming, is limited by the
availability of sensitive, serotype-specific antibodies, and is often confounded by
cross-reactivity. Modern molecular diagnostic assays are fast and sensitive but
do not distinguish Shigella at the species level, or even from closely related
enteroinvasive Escherichia coli (EIEC) strains. Because of its high discriminatory power, whole
genome sequencing (WGS) holds the promise of replacing conventional Shigella serotyping
with faster and more accurate in silico serotyping. However, laboratory
microbiologists do not usually possess sophisticated bioinformatics skills, and some serotypes
of Shigella are determined by both O-antigen biosynthetic genes and O-antigen modification
enzymes, which can be complicated to interpret. We have developed an automated workflow
that uses limited computational resources to accurately and rapidly
determine Shigella serotypes using WGS data from Shigella and EIEC strains available in the
laboratory and on NCBI SRA. To conserve time and computational resources, raw WGS reads
are aligned against an in-house curated reference sequence database composed
of Shigella serotype determinants together with genus- and species-specific sequences that
serve as indicators to exclude non-Shigella isolates. Serotype prediction is based on sequence
hits that pass threshold levels of coverage and accuracy. Operators with minimal programming
skills and minimal knowledge of Shigella genetics can obtain an unambiguous interpretation
using this pipeline. For paired-end FASTQ reads smaller than 1.7 GB, the turnaround time is
under 5 minutes. This pipeline will be further optimized and streamlined for accuracy, ease of
use, and confidence of predictive values before validation. We are also expanding the reference
sequence database by continually updating it with newly available sequences from provisional
serotypes. This pipeline is the first step towards building a comprehensive WGS-based
analysis pipeline of Shigella spp. for outbreak investigation and control in a field laboratory
setting, where speed is essential.
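The threshold-based call described above might look roughly like the following Python sketch. Every specific here is a hypothetical placeholder rather than the FDA pipeline's database:

- the coverage and identity cut-offs (the abstract's "accuracy" is read here as alignment identity);
- the marker name `shigella_genus_marker`;
- the rule table of determinant names (`wzx_X`, `mod_1`, ...).

The sketch shows one way to combine O-antigen biosynthesis determinants with modification genes. Among the rules whose required determinants all pass the thresholds, the most specific (largest) rule wins, and ties are reported as ambiguous.

```python
from __future__ import annotations

from dataclasses import dataclass

MIN_COVERAGE = 0.90                     # hypothetical: fraction of the target covered by reads
MIN_IDENTITY = 0.95                     # hypothetical: alignment identity to the target
GENUS_MARKER = "shigella_genus_marker"  # hypothetical indicator sequence

# Hypothetical rule table: required determinants -> serotype label.
# Larger sets encode base O-antigen genes plus modification enzymes.
RULES = {
    frozenset({"wzx_X"}): "serotype X",
    frozenset({"wzx_X", "mod_1"}): "serotype X-1",
    frozenset({"wzx_X", "mod_1", "mod_2"}): "serotype X-1b",
    frozenset({"wzx_Y"}): "serotype Y",
}


@dataclass
class Hit:
    target: str      # name of a sequence in the curated reference database
    coverage: float  # fraction of the target length covered (0-1)
    identity: float  # fraction of matching bases in the alignment (0-1)


def call_serotype(hits: list[Hit]) -> str:
    """Return a serotype label, 'not Shigella', 'untypeable' or 'ambiguous: ...'."""
    passing = {h.target for h in hits
               if h.coverage >= MIN_COVERAGE and h.identity >= MIN_IDENTITY}
    if GENUS_MARKER not in passing:
        return "not Shigella"  # e.g. EIEC or another organism
    matched = [(len(req), label) for req, label in RULES.items() if req <= passing]
    if not matched:
        return "untypeable"
    best = max(size for size, _ in matched)
    labels = sorted(label for size, label in matched if size == best)
    return labels[0] if len(labels) == 1 else "ambiguous: " + ", ".join(labels)


if __name__ == "__main__":
    hits = [Hit(GENUS_MARKER, 1.0, 0.99), Hit("wzx_X", 0.98, 0.99), Hit("mod_1", 0.95, 0.97)]
    print(call_serotype(hits))  # serotype X-1
```

Preferring the largest satisfied rule means that a modification enzyme refines, rather than replaces, the base O-antigen call.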
Poster Board #: 112
Title: Quinolone Resistance Mechanisms Found in E. coli from Four Animal Species in Norway
Author Block: H. Kaspersen1, C. Sekse1, J. S. Slettemeås1, R. Simm2, M. Norström1, H. Sørum3, A. Urdahl1,
K. Lagesen1;
1Norwegian Veterinary Institute, Oslo, NORWAY, 2Department of Oral Biology, Faculty of
Dentistry, University of Oslo, Oslo, NORWAY, 3Institute of Food Safety and Infection Biology,
Norwegian University of Life Sciences, Oslo, NORWAY.
Abstract: Quinolones and fluoroquinolones are regarded as critically important for human health, but
increased use of these compounds has been linked to increased occurrence of resistance. In
Norway, fluoroquinolones are used in negligible amounts in livestock, and prophylactic use is
prohibited. Nevertheless, quinolone-resistant E. coli (QREC) have been observed at low levels
in a high proportion of the samples analysed in the monitoring programme for antimicrobial
resistance in the veterinary and food production sectors (NORM-VET). To better understand
the occurrence of QREC, we characterized the resistance mechanisms present in selected
isolates from the NORM-VET programme. E. coli isolates were defined as QREC when they
grew in the presence of ciprofloxacin and/or nalidixic acid at concentrations above the
epidemiological cut-off values of 0.06 µg/ml and 16 µg/ml, respectively. QREC isolates were
randomly selected and grouped based on animal species of origin, minimum inhibitory
concentration (MIC) for ciprofloxacin and nalidixic acid, and the number of additional
resistant phenotypes, resulting in 285 isolates. The isolates originated from wild birds (n = 69),
red foxes (n = 53), pigs (n = 75), and broilers (n = 88). The MIC ranges of the isolates for
ciprofloxacin and nalidixic acid were 0.03-16 µg/ml and 4-256 µg/ml, respectively. Whole-genome
sequencing was performed on the isolates on Illumina HiSeq2/3/4000 instruments with Nextera
Flex/XT library preparation. The resulting sequences were run through the Bifrost pipeline
(github.com/NorwegianVeterinaryInstitute/Bifrost) for quality control, antimicrobial
resistance gene identification, and multilocus sequence typing (MLST). Acquired resistance
genes and mutations in intrinsic genes are identified from reads by mapping to a reference
database, followed by local assemblies. Preliminary results suggest that over 80% of the
isolates have at least one mutation in gyrA, fewer than 30% in gyrB, fewer than 50% in parC,
and more than 60% in parE. Further analysis is under way, and results will be presented.
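The QREC definition and the stratified random selection described above can be written as a short Python sketch. The two ECOFFs come from the abstract. The sketch treats an MIC strictly above the cut-off as resistant, following the wording "above the epidemiological cut-off values". The stratum key and the per-stratum sample size (`per_stratum`) are hypothetical, because the abstract does not say how many isolates were drawn from each group.

```python
from __future__ import annotations

import random
from collections import defaultdict
from dataclasses import dataclass

CIP_ECOFF = 0.06  # µg/ml, ciprofloxacin (from the abstract)
NAL_ECOFF = 16.0  # µg/ml, nalidixic acid (from the abstract)


@dataclass(frozen=True)
class Isolate:
    isolate_id: str
    animal: str               # e.g. "wild bird", "red fox", "pig", "broiler"
    cip_mic: float            # µg/ml
    nal_mic: float            # µg/ml
    n_other_resistances: int  # number of additional resistant phenotypes


def is_qrec(iso: Isolate) -> bool:
    """Quinolone-resistant if either MIC lies above its ECOFF (and/or rule)."""
    return iso.cip_mic > CIP_ECOFF or iso.nal_mic > NAL_ECOFF


def stratified_selection(isolates: list[Isolate], per_stratum: int,
                         seed: int = 1) -> list[Isolate]:
    """Randomly pick up to `per_stratum` QREC isolates from each
    (animal, cip MIC, nal MIC, number of other resistances) stratum."""
    strata: dict[tuple, list[Isolate]] = defaultdict(list)
    for iso in isolates:
        if is_qrec(iso):
            key = (iso.animal, iso.cip_mic, iso.nal_mic, iso.n_other_resistances)
            strata[key].append(iso)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    chosen: list[Isolate] = []
    for key in sorted(strata):
        group = strata[key]
        chosen.extend(rng.sample(group, min(per_stratum, len(group))))
    return chosen
```

An isolate with a ciprofloxacin MIC of 0.03 µg/ml still qualifies if its nalidixic acid MIC exceeds 16 µg/ml. This is consistent with the lower end of the reported ciprofloxacin MIC range.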
Poster Board #: 113
Title: Population Genomic Analysis of Pseudomonas aeruginosa Reveals Strain Sharing During New-Onset
Cystic Fibrosis Infections
Author Block: P. J. Stapleton1, C. Izydorczyk2, A. Blanchard3, P. W. Wang2, J. Diaz Caballero2, Y. Yau1,
V. Waters3, D. S. Guttman2;
1Dept. of Laboratory Medicine and Pathobiology, Univ. of Toronto, Toronto, ON,
CANADA, 2Dept. of Cell and Systems Biology, Univ. of Toronto, Toronto, ON, CANADA, 3Div. of
Infectious Diseases, Hospital for Sick Children, Toronto, ON, CANADA.
Abstract: Introduction: Sharing of Pseudomonas aeruginosa (Pa) strains between cystic fibrosis (CF)
patients with chronic infection is relatively common. It is unclear how frequent Pa strain
sharing is in new-onset infections occurring earlier in CF, when infections are treated with
antibiotic eradication therapy (AET) and epidemic strains are infrequently encountered. We
sequenced Pa isolated from the sputum of children before initiation of inhaled AET to
determine the frequency of mixed-strain infection and strain sharing, and their association with
AET failure. Methods: We sequenced 342 Pa isolates using Illumina technology, collected
between 2012 and 2016 from 65 children with 75 distinct episodes of new-onset infection
(episodes at least 1 year apart; AET failure in 27% of episodes). Up to 10 isolates were
sequenced per episode. We performed a first-pass analysis of population structure by building
phylogenies with 1) Assembly and Alignment Free (AAF), 2) a pairwise distance matrix
generated from assemblies using Mash, and 3) a conventional mapping step and SNP
alignment. We further investigated clusters suggestive of strain sharing by mapping genomes
from each cluster to closely related references using 3 pipelines: 1) Bacteria and Archaea
Genome Analyser (BAGA), 2) Snippy, and 3) an in-house pipeline. Maximum likelihood
phylogenetic trees were generated from single nucleotide polymorphism (SNP) alignments
with IQ-TREE. Strain sharing, which could result from direct/indirect transmission or a
common environmental reservoir, was inferred from the presence of appropriate topological
signal in these trees (strains from different patients exhibiting close monophyletic or
paraphyletic relationships). Univariate logistic regressions were used to assess associations
between mixed infection, strain sharing, and AET failure. All statistical analyses were done
using SAS 9.04.01. Results: Pairwise SNP differences between closely related isolates varied
depending on the pipeline used and on how closely related the reference was, but tree topologies
were broadly similar. A large number of patients shared Pa strains with other patients
(N = 25/65, 38%). Mixed infection (two or more strains present in sputum concurrently)
occurred in 12/75 episodes (16%). Having a mixed infection was significantly associated with
sharing of Pa strains (unadjusted OR 10.7, 95% CI 2.2-53.7, p < 0.01) but was not associated
with AET failure. Furthermore, strain sharing was not associated with AET
failure. Conclusions: A large proportion of patients were infected with a Pa strain shared with
other patients; the reasons for this require further investigation. Mixed-lineage Pa
infections were observed relatively frequently in new-onset episodes and were associated
with strain sharing between patients. Tree topologies for individual clusters were similar
regardless of the SNP-calling pipeline used, despite variation in pairwise SNP differences between
isolates.
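Consider a univariate logistic regression with a single binary predictor, such as mixed infection predicting strain sharing. Its unadjusted odds ratio equals the cross-product ratio of the 2×2 table, and its Wald 95% CI equals Woolf's logit interval. The Python sketch below computes both. The counts in the example are hypothetical. The abstract reports only the resulting OR and CI and does not say whether patients or episodes were the unit of analysis.

```python
from __future__ import annotations

import math


def odds_ratio_wald(a: int, b: int, c: int, d: int,
                    z: float = 1.959963984540054) -> tuple[float, float, float]:
    """Odds ratio and Wald (Woolf) confidence interval for the 2x2 table

                     outcome+   outcome-
        exposed+        a          b
        exposed-        c          d

    e.g. exposure = mixed infection, outcome = strain sharing.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_wald(10, 2, 5, 10))
```

Small cell counts inflate the standard error of log(OR). This explains why an OR near 10 can come with an interval as wide as the 2.2-53.7 reported above.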
Poster Board #: 114
Title: Antimicrobial Resistance Prediction by Whole Genome Sequencing in MRSA and VRE: A Real-World
Application
Author Block: A. Babiker, M. M. Mustapha, K. A. Shutt, C. D. Ezeonwuka, S. L. Ohm, M. P. Pacey, J. Marsh,
V. S. Cooper, Y. Doi, L. H. Harrison;
University of Pittsburgh, Pittsburgh, PA.
Abstract: Background: The antimicrobial resistance (AMR) crisis represents a serious threat to public
health and the healthcare economy and has prompted concerted efforts to develop
rapid molecular diagnostics for AMR. In combination with publicly available
web-based AMR databases, whole genome sequencing (WGS) offers the capacity for rapid
detection of antibiotic resistance genes. Here we studied the concordance between
WGS-based resistance prediction and phenotypic susceptibility testing results for
methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus (VRE)
clinical isolates using publicly available tools. Methods: Clinical isolates prospectively collected at the
University of Pittsburgh Medical Center between December 2016 and December 2017
underwent WGS. Antibiotic resistance gene content was assessed from assembled genomes by
BLASTn searches of the online databases ResFinder and the Comprehensive Antibiotic Resistance
Database (CARD). Concordance between WGS-predicted and phenotypic susceptibility, as well
as sensitivity, specificity, and positive and negative predictive values (PPV, NPV), were calculated
for each antibiotic/organism combination, using the phenotypic results as the gold
standard. Results: Phenotypic susceptibility testing and WGS results were available for 109
MRSA and 105 VRE isolates. Out of a total of 1,058 isolate/antibiotic combinations, overall
concordance was 98.8%, with a sensitivity, specificity, PPV, and NPV of 98.0%
(95% CI, 0.97-0.99), 99.1% (95% CI, 0.98-0.99), 98.5% (95% CI, 0.97-0.99), and 99.0% (95% CI,
0.98-0.99), respectively. Identification of point mutations in housekeeping genes increased
the concordance to 99.3%, the sensitivity to 99.5% (95% CI, 0.98-0.99), and the NPV to 99.8%
(95% CI, 0.99-0.99). Conclusion: WGS can be used as a reliable predictor of phenotypic
resistance for both MRSA and VRE using readily available online tools.
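The agreement statistics above follow from a 2×2 confusion matrix per antibiotic/organism combination. Phenotypic resistance is the positive class, and the phenotype is the reference standard. The Python sketch below computes the point estimates. The counts in the example are hypothetical. The abstract does not state which confidence-interval method was used, so no interval is computed here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass
class Confusion:
    tp: int = 0  # WGS resistant, phenotype resistant
    fp: int = 0  # WGS resistant, phenotype susceptible
    fn: int = 0  # WGS susceptible, phenotype resistant
    tn: int = 0  # WGS susceptible, phenotype susceptible


def tally(pairs: Iterable[tuple[bool, bool]]) -> Confusion:
    """Build a confusion matrix from (wgs_resistant, phenotype_resistant) pairs,
    one pair per isolate/antibiotic combination."""
    c = Confusion()
    for wgs_r, pheno_r in pairs:
        if wgs_r and pheno_r:
            c.tp += 1
        elif wgs_r:
            c.fp += 1
        elif pheno_r:
            c.fn += 1
        else:
            c.tn += 1
    return c


def metrics(c: Confusion) -> dict[str, float]:
    """Point estimates with the phenotype as the reference standard."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "concordance": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


if __name__ == "__main__":
    print(metrics(Confusion(tp=40, fp=2, fn=1, tn=57)))  # hypothetical counts
```

Adding point-mutation calls moves some combinations from false negatives to true positives. This raises sensitivity and NPV together, which is the pattern reported above.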
Poster Board #: 115
Title: Deep Sequencing of a Measles Vaccine Strain Reveals Complexity of Defective Interfering
Genomes
Author Block: A. Beck, M. Coughlin, P. Rota, B. Bankamp;
Abstract: Defective interfering particles (DIs) of measles virus (MeV) frequently arise in cell culture,
suppressing the replication of standard virus. Paramyxovirus DIs are immunostimulatory, so
elucidating the formation and function of DIs is important to more fully understand MeV
replication. Many of the complex truncation jump points of DI RNAs arise from a “copy-back”
mechanism involving dissociation of the polymerase complex and its reattachment in the
opposite strand orientation. DIs are challenging to quantify with traditional molecular
techniques. We serially passaged the MeV vaccine strain Moraten in Vero-hSLAM and MRC5
cells under conditions that promote DI formation. A MIQE-standard RT-qPCR assay using SYBR
chemistry was developed to measure ratios of MeV full-length genomic and DI RNA species
and to minimize experimental noise arising from viral mRNAs. RT-qPCR data were validated
by a two-step, end-point PCR procedure that detects a copy-back polarity switch in the DI
RNA sequence. RNA extracted from DI-containing cell lysates was sequenced using stranded
Illumina chemistry and analyzed for DI jump points by various in silico methods. Gapped read
alignments were used in conjunction with R/Bioconductor ranged data processing to detect
the true diversity of DI content in the samples. A high diversity of truncation jump points was
observed, and stranded sequencing data suggested replication of DI genomes. Cyclic
suppression of standard virus titers was observed along the passage series and was inversely
correlated with concentrations of DI RNAs determined by RT-qPCR. DIs were detected after
serial passage in Vero-hSLAM cells but not in MRC5 cells. The findings represent novel evidence
of DI complexity in a laboratory-passaged MeV vaccine strain; DIs were stably and
independently observed in replicate trials over 20 passages. These findings suggest that
cell-specific mechanisms affect DI formation, and that NGS methods are useful for the discovery
of novel RNA populations in paramyxovirus-infected cells.
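Classifying jump points from split (gapped) read alignments can be sketched in a few lines. The authors used R/Bioconductor; the Python below only illustrates the underlying logic, under the following assumptions:

- Each read has already been split into two aligned segments, listed in read order.
- Coordinates are 1-based and inclusive.
- A strand switch between segments is taken as the signature of a copy-back junction.
- A same-strand forward jump is treated as an internal deletion.

The `Segment` type and the coordinates in the tests are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: int   # leftmost reference position (1-based, inclusive)
    end: int     # rightmost reference position
    strand: str  # "+" or "-" relative to the reference genome


def last_pos(s: Segment) -> int:
    """Last reference base aligned, in read order."""
    return s.end if s.strand == "+" else s.start


def first_pos(s: Segment) -> int:
    """First reference base aligned, in read order."""
    return s.start if s.strand == "+" else s.end


def classify_junction(s1: Segment, s2: Segment) -> tuple[str, int, int]:
    """Return (junction type, break position, rejoin position) for a read
    whose alignment jumps from segment s1 to segment s2."""
    brk, rejoin = last_pos(s1), first_pos(s2)
    if s1.strand != s2.strand:
        return "copy-back", brk, rejoin  # polymerase resumed on the opposite strand
    step = rejoin - brk if s1.strand == "+" else brk - rejoin
    if step > 1:
        return "deletion", brk, rejoin   # forward jump skipping reference bases
    return "other", brk, rejoin          # adjacent or backward same-strand jump


def tally_junctions(split_reads: list[tuple[Segment, Segment]]) -> Counter:
    """Count distinct (type, break, rejoin) jump points across reads."""
    return Counter(classify_junction(a, b) for a, b in split_reads)
```

Tallying exact (break, rejoin) pairs is what exposes the diversity of jump points. In practice, nearby positions might be grouped to absorb alignment ambiguity at junctions.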
Poster Board #: 116
Title: Long-read Sequencing and Assembling of Bacillus Species in Error-free Simulations
Author Block: J. Li;
Changshu Institute of Technology, Suzhou, CHINA.
Abstract: The generation of genomic data from microorganisms has revolutionized our ability to
understand their biology, but it is still challenging to obtain the complete genome sequence
of microbes quickly and cheaply in an automated, high-throughput manner. While the advent
of second-generation sequencing technologies provided significantly higher data throughput,
their shorter read lengths and more pronounced sequence-context bias led to a shift towards
resequencing applications. Recently, single molecule real-time (SMRT) DNA sequencing has
been used to generate sequencing reads that are much longer than second-generation or
even Sanger sequencing reads, facilitating de novo genome assembly and genome finishing.
Here we sought to develop a novel multiplex strategy that makes full use of the capacity and
characteristics of SMRT sequencing for microbial genome assembly. We first used error-free
simulations to evaluate the practicability of assembling SMRT sequencing data from
multiple microbes into finished genomes simultaneously. We then compared the
influence of key factors, including sequencing coverage and read length, on multiplex
assembly. Our results showed that long-read sequencing inherently provides the
ability to assemble sequencing data from multiple microbes into finished genomes
because of its long read length. This approach may be helpful for microbial
genome projects and metagenomics research.
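An error-free multiplex simulation of the kind described can be sketched as follows. Fixed-length reads are sampled uniformly from each genome until a target coverage is reached, so the number of reads is ⌈coverage × genome length / read length⌉. Each read is randomly reverse-complemented, and all reads are pooled. This minimal Python illustration makes simplifying assumptions (linear genomes, fixed read length, no sequencing errors) and is not the simulator used in the study. The genomes in the tests are toy strings.

```python
from __future__ import annotations

import math
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_multiplex(genomes: dict[str, str], coverage: float,
                       read_len: int, seed: int = 0) -> list[tuple[str, str]]:
    """Return shuffled (source genome, read) pairs; each read is an exact
    substring of its source genome or of the genome's reverse complement."""
    rng = random.Random(seed)
    pool: list[tuple[str, str]] = []
    for name, seq in genomes.items():
        length = min(read_len, len(seq))
        n_reads = math.ceil(coverage * len(seq) / length)
        for _ in range(n_reads):
            start = rng.randrange(len(seq) - length + 1)
            read = seq[start:start + length]
            if rng.random() < 0.5:  # sample either strand
                read = revcomp(read)
            pool.append((name, read))
    rng.shuffle(pool)  # mix genomes together, as in a multiplexed run
    return pool
```

Keeping the source label on each read lets an assembly of the pooled reads be scored against the genome each read came from. Varying `coverage` and `read_len` then mirrors the factors compared in the abstract.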
Poster Board #: 117
Title: Rapid Antibiotic Resistance Identification Using VOLTRAX Library Preparation and Nanopore
Sequencing
Author Block: J. Humphrey, T. Seitz, A. Ducluzeau, D. Drown;
University of Alaska Fairbanks, Fairbanks, AK.
Abstract: Antibiotic resistance is a growing health crisis in the US, accounting for over 20,000 deaths
annually. Environmental bacteria can act as a reservoir of opportunistic pathogens despite a
lack of exposure. Resistant microbes may have a significant negative impact on the health of
Alaskans, and identifying specific antibiotic-resistant microbes is essential for quick and
appropriate treatment. Here we demonstrate a rapid, automated, and portable sequencing
platform. In this proof of concept, each library was constructed from a single cultured
microbial isolate using the VOLTRAX, a rapid, automated library preparation device. Sequencing
was carried out using the portable nanopore DNA sequencer, MinION. We present the
results of our long-read bioinformatic pipeline for assembly, contig polishing, and annotation.
All of these steps were carried out by two undergraduates with introductory laboratory skills.
Importantly, these methods allow us to better understand the environmental reservoir of
antibiotic resistance in Alaska. These results demonstrate the potential application of
portable library preparation and sequencing in a mobile biosurveillance laboratory.
Poster Board #: 118
Title: Metagenomics for Diagnosis of Sterile Site Infection: Balancing Automation with Expert
Interpretation
Author Block: C. Anscombe, A. Nguyen, N. Le, H. Nguyen, P. Ashton, T. Le;
Oxford University Clinical Research Unit (OUCRU), Ho Chi Minh City, VIET NAM.
Abstract: Background: A quick literature search demonstrates the increasing popularity of
metagenomic sequencing methods for diagnosing patients with suspected infections when other
methods have failed. However, these analyses need to be carried out on larger,
prospective cohorts to determine sensitivity and specificity. At OUCRU, Vietnam, we are
investigating the use of metagenomics in central nervous system (CNS) infections and in patients
with fever of unknown aetiology. To analyze these data effectively, we have developed an analysis
pipeline that is rapid, requires little RAM, and is designed for clinicians or microbiologists to
use. Methods: The pipeline takes raw reads from Illumina sequencing, removes host reads,
and classifies the remaining reads using CLARK in light mode. After classification, the genome
coverage of each organism identified is predicted from the number of reads assigned to it and
its genome size. If a threshold is met, the reference for that taxon ID is downloaded and the
sample reads are mapped to it. Outputs include mapping statistics such as
genome coverage and number of reads mapped. A report on the frequency at which taxon
IDs are found across the run is generated automatically, allowing users to consider
contamination. Users can customize the classification database to suit their needs, define the
relevant host genome, input contamination libraries, and specify taxa to ignore in the analysis
based on local knowledge. Results: The pipeline was used to analyze metagenomic
sequencing results from 71 CSF samples collected from patients presenting with CNS infection
in Vietnam. On completion, the pipeline had performed 104 reference-mapping analyses.
Different subtypes of torque teno virus accounted for 33 of these and were removed from
the analysis. The results were then reviewed for clinical significance by a microbiologist.
Pathogens were identified in 17 samples: Streptococcus suis (8), enteroviruses (4), mumps
virus (2), and one each of S. pneumoniae, Japanese encephalitis virus, and varicella-zoster virus
(VZV). In addition, hepatitis B virus was identified in 5 cases but was not considered a cause of
CNS disease, merely a reflection of the high prevalence of hepatitis B in Vietnam. Genome
coverage of these pathogens varied from 0.83% to 81.33%. All findings were confirmed by
specific PCR, with Ct values ranging from 27 to 40. Use of an arbitrary cut-off for reference
genome coverage led to VZV being missed and produced 2 false positives (a polytropic provirus
and Streptococcus agalactiae), both of which were negative by PCR. This shows that, just as in
culture-based diagnostics, there is no replacement for expert interpretation of
results. Conclusions: Bioinformatics can help to automate processing of metagenomic
sequencing, but interpretation remains the domain of human intelligence. Building
bioinformatics tools with this in mind will enable more rapid uptake of metagenomics for
diagnosis from sterile sites.
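The abstract does not give the formula behind the coverage-prediction step. One standard choice is the Lander-Waterman (Poisson) model. In that model, expected depth is c = N·L/G for N reads of length L on a genome of size G, and the expected fraction of the genome covered is 1 - e^(-c). The Python sketch below assumes that model. The breadth threshold (`min_breadth`) and the ignore set, which stands in for user-specified taxa to skip, are hypothetical. This is not the OUCRU pipeline's code.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class TaxonHit:
    taxon_id: int     # taxon identifier from the classifier (hypothetical values below)
    n_reads: int      # reads assigned to this taxon
    genome_size: int  # reference genome length in bases


def expected_breadth(n_reads: int, read_len: int, genome_size: int) -> float:
    """Lander-Waterman expectation of the fraction of the genome covered."""
    depth = n_reads * read_len / genome_size
    return 1.0 - math.exp(-depth)


def taxa_to_map(hits: list[TaxonHit], read_len: int, min_breadth: float,
                ignore: set[int]) -> list[int]:
    """Taxon IDs whose predicted breadth meets the (hypothetical) threshold."""
    selected = []
    for h in hits:
        if h.taxon_id in ignore:
            continue  # taxa excluded on local knowledge, e.g. known contaminants
        if expected_breadth(h.n_reads, read_len, h.genome_size) >= min_breadth:
            selected.append(h.taxon_id)
    return selected
```

Any fixed `min_breadth` trades missed low-abundance pathogens against spurious hits. The VZV miss and the two false positives reported above are examples of this trade-off, which is why calls still need expert review.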
Poster Board #: 119
Title: Microbial Dark Matter Analysis Using 16S rRNA Gene Metagenomics Sequences.
Author Block: H. Barak, A. Sivan, A. Kushmaro;
Ben-Gurion University, Beer Sheva, ISRAEL.
Abstract: Microorganisms are the most diverse and abundant life forms on Earth and account for a
large portion of the Earth’s biomass and biodiversity. To date, however, our knowledge
of microbial life remains limited, as it is based mainly on information from cultivated
organisms. Indeed, microbiologists have borrowed a term from astrophysics and call the
‘uncultured microbial majority’ ‘microbial dark matter’. The realization of how diverse and
unexplored microorganisms are actually stems from recent advances in molecular biology,
in particular from novel methods, termed next-generation sequencing (NGS), for sequencing
microbial small subunit ribosomal RNA genes directly from environmental samples. We therefore
used NGS, which generates several gigabases of sequence data in a single run, to identify and
classify microorganisms in environmental samples. In metagenomic sequencing analysis (both 16S
and shotgun), sequences are compared with reference databases that contain only a small
fraction of existing microorganisms, so taxonomic assignment may reveal groups of unknown
microorganisms or origins. These unknowns, the ‘microbial sequence dark matter’, are usually
ignored despite their great importance. The goal of this work was to develop an improved
bioinformatics method that enables more complete analyses of microbial communities in
numerous environments. To this end, NGS was used to identify previously unknown
microorganisms from three different environments (industrial wastewater, Negev Desert
rocks, and water wells in the Arava Valley). 16S rRNA gene metagenomic analysis of the
microorganisms from these three environments produced ~4 million reads from 75
samples. Between 0.1% and 12% of the sequences in each sample were tagged as ‘Unassigned’.
Using a relatively simple method of resequencing the original gDNA samples by Sanger or
Illumina MiSeq sequencing with specific primers, this study demonstrates that the mysterious
‘Unassigned’ group apparently contains sequences of candidate phyla. These unknown
sequences can be placed on a phylogenetic tree and thus provide a better understanding of
the ‘sequence dark matter’ and its role in the research of microbial communities and
diversity. Studying this ‘dark matter’ will extend existing databases and could reveal the
hidden potential of ‘microbial dark matter’.
Poster Board #: 120
Title: Whole-Genome Sequencing of Zika Virus Directly from Clinical Samples
Author Block: K. Kamelian1, A. Olmstead2, V. Montoya2, W. Dong2, M. Morshed3, P. R. Harrigan1, J. Joy2;
1University of British Columbia, Vancouver, BC, CANADA, 2BC Centre for Excellence in
HIV/AIDS, Vancouver, BC, CANADA, 3BC Centre for Disease Control, Vancouver, BC, CANADA.
Abstract: Background: In 2016, the World Health Organization declared Zika virus a Public Health
Emergency of International Concern due to the increasing prevalence of Zika virus infections
in the Americas. Zika virus has been associated with an increased incidence of the
neurological condition Guillain-Barré syndrome and the birth defect microcephaly. Routine
surveillance currently relies on PCR amplification, Sanger sequencing, and antibody-based
tests to identify new cases of Zika infection. However, whole-genome sequencing (WGS) of
Zika virus may offer certain advantages over other surveillance tools by providing more
detailed information on viral phylogenetic clustering, transmission, and
geography. Methods: Specimens from five subjects with travel-acquired Zika virus infection
(putatively from Belize, Mexico, an undisclosed Caribbean region, Barbados, and Panama)
were obtained from the British Columbia Centre for Disease Control Public Health Laboratory
(BCCDC) and had cycle threshold (Ct) values ranging from 21 to 33. WGS of Zika virus was
performed on an Illumina MiSeq using a previously published procedure designed to
overcome some of the limitations of low viral load and partially degraded samples by
amplifying several short amplicons to create a tiling path across the Zika virus genome
(Quick et al., 2017). Sequences were analyzed for depth of coverage and total number of reads,
including total quality-trimmed reads, viral reads, and human reads. Phylogenetic analysis was
performed to investigate geographic clustering of travel-related cases. Results: Consensus
sequences ranging from 8 to 10.5 kb were obtained for the five samples. Higher Ct values were
correlated with lower coverage, fewer viral reads, more human reads,
and lower overall depth. Median depth of coverage of the samples was 24,000 (IQR: 17,000-25,000).
Although some contigs had low depth of coverage (less than 10 reads), they still
provided adequate genome coverage for the regions sequenced. Phylogenetic analysis of
sequencing data confirmed the suspected regions of Zika infection for two of five samples.
For the other three samples, reference genomes from the suspected areas of infection were
missing; however, these samples clustered with sequences from neighboring regions. Conclusions: Our
results highlight the usefulness of WGS using the tiling amplicon method in a clinical setting.
WGS of Zika virus allows insight into the origins of infection, transmission patterns, and
the genetic diversity of travel-related cases. However, our five samples represented five
different sources of Zika virus infection, suggesting that the complexity and global movement
of the Zika virus epidemic are likely to limit precise interpretation of the origin of
travel-related cases, which also depends on the availability of reference sequences from regions of interest.
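The tiling-amplicon idea (Quick et al., 2017) can be illustrated by computing amplicon coordinates. Fixed-length amplicons are laid end to end with a fixed overlap. Consecutive amplicons alternate between two primer pools, so that overlapping amplicons are not amplified in the same reaction. The Python sketch below is only a geometric illustration. The amplicon length and overlap in the tests are hypothetical values, not the published Zika scheme. Real schemes also place primers at sequence-specific sites rather than at exact intervals.

```python
from __future__ import annotations

from typing import NamedTuple


class Amplicon(NamedTuple):
    number: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    pool: int   # primer pool 1 or 2


def tiling_scheme(genome_len: int, amp_len: int, overlap: int) -> list[Amplicon]:
    """Lay out overlapping amplicons across a genome, alternating primer pools."""
    if not 0 <= overlap < amp_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amp_len - overlap
    scheme: list[Amplicon] = []
    start = 0
    while True:
        end = min(start + amp_len, genome_len)  # last amplicon may be truncated
        number = len(scheme) + 1
        scheme.append(Amplicon(number, start, end, 1 if number % 2 else 2))
        if end >= genome_len:
            return scheme
        start += step
```

For example, `tiling_scheme(10_800, 400, 75)` tiles a genome of roughly Zika virus length. Because every amplicon is short, a degraded template only needs intact fragments of a few hundred bases to contribute reads. This is the property that helps with low-viral-load, high-Ct samples.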
Poster Board #: 121
Title: Identifying Putative Transmission Clusters of the Multidrug-Resistant E. coli ST131-H30
Lineage Among U.S. Children Using Whole Genome Sequencing
Author Block: A. Miles-Jay1, S. J. Weissman2, A. L. Adler2, J. G. Baseman1, D. M. Zerr2;
1University of Washington, Seattle, WA, 2Seattle Children's Hospital, Seattle, WA.
Introduction: E. coli ST131-H30 is a globally disseminated lineage that is implicated in rising
rates of multidrug resistance among extraintestinal E. coli infections. Despite the public
health significance of this pathogen, its transmission dynamics are poorly understood. This is
in part due to ST131-H30’s capacity for prolonged subclinical intestinal colonization, likely
resulting in a plethora of “silent” transmission events that are difficult to capture directly. We
assessed the ability to detect putative transmission clusters among E. coli ST131-H30 isolates
collected from U.S. children during routine clinical care. Methods: We applied whole genome
sequencing and a novel framework for transmission cluster detection to clinical E. coli ST131-
H30 isolates collected in a multicenter surveillance study that took place from 2009-2013 at 4
geographically diverse U.S. children’s hospitals. Isolates were sequenced on an Illumina
NextSeq platform. Quality filtered and trimmed sequencing reads were mapped to a high-
quality ST131-H30 reference genome and core genome single nucleotide variants were
identified. The R package transcluster—which probabilistically infers the number of
transmission events separating cases using pairwise genomic distance and sampling dates—
was used to identify and characterize putative transmission clusters where the implied
number of transmissions was less than 25 with a probability of 80% (the default
Abstract: settings). Results: A total of 126 E. coli ST131-H30 isolates were included. Twelve isolates
(9.5%) were placed into 6 putative transmission clusters; each cluster contained 2 isolates and
no clusters spanned multiple study sites. The time between sampling in a cluster ranged from
1 to 199 days. Five of the 6 clusters were composed of the CTX-M-15-type extended-spectrum
beta-lactamase-producing subclone of ST131-H30; 1 cluster was composed of non-ESBL
producing H30 isolates. The implied number of transmission events separating isolates in a
single cluster ranged from 1-18 events. The clusters contained a mix of hospital associated (n
= 5), healthcare-associated community onset (n = 3), and community-associated (n = 4)
infections. Two instances of plausible nosocomial transmission were
identified. Conclusions: The integration of whole genome sequencing data and a novel
framework for transmission cluster detection revealed putative transmission clusters
among E. coli ST131-H30 isolates collected during routine clinical care. Although geographic
location was not explicitly incorporated into the analysis, each cluster was confined to a single
study site, strengthening its epidemiologic plausibility. Whole genome sequencing of clinical
isolates could guide more detailed and resource-intensive sampling efforts designed to
elucidate transmission pathways of difficult-to-track and worrisome lineages like E.
coli ST131-H30.
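The cluster rule described above (fewer than 25 implied transmissions with probability 80%) can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions, not the transcluster model: it assumes the pairwise SNP distance is Poisson-distributed with a mean that grows with the sampling interval and with each intermediate transmission, puts a uniform prior on the number of transmissions, and links isolate pairs whose posterior probability passes the cutoff. The rates `snps_per_day` and `snps_per_transmission`, the isolates, and all function names are hypothetical.

```python
import math
from itertools import combinations


def log_poisson(k, mean):
    """Log-probability of observing k SNPs under a Poisson with the given mean."""
    if mean == 0:
        return 0.0 if k == 0 else float("-inf")
    return -mean + k * math.log(mean) - math.lgamma(k + 1)


def prob_fewer_transmissions(snps, days_apart, threshold=25,
                             snps_per_day=0.01, snps_per_transmission=0.5,
                             max_transmissions=200):
    """Toy posterior P(K < threshold | SNP distance, sampling gap).

    Hypothetical model: K intermediate transmissions has a uniform prior on
    0..max_transmissions, and the SNP distance is Poisson with mean
    snps_per_day * days_apart + snps_per_transmission * K.
    """
    logs = [log_poisson(snps, snps_per_day * days_apart + snps_per_transmission * k)
            for k in range(max_transmissions + 1)]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]  # rescale to avoid underflow
    return sum(weights[:threshold]) / sum(weights)


def putative_clusters(sampling_day, snp_distance, prob_cutoff=0.8, threshold=25):
    """Link pairs that pass the cutoff; return connected groups of 2+ isolates."""
    parent = {name: name for name in sampling_day}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(sampling_day, 2):
        gap = abs(sampling_day[a] - sampling_day[b])
        if prob_fewer_transmissions(snp_distance[(a, b)], gap, threshold) >= prob_cutoff:
            parent[find(a)] = find(b)

    groups = {}
    for name in sampling_day:
        groups.setdefault(find(name), []).append(name)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical isolates: sampling day and pairwise core-genome SNP distances.
days = {"A": 0, "B": 10, "C": 400}
dist = {("A", "B"): 1, ("A", "C"): 150, ("B", "C"): 150}
print(putative_clusters(days, dist))
```

With these made-up rates, A and B (1 SNP, 10 days apart) are linked and C is not; a real analysis would use rates appropriate to the lineage being studied.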
Poster Board #: 122
Title: Virulence Characteristics and an Action Mode of Antibiotic Resistance in Multidrug-Resistant
Pseudomonas aeruginosa
Author Block: W. Hwang; Department of Microbiology and Immunology, Yonsei University, Seoul,
KOREA, REPUBLIC OF.
Abstract: Pseudomonas aeruginosa displays intrinsic resistance to many antibiotics and is known to
actively acquire genetic mutations that confer further resistance. In this study, we attempted to
understand the genomic and transcriptomic landscapes of P. aeruginosa clinical isolates that are
highly resistant to multiple antibiotics. We also aimed to reveal a mode of antibiotic
resistance by elucidating the transcriptional response of genes conferring antibiotic resistance.
To this end, we sequenced the whole genomes and profiled genome-wide RNA transcripts of
three multidrug-resistant (MDR) clinical isolates that are phylogenetically distant
from one another. Multi-layered comparisons with the genomes of antibiotic-susceptible
P. aeruginosa strains and 70 other antibiotic-resistant strains revealed both well-characterized,
conserved gene mutations and a distinct distribution of antibiotic resistance genes
(ARGs) among strains. Transcription of genes involved in quorum sensing and type VI
secretion systems was invariably downregulated in the MDR strains. Virulence-associated
phenotypes were further examined, and the results indicate that our MDR strains are clearly
avirulent. Transcription of 64 genes, selected for their predicted association with antibiotic
resistance in MDR strains, was active under normal growth conditions and remained unchanged
during antibiotic treatment. These results suggest that antibiotic resistance is achieved by a
“proactive” response scheme, in which ARGs are constitutively expressed even in the absence
of antibiotic stress, rather than by a “reactive” response. Bacterial responses explored at the
transcriptomic level in conjunction with their genome repertoires provided novel insights into
(i) the virulence-associated phenotypes and (ii) a mode of antibiotic resistance in MDR P.
aeruginosa strains.
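The distinction between a “proactive” and a “reactive” scheme can be made operational by comparing each gene's expression with and without antibiotic. The sketch below is an illustration under assumptions, not the authors' method: it takes normalized expression values (for example, TPM) for untreated and treated conditions, treats a gene as induced ("reactive") when its log2 fold change reaches a hypothetical cutoff, and as constitutive ("proactive") when it is already above a hypothetical baseline level and is not induced. Gene names and values are invented.

```python
import math


def classify_response(untreated, treated, expressed_cutoff=10.0, lfc_cutoff=1.0,
                      pseudocount=1.0):
    """Label each gene 'proactive', 'reactive', or 'not expressed'.

    untreated/treated map gene names to normalized expression (e.g. TPM).
    Both cutoffs are illustrative assumptions, not published thresholds.
    """
    labels = {}
    for gene, base in untreated.items():
        lfc = math.log2((treated[gene] + pseudocount) / (base + pseudocount))
        if lfc >= lfc_cutoff:
            labels[gene] = "reactive"       # induced by antibiotic exposure
        elif base >= expressed_cutoff:
            labels[gene] = "proactive"      # already on, not induced further
        else:
            labels[gene] = "not expressed"
    return labels


# Hypothetical expression values for three genes.
untreated = {"geneA": 120.0, "geneB": 2.0, "geneC": 1.0}
treated = {"geneA": 130.0, "geneB": 60.0, "geneC": 1.5}
print(classify_response(untreated, treated))
```

Under this framing, the abstract's observation corresponds to resistance genes falling mostly into the "proactive" class.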
Poster Board #: 123
Title: Metagenomics Analysis of Spent Water from a Livestock Farm
Author Block: O. A. Aiyegoro1, A. A. Adegoke2, D. T. Babalola3, E. Adetiba4; 1Agricultural Research
Council- Animal Production Institute, Pretoria, SOUTH AFRICA, 2University of Uyo, Akwa-Ibom, Uyo,
NIGERIA, 3Covenant University-Centre for Systems and Information Services, Cannanland, Ota,
NIGERIA, 4Covenant University-Department of Electrical and Information Engineering,
Cannanland, Ota, NIGERIA.
Abstract: Microorganisms are ubiquitous and important to the proper functioning of ecosystems,
including aquatic milieus, where they play vital roles in the water cycle and in the removal of
nutrients and toxins. Studying these microbes is therefore essential. Until recently,
non-culturable microbes were difficult to study; however, the advent of metagenomic analysis has
helped to solve this problem. In the present study, the gene profile of the microbial community
in used water from an animal research farm was analysed in order to obtain a profile of the
entire resident microbiome of the water. MetaPhlAn2 was used to analyse the microbial
communities represented in the sequence data. The per-sample results were combined into a
merged abundance table, which was edited and viewed using LibreOffice Calc. A heatmap
showing the abundance profiles of the microbes was generated using Hclust2. A cladogram
showing taxonomic relatedness was produced with GraPhlAn by rendering trees and
annotating them with microbial names and relative abundances. Pie charts comparing the
various clades were generated using Krona. The analysis showed that the community was
dominated by Bacteria (99.88%), with Archaea (0.07%) and Viruses (0.05%) present in
very small proportions. Five phyla were present within the bacterial population:
Actinobacteria (0.5%), Bacteroidetes (9%), Firmicutes (3.3%), Proteobacteria (86.6%) and
Spirochaetes (0.6%). A total of 40 different genera were identified, with the genus Thauera
making up 74% of the entire population. This is unsurprising, as Thauera is a denitrifying
bacterium that plays a crucial role in wastewater ecosystems, including the removal of
nitrate and of aromatic compounds; for this reason, Thauera is commonly detected in wastewater
treatment samples. Another notable genus in the bacterial population is Thiomonas, which
made up about 5% of the bacterial population. The viruses in the effluent were mainly members
of the family Siphoviridae and the genus Gammaretrovirus, and the archaeal fraction consisted
solely of the genus Methanobrevibacter. Metagenomic analysis provided insight into the
microbial community present in the wastewater effluent: we were able to characterize the
microorganisms present and their relative abundances, and to use various tools to produce
graphical illustrations that aided our analysis. The results revealed that the great majority of
the microorganisms present were bacteria and showed their diversity. A large part of this
bacterial community is directly involved in wastewater ecology and plays major roles in the
breakdown of chemical compounds present in the water.
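Combining per-sample profiles into a merged abundance table, as described above, amounts to an outer join on clade names with missing clades filled as zero. The sketch below assumes a simplified two-column, tab-separated profile (clade name, relative abundance) with `#` comment lines, loosely modelled on MetaPhlAn2 output; MetaPhlAn2 provides its own merging script, and this is only an illustration of the idea. Sample `s1` reuses the domain-level percentages reported above; sample `s2` is invented.

```python
def parse_profile(text):
    """Parse 'clade<TAB>relative_abundance' lines, skipping '#' comment lines."""
    profile = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        clade, abundance = line.split("\t")[:2]
        profile[clade] = float(abundance)
    return profile


def merge_profiles(profiles):
    """Outer-join {sample: {clade: abundance}} into (header, rows); missing = 0.0."""
    samples = sorted(profiles)
    clades = sorted({c for p in profiles.values() for c in p})
    rows = [[clade] + [profiles[s].get(clade, 0.0) for s in samples] for clade in clades]
    return ["clade"] + samples, rows


s1 = "#SampleID\ts1\nk__Bacteria\t99.88\nk__Archaea\t0.07\nk__Viruses\t0.05\n"
s2 = "#SampleID\ts2\nk__Bacteria\t100.0\n"  # hypothetical second sample
header, rows = merge_profiles({"s1": parse_profile(s1), "s2": parse_profile(s2)})
print("\t".join(header))
for row in rows:
    print("\t".join(str(x) for x in row))
```

The resulting table is the kind of input that heatmap and cladogram tools consume.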
Poster Board #: 124
Title: Nanopore Sequencing for AMR Detection and Characterization
Author Block: Y. Fan1, W. Timp1, T. Simner2, P. Tamma2, Y. Bergman2; 1Johns Hopkins University,
Baltimore, MD, 2Johns Hopkins Hospital, Baltimore, MD.
Abstract: Background: The continuing threat of antimicrobial resistance poses an urgent global
public health concern. Early and accurate detection of resistance mechanisms can both prevent the
dissemination of resistant organisms within the healthcare environment and ensure that patients
are placed on early and effective antibiotic therapy. To this end, we leveraged the long reads,
low overhead, and real-time analysis capabilities of nanopore sequencing to detect
both acquired resistance genes and chromosomal mutations that potentially confer
antimicrobial resistance. Methods: Forty clinical Klebsiella pneumoniae isolates with a variety
of resistance mechanisms from patients hospitalized at The Johns Hopkins Hospital were
sequenced on the Oxford Nanopore MinION platform. Genomes were assembled using canu and
corrected with the signal-level algorithms implemented in the nanopolish software package.
These isolates were also sequenced on the Illumina platform, and the more accurate short-read
data were used to further correct the assemblies. ABRicate was used to screen the contigs for
resistance genes and plasmid replicons using several databases, including CARD, ResFinder, and
PlasmidFinder. Chromosomal mutations and their consequences at the amino acid level
were identified using custom C++ code. Results: We found that disagreements between the
nanopolish-polished and Illumina-polished assemblies clustered near methylation motifs. By
examining these errors, and by using improved signal models for these motifs in
a methylation-aware version of nanopolish, we can build an assembly using only nanopore
data that achieves ~99.8% identity with Illumina-polished assemblies. We are continuing
to examine the locations and motifs of the remaining ~0.2% of errors to identify new and
better approaches to polishing nanopore-only assemblies. This remaining ~0.2% matters
because it hampers accurate prediction of protein translation, making truncating or missense
mutations difficult to detect. Corrected assemblies allowed us to identify a variety of small
mutations reported in the literature to be responsible for resistance phenotypes. By limiting our
analysis pipeline to 52,000 reads with an average length of 10 kb, we are able to sequence
and build a high-quality genome in under 8 hours using a machine with 36 cores and 72 GB
of RAM. Conclusions: We are developing tools that apply nanopore sequencing to the rapid and
accurate identification of antibiotic resistance mutations, which will assist clinicians in placing
critically ill patients on early and effective antibiotic therapy. As we collect and sequence
more isolates and accrue information about genetic features that give rise to antimicrobial
resistance, we will increase the utility of real-time sequencing assays for diagnostic purposes.
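Two figures in this abstract can be unpacked with simple arithmetic, and the amino-acid consequence step can be sketched. Assuming a K. pneumoniae genome of roughly 5.5 Mb (an assumption; the abstract does not state genome size), 52,000 reads averaging 10 kb give about 95-fold coverage. Assuming errors are independent and spread uniformly, ~99.8% identity means a 1 kb gene is error-free only about 13.5% of the time, which is why the residual ~0.2% complicates translation-based calls. The `classify_snv` helper is a hypothetical Python stand-in, not the authors' C++ code: it applies a substitution to an in-frame coding sequence, translates the affected codon with the standard genetic code, and labels the change.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon):
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def fold_coverage(n_reads, mean_read_len, genome_len):
    return n_reads * mean_read_len / genome_len


def prob_error_free(length, identity):
    """Chance a region has no errors, assuming independent per-base errors."""
    return identity ** length


def classify_snv(cds, pos, alt):
    """Classify a substitution at 0-based position `pos` of an in-frame CDS."""
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:pos % 3] + alt + ref_codon[pos % 3 + 1:]
    ref_aa, alt_aa = translate_codon(ref_codon), translate_codon(alt_codon)
    label = f"{ref_aa}{start // 3 + 1}{alt_aa}"  # e.g. S2I, 1-based codon number
    if ref_aa == alt_aa:
        return "synonymous", label
    if alt_aa == "*":
        return "nonsense", label
    return "missense", label


print(round(fold_coverage(52_000, 10_000, 5_500_000)))  # assumed 5.5 Mb genome
print(round(prob_error_free(1_000, 0.998), 3))
print(classify_snv("ATGAGCTGGTAA", 4, "T"))  # toy CDS, not a real gene
print(classify_snv("ATGAGCTGGTAA", 8, "A"))
```

Indels, the dominant nanopore error type, would instead shift the reading frame; handling them needs a full re-translation rather than a single-codon lookup.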
Poster Board #: 125
Title: ASA³P: An Automatic and Highly Scalable Pipeline for Bacterial Genome Assembly, Annotation
and Higher-level Analyses
Author Block: O. Schwengers1, A. Hoek1, M. Schneider1, M. Fritzenwanker2, L. Falgenhauer2,
J. Falgenhauer2, T. Hain2, T. Chakraborty2, A. Goesmann1; 1Bioinformatics and Systems Biology,
Justus-Liebig University Giessen, Giessen, GERMANY, 2Institute of Medical Microbiology,
Justus-Liebig University Giessen, Giessen, GERMANY.
Abstract: Background: Major technological advances and the dramatic decrease in the cost of
bacterial whole genome sequencing are having an unprecedented effect on genomic epidemiology and
metagenomics. These exciting developments require effective, efficient and scalable
bioinformatics software tools to process and analyze the resulting high-throughput data
before scientific interpretation can take place. Methods: To solve core bioinformatics tasks
such as quality trimming, assembly and annotation, ASA³P takes advantage of published,
well-performing third-party tools and combines them with comprehensive databases. It is a
modular software pipeline comprising a core application implemented in Java and Groovy
together with cluster-distributable scripts implemented in Groovy. HTML reports take advantage
of modern, interactive JavaScript libraries. For massive scalability, the pipeline integrates
well with Sun Grid Engine-compatible compute clusters. Within cloud computing environments,
the software is able to set up complex hardware and software infrastructures and is thus able
to automatically create its own compute clusters. Results: Here, we introduce ASA³P, a fully
automatic and scalable assembly, annotation and higher-level analysis pipeline for bacterial
genomes. The pipeline conducts all of the necessary data processing steps, i.e. quality clipping
and assembly of sequencing reads, scaffolding of the resulting contigs and annotation of genome
sequences. Furthermore, ASA³P performs comprehensive genome characterizations and analyses,
e.g. taxonomic classification and detection of both AMR genes and virulence factors. Results are
presented via an HTML-based user interface providing aggregated information, interactive
visualizations and access to intermediate results in standard bioinformatic file formats. ASA³P
is available in two versions: a local Docker container for small-scale projects and an
OpenStack cloud version able to automatically create and manage its own self-scaling
compute cluster. Discussion: ASA³P is a software tool enabling the automatic processing,
assembly, annotation and higher-level analysis of bacterial NGS whole genome data in a
convenient yet high-throughput manner. The burden of technical complexity is overcome by
simple setup routines and the use of Docker and OpenStack images. Thus, automatic and
standardized analysis of hundreds of bacterial genomes is now feasible on a daily basis.
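A modular pipeline of this kind can be modelled as a dependency graph of steps that a scheduler runs in topological order, dispatching steps with no mutual dependencies in parallel (for example, as separate cluster jobs). The sketch below uses Python's standard `graphlib` module; the step names mirror the stages listed in the abstract, but the graph itself is an assumption, not ASA³P's actual internal structure.

```python
from graphlib import TopologicalSorter

# Hypothetical step graph: each step maps to the set of steps it depends on.
PIPELINE = {
    "quality_clipping": set(),
    "assembly": {"quality_clipping"},
    "scaffolding": {"assembly"},
    "annotation": {"scaffolding"},
    "taxonomic_classification": {"scaffolding"},
    "amr_detection": {"annotation"},
    "virulence_detection": {"annotation"},
    "html_report": {"taxonomic_classification", "amr_detection", "virulence_detection"},
}


def execution_waves(graph):
    """Group steps into waves; steps within one wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # in a real scheduler, mark done as jobs finish
    return waves


for wave in execution_waves(PIPELINE):
    print(wave)
```

Keeping the graph as data makes it straightforward to add an analysis module without touching the scheduling logic.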
Poster Board #: 126
Title: Soil Microbiome Phenotypic Response to Nutrient and Moisture Perturbations
Author Block: T. Roy Chowdhury1, E. M. Bottos2, R. A. White III2, J. O. Brown2, L. M. Bramer2,
J. D. Zucker2, C. M. Brislawn2, S. J. Fansler3, L. McCue3, S. J. Callister3, J. K. Jansson3;
1University of Maryland, College Park, MD, 2Pacific Northwest Laboratory, Richland, WA,
3Pacific Northwest National Laboratory, Richland, WA.
Abstract: Soil microbiome responses to changing environmental conditions are manifested as shifts
in community structure and/or modification of activity. However, molecular-level details
underlying the functional responses of soil microbiomes to perturbation are largely unknown.
Here, we demonstrate a multi-omics approach to determine the impact of environmental
perturbations on the soil microbiome across taxonomic and functional levels. Kansas native
prairie soil samples from three field locations were either treated with glycine as a model root
exudate or perturbed by changing moisture conditions. The microbiome response was
assessed using a suite of omics measurements: 16S rRNA amplicon sequencing,
metagenomics, metatranscriptomics, and metabolomics. The soil microbiome responded to
glycine at the functional level, but not at the community structure level. In contrast, soil
drying shifted both microbiome composition and function. A major challenge in soil
microbial ecology is the extraordinary phylogenetic and functional diversity of the soil
microbiome combined with the physico-chemical complexity of the soil habitat. By using a
multi-omics approach, we elucidated the phenotypic response of the soil microbiome
across different levels of expression, providing a proof of concept for the use of this
approach to assess key physiological traits expressed by the soil microbiome.
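One simple way to express "a functional shift without a community-structure shift" is to compare a treated and a control sample with the same dissimilarity index on two profiles: taxon abundances and gene-function abundances. The sketch below uses Bray-Curtis dissimilarity on invented counts; the abstract does not state which metric the authors used, so both the metric and the data are assumptions.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    keys = set(u) | set(v)
    shared = sum(min(u.get(k, 0), v.get(k, 0)) for k in keys)
    return 1 - 2 * shared / (sum(u.values()) + sum(v.values()))


# Hypothetical counts for one control and one glycine-treated sample.
taxa_control = {"taxon_a": 40, "taxon_b": 35, "taxon_c": 25}
taxa_glycine = {"taxon_a": 39, "taxon_b": 36, "taxon_c": 25}
func_control = {"function_x": 20, "function_y": 30, "function_z": 50}
func_glycine = {"function_x": 55, "function_y": 25, "function_z": 20}

print("taxonomic:", round(bray_curtis(taxa_control, taxa_glycine), 2))
print("functional:", round(bray_curtis(func_control, func_glycine), 2))
```

In this toy case the taxonomic profiles barely differ while the functional profiles diverge, mirroring the glycine result; a real comparison would use replicates and a formal test.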
Poster Board #: 127
Title: Microbial Diversity of New Orleans Groundwater Using Illumina MiSeq
Author Block: S. Sherchan; Tulane University, New Orleans, LA.
Abstract: Groundwater contamination can result in poor drinking water quality and loss of water
supply, and can pose a serious risk to human health. Increased attention has been devoted to the
direct detection of pathogenic organisms in groundwater using next-generation sequencing
(NGS). We investigated the microbial biodiversity of groundwater using Illumina MiSeq. Water
samples were collected from 55 private wells in New Orleans, Louisiana. We identified
twenty bacterial phyla. Proteobacteria was the dominant phylum in most samples
(relative abundance: 71.1%), followed by Chlorobi (5.1%), Actinobacteria (4.2%), Chloroflexi
(3.3%), Cyanobacteria (2.2%), and Bacteroidetes (2.0%) (Fig. S5). At the genus level, five
genera were abundant (>3%) in well water samples: Methylomonas (5.3%),
Methylosinus (3.7%), Mycobacterium (3.4%), Dechloromonas (3.3%), and Thiobacillus (3.1%).
The relative abundances of the classes Gammaproteobacteria and Actinobacteria were
positively associated with qPCR results for Legionella spp. and mycobacteria, respectively;
however, these associations were not significant in regression analysis (p > 0.05). Principal
coordinates analysis (PCoA) of unweighted UniFrac distances indicated that patterns of
bacterial community composition in groundwater reflect sampling locations.
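Unweighted UniFrac, the distance behind the PCoA above, is the fraction of observed branch length in a phylogeny that leads to taxa present in only one of the two samples; it uses presence/absence rather than abundance. The sketch below computes it for a tiny hypothetical tree stored as child-to-parent links with branch lengths. Real analyses use a tree built from the sequence data and an established implementation.

```python
def unweighted_unifrac(parent, length, sample_a, sample_b):
    """Unweighted UniFrac distance between two sets of observed tips.

    parent maps each node to its parent (the root has no entry);
    length maps each node to the length of the branch above it.
    """
    def branches_above(tips):
        seen = set()
        for tip in tips:
            node = tip
            # Walk toward the root; stop early once we reach a branch already counted.
            while node in parent and node not in seen:
                seen.add(node)
                node = parent[node]
        return seen

    in_a, in_b = branches_above(sample_a), branches_above(sample_b)
    unique = sum(length[n] for n in in_a ^ in_b)     # branches seen in one sample only
    observed = sum(length[n] for n in in_a | in_b)   # branches seen in either sample
    return unique / observed


# Hypothetical tree:      root
#                        /    \
#                      n1      t3
#                     /  \
#                   t1    t2
parent = {"t1": "n1", "t2": "n1", "n1": "root", "t3": "root"}
length = {"t1": 1.0, "t2": 1.0, "n1": 2.0, "t3": 3.0}
print(unweighted_unifrac(parent, length, {"t1"}, {"t2"}))
print(unweighted_unifrac(parent, length, {"t1"}, {"t3"}))
```

Closely related taxa (t1, t2) share the branch above n1 and therefore give a smaller distance than distantly related ones (t1, t3).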
Poster Board #: 128
Title: Third-Generation Sequencing of Microbes Isolated from Fermented Beverages
Author Block: J. H. Collins, E. Van Zyl, A. Cosio, G. Brogan, L. Chico, A. Rozen, E. Thomson,
J. Coburn, E. M. Young; Worcester Polytechnic Institute, Worcester, MA.
Abstract: Fermented beverages represent a growing multibillion-dollar industry. This growth is not
limited to beer and wine: other products such as hard cider and kombucha are among the
fastest-growing sectors of the industry. Understanding the microbial communities that generate
(or spoil) these products is essential to controlling flavor and consistency. Plummeting
sequencing costs and new long-read sequencing technologies can enable genomics of individual
isolates as well as microbial consortia. To this end, we demonstrate the utility of nanopore
sequencing for genome and metagenome sequencing of beer, hard cider, and kombucha fermentations.
We present de novo genome assemblies of four isolates: a Saccharomyces cerevisiae strain
from homebrew, two novel Saccharomyces and Pichia yeast isolates from hard cider, and
a Gluconacetobacter xylinus strain from a home kombucha culture. We further attempted to
sequence the metagenome of the kombucha culture and show that genome assembly of the most
common Acetobacter strain is possible, although targeted approaches such as 16S rRNA amplicon
sequencing are likely better suited to capturing all members of the community. To assemble
the isolate genomes, four de novo genome assemblers (MiniASM, Canu, Flye, and
SMARTdenovo) were evaluated at varying genome coverages; SMARTdenovo performed best
based on number of contigs, number of mismatches, and average coding sequence (CDS)
length. Assembly quality can be greatly improved by scaffolding to a reference
genome with pyScaf and polishing with Nanopolish, which increased the average CDS length by
approximately 33% across all assemblies. Even so, the average CDS length of the polished
nanopore assemblies was approximately 75% of that of their reference strains, which could be
improved by complementary Illumina sequencing. Nevertheless, without Illumina data, we were able
to place the homebrew S. cerevisiae strain in the “Beer 2” clade based on multiple mutations in
the MAL11 gene and a specific W497 mutation in the FDC1 gene, identify new yeast species
from hard cider, assemble a genome from a metagenome experiment, and assemble a
complete G. xylinus genome from an isolate, with a contig longer than the existing reference
genome. This demonstrates that good-quality genome assemblies of microbes from fermented
beverages can be obtained with nanopore sequencing. The low cost and ease of use of nanopore
technology promise high-quality genomic information for future strain breeding or engineering,
as well as for assessing spoilage for process control. Thus, a workflow of nanopore sequencing
coupled with de novo assembly using SMARTdenovo, and optionally pyScaf and Nanopolish, can
facilitate quality genomics of microbes from fermented beverages and therefore has great
potential utility for producers of fermented beverages for the control of flavor and
consistency.
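The assembler comparison above relies on per-assembly summary statistics. The sketch below computes three such numbers for a toy assembly: contig count, N50 (a standard contiguity measure, not one named in the abstract), and mean open reading frame length as a crude proxy for average CDS length. Indel errors, which dominate unpolished nanopore assemblies, introduce frameshifts and premature stop codons that shorten predicted CDSs; that is why polishing raises the average. The ORF scan (forward strand only, ATG to the first in-frame stop, hypothetical minimum length) is a simplification of what dedicated gene callers do, and the contigs are invented.

```python
STOPS = {"TAA", "TAG", "TGA"}


def n50(contig_lengths):
    """Smallest length L such that contigs of length >= L cover half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def orf_lengths(seq, min_codons=100):
    """Forward-strand ORFs: first ATG to the next in-frame stop (stop included)."""
    seq = seq.upper()
    lengths = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    lengths.append(i + 3 - start)
                start = None
    return lengths


def summarize(contigs, min_codons=100):
    orfs = [n for c in contigs for n in orf_lengths(c, min_codons)]
    return {
        "contigs": len(contigs),
        "n50": n50([len(c) for c in contigs]),
        "mean_orf_len": sum(orfs) / len(orfs) if orfs else 0.0,
    }


# Toy contigs, with a tiny ORF cutoff so that the example finds something.
print(summarize(["ATGAAATTTTAA", "CCATGCCCTGA", "GG"], min_codons=2))
```

Comparing the mean ORF length of an assembly with that of its reference gives a rough, alignment-free indicator of residual frameshift errors.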
Poster Board #: 129
Title: Whole Genome Analysis of Clinical Multidrug-resistant Acinetobacter baumannii Strains in
Vietnam Hospital
Author Block: N. Si-Tuan1, C. Nguyen2, H. Nguyen Thuy3; 1Thongnhat General Hospital of Dongnai
Province, 234 highway 1A, Tan Bien ward, Bien Hoa City, VIET NAM, 2Department of Bioinformatics
and Medical Statistics, Vinmec Research Institute of Stem Cell and Biotechnology, Hanoi,
VIET NAM, 3Department of Biotechnology, Faculty of Chemical Engineering, Ho Chi Minh City
University of Technology, HCM National University, Ho Chi Minh City, VIET NAM.
Abstract: Background: Acinetobacter baumannii is an important nosocomial pathogen that can develop
multidrug resistance. In this study, we sought to explore the genomic properties,
phylogenetic relationships, and comparative genomics of this pathogen through strains
DMS06669 and DMS06670, isolated from the sputum of two male patients with hospital-acquired
pneumonia. Methods: Whole genome analysis of A. baumannii DMS06669 and DMS06670 included
de novo assembly, gene prediction, functional annotation against public databases,
phylogenetic tree construction based on average nucleotide identity, pan-genome analysis,
and identification of antibiotic resistance genes. The antibiotic resistance genes were
isolated in vitro by PCR and re-confirmed by an improved Sanger method. Results: The data
showed that a total of 19 possible antibiotic resistance genes, conferring resistance to eight
distinct classes of antibiotics, were identified in the two strains. Nine of these genes have not
previously been reported to occur in A. baumannii. Comparative analysis of 23 available
genomes of A. baumannii revealed an open pan-genome consisting of 15,883 genes. All
antibiotic resistance genes were successfully isolated. Conclusions: Our results provide
important information regarding mechanisms that may contribute to antibiotic resistance in
strains DMS06669 and DMS06670 and have implications for the treatment of patients infected
with A. baumannii.
Keywords: Acinetobacter baumannii, multidrug resistance, pan-genome analysis, comparative
whole-genome analysis, next generation sequencing.
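An “open” pan-genome is one that keeps growing as genomes are added. A common way to quantify this, not stated in the abstract and assumed here, is to fit Heaps' law, P(n) ≈ k·n^γ, to the pan-genome size P after n genomes; γ > 0 indicates that new genes keep appearing. The sketch below builds pan- and core-genome accumulation curves from hypothetical gene sets in a single fixed order (real analyses average over many random orderings) and estimates γ as the least-squares slope on log-log values.

```python
import math


def pan_genome_curve(genomes):
    """Cumulative (pan size, core size) as genomes are added in order."""
    pan, core, curve = set(), None, []
    for genes in genomes:
        pan |= genes
        core = set(genes) if core is None else core & genes
        curve.append((len(pan), len(core)))
    return curve


def heaps_gamma(pan_sizes):
    """Least-squares slope of log(pan size) against log(number of genomes)."""
    xs = [math.log(n) for n in range(1, len(pan_sizes) + 1)]
    ys = [math.log(p) for p in pan_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Hypothetical genomes: a shared core of 100 genes plus 10 genome-specific genes each.
genomes = [set(range(100)) | {f"g{i}_{j}" for j in range(10)} for i in range(5)]
curve = pan_genome_curve(genomes)
print(curve)
print(round(heaps_gamma([pan for pan, _ in curve]), 2))
```

Because every hypothetical genome contributes new genes, the fitted γ is positive (open); five identical genomes would give γ = 0.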
Poster Board #: 130
Title: Genomic Characterization of Carbapenem-resistant Escherichia coli Isolated from Clinical
Samples in Thailand
Author Block: K. R. Margulieux1, A. Srijan1, S. Ruekit1, E. Snesrud2, A. Ong2, O. Serichantalergs1,
R. Kormanee3, P. Sukhchat3, A. Jones2, P. McGann2, J. Crawford1, B. Swierczewski2; 1Armed Forces
Research Institute of Medical Sciences, Bangkok, THAILAND, 2Walter Reed Army Institute of
Research, Silver Spring, MD, 3Queen Sirikit Naval Hospital, Chonburi, THAILAND.
Abstract: Background: The spread of multidrug-resistant bacterial pathogens is one of the most
dangerous current public health threats, especially in regions such as Southeast Asia with
widely unregulated antibiotic usage. Whole genome sequencing (WGS) of clinical isolates can
provide additional insights into multidrug-resistant isolates and complement the efforts of local
clinical laboratories. Methods: A total of 183 clinical Escherichia coli isolates were collected at
Queen Sirikit Naval Hospital in Chonburi, Thailand from October 2016 to January 2018 as part
of routine surveillance for multidrug-resistant pathogens. The isolates were verified and
underwent antimicrobial susceptibility testing using the BD Phoenix™ 50 and the NMIC/ID 95
panel to screen for carbapenem-resistant isolates. WGS of the identified carbapenem-resistant
isolates was performed on an Illumina MiSeq benchtop sequencer, and the data were subsequently
analyzed. Results: Of the 183 multidrug-resistant E. coli isolates collected, 168 (91.8%)
demonstrated resistance to 3rd-generation cephalosporins and 167 (91.3%) to aztreonam.
Phenotypic resistance to at least one carbapenem tested was observed in 11/183 (6.0%)
isolates, and 8/11 isolates were positive for carbapenemase production by the CarbaNP test.
Genomic characterization of the E. coli isolates showed that the 11 isolates carried 5 different
carbapenem-resistance genes between them. Single isolates carried blaNDM-4, blaKPC-2,
and blaOXA-181, two isolates carried blaOXA-48, and six isolates carried blaNDM-5. The three
isolates that tested negative for carbapenemase production with the CarbaNP test
carried blaOXA genes. Isolate relatedness was assessed by whole genome
comparison. Conclusions: A total of 11/183 (6%) multidrug-resistant E. coli isolates identified
over 16 months at Queen Sirikit Naval Hospital were carbapenem resistant. Five
carbapenem-resistance genes were identified in these 11 isolates. The most prevalent
carbapenem-resistance gene was blaNDM-5, a gene increasingly reported in Southeast Asia.
Notably, the isolates carrying blaOXA genes appeared to have lower phenotypic levels of
carbapenemase production than isolates carrying other gene types. This may lead to
under-reporting of carbapenem resistance in the region, and to treatment complications if it is
not detected during routine clinical screening. Long-term surveillance of hospitals in Thailand
should continue, and WGS-based in-depth isolate analysis is important to fully
understand the current circulation of resistant pathogens and their evolution over time.
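The counts above can be cross-checked with a short tally. The per-isolate gene list below is reconstructed only from the aggregate counts in the abstract (one isolate each with blaNDM-4, blaKPC-2 and blaOXA-181, two with blaOXA-48, six with blaNDM-5), under the assumption, implied by those counts summing to 11, that each carbapenem-resistant isolate carries exactly one carbapenemase gene.

```python
from collections import Counter

TOTAL_ISOLATES = 183
# One carbapenemase gene per carbapenem-resistant isolate (reconstructed from counts).
carbapenemase_genes = (["blaNDM-4", "blaKPC-2", "blaOXA-181"]
                       + ["blaOXA-48"] * 2 + ["blaNDM-5"] * 6)


def percent(part, whole):
    return round(100 * part / whole, 1)


tally = Counter(carbapenemase_genes)
oxa_carriers = sum(n for gene, n in tally.items() if gene.startswith("blaOXA"))

print(tally.most_common(1))                                # most prevalent gene
print(percent(len(carbapenemase_genes), TOTAL_ISOLATES))   # carbapenem-resistant share
print(percent(168, TOTAL_ISOLATES), percent(167, TOTAL_ISOLATES))
print(oxa_carriers)
```

The blaOXA total matches the three isolates that were negative by the CarbaNP test, consistent with the abstract's note on weaker phenotypic detection of OXA-type enzymes.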
Poster Board #: 131
Title: All 16S rRNA Gene Variable Regions Essential for Microbiome Survey
Author Block: Q. Yang, C. Franco, W. Zhang; Flinders University, Adelaide, AUSTRALIA.
Abstract: Background: Marine sponges (phylum Porifera) are enriched with host-specific and
opportunistic microorganisms that can make up as much as 60% of the mesohyl volume. The majority
of these microbes have not been identified. 16S rRNA gene-based amplicon sequencing has
become the method of choice for studying sponge microbiomes; however, results from
amplicon-based analyses that employ only one pair of primers targeting specific variable regions
have been found to be grossly under-representative. Therefore, the aims of this study were to test
the hypothesis that primers targeting different variable regions of the 16S rRNA gene reveal
vastly different parts of the microbiome, and to develop an improved approach that reveals a
more complete microbial profile. Methods: To test the hypothesis, five primer sets covering
all variable regions of the 16S rRNA gene were validated on the Illumina MiSeq platform by
characterizing the microbiomes of four representative sponge species from different orders.
Results: A significant increase in microbiome coverage was achieved: 29.5% of phylum-level OTUs
and 35.5% of class-level OTUs generated with this approach could not be recovered by
the most commonly used single primer set, which targets the V4 region only. In terms of
microbial sequence recovery, this approach could increase sequence coverage by 93.9 to
549.9% for each of the four sponge species compared with the V4 primer set. A
further indirect comparison with a metagenomics-based microbiome survey demonstrated that
the multi-primer approach performed substantially better, especially in revealing unaffiliated
taxa (either candidate or unassigned). Conclusions: Our study indicates that a
validated combination of variable region-specific primer sets covering the full length of the
16S rRNA gene is essential for analyzing sponge microbial communities by amplicon-based
analysis, so as to avoid incomplete and misleading microbiome profiles. This multi-primer
approach can be conveniently applied and represents a fundamental change from
conventional single-primer-set amplicon-based microbiome studies. It could contribute
significantly to any microbiome survey project by providing a more comprehensive
understanding of the microbial profile. Its superior capacity for uncovering unaffiliated
microbial OTUs allows greater potential to discover taxonomic ‘blind spots’ within
the largely unknown microbial world.
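Both headline results reduce to simple arithmetic: the share of OTUs missed by V4 is the number of OTUs in the union of all primer sets that V4 did not recover, divided by the size of that union; the coverage gain is the multi-primer read count minus the V4 read count, relative to the V4 count. In the sketch below, the primer-set labels, OTU sets and read counts are invented for illustration only.

```python
def missed_by_single_primer(otus_by_primer, reference="V4"):
    """Percentage of OTUs (union of all primer sets) not recovered by the reference set."""
    union = set().union(*otus_by_primer.values())
    missed = union - otus_by_primer[reference]
    return 100 * len(missed) / len(union)


def coverage_increase(multi_reads, single_reads):
    """Percentage increase in recovered sequences relative to the single primer set."""
    return 100 * (multi_reads - single_reads) / single_reads


# Hypothetical class-level OTUs recovered by each of five primer sets.
otus = {
    "V1-V2": {"c1", "c2", "c3"},
    "V3-V4": {"c2", "c4"},
    "V4": {"c2", "c4", "c5"},
    "V5-V6": {"c5", "c6"},
    "V7-V9": {"c7"},
}
print(round(missed_by_single_primer(otus), 1))
print(coverage_increase(15_000, 10_000))
```

The same two functions applied per taxonomic rank and per sponge species would reproduce the kind of comparison reported above.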
Poster Board #: 132
Title: Factors Associated with Surface Water Microbial Community Structure
Author Block: T. Chung1, D. L. Weller2, J. Kovac1; 1The Pennsylvania State University, State
College, PA, 2Cornell University, Ithaca, NY.
Abstract: Background: According to the U.S. Geological Survey, surface-water sources accounted for
52% of all irrigation withdrawals in 2015. However, temporal variation in physical (e.g.,
turbidity) and chemical (e.g., pH) water quality, and spatiotemporal variation in
environmental factors (e.g., weather, proximity to upstream livestock operations), may affect
the microbial community composition and microbiological quality of surface water. While a
number of studies have investigated drinking water microbiomes, little is known about the
microbial communities in surface waters used for irrigation. Here we investigated
geospatial, weather and landscape factors associated with surface water micro- and
mycobiomes. Methods: We characterized the bacterial and fungal community composition of
68 samples collected using Moore swabs from six streams in upstate New York between May
and August 2017. Samples were separated into particulate matter (i.e., soil) and water
fractions. Total DNA was extracted from the soil and water fractions using PowerSoil (n=68)
and PowerWater (n=46) kits, respectively. Data on physical and chemical water quality, upstream
landscape characteristics and weather were also collected at sampling. Microbial community
composition of each sample was determined by Illumina sequencing of the PCR-amplified 16S
rRNA gene V4 region and ITS sequences. Sequences were processed using Mothur. The
resulting OTUs were used to investigate sample biodiversity within and between streams
using permutational multivariate analysis of variance (PERMANOVA), clustering, ordination
and network analysis. Results: Significant differences in microbial community structure were
observed among samples collected from different streams. According to PERMANOVA, these
differences may be associated with differences in upstream activity (p<0.05). Three out of 18
samples collected immediately downstream of dairies had a relatively higher abundance of
Moraxellaceae or Enterobacteriaceae. Microbial communities also differed between the water
and soil fractions of individual samples according to UniFrac-based PCoA clustering and
PERMANOVA (p<0.01). While the dominant families in soil fractions were Rhodocyclaceae,
Rubritaleaceae, and Sphingomonadaceae, Chitinophagaceae, Enterobacteriaceae, and
Moraxellaceae were more abundant in water fractions. Conclusions: The taxonomic composition
of the soil and water fractions of the collected surface water samples differed. Further, the
microbial, and especially the Enterobacteriaceae, content of water may be affected by adjacent
land use. Thus, this study provides baseline data describing surface water microbiome structure
that can guide further studies focused on detection of microbiological safety hazards in water.
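PERMANOVA asks whether group centroids differ in a distance space by comparing a pseudo-F statistic with its distribution under random relabelling of samples. The sketch below is a minimal from-scratch version (one grouping factor, Anderson's pseudo-F, p-value from label permutations) applied to an invented distance matrix in which "dairy" and "reference" samples are far apart; established implementations, such as `adonis2` in the R package vegan, add many refinements.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Anderson's pseudo-F for one factor from a square distance matrix (list of lists)."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical design: 6 samples below dairies, 6 reference samples.
labels = ["dairy"] * 6 + ["reference"] * 6
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9)
         for j in range(12)] for i in range(12)]
print(permanova(dist, labels))
```

Adding 1 to both the hit count and the number of permutations keeps the p-value from ever being reported as exactly zero.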
Poster Board #: 133
Title: Whole Genome Sequencing Analysis Reveals That Air-conditioners Cooling Towers are
Reservoirs for Legionella pneumophila and Lead to Infections with the Same Strains Over Years
Author Block: D. Wüthrich1, S. Gautsch2, P. Brodmann2, O. Dubuis3, R. Spieler-Denz4,
S. Tschudin-Sutter1, V. Gaia5, S. Fuchs4, A. Egli1; 1University Hospital of Basel, Basel,
SWITZERLAND, 2Cantonal Laboratory City of Basel, Basel, SWITZERLAND, 3Viollier, Allschwil,
SWITZERLAND, 4Medical Services City of Basel, Basel, SWITZERLAND, 5National Reference Center
for Legionella, Bellinzona, SWITZERLAND.
Abstract: Background: Water supply systems and air-conditioner cooling towers (ACCTs) are known
potential sources of Legionella pneumophila. Traditional typing methods have low resolution and
may not allow reliable identification of transmissions. The advent of whole genome sequencing
(WGS) allows high-resolution analysis and the study of complexity within environmental
compartments. Materials/methods: In summer 2017, the health administration of the City of
Basel detected an increase in Legionella pneumophila infections compared to previous
months. An epidemiological and WGS-based microbiological investigation was performed,
involving isolates from the local water supply and two ACCTs (n=60), clinical outbreak and
non-outbreak related isolates from 2017 (n=8) and isolates collected between 2003 and 2016
(n=26). Finally, we also compared the sequenced strains with already published bacterial
genomes from 17 countries (n=539). Results: Phylogenetic analysis of the ACCT isolates
showed clustering into two groups separated by a few hundred allelic differences. Several
strains were found in both ACCTs. Furthermore, we found that isolates from the two ACCTs
were highly related to three clinical isolates from 2017. Five clinical isolates from the last
decade were also found to be closely related to the recent isolates from ACCTs. Finally, we found
several clinical isolates to be related to published genomes. Conclusions: Current
outbreak-related and historic isolates were linked to ACCTs. ACCTs form a complex environmental
habitat in which strains are conserved over years and are exchanged between locations.
WGS-based typing allows exploration of this complex network, which might have public health
implications for tracing potential sources and interpreting environmental
findings.
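"Allelic differences" refers to comparing isolates locus by locus across a gene-by-gene typing scheme such as core-genome MLST (cgMLST). Assuming that kind of scheme (the abstract does not name one), the sketch below counts differing loci between allele profiles, ignoring loci missing in either isolate, and groups isolates by single linkage below a hypothetical threshold. Profiles, isolate names and the threshold are invented.

```python
from itertools import combinations


def allelic_distance(p, q):
    """Number of loci with different allele calls; None marks a missing call."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def single_linkage(profiles, threshold):
    """Group isolates connected by chains of distances <= threshold."""
    clusters = [{name} for name in profiles]
    for a, b in combinations(profiles, 2):
        if allelic_distance(profiles[a], profiles[b]) <= threshold:
            ca = next(c for c in clusters if a in c)
            cb = next(c for c in clusters if b in c)
            if ca is not cb:
                ca |= cb           # merge the two clusters
                clusters.remove(cb)
    return sorted(sorted(c) for c in clusters)


# Hypothetical 5-locus allele profiles (real cgMLST schemes have over a thousand loci).
profiles = {
    "ACCT1_a": [1, 4, 7, 2, 9],
    "ACCT2_b": [1, 4, 7, 2, 8],
    "clin_2017": [1, 4, None, 2, 9],
    "clin_2005": [3, 5, 6, 1, 2],
}
print(single_linkage(profiles, threshold=1))
```

Skipping missing calls keeps incomplete assemblies from inflating distances, at the cost of comparing isolates on slightly different locus sets.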
Poster Board #: 134
Title: Transmission Analysis and Modelling Epidemiology of Tuberculosis (TAME-TB) Using
Population Based Whole Genome Sequencing in Slums and Slum Rehabilitation Dwellings in
Mumbai, India
Author Block: A. Chatterjee; Indian Institute of Technology Bombay, Mumbai, Maharashtra, INDIA.
Abstract: A recent study using whole genome sequencing (WGS) demonstrated a clonal outbreak
of multidrug-resistant (MDR) TB in Mumbai. Community transmission of Mycobacterium
tuberculosis (Mtb) therefore appears to be one of the most significant contributors to the current
epidemic-like situation in Mumbai and elsewhere, and emerges as a key intervention point for
the public health system. While nosocomial transmission and transmission in public spaces
have been identified for intervention, household transmission has been overlooked, partly
because of limited data. The earlier inability to ascertain transmission paths has recently been
overcome by WGS, which has successfully traced several TB outbreaks. Here we determine the
transmission of Mtb in low socio-economic households in a defined slum cluster and an
adjacent slum rehabilitation (SRA) cluster in Mumbai. Air quality in these low-income
settlements has been found to be poor, which can significantly increase the risk of airborne
infection, and the spatial adjacency of the settlements may further increase vulnerability.
Using WGS of Mtb isolated from TB patients in the two locations, the study determines the
proportion of TB caused by household transmission. Using phylogenomics, Bayesian
estimation of the risk of infection and GIS mapping, the study will conclusively trace the
transmission chains in the locale. By overlaying this information with modelling of the
household built environment, the study aims to understand the potential contribution of such
layouts, and of spatial autocorrelation, to increased transmission, and will devise a novel public
health tool for real-time monitoring and mapping of TB transmission. Additionally, the study
will contribute to understanding the relationship between TB transmission and slum-household
clustering through spatio-temporal analysis, an approach that adds to the novelty of this study.
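One way to estimate "the proportion of TB caused by household transmission" from WGS is to count cases that are genomically linked to another case in the same household. A minimal sketch under stated assumptions: a pairwise SNP distance is available for each pair of patients, and a linkage cut-off of 12 SNPs is used. The cut-off, patient IDs and distances are all hypothetical and are not parameters reported by the study.

```python
from itertools import combinations

def household_transmission_fraction(cases, snp_dist, threshold=12):
    """Fraction of cases genomically linked (<= threshold SNPs) to at least
    one other case in the same household.

    cases: dict case_id -> household_id
    snp_dist: dict frozenset({id1, id2}) -> pairwise SNP distance
    threshold: SNP cut-off (hypothetical default, not taken from the study)
    """
    linked = set()
    for a, b in combinations(cases, 2):
        same_household = cases[a] == cases[b]
        # Distance is only looked up for pairs sharing a household.
        if same_household and snp_dist[frozenset((a, b))] <= threshold:
            linked.update((a, b))
    return len(linked) / len(cases)

# Hypothetical patients, households and pairwise SNP distances.
cases = {"p1": "h1", "p2": "h1", "p3": "h2", "p4": "h3"}
snp_dist = {
    frozenset(("p1", "p2")): 3,
    frozenset(("p1", "p3")): 250,
    frozenset(("p1", "p4")): 400,
    frozenset(("p2", "p3")): 251,
    frozenset(("p2", "p4")): 401,
    frozenset(("p3", "p4")): 180,
}
print(household_transmission_fraction(cases, snp_dist))  # 0.5
```

This counts both members of a linked household pair. It does not infer who infected whom, which is what the phylogenomic and Bayesian steps in the abstract address.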
Poster Board #: 135
Title: Illumina Based Analysis and Prediction of a Suitable Metagenomic Diversity of Endophytic
Bacteria to Improve Rice Crop Productivity
Author Block: A. Krishnamoorthy, M. K. Maiti; Indian Institute of Technology Kharagpur, Kharagpur,
INDIA.
Abstract: Background: To address the declining supply of natural resources and the need to feed a
steadily growing population, sustainable agricultural practices are essential. Exploring the
interactions between plants and microbial populations could prove an effective strategy for
enhancing crop productivity. Endophytes are microorganisms that reside in and colonize host
plant tissues and have been reported to aid in plant growth promotion (PGP). In the present
study, we address the following key questions in rice endophyte research through
next-generation sequencing (NGS): (i) What are the core and temporal compositional profiles of
endophytic bacterial diversity in the different tissues of the rice plant? (ii) How do these
endophytes affect the fitness of the host plants? (iii) How do plants respond to inoculation with
the endophytic bacterial strains upon seed priming in relation to PGP characteristics? Methods:
Using Illumina sequencing of the V3-V4 hypervariable region of the 16S rRNA gene and
metagenomic library analysis, we identified the endophytic bacterial population present in the
root and shoot tissues of the aromatic rice cultivar Kalonunia grown aseptically in vitro. We also
isolated endophytic bacteria using a culture-based approach and investigated their PGP effect on
the host rice plant. Results: A total of 40,600 reads were obtained from the two 16S rRNA gene
samples sent for Illumina sequencing, after removal of chimeras and reads assigned to
Cyanobacteria. Regardless of plant tissue, members of the phylum Firmicutes were the most
abundant, followed by Proteobacteria, Bacteroidetes, Actinobacteria, Spirochaetes and
Tenericutes. The sequences revealed a core bacterial endomicrobiome comprising mainly
Acinetobacter sp., Heliorestis sp. and Thiomonas sp. in both the root and shoot samples. The
root metagenome, however, showed greater abundance and diversity of bacterial endophytes
than the shoot sample, including species of Pseudomonas and Stenotrophomonas. All the
bacterial endophytes identified have previously been reported to promote host plant growth
through mechanisms such as phytohormone production, nitrogen fixation, phosphate
solubilization and siderophore production. Using the culture-based approach, different species
of Pseudomonas and Methylobacterium, reported to have good PGP abilities, were isolated from
callus cultures of the Kalonunia rice cultivar. Conclusion: Our findings indicate that a typical
metagenomic diversity of endophytic bacteria could be predicted through NGS to guide the
development of suitable bacterial consortia, using selected endophytic isolates as bioinoculants
to improve rice crop productivity.
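A core endomicrobiome shared by root and shoot can be operationalized as the set of taxa detected in every sample. The abstract does not state how core membership was defined. This minimal sketch therefore assumes per-sample taxon read counts and an optional relative-abundance cut-off. The taxon names and counts are hypothetical placeholders.

```python
def core_taxa(samples, min_rel_abundance=0.0):
    """Return taxa whose relative abundance exceeds min_rel_abundance
    in every sample (the intersection across samples)."""
    present = []
    for counts in samples.values():
        total = sum(counts.values())
        present.append({taxon for taxon, n in counts.items()
                        if total and n / total > min_rel_abundance})
    return set.intersection(*present) if present else set()

# Hypothetical read counts per taxon; not data from the study.
samples = {
    "root": {"taxon_A": 120, "taxon_B": 40, "taxon_C": 30},
    "shoot": {"taxon_A": 90, "taxon_C": 10, "taxon_D": 5},
}
print(sorted(core_taxa(samples)))       # ['taxon_A', 'taxon_C']
print(sorted(core_taxa(samples, 0.1)))  # ['taxon_A']
```

Raising the cut-off shrinks the core, so any reported core set should be read alongside the threshold used to define it.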
Poster Board #: 136
Title: Molecular Evolution of HSV-1 and -2 Clinical Specimens During Storage, Transport and
Single-passage Culture
Author Block: P. Roychoudhury, A. Greninger, H. Xie, M. Huang, S. Selke, A. Wald, C. Johnston,
K. Jerome; University of Washington, Seattle, WA.
Abstract: The herpes simplex viruses HSV-1 and -2 are ubiquitous human pathogens responsible for
a large burden of disease worldwide, manifesting as oral and genital ulcers, neonatal disease,
encephalitis and keratitis. To date, most HSV genomics has been performed on culture
isolates, raising concerns that these genomes may not accurately represent the clinical
specimens from which they were derived. We have developed and validated an approach
that combines a DNA oligonucleotide hybridization panel with a bioinformatic pipeline,
allowing recovery of near-complete HSV genomes directly from clinical specimens by
Illumina sequencing. Our computational pipeline performs rapid assembly and annotation of
whole viral genomes starting from raw reads, recovering near-full-length genomes from
specimens with as few as 10^2 HSV copies/ml and 100,000 reads. We applied this approach to
a set of HSV-1 clinical swabs and paired single-passage culture isolates and saw limited
sequence evolution, with 14 of 17 specimens completely identical in the UL-US regions. In a
separate set of clinical samples, we compared HSV-2 sequences from swab-derived specimens
sequenced after different storage methods (swab in viral transport media or PCR buffer,
single-passage culture, extracted DNA) and again saw minimal sequence evolution across
specimen types. Together, these results show that low-passage clinical isolates reflect the viral
sequences present in the lesion and can be used for phylogenetic analyses. We have also used
this method to detect superinfection by unrelated HSV strains in single and serially collected
samples, illustrating the power of direct-from-specimen sequencing of HSV.
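Deciding whether a swab-derived genome and its paired culture isolate are "completely identical" comes down to counting differing positions in aligned consensus sequences. A minimal sketch under stated assumptions: the two sequences are already aligned to equal length, and ambiguous bases (N) and gaps are excluded from the comparison. These handling rules are illustrative and are not the authors' pipeline. The sequence fragments are hypothetical.

```python
def count_differences(seq1, seq2, skip="N-"):
    """Count aligned positions where two consensus sequences differ,
    ignoring positions where either sequence has an N or a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq1.upper(), seq2.upper())
        if a != b and a not in skip and b not in skip
    )

# Hypothetical paired consensus fragments (swab vs single-passage culture).
pairs = {
    "pt1": ("ACGTNACGTA", "ACGTTACGTA"),  # differs only at an N: counted as identical
    "pt2": ("ACGTAACGTA", "ACGTAACGCA"),  # one substitution
}
identical = [patient for patient, (swab, culture) in pairs.items()
             if count_differences(swab, culture) == 0]
print(identical)  # ['pt1']
```

Masking N positions means low-coverage regions can hide real differences. "Identical" is therefore only as strong as the coverage of the compared regions.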
Poster Board #: 137
Title: Population Structure, Pilus Island Distribution, and Antimicrobial Resistance Genes of Group
B Streptococcus Isolated from Infants with Invasive Disease
Author Block: S. T. Lukhele1, G. Kwatra1, A. Ismail2, M. Ali2, C. Cutland1, Z. Dangor1, S. Madhi1;
1University of the Witwatersrand, Johannesburg, SOUTH AFRICA, 2National Institute of
Communicable Diseases, Johannesburg, SOUTH AFRICA.
Abstract: Background: Group B Streptococcus (GBS) is a leading cause of neonatal invasive disease;
however, there is limited information on invasive disease genotypes from Africa. This study
aimed to investigate genotype diversity and antimicrobial resistance genes among invasive GBS
isolates collected over a 12-year period in Johannesburg, South Africa. Methods: Whole genome
sequencing was performed on an Illumina sequencer using the Nextera DNA kit. Whole genome
multilocus sequence typing was used to determine the genetic diversity of GBS isolates.
Resistance genes and pilus islands were identified using PubMLST. Results: Among 293
isolates, 17 sequence types (STs) were found, with ST17 (36.51%) and ST23 (19.79%) dominant,
followed by ST109 (16.3%), ST1 (4.09%), ST28 (4.09%) and ST10 (3.41%). Invasive disease
isolates were mostly (90%) associated with the cps-Y, -L and -F capsular biosynthesis genes.
Pilus islands (PI) identified included PI-2b (24.5%), PI-2a (26.2%) and PI-1 (28.6%), and the
combinations PI-1+2a (9.5%) and PI-1+2b (27.9%). Of the invasive isolates, 92.8% carried the
tet(M) gene and 5.11% carried an erm gene. Conclusion: The dominant sequence types among
invasive disease isolates were ST17 and ST23. The tetracycline resistance gene and PI-1 were
observed among the dominant GBS genotypes.
Poster Board #: 138
Title: Elucidation of Major Environmental Factors That Govern the Microbial Community Structure
and Function in Acid Mine Drainage of Malanjkhand Copper Project, India
Author Block: A. Gupta, A. Dutta, J. Sarkar, M. K. Panigrahi, P. Sar; Indian Institute of Technology
Kharagpur, Kharagpur, INDIA.
Abstract: Background: Biological oxidation of sulfidic ores in mining regions generates highly acidic
mine drainage (AMD), which threatens ecosystems because it contains high concentrations of
heavy metals and sulfate, and is therefore considered an extreme environment for life. The
present study was designed to understand the role of major environmental factors (pH, SO₄²⁻
and dissolved organic carbon [DOC]) in the assembly of microbial community structure and
function. Methods: AMD water and sediment samples were collected from the Malanjkhand
copper project, India. The geochemistry of the samples was assessed, followed by 16S rRNA
gene amplicon sequencing and shotgun metagenomics to characterize microbial diversity and
function; statistical analyses were used to assess the role of environmental factors in shaping
microbial community composition. Results: The samples differed in their geochemical
parameters and were partitioned into two pH regimes: low (1.9 < pH < 4.0) and high
(4.0 < pH < 6.0). Low-pH samples contained higher sulfate and heavy metal concentrations
than high-pH samples. Communities in low-pH samples were dominated by highly
acidophilic, Fe/S-oxidizing taxa responsible for AMD generation, whereas high-pH samples
harbored moderately acidophilic/neutrophilic groups involved in diverse biogeochemical
cycling, including groups that could be potent members for AMD attenuation. Canonical
correspondence analysis established the role of pH, Fe, SO₄²⁻, DOC and other variables in
shaping microbial community composition. Spearman correlation of OTUs with pH, Fe and
SO₄²⁻ revealed that highly acidophilic and Fe/S-oxidizing groups were positively correlated
with these parameters. To understand the metabolic potential of the microbial community,
one sample from each pH regime was analyzed by shotgun metagenomics, which detected
genes involved in diverse biogeochemical cycling in both samples. Genes involved in S
oxidation and pH stress were more abundant in the low-pH sample, whereas sulfate-reduction
genes were more abundant in the high-pH sample. Genes involved in C fixation, nitrogen
metabolism and heavy metal stress were detected in both samples, supporting the capacity of
these organisms to function under low organic carbon and heavy metal stress. Conclusion: The
present study provides deeper insight into the role of environmental variables in shaping
microbial community structure and function in acid mine drainage.
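The Spearman correlations between OTU abundances and pH, Fe or sulfate are Pearson correlations computed on ranks. A self-contained sketch follows, using average ranks for tied values. In practice a library routine such as `scipy.stats.spearmanr` would typically be used. The abundance, sulfate and pH values below are hypothetical.

```python
from statistics import mean

def rank(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical values for one OTU across five samples.
otu_abundance = [0.30, 0.25, 0.10, 0.05, 0.02]
sulfate_mg_l = [4200, 3900, 1500, 800, 300]
ph = [2.1, 2.5, 3.8, 5.2, 5.9]
print(round(spearman_rho(otu_abundance, sulfate_mg_l), 6))  # 1.0
print(round(spearman_rho(otu_abundance, ph), 6))            # -1.0
```

Because only ranks enter the calculation, rho captures monotonic rather than strictly linear association. That makes it a common choice for skewed OTU abundance data.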
Poster Board #: 139
Title: Rapidly Accumulated Tobramycin Resistance by Pseudomonas aeruginosa in CF-like Acidic pH
Environment
Author Block: Q. Lin, Y. Di; University of Pittsburgh, Pittsburgh, PA.
Abstract: Background: Cystic fibrosis (CF) is a genetic disease in which loss of cystic fibrosis
transmembrane conductance regulator (CFTR) function leads to impaired airway host
defense. Chronic infection and colonization by the Gram-negative opportunistic pathogen
Pseudomonas aeruginosa contribute to high mortality rates in CF. As the airway surface liquid
of CF patients becomes more acidic with age, the prevalence of P. aeruginosa lung infection
also gradually increases, from age 2 to 45, and P. aeruginosa eventually becomes the dominant
bacterial species colonizing the lungs of CF patients. We previously demonstrated that the
acidic CF lung microenvironment promotes P. aeruginosa biofilm formation and multidrug
resistance. However, the effect of the acidic CF lung microenvironment on tobramycin
treatment-associated antibiotic resistance (AR) remains unknown. In this study, we
hypothesize that the acidic microenvironment promotes faster and stronger tobramycin
resistance than the physiologically neutral pH of the non-CF lung microenvironment.
Methods: Planktonic and bead-transfer biofilm models were used for experimental evolution
of P. aeruginosa PA14 at pH 6.5 and 7.5, with or without tobramycin treatment. Bacterial
whole genome sequences were obtained by next-generation sequencing (NGS). Results:
PA14 exhibited rapid morphological change under acidic pH conditions. The acidic
environment also stimulated faster and stronger tobramycin resistance in PA14 than neutral
pH conditions. NGS results showed that acidic environments elicited several DNA mutations
that were likely pH-dependent. Conclusions: PA14 developed AR quickly under tobramycin
treatment, and the acidic lung microenvironment promoted even faster tobramycin resistance
in the biofilm mode of growth. The pH-dependent mutations are potential targets for future
treatment to eliminate P. aeruginosa infection effectively in CF patients.
Poster Board #: 140
Title: Genomic Characterization and Phylogenetic Analysis of Salmonella Javiana Clinical Strains
from Tennessee, 2017-2018
Author Block: L. K. Hudson1, C. Moore2, L. Constantine-Renna3, X. Qian2, L. S. Thomas2,
K. N. Garman3, J. R. Dunn3, T. G. Denes1; 1Department of Food Science, University of Tennessee,
Knoxville, TN, 2Tennessee Department of Health, Division of Laboratory Services, Nashville, TN,
3Tennessee Department of Health, Nashville, TN.
Abstract: Background: Salmonella Javiana is the fourth most common Salmonella serovar causing
illness in Tennessee (TN), and cases are geographically clustered in the western region of the
state. Almost two-thirds (63%) of S. Javiana clinical isolates from January 2017 through June
2018 were from counties in west TN. The objectives of this study were to retrospectively
examine the genomic population structure of Javiana isolates from patients in TN in 2017-18
and to describe epidemiological features of the case-patient clades identified. Methods:
BioSample numbers and metadata for S. Javiana clinical isolates (n=61) collected in TN from
January 2017 to June 2018 were provided by the Tennessee Department of Health. Raw reads
were downloaded from the NCBI SRA database, trimmed using Trimmomatic, and quality
checked using FastQC. A reference assembly (BioSample SAMN01832085) was selected and
downloaded from the NCBI RefSeq database. High-quality SNPs (hqSNPs) were identified
using the CFSAN SNP Pipeline, and the resulting matrix was used to construct a
neighbor-joining tree in MEGA7. Additionally, trimmed reads were assembled using SPAdes
and contigs were annotated with Prokka. Assembly statistics were generated with BBMap,
SAMtools and QUAST. SeqSero was used to confirm serotype designations. Results: Two
distinct major clades of interest (clades 1 and 4), each with geographical clustering, were
identified, along with two minor clades. Major clades were defined as containing five or more
isolates and minor clades as containing fewer than five. Clades may represent or contain
epidemiological clusters. Clade 1 consisted of 23 isolates, almost all (n=21) from patients in
the western region of TN and the majority from a single county (Shelby; n=13); these isolates
were collected over approximately nine months. Clade 4 consisted of 20 isolates, mostly from
counties in west TN (n=13), collected over approximately one year. A notable subclade of
seven isolates within clade 4 (subclade 4A) came entirely from four rural counties in one
geographic area (Carroll, Gibson, Crockett and Madison counties). These isolates were
collected over about five months, and most were from patients who were male (86%) and
adult (86%). Conclusions: The clustering patterns of S. Javiana isolates, a large portion of
which originated in western TN, together with the timeline of isolation dates and SNP
differences, may indicate that many of these infections are environmentally acquired. Further
investigation of epidemiological data and possible environmental sources may identify the
source of illness and possible preventive strategies. In addition, information about the
population structure of this serovar provides guidance for selecting SNP distance thresholds
used to identify clusters that may be of epidemiological significance.
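The abstract notes that population structure can guide the choice of SNP distance thresholds for cluster detection. This sketch substitutes threshold-based single-linkage clustering of a pairwise hqSNP matrix for the approach the study used. (The study itself defined clades on a neighbor-joining tree.) Clusters are then labeled with the five-isolate major/minor rule from the Results. The isolate IDs, distances and the 5-SNP threshold are hypothetical.

```python
def snp_clusters(ids, dist, threshold):
    """Single-linkage clusters: isolates are joined when a chain of pairwise
    hqSNP distances <= threshold connects them (union-find)."""
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; isolates sorted within each cluster.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)

def classify(cluster, major_min_size=5):
    """'major' for >= major_min_size isolates, otherwise 'minor'."""
    return "major" if len(cluster) >= major_min_size else "minor"

# Hypothetical isolates and hqSNP distances; unlisted pairs are treated as unlinked.
ids = ["i1", "i2", "i3", "i4", "i5", "i6", "i7"]
dist = {
    ("i1", "i2"): 2, ("i2", "i3"): 4, ("i3", "i4"): 3, ("i4", "i5"): 1,
    ("i6", "i7"): 5, ("i1", "i6"): 120, ("i5", "i7"): 98,
}
for cluster in snp_clusters(ids, dist, threshold=5):
    print(classify(cluster), cluster)
```

Single linkage can chain isolates whose direct distance exceeds the threshold (here i1 and i5 join only through intermediates). This is one reason the threshold choice the abstract discusses matters.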
Poster Board #: 142
Title: Identifying Metabolically Active Bacteria in Tobacco Products with DNA Labeling and
Next-generation Sequencing
Author Block: S. Chattopadhyay1, L. Malayil1, E. Mongodin2, A. Sapkota1; 1University of Maryland,
College Park, MD, 2University of Maryland, Baltimore, MD.
Abstract: The Family Smoking Prevention and Tobacco Control Act, implemented by the U.S. Food
and Drug Administration, has created a need to better understand the microbial constituents
of tobacco products. 16S rRNA gene sequencing has enabled the identification of non-culturable
bacteria present in tobacco products. These sequencing techniques generate massive amounts
of data but cannot determine what proportion of the identified bacterial communities is live and
active. To bridge this knowledge gap, our study aimed to identify and quantify the metabolically
active bacterial communities in commercially available tobacco products. We tested 14
tobacco products: 4 brands of cigarettes, 4 brands of little cigars and 6 brands of hookah,
each with three distinct flavors. For each product, 0.2 g of tobacco was either treated with
(i) propidium monoazide (PMA), which allows detection of viable bacteria by inactivating DNA
not contained within an intact cell membrane, or (ii) 5-bromo-2′-deoxyuridine (BrdU), which
allows detection of proliferating cells, or left untreated (control samples); total genomic
DNA was then extracted. BrdU-labeled DNA was immunocaptured, and all samples underwent
PCR amplification of the 16S rRNA gene followed by sequencing on an Illumina HiSeq.
Downstream analyses were performed using QIIME and R. Overall, 1,242 species-level
operational taxonomic units (OTUs) were identified from more than 11 million sequences
across 88 samples. Alpha diversity analysis (observed OTUs and Shannon index) showed
significant differences (p<0.005) among BrdU-treated, PMA-treated and control samples
across all tobacco products. Flavoring also had significant effects on bacterial community
composition. Beta diversity analysis comparing products using Bray-Curtis dissimilarity
identified significant differences between BrdU- and PMA-treated samples (ANOSIM
R = 0.244, p<0.001). Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria were the top
phyla across all products. Our data confirm that tobacco bacterial communities are diverse and
differ across brands and products. This study is the first to characterize a metabolically active
fraction of the bacterial communities residing within these products, and shows that these
communities are affected by tobacco flavors.
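The alpha and beta diversity measures named in the abstract have simple closed forms:

- **Observed richness** counts the OTUs with nonzero counts.
- **The Shannon index** is H' = -Σ p_i ln p_i.
- **Bray-Curtis dissimilarity** is Σ|u_i - v_i| / Σ(u_i + v_i).

The minimal sketch below uses hypothetical OTU counts. It uses the natural logarithm for Shannon. Some pipelines use log base 2 instead, so values are comparable only within one base.

```python
import math

def observed(counts):
    """Observed richness: number of OTUs with a nonzero count."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors over the same OTUs
    (0 = identical composition, 1 = no shared OTUs)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator

# Hypothetical OTU counts for one product under two treatments.
control = [10, 10, 10, 10]
pma = [30, 10, 0, 0]
print(observed(control), observed(pma))  # 4 2
print(round(shannon(control), 3))        # 1.386
print(bray_curtis(control, pma))         # 0.5
```

Bray-Curtis on raw counts is sensitive to sequencing depth, so counts are usually rarefied or converted to relative abundances before comparing samples.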
Poster Board #: 143
Title: WGS Analysis of O157 and non-O157 Shiga Toxin-producing Escherichia coli Clinical Isolates
from Michigan, 2010-2014
Author Block: H. Blankenship, S. D. Manning; Michigan State University, East Lansing, MI.
Abstract: Background. Shiga toxin-producing Escherichia coli (STEC) is a leading foodborne pathogen
with a diverse genetic background that contributes to variation in disease presentation and
severity. WGS allows more comprehensive genomic profiles to be obtained rapidly and easily,
both for comparing isolates and for identifying genetic factors associated with disease
outcomes. Methods. STEC isolates were obtained from 2010 to 2014 through an active
surveillance system that included four hospitals in Michigan. DNA was purified with the Wizard
Genomic DNA Purification Kit and libraries were prepared with the Illumina Nextera XT kit,
followed by sequencing on the Illumina MiSeq platform. De novo genome assembly was
performed with SPAdes after trimming and quality checking with Trimmomatic and FastQC.
Results. WGS data are available for 477 STEC isolates (33 O157 and 444 non-O157) from
Michigan. A workflow of bioinformatic scripts was developed to extract the molecular
serotype, virulence and resistance gene profiles, and multilocus sequence type (ST) of each
strain. SNPs specific to each of nine clades, as well as CRISPR spacer regions, were extracted
from the 33 O157 strains, demonstrating a high level of diversity with multiple unique gene
profiles. Forty-five non-O157 serogroups were identified, and the isolates were grouped into
54 STs, including 4 new STs. The O157 strains grouped into 5 clades, with the largest group
(n=14) belonging to clade 8. Conclusion. Use of WGS to characterize 477 STEC isolates has
demonstrated that strains recovered from patients in Michigan are diverse and that specific
gene profiles can be associated with epidemiological data. Comparative genomic analyses of
STEC and other foodborne pathogens are important for identifying the profiles most relevant
to severe infections and for validating existing subtyping methods.
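Assigning a multilocus sequence type reduces to looking up allele numbers at the scheme's housekeeping loci in a profile table. Complete profiles that are not in the table are flagged as candidate new STs; the abstract reports four. This is a minimal sketch under stated assumptions. It assumes the seven-locus Achtman E. coli scheme (adk, fumC, gyrB, icd, mdh, purA, recA), which the abstract does not name. The profile table and ST numbers are hypothetical placeholders, not real scheme entries.

```python
# Seven housekeeping loci of the Achtman E. coli MLST scheme (assumed scheme).
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

def assign_st(alleles, profiles):
    """Return the ST for a complete allele profile, 'novel' if the complete
    profile is absent from the table, or 'incomplete' if any locus is uncalled."""
    key = tuple(alleles.get(locus) for locus in LOCI)
    if any(a is None for a in key):
        return "incomplete"
    return profiles.get(key, "novel")

# Hypothetical profile table: allele tuple (in LOCI order) -> ST number.
profiles = {
    (1, 1, 1, 1, 1, 1, 1): 1001,
    (1, 1, 2, 1, 1, 1, 1): 1002,
}
known = dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1)))
new = dict(zip(LOCI, (1, 3, 2, 1, 1, 1, 1)))
partial = {"adk": 1, "fumC": 1}
for name, calls in [("known", known), ("new", new), ("partial", partial)]:
    print(name, assign_st(calls, profiles))
```

Distinguishing "incomplete" from "novel" matters in practice: a missing locus call from a fragmented assembly should not be reported as a new ST.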
Poster Board #: 144
Title: Genome Sequences of Bacillus sporothermodurans Strains Isolated from Ultra High
Temperature (UHT) Milk
Author Block: R. Owusu-Darko1, M. Allam2, S. D. Oliveira3, C. A. Ferreira3, S. Grover4, S. Mtshali2,
A. Ismail2, E. M. Buys1; 1University of Pretoria, Pretoria, SOUTH AFRICA, 2National Institute for
Communicable Diseases, Johannesburg, SOUTH AFRICA, 3School of Sciences, Pontifícia
Universidade Católica do Rio Grande do Sul, Porto Alegre, BRAZIL, 4Dairy Microbiology Division,
Molecular Biology Unit, National Dairy Research Institute, Karnal, INDIA.
Abstract: Bacillus sporothermodurans, first isolated from ultra-high temperature (UHT) milk, is a
thermoresistant, Gram-positive bacterium that can produce highly heat-resistant endospores
(HRS), which may survive UHT heat treatment. We sequenced four genomes of
B. sporothermodurans, including, for the first time, both heat-resistant and non-heat-resistant
strains. Genome sizes range from 3.4 Mb to 3.9 Mb, with an average G+C content of 36% and
3,768 to 4,558 coding sequences. Our research also shows that heat-resistant and
non-heat-resistant strains have a similar complement of heat resistance genes, including the
hrcA-dnaK-dnaJ-grpE operon, and of biofilm formation genes encoding TasA and its homologs.
The genomes of three of the four sequenced B. sporothermodurans strains carry the Listeria
pathogenicity island LIPI-1, presumably acquired through horizontal gene transfer.
Evolutionary trends of B. sporothermodurans suggest a common ancestor originating from the
gut of insects or arachnids, like its closest phylogenetic neighbor, Bacillus oleronius. The draft
genomes, generated on the Illumina MiSeq system, will enhance our understanding of the
genes and pathways responsible for heat resistance and biofilm formation, which are of prime
importance to the dairy industry. They will also support ongoing pangenome studies and
analysis of evolutionary relationships with other Bacillus species of concern to the food
industry. Upcoming PacBio sequencing will allow the gaps in the MiSeq-based draft genomes
to be closed.
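The G+C content reported for the draft genomes is the fraction of G and C among unambiguous bases. A minimal sketch under stated assumptions: contigs arrive as FASTA text, and all contigs of a genome are pooled before the percentage is computed. The FASTA records are a hypothetical toy example.

```python
def parse_fasta(text):
    """Return the sequences in FASTA-formatted text, in order (headers dropped)."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current:
                sequences.append("".join(current))
            current = []
        elif line:
            current.append(line)
    if current:
        sequences.append("".join(current))
    return sequences

def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are skipped."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return 100.0 * gc / acgt if acgt else 0.0

def genome_gc(contigs):
    """G+C content of a draft genome, pooling all contigs (not a mean of per-contig values)."""
    return gc_content("".join(contigs))

toy_fasta = ">contig_1\nATGC\n>contig_2\nAATT\n"
print(genome_gc(parse_fasta(toy_fasta)))  # 25.0
```

Pooling weights each base equally. Averaging per-contig percentages would instead let many short contigs skew the result.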
Poster Board #: 145
Title: Dissection of The Mobilome of Carbapenem-Resistant Klebsiella pneumoniae (CR-Kpn) Using
Short and Long Read Assemblies: A Prospective Study in Houston, TX
Author Block: W. C. Shropshire, A. Q. Dinh, H. Ecklund, A. Wanger, W. Miller, D. Panesso, T. T. Tran,
C. A. Arias, B. M. Hanson; UTHealth, Houston, TX.
Abstract: Background: Klebsiella pneumoniae (Kpn) is a Gram-negative pathogen responsible for
nosocomial infections that cause significant morbidity and mortality worldwide. Mobile
genetic elements (MGEs; e.g. plasmids and transposons) are crucial for the adaptation and
evolution of these organisms, and the exchange of accessory genes conferring antimicrobial
resistance is particularly important in clinical settings. Combining short- and long-read
sequencing platforms with next-generation sequencing (NGS) bioinformatic tools permits
high-resolution analysis of these complex resistance elements, which are difficult to resolve
with one sequencing platform alone. Here, we describe the genomic profiles of 95 CR-Kpn
isolates, belonging to clonal group (CG) 258 and non-CG258 lineages, collected in hospitals
across Houston, TX, from May to December 2017. Methods: Libraries were prepared with the
Nextera XT DNA Library Prep Kit (Illumina) and the Rapid Sequencing Kit (SQK-RBK004,
Oxford Nanopore Technologies, ONT). Sequencing platforms used were the MiSeq and HiSeq
4000 (Illumina) and the GridION X5 (ONT). A custom pipeline was developed for
high-throughput QC, processing, assembly and analysis of Illumina data. Oxford Nanopore
sequencing data were assembled using Canu v1.7.1 and polished with Illumina data using Pilon
1.22. Chromosomes and plasmids were circularized using Circlator 1.5.5. Results: Phylogenetic
and multilocus sequence typing (MLST) analyses of the 95 Kpn isolates revealed two
predominant sequence types (STs): ST258 (38/95, 40%) and ST307 (35/95, 37%). Short-read
alignment analysis indicated that all ST307 isolates carried the extended-spectrum
beta-lactamase (ESBL) gene CTX-M-15, whereas only one ST258 isolate did. Two
non-ST258/ST307 isolates carried the gene encoding the NDM-1 carbapenemase. Two
representative isolates from each dominant clade and four non-ST258/ST307 isolates were
sequenced on the GridION X5 platform; their plasmids were closed and their MGE structures
resolved. For example, we resolved transposon Tn4401a carrying blaKPC-3 in an ST307 isolate
on a single ONT read. This allowed us to identify multiple isoforms and SNPs of genes initially
identified with the abricate tool (using the CARD and PlasmidFinder databases) and to gain
greater detail on the MGE structures that carried them. Conclusions: Initial phylogenetic
analysis revealed two clades, ST258 and ST307, which appear to predominate across the
multiple Houston hospitals sampled. Combining two NGS platforms with our custom
bioinformatics pipeline allowed complete elucidation of the MGE structures of interest and of
the resistance determinants they carry.
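Resolving a transposon-sized element "on a single ONT read" means at least one read aligns across the whole element plus some flanking sequence. The element's structure and genomic context are then observed directly rather than inferred by stitching shorter fragments. A minimal sketch under stated assumptions: read alignments are given as (start, end) coordinates, and each flank must be covered by 500 bp. The read IDs, coordinates and flank size are hypothetical.

```python
def spanning_reads(alignments, feature_start, feature_end, flank=500):
    """IDs of reads whose alignment covers the feature plus `flank` bp on each side.
    Coordinates are 0-based, end-exclusive positions on the assembly/reference."""
    need_start = max(0, feature_start - flank)
    need_end = feature_end + flank
    return [
        read_id
        for read_id, (start, end) in alignments.items()
        if start <= need_start and end >= need_end
    ]

# Hypothetical read alignments (start, end) and a ~10 kb element at 20,000-30,000.
alignments = {
    "read_long": (18_000, 33_000),
    "read_partial": (25_000, 40_000),
    "read_short": (21_000, 21_150),
}
print(spanning_reads(alignments, 20_000, 30_000))  # ['read_long']
```

Requiring flanking coverage, not just the element itself, is what links the transposon to a specific plasmid or chromosomal context.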
Poster Board #: 146
Title: Hybrid Sequencing and Assembly Reveals Genomic Diversity of Methicillin-susceptible
Staphylococcus aureus (MSSA) from a Neonatal Intensive Care Unit (NICU) Surveillance Effort
Author Block: M. K. Annavajhala, W. Geng, A. C. Hill-Ricciuti, S. Ferguson, S. L. Stump, M. J. Giddins,
M. Messina, P. Zachariah, D. A. Green, S. Whittier, L. Saiman, A. Uhlemann; Columbia University
Medical Center, New York, NY.
Abstract: Background: MSSA is a more prevalent NICU pathogen than methicillin-resistant S. aureus
(MRSA), yet optimal MSSA infection prevention and control strategies are unclear. Given the
ubiquity of MSSA as a human colonizer, neonatal acquisition likely occurs through multiple
routes, during delivery and through close contact with parents and healthcare providers.
However, the introduction and potential local spread of MSSA and the role of systematic
decolonization for infants colonized with MSSA remain incompletely understood. Here, we
used short- and long-read whole-genome sequencing (WGS) to define the diversity of MSSA
during an ongoing NICU surveillance effort, aiming to identify genomic features potentiating
the spread of prevalent MSSA clones. Methods: Infants hospitalized in a 75-bed
university-affiliated level III-IV NICU over an 18-month period were screened twice monthly
for MSSA using clinical and/or pooled four-site surveillance cultures. We spa-typed isolates by
PCR and sequenced isolates of the most prevalent spa types using Illumina WGS (n=107). We
used SRST2 for in silico multilocus sequence typing (MLST) and for antibiotic resistance gene
and plasmid replicon typing. Oxford Nanopore sequencing and hybrid assembly were used to
create optimal reference genomes for each MLST type. We included previously published
data in phylogenetic analyses of core genome SNPs to reconstruct the evolutionary history of
major clones. Results: We collected 466 MSSA isolates from 297 infants. The 80 spa types
identified included t279 (n=86), t1451 (n=21) and t571 (n=10). Of note, t1451 and t571 belong
to ST398, a common clindamycin-resistant MSSA lineage in our local community. In contrast,
t279 (CC15) has not been encountered in community surveillance efforts, yet increased in the
NICU during the study period. We used nanopore sequencing of the oldest (2016) CC15 isolate
to generate a reference genome. Compared with publicly available genomes, our reference
greatly reduced pairwise SNP distances and allowed more accurate phylogenetic inference.
ST398 NICU isolates formed three clusters with closely related community isolates,
suggesting community members and NICU reservoirs as sources of acquisition in neonates.
CC15 comprised two clades of closely related isolates (<100 SNPs) distinct from known
community MSSA, pointing to clonal expansion within the NICU. Almost all CC15 isolates,
including the reference isolate, also harbored mupA-encoding plasmids, suggesting potential
proliferation driven by mupirocin decolonization efforts. Conclusions: MSSA in our NICU
exhibited substantial genetic heterogeneity. Comparative genomics indicates
genotype-specific pathways of introduction and spread of MSSA, including potential
community- (ST398) or healthcare- (CC15) associated sources. Antibiotic resistance may play
an important role in the dissemination of CC15. Future surveillance efforts could benefit from
routine genotyping.
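Core-genome SNP distances, like the <100 SNP bound the abstract gives for the CC15 clades, are typically computed only on alignment columns where every isolate has an unambiguous base. A minimal sketch under stated assumptions: a pre-computed alignment is held as equal-length strings (for example, reads mapped to the CC15 reference). The sequences and isolate names are hypothetical. Real pipelines also apply site filtering and recombination handling, which this sketch omits.

```python
from itertools import combinations

def core_snp_sites(alignment):
    """Keep alignment columns where every isolate has A/C/G/T and at least
    two different bases occur (variable core sites)."""
    names = list(alignment)
    seqs = [alignment[n].upper() for n in names]
    keep = [
        i for i, column in enumerate(zip(*seqs))
        if set(column) <= set("ACGT") and len(set(column)) > 1
    ]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}

def pairwise_snps(alignment):
    """Pairwise SNP distances computed on the shared core SNP sites."""
    sites = core_snp_sites(alignment)
    return {
        (a, b): sum(x != y for x, y in zip(sites[a], sites[b]))
        for a, b in combinations(sorted(sites), 2)
    }

def max_within(dists, members):
    """Largest pairwise distance among the given isolates."""
    members = set(members)
    return max(d for (a, b), d in dists.items() if a in members and b in members)

# Hypothetical aligned sequences; iso4 has an N, so that column is dropped.
alignment = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACCTACGAAT",
    "iso4": "ACCTNCGAAT",
}
dists = pairwise_snps(alignment)
print(dists)
print(max_within(dists, ["iso1", "iso2"]) < 100)  # True
```

Dropping any column with an ambiguous base in one isolate shrinks the core as more isolates are added. This is one reason a closely related reference genome, as in the abstract, improves resolution.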
Poster Board #: 147
Title: Microfluidic NGS Sample Preparation for High-Throughput Epidemiology
Author Block: S. Kim, G. Lagoudas, P. Blainey; The Broad Institute of MIT and Harvard, Cambridge, MA.
Abstract: While low-cost DNA sequencing is transforming biological research and discovery,
preparing large sample sets for sequencing from minuscule starting material is now the
limiting factor in many applications. Here we introduce a polydimethylsiloxane (PDMS)
microfluidic device that automates the key steps in whole-genome NGS sample preparation,
integrating lysis, fragmentation, adapter tagging, purification, and size selection of 96
samples in parallel. We applied our device to process about 5,000 whole-genome sequencing
libraries of Pseudomonas aeruginosa clinical isolates, methicillin-resistant Staphylococcus
aureus, Mycobacterium tuberculosis, soil microbes, and human gut microbes, using dramatically
reduced sample input and reagent quantities. These microfluidic libraries showed excellent
coverage and performance for variant calling, phylotyping, metabolic profiling, and
metagenomic analysis from only 10,000 cells (50 picograms of genomic DNA). Our method will
enable high-throughput processing of samples for shotgun sequencing, with broad application
to basic science and clinical medicine.
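The equivalence "10,000 cells (50 picograms of genomic DNA)" can be sanity-checked with the standard approximation of about 650 g/mol per double-stranded base pair. A minimal sketch, assuming a hypothetical 5 Mb bacterial genome and one genome copy per cell; real genome sizes and ploidy differ across the species listed.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # approximate mass of one double-stranded base pair, g/mol


def genome_mass_fg(genome_size_bp: float) -> float:
    """Mass of one double-stranded genome copy, in femtograms."""
    grams = genome_size_bp * BP_MASS / AVOGADRO
    return grams * 1e15


def input_mass_pg(n_cells: int, genome_size_bp: float, copies_per_cell: float = 1.0) -> float:
    """Total genomic DNA, in picograms, for n_cells each carrying copies_per_cell genomes."""
    return n_cells * copies_per_cell * genome_mass_fg(genome_size_bp) / 1000.0


# Hypothetical 5 Mb bacterial genome, one copy per cell.
print(round(input_mass_pg(10_000, 5_000_000), 1))  # → 54.0
```

About 5 fg per genome copy puts 10,000 cells in the same range as the reported 50 pg.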
Poster Board #: 148
Title: PathOGiST: Calibrated Multi-criterion Genomic Analysis for Public Health
Author P. Feijao1, M. Katebi1, E. Lasalle2, H. Yao3, S. La1, M. Nguyen1, C. Chauve1, L. Chindelevich1;
Block: 1Simon Fraser University, Vancouver, BC, CANADA, 2ENS Paris-Saclay, Paris, FRANCE, 3École
Polytechnique, Paris, FRANCE.
Abstract: As public health organizations start to rely on whole-genome sequencing (WGS) data
for infectious disease surveillance and outbreak investigations, two main issues emerge from
the use of WGS data for genotyping. First, methods for differentiating outbreak-related strains
from sporadic strains are often based on a single type of genomic variation. This approach
captures only a limited amount of the genomic variability and tells only a partial story of the
organism's evolutionary history. Second, WGS-based sample clustering algorithms are often
not calibrated, meaning that the choice of clustering thresholds or subtyping cutoffs is still
mostly arbitrary. Because many forces drive pathogen evolution, using the wrong set of
variants or the wrong cutoffs may mislead the investigation of a pathogen outbreak. We
address these issues by developing PathOGiST, which implements and integrates existing and
novel algorithms for calling genomic variants from WGS data (SNPs, multilocus sequence typing,
and copy number variations), together with clustering algorithms based on a multi-criterion
genome dissimilarity measure that combines these kinds of genomic variants. Final steps
include calibrating the statistical models and algorithms using large reference sets of
selected pathogen genomes from epidemiologically confirmed outbreak strains. The PathOGiST
pipeline will be implemented both as a standalone tool and as a Galaxy tool, and will be part
of the IRIDA platform, making it available to public health workers in Canada and around the
world.
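The two ideas in this abstract, a dissimilarity built from several variant types and a clustering cutoff calibrated against confirmed outbreaks, can be illustrated together. A minimal sketch under loudly stated assumptions: the weighted sum of max-scaled distances, single-linkage clustering, and Rand-index calibration are all hypothetical stand-ins, since the abstract does not specify PathOGiST's actual measure or calibration procedure.

```python
from itertools import combinations

Pair = tuple[str, str]


def combine_distances(matrices: list[dict[Pair, float]], weights: list[float]) -> dict[Pair, float]:
    """Hypothetical multi-criterion distance: weighted sum of per-criterion distances,
    each rescaled to [0, 1] by its maximum. All matrices must share the same pair keys."""
    maxima = [max(m.values()) or 1.0 for m in matrices]
    return {
        pair: sum(w * m[pair] / mx for m, w, mx in zip(matrices, weights, maxima))
        for pair in matrices[0]
    }


def threshold_clusters(samples: list[str], dist: dict[Pair, float], threshold: float) -> dict[str, str]:
    """Single-linkage clustering: link every pair whose distance is <= threshold."""
    parent = {s: s for s in samples}

    def find(s: str) -> str:
        while parent[s] != s:
            s = parent[s]
        return s

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)
    return {s: find(s) for s in samples}


def rand_index(labels: dict[str, str], truth: dict[str, str]) -> float:
    """Fraction of sample pairs on which two partitions agree (same vs. different group)."""
    pairs = list(combinations(sorted(truth), 2))
    agree = sum((labels[x] == labels[y]) == (truth[x] == truth[y]) for x, y in pairs)
    return agree / len(pairs)


def calibrate(samples, dist, truth, candidates):
    """Pick the cutoff whose clusters best match epidemiologically confirmed outbreaks."""
    return max(candidates, key=lambda t: rand_index(threshold_clusters(samples, dist, t), truth))


samples = ["a", "b", "c", "d"]
snp = {("a", "b"): 2, ("a", "c"): 50, ("a", "d"): 52, ("b", "c"): 48, ("b", "d"): 50, ("c", "d"): 3}
cnv = {("a", "b"): 1, ("a", "c"): 10, ("a", "d"): 9, ("b", "c"): 10, ("b", "d"): 10, ("c", "d"): 0}
combined = combine_distances([snp, cnv], [0.5, 0.5])
truth = {"a": "outbreak1", "b": "outbreak1", "c": "outbreak2", "d": "outbreak2"}
print(calibrate(samples, combined, truth, [0.05, 0.2, 0.99]))  # → 0.2
```

The toy SNP and CNV distances and the outbreak labels are invented for illustration; in practice the candidate cutoffs would be evaluated on the large reference sets of confirmed outbreak strains the abstract describes.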
Poster Board #: 149
Title: Workflows for detection of methylomes in ONT and PacBio
Author L. A. Arteaga-Figueroa, V. Villegas-Escobar, J. Correa-Alvarez;
Block: Universidad EAFIT, Medellín, COLOMBIA.
Abstract: Background: Third-generation sequencing (TGS) technologies present many advantages
compared with next-generation sequencing (NGS), mainly because of their capability to produce
long reads, detect base modifications, and sequence RNA, and their superior performance in
repetitive regions. Many performance comparisons have been published, but few compare the
capability to detect DNA modifications, the performance of the available software, and the
final annotated methylome. In this study, we provide workflows for the detection of 6mA and
5mC in Nanopore (ONT) and SMRT sequencing data, and a comprehensive comparison between them.
Methods: Data obtained with ONT and PacBio from a Bacillus subtilis project (EA-CB0015) were
used to design workflows for modified base detection (considering most of the available
tools). These workflows extensively compare mappers and modified base detectors. For 6mA, we
implemented mCaller, Tombo, and ipdSummary (kineticsTools); for 5mC, we implemented mCaller,
Tombo, Nanopolish, and ipdSummary. In-house Python scripts were used for output analysis and
graphs. Results: In terms of mapping, for PacBio, so far only Blasr was able to output the
.bam file necessary for the analysis. For ONT data, Graphmap and Minimap2 were tested;
although Graphmap was more accurate than Minimap2 for high-error datasets, Minimap2 performed
better overall. We are currently testing lordFAST. For 5mC detection in ONT data, we found that
Tombo is the most sensitive, closely followed by Nanopolish. For 6mA, Tombo was also the most
sensitive. ipdSummary often failed to assign identity to modified bases and output far fewer
bases than the ONT software for both 6mA and 5mC. Additionally, we observed modified bases
detected at the same position (in dimers such as TA and GC) but on different strands.
Misidentification of 4mC remained an issue throughout the analysis, because the latest PacBio
chemistry lacks appropriate software for 4mC detection. When we compared 4mC detections with
ONT 5mC detections, some positions matched; moreover, as mentioned above, a large part of the
ipdSummary output does not assign identity to all modified bases. We are still working on the
identification of 8oxoA, 8oxoG, and other kinds of mC in order to identify possible false
positives for 5mC in the ONT results. Conclusions: According to our results, Oxford Nanopore
sequencing proved more sensitive than PacBio sequencing for 5mC and 6mA detection.
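Comparing detectors as described above comes down to set operations on called sites. A minimal sketch, assuming each tool's output (mCaller, Tombo, Nanopolish, ipdSummary) has already been parsed into `(contig, 0-based position, strand)` tuples; the parsing step is not shown, and the `expected` reference set (for example, occurrences of a known methylation motif) is a hypothetical assumption, since the abstract does not state how sensitivity was measured.

```python
# A modified-base call, reduced to (contig, 0-based position, strand).
Site = tuple[str, int, str]


def sensitivity(calls: set[Site], expected: set[Site]) -> float:
    """Fraction of expected modified sites recovered by a tool."""
    return len(calls & expected) / len(expected) if expected else 0.0


def jaccard(a: set[Site], b: set[Site]) -> float:
    """Overlap between two tools' call sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def paired_across_strands(calls: set[Site], offset: int = 1) -> set[Site]:
    """Calls whose complementary-strand partner in a palindromic dimer is also called.

    offset=1 fits dimers where the modified base is the 5' base (CpG, ApT): a '+' call
    at i pairs with a '-' call at i + 1. offset=-1 fits GpC and TpA, where the
    modified base is the 3' base.
    """
    paired = set()
    for contig, pos, strand in calls:
        if strand == "+":
            partner = (contig, pos + offset, "-")
        else:
            partner = (contig, pos - offset, "+")
        if partner in calls:
            paired.add((contig, pos, strand))
    return paired


tool_calls = {("chr", 10, "+"), ("chr", 11, "-"), ("chr", 20, "+")}
expected = {("chr", 10, "+"), ("chr", 11, "-"), ("chr", 20, "+"), ("chr", 30, "+")}
print(sensitivity(tool_calls, expected))  # → 0.75
```

The strand-pairing helper corresponds to the observation above that the same dimer position was sometimes called on both strands.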
Poster Board #: 150
Title: Metagenomic MinION and Illumina Sequencing for Surveillance of Cholera and Other
Waterborne Pathogens in Haiti
Author B. Stebbins, S. Hung, C. Martin, T. Ford;
Block: UMass Amherst, Amherst, MA.
Abstract: The recent deadly cholera outbreak after the 2010 earthquake in Haiti prompted the
need for a portable system that allows rapid pathogen identification without requiring
expensive laboratory resources. To address this need, we are testing the field suitability of
the handheld MinION sequencer (Oxford Nanopore Technologies) for the metagenomic assessment
and detection of waterborne pathogens such as Vibrio cholerae. In the initial testing phase,
we collected water samples from the Mill and Fort Rivers in Amherst and Hadley, Massachusetts,
and tested them for coliforms. DNA was then isolated, spiked with V. cholerae DNA, and
prepared for sequencing using the ligation method. Initial MinION sequencing, followed by
bioinformatic analysis of the data using the WIMP, OneCodex, and CosmosID (Rockville, MD)
platforms, detected the V. cholerae spike-in but not the coliforms unless the samples were
enriched for these species. Further optimization of the portable system is under way to allow
future field metagenomic applications. We will also use Illumina MiSeq sequencing to compare
read accuracy.
Poster Board #: 151
Title: HAIviz: Visualizing Genomic Epidemiology Data of Healthcare Associated Infections
Author B. Permana1, B. M. Forde1, L. Roberts1, P. Harris2, S. A. Beatson1;
Block: 1University of Queensland, St Lucia, AUSTRALIA, 2UQ Centre for Clinical Research, Brisbane
City, AUSTRALIA.
Abstract: Visualization is an essential aid for communicating the genomic epidemiology of
infectious disease, which often involves complex epidemiological and genomic information. Over
the past few years, several tools such as Microreact, PhyloGeoTool, and Nextstrain have been
developed and have demonstrated the benefit of data integration and interactive visualization.
However, these tools mainly focus on global epidemiology and phylogeography. Applications
that feature specific data related to healthcare-associated infections (HAIs), such as
patients' bed movements, hospital room layouts, and infection transmission networks, remain
unavailable.
Here we present HAIviz, an interactive single-page web application for visualizing the
genomic epidemiology of HAIs. It was developed using popular web technologies to allow
infection control professionals, epidemiologists, clinicians, and public health decision-makers
to explore potential insights from whole-genome sequencing (WGS) data and epidemiological
information. HAIviz allows users to display a detailed hospital map integrated with patient
metadata, phylogenetic trees, and transmission networks. Users can create single or multiple
visualization windows by uploading their own datasets in the required formats: metadata and
transmission files in comma-separated values (CSV), maps in GeoJSON, and trees in Newick.
Each generated window is arranged independently, giving users the freedom to display their
preferred information.
HAIviz is freely available at http://haiviz.beatsonlab.com and can be accessed using any
modern browser. As a client-side application, HAIviz performs all computation on the user's
machine with no information posted to the server, making it inherently private, secure, and
scalable. Currently, HAIviz is available as a standalone visualization application that works
with input files created by a separate workflow. In the future, HAIviz aims to become a system
integrated with a WGS-based epidemiology and bioinformatics pipeline, supporting a real-time
HAI surveillance and investigation framework.
Poster Board #: 152
Title: The Role of Region-Specific Based on Single Nucleotide Polymorphisms in Virulence Genes
in Mycobacterium tuberculosis
Author A. Mutshembele, L. Malinga, M. Van Der Walt;
Block: South African Medical Research Council, PRETORIA, SOUTH AFRICA.
Abstract: Background: Countries are experiencing a serious public health threat and a major
obstacle to disease control due to excessive antibiotic use and drug-resistant Mycobacterium
tuberculosis. Single nucleotide polymorphisms (SNPs) and groups of virulence genes will be
used for genotyping. SNPs could be the most valid markers because of their very low level of
homoplasy, and they are ideally suited for defining phylogenetic groupings with very high
confidence. Objectives: To analyse approximately 800 M. tuberculosis whole-genome sequencing
(WGS) datasets deposited in PATRIC, a web-based comprehensive information system, and to
select drug-resistant and drug-susceptible genomes for genotyping. To create a database of
virulence gene mutation catalogues based on M. tuberculosis genomes deposited from Brazil,
China and South Africa. Research Methods: In this work, a bioinformatics analysis of
virulence genes from 303 M. tuberculosis whole genome sequences was performed, from Brazil
(n=2), China (n=23), India (n=238), Russia (n=259) and South Africa (n=278), downloaded from
the PATRIC database and analysed in CLC Genomics Workbench 11. Results: A bioinformatics
analysis of M. tuberculosis WGS data showed that, of the 15 tested genes (mazF3; vapB17;
vapC47; higA; vapC37; vapC38; vapC6; mazF8; vapC3; mce3B; cyp125; vapC25; vapB34; mce3F;
vapC46), only vapC3 did not have mutations. Several genes were found to carry SNPs that
correlate with specific genotypes. Using vapC37 and vapC38, we identified the Beijing lineage
and its sublineages, which are associated with drug resistance and elevated virulence. The
mutations obtained were specific at the lineage and sublineage level. The vapC3, vapC38, and
mazF8 genes were associated with the LAM1, LAM2, LAM9 and LAM4/F15/KZN sublineages.
Conclusions: The constructed SNP-based phylogeny reflected the evolutionary relationships
between lineages. In future, we will need to establish a South African catalogue of M.
tuberculosis SNPs in virulence genes specific to the F15/LAM4/KZN lineage; this will
complement the diagnostic pipeline using WGS data for drug resistance detection and lineage
determination. We will determine a list of conserved genes that can be used for the future
development of DNA vaccines.
Poster Board #: 153
Title: Rapid and Precise Alignment of Raw Reads Against Redundant Databases with KMA
Author P. T. Clausen, F. M. Aarestrup, O. Lund;
Block: Technical University of Denmark, Lyngby, DENMARK.
Abstract: Background: As the cost of sequencing has declined, clinical diagnostics based on
next-generation sequencing (NGS) have become a reality. Diagnostics based on sequencing will
require rapid and precise mapping against redundant databases, because some of the most
important determinants, such as antimicrobial resistance genes and core genome multilocus
sequence typing (MLST) alleles, are highly similar to one another. To facilitate this, a
novel mapping method, KMA (k-mer alignment), was designed. KMA maps raw reads directly against
redundant databases and scales well to large redundant databases. KMA uses k-mer seeding to
speed up mapping and the Needleman-Wunsch algorithm to accurately align extensions from k-mer
seeds. Multi-mapping reads are resolved using a novel sorting scheme (the ConClave scheme),
ensuring an accurate selection of templates. Results: The performance of KMA was compared
with SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon, using both simulated data and a
dataset of Escherichia coli mapped against resistance genes and core genome MLST alleles. KMA
outperforms current methods with respect to both accuracy and speed, while using a comparable
amount of memory. Conclusion: With KMA, it was possible to map raw reads directly against
redundant databases with high accuracy, speed and memory efficiency. Availability: KMA is
implemented in C and is freely available
at: https://bitbucket.org/genomicepidemiology/kma and as a web service
at: https://cge.cbs.dtu.dk/services/KMA/.
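The two building blocks named above, k-mer seeding to shortlist templates and Needleman-Wunsch for accurate alignment, can be shown in a few lines. This is a toy sketch, not KMA's implementation: KMA itself is written in C, aligns only the extensions around seeds, and resolves multi-mapping reads with the ConClave scheme, none of which is modelled here. The template names and scoring parameters are hypothetical.

```python
from collections import Counter


def kmer_index(templates: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer to the set of templates (e.g. resistance alleles) that contain it."""
    index: dict[str, set[str]] = {}
    for name, seq in templates.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def seed_templates(read: str, index: dict[str, set[str]], k: int) -> Counter:
    """Count k-mer seed hits per template; high counts mark candidates worth aligning."""
    hits: Counter = Counter()
    for i in range(len(read) - k + 1):
        for name in index.get(read[i:i + k], ()):
            hits[name] += 1
    return hits


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score of a versus b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


templates = {"alleleA": "ACGTACGTTT", "alleleB": "ACGTACGAAA"}
index = kmer_index(templates, k=4)
print(seed_templates("CGTACGTT", index, k=4).most_common(1))  # → [('alleleA', 5)]
print(needleman_wunsch("ACGT", "AGT"))  # → 1
```

The two alleles share most of their k-mers, which is exactly the redundancy problem the abstract describes: seed counts separate them only by one hit, so a careful alignment step is still needed to choose between highly similar templates.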
Poster Board #: 154
Title: Population Genomics Identified Salmonella Newport ST45 as the Main Driver of Emerging
Multidrug Resistance
Author M. Yue1, S. Rankin2, D. Schifferli2, W. Fang1, H. Pan1;
Block: 1Zhejiang University, Hangzhou, CHINA, 2University of Pennsylvania, Philadelphia, PA.
Abstract: Salmonella is one of the most important foodborne pathogens in the world, and the
emergence of multidrug-resistant Salmonella clones poses a significant threat to veterinary
public health and food safety. However, the genetic and/or evolutionary pressures driving the
selection of antibiotic-resistant pathogens in food animals remain poorly understood. The aim
of this study was a global investigation of the population diversity of S. Newport isolates by
studying the MLST profiles of 2,250 isolates. Three clades were identified that correlated
with the niches/origins of isolation (human, animal, and environment). Sequence analysis of
1,855 S. Newport genomes identified sequence type 45 (ST45) as the predominant clone among
animal isolates (87%), but it accounted for only 9% of isolates from human infections. ST45
isolates carried multiple plasmids; the majority (> 90%) had a unique IncA/C plasmid ranging
in size from 80 to 200 kb. The plasmid carried genes responsible for multidrug resistance,
including floR, tetAR, strAB, sul, mer, and bla. Importantly, three Chinese strains carried the
mcr-1 gene, which confers plasmid-mediated resistance to colistin, one of several last-resort
antibiotics for treating Gram-negative bacterial infections. A genome-wide association study
(GWAS) correlated chromosomal regions or genetic variants with maintenance of an IncA/C
plasmid in ST45 isolates. An additional investigation of the minimum inhibitory
concentrations (MICs) of 27 antibiotics for 3,728 isolates from the food chain (food animals,
retail meats, and humans) suggested that antibiotic-resistant (AR) S. Newport from humans has
multiple but distinct origins. Animal and retail-meat isolates are distinct from > 92% of the
human isolates in their antibiotic-resistance patterns. Taken together, our findings suggest
that S. Newport ST45 is the dominant clone in food animals worldwide. The GWAS data will serve
to investigate genetic determinants that contribute to the maintenance of this clone in food
animals.
Poster Board #: 155
Title: BacPipe: A Rapid, User-Friendly Whole Genome Sequencing Pipeline for Clinical Diagnostic
Bacteriology and Outbreak Detection
Author B. Xavier, M. Mysara, M. Bolzan, C. Lammens, S. Kumar-Singh, H. Goossens, S. Malhotra-Kumar;
Block: University of Antwerp, Wilrijk, BELGIUM.
Abstract: Despite rapid advances in whole-genome sequencing (WGS) technologies, their
integration into routine microbiological diagnostics and infection control has been hampered
by the need for downstream bioinformatics analyses that require considerable expertise. We
have developed a comprehensive, rapid, and computationally lightweight bioinformatics pipeline
(BacPipe) that enables direct analyses of bacterial whole-genome sequences (raw reads, contigs
or scaffolds) obtained from second- and third-generation sequencing technologies. BacPipe is
an ensemble of state-of-the-art, open-access tools for quality verification, genome assembly,
annotation, and identification of the bacterial genotype (MLST, emm typing), resistance genes,
plasmids, virulence genes, and single nucleotide polymorphisms (SNPs). The outbreak module
(SNPs and patient metadata) can analyse many strains simultaneously to identify evolutionary
relationships and transmission routes. Importantly, parallelization of tools in BacPipe
considerably reduces the time to result. Validation of BacPipe using previously published WGS
datasets from hospital, community and foodborne outbreaks, and from transmission studies of
important pathogens, demonstrated the speed and simplicity of the pipeline, which
reconstructed the same analyses and conclusions within a few hours. We believe this fully
automated pipeline will help overcome one of the primary hurdles to WGS data analysis and
interpretation, facilitating its application to routine patient care in hospitals and to
public health and infection control monitoring.
Poster Board #: 156
Title: Rapid Extraction of Single-copy Core Genes for Species Delimitation
Author S. Wittouck1, S. Wuyts1, C. Meehan2, V. van Noort3, S. Lebeer1;
Block: 1University of Antwerp, Antwerp, BELGIUM, 2Institute of Tropical Medicine Antwerp,
Antwerp, BELGIUM, 3KULeuven, Leuven, BELGIUM.
Abstract: Background: Many analyses in comparative genomics and phylogenetics rely on
single-copy core genes (SCGs): genes present in exactly one copy in all genomes of interest.
Current strategies to obtain SCGs are either slow or rely on pre-computed marker genes, either
universal or lineage-specific. Methods: We developed a tool for the rapid extraction of SCGs
from large sets of genomes in linear time. The tool first identifies candidate SCGs in a
random subset of “seed” genomes with OrthoFinder, which uses an approach based on all-vs-all
blastp and MCL. It then searches for those candidate SCGs in all genomes using HMMER. Finally,
a score cutoff is determined per candidate SCG to optimize for single-copy presence, and only
candidate SCGs present in nearly all genomes are retained. We apply our tool to all 2,110
publicly available genomes that belong to the Lactobacillus Genus Complex (LGC). We show the
applicability of the obtained SCGs by using them for 1) quality control of all genomes,
2) species delimitation based on pairwise single-copy core nucleotide identities (SCNIs), and
3) phylogeny inference using one representative genome per species. In addition, we compare
our SCNI-based species delimitation with ANI- and TETRA-based species delimitations. Results:
On a subset of 200 LGC genomes, we show that our tool identifies SCGs similar to those found
by full gene family clustering, but is faster. In the full dataset of 2,110 genomes, we
identify 422 SCGs sensu lato. Using those, we find that 1,980 genomes are of high quality
based on filters of < 5% missingness and < 5% contamination. The pairwise SCNI and ANI
similarities are strongly correlated above and slightly below the species threshold, while,
surprisingly, they are very weakly correlated for more distantly related genomes. Species
delimitation of the high-quality genomes results in the identification of thirteen “new”
species, in the sense that it was not previously known that genomes of these species are
publicly available. Some of these genomes are annotated as other species but are too distant
from the corresponding type strain to be classified as such, while others are unclassified at
the species level. The phylogeny of the species shows that the new species are spread across
the LGC tree, with some closely related to known species and others more distant.
Conclusions: Our tool for rapid extraction of SCGs yields results similar to those of current
methods and is much faster. We suggest SCNI similarity as an alternative to ANI, since it can
be determined rapidly for large datasets, results in very similar species boundaries, and
might more accurately represent distances between more distantly related genomes. Finally,
we identify thirteen new species among publicly available LGC genomes.
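The SCG definition and the quality filters above (< 5% missingness, < 5% contamination) can be expressed directly over a table of per-genome gene family copy numbers. A minimal sketch under stated assumptions: the copy numbers are taken as already computed (the tool's OrthoFinder seeding and per-family HMMER score cutoffs are not modelled), "nearly all genomes" is a hypothetical `min_fraction` parameter, and the fraction of SCGs found in more than one copy is used as a stand-in for contamination.

```python
def select_scgs(copy_numbers: dict[str, dict[str, int]], min_fraction: float = 0.95) -> list[str]:
    """Keep gene families found in exactly one copy in at least min_fraction of genomes.

    copy_numbers[genome][family] is the number of hits of that family in that genome.
    """
    genomes = list(copy_numbers)
    families = sorted({fam for counts in copy_numbers.values() for fam in counts})
    selected = []
    for fam in families:
        single = sum(1 for g in genomes if copy_numbers[g].get(fam, 0) == 1)
        if single / len(genomes) >= min_fraction:
            selected.append(fam)
    return selected


def genome_quality(counts: dict[str, int], scgs: list[str]) -> tuple[float, float]:
    """Return (missingness, multi-copy fraction) of one genome over the selected SCGs."""
    missing = sum(1 for fam in scgs if counts.get(fam, 0) == 0) / len(scgs)
    multi_copy = sum(1 for fam in scgs if counts.get(fam, 0) > 1) / len(scgs)
    return missing, multi_copy


copy_numbers = {
    "g1": {"f1": 1, "f2": 1, "f3": 2},
    "g2": {"f1": 1, "f2": 1, "f3": 1},
    "g3": {"f1": 1, "f2": 0, "f3": 1},
    "g4": {"f1": 1, "f2": 1},
}
scgs = select_scgs(copy_numbers, min_fraction=0.75)
print(scgs)                                      # → ['f1', 'f2']
print(genome_quality(copy_numbers["g3"], scgs))  # → (0.5, 0.0)

# Apply both 5% filters to flag high-quality genomes.
high_quality = [g for g, c in copy_numbers.items() if all(x < 0.05 for x in genome_quality(c, scgs))]
print(high_quality)                              # → ['g1', 'g2', 'g4']
```

The toy table is invented; the point is that a single pass over the copy-number table is linear in the number of genomes, in line with the tool's stated linear-time goal.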
Poster Board #: 157
Title: Genomic Epidemiology of Vibrio cholerae O1 in Haiti: A Switch from the Ogawa to Inaba
Serotype
Author T. K. Paisie, C. Mavian, T. Azarian, M. Cash, D. J. Nolan, A. Ali, M. T. Alam, J. Morris Jr., M.
Salemi;
Block: University of Florida, Gainesville, FL.
Abstract: Vibrio cholerae is the causative agent of cholera. This bacterium is ubiquitous in
aquatic environments, and toxigenic V. cholerae O1 may serve as a source of recurrent cholera
epidemics around the globe. In January 2010, a massive earthquake struck Haiti, causing severe
damage to the public health infrastructure. Then, in October 2010, cholera appeared in Haiti
for the first time in over 150 years. Previous studies show that the early cases of cholera in
Haiti are consistent with a single-source introduction of V. cholerae O1 from Nepalese U.N.
peacekeeping troops sent after the earthquake. After the initial epidemic waves, cholera may
now be endemic in Haiti, showing seasonal outbreak patterns associated with the rainy season.
This clonal, single-source introduction of V. cholerae O1 presents a unique opportunity to
study the evolution of, and the selective pressures acting on, this microorganism. By
performing phylodynamic analysis with genome-wide single nucleotide polymorphisms (SNPs), we
are able to investigate the ongoing cholera epidemic in Haiti and the underlying evolutionary
processes and selective pressures at remarkable resolution. From the start of the cholera
outbreaks in 2010, the dominant serotype of V. cholerae O1 circulating in Haiti was Ogawa.
Then, in 2015, Inaba became the dominant serotype in Haiti. The switch from the Ogawa to the
Inaba serotype is driven by a nucleotide substitution in the wbeT gene. Although the switch
from the Ogawa to the Inaba serotype is a common phenomenon in V. cholerae O1, an Inaba
outbreak occurring while Ogawa still dominated the population could have been caused by a
separate introduction. Previous studies have shown that the Inaba serotype has been present in
Haiti since 2012, but it never propagated and became established as the dominant serotype
circulating in Haiti. By using genome-wide SNPs, we are able to assess potential evolutionary
changes and selective pressures in the V. cholerae O1 genome that generated this switch in
serotype. Our results suggest that the V. cholerae O1 strains currently circulating in Haiti
have evolved from the initial clonal, single-source outbreak of the Ogawa serotype into the
new Inaba serotype, rather than arising from a new introduction.
Poster Board #: 158
Title: LINbase: A Fast and Precise Whole Genome-Based Web Tool for Bacterial Pathogen
Identification and Tracking
Author L. Tian, L. S. Heath, B. A. Vinatzer;
Block: Virginia Tech, Blacksburg, VA.
Abstract: The current pragmatic approach to bacterial taxonomy provides clear classification
guidelines to determine whether a bacterial isolate belongs to an already named species. It
also provides clear nomenclature rules for naming new species. However, species descriptions
do not reveal the extent of genetic and phenotypic diversity within species, and current
taxonomy does not provide any general guidelines or rules for intraspecific classification.
This is highly problematic for bacterial pathogens, including foodborne pathogens, since most
pathogen species contain non-pathogenic strains, and even pathogenic strains can be separated
into different intraspecific groups based on genomic and phenotypic features. Whole-genome
sequencing (WGS) has shown considerable potential to facilitate the detection of foodborne
disease outbreaks and origin tracking by increasing discriminatory power compared with
molecular methods such as pulsed-field gel electrophoresis (PFGE), multiple-locus
variable-number tandem repeat analysis (MLVA) and multilocus sequence typing (MLST). Life
Identification Numbers (LINs) have been shown to reflect phylogenetic relationships and to
provide a system for classification at and below the species level. At the same time, LINs
greatly improve the identification of bacterial isolates, because LINs can identify bacterial
isolates as members of species, of intraspecific groups, or of any other defined group, e.g.
isolates simply belonging to the same disease outbreak. LINbase is a web tool that implements
LINs for classification and precise identification of bacteria. By combining fast algorithms
with a user-friendly web interface, LINbase will allow users not only to precisely identify
any bacterial isolate based on its genome sequence within minutes, but also to determine
outbreak association.
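The idea that a LIN encodes similarity at several levels at once can be illustrated by how a new genome might receive its LIN from its closest classified relative. A minimal sketch, assuming the scheme works roughly as follows: each LIN position corresponds to an ANI cutoff, a new genome inherits as many leading positions from its most similar genome as the cutoffs it meets, and the next position gets a new value. The cutoffs in `THRESHOLDS` and the function names are hypothetical; LINbase's actual thresholds and assignment rules are not given in this abstract.

```python
# Hypothetical ANI cutoffs (percent), one per LIN position, coarse to fine.
THRESHOLDS = [70.0, 80.0, 90.0, 95.0, 99.0, 99.9]

LIN = tuple[int, ...]


def shared_prefix_length(ani: float, thresholds: list[float] = THRESHOLDS) -> int:
    """Number of leading LIN positions shared with a genome at the given ANI
    (assumes thresholds are sorted in ascending order)."""
    return sum(1 for t in thresholds if ani >= t)


def assign_lin(existing: dict[str, LIN], closest: str, ani: float,
               thresholds: list[float] = THRESHOLDS) -> LIN:
    """Give a new genome a LIN based on its most similar already-classified genome."""
    n = shared_prefix_length(ani, thresholds)
    if n == len(thresholds):
        return existing[closest]          # indistinguishable at the finest level
    prefix = existing[closest][:n]
    # Among genomes sharing this prefix, take the next unused value at position n.
    siblings = [lin for lin in existing.values() if lin[:n] == prefix]
    next_value = max(lin[n] for lin in siblings) + 1
    return prefix + (next_value,) + (0,) * (len(thresholds) - n - 1)


existing = {"A": (0, 0, 0, 0, 0, 0), "B": (0, 0, 0, 1, 0, 0)}
print(assign_lin(existing, "A", 96.0))  # → (0, 0, 0, 0, 1, 0)
print(assign_lin(existing, "A", 92.0))  # → (0, 0, 0, 2, 0, 0)
```

Under this assumption, identifying an isolate as a member of a species, an intraspecific group, or an outbreak amounts to checking how long a LIN prefix it shares with known members.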
Poster Board #: 159
Title: Next Generation Sequencing to Investigate Nosocomial Transmission of Influenza
Author D. Frampton1, R. Blackburn1, C. Houlihan1, C. Smith1, Z. Kozlakidis1, S. Hue2, A. Hayward1, E.
Nastouli1;
Block: 1UCL / Farr Institute for Health Informatics Research, London, UNITED KINGDOM, 2London
School of Hygiene and Tropical Medicine, London, UNITED KINGDOM.
Abstract: Overview: Evidence-based infection control of nosocomial influenza has the potential
to offer substantial human health improvements and financial cost savings. However, few
studies have examined nosocomial influenza transmission outside the narrow context of
suspected outbreaks. This study is one of the first to apply full-genome sequencing to examine
influenza transmission in hospital settings and to compare genomic clusters defined at the
level of the full genome with cases that were epidemiologically linked in time and space
(hospital ward or clinic). Our findings exemplify the use of full-genome sequencing for
hospital surveillance of transmission, showing that the technique can identify distinct
transmission chains with substantially greater resolution than can be achieved through
classical epidemiological investigation. We show that an important proportion of
hospitalized influenza cases (at least one in eleven) lead to onward transmission, usually
involving short chains of transmission (average length of 3 cases). Many transmission chains
cannot be explained by known contact between individuals, suggesting “missing links” in the
chain due to under-ascertainment of influenza cases in patients and a potential role for staff
and/or visitors (who were not sampled) in transmission. Methods: All influenza samples from
inpatients, outpatients and A&E attenders at a single hospital between September 2012 and
March 2014 were included. Clinical records were used to define patients with suspected
nosocomial infection, with possible transmission inferred from the timing of the first
positive sample (relative to admission) and spatio-temporal links to other infected patients.
Sequencing was performed on an Illumina MiSeq. Results: 50 of 214 cases were part of
genetically defined transmission chains amongst hospitalised patients. The proportion in
genetic transmission chains was substantially higher for patients testing positive more than
2 days after admission than for those diagnosed soon after admission (p<0.001), and for those
with spatio-temporal links compared with those without (p<0.001). The genetic distance between
pairs of cases with spatio-temporal links was lower than that for pairs with no spatial links
(p<0.001). Assuming each genetically identified cluster includes one community-acquired index
case, we estimate that 16% of hospital cases were due to nosocomial transmission. One in 11
cases seeded a new transmission chain comprising an average of 3 cases. Conclusions:
Nosocomial influenza contributes significantly to the hospital burden during outbreak seasons.
Routine whole-genome sequencing will support outbreak investigations and help monitor the
impact of infection control measures.
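The 16% estimate follows from a simple counting rule: if every genetically defined cluster contains exactly one community-acquired index case, a cluster of n cases contributes n - 1 nosocomial cases. A minimal sketch of that arithmetic; the abstract reports only totals, so the cluster sizes in the example are hypothetical.

```python
def nosocomial_fraction(cluster_sizes: list[int], total_cases: int) -> float:
    """Share of all cases attributed to in-hospital transmission.

    Assumes each genetically defined cluster contains exactly one community-acquired
    index case, so every additional member counts as nosocomial.
    """
    nosocomial = sum(size - 1 for size in cluster_sizes if size > 0)
    return nosocomial / total_cases


# Hypothetical cluster sizes; the abstract reports only totals.
print(nosocomial_fraction([2, 3, 5], total_cases=20))  # → 0.35
```

Singleton cases contribute nothing, so the estimate depends on how clusters are delimited; a stricter genetic cutoff yields more, smaller clusters and therefore a lower nosocomial fraction.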
Poster Board #: 160
Title: PiReT: Pipeline for Reference-based Transcriptomics
Author M. Shakya, S. Feng, C. Lo, K. W. Davenport, B. Hu, P. S. Chain;
Block: Los Alamos National Laboratory, Los Alamos, NM.
Abstract: Transcriptomics enables identifying genes and pathways that are differentially
expressed in one condition relative to another, discovering small RNAs (sRNAs), annotating
transcribed genes, and characterizing alternative splicing. With rapid advances in sequencing
technologies providing unprecedented throughput at acceptable cost, many research
laboratories have shown interest in applying transcriptomics to their research. However, most
laboratories have found themselves continuously challenged by the lack of bioinformatics and
statistical expertise needed to design, implement, and maintain computational workflows
capable of analyzing transcriptomics data. Here, we present the Pipeline for Reference-based
Transcriptomics (PiReT), a one-of-a-kind reference-based transcriptomics solution that adopts
an open architecture and is built upon the web-based analysis platform of EDGE Bioinformatics
to enable biologists with little or no computational knowledge to analyze their data. A
typical transcriptomics workflow requires implementing an array of bioinformatics tools, each
of which addresses a particular step in the analysis, e.g. quality control, alignment,
fragment counting, and statistical hypothesis testing. PiReT weaves together open-source
bioinformatics tools such as FaQCs, HISAT2, featureCounts, edgeR, and DESeq2 and presents them
in an interactive web graphical user interface (GUI) where users can upload their raw data
(FASTQ), customize analysis steps, and produce biologist-friendly results (e.g.
RPKM/FPKM/TPM, read counts, and lists of differentially expressed genes and pathways) and data
visualizations within the GUI. It can perform metatranscriptome analysis, such as of host and
pathogen responses, detect sRNAs, and perform gene set enrichment or pathway analysis. PiReT
can be used as a stand-alone command-line workflow and is also integrated into EDGE
Bioinformatics.
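The normalized expression values PiReT reports (RPKM/FPKM/TPM) differ mainly in the order of normalization. A minimal sketch of the standard definitions, assuming per-gene read counts and gene lengths are already available (e.g. from featureCounts); using the total of counted reads as the "mapped reads" denominator is a simplifying assumption, and the abstract does not say which component of PiReT computes these values. FPKM uses the same formula as RPKM with fragments (read pairs) in place of reads.

```python
def rpkm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of feature per million mapped reads."""
    per_million = sum(counts.values()) / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / per_million for g in counts}


def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to sum to one million."""
    rates = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(rates.values()) / 1e6
    return {g: r / scale for g, r in rates.items()}


counts = {"geneA": 100, "geneB": 100}
lengths = {"geneA": 1000, "geneB": 2000}
print(tpm(counts, lengths))  # geneA gets twice geneB's TPM because it is half as long
```

Because TPM values always sum to one million within a sample, they are easier to compare across samples than RPKM/FPKM, whose totals vary with the length composition of the expressed genes.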
Poster Board #: 161
Title: Chronic Campylobacteriosis Outbreak Investigation in Great Apes Using Next-Generation
Sequencing
Author D. Bandoy1, E. Crook2, N. Kong1, C. Huang1, B. Weimer1;
Block: 1University of California, Davis, Davis, CA, 2Hogle Zoo, Salt Lake City, UT.
Abstract: Background: Campylobacteriosis is one of the leading causes of diarrhea globally. While more
than a thousand genomes have been published for Campylobacter jejuni, genomes from
other Campylobacter species, such as C. hyointestinalis, have been reported infrequently. To date,
only 18 complete and draft C. hyointestinalis genomes have been published, primarily from
domestic and wild ruminants. Methods: Campylobacter was isolated from feces collected through
longitudinal surveillance and identified using classical microbiological methods. Whole
genome sequencing was performed following the previously described protocol of the 100K Foodborne
Pathogen Project on Illumina HiSeq X Ten instruments. Raw reads were assembled using
CLC Genomics. Genome distances were computed using GGDC, and the distance matrix values
were used to generate a phylogenomic tree. Annotation was performed using Prokka, followed by
pangenome analysis using Roary, with visualization in Phandango. Whole genome
sequences were screened for antimicrobial resistance genes and virulence factors using ABRIcate
and the online Comprehensive Antibiotic Resistance Database (CARD). Genomic islands
were predicted using the IslandViewer online tool, and manual curation was performed using
Mauve alignment. Results: In silico genome distances placed the isolates into distinct groups by
host species of origin, indicating host species adaptation. This clustering corresponds to
host species-specific gene sets, as demonstrated by presence-absence variation (PAV)
analysis of the pangenome output. Only one isolate (BCW 9279) within the primate
outbreak showed phylogenetic incongruence, clustering with the New Zealand deer
isolates. This apparent incongruence was attributed to
horizontal gene transfer of genomic islands acquired from Clostridioides difficile. Further
analysis revealed arsenical resistance genes that were no longer detected in
genome sequences obtained after antibiotic treatment. This finding indicates a
mechanism of genomic divergence through the acquisition of genomic islands that were
negatively selected during an ongoing outbreak and therapeutic intervention.
Surprisingly, despite the great apes' exposure to tetracycline treatment, resistome profiling of
C. hyointestinalis ssp. hyointestinalis revealed no known antibiotic resistance genes.
rRNA copy number comparison (one copy in all primate isolates versus three copies in the
reference) suggests a possible mechanism for a carrier state with reduced metabolic
activity. Conclusion: Whole genome sequencing provided unprecedented resolution of
longitudinal infection dynamics, revealing acute genomic island gain and loss driven by negative
selection under antibiotic exposure. These findings highlight the genomic flexibility
of Campylobacter hyointestinalis ssp. hyointestinalis in chronic diarrhea of great apes.
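The presence-absence variation (PAV) analysis described above can be illustrated with a minimal Python sketch. It assumes the pangenome output (for example, a gene presence/absence table) has already been parsed into a set of gene names per isolate. The isolate IDs, gene names and host labels are hypothetical, and this is not the authors' code. For each host group, the sketch keeps genes found in every isolate of that group and in no isolate outside it.

```python
from collections import defaultdict


def host_specific_genes(presence, host_of):
    """Return {host: genes present in every isolate from that host
    and absent from every isolate from other hosts}.

    presence: dict of isolate ID -> set of gene names
    host_of:  dict of isolate ID -> host group label
    """
    members = defaultdict(list)
    for isolate, host in host_of.items():
        members[host].append(isolate)

    result = {}
    for host, isolates in members.items():
        # Genes carried by all isolates of this host group
        shared = set.intersection(*(presence[i] for i in isolates))
        # Genes carried by any isolate of a different host group
        elsewhere = set().union(
            *(presence[i] for i, h in host_of.items() if h != host)
        )
        result[host] = shared - elsewhere
    return result


# Hypothetical toy data
presence = {
    "ape_1": {"geneA", "geneB", "geneC"},
    "ape_2": {"geneA", "geneB"},
    "deer_1": {"geneA", "geneD"},
    "deer_2": {"geneA", "geneD", "geneE"},
}
host_of = {"ape_1": "primate", "ape_2": "primate",
           "deer_1": "deer", "deer_2": "deer"}

print(host_specific_genes(presence, host_of))
# {'primate': {'geneB'}, 'deer': {'geneD'}}
```

Intersection within a group followed by subtraction of everything seen elsewhere is deliberately strict. A real analysis would usually tolerate some missing calls, for example by using a prevalence cut-off.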
Poster Board #: 162
Title: Sunbeam: An Extensible Pipeline for Analyzing Metagenomic Sequencing Experiments
Author Block: E. L. Clarke1, L. J. Taylor1, C. Zhao2, A. Connell1, F. D. Bushman1, K. Bittinger2; 1Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 2Children's Hospital of Philadelphia, Philadelphia, PA.
Abstract: Background: Shotgun metagenomic sequencing experiments provide functional and
compositional insight into complex microbial communities. Analyzing such data requires a number of
preprocessing and analytical steps. Many of these steps, such as quality
control, adapter trimming, and phylogenetic classification, are common to many sequencing
experiments; other analyses are specific to each study. Methods: Here we introduce
Sunbeam, a modular and user-extensible pipeline designed to process metagenomic
sequencing data in a consistent and reproducible fashion. Sunbeam performs multiple
processing steps common to many metagenomic sequencing experiments, including quality
control, adapter trimming, host read removal, low-complexity filtering, metagenomic
classification, read assembly, and reference genome alignment. Sunbeam also includes a
powerful extension framework that lets users easily incorporate new analysis or processing
steps. Results: Sunbeam installs in a single step, has no dependencies other than Linux,
doesn't require administrative access, and works on most cluster computing frameworks.
Sunbeam is inherently modular and will restart where it left off in case of error. To quickly
and accurately filter problematic low-complexity reads in metagenomic data, we also
introduce Komplexity, a rapid sequence complexity analysis tool, which identifies low
complexity sequences to allow removal. The Sunbeam pipeline is well-documented, regularly
updated and in routine use. We also provide a number of pre-built extensions
(github.com/sunbeam-labs/). Conclusions: Sunbeam provides an easy-to-use, extensible
framework for in-depth analysis of metagenomic sequencing experiments. Sunbeam ensures
reproducible and consistent analyses by standardizing post-processing, analytical, and custom
steps, and by robustly removing problematic, low-complexity reads. Sunbeam is written in
Python using the Snakemake workflow management software and is freely available at
github.com/sunbeam-labs/sunbeam under the GPLv3.
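Low-complexity filtering of the kind Komplexity performs can be sketched as a k-mer diversity score: the number of distinct k-mers in a read divided by the read length. This scoring approach is our assumption. The k = 4 and 0.55 cut-off used here are illustrative values, not confirmed Komplexity parameters, and the function names are hypothetical.

```python
def kmer_complexity(seq, k=4):
    """Distinct k-mers divided by sequence length.

    Homopolymers and short tandem repeats reuse a few k-mers and score
    near 0; varied sequence approaches (len - k + 1) / len, close to 1.
    """
    seq = seq.upper()
    if len(seq) < k:
        return 0.0
    distinct = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(distinct) / len(seq)


def filter_low_complexity(reads, k=4, threshold=0.55):
    """Keep (read_id, sequence) pairs whose complexity reaches threshold."""
    return [(rid, s) for rid, s in reads if kmer_complexity(s, k) >= threshold]


reads = [("r1", "A" * 50), ("r2", "ACGT" * 10), ("r3", "ACGTTGCAAGTC")]
for rid, s in reads:
    print(rid, round(kmer_complexity(s), 2))
print([rid for rid, _ in filter_low_complexity(reads)])  # ['r3']
```

A set of k-mers per read keeps the score linear in read length. This is the property that makes such a filter cheap enough to run on every read in a metagenomic library.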
Poster Board #: 163
Title: A Carbon Nanotube Platform for Virus Enrichment
Author Block: Y. Yeh, M. Terrones; The Pennsylvania State University, University Park, PA.
Abstract: Viral pathogens evolve rapidly and unpredictably, challenging the effectiveness of existing
studies of viral evolution. Deep sequencing detects viral mutations and diversity
by sequencing each genome region multiple times. Higher coverage across a consensus
sequence allows reliable identification of mutations within a viral population. Starting
with a sufficient quantity of high-purity genomic material is key to obtaining a high-coverage
consensus. One current challenge is the low viral load in most samples, which leads to
sequence reads dominated by host rather than viral sequences. Existing
enrichment methods, including virus culture and genome amplification, often introduce
artificial variants or bias among sequence reads. The size-tunable enrichment platform (STEP),
a portable technology we recently developed, is constructed from aligned and functionalized
carbon nanotube (CNT) forests that enrich different viruses based on their size while removing host
contaminants such as host cell debris, DNA, and mRNA. The CNT-STEP significantly improves
detection limits and virus isolation rates, by at least 100-fold. We integrated CNT-STEP with
NGS in order to sequence unknown viruses directly from field samples after enrichment. After
enrichment, viral NGS reads increased from 2.9% (37,627 reads) to 90.6% (1,175,537 reads),
corresponding to an enrichment factor of ~600 and indicating that the CNT-STEP
removed most of the host contamination. To validate the approach on
real field samples, we applied it to a cloacal swab pool collected from five ducks during
2012 avian influenza virus (AIV) surveillance in Pennsylvania. Without any virus purification or
propagation, the duck swab sample was enriched and concentrated by a CNT-STEP with a 95 nm
inter-tubular distance. No clogging was observed under scanning electron microscopy (SEM).
NGS and de novo sequence assembly yielded 8 full-length AIV contigs, whereas no AIV-related
contig was found in the sample without CNT-STEP enrichment. The virus was named
“A/duck/PA/02099/2012 (H11N9)”, and the H11N9 strain was further confirmed by the US
Department of Agriculture (USDA) through serological tests. This enrichment increases
sequencing coverage by two orders of magnitude, dramatically enhancing sensitivity in
identifying mutations. An outcome of this collaboration is the establishment of a unique
method that enables close monitoring of viral evolution and a cost-effective sample preparation
platform for efficient viral deep sequencing.
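Why coverage matters for identifying mutations can be shown with a minimal sketch of threshold-based minor-variant detection from per-position base counts. The minimum depth of 100 reads and the minimum allele frequency of 2% are hypothetical values chosen for illustration, not parameters from this study. The function name is likewise hypothetical.

```python
def call_minor_variants(pileup, consensus, min_depth=100, min_freq=0.02):
    """Report non-consensus bases supported by enough reads.

    pileup:    list with one dict per position, mapping base -> read count
    consensus: consensus sequence, same length as pileup
    Returns (0-based position, base, allele frequency) tuples.
    """
    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            # Below 100 reads a 2% variant is at most one or two reads and
            # cannot be told apart from sequencing error, so skip the position.
            continue
        for base, n in counts.items():
            freq = n / depth
            if base != consensus[pos] and freq >= min_freq:
                calls.append((pos, base, round(freq, 3)))
    return calls


pileup = [
    {"A": 2, "G": 1},        # depth 3: too shallow to assess
    {"C": 970, "T": 30},     # depth 1,000: T at 3%
    {"G": 995, "A": 5},      # depth 1,000: A at 0.5%, below min_freq
]
print(call_minor_variants(pileup, "ACG"))  # [(1, 'T', 0.03)]
```

Raising coverage by two orders of magnitude moves many more positions above the depth floor, so low-frequency variants become callable rather than being discarded.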
Poster Board #: 164
Title: Whole Genome Sequencing to Track the Origin and Spread of Tuberculosis in Low Prevalence Setting of Australia
Author Block: S. Gautam1, M. Aogáin2, L. Cooley3, G. Haug4, J. Fyfe5, M. Globan5, R. O’Toole1; 1University of Tasmania, Hobart, AUSTRALIA, 2Trinity College, Dublin, Dublin, IRELAND, 3Royal Hobart Hospital, Hobart, AUSTRALIA, 4Launceston General Hospital, Launceston, AUSTRALIA, 5Victoria Infectious Diseases Reference Laboratory, Melbourne, AUSTRALIA.
Abstract: Background: Tasmania is a small island state in Australia with an annual tuberculosis (TB)
incidence rate of 1.7/100,000 population in 2014. A 60% drop in the current rate of TB by
2035 and a 95% drop by 2050 are required for Tasmania to meet the World Health Organization’s
international target of TB eradication by 2050. This study was designed to identify the sources
of TB in Tasmania and track its transmission, both of which are largely unknown. Methods: Whole
genome sequencing (WGS) of cultured Mycobacterium tuberculosis isolates obtained from
2014 to 2016 in Tasmania was performed on an Illumina MiSeq at the University of Tasmania. The
genomic data were analyzed for single locus variation to determine phylogeny and
drug resistance-conferring mutations. Genomic information was also analyzed against
public health surveillance records. Furthermore, in silico spoligotyping was performed to
relate Tasmanian TB cases to publicly available isolates of international spoligotypes.
Household contacts of TB cases were traced and their isolates analyzed. A cut-off of ≤5 single
nucleotide polymorphism (SNP) differences between isolates was used to define
recent transmission. Results: More than 80% of TB cases in Tasmania were detected in
non-Australian-born individuals. Two clusters of TB were detected, one among individuals
originating from Nepal and the other among individuals from New Zealand. Based on WGS data,
isolates belonging to the largest cluster of TB in Tasmania were related to those prevalent in the
patients’ country of origin, Nepal. Furthermore, SNP analyses identified Vietnam as the origin of
the first case of multidrug-resistant TB in Tasmania. In addition, a human case of bovine TB reported
40 years after its eradication from cattle in Tasmania was linked to M. bovis previously reported in
mainland Australia. Conclusion: The majority of TB cases in Tasmania were reported in
foreign-born individuals, and geographically, TB in the state was of foreign origin. Transmission of
TB occurred among members of close communities but not in the wider population.
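The ≤5 SNP cut-off for recent transmission can be turned into clusters as sketched below. The abstract does not state how pairwise links were grouped. This sketch assumes single linkage, in which isolates are chained through any pair within the cut-off. The isolate names and distances are hypothetical.

```python
def snp_clusters(isolates, distances, threshold=5):
    """Single-linkage clusters of isolates whose pairwise SNP distance
    is <= threshold (assumed linkage rule).

    isolates:  list of isolate IDs
    distances: dict mapping (isolate_a, isolate_b) -> SNP count
    Returns clusters with at least two members, each as a sorted list.
    """
    parent = {i: i for i in isolates}

    def find(x):
        # Follow parent links to the cluster root, halving paths as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in distances.items():
        if snps <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for i in isolates:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in clusters.values() if len(c) > 1)


isolates = ["TB1", "TB2", "TB3", "TB4"]
distances = {
    ("TB1", "TB2"): 2, ("TB2", "TB3"): 4, ("TB1", "TB3"): 6,
    ("TB1", "TB4"): 41, ("TB2", "TB4"): 42, ("TB3", "TB4"): 40,
}
print(snp_clusters(isolates, distances))  # [['TB1', 'TB2', 'TB3']]
```

TB1 and TB3 differ by 6 SNPs but still share a cluster because each is within 5 SNPs of TB2. Under a stricter all-pairs rule they would not be grouped together, so the linkage choice matters.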
Poster Board #: 165
Title: Understanding genomic landscapes in EnteroBase with cgMLST & GrapeTree.
Author Block: N. Alikhan1, Z. Zhou1, N. Luhmann1, C. Vaz2, A. P. Francisco2, J. A. Carriço3, M. Achtman1; 1University of Warwick, Coventry, UNITED KINGDOM, 2Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento, Lisbon, PORTUGAL, 3Universidade de Lisboa, Lisbon, PORTUGAL.
Abstract: Sequenced raw reads are available in ENA for more than 647,000 bacterial genomes.
Important goals for such data include identifying groups of genetically related bacteria in
order to facilitate epidemiological tracking or in-depth analyses. However, even these simple
goals are difficult to achieve unless the raw data are codified.
We have developed an online tool, EnteroBase (http://enterobase.warwick.ac.uk), which
provides biologists, clinicians and epidemiologists with access to genomic assemblies, genotypes
and analytical tools. EnteroBase provides consistent high-resolution genotyping with core
genome multilocus sequence typing (cgMLST) schemes
for Salmonella, Escherichia, Yersinia & Clostridioides, and intuitive visualization of the results with
GrapeTree (https://github.com/achtman-lab/GrapeTree) (1). Phylogenetic analyses based on single
nucleotide polymorphisms (SNPs) of up to 1,000 genomes are also available on demand. An
initial impression of the benefits of this approach can be found in a recent review article (2).
We are already implementing the combination of data from modern genomes with ancient
DNA. EnteroBase contains more than 150,000 genomes from Salmonella and 70,000 from
Escherichia. These are unprecedented troves of data on the diversity within these two genera,
and the size of these databases will continue to increase dramatically over the next few years.
All read data are checked for quality, assembled and genotyped with a versioned pipeline,
ensuring consistency. EnteroBase supports sharing of data within private groups of
researchers as well as publishing graphical analyses and datasets for the entire global
community. We are also establishing facilities to allow free download of all genomes in
EnteroBase via a dedicated server. MSTree V2 and RapidNJ are implemented within
GrapeTree, and can identify important clusters of related organisms among 100,000 genomes
based on cgMLST. However, we are already preparing for a future that will encompass
orders of magnitude more genomes by developing hierarchical clustering, which will provide
persistent and scalable designations as a general tool for microbial genomics.
Reference List
1. Z. Zhou et al., BioRxiv doi: https://doi.org/10.1101/216788 (2017).
2. N.-F. Alikhan, Z. Zhou, M. J. Sergeant, M. Achtman, PLoS Genet 14, e1007261 (2018).
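cgMLST compares genomes by the number of core loci with different allele numbers. The sketch below computes that allelic distance, skipping loci missing in either genome. It then assigns single-linkage cluster labels at several distance thresholds to produce a multi-level designation. This is only loosely in the spirit of the hierarchical clustering described above. It is not EnteroBase's algorithm, and the thresholds, profiles and isolate names are hypothetical.

```python
from itertools import combinations


def allelic_distance(p, q):
    """Loci with different allele numbers; None marks a missing locus,
    and loci missing in either profile are not counted."""
    return sum(1 for a, b in zip(p, q)
               if a is not None and b is not None and a != b)


def _root(parent, x):
    while parent[x] != x:
        x = parent[x]
    return x


def multilevel_clusters(profiles, thresholds=(0, 2, 5, 10)):
    """Single-linkage cluster label for each isolate at each threshold.

    profiles: dict isolate -> tuple of allele numbers (None = missing)
    Returns dict isolate -> tuple of labels, one per threshold; a label is
    the alphabetically first isolate in the cluster.
    """
    names = sorted(profiles)
    dist = {(a, b): allelic_distance(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}
    labels = {n: [] for n in names}
    for t in thresholds:
        parent = {n: n for n in names}
        for (a, b), d in dist.items():
            ra, rb = _root(parent, a), _root(parent, b)
            if d <= t and ra != rb:
                # Attach the later root under the earlier one, so every
                # root is the alphabetically first name in its cluster
                parent[max(ra, rb)] = min(ra, rb)
        for n in names:
            labels[n].append(_root(parent, n))
    return {n: tuple(v) for n, v in labels.items()}


profiles = {
    "S1": (1, 1, 1, 1),
    "S2": (1, 1, 1, 2),
    "S3": (1, 2, 2, 2),
    "S4": (5, 6, 7, 8),
}
for name, lab in multilevel_clusters(profiles, thresholds=(0, 1, 2, 4)).items():
    print(name, lab)
```

Reading each tuple from left to right moves from fine to coarse resolution. Isolates that share a label at a given level belong to the same cluster at that allelic-distance threshold.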
Poster Board #: 166
Title: Comparison of Genomic Analysis Methods for Investigation of a Legionellosis Cluster in New York City, October 2017
Author Block: C. Kretz1, P. Lapierre2, J. Mercante1, J. Novak3, E. Omoregie3, E. Gonzalez3, J. Wang3, B. Raphael1, S. Hughes3, K. Musser4, J. Rakeman3; 1CDC, Atlanta, GA, 2NYS Wadsworth Center, Albany, NY, 3NYC PHL, NYC, NY, 4NYS Wadsworth, Albany, NY.
Abstract: Legionellosis is caused by exposure to Legionella species found in water; symptoms range
from a mild influenza-like illness to a serious and sometimes fatal form of pneumonia. In the
United States, ~79% of cases are associated with Legionella pneumophila serogroup 1
(Lp1). Legionella is a growing public health concern in the country; disease incidence has
nearly quadrupled since 2000, with several large, high-profile outbreaks in recent years,
including in New York City. During October 1-14, 2017, 15 cases of legionellosis were
confirmed within a <0.75 km radius in Flushing, Queens, NY. We used 2 comparative genomic
methods to characterize environmental isolates collected during the investigation and to
compare them with circulating strains to assess the diversity of Lp1. During the environmental
investigation, 55 epidemiologically linked cooling towers and water fountains were sampled
and tested by culture-based methods and by real-time PCR for the presence
of Legionella species, Legionella pneumophila, and Legionella pneumophila serogroup 1 (Lp1).
No clinical isolates were recovered from the 6 sputum specimens obtained from patients, but
13 Legionella species isolates and 5 Lp1 isolates were recovered from environmental sources.
Whole-genome sequencing (WGS) was performed on all 5 Lp1 isolates, and single nucleotide
polymorphism (SNP) analysis and whole-genome multilocus sequence typing (wgMLST) were carried
out for in-depth molecular characterization of circulating strains. SNP analysis was used to compare
isolates recovered during the investigation with historical Lp isolates from New York State (NYS). One
isolate matched two unrelated clinical isolates from NYS with 24-25 SNP differences,
whereas the remaining isolates were closely related to each other and to environmental isolates
previously recovered during the 2015 South Bronx outbreak, indicating persistence of
this strain in NYC. In silico sequence-based typing revealed that 4 isolates belonged to sequence type
(ST) 1400, which has been found only in NY, and 2 of these isolates shared a high degree of
similarity (>99% allele identity) with a clinical isolate recovered in NYC in 2009. WGS has
provided additional resolution to outbreak investigations. In this investigation, two different
genomic sequence analysis methods were used with comparable results. Both methods were
able to distinguish and separate isolates based on their relatedness. We conclude that
particular Legionella strains could be endemic and persistent in NYC based on similarity of
strains and ST unique to NY. Additionally, we detected diversity of potential disease-causing
strains in cooling towers when compared with strains commonly found in the United States.
Our investigation shows how crucial WGS is in outbreak investigations and highlights the
need to obtain clinical isolates from patients with legionellosis in order to identify disease sources
and prevent additional exposures.
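SNP comparisons like those described above reduce to counting differences between isolates at each position of a core-genome alignment. This minimal sketch assumes the alignment has already been produced; read mapping and variant calling are not shown. It ignores gap and N positions, and the isolate names and sequences are hypothetical.

```python
from itertools import combinations


def pairwise_snp_distances(alignment):
    """SNP differences for every pair of aligned sequences.

    alignment: dict isolate -> aligned sequence (all of equal length).
    Columns with a gap '-' or ambiguous 'N' in either sequence are skipped.
    """
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("sequences must all be the same aligned length")
    ignore = {"-", "N"}
    distances = {}
    for a, b in combinations(sorted(alignment), 2):
        distances[(a, b)] = sum(
            1
            for x, y in zip(alignment[a].upper(), alignment[b].upper())
            if x != y and x not in ignore and y not in ignore
        )
    return distances


alignment = {
    "env_1": "ACGTACGTAC",
    "env_2": "ACGTACGTAT",
    "clin_1": "TCGAACNTAT",
}
for pair, snps in pairwise_snp_distances(alignment).items():
    print(pair, snps)
```

Skipping ambiguous columns rather than counting them avoids inflating distances for lower-quality assemblies. The trade-off is that distances computed over different numbers of usable columns are not strictly comparable.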
Poster Board #: 167
Title: Prevalence and Serovar Diversity of Salmonella spp. in Primary Agricultural/Horticultural Fruit Production Environments
Author Block: L. Chidamba, L. L. Korsten, A. Gomba; University of Pretoria, Pretoria, SOUTH AFRICA.
Abstract: Increases in foodborne disease outbreaks associated with fresh produce have made it
necessary to identify potential sources of microbial contamination in produce and agricultural
environments. The present study evaluated Salmonella prevalence and serovar diversity in
fruit (225), water (140) and surface (126) samples from three commercial farms and
associated packhouses located in different farming regions of South Africa. Fruit and water
samples were collected from both orchards and packhouses, while surface samples were
collected from conveyor belts and the hands of packhouse employees. Salmonella was detected
in 26 of the 491 (5.3%) samples. Environmental samples (water and surfaces) accounted for a
slightly higher proportion (3.1%; 15/491) of positive samples than fruit samples
(2.2%; 11/491). Salmonella was not detected on employee hands or in river water samples. A
total of 263 Salmonella cultures were isolated from the 26 positive samples by standard
culture methods, preliminarily identified by matrix-assisted laser desorption/ionisation
time-of-flight mass spectrometry (MALDI-TOF MS) and API 20E, and confirmed by detection of the
invA gene. Of the 39 representative isolates serotyped, the following serovars were identified:
Muenchen (33.3%), Typhimurium (30.8%), Heidelberg (20.5%), Bsilla (7.7%), Salmonella
subspecies IIb 17:r:z (5.1%), and one untypable strain. Most positive samples harboured multiple
serovars, with orchard water from one site showing the highest serovar diversity (4 serovars).
Our findings show the potential of agricultural fruit production environments to act as reservoirs
of clinically important Salmonella serovars.
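The reported percentages can be checked with simple arithmetic. Sample counts below are taken from the abstract. The per-serovar isolate counts are back-calculated from the reported percentages of 39 isolates. That back-calculation is our assumption, since the abstract gives percentages only.

```python
def percent(count, total, digits=1):
    """Percentage rounded to the precision used in the abstract."""
    return round(100 * count / total, digits)


# Sample counts as reported in the abstract
samples = {"fruit": 225, "water": 140, "surface": 126}
total_samples = sum(samples.values())  # 491
positives = {"environmental (water + surface)": 15, "fruit": 11}

print("overall:", percent(sum(positives.values()), total_samples))  # 5.3
for group, n in positives.items():
    print(group, percent(n, total_samples))

# Serovar counts among the 39 serotyped isolates, back-calculated from the
# reported percentages (an assumption; the abstract gives percentages only)
serovars = {"Muenchen": 13, "Typhimurium": 12, "Heidelberg": 8,
            "Bsilla": 3, "subspecies IIb 17:r:z": 2, "untypable": 1}
n_serotyped = sum(serovars.values())  # 39
for name, n in serovars.items():
    print(name, percent(n, n_serotyped))
```

Both sample-level proportions use all 491 samples as the denominator. This is why the environmental (3.1%) and fruit (2.2%) figures sum to the overall 5.3% prevalence.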
Poster Board #: 168
Title: Applying Bioinformatics Pipelines to Reconstruct Bacterial Genomes from Human Faecal Metagenomes
Author Block: F. M. Mobegi1, L. E. Leong1, B. Ramadass2, E. Mortimer3, M. J. Manary4, D. H. Alpers5, G. P. Young3, B. S. Ramakrishna6, G. B. Rogers1; 1SAHMRI, Adelaide, AUSTRALIA, 2All India Institute of Medical Sciences, Odisha, INDIA, 3Flinders University, Bedford Park, AUSTRALIA, 4Washington University in St. Louis, St Louis, MO, 5Washington University in St. Louis, St. Louis, MO, 6SRM Institutes for Medical Science, Chennai, INDIA.
Abstract: Introduction: Advances in metagenomics and computational methods, together with
reductions in sequencing costs, have aided culture-independent studies of complex microbial
systems. Metagenomics-based analysis has primarily focused on assessing microbiome
diversity and functional capacity. However, deep metagenomic sequencing also allows the
reconstruction and exploration of draft microbial genomes. This approach is particularly
powerful for culture-refractory species. We aimed to retrieve high-quality draft
bacterial genomes from faecal metagenomes generated from pre-school children in Tamil
Nadu, India. Methods: Shotgun metagenomic sequencing was performed on longitudinal
stool sample collections from three stunted and three non-stunted adolescents who were
enrolled in a starch supplementation study. On average, 261 million high-quality reads were
obtained from each sample. Reads were assembled de novo into contigs using IDBA, and
redundant contigs were removed using CD-HIT. Individual sample
reads were then mapped to the non-redundant contigs using the Burrows-Wheeler Aligner (BWA),
and the resulting BAM files were sorted and indexed using SAMtools. MetaBAT, run with all five
preset parameter settings and a depth file for each BAM file, was used to bin contigs, as previously
described (1). The resulting bins, which represent draft genomes, were assessed for completeness
and purity and refined using CheckM and RefineM. Taxonomic assignments for the draft genomes
were confirmed using Kraken. Results: Based on deep metagenomic sequencing, we
reconstructed near-complete genomes of 114 bacterial taxa with high genome quality
(completeness ≥70%, contamination ≤10%). Approximately 93% of all recovered genomes
represented fermentative commensal species. Although the reconstructed genomes
were notably consistent with their type strains in core genome composition, some
selected culture-refractory anaerobic bacteria differed significantly from their type-strain
counterparts in the accessory genome. In monosaccharide metabolism, for
example, the reconstructed A. muciniphila genome has genes needed to utilise D-galacturonate and
D-glucuronate, which are absent in the type strain A. muciniphila (ATCC BAA-835). In contrast,
the ATCC strain has genes for fructose utilisation that are absent in our genome. These
differences might reflect characteristics of local diet. Conclusion: We demonstrated the
successful recovery of draft bacterial genomes from faecal metagenomes and their
comparison to type-strains. This ability to construct bacterial genomes directly from
metagenomes is valuable in allowing the analysis of culture-refractory taxa, and is likely to be
particularly important in contexts where advanced culture techniques are unavailable. It also
provides a means to mine existing published datasets.
Literature
1. Parks, D.H. et al. Nature Microbiology 2, 1533-1542 (2017).
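The quality criterion used above (completeness ≥70%, contamination ≤10%) amounts to a simple filter over bin-level estimates such as those CheckM reports. In this sketch, parsing the CheckM output is not shown, and the `Bin` class, bin names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    completeness: float   # percent, e.g. as estimated by CheckM
    contamination: float  # percent


def high_quality(bins, min_completeness=70.0, max_contamination=10.0):
    """Names of bins meeting both quality thresholds (inclusive)."""
    return [b.name for b in bins
            if b.completeness >= min_completeness
            and b.contamination <= max_contamination]


bins = [
    Bin("bin.1", 95.2, 1.3),
    Bin("bin.2", 72.0, 9.9),
    Bin("bin.3", 68.5, 2.0),   # too incomplete
    Bin("bin.4", 88.0, 15.4),  # too contaminated
]
print(high_quality(bins))  # ['bin.1', 'bin.2']
```

Keeping the thresholds as keyword arguments makes it easy to apply a stricter cut-off, for example 90% completeness, without changing the filter itself.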
Poster Board #: 169
Title: Building Bioinformatics Infrastructure in New England State Public Health Laboratories
Author Block: T. Hsu1, G. Gallagher1, N. L. Yozwiak2, D. J. Park2, C. Tomkins-Tinch2, T. Fink1, N. Consortium3, P. Sabeti2, S. Smole1; 1Massachusetts State Public Health Laboratory, Jamaica Plain, MA, 2Broad Institute of MIT and Harvard, Cambridge, MA, 3Northeast Environmental and Public Health Laboratory Directors, NA, MA.
Abstract: Background: Public health laboratory surveillance systems have historically relied on two
types of assays to detect pathogens: i) profiling the organism of interest via
morphological traits (microscopy), metabolic capability (culture), and/or molecular subtyping,
or ii) surveying the environment of the organism of interest through serology. With
decreasing costs, next-generation sequencing (NGS) has emerged as a surveillance tool that
potentially provides a standardized protocol across groups of microorganisms and finer
resolution than subtyping. However, implementing NGS and bioinformatics
analyses in state laboratories has remained challenging. Methods: The Massachusetts State Public
Health Laboratory (MA SPHL) was funded in 2018 as the bioinformatics leader laboratory for
the New England area, which includes CT, MA, ME, NH, NY, RI, VT, NYC, and NJ. To
determine the status of bioinformatics in these laboratories, MA SPHL hosted a series of 6
calls in partnership with the Broad Institute of MIT and Harvard. The first 3 calls reviewed
each state’s sequencing and bioinformatics infrastructure, while the latter 3 calls consisted of
demonstrations of cloud computing through Amazon Web Services (AWS) and Google
Compute Engine. Results: Surveys and calls revealed that most states had MiSeq sequencing
capability and participated in CDC programs such as PulseNet, the National Antimicrobial
Resistance Monitoring System (NARMS), Global Hepatitis Outbreak and Surveillance Technology
(GHOST), and CaliciNet. Wadsworth NY was an outlier, with both sequencing and
bioinformatics cores along with active assay and pipeline development for organisms outside
those projects (e.g., adenovirus, mumps, and Zika). Difficulties in setting up bioinformatics
infrastructure stemmed from information technology (IT) resistance to supporting Linux or cloud
computing, little funding for bioinformatics staff, and a lack of data policies for sequencing
data. Conclusions: Most New England states have obtained the ability to sequence by
participating in CDC programs, but rely on the CDC or outside collaborators for bioinformatics
analyses. Each state will likely require its own unique solution due to differing state laws and
governance, which in turn shapes state IT departments and prevents them from emulating
CDC’s compute model. To provide future guidance, we are currently drafting a
“Bioinformatics Implementation Guide”, which will include considerations for hiring
bioinformaticians, finding compute hardware, and working with IT.