Periodontal Microbial Dark Matter


DOI: 10.1111/prd.12349

REVIEW ARTICLE

Probing periodontal microbial dark matter using metataxonomics and metagenomics

Purnima S. Kumar1 | Shareef M. Dabdoub1 | Sukirth M. Ganesan2


1 Department of Periodontology, College of Dentistry, The Ohio State University, Columbus, Ohio, USA
2 Department of Periodontics, College of Dentistry and Dental Clinics, The University of Iowa, Iowa City, Iowa, USA

Correspondence
Purnima S. Kumar, Division of Periodontology, College of Dentistry, 305 West 12th Avenue, Columbus, Ohio 43210, USA.
Email: kumar.83@osu.edu

1 | INTRODUCTION

To understand how bacteria cause disease, it is important to understand not only how they colonize the host, what partners they recruit to enable their host-associated lifestyle, and how they evade the immune system or create an environment of immune tolerance, but also when, why, and how pathobionts are created. Our view of the periodontal microbial community has been shaped by a century or more of cultivation-based and microscopic investigations. While these studies firmly established the infection-mediated etiology of periodontal diseases, it was apparent from the very early days that periodontal microbiology suffered from what Staley and Konopka described as the “great plate count anomaly”, in that culturable bacteria were only a minor part of what was visible under the microscope.1 For nearly a century, much effort has been devoted to finding the right tools to investigate this uncultivated majority, also known as “microbial dark matter”.

The discovery that DNA was an effective tool with which to “see” microbial dark matter was a significant breakthrough in environmental microbiology, and oral microbiologists were among the earliest to capitalize on these advances. Identifying the order in which nucleotides are arranged in a stretch of DNA (DNA sequencing) and depositing these sequences in repositories led to the creation of sequence databases. Computational tools that used probability-driven analysis of these sequences enabled the discovery of new and unsuspected species and ascribed novel functions to these species. The discovery that the evolutionary history of prokaryotes was recorded in certain molecules, and that mapping this evolutionary distance was a robust method to identify bacteria at the level of species, led to a new branch of study: metataxonomics. The philosophy that host-associated bacteria live in organized, cooperating communities led to advances in studying the entire collection of microbial genes in an ecosystem: the field of metagenomics. Here, we review the philosophical, technological, and methodological advances that led to the development of DNA sequencing as a tool to interrogate microbial communities and how this contributed to advancing our understanding of periodontal health and disease.

1.1 | Why sequence?

Our knowledge of the role played by bacteria in the etiology of periodontal diseases has co-evolved with the development of techniques and technologies for bacterial identification and characterization. The earliest evidence that dental plaque contains bacteria came from the light microscopy-based observations of Antonie van Leeuwenhoek over 300 years ago.2 He called them “animalcules” and drew what we today recognize as fairly accurate representations of cocci, bacilli, and fusiform bacteria, and even some motile and gliding bacteria. The work of Robert Koch, Fannie Hesse, and Julius Petri enabled isolating pure cultures of bacteria.3 These methods, when applied to characterizations of periodontal microbiota, propelled the discovery of cultivable species associated with gingivitis and periodontitis, as well as periodontal health. Work by Noguchi,4 McIntosh et al,5 Socransky et al,6 and Rosebury and Reynolds7 in developing anaerobic cultivation methods quickly led to the discovery of many anaerobic bacteria associated with periodontal disease. Large-scale cultivation studies by Moore and Moore8 began the age of “We know what we can grow”. Using 51 000 isolates obtained from cases of periodontal health, adult gingivitis, and periodontitis in circumpubertal and adult individuals, certain patterns of bacterial colonization were identified.8 Actinomyces and streptococci played a primary role in colonization of the tooth surface, and Fusobacterium nucleatum emerged as a principal and frequent initiator of gingival

© 2020 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd

Periodontology 2000. 2020;00:1–16.

wileyonlinelibrary.com/journal/prd

inflammation. These studies identified 28 nonspirochetal species, five treponemal species, and at least one Mycoplasma species as important initiators of destructive periodontitis. These species would later form the basis of the DNA-DNA checkerboard (see below). However, when results from microscopy-based methods were correlated with cultivation-based approaches, it quickly became apparent that a large fraction of organisms in the environment, including the mouth, were uncultivated. Staley and Konopka1 called this the “great plate count anomaly”.

As we moved away from the “single culprit” (eg, tuberculosis, typhoid, cholera, malaria) to the “multiple culprit” model of disease causation, “similarity” between organisms became an important method of identifying potential pathogens. To do this, microbiologists of the late 1800s and early 1900s adopted principles that were being used to classify eukaryotes.9 However, this was highly ineffective; unlike prokaryotes, eukaryotes such as plants and animals are rich in morphologic and behavioral detail that can be used to place them on the phylogenetic tree. One classic example is the misclassification of archaea as bacteria.10 These ancient organisms are morphologically similar to bacteria and were known as archaebacteria based on cultivation and microscopic characterization. But it is now established that these organisms are genetically more similar to eukaryotes than to bacteria and have been placed in a domain of their own.

Pioneering work by Zuckerkandl11 demonstrated that genetic material (DNA, RNA, polypeptides) can be used to identify bacteria with a much greater degree of accuracy than phenotypic characteristics. According to the theory of “emergent evolution”, the first living cell arose once a certain level of protein organization had been achieved in the “primordial soup”.9 As the cell grew more organized, and developed more specialized functions, the story of its evolution was captured in certain molecules within its organization. Thus, the modern cell is a record of its own evolutionary history. In this regard, the DNA molecule is considered the first nucleic acid polymer to evolve, as it carries both the memory of and the information needed to create a protein. Evolution from the first DNA-containing cell to the modern-day cell occurred through a series of mutations, each of which had an incremental impact on cell physiology. The greater the number of mutations in the DNA, the greater the changes to cell organization and metabolism. Therefore, two organisms that share the same evolutionary history will have more similarities in their DNA, and therefore in their properties and behavioral attributes. Zuckerkandl12 called these molecules semantides and set forth the following arguments to support the use of semantophoretic molecules to assign phylogenetic relationships:

1. DNA is a linear arrangement of four nucleotides (adenine, cytosine, guanine, and thymine), and the amino acids that they encode can exist in one of 20 states. This allows for objective comparisons to be made between the sequences of two organisms.
2. Evolutionary changes that impact the organization and function of a cell typically occur through substitution of one amino acid in every 50-300 or, because each amino acid is encoded by a nucleotide triplet, one change in every 150-900 nucleotides. Therefore, DNA is an excellent repository of not only evolutionary memory, but also the drivers of that evolutionary change.

And thus began an era of molecular typing methodologies. Early techniques largely used restriction enzymes to digest DNA and compared the resulting banding patterns as measures of genetic relatedness.13-16 Plasmid DNA digests, restriction fragment length polymorphism and terminal restriction fragment length polymorphism, pulsed-field gel electrophoresis, and ribotyping are all examples of methods based on restriction enzyme digestion.17 When these methods were used to answer questions about periodontal pathogens, several important lessons were learnt:

1. Bacterial species demonstrate a variable amount of heterogeneity in their DNA content. This led to the identification of “clonal types” based on differences in band patterns generated on a gel following restriction enzyme digestion.18 For example, Aggregatibacter actinomycetemcomitans demonstrated limited genetic heterogeneity, with no more than two clonal types in any one patient, while F. nucleatum demonstrated 5-18 clonal types, with different clonal types emerging following successful treatment of periodontal disease.
2. Clonal types of certain, but not all, bacterial species can be vertically transmitted from mother to child; however, a similar pattern is not observable with fathers. An organism that has been very well studied in this regard is Streptococcus mutans.
3. Even though bacteria demonstrate prolific capabilities for horizontal gene transmission, they exist as discrete taxonomic entities, and characterization of genetic diversity within species can provide information regarding their role in pathogenesis.

Another popular method was DNA-DNA hybridization (popularly known as checkerboard), in which digoxigenin-labeled single-stranded DNA from cultivated species was immobilized on a gel and an environmental sample was hybridized to it.19 Because of the need to obtain genomic DNA, this methodology was restricted to cultivated species, thereby further perpetuating the “We know what we can grow” paradigm. The 33 species (28 nonspirochetal species and five treponemes, see above) that had been identified through culturing were the ideal candidates for this method. Over 350 publications resulted from this approach, and we learnt much about the abundances of these selected species in health and disease, as well as shifts in their abundances following surgical and nonsurgical interventions. However, these individual species are part of complex communities, and their role in disease causation or perpetuation can only be fully understood when studied in an ecological context. This was quickly borne out when their ability to act as diagnostic or prognostic markers of periodontal diseases was tested and yielded no consistent results.

It soon became apparent that the single-culprit or multiple-culprit model of pathogenesis did not explain the variation in disease incidence, severity, or progression and that cultivation, microscopy,

FIGURE 1   First-generation DNA sequencing technologies. Example DNA to be sequenced (A) is illustrated undergoing either Sanger
(B) or Maxam-Gilbert (C) sequencing. (B) Sanger's “chain-termination” sequencing. Radio- or fluorescently labeled dideoxynucleotides of
a given type—which once incorporated, prevent further extension—are included in DNA polymerization reactions at low concentrations
(primed off a 5′ sequence; not shown). Therefore in each of the four reactions, sequence fragments are generated with 3′ truncations as a
dideoxynucleotide is randomly incorporated at a particular instance of that base (underlined 3′ terminal characters). (C) Maxam and Gilbert's
“chemical sequencing” method. DNA must first be labeled, typically by inclusion of radioactive 32P in its 5′ phosphate moiety (shown
here by Ⓟ ). Different chemical treatments are then used to selectively remove the base from a small proportion of DNA sites. Hydrazine
removes bases from pyrimidines (cytosine and thymine), while hydrazine in the presence of high salt concentrations can only remove those
from cytosine. Acid can then be used to remove the bases from purines (adenine and guanine), with dimethyl sulfate being used to attack
guanines (although adenine will also be affected to a much lesser extent). Piperidine is then used to cleave the phosphodiester backbone
at the abasic site, yielding fragments of variable length. (D) Fragments generated from either methodology can then be visualized via
electrophoresis on a high-resolution polyacrylamide gel: sequences are then inferred by reading “up” the gel, as the shorter DNA fragments
migrate fastest. In Sanger sequencing (left) the sequence is inferred by finding the lane in which the band is present for a given site, as the
3′ terminating labeled dideoxynucleotide corresponds to the base at that position. Maxam-Gilbert sequencing27 requires a small additional
logical step: Ts and As can be directly inferred from a band in the pyrimidine or purine lanes, respectively, while G and C are indicated by the
presence of dual bands in the G and A + G lanes, or C and C + T lanes, respectively. (Reused with permission from reference 25)
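
The gel-reading logic described for Sanger sequencing in panel (D) can be illustrated with a short Python sketch. This is only a toy model under simplifying assumptions, not a description of any real base-calling software: every position is assumed to yield exactly one terminated fragment, and the function names and the example strand are hypothetical.

```python
def sanger_fragments(synthesized: str) -> dict[str, list[int]]:
    """Simulate the four chain-termination reactions (idealized).

    For each base, return the lengths of the truncated fragments that end
    in the corresponding dideoxynucleotide, one fragment per occurrence.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        # A fragment of length `position` terminates with this base.
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read "up" the gel: the shortest fragments migrate fastest, so
    ordering the bands by fragment length recovers the base order."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


if __name__ == "__main__":
    strand = "TCGATGCA"  # hypothetical newly synthesized strand
    lanes = sanger_fragments(strand)
    print(lanes)
    print(read_gel(lanes))
```

Each lane holds the fragment lengths for one terminator, which mirrors the four lanes of the gel; sorting across lanes is the computational equivalent of reading bands from the bottom of the gel upward.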

FIGURE 2   Second-generation DNA sequencing parallelized amplification. (A) DNA molecules being clonally amplified in an emulsion
PCR. Adapter ligation and PCR produces DNA libraries with appropriate 5′ and 3′ ends, which can then be made single-stranded and
immobilized onto individual suitably oligonucleotide-tagged microbeads. Bead-DNA conjugates can then be emulsified using aqueous
amplification reagents in oil, ideally producing emulsion droplets containing only one bead (illustrated in the two leftmost droplets, with
different molecules indicated in different colors). Clonal amplification then occurs during the emulsion PCR as each template DNA is
physically separate from all others, with daughter molecules remaining bound to the microbeads. This is the conceptual basis underlying
sequencing in 454, Ion Torrent, and polony sequencing protocols. (B) Bridge amplification to produce clusters of clonal DNA populations in a
planar solid-phase PCR reaction, as occurs in Solexa/Illumina sequencing. Single-stranded DNA with terminating sequences complementary
to the two lawn-oligos will anneal when washed over the flow-cell, and during isothermal PCR will replicate in a confined area, bending
over to prime at neighboring sites, producing a local cluster of identical molecules. (C) and (D) demonstrate how these two different forms
of clonally amplified sequences can then be read in a highly parallelized manner: emulsion PCR-produced microbeads can be washed over a
picotiter plate, containing wells large enough to fit only one bead (C). DNA polymerase can then be added to the wells, and each nucleotide
can be washed over in turn, and deoxynucleotide incorporation monitored (eg, via pyrophosphate or hydrogen ion release). Flow-cell–bound
clusters produced via bridge amplification (D) can be visualized by detecting fluorescent reversible-terminator nucleotides at the ends of a
proceeding extension reaction, requiring cycle-by-cycle measurements and removal of terminators. (Reused with permission from reference 25)
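
The flow-based read-out described for panel (C), in which each nucleotide is washed over the wells in turn and incorporation is monitored, can be sketched as follows. This is an idealized illustration: it assumes that the signal in each flow equals the number of bases incorporated (real signals are noisy, especially for long homopolymers), the flow order `TACG` is an assumption chosen for the example, and the function names are hypothetical.

```python
def flowgram(read: str, flow_order: str = "TACG", max_flows: int = 400) -> list[int]:
    """Return idealized per-flow incorporation counts.

    `read` is the strand being synthesized (A, C, G, T only). In each flow
    one nucleotide is offered; the polymerase incorporates it as many times
    as that base occurs consecutively at the current position.
    """
    signals: list[int] = []
    position = 0
    flow = 0
    while position < len(read) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while position < len(read) and read[position] == nucleotide:
            count += 1
            position += 1
        signals.append(count)  # 0 means nothing was incorporated in this flow
        flow += 1
    return signals


def decode(signals: list[int], flow_order: str = "TACG") -> str:
    """Reconstruct the read from the per-flow counts."""
    return "".join(
        flow_order[i % len(flow_order)] * count for i, count in enumerate(signals)
    )


if __name__ == "__main__":
    signals = flowgram("TTACGGG")
    print(signals)
    print(decode(signals))
```

Because all wells see the same flow at the same time, the same loop runs in parallel for every bead on the plate, which is where the throughput gain of this approach comes from.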

or targeted DNA-based approaches such as PCR or checkerboard were inadequate for capturing the richness and diversity of the oral microbiome. One of the watershed moments in oral microbiological research was the discovery that 60% of oral bacteria were as of yet uncultivated, and that 33% of oral bacterial species had been misclassified using phenotypic characterization methods.20 Hence began the era of sequencing in oral microbiology.

2 | HISTORY OF SEQUENCING

Although Watson and Crick21 deduced the double-helix structure of DNA using X-ray crystallography in 1953, the order in which nucleotides were present in DNA could not be discerned until 1965, when Holley et al22 decoded the sequence of alanine transfer RNA from Saccharomyces cerevisiae. At the same time, Sanger et al23 used two-dimensional fractionation to develop a collection of ribosomal and transfer RNA sequences from various organisms. Wu and Taylor's24 discovery that radioactive nucleotides could be incorporated into DNA using DNA polymerase enabled the order (or sequence) of nucleotides in a DNA fragment to be inferred (reviewed by Heather and Chain25). Sequencing was revolutionized in 1977 when Sanger et al26 developed the chain-termination (or dideoxy) technique. By incorporating radiolabeled chemical analogs of deoxynucleotides into the reaction and identifying which one was incorporated into the DNA strand, one could “read” the sequence of a gene (Figure 1). This “first-generation” sequencing technique underwent another innovation: a method to measure pyrophosphate

release during incorporation of deoxynucleotides was discovered, allowing for multiple sequences to be analyzed in parallel (Figure 2).25 This “second-generation” sequencing technique, also known as “pyrosequencing”, was soon replaced by a method in which oligonucleotides are immobilized to a flow-cell surface, and barcoded DNA is flowed over it (Figure 2). This method, also known as “next generation” or “bridge sequencing”, allowed for massively parallel sequencing of multiple samples (HiSeq and MiSeq are examples of sequencing platforms that use this method). Sequence-based characterizations of microbial communities can be achieved through one of two techniques: amplicon sequencing or whole-genome shotgun sequencing.

Amplicon sequencing refers to a method by which specific genes (or regions within genes) are sequenced and compared with a database for bacterial identification. Certain characteristics are central to the success of amplicon sequencing methodology:

1. The gene is a viable “molecular clock”.9 A molecular clock is any molecule whose sequence has undergone random changes over certain periods of time. By comparing two or more species carrying the same molecule, it becomes possible to measure when their DNA began to diverge. Changes in sequence are computed as a product of the rate at which the mutation occurs in a population, and the time over which these changes have occurred.28
2. The gene is ubiquitous. In order to make comparisons between different species, it is important that all species possess this gene.
3. The gene carries out a sufficiently important cellular function that its sequence is conserved.

Historically, several molecules have emerged as molecular clocks, for example, the genes encoding elongation and initiation factors for translation, RNA polymerase subunits, ATPase subunits, DNA gyrases, recA, heat-shock proteins, cytochrome c, and ribosomal RNA genes. However, the 16S ribosomal RNA gene, which encodes the RNA component of the small subunit (30S) of the ribosome,29 became the ideal candidate because

• it is present in all organisms
• the ribosome is a central organelle in cellular function, ensuring that the gene is conserved
• mutations occur in different parts of the gene at different rates, enabling us to compute phylogenetic distances accurately
• it is a large gene (~1600 base pairs)
• there are fewer secondary structures (about 45 helical loops) in the RNA, making it easy to sequence
• there are swathes of gene sequences that are conserved, making it easy to design universal primers, while the nine hypervariable regions (designated V1-V9) are documents of the phylogenetic distances between species.

Comparison of 16S sequences led to the creation of the phylogenetic tree of life,28 which demonstrated three distinct domains: Eukarya, Archaea, and Bacteria. As sequencing technology improved, it became possible to develop a collection of unambiguous, full-length sequences, which improved the ability to identify species using these genes. Curated sequence banks such as the Human Oral Microbiome Database, SILVA, Greengenes, and the Ribosomal Database Project are some of the reference databases in present use.30

FIGURE 3   Third-generation DNA sequencing nucleotide detection. (A) Nucleotide detection in a zero-mode waveguide, as featured
in PacBio sequencers. DNA polymerase molecules are attached to the bottom of each zero-mode waveguide (*), and target DNA and
fluorescent nucleotides are added. As the diameter is narrower than the excitation light's wavelength, illumination rapidly decays traveling
up the zero-mode waveguide: nucleotides being incorporated during polymerization at the base of the zero-mode waveguide provide real-
time bursts of fluorescent signal, without undue interference from other labeled deoxynucleotides (dNTPs) in solution. (B) Nanopore DNA
sequencing as employed in Oxford Nanopore Technologies' MinION sequencer. Double-stranded DNA gets denatured by a processive
enzyme (†), which ratchets one of the strands through a biological nanopore (‡) embedded in a synthetic membrane, across which a voltage
is applied. As the single-stranded DNA passes through the nanopore the different bases prevent ionic flow in a distinctive manner, allowing
the sequence of the molecule to be inferred by monitoring the current at each channel. (Reused with permission from reference 25)

2.1 | What 16S sequencing taught us about the oral microbiome in healthy individuals

Ten years after Kroes et al20 demonstrated that 48% of the oral microbiome was uncharacterized, and that 37% of cultivated oral species had been misidentified using phenotypic characterizations, the Human Microbiome Project was undertaken.31,32 This effort obtained 16S sequences from seven oral (buccal mucosa, hard palate, keratinized gingiva, saliva, supra- and subgingival plaque, and tongue dorsum) and two oropharyngeal sites (throat and palatine tonsils) of 182 individuals aged 18-40 years. Several studies were made possible as a result of technological advancements in high-throughput deep sequencing as well as the development of bioinformatic pipelines for sequence analysis. From all these studies, we learnt the following:

1. The oral microbiome is second only to the gastrointestinal tract in species richness and diversity. However, a “core” group of organisms can be identified in the oral samples of 95% of individuals.33,34 The number of “core species” varies from 22 in saliva to 11 in supra- and subgingival plaque to seven on the keratinized gingiva.33 However, the abundance of these “core species” varies widely among individuals. Even though there are five distinct colonization niches in the oral cavity, certain species (belonging to the genera Fusobacterium, Streptococcus, Veillonella, Granulicatella, and Gemella) are shared by all these niches. This level of species uniformity is the greatest in the oral cavity when compared with other body sites. On the other hand, certain genera (notably Bacteroides, Prevotella, Corynebacterium, Fusobacterium, Pasteurella, and Neisseria) contain species that are highly related phylogenetically yet demonstrate distinct predilections for specific oral sites. For example, Corynebacterium matruchotii was identified in supragingival plaque of most individuals, while a close relative, Corynebacterium argentoratense, was most frequently detected in saliva and the hard palate.
2. The oral microbiome plays an important role in vascular homeostasis, and the entero-salivary nitrate metabolic cycle is one method by which it mediates its effects. Blood vessel diameter, especially in the peripheral circulatory system, is controlled by nitric oxide, which is produced in the vessel wall by endothelial nitric oxide synthase. Humans do not have the capacity to produce nitric oxide from dietary nitrates and are entirely dependent on their bacterial communities for this functionality. Certain oral commensals are known to reduce salivary nitrates to nitrite.35 Moreover, using a daily antibacterial mouthwash results in a 90% decrease in salivary nitrite levels, accompanied by a mean increase of 3.5 mmHg in blood pressure.36 Together, these studies attest to the significant role played by oral bacteria in controlling peripheral blood flow.
3. Populations from different geographic locations demonstrate distinct microbial profiles.37-40 Several explanations have been proposed for these differences, including easier person-to-person bacterial transmission in densely populated areas, hunter-gatherer vs agrarian diet, and genotypic and ethnic differences between cohorts. The geography of the oral cavity itself appears to be a driver of microbial assemblages. Both tooth location (molar vs incisor) and location on the tooth (buccal vs lingual) are discriminants of community profiles, while mucosal biofilms segregate along salivary flow gradients.41-44
4. Within the same geographic location, ethnicity is a determinant of the oral microbiome, with the biggest differences between African Americans and Caucasians.45 In fact, the difference between these two racial/ethnic groups living in the USA is greater than that between either group and ethnicities from China or Mexico. This pattern of microbial dissimilarity follows the differing prevalence of periodontal disease among these racial/ethnic groups.46
5. There is persuasive evidence that the oral microbiome is a heritable feature, much like our genome. A recent study demonstrated that the microbiome of predentate infants is a complete subset of that of their mothers, and that 85% of this predentate microbiome is retained throughout adult life.47 Newer studies are emerging to show that children of parents with aggressive periodontitis acquire a dysbiotic microbiome very early in life, and that this cannot be altered through professional oral prophylaxis. However, studies of twins reared together and apart demonstrate that both genetics and environment can influence oral bacterial acquisition and colonization.48,49
6. Anthropogenic behaviors play important roles in shaping the periodontal microbiome. While the subgingival microbiome is stable against small perturbations in the ecosystem such as food intake (also known as pulses), sustained insults to the ecosystem (known as pressed perturbations), for example, smoking,50-54 electronic cigarette use,55 obesity,56 and alcohol use,57 are all drivers of altered community assemblages. The response of the subgingival microbiome is specific to each type of exposure and has the capacity to customize the immuno-inflammatory machinery of the host accordingly.
7. Systemic diseases and conditions play important roles in shifting the subgingival microbial community structure.58 Among these, there is strong evidence that pregnancy,59 rheumatoid arthritis,60 diabetes,61,62 chronic renal failure,63,64 pancreatic cancer,65 and Fanconi's anemia66 all contribute to shifting species richness, evenness, and diversity in unique ways. When anthropogenic influences are superimposed on these systemic diseases, the microbiome shifts in unpredictable ways.
8. Early studies using 16S sequencing reported that the periodontal microbiome demonstrates a high degree of temporal stability over a 2-year period in the absence of disease or other major perturbations.67 However, several studies document temporal fluctuations in the salivary68,69 and tongue70 communities over short or long observation periods. Together, these data suggest that there is significant interpersonal as well as location-dependent variability in the stability of the oral microbiome.71

2.2 | What 16S sequencing taught us about the a microbial community from a specific environment was defined
subgingival microbiome in periodontal disease “metagenomics” by Handelsman et al.88
The foundation for metagenomic exploration of human microbial
16S amplicon sequencing provided us with an unprecedented view ecosystems was laid by pioneering studies on soil and ocean ecosys-
of the richness of the oral microbiome and enabled the discovery of tems. 25,89 Venter et al90 carried out the first whole-genome shotgun
organisms that had never been cultivated or had been misidentified sequencing in the ocean ecosystem and identified 1.2 million un-
during cultivation-dependent methods.72-74 It was initially estimated known genes and 148 previously unidentified bacterial phylotypes
that 60% of the subgingival microbiome74 was as of yet uncultivated, (bacteria whose presence is only known by their DNA sequence).
however, this was later revised to 45%.73 The perspective provided Once community DNA has been sheared and the fragments
by Sanger sequencing, and later by next-generation sequencing, sequenced, one of two approaches is possible: genome-centric or
taught us the following: gene-centric. In the first approach, the fragments are arranged on a
scaffold to recreate complete bacterial genomes, and thereby esti-
1. Oral dysbiosis differs from that of the gut, where disease is mate the microbial content of the community. This is known as as-
accompanied by a reduction in the number of indigenous com- sembly-driven metagenomics and it investigates the metabolic and
mensals and an increase in density of one or more species.75 By lifestyle capabilities of an ecosystem in the context of the organ-
contrast, the subgingival microbiome of a periodontally healthy isms that inhabit them. However, this approach requires substantial
individual contains 45-90 species,72,76 whose abundances do not sequencing depth to allow for meaningful genome assembly. In the
vary significantly following initial colonization. About one-third second approach, the information enclosed in the fragmented se-
of these species are influenced by the universal characteristics quences can be used to deduce the metabolic and other functional
of the subgingival environment and do not exhibit significant interpersonal variability.33 By contrast, the defining elements of established periodontitis are a microbiome that exhibits high alpha and beta diversity (a greater number of species that demonstrate a wide range of abundances)77 and significant heterogeneity between individuals78 and within disease sites in the same mouth.79 This microbial heterogeneity poses a challenge to identifying microbial markers of disease onset or progression.
2. A highly diverse and species-rich microbiome is associated with aggressive periodontitis,80 especially localized aggressive periodontitis. The microbiome not only includes bacterial species, but also members of the domain Archaea (Methanobrevibacter oralis, Methanobacterium curvum/congolense, and Methanosarcina mazeii) and viruses.81,82
3. The peri-implant sulcus provides a unique colonization niche for oral bacteria, and the peri-implant microbiome is distinct from the adjoining periodontal ecosystem.83-87 Peri-implantitis and peri-implant mucositis are microbiologically similar entities,53 giving rise to the paradigm that peri-implant mucositis is a pivotal event in the progression to irreversible destruction of peri-implant tissues.

capabilities of a community, and two samples can be compared either at the level of gene families, operons, or cellular processes. This approach, known as comparative metagenomics, allows for comparison of not only the functional capabilities of the two communities but also the taxonomic composition of these communities, as taxonomy can be deduced from the 16S sequences in the metagenome or from other marker genes.

In 2006, Gill et al91 characterized the functionalities encoded in the gut metagenome of healthy humans for the first time. Metabolic reconstruction of the gut microbiomes of two unrelated healthy adults showed significant enrichment for genes involved in several metabolic pathways that had not been previously reported: the metabolism of xenobiotics, glycans, and amino acids, the production of methane, and the biosynthesis of vitamins and isoprenoids through the 2-methyl-D-erythritol 4-phosphate pathway. In another landmark study by Turnbaugh et al,92 the host phenotype (obesity) correlated with microbial genes and specific metabolic pathways. The obesity-associated gut microbiome showed an increased capacity to promote fat deposition. This study not only highlighted the merit of comparative metagenomic studies but also indicated a way in which in silico observations could be used to design translatable experimental models and in turn determine the mechanism of human diseases.
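
The comparison of two samples at the level of gene families, as described for comparative metagenomics above, can be illustrated with a minimal Python sketch. It normalizes per-family counts to relative abundances and ranks families by log2 fold change between the two samples. The gene-family names, the counts, and the pseudocount are hypothetical values chosen for illustration and are not taken from any of the studies cited here; a real analysis would add replication and statistical testing.

```python
import math


def relative_abundance(counts):
    """Scale raw counts per gene family so that they sum to 1 within a sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no counts")
    return {family: n / total for family, n in counts.items()}


def log2_fold_changes(sample_a, sample_b, pseudocount=1e-6):
    """Return log2(abundance in B / abundance in A) for every gene family.

    The pseudocount is an assumption of this sketch; it keeps the ratio
    finite when a gene family is absent from one of the two samples.
    """
    rel_a = relative_abundance(sample_a)
    rel_b = relative_abundance(sample_b)
    families = sorted(set(rel_a) | set(rel_b))
    return {
        family: math.log2(
            (rel_b.get(family, 0.0) + pseudocount)
            / (rel_a.get(family, 0.0) + pseudocount)
        )
        for family in families
    }


# Hypothetical read counts per gene family for two samples (illustrative only).
sample_a = {"glycan_metabolism": 120, "methanogenesis": 30, "vitamin_biosynthesis": 50}
sample_b = {"glycan_metabolism": 60, "methanogenesis": 90, "vitamin_biosynthesis": 150}

changes = log2_fold_changes(sample_a, sample_b)
# List gene families from most enriched in sample B to most depleted.
for family, change in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{family}: {change:+.2f}")
```

Positive values mark gene families that are relatively more abundant in the second sample. The same calculation applies unchanged if the dictionary keys are operons or cellular processes rather than gene families.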

3 | EVOLUTION OF METAGENOMICS

From an ecological perspective, 16S amplicon sequencing answers the question: Who is out there? However, ribosomal genes contribute to <0.1% of the total genome, and hence, short-read amplicon sequencing techniques do not resolve the question: What are these communities capable of doing? This question is best answered by determining the proteins encoded by the microbial community and differentiating samples based on microbial functions rather than type of species. The process of collecting genes from members of

3.1 | What comparative metagenomics taught us about periodontal disease

It was not until 2012 that comparative metagenomic approaches entered periodontal literature. A pilot study of 15 subgingival plaque samples from two patients with periodontitis and three healthy controls by Liu et al,93 using the Illumina sequencing platform (2 × 76 base pairs paired end), reported taxonomic and functional changes associated with periodontitis for the first time. Despite
limitations such as high levels (~90%) of contamination with human DNA, and lower sequencing coverage, the authors were able to construct several oral microbes, and preliminarily characterize some system-level differences between periodontal health and disease. Of note, they reported that the subgingival microbiome in periodontitis was characterized by the presence of previously uncultivated TM7s and enrichment in several degradation pathways and virulence factors.

In 2013, Wang et al94 used a deeper sequencing approach (Illumina HiSeq 2 × 100 base pairs) for their comparative analysis of 16 subgingival plaque samples from six healthy and five subjects with periodontitis. They established a core-disease microbiota, identified almost 30 novel and previously unidentified species in the oral microbiome, and reported the contribution of phages to disease in this ecosystem. Additionally, they reported an overrepresentation of several functional categories traditionally attributed to virulence and pathogenicity such as bacterial chemotaxis, lipopolysaccharide, flagellar assembly, and glycan biosynthesis.

Several microbial genes associated with virulence factors and functional components related to amino acid metabolism, glycosaminoglycan, and pyrimidine degradation were enriched in periodontitis in a targeted functional gene array approach with ~135 000 probes by Li et al.95 In a longitudinal study, Shi et al96 established shifts in the functional potential of the subgingival microbiome in response to treatment. Distinct shifts in taxonomic composition and an increase in co-occurrence between bacterial members occurred in the disease sites, but not in the resolved sites. Along with the levels of anaerobes such as Campylobacter, Treponema, Tannerella, and Porphyromonas, there were associated shifts in functions such as chemotaxis and flagellar motility, which were associated with disease sites (n = 12). Also, specific functions, for example, lysine degradation, were overrepresented in disease sites, while lysine biosynthesis was overrepresented in shallow sites. When three patients were followed up for a third visit (baseline with disease, after scaling and root planing, maintenance visit) after therapy, the authors were able to use the state of the microbiome at the second visit to predict the clinical trajectory at the third visit with an accuracy of 81.1%.

The first comprehensive site-specific functional characterization of the oral microbiome encompassing all the members of the ecosystem, including viral, fungal, and archaeal species in periodontal health and disease, in a larger sample size with robust statistical comparisons was published by our group in 2016.79 This study was the first undertaking in the field of oral microbiome research to: (a) establish a core metabolic network associated with periodontal health and disease; (b) identify the discriminatory functional markers; and (c) determine contributing members of the functional networks. Our findings of upregulated virulence functions such as lipopolysaccharide synthesis, iron metabolism, and bacterial chemotaxis in disease, upheld the definition of the “disease microbiome” as described by other groups using whole-genome shotgun sequencing approaches. The disease state was also associated with an increase in abundance of archaeal members that correlated with a corresponding increase in anaerobic/fermentation pathways and increased functional inter-kingdom cooperativity. Interestingly, we also identified that the taxonomic and functional profiles of sites that were clinically healthy in patients diagnosed with periodontitis were similar to active disease sites. This finding indicated a global dysbiosis in the ecosystem and not a site-specific dysbiosis, as previously believed, thus shifting the paradigm in periodontal microbial ecology. With significant microbial heterogeneity, a predominance of specialist species, and the presence of novel functions in a disease-associated microbiome, we were able to identify 30 genes encoding 14 distinct functions as discriminators of periodontitis.

3.2 | Strain-resolved metagenomics and what it can teach us about periodontal disease

Metagenomics has traditionally been used to provide a global view of microbial communities in the context of their environments. Recent efforts, however, have been aimed at using this approach to examine gene content and the functional potential of bacterial strains.97 Microbial strains are low-level taxonomic units that maintain the same phenotype under different exposures/conditions; however, they show evident genotypic heterogeneity. In other words, the same microbial species can have substantially different strains. The argument made in favor of this approach is that bacterial strains are the building blocks of any microbiome, and that accessory genes present in only specific strains can encode important functions that allow the organism to survive and thrive in a particular environment. This strain-level heterogeneity can explain the differences in the virulence of pathogens.98 Identifying strains from metagenomic data can be performed by mapping metagenomic sequence reads against reference genomes and obtaining a catalog of single-nucleotide variant patterns or marker genes across different samples or time points. Several computational tools are available to facilitate this analysis. Meta-multi locus sequence typing99 identifies a group of hypervariable loci in a species and creates a sequence type for each strain. By comparing the metagenome under consideration with a database of multi locus sequence types, not only can the data be resolved at the strain level, but new sequence types which diverge from those in the database can reveal the presence of novel strains. StrainPhlAn100 extends this approach to species for which at least one reference genome is available. Pangenome-based phylogenomic analysis,101 on the other hand, uses the presence/absence of genes to create a strain-specific gene repertoire. Lineage102 and ConStrains103 harness the potential of single nucleotide polymorphisms within universal genes to identify strain-level differences. In another approach, assembled contiguous sequences can be grouped into “bins”, representing distinct organisms, known as metagenome-assembled genomes. De novo Extraction of Strains from MetAgeNomes104 uses bins derived from de novo co-assembly of multiple samples to resolve sequence variants within core genes and designate them as haplotypes and strains.

One of the earliest studies in the human microbiome to document strain-level genomic variations along with
genotypic-phenotypic interplay was published by Morowitz et al105 in 2011. However, this study only investigated selected taxa (Citrobacter spp.). A comprehensive single-nucleotide variant analysis of 252 samples from 207 individuals from Europe and North America identified individual-specific strains and established temporal stability in the genomic variation patterns in longitudinal samples.106 Consequently, in 2015 and 2016, several groups utilized a marker-species or pan-gene approach to track the strain-level variations in the metagenomes across different samples. However, attaining the accuracy and resolution necessary to isolate genomes and reconstruct variants was still challenging. In 2017, Truong et al107 utilized a novel strain identification approach in more than 1500 metagenomes from subjects in five different continents and demonstrated: (a) subject-specific strain variation; (b) stability of the identified strains in species; and (c) geographic/host factor-specific variation in the microbial strain structure. Strain variation in Bacteroides was the most consistent while Prevotella copri was the most plastic colonizer.

While the literature related to strain-resolved metagenomics is principally in the environmental and gut microbiomes, subject-specific, site-specific, and strain-specific colonization of the most prevalent oral commensal Neisseria has been demonstrated across the different niches in the oropharynx. This work, published by Donati et al108 in 2016, laid the foundations for future pan-genome studies in the oral cavity. In a study of oral, vaginal, and gut microbiome in pregnant women, Goltsman et al109 were able to associate strain variations in the oral microbiome with pregnancy complications such as preeclampsia. Currently, few investigations explore strain resolution in the oral metagenome, and no evidence of metagenomic strain-level variations associated with periodontal health or disease is available.

4 | EVOLUTION OF QUANTITATIVE BIOLOGY

One of the most consequential realizations in the field of molecular biology was that genetic information (whether DNA, RNA, or protein) could be recorded digitally as strings of characters, each representing a single base or amino acid. Once Frederick Sanger determined the sequence of insulin,110 it quickly became clear that computers would be essential in this arena, as multiple sequence comparisons by hand would be impractical and, sure enough, developments in sequencing were necessarily accompanied by advances in software for making use of the resulting data.

The assembly of the human genome took a “shotgun” approach where large insert clones were fragmented and sequenced, with each sequence read assigned per-base quality scores and reassembled by overlap consensus. The most common tools for achieving this were the software packages Phred111 (whose scoring approach is still used today), Phrap,112 Consed,113 and PolyPhred.114,115 Adjacent clones were then aligned and assembled employing a guided approach using a Tiling Path map, with Gigassembler being the primary tool for implementing this approach to assemble the original draft human genome for the Human Genome Project.116

With improved efficiency and automation, and decreased costs of sequencing, large-scale analysis of microbial communities became a reality. The recognition that the information content of ribosomal RNA (particularly the nearly universal 16S subunit), with its mix of highly conserved and hypervariable regions, would provide a means to identify organisms with unprecedented accuracy, provided “powerful new tools to the microbial ecologist”.117 Now two approaches could be pursued. First, databases of known sequences corresponding to organisms could be compiled and searched. Indeed, in 1980 the Approved Lists recorded 1791 valid species. By 2007 that number had increased to 8168,118 and the current release of the SILVA small subunit non-redundant 99 database contains 510 984 sequences separated by a 99% identity cutoff.119 Second, novel species could be determined by a process of elimination against such databases and by clustering with self-similar sequences from environmental samples. Both approaches required new bioinformatic tools for computational search and alignment techniques.

One of the first bioinformatic approaches developed for sequence similarity was the Needleman-Wunsch global alignment technique.120 It is an optimal matching algorithm that assigns a score (taking into consideration matches, mismatches, and gaps) to every possible alignment between two sequences. While, practically, this means that the algorithm is quite slow, it is still used today when exact global alignment is a primary concern. In 1981, the Smith-Waterman algorithm was proposed to perform local sequence alignment.121 As opposed to the global alignment approach, this algorithm considers alignments of all lengths between two sequences. While this approach was still quite slow, it allowed for much wider application given the underlying biology and sequencing approaches. When comparing two sequences, depending on the source and where the sequencing happened to start, a global alignment is much less likely than a partial, or local, alignment. Returning to targeted sequencing of 16S ribosomal RNA, we can see that the combination of sequence starting point and biological variation make local alignment a more reasonable approach.

However, while local alignment provided expanded applicability, the Smith-Waterman algorithm was impractical for the kind of large-scale investigations enabled by expanding databases and improved sequencing technologies. Consequently, it was generally replaced by heuristic-enabled algorithms that were less exacting, but substantially more computationally efficient. The FASTA algorithm122,123 published in 1985 was one of the more popular of this type. In addition to DNA-DNA alignment, it allowed for translated alignments for comparing DNA sequences with protein databases. Soon after, in 1990, a breakthrough algorithm was published: the basic local alignment search tool, known as BLAST,124 which has become one of the most widely used tools in bioinformatics. Its emphasis on speed through its novel heuristic algorithm was vital to enable searches of large databases such as GenBank.125,126 Part of its successful approach was the introduction of “seeding”, where it quickly finds significant short matches (high-scoring
segment pairs) between two sequences aided by evolutionarily guided scoring matrices (eg, BLOcks SUbstitution Matrix127) before continuing to a more exact matching process by extending the seeds to longer ungapped alignments, followed by gapped alignments.128 This screening process allowed it to rapidly discard large numbers of sequences that could not provide a significant match. Furthermore, a significant contributor to its widespread use was making a web-accessible interface available on the National Center for Biotechnology Information website.125,126,129

Following its initial release, the core algorithm has been improved (BLAST+),130 extended to enable large simultaneous query sets (MegaBLAST),131,132 to examine distantly related proteins (PSI-BLAST),133 and has been central processing unit-, graphics processing unit-, and field-programmable gate array-accelerated.134-136 In addition, it has inspired a subsequent explosion of heuristic local alignment algorithms, including Sequence Search and Alignment by Hashing Algorithm,137 miBLAST,138 BLAT,139 Bowtie,140 USEARCH,141 Bowtie 2,142 High Speed Basic Local Alignment Search Tool Nucleotide,128 double index alignment of next-generation sequencing data,143 and Many-against-Many sequence searching.144,145

The wide applicability, speed, and accuracy of these tools have seen them commonly used for read-mapping analysis of whole-genome shotgun sequencing (whole genome sequencing or whole metagenome sequencing) data, where sequenced reads are mapped to general or specialized sequence and protein databases such as National Genetic Sequence Data Base,125,126 the peptidase database,146 Kyoto encyclopedia of genes and genomes,147 Pfam,148,149 Clusters of Orthologous Groups,150,151 Swiss-Prot,152 SEED,153 Reference Sequence Database,154 UniRef,155 UniProt,156 Virulence Factors Data Base,157 evolutionary genealogy of genes: Non-supervised Orthologous Groups,158 Carbohydrate-Active Enzyme database,159 database of carbohydrate-active enzyme (CAZyme) sequence and annotation,160 and Integrated Microbial Genome/Virus.161

While local alignment algorithms have achieved widespread use across molecular biology, targeted sequencing and phylogenetic analysis require algorithms implementing multiple sequence alignment, where three or more sequences (DNA, RNA, protein) are compared simultaneously to determine sequence homology and evolutionary relationships. The original approach is the progressive multiple sequence alignment method first introduced by Feng and Doolittle.162 This method started with pairwise alignment, using the Needleman-Wunsch algorithm mentioned earlier, to align closely related sequences first, then continued to add more distantly related sequences as the alignment grows. The CLUSTAL W algorithm followed, including additional heuristics and distance-based clustering to improve alignments and speed, and its family of algorithms are still the most popular of this type (eg, multiple alignment using fast Fourier transform163). The software Multiple Sequence Comparison by Log-Expectation164 represents further improvement of the current generation by iteratively reassessing the alignment of previously added sequences to avoid the problem of poor initial alignment choice in progressive alignment. Another variation in these algorithms involves the use of hidden Markov models to construct probabilistic inference models to assign likelihood scores to sets of multiple sequence alignments. Biosequence analysis using profile hidden Markov models165 is the most popular of this variety.

One issue with using local alignment algorithms for amplicon sequencing is that they attempt to find all homologous sequences at a detailed level. While the basic local alignment search tool and similar algorithms were significantly faster than the previous generation of optimal matching algorithms, they were not designed to identify redundancy, precisely the result of targeted sequencing employing PCR amplification. In moderate complexity environments such as the oral cavity, or even high complexity environments such as oceans or soils, the number of unique organisms is orders of magnitude smaller than the number of sequences which can be derived from those environments, especially with the current generation of sequencing machines. Thus, sequence clustering algorithms were employed or developed to quickly reduce the number of sequences considered for matching to databases with local alignment algorithms. One of the first approaches was CD-HIT,166 which was created to reduce the size of protein databases by eliminating highly homologous sequences. It selected the longest sequence and proceeded through the rest of the data in decreasing order of length, classifying sequences as redundant or representative (new cluster), employing word indexing and counting tables to rapidly accomplish this task. A parallelized version was released in 2012166 to handle the substantial increase in sequencing depth provided by high-throughput sequencing technologies. UCLUST141 represented an improvement in speed and sensitivity over CD-HIT by exploiting the fact that “similar sequences tend to have short words in common”. By sorting sequences in decreasing order of common words, substantial improvements could be achieved.

These tools were highly popular in 16S sequencing analysis, as evidenced by their inclusion in the software pipeline for microbial ecology, Quantitative Insights Into Microbial Ecology.167 The other most widely used pipeline, mothur,168 employed the PHYLogeny Inference Package169 to create a pairwise distance matrix between sequences, followed by the unweighted-pair group method using average linkage clustering to group the sequences, resulting in a similar end product to CD-HIT and UCLUST. More recently, these relatively general clustering algorithms have been replaced by highly targeted algorithms focused on the characteristics of the 16S ribosomal RNA (or another amplicon sequencing target) gene as well as the sequencing platforms themselves and their error distributions. These algorithms belong to a new class that produce amplicon sequence variants170 as a successor to the operational taxonomic units produced by the previous generation of clustering algorithms. The tools taking this approach include UPARSE,171 Minimum Entropy Decomposition,172 Divisive Amplicon Denoising Algorithm,173 and deblur.174 By focusing on controlling sequencing error, these algorithms can resolve sequences down to single nucleotide variants without relying on arbitrary similarity cutoffs (eg, 97%) and provide
a means for comparing labels across studies without requiring reclustering.

In the realm of 16S sequencing, there are four primary general purpose (nonhabitat-specific) databases used for analysis175: SILVA,176 the Ribosomal Database Project,177 the National Center for Biotechnology Information,178 and greengenes.179 They vary in their data sources, number of included sequences, methods for assigning and resolving taxonomic assignment, covered domains of life, included ribosomal targets, and frequency of updates. SILVA focuses on classifications for bacteria, archaea, and eukarya, curating 16S and 18S sequence data primarily from International Nucleotide Sequence Database Collaboration members. Taxonomic labeling is based on Bergey's Manual of Systematic Bacteriology,180 List of Prokaryotic names with Standing in Nomenclature,181 and the International Society of Protistologists, but is ultimately manually curated.175,176 The Ribosomal Database Project covers bacteria, archaea, and fungi, differing from SILVA by inclusion of the 28S marker. It sources sequence data from GenBank and the European Bioinformatics Institute as well as from direct submissions. As with SILVA, taxonomy in the Ribosomal Database Project is based on Bergey's Manual of Systematic Bacteriology and LPSN, but fungal assignment is conducted through their own efforts.175,177 The National Center for Biotechnology Information database is solely focused on submitted 16S sequence data. Bacterial and archaeal sequence data are available as separate BioProjects for download (PRJNA33175 and PRJNA33317, respectively). For identification purposes, a pre-built BLAST database is also provided.175,178 It utilizes manual curation of naming based on over 150 sources, both general and organism-specific. Its efforts are an ongoing process in conjunction with data submissions, and daily updates are issued. Finally, greengenes is similarly focused on 16S sequences for bacteria and archaea. Taxonomic classification is drawn primarily from National Center for Biotechnology Information in conjunction with de novo phylogenetic tree construction utilizing alignments based on sequence and secondary structure.175

Although there are many general similarities between the databases in their sources for data and naming, the exact sequences included, filtering and chimera-checking, tree creation, and simple human error, all combine to create differences in assignment accuracy (outside of tool choice). Even from a simple naming perspective there are substantial differences. In their recent article, Balvočiūtė and Huson175 attempted mapping each of these databases to each other by simple phylogeny. They found that 63%-90% of taxa at various levels of the taxonomic ranks were unique to one of the four databases. National Center for Biotechnology Information and SILVA shared the most with each other: 60% at the phylum, class, and order ranks, and 10% at the genus and species ranks. However, National Center for Biotechnology Information appeared to be the most inclusive of all the databases. Another recent study published by Park and Won182 utilized a mock community analysis to compare assignment accuracy among SILVA, greengenes, and EzBioCloud183 (no longer publicly available). For each of the databases, they reported up to 20% false-positive assignments at the genus level. They attributed these differences to database size, sequence source (amplicon vs genome assemblies), and sequences with missing taxonomic information.

4.1 | How sequencing analysis pipelines and bioinformatics shaped our understanding of the oral microbiome

The oral microbiome is known to harbor between 600 and 700 species of bacteria, with some estimates as high as 1200.76 Archaeal and viral species are also present, but less well characterized; Ghannoum et al184 reported 101 fungal species. Furthermore, of the known bacterial constituents, ~30% have not yet been cultivated.73 In order to provide comprehensive and reliable references for studies involving the oral cavity, given the issues discussed, efforts have been taken to develop oral-specific databases. Two habitat-specific databases have been created to fill this role: the Human Oral Microbiome Database185 and CORE, a “phylogenetically curated 16S rDNA database of the core oral microbiome”.186

The Human Oral Microbiome Database aims to provide both a resource for taxonomic identification as well as functional characterization of oral samples by providing both 16S and whole genome reference databases. It was originally created to provide a “stable taxonomic structure for the unnamed oral taxa” by introducing a human oral taxon ID, such that each named species and phylotype cluster is assigned an ID and therefore independent of naming. The original database was constructed from 600 16S ribosomal RNA gene libraries and over 35 000 clone sequences. As per the latest release, version 15.2 (2019/09/02), the Human Oral Microbiome Database contains 1015 full-length 16S sequences, 1570 complete and draft genomic sequences, and has been expanded to represent the aero-digestive tract; it is now called the expanded Human Oral Microbiome Database.187 The CORE database was created with a similar objective, and with a focus on correcting classification and naming inconsistencies in oral species found in existing 16S databases. The initial data source was a combination of known named and cultured oral species from GenBank and 20 000 Sanger-generated 16S sequences collected from 200 human subjects, ultimately resulting in 1043 sequences representing 636 species.186 In both cases, the authors reported substantial improvements in oral sample characterization over the general purpose databases discussed above.185-187 As with any database, however, an unavoidable issue is its finite nature. Known oral species mined from the literature are necessarily restricted by the patients and study subjects they were isolated from. With known differences in microbiome composition by ethnicity and geography,45,188 certain study designs may benefit by incorporating multiple databases in an analysis. Indeed, with any single database, it can be difficult to know whether an unidentified sequence is simply missing, represents a novel organism, or is the result of sequencing error. Lourenço et al189 studied the oral and gut microbiomes in individuals with and without periodontal disease. Utilizing both the greengenes and Human Oral Microbiome databases, they reported
the presence of pathogenic species associated with inflammation 11. Zuckerkandl E, Pauling L. Molecules as documents of evolutionary
history. J Theor Biol. 1965;8(2):357-366.
and periodontal tissue destruction in the gut microbiomes of their
12. Zuckerkandl E. On the molecular evolutionary clock. J Mol Evol.
subjects regardless of periodontal health status. Shi et al96 extracted 1987;26(1–2):34-46.
16S sequences from whole-genome shotgun sequencing data and 13. Forney LJ, Zhou X, Brown CJ. Molecular microbial ecology: land of
utilized both the Human Oral Microbiome and CORE databases, as the one-eyed king. Curr Opin Microbiol. 2004;7(3):210-220.
14. Liu WT, Marsh TL, Cheng H, Forney LJ. Characterization of mi-
well as SILVA. To verify these taxonomic results, they constructed
crobial diversity by determining terminal restriction fragment
a customized database of genomic sequences and mapped their length polymorphisms of genes encoding 16S rRNA. Appl Environ
whole-genome shotgun sequencing reads against it. Ultimately, Microbiol. 1997;63(11):4516-4522.
they found the results of each of the methods to be consistent with 15. Montagner F, Gomes BP, Kumar PS. Molecular fingerprinting
each other. Another recent study utilized a similar approach with the reveals the presence of unique communities associated with
paired samples of root canals and acute apical abscesses. J Endod.
stated goal of increasing the resolution in an understudied group:
2010;36(9):1475-1479.
individuals of Arab heritage. Al-Hebshi et al190 combined modified 16. Sakamoto M, Huang Y, Ohnishi M, Umeda M, Ishikawa I, Benno
versions of the Human Oral Microbiome database, the expanded
Human Oral Microbiome database, and Greengenes based on their
previous experiences with low-resolution results using a single
database. As sequencing becomes less and less expensive and more
globally available, databases will certainly improve, but the issues we
have discussed should be kept in mind when designing and executing
future studies.

5 | CONCLUSION

In summary, DNA sequencing methodologies have expanded our
understanding of the oral and periodontal microbial ecosystem
exponentially. While these methods are not without bias or error, they
provide a means to identify previously unknown or unsuspected
species in these niches, explore their contributions to health and
their role in disease etiology, and pave the way for targeted,
precision therapeutics.

How to cite this article: Kumar PS, Dabdoub SM, Ganesan SM.
Probing periodontal microbial dark matter using metataxonomics
and metagenomics. Periodontol 2000. 2020;00:1–16.
https://doi.org/10.1111/prd.12349
