
Microbiological Research 260 (2022) 127023

Contents lists available at ScienceDirect

Microbiological Research
journal homepage: www.elsevier.com/locate/micres

Recovering metagenome-assembled genomes from shotgun metagenomic sequencing data: Methods, applications, challenges, and opportunities
Yunyan Zhou a, b, *, Min Liu a, Jiawen Yang a
a State Key Laboratory of Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang 330045, China
b Institute of Engineering Biology and Health, Collaborative Innovation Center of Yangtze River Delta Region Green Pharmaceuticals, College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, Zhejiang 310014, China

A R T I C L E I N F O

Keywords: Metagenome-assembled genomes; Methods; Applications; Challenges; Opportunities

A B S T R A C T

Reference genomes are essential for analyzing the metabolic and functional potentials of microbiomes. However, microbial genome resources are limited because most microorganisms are difficult to culture. Genome binning is a culture-independent approach that can recover a vast number of microbial genomes from short-read high-throughput shotgun metagenomic sequencing data. In this review, we summarize methods commonly used for reconstructing metagenome-assembled genomes (MAGs) to provide a reference for researchers to choose appropriate software programs among the numerous and complicated tools and pipelines that are available for these analyses. In addition, we discuss application prospects, challenges, and opportunities for recovering MAGs from metagenomic sequencing data.

* Corresponding author at: State Key Laboratory of Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, Nanchang 330045, China.
E-mail addresses: 664280148@qq.com (Y. Zhou), 854768208@qq.com (M. Liu), 383049451@qq.com (J. Yang).

https://doi.org/10.1016/j.micres.2022.127023
Received 18 November 2021; Received in revised form 7 March 2022; Accepted 5 April 2022
Available online 8 April 2022
0944-5013/© 2022 Elsevier GmbH. All rights reserved.

1. Introduction

Microbial communities play critical roles in many ecosystems. For example, gut microbial communities are well known to be involved in host metabolism, immunity, and even behaviors. Reference genomes are essential for analyzing the potential metabolic and functional capacities of microbial populations. However, most microorganisms within natural microbial communities lack available genomes, because many taxa have not been cultivated in laboratories (Hugenholtz, 2002; Fodor et al., 2012), although significant efforts have been made to culture and sequence microbiome genomes (Human Microbiome Jumpstart Reference Strains et al., 2010; Browne et al., 2016; Lagier et al., 2016; Seshadri et al., 2018; Ito et al., 2019; Zou et al., 2019). This lack of information seriously impedes progress in microbiome research. Single-cell genome sequencing allows researchers to obtain microbial genomes without cultivation (Marcy et al., 2007). However, this approach requires special equipment and faces technical challenges, including difficulties in isolating single cells and bias in single-cell genome amplification (Gawad et al., 2016).

Numerous advances in high-throughput sequencing technologies and bioinformatic tools have been made in recent years, enabling researchers to recover numerous microbial genomes from metagenomic data using genomic binning approaches. Genome binning groups sequences from the same species or even the same strain together according to their nucleotide compositions and read coverage from sequencing data of one or multiple samples. These techniques can provide insight into the community structures of microbiomes at the strain level (Tyson et al., 2004; Sharon et al., 2013; Sangwan et al., 2016). Following the first reconstruction of metagenome-assembled genomes (MAGs) from a natural acidophilic biofilm by Tyson et al. (2004), many MAGs have been recovered from diverse samples, including from guts (Wang et al., 2019; Glendinning et al., 2020; Lesker et al., 2020; Chen et al., 2021a; Feng et al., 2021; Levin et al., 2021; Peng et al., 2021), rumens (Hess et al., 2011; Svartstrom et al., 2017; Stewart et al., 2019b), sediments (Anantharaman et al., 2016; Zaremba-Niedzwiedzka et al., 2017), soils (Hultman et al., 2015; Woodcroft et al., 2018), oceans (Tully et al., 2018), and other types of environments (Albertsen et al., 2013; Gullert et al., 2016; Danko et al., 2021). The number of MAGs recovered from human gut communities has increased explosively in recent years (Almeida et al., 2019, 2020; Nayfach et al., 2019). Indeed, the Genomes OnLine Database (GOLD) has collected 180,550 isolate genomes and 18,382 MAGs as of June 2021 (Mukherjee et al., 2019). Likewise, 258,406 bacterial and archaeal genomes have been taxonomically classified within the Genome Taxonomy Database (GTDB,

Release 06-RS202) as of April 2021, including 45,555 bacterial and 2339 archaeal species clusters (Parks et al., 2020). Nevertheless, the number of genomes generated thus far considerably surpasses these numbers. Several studies resulting in large-scale recovery of MAGs have been conducted in recent years. For example, Anantharaman et al. (2016) reconstructed 2540 MAGs from an aquifer system, leading to the generation of genomes from 47 previously uncharacterized phylum-level lineages. Likewise, Parks et al. (2017) recovered nearly 8000 MAGs from over 1500 public metagenomic datasets, with a particular emphasis on environmental and non-human gastrointestinal samples. Moreover, Stewart et al. (2019b) assembled 4941 rumen microbial MAGs from 283 ruminant cattle, providing a valuable genomic resource for evaluating the structures and functions of rumen microbiota. We previously assembled 6339 MAGs from 500 metagenomes of pig gut microbiomes, providing an expanded resource for studies of the swine gut microbiome (Chen et al., 2021a). Recently, Xie et al. (2021) assembled 10,373 MAGs from 370 lumen samples comprising ten gastrointestinal tract regions from seven ruminant species, leading to the identification of 8745 uncultured candidate species. In humans, Pasolli et al. (2019) reconstructed 154,723 microbial genomes from human metagenomes spanning samples from different body sites, host ages, countries, and lifestyles. In addition, Almeida et al. (2019) generated 92,143 MAGs from 11,850 human gut metagenomes and identified 1952 uncultured species. They further provided a new catalog comprising 204,938 non-redundant genomes from 4644 gut prokaryotes and constructed a Unified Human Gastrointestinal Protein (UHGP) catalog (Almeida et al., 2020). Researchers have used these assembled human-associated metagenomes to explore intraspecies genomic diversity, leading to the identification of considerable intraspecies genomic variation that can be specific to populations within human individuals. Nayfach et al. (2020) reconstructed a genomic catalog of Earth's microbiomes from 10,450 metagenomes of diverse habitats, which included 52,515 MAGs representing 12,556 novel candidate species spanning 135 phyla. These recovered MAGs have greatly expanded the known phylogenetic diversity of bacterial and archaeal communities, and provided essential resources for the establishment of a taxonomic framework (Parks et al., 2020). Further, these genomic datasets can promote thorough investigations into microbial ecological characteristics (Edwards and Holt, 2013; Roller et al., 2013). Moreover, large-scale MAG data recovery also promotes the development of bioinformatic tools to analyze metagenomic sequencing data, as implemented in the GTDB-tk (Chaumeil et al., 2019) and PhyloPhlAn3.0 (Asnicar et al., 2020) software suites.

Many computational algorithms and software programs have been developed for metagenomic sequencing data analysis (Table S1). However, a remaining challenge is the choice of suitable software or pipeline programs among numerous bioinformatic tools, especially for researchers without bioinformatics backgrounds. Here, we summarize available methods and software programs that are commonly used to construct MAGs, while also discussing the advantages and limitations of these programs for MAG generation from short-read shotgun metagenomic sequencing data. We further discuss applications, challenges and opportunities for MAG generation in microbial studies.

2. Procedures and software for reconstructing microbial genomes from metagenomic sequencing data

A typical shotgun metagenomics study contains five steps including study design, sample collection and sequencing, pre-processing of sequencing data, sequence analysis, and post-processing and validation (Quince et al., 2017). Recovering MAGs is an effective approach in metagenomics studies for mining useful information from metagenomic sequencing data. Saheb Kashaf et al. (2021) provided a protocol to recover draft genomes from short read shotgun metagenomic sequencing data. A conventional workflow for reconstructing microbial genomes from metagenomic sequencing data primarily includes the following three modules (Fig. 1): (i) preprocessing of sequence reads including adapter trimming, quality control, and host genomic sequence removal; (ii) reconstruction of MAGs including metagenomic assembly, metagenomic binning, and optimization of MAG quality; (iii) quantification and annotation of MAGs including quantification, taxonomic classification, and genome annotation. Based on study purposes, MAGs can then be used for further analyses, such as functional capacity analysis, phylogenetic analysis, biomarker discovery, pan-genome analysis, identification of single nucleotide polymorphisms (SNPs) at the population level, and host prediction of viral genomes. To select appropriate software programs among the various and complicated tools and pipelines, an understanding of the algorithms is first needed, along with a comparison of the advantages and limitations of these programs.

2.1. Sequence reads preprocessing

High-throughput sequencing platforms allow multiple samples to be sequenced at once by adding dual indexed barcodes during library preparation to distinguish sequence reads from each sample. Sequence errors, base content biases, and overrepresented sequences are all produced during library preparation and sequencing. Indeed, from sample collection to DNA extraction, library preparation, and sequencing, contaminants can be introduced at each step from hosts, humans or surrounding environments. Therefore, sequence read preprocessing is important for further analyses. Several software programs are available for adapter trimming and quality control (Table 1). Trimmomatic is a popular tool that exhibits good performance in trimming poor-quality reads, but it is only operational for Illumina platform sequence data (Bolger et al., 2014). In addition, Trim Galore (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) incorporates FastQC (Andrews, 2010) and Cutadapt (Martin, 2011), and can be used with data from all high-throughput sequencing platforms for trimming low quality and adapter sequences. Further, Fastp can automatically identify adapters, and rapidly perform quality control and adapter trimming, producing results in both the JSON and HTML formats (Chen et al., 2018). Removal of contaminant reads can be performed by aligning reads to reference genome sequences including host genomic sequences. BWA (Li and Durbin, 2009) and Bowtie 2 (Langmead and Salzberg, 2012) are two of the most commonly used tools for alignment of short sequence reads to reference genomes. In addition, software programs for file format transformation such as Samtools (Li et al., 2009) and Bedtools (Quinlan and Hall, 2010) are sometimes needed. SeqKit is a toolkit for FASTA/Q file manipulation that supports the production of simple statistics, file format conversions, searching, filtering, extraction, deduplication, splitting, shuffling, and sampling (Shen et al., 2016). Further, the MultiQC program has traditionally been used to summarize and report results from multiple tools and samples within a single file (Ewels et al., 2016). KneadData (https://github.com/biobakery/kneaddata) is a new pipeline that incorporates all of the above functions including quality control, adapter trimming, and removal of host genomic sequences, thereby making it a useful pipeline for preprocessing metagenomic and metatranscriptomic sequencing data.

2.2. Metagenome assembly

Metagenome assembly involves de novo assembly of short metagenomic sequence reads into contigs. Popular graph-based algorithms for metagenome assembly include Overlap-Layout-Consensus (OLC) and de Bruijn graph (DBG) algorithms (Miller et al., 2010; Perez-Cobas et al., 2020). DBG algorithms based on k-mer compositions are the most commonly used approaches in short read metagenomic assemblers (Pevzner et al., 2001). In addition, OLC algorithms can be used to reconstruct longer contigs based on overlap between paired reads. However, this approach has a high error rate and has rarely been used with short read data. Nevertheless, the emergence of long-read


sequencing technology has led to re-popularization of OLC algorithms. Many assemblers are available based on DBG algorithms (Table 2), including the metaSPAdes (Nurk et al., 2017), MEGAHIT (Li et al., 2015), SOAPdenovo2 (Luo et al., 2012), IDBA-UD (Peng et al., 2012), and Minia (Chikhi and Rizk, 2013) assemblers. All of these software programs exhibit multi-k-mer capabilities that are typically used for sequence assembly. Several studies have evaluated the performance of multiple de novo assemblers using metagenomic sequencing data from distinct environments and simulated datasets (Sczyrba et al., 2017; van der Walt et al., 2017; Forouzan et al., 2018). Among the numerous available assemblers, metaSPAdes and MEGAHIT are most recommended. metaSPAdes was updated from SPAdes (Bankevich et al., 2012), with the former being specific for metagenomic sequencing data, and exhibits excellent performance in assembling long contigs, with a high fraction of reads assembled into contigs. However, metaSPAdes also incurs more mis-assemblies, and is both time and memory intensive. MEGAHIT is an alternative choice if both a high level of assembly completeness and a low number of mis-assemblies are desired. MEGAHIT features advantages of fast assembly and low memory requirements. Further, a high level of accuracy can be achieved by MEGAHIT for assemblies from metagenomic sequencing data with low complexity. The Critical Assessment of Metagenome Interpretation (CAMI) (Sczyrba et al., 2017) study indicated that MEGAHIT, Minia and Meraga (Meraculous (Chapman et al., 2011) + MEGAHIT) obtained higher cumulative contig sizes and contig numbers than the OperaMS Scaffolder (Gao et al., 2011), Ray Meta (Boisvert et al., 2010) and the Velour assembler (Zerbino and Birney, 2008). The IDBA-UD assembler features good performance with highly uneven sequence data depth (Peng et al., 2012).

Fig. 1. Typical workflow for reconstructing and analyzing metagenome-assembled genomes (MAGs) from metagenomic sequencing data. The steps for reconstructing and analyzing MAGs include: (i) preprocessing sequencing reads including adapter trimming, quality control, and host genomic sequence removal; (ii) MAG reconstructions including metagenome assembly, metagenomic binning, and optimization of MAG qualities; (iii) quantification and annotation of MAGs including quantification, taxonomic classification, and genome annotation; and (iv) advanced MAG analyses.

Table 1
Software programs used for preprocessing raw sequence reads.
Software | Major functions | Description and advantages
FastQC | Quality assessment | Identifies potential problems; Summary report production
Cutadapt | Quality control, adapter trimming | Removes unwanted sequence
Trimmomatic | Quality control, adapter trimming | Good performance in trimming poor-quality data; Contains a library of Illumina adapters and primer sequences; Used for Illumina platform sequence data
Trim Galore | Quality control, adapter trimming | A wrapper tool for FastQC and Cutadapt; Automatic identification of adapters
fastp | Quality control, adapter trimming | Fast; Automatic identification of adapters; Summary report production
KneadData | Quality control, adapter trimming, removing host genomic sequence | A pipeline that integrates multiple tools, including Trimmomatic, FastQC and Bowtie2; Designed for metagenomic and metatranscriptomic sequencing data

AMOS (Treangen et al., 2011) and MeGAMerge (Scholz et al., 2014) are shotgun metagenome assemblers that use the OLC approach, and combine the results of multiple assemblers to improve assembly quality. Further, the GAM-NGS (Vicedomini et al., 2013) program is also capable of merging multiple assemblies to improve contiguity and correctness. However, the program does not rely on global alignment to build


assembly graphs, but rather identifies regions of two assemblies that represent the same genomic locus by read alignment, followed by storage in a weighted graph. Canu (Koren et al., 2017) is an assembler that is specially designed for assembling long reads generated from long-read sequencing technologies like those from PacBio or Oxford Nanopore. Hybrid assembly methods combine the advantages of high-accuracy second-generation sequencing data with the long sequence lengths of third-generation sequencing data. The hybrid approach ultimately provides higher quality contigs than using either short read or long read assemblies alone, and is particularly useful for resolving complex repeated DNA segments. Several hybrid assemblers have been developed in recent years including MaSuRCA (Zimin et al., 2017), hybridSPAdes (Bankevich et al., 2012) and OPERA-MS (Bertrand et al., 2019).

Table 2
Popular software programs for metagenomic assembly.
Program | Assembly algorithms | Descriptions and advantages
MEGAHIT | De Bruijn graphs; Multi-k assembly | Less computational resources; Fewer mis-assemblies; Bias towards low coverage genomes
metaSPAdes | De Bruijn graphs; Multi-k assembly | Long contig lengths; High fraction of reads that can be assembled to contigs
SOAPdenovo2 | De Bruijn graphs; Multi-k assembly | Less computational resources needed; Greatly surpasses its predecessor SOAPdenovo in both assembly length and accuracy
IDBA-UD | De Bruijn graphs; Multi-k assembly | Better performance for sequencing data with highly uneven depths
Minia | De Bruijn graphs; Multi-k assembly | Less computational resources needed
Canu | Overlap-Layout-Consensus | Long-read assembler
AMOS | Overlap-Layout-Consensus | Merges results from multiple assemblers to improve overall assembly quality
MeGAMerge | Overlap-Layout-Consensus | Merges assembled contigs, long reads from metagenomic sequencing runs
GAM-NGS | Identifies regions of two assemblies representing the same genomic locus by read alignment and stores them in a weighted graph | Less computational resources needed; Merges results from multiple assemblers to improve overall assembly quality
MaSuRCA | Overlap-Layout-Consensus and de Bruijn graphs | Supports assemblies of short reads together with long reads; Suitable for low-complexity databases
hybridSPAdes | De Bruijn graphs | Supports assemblies of short reads together with long reads
OPERA-MS | De Bruijn graphs and scaffolding; Long-read connections | Supports assemblies of short reads together with long reads; Strain-resolved assembly

Metagenomic sequence assembly is challenged by many factors including intragenomic and intergenomic repeats, uneven species abundances, uneven sequence coverage, high strain diversity, low recovery rates for some phyla, and sequencing errors (Nayfach et al., 2019; Olson et al., 2019). Further, assembly quality can be influenced by community complexity, the presence of closely related genomes, sequencing depth, and k-mer sizes (Sczyrba et al., 2017). Consequently, the accurate assessment of metagenome assembly quality is critical for further analysis. Indicators for assembly quality primarily include the following three aspects: (i) statistical indicators including the numbers of contigs, average lengths of contigs, and N50 values; (ii) the accuracy of assembled contigs including mismatches and the number of mis-assemblies; and (iii) completeness of contigs as measured by the fraction of reads that are mapped to them (Forouzan et al., 2018). Several software programs are available for assessing assembly quality. QUAST (Gurevich et al., 2013) can detect mis-assemblies and structural variations using a reference-based approach, while also evaluating contig sizes and genome representation, even without a reference. MetaQUAST (Alla et al., 2016) is a modified version of QUAST that is specifically designed for metagenomic sequence assembly, and that features additional functionalities. Specifically, MetaQUAST can detect chimeric contigs and unknown species contents. In addition, the DeepMAsED (Mineeva et al., 2020) program uses a deep learning approach to identify mis-assembled contigs independently from reference genomes. Different programs carry different time and memory consumption needs and these should be considered when choosing programs for quality assessment.

Metagenomic sequence assembly strategies include single-sample assembly and co-assembly of multiple samples (Table 3). Co-assembly can lead to obtaining more genomes with higher completeness, and more genomes from the species exhibiting low abundances, but also leads to higher contamination (Pasolli et al., 2019). To increase the fraction of reads that can be assembled to contigs and the accuracy of assemblies, a combined approach of single-sample assembly and co-assembly of multiple samples should be considered (Stewart et al., 2019b). However, increased data sizes and their requirements of large computational resources make the co-assembly approach unsuitable for analyzing large datasets with large sample sizes. Genomes produced from co-assembly should be considered as population-level genomes. Consequently, assembled contig quality may decrease when sequence reads from different sample types are included (Saheb Kashaf et al., 2021). A previous study also suggested that single-sample assembly can lead to a greater number of higher quality genomes compared to co-assembly (Olm et al., 2017).

Table 3
Comparison of single sample assembly and co-assembly of multiple samples.
Assembly strategy | Advantages | Disadvantages
Single-assembly | Lower contamination; Fast; Memory efficient; Suitable for analyses with large sample sizes | Lower completeness; Difficult to obtain genomes from low-abundance species
Co-assembly | Higher completeness; Obtains more genomes from low-abundance species | Higher contamination; Time-consuming; High memory requirements

De-replication of MAGs must be conducted when pooling samples, with dRep being a recommended de-replication tool that performs pairwise genome comparisons based on average nucleotide identity (ANI). The highest quality genome is then chosen from each replication cluster to generate a de-replicated catalog of MAGs (Olm et al., 2017). However, increased MAG numbers lead to dramatic increases in required computational resources and time for de-replication. Mash is another tool for de-replicating MAGs that exhibits fast computational speeds and estimates the similarity between genomes using the MinHash distance (Ondov et al., 2016). However, high MAG completeness is required for this analysis.

2.3. Metagenome binning

Metagenomic binning groups sequences from the same species or even the same strain into genome bins. Binning can be conducted via read, contig, or gene binning based on the type of sequence data that are clustered (Nielsen et al., 2014; Girotto et al., 2016; Sangwan et al., 2016). Contig binning has been most commonly used for genome binning in recent years. Generally, MAGs are generated by contig binning into taxonomic bins based on nucleotide compositions or sequence read abundances. The tools commonly used for binning primarily include MetaBAT 2 (Kang et al., 2019), Maxbin 2 (Wu et al., 2016) and CONCOCT (Alneberg et al., 2014) (Table 4). MetaBAT 2 has


been commonly recommended for MAG reconstructions from large sample size datasets because of its highly efficient use of computational memory. The numbers and quality of assembled bins are greatly influenced by contig length, and thus, most of the aforementioned programs suggest the use of contigs with ≥ 1000 bp for binning. However, contig lengths must be ≥ 1500 bp when MetaBAT 2 is used for binning. Variational Autoencoders for Metagenomic Binning (VAMB) is a recently developed software program for metagenomic binning (Nissen et al., 2021) and encodes sequences according to abundances and k-mer distribution information before clustering bins using deep variational autoencoders. VAMB usually obtains fewer bins compared with MetaBAT 2, but the number of bins is relatively stable and robust when using different minimum contig length thresholds.

Table 4
Software programs used for binning and refining genome bins.
Software | Category | Description | Algorithm
MetaBAT 2 | Binning | Groups contigs into individual bins based on tetranucleotide frequencies and contig coverages across multiple samples. | An adaptive binning algorithm (using abundance (ABD) to rank-normalize tetranucleotide frequencies (TNF) and calculate a composite score of TNF and ABD; graph-based clustering; small contigs bin recruitment is possible).
Maxbin 2 | Binning | Groups contigs into individual bins based on tetranucleotide frequencies and contig coverages across multiple samples. | Uses an Expectation-Maximization algorithm to cluster contigs into bins. Initializes the number of genomes based on single copy marker genes.
CONCOCT | Binning | Groups contigs into individual bins based on k-mer frequencies (tetranucleotide by default) and contig coverages across multiple samples. | Uses a Gaussian mixture model fitting with a variational Bayesian approximation to cluster contigs into bins. A simple k-means algorithm is used to initialize the clusters. Regularized expectation maximization.
VAMB | Binning | Bins microbial genomes using deep learning. | Uses variational autoencoder (VAE) to encode abundances and compositions of contigs or genes. Uses iterative medoid clustering of contigs or genes into bins. Determines clustering threshold based on the density of Pearson correlation coefficient distances.
DAS Tool | Bins refinement | Refines bins by integrating the results of multiple binning software programs. | Combines the bins from multiple binning results (e.g., CONCOCT, MetaBAT, MaxBin2, tetraESOMs, and ABAWACA). A scoring function to rank genome bins is used based on their estimated completeness and contamination. An iterative procedure is used based on bin score, scaffold N50 and bin size to select a non-redundant bin set.
Binning_refiner | Bins refinement | Refines bins by integrating the results of multiple binning software programs. | Bins are combined from more than three binning results from programs (e.g., from CONCOCT, MetaBAT, and MyCC). Pairwise BLASTn is used to get shared contigs between two sets of bins.
MetaWRAP | Bins refinement | A pipeline for recovering MAGs and refining bins by integrating the results of multiple binning software programs. | The Binning_refiner module is used to produce possible combinations of bin sets from multiple binning results. Iteratively compares bin sets and selects a better bin based on the scoring function S = Completion − 5 × Contamination for each pair of bins.
RefineM | Bins refinement | Refines bins by removing contigs within a bin that were identified as contamination. | Removes contigs with divergent GC content, coverage, or tetranucleotide signatures. Removes contigs with conflicting taxonomy. Removes contigs with incongruent 16S rRNA genes.

2.4. Optimization of MAG quality

No uniform criteria for MAG quality have yet been established, although estimated completeness and contamination are two major indicators of MAG quality. CheckM (Parks et al., 2015) can be used to estimate completeness and contamination of bacterial or archaeal genomes using lineage-specific marker genes by referencing an established genome tree. CheckM has been used to assess MAG quality in almost all studies that recover MAGs from shotgun metagenomic sequencing data. However, CheckM is unable to assess the quality of genomes that are neither bacterial nor archaeal, including those from fungi or other microbial eukaryotes (Parks et al., 2015). In contrast, the BUSCO (Manni et al., 2021) program can be used to assess MAG qualities from bacterial, archaeal, viral, and eukaryotic species. Moreover, the program is suitable for various data types including genome assemblies, gene sets and MAGs.

The DAS Tool (Sieber et al., 2018), Binning_refiner (Song and Thomas, 2017), or MetaWRAP (Uritskiy et al., 2018) programs can all be used to refine bins to achieve higher completeness and less contamination (i.e., refining towards higher quality) by integrating several binning results. In addition, RefineM can be used to identify and remove contigs with divergent GC content, coverage, and tetranucleotide signatures, in addition to those with conflicting taxonomy and incongruent 16S rRNA genes (Parks et al., 2017). Genome bins can also be further refined by extracting reads that belong to each bin and then re-assembling them using metaSPAdes (Nurk et al., 2017).

MAG quality has also been evaluated using several other metrics including strain heterogeneity, quality score (completeness − 5 × contamination), the number of contigs within a MAG, the presence of rRNA genes (23S, 16S, and 5S), and the number of tRNAs (Parks et al., 2017; Pasolli et al., 2019; Stewart et al., 2019b; Nayfach et al., 2020). The Minimum Information about a Metagenome-Assembled Genome (MIMAG) standards have been proposed by the Genomic Standards Consortium, stating that MAGs with more than 50% completeness and less than 10% contamination are considered as medium-quality. In addition, high-quality MAGs are considered those with more than 90% completeness, less than 5% contamination, and the presence of all 23S, 16S, and 5S rRNA genes, in addition to at least 18 tRNAs (Bowers et al., 2017).
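As a minimal sketch of the quality metrics described in this section, the following Python snippet computes the quality score (completeness − 5 × contamination) and assigns a MIMAG tier using the thresholds stated in the text. The names MagStats, quality_score, and mimag_tier are hypothetical and are not part of CheckM or any other tool; the label "other" for genomes meeting neither tier is likewise an assumption made for illustration. In practice, completeness and contamination values would come from a tool such as CheckM.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Per-MAG statistics (hypothetical container, for illustration only)."""
    completeness: float       # estimated completeness, in percent
    contamination: float      # estimated contamination, in percent
    has_23s: bool = False     # 23S rRNA gene detected
    has_16s: bool = False     # 16S rRNA gene detected
    has_5s: bool = False      # 5S rRNA gene detected
    trna_count: int = 0       # number of tRNAs detected


def quality_score(mag: MagStats) -> float:
    """Quality score as given in the text: completeness - 5 x contamination."""
    return mag.completeness - 5.0 * mag.contamination


def mimag_tier(mag: MagStats) -> str:
    """Assign a tier using the thresholds quoted in the text.

    Returns "high", "medium", or "other" (the last label is an assumption
    for genomes that meet neither set of thresholds).
    """
    all_rrnas = mag.has_23s and mag.has_16s and mag.has_5s
    if (mag.completeness > 90.0 and mag.contamination < 5.0
            and all_rrnas and mag.trna_count >= 18):
        return "high"
    if mag.completeness > 50.0 and mag.contamination < 10.0:
        return "medium"
    return "other"
```

Note that under these rules a bin with more than 90% completeness and less than 5% contamination but without the full rRNA and tRNA complement falls back to the medium tier, which is a common outcome for MAGs assembled from short reads.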

2.5. Quantification and annotation of MAGs

Medium- or high-quality MAGs are commonly subjected to quantification, taxonomic classification and genome annotation. MetaWRAP is a recommended pipeline for those analyses if sample sizes are not too large. As indicated above, MetaWRAP is a flexible and modular pipeline that comprises all steps from preprocessing of metagenomic sequence reads to the recovery of MAGs, in addition to quantification and annotation of MAGs. However, high computational capacity is required to run MetaWRAP. Anvi'o (Eren et al., 2015) is another analytics and visualization platform for 'omics data that can also perform assembly, mapping, profiling, binning, and refinement of bins (with interactive interfaces), and can summarize results. In addition, MAGpy (Stewart et al.,


2019a) is another pipeline for downstream analyses of MAGs including comparison of protein or genome sequences between MAGs and several public databases, quality assessment, taxonomic classification, and phylogenetic analysis. Further, Salmon (Patro et al., 2017) has long been used to quantify the abundances of transcripts in RNA-seq data, and has also been used to quantify MAG abundances. The quantification module of MetaWRAP calculates the abundance of each MAG in each sample based on the length-weighted average of the contig abundances that are generated by Salmon (Uritskiy et al., 2018).

MAG taxonomic classification methods can be classified into three categories: (i) DNA-based classification, including through average nucleotide identity (ANI) and alignment of genome sequences; (ii) protein-based classification based on average amino acid identity (AAI) and alignment of protein sequences; and (iii) marker-based classification based on species-specific core gene datasets and universal markers (Table 5). The classification module of MetaWRAP (Uritskiy et al., 2018) uses DNA-based classification, while BAT (von Meijenfeldt et al., 2019) classifies MAGs using protein-based classification. Some tools also classify MAGs using more than two methods, including the GTDB-tk (Chaumeil et al., 2019), MiGA (Rodriguez et al., 2018), PhyloPhlAn 3.0 (Asnicar et al., 2020) and MAGpy (Stewart et al., 2019a) programs. Among these, GTDB-tk has been the most widely used in recent studies (Danko et al., 2021; Nayfach et al., 2020; Xie et al., 2021; Chen et al., 2021a).

Prokka (Seemann, 2014) is a pipeline that integrates multiple software programs to annotate MAGs. The program can predict coding sequences (CDS), rRNA genes, and tRNA genes from MAGs. The output files generated by Prokka include FASTA files of protein and nucleotide sequences of coding genes, a Genbank file, and a GFF (v3) file containing sequences and annotations. All of these files can be easily used in other downstream analyses.

2.6. Public microbial genome databases

To construct more complete microbial genome datasets, researchers typically integrate MAGs with public microbial genome databases. Microbial genome databases that are available and that are commonly used for taxonomic annotation include the Genome Taxonomy Database (GTDB) (Parks et al., 2020), Integrated Microbial Genomes (IMG) (Chen
Table 5
Software used to taxonomic classification of MAGs.
et al., 2021b), RefSeq (https://www.ncbi.nlm.nih.gov/refseq/), and
GenBank (https://www.ncbi.nlm.nih.gov/genbank/) databases. In
Software Database Classification methods Type
addition, public reference genome datasets from specific environments
GTDB-tk Genome Taxonomy Placement in the GTDB DNA-based, or from isolate genomes include the Human Microbiome Project (HMP)
Database (GTDB) reference tree marker-
(Human Microbiome Jumpstart Reference Strains et al., 2010), the
Relative evolutionary based
divergence (RED)
Genomic Encyclopedia of Bacteria and Archaea (GEBA) (Mukherjee
Average nucleotide et al., 2017), the Genomes from Earth’s Microbiomes (GEM) (Nayfach
identity (ANI) et al., 2020), rumen-uncultured genomes (RUGs) (Stewart et al., 2019b),
MiGA NCBI Genome Database Average nucleotide DNA-based,
(Prokaryotes), RefSeq identity (ANI) protein-
Average amino acid based Table 6
identity (AAI) Genome databases or public reference genome datasets commonly used for
PhyloPhlAn 87,173 reference Comparison against the DNA-based, genome comparisons.
3.0 genomes in GenBank, 400 most universal marker-
Database/ # Genomes # Species Web Address Last update
154,723 MAGs, markers across based
Dataset time/
57,841,793 gene bacterial and archaeal
version
families in UniRef90 species
Species-specific core GTDB Bacteria: Bacteria: https://gtdb.ecog April 27,
genes from the 254,090 45,555 enomic.org/ 2021/
> 18,000 sets of Archaea: Archaea: Release 06-
preselected gene 4316 2339 RS202
families in UniRef90 IMG Bacteria: Bacteria: https://img.jgi. June 5,
Average nucleotide 93,966 32,334 doe.gov/cgi-bin/w 2021/
identity (ANI) Archaea: Archaea: /main.cgi Version
MAGpy UniProt, over 100,000 Comparison of protein DNA-based, 2114 1060 5.4.1
public genomes sequences protein- RefSeq Bacteria: Bacteria: https://ftp.ncbi. Jan. 9,
Comparison of genome based, 203,019 31,179 nlm.nih.gov/ge 2021
sequences marker- Archaea: Archaea: nomes/refseq/
Uses PhyloPhlAn based 1107 713
(based on the 400 most GenBank Bacteria: Bacteria: https://ftp.ncbi. Jan. 9,
universal markers) to 833,917 51,841 nlm.nih.gov/g 2021
perform phylogenetic Archaea: Archaea: enomes/genbank/
analysis 6274 2418
MetaWRAP NCBI Nucleotide Taxonomic DNA-based HMP 2947 / https://www.ncbi. Published
Sequence Database classification of each nlm.nih.gov/bi in 2010
(NT), NCBI Taxonomy contig oproject/28331
Database, RefSeq The final taxonomy of a GEBA Bacteria:974 579 https://genome. Published
bin is determined based Archaea: 29 genera jgi.doe.gov/porta in 2017
on the taxonomy of l/geba1003/geba
each contig in a 1003.info.html
phylogenetic tree and GEM 52,515 18,028 https://genome.jg Published
the branch weight i.doe.gov/portal/ in 2020
BAT NCBI non-redundant ORF prediction Protein- GEMs/GEMs.
protein database (NR), Predicted ORFs are based home.html
NCBI Taxonomy aligned to the NCBI_nr RUGs Bacteria: 2346 https://www.ebi. Published
Database database 4815 ac.uk/ena/brow in 2019
ORFs are classified Archaea: ser/view
based on the LCA 126 /PRJEB31266
algorithm Hungate1000 410 82 genera https://genome. Published
MAGs are then collection jgi.doe.gov/port in 2018
classified based on a al/HungateCollect
voting approach using ion/HungateColle
all classified ORFs ction.info.html

and the Hungate genome catalog (Seshadri et al., 2018) (Table 6). There are, of course, more public microbial genome datasets than those listed above; individual researchers can choose their databases according to the specific environment and the purpose of the study. The comprehensiveness and taxonomic accuracy of a reference genome database can also critically impact the accuracy of MAG taxonomic annotation. Unknown taxa can be identified by taxonomic classification and by comparing sequence similarities between recovered genomes and public reference genomes.

It should be noted that the definition of bacterial species remains controversial. A species can be classified based on several aspects, including monophyly, ribosomal RNA genes, genomic coherence and phenotypic coherence (Rossello-Mora and Amann, 2015). Thus, sequence similarity may differ between taxa, even when considering the same taxonomic level. Consequently, it is difficult to accurately classify MAGs to the species level using a fixed threshold for sequence similarity. Generally, the recovered genomes can be clustered into strain-level bins and species-level genome bins (SGBs) at thresholds of 99% and 95% ANI, respectively. Nevertheless, high-quality MAGs are required to obtain high taxonomic accuracy.

3. Advanced analyses of MAGs

Genomes of uncultured and previously uncharacterized microbial taxa can be obtained by metagenomic binning. These recovered genomes can then provide new resources for microbial genome databases. Thus, advanced analyses, including functional analysis, comparative genomic analysis, and host prediction of viral genomes, can yield valuable biological insights from MAGs.

3.1. Potential functional capacities of microbiota

MAGs can be used to further explore the potential functional capacities of microbiota. For example, Stewart et al. (2019b) constructed a gene catalog containing 10.69 million predicted genes from > 5000 rumen MAGs and isolate genomes, leading to the identification of 442,917 genes involved in carbohydrate metabolism. A large percentage of these predicted CAZymes exhibited low sequence similarity to those in the CAZy database. This catalog consequently provides an important gene resource for encoded proteins and enzymes in the study of rumen microbiomes. Nayfach et al. (2020) also constructed a catalog containing 111,428,992 full-length genes that were classified into 5,794,145 protein clusters (PCs) as part of the Genomes from Earth’s Microbiomes (GEM) catalog. Among these PCs, nearly 70% were newly identified and represent new functional capacities. In addition, 87,187 novel biosynthetic gene clusters (BGCs) were identified and subjected to in-depth analysis of their biosynthetic potentials.

Importantly, microbial cultivation is an effective method for experimental verification of microbial functions, and can also expand the reservoir of known taxa and functions inferred by metagenomic analyses (Lagier et al., 2018; Liu et al., 2020). Conversely, MAGs can provide information for improving cultivation conditions via analysis of genomes, thereby increasing the number of bacterial strains that are successfully isolated and cultivated (Nayfach et al., 2019). Previous studies of MAGs have suggested that uncultured bacteria can exhibit small genome sizes and slow replication rates, while also lacking many genes associated with cultivated organisms due to the loss of numerous relatively conserved metabolic pathways (Brown et al., 2015, 2016; Nayfach et al., 2019). Databases commonly used for functional annotation of MAGs include the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), the carbohydrate-active enzymes (CAZy) database (Lombard et al., 2014), evolutionary genealogy of genes non-supervised orthologous groups (eggNOG) (Huerta-Cepas et al., 2019), antibiotics and secondary metabolite analysis shell (antiSMASH) (Blin et al., 2021), the Comprehensive Antibiotic Resistance Database (CARD) (Alcock et al., 2020) and the Virulence Factor Database (VFDB) (Chen et al., 2016) (Table 7).

Table 7
Databases commonly used to functionally annotate MAGs.
Database | Description | Item | Web Address | Analysis tools
KEGG | High-level functions and utilities of biological systems | KEGG Orthology (KO), modules and pathways | https://www.kegg.jp/ | BLAST (Altschul et al., 1990), GhostKOALA (Kanehisa et al., 2016)
CAZy | Carbohydrate metabolism enzymes | Carbohydrate-active enzymes (CAZymes) | http://www.cazy.org/ | dbCAN (Yin et al., 2012)
eggNOG | Orthologous relationships, gene evolutionary histories and functional annotations | Orthologous groups (OGs) | http://eggnog.embl.de | eggNOG-mapper (Huerta-Cepas et al., 2017)
antiSMASH | Secondary metabolite ‘biosynthetic gene clusters’ (BGCs) | BGCs | https://antismash.secondarymetabolites.org | antiSMASH (Blin et al., 2019)
CARD | Reference DNA and protein sequences, detection models, and bioinformatics tools for the molecular basis of bacterial antimicrobial resistance (AMR) | Antibiotic resistance genes (ARGs), Antibiotic Resistance Ontology (ARO) | http://arpcard.mcmaster.c | BLAST (Altschul et al., 1990), RGI (Alcock et al., 2020)
VFDB | Virulence factors (VFs) of bacterial pathogens | VFs | http://www.mgc.ac.cn/VFs/ | BLAST (Altschul et al., 1990), VFanalyzer (Liu et al., 2019)

3.2. Evolution and variation of microbiota

Extensive strain diversity within species suggests that genomic information from a single genome is insufficient to reflect the gene pool and functional capacities of a species. Increased numbers of microbial genomes allow comparative genomic analysis to become a powerful method to understand the relatedness and genomic variation among strains within species. Comparative genomic analyses include evaluating phylogenetic relationships among taxa (Pasolli et al., 2019; De Filippis et al., 2020), genome-wide comparisons (Koonin and Mushegian, 1996; Koonin et al., 1996), identification of SNPs (Nayfach et al., 2019) and pan-genome analysis (Medini et al., 2005; Bezuidt et al., 2016; Pasolli et al., 2019). Karcher et al. (2020) suggested that microbial population structures are associated with geography and the evolutionary relationships of their host, based on comparative genomic analysis of 1321 Eubacterium rectale genomes recovered from human gut metagenomes spanning geographic origins and host lifestyles. Likewise, De Filippis et al. (2020) used comparative genomics to show that functional potential and diversity vary among 3000 Faecalibacterium-like MAGs based on host age, origin, lifestyle, and disease. Tett et al. (2019) explored the global population structure and genetic diversity of Prevotella copri using 1023 genomes (comprising 17 sequenced isolates and 1006 MAGs). In addition to the above, several studies have shown that horizontal gene transfer is a dominant force in prokaryotic evolution. Comparative genomics can provide a deeper understanding of the source of genes from different microbial species at
the genomic level (Koonin and Wolf, 2008; Bezuidt et al., 2016). Moreover, comparative genomics can help to determine the presence or absence of genes with important functions (e.g., antibiotic resistance genes and virulence genes) (Edwards and Holt, 2013).

3.3. Host prediction for viral genomes

Viruses play significant roles in global ecosystem functioning (Suttle, 2007; Shkoporov and Hill, 2019). Moreover, viruses are estimated to be the most abundant microbial entities on Earth and outnumber their prokaryotic hosts by over an order of magnitude. However, the microbial hosts of viruses are largely unknown, especially for uncultivated viruses. MAGs can be valuable resources for investigating virus-host interactions. Many approaches exist for predicting bacteriophage-bacteria relationships, including CRISPR-spacer matching, genetic homology matching, and abundance profile comparisons (Edwards et al., 2016). Thousands of interactions between viruses and bacteria have been identified from MAGs using these approaches (Nayfach et al., 2020; Danko et al., 2021), providing considerable knowledge for understanding the interactions between phages and their host bacteria.

4. Application prospects of MAGs

MAGs have broad application prospects in human, animal and other environments when combined with other data, including clinical indicators, disease status, host phenotypes, and other omics data (such as culturomics and metabolomics). In contrast to traditional culture-based methods and 16S rRNA sequencing methods, the assembly of microbial genomes through metagenomics can simultaneously detect diverse bacteria, archaea and viruses, and also reveal the functions of microorganisms at the strain or even genomic levels. Here, we introduce some potential application areas and specific application examples of MAGs in human, animal and other environments.

4.1. Application prospects of MAGs in human studies

Microbiomes play important roles in human health, and understanding the function of these microbial communities along with their specific strains can provide biotechnology prospects for therapeutic fields and open up new potential treatments for diseases. Specifically, strain-level resolution of MAGs can help to promote infectious disease diagnosis, microbiome analyses in diseased and healthy states, predictions of antimicrobial resistance (Bradley et al., 2015; Lo et al., 2018), detection of virulence determinants (Teh et al., 2021; Wang et al., 2021), and disease biomarker identification (Sommer et al., 2017; Dapa et al., 2022). Phages have been used to treat human diseases due to growing antimicrobial resistance in the post-antibiotic age (Dedrick et al., 2019; Sabino et al., 2020; Hsu et al., 2021). To this end, MAGs are important resources for identifying host bacteria-phage associations and provide valuable information for developing phage therapy (Fujimoto et al., 2020).

Microbiota exhibit important influences on drug metabolism and therapeutic effects (Zimmermann et al., 2019; Balaich et al., 2021). MAGs allow the prediction of potential functional capacities via analysis of genome sequences (e.g., encoded enzymes involved in secondary metabolite synthesis) and help in understanding the relationships between microbiomes and drug metabolism. For example, Maini Rekdal et al. (2019) identified an inter-species gut bacterial pathway for the metabolism of Levodopa, the primary medication used to treat Parkinson’s disease. In this pathway, Levodopa is first converted to dopamine by Enterococcus faecalis and then converted to m-tyramine by Eggerthella lenta, thereby reducing the effectiveness of the drug. It should be noted that not all E. lenta strains convert dopamine to m-tyramine. The strains capable of this activity have a single-nucleotide polymorphism in dadh, the gene whose product converts dopamine to m-tyramine. These results suggest that it is necessary to analyze the interactions between microorganisms and drug metabolism at the strain or even genomic levels. Consequently, evaluating the interactions and mechanisms between MAGs and drug metabolism is of considerable significance for treating diseases and for drug development.

4.2. Application prospects of MAGs in animals

Animal microbiomes contain abundant biological resources, such as encoded enzymes (Singh et al., 2014; Rashamuse et al., 2017; Freitas et al., 2019), antimicrobials (Ochoa et al., 2018; Akbar et al., 2019; Chevrette et al., 2019) and immunomodulators (Yin et al., 2018). Here, we focus on the application prospects of MAGs in wild animals, livestock, and poultry.

Wild animals exhibit strong immune systems and adaptability, which may be due to their microbiomes. Levin et al. (2021) discovered new toxin-metabolizing genes using 5080 MAGs recovered from the gut metagenomes of over 180 wild animals. These results demonstrate the potential for using animal metagenome databases to identify microbial functions and explore biological resources. The study also showed that the gut microbiome composition of wild animals is correlated with host heredity, diet, habitat environments, social structures, and lifespans. Several other studies have reported similar results (Wu et al., 2018; Perofsky et al., 2019; Huang et al., 2022). These insights contribute to a more comprehensive understanding of host-microbiota interactions and can improve strategies for wildlife conservation.

Livestock and poultry, such as pigs, cows, sheep and chickens, are closely related to human life. One study observed significant differences in gut microbial community composition and function between high and low feed efficiency host groups (Xie et al., 2021). Consequently, improving feed efficiency may promote the productivity of livestock and poultry and the yield of food products such as meat, eggs, and milk. Feed additives, such as probiotics, prebiotics, and synbiotics, can improve feed efficiency and animal growth rates, and can also help to treat diseases such as diarrhea (Markowiak and Slizewska, 2018). MAGs provide rich resources for identifying potential probiotic supplements for animal feed.

4.3. Application prospects of MAGs for other environments

MAGs have also been recovered from diverse natural environments (Parks et al., 2017; Nayfach et al., 2020; Danko et al., 2021). Microbial communities and their metabolites are present in air, building materials, waters, soils and other environments, where they may have significant implications for human health (Gilbert and Stephens, 2018; Jack et al., 2018; Lehtimaki et al., 2021; Zhu et al., 2019). Further, antimicrobial resistance genes (ARGs) of bacteria can be transferred to humans, leading to drug ineffectiveness and increasing the risk of disease transmission (Sun et al., 2020; Wu et al., 2022). MAGs in natural environments are important resources for identifying potential hosts of ARGs, and facilitate the identification of potential risks to public health in environments. A recently constructed catalog of global urban microbial ecosystems and antimicrobial resistance highlighted the potential applications of MAGs from natural environments in public health (Danko et al., 2021). Within agriculture, MAGs can allow researchers to explore the effects of rhizosphere microbiomes and soil microbiomes on nutrient uptake, disease resistance, stress resistance, and plant production at the strain level (Carrion et al., 2019). Further, microorganisms are important sources of numerous enzymes, antimicrobials, bacteriocins, and many other natural products, leading to their widespread use in the food, chemical, and pharmaceutical industries and many others (Houde et al., 2004; Shin et al., 2013; Singh et al., 2016; Hug et al., 2020). Thus, natural environment MAGs are significant resources for microbiome research in the fields of public health, agriculture, industry and many others.
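Identifying potential hosts of ARGs from MAGs, as mentioned in Section 4.3, amounts in its simplest form to joining two tables: the assignment of contigs to MAGs produced by binning, and the ARG annotations found on those contigs. The Python sketch below illustrates only that join. The function name `arg_hosts`, the input formats, the "unbinned" label, and the toy identifiers are all assumptions made for this example; real ARG calls would come from an annotation step against a resource such as CARD, and a co-located gene is only a candidate host link, not proof of one.

```python
from collections import defaultdict


def arg_hosts(contig_to_mag, arg_hits):
    """Group ARG annotations by the MAG whose contig carries them.

    contig_to_mag: dict mapping contig ID -> MAG (bin) ID.
    arg_hits: iterable of (contig ID, ARG name) pairs, e.g. parsed from
              the output of an ARG annotation tool (format assumed here).
    Returns a dict MAG ID -> sorted list of distinct ARG names; hits on
    contigs that were not binned are collected under the key "unbinned".
    """
    hosts = defaultdict(set)
    for contig, arg in arg_hits:
        hosts[contig_to_mag.get(contig, "unbinned")].add(arg)
    return {mag: sorted(args) for mag, args in hosts.items()}


if __name__ == "__main__":
    # Toy data: contig and MAG identifiers are invented for illustration.
    bins = {"c1": "MAG_001", "c2": "MAG_001", "c3": "MAG_002"}
    hits = [("c1", "tetM"), ("c2", "blaTEM"), ("c3", "tetM"), ("c9", "vanA")]
    for mag, args in sorted(arg_hosts(bins, hits).items()):
        print(mag, ", ".join(args))
```

Keeping unbinned hits in a separate group rather than discarding them makes it visible how many ARGs cannot be linked to any genome, which is common for genes carried on plasmids or other mobile elements that bin poorly.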


5. Challenges and opportunities

The number of MAGs has rapidly increased in recent years with the development of high-throughput sequencing technologies. However, uniform bioinformatic procedures and quality standards have yet to be produced for assembled genomes. Consequently, MAG quality varies greatly across studies. In addition, important metadata related to MAGs, including features, sample sources, and corresponding sequencing data, are often missing from public records. This lack of data creates challenges for the reuse or integration of public datasets (Scholz et al., 2012). The deficiency of this important information also seriously restricts comparisons among different datasets and in-depth analysis of microbiome data. Kasmanas et al. (2020) recently constructed a curated and standardized database of metadata for human metagenomes. Further, Liu et al. (2020) provided a practical guide for the reproducible analysis of microbiome data, suggesting that researchers should submit raw sequence data, detailed metadata, and associated code with their published studies. These suggestions provide a framework for constructing unified metagenomic databases and standards for metadata submission in the future.

Furthermore, optimized algorithms are required in each step of generating MAGs from metagenomes to improve assembly efficiency, MAG quality, and taxonomic resolution. In particular, MAG quality relies on the quality of assembled contigs. However, the correct assembly of contigs from sequence reads is a challenge due to high strain diversity, the presence of repeat sequences, and the influence of high-abundance species. Third-generation sequencing technology can produce longer contigs and more complete genomes (Stewart et al., 2019b), but suffers from a higher error rate (Jain et al., 2016). Metagenomic assembly from the combination of both long and short sequence reads can help to obtain higher-quality or nearly complete genomes, leading to more accurate taxonomic classification and more comprehensive functional genome annotation (Bertrand et al., 2019). In contrast to metagenomic sequencing, single-cell genomic sequencing can produce higher-quality genomes for low-abundance organisms, while also linking gene functions to specific microbial strains. However, challenges for single-cell genomic sequencing include cell sorting, the presence of chimeric reads, and uneven read coverage (Xu and Zhao, 2018). Consequently, the combination of metagenomic sequencing and single-cell genomic sequencing can compensate for the individual weaknesses of the two techniques and will help to expand our understanding of the diversity and functional gene capacity of uncultured bacteria.

The vast majority of MAGs recovered from shotgun metagenomic sequencing data by metagenomic binning are bacterial, followed by a small percentage of archaeal MAGs. Further, viral sequences are rarely binned, and are easily combined with bacterial, archaeal, or eukaryotic sequences (Xie et al., 2021). Thus, archaeal, viral, and fungal MAGs are lacking. This discrepancy is likely due to variable abundances of microbial species affecting the assembly of low-abundance species. Most studies that generate MAGs focus on bacterial taxa. Thus, a relatively adequate understanding of bacterial composition exists for various environments. However, the interactions between bacterial species remain unclear. Nevertheless, some investigations of the interactions between bacteria have been conducted by integrating culturomes, metagenomes (or 16S rRNA gene sequencing data), transcriptomes, and mathematical models (Buffie et al., 2015; Zhao et al., 2018; D’Hoe et al., 2019). Microbial interactions are critical in structuring the composition and functional capacity of microbial communities, in addition to regulating the health and behavior of the hosts. Fungi can affect bacterial growth, nutrient availability, and ecosystem function (de Menezes et al., 2017; Pierce et al., 2020). Likewise, phages have been reported to play essential roles in regulating bacterial diversity and affecting host immunity through coevolution with bacteria (Mirzaei and Maurice, 2017; De Sordi et al., 2019). Moreover, Heyer et al. (2019) used metaproteomics approaches to show that microbial communities in biogas plants are shaped by bacterial-archaeal syntrophic interactions and phage-bacterial interactions. Integrating metagenomic, metatranscriptomic, metaproteomic, metabolomic and viromic analyses can provide a systems-level understanding of both the composition and functional capacities of microbiomes, along with interactions among microbiota (Bikel et al., 2015; Mallick et al., 2017; Abu-Ali et al., 2018; Liu et al., 2020). Thus, it is necessary to investigate the causality of microorganisms in host phenotypes by integrating multiple omics datasets including transcriptomics, metabolomics, and culturomics.

Assembly-free metagenomic profiling approaches can be used to rapidly obtain gene and functional profiles by aligning metagenomic sequence reads to reference gene catalogs (Brown et al., 2019). Such approaches can also help to identify low-abundance species that are difficult to assemble (Quince et al., 2017). Nevertheless, because these approaches depend on reference catalogs, a comprehensive and complete catalog of genomes and genes is required to identify previously uncharacterized microorganisms. Consequently, MAG and non-redundant gene catalogs would be valuable resources for assembly-free metagenomic sequencing data analyses.

6. Conclusions

Genomic binning from metagenomes provides a vast number of uncultured microbial genomes from shotgun metagenomic sequencing data and significantly expands the known phylogenetic diversity of microbial communities. In this review, we provide a reference for researchers to choose the proper tools and pipelines for metagenomic sequencing-based studies. Further, the review provides a framework to better understand the contributions and application prospects of MAGs in microbiome studies. The quality of draft genomes still needs to be improved by combining data from multiple sequencing technologies and by improvements in bioinformatics algorithms. Nevertheless, MAGs provide critical genomic information for isolating and culturing microorganisms, in addition to improving the investigation of relationships between microbiomes and host phenotypes. Culture-based studies are consequently critically needed to further elucidate and confirm the functional capacities of microorganisms related to these MAGs.

Author Contributions

Y. Z. designed, wrote and revised the manuscript. M. L. and J. Y. edited the manuscript. All authors contributed to the article and approved the submitted version.

Acknowledgements

We thank Professor Congying Chen of the State Key Laboratory of Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University, for his suggestions on the revision of the article. This work was supported by the National Natural Science Foundation of China (31772579 and 31760654).

Declaration of Competing Interest

The authors declare that they have no competing interests.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.micres.2022.127023.

References

Abu-Ali, G.S., Mehta, R.S., Lloyd-Price, J., Mallick, H., Branck, T., Ivey, K.L., Drew, D.A., DuLong, C., Rimm, E., Izard, J., Chan, A.T., Huttenhower, C., 2018. Metatranscriptome of human faecal microbial communities in a cohort of adult men. Nat. Microbiol. 3 (3), 356–366.


Akbar, N., Siddiqui, R., Sagathevan, K.A., Khan, N.A., 2019. Gut bacteria of animals/pests living in polluted environments are a potential source of antibacterials. Appl. Microbiol. Biotechnol. 103 (10), 3955–3964.
Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.L., Tyson, G.W., Nielsen, P.H., 2013. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31 (6), 533–538.
Alcock, B.P., Raphenya, A.R., Lau, T.T.Y., Tsang, K.K., Bouchard, M., Edalatmand, A., Huynh, W., Nguyen, A.V., Cheng, A.A., Liu, S., Min, S.Y., Miroshnichenko, A., Tran, H.K., Werfalli, R.E., Nasir, J.A., Oloni, M., Speicher, D.J., Florescu, A., Singh, B., Faltyn, M., Hernandez-Koutoucheva, A., Sharma, A.N., Bordeleau, E., Pawlowski, A.C., Zubyk, H.L., Dooley, D., Griffiths, E., Maguire, F., Winsor, G.L., Beiko, R.G., Brinkman, F.S.L., Hsiao, W.W.L., Domselaar, G.V., McArthur, A.G., 2020. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48 (D1), D517–D525.
Alla, M., Vladislav, S., Alexey, G., 2016. MetaQUAST: evaluation of metagenome assemblies. Bioinformatics 32 (7), 1088–1090.
Almeida, A., Mitchell, A.L., Boland, M., Forster, S.C., Gloor, G.B., Tarkowska, A., Lawley, T.D., Finn, R.D., 2019. A new genomic blueprint of the human gut microbiota. Nature 568 (7753), 499–504.
Almeida, A., Nayfach, S., Boland, M., Strozzi, F., Beracochea, M., Shi, Z.J., Pollard, K.S., Sakharova, E., Parks, D.H., Hugenholtz, P., Segata, N., Kyrpides, N.C., Finn, R.D., 2020. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol.
Alneberg, J., Bjarnason, B.S., de Bruijn, I., Schirmer, M., Quick, J., Ijaz, U.Z., Lahti, L., Loman, N.J., Andersson, A.F., Quince, C., 2014. Binning metagenomic contigs by coverage and composition. Nat. Methods 11 (11), 1144–1146.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215 (3), 403–410.
Anantharaman, K., Brown, C.T., Hug, L.A., Sharon, I., Castelle, C.J., Probst, A.J., Thomas, B.C., Singh, A., Wilkins, M.J., Karaoz, U., Brodie, E.L., Williams, K.H., Hubbard, S.S., Banfield, J.F., 2016. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219.
Andrews S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc 2010.
Asnicar, F., Thomas, A.M., Beghini, F., Mengoni, C., Manara, S., Manghi, P., Zhu, Q., Bolzan, M., Cumbo, F., May, U., Sanders, J.G., Zolfo, M., Kopylova, E., Pasolli, E., Knight, R., Mirarab, S., Huttenhower, C., Segata, N., 2020. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat. Commun. 11 (1), 2500.
Wilson, D.J., Wyllie, D.H., Diel, R., Niemann, S., Feuerriegel, S., Kohl, T.A., Ismail, N., Omar, S.V., Smith, E.G., Buck, D., McVean, G., Walker, A.S., Peto, T.E., Crook, D.W., Iqbal, Z., 2015. Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis. Nat. Commun. 6, 10063.
Brown, C.T., Hug, L.A., Thomas, B.C., Sharon, I., Castelle, C.J., Singh, A., Wilkins, M.J., Wrighton, K.C., Williams, K.H., Banfield, J.F., 2015. Unusual biology across a group comprising more than 15% of domain bacteria. Nature 523 (7559), 208–211.
Brown, C.T., Olm, M.R., Thomas, B.C., Banfield, J.F., 2016. Measurement of bacterial replication rates in microbial communities. Nat. Biotechnol. 34 (12), 1256–1263.
Brown, S.M., Chen, H., Hao, Y., Laungani, B.P., Ali, T.A., Dong, C., Lijeron, C., Kim, B., Wultsch, C., Pei, Z., Krampis, K., 2019. MGS-Fast: Metagenomic shotgun data fast annotation using microbial gene catalogs. Gigascience 8, 4.
Browne, H.P., Forster, S.C., Anonye, B.O., Kumar, N., Neville, B.A., Stares, M.D., Goulding, D., Lawley, T.D., 2016. Culturing of ’unculturable’ human microbiota reveals novel taxa and extensive sporulation. Nature 533 (7604), 543–546.
Buffie, C.G., Bucci, V., Stein, R.R., McKenney, P.T., Ling, L., Gobourne, A., No, D., Liu, H., Kinnebrew, M., Viale, A., Littmann, E., van den Brink, M.R., Jenq, R.R., Taur, Y., Sander, C., Cross, J.R., Toussaint, N.C., Xavier, J.B., Pamer, E.G., 2015. Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517 (7533), 205–208.
Carrion, V.J., Perez-Jaramillo, J., Cordovez, V., Tracanna, V., de Hollander, M., Ruiz-Buck, D., Mendes, L.W., van Ijcken, W.F.J., Gomez-Exposito, R., Elsayed, S.S., Mohanraju, P., Arifah, A., van der Oost, J., Paulson, J.N., Mendes, R., van Wezel, G.P., Medema, M.H., Raaijmakers, J.M., 2019. Pathogen-induced activation of disease-suppressive functions in the endophytic root microbiome. Science 366 (6465), 606–612.
Chapman, J.A., Ho, I., Sunkara, S., Luo, S., Schroth, G.P., Rokhsar, D.S., 2011. Meraculous: de novo genome assembly with short paired-end reads. PLoS One 6 (8), e23501.
Chaumeil, P.A., Mussig, A.J., Hugenholtz, P., Parks, D.H., 2019. GTDB-Tk: a toolkit to classify genomes with the genome taxonomy database. Bioinformatics.
Chen, C., Zhou, Y., Fu, H., Xiong, X., Fang, S., Jiang, H., Wu, J., Yang, H., Gao, J., Huang, L., 2021a. Expanded catalog of microbial genes and metagenome-assembled genomes from the pig gut microbiome. Nat. Commun. 12 (1), 1106.
Chen, I.A., Chu, K., Palaniappan, K., Ratner, A., Huang, J., Huntemann, M., Hajek, P., Ritter, S., Varghese, N., Seshadri, R., Roux, S., Woyke, T., Eloe-Fadrosh, E.A., Ivanova, N.N., Kyrpides, N.C., 2021b. The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities. Nucleic Acids Res. 49 (D1), D751–D763.
Balaich, J., Estrella, M., Wu, G., Jeffrey, P.D., Biswas, A., Zhao, L., Korennykh, A., Chen, L., Zheng, D., Liu, B., Yang, J., Jin, Q., 2016. VFDB 2016: hierarchical and refined
Donia, M.S., 2021. The human microbiome encodes resistance to the antidiabetic dataset for big data analysis–10 years on. Nucleic Acids Res. 44 (D1), D694–D697.
drug acarbose. Nature 600 (7887), 110–115. Chen, S., Zhou, Y., Chen, Y., Gu, J., 2018. fastp: an ultra-fast all-in-one FASTQ
Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V. preprocessor. Bioinformatics 34 (17), i884–i890.
M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., Pyshkin, A.V., Sirotkin, A.V., Chevrette, M.G., Carlson, C.M., Ortega, H.E., Thomas, C., Ananiev, G.E., Barns, K.J.,
Vyahhi, N., Tesler, G., Alekseyev, M.A., Pevzner, P.A., 2012. SPAdes: a new genome Book, A.J., Cagnazzo, J., Carlos, C., Flanigan, W., Grubbs, K.J., Horn, H.A.,
assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19 Hoffmann, F.M., Klassen, J.L., Knack, J.J., Lewin, G.R., McDonald, B.R., Muller, L.,
(5), 455-77. Melo, W.G.P., Pinto-Tomas, A.A., Schmitz, A., Wendt-Pienkowski, E., Wildman, S.,
Bertrand, D., Shaw, J., Kalathiyappan, M., Ng, A.H.Q., Kumar, M.S., Li, C., Dvornicic, M., Zhao, M., Zhang, F., Bugni, T.S., Andes, D.R., Pupo, M.T., Currie, C.R., 2019. The
Soldo, J.P., Koh, J.Y., Tong, C., Ng, O.T., Barkham, T., Young, B., Marimuthu, K., antimicrobial potential of Streptomyces from insect microbiomes. Nat. Commun. 10
Chng, K.R., Sikic, M., Nagarajan, N., 2019. Hybrid metagenomic assembly enables (1), 516.
high-resolution analysis of resistance determinants and mobile elements in human Chikhi, R., Rizk, G., 2013. Space-efficient and exact de Bruijn graph representation based
microbiomes. Nat. Biotechnol. 37 (8), 937–944. on a Bloom filter. Algorithms Mol. Biol. 8 (1), 22.
Bezuidt, O.K., Pierneef, R., Gomri, A.M., Adesioye, F., Makhalanyane, T.P., Kharroub, K., Danko, D., Bezdan, D., Afshin, E.E., Ahsanuddin, S., Bhattacharya, C., Butler, D.J.,
Cowan, D.A., 2016. The geobacillus pan-genome: implications for the evolution of Chng, K.R., Donnellan, D., Hecht, J., Jackson, K., Kuchin, K., Karasikov, M.,
the genus. Front. Microbiol. 7, 723. Lyons, A., Mak, L., Meleshko, D., Mustafa, H., Mutai, B., Neches, R.Y., Ng, A.,
Bikel, S., Valdez-Lara, A., Cornejo-Granados, F., Rico, K., Canizales-Quinteros, S., Nikolayeva, O., Nikolayeva, T., Png, E., Ryon, K.A., Sanchez, J.L., Shaaban, H.,
Soberon, X., Del Pozo-Yauner, L., Ochoa-Leyva, A., 2015. Combining metagenomics, Sierra, M.A., Thomas, D., Young, B., Abudayyeh, O.O., Alicea, J., Bhattacharyya, M.,
metatranscriptomics and viromics to explore novel microbial interactions: towards a Blekhman, R., Castro-Nallar, E., Canas, A.M., Chatziefthimiou, A.D., Crawford, R.W.,
systems-level understanding of human microbiome. Comput. Struct. Biotechnol. J. De Filippis, F., Deng, Y., Desnues, C., Dias-Neto, E., Dybwad, M., Elhaik, E.,
13, 390–401. Ercolini, D., Frolova, A., Gankin, D., Gootenberg, J.S., Graf, A.B., Green, D.C.,
Blin, K., Shaw, S., Steinke, K., Villebro, R., Ziemert, N., Lee, S.Y., Medema, M.H., Hajirasouliha, I., Hastings, J.J.A., Hernandez, M., Iraola, G., Jang, S., Kahles, A.,
Weber, T., 2019. antiSMASH 5.0: updates to the secondary metabolite genome Kelly, F.J., Knights, K., Kyrpides, N.C., Labaj, P.P., Lee, P.K.H., Leung, M.H.Y.,
mining pipeline. Nucleic Acids Res. 47 (W1), W81–W87. Ljungdahl, P.O., Mason-Buck, G., McGrath, K., Meydan, C., Mongodin, E.F.,
Blin, K., Shaw, S., Kautsar, S.A., Medema, M.H., Weber, T., 2021. The antiSMASH Moraes, M.O., Nagarajan, N., Nieto-Caballero, M., Noushmehr, H., Oliveira, M.,
database version 3: increased taxonomic coverage and new query features for Ossowski, S., Osuolale, O.O., Ozcan, O., Paez-Espino, D., Rascovan, N., Richard, H.,
modular enzymes. Nucleic Acids Res. 49 (D1), D639–D643. Ratsch, G., Schriml, L.M., Semmler, T., Sezerman, O.U., Shi, L., Shi, T., Siam, R.,
Boisvert, S., Laviolette, F., Corbeil, J., 2010. Ray: simultaneous assembly of reads Song, L.H., Suzuki, H., Court, D.S., Tighe, S.W., Tong, X., Udekwu, K.I., Ugalde, J.A.,
from a mix of high-throughput sequencing technologies. J. Comput. Biol. Valentine, B., Vassilev, D.I., Vayndorf, E.M., Velavan, T.P., Wu, J., Zambrano, M.M.,
1519–1533. Zhu, J., Zhu, S., Mason, C.E., International MetaSUB Consortium, 2021. A global metagenomic
Bolger, A.M., Lohse, M., Usadel, B., 2014. Trimmomatic: a flexible trimmer for Illumina map of urban microbiomes and antimicrobial resistance. Cell 184 (13), 3376–3393
sequence data. Bioinformatics 30 (15), 2114–2120. e17.
Bowers, R.M., Kyrpides, N.C., Stepanauskas, R., Harmon-Smith, M., Doud, D., Reddy, T. Dapa, T., Ramiro, R.S., Pedro, M.F., Gordo, I., Xavier, K.B., 2022. Diet leaves a genetic
B.K., Schulz, F., Jarett, J., Rivers, A.R., Eloe-Fadrosh, E.A., Tringe, S.G., Ivanova, N. signature in a keystone member of the gut microbiota. Cell Host Microbe 30 (2),
N., Copeland, A., Clum, A., Becraft, E.D., Malmstrom, R.R., Birren, B., Podar, M., 183–199 e10.
Bork, P., Weinstock, G.M., Garrity, G.M., Dodsworth, J.A., Yooseph, S., Sutton, G., De Filippis, F., Pasolli, E., Ercolini, D., 2020. Newly explored faecalibacterium diversity
Glockner, F.O., Gilbert, J.A., Nelson, W.C., Hallam, S.J., Jungbluth, S.P., Ettema, T.J. is connected to age, lifestyle, geography, and disease. Curr. Biol.
G., Tighe, S., Konstantinidis, K.T., Liu, W.T., Baker, B.J., Rattei, T., Eisen, J.A., De Sordi, L., Lourenco, M., Debarbieux, L., 2019. The battle within: interactions of
Hedlund, B., McMahon, K.D., Fierer, N., Knight, R., Finn, R., Cochrane, G., Karsch- bacteriophages and bacteria in the gastrointestinal tract. Cell Host Microbe 25 (2),
Mizrachi, I., Tyson, G.W., Rinke, C., Genome Standards Consortium, Lapidus, A., Meyer, F., 210–218.
Yilmaz, P., Parks, D.H., Eren, A.M., Schriml, L., Banfield, J.F., Hugenholtz, P., Dedrick, R.M., Guerrero-Bustamante, C.A., Garlena, R.A., Russell, D.A., Ford, K.,
Woyke, T., 2017. Minimum information about a single amplified genome (MISAG) Harris, K., Gilmour, K.C., Soothill, J., Jacobs-Sera, D., Schooley, R.T., Hatfull, G.F.,
and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nat. Spencer, H., 2019. Engineered bacteriophages for treatment of a patient with a
Biotechnol. 35 (8), 725–731. disseminated drug-resistant Mycobacterium abscessus. Nat. Med. 25 (5), 730–733.
Bradley, P., Gordon, N.C., Walker, T.M., Dunn, L., Heys, S., Huang, B., Earle, S., D’Hoe, K., Vet, S., Faust, K., Moens, F., Falony, G., Gonze, D., Llorens-Rico, V., Gelens, L.,
Pankhurst, L.J., Anson, L., de Cesare, M., Piazza, P., Votintseva, A.A., Golubchik, T., Danckaert, J., De Vuyst, L., Raes, J., 2019. Correction: Integrated culturing,

modeling and transcriptomics uncovers complex interactions and emergent behavior Human Microbiome Jumpstart Reference Strains Consortium, Nelson, K.E., Weinstock, G.M.,
in a three-species synthetic gut community. Elife 8. Highlander, S.K., Worley, K.C., Creasy, H.H., Wortman, J.R., Rusch, D.B.,
Edwards, D.J., Holt, K.E., 2013. Beginner’s guide to comparative bacterial genome Mitreva, M., Sodergren, E., Chinwalla, A.T., Feldgarden, M., Gevers, D., Haas, B.J.,
analysis using next-generation sequence data. Micro Inf. Exp. 3 (1), 2. Madupu, R., Ward, D.V., Birren, B.W., Gibbs, R.A., Methe, B., Petrosino, J.F.,
Edwards, R.A., McNair, K., Faust, K., Raes, J., Dutilh, B.E., 2016. Computational Strausberg, R.L., Sutton, G.G., White, O.R., Wilson, R.K., Durkin, S., Giglio, M.G.,
approaches to predict bacteriophage-host relationships. FEMS Microbiol. Rev. 40 Gujja, S., Howarth, C., Kodira, C.D., Kyrpides, N., Mehta, T., Muzny, D.M.,
(2), 258–272. Pearson, M., Pepin, K., Pati, A., Qin, X., Yandava, C., Zeng, Q., Zhang, L., Berlin, A.
Eren, A.M., Esen, O.C., Quince, C., Vineis, J.H., Morrison, H.G., Sogin, M.L., Delmont, T. M., Chen, L., Hepburn, T.A., Johnson, J., McCorrison, J., Miller, J., Minx, P.,
O., 2015. Anvi’o: an advanced analysis and visualization platform for ’omics data. Nusbaum, C., Russ, C., Sykes, S.M., Tomlinson, C.M., Young, S., Warren, W.C.,
PeerJ 3, e1319. Badger, J., Crabtree, J., Markowitz, V.M., Orvis, J., Cree, A., Ferriera, S., Fulton, L.L.,
Ewels, P., Magnusson, M., Lundin, S., Kaller, M., 2016. MultiQC: summarize analysis Fulton, R.S., Gillis, M., Hemphill, L.D., Joshi, V., Kovar, C., Torralba, M.,
results for multiple tools and samples in a single report. Bioinformatics 32 (19), Wetterstrand, K.A., Abouellleil, A., Wollam, A.M., Buhay, C.J., Ding, Y., Dugan, S.,
3047–3048. FitzGerald, M.G., Holder, M., Hostetler, J., Clifton, S.W., Allen-Vercoe, E., Earl, A.M.,
Feng, Y., Wang, Y., Zhu, B., Gao, G.F., Guo, Y., Hu, Y., 2021. Metagenome-assembled Farmer, C.N., Liolios, K., Surette, M.G., Xu, Q., Pohl, C., Wilczek-Boney, K., Zhu, D.
genomes and gene catalog from the chicken gut microbiome aid in deciphering A., 2010. A catalog of reference genomes from the human microbiome. Science 328
antibiotic resistomes. Commun. Biol. 4 (1), 1305. (5981), 994–999.
Fodor, A.A., DeSantis, T.Z., Wylie, K.M., Badger, J.H., Ye, Y., Hepburn, T., Hu, P., Ito, T., Sekizuka, T., Kishi, N., Yamashita, A., Kuroda, M., 2019. Conventional culture
Sodergren, E., Liolios, K., Huot-Creasy, H., Birren, B.W., Earl, A.M., 2012. The "most methods with commercially available media unveil the presence of novel culturable
wanted" taxa from the human microbiome for whole genome sequencing. PLoS One bacteria. Gut Microbes 10 (1), 77–91.
7 (7), e41294. Jack, Gilbert, Brent, Stephens. Microbiology of the built environment. Nature Reviews
Forouzan, E., Shariati, P., Mousavi Maleki, M.S., Karkhane, A.A., Yakhchali, B., 2018. Microbiology 2018.
Practical evaluation of 11 de novo assemblers in metagenome assembly. Jain, M., Olsen, H.E., Paten, B., Akeson, M., 2016. The Oxford nanopore MinION:
J. Microbiol. Methods 151, 99–105. delivery of nanopore sequencing to the genomics community. Genome Biol. 17 (1),
Freitas, R.C., Marques, H.I.F., Silva, M., Cavalett, A., Odisi, E.J., Silva, B.L.D., 239.
Montemor, J.E., Toyofuku, T., Kato, C., Fujikura, K., Kitazato, H., Lima, A.O.S., Kanehisa, M., Sato, Y., Morishima, K., 2016. BlastKOALA and GhostKOALA: KEGG Tools
2019. Evidence of selective pressure in whale fall microbiome proteins and its for Functional Characterization of Genome and Metagenome Sequences. J. Mol. Biol.
potential application to industry. Mar. Genom. 45, 21–27. 428 (4), 726–731.
Fujimoto, K., Kimura, Y., Shimohigoshi, M., Satoh, T., Sato, S., Tremmel, G., Kanehisa, M., Goto, S., 2000. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28
Uematsu, M., Kawaguchi, Y., Usui, Y., Nakano, Y., Hayashi, T., Kashima, K., Yuki, Y., (1), 27–30.
Yamaguchi, K., Furukawa, Y., Kakuta, M., Akiyama, Y., Yamaguchi, R., Crowe, S.E., Kang, D.D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., Wang, Z., 2019. MetaBAT 2:
Ernst, P.B., Miyano, S., Kiyono, H., Imoto, S., Uematsu, S., 2020. Metagenome data an adaptive binning algorithm for robust and efficient genome reconstruction from
on intestinal phage-bacteria associations aids the development of phage therapy metagenome assemblies. PeerJ 7, e7359.
against pathobionts. Cell Host Microbe 28 (3), 380–389 e9. Karcher, N., Pasolli, E., Asnicar, F., Huang, K.D., Tett, A., Manara, S., Armanini, F.,
Gao, S., Sung, W.K., Nagarajan, N., 2011. Opera: reconstructing optimal Bain, D., Duncan, S.H., Louis, P., Zolfo, M., Manghi, P., Valles-Colomer, M.,
genomic scaffolds with high-throughput paired-end sequences. J. Comput. Biol., 1681–1691. Raffaeta, R., Rota-Stabelli, O., Collado, M.C., Zeller, G., Falush, D., Maixner, F.,
Gawad, C., Koh, W., Quake, S.R., 2016. Single-cell genome sequencing: current state of Walker, A.W., Huttenhower, C., Segata, N., 2020. Analysis of 1321 Eubacterium
the science. Nat. Rev. Genet 17 (3), 175-88. rectale genomes from metagenomes uncovers complex phylogeographic population
Gilbert, J.A., Stephens, B., 2018. Microbiology of the built environment. Nat. Rev. Microbiol. structure and subspecies functional adaptations. Genome Biol. 21 (1), 138.
16 (11), 661–670. Kasmanas, J.C., Bartholomaus, A., Correa, F.B., Tal, T., Jehmlich, N., Herberth, G., von
Girotto, S., Pizzi, C., Comin, M., 2016. MetaProb: accurate metagenomic reads binning Bergen, M., Stadler, P.F., Carvalho, A., Nunes da Rocha, U., 2020.
based on probabilistic sequence signatures. Bioinformatics 32 (17), i567–i575. HumanMetagenomeDB: a public repository of curated and standardized metadata
Glendinning, L., Stewart, R.D., Pallen, M.J., Watson, K.A., Watson, M., 2020. Assembly of for human metagenomes. Nucleic Acids Res.
hundreds of novel bacterial genomes from the chicken caecum. Genome Biol. 21 (1), Koonin, E.V., Mushegian, A.R., 1996. Complete genome sequences of cellular life forms:
34. glimpses of theoretical evolutionary genomics. Curr. Opin. Genet. Dev. 6 (6),
Gullert, S., Fischer, M.A., Turaev, D., Noebauer, B., Ilmberger, N., Wemheuer, B., 757–762.
Alawi, M., Rattei, T., Daniel, R., Schmitz, R.A., Grundhoff, A., Streit, W.R., 2016. Koonin, E.V., Mushegian, A.R., Rudd, K.E., 1996. Sequencing and analysis of bacterial
Deep metagenome and metatranscriptome analyses of microbial communities genomes. Curr. Biol. 6 (4), 404–416.
affiliated with an industrial biogas fermenter, a cow rumen, and elephant feces Koonin, E.V., Wolf, Y.I., 2008. Genomics of bacteria and archaea: the emerging
reveal major differences in carbohydrate hydrolysis strategies. Biotechnol. Biofuels dynamic view of the prokaryotic world. Nucleic Acids Res. 36 (21), 6688-719.
9, 121. Koren, S., Walenz, B.P., Berlin, K., Miller, J.R., Bergman, N.H., Phillippy, A.M., 2017.
Gurevich, A., Saveliev, V., Vyahhi, N., Tesler, G., 2013. QUAST: quality assessment tool Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and
for genome assemblies. Bioinformatics 29 (8), 1072–1075. repeat separation. Genome Res. 27 (5), 722–736.
Hess, M., Sczyrba, A., Egan, R., Kim, T.W., Chokhawala, H., Schroth, G., Luo, S., Clark, D. Lagier, J.C., Khelaifia, S., Alou, M.T., Ndongo, S., Dione, N., Hugon, P., Caputo, A.,
S., Chen, F., Zhang, T., Mackie, R.I., Pennacchio, L.A., Tringe, S.G., Visel, A., Cadoret, F., Traore, S.I., Seck, E.H., Dubourg, G., Durand, G., Mourembou, G.,
Woyke, T., Wang, Z., Rubin, E.M., 2011. Metagenomic discovery of biomass- Guilhot, E., Togo, A., Bellali, S., Bachar, D., Cassir, N., Bittar, F., Delerce, J.,
degrading genes and genomes from cow rumen. Science 331 (6016), 463–467. Mailhe, M., Ricaboni, D., Bilen, M., Dangui Nieko, N.P., Dia Badiane, N.M.,
Heyer, R., Schallert, K., Siewert, C., Kohrs, F., Greve, J., Maus, I., Klang, J., Klocke, M., Valles, C., Mouelhi, D., Diop, K., Million, M., Musso, D., Abrahao, J., Azhar, E.I.,
Heiermann, M., Hoffmann, M., Puttker, S., Calusinska, M., Zoun, R., Saake, G., Bibi, F., Yasir, M., Diallo, A., Sokhna, C., Djossou, F., Vitton, V., Robert, C., Rolain, J.
Benndorf, D., Reichl, U., 2019. Metaproteome analysis reveals that syntrophy, M., La Scola, B., Fournier, P.E., Levasseur, A., Raoult, D., 2016. Culture of previously
competition, and phage-host interaction shape microbial communities in biogas uncultured members of the human gut microbiota by culturomics. Nat. Microbiol. 1,
plants. Microbiome 7 (1), 69. 16203.
Houde, A., Kademi, A., Leblanc, D., 2004. Lipases and Their Industrial Applications: An Lagier, J.C., Dubourg, G., Million, M., Cadoret, F., Bilen, M., Fenollar, F., Levasseur, A.,
Overview. Rolain, J.M., Fournier, P.E., Raoult, D., 2018. Culturing the human microbiota and
Hsu, C.L., Duan, Y., Fouts, D.E., Schnabl, B., 2021. Intestinal virome and therapeutic culturomics. Nat. Rev. Microbiol. 16, 540–550.
potential of bacteriophages in liver disease. J. Hepatol. 75 (6), 1465–1475. Langmead, B., Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nat.
Huang, G., Wang, L., Li, J., Hou, R., Wang, M., Wang, Z., Qu, Q., Zhou, W., Nie, Y., Methods 9 (4), 357–359.
Hu, Y., Ma, Y., Yan, L., Wei, H., Wei, F., 2022. Seasonal shift of the gut microbiome Lehtimaki, J., Thorsen, J., Rasmussen, M.A., Hjelmso, M., Shah, S., Mortensen, M.S.,
synchronizes host peripheral circadian rhythm for physiological adaptation to a low- Trivedi, U., Vestergaard, G., Bonnelykke, K., Chawes, B.L., Brix, S., Sorensen, S.J.,
fat diet in the giant panda. Cell Rep. 38 (3), 110203. Bisgaard, H., Stokholm, J., 2021. Urbanized microbiota in infants, immune
Huerta-Cepas, J., Forslund, K., Coelho, L.P., Szklarczyk, D., Jensen, L.J., von Mering, C., constitution, and later risk of atopic diseases. J. Allergy Clin. Immunol. 148 (1),
Bork, P., 2017. Fast genome-wide functional annotation through orthology 234–243.
assignment by eggNOG-mapper. Mol. Biol. Evol. 34 (8), 2115–2122. Lesker, T.R., Durairaj, A.C., Galvez, E.J.C., Lagkouvardos, I., Baines, J.F., Clavel, T.,
Huerta-Cepas, J., Szklarczyk, D., Heller, D., Hernandez-Plaza, A., Forslund, S.K., Sczyrba, A., McHardy, A.C., Strowig, T., 2020. An integrated metagenome catalog
Cook, H., Mende, D.R., Letunic, I., Rattei, T., Jensen, L.J., von Mering, C., Bork, P., reveals new insights into the murine gut microbiome. Cell Rep. 30 (9), 2909–2922
2019. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated e6.
orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47 Levin, D., Raab, N., Pinto, Y., Rothschild, D., Zanir, G., Godneva, A., Mellul, N.,
(D1), D309–D314. Futorian, D., Gal, D., Leviatan, S., Zeevi, D., Bachelet, I., Segal, E., 2021. Diversity
Hug, J.J., Krug, D., Müller, R., 2020. Bacteria as genetically programmable producers of and functional landscapes in the microbiota of animals in the wild. Science 372,
bioactive natural products. Nat. Rev. Chem. 4, 4. 6539.
Hugenholtz, P., 2002. Exploring prokaryotic diversity in the genomic era. Genome Biol. 3 Li, D., Liu, C.M., Luo, R., Sadakane, K., Lam, T.W., 2015. MEGAHIT: an ultra-fast single-
(2). REVIEWS0003. node solution for large and complex metagenomics assembly via succinct de Bruijn
Hultman, J., Waldrop, M.P., Mackelprang, R., David, M.M., McFarland, J., Blazewicz, S. graph. Bioinformatics 31 (10), 1674–1676.
J., Harden, J., Turetsky, M.R., McGuire, A.D., Shah, M.B., VerBerkmoes, N.C., Lee, L. Li, H., Durbin, R., 2009. Fast and accurate short read alignment with Burrows-Wheeler
H., Mavrommatis, K., Jansson, J.K., 2015. Multi-omics of permafrost, active layer transform. Bioinformatics 25 (14), 1754–1760.
and thermokarst bog soil microbiomes. Nature 521 (7551), 208-12.

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Nurk, S., Meleshko, D., Korobeynikov, A., Pevzner, P.A., 2017. metaSPAdes: a new
Abecasis, G., Durbin, R., 1000 Genome Project Data Processing Subgroup, 2009. The sequence versatile metagenomic assembler. Genome Res. 27 (5), 824–834.
alignment/map format and SAMtools. Bioinformatics 25 (16), 2078–2079. Ochoa, J.L., Sanchez, L.M., Koo, B.M., Doherty, J.S., Rajendram, M., Huang, K.C.,
Liu, B., Zheng, D., Jin, Q., Chen, L., Yang, J., 2019. VFDB 2019: a comparative Gross, C.A., Linington, R.G., 2018. Marine mammal microbiota yields novel
pathogenomic platform with an interactive web interface. Nucleic Acids Res. 47 antibiotic with potent activity against clostridium difficile. ACS Infect. Dis. 4 (1),
(D1), D687–D692. 59–67.
Liu, Y.X., Qin, Y., Chen, T., Lu, M., Qian, X., Guo, X., Bai, Y., 2020. A practical guide to amplicon and Olm, M.R., Brown, C.T., Brooks, B., Banfield, J.F., 2017. dRep: a tool for fast and accurate
metagenomic analysis of microbiome data. Protein Cell. genomic comparisons that enables improved genome recovery from metagenomes
Lo, S.W., Kumar, N., Wheeler, N.E., 2018. Breaking the code of antibiotic resistance. Nat. through de-replication. ISME J. 11 (12), 2864–2868.
Rev. Microbiol 16 (5), 262. Olson, N.D., Treangen, T.J., Hill, C.M., Cepeda-Espinoza, V., Ghurye, J., Koren, S.,
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., Henrissat, B., 2014. The Pop, M., 2019. Metagenomic assembly through the lens of validation: recent
carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, advances in assessing and improving the quality of genomes assembled from
D490–D495. metagenomes. Brief. Bioinform. 20 (4), 1140–1150.
Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y., Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S.,
Tang, J., Wu, G., Zhang, H., Shi, Y., Liu, Y., Yu, C., Wang, B., Lu, Y., Han, C., Phillippy, A.M., 2016. Mash: fast genome and metagenome distance estimation using
Cheung, D.W., Yiu, S.M., Peng, S., Xiaoqian, Z., Liu, G., Liao, X., Li, Y., Yang, H., MinHash. Genome Biol. 17 (1), 132.
Wang, J., Lam, T.W., Wang, J., 2012. SOAPdenovo2: an empirically improved Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P., Tyson, G.W., 2015. CheckM:
memory-efficient short-read de novo assembler. Gigascience 1 (1), 18. assessing the quality of microbial genomes recovered from isolates, single cells, and
Maini Rekdal, V., Bess, E.N., Bisanz, J.E., Turnbaugh, P.J., Balskus, E.P., 2019. Discovery metagenomes. Genome Res. 25 (7), 1043–1055.
and inhibition of an interspecies gut bacterial pathway for Levodopa metabolism. Parks, D.H., Rinke, C., Chuvochina, M., Chaumeil, P.A., Woodcroft, B.J., Evans, P.N.,
Science 364, 6445. Hugenholtz, P., Tyson, G.W., 2017. Recovery of nearly 8,000 metagenome-
Mallick, H., Ma, S., Franzosa, E.A., Vatanen, T., Morgan, X.C., Huttenhower, C., 2017. assembled genomes substantially expands the tree of life. Nat. Microbiol. 2 (11),
Experimental design and quantitative analysis of microbial community multiomics. 1533–1542.
Genome Biol. 18 (1), 228. Parks, D.H., Chuvochina, M., Chaumeil, P.A., Rinke, C., Mussig, A.J., Hugenholtz, P.,
Manni, M., Berkeley, M.R., Seppey, M., Simao, F.A., Zdobnov, E.M., 2021. BUSCO 2020. A complete domain-to-species taxonomy for Bacteria and Archaea. Nat.
update: novel and streamlined workflows along with broader and deeper Biotechnol. 38 (9), 1079–1086.
phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Pasolli, E., Asnicar, F., Manara, S., Zolfo, M., Karcher, N., Armanini, F., Beghini, F.,
Mol. Biol. Evol. Manghi, P., Tett, A., Ghensi, P., Collado, M.C., Rice, B.L., DuLong, C., Morgan, X.C.,
Marcy, Y., Ouverney, C., Bik, E.M., Losekann, T., Ivanova, N., Martin, H.G., Szeto, E., Golden, C.D., Quince, C., Huttenhower, C., Segata, N., 2019. Extensive unexplored
Platt, D., Hugenholtz, P., Relman, D.A., Quake, S.R., 2007. Dissecting biological human microbiome diversity revealed by over 150,000 genomes from metagenomes
“dark matter” with single-cell genetic analysis of rare and uncultivated TM7 spanning age, geography, and lifestyle. Cell 176 (3), 649–662 e20.
microbes from the human mouth. Proc. Natl. Acad. Sci. U.S.A. 104 (29), Patro, R., Duggal, G., Love, M.I., Irizarry, R.A., Kingsford, C., 2017. Salmon provides fast
11889–11894. and bias-aware quantification of transcript expression. Nat. Methods 14 (4),
Markowiak, P., Slizewska, K., 2018. The role of probiotics, prebiotics, and synbiotics in animal nutrition. 417–419.
Gut Pathog. 10, 21. Peng, X., Wilken, S.E., Lankiewicz, T.S., Gilmore, S.P., Brown, J.L., Henske, J.K., Swift, C.
Martin, M., 2011. Cutadapt removes adapter sequences from high-throughput L., Salamov, A., Barry, K., Grigoriev, I.V., Theodorou, M.K., Valentine, D.L.,
sequencing reads. EMBnet J. 17 (1), 10–12. O’Malley, M.A., 2021. Genomic and functional analyses of fungal and bacterial
Medini, D., Donati, C., Tettelin, H., Masignani, V., Rappuoli, R., 2005. The microbial pan- consortia that enable lignocellulose breakdown in goat gut microbiomes. Nat.
genome. Curr. Opin. Genet. Dev. 15 (6), 589–594. Microbiol. 6 (4), 499–511.
von Meijenfeldt, F.A.B., Arkhipova, K., Cambuy, D.D., Coutinho, F.H., Dutilh, B.E., 2019. Peng, Y., Leung, H.C., Yiu, S.M., Chin, F.Y., 2012. IDBA-UD: a de novo assembler for
Robust taxonomic classification of uncharted microbial sequences and bins with CAT single-cell and metagenomic sequencing data with highly uneven depth.
and BAT. Genome Biol. 20 (1), 217. Bioinformatics 28 (11), 1420–1428.
de Menezes, A.B., Richardson, A.E., Thrall, P.H., 2017. Linking fungal-bacterial co- Perez-Cobas, A.E., Gomez-Valero, L., Buchrieser, C., 2020. Metagenomic approaches in
occurrences to soil ecosystem function. Curr. Opin. Microbiol. 37, 135–141. microbial ecology: an update on whole-genome and marker gene sequencing
Miller, J.R., Koren, S., Sutton, G., 2010. Assembly algorithms for next-generation analyses. Micro Genom. 6, 8.
sequencing data. Genomics 95 (6), 315–327. Perofsky, A.C., Lewis, R.J., Meyers, L.A., 2019. Terrestriality and bacterial transfer: a
Mineeva, O., Rojas-Carulla, M., Ley, R.E., Scholkopf, B., Youngblut, N.D., 2020. comparative study of gut microbiomes in sympatric Malagasy mammals. ISME J. 13
DeepMAsED: evaluating the quality of metagenomic assemblies. Bioinformatics 36 (1), 50–63.
(10), 3011–3017. Pevzner, P.A., Tang, H., Waterman, M.S., 2001. An Eulerian path approach to DNA
Mirzaei, M.K., Maurice, C.F., 2017. Menage a trois in the human gut: interactions between fragment assembly. Proc. Natl. Acad. Sci. USA 98 (17), 9748–9753.
host, bacteria and phages. Nat. Rev. Microbiol. 15 (7), 397–408. Pierce, E.C., Morin, M., Little, J.C., Liu, R.B., Tannous, J., Keller, N.P., Pogliano, K.,
Mukherjee, S., Seshadri, R., Varghese, N.J., Eloe-Fadrosh, E.A., Meier-Kolthoff, J.P., Wolfe, B.E., Sanchez, L.M., Dutton, R.J., 2020. Bacterial-fungal interactions revealed
Goker, M., Coates, R.C., Hadjithomas, M., Pavlopoulos, G.A., Paez-Espino, D., by genome-wide analysis of bacterial mutant fitness. Nat. Microbiol.
Yoshikuni, Y., Visel, A., Whitman, W.B., Garrity, G.M., Eisen, J.A., Hugenholtz, P., Quince, C., Walker, A.W., Simpson, J.T., Loman, N.J., Segata, N., 2017. Shotgun
Pati, A., Ivanova, N.N., Woyke, T., Klenk, H.P., Kyrpides, N.C., 2017. 1,003 reference metagenomics, from sampling to analysis. Nat. Biotechnol. 35 (9), 833–844.
genomes of bacterial and archaeal isolates expand coverage of the tree of life. Nat. Quinlan, A.R., Hall, I.M., 2010. BEDTools: a flexible suite of utilities for comparing genomic
Biotechnol. 35 (7), 676–683. features. Bioinformatics 26 (6), 841–842.
Mukherjee, S., Stamatis, D., Bertsch, J., Ovchinnikova, G., Katta, H.Y., Mojica, A., Rashamuse, K., Sanyika Tendai, W., Mathiba, K., Ngcobo, T., Mtimka, S., Brady, D.,
Chen, I.A., Kyrpides, N.C., Reddy, T., 2019. Genomes OnLine database (GOLD) v.7: 2017. Metagenomic mining of glycoside hydrolases from the hindgut bacterial
updates and new features. Nucleic Acids Res. 47 (D1), D649–D659. symbionts of a termite (Trinervitermes trinervoides) and the characterization of a
Nayfach, S., Shi, Z.J., Seshadri, R., Pollard, K.S., Kyrpides, N.C., 2019. New insights from multimodular beta-1,4-xylanase (GH11). Biotechnol. Appl. Biochem 64 (2),
uncultivated genomes of the global human gut microbiome. Nature 568 (7753), 174–186.
505–510. Rodriguez, R.L., Gunturu, S., Harvey, W.T., Rossello-Mora, R., Tiedje, J.M., Cole, J.R.,
Nayfach, S., Roux, S., Seshadri, R., Udwary, D., Varghese, N., Schulz, F., Wu, D., Paez- Konstantinidis, K.T., 2018. The Microbial Genomes Atlas (MiGA) webserver:
Espino, D., Chen, I.M., Huntemann, M., Palaniappan, K., Ladau, J., Mukherjee, S., taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome
Reddy, T.B.K., Nielsen, T., Kirton, E., Faria, J.P., Edirisinghe, J.N., Henry, C.S., level. Nucleic Acids Res 46 (W1), W282–W288.
Jungbluth, S.P., Chivian, D., Dehal, P., Wood-Charlson, E.M., Arkin, A.P., Tringe, S. Roller, M., Lucic, V., Nagy, I., Perica, T., Vlahovicek, K., 2013. Environmental shaping of
G., Visel, A., IMG/M Data Consortium, Woyke, T., Mouncey, N.J., Ivanova, N.N., codon usage and functional adaptation across microbial communities. Nucleic Acids
Kyrpides, N.C., Eloe-Fadrosh, E.A., 2020. A genomic catalog of Earth’s microbiomes. Res 41 (19), 8842–8852.
Nat. Biotechnol. Rossello-Mora, R., Amann, R., 2015. Past and future species definitions for Bacteria and
Nielsen, H.B., Almeida, M., Juncker, A.S., Rasmussen, S., Li, J., Sunagawa, S., Plichta, D. Archaea. Syst. Appl. Microbiol 38 (4), 209–216.
R., Gautier, L., Pedersen, A.G., Le Chatelier, E., Pelletier, E., Bonde, I., Nielsen, T., Sabino, J., Hirten, R.P., Colombel, J.F., 2020. Review article: bacteriophages in
Manichanh, C., Arumugam, M., Batto, J.M., Quintanilha Dos Santos, M.B., Blom, N., gastroenterology-from biology to clinical applications. Aliment Pharm. Ther. 51 (1),
Borruel, N., Burgdorf, K.S., Boumezbeur, F., Casellas, F., Dore, J., Dworzynski, P., 53–63.
Guarner, F., Hansen, T., Hildebrand, F., Kaas, R.S., Kennedy, S., Kristiansen, K., Saheb Kashaf, S., Almeida, A., Segre, J.A., Finn, R.D., 2021. Recovering prokaryotic
Kultima, J.R., Leonard, P., Levenez, F., Lund, O., Moumen, B., Le Paslier, D., genomes from host-associated, short-read shotgun metagenomic sequencing data.
Pons, N., Pedersen, O., Prifti, E., Qin, J., Raes, J., Sorensen, S., Tap, J., Tims, S., Nat. Protoc. 16 (5), 2520–2541.
Ussery, D.W., Yamada, T., MetaHIT Consortium, Renault, P., Sicheritz-Ponten, T., Bork, P., Sangwan, N., Xia, F., Gilbert, J.A., 2016. Recovering complete and draft population
Wang, J., Brunak, S., Ehrlich, S.D., 2014. Identification and assembly of genomes from metagenome datasets. Microbiome 4, 8.
genomes and genetic elements in complex metagenomic samples without using Scholz, M., Lo, C.C., Chain, P.S., 2014. Improved assemblies using a source-agnostic
reference genomes. Nat. Biotechnol. 32 (8), 822–828. pipeline for MetaGenomic Assembly by Merging (MeGAMerge) of contigs. Sci. Rep.
Nissen, J.N., Johansen, J., Allesoe, R.L., Sonderby, C.K., Armenteros, J.J.A., Gronbech, C. 4, 6480.
H., Jensen, L.J., Nielsen, H.B., Petersen, T.N., Winther, O., Rasmussen, S., 2021. Scholz, M.B., Lo, C.C., Chain, P.S., 2012. Next generation sequencing and bioinformatic
Improved metagenome binning and assembly using deep variational autoencoders. bottlenecks: the current state of metagenomic data analysis. Curr. Opin. Biotechnol.
Nat. Biotechnol. 23 (1), 9–15.


Sczyrba, A., Hofmann, P., Belmann, P., Koslicki, D., Janssen, S., Droge, J., Gregor, I., Majda, S., Fiedler, J., Dahms, E., Bremges, A., Fritz, A., Garrido-Oter, R., Jorgensen, T.S., Shapiro, N., Blood, P.D., Gurevich, A., Bai, Y., Turaev, D., DeMaere, M.Z., Chikhi, R., Nagarajan, N., Quince, C., Meyer, F., Balvociute, M., Hansen, L.H., Sorensen, S.J., Chia, B.K.H., Denis, B., Froula, J.L., Wang, Z., Egan, R., Don Kang, D., Cook, J.J., Deltel, C., Beckstette, M., Lemaitre, C., Peterlongo, P., Rizk, G., Lavenier, D., Wu, Y.W., Singer, S.W., Jain, C., Strous, M., Klingenberg, H., Meinicke, P., Barton, M.D., Lingner, T., Lin, H.H., Liao, Y.C., Silva, G.G.Z., Cuevas, D.A., Edwards, R.A., Saha, S., Piro, V.C., Renard, B.Y., Pop, M., Klenk, H.P., Goker, M., Kyrpides, N.C., Woyke, T., Vorholt, J.A., Schulze-Lefert, P., Rubin, E.M., Darling, A.E., Rattei, T., McHardy, A.C., 2017. Critical assessment of metagenome interpretation-a benchmark of metagenomics software. Nat. Methods 14 (11), 1063–1071.
Seemann, T., 2014. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30 (14), 2068–2069.
Seshadri, R., Leahy, S.C., Attwood, G.T., Teh, K.H., Lambie, S.C., Cookson, A.L., Eloe-Fadrosh, E.A., Pavlopoulos, G.A., Hadjithomas, M., Varghese, N.J., Paez-Espino, D., Hungate1000 project collaborators, Perry, R., Henderson, G., Creevey, C.J., Terrapon, N., Lapebie, P., Drula, E., Lombard, V., Rubin, E., Kyrpides, N.C., Henrissat, B., Woyke, T., Ivanova, N.N., Kelly, W.J., 2018. Cultivation and sequencing of rumen microbiome members from the Hungate1000 collection. Nat. Biotechnol. 36 (4), 359–367.
Sharon, I., Morowitz, M.J., Thomas, B.C., Costello, E.K., Relman, D.A., Banfield, J.F., 2013. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 23 (1), 111–120.
Shen, W., Le, S., Li, Y., Hu, F., 2016. SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11 (10), e0163962.
Shin, J.H., Kim, H.U., Kim, D.I., Lee, S.Y., 2013. Production of bulk chemicals via novel metabolic pathways in microorganisms. Biotechnol. Adv. 31 (6), 925–935.
Shkoporov, A.N., Hill, C., 2019. Bacteriophages of the human gut: the “known unknown” of the microbiome. Cell Host Microbe 25 (2), 195–209.
Sieber, C.M.K., Probst, A.J., Sharrar, A., Thomas, B.C., Hess, M., Tringe, S.G., Banfield, J.F., 2018. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. 3 (7), 836–843.
Singh, K.M., Reddy, B., Patel, D., Patel, A.K., Parmar, N., Patel, A., Patel, J.B., Joshi, C.G., 2014. High potential source for biomass degradation enzyme discovery and environmental aspects revealed through metagenomics of Indian buffalo rumen. Biomed. Res. Int. 2014, 267189.
Singh, R., Kumar, M., Mittal, A., Mehta, P.K., 2016. Microb. Cell. Ind. Appl.
Sommer, F., Ruhlemann, M.C., Bang, C., Hoppner, M., Rehman, A., Kaleta, C., Schmitt-Kopplin, P., Dempfle, A., Weidinger, S., Ellinghaus, E., Krauss-Etschmann, S., Schmidt-Arras, D., Aden, K., Schulte, D., Ellinghaus, D., Schreiber, S., Tholey, A., Rupp, J., Laudes, M., Baines, J.F., Rosenstiel, P., Franke, A., 2017. Microbiomarkers in inflammatory bowel diseases: caveats come with caviar. Gut 66 (10), 1734–1738.
Song, W.Z., Thomas, T., 2017. Binning_refiner: improving genome bins through the combination of different binning programs. Bioinformatics 33 (12), 1873–1875.
Stewart, R.D., Auffret, M.D., Snelling, T.J., Roehe, R., Watson, M., 2019a. MAGpy: a reproducible pipeline for the downstream analysis of metagenome-assembled genomes (MAGs). Bioinformatics 35 (12), 2150–2152.
Stewart, R.D., Auffret, M.D., Warr, A., Walker, A.W., Roehe, R., Watson, M., 2019b. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37 (8), 953–961.
Sun, J., Liao, X.P., D’Souza, A.W., Boolchandani, M., Li, S.H., Cheng, K., Luis Martinez, J., Li, L., Feng, Y.J., Fang, L.X., Huang, T., Xia, J., Yu, Y., Zhou, Y.F., Sun, Y.X., Deng, X.B., Zeng, Z.L., Jiang, H.X., Fang, B.H., Tang, Y.Z., Lian, X.L., Zhang, R.M., Fang, Z.W., Yan, Q.L., Dantas, G., Liu, Y.H., 2020. Environmental remodeling of human gut microbiota and antibiotic resistome in livestock farms. Nat. Commun. 11 (1), 1427.
Suttle, C.A., 2007. Marine viruses–major players in the global ecosystem. Nat. Rev. Microbiol. 5 (10), 801–812.
Svartstrom, O., Alneberg, J., Terrapon, N., Lombard, V., de Bruijn, I., Malmsten, J., Dalin, A.M., El Muller, E., Shah, P., Wilmes, P., Henrissat, B., Aspeborg, H., Andersson, A.F., 2017. Ninety-nine de novo assembled genomes from the moose (Alces alces) rumen microbiome provide new insights into microbial plant biomass degradation. ISME J. 11 (11), 2538–2551.
Teh, J.J., Berendsen, E.M., Hoedt, E.C., Kang, S., Zhang, J., Zhang, F., Liu, Q., Hamilton, A.L., Wilson-O’Brien, A., Ching, J., Sung, J.J.Y., Yu, J., Ng, S.C., Kamm, M.A., Morrison, M., 2021. Novel strain-level resolution of Crohn’s disease mucosa-associated microbiota via an ex vivo combination of microbe culture and metagenomic sequencing. ISME J. 15 (11), 3326–3338.
Tett, A., Huang, K.D., Asnicar, F., Fehlner-Peach, H., Pasolli, E., Karcher, N., Armanini, F., Manghi, P., Bonham, K., Zolfo, M., De Filippis, F., Magnabosco, C., Bonneau, R., Lusingu, J., Amuasi, J., Reinhard, K., Rattei, T., Boulund, F., Engstrand, L., Zink, A., Collado, M.C., Littman, D.R., Eibach, D., Ercolini, D., Rota-Stabelli, O., Huttenhower, C., Maixner, F., Segata, N., 2019. The Prevotella copri complex comprises four distinct clades underrepresented in Westernized populations. Cell Host Microbe 26 (5), 666–679.e7.
Treangen, T.J., Dan, D.S., Angly, F.E., Koren, S., Pop, M., 2011. Next generation sequence assembly with AMOS. Curr. Protoc. Bioinform. 33.
Tully, B.J., Graham, E.D., Heidelberg, J.F., 2018. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data 5, 170203.
Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M., Solovyev, V.V., Rubin, E.M., Rokhsar, D.S., Banfield, J.F., 2004. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428 (6978), 37–43.
Uritskiy, G.V., DiRuggiero, J., Taylor, J., 2018. MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6 (1), 158.
van der Walt, A.J., van Goethem, M.W., Ramond, J.B., Makhalanyane, T.P., Reva, O., Cowan, D.A., 2017. Assembling metagenomes, one community at a time. BMC Genom. 18 (1), 521.
Vicedomini, R., Vezzi, F., Scalabrin, S., Arvestad, L., Policriti, A., 2013. GAM-NGS: genomic assemblies merger for next generation sequencing. BMC Bioinform. 14 (Suppl. 7), S6.
Wang, M., Doenyas, C., Wan, J., Zeng, S., Cai, C., Zhou, J., Liu, Y., Yin, Z., Zhou, W., 2021. Virulence factor-related gut microbiota genes and immunoglobulin A levels as novel markers for machine learning-based classification of autism spectrum disorder. Comput. Struct. Biotechnol. J. 19, 545–554.
Wang, W., Hu, H., Zijlstra, R.T., Zheng, J., Ganzle, M.G., 2019. Metagenomic reconstructions of gut microbial metabolism in weanling pigs. Microbiome 7 (1), 48.
Woodcroft, B.J., Singleton, C.M., Boyd, J.A., Evans, P.N., Emerson, J.B., Zayed, A.A.F., Hoelzle, R.D., Lamberton, T.O., McCalley, C.K., Hodgkins, S.B., Wilson, R.M., Purvine, S.O., Nicora, C.D., Li, C., Frolking, S., Chanton, J.P., Crill, P.M., Saleska, S.R., Rich, V.I., Tyson, G.W., 2018. Genome-centric view of carbon processing in thawing permafrost. Nature 560 (7716), 49–54.
Wu, D., Jin, L., Xie, J., Liu, H., Zhao, J., Ye, D., Li, X.D., 2022. Inhalable antibiotic resistomes emitted from hospitals: metagenomic insights into bacterial hosts, clinical relevance, and environmental risks. Microbiome 10 (1), 19.
Wu, Y., Yang, Y., Cao, L., Yin, H., Xu, M., Wang, Z., Liu, Y., Wang, X., Deng, Y., 2018. Habitat environments impacted the gut microbiome of long-distance migratory swan geese but central species conserved. Sci. Rep. 8 (1), 13314.
Wu, Y.W., Simmons, B.A., Singer, S.W., 2016. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32 (4), 605–607.
Xie, F., Jin, W., Si, H., Yuan, Y., Tao, Y., Liu, J., Wang, X., Yang, C., Li, Q., Yan, X., Lin, L., Jiang, Q., Zhang, L., Guo, C., Greening, C., Heller, R., Guan, L.L., Pope, P.B., Tan, Z., Zhu, W., Wang, M., Qiu, Q., Li, Z., Mao, S., 2021. An integrated gene catalog and over 10,000 metagenome-assembled genomes from the gastrointestinal microbiome of ruminants. Microbiome 9 (1), 137.
Xu, Y., Zhao, F., 2018. Single-cell metagenomics: challenges and applications. Protein Cell 9 (5), 501–510.
Yin, M., Yan, X., Weng, W., Yang, Y., Gao, R., Liu, M., Pan, C., Zhu, Q., Li, H., Wei, Q., Shen, T., Ma, Y., Qin, H., 2018. Micro integral membrane protein (MIMP), a newly discovered anti-inflammatory protein of Lactobacillus plantarum, enhances the gut barrier and modulates microbiota and inflammatory cytokines. Cell Physiol. Biochem. 45 (2), 474–490.
Yin, Y., Mao, X., Yang, J., Chen, X., Mao, F., Xu, Y., 2012. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res. 40, W445–W451.
Zaremba-Niedzwiedzka, K., Caceres, E.F., Saw, J.H., Backstrom, D., Juzokaite, L., Vancaester, E., Seitz, K.W., Anantharaman, K., Starnawski, P., Kjeldsen, K.U., Stott, M.B., Nunoura, T., Banfield, J.F., Schramm, A., Baker, B.J., Spang, A., Ettema, T.J., 2017. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541 (7637), 353–358.
Zerbino, D.R., Birney, E., 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18 (5), 821–829.
Zhao, L., Zhang, F., Ding, X., Wu, G., Lam, Y.Y., Wang, X., Fu, H., Xue, X., Lu, C., Ma, J., Yu, L., Xu, C., Ren, Z., Xu, Y., Xu, S., Shen, H., Zhu, X., Shi, Y., Shen, Q., Dong, W., Liu, R., Ling, Y., Zeng, Y., Wang, X., Zhang, Q., Wang, J., Wang, L., Wu, Y., Zeng, B., Wei, H., Zhang, M., Peng, Y., Zhang, C., 2018. Gut bacteria selectively promoted by dietary fibers alleviate type 2 diabetes. Science 359 (6380), 1151–1156.
Zhu, Y., Zhao, Y., Zhu, D., Gillings, M., Penuelas, J., Ok, Y., Capon, A., Banwart, S., 2019. Soil biota, antimicrobial resistance and planetary health. Environment International 131, 105059.
Zimin, A.V., Puiu, D., Luo, M.C., Zhu, T., Koren, S., Marcais, G., Yorke, J.A., Dvorak, J., Salzberg, S.L., 2017. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 27 (5), 787–792.
Zimmermann, M., Zimmermann-Kogadeeva, M., Wegmann, R., Goodman, A.L., 2019. Mapping human microbiome drug metabolism by gut bacteria and their genes. Nature 570 (7762), 462–467.
Zou, Y., Xue, W., Luo, G., Deng, Z., Qin, P., Guo, R., Sun, H., Xia, Y., Liang, S., Dai, Y., Wan, D., Jiang, R., Su, L., Feng, Q., Jie, Z., Guo, T., Xia, Z., Liu, C., Yu, J., Lin, Y., Tang, S., Huo, G., Xu, X., Hou, Y., Liu, X., Wang, J., Yang, H., Kristiansen, K., Li, J., Jia, H., Xiao, L., 2019. 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat. Biotechnol. 37 (2), 179–185.
